DMA in PCIe Switches Boosts Performance in Demanding Applications

by Miguel Rodriguez, PLX Technology

The concept of direct-memory access (DMA) isn’t new; for years, processor and chipset makers have implemented DMA as part of their efforts to increase system performance.  Similarly, endpoints such as Ethernet and Fibre Channel controllers have used DMA to minimize protocol latencies and maximize data throughput.  However, DMA integrated into a PCI Express (PCIe) switch is revolutionary, offering significant advantages for control planes, embedded systems, and storage applications.

A communications control plane is not bandwidth intensive; instead, it is latency sensitive.  Performance is not measured in megabytes per second, as it is in the data plane, but in how long a transaction takes to complete.  At a high level, the communications control plane is made up of a control card connected to a group of line cards.  Depending on the size of the system, the number of line cards can run to several dozen.  In a PCIe-based control plane, these line cards are connected to the control card through a PCIe switch.  Most operations performed through the control plane fall into one of three categories: configuration of the endpoints, status of the devices, and statistics gathering.  The control processor performs these operations on the line cards serially.  That is, the processor writes to and/or reads from the line cards one at a time, and these transactions often target a large number of memory-mapped registers on each line card.  This processor overhead is only magnified in systems with many line cards.  Low-latency PCIe switches used in these systems play their part by shortening the round-trip latency of each transaction.  However, for systems with many line cards and/or high-latency devices on the line cards, alternative methods are employed to reduce the latency.
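To make that overhead concrete, the serial access pattern can be sketched as follows.  The card count, register count, and per-read latency are invented for illustration, and `line_card_read` is a stand-in for a memory-mapped read across the fabric, not a real API:

```c
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define NUM_LINE_CARDS   24
#define REGS_PER_CARD    16
#define READ_LATENCY_US   2   /* assumed round-trip time per register read */

/* Stand-in for a memory-mapped register read across the PCIe fabric. */
static uint32_t line_card_read(int card, int reg)
{
    return (uint32_t)((card << 8) | reg);  /* dummy register contents */
}

/* The control processor polls every line card serially: each register
 * read stalls for a full round trip before the next can be issued, so
 * the accumulated latency grows with cards x registers. */
static unsigned serial_poll_latency_us(void)
{
    unsigned total_us = 0;
    for (int card = 0; card < NUM_LINE_CARDS; card++)
        for (int reg = 0; reg < REGS_PER_CARD; reg++) {
            (void)line_card_read(card, reg);
            total_us += READ_LATENCY_US;
        }
    return total_us;  /* 24 cards x 16 regs x 2 us = 768 us */
}
```

Even with these modest assumed numbers, the round trips add up to nearly a millisecond per polling pass, which is the latency the techniques below set out to hide.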

In a control plane, there are significant advantages to designing in a PCIe switch with integrated DMA.  With its four channels, the DMA engine can be configured to perform the housekeeping tasks normally undertaken by the processor.  For example, one channel can be dedicated to statistics gathering while another handles status updates from the line cards.  The DMA channels operate independently, and prioritization among them is programmable in the PCIe switch.  Moreover, each channel can be assigned its own traffic class, thus differentiating traffic.  This approach not only off-loads the processor but also masks system latency by issuing multiple transactions at once.
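A minimal sketch of that channel setup is below.  The configuration structure, field names, and values are assumptions for illustration; the actual register map is defined by the switch vendor’s data book:

```c
#include <stdint.h>

#define NUM_DMA_CHANNELS 4

/* Hypothetical per-channel configuration block; the real layout and
 * field widths will differ. */
struct dma_chan_cfg {
    uint8_t traffic_class;  /* PCIe TC assigned to this channel's TLPs */
    uint8_t priority;       /* arbitration weight among the channels */
    uint8_t enabled;
};

/* Dedicate channel 0 to background statistics gathering at low priority
 * and channel 1 to status updates at higher priority and a distinct
 * traffic class, so the two flows are differentiated in the fabric. */
static void control_plane_setup(struct dma_chan_cfg cfg[NUM_DMA_CHANNELS])
{
    cfg[0] = (struct dma_chan_cfg){ .traffic_class = 0, .priority = 1, .enabled = 1 };
    cfg[1] = (struct dma_chan_cfg){ .traffic_class = 3, .priority = 4, .enabled = 1 };
    cfg[2] = (struct dma_chan_cfg){ 0 };  /* unused */
    cfg[3] = (struct dma_chan_cfg){ 0 };  /* unused */
}
```

The point of the sketch is the division of labor: once the channels are programmed, both flows proceed concurrently with no further processor involvement.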

Traffic patterns in embedded systems differ from those of compute systems.  In a compute system, traffic flows primarily between system memory and the endpoints, whereas an embedded system sees multiple traffic flows: from system memory to endpoints, and from endpoint to endpoint (often referred to as peer-to-peer in PCIe parlance).  The DMA engine in the PCIe switch places no restrictions on the direction of a transfer; it can move data from system memory to an endpoint or from one endpoint to another.  Ultimately, the DMA engine can significantly reduce the implementation and design time of custom ASICs or FPGAs, which in turn can lower development costs.
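That directional flexibility follows from the descriptor carrying a plain bus address on each side, so a peer-to-peer move looks the same as a memory-to-endpoint move.  A sketch, with invented field names rather than the switch’s real descriptor format:

```c
#include <stdint.h>

/* Hypothetical DMA descriptor; the field names are illustrative and do
 * not match any particular switch's descriptor layout. */
struct dma_desc {
    uint64_t src_addr;    /* PCIe bus address: system memory or an endpoint BAR */
    uint64_t dst_addr;    /* PCIe bus address of the destination */
    uint32_t byte_count;
    uint32_t flags;       /* e.g. interrupt-on-completion */
};

/* Both addresses are simply bus addresses, so the same descriptor
 * describes memory-to-endpoint or endpoint-to-endpoint (peer-to-peer)
 * transfers; the processor never touches the payload. */
static struct dma_desc make_p2p_desc(uint64_t src_bar, uint64_t dst_bar,
                                     uint32_t len)
{
    return (struct dma_desc){ .src_addr = src_bar, .dst_addr = dst_bar,
                              .byte_count = len, .flags = 0 };
}
```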

Storage is another market where DMA is often used; DMA engines in the endpoints move large amounts of data to and from system memory.  These storage systems consist of a processor, a fabric, and physical storage elements.  It is not unusual for the data stored in these systems to be sensitive and, therefore, to require redundancy.  In a PCIe storage system, redundancy can be achieved through the use of non-transparency: a non-transparent (NT) port isolates the address spaces of the primary system and the backup system.  Having a dedicated DMA engine, such as in PLX’s newest line of PCIe switches, allows data to be copied from the primary system to the backup system independently, without intervention by the processor or the endpoints.  This ensures that the backup system contains an up-to-date copy of the data should the primary system fail.  DMA-completion interrupts can optionally be enabled in the DMA engine to notify the host when a data transfer finishes.
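The mirroring flow can be modeled as below.  The buffers, flag, and `dma_mirror` function are stand-ins that simulate the switch’s DMA copy through the NT port and the optional completion interrupt; they are not a real driver interface:

```c
#include <stdint.h>
#include <string.h>

#define MIRROR_LEN 64
#define DESC_FLAG_IRQ_ON_DONE 0x1u  /* hypothetical interrupt-on-completion flag */

static uint8_t primary_mem[MIRROR_LEN];  /* stand-in for the primary host's data */
static uint8_t backup_mem[MIRROR_LEN];   /* stand-in for memory behind the NT port */

int dma_irq_count;  /* counts completion interrupts delivered to the host */

/* Simulates the switch's DMA engine copying a block across the NT
 * boundary and, if enabled, raising a completion interrupt; neither the
 * processor nor the endpoints take part in the data movement itself. */
static void dma_mirror(uint32_t flags)
{
    memcpy(backup_mem, primary_mem, MIRROR_LEN);  /* models the DMA move */
    if (flags & DESC_FLAG_IRQ_ON_DONE)
        dma_irq_count++;
}
```

The interrupt is what lets the host confirm the backup copy is current before acknowledging a write, rather than polling the engine for completion.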

In conclusion, the benefits of DMA in general are well understood.  The concept of DMA in a PCIe switch, however, while still new, is being embraced by system designers who have recognized its benefits in today’s applications.  Though very flexible and powerful, switch-based DMA is not intended to fully replace DMA on the processor, but to complement today’s systems by enhancing current DMA usage and providing additional cost savings by removing the DMA burden from the processor and/or endpoint.

About the Author

Miguel Rodriguez is senior product marketing engineer at PLX Technology (PLXTech), Sunnyvale, Calif.  He can be reached at mrodriguez@plxtech.com.
