Fabric enhances multicore processor interconnects

A fabric topology enables coherency and concurrency

BY SANJAY DESHPANDE and STEVE COLE
Freescale Semiconductor
Austin, TX
http://freescale.com

Demand for ever-higher performance from embedded devices is driving the creation of microprocessors featuring multiple cores. But as core counts rise, the shared-bus architectures common in some multicore approaches are impeding performance due to fundamental bottleneck and latency issues. Using a single bus to interconnect all of a system’s components can simplify design, but it seriously impairs overall performance if the system has a large number of processors and other devices.

Multicore systems are characterized by high traffic levels from multiple resources. Performance and scalability depend greatly on the robustness of the system’s interconnect and its ability to optimally manage access to and from resources. It is therefore critical to use a scalable interconnect scheme that performs as well for dual-core processors as it does for devices with 32 cores.

Bus interconnect limitations

Maintaining coherency among caches and memory is an important function of a multicore interconnect. Using a bus to connect all system components provides basic connectivity and makes achieving coherency simple. However, this architectural decision can create performance issues as the system scales.

A bus-based coherency protocol serializes all storage accesses, allowing one storage transaction at a time. A shared-bus protocol depends on broadcasting each transaction to all participating entities in a single bus cycle. As more entities are connected to the bus, the propagation delay on the bus increases because of the increased capacitive load and the longer distances the signals have to travel, both of which grow linearly with the number of connected devices. Beyond a certain size, this factor dominates the bus cycle time, and increased cycle times in turn lower the bus’s bandwidth.
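The linear growth described above can be made concrete with a toy model. This Python sketch is illustrative only, not a silicon timing model; the constants are hypothetical and exist purely to show the trend of falling bus frequency as devices are added:

```python
# Toy model (hypothetical constants): bus cycle time grows with the number
# of attached devices because wire length and capacitive load both grow
# roughly linearly, so the achievable bus frequency falls as the system scales.

BASE_CYCLE_NS = 1.0        # assumed cycle time of a minimal 2-device bus
DELAY_PER_DEVICE_NS = 0.2  # assumed added propagation delay per extra device

def bus_frequency_mhz(num_devices: int) -> float:
    """Achievable bus frequency for a given device count, per this toy model."""
    cycle_ns = BASE_CYCLE_NS + DELAY_PER_DEVICE_NS * (num_devices - 2)
    return 1000.0 / cycle_ns

for n in (2, 8, 16, 32):
    print(f"{n:2d} devices -> {bus_frequency_mhz(n):6.1f} MHz")
```

The exact numbers are arbitrary; the point is the monotonic decline in frequency, which directly lowers the bandwidth available to every device on the bus.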

In a complex multicore system, many devices may simultaneously attempt to send request and response transactions to a multitude of resources or destinations. Forcing all transaction traffic across the already slow bus, one transaction per clock, can quickly create a bottleneck that adds large queuing delays to transaction delivery. These delays grow very rapidly as bus utilization rises, with a resulting dramatic decrease in system performance.
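How sharply queuing delay grows with utilization can be seen with a textbook M/M/1 queuing approximation. This is a generic sketch, not a model of any particular bus:

```python
# M/M/1 approximation: mean queuing delay grows as utilization/(1 - utilization)
# times the service time, so delay explodes as the shared bus nears saturation.

def mean_queuing_delay(service_time: float, utilization: float) -> float:
    """Mean time a transaction waits before service (M/M/1 formula)."""
    assert 0.0 <= utilization < 1.0, "model is only valid below saturation"
    return service_time * utilization / (1.0 - utilization)

for u in (0.5, 0.8, 0.9, 0.99):
    print(f"utilization {u:4.2f} -> wait {mean_queuing_delay(1.0, u):6.1f} cycles")
```

At 50% utilization a transaction waits about one service time; at 99% it waits about 99, which is the "delays grow very rapidly" behavior the paragraph describes.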

In a bus topology, every type of system transaction is placed on the common bus, one transaction per clock. There is no differentiation among traffic types, which reduces the bandwidth available for any one specific kind of activity, such as coherency or interprocessor messaging. For example, transactions requiring no coherency activity can deny needed bandwidth to transactions that do require coherency action.

Additionally, in a bus protocol each transaction is presented to every entity connected to the bus, whether or not that entity is the destination. This means every transaction is snooped by every processor on the shared bus, needed or not. This unnecessary snooping can overburden and congest a processor’s coherency-management pipeline. In a shared-bus interconnect architecture, these problems only get worse as the system grows larger.
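One common alternative to broadcast snooping is directory-style filtering, where the interconnect tracks which caches may hold a line and forwards snoops only to those. The article does not specify any vendor's mechanism; this Python sketch just illustrates the general idea:

```python
# Hypothetical directory-style snoop filter: forward a coherency snoop only
# to cores that may actually hold the cache line, instead of broadcasting
# to every core the way a shared bus must.

class SnoopFilter:
    def __init__(self):
        self.sharers = {}  # cache-line address -> set of core ids that may hold it

    def record_fill(self, addr: int, core: int) -> None:
        """Note that a core has brought this line into its cache."""
        self.sharers.setdefault(addr, set()).add(core)

    def snoop_targets(self, addr: int, requester: int) -> set:
        """Cores that must be snooped for this address, excluding the requester."""
        return self.sharers.get(addr, set()) - {requester}

f = SnoopFilter()
f.record_fill(0x1000, core=0)
f.record_fill(0x1000, core=2)
print(f.snoop_targets(0x1000, requester=0))  # {2}: only core 2 is snooped
```

A line cached nowhere generates no snoops at all, which is exactly the coherency-pipeline bandwidth saving the text describes.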

Fabric interconnects

A fabric topology offers many benefits that are simply not possible with a single shared bus. For instance, a fabric is a point-to-point topology rather than a broadcast topology, so the fabric cycle time does not degrade as system size increases. A fabric topology also offers multiple paths for transactions between resources, increasing the available bandwidth and concurrency and lowering system congestion.

Fabric interconnect technology manages interaction among multiple cores and all system resources in a router application.

With a fabric, you can provide separate paths for different types of activity. For instance, coherent transactions, noncoherent transactions, and message transactions may all flow along different paths, eliminating mutual interference among the three classes of transactions and improving the service times for each class. With multicore processors, a system is logically partitioned into independent subsystems, where each partition or application is hosted by one or more of the cores using either SMP or AMP models. For example, a multicore processor can run an integrated services router application using control plane, Layer 2 and Layer 3 routing, and Layer 4-7 enhanced services as subapplications.
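The benefit of separate per-class paths can be sketched with per-class queues. This is a hypothetical illustration in Python (the class names mirror the three classes above; the scheduling policy is assumed, not taken from any real fabric):

```python
# Hypothetical sketch of class-separated paths: each transaction class gets
# its own queue, so a burst in one class cannot starve the others the way a
# single shared-bus queue can.
from collections import deque

CLASSES = ("coherent", "noncoherent", "message")

class FabricPort:
    def __init__(self):
        self.queues = {c: deque() for c in CLASSES}  # one path per class

    def enqueue(self, cls: str, txn: str) -> None:
        self.queues[cls].append(txn)

    def service_one_per_class(self) -> dict:
        """Each nonempty class makes forward progress independently per cycle."""
        return {c: q.popleft() for c, q in self.queues.items() if q}

p = FabricPort()
for i in range(3):
    p.enqueue("noncoherent", f"nc{i}")  # a burst of noncoherent traffic
p.enqueue("coherent", "c0")
print(p.service_one_per_class())  # the coherent txn is serviced despite the burst
```

On a single shared bus, "c0" would sit behind the entire noncoherent burst; with separate paths it is serviced in the same cycle.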

The subapplications operate independently for the most part, and their transaction traffic rarely needs to intermingle. Using a fabric-based interconnection scheme honors this independence by providing a “transaction filtering” capability to logically isolate the partition traffic so that devices in one partition do not encounter the traffic of another partition.

Transaction filtering creates better inter-partition isolation, and as the number of partitionable services increases, it allows for more parallel interconnect processing. Traffic isolation also conserves processor snoop bandwidth, which is a scarce resource in a large, coherent multicore system.

Transaction filtering capabilities can be extended to create more secure communications systems, preventing unauthorized transactions from reaching protected devices. A fabric-based interconnect makes achieving this type of secure partitioning an easy option.
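Transaction filtering of the kind described in the last two paragraphs can be pictured as a partition-map lookup at each fabric port. The device names and partition layout below are invented for illustration; the article does not describe a specific filtering mechanism:

```python
# Hypothetical transaction filter: the fabric consults a partition map and
# rejects transactions whose source and destination are not in the same
# partition, isolating traffic and blocking unauthorized access to
# protected devices.

PARTITION = {  # device id -> partition id (all names illustrative)
    "core0": "control", "core1": "control",
    "core2": "dataplane", "core3": "dataplane",
    "crypto": "control",  # protected device, reachable from control plane only
}

def allow(src: str, dst: str) -> bool:
    """Permit a transaction only within a single partition."""
    return PARTITION.get(src) == PARTITION.get(dst)

print(allow("core0", "crypto"))  # True: same partition
print(allow("core2", "crypto"))  # False: filtered, snoop bandwidth saved
```

The same check that keeps partition traffic from intermingling also enforces the secure partitioning mentioned above: an unauthorized source simply never reaches the protected device.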

Of course, deploying fabrics may involve some tradeoffs. For example, when there is no competition for resources, as can be the case when transaction traffic is light, an individual transaction can experience somewhat higher latency in a fabric interconnect due to its pipelined nature. However, counteractive techniques such as out-of-order execution and larger caches can help address this issue. Also, a bus is constructed mainly of passive elements (namely wires), while a fabric employs more active logic to accomplish its myriad functions, which can mean that connectivity within the chip requires more silicon area and power. However, the feature-size and power reductions associated with process-technology advances continue to lower these impacts.

Choosing a scheme

Choosing one interconnection scheme over another can have a major impact on the overall performance, capabilities, and scalability of a multicore system. A bus topology, while simple to design, is not very scalable. As the system scales, more entities are connected to the bus; the wires are longer and the fanout is higher, resulting in longer propagation times and a lower bus frequency. In addition, more entities request service from the bus, lengthening queuing delays and latencies. A fabric-oriented topology can be designed to remove these performance barriers. It can be inherently pipelined, removing the serialization that complicates bus design, supporting a higher operating frequency, and allowing asynchronous interfaces. It can also support multiple, distributed coherency-resolution structures and multiple data paths.

On-chip fabric plays a key role in multicore architectures, such as Freescale’s, that target communications markets.

A fabric-based multicore interconnect gives designers the opportunity to create systems that more efficiently deliver the inherent performance advantages of multicore solutions and that scale more easily with additional cores and functionality. ■

See http://electronicproducts-com-develop.go-vip.net/digital.asp for more information on multicore processors.
