Multicore Architecture Alternatives for Embedded SOC Design
By Grant Martin, Tensilica, Inc.
The concept of using multiple processors or processor cores to build many consumer system products has really taken on a life of its own lately. The au courant term is “multicore,” although that term means different things to different people, much like the ancient story of the blind men and the elephant. Currently, multicore followers fall into two great camps: SMP and AMP. However, these two camps may not sufficiently cover the design space, and at least two more camps are needed.
The SMP (symmetric multiprocessor) camp looks at the multicore processor as a collection of identical, homogeneous processors sharing a large, coherent memory space. Each processor has its own cache, and cache coherency is maintained by hardware that usually implements the MESI (modified, exclusive, shared, invalid) protocol. The advantage of this approach is that it presents a multitasking operating system with a pool of general-purpose processing resources that, ideally, scales linearly with the number of cores. That makes SMP a good alternative to rising clock speeds for general-purpose computing systems.
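To make the MESI protocol concrete, here is a minimal Python sketch of the per-line state transitions a snooping cache controller performs. It is a simplified teaching model, not any vendor's implementation: the event names (`local_read`, `local_write`, `bus_read`, `bus_write`) are hypothetical labels, data movement and write-backs are not modeled, and a read miss is assumed to land in Exclusive when no other cache holds the line and in Shared otherwise.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line in one core's cache.

    Event names are hypothetical labels for this sketch:
      local_read  - this core reads the line
      local_write - this core writes the line
      bus_read    - a read by another core is snooped on the bus
      bus_write   - a write (read-for-ownership) by another core is snooped
    Data movement and write-backs are not modeled, only the state.
    """
    if event == "local_read":
        if state is State.INVALID:
            # Read miss: Exclusive if no other cache holds the line, else Shared.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # a read hit in M, E or S leaves the state unchanged
    if event == "local_write":
        # Every local write ends in Modified. From S or I the controller must
        # first invalidate other copies; from E it can upgrade silently.
        return State.MODIFIED
    if event == "bus_read":
        # Another core reads: a Modified line would be written back first,
        # and any valid copy is downgraded to Shared.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "bus_write":
        # Another core takes ownership, so this copy becomes stale.
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


# One line's life in core 0's cache:
s = next_state(State.INVALID, "local_read")  # State.EXCLUSIVE (no other sharers)
s = next_state(s, "local_write")             # State.MODIFIED
s = next_state(s, "bus_read")                # State.SHARED (core 1 read it)
s = next_state(s, "bus_write")               # State.INVALID (core 1 wrote it)
```

The Exclusive state is what allows a core holding the only copy of a line to upgrade it to Modified without any bus traffic, which is the main advantage MESI has over the simpler MSI protocol.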
Although SMP designs have been around since mainframes ruled the earth, the migration of Intel and AMD x86 processors into the SMP realm has popularized the concept. IBM and Sun make SMP chips based on their own architectures, and SOC designers can now license SMP multicore processors from major IP vendors. SMP systems currently garner the lion’s share of public discussion about multicore design.
The AMP (asymmetric multiprocessor) camp employs an entirely different architectural approach. The processors in an AMP system are generally heterogeneous. Instead of providing a pool of general-purpose computing resources, each processor in an AMP system is tuned to a target set of applications. AMP processors are therefore treated more like blocks of dedicated hardware than like general-purpose computing machines. Each processor in an AMP system may be hand designed, the traditional approach to processor design, or developed as an ASIP (application-specific instruction-set processor). Processor-core vendors now offer automated tools for tailoring processors to specific tasks, making them more efficient at executing those tasks, with better power and performance characteristics.
AMP systems need not have hardware cache-coherency mechanisms. Although the processors in an AMP system may communicate through a block of coherent memory on a shared bus, there are other, equally good or better, alternatives for interprocessor communication in AMP systems, including true dual- and multi-ported memory buffers and hardware FIFOs on dedicated processor-to-processor links. Many current embedded systems, including mobile phone handsets, personal media players, printers, and video gaming systems, already employ AMP designs.
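A processor-to-processor FIFO like the ones mentioned above can be modeled in software as a bounded single-producer, single-consumer ring buffer. The Python sketch below is a hypothetical model for illustration only; the class name `LinkFifo`, its depth, and the use of two threads as stand-ins for two processors are assumptions, not a real IP block's interface. The key property is that only the producer advances `_tail` and only the consumer advances `_head`, so a one-writer, one-reader link needs no lock and no coherent shared memory.

```python
import threading
import time


class LinkFifo:
    """Hypothetical model of a one-way FIFO link between two processors.

    Exactly one producer calls push() and exactly one consumer calls pop().
    One slot is always left empty so that "full" and "empty" differ.
    """

    def __init__(self, depth):
        self._buf = [None] * (depth + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, word):
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False  # full: a hardware FIFO would stall or flag this
        self._buf[self._tail] = word  # write the data before publishing it
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty
        word = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        return word


# Two threads stand in for two processors sharing one link.
fifo = LinkFifo(depth=4)
received = []


def producer():
    for i in range(100):
        while not fifo.push(i):
            time.sleep(0)  # link full: yield to the consumer


def consumer():
    while len(received) < 100:
        word = fifo.pop()
        if word is None:
            time.sleep(0)  # link empty: yield to the producer
        else:
            received.append(word)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

On real hardware, a weakly ordered core would need a memory barrier between the data write and the index update. In this CPython model the global interpreter lock effectively provides that ordering, which is exactly the kind of assumption that does not carry over to silicon.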
Although SMP and AMP give the world a clean two-way dichotomy for multicore computing, the real world of system design is messier than this model. At least two more computing models exist along the continuum running from fully homogeneous to fully heterogeneous systems. Starting from SMP, consider a system where the processors are all alike (homogeneous) but are not strictly general-purpose. For example, a video-processing application may well be able to use a pool of computing resources quite efficiently if each processor in that pool has been tailored for video processing. Many signal-processing tasks fall into this zone. You might call this sort of architecture “extended SMP,” or ESMP, to denote the specialized nature of each processor.
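In ESMP terms, the point is that any processor in the pool can take any unit of work, because every core carries the same video extensions. The Python sketch below shows that dynamic work distribution, using the standard-library `concurrent.futures` thread pool as a stand-in for a cluster of identical video cores. The core count, tile size, and the `filter_tile` kernel are hypothetical placeholders, not real video code.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 4   # hypothetical number of identical video cores
TILE_ROWS = 16  # hypothetical tile height in pixel rows


def filter_tile(tile):
    """Placeholder for a video kernel that every ESMP core accelerates.

    Here it simply halves each 8-bit pixel value (a crude darkening filter).
    """
    return [[px // 2 for px in row] for row in tile]


def process_frame(frame):
    # Split the frame into horizontal tiles; any core may take any tile.
    tiles = [frame[r:r + TILE_ROWS] for r in range(0, len(frame), TILE_ROWS)]
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        # map() returns results in submission order, so tiles reassemble correctly.
        results = list(pool.map(filter_tile, tiles))
    return [row for tile in results for row in tile]


frame = [[200] * 64 for _ in range(48)]  # 48 rows x 64 columns, all value 200
out = process_frame(frame)
```

Python threads will not run this CPU-bound work in parallel because of the global interpreter lock, so the sketch shows the structure of pooled, interchangeable workers rather than a real speedup.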
Moving slightly closer to the heterogeneous end of the continuum, it’s easy to envision adding audio to the video ESMP example above. Audio typically needs no more than one processor, however, so it is only necessary to add audio-processing extensions to one processor in the ESMP cluster to handle audio efficiently. Now the processing pool is no longer symmetric: one of the processors is different, although only slightly. You might call such an arrangement “selectively asymmetric SMP,” or SASMP. Why is it important to create a separate classification such as SASMP? Because operating systems such as Linux do not currently support this kind of asymmetry and will need to evolve to accommodate such hardware.
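The OS support this calls for is essentially capability-aware scheduling: a task that uses the audio extensions may run only on the one core that has them, while ordinary video tasks can go anywhere. The Python sketch below is a hypothetical illustration of that placement rule; the core names, feature strings, and least-loaded policy are assumptions for this example, not how Linux or any other OS actually schedules.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    features: frozenset  # instruction-set extensions present on this core
    queue: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    needs: frozenset  # extensions the task's code was compiled to use


def place(task, cores):
    """Assign task to the least-loaded core that has every extension it needs."""
    eligible = [c for c in cores if task.needs <= c.features]
    if not eligible:
        raise RuntimeError(f"no core can run {task.name}")
    target = min(eligible, key=lambda c: len(c.queue))  # ties go to the first core
    target.queue.append(task.name)
    return target.name


video = frozenset({"video"})
audio = frozenset({"audio"})
# SASMP cluster: three video cores, and only cpu0 also has audio extensions.
cores = [Core("cpu0", video | audio), Core("cpu1", video), Core("cpu2", video)]

tasks = [Task("aac_decode", audio), Task("tile0", video),
         Task("tile1", video), Task("tile2", video)]
placement = [place(t, cores) for t in tasks]
print(placement)  # ['cpu0', 'cpu1', 'cpu2', 'cpu0']
```

A scheduler that treats every core as interchangeable could migrate `aac_decode` to cpu1, where its audio instructions would not exist; the subset check in `place` stands for the knowledge such an OS would have to acquire.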
Finally, let’s turn to the software issues these multicore systems raise. Single-processor programming practice has created numerous obstacles that block a simple migration to SMP, AMP, ESMP, and SASMP designs. These obstacles include:
• Latent concurrency issues
• Code that must now be made explicitly re-entrant
• Task priorities no longer assure mutual exclusion (mutex)
• Masking one CPU’s interrupts no longer locks access to a resource
• New possibilities for race conditions
• Inter-processor deadlocks
• Single-CPU crashes can hang an entire system
• Silent parallelization issues through parallel APIs
• Bad timing assumptions on task completion
• Weak memory consistency among CPUs
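Two of the obstacles above, interrupt masking that no longer locks a resource and new race conditions, can be shown with a small deterministic model. The Python sketch below is hypothetical: it simulates two CPUs each performing a read-modify-write of a shared counter as separate steps, so it can force the interleaving that real hardware produces only occasionally. Masking interrupts on one CPU keeps that CPU from being preempted, but the other CPU keeps running in parallel; a shared lock, modeled here as an atomic test-and-set flag, is what actually serializes the two.

```python
def increment(cpu, shared, use_lock):
    """One CPU executing `counter += 1`, split into its hardware steps.

    The generator yields after each step so the caller can choose how the two
    CPUs interleave. Masking interrupts on this CPU would not change anything
    here: the other CPU is a separate processor, not an interrupt handler.
    """
    if use_lock:
        # Test-and-set: checking and setting the flag is one atomic step in
        # this model, standing in for a hardware atomic instruction.
        while True:
            if not shared["lock"]:
                shared["lock"] = True
                break
            yield f"{cpu}: spin"
    local = shared["counter"]  # load
    yield f"{cpu}: load {local}"
    shared["counter"] = local + 1  # store
    yield f"{cpu}: store {local + 1}"
    if use_lock:
        shared["lock"] = False  # release


def run(use_lock):
    """Step two CPUs in lockstep, one step each per round, as parallel cores can."""
    shared = {"counter": 0, "lock": False}
    cpus = [increment("cpu0", shared, use_lock), increment("cpu1", shared, use_lock)]
    trace = []
    while cpus:
        for gen in list(cpus):
            try:
                trace.append(next(gen))
            except StopIteration:
                cpus.remove(gen)
    return shared["counter"], trace


print(run(use_lock=False)[0])  # 1: cpu1's load saw 0, so one increment was lost
print(run(use_lock=True)[0])   # 2: the lock serializes the read-modify-write
```

On a real multicore chip the lock would be built on an atomic instruction, and on cores with weak memory consistency, the last item in the list, it would also need the appropriate memory barriers.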
In addition to these unresolved programming issues, there is also the thorny problem of debugging MP systems. And where will we find programmers who understand these issues? All of these multicore problems will be attacked in the coming years, ensuring a complex and interesting future for MP system design.
Grant Martin is a Chief Scientist at Tensilica, Inc. in Santa Clara, California. Before that, Grant worked for Burroughs in Scotland for 6 years; Nortel/BNR in Canada for 10 years; and Cadence Design Systems for 9 years, eventually becoming a Cadence Fellow in their Labs. He received his Bachelor's and Master's degrees in Mathematics (Combinatorics and Optimisation) from the University of Waterloo, Canada, in 1977 and 1978. Grant is a co-author or co-editor of nine books dealing with SoC design, SystemC, UML, modelling, EDA for integrated circuits and system-level design, including the first book on SoC design published in Russian. His most recent book, “ESL Design and Verification”, written with Brian Bailey and Andrew Piziali, was published by Elsevier Morgan Kaufmann in February, 2007.
This blog was co-authored by Steve Leibson, an experienced hardware and software design engineer, engineering manager, and design consultant. He spent 10 years working at electronic systems companies including HP’s Desktop Computer Division, Auto-Trol Technology (graphics workstations), and Cadnetix (EDA workstations) after earning his BSEE cum laude from Case Western Reserve University. At HP, Auto-Trol, and Cadnetix, he specialized in the design of desktop computers and workstations, especially in the areas of system and I/O design. Leibson has just written and published “Designing SOCs with Configured Cores,” a treatise on 21st-century MPSOC design. In 2004, he co-authored “Engineering the Complex SOC” with Tensilica’s president and CEO Chris Rowen, which has also been used as a textbook in university classes. He has also contributed chapters to several other SOC design books since joining Tensilica in 2001. For details on Tensilica’s offerings, see tensilica.com.