Embedded processors for PLDs
Modern PLDs are large enough, fast enough, and contain sufficient memory and other resources to support today's processor cores
BY GORDON POCOCK and BOB GARRETT
Altera, San Jose, CA
http://www.altera.com
Bringing embedded processor cores into the programmable-logic world has been difficult, with different vendors proposing different solutions. Implementing a processor core in a programmable-logic device (PLD) has often been considered inefficient because of the desire to provide an industry-standard core that is 100% compatible with an existing software and tools base. The problem is that such a core, built from general-purpose PLD logic, is usually too slow and uses that logic inefficiently. In the past, PLDs were also too slow or lacked the density to implement high-performance microprocessor cores. Modern PLDs are large enough, fast enough, and contain sufficient memory resources to support these designs, and a wide range of supporting intellectual property (IP) cores provides communications, DSP, and bus-interface functionality. In addition, now that PLDs are manufactured in cutting-edge semiconductor process technology, adding hard processor macros to PLDs has become a reality, especially with readily licensable processor cores such as those available from ARM.
The die of Altera's Excalibur EPXA10 Embedded Processor Device includes dual-port RAM, single-port RAM, customer logic, and the ARM922T RISC processor combined with a complete peripheral set.
Soft processor cores for PLDs
For the most part, existing soft processor cores offered for programmable logic have either been too expensive in license and royalty fees (like ASIC-targeted cores) or been low-performance, mature architectures. Neither makes a good fit for volume production in programmable logic. With the advent of processor cores designed specifically for implementation in current PLDs, a higher performance point is now attainable. For example, Altera's Nios soft-core processor provides a configurable 16- or 32-bit data path with a 16-bit instruction set. With a four-stage pipeline, the core can run at system clock speeds of up to 80 MHz in Altera's newest PLD architectures. In the most recent Nios release, the CPU, memory, and peripherals are interconnected by a bus fabric that allows the processor and peripherals to access memory simultaneously, giving high-bandwidth peripherals direct memory access without routing data through the processor itself. Designers can also add custom extensions to the Nios ALU to accelerate critical parts of their software: up to five “custom instructions” can be added, each implementing a processing task as either a single-cycle (combinatorial) or multicycle (sequential) operation. The SOPC Builder tool provides a graphical user interface for configuring the processor, adding peripherals and custom instructions, and automatically generating a customized software library.
Embedding hard macro processors
A second way to obtain microprocessor functionality is to embed a hard processor core in a PLD. This provides a high-performance core in a minimal die area. It also makes it possible to use industry-standard cores that already have an established software and tools base.
In addition, because the semiconductor manufacturer takes responsibility for licensing the core, access to these high-performance cores becomes global; it is no longer restricted to designers or projects whose volumes justify an ASIC. When it comes to sheer processor performance, a soft core cannot compete with an embedded hard core. Using an industry-standard core from a major IP licensor provides not only high-speed computing but also an existing base of software and high-quality development tools, which can give a significant time-to-market advantage. An 8- or 16-bit hard processor is unlikely to provide enough computing power to sit sensibly alongside a high-performance PLD or to be useful in the applications such a device targets. Moving to 64 bits and beyond makes the core uneconomically large, so a 32-bit core is the ideal compromise. Because the PLD vendor licenses what would otherwise be an ASIC core, designers whose project volumes do not justify an ASIC still gain access to these high-performance cores. The end customer pays no license fee to the core vendor, and there are no associated nonrecurring engineering (NRE) charges or minimum order quantities (MOQs): the embedded-processor PLD becomes a standard device, usable in a wide range of applications in any market segment.
Memory
Once the processor core is chosen, other decisions remain that affect device performance as much as the choice of core. One is the memory arrangement. It would be possible to place a processor core on the PLD and use external memory for all memory spaces. This would limit performance, however, because external memory accesses are usually slower than internal ones.
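A rough model shows why the memory arrangement matters as much as the core: if some fraction of accesses is served by fast on-chip memory and the rest goes off-chip, the average cost per access is the weighted mean of the two latencies. The Python sketch below assumes purely hypothetical latencies of 1 cycle on-chip and 10 cycles off-chip; these are not EPXA10 or Nios figures, and the function name `effective_access_cycles` is illustrative only.

```python
# A minimal sketch: average memory access time when a fraction of accesses
# hits fast on-chip memory and the rest goes to external memory.
# All latency figures are HYPOTHETICAL, chosen only for illustration.

def effective_access_cycles(on_chip_fraction: float,
                            on_chip_cycles: float,
                            off_chip_cycles: float) -> float:
    """Weighted average latency, in processor clock cycles, per memory access."""
    if not 0.0 <= on_chip_fraction <= 1.0:
        raise ValueError("on_chip_fraction must be between 0 and 1")
    return (on_chip_fraction * on_chip_cycles
            + (1.0 - on_chip_fraction) * off_chip_cycles)


if __name__ == "__main__":
    ON_CHIP = 1    # hypothetical: tightly coupled SRAM, single-cycle access
    OFF_CHIP = 10  # hypothetical: external memory including bus/controller overhead
    for frac in (0.0, 0.5, 0.9, 1.0):
        cycles = effective_access_cycles(frac, ON_CHIP, OFF_CHIP)
        print(f"{frac:4.0%} on-chip -> {cycles:.1f} cycles/access")
```

With these assumed numbers, even when 90% of accesses stay on-chip, the remaining 10% contributes more than half of the 1.9-cycle average, which is why tightly coupled memory for the most frequently used code and data pays off.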
Memory embedded in the PLD architecture could also be used for this purpose, but it is usually of limited size and, because of its block sizes and locations, not optimized for processor use. Performance can also be limited by routing constraints within the device. The most satisfactory arrangement is to embed dedicated memory blocks, tightly coupled to the processor, alongside the core. It also makes sense to embed some peripherals along with the memory to form a complete microprocessor subsystem on the PLD, connected to the programmable logic through shared memory and dedicated interfaces.
Bus architecture
Another choice is the bus architecture. This is often dictated by the processor core, but some industry-standard bus interfaces also exist. In either case, the bus architecture must allow the processor core to run at full performance and avoid bottlenecks, which are a major factor in memory accesses external to the core. The first example of a PLD with an embedded hard-core processor is Altera's EPXA10, which uses an ARM922T processor with its associated caches and MMUs. This 32-bit RISC core is matched with 256 Kbytes of single-port memory; 128 Kbytes of dual-port memory, which also serves as an interface between the PLD and the embedded processor subsystem (the "stripe"); an SDRAM controller; and an expansion bus interface. The PLD can be used to implement DSP functions, interfaces, or custom peripherals. With the large amount of Altera and third-party IP available, a complete system-on-a-programmable-chip (SoPC) solution can be created with the shortest possible time to market. And with the flexibility of a programmable solution, a late change in standards can easily be accommodated, something an ASIC cannot do once it has been fabricated.
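The EPXA10's dual-port memory serves as an interface between the PLD and the processor stripe. One common way to use such shared memory is as a single-producer/single-consumer ring buffer: custom logic writes on one port, the processor reads on the other, and each side updates only its own index. The Python model below is a sketch of that technique under assumed conditions; the class name, layout, and depth are hypothetical and not an Altera-defined interface.

```python
# A minimal sketch of a single-producer/single-consumer ring buffer, the kind
# of mailbox that could be laid out in dual-port RAM shared by custom PLD logic
# (producer, one port) and the processor (consumer, the other port).
# The layout and names are HYPOTHETICAL. In real hardware, head and tail would
# be words at fixed addresses, and write ordering would need care.
from typing import Optional


class DualPortMailbox:
    """Ring buffer that keeps one slot empty so 'full' and 'empty' differ."""

    def __init__(self, depth: int) -> None:
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.ram = [0] * depth  # stands in for the shared dual-port memory words
        self.head = 0           # next slot to write; updated only by the producer
        self.tail = 0           # next slot to read; updated only by the consumer

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        return (self.head + 1) % len(self.ram) == self.tail

    # Producer side (e.g. PLD logic writing through one port)
    def put(self, word: int) -> bool:
        if self.is_full():
            return False                             # caller retries later
        self.ram[self.head] = word                   # write the data first...
        self.head = (self.head + 1) % len(self.ram)  # ...then publish it
        return True

    # Consumer side (e.g. the processor reading through the other port)
    def get(self) -> Optional[int]:
        if self.is_empty():
            return None
        word = self.ram[self.tail]
        self.tail = (self.tail + 1) % len(self.ram)
        return word


if __name__ == "__main__":
    box = DualPortMailbox(depth=4)
    for sample in (10, 20, 30, 40):
        print("put", sample, box.put(sample))  # depth 4 holds only 3 entries
    while (w := box.get()) is not None:
        print("got", w)
```

Because each index has exactly one writer, neither side needs a lock; the producer only has to make sure the data word is written before the head index that publishes it.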
In today's fast-moving markets, a three-month delay can reduce a product's market potential by 30%, which can mean the difference between a profitable product and an unprofitable one. One opportunity this type of device creates is the ability to partition and repartition design functions between PLD logic and processor software according to the required performance. This is particularly useful when specifications or design goals change in mid-design cycle, a point at which changing the architecture, and especially retesting it, is otherwise difficult. The traditional way to integrate processor cores with custom logic in a single device has been an ASIC, a route available only to projects going into large-scale production or to very expensive systems. Implementing a processor core in a PLD, either as soft IP or as a hard-wired element, allows complex systems on a chip to combine the flexibility of a reprogrammable solution with the integration and performance of an ASIC. With the device vendor taking responsibility for royalties, NREs, and MOQs, even low-volume users now have access to high-performance cores that were previously available only to ASIC users.
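The partitioning and repartitioning between PLD logic and processor software described above can be estimated with Amdahl's law: moving a function that takes a fraction p of run time into logic that runs it s times faster removes p - p/s of the total. The Python sketch below pairs that estimate with one simple heuristic, a greedy choice by run time saved per logic element within a logic budget. This heuristic is an illustrative assumption, not a method described by Altera, and every name and number in it is hypothetical.

```python
# A minimal sketch of greedy hardware/software partitioning, scored with
# Amdahl's law. HYPOTHETICAL model: it assumes the accelerated functions are
# independent and ignores processor-to-logic communication overhead.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    time_fraction: float  # share of original software run time in this function
    hw_speedup: float     # expected speedup of this function in PLD logic (> 0)
    logic_cost: int       # logic elements it would consume (> 0)


def time_saved(c: Candidate) -> float:
    """Fraction of total run time removed by moving c into hardware."""
    return c.time_fraction - c.time_fraction / c.hw_speedup


def overall_speedup(chosen: List[Candidate]) -> float:
    """Amdahl's law generalized to several independently accelerated parts."""
    remaining = 1.0 - sum(time_saved(c) for c in chosen)
    return 1.0 / remaining


def partition(cands: List[Candidate], le_budget: int) -> Tuple[List[Candidate], float]:
    """Greedily move the functions with the best time saved per LE into hardware."""
    chosen: List[Candidate] = []
    used = 0
    for c in sorted(cands, key=lambda c: time_saved(c) / c.logic_cost, reverse=True):
        if used + c.logic_cost <= le_budget:  # skip anything that no longer fits
            chosen.append(c)
            used += c.logic_cost
    return chosen, overall_speedup(chosen)


if __name__ == "__main__":
    funcs = [
        Candidate("fir_filter", 0.40, 20.0, 3000),
        Candidate("crc32", 0.15, 10.0, 500),
        Candidate("packet_parse", 0.20, 4.0, 4000),
    ]
    hw, s = partition(funcs, le_budget=4000)
    print([c.name for c in hw], f"{s:.2f}x")
```

Rerunning `partition` with a new budget or revised estimates is one way to revisit the split when goals change mid-design. Greedy selection by value per cost is quick but not guaranteed optimal; with only a handful of candidates, trying every subset is also practical.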