High-density PLDs defy simplistic categories

Selection, especially of one family for a series of different applications, entails compromises

BY BRYON MOYER, Advanced Micro Devices Inc., Sunnyvale, CA

Creative chaos is energizing the high-density IC market. The need for
higher integration has outstripped the capabilities of the simple
single-PLD architecture, typified by PAL devices. The result is a barrage
of new architectures and the inevitable claims of the absolute superiority
of each architecture. Unfortunately, determining which device truly is
best is much more difficult at today's density levels than with
low-density PLDs. (See box, “Real-world decision criteria.”) Lacking hard
data, designers often select devices based on less tangible emotional
issues.

Architectural differences
When a designer chooses a simple PLD architecture, the criteria are:
1. The right speed.
2. The right power.
3. A logic structure that can handle the design.
4. A good choice of development tools, and price.
5. The standard service issues that go into any purchasing decision.

High density makes the selection more complicated. Speed is still an issue, but now it is harder to tell how fast a device will really be for an application; predictability becomes an issue. PAL devices are 100% interconnected. At high densities, nothing is 100% interconnected, so routability becomes an issue. PAL devices were sum-of-products based, as are some of the higher-density PLDs, but many high-density devices have a different cell structure that handles small functions more efficiently while limiting wide gating; so cell functionality is an issue. In addition, the size of the logic block has significant influence on speed, logic implementation, and routing, which means that granularity is now an issue.

PLDs surged in popularity because of the availability of design software. For higher levels of integration, the kind of software used for PLDs is inadequate. Development tools have become probably the biggest issue of all.

FPGAs vs. CPLDs
Because of the variety of architectures in the high-density arena, PLD makers have tried to classify some of the devices to make life a bit easier, even if it does oversimplify the picture. The two basic categories are generally field-programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs).

FPGAs look like gate arrays whose interconnect and logic schemes are programmable. The basic unit of logic is a small cell that varies among devices. Many of these cells are interconnected by a channeled routing scheme. Xilinx's LCA devices and Actel's ACT families are examples of this kind of structure (see Fig. 1).

CPLDs look like the integration of several PLDs. The basic unit of logic is the product term. Such a device has several large blocks, each of which looks like a fully interconnected PLD. These blocks are interconnected by a partially populated global switch matrix. AMD's MACH devices and Altera's MAX devices are examples of this arrangement (see Fig. 2).

In general, CPLDs can achieve higher system speeds for typical designs. On the other hand, FPGAs tend to provide higher integration. In addition, FPGAs usually have a low standby current requirement, with consumption increasing roughly linearly with frequency. CPLDs generally require more current; this is certainly so if the highest speed is needed.
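To put the power comparison in concrete terms, here is a minimal sketch of the two current profiles. The standby, static, and per-MHz coefficients are illustrative assumptions only, not values from any data sheet.

```python
# Toy supply-current profiles vs. operating frequency for the two device classes.
# All coefficients are illustrative assumptions, not data-sheet values.

def fpga_icc_ma(freq_mhz, standby_ma=5.0, ma_per_mhz=1.5):
    """FPGA-style profile: low standby current, roughly linear growth with frequency."""
    return standby_ma + ma_per_mhz * freq_mhz

def cpld_icc_ma(freq_mhz, static_ma=120.0, ma_per_mhz=0.5):
    """CPLD-style profile: higher static current, weaker frequency dependence."""
    return static_ma + ma_per_mhz * freq_mhz

for f in (0, 10, 25, 50, 100):
    print(f"{f:3d} MHz: FPGA ~{fpga_icc_ma(f):6.1f} mA, CPLD ~{cpld_icc_ma(f):6.1f} mA")
```

With these made-up numbers the FPGA draws less current at low frequencies and the CPLD catches up only at high clock rates, which is the general shape of the tradeoff described above.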

Predicting application speed
With a PLD, you know the speed of the device from the data sheet. This speed holds no matter how the logic is implemented. At higher densities, there are two possible ways that timing can become less predictable: logic and routing. Most CPLDs have logic timing variations, and some have timing variations due to routing. Most FPGAs have significant timing variations due to routing.

Some vendors claim that their speed is predictable, arguing that once you implement the design, you can predict the speed. The problem is that most of the development time is spent trying to implement the design. With devices of this size, it can sometimes take days or even weeks to get a design implemented. The real issue is whether one can predict the speed of an application before implementing it. It is heartbreaking to do all that work only to find that the device cannot provide the performance you need.

An example of logic-dependent timing is a device whose speed depends on how many product terms are used. Several schemes can be used to route product terms to macrocells, and most (but not all) of them cause additional delay once more than some base number (typically three to five) is used. These additional delays can range from a few nanoseconds to tens of nanoseconds (see Fig. 3).

Routing variations can be caused by the length and kind of routing resource needed and by loading. For example, in an FPGA, a significant timing savings is realized by bringing logic on board, processing it in a nearby cell, and sending it off-chip on another nearby pin. If, instead, the signal is sent to the other side of the die, it will be much slower. In addition, if a signal has high fanout, the speed may also suffer (see Fig. 4).

Different devices have different levels of speed predictability. Only the MACH devices have timing that is completely insensitive to logic and routing variations. All else being equal, most designers highly prefer predictability, but some are willing to trade it off depending on other needs.
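As a rough illustration of logic-dependent timing, here is a minimal sketch of the kind of pre-implementation estimate a designer might make. The base tPD, the product-term cluster size, and the expansion penalty are assumed values chosen purely for illustration, not figures for any particular device or scheme.

```python
# Rough pre-fit delay estimate for a product-term-dependent architecture.
# All timing numbers below are illustrative assumptions, not data-sheet values.
import math

def estimate_tpd_ns(product_terms, base_tpd_ns=15.0, base_cluster=4, expander_ns=12.0):
    """Estimated pin-to-pin delay: a function that fits within the base
    product-term cluster sees only the base delay; wider functions pay an
    expansion penalty for each additional cluster they must borrow."""
    extra_clusters = max(0, math.ceil((product_terms - base_cluster) / base_cluster))
    return base_tpd_ns + extra_clusters * expander_ns

for pt in (2, 4, 8, 16):
    print(f"{pt:2d} product terms -> ~{estimate_tpd_ns(pt):.0f} ns")
```

Even this crude estimate makes the point of the section: the same device can look very different in the data sheet and in an application, depending on how wide the logic is.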

Routability
The nature of the routing resources is one of the key elements that distinguishes FPGAs from CPLDs. The channeled nets of an FPGA require a different design approach from the switch matrices used in CPLDs. Because of the timing dependence on routing in an FPGA, much attention must be paid to the way critical signals are routed. Tools are usually provided to allow manual tweaking of signals that need to be fast. The number of wires per channel and the number of interconnect points on each wire are the prime determinants of the routability of an FPGA. The more wires, and the more ways that wires can be steered around, the better the routability. Note that extra routing flexibility often translates into slower performance, since it is hard to add routing resources without adding delay.

In a CPLD, several items affect routability. The primary consideration is the switch matrix, which determines what connections are possible in getting from one block to another. The richer the switch matrix, the more likely it is that a particular connection will be routable. A key variable is the number of ways that a signal can get from one place to another: the more ways there are to route, the greater the chances of success when some of those ways are blocked by other signals. The tradeoff is that a large, highly populated switch matrix is generally slower than a small, less routable one, which is why there are no 100% fully populated switch matrices (see Fig. 5).
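As a back-of-the-envelope illustration of why alternative paths help, the following toy model treats each candidate path through the switch matrix as independently blocked with some probability and computes the chance that at least one path remains free. The independence assumption and the numbers are purely illustrative; real congestion is far less well behaved.

```python
# Toy routability model: probability that at least one of n alternative paths
# is free, if each is independently blocked with probability p_blocked.
# Purely illustrative; real routing congestion is not independent.

def route_success(n_paths, p_blocked):
    return 1.0 - p_blocked ** n_paths

for n in (1, 2, 4, 8):
    print(f"{n} path(s), 50% blocked each: {route_success(n, 0.5):.0%} chance of routing")
```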
Besides the switch matrix, the logic allocation and macrocell-to-I/O-pin allocation also affect routability in a CPLD. Most CPLDs have some kind of scheme for allocating the number of product terms to be used in a function. Because these schemes often borrow product terms from elsewhere inside the block, the placement of logic can be critical. If two functions are placed next to each other, and each needs to borrow product terms from the other, then the design won't fit. Such a situation can be remedied by relocating the functions so that each can borrow the terms it needs. In many devices, this results in the I/O pins moving to follow the logic.

The possibility of I/O pins moving exists in CPLDs and FPGAs alike. It tends to be more acute in CPLDs, since many designers treat them as essentially large PLDs and do the logic design while the board is being fabricated. Output switch matrices can alleviate this problem to a large extent. An output switch matrix decouples the I/O pin from the macrocell, so the macrocell can move while the function keeps its old I/O pin. Note, however, that not all combinations are possible, so the possibility of a pinout change hasn't been completely removed, only greatly lessened. In some devices, using the output switch matrix affects speed; in others it doesn't.

If asked, most designers would obviously opt for more routability. In practice, however, the amount of routability needed is simply "enough" routability: if an application, along with its variations and fixes, fits, then the routing is adequate. Routability is difficult to quantify in any case, and even the most routable device will meet an application that won't fit. A high routability number won't soothe the designer whose design won't fit.

Logic block particulars
The nature of the logic block has a large impact on the kind of logic that can be efficiently built. The smaller the basic logic element, the more varied the logic that can be implemented; the tradeoff is, of course, signal delay and delay predictability. A device made up of large cells, each of which can implement a lot of logic, is said to have coarse granularity. A device consisting of many small elements that can be cascaded together to implement more logic is said to have fine granularity. The granularity available ranges from near transistor level in the FPGAs available from Crosspoint to 22V10-size blocks in MACH devices.

Finer granularity offers much more flexibility, since fundamental functions are primitive and can be built up in many ways. This comes at the expense of predictable performance. Looking at a basic Logic Cell from Xilinx, for example, a four-input logic function can be implemented straightforwardly, with high speed (not considering routing). But decoding a 12-bit address requires that three independent cells handle four inputs each, to be combined by one more cell. This arrangement requires four times as many cells for three times as many inputs. It also doubles the propagation delay (again disregarding routing delays). In its 4000 series, Xilinx incorporated wide decoding as a hardware feature to address this problem. So the type of logic being implemented greatly affects performance (see Fig. 6).

Coarser granularity provides the ability to implement higher-level logic functions more quickly and predictably. In a MACH435, for instance, as long as a function requires no more than 32 inputs and 20 product terms, it can be implemented in a single predictable pass through the device. The tradeoff is that if only four inputs are used on a given product term, the capability for the other 28 inputs is not freed up for other logic; it is lost. Because the available logic in a coarse-granularity device is more highly structured, it is inherently less flexible (see Fig. 7).

The ideal architecture would combine the speed predictability and wide gating of a coarse-grained architecture with the flexibility of a fine-grained architecture. Since no such architecture exists today, designers must choose an architecture to suit each application.
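As a minimal sketch of the cell-count and logic-level arithmetic behind the 12-bit decode example above, the snippet below decomposes a wide AND into a tree of k-input cells. The four-input cell size matches the basic Logic Cell described in the text; everything else is an illustrative assumption rather than a model of any specific device.

```python
# Decompose an n-input decode (wide AND) into a tree of k-input cells.
# Illustrative arithmetic only; routing delay is ignored, as in the text.
import math

def wide_and_cost(n_inputs, k=4):
    """Return (cells, logic levels) needed to AND n_inputs using k-input cells."""
    cells, levels, signals = 0, 0, n_inputs
    while signals > 1:
        stage = math.ceil(signals / k)   # cells needed at this level
        cells += stage
        levels += 1
        signals = stage                  # their outputs feed the next level
    return cells, levels

cells, levels = wide_and_cost(12, k=4)   # the 12-bit address decode from the text
print(f"12-bit decode in 4-input cells: {cells} cells, {levels} logic levels")
# -> 4 cells, 2 levels: three first-level cells plus one combining cell,
#    i.e. roughly twice the single-cell delay before routing is considered.
```

The same call with a coarse-grained block wide enough to absorb all 12 inputs would return one "cell" and one level, which is the single predictable pass described for the MACH435.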

Tools rule
In reality, a design will have a mix of needs that crosses back and
forth across the FPGA/CPLD boundary. A design may need to provide
lightning-fast memory decode while running a 30-flip-flop state machine.
No simple platitudes make the device decision easy. In addition, the
ability to make good, efficient use of any device depends entirely on the effectiveness of the toolset. Therefore, design tools have become a critical part of the decision process. In many cases they are the decision process.
In theory, a designer can pick a different device from a different
vendor for each socket to optimize perfectly. But it's pretty unlikely for
a company to select a different design tool for each device. Design tools
at this density level are expensive, and they can have a long learning
curve. In many cases, a company looks at several high-density solutions, finds a family that works adequately (not necessarily optimally) across a wide variety of typical applications, evaluates the toolsets, and then standardizes on a single system for the whole company.
When a company standardizes on an architecture based on a toolset, it
locks other vendors out and locks the user in. The lock can be opened for both by getting support from third-party universal tool vendors. If a
company can standardize on a universal tool instead of a vendor-specific
tool, the device selection process is decoupled from the software
selection process. If a vendor arranges for a device to be supported on
universal tools, there are new opportunities for designers to use the
device. A user is more likely to use a device that is already supported by
a tool he or she already owns and knows. For universal support to work,
the silicon vendors and third-party tool vendors must work very closely
together to ensure that the tool vendor's software expertise is
supplemented with the chip vendor's IC expertise. If a designer feels that
a universal tool is not using the PLD effectively, then the integrity of
the universal tool strategy is compromised. Once universal tools support
a device, a whole range of design options opens up. As an example, many
users perceive CPLDs as only being supported at the Boolean equation level
through such tools as the PALASM, Abel, CUPL and PLDesigner languages.
However, because MINC, for example, has its toolsets integrated into many workstation environments, higher levels of design are possible, since those designs eventually end up in a MINC language. Even Synopsys software can be used to design a MACH device because of the links that have been provided between various universal packages.

BOX:

Real-world decision criteria

After considering the facts about FPGAs and CPLDs, the designer needs to
ask several questions before deciding on a high-density programmable-logic
device for an application.

* Do I know the part? Am I already familiar and comfortable with the architecture? What will the learning curve be? What do I perceive the risk to be?

* Have I had a good/bad experience with the part? Have I had any bad timing surprises in the past? Have I generally found it easy to work with? Has the data sheet accurately represented the device? Did the data sheet's 16-bit-counter power specification (a pattern that draws about as much power as a single combinatorial output switching at the clock frequency) come anywhere close to my application's actual consumption? Did I get good support from the vendor's or distributor's field applications engineers when I needed it?

* Do I have the tools? How quickly can I be up and running on my design? How much will I have to spend just to be able to evaluate some silicon? If I have everything I need to work with a device that will do the job adequately, I'll likely stay with that.

* Will my application fit and be fast enough? How much work will it take me to evaluate this?

* Do I have a particular "religion"? Do I defend my favorite architecture with the same fervor that Macintosh and DOS aficionados use to harangue each other?

Because no architecture is perfect, there are potential challenges with
any design on any part. This means that the experiences and emotional history that a designer has with a particular family of devices weigh much more heavily with high-density PLDs than they might in some other area
where the criteria can be more clearly delineated.

CAPTIONS:

Fig. 1. FPGAs generally have an array of relatively small logic cells,
with two-dimensional routing between them.

Fig. 2. CPLDs have a small number of relatively complex sum-of-products
blocks, joined by a switch matrix.

Fig. 3. The expansion term adds delay in (a) but the delay in the wide
decoding (b) is independent of the number of product terms actually used.

Fig. 4. Routing has significant effect on inter-cell delay. Both the
number of crossings and the fanout add delay.

Fig. 5. Adding possible paths necessitates programming the choice, hence
another delay.

Fig. 6. Wide decoders in elementary FPGA structures must go through several logic levels.

Fig. 7. A coarse-grained device may be able to handle any number of inputs up to a limit, in this case 12. Using fewer doesn't change the speed, but neither does it free the unused inputs for use elsewhere.
