
Revised PREP PLD benchmarks have new verification method

Benchmarking consortium will now use outside consultants, rather than the buddy system, to check vendors' figures

Mixed reports, some certified under the old rules and some under the new, show the value of variety and the impossibility of a one-architecture-fits-all approach to programmable logic.

Along with revising its benchmark series to version 1.3, the Programmable Electronics Performance Corp. (PREP), the programmable-logic benchmarking consortium, has adopted a new certification system for its benchmarks, which measure the speed and capacity of programmable logic devices. Instead of pairing competitors who verified each other's results, the consortium now hires outside consultants to implement the tests and certify results.
Altera Corp. (San Jose, CA) and QuickLogic Corp. (Santa Clara, CA) have completed certification for several devices under the new scheme. The last round of certifications under the version 1.2 tests, completed in December, covers most devices now on the market.
Two devices appear in both the 1.2 and 1.3 test series: the QL21X16-2 from QuickLogic and the EPF81188-3 from Altera. Unfortunately for the purpose of comparing benchmark versions, both companies also used updated versions of their proprietary design tools in the new tests, SpDE Rev. 3.1 and MAX+PLUS Rev. 3.3, respectively.
On the 16-bit counter, benchmark #7, QuickLogic increased the external frequency from 32 to 45 MHz, fitting eight instances of the benchmark (see box). Altera lost ground slightly on the same benchmark, dropping from 40 MHz to 38 or 39. Altera's much larger device packed from 50 to 55 instances. QuickLogic got a smaller but still startling improvement on #8, the prescaled counter, rising from 47 to 57 MHz. By specifying a critical path, Altera was able to raise internal frequencies from the 40s to the 70s, but the external frequency stayed at 39.
On both benchmarks #7 and #8, Altera's EPM7064-7 and 7032-7 achieved external speeds of up to 95 MHz, the fastest reported, holding four and two instances, respectively. These small devices reached the same 95 MHz whenever the logic required only one pass. On the large state machine they dropped to 58 MHz, packing two instances and one, respectively. The arithmetic circuit filled 76% of the EPM7064 with a single implementation at 33 MHz, but two implementations could be squeezed in at 22 MHz. Benchmark #6, the 16-bit accumulator, ran at 39 MHz, fitting one and two instances.
The 1.2 test data show significant architectural dependencies that may not change with the new system. For example, on the arithmetic circuit, the Xilinx 3190-3 static-RAM-based LCA was outstanding, with 12 instances running at 23 MHz. It was beaten only by much smaller devices from Altera and Lattice. The Lattice pLSI1048-80 managed four instances at 17 MHz, one third as many as the Xilinx and only three quarters as fast. On the large state machine, the same Lattice and Xilinx parts both hold nine instances, and the Lattice part runs faster, 33 MHz to 23 MHz for the Xilinx. The same Xilinx part achieved 80 MHz externally running 18 instances of #1, the data path, a speed equaled only by the Intel FX780-10, which packed four instances. This is an example of just how architecture-dependent a single benchmark can be.
It behooves designers making use of these results to consider the nature of their proposed designs, since both speed and packing efficiency vary widely within the same part according to use.
Advanced Micro Devices, Inc. (Sunnyvale, CA) reported results for the same parts in the MACH family with different design tools. On benchmark #9, the memory map, PALASM 4 was unable to route at all on the MACH 230-15. MINC PLDesigner XL, on the other hand, routed eight instances that ran a respectable 50 MHz. On the smaller MACH 210A-10 the two tools got identical results, packing four instances and running 80 MHz.
Verified results for the current crop of product announcements should be coming. (Cypress Semiconductor used PREP benchmarks to clock its family of flash-erasable sum-of-products PLDs; see page 83.) If these examples say anything, it is this: beware the hype, and beware averaged results. The data jump around enough from one benchmark to another that averages are all but meaningless.
–Rodney Myrvaagnes

BOX:

PREP Benchmarks

1. Data path
2. Timer-counter
3. State machine
4. Large state machine
5. Arithmetic circuit
6. 16-bit accumulator
7. 16-bit counter
8. Prescaled counter
9. Memory map

The benchmarks are programmed separately. As many instances as possible of the same benchmark are packed into the device under test. Then the speed of each instance is tested. Best, worst, and mean internal speeds are reported along with an external clock rate. The external rate is generally much slower, except for some single-pass logic. All tests must be implemented with automatic place-and-route and so reported. Vendors may also report hand-tweaked variations.
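
As a rough illustration of how such a report reduces per-instance measurements to the published figures, here is a minimal Python sketch. The function name and the sample numbers are invented for illustration, not taken from PREP data.

    # Hypothetical sketch of reducing per-instance PREP measurements to
    # the reported figures. Sample numbers are invented, not PREP data.
    def summarize(internal_mhz, external_mhz):
        """Summarize one benchmark's results for one device."""
        return {
            "instances": len(internal_mhz),
            "best_internal_mhz": max(internal_mhz),
            "worst_internal_mhz": min(internal_mhz),
            "mean_internal_mhz": sum(internal_mhz) / len(internal_mhz),
            "external_mhz": external_mhz,
        }

    # A device packing four instances of one benchmark:
    print(summarize([72.0, 68.5, 70.1, 66.3], external_mhz=39.0))

Note that the external clock rate is measured separately rather than derived from the internal figures, which is why the two can diverge so widely on multi-pass logic.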
The tests as a whole are supposed to be architecture-neutral, and were agreed to by all consortium members. However, individual benchmarks cannot be architecture-neutral, since they place widely differing demands on logic and routing resources. For this reason, a mean external or internal clock rate for the whole suite can be misleading when a part does exceptionally well or poorly on individual tests.
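
A made-up example shows the pitfall. The two parts and all the figures below are hypothetical; they are chosen only to show how identical suite averages can hide opposite strengths.

    # Two invented parts with the same suite average but opposite strengths.
    part_a = {"data path": 80, "arithmetic": 20}  # lopsided
    part_b = {"data path": 50, "arithmetic": 50}  # uniform

    for name, mhz in (("part A", part_a), ("part B", part_b)):
        mean = sum(mhz.values()) / len(mhz)
        print(f"{name}: mean {mean} MHz, per-test {mhz}")
    # Both average 50 MHz, but only the per-benchmark figures show that
    # part A is a poor fit for an arithmetic-heavy design.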
