By Gina Roos, editor-in-chief
High-bandwidth compute applications and diverse workloads are driving the adoption of FPGAs in accelerator cards and changing how these devices are consumed. Achronix Semiconductor Corp. and BittWare, a Molex Company, released a new class of FPGA accelerator card for cloud and edge computing that targets a variety of workloads. For customers looking for faster, lower-risk deployment, the companies also offer the card as a pre-integrated and fully tested FPGA server platform.
Targeting high-performance and high-bandwidth compute and data acceleration applications, the VectorPath S7t-VG6 accelerator card features Achronix’s 7-nm Speedster7t AC7t1500 FPGA, designed with the industry’s highest-performance interfaces available on a PCI Express (PCIe) FPGA accelerator card.
The Speedster7t FPGA family, announced earlier this year, features a 2D network-on-chip (NoC) with greater than 20 Tbits/s of bandwidth capacity for moving data from the high-speed interfaces to and across the FPGA fabric. The 2D NoC alleviates data bottlenecks with 256-bit, unidirectional buses in each direction for a total of 512 Gbits/s for each NoC row and column. The primary interface for the NoC is industry-standard AXI channels.
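The NoC figures can be cross-checked with simple arithmetic. The sketch below is not taken from Achronix documentation. It reads the 512 Gbits/s figure as the rate of one 256-bit bus and assumes one full-width transfer per clock cycle, then derives the clock rate those two numbers imply. The function name is illustrative.

```python
# Figures quoted in the article.
BUS_WIDTH_BITS = 256        # width of one unidirectional NoC bus
BUS_BANDWIDTH_BPS = 512e9   # 512 Gbits/s, read here as the rate of one bus


def implied_clock_hz(bandwidth_bps: float, width_bits: int) -> float:
    """Clock rate needed to sustain bandwidth_bps on a bus width_bits wide.

    Assumes exactly one full-width transfer per clock cycle (an assumption,
    not a statement from the article).
    """
    return bandwidth_bps / width_bits


noc_clock_hz = implied_clock_hz(BUS_BANDWIDTH_BPS, BUS_WIDTH_BITS)
print(f"Implied NoC clock: {noc_clock_hz / 1e9:.1f} GHz")
```

Under those assumptions, the quoted width and bandwidth are consistent with a NoC clock in the gigahertz range.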
The FPGAs also feature machine learning (ML) processors, which are optimized for compute-intensive artificial intelligence/machine learning (AI/ML) workloads. On the ResNet-50 benchmark, the quoted throughput is 8,600 images per second. Achronix claims the FPGA offers three to four times the compute efficiency of GPUs.
Achronix said the 2D NoC eliminates the challenges faced by IP companies by enabling the IP to be simply connected to the AXI interface, and by handling the connectivity for high-speed interfaces and memories.
The Achronix and BittWare VectorPath PCIe accelerator card is powered by the Speedster7t FPGA with a 2D NoC and ML processors. (Image: Achronix)
For compute applications, the idea of data growing exponentially has been with us for decades, but what is changing as it continues to grow is the workload, said Steve Mensor, vice president of sales and marketing at Achronix. Examples span compute, networking, storage, and sensor processing, and include voice recognition, image recognition, and AI/ML workloads, which are very compute-intensive. There is now also a variety of different workloads for networking and storage, he added.
Depending on the workload, FPGAs can run 10x to 100x faster than traditional CPUs, and at lower power, according to the companies.
“The change in the market is not just the growth in data, but the workloads themselves,” continued Mensor. “The biggest challenges faced by hyperscalers and cloud providers are energy and cooling, and ultimately, cost. If they figure out ways to be more efficient for any given function, then that’s going to address both of those challenges.”
Some hyperscale customers, such as Microsoft and Amazon, have been very successful with FPGA technology, and over the last five-plus years Nvidia has been very successful with GP-GPUs and the CUDA programming environment, said Craig Petrie, vice president of marketing at BittWare.
These high-profile successes from hyperscale customers have changed the industry, said Petrie. “What we saw from Microsoft was an acceleration of the Bing search engine using Altera FPGAs, and the use of Intel’s Stratix 10 FPGAs, acquired from Altera, for a persistent neural network AI application, Project Brainwave. More recently Amazon deployed Xilinx FPGAs in the AWS cloud platform, and in the last year we’ve seen server vendors such as Dell, HPE, and Fujitsu offer FPGA PCI cards as acceleration options for popular server platforms.”
These have resulted in some big changes in the adoption curve for FPGA technology in acceleration as well as the way FPGAs are consumed across different workloads.
Hyperscale companies like Microsoft and Amazon have the resources and the talent in-house to do chip-down designs, and they can invest significant effort and energy to develop acceleration cards from integration through to qualification, manufacturing, and testing in high volume, explained Petrie.
Other customers, such as tier-two hyperscalers or general enterprise customers, don’t have the resources to justify chip-down design and need an off-the-shelf option, he added.
Petrie also noted these types of customers are looking to purchase FPGAs at the card level similar to the way they purchase GP-GPUs from Nvidia and AMD, and in some cases, they want to buy FPGAs at the server level, hence the change from Dell, HPE, and Fujitsu, which now offer FPGA PCI cards as acceleration options.
The joint product addresses these market changes as well as key business pressures such as energy efficiency, the ability to reprogram platforms for new equipment whether that’s on the edge or part of their data-center cloud infrastructure, and time-to-market challenges.
FPGAs for select workloads provide ASIC levels of performance and value for money, said Petrie, and for customers who want to repurpose end platforms for different use cases the reprogrammability of the FPGA is extremely important. “It’s certainly quicker to reprogram an FPGA than develop a new ASIC, for example.”
What’s in the accelerator card
Designed for prototyping as well as high-volume production applications, the VectorPath S7t-VG6 accelerator card allows designers to process massive amounts of data not possible with previous generations of FPGAs, said the companies. One of the things the partnership has tried to deliver on is application flexibility by adding several features that are typically only provided in development kits, or evaluation cards, not for high-volume deployment applications.
The VectorPath accelerator card includes 1×400 GbE and 2×100 GbE ports, and eight banks of GDDR6 memory with aggregate bandwidth of 4 Tbits/s, which makes it suited for high-bandwidth data acceleration applications. The full-height, ¾ length card (the same size as GPUs in mass deployment in hyperscale applications) also offers three cooling options: passive, active, and liquid.
Key features include:
- 400 GbE QSFP-DD and 100 GbE QSFP56 interfaces
- Eight banks of GDDR6 memory delivering 4 Tbits/s aggregate bandwidth
- One bank of DDR4 running at 2666 MHz with ECC
- PCIe compliance and certification
- 20 Tbits/s two-dimensional network-on-chip (NoC) inside the Speedster7t FPGA
- 692K 6-input LUTs
- 40K Int8 MACs that deliver >80 TOps
- OCuLink – four-lane PCIe Gen 4 connector for connecting expansion cards
The VectorPath comes loaded with high-speed data and memory interfaces as well as clock inputs, GPIO, and an expansion port. (Image: Achronix)
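Two of the headline numbers in the feature list can be broken down per unit. The sketch below uses only the figures quoted above and is back-of-envelope arithmetic, not vendor data. It reads "40K" as 40,000 MACs, takes the ">80 TOps" lower bound as exactly 80 TOps, and counts each multiply-accumulate as two operations. All three are assumptions.

```python
# Figures from the feature list.
GDDR6_BANKS = 8
GDDR6_AGGREGATE_BPS = 4e12   # 4 Tbits/s aggregate
INT8_MACS = 40_000           # "40K" read as 40,000 (assumption)
TOTAL_OPS_PER_S = 80e12      # lower bound of ">80 TOps"
OPS_PER_MAC = 2              # multiply + add counted as two ops (assumption)

# Bandwidth contributed by each GDDR6 bank, in bits/s and gigabytes/s.
per_bank_bps = GDDR6_AGGREGATE_BPS / GDDR6_BANKS
per_bank_gbytes = per_bank_bps / 8 / 1e9

# Rate at which each MAC must complete operations to reach the total.
mac_rate_hz = TOTAL_OPS_PER_S / (INT8_MACS * OPS_PER_MAC)

print(f"Per-bank GDDR6 bandwidth: {per_bank_bps / 1e9:.0f} Gbits/s "
      f"({per_bank_gbytes:.1f} GB/s)")
print(f"Implied MAC rate: {mac_rate_hz / 1e9:.1f} GHz")
```

The actual MAC clock may differ, since the quoted throughput is a lower bound and the operation-counting convention is assumed.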
A couple of features worth some discussion are the GDDR6 memory and PCIe compliance.
FPGAs with embedded HBM and HBM2 memories are at the premium end of the market, but what Achronix has done instead is to use hardened GDDR6 IP blocks in the FPGA, said Petrie. “We can use low-cost, off-the-shelf commodity GDDR6 chips, typically used for GP-GPUs, on the FPGA cards, which gives the same bandwidth as HBM2, but at a lower price point.”
The card will ship with out-of-the-box compatibility with PCIe Gen 3 x16 (16 lanes), the configuration currently used in the market, and the hard IP future-proofs it for PCIe Gen 4 and Gen 5.
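To put the generations in perspective, the sketch below estimates the per-direction bandwidth of a 16-lane link for PCIe Gen 3, Gen 4, and Gen 5. The per-lane transfer rates (8, 16, and 32 GT/s) and the 128b/130b line encoding are standard PCIe figures rather than numbers from the companies. Packet and protocol overhead is ignored, so real throughput will be lower. The function name is illustrative.

```python
# Per-lane raw transfer rate in transfers/s for each PCIe generation.
PCIE_LANE_RATE = {3: 8e9, 4: 16e9, 5: 32e9}
# Gen 3 through Gen 5 use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbytes(gen: int, lanes: int = 16) -> float:
    """Approximate one-direction link bandwidth in GB/s.

    Accounts for line encoding only; packet and protocol overhead
    is not modeled.
    """
    raw_bits_per_s = PCIE_LANE_RATE[gen] * lanes
    return raw_bits_per_s * ENCODING_EFFICIENCY / 8 / 1e9


for gen in sorted(PCIE_LANE_RATE):
    print(f"PCIe Gen {gen} x16: ~{pcie_bandwidth_gbytes(gen):.1f} GB/s per direction")
```

Each generation doubles the per-lane rate, so moving the same x16 card from Gen 3 to Gen 5 would roughly quadruple the host link bandwidth.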
Other key features of the Speedster7t are the suite of hard MAC and FEC IP for a range of industry-standard protocols and line rates in the networking world, and the flexible GbE connections for customers with different I/O requirements. These include 1x 200 GbE (2x 100 GbE or 4x 10/25/40/50 GbE), and 1x 400 GbE (2x 200 GbE, 4x 100 GbE or 8x 10/25/40/50 GbE). The single 400 GbE is unique in the market for an FPGA acceleration card, Petrie said.
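The breakout options listed above can be written down as a small table and sanity-checked. The sketch below is illustrative only: the dictionary layout and function names are hypothetical and do not reflect any Achronix or BittWare configuration format. It confirms that no listed breakout exceeds the nominal capacity of its port.

```python
# Breakout modes from the article: port -> list of (channel count, allowed rates in Gbits/s).
PORT_BREAKOUTS = {
    "200GbE": [(1, [200]), (2, [100]), (4, [10, 25, 40, 50])],
    "400GbE": [(1, [400]), (2, [200]), (4, [100]), (8, [10, 25, 40, 50])],
}
PORT_CAPACITY_GBPS = {"200GbE": 200, "400GbE": 400}


def max_aggregate_gbps(port: str) -> int:
    """Highest total rate reachable across all breakout modes of a port."""
    return max(count * max(rates) for count, rates in PORT_BREAKOUTS[port])


def all_modes_fit(port: str) -> bool:
    """True if every channel count and rate combination stays within port capacity."""
    capacity = PORT_CAPACITY_GBPS[port]
    return all(
        count * rate <= capacity
        for count, rates in PORT_BREAKOUTS[port]
        for rate in rates
    )


for port in PORT_BREAKOUTS:
    print(port, "max aggregate:", max_aggregate_gbps(port), "Gbits/s,",
          "all modes fit:", all_modes_fit(port))
```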
The card also offers clock synchronization options for timing-critical application requirements. For example, BittWare added clock inputs on the front panel for customers who need to synchronize multiple cards together or use an external clock input to synchronize the data, along with clock jitter-cleaner circuitry.
The expansion port also is a big deal, offering high-speed serial card-to-card for low latency, deterministic scaling, and extra network ports for denser applications. It can interface to custom serial I/O protocols and directly to NVMe flash storage arrays.
The companies also put considerable effort into getting the software and tools right. Having the best hardware goes without saying, but the best hardware with no means of programming and controlling it is of no use, said Petrie.
In addition to the Achronix ACE development tools for the Speedster7t, BittWare offers its own toolkit, which includes firmware and software for a comprehensive board management controller. The controller monitors the health of the card, including power consumption, temperature, current, and voltage, and provides safety circuitry that responds if a problem arises, such as a fan that stops working.
BittWare’s board management controller and developer toolkit includes the API, PCIe drivers, diagnostic self-test, and application example designs for an easy out-of-the-box experience. It supports both Linux and Windows (for legacy applications) operating systems.
“Diagnostic self-test is important because it gives customers a baseline at which to work from,” said Petrie. “Customers can run the self-test to verify that the card is behaving within expected parameters. It’s also a baseline for warranty and technical support. If customers have any problems, they can run a diagnostic self-test for a comprehensive report detailing all the performances and parameters on the card itself.”
The enterprise-class PCIe accelerator card also gives designers greater flexibility with two purchasing options. In addition to a standalone card, it can be purchased pre-integrated as turnkey Dell or HPE TeraBox servers from BittWare. The pre-integrated FPGA servers come with warranty and support from Dell and HPE.
BittWare also offers licensing for volumes in the tens of thousands of units. The company will allow customers to manufacture and test the S7t-VG6 at their preferred contract electronics manufacturer under license. They also can create their own variants under license.
The VectorPath S7t-VG6 accelerator card, priced at $7,500, will be available in Q2 2020. Contact Achronix for early engagements.
This article was originally published on sister publication EE Times.