AMD has unveiled the industry’s largest FPGA-based adaptive system-on-chip (SoC), targeting emulation and prototyping applications. The Versal Premium VP1902 chiplet-based adaptive SoC, built on 7-nm process technology, is designed to streamline the verification of complex semiconductor designs and bring them to market faster. It offers twice the capacity of the prior generation, enabling faster validation of ASIC and SoC designs, along with other performance improvements, including a higher transceiver count and bandwidth as well as faster debugging.
“AMD is pushing the limits of the technology to deliver the highest programmable logic capacity, and the market that cares the most about that is emulation and prototyping,” said Rob Bauer, AMD’s senior product line manager for Versal. “This is all about enabling semiconductor companies to design next-generation chips.”
Emulation and prototyping let chip designers create a digital version, or “digital twin,” of their chip in hardware so that they can validate it, identify and iron out bugs, and even develop software before they have silicon, Bauer said.
Emulation and prototyping challenges
From a technology perspective, the biggest challenges for emulation and prototyping are that chips are getting bigger and more complex with chiplet integration and that design costs continue to climb, making verification and software development a problem of a much higher magnitude, Bauer said.
However, Bauer said the biggest challenge lies in looking ahead to future ASICs and SoCs and what it will take to create digital twins of them. For example, compute (FLOPS) requirements for ML training models once roughly followed Moore’s Law, doubling about every 20 months, but since around 2010, with deep learning and the large-scale era, including large generative AI models, compute requirements now double about every six to nine months, he added.
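For a rough sense of the gap Bauer describes, the short sketch below compares the two doubling rates in terms of annualized growth; the 7.5-month figure is simply an assumed midpoint of the quoted six-to-nine-month range, used here for illustration only.

```python
# Illustrative comparison of the two growth regimes described above.
moores_law_doubling_months = 20   # pre-deep-learning pace cited by Bauer
modern_doubling_months = 7.5      # assumed midpoint of the "six to nine months" range

def annual_growth(doubling_months: float) -> float:
    """Factor by which compute demand grows over 12 months at a given doubling period."""
    return 2 ** (12 / doubling_months)

print(f"~{annual_growth(moores_law_doubling_months):.1f}x per year at a 20-month doubling")  # ~1.5x
print(f"~{annual_growth(modern_doubling_months):.1f}x per year at a 7.5-month doubling")     # ~3.0x
```

In other words, compute demand that once grew by roughly half each year now roughly triples annually, and the hardware used to emulate the chips serving that demand has to keep pace.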
What this means for emulation and prototyping is that the chips designers need to emulate are getting much bigger, Bauer continued.
For example, TSMC’s CoWoS process has moved from 16 nm to 5 nm over three generations, and advanced packaging techniques have moved from a 1× to a 3× reticle-size interposer, Bauer said. “They’ve been able to achieve a 20× increase in normalized transistor count. This is great in that it means we can pack more compute onto a single device like the MI300 [AMD’s new APU accelerator for AI and HPC], but at the same time there are major challenges when it comes to the integration.”
Techniques like heterogeneous integration or chiplet architectures help drive more performance, but there are major challenges, he added. “We’re not just talking about emulating a single die, or a single piece of silicon. We have to create a digital twin of all these different chiplets and they have to communicate with one another. We also have to verify the communication between them, so that adds some challenges to the emulation and prototyping systems.”
This all drives higher costs. It also means that AI chips and advanced integration require new solutions for emulation and prototyping.
IBS data shows that an advanced 2-nm design is expected to break $700 million in cost, 11.5× more than at 22 nm, Bauer said. Over half of the design cost is for verification and software, he added.
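The back-of-the-envelope arithmetic implied by those IBS figures is shown below; the split is approximate and only restates the numbers Bauer cites.

```python
# Restating the IBS figures cited above (approximate, illustrative only).
cost_2nm = 700e6        # ~$700 million for an advanced 2-nm design
ratio_vs_22nm = 11.5    # 11.5x the cost at 22 nm

cost_22nm = cost_2nm / ratio_vs_22nm    # implied 22-nm design cost
verification_and_sw = 0.5 * cost_2nm    # "over half" goes to verification and software

print(f"Implied 22-nm design cost: ~${cost_22nm / 1e6:.0f}M")                                # ~$61M
print(f"Verification/software share at 2 nm: more than ${verification_and_sw / 1e6:.0f}M")   # >$350M
```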
This shows the magnitude of the impact of these emulation and prototyping systems, which allow verification and software development to start before there is silicon, helping to maintain time to market, he continued. “In today’s world, if a semiconductor company waited until they had silicon to start developing software, they’d be way late to market, and they wouldn’t be able to release a new chipset every year.
“At AMD, it is about accelerating what we call a ‘shift left,’ an industry term about doing more verification and more software development earlier in the design cycle for these complex designs,” Bauer said.
“We’re also proud that we are enabling the entire semiconductor industry,” he continued. “Think of future technologies like generative AI in your pocket and autonomous vehicles. All of those are going to require highly sophisticated chips, and to bring those chips to market you need emulation and prototyping systems to make it happen.”
The specs
The VP1902, AMD’s sixth generation of FPGA, claims industry-leading capacity and connectivity, delivering 18.5 million logic cells for 2× higher programmable logic density and 2× aggregate I/O bandwidth compared with the previous-generation Virtex UltraScale+ VU19P FPGA.
Capacity is important because it enables the handling of next-generation SoC and ASIC designs, which have become more complex with advances in AI- and ML-based chips, driving the need for extensive verification of silicon and software before tape-out.
The adaptive SoC also offers 2× the transceiver count and 2.3× the transceiver bandwidth, compared with the previous-generation VU19P FPGA, meeting the increased requirements for emulation and prototyping.
The 18.5 million logic cells make it the world’s largest, which matters because the designs that need to be emulated are getting bigger and bigger, Bauer said. For example, a 1-billion-gate design is not going to fit on one VP1902, he said.
The emulation and prototyping platforms are architected to use multiples of these devices in a kind of mesh in which they all communicate with one another, he continued. “The chip-to-chip communication is very important because ultimately that 1-billion-gate design is going to get partitioned across multiple VP1902s. That is where the over 2× I/O bandwidth comes into play. We need massive I/O bandwidths between these devices.”
Bauer said a 1-billion-gate design would need roughly 24 VP1902s, but what about platforms that need to support tens of billions of gates? Ultimately, this requires “stringing together hundreds of these VP1902s, so the way these systems will be architected is with racks of these modules with hundreds of these devices,” he explained.
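The sizing arithmetic behind those figures can be sketched as below. The per-device capacity is simply implied by Bauer’s example of a 1-billion-gate design spanning roughly 24 VP1902s (on the order of 42 million emulated gates per device); it is an illustrative assumption, not an AMD specification, and the helper is not a real partitioning tool.

```python
import math

# Rough per-device emulation capacity implied by Bauer's example
# (a 1-billion-gate design across ~24 VP1902s). Illustrative assumption only.
GATES_PER_VP1902 = 42_000_000

def devices_needed(design_gates: int, gates_per_device: int = GATES_PER_VP1902) -> int:
    """Rough count of devices a partitioned design would span."""
    return math.ceil(design_gates / gates_per_device)

print(devices_needed(1_000_000_000))    # 24 devices, matching Bauer's example
print(devices_needed(10_000_000_000))   # 239 devices: tens of billions of gates means hundreds of chips
```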
“For example, to move from rack to rack and maintain that mesh and keep all these adaptive SoCs communicating, that’s where the transceivers or SerDes come into play,” he continued. “Compared with the prior generation, the VP1902 has twice the SerDes count and over twice the SerDes bandwidth so that we can deliver that scalability of the system.”
Another advance is faster debugging with the Versal architecture, providing up to 8× faster debugging compared with the previous-generation VU19P FPGA, thanks in part to the programmable network-on-chip (NoC). This faster debug performance is critical because it will help chip designers verify their design and track down bugs much faster.
The programmable NoC for high-bandwidth debug traffic over a hardened infrastructure “is valued in a lot of applications but in emulation and prototyping, it’s extra-valuable because it allows the user to decouple the emulated design, which lives in the programmable logic, from all the debug infrastructure,” Bauer said.
The VP1902 also uses a novel two-by-two super logic region (SLR) arrangement for enhanced routability and lower latency. “This is the first time that we’ve had an adaptive SoC or FPGA that has this unique chiplet architecture,” Bauer said. “We call them SLRs because we’ve been calling them that since before the chiplet was even a term.”
This is AMD’s fourth-generation device to use a chiplet-based architecture and the first in this quadrant configuration, he added. “It helps to minimize routing congestion as the user’s design gets partitioned across these four dice.”
Another first for the VP1902 is the processing subsystem. It includes a dual-core Arm A72 processor that can boot Linux and is used for flexible boot and control of the design. “A design gets deployed into the programmable logic, then the emulation platform can communicate with this device through the Linux operating system running on this processing subsystem to configure and manage everything as it’s running,” Bauer said. “Historically, that would have been implemented in programmable logic, like a state machine.”
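Conceptually, the control model Bauer describes might look like the sketch below: a host-side script talking to a small service running under Linux on the device’s processing subsystem. Everything here, including the address, port, command names and JSON wire format, is hypothetical and intended only to illustrate the “configure and manage through Linux” pattern; it is not an AMD API.

```python
import json
import socket

# Hypothetical host-side control client. In this sketch, a service running
# under Linux on the VP1902's Arm processing subsystem listens on a TCP port
# and applies configuration/management commands to the emulated design in the
# programmable logic. The address, port, commands and wire format are
# illustrative assumptions, not an AMD interface.
EMULATOR_PS_ADDR = ("192.168.1.50", 9000)   # hypothetical board address

def send_command(command: str, **params) -> dict:
    """Send one JSON-encoded control command and return the decoded reply."""
    with socket.create_connection(EMULATOR_PS_ADDR, timeout=5) as sock:
        sock.sendall(json.dumps({"cmd": command, **params}).encode() + b"\n")
        return json.loads(sock.makefile().readline())

# A typical management sequence while the emulated design is running:
send_command("load_partition", image="partition_03.bit")   # configure one slice of the design
send_command("set_clock", mhz=40)                          # adjust the emulation clock
print(send_command("status"))                              # poll run state and debug triggers
```

The point of the sketch is the contrast Bauer draws: this kind of runtime control would previously have been baked into the programmable logic as a state machine, whereas a Linux-capable processing subsystem lets the emulation platform manage it in software.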
AMD also reduced the latency of chip-to-chip interfacing over the XPIOs (parallel digital I/O) by 36%. “The reduced latency moving from one chip to another chip will allow you to run the logic that spans both of them at a higher clock rate, leading to higher productivity for the person doing verification or software development of the design,” Bauer explained.
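As a rough illustration of why that matters (the timing numbers below are assumed for the example and are not AMD figures): when a chip-to-chip hop sits on a register-to-register path, shaving 36% off that hop shortens the whole path and raises the clock rate at which the spanning logic can run.

```python
# Illustrative only: assumed timing values, not AMD specifications.
crossing_latency_ns = 12.0   # assumed chip-to-chip hop on the critical path
logic_delay_ns = 13.0        # assumed logic and routing delay on the same path

old_period_ns = logic_delay_ns + crossing_latency_ns
new_period_ns = logic_delay_ns + crossing_latency_ns * (1 - 0.36)

print(f"old fmax: ~{1e3 / old_period_ns:.0f} MHz")   # ~40 MHz
print(f"new fmax: ~{1e3 / new_period_ns:.0f} MHz")   # ~48 MHz
```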
All of these improvements translate into a 2× higher maximum gate count at the system level. The emulation and prototyping platforms on the market that support the highest gate counts top out at about 30 billion gates, and AMD is doubling that to 60 billion gates, which is a tremendous generational increase, Bauer said.
In addition, AMD estimates a 2× faster clock rate (<50 MHz) on average for emulated designs. “This means the software developer or hardware verification engineer working with the emulated SoC can run it twice as fast,” he said. “Imagine the productivity improvements if you’re able to get things done twice as quickly.”
But not everyone is designing massive AI SoCs. “There are plenty of smaller designs with lower size, weight and power chipsets, so you can also look at it in terms of emulation density—being able to do more in a smaller footprint,” Bauer said. “A 1-billion-gate design, for example, would have previously required 48 VU19Ps and now we’re at 24 VP1902s, which is a significant decrease in equipment and footprint required, given a fixed design size.”
Tools and software
AMD also addressed the key challenges in emulation and prototyping software development. These include providing sufficient debug visibility, enabling faster design iteration and minimizing dependency on specific hardware for faster time to market.
AMD works closely with the top EDA vendors, including Cadence, Siemens and Synopsys, to help provide the features and scalability required by designers. The company’s strong relationships with these vendors, and its understanding of their challenges, helped AMD architect the VP1902, from the hardware feature set (“all the speeds and feeds”) to the software.
“A lot of investment goes into the software that our customers are developing for emulation and prototyping,” Bauer said. “It’s important that they have a consistent interface to our tools and for the last four generations of the world’s largest FPGAs we’ve had a consistent design tool — the Vivado IDE — so that helps minimize the amount of time and the amount of investment it takes them to adopt the latest and greatest adaptive SoC or FPGA technology.
“We’ve re-architected or made significant enhancements on the backend of our Vivado IDE to support this large device,” he added.
The Vivado ML design suite provides a development platform to help designers quickly design, debug and validate next-generation applications and technologies. New features that support the VP1902 adaptive SoC include automated design closure assistance, interactive design tuning, remote multi-user real-time debugging and enhanced back-end compilation.
These features help designers to iterate IC designs faster, AMD said.
The AMD Versal Premium VP1902 adaptive SoC will begin sampling in the third quarter to early-access customers, followed by production in the first half of 2024. An evaluation kit and documentation are available now.