Transformer-based networks driving generative AI demand massive increases in compute and memory resources, along with optimized processing architectures. In response, Ceva has enhanced its NeuPro-M NPU IP family to address generative AI processing with high performance and power efficiency for AI inferencing workloads from the cloud to the edge.
The NeuPro-M NPU IP is designed for both classic AI and generative AI workloads. The architecture and tools were extensively redesigned to support transformer networks, CNNs and other neural networks, as well as future machine-learning inference models. This allows highly optimized applications that leverage generative and classic AI to be seamlessly developed and run on the NPU in communication gateways, optically connected networks, cars, notebooks and tablets, AR/VR headsets and smartphones.
The power-efficient NeuPro-M scalable NPU architecture offers peak performance of 350 tera operations per second per watt (TOPS/W) at a 3-nm process node, Ceva said, and can process more than 1.5 million tokens per second per watt for transformer-based LLM inferencing. The NPU IP comes with a full AI software stack that includes the NeuPro-M system architecture planner tool, a neural network training optimizer tool, and the CDNN AI compiler and runtime.
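To put those efficiency figures in perspective, a back-of-the-envelope calculation can translate them into absolute throughput at a given power budget. The sketch below uses only the two numbers Ceva has published; the power budgets are illustrative assumptions, not Ceva specifications.

```python
# Back-of-the-envelope throughput estimates from Ceva's published
# efficiency figures. The power budgets are hypothetical examples
# (wearable, handset, gateway), not values stated by Ceva.

PEAK_TOPS_PER_WATT = 350          # peak compute efficiency at 3 nm
TOKENS_PER_SEC_PER_WATT = 1.5e6   # transformer LLM inference efficiency

for power_w in (0.5, 2.0, 10.0):  # assumed device power budgets in watts
    tops = PEAK_TOPS_PER_WATT * power_w
    tokens_per_sec = TOKENS_PER_SEC_PER_WATT * power_w
    print(f"{power_w:>5.1f} W -> {tops:>6.0f} TOPS peak, "
          f"{tokens_per_sec / 1e6:.1f}M tokens/s")
```

Under these assumptions, even a sub-watt budget would leave headroom for on-device LLM token generation, which is consistent with the cloud-to-edge positioning of the IP.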
The NeuPro-M meets stringent safety and quality compliance standards, including the automotive ISO 26262 ASIL-B functional safety standard and Automotive SPICE (ASPICE) quality assurance standards. The NPU architecture also supports secure access through an optional root of trust, authentication against IP and identity theft, secure boot, and end-to-end data privacy.
The NeuPro-M architecture is flexible and future-proof thanks to an integrated vector processing unit (VPU) that supports future network layers, said Ceva. The architecture also supports any activation and data flow, with true sparsity for both data and weights enabling up to 4× acceleration in performance, the company added.
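Ceva does not disclose how its sparsity engine is implemented, but the arithmetic behind the up-to-4× claim is easy to illustrate: if hardware can skip any multiply-accumulate where either the weight or the activation is zero, the useful work shrinks multiplicatively with the sparsity of both operands. The following sketch is purely illustrative, not Ceva's design.

```python
import numpy as np

# Illustrative sketch (not Ceva's implementation): count how many
# multiply-accumulates survive when zero operands can be skipped.
rng = np.random.default_rng(0)

def random_sparse(shape, density):
    """Dense array in which roughly `density` of entries are nonzero."""
    mask = rng.random(shape) < density
    return rng.standard_normal(shape) * mask

weights = random_sparse((1024, 1024), density=0.5)   # ~50% weight sparsity
activations = random_sparse((1024,), density=0.5)    # ~50% activation sparsity

total_macs = weights.size
# A product is nonzero only where BOTH operands are nonzero
# (activations broadcast across the rows of the weight matrix).
useful_macs = np.count_nonzero(weights * activations)
print(f"useful MACs: {useful_macs}/{total_macs} "
      f"-> ~{total_macs / useful_macs:.1f}x speedup")
```

With 50% sparsity in both tensors, only about a quarter of the multiplications carry information, which matches the up-to-4× acceleration figure for a zero-skipping datapath.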
As a result, multiple applications and multiple markets can be addressed with a single NPU family. For greater scalability, the NeuPro-M family adds the new NPM12 and NPM14 NPU cores, with two and four NeuPro-M engines, respectively, to enable migration to higher-performance AI workloads. The enhanced NeuPro-M family consists of four NPUs: the NPM11, NPM12, NPM14 and NPM18.
The enhanced NeuPro-M architecture also includes a revamped development toolchain based on Ceva's CDNN AI compiler and software, which is architecture-aware to fully utilize the NeuPro-M parallel processing engines and maximize AI application performance.
The NPM11 NPU IP is generally available now, with the NPM12, NPM14 and NPM18 available to lead customers.