Ambarella, Inc. has demonstrated multi-modal large language models (LLMs) running on its new N1 SoC series at a fraction of the power per inference of leading GPU solutions. The company will initially offer optimized generative AI processing on its mid- to high-end SoCs, from the CV72, which delivers on-device performance at under 5 W, to the new N1 series, which offers server-grade performance at under 50 W.
Ambarella claims that its complete SoC solutions are up to 3× more power-efficient per generated token compared to GPUs and other accelerators.
The N1 series of SoCs, based on the company's CV3-HD architecture originally developed for autonomous driving applications, runs multi-modal LLMs within an extremely low power footprint: Llama2-13B, for example, generates up to 25 output tokens per second in single-streaming mode at under 50 W. The new solution helps OEMs deploy generative AI in power-sensitive applications.
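For a rough sense of what those figures imply, the sketch below simply divides the quoted power ceiling by the quoted peak throughput to estimate the energy cost per generated token. It uses only the numbers stated above (under 50 W, up to 25 tokens per second) and is a back-of-envelope illustration, not an Ambarella benchmark; real energy per token will vary with model, quantization and workload.

```python
# Back-of-envelope estimate of energy per generated token, based on the
# figures quoted in the article for Llama2-13B on the N1 in
# single-streaming mode. Illustrative only.

power_watts = 50.0          # stated upper bound on power draw
tokens_per_second = 25.0    # stated peak single-stream throughput

joules_per_token = power_watts / tokens_per_second
print(f"~{joules_per_token:.1f} J per generated token")  # ~2.0 J/token
```

By the same logic, a claim of roughly 3x better power efficiency per token would correspond to a competing accelerator spending on the order of three times as many joules for the same output, under the same simplifying assumptions.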
Ambarella plans to bring generative AI to edge endpoint devices and on-premises hardware in applications including video security analysis, robotics and industrial applications. "Virtually every edge application will get enhanced by generative AI in the next 18 months," said Alexander Harrowell, principal analyst for advanced computing at Omdia, in a statement.
Harrowell explained that moving generative AI workloads to the edge is all about performance per watt and integration with the existing edge ecosystem, rather than raw throughput.
Generative AI can bring functions to the edge that were not previously possible, Ambarella said. The company's AI SoCs are supported by its new Cooper Developer Platform, a modular, prepackaged suite of hardware and software development tools that integrates software, hardware, fine-tuned AI models and services.
Ambarella has pre-ported and optimized popular LLMs, such as Llama-2 and the Large Language and Vision Assistant (LLaVA) model, to run on the N1 for multi-modal vision analysis across up to 32 camera sources. These pre-trained and fine-tuned models will be available for partners to download from the Cooper Model Garden.
Examples of on-device LLM and multi-modal processing include smart contextual search of security footage, robots controllable through natural-language commands, and AI assistants that handle tasks ranging from code generation to text and image generation.
The local processing enabled by Ambarella's solutions suits application-specific LLMs that are fine-tuned at the edge for individual scenarios, rather than the traditional server approach of running larger, power-hungry LLMs for every use case.