Kinara, Inc. has introduced the Ara-2 edge AI processor, which addresses the massive compute demands of generative AI and transformer-based models while remaining cost-effective. The Ara-2 delivers higher performance per watt and performance per unit cost than the previous generation.
The Ara-2 is designed to power edge servers and laptops with high-performance, energy-efficient inference for applications such as video analytics and large language models (LLMs). It suits edge applications that use traditional AI models as well as those that use state-of-the-art transformer-based architectures.
The Ara-2 handles 16 to 32 or more video streams fed into edge servers, laptops and high-end cameras. Its advanced compute engines process images faster and with greater accuracy, improving object detection, recognition and tracking, Kinara said. For generative workloads, the Ara-2 reaches roughly 0.5 seconds per iteration for Stable Diffusion and tens of tokens per second for LLaMA-7B.
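To put those generative AI figures in end-to-end terms, the short Python sketch below converts a per-iteration time and a token rate into wall-clock latency. Only the 0.5 seconds per iteration comes from the figures above. The 20 denoising steps, the 20 tokens per second (one possible reading of "tens") and the 100-token response are illustrative assumptions, not Kinara numbers, and the function names are hypothetical.

```python
def diffusion_latency_s(sec_per_iteration: float, num_steps: int) -> float:
    """Total time to generate one image: per-iteration time times step count."""
    return sec_per_iteration * num_steps


def llm_response_time_s(tokens_per_sec: float, num_tokens: int) -> float:
    """Time to generate a response of num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    # 0.5 s/iteration is the quoted figure; 20 steps is an assumed setting.
    image_s = diffusion_latency_s(0.5, 20)
    # 20 tokens/s and a 100-token reply are assumed values for illustration.
    reply_s = llm_response_time_s(20.0, 100)
    print(f"Image generation: {image_s:.1f} s")
    print(f"LLM response:     {reply_s:.1f} s")
```

Under these assumptions an image takes about 10 seconds and a 100-token reply about 5 seconds. Real figures depend on the sampler settings, prompt length and the model build actually deployed.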
The edge AI processor offers an enhanced feature set, a performance gain of 5× to 8× or more over the first-generation Ara-1 processor, and real-time responsiveness with high throughput. The processor combines a latency-optimized design, balanced on-chip memories and high off-chip bandwidth to execute very large models with extremely low latency, the company said.
Security features include secure boot, encrypted memory access and a secure host interface, enabling enterprise AI deployments with greater security.
Most LLMs and generative AI applications run on GPUs in data centers with “high latency, high cost and questionable privacy,” Kinara said. In comparison, the Ara-2 simplifies the transition to the edge by supporting the tens of billions of parameters used by these generative AI models, the company added. The compute engines in the Ara-2 and the software development kit (SDK) support high-accuracy quantization, a dynamically moderated host runtime and direct FP32 support.
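A rough calculation shows why quantization matters for running models of this size at the edge. The Python sketch below estimates the storage needed for model weights alone at several bit widths. The 7-billion-parameter count matches the LLaMA-7B class of model. The INT8 and INT4 widths are generic examples and are not a statement of which formats the Ara-2 or its SDK uses, and the function name is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes), excluding activations,
    KV cache and any per-channel scale or zero-point overhead."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # a 7-billion-parameter model
    # Bit widths are illustrative assumptions, not documented Ara-2 formats.
    for label, bits in [("FP32", 32), ("INT8", 8), ("INT4", 4)]:
        print(f"{label}: {weight_footprint_gb(params, bits):.1f} GB")
```

At 32 bits per weight a 7-billion-parameter model needs about 28 GB for weights, against about 7 GB at 8 bits, which is the kind of reduction that makes on-device memory and off-chip bandwidth budgets workable.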
The Ara-2 edge AI processor is currently available as a stand-alone device, a USB module, an M.2 module and a PCIe card featuring multiple Ara-2 devices. Kinara supports the Ara-2 with a comprehensive SDK that includes a model compiler and compute-unit scheduler, flexible quantization options (the integrated Kinara quantizer as well as support for pre-quantized PyTorch and TFLite models), a load balancer for multi-chip systems and a dynamically moderated host runtime.