The Fastest AI Inference Solution
Cerebras Systems, a pioneer in high-performance AI compute, recently unveiled Cerebras Inference, the fastest AI inference solution in the world. It delivers a staggering 1,800 tokens per second for LLaMA 3.1 8B and 450 tokens per second for LLaMA 3.1 70B, 20 times faster than Nvidia GPU-based solutions in hyperscale clouds. With pricing starting at just 10 cents per million tokens, Cerebras Inference also offers up to 100 times better price-performance for AI workloads.
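To get a feel for what those figures mean in practice, the back-of-the-envelope sketch below uses only the throughput and pricing numbers quoted above to estimate latency and cost for a typical generation request:

```python
# Back-of-the-envelope estimates based on the figures quoted above.
TOKENS_PER_SECOND_8B = 1_800   # LLaMA 3.1 8B on Cerebras Inference
TOKENS_PER_SECOND_70B = 450    # LLaMA 3.1 70B on Cerebras Inference
PRICE_PER_MILLION_8B = 0.10    # USD, quoted starting price

def estimate(tokens: int, tps: float, price_per_million: float) -> tuple[float, float]:
    """Return (seconds to generate, USD cost) for `tokens` output tokens."""
    return tokens / tps, tokens / 1_000_000 * price_per_million

# A 1,000-token response on the 8B model:
seconds, cost = estimate(1_000, TOKENS_PER_SECOND_8B, PRICE_PER_MILLION_8B)
print(f"{seconds:.2f} s, ${cost:.4f}")  # ~0.56 s, $0.0001
```

At these rates, a full 1,000-token response costs a hundredth of a cent and arrives in about half a second, which is what makes the real-time use cases discussed below plausible.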
Uncompromised Accuracy with 16-bit Precision
Unlike alternative approaches that often compromise accuracy for performance, Cerebras Inference maintains state-of-the-art accuracy by staying in the 16-bit domain throughout the entire inference run. “Artificial Analysis has verified that LLaMA 3.1 8B and 70B on Cerebras Inference achieve quality evaluation results in line with native 16-bit precision per Meta’s official versions,” stated Micah Hill-Smith, co-founder and CEO of Artificial Analysis.
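To illustrate why staying in 16-bit matters, the sketch below (illustrative only, not Cerebras's implementation) compares the rounding error introduced by round-tripping weights through 16-bit floats against a naive 8-bit integer quantization of the kind some accelerated inference stacks use:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(100_000).astype(np.float32)

# 16-bit path: round-trip through float16, a stand-in for 16-bit inference.
fp16_err = np.abs(weights - weights.astype(np.float16).astype(np.float32)).mean()

# 8-bit path: naive symmetric int8 quantization (illustrative scheme only).
scale = np.abs(weights).max() / 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
int8_err = np.abs(weights - q.astype(np.float32) * scale).mean()

print(f"mean abs error, 16-bit: {fp16_err:.2e}")
print(f"mean abs error, 8-bit:  {int8_err:.2e}")  # markedly larger
```

The 8-bit error is larger by well over an order of magnitude in this toy setup, which is the kind of cumulative degradation that quality evaluations like those from Artificial Analysis are designed to detect.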
Transforming AI Application Development
With record-setting speed and competitive pricing, Cerebras Inference is particularly appealing to developers building next-generation AI applications that require real-time or high-volume processing. AI leaders, including Dr. Andrew Ng and executives from GlaxoSmithKline, have acknowledged the potential of this powerful solution. “Cerebras has built an impressively fast inference capability which will be very helpful to such workflows,” said Dr. Andrew Ng, founder of deeplearning.ai.
Affordable Pricing Models
Cerebras has structured its inference service across three competitively priced tiers: Free, Developer, and Enterprise. The Free tier provides API access at no cost, with generous usage limits. The Developer tier, designed for flexible, serverless deployment, offers LLaMA 3.1 8B and 70B models at just 10 cents and 60 cents per million tokens, respectively. For sustained workloads, the Enterprise tier provides fine-tuned models, custom service level agreements, and dedicated support via a Cerebras-managed private cloud or on customer premises.
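As a concrete starting point, the snippet below sketches a chat-completion request against an OpenAI-compatible endpoint. The base URL, model identifier, and environment variable name are assumptions for illustration; consult Cerebras's API documentation for the exact values.

```python
import os
from openai import OpenAI  # pip install openai

# Assumed values for illustration; confirm against Cerebras's docs.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint URL
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var name
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed identifier for the 10-cent 8B model
    messages=[
        {"role": "user", "content": "Summarize Cerebras Inference in one sentence."}
    ],
)
print(response.choices[0].message.content)
```

Because the request shape mirrors the widely used chat-completions format, existing applications could in principle switch providers by changing only the base URL, key, and model name.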
Strategic partnerships, like the one with LiveKit, are set to accelerate the development of multimodal AI applications. As Russell D’Sa, CEO and co-founder of LiveKit, put it, “Combining Cerebras’ best-in-class compute and SOTA models with LiveKit’s global edge network, developers can now create voice and video-based AI experiences with ultra-low latency and more human-like characteristics.”