At Hot Chips 2024, FuriosaAI is lifting the veil on RNGD (pronounced “renegade”), a groundbreaking AI accelerator designed for high-performance, efficient inference of large language models (LLMs) and multimodal models in data centers. During his presentation, Furiosa co-founder and CEO June Paik shared technical insights and provided a first-hand look at the fully operational RNGD card.
Technical Specifications and Innovations
With a thermal design power (TDP) of 150 watts, RNGD pairs a novel chip architecture with advanced memory technology, including HBM3, specifically optimized for the demands of LLMs and multimodal models. The design aims to deliver performance, efficiency, and programmability together, a trifecta that has proved elusive for GPUs and other AI chips in the industry.
Remarkable Milestones
FuriosaAI’s journey with RNGD has been rapid and impressive. The first silicon samples arrived from TSMC in May, the first hardware boot came less than a week after that, and the company completed a full bring-up of RNGD just weeks later. By early June, the chips were running industry-standard Llama 3.1 models. Early access customers began receiving their first RNGD silicon in July, followed by private demos last week.
The Road Ahead
FuriosaAI is now focused on refining its software stack as RNGD production ramps up. The company’s experience with its first-generation chip, introduced in 2021, set a precedent for rapid performance gains through software enhancements alone. For example, a single RNGD currently generates about 12 queries per second running the GPT-J 6B model, a figure expected to improve as the software matures.
Industry Engagement and Future Plans
Hot Chips 2024 marks a significant milestone for FuriosaAI and RNGD. The company’s large engineering team is eager to engage with the AI community at their booth, discuss their work, and gather feedback. FuriosaAI plans to provide more benchmark results, availability details, and updates on RNGD in the coming weeks and months, aiming for wide availability in early 2025.