FuriosaAI RNGD

RNGD is FuriosaAI's second-generation NPU designed to accelerate deep learning inference across LLMs, multi-modal, and vision workloads.

RNGD is FuriosaAI's second-generation Neural Processing Unit (NPU) designed to accelerate deep learning inference. The NPU natively supports a broad spectrum of workloads, including high-performance Large Language Models (LLMs), multi-modal models, and vision-based neural networks.

FuriosaAI RNGD

RNGD implements the Tensor Contraction Processor (TCP), a novel architecture designed to natively execute complex tensor contractions---the core mathematical operations underlying modern deep learning---directly in hardware. By leveraging this paradigm, the NPU maximizes both compute utilization and energy efficiency. Fabricated on TSMC's 5nm process node and operating at a 1.0 GHz clock frequency, RNGD delivers high-throughput execution across diverse precision formats, achieving 256 TFLOPS for BF16, 512 TFLOPS for FP8, 512 TOPS for INT8, and 1024 TOPS for INT4 workloads. To sustain these compute throughputs without memory bottlenecks, the chip integrates two HBM3 modules that provide an aggregate memory bandwidth of 1.5 TB/s. Furthermore, it interfaces with the host via PCIe Gen5 x16, ensuring low-latency data movement.

In addition to its core compute capabilities, RNGD seamlessly integrates into cloud-native and multi-tenant environments like Kubernetes. By implementing Single Root I/O Virtualization (SR-IOV), the system allows a single physical chip to be partitioned into 2, 4, or 8 independent NPU instances. Each virtual instance operates with strict hardware isolation, utilizing its own dedicated compute cores and private memory bandwidth to guarantee deterministic latency for co-located workloads.

For a detailed technical analysis of the TCP architecture and the RNGD implementation, please refer to the following publications:

RNGD Hardware Specification

Specification	Value
Architecture	Tensor Contraction Processor
Process Node	TSMC 5nm
Frequency	1.0 GHz
BF16	256 TFLOPS
FP8	512 TFLOPS
INT8	512 TOPS
INT4	1024 TOPS
Memory Bandwidth	HBM3 1.5TB/s
Memory Capacity	HBM3 48GB
On-Chip SRAM	256MB
Interconnect Interface	PCIe Gen5 x16
Thermal Solution	Passive
Thermal Design Power (TDP)	150W
Power Connector	12VHPWR
Form Factor	PCIe dual-slot full-height 3/4 Length
Multi-Instance Support	8
Virtualization Support	Yes
SR-IOV	8 Virtual Functions
ECC Memory Support	Yes
Secure Boot with Root of Trust	Yes

FuriosaAI RNGD

RNGD Hardware Specification

On this page