FuriosaAI RNGD


RNGD is FuriosaAI's second-generation Neural Processing Unit (NPU) designed to accelerate deep learning inference. The NPU natively supports a broad spectrum of workloads, including high-performance Large Language Models (LLMs), multi-modal models, and vision-based neural networks.


RNGD implements the Tensor Contraction Processor (TCP), an architecture that executes tensor contractions, the core mathematical operations underlying modern deep learning, directly in hardware. This design is intended to maximize both compute utilization and energy efficiency. Fabricated on TSMC's 5nm process node and clocked at 1.0 GHz, RNGD delivers peak throughput of 256 TFLOPS for BF16, 512 TFLOPS for FP8, 512 TOPS for INT8, and 1024 TOPS for INT4. To help sustain this throughput without stalling on memory, the chip integrates two HBM3 modules that provide an aggregate memory bandwidth of 1.5 TB/s. It connects to the host over PCIe Gen5 x16 for low-latency data movement.
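To make the term "tensor contraction" concrete, the following is a minimal pure-Python sketch of one such operation: a batched matrix multiplication, where products are summed over a shared index `k`. The helper name `contract_bik_bkj` is hypothetical and chosen only for illustration; it says nothing about how the TCP hardware or the Furiosa software stack implements contractions.

```python
# Illustrative sketch only: a tensor contraction sums products over a shared index.
# `contract_bik_bkj` is a hypothetical helper, not part of any FuriosaAI API.

def contract_bik_bkj(a, b):
    """Return C with C[b][i][j] = sum over k of A[b][i][k] * B[b][k][j].

    `a` and `b` are nested lists shaped (batch, i, k) and (batch, k, j).
    """
    out = []
    for a_mat, b_mat in zip(a, b):
        k_len = len(b_mat)      # size of the contracted index k
        j_len = len(b_mat[0])   # size of the free index j
        out.append([
            [sum(row[k] * b_mat[k][j] for k in range(k_len)) for j in range(j_len)]
            for row in a_mat
        ])
    return out


a = [[[1, 2], [3, 4]]]  # shape (batch=1, i=2, k=2)
b = [[[5, 6], [7, 8]]]  # shape (batch=1, k=2, j=2)
c = contract_bik_bkj(a, b)
print(c)  # [[[19, 22], [43, 50]]]
```

Matrix multiplication, batched attention scores, and convolutions can all be written in this "sum over shared indices" form, which is why a processor built around contractions covers a wide range of deep learning operators.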

In addition to its core compute capabilities, RNGD seamlessly integrates into cloud-native and multi-tenant environments like Kubernetes. By implementing Single Root I/O Virtualization (SR-IOV), the system allows a single physical chip to be partitioned into 2, 4, or 8 independent NPU instances. Each virtual instance operates with strict hardware isolation, utilizing its own dedicated compute cores and private memory bandwidth to guarantee deterministic latency for co-located workloads.
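As a rough illustration of what partitioning could mean for each instance, the sketch below divides chip-level figures by the instance count. The even split is an assumption made here for illustration; this page does not state how cores, memory capacity, or bandwidth are actually apportioned. The chip-level numbers are taken from the specification table on this page, and the function name `per_instance_share` is hypothetical.

```python
# Chip-level figures from the specification table on this page.
CHIP = {
    "bf16_tflops": 256.0,
    "hbm_capacity_gb": 48.0,
    "hbm_bandwidth_gb_per_s": 1500.0,  # 1.5 TB/s, using decimal units
}

# Partition sizes named in the text above.
SUPPORTED_PARTITIONS = (2, 4, 8)


def per_instance_share(num_instances):
    """Estimate per-instance resources.

    ASSUMPTION: resources are split evenly across instances. This is an
    illustrative model, not a documented guarantee.
    """
    if num_instances not in SUPPORTED_PARTITIONS:
        raise ValueError(f"unsupported partition count: {num_instances}")
    return {name: value / num_instances for name, value in CHIP.items()}


for n in SUPPORTED_PARTITIONS:
    print(n, per_instance_share(n))
```

Under this even-split assumption, an 8-way partition would give each instance about 32 TFLOPS of BF16 compute, 6 GB of HBM3, and 187.5 GB/s of memory bandwidth.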

For a detailed technical analysis of the TCP architecture and the RNGD implementation, please refer to the following publications:

RNGD Hardware Specification

| Specification | Value |
|---|---|
| Architecture | Tensor Contraction Processor |
| Process Node | TSMC 5nm |
| Frequency | 1.0 GHz |
| BF16 | 256 TFLOPS |
| FP8 | 512 TFLOPS |
| INT8 | 512 TOPS |
| INT4 | 1024 TOPS |
| Memory Bandwidth | HBM3, 1.5 TB/s |
| Memory Capacity | HBM3, 48 GB |
| On-Chip SRAM | 256 MB |
| Interconnect Interface | PCIe Gen5 x16 |
| Thermal Solution | Passive |
| Thermal Design Power (TDP) | 150 W |
| Power Connector | 12VHPWR |
| Form Factor | PCIe dual-slot, full-height, 3/4 length |
| Multi-Instance Support | 8 |
| Virtualization Support | Yes |
| SR-IOV | 8 Virtual Functions |
| ECC Memory Support | Yes |
| Secure Boot with Root of Trust | Yes |
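
The peak-throughput and bandwidth figures above can be combined into two derived numbers that are often useful when reasoning about performance: operations per clock cycle, and the roofline "ridge point" (operations per byte of HBM traffic needed before a kernel stops being bandwidth-bound). The sketch below is simple arithmetic on the published peaks, assuming decimal units (1 TB = 10^12 bytes); the roofline framing and the function names are additions for illustration, not figures published by FuriosaAI.

```python
# Peak figures copied from the specification table (operations per second).
PEAK_OPS_PER_S = {
    "BF16": 256e12,
    "FP8": 512e12,
    "INT8": 512e12,
    "INT4": 1024e12,
}
HBM_BANDWIDTH_BYTES_PER_S = 1.5e12  # 1.5 TB/s, assuming decimal terabytes
CLOCK_HZ = 1.0e9                    # 1.0 GHz


def ridge_point(peak_ops_per_s, bandwidth_bytes_per_s):
    """Operations per byte at which peak compute and peak bandwidth balance."""
    return peak_ops_per_s / bandwidth_bytes_per_s


def ops_per_cycle(peak_ops_per_s, clock_hz):
    """Operations the chip must complete each clock cycle to reach its peak."""
    return peak_ops_per_s / clock_hz


for fmt, peak in PEAK_OPS_PER_S.items():
    rp = ridge_point(peak, HBM_BANDWIDTH_BYTES_PER_S)
    opc = ops_per_cycle(peak, CLOCK_HZ)
    print(f"{fmt}: {rp:.1f} ops/byte, {opc:,.0f} ops/cycle")
```

For BF16 this works out to roughly 171 operations per byte and 256,000 operations per cycle. A kernel with lower arithmetic intensity than the ridge point would be limited by memory bandwidth rather than compute under this simple model.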
