Welcome to Furiosa Docs

FuriosaAI's streamlined software stack for deep learning model inference on FuriosaAI NPUs.

Welcome! FuriosaAI offers a streamlined software stack designed for deep learning model inference on FuriosaAI NPUs. This guide covers the entire workflow for building inference applications: starting from a PyTorch model, continuing through model quantization, and ending with model serving and deployment.

Latest Release 2026.1.0rc2

Stay up to date with the newest features, improvements, and fixes in the latest release.

Quick Start with Furiosa-LLM

Furiosa-LLM is a high-performance inference engine for large language models (LLMs). This document explains how to install and use Furiosa-LLM.

Roadmap Overview

See what's ahead for FuriosaAI with our planned releases and upcoming features. Stay informed on development progress and key milestones.

Hugging Face Hub

Pre-optimized and pre-compiled models for FuriosaAI NPUs are available on the Hugging Face Hub. Check out the latest models and their capabilities.

Overview

FuriosaAI Software Stack

RNGD: RNGD hardware specification and features
Software Stack: An overview of the FuriosaAI software stack
Supported Models: A list of supported models
What's New: New features and changes in the latest release
Roadmap: The roadmap for the FuriosaAI Software Stack

Get Started

Installing Prerequisites: How to install the prerequisites for the FuriosaAI Software Stack
Upgrade Guide: How to upgrade the FuriosaAI Software Stack
Quick Start with Furiosa-LLM: How to install and use Furiosa-LLM

Furiosa-LLM

Furiosa-LLM: An introduction to Furiosa-LLM
OpenAI-Compatible Server: More details about the OpenAI-compatible server and its features
Tool Calling: Guide to tool calling with parsers and choice options
Structured Output: Guide to structured output generation
Vision-Language Models: Guide to serving vision-language models with image inputs
Prefix Caching: Guide to prefix caching for improved performance
Hybrid KV Cache: Understanding hybrid KV cache management
Data Parallel Routing: Understanding scoring-based data-parallel routing
Model Preparation: How to prepare LLMs to be served by Furiosa-LLM
Model Parallelism: Tensor, pipeline, and data parallelism in Furiosa-LLM
API Reference: The Python API reference for Furiosa-LLM
Examples: Examples of using Furiosa-LLM
Kubernetes Deployment: A guide to deploying Furiosa-LLM on Kubernetes

Cloud Native Toolkit

Cloud Native Toolkit: An overview of the Cloud Native Toolkit
Container: An overview of container support
Kubernetes: An overview of Kubernetes support
LLM-D: A guide to deploying LLM-D on Kubernetes

Device Management

Furiosa SMI CLI: A command line utility for managing FuriosaAI NPUs
Furiosa SMI Library: A library for managing FuriosaAI NPUs
Host Tuning: Guides to optimizing and tuning host PCI settings

Tutorials and Examples

FuriosaAI SDK CookBook: A collection of open-source projects for AI-driven solutions using FuriosaAI NPUs