Welcome to Furiosa Docs
Welcome! FuriosaAI offers a streamlined software stack designed for deep learning model inference on FuriosaAI NPUs. This guide covers the entire workflow for creating inference applications, from a PyTorch model through model quantization to model serving and deployment.
Latest Release 2026.1.0rc2
Stay up to date with the newest features, improvements, and fixes in the latest release.
Quick Start with Furiosa-LLM
Furiosa-LLM is a high-performance inference engine for large language models (LLMs). This guide explains how to install and use Furiosa-LLM.
Roadmap Overview
See what's ahead for FuriosaAI with our planned releases and upcoming features. Stay informed on development progress and key milestones.
Hugging Face Hub
Pre-optimized and pre-compiled models for FuriosaAI NPUs are available on the Hugging Face Hub. Check out the latest models and their capabilities.
Overview

- RNGD: RNGD hardware specifications and features
- Software Stack: An overview of the FuriosaAI software stack
- Supported Models: A list of supported models
- What's New: New features and changes in the latest release
- Roadmap: The future roadmap of the FuriosaAI Software Stack
Get Started
- Installing Prerequisites: How to install the prerequisites for the FuriosaAI Software Stack
- Upgrade Guide: How to upgrade the FuriosaAI Software Stack
- Quick Start with Furiosa-LLM: How to install and use Furiosa-LLM
Furiosa-LLM
- Furiosa-LLM: An introduction to Furiosa-LLM
- OpenAI-Compatible Server: More details about the OpenAI-compatible server and its features
- Tool Calling: Guide to tool calling with parsers and choice options
- Structured Output: Guide to structured output generation
- Vision-Language Models: Guide to serving Vision-Language models with image inputs
- Prefix Caching: Guide to prefix caching for improved performance
- Hybrid KV Cache: Understanding hybrid KV cache management
- Data Parallel Routing: Understanding scoring-based data-parallel routing
- Model Preparation: How to prepare LLM models to be served by Furiosa-LLM
- Model Parallelism: Tensor, pipeline, and data parallelism in Furiosa-LLM
- API Reference: The Python API reference for Furiosa-LLM
- Examples: Examples of using Furiosa-LLM
- Kubernetes Deployment: A guide to deploying Furiosa-LLM on Kubernetes
Cloud Native Toolkit
- Cloud Native Toolkit: An overview of the Cloud Native Toolkit
- Container: An overview of container support
- Kubernetes: An overview of Kubernetes support
- LLM-D: A guide to deploying LLM-D on Kubernetes
Device Management
- Furiosa SMI CLI: A command line utility for managing FuriosaAI NPUs
- Furiosa SMI Library: A library for managing FuriosaAI NPUs
- Host Tuning: Guides to tuning host PCI settings for optimal performance
Tutorials and Examples
- FuriosaAI SDK CookBook: A collection of open-source projects that build AI-driven solutions on FuriosaAI NPUs