Scoring (Similarity Scoring)
Use the LLM.score() method to calculate similarity scores between text pairs for binary classification and reranker models.
This example applies to binary classification models, including Qwen3-Reranker models and models converted using as_binary_seq_cls_model.
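The three input shapes used on this page (1-to-1, 1-to-N, and N-to-N) differ only in how queries and documents are paired. The sketch below illustrates that pairing in plain Python. `expand_pairs` is a hypothetical helper written for this page and is not part of furiosa_llm; raising an error on unequal list lengths is an assumption, not documented behavior.

```python
from typing import List, Tuple, Union

TextInput = Union[str, List[str]]


def expand_pairs(text_1: TextInput, text_2: TextInput) -> List[Tuple[str, str]]:
    """Hypothetical helper: list the (query, document) pairs a score call covers."""
    left = [text_1] if isinstance(text_1, str) else list(text_1)
    right = [text_2] if isinstance(text_2, str) else list(text_2)

    if len(left) == 1:
        # 1-to-1 and 1-to-N: the single query is paired with every document.
        return [(left[0], doc) for doc in right]
    if len(left) != len(right):
        # Assumption: unequal list lengths are treated as an error.
        raise ValueError("N-to-N scoring needs lists of equal length")
    # N-to-N: pair element-wise, not as a cross product.
    return list(zip(left, right))


print(expand_pairs("q", ["d1", "d2"]))
print(expand_pairs(["q1", "q2"], ["d1", "d2"]))
```

The point of the sketch is that N-to-N scoring yields one score per paired query and document, so three queries and three documents produce three scores rather than nine.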
Python API Example
The following example demonstrates 1-to-1, 1-to-N, and N-to-N scoring, followed by input truncation with PoolingParams:
from furiosa_llm import LLM, PoolingParams

# Load a reranker or binary classification model
with LLM("furiosa-ai/Qwen3-Reranker-8B") as llm:
    # ============================================================
    # Example 1: 1-to-1 scoring (single query, single document)
    # ============================================================
    query = "What is machine learning?"
    document = "Machine learning is a subset of artificial intelligence."
    outputs = llm.score(query, document)
    print(f"Similarity score: {outputs[0].outputs.score}")
    print("-" * 80)

    # ============================================================
    # Example 2: 1-to-N scoring (single query, multiple documents)
    # ============================================================
    query = "What is deep learning?"
    documents = [
        "Deep learning uses neural networks with multiple layers.",
        "Python is a popular programming language.",
        "Machine learning is a field of artificial intelligence.",
        "Neural networks are inspired by the human brain.",
    ]
    outputs = llm.score(query, documents)
    for i, output in enumerate(outputs):
        print(f"Document {i}: score = {output.outputs.score:.4f}")
        print(f"  Text: {documents[i][:50]}...")
    print("-" * 80)

    # ============================================================
    # Example 3: N-to-N scoring (multiple queries, paired documents)
    # ============================================================
    queries = [
        "What is Python?",
        "What is JavaScript?",
        "What is SQL?",
    ]
    documents = [
        "Python is a programming language.",
        "JavaScript is used for web development.",
        "SQL is a database query language.",
    ]
    outputs = llm.score(queries, documents)
    for i, (q, d, output) in enumerate(zip(queries, documents, outputs)):
        print(f"Pair {i}: score = {output.outputs.score:.4f}")
        print(f"  Query: {q}")
        print(f"  Document: {d}")
    print("-" * 80)

    # ============================================================
    # Example 4: Using PoolingParams for truncation
    # ============================================================
    # Truncate long documents to fit within model limits
    pooling_params = PoolingParams(truncate_prompt_tokens=512)
    query = "What is the capital of France?"
    long_documents = [
        "Paris is the capital and most populous city of France. " * 50,  # Long document
        "London is the capital of the United Kingdom. " * 50,
    ]
    outputs = llm.score(query, long_documents, pooling_params=pooling_params)
    for i, output in enumerate(outputs):
        print(f"Document {i} score: {output.outputs.score:.4f}")
Use Cases
The LLM.score() method is useful for:
- Document Retrieval: Finding the most relevant documents for a query
- Semantic Similarity: Measuring how similar two pieces of text are
- Question Answering: Identifying which document best answers a question
- Duplicate Detection: Finding similar or duplicate content
- Content Recommendation: Suggesting related articles or documents
To rank multiple documents by relevance, see the Rerank API example.
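For the document retrieval and question answering cases, the scores from a 1-to-N call can also be sorted directly. The following is a minimal sketch, assuming the scores have already been collected into a list (for example from output.outputs.score as in the Python API example). `rank_by_score` is a hypothetical helper, and the score values are made-up placeholders, not model output.

```python
from typing import List, Sequence, Tuple


def rank_by_score(
    documents: Sequence[str], scores: Sequence[float], top_k: int = 3
) -> List[Tuple[float, str]]:
    """Hypothetical helper: return the top_k (score, document) pairs, best first."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    # Sort on the score only, so ties keep their original document order.
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


documents = [
    "Deep learning uses neural networks with multiple layers.",
    "Python is a popular programming language.",
    "Machine learning is a field of artificial intelligence.",
    "Neural networks are inspired by the human brain.",
]
# Placeholder values for illustration only; real values come from the model.
scores = [0.91, 0.03, 0.47, 0.62]

for score, text in rank_by_score(documents, scores, top_k=2):
    print(f"{score:.4f}  {text}")
```

Sorting client-side like this suits small candidate sets; the Rerank API example covers the dedicated ranking path.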
Server API Example
You can also use the scoring functionality through the OpenAI-compatible server:
import os

import requests

# Start server with: furiosa-llm serve path/to/reranker-model
base_url = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")

# 1-to-N scoring via HTTP API
response = requests.post(
    f"{base_url}/score",
    json={
        "model": "reranker",
        "text_1": "What is machine learning?",
        "text_2": [
            "Machine learning is a subset of AI.",
            "Python is a programming language.",
            "Deep learning uses neural networks.",
        ],
    },
)
data = response.json()
for item in data["data"]:
    print(f"Index {item['index']}: score = {item['score']:.4f}")
See Score API Reference for complete server API documentation.
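When the response is consumed by other code rather than printed, it can help to turn it into a plain list of scores aligned with the text_2 inputs. The sketch below does that using only the "data", "index", and "score" fields that appear in the server example. `scores_in_input_order` is a hypothetical helper, and the sample payload is hand-written for illustration, not captured from a real server.

```python
from typing import Any, Dict, List


def scores_in_input_order(payload: Dict[str, Any]) -> List[float]:
    """Hypothetical helper: return scores ordered by each item's 'index' field."""
    items = payload.get("data")
    if not isinstance(items, list):
        raise ValueError("response has no 'data' list")
    # Sort by index so position i in the result matches text_2[i].
    ordered = sorted(items, key=lambda item: item["index"])
    return [float(item["score"]) for item in ordered]


# Hand-written payload for illustration; the values are placeholders.
sample = {"data": [{"index": 1, "score": 0.02}, {"index": 0, "score": 0.97}]}
print(scores_in_input_order(sample))
```

In a real client, the argument would be the result of response.json(), ideally after checking the HTTP status of the response.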