PoolingParams class

Reference for the PoolingParams class, which controls pooling task behavior such as embedding normalization, prompt truncation, and output dimensionality.

class PoolingParams

API parameters for pooling models.

param truncate_prompt_tokens: int | None = None

param dimensions: int | None = None

param normalize: bool | None = True

param task: PoolingTask | None = None

Attributes

attribute truncate_prompt_tokens = truncate_prompt_tokens

Controls prompt truncation. Set to -1 to use the model's default truncation size. Set to k to keep only the last k tokens (left truncation). Set to None to disable truncation.

attribute dimensions = dimensions

Number of dimensions for the output embedding. If set, the embedding is truncated to its first N dimensions (Matryoshka Representation Learning). Must be a positive integer.

attribute normalize = normalize

Whether to normalize the embedding outputs. Only supported for embedding tasks.

attribute task = task

The pooling task type, either "embed" or "score".

Parameters

task

Type: PoolingTask | None, where PoolingTask is Literal["embed", "score"]

Default: None

Specifies the pooling task type:

"embed": For embedding generation tasks. The model outputs dense vector representations.
"score": For similarity scoring tasks. The model outputs scalar similarity scores.

This parameter is usually inferred from the model's metadata, but can be explicitly set when needed.
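The fallback described above (an explicitly set task wins, otherwise the task comes from the model's metadata) can be sketched in plain Python. The helper `resolve_task` and its `inferred` argument are hypothetical names used only for illustration. They are not part of furiosa_llm, and the library's actual resolution logic may differ.

```python
from typing import Literal, Optional, get_args

# Mirrors the documented alias; defined locally so the sketch runs without furiosa_llm.
PoolingTask = Literal["embed", "score"]


def resolve_task(explicit: Optional[str], inferred: str) -> str:
    """Hypothetical helper: prefer an explicitly set task, else the inferred one."""
    task = explicit if explicit is not None else inferred
    if task not in get_args(PoolingTask):
        raise ValueError(f"unsupported pooling task: {task!r}")
    return task


print(resolve_task(None, "embed"))     # falls back to the inferred task
print(resolve_task("score", "embed"))  # the explicit value takes precedence
```

Validating against `get_args(PoolingTask)` keeps the accepted values in one place, so the check stays in sync with the type alias.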

normalize

Type: bool | None

Default: True

Whether to normalize the embedding outputs using L2 normalization. Only applicable for embedding tasks (task="embed").

When True, the embedding vectors are normalized to unit length, which is useful for:

Cosine similarity computations
Reducing the impact of vector magnitude differences
Standardizing embeddings for downstream tasks

Example:

```python
from furiosa_llm import LLM, PoolingParams

with LLM(artifact_path="path/to/embedding/model") as llm:
    # With normalization (default)
    params_normalized = PoolingParams(normalize=True)
    outputs = llm.embed("Hello, world!", pooling_params=params_normalized)
    # Output vectors have unit length (L2 norm = 1.0)

    # Without normalization
    params_raw = PoolingParams(normalize=False)
    outputs = llm.embed("Hello, world!", pooling_params=params_raw)
    # Output vectors preserve original magnitudes
```
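To make the effect of L2 normalization concrete, the following sketch shows it on plain Python lists, without furiosa_llm. The helpers `l2_normalize`, `dot`, and `cosine_similarity` are illustrative names, not library functions. The sketch demonstrates why unit-length vectors are convenient for cosine similarity: once both vectors are normalized, a plain dot product gives the same value.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# For unit-length vectors the dot product equals the cosine similarity.
assert math.isclose(dot(a_unit, b_unit), cosine_similarity(a, b))
print(math.sqrt(dot(a_unit, a_unit)))  # L2 norm of the normalized vector
```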

truncate_prompt_tokens

Type: int | None

Default: None

Maximum number of prompt tokens to keep. If the input is longer, it is truncated from the left so that only the last truncate_prompt_tokens tokens remain. A value of -1 uses the model's default truncation size.

When None, no truncation is applied, and the input is processed up to the model's maximum sequence length.

This is particularly useful for:

Handling variable-length inputs in batch processing
Ensuring inputs fit within model constraints
Controlling computational costs for long documents

Example:

```python
from furiosa_llm import PoolingParams

# Truncate to 512 tokens
params = PoolingParams(truncate_prompt_tokens=512)
```
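The Attributes section describes three cases for this parameter: None disables truncation, -1 uses the model's default truncation size, and k keeps only the last k tokens. The sketch below illustrates those cases on a list of token IDs. The helper `apply_truncation` and its `model_default` argument are hypothetical and exist only for this illustration. Rejecting non-positive sizes other than -1 is an assumption, not documented behavior.

```python
from typing import List, Optional


def apply_truncation(token_ids: List[int],
                     truncate_prompt_tokens: Optional[int],
                     model_default: int) -> List[int]:
    """Hypothetical helper illustrating the documented truncation cases."""
    if truncate_prompt_tokens is None:
        return list(token_ids)  # truncation disabled
    k = model_default if truncate_prompt_tokens == -1 else truncate_prompt_tokens
    if k <= 0:
        # Assumption: sizes other than -1 must be positive.
        raise ValueError("truncation size must be positive")
    return list(token_ids[-k:])  # left truncation: keep the last k tokens


tokens = list(range(10))
print(apply_truncation(tokens, 4, model_default=8))     # [6, 7, 8, 9]
print(apply_truncation(tokens, -1, model_default=8))    # [2, 3, 4, 5, 6, 7, 8, 9]
print(apply_truncation(tokens, None, model_default=8))  # all ten tokens
```

Inputs shorter than the limit pass through unchanged, because slicing with `[-k:]` returns the whole list when k exceeds its length.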

dimensions

Type: int | None

Default: None

Truncates the output embedding to its first N dimensions, where N is the value of this parameter (Matryoshka Representation Learning). Must be a positive integer.

When None, the full embedding dimension from the model is returned.

This parameter is useful for:

Reducing storage requirements
Speeding up downstream similarity computations
Matching embedding dimensions for compatibility with other systems

NOTE

Not all models support dimension reduction. Check your model's capabilities before using this parameter.

Example:

```python
from furiosa_llm import PoolingParams

# Reduce to 256 dimensions
params = PoolingParams(dimensions=256)
```
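The Attributes section states that the embedding is truncated to its first N dimensions. A minimal sketch of that operation on a plain Python list follows. The helper `truncate_embedding` is hypothetical. Two parts are assumptions and not documented behavior: re-normalizing after truncation (the reference does not say how `dimensions` and `normalize` interact) and rejecting values larger than the embedding size.

```python
import math


def truncate_embedding(embedding, dimensions=None, renormalize=True):
    """Hypothetical helper: keep the first `dimensions` components."""
    if dimensions is None:
        return list(embedding)  # full embedding is returned
    if dimensions <= 0 or dimensions > len(embedding):
        # Assumption: the upper bound check is not stated in the reference.
        raise ValueError("dimensions must be between 1 and the embedding size")
    head = list(embedding[:dimensions])
    if renormalize:
        # Assumption: restore unit length after dropping components.
        norm = math.sqrt(sum(x * x for x in head))
        if norm > 0.0:
            head = [x / norm for x in head]
    return head


full = [0.5, 0.5, 0.5, 0.5]
reduced = truncate_embedding(full, dimensions=2)
print(len(reduced))  # 2
```

Truncating a unit-length vector generally leaves it shorter than unit length, which is why the sketch re-normalizes by default before the result is used for cosine similarity.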
