How to Use Sampling Parameters
To use custom sampling parameters, pass a dictionary of your desired parameters to the `sampling_params` argument in your SDK call.
Each model has a set of default sampling parameters that are recommended by the model creator for best performance. When you provide your own dictionary, it is merged with these defaults, and any values you specify will always take precedence.
Example of Overriding Defaults
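Let’s assume the model’s default parameters are:
- `temperature`: 0.6
- `top_k`: 20
- `max_tokens`: 32768

If you override `temperature` and `max_tokens` in your own dictionary, the merged parameters used for the request are:
- `temperature`: 0.9
- `max_tokens`: 512
- `top_k`: 20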
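The merge behavior described above can be sketched in plain Python. This is a minimal illustration of the precedence rule only, not the SDK's actual implementation; the `merge_sampling_params` helper and the `MODEL_DEFAULTS` name are hypothetical, and the values are those from the example.

```python
# Hypothetical model defaults, using the values from the example above.
MODEL_DEFAULTS = {"temperature": 0.6, "top_k": 20, "max_tokens": 32768}


def merge_sampling_params(defaults, overrides=None):
    """Return a new dict in which caller-supplied values take precedence."""
    merged = dict(defaults)          # copy so the defaults are never mutated
    merged.update(overrides or {})   # caller values overwrite matching keys
    return merged


# The dictionary a caller would pass as sampling_params.
user_params = {"temperature": 0.9, "max_tokens": 512}
final_params = merge_sampling_params(MODEL_DEFAULTS, user_params)
print(final_params)
# {'temperature': 0.9, 'top_k': 20, 'max_tokens': 512}
```

Because `top_k` is not in the caller's dictionary, it keeps its default value of 20.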
Parameter Reference
We support any vLLM-compatible sampling parameters. See the vLLM `SamplingParams` class for a complete list of valid parameters. Note: we do not set defaults for every parameter in `vllm.SamplingParams`. For any parameter without a default listed here, the value falls back to the vLLM default.
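The fallback order can be pictured as three layers: your values first, then the model defaults, then vLLM's own defaults. The sketch below uses `collections.ChainMap` to show that lookup order. It is an illustration only: the `resolve` helper is hypothetical, `MODEL_DEFAULTS` reuses the example values from earlier, and `VLLM_DEFAULTS` is a small assumed subset that you should verify against `vllm.SamplingParams` for your installed version.

```python
from collections import ChainMap

# Assumed subset of vLLM's built-in defaults; verify against your vLLM version.
VLLM_DEFAULTS = {"presence_penalty": 0.0, "frequency_penalty": 0.0, "min_p": 0.0}

# Hypothetical model-level defaults (values from the override example).
MODEL_DEFAULTS = {"temperature": 0.6, "top_k": 20, "max_tokens": 32768}


def resolve(user_params):
    """Resolve each parameter from the first layer that defines it."""
    # ChainMap searches its maps left to right: user, then model, then vLLM.
    return dict(ChainMap(user_params, MODEL_DEFAULTS, VLLM_DEFAULTS))


resolved = resolve({"temperature": 0.9})
print(resolved["temperature"])       # supplied by the caller
print(resolved["top_k"])             # supplied by the model defaults
print(resolved["presence_penalty"])  # supplied by the vLLM defaults
```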
Default Parameters by Model Family
Different model families have different recommended default parameters. The base configurations are listed below. Each configuration is based on the settings recommended by the lab that created the model, e.g. Qwen3 14B > Sampling Parameters.
Llama Family
Parameter | Default Value |
---|---|
temperature | 0.75 |
top_p | 1 |
max_tokens | 4096 |
repetition_penalty | 1.0 |
Qwen 3 Family
The Qwen 3 family has different defaults depending on whether the model is used for standard generation (Non-Thinking) or for tasks that require reasoning (Thinking).
Non-Thinking Defaults
Parameter | Default Value |
---|---|
temperature | 0.7 |
top_p | 0.8 |
top_k | 20 |
max_tokens | 4096 |
repetition_penalty | 1.0 |
Thinking Defaults
Parameter | Default Value |
---|---|
temperature | 0.6 |
top_p | 0.95 |
top_k | 20 |
max_tokens | 4096 |
repetition_penalty | 1.0 |
Certain Qwen Mixture-of-Experts (MoE) models use a higher `max_tokens` default. The following models default to 16,384:
- qwen-3-30b-a3b
- qwen-3-235b-a22b
- qwen-3-30b-a3b-thinking
- qwen-3-235b-a22b-thinking
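If you need to know which `max_tokens` default applies to a given Qwen 3 model, the rule above reduces to a set lookup. This is a sketch under the assumption that the four model identifiers listed are matched exactly; the `default_max_tokens` helper and the `qwen-3-14b` identifier in the usage line are hypothetical.

```python
# Qwen 3 MoE models documented as defaulting to a higher max_tokens.
HIGHER_MAX_TOKENS_MODELS = {
    "qwen-3-30b-a3b",
    "qwen-3-235b-a22b",
    "qwen-3-30b-a3b-thinking",
    "qwen-3-235b-a22b-thinking",
}

QWEN3_BASE_MAX_TOKENS = 4096    # value from the Qwen 3 default tables
QWEN3_MOE_MAX_TOKENS = 16384    # value for the MoE models listed above


def default_max_tokens(model_name):
    """Return the documented max_tokens default for a Qwen 3 model name."""
    if model_name in HIGHER_MAX_TOKENS_MODELS:
        return QWEN3_MOE_MAX_TOKENS
    return QWEN3_BASE_MAX_TOKENS


print(default_max_tokens("qwen-3-30b-a3b"))  # 16384
print(default_max_tokens("qwen-3-14b"))      # 4096 (hypothetical identifier)
```

An explicit `max_tokens` in your own dictionary still takes precedence over either default.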
Gemma Family
Parameter | Default Value |
---|---|
temperature | 0.95 |
top_p | 0.95 |
top_k | 64 |
max_tokens | 4096 |
repetition_penalty | 1.0 |