To customize sampling, pass a dictionary to the `sampling_params` argument in your SDK call.
Each model has a set of default sampling parameters that are recommended by the model creator for best performance. When you provide your own dictionary, it is merged with these defaults, and any values you specify will always take precedence.
For example, suppose a model's defaults are:

* `temperature`: 0.6
* `top_k`: 20
* `max_tokens`: 32768

and you pass:

* `temperature`: 0.9
* `max_tokens`: 512

The request then runs with `temperature: 0.9` and `max_tokens: 512`, while keeping the default `top_k: 20`. If a parameter is set neither by you nor by the model creator's defaults but exists in `vllm.SamplingParams`, the value used falls back to the vLLM default.
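The merge behavior described above can be sketched in a few lines of Python. `DEFAULTS` and `merge_sampling_params` are illustrative names for this sketch, not part of any SDK:

```python
# Model creator's recommended defaults (values from the example above).
DEFAULTS = {"temperature": 0.6, "top_k": 20, "max_tokens": 32768}

def merge_sampling_params(user_params: dict) -> dict:
    """Merge user-supplied sampling params over the defaults.

    Any key you specify always takes precedence; unspecified keys
    keep the model creator's recommended value.
    """
    return {**DEFAULTS, **user_params}

merged = merge_sampling_params({"temperature": 0.9, "max_tokens": 512})
# merged == {"temperature": 0.9, "top_k": 20, "max_tokens": 512}
```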
| Parameter | Default Value |
| --- | --- |
| temperature | 0.75 |
| top_p | 1 |
| max_tokens | 4096 |
| repetition_penalty | 1.0 |
Defaults differ depending on whether the model is used in standard mode (Non-Thinking) or for tasks that require reasoning (Thinking).
Non-Thinking Defaults
| Parameter | Default Value |
| --- | --- |
| temperature | 0.7 |
| top_p | 0.8 |
| top_k | 20 |
| max_tokens | 4096 |
| repetition_penalty | 1.0 |
Thinking Defaults

| Parameter | Default Value |
| --- | --- |
| temperature | 0.6 |
| top_p | 0.95 |
| top_k | 20 |
| max_tokens | 4096 |
| repetition_penalty | 1.0 |
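As a usage sketch, here is how a request might combine the Thinking defaults from the table above with a single override. The request shape, field names, and model slug below are illustrative, not a specific SDK's API:

```python
# Thinking defaults from the table above; only max_tokens is overridden.
sampling_params = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 8192,  # raised from the 4096 default
    "repetition_penalty": 1.0,
}

# Hypothetical request body; adapt to your SDK's actual call signature.
request_body = {
    "model": "qwen-3-235b-a22b-thinking",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "sampling_params": sampling_params,
}
```

Because your dictionary is merged over the defaults, listing the unchanged values is optional; passing only `{"max_tokens": 8192}` would have the same effect.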
Some models override the `max_tokens` default. They are as follows:

Defaults to 16,384:

* qwen-3-30b-a3b
* qwen-3-235b-a22b
* qwen-3-30b-a3b-thinking
* qwen-3-235b-a22b-thinking
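One way to capture these per-model exceptions in client code is a small lookup helper. This is a hypothetical sketch; only the model slugs and the 16,384 / 4,096 values come from the list above:

```python
# Models whose max_tokens default is 16,384 instead of 4,096 (from the list above).
HIGH_MAX_TOKENS_MODELS = {
    "qwen-3-30b-a3b",
    "qwen-3-235b-a22b",
    "qwen-3-30b-a3b-thinking",
    "qwen-3-235b-a22b-thinking",
}

def default_max_tokens(model: str) -> int:
    """Return the default max_tokens budget for a given model slug."""
    return 16_384 if model in HIGH_MAX_TOKENS_MODELS else 4_096
```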
| Parameter | Default Value |
| --- | --- |
| temperature | 0.95 |
| top_p | 0.95 |
| top_k | 64 |
| max_tokens | 4096 |
| repetition_penalty | 1.0 |