Available Models
Sutro currently offers access to category-leading open-source language models. All models currently run at full precision (no quantization). We will continue to add models with more modalities, capabilities, and task-specific training over time.
Text-to-Text Models
| Model Name | Model ID | Context Window (input tokens) | Modalities | Price (USD/Million tokens, input and output) |
|---|---|---|---|---|
| | llama-3.2-3b | 128k | Text | 0.007 |
| | llama-3.1-8b | 128k | Text | 0.013 |
| | llama-3.3-70b-8k | 8k | Text | 0.148 |
| | llama-3.3-70b-64k | 64k | Text | 0.149 |
| | gemma-3-4b-it | 128k | Text | 0.01 |
| | gemma-3-27b-it-16k | 16k | Text | 0.059 |
| | gemma-3-27b-it-128k | 128k | Text | 0.08 |
| | qwen-3-4b | 32k | Text | 0.059 |
| | qwen-3-32b | 32k | Text | 0.08 |
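
The per-million-token prices above can be turned into a rough job estimate. The following is a minimal sketch, not part of any Sutro SDK: the function name `estimate_cost_usd` and the dictionary are hypothetical, and it assumes the listed rate applies equally to input and output tokens, which is one reading of the "input and output" column header.

```python
# Prices copied from the Text-to-Text table, in USD per million tokens.
PRICE_PER_MILLION_TOKENS = {
    "llama-3.2-3b": 0.007,
    "llama-3.1-8b": 0.013,
    "llama-3.3-70b-8k": 0.148,
    "llama-3.3-70b-64k": 0.149,
    "gemma-3-4b-it": 0.01,
    "gemma-3-27b-it-16k": 0.059,
    "gemma-3-27b-it-128k": 0.08,
    "qwen-3-4b": 0.059,
    "qwen-3-32b": 0.08,
}


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a job in USD (hypothetical helper).

    Assumption: input and output tokens are billed at the same listed rate.
    """
    price = PRICE_PER_MILLION_TOKENS[model_id]  # raises KeyError for unknown IDs
    total_tokens = input_tokens + output_tokens
    return total_tokens * price / 1_000_000


if __name__ == "__main__":
    # Example: 10,000 prompts averaging 500 input and 200 output tokens each.
    cost = estimate_cost_usd("llama-3.1-8b", 10_000 * 500, 10_000 * 200)
    print(f"Estimated cost: ${cost:.4f}")
```

Actual token counts depend on each model's tokenizer, so treat the result as an approximation rather than a quote.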
Reasoning Models (text-to-text)
| Model Name | Model ID | Context Window (input tokens) | Modalities | Price (USD/Million tokens, input and output) |
|---|---|---|---|---|
| | qwen-3-4b-thinking | 32k | Text | 0.04 |
| | qwen-3-32b-thinking | 32k | Text | 0.08 |
Embedding Models
| Model Name | Model ID | Context Window (input tokens) | Embedding Dimension | Modalities | Price (USD/Million tokens, input and output) |
|---|---|---|---|---|---|
| | multilingual-e5-large-instruct | 512 | 1024 | Text | 0.0001 |
| | gte-qwen2-7b-instruct | 32k | 3584 | Text | 0.007 |
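
The embedding dimension affects how much space the resulting vectors take once stored, so it can help to size a job before running it. The sketch below is illustrative only: `estimate_embedding_job` is a hypothetical helper, and the 4 bytes per value figure assumes the caller stores vectors as 32-bit floats, which this page does not specify.

```python
# Dimensions and prices copied from the Embedding Models table.
EMBEDDING_MODELS = {
    "multilingual-e5-large-instruct": {"dimension": 1024, "price": 0.0001},
    "gte-qwen2-7b-instruct": {"dimension": 3584, "price": 0.007},
}


def estimate_embedding_job(
    model_id: str,
    num_texts: int,
    avg_tokens_per_text: int,
    bytes_per_value: int = 4,  # assumption: vectors stored as float32
) -> tuple[float, int]:
    """Return (estimated cost in USD, raw vector storage in bytes)."""
    spec = EMBEDDING_MODELS[model_id]
    total_tokens = num_texts * avg_tokens_per_text
    cost_usd = total_tokens * spec["price"] / 1_000_000
    storage_bytes = num_texts * spec["dimension"] * bytes_per_value
    return cost_usd, storage_bytes


if __name__ == "__main__":
    cost, size = estimate_embedding_job("gte-qwen2-7b-instruct", 1_000_000, 100)
    print(f"Estimated cost: ${cost:.2f}, raw storage: {size / 1e9:.2f} GB")
```

The storage figure covers raw vector values only; index structures and metadata in a vector store add overhead on top.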
Custom Models
We also support custom and fine-tuned models. Please contact team@sutro.sh for more information.