Available Models

Sutro currently offers access to category-leading open-source language models. All models run at full precision (no quantization). We will continue to add models with additional modalities, capabilities, and task-specific training over time.

Text-to-Text Models

| Model Name | Model ID | Context Window (input tokens) | Modalities | Price (USD per million tokens, input and output) |
| --- | --- | --- | --- | --- |
| Llama 3.2 3B Instruct | llama-3.2-3b | 128k | Text | 0.007 |
| Llama 3.1 8B Instruct | llama-3.1-8b | 128k | Text | 0.013 |
| Llama 3.3 70B Instruct | llama-3.3-70b-8k | 8k | Text | 0.148 |
| Llama 3.3 70B Instruct | llama-3.3-70b-64k | 64k | Text | 0.149 |
| Gemma 3 4B Instruct | gemma-3-4b-it | 128k | Text | 0.01 |
| Gemma 3 27B Instruct | gemma-3-27b-it-16k | 16k | Text | 0.059 |
| Gemma 3 27B Instruct | gemma-3-27b-it-128k | 128k | Text | 0.08 |
| Qwen 3 4B Instruct | qwen-3-4b | 32k | Text | 0.059 |
| Qwen 3 32B Instruct | qwen-3-32b | 32k | Text | 0.08 |
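As a rough illustration of how the per-million-token prices translate into a job cost, here is a minimal sketch. The helper `estimate_cost_usd` is hypothetical (it is not part of any Sutro SDK), and it assumes the listed price applies equally to input and output tokens, which is how the column header reads.

```python
# Prices in USD per million tokens, copied from the text-to-text model table.
PRICE_PER_MILLION_TOKENS = {
    "llama-3.2-3b": 0.007,
    "llama-3.1-8b": 0.013,
    "llama-3.3-70b-8k": 0.148,
    "llama-3.3-70b-64k": 0.149,
    "gemma-3-4b-it": 0.01,
    "gemma-3-27b-it-16k": 0.059,
    "gemma-3-27b-it-128k": 0.08,
    "qwen-3-4b": 0.059,
    "qwen-3-32b": 0.08,
}


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of a job in USD.

    Assumes one flat price per million tokens for input and output combined.
    """
    price = PRICE_PER_MILLION_TOKENS[model_id]
    total_tokens = input_tokens + output_tokens
    return total_tokens * price / 1_000_000


# Example: 40M input tokens and 10M output tokens on llama-3.1-8b.
cost = estimate_cost_usd("llama-3.1-8b", 40_000_000, 10_000_000)
print(f"${cost:.2f}")  # prints "$0.65"
```

Actual billing may differ from this estimate, so treat the result as a planning figure only.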

Reasoning Models (text-to-text)

| Model Name | Model ID | Context Window (input tokens) | Modalities | Price (USD per million tokens, input and output) |
| --- | --- | --- | --- | --- |
| Qwen 3 4B Instruct | qwen-3-4b-thinking | 32k | Text | 0.04 |
| Qwen 3 32B Instruct | qwen-3-32b-thinking | 32k | Text | 0.08 |

Embedding Models

| Model Name | Model ID | Context Window (input tokens) | Embedding Dimension | Modalities | Price (USD per million tokens, input and output) |
| --- | --- | --- | --- | --- | --- |
| Multilingual e5 Large Instruct | multilingual-e5-large-instruct | 512 | 1024 | Text | 0.0001 |
| GTE Qwen 2 7B Instruct | gte-qwen2-7b-instruct | 32k | 3584 | Text | 0.007 |
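When choosing between the two embedding models, it can help to estimate both the job cost and the size of the resulting vectors. The sketch below uses the prices and dimensions from the embedding table; the helper `embedding_job_estimate` is hypothetical, and the storage figure assumes each vector is stored as uncompressed 32-bit floats, which the source does not state.

```python
# Dimensions and prices (USD per million tokens) copied from the embedding table.
EMBEDDING_MODELS = {
    "multilingual-e5-large-instruct": {"dimension": 1024, "price": 0.0001},
    "gte-qwen2-7b-instruct": {"dimension": 3584, "price": 0.007},
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def embedding_job_estimate(model_id: str, num_texts: int, avg_tokens_per_text: int):
    """Hypothetical helper: return (cost in USD, raw vector storage in bytes)."""
    spec = EMBEDDING_MODELS[model_id]
    total_tokens = num_texts * avg_tokens_per_text
    cost_usd = total_tokens * spec["price"] / 1_000_000
    storage_bytes = num_texts * spec["dimension"] * BYTES_PER_FLOAT32
    return cost_usd, storage_bytes


# Example: one million texts averaging 200 tokens each.
for model_id in EMBEDDING_MODELS:
    cost_usd, storage_bytes = embedding_job_estimate(model_id, 1_000_000, 200)
    print(f"{model_id}: ${cost_usd:.2f}, {storage_bytes / 1e9:.2f} GB")
```

Note that texts longer than a model's context window (512 tokens for multilingual-e5-large-instruct) would need to be split or truncated first, which this estimate does not account for.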

Custom Models

We also support custom and fine-tuned models. Please contact team@sutro.sh for more information.