Documentation Index

Fetch the complete documentation index at: https://docs.sutro.sh/llms.txt

Use this file to discover all available pages before exploring further.

Cost Estimates

A significant benefit of batch inference is lower cost and transparent pricing, since the inputs are known in advance. We aim to make pricing predictable enough that you know how much a batch job will cost before running it.

Understanding Pricing

We charge based on the number of input and output tokens that successfully complete inference. Our pricing page lists a blended average cost per million tokens for each model, combining input and output tokens weighted by typical usage patterns. Note that output tokens are generally more expensive than input tokens, and we charge a different rate for each.
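As a sketch of how per-token pricing works, the helper below computes a job cost from separate input and output rates. The rates and token counts are hypothetical placeholders, not Sutro's actual prices; see the pricing page for real per-model rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate job cost in dollars from token counts and per-million-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical example: 10M input tokens at $0.10/M plus 2M output tokens at $0.40/M.
cost = estimate_cost(10_000_000, 2_000_000, 0.10, 0.40)
print(f"${cost:.2f}")  # $1.80
```

Because output tokens carry the higher rate, jobs that generate long completions cost more than the blended average would suggest.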

Using the Dry Run Feature

To get a cost estimate for a batch job, set dry_run=True in the Python SDK, or, if you call the HTTP API directly, set cost_estimate=true in the request body. Instead of running the inference, the API returns an estimated cost for the job. Dry runs are free, so we recommend getting a cost estimate before launching large jobs.
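As an illustration, an HTTP request body for a dry run might be built like this. Every field except cost_estimate is a placeholder assumption, since the exact request schema isn't shown here; only the cost_estimate flag comes from the documentation above.

```python
import json

# Hypothetical request body: "model" and "inputs" are assumed field names,
# not the documented schema. "cost_estimate" is the documented dry-run flag.
body = {
    "model": "example-model",     # assumed field
    "inputs": ["Hello, world!"],  # assumed field
    "cost_estimate": True,        # documented: return an estimate instead of running inference
}
payload = json.dumps(body)
print(payload)
```

The SDK equivalent would pass dry_run=True to the job-submission call instead of setting this flag manually.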