Cost Estimates#

A significant benefit of batch inference is lower cost and transparent pricing: because all inputs are known in advance, we can tell you how much a batch job will cost before you run it.

Understanding Pricing#

We charge based on the number of input and output tokens that successfully complete inference. Our pricing page lists an average cost per million tokens for each model, blending input and output tokens weighted by typical usage patterns. Note, however, that output tokens are generally more expensive than input tokens, and each is billed at its own rate.
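The arithmetic behind an estimate is straightforward when you bill input and output tokens at separate rates. Here is a minimal sketch; the per-token rates and the example numbers are hypothetical placeholders, so check the pricing page for the real figures:

```python
# Hypothetical per-million-token rates -- see the pricing page for real values.
INPUT_RATE_PER_M = 0.50    # $ per 1M input tokens (placeholder)
OUTPUT_RATE_PER_M = 1.50   # $ per 1M output tokens (placeholder)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate job cost from token counts billed at separate rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: 10M input tokens and 2M expected output tokens
print(f"${estimate_cost(10_000_000, 2_000_000):.2f}")  # -> $8.00
```

In practice your input token count is exact (the inputs are known in advance), while the output count is a projection, which is why the dry run feature below is the more reliable way to get an estimate.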

Using the Dry Run Feature#

To get a cost estimate for a batch job, set the dry_run parameter to True in the SDK. Instead of running inference, the API returns an estimated cost for the job. Dry runs are free, so we recommend performing one before submitting a job so you know the cost up front. A sketch of this flow follows.
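The snippet below illustrates the dry run flow. The client class, method, and response field names (BatchClient, create_batch_job, estimated_cost_usd) are hypothetical stand-ins, since this page does not name them; consult the SDK reference for the actual API. Only the dry_run parameter itself comes from the text above.

```python
from my_batch_sdk import BatchClient  # hypothetical SDK module

client = BatchClient()

# With dry_run=True, no inference runs and nothing is billed;
# the API responds with a cost estimate instead of results.
estimate = client.create_batch_job(
    model="example-model",        # hypothetical model name
    input_file="requests.jsonl",  # batch of prepared requests
    dry_run=True,
)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.2f}")

# Once the estimate looks acceptable, submit the same job for real:
# job = client.create_batch_job(model="example-model",
#                               input_file="requests.jsonl")
```

Because the dry run is free, running it as a routine pre-flight step costs nothing and prevents surprises on large jobs.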