Using the API directly is not recommended for most users. Instead, we recommend using the Python SDK.
Request Body
- The list of inputs to run inference on.
- The Model ID to use for inference. See Available Models for a list of available models.
- A system prompt to use for the inference. Use this parameter to provide consistent, task-specific instructions to the model. See System Prompts for more information.
- If supplied, a JSON schema that the output must adhere to. Must follow the json-schema.org specification. See Structured Outputs for more information.
- If supplied, a dictionary of sampling parameters to use for the inference. See Sampling Parameters for more information.
- The priority of the job. Currently, only priorities 0 and 1 are supported. See Job Priority for more information.
- If True, the API returns cost estimates instead of running inference. See Cost Estimates for more information.
- If True, a random seed is generated for each input. This is useful for increasing diversity in outputs.
- If True, any rows whose token count exceeds the context window of the selected model are truncated to the maximum length that fits within the context window.
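Putting the fields above together, a request body might look like the following sketch. The field names here are illustrative assumptions (the extracted reference above lists descriptions without parameter names); consult the full API reference for the exact keys.

```python
import json

# Hypothetical request body for a batch inference job.
# NOTE: all key names below are assumptions for illustration only.
payload = {
    "inputs": ["Summarize the following article.", "Translate this to French."],
    "model": "example-model-id",          # see Available Models
    "system_prompt": "You are a concise assistant.",  # see System Prompts
    "output_schema": {                     # optional JSON Schema (json-schema.org)
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
    "sampling_params": {"temperature": 0.7},  # see Sampling Parameters
    "job_priority": 1,                     # only 0 and 1 are supported
    "cost_estimate": False,                # True returns cost estimates instead
    "random_seed_per_input": True,         # random seed per input for diversity
    "truncate_rows": True,                 # truncate rows exceeding the context window
}

# Serialize to JSON for the HTTP request body.
body = json.dumps(payload)
print(body[:40])
```

Setting the cost-estimate flag to True before a large run lets you check pricing without consuming inference capacity.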
Headers
Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...
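The header format above can be built with a small helper. This is a sketch, not part of the SDK; only the Authorization format (`Key YOUR_API_KEY`) comes from the documentation, and the Content-Type header is an assumption for a JSON request body.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers using Sutro's Key authentication scheme."""
    return {
        "Authorization": f"Key {api_key}",  # format documented above
        "Content-Type": "application/json",  # assumption: JSON request body
    }

# Example with a placeholder key:
print(auth_headers("sk_live_abc123")["Authorization"])
```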
Response
Returns a job_id that can be used to poll for the status and results of the job.

job_id: The unique identifier for the batch inference job.
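Extracting the job_id from the response is straightforward. The response body below is a fabricated example for illustration; only the presence of a job_id field is stated by the documentation.

```python
import json

# Hypothetical response body -- only the job_id field is documented.
response_text = '{"job_id": "job_abc123"}'

# Keep the job_id to poll for the job's status and results later.
job_id = json.loads(response_text)["job_id"]
print(job_id)
```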