Creating a Batch Inference Job

POST /batch-inference
curl --request POST \
  --url https://api.sutro.sh/batch-inference \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '{
  "inputs": [
    {}
  ],
  "model": "<string>",
  "system_prompt": "<string>",
  "json_schema": {},
  "sampling_params": {},
  "job_priority": 123,
  "dryrun": true,
  "random_seed_per_input": true,
  "truncate_rows": true
}'
Using the API directly is not recommended for most users; use the Python SDK instead.
Run batch inference on an array of inputs.

Request Body

inputs (array, required)
The list of inputs to run inference on.
model (string, default: "llama-3.1-8b")
The Model ID to use for inference. See Available Models for the list of available models.
system_prompt (string, default: None)
A system prompt to use for the inference. Use this parameter to provide consistent, task-specific instructions to the model. See System Prompts for more information.
json_schema (object, default: None)
If supplied, a JSON schema that the output must adhere to. Must follow the json-schema.org specification. See Structured Outputs for more information.
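As a sketch of what a structured-output request body might look like, the snippet below builds a hypothetical sentiment-classification schema (the schema itself is an illustrative example, not from this page) and combines it with the documented request parameters:

```python
import json

# Hypothetical json-schema.org schema constraining each output to an
# object with a "sentiment" string and a numeric "confidence".
sentiment_schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
        },
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}

# Request body using the parameters documented above.
payload = {
    "inputs": ["I love this product!", "The delivery was late."],
    "model": "llama-3.1-8b",
    "json_schema": sentiment_schema,
}

print(json.dumps(payload, indent=2))
```

This payload would be sent as the JSON body of the POST request, exactly like the curl example above.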
sampling_params (object, default: None)
If supplied, a dictionary of sampling parameters to use for the inference. See Sampling Parameters for more information.
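The accepted keys are documented under Sampling Parameters; as an illustrative sketch only (the key names below, such as temperature and top_p, are assumptions common to LLM APIs, not confirmed by this page):

```python
# Hypothetical sampling_params dictionary; the exact keys the API accepts
# are listed under Sampling Parameters, not confirmed here.
payload = {
    "inputs": ["Summarize the plot of Hamlet in one sentence."],
    "model": "llama-3.1-8b",
    "sampling_params": {
        "temperature": 0.7,  # assumed key: controls output randomness
        "top_p": 0.9,        # assumed key: nucleus sampling cutoff
    },
}
```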
job_priority (integer, default: 0)
The priority of the job. Currently, only priorities 0 and 1 are supported. See Job Priority for more information.
dryrun (boolean, default: false)
If true, the API returns cost estimates instead of running inference. See Cost Estimates for more information.
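For example, a large request body can be turned into a cost estimate before committing to a run by setting dryrun (the shape of the estimate response is described under Cost Estimates, not shown here):

```python
# Setting dryrun asks the API for cost estimates instead of running
# inference over all 1,000 inputs.
payload = {
    "inputs": ["Translate 'hello' to French."] * 1000,  # 1,000 example inputs
    "model": "llama-3.1-8b",
    "dryrun": True,
}
```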
random_seed_per_input (boolean, default: false)
If true, a random seed is generated for each input. This is useful for increasing diversity in outputs.
truncate_rows (boolean, default: false)
If true, any row whose token count exceeds the context window of the selected model is truncated to the maximum length that fits within the context window.

Headers

Authorization (string, required)
Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...

Response

Returns a job_id that can be used to poll for the status and results of the job.
job_id (string)
The unique identifier for the batch inference job.
{
    "job_id": "batch_job_12345"
}
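Since the endpoint returns only a job_id, results must be fetched by polling. A minimal polling sketch follows; note that this page does not document the status endpoint, so the URL path below is an assumption:

```python
import time

import requests

def poll_job(job_id, api_key, interval=10):
    """Poll until the job finishes.

    The /batch-inference/{job_id} path and the "status" field are
    hypothetical placeholders; check the jobs documentation for the
    real status endpoint and response shape.
    """
    url = f"https://api.sutro.sh/batch-inference/{job_id}"  # assumed path
    while True:
        resp = requests.get(url, headers={"Authorization": f"Key {api_key}"})
        body = resp.json()
        if body.get("status") in ("completed", "failed"):  # assumed field
            return body
        time.sleep(interval)
```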

Code Examples

import requests

response = requests.post(
    'https://api.sutro.sh/batch-inference',
    headers={
        'Authorization': 'Key YOUR_SUTRO_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'inputs': [
            'What is the capital of France?',
            'Explain quantum computing in simple terms',
            'Write a haiku about programming'
        ],
        'model': 'llama-3.1-8b',
        'system_prompt': 'You are a helpful assistant.',
        'job_priority': 0
    }
)

result = response.json()
print(f"Job created: {result['job_id']}")