Using the API directly is not recommended for most users. Instead, we recommend using the Python SDK.
Request Body
- The list of inputs to run inference on.
- The Model ID to use for inference. See Available Models for a list of available models.
- A system prompt to use for the inference. Use this parameter to provide consistent, task-specific instructions to the model. See System Prompts for more information.
- If supplied, a JSON schema that the output must adhere to. Must follow the json-schema.org specification. See Structured Outputs for more information.
- If supplied, a dictionary of sampling parameters to use for the inference. See Sampling Parameters for more information.
- The priority of the job. Currently, only priorities 0 and 1 are supported. See Job Priority for more information.
- If True, the API returns cost estimates instead of running inference. See Cost Estimates for more information.
- If True, a random seed is generated for each input. This is useful for increasing diversity in outputs.
- If True, any rows whose token count exceeds the context window of the selected model are truncated to the maximum length that fits within the context window.
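Putting the fields above together, a request body might look like the following sketch. The field names here are illustrative assumptions (the extracted reference above lists descriptions without parameter names); consult the full API reference for the exact keys.

```python
import json

# Hypothetical request body for a batch inference job.
# NOTE: all key names below are assumptions for illustration only.
payload = {
    "inputs": ["Summarize the following article.", "Translate this to French."],
    "model": "example-model-id",          # see Available Models
    "system_prompt": "You are a concise assistant.",  # see System Prompts
    "output_schema": {                     # optional JSON Schema (json-schema.org)
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
    "sampling_params": {"temperature": 0.7},  # see Sampling Parameters
    "job_priority": 1,                     # only 0 and 1 are supported
    "cost_estimate": False,                # True returns cost estimates instead
    "random_seed_per_input": True,         # random seed per input for diversity
    "truncate_rows": True,                 # truncate rows exceeding the context window
}

# Serialize to JSON for the HTTP request body.
body = json.dumps(payload)
print(body[:40])
```

Setting the cost-estimate flag to True before a large run lets you check pricing without consuming inference capacity.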
Headers
Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...
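The header format above can be built with a small helper. This is a sketch, not part of the SDK; only the Authorization format (`Key YOUR_API_KEY`) comes from the documentation, and the Content-Type header is an assumption for a JSON request body.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers using Sutro's Key authentication scheme."""
    return {
        "Authorization": f"Key {api_key}",  # format documented above
        "Content-Type": "application/json",  # assumption: JSON request body
    }

# Example with a placeholder key:
print(auth_headers("sk_live_abc123")["Authorization"])
```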
Response
Returns a job_id that can be used to poll for the status and results of the job.

job_id: The unique identifier for the batch inference job.
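Extracting the job_id from the response is straightforward. The response body below is a fabricated example for illustration; only the presence of a job_id field is stated by the documentation.

```python
import json

# Hypothetical response body -- only the job_id field is documented.
response_text = '{"job_id": "job_abc123"}'

# Keep the job_id to poll for the job's status and results later.
job_id = json.loads(response_text)["job_id"]
print(job_id)
```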