Creating a Batch Inference Job

POST /batch-inference
curl --request POST \
  --url https://api.sutro.sh/batch-inference \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "inputs": {},
  "column_name": "<string>",
  "model": "<string>",
  "system_prompt": "<string>",
  "json_schema": {},
  "sampling_params": {},
  "job_priority": 123,
  "cost_estimate": true,
  "random_seed_per_input": true,
  "truncate_rows": true,
  "name": "<string>",
  "description": "<string>"
}
'
{
  "metadata": {
    "job_id": "job-12345678-1234-1234-1234-1234567890ab",
    "message": "Job created successfully"
  },
  "results": "job-12345678-1234-1234-1234-1234567890ab"
}


Using the API directly is not recommended for most users; we suggest the Python SDK instead.
Run batch inference over a list of inputs, a dataset, or an HTTP(S) CSV/Parquet download URL.

Using a Sutro Function as model

Set model to the published Sutro Function name and send rows whose keys match that Function’s inputs.
Use the Function name only. Do not include a namespace, owner, or revision in model. Sutro resolves the Function namespace from the authenticated API key’s user account and loads the currently published revision through that Function’s latest.json pointer.
When model is a Sutro Function name:
  • object rows are validated and rendered using the Function’s input fields
  • string rows are treated as already-rendered prompts
  • HTTP(S) CSV/Parquet download URLs are read as row objects whose columns match the Function inputs
  • system_prompt and json_schema should be omitted because they come from the published Function
  • request-level sampling_params are merged on top of the Function/runtime defaults
  • dataset IDs such as dataset-<uuid> are not supported
Only text Functions are supported through the Batch API today. Image, PDF, and other multimodal Functions are not supported here yet.
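Per the rules above, string rows sent to a Function run are treated as already-rendered prompts, so no templating happens server-side. A minimal sketch of that case (lead-qualifier is a hypothetical Function name, reused from the code examples later on this page):

```python
import requests

# String rows are passed through as already-rendered prompts, so
# system_prompt and json_schema are omitted for Function runs.
payload = {
    "model": "lead-qualifier",  # hypothetical published Function name
    "inputs": [
        "Query: Find cybersecurity leaders evaluating AI vendors.\nRegion: APAC",
    ],
    "job_priority": 0,
}

def submit(api_key: str) -> str:
    """POST the payload and return the created job ID."""
    response = requests.post(
        "https://api.sutro.sh/batch-inference",
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
    )
    response.raise_for_status()
    return response.json()["results"]

# job_id = submit("YOUR_SUTRO_API_KEY")
```

Compare this with the object-row example further down, where each row is a dict validated against the Function’s input fields instead of a pre-rendered string.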

Request Body

inputs
string[]|object[]|string
required
Accepts one of the following input forms:
  • Array — an array of strings, or object rows for a Sutro Function/custom model
  • Dataset ID — a dataset ID such as dataset-<uuid>
  • Download URL — an HTTP(S) CSV or Parquet download URL
Direct standalone model runs (e.g. model="gpt-oss-20b") expect string rows. Sutro Function runs expect object rows whose keys match the Function inputs, already-rendered string rows, or a CSV/Parquet download URL with matching columns.
column_name
string
default:"None"
Column name to use when inputs is a dataset ID or a download URL for standalone model inference. Dataset IDs require a column_name indicating which column to use. For pre-signed download URLs, column_name selects the column to run; if omitted, the first column is used. Omit column_name when model is a Sutro Function name: data sent via download URLs is instead matched against the Function’s declared input fields and templated into the final prompt string by Sutro.
model
string
default:"gpt-oss-20b"
Standalone model ID, custom model name, or published Sutro Function name. If the value is not an available standalone model, Sutro treats it as a Function name and resolves the correct model to use based on the Function’s latest spec.
system_prompt
string
default:"None"
System prompt for standalone model batch inference. Omit this field when model is a Sutro Function name.
json_schema
object
default:"None"
Structured output schema for standalone model batch inference. Omit this field when model is a Sutro Function name.
sampling_params
object
default:"None"
Sampling parameters for the batch job. See Sampling Parameters. For Sutro Function jobs, most users should omit this and use the published defaults. If provided, these values override the Function/runtime defaults for that job.
job_priority
integer
default:"0"
Batch priority level. Priorities 0 and 1 are supported. Dataset IDs require priority 1.
cost_estimate
boolean
default:"false"
If true, the API returns cost estimates instead of running full inference. See Cost Estimates for more information.
random_seed_per_input
boolean
default:"false"
If true, generate a random seed per input row.
truncate_rows
boolean
default:"false"
If true, rows that exceed the selected model’s context window are truncated to fit.
name
string
default:"None"
Optional job name for metadata and experiment tracking. Maximum length is 45 characters.
description
string
default:"None"
Optional job description for metadata and experiment tracking. Maximum length is 512 characters.
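The boolean flags above can be combined in a single request body. A sketch with illustrative values, enabling per-input random seeds and row truncation (both default to false when omitted):

```python
# Illustrative request body for /batch-inference; post it exactly as
# shown in the code examples below.
payload = {
    "model": "gpt-oss-20b",
    "inputs": ["Summarize the following report: ..."],
    "system_prompt": "You are a helpful assistant.",
    "random_seed_per_input": True,  # fresh random seed for every input row
    "truncate_rows": True,          # clip rows exceeding the context window
    "job_priority": 0,
}
```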

Headers

Authorization
string
required
Your Sutro API key using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...

Response

Returns the created job ID in both metadata.job_id and results.
metadata
object
Metadata for the created job. Contains job_id and message.
results
string
Job ID for the created batch inference job. This is the same value as metadata.job_id.
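Reading the created job ID from a parsed response body (sample values match the documented response):

```python
# Parsed JSON response from a successful job creation. The job ID is
# duplicated in metadata.job_id and results, so either field works.
body = {
    "metadata": {
        "job_id": "job-12345678-1234-1234-1234-1234567890ab",
        "message": "Job created successfully",
    },
    "results": "job-12345678-1234-1234-1234-1234567890ab",
}

job_id = body["results"]
assert job_id == body["metadata"]["job_id"]
print(f"Job created: {job_id}")
```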

Code Examples

Standalone model with array inputs

import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-oss-20b",
        "inputs": [
            "What is the capital of France?",
            "Explain quantum computing in simple terms.",
            "Write a haiku about programming.",
        ],
        "system_prompt": "You are a helpful assistant.",
        "job_priority": 0,
    },
)

result = response.json()
print(f"Job created: {result['results']}")

Published Sutro Function with object rows

Replace lead-qualifier and the input field names with your published Function name and schema.
import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "lead-qualifier",
        "inputs": [
            {
                "query": "Find cybersecurity leaders evaluating AI vendors.",
                "region": "APAC",
            },
            {
                "query": "Find sales operations leaders replacing manual enrichment.",
                "region": "EMEA",
            },
        ],
        "job_priority": 0,
        "name": "lead-qualifier-smoke",
    },
)

result = response.json()
print(f"Job created: {result['results']}")

Published Sutro Function with a download URL

The CSV or Parquet file must contain columns matching the Function inputs. For this example, the file contains query and optionally region.
import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "lead-qualifier",
        "inputs": "https://your-bucket.s3.amazonaws.com/leads.parquet?X-Amz-Algorithm=...",
        "job_priority": 1,
        "name": "lead-qualifier-file-run",
    },
)

result = response.json()
print(f"Job created: {result['results']}")

Standalone model with download URL input

import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-oss-20b",
        "inputs": "https://your-bucket.s3.amazonaws.com/data.parquet?X-Amz-Algorithm=...",
        "column_name": "prompt",
        "system_prompt": "You are a helpful assistant.",
        "job_priority": 1,
    },
)

result = response.json()
print(f"Job created: {result['results']}")

Standalone model with dataset input

import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-oss-20b",
        "inputs": "dataset-8be01234-abcd-5678-ef90-1234567890ab",
        "column_name": "prompt",
        "system_prompt": "You are a helpful assistant.",
        "job_priority": 1,
    },
)

result = response.json()
print(f"Job created: {result['results']}")

Cost estimate

import requests

response = requests.post(
    "https://api.sutro.sh/batch-inference",
    headers={
        "Authorization": "Key YOUR_SUTRO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "lead-qualifier",
        "inputs": [
            {
                "query": "Find cybersecurity leaders evaluating AI vendors.",
                "region": "APAC",
            }
        ],
        "job_priority": 0,
        "cost_estimate": True,
    },
)

result = response.json()
print(f"Estimate job created: {result['results']}")