Documentation Index
Fetch the complete documentation index at: https://docs.sutro.sh/llms.txt
Use this file to discover all available pages before exploring further.
Using the API directly is not recommended for most users. Instead, we recommend using the Python SDK.
Run batch inference over a list of inputs, a dataset, or an HTTP(S) CSV/Parquet download URL.
Using a Sutro Function as model
Set model to the published Sutro Function name and send rows whose keys match that Function’s inputs.
Use the Function name only. Do not include a namespace, owner, or revision in model. Sutro resolves the Function namespace from the authenticated API key’s user account and loads the currently published revision through that Function’s latest.json pointer.
When model is a Sutro Function name:
- object rows are validated and rendered using the Function’s input fields
- string rows are treated as already-rendered prompts
- HTTP(S) CSV/Parquet download URLs are read as row objects whose columns match the Function inputs
- system_prompt and json_schema should be omitted because they come from the published Function
- request-level sampling_params are merged on top of the Function/runtime defaults
- dataset IDs such as dataset-<uuid> are not supported
Only text Functions are supported through the Batch API today. Image, PDF, and other multimodal Functions are not supported here yet.
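As a sketch, the row rules above can be checked client-side before submitting a Function job. The field names query and region mirror the lead-qualifier examples later on this page and are assumptions; substitute your own Function’s schema.

```python
# Hypothetical client-side check for Sutro Function batch rows:
# object rows must match the Function's declared input fields,
# while string rows pass through as already-rendered prompts.
REQUIRED_FIELDS = {"query"}   # assumed schema; replace with your Function's fields
OPTIONAL_FIELDS = {"region"}

def validate_rows(rows):
    for i, row in enumerate(rows):
        if isinstance(row, str):
            continue  # already-rendered prompt, sent as-is
        missing = REQUIRED_FIELDS - row.keys()
        unknown = row.keys() - (REQUIRED_FIELDS | OPTIONAL_FIELDS)
        if missing or unknown:
            raise ValueError(
                f"row {i}: missing={sorted(missing)}, unknown={sorted(unknown)}"
            )
    return rows
```

Catching schema mismatches locally keeps a single bad row from failing an otherwise valid batch submission.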
Request Body
inputs
string[]|object[]|string
required
Accepts one of the following input forms:
- Array — an array of strings, or object rows for a Sutro Function/custom model
- Dataset ID — a dataset ID such as dataset-<uuid>
- Download URL — an HTTP(S) CSV or Parquet download URL
Direct standalone model runs (i.e. model="gpt-oss-20b") expect string rows. Sutro Function runs expect object rows whose keys match the Function inputs, already-rendered string rows, or a CSV/Parquet download URL with matching columns.
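The three accepted forms can be told apart mechanically before a request is sent; a minimal sketch (the dataset-ID pattern is inferred from the dataset-<uuid> format shown above):

```python
import re

# Matches the dataset-<uuid> form shown above (pattern inferred from the docs).
DATASET_ID_RE = re.compile(r"^dataset-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")

def classify_inputs(inputs):
    """Return which of the three accepted input forms `inputs` takes."""
    if isinstance(inputs, list):
        return "array"  # string rows, or object rows for a Function/custom model
    if isinstance(inputs, str):
        if DATASET_ID_RE.match(inputs):
            return "dataset_id"
        if inputs.startswith(("http://", "https://")):
            return "download_url"
    raise ValueError("inputs must be an array, a dataset ID, or an HTTP(S) URL")
```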
column_name
string
Column name to use when inputs is a dataset ID or a download URL for standalone model inference. Dataset IDs require a column_name to be passed indicating which column to use. For pre-signed download URLs, column_name selects the column to run; if omitted, the first column is used. Omit column_name when model is a Sutro Function name. Instead, data sent via download URLs is matched against the Function’s declared input fields and then templated into the right string format by Sutro.
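For standalone runs over a CSV download, the selection rule above (use column_name if given, otherwise the first column) can be illustrated client-side. This is a hypothetical sketch of the documented behavior, not the server implementation:

```python
import csv
import io

def select_column(csv_text, column_name=None):
    # Mirror the documented rule: use column_name when provided,
    # otherwise fall back to the file's first column.
    reader = csv.DictReader(io.StringIO(csv_text))
    field = column_name or reader.fieldnames[0]
    if field not in reader.fieldnames:
        raise KeyError(f"column {field!r} not found in {reader.fieldnames}")
    return [row[field] for row in reader]
```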
model
string
default:"gpt-oss-20b"
Standalone model ID, custom model name, or published Sutro Function name. If the value is not an available standalone model, Sutro treats it as a Function name and resolves the correct model to use based on the Function’s latest spec.
system_prompt
string
System prompt for standalone model batch inference. Omit this field when model is a Sutro Function name.
json_schema
object
Structured output schema for standalone model batch inference. Omit this field when model is a Sutro Function name.
sampling_params
object
Sampling parameters for the batch job. See Sampling Parameters. For Sutro Function jobs, most users should omit this and use the published defaults. If provided, these values override the Function/runtime defaults for that job.
job_priority
integer
Batch priority level. Priorities 0 and 1 are supported. Dataset IDs require priority 1.
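Both constraints above can be checked before submitting; a small sketch:

```python
def check_job_priority(inputs, job_priority):
    # Only priorities 0 and 1 are supported, and dataset-ID inputs
    # require priority 1 (both rules from the docs above).
    if job_priority not in (0, 1):
        raise ValueError("job_priority must be 0 or 1")
    if isinstance(inputs, str) and inputs.startswith("dataset-") and job_priority != 1:
        raise ValueError("dataset ID inputs require job_priority=1")
    return job_priority
```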
cost_estimate
boolean
If True, the API will return cost estimates instead of running full inference. See Cost Estimates for more information.
If true, generate a random seed per input row.
If true, rows that exceed the selected model’s context window are truncated to fit.
name
string
Optional job name for metadata and experiment tracking. Maximum length is 45 characters.
Optional job description for metadata and experiment tracking. Maximum length is 512 characters.
Your Sutro API key using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...
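The header format above can be built once and reused across requests; a minimal helper:

```python
def auth_headers(api_key):
    # Build the headers required by the Key authentication scheme described above.
    return {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json",
    }
```

Passing the result as headers= to each requests call avoids repeating the scheme string by hand.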
Response
Returns the created job ID in both metadata.job_id and results.
Metadata for the created job. Contains job_id and message.
Job ID for the created batch inference job. This is the same value as metadata.job_id.
{
"metadata": {
"job_id": "job-12345678-1234-1234-1234-1234567890ab",
"message": "Job created successfully"
},
"results": "job-12345678-1234-1234-1234-1234567890ab"
}
Code Examples
Standalone model with string inputs
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "gpt-oss-20b",
"inputs": [
"What is the capital of France?",
"Explain quantum computing in simple terms.",
"Write a haiku about programming.",
],
"system_prompt": "You are a helpful assistant.",
"job_priority": 0,
},
)
result = response.json()
print(f"Job created: {result['results']}")
Published Sutro Function with object rows
Replace lead-qualifier and the input field names with your published Function name and schema.
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "lead-qualifier",
"inputs": [
{
"query": "Find cybersecurity leaders evaluating AI vendors.",
"region": "APAC",
},
{
"query": "Find sales operations leaders replacing manual enrichment.",
"region": "EMEA",
},
],
"job_priority": 0,
"name": "lead-qualifier-smoke",
},
)
result = response.json()
print(f"Job created: {result['results']}")
Published Sutro Function with a download URL
The CSV or Parquet file must contain columns matching the Function inputs. For this example, the file contains query and optionally region.
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "lead-qualifier",
"inputs": "https://your-bucket.s3.amazonaws.com/leads.parquet?X-Amz-Algorithm=...",
"job_priority": 1,
"name": "lead-qualifier-file-run",
},
)
result = response.json()
print(f"Job created: {result['results']}")
Standalone model with a download URL
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "gpt-oss-20b",
"inputs": "https://your-bucket.s3.amazonaws.com/data.parquet?X-Amz-Algorithm=...",
"column_name": "prompt",
"system_prompt": "You are a helpful assistant.",
"job_priority": 1,
},
)
result = response.json()
print(f"Job created: {result['results']}")
Standalone model with a dataset ID
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "gpt-oss-20b",
"inputs": "dataset-8be01234-abcd-5678-ef90-1234567890ab",
"column_name": "prompt",
"system_prompt": "You are a helpful assistant.",
"job_priority": 1,
},
)
result = response.json()
print(f"Job created: {result['results']}")
Cost estimate
import requests
response = requests.post(
"https://api.sutro.sh/batch-inference",
headers={
"Authorization": "Key YOUR_SUTRO_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "lead-qualifier",
"inputs": [
{
"query": "Find cybersecurity leaders evaluating AI vendors.",
"region": "APAC",
}
],
"job_priority": 0,
"cost_estimate": True,
},
)
result = response.json()
print(f"Estimate job created: {result['results']}")