# Creating a Batch Inference Job

> Run batch inference on inline inputs, a dataset, a download URL, or a published Sutro Function.

<Warning>Using the API directly is not recommended for most users. Instead, we recommend using the [Python SDK](/python-sdk/setup).</Warning>

Run batch inference over a list of inputs, a dataset, or an HTTP(S) CSV/Parquet download URL.

## Using a Sutro Function as `model`

Set `model` to the published Sutro Function name and send rows whose keys match that Function's inputs.

<Note>
  Use the Function name only. Do not include a namespace, owner, or revision in `model`.

  Sutro resolves the Function namespace from the authenticated API key's user account and loads the currently published revision through that Function's `latest.json` pointer.
</Note>

When `model` is a Sutro Function name:

* object rows are validated and rendered using the Function's input fields
* string rows are treated as already-rendered prompts
* HTTP(S) CSV/Parquet download URLs are read as row objects whose columns match the Function inputs
* `system_prompt` and `json_schema` should be omitted because they come from the published Function
* request-level `sampling_params` are merged on top of the Function/runtime defaults
* dataset IDs such as `dataset-<uuid>` are not supported

<Warning>
  Only text Functions are supported through the Batch API today. Image, PDF, and other multimodal Functions are not supported here yet.
</Warning>
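
The rules above can be folded into a small client-side helper. This is a minimal sketch, not part of the Sutro API: `build_function_payload` is a hypothetical name, and it only assembles the JSON body. Validation of the rows against the Function's inputs still happens on the server.

```python
from typing import Any, Optional, Union


def build_function_payload(
    function_name: str,
    inputs: Union[list, str],
    sampling_params: Optional[dict] = None,
    job_priority: int = 0,
) -> dict:
    """Assemble a request body for a published Sutro Function (hypothetical helper)."""
    # Dataset IDs such as dataset-<uuid> are not supported for Function runs.
    if isinstance(inputs, str) and inputs.startswith("dataset-"):
        raise ValueError("dataset IDs are not supported when model is a Sutro Function")

    # system_prompt and json_schema are deliberately left out:
    # they come from the published Function.
    payload: dict[str, Any] = {
        "model": function_name,
        "inputs": inputs,
        "job_priority": job_priority,
    }
    # Only send sampling_params when the caller wants to override the defaults.
    if sampling_params:
        payload["sampling_params"] = sampling_params
    return payload


payload = build_function_payload(
    "lead-qualifier",
    [{"query": "Find cybersecurity leaders evaluating AI vendors.", "region": "APAC"}],
)
```

The resulting `payload` can be passed as the `json` argument of `requests.post`, as in the code examples further down this page.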

## Request Body

<ParamField body="inputs" type="string[]|object[]|string" required>
  Accepts one of the following input forms:

  * **Array** — an array of strings, or object rows for a Sutro Function/custom model
  * **Dataset ID** — a dataset ID such as `dataset-<uuid>`
  * **Download URL** — an HTTP(S) CSV or Parquet download URL

  Direct standalone model runs (e.g. `model="gpt-oss-20b"`) expect string rows. Sutro Function runs expect object rows whose keys match the Function inputs, already-rendered string rows, or a CSV/Parquet download URL with matching columns.
</ParamField>
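
To make the three input forms concrete, here is a rough client-side sketch that tells them apart before a request is sent. `classify_inputs` is a hypothetical helper, and the checks (a `dataset-` prefix, an `http` or `https` scheme) are assumptions drawn from the descriptions above. The server's actual detection logic is not documented here.

```python
from urllib.parse import urlparse


def classify_inputs(inputs) -> str:
    """Return 'array', 'dataset', or 'url' for a candidate `inputs` value."""
    if isinstance(inputs, list):
        return "array"
    if isinstance(inputs, str):
        # Dataset IDs look like dataset-<uuid>.
        if inputs.startswith("dataset-"):
            return "dataset"
        # Download URLs must be HTTP(S).
        if urlparse(inputs).scheme in ("http", "https"):
            return "url"
    raise ValueError("inputs must be a list, a dataset ID, or an HTTP(S) download URL")


print(classify_inputs(["What is the capital of France?"]))
print(classify_inputs("dataset-8be01234-abcd-5678-ef90-1234567890ab"))
print(classify_inputs("https://your-bucket.s3.amazonaws.com/data.parquet"))
```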

<ParamField body="column_name" type="string" default="None">
  Column name to use when `inputs` is a dataset ID or a download URL for standalone model inference.

  Dataset IDs require `column_name`, which selects the column to run. For pre-signed download URLs, `column_name` is optional; if omitted, the first column is used.

  Omit `column_name` when `model` is a Sutro Function name. In that case, Sutro matches the columns of the downloaded file against the Function's declared input fields and renders each row into the prompt format the Function expects.
</ParamField>

<ParamField body="model" type="string" default="gpt-oss-20b">
  Standalone model ID, custom model name, or published Sutro Function name.

  If the value is not an available standalone model, Sutro treats it as a Function name and resolves the correct model to use based on the Function's latest spec.
</ParamField>

<ParamField body="system_prompt" type="string" default="None">
  System prompt for standalone model batch inference.

  Omit this field when `model` is a Sutro Function name.
</ParamField>

<ParamField body="json_schema" type="object" default="None">
  Structured output schema for standalone model batch inference.

  Omit this field when `model` is a Sutro Function name.
</ParamField>

<ParamField body="sampling_params" type="object" default="None">
  Sampling parameters for the batch job. See [Sampling Parameters](/concepts/sampling-parameters).

  For Sutro Function jobs, most users should omit this and use the published defaults. If provided, these values override the Function/runtime defaults for that job.
</ParamField>
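
The override behavior can be pictured as a dictionary merge in which request-level values win. The sketch below is illustrative only: the key names and default values are placeholders rather than documented Sutro defaults, and it assumes a shallow merge, which this page does not confirm.

```python
from typing import Optional


def merge_sampling_params(defaults: dict, overrides: Optional[dict]) -> dict:
    """Shallow-merge request-level overrides on top of defaults (assumed semantics)."""
    merged = dict(defaults)        # copy so the defaults are not mutated
    merged.update(overrides or {})  # request-level values take precedence
    return merged


# Placeholder defaults, not actual Function/runtime values.
function_defaults = {"temperature": 0.7, "max_tokens": 1024}
effective = merge_sampling_params(function_defaults, {"temperature": 0.2})
print(effective)
```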

<ParamField body="job_priority" type="integer" default="0">
  Batch priority level. Priorities `0` and `1` are supported.

  Dataset IDs require priority `1`.
</ParamField>
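
A client can catch an invalid priority before submitting the job. The following sketch encodes the two rules stated above; `check_job_priority` is a hypothetical helper, not an SDK function.

```python
def check_job_priority(inputs, job_priority: int = 0) -> int:
    """Validate job_priority against the documented rules and return it."""
    if job_priority not in (0, 1):
        raise ValueError("job_priority must be 0 or 1")
    # Dataset IDs (dataset-<uuid>) are only accepted at priority 1.
    is_dataset = isinstance(inputs, str) and inputs.startswith("dataset-")
    if is_dataset and job_priority != 1:
        raise ValueError("dataset IDs require job_priority=1")
    return job_priority


check_job_priority(["Write a haiku about programming."], 0)
check_job_priority("dataset-8be01234-abcd-5678-ef90-1234567890ab", 1)
```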

<ParamField body="cost_estimate" type="boolean" default="false">
  If `true`, the API returns a cost estimate instead of running full inference. See [Cost Estimates](/concepts/cost-estimates/) for more information.
</ParamField>

<ParamField body="random_seed_per_input" type="boolean" default="false">
  If `true`, generate a random seed per input row.
</ParamField>

<ParamField body="truncate_rows" type="boolean" default="true">
  If `true`, rows that exceed the selected model's context window are truncated to fit. Truncation removes the minimum amount of text needed for the combined token count of the input text, the prompt text, and the maximum output tokens to fall below the model's context window length. If `false`, jobs containing rows that exceed the context window are marked as `FAILED`.
</ParamField>
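
The truncation budget is simple arithmetic, worked through below. This is a sketch of the calculation only: the token counts and the 8,192-token window are made-up numbers, real counts depend on the model's tokenizer, and the page does not say which part of the text Sutro trims.

```python
def tokens_to_trim(
    input_tokens: int,
    prompt_tokens: int,
    max_output_tokens: int,
    context_window: int,
) -> int:
    """Minimum number of tokens to remove so the total is below the context window."""
    total = input_tokens + prompt_tokens + max_output_tokens
    # The total must be strictly less than the window, so the largest
    # allowed total is context_window - 1.
    excess = total - (context_window - 1)
    return max(0, excess)


# Illustrative numbers: 8000 + 200 + 1000 = 9200 tokens against an 8192-token window.
print(tokens_to_trim(8000, 200, 1000, 8192))  # 1009
# A row that already fits needs no truncation.
print(tokens_to_trim(1000, 200, 1000, 8192))  # 0
```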

<ParamField body="name" type="string" default="None">
  Optional job name for metadata and experiment tracking. Maximum length is 45 characters.
</ParamField>

<ParamField body="description" type="string" default="None">
  Optional job description for metadata and experiment tracking. Maximum length is 512 characters.
</ParamField>
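
Because both fields have length limits, it can be convenient to check them locally before sending a request. A minimal sketch follows, assuming the limits count Python string characters (`len`); whether the server counts characters or bytes is not stated here. `check_job_metadata` is a hypothetical helper.

```python
from typing import Optional

MAX_NAME_LENGTH = 45
MAX_DESCRIPTION_LENGTH = 512


def check_job_metadata(name: Optional[str] = None, description: Optional[str] = None) -> dict:
    """Return the metadata fields to include in the request body, or raise if too long."""
    if name is not None and len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name exceeds {MAX_NAME_LENGTH} characters")
    if description is not None and len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    # Leave out fields that were not provided.
    fields = {"name": name, "description": description}
    return {key: value for key, value in fields.items() if value is not None}


metadata = check_job_metadata(name="lead-qualifier-smoke")
print(metadata)
```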

## Headers

<ParamField header="Authorization" type="string" required>
  Your Sutro API key, sent using the `Key` authentication scheme.

  Format: `Key YOUR_API_KEY`

  Example: `Authorization: Key sk_live_abc123...`
</ParamField>
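
One way to avoid hard-coding the key is to read it from the environment when building the headers. In this sketch, the `SUTRO_API_KEY` variable name and the `auth_headers` helper are assumptions made for illustration; only the `Key YOUR_API_KEY` header format comes from this page.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build request headers using the Key authentication scheme."""
    # SUTRO_API_KEY is an assumed variable name, not one defined by Sutro.
    key = api_key or os.environ.get("SUTRO_API_KEY")
    if not key:
        raise RuntimeError("no API key provided and SUTRO_API_KEY is not set")
    return {
        "Authorization": f"Key {key}",
        "Content-Type": "application/json",
    }


headers = auth_headers("YOUR_SUTRO_API_KEY")
print(headers["Authorization"])
```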

## Response

Returns the created job ID in both `metadata.job_id` and `results`.

<ResponseField name="metadata" type="object">
  Metadata for the created job. Contains `job_id` and `message`.
</ResponseField>

<ResponseField name="results" type="string">
  Job ID for the created batch inference job. This is the same value as `metadata.job_id`.
</ResponseField>

<ResponseExample>
  ```json Response theme={null}
  {
    "metadata": {
      "job_id": "job-12345678-1234-1234-1234-1234567890ab",
      "message": "Job created successfully"
    },
    "results": "job-12345678-1234-1234-1234-1234567890ab"
  }
  ```
</ResponseExample>
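
Since the job ID appears in two places, a client can read it from `results` and treat a mismatch with `metadata.job_id` as a sign that something is wrong. The sketch below operates on an already-parsed response body. `extract_job_id` is a hypothetical helper, and error response shapes are not covered because this page does not describe them.

```python
def extract_job_id(body: dict) -> str:
    """Return the job ID from a parsed response body, checking both locations agree."""
    job_id = body.get("results")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("response does not contain a job ID in 'results'")
    metadata_id = (body.get("metadata") or {}).get("job_id")
    # The two values are documented to be identical.
    if metadata_id is not None and metadata_id != job_id:
        raise ValueError("metadata.job_id does not match results")
    return job_id


sample_body = {
    "metadata": {
        "job_id": "job-12345678-1234-1234-1234-1234567890ab",
        "message": "Job created successfully",
    },
    "results": "job-12345678-1234-1234-1234-1234567890ab",
}
print(extract_job_id(sample_body))
```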

## Code Examples

### Standalone model with array inputs

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "gpt-oss-20b",
          "inputs": [
              "What is the capital of France?",
              "Explain quantum computing in simple terms.",
              "Write a haiku about programming.",
          ],
          "system_prompt": "You are a helpful assistant.",
          "job_priority": 0,
      },
  )

  result = response.json()
  print(f"Job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-oss-20b",
      "inputs": [
        "What is the capital of France?",
        "Explain quantum computing in simple terms.",
        "Write a haiku about programming."
      ],
      "system_prompt": "You are a helpful assistant.",
      "job_priority": 0
    }'
  ```
</CodeGroup>

### Published Sutro Function with object rows

Replace `lead-qualifier` and the input field names with your published Function name and schema.

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "lead-qualifier",
          "inputs": [
              {
                  "query": "Find cybersecurity leaders evaluating AI vendors.",
                  "region": "APAC",
              },
              {
                  "query": "Find sales operations leaders replacing manual enrichment.",
                  "region": "EMEA",
              },
          ],
          "job_priority": 0,
          "name": "lead-qualifier-smoke",
      },
  )

  result = response.json()
  print(f"Job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "lead-qualifier",
      "inputs": [
        {
          "query": "Find cybersecurity leaders evaluating AI vendors.",
          "region": "APAC"
        },
        {
          "query": "Find sales operations leaders replacing manual enrichment.",
          "region": "EMEA"
        }
      ],
      "job_priority": 0,
      "name": "lead-qualifier-smoke"
    }'
  ```
</CodeGroup>

### Published Sutro Function with a download URL

The CSV or Parquet file must contain columns matching the Function inputs. For this example, the file contains `query` and optionally `region`.

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "lead-qualifier",
          "inputs": "https://your-bucket.s3.amazonaws.com/leads.parquet?X-Amz-Algorithm=...",
          "job_priority": 1,
          "name": "lead-qualifier-file-run",
      },
  )

  result = response.json()
  print(f"Job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "lead-qualifier",
      "inputs": "https://your-bucket.s3.amazonaws.com/leads.parquet?X-Amz-Algorithm=...",
      "job_priority": 1,
      "name": "lead-qualifier-file-run"
    }'
  ```
</CodeGroup>

### Standalone model with download URL input

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "gpt-oss-20b",
          "inputs": "https://your-bucket.s3.amazonaws.com/data.parquet?X-Amz-Algorithm=...",
          "column_name": "prompt",
          "system_prompt": "You are a helpful assistant.",
          "job_priority": 1,
      },
  )

  result = response.json()
  print(f"Job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-oss-20b",
      "inputs": "https://your-bucket.s3.amazonaws.com/data.parquet?X-Amz-Algorithm=...",
      "column_name": "prompt",
      "system_prompt": "You are a helpful assistant.",
      "job_priority": 1
    }'
  ```
</CodeGroup>

### Standalone model with dataset input

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "gpt-oss-20b",
          "inputs": "dataset-8be01234-abcd-5678-ef90-1234567890ab",
          "column_name": "prompt",
          "system_prompt": "You are a helpful assistant.",
          "job_priority": 1,
      },
  )

  result = response.json()
  print(f"Job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-oss-20b",
      "inputs": "dataset-8be01234-abcd-5678-ef90-1234567890ab",
      "column_name": "prompt",
      "system_prompt": "You are a helpful assistant.",
      "job_priority": 1
    }'
  ```
</CodeGroup>

### Cost estimate

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://api.sutro.sh/batch-inference",
      headers={
          "Authorization": "Key YOUR_SUTRO_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "lead-qualifier",
          "inputs": [
              {
                  "query": "Find cybersecurity leaders evaluating AI vendors.",
                  "region": "APAC",
              }
          ],
          "job_priority": 0,
          "cost_estimate": True,
      },
  )

  result = response.json()
  print(f"Estimate job created: {result['results']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.sutro.sh/batch-inference \
    -H "Authorization: Key YOUR_SUTRO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "lead-qualifier",
      "inputs": [
        {
          "query": "Find cybersecurity leaders evaluating AI vendors.",
          "region": "APAC"
        }
      ],
      "job_priority": 0,
      "cost_estimate": true
    }'
  ```
</CodeGroup>
