# Batch Inference

> Reference documentation for Batch Inference.

## Running batch inference

```python
infer(self, data, model='gpt-oss-20b', column=None, output_column='inference_result', job_priority=0, output_schema=None, system_prompt=None, dry_run=False, stay_attached=None, truncate_rows=True)
```

Run LLM inference on a large list, table, dataframe, or file.

**Parameters:**

* `data` (Union\[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on: a list of inputs, a pandas or Polars DataFrame, or a file path.

* `model` (Union\[str, List\[str]], optional): The model(s) to use for inference. Defaults to “gpt-oss-20b”. If a list of models is passed, inference runs in parallel for each model, with `stay_attached=False`.

* `name` (Union\[str, List\[str]], optional): A job name for experiment/metadata tracking purposes. If using a list of models, you must pass a list of names with length equal to the number of models, or None. Defaults to None.

* `description` (Union\[str, List\[str]], optional): A job description for experiment/metadata tracking purposes. If using a list of models, you must pass a list of descriptions with length equal to the number of models, or None. Defaults to None.

* `column` (Union\[str, List\[str]], optional): The name of the column to use as model input. Required if `data` is a DataFrame or file path. If a list is supplied, its entries are concatenated into a single input column; the list may contain separator strings in addition to column names.

* `output_column` (str, optional): The name of the column in which to store inference results when the input is a DataFrame. Defaults to “inference\_result”.

* `job_priority` (int, optional): The [priority](/concepts/job-priority) of the job. Default is 0.

* `output_schema` (Union\[Dict\[str, Any], BaseModel], optional): A structured schema for the output, given either as a dictionary representing a JSON schema or as a Pydantic `BaseModel`. Defaults to None.

* `system_prompt` (str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.

* `sampling_params` (dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.

* `random_seed_per_input` (bool, optional): If True, a random seed is generated for each input, which is useful for increasing diversity across outputs. Defaults to False.

* `dry_run` (bool, optional): If True, return cost estimates instead of running inference. Default is False.

* `stay_attached` (bool, optional): If True, the SDK stays attached to the job and reports its status and results as they become available. Defaults to None, which resolves to True for priority 0 jobs and False for priority 1 jobs.

* `truncate_rows` (bool, optional): If True, any row whose token count exceeds the selected model's context window is truncated to the maximum length that fits within it. Defaults to True.
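
The list form of `column` can be illustrated with a small sketch. `build_input` is a hypothetical helper, not part of the SDK, and it assumes one plausible reading of the rule: list entries that name a column are replaced by that row's value, and any other entry is inserted verbatim as a separator string.

```python
from typing import Mapping, Sequence, Union


def build_input(row: Mapping[str, str], column: Union[str, Sequence[str]]) -> str:
    """Hypothetical: build the model input for one row from `column`.

    Assumption (not confirmed by the SDK docs): entries that match a column
    name are replaced by that column's value; every other entry is treated
    as a literal separator string.
    """
    if isinstance(column, str):
        return str(row[column])
    parts = []
    for entry in column:
        # Column names become values; anything else is kept as a separator.
        parts.append(str(row[entry]) if entry in row else entry)
    return "".join(parts)


row = {"title": "Quarterly report", "body": "Revenue grew 12%."}
combined = build_input(row, ["title", "\n\n", "body"])
```

Under this reading, a separator that happens to equal a column name would be treated as a column, so distinctive separators are the safer choice.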

**Returns:**

str: The ID of the inference job. If `dry_run=True`, cost estimates are returned instead.
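
The documented rules for lists of models, per-model `name`/`description` lists, and the `stay_attached` default can be summarized in a sketch. `plan_jobs` is hypothetical and does not call the SDK. It also assumes two things the docs leave open: priorities other than 0 default to detached, and a list of models always runs detached even if `stay_attached=True` is passed.

```python
from typing import Dict, List, Optional, Sequence, Union


def plan_jobs(
    model: Union[str, Sequence[str]] = "gpt-oss-20b",
    name: Optional[Union[str, List[str]]] = None,
    description: Optional[Union[str, List[str]]] = None,
    job_priority: int = 0,
    stay_attached: Optional[bool] = None,
) -> List[Dict[str, object]]:
    """Hypothetical sketch: expand infer-style arguments into one config per job."""
    multi = not isinstance(model, str)
    models = list(model) if multi else [model]

    def per_model(value, label):
        # None applies to every job; with a list of models, lists must match in length.
        if value is None:
            return [None] * len(models)
        if multi:
            if not isinstance(value, list) or len(value) != len(models):
                raise ValueError(f"`{label}` must be None or a list with one entry per model")
            return value
        return [value]

    names = per_model(name, "name")
    descriptions = per_model(description, "description")

    if multi:
        attached = False  # assumption: a list of models always runs detached
    elif stay_attached is None:
        # Documented default: True for priority 0, False for priority 1
        # (assumption: any other priority is also treated as detached).
        attached = job_priority == 0
    else:
        attached = stay_attached

    return [
        {"model": m, "name": n, "description": d,
         "job_priority": job_priority, "stay_attached": attached}
        for m, n, d in zip(models, names, descriptions)
    ]


# Two placeholder model names, one job name per model.
jobs = plan_jobs(["model-a", "model-b"], name=["run-a", "run-b"])
```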

## Monitoring job status

```python
attach(self, job_id: str)
```

Attach to an existing job and stream its progress in real time. This is equivalent to setting `stay_attached=True` when calling `infer(...)`.

This method connects to a running job and displays a live progress bar, including the number of rows processed and token statistics, until the job completes.

**Parameters:**

* `job_id` (str): The ID of the job to attach to.

**Returns:** None

**Job Status Behavior:**

* `RUNNING`: Streams progress updates with a live progress bar and job statistics
* `SUCCEEDED`: Notifies you that the job has already completed and suggests using `sutro jobs results`
* `FAILED`: Displays failure message and exits
* `CANCELLED`: Displays cancellation message and exits
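
The status handling listed above amounts to a simple dispatch on the job's status. The sketch below only illustrates that behavior; `JobStatus` and `attach_action` are hypothetical names, not part of the SDK.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical enum; the member names match the documented statuses.
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def attach_action(status: JobStatus) -> str:
    """Describe what `attach` is documented to do for a job in `status`."""
    if status is JobStatus.RUNNING:
        return "stream progress until the job finishes"
    if status is JobStatus.SUCCEEDED:
        # Nothing left to stream; point the user at the results command.
        return "report completion and suggest `sutro jobs results`"
    # FAILED and CANCELLED both print a message and exit immediately.
    return f"report {status.value.lower()} and exit"
```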

**Example:**

```python
# Attach to a running job to monitor its progress
sutro.attach("job_12345")
# The progress bar displays output like:
# Progress: 45%[████████████████                ] 450/1000 [00:32<00:45] Input tokens processed: 12500, Tokens generated: 8300, Total tokens/s: 325.4
```

**Note:** This method is ideal for monitoring long-running jobs interactively. For programmatic use cases where you don't want live progress updates, use the simpler `await_job_completion()` instead.

## Awaiting job completion

```python
await_job_completion(self, job_id: str, timeout: int | None = 7200) -> list | None
```

When running jobs as part of a pipeline (Dagster, Airflow, etc.), you may not need to see a job's progress as it happens. `await_job_completion` is designed for this use case, and should only be used when you are not already using the `stay_attached` parameter of `infer(...)` or the `attach(...)` function.

Wait for a job to complete and return its results if it succeeds.

This method polls the job status every 5 seconds, printing each status, until the job completes, fails, is cancelled, or the timeout is reached.

**Parameters:**

* `job_id` (str): The ID of the job to await.
* `timeout` (Optional\[int]): Maximum time in seconds to wait for job completion. Defaults to 7200 (2 hours).

**Returns:** list | None: The results of the job if it completes successfully, or `None` if the job fails, is cancelled, or encounters an error.

**Job Status Outcomes:**

* `SUCCEEDED`: Returns the job results
* `FAILED`: Returns `None`
* `CANCELLED`: Returns `None`
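
The polling behavior described above can be sketched with the standard library. `await_completion`, `get_status`, and `get_results` are hypothetical stand-ins, not SDK calls, and two behaviors the docs leave unspecified are assumptions here: `timeout=None` waits indefinitely, and reaching the timeout returns `None`. `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional

FAILURE_STATUSES = {"FAILED", "CANCELLED"}


def await_completion(
    get_status: Callable[[], str],
    get_results: Callable[[], list],
    timeout: Optional[float] = 7200,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[list]:
    """Hypothetical sketch: poll a job until it finishes or the timeout passes."""
    # Assumption: timeout=None means no deadline.
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        print(f"Job status is {status}")
        if status == "SUCCEEDED":
            return get_results()
        if status in FAILURE_STATUSES:
            return None
        if deadline is not None and clock() >= deadline:
            return None  # assumption: a timeout is reported as None
        sleep(poll_interval)
```

Because failures return `None` instead of raising, callers should test `results is not None`: an empty list is still a successful result.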

**Example:**

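```python
# Assumes `client` is an authenticated Sutro SDK client instance.
results = client.await_job_completion("job_12345", timeout=3600)
# Prints status updates while polling, e.g.:
# Job status is RUNNING for job-f9102252-ae2f-4d61-a879-a657e314f2e0
if results is not None:  # an empty list is still a successful result
    print(f"Job completed with {len(results)} results")
```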

## Getting quotas

```python
get_quotas(self)
```

Get your current quotas.

**Returns:** list: A list of quotas, one for each priority level. Each entry contains `row_quota` and `token_quota` for that priority level.
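
A returned quota list could be used to check a job before submitting it. `within_quota` is a hypothetical helper, and it assumes details the docs do not specify: the list is ordered by priority level (index 0 is priority 0), each entry is a dict with integer `row_quota` and `token_quota` keys, and a job fits if its row and token counts do not exceed those values.

```python
from typing import Dict, List


def within_quota(quotas: List[Dict[str, int]], priority: int, rows: int, tokens: int) -> bool:
    """Hypothetical: return True if a planned job fits its priority level's quota.

    Assumes `quotas[priority]` is a dict with `row_quota` and `token_quota` keys.
    """
    quota = quotas[priority]
    return rows <= quota["row_quota"] and tokens <= quota["token_quota"]
```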
