Reference documentation for Batch Inference.
data
(Union[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on.
model
(list, str, optional): The model(s) to use for inference. Default is “llama-3.1-8b”. Can accept a list of models, in which case the inference will be run in parallel for each model with stay_attached=False.
column
(str, optional): The column name to use for inference. Required if data is a DataFrame or file path.
output_column
(str, optional): The column name to store the inference results in if input is a DataFrame. Defaults to “inference_result”.
job_priority
(int, optional): The priority of the job. Default is 0.
output_schema
(Union[Dict[str, Any], BaseModel], optional): A structured schema for the output. Can be either a dictionary representing a JSON schema or a pydantic BaseModel. Defaults to None.
system_prompt
(str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.
sampling_params
(dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.
random_seed_per_input
(bool, optional): If True, a random seed will be generated for each input. This is useful for diversity in outputs. Defaults to False.
dry_run
(bool, optional): If True, return cost estimates instead of running inference. Default is False.
stay_attached
(bool, optional): If True, the SDK will stay attached to the job and update you on the status and results as they become available. Default behavior is True for priority 0 jobs, and False for priority 1 jobs.
truncate_rows
(bool, optional): If True, any rows that have a token count exceeding the context window length of the selected model will be truncated to the max length that will fit within the context window. Defaults to False.
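For instance, the structured-output and sampling options above are plain Python values. The schema field names below are hypothetical, and the exact argument set accepted by infer(...) should be checked against your SDK version:

```python
# A JSON-schema dict usable as output_schema (a pydantic BaseModel
# also works, per the parameter description above). Field names are
# illustrative, not part of the API.
output_schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
        },
        "confidence": {"type": "number"},
    },
    "required": ["sentiment"],
}

# Keyword arguments one might pass to infer(...), assembled as a dict
# for clarity; this is a sketch of the documented parameters.
infer_kwargs = {
    "model": "llama-3.1-8b",
    "column": "review",
    "output_column": "inference_result",
    "output_schema": output_schema,
    "system_prompt": "Classify the sentiment of each review.",
    "sampling_params": {"temperature": 0.7},
    "random_seed_per_input": True,
    "dry_run": True,  # return cost estimates instead of running inference
}
```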
attach
Equivalent to passing stay_attached=True when calling infer(...). This method connects to a running job and displays live progress updates, including the number of rows processed and token statistics. It shows a progress bar with real-time updates until the job completes.
Parameters:
job_id
(str): The ID of the job to attach to.
Returns: None
Job Status Behavior:
RUNNING
: Streams progress updates with a live progress bar and job statistics
SUCCEEDED
: Notifies that the job has already completed and suggests using sutro jobs results
FAILED
: Displays a failure message and exits
CANCELLED
: Displays a cancellation message and exits
Example:
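As an illustration of the status behavior above, here is a plain-Python sketch of the per-status handling; this is illustrative only, not the SDK's actual implementation:

```python
# Maps each job status to the attach(...) behavior documented above.
# Purely illustrative; the real method streams output rather than
# returning a description string.
def describe_attach_behavior(status: str) -> str:
    if status == "RUNNING":
        return "streaming progress updates with a live progress bar"
    if status == "SUCCEEDED":
        return "job already completed; fetch output with `sutro jobs results`"
    if status == "FAILED":
        return "job failed; displaying failure message and exiting"
    if status == "CANCELLED":
        return "job cancelled; displaying cancellation message and exiting"
    raise ValueError(f"unexpected job status: {status!r}")
```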
Note: This method is ideal for monitoring long-running jobs interactively. For programmatic use cases where you don’t want live progress updates, use the simpler await_job_completion() instead.
await_job_completion
is best for this use case, and should only be used when not using the stay_attached parameter of infer(...) or the attach(...) function.
Waits for a job to complete and returns its results upon successful completion.
This method polls the job status every 5 seconds (and prints it out) until the job completes, fails, is cancelled, or the timeout is reached.
Parameters:
job_id
(str): The ID of the job to await.
timeout
(Optional[int]): Maximum time in seconds to wait for job completion. Defaults to 7200 (2 hours).
Returns: list | None: The results of the job if it completes successfully, or
None
if the job fails, is cancelled, or encounters an error.
Job Status Outcomes:
SUCCEEDED
: Returns the job results
FAILED
: Returns None
CANCELLED
: Returns None
Example:
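A minimal sketch of the documented polling semantics, with the status and results lookups injected as callables (the real method queries the Sutro API; the names here are illustrative):

```python
import time

def poll_until_complete(get_status, get_results, job_id,
                        timeout=7200, poll_interval=5):
    """Illustrative re-implementation of the documented semantics:
    poll every 5 seconds, print the status, return results on
    SUCCEEDED, and None on FAILED, CANCELLED, or timeout."""
    waited = 0
    while waited <= timeout:
        status = get_status(job_id)
        print(f"job {job_id}: {status}")
        if status == "SUCCEEDED":
            return get_results(job_id)
        if status in ("FAILED", "CANCELLED"):
            return None
        time.sleep(poll_interval)
        waited += poll_interval
    return None
```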