Running batch inference
Parameters:
- `data` (Union[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on.
- `model` (list, str, optional): The model(s) to use for inference. Default is "llama-3.1-8b". Can accept a list of models, in which case the inference will be run in parallel for each model with `stay_attached=False`.
- `column` (str, optional): The column name to use for inference. Required if data is a DataFrame or file path.
- `output_column` (str, optional): The column name to store the inference results in if the input is a DataFrame. Defaults to "inference_result".
- `job_priority` (int, optional): The priority of the job. Default is 0.
- `output_schema` (Union[Dict[str, Any], BaseModel], optional): A structured schema for the output. Can be either a dictionary representing a JSON schema or a pydantic BaseModel. Defaults to None.
- `system_prompt` (str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.
- `sampling_params` (dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.
- `random_seed_per_input` (bool, optional): If True, a random seed will be generated for each input. This is useful for diversity in outputs. Defaults to False.
- `dry_run` (bool, optional): If True, return cost estimates instead of running inference. Default is False.
- `stay_attached` (bool, optional): If True, the SDK will stay attached to the job and update you on the status and results as they become available. Defaults to True for priority 0 jobs and False for priority 1 jobs.
- `truncate_rows` (bool, optional): If True, any rows whose token count exceeds the context window of the selected model will be truncated to the maximum length that fits within the context window. Defaults to False.
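As an illustration of the `output_schema` parameter, a plain JSON-schema dict might look like the following. The field names here are invented for the example, not part of the SDK:

```python
# A JSON-schema dict of the kind output_schema accepts (the dictionary form;
# a pydantic BaseModel is the alternative). Fields are illustrative only.
sentiment_schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
        },
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}
```

A schema like this constrains each model output to a JSON object with exactly these typed fields.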
Monitoring job status
Attaching to a job is equivalent to setting `stay_attached=True` when calling `infer(...)`.
This method connects to a running job and displays live progress updates, including the number of rows processed and token statistics. It shows a progress bar with real-time updates until the job completes.
Parameters:
- `job_id` (str): The ID of the job to attach to.
Returns: None
Job Status Behavior:
- RUNNING: Streams progress updates with a live progress bar and job statistics
- SUCCEEDED: Notifies that the job already completed and suggests using `sutro jobs results`
- FAILED: Displays a failure message and exits
- CANCELLED: Displays a cancellation message and exits
Example:
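Since attaching requires a live job, the following self-contained sketch mimics the behavior described above with a fake status feed. All names here (`fake_status_feed`, `attach_sketch`) are stand-ins for illustration, not the SDK's API:

```python
def fake_status_feed():
    """Stand-in for a job's status stream: a few RUNNING updates, then SUCCEEDED."""
    yield {"state": "RUNNING", "rows_done": 25, "rows_total": 100}
    yield {"state": "RUNNING", "rows_done": 80, "rows_total": 100}
    yield {"state": "SUCCEEDED", "rows_done": 100, "rows_total": 100}

def attach_sketch(status_feed):
    """Mimic attaching to a job: print progress until a terminal state, return it."""
    for status in status_feed:
        if status["state"] == "RUNNING":
            pct = 100 * status["rows_done"] // status["rows_total"]
            print(f"[{pct:3d}%] {status['rows_done']}/{status['rows_total']} rows processed")
        else:
            print(f"Job finished: {status['state']}")
            return status["state"]

final_state = attach_sketch(fake_status_feed())
```

In the real SDK the status feed comes from the job identified by `job_id`, and terminal states other than SUCCEEDED (FAILED, CANCELLED) cause the method to print a message and exit.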
Note: This method is ideal for monitoring long-running jobs interactively. For programmatic use cases where you don't want live progress updates, use the simpler `await_job_completion()` instead.
Await Job Completion
`await_job_completion` is best for this use case, and should only be used when not using the `stay_attached` parameter of `infer(...)` or the `attach(...)` function.
Waits for a job to complete and returns its results upon successful completion.
This method polls the job status every 5 seconds (printing it each time) until the job completes, fails, is cancelled, or the timeout is reached.
Parameters:
- `job_id` (str): The ID of the job to await.
- `timeout` (Optional[int]): Maximum time in seconds to wait for job completion. Defaults to 7200 (2 hours).
Returns: list | None: The results of the job if it completes successfully, or None if the job fails, is cancelled, or encounters an error.
Job Status Outcomes:
- SUCCEEDED: Returns the job results
- FAILED: Returns None
- CANCELLED: Returns None
Example:
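Since the real call needs a live job, this self-contained sketch illustrates the documented contract instead: poll on an interval until a terminal state or timeout, returning results only on SUCCEEDED. Here `get_status` is a hypothetical stand-in for the SDK's status call, not its actual API:

```python
import time

def await_job_completion_sketch(get_status, poll_interval=5, timeout=7200):
    """Poll get_status() until SUCCEEDED/FAILED/CANCELLED or timeout.

    Returns the results list on success, None otherwise -- mirroring the
    documented return contract of await_job_completion.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state, job_results = get_status()
        print(f"Job status: {state}")
        if state == "SUCCEEDED":
            return job_results
        if state in ("FAILED", "CANCELLED"):
            return None
        time.sleep(poll_interval)
    return None  # timed out

# Fake status source: reports RUNNING twice, then SUCCEEDED with results.
states = iter([
    ("RUNNING", None),
    ("RUNNING", None),
    ("SUCCEEDED", ["result-1", "result-2"]),
])
results = await_job_completion_sketch(lambda: next(states), poll_interval=0)
```

With the real SDK you would pass a `job_id` (and optionally a `timeout`) rather than a status callable, and the 5-second poll interval is fixed.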