Running batch inference
Parameters:
- data (Union[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on.
- model (Union[str, List[str]], optional): The model(s) to use for inference. Default is "llama-3.1-8b". Can accept a list of models, in which case inference will be run in parallel for each model with stay_attached=False.
- name (Union[str, List[str]], optional): A job name for experiment/metadata tracking purposes. If using a list of models, you must pass a list of names with length equal to the number of models, or None. Defaults to None.
- description (Union[str, List[str]], optional): A job description for experiment/metadata tracking purposes. If using a list of models, you must pass a list of descriptions with length equal to the number of models, or None. Defaults to None.
- column (Union[str, List[str]], optional): The column name to use for inference. Required if data is a DataFrame or file path. If a list is supplied, the listed columns are concatenated into a single column; separator strings may be included in the list.
- output_column (str, optional): The column name to store the inference results in if the input is a DataFrame. Defaults to "inference_result".
- job_priority (int, optional): The priority of the job. Default is 0.
- output_schema (Union[Dict[str, Any], BaseModel], optional): A structured schema for the output. Can be either a dictionary representing a JSON schema or a pydantic BaseModel. Defaults to None.
- system_prompt (str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.
- sampling_params (dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.
- random_seed_per_input (bool, optional): If True, a random seed will be generated for each input. This is useful for encouraging diversity in outputs. Defaults to False.
- dry_run (bool, optional): If True, returns cost estimates instead of running inference. Default is False.
- stay_attached (bool, optional): If True, the SDK stays attached to the job and reports status and results as they become available. Defaults to True for priority 0 jobs and False for priority 1 jobs.
- truncate_rows (bool, optional): If True, any row whose token count exceeds the context window of the selected model will be truncated to the maximum length that fits within the context window. Defaults to True.
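A minimal usage sketch based on the parameters above. The import path and client construction (`from sutro import Sutro`; `Sutro()`) are assumptions and may differ in the actual SDK:

```python
import pandas as pd
from pydantic import BaseModel

from sutro import Sutro  # hypothetical import path; adjust to your SDK


class Sentiment(BaseModel):
    label: str
    confidence: float


client = Sutro()  # assumes credentials are picked up from the environment

df = pd.DataFrame({"review": ["Great product!", "Arrived broken."]})

# Run batch inference over the "review" column; structured results are
# stored in the default "inference_result" column.
results = client.infer(
    data=df,
    model="llama-3.1-8b",
    column="review",
    system_prompt="Classify the sentiment of each review.",
    output_schema=Sentiment,  # a pydantic BaseModel or a JSON-schema dict
)
```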
Monitoring job status
Calling the attach(...) function (or setting stay_attached=True when calling infer(...)) connects to a running job and displays live progress updates, including the number of rows processed and token statistics. It shows a progress bar with real-time updates until the job completes.
Parameters:
- job_id (str): The ID of the job to attach to.
Returns: None
Job Status Behavior:
- RUNNING: Streams progress updates with a live progress bar and job statistics
- SUCCEEDED: Notifies that the job already completed and suggests using sutro jobs results
- FAILED: Displays the failure message and exits
- CANCELLED: Displays the cancellation message and exits
Example:
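A minimal sketch, reusing the assumed Sutro client from the earlier example (the job ID shown is illustrative):

```python
from sutro import Sutro  # hypothetical import path

client = Sutro()

# Attach to an in-flight job and stream a live progress bar until the
# job reaches a terminal state (SUCCEEDED, FAILED, or CANCELLED).
client.attach(job_id="job-1234")
```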
Note: This method is ideal for monitoring long-running jobs interactively. For programmatic use cases where you don't want live progress updates, use the simpler await_job_completion() instead.
Await Job Completion
await_job_completion is best for programmatic use cases, and should only be used when you are not using the stay_attached parameter of infer(...) or the attach(...) function. It waits for a job to complete and returns its results upon successful completion.
This method polls the job status every 5 seconds (and prints it out) until the job completes, fails, is cancelled, or the timeout is reached.
Parameters:
- job_id (str): The ID of the job to await.
- timeout (Optional[int]): Maximum time in seconds to wait for job completion. Defaults to 7200 (2 hours).
Returns: list | None: The results of the job if it completes successfully, or None if the job fails, is cancelled, or encounters an error.
Job Status Outcomes:
- SUCCEEDED: Returns the job results
- FAILED: Returns None
- CANCELLED: Returns None
Example:
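A minimal sketch, again assuming the Sutro client and illustrative job ID from the earlier examples:

```python
from sutro import Sutro  # hypothetical import path

client = Sutro()

# Block until the job finishes, polling every 5 seconds, with a
# 30-minute ceiling instead of the default 2 hours (7200 seconds).
results = client.await_job_completion(job_id="job-1234", timeout=1800)

if results is None:
    # The job failed, was cancelled, or an error occurred.
    print("Job did not complete successfully.")
else:
    print(f"Job succeeded with {len(results)} results.")
```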