Python SDK#
The Python SDK provides a Pythonic way to interact with the API. In many prototyping scenarios, you may find it most convenient to use the Python SDK and CLI to interact with Sutro.
See the Installation guide to install the SDK.
Basic Methods#
Setting your API key#
When you initialize the SDK, you can set your API key by calling the set_api_key method. Alternatively, you can set your API key by running the sutro login command in the CLI.
- set_api_key(self, api_key: str)#
Set the API key for the Sutro API.
- Parameters:
api_key (str): The API key to set.
Returns: None
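The following is a minimal sketch of setting a key programmatically. The client construction shown here (a Sutro class imported from the sutro package) is an assumption for illustration; see the Installation guide for the exact import path.

```python
import sutro  # package name assumed from the CLI command `sutro login`

# Hypothetical client construction; the constructor name is an assumption.
client = sutro.Sutro()

# Set the API key programmatically (equivalent to authenticating via `sutro login`).
client.set_api_key("your-api-key")
```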
Running batch inference#
- infer(self, data, model='llama-3.1-8b', column=None, output_column='inference_result', job_priority=0, output_schema=None, system_prompt=None, sampling_params=None, random_seed_per_input=False, dry_run=False, stay_attached=None, truncate_rows=False)#
Run LLM inference on a large list, table, dataframe, or file.
- Parameters:
data (Union[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on.
model (str, optional): The model to use for inference. Default is “llama-3.1-8b”.
column (str, optional): The column name to use for inference. Required if data is a DataFrame or file path.
output_column (str, optional): The column name to store the inference results in if input is a DataFrame. Defaults to “inference_result”.
job_priority (int, optional): The priority of the job. Default is 0.
output_schema (Union[Dict[str, Any], BaseModel], optional): A structured schema for the output. Can be either a dictionary representing a JSON schema or a pydantic BaseModel. Defaults to None.
system_prompt (str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.
sampling_params (dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.
random_seed_per_input (bool, optional): If True, a random seed will be generated for each input. This is useful for diversity in outputs. Defaults to False.
dry_run (bool, optional): If True, return cost estimates instead of running inference. Default is False.
stay_attached (bool, optional): If True, the SDK will stay attached to the job and update you on the status and results as they become available. Default is True for priority 0 jobs, and False for priority 1 jobs.
truncate_rows (bool, optional): If True, any rows whose token count exceeds the context window of the selected model will be truncated to the maximum length that fits. Defaults to False.
Returns: Union[List, pd.DataFrame, pl.DataFrame, str]: The results of the inference or job ID.
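As an illustrative sketch, the example below runs batch inference over a pandas DataFrame, reusing the hypothetical client object from the set_api_key example; the column name, prompt, and data are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"review": ["Great product!", "Arrived broken."]})

# Optionally estimate costs first instead of running inference.
estimate = client.infer(df, column="review", dry_run=True)

# Run the job; outputs are written to the default "inference_result" column.
results = client.infer(
    df,
    model="llama-3.1-8b",
    column="review",
    system_prompt="Classify the sentiment of each review as positive or negative.",
)
```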
Getting quotas#
- get_quotas(self)#
Get your current quotas.
Returns: list: A list of quotas, one for each priority level. Each entry contains row_quota and token_quota.
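A short usage sketch with the same hypothetical client instance:

```python
# One entry per priority level, each containing row_quota and token_quota.
for quota in client.get_quotas():
    print(quota)
```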
Job Methods#
Listing jobs#
- list_jobs(self)#
List all jobs associated with the API key.
Returns: list: A list of job details.
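For example, with the same hypothetical client instance:

```python
# List every job associated with the configured API key.
for job in client.list_jobs():
    print(job)
```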
Getting job status#
- get_job_status(self, job_id: str)#
Get the status of a job by its ID.
- Parameters:
job_id (str): The ID of the job to retrieve the status for.
Returns: dict: The status of the job.
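A sketch of checking a job's status (the job ID is a placeholder):

```python
status = client.get_job_status("job-12345")
print(status)
```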
Getting job results#
- get_job_results(self, job_id: str, include_inputs: bool = False, include_cumulative_logprobs: bool = False)#
Get the results of a job by its ID.
- Parameters:
job_id (str): The ID of the job to retrieve the results for.
include_inputs (bool, optional): Whether to include the inputs in the results. Defaults to False.
include_cumulative_logprobs (bool, optional): Whether to include the cumulative logprobs in the results. Defaults to False.
Returns: Union[List, Dict]: The results of the job. If include_inputs is True, the results will be a dictionary with inputs and outputs keys. If include_inputs is False, the results will be a list of outputs, in the same order as the inputs.
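A sketch of retrieving results both with and without the inputs (the job ID is a placeholder):

```python
# A list of outputs, in the same order as the inputs.
outputs = client.get_job_results("job-12345")

# A dictionary with inputs and outputs keys.
paired = client.get_job_results("job-12345", include_inputs=True)
```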
Cancelling jobs#
- cancel_job(self, job_id: str)#
Cancel a job by its ID.
- Parameters:
job_id (str): The ID of the job to cancel.
Returns: dict: The status of the job cancellation.
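For example (the job ID is a placeholder):

```python
# Returns the status of the cancellation.
cancellation = client.cancel_job("job-12345")
print(cancellation)
```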
Dataset Methods#
Creating a dataset#
- create_dataset(self)#
Create a new internal dataset.
Returns: dict: A dictionary containing the dataset ID.
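A usage sketch with the hypothetical client instance:

```python
# Returns a dictionary containing the new dataset's ID.
dataset = client.create_dataset()
print(dataset)
```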
Listing all datasets#
- list_datasets(self)#
List all datasets.
Returns: list: A list of dataset IDs.
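For example:

```python
# Returns a list of dataset IDs.
for dataset_id in client.list_datasets():
    print(dataset_id)
```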
Listing all files in a dataset#
- list_dataset_files(self, dataset_id: str)#
List all files in a dataset.
- Parameters:
dataset_id (str): The ID of the dataset to list the files in.
Returns: list: A list of file names in the dataset.
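A sketch with a placeholder dataset ID:

```python
files = client.list_dataset_files("dataset-abc123")
print(files)
```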
Uploading files to a dataset#
- upload_to_dataset(self, dataset_id: List[str] | str = None, file_paths: List[str] | str = None)#
Upload files to a dataset.
Accepts a dataset ID and one or more file paths. If only a single argument is provided, it is interpreted as the file paths.
- Parameters:
dataset_id (Union[List[str], str], optional): The ID of the dataset to upload the files to. If not provided, the files will be uploaded to a new dataset.
file_paths (Union[List[str], str], optional): A list of file paths to upload.
Returns: list: A list of file names in the dataset.
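The sketch below shows both calling patterns; the dataset ID and file paths are placeholders.

```python
# Upload files to an existing dataset.
client.upload_to_dataset("dataset-abc123", ["reviews.parquet", "products.parquet"])

# A single argument is interpreted as the file paths, and the files
# are uploaded to a new dataset.
client.upload_to_dataset("reviews.parquet")
```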
Downloading files from a dataset#
- download_from_dataset(self, dataset_id: str, files: List[str] | str = None, output_path: str = None)#
Download files from a dataset.
Accepts a dataset ID and optional file name(s). If no file names are provided, all files in the dataset will be downloaded.
- Parameters:
dataset_id (str): The ID of the dataset to download the file from.
files (Union[List[str], str], optional): The name(s) of the file(s) to download. If not provided, all files in the dataset will be downloaded.
output_path (str, optional): The directory to save the downloaded files to. If not provided, the files will be saved to the current working directory.
Returns: None
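A sketch of both download patterns; the dataset ID, file name, and output path are placeholders.

```python
# Download a single file into a local directory.
client.download_from_dataset(
    "dataset-abc123", files="reviews.parquet", output_path="./downloads"
)

# Download every file in the dataset to the current working directory.
client.download_from_dataset("dataset-abc123")
```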