Inference Example with Datasets

This example demonstrates how to use the Sutro Python SDK to upload a dataset, submit a batch inference job, check its status, and download the results.

Submitting the Job

import sutro as so

so.set_api_key("sk_******")  # Skip this if you have already set your key with `sutro login`

# create a dataset
dataset_id = so.create_dataset()
# Note: both files must have the same schema
so.upload_to_dataset(dataset_id, ["file1.parquet", "file2.parquet"])

system_prompt = "Label the following text as positive or negative."

json_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative"]}
    },
    "required": ["label"]
}

# Submit a batch inference job over the "prompt" column of the dataset
job_id = so.infer(
    dataset_id,
    column="prompt",
    model="llama-3.2-3b",
    system_prompt=system_prompt,
    output_schema=json_schema,
    job_priority=1,
)
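The upload step above requires all files to share the same schema, so it can be worth comparing them locally before calling `upload_to_dataset`. The sketch below is a hypothetical helper, `find_schema_mismatches`, which is not part of the Sutro SDK. It takes each file's schema as a plain mapping of column name to column type, so it has no dependencies. If you have Polars installed, `pl.read_parquet_schema(path)` is one way to build such a mapping. The check ignores column order, which is an assumption about what "same schema" means here.

```python
from typing import List, Mapping


def find_schema_mismatches(schemas: Mapping[str, Mapping[str, object]]) -> List[str]:
    """Compare every file's schema with the first file's schema.

    `schemas` maps a file name to {column name: column type}. Column order is
    ignored. Returns human-readable problems; an empty list means all match.
    Hypothetical helper, not part of the Sutro SDK.
    """
    problems: List[str] = []
    items = list(schemas.items())
    if not items:
        return problems
    ref_name, ref_schema = items[0]
    for name, schema in items[1:]:
        missing = sorted(set(ref_schema) - set(schema))
        extra = sorted(set(schema) - set(ref_schema))
        if missing:
            problems.append(f"{name}: missing columns {missing} (present in {ref_name})")
        if extra:
            problems.append(f"{name}: extra columns {extra} (absent from {ref_name})")
        # Columns present in both files must have the same type.
        for col in sorted(set(ref_schema) & set(schema)):
            if ref_schema[col] != schema[col]:
                problems.append(
                    f"{name}: column {col!r} is {schema[col]}, expected {ref_schema[col]}"
                )
    return problems


# Example schemas, written by hand for illustration.
schemas = {
    "file1.parquet": {"prompt": "String", "id": "Int64"},
    "file2.parquet": {"prompt": "String", "id": "String"},
}
for problem in find_schema_mismatches(schemas):
    print(problem)
```

If the returned list is empty, the files agree on column names and types. Otherwise, fix or cast the reported columns before uploading.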

Checking the Status of the Job

You can either poll for the job's status from Python, or check in periodically with `sutro jobs status <job_id>` in the CLI or through the web app.

import sutro as so

# Submit a job directly on a local CSV file, classifying its "reviews" column
job_id = so.infer(
    inputs="my_file.csv",
    column="reviews",
    system_prompt="Classify the review as positive, neutral, or negative.",
    job_priority=1,
)

# Single check
status = so.get_job_status(job_id)
print(status)
# >> STARTING

# Use `await_job_completion` (blocking) to poll for status updates and
# retrieve results once they're ready
results = so.await_job_completion(job_id)
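`await_job_completion` handles polling for you. If you would rather choose the interval or enforce a timeout yourself, a loop like the one sketched below works. To keep it runnable without an API key, it accepts any zero-argument status function. With a real job you might pass `lambda: so.get_job_status(job_id)`. The terminal status names used here (`SUCCEEDED`, `FAILED`, `CANCELLED`) are assumptions, since only `STARTING` appears in this example, so substitute whatever final states your jobs actually report.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], object],
    terminal: Iterable[str],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
) -> str:
    """Call `get_status` every `interval_s` seconds until it reports a
    terminal status, then return that status as a string.

    Raises TimeoutError if no terminal status is seen within `timeout_s`.
    """
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout_s
    while True:
        status = str(get_status())
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for a real job: reports two non-terminal statuses, then finishes.
fake_statuses = iter(["STARTING", "RUNNING", "SUCCEEDED"])
final = poll_until_done(
    lambda: next(fake_statuses),
    terminal={"SUCCEEDED", "FAILED", "CANCELLED"},  # assumed status names
    interval_s=0.01,
)
print(final)  # SUCCEEDED
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while the job runs.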

Downloading the Results

Once the job is done, you can download the results:

import polars as pl
import sutro as so

dataset_id = "dataset-8be.."

# Download the results into the output_dir directory; each output file is
# named after the corresponding file in the dataset.
so.download_from_dataset(dataset_id, output_path="output_dir")

# Load every result file and inspect the inference results
df = pl.read_parquet("output_dir/*.parquet")

# Print the first row as a quick check
for row in df.iter_rows(named=True):
    print(f"prompt: {row['prompt']}, label: {row['inference_result']['label']}")
    break
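Printing one row is only a spot check. To see the label distribution across all results, and to catch rows that did not come back in the expected shape, you can tally the `label` field of each row's `inference_result` against the `enum` in the schema. The sketch below uses a hypothetical helper, `summarize_labels`, which is not part of the SDK. Its sample rows are made up and shaped like the dictionaries that `df.iter_rows(named=True)` yields in the loop above. With real results, pass `df.iter_rows(named=True)` instead. The helper checks only the `label` enum, not the full JSON Schema.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Same schema as in the submission example above.
json_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative"]}
    },
    "required": ["label"],
}


def summarize_labels(rows: Iterable[Dict[str, Any]], schema: Dict[str, Any]) -> Counter:
    """Count labels across rows, grouping anything outside the schema's enum.

    Rows whose inference_result is null, lacks "label", or has a label not in
    schema["properties"]["label"]["enum"] are counted under "<invalid>".
    """
    allowed = tuple(schema["properties"]["label"]["enum"])
    counts: Counter = Counter()
    for row in rows:
        result = row.get("inference_result")
        label = result.get("label") if isinstance(result, dict) else None
        counts[label if label in allowed else "<invalid>"] += 1
    return counts


# Illustrative rows only; not real inference output.
sample_rows = [
    {"prompt": "Great product!", "inference_result": {"label": "positive"}},
    {"prompt": "Broke after a day.", "inference_result": {"label": "negative"}},
    {"prompt": "Love it.", "inference_result": {"label": "positive"}},
    {"prompt": "???", "inference_result": None},
    {"prompt": "It's fine.", "inference_result": {"label": "neutral"}},
]
print(summarize_labels(sample_rows, json_schema))
```

A non-zero `<invalid>` count is a cue to look at those rows individually before relying on the labels downstream.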