# Inference Example with Datasets
This example demonstrates how to use the Python SDK to upload a dataset, submit a batch inference job, and download the results.
```python
import sutro as so

so.set_api_key("your_api_key")  # you can skip this if you set your key via `sutro login`

# Create a dataset
dataset_id = so.create_dataset("your_dataset_id")
print(f"Created dataset with ID: {dataset_id}")

# Upload some files to the dataset. In this case, the two files are in the
# same directory as this script. Both have the same schema, which looks like:
# {
#     "id": int64,
#     "prompt": string,
# }
so.upload_to_dataset(dataset_id, ["file1.parquet", "file2.parquet"])

system_prompt = "Label the following text as positive or negative."
json_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative"]}
    },
    "required": ["label"]
}

# Submit a batch inference job
so.infer(
    dataset_id,
    column="prompt",
    model="llama-3.2-3b",
    system_prompt=system_prompt,
    output_schema=json_schema,
    job_priority=1,
)
```
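The `output_schema` constrains each model response to a JSON object matching `json_schema`. As a rough illustration of what that guarantees, here is a plain-Python check of a response string against this particular schema (the sample responses are made up for illustration, not output from a real job):

```python
import json

json_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative"]}
    },
    "required": ["label"]
}

def matches_schema(raw: str, schema: dict) -> bool:
    """Minimal check for this specific schema: the response parses as a
    JSON object, all required keys are present, and the label value is
    one of the allowed enum members."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema["required"]):
        return False
    allowed = schema["properties"]["label"]["enum"]
    return obj["label"] in allowed

print(matches_schema('{"label": "positive"}', json_schema))  # True
print(matches_schema('{"label": "maybe"}', json_schema))     # False
```

Every row of a structured-output job should satisfy a check like this, which is what makes the downstream parsing in the next step safe.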
You can either poll for the status of the job, or periodically check in with `sutro jobs status <job_id>` in the CLI.
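If you prefer polling from code, a generic loop like the following works. Note that `check_status` here is a stand-in for whatever status query you use, and the status strings are illustrative, not Sutro's actual job states:

```python
import time

def wait_until_done(check_status, poll_interval=30.0, max_polls=1000):
    """Poll check_status() until it returns a terminal state.

    check_status: any zero-argument callable returning a status string.
    The terminal states below are assumptions for this sketch.
    """
    for _ in range(max_polls):
        status = check_status()
        if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish within the polling budget")

# Example with a stub that succeeds on the third poll:
states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_until_done(lambda: next(states), poll_interval=0))  # SUCCEEDED
```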
Once the job is done, you can download the results:
```python
import sutro as so
import polars as pl

dataset_id = "your_dataset_id"

# Download the results into the output_dir directory; the downloaded files
# are named the same as the files in the dataset.
so.download_from_dataset(dataset_id, output_path="output_dir")

# Inspect the inference results
df = pl.read_parquet("output_dir/*.parquet")
for row in df.iter_rows(named=True):
    print(f"prompt: {row['prompt']}, label: {row['inference_result']['label']}")
    break
```
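With the results loaded, a common next step is summarizing the label distribution. A sketch in plain Python over rows shaped like the output above (the sample rows are illustrative, not real job output; with a real job you would iterate `df.iter_rows(named=True)` instead):

```python
from collections import Counter

# Illustrative rows shaped like the downloaded results.
rows = [
    {"prompt": "Great service!", "inference_result": {"label": "positive"}},
    {"prompt": "Terrible wait times.", "inference_result": {"label": "negative"}},
    {"prompt": "Loved it.", "inference_result": {"label": "positive"}},
]

counts = Counter(row["inference_result"]["label"] for row in rows)
print(counts)  # Counter({'positive': 2, 'negative': 1})
```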