
Using Datasets

When working with larger amounts of data, you may want to handle uploads, downloads, and other preparation separately from job submission. You can do this by creating a dataset within Sutro, uploading your data, and then using the dataset ID when submitting a job. Currently, the datasets API is limited to columnar data and supports only Parquet files. In the future, we’ll add support for other file formats and more flexible data types, as well as the ability to create datasets from external storage.

Specifying a Column for Inference

When submitting a batch inference job with a dataset, you must specify which column contains the input data. See the Python SDK and batch inference API reference for usage details. For a complete walkthrough of submitting a job with a dataset, see the Dataset Inference Example.