Using Datasets

When working with larger amounts of data, you may want to handle uploads, downloads, and other preparation separately from job submission. You can do this by creating a dataset within Sutro, uploading your data to it, and then passing the dataset ID when you submit a job.

Currently, the datasets API is limited to columnar data and only supports Parquet files. In the future, we’ll add support for other file formats and more flexible data types, as well as the ability to create datasets from external storage.
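Because only Parquet files are accepted, it can help to sanity-check files locally before uploading them. The sketch below is not part of the Sutro SDK; `looks_like_parquet` is a hypothetical helper name. It relies only on the Parquet format's `PAR1` marker, which appears as the first and last four bytes of a standard Parquet file, and it does not validate the schema or contents.

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a standard Parquet file
MIN_PARQUET_SIZE = 12    # header marker + 4-byte footer length + footer marker


def looks_like_parquet(path: str) -> bool:
    """Hypothetical helper: return True if the file has the PAR1 marker at both ends.

    This is a cheap pre-upload check only. It does not parse the footer
    or confirm that the file is readable.
    """
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

You might run this over every file you plan to upload and skip or convert anything that fails, for example a CSV that was saved with a `.parquet` extension.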

See Inference Example with Datasets for a complete walkthrough of submitting a job with a dataset.