POST /upload-to-dataset
Uploading a File
curl --request POST \
  --url https://api.sutro.sh/upload-to-dataset \
  --header 'Authorization: Key <your_api_key>' \
  --form 'dataset_id=<string>' \
  --form 'file=@data.parquet'
Using the API directly is not recommended for most users. Instead, we recommend using the Python SDK.
Upload a file to a dataset.

Usage Notes

  • Currently, only parquet files are supported; Snappy compression is supported.
  • You can only upload one file at a time via the API, so uploading multiple files requires multiple requests. The Python SDK supports uploading multiple files at once.
  • All files in a dataset must have the same schema. Files whose schemas do not match will be rejected.
  • File names must be unique. If you upload a file whose name already exists in the dataset, it will be rejected.
  • Upload order is preserved: if you upload multiple files to a dataset, they are added in the order you provide them.
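
The one-file-per-request and ordering rules above can be sketched as a small helper that loops over file paths and uploads them sequentially. This is an illustrative sketch, not the official SDK; the `upload_files_in_order` function and its parameters are hypothetical, while the endpoint URL and `Key` auth scheme come from this page.

```python
import requests

API_URL = "https://api.sutro.sh/upload-to-dataset"

def upload_files_in_order(api_key, dataset_id, paths):
    """Upload Parquet files one request at a time; the dataset preserves
    the order in which the files are uploaded."""
    file_ids = []
    for path in paths:
        with open(path, "rb") as f:
            resp = requests.post(
                API_URL,
                headers={"Authorization": f"Key {api_key}"},
                # dataset_id goes in the form data, alongside the file part.
                data={"dataset_id": dataset_id},
                files={"file": f},
            )
        resp.raise_for_status()
        file_ids.append(resp.json()["file_id"])
    return file_ids
```

Because each file is a separate request, a mid-batch failure leaves earlier files already uploaded; `raise_for_status()` stops the loop at the first error so you can resume from the failed file.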

Request Body

dataset_id
string
required
The ID of the dataset to upload the file to
file
file
required
The file to upload (must be a parquet file)

Headers

Authorization
string
required
Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...

Response

Returns the file ID of the uploaded file.
file_id
string
The unique identifier for the uploaded file
{
  "file_id": "file_abc123def456"
}

Code Examples

import requests

# Upload a parquet file as multipart/form-data.
# Note: dataset_id is passed via `data=`, not `json=` — requests ignores the
# `json` argument when `files` is present, so a `json=` payload would be dropped.
with open('data.parquet', 'rb') as file:
    response = requests.post(
        'https://api.sutro.sh/upload-to-dataset',
        headers={
            'Authorization': 'Key YOUR_SUTRO_API_KEY'
        },
        data={
            'dataset_id': 'dataset_12345'
        },
        files={
            'file': file
        }
    )

result = response.json()
if 'file_id' in result:
    print(f"File uploaded successfully: {result['file_id']}")
else:
    print(f"Upload failed: {result.get('error', 'Unknown error')}")

Important Considerations

  • File Format: Only parquet files are currently supported
  • Schema Consistency: All files in a dataset must share the same schema
  • Unique Names: File names must be unique within each dataset
  • Ordering: Files are processed in the order they are uploaded
  • Compression: Snappy compression is supported for parquet files
  • Single File Limit: API supports one file per request (use Python SDK for batch uploads)