POST /list-dataset-files
Listing Files in a Dataset
curl --request POST \
  --url https://api.sutro.sh/list-dataset-files \
  --header 'Authorization: Key <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "dataset_id": "<string>"
}'
{
  "files": [
    {
      "file_name": "training_batch_1.parquet",
      "file_id": "file_abc123def456",
      "uploaded_at": "2024-01-15T10:30:00Z",
      "size_bytes": 524288,
      "row_count": 1000,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    },
    {
      "file_name": "training_batch_2.parquet",
      "file_id": "file_def456ghi789",
      "uploaded_at": "2024-01-15T11:00:00Z",
      "size_bytes": 262144,
      "row_count": 500,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    }
  ]
}
List all files in a dataset.

Request Body

dataset_id
string
required
The ID of the dataset whose files should be listed

Headers

Authorization
string
required
Your Sutro API key using the Key authentication scheme. Format: Key YOUR_API_KEY. Example: Authorization: Key sk_live_abc123...
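
For example, the header value can be built from an environment variable so the raw key stays out of source code (a minimal Python sketch; the SUTRO_API_KEY variable name is just a convention assumed here, not something the API requires):

import os

# Build the documented "Key <api_key>" header value from an environment
# variable (the SUTRO_API_KEY name is an assumption, not required by the API).
api_key = os.environ['SUTRO_API_KEY']
headers = {
    'Authorization': f'Key {api_key}',
    'Content-Type': 'application/json'
}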

Response

Returns a JSON object containing a list of files in the dataset, in the order they were uploaded.
files
array
A list of files in the dataset, ordered by upload time. Each file object contains metadata about the file, including its file_name, file_id, uploaded_at timestamp, size_bytes, row_count, and schema.

Code Examples

import requests

# Call the list-dataset-files endpoint for a specific dataset.
response = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers={
        'Authorization': 'Key YOUR_SUTRO_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'dataset_id': 'dataset_12345'
    }
)
response.raise_for_status()

result = response.json()
print(f"Found {len(result['files'])} files in dataset:")

# Files are returned in upload order, so the index reflects upload sequence.
for i, file in enumerate(result['files'], 1):
    print(f"{i}. {file['file_name']}")
    print(f"   File ID: {file['file_id']}")
    print(f"   Uploaded: {file['uploaded_at']}")
    print(f"   Size: {file['size_bytes']} bytes")
    print(f"   Rows: {file['row_count']}")
    print("---")
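
As a follow-up, the per-file metadata can be aggregated, for instance to check the overall size of a dataset before downloading it (a small sketch building on the result object from the example above):

# Sum the per-file metadata to get dataset-level totals.
total_rows = sum(f['row_count'] for f in result['files'])
total_bytes = sum(f['size_bytes'] for f in result['files'])
print(f"Total rows: {total_rows}")
print(f"Total size: {total_bytes / 1024:.1f} KiB")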

File Object Fields

Each file in the files array contains the following fields (a typed sketch follows this list):
  • file_name: Name of the file as it appears in the dataset
  • file_id: Unique identifier for the file
  • uploaded_at: ISO 8601 timestamp of when the file was uploaded
  • size_bytes: Size of the file in bytes
  • row_count: Number of rows/records in the file
  • schema: Schema information including field names and types
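
For reference, the fields listed above can be modeled as typed dictionaries when working with the response in Python (a sketch only; these type names are local conveniences and are not part of any Sutro SDK):

from typing import List, TypedDict

# Hypothetical local models for the file object fields; useful for static
# type checking, but not provided by an official Sutro client library.
class SchemaField(TypedDict):
    name: str
    type: str

class FileSchema(TypedDict):
    fields: List[SchemaField]

class DatasetFile(TypedDict):
    file_name: str
    file_id: str
    uploaded_at: str   # ISO 8601 timestamp
    size_bytes: int
    row_count: int
    schema: FileSchema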

Notes

  • Files are returned in the order they were uploaded to the dataset
  • This ordering is preserved and matches the order used for batch inference
  • Use the file_name from this response with the download endpoint (see the sketch after this list)
  • All files in a dataset share the same schema structure
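
Putting the notes together, a typical pattern is to list the files and then download each one by file_name. The sketch below assumes a download endpoint path of /download-dataset-file that accepts dataset_id and file_name in a JSON body; that path and request shape are assumptions for illustration only, so check the download endpoint's own documentation for the actual details.

import requests

headers = {
    'Authorization': 'Key YOUR_SUTRO_API_KEY',
    'Content-Type': 'application/json'
}

# List the files first; they come back in upload order.
listing = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers=headers,
    json={'dataset_id': 'dataset_12345'}
)
listing.raise_for_status()

for file in listing.json()['files']:
    # The URL and body below are hypothetical; consult the download
    # endpoint's documentation for the real path and parameters.
    download = requests.post(
        'https://api.sutro.sh/download-dataset-file',
        headers=headers,
        json={'dataset_id': 'dataset_12345', 'file_name': file['file_name']}
    )
    download.raise_for_status()
    with open(file['file_name'], 'wb') as out:
        out.write(download.content)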