POST /list-dataset-files
Listing Files in a Dataset
curl --request POST \
  --url https://api.sutro.sh/list-dataset-files \
  --header 'Authorization: Key <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "dataset_id": "<string>"
}'
{
  "files": [
    {
      "file_name": "training_batch_1.parquet",
      "file_id": "file_abc123def456",
      "uploaded_at": "2024-01-15T10:30:00Z",
      "size_bytes": 524288,
      "row_count": 1000,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    },
    {
      "file_name": "training_batch_2.parquet",
      "file_id": "file_def456ghi789",
      "uploaded_at": "2024-01-15T11:00:00Z",
      "size_bytes": 262144,
      "row_count": 500,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    }
  ]
}
List all files in a dataset.

Request Body

dataset_id
string
required
The ID of the dataset whose files should be listed

Headers

Authorization
string
required
Your Sutro API key using the Key authentication scheme. Format: Key YOUR_API_KEY. Example: Authorization: Key sk_live_abc123...
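
For example, the header value can be built from an environment variable so the raw key stays out of source code (a minimal Python sketch; the SUTRO_API_KEY variable name is just a convention assumed here, not something the API requires):

import os

# Build the documented "Key <api_key>" header value from an environment
# variable (the SUTRO_API_KEY name is an assumption, not required by the API).
api_key = os.environ['SUTRO_API_KEY']
headers = {
    'Authorization': f'Key {api_key}',
    'Content-Type': 'application/json'
}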

Response

Returns a JSON object containing a list of files in the dataset, in the order they were uploaded.
files
array
A list of files in the dataset, ordered by upload time. Each file object contains metadata about the file, including its file_name, file_id, uploaded_at timestamp, size_bytes, row_count, and schema.

Code Examples

import requests

# Call the list-dataset-files endpoint for a specific dataset.
response = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers={
        'Authorization': 'Key YOUR_SUTRO_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'dataset_id': 'dataset_12345'
    }
)
response.raise_for_status()

result = response.json()
print(f"Found {len(result['files'])} files in dataset:")

# Files are returned in upload order, so the index reflects upload sequence.
for i, file in enumerate(result['files'], 1):
    print(f"{i}. {file['file_name']}")
    print(f"   File ID: {file['file_id']}")
    print(f"   Uploaded: {file['uploaded_at']}")
    print(f"   Size: {file['size_bytes']} bytes")
    print(f"   Rows: {file['row_count']}")
    print("---")
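
As a follow-up, the per-file metadata can be aggregated, for instance to check the overall size of a dataset before downloading it (a small sketch building on the result object from the example above):

# Sum the per-file metadata to get dataset-level totals.
total_rows = sum(f['row_count'] for f in result['files'])
total_bytes = sum(f['size_bytes'] for f in result['files'])
print(f"Total rows: {total_rows}")
print(f"Total size: {total_bytes / 1024:.1f} KiB")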

File Object Fields

Each file in the files array contains the following fields (a typed sketch follows this list):
  • file_name: Name of the file as it appears in the dataset
  • file_id: Unique identifier for the file
  • uploaded_at: ISO 8601 timestamp of when the file was uploaded
  • size_bytes: Size of the file in bytes
  • row_count: Number of rows/records in the file
  • schema: Schema information including field names and types
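
For reference, the fields listed above can be modeled as typed dictionaries when working with the response in Python (a sketch only; these type names are local conveniences and are not part of any Sutro SDK):

from typing import List, TypedDict

# Hypothetical local models for the file object fields; useful for static
# type checking, but not provided by an official Sutro client library.
class SchemaField(TypedDict):
    name: str
    type: str

class FileSchema(TypedDict):
    fields: List[SchemaField]

class DatasetFile(TypedDict):
    file_name: str
    file_id: str
    uploaded_at: str   # ISO 8601 timestamp
    size_bytes: int
    row_count: int
    schema: FileSchema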

Notes

  • Files are returned in the order they were uploaded to the dataset
  • This ordering is preserved and matches the order used for batch inference
  • Use the file_name from this response with the download endpoint (see the sketch after this list)
  • All files in a dataset share the same schema structure
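
Putting the notes together, a typical pattern is to list the files and then download each one by file_name. The sketch below assumes a download endpoint path of /download-dataset-file that accepts dataset_id and file_name in a JSON body; that path and request shape are assumptions for illustration only, so check the download endpoint's own documentation for the actual details.

import requests

headers = {
    'Authorization': 'Key YOUR_SUTRO_API_KEY',
    'Content-Type': 'application/json'
}

# List the files first; they come back in upload order.
listing = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers=headers,
    json={'dataset_id': 'dataset_12345'}
)
listing.raise_for_status()

for file in listing.json()['files']:
    # The URL and body below are hypothetical; consult the download
    # endpoint's documentation for the real path and parameters.
    download = requests.post(
        'https://api.sutro.sh/download-dataset-file',
        headers=headers,
        json={'dataset_id': 'dataset_12345', 'file_name': file['file_name']}
    )
    download.raise_for_status()
    with open(file['file_name'], 'wb') as out:
        out.write(download.content)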