Listing Files in a Dataset

POST /list-dataset-files
curl --request POST \
  --url https://api.sutro.sh/list-dataset-files \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "dataset_id": "<string>"
}
'
Example response:

{
  "files": [
    {
      "file_name": "training_batch_1.parquet",
      "file_id": "file_abc123def456",
      "uploaded_at": "2024-01-15T10:30:00Z",
      "size_bytes": 524288,
      "row_count": 1000,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    },
    {
      "file_name": "training_batch_2.parquet",
      "file_id": "file_def456ghi789",
      "uploaded_at": "2024-01-15T11:00:00Z",
      "size_bytes": 262144,
      "row_count": 500,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    }
  ]
}


Using the API directly is not recommended for most users. Instead, we recommend using the Python SDK.
List all files in a dataset.

Request Body

dataset_id
string
required
The ID of the dataset whose files to list

Headers

Authorization
string
required
Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...

Response

Returns a JSON object containing a list of files in the dataset, in the order they were uploaded.
files
array
A list of files in the dataset, ordered by upload time. Each file object contains metadata about the file including file_name, upload time, size, and other relevant information.

Code Examples

import requests

response = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers={
        'Authorization': 'Key YOUR_SUTRO_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'dataset_id': 'dataset_12345'
    }
)
response.raise_for_status()  # fail fast on 4xx/5xx responses

result = response.json()
print(f"Found {len(result['files'])} files in dataset:")

for i, file in enumerate(result['files'], 1):
    print(f"{i}. {file['file_name']}")
    print(f"   File ID: {file['file_id']}")
    print(f"   Uploaded: {file['uploaded_at']}")
    print(f"   Size: {file['size_bytes']} bytes")
    print(f"   Rows: {file['row_count']}")
    print("---")

File Object Fields

Each file in the files array contains the following fields:
  • file_name: Name of the file as it appears in the dataset
  • file_id: Unique identifier for the file
  • uploaded_at: ISO timestamp of when the file was uploaded
  • size_bytes: Size of the file in bytes
  • row_count: Number of rows/records in the file
  • schema: Schema information including field names and types
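Once the response is parsed, these fields can be aggregated locally. A minimal sketch, using the example response from this page (the values are illustrative, not live data):

```python
# Aggregate metadata from a list-dataset-files response.
# `response_json` mirrors the example response shown earlier on this page.
response_json = {
    "files": [
        {"file_name": "training_batch_1.parquet", "file_id": "file_abc123def456",
         "uploaded_at": "2024-01-15T10:30:00Z", "size_bytes": 524288, "row_count": 1000},
        {"file_name": "training_batch_2.parquet", "file_id": "file_def456ghi789",
         "uploaded_at": "2024-01-15T11:00:00Z", "size_bytes": 262144, "row_count": 500},
    ]
}

# Sum sizes and row counts across all files in the dataset.
total_rows = sum(f["row_count"] for f in response_json["files"])
total_bytes = sum(f["size_bytes"] for f in response_json["files"])
print(f"{len(response_json['files'])} files, {total_rows} rows, {total_bytes} bytes")
# 2 files, 1500 rows, 786432 bytes
```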

Notes

  • Files are returned in the order they were uploaded to the dataset
  • This ordering is preserved and matches the order used for batch inference
  • Use the file_name from this response with the download endpoint
  • All files in a dataset share the same schema structure
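Because this ordering matches the order used for batch inference, it can be worth sanity-checking locally. A sketch using only the standard library, with file names and timestamps taken from the example response (illustrative, not part of the API):

```python
from datetime import datetime

# Files from a list-dataset-files response, as returned by the endpoint.
files = [
    {"file_name": "training_batch_1.parquet", "uploaded_at": "2024-01-15T10:30:00Z"},
    {"file_name": "training_batch_2.parquet", "uploaded_at": "2024-01-15T11:00:00Z"},
]

def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" on Python < 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Verify the files arrive sorted by upload time.
timestamps = [parse_ts(f["uploaded_at"]) for f in files]
assert timestamps == sorted(timestamps), "files should arrive in upload order"

ordered_names = [f["file_name"] for f in files]
print(ordered_names)  # this order matches the batch-inference order
```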