Sutro API Documentation
Sutro is a serverless batch LLM inference service. We’re currently in beta with a focus on data processing workloads, enabling use cases including synthetic data generation, classification, structured extraction, and more.
Our product is available via our hosted batch inference API and Python SDK. You can expect an 80-90% decrease in costs relative to online inference solutions, support for very large-scale jobs, extremely high throughput, and white-glove support. If you have a use case that may benefit from our tools, please reach out to us at team@sutro.sh to discuss your requirements.
Contents
- Introduction
- Installation
- Quickstart
- Examples
- Batch API Reference
- Stage API Reference
- Python SDK and CLI Reference
- Available Models
  - Text-to-Text Models
  - Reasoning Models (text-to-text)
  - Embedding Models
  - Custom Models