Introduction
Why Sutro?
While modern pre-trained AI models and inference infrastructure typically serve transactional, online workloads like chatbots and IDEs, they are also incredibly useful for offline, large-scale data analysis and generation workloads. Sutro offers a managed, scalable, and cost-effective platform for batch and offline inference with LLMs. We handle the infrastructure, inference optimizations, and cost management, so you can focus on solving the problems you care about. You simply bring your data (and, optionally, your models), and we'll handle the rest. With Sutro, you can expect improvements in:
- Speed: large-scale jobs (thousands of requests or more) typically finish in minutes, allowing you to iterate quickly.
- Scale: our platform handles anything from a few tokens to billions of tokens per job.
- Cost: in many cases, you can pay up to 90% less than with real-time inference providers.
- Security: set custom data retention policies and optionally bring your own storage for zero-visibility deployments.
- Ease of use: get up and running quickly with our Python SDK, and monitor jobs and view results in our web observability app.