
What is Sutro?

Sutro helps you build reliable AI decision systems for repeated tasks. Instead of hand-tuning prompts and hoping they hold up in production, Sutro gives you a structured workflow to align AI behavior with your team’s actual preferences and then deploy the result at scale. Sutro has two core components:
  • Sutro Functions — Build task-specific judges, along with classifiers and extractors, that reflect your team’s decision preferences. With an evolving record book of impactful annotations, it helps you build an optimized, deployable function you can invoke by name.
  • Batch Inference — Run large-scale offline inference across thousands to millions of inputs. Easily execute your Sutro Functions at any scale, or run OSS LLMs directly for analytical and generation workloads.
Functions and Batch work best together, but can be used independently.

Sutro Functions

With Sutro Functions, you can expect:
  • Speed: Maximizes prompt quality per unit of your time; you spend minutes labeling, not hours of testing and rewriting.
  • Stability: Create a consistent foundation of expertise to measure and optimize against.
  • Maintainability: Swap models, add new data, and re-optimize without regressing on past failures.
  • Adaptability: Compress tasks into the right model for the job.
Describe your task, upload a representative sample of your data, and label the highest-impact cases Sutro surfaces. Sutro then optimizes a prompt that matches your preferences — and improves further with each iteration.
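As a rough illustration of what "matching your preferences" can mean in practice, the sketch below scores two toy candidate judges by how often they agree with a small set of human labels, then lists the cases where the better one still disagrees. It is not Sutro's implementation or API: the sample data, the `judge_keyword` and `judge_always_pass` functions, and the `agreement` and `disagreements` helpers are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled sample: (input text, human decision).
LABELED: List[Tuple[str, str]] = [
    ("Refund was processed in two days, thanks!", "pass"),
    ("Agent never answered my question.", "fail"),
    ("Issue resolved after one reply.", "pass"),
    ("I was told to call back later, three times.", "fail"),
]

Judge = Callable[[str], str]


def judge_keyword(text: str) -> str:
    """Toy candidate judge: fails any text containing the word 'never'."""
    return "fail" if "never" in text.lower() else "pass"


def judge_always_pass(text: str) -> str:
    """Toy baseline judge: passes everything."""
    return "pass"


def agreement(judge: Judge, labeled: List[Tuple[str, str]]) -> float:
    """Fraction of labeled examples where the judge matches the human label."""
    hits = sum(1 for text, label in labeled if judge(text) == label)
    return hits / len(labeled)


def disagreements(judge: Judge, labeled: List[Tuple[str, str]]) -> List[str]:
    """Inputs where the judge and the human label differ."""
    return [text for text, label in labeled if judge(text) != label]


candidates: Dict[str, Judge] = {
    "keyword": judge_keyword,
    "always_pass": judge_always_pass,
}

# Score every candidate against the labels and keep the best one.
scores = {name: agreement(fn, LABELED) for name, fn in candidates.items()}
best = max(scores, key=scores.get)

print(scores)
print(best, disagreements(candidates[best], LABELED))
```

The remaining disagreements are the kind of cases worth reviewing or labeling next, since they show exactly where a candidate diverges from the team's preferences.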

Key use cases

  • Evals for agents and single-call LLMs
  • User intent analysis
  • Data filtering and transformation (multimodal and text-based)
  • Data tagging or labelling
  • Classical ML decisioning (lead generation, fraud detection, compliance, KYC, etc.)

Sutro Functions

Learn how Functions work and what you can build with them.

Batch Inference

Sutro’s batch inference platform is the production runtime for Sutro Functions, built to process millions of rows at once. It also handles standalone large-scale offline workloads — synthetic data generation, embeddings, LLM-as-a-judge evaluations, and more. With batch inference, you can expect:
  • Speed: Large-scale jobs finish in an hour or less, not a day from now.
  • Scale: From a handful of inputs to billions of tokens per job.
  • Cost: Less than 25% of the cost of real-time inference providers.
  • Security: Custom data retention policies and optional bring-your-own-storage.

Batch Inference

Run your first batch job.

When to use Sutro

Quickly build an expert-aligned judge -> Run it at scale

Most users start with Sutro Functions to build a judge or other decision model aligned with their preferences, then use Batch Inference to run that function across production data at scale. Functions and Batch work well together because the tasks Functions formalizes best are often latency-insensitive and high-volume, which makes them well suited to batch inference. Batch Inference also works standalone for workloads that don’t need strong preference alignment, such as synthetic data generation, embeddings, and more.

When not to use Sutro

Functions

If your model will produce lengthy, abstractive, or otherwise “open-world” text rather than a specific decision, Sutro Functions won’t be a great fit today. In some cases, you may be able to optimize these types of tasks indirectly using a Sutro-built judge, if the output can be evaluated in a verifiable manner. See our task design section for more info on best practices.

Batch Inference

Batch Inference is not a good fit if you’re building a user-facing application with real-time, one-off inference calls (e.g., a chatbot) where latency is critical. For those use cases, we recommend an inference provider that optimizes for latency.

Not sure where to start?

We’d love to hear about your use case and help you figure out the right approach. We also offer custom solutions for enterprise customers. Contact us at team@sutro.sh.