A/B Tests
Datasets or use cases that hold your prompts and let you compare variants side by side.
What are A/B tests?
A/B tests are collections of requests: actual user queries together with their conversation history. They represent the datasets or use cases you want to test and optimize by comparing different variant configurations side by side.
When do you use A/B tests?
Use A/B tests to organize prompts by use case or dataset, then determine which model configuration performs best on your actual data. Compare quality, latency, and cost across variants.
How do A/B tests work?
First, add prompts to your A/B test via:
- Traces - prompts captured through LLM tracing
- Live Endpoint - prompts collected from live API traffic
- Manual Input - hand-entered examples
- File Import - CSV or JSON uploads
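For the File Import path, a minimal sketch of what an upload file might contain. The field names (`prompt`, `history`) are illustrative assumptions, not a documented schema; check your platform's import format for the exact fields it expects.

```python
import csv
import io
import json

# Hypothetical prompt records: a user query plus its conversation history.
# Field names here are assumptions for illustration only.
prompts = [
    {"prompt": "How do I reset my password?", "history": []},
    {"prompt": "Is it waterproof?", "history": ["Tell me about the X200 watch."]},
]

# JSON export: one record per prompt, history kept as a list.
json_blob = json.dumps(prompts, indent=2)

# CSV export: history flattened into a single delimited column.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["prompt", "history"])
writer.writeheader()
for p in prompts:
    writer.writerow({"prompt": p["prompt"], "history": " | ".join(p["history"])})
csv_blob = buf.getvalue()
```

Either blob could then be saved to disk and uploaded through the File Import option.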
Once you have prompts in your A/B test, you can compare up to 5 variants at once. Each variant runs on every prompt, and you evaluate the results using metrics to determine which configuration performs best.
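The comparison loop above can be sketched as follows. This is an illustrative outline, not the platform's API: `run_variant` is a stub standing in for a real model call, and the variant fields and cost model are assumptions.

```python
import statistics
import time

# Hypothetical variant configurations; names and fields are illustrative.
VARIANTS = [
    {"name": "model-small", "temperature": 0.2},
    {"name": "model-small", "temperature": 0.8},
    {"name": "model-large", "temperature": 0.2},
]

# Prompts collected into the A/B test (traces, live endpoint, manual, import).
PROMPTS = [
    "Summarize our refund policy.",
    "Draft a welcome email for a new user.",
]

def run_variant(variant, prompt):
    """Stub for a real model call: returns (output, latency_s, cost_usd)."""
    start = time.perf_counter()
    output = f"[{variant['name']} @ T={variant['temperature']}] {prompt}"
    latency = time.perf_counter() - start
    cost = 0.0001 * len(prompt)  # placeholder cost model, not real pricing
    return output, latency, cost

def compare(variants, prompts):
    """Run every variant on every prompt and aggregate latency and cost."""
    assert len(variants) <= 5, "compare at most 5 variants at once"
    results = {}
    for v in variants:
        latencies, costs = [], []
        for p in prompts:
            _, lat, cost = run_variant(v, p)
            latencies.append(lat)
            costs.append(cost)
        results[(v["name"], v["temperature"])] = {
            "avg_latency_s": statistics.mean(latencies),
            "total_cost_usd": sum(costs),
        }
    return results

summary = compare(VARIANTS, PROMPTS)
```

In practice you would replace `run_variant` with real model calls and add a quality metric (for example, an evaluator score per output) alongside latency and cost before picking a winner.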