A/B Tests

Datasets or use cases that hold your prompts and let you compare variants side-by-side.

What are A/B tests?

A/B tests are collections of requests: actual user queries together with their conversation history. Each A/B test represents a dataset or use case that you want to test and optimize by comparing different variant configurations side-by-side.
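As a rough illustration, each request can be thought of as the latest user query plus the conversation turns that preceded it. The field names below are assumptions for illustration, not the platform's actual schema:

```python
# Hypothetical shape of one A/B-test request: the user query plus the
# conversation history that came before it. Field names are illustrative,
# not the platform's documented schema.
request = {
    "prompt": "What's your refund policy for annual plans?",
    "history": [
        {"role": "user", "content": "Hi, I have a billing question."},
        {"role": "assistant", "content": "Sure, what would you like to know?"},
    ],
}

# An A/B test is simply a named collection of such requests.
ab_test = {"name": "billing-support", "requests": [request]}
```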

When do you use A/B tests?

Use A/B tests to organize prompts by use case or dataset, then determine which model configuration performs best on your actual data. Compare quality, latency, and cost across variants.

How do A/B tests work?

First, add prompts to your A/B test via:

  • Traces - LLM tracing
  • Live Endpoint - API traffic
  • Manual Input - Hand-entered examples
  • File Import - CSV/JSON uploads
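For File Import, a minimal CSV might carry one request per row. The column names here (`prompt`, `history`) are assumptions for illustration; check your platform's documented import format:

```python
import csv
import io

# Build a small import file with one request per row. Column names
# ("prompt", "history") are assumed, not the platform's documented schema.
rows = [
    {"prompt": "Summarize my last order.", "history": "[]"},
    {"prompt": "Cancel my subscription.", "history": "[]"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["prompt", "history"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# Round-trip check: the file parses back into the same records.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```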

Once you have prompts in your A/B test, you can compare up to 5 variants at once. Each variant runs on every prompt, and you evaluate the results using metrics to determine which configuration performs best.
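The comparison loop can be sketched generically: run every variant on every prompt, record metrics per run, then aggregate to pick a winner. `run_variant` below is a stand-in for whatever model call your platform makes, and the scores and costs are fabricated for the sketch:

```python
from statistics import mean

# Stand-in for the real model call; returns (quality_score, cost_usd).
# In practice each variant is a different model configuration, and the
# numbers below are fabricated for illustration.
def run_variant(variant: str, prompt: str) -> tuple[float, float]:
    scores = {"variant-a": 0.72, "variant-b": 0.91}
    costs = {"variant-a": 0.0004, "variant-b": 0.0030}
    return scores[variant], costs[variant]

prompts = ["Refund policy?", "Cancel my plan.", "Upgrade options?"]
variants = ["variant-a", "variant-b"]  # up to 5 at once

# Each variant runs on every prompt; aggregate the metrics per variant.
results = {}
for variant in variants:
    qualities, costs = [], []
    for prompt in prompts:
        quality, cost = run_variant(variant, prompt)
        qualities.append(quality)
        costs.append(cost)
    results[variant] = {"quality": mean(qualities), "cost": sum(costs)}

# Pick the variant with the best average quality; in a real comparison
# you would weigh quality against latency and cost.
best = max(results, key=lambda v: results[v]["quality"])
```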