Creating A/B Tests
Set up A/B tests to organize your prompts and compare variants.
A/B tests are datasets or use cases that bring together prompts, model variants, and evaluation criteria so you can run systematic experiments and compare performance.
Each A/B test orchestrates the complete testing workflow: it takes your test data (prompts), runs each prompt through multiple model configurations (variants), and measures the results with automated quality metrics.
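To make these pieces concrete, here is a minimal data-model sketch of how they might relate. It is illustrative only: the class and field names (`ABTest`, `Variant`, `QualityMetric`) are assumptions, not the platform's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variant:
    """One model configuration under test (hypothetical shape)."""
    name: str
    model: str
    temperature: float = 0.2


@dataclass
class QualityMetric:
    """An automated evaluation criterion, e.g. similarity to an expected output."""
    name: str
    threshold: Optional[float] = None


@dataclass
class ABTest:
    """Ties prompts, variants, and metrics together for one experiment."""
    name: str
    prompts: list[dict]            # test data, e.g. {"input": ..., "expected": ...}
    variants: list[Variant]        # at least two configurations to compare
    metrics: list[QualityMetric] = field(default_factory=list)
```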
Core Components
Each A/B test requires three key components:
- Data Sources: Choose how to load prompts into your A/B test (manual entry, file upload, traces import, or live testing through an API endpoint)
- Variants: Define at least 2 model configurations to test against each other
- Quality Metrics: Configure automatic evaluation criteria to measure output quality
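The configuration below sketches one possible shape for these three components. Every field name and value is illustrative (the platform's actual schema may differ), but it captures the required pieces: a data source supplying prompts, at least two variants, and the quality metrics to apply.

```python
# Hypothetical A/B test configuration; field names and values are illustrative only.
ab_test_config = {
    "name": "support-reply-tone",
    "data_source": {
        # Could also be a file upload, traces import, or live API endpoint.
        "type": "manual",
        "prompts": [
            {"input": "Summarize this support ticket in two sentences.",
             "expected": "A short, neutral summary of the ticket."},
        ],
    },
    "variants": [  # at least two model configurations to compare
        {"name": "baseline",  "model": "provider/model-a", "temperature": 0.2},
        {"name": "candidate", "model": "provider/model-b", "temperature": 0.2},
    ],
    "quality_metrics": [
        {"name": "semantic_similarity", "threshold": 0.8},
    ],
}
```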
How It All Works Together
Here's the flow when you run an A/B test:
1. The A/B test pulls prompts from your configured Data Source
2. Each prompt is sent to all selected Variants in parallel
3. Every variant generates an output for each prompt
4. Quality Metrics automatically evaluate each output (if expected outputs are provided)
5. Results are aggregated and displayed, broken down by variant
This creates a complete comparison matrix: for every prompt, you can see how each variant performed across cost, latency, and quality dimensions.
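The sketch below walks through that flow end to end, under stated assumptions: `call_variant` is a stand-in for the real provider call, the quality metric is a toy word-overlap score, and the cost figure is hard-coded for illustration. The platform's own metrics, pricing, and latency measurements would take their place in a real run.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def call_variant(variant: dict, prompt: dict) -> dict:
    """Stand-in for the real model call; returns the output plus cost and latency."""
    start = time.perf_counter()
    output = f"[{variant['name']}] response to: {prompt['input']}"  # placeholder output
    return {
        "variant": variant["name"],
        "prompt": prompt["input"],
        "output": output,
        "cost_usd": 0.0004,                          # illustrative figure only
        "latency_s": time.perf_counter() - start,
    }


def word_overlap(output: str, expected: Optional[str]) -> Optional[float]:
    """Toy quality metric: fraction of expected words present in the output."""
    if expected is None:
        return None
    out_words, exp_words = set(output.lower().split()), set(expected.lower().split())
    return len(out_words & exp_words) / max(len(exp_words), 1)


def run_ab_test(prompts: list[dict], variants: list[dict]) -> dict:
    rows = []
    with ThreadPoolExecutor() as pool:
        for prompt in prompts:                       # step 1: prompts from the data source
            # Steps 2-3: send each prompt to all variants in parallel.
            futures = [pool.submit(call_variant, v, prompt) for v in variants]
            for future in futures:
                row = future.result()
                # Step 4: metrics run automatically when an expected output exists.
                row["quality"] = word_overlap(row["output"], prompt.get("expected"))
                rows.append(row)
    # Step 5: aggregate per variant to build the comparison matrix.
    summary = {}
    for v in variants:
        v_rows = [r for r in rows if r["variant"] == v["name"]]
        scores = [r["quality"] for r in v_rows if r["quality"] is not None]
        summary[v["name"]] = {
            "avg_cost_usd": statistics.mean(r["cost_usd"] for r in v_rows),
            "avg_latency_s": statistics.mean(r["latency_s"] for r in v_rows),
            "avg_quality": statistics.mean(scores) if scores else None,
        }
    return summary
```

Calling `run_ab_test(ab_test_config["data_source"]["prompts"], ab_test_config["variants"])` with the configuration sketched earlier returns one summary entry per variant, which is the per-variant breakdown described above.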