Creating A/B Tests

Set up A/B tests to organize your prompts and compare variants.

An A/B test is a dataset or use case that brings together prompts, model variants, and evaluation criteria so you can run systematic experiments and compare performance.

Each A/B test orchestrates the complete testing workflow: it takes your test data (prompts), runs them through multiple model configurations (variants), and measures the results using automated quality metrics.
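
Conceptually, an A/B test is just a bundle of these pieces. The sketch below models that bundle as plain Python dataclasses; the names (`ABTest`, `Variant`, `QualityMetric`) are illustrative and not part of any specific SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Variant:
    # One model configuration to compare, e.g. a model name plus sampling settings.
    name: str
    model: str
    temperature: float = 0.0

@dataclass
class QualityMetric:
    # An automatic evaluation: scores an output, optionally against an expected output.
    name: str
    score: Callable[[str, Optional[str]], float]

@dataclass
class ABTest:
    # The A/B test bundles prompts (data source), variants, and quality metrics.
    name: str
    prompts: list[dict] = field(default_factory=list)   # each row: {"input": ..., "expected": ...}
    variants: list[Variant] = field(default_factory=list)
    metrics: list[QualityMetric] = field(default_factory=list)
```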

Core Components

Each A/B test requires three key components, illustrated in the sketch after this list:

  • Data Sources: Choose how to load prompts into your A/B test (manual entry, file upload, traces import, or live testing through an API endpoint)
  • Variants: Define at least 2 model configurations to test against each other
  • Quality Metrics: Configure automatic evaluation criteria to measure output quality
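
Continuing the dataclass sketch above, here is how the three components might be filled in for a small test: a few manually entered prompts, two variants, and one exact-match metric. All names and values are illustrative placeholders.

```python
# Data source: manual entry (a list of prompt rows with optional expected outputs).
prompts = [
    {"input": "Translate 'hello' to French.", "expected": "bonjour"},
    {"input": "Translate 'goodbye' to French.", "expected": "au revoir"},
]

# Variants: at least two model configurations to compare.
variants = [
    Variant(name="baseline", model="model-a", temperature=0.0),
    Variant(name="candidate", model="model-b", temperature=0.7),
]

# Quality metric: a simple automatic check against the expected output.
exact_match = QualityMetric(
    name="exact_match",
    score=lambda output, expected: float(
        expected is not None and output.strip().lower() == expected.strip().lower()
    ),
)

test = ABTest(name="translation-check", prompts=prompts,
              variants=variants, metrics=[exact_match])
```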

How It All Works Together

Here's the flow when you run an A/B test (sketched in code after the steps):

  1. A/B test pulls prompts from your configured Data Source
  2. Each prompt is sent to all selected Variants in parallel
  3. Every variant generates an output for each prompt
  4. Quality Metrics automatically evaluate each output (if expected outputs are provided)
  5. Results are aggregated and displayed, broken down by variant
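
The loop below sketches steps 1 through 4 using the same illustrative dataclasses: each prompt is fanned out to every variant in parallel, and each output is scored by every metric. `call_model` is a stand-in for whatever completion call your stack uses, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(variant: Variant, prompt: str) -> str:
    # Placeholder for the real completion call; returns a dummy string here.
    return f"[{variant.model}] response to: {prompt}"

def run_ab_test(test: ABTest) -> list[dict]:
    results = []
    with ThreadPoolExecutor() as pool:
        for row in test.prompts:                       # 1. pull prompts from the data source
            futures = {                                # 2. send the prompt to all variants in parallel
                variant.name: pool.submit(call_model, variant, row["input"])
                for variant in test.variants
            }
            for variant_name, future in futures.items():
                output = future.result()               # 3. each variant produces an output
                scores = {                             # 4. metrics score the output against the expected value
                    m.name: m.score(output, row.get("expected"))
                    for m in test.metrics
                }
                results.append({"prompt": row["input"], "variant": variant_name,
                                "output": output, "scores": scores})
    return results
```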

This creates a complete comparison matrix: for every prompt, you can see how each variant performed across cost, latency, and quality dimensions.
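
Step 5, aggregation, reduces those per-prompt rows into a per-variant summary. The sketch below averages each metric per variant; a real dashboard would also roll up cost and latency, which the placeholder `call_model` above does not record.

```python
from collections import defaultdict

def summarize(results: list[dict]) -> dict[str, dict[str, float]]:
    # Average each metric per variant: one row of the comparison matrix per variant.
    totals: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for row in results:
        for metric_name, value in row["scores"].items():
            totals[row["variant"]][metric_name].append(value)
    return {
        variant: {metric: sum(vals) / len(vals) for metric, vals in metrics.items()}
        for variant, metrics in totals.items()
    }

# e.g. {'baseline': {'exact_match': ...}, 'candidate': {'exact_match': ...}}
summary = summarize(run_ab_test(test))
```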