Running Tests
Execute A/B tests and generate experiments to compare variant performance.
Once you've configured your A/B test with data sources, variants, and metrics, you're ready to run experiments and analyze results.
The Testing Flow
Running an A/B test follows a three-step workflow through the interface tabs:
1. Setup Tab
Configure all the necessary components (a configuration sketch follows this list):
- Add your data source (prompts to test)
- Select or create at least 2 variants to compare
- Enable quality metrics for evaluation
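To make the moving parts concrete, here is a minimal sketch of the same configuration expressed as plain Python data. The class names, fields, and model identifiers are illustrative placeholders rather than the platform's API; the point is simply that a runnable test needs a prompt set, at least two variants, and at least one enabled metric.

```python
# A minimal sketch of a test configuration, using plain Python data holders.
# Class names, fields, and model identifiers are illustrative, not the platform's API.
from dataclasses import dataclass, field

@dataclass
class Variant:
    name: str
    model: str
    system_prompt: str
    temperature: float = 0.7
    max_tokens: int = 512

@dataclass
class ABTestConfig:
    prompts: list[str]                                  # data source: prompts to test
    variants: list[Variant]                             # at least two variants to compare
    metrics: list[str] = field(default_factory=list)    # enabled quality metrics

config = ABTestConfig(
    prompts=["Summarize this support ticket...", "Draft a polite follow-up email..."],
    variants=[
        Variant("baseline", model="model-a", system_prompt="You are a helpful assistant."),
        Variant("candidate", model="model-b", system_prompt="You are a concise assistant.",
                temperature=0.3),
    ],
    metrics=["relevance", "conciseness"],
)
assert len(config.variants) >= 2, "An A/B test needs at least two variants"
```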
2. Run Your Test
Once everything is configured in Setup, click the Run button to start an experiment:
- The button appears in the top-right corner of the A/B test interface
- Click Run to generate a new experiment
- The platform will:
  - Send each prompt to all selected variants in parallel
  - Collect responses from each variant
  - Automatically evaluate outputs using your configured metrics
  - Calculate cost and latency for each variant
Each run creates a new experiment—a snapshot of results for all variants at that moment. This lets you track performance changes over time as you adjust configurations.
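The sketch below approximates what a single run does with the configuration above: fan each prompt out to every variant in parallel, time and collect the responses, score them with the enabled metrics, and bundle everything into one experiment record. `call_model` and `evaluate` are stand-in stubs for the platform's model calls and metric scoring, which this sketch does not reproduce.

```python
# A rough sketch of what one Run does with the config above. call_model and
# evaluate are placeholder stubs standing in for the platform's model calls
# and metric scoring.
import time
from concurrent.futures import ThreadPoolExecutor

def call_model(variant, prompt):
    # Placeholder: a real run sends the prompt to the variant's model and
    # returns the response text plus token usage.
    return {"text": f"[{variant.name}] response to: {prompt}", "tokens": 42}

def evaluate(metric, prompt, response_text):
    # Placeholder: each output is scored against one enabled metric.
    return 1.0

def run_one(variant, prompt, metrics):
    start = time.perf_counter()
    response = call_model(variant, prompt)
    return {
        "variant": variant.name,
        "prompt": prompt,
        "response": response["text"],
        "latency_s": time.perf_counter() - start,           # latency per call
        "cost_tokens": response["tokens"],                   # simple proxy for cost
        "scores": {m: evaluate(m, prompt, response["text"]) for m in metrics},
    }

def run_experiment(config):
    # Fan out: every prompt goes to every selected variant, in parallel.
    jobs = [(variant, prompt) for prompt in config.prompts for variant in config.variants]
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(lambda job: run_one(*job, config.metrics), jobs))
    # One experiment = a snapshot of every row produced by this run.
    return {"created_at": time.time(), "rows": rows}

experiment = run_experiment(config)
```

Treating each prompt-variant pair as an independent job is what makes the parallel fan-out, and the per-variant latency and cost accounting, straightforward.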
3. Results Tab
After running, you'll be automatically redirected to the Results tab where you can:
- View aggregated metrics across all variants
- Compare cost, latency, and quality side-by-side
- Drill down into individual prompts and responses
- Identify which variant performs best for your use case (sketched below)
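Conceptually, the comparison the Results tab presents looks like the aggregation below: group the experiment's rows by variant, average quality scores and latency, sum cost, and surface the strongest variant. Ranking on a single primary metric is an assumption made for this sketch; in the interface you compare all metrics side by side.

```python
# An illustrative aggregation: group rows by variant, average score and latency,
# sum token cost, and pick a winner on one primary metric. The single-metric
# ranking rule is an assumption for this sketch.
from collections import defaultdict
from statistics import mean

def summarize(experiment, primary_metric):
    by_variant = defaultdict(list)
    for row in experiment["rows"]:
        by_variant[row["variant"]].append(row)

    summary = {
        name: {
            "avg_score": mean(r["scores"][primary_metric] for r in rows),
            "avg_latency_s": mean(r["latency_s"] for r in rows),
            "total_cost_tokens": sum(r["cost_tokens"] for r in rows),
        }
        for name, rows in by_variant.items()
    }
    best = max(summary, key=lambda name: summary[name]["avg_score"])
    return summary, best

summary, best_variant = summarize(experiment, primary_metric="relevance")
print(best_variant, summary[best_variant])
```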
When to Re-run Tests
Important: After making changes to your A/B test configuration, you must run the test again to see their impact. Previous results remain unchanged until you generate a new experiment.
You need to click Run again whenever you:
Change Variants
- Add or remove variants from your test
- Modify variant settings (model, system prompt, parameters)
- Adjust temperature, max tokens, or other model parameters
The most common scenario is tweaking variant configurations to improve performance. Each change requires a new run to generate fresh results.
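Continuing the earlier sketches (same `config`, `run_experiment`, and `summarize`), the re-run rule can be expressed in a few lines: previous experiments stay untouched in the history, and a tweaked variant only shows up in the next experiment you generate.

```python
# Re-running after a change, reusing run_experiment and summarize from the
# sketches above. Earlier experiments are kept as-is; only a new run reflects
# the new settings.
history = [experiment]                          # snapshot from the first run

config.variants[1].temperature = 0.1            # tweak a variant setting
history.append(run_experiment(config))          # Run again to see the impact

for i, exp in enumerate(history):
    _, best = summarize(exp, primary_metric="relevance")
    print(f"experiment {i}: best variant = {best}")
```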