Quickstart
Quick start guide to running your first experiment
This page gives you a quick feel for running A/B tests with Narev.
Overview
This guide walks you through running your first experiment in Narev using the pre-configured HellaSwag Default Experiment, which compares GPT-4o Mini against Claude 3.5 Haiku on the HellaSwag dataset.
Running Your First Experiment
- Access the default experiment: When you first log in, you will see the default experiment.
- Run the experiment: Click "Run Experiment" in the top right.
- Wait for completion: The experiment is queued and executed, and its metrics are calculated automatically.
- View results: Once the run completes, scroll down to the experiment impact summary, which compares the performance of the variants.
Understanding Your Results
After the experiment completes, go to the Results tab. You'll see:
- Summary: Quick comparison between variants showing improvements in cost, latency, and quality
- Cost: Drill down on cost performance across variants
- Latency: Drill down on total latency, time to first token, and tokens per second for each variant
- Quality: Details on the quality metrics for each variant
- Prompt-by-Prompt: Detailed breakdown of how each variant performed on individual prompts
For a detailed guide on analyzing results and understanding A/B testing concepts, see the Introduction to A/B Testing.
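To make the three latency metrics in the Latency view concrete, here is a minimal Python sketch of how they are commonly derived from the timestamps of a single streamed response. This is an illustration only, not Narev's implementation: the `ResponseTiming` class, its field names, and the choice to measure tokens per second over the generation phase (after the first token arrives) are all assumptions, and Narev may define these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class ResponseTiming:
    """Timestamps, in seconds, for one streamed response (hypothetical fields)."""
    request_sent: float
    first_token_received: float
    last_token_received: float
    output_tokens: int


def latency_metrics(t: ResponseTiming) -> dict:
    """Derive total latency, time to first token, and tokens per second."""
    total_latency = t.last_token_received - t.request_sent
    time_to_first_token = t.first_token_received - t.request_sent

    # Assumption: throughput is measured over the generation phase only,
    # so the wait for the first token does not count against it.
    generation_time = t.last_token_received - t.first_token_received
    if generation_time > 0:
        tokens_per_second = t.output_tokens / generation_time
    else:
        tokens_per_second = float("nan")  # undefined for an instantaneous response

    return {
        "total_latency_s": total_latency,
        "time_to_first_token_s": time_to_first_token,
        "tokens_per_second": tokens_per_second,
    }


# Example: 100 tokens, first token after 0.5 s, last token after 2.5 s.
example = latency_metrics(ResponseTiming(0.0, 0.5, 2.5, 100))
print(example)
# {'total_latency_s': 2.5, 'time_to_first_token_s': 0.5, 'tokens_per_second': 50.0}
```

A variant can have a low time to first token but a high total latency (or the reverse), which is why it helps to read the three numbers together rather than in isolation.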
Next Steps
Now that you've run your first experiment, you can:
- Create custom A/B tests with your own prompts and data sources
- Set up model variants to compare additional LLMs
- Configure custom evaluation metrics tailored to your use case