Quickstart

Quick start guide to running your first experiment

This page gives you a quick feel for running A/B tests with Narev.

Overview

This guide walks you through running your first experiment in Narev using the pre-configured HellaSwag Default Experiment. This experiment compares GPT-4o Mini against Claude 3.5 Haiku on the HellaSwag dataset, a commonsense sentence-completion benchmark.

Running Your First Experiment

1. Access the default experiment: When you first log in, you'll see the default experiment.
2. Run the experiment: Click "Run Experiment" in the top right.
3. Wait for completion: The experiment is queued and executed, and its metrics are calculated automatically.
4. View results: Once the run is complete, scroll down to see the experiment impact summary, which compares the variants' performance.

Understanding Your Results

After the experiment completes, open the Results tab. You'll see:

Summary: A quick comparison between variants showing improvements in cost, latency, and quality
Cost: Drill down into cost performance across variants
Latency: Drill down into total latency, time to first token, and tokens per second for each variant
Quality: Details of the quality metrics for each variant
Prompt-by-Prompt: A detailed breakdown of how each variant performed on each individual prompt
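To make the Summary numbers easier to interpret, here is a minimal sketch of one plausible way per-prompt results could be rolled up per variant and compared against a baseline as a percent change. It is not Narev's implementation: the PromptResult record, its fields, and the summarize, compare, and relative_change helpers are hypothetical. Tokens per second is computed over the generation phase after the first token, which is one common convention among several, and quality is reduced to accuracy, which suits a multiple-choice dataset like HellaSwag.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    """One prompt's outcome for one variant (hypothetical record shape)."""
    cost_usd: float         # cost of the request
    total_latency_s: float  # request start to last token
    ttft_s: float           # time to first token
    output_tokens: int      # number of tokens generated
    correct: bool           # e.g. whether the variant picked the right ending


def summarize(results: list[PromptResult]) -> dict[str, float]:
    """Aggregate per-prompt results into per-variant metrics."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    # Assumed convention: throughput measured after the first token arrives.
    tps = [
        r.output_tokens / (r.total_latency_s - r.ttft_s)
        for r in results
        if r.total_latency_s > r.ttft_s
    ]
    return {
        "total_cost_usd": sum(r.cost_usd for r in results),
        "mean_latency_s": sum(r.total_latency_s for r in results) / n,
        "mean_ttft_s": sum(r.ttft_s for r in results) / n,
        "mean_tokens_per_s": sum(tps) / len(tps) if tps else float("nan"),
        "accuracy": sum(r.correct for r in results) / n,
    }


def relative_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate versus baseline; NaN if baseline is zero."""
    if baseline == 0:
        return float("nan")
    return (candidate - baseline) / baseline * 100.0


def compare(baseline: list[PromptResult], candidate: list[PromptResult]) -> dict[str, float]:
    """Percent change of every summary metric, candidate versus baseline."""
    b = summarize(baseline)
    c = summarize(candidate)
    return {key: relative_change(b[key], c[key]) for key in b}
```

When reading the output, a negative percent change is an improvement for cost and latency, while a positive one is an improvement for tokens per second and accuracy.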

For a detailed guide on analyzing results and understanding A/B testing concepts, see the Introduction to A/B Testing.

Next Steps

Now that you've run your first experiment, you can:

Create custom A/B tests with your own prompts and data sources
Set up model variants to compare additional LLMs
Configure custom evaluation metrics tailored to your use case
