Quickstart

Quick start guide to running your first experiment

This page gives you a quick feel for running A/B tests with Narev.

Overview

This guide walks you through running your first experiment in Narev using the pre-configured HellaSwag Default Experiment. This experiment compares GPT-4o Mini against Claude 3.5 Haiku on the HellaSwag dataset.
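For context, HellaSwag is a multiple-choice benchmark: each item gives a context and four candidate endings, and the model must pick the most plausible one. The sketch below shows how accuracy is commonly scored on such items. It is only an illustration: the example item is made up, the `accuracy` helper is hypothetical, and Narev's own quality metric may be computed differently.

```python
# Made-up item in the shape used by the public HellaSwag dataset:
# a context, four candidate endings, and the index of the correct ending.
example_item = {
    "ctx": "A man pours water into a kettle. He",
    "endings": [
        "throws the kettle out of the window.",
        "places the kettle on the stove and turns on the heat.",
        "paints the kettle blue.",
        "reads a newspaper to the kettle.",
    ],
    "label": 1,
}


def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of items where the chosen ending index matches the label.

    Hypothetical helper for illustration, not part of any Narev API.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up choices from one variant on four items, with the correct indices.
variant_predictions = [1, 0, 3, 2]
correct_labels = [1, 0, 2, 2]
print(accuracy(variant_predictions, correct_labels))
```

Comparing this number across the two variants is the simplest form of the quality comparison an A/B test on this dataset could report.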

Running Your First Experiment

  1. Access the default experiment - When you first log in, you will see the default experiment
  2. Run the experiment - Click "Run Experiment" in the top right
  3. Wait for completion - The experiment is queued and executed, and its metrics are calculated automatically
  4. View results - Once complete, scroll down to see the experiment impact summary showing performance comparisons

Understanding Your Results

After completion, go to the Results tab. You'll see:

  • Summary: Quick comparison between variants showing improvements in cost, latency, and quality
  • Cost: Drill down on cost performance across variants
  • Latency: Drill down on total latency, time to first token, and tokens per second for each variant
  • Quality: Details on the quality metrics for each variant
  • Prompt-by-Prompt: Detailed breakdown of how each variant performed on individual prompts
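
To make the latency metrics and the summary comparison above more concrete, here is a minimal sketch of how such numbers are typically derived from per-response timestamps. All names (`Timing`, `latency_metrics`, `relative_change`) are hypothetical and not part of Narev. The definition of tokens per second used here (output tokens divided by generation time after the first token) is an assumption; some tools divide by total latency instead.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Hypothetical per-response timestamps in seconds, plus output size."""
    request_sent: float
    first_token: float
    last_token: float
    output_tokens: int


def latency_metrics(t: Timing) -> dict[str, float]:
    """Derive the three latency metrics from one response's timestamps."""
    total_latency = t.last_token - t.request_sent
    time_to_first_token = t.first_token - t.request_sent
    generation_time = t.last_token - t.first_token
    # Assumption: throughput is measured over the generation phase only.
    tokens_per_second = (
        t.output_tokens / generation_time if generation_time > 0 else float("nan")
    )
    return {
        "total_latency_s": total_latency,
        "time_to_first_token_s": time_to_first_token,
        "tokens_per_second": tokens_per_second,
    }


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of the candidate versus the baseline.

    A negative value means the candidate is lower, which is an
    improvement for cost and latency but a regression for quality.
    """
    return (candidate - baseline) / baseline


# Made-up timings for a single response.
metrics = latency_metrics(Timing(0.0, 0.5, 2.5, 100))
print(metrics)

# Made-up average latencies for two variants: 2.0 s versus 1.5 s.
print(relative_change(2.0, 1.5))
```

A summary view would usually aggregate these per-response values (for example, as a mean or median per variant) before comparing variants.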

For a detailed guide on analyzing results and understanding A/B testing concepts, see the Introduction to A/B Testing.

Next Steps

Now that you've run your first experiment, you can:

  1. Create custom A/B tests with your own prompts and data sources
  2. Set up model variants to compare additional LLMs
  3. Configure custom evaluation metrics tailored to your use case