Introduction

Learn how to compare variants and optimize your AI applications with A/B testing in Narev.

How does benchmarking with Narev work?

There are three steps to getting your benchmark up and running.

Create a dataset and define a quality metric

This dataset contains all the prompts the benchmark will run against. Here you also define how each response is evaluated, for example with an exact-match check against a reference answer.
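
The exact dataset format depends on your setup. As an illustration only, the Python sketch below writes a hypothetical JSONL dataset of prompts with reference answers and defines a simple exact-match quality metric; the file name, field names, and the exact_match function are assumptions, not Narev's actual schema or API.

```python
import json

# Hypothetical dataset: one prompt per record, each with a reference answer.
# The field names ("prompt", "expected") are illustrative, not Narev's schema.
dataset = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "What is 12 * 8?", "expected": "96"},
]

# Write the dataset as JSONL, one record per line.
with open("benchmark_dataset.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")

def exact_match(response: str, expected: str) -> float:
    """A minimal quality metric: 1.0 if the trimmed response equals the
    reference answer, else 0.0. Real benchmarks often use fuzzier scoring."""
    return 1.0 if response.strip() == expected.strip() else 0.0
```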

Add variants to your benchmark

Each variant is a combination of:

  • model and provider (e.g. DeepSeek R1 from OpenRouter)
  • system prompt
  • parameters

You can add as many variants as needed to determine the most cost-efficient model; a hypothetical variant definition is sketched below.
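
Concretely, you can think of a variant as a small configuration object. The following is a hypothetical Python representation; the keys, model identifiers, and values are assumptions for illustration and do not reflect Narev's actual configuration format.

```python
# Hypothetical variant definitions: each combines a model/provider,
# a system prompt, and sampling parameters. Keys are illustrative only.
variants = [
    {
        "name": "deepseek-r1-strict",
        "model": "deepseek/deepseek-r1",  # model + provider (e.g. via OpenRouter)
        "system_prompt": "Answer concisely in one sentence.",
        "parameters": {"temperature": 0.0, "max_tokens": 256},
    },
    {
        "name": "deepseek-r1-creative",
        "model": "deepseek/deepseek-r1",
        "system_prompt": "Answer with a short explanation.",
        "parameters": {"temperature": 0.7, "max_tokens": 512},
    },
]
```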

Run the benchmark and interpret the results
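
Once the benchmark has run, interpreting the results comes down to comparing each variant's quality score against its cost. The Python sketch below aggregates hypothetical per-run results; the result fields ("quality", "cost_usd") and the example numbers are assumptions, not Narev's actual output format.

```python
from statistics import mean

# Hypothetical per-run results: "quality" comes from the metric defined
# with the dataset, "cost_usd" is the provider cost of that run.
results = {
    "deepseek-r1-strict":   [{"quality": 1.0, "cost_usd": 0.0010},
                             {"quality": 1.0, "cost_usd": 0.0012}],
    "deepseek-r1-creative": [{"quality": 1.0, "cost_usd": 0.0031},
                             {"quality": 0.0, "cost_usd": 0.0029}],
}

# Average quality and cost per variant to surface the best trade-off.
for name, runs in results.items():
    avg_quality = mean(r["quality"] for r in runs)
    avg_cost = mean(r["cost_usd"] for r in runs)
    print(f"{name}: quality={avg_quality:.2f}, cost=${avg_cost:.4f}/run")
```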

(optional) Publish the benchmark to the Hub

Narev Hub is a collection of public benchmarks created by our user community.