Quality evaluations - Narev Docs

What are quality evaluations?

Quality evaluations measure how well your variants perform. There are two types of quality evaluations:

Evaluations that require a source of truth, such as expected output matching
Evaluations that don’t require a source of truth, such as structured output schema checks
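
As a minimal sketch of the two types, the Python below shows one check of each kind. The function names `exact_match` and `matches_schema` and the 0.0 to 1.0 scoring scale are assumptions made for illustration; they are not part of Narev's API.

```python
import json


def exact_match(response: str, expected: str) -> float:
    """Requires a source of truth: 1.0 if the response equals the expected output."""
    return 1.0 if response.strip() == expected.strip() else 0.0


def matches_schema(response: str, required_keys: dict[str, type]) -> float:
    """Requires no source of truth: 1.0 if the response is a JSON object
    that contains every required key with a value of the required type."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    ok = all(isinstance(data.get(key), kind) for key, kind in required_keys.items())
    return 1.0 if ok else 0.0
```

The first check cannot run without an expected value, while the second only inspects the shape of the response.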

You can define the source of truth with:

Expected output defined when you create the benchmark, which works best when responses are deterministic
State-of-the-art model output used as a reference baseline

When do you use quality evaluations?

Use quality evaluations when you want to compare how different variants perform on the same benchmark.

How do quality evaluations work?

Quality evaluations take a model response as input and score the response against your selected evaluation criteria.
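
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Narev's implementation: the `evaluate` function, the `Criterion` type, and the two example criteria are hypothetical names introduced here.

```python
from typing import Callable

# Hypothetical: a criterion maps a model response to a score between 0.0 and 1.0.
Criterion = Callable[[str], float]


def evaluate(response: str, criteria: dict[str, Criterion]) -> dict[str, float]:
    """Score one model response against each selected criterion."""
    return {name: check(response) for name, check in criteria.items()}


# Example criteria chosen for illustration only.
criteria: dict[str, Criterion] = {
    "matches_expected": lambda r: 1.0 if r.strip() == "4" else 0.0,
    "is_short": lambda r: 1.0 if len(r) <= 10 else 0.0,
}

scores = evaluate("4", criteria)
```

Running the same criteria over the responses from each variant gives scores that can be compared directly.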
