Quality Evaluations

Define how to evaluate whether a variant performed correctly.

What are Quality Evaluations?

Quality evaluations measure how well your variants performed.

There are two types of quality evaluations, which differ in how the source of truth is defined:

Expected output - the source of truth is defined when the benchmark is created; best for benchmarks where the correct response is obvious
State of the art model - the source of truth is defined by the response of the State of the Art model

Use quality evaluations to evaluate the performance of a variant on the benchmark.

Under the hood, quality evaluations take the model's response as input and evaluate its correctness against the source of truth.
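
The flow described above can be sketched roughly as follows. This is an illustrative outline only, not the product's actual implementation: the names BenchmarkItem, resolve_source_of_truth, and evaluate_quality are hypothetical, the state of the art model is stood in for by a plain callable, and correctness is assumed to be a case-insensitive exact match, which a real evaluation may well replace with something more lenient.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BenchmarkItem:
    """Hypothetical benchmark entry; field names are assumptions."""
    prompt: str
    # Set at benchmark creation time when the correct response is obvious.
    expected_output: Optional[str] = None


def resolve_source_of_truth(
    item: BenchmarkItem,
    reference_model: Optional[Callable[[str], str]] = None,
) -> str:
    """Pick the source of truth: expected output first, else the reference model."""
    if item.expected_output is not None:
        return item.expected_output
    if reference_model is None:
        raise ValueError("no expected output and no reference model provided")
    # Assumption: the state of the art model is any callable prompt -> response.
    return reference_model(item.prompt)


def evaluate_quality(response: str, source_of_truth: str) -> bool:
    """Assumed correctness check: exact match ignoring case and outer whitespace."""
    return response.strip().lower() == source_of_truth.strip().lower()


if __name__ == "__main__":
    # Type 1: expected output defined with the benchmark.
    item = BenchmarkItem(prompt="What is 2 + 2?", expected_output="4")
    print(evaluate_quality(" 4 ", resolve_source_of_truth(item)))

    # Type 2: source of truth produced by a stand-in reference model.
    open_item = BenchmarkItem(prompt="Capital of France?")
    truth = resolve_source_of_truth(open_item, lambda prompt: "Paris")
    print(evaluate_quality("paris", truth))
```

In this sketch the variant's response is the only thing the evaluator needs besides the source of truth, which mirrors the description above: the response goes in, a correctness verdict comes out.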
