
What are quality evaluations?

Quality evaluations measure how well your variants perform. There are two types of quality evaluations:
  • Evaluations that require a source of truth, such as expected output matching
  • Evaluations that don’t require a source of truth, such as structured output schema checks
You can define the source of truth in one of two ways:
  • Expected output defined when you create the benchmark, which works best when responses are deterministic
  • Output from a state-of-the-art model, used as a reference baseline
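The two kinds of evaluation can be sketched as plain scoring functions. This is a minimal illustration, not the product's API: the function names, the 0.0 to 1.0 score range, and the use of "a JSON object with required keys" as the schema are all assumptions. The `expected` argument stands for whichever source of truth you chose, either the expected output stored with the benchmark or a reference model's output. Exact matching like this only makes sense when responses are deterministic.

```python
import json


def expected_output_match(response: str, expected: str) -> float:
    """Requires a source of truth (hypothetical helper).

    `expected` is either the benchmark's expected output or a reference
    model's output. Returns 1.0 on an exact match, ignoring surrounding
    whitespace, and 0.0 otherwise.
    """
    return 1.0 if response.strip() == expected.strip() else 0.0


def schema_check(response: str, required_keys: set) -> float:
    """Needs no source of truth (hypothetical helper).

    Returns 1.0 if the response parses as a JSON object that contains every
    required key, and 0.0 otherwise.
    """
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all
    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object
    return 1.0 if required_keys.issubset(parsed) else 0.0


# Usage
print(expected_output_match("Paris", "Paris"))       # 1.0
print(schema_check('{"city": "Paris"}', {"city"}))   # 1.0
print(schema_check("The city is Paris.", {"city"}))  # 0.0
```

The schema check never looks at a reference answer, which is why it works even when no expected output exists.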

When do you use quality evaluations?

Use quality evaluations to compare how each variant performs on the same benchmark, so you can see which variant produces better responses.

How do quality evaluations work?

Quality evaluations take a model response as input and score the response against your selected evaluation criteria.
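As a rough sketch of that flow, assuming each criterion is a function that returns a score between 0.0 and 1.0, you could score every response a variant produces against each selected criterion and average per criterion, which lets you compare variants run on the same benchmark side by side. `score_response`, `score_variant`, `exact_match`, and `valid_json` are hypothetical names chosen for illustration, not part of any documented API.

```python
import json
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical: a criterion maps (model response, source of truth) to a score in [0.0, 1.0].
# Criteria that need no source of truth simply ignore the second argument.
Criterion = Callable[[str, str], float]


def exact_match(response: str, truth: str) -> float:
    """Uses the source of truth: 1.0 if the response equals it."""
    return 1.0 if response.strip() == truth.strip() else 0.0


def valid_json(response: str, truth: str) -> float:
    """Ignores the source of truth: 1.0 if the response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def score_response(response: str, truth: str, criteria: Dict[str, Criterion]) -> Dict[str, float]:
    """Score one model response against every selected criterion."""
    return {name: criterion(response, truth) for name, criterion in criteria.items()}


def score_variant(responses: List[str], truths: List[str], criteria: Dict[str, Criterion]) -> Dict[str, float]:
    """Average each criterion's score over all benchmark cases for one variant."""
    per_case = [score_response(r, t, criteria) for r, t in zip(responses, truths)]
    return {name: mean(case[name] for case in per_case) for name in criteria}


criteria = {"exact_match": exact_match, "valid_json": valid_json}
truths = ['{"city": "Paris"}', '{"city": "Rome"}']
variant_a = ['{"city": "Paris"}', '{"city": "Milan"}']
variant_b = ['{"city": "Paris"}', "Rome"]

print(score_variant(variant_a, truths, criteria))  # {'exact_match': 0.5, 'valid_json': 1.0}
print(score_variant(variant_b, truths, criteria))  # {'exact_match': 0.5, 'valid_json': 0.5}
```

Here both variants tie on exact match, but variant A always returns well-formed JSON, which is the kind of difference a per-criterion breakdown makes visible.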