What are quality evaluations?
Quality evaluations measure how well your variants perform. There are two types of quality evaluations:
- Evaluations that require a source of truth, such as expected output matching
- Evaluations that don’t require a source of truth, such as structured output schema checks
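
For evaluations that require a source of truth, the source of truth can be either of the following:
- Expected output defined when you create the benchmark, which works best when responses are deterministic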
- State-of-the-art model output used as a reference baseline
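
To make the two evaluation types concrete, here is a minimal Python sketch. It is not the product's implementation: the function names `exact_match` and `has_required_fields`, and the idea of describing a schema as a mapping of field names to Python types, are assumptions made for illustration only.

```python
import json


def exact_match(response: str, expected: str) -> bool:
    """Source-of-truth check (hypothetical helper): compare a response
    to an expected output, ignoring leading and trailing whitespace."""
    return response.strip() == expected.strip()


def has_required_fields(response: str, required: dict[str, type]) -> bool:
    """Schema check with no source of truth (hypothetical helper):
    the response must be a JSON object containing every required
    field with a value of the required type."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(key), typ) for key, typ in required.items())


# Expected output matching needs a reference answer.
matches = exact_match("Paris\n", "Paris")

# A schema check only inspects the shape of the response.
well_formed = has_required_fields(
    '{"city": "Paris", "population": 2100000}',
    {"city": str, "population": int},
)
```

The first check can only pass when the reference answer is known in advance, which is why it suits deterministic responses. The second check says nothing about whether the content is correct, only that it is well formed.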