Quality Evaluations
Define how to evaluate whether a variant performed correctly.
What are Quality Evaluations?
Quality evaluations measure how well your variants performed.
There are two types of quality evaluations:
- Evaluations that require a source of truth (e.g., LLM Judge)
- Evaluations that do not require a source of truth (e.g., Structured Output Schema)
A source of truth can be defined in one of two ways:
- Expected output - defined when the benchmark is created; best for benchmarks where the correct response is unambiguous
- State of the art model - the response generated by the State of the Art model is used as the reference
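The two sources of truth above can be sketched as a small lookup. This is only an illustration, not the product's API: `BenchmarkItem`, `resolve_source_of_truth`, and the `sota_model` callable are hypothetical names, and preferring the expected output over the model-generated reference is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BenchmarkItem:
    """Hypothetical benchmark entry; field names are illustrative only."""
    prompt: str
    expected_output: Optional[str] = None  # set when the benchmark is created


def resolve_source_of_truth(
    item: BenchmarkItem,
    sota_model: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the reference answer for a benchmark item.

    Assumption: an expected output, when present, takes priority;
    otherwise the (hypothetical) state-of-the-art model is asked
    to produce the reference response.
    """
    if item.expected_output is not None:
        return item.expected_output
    if sota_model is not None:
        return sota_model(item.prompt)
    raise ValueError("no source of truth available for this item")
```

For example, an item created with `expected_output="4"` resolves to `"4"` without calling any model, while an item without one falls back to whatever the supplied `sota_model` callable returns.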
When do you use Quality Evaluations?
Use quality evaluations whenever you need to measure how well a variant performs on a benchmark.
How do Quality Evaluations work?
Under the hood, a quality evaluation takes the model's response as input and evaluates whether that response is correct.
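As a rough sketch of that idea, the two functions below each take a response and return a verdict. Both are hypothetical stand-ins rather than the product's implementation: `evaluate_structured_output` illustrates an evaluation that needs no source of truth (it only checks the shape of the response), and `evaluate_against_reference` illustrates one that does, using a simple normalized string comparison in place of a real LLM Judge.

```python
import json
from typing import Iterable


def evaluate_structured_output(response: str, required_keys: Iterable[str]) -> bool:
    """No source of truth needed: pass if the response is a JSON object
    that contains every required key (hypothetical schema check)."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(required_keys).issubset(parsed)


def evaluate_against_reference(response: str, reference: str) -> bool:
    """Source of truth needed: pass if the response matches the reference
    after trimming whitespace and ignoring case. A real LLM Judge would
    compare meaning rather than exact text; this is a simplified stand-in."""
    return response.strip().lower() == reference.strip().lower()
```

A response such as `'{"answer": "Paris"}'` passes the first check for the required key `answer` regardless of whether Paris is correct, which is why evaluations of that kind can run without a reference.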