Benchmarks

Datasets that hold your prompts and let you compare variants side-by-side.

What are benchmarks?

Benchmarks are collections of requests. Each benchmark acts as a dataset you can run against multiple variants, comparing their performance side-by-side to find the optimal model.

When do you use benchmarks?

Use benchmarks to organize prompts by use case, so each variant is compared on the traffic that matters for that workload.

How do benchmarks work?

Prompts can be added to benchmarks through:

  • Manual input: add prompts from the Narev UI
  • Traces: connect your tracing provider
  • Live endpoint: send requests to Narev through the /application or /router endpoints
  • File import: upload formatted JSON, JSONL, or CSV files
  • Clawhub: auto-generate prompts from the SKILL.md file
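The exact record schema Narev expects for file import is not specified here; as an illustration only, a JSONL import file of chat-style requests might be prepared like this (the field names `messages`, `role`, and `content` are assumptions, not a confirmed Narev format):

```python
import json

# Hypothetical prompt records -- the actual field names Narev expects
# may differ; check the file-import documentation for the real schema.
prompts = [
    {"messages": [{"role": "user", "content": "Summarize this article."}]},
    {"messages": [{"role": "user", "content": "Translate to French: hello"}]},
]

# JSONL is simply one JSON object per line.
jsonl = "\n".join(json.dumps(p) for p in prompts)
print(jsonl)
```

Each line of the resulting file is an independent JSON object, which is what makes JSONL convenient for appending prompts incrementally.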