# Benchmarking with API

Use the Applications API to run A/B tests programmatically.
The Applications API provides an OpenAI-compatible endpoint that enables automatic benchmarking through your production code. Make requests with different configurations, and Narev automatically tracks and compares performance.
## Quick Start
Replace your OpenAI base URL with your Narev A/B test endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_NAREV_API_KEY",
    base_url="https://narev.ai/api/applications/{benchmark_id}/v1"
)

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

That's it! Narev will now add every prompt sent to this endpoint to your benchmark.
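Because the endpoint is OpenAI-compatible, you can also call it with any plain HTTP client instead of the OpenAI SDK. The sketch below only builds the request pieces; the `BENCHMARK_ID` value and the `build_chat_request` helper are illustrative names, and the `/chat/completions` path is assumed from the standard OpenAI API shape:

```python
import json

# Hypothetical placeholders -- substitute your own values.
BENCHMARK_ID = "my-benchmark"
API_KEY = "YOUR_NAREV_API_KEY"

BASE_URL = f"https://narev.ai/api/applications/{BENCHMARK_ID}/v1"

def build_chat_request(model, prompt):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    chat completion call; send them with any HTTP client you like."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("openai:gpt-4", "What is the capital of France?")
```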
## Selecting a model provider or gateway
Narev supports multiple AI providers/gateways through gateway prefixes:
```
{gateway}:{model_name}
```

Available gateways:

- `openai` - OpenAI direct
- `openrouter` - OpenRouter aggregator
- `nvidia` - NVIDIA NIM models
- `kilo` - Kilo Code models
- `github` - GitHub models

Examples:

- `openai:gpt-4` - OpenAI's GPT-4
- `openrouter:anthropic/claude-3-opus` - Claude via OpenRouter
- `anthropic:claude-3-sonnet-20240229` - Direct Anthropic
- `openrouter:meta-llama/llama-3.1-70b-instruct` - Llama via OpenRouter
The same model accessed through different gateways is treated as a separate variant. This lets you compare:
- Latency - Which gateway is faster?
- Cost - Which is more economical?
- Reliability - Which has better uptime?
```python
# Test GPT-4 via OpenAI
response1 = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

# Test GPT-4 via OpenRouter
response2 = client.chat.completions.create(
    model="openrouter:openai/gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
```

Default behavior: If you omit the gateway prefix (e.g., just `gpt-4`), Narev defaults to the native provider. However, we recommend always using explicit gateway prefixes for clarity.
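To compare more than two gateways, the same pattern generalizes to a small loop. This is a sketch, not part of the API: `compare_latency` is a hypothetical helper, and the client-side timing shown here is in addition to whatever metrics Narev records for each variant:

```python
import time

def compare_latency(client, models, prompt):
    """Send the same prompt to each gateway-prefixed model variant and
    record client-observed wall-clock latency per variant."""
    timings = {}
    for model in models:
        start = time.perf_counter()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        timings[model] = time.perf_counter() - start
    return timings

# Usage, with `client` configured as in Quick Start:
# timings = compare_latency(
#     client,
#     ["openai:gpt-4", "openrouter:openai/gpt-4"],
#     "What is the capital of France?",
# )
```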
Include expected outputs for a subset of your production requests to continuously monitor quality and ensure model changes don't harm accuracy.
You can also include custom metadata for filtering and analysis:
```python
response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[{"role": "user", "content": prompt}],
    extra_body={
        "metadata": {
            "expected_output": "Expected answer here",
            "user_id": "user_123",
            "session_id": "session_456",
            "category": "customer_support"
        }
    }
)
```

Still have questions? Ask on Discord.
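One way to attach expected outputs to only a subset of production traffic, as suggested above, is simple random sampling. The helper below is a sketch: `build_metadata` and `SAMPLE_RATE` are hypothetical names, not part of the Narev API; only the `metadata` payload shape comes from the example above:

```python
import random

SAMPLE_RATE = 0.05  # label roughly 5% of production requests (assumed rate)

def build_metadata(user_id, session_id, expected_output=None):
    """Build an extra_body dict, attaching expected_output only for a
    random sample of requests so quality is monitored continuously
    without labeling every call."""
    metadata = {"user_id": user_id, "session_id": session_id}
    if expected_output is not None and random.random() < SAMPLE_RATE:
        metadata["expected_output"] = expected_output
    return {"metadata": metadata}

# Usage, mirroring the example above:
# response = client.chat.completions.create(
#     model="openai:gpt-4",
#     messages=[{"role": "user", "content": prompt}],
#     extra_body=build_metadata("user_123", "session_456", known_answer),
# )
```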