Routing with the API

Use the Router API to dynamically route requests to different A/B tests.

The Router API provides intelligent request routing based on filters and rules you configure in the Narev UI. Instead of specifying models in your code, you define routing logic that determines which A/B test handles each request.

Quick Start

Replace your OpenAI base URL with your Narev router endpoint:

from openai import OpenAI
 
client = OpenAI(
    api_key="YOUR_NAREV_API_KEY",
    base_url="https://narev.ai/api/router/{router_id}/v1"
)
 
# No model needed - router decides
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

That's it! The router evaluates your configured filters and routes the request to the appropriate A/B test.

How Routing Works

When a request arrives at the Router API:

  1. Filter Evaluation - The router evaluates filters in priority order
  2. A/B Test Selection - The first matching filter determines which A/B test to use
  3. Production Variant - The selected A/B test's production variant handles the request
  4. Response - The response is returned with the model identifier used

graph LR
    A[Request] --> B[Router]
    B --> C{Filter 1}
    C -->|Match| D[A/B Test A]
    C -->|No Match| E{Filter 2}
    E -->|Match| F[A/B Test B]
    E -->|No Match| G{Fallback}
    G --> H[Default A/B Test]

The router uses the production variant of the matched A/B test. Ensure all A/B tests referenced in your routing rules have a production variant configured.
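
Conceptually, routing is a first-match-wins loop over your filters. The sketch below is illustrative pseudocode of that server-side behavior, not Narev's implementation; the names are hypothetical.

def route_request(filters, fallback_test, request):
    """Illustrative first-match-wins routing (not Narev's actual code)."""
    for f in sorted(filters, key=lambda f: f.priority):
        if f.matches(request):   # evaluates content, length, and metadata rules
            return f.ab_test     # first match wins; later filters are skipped
    return fallback_test         # nothing matched: use the default A/B test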

Understanding Filters

Filters are rules that determine which A/B test handles a request. Each filter can evaluate:

1. Message Content

Match based on keywords, patterns, or semantic meaning in user messages:

Examples:

  • "Contains 'code' or 'python'" → Code A/B Test
  • "Contains 'story' or 'creative'" → Creative Writing A/B Test
  • "Question about geography" → Factual A/B Test

2. Message Length

Route based on prompt complexity:

Examples:

  • "Less than 50 tokens" → Quick Response A/B Test (GPT-3.5)
  • "More than 500 tokens" → Complex A/B Test (GPT-4)

3. Metadata Fields

Route based on custom metadata you include in requests:

Examples:

  • user_tier == "premium" → Premium A/B Test
  • complexity == "high" → Advanced Model A/B Test
  • language == "spanish" → Spanish A/B Test

4. Combination Rules

Create complex logic combining multiple conditions:

Examples:

  • "Message contains 'code' AND user_tier is 'premium'" → Premium Code A/B Test
  • "Message length > 200 OR complexity is 'high'" → GPT-4 A/B Test

Configuring Routing Rules

Creating a Router

  1. Go to Routers in the Narev dashboard
  2. Click New Router
  3. Give it a descriptive name
  4. Add filters and assign A/B tests

Filter Priority

Filters are evaluated in order from top to bottom:

Priority 1: Premium users → Premium A/B Test
Priority 2: Code-related → Code A/B Test
Priority 3: Long prompts → GPT-4 A/B Test
Fallback: Default → GPT-3.5 A/B Test

Order filters from most specific to most general. Put narrow filters (like premium users) before broad filters (like content type).

Setting a Fallback

Always configure a fallback A/B test to handle requests that don't match any filter:

# Without fallback: Request fails if no filter matches
# With fallback: Request always succeeds, routed to default A/B test

Best Practice: Use a reliable, cost-effective model (like GPT-3.5 Turbo) as your fallback.

Request Parameters Ignored

Unlike the Applications API, the Router API ignores certain parameters because they're determined by the routed A/B test's configuration:

Parameter    | Ignored? | Reason
-------------|----------|--------------------------------------------------------
model        | ✅ Yes   | Determined by the matched A/B test's production variant
temperature  | ✅ Yes   | Determined by the matched A/B test's production variant
top_p        | ✅ Yes   | Determined by the matched A/B test's production variant
max_tokens   | ✅ Yes   | Determined by the matched A/B test's production variant
messages     | ❌ No    | Used for routing decisions and passed to the A/B test
metadata     | ❌ No    | Used for routing decisions and tracking
stream       | ❌ No    | Honored by the router and the routed A/B test

# These parameters are IGNORED by the router
response = client.chat.completions.create(
    model="openai:gpt-4",           # ❌ Ignored - app config determines model
    temperature=0.9,                 # ❌ Ignored - app config determines temp
    messages=[{"role": "user", "content": "Hello"}],  # ✅ Used for routing
    extra_body={
        "metadata": {                # ✅ Used for routing decisions
            "user_tier": "premium"
        }
    }
)
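
Because stream is honored end to end, streaming works just as it does against the OpenAI API; the routing decision happens before the first chunk arrives. A minimal sketch:

stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain streaming in one paragraph"}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)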

Metadata-Based Routing

Include metadata in your requests to enable sophisticated routing logic:

User Segmentation

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "user_tier": "premium",
            "user_id": "user_123"
        }
    }
)

Router Configuration:

  • Filter: user_tier == "premium" → Premium A/B Test (GPT-4)
  • Fallback → Standard A/B Test (GPT-3.5)

Task Complexity

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "complexity": "high",
            "estimated_tokens": 1500
        }
    }
)

Router Configuration:

  • Filter: complexity == "high" → Complex Tasks A/B Test (GPT-4 Turbo)
  • Filter: estimated_tokens > 1000 → Long Context A/B Test (Claude)
  • Fallback → Standard A/B Test

Domain-Specific Routing

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "domain": "code",
            "language": "python",
            "task_type": "debugging"
        }
    }
)

Router Configuration:

  • Filter: domain == "code" → Code A/B Test (GPT-4)
  • Filter: domain == "creative" → Creative A/B Test (Claude)
  • Filter: domain == "data" → Data Science A/B Test (GPT-4)
  • Fallback → General A/B Test

Content-Based Routing

The router can analyze message content to determine routing:

Keyword Matching

# "Write Python code to sort a list"
# Router detects "code" keyword → Routes to Code A/B Test

Semantic Understanding

# "I need help debugging my application"
# Router understands this is code-related → Routes to Code A/B Test

Question Type

# "What is 2+2?"
# Router identifies simple factual question → Routes to Quick Answers A/B Test

Content-based routing is configured in the Narev UI using natural language descriptions. The router uses AI to evaluate whether messages match your criteria.
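
Because this analysis runs server-side, the client needs no special code; send the message and inspect response.model to see where the request landed. A quick spot-check (the queries and expected filters are examples):

for query in [
    "Write Python code to sort a list",      # should match a code filter
    "Tell me a short story about a dragon",  # should match a creative filter
    "What is 2+2?",                          # should match a quick-answers filter
]:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": query}]
    )
    print(f"{query!r} -> {response.model}")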

Use Case Examples

1. Cost Optimization

Route simple queries to cheaper models, complex queries to premium models:

Router Configuration:

Filter 1: Message length < 100 tokens → GPT-3.5 Turbo A/B Test
Filter 2: Message length >= 100 tokens → GPT-4 A/B Test

Code (same for all requests):

response = client.chat.completions.create(
    messages=[{"role": "user", "content": user_query}]
)

Cost Savings: ~70% reduction by routing simple queries to GPT-3.5.

2. Domain-Specific Routing

Route to specialized A/B tests based on task type:

Router Configuration:

Filter 1: Contains "code" or "programming" → Code Assistant (GPT-4)
Filter 2: Contains "creative" or "story" → Creative Writer (Claude)
Filter 3: Contains "analyze" or "data" → Data Analyst (GPT-4 Turbo)
Fallback: General Assistant (GPT-3.5 Turbo)

Code:

# All handled by the same endpoint
response = client.chat.completions.create(
    messages=[{"role": "user", "content": user_query}]
)

3. Gradual Model Rollout

Test new models with a subset of users:

Router Configuration:

Filter 1: user_id % 10 < 2 → New Model A/B Test (Claude 3.5)
Filter 2: Always → Current Model A/B Test (GPT-4)

Code:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "user_id": int(user_id)
        }
    }
)

Result: 20% of users get the new model, 80% stay on current model.
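
The filter above assumes a numeric user_id. If your user IDs are strings, one option is to derive a stable bucket client-side and filter on that instead; this is a sketch, and the rollout_bucket key is hypothetical - match it to whatever your filter reads:

import hashlib

def rollout_bucket(user_id: str) -> int:
    """Stable 0-9 bucket: the same user always lands in the same bucket."""
    return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "rollout_bucket": rollout_bucket(user_id)  # filter: rollout_bucket < 2 → new model
        }
    }
)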

4. Tiered Service Levels

Provide different service quality based on user tier:

Router Configuration:

Filter 1: user_tier == "enterprise" → Premium A/B Test (GPT-4 Turbo, fast servers)
Filter 2: user_tier == "pro" → Professional A/B Test (GPT-4)
Filter 3: user_tier == "free" → Free A/B Test (GPT-3.5, rate limited)

Code:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "user_tier": user.subscription_tier
        }
    }
)

5. Geographic Routing

Route to region-specific A/B tests for latency optimization:

Router Configuration:

Filter 1: region == "us-east" → US East A/B Test (OpenAI US)
Filter 2: region == "eu-west" → EU West A/B Test (OpenAI EU)
Filter 3: region == "asia" → Asia A/B Test (Azure OpenAI Asia)

Code:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "region": detect_user_region()
        }
    }
)

6. Fallback Chains for Reliability

Create automatic failover between providers:

Router Configuration:

Filter 1: Try primary OpenAI A/B Test
  - On failure, automatic retry triggers fallback filter
Filter 2 (Fallback): Anthropic A/B Test
  - On failure, triggers second fallback
Filter 3 (Final Fallback): OpenRouter A/B Test

This requires configuring retry logic in your router settings.
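
Independent of router-side retries, you can add a thin client-side safety net for transient failures. A minimal sketch with exponential backoff, using only the OpenAI SDK's standard exception types:

import time
import openai

def create_with_retry(client, messages, attempts=3):
    """Retry transient failures with exponential backoff (1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return client.chat.completions.create(messages=messages)
        except (openai.APIConnectionError, openai.RateLimitError):
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)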

Testing Your Routing Logic

Test Suite Approach

Create a test suite to verify routing behavior:

test_cases = [
    {
        "query": "Write Python code to sort a list",
        "metadata": {"user_tier": "free"},
        "expected_app": "Code A/B Test"
    },
    {
        "query": "Tell me a creative story",
        "metadata": {"user_tier": "free"},
        "expected_app": "Creative A/B Test"
    },
    {
        "query": "What is 2+2?",
        "metadata": {"user_tier": "free"},
        "expected_app": "Quick Answers A/B Test"
    },
    {
        "query": "Complex data analysis task",
        "metadata": {"user_tier": "premium"},
        "expected_app": "Premium A/B Test"
    }
]
 
for test in test_cases:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": test["query"]}],
        extra_body={"metadata": test["metadata"]}
    )
 
    # The model field shows which A/B test's model was used
    print(f"Query: {test['query']}")
    print(f"Routed to: {response.model}")
    print(f"Expected: {test['expected_app']}")
    print()
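
To turn the loop into a pass/fail check, map each expected A/B test to the model identifier its production variant uses. The mapping below is hypothetical - fill it in from your own configuration:

# Hypothetical mapping from A/B test name to its production variant's model
EXPECTED_MODELS = {
    "Code A/B Test": "openai:gpt-4",
    "Creative A/B Test": "anthropic:claude-3-5-sonnet",
    "Quick Answers A/B Test": "openai:gpt-3.5-turbo",
    "Premium A/B Test": "openai:gpt-4",
}

for test in test_cases:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": test["query"]}],
        extra_body={"metadata": test["metadata"]}
    )
    expected = EXPECTED_MODELS[test["expected_app"]]
    assert response.model == expected, (
        f"{test['query']!r} routed to {response.model}, expected {expected}"
    )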

Routing Analytics

View routing performance in the Narev dashboard:

  • Distribution Charts - Which A/B tests receive what percentage of traffic
  • Filter Match Rates - How often each filter matches
  • Response Metrics - Latency, cost, and quality by routing rule
  • Fallback Usage - How often the fallback is triggered

If your fallback A/B test receives more than 30% of traffic, your filters may be too narrow. Consider adding broader matching rules.

Best Practices

1. Always Configure a Fallback

# ❌ Bad: No fallback
Filter 1: Contains "code" → Code A/B Test
# If no filter matches → Request fails
 
# ✅ Good: Fallback configured
Filter 1: Contains "code" → Code A/B Test
Fallback: General A/B Test
# All requests succeed

2. Order Filters by Specificity

# ✅ Good Order (specific to general)
Filter 1: user_tier == "enterprise" AND domain == "code" → Premium Code A/B Test
Filter 2: user_tier == "enterprise" → Premium A/B Test
Filter 3: domain == "code" → Code A/B Test
Filter 4: Fallback → General A/B Test
 
# ❌ Bad Order (general first blocks specific)
Filter 1: domain == "code" → Code A/B Test
Filter 2: user_tier == "enterprise" AND domain == "code" → Premium Code A/B Test
# Filter 2 never matches because Filter 1 catches all code requests first

3. Provide Routing Context in Metadata

# ✅ Good - Rich metadata helps routing
response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "user_tier": "premium",
            "task_type": "analysis",
            "expected_length": "long",
            "domain": "finance"
        }
    }
)
 
# ❌ Less effective - Missing context
response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}]
)

4. Use Consistent Metadata Keys

Define a metadata schema and stick to it:

# ✅ Good - Consistent naming
METADATA_SCHEMA = {
    "user_tier": ["free", "pro", "enterprise"],
    "task_type": ["question", "generation", "analysis"],
    "complexity": ["low", "medium", "high"],
    "domain": ["code", "creative", "data", "general"]
}
 
# Use it consistently
metadata = {
    "user_tier": "pro",
    "task_type": "generation",
    "complexity": "medium",
    "domain": "creative"
}
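
A tiny validator at the call site keeps out-of-schema values from ever reaching the router. A minimal sketch built on the schema above:

def validate_metadata(metadata: dict) -> dict:
    """Raise if any metadata value falls outside the agreed schema."""
    for key, value in metadata.items():
        allowed = METADATA_SCHEMA.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} not in {allowed}")
    return metadata

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={"metadata": validate_metadata(metadata)}
)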

5. Test Routing Before Production

Deploy routers in stages:

  1. Development Router - Test with synthetic data
  2. Staging Router - Test with replicated production traffic
  3. Canary Router - Route 5% of production traffic (see the sketch below)
  4. Production Router - Full production rollout
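
One way to implement the canary stage client-side is to keep one client per router and send a deterministic slice of users to the canary. The endpoints and the 5% threshold below are illustrative:

import hashlib

canary_client = OpenAI(api_key=api_key, base_url="https://narev.ai/api/router/{canary_router_id}/v1")
prod_client = OpenAI(api_key=api_key, base_url="https://narev.ai/api/router/{prod_router_id}/v1")

def pick_client(user_id: str, canary_percent: int = 5):
    """Deterministically route ~canary_percent of users to the canary router."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return canary_client if bucket < canary_percent else prod_client

response = pick_client(user_id).chat.completions.create(
    messages=[{"role": "user", "content": query}]
)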

6. Monitor Fallback Usage

High fallback usage indicates:

  • Filters too narrow - Add broader matching rules
  • Unexpected request patterns - Add filters for new patterns
  • Metadata missing - Ensure clients send expected metadata

# Check if fallback is overused
fallback_percentage = (fallback_requests / total_requests) * 100
 
if fallback_percentage > 30:
    print("⚠️ Fallback overused - review filter configuration")

7. Handle System Messages Appropriately

Important: The router filters out system messages before routing evaluation. Only user and assistant messages are considered for content-based routing.

If you need system prompts:

  • Configure them in the A/B test's production variant
  • Don't rely on system messages in requests for routing decisions

# ❌ Bad - System message won't affect routing
response = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a code assistant"},  # Not used for routing
        {"role": "user", "content": "Help me debug"}
    ]
)
 
# ✅ Good - Use metadata for routing, configure system prompt in A/B test
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Help me debug"}
    ],
    extra_body={
        "metadata": {
            "domain": "code"  # This affects routing
        }
    }
)
# Router routes to Code A/B Test, which has system prompt configured

Debugging Routing Issues

Check Which A/B Test Was Used

The response's model field shows which A/B test handled the request:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Test query"}]
)
 
print(f"Model used: {response.model}")
# Output: "openai:gpt-4" (indicates GPT-4 A/B test was used)

Review Routing History

In the Narev dashboard:

  1. Go to Routers → Select your router
  2. Click Analytics tab
  3. View request history with routing decisions

Each request shows:

  • Which filter matched (or if fallback was used)
  • Which A/B test handled it
  • Response time and cost
  • Any errors

Common Issues

Issue: Request routed to fallback unexpectedly

Possible causes:

  • Metadata keys don't match filter configuration
  • Message content doesn't match filter patterns
  • Filter conditions are too strict

Solution:

# Add debug metadata to track routing
response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "debug_routing": True,
            "expected_filter": "code_filter",
            # ... other metadata
        }
    }
)

Issue: Wrong A/B test handling requests

Possible causes:

  • Filter priority order is incorrect
  • Multiple filters matching (first match wins)
  • Metadata values don't match filter expectations

Solution: Review filter order and test with specific examples.

Issue: No production variant error

Error message:

{
  "error": {
    "message": "A/B test 'Code Helper' does not have a production variant configured",
    "code": "no_production_variant"
  }
}

Solution: Configure a production variant for the A/B test in the Narev dashboard.
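
To handle this gracefully in code rather than let it surface as an unhandled exception, catch the SDK's status error and inspect the error body. A sketch, assuming the error is returned as an HTTP error status through the OpenAI SDK:

import openai

try:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": query}]
    )
except openai.APIStatusError as e:
    body = e.response.json()
    if body.get("error", {}).get("code") == "no_production_variant":
        # Misconfiguration: alert and fail loudly (or fall back to a known-good endpoint)
        raise RuntimeError(f"Routing misconfigured: {body['error']['message']}") from e
    raise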

Advanced Routing Patterns

Time-Based Routing

Route based on time of day to optimize for peak hours:

Router Configuration:

Filter 1: hour >= 9 AND hour <= 17 → Fast Response A/B Test (more instances)
Filter 2: hour < 9 OR hour > 17 → Standard A/B Test (fewer instances)

Code:

from datetime import datetime

response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "hour": datetime.now().hour
        }
    }
)

Load-Based Routing

Route to different A/B tests based on current load:

Code:

import random
 
# Distribute load across multiple A/B tests
response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "load_bucket": random.randint(0, 9)  # 0-9
        }
    }
)

Router Configuration:

Filter 1: load_bucket < 3 → A/B Test Pool A (30%)
Filter 2: load_bucket < 7 → A/B Test Pool B (40%)
Filter 3: Fallback → A/B Test Pool C (30%)

Multi-Step Routing

For complex workflows, chain multiple router calls:

# classifier_client and main_client are two OpenAI clients, each pointed at a
# different Narev router endpoint

# Step 1: Route to classifier
classifier_response = classifier_client.chat.completions.create(
    messages=[{"role": "user", "content": f"Classify this query: {query}"}]
)
 
classification = classifier_response.choices[0].message.content
 
# Step 2: Route to specialized handler based on classification
response = main_client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "classification": classification
        }
    }
)

Migration from Applications API

If you're currently using the Applications API and want to migrate to the Router API:

Before (Applications API)

# Code determines which A/B test to use
if task_type == "code":
    base_url = "https://narev.ai/api/applications/app_code_123/v1"
elif task_type == "creative":
    base_url = "https://narev.ai/api/applications/app_creative_456/v1"
else:
    base_url = "https://narev.ai/api/applications/app_general_789/v1"
 
client = OpenAI(api_key=api_key, base_url=base_url)
response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[{"role": "user", "content": query}]
)

After (Router API)

# Single endpoint - router handles the logic
client = OpenAI(
    api_key=api_key,
    base_url="https://narev.ai/api/router/router_123/v1"
)
 
response = client.chat.completions.create(
    messages=[{"role": "user", "content": query}],
    extra_body={
        "metadata": {
            "task_type": task_type
        }
    }
)

Benefits:

  • Simpler code - no conditional logic
  • Centralized routing configuration
  • Easy to update routing rules without code changes
  • Better analytics and monitoring

Next Steps

  • See the Router API Reference for complete endpoint documentation
  • Compare with Applications API for explicit model selection
  • Configure your first router in the Narev dashboard
  • View routing analytics to optimize your filter configuration