Why routers?

How choosing the right model can make your application faster and cheaper

Here, we explain how sending each query to the right model delivers lower latency and lower cost while maintaining the same quality

Know the difference: Router vs Gateway

Gateway

What it does - provides one interface to many models

What it doesn't do - does not choose which model to use for your request

Examples: OpenRouter, LiteLLM, Vertex AI, AWS Bedrock
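To make the distinction concrete, here is a minimal sketch of the gateway pattern using the OpenAI Python SDK pointed at an OpenAI-compatible gateway (OpenRouter in this example); the model IDs and API key are placeholders:

```python
# One client, one API shape, many models -- the gateway pattern.
# Note that nothing here decides WHICH model to use; you do.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # gateway endpoint
    api_key="YOUR_API_KEY",                   # placeholder
)

for model in ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize routing in one line."}],
    )
    print(model, "->", response.choices[0].message.content)
```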

Router

What it does - chooses which model to use for each request

What it doesn't do - doesn't provide a unified interface across providers (often relies on gateways for that)

Examples: Narev (welcome!), NotDiamond, Martian

How routers work

The basic problem

Different models have different costs and capabilities.

Powerful Models

Examples: GPT-5.1, Claude Opus

Pros: Give great responses, handle complex queries

Cons: Expensive to run due to computation required

Smaller Models

Examples: Mixtral-8x7B, Llama 3

Pros: Much cheaper, faster response times

Cons: Don't perform as well on complex tasks

Normally, you'd have to choose one: either pay a lot for high quality, or save money but accept lower quality.

Why routers work

The key insight is that not all queries are equally difficult. Simple tasks can be handled by a cheaper, faster model; complex ones need a powerful model.

Routing uses this insight by selecting the right model for each request.
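To make the idea concrete, here is a toy sketch of a router (illustrative only, not how any particular product works), assuming a crude query-length heuristic and placeholder model names:

```python
# Toy router: send short, simple queries to a cheap model and
# everything else to a powerful one. Real routers use far richer
# signals than length; model names here are placeholders.

CHEAP_MODEL = "mixtral-8x7b"
POWERFUL_MODEL = "gpt-5.1"

def route(query: str) -> str:
    """Pick a model using a crude difficulty proxy."""
    looks_complex = len(query.split()) > 50 or "step by step" in query.lower()
    return POWERFUL_MODEL if looks_complex else CHEAP_MODEL

print(route("What's the capital of France?"))        # -> mixtral-8x7b
print(route("Explain step by step how RSA works."))  # -> gpt-5.1
```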

Research papers such as RouteLLM: Learning to Route LLMs with Preference Data (2024) demonstrate that this approach can reduce costs by over 2x without compromising response quality.

Cost savings and latency improvement

  • The cost savings come from reducing how often the expensive model is used (see the worked example below).
  • The latency improvement comes from the fact that smaller models generate responses significantly faster.
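A quick back-of-the-envelope calculation shows where the savings come from. The prices and traffic split below are made up for illustration:

```python
# Illustrative figures: $15 per 1M tokens for the expensive model,
# $0.50 for the cheap one, with 70% of traffic routed cheap.
expensive, cheap = 15.00, 0.50   # $ per 1M tokens (made-up prices)
cheap_share = 0.70               # fraction of requests routed cheap

blended = cheap_share * cheap + (1 - cheap_share) * expensive
print(f"blended cost: ${blended:.2f} per 1M tokens")               # $4.85
print(f"savings vs. all-expensive: {1 - blended / expensive:.0%}") # 68%
```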

Why Narev router?

Deterministic Routing

We made the design decision to keep routing rules deterministic and to give engineers control over them. In contrast to ML routers that use black-box classifiers, Narev gives visibility into why each request routes where it does. This avoids router drift and removes the need to retrain a router model.
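As an illustration of what deterministic rules look like in spirit (a hypothetical sketch, not Narev's actual rule syntax), an ordered list of explicit, auditable rules might be expressed like this:

```python
# Ordered, explicit rules that an engineer can read and audit.
# Hypothetical sketch -- not Narev's actual rule syntax.
RULES = [
    # (predicate over request metadata, target model)
    (lambda req: req.get("user_tier") == "enterprise", "claude-opus"),
    (lambda req: req.get("task") == "summarize", "llama-3-8b"),
]
DEFAULT_MODEL = "mixtral-8x7b"

def route_request(req: dict) -> tuple[str, str]:
    """Return (model, reason) so every decision is explainable."""
    for i, (matches, model) in enumerate(RULES):
        if matches(req):
            return model, f"matched rule #{i}"
    return DEFAULT_MODEL, "no rule matched; used default"

print(route_request({"user_tier": "enterprise"}))
# -> ('claude-opus', 'matched rule #0')
```

Because the rules are plain data rather than learned weights, the same request always routes the same way, and the matched rule explains every decision.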

Edge-native: sub-25ms routing latency and lowest cost

Simple rules allow us to execute routing at the edge, close to your users, achieving the lowest latency. ML routers add significant overhead from embedding generation and model-inference round trips.

Product-aware routing, not just query analysis

Narev's rule system allows routing based on any metadata included in the request, including your product context (see the sketch after this list). Examples include:

  • routing based on user tier or cohort
  • routing based on user history
  • routing based on time of day
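Here is a hypothetical sketch of what such product-aware rules could look like; the metadata fields, model names, and peak-hours window are all placeholders:

```python
# Product-aware routing driven by request metadata, not query text.
# Field names, model names, and the peak window are placeholders.
from datetime import datetime, timezone

def route_by_context(metadata: dict) -> str:
    # User tier: premium users always get the most capable model.
    if metadata.get("user_tier") == "premium":
        return "gpt-5.1"
    # User history: recently escalated users get extra quality too.
    if metadata.get("recent_escalation", False):
        return "claude-opus"
    # Time of day: during peak hours, favor the faster model.
    hour = datetime.now(timezone.utc).hour
    return "mixtral-8x7b" if 14 <= hour <= 22 else "llama-3-70b"

print(route_by_context({"user_tier": "premium"}))  # -> gpt-5.1
```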