Why routers?
How choosing the right model can make your application faster and cheaper
Here, we explain how sending each query to the right model lowers latency and cost while keeping the same quality.
Know the difference: Router vs Gateway
Gateway
What it does - provides one interface to many models
What it doesn't do - does not choose which model to use for your request
Examples: OpenRouter, LiteLLM, Vertex AI, AWS Bedrock
Router
What it does - chooses which model to use for each request
What it doesn't do - doesn't provide a unified interface across providers (often relies on gateways for that)
Examples: Narev (welcome!), NotDiamond, Martian
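To make the split concrete, here is a minimal sketch of the two roles working together. Everything in it is hypothetical: the `Gateway` class, the `route` function, the model names, and the stub backends are illustrative stand-ins, not the API of any product listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Gateway:
    """One interface to many models; the caller must still name the model."""
    backends: Dict[str, Callable[[str], str]]

    def complete(self, model: str, prompt: str) -> str:
        return self.backends[model](prompt)


def route(prompt: str) -> str:
    """Router: picks a model name for the request; it does not call the model."""
    # Hypothetical rule: short prompts go to the cheaper model.
    return "small-model" if len(prompt) < 200 else "large-model"


# Stub backends stand in for real provider calls.
gateway = Gateway(backends={
    "small-model": lambda p: f"[small] {p}",
    "large-model": lambda p: f"[large] {p}",
})

prompt = "What is the capital of France?"
answer = gateway.complete(route(prompt), prompt)
```

The router only returns a model name, and the gateway only executes the call, which is why routers often sit in front of a gateway.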
How routers work
The basic problem
Different models have different costs and capabilities.
Normally, you'd have to choose one: either pay a lot for high quality, or save money but accept lower quality.
Why routers work
The key insight is that not all queries are equally difficult. Simple tasks can be handled by a cheaper, faster model, while complex ones need a more powerful model.
Routing uses this insight by selecting the right model for each request.
Research such as RouteLLM: Learning to Route LLMs with Preference Data (2024) demonstrates that this approach can reduce costs by over 2x in certain cases without compromising response quality.
Cost savings and latency improvement
- The cost savings come from reducing how often the expensive model is used.
- Improved latency comes from smaller models being significantly faster at generating responses.
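The arithmetic behind the cost bullet can be sketched as a weighted average. The prices and the 30% share below are made-up illustrative numbers, not measured figures, and `blended_cost` is a hypothetical helper.

```python
def blended_cost(strong_share: float, strong_cost: float, weak_cost: float) -> float:
    """Expected cost per request when `strong_share` of traffic uses the strong model."""
    if not 0.0 <= strong_share <= 1.0:
        raise ValueError("strong_share must be between 0 and 1")
    return strong_share * strong_cost + (1.0 - strong_share) * weak_cost


# Illustrative numbers only: cost per request in dollars.
STRONG_COST = 0.010
WEAK_COST = 0.001

always_strong = blended_cost(1.0, STRONG_COST, WEAK_COST)  # every request on the strong model
routed = blended_cost(0.3, STRONG_COST, WEAK_COST)         # only 30% need the strong model
savings_factor = always_strong / routed
```

Under these assumed numbers, sending 30% of traffic to the strong model costs about $0.0037 per request instead of $0.010, a savings factor of roughly 2.7x. The same weighted average applies to mean latency if you substitute per-model response times for costs.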
Why Narev router?
Deterministic Routing
We made a design decision to keep the routing rules deterministic and to give engineers control over them. In contrast to ML routers that use black-box classifiers, Narev gives visibility into why each request routes where it does. This avoids router drift and removes the need to retrain a router model.
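As an illustration of what deterministic, inspectable rules can look like, here is a minimal first-match sketch. It is not Narev's actual rule syntax: the `Rule` type, the rule names, the thresholds, and the model identifiers are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str                       # reported with each decision so it can be explained
    matches: Callable[[str], bool]  # pure predicate over the prompt text
    model: str                      # hypothetical model identifier


# Rules are evaluated in order; the first match wins.
RULES: List[Rule] = [
    Rule("error-report", lambda p: "traceback" in p.lower(), "large-model"),
    Rule("long-prompt", lambda p: len(p) > 2000, "large-model"),
]
DEFAULT_MODEL = "small-model"


def route(prompt: str) -> Tuple[str, str]:
    """Return (model, reason). The same input always yields the same output."""
    for rule in RULES:
        if rule.matches(prompt):
            return rule.model, rule.name
    return DEFAULT_MODEL, "default"
```

Because each decision comes back with the name of the rule that fired, a routing choice can be logged and audited, and changing behavior means editing a rule rather than retraining a classifier.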
Edge-native: sub-25ms routing latency and lowest cost
Simple rules allow us to execute routing on the edge, close to your users, and achieve the lowest latency. ML routers add significant overhead from embedding generation and model inference round trips.
Product-aware routing, not just query analysis
Narev's rule system allows routing based on any metadata included in the request, including your product context. Examples include:
- routing based on user tier or cohort
- routing based on user history
- routing based on time of day
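The three examples above can be sketched as rules over request metadata rather than over the prompt text. The `RequestContext` fields, thresholds, time window, and model names below are hypothetical and chosen only for illustration; they do not describe Narev's actual request schema.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class RequestContext:
    user_tier: str          # e.g. "free" or "pro"
    requests_last_30d: int  # simple proxy for user history
    local_time: time        # time of day of the request


def route(ctx: RequestContext) -> str:
    """Pick a model from product metadata alone; the prompt is never inspected."""
    if ctx.user_tier == "pro":
        return "large-model"    # user tier: paying users get the strongest model
    if ctx.requests_last_30d >= 100:
        return "large-model"    # user history: highly engaged users get it too
    if time(9, 0) <= ctx.local_time < time(17, 0):
        return "small-model"    # time of day: keep cost low during assumed peak hours
    return "medium-model"
```

For instance, `route(RequestContext("free", 5, time(22, 0)))` falls through every rule and returns `"medium-model"`.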