Know what you're spending on
Track LLM costs at the source. Break down aggregate spending into actionable insights by application, feature, and team.
At the end of this step, you should have answers to the following questions:
- Where are our LLM tokens being consumed? (Which apps, features, or workflows?)
- Who owns each area of spending?
- What's our current cost per [user/query/conversation/outcome]?
The aggregation problem
Most teams start with a single bill from their LLM provider:
```
Total monthly spend: $47,293
Input tokens: 2.3B
Output tokens: 890M
```
That's it. One number. No context. No way to know what's working and what's hemorrhaging money.
You can't answer basic questions:
- Which product feature costs the most?
- Is the chatbot more expensive than the code assistant?
- Did that optimization last week actually work?
- Which team should we talk to about their spending spike?
This is like running a business with one bank statement that says "$47K spent on stuff." You wouldn't accept that for regular business expenses. Don't accept it for AI.
The fundamentals: input and output tokens
Good news: LLM billing is simpler than cloud billing. Almost everything comes down to two numbers:
- Input tokens: What you send to the model (prompts, context, documents)
- Output tokens: What the model generates back
Some providers also bill for prompt caching, tool calls, or image processing, but tokens account for 90%+ of your bill.
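To make the math concrete, here's a minimal sketch in Python. The per-million-token prices are placeholders, not any provider's actual rates; plug in your own pricing.

```python
# Placeholder prices -- substitute your provider's actual per-token rates.
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (illustrative)
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (illustrative)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate spend in dollars from raw token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 10M input tokens and 2M output tokens in a month
print(f"${estimate_cost(10_000_000, 2_000_000):,.2f}")  # $60.00
```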
Break down the spending
The goal is to move from aggregate numbers to granular insights. Here's the progression:
Level 1: Organization-wide
❌ Total: $47,293/month across all models and applications
You have no idea what to optimize. Every feature looks equally responsible.
Level 2: By application or product
- Customer chatbot: $28,400 (60%)
- Code assistant: $12,900 (27%)
- Email classifier: $5,993 (13%)
Now you know where to focus. The chatbot is your biggest opportunity.
Level 3: By feature or use case
- Customer chatbot
  - Live support: $18,200 (64%)
  - FAQ responses: $7,100 (25%)
  - Conversation summaries: $3,100 (11%)
Now you can make informed decisions. Maybe FAQ responses don't need your best model. Maybe summaries can run in batch overnight.
Level 4: By owner and cost per outcome
- Live support (Customer Success Team)
  - Monthly spend: $18,200
  - Conversations handled: 4,320
  - Cost per conversation: $4.21
  - Owner: Sarah Chen (VP Customer Success)
Now you can measure success. If you optimize and drop cost per conversation to $2.50, you saved $7,387/month—and you know exactly who to celebrate with.
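The arithmetic is simple enough to sanity-check in a few lines. A minimal sketch using the numbers above (rounding to a per-conversation figure first is what produces the $7,387 figure):

```python
monthly_spend = 18_200
conversations = 4_320

cost_per_conversation = round(monthly_spend / conversations, 2)  # $4.21

# Projected savings if optimization brings cost per conversation down to $2.50
target_cost = 2.50
monthly_savings = (cost_per_conversation - target_cost) * conversations
print(f"${monthly_savings:,.2f}/month saved")  # $7,387.20/month saved
```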
Don't obsess over perfect attribution on day one. Start with application-level tracking. Refine to feature-level as you optimize. Perfection is the enemy of progress.
How to track spending at the source
You need infrastructure that attributes costs automatically. Here are the three most common approaches:
1. Resource tagging
Tag every LLM-related resource (API keys, endpoints, services) with metadata:
```json
{
  "application": "customer-chatbot",
  "feature": "live-support",
  "team": "customer-success",
  "environment": "production"
}
```
Your cloud provider's billing dashboard can then group costs by these tags. This works well if you already use infrastructure-as-code and have mature tagging practices.
Pros: Automatic attribution once configured
Cons: Requires disciplined tagging from day one
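Once tags flow through to your billing export, grouping is a one-liner. A rough sketch with pandas; the file name and column names ("tag_application", "cost") are assumptions about your export format, which varies by provider:

```python
import pandas as pd

# Assumed export format: one row per charge, with tag columns and a cost column.
billing = pd.read_csv("llm_billing_export.csv")

by_app = billing.groupby("tag_application")["cost"].sum().sort_values(ascending=False)
print(by_app)
# Illustrative output, matching the example breakdown above:
# customer-chatbot    28400.0
# code-assistant      12900.0
# email-classifier     5993.0
```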
2. Application-specific endpoints
Create separate API keys or proxy endpoints for each application:
```
api.yourcompany.com/chatbot      → Customer chatbot
api.yourcompany.com/code-assist  → Code assistant
api.yourcompany.com/classifier   → Email classifier
```
Each endpoint logs usage separately. Your billing becomes self-documenting.
Pros: Immediate visibility with no code changes
Cons: Requires some infrastructure work upfront
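If you'd rather roll your own before adopting a managed option, here's a rough sketch of one such endpoint as a thin Flask proxy, assuming an OpenAI-compatible upstream API. The route name, key placeholder, and in-memory log are illustrative only:

```python
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
UPSTREAM = "https://api.openai.com/v1/chat/completions"  # assumed OpenAI-compatible API

usage_log = []  # swap for your database or metrics pipeline

@app.route("/chatbot", methods=["POST"])
def chatbot():
    # Forward the request upstream using this application's own API key.
    resp = requests.post(
        UPSTREAM,
        json=request.get_json(),
        headers={"Authorization": "Bearer <chatbot-api-key>"},
    )
    body = resp.json()
    # Attribute token usage to this application before returning the response.
    usage = body.get("usage", {})
    usage_log.append({
        "application": "customer-chatbot",
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
    })
    return jsonify(body), resp.status_code
```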
How Narev helps: We've built application-specific endpoints with automatic usage tracking and attribution. You get granular cost visibility without infrastructure changes—just route traffic through Narev and see exactly where tokens are going.
3. Tracing
Wrap your LLM calls with middleware that logs metadata and traces execution:
```python
# Pseudocode
response = llm.generate(
    prompt=user_query,
    metadata={
        "application": "chatbot",
        "feature": "live-support",
        "user_id": "user_12345",
        "team": "customer-success",
    },
)
```
Your middleware sends usage data to your analytics platform (Datadog, Grafana, custom DB). Consider using tracing frameworks for deeper visibility:
- OpenTelemetry: Industry-standard observability framework that captures spans, traces, and metrics across your LLM pipeline
- LangSmith: Purpose-built for LLM applications, tracks prompts, completions, latency, and costs
- Langfuse: Open-source LLM observability with automatic cost tracking and evaluation workflows
- Arize Phoenix: Monitors model performance, token usage, and traces multi-step agent workflows
Tracing gives you more than just cost attribution—you see the full execution path, identify bottlenecks, and debug quality issues. If your LLM call is part of a multi-step workflow (agents, retrieval pipelines, function calling), tracing shows exactly where tokens and time are spent.
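As one example, here's a minimal OpenTelemetry sketch that wraps an LLM call in a span and attaches attribution metadata as attributes. The client object, response fields, and attribute names are illustrative, and this assumes the OpenTelemetry SDK and an exporter are configured elsewhere:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-cost-tracking")

def traced_llm_call(client, prompt: str, application: str, feature: str):
    # One span per LLM call, tagged with attribution metadata.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.application", application)
        span.set_attribute("llm.feature", feature)
        response = client.generate(prompt)  # your existing LLM client (assumed)
        span.set_attribute("llm.input_tokens", response.input_tokens)
        span.set_attribute("llm.output_tokens", response.output_tokens)
        return response
```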
Pros: Maximum flexibility, control, and deep observability
Cons: Most engineering work required, needs integration with existing monitoring stack
How Narev helps: Our Enterprise plan includes easy integrations with most of these tracing solutions.
Identify the owners
Every dollar of LLM spending should have a clear owner—someone who:
- Understands the use case and user experience
- Can make tradeoffs between cost, quality, and speed
- Has authority to approve changes
This might be:
- A product manager (for user-facing features)
- An engineering lead (for internal tooling)
- A team lead (for department-specific applications)
- You (if it's your project)
Why this matters: When you find a 60% cost reduction in Step 3, you need someone who can evaluate the quality tradeoff and greenlight the change. If you're optimizing someone else's feature without their input, you're setting yourself up for rejection.
It's helpful to have a simple ownership map:
| Application/Feature | Owner | Team | Monthly Spend | Priority |
|---|---|---|---|---|
| Chatbot - Live support | Sarah Chen | Customer Success | $18,200 | High |
| Chatbot - FAQ | Sarah Chen | Customer Success | $7,100 | Medium |
| Code assistant | Alex Rivera | Engineering | $12,900 | High |
| Email classifier | Jordan Kim | Operations | $5,993 | Low |
Share this with your stakeholders from Step 1. Make sure everyone agrees on:
- The spending breakdown (does it look right?)
- The ownership assignments (is Alex the right person for code assist?)
- The optimization priorities (should we start with live support or the code assistant?)
You're ready to optimize
Time to cut costs while improving quality. Move to Step 3: Optimize.