Know what you're spending on

Track LLM costs at the source. Break down aggregate spending into actionable insights by application, feature, and team.

At the end of this step, you should have answers to the following questions:

  • Where are our LLM tokens being consumed? (Which apps, features, or workflows?)
  • Who owns each area of spending?
  • What's our current cost per [user/query/conversation/outcome]?

The aggregation problem

Most teams start with a single bill from their LLM provider:

Total monthly spend: $47,293
Input tokens: 2.3B
Output tokens: 890M

That's it. One number. No context. No way to know what's working and what's hemorrhaging money.

You can't answer basic questions:

  • Which product feature costs the most?
  • Is the chatbot more expensive than the code assistant?
  • Did that optimization last week actually work?
  • Which team should we talk to about their spending spike?

This is like running a business with one bank statement that says "$47K spent on stuff." You wouldn't accept that for regular business expenses. Don't accept it for AI.

The fundamentals: input and output tokens

Good news: LLM billing is simpler than cloud billing. Almost everything comes down to two line items:

  • Input tokens: What you send to the model (prompts, context, documents)
  • Output tokens: What the model generates back

Some providers also charge for prompt caching, tool calls, or image processing, but input and output tokens typically make up 90%+ of your bill.
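
To see how those two line items turn into a dollar figure, here's a minimal sketch. The per-million-token prices are hypothetical placeholders, not any particular provider's rates:

# Rough bill estimate from token counts.
# Prices are hypothetical (dollars per 1M tokens); use your provider's rate card.
INPUT_PRICE_PER_M = 10.00
OUTPUT_PRICE_PER_M = 30.00

input_tokens = 2_300_000_000   # 2.3B input tokens, as in the bill above
output_tokens = 890_000_000    # 890M output tokens

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
print(f"Estimated monthly spend: ${cost:,.2f}")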

The first challenge isn't understanding the bill. It's attributing it.

Break down the spending

The goal is to move from aggregate numbers to granular insights. Here's the progression:

Level 1: Organization-wide

Total: $47,293/month across all models and applications

You have no idea what to optimize. Every feature looks equally responsible.

Level 2: By application or product

  1. Customer chatbot: $28,400 (60%)
  2. Code assistant: $12,900 (27%)
  3. Email classifier: $5,993 (13%)

Now you know where to focus. The chatbot is your biggest opportunity.

Level 3: By feature or use case

  1. Customer chatbot
    • Live support: $18,200 (64%)
    • FAQ responses: $7,100 (25%)
    • Conversation summaries: $3,100 (11%)

Now you can make informed decisions. Maybe FAQ responses don't need your best model. Maybe summaries can run in batch overnight.

Level 4: By owner and cost per outcome

  1. Live support (Customer Success Team)
    • Monthly spend: $18,200
    • Conversations handled: 4,320
    • Cost per conversation: $4.21
    • Owner: Sarah Chen (VP Customer Success)

Now you can measure success. If you optimize and drop cost per conversation to $2.50, you saved $7,387/month—and you know exactly who to celebrate with.
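
The same arithmetic works for any unit economics you track. A minimal sketch using the live-support numbers above (the $2.50 target is just the hypothetical from the text):

# Cost per outcome for the live-support example.
monthly_spend = 18_200        # dollars per month
conversations = 4_320         # conversations handled per month

cost_per_conversation = round(monthly_spend / conversations, 2)
print(f"Cost per conversation: ${cost_per_conversation:.2f}")        # $4.21

# Hypothetical optimization target.
target_cost = 2.50
monthly_savings = (cost_per_conversation - target_cost) * conversations
print(f"Savings at ${target_cost:.2f} per conversation: ${monthly_savings:,.2f}/month")   # $7,387.20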

Don't obsess over perfect attribution on day one. Start with application-level tracking. Refine to feature-level as you optimize. Perfection is the enemy of progress.

How to track spending at the source

You need infrastructure that attributes costs automatically. Here are the three most common approaches:

1. Resource tagging

Tag every LLM-related resource (API keys, endpoints, services) with metadata:

{
  "application": "customer-chatbot",
  "feature": "live-support",
  "team": "customer-success",
  "environment": "production"
}

Your cloud provider's billing dashboard can then group costs by these tags. This works well if you already use infrastructure-as-code and have mature tagging practices.
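
Once usage records carry those tags, the grouping itself is trivial. A minimal sketch, assuming you've exported a hypothetical usage_export.csv with one row per call and columns for the tags plus cost:

import pandas as pd

# Hypothetical export from your billing dashboard or usage logs.
# Assumed columns: application, feature, team, cost_usd
usage = pd.read_csv("usage_export.csv")

# Spend by application, largest first.
by_app = usage.groupby("application")["cost_usd"].sum().sort_values(ascending=False)

# Spend by feature within each application.
by_feature = usage.groupby(["application", "feature"])["cost_usd"].sum()

print(by_app)
print(by_feature)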

Pros: Automatic attribution once configured

Cons: Requires disciplined tagging from day one

2. Application-specific endpoints

Create separate API keys or proxy endpoints for each application:

  • api.yourcompany.com/chatbot → Customer chatbot
  • api.yourcompany.com/code-assist → Code assistant
  • api.yourcompany.com/classifier → Email classifier

Each endpoint logs usage separately. Your billing becomes self-documenting.
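
The simplest version of this is one API key per application, so your provider's usage dashboard attributes spend to each key automatically. A minimal sketch assuming the OpenAI Python SDK; the environment variable names are placeholders:

import os
from openai import OpenAI

# One client (and one API key) per application.
clients = {
    "customer-chatbot": OpenAI(api_key=os.environ["CHATBOT_API_KEY"]),
    "code-assistant": OpenAI(api_key=os.environ["CODE_ASSIST_API_KEY"]),
    "email-classifier": OpenAI(api_key=os.environ["CLASSIFIER_API_KEY"]),
}

# Each application only ever talks through its own client.
response = clients["customer-chatbot"].chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": "Where is my order?"}],
)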

Pros: Immediate visibility with no code changes

Cons: Requires some infrastructure work upfront

How Narev helps: We've built application-specific endpoints with automatic usage tracking and attribution. You get granular cost visibility without infrastructure changes—just route traffic through Narev and see exactly where tokens are going.

3. Tracing

Wrap your LLM calls with middleware that logs metadata and traces execution:

# Pseudocode: the middleware attaches attribution metadata to every call
# and logs it alongside token usage (your SDK's actual signature will differ).
response = llm.generate(
    prompt=user_query,
    metadata={
        "application": "chatbot",
        "feature": "live-support",
        "user_id": "user_12345",
        "team": "customer-success",
    },
)

Your middleware sends usage data to your analytics platform (Datadog, Grafana, custom DB). Consider using tracing frameworks for deeper visibility:

  • OpenTelemetry: Industry-standard observability framework that captures spans, traces, and metrics across your LLM pipeline
  • LangSmith: Purpose-built for LLM applications, tracks prompts, completions, latency, and costs
  • Langfuse: Open-source LLM observability with automatic cost tracking and evaluation workflows
  • Arize Phoenix: Monitors model performance, token usage, and traces multi-step agent workflows

Tracing gives you more than just cost attribution—you see the full execution path, identify bottlenecks, and debug quality issues. If your LLM call is part of a multi-step workflow (agents, retrieval pipelines, function calling), tracing shows exactly where tokens and time are spent.
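
As one concrete example, here's a minimal OpenTelemetry-style sketch. The span and attribute names are our own convention rather than a standard, and call_llm is a placeholder for your actual provider call:

from opentelemetry import trace

tracer = trace.get_tracer("llm-cost-tracking")

def generate_with_tracing(prompt, application, feature, team):
    # Each LLM call becomes a span; attributes carry the attribution metadata.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.application", application)
        span.set_attribute("llm.feature", feature)
        span.set_attribute("llm.team", team)

        response = call_llm(prompt)  # placeholder for your provider SDK call

        # Record token usage so your backend can turn spans into dollars.
        span.set_attribute("llm.input_tokens", response.input_tokens)
        span.set_attribute("llm.output_tokens", response.output_tokens)
        return response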

Pros: Maximum flexibility, control, and deep observability

Cons: Most engineering work required, needs integration with existing monitoring stack

How Narev helps: Our Enterprise plan includes easy integrations with most of the tracing solutions listed above.

Identify the owners

Every dollar of LLM spending should have a clear owner—someone who:

  • Understands the use case and user experience
  • Can make tradeoffs between cost, quality, and speed
  • Has authority to approve changes

This might be:

  • A product manager (for user-facing features)
  • An engineering lead (for internal tooling)
  • A team lead (for department-specific applications)
  • You (if it's your project)

Why this matters: When you find a 60% cost reduction in Step 3, you need someone who can evaluate the quality tradeoff and greenlight the change. If you're optimizing someone else's feature without their input, you're setting yourself up for rejection.

It's helpful to have a simple ownership map:

Application/Feature       Owner         Team               Monthly Spend   Priority
Chatbot - Live support    Sarah Chen    Customer Success   $18,200         High
Chatbot - FAQ             Sarah Chen    Customer Success   $7,100          Medium
Code assistant            Alex Rivera   Engineering        $12,900         High
Email classifier          Jordan Kim    Operations         $5,993          Low

Share this with your stakeholders from Step 1. Make sure everyone agrees on:

  • The spending breakdown (does it look right?)
  • The ownership assignments (is Alex the right person for code assist?)
  • The optimization priorities (should we start with live support or the code assistant?)

You're ready to optimize

Time to cut costs while improving quality. Move to Step 3: Optimize.