Know your objective and who's in charge
Establish what success looks like before optimizing. Get your team aligned on goals and identify who makes the final call.
At the end of this step, you should have answers to the following questions:
- What are our top 2-3 optimization priorities?
- Who are the stakeholders, and do they agree?
- Who is the decision-maker when tradeoffs get tough?
Understand the tradeoffs
Everything is a tradeoff. Speed vs. accuracy. Cost vs. quality. Latency vs. intelligence.
LLM providers want you to believe their flagship models are one-size-fits-all solutions. They're not.
Think about cars. You wouldn't use the same car for every job:
Car Type | Best For | Strength | Tradeoff |
---|---|---|---|
City car | Daily commutes | Fuel-efficient, easy to park | Limited cargo space |
Bus | Mass transportation | Cost-effective per passenger | Slow, inflexible routes |
Forklift | Heavy lifting | Precise load handling | Useless for transportation |
Sports car | Maximum speed | Blazing fast | Terrible fuel economy |
Pickup truck | Heavy cargo | High payload capacity | Poor fuel economy, less comfort |
Motorcycle | Quick trips | Agile and economical | Limited capacity |
You wouldn't drive a forklift to the grocery store. You wouldn't use a city car to traverse a mountain trail. The job determines the car you need.
LLMs work the same way. Different models for different tasks:
- Simple queries? Use a lightweight model (the "city car")
- Complex reasoning? Premium model justified (the "sports car")
- Batch processing? Optimize for throughput (the "bus")
- Real-time interaction? Optimize for speed (the "motorcycle")
Yet most teams default to the "sports car" for everything.
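In code, matching models to tasks can start as a simple lookup table. A minimal routing sketch follows; the task labels and model names are placeholders, not real model IDs:

```python
# Route each task type to a model tier. Names here are illustrative
# placeholders -- swap in your provider's actual model IDs.
MODEL_TIERS = {
    "simple_query": "lightweight-model",    # the "city car"
    "complex_reasoning": "premium-model",   # the "sports car"
    "batch_processing": "batch-model",      # the "bus"
    "realtime_chat": "fast-model",          # the "motorcycle"
}

def pick_model(task_type: str) -> str:
    """Return the model for a task type, defaulting to the cheapest tier."""
    return MODEL_TIERS.get(task_type, "lightweight-model")

print(pick_model("complex_reasoning"))  # premium-model
print(pick_model("unknown_task"))       # lightweight-model
```

Real routers classify the incoming query first (by keyword rules, a small classifier, or a cheap model), but the principle is the same: the default should be the cheapest tier that does the job, with escalation as the exception.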
One size doesn't fit all
LLM providers invest heavily in marketing their flagship models as universal solutions. They showcase impressive benchmark scores (MMLU, HumanEval, GPQA) that measure "intelligence" in controlled environments.
But here's what those benchmarks ignore:
- Cost: Premium models can be 100x more expensive
- Latency: Premium models can be 2-3x slower—users abandon apps while waiting for the "perfect" answer
- Overkill: Most queries don't need maximum intelligence
The newest model always costs the same
Yes, GPT-3.5 is now 10x cheaper than it was at launch. However, the price of frontier models stays constant.
And the best model? It always costs roughly the same—around $15-75 per million tokens—because that's what the edge of compute costs today.
Time | Price of GPT-4 Class (Output) | Decrease Since Inception | Frontier Model (Most Capable) | Price (Output) | Source |
---|---|---|---|---|---|
2023 Q1 (Mar) | $60/M tokens | 0% (baseline) | GPT-4 | $60/M tokens | Nebuly |
2023 Q4 (Nov) | $30/M tokens (GPT-4 Turbo) | 50% | GPT-4 Turbo | $30/M tokens | Nebuly |
2024 Q1 (Mar) | $30/M tokens | 50% | Claude 3 Opus | $75/M tokens | PromptHub |
2024 Q2 (May) | $15/M tokens (GPT-4o) | 75% | Claude 3 Opus | $75/M tokens | Nebuly |
2024 Q3 (Aug) | $10/M tokens (GPT-4o) | 83% | Claude 3 Opus | $75/M tokens | Microsoft |
2024 Q3 (Sep) | $10/M tokens | 83% | o1 (reasoning) | $60/M tokens | Artificial Analysis |
2025 Q2 (May) | $10/M tokens | 83% | Claude Opus 4 | $75/M tokens | Anthropic, PromptHub |
2025 Q3 (Current) | $10/M tokens | 83% | Claude Opus 4.1 | $75/M tokens | Anthropic |
The frontier model price is remarkably stable. What drops in price is yesterday's newspaper—models that are no longer state-of-the-art.
Hidden inflation no one talks about
Modern models consume exponentially more tokens through:
- Longer context windows (128K → 200K → 1M tokens)
- Test-time compute (models that "think" longer use more tokens)
- Agentic workflows (models that iterate, check their work, and refine)
A simple query that used to return 500 tokens might now return 50,000 tokens as the model plans, researches, writes, and refines. The per-token cost stays flat, but you're burning 100x more tokens.
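The arithmetic is worth sketching out. The token counts below are hypothetical and the prices are illustrative; the point is that with a flat per-token price, total cost scales directly with tokens consumed:

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m

# Illustrative prices: $1.25 in / $10 out per million tokens.
# Same per-token price; the only change is 100x the output tokens.
concise = request_cost(100, 500, 1.25, 10.0)     # ~$0.0051
agentic = request_cost(100, 50_000, 1.25, 10.0)  # ~$0.5001
print(f"${concise:.4f} vs ${agentic:.4f} per request")
```

At a million requests a month, that gap is roughly $5K versus $500K, without the per-token price moving at all.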
Real-world example
Model | Cost per 1M Requests | Tokens (In/Out) | Token Price per 1M | Latency |
---|---|---|---|---|
GPT-3.5 Turbo | $53.00 | 22/28 | $0.50/$1.50 | 485ms |
GPT-4o-mini | $19.50 | 22/27 | $0.15/$0.60 | 980ms |
GPT-4 Turbo | $850.00 | 22/21 | $10/$30 | 1,147ms |
GPT-5 | $1,526.25 | 21/150 | $1.25/$10 | 6,259ms |
Notice that GPT-5 has lower token prices than GPT-4 Turbo ($1.25/$10 vs $10/$30), yet it costs 80% more per request ($1,526 vs $850). Why? Because GPT-5 uses 7x more output tokens (150 vs 21).
What matters isn't the price per token—it's the cost to solve your problem. Optimize for cost per successful outcome, not cost per token.
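That framing reduces to one division. The numbers below are made up for illustration, but they show how a cheap model with a low success rate can lose to a pricier model that almost always works:

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    """Effective cost of one successful outcome: failed attempts still
    cost money, so divide the per-request cost by the success rate."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_request / success_rate

# Hypothetical numbers, for illustration only.
print(cost_per_success(0.005, 0.10))  # cheap model,  10% success: $0.05
print(cost_per_success(0.030, 0.90))  # pricey model, 90% success: ~$0.033
```

The hard part in practice is not the formula but defining and measuring "success" for your task, which is exactly why the evaluation step matters.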
Define your objective to know the tradeoff
So what are you actually optimizing for? You can't know until you ask the right people.
Assemble the people who care about the outcome:
- Product Manager - owns user experience and conversion metrics
- Engineering Lead - owns performance, reliability, and technical architecture
- Data Scientist/ML Engineer - owns model quality and evaluation
- Finance/Operations - owns budget and unit economics
- Executive Leadership - owns strategic priorities and resource allocation
Gather them and ask: What matters most to us?
Common answers:
- Conversion to paid features
- User retention and engagement
- Customer support ticket reduction
- Response speed (time to first token)
- Cost per interaction or per user
- Accuracy on mission-critical tasks
- Throughput (queries per second)
You can't optimize for everything. That's the point. Pick your top 2-3 priorities and accept the tradeoffs.
Get everyone aligned
For small teams, alignment might happen in a 30-minute conversation. For larger organizations, you need explicit buy-in. In a bigger team or a complex product, decide up front who makes the final call when there's conflict. The product manager? The head of engineering? The CEO? You?
Bring this person on the journey from day one. Keep them involved. When you need to choose between 10% better accuracy or 50% lower costs, they make the call.
The golden rule: What can't be measured, can't be optimized. Establish your baseline first, then define what success looks like. Only then can you demonstrate real progress.
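Establishing a baseline can start very small. A sketch using only Python's standard library follows; the timed call is a stand-in for whatever your actual LLM request looks like:

```python
import statistics
import time

def timed(fn, *args, **kwargs):
    """Run a call and return (result, latency in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload -- replace with your real LLM call.
latencies = []
for _ in range(5):
    _, seconds = timed(sum, range(100_000))
    latencies.append(seconds)

print(f"median latency: {statistics.median(latencies) * 1000:.2f} ms")
```

Record the same handful of numbers (median and tail latency, tokens per request, cost per request, success rate) before you change anything, so every later optimization has something concrete to beat.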
Once you have clear answers, move to Step 2: Know what you're spending on.