Know your objective and who's in charge
Establish what success looks like before optimizing. Get your team aligned on goals and identify who makes the final call.
At the end of this step, you should have answers to the following questions:
- What are our top 2-3 optimization priorities?
- Who are the stakeholders, and do they agree?
- Who is the decision-maker when tradeoffs get tough?
Understand the tradeoffs
Everything is a tradeoff. Speed vs. accuracy. Cost vs. quality. Latency vs. intelligence.
LLM providers want you to believe their flagship models are one-size-fits-all solutions. They're not.
Think about cars. You wouldn't use the same car for every job:
Car Type | Best For | Strength | Tradeoff |
---|---|---|---|
City car | Daily commutes | Fuel-efficient, easy to park | Limited cargo space |
Bus | Mass transportation | Cost-effective per passenger | Slow, inflexible routes |
Forklift | Heavy lifting | Precise load handling | Useless for transportation |
Sports car | Maximum speed | Blazing fast | Terrible fuel economy |
Pickup truck | Heavy cargo | High payload capacity | Poor fuel economy, less comfort |
Motorcycle | Quick trips | Agile and economical | Limited capacity |
You wouldn't drive a forklift to the grocery store. You wouldn't use a city car to traverse a mountain trail. The job determines the car you need.
LLMs work the same way. Different models for different tasks:
- Simple queries? Use a lightweight model (the "city car")
- Complex reasoning? Premium model justified (the "sports car")
- Batch processing? Optimize for throughput (the "bus")
- Real-time interaction? Optimize for speed (the "motorcycle")
Yet most teams default to the "sports car" for everything.
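In code, matching models to tasks can start as a simple lookup table. A minimal routing sketch follows; the task labels and model names are placeholders, not real model IDs:

```python
# Route each task type to a model tier. Names here are illustrative
# placeholders -- swap in your provider's actual model IDs.
MODEL_TIERS = {
    "simple_query": "lightweight-model",    # the "city car"
    "complex_reasoning": "premium-model",   # the "sports car"
    "batch_processing": "batch-model",      # the "bus"
    "realtime_chat": "fast-model",          # the "motorcycle"
}

def pick_model(task_type: str) -> str:
    """Return the model for a task type, defaulting to the cheapest tier."""
    return MODEL_TIERS.get(task_type, "lightweight-model")

print(pick_model("complex_reasoning"))  # premium-model
print(pick_model("unknown_task"))       # lightweight-model
```

Real routers classify the incoming query first (by keyword rules, a small classifier, or a cheap model), but the principle is the same: the default should be the cheapest tier that does the job, with escalation as the exception.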
One size doesn't fit all
LLM providers invest heavily in marketing their flagship models as universal solutions. They showcase impressive benchmark scores (MMLU, HumanEval, GPQA) that measure "intelligence" in controlled environments.
But here's what those benchmarks ignore:
- Cost: Premium models can be 100x more expensive
- Latency: Premium models can be 2-3x slower—users abandon apps while waiting for the "perfect" answer
- Overkill: Most queries don't need maximum intelligence
The newest model always costs the same
Yes, GPT-3.5 is now 10x cheaper than it was at launch. However, the price of frontier models stays constant.
And the best model? It always costs roughly the same—around $15-75 per million tokens—because that's what the edge of compute costs today.
Time | Price of GPT-4 Class (Output) | Decrease Since Inception | Frontier Model (Most Capable) | Price (Output) | Source |
---|---|---|---|---|---|
2023 Q1 (Mar) | $60/M tokens | 0% (baseline) | GPT-4 | $60/M tokens | Nebuly |
2023 Q4 (Nov) | $30/M tokens (GPT-4 Turbo) | 50% | GPT-4 Turbo | $30/M tokens | Nebuly |
2024 Q1 (Mar) | $30/M tokens | 50% | Claude 3 Opus | $75/M tokens | PromptHub |
2024 Q2 (May) | $15/M tokens (GPT-4o) | 75% | Claude 3 Opus | $75/M tokens | Nebuly |
2024 Q3 (Aug) | $10/M tokens (GPT-4o) | 83% | Claude 3 Opus | $75/M tokens | Microsoft |
2024 Q3 (Sep) | $10/M tokens | 83% | o1 (reasoning) | $60/M tokens | Artificial Analysis |
2025 Q2 (May) | $10/M tokens | 83% | Claude Opus 4 | $75/M tokens | Anthropic, PromptHub |
2025 Q3 (Current) | $10/M tokens | 83% | Claude Opus 4.1 | $75/M tokens | Anthropic |
The frontier model price is remarkably stable. What drops in price is yesterday's newspaper—models that are no longer state-of-the-art.
Hidden inflation no one talks about
Modern models consume exponentially more tokens through:
- Longer context windows (128K → 200K → 1M tokens)
- Test-time compute (models that "think" longer use more tokens)
- Agentic workflows (models that iterate, check their work, and refine)
A simple query that used to return 500 tokens might now return 50,000 tokens as the model plans, researches, writes, and refines. The per-token cost stays flat, but you're burning 100x more tokens.
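The arithmetic is worth sketching out. The token counts below are hypothetical and the prices are illustrative; the point is that with a flat per-token price, total cost scales directly with tokens consumed:

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m

# Illustrative prices: $1.25 in / $10 out per million tokens.
# Same per-token price; the only change is 100x the output tokens.
concise = request_cost(100, 500, 1.25, 10.0)     # ~$0.0051
agentic = request_cost(100, 50_000, 1.25, 10.0)  # ~$0.5001
print(f"${concise:.4f} vs ${agentic:.4f} per request")
```

At a million requests a month, that gap is roughly $5K versus $500K, without the per-token price moving at all.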
Real-world example
Model | Cost per 1M Requests | Tokens (In/Out) | Token Price per 1M | Latency |
---|---|---|---|---|
GPT-3.5 Turbo | $53.00 | 22/28 | $0.50/$1.50 | 485ms |
GPT-4o-mini | $19.50 | 22/27 | $0.15/$0.60 | 980ms |
GPT-4 Turbo | $850.00 | 22/21 | $10/$30 | 1,147ms |
GPT-5 | $1,526.25 | 21/150 | $1.25/$10 | 6,259ms |
Notice that GPT-5 has lower token prices than GPT-4 Turbo ($1.25/$10 vs $10/$30), yet it costs 80% more per request ($1,526 vs $850). Why? Because GPT-5 uses 7x more output tokens (150 vs 21).
What matters isn't the price per token—it's the cost to solve your problem. Optimize for cost per successful outcome, not cost per token.
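That framing reduces to one division. The numbers below are made up for illustration, but they show how a cheap model with a low success rate can lose to a pricier model that almost always works:

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    """Effective cost of one successful outcome: failed attempts still
    cost money, so divide the per-request cost by the success rate."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_request / success_rate

# Hypothetical numbers, for illustration only.
print(cost_per_success(0.005, 0.10))  # cheap model,  10% success: $0.05
print(cost_per_success(0.030, 0.90))  # pricey model, 90% success: ~$0.033
```

The hard part in practice is not the formula but defining and measuring "success" for your task, which is exactly why the evaluation step matters.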
Define your objective to know the tradeoff
So what are you actually optimizing for? You can't know until you ask the right people.
Assemble the people who care about the outcome:
- Product Manager - owns user experience and conversion metrics
- Engineering Lead - owns performance, reliability, and technical architecture
- Data Scientist/ML Engineer - owns model quality and evaluation
- Finance/Operations - owns budget and unit economics
- Executive Leadership - owns strategic priorities and resource allocation
Gather them and ask: What matters most to us?
Common answers:
- Conversion to paid features
- User retention and engagement
- Customer support ticket reduction
- Response speed (time to first token)
- Cost per interaction or per user
- Accuracy on mission-critical tasks
- Throughput (queries per second)
You can't optimize for everything. That's the point. Pick your top 2-3 priorities and accept the tradeoffs.
Get everyone aligned
For small teams, alignment might happen in a 30-minute conversation. For larger organizations, you need explicit buy-in. In a bigger team or a complex product, decide up front who makes the final call when there's conflict. The product manager? The head of engineering? The CEO? You?
Bring this person on the journey from day one. Keep them involved. When you need to choose between 10% better accuracy or 50% lower costs, they make the call.
The golden rule: What can't be measured, can't be optimized. Establish your baseline first, then define what success looks like. Only then can you demonstrate real progress.
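Establishing a baseline can start very small. A sketch using only Python's standard library follows; the timed call is a stand-in for whatever your actual LLM request looks like:

```python
import statistics
import time

def timed(fn, *args, **kwargs):
    """Run a call and return (result, latency in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload -- replace with your real LLM call.
latencies = []
for _ in range(5):
    _, seconds = timed(sum, range(100_000))
    latencies.append(seconds)

print(f"median latency: {statistics.median(latencies) * 1000:.2f} ms")
```

Record the same handful of numbers (median and tail latency, tokens per request, cost per request, success rate) before you change anything, so every later optimization has something concrete to beat.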
Once you have clear answers, move to Step 2: Know what you're spending on.