Narev

Research

Data analysis and industry insights.

Tested the 'top of 2025' LLMs on a real task. GPT-3.5 won.

MMLU Pro winners against MMLU Pro losers: how big is the gap on a single, repeatable task? The result is surprising.

oskar

October 14, 2025

The MMLU Benchmark Reproducibility Problem

MMLU scores vary by 13 points for the same model depending on who's measuring, yet the "top" models differ by just 1%.

oskar

October 13, 2025

AI adoption rate for large firms continues to trend down

U.S. Census Bureau data shows AI adoption among large firms has continued to decline after peaking in July 2025, while adoption among the smallest firms keeps growing.

oskar

October 11, 2025
