by @querulous-deer
# HellaSwag: a benchmark dataset for common-sense reasoning
| Position | Model Name | Avg Cost / 1M req | Quality | Score |
|---|---|---|---|---|
| #1 | meta-llama/llama-3-70b-instruct | $64.97 | 100.0% | 97.90 |
| #2 | openai/gpt-5-nano | $219.87 | 100.0% | 92.80 |
| #3 | google/gemma-3-27b-it | $5.30 | 60.0% | 59.90 |
| #4 | undi95/remm-slerp-l2-13b | $68.57 | 60.0% | 58.70 |
| #5 | z-ai/glm-5 | $163.21 | 20.0% | 18.90 |
| #6 | z-ai/glm-4.7 | $612.45 | 20.0% | 16.00 |
| #7 | mistralai/ministral-3b | $4.66 | 0.0% | 0.00 |
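The ranking above can be loaded and re-sorted programmatically. A minimal sketch follows; the leaderboard's own scoring formula is not published here, so this snippet only carries the listed values verbatim and ranks by the Score column:

```python
# Snapshot of the leaderboard: (model, avg cost per 1M requests in USD,
# quality as a fraction, score). Values copied from the table above.
results = [
    ("meta-llama/llama-3-70b-instruct", 64.97, 1.00, 97.90),
    ("openai/gpt-5-nano", 219.87, 1.00, 92.80),
    ("google/gemma-3-27b-it", 5.30, 0.60, 59.90),
    ("undi95/remm-slerp-l2-13b", 68.57, 0.60, 58.70),
    ("z-ai/glm-5", 163.21, 0.20, 18.90),
    ("z-ai/glm-4.7", 612.45, 0.20, 16.00),
    ("mistralai/ministral-3b", 4.66, 0.00, 0.00),
]

# Rank by score, highest first, reproducing the table's ordering.
ranked = sorted(results, key=lambda r: r[3], reverse=True)
for pos, (model, cost, quality, score) in enumerate(ranked, start=1):
    print(f"#{pos} {model}: score {score:.2f} at ${cost:.2f} / 1M req")
```

Note that rank correlates with quality here but not with cost: gemma-3-27b-it scores within 1.2 points of remm-slerp-l2-13b at roughly a thirteenth of the price.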