by @querulous-deer
This benchmark evaluates a model's ability to reason transitively and to maintain an internal model of relationships. Each task presents a sequential chain of comparisons (e.g., "A is taller than B, B is taller than C"), and the model must reconstruct the full ordering to identify the extreme element (the tallest or the shortest). This goes beyond simple text retrieval and tests structural comprehension.
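To make the task concrete, here is a minimal sketch of the reasoning a correct answer requires: rebuilding the full ordering from pairwise "taller than" facts. The function name `order_chain` and the `(taller, shorter)` tuple encoding are illustrative assumptions, not the benchmark's actual prompt format or grader.

```python
from typing import Dict, List, Tuple


def order_chain(facts: List[Tuple[str, str]]) -> List[str]:
    """Return names from tallest to shortest.

    Each fact is a hypothetical (taller, shorter) pair; the facts are
    assumed to form one unbroken chain, given in any order.
    """
    shorter_of: Dict[str, str] = dict(facts)  # taller -> shorter
    shorter_names = set(shorter_of.values())

    # The tallest person is the only one never listed as "shorter".
    heads = [name for name in shorter_of if name not in shorter_names]
    if len(heads) != 1:
        raise ValueError("facts do not form a single chain")

    order = [heads[0]]
    seen = {heads[0]}
    # Follow the links: each step applies one comparison transitively.
    while order[-1] in shorter_of:
        nxt = shorter_of[order[-1]]
        if nxt in seen:
            raise ValueError("facts contain a cycle")
        order.append(nxt)
        seen.add(nxt)

    if len(order) != len(facts) + 1:
        raise ValueError("facts do not form a single chain")
    return order


# Facts deliberately shuffled, so the answer cannot be read off the first line.
facts = [("B", "C"), ("A", "B"), ("C", "D")]
order = order_chain(facts)
print("tallest:", order[0], "shortest:", order[-1])
```

Because the facts may arrive out of order, the extreme element cannot be found by looking at any single sentence, which is the property the benchmark is described as testing.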
| Config | Total Cost | Quality | Username | Position |
|---|---|---|---|---|
| mistralai/mistral-small-3.1-24... | $0.0000 | 100.0% | @querulous-deer | #1 |
| mistralai/devstral-2512:free | $0.0000 | 100.0% | @querulous-deer | #1 |
| google/gemma-3-4b-it:free | $0.0000 | 100.0% | @querulous-deer | #1 |
| google/gemma-3-12b-it:free | $0.0000 | 100.0% | @querulous-deer | #1 |
| z-ai/glm-4.5-air:free | $0.0000 | 98.0% | @querulous-deer | #5 |
| qwen/qwen-2.5-vl-7b-instruct:f... | $0.0000 | 98.0% | @querulous-deer | #5 |
| openai/gpt-oss-20b:free | $0.0000 | 98.0% | @querulous-deer | #5 |
| nousresearch/hermes-3-llama-3.... | $0.0000 | 98.0% | @querulous-deer | #5 |
| openai/gpt-oss-120b:free | $0.0000 | 90.0% | @querulous-deer | #9 |
| google/gemma-3-27b-it:free | $0.0000 | 90.0% | @querulous-deer | #9 |
| nvidia/nemotron-3-nano-30b-a3b... | $0.0000 | 86.0% | @querulous-deer | #11 |
| meta-llama/llama-3.3-70b-instr... | $0.0000 | 86.0% | @querulous-deer | #11 |
| moonshotai/kimi-k2:free | $0.0000 | 68.0% | @querulous-deer | #13 |
| google/gemma-3n-e4b-it:free | $0.0000 | 68.0% | @querulous-deer | #13 |
| qwen/qwen3-coder:free | $0.0000 | 44.0% | @querulous-deer | #15 |
| xiaomi/mimo-v2-flash:free | $0.0000 | 42.0% | @querulous-deer | #16 |
| meta-llama/llama-3.1-405b-inst... | $0.0000 | 24.0% | @querulous-deer | #17 |
| nvidia/nemotron-nano-12b-v2-vl... | $0.0000 | 18.0% | @querulous-deer | #18 |
| arcee-ai/trinity-mini:free | $0.0000 | 6.0% | @querulous-deer | #19 |
| tngtech/tng-r1t-chimera:free | $0.0000 | 0.0% | @querulous-deer | #20 |
| tngtech/deepseek-r1t2-chimera:... | $0.0000 | 0.0% | @querulous-deer | #20 |
| tngtech/deepseek-r1t-chimera:f... | $0.0000 | 0.0% | @querulous-deer | #20 |
| qwen/qwen3-next-80b-a3b-instru... | $0.0000 | 0.0% | @querulous-deer | #20 |
| nvidia/nemotron-nano-9b-v2:fre... | $0.0000 | 0.0% | @querulous-deer | #20 |
| meta-llama/llama-3.2-3b-instru... | $0.0000 | 0.0% | @querulous-deer | #20 |
| deepseek/deepseek-r1-0528:free | $0.0000 | 0.0% | @querulous-deer | #20 |
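
The Position column appears consistent with standard competition ranking on Quality, where tied models share a position and the next position skips ahead (#1, #1, #1, #1, #5, ...). The sketch below reproduces that pattern; it is an assumption inferred from the table, not the site's documented method, and `competition_rank` is a hypothetical helper name.

```python
from typing import Dict, List


def competition_rank(scores: List[float]) -> List[int]:
    """Rank scores so ties share a position and later positions skip ahead."""
    ordered = sorted(scores, reverse=True)
    first_position: Dict[float, int] = {}
    for position, score in enumerate(ordered, start=1):
        # Only the first (best) position seen for a score is kept.
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]


# Quality values of the first nine rows of the table above.
scores = [100.0, 100.0, 100.0, 100.0, 98.0, 98.0, 98.0, 98.0, 90.0]
print(competition_rank(scores))  # [1, 1, 1, 1, 5, 5, 5, 5, 9]
```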