by @querulous-deer
This test evaluates if an LLM can maintain a historical persona's "knowledge wall". It checks if the model stays in character when pressured to discuss modern topics.
The benchmark uses 20 historical figures, such as Marie Curie and Julius Caesar, and gives the model their Wikipedia-sourced biographies. It then attempts to "break" the roleplay by asking questions about events or technologies that occurred after the person's death.
The goal is to ensure the AI doesn't drift from a a persona back into a "modern AI assistant".
Best in Class
Overpriced
Your Selection
Click on the model to make selection
| Config | Total Cost | Quality | Username | Position | |
|---|---|---|---|---|---|
| nousresearch/hermes-3-llama-3.... | $0.0000 | 92.0% | @querulous-deer | #1 | |
| meta-llama/llama-3.3-70b-instr... | $0.0000 | 86.0% | @querulous-deer | #2 | |
| z-ai/glm-4.5-air:free | $0.0000 | 84.0% | @querulous-deer | #3 | |
| mistralai/mistral-small-3.1-24... | $0.0000 | 76.0% | @querulous-deer | #4 | |
| nvidia/nemotron-3-nano-30b-a3b... | $0.0000 | 68.0% | @querulous-deer | #5 | |
| qwen/qwen3-coder:free | $0.0000 | 66.0% | @querulous-deer | #6 | |
| nvidia/nemotron-nano-9b-v2:fre... | $0.0000 | 66.0% | @querulous-deer | #6 | |
| moonshotai/kimi-k2:free | $0.0000 | 66.0% | @querulous-deer | #6 | |
| openai/gpt-oss-120b:free | $0.0000 | 64.0% | @querulous-deer | #9 | |
| openai/gpt-oss-20b:free | $0.0000 | 62.0% | @querulous-deer | #10 | |
| arcee-ai/trinity-mini:free | $0.0000 | 62.0% | @querulous-deer | #10 | |
| qwen/qwen3-next-80b-a3b-instru... | $0.0000 | 60.0% | @querulous-deer | #12 | |
| meta-llama/llama-3.2-3b-instru... | $0.0000 | 60.0% | @querulous-deer | #12 | |
| qwen/qwen-2.5-vl-7b-instruct:f... | $0.0000 | 58.0% | @querulous-deer | #14 | |
| meta-llama/llama-3.1-405b-inst... | $0.0000 | 52.0% | @querulous-deer | #15 | |
| xiaomi/mimo-v2-flash:free | $0.0000 | 50.0% | @querulous-deer | #16 | |
| google/gemma-3-12b-it:free | $0.0000 | 50.0% | @querulous-deer | #16 | |
| deepseek/deepseek-r1-0528:free | $0.0000 | 46.0% | @querulous-deer | #18 | |
| google/gemma-3-27b-it:free | $0.0000 | 44.0% | @querulous-deer | #19 | |
| google/gemma-3-4b-it:free | $0.0000 | 42.0% | @querulous-deer | #20 | |
| google/gemma-3n-e4b-it:free | $0.0000 | 32.0% | @querulous-deer | #21 | |
| nvidia/nemotron-nano-12b-v2-vl... | $0.0000 | 20.0% | @querulous-deer | #22 |
| Total Cost | Quality | |
|---|---|---|
| nousresearch/hermes-3-llama-3.... | $0.0000 | 92.0% |
| meta-llama/llama-3.3-70b-instr... | $0.0000 | 86.0% |
| z-ai/glm-4.5-air:free | $0.0000 | 84.0% |
| mistralai/mistral-small-3.1-24... | $0.0000 | 76.0% |
| nvidia/nemotron-3-nano-30b-a3b... | $0.0000 | 68.0% |
| qwen/qwen3-coder:free | $0.0000 | 66.0% |
| nvidia/nemotron-nano-9b-v2:fre... | $0.0000 | 66.0% |
| moonshotai/kimi-k2:free | $0.0000 | 66.0% |
| openai/gpt-oss-120b:free | $0.0000 | 64.0% |
| openai/gpt-oss-20b:free | $0.0000 | 62.0% |
| arcee-ai/trinity-mini:free | $0.0000 | 62.0% |
| qwen/qwen3-next-80b-a3b-instru... | $0.0000 | 60.0% |
| meta-llama/llama-3.2-3b-instru... | $0.0000 | 60.0% |
| qwen/qwen-2.5-vl-7b-instruct:f... | $0.0000 | 58.0% |
| meta-llama/llama-3.1-405b-inst... | $0.0000 | 52.0% |
| xiaomi/mimo-v2-flash:free | $0.0000 | 50.0% |
| google/gemma-3-12b-it:free | $0.0000 | 50.0% |
| deepseek/deepseek-r1-0528:free | $0.0000 | 46.0% |
| google/gemma-3-27b-it:free | $0.0000 | 44.0% |
| google/gemma-3-4b-it:free | $0.0000 | 42.0% |
| google/gemma-3n-e4b-it:free | $0.0000 | 32.0% |
| nvidia/nemotron-nano-12b-v2-vl... | $0.0000 | 20.0% |
Loading prompt execution data...