Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Llama 2 70B Chat | 20 Jun 2023 | 77.30% | inferred family alias from llama-3.3-70b-instruct (score=0.3129; benches=9) | Yes | Source |
| Llama 3.1 Nemotron Ultra 253B v1 | 07 Apr 2025 | 74.10% | - | Yes | Source |
| Llama 3.3 Nemotron Super 49B v1 | 18 Mar 2025 | 73.70% | - | Yes | Source |
| Llama 3.3 Nemotron Super 49B V1.5 | - | 73.70% | inferred version-family alias from llama-3.3-nemotron-super-49b-v1 | Yes | Source |
| Llama 3.1 Nemotron Nano 4B V1.1 | - | 63.60% | inferred high-confidence family alias from llama-3.1-nemotron-nano-8b-v1 (score=0.5523; benches=7) | Yes | Source |
| Llama 3.1 Nemotron Nano 8B V1 | 18 Mar 2025 | 63.60% | - | Yes | Source |