BFCL v2 Benchmark

Models Using This Benchmark

Organisation	Model	Reported	Top Score	Info	Self-Reported
Meta	Llama 2 70B Chat	20 Jun 2023	77.30%	inferred family alias from llama-3.3-70b-instruct (score=0.3129; benches=9)	Yes
Nvidia	Llama 3.1 Nemotron Ultra 253B v1	07 Apr 2025	74.10%	-	Yes
Nvidia	Llama 3.3 Nemotron Super 49B v1	18 Mar 2025	73.70%	-	Yes
Nvidia	Llama 3.3 Nemotron Super 49B v1.5	-	73.70%	inferred version-family alias from llama-3.3-nemotron-super-49b-v1	Yes
Nvidia	Llama 3.1 Nemotron Nano 4B v1.1	-	63.60%	inferred high-confidence family alias from llama-3.1-nemotron-nano-8b-v1 (score=0.5523; benches=7)	Yes
Nvidia	Llama 3.1 Nemotron Nano 8B v1	18 Mar 2025	63.60%	-	Yes
