Interactive artifacts and guides to contextualize our findings
Detailed Benchmarks
See an interactive, benchmark-level breakdown of individual LLM performance and the AI Frontier.
Select a benchmark to view the chart
How to read this data
First author Bradley Fowler walks you through the findings and calls out their exciting implications.