Interactive artifacts and guides to contextualize our findings

Detailed Benchmarks

See an interactive, benchmark-level breakdown of individual LLM performance and the AI Frontier.
Select a benchmark to view the chart

How to read this data

First author Bradley Fowler walks you through the findings and calls out their exciting implications.