Chatbot Arena

A crowd-sourced LLM evaluation in which humans compare blind, side-by-side responses from two anonymous models and vote for the better one. The votes are aggregated into an Elo-style rating per model. Currently the most-cited 'real-world quality' ranking because it captures human preferences that static benchmarks miss. As of April 2026, Claude Opus 4.7 leads, with DeepSeek V4 the top open-weights model, rated 1342.
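
To make the rating step concrete, here is a minimal sketch of how pairwise votes can be turned into ratings with the textbook online Elo update. It is illustrative only and is not claimed to be the Arena's actual pipeline: the K-factor of 32, the starting rating of 1000, the function names, and the model names are all assumptions chosen for the example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: dict, model_a: str, model_b: str, outcome: float,
               k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply one vote in place.

    outcome: 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k and initial are assumed values, not taken from the source.
    """
    r_a = ratings.get(model_a, initial)
    r_b = ratings.get(model_b, initial)
    e_a = expected_score(r_a, r_b)
    # The winner gains exactly what the loser gives up, so the total is conserved.
    ratings[model_a] = r_a + k * (outcome - e_a)
    ratings[model_b] = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return ratings


# Hypothetical votes: (model shown as A, model shown as B, outcome for A)
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]

ratings = {}
for a, b, outcome in votes:
    update_elo(ratings, a, b, outcome)

for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Because each update depends on the ratings at the time of the vote, online Elo is sensitive to vote order; a sketch like this is best read as an explanation of the idea rather than a way to reproduce published leaderboard numbers.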