Everyone who's used Opus knows it's better than the other models in a way the benchmarks don't capture. I would describe it as taste.
Lots of models get really close on benchmarks, but benchmarks only tell us how good a model is at solving well-defined problems. Opus is far better at solving ill-defined ones.