...

alphabetting

5790

Karma

2021-04-20

Created

Recent Activity

  • Commented: "Gemini 3.1 Pro"

    The agentic benchmarks for 3.1 indicate Gemini has caught up. The gains from 3.0 to 3.1 are big.

    For example, the APEX-Agents benchmark for long-horizon investment banking, consulting, and legal work:

    1. Gemini 3.1 Pro - 33.2%
    2. Opus 4.6 - 29.8%
    3. GPT 5.2 Codex - 27.6%
    4. Gemini Flash 3.0 - 24.0%
    5. GPT 5.2 - 23.0%
    6. Gemini 3.0 Pro - 18.0%

  • 12 points | 1 comment | www.platformer.news

    A “whistleblower” tried to corroborate his viral post with AI-generated evidence. This is how I caught him. PLUS: Grok's image-generation crisis, and the rapture over Claude Opus 4.5

HackerNews