Improved Gemini 2.5 Flash and Flash-Lite

2025-09-25 17:20 · developers.googleblog.com

Today, we are releasing updated versions of Gemini 2.5 Flash and 2.5 Flash-Lite, available on Google AI Studio and Vertex AI, aimed at continuing to deliver better quality while also improving efficiency.

[Chart: Intelligence vs. end-to-end response time] Improvements in quality and speed for the Gemini 2.5 Flash and 2.5 Flash-Lite preview models compared to the current stable models.

[Chart: Output token efficiency] A 50% reduction in output tokens (and hence costs) for Gemini 2.5 Flash-Lite and a 24% reduction for Gemini 2.5 Flash.

The latest version of Gemini 2.5 Flash-Lite was built around three key themes:

  • Better instruction following: The model is significantly better at following complex instructions and system prompts.
  • Reduced verbosity: It now produces more concise answers, a key factor in reducing token costs and latency for high-throughput applications (see charts above).
  • Stronger multimodal & translation capabilities: This update features more accurate audio transcription, better image understanding, and improved translation quality.


You can start testing this version today using the following model string: gemini-2.5-flash-lite-preview-09-2025.
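As a minimal sketch of calling this preview (assuming the google-genai Python SDK, installed via `pip install google-genai`, and a `GEMINI_API_KEY` environment variable; the prompt is illustrative):

```python
import os

# Preview model string from the post.
MODEL = "gemini-2.5-flash-lite-preview-09-2025"

def generate(prompt: str, model: str = MODEL) -> str:
    """One-shot text generation against the preview model."""
    # Imported lazily so the constants above are usable without the SDK.
    from google import genai  # pip install google-genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text

# Only hit the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(generate("In one sentence, why does concise output reduce latency?"))
```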

This latest 2.5 Flash model comes with improvements in two key areas where we heard consistent feedback:

  • Better agentic tool use: We've improved how the model uses tools, leading to better performance in more complex, agentic and multi-step applications. This model shows noticeable improvements on key agentic benchmarks, including a 5% gain on SWE-Bench Verified, compared to our last release (48.9% → 54%).
  • More efficient: With thinking on, the model is now significantly more cost-efficient—achieving higher quality outputs while using fewer tokens, reducing latency and cost (see charts above).
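The agentic tool use described above can be exercised through function calling. A minimal sketch, assuming the google-genai Python SDK (`pip install google-genai`) and a `GEMINI_API_KEY` environment variable; `get_weather` is a toy tool invented for illustration:

```python
import os

def get_weather(city: str) -> str:
    """Toy tool: return a canned weather report for a city."""
    return f"It is sunny in {city}."

def run_agentic_query(question: str) -> str:
    from google import genai  # pip install google-genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    # Passing a plain Python function as a tool enables the SDK's
    # automatic function calling: the model decides when to invoke it
    # and the final text answer incorporates the tool's result.
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-09-2025",
        contents=question,
        config=types.GenerateContentConfig(tools=[get_weather]),
    )
    return response.text

# Only hit the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(run_agentic_query("What's the weather in Paris right now?"))
```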

We’re already seeing positive feedback from early testers. As Yichao ‘Peak’ Ji, Co-Founder & Chief Scientist at Manus, an autonomous AI agent, noted: “The new Gemini 2.5 Flash model offers a remarkable blend of speed and intelligence. Our evaluation on internal benchmarks revealed a 15% leap in performance for long-horizon agentic tasks. Its outstanding cost-efficiency enables Manus to scale to unprecedented levels—advancing our mission to Extend Human Reach.”

You can start testing this preview version today by using the following model string: gemini-2.5-flash-preview-09-2025.


Start building with Gemini

Over the last year, we’ve learned that shipping preview versions of our models allows you to test our latest improvements and innovations, provide feedback, and build production-ready experiences with the best of Gemini. Today’s releases are not intended to graduate to a new stable version, but they will help shape our future stable releases and allow us to keep iterating to bring you the best of Gemini.

To make it even easier to access our latest models while also reducing the need to keep track of long model string names, we are also introducing a -latest alias for each model family. This alias always points to our most recent model versions, allowing you to experiment with new features without needing to update your code for each release. You can access the new previews using:
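As a sketch (the alias names gemini-flash-latest and gemini-flash-lite-latest and the v1beta REST path are assumptions based on the naming used elsewhere in this thread, not official documentation), a request can target either the pinned preview string or its alias:

```python
# Assumed -latest alias names; each alias tracks the newest preview release.
ALIASES = {
    "gemini-2.5-flash-preview-09-2025": "gemini-flash-latest",
    "gemini-2.5-flash-lite-preview-09-2025": "gemini-flash-lite-latest",
}

def generate_content_url(model: str, api_version: str = "v1beta") -> str:
    """REST endpoint for generateContent on a pinned model string or alias."""
    return (
        "https://generativelanguage.googleapis.com/"
        f"{api_version}/models/{model}:generateContent"
    )
```

Because only the model string changes between releases, code that targets an alias picks up each new preview without edits.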


To ensure you have time to test new models, we will always provide two weeks' notice (via email) before we update or deprecate the specific version behind -latest. Because these are just model aliases, the rate limits, cost, and available features may change between releases.

For applications that require more stability, continue to use gemini-2.5-flash and gemini-2.5-flash-lite.

We continue to push the frontier of what is possible with Gemini and this release is just another step in that direction. We will have more to share soon, but in the meantime, happy building!



Comments

  • By davidmckayv 2025-09-25 18:28 · 8 replies

    This really captures something I've been experiencing with Gemini lately. The models are genuinely capable when they work properly, but there's this persistent truncation issue that makes them unreliable in practice.

    I've been running into it consistently: responses just stop mid-sentence, not because of token limits or content filters, but because of what appears to be a bug in how the model signals completion. It's been documented on their GitHub and dev forums for months as a P2 issue.

    The frustrating part is that when you compare a complete Gemini response to Claude or GPT-4, the quality is often quite good. But reliability matters more than peak performance. I'd rather work with a model that consistently delivers complete (if slightly less brilliant) responses than one that gives me half-thoughts I have to constantly prompt to continue.

    It's a shame because Google clearly has the underlying tech. But until they fix these basic conversation flow issues, Gemini will keep feeling broken compared to the competition, regardless of how it performs on benchmarks.

    https://github.com/googleapis/js-genai/issues/707

    https://discuss.ai.google.dev/t/gemini-2-5-pro-incomplete-re...

    • By golfer 2025-09-25 19:51 · 2 replies

      Unfortunately Gemini isn't the only culprit here. I've had major problems with ChatGPT reliability myself.

      • By SilverElfin 2025-09-25 21:38

        I think what I am seeing from ChatGPT is highly varying performance. I think this must be something they are doing to manage limitations of compute or costs. With Gemini, I think what I see is slightly different - more like a lower “peak capability” than ChatGPT’s “peak capability”.

      • By mguerville 2025-09-25 20:42 · 2 replies

        I only hit that problem in voice mode, it'll just stop halfway and restart. It's a jarring reminder of its lack of "real" intelligence

        • By patrickmcnamara 2025-09-25 21:28

          I've heard a lot that voice mode uses a faster (and worse) model than regular ChatGPT. So I think this makes sense. But I haven't seen this in any official documentation.

        • By Narciss 2025-09-25 21:52

          This is more because of VAD - voice activity detection

    • By driese 2025-09-25 21:09 · 2 replies

      Small things like this or the fact that AI studio still has issues with simple scrolling confuse me. How does such a brilliant tool still lack such basic things?

      • By normie3000 2025-09-25 21:27

        I see Gemini web frequently break its own syntax highlighting.

      • By brap 2025-09-25 22:44

        The scrolling in AI Studio is an absolute nightmare and somehow they managed to make it worse.

        It’s so annoying that you have this super capable model but you interact with it using an app that is complete ass

    • By simlevesque 2025-09-25 19:33

      The latest comment on that issue is someone saying there's a fix available for you to try.

    • By dorianmariecom 2025-09-25 18:47 · 1 reply

      chatgpt also has lots of reliability issues

      • By diego_sandoval 2025-09-25 18:58 · 2 replies

        If anyone from OpenAI is reading this, I have two complaints:

        1. Using the "Projects" thing (Folder organization) makes my browser tab (on Firefox) become unusably slow after a while. I'm basically forced to use the default chats organization, even though I would like to organize my chats in folders.

        2. After editing a message that you already sent, you get to select between the different branches of the chat (1/2, and so on), which is cool, but when ChatGPT fails to generate a response in this "branched conversation" context, it will keep failing forever. When your conversation is a single thread and a ChatGPT message fails with an error, retrying usually works and the chat continues normally.

        • By porridgeraisin 2025-09-25 20:16 · 1 reply

          And 3)

          On mobile (Android), opening the keyboard scrolls the chat to the bottom! I sometimes want to type while referring to something from the middle of the LLM's last answer.

          • By Sabinus 2025-09-25 22:26

            Projects should have their own memory system. Perhaps something more interactive than the existing Memories but projects need their own data (definitions, facts, draft documents) that is iterated on and referred to per project. Attached documents aren't it, the AI needs to be able to update the data over multiple chats.

        • By zarmin 2025-09-25 19:25 · 1 reply

          It would also be nice if ChatGPT could move chats between projects. My sidebar is a nightmare.

          • By throwaway240403 2025-09-25 20:52 · 1 reply

            You can drag and drop chats between projects

            • By zarmin 2025-09-25 23:24

              i know. i want the assistant to do it. shouldn't it be able to do work on its own platform?

    • By m101 2025-09-25 20:36

      I wonder if this is because a memory cap was reached at that output token. Perhaps they route conversations to different hardware depending on how long they expect it to be.

    • By tanvach 2025-09-25 21:09

      Yes, agreed. It was totally broken when I tested the API two months ago: lots of failed connections and very slow response times. Hoping the update fixes these issues.

    • By mattmanser 2025-09-25 19:14

      That used to happen a lot in ChatGPT too.

    • By reissbaker 2025-09-25 21:22 · 1 reply

      FWIW, I think GLM-4.5 or Kimi K2 0905 fit the bill pretty well in terms of complete and consistent.

      (Disclosure: I'm the founder of Synthetic.new, a company that runs open-source LLMs for monthly subscriptions.)

      • By noname120 2025-09-25 21:38

        That’s not a “disclosure”, that’s an ad.

  • By simonw 2025-09-25 18:52 · 2 replies

    I added support to these models to my llm-gemini plugin, so you can run them like this (using uvx so no need to install anything first):

      export LLM_GEMINI_KEY='...'
      uvx --isolated --with llm-gemini llm -m gemini-flash-lite-latest 'An epic poem about frogs at war with ducks'
    
    Release notes: https://github.com/simonw/llm-gemini/releases/tag/0.26

    Pelicans: https://github.com/simonw/llm-gemini/issues/104#issuecomment...

    • By zamalek 2025-09-25 22:51 · 1 reply

      I wonder if [good examples of] SVGs of pelicans on bikes are "being introduced" into training sets. Some of the engineers who work on this stuff are the kind to hang out here.

      • By simonw 2025-09-25 22:57

        It's possible, but honestly I've never seen a decent vector illustration of a pelican on a bicycle myself so they'd have to work pretty hard to find one!

    • By canadiantim 2025-09-25 19:00 · 2 replies

      Who wins in the end? the frogs? the ducks? or the pelicans?

      • By tclancy 2025-09-25 20:24

        I heard the dragon took the pole, but it may have been wind-aided.

      • By nine_k 2025-09-25 20:18

        This depends on the value of your LLM_GEMINI_KEY!

  • By herpderperator 2025-09-25 22:22 · 4 replies

    Serious question: If it's an improved 2.5 model, why don't they call it version 2.6? Seems annoying to have to remember if you're using the old 2.5 or the new 2.5. Kind of like when Apple released the third-gen iPad many years ago and simply called it the "new iPad" without a number.

    • By skerit 2025-09-25 22:28

      That's why people called the second version of Sonnet v3.5 simply v3.6, and Anthropic acknowledged that by naming the next version v3.7.

    • By alwillis 2025-09-25 22:35 · 3 replies

      It's pretty common to refer to models by the month and year they were released.

      For example, the latest Gemini 2.5 Flash is known as "google/gemini-2.5-flash-preview-09-2025" [1].

      [1]: https://openrouter.ai/google/gemini-2.5-flash-preview-09-202...

      • By cpeterso 2025-09-25 23:11

        If they're going to include the month and year as part of the version number, they should at least use big endian dates like gemini-2.5-flash-preview-2025-09 instead of 09-2025.

      • By relatedtitle 2025-09-25 23:11

        I'm pretty sure Google just does that for preview models and they drop the date from the name when it's released.

      • By herpderperator 2025-09-25 22:36 · 1 reply

        Or, you know, just Gemini 2.6 Flash. I don't recall the 2.5 version having a date associated with it when it came out, though maybe they are using dates now. In marketing, at least, it's always known as Gemini 2.5 Flash/Pro.

        • By kingo55 2025-09-25 22:47

          It had a date, but I also agree this is extremely confusing. Even semver 2.5.1 would be clearer IMO.

    • By qafy 2025-09-25 22:35

      2.5 is not the version number, it's the generation of the underlying model architecture. Think of it like the trim level on a Mazda 3 hatchback. Mazda already has the Mazda 3 Sport in their lineup; later they release the Mazda 3 Turbo, which is much faster. When they release this new version of the vehicle it's not called the Mazda 4: that would be an entirely different vehicle based on a new platform, powertrain, etc. (if it existed). The new vehicle is just a new trim level / visual refresh of the existing Mazda 3.

      That's why Google names it like this, but I agree it's dumb. Semver would be easier.

    • By JumpCrisscross 2025-09-25 22:51 · 1 reply

      Maybe they’re signalling it’s more of a bug fix?

      • By manquer 2025-09-26 0:15 · 1 reply

        2.5.1, then.

        Semantic versioning works for most scenarios.

        • By JumpCrisscross 2025-09-26 0:26

          Would that automatically roll over anyone pinning 2.5 via their API?

HackerNews