Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite

2025-06-17 16:06 · blog.google

Gemini 2.5 Flash and Pro are now generally available, and we’re introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.

[{"model": "blogsurvey.survey", "pk": 7, "fields": {"name": "Article Improvements - March 2025", "survey_id": "article-improvements-march-2025_250321", "scroll_depth_trigger": 75, "previous_survey": null, "display_rate": 75, "thank_message": "Thank you!", "thank_emoji": "✅", "questions": "[{\"id\": \"5a12fd89-d978-4a1b-80e5-2442a91422be\", \"type\": \"simple_question\", \"value\": {\"question\": \"How could we improve this article?\", \"responses\": [{\"id\": \"30122b0d-1169-4376-af7c-20c9de52c91c\", \"type\": \"item\", \"value\": \"Make it more concise\"}, {\"id\": \"18f3016a-7235-468b-b246-ffe974911ae9\", \"type\": \"item\", \"value\": \"Add more detail\"}, {\"id\": \"5d19c11d-6a61-49d3-9f1d-dad5d661ba4f\", \"type\": \"item\", \"value\": \"Make it easier to understand\"}, {\"id\": \"97064d1f-d9af-4a83-a44f-a84f8ed899d6\", \"type\": \"item\", \"value\": \"Include more images or videos\"}, {\"id\": \"a9ec2a70-c7c5-4f00-a179-31a7b5641879\", \"type\": \"item\", \"value\": \"It's fine the way it is\"}]}}]", "target_article_pages": true}}]

We’re expanding our Gemini 2.5 family of models

[Image: blue and black futuristic illustration with the Gemini 2.5 logo in the middle]

We designed Gemini 2.5 as a family of hybrid reasoning models that deliver amazing performance while also sitting at the Pareto frontier of cost and speed. Today, we’re taking the next step with our 2.5 Pro and Flash models by releasing them as stable and generally available. And we’re bringing you 2.5 Flash-Lite in preview — our most cost-efficient and fastest 2.5 model yet.

Making 2.5 Flash and 2.5 Pro generally available

Thanks to all of your feedback, today we’re releasing stable versions of 2.5 Flash and Pro, so you can build production applications with confidence. Developers like Spline and Rooms and organizations like Snap and SmartBear have already been using the latest versions in production for the last few weeks.

Introducing Gemini 2.5 Flash-Lite

We’re also introducing a preview of the new Gemini 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet. You can start building with the preview version now, and we’re looking forward to your feedback.

2.5 Flash-Lite has all-around higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks. It excels at high-volume, latency-sensitive tasks like translation and classification, with lower latency than 2.0 Flash-Lite and 2.0 Flash on a broad sample of prompts. It comes with the same capabilities that make Gemini 2.5 helpful, including the ability to turn thinking on at different budgets, connections to tools like Google Search and code execution, multimodal input, and a 1 million-token context length.
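As a rough illustration of those knobs, a call might look like this with the google-genai Python SDK (a minimal sketch; the preview model id and exact parameter names are assumptions to verify against the docs):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.5-flash-lite-preview-06-17",  # assumed preview model id
        contents="Summarize this week's Gemini 2.5 announcements.",
        config=types.GenerateContentConfig(
            # Ground the answer with Google Search as a connected tool;
            # the 1M-token context window leaves room for large inputs.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(response.text)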

See more details about our 2.5 family of models in the latest Gemini technical report.

[Table: Gemini 2.5 Flash-Lite benchmarks]

The preview of Gemini 2.5 Flash-Lite is now available in Google AI Studio and Vertex AI, alongside the stable versions of 2.5 Flash and Pro. Both 2.5 Flash and Pro are also accessible in the Gemini app. We’ve also brought custom versions of 2.5 Flash-Lite and Flash to Search.

We can’t wait to see what you continue to build with Gemini 2.5.


Read the original article

Comments

  • By simonw 2025-06-17 17:10 · 7 replies

    They don't mention it in the post, but it looks like this includes a price increase for the Gemini 2.5 Flash model.

    For 2.5 Flash Preview https://web.archive.org/web/20250616024644/https://ai.google...

    $0.15/million input text / image / video

    $1.00/million audio

    Output: $0.60/million non-thinking, $3.50/million thinking

    The new prices for Gemini 2.5 Flash ditch the difference between thinking and non-thinking and are now: https://ai.google.dev/gemini-api/docs/pricing

    $0.30/million input text / image / video (2x more)

    $1.00/million audio (same)

    $2.50/million output - significantly more than the old non-thinking price, less than the old thinking price.
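    Back-of-the-envelope, here's what that does to a workload with thinking off (a sketch using the prices above):

        # $ per 1M tokens
        old_in, old_out_nonthinking = 0.15, 0.60  # 2.5 Flash Preview
        new_in, new_out = 0.30, 2.50              # 2.5 Flash GA (single output price)

        mtok_in, mtok_out = 10, 2  # example month: 10M input, 2M output, no thinking
        old_cost = mtok_in * old_in + mtok_out * old_out_nonthinking  # $2.70
        new_cost = mtok_in * new_in + mtok_out * new_out              # $8.00
        print(f"{new_cost / old_cost:.1f}x increase")                 # ~3.0x for this mix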

    • By Workaccount2 2025-06-17 17:46 · 4 replies

      The blog post has more info about the pricing changes

      https://developers.googleblog.com/en/gemini-2-5-thinking-mod...

      • By jjani 2025-06-17 18:29 · 3 replies

        The real news is that non-thinking output is now 4x more expensive, which they of course carefully avoid mentioning in the blog, only comparing the thinking prices.

        How cute they are with their phrasing:

        > $2.50 / 1M output tokens (*down from $3.50 output)

        Which should be "up from $0.60 (non-thinking)/down from $3.50 (thinking)"

        • By recursive 2025-06-18 16:10 · 2 replies

          I have LLM fatigue, so I'm not paying attention to headlines... but LLMs are thinking now? That used to be a goal post. "AI can't do {x} because it's not thinking." Now it's part of a pricing chart?

          How did I miss this?

          • By svachalek 2025-06-18 17:09

            "Thinking" means spamming a bunch of stream-of-consciousness bs before it actually generates the final answer. It's kind of like the old trick of prompting to "think step by step". Seeding the context full of relevant questions and concepts improves the quality of the final generation, even though it's rarely a direct conclusion of the so-called thinking before it.

          • By stirfish 2025-06-18 19:49

            "Thinking" really just means "write on some scratch paper" for llms.

        • By amazingamazing 2025-06-17 18:49 · 2 replies

          Is it possible to get non-thinking only now, though? If not, why would that matter, since it's irrelevant?

          • By jjani 2025-06-17 18:54 · 2 replies

            Yes, by setting the thinking budget to 0. Which is very common when a task doesn't need thinking.

            In addition, it's also relevant because for the last 3 months people have built things on top of this.

            • By Workaccount2 2025-06-17 22:02 · 3 replies

              To be fair, the point of preview models and stable releases is so you know what is stable to build on.

              • By Aeolun 2025-06-17 23:49 · 1 reply

                The moment you start charging for preview stuff, I think you make a tacit commitment that the price won't increase by a factor of 4.

                • By woleium 2025-06-18 4:37 · 1 reply

                  that’s a somewhat naïve viewpoint.

                  • By Aeolun 2025-06-18 23:33

                    I think the fact that everyone is like ‘wtf’ now kind of reinforces my viewpoint?

                    Doesn’t mean you can’t do it, but people won’t be happy.

              • By jjani 2025-06-18 6:33

                Gmail was in beta for what, 2 decades? Did you never use it during that time? They've been using these "Preview" models in their non-technical, user-facing Gemini app and product for months now. Like, Google themselves have been using them in production, on their main apps. And gemini-1.5-pro is 2 months from deprecation, and there was no production alternative.

                They told everyone to build their stuff on top of it, and then jacked up the price by 4x. Just pointing to some fine print doesn't change that.

              • By vardump 2025-06-18 15:32

                I'd be more worried about Google just discontinuing another product. For example Stadia was similarly high profile, but it's gone now.

                More examples here: https://killedbygoogle.com/

            • By amazingamazing 2025-06-17 18:57 · 1 reply

              interesting - why wouldn't you use dynamic thinking? and yeah, sucks when the price changes.

              • By dcre 2025-06-17 21:37

                It makes responses much slower with zero benefit for many tasks. Flash with thinking off is very fast.

          • By drag0s 2025-06-17 20:26 · 1 reply

            one example where non-thinking matters would be latency-sensitive workflows, for example voice AI.

            • By jjani 2025-06-17 23:22 · 1 reply

              Correct, though pretty much anything end-user facing is latency-sensitive, voice is a tiny percentage. No one likes waiting, the involvement of an LLM doesn't change this from a user PoV.

              • By eru 2025-06-18 1:21

                I wonder if you can hide the latency, especially for voice?

                What I have in mind is to start the voice response with a non-thinking model, say a sentence or two in a fraction of a second. That will take the voice model a few seconds to read out. In that time, you use a thinking model to start working on the next part of the response?

                In a sense, very similar to how everyone knows to stall in an interview by starting with 'this is a very good question...', and using that time to think some more.
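                A sketch of that overlap (stub coroutines stand in for the real model and text-to-speech calls):

                    import asyncio

                    async def fast_opener(question: str) -> str:
                        # Stand-in for a non-thinking model call (e.g. thinking budget 0).
                        await asyncio.sleep(0.3)
                        return "That's a good question -- let me walk through it."

                    async def thoughtful_answer(question: str) -> str:
                        # Stand-in for a thinking-model call that takes several seconds.
                        await asyncio.sleep(3.0)
                        return "Here is the detailed answer..."

                    async def respond(question: str) -> None:
                        # Start both at once; speaking the cheap opener hides the
                        # thinking model's latency, like stalling in an interview.
                        opener = asyncio.create_task(fast_opener(question))
                        detail = asyncio.create_task(thoughtful_answer(question))
                        print(await opener)  # a voice app would send this to TTS immediately
                        print(await detail)

                    asyncio.run(respond("Why is the sky blue?"))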

        • By drift_code 2025-06-17 18:50 · 2 replies

          They seem to have just rebranded the non-thinking model as Flash-Lite, so it's less expensive than before.

          • By jjani 2025-06-17 18:53 · 1 reply

            Not at all. Non-thinking Flash is... Flash with the thinking budget set to 0 (which you can still run that way, just at 2x input / 4x output pricing). Flash-Lite is far weaker, unusable for the overwhelming majority of the use cases of Flash. A quick glance at the benchmarks reveals this.

            • By rvnx 2025-06-17 19:03

              Yeah, so basically their announcement is "good news, we tripled the price, and will deprecate Gemini Flash 2.0 asap"

          • By mcintyre1994 2025-06-17 18:54

            The OP says Flash-Lite has thinking and non-thinking, so it’s not that simple.

      • By aryehof 2025-06-18 6:28

        > Today we are excited to share updates …

        They are obviously excited about their price increase

      • By pama 2025-06-17 22:04 · 1 reply

        “While we strive to maintain consistent pricing between preview and stable releases to minimize disruption, this is a specific adjustment reflecting Flash’s exceptional value, still offering the best cost-per-intelligence available.”

        • By cadence- 2025-06-18 3:47 · 1 reply

          Anthropic did the same thing with their Haiku model when they released version 3.5. I hate it.

          • By pama 2025-06-18 4:21

            Pricing is a hard problem. Theoretically, if companies occasionally raise prices dramatically once something is useful, they can sometimes create early demand and more testers for future product releases. Of course, they have to be careful to avoid annoying regular users too much. When you sell, the harm is limited to late users, but when you rent, it is harder to figure out the optimal strategy.

      • By shock 2025-06-17 20:45 · 1 reply

        Do you work for google?

    • By irthomasthomas 2025-06-17 17:38 · 6 replies

      "Soon, AI too cheap to meter" "Meantime, price go up".

      • By llm_nerd 2025-06-17 19:37

        Not too long ago Google was a bit of a joke in AI and their offerings were uncompetitive. For a while a lot of their preview/beta models had a price of 0.00. They were literally giving it away for free to try to get people to consider their offerings when building solutions.

        As they've become legitimately competitive they have moved towards the pricing of their competitors.

      • By skybrian 2025-06-17 17:52 · 1 reply

        There are a lot more price drops, though.

        • By cdblades 2025-06-18 13:22

          From prices that were already losing services money.

          If you aren't making a profit, lowering prices is only about trying to capture market share before you're forced to increase prices to remain solvent.

      • By tekno45 2025-06-17 17:51

        "will be too cheap to meter" means we're definitely metering it now.

      • By victorbjorklund 2025-06-17 20:51 · 1 reply

        Just google. They were behind. So they just dumped their prices to get a foot in the door. Now they are popular and can raise it to market prices.

        • By sodality2 2025-06-17 21:11 · 1 reply

          I still don’t think there’s any real stickiness to using a Google model over any other model, with things like openrouter. So maybe for brand recognition alone.

          • By victorbjorklund 2025-06-19 16:56

            Yea, but brands have some stickiness. Maybe not for the absolute nerds, but lots of people just stick to what they're already using. Look at all the people just using ChatGPT because that's what they tried first.

      • By tom_m 2025-06-17 20:31

        No way. AI pricing is going up because people are willing to pay for it.

      • By nicce 2025-06-17 17:52 · 3 replies

        We have likely seen the cheapest prices already. Once we can’t function without them anymore - go as high as you can!

        • By hirako2000 2025-06-17 20:38 · 3 replies

          By then, comparable or even better models will easily run on the edge.

          So if they crank up the prices, we could just switch to local and not be lured by bigger and bigger models, RAG, agentic, MCP-driven tech, as if all of that couldn't run locally either.

          • By gnatolf 2025-06-17 21:41

            I am not as optimistic that locally run models will be able to compete anytime soon. And even if they do, running them means buying compute/gear for a price that is likely equivalent to a lot of 'remote' tokens.

          • By cdblades 2025-06-18 13:22 · 1 reply

            > By then comparable or even better models will easily run on edge.

            What are you basing that on?

            • By croon 2025-06-18 17:06 · 1 reply

              Presumably your goal is to extract some practical value from this and not just higher benchmark numbers. If you can get the functionality you need from last-gen, there's no point in paying for next-gen. YMMV.

              • By hirako2000 2025-06-19 14:10

                Indeed, the premise was that we'd be a step behind when running on the edge. That's already the case.

          • By nicce 2025-06-17 22:09

            The most meaningful models will run in the future on those trillion-dollar data centers that are currently being built.

        • By nico 2025-06-17 17:58 · 2 replies

          Hopefully we get more competition and someone willing to undercut the more expensive options

          • By overfeed 2025-06-17 20:37

            It's more likely the shareholder zeitgeist will soon shift to demanding returns on the ungodly amounts already invested into AI.

          • By nicce 2025-06-17 19:31 · 1 reply

            Entering the market and being competitive gets more difficult all the time. People want the best and fastest models - can you compete with trillion dollar datacenters?

            • By eru 2025-06-18 1:23

              You might be right, but there's plenty of deep pocketed companies who are still very excited to compete in this market.

        • By eru 2025-06-18 1:22

          You know that competition is a thing, don't you?

    • By tonyhart7 2025-06-17 23:03

      I knew they were undercutting on price a lot, because at first launch Gemini's pricing didn't make sense; it was cheaper than the competition (like, a lot cheaper).

      We're finally starting to see the real price.

    • By rudedogg 2025-06-17 17:25 · 2 replies

      A cool 2x+ price increase.

      And Gemini 2.0 Flash was $0.10/$0.40.

      • By __jl__ 2025-06-17 18:42

        1.5 -> 2.0 was a price increase as well (double, I think, and something like 4x for image input)

        Now 2.0 -> 2.5 is another hefty price increase.

      • By jjani 2025-06-17 18:26

        4x price increase over preview output for non-thinking.

    • By slig 2025-06-18 13:36

      FWIW: On OpenRouter, the non `:thinking` 2.5 flash endpoint seems to be returning reasoning tokens now.

    • By dangoodmanUT 2025-06-17 19:48

      Good catch, that's a pretty notable change considering this was about to be the GOAT of audio-to-audio

    • By k8sToGo 2025-06-17 17:28 · 2 replies

      You can also see this difference on OpenRouter.

      But why is there only thinking Flash now?

      • By Tiberium 2025-06-17 17:37 · 1 reply

        It might be a bit confusing, but there's no "only thinking flash" - it's a single model, and you can turn off thinking if you set thinking budget to 0 in the API request. Previously 2.5 Flash Preview was much cheaper with the thinking budget set to 0, now the price is the same. Of course, with thinking enabled the model will still use far more output tokens than the non-thinking mode.
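        For reference, the toggle looks roughly like this in the google-genai Python SDK (a sketch; field names assumed from the current SDK):

            from google.genai import types

            # Same model either way; only the thinking budget differs.
            fast_config = types.GenerateContentConfig(
                thinking_config=types.ThinkingConfig(thinking_budget=0)     # thinking off
            )
            deep_config = types.GenerateContentConfig(
                thinking_config=types.ThinkingConfig(thinking_budget=1024)  # up to 1024 thinking tokens
            )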

        • By davedx 2025-06-18 11:08

          Interesting design choice, and makes me think of "Thinking, Fast and Slow" by Kahneman.

          (I thought of it quickly, not slowly, so the comparison may only be surface deep.)

      • By hnuser123456 2025-06-17 17:36

        Apparently, you can ask 2.5 Flash not to use thinking, but it will still sometimes do it anyway; this has been an issue for months and hasn't been fixed by model updates: https://github.com/google-gemini/cookbook/issues/722

  • By varun_chopra 2025-06-17 16:44 · 12 replies

    At one point, when they made Gemini Pro free on AI Studio, Gemini was the model of choice for many people, I believe.

    Somehow it's gotten worse since then, and I'm back to using Claude for serious work.

    Gemini is like that guy who keeps talking but has no idea what he's actually talking about.

    I still use Gemini for brainstorming, though I take its suggestions with several grains of salt. It's also useful for generating prompts that I can then refine and use with Claude.

    • By therealmarv 2025-06-17 17:25 · 3 replies

      Not according to the Aider leaderboard: https://aider.chat/docs/leaderboards/

      I use only the APIs directly with Aider (so no experience with AI Studio).

      My feeling with Claude is that it still performs well with weak prompts; the "taste" is maybe a little better when the direction is kinda unknown to the prompter.

      When the direction is known, I see Gemini 2.5 Pro (with thinking) ahead of Claude, with code that does not break. And with o4-mini and o3 I see more "smart" thinking (as if there is a little bit of brain inside these models) at the expense of producing unstable code (Gemini produces more stable code).

      I see problems with Claude when complexity increases and I would put it behind Gemini and o3 in my personal ranking.

      So far I had no reason to go back to Claude since o3-mini was released.

      • By stavros 2025-06-17 17:55 · 3 replies

        I just spent $35 for Opus to solve a problem with a hardware side-project (I'm turning an old rotary phone into a meeting handset so I can quit meetings by hanging up, if you must know). It didn't solve the problem, it churned and churned and spent a ton of money.

        I was much more satisfied with o3 and Aider, I haven't tried them on this specific problem but I did quite a bit of work on the same project with them last night. I think I'm being a bit unfair, because what Claude got stuck on seems to be a hard problem, but I don't like how they'll happily consume all my money trying the same things over and over, and never say "yeah I give up".

      • By macNchz 2025-06-17 18:16 · 4 replies

        Using all of the popular coding models pretty extensively over the past year, I've been having great success with Gemini 2.5 Pro as far as getting working code the first time, instruction following around architectural decisions, and staying on-task. I use Aider and write mostly Python, JS, and shell scripts. I've spent hundreds of dollars on the Claude API over time but have switched almost entirely to Gemini. The API itself is also much more reliable.

        My only complaint about 2.5 Pro is around the inane comments it leaves in the code (// Deleted varName here).

        • By ZeWaka 2025-06-17 18:21 · 3 replies

          If you use one of the AI static instructions methods (e.g., .github/copilot-instructions.md) and tell it to not leave the useless comments, that seems to solve the issue.
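          For instance, a couple of lines like these in the instructions file tend to help (the wording is just an illustration):

              # .github/copilot-instructions.md (or an Aider conventions file)
              - Do not add comments that narrate edits (e.g. "// Deleted varName here").
              - Only comment code whose intent isn't clear from names and structure.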

          • By macNchz 2025-06-17 18:38 · 1 reply

            I've been intending to try some side by side tests with and without a conventions file instructing it not to leave stupid comments—I'm curious to see if somehow they're providing value to the model, e.g. in multi-turn edits.

            • By luckydata 2025-06-17 18:57

              it's easier to just make it do a code review with focus on removing unhelpful comments instead of asking it not to do it the first time. I do the cleanup after major rounds of work and that strategy seems to work best for me.

          • By jjani 2025-06-17 18:46

            This was not my experience with the earlier preview (03), where its insistence on comment spam was too strong to overcome. Wonder if this adherence improved in the 05 or 06 updates.

          • By sans_souse 2025-06-17 22:32

            can you elaborate on this?

        • By dominicrose 2025-06-18 12:36 · 1 reply

          I don't mind the comments, I read them while removing them. It's normal to have to adapt the output, change some variable names, refactor a bit. What's impressive is that the output code actually works (or almost). I didn't give it the hardest of problems to solve/code but certainly not easy ones.

          • By macNchz 2025-06-18 22:29

            Yeah I've mostly just embraced having to remove them as part of a code review, helps focus the review process a bit, really.

        • By avereveard 2025-06-17 22:47

          I'm using Pro for backend and Claude for UX work; Claude is so much more thoughtful about how users interact with software, and it can usually do a better job of replicating the mockups that the GPT-4o image generator produces while not being overly fixated on the mockup design itself.

          My complaint is that it catches Python exceptions and doesn't log them by default.

        • By miki123211 2025-06-18 9:55

          And the error handling. God, does it love to insert random try/except statements everywhere.
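          The pattern in question is the silent catch-all; an illustrative pair:

              import logging

              def parse_port(raw: str) -> int:
                  try:
                      return int(raw)
                  except Exception:  # the complained-about style: swallow everything
                      return 0       # silent fallback hides the bug

              def parse_port_better(raw: str) -> int:
                  try:
                      return int(raw)
                  except ValueError:  # narrow except, and the failure is recorded
                      logging.exception("invalid port value: %r", raw)
                      raise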

      • By hirako2000 2025-06-17 20:50

        Your feelings of a little brain in there and of stable code are unfounded. All these models collapse pretty fast, if not due to context limits then in their inability to interpret problems.

        An LLM is just statistical regression with a plethora of engineering tricks, mostly NLP, to produce an illusion.

        I don't mean it's useless. I mean comparing these ever-evolving models is like comparing escort staff in NYC vs those in L.A.: hard to reach any conclusion. We are getting fooled.

        On the price increase: it seems Google was aggressively chasing adoption, and Gemini was for a short window the best value for money of all the LLMs out there. Adoption likely surged, scaling needs became astronomical, and it cost Google billions to keep up. The price adjustment could have been expected before they announced it.

    • By unshavedyak 2025-06-17 17:21 · 1 reply

      Yea, I had similar experiences. At first it felt like it solved complex problems really well, but then I realized I was having trouble steering it for simple things. It was also very verbose.

      Overall though my primary concern is the UX, and Claude Code is the UX of choice for me currently.

    • By willseth 2025-06-17 17:29 · 1 reply

      Same experience here. I even built a Gem with an elaborate prompt instructing it how to be concise, but it still gives annoyingly long-winded responses and frequently expands the scope of its answer far beyond the prompt.

      • By theturtletalks 2025-06-17 17:49 · 1 reply

        I feel like this is part of the AI playbook now. Launch a really strong, capable model (with expensive inference), and once users think it's SOTA, neuter it so inference is cheaper, betting most users won't notice.

        The same happened with GPT-3.5. It was so good early on and got worse as OpenAI began to cut costs. I feel like when GPT-4.1 was cloaked as Optimus on OpenRouter it was really good, but once it launched, it also got worse.

        • By carlos22 2025-06-17 18:08 · 2 replies

          That's been capitalism's playbook all along. It's just much faster because it's just software. But they do it for everything, all the time.

          • By theturtletalks 2025-06-17 18:32 · 1 reply

            I disagree with the comparison between LLM behavior and traditional software getting worse. When regular software declines in quality, it’s usually noticeable through UI changes, release notes, or other signals. Companies often don’t bother hiding it, since their users are typically locked into their ecosystem.

            LLMs, on the other hand, operate under different incentives. It’s in a company’s best interest to initially release the strongest model, top the benchmarks, and then quietly degrade performance over time. Unlike traditional software, LLMs have low switching costs, users can easily jump to a better alternative. That makes it more tempting for companies to conceal model downgrades to prevent user churn.

            • By jjani 2025-06-17 18:49 · 1 reply

              > When regular software declines in quality, it’s usually noticeable through UI changes, release notes, or other signals.

              Counterexample: 99% of average Joes have no idea how incredibly enshittified Google Maps has become, to just name one app. These companies intentionally boil the frog very slowly, and most people are incredibly bad at noticing gradual changes (see global warming).

              Sure, they could know by comparing, but you could also know whether models are changing behind the scenes by having sets of evals.

              • By theturtletalks 2025-06-17 19:02

                This is where switching costs matter. Take Google Maps, many people can’t switch to another app. In some areas, it’s the only app with accurate data, so Google can degrade the experience without losing users.

                We can tell it’s getting worse because of UI changes, slower load times, and more ads. The signs are visible.

                With LLMs, it’s different. There are no clear cues when quality drops. If responses seem off, users often blame their own prompts. That makes it easier for companies to quietly lower performance.

                That said, many of us on HN use LLMs mainly for coding, so we can tell when things get worse.

                Both cases involve the “boiling frog” effect, but with LLMs, users can easily jump to another pot. With traditional software, switching is much harder.

          • By andybak 2025-06-17 18:25

            Do you mind explaining how you see this working as a nefarious plot? I don't see an upside in this case so I'm going with the old "never ascribe to malice" etc

    • By jasonjmcghee 2025-06-17 18:48

      I have no inside information, but it feels like they quantized it. I've seen patterns that I usually only see in quantized models, like getting stuck repeating a single character indefinitely.

    • By noisy_boy 2025-06-17 20:00 · 2 replies

      They should just roll back to the preview versions. Those were so much more even-keeled and actually gave some useful pushback, instead of this cheerleader-on-steroids version they GA'd.

      • By samvher 2025-06-18 15:01

        Yes, I was very surprised, after the whole "scandal" around ChatGPT becoming too sycophantic, that there was this massive change in tone from the last preview model (05-06) to the 06-05/GA model. The tone is really off-putting. I really liked how the preview versions felt like intelligent conversation partners, and I recognize what you're saying about useful pushback; it was my favorite set of models (the few preview iterations before this one) and I'm sad to see them disappearing.

        Many people on the Google AI Developer forums have also noted either bugs or just performance regression in the final model.

      • By k8sToGo 2025-06-17 20:09 · 1 reply

        But they claim it's the same model and version?

        • By noisy_boy 2025-06-18 0:43

          I don't know, but it sure doesn't feel the same. I have been using Gemini 2.5 Pro (preview and now GA) for a while. The difference in tone is palpable. I also noticed that the preview took longer and the GA is faster, so it could be quantization.

          Maybe a bunch of people with authority to decide thought that it was too slow/expensive/boring and screwed up a nice thing.

    • By huevosabio 2025-06-17 17:24

      They made it talk like BuzzFeed articles for every single interaction. It's absolutely horrible.

    • By FirmwareBurner 2025-06-17 17:18

      I now find Gemini terrible for coding. I gave it my code blocks and told it what to change, and it added tonnes and tonnes of needless extra code plus endless comments. It turned tight code into a papyrus.

      ChatGPT is better but tends to be too agreeable, never trying to disagree with what you say even if it's stupid so you end up shooting yourself in the foot.

      Claude seems like the best compromise.

      Just my two kopecks.

    • By UncleOxidant 2025-06-17 16:58 · 1 reply

      I used to be able to use Gemini Pro free in Cline. Now the API limits are so low that you immediately get messages about needing to top up your wallet, and API queries just don't go through. Back to using DeepSeek R1 free in Cline (though even that eventually stops after a few hours, and you have to wait until the next day for it to work again). It's starting to look like I need to set up a local LLM for coding, which means it's time to seriously upgrade my PC (well, it's been about 10 years, so it was getting to be time anyway).

      • By Workaccount2 2025-06-17 17:55

        By the time you break even on whatever you spend on a decent LLM-capable build, your hardware will be too far behind to run whatever is best locally then. It feels cheaper, but at the pace of things, unless you are churning an insane number of tokens, it probably doesn't make sense. Never mind that local models running on 24 or 48GB are maybe around Flash-Lite in ability while being slower than SOTA models.

        Local models are mostly for hobby and privacy, not really efficiency.
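        Rough breakeven math (all numbers are illustrative assumptions, not quotes):

            rig_cost = 2500.0        # assumed: workstation with 48GB of VRAM
            api_out_per_mtok = 2.50  # assumed: GA Flash output price, $/1M tokens

            breakeven_mtok = rig_cost / api_out_per_mtok  # 1,000M tokens of output
            daily_mtok = 1.0                              # a heavy 1M-tokens/day habit
            print(breakeven_mtok / daily_mtok / 365, "years to break even")  # ~2.7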

    • By chrismustcode 2025-06-17 18:05

      When I ask it to do something in Cursor, it goes full Sherlock, thinking about every possible outcome.

      Claude 4 Sonnet with thinking just has a bit of a think, then does it.

    • By DangitBobby 2025-06-18 4:30

      Same for me. I've been using Gemini 2.5 Pro for the past week or so because people said Gemini is the best for coding! Not at all my experience with Gemini 2.5 Pro; on top of being slow and flaky, the responses are kind of bad. Claude Sonnet 4 is much better IMO.

    • By r0fl 2025-06-18 12:08

      The context window on AI Studio feels endless.

      All the other AIs seem to give me errors when working with large bodies of code.

    • By dr_kiszonka 2025-06-17 17:31 · 4 replies

      They nerfed Pro 2.5 significantly in the last few months. Early this year, I had genuinely insightful conversations with Gemini 2.5 Pro. Now they are mostly frustrating.

      I also have a personal conspiracy theory, i.e., that once a user exceeds a certain use threshold of 2.5 Pro in the Google Gemini app, they start serving a quantized version. Of course, I have no proof, but it certainly feels that way.

      • By conradkay 2025-06-17 18:47 · 1 reply

        Maybe they've been focusing so much on improving coding performance with RL for the new versions/previews that other areas degraded in performance.

        • By dr_kiszonka 2025-06-17 19:45

          I think you are right and this is probably the case.

          Although, given that I rapidly went from +4 to 0 karma, a few other comments in this topic are grey, and at least one is missing, I am getting suspicious. (Or maybe it is just lunch time in MTV.)

      • By SirensOfTitan 2025-06-18 13:14

        There was a significant nerf of Gemini 3-25 a little while ago, so much so that I detected it without knowing there was even a new release.

        Totally convinced they quantized the model quietly and improved on the coding benchmark to hide that fact.

        I’m frankly quite tired of LLM providers changing the model I’m paying for access to behind the scenes, often without informing me, and in Gemini’s case on the API too—at least last time I checked they updated the 3-25 checkpoint to the May update.

      • By cma 2025-06-18 12:18

        One of the early updates improved agentic coding scores while lowering other general benchmark scores, which may have impacted those kinds of conversations.

      • By esafak 2025-06-17 18:44

        I wonder how smart they are about quantizing. Do they look at feedback to decide which users won't mind?

  • By lvl155 2025-06-17 17:52 · 3 replies

    I am very impressed with Gemini and stopped using OpenAI. Sometimes, I ping all three major models on OpenRouter but 90% is on Gemini now. Compare that to 90% ChatGPT last year.

    • By codingwagie 2025-06-17 19:33 · 1 reply

      I love to hate on Google, but yeah, their models are really good. The larger context window is huge.

      • By kapildev 2025-06-17 21:21

        Doesn't OpenAI's GPT-4.1 also have a 1 million token context length?

    • By aatd86 2025-06-17 19:25 · 1 reply

      Same. For now I have canceled my Claude subscription. Gemini has been catching up.

      • By glohbalrob 2025-06-17 20:27

        Me too. I still pay for OpenAI; I use GPT-4 for Excel work, and it's super fast and able to do more Excel-related work, like combining files, which comes up often for projects I work on.

    • By voiper1 2025-06-18 8:51 · 1 reply

      I don't like the thinking time, but for coding, journaling, and other stuff I've often been impressed with Gemini Pro 2.5 out of the box.

      Possibly I could do much more prompt fine-tuning to nudge openai/anthropic in the direction I want, but with the same prompts Gemini often gives me answers/structure/tone I like much better.

      Example: I had Claude 3.7 generating embedded images and captions along with responses. The same prompt in Gemini gave much more varied and flavorful pictures.

      • By deanstag 2025-06-18 14:00

        Curious. How do you use gemini for journaling? What is your workflow?

HackerNews