
Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model. Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.
Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI.
Priced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models. It outperforms 2.5 Flash, with 2.5x faster time to first answer token and a 45% increase in output speed according to the Artificial Analysis benchmark, while maintaining similar or better quality. This low latency matters for high-frequency workflows, making it an ideal model for developers building responsive, real-time experiences.
Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half the tokens of 2.5 Flash-Lite (reasoning).
That will likely bring the cost below 2.5 Flash-Lite for many tasks (it depends on the ratio of input to output tokens).
That said, AA also reports that 3.1 FL was 20% more expensive to run for their complete Intelligence index benchmark.
The overall point is that cost is extremely task-dependent, and measuring token price alone doesn't work: reasoning can burn enormous numbers of tokens, reasoning-token usage varies by both task and model, and input/output ratios likewise vary by task.
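The point above can be made concrete with a little arithmetic. The prices below are the published per-1M-token rates for the two Lite models; the token counts are hypothetical assumptions chosen only to show how the comparison flips depending on how many output/reasoning tokens each model actually burns.

```python
# Sketch: per-task cost depends on the input/output token mix and on how
# many reasoning tokens each model burns, not on the $/token rate alone.
# Prices are the published $/1M-token rates; token counts are hypothetical.

def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request; prices are $ per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

FLASH_LITE_25 = (0.10, 0.40)   # 2.5 Flash-Lite: $/1M input, $/1M output
FLASH_LITE_31 = (0.25, 1.50)   # 3.1 Flash-Lite

# Scenario A: identical token usage -> 3.1 is strictly more expensive.
a_old = task_cost(2_000, 8_000, *FLASH_LITE_25)
a_new = task_cost(2_000, 8_000, *FLASH_LITE_31)

# Scenario B: if 3.1's reasoning cuts output tokens to a fifth (assumed),
# the per-task cost can drop below 2.5 Flash-Lite despite the higher rates.
b_new = task_cost(2_000, 1_600, *FLASH_LITE_31)

print(f"A: 2.5 FL ${a_old:.4f} vs 3.1 FL ${a_new:.4f}")
print(f"B: 2.5 FL ${a_old:.4f} vs 3.1 FL ${b_new:.4f}")
```

The crossover point moves with the input/output ratio, which is exactly why a single $/token figure can't settle the "is it cheaper?" question.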
> 3.1 Flash-Lite (reasoning)
(reasoning) doesn't say much. Is it low/med/high reasoning? I ran my own benchmarks, and 3.1 Flash-Lite on high costs A LOT: https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...
Do not use 3.1 Flash-Lite with HIGH reasoning: it reasons for almost the maximum output size, so you can quickly burn millions of reasoning tokens in just a few requests.
Wow, that’s very interesting. I wish more benchmarks were reported along with the total cost of running that benchmark. Dollars per token is kind of useless for the reasons you mentioned.
Yup, MiniMax M-2.5 is a standout in that respect. Its $/token is very low, but it reasons forever (fun fact: that's also why it's #1 on OpenRouter, since the ranking is based on token usage and it simply burns through tokens)...
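The "cheap rate, expensive run" effect is easy to show with made-up numbers. Everything below is hypothetical and only illustrates the mechanism: a model with a much lower $/token rate can still produce a bigger total bill if it burns enough reasoning tokens per run.

```python
# Illustrative only: why a low $/token rate can still mean a high bill.
# All token counts and rates here are hypothetical.

def run_cost(tokens_used, price_per_m):
    """Total dollar cost of a run, given a $/1M-token rate."""
    return tokens_used * price_per_m / 1_000_000

# "Cheap" model reasons forever; "pricey" model answers tersely.
cheap = run_cost(50_000_000, 0.30)   # 50M tokens at $0.30/1M
pricey = run_cost(5_000_000, 1.50)   # 5M tokens at $1.50/1M

print(cheap, pricey)  # the 5x-cheaper rate costs 2x more overall
```

This is why reporting a benchmark's total dollar cost, as the comment above asks for, is more informative than the headline rate.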
many tasks don't need any reasoning
An unfortunate, significant price increase for a 'lite' model: $0.25 in / $1.50 out vs. Gemini 2.5 Flash-Lite's $0.10 in / $0.40 out.
For the last two years, startup wisdom has been that models will keep getting cheaper and better. Claude first, and now Gemini, have shown that's not the case.
We priced an enterprise contract using Flash 1.5 pricing last summer; today that contract would be unit-economics negative if we used Flash 3. Flash 2.5, and now Flash 3.1 Lite, barely break even.
I predict open-source models and fine-tuning are going to make a real comeback this year for economic reasons.
Opus 4.5 became significantly cheaper than Opus 4.1
Not true. You should measure cost by the amount of money spent per task. I would argue this Lite version is equivalent to the older Flash.
Yeah, but there is a whole world of tasks for which Flash 2.5-Lite was sufficiently intelligent. Given Google's deprecation policy, there will soon be no way to get that intelligence at that price.
I hope they release models at every intelligence tier, although thinking effort can be a good alternative.
> We priced an enterprise contract using Flash 1.5 pricing last summer,
Interesting. Flash 1.5 was already a year old at that point.
I mean the same level of intelligence does get cheaper. People just care about being on the frontier. But if you track a single level of intelligence the price just drops and drops.
What's the cheaper alternative from Gemini for Flash-2.5-lite level intelligence when it gets deprecated on 22nd July 2026?