
Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model. Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.
Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI.
Priced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models. It outperforms 2.5 Flash, with 2.5x faster time to first answer token and a 45% increase in output speed according to the Artificial Analysis benchmark, while maintaining similar or better quality. This low latency matters for high-frequency workflows, making it an ideal model for developers building responsive, real-time experiences.
Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half the tokens of 2.5 Flash-Lite (reasoning).
That will likely bring the cost below 2.5 Flash-Lite for many tasks (it depends on the ratio of input to output tokens).
That said, AA also reports that 3.1 FL was 20% more expensive to run for their complete Intelligence index benchmark.
The overall point is that cost is extremely task-dependent, and measuring token price alone doesn't work: reasoning can burn enormous numbers of tokens, reasoning-token usage varies by both task and model, and input/output ratios likewise vary by task.
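The point above can be made concrete with a little arithmetic. The prices below are the published per-1M-token rates for the two Lite models; the token counts are hypothetical assumptions chosen only to show how the comparison flips depending on how many output/reasoning tokens each model actually burns.

```python
# Sketch: per-task cost depends on the input/output token mix and on how
# many reasoning tokens each model burns, not on the $/token rate alone.
# Prices are the published $/1M-token rates; token counts are hypothetical.

def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request; prices are $ per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

FLASH_LITE_25 = (0.10, 0.40)   # 2.5 Flash-Lite: $/1M input, $/1M output
FLASH_LITE_31 = (0.25, 1.50)   # 3.1 Flash-Lite

# Scenario A: identical token usage -> 3.1 is strictly more expensive.
a_old = task_cost(2_000, 8_000, *FLASH_LITE_25)
a_new = task_cost(2_000, 8_000, *FLASH_LITE_31)

# Scenario B: if 3.1's reasoning cuts output tokens to a fifth (assumed),
# the per-task cost can drop below 2.5 Flash-Lite despite the higher rates.
b_new = task_cost(2_000, 1_600, *FLASH_LITE_31)

print(f"A: 2.5 FL ${a_old:.4f} vs 3.1 FL ${a_new:.4f}")
print(f"B: 2.5 FL ${a_old:.4f} vs 3.1 FL ${b_new:.4f}")
```

The crossover point moves with the input/output ratio, which is exactly why a single $/token figure can't settle the "is it cheaper?" question.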
> 3.1 Flash-Lite (reasoning)
(reasoning) doesn't say much. Is it low/med/high reasoning? I ran my own benchmarks, and 3.1 Flash-Lite on high costs A LOT: https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...
Do not use 3.1 Flash-Lite with HIGH reasoning: it reasons for almost the maximum output size, so you can quickly burn millions of reasoning tokens in just a few requests.
Wow, that’s very interesting. I wish more benchmarks were reported along with the total cost of running that benchmark. Dollars per token is kind of useless for the reasons you mentioned.
Yup, MiniMax M-2.5 is a standout in that respect. Its $/token is very low, but it reasons forever (fun fact: that's also why it's #1 on OpenRouter, since the ranking is based on token usage and it simply burns through tokens)...
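The "cheap rate, expensive run" effect is easy to show with made-up numbers. Everything below is hypothetical and only illustrates the mechanism: a model with a much lower $/token rate can still produce a bigger total bill if it burns enough reasoning tokens per run.

```python
# Illustrative only: why a low $/token rate can still mean a high bill.
# All token counts and rates here are hypothetical.

def run_cost(tokens_used, price_per_m):
    """Total dollar cost of a run, given a $/1M-token rate."""
    return tokens_used * price_per_m / 1_000_000

# "Cheap" model reasons forever; "pricey" model answers tersely.
cheap = run_cost(50_000_000, 0.30)   # 50M tokens at $0.30/1M
pricey = run_cost(5_000_000, 1.50)   # 5M tokens at $1.50/1M

print(cheap, pricey)  # the 5x-cheaper rate costs 2x more overall
```

This is why reporting a benchmark's total dollar cost, as the comment above asks for, is more informative than the headline rate.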
many tasks don't need any reasoning
An unfortunate, significant price increase for a 'lite' model: $0.25 in / $1.50 out vs. Gemini 2.5 Flash-Lite's $0.10 in / $0.40 out.
For the last two years, startup wisdom has been that models will keep getting cheaper and better. Claude first, and now Gemini, have shown that's not the case.
We priced an enterprise contract using Flash 1.5 pricing last summer; today that contract would be unit-economics negative if we used Flash 3. Flash 2.5, and now Flash 3.1 Lite, barely break even.
I predict open-source models and fine-tuning are going to make a real comeback this year for economic reasons.
Opus 4.5 became significantly cheaper than Opus 4.1
Not true. You should measure cost by the amount of money spent per task. I would argue this Lite version is equivalent to the older Flash.
Yeah, but there is a whole world of tasks for which Flash 2.5-Lite was sufficiently intelligent. Given Google's deprecation policy, there will soon be no way to get that intelligence at that price.
I hope they release models at every intelligence tier, although thinking effort can be a good alternative.
> We priced an enterprise contract using Flash 1.5 pricing last summer,
Interesting. Flash 1.5 was already a year old at that point.
I mean the same level of intelligence does get cheaper. People just care about being on the frontier. But if you track a single level of intelligence the price just drops and drops.
What's the cheaper alternative from Gemini for Flash-2.5-lite level intelligence when it gets deprecated on 22nd July 2026?