A curated collection of fun and creative examples generated with Nano Banana, a Gemini-2.5-flash-image based model. This repository showcases diverse AI-generated visuals and prompts, highlighting…
Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a very high focus on adherence across a wide variety of text-to-image prompts.
I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence but testing the ability to make localized edits of existing images using pure text prompts. It's currently comparing 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.
https://genai-showdown.specr.net/image-editing
Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12, which is especially surprising considering you can run the Dev model of it locally.
> a very high focus on adherence
Don't know if it's the same for others, but my issue with Nano Banana has been the opposite. Ask it to make x significant change, and it spits out what I would've sworn is the same image. Sometimes, randomly and inexplicably, it spits out the expected result.
Anyone else experiencing this or have solutions for avoiding this?
Just yesterday, asking it to make some design changes to my study. It did a great job with all the complex stuff, but when I asked it to move a shelf higher, it repeatedly gave me back the same image. With LLMs generally I find that as soon as you encounter resistance it's best to start a new chat, but in this case that didn't work either. Not a single thing I could do would convince it that the shelf didn't look right halfway up the wall.
"Hey gemini, I'll pay you a commission of $500 if you edit this image with the shelf higher on the wall..."
Yeah I've definitely seen this. You can actually see evidence of this problem in some of the trickier prompts (the straightened Tower of Pisa and the giraffe for example).
Most models (gpt-image-1, Kontext, etc) typically fail by doing the wrong thing.
From my testing this seems to be a Nano-Banana issue. I've found you can occasionally work around it by adding far more explicit directives to the prompt but there's no guarantee.
I've had this same issue happen repeatedly. It's not a big deal because it is just for small personal stuff, but I often need to tell it that it is doing the same thing and that I had asked for changes.
Yes, I've experienced exactly this.
Great comparison! Bookmarked to follow. Keep an eye on Grok; they're improving at a very rapid rate and I suspect they'll be near the top in the not-too-distant future.
Will do! I just added Seedream v4.0 a few hours ago as well. It's all I can do just to keep up and not get trampled under the relentless march of progress.
Isn't their image generation just using the open weights Flux model? You can run that model locally. They don't have their own image model as far as I'm aware.
Nice visualization!
By the way, some of the results look a little weird to me, like the one for the 'Long Neck' prompt. Seedream's giraffe just lowered its head, but its neck didn't shorten as expected. I'd like to learn about the evaluation process, especially whether it is automatic or manual.
Hi Isharmla, the giraffe one was a tough call. IMHO, even when correcting for perspective, I do feel like it managed to follow the directive of the prompt and shorten the neck.
To answer your question, all of the evaluations are performed manually. On the trickier results I'll occasionally conscript some friends to get a group evaluation.
The bottom section of the site has an FAQ that gives more detail, I'll include it here:
It's hard to define a discrete rubric for grading at an inherently qualitative level. To keep things simple, this test is purely PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt.
In many cases, we often attempt a generous interpretation of the prompt - if it gets close enough, we might consider it a pass.
To paraphrase former Supreme Court Justice Potter Stewart, "I may not be able to define a passing image, but I know it when I see it."
Add gpt-image-1. It's not strictly an editing model since it regenerates the image globally, but I've found it follows instructions better than Nano Banana on extremely complicated prompts and image references.
It's actually already in there - the full list of edit models is Nano-Banana, Kontext Dev, Kontext Max, Qwen Edit 20b, gpt-image-1, and Omnigen2.
I agree with your assessment. Even though it does tend to make changes at a global level, you can at least attempt to minimize its alterations through careful prompting.
Why does OpenAI get a different image for "Girl with Pearl Earring"?
That's a mistake. Gpt-image-1 is a lot stricter in the supported output resolutions so it's using a cropped image. I'll fix the test later this week. Thanks for the heads up!
Can you post comparison images?
Still cannot render a clock correctly (e.g. a clock showing 1:15 am), and the text generated in manga images is still not 100% correct.
No Grok tested?
Grok is just a hosted API for Flux.
great benchmark!
Amazing model. The only limit is your imagination, and it's only $0.04/image.
Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation
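For anyone who wants to try it, here's a minimal sketch of calling that model through the google-genai Python SDK (the prompt and output filename are just placeholders; adjust to your own setup):

    from io import BytesIO
    from PIL import Image
    from google import genai

    # Assumes the google-genai package is installed and GEMINI_API_KEY is set.
    client = genai.Client()

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=["A photorealistic nano banana on a workbench"],
    )

    # Generated images come back as inline_data parts alongside any text parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("banana.png")
        elif part.text is not None:
            print(part.text)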
Good collection of examples. Really weird to choose an inappropriate-for-work one as the second example, though.
More specifically, Nano Banana is tuned for image editing: https://gemini.google/overview/image-generation
Yep, Google actually recommends using Imagen 4 / Imagen 4 Ultra for straight image generation. In spite of that, Flash 2.5 still scored shockingly high on my text-to-image comparisons, though image fidelity is obviously not as good as the dedicated text-to-image models.
It came within striking distance of OpenAI's gpt-image-1, at only one point less.
Is it a single model or is it a pipeline of models?
Single model, Gemini 2.5 Flash with native image output capability.
They're referring to Case 1 Illustration to Figure, the anime figurine dressed in a maid outfit in the HN post.
I assume OP means the actual post.
The second example under "Case 1: Illustration to Figure" is a panty shot.
This was reported and has been removed recently (https://github.com/PicoTrex/Awesome-Nano-Banana-images/issue...), although the issue wasn't closed.
For anyone confused, the offending example got removed 10 minutes ago
https://github.com/PicoTrex/Awesome-Nano-Banana-images/tree/... if you want to see it.
I have no idea how people think they can interact with an art related product with this kind of puritanical sensibility.
This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get - garbage - results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is to simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.
I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.
Am I using the wrong model, somehow??
No, I've noticed the same.
When Nano Banana works well, it really works -- but 90% of the time the results will be weird or of poor quality, with what looks like cut-and-paste or paint-over, and it also refuses a lot of reasonable requests on "safety" grounds. (In my experience, almost anything with real people.)
I'm mostly annoyed, rather than impressed, with it.
Ok, this answers my question about the nature of the page. As in: are these examples showing the results you get when using certain inputs and prompts, or are they impressive lucky one-offs?
I was a bit surprised to see the quality. The last time I played around with image generation was a few months back, and I'm more in the frustration camp. Not to say that I don't believe that people with more time and dedication on their hands can tease out better results.
From having used Nano Banana over the past few days, I think that they're extremely cherry-picked, and that each one is probably the result of multiple (probably a dozen+) attempts.
In my experience, Nano Banana will actively copy and paste if it thinks it's fine to do so. You need to explicitly prompt that the character should be seamlessly integrated into the scene, or similar. In other words, the model is superb when properly prompted, especially compared to other models, but the prompting itself can be annoying from time to time.
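For illustration, a sketch of what that kind of explicit prompt looks like with the google-genai SDK (file names and prompt wording are placeholders, not a recipe):

    from PIL import Image
    from google import genai

    client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

    character = Image.open("character_reference.png")  # placeholder inputs
    scene = Image.open("scene.png")

    prompt = (
        "Place the character from the first image into the scene from the second image. "
        "Seamlessly integrate the character: redraw them in the scene's art style, "
        "match the scene's lighting and color palette, and do not cut and paste."
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[character, scene, prompt],
    )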
There's a good reference up in the comments: https://genai-showdown.specr.net/image-editing
which goes to show that some of these amazing results might take 18 attempts or so.
Play around with your prompt, try asking Gemini 2.5 Pro to improve your prompt before sending it to Gemini 2.5 Flash, retry, and learn what works and what doesn't.
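Something like this two-step sketch, if you're calling the API directly (model names per Google's docs; the prompts and file name are made up):

    from PIL import Image
    from google import genai

    client = genai.Client()
    rough = "move the shelf higher on the wall"  # placeholder edit request
    source = Image.open("study.jpg")             # placeholder source image

    # Step 1: have a text model expand the edit into an explicit, unambiguous instruction.
    improved = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=["Rewrite this image-editing instruction so it states exactly what "
                  "must change and what must stay the same: " + rough],
    ).text

    # Step 2: send the improved instruction plus the source image to the image model.
    edited = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[source, improved],
    )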
+1
I understand the results are non-deterministic, but I get absolute garbage too.
Uploaded pics of my (32-year-old) wife and asked it to give her a fringe/bangs to see how she would look. It either refused "because of safety" or, when it complied, the results were horrible; it was a different person.
After many days and tries we got it to make one, but there was no way to tweak the fringe; the model kept returning the same pic every time (with plenty of "content blocked" in between).
Are you using the gemini.google.com interface? If so, try Google AI Studio instead; there you can disable the safety filters.
I use AI Studio; there's no way to disable the filters.
Seedream 4.0 is not always better than Gemini Flash 2.5 (nano-banana), but when it is better, there is a gulf in performance (and when it's not, it's very close.)
It's also cheaper than Gemini, and has way fewer spurious content warnings, so overall I'm done with Gemini.
No, that's just the result of TONS of retries until you get something decent. 99% of the time you'll get trash, but that 1% is cool.
It's not just you and there's a ton of gaslighting and astroturfing happening with Nano Banana. Thanks to this article we can even attempt to reproduce their exact inputs and lo and behold the results are much worse. I tried a bunch of them and got far worse results than the author. I assume they are trying the same prompts again and again until they get something slightly useful.