Gemini 2.5 Flash Image

2025-08-26 14:01 · 1093475 · deepmind.google

Bring your imagination to life. Generate detailed images with Gemini, using text and image prompts.

Prompt: A classic, faded photograph capturing a scene from a 1960s recording studio, featuring these two blue characters. They are depicted in the control room, surrounded by the warm glow of vacuum tubes and the complex array of a large-format mixing console. The larger of the two blue figures has a pair of bulky headphones placed slightly askew on its head and gazes peacefully through the soundproof glass at a musician in the live room. The smaller character, perched on a stool, wears a tiny pair of round, 1960s-style glasses and is turned slightly to adjust a knob on a reel-to-reel tape machine. The entire image has the aesthetic of an aged photograph, with a grainy texture, soft focus, and a desaturated, warm color palette.


Read the original article

Comments

  • By fariszr 2025-08-26 15:15 (19 replies)

    This is the GPT-4 moment for image editing models. Nano Banana, aka Gemini 2.5 Flash, is insanely good. It made a 171 Elo point jump on LMArena!

    Just search nano banana on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111

    • By qingcharles 2025-08-26 16:53 (5 replies)

      I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.

      • By spaceman_2020 2025-08-26 17:09 (3 replies)

        If you compare it to the amount of effort required in Photoshop to achieve the same results, it's still a vast improvement

        • By qingcharles 2025-08-26 17:19 (2 replies)

          I work in Photoshop all day, and I 100% agree. Also, I just retried a task that wouldn't work last night on nano-banana and it worked the first time on the released model, so I'm wondering if there were some changes to the released version?

          • By spaceman_2020 2025-08-26 21:05 (1 reply)

            We had an exhibition some time back where I used AI to generate the posters for our product. This is a side project and not something we do seriously, but the results were outstanding - better than what the majority of much bigger exhibitors had.

            It took me a LOT of time to get things right, but if I had gotten an actual studio to make those images, it would have cost thousands of dollars

            • By Bombthecat 2025-08-27 11:32

              Yeah, played around with it; it created an amazing poster for the Starfinder TTRPG (something like D&D) with species who looked really good. Usually stuff like this fails hard, since there isn't much training data of unique fantasy creatures.

              But Flash 2.5? It worked. Crazy stuff.

          • By Bombthecat 2025-08-27 11:26

            How many times did you try? I uploaded a black and white photo and had it colourize it; something like 20 percent of the results were still black and white.

        • By echelon 2025-08-26 18:54 (7 replies)

          Vibe coding might not be real, but vibe graphics design certainly is.

          https://imgur.com/a/internet-DWzJ26B

          Anyone can make images and video now.

          • By cwmoore 2025-08-27 4:11 (3 replies)

            Are those oil derricks, or wind turbines? Who cares! Graphic design is easy now!

          • By lebimas 2025-08-26 20:37 (1 reply)

            What tools did you use to make those videos from the PG image?

            • By echelon 2025-08-26 20:55 (2 replies)

              I used a bunch of models in conjunction:

              - Midjourney (background)

              - Qwen Image (restyle PG)

              - Gemini 2.5 Flash (editing in PG)

              - Gemini 2.5 Flash (adding YC logo)

              - Kling Pro (animation)

              I didn't spend too much time correcting mistakes.

              I used a desktop model aggregation and canvas tool that I wrote [1] to iterate and structure the work. I'll be open sourcing it soon.

              [1] https://getartcraft.com

              • By kstenerud 2025-08-26 23:47

                The app looks interesting, but I think it needs some documentation. I think I generated something? Maybe? I saw a spinny thing for a while, but then nothing.

                I couldn't get the 3d thing to do much. I had assets in the scene but I couldn't for the life of me figure out how to use the move, rotate or scale tools. And the people just had their arms pointing outward. Are you supposed to pose them somehow? Maybe I'm supposed to ask the AI to pose them?

                Inpainting I couldn't figure out either... It's for drawing things into an existing image (I think?) but it doesn't seem to do anything other than show a spinny thing for a while...

                I didn't test the video tool because I don't have a midjourney account.

              • By unixhero 2025-08-27 7:40 (2 replies)

                What is PG?

          • By spaceman_2020 2025-08-26 21:09 (1 reply)

            Midjourney with style references is just about the easiest way right now for an absolute noob to get good aesthetics

            • By bacchusracine 2025-08-28 12:44

              This post may or may not violate our community standards so we aren't going to display it.

          • By throwaway638637 2025-08-27 1:14

            What is up with that T rex's arms?

          • By benreesman 2025-08-27 14:02

            I think much like coding, the top of the game is all the old stuff and a bunch of new stuff that is impossible to master without some real math or at least outlier mathematical intuition.

            The old top of the game is available to more people (though mid level people trying to level up now face a headwind in a further decoupling of easily read signals and true taste, making the old way of developing good taste harder).

            This stuff takes people who were already "master rate" and are also, at minimum, nontrivially sophisticated machine learning hobbyists, drives their peak and frontier out, and drives break-even collaboration overhead down.

            It's always been possible to DIY code or graphic design, it's always been possible to tell the efforts of dabblers and pros apart, and unlike many commodities? There is rarely a "good enough". In software this is because compute is finite and getting more out of it pays huge, uneven returns; in graphic design it's because extreme quality work is both aesthetically pleasing as well as a mark of quality (imperfect, but a statement that someone will commit resources).

            And it's just hard to see it being different in any field. Lawyers? Opposing counsel has the best AI, your lawyer better have it too. Doctors? No amount of health is "enough" (in general).

            I really think HN in particular but to some extent all CNBC-adjacent news (CEO OnlyFans stuff of all categories) completely misses the forest (the gap between intermediate and advanced just skyrocketed) for the trees (space-filling commodity knowledge work just plummeted in price).

            But "commodity knowledge work" was always kind of an oxymoron, David Graeber called such work "bullshit jobs". You kinda need it to run a massive deficit in an over-the-hill neoliberal society, it's part of the " shift from production to consumption" shell game. But it's a very recent, very brief thing that's already looking more than wobbly. Outside of that? Apprentices, journeymen, masters is the model that built the world.

            AI enables a new even more extreme form of mastery, blurs the line between journeyman and dabbler, and makes taking on apprentices a much longer-term investment (one of many reasons the PRC seems poised to enjoy a brief hegemony before demographics do in the Middle Kingdom for good, in China, all the GPUs run Opus, none run GPT-5 or LLaMA Behemoth).

            The thing I really don't get is why CEOs are so excited about this and I really begin to suspect they haven't as a group thought it through (Zuckerberg maybe has, he's offering Tulloch a billion): the kind of CEO that manages a big pile of "bullshit jobs"?

            AI can do most of their job today. Claude Opus 4.1? It sounds like a mid-range CEO who was exhaustively researched and is gaffe-immune. Ditto career machine politicians. AI non-practitioner prognosticators. That crowd.

            But the top graphic communications people and CUDA kernel authors? Now they have to master ComfyUI or whatever and the color theory to get anything from it that stands out.

            This is not a democratizing thing. And I cannot see it accruing to the Zuckerberg side of the labor/capital divvy up without a truly durable police state. Zuck offering my old chums nation state salaries is an extreme and likely transitory thing, but we know exactly how software professional economics work when it buckets as "sorcery" and "don't bother": that's 1950 to whenever we mark the start of the nepohacker Altman Era, call it 2015. In that world good hackers can do whatever they want, whenever they want, and the money guys grit their teeth. The non-sorcery bucket has paper mache hack-magnet hackathon projects in it at a fraction of the old price. So disruption, wow.

            Whether that's good or bad is a value judgement I'll save for another blog post (thank you for attending my TED Talk).

          • By captnFwiffo 2025-08-26 20:17 (1 reply)

            Sure, now the client wants 130 edits without losing coherency with the original. What does a vibe designer do? Just keep re-prompting and re-generating until it works? Sounds hard to me.

            • By Filligree 2025-08-27 11:45

              They use Kontext, Qwen-Edit or Gemini.

        • By petralithic 2025-08-27 11:24

          Why would you compare it to Photoshop? If you compare it to other tools in the same category, image generation, you will find models like Flux and Qwen do much better.

      • By vitorgrs 2025-08-27 5:14

        The model seems good, but it still produces garbage most of the time lol.

        Still needs more RLHF tuning, I guess? The previous version was even worse.

      • By druskacik 2025-08-26 17:12 (2 replies)

        Is it because the model is not good enough at following the prompt, or because the prompt is unclear?

        Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess right without enough details.

        • By toddmorey 2025-08-26 18:51

          Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.

        • By qingcharles 2025-08-26 17:22

          No, my prompts are very, very clear. It just won't follow them sometimes. Also this model seems to prefer shorter prompts, in my experience.

      • By ericlang 2025-08-26 18:43 (1 reply)

        How did you get early access? Thanks.

        • By Thorrez 2025-08-26 21:20

          I believe lmarena.

      • By animanoir 2025-08-26 22:24

        [dead]

    • By hapticmonkey 2025-08-26 21:29 (3 replies)

      Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.

      But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.

      • By vineyardmike 2025-08-27 6:44

        > finally be put to use…for product placement.

        Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and also protecting their revenue?

        Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.

        Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.

      • By johnfn 2025-08-27 11:55

        Oh come on - you have this incredible technology at your disposal and all you can think to use it for is product placement?

      • By torginus 2025-08-27 18:25

        I am pretty sure a lot of said engineering talent isn't actually contributing to AI but doing other stuff

    • By torginus 2025-08-26 19:44 (1 reply)

      Another nitpick - the pink puffer jacket that got edited into the picture is not the same as the one in the reference image - it's very similar but if I were to use this model for product placement, or cared about these sort of details, I'd definitely have issues with this.

      • By drmath 2025-08-27 2:01 (1 reply)

        Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.

        • By ethbr1 2025-08-27 2:19 (1 reply)

          Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon

          • By wiz21c 2025-08-27 6:09

            look at the bottom of the sleeves, they don't match. the bottom of the jacket doesn't match either.

            I didn't see it at first sight but it certainly is not the same jacket. If you use that as an advertisement, people can sue you for lying about the product.

    • By dcre 2025-08-26 15:58 (1 reply)

      Alarming hands on the third one: it can't decide which way they're facing. But Gemini didn't introduce that, it's there in the base image.

      • By 725686 2025-08-26 19:19 (1 reply)

        Yes, the base image's hands are creepy.

        • By meatmanek 2025-08-26 22:31 (1 reply)

          I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?

          • By dcre 2025-08-27 15:32

            It doesn't seem to matter: people have posted tons of examples on social media of non-AI base images that it was equally able to hold steady while making edits.

    • By ceroxylon 2025-08-26 15:37 (2 replies)

      It seems like every combination of "nano banana" is registered as a domain with their own unique UI for image generation... are these all middle actors playing credit arbitrage using a popular model name?

      • By bonoboTP 2025-08-26 15:55 (1 reply)

        I'd assume they are just fake, take your money and use a different model under the hood. Because they already existed before the public release. I doubt that their backend rolled the dice on LMArena until nano-banana popped up. And that was the only way to use it until today.

        • By ceroxylon 2025-08-26 16:07

          Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMarena.

          There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.

      • By vunderba 2025-08-26 17:02

        They're almost all scams. Nano banana AI image generator sites were showing up when this model was still only available in LM Arena.

    • By 93po 2025-08-26 17:52

      Completely agree - I make logos for my github projects for fun, and the last time I tried SOTA image generation for logos, it was consistently ignoring instructions and not doing anything close to what I was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.

    • By ivape 2025-08-26 20:35 (1 reply)

      Regardless, it seems Google is on the frontier of every type of model and robotics (cars). It's nutty how we forget what an intellectual juggernaut they are.

      • By fariszr 2025-08-26 21:00

        Tool use and sycophancy are still big issues in gemini 2.5 models.

    • By summerlight 2025-08-26 18:41

      I wonder what the creative workflow will look like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained controls on each layer and their composition, with semantic understanding of the full picture.

    • By koakuma-chan 2025-08-26 15:46 (3 replies)

      Why is it called nano banana?

      • By ehsankia 2025-08-26 16:44 (1 reply)

        Before a model is announced, they use codenames on the arenas. If you look online, you can see people posting about new secret models and people trying to guess whose model it is.

        • By mvdtnz 2025-08-26 18:37 (1 reply)

          What are "the arenas"?

          • By patates 2025-08-26 18:48 (1 reply)

            Blind rating battlegrounds, one is https://lmarena.ai/ (first google result)

            • By kstenerud 2025-08-27 0:09 (1 reply)

              I don't quite get what this is? I asked the AI on the site "What is imarena.ai?" and it just gave some hallucinated answer that made no sense.

              • By adventured 2025-08-27 1:14 (1 reply)

                People vote on the performance of AI, generating ranking boards.

                • By kstenerud 2025-08-27 5:19

                  Ah, that was the missing piece of information! Thanks!

      • By Jensson 2025-08-26 15:50

        Engineers often have silly project names internally, then some marketing team rewrites the name for public release.

      • By ZephyrBlu 2025-08-26 16:17 (1 reply)

        I'm pretty sure it's because an image of a banana under a microscope generated by the model went super viral

    • By rplnt 2025-08-26 16:48

      Oh no, even more mis-scaled product images.

    • By torginus 2025-08-26 21:02

      No, it's not really that much of an improvement. Once you start coming up with specific tasks, it fails just like the others.

    • By littlestymaar 2025-08-27 11:59

      > An example. https://x.com/D_studioproject/status/1958019251178267111

      “Nano banana” is probably good, given its score on the leaderboard, but the examples you show don't seem particularly impressive, it looks like what Flux Kontext or Qwen Image do well already.

    • By polishdude20 2025-08-27 0:52 (1 reply)

      The fingernails on one of them. Ohhh nooo

      • By ethbr1 2025-08-27 2:21

        Image genai made me realize just how inattentive to detail a lot of people are.

    • By goosejuice 2025-08-27 2:28

      Yet it's failed spectacularly at almost everything I've given it.

    • By r33b33 2025-08-27 8:40

      nano banana is good, but not insanely good

    • By Viaya 2025-08-27 6:20 (1 reply)

      [dead]

    • By Viaya 2025-08-27 6:19

      [dead]

    • By fHr 2025-08-26 20:47

      cope

    • By echelon 2025-08-26 15:59 (4 replies)

      > This is the gpt 4 moment for image editing models.

      No it's not.

      We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than the (endearingly?) named "piss filter".

      Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.

      Flux Kontext and Qwen are also possible to fine tune and run locally.

      Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.

      We've left the days of Dall-E, Stable Diffusion, and Midjourney of "prompt-only" text to image generation.

      It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.

      • By raincole 2025-08-26 16:03 (2 replies)

        In other words, this is the gpt 4 moment for image editing models.

        Gpt4 isn't "fundamentally different" from gpt3.5. It's just better. That's the exact point the parent commenter was trying to make.

        • By jug 2025-08-26 20:18

          I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement. It jumped to professional applications, compared to the only casual use you could put ChatGPT 3.5 to.

        • By retinaros 2025-08-26 16:06 (1 reply)

          did you see the generated pic demis posted on X? it looks like slop from 2 years ago. https://x.com/demishassabis/status/1960355658059891018

          • By raincole 2025-08-26 16:11 (1 reply)

            I've tested it on Google AI Studio since it's available to me (which is just a few hours so take it with a grain of salt). The prompt comprehension is uncannily good.

            My test is going to https://unsplash.com/s/photos/random and picking two random images, sending them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). FluxKontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.

            Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborate text prompts than ChatGPT.

            • By echelon 2025-08-27 0:51

              > FluxKontext

              Flux Kontext is an editing model, but the set of things it can do is incredibly limited. The style of prompting is very bare bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they themselves are nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.

              Gemini 2.5 Flash and gpt-image-1 are in a class of their own. Very powerful instructive image editing with the ability to understand multiple reference images.

              > Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborate text prompts than ChatGPT.

              Both gpt-image-1 and Gemini 2.5 Flash feel like "Comfy UI in a prompt", but they're still nascent capabilities that get a lot wrong.

              When we get a gpt-image-1 with Midjourney aesthetics, better adherence and latency, then we'll have our "GPT 4" moment. It's coming, but we're not there yet.

              They need to learn more image editing tricks.

      • By krackers 2025-08-26 18:31 (1 reply)

        I'm confused as well, I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single point edits. In terms of "wow" factor it doesn't feel as big as gpt 3->4 though, since it sure _felt_ like models could already do this.

        • By echelon 2025-08-26 18:56

          People really slept on gpt-image-1 and were too busy making Miyazaki/Ghibli images.

          I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.

          LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.

      • By fariszr 2025-08-26 20:27

        I'm sorry I absolutely don't agree. This model is on a whole other level.

        It's not even close. https://twitter.com/fareszr/status/1960436757822103721

      • By bsenftner 2025-08-27 12:54

        I'm totally with you. Dismayed by all these fanbois.

  • By vunderba 2025-08-26 16:49 (7 replies)

    I've updated the GenAI Image comparison site (which focuses heavily on strict text-to-image prompt adherence) to reflect the new Google Gemini 2.5 Flash model (aka nano-banana).

    https://genai-showdown.specr.net

    This model gets 8 of the 12 prompts correct and easily comes within striking distance of the best-in-class models Imagen and gpt-image-1 and is a significant upgrade over the old Gemini Flash 2.0 model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and 9-pointed star.

    What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year which is basically a lifetime in this field. Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

    Comparison of gpt-image-1, flash, and imagen.

    https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...
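    One rough way to quantify the "localized edit vs. whole-image regeneration" distinction is to diff the edited output against the source and measure the fraction of pixels that changed. A minimal NumPy sketch (the tolerance value of 8 is an arbitrary choice, and real comparisons would need the images aligned at the same resolution):

```python
import numpy as np

def changed_fraction(before: np.ndarray, after: np.ndarray, tol: int = 8) -> float:
    """Fraction of pixels whose maximum per-channel difference exceeds tol."""
    diff = np.abs(before.astype(int) - after.astype(int)).max(axis=-1)
    return float((diff > tol).mean())

# A localized inpainting-style edit touches few pixels;
# a full regeneration (gpt-image-1 style) touches most of them.
base = np.zeros((64, 64, 3), dtype=np.uint8)
edited = base.copy()
edited[:8, :8] = 255  # simulate a small 8x8 patch being edited

print(changed_fraction(base, edited))  # 0.015625 (64 of 4096 pixels)
```

    A low score suggests the model confined its changes to the requested region; a score near 1.0 indicates the whole image was re-synthesized even if it looks superficially similar.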

    • By bla3 2025-08-26 19:17 (1 reply)

      Why do Hunyuan, OpenAI 4o and Qwen get a pass for the octopus test? They don't cover "each tentacle", just some. And Midjourney covers 9 of 8 arms with sock puppets.

      • By vunderba 2025-08-26 19:22

        Good point. I probably need to adjust the success pass ratios to be a bit stricter, especially as the models get better.

        > midjourney covers 9 of 8 arms with sock puppets.

        Midjourney is shown as a fail so I'm not sure what your point is. And those don't even look remotely close to sock puppets, they resemble stockings at best.

    • By arresin 2025-08-26 18:11

      You need a separate benchmark for editing of course

    • By cubefox 2025-08-26 23:36 (1 reply)

      What's interesting is that Imagen 4 and Gemini 2.5 Flash Image look suspiciously similar in several of these tests cases. Maybe Gemini 2.5 Flash first calls Imagen in the background to get a detailed baseline image (diffusion models are good at this) and then Gemini edits the resulting image for better prompt adherence.

      • By pkach 2025-08-27 11:32

        Yes, I saw on Reddit an employee confirming this is the case (at least in the Gemini app): a request for an image from scratch is routed to Imagen, and the follow-up edits are done using Gemini.

    • By MrOrelliOReilly 2025-08-27 8:31 (1 reply)

      This is incredibly useful! I was manually generating my own model comparisons last night, so great to see this :)

      I will note that, personally, while adherence is a useful measure, it does miss some of the qualitative differences between models. For your "spheron" test for example, you note that "4o absolutely dominated this test," but the image exhibits all the hallmarks of a ChatGPT-generated image that I personally dislike (yellow, with veiny, almost impasto brush strokes). I have stopped using ChatGPT for image generation altogether because I find the style so awful. I wonder what objective measures one could track for "style"?

      It reminds me a bit of ChatGPT vs Claude for software development... Regardless of how each scores on benchmarks, Claude has been a clear winner in terms of actual results.

      • By vunderba 2025-08-27 18:30

        Yeah - unfortunately the ubiquitous "piss filter" strikes again. You pretty much have to pass GPT-image-1 through a tone map, LUT, etc. in something like Krita or Photoshop to try to mitigate this. I'm honestly a bit surprised that they haven't built this in already given how obvious the color shift is.

    • By gundmc 2025-08-26 18:11 (1 reply)

      > Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

      Came into this thread looking for this post. It's a great way to compare prompt adherence across models. Have you considered adding editing capabilities in a similar way given the recent trend of inpainting-style prompting?

      • By vunderba 2025-08-26 19:36 (1 reply)

        Adding a separate section for image editing capabilities is a great idea.

        I've done some experimentation with Qwen and Kontext and been pretty impressed, but it would be nice to see some side by sides now that we have essentially three models that are capable of highly localized in-painting without affecting the rest of the image.

        https://mordenstar.com/blog/edits-with-kontext

        • By dostick 2025-08-28 9:49

          For testing editing prompts it is best to start with "only change …" to prevent the model from changing everything; even Nano Banana does that.

    • By jay_kyburz 2025-08-26 22:06 (1 reply)

      I really like your site.

      Do you know of any similar sites that compare how well the various models can adhere to a style guide? Perhaps you could add this?

      I.e., provide the model with a collection of drawings in a single style, then have it follow prompts and generate images in the same style?

      For example if you wanted to illustrate a book, and have all the illustrations look like they were from the same artists.

      • By vunderba 2025-08-27 18:33

        Hi Jay, unfortunately I haven't seen a site like that, but being able to rank models in terms of "style adherence" would be a nice feature.

        It's basically a necessity if you're working on something like a game or comic where you need consistency around characters, sprites, etc.

    • By mrcwinn 2025-08-27 15:47

      I really enjoyed reviewing this! Good work.

  • By carlosbaraza 2025-08-26 21:32 (4 replies)

    Unfortunately, it suffers from the same safetyism as many other releases. Half of the prompts get rejected. How can you have character consistency if the model is forbidden from editing any human? And most of my photo editing involves humans, so this is basically a useless product. I get that Google doesn't want to be responsible for deep fake advances, but that seems inevitable, so this is just slightly delaying progress. Eventually we will have to face it and allow society to adapt.

    This trend of tools that point a finger at you and set guardrails is quite frustrating. We might need a new OSS movement to regain our freedom.

    • By Workaccount2 2025-08-26 22:08 (4 replies)

      I have an old photo of my girlfriend with her cousin when they were young, wearing Christmas dresses in front of the tree, not long before they were separated to opposite sides of the world for decades now. The photo is low quality, on top of being physically beat up.

      So far no model is willing to clean it up :/

      • By gaudystead 2025-08-26 23:39 (1 reply)

        There are reddit communities (I admittedly don't remember which, but could probably be found from a simple search) where people will offer their photo editing skills to touch up the photo, often for free. Could be worth trying a real human if the robots are going full HAL 9000 and telling you they can't do it.

      • By boppo1 2025-08-27 15:40

        If you are not personally offended by looking at CRAZY pornography, you could start digging into the comfyui ecosystem. It's not all porn, there are lots of pro photo-manipulators doing sfw stuff, but the community overlap with NSFW is basically borderless, so you'll probably bump into it.

        However, the results the comfyui people get are lightyears ahead of any oneshot-prompt model. Either you can find someone to do cleanup for you (should be trivial, I wouldn't pay more than $10-15) or if you have good specs for inference you could learn to do it yourself.

      • By AuryGlenz 2025-08-27 4:41

        If you have a decent GPU Qwen Edit can probably do it and certainly won’t refuse.

        Keep in mind no editing model is magic and if the pixels just aren’t there for their faces it’s essentially going to be making stuff up.

      • By yfontana 2025-08-27 7:57

        Open source models like Flux Kontext or Qwen image edit wouldn't refuse, but you need to either have a sufficiently strong GPU or get one in the cloud (not difficult nor expensive with services like runpod), then set up your own processing pipeline (again, not too difficult if you use ComfyUI). Results won't be SOTA, but they shouldn't be too far off.

    • By danpalmer 2025-08-27 3:57 (1 reply)

      I've done ~20 prompts and not had one be rejected so far. What sort of things are you asking it to do? I've tried things like changing clothing and accessories on people.

      • By carlosbaraza 2025-08-27 7:21 (3 replies)

        Basic things like: "{uploaded image of a man} can you remove the glasses?" or "make everyone in the picture smile" or "open the eyes of everyone in the photo". Nothing that a human would consider "unsafe". I am based in EU and using Google AI Studio with all safety toggles set to "Off".

        • By danpalmer 2025-08-28 2:38

          Strange. I wouldn't have thought the safety rules would differ by region, at least not for things like that. I uploaded a photo and asked to change the glasses and change the shirt and it did both with no problem.

          I just went back to the chat and asked it to remove the glasses and it worked. Asking it to remove the shirt also succeeded, although a) this is a head and shoulders photo so nothing NSFW, and b) it didn't do a great job of guessing what my shoulders look like.

        • By technofiend 2025-08-27 18:23

          For a joke between friends I had it take my selfie and make me a bald Catholic priest and then add hair to a friend who is bald. No refusals, although those are pretty tame. In contrast to the quality images nano-banana produced, Copilot removed my glasses and made my eyes brown.

        • By simedw 2025-08-27 9:10

          I noticed that I get far fewer refusals when I set my VPN to the USA.

    • By mudkipdev 2025-08-26 21:51

      I was using Veo two days ago when video generations were free. I removed all words that sounded even remotely bad, but it still refused. Eventually I gave up, but now I'm thinking it's because I tried to generate myself.

HackerNews