The most Purr-fect Image File Format for your AI workflows - Kuberwastaken/meow
> Instead of storing metadata alongside the image where it can be lost, MEOW ENCODES it directly inside the image pixels using LSB steganography
That makes the data much more fragile than metadata fields, though? Any kind of image alteration or re-encoding (which almost all sites do to ensure better compression — discord, imgur, et al) is going to trash the metadata or make it utterly useless.
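To make the fragility point concrete, here is a minimal Python sketch of generic LSB embedding. It is not meow's actual encoder. `embed_lsb`, `extract_lsb`, and `requantize` are hypothetical helpers, the carrier is just a flat list of 8-bit channel values, and `requantize` is only a crude stand-in for the lossy re-encoding that upload pipelines apply.

```python
def embed_lsb(channels: list[int], payload: bytes) -> list[int]:
    """Return a copy of `channels` with `payload` written into the low bits, MSB-first."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(channels):
        raise ValueError("payload too large for this carrier")
    out = list(channels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & ~1) | bit  # clear the LSB, then set the payload bit
    return out


def extract_lsb(channels: list[int], n_bytes: int) -> bytes:
    """Read `n_bytes` back out of the low bits, MSB-first."""
    result = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (channels[b * 8 + i] & 1)
        result.append(value)
    return bytes(result)


def requantize(channels: list[int], step: int = 4) -> list[int]:
    """Crude stand-in for lossy re-encoding: snap to the nearest multiple of `step` (clamped to 255)."""
    return [min(255, round(v / step) * step) for v in channels]


carrier = [(i * 37) % 256 for i in range(64)]  # stand-in for 64 channel values
stego = embed_lsb(carrier, b"meow")
print(extract_lsb(stego, 4))                            # → b'meow'
print(max(abs(a - b) for a, b in zip(carrier, stego)))  # → 1
print(extract_lsb(requantize(stego), 4) == b"meow")     # → False
```

Each channel value moves by at most 1, which is why the change is hard to see, but the carrier is no longer bit-identical to the original, and any step that touches the low-order bits wipes the payload.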
I'll be honest, I don't see the need for synthesizing a "new image format" because "these formats are ancient (1995 and 1992) - it's about time we get an upgrade" and "metadata [...] gets stripped way too easily", when the replacement you are advocating is not only the exact same format as a PNG, but also uses a metadata embedding scheme that is much more fragile in terms of metadata being stripped randomly when uploaded somewhere. This seems very bizarre to me and ill-thought-out.
Anyway, if you want a "new image format" because "the old ones were developed 30 years ago", there's a plethora of new image formats to choose from, most of which support custom metadata. Some options: webp, jpeg 2000, HEIF, jpeg xl, farbfeld (the one the suckless guys made).
I'll be honest... this is one of the most irritating parts of the new AI trend. Everyone is an "ideas guy" when they start programming, it's fine and normal to come up with "new ideas" that "nobody else has ever thought of" when you're a green-eared beginner and utterly inexperienced. The irritating part is what happens after the ideas phase.
What used to happen was you'd talk about this cool idea in IRC and people would either help you make it, or they would explain why it wasn't necessarily a great idea, and either way you would learn something in the process. When I was 12 and new to programming, I had the "genius idea" that if we could only "reverse the hash algorithm output to its input data" we would have the ultimate compression format... anyone with an inch of knowledge will smirk at this proposition! And so I learned from experts why this was impossible, and not believing them, I did my own research, and learned some more :)
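The hash-as-compression idea fails on a counting (pigeonhole) argument: a fixed-size digest has far fewer possible values than there are possible inputs, so many inputs must share each digest and no decoder can tell them apart. A small Python illustration, truncating SHA-256 to one byte purely so the collision shows up immediately (`truncated_hash` is a toy helper, not a real hash API):

```python
import hashlib


def truncated_hash(data: bytes) -> int:
    """First byte of SHA-256: a toy digest with only 256 possible values."""
    return hashlib.sha256(data).digest()[0]


# 257 distinct inputs, 256 possible digests: a collision is guaranteed.
inputs = [i.to_bytes(2, "big") for i in range(257)]
seen: dict[int, bytes] = {}
collision = None
for data in inputs:
    digest = truncated_hash(data)
    if digest in seen:
        collision = (seen[digest], data)
        break
    seen[digest] = data

print(collision is not None)  # → True
```

The same counting applies to the full 256-bit digest: there are 2^256 digests but infinitely many possible inputs, so "reversing" one could only ever pick a single preimage out of an endless set.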
Nowadays, an AI will just run with whatever you say — "why yes if it were possible to reverse a hash algorithm to its input we would have the ultimate compression format", and then if you bully it further, it will even write (utterly useless!) code for you to do that, and no real learning is had in the process because there's nobody there to step in and explain why this is a bad idea. The AI will absolutely hype you up, and if it doesn't you learn to go to an AI that does. And now within a day or two you can go from having a useless idea, to advertising that useless idea to other people, and soon I imagine you'll be able to go from advertising that useless idea to other people, to manufacturing it IRL, and at no point are you learning or growing as a person or as a programmer. But you are wasting your own time and everyone else's time in the process (whereas before, no time was wasted because you would learn something before you invested a lot of time and effort, rather than after).
Exactly. Not long ago, someone showed up on Hacker News who had, on his own, begun to rediscover the benefits of arithmetic coding. Naturally, he was convinced he’d come up with a brand-new entropy coding method. Well, no harm done, and it’s nice that people study compression, but I was surprised how easily he convinced himself of a discovery. Clearly he knew very little.
Overall, I think this is a positive “problem” to have :-)
I've had several revolutionary discoveries during my time programming. In each case, after the euphoria had settled a bit, I asked myself: Why aren't we already doing this? Why isn't this already a thing? What am I missing?
And lo and behold, in each case I did find that it was either not novel at all or it had some major downside I had initially missed.
Still, fun to think about new ways of doing things, so I still go at it.
I mean, I think it would be a positive problem to have if people were actually learning things and growing, but... honestly this doesn't seem to be the case. What I see here is AI-generated marketing fluff about a "brand new format" that does something off-the-shelf software already does and doesn't actually fit the intended use case, all of which would be fine if it weren't also generated by AI.
> webp, jpeg 2000, HEIF, jpeg xl, farbfeld
I think you just illustrated how difficult it is to propose a new standard. WebP wasn't supported by a lot of image-related software (including the Adobe suite!) for years and earned a bad reputation, HEIF is also poorly supported, and JPEG XL was removed from Chrome despite Google co-developing it; AFAIK Safari is the only major browser that ships support for it. Never heard of farbfeld before.
If the backing from Apple and Google was not enough to drive the adoption of an image format, I fail to see how this thing can go anywhere.
You generated pretty much ~all of this with Claude (cf. ASCII diagrams with emojis on each line to "prove" various not-even-wrong claims it was told to justify), and the work is mediocre enough that it's worth full-throatedly criticizing both the work quality and that you inflicted this upon the world.
Look how many confused comments there are because the page claims features you don't have, don't understand, and that don't make sense on their own terms (what's an "attention map"? With maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model? How big would the image be? Is it possible to fit that in the 2 bits per 4 bytes the page claims?)
I don't want you to take it personally, at all, but I never, ever, want to see something like this on the front page again.
You've reinvented EXIF and JPEG metadata, in the voice of a diligent teenager desiring to create something meaningful, but with 0 understanding of the computing layers, 4 hours with Wikipedia, and 0 intellectual humility - though, with youth, born not from obstinance, but naiveté.
Some warning signs you should have taken heed of:
- Metadata is universally desirable, yet, somehow unexplored until now?
- Your setup instructions use UNIX commands up until they require running a Windows batch file
- The example of hiding data hides it in 2 bits of one channel, then "demonstrates" this is visually lossless because it's hidden in 1 bit across 2 channels (it isn't, because if it were, how would we determine which 2 of the channels?) ("visually lossless" confuses "lossless", a technical term meaning no information was lost, with a weaker claim of being lossy-but-not-detectably-so)
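The capacity question raised above can be checked with back-of-envelope arithmetic. This sketch takes the "2 bits per 4 bytes" figure at face value (2 payload bits per RGBA pixel); the 1024×1024 image size and the single-channel float32, full-resolution attention map are hypothetical choices for illustration, not anything the project specifies.

```python
def lsb_capacity_bytes(width: int, height: int, bits_per_pixel: int = 2) -> int:
    """Payload bytes if every pixel hides `bits_per_pixel` bits (2 per RGBA pixel assumed)."""
    return width * height * bits_per_pixel // 8


def float32_map_bytes(width: int, height: int) -> int:
    """Raw size of a single-channel float32 map at the same resolution."""
    return width * height * 4


w, h = 1024, 1024  # hypothetical image size
capacity = lsb_capacity_bytes(w, h)
needed = float32_map_bytes(w, h)
print(capacity, needed, needed / capacity)  # → 262144 4194304 16.0
```

Under these assumptions a raw full-resolution map needs 16 times the available capacity before any JSON or error-correction overhead; a heavily downsampled or 8-bit quantized map could fit, so the answer hinges on details the page doesn't give.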
I'll leave it here, I think you have the idea and there's a difference between being firm and honest, and being cruel, and length will dictate a lot of that to a casual observer.
>You generated pretty much ~all of this with Claude
>- Your setup instructions use UNIX commands up until they require running a Windows batch file
Is your comment AI generated? The only setup instructions prior to the Windows commands are "git", "cd", and "pip". "cd" exists on both Windows and Unix. The other commands might not be available by default on Windows, but they're not exactly "UNIX" commands either. The other code blocks mostly seem to assume Windows (e.g., the "start" and "copy" commands), so I don't see any contradictions here.
> Is your comment AI generated?
Are you asking this earnestly, or, is it meant to communicate something else? If so, what? :)
Genuinely, the most interesting part of the comment to me, in that it does not have 0 meaning, and rings of some form of frustration, yet the rest of your comment stays focused on technical knowledge, and AFAIK you are not the author (who I'd expect would be at least temporarily angry at my contribution)
>Are you asking this earnestly, or, is it meant to communicate something else? If so, what? :)
If you're going to accuse someone else of technical inconsistencies, maybe you should make sure your critiques are free of technical inconsistencies as well. You know, "people who live in glass houses shouldn't throw stones" and all that.
There's a false equivalence there, between being not-even-wrong and "you have a bunch of UNIX commands followed by a Windows batch file execution."
Note we both agree on that, you seem to assume I claimed something else, like, cd doesn't exist on windows.
Let's say I instead said "this doesn't work on Windows"
I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...
Let's steelman this to the max: any possible prerequisite that could block it doesn't mean it's actually blocked. Dev mode on, WSL, prerequisites wrestled with and installed, can download source and edit then compile, but can only patch build errors, not add new functionality.
Are you 100% sure those commands will work?
(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)
>There's a false equivalence there, between being not-even-wrong and "you have a bunch of UNIX commands followed by a Windows batch file execution."
>Note we both agree on that, you seem to assume I claimed something else, like, cd doesn't exist on windows.
No, you made a specific claim of "Your setup instructions use UNIX commands up until they require running a Windows batch file", when those "UNIX commands" were "pip" and "python". That statement is incorrect because those commands are readily available on windows.
Your remark about "you seem to assume I claimed something else, like, cd doesn't exist on windows" is absurd at best and verges on bad faith that I'm not even going to engage with it.
>I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...
Which commands are those? The only non-native Windows commands I see are git, pip, and python, the latter two of which are both included with Python. You're making it sound like you need to jump through a bunch of hoops to get those commands working, when really all you have to do is run the installers for git and Python.
>Are you 100% sure those commands will work?
Again, my claim isn't that the project works 100%, or even that it's not AI generated, it's that your critique makes little sense either.
>(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)
No, the reason why I accused you of AI generated comments and made the remark about glass houses is that claiming "pip" and "python" are "UNIX commands" is so absurdly wrong that it's on the level of the OP. I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.
> Your remark about "you seem to assume I claimed something else, like, cd doesn't exist on windows" is absurd at best and verges on bad faith that I'm not even going to engage with it.
You seem very upset, at least, I'm not used to people being this aggressive on HN, and I've been here for 15 years. I apologize for my contribution to that, if not my sole responsibility for it.
I remain fascinated by your process, I never have heard bad faith invoked when someone points at their actual words.
Generally, it is rare for someone to invoke "bad faith" when someone else's thoughts don't match their expectations.
I just...can't lie to you. I can't claim I thought it wouldn't work on Windows. I thought the opposite! That the sequence had 0% chance of working on not-Windows, and a 70% chance of working on Windows.
>> Are you 100% sure those commands will work?
> Again, my claim isn't that the project works 100%, or even that it's not AI generated,
Oh! I'm referring to the commands, not the project :) The project can output "APRIL FOOLS!", as far as I care for this exercise.
> it's that your critique makes little sense either.
Oh, interesting - happy to hear more, beyond the idea that I must have meant pip/Python aren't available on Windows. If that's your sole issue, well, more power to you :) I do want to avoid lying to you just to avoid an aggressive conversation; you may not even mean to be aggressive. With the principle of "don't lie", I can't say I had something else in my head that matches your understanding so far, which I presume is something like "They are UNIX commands followed by Windows commands" [and thus this won't work on Windows]
> claiming "pip" and "python" are "UNIX commands"
Do you think I thought pip/Python wasn't on Windows? Sorry, no - in fact that's what I was using on Windows this week! (well, porting Python code to Dart) I just was 70% sure the commands as written would not work on Windows, and I suppose there's an implication I'm 100% sure they wouldn't work on not-Windows given the .bat file. Beyond that, nada.
>> separately, you misunderstand the quote re: glass houses
> No, I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.
Intriguing, as always: "Did you write this with AI?" followed by a kind inquiry into the meaning of that, followed by "people who live in glass houses shouldn't throw stones" meant "you said something wrong when you said something else is wrong, but it's cool, that's fine". "Shouldn't" seems to belie that interpretation, but I'm sure I have it wrong.
P.s. all the best, my friend. :)
> You generated pretty much ~all of this with Claude
Haha no, it was a reworked version of an older image format I found that I modified to fit this. Yes, there was AI-assisted coding involved in the process, but it wasn't a "make me an image format that does x".
>what's an "attention map"? with maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model?
By “attention map” I meant a visual representation of where a model focuses its “attention” when analyzing an image: basically, a heatmap highlighting the regions that most influence the model’s output. It isn't very useful now, but it might be.
> You reinvented EXIF/JPEG metadata with naivete
Partly true (at least for now). The core idea was to experiment with alternative metadata or feature embedding, not to replace well-established standards. It's not where I NEED it to be yet, but as far as metadata use cases go, it's pretty cool.
> Your setup instructions use UNIX commands up until they require running a Windows batch file
It's easier to set Windows up to directly open other file formats; it's just a thing (and I'm on Windows, so).
Reality check:
Your extra data is a big JSON blob. Okay, fine.
File formats dating back to Targa (https://en.wikipedia.org/wiki/Truevision_TGA) support arbitrary text blobs if you're weird enough.
PNG itself has both EXIF data and a more general text chunk mechanism (both compressed and uncompressed, https://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.An... , section 4.2.3, you probably want iTXt chunks).
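For reference, this is roughly what "just use an iTXt chunk" means at the byte level. A stdlib-only Python sketch, assuming the PNG chunk layout (length, type, data, CRC-32 over type+data) and the iTXt field order from the spec; the helper names and the `meow_meta` keyword are hypothetical, and in practice you would let exiftool or an imaging library write the chunk rather than hand-rolling it.

```python
import json
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def itxt(keyword: str, text: str, compress: bool = True) -> bytes:
    """Build an iTXt chunk holding UTF-8 text, optionally zlib-compressed."""
    body = text.encode("utf-8")
    if compress:
        body = zlib.compress(body)
    data = (
        keyword.encode("latin-1") + b"\x00"  # keyword, null-terminated
        + bytes([1 if compress else 0, 0])   # compression flag, method 0 (zlib)
        + b"\x00"                            # empty language tag
        + b"\x00"                            # empty translated keyword
        + body
    )
    return chunk(b"iTXt", data)


def tiny_png(extra_chunks: list[bytes]) -> bytes:
    """A 1x1 8-bit grayscale PNG with `extra_chunks` placed before IEND."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")  # filter byte 0 + one mid-gray pixel
    return (PNG_SIG + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat)
            + b"".join(extra_chunks) + chunk(b"IEND", b""))


def read_itxt(png: bytes) -> dict[str, str]:
    """Collect every iTXt keyword -> text pair from a PNG byte string."""
    if not png.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    out, pos = {}, len(PNG_SIG)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"iTXt":
            keyword, rest = data.split(b"\x00", 1)
            flag, rest = rest[0], rest[2:]  # skip the compression method byte
            _lang, rest = rest.split(b"\x00", 1)
            _tkey, text = rest.split(b"\x00", 1)
            raw = zlib.decompress(text) if flag else text
            out[keyword.decode("latin-1")] = raw.decode("utf-8")
    return out


meta = {"model": "example", "boxes": [[10, 20, 64, 64]]}  # hypothetical payload
png = tiny_png([itxt("meow_meta", json.dumps(meta))])
print(json.loads(read_itxt(png)["meow_meta"]) == meta)  # → True
```

Writing `png` to disk gives an ordinary PNG: viewers ignore the text chunk, and the JSON stays byte-exact as long as the tool handling the file preserves ancillary chunks.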
exiftool will already let you do all of this, by the way. There's no reason to summon a non-standard file format into the world (especially when you're just making a weird version of PNG that won't survive resizing or quantization properly).
Here, two incantations:
> exiftool -config exiftool.config -overwrite_original -z '-_custom1<=meta.json' cat.png
and
> exiftool -config exiftool.config -G1 -Tag_custom1 cat.png
You can (with AI help no less) figure out what `exiftool.config` should look like. `meta.json` is just your JSON from github.
Now go draw the rest of the owl. :)
Hi! Thanks for checking it out, means a lot :)
Yes, it is a big JSON blob atm, haha, and it's definitely still a POC, but the idea is to avoid having a separate JSON file that adds to the complexity. While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.
I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.
There's definitely a ton of work left to do, but I see a lot of potential in something like this (also, nice username)
> While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.
That's why I mentioned that you can put anything, including binary data (which includes images), into the chunks in a PNG. I think Pillow even supports this (there are some PRs, like https://github.com/python-pillow/Pillow/pull/4292 , that suggest this).
Your problem domain is:
* Have something that looks like a PNG...
* ...that doesn't need supporting files outside itself...
* ...that can also store textual data (e.g., that JSON blob of bounding boxes and whatnot)...
* ...and can also store image data (e.g., attention maps and saliency regions).
What I'm telling you is that the PNG file format already supports all of this stuff, you just need to be smart enough to read the spec and apply the affordances it gives you.
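As one way to cover the last bullet, an attention or saliency map can travel inside the same PNG as binary data. This stdlib-only sketch quantizes a hypothetical float map to 8 bits and stores it zlib-compressed in a private ancillary chunk; the chunk name `slMP` and all helper names are made up for illustration. The letter case follows the PNG chunk-naming rules: lowercase first letter = ancillary (decoders that don't know it skip it), lowercase second = private, uppercase third = reserved, uppercase fourth = unsafe to copy.

```python
import struct
import zlib
from typing import Optional

PNG_SIG = b"\x89PNG\r\n\x1a\n"
# Hypothetical chunk type. Because the last letter is uppercase (unsafe to copy),
# PNG editors that modify critical chunks (e.g. the pixels) must not copy it.
MAP_CHUNK = b"slMP"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def pack_map(values: list[list[float]]) -> bytes:
    """Quantize a [0, 1] float map to uint8, zlib it, prefix width/height."""
    height, width = len(values), len(values[0])
    q = bytes(min(255, max(0, round(v * 255))) for row in values for v in row)
    return struct.pack(">II", width, height) + zlib.compress(q)


def unpack_map(data: bytes) -> list[list[int]]:
    """Inverse of pack_map (returns the 8-bit quantized values)."""
    width, height = struct.unpack(">II", data[:8])
    q = zlib.decompress(data[8:])
    return [list(q[r * width:(r + 1) * width]) for r in range(height)]


def add_before_iend(png: bytes, ctype: bytes, data: bytes) -> bytes:
    """Insert a chunk just before the trailing IEND chunk."""
    iend = chunk(b"IEND", b"")
    if not png.endswith(iend):
        raise ValueError("expected the PNG to end with IEND")
    return png[: -len(iend)] + chunk(ctype, data) + iend


def find_chunk(png: bytes, ctype: bytes) -> Optional[bytes]:
    """Return the data of the first chunk of type `ctype`, or None."""
    pos = len(PNG_SIG)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        if png[pos + 4:pos + 8] == ctype:
            return png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
    return None


# A 2x2 8-bit grayscale PNG: each scanline is a filter byte (0) plus 2 pixels.
ihdr = struct.pack(">IIBBBBB", 2, 2, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x10\x20\x00\x30\x40")
base = PNG_SIG + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")

attention = [[0.0, 0.5], [0.25, 1.0]]  # hypothetical saliency map
tagged = add_before_iend(base, MAP_CHUNK, pack_map(attention))
print(unpack_map(find_chunk(tagged, MAP_CHUNK)))  # → [[0, 128], [64, 255]]
print(find_chunk(tagged, b"IDAT") == idat)        # → True: pixels untouched
```

The pixel data is never modified, so the image itself stays lossless, and the map can use whatever resolution or precision the use case needs.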
> I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.
In the 90s, we'd already spent vast sums of gold and blood and tears solving the "holy shit, how do we encode multiple things in images so that they can survive an image pipeline, be extensible to end users, and be compressed reliably."
None of this has been new for three decades. Nothing you are going to do is going to be a value add over correctly using the file format you already have.
I promise that you aren't going to see anything particularly new or exciting in this AI goldrush that isn't an isomorphism of something much smarter, much better-paid people solved back when image formats were still a novel problem domain (again, in the 1990s).
> it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.
Why not, exactly? ComfyUI encodes an absolutely bonkers amount of information (all arbitrary JSON) into workflow PNG files without any issues.
Indeed. And character cards for chatbots (like in SillyTavern) have supported this for years.
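Both examples ride on ordinary PNG text chunks. As far as I know, ComfyUI's default save node writes JSON under tEXt keywords such as `prompt` and `workflow`, and SillyTavern-style character cards store base64-encoded JSON under the keyword `chara`; treat those keyword names as assumptions to verify against real files. A stdlib-only sketch that lists every tEXt/zTXt entry in a PNG (the helper names and sample payloads are hypothetical):

```python
import base64
import json
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def text_chunks(png: bytes) -> dict[str, str]:
    """Map keyword -> text for every tEXt and zTXt chunk (Latin-1 per the spec)."""
    if not png.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    out, pos = {}, len(PNG_SIG)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"tEXt":
            key, value = data.split(b"\x00", 1)
            out[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            key, rest = data.split(b"\x00", 1)
            # rest[0] is the compression method (0 = zlib); the rest is the stream
            out[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"IEND":
            break
    return out


# A tiny PNG with a ComfyUI-style and a character-card-style payload.
card = {"name": "Example", "description": "hypothetical character"}
png = (
    PNG_SIG
    + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + chunk(b"tEXt", b"workflow\x00" + json.dumps({"nodes": []}).encode("latin-1"))
    + chunk(b"tEXt", b"chara\x00" + base64.b64encode(json.dumps(card).encode("utf-8")))
    + chunk(b"IDAT", zlib.compress(b"\x00\x80"))
    + chunk(b"IEND", b"")
)

meta = text_chunks(png)
print(json.loads(meta["workflow"]))                         # → {'nodes': []}
print(json.loads(base64.b64decode(meta["chara"])) == card)  # → True
```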