What happens when you put random data through a PNG decoder?

2024-03-1213:4631jacobwinters.com

Huffman coding: The image data stream is a series of integers in the range 0-255 representing how much red, green, or blue light to add to a pixel. Consider this image. 99% of the values are 0; the…

  • Huffman coding: The image data stream is a series of integers in the range 0-255 representing how much red, green, or blue light to add to a pixel. Consider this image. 99% of the values are 0; the other 1% are picked from the range 1-255.

    [Image generated by JS]

    We've got a stream of values, each taken from a set of 256 possibilities, and we need to encode them as a string of bits. Our first impulse is to encode each value as one byte, but there's nothing saying we have to do it that way. Instead, let's encode the value 0 as the single bit 0 and all other integers as a 1 bit followed by an eight-bit encoding of the integer. The decoder will look like this:

    def readPixelValue():
        if readBit() == 0:
            return 0
        else:
            return read8Bits()

    99% of values are encoded in 1 bit and 1% are encoded in 9 bits, giving an average of 1.08 bits per value, a substantial reduction from 8 bits (1 byte) per value. (There's still a little room for improvement left because there are two ways to encode the value 0, namely the bit strings 0 and 000000000, but you get the idea.)

    If we had an image that was mostly 0s and 255s, we might say 00 → 0, 01 → 255, 1xxxxxxxx → xxxxxxxx, bumping 0 to a slightly longer code so that we can give 255 a shorter code.

    Huffman coding is what you get if you take this line of thinking to its logical conclusion: an algorithm that, given the relative frequencies of the different values that you have to encode, will figure out what tradeoffs need to be made to make the best possible variable-length code to store your data, giving shorter encodings to more frequent values at the expense of taking more bits to store less frequent values. It's quite elegant, and if you want to know more Wikipedia explains it better than I can.

  • LZ77: Now consider this grayscale image, where I've put every value in the range 0-255 in a random order and then repeated that pattern several times.

    [Image generated by JS]

    Each row is identical to the row two positions above it. Because all the possible values occur equally often, Huffman coding can't help us, but there's clearly some structure here. LZ77 detects and compresses repeated sequences of values. The first two rows are written out normally, but after that each new row can be written as an instruction to look back 256 bytes in the data that's just been decoded and repeat 128 bytes from there. In fact, why not encode two rows at a time - "go back 256 bytes, repeat 256 bytes"?

    My understanding of LZ77 is even more tenuous than my understanding of Huffman coding; the curious reader is once again directed to Wikipedia.

  • Row filters: Final example image.

    Going left to right, then top to bottom, the values we need to encode are [0, 1, 2, ..., 254, 255, 255, 254, 253, ..., 1, 0]. There's structure but DEFLATE can't see it. No one value is more common than the others, so Huffman coding can't help. No sequence is repeated, so LZ77 can't help either.

    Before passing pixel data to DEFLATE, PNG runs a pre-processing step that makes the patterns easier for DEFLATE to see. It's the only step in the process that knows that we're looking at image data, not text or code or audio or anything else. For each row of pixels, the encoder picks one of five pre-processing filters to use. In this case, it would probably choose to use the Sub filter, which encodes the first pixel of each row normally and then each subsequent pixel as the difference between itself and the pixel to its left: [<new row, using filter Sub>, 0, +1, +1, ..., +1, <new row, using filter Sub> 255, -1, -1, ..., -1]. In bytes, Sub is filter 1 and -1 is stored as its two's complement representation 255, so DEFLATE sees [1 (filter for new row), 0, 1, 1, ..., 1, 1 (filter for new row), 255, 255, 255, ..., 255]. The gradient that DEFLATE didn't understand has become long runs of repeated symbols, a kind of structure DEFLATE does understand and compresses well.


  • Read the original article

    Comments

    • By fddrdplktrew 2024-03-1221:31

      There must be a bug in this generator because I keep seeing those long lines of same-color pixels... pseudo-random, maybe?

      Should have used the lava-lamp number generator.......

    HackerNews