Hashing
One-way functions that generate a unique hash of some data
Note: This chapter is about attacking hashes in a cryptographic way. For information about brute-force cracking hashes of passwords, for example, see Cracking Hashes
Collisions
Collisions in hashing functions mean two different inputs result in the same hash. In a perfect hash function, this should not be possible, or at least infeasible. There are a few types of collisions with varying exploitability:
For lots of details on how these attacks work and how to exploit them see the following GitHub repository:
MD5 - Identical Prefix
When looking at collisions MD5 is very broken. Nowadays it's trivial to create your own Identical Prefix attack within minutes. We can use a tool like HashClash with very efficient code to do the work for us:
With MD5 you can create any identical blocks consisting of 64 bytes as the prefix, then two collision blocks that differ, and finally any identical suffix.
An example of two files you could create with this is the following:
To create such a file you can use the poc_no.sh script from the HashClash repository. It takes one argument which is the prefix for the collision. It can contain:
Any exact multiple of 64 bytes, as the identical prefix
A multiple of 4 bytes, with a maximum of 12 bytes in total. These will be the starting bytes for the collision blocks
In the example above I used a file containing "A"*64 + "B"*64 + "C"*64 + "test"
as the prefix. This will make sure the identical prefix starts with AAA...CCC and the collision blocks start with "test".
Then after this, I added "D"*64 + "E"*64 + "F"*64
to the generated collisions because any data after will only change the hash, but the collision will remain.
MD5 - Chosen Prefix
The chosen prefix attack is a lot more powerful but also takes quite a bit longer to compute. It takes about one day to do one collision between files, depending on your computer.
With such a collision you could make two completely different files have the same md5 sum, only having a few collision blocks at the end, and allowing an identical suffix.
To create a collision like this, you could use the cpc.sh script from HashClash. It takes two different prefix files as input and creates two files with those prefixes and some collisions block appended. Then you can manually add an identical suffix to it later because the collision will remain.
I've let a VPS with 24 cores run for 1.5 days to find a chosen-prefix collision like this. I chose one prefix of a simple 256x256 png image, and the other prefix to be an XSS and PHP shell payload. So I could leave the terminal and look back at it later I used the screen
command to start a session, and used screen -r
every once in a while to check back into it. Another way would be to redirect the output to some log file to check.
The cpc.sh script will then find a collision by appending collision blocks to both prefixes, which is why I added a <!--
comment tag to the shell, and I made sure to add the IEND
and CRC to the PNG image to signify the end of the PNG. Some tools like pngcheck
complain about data after the IEND
, but all other things I've tried parse the PNG completely fine. Here are the original prefixes:
Then after the collision, there were 9 blocks of 64 bytes added. You can see the raw collision files below:
SHA1
Google Research has found an identical prefix collision in the SHA1 hashing algorithm, and so far is the only one to do so. It still takes 110 years of single-GPU computations to compute a collision yourself, so the only practical way right now is to use the prefix from Google.
SHA1 works by splitting the input into blocks of 512 bits (64 bytes). For every block, it does its operations, and if the two blocks are the same, the two outputs of that block are the same. It keeps going taking the previous block and continuing on it with the next block.
A collision in SHA1 means that there were 2 sets of 5 blocks (320 bytes) found, that when SHA1 hashed give the same hash, while actually being different.
Because of the way SHA1 works, we can start off by using the first 5 blocks from SHAttered, and then if the rest of the files have identical content, their hashes will be the same.
To get different behavior from the two files, a common idea is to check if a certain byte that is different in the collision blocks. This way both files contain both pieces of code, but only one is chosen in both.
In HTML, this can be done by looking at the innerHTML
with charCodeAt(102)
in JavaScript. For a simple example see:
Length-extension Attack
Hashing algorithms are sometimes used for verifying messages. This can be done by appending a secret "salt" value in front of the data that only the server knows. If the server then generates a hash of this value, it is the only party that is able to do so. If you don't have the salt value you cannot generate a hash that starts with that salt.
But the length-extension attack makes this semi-possible. It allows you to add data to an existing hash, with the catch that there will be some data prepended to your addition.
Hashing functions like SHA256 or SHA512 might sound secure, but without proper precautions to this attack, they may be vulnerable to it. The following hashing algorithms are all vulnerable to this attack:
MD4
MD5
RIPEMD-160
SHA-0
SHA-1
SHA-256
SHA-512
WHIRLPOOL
To see an example of this attack look at the script below:
I also made a writeup of a challenge that uses this attack to perform SQL Injection:
Last updated