Race Conditions
Multiple processes running at the same time messing with each other or interrupting code with other code to create brief flawed states
Last updated
Multiple processes running at the same time messing with each other or interrupting code with other code to create brief flawed states
Last updated
To make programs faster, developers often implement parallelism, or accept multiple connections at once, handling each one in a separate thread. This essentially makes it possible for pieces of code to run at the same time, which in actuality means the CPU can schedule freely between the instructions. While sometimes useful, this can also cause unexpected situations if these simultaneous processes depend on each other. Below you can see some different possibilities for instruction ordering if the P1
and P2
processes execute at the same time:
Regular execution, where the processes don't influence each other. This may happen randomly if the CPU decides to schedule it this way
P1
do_action()
is called between P2
check_input()
and P2
do_action()
, which may change the state that the check P2
verified, before performing the action
The whole of P2
executes in between P1
check_input()
and P1
do_action()
, which is a higher chance of being able to change the state that P1
checked
A more complex interaction, where first P2
do_action()
intercepts a check and action from P1
, but afterward P1
does the same back to a P2
do_action()
, causing 2 race conditions
These behaviors are called Time Of Check, Time Of Use (TOCTOU) vulnerabilities. The problem is when a program checks something securely, but in the time it is checked and the time it is used, something changes. This kind of vulnerability can also happen when a check is performed after the resource becomes usable , which opens up a timeframe where it can be abused before being deleted. It can cause all sorts of vulnerabilities depending on what the check is trying to prevent.
While reviewing code that can run at the same time, you should always think about if something can change between a check, and a use, as well as if anything is exposed for a small period of time. Often even the tightest timing windows are exploitable due to rapid attempts.
Exploitability is increased by making faster and more attempts, as well as increasing the vulnerable window. These factors depend on
A common place for these vulnerabilities to show up is on the filesystem. A file might be checked for malicious input like its length for a buffer overflow, but by the time the file is used by the program sometime after it might have changed, and become malicious when it was previously safe.
Here is an example of a classic vulnerability:
It's not possible to directly pass /flag
, because it would contain the "flag" string. Even a symlink referencing the file would not work because that is also checked before opening it. The vulnerability however is that the path is opened again by cat
after checking, which opens a time window where the file could turn into a symlink referencing /flag
.
To exploit it, we don't need to be successful every time. With many attempts, while quickly swapping in and out the file, we will at some time get lucky where the swap happens right in between the exploitable window. We can first build a simple while
loop in Bash that runs the program on a regular file over and over again:
Then as the attack, we need to write a regular file first that will pass the checks, and then replace that file with a symlink to the flag, which we hope will eventually happen right before cat
runs:
Running both of these loops in separate terminal windows, we find inconsistent output like this with the flag every once in a while:
RENAME_EXCHANGE
The example above works pretty often because there is a while startup of the cat
command between the check, and open()
'ing the file. The window will not always be this big if the check and read happen within the same process for example. In such cases, the whole while true
loop of creating a file, removing it, creating a symlink, and removing it will take too much time. A faster operation is moving a regular file in place, and then swapping it with a symlink file.
You might try making more bash commands, but there actually exists a perfect syscall that swaps two files, by performing renameat2
with the RENAME_EXCHANGE
option. The two files given as arguments are then renamed to each other, which can be looped infinitely as fast as possible to get much faster speeds and a higher chance of success:
Running this binary will swap the files you provide very quickly (~25.000/s). We can exploit the symlink example if we prepare a regular file and a symlink file, then swap them with this binary:
You will notice there are 0 errors of "No such file or directory", because we are never removing files, only renaming them.
The more stuff in between the check and the use, the higher the chance of success. If it takes a few seconds you may even be able to reliably exploit it manually without needing a script. Increasing this delay can be possible if you have some level of control over the operation. If some folder is removed there, for example, you could create a huge recursive structure of folders that would take some time to delete.
Another trick for the filesystem specifically, is when you provide a path for the file that should be opened. The deeper in nested folders your file is, the longer it will take Linux to find the file, as it has to keep entering directories and searching for the next one (paths are limited to 4096 bytes). This can be increased even more by using a chain of symlinks that eventually leads to your file (Linux limits you to 40 symlinks). Combining these can make a very slow path to follow, taking multiple times more milliseconds than a direct path. This is called a "filesystem maze".
Implementing this to its fullest potential would look something like this:
After running (which may take a couple seconds), we have a maze/
folder with all the directories, and importantly symlinks named a_
, b_
, ... t_
that go through the whole maze and eventually end up back at the start. You can put any file in the maze directory, or use ../
to move to another path where the target file is actually stored:
In programs, it is common to see different threads that perform tasks in parallel. These threads sometimes need to share data, with one quick and dirty way being global variables, a form of shared memory. Both threads can read and write on this variable and bugs can happen when they try to do this at the same time. Look at the following example:
Even though two additions of 1
took place, a
starts at 0 and ends at 1! This can happen if memory is read by one thread, and changed in the meantime before it is used and written back. This is why global variables should be used with care. Length checking for buffer overflows, for example, can have much more impact.
In general, if you can get the program in an unexpected state there may be possibilities to exploit it. If this state is even for a split second, you can find this window in a sophisticated attack.
Writing data also doesn't happen all at once. There is often a for()
loop that writes data byte-by-byte, and any time while writing the state of the global variable might be invalid. As you might expect these windows are way smaller than what we have been working within the Filesystem, and thus require more thought into maximizing the speed of attempts and increasing the attack window.
For testing, simple Python scripts and running in a debugger with pauses might be enough to find a bug, but when actually exploiting a raw binary where the window is a few CPU instructions, you ideally want a C program that can run attempts as fast as the system allows. In a network situation, it is also common to send a ton of data repeatedly commanding the server to do something, and a simple trick to quickly send one string over and over again is using yes
:
For a more delicate approach, you might need to implement some multithreading yourself which can send multiple commands at the same time, triggering a race condition:
All previous attacks talked about multiple processes and multiple threads. But in some cases, it is even possible to cause race condition effects in a single thread! This is possible by interrupting the execution with something else, like a signal handler. These are defined with the signal()
function in C and require a signal number as well as a pointer to the signal handler function. A program can define what happens if a certain signal is sent to it, like an alarm()
signal (SIGALRM
=14
) going off.
The important part is that you can send any signal to the process yourself with the kill
command! This means whatever code is in the signal handler, you can run whenever you want during execution. If this handler makes some change between a check and the use of any piece of code, multithreaded or not, it can mess with the state and lead to exploitable behavior.
First, you can recognize a signal handler like this:
The 0xe
(14) is the signal number, and a full list can be found with kill -l
:
If a process is running, we first need to find its Process ID, which we can find with ps aux
or if we know the name of the binary simply pidof [name]
. Then we use kill -[N] [PID]
to send any signal number to the process at runtime:
As this is just another syscall, it can also be executed very quickly on a process to more reliably cause a race condition.
While not having much to do with binary exploitation, race conditions are also very common on the web. These web servers often require being able to accept multiple connections at the same time which requires parallel processing for everything a request can do. A classic example is applying a coupon code, which possibly looks like this in the backend code:
The logic seems correct until we think about what possible flawed states this code can be in. It is expected that we have either not updated anything yet, or that the price and coupons have both been updated. However, the case in between where the cart price is updated, but the coupon itself has not, is a vulnerability here. If we would try to apply coupons very quickly in multiple threads at the same time the code might do the following:
Thread #1 checks if the coupon is used: it is not
Thread #2 checks if the coupon is used: it is not
Thread #1 updates the price from $100 to $90
Thread #2 updates the price from $90 to $81
Thread #1 inserts the coupon as used
Thread #2 inserts the coupon as used
Here we have gotten two reductions out of one coupon code. Going further, there is no hard limit to how many connections we can make at the same time, so this number can go even higher if we carefully send multiple requests at almost the exact same time. This is easily done with the free Turbo Intruder Burp Suite extension which has an example/race.py
script that opens up connections first, and sometime later opens the floodgates to finish all requests at the same time, increasing the chance of a successful race condition significantly.
To use it, right-click any request you want to attempt to spam against the server (likely by Intercepting and then Dropping to not send it yet), then choose Extensions -> Turbo Intruder -> Send to Turbo Intruder. Choose the right script and if you don't need different payloads per request, remove the target.baseInput,
argument from the engine.queue()
call. Then you can press Attack to start and analyze all requests and their responses in the table of results.
A common vulnerable pattern seen in the web is a small window of time where some resource is usable when it shouldn't be. When a webserver in the backend writes some files for example that it later protects, there is a small attack window where the files may be accessible before they are protected. This can for example be seen with file uploads if they are removed after being found to contain malicious content like PHP tags:
Tip: This specific example is flawed for two reasons because a <?
filter can be bypassed with a .htaccess
file and UTF-7 encoding, see my writeup for an example
The above code is vulnerable to a Race Condition because an invalid state exists where the data is put on the disk, but its content is not checked yet. With Turbo Intruder or any other tool you can repeatedly request the URL you think the file will show up at, and at the same time try uploading the file with this snippet of code, which might cause your Intruder to fetch the file contents right in between the write and the check, allowing you to execute any PHP code without the filter.