Reverse Engineering for Pwn
Understand the binary and find vulnerabilities by analyzing it
Binary exploitation always starts with understanding the binary. This can be done in two ways: static analysis and dynamic analysis. In static analysis, you look at the code of the binary itself without executing it. In dynamic analysis, you run the binary and try inputs that might do something interesting. Often the best choice is a combination of both.
For really simple programs, you might get away with just dumping the assembly code and looking through it:
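As a minimal sketch of dumping the assembly, the snippet below wraps objdump from GNU binutils in Python. It assumes objdump is installed and that the target is an x86 binary (the -M intel option selects Intel syntax there). The helper names and the ./vuln path are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def objdump_command(binary_path, intel_syntax=True):
    """Build the objdump command line that disassembles the executable sections."""
    cmd = ["objdump", "-d"]           # -d: disassemble executable sections
    if intel_syntax:
        cmd += ["-M", "intel"]        # x86 only: Intel instead of AT&T syntax
    cmd.append(binary_path)
    return cmd


def disassemble(binary_path):
    """Run objdump on the binary and return the assembly listing as text."""
    if shutil.which("objdump") is None:
        raise RuntimeError("objdump not found on PATH")
    result = subprocess.run(
        objdump_command(binary_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical usage against a local challenge binary:
#   print(disassemble("./vuln"))
```

The same listing can be produced directly in a shell by running the command that objdump_command returns.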
However, assembly is very low-level and makes it hard to see the big picture. That is why we use decompilers, which try to guess what the original source code might have looked like. Common ones include IDA and Ghidra.
When reading the decompiled C code, follow the steps the program takes and pay special attention to where your user input goes. It may be written to a buffer that is smaller than the input the program accepts, which lets you overflow it. When you find such a case, you can switch over to dynamic analysis to test your ideas.
A good thing to know before jumping straight into dynamic analysis on a compiled binary is that if you have the C source code, you can compile it yourself with debug symbols by adding the -g argument, for example: gcc -g vuln.c -o vuln (the file names here are placeholders).
This will not only show the source code while debugging but also local variables and structs. With commands like p [variable] you can print local variables in a readable form; for structs, this includes the names of their fields.
Dynamic analysis means running the program and testing things. In the case of a buffer overflow, your input is bigger than the buffer it is being put into. While you can try to find this through static analysis, in most cases it is easiest to just feed the program a large input and see if it crashes.
Programs often have input in the form of STDIN (Standard Input, typing after the program is started), or from command-line arguments. In some cases, it may also read files, connect to sockets, or more. When you find any sort of input it is a good idea to try putting a large string in there just to be sure. For simpler binary exploitation challenges this will almost always find you the vulnerability quickly:
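A rough sketch of that test in Python: run the target with a large input and check whether it was killed by SIGSEGV. The helper name segfaults and the ./vuln path are hypothetical, and the check assumes a POSIX system, where subprocess reports death by signal N as return code -N. A child Python process that sends SIGSEGV to itself stands in for a vulnerable binary so the snippet runs on its own.

```python
import signal
import subprocess
import sys


def segfaults(argv, stdin_data=b"", timeout=5):
    """Run argv, feed stdin_data on STDIN, and report whether it died from SIGSEGV."""
    proc = subprocess.run(
        argv,
        input=stdin_data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return proc.returncode == -signal.SIGSEGV


big = b"A" * 200  # far larger than a typical small stack buffer

# Stand-in for a crashing binary: a child process that sends SIGSEGV to itself.
crasher = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
print(segfaults(crasher, stdin_data=big))

# Hypothetical usage against a local challenge binary:
#   segfaults(["./vuln"], stdin_data=big)    # input via STDIN
#   segfaults(["./vuln", big.decode()])      # input via a command-line argument
```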
When you see a Segmentation fault message, it is a clear sign that something in the program's memory was corrupted, causing it to crash. To view the error in more detail, you can run the dmesg command, which shows the kernel log entries written for such a fault.
From here, you'll often want to find the exact offset in your payload to know what will be overwritten. This is usually done with a de Bruijn sequence, also known as a cyclic pattern. This is a string in which every substring of a given length (4 bytes by default in pwntools) appears only once, so if some piece of it shows up in an error, you can easily find where in the string that piece came from. By contrast, if you just saw "AAAA", you would not know which part of the 200 A's it came from.
A pattern like this can easily be generated using pwntools:
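With pwntools installed, its cyclic and cyclic_find functions do this for you. The sketch below is a pure-Python stand-in that shows the idea without that dependency: it uses the standard Lyndon-word construction of a de Bruijn sequence over the lowercase alphabet with 4-byte windows. The names pattern and pattern_find are hypothetical, and this is not pwntools' own code.

```python
import string
from itertools import islice


def de_bruijn(alphabet, n):
    """Lazily yield a de Bruijn sequence over alphabet with window length n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern(length, n=4):
    """Return the first length bytes of the cyclic pattern."""
    return "".join(islice(de_bruijn(string.ascii_lowercase, n), length)).encode()


def pattern_find(chunk, length=8192, n=4):
    """Return the offset of chunk inside the pattern, or -1 if it is absent."""
    return pattern(length, n).find(chunk)


print(pattern(20))            # b'aaaabaaacaaadaaaeaaa'
print(pattern_find(b"faaa"))  # 20
```

Because every 4-byte window is unique, any four pattern bytes that end up in a register or on the stack identify exactly one offset in the input.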
Buffer overflows are about controlling the instruction pointer. When a crash happens, it is often because of a ret (return) instruction, which pops an address from the stack and jumps to it. If you have overflowed a stack buffer, you may have overwritten this return address, and the program will try to jump to the address that your text represents. This address is easy to find using GDB with the GEF extension: run the binary in it, and when it crashes you will get a lot more information about the crash.
In GDB, the x $rsp command examines the memory that the stack pointer register points to. The ret instruction that the program crashes at takes the first value from there and jumps to it. In this example the value was 0x6161616f, which we can look up with the cyclic tool from pwntools (for example, cyclic -l 0x6161616f); it reports an offset of 56.
Now we know that we need to provide 56 A's to fill the buffer, and that the bytes that follow become the instruction pointer. Finding this offset is something you will have to do for almost every buffer overflow you find, so it is good to get used to it.
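To make the link between the crash value and the payload concrete, here is a small sketch using only the standard struct module. It assumes a 64-bit little-endian target. The address 0x401136 is a made-up placeholder, not taken from any real binary.

```python
import struct

crash_value = 0x6161616f                 # value seen at the stack pointer in the example
chunk = struct.pack("<I", crash_value)   # little-endian: the lowest byte comes first in memory
print(chunk)                             # b'oaaa', the piece of the pattern to look up

offset = 56                              # offset of that piece, as found in the text

# Hypothetical address to return to; replace it with a real target from your binary.
target = 0x401136

# Filler up to the saved return address, then the new 8-byte return address.
payload = b"A" * offset + struct.pack("<Q", target)
print(len(payload))                      # 64
```

The byte order matters: the debugger shows the value as a number, while the pattern holds the same bytes in memory order, which is why 0x6161616f corresponds to "oaaa" and not "aaao".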
In GDB, you don't have to type all your input yourself. Similarly to bash, you can redirect input from a file into your binary. This can also be really useful for inputting special characters while testing your payload.
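One way to prepare such a file is to write the raw bytes from Python, which sidesteps the problem of typing non-printable characters. The payload contents and file name below are placeholders. The GDB lines in the comments rely on GDB's run command accepting shell-style redirection.

```python
import os
import tempfile

# Placeholder payload: filler followed by bytes that cannot be typed on a keyboard.
payload = b"A" * 56 + b"\xef\xbe\xad\xde\x00\x00\x00\x00"

# Write the payload in binary mode so no bytes are translated or re-encoded.
path = os.path.join(tempfile.mkdtemp(), "payload.bin")
with open(path, "wb") as f:
    f.write(payload)
print(path)

# Then, inside GDB (hypothetical session, ./vuln is a placeholder binary):
#   gdb ./vuln
#   (gdb) run < /path/to/payload.bin
```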