Sandboxes (chroot, seccomp & namespaces)
Escaping from sandboxed environments by exploiting the capabilities that were left open
chroot
chroot is a command and syscall that means Change Root, which will change the meaning of /. You provide it with a path to a directory that will be the jail, and it does two things (source):
Point / to the jailed directory (e.g. /tmp/jail)
While inside the jail, ../ will not go up further than the root (/tmp/jail/.. -> /tmp/jail)
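This command or syscall normally needs root permissions to work. From C it is called as chroot("/tmp/jail"), and from the shell as chroot /tmp/jail followed by the program to run inside the jail (for example /bin/sh).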
Afterward, any path starting with / will be relative to /tmp/jail, and if your current directory is /tmp/jail, any ../ attempt will stop at /tmp/jail. A common pitfall is that paths like /bin/bash, and the dynamically linked libraries a shell needs, are also resolved relative to the jail, so everything the jailed program requires has to be copied into the jail directory for it to work.
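The pitfall above can be checked before entering a jail. The following is a minimal Python sketch, not part of any chroot tooling; the helper name missing_in_jail and the example paths are assumptions made for illustration. It maps each absolute path the jailed program will need to its location under the jail directory and reports the ones that are absent.

```python
import os
import posixpath


def missing_in_jail(jail, required):
    """Return the entries of `required` that do not exist under `jail`.

    Hypothetical helper: `required` holds absolute paths as the jailed
    process will see them, such as "/bin/bash" or a shared library.
    """
    missing = []
    for path in required:
        # Normalise first so ".." segments cannot point above the jail root.
        inside = posixpath.normpath("/" + path.lstrip("/"))
        if not os.path.exists(jail + inside):
            missing.append(path)
    return missing
```

Calling missing_in_jail("/tmp/jail", ["/bin/bash"]) would return ["/bin/bash"] until the shell has been copied to /tmp/jail/bin/bash. The sketch only checks existence and does not follow symlinks the way the kernel would.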
Importantly, what it does not do is:
Close existing resources (file descriptors)
Change the current working directory into the jail
It is not intended to be a security measure, as it is very limited in what it does, and many tricks can get past its protections, as will be explained in the following section.
Bypasses
Current directory still outside chroot()
One simple problem might be that the current directory is not set inside the jail. You can then still access any file in the directory you were in before entering the jail, and use ../ sequences freely to climb out of it. The only catch is that paths starting with / will still be relative to the jail.
Overwrite chroot()
Another big problem is that a process has only one root at a time, meaning that a second chroot() call replaces the previous one. Remember that only users with the CAP_SYS_CHROOT capability can call it, but if you are able to, it is trivial to escape the jail, even from inside it. This can be done by moving the root to somewhere you are not, such as a new directory you make: your current directory then lies outside the new root, so ../ is no longer stopped.
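Both bypasses so far follow from one rule: ../ is only stopped when the lookup is standing exactly on the root. The toy model below illustrates this in Python. It is a sketch under simplifying assumptions (no symlinks, mounts, or permission checks), and the Proc class with its methods is hypothetical, tracking host paths as plain strings rather than calling the real syscalls.

```python
import posixpath


class Proc:
    """Toy model of the two per-process fields that matter: root and cwd."""

    def __init__(self, root="/", cwd="/"):
        self.root = root
        self.cwd = cwd

    def chroot(self, path):
        # Like chroot(2): only the root changes, the cwd is left alone.
        self.root = self.resolve(path)

    def chdir(self, path):
        self.cwd = self.resolve(path)

    def resolve(self, path):
        """Return the host path that `path` refers to for this process."""
        # Absolute paths start at the process root, relative ones at the cwd.
        cur = self.root if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                # ".." is only blocked while standing exactly on the root.
                if cur != self.root:
                    cur = posixpath.dirname(cur)
            else:
                cur = posixpath.join(cur, part)
        return cur


# Bypass 1: the cwd was never moved into the jail.
p = Proc(root="/", cwd="/home/user")
p.chroot("/tmp/jail")
print(p.resolve("../../etc/passwd"))   # host path outside the jail

# Bypass 2: a second chroot into a subdirectory leaves the cwd outside the root.
q = Proc(root="/tmp/jail", cwd="/tmp/jail")
q.chroot("x")          # "x" stands for a directory created just before
q.chdir("../..")
q.chroot(".")
print(q.root)
```

With the real syscalls, the second sequence would correspond to mkdir, chroot, repeated chdir(".."), and a final chroot("."), all of which require CAP_SYS_CHROOT.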
Open Resources
The last trick is using already-opened resources that are outside the jail, which you can still interact with. If you are able to open the /flag file before being jailed, for example, you can still read from its file descriptor afterward (the first free descriptor is usually 3).
The same goes for open directories, which you can use as a different base for relative paths with syscalls like openat instead of open, or fchmodat instead of chmod.
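As a minimal sketch of the directory-descriptor idea in Python: os.open accepts a dir_fd argument, in which case the path is resolved relative to that open directory rather than the current one. The function name read_via_dirfd and the file name in the usage note are assumptions for illustration.

```python
import os


def read_via_dirfd(dir_fd, name, size=4096):
    """Read up to `size` bytes of `name`, resolved relative to `dir_fd`.

    With `dir_fd` set, the lookup starts at the directory behind the
    descriptor (openat semantics) instead of at the current directory.
    """
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)
```

If a descriptor for a directory outside the jail was opened before the chroot call, something like read_via_dirfd(dfd, "flag") would still reach files below that directory afterward.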
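This last method is even more powerful because if you are able to start the program yourself via bash (for example a SetUID binary), you can let bash open a directory for you using the [n]< path syntax, for example by adding 3< / to the command line so that file descriptor 3 refers to the real root directory.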
Using the fchdir syscall you can also use this trick to change the current directory to one outside the chroot, and use ../ directory traversal tricks again.
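A short Python sketch of the fchdir step, assuming the process already holds a descriptor for a directory outside the jail (for instance descriptor 3, inherited from the shell). The function name enter_directory_fd is hypothetical.

```python
import os


def enter_directory_fd(dir_fd):
    """Make the directory behind `dir_fd` the cwd and list its entries."""
    os.fchdir(dir_fd)  # fchdir(2) takes a descriptor, so no path lookup is needed
    return sorted(os.listdir("."))
```

After enter_directory_fd(3), relative paths and ../ sequences start from that directory instead of the jail.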
seccomp
Seccomp (secure computing mode) is built to be a security mechanism, unlike chroot. It is a highly customizable way to restrict which syscalls are allowed and does so like a network firewall (it even uses the Berkeley Packet Filter, originally made for networking). Often you will see this as an allowlist (all blocked except a few) or a blocklist (all allowed except a few). Blocklists are inherently dangerous because a developer may forget a dangerous call with unexpected functionality, but syscalls in an allowlist may still give unnecessary permissions that can be exploited.
A simple blocklist built with seccomp allows everything by default, adds a kill rule for each blocked syscall, and then loads the filter. Loading is an irreversible action: the rules apply to any code that runs later in the program, such as shellcode and even child processes or forks, but also to the program's own code. Because of this, the rules need to be lenient enough to allow regularly required code, but not so lenient that an exploit can abuse it.
Reading the rules
A good start in trying to bypass these rules is understanding them correctly. While static analysis might be enough for a simple program, a more complex one can be easier to understand through dynamic analysis. The seccomp-tools project implements handy utilities for extracting seccomp rules.
The simplest and most common command is seccomp-tools dump, which takes a binary that it will run. Using ptrace, it extracts the seccomp rules at runtime and prints them to the console.
Seccomp rules are built with Berkeley Packet Filters (BPF), meaning they have instructions and code flow like assembly. This is what you see dumped and can analyze.
Take a blocklist of open, openat, and execve as an example. Its dump first checks whether the architecture is x86-64 (64-bit syscalls); if not, it will goto the return KILL instruction, blocking any 32-bit syscalls. Then, step by step, the syscall number is compared against open, openat, and execve, and the process is killed if it matches any of those. When a syscall passes all the checks, it ends at return ALLOW, continuing with the syscall.
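The control flow described above can be mirrored in a few lines of Python to make it easier to reason about. This is only a model of such a filter, not real BPF and not output of seccomp-tools; the function name run_filter is an assumption, while the architecture constants and x86-64 syscall numbers are the kernel's values.

```python
AUDIT_ARCH_X86_64 = 0xC000003E  # value from <linux/audit.h>
AUDIT_ARCH_I386 = 0x40000003

# x86-64 syscall numbers for the three blocked calls
SYS_OPEN, SYS_EXECVE, SYS_OPENAT = 2, 59, 257
BLOCKED = (SYS_OPEN, SYS_OPENAT, SYS_EXECVE)


def run_filter(arch, nr):
    """Mimic the filter: return "KILL" or "ALLOW" for one syscall."""
    # Step 1: anything that is not a 64-bit syscall jumps straight to KILL.
    if arch != AUDIT_ARCH_X86_64:
        return "KILL"
    # Step 2: compare the syscall number against each blocked entry in turn.
    for blocked in BLOCKED:
        if nr == blocked:
            return "KILL"
    # Step 3: nothing matched, so the syscall goes through.
    return "ALLOW"
```

For example, run_filter(AUDIT_ARCH_X86_64, 0) models a 64-bit read and is allowed, while any call with AUDIT_ARCH_I386 is killed regardless of its number.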
In more practical situations the seccomp filter might only activate with certain arguments or configuration, so the tool won't find it by just running the binary. A simple trick for passing arguments is creating a .sh file that starts the program how you want it, and then analyzing that script instead.
Lastly, you can even attach to a running process with this tool if the process is in a seccomp'ed state you would like to analyze.
Bypasses
There is no clean-cut way to "bypass" any seccomp configuration, and it really depends on what specific syscalls are allowed or denied. With that being said, there are some tricks developers might not expect that can still lead to a big impact (source).
Overly permissive policies
When common syscalls like open are blocked, there may be syscalls the developer forgot to block. Something like openat might still be allowed, while it can do almost the same using a Directory File Descriptor (DFD). In this specific case, a useful constant is AT_FDCWD, which has a value of -100. It is a default DFD that points to the current working directory, meaning it can be used as a valid DFD in the ...at versions of syscalls.
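To see AT_FDCWD in action without writing shellcode, the libc openat wrapper can be called from Python through ctypes. This is a Linux-only sketch; the function name openat_read is an assumption, and in an actual exploit the same three arguments would be placed in registers for the raw syscall.

```python
import ctypes
import os

AT_FDCWD = -100  # special DFD meaning "the current working directory"

# Load the C library already linked into the interpreter (Linux).
libc = ctypes.CDLL(None, use_errno=True)
libc.openat.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
libc.openat.restype = ctypes.c_int


def openat_read(path, size=4096):
    """Read `path` using openat(AT_FDCWD, ...) instead of open()."""
    fd = libc.openat(AT_FDCWD, os.fsencode(path), os.O_RDONLY)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)
```

With AT_FDCWD as the DFD, relative paths behave exactly as they would with open, and absolute paths ignore the DFD entirely.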
There are simply a ton of syscalls, which makes a blocklist hard to get secure. You can check out a table of all syscalls to find one that seems interesting and is allowed, and get more information about an unknown syscall using the man 2 command (e.g. man 2 openat).
Architecture Confusion
This is a special case. On x86 there are two ways to make a syscall: the syscall instruction for 64-bit and int 0x80 for 32-bit. Each uses its own syscall number table, with the number passed in rax and eax respectively. By default, seccomp will kill all 32-bit syscalls. However, in certain non-default situations, you might find the 32-bit syscalls are enabled and more permissive than 64-bit. They are enabled by adding the 32-bit architecture to the filter (with libseccomp, a seccomp_arch_add call with SCMP_ARCH_X86), and afterward need to be handled separately from 64-bit syscalls.
To exploit this, simply use a 32-bit syscall table and int 0x80 instructions instead of syscall. During compilation you don't need to do anything special, since int 0x80 assembles into a 64-bit binary like any other instruction.
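Why the two tables matter can be shown with a small Python model. It assumes a hypothetical filter that lets 32-bit syscalls through but only lists x86-64 numbers in its blocklist; the function names are made up for illustration, and the syscall numbers are taken from the kernel's x86-64 and i386 tables.

```python
# A few syscall numbers per ABI.
X86_64 = {"read": 0, "write": 1, "open": 2, "execve": 59, "exit": 60}
I386 = {"exit": 1, "read": 3, "write": 4, "open": 5, "execve": 11}

# Blocklist written with only the 64-bit numbers in mind.
BLOCKED_64 = {X86_64["open"], X86_64["execve"]}


def blocklist_allows(nr):
    """Hypothetical filter: compares the raw number, ignoring the ABI."""
    return nr not in BLOCKED_64


def allowed_via_int80(name):
    """Would the 32-bit variant of `name` pass that blocklist?"""
    return blocklist_allows(I386[name])


for name in ("open", "execve"):
    print(name, "64-bit:", blocklist_allows(X86_64[name]),
          "32-bit:", allowed_via_int80(name))
```

In this model both open and execve are blocked through syscall but pass through int 0x80, because their 32-bit numbers (5 and 11) are not on the list.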
Side Channels
While you might want to execute a shell using execve()
, sometimes leaking secrets can be enough. If you find yourself being able to access sensitive information without being able to exfiltrate it to yourself, think about possible Side Channels. Even 1 bit of information can eventually be a full secret if repeated often enough. Here are some ideas:
The exit code of the program is 8 bits (0-255), using exit(): In bash, you can check the exit code of the previous command with the $? variable, and executing a program in any programming language often returns its 8-bit exit code. If you are able to read it you can exfiltrate 8 bits in one go, like one character of a string. Then repeatedly do this for each character in the string.
The runtime of a program, similar to Blind SQL Injection (sleep(), long computation, loop): To be efficient, this is a balance between a low wait for fast attempts, and a long enough wait to be able to confidently measure the difference. This scenario can be useful if there is really no response from the program, like in a remote setting.
Crash vs no crash: In some cases, it is obvious that a program crashed, because the application explicitly told you with an error message or simply if some expected output is missing. This can tell you 1 bit of information by either crashing or continuing execution, and the way to exploit this is similar to using the runtime of the program.
Using the exit code is pretty straightforward. We will read the flag into memory, then load a byte of it into the exit() argument, and call the function. Leaking the first byte this way might give exit code 67 when we check using echo $?, which corresponds to the C character. To leak the whole secret, we simply keep doing this while incrementing the offset.
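The loop over offsets could look roughly like the Python sketch below. Since the real target and payload are not available here, a stand-in child process plays the sandboxed program: it exits with the byte at the requested offset of a made-up secret. The names CHILD, leak_byte, and leak_secret, as well as the secret itself, are assumptions for illustration.

```python
import subprocess
import sys

# Stand-in for the sandboxed target: exits with the byte at the given offset,
# or 0 past the end. In a real attack this logic would be the injected payload.
CHILD = (
    "import sys; s = b'CTF{demo}'; i = int(sys.argv[1]); "
    "sys.exit(s[i] if i < len(s) else 0)"
)


def leak_byte(offset):
    """Run the target once and return its 8-bit exit code."""
    return subprocess.run([sys.executable, "-c", CHILD, str(offset)]).returncode


def leak_secret(max_len=64):
    """Collect exit codes for increasing offsets until a 0 byte is seen."""
    out = bytearray()
    for offset in range(max_len):
        byte = leak_byte(offset)
        if byte == 0:
            break
        out.append(byte)
    return bytes(out)
```

Stopping at a 0 exit code is a design choice that assumes the secret contains no null bytes; a fixed length works as well if it is known.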
When the exit code is not directly visible, you might be able to get a boolean response using the time or crash method. Implementing this can be done in various ways, but the simplest and most efficient way is to simply take the nth bit, and decide what to do depending on that bit.
To crash the program, you could read from or write to an unmapped address for a 1, and simply ret for a 0. This is nicer than a time-based boolean result because you get an instant yes/no response, allowing quicker and more confident extraction. To extract a single bit in assembly, you can first load the byte at a byte offset, then shift it right by a bit offset and mask the result with 1.
We can then dynamically change this payload to get any byte and bit we need. A Python script can go through all the bits to slowly recover the whole secret.
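A minimal sketch of such a script, assuming a boolean oracle that answers "did the target crash?" for a given byte and bit offset. Here the oracle is simulated locally with a made-up secret; the names SECRET, crashes, and recover are hypothetical, and in a real exploit crashes would send the payload and observe the result.

```python
SECRET = b"CTF{demo}"  # hypothetical; only the oracle may look at it


def crashes(byte_off, bit_off):
    """Stand-in oracle: True ("crash") if the selected bit is 1.

    In a real exploit this would send a payload built for these offsets
    and report whether the target crashed.
    """
    if byte_off >= len(SECRET):
        return False
    return (SECRET[byte_off] >> bit_off) & 1 == 1


def recover(length, oracle=crashes):
    """Rebuild `length` bytes with 8 oracle queries per byte."""
    out = bytearray()
    for byte_off in range(length):
        value = 0
        for bit_off in range(8):
            # Each query yields one bit; put it back at its position.
            if oracle(byte_off, bit_off):
                value |= 1 << bit_off
        out.append(value)
    return bytes(out)
```

The same recover loop works for a time-based oracle by swapping in a function that measures the response time against a threshold.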
Namespaces
The workings of namespaces can get very complex, so this section will not go very deep. It will only show a few simple ideas on how to escape from namespaces with specific permissive features.
One dangerous part of namespaces is the ability to mount the host filesystem in the sandboxed environment. If you can read/write as a high-privilege user in the sandbox, you can do the same on the mount. This is most interesting when you have access to a low-privilege user on the host machine, and access to a high-privilege user inside the sandbox. These can interact with each other to possibly receive high privileges on the host machine.
Imagine there is a /data directory from the host mounted at /tmp/data in the sandbox. When you can write here, you can create a SetUID binary as the high-privilege user and then run it as the low-privilege user on the host machine.
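The step performed inside the sandbox amounts to copying a binary into the shared directory and setting the SetUID bit on the copy. A rough Python sketch follows; the function name plant_setuid_copy is hypothetical, and it assumes the destination filesystem is not mounted with an option that ignores SetUID bits.

```python
import os
import shutil
import stat


def plant_setuid_copy(src, dst):
    """Copy `src` to `dst` and set the SetUID bit (mode 4755) on the copy.

    The copy is owned by whoever runs this, so inside the sandbox that
    would be the high-privilege user.
    """
    shutil.copyfile(src, dst)
    os.chmod(dst, 0o4755)
    return stat.S_IMODE(os.stat(dst).st_mode)
```

Inside the sandbox this might be called as plant_setuid_copy("/bin/bash", "/tmp/data/bash"), after which the low-privilege host user would find the file at /data/bash. Note that bash drops SetUID privileges unless started with the -p option.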
Even when there is no directory explicitly mounted, you may be able to write a SetUID shell and access it on the host through the jail directory. Similarly to chroot, namespaces can use pivot_root to move / somewhere else. If you can make this directory accessible to the low-privilege host user, however, you may be able to access the SetUID shell again because it exists on the same filesystem.