RSA

An encryption standard using prime number factorization to encrypt and decrypt with an asymmetric keypair

Description

A big description of the whole RSA algorithm, and equations

Symbols

  • n: Modulus, part of public key

  • p and q: The prime factors of n

  • e: Exponent, part of public key

  • c: Ciphertext, result after encrypting

  • d: Decryption exponent, part of the private key

  • m: Message, plaintext

  • ϕϕ or phi: Decryption modulus

Equations

  • n=pqn=pq, where pp and qq are distinct primes.

  • ϕ=(p1)(q1)ϕ=(p−1)(q−1)

  • e<ne<n such that gcd(e,ϕ)=1gcd(e,ϕ)=1

  • d=e1modϕd=e^{-1} \bmod ϕ

    • ed=1modϕe*d=1 \bmod ϕ

    • dp=dmodp1d_p=d \bmod p-1

      • edp=1modp1e*d_p=1 \bmod p-1

    • dq=dmodq1d_q=d \bmod q-1

      • edq=1modq1e*d_q=1 \bmod q-1

  • c=memodnc=m^e \bmod n, where 1<m<n1<m<n

  • m=cdmodnm=c^d \bmod n

Python

RSA is a mathematical cryptosystem that doesn't support strings straight away. That's why we use the long_to_bytes() functions here to convert the strings to Big Integers first for the calculations, and then back to a string to display the text

RsaCtfTool has a lot of attacks built-in for common challenges. See test.sh and the test() functions in the attacks on their GitHub for lots of examples of what inputs an attack needs

Common options

  • --private: Display private key if recovered, should always be included

  • --attack: Specify the attack modes (default: all)

  • -n N: Specify the modulus (format: int or 0xhex)

  • -e E: Specify the public exponent, using commas to separate multiple exponents (format: int or 0xhex)

  • --uncipher UNCIPHER: Uncipher a ciphertext, using commas to separate multiple ciphers

  • --publickey PUBLICKEY: public key file. You can use comma-separated or wildcards (* or ?) for multiple keys.

Attacks

A collection of how to exploit attacks on specific RSA cases

Private key from Public key

This automatically tries many common vulnerabilities in private key generation to guess it:

Factoring manually

The whole security of RSA comes from the difficulty of finding the private factors p and q that multiply to the public n. With huge numbers generated completely randomly, this task is impossible for today's computers. But for smaller numbers or numbers with specific patterns this task may become doable. The method above tries many patterns of primes, but if the primes are small we can try to factor them ourselves.

First, a good idea is to check if this hasn't been done already. FactorDB has a giant database of already factored numbers, some of which are surprisingly big. It does not contain every computable number though, so sometimes you'll want to do it manually. One tool that does this very quickly and efficiently is yafu (install):

Efficiently compute prime factors of a number on your own machine

Small exponent, short plaintext (root)

With a small exponent, the plaintext (m) will not be very large after exponentiation. Then after the modulus n is applied only a few iterations of k will be done, or even none if me<nm^e<n. This means that we can just iterate over k until we find a perfect integer root.

A good indicator of this is when c is significantly smaller than n. Here's an example where e=3, resulting in the equation we can brute-force:

c=m3modnm3=c+n×km=c+n×k3\begin{align*} c &= m^3 \bmod n \\ m^3 &= c + n \times k \\ m &= \sqrt[3]{c + n \times k} \end{align*}

Chinese Remainder Theorem (CRT)

The attack using the Chinese Remainder Theorem is a more powerful version of the me<nm^e<n idea from above. Instead, it works when m<nm<n, and you have ee amount of different cc's and nn's with the same plaintext. So often when the message isn't too long (no padding) and you have multiple ciphertexts and public keys, you can use this attack.

Let's say that in RSA you use e=3e=3 (can be any exponent just requires more samples). Then you would need 3 ciphertext and public key examples. The equations for this would look like this:

{c1=m3modn1c2=m3modn2c3=m3modn3\left\{ \begin{array}{ll} c_1 = & m^3\mod n_1 \\ c_2 = & m^3\mod n_2 \\ c_3 = & m^3\mod n_3 \\ \end{array} \right.

Then we can use the idea of CRT which says using:

{c1=xmodn1c2=xmodn2c3=xmodn3\left\{ \begin{array}{ll} c_1 = & x\mod n_1 \\ c_2 = & x\mod n_2 \\ c_3 = & x\mod n_3 \\ \end{array} \right.

where you know c1,c2,c3c_1, c_2, c_3 and n1,n2,n3n_1, n_2, n_3, you can find xx efficiently. In the case of RSA, this would be m3m^3, and then we can simply get the cube root to find mm and we have cracked the message. (source)

Note: The CRT actually gives:

x=m3modn1×n2×n3x = m^3\mod n_1\times n_2\times n_3 This is why mm has to be less than nn, otherwise this modulus will have wrapped around and you would have to guess how many times this has been done. If it's barely too large you might be able to brute-force this kk value, but otherwise, it will take too much computation

An example implementation for this attack would be:

Here get_encrypted() is a simple RSA implementation with a high enough e=257 that a simple root of the ciphertext won't work. But using the CRT you can get 257 different samples and compute x, to finally get m.

This script runs for about 40 seconds, but for e=65537 and 1024-bit primes, it would take about 10 hours. The biggest bottleneck here is generating the primes by the "server", as this can take around a second for 1024-bit primes. When we need 65537 samples this really adds up, but in a real-world scenario, 10 hours is very doable.

Coppersmith's Attack

The small exponent attack explained in the earlier root section only works when the plaintext is short. That is why there is another attack that requires any of the following information:

The technique involves quite a bit of math, being a result of Lattice Reduction (LLL). A great page with some details and resources about the specifics of the Coppersmith's attack is this GitHub repository by ashutosh1206.

Stereotyped Messages

For this type of attack, we need to know a part of the start of the plaintext. An example of such a challenge would be the following (also see this writeup):

To verify we can use this attack to efficiently recover the plaintext we need to make sure that: n1/e>differencen^{1/e} > \mathit{difference}. Where this difference is between the plaintext and your guess of the plaintext. If you know a significant part of the start of a plaintext this difference will be small enough to satisfy the condition. It also means that the smaller e is, the larger the upper bound for the difference, and the less plaintext we need to know. We can do a sanity check in Python like this:

In this example, the XXX is small enough that we can use this attack to efficiently recover the plaintext. We can use SageMath to do some mathematical magic and compute possible differences, which is Coppersmith's attack. We can even try multiple lengths of the unknown text because this attack only takes a few seconds:

Euclidean Algorithm

The Euclidean Algorithm is originally an algorithm for efficiently computing the Greatest Common Divisor (GCD) for two numbers. If this answer is 1, it means the numbers don't share any factors, which may be important in some cryptosystems.

There is also the Extended Euclidean Algorithm that can do a lot more. It can find two new numbers, that when multiplied with their respective numbers and added, equal the greatest common divisor. This is especially useful when the numbers are coprime, meaning the GCD is equal to 1. Then you can rearrange the equation to have something useful in modular arithmetic. In the example below, aa and bb would be the inputs, and the Extended Euclidean Algorithm finds ss and tt:

as+bt=gcd(a,b)as+bt=1as=1btas=1modb\begin{align*} as + bt & = \gcd(a, b)\\ as + bt & = 1\\ as & = 1 - bt\\ as & = 1 \mod{b}\\ \end{align*}

This last line is how we get the multiplicative inverse in RSA, used to generate the value of d. ​But in custom schemes similar to RSA, this may be exploitable. When working with mod b\bmod\text{ }b it may be possible to reduce some arguments to 1 like above.

Reverse of Modulo Multiplication

Here is an example of using this knowledge to break a flawed cryptosystem. Let's say it uses two random numbers that multiply together, and after the modulus is applied, you get the answer. In this example you have only one of the factors, the modulus, and the result, you want to calculate the other factor that was used.

In normal arithmetic without a modulus, this would be easy. Just divide the answer by the factor you know, to get the other one. When in modular arithmetic, this is a bit harder. The answer might have wrapped around to another iteration of the modulus. With big numbers, just brute-forcing this takes way too long. That is where the Extended Euclidean Algorithm comes in. As explained in this math exchange answer, it tells us we can get a number ss that when multiplied with aa, becomes 1modb1 \bmod b. So in our original equation, we can just multiply by this ss to get a nice equation for calculating xx with only variables we know:

xa=cmodbxas=csmodbx1=csmodbx=csmodb\begin{align*} x * a &= c \mod{b}\\ x * a * s &= c * s \mod{b}\\ x * 1 &= c * s \mod{b}\\ x &= c * s \mod{b}\\ \end{align*}

​So this finally means we have an efficient way of calculating the other factor we wanted. In code, it would look something like this:

Redacted Private Key

Sometimes you'll find part of a private key or a partially redacted screenshot for example. In some cases, there is still enough information in the redacted key that we can recover the entire private key. A good example is this writeup from Cryptohack.

Often you'll find the private key in the PEM format:

This format encodes a few numbers that RSA uses in the following order:

When the key is decoded from Base64, the raw data is split by ASN.1 headers. These differ per private key but are in a simple format. For example 02 82 01 01:

  • 02: The data type: Integer

  • 82: Meaning the length of the encoded integer value will be stored in the following 2 bytes

  • 0101: The length of the integer value that follows. Taking this value (257) and reading the next 257 from Big Endian results in the number

You can then search the redacted private key for these header values and find parts of the private key. Then after writing down every number you have found, you can try to use the RSA Equations to calculate or brute-force unknown values.

You might find the dpd_p value and q, but no n as seen in the writeup linked above. In that case, we know some equations, and we can find p having only one unknown:

dp=dmodp1edp=1modp1edp=1+kp(p1)p=edp1kp+1\begin{align*} d_p = d \bmod p-1 \\ e*d_p = 1 \bmod p-1 \\ e*d_p = 1+k_p(p−1) \\ p = \frac{e*d_p-1}{k_p}+1 \end{align*}

Here kp<ek_p < e meaning we can easily brute-force it until we get a valid p that is prime.

Now we have p and q, and can easily multiply them to get n. We can also now calculate d from p and q to get the decryption key.

If you get multiple possible values for p, you could use some partial numbers in the redacted private key to verify each possible value. If you have some bits of the n for example, you could calculate n and then verify it with the known bits.

No modulus (n)

If exponent e is small, the ciphertext might not have wrapped around with the modulus and you can just root it to get the original plaintext. You can verify this by checking if the ciphertext is a perfect root.

Known plaintext

If you have 2 plaintext-ciphertext pairs, you can recover N with some math:

m1e=c1modnm2e=c2modnm1ec1=k1nm2ec2=k2n\begin{align*} m_1^e &= c_1 \bmod n \\ m_2^e &= c_2 \bmod n \\ &\Downarrow \\ m_1^e - c_1 &= k_1*n \\ m_2^e - c_2 &= k_2*n \\ \end{align*}

Here the last two equations both have n multiplied by some different k. The product of knk*n can be easily calculated as the difference between mem^e and cc, which we know from the plaintext-ciphertext pairs. This means we know two different numbers, which have n as a factor. We can use the Greatest Common Divisor (GCD) to calculate this common factor n. Sagemath has a very fast implementation for the power and GCD:

2 keys: Same n, same ciphertext, different e

Answer to Cryptography Stack Exchange post explaining solution

Use the comma (,) to separate -e and --uncipher values

Multiple keys with common factors

Converting keys

Formats of RSA keys can be tricky. Here are a few common ways to convert public/private keys into other useful formats

Documentation explaining how to work with RSA keys in PyCryptodome

n and e to any file format

Any file format to n and e

Last updated