Rubber duck math: RSA

One topic that I always wanted to learn was mathematics. I never had the opportunity to develop a strong foundation outside of what's needed for day to day programming. The following was created to systematically track my own learning in what's known as rubber duck debugging style; Writing down what I need to know and then find ways to figure that out. I personally find it easier to learn with numeric examples and that's the path taken here. The goal is to build up a full understanding of the topic at hand.

Topic 1: RSA
Since this is learning by example I've picked out RSA cryptography as the starting point. I use RSA day to day in my life as a software engineer. I setup keys for my Linux machines and log in remotely using RSA keys. Yet I have never taken the time to dive in and understand it fully. I can read the Wikipedia article on RSA and I get a vague gist. But I do not have a good understanding of mathematics. So I will need to take it slowly and build a knowledge base as i go.

From Wikipedia:

“A basic principle behind RSA is the observation that it is practical to find three very large positive integers e, d, and n, such that with modular exponentiation for all integers m (with 0 ≤ m < n):

(me)d ≡ m (mod n)

and that knowing e and n, or even m, it can be extremely difficult to find d. The triple bar (≡) here denotes modular congruence.

In addition, for some operations it is convenient that the order of the two exponentiations can be changed and that this relation also implies:

(md)e ≡ m (mod n)”

Exponents under modulo

The above (me)d ≡ m (mod n) statement is new to me. A base to a certain power under modulo returns the base?

Lets load up Python and numerically play with exponents under modulo in order to build up a strong understanding. We'll start with 2x mod 9 as a toy example.

Here's some Python code to generate exponents under a modulo for the sake of playing with this:

def exponentiate(base, modulo, iterations):

current = 1

for exponent in range(0, iterations):

print(str(base) + "^" + str(exponent) + " mod " + str(modulo) +

" = " + str(current % modulo))

current = current * base

exponentiate(2, 9, 12)

2x mod 9:

2^0 mod 9 = 1

2^1 mod 9 = 2

2^2 mod 9 = 4

2^3 mod 9 = 8

2^4 mod 9 = 7

2^5 mod 9 = 5

2^6 mod 9 = 1

2^7 mod 9 = 2

2^8 mod 9 = 4

2^9 mod 9 = 8

2^10 mod 9 = 7

2^11 mod 9 = 5

2^12 mod 9 = 1

The result of exponentiation does loop. 1, 2, 4, 8, 7, 5 repeats as a sequence. Buy why?

Lets keep playing

3x mod 9:

3^0 mod 9 = 1

3^1 mod 9 = 3

3^2 mod 9 = 0

3^3 mod 9 = 0

This one doesn't loop. It just goes to 0 and never moves from there.

4x mod 9:

4^0 mod 9 = 1

4^1 mod 9 = 4

4^2 mod 9 = 7

4^3 mod 9 = 1

4^4 mod 9 = 4

4^5 mod 9 = 7

4^6 mod 9 = 1

4^7 mod 9 = 4

Ok this loops as well. At a different rate than 2x. It loops every 3 rather than every 6 powers.

5x mod 9:

5^0 mod 9 = 1

5^1 mod 9 = 5

5^2 mod 9 = 7

5^3 mod 9 = 8

5^4 mod 9 = 4

5^5 mod 9 = 2

5^6 mod 9 = 1

5^7 mod 9 = 5

This loops every 6 powers, just like 2x.

6x mod 9:

6^0 mod 9 = 1

6^1 mod 9 = 6

6^2 mod 9 = 0

6^3 mod 9 = 0

Again this one just gets stuck on 0 and stays there.

7x mod 9:

7^0 mod 9 = 1

7^1 mod 9 = 7

7^2 mod 9 = 4

7^3 mod 9 = 1

7^4 mod 9 = 7

7^5 mod 9 = 4

7^6 mod 9 = 1

7^7 mod 9 = 7

Loops every 3 powers.

8x mod 9:

8^0 mod 9 = 1

8^1 mod 9 = 8

8^2 mod 9 = 1

8^3 mod 9 = 8

8^4 mod 9 = 1

8^5 mod 9 = 8

8^6 mod 9 = 1

8^7 mod 9 = 8

Loops every 2 powers. This makes sense since it's congruent to -1 under mod 9.

We'll stop here since 9x mod 9 is 0, 10x mod 9 is just going to be the same as 1x mod 9 and 11x mod 9 the same as 2x mod 9 and so on.

We've covered all bases (pun intended).

Some observations;

If the base does not share a factor of 9 you'll never get a factor of 9 as the result.

Bases coprime (no shared factors) to 9 will never generate 0, 3 or 6 as the result of modulo 9 under any power since that would require being a multiple of 3 to start with. As for bases that do share a factor of 9 they are less interesting here and don't behave the same way since they only generate factors of 9. We'll only look at coprime bases from here on.

For coprime bases there must be a loop. The maximum size of that loop is 6 elements.

We've established {0, 3, 6} will never be a result from ax mod 9 unless the base was already factor of 9.

With {0, 3, 6} ruled out it only leaves {1, 2, 4, 5, 7, 8} as possible results. That's 6 possible results which is known as Euler's totient for 9.

Powers are a repeated operation so as soon as a single result repeats we know we will repeat the sequence from there. We couldn't expect a loop to occur after 6 powers because there simply aren't enough unique possible results and any duplicate would mean the loop has already begun. So it must loop and the maximum loop size will be every 6th power.

Some loops occur every 2 or 3 powers rather than every 6.

Lets look at 4x as a numeric example to further play with. If we multiply the results of 4x mod 9 by an offset we should get new results. Lets play with this in Python to get a feel for the results and see if we learn anything from this.

def exponentiate_and_multiply(base, modulo, iterations, multiple):

current = 1

for exponent in range(0, iterations):

print(str(base) + "^" + str(exponent) +

" * " + str(multiple) + " mod " + str(modulo) +

" = " + str((current*multiple) % modulo))

current = current * base

exponentiate_and_multiply(4, 9, 8, 2)

4x * 2 mod 9:

4^0 * 2 mod 9 = 2

4^1 * 2 mod 9 = 8

4^2 * 2 mod 9 = 5

4^3 * 2 mod 9 = 2

4^4 * 2 mod 9 = 8

4^5 * 2 mod 9 = 5

4^6 * 2 mod 9 = 2

4^7 * 2 mod 9 = 8

There's still 3 results before looping but they are all completely different. What about some other offsets of 4x?

4x * 5 mod 9:

4^0 * 5 mod 9 = 5

4^1 * 5 mod 9 = 2

4^2 * 5 mod 9 = 8

4^3 * 5 mod 9 = 5

4^4 * 5 mod 9 = 2

4^5 * 5 mod 9 = 8

4^6 * 5 mod 9 = 5

4^7 * 5 mod 9 = 2

The same sequence as 4x * 2 mod 9 (2,8,5), just with a different starting point.

4x * 7 mod 9:

4^0 * 7 mod 9 = 7

4^1 * 7 mod 9 = 1

4^2 * 7 mod 9 = 4

4^3 * 7 mod 9 = 7

4^4 * 7 mod 9 = 1

4^5 * 7 mod 9 = 4

4^6 * 7 mod 9 = 7

4^7 * 7 mod 9 = 1

The above is the same sequence as the original 4x mod 9 (1,4,7), just with a different starting point.

We still loop at the same rate no matter the offset we multiply by

Every variant of 4x * offset still loops every 3 powers just as 4x did. This makes sense since the offset is just a static multiple so it won't change how often the loop occurs. The multiple changes the values not the loop length.

An offset let's us choose any starting number for the powers to loop on

4x * offset mod 9 starts on offset rather than at 1. So we can start with any number and from there we'll have a loop of 3 elements.

If one of the numbers didn't appear previously then we have a completely new set

Powers are a repeated operation so if you hit a result you've seen before for a given base the following results will repeat just as they did before. We see the sets of {1,4,7} and {2,8,5} appear in different contexts here and at different starting values but it's still the same repeating sequence. So if you hit any one of those numbers in these sets you're absolutely going to be repeating that set as you continue multiplying by 4.

If we hit a completely new number not yet seen then we know we must have a completely new set and as stated above that set must be the same size as previous sets. So as soon as we saw a '2' in 4x * 2 mod 9 we knew that the next results couldn't possibly be part of the {1,4,7} set we saw earlier since we know repeatedly multiplying those numbers by 4 never produces 2 or else it would be part of that set already. It must be part of a new set of 3 numbers from the possibilities of {1,2,4,5,7,8} and numbers of that new set can't contain {1,4,7}.

So in short we've established that;

We can start at any value so all values must be part of some set. All values can only appear once in one set. All sets must be the same size.

Breaking the 6 possible numbers {1,2,4,5,7,8} into equal sized sets that contain no duplicates across sets is only achievable by dividing by factors of 6 which is known as Euler's totient for the modulus, 9.

So the length of the loops we see must be factors of the Euler's totient for the modulus. That's why all loops are a factor of 6 here. They have to be!

Note that there's still a lot we haven't figured out here. We don't know why some bases specifically loop earlier than every 6 results but we can at least understand that they will always loop at a factor of 6 which is key to understanding RSA.

Ok this is starting to make some sense!

Lastly can we figure out why 4x and 7x looped every three powers?

We saw that 2x and 5x loop every 6 powers, the maximum. But 4x and 7x loop every 3 powers. 8x loops every 2 powers. The frequency of these loops, 6, 3 and 2 are all factors of 6 though.

Lets figure out why that is. 9 has prime factors of 32. Let's look at of 2x mod 3.

2^0 mod 3 = 1

2^1 mod 3 = 2

2^2 mod 3 = 1

Ok that's simple. It loops every 2 powers. 1x of course will always return 1 under mod 3.

Now there are 6 numbers under 9 that are either 1 or 2 under mod 3. There are 3 numbers under 9 that return 1 mod 3. Since 2x and 5x are congruent to 2x mod 3 they have 6 possible results under mod 9. Since 1x and 4x are congruent to 1x mod 3 they have 3 possible results.

So it makes sense that 4x always returns one of three numbers. It has to always give results that are congruent to 1 mod 3 and there are only 3 numbers under 9 that work.

We'll come back to this idea later when calculating the Carmichael's totient.

Euler's theorem and Fermat's little theorem

Euler's theorem that we figured out above simply states that if we raise a coprime base to Euler's totient for that base it will equal 1. We can see this in the above examples we created where 6 was found to be the Euler's totient for 9.

2^6 mod 9 = 1

4^6 mod 9 = 1

5^6 mod 9 = 1

7^6 mod 9 = 1

8^6 mod 9 = 1

This shouldn't be surprising since we now know that our loops of powers must loop at a factor of Euler's totient. They all start at 1 and all loop back to 1 at powers of that are factors of Euler's totient.

At this point lets start writing “Euler's totient” as φ(n) for brevity. That symbol is called phi. This is the standard mathematical notation for this.

Euler's theorem written down is thus.

aφ(n) mod n ≡ 1 ( mod n )

or to go one power higher to match the RSA example

aφ(n)+1 mod n ≡ a ( mod n )

The ≡ means 'congruent' or equal under modulo. 1 and 10 are not equal generally but under mod 9 they are.

We now understand how a base to a power under modulus can equal that base!

Seems familiar...

We'll talk about how to calculate Euler's totient next but we should take a quick diversion here. Years ago I was made aware of some mathematics that seemed like magic. It stated the following;

If a number, n, is prime then a(n-1) ≡ 1 ( mod n )

This is now understandable. If n was a prime number then by definition of prime numbers there's n - 1 numbers below it that are coprime to it. So φ(n) is n – 1 in these cases and it's just a specialization of Euler's theorem.

This is known as Fermat's little theorem and is a very easy way to test if a number is not prime.

Fermat's little theorem doesn't guarantee a number is prime however since it's quite possible for φ(n) and n-1 to share factors which can lead to false positives.

So how do we calculate Euler's totient?

At this point we're getting a good sense of why Euler's totient, φ(n), is important. It tells us when results of ax mod n will loop. We can see that the RSA algorithm relies on this since it uses bases that when raised to different powers return the base itself under mod n. The powers used in RSA must be factors of φ(n)+1.

Since numeric examples make things clear lets numerically calculate φ(30)

To calculate φ(n) for some number, eg. 30 (2*3*5) we want to;

–Rule out half of factors (the factors of 2)
–Rule out 1/3rd of factors in the remaining half (the factors of 3), leaving 2/3rd of a half.
–Rule out 1/5th of factors in what remains. Leaving 4/5th of that remainder.

The formula for what we've just done is clear. For each prime factor p we are taking the remainder after ruling out 1/p of what's remaining from the previous step. The remainder of 1/p is (1 – 1/p).

So the formula for the probability of a number not being a multiple of p1, p2, p3... is the following.

(1 – 1/p1) (1 – 1/p2) (1 – 1/p3) (1 – 1/p4)....

Or more concisely

\prod_{〈p ∣ n〉} (1 - \frac{1}{p})

This gives a fraction which represents how often a number is not already a multiple of one of those primes listed.

To turn this fraction into φ(n) which is a count of numbers that aren't factors simply multiply by n.

So φ(n) = n(1 – 1/p1) (1 – 1/p2) (1 – 1/p3) (1 – 1/p4).... =

n \prod_{〈p ∣ n〉} (1 - \frac{1}{p})

We do this for all p distinct prime factors of n.

Note that distinct prime factors are slightly different to the unique prime factor. If asked for the unique prime factors you would write 20 as 2*2*5. Distinct prime factors is the de-duplicated version of this. We don't want to rule out factors of 2 twice so the distinct prime factors of 20 are only 2 and 5.

Here's the code to generate φ(n) up to a given n. We'll use this next.

def calculate_eulers_totient_given_factors(n, distinct_factors):

numerator = 1

denominator = 1

for factor in distinct_factors:

# Calculate numerator and denominator independently to avoid FPU errors

numerator *= (factor - 1)

denominator *= factor

return numerator*n//denominator

def distinct_factors(n):

# Only works to 100^2

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

factors = []

for f in primes:

if n % f == 0:

factors.append(f)

if len(factors) == 0:

# It's prime

factors.append(n)

return factors

def generate_eulers_totient_to_n(n):

for x in range(n):

print('φ(' + str(x) + ')=' +

str(calculate_eulers_totient_given_factors(x, distinct_factors(x))))

generate_eulers_totient_to_n(51)

So now we understand (me)d ≡ m ( mod n ) right?

We're just beginning to understand how this works now. Knowing Euler's theorem that aφ(n) mod n ≡ 1 ( mod n ) we could make a guess that perhaps e * d = φ(n)+1.

Onto the next part of the Wikipedia article on RSA, Key Generation:

“The keys for the RSA algorithm are generated in the following way:

1. 1. 1)Choose two distinct prime numbers p and q. p and q are kept secret.
  2. 2)Compute n = pq.
  3. 3)Compute λ(n), where λ is Carmichael's totient function. Since n = pq, λ(n) = lcm(λ(p),λ(q)), and since p and q are prime, λ(p) = φ(p) = p − 1 and likewise λ(q) = q − 1. Hence λ(n) = lcm(p − 1, q − 1).”

Ok we expected that this would use Euler's totient function but it actually uses something called Carmichael's totient function. We figured out φ(n) as the absolute worst case of when a power will loop back to the original value. But it makes sense there might be smaller numbers that are factors of φ(n) that work as well. In fact we saw some powers loop earlier than φ(n) before.

If two primes share factors in their individual Euler's totient then when multiplied they will actually loop earlier than Euler's totient. Lets write this out numerically to gain an understanding.

φ(3) = 2

φ(5) = 4

Now φ(15) is just the multiple of φ(3) and φ(5) since the calculation of the φ(n) breaks up the factors of n and performs the same operation on each.

φ(15) = φ(3)*φ(5) = 8

But lets numerically play with φ(15), skipping factors of 15 of course.

2^0 * 1 mod 15 = 1

2^1 * 1 mod 15 = 2

2^2 * 1 mod 15 = 4

2^3 * 1 mod 15 = 8

2^4 * 1 mod 15 = 1

4^0 * 1 mod 15 = 1

4^1 * 1 mod 15 = 4

4^2 * 1 mod 15 = 1

4^3 * 1 mod 15 = 4

4^4 * 1 mod 15 = 1

7^0 * 1 mod 15 = 1

7^1 * 1 mod 15 = 7

7^2 * 1 mod 15 = 4

7^3 * 1 mod 15 = 13

7^4 * 1 mod 15 = 1

8^0 * 1 mod 15 = 1

8^1 * 1 mod 15 = 8

8^2 * 1 mod 15 = 4

8^3 * 1 mod 15 = 2

8^4 * 1 mod 15 = 1

11^0 * 1 mod 15 = 1

11^1 * 1 mod 15 = 11

11^2 * 1 mod 15 = 1

11^3 * 1 mod 15 = 11

11^4 * 1 mod 15 = 1

13^0 * 1 mod 15 = 1

13^1 * 1 mod 15 = 13

13^2 * 1 mod 15 = 4

13^3 * 1 mod 15 = 7

13^4 * 1 mod 15 = 1

14^0 * 1 mod 15 = 1

14^1 * 1 mod 15 = 14

14^2 * 1 mod 15 = 1

14^3 * 1 mod 15 = 14

14^4 * 1 mod 15 = 1

φ(15) = 8 but surprisingly everything loops at factors of 4 or less, never 8 itself.

If we look at results from ax mod 15, 3 and 5

2^0 mod 3 = 1

2^1 mod 3 = 2

2^2 mod 3 = 1

2^3 mod 3 = 2

2^0 mod 5 = 1

2^1 mod 5 = 2

2^2 mod 5 = 4

2^3 mod 5 = 3

2^4 mod 5 = 1

2^5 mod 5 = 2

The reason becomes clearer. A given power must be return 1 or 2 mod 3. There are 10 numbers below 15 that match. A given power must then also return 1 or 4 mod 5 (if it was 1 mod 3) or 2 or 3 mod 5 (if it was 2 mod 3). Within those 10 numbers there are only 4 that are possible. The results of powers modulus of 3 and 5 align and limit our options under mod 15.

The formula here is to find the least common multiple of 2 and 4. This is where the results of the totient's for 3 and 5 align. In this case it's 4 and that's our Carmichael's totient.

Note that for primes the Carmichael's totient and Euler's totient are equivalent. So this only takes a small modification to the code above.