The one-time pad

This encryption is theoretically impossible to break. However, it is essentially theoretical because it remains very complex to implement.

Posted on
12 minutes
2465 words

A bit of history

The one-time pad is an encryption method invented by Gilbert Vernam, an American engineer at AT&T Bell Labs, in 1917.

Note: This post was translated from french with the help of AI. The original post was written with the knowledge of a younger me.

It is based on a much older encryption technique called the “ Vigenère cipher”.

The one-time pad was further perfected by the American general Joseph Mauborgne, who introduced the randomness of the encryption key.

It consists of encrypting a message with a key by adding the key to the message, element by element. The key must be used only once (hence the “one-time” or disposable). To decrypt, you subtract the key from the message.

Okay, history is nice but in practice, how does it work?

How it works

To understand the one-time pad, you first need to master (or at least understand) congruence (or modulo arithmetic).

The basics

We’ll start by using non-random texts for the keys. Note, this is just to start slowly because here security is zero.

We’ll take a Bigflo&Oli (french rappers) text as the message to encrypt and a Georges Brassens (french singer) text as the key.

The message is from “Alors alors”:

“Alors alors, On devait faire le tour de la Terre”

The key is from “Bancs publics”:

“Les gens qui voient de travers pensent que les bancs verts”

Got modulo in mind? Good, let’s start by assigning a number to each letter of our alphabet:

A=0, B=1, C=2, D=3, E=4, F=5, G=6, H=7, I=8, J=9, K=10, L=11, M=12, N=13, O=14, P=15, Q=16, R=17, S=18, T=19, U=20, V=21, W=22, X=23, Y=24, Z=25

So, for our 26-letter alphabet, we have numbers from 0 to 25.

Encryption

To encrypt, it’s simple, we add letters face to face. “Add letters, are you crazy?” you might say, but that’s why we’ve given them numerical equivalents, follow along. Since we haven’t assigned values to commas, we’ll remove them.

Let’s go:

"Alors alors On devait faire le tour de la Terre"

+

"Les gens qui voient de travers pensent que les bancs verts"

The “A” of “Alors” + the “L” of “Les” = A + L = 0 + 11 = 11 = L

Then “l” + “e” = l + e = 11 + 4 = 15 = p

So, little by little:

"Alors alors On devait faire le tour de la Terre"

+ + +

"Les gens qui voient de travers pensent que les bancs verts"

= = =

0 11 14

+ + +

11 4 18

= = =

11 15 32

= = =

L p ?

Indeed, for adding “o” and “s”, we get 14 + 18 = 32 and here is the problem, we don’t have a letter with value 32.

But remember, congruence (modulo, all that).

Since we have only 26 letters, we’ll use modulo 26 to stay in range.

Modulo 26 of 32 is:

$$ 32 \pmod{26} \equiv 32 -26 \pmod{26} \equiv 6 \pmod{26} $$

And 6 corresponds to letter G.

So, we can finish our calculation:

"Alors alors On devait faire le tour de la Terre"

+ + + ...

"Les gens qui voient de travers pensent que les bancs verts"

= = =

0 11 14

+ + +

11 4 18

= = =

11 15 32 mod(26)≡ 6 mod(26)

= = =

L p g …

In the end:

"Alors alors On devait faire le tour de la Terre"

+

"Les gens qui voient de travers pensent que les bancs verts"

=

"Lpgxw ndela Jb liitlx yrimi cw ishj hr eq Nicvw"

We have our encrypted message.

Decryption

To decrypt, we do the opposite: subtract the key from the encrypted message.

We have encrypted message “Lpgxw ndela Jb liitlx yrimi cw ishj hr eq Nicvw” and the key “Les gens qui voient de travers pensent que les bancs verts”.

So:

"Lpgxw ndela Jb liitlx yrimi cw ishj hr eq Nicvw"

- - -

"Les gens qui voient de travers pensent que les bancs verts"

= = =

11 15 6

- - -

11 4 18

= = =

0 11 -12

= = =

A l ?

Again, same problem as encryption, we have -12 with no corresponding letter.

We apply modulo 26:

$$ -12 \pmod{26} \equiv -12 + 26 \pmod{26} \equiv 14 \pmod{26} \equiv o $$

We continue the calculation and recover:

"Lpgxw ndela Jb liitlx yrimi cw ishj hr eq Nicvw"

- - - ...

"Les gens qui voient de travers pensent que les bancs verts"

=

"Alors alors On devait faire le tour de la Terre"

Mauborgne in da house

Okay, that’s nice but not very secure. To improve security, let’s listen to General Mauborgne and use a random key (or at least a more random one).

After putting a pseudo-random generator (yes, I know, an incomprehensible term, we’ll come back to it in a future article) on my computer, rolling my head on the keyboard, and letting the cat walk on it for entropy (yes, another incomprehensible term), I got this key:

"snchfzrhqjlnveivpyonrkudrpqtpxptfwrildof"

We’ll encrypt the following message, again by Bigflo&Oli, from “Dommage”:

“Il croisait cette même fille, avec son doux parfum”

Encryption

Here we go again:

"Il croisait cette même fille, avec son doux parfum"

+ +

"snchfzrhqjlnveivpyonrkudrpqtpxptfwrildof"

= =

8 11

+ +

18 13

≡ =

0 mod(26) 24

=

a y …

We get the encrypted message:

"Il croisait cette même fille, avec son doux parfum"

+

"snchfzrhqjlnveivpyonrkudrpqtpxptfwrildof"

=

"ayeythjhycnroxmhtksszvfhrkuvhlcwtqoxlutze"

Decryption

I think by now you get it. One quick example.

Still with modulo(26), decrypt this message:

"irszlg k wiu ladw"

with the key:

"xjgzpyisqbjmplcxhwfpijqrstghighnhylshqza"

And tweet or post the result.

The Che level

So far we’ve been cute, now we tackle the hard stuff.

Che Guevara and Fidel Castro communicated with a cryptographic technique based on the one-time pad. However, there was an extra complexity.

They didn’t use a sequence of numbers like we did, but assigned arbitrary numbers to letters of the alphabet.

Here is their substitution:

A=6, B=38, C=32, D=4, E=8, F=30, G=36, H=34, I=39, J=31, K=78, L=72, M=70, N=76, O=9, P=79, Q=71, R=58, S=2, T=0, U=52, V=50, W=56, X=54, Y=1, Z=59

You might ask “how do you do congruence with this?”, hold on, it’s coming.

Actually, they transmitted numbers, not letters, and the key was itself composed of numbers. They used the most known modulo (the one we unknowingly use), modulo 10.

So, how did they do it? Let’s take an example.

Encryption

We want to encrypt the message:

“J’aime pas les framboises” (“I don’t like raspberries” in french, just an example, I do like raspberries)

We replace the letters with their numbers:

“J A I M E P A S L E S F R A M B O I S E S” becomes “31 6 39 70 8 79 6 2 72 8 2 30 58 6 70 38 9 39 2 8 2”

We encrypt this message with the key:

"4317892093088287991744997181628757157172"

We add the numbers face to face and apply modulo 10. To ease things, they first grouped digits in packets of 5.

So, the message:

"31 6 39 70 8 79 6 2 72 8 2 30 58 6 70 38 9 39 2 8 2"

becomes:

"31639 70879 62728 23058 67038 93928 2"

We add with modulo 10:

"31639 70879 62728 23058 67038 93928 2"

+

"43178 92093 08828 79917 44997 18162 87571 57172"

=

747 (7+3 ≡ 10 mod(10) ≡ 0) …

=

74707 62862 60546 92965 01925 01080 0

The encrypted message is therefore:

"74707 62862 60546 92965 01925 01080 0"

Decryption

We subtract the key from the message using modulo 10:

"74707 62862 60546 92965 01925 01080 0"

-

"43178 92093 08828 79917 44997 18162 87571 57172"

=

"31639 70879 62728 23058 67038 93928 2"

Then, we convert the numbers into their respective letters. And now you’re thinking, “But how can they tell the difference between the letter with value 2 and the letter with value 23? How can they know if it’s a one-digit or two-digit number?”

Very simple, the digits used for one-digit numbers are never used to start a two-digit number.

In fact, if you encounter 3, 7, or 5, you know it’s a two-digit number and so you must group the following digit with it.

In our case, we start with a 3, so we know it’s a two-digit number, we join the 1 to the 3.

"31 639 70879 62728 23058 67038  93928 2"

Next, the 6 is not in (3,7,5), so it’s a one-digit number:

"31 6 39 70879 62728 23058 67038  93928 2"

And so on:

"31 6 39 70 8 79 6 2 72 8 2 30 58 6 70 38 9 39 2 8 2"

Finally, we replace the numbers with their corresponding letters:

"J A I M E P A S L E S F R A M B O I S E S"

There you go!

The Perfection of the Cipher

Claude Shannon (more name-dropping, but this one is the flagship of cryptographers) mathematically proved that the one-time pad is considered cryptographically secure, or unconditionally secure, during World War II. That is to say, the system cannot be cracked even if the adversary has unlimited resources or unlimited time, as long as they do not know the secret key.

Claude Shannon proved that the one-time pad is “perfectly secret,” meaning that if you have a ciphertext encrypted with the one-time pad, there is absolutely no way to guess anything about the original plaintext.

This means it is also resistant to brute-force attacks. If you try every key in the universe, you will get every possible message in the universe, so you have no way to know which one is correct.

I won’t redo the mathematical proofs here. This article is meant for broad, easy-to-understand dissemination.

But to achieve this perfection, there are some challenges.

The Difficulties

The difficulties to overcome to achieve a perfect one-time pad are as follows:

Key Size and Quantity

For every exchange, you need a new key. For every exchange, that is, each person you communicate with and each message you send to them.

So, if you exchange 2 messages with 2 people, that makes 4 keys. 3 messages with 3 people, 9 keys…

Moreover, the key must be longer than the message.

Because if it is shorter and repeated to complete the encryption, you increase the probability of certain letters appearing and you can see patterns emerge giving clues about the message. This problem is similar to the need for randomness in creating keys.

This poses problems for securely storing keys, as the cipher collapses if the key is recovered by the adversary.

The Randomness of the Keys

In French, the letter “e” is the most commonly used letter, then “a”, then “s”…

And patterns like “le”, “les” are frequent. The key, when repeated, increases the chances of encrypting the same patterns in the same way.

The attacker can spot these patterns and deduce when the key is repeated.

Gradually, they can reconstruct the key or even manage without it.

An example:

Imagine the key “Wesh”, the attacker doesn’t know it but has found its length is 4 by analyzing the frequencies of recurring patterns.

They can cut the ciphertext into groups of 4 letters: “Uedl lsll ysfk wqfl myaz kvlp newu zsmj a” and subtract the groups from each other:

Uedl - lsll = X? + “Wesh” - (Y? + “Wesh”) = X? - Y? + “Wesh” - “Wesh”

= X? - Y?

The key disappears and we have two texts which are no longer random subtracted from each other. We analyze the letter frequency and form hypotheses. The letters resulting from the subtraction, the most frequent ones, are more likely to correspond to an “e” in one of their base letters, etc.

Step by step, by multiplying hypotheses, possibilities reduce and one can find the key or decrypt messages without it.

In conclusion, if you do a frequency analysis running your computer for three days, you will get the following message:

“Y’a le pote condamné qui sortira en douce,” always Bigflo & Oli, excerpt from “Comme d’hab”

This seems obscure, incomprehensible. Don’t worry, a future article will clarify all this with examples.

So, to ensure every letter has the same probability of appearing, you need randomly generated keys. But that is very hard to do. We can make pseudo-randomness but as its name indicates it’s not truly random, the probabilities of letter appearances are not exactly equal.

Key Transmission

Key transmission is not a problem for very small keys (an article will soon explain how).

But it becomes very complicated once keys must exceed a certain size (the current minimum size is 4096 bits or, in some crazy cases, 8192 bits, that is 1024 bytes, 1 KB for IT folks).

So, often, keys are exchanged physically beforehand. This is not practical and becomes very complicated when sending many messages to many interlocutors.

Uniqueness of Use

We return to the same problem as with key size and randomness.

If you reuse a key, you lose the randomness and the ciphertext becomes again vulnerable to frequency analysis.

The problem is even more serious because uniqueness of use implies destroying the key after use and there is no easy way to ensure that your correspondents do not keep the keys in a corner at risk of being stolen later (in 5 years, 10 years…) or reuse them for other messages on their side.

Message Poisoning

The last possible problem is key poisoning but this can be easily fixed (at least temporarily) in IT.

Since the one-time pad is immune to brute force (all keys in the universe produce all messages in the universe), an attacker can mess with the ciphertext to give it a different meaning.

They can also modify the key without necessarily knowing it and thus alter the plaintext.

Example:

Let’s take our famous message again:

“J’aime pas les framboises”

which we encrypt with the key:

“nzhtjplgpggsefirksxohsbyhkazsb”

using modulo 26 (thus without the apostrophe), we get:

“N zrry rrg yef eytvqzohky”

The attacker only knows this secret message but can modify the encryption key. They replace the key by:

“ezjfuqjcltbmtcvihgdtgefirksxohsbyhkazsb”

And that gives the following plaintext:

“Jaimebienlesfraisiers” ("I like strawberry bushes" in french)

So, not at all the original message.

To prevent this, in IT, you can sign and hash the message and keys so a user can verify that the message they receive was indeed created by the correct key and that the key they use is still the correct one.

We will come back to how signing and hashing work in future articles.

I think we have already well exceeded 2000 words. Let’s stop here.