Cryptopals: Set 1

As many before me I will have a go at the Cryptopals challenges. They are a great opportunity to learn a new programming language (Typescript), have some fun solving challenges and bring home the message that rolling your own crypto is most of the time a bad idea.

This is my first attempt at working with both Node.js and Typescript, if the code is too unsightly please let me know. All that being said lets get started!

Challenge 1: Reading and Printing Buffers

We are asked to read the following data encoded as a hex string

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

and print it back as a base64 encoded string.

Special emphasis is made in the fact that all actual data manipulation should be done with a byte representation. In our case, we will use Node's Buffer class which is exactly that, an array of bytes.

The code to achieve this and many other small tasks will be kept in the following gist. Any function that is not particularly relevant for a specific challenge or that is pulled out from thin air will probably live in the gist.

Finally, here is the code



That wasn't too hard. Next!

Challenge 2: Fixed XOR

Now we need to XOR two streams of bytes against each other. Consider the second stream to be our key, if we can make it completely random then we have build the definitive unbreakable crypto. The only problem is that you now have to safely transmit the key to the other party, on top of that, the key is as big as the message so you find yourself in a bit of a situation...

One important and very useful property of the XOR gate is that if we XOR a single byte twice against the same byte we recover the original one.




Challenge 3: Single-byte XOR cipher

The previous challenge showed how we can encrypt a stream of data. However, we had the problem that the key was as long as the original message. This is clearly not very practical, so how about if we use a single byte as key and encrypt the message by XORing all the message against it. It turns out that's not a great idea. In this challenge we are asked to break an encrypted message using this technique.

The first thing we need is a way to encrypt/decrypt messages and that is straightforward enough. Here is the code:



Since we only have 256 possible keys we could try all of them and inspect the output until we find a piece of English text. But what if the key space had 2 bytes instead? I'm not going over the 65536 possibilities. There is a better way. We will test how English-like the decrypted message with each possible key is by creating a scoring function. The one(s) closest to English will be our candidate solutions. The scoring function will look at the frequency of each byte (ASCII-encoded) of the decoded messages and compare it to that of English. With that in mind we first make a function that returns a dictionary like object with the frequency of each letter. We will limit ourselves to lowercase letters.



Then we need a way to measure the frequency of each character in our candidates solutions and a "measure" of distance between the frequencies of characters in English and those of the candidate. Here I use the cosine similarity which leaves a lot to be desired as far as distance measures between discrete probability distributions go but it's simple and does the job well enough here.



To arrive at the solution we need to put everything together. Loop over all possible solution, store them with their score on an array and sort to find the best one.



Challenge 4: Detect single-byte character XOR

We can use brute-force and some statistics to break the single-byte XOR cipher, but what if we don't know that this is the technique was used in the first place? In this challenge we will discover which of a list of ciphertexts has been encrypted in this way.

The simplest way to approach this is by looking for the best possible solution using the strategy from the previous challenge for every ciphertext. Then we just need to rank them all.

If you look closely, all the solution to this challenge is adding is an extra loop to go through all the ciphertexts.



Challenge 5: Implement repeating-key XOR (a.k.a. the Vignere Cipher)

Clearly, using a single byte to encrypt our text is not a great idea because we are too susceptible to a brute-force attack. There is an obvious solution! Use more than one byte for the key and then apply it in a rolling fashion over the plain text. Now, instead of having to try \(256\) keys you need to try \(256^{\textrm{#bytes}}\).

To get this to work you only have to go around the key using the remainder operator when selecting what byte to encrypt with. You are now untouchable.



Challenge 6: Break repeating-key XOR

Unfortunately, unless you are living int the mid 19th century the Vignere cipher is completely insecure :(. We have several simple methods to break it if we are given enough ciphertext.

First we need to find the Hadamard distance between two bytes and then extend that to two buffers of the same but arbitrary length. The Hadamard distance between two bytes is equal to the number of bits that need to get flipped to turn when into the other.

In my solution, I first apply a negated exnor gate which returns 0 if two bits are the same and 1 if they are different. Then all you need to do is count how many 1 bits are there. That is accomplished by right shifting and comparing to the result of dividing by 2. This works in the following way. Let's say you have 101 which is 5 in decimal. Right shifting transforms it into 010 (2) while dividing results in 2.5. Since 2.5 is bigger than 2 we know we right shifted out a 1. For a number with a bit representation ending in 0, i.e. an even number, the result of the shift is identical to dividing.



We now have the ability to test how similar two ciphertext blocks are. The next important realization is that we should expect similar bits patterns to emerge in the ciphertext if the key length matches or is a multiple of the buffer sizes being compared. Wikipedia has a classic example of what happens when you use ECB on an image.


On the image above the uniform white background results on identical blocks which keep repeating periodically.

The code below will check for the Hadamard distance of two contiguous blocks of ciphertext and store the result together with the block size. It then returns the block length that resulted in the smallest average distance per byte.



Now we know the keylength, or at least we have a small number of candidates. To finish it off, we slice the encrypted buffer as shown below. Each color represents the byte of the key with which it has been encrypted.


The same technique we used in challenge 3 will take care of each colored buffer producing the encryption key.



Challenge 7: AES in ECB mode

In the second to last challenge of set 1 we need to decrypt a piece of text that has been encrypted with AES ECB mode. The code I've shown thus far, despite presenting the core ideas behind it, don't reproduce all the details of the standard. To simplify all a bit I will just use the aes-js library for now.



Challenge 8: Detect AES in ECB mode

The last challenge of set 1! How would you try to detect if a text has been encrypted in ECB mode? We now know that ECB is deterministic and operates on individual blocks of plaintext, if we encrypt two identical blocks of text they will produce identical blocks of ciphertext. In the simplest scenario, where our plaintext is just the same character repeating all the time , e.g. aaaaaaaa, we would see the cipher repeat every 16 blocks. This is precisely, what was happening to Tux's image above.

A good encryption algorithm should spit out something that has no apparent patterns, just noise. Assuming that the output is completely random the probability of two separate 16 bytes being identical is \(\frac{1}{2^{128}}\). We can then apply the birthday party problem problem solution to figure out how many repeated blocks we should have on average for a given length of ciphertext. If the actual number of repeated blocks is significantly larger than what we calculated then we will claim that it has been encoded using ECB mode.

All that being said, given the extremely low collision probabilities for a realistic ciphertext length we might just as well only look for repeated blocks.