Sunday, December 14, 2014

Padding Oracles: The Poo in Poodle

Buckle up, 'cause this involves cryptography and it's going to get a bit technical. That said, I'll try to keep it at a high level (nothing like S-boxes and avalanching); as long as you understand exclusive-or (xor) you'll have no trouble following along. To (hopefully) make it easier, I'll put major points in sections with headings.

First of all, a padding oracle has nothing to do with putting sumo suits on databases. It's a vulnerability in cryptographic systems that arises because encryption algorithms need data (we'll call it plaintext) to come in blocks of a fixed size (inventively called the blocksize). If your encryption algorithm works on blocks of size 4, but you've only got 3 bytes of data to send it, you're going to need a way to get it up to 4 bytes. The way that's done is called padding.

Padding

We need a way to make our messages a multiple of our encryption algorithm's blocksize while still being able to tell what the plaintext was when we decrypt it (i.e. if you pad with random bytes, how will you know what's padding vs plaintext?). Fortunately, there's already a handy standard defined for us (it's PKCS #7 if you're interested). The way it works is that you encrypt each block as normal until you get to the last one. For the last block, you calculate the padding as the blocksize minus the length of the last block. E.g. if your blocksize is 4 and your last block is 3 bytes long, your padding is 1, so you append a 1 to the last block; if your last block was of length 2, you'd append 2 bytes with the value of 2. More concretely, with a blocksize of 4 bytes, the plaintext {1,2,3} becomes {1,2,3,1} and {1,3} would become {1,3,2,2}. Now, you may be wondering what would happen if your plaintext was {1,2,3,1}; would the last 1 be discarded? Nope, we always add padding, so {1,2,3,1} would become {1,2,3,1,4,4,4,4} (it's not perfect since it bloats the data, but it means we can always tell the padding apart from the plaintext when we decrypt).
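
If it helps to see that in code, here's a minimal sketch of that padding scheme in Python (the blocksize of 4 and the function names are just for illustration):

    BLOCK_SIZE = 4

    def pad(data: bytes) -> bytes:
        n = BLOCK_SIZE - (len(data) % BLOCK_SIZE)    # always 1..BLOCK_SIZE, never 0
        return data + bytes([n]) * n

    def unpad(data: bytes) -> bytes:
        n = data[-1]
        if not (1 <= n <= BLOCK_SIZE) or data[-n:] != bytes([n]) * n:
            raise ValueError("bad padding")          # this complaint becomes the oracle later on
        return data[:-n]

    print(list(pad(bytes([1, 2, 3]))))       # [1, 2, 3, 1]
    print(list(pad(bytes([1, 3]))))          # [1, 3, 2, 2]
    print(list(pad(bytes([1, 2, 3, 1]))))    # [1, 2, 3, 1, 4, 4, 4, 4]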

Now we've covered the case where we don't have enough data to fill a block for encryption. We've also seen that you may have more than one block of data to encrypt, and (most) encryption algorithms work on exactly one block of data at a time, so we need to figure out what to do about that, too. Turning a block cipher (an encryption algorithm that works on data a block at a time; i.e. most of them) into a stream cipher (one that works on an arbitrary amount of data) is an important area of research and one that's subtly easy to mess up. So, let's dig into some of those methods (which are frequently called "modes" in crypto-speak).

From Block to Stream

The most obvious approach to encrypting a stream of data longer than one block is to just keep applying the algorithm, block after block. This particular way of doing things is called Electronic Code Book (ECB), and it has one really nasty problem: using ECB means that any particular block of plaintext will always encrypt to the same ciphertext for a given key. A demonstration may help show why that's so bad.

Starting with this plaintext image:

I applied a Caesar cipher (a really simple algorithm that does a constant mapping from one byte to another; you'd never use it in practice, but it exaggerates the effects of ECB so they're easier to see) in ECB mode.

The result was this image:

That's obviously not doing a great job of hiding what the plaintext was. In fact, with this particular encryption algorithm, it's no more secure than a newspaper puzzle.

It's pretty clear that we're going to need something better, something that muddles the plaintext of each block before it goes into the encryption algorithm. 

Our next contender mixes up each block of plaintext by xor'ing it with the previous block's ciphertext before encrypting it, and it's called (again with the imaginative names) Cipher Block Chaining (or CBC for short; we LOVE acronyms!)


To see how CBC fares, I put it through the same encryption algorithm used for ECB and got this image (spoiler alert: you can still kind of see the shape; that's because the encryption has a blocksize of 1):

Even with such a painfully simple encryption algorithm, CBC isn't a complete failure. You actually have to look at the image to see the schooner Pac Man and you can't readily tell that there are two of them overlapping. 

The only obvious problem with CBC is: what do you do about the first block, which has no previous ciphertext to xor against? You could just encrypt its plaintext directly, but then the first block would have the same issues as ECB (and the first block in any data exchange tends to have a lot of data that's the same every time; e.g. GET / HTTP/1.1\r\n)

To avoid having our first block basically be in ECB mode, we add a block of random data, called an Initialization Vector (IV; goooooooo acronyms!) Using an IV, CBC mode looks pretty secure, and it was used for a very long time. Unfortunately, appearances can be deceiving and there's a devastating vulnerability hidden there. To see it, let's make a trip to Delphi and consult the padding oracle.
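
Here's a tiny sketch of the two modes in Python, using the same kind of byte-wise Caesar "cipher" as the images above (blocksize 1). The shift value and function names are made up, and it's only meant to show the chaining, not to be secure:

    SHIFT = 42  # toy "key"

    def caesar(b):
        return (b + SHIFT) % 256

    def ecb_encrypt(data):
        # every identical plaintext byte maps to the same ciphertext byte
        return bytes(caesar(b) for b in data)

    def cbc_encrypt(data, iv):
        # each byte is xor'ed with the previous ciphertext byte before encryption;
        # the IV stands in for the "previous ciphertext" of the very first byte
        prev, out = iv, []
        for b in data:
            prev = caesar(b ^ prev)
            out.append(prev)
        return bytes(out)

    print(ecb_encrypt(b"AAAA").hex())        # repeated input -> repeated output
    print(cbc_encrypt(b"AAAA", 0x17).hex())  # the repeats are hidden by the chaining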

The Padding Oracle (of CBC Mode)

Before getting into the details of the padding oracle, a quick refresher on the only thing you'll need to know about xor -- xor undoes itself. By that, I mean that if you have a xor b = c, then c xor a = b and c xor b = a. Xor undoes xor the way subtraction undoes addition.
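
In code, that refresher is just this (a throwaway illustration):

    a, b = 0b1100, 0b1010
    c = a ^ b                           # a xor b = c
    assert c ^ a == b and c ^ b == a    # ...so xor undoes itself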

The base assumptions for the padding oracle are that you have the ciphertext (one or more blocks of it) and that, if one end of the communications channel receives data that it can't decrypt properly, it'll respond with an error message. One reason the data may not decrypt properly is that the padding is wrong. Looking back, we see two things: 1) we know what the last byte of a block has to be if it's the last block of a stream and 2) the plaintext is modified by xor'ing it against the prior block's ciphertext before encrypting (which means that decrypting a block yields the xor of the plaintext and the previous block's ciphertext).

Given those assumptions, here's how the padding oracle works. For the sake of simplicity, I'll refer to the end of the communications channel to which we're sending data to be decrypted as the server.
 
  1. Take one block of ciphertext out of the stream (doesn't really matter which one)
  2. Send that block to the server with a custom IV
  3. If you get an error that decryption failed, change the last byte of the IV and try again; repeat this step until decryption succeeds. At that point you know that the last byte of the plaintext xor'ed with your IV is valid padding -- almost always 1, but possibly 2 if the penultimate byte of the xor'ed plaintext happens to be 2 (or 3 if the two bytes before it are both 3, etc)
  4. At this point, you suspect that the plaintext xor the IV that you're sending ends in 1; so let's say that the last byte of the original plaintext is P and the last byte of the IV is I and the last byte of the xor'ed plaintext is Px. We know that P xor I = Px and we know that Px is probably 1, so we can reorder that so that Px xor I = P and now we can calculate P. The oracle has started speaking and is pronouncing the doom of your message's secrecy.
  5. Now that we know what the last byte of the actual plaintext is (what we found in 4), we can change our IV so that the last byte will decrypt to 2 and repeat step 3, this time modifying the next-to-last byte, until the server doesn't report a decryption error. 
  6. If you get through all 256 possible values for that byte without success, then you know that you did not find the actual value of the last byte in step 4, so you drop back to step 3 and continue (with the IV that you were using in 3, without 4's changes) until it successfully decrypts again. 
  7. Once you get a success, you've definitely found the actual value of the last byte. You can use that and the value that you got the first time to figure out what padding value you had actually produced the first time, and then you'll know that the plaintext ends in a run of one less than that many copies of that number; i.e. if you find out that you had originally gotten a 3, then you know that the two bytes before the last byte of the plaintext are 3. 
  8. From steps 3-7, you now know at least the last two bytes of plaintext, and you can keep applying steps 3-7, moving one byte toward the front each time, until you have the entire block; then just repeat the whole process for each block (there's a rough code sketch of the whole attack right after this list).
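
As promised, here's a rough sketch of the attack in Python against a toy CBC setup. The "block cipher" is just a byte-wise Caesar shift like the one from the images, and the names (padding_ok, BLOCK_SIZE, attack_block) are invented for this sketch; the attack only relies on CBC plus the server's padding check, not on which cipher is underneath. It also resolves step 3's ambiguity up front (with an extra probe) instead of backtracking as in steps 6-7, but the idea is the same.

    BLOCK_SIZE = 8
    KEY = 42  # toy "key": every byte gets rotated by this amount

    def encrypt_block(block):   # toy block cipher -- NOT secure, just deterministic
        return bytes((b + KEY) % 256 for b in block)

    def decrypt_block(block):
        return bytes((b - KEY) % 256 for b in block)

    def pad(data):              # PKCS #7, as described earlier
        n = BLOCK_SIZE - len(data) % BLOCK_SIZE
        return data + bytes([n]) * n

    def cbc_encrypt(plaintext, iv):
        data, prev, out = pad(plaintext), iv, b""
        for i in range(0, len(data), BLOCK_SIZE):
            block = bytes(p ^ c for p, c in zip(data[i:i + BLOCK_SIZE], prev))
            prev = encrypt_block(block)
            out += prev
        return out

    def padding_ok(iv, block):  # the "server": it only tells us whether the padding was valid
        plain = bytes(p ^ i for p, i in zip(decrypt_block(block), iv))
        n = plain[-1]
        return 1 <= n <= BLOCK_SIZE and plain.endswith(bytes([n]) * n)

    def attack_block(prev_block, block):
        """Recover one block of plaintext using nothing but padding_ok()."""
        intermediate = bytearray(BLOCK_SIZE)      # decrypt_block(block), found byte by byte
        for pos in range(BLOCK_SIZE - 1, -1, -1):
            pad_val = BLOCK_SIZE - pos            # the padding value we're trying to force
            for guess in range(256):
                iv = bytearray(BLOCK_SIZE)
                iv[pos] = guess
                for k in range(pos + 1, BLOCK_SIZE):
                    iv[k] = intermediate[k] ^ pad_val   # force the already-known tail to pad_val
                if not padding_ok(bytes(iv), block):
                    continue
                if pos == BLOCK_SIZE - 1:         # step 3's ambiguity: was it really a 1?
                    iv[pos - 1] ^= 0xFF
                    if not padding_ok(bytes(iv), block):
                        continue                  # no -- it was a longer run; keep looking
                intermediate[pos] = guess ^ pad_val
                break
        # undo the CBC xor with the real previous block (or the IV) to get the plaintext
        return bytes(i ^ p for i, p in zip(intermediate, prev_block))

    iv = bytes(range(BLOCK_SIZE))
    ct = cbc_encrypt(b"attack at dawn", iv)
    blocks = [iv] + [ct[i:i + BLOCK_SIZE] for i in range(0, len(ct), BLOCK_SIZE)]
    recovered = b"".join(attack_block(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1))
    print(recovered)   # b'attack at dawn\x02\x02' -- the plaintext plus its padding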

Silencing the Oracle

Things look bleak; the Oracle's pronouncing our doom and Cassandra's gone hoarse from warning us. Is there anything we can do? Fortunately, yes, there is! Newer versions of TLS (Transport Layer Security, the successor to SSL) provide other modes for block ciphers, like GCM (Galois/Counter Mode), that authenticate the ciphertext, so an attacker can't tamper with the IV and learn anything useful from the resulting errors. 

To make sure that your traffic isn't exposed, the most straightforward protection is to completely disable SSLv3 (and earlier, obviously; SSLv2 was broken almost 20 years ago!) and only use newer versions of TLS (v 1.2 if you can, because that's where AES-GCM was introduced as an available algorithm).
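
For example, with Python's ssl module, something along these lines does the trick (a sketch; adjust it to whatever your server or client framework actually exposes):

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)           # despite the name, this negotiates TLS too
    ctx.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3    # refuse the broken protocol versions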

Thursday, June 26, 2014

Storing Strings, Succinctly

Alas, I lack an alliterative algorithm for squashing strings into less space, but hopefully I can make it up to you by sharing an improvement on a simple way of storing strings. To make it easier to follow, I'll build up to the full algorithm, step-by-step. 

Step 1: Reduce Randomness

Compression algorithms in general work better if you can reduce the randomness in your data. That may seem impossible (after all, the information is what the information is), but it actually is possible. The way that we're going to reduce randomness is by sorting our strings (i.e. putting them in alphabetical order). As an example, if we have the strings {cat, carrot, car}, we can store them as [car, carrot, cat]. That may not seem like it helps, but it does.

Step 2: Remove Redundancy

Looking at our list above ([car, carrot, cat]), we see that car is a prefix of carrot, and carrot and cat share the prefix "ca". We can use that information to cut down on what we have to store, by recording the length of the shared prefix and then just writing the suffix. What we end up with is [c,a,r,3,r,o,t,2,t], where 3 and 2 are the shared prefix lengths. Now, you may be asking how we can tell when one word ends and the next begins, and if you are, good, because that's a great question! One approach, which will probably be warmly embraced by C programmers, is to null-terminate the strings, i.e. end them with a 0 (that's a zero, not an O). That approach works pretty well and leaves us with [c,a,r,0,3,r,o,t,0,2,t]
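
Here's a quick sketch of steps 1 and 2 in Python (the function names are made up, and it assumes plain ASCII words):

    def shared_prefix_len(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    def encode(words):
        out, prev = bytearray(), ""
        for w in sorted(words):                      # step 1: sort to line up shared prefixes
            n = shared_prefix_len(prev, w)
            if prev:                                 # the very first word has no prefix byte
                out.append(n)                        # step 2: length of the shared prefix...
            out += w[n:].encode("ascii") + b"\x00"   # ...then the suffix, null-terminated
            prev = w
        return bytes(out)

    print(encode(["cat", "carrot", "car"]))
    # b'car\x00\x03rot\x00\x02t\x00'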

Step 3: Look at Lengths

English words, by and large, aren't terribly long. With one byte, like the one we used to store the length of the shared prefix or the one with which we terminate strings, we can store up to the number 255. Most words aren't nearly that long and, unless you're dealing with protein names, you're not likely to encounter any of the few that are longer than that. To take advantage of that, instead of null-terminating, let's store the length of the prefix and the length of the suffix. Since we're storing lengths, we can also take advantage of words being short by storing the prefix and suffix lengths together in one byte. If we devote 4 bits to the length of the prefix, we can avoid storing up to 15 bytes of prefix; if the shared prefix is longer than that, we can just duplicate the extra (i.e. store anything past the first 15 shared bytes as part of the suffix). (4 bits isn't a magic number; I arrived at it through trial and error using a list of 99,171 English words; my initial guess at a good number was 3 bits.) That leaves us with another 4 bits to store the suffix length and a bit of a quandary if the remaining suffix is more than 16 bytes long (it would be 15, but since the suffix must be at least 1 byte, we trade away the ability to store 0 to make the stored length 1 smaller). Thankfully, another compression trick, called (among several other things) variable byte encoding, comes to our rescue.

The standard way that variable byte encoding works is to use one bit of each byte to encode whether there are more bytes in the number. We're going to deviate from that just a bit by instead using up to the entire 4 bits to store the length. If the length is greater than or equal to 15, we store 1's in all 4 bits, subtract 15 from the length, and then use standard variable byte encoding to store the rest (even if that's a byte of all zeros). 
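
Putting step 3 together, here's my reading of the scheme in Python. It's a sketch rather than a reference implementation (in particular it skips the store-the-length-minus-one trick and just escapes at 15):

    MAX_NIBBLE = 15

    def encode_word(prev, word):
        # shared prefix, capped at 15; anything past that just gets re-stored in the suffix
        n = 0
        while n < min(len(prev), len(word), MAX_NIBBLE) and prev[n] == word[n]:
            n += 1
        suffix = word[n:].encode("ascii")
        out = bytearray()
        if len(suffix) < MAX_NIBBLE:
            out.append((n << 4) | len(suffix))       # both lengths packed into one byte
        else:
            out.append((n << 4) | MAX_NIBBLE)        # 1111 means "the length continues"
            rest = len(suffix) - MAX_NIBBLE
            while True:                              # variable byte encoding: 7 bits of
                b = rest & 0x7F                      # length per byte, high bit set when
                rest >>= 7                           # more bytes follow (emitted even if 0)
                out.append(b | (0x80 if rest else 0))
                if not rest:
                    break
        return bytes(out) + suffix

    print(encode_word("car", "carrot").hex())   # '33726f74' -> prefix 3, suffix "rot"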

Results

So, what has all this bit-twiddling bought us? Encoding 99,171 words, which originally took a total of 938,848 bytes, got them down to 323,370 bytes. Now, that's still not tiny, but it's roughly a 66% reduction in size. Gzip gets it down to about 256,000 bytes, which is considerably smaller, but this approach has the benefit that it should be fairly easy to implement so that it encodes and decodes faster than gzip and allows something much closer to random access to individual elements.

Hey, what do you know, I did manage to describe the algorithm using alliteration! Bonus! I hope this has been at least slightly entertaining, I hope it has been useful, and I hope it has been informative.