TLS and RC4 (March 2013)

You have probably read (e.g. at slashdot and everywhere else) about this month’s SSL/TLS weakness, this time around related to RC4.

The authors have a good page at http://www.isg.rhul.ac.uk/tls/ which is a useful starting point. The slide deck at http://cr.yp.to/talks/2013.03.12/slides.pdf has nice raw data and graphs.

The attack

RC4 is a stream cipher (not a block cipher like DES or AES). In short, RC4 is essentially a pseudo-random number generator. To encrypt, the resulting stream of bytes is combined (XOR’d) with the plaintext to produce the ciphertext. Ideally, if the keystream were truly random there would not be any patterns or biases in the output: for every byte position, each possible value (0-255) should be equally likely.
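
To make that concrete, here is a toy sketch of RC4 in Python (an illustration, not a hardened implementation): the key-scheduling step, the keystream generator, and the XOR step.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    # Key-scheduling algorithm (KSA): permute S under the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): emit n keystream bytes.
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with keystream.
    return bytes(c ^ k for c, k in zip(data, rc4_keystream(key, len(data))))
```

Decrypting is just encrypting again with the same key. That symmetry is also why keystream biases matter: ciphertext = plaintext XOR keystream, so any statistical tilt in the keystream shows through as a tilt in the ciphertext.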

This is not quite true for RC4. That in itself is not news; it had been shown earlier. But this latest work has mapped out in detail the biases in all of the first 256 bytes of the stream output (the slide deck has graphs for all of them). Given that info, it becomes statistically possible to recover plaintext from the first 256 bytes. The probability of success depends on how much ciphertext has been captured. See slides 305 to 313 of the slide deck for success probabilities. To summarize, with 2^24 ciphertexts we can already recover some bytes. With 2^32 ciphertexts we can recover just about all of the plaintext with near certainty.
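
To get a feel for the statistics, here is a simplified simulation (not the paper’s full attack, which uses the measured distributions for all 256 positions). Suppose one keystream position emits the byte 0 twice as often as any other value, roughly like RC4’s well-known second-byte bias. Collect many ciphertexts of the same plaintext byte under fresh keystreams and take the most common value:

```python
import random
from collections import Counter

random.seed(1)  # deterministic demo

def biased_keystream_byte() -> int:
    # Byte 0 comes out with probability 2/257; every other value 1/257.
    r = random.randrange(257)
    return 0 if r == 256 else r

secret = 0x47        # the plaintext byte we want to recover
N = 200_000          # number of captured ciphertexts
counts = Counter(secret ^ biased_keystream_byte() for _ in range(N))

# Since 0 is the likeliest keystream byte, the likeliest ciphertext byte
# is secret ^ 0 == secret: the mode of the ciphertexts IS the plaintext.
recovered = counts.most_common(1)[0][0]
```

With a bias this small, the mode only becomes reliable once the expected count gap dwarfs the sampling noise, which is why the real attack needs ciphertext counts in the 2^24 to 2^32 range.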

This attack is on RC4 output; it is not specific to the TLS protocol. TLS is the victim since RC4 is used in several of the most commonly used TLS ciphersuites. Recall that, as a mitigation for some of the other recent TLS attacks, there have been recommendations to switch to RC4 as a way to avoid the CBC ciphersuites! Now the pendulum swings the other way.

In practice

As with most attacks, the theoretical result is not necessarily so easily applied in practice.

This attack has a few things going for it though. It is not a timing attack, so it does not need to rely on carefully measuring execution time biases over a noisy network. Also, the attacker only needs to passively observe (and record) the encrypted exchanges between the two parties. The attacker does not need to be able to intercept or modify the communication.

The attacker still does need to capture a fair amount of ciphertext. On the surface, the numbers are not very big: 2^24 is less than 17 million requests (although that only gets you some bytes of the plaintext). For near certainty, 2^32 is over 4 billion requests. Harder, but not an intractable problem. And as the saying goes, attacks only ever get better. It is conceivable some future work will identify even more efficient ways of correlating the content.

For the attack to work, each of those requests does need to have the same plaintext content in the same location (or, at least, for those bytes we care to recover). Assuming HTTP plaintext, the protocol format will tend to produce mostly constant content which works to the benefit of the attacker.

For example, take the following request (edited, but from a real capture of my gmail connection). The cookie value starts at about byte 60, and if the browser generated it in that location once, every repeat of the same request is likely to have the same content in the same place.

GET /mail/?tab=om HTTP/1.1
Host: mail.google.com
Cookie: GX=ahc8aihe3aemahleo5zod8vooxeehahjaedufaeyohk4saif8cachoeph...
...
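
The fixed offset is easy to verify: the position of the cookie value is fully determined by the request line and headers before it. A quick check, using the request above with the cookie value abbreviated:

```python
# Same request line and headers => the secret lands at the same offset.
req = (b"GET /mail/?tab=om HTTP/1.1\r\n"
       b"Host: mail.google.com\r\n"
       b"Cookie: GX=...")  # value abbreviated

# Offset of the first byte of the cookie value.
offset = req.index(b"GX=") + len(b"GX=")
```

This lands at byte 62, matching the "about byte 60" estimate. As long as the browser emits the same request, the secret bytes sit at the same positions in every ciphertext, which is exactly the alignment the attack needs.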

So now the attacker just needs to trick the victim’s browser into repeating those requests, then sit back and record the traffic. Tricking the browser into making the requests is not that hard; there are plenty of attack channels available on that front. Successfully making enough of them before someone/something notices (e.g. gmail monitoring… although your local credit union probably won’t notice as fast) or the content changes (e.g. the cookie expires) will be trickier, but not impossible.

So it seems to me this attack is relatively hard to pull off in practice right now, but not impossible. Repeat the attack enough times on enough people and a few will succeed.

Some prevention

  • Change cookie/token values (whatever content is valuable to attack) often enough that capturing enough ciphertext within that time window becomes that much harder or even impossible.
  • Making header order/length unpredictable would help (the same bytes won’t be in the same locations), but this needs to be done by browsers (for the most likely attack scenario).
  • Avoid RC4 ciphersuites.
  • AES-GCM ciphersuites have been mentioned as an alternative to avoid both RC4 and CBC modes, but support for those is not quite there yet and there is a performance penalty.
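
For the third point, most TLS stacks let you exclude RC4 through the cipher configuration. A sketch using Python’s ssl module (the cipher-string syntax is OpenSSL’s; the exact suites available depend on your build, and modern builds may omit RC4 entirely):

```python
import ssl

# Build a client context and exclude RC4 (and anonymous/null suites).
ctx = ssl.create_default_context()
ctx.set_ciphers("HIGH:!RC4:!aNULL:!eNULL")

# Nothing RC4-based should remain in the negotiable list.
names = [c["name"] for c in ctx.get_ciphers()]
```

Server operators can apply the same cipher string in their web server configuration; the client-side version shown here simply refuses to negotiate RC4 at all.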

Additional reading…