When I heard about Gawker getting compromised I knew it was not going to be pretty. Particularly with regards to their password database. Once again, the ugly warts of shared secret authentication systems are brought to the headlines. We got our hands on a copy of the password database. For reasons only Gawker administration know at this point the database only has traditional DES crypt style hashes (yes, that DES). Ideally every password for a web application user is stored using a random salt per password (at least 4 bytes for good measure) and a safe hash algorithm, like SHA1. That’s it. (See the end of this post for a better recommendation that should be considered the end goal) Storing passwords securely is not difficult, you just have to know what to do and believe it is important enough to do it. What you are about to read would not be possible if this simple guideline was followed.
There are a few lessons we can learn from this that I think are instructive for anyone that has to store user passwords and authenticate users:
- The top 100 passwords you NEVER want to allow your users to select.
- How to properly store passwords and how to construct a password policy.
First, the technical details of how we proceeded: I have about 14-16 cores I can dedicate to password cracking spread across about six machines. They are all fairly beefy, but not crazy. The challenge this presents is parallellzing the whole process across the network. This is where John the Ripper with the MPI patchcomes in. MPI allows John to distribute the work to a large number of disparate systems, which has advantages in that any old hardware can be plugged in for the task. Each cracking node needs to run a daemon or agent that communicates with one central node that coordinates the cracking efforts.
Setting all of this up requires a bit of effort, but is not the most difficult task. Once a john –test kicks off (as shown below) properly we are off to the races with our 14 cracking nodes.
mpiexec -machinefile machines -n 14 ./john --test Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 33099K c/s real, 35460K c/s virtual Only one salt: 28179K c/s real, 30031K c/s virtual
Password cracking of this nature is an embarrassingly parallel problem to solve. My crack nodes were sending a few hundred bytes per second to each other to handle the overhead of coordinating the cracking efforts. In a couple hours of cracking we were able to crack about 160,000 passwords:
166166 password hashes cracked, 581986 left
This means approximately 22% of the passwords were recovered on short order. All of these users that reuse passwords are now at severe risk. Additionally, time and time again, the #1 worst password is “123456″ by a wide margin. Taking out my copy of the book “Perfect Passwords” and pulling up their top 500 worst passwords of all times shows the following passwords as the 11 worst:
Perfect Passwords Top 11
123456 password 12345678 1234 pussy 12345 dragon qwerty 696969 mustang letmein
OK, so I cheated that is actually 11 passwords. Now, here are the top 11 from the Gawker dump (number on the left is number of occurences):
Gawker Top 11
4162 123456 3332 password 1444 12345678 861 lifehack 765 qwerty 529 abc123 503 12345 471 monkey 439 111111 410 consumer 391 letmein
Notice that number 11 is exactly the same? And that number 1 and 2 and 3 are exactly the same? This keeps happening….. over and over. :( The old saying, “creatures of habit” comes to mind. We are letting our users down by letting them pick these passwords. We are also letting our users down by not protecting their secrets better.
I could dig into the statistics and interesting features of these passwords (and there is a lot to dig into here), but the real lesson here is that you should never store passwords in a recoverable format like this. If you have a number of passwords like this you need to run, not walk, to the design board and find a way to fix it ASAP. Also, consider disallowing users from picking any password in the gawker password database. Every last one you can recover. Don’t let your users use a single one of those!
A few updates: WSJ has good coverage of this topic as well.
Also, our suggestion at the top for password storage is a bare minimum recommendation. You should be using something like bcrypt that is several orders of magnitude more difficult to crack. Another option is to roll your own, but you really should not. The key idea is you need something that takes a lot of time (for a computer). 100-300 milliseconds on your hardware, but this depends on the time vs. performance trade offs you can tolerate. Even 10ms is better than a few microseconds. There are several schemes to do this, they generally repeat multiple rounds of their core algorithm (something that really mixes the data up). The key idea is to make password recovery very slow by forcing an attacker to perform some expensive operation a variable number of times for just one candidate password attempt. Read this article on Unix crypt and then read this article to become even better informed.
(Link to [download id="267" format="2" autop="false"] for the curious)
That is it for now, thanks to fellow Intrepidus consultant Jason Ross for doing some of this footwork!
Published date:  14 December 2010
Written by:  jeremy.allen