Enough With the Salts: Updates on Secure Password Schemes

I’ve been spending some time recently combing through the old Matasano Blog Catacombs and blowing the dust off years-old tomes. It’s been amazing to see how much information from years ago is still relevant today. Case in point: “Enough With the Rainbow Tables: What You Need to Know About Secure Password Schemes” by Thomas Ptacek. In that post, Tom discusses the fascination with rainbow tables and gives some solid guidelines on secure password storage. He goes on to explain why the focus on rainbow tables is flawed and risks missing the true threat. If you haven’t read it, go read it now. I’ll wait.

Back? Okay, good. Now I’d like to expand on what’s changed since that post, and why its message is still relevant today.


Recap

From Tom’s post, you’ll recall that rainbow tables really only work against poorly designed password storage mechanisms. We’ve had the technology to thwart them - salts - since 1976. So why talk about them in a 2007 blog post?

Well, this was about the time that it became cost-practical (thanks to rapidly increasing hard drive sizes and falling prices) for the average person to store full rainbow tables for LM hashes. With less than 50 gigabytes of space, you could store a table containing every possible LM hash, allowing you to look up any password given its LM hash. These weaknesses were widely known in 2007 and before, but LM hashes were often enabled by default for backwards compatibility. Only in January of 2007, with the release of Windows Vista, did we start to see a shift away from LM hashes on Windows systems and domains, and with it a declining perception of rainbow tables as some sort of mystical force. Today, the use of LM hashes is almost unheard of, and rainbow tables are largely historic artifacts.


But this is 2015! So what has changed?

The security community is somewhat less obsessed now with rainbow tables, and rightly so since they’ve become less relevant. Still, I see the mantra ‘salt your password hashes!’ repeated over and over, when its only real value is against rainbow tables - a threat that’s largely irrelevant in practice.

Password security, however, remains an area full of poor advice and worse implementations. In Tom’s post, he cites an unnamed blog describing a ‘secure’ password storage scheme as:

hash = md5('deliciously-salty-' + password)

This is, of course, terrible. But it’s 2015, and things have progressed somewhat. Right? Well, just recently I was reviewing a web application platform and found a password storage scheme which can be distilled down to:

hash = sha2(per_user_salt + sitewide_salt + password)

So… SHA2 instead of MD5, which on the surface seems better, but the entire construction is poorly formed. The multiple salts don’t accomplish much: both are likely disclosed right along with the digests (the per-user salt sits in the same database, the sitewide salt in the application’s configuration or source), and any one component can be quickly brute-forced given knowledge of the other two. The differences between SHA2 and MD5 here are largely irrelevant, since both can be calculated quickly on modern hardware. So maybe not so much has changed after all.
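
To see why the salts buy so little here, consider a sketch of the attacker’s loop (all names and values hypothetical, and the wordlist trimmed to three entries for demonstration). Once the database and application source are in hand, both salts are just known constants, and each guess costs a single fast hash:

import hashlib

# Hypothetical breach scenario: both salts travel with the loot - the
# per-user salt sits in the stolen database, the sitewide salt in the
# application's config or source.
per_user_salt = b"a1b2c3d4"
sitewide_salt = b"s3cr3t"

def digest(password):
    # The platform's scheme: one cheap SHA-256 call per guess.
    return hashlib.sha256(per_user_salt + sitewide_salt + password).digest()

stolen = digest(b"hunter2")  # stand-in for a leaked digest

# GPUs run loops like this billions of times per second; the salts
# change nothing, because here they're just known constants.
for guess in (b"password", b"letmein", b"hunter2"):
    if digest(guess) == stolen:
        print("recovered:", guess.decode())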

Furthermore, this was the best storage mechanism of the options afforded by this system. Other choices included unsalted SHA2, salted and unsalted MD5, and plain text. Yes, plain text. In 2015. In a framework meant for building other applications upon. While the developers of these applications are smart to avoid implementing their own scheme, unfortunately the platform doesn’t appear to provide the sort of protection they may be expecting.

Slower == Better

The primary point of Tom’s 2007 post, and one which is as true today as it was then, is that you want password digest computation to be slow. You want it to take significant resources for an attacker trying tens of millions of possibilities (or more), while still being fast enough for your application to authenticate users without creating usability issues. ‘Fast enough for your application’ might be as long as a second or more - people are relatively slow compared to computers, and logins are relatively infrequent occurrences. Good password storage practices really haven’t fundamentally changed very much, but they’re still inconsistently applied. Tom’s post recommends bcrypt. While I’d probably choose scrypt today, for reasons I’ll cover below, bcrypt is still holding up well and is a reasonable choice. Following Tom’s advice will still yield a solid system.
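
For illustration, here’s roughly what following that advice looks like in Python, using the third-party bcrypt package (my choice for the example; any maintained bcrypt binding exposes equivalent calls):

import bcrypt  # third-party: pip install bcrypt

password = b"correct horse battery staple"

# gensalt() embeds both a random salt and the work factor ("rounds")
# in the resulting value, so the digest is self-describing.
digest = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# checkpw() re-derives the hash using the cost and salt stored in the
# digest itself, then compares.
assert bcrypt.checkpw(password, digest)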

Pass the salt


Storing passwords as unsalted hashes is a grievous mistake, of course, and breaches disclosing poorly stored passwords are still too common. But ‘salt’ isn’t the cure-all that some seem to believe. (See what I did there? Salt? Cure? Don’t forget to tip your wait staff!) With nearly every report of a breach disclosing password digests, there’s some mention of whether they were hashed and salted, or just hashed. Talk of ‘salted hashes’ has become a common way to put users’ minds at ease when a breach occurs.

The thing is, salting doesn’t matter nearly as much as many reports lead you to believe. Salt is the bare minimum a reasonable password storage scheme needs, but it does not by itself make a secure one; ideally, it’s not even something a password storage scheme should require its implementor to think about. Salted password hashes are attacked, and plain text passwords recovered en masse, every day. The value of salts is limited to thwarting attacks that leverage precalculated lookup tables such as rainbow tables. That is far from the largest threat facing disclosed password digests, so to focus on it as a primary concern is failing to see the forest for the trees.

The reason is simple, and the same as Tom described in his post - if your password digest can be calculated quickly, an attacker can try a ridiculously large number of hashes quickly. This is even more true today than it was when Tom’s post was penned (or keyboarded as the case may be). The increased prevalence of multi-core machines and GPU password cracking has made this a much more prominent threat than rainbow tables, and salts are entirely insufficient protection against it on their own.

For an example of this in action, we can look to Jeremi Gosney’s 25-GPU cluster. Granted, this is an exceptional system, but it’s within the reach of a determined individual or small organization - we’re not talking three-letter-agency or globalomnimegacorp dollars.

From Jeremi’s data, here’s how many digests of various types his rig can calculate per second:

Algorithm     Tries/sec
NTLM          350,000,000,000
MD5           180,000,000,000
SHA1          63,000,000,000
SHA512Crypt   364,000
bcrypt        71,000


The more tries you allow your attacker, the faster they can guess a password value successfully. QED.
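
To make that concrete, here’s a back-of-envelope calculation (my own arithmetic, using the rates above) for exhausting every 8-character lowercase password on that one rig:

# Time for the rig above to exhaust all 8-character lowercase
# passwords (26**8, about 2.1e11 candidates).
keyspace = 26 ** 8

for name, per_sec in (("MD5", 180e9), ("SHA1", 63e9), ("bcrypt", 71e3)):
    seconds = keyspace / per_sec
    print("%-7s %12.0f seconds (~%.1f days)" % (name, seconds, seconds / 86400))

# MD5 and SHA1 fall in seconds; bcrypt holds out for roughly a month -
# and that's per salt, for this one keyspace.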


Additional choices

A few more password digest schemes are worth a mention. We want a password storage mechanism that slows down brute-force and wordlist attacks well enough to thwart people with systems like the one described above, while still operating quickly enough for our needs. For the sake of completeness, I want to touch on two of the most common schemes that weren’t mentioned in Tom’s post. To be fair, one of them didn’t even exist then.

PBKDF2

Password-Based Key Derivation Function v2 (PBKDF2) is a key-stretching scheme defined in RFC 2898, which dates back to the year 2000. PBKDF2 repeatedly applies an HMAC or other pseudorandom function, mixing in salt, over a configurable number of iterations. This allows it to be made as slow as desired, and made slower over time to account for advances in hardware. The National Institute of Standards and Technology (NIST) also approves PBKDF2, provided an approved hash function is used as the pseudorandom function (PRF). This approved status is an important criterion in many environments, and may alone be enough to drive the decision.

PBKDF2 can be somewhat more difficult to implement properly than the alternatives due to its API complexity. Generation and storage of salts are left to the implementor, who must also choose an appropriate PRF. While this flexibility can be valuable, it also offers the opportunity to make poor choices, weakening the overall scheme.
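
As a sketch of what a careful implementation involves, here’s Python’s standard-library hashlib.pbkdf2_hmac with the salt handling done explicitly (the parameter choices are illustrative, not a recommendation for your environment):

import hashlib
import hmac
import os

def hash_password(password, iterations=86000):
    # Salt generation is on us with PBKDF2; 16 bytes from the OS
    # CSPRNG is a common choice. Store the salt and iteration count
    # alongside the digest.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, expected)

salt, iters, digest = hash_password("hunter2")
assert verify_password("hunter2", salt, iters, digest)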

scrypt

scrypt, developed by Colin Percival in 2009, is the relative new kid on the block. This may make some people immediately uncomfortable, but so far it’s stood up well to scrutiny. While bcrypt has been best practice for a long while now, scrypt seeks to address a key shortcoming shared by both PBKDF2 and bcrypt.

bcrypt and PBKDF2 aim to make calculation computationally expensive, which in turn makes them slow. However, both are highly parallelizable - with twice the hardware, you crack passwords nearly twice as fast. scrypt tries to counter that through ‘memory hardness’.

It’s complicated and mathy, but the unique idea scrypt brings to the table is that it tries to use as much memory as it can for a given running time. This is the exact opposite of what most people want their code to do, but it makes for a strong algorithm that requires a lot of storage and resists parallelization, making it much more resistant to hardware attacks, whether from ASICs, FPGAs, or GPGPUs.

The details of how it operates aren’t really something the consuming application’s implementor needs to worry about, but internally scrypt uses PBKDF2 to create a bit stream which is then fed through an algorithm called ROMix. ROMix is the part responsible for the memory constraints. It builds a table of pseudorandom values, then mixes the input with entries from that table in a pseudorandom order. Since each step can affect subsequent steps, an implementation must perform them in order and must keep the whole table in memory. These two criteria help prevent parallel implementations and ensure a much more significant memory requirement than PBKDF2 by itself. The output of ROMix is then fed through PBKDF2 again, and our final digest is the output.
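
For intuition only, here’s a toy sketch of the ROMix idea - emphatically not the real algorithm, which operates on large blocks via BlockMix rather than SHA-256 - showing the build-a-table phase and the data-dependent walk that forces the table to stay resident:

import hashlib

def toy_romix(seed, n):
    # Phase 1: fill a table with n chained hash outputs.
    x = hashlib.sha256(seed).digest()
    table = []
    for _ in range(n):
        table.append(x)
        x = hashlib.sha256(x).digest()
    # Phase 2: walk the table in an order dictated by the evolving
    # state itself. The lookups can't be predicted ahead of time, so
    # the whole table has to stay in memory.
    for _ in range(n):
        j = int.from_bytes(x[:4], "big") % n
        x = hashlib.sha256(bytes(a ^ b for a, b in zip(x, table[j]))).digest()
    return x

print(toy_romix(b"seed", 1 << 14).hex())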

Unlike PBKDF2, most implementations of scrypt can automatically and securely generate salts when creating digests. That’s one less detail for the implementer to get right, definitely a win. While work factor defaults are generally reasonable across various implementations, the presence of three distinct settings increases API complexity somewhat in comparison to bcrypt’s single ‘cost’ or PBKDF2’s ‘iterations’ settings.
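
In practice, use can be as simple as this sketch with Python’s standard-library hashlib.scrypt (available when Python is built against a recent OpenSSL - an assumption about your build; note that this particular low-level API is one of the exceptions that leaves salt handling to the caller):

import hashlib
import os

salt = os.urandom(16)  # this low-level API leaves salt handling to us

# N, r, and p per the starting points suggested below.
digest = hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1)

# Verification: recompute with the stored salt and the same parameters.
assert digest == hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1)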

Work factor

Each of these algorithms requires setting an appropriate work factor. This should be tuned to make digest computation as expensive as possible while still performing responsively enough for the application. For concurrent user logins, you may need response times under 0.5s on whatever hardware performs the validation. For something like single-user disk or file encryption, a few seconds might be fine. As a result, the specific work factor values appropriate for your application depend on the hardware running it and on the use case itself. Thomas Pornin gave some great guidance on determining these values in a post on Stack Exchange.

A reasonable starting point for work factors is something like:

  • scrypt: N=2^14, r=8, p=1
  • bcrypt: cost=11
  • PBKDF2 with SHA256: iterations=86,000
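
One way to refine these is to benchmark on the hardware that will actually verify logins, in the spirit of Pornin’s advice. Here’s a hypothetical helper of my own devising, using hashlib.scrypt (the latency target and memory cap are illustrative):

import hashlib
import os
import time

def tune_scrypt_n(target_seconds=0.25):
    # Find the largest N (within a memory cap) whose single digest
    # computation fits the latency budget on this machine.
    salt = os.urandom(16)
    n = 2 ** 10
    while n <= 2 ** 19:  # memory use is 128 * r * N bytes; cap ~512 MiB
        start = time.perf_counter()
        hashlib.scrypt(b"benchmark", salt=salt, n=n, r=8, p=1, maxmem=2**30)
        if time.perf_counter() - start > target_seconds:
            break
        n *= 2
    return max(n // 2, 2 ** 14)  # never tune below the baseline above

print("suggested N:", tune_scrypt_n())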

Gazing into the crystal ball

The future will likely bring us more large-scale password compromises. Projects like the Password Hashing Competition may yield more options for strong password storage and move the state of the art forward. Increased use of two-factor authentication and a focus on choosing unique passwords will hopefully limit the impact of password reuse. Hopefully, we’ll eventually find a successor to the venerable password. For now, it’s up to us as application developers and security professionals to at least ensure we’re storing credentials in a reasonably secure fashion.


TL;DR

Use scrypt, bcrypt, or PBKDF2 (in my order of preference) with an appropriate work factor.


When it comes to password storage, you’re much better off using a well-tested and reviewed system than writing your own. You’ll want that to be something purpose-built, rather than relying on fast cryptographic hashes, which are especially susceptible to fast guessing and hardware optimization. This means you’ll want to use scrypt, bcrypt, or PBKDF2 (in my order of preference) with an appropriate work factor. If you’re stuck deciding between scrypt and bcrypt, you can pretty much flip a coin and end up with something reasonable.

One hopes never to be the victim of a breach that discloses their authentication database to an attacker, but it’s better to plan for the worst than to be caught unprepared should it occur. Think of it as an insurance policy against the worst case.


Published date:  26 March 2015

Written by:  Jeff Jarmoc