Sunday, March 28, 2010

How to safely handle user passwords

When you create a website or other service, you take on a responsibility to store users' credentials properly; someone who gets access to a user's account can easily see personal information, potentially even financial information. Even if you think you don't have anything of importance, 73% of users[1] have the same password for everything, including online banking accounts.

When dealing with passwords, you should generally assume that the attacker has a copy of your database. Many websites are still vulnerable to SQL injection[2], and most are vulnerable to XSS/CSRF attacks[3]. For this reason it is a bad idea to store passwords as clear text; instead one should use a hash[4].

Even with properly stored passwords, users still insist on choosing low-security passwords[5]. As a result, malicious hackers have compiled lists of commonly used passwords. Many techniques have been developed to mitigate these brute-force attacks. For example, some sites require you to enter a CAPTCHA, but most of these can easily be defeated[6]. Another simple technique is to limit login failures (once an incorrect password has been entered, don't allow another attempt for X period of time). I prefer an exponential delay, where each failed attempt causes the delay to become longer than the previous one. There are other possibilities that take into account more sophisticated criteria, such as geographical region and time of day.
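As a rough illustration of the exponential delay, here is a minimal Python sketch. The function name, the one-second starting delay, and the one-hour cap are my own assumptions for the example, not values recommended anywhere in this post; a real system would also need to store the failure counter per account and reset it after a successful login.

```python
def lockout_seconds(failed_attempts, base_seconds=1.0, max_seconds=3600.0):
    """Return how long to refuse logins after `failed_attempts` consecutive failures."""
    if failed_attempts <= 0:
        return 0.0
    # Clamp the exponent so a huge failure counter cannot overflow the float math.
    exponent = min(failed_attempts - 1, 32)
    # Each additional failure doubles the wait, up to the (assumed) cap.
    return min(max_seconds, base_seconds * 2 ** exponent)


# Show how the delay grows over the first few failures.
for attempts in range(6):
    print(attempts, lockout_seconds(attempts))
```

The cap matters: without it, a legitimate user who mistypes a password many times (or an attacker deliberately failing logins for someone else's account) could lock the account for an unreasonable length of time.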

Although the above solutions are important to know, they won't help if the attacker has a copy of the user database. In the past, hashing the password was sufficient to prevent an attacker from accessing user accounts. Nowadays computers are fast enough that even though the passwords are hashed, compromise is still possible. This is done by taking the list of commonly used passwords and hashing each one to see if any match the database. In order to limit the potential for this attack, a new defense was created called "salting". What this entails is hashing a random value (called a "salt") combined with the password, and storing the salt alongside the resulting hash. When the user submits their password, the system hashes it combined with the stored salt and compares the result to the hash already in the database. The security benefit is that the attacker needs to calculate the hashes for common passwords separately for each user, instead of hashing the list once and comparing it against every account.
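The salting scheme just described can be sketched in a few lines of Python. This is only an illustration of the idea: the function names and the 16-byte salt length are assumptions of mine, and a single pass of SHA-256 is used purely because it is in the standard library, not because a fast hash is a good choice for passwords.

```python
import hashlib
import hmac
import secrets


def make_record(password):
    """Create the (salt, digest) pair that would be stored for a new user."""
    salt = secrets.token_bytes(16)  # fresh random salt for every user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def check_password(password, salt, digest):
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, digest)


salt, digest = make_record("correct horse")
print(check_password("correct horse", salt, digest))
print(check_password("wrong guess", salt, digest))
```

Because each call to `make_record` draws a new salt, two users with the same password end up with different stored digests, which is what forces the attacker to redo the work per user.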

As technology improved, even this became insecure. Nowadays attackers can just hash every conceivable password. Furthermore, attackers can work together using "rainbow tables", which contain pre-computed hashes of millions of passwords and salts. These tables are often generated using distributed computing, so each attacker does not have to develop one on their own. This reduces the amount of security that salts can offer.

Now that salting is not good enough, we need to explore other options. The main factor when exploring these options is time, because it is impossible to create an uncrackable password. What we do instead is increase the time required to discover the passwords so as to discourage the attacker. The issue is that many hash functions were not designed for password security, but rather for speedy verification. Hash functions like sha-256 (despite currently being unbroken) lend themselves to quickly hashing lists of passwords. Modern computers can md5-hash every conceivable alphanumeric 7 character password in less than an hour[7]. Despite widespread use, old hash functions like md5[8] or sha1[9] were recently discovered to be insecure.
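To make the time argument concrete, here is a small back-of-the-envelope calculation in Python. It assumes "alphanumeric" means 62 symbols (upper case, lower case, and digits) and uses a hypothetical rate of one billion hashes per second, which I picked only because it is consistent with the under-an-hour figure above; it is not a benchmark of any real hardware.

```python
def keyspace(alphabet_size, length):
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def hours_to_exhaust(alphabet_size, length, hashes_per_second):
    """Hours needed to hash every password in the keyspace at a given rate."""
    return keyspace(alphabet_size, length) / hashes_per_second / 3600


ASSUMED_RATE = 1_000_000_000  # hypothetical: one billion fast hashes per second

for length in (7, 8, 9):
    print(length, keyspace(62, length),
          round(hours_to_exhaust(62, length, ASSUMED_RATE), 1))
```

Two things fall out of the arithmetic: each extra character multiplies the attacker's work by 62, and dividing `hashes_per_second` by some factor (which is what a deliberately slow hash does) multiplies the time by the same factor for every password length.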

Today there is bcrypt [pdf], a hash function created specifically for password security. It is slow by design, and it can keep up with the constantly increasing speed of computers because it uses a configurable "work factor". Although you might think this will bog down your server, those who use it don't find this to be an issue. For these reasons I strongly advise using bcrypt along with the previously stated techniques such as salting.
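The idea of a tunable work factor can be sketched with Python's standard library. Note the substitution: bcrypt itself is not in the standard library, so this example uses PBKDF2 (`hashlib.pbkdf2_hmac`), a different algorithm whose iteration count plays a similar role. The function names, the record layout, and the iteration counts are assumptions made for the illustration.

```python
import hashlib
import hmac
import secrets


def hash_password(password, iterations=100_000):
    """Return (iterations, salt, digest); the cost is stored with the hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return iterations, salt, digest


def verify_password(password, record):
    """Recompute the digest using the cost and salt stored in the record."""
    iterations, salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)


def needs_rehash(record, current_iterations):
    """True if the record was created with a lower cost than today's setting."""
    return record[0] < current_iterations


record = hash_password("correct horse", iterations=100_000)
print(verify_password("correct horse", record))
print(needs_rehash(record, current_iterations=200_000))
```

Storing the cost next to the hash is what lets the work factor keep pace with hardware: old records still verify, and a record flagged by `needs_rehash` can be re-hashed at the higher cost the next time that user logs in successfully.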

2011-6-16: update: At the time I wrote this article I was unaware of scrypt, a slower (and therefore better) function to use[10].

[4] A hash is a function that takes some input and produces a (sufficiently) unique output such that the original input cannot be recovered. One simplistic (and highly insecure) hash function would be to count the number of times each letter occurs and store that instead. For example, "One very bad zany apple" would become "31013000000102120100010021". It is not possible to know whether this hash came from "One very bad zany apple", "Noe vrey adb nzya pplea", or "Npdyoazaevebrnpyela". A cryptographic hash function is designed so that multiple inputs with the same hash, like these, are hard to find.
[5] [pdf]
[6] Aska: A viable alternative to CAPTCHA? - Eitan Adler (2008)
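The letter-counting hash from footnote [4] is easy to try out. The sketch below is my own rendering of it in Python (the function name is made up, and it ignores case, spaces, and anything that is not a letter a-z); it exists only to show why such a function is a poor hash.

```python
import string


def letter_count_hash(text):
    """Toy hash: one digit per letter a-z giving how often it occurs."""
    lowered = text.lower()
    return "".join(str(lowered.count(letter)) for letter in string.ascii_lowercase)


inputs = [
    "One very bad zany apple",
    "Noe vrey adb nzya pplea",
    "Npdyoazaevebrnpyela",
]
digests = {letter_count_hash(text) for text in inputs}
# All three inputs collapse to a single digest, so the set has one element.
print(len(digests) == 1)
```

Any rearrangement of the same letters collides, so an attacker who sees the stored value can construct a matching input immediately, without ever learning the original password.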

Edit 2010-3-31: fix footnote numbers; subtle grammar errors fixed.
Edit 2011-2-11: grammar errors fixed - thanks JT.

Edit 2011-6-16: change dates to use ISO format


  1. Another method is to simply store multiple hashes alongside one another and only accept the password if it matches both/all of them, as it becomes increasingly more difficult to generate a valid password to match more than one hash without knowing the password itself.

  2. @Dan
    That is an approach I didn't think of and it also seems to work.

  3. That is true. The project I am currently working on incorporates bcrypt; checking two hashes is a good idea. You can have two different hashes from two different hash algorithms.

  4. The more I think about it, the less I value checking multiple hashes. It doesn't seem to offer any particular guarantee that checking one hash does not. Furthermore, it doesn't have the minimum work guarantee that scrypt (or even bcrypt) offers. It probably does not hurt, but I don't think it helps either.

  5. I really didn't know much about hashing before your article, but your depth is really great.

  6. If the attacker has access to both pw-hashes, he can choose which hash he attacks. Naturally he would choose the one which uses fewer resources to verify against (if they aren't in the same complexity class) and then verify with the second hash to make sure he didn't get another pw which hashes to the same value. So using two would be less secure in the sense that you would think you are more secure than your strongest hash, while you are only as secure as the weakest link alone. I don't really know, but I think that the fact that the attacker could find a word which doesn't match the PW but computes to the same hash is not that significant from a complexity point of view in this case.

    Two hashes help (if they are not from the same algorithm family) if you use them as e.g. file checksums (to protect against aliasing attacks, which try to generate a file with the same hash but different content) instead of password hashes. Here you can be more secure (you could not only detect whether a file has changed, you could also detect that someone is trying to hide that he has changed the file).

    1. Wow. I did not think of that attack even though it should have been obvious. Thanks!

