Defaults to bcrypt with a cost factor of 10. A totally sane default. The interface is clear, simple, and easy to audit for. This is a definite step forward.
Auditing bcrypt (OpenBSD Blowfish) hashed passwords with JtR is much less satisfying than auditing md5 hashed passwords ;) Joking aside, I agree that this is a good move in the right direction.
Yeah, I'm just saying, the new standard password hash has a simple function signature; it's now easy to take a new PHP codebase and quickly check it to see if they're doing something wacky with passwords.
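For anyone who hasn't looked at the API yet, this is roughly all an auditor needs to find in a codebase. It is a minimal sketch; the password literals are placeholders.

```php
<?php
// Hash with the default algorithm (bcrypt at the time of this thread).
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// The result is a self-describing crypt(3)-style string: algorithm, cost and
// salt are embedded, so verification needs nothing but the stored hash.
$ok  = password_verify('correct horse battery staple', $hash);
$bad = password_verify('wrong guess', $hash);

var_dump($ok, $bad);
```

Anything that hashes passwords without going through these two calls is what deserves a closer look.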
bcrypt as specified is limited to 72 char passwords [1]. That may not matter for most applications where nobody is using more than 72 characters, but it's sad that one of the leading password hashing systems has such a built-in limitation.
[1] bcrypt, the passphrase-handling part being eksblowfish, takes the user passphrase and turns it into the array of 18 32-bit (4 byte) subkey values blowfish needs for encryption. 18 * 4=72 so that's where 72 comes from. Increasing that limit requires pre-hashing the passphrase down to 72 bytes or less, or chaining multiple bcrypts using each 72-byte chunk of the passphrase. Either of which makes it no longer bcrypt. The problem with pre-hashing is that there is no standard; every implementation uses some arbitrary hashing algorithm. It also adds complexity to implementations.
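To make the pre-hashing option concrete, here is one possible sketch. The helper names are hypothetical, and SHA-256 plus base64 is exactly the kind of arbitrary choice complained about above, not a standard.

```php
<?php
// Hypothetical helper: squeeze an arbitrary-length passphrase into something
// bcrypt will consume in full.
function prehash(string $password): string
{
    // Raw SHA-256 is 32 bytes; base64 turns it into 44 printable characters,
    // which stays under 72 bytes and cannot contain a NUL byte.
    return base64_encode(hash('sha256', $password, true));
}

function hash_long_password(string $password, int $cost = 10): string
{
    return password_hash(prehash($password), PASSWORD_BCRYPT, ['cost' => $cost]);
}

function verify_long_password(string $password, string $hash): bool
{
    return password_verify(prehash($password), $hash);
}
```

Hashes produced this way only verify through the same pre-hash step, so they are no longer interchangeable with plain bcrypt hashes from another implementation.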
First, nobody has a 72 character password. Second, at some point, the length of the password stops being the dominant factor in brute-forcing the hash, as Mazieres points out in the bcrypt paper.
Whose fault will it be if some naive application designer uses bcrypt on automatically constructed passphrases that have enough total entropy, but where the first 72 characters have very low entropy? It might be obvious to people who have looked at bcrypt that you should never do that, but someone may expect it to behave like any other hash that accepts arbitrary length input, and that could be catastrophic.
I would prefer that every hash function, including one that's designated as a password hash, utilize every bit of input to generate the output. If password length is limited, shouldn't it be done earlier, and not silently?
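A small sketch of the silent behaviour being described, as far as I know it applies to PHP's bcrypt, plus one hypothetical way to enforce the limit earlier and loudly. The cost of 4 is only there to keep the demo fast.

```php
<?php
// Two passphrases that share their first 72 bytes and differ only after that.
$first  = str_repeat('a', 72) . 'first-secret';
$second = str_repeat('a', 72) . 'second-secret';

$hash = password_hash($first, PASSWORD_BCRYPT, ['cost' => 4]); // low cost: demo only

// bcrypt only ever saw the 72 shared bytes, so the "wrong" passphrase verifies.
$collides = password_verify($second, $hash);

// Hypothetical guard: reject over-long input up front instead of truncating silently.
function assert_fits_bcrypt(string $password): void
{
    // strlen() counts bytes, which is the unit bcrypt's limit is expressed in.
    if (strlen($password) > 72) {
        throw new LengthException('Password exceeds the 72-byte bcrypt input limit');
    }
}
```

Whether to reject, pre-hash, or accept the truncation is an application decision; the guard just makes it an explicit one.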
During the Defcon password cracking contest this year, there were 5,150 bcrypt hashes. In 48 hours, only 76 of them were cracked. In contrast, of the 8,544 plain sha1 hashes, 4,111 were cracked. You can see the full list of hash types and how many were cracked here:
This is probably the premier password cracking contest in the world. I placed 7th overall and was first to crack most of the TrueCrypt volumes, and I despise attempting to crack bcrypt hashes (most sane people do). You really have to try to crack them yourself to fully appreciate how difficult and slow they are to attack.
No modern hash function, including "salted SHA" and Unix crypt(), is vulnerable to "rainbow tables". Rainbow tables work exclusively against the dumbest possible hashing schemes.
Bcrypt isn't a defense against "rainbow tables"; the hash function bcrypt supplanted (PHK's MD5 crypt) was also not "rainbow-table-able".
Correct me if I'm wrong but couldn't you construct rainbow tables for any hash function which takes a single input (including salted SHA, if the salt is the same for all passwords)?
Of course, you'd need a seriously large set of passwords on your hands for it to be worth the effort, but it could be done, right?
(Possibly worth mentioning, possibly not, that if your salts are extremely weak then the combined hash might show up in regular rainbow tables, whether your salts are unique or not - it seems unlikely this would ever happen in practice though)
The point of rainbow tables is that you can pre-calculate them. I wrote some myself for fun, and it does take a while to build stuff up.
Even if someone was dumb enough to use the same salt everywhere, once you know that you would just start brute-forcing, not building a rainbow table for the joy of searching it. (Unless you are into that like me.)
It's a fun academic exercise, but Google seems to be a better rainbow table than anyone can construct on their own. Search for hashes. The results are frightening.
If you're using the same salt for everything, you've very nearly defeated the purpose of using a salt.
No salt means that a password hashes to the same thing, everywhere. A site-specific salt means that someone can generate a rainbow table to efficiently attack all accounts. A user-specific salt means that generating rainbow tables is effectively the same (complexity) as brute-forcing.
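The difference is easy to see in a toy example. The `salted_digest` helper is hypothetical, `random_bytes` assumes PHP 7+, and a single SHA-256 round is used purely for illustration, not as a recommended password hash.

```php
<?php
// Illustration only: one fast SHA-256 round is NOT a suitable password hash.
function salted_digest(string $password, string $salt): string
{
    return hash('sha256', $salt . $password);
}

$password = 'hunter2';

// No salt: the same password yields the same digest for every user, everywhere.
$alice = salted_digest($password, '');
$bob   = salted_digest($password, '');

// Per-user salt: identical passwords no longer produce identical digests,
// so a table computed for one user is useless against the next.
$aliceSalted = salted_digest($password, random_bytes(16));
$bobSalted   = salted_digest($password, random_bytes(16));

var_dump($alice === $bob, $aliceSalted === $bobSalted);
```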
Just to clarify a bit more: Usually you need more (a lot more) code than this on pre 5.5. In particular you can't normally assume that `openssl_random_pseudo_bytes` is available. Instead you'll go through various entropy sources (typically `mcrypt_create_iv`, `/dev/urandom`, maybe COM, with fallback to `mt_rand`).
People often get the salt generation wrong, e.g. by just using a substring of `md5(mt_rand())`, which is obviously wrong (in several respects) :/
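A rough sketch of that kind of source-hopping follows. The function name is hypothetical, type hints are left out so it could also run on old versions, and unlike the approach described above it refuses to fall back to `mt_rand`.

```php
<?php
// Hypothetical helper: return $length random bytes from the best source available.
function random_salt_bytes($length)
{
    if (function_exists('random_bytes')) {            // PHP 7+
        return random_bytes($length);
    }
    if (function_exists('openssl_random_pseudo_bytes')) {
        $bytes = openssl_random_pseudo_bytes($length, $strong);
        if ($bytes !== false && $strong) {            // only accept a "strong" result
            return $bytes;
        }
    }
    if (is_readable('/dev/urandom')) {
        $handle = fopen('/dev/urandom', 'rb');
        $bytes  = fread($handle, $length);
        fclose($handle);
        if ($bytes !== false && strlen($bytes) === $length) {
            return $bytes;
        }
    }
    // Failing loudly is a deliberate choice here instead of a weak fallback.
    throw new RuntimeException('No strong randomness source available');
}
```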
I know this is a dumb question, but I haven't found a great answer to it yet: why does it matter how the salt is generated, so long as it's done on a per-user basis?
If the salt is allowed to be "less than secret" (by which I mean it can be stored in plain-text, not that it should be published on your website), then what does it matter if it's "pretty random" versus "cryptographically random"?
But for this case, the salt generation is much better (assuming that `mt_rand` is a good enough source of entropy, which may or may not be the case).
The rest definitely applies though.
In short, you're using the wrong algorithm ($2y$ is the better one, the one you're using has a known bug). You're not checking for errors from `crypt()` prior to storing the hash. So you can wind up significantly messing up your database and potentially leaving it in a worse state than if you just used `md5($password)`... And the minor note about timing attacks...
Not to mention that you currently have an issue in your code (it needs to be .= for a string, not +=)...
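For reference, a sketch of what a hand-rolled `crypt()` call looks like once those points are addressed. The function name is hypothetical and `random_bytes` assumes PHP 7+; on older versions the salt bytes have to come from somewhere else.

```php
<?php
// Hypothetical helper showing the checks that a bare crypt() call leaves out.
function bcrypt_hash(string $password, int $cost = 10): string
{
    // 16 random bytes -> base64 -> bcrypt's salt alphabet (./A-Za-z0-9), 22 chars.
    $raw  = random_bytes(16);
    $salt = substr(str_replace('+', '.', base64_encode($raw)), 0, 22);

    // $2y$ selects the fixed Blowfish variant; the cost must be two digits.
    $hash = crypt($password, sprintf('$2y$%02d$%s', $cost, $salt));

    // On failure crypt() returns a short marker string instead of throwing, so
    // check the shape of the result before it goes anywhere near the database.
    if (!is_string($hash) || strlen($hash) !== 60 || strpos($hash, '$2y$') !== 0) {
        throw new RuntimeException('crypt() did not return a valid bcrypt hash');
    }
    return $hash;
}
```

Note the salt is built with string concatenation and `substr`, which is where the `.=` versus `+=` slip would bite.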
But salts don't have to be strictly unique, they only have to be a barrier to rainbow tables. Even a 16 bit salt should make it enormously more difficult to try to precompute passwords. The chance of two hashes matching may go up marginally but that doesn't help you figure out the pass.
At least, testing passwords against multiple hashes (at the price of one) is impossible. And it is not possible to see if different entries share the same password (or to see if they have different passwords). Also, it (probably unintentionally) mitigates timing attacks when comparing if the entered password matches.
> testing passwords against multiple hashes (at the price of one) is impossible
That's defending against a newly generated rainbow table.
> And it is not possible to see if different entries share the same password (or to see if they have different passwords).
That's repeating the first point with different words! It's defending against a pre-generated table. i.e. a rainbow table.
Timing attacks are not mitigated by salts; they're mitigated by the design of the encryption. You should not rely on salts for this. In fact, if your hash is exposed you should assume your salt is also exposed.
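On the comparison side, the usual defence is a comparison whose running time doesn't depend on where the strings differ. A minimal sketch, assuming PHP 5.6+ for `hash_equals`; the function name is hypothetical.

```php
<?php
// Verify a password against a stored crypt()-style hash without leaking,
// through timing, how many leading characters of the hash matched.
function verify_constant_time(string $password, string $storedHash): bool
{
    // Passing the stored hash as the "salt" reuses the salt and cost embedded in it.
    $candidate = crypt($password, $storedHash);

    // hash_equals() takes the same time wherever the first mismatch occurs.
    return is_string($candidate) && hash_equals($storedHash, $candidate);
}
```

`password_verify` is documented as safe against timing attacks, so code using the new API gets this without doing anything extra.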
What do salts guard against other than rainbow tables?
> That's defending against a newly generated rainbow table.
You lost me here. Anyway, the attacker does not need a rainbow table at all to attack against multiple hashes at the price of one.
> That's repeating the first point with different words! It's defending against a pre-generated table. i.e. a rainbow table.
Again, no rainbow tables at all are needed to see if different entries share a password or not.
About timing attacks, see my earlier comment in this comment chain.
> You should not rely on salts for this. In fact, if your hash is exposed you should assume your salt is also exposed.
I see and agree with your point about "relying on salts", but salts just happen to (as a side-effect probably) mitigate the attack. Remember, your salts are not exposed "as-is" if the attacker manages to fetch the password hash using timing.
> I don't understand how salts could help against timing attacks, though.
A salt that is unknown and unpredictable to the attacker (and contains enough entropy) makes an offline attack against the hash infeasible, even after he has managed to fetch the password hash from the server using timing leaks. I'm not sure it is possible to fetch the (whole) hash using timing, because it is not a direct comparison. But if the attacker managed to do that, then because of "a proper salt" he would have to crack a hash composed of, say, 128 bits of salt and 20 bits of the actual password. The 128 bits of salt alone make that infeasible.
> But for this case, the salt generation is much better
This is the only question I was asking, and you haven't really addressed it in any detail. The reddit comment you linked was replying to some obviously bad code. I mean, limiting your salt to use only 16 possible characters? Really?
> you're using the wrong algorithm ($2y$ is the better one, the one you're using has a known bug)
I didn't know about that bug until just after writing my last comment (don't worry, I don't do this for a living). I just used `2a` because that's a) the example I see most often, and b) that's what was used in this HN comment thread.
The security fix notice[1] linked from the manual page for crypt() mentions that `2a`, on systems where `2y` is available, has countermeasures to try to combat the vulnerability for newly generated hashes, and even says "if the app prefers security and correctness over backwards compatibility, no action is needed - just upgrade to new PHP and use its new behavior (with $2a$)" which doesn't make it sound like it's a huge issue to use `2a` on newer installs, just that you should prefer `2y` where possible.
That said, I'll make a note to use the new one since it is superior. I do find your comment that "if you're on too old of a PHP version to use that (5.3.7 IIRC), then don't even talk about security..." to be needlessly flippant. You don't even bother to offer an alternative to the poor bastards that are stuck on older versions.
> You're not checking for errors from `crypt()` prior to storing the hash.
It's example code, not production code. Maybe I should have made that clearer, but I thought it would be pretty obvious.
> And the minor note about timing attacks...
What's the timing attack on my (non-production, air code)? Your comment on timing in the reddit comment was about verification, which my code doesn't mention.
> Not to mention that you currently have an issue in your code (it needs to be .= for a string, not +=)...
That's just a stupid typo/brain fart. I didn't actually run this; it's just "air code".
---
Can you elaborate more on the salt generation specifically, since that's all I was really trying to ask here? Is the point of using a more cryptographically secure RNG just to make it more likely that each new salt will be unique? How important is absolute uniqueness?
If you could also elaborate on the problems with mt_rand() while you're at it, I'd appreciate it. The only thing the manual mentions, as big a problem as it may be on its own, is that it prefers even numbers on 64-bit systems in certain configurations. Is there more to it than that?
I'm not a PHP pro, as should be clear by now, so I appreciate any information you can pass along. I'm just trying to learn.
> This is the only question I was asking, and you haven't really addressed it in any detail.
I thought you meant the function in its entirety.
So, to your specific point, it's not bad. That doesn't mean it can't be improved upon.
For example, `mt_rand()` is susceptible to certain types of seed poisoning attacks. That's because the state that it uses is process specific. So when running PHP in a case similar to what happens with mod_php, that state is shared among all php instances (just like with APC). What that means is that the security and randomness of your usage depends on everyone else's usage. So if someone calls `mt_srand()` in one app over and over with the same value, your randomness can be thrown out of the window.
Now, that's a very significant edge case with very limited attack potential. However, when it comes to security, if there's a better way, why not use it? And in this case, there is (/dev/urandom). Just read from that source (via fopen, via mcrypt_create_iv, via openssl_random_pseudo_bytes, etc).
I'd much rather err on the safer side as long as there are not significant downsides...
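The seeding problem is easy to reproduce in isolation. A small sketch: `mt_rand_salt` is a hypothetical helper, the seed value is arbitrary, and `random_bytes` assumes PHP 7+ as the OS-backed alternative.

```php
<?php
// Build a "salt" from mt_rand() output after the generator has been seeded.
function mt_rand_salt(int $seed): string
{
    mt_srand($seed);              // whoever controls the seed controls the output
    $salt = '';
    for ($i = 0; $i < 4; $i++) {
        $salt .= dechex(mt_rand());
    }
    return $salt;
}

$first  = mt_rand_salt(1234);
$second = mt_rand_salt(1234);     // same seed, so the same "random" salt again

// Drawing from the operating system's CSPRNG does not depend on mt_srand() at all.
$fromOs = bin2hex(random_bytes(16));

var_dump($first === $second);
```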
As far as 2a vs 2y, I would stick with 2y unless you have a very good reason for sticking with 2a.
As far as the error checking, I thought it was worth mentioning, since it seems that $hash = crypt(...) is all you need, when in reality it isn't. Which goes to further my point that crypt() is too difficult to use out of the box...
> That's just a stupid typo/brain fart.
I realize that. I was just pointing it out.
> That said, I'll make a note to use the new one since it is superior. I do find your comment that "if you're on too old of a PHP version to use that (5.3.7 IIRC), then don't even talk about security..." to be needlessly flippant. You don't even bother to offer an alternative to the poor bastards that are stuck on older versions.
Correct. Because older versions have fairly significant vulnerabilities associated with them. Two major DoS vulnerabilities come to mind. Is the comment flippant? Perhaps. Does that make it wrong? No...
And as far as "offer an alternative to the poor bastards that are stuck on older versions", there are plenty of those. PHPass supports PHP all the way back to like 4.2... If you need a password hashing algorithm for an unsupported version (or 5.3.x < 5.3.7), just use that.
Which actually brings me to the entire point (I don't need to tell you, just making the point again). Just use a library for this. It may seem easy to just do it yourself, but there's a lot to it. Just use a library and be done with it. There's no reason to re-implement it every time...
I apologize if I got a bit crass in my earlier comment. I think the Wil Wheaton's Guide To Depression post has me a bit sensitive today. I probably took the worst view of your comments and got annoyed over my own warped perception.
I probably should just use a library for this, but I've been in a pretty "reinvent the wheel to learn about wheels" mode with the thing that I'm building (the latest in a series of projects which continue to elude actual completion). I even started writing a framework a while back, before switching to CodeIgniter since it's widely used and easy (before switching back to writing a framework after getting annoyed fighting CI... kidding).
Since there's likely only ever going to be a single user for this thing, I doubt the password implementation actually matters much, but it's certainly going to be a debate for a day or two.
This is great news. One of the things that PHP has been lacking for some time is easy-to-use, secure password hashing. You'll still have instances where developers are too lazy, forget, or never learn these new methods, and the same problems will occur.
It's great to see the PHP team thinking ahead, and I completely understand and agree with why they chose procedural over an object oriented approach to implementing the new functionality: ease of use.
Next step I hope is some kind of native support for web sockets, I'm tired of using third party libraries that don't implement a web socket server correctly. That would be an amazing feature.
As much as the PHP team is thinking ahead and working hard, there's a ton of inertia in the PHP community that will take another decade to overcome.
The one thing that would help the PHP community immensely is nuking w3schools from orbit. It's like a museum of bad ideas that somehow is the first place people end up when looking to learn PHP.
Considering that this will make password-related code much shorter and more readable, I think it's a good idea regardless of what level of competence you expect from the developers.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
- C. A. R. Hoare
Indeed. Of course all the sane defaults in the world won't stop someone copying and pasting shit like this from a dodgy tutorial:
$password = mysql_query("SELECT password FROM users
WHERE email = ".$_POST['email']." LIMIT 1");
To clarify this to the downvoters who chose not to comment:
The finest password hashing algorithm ever committed to source control won't protect your other sensitive information (address, phone numbers, etc.) when SQL injection can make your entire database vulnerable to attack.
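For contrast, the same lookup with a bound parameter might look like the sketch below. The table and column names are taken from the snippet above; the function name and the PDO handle are assumptions.

```php
<?php
// Look up a stored hash by email using a prepared statement, so user input
// is never spliced into the SQL text.
function find_password_hash(PDO $pdo, string $email): ?string
{
    $stmt = $pdo->prepare('SELECT password FROM users WHERE email = :email LIMIT 1');
    $stmt->execute([':email' => $email]);

    $hash = $stmt->fetchColumn();   // false when no row matched
    return $hash === false ? null : (string) $hash;
}
```

The returned value would then go to `password_verify` along with whatever the user typed.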
Yes, but there's a greater chance that basic tutorials that for example use unsalted sha1 will use this method instead. It's just as simple for a beginner to grok, but secure by default.
Why would you even worry about it not being a joke? Even if by some quirk of fate they were serious, the code would immediately fail and put no data at risk.
No, this is totally insecure. You're supposed to generate a public/private key pair for the user, encrypt the hash with the public key and store the private key in a separate database on a separate server that isn't connected to the internet and doesn't allow remote log-in from the lan.
Very nice. I'm surprised, though, that they didn't go the OOP route, since they're transitioning a lot of other core procedural functions to objects. Even if it would be a super-simple class interface (almost to the point of not being strictly necessary), it would at least be consistent with the general drift of next-gen PHP.
Additionally, the general drift hasn't been towards OOP within PHP. Some OOP libraries are being added, but the core remains largely procedural (and doesn't appear to be shifting significantly yet). It's taking the (very logical IMHO) route of "Does this make sense to be OOP". If the answer is yes, then it goes in as a class structure. If not, it goes in procedurally.
I did not feel it made sense to make this a class, and as such I did not...
I think PHP's standard library is inconsistent as much as the next guy does, but I kind of appreciate there just being a function when just a function is needed.
E.g., Python's standard library provides objects where appropriate, but often they will just put functions directly into modules for simplicity.
Here's the last paragraph (in case it's TLDR, or you don't want to click through):
> So in short (or not), I just felt that there's room for this API and things like PasswordLib to live side by side. And I will continue to maintain that project in the long run. But for the generic use-case, I felt that an OOP API was too much risk for not enough gain for a core implementation.
With that said, if you can come up with a clean API, I'd be all ears and willing to consider implementing it. But for now, this is the better alternative IMHO...
Seems bizarre to me that hash_pbkdf2() has also been accepted, was authored by the same author, and provides for "better"* hashing than bcrypt(), and yet isn't the PASSWORD_DEFAULT or, indeed, even an option for the secure password hashing algorithm. wtf?
* In the crypto community, "better" usually refers to how long an algorithm has been around, how well reviewed & used it is, and how bug-free it's been. Looking at all of these, PBKDF2 is clearly superior to bcrypt, which has had significant bugs, isn't as widely deployed, etc.
The main reason that I didn't provide bindings to PBKDF2 is that I didn't want to create a new output format.
There's presently no crypt(3) format specified for PBKDF2. So that means that I would need to invent one. That's not something I'm willing to do for a core language feature.
Additionally, pbkdf2 is actually slightly weaker than bcrypt (partially due to the higher memory requirements of the latter, 32kb vs < 1kb).
So without a strong reason for including it, it wasn't included.
However, the API is designed to be extendable. When scrypt gets bindings to crypt(3), it'll be made available. If PBKDF2 gets bindings, it'll be made available. If a new and stronger algorithm is made, it'll be made available (but not default for quite some time).
But I personally am not willing to go out on a limb and create a new cryptographic specification for this project. And that's what it would have taken to put pbkdf2 into it. Hence, why it's not there...
This is a great argument. Thanks for the clarification. Storing the salt, algorithm, strength, etc in some standard format would certainly be required for a simplified API like this.
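To show what that standard format buys you, here is a sketch that pulls the pieces out of a bcrypt hash by hand and then via `password_get_info`. The variable names are illustrative only.

```php
<?php
$hash = password_hash('example', PASSWORD_BCRYPT, ['cost' => 10]);

// A bcrypt hash in crypt(3) format is 60 characters:
//   $2y$ + two-digit cost + $ + 22-character salt + 31-character digest
list(, $algo, $cost, $rest) = explode('$', $hash);
$salt   = substr($rest, 0, 22);
$digest = substr($rest, 22);

// password_get_info() exposes the same metadata without manual parsing.
$info = password_get_info($hash);
```

Because everything needed for verification travels inside the string, a single database column is enough, and a future algorithm can coexist with old hashes.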
None of what you've said about PBKDF2 vs. bcrypt is true. PBKDF2 is inferior to bcrypt (see Colin's scrypt paper for details), bcrypt has never had a published design flaw, and PBKDF2 is less widely deployed than bcrypt (many people think they're "doing PBKDF2" when they iterate a hash function 1000 times).
Having said all that, who cares? Pick one or the other. I'd have exactly the same positive comment regarding the new PHP interface if they had used PBKDF2 with a sane cost. The only harmful decision you can make regarding password hashes is to wait to implement them in order to pick the "best" one.
Yes, and PBKDF2 relies on other algorithms that have also shown problems. It's just looping and clever bitmasking (and takes about 10 LOC to implement, assuming you have hash_hmac easily available). So if there's a problem found with sha256 or whatever hashing function you choose to use it with, then it too has a problem.
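The "about 10 LOC" claim holds up. A sketch of PBKDF2 built on `hash_hmac`, under a hypothetical function name; it is only meant to show the structure, and the built-in `hash_pbkdf2` is the thing to actually call.

```php
<?php
// PBKDF2 as a loop over HMAC: each output block is the XOR of an HMAC chain.
function pbkdf2_sketch(string $algo, string $password, string $salt, int $iterations, int $length): string
{
    $hashLength = strlen(hash($algo, '', true));
    $blocks     = (int) ceil($length / $hashLength);
    $output     = '';

    for ($i = 1; $i <= $blocks; $i++) {
        // U1 = HMAC(password, salt || INT_32_BE(i))
        $u = hash_hmac($algo, $salt . pack('N', $i), $password, true);
        $t = $u;
        for ($j = 1; $j < $iterations; $j++) {
            $u  = hash_hmac($algo, $u, $password, true); // Un = HMAC(password, Un-1)
            $t ^= $u;                                     // byte-wise string XOR
        }
        $output .= $t;
    }
    return substr($output, 0, $length);
}
```

All of the security rests on the underlying HMAC and hash, which is the dependency being pointed out above.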
That's not to knock PBKDF2 - I'm just pointing out that both have flaws, and that people should use the right tool for the job. The "KDF" in "PBKDF2" stands for Key Derivation Function; i.e., it's for deriving a(n encryption) key. bcrypt exists basically for the sole purpose of password storage. In practical terms, it's more obvious how to use bcrypt's work factor as computers speed up than PBKDF2's iteration count, although either is quite suitable for the job. It's also easier to rehash bcrypt-stored passwords when upping your work factor - the old version will still verify, just check for $WF$ not matching your current work factor. PBKDF2 requires a bit more logic since you have to rehash with a different iteration count and do a second comparison, which conceivably also opens you up to a timing attack unless you're very careful about how you do it.
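With the new API, the rehash-on-login flow for bcrypt could be sketched like this. The `login` function and the `$saveHash` callback are hypothetical stand-ins for whatever the application uses to persist the new hash.

```php
<?php
// Hypothetical login helper: verify first, then transparently upgrade old hashes.
function login(string $password, string $storedHash, int $currentCost, callable $saveHash): bool
{
    if (!password_verify($password, $storedHash)) {
        return false;
    }

    $options = ['cost' => $currentCost];

    // The cost is embedded in the stored hash, so spotting an outdated work
    // factor needs no second password comparison.
    if (password_needs_rehash($storedHash, PASSWORD_BCRYPT, $options)) {
        $saveHash(password_hash($password, PASSWORD_BCRYPT, $options));
    }
    return true;
}
```

The upgrade can only happen at login, since that is the one moment the plaintext password is available.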
To me, it's a matter of what a tool was designed for. Given that I have the right tool for the job, why use something that's designed for something else even though it works quite well?
PBKDF2 is objectively worse than bcrypt, but the difference between them isn't meaningful. The only meaningful choice in password hashes is between PBKDF2/bcrypt and scrypt, which is hardened against gate-level-optimized attacks.
They're not comparable: PBKDF2 is a mode of operation, bcrypt is an actual instantiation of a password hashing scheme. Saying one is (objectively) better than the other is meaningless, or at least very misleading.
Oops, my mistake. I thought that md5 was used by default by the new API (I read it too quickly), but you are all right! Tiredness can be deceptive...
But it's still a good website where I learned to make secure password hashing for the first time.