
> I do want to reinforce that all applications need to confirm data integrity, you either confirm it when saving the data, when extracting the data, or have to very carefully balance various consistency concerns and either enforce consistency before saving data or enforce consistency after saving (but before extracting) data.

There are two kinds of error handling. The “happy error” and the “fuck you asshole” error handing. Good systems have both.

Yes, the application should check it isn’t about to violate a FK constraint. Just like it usually checks that it isn’t about to violate a unique constraint. That way the application can fail in a happy, controlled way. But if the application doesn’t do the right thing, then the DB server still should puke all over the transaction with a “fuck you, don’t feed me bullshit” error.
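A minimal sketch of the two layers, using Python's stdlib sqlite3 (the `users`/`orders` schema and `place_order` helper are hypothetical): the app pre-checks the FK and fails politely, while the DB's FK constraint remains the backstop for code that skips the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "user_id INTEGER NOT NULL REFERENCES users(id))")
conn.execute("INSERT INTO users (id) VALUES (1)")

def place_order(order_id, user_id):
    # The "happy" error: pre-check the reference and fail with a friendly message.
    row = conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return f"Sorry, no such user {user_id}"
    conn.execute("INSERT INTO orders (id, user_id) VALUES (?, ?)",
                 (order_id, user_id))
    return "ok"

print(place_order(1, 1))   # ok
print(place_order(2, 99))  # Sorry, no such user 99

# The backstop: buggy code that skips the check still gets rejected by the DB.
try:
    conn.execute("INSERT INTO orders (id, user_id) VALUES (3, 99)")
except sqlite3.IntegrityError as e:
    print("DB said no:", e)
```

The point is that the pre-check exists purely for UX; correctness comes from the constraint.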

Just like data entry over the web. The javascript can return all kinds of nice happy messages to the user about missing fields and stuff. But the backend should always validate and enforce correctness even if its mode of failure is some ugly “piss off, don’t feed me crap” message.

Just like you should never trust user provided data coming from an HTTP post, a database should never trust the INSERT or UPDATE is valid and won’t corrupt the database. Both backend systems can return mean old ugly errors when shit is bad and let the front end do pre-validation that can do happy nice errors. The backend always has to enforce its own data integrity. Period.



Oh I absolutely agree, and even when a DB is properly configured with references all clearly defined and constrained, it's absolutely a good UX thing to pre-check as much as possible.

But, beyond that, it is quite possible to remove FK checks and still have strong guarantees about data integrity. It is stupidly expensive, and unless you have a few billion in the bank there is absolutely no reason to even consider it, but if you're dealing with data volumes like GitHub's then it's conceivable that all the other salves for enforcing data integrity fall short. In that case there are ways to approach removing FKs. But when you do so, you're not (logically speaking) giving up the data integrity FKs provide; you are replacing FKs as a tool for data integrity with another tool for data integrity (one that will probably be very similar to FKs). At that point DB-level FKs can stop making sense -- though having any sort of RDBMS engine likely also stops making sense, since you're essentially adopting the functional responsibility of being an RDBMS into the primary application.


Being pedantic in this case doesn’t help the cause. Too many developers don’t understand database theory at all and will read this

> But, beyond that, it is quite possible to remove FK checks and still have strong guarantees about data integrity

And not the rest of your post. Yes, technically you are right, but it is stupidly expensive and nobody should do it.

The problem with being technically correct is, again, people will stop at the sentence I quoted and go build yet another FK-free system. Said system will undoubtedly fill up with corrupt bullshit data that eventually leads to exciting mystery bugs in production that have everybody scratching their heads. I’ve seen it time and time again...


I feel like reading HN should come with a warning on the tin that "if you're reading a long technical comment, taking away just part of it is dangerous." If someone on the business side of a corp asked me "hey, do we need these FK things, some developers have been saying they're slow," I'd say "yes, we absolutely do need FKs." Then I'd go talk to the developers, double-check I wasn't at one of the roughly dozen companies with data at a scale where FKs as implemented in RDBMSes (especially postgres; mysql tends to drop off in performance much more easily without heavy tweaking) are insufficient, and then tell them that FKs do work and they probably really just need to read up a lot on indexes and stop throwing around table locks like it's Christmas.


Hey there, you and I still need work. Let them speak!


Unique constraint is a good example, because it reminds us about race conditions.

The app can check that it won't violate a unique constraint before doing an insert/update, but in between that check and actually doing the insert/update, some other process may have changed the data such that the unique constraint would be violated.

So when the rdbms catches this, it's not just a "fuck you for giving me bad data" failsafe for bugs in app code. It isn't necessarily a bug at all -- unless you intended it to be the app's responsibility to use db-level locks and/or transactions to guarantee this can't happen without the uniqueness constraint -- but then why not just use a uniqueness constraint, the tool the db actually gives you for this?
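The check-then-insert race can be sketched with Python's stdlib sqlite3 (the `accounts` table is hypothetical; the "other process" is simulated by an insert squeezed between our check and our insert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (email TEXT UNIQUE)")

email = "a@example.com"

# App-level check: looks safe at this instant...
taken = conn.execute("SELECT 1 FROM accounts WHERE email = ?",
                     (email,)).fetchone()
assert taken is None

# ...but another process wins the race before our insert lands.
conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))

# Our insert now violates the constraint; the DB, not the pre-check, saves us.
try:
    conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))
    result = "inserted"
except sqlite3.IntegrityError:
    result = "duplicate"
print(result)  # duplicate
```

No amount of app-side checking closes this window; only the constraint (or serializing the check and insert in one locked transaction) does.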

Mature rdbms's sometimes don't get the recognition they deserve for being amazing at enforcing data consistency and integrity under concurrency, with pretty reasonable performance for a wide variety of usage patterns.

Foreign key constraints can be similar; you can do all the checking you want, but some other process can delete the object with the pk right before you set the fk to it.
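The same race, FK-flavored, again sketched with stdlib sqlite3 (the `parents`/`children` schema is hypothetical): the parent row vanishes in the gap between the app's check and its insert, and the FK constraint catches the dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement
conn.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER NOT NULL REFERENCES parents(id))")
conn.execute("INSERT INTO parents (id) VALUES (1)")

# App checks that the parent exists...
assert conn.execute("SELECT 1 FROM parents WHERE id = 1").fetchone()

# ...another process deletes it in the gap...
conn.execute("DELETE FROM parents WHERE id = 1")

# ...and the FK constraint rejects the now-dangling reference.
try:
    conn.execute("INSERT INTO children (id, parent_id) VALUES (1, 1)")
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)  # rejected
```

Note the symmetric case too: once a child row exists, the same constraint would block the DELETE of its parent.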

If you have app usage patterns such that you really can't afford database-level data integrity enforcement (what level of use that is depends, of course, on your rdbms)... you are forced to give up a lot of things the rdbms is really good at and reinvent them (in a way that is more performant than the db??), or figure out how to make sure all code ever written never actually assumes consistent data.



