
> I do want to reinforce that all applications need to confirm data integrity, you either confirm it when saving the data, when extracting the data, or have to very carefully balance various consistency concerns and either enforce consistency before saving data or enforce consistency after saving (but before extracting) data.

There are two kinds of error handling. The “happy error” and the “fuck you asshole” error handing. Good systems have both.

Yes, the application should check it isn’t about to violate a FK constraint. Just like it usually checks that it isn’t about to violate a unique constraint. That way the application can fail in a happy, controlled way. But if the application doesn’t do the right thing, then the DB server still should puke all over the transaction with a “fuck you, don’t feed me bullshit” error.
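A minimal sketch of the two layers, using Python's stdlib sqlite3 (the `users`/`orders` schema and `place_order` helper are hypothetical): the app pre-checks the FK and fails politely, while the DB's FK constraint remains the backstop for code that skips the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "user_id INTEGER NOT NULL REFERENCES users(id))")
conn.execute("INSERT INTO users (id) VALUES (1)")

def place_order(order_id, user_id):
    # The "happy" error: pre-check the reference and fail with a friendly message.
    row = conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return f"Sorry, no such user {user_id}"
    conn.execute("INSERT INTO orders (id, user_id) VALUES (?, ?)",
                 (order_id, user_id))
    return "ok"

print(place_order(1, 1))   # ok
print(place_order(2, 99))  # Sorry, no such user 99

# The backstop: buggy code that skips the check still gets rejected by the DB.
try:
    conn.execute("INSERT INTO orders (id, user_id) VALUES (3, 99)")
except sqlite3.IntegrityError as e:
    print("DB said no:", e)
```

The point is that the pre-check exists purely for UX; correctness comes from the constraint.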

Just like data entry over the web. The javascript can return all kinds of nice happy messages to the user about missing fields and stuff. But the backend should always validate and enforce correctness even if its mode of failure is some ugly “piss off, don’t feed me crap” message.

Just like you should never trust user provided data coming from an HTTP post, a database should never trust the INSERT or UPDATE is valid and won’t corrupt the database. Both backend systems can return mean old ugly errors when shit is bad and let the front end do pre-validation that can do happy nice errors. The backend always has to enforce its own data integrity. Period.



Oh I absolutely agree, and even when a DB is properly configured with references all clearly defined and constrained, it's absolutely a good UX thing to pre-check as much as possible.

But, beyond that, it is quite possible to remove FK checks and still have strong guarantees about data integrity. It is stupidly expensive, and unless you have a few billion in the bank there is absolutely no reason to even consider it, but if you're dealing with data volumes like GitHub's then it's conceivable that all the other salves for enforcing data integrity fall short. In that case there are ways to approach removing FKs. But when you do so, you're not (logically speaking) giving up the data integrity FKs provide; you are replacing FKs as a tool for data integrity with another tool for data integrity (one that will probably be very similar to FKs). At that point DB-level FKs can stop making sense -- though having any sort of RDBMS engine likely also stops making sense, since you're essentially adopting the functional responsibility of being an RDBMS into the primary application.


Being pedantic in this case doesn’t help the cause. Too many developers don’t understand database theory at all and will read this

> But, beyond that, it is quite possible to remove FK checks and still have strong guarantees about data integrity

And not the rest of your post. Yes, technically you are right, but it is stupidly expensive and nobody should do it.

The problem with being technically correct is, again, people will stop at the sentence I quoted and go build yet another FK-free system. Said system will undoubtedly fill up with corrupt bullshit data that eventually leads to exciting mystery bugs in production that have everybody scratching their heads. I’ve seen it time and time again...


I feel like reading HN should come with a warning on the tin that "if you're reading a long technical comment, taking away just part of it is dangerous." If someone on the business side of a corp asked me "hey, do we need these FK things, some developers have been saying they're slow," I'd say "yes, we absolutely do need FKs." Then I'd go talk to the developers, double-check I wasn't at one of the roughly dozen companies with data at a scale where FKs as implemented in RDBMSes (especially postgres; mysql tends to drop off in performance much more easily without heavy tweaking) are insufficient, and then tell them that FKs do work and they probably really just need to read up a lot on indexes and stop throwing around table locks like it's Christmas.


Hey there, you and I still need work. Let them speak!


Unique constraint is a good example, because it reminds us about race conditions.

The app can check that it won't violate a unique constraint before doing an insert/update, but in between that check and actually doing the insert/update, some other process may have changed the data such that the unique constraint would be violated.

So when the rdbms catches this, it's not just a "fuck you for giving me bad data" failsafe for bugs in app code. It isn't necessarily a bug at all -- unless you intended it to be the app's responsibility to use db-level locks and/or transactions to guarantee this can't happen without the uniqueness constraint -- but then why not just use a uniqueness constraint, the tool the db actually gives you for this?
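The check-then-insert race can be sketched with Python's stdlib sqlite3 (the `accounts` table is hypothetical; the "other process" is simulated by an insert squeezed between our check and our insert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (email TEXT UNIQUE)")

email = "a@example.com"

# App-level check: looks safe at this instant...
taken = conn.execute("SELECT 1 FROM accounts WHERE email = ?",
                     (email,)).fetchone()
assert taken is None

# ...but another process wins the race before our insert lands.
conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))

# Our insert now violates the constraint; the DB, not the pre-check, saves us.
try:
    conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))
    result = "inserted"
except sqlite3.IntegrityError:
    result = "duplicate"
print(result)  # duplicate
```

No amount of app-side checking closes this window; only the constraint (or serializing the check and insert in one locked transaction) does.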

Mature rdbms's sometimes don't get the recognition they deserve for being amazing at enforcing data consistency and integrity under concurrency, with pretty reasonable performance for a wide variety of usage patterns.

Foreign key constraints can be similar; you can do all the checking you want, but some other process can delete the object with the pk right before you set the fk to it.
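The same race, FK-flavored, again sketched with stdlib sqlite3 (the `parents`/`children` schema is hypothetical): the parent row vanishes in the gap between the app's check and its insert, and the FK constraint catches the dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement
conn.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER NOT NULL REFERENCES parents(id))")
conn.execute("INSERT INTO parents (id) VALUES (1)")

# App checks that the parent exists...
assert conn.execute("SELECT 1 FROM parents WHERE id = 1").fetchone()

# ...another process deletes it in the gap...
conn.execute("DELETE FROM parents WHERE id = 1")

# ...and the FK constraint rejects the now-dangling reference.
try:
    conn.execute("INSERT INTO children (id, parent_id) VALUES (1, 1)")
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)  # rejected
```

Note the symmetric case too: once a child row exists, the same constraint would block the DELETE of its parent.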

If you have app usage patterns such that you really can't afford database-level data integrity enforcement (what level of use that is depends, of course, on your rdbms)... you are forced to give up a lot of things the rdbms is really good at and reinvent them (in a way that is more performant than the db??), or figure out how to make sure all code ever written never actually assumes consistent data.



