Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> I’m pretty confident that GitHub doesn’t use foreign keys because it was built as a Rails app

Maybe originally, but lack of foreign key usage is certainly not Rails specific today. Large MySQL shops generally don't use foreign keys, full stop, for the exact reasons Shlomi described in the original comment.

Facebook does not use foreign keys either. In my experience, same thing is true at all the other large MySQL-based companies. And those companies make up a majority of the largest sites on the net btw -- including most of the consumer-facing social networking and user-generated content products/apps out there.

This does not mean that, for example, Facebook is full of data that would violate constraints. There are other asynchronous processes that detect and handle such problems.

> If you get to a point where sharding is best for your company you hopefully have enough revenue coming in to fund a data transition.

Do you mean, stay with your current RDBMS of choice and shard while also removing the FKs? Or do you mean transition to some other system? (and if so, what?)

The former path is bad: sharding alone is very painful even without introducing a ton of new application-level constraint logic at the same time.

The latter path is also bad: only very recent NewSQL DBs have sharding built-in, and you probably don't want to bet your existing rocket ship startup on your ability to transition to one under pressure. Especially given much higher latency profiles of those DBs, meaning a very thorough caching system is also required, and now you have all the fun of figuring out multi-region read-after-write cache consistency while also transitioning to an entirely new DB :)



“[S]harding alone is very painful even without introducing a ton of new application-level constraint logic at the same time.”

I typically espouse app logic to check state and use foreign keys to ensure enforcement (eg, race conditions that are very hard to ensure at the app level but are built into many RDBMS). Foreign key failures are just treated like any other failure mode.

But honestly, I haven’t been part of a company that have really hit those upper limits that require sharding. They exist, yes. But most companies will never need to worry about it. Which is my point.


I agree that most companies won't ever need to shard, and pre-sharding (or worrying about extreme scalability challenges in general) is usually unwise premature optimization. But it does depend on the product category.

Social networking and user-generated content products (including GitHub) need to be built with scale somewhat in mind: if the product reaches hockey-stick growth, a lot of scalability work will need to be completed very quickly or the viral opportunity is lost and the company will fail. I'm not saying they should pre-shard, but it does make sense to skip FKs with future sharding in mind.

This was nearly a decade ago, but in one case I helped shard my employer's DBs (and roll out the related application-level changes) with literally only a few hours to spare before the main unsharded DB's drives filled up. If we also had to deal with removing FKs at the same time, we definitely wouldn't have made it in time and the product literally would have gone read-only for days or weeks, probably killing the company. Granted, these situations are incredibly rare, but they really do happen!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: