Yes it's tedious to write plumbing code, but it's also dead simple. Just write the damn code. Don't try to create some weird beast that "automagically" does n different things. Just. Write. The. Code.
Yes it does suck. You know what sucks worse? Zero separation of concerns and the tar pit you get from it.
When writing tests, my goal is to verify a given routine works as intended.
I don't want to write tests for the same functionality over and over. Repeated functionality should be extracted, tested in isolation and then used in composition with other tested code.
This is how you write correct code without stress or worry. People who take "just write the code" as dogma have produced some of the most untestable, bug-ridden code I've ever encountered.
Dogma leads to shitty code no matter which way a person leans. One of the worst pieces of code I ever worked on was a query generator. Somebody noticed that there were recurring patterns in some BI-ish queries that were used to generate a dashboard for customers that wanted to see their usage, and they decided to factor out the redundant parts and eliminate the boilerplate.
What did they end up with? The few hundred lines of code expressing the BI queries shrank in half, but behind the simplicity was close to a thousand lines of dense, inscrutable magic. It was a net increase in LOC, but the value of the magic was supposed to compound as they added more queries. What happened was, the original programmer moved on, and every attempt to add more queries failed, until I joined and it was my turn to be sacrificed to the monster. (I did manage to figure it out. The key was realizing that the whole thing was stupid, from conception to execution — the other engineers had put the original programmer on a pedestal, and they were trying to make the code make sense, which it didn't.)
After making the query generator work for a few queries, I had established the credibility to say that we shouldn't use it anymore, and we should just write out all the boilerplate instead. Suddenly adding and modifying queries became something that anybody could do.
It isn't just custom code that ends up this way. I'm currently working on a project that uses SQLAlchemy, and as the glutton for punishment I am, I'm the person who cleans up all our SQLAlchemy difficulties. I virtually always have the documentation open in a tab, and I have the source code checked out to the version we use. If we just wrote raw SQL and wrote our own row mappers, we'd have twice as much database code, but we'd understand it, and anybody could write and debug it. Instead, half the team treats it as witchcraft, and I feel like I've invested more time learning SQLAlchemy in the last year than I ever spent learning SQL.
This is not to say I'm against abstraction, just that it can be done so poorly that it's counterproductive. You always have to compare -- are we better off with this, or without it? Saying that something reduces boilerplate or reduces repetition isn't the end of the conversation, even if it's true. You have to ask what the cost is.
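For contrast, here is a minimal sketch of the "raw SQL plus hand-written row mappers" style the comment above describes, using only Python's stdlib `sqlite3` module. The `users` table, `User` type, and query are invented for illustration, not taken from the commenter's project:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

def fetch_users_by_domain(conn, domain):
    """Plain SQL, plain mapping: anybody can read and debug this."""
    rows = conn.execute(
        "SELECT id, name, email FROM users WHERE email LIKE ?",
        (f"%@{domain}",),
    ).fetchall()
    # The "row mapper" is just an explicit constructor call per row.
    return [User(id=r[0], name=r[1], email=r[2]) for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO users VALUES (2, 'Bob', 'bob@other.org')")
print(fetch_users_by_domain(conn, "example.com"))
```

It is roughly twice the code an ORM query would be, which is exactly the trade-off being weighed: more lines, but nothing to reverse-engineer.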
Write the raw SQL and then generate the boilerplate from that.
This has very few surprises because it’s a bottom-up approach. And even better: you can do the exact same thing by hand.
There are tools/libs that help with that, like hugsql (Clojure) or sqlc (Go and other languages).
Doing it top-down (ORM etc.) is what can cause so many problems outside of the happy path and trivial cases. These tools basically need to reinvent SQL and map it onto a procedural language.
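The bottom-up idea can be sketched in a few lines of Python. This is a toy imitation of the hugsql/sqlc approach, not their actual file formats: queries live as raw, annotated SQL, and a tiny loader turns each one into a plain function:

```python
import re
import sqlite3

# Queries are written as raw SQL, annotated with a name (hugsql-style).
SQL_SOURCE = """
-- name: count_users
SELECT COUNT(*) FROM users;

-- name: users_named
SELECT id, name FROM users WHERE name = ?;
"""

def load_queries(source):
    """Split '-- name: x' blocks into {name: sql} pairs."""
    queries = {}
    for block in re.split(r"(?m)^-- name: ", source):
        block = block.strip()
        if not block:
            continue
        name, sql = block.split("\n", 1)
        queries[name.strip()] = sql.strip()
    return queries

def bind(conn, queries):
    """Generate one plain function per named query -- the 'boilerplate'."""
    def make(sql):
        return lambda *args: conn.execute(sql, args).fetchall()
    return {name: make(sql) for name, sql in queries.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Bob",)])

db = bind(conn, load_queries(SQL_SOURCE))
print(db["count_users"]())        # [(2,)]
print(db["users_named"]("Ada"))   # [(1, 'Ada')]
```

The SQL stays visible and hand-runnable, which is the "very few surprises" property: the generated part is only the thin function wrapper.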
There's a tool called PugSQL that looks promising for Python, but it seems that async isn't directly supported yet[0]. If I ever find time, I'd love to jump on this and make it work, but nobody should hold their breath for that.
I think query builders can be very helpful in a language with a good type system. The only times I haven't used raw SQL and didn't feel like it was a massive mistake were when using Scala, via Slick and Quill.
This is very true. And this is what the blog post was advocating too! It was not about using some smart custom ORM, but about writing dead simple raw SQL queries in SQLPage instead of hundreds of lines of Python and TypeScript.
You're talking about two entirely different things.
OP is saying don't write a "magic thingcombobulator factory" that "simplifies X endpoints with Y and Z similar behavior". This might be an earnest attempt to try to speed development, but it all collapses under its own weight at scale. The maintainers after you will be left holding the bag and have immense difficulty refactoring, adding a new set of requirements, migrating to a new data model, or moving to an entirely new service.
Clever abstraction kills.
I've dealt with undoing insane balls of twine left by unthoughtful devs, mostly magic method dispatch, behavioral overrides via `include`, and monkey patching (some of these behaviors are a hallmark in Ruby land).
One person once exposed the entire database as a "safe" SQL-like query parameter DSL. No more endpoints to write - just use the thing.
There are so many problems with this. For example, when millions of transactions per day on mobile clients or via third party integrators bake these assumptions in, you can't easily migrate them away. You have to keep serving the same data assumptions, even while you're gutting and changing everything under the hood. You have to understand the callers, the data flows, the read and write paths. For complex spider webs of business critical logic, it can take several people entire quarters to even years to unwind the mess.
Simple endpoint logic is best. Your data model should be well thought out, and the CRUD code serves as a well-defined, super literate, super maintainable means to manipulate it.
Simplicity of design is important from the simplest Django endpoints all the way up to the most battle-hardened active/active 500k transaction per second endpoints.
Agreed. I’ve also gone into a codebase and seen the most boring code ever. It looked like examples from an intro to web programming class. The backend did simple parameterized SQL queries. It was a pleasure to work with.
My conclusion is that the real “star” developers will, most of the time, write code that’s so simple, it looks like anyone could have written it. They ship a project on time, with good performance and availability, and then they move on. Anyone can come in and maintain it because the code is so obvious.
So much this. I feel that many developers have some fear of writing simple code, probably because that is the way newbies do it, i.e. straightforward big functions that just do stuff.
Reading function-pointer dispatch code (or whatever the equivalent is called outside C) can be hopeless.
I think your parent comment was making a good point. The "just write the code" and "don't try to be smart" mentality is good only up to a certain point.
Too much "just write the code" ends up creating huge unmaintainable monstrosities.
When you have a lot of time in front of you and a large team, it's okay to just put two junior developers at work for two weeks, and get a big CRUD REST api in the end.
But when you are trying to iterate quickly with a small team, exposing your database is not as stupid an idea as it sounds. And that's why things like Firebase, Hasura, Apollo, Postgraphile, etc. are so popular.
The post is not trying to convince people to build custom DSLs just for querying their database (sorry you had to work with that). It is saying that there are things that exist today, that dramatically reduce the complexity of full stack applications. And that whether or not we like it, this is probably the direction the industry is taking.
The end goal is to minimize software TCO. In addition to being semantically less clear, repeated plumbing code tends to diverge over time, which makes it difficult to refactor and more bug prone if people assume behavior is homogenous.
The best way to handle cases that will be almost the same but may diverge over time is to create a functional mini DSL that describes the domain behavior, and create a template implementation that can be used if desired. Then everything is using a common language, and a non-template implementation indicates the presence of non-standard logic.
> The best way to handle cases that will be almost the same but may diverge over time is to create a functional mini DSL that describes the domain behavior, and create a template implementation that can be used if desired. Then everything is using a common language, and a non-template implementation indicates the presence of non-standard logic.
I mean yeah, I'm a big fan of DSLs. The problem occurs when someone writes the DSL, doesn't document it and leaves. Then it becomes super, super painful to maintain and extend.
Basically I'm coming round to the conclusion that (assuming reasonably competent colleagues), the least experienced person should be able to maintain and extend the code if it's to have any hope of remaining useful over time.
And good tests; for god's sake, test the crap out of anything complicated, with well-chosen names so that people can read the tests and understand how the code should be used.
If the functions are all clearly named and reasonably small-ish DSLs can be mostly self-documenting. Plus you can always ctrl click in your IDE of choice to view function source. I'm talking something like this:
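Say, in Python, a "DSL" that is nothing more than small, clearly named functions composed together (an illustrative sketch; the names and data shape are invented):

```python
def only_active(users):
    """Keep users whose account is active."""
    return [u for u in users if u["active"]]

def older_than(age):
    """Return a filter step keeping users above the given age."""
    def step(users):
        return [u for u in users if u["age"] > age]
    return step

def sorted_by_name(users):
    """Order users alphabetically by name."""
    return sorted(users, key=lambda u: u["name"])

def pipeline(*steps):
    """Compose steps left to right into one callable."""
    def run(users):
        for step in steps:
            users = step(users)
        return users
    return run

# The "DSL": reads like a sentence, and every word is ctrl-clickable.
adults = pipeline(only_active, older_than(17), sorted_by_name)

users = [
    {"name": "Zoe", "age": 34, "active": True},
    {"name": "Al", "age": 12, "active": True},
    {"name": "Bo", "age": 40, "active": False},
]
print(adults(users))  # [{'name': 'Zoe', 'age': 34, 'active': True}]
```

Each step is a boring little function, so the least experienced person on the team can jump to any definition and see exactly what it does.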
It's not just writing the code. Writing the code is easy. It's maintaining it. And then debugging it. There is a limit to how many lines of code a single person can maintain.
In my experience, the limit does not depend on the volume as such, but more on the complexity. This complexity can be intrinsic, from the business domain, or accidental, from technical choices. If frontend, backend and storage have parallel structure based on predictable patterns, the tripled line cost is easily ignorable by skimming.
Development heavily slows down under unpredictability. Maintenance is slower partially because knowledge loss heightens unpredictability. One-off, half-documented pseudo-frameworks create much higher knowledge loss in maintenance, and are a much worse time eater than simple code, even if it's tripled.
Relevant quote from the book "A Philosophy of Software Design":
Complexity is what a developer experiences at a particular point in time when trying to achieve a particular goal. It doesn’t necessarily relate to the overall size or functionality of the system. People often use the word “complex” to describe large systems with sophisticated features, but if such a system is easy to work on, then, for the purposes of this book, it is not complex. Of course, almost all large and sophisticated software systems are in fact hard to work on, so they also meet my definition of complexity, but this need not necessarily be the case. It is also possible for a small and unsophisticated system to be quite complex.
Hey, I'm Ophir, the co-author of the post, and main contributor to the SQLPage one-off half-documented pseudo-framework :)
I'm not sure if you had a look at what SQLPage really does. It is not a framework in the same sense as Django, Rails, or Laravel. It doesn't have a large set of functions you need to interact with.
It lets you write the database queries you would have written anyway to get data out of your database, and just renders that as a nice frontend. All the components you can use for rendering are heavily documented with many examples on https://sql.ophir.dev/documentation.sql
OK, here is a severe misunderstanding brewing. I definitely did not mean SQLPage when I said one-off half-documented pseudo-framework. In fact, I did not mean any real, standalone, named product with this. I do however see very much how you could think so from my description, so my apologies.
What I meant: consider any random big software development. It might be mind-numbingly boring or very technically repetitive; you might have devs who never did any maintenance, or, devs being expensive, they were told to start building something, anything, while the business had yet to deliver something resembling requirements.
In this kind of case, programmers tend to start building abstractions based on their imagined needs, with a we-will-add-the-business-stuff-later attitude. The results are generally some kind of architecture astronaut horror. Abstraction will be very high, and weird features and handling of useless corner cases will abound. In-code documentation, logging, and debugging features will be absent. Higher level documentation was either not written or lost long ago. That's your average one-off, half-documented pseudo-framework.
I've seen plenty of these (and committed a few crimes of my own). Off the top of my head, some of the worst:
* A full-blown 3,000-line templating library, for rendering exactly 1 report that was basically a for loop dumping an SQL query to an HTML file.
* A C10K database connection manager built on top of apache commons pooling (which, while a good library, was not fit for this purpose at all), hyperoptimized for TCP port open/close speed, for an application making at most a few connections per minute.
* A cache manager for files, deciding when to remove a file based on either AI or linear regression, with a web UI for configuring this decision and all the zillion config parameters and strategies; but the time to generate the cached data was shorter than the time to read it from disk, and the files easily lived for months.
* Java message-building code that did everything humanly possible to allocate one big buffer only once at the beginning, because 'GC is too slow', but the coder forgot that joining strings together creates temporaries that were, of course, cleaned up by the GC.
Needless to say, the people maintaining these beasts cursed the devs who implemented them, and tended to rip them out on sight if possible, or pay the very heavy maintenance cost.
"In this kind of case, programmers tend to start building abstractions based on their imagined needs, with an We-will-add-the-business-stuff-later attitude"
I wish I could publish examples from my current codebase, because that's exactly what happened. Difficult and verbose abstractions, with sometimes 50 classes being involved in displaying a simple table (one class per column display, one class per filterable column), and that's just the "R" part of CRUD.
And there are 6 or 7 different teams working on it, and each one uses different methodologies to do their work. In some cases it's abstractions on top of GraphQL.
Everyone involved had the best intentions possible, but the end result doesn't reflect it.
It's true that if all the code works well, is tested, and all the features are supposed to stay the way they were when the code was written, then any developer can maintain any amount of code: there is just nothing to do.
The problem arises when there is a change in what we want the code to do. Changing a feature that is implemented over three codebases in three different languages is definitely much more work than updating something that was written in SQLPage, for instance.
Oh, I hadn't noticed your username! On the topic of maintenance: could you have a look at this pull request I opened three years ago on a repo of yours : https://github.com/simonw/datasette/pull/1159 ?
No, there is not. A line of code takes no resources, has no overhead, requires no upkeep. I think you may be referring to the drag complexity imposes on future development. That I agree with, but LOC is a poor proxy for complexity, and code that is static costs nothing.
Every line of code has an overhead: it has a chance of bugs, and it demands upkeep just for existing. Having class A, class B, and class C that do almost the same, but slightly different, things means that when the business rules change, you have to make similar but slightly different changes to class B and class C, changes that aren't neatly self-contained in B.cpp and C.cpp (or .py, .rs, .rb; you get the point). And then you can't ever be sure that A.cpp doesn't also have some long-forgotten but similar and crucial bit of functionality that this one customer relies on (because that was written before TDD became popular).
---
LoC itself is a bad proxy for complexity, but I think taking the log of the number of LoC tells you enough to build some expectations. A codebase where log LoC is ~6 (so in the neighborhood of ~1M LoC) is different enough from one where log LoC is ~3 (so ~1,000 LoC) that you have an idea of what you're getting into if someone asks you to make a change to either one of those.
The key to understanding our (apparent) disagreement is:
> that when the business rules change
Yes, when things change complexity has a cost. The inverse is also true however, if nothing changes, it has no cost. If class A, B, and C do almost the same thing, then nobody cares because the computer will gladly execute almost the same thing in different locations in memory. The modern computer built today is essentially perfect. It will execute the same thing every time, it will not suddenly require changes because there was some degradation in an adder, and no cogs need changing. All the maintenance is stuff we make up because we want it to do something it never did before.
Things always change. Software does not perform in a vacuum. It's subject to the inexorable progression of hardware decay and business knowledge loss, at the very least.
A friend's friend's company absolutely relies on this bespoke computer program running on an un-networked desktop computer running Windows XP from the 2000s. There will be a degradation in its hard drive, its power supply, its fan; something. All the lines of code that comprise that program (which are lost to the sands of time) are a liability because that code has been lost. All we can do now is virtualize the application and move it to newer hardware that isn't on the verge of failing. Rewriting the app is out of everyone's budget so that's all we can do, and hope for the best.
The lower the log LoC of their Visual Basic app, the easier it should be to replace and rewrite atop a modern tech stack.
If it ain't broke... you point out. It's old and creaky, and everyone's just afraid of the thing. There's no real backup (working on that!), there's no accessibility to it from the Internet - looking up info on that computer via a smartphone or tablet would be a boon to the company. It's absolutely load bearing, but it's like a bridge that's too small for the city that's grown around it.
The world moves forwards around software that's sat in place, so the software wants to move as well. We're not "making up" maintenance stuff just for the hell of it. Unless you work on the same chair and desk you used when you were 5. I don't fit in mine, and they were lost to a move anyway.
Saying “a line of code requires no resources” can only be true under a particular set of assumptions and particular system for accounting. It’s not a useful or interesting argument by itself, because it doesn’t explain the assumptions and accounting system that it implies.
I'm of two minds on this, I both agree and disagree.
Once a code base is a certain size, explicit-but-bigger code can be a boon. Magic dynamic dispatch systems and other tools that simplify plumbing make onboarding and routine, drive-by maintenance way harder IME.
I find that once you understand systems that have a dash of "magic", though, it is easier to add features and stuff. Single points of maintenance and all that.
It's a continuum, with each side having different benefits.
Debugging is easier when you have a backend server which logs the API calls.
I did debug apps where UI and DB access lived in a single code space (VB/Delphi style). This was pretty hard to debug and logic was so tightly coupled with the UI code that it was nearly impossible to write tests for it.
Because those Delphi apps were written by less capable people. I've done tons of Delphi applications in the past and still do some now (both Delphi and Lazarus). In every case the UI and backend business logic were clearly separated.
Extending and debugging complex code (e.g. autogenerating tools, macros etc.) is much more difficult than simple code, even if the former can be written in fewer lines than replicating (nearly) identical but simple code.
I’ve become a fan of code generation (data driven).
The benefits: you write code faster, the output is automatically uniform, and the result is “dumb” and less abstract, AKA easy to debug and modify. Tedium/boilerplate is gone, and you focus on the overall model.
The costs: you think more up front, you have to see the result first (hand written). It’s easy to see common patterns too early.
With some patience, caution and experience some of the costs can be mitigated.
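As a minimal sketch of data-driven generation (the schema format, names, and generator are invented for illustration, not a real tool): the schema is plain data, and the generator emits the kind of "dumb", readable code you could have written by hand:

```python
# The single source of truth: a data description of the model.
SCHEMA = {
    "Invoice": [("id", "int"), ("customer", "str"), ("total", "float")],
}

def generate_dataclass(name, fields):
    """Emit plain dataclass source code from one schema entry."""
    lines = ["from dataclasses import dataclass", "", "@dataclass", f"class {name}:"]
    lines += [f"    {fname}: {ftype}" for fname, ftype in fields]
    return "\n".join(lines) + "\n"

source = generate_dataclass("Invoice", SCHEMA["Invoice"])
print(source)  # readable output, no magic left at runtime

# In a real setup you'd write this to a file and check it in;
# exec-ing it here just demonstrates the output is valid code.
namespace = {}
exec(source, namespace)
inv = namespace["Invoice"](id=1, customer="ACME", total=99.5)
print(inv.total)  # 99.5
```

Because the generated code is checked in as ordinary source, debugging it is the same as debugging hand-written code, which is the "dumb and less abstract" benefit.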
I work at a company that does a lot of code generation, and it gets uglier the longer you do it. It's much harder to write the code that generates the code you want than to just write the damn code in the first place. The abstractions & assumptions made for your code generator will eventually begin to break down, and when that finally happens everything goes from a simple refactoring to a way overly complicated update to the generator.
We too do lots of code generation, but I have the opposite experience.
The article's example would imply, in our use case:
1) add one key to the schema (which is database independent), which will generate encoders, decoders, apis (to work with the data structure, not in network-sense) automatically
2) add the key in the views you want to add it (when updating/reading or more complex network apis)
3) specify how the key is retrieved/saved in the use cases (controller-like)
4) use the key in the frontend.
It took me longer to write this post from mobile than it would've taken me to do the first 3 steps.
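The flavor of steps 1 and the generated encoders/decoders can be sketched like this (the schema format and codec shape are invented for illustration, not the commenter's actual system):

```python
# Step 1: adding one key to this schema is the only manual change.
SCHEMA = {"user": ["id", "name", "email"]}

def make_codecs(schema):
    """Generate an (encoder, decoder) pair per entity from the schema."""
    codecs = {}
    for entity, keys in schema.items():
        # keys=keys binds the current value, avoiding Python's
        # late-binding-closure pitfall inside the loop.
        def encode(obj, keys=keys):
            return {k: obj[k] for k in keys}          # drop unknown fields
        def decode(payload, keys=keys):
            return {k: payload.get(k) for k in keys}  # fill missing with None
        codecs[entity] = (encode, decode)
    return codecs

encode, decode = make_codecs(SCHEMA)["user"]
print(encode({"id": 1, "name": "Ada", "email": "a@b.c", "extra": "dropped"}))
```

One new schema key and every codec picks it up, with no per-field hand-written plumbing.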
I’m on my first project that resembles your description, and I _really_ like it (so far).
Auto-documentation is also a big plus, imo. Our “truth schema” also outputs OpenAPI specs, markdown docs, etc with zero added effort (past writing inline comments). Love it.
Yes, and I wish we had more time to document and clean it up for users outside our company because it's pretty incomprehensible for users outside it.
Notice that we use a custom TypeScript compiler (tsplus), we make use of some quite advanced TypeScript, and we add code generation via eslint on top of it.
Took me 3 months here before it started making sense, but then it started clicking.
It sounds as if the code generators you use are pretty bad. The ones I use at work are fantastic. It has literally saved me (and others) thousands of hours of boring tedious work.
I loved the idea of code generation when I first encountered it, but I've since come to hate it.
A large code base that was auto-generated and then subtly modified in some places is hard to refactor, and if you need to change the signature of a function that is used thousands of times across the generated code, you are in for a long ride.
There is an art to writing good code generators. Bad code generators are really really bad. Good code generators are absolutely awesome! I have saved thousands of hours using my own code generators. But I have also seen very bad code generators in the wild that I wouldn’t recommend using.
So much time & frustration expended simply to avoid typing out the magic database commands... And the constant ego trips attempting to outperform 30+ year old query planner codebases on 7-way+ joins by using baby's first ORM.
We are in the era of hyperscale SQL engines. Database engines that are spread out across multiple servers, racks and buildings. Engines so vast & complex the compute & storage responsibilities have to be separated into different stacks. But they (the good ones) still work just like the old school approach from an application perspective. The workload necessary to actually saturate one of these databases would be incredible. Some days I wonder if Twitter could be rewritten on top of one without much suffering.
And, if you aren't trying to go big and bold or spend a bunch of money, there's always SQLite. It also supports basically all the same damn things. It can run entirely in memory. It has FTS indexing. Your CTE-enabled queries will work just fine on it. If you find SQLite doesn't scale with you, swapping to a different engine really isn't that big of a deal either. You will have some dialect conflicts but it's generally very workable, especially if you use some thin layer like Dapper between your code and the actual connection instances.
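To the point about SQLite supporting "all the same damn things": with nothing but Python's stdlib `sqlite3` module, an in-memory database happily runs a recursive CTE, no server required (the query itself is just a demo, summing 1 through 10):

```python
import sqlite3

# Entirely in-memory: no file, no server process.
conn = sqlite3.connect(":memory:")

# A recursive CTE, the kind of "advanced" SQL SQLite handles fine.
total = conn.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 10
    )
    SELECT SUM(n) FROM nums
""").fetchone()[0]
print(total)  # 55
```

The same query runs unchanged on Postgres or MySQL 8+, which is why swapping engines later is usually a dialect cleanup rather than a rewrite.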
I asked some developers to implement something with guidelines over how to do it.
Ultimately they tried to do more than asked which then caused problems because maintenance is now harder, and some types were removed while others were “enriched”, and much like uranium, became more dangerous to wield.
To be fair, a good IDE can give you low-effort tools to one-click typical use-cases.
Other than that I completely agree. Devs get hung up on trivial syntax topics waaaay too often, when the actual time-killer lies in reasoning and performing test-cycles.
The thought leaders at my job had this philosophy and now we have a gigantic project that takes forever to compile. And you do always have to compile all of it because it's all one commingled codebase. Tough place to be.
Great question. A good abstraction can offer an order of magnitude improvement in some dimension, whether that be clarity, speed, or the like. A bad abstraction trades a lot of one dimension for a little of another. In this case, I'll happily take an order of magnitude improvement in understandability, debuggability, extensibility, and a lower learning curve over a crappy ORM or DSL that saves me the effort of writing ~30 LOC; heck even ~5k LOC. If we get farther than that, we can talk. And even then, the solution is probably not going to be an ORM or a DSL.