> Achieving the Senior Executive status is often mistaken for a comfortable reward, a final destination with enhanced perks and support. A more fitting analogy is reaching the NFL Super Bowl. You are now part of an elite team where nothing less than peak performance is acceptable. As the Navy SEALs put it...
There are techniques to mitigate this. You can reuse containers instead of creating a new one each time. You can mount in directories (like ~/.claude) from your local machine so you dont have to set claude up each time.
I use agents in a container and persist their config like you suggest. After seeing some interest I shared my setup at https://github.com/asfaload/agents_container
It works fine for me on Linux.
Completely agree. If you're motivated enough about a topic to post about it online, you're probably emotional about it and unable to see it in a clear-headed manner.
The people I know who have the most reasonable political opinions never post about it online. The people who have developed unhealthy and biased obsessions are the ones who post constantly.
> If you're motivated enough about a topic to post about it online, you're probably emotional about it and unable to see it in a clear-headed manner.
> The people I know who have the most reasonable political opinions never post about it online.
And here you are posting your opinions online! How fascinating. I hope you recognize the extreme irony in the fact that you were motivated enough about this topic to post about it.
If I'm a linux user who uses firefox currently, what's the value prop for this browser? I already get privacy and extensions, is it just for testing my app on webkit?
The main benefit is that Orion (contrary to Firefox) has a business model. The downside is that it's not open source. They have some explanation on why, but it might be a deal breaker for someone.
You can already test a site on a webkit engine in Linux using Gnome Web (previously Epihany) or LuaKit ( https://luakit.github.io/ ). But it is always good to have options, even if commercial ones. From that aspect Orion on Linux is good news.
Left-populists and right-populists like to frame issues as being a conflict between the elites and the common man. Banning big banks from owning homes is a perfect example of this.
It's fine to ban big banks from buying homes and wont do damage to the nation, but don't expect it to solve the problem.
High housing prices are due to zoning-based supply restrictions. These are entrenched due to politically active NIMBY voters.
Actually fixing the housing crisis means addressing zoning, but that doesn't fit the elite vs common man narrative so gets ignored by the populists.
I agree completely with the author that AI assisted coding pushes the bottleneck to verification of the code.
But you don't really need complete formal verification to get these benefits. TDD gets you a lot of them as well. Perhaps your verification is less certain, but it's much easier to get high automated test coverage than it is to get a formally verifiable codebase.
I think AI assisted coding is going to cause a resurgence of interest in XP (https://en.wikipedia.org/wiki/Extreme_programming) since AI is a great fit for two big parts of XP. AI makes it easy to write well-tested code. The "pairing" method of writing code is also a great model for interacting with an AI assistant (much better than the vibe-coding model).
Trouble is that TDD, and formal proofs to much the same extent, assume a model of "double entry accounting". Meaning that you write both the test/proof and the implementation, and then make sure they agree. Like in accounting, the assumption is that the probability of you making the same mistake twice is fairly low, giving high confidence to accuracy when they agree. When there is a discrepancy, then you can then unpack if the problem is in the test/proof or the implementation. The fallible human can easily screw either.
But if you only fill out one side of the ledger, so to speak, an LLM will happily invent something that ensures that it is balanced, even where your side of the entry is completely wrong. So while this type of development is an improvement over blindly trusting an arbitrary prompt without any checks and balances, it doesn't really get us to truly verifying the code to the same degree we were able to achieve before. This remains an unsolved problem.
I don't fully understand what you mean by accounting expects the probability of making the same mistake twice is fairly low? Double-entry bookkeeping can only tell you if the books are balanced or not. We absolutely cannot assume that the books reflect reality just because they're balanced. You don't need to mess up twice to mess up the books in terms of truthness.
Also tests and code are independent while you always affect both sides in double-entry always. Audits exist for a reason.
With double-entry bookkeeping, the only way an error can slip through is if you make the same error on both sides, or else they wouldn’t be balanced. A similar thing is true for testing: If you make both an error in your test and in your implementation, they can cancel out and appear to be error-free.
I don’t quite agree with that reasoning, however, because a test that fails to test the property it should test for is a very different kind of error than having an error in the implementation of that property. You don’t have to make the “same” error on both sides for an error to remain unnoticed. Compared to bookkeeping, a single random error in either the tests or the implementation is more likely to remain unnoticed.
> With double-entry bookkeeping, the only way an error can slip through is if you make the same error on both sides, or else they wouldn’t be balanced. A similar thing is true for testing: If you make both an error in your test and in your implementation, they can cancel out and appear to be error-free.
Yeah but it's very different from tests vrs code though, right? Every entry has two sides at least and you do it together, they are not independent like test and code.
You can easily make a mistake if you write a wrong entry and it will still balance. Balanced books =/= accurate books is my point. And there is no difference between "code" and "tests" in double entry, it's all just "code".
So it seems like the person who made the metaphor doesn't really know how double-entry works or took maybe one accounting class.
> Yeah but it's very different from tests vrs code though, right? Every entry has two sides at least and you do it together, they are not independent like test and code.
The point of the current thread is that the use of AI coding agents threatens to disrupt that. For example, they could observe a true positive test failure and opt to modify the test to ensure a pass instead.
You can use a little less snark and "high confidence" is pretty easy to understand but your metaphor makes no sense. Balanced books =/= accurate books and it is not at all a sign that the bookkeeping is accurate. The entries are also not independent like code and tests.
Naturally. Hence "high confidence" and not "full confidence". But let's not travel too far into the weeds here. Getting us back on track, what about the concept of "high confidence" is not understandable?
That sounds right in theory, but in practice my code is far, far higher quality when I do TDD than when I don't. This applies whether or not I'm using an Ai coding assistant
I don't think GP disagrees. They are (I think) stating that AI-assisted TDD is not as reliable as human TDD, because AI will invent a pointless test just to achieve a passing outcome.
The issues raised in this article are why I think highly-opinionated frameworks will lead to higher developer productivity when using AI assisted coding
You may not like all the opinions of the framework, but the LLM knows them and you don’t need to write up any guidelines for it.
Yep. I ran an experiment this morning building the same app in Go, Rust, Bun, Ruby (Rails), Elixir (Phoenix), and C# (ASP whatever). Rails was a done deal almost right away. Bun took a lot of guidance, but I liked the result. The rest was a lot more work with so-so results — even Phoenix, surprisingly.
I liked the Rust solution a lot, but it had 200+ dependencies vs Bun’s 5 and Rails’ 20ish (iirc). Rust feels like it inherited the NPM “pull in a thousand dependencies per problem” philosophy, which is a real shame.
I can vouch for this as someone who works in a 1.6 million line codebase, where there are constant deviations and inconsistent patterns. LLMs have been almost completely useless on it other than for small functions or files.
I can't believe anyone actually wrote this.
reply