*sort of, sometimes, with simple enough problems with sufficiently little context, for code that can be easily tested, and for which sufficient examples exist in the training data.
I mean hey, two years after being promised AGI was literally here, LLMs are almost as useful as traditional static analysis tools!
I guess you could have them generate comments for you based on the code as long as you're happy to proofread and correct them when they're wrong.
Remember when CPUs were obsolete after three years? GPT has shown zero improvement in its ability to generate novel content since it was first released as GPT2 almost ten years ago! I would know because I spent countless hours playing with that model.
Firstly, GPT-2 was released in 2019. Five years is not "almost ten years".
Secondly, LLMs are objectively useful for coding now. That's not the same thing as saying they are replacements for SWEs. They're a tool, like syntax highlighting or real-time compiler error visibility or even context-aware keyword autocompletion.
Some individuals don't find those things useful, and prefer to develop in a plain text editor that does not have those features, and that's fine.
But all of those features, and LLMs are now on that list, are broadly useful in the sense that they generally improve productivity across the industry. They already right now save enormous amounts of developer time, and to ignore that because you are not one of the people whose time is currently being saved, indicates that you may not be keeping up with understanding the technology of your field.
There's an important difference between a tool being useful for generating novel content, and a tool being useful. I can think of a lot of useful tools that are not useful for generating novel content.
> are broadly useful in the sense that they generally improve productivity across the industry. They already right now save enormous amounts of developer time,
But is that actually a true statement. Are there actual studies to back that up?
AI is hyped to the moon right now. It is really difficult to separate the hype from reality. There are ancedotal reports of ai helping with coding, but there are also ancedotal reports that they get things almost right but not quite, which often leads to bugs which wouldn't otherwise happen. I think its unclear if that is a net win for productivity in software engineering. It would be interesting if there was a robust study about it.
Shall we therefore conclude that syntax highlighting is not useful, that developers who use syntax highlighting are just part of the IDE hype train, and that anecdotal reports of syntax highlighting being helpful are counterbalanced by anecdotal reports of $IDE having incorrect syntax highlighitng on $Esoteric_file_format?
Most of the failures of LLMs with coding that I have seen has been a result of asking too much of the LLM. Writing a hundred context-aware unit tests is something that an LLM is excellent at, and would have taken a developer a long time previously. Asking an LLM to write a novel algorithm to speed up image processing of the output of your electron microscope will go less well.
> Shall we therefore conclude that syntax highlighting is not useful, that developers who use syntax highlighting are just part of the IDE hype train, and that anecdotal reports of syntax highlighting being helpful are counterbalanced by anecdotal reports of $IDE having incorrect syntax highlighitng on $Esoteric_file_format?
Yes. We should conclude that syntax highlighting is not useful in languages that the syntax highlighter does not support. I think basically everyone would agree with this statement.
Similarly an llm that worked 100% of the time and could solve any problem would be pretty useful. (Or at least worked correctly as often as syntax highlighting in situations where it is actually used does)
However that's not the world we live in. Its a reasonable question to ask if llm is good enough yet where the productivity gain outweighs the productivity lost.
Your stance feels somewhat contradictory. A syntax highlighter is not useful in languages it does not support, therefore an LLM must be able to solve any problem to be useful?
The point I was trying to make was, an LLM is as reliably useful as syntax highlighting, for the tasks that coding LLMs are good at today. Which is not a lot, but enough to speed up junior devs. The issues come when people assume they can solve any problem, and try to use them on tasks to which they are not suited. Much like applying syntax highlighting on an unsupported language, this doesn't work.
Like any tool, there's a learning curve. Once someone learns what does and does not work, it's generally an strict productivity boost.
The problem is that there are no tasks that LLMs are reliably good at. I believe that's what OP is getting at.
I fixed a production issue earlier this year that turned out to be a naive infinite loop - it was trying to load all data from a paginated API endpoint, but there was no logic to update the page number being fetched.
There was a test for it. Alas, the test didn't actually cover this scenario.
I mention this because it was committed by a co-worker whose work is historically excellent, but who started using Copilot / ChatGPT. I'm pretty sure it was an LLM-generated function and test, and they were deeply broken.
Mostly they've been working great for this co-worker.
I understand that, the point I'm making is that reliability is not a requirement for utility. One does not need to be reliable to be reliably useful :)
A very similar example is StackOverflow. If you copy/paste answers verbatim from SO, you will have problems. Some top answers are deeply broken or have obvious bugs. Frequently, SO answers are only related to your question, but do not explicitly answer it.
SO is useful to the industry in the same way LLMs are.
Sure, there is a range. If it works 100% of the time its clearly useful. If it works 0% then it clearly isn't.
LLMs are in the middle. Its unclear which side of the line they are on. Some ancedotes say one thing some say another. That's why studies would be great. Its also why syntax highlighting is a bad comparison since that is not in the greyzone.
exactly. many SWEs currently are fighting this fight of “oh it is not good enough bla bla…” on my team currently (50-ish people) you would not last longer than 3 months if you tried to do your work “manually” like we did before. several have tried, no longer around. I believe SWEs fighting LLMs are doing themselves a huge disservice in that they should be full-on embracing it and trying to figure out how to more effectively use them. just like any other tool, it is as good as the user of the tool :)
> Secondly, LLMs are objectively useful for coding now.
No, they're subjectively useful for coding in the view of some people. I find them useless for coding. If they were objectively useful, it would be impossible for me to find them useless because that is what objectivity means.
*sort of, sometimes, with simple enough problems with sufficiently little context, for code that can be easily tested, and for which sufficient examples exist in the training data.
I mean hey, two years after being promised AGI was literally here, LLMs are almost as useful as traditional static analysis tools!
I guess you could have them generate comments for you based on the code as long as you're happy to proofread and correct them when they're wrong.
Remember when CPUs were obsolete after three years? GPT has shown zero improvement in its ability to generate novel content since it was first released as GPT2 almost ten years ago! I would know because I spent countless hours playing with that model.