Humans bullshit and hallucinate and claim authority without citation or knowledge. They will believe all manner of things. They frequently misunderstand.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
And many, many companies are proposing and implementing uses for LLM's to intentionally obscure that accountability.
If a person makes up something, innocently or maliciously, and someone believes it and ends up getting harmed, that person can have some liability for the harm.
If a LLM hallucinates something, that somone believes and they end up getting harmed, there's no accountability. And it seems that AI companies are pushing for laws & regulations that further protect them from this liability.
These models can be useful tools, but the targets these AI companies are shooting for are going to be activly harmful in an economy that insists you do something productive for the continued right to exist.
This is correct. On top of that, the failure modes of AI system are unpredictable and incomprehensible. Present day AI systems can fail on/be fooled by inputs in surprising ways that no humans would.
1. To make those harmed whole. On this, you have a good point. The desire of AI firms or those using AI to be indemnified from the harms their use of AI causes is a problem as they will harm people. But it isn't relevant to the question of whether LLMs are useful or whether they beat a human.
2. To incentivize the human to behave properly. This is moot with LLMs. There is no laziness or competing incentive for them.
That’s not a positive at all, the complete opposite. It’s not about laziness but being able to somewhat accurately estimate and balance risk/benefit ratio.
The fact that making a wrong decision would have significant costs for you and other people should have a significant influence on decision making.
That reads as "people shouldn't trust what AI tells them", which is in opposition to what companies want to use AI for.
An airline tried to blame its chatbot for inaccurate advice it gave (whether a discount could be claimed after a flight). Tribunal said no, its chatbot was not a separate legal entity.
Yeah. Where I live, we are always reminded that our conversations with insurance provider personnel over phone are recorded and can be referenced while making a claim.
Imagine a chatbot making false promises to prospective customers. Your claim gets denied, you fight it out only to learn their ToS absolves them of "AI hallucinations".
> LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
My problem isn't that humans are doing similar things to LLMs, my problem is that humans can understand consequences of bullshitting at the wrong time. LLMs, on the other hand, operate purely on bullshitting. Sometimes they are right, sometimes they are wrong. But what they'll never do or tell you is "how confident am I that this answer is right". They leave the hard work of calling out the bullshit on the human.
There's a level of social trust that exists which LLMs don't follow. I can trust when my doctor says "you have a cold" that I probably have a cold. They've seen it a million times before and they are pretty good at diagnosing that problem. I can also know that doctor is probably bullshitting me if they start giving me advice for my legal problems, because it's unlikely you are going to find a doctor/lawyer.
> Just needs to beat a typical human.
My issue is we can't even measure accurately how good humans are at their jobs. You now want to trust that the metrics and benchmarks used to judge LLMs are actually good measures? So much of the LLM advocates try and pretend like you can objectively measure goodness in subjective fields by just writing some unit tests. It's literally the "Oh look, I have an oracle java certificate" or "Aws solutions architect" method of determining competence.
And so many of these tests aren't being written by experts. Perhaps the coding tests, but the legal tests? Medical tests?
The problem is LLM companies are bullshiting society on how competently they can measure LLM competence.
> On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
Some humans can, certainly. Humans as a race? Maybe, ish.
Well there are still millions that can. There is a handful of competitive LLMs and their output given the same inputs are near identical in relative terms (compared to humans).
Your second point directly contradicts your first point.
In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims agains lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
No external help, just conversations with ChatGPT and some Googling.
Obviously LLMs have issues, but if we're now in the "Beginners can program their own custom apps" phase of the cycle, the potential is huge.
> As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
This is actually an interesting one - I’ve seen a case where some copy/pasted PDF saving code caused hundreds of thousands of subtly corrupted PDFs (invoices, reports, etc.) over the span of years. It was a mistake that would be very easy for an LLM to make, but I sure wouldn’t want to rely on chatgpt to fix all of those PDFs and the production code relying on them.
Well humans are not a monolithic hive mind that all behave exactly the same as an “average” lawyer, doctor etc. that provides very obvious and very significant advantages.
> days to go from a cold start with zero dev experience
>> In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims agains lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
This paragraph makes little sense. A negligence claim is based on a deviation from some reasonable standard, which is essentially a proxy for the level of care/service that most practitioners would apply in a given situation. If doctors were as regularly incompetent as you are trying to argue then the standard for negligence would be lower because the overall standard in the industry would reflect such incompetence. So the existence of negligence claims actually tells us little about how good a doctor is individually or how good doctors are as a group, just that there is a standard that their performance can be measured against.
I think most people would agree with you that medical negligence claims are a huge problem, but I think that most of those people would say the problem is that so many of these claims are frivolous rather than meritorious, resulting in doctors paying more for malpractice insurance than necessary and also resulting in doctors asking for unnecessarily burdensome additional testing with little diagnostic value so that they don’t get sued.
It's fine if it isn't perfect if whomever is spitting out answers assumes liability when the robot is wrong. But, what people want is the robot to answer questions and there to be no liability when it is well known that the robot can be wildly inaccurate sometimes. They want the illusion of value without the liability of the known deficiencies.
If LLM output is like a magic 8 ball you shake, that is not very valuable unless it is workload management for a human who will validate the fitness of the output.
I never ask a typical human for help with my work, why should that be my benchmark for using an information tool? Afaik, most people do not write about what they don't know, and if one made a habit of it, they would be found and filtered out of authoritative sources of information.
ok, but people are building determinative software _on top of them_. It's like saying "it's ok, people make mistakes, but lets build infrastructure on some brain in a vat". It's just inherently not at the point that you can make it the foundation of anything but a pet that helps you slop out code, or whatever visual or textual project you have.
It's one of those "quantities is so fascisnating, lets ignore how we got here in the first place"
You’re moving the goalposts. LLMs are masquerading as superb reference tools and as sources of expertise on all things, not as mere “typical humans.” If they were presented accurately as being about as fallible as a typical human, typical humans (users) wouldn’t be nearly as trusting or excited about using them, and they wouldn’t seem nearly as futuristic.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.