It’s interesting because the LLM revolution is often compared to the calculator, but a calculator that made random calculation mistakes would never have been used so widely in critical systems. That’s the point of a calculator: we never double-check the result. But we will never check the result of an LLM because of the statistical margin of error in the future.
Right: When I avoid memorizing a country's capital city, that's because I can easily recognize when I'll need it later and reliably look it up from an online source.
When I avoid multiplying large numbers in my head, that's because I can easily characterize the problem and reliably use a calculator.
Neither is the same as people trying to use LLMs to unreliably replace critical thinking.
The critical difference is that (natural) language itself is in the domain of statistical probabilities. The nature of the domain is that multiple outputs can all be correct, with some more correct than others, and variations producing novelty and creative outputs.
This differs from closed-form calculations where a calculator is normally constrained to operate--there is one correct answer. In other words "a random calculation mistake" would be undesirable in a domain of functions (same input yields same output), but would be acceptable and even desirable in a domain of uncertainty.
We are surprised and delighted that LLMs can produce code, but they are more akin to natural language outputs than code outputs--and we're disappointed when they create syntax errors, or worse, intention errors.
> But we will never check the result of an LLM because of the statistical margin of error in the future.
I don't follow this statement: if anything, we absolutely must check the result of an LLM for the reason you mention. For coding, there are tools that attempt to check the generated code for each answer, to at least guarantee the code runs. Whether it's relevant, optimal, or bug-free is another issue, and one that is not so easy to check without context that can be significant at times.
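A minimal sketch of that kind of automated check, assuming the generated snippet is Python (the function name here is hypothetical, and this only catches syntax errors, not the harder relevance or correctness issues mentioned above):

```python
import ast

def passes_syntax_check(generated_code: str) -> bool:
    """Return True if an LLM-generated snippet at least parses as Python.

    This only guarantees the code is syntactically valid; whether it is
    relevant, optimal, or bug-free still requires review with context.
    """
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError:
        return False

# A well-formed snippet parses; a garbled one does not.
print(passes_syntax_check("def add(a, b):\n    return a + b"))  # True
print(passes_syntax_check("def add(a, b) return a + b"))        # False
```

Real tools go further (running the code in a sandbox, executing tests against it), but even this cheap gate filters out a class of obviously broken outputs.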
I mean, I do check absolutely everything an LLM outputs. But following the calculator analogy, if it goes that way, no one in the future will check the result of an LLM, just like no one ever checks the result of a complex calculation. People get used to the fact that it's correct a large percentage of the time. That might allow big companies to manipulate people, because a calculator is not connected to the cloud, able to falsify results depending on who you are and make your projects fail.
I see a whole new future of cyber warfare being created. It'll be like the reverse of a prompt engineer: an injection engineer, someone who can tamper with the model just enough to sway a specific output that causes <X>.