That is a great xkcd comic, but it doesn't show that the error rate "isn't much better." Are there sources that have actually measured this and demonstrated it? If it's a fact, I'm genuinely interested in the evidence.
The problem is that the LLM's sources can themselves be LLM-generated. I was looking up a health question and clicked through to see the source for one of the LLM's claims. The source was a blog post that contained an obvious hallucination or false elaboration.
Excellent observation. I get so frustrated every time I hear the "we have test-suites and can test deterministically" argument. Have we learned absolutely nothing from the last 40 years of computer science? Testing does not prove the absence of bugs.
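To make the point concrete, here's a toy illustration (my own example, not from the thread): a deterministic test suite that passes every time while the implementation is still wrong. The function and its tests are invented for illustration.

```python
def max_of_three(a, b, c):
    """Intended to return the largest of three numbers."""
    if a >= b:
        return a          # bug: c is never considered on this branch
    return max(b, c)

# This deterministic test suite passes on every run...
assert max_of_three(1, 2, 3) == 3
assert max_of_three(3, 2, 1) == 3
assert max_of_three(2, 3, 1) == 3

# ...but the implementation is still wrong for other inputs:
print(max_of_three(2, 1, 3))  # prints 2, not 3
```

Green tests only tell you the inputs you happened to try behaved as expected, which is exactly the "testing shows the presence of bugs, not their absence" lesson.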
It seems a better and fuller solution to a lot of these problems is to just stop using AI.
I may be an odd one, but I'm refusing to use agents and just happily coding almost everything myself. I only ask an LLM occasional questions about libraries etc., or to write the occasional function. Are there others like me out there?
Hi, no one's responded to you after 12 hours so I will.
I don't outright refuse to use LLMs, but I use them as little as possible. I enjoy the act of programming too much to delegate it to a machine.
For a while now there have been programmers who don't actually enjoy programming and are just in it for the money. This happens because programmer salaries are high and the barrier to entry is relatively low. I can imagine LLMs must feel like a godsend to them.
People keep using these analogies but I think these are fundamentally different things.
1. hand arithmetic -> using a calculator
2. assembly -> using a high level language
3. writing code -> making an LLM write code
Number 3 does not belong: it's a fundamentally different leap because it's not based on deterministic logic. You can't depend on an LLM the way you can depend on a calculator or a compiler. LLMs are totally different.
There are definitely parallels though. E.g. you could swap out your compiler for a different one that produces slightly different assembly. Similarly an LLM may implement things differently… but if it works, do we care? Probably no more than when you buy software and don't care precisely which compiler optimisations were used. Precise determinism isn't the key feature.
With the LLM, it might work or it might not. If it doesn't work, you have to keep iterating and hand-holding it to make it work. Sometimes that process is less efficient than writing the code manually. With a calculator, you can be sure the first attempt will work. An idiot with a calculator can still produce correct results. An idiot with an LLM often cannot, outside trivial problems.
It often doesn't work. That's the point. A calculator works 100% of the time. An LLM might work 95% of the time, or 80%, or 40%, or 99%, depending on what you're doing. That is the difference, and it is a key feature.
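The gap compounds across an agent workflow. A back-of-the-envelope sketch (the probabilities and step count are my own illustrative numbers, and the independence assumption is a simplification): if each step succeeds independently with probability p, the chance the whole chain succeeds falls off quickly.

```python
def chain_success(p: float, steps: int) -> float:
    """Probability that `steps` independent attempts all succeed."""
    return p ** steps

# A calculator-like tool (p = 1.0) never degrades; anything less does.
for p in (1.0, 0.99, 0.95, 0.80):
    print(f"per-step p={p:.2f}: 10-step chain -> {chain_success(p, 10):.3f}")
```

Even a per-step reliability of 95% leaves a 10-step chain succeeding only about 60% of the time, which is why "it usually works" feels so different from a deterministic tool.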
I see. I'd call that fragility/reliability rather than determinism, but that's semantics I suppose.
To me that isn’t a show stopper. Much of the real world works like that. We put very unreliable humans behind the wheel of 2 ton cars. So in a way this is perhaps just programmers aligning with the messy real world?
Perhaps a bit like how architects can only model things so far: eventually you need to build the thing and deal with the surprises and imperfections of dirt.
There could be more linear and "resource-aware" type systems coming down the pipeline from research. These would allow the type checker to surface performance and resource information. Check out Resource Aware ML.
Super interesting, but I think this will be very difficult in practice due to the gigantic effect of nondeterminism at the hardware level (caches, branch prediction, out of order execution, etc.)
There is a bunch of research happening around "Resource-Aware" type theory. This kind of type theory checks performance, not just correctness. Just like the compiler can show correctness errors, the compiler could show performance stats/requirements.
I wish people understood that this is pretty much true of software building as well.