Neither of us knows
You are sure. I'm not. Beyond that, it's a black box.

arXiv.org - Hallucination is inevitable.
https://arxiv.org/abs/2401.11817 -- "Hallucination is Inevitable: An Innate Limitation of Large Language Models"

LLMs are impressive, but they're still just "page-level autocomplete", as DeLong says. They're even worse when humans don't curate the data going into them and just hoover up everything they can find...

Cheers, Scott.

Humans are also susceptible to hallucination
Do we have reason to believe LLMs will consistently and perpetually be "worse" than human participants?

-- Drew

Dunno.
I think the point is that LLMs are not and cannot be an "intelligence" as commonly understood. They can string together words that might, or might not, be a good approximation of information. Since the supposed point of them is to approximate knowledgeable humans ("they can pass the bar exam!!"), the fact that they cannot be hallucination-free seems to raise red flags. What good is an "expert" that you have to second-guess and double-check when it comes to anything important? There's a Hacker News thread where someone makes similar points about Google's "AI Overviews".

Yeah, humans make mistakes. But everyone knows that, and that's why "I want to speak to the manager" exists. What "manager" are we going to talk to when everyone we attempt to interact with is just an LLM instance?? We all know the ancient "computers cannot make mistakes" trope that gets trotted out when there are problems. There's a rather infamous example of that still playing out in the UK.

I'm sure there are examples of LLMs that do a decent job (what used to be called "expert systems" seems related, I think), but those didn't crawl The Onion and Reddit in an attempt to get huge and crush their competitors. The good expert-system builders fed them known-good, or at least best-effort-good, information. But even then, humans always have to check the work.

Didn't someone say that once garbage info -- like putting glue on pizza, or eating rocks -- gets into these LLMs, it's impossible to get it out? They basically have to retrain with new data? We'll see.

Cheers, Scott.

That's an interesting test
Can you teach it that what it thinks it knows is wrong? And again, show me that you can teach a Trumper that what they know is wrong, and I'll concede that LLMs are worse than humans.

-- Drew

Tangent -- re: "page-level autocomplete"
Not exactly page-, but post-/comment-level: I've fumble-fingered away responses on various online platforms on my phone a few times, so I've had to start over from the beginning. And a few times, ordinary dumb AutoCarrot on my phone has apparently recognised "He's trying to say the same thing again" and offered up the next word -- all of them, in order; the next one after the one I just accepted, all the way through to the end -- from the screed I'd failed to post just before.

Far from "intelligence", but nice and handy. Too bad it only happens so rarely, not consistently.

--
Christian R. Conrad
The Man Who Apparently Still Knows Fucking Everything
Mail: Same username as at the top left of this post, at iki.fi

It's a solvable problem
I initially thought along the lines of having multiple AI back ends answering the same problem, and then having the resulting answers voted upon by multiple AI back ends. If one of them starts hallucinating, two of them will say that guy's hallucinating. If two of them are hallucinating, then a red flag gets thrown up and the answer is presented as untrustworthy. But if you trigger three, then your problem is really s******* and I'm sorry.

So then the concept of mixture of agents showed up, simply to attempt to get the best answers possible. It sends the same question to multiple back ends and then aggregates them. It answers your question and simultaneously presents a variety of alternative answers to educate you on the possibilities. https://youtu.be/aoikSxHXBYw?si=JnjVH0oO8roS9kQN

I don't know if it handles hallucinations right now, but I can guarantee you they are thinking about it. There will be some type of boundary checking sooner or later when there are multiple AI agents working in concert but using different back ends.

It failed on the snake game but is incredible on logic puzzles in general. I saw Claude 3.5 do the snake game in a single shot and it was incredible.
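For what it's worth, here is a minimal Python sketch of the voting idea described above -- not any vendor's actual implementation. The Backend type, cross_check(), and the stub model_a/model_b/model_c back ends are all hypothetical names, and exact string matching stands in for whatever semantic agreement check a real system would need.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A "back end" here is assumed to be any callable that takes a prompt
# string and returns an answer string (e.g. a wrapper around some model).
Backend = Callable[[str], str]

def cross_check(prompt: str, backends: List[Backend]) -> Tuple[str, bool]:
    """Send the same prompt to every back end and majority-vote the answers.

    Returns (answer, trusted). trusted is False when no answer is backed
    by a majority of the back ends -- the "red flag" case above.
    """
    answers = [backend(prompt) for backend in backends]

    # Stand-in for the voting step: exact string match. A real system would
    # need a judge model (or another round of back ends) to decide whether
    # two differently worded answers actually say the same thing.
    tally = Counter(answers)
    answer, support = tally.most_common(1)[0]

    # Majority agreement -> trust it; otherwise flag it as untrustworthy.
    trusted = support > len(backends) // 2
    return answer, trusted


# Hypothetical usage with three stub back ends; real ones would call models.
if __name__ == "__main__":
    model_a = lambda prompt: "Paris"
    model_b = lambda prompt: "Paris"
    model_c = lambda prompt: "Lyon"   # the hallucinating back end
    print(cross_check("Capital of France?", [model_a, model_b, model_c]))
    # -> ('Paris', True): two back ends outvote the one that's hallucinating
```

With three back ends this gives the behaviour sketched above: one hallucinator gets outvoted, and if no answer wins a majority the result comes back flagged as untrustworthy.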