LocalLLaMA

[Paper] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in SOTA Large Language Models (arxiv.org)

submitted 10 months ago* (last edited 10 months ago) by [email protected] to c/[email protected]

"Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"

The problem has a light quiz style and is arguably no challenge for most adult humans, and probably none for some children either.

The scientists posed varying versions of this simple problem to various state-of-the-art LLMs that claim strong reasoning capabilities (GPT-3.5/4/4o, Claude 3 Opus, Gemini, Llama 2/3, Mistral and Mixtral, including the very recent DBRX and Command R+).
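To make the task concrete, here is a minimal sketch of how such prompt variations and their ground-truth answers could be generated. This is not the paper's code; the function names `aiw_prompt` and `aiw_answer` are made up for illustration. The point is that the correct answer is always M + 1 (Alice's sisters plus Alice herself), and N does not matter.

```python
def aiw_prompt(n_brothers: int, m_sisters: int) -> str:
    """Fill the quoted question template with concrete numbers.

    Hypothetical helper: values >= 2 are assumed so the plurals read naturally.
    """
    return (
        f"Alice has {n_brothers} brothers and she also has {m_sisters} sisters. "
        "How many sisters does Alice's brother have?"
    )


def aiw_answer(n_brothers: int, m_sisters: int) -> int:
    """Return the correct answer to the question above.

    Each brother's sisters are Alice's M sisters plus Alice herself,
    so the number of brothers is irrelevant.
    """
    return m_sisters + 1


# A few (N, M) variations paired with their ground-truth answers.
variations = [(3, 6), (4, 2), (2, 4)]
cases = [(aiw_prompt(n, m), aiw_answer(n, m)) for n, m in variations]
```

Each entry in `cases` is a prompt string together with the integer a model's reply would have to contain to count as correct.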

They observed a strong collapse of reasoning and an inability to answer the simple question as formulated above across most of the tested models, despite their claimed strong reasoning capabilities. Notable exceptions are Claude 3 Opus and GPT-4, which occasionally manage to provide correct responses.

The breakdown is dramatic not only because it happens on such a seemingly simple problem, but also because the models express strong overconfidence, reporting their wrong solutions as correct. They often add confabulations to justify the final answer, mimicking a reasoning-like tone while offering nonsensical arguments in support of equally nonsensical, wrong answers.

[–] [email protected] 4 points 10 months ago (2 children)

I don't know much about the stochastic parrot debate. Is my position a common one?

In my understanding, current language models don't have any understanding or reflection, but the probability distributions of the languages they learn do, at least to some extent. In this sense, there's some intelligence inherently associated with language itself, and language models are just tools that help us see more aspects of nature than we could earlier, like X-rays or sonar, except that this part of nature is a bit closer to the world of ideas.

[–] [email protected] 4 points 10 months ago

I don't know about common, but you and I agree on a lot. LLMs are not a breakthrough in artificial cognition but more like a breakthrough in linguistics: it turns out coherent English can be produced with unexpectedly small mathematical structures. It was hubris on our part to imagine that human language is more complex than it is, or that our ideas are more unique than they are.

[–] [email protected] 3 points 10 months ago* (last edited 10 months ago)

Well, I'd say there is information in language. That's kinda the point of it and why we use it. And language is powerful. We can describe and talk about a lot of things. (And it's an interesting question what cannot be described with language.)

I don't think the stochastic parrot thing is a proper debate. It's just that lots of people don't know what AI is and what it can and cannot do. It's not easy to understand, and the consequences aren't always obvious either.

Training LLMs involves some clever trickery, such as limiting their size, so they can't just memorize everything but are instead forced to learn the concepts behind those texts.

I think they form models of the world inside of them. At least of things they've learned from the dataset. That's why they can for example translate text. They have some concept of a cat stored inside of them and can apply that to a different language that uses entirely different characters to name that animal.

I wouldn't say they are "tools to learn more aspects about nature". They aren't a sensor or something. And they can infer things, but not 'measure' things like an X-ray.