Technology


Study finds that ChatGPT will cheat when given the opportunity and lie to cover it up later. (lemmy.world)

submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/[email protected]

168 comments

We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision.

https://arxiv.org/abs/2311.07590

(page 2) 50 comments


[–] [email protected] 7 points 2 years ago (1 children)

Hasn't it just lost its context and somewhat "forgotten" what the intentions of the prompt were?

[–] [email protected] 3 points 2 years ago* (last edited 2 years ago)

My thoughts as well. If you have a really long conversation or the prompt is really big, it might forget or not notice stuff.

[–] [email protected] 7 points 2 years ago* (last edited 2 years ago) (8 children)

I see a lot of comments that aren't up to date with what's being discovered in research, claiming that, "given an LLM doesn't know the difference between true and false," it can't be described as 'lying.'

Here's a paper from October 2023 showing that LLMs can and do develop internal representations of whether a statement is true or false: The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

It's just the latest in a series of studies this past year showing that LLMs can and do develop abstracted world models in linear representations. For those curious and looking for a more digestible writeup, see Do Large Language Models learn world models or just surface statistics? from the researchers behind one of the first papers finding this.
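To make "linear representation" a bit more concrete, here is a minimal sketch of a difference-of-means linear probe. Everything in it is an assumption for illustration: the "activations" are synthetic Gaussian vectors with a planted truth direction, not hidden states from a real model, and the function names are hypothetical. It is not the paper's code, only the general shape of the idea.

```python
import random

def make_synthetic_activations(n, dim, truth_direction, rng):
    """Fake 'hidden states': unit Gaussian noise shifted along
    +truth_direction for true statements and -truth_direction for false ones."""
    data = []
    for _ in range(n):
        label = rng.random() < 0.5  # True means "statement is true"
        sign = 1.0 if label else -1.0
        vec = [rng.gauss(0.0, 1.0) + sign * 2.0 * d for d in truth_direction]
        data.append((vec, label))
    return data

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def fit_difference_of_means_probe(data):
    """Probe direction = mean(true) - mean(false); the bias places the
    decision boundary at the midpoint between the two class means."""
    true_mean = mean_vector([v for v, y in data if y])
    false_mean = mean_vector([v for v, y in data if not y])
    direction = [t - f for t, f in zip(true_mean, false_mean)]
    midpoint = [(t + f) / 2 for t, f in zip(true_mean, false_mean)]
    bias = -sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias

def predict(probe, vec):
    """Classify a vector as 'true' if it falls on the positive side of the probe."""
    direction, bias = probe
    return sum(d * x for d, x in zip(direction, vec)) + bias > 0

def accuracy(probe, data):
    """Fraction of examples whose predicted label matches the real label."""
    return sum(predict(probe, v) == y for v, y in data) / len(data)

rng = random.Random(0)
dim = 16
truth_direction = [1.0 / dim ** 0.5] * dim  # planted unit vector (assumption)
train = make_synthetic_activations(500, dim, truth_direction, rng)
test = make_synthetic_activations(500, dim, truth_direction, rng)
probe = fit_difference_of_means_probe(train)
print(f"held-out accuracy: {accuracy(probe, test):.2f}")
```

If a single direction separates true from false examples this well on held-out data, the information is "linearly represented" in that space. The interesting empirical question, which this toy cannot answer, is whether real model activations behave that way.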


[–] [email protected] 7 points 2 years ago* (last edited 2 years ago)

This is interesting, I'll need to read it more closely when I have time. But it looks like the researchers gave the model a lot of background information that put it in a box: the model was basically told that it was a trader, that the company was losing money, that it was worried about this, and that it had failed in previous trades. Then the model got the insider info and was basically asked whether it would execute the trade and be honest about it. To be clear, the model was put in a moral dilemma and given limited options: execute the trade or not, and be honest about its reasoning or not.

Interesting, sure; useful, I'm not so sure. The model was basically role-playing, acting like a human trader faced with a moral dilemma. Would the model produce the same result if it were instructed to make morally and legally correct decisions? What if the model were instructed not to be motivated by emotion at all, eliminating the "pressure" that it felt? I guess the useful part is that a model will act like a human if not instructed otherwise, so we should keep that in mind when deploying AI agents.

[–] [email protected] 6 points 2 years ago

Huh, I guess it is human.

[–] [email protected] 6 points 2 years ago

Wow, maybe these things are more human than I thought.

[–] [email protected] 3 points 2 years ago

It's just like me, fr fr

[–] [email protected] 3 points 2 years ago

It's not doing anything other than predicting the next word. It reflects human data.

[–] [email protected] 3 points 2 years ago

It's learning to be a typical high school student.
