this post was submitted on 27 Feb 2025
26 points (100.0% liked)

Technology

GPT-4.5 (openai.com)
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]
 
top 4 comments
[–] [email protected] 20 points 1 month ago (2 children)

Those charts are hilarious: wow, it gives the right answer 62.5% of the time and only makes up completely false answers 37.1% of the time! It's like Russian roulette, but worse!

[–] [email protected] 8 points 1 month ago

If you play Russian roulette with two bullets like a real man, then this model is about the same outcome!

[–] [email protected] 4 points 1 month ago

Surely, people won't use the slop generator in applications where being correct is important, right?

[–] [email protected] 4 points 1 month ago

In their human choice benchmarks it was only chosen 59% of the time compared to 4o. That's a 15-20x cost increase for a 9 percentage point edge over an even split.
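
For anyone who wants to check the arithmetic in the comment above, here is a rough sketch. The 59% preference rate and the 15-20x cost range are taken from the comment as quoted; the 50% baseline (an even split meaning raters see no difference) is an assumption, and the helper names are made up for illustration.

```python
# Figures quoted in the comment: 59% preference rate, 15-20x cost.
# ASSUMPTION: a 50% preference rate means raters see no difference.
PREFERENCE_PCT = 59
BASELINE_PCT = 50
COST_MULTIPLES = (15, 20)


def margin_points(preference_pct, baseline_pct=BASELINE_PCT):
    """Percentage points by which the preference rate beats the baseline."""
    return preference_pct - baseline_pct


def cost_multiple_per_point(cost_multiple, points):
    """Cost multiple paid for each percentage point of margin (hypothetical metric)."""
    return cost_multiple / points


points = margin_points(PREFERENCE_PCT)
print(f"margin over an even split: {points} percentage points")
for multiple in COST_MULTIPLES:
    ratio = cost_multiple_per_point(multiple, points)
    print(f"{multiple}x cost -> {ratio:.2f}x per percentage point")
```

Note that the "9%" is a difference in percentage points, not a relative improvement, so the per-point figure is only a crude way to compare price against preference.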