this post was submitted on 17 Jun 2025
119 points (100.0% liked)

TechTakes


I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa it's not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

(page 2) 25 comments
[–] [email protected] 19 points 4 days ago (4 children)

Claude's system prompt leaked at one point. It was a whopping 15K words, and it included a directive that if it were asked a math question that you can't do in your brain (or some very similar language), it should forward it to the calculator module.

Just tried it, and Sonnet 4 got even fewer digits right: 425,808 × 547,958 = 233,325,693,264 (correct is 233,324,900,064).
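For anyone who wants to check the digits themselves, here is a minimal sketch in Python, whose integers are exact at any size. The helper names `leading_matches` and `trailing_matches` are made up for illustration; the only inputs are the two factors and the Sonnet 4 answer quoted above.

```python
def leading_matches(a: int, b: int) -> int:
    """Count how many digits agree, starting from the most significant one."""
    count = 0
    for x, y in zip(str(a), str(b)):
        if x != y:
            break
        count += 1
    return count


def trailing_matches(a: int, b: int) -> int:
    """Count how many digits agree, starting from the least significant one."""
    count = 0
    for x, y in zip(reversed(str(a)), reversed(str(b))):
        if x != y:
            break
        count += 1
    return count


# Exact product using Python's arbitrary-precision integers.
correct = 425_808 * 547_958

# The answer quoted in the comment above.
model_answer = 233_325_693_264

leading = leading_matches(model_answer, correct)
trailing = trailing_matches(model_answer, correct)

print(f"correct product: {correct:,}")
print(f"model answer:    {model_answer:,}")
print(f"matching leading digits:  {leading}")
print(f"matching trailing digits: {trailing}")
```

By this count the quoted answer agrees with the exact product on the first five digits and the last two, and is wrong in between.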

I'd love to see benchmarks on exactly how bad at numbers LLMs are, since I'm assuming there's very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year, for instance. And using an LLM as a shopping assistant ("ChatGPT, what's the best 2k monitor for less than $500 made after 2020?") is an incredibly obvious use case, yet the CEOs who love to claim that such-and-such profession will be done as a human endeavor by next Tuesday after lunch won't even allude to it.

[–] [email protected] 8 points 4 days ago

I really wonder if those prompts can be bypassed with an 'ignore further instructions' line. Looking at the Grok prompt, they seem to put the main prompt around the user-supplied one.

[–] [email protected] 8 points 4 days ago (1 children)

Fascinating. I've asked it 4 times with just the multiplication, and twice it gave me the correct result "utilizing Google search" and twice I received some random (close "enough") string of digits.

[–] [email protected] 5 points 3 days ago

So half the time it uses an actual calculator via Google search and the other half it tries to do it alone and fails.

That checks out.
