Singularity


Everything pertaining to the technological singularity and related topics, e.g. AI, human enhancement, etc.

1
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/TallonZek on 2025-04-26 05:53:30+00:00.


About a year ago, I made this post arguing that a key benchmark for AGI would be when an AI could play Dungeons & Dragons effectively.

I defined the benchmark simply: two or more agents must be able to create a shared imaginary universe, agree on consistent rules, and have actions in that universe follow continuity and logic.

I also specified that the AI should be able to generalize to a new ruleset if required.

This is my update: the benchmark has now been met.

Models: whichever GPT version was current a year ago vs. GPT-4o

Benchmark Criteria and Evidence

1. Shared Imaginary Universe

We ran an extended session using D&D 5e.

The AI acted as Dungeon Master and also controlled companion characters, while I controlled my main character.

The (new) AI successfully maintained the shared imaginary world without contradictions.

It tracked locations, characters, and the evolving situation without confusion.

When I changed tactics or explored unexpected options, it adapted without breaking the world’s internal consistency.

There were no resets, contradictions, or narrative breaks.

2. Consistent Rules

Combat was handled correctly.

The AI tracked initiative, turns, modifiers, and hit points accurately without prompting.

Dice rolls were handled fairly and consistently.

Every time spells, abilities, or special conditions came up, the AI applied them properly according to the D&D 5e ruleset.

This was a major difference from a year ago.

Previously, the AI would narrate through combat too quickly or forget mechanical details.

Now, it ran combat as any competent human DM would.
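To make concrete what "tracking initiative, turns, modifiers, and hit points" involves, here is a minimal Python sketch of that bookkeeping. It is not taken from the chat: the `Combatant` class, the function names, and every stat value are hypothetical, and it only assumes the basic 5e conventions that initiative is d20 + Dexterity modifier and that an attack hits when d20 + attack bonus meets or beats the target's Armor Class. Critical hits, automatic misses on a natural 1, and damage dice are left out to keep it short.

```python
import random
from dataclasses import dataclass


@dataclass
class Combatant:
    # All fields and values here are illustrative, not from the session.
    name: str
    dex_mod: int        # Dexterity modifier, added to the initiative roll
    armor_class: int
    hit_points: int
    attack_bonus: int
    initiative: int = 0


def roll_initiative(combatants, rng):
    """Roll d20 + Dexterity modifier for each combatant; highest acts first."""
    for c in combatants:
        c.initiative = rng.randint(1, 20) + c.dex_mod
    # Breaking ties by Dexterity modifier is an assumed house convention.
    return sorted(combatants, key=lambda c: (c.initiative, c.dex_mod), reverse=True)


def attack(attacker, target, damage, rng):
    """Resolve one attack: d20 + attack bonus must meet or beat the target's AC."""
    roll = rng.randint(1, 20)
    hit = roll + attacker.attack_bonus >= target.armor_class
    if hit:
        # Hit points never drop below zero.
        target.hit_points = max(0, target.hit_points - damage)
    return hit


if __name__ == "__main__":
    rng = random.Random(2025)  # seeded so a run can be replayed
    tallon = Combatant("Tallon", dex_mod=2, armor_class=13, hit_points=20, attack_bonus=5)
    cultist = Combatant("Cultist", dex_mod=1, armor_class=12, hit_points=9, attack_bonus=3)
    order = roll_initiative([tallon, cultist], rng)
    print("Turn order:", [c.name for c in order])
    first, second = order
    if attack(first, second, damage=6, rng=rng):
        print(f"{first.name} hits; {second.name} has {second.hit_points} HP left")
    else:
        print(f"{first.name} misses")
```

Passing the random source in as `rng` is a design choice for this sketch: it lets the same turn be replayed with a fixed seed, which is one way to check that rolls are being handled consistently.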

3. Logical Continuity

Character sheets remained consistent.

Spells known, cantrips, skill proficiencies, and equipment all remained accurate across the entire session.

When Tallon used powers like Comprehend Languages or Eldritch Blast, the AI remembered ongoing effects and consequences correctly.

Memory was strong and consistent throughout the session.

While it was not supernatural, it was good enough to maintain continuity without player correction.

Given that this was not a full-length campaign but an extended session, the consistency achieved was fully sufficient to meet the benchmark.

Final Criteria: New Ruleset

As a final test, I had said the AI should be able to generalize to a new ruleset dictated by the player.

Instead, we collaboratively created one: the 2d6 Adventure System.

It is a lightweight, narrative-focused RPG system designed during the session.

We then immediately played a full mini-session using that new system, with no major issues.

The AI not only understood and helped refine the new rules, but then applied them consistently during play.

This demonstrates that it can generalize beyond D&D 5e and adapt to novel game systems.

Closing Reflection

By the criteria I laid out a year ago, the benchmark has been met.

The AI can now collaborate with a human to create and maintain a shared imaginary world, apply consistent rules, maintain logical continuity, and adapt to new frameworks when necessary.

Its performance is equal to a competent human Dungeon Master.

Where shortcomings remain (such as the occasional conventional storytelling choice), they are minor and comparable to human variance.

This achievement has broader implications for how we measure general intelligence.

The ability to create, maintain, and adapt complex fictional worlds, not just regurgitate stories, but build new ones in collaboration, was long considered uniquely human.

That is no longer true.

Reading Guide for the chat below:

At the same time that I made the original AGI = D&D post, I also started the conversation that's now linked at the bottom here. The two halves of the chat are separated right where I say "coming back to this chat for a moment"; that's when it shifts from a year ago to today.

If you read from the start, the contrast is pretty funny. In the first half, it's hilariously frustrating: I'm correcting ChatGPT practically every other prompt. It forgets my character's race, my stats, even my weapon. After character creation, it literally refuses to DM for me for two prompts in a row, until I have to directly demand that it become the dungeon master.

Also, the "story flow" is totally different. In the first session, almost every scene ends with what I call a "Soap ending": "Will Tallon and Grak survive the cultist assault? Tune in next time!", instead of offering real choices.

In the second half, the style shifts dramatically. The DMing becomes much smoother: clear decision points are offered, multiple options are laid out, and there's real freedom to vary or go off-course. It actually feels like playing D&D instead of watching a bad cliffhanger reel.

And it's not just the structure: the creativity leveled up too.

The DM awarded a magic item (a circlet) that was not only thematically appropriate for my character but also fit the situation, a subtle, well-integrated reward, not just "you loot a random sword off the boss."

By the end of the second session, it even pulled a "Matt Mercer" style skill challenge, a nice touch that showed real understanding of D&D adventure pacing.

I wanted to mention all this both as a reading guide and because it tells a little story of its own, one that mirrors the whole point of the AGI Update: sudden leaps forward aren't always visible until you directly experience the before and after.

Links:

Link to the full chat.

[TTRPG] 2d6 Adventure System: Lightweight, Flexible Cartoon/Pulp RPG Ruleset

2

The original was posted on /r/singularity by /u/cobalt1137 on 2025-04-26 00:36:31+00:00.


After listening to more and more researchers at both leading labs and universities, it seems like they unanimously believe that AGI is not a question of if, AND that it is actually very imminent. And if we actually assume that AGI is on the horizon, then this just feels completely necessary. If we have systems that are intellectually as capable as the top percentage of humans on earth, we would immediately want trillions upon trillions of these (both embodied and digital). We are well on track to get to this point of intelligence via research, but we are well off the mark from being able to fully support that feat from an infrastructure standpoint. The amount of demand for these systems would essentially be infinite.

And this is not even considering the types of systems that AGIs are going to start creating via their research efforts. I imagine that a force that is able to work at 50-100x the speed of current researchers would be able to achieve some insane outcomes.

What are your thoughts on all of this?

3

The original was posted on /r/singularity by /u/Outside-Iron-8242 on 2025-04-25 21:22:50+00:00.

4

The original was posted on /r/singularity by /u/MattO2000 on 2025-04-25 23:56:26+00:00.

5

The original was posted on /r/singularity by /u/QLaHPD on 2025-04-25 23:51:25+00:00.

6

The original was posted on /r/singularity by /u/gutierrezz36 on 2025-04-25 23:32:40+00:00.

7

The original was posted on /r/singularity by /u/onesole on 2025-04-25 23:17:58+00:00.

8

The original was posted on /r/singularity by /u/omunaman on 2025-04-25 22:46:46+00:00.

9

The original was posted on /r/singularity by /u/Federal_Initial4401 on 2025-04-25 19:47:09+00:00.

10

The original was posted on /r/singularity by /u/Akashictruth on 2025-04-25 22:11:13+00:00.

11

The original was posted on /r/singularity by /u/jonplackett on 2025-04-25 19:28:27+00:00.

12

The original was posted on /r/singularity by /u/s1n0d3utscht3k on 2025-04-25 13:07:45+00:00.

13

The original was posted on /r/singularity by /u/ArchManningGOAT on 2025-04-25 15:11:30+00:00.

14

The original was posted on /r/singularity by /u/MetaKnowing on 2025-04-25 18:18:56+00:00.

15

The original was posted on /r/singularity by /u/MetaKnowing on 2025-04-25 16:41:00+00:00.

16

The original was posted on /r/singularity by /u/RenoHadreas on 2025-04-25 15:37:29+00:00.

17

The original was posted on /r/singularity by /u/Tasty-Ad-3753 on 2025-04-25 15:22:55+00:00.


I'm so excited about the possibilities of AI for open source. Open source projects are mostly labours of love that take a huge amount of effort to produce and maintain, but as AI gets better and better agentic coding capabilities, it will be easier than ever to create your own libraries, software, and even whole online ecosystems.

It's very possible that there will still be successful private companies, but how much of what we use do you think will switch to free open source alternatives?

Do you think trust and brand recognition will be enough of a moat to retain users? Will companies have to reduce ads and monetisation to stay competitive?

18

The original was posted on /r/singularity by /u/joe4942 on 2025-04-25 15:06:41+00:00.

19

The original was posted on /r/singularity by /u/fireandbass on 2025-04-25 14:56:41+00:00.

20

The original was posted on /r/singularity by /u/AWEnthusiast5 on 2025-04-25 14:21:32+00:00.


We keep pointing large language models at static benchmarks—arcade-style image sets, math word-problems, trivia dumps—and then celebrate every incremental gain. But none of those tests really probe an AI’s ability to think on its feet the way we do.

Drop a non-pretrained model into a live, open-world multiplayer game and you instantly expose everything that matters for AGI:

  1. Dynamic visual reasoning, not rote recall: Each millisecond the environment morphs: lighting shifts, avatars swap gear, projectiles arc unpredictably. Pattern-matching a fixed data set won’t cut it.
  2. Full-stack perception: A fair bot must parse raw pixels, directional audio cues, on-screen text, and minimap signals exactly as a human does, with no peeking at the game engine.
  3. Emergent strategy & meta-learning: Metas evolve weekly as patches drop and players innovate. Mastery demands on-the-fly hypothesis testing, not a baked-in walkthrough.
  4. Adversarial pressure: Human opponents are ruthless exploit-hunters. Surviving their creativity is a real-time stress test for robust reasoning.
  5. Zero-shot, zero-cheat parity: Starting from scratch (no pre-training on replays or wikis) mirrors the human learning curve. If the agent can climb a ranked ladder and interact with teammates under those constraints, we’ve witnessed genuine general intelligence, not just colossal pre-digested priors.

Imagine a model that spawns in Day 1 of a fresh season, learns to farm resources, negotiates alliances in voice chat, counter-drafts enemy comps, and shot-calls a comeback in overtime—all before the sun rises on its first login. That performance would trump any leaderboard on MMLU or ImageNet, because it proves the AI can perceive, reason, adapt, and compete in a chaotic, high-stakes world we didn’t curate for it.

Until an agent can navigate and compete effectively in an unfamiliar open-world MMO the way a human would, our benchmarks are sandbox toys. This benchmark is far superior.

edit: post is AI formatted, not generated. Ideas are all mine; I just had GPT run a cleanup because I'm lazy.

21

The original was posted on /r/singularity by /u/KlutzyAnnual8594 on 2025-04-25 13:24:42+00:00.

22

The original was posted on /r/singularity by /u/Competitive_Travel16 on 2025-04-25 05:45:58+00:00.

23

The original was posted on /r/singularity by /u/_Nils- on 2025-04-25 07:41:26+00:00.

24

The original was posted on /r/singularity by /u/Jamjam4826 on 2025-04-25 06:58:23+00:00.

25

The original was posted on /r/singularity by /u/ilkamoi on 2025-04-25 06:29:44+00:00.
