this post was submitted on 05 Mar 2025

LocalLLaMA


Welcome to LocalLLaMA! This is a community to discuss local large language models such as Llama, DeepSeek, Mistral, and Qwen.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.


Maybe AMD's loss is Nvidia's gain?

top 4 comments
[–] [email protected] 1 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

Hmmh, the 4090 is kind of the wrong choice for this, due to its memory bus width... For AI workloads, and especially if you want to connect lots of memory, you kind of want the widest bus possible.
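
To make the bus-width point concrete: peak GDDR bandwidth is roughly bus width times per-pin data rate. A minimal sketch, using the commonly quoted 4090 and 5090 specs (384-bit at 21 Gbps, 512-bit at 28 Gbps) as assumptions rather than anything from the linked post:

```python
# Peak memory bandwidth (GB/s) ~= bus width in bits * per-pin rate in Gbps / 8.
# The card specs below are commonly quoted figures, used purely as examples.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print(peak_bandwidth_gbs(384, 21.0))  # RTX 4090-class: ~1008 GB/s
print(peak_bandwidth_gbs(512, 28.0))  # RTX 5090-class: ~1792 GB/s
```

With the same memory chips, a wider bus scales bandwidth linearly, which is why it matters so much for large-model inference.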

[–] [email protected] 1 points 1 week ago

but it has Micron's RAM

[–] [email protected] 1 points 1 week ago* (last edited 1 week ago) (1 children)

It's 384-bit? That's not bad; 512-bit is super expensive and basically only exists on the 5090 die.

Also, it seems LLMs are drifting towards being less memory-speed bound with the diffusion model experiments.
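
For context on why the thread keeps coming back to bandwidth: ordinary autoregressive decoding has to stream roughly the whole set of weights from VRAM for every generated token, so decode speed is bounded by bandwidth divided by model size. A rough sketch with illustrative numbers (the model size and bandwidth figures are assumptions):

```python
# Very rough upper bound for autoregressive decode speed: every token requires
# reading ~all weights once, so tokens/s <= bandwidth / model size in memory.
def approx_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# e.g. a ~40 GB quantized 70B-class model:
print(approx_tokens_per_second(1000, 40))  # ~25 tok/s on a ~1 TB/s GPU
print(approx_tokens_per_second(450, 40))   # ~11 tok/s on a many-channel CPU box
```

Generation schemes that produce many tokens per pass over the weights, like the diffusion experiments mentioned above, would relax exactly this bound.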

[–] [email protected] 1 points 1 week ago* (last edited 1 week ago)

Hmmh, I had another look at the numbers, and the 16 / 21 / 28 Gbps "effective" memory speeds of the 3090 / 4090 / 5090 all seem to be in roughly the same ballpark. So are AMD desktop graphics cards. I thought a 5090 would do more. But don't the datacenter cards designed for AI workloads, with 80 GB of VRAM, have something like 2 or 3 TB/s? I mean, running large LLMs at home is kind of a niche; I'm not sure what requirements people have for this. But at this price point, an AMD Epyc processor with its many memory channels could maybe(?) do a similar job. I'm really not sure what the target audience is.
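
A quick back-of-the-envelope comparison helps here. A sketch using commonly quoted specs (treat every number as an assumption; the Epyc figure assumes a 12-channel DDR5-4800 configuration):

```python
# Rough peak memory bandwidth comparison in GB/s. All specs are commonly quoted
# figures used as assumptions, not measurements.
def gddr_bandwidth(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits * gbps_per_pin / 8

def ddr5_bandwidth(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1000  # 8 bytes per transfer per channel

print("RTX 3090:", gddr_bandwidth(384, 19.5))             # ~936
print("RTX 4090:", gddr_bandwidth(384, 21))               # ~1008
print("RTX 5090:", gddr_bandwidth(512, 28))               # ~1792
print("Epyc, 12ch DDR5-4800:", ddr5_bandwidth(12, 4800))  # ~461
# HBM datacenter cards (A100/H100 class) are typically quoted around 2000-3350 GB/s.
```

So the consumer GPUs really are within about a factor of two of each other, the HBM cards sit roughly 2-3x above a 4090, and a many-channel Epyc lands below the consumer GPUs on bandwidth, which is roughly the tradeoff being weighed above.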

And I'm also curious about the alternative approaches to language models. Afaik we're not there yet with diffusion models, and it might take some time until we get a freely available state-of-the-art model at that size. I guess cutting down on the memory-speed requirements would make things easier for a lot of use cases.