this post was submitted on 31 Jan 2025
LocalLLaMA
At some point you'll run out of VRAM on the GPU. You can offload some of the model's layers to system RAM to make room for more context, but that makes generation slower.
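For a rough feel of that tradeoff, here is a back-of-the-envelope sketch in Python. It assumes a standard transformer where each layer stores keys and values for every token in context, and that a layer's KV cache lives wherever the layer does. The function names and the example model (32 layers, 8 KV heads, head size 128, about 400 MiB of weights per layer, 1 GiB of fixed overhead) are made up for illustration and not taken from any particular model or runtime.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def kv_cache_bytes_per_layer(n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache for one layer: keys + values, one vector per token per KV head."""
    return 2 * n_kv_heads * head_dim * context_len * bytes_per_elem


def layers_that_fit(vram_bytes, n_layers, weight_bytes_per_layer,
                    n_kv_heads, head_dim, context_len,
                    bytes_per_elem=2, overhead_bytes=0):
    """Estimate how many layers (weights + their KV cache) fit in VRAM.

    overhead_bytes is a guess for everything else on the GPU
    (compute buffers, desktop, and so on).
    """
    per_layer = weight_bytes_per_layer + kv_cache_bytes_per_layer(
        n_kv_heads, head_dim, context_len, bytes_per_elem)
    usable = max(vram_bytes - overhead_bytes, 0)
    return min(n_layers, usable // per_layer)


# Hypothetical model on a 12 GiB card, fp16 KV cache.
common = dict(vram_bytes=12 * GIB, n_layers=32,
              weight_bytes_per_layer=400 * MIB,
              n_kv_heads=8, head_dim=128, overhead_bytes=1 * GIB)

print(layers_that_fit(context_len=8192, **common))   # 26
print(layers_that_fit(context_len=32768, **common))  # 21
```

With these made-up numbers, going from 8k to 32k context pushes five more layers off the GPU, which is where the slowdown comes from. Real runtimes have extra buffers and quantized caches, so treat the result as an estimate only.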
Yes, but if he's world building, a larger, slower model might just be an acceptable compromise.
I was getting OOM errors doing speech-to-text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.