DavidGarcia

joined 2 years ago
[–] [email protected] 5 points 4 weeks ago (1 children)
[–] [email protected] 25 points 4 weeks ago (3 children)

Investors poured completely insane amounts of money into the endless money pit that is ClosedAI. Then they realized betting everything on one horse was really stupid, since it has zero competitive advantage.

now they try to get as much loot off the sinking ship as possible lmao

they'll probably exit scam soon

[–] [email protected] 2 points 1 month ago

Q4 will give you like 98% of the quality of Q8 at roughly twice the speed, plus much longer context lengths.

If you don't need the full context length, you can try loading the model at a shorter context length. That frees up VRAM, meaning you can load more layers on the GPU, meaning it will be faster.

And you can usually configure your inference engine to keep the model loaded at all times, so you're not losing so much time when you first start the model up.

Ollama attempts to dynamically load the right context length for your request, but in my experience that just results in really inconsistent and long times to first token.

The nice thing about vLLM is that your model is always loaded, so you don't have to worry about that. But then again, it needs much more VRAM.
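For reference, both of those knobs can be set per request through Ollama's REST API: `num_ctx` caps the context window and `keep_alive: -1` keeps the model resident. A minimal sketch of the request payload (the model tag is just an example, swap in whatever you actually run):

```python
import json

# Payload for Ollama's /api/generate endpoint.
# num_ctx caps the context window (smaller -> less VRAM -> more layers on GPU),
# keep_alive=-1 keeps the model loaded so the next request skips the reload.
payload = {
    "model": "qwen2.5:32b-instruct-q4_K_M",  # example tag, use your own
    "prompt": "Hello",
    "options": {"num_ctx": 8192},  # shorter context to save VRAM
    "keep_alive": -1,              # never unload the model
}

body = json.dumps(payload)
# then e.g. POST body to http://localhost:11434/api/generate
print(body)
```

Setting `keep_alive` per request (or the `OLLAMA_KEEP_ALIVE` environment variable server-side) is what gets you vLLM-like "always loaded" behavior without vLLM's VRAM appetite.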

[–] [email protected] 5 points 1 month ago (2 children)

In my experience, anything similar to qwen-2.5:32B comes closest to gpt-4o. I think it should run on your setup. The 14b model is alright too, but definitely inferior. Mistral Small 3 also seems really good. Anything smaller is usually really dumb and I doubt it would work for you.

You could probably run some larger 70b models at a snail's pace too.

Try the Deepseek R1 - qwen 32b distill, something like deepseek-r1:32b-qwen-distill-q4_K_M (name on ollama) or some fine-tune of it. It'll be by far the smartest model you can run.

There are various fine-tunes that remove some of the censorship (ablated/abliterated) or are optimized for RP, which might do better for your use case. But I personally haven't used them, so I can't promise anything.

[–] [email protected] 11 points 1 month ago (4 children)

who would he even genocide? could be anyone

[–] [email protected] 12 points 1 month ago

not to be confused with the fertility rat, which has been very unstable over the last decade

[–] [email protected] 23 points 1 month ago (3 children)

when are people going to learn that centralized social media will always be garbage...

[–] [email protected] 4 points 1 month ago

yeah, might as well dress him up

[–] [email protected] 15 points 1 month ago

what a beautiful green car

[–] [email protected] 13 points 1 month ago

what do you mean "also"?

[–] [email protected] 3 points 1 month ago* (last edited 1 month ago)

when you're in a glowing cum dripping from the ceiling competition and yours is fastest so you win

[–] [email protected] 2 points 1 month ago (1 children)

nothing gives me more joy than intentionally writing "should of" to annoy pedants like you
