this post was submitted on 14 Jul 2025
45 points (100.0% liked)

Selfhosted

49461 readers
1045 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 
GPU VRAM Price (€) Bandwidth (TB/s) TFLOP16 €/GB €/TB/s €/TFLOP16
NVIDIA H200 NVL 141GB 36284 4.89 1671 257 7423 21
NVIDIA RTX PRO 6000 Blackwell 96GB 8450 1.79 126.0 88 4720 67
NVIDIA RTX 5090 32GB 2299 1.79 104.8 71 1284 22
AMD RADEON 9070XT 16GB 665 0.6446 97.32 41 1031 7
AMD RADEON 9070 16GB 619 0.6446 72.25 38 960 8.5
AMD RADEON 9060XT 16GB 382 0.3223 51.28 23 1186 7.45

This post is part "hear me out" and part asking for advice.

Looking at the table above AI gpus are a pure scam, and it would make much more sense to (atleast looking at this) to use gaming gpus instead, either trough a frankenstein of pcie switches or high bandwith network.

so my question is if somebody has build a similar setup and what their experience has been. And what the expected overhead performance hit is and if it can be made up for by having just way more raw peformance for the same price.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 1 points 20 hours ago* (last edited 20 hours ago) (1 children)

One last thing: I've heard mixed things about 235B, hence there might be a smaller, more optimal LLM for whatever you do.

For instance, Kimi 72B is quite a good coding model: https://huggingface.co/moonshotai/Kimi-Dev-72B

It might fit in vllm (as an AWQ) with 2x 4090s. It and would easily fit in TabbyAPI as an exl3: https://huggingface.co/ArtusDev/moonshotai_Kimi-Dev-72B-EXL3/tree/4.25bpw_H6

As another example, I personally use Nvidia Nemotron models for STEM stuff (other than coding). They rock at that, specifically, and are weaker elsewhere.

[–] [email protected] 1 points 19 hours ago (1 children)

What do I need to run Kimi? Does it have apple silicon compatible releases? It seems promising.

[–] [email protected] 1 points 18 hours ago* (last edited 18 hours ago)

Depends. You're in luck, as someone made a DWQ (which is the most optimal way to run it on Macs, and should work in LM Studio): https://huggingface.co/mlx-community/Kimi-Dev-72B-4bit-DWQ/tree/main

It's chonky though. The weights alone are like 40GB, so assume 50GB of VRAM allocation for some context. I'm not sure what Macs that equates to... 96GB? Can the 64GB can allocate enough?

Otherwise, the requirement is basically a 5090. You can stuff it into 32GB as an exl3.

Note that it is going to be slow on Macs, being a dense 72B model.