I would run it in a Podman container with the GPU passed through.
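For anyone curious, a minimal sketch of what that could look like using the NVIDIA Container Toolkit's CDI support (this assumes an NVIDIA GPU with the toolkit already installed; the container name, volume name, and port are just the usual defaults):

```sh
# One-time setup: generate a CDI spec so Podman can inject the NVIDIA GPU
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Run Ollama with the GPU passed through via CDI;
# model data persists in a named volume, API exposed on the default port
podman run -d --name ollama \
  --device nvidia.com/gpu=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  docker.io/ollama/ollama
```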
Why not throw that into a VM with VFIO passthrough, plug the GPU in via an external dock, and, since we're already abstracting shit away for unnecessary complexity and incompatibility, do all of that on Windows?
Because that is way more complicated?
It is really easy to run Ollama in a container.
Really easy to *start* running it.
Then everything goes wrong, from configuration to logging to CUDA. And the worst fucking debugging ever.
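When it does go wrong, one reasonable first step (assuming the CDI setup sketched above) is to check whether the container can see the GPU at all before blaming Ollama itself:

```sh
# Is the GPU visible inside the container? If this fails,
# the problem is the container/CDI setup, not Ollama
podman exec ollama nvidia-smi

# If the GPU is visible, look for CUDA initialization errors here instead
podman logs ollama
```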
On Linux you can download Alpaca. I think it's CPU-only, but it's simpler.
Ollama is simple too; I meant that containers make everything a nightmare to maintain.
Nested VMs stay performant about three levels deep, so do that as well.