On macOS I’ve been using Ollama. It’s very easy to set up, runs as a background service, and exposes an HTTP API.
You can talk to it directly from the CLI (`ollama run`) or via applications and plugins (like https://continue.dev) that consume the API.
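As a quick sketch of both paths (the model name `llama3` is just an example — substitute whatever model you've pulled; Ollama's API listens on port 11434 by default):

```shell
# Pull a model and chat with it interactively from the terminal
ollama pull llama3
ollama run llama3

# Or hit the HTTP API directly (e.g. what plugins like Continue do under the hood)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
```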
It also runs on Linux, though I haven’t personally tried that.