13.787 ± 0.020 billion years
I'm also on P2P 2x3090 with 48GB of VRAM. Honestly it's a nice experience, but still somewhat limiting...
I'm currently running deepseek-r1-distill-llama-70b-awq with the aphrodite engine, though the same applies to llama-3.3-70b. It works great and is way faster than ollama, for example. But my max context is around 22k tokens. More VRAM would allow me more context; even more VRAM would allow for speculative decoding, CUDA graphs, ...
Maybe I'll drop down to a 35b model to get more context and a bit of speed, but I'm not sure I can justify the possible decrease in answer quality.
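In case it helps, here's a minimal sketch of how such a server could be pinned down declaratively on NixOS. The package attribute `aphrodite-engine` is an assumption (it isn't in nixpkgs), and the flags are the vLLM-style ones Aphrodite inherits as a vLLM fork, so double-check against `aphrodite run --help`:

```nix
# Hypothetical NixOS service for the Aphrodite OpenAI-compatible server.
# `pkgs.aphrodite-engine` is an assumed package attribute; the flags follow
# vLLM conventions, which Aphrodite inherits.
{ pkgs, lib, ... }:
{
  systemd.services.aphrodite = {
    description = "Aphrodite engine, 70B AWQ sharded across both 3090s";
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = lib.concatStringsSep " " [
        "${pkgs.aphrodite-engine}/bin/aphrodite run"
        "deepseek-r1-distill-llama-70b-awq" # the model named above
        "--tensor-parallel-size 2"          # split weights across both GPUs
        "--max-model-len 22000"             # ~22k-token context cap
      ];
      Restart = "on-failure";
    };
  };
}
```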
I'm running such a setup!
This is my NixOS config, though feel free to ignore it, since it's optimized for me rather than for others.
How did I achieve the setup you described?
- nixos + flakes & colmena: sync system config & updates across machines (see the flake sketch after this list)
- impermanence through btrfs snapshots: destroy all non-declarative state between reboots to avoid drift between systems
- syncthing: synchronise ALL user files between systems (my server is always online, which reduces sync inconsistencies from only having a single device active at a time)
- rustic: hourly backups from all devices to the same repos; since these are deduplicated and my systems are mostly synchronised, I have a very clear record of my file histories
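To make the first point concrete, here's a stripped-down colmena hive as a flake; the host names, target addresses, and options are placeholders, not my actual config:

```nix
# flake.nix -- minimal colmena hive. Hosts and options are illustrative only.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { nixpkgs, ... }: {
    colmena = {
      meta.nixpkgs = import nixpkgs { system = "x86_64-linux"; };

      # Shared by every host: e.g. syncthing for the user-file sync below.
      defaults = { ... }: {
        services.syncthing.enable = true;
      };

      server = { ... }: {
        deployment.targetHost = "server.example.org";
      };

      laptop = { ... }: {
        deployment.targetHost = "laptop.example.org";
      };
    };
  };
}
```

A single `colmena apply` then pushes the same evaluated config to every host, which is what keeps the machines from drifting apart.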
I like the reading tip! I think not many people are aware of it.
My thoughts use meaning instead of specific words.
Interpreting my thoughts as natural language often takes longer than coming up with the thought itself.
This results in:
- I often stick to the intersection of the languages that I and the people I'm speaking with know.
- When it's a topic I'm knowledgeable about, I often talk too fast and people find it hard to follow me.
- I draw many associations, so comprehending the larger picture is easy, yet I often miss the point in the smaller picture.
Or about half a year if we're only counting the time during which I've been alive.