Looks promising, hope this ends up in an open source process that improves RAG-type tasks.
FromSoft fans who have been around since King's Field in the late 90s: "always has been" (because From hit gold with the old, decaying fantasy world whose inhabitants have dwindling hope as an atmospheric vibe, and keeps reusing it)
This is so exciting! Glad to see Mistral at it with more bangers.
Hi Hawke, I understand your frustration with needing to troubleshoot things. Steam lets you import any exe as a 'non-Steam game' in your library and run it with the Proton compatibility layer. I sometimes have success getting a GOG game installed by running the install exe through Proton or Wine; there's a rough sketch of the Wine route below. Make sure you are using the most up-to-date version of Lutris: many package managers ship an outdated version, while the Flatpak will guarantee it's current. Hope it all works out for you.
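For the "run the installer through Wine" route, here's a minimal sketch. The installer filename and prefix directory are placeholders for whatever you actually have, and it assumes `wine` is already installed:

```python
import os
import subprocess

# Placeholder paths: point these at your own GOG installer and at a
# fresh directory to use as a dedicated Wine prefix.
installer = os.path.expanduser("~/Downloads/setup_some_gog_game.exe")
prefix = os.path.expanduser("~/Games/gog-prefix")
os.makedirs(prefix, exist_ok=True)

# Run the installer inside its own prefix so it can't clutter ~/.wine.
# Once installed, add the resulting game exe to Steam as a non-Steam
# game and force Proton from its compatibility settings if Wine alone
# doesn't run it well.
subprocess.run(
    ["wine", installer],
    env={**os.environ, "WINEPREFIX": prefix},
    check=True,
)
```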
Assuming you use Steam, see which of your favorite games run with the Proton compatibility layer and which absolutely require Windows. You may be surprised.
Awesome, thank you!
Thank you for the support Sergio! I hope it works out too.
I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistral fine-tunes trained on R1 CoT. They are indeed pretty amazing, and I found myself switching between a general-purpose model and a thinking model regularly before this released.
DeepHermes is a thinking-model family with R1-distilled CoT that lets you toggle between standard short outputs and spending a few thousand tokens thinking about a solution.
I found that pure thinking models are fantastic for asking certain kinds of problem solving questions, but awful at following system prompt changes for roleplay scenarios or adopting complex personality archetypes.
This lets you have your cake and eat it too by making CoT optional while keeping regular system prompt capabilities; see the sketch below for what the toggle looks like in practice.
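A rough sketch of the toggle, assuming a local OpenAI-compatible server (llama.cpp's `llama-server`, Ollama, etc.) hosting a DeepHermes GGUF. The URL and the exact wording of the reasoning system prompt are placeholders here, so copy the real toggle prompt from the model card:

```python
import requests

# Assumed local OpenAI-compatible endpoint; adjust host/port to your setup.
URL = "http://localhost:8080/v1/chat/completions"

# Placeholder for the reasoning-toggle system prompt. The model card's version
# instructs the model to deliberate inside <think>...</think> tags before answering.
THINKING_SYSTEM_PROMPT = (
    "You are a deep thinking AI. Deliberate inside <think></think> tags "
    "before giving your final answer."
)

def ask(question: str, think: bool = False) -> str:
    """Send one chat turn, optionally enabling the long-CoT mode."""
    if think:
        system = THINKING_SYSTEM_PROMPT
    else:
        # Regular persona / roleplay system prompts still work in this mode.
        system = "You are a concise assistant."
    resp = requests.post(URL, json={
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        "max_tokens": 4096,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Fast, direct answer:
print(ask("What's the capital of Austria?"))
# Slow, thorough answer with a visible chain of thought:
print(ask("Prove that the square root of 2 is irrational.", think=True))
```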
The thousands of tokens spent thinking can get time consuming when you're only getting 3 t/s on the larger 24B models, so it's important to choose between a direct answer or spending five minutes letting it really think. Its abilities are impressive, even if it takes 300 seconds to fully think out a problem at 2.5 t/s.
That's why I am so happy the 8B model is pretty intelligent with CoT enabled: I can fit a thinking model entirely in VRAM, and it's not dumb as rocks knowledge-wise either. I'm getting 15-20 t/s with the 8B instead of 2.5-3 t/s partially offloading a larger model. A 6.4x speed increase on the CoT is a huge W for the real human time I spend waiting for a complete output.
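To put rough numbers on it (back-solving the chain-of-thought length from the 300 s at 2.5 t/s figure above, so the token count is an assumption):

```python
# Rough wait-time comparison for the same chain of thought.
# 300 s at 2.5 t/s implies roughly 750 thinking tokens (an assumption).
thinking_tokens = 300 * 2.5  # = 750

for label, tps in [("24B, partially offloaded", 2.5), ("8B, fully in VRAM", 16.0)]:
    seconds = thinking_tokens / tps
    print(f"{label}: {tps:>4} t/s -> {seconds:6.1f} s of thinking")

# 16 / 2.5 = 6.4, i.e. the ~5-minute think shrinks to under a minute.
print(f"speedup: {16.0 / 2.5:.1f}x")
```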
Great question. Sometimes I feel like I've lost the plot of enjoying the human experience. When the majority of the members of my species exhibit a behavior or impulse that I can't relate to or don't understand, I second-guess whether it's me missing something fundamental to enjoying my existence, or whether I've overcome a certain flaw most others haven't, or something else.
You are correct though; I should be a little more confident in the choices I've made and the rationale behind why I made them.
I think the idea of a standard way to invoke the multiple different kinds of 'processing' an LLM can apply to a given input is promising.
I feel that after reasoning, we will train models how to think emotionally in a more intricate way. By combining reasoning with a more advanced sense of individuality and greater emotional simulation, we may get a little closer to finding a breakthrough.
I just spent a good few hours optimizing my LLM rig: disabling the graphical interface to squeeze 150 MB of VRAM back from Xorg, setting the program's CPU niceness to the highest priority, and tweaking settings to find memory limits.
I was able to increase the token speed by half a token per second while doubling context size. I don't have the budget for any big VRAM upgrade, so I'm trying to make the most of what I've got.
I have two desktop computers. One has better RAM, CPU, and overclocking but a worse GPU; the other has a better GPU but worse RAM and CPU and no overclocking. I'm contemplating whether it's worth swapping GPUs to really make the most of the available hardware. It's been years since I took apart a PC, and I'm scared of doing something wrong and damaging everything. I dunno if it's worth the time, effort, and risk for the squeeze.
Otherwise I'm loving my self-hosted LLM hobby. I've been very into learning computers and ML for the past year. Crazy advancements, exciting stuff.
Thanks for sharing!