That's pretty cool. I've tried a few of the distills, but I've mostly gone back to regular models.
Agree. I also shift between them. At a minimum, I use a thinking model to 'open up' the conversation and then often continue with a normal model, but it certainly depends on the topic.
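For anyone curious, here's a rough sketch of that workflow against a local OpenAI-compatible server (llama.cpp server, Ollama, etc.). The base URL and model names are just placeholders, not recommendations:

```python
# Sketch: open the conversation with a thinking model, then keep the same
# history and continue with a normal model. Assumes a local OpenAI-compatible
# server; base_url and model names below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
messages = [{"role": "user", "content": "Help me plan a small homelab NAS build."}]

# First turn: reasoning model to 'open up' the topic.
first = client.chat.completions.create(model="deephermes-3-llama-3-8b", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Later turns: same history, regular instruct model.
messages.append({"role": "user", "content": "Now just give me a short parts list."})
rest = client.chat.completions.create(model="mistral-small-24b-instruct", messages=messages)
print(rest.choices[0].message.content)
```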
A while back we got RouteLLM, I think, which routed a request depending on its content, but the concept never gained traction for some reason. Now it seems that closedai and other big names are paying attention to it. Great to see DeepHermes and other open players out in front of the pack.
I don't think it will take long before agentic frameworks activate different 'modes' of thinking depending on content/context, goals, etc. It would be great if a model could be triggered into several modes in a standard way.
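A rough sketch of what that routing could look like; this isn't how RouteLLM actually works, just a toy illustration where a cheap classifier call decides whether a request goes to the thinking model or the normal one (endpoint, model names, and the prompt are made up):

```python
# Toy content-based router: a cheap classifier call decides whether a request
# goes to a reasoning model or a plain instruct model. Not RouteLLM itself,
# just an illustration; base_url, model names, and the prompt are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

ROUTER_PROMPT = (
    "Reply with exactly one word. THINK if the request needs multi-step "
    "reasoning (math, code, planning), FAST if a direct answer is enough."
)

def route(user_msg: str) -> str:
    verdict = client.chat.completions.create(
        model="qwen2.5-3b-instruct",  # placeholder: any small, fast local model
        messages=[{"role": "system", "content": ROUTER_PROMPT},
                  {"role": "user", "content": user_msg}],
        max_tokens=3,
    ).choices[0].message.content
    return "deephermes-3-preview-24b" if "THINK" in verdict.upper() else "mistral-small-24b-instruct"

def ask(user_msg: str) -> str:
    reply = client.chat.completions.create(
        model=route(user_msg),
        messages=[{"role": "user", "content": user_msg}],
    )
    return reply.choices[0].message.content

print(ask("Prove that the sum of two odd integers is even."))
```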
I think the idea of calling multiple different ways for LLMs to 'process' a given input through a standard interface is promising.
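One way to picture that 'standard way' is a simple dispatch table from mode names to the system prompts that activate them. The reasoning entry below is loosely modeled on how DeepHermes toggles long CoT with a system prompt, but the wording is illustrative, not the official prompt:

```python
# Illustrative 'mode' dispatch: map a mode name to the system prompt that
# activates it. The reasoning entry mimics the DeepHermes-style CoT toggle,
# but the wording here is made up, not the official prompt.
MODES = {
    "reasoning": ("You are a deep thinking assistant. Reason step by step "
                  "inside <think></think> tags before your final answer."),
    "concise": "Answer directly and briefly, without showing your reasoning.",
    "brainstorm": "List several distinct options before recommending one.",
}

def build_messages(mode: str, user_msg: str) -> list[dict]:
    # An agent framework could pick `mode` from context/goals, then send this
    # message list to any OpenAI-compatible backend.
    return [
        {"role": "system", "content": MODES[mode]},
        {"role": "user", "content": user_msg},
    ]

print(build_messages("reasoning", "Why does my RAID5 rebuild keep failing?"))
```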
I feel that after reasoning, we'll train models to think emotionally in a more intricate way. By combining reasoning with a more advanced sense of individuality and better emotion simulation, we may get a little closer to a breakthrough.
How does it compare to regular deepseek distills though?
DeepHermes 24B's CoT thought patterns feel about on par with the official R1 distills I've tried. It's important to note, though, that my experience is limited to the DeepSeek R1 NeMo 12B distill, as that's what fits nicely and runs fast on my card.
All the R1 distill internal-monologue humanisms are there: "let me write that down", "if I remember correctly", "oh, but wait, that doesn't sound right, let's try again". The multiple "but wait, what if"s before ending the thought to examine multiple sides are there too. It spends about 2-5k tokens thinking. It tends to stay on track and catch minor mistakes or hallucinations.
Compared to the unofficial Mistral 24B distills, this is top tier for sure. I think it's toe to toe with ComputationDolphins 24B R1 distill, and it's just a preview.