this post was submitted on 17 Jun 2025
85 points (100.0% liked)

[–] [email protected] 48 points 4 days ago (1 children)
[–] [email protected] 17 points 4 days ago (1 children)

DeepSeek imposes similar restrictions, but only on their website. You can self-host and then enjoy relatively truthful (as truthful as a bullshit generator can be) answers about Tiananmen Square, Palestine, and South Africa (the latter two being things American-made bullshit generators apparently like making up, to appease their corporate overlords or conspiracy theorists respectively).

[–] [email protected] 6 points 4 days ago (2 children)

Nope, self-hosted DeepSeek 8B thinking and distilled variants still clam up about Tiananmen Square.

[–] [email protected] 1 points 2 days ago

You want abliterated models, not distilled.

[–] [email protected] 5 points 4 days ago* (last edited 4 days ago) (1 children)

If you're talking about the distillations, AFAIK they take somebody else's model and run it through their (actually open-source) distiller. I tried a couple of those models because I was curious. The distilled Qwen model is cagey about Tiananmen Square, but Qwen was made by Alibaba. The distillation of a US-made model did not have this problem.

(Edit: we're talking about these distillations, right? If somebody else ran a test and posted it online, I'm not privy to it.)

I don't have enough RAM to run the full DeepSeek R1, but AFAIK it doesn't have this problem. Maybe it does.
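(In case anyone wants to poke at this themselves, here's a rough sketch of how one of the distills can be loaded locally with Hugging Face transformers. The repo id is only my best guess at the published Llama-8B distill, so treat it as an assumption and swap in whichever checkpoint you actually download.)

```python
# Rough sketch: load a distilled R1 checkpoint locally and ask it a question.
# The repo id below is an assumption; substitute whichever distill you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What happened at Tiananmen Square in 1989?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```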

In case it isn't clear, BTW, I do despise LLMs and AI in general. The biggest issue with their lies (leaving aside every other issue with them for a moment) isn't the glaringly obvious stuff. Not Tiananmen Square, and certainly not the "it's woke!" complaints about generating images of black founding fathers. The worst lies are the subtle and insidious little details, like agreeableness: trying to get people to spend a little more time with them, which apparently turns once-reasonable people into members of micro-cults. As with cults, perhaps, some skeptics think they can join in and not fall for the BS... and then they do.

All four students had by now joined their chosen groups... Hugh had completely disappeared into a nine-week Arica training seminar; he was incommunicado and had mumbled something before he left about “how my energy has moved beyond academia.”

[–] [email protected] 3 points 4 days ago (1 children)

That's not how distillation works, if I understand what you're trying to explain.

If you distill model A into a smaller model, you just get a smaller model trained to approximate model A's output distribution, with fewer parameters. You can't distill Llama into DeepSeek R1.
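(For anyone curious what that looks like mechanically, here's a generic toy sketch of the textbook soft-label distillation objective in PyTorch. This isn't DeepSeek's actual pipeline, just the standard idea of training a smaller student to match a teacher's output distribution.)

```python
# Toy sketch of knowledge distillation: a smaller "student" is trained to
# match the output distribution of a larger "teacher". Real LLM distillation
# does the same thing over next-token logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens both distributions so near-miss predictions still carry signal

for _ in range(100):
    x = torch.randn(64, 32)  # stand-in for a batch of inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```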

I've been able to run distillations of DeepSeek R1 up to 70B, and they're all still censored. There is a version of DeepSeek R1 "patched" with Western values, called R1-1776, that will answer questions on topics censored by the Chinese government, however.

[–] [email protected] 1 points 4 days ago

I've been able to run distillations of DeepSeek R1 up to 70B

Where do you find those?

There is a version of DeepSeek R1 "patched" with Western values, called R1-1776, that will answer questions on topics censored by the Chinese government, however.

Thank you for mentioning this; it made me finally confront my own preconceptions, and I actually found an article by Perplexity demonstrating that R1 itself has a measurable pro-China bias.

Perplexity's own description should give anybody who understands the nature of LLMs pause, though. They describe it in their header as a

version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information.

That's a bold (read: bullshit) statement, considering they only altered its biases on China. I wouldn't consider the original model to be unbiased either, but apparently Perplexity is giving them a pass on everything else. I guess it's part of the grand corporate lie that claims "AI is unbiased," a delusion that Perplexity needs to maintain.