this post was submitted on 11 Jun 2023
World News
That’s actually where Reddit is useful as a training corpus, because different subreddits sit at different levels of quality. It’s pretty easy to identify the high-quality ones for training answers, and the low-quality ones are excellent for training basic transforms (making sense of an input that is niche and flawed in some way).
There are very few other sources of lightly structured training data that span all of humanity, broken down into topics and graded to different levels of quality. Over time, the data will become less relevant as society moves on, so a living training set is important.
Having said that, Lemmy could prove to be an even better training source for expert-system LLMs, as there could be curated, high-quality instances with the ability to pull in more federated data as needed.