this post was submitted on 07 Jun 2024
These companies absolutely collect the prompt data and user session behavior. Who knows what kind of analytics they can run on it at any point in the future, even if it's just assessing how happy the user was with the answers based on how they responded. But having it detached from your identity is good, unless they can identify you from signals like time of day, speech patterns, etc.
You can collect the data now and figure out how to use it later. Just look at the recent Google leaks and what they collect: it's literally everything, down to the length of clicks and full walks through the site.
Collecting data about user interests is valuable in itself, and it's plausible to analyze it with various metrics, even something as simple as sentiment analysis, which has been done widely. Sentiment analysis predates modern ML by a long margin; the wiki page on it has the details.
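To make the sentiment analysis point concrete, here's a minimal sketch of the old-school, pre-ML approach: count words from a positive list and a negative list. The word lists and function names are made up for illustration, and nothing here is claimed to be what any particular company actually runs.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of scored words.
POSITIVE = {"thanks", "great", "perfect", "helpful", "works"}
NEGATIVE = {"wrong", "useless", "bad", "broken", "terrible"}

def sentiment_score(text: str) -> int:
    """Return (#positive words - #negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def sentiment_label(text: str) -> str:
    """Map the raw score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    # A follow-up message after an answer hints at how satisfied the user was.
    for reply in ["Thanks, that works great", "That's wrong and useless", "What time is it"]:
        print(reply, "->", sentiment_label(reply))
```

Run over the follow-up messages in a session, even something this crude gives a rough "was the user happy with the answer" signal without knowing who the user is.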
But yeah, just think about stuff like Google Trends, tracking interest in topics, as an example of what such data could be used for. And deanonymizing the inputs is probably possible to some degree, aside from the obvious trust we place in DDG as a central point of failure.
I'm curious, how does it work?
Not who you asked, but you don't want your AI to train itself on the questions random users ask, because that could introduce incorrect or offensive information. For this reason LLMs are usually trained and used in separate steps. If a user gave the LLM private information, you wouldn't want it to learn that information and pass it on to other users, so there are usually protections in place to stop it from learning new things while it's just processing requests.
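A toy sketch of that train/serve split, assuming a tiny bigram "model" instead of a real LLM (the corpus, `train`, and `predict_next` are all invented for illustration). Training builds the counts once; serving only reads them, so nothing a user types ends up in the model. Whether a provider separately logs prompts and uses them in some later training run is a different question that this doesn't address.

```python
from collections import Counter, defaultdict
from types import MappingProxyType

def train(corpus):
    """Training step: build next-word counts from a curated corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Freeze the result: MappingProxyType is a read-only view,
    # so serving code cannot add or change entries.
    return MappingProxyType(
        {word: MappingProxyType(dict(c)) for word, c in counts.items()}
    )

def predict_next(model, prompt):
    """Inference step: look up the most common follower of the last word.

    The prompt is only read; it is never written into the model.
    """
    words = prompt.lower().split()
    if not words:
        return None
    followers = model.get(words[-1])
    if not followers:
        return None
    return max(followers, key=followers.get)

# Trained once, offline, on vetted text (hypothetical corpus).
model = train(["the cat sat on the mat", "the cat ate"])

if __name__ == "__main__":
    print(predict_next(model, "my password is hunter2 and the"))
    print("hunter2" in model)  # the private word was never learned
```

The same idea applies at scale: at serving time the weights are loaded read-only and no gradient updates run, so the model a second user talks to is the same one the first user talked to.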