LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.
Rules:
Rule 1 - No harassment or personal character attacks of community members, i.e., no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e., no comparing the usefulness of models to that of NFTs, no claiming that the resources required to train a model are anything close to those required to maintain a blockchain or mine crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e., no statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.
Europe has clearly chosen a path that will increase its technological dependency on either the US or China. It's not likely to play a large role in figuring out the future economic order. We'll see how long it can continue on this path.
Its AI policies are reminiscent of feudalism. People create AI, but then they have to pay a levy to people who have contributed nothing. But those people have rights awarded by the government. AI is not the only area where the EU is shifting to policies that facilitate wealth extraction rather than creation. I don't think that is domestically sustainable. Sooner or later the European nations will try to extract wealth from each other, and that will be the end. It doesn't have to go that far. Maybe we will just see stagnation and decline, as in South America.
Is your stance limited to AI, or do you generally object to paying a levy? Like towards Spotify or Netflix or Hollywood, because I could just as well skip that and watch the newest movies without respecting their copyright...
I mean, it's not nothing; people put some effort into things. Wikipedia, for example, is super useful for machine learning. My code on GitHub teaches AI programming. And I can see the crawlers on my own server; today I had to update my config because it's been hammered by Alibaba. Dozens of different IP addresses, fake user agents, and they completely overloaded my database with requests. It's not like I don't contribute or am part of a different world?!
That's a bit of an odd question, given my praise of American Fair Use. The USA has had copyright, including Fair Use, for longer than much of Europe. The predecessor of modern copyright law was created in the 1700s in the UK. There is a German scholar, Eckhard Höffner, who argues that this caused book production to plummet in the UK. He also says that the German-speaking lands produced more books, and a greater variety of titles, than the UK in the century before such laws arrived.
The American founding fathers were men of the Enlightenment. They, or some of them, understood the problems with such government sponsored monopolies. Therefore, the US Constitution limits copyrights and patents. It's an interesting clause. Congress is empowered "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries". It's about progress first; very much a product of the Enlightenment.
I don't know if there was ever a discussion about whether entertainment should qualify for copyright protection at all. I'll have to look that up sometime.
In 1998, US copyright was extended by 20 years; it is now the life of the author plus 70 years. That has been called the Mickey Mouse Protection Act, because it meant that the original Mouse enjoyed another 20 years of copyright. This was roundly criticized by economists and even led to a case before the Supreme Court. Obviously, making copyright retroactively longer does not encourage any kind of creativity. It's in the past. Well, the case was lost nevertheless.
For many left/liberal people, this is corruption; just the Disney company getting what it wants.
The EU countries had extended their copyright terms years earlier, without resistance or even comment. Smug Europeans may feel superior when Americans rage against the corporations. But the truth is often like this: Europeans simply and quietly accept such outrages.
The original copyright term in the US (and before that in the UK) was 14 years, and copyright protection required registration. It worked like the patent system. The interesting thing is that patents still work a lot like that: one must register and publish them, and then they last for 20 years. Meanwhile, copyrights have gone from 14 years to life+70 years, no registration required.
Patents are public so that people can learn from them. That has been used as an argument for patents. The alternative would be that everyone tries to keep new inventions secret. This way, people can learn and try to circumvent patents; find other ways of achieving the same thing. That's an interesting observation in light of AI training, no?
I haven't answered your question. In my experience, pro-copyright people will always refuse to argue over what should be covered by copyright or for how long. They demand an expansion and use psychological manipulation to get it. If you do not let yourself be manipulated, they change the subject and argue over whether copyright should exist at all. I have never met a single person who was able to defend copyright as it exists. Perhaps you can answer your own question now.
Yes, I mainly wanted to rule out the opposite, because the multi-billion-dollar companies currently do some lobbying as well, with the same manipulation and narratives, just the other way around. They want everyone else to lose rights, while they themselves retain full rights with little to no oversight... And that's just inherently unfair.
As I said, copyright might not be something good or defensible. It clearly comes with many obvious flaws and issues. The video you linked is nice. I'd be alright with abolishing copyright, preferably after finding a suitable replacement or alternative. But I'm completely against subsidizing big companies just so they can grow and manifest their own Black Mirror episode. Social scoring, making my insurance 3x more expensive on a whim, and a total surveillance state should be prohibited. And the same rules need to apply to everyone. Once book authors no longer get copyright, neither do OpenAI and the big tech companies. They can invest some $100 million in training models, but then the result isn't copyrighted either. I get to access the model however I like, and I can sell a competing service with their model weights. That's fair: same rules for everyone. And Höffner talks to some degree about prior work and what things are based upon. So the big companies would have to let go of their closely guarded trade secrets and give me the training datasets as well. I believe that'd be roughly in the spirit of what he said in the talk. And maybe that'd be acceptable. But it really has to be the same rules for everyone, including big corporations.
Can you back this up? They certainly do not have the same reach or influence as the copyright industry.
What do you mean by that?
AI models may not be copyrightable under US law. I'm fairly sure that base models aren't. Whether curating training data, creating new training data, RL, and so on, ever makes a copyrighted model is something that courts will eventually have to decide.
They are probably copyrightable under EU law (maybe protected as databases). That's an EU choice.
The rules are different in different countries. They are not different for corporations.
The current example is Meta, which is very vocal about the EU AI Act. Their opinion is all over the tech news this week. And they're a very influential company, completely dominating some markets like messaging and parts of social media. They're also well-known in the AI industry.
Other companies do the same. They test what they can get away with all the time, like stealing Scarlett Johansson's voice or pirating books over BitTorrent... And they definitely have enough influence and money to pay very good lawyers and to choose what to settle out of court and what to fight. We shouldn't underestimate the copyright industry. But Meta, for example, is a very influential company with a lot of impact on society and the world.
And AI is in half the products these days. Assisting you, or harvesting your data... Whether you want it or not. That's quite some reach, pervasive, and those are the biggest companies on earth. I'd be with you if AI were some niche thing. But it's not.
And Meta is super strict about trademark law and parts of copyright when it's the other way around. I recently spent some time reading about how you can and cannot use or mention their trademark or embed it into your website. They're very strict when it's me using their stuff. The other way around, they want free rein.
I mean manufacturing a supply chain for them where they get things practically for free. Netflix has to pay for licenses to distribute Hollywood content. OpenAI's product also has other people's content going into it, but they don't need to do the same. It's subsidized: they get the content practically for free for their business model.
And what about my server and the incident last week? If I now pay $30 more for a VPS that's able to withstand Alibaba's crawlers... wouldn't that be a direct subsidy from me to them? I pay an extra $30 a month just so they can crawl my data?
We were talking about a specific lecture that questions the entire concept of copyright as we have it now. You can't argue to abolish copyright and then in the next sentence defend it for yourself or your friends. It's either copyright for book authors and machine learning models, or it's none of them. But you can't say information in the products from other people is not copyright, but the information in the products of AI companies is copyright. That doesn't make any sense.
And they're not wrong.
That doesn't quite back up what you claimed, though. You wrote: "They want everyone else to lose rights, while they themselves retain full rights,"
Their claim of Fair Use seems straightforward. That's not everyone else losing their rights. I am not aware where they lobby for "full rights" for themselves, whatever that means.
There are different kinds of intellectual property. Trademarks are different from copyright. Then there's also trade secrets, patents, publicity rights, privacy, etc.
Generally, you can use any trademark as long as you don't use it for trade or harm the business that owns it. I'm not going to look it up, but I'm guessing that the rules are about not giving a misleading impression of your page's relationship with Meta.
As for copyright, when you are in the US you can make Fair Use of their materials, regardless of what the license says.
That you can't do that in Europe is not Meta's fault.
Oh. You're talking about Net Neutrality and not copyright. I'm afraid I don't know enough about the network business to form an opinion on that.
I don't think what happened to you was a subsidy, though. You're offering something for free, and apparently Alibaba took advantage of you for that. That's just how it is, sometimes.
I touched on a lot of subjects. In a nutshell, I am against rent-seeking. No more, no less.
BTW, that turned out to be false.
That's correct. My point was that they're following an agenda as well. But they're correct that that signature has consequences and doesn't translate into unlimited corporate growth.
OpenAI is very secretive and not transparent at all. They promised to release a model, which they've now delayed several times. But other than that, they haven't written papers for some time, and they don't share stuff. And they do other small things for their own benefit so the competition can't do the same. They even treat simple numbers like model size as a big trade secret. They guard everything closely, and they like it that way. It's the literal opposite of the free exchange of information. And they do that with most of their business decisions.
And Meta's models come with a license plus an EULA. I've lost track of the current situation, but as a European I've been prohibited from downloading and using Meta's LLMs for some time. Sometimes they also want my e-mail address, I have to abide by their terms, and I don't like the terms... That's their right, and they're making use of it. It's not as if I can just download it and do whatever I want because that would be Fair Use as well... They retain rights, and many of them.
Trademark is definitely part of the conversation. Can models paint Mickey Mouse or other trademarked stuff? Sure they can. And it's the same trademark law that protects fictional characters and other concepts. So once AI ingests that, it needs addressing as well. And it's not just that. They (Meta/Instagram) also address copyright and have a lot of rules about that. With that specific thing, though, I was more concerned with their logo, and that is mostly trademark law.
No, I am talking about copyright. Net neutrality has nothing to do with any of this.
Yeah, that's kind of my point. They're taking advantage of people, and in a kind of mischievous way, because they've thought about how to defeat the usual defenses. How do you think I'm supposed to deal with that? Let everyone take advantage of me? Take down my server and quit this place?
I'm with you on this, as long as it's fair. Make sure AI companies aren't rent-seeking either, because currently that's a big part of their business model.
I mean, what do you think the big piles of information they gather for training are? They don't share them, they sign contracts and even buy up companies to get exclusive access... How they gobble up resources? And how prices for graphics cards skyrocketed, first due to crypto and then due to AI? That's kind of rent-seeking on several different levels...
It's definitely inspired by her performance in "Her". Sam Altman himself made a reference connecting his product to that specific movie. It's likely not a coincidence. And they followed up by removing that voice along with a few others. Clearly not because they were right and this is an uncontroversial topic.
The vision models are not for the EU. Meta trained them on Facebook data. The EU did not allow that. Meta said that this would mean that their models would not have the necessary knowledge to be useful for European users, and disallowed their use in the EU. It also means that some EU regulations don't apply, but they did not give that as a reason, I think.
In any case, it seems quite fair to me. If Europe does not want to pitch in, but only makes demands, then why should it reap the benefits?
Some other recent open models by Tencent and Huawei are also not for the EU. That is in response to the AI Act. I am surprised that it is not a standard clause yet.
No. They can't override fair use. That's the point of fair use. You cannot do what you like with it because you are in Europe and don't have fair use.
I really don't understand how that is supposed to make sense. You demand that American companies should be giving more free stuff to Europe. But also, they should be following European laws in the US and pay rent-seekers for the privilege. It's ridiculous.
I don't see how that is about copyright.
Back that up or retract the statement.
What you are saying is that someone who sounds a bit like Scarlett Johansson must get permission from her to speak in public.
Maybe there is a language issue here. But from what you are writing, you are not against rent-seeking. You demand privileges and free money for special people; a new aristocracy. You even want privileges for Meta, even though you use these privileges as arguments why these privileges should exist. This is all absolutely ridiculous.
Here's rent-seeking in the German Wikipedia: https://de.wikipedia.org/wiki/Renten%C3%B6konomie
Let me rephrase it a bit: OpenAI is one of the prime examples. They wrote one or two scientific papers early on. And then they stopped. Deliberately. They're not contributing anything to science. All they invent is strictly for-profit and happens behind closed doors. They take, they don't contribute back.
And the main asset in the digital age is information. It's necessary for AI training to pile that up in a dataset. So that's their supply and they want it cheap because they need a lot of it. That's where they generate their "rent" from. Do they contribute anything back with that? No. They "seek" it and pile it up and that becomes their trade secret. And that's why I call them "rent-seeking". (Thanks for the Wikipedia article, yours was way better than the convoluted definition I read yesterday...) And it even translates to the illegal activities mentioned in the Wikipedia article. Meta has admitted to pirating books to pile up datasets faster. OpenAI likely did the same(?) It's just that they keep everything a secret. No company tells you anymore whether your content went into a dataset, since you might be able to use the legal system against them.
We can also see that with some platforms like GitHub, which turned out to be a great resource for AI training for Microsoft. Harvesting data is one of the main business models these days, and having that data is what pays the rent. It's not all there is to it. There's a lot of work in compiling it, curating datasets, RLHF... And then of course the science behind AI itself. But the last one aside, that work is also often done with negative effects on society. We all know about the precarious situation of the data labellers in Africa.
And then all of this, plus the experts they get from the public universities and all the GPUs in the datacenters and some electricity get turned into their (OpenAI's) intellectual property.
Maybe tell me what they contribute back? Is there anything they give? I don't think so. They mainly seem like parasites to me, freeloading on all the information they can gather in electronic form. And then? Is there anything we get in return?
And maybe we're having a small misunderstanding here. I'm not Anti-AI or anything. I just want people who take something from society, to contribute something back to society. And they really like to take, but they themselves painstakingly avoid disclosing the smallest little details.
I'd say there are two options. Either they contribute back and we find a healthy relationship between society and big-tech AI companies. That would make it completely fine if they also take things; it's give-and-take. Or they want to run a dubious for-profit service that no one else has a say in, can look inside, or can use aside from what they devised for society... But then the same rules apply to them. They then also have to contribute back in the form of money to pay for their supplies and license the content that goes into their product.
My own opinion: allow AI and cater to scientific progress, in a healthy way, though. The companies do AI and they get resources. But they're obligated to be transparent and contribute back. For example, open-weight models are a good idea. I'd go further than that, because science and society also need to address biases, what AI can be used for, and a bunch of issues that come with it, like misinformation and spam... The companies aren't incentivized to address that. And it's starting to have an impact on the internet and society. Regulations are the way to make them do what's necessary or beneficial in the long run.
I'm generally against hyper-capitalism and big corporations. They often don't do us any good. It's a bit complicated with AI, since those companies are overvalued and there is a big investment bubble, which isn't necessarily about society. But the copyright industry is part of the same picture. Spotify, for example, isn't healthy for society at all. And the Höffner video you linked had a lot of good points about that. I'm not sure whether you're aware of the other side of the coin... For example, I've talked to some musicians (copyright holders), and I've written a few pages of technical documentation, so I'm aware that it takes several weeks behind the desk to produce 40 pages, and half a year or more to write a novel. And somehow you need to eat something during those months... So with capitalism it's not always easy. The current situation is subpar. And the copyright industry is mainly a business model to leech off people who create something. We'd be better off if we cut out the middlemen.
I see. Thank you. I'm afraid you don't quite understand what rent-seeking means. Let me try a hypothetical example.
Food is pretty cheap. But suppose a single company had a monopoly on supplying food. How much would people be willing to pay? People would give almost anything they have.
The reason food is cheap is that there is no monopoly. If someone charges more than the competition, you go to the competition. You get a market price. It's complicated, but one thing that goes into the price of food is the cost of labor. Many people must work to supply food.
These workers could do other things with their time. But also, other people could do their work of supplying food. No one has a monopoly. Eventually, the cost of labor depends on how much money you must offer to people to be willing to put up with the work.
If someone had a monopoly on food supply, they could charge fantastic prices. Their cost would not change. The difference between the market price and the monopoly price is the monopoly rent.
Let's take this closer to AI training.
Let's say there's some guy who's searching through libraries and archives for stuff to digitize so that it can be sold to AI companies for training. He finds an archive of old newspapers. How much would the market price for scans of these newspapers be? Let's ignore copyright for now.
Maybe the potential buyer could send someone else to scan the papers. So our guy could only ask to be paid for the labor in scanning the papers.
So our guy will not say where he found that archive. That is his trade secret. The potential buyer would have to send someone to search for that archive and scan it. That means our guy can ask to be paid for his labor in finding the archive AND scanning it. The potential buyer will only hire someone else to do that if our guy asks too high a price.
There is a way our guy can get more. If he destroys all remaining copies of these newspapers, then he has a monopoly. Now he can ask for as much as the potential buyer is willing to pay. That's a monopoly rent.
Now copyright... Those newspapers are probably under copyright. If our guy is in Europe, he will have to get permission from the rights-holder to scan the papers. Copyright is a monopoly enforced by the state. The rights-holder can now extract the monopoly rent from our guy.
If the publisher has gone out of business, the rights-holders may be hard to find but he has to make the effort. In practice, this means that there is really no point in making the effort to preserve European culture and history. The copyright people don't just harm technological progress and the European economy, they harm European culture. That's parasitic.
You're making the argument that OpenAI and others are trying to get paid. That's not rent-seeking. Ideally, our laws ensure that seeking money makes you work for the benefit of other people.
Farmers work for money, and everyone else gets a lot of good, cheap food out of it. If you demand that farmers should work for free, then you're demanding that many of us should starve.