Microsoft AI boss Mustafa Suleyman incorrectly believes that the moment you publish anything on the open web, it becomes “freeware” that anyone can freely copy and use.
When CNBC’s Andrew Ross Sorkin asked him whether “AI companies have effectively stolen the world’s IP,” he said:
That certainly hasn’t kept many AI companies from claiming that training on copyrighted content is “fair use,” but most haven’t been as brazen as Suleyman when talking about it.
Speaking of brazen, he’s got a choice quote about the purpose of humanity shortly after his “fair use” remark:
Suleyman does seem to think there’s something to the robots.txt idea — that specifying which bots can’t scrape a particular website within a text file might keep people from taking its content.
Disclosure: Vox Media, The Verge’s parent company, has a technology and content deal with OpenAI.
The original article contains 351 words, the summary contains 139 words. Saved 60%. I'm a bot and I'm open source!
I got the math the wrong way around but read the bottom of the bot's post. The bot's job is to cut the fluff out of articles, and it copy/pastes the remaining text for us to read here.
So my comment should have said 40%, but the point was if we're comparing what the bot did with your coworkers talking about a game, it'd be more akin to them reciting the commentator verbatim.
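For anyone who wants to check the numbers, here's a quick sketch of the arithmetic using the two word counts from the bot's footer (the variable names are just mine, for illustration):

```python
# Word counts quoted in the bot's footer
original_words = 351
summary_words = 139

# Fraction of the article the bot kept vs. the fraction it cut
kept = summary_words / original_words
saved = 1 - kept

print(f"kept:  {kept:.1%}")   # kept:  39.6%
print(f"saved: {saved:.1%}")  # saved: 60.4%
```

So roughly 40% of the article's words were kept and roughly 60% were cut, which matches the bot's "Saved 60%".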
I thought that even discussing the game without the express permission of the media company you used to watch it and the sports league was a violation. Not sure why you are bringing commentary on commentary into it. Again, not a sports ball guy, but when I do hear people talk about sports, they are talking about sports, not the person talking about sports.
And if a company makes a negligent decision, which kills a million people over time, why is no one being put on death row? They can and do have it both ways, but I can still wish for a just world where if companies are people, they can be put to death for mass casualties caused by their decisions.
>!Arthur Dent has his home demolished while humans simultaneously have Earth demolished by an alien race called Vogons, but he and Ford Prefect escape by hitchhiking onto the Vogon ship. They're discovered and thrown into space, but miraculously saved by Ford's relative (can't remember how they're related) and his ship The Heart of Gold, which is powerful but unpredictable. They wind up on a mythical planet due to that unpredictability, and learn that Earth was a designer planet created to calculate the ultimate answer to the ultimate question of life, the universe, and everything. (The famous "42" thing). The whole crew escapes the planet and decides to go to The Restaurant at the End of The Universe to eat and watch the universe end.!<
Have I just stolen The Hitchhiker's Guide to the Galaxy and given it to you?
You've probably not infringed the copyright, though only a court can decide that, and only if you were challenged by the rights holder.
I think there are lots of factors in your defence:
you're not selling it, and your use is an example for education
I don't think you're reducing the market value of the original(s) in any way
you've not included substantial verbatim sections of the original works, but I think you have used more than just facts and ideas (not sure though).
But add in some more quotes, flesh it out, and then try to sell it... each step weakens the 'fair use' defence.
This is the problem for the LLM: it can be used for many things, and if it has no filter or limit, then eventually the collective derived works might add up to commercial, substantial reuse, and might include enough to have copied a substantial portion of the original.
Very hard to determine I'd think. Each individual use might be fair, but did the LLM itself go too far at some point?
The copyright holder probably struggles to challenge the LLM on the basis of all the things infinite monkeys might use it for in future.
This is the problem for the LLM: it can be used for many things, and if it has no filter or limit
I agree with pretty much everything before this, but that particular comment was just talking about summaries, which imo is a lot more cut and dried. (SparkNotes, for example)
An LLM by itself is unlimited and unfiltered, but it's not impossible to limit one and sell it. For all the shit OpenAI deserves to get, I have to give them one thing: their copyright restriction system seems to be on par with YouTube's. I paid for a month of it when GPT-4 came out and tried my hardest to bypass it, but it won't even give me copyrighted texts when the words are all replaced with synonyms or jumbled around.
I think if someone's offering their LLM as a service and has a system like that in place, they aren't stealing any more than YouTube is stealing. Otherwise I agree that there's a strong argument for copyright infringement.
But paraphrasing is not copyright infringement either. It's no different than Wikipedia having a synopsis for every single episode of a TV series. Telling someone about what a work contains for informational purposes is perfectly fine.
You do need a registration key, but now it's tied to the hardware so it activates as soon as you connect to the network, no need to actually type the registration key.