To me, the obsession with protecting access to lyrics is one of the strangest long-running legal battles. I will skip tracks on Spotify sometimes specifically because there are no lyrics available. Easy access to lyrics is practically an advertisement for the music. Why do record companies not want lyrics freely available? In most cases, it means they aren't available at all. How is that a good business decision?
They probably fear a domino effect if they let go of this. And so they defend it vehemently to avoid setting a precedent.
Think about compositions, samples, performance rights, and so on. There is a lot more at stake.
What's the benefit of protecting monetary IP rights to art?
We'll only get the art that artists really wanted to make? Great!
> What's the benefit of protecting monetary IP rights to art?
What's the benefit of protecting monetary IP rights to software?
What's the benefit of consolidating all meaningful access to computing services to a few trillion-dollar gate-keeping corpos?
What's the benefit of getting paid for your work? We'll only get the work people really want to do? Great!
Art existed before IP rights. Artists did get paid.
Hot take: it’s all bullshit.
Like software patents - when you’re not a normie.
Thoughts from someone who doesn’t make a living from songs?
I’m guessing you’d want to restrict lyrics to encourage more plays of the song by people who are motivated to understand them. Along with the artist’s appreciation of that experience of extracting what you’re fascinated by. Burdensome processes generate love and connection to things.
Not everything is a functional commodity to be used and discarded at whim.
One amusing part of lyrics on Spotify to me is how they don't seem to track which songs are instrumentals or not and use that to skip the message about them not knowing the lyrics. An instrumental will pop up and it will say something like "Sorry, we don't have the lyrics to this one yet".
The only thing funnier than that is when they do have the lyrics to a song that probably doesn't need them, like Hocus Pocus by Focus: https://open.spotify.com/track/2uzyiRdvfNI5WxUiItv1y9?si=7a7...
Oh they track that, it's in their API as the "instrumentalness" score: https://developer.spotify.com/documentation/web-api/referenc...
The fact that they don't do anything with that information is unrelated.
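For illustration, here is a minimal sketch of pulling that score from the audio-features endpoint referenced above (assumes a valid OAuth bearer token; the 0.5 cutoff for treating a track as instrumental is my own arbitrary assumption, since Spotify only exposes it as a probability-like value):

```python
import json
import urllib.request


def get_instrumentalness(track_id: str, token: str) -> float:
    """Fetch the instrumentalness score (0.0-1.0) for one track
    from Spotify's audio-features endpoint. Requires a valid
    OAuth bearer token."""
    req = urllib.request.Request(
        f"https://api.spotify.com/v1/audio-features/{track_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["instrumentalness"]


def looks_instrumental(features: dict, threshold: float = 0.5) -> bool:
    """Crude boolean cut over the probability-like score.
    The 0.5 threshold is an illustrative assumption, not
    anything Spotify itself guarantees."""
    return features.get("instrumentalness", 0.0) > threshold
```

Which is the point being made: the score is right there, so suppressing the "we don't know the lyrics" message for high-instrumentalness tracks would be a one-line check.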
Interesting, especially that it's a probability rather than a boolean! The line can be blurry sometimes (like in the example I mentioned), so it makes sense that it might not be possible to come up with a consistent way of classifying them that everyone would agree with.
I’ve also seen cases where they list lyrics for a song that doesn’t have any (usually an instrumental jazz version of an old standard).
The content industries should have been the ones to invent LLMs, but their head is so stuck in the past and in regressive thinking about how they protect their revenue streams that they're incapable of innovating. Publishing houses should have been the ones to have researchers looking into how to computationally leverage their enormous corpus of data. But instead, they put zero dollars into actual research and development and paid the lawyers instead. And so it leads to attitudes like this.
The only people seeing themselves as "content creators" are people giving social media stuff so their users get something they can doom scroll. Other people see themselves as artists, entertainers, musicians, authors, etc.
I'm referring to the rent seekers sitting in between the artists and the public.
“The content industries.”
Why would people invest in destroying what they love?
He meant, the stream of free money from unsuspecting monkeys.
> The content industries should have been the ones to invent LLMs
While exclusively-controlled LLMs would be mildly useful to them, the technology existing is dangerous to them, and they already have a surplus supply of content at low cost that they monetize by controlling discovery, gatekeeping, and promotion, so I don't think it makes sense for them to put energy into LLMs even if they had the technical acumen to recognize the possibilities (much the same way that Google, despite leading in developing the underlying technology, had very little incentive to productize it since it was disruptive to their established business, until someone else already did and the choice was to compete on that or lose entirely.)
You have to get ahead of the disruption that will destroy you. At least, if you care about longevity of your company. I realize this isn't always the case.
That's always been the case, eg. how they were latecomers to streaming.
Streaming had to compete with digital music piracy. As a result, Spotify is impossibly cheap compared to buying individual albums or singles in the past. So musicians hardly receive any money from recorded music anymore. Nowadays they basically have only concerts left as a means to earn money.
The composition and lyrics are owned separately from the recorded performance.
I'm pretty sure you could even have lyrics with a separate copyright from the composition itself. For example, you can clearly have lyrics without the music and you can have the composition alone in the case that it is performed as an instrumental cover or something.
This is a tough one for the HN crowd. It's like that man not sure which button to push meme.
1) RIAA is evil for enforcing copyrights on lyrics?
2) OpenAI is evil for training on lyrics?
I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading mp3's on a torrent site and a $500 billion company whose goal, among other things, is to become the primary access point to all things digital.
Should young adults be allowed to violate copyright and no one else? The damages caused seem far worse than an LLM being able to reproduce song lyrics.
Is it simply "we like college kids" and "we hate OpenAI" that dictates this?
I'm ready, hit me with the nuance.
A young adult who pirates is also more likely to make purchases in that industry, and has an impact that is limited.
A corporation that pirates is more likely to pirate everything it can get its hands on, en masse and in an ongoing manner, and to throw everything it can at contesting its right to do so in court.
This is neither true nor relevant.
Maybe individuals and corporations are different enough that copyright should not work the same way for both.
[flagged]
What damages? You can learn lyrics by listening to the song.
Sometimes, sometimes not.
I'm still trying to work out the lyrics to Prisencolinensinainciusol. https://youtu.be/fU-wH8SrFro
... Alright?
Sounds like you agree with me.
Why not both? As the GP mentioned, lyrics are also invaluable for people besides training for AI.
I think the perceived lack-of-value for them is related to how easy it is to write lyrics down, compared to any other aspect of the music. Anyone can do it within the time of the song, usually first try. Any other aspect of the song can't just be written down from ear (yes, including the sheet music, which isn't nearly expressive enough to reproduce a performance*).
*There are some funny "play from sheet music without knowing the song" type videos out there, with funny results. YouTube/Google search is no longer usable, so I can't find any.
I think you mean the RIAA
RAII is a different kind of (necessary) evil
Indeed, too much C++. Edited.
3) Some types of data are more ethical to train on than others.
Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.
Very true. Just the other day, another “copyright is bad” post was on the front page. Today it's “copyright is good,” because otherwise people might get some use out of the material in LLMs.
Considering this is Hacker News, it seems such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
> Very true. Just the other day, another “copyright is bad” post on the front page. Today its copyright is good because otherwise people might get some use of material in LLMs.
>
> Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 after long gone. Now we need to apparently be angry at all tech.
>
> LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
Sorry, this is such a (purposefully?) naive take. In reality the positions are much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all of this is fair use. But on the other hand they lock their models behind walls, disallow training on them, and keep their training methods and data selection secret.
This tends to be what people disagree with. It feels very much like different rules for thee versus me. Just imagine how outraged Sam Altman would be if someone leaked the code for GPT-5 and all the training scripts.
If we agree that copyright does not apply to LLM training, then it should also not apply to the LLMs themselves, and the companies should be required to release all their models and the way they were trained.
Does that mean you would support open LLM model training on copyrighted data?
I think that opens several other cans of worms, but in principle I would support a solution that allows using copyrighted materials if it is for the common good (i.e., the results are released fully open, meaning not just weights but everything else).
As a side note i am definitely not strong into IP rights, but I can see the benefits of copyright much more clearly than patents.
My point wasn't supposed to be that copyright is bad (or that it's good), just that the business logic of fighting the sharing of lyrics is incomprehensible to me.
That aside, I think there's a lot more complexity than you're presenting. The issue is who gets to benefit from what work.
As hackers, we build cool things. And our ability to build cool things comes in large part from standing on the shoulders of giants. Free and open sharing of ideas is a powerful force for human progress.
But people also have to eat. Which means even as hackers focused on building cool things, we need to get paid. We need to capture for ourselves some of the economic value of what we produce. There's nothing wrong with wanting to get paid for what you create.
Right now, there is a great deal of hacker output the economic value of which is being captured almost exclusively by LLM vendors. And sure, the LLM is more amazing than whatever code or post or book or lyric it was trained on. And sure, the LLM value comes from the sum of the parts of its source material instead of the value of any individual source. But fundamentally the LLM couldn't exist without the source material, and yet the LLM vendor is the one who gets to eat.
The balance between free and open exchange of ideas and paying value creators a portion of the value they create is not an easy question, and it's not anti-hacker to raise it. There are places where patents and other forms of exclusive rights seem to be criminally mismanaged, stifling progress. But there's also "some random person in Nebraska" who has produced billions of dollars in value and will never see a penny of it. Choosing progress alone as the goal will systematically deprive and ultimately drive away the very people whose contributions are enabling the progress. (And of course choosing "fair" repayment alone as the goal will shut down progress and allow less "fair" players to take over... that's why this isn't easy.)
Sounds like it was never about copyright as a principle, only symbolic politics (i.e., copyright benefits megacorps? copyright needs to be weaker! copyright hurts megacorps? copyright needs to be stronger!)
Actually in Germany it's GEMA
It's a good decision because it must be an incredible minority of people who only listen to music when the lyrics can be displayed. I'd imagine most people aren't even looking at the music playing app while listening to music. Regardless, they are copyrighted and they get license fees from parties that do license them and they make money that way. Likely much more money than they would make from the streams they are losing from you.
I think it depends on the music. Most people will have a greatly improved experience when listening to opera if they have access to (translated) lyrics. Even if you know the language of an opera, it can be extremely difficult for a lot of people to understand the lyrics due to all the ornamentation.
What percentage of streaming income does opera, as a genre, represent such that it could even factor into this business decision?
I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them. But these days with generative AI, they can take lyrics and just make a new song with them, and you can probably see why artists and record companies would want to stop that.
Plus, from TFA,
"GEMA hoped discussions could now take place with OpenAI on how copyright holders can be remunerated."
Getting something back is better than nothing
I didn't downvote, but
> I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them.
Reproducing lyrics in text form is, in fact, a problem, independent of AI. The music industry has historically been aggressively litigious in going after websites which post unlicensed song lyrics[0]. There are many arcane and bizarre copyright rules around lyrics. e.g. If you've ever watched a TV show with subtitles where there's a musical number but none of the lyrics are subtitled, you might think it was just laziness, but it's more likely the subtitlers didn't have permission to translate and subtitle the lyrics. And many songs on Spotify which you'd assume would have lyrics available just don't, because Spotify doesn't have the rights to publish them.
[0] https://www.billboard.com/music/music-news/nmpa-targets-unli...
Thanks. Maybe that misconception was the problem. Taking a hammering in downvotes, lol
Had a couple of drive-by downvotes... Is it that stupid an opinion? Granted I know nothing about the case except for what's in TFA
I'm not one of the downvoters, but it may be this: "Many sites have been doing that for decades and as far as I know record companies haven't gone after them."
Record companies have in fact, for decades, been going after sites for showing lyrics. If you play guitar, for example, it's almost impossible to find chords/tabs that include the lyrics because sites get shut down for doing that.
Hmm, alright. I actually do play guitar and used to find chords/tabs with lyrics easily. I haven't been doing that for maybe 10-15 years. Anyway, maybe those sites were paying for a license and I just never considered it
> Had a couple of drive-by downvotes... Is it that stupid an opinion?
While I do not agree with your take, FWIW I found your comment substantive and constructive.
You seem to be making two points that are both controversial:
The first is that generative AI makes the availability of lyrics more problematic, given new kinds of reuse and transformation it enables. The second is that AI companies owe something (legally or morally) to lyric rights holders, and that it is better to have some mechanism for compensation, even if the details are not ideal.
I personally do not believe that AI training is meaningfully different from traditional data analysis, which has long been accepted and rarely problematized.
While I understand that reproducing original lyrics raises copyright issues, this should only be a concern in terms of reproduction, not analysis. Example: Even if you do no data analysis at all and your random character generator publishes the lyrics of a famous Beatles song (or other forbidden numbers) by sheer coincidence, it would still be a copyright issue.
I also do not believe in selective compensation schemes driven by legal events. If a legitimate mechanism for rights holders cannot be constructed in general, it is poor policy craftsmanship to privilege the music industry specifically.
Doing so relieves the pressure to find a universal solution once powerful stakeholders are satisfied. While this might be seen as setting a useful precedent by small-scale creators, I doubt it will help them.
It's like saying that movie studios haven't gone after Netflix over movies, so what's the issue with hosting pirated movies on your own site. The reason movie studios don't go after Netflix is that they have a license to show it.
If anything, AI would scramble the lyrics more than a human "taking lyrics to make a new song from them".
Maybe, but it's also possible to get an AI to produce a song with the exact same lyrics. And a human copying lyrics would also be a copyright issue in any case.
But anyway it seems I misinterpreted the issue and record companies have always been against reproduction of lyrics whether an AI or human is doing it
Likely because you're a "luddite", which in the current atmosphere of HN and other tech spaces means you have a problem with a "research institution" (one that puts on a separate for-profit face whenever it feels like it) having free and open access to the collected works of humanity so it can create a plagiarism machine that it can then charge people to access.
I don't respect this opinion but it is unfortunately infesting tech spaces right now.
Simon Willison had an analysis of Claude's system prompt back in May. One of the things that stood out was the effort they put in to avoiding copyright infringement: https://simonwillison.net/2025/May/25/claude-4-system-prompt...
Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.
All AI companies know they're breaking the law. They all have prompts effectively saying "Don't show that we broke the law!". That we continue to have tech companies consistently breaking the law and nothing happens is an indictment of our current economy.
And it's a question of whether we accept breaking the law for the possibility of the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.
This is a point that I don't see discussed enough. I believe Anthropic decided to purchase books in bulk, tear them apart to scan them, and then destroy those copies. And that's the only source of copyrighted material I've ever heard of that is actually legal to use for training LLMs.
Most LLMs were trained on vast troves of pirated copyrighted material. Folks point this out, but they don't ever talk about what the alternative was. The content industries, like music, movies, and books, have done nothing to research or make their works available for analysis and innovation, and have in fact fought industries that seek to do so tooth and nail.
Further, they use the narrative that people that pirate works are stealing from the artists, where the vast majority of money that a customer pays for a piece of copyrighted content goes to the publishing industry. This is essentially the definition of rent seeking.
Those industries essentially tried to stop innovation entirely, and they tried to use the law to do that (and still do). So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.
> So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.
I think they try to expand copyright from "protected expression" to "protected patterns and abstractions", or in other words "infringement without substantial similarity". Otherwise why would they sue AI companies? It makes no sense:
1. If I wanted a specific author, I would get the original works, it is easy. Even if I am cheap it is still much easier to pirate than use generative models. In fact AI is the worst infringement tool ever invented - it almost never reproduces faithfully, it is slow and expensive to use. Much more expensive than copying which is free, instant and makes perfect replicas.
2. If I wanted AI, it means I did not want the original; I wanted something else. So why sue people who don't want the originals? The only reason to use AI is when you want to steer the process to generate something personalized. It is not to replace the original authors; if that is what I needed, no amount of AI would be able to compare to the originals. If you look carefully, almost all AI outputs get published in closed chat rooms, with a small fraction being shared online, and even then not in the same venues as the original authors. So the market substitution logic is flimsy.
You're using the phrase "actually legal" when the ruling in fact meant it wasn't piracy after the change. Training on the shredded books was not piracy. Training on the books they downloaded was piracy. That is where the damages come from.
Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.
I think your first paragraph is entirely congruent with my first two paragraphs.
Your second paragraph is not what I'm discussing right now, and was not ruled on in the case you're referring to. I fully expect that, generally speaking, infringement will be on the users of the AI, rather than the models themselves, when it all gets sorted out.
I'm in agreement that it will be targeted at the users of AI as well. Once that prevails legally someone will try litigating against the users and the AI corporations as a common group.
>Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.
Nothing says it's illegal, either. If anything the courts are leaning towards it being legal, assuming it's not trained on pirated materials.
>A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn't illegal but that Anthropic wrongfully acquired millions of books through pirate websites.
https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...
I don’t follow. You’re punishing the publishing industry by punishing authors?
I'm saying that LLMs are worthwhile useful tools, and that I'm glad that we built them, and that the publishing industry, which holds the copyright on the material that we would use to train the LLMs, have had no hand in developing them, have done no research, and have actively tried to fight the process at every turn. I have no sympathy for them.
The authors have been abused by the publishing industry for many decades. I think they're just caught in the middle, because they were never going to get a payday, whether from AI or selling books. I think the percentage of authors that are commercially successful is sub 1%.
So the argument is because LLMs are useful and the publishing industry was not involved in their creation we should disregard the property rights of the publishing industry and allow using their work without a license? By that same argument (if something useful is being build, we ignore existing rights) shouldn't not also just take the code/models from OpenAI etc. and just publish them somewhere? Why not also their datacenters?
It's not really an argument. It's an observation that they sat on their hands while other industries out-innovated them. They were complacent and now they're paying the price.
We have laws and rules, but those are intended to work for society. When they fail to do so, society routes around them. Copyright in particular has been getting steadily weaker in practice since the advent of the Internet, because the mechanisms it uses to extract value are increasingly impractical since they are rooted in the idea of printed media.
Copyright is fundamentally broken for the modern world, and this is just a symptom of that.
> Folks point this out, but they don't ever talk about what the alternative was.
That LLMs would be priced at what they really cost society in energy? A lot of things are possible; whether they are economically feasible is determined by giving them a price. When that price doesn't reflect the real costs, society starts to waste work on weird things, like building large AI data centers because of a financial bubble. And yes, putting people out of business also comes with a cost.
"Innovation" is not an end goal.
Innovation is absolutely an end goal, at least in terms of our legal framework. The primary impetus for copyright and patent law is innovation: to give those who innovate their due, and I do think this stems from our society seeing innovation as an end goal. But the intent of the system is always different from its actual effect, and I'm fairly passionate about examining the shear.
I run my AI models locally, paying for the hardware and electricity myself, precisely to ensure the unit economics of the majority of my usage are something I can personally support. I do use hosted models regularly, though not often these days, which is why I say "the majority of my usage".
In terms of the concerns you express, I'm simply not worried. Time will sort it out naturally.
You’re willing to eliminate the entire concept of intellectual property for a possibility something might be a technological advancement? If creators are the reason you believe this advancement can be achieved, are you willing to provide them the majority of the profits?
That's an absolutely good tradeoff. There's no longer any need for copyright. Patents should go next. Only trademarks can stay.
> There's no longer any need for copyright
So you assign zero value to the process of creation?
Zero value to the process of production?
So people who write and produce books, shows and films should all do what? Give up their craft?
Creation isn't special, or constrained in number.
Process of creation itself is gratifying and valuable to those who will pursue it. No reason to additionally reward it.
Lamp lighters had to give up their craft I suppose and made way to a better world.
> Creation isn't special, or constrained in number.
>
> Process of creation itself is gratifying and valuable to those who will pursue it.
spoken like someone who has never made anything in the real world
Holding a boom mic in the air is not gratifying and valuable to anyone who has to do it.
The fruits of your labour are not your labour.
Bullshit. Read up and understand the history of these things and their benefits to society. There is a reason they were created in the first place. Over a very long time. With lots of thoughts into the tradeoff/benefits to society. That Disney fucked with it does not make the original tradeoff not a benefit to society.
The fact that you don't actually call out the specific benefit is telling. We're in a world of plenty and don't need copyright to have those benefits for our fellow humans.
Without agreeing or disagreeing with your view, I feel like the issue with that paradigm is inconsistency. If an individual "pirates", they get fines and possible jail time, but if a large enough company does it, they get rewarded by stockholders and at most a slap on the wrist by regulators. If as a society we've decided that the restrictions aren't beneficial, they should be lifted for everyone, not just ignored when convenient for large corporations. As it stands right now, the punishments are scaled inversely to the amount of damage that the lawbreaker is actually capable of doing.
> And it's a question of do we accept breaking law for the possibility to have the greatest technological advancement of the 21st century
You mean like, murder ?
The whole industry is based on breaking the law. You don’t get to be Microsoft, Google, Amazon, Meta, etc. without large amounts of illegality.
And the VC ecosystem and valuations are built around this assumption.
I don’t read this as “don’t show we broke the law,” I read it as “don’t give the user the false impression that there’s any legal issue with this generated content.”
There’s nothing law breaking about quoting publicly available information. Google isn’t breaking the law when it displays previews of indexed content returned by the search algorithm, and that’s clearly the approach being taken here.
Masked token prediction is reconstruction. It goes far beyond “quoting.”
This is incorrect. Two judges have now ruled that training on copyrighted data is fair use. https://www.whitecase.com/insight-alert/two-california-distr...
Training on copyrighted material is not illegal. Even in the lawsuit against Anthropic it was found to be fair use.
Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.
If my for profit/for sale product couldn't exist without inputting copyrighted works into it, then my product is derivative of those works. It's a pretty simple concept. No 'but human brains learn'. Humans aren't a corpo's for profit product.
'Would this product have the same value without the copyrighted works?'
If yes then it's not derivative. If no then it is.
There is US precedent for training being deemed not fair use. https://www.dglaw.com/court-rules-ai-training-on-copyrighted...
Why wouldn’t training be illegal? It’s illegal for me to acquire and watch movies or listen to songs without paying for them*. If consuming copyrighted material isn’t fair use, then it doesn’t make sense that AI training would be fair use.
* I hope it’s obvious but I feel compelled to qualify that, of course, I’m talking about downloading (for example torrenting) media, and not about borrowing from the library or being gifted a DVD, CD, book or whatever, and not listening/watching one time with friends. People have been successfully prosecuted for consuming copyrighted material, and that’s what I’m referring to.
That interpretation is not correct. The owner explicitly denied license to the data and then the company went to a third party to gain access to the data that they were denied license to.
> When building its tool, Ross sought to license Westlaw’s content as training data for its AI search engine. As the two are competitors, Thomson Reuters refused. Instead, Ross hired a third party, LegalEase, to provide training data in the form of “Bulk Memos,” which were created using Westlaw headnotes. Thomson Reuters’s suit followed, alleging that Ross had infringed upon its copyrighted Westlaw headnotes by using them to train the AI tool.
You’re contradicting the conclusion / interpretation written on dglaw.com? What is incorrect, exactly? It doesn’t seem like your summary challenges either my comment or the article I linked to, it’s not clear what you’re arguing. The court did find in this case that the use of the unlicensed data used for AI training was not fair use.
The case isn't on LLMs or transformers, it's on using some other form of non generative AI to create an index of case law. The details are light, but I would guess that the "AI" was just copying over the data from Thomson Reuters.
> Training on copyrighted material is not illegal.
The court decision this thread is about holds that it is, on the grounds that the training data was copied to the LLM's memory.
You can always vote, but there is always someone going through the back door, paying politicians and judges.
and training on mountains of open source code with no attribution is exactly the same
the code models should also be banned, and all output they've generated subject to copyright infringement lawsuits
the sloppers (OpenAI, etc) may get away with it in the US, but the developed world has far more stringent copyright laws
and the countries that have massive industries based on copyright aren't about to let them evaporate for the benefit of a handful of US tech-bros
No thank you. I am perfectly fine with AI training on my open source code and it is perfectly legal because my open source code does not include a license that bans AI training.
which license is that then?
because other than public domain they all require at least displaying the license, which "AI" ignores
Post-trained models are strongly inclined to produce responses similar to whatever earned them a high RL score; it's slightly wrong to keep thinking of LLMs as mere next-token prediction from the dataset's probability distribution, like some Markov chain.
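For contrast, here is what "next-token prediction from the dataset's probability distribution" means in the literal Markov-chain sense - a toy bigram sampler (purely illustrative; all names and the corpus are made up, and this is exactly the model the comment says LLMs are *not*):

```python
import random
from collections import defaultdict

def train_bigram(tokens):
    # Record, for each token, every token that followed it in the corpus.
    # Sampling uniformly from this list reproduces the empirical
    # next-token distribution of the training data.
    counts = defaultdict(list)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur].append(nxt)
    return counts

def generate(model, start, length, seed=0):
    # Walk the chain: each step depends only on the previous token.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nxts = model.get(out[-1])
        if not nxts:
            break  # dead end: token never appeared mid-corpus
        out.append(rng.choice(nxts))
    return out

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(generate(model, "the", 5))
```

A post-trained LLM differs in both respects: its next-token distribution is conditioned on the whole context, and RL fine-tuning skews it away from the raw data distribution toward high-reward responses.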
I found this bit very revealing:
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
Another glimpse into the "mind" of a tech corporation allowing itself full freedom to profit from the vast body of human work available online, while explicitly declining any societal responsibility at all. It's the user's fault, he wrote an illegal prompt! We're only providing the "technology"!
This is largely how it works for nearly all copyrightable work. I can draw Mickey Mouse, but legally I'm not doing anything wrong until I try to sell it. It certainly doesn't put Crayola or Adobe at legal risk for me to do so.
But you are not the one drawing Mickey Mouse in this scenario, are you? You are instructing the AI company to draw something, or, closer to the original post, prompting it to generate the lyrics for song X.
Your prompt may be asking for something illegal (i.e. reproducing the lyrics), but the one reproducing the lyrics is the AI company, not you yourself.
In your example you are asking Adobe to draw Mickey Mouse, and Adobe happily draws a perfect rendition of Mickey Mouse for you, and you have to pay Adobe for that image.
This keeps coming up, and I am not a lawyer, but as far as I can tell none of that matters. I can pay someone to draw Mickey Mouse for me and hang it up in my house. If I invite people to visit my Mickey Mouse House and charge them for the privilege, I'm in violation. Maybe the artist I paid to draw the mouse is also in some smaller violation but it all comes back to distribution and impact. I don't think it devalues Mickey Mouse in any way if I have a slot machine that spits out pictures of Mickey Mouse. If it does devalue it, maybe it doesn't have much value to begin with.
Reproduction (again, IANAL) seems to involve a lot more than "I made it"; it comes down to how you use it and whether that use constitutes infringement.
EDIT: To add, genuine question, what does "asking" come down to? I can ask Photoshop to draw Mickey Mouse through a series of clever Mickey-Mouse-shaped brush strokes. I can ask Microsoft Word to reproduce lyrics by typing them in. At what gradient between those actions and text prompting am I (or OpenAI, or Adobe) committing copyright infringement?
Now I get where you are coming from (also not a lawyer):
- You asking the painter to create a Mickey Mouse painting: not illegal. You are still asking for a derivative work without permission, but if used privately you're good (this differs per jurisdiction)
- The artist creating the painting of a derivative work is acting illegally - they are selling you the picture, and hence this is a commercial act and trademark infringement
- Displaying the bought Mickey Mouse image publicly is likely infringement, but worse would be charging admission to show the picture; that would definitely be illegal
- If you were to hide the image in your basement and look at it privately, it would most likely not be illegal (private use - but see the first point, since this differs per jurisdiction)
Comparing violations doesn't really make sense (the artist creating it vs. you displaying it) - the act of creating the image for money is illegal. If the artist were creating the image for themselves, that would be fine.
Now, getting back to the LLM and your question, which the court also answered (jurisdiction: Germany): the court's opinion is that the AI recreating these lyrics by itself is illegal (think of the artist creating the image for you for money).
Personally, I think the key part and similarity is the payment. You pay for using OpenAI. You pay for it creating those lyrics/texts. In my head I can construct similar reasoning for your Mickey Mouse example. If we took open-source LLMs and THEY created perfect lyrics, I think the court would have a much harder case to make. Who would you be suing, and for what kind of money? It would all be open source, and nobody would be paying anyone anything to recreate the lyrics. It is also very hard to prove that the LLMs were trained on copyrighted material - in the lyrics example, they may have ingested illegal lyrics-sharing sites, but they may also just have ingested Twitter or Reddit where people talk about the lyrics - how could any LLM know whether these contents were illegal to ingest?
Not really. If I ask an artist to draw me a Mickey Mouse (for money), who is committing copyright infringement?
It's an interesting observation that the big AI corps very much argue that learning "is the same as what humans do", so fair use. But when it comes to using that learning, they argue the other way, i.e. "this is just a machine; it's the person asking who is doing the infringing".
Companies care about material damages in practice. I'm not a lawyer but my understanding is that in that case, the artist drawing and selling the work is infringing (to a degree, because this seems to be a case Disney et al doesn't care about) but that if you take their work and publish and promote it and sell it, YOU become Disney's problem. If the wind and rain and erosion and time and God managed to produce a perfect post-Steamboat-Willie Mickey Mouse in the desert sand, visible from space, that wouldn't be infringement until you monetized it, called it Mickey Mouse and charged people to see it. A lot of the entities trying to get their piece of Infringement Pie seem to think their authority and their works are in the first position here instead; that my newfound capability to generate a Mickey Mouse from scratch on a whim affects their pockets, when in fact we're back to a variant of the classic piracy argument - I was not ever going to pay for it under any condition. If I decide this weekend to have one of the robots help me publish To Kill a Mockingbird Part 2, then sue me into the ground.