(oops, I didn't check the usernames properly, sorry about that)
I still don't think this is fully accurate.
The view I'm noticing is that people believe they have a right to the programs they produce, regardless of whether they write them by hand or prompt an LLM in the right way to produce that output. And this remains true both for work produced as an employee/company owner, and for code contributed to an OSS project.
Also, as an employee, the relationship is very different. I am hired to produce solutions to problems my company wants solved. This may involve writing code, finding OSS code, finding commercial code we can acquire, or generating code. As part of my contract, I relinquish to the company any rights I may have to any of this code, and of course I commit to not using any code without a valid license. However, if some of the code I produce for the company is not copyrightable at all, that is not in any way a breach of my contract - as long as the company is aware of how the code was produced and I'm not trying to deceive them, of course.
In practice, at least in my company, there has been a legal analysis, and the company has vetted a certain suite of AI tools for code generation. Using any other AI tools is not allowed and would be a breach of contract, but using the approved ones is 100% allowed. And I can guarantee you that our lawyers would assert copyright over any of the code generated in this way if I were to try to publish it or anything of the kind.
The way I see it, it looks like this:
1. Initially, when you claim that someone has violated your copyright, the burden is on you to make a convincing claim on why the work represents a copy or derivative of your work.
2. If the work doesn't obviously resemble your original, which is the case here, then the burden is still on you to prove that either
(a) it is actually very similar in some fundamental way that makes it a derived work, such as being a translation or a summary of your work,
or (b) it was produced by some kind of mechanical process and is not a result of the original human creativity of its authors.
Now, in regards to item 2b, there are two possible uses of LLMs that are fundamentally different.
One is actually very clear-cut: if I give an LLM a prompt consisting of the original work plus a request to create a new work, then the new work is quite clearly a derived work of the original, just as much as a zip file of a work is a derived work.
The other is very much not yet settled: if I give an LLM a prompt asking for it to produce a piece of code that achieves the same goal as the original work, and the LLM had in its training set the original work, is the output of the LLM a derived work of the original (and possibly of other parts of the training set)? Of course, we'll only consider the case where the output doesn't resemble the original in any obvious way (i.e. the LLM is not producing a verbatim copy from memory). This question is novel, and I believe it is being currently tested in court for some cases, such as the NYT's case against OpenAI.
The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.
I think this is a bit too broad. There are actually three possible cases.
When there is similar code, the only possible defense against the claim that you copied the original is to show that your process was a clean-room re-implementation.
If the code is completely different, then clean room or not is indeed irrelevant. The only way the author can claim you violated their copyright despite no apparent similarity is to prove that you followed some kind of mechanical process to generate the new code from the old one, such as using an LLM with the old code in the input prompt (TBD, completely unsettled: what if the old code was part of the training set, but not part of the input?). The burden of proof is on them to show that the dissimilarity is only apparent.
In realistic cases, you will have a mix of similar and dissimilar portions, and portions where the similarity is questionable. Each of these will need to be analyzed separately - and it's very likely that all the similar portions will need to be rewritten if you can't prove that they were not copied, directly or from memory, from the original, even if they represent a very small part of the work overall. Even if you wrote a 10k-page book, if you copied one whole page verbatim from another book, you will be liable for that page, and the author may force you to take it out.
> IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.
Actually, this is very much how people think about code.
Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of it. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working Claude-generated code to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.