No right to relicense this project

2026-03-05 8:37 · github.com

@a2mark

Hi, I'm Mark Pilgrim. You may remember me from such classics as "Dive Into Python" and "Universal Character Encoding Detector." I am the original author of chardet. First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story.

However, it has been brought to my attention that, in release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

I respectfully insist that they revert the project to its original license.


Comments

  • By antirez 2026-03-05 11:21 (11 replies)

    I believe that Pilgrim here does not understand very well how copyright works:

    > Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code

    This is simply not true. The reason the "clean room" concept exists is precisely that the law recognizes that independent implementations ARE possible. The "clean room" procedure is a trick to make litigation simpler; it is NOT required that you never be exposed to the original code. For instance, Linux was implemented even though Linus and other devs were well aware of Unix internals. What the law really asks is: does the new code copy something that was in the original one? The clean-room trick makes it simple to answer that it is not possible, and that if there are similar things, it is just by accident. But it is NOT a requirement.

    • By maybewhenthesun 2026-03-05 17:05 (11 replies)

      Regardless of the legal interpretations, I think it's very worrying if an automated AI rewrite of GPLed code (or any code for that matter) could somehow be used to circumvent the original license. That kinda takes out the one stick the open source community has to force soulless multinationals to contribute back to the open source projects they use.

      • By rao-v 2026-03-05 17:49 (4 replies)

        I’m genuinely surprised to see this not discussed more by the FOSS community. There are so many ways to blow past the GPL now:

        1. File by file rewrite by AI (“change functions and vars a bit”)

        2. One LLM writes a different-language (or pseudocode) version of each function that a second LLM translates back into code and tests for input/output parity

        The real danger is that this becomes increasingly undetectable in closed source code and can continue to sync with progress in the GPLed repo.

        I don’t think any current license has a plausible defense against this sort of attack.

        • By nmfisher 2026-03-06 2:11 (2 replies)

          I’ve never delved fully into IP law, but wouldn’t these be considered derivative works? They’re basically just reimplementing exactly the same functionality with slightly different names?

          This would be different from the “API reimplementation” (see Google vs Oracle) because in that case, they’re not reusing implementation details, just the external contract.

          • By rerdavies 2026-03-08 10:45 (1 reply)

            Because copyrights do not protect ideas. Thankfully. We are free to express ideas, as long as we do so in our own words. How that principle is applied in actual law, and how it is applied to software, is ridiculously complicated. But that is the heart of the principle at play here. The law draws a line between ideas (which cannot be copyrighted) and particular expressions of those ideas (e.g. the original source code), which are protected. However, it is an almost fractally complicated line which, in many places, relies on concepts of "fairness" and, because our legal system uses a system of legal precedent, depends on interpretation of a huge body of prior legal decisions.

            Not being a trained lawyer, or a Supreme Court justice, I cannot express a sensible position as to which side of the line this particular case falls. There are, however, enormously important legal precedents that pretty much all professional software developers use to guide their behaviour with respect to handling of copyrighted material (IBM v. Amdahl and Google v. Oracle, particularly) that seem to suggest to us non-lawyers that this sort of reimplementation is legal. (Seek the advice of a real lawyer if it matters.)

            • By rao-v 2026-03-08 19:51 (1 reply)

              Taking a step back, it seems fairly clear that wherever you set the bar, it should be possible to automate a system that reads code, generates some sort of intermediate representation at the acceptable level of abstraction and then regenerates code that passes an extensive set of integration tests … every day.

              At that point our current understanding of open source protections … fails?

              • By rerdavies 2026-03-09 5:05

                Depends whether you sit on the MIT side of open source, or the GPL side, I suppose.

          • By pas 2026-03-06 23:20

            there's usually a test for originality, and it involves asking things (from the jury) like, is it transformative enough?

            so if someone tells the LLM to write it in WASM and also make it much faster and use it in a different commercial sector... then maybe

            since 2023 the standard is much higher (arguably it was placed too low in 1993)

        • By singpolyma3 2026-03-06 2:38 (1 reply)

          "Change functions and vars a bit" isn't a rewrite. Anything where the LLM had access to the original code isn't a rewrite. This would just be a derivative work.

          However most of the industry willfully violates the GPL without even trying such tricks anyway so there are certainly issues

          • By UqWBcuFx6NV4r 2026-03-06 14:36

            The fact that you are drawing such absolute conclusions is indication enough that you are not qualified to speak on this.

        • By wakawaka28 2026-03-06 2:05

          #1 is already possible and always has been; I never heard of a case of anyone actually trying it. #2 is too nitpicky and unnecessarily costly for LLMs. It would be better to just ask it to generate a spec and tests based on the original, then create a separate implementation based on that. A person can do that today, free and clear. If LLMs are able to do this, we will just need to cope. Perhaps the future is in validating software instead of writing it.

        • By wlonkly 2026-03-06 3:24

          (1) sounds like a derivative work, but (2) is an interesting AI-simulacrum of a clean room implementation IF the first LLM writes a specification and not a translation.

      • By wareya 2026-03-06 0:35

        It's worrying, but it's consistent with how copyright law is currently written. Laws haven't caught up with what technology is currently capable of yet. The discussion should be whether, and if so how, our laws should be tweaked to stop this from getting out of hand, IMO.

      • By TruePath 2026-03-06 20:19 (1 reply)

        If the AI is good enough to truly implement the whole thing to a similar level of reliability without copying it then who cares. At that point you should be able to decompile any program you want and find enough information inside that an AI can go write a similar quality program from the vague information about the call graph. We've transcended copyright in computer code.

        If it can't and it costs a bunch of money to clean it up then same as always.

        OTOH, if what is actually happening is just that it is rewording the existing code so it looks different, then it is still going to run afoul of copyright. You can't just rewrite Harry Potter with different words.

        Note that even with Google v. Oracle, it was important that they didn't need the actual code; the headers were enough to get the function calls. Yes, it's true that a clean room isn't required, but when you have an AI and you can show that it can't do it a second time without looking at the source (not just function declarations), that's pretty strong evidence.

      • By therealpygon 2026-03-05 18:55 (1 reply)

        Take AI out…if a person can do it, which they can, the situation hasn’t changed. Further, it was a person who did it, with the assistance of AI. Also, the notion that you “can’t be exposed to the code before writing a compatible alternative” is utterly false in their arguments. In fact, one could take every single interface definition they have defined and use those interfaces directly to write their own, because this (programmatic) interface code is not covered by copyright (with an implicit fair-use exemption due to the fact that the software cannot operate without activating said interfaces). The Java lawsuit set that as precedent with the JDK. A person could have absolutely rewritten this software using the interfaces and their knowledge, which is perfectly legal if they don’t literally copy and re-word code. Now, if it IS simply re-worded copies of the same code and otherwise the entire project structure is basically the same, it’s a different story. That doesn’t sound like what happened.

        Finally, how exactly do people think corporations rewrite portions of code that were contributed before re-licensing under a private license? It is ABSOLUTELY possible to rewrite code and relicense it.

        Edit: Further, do these people think that if you contribute to a project, that project is beholden to your contribution permanently and it can never be excised? That seems like it would blatantly violate the original authors' right to exercise their own control of the code without those contributions, which is exactly the purpose of a rewrite.

        • By rmast 2026-03-06 6:54

          As part of the relicensing ZeroMQ did a few years ago, they sought permission from all previous contributors (yes, it was a multi-year effort). Code contributions for which they weren’t able to get permission to relicense resulted in the corresponding lines being removed (or the functionality rewritten from scratch).

      • By ottah 2026-03-06 2:49

        It cuts both ways. You can write a GPL version of a proprietary or permissively licensed program. The only difference is the effort of the rewrite is (theoretically) easier.

        (I have my doubts the rewrite is a reasonably defect free replacement)

      • By ineedasername 2026-03-07 5:05

        It’s less worrying to me given that a year ago this would have been considerably harder to do, requiring much more time, effort, and money. A year from now it will be even easier. All of this means that one aspect of the mission that brought about the need for a license like this is now fundamentally easier, whether or not the license is used. There can be less worry about software being locked up in closed source overall.

      • By TophWells 2026-03-06 13:40

        True, but if that is found to be how it works then an automated AI rewrite of closed-source code is just as unbound by the original license. Which is a much bigger win for the open-source community, since any closed-source software can become the inspiration for an open-source project.

      • By luma 2026-03-06 3:53 (1 reply)

        If automated AI rewrites are generally feasible, then the marginal price of nearly all software trends to zero.

        • By robinsonb5 2026-03-06 8:36

          If code becomes essentially free (ignoring for a moment the environmental cost or the long term cost of allowing code generation to be tollboothed by AI megacorps) the value of code must lie in its track record.

          The 5-day-old code in chardet has little to no value. The battle-tested years-old code that was casually flushed away to make room for it had value.

      • By wakawaka28 2026-03-06 2:02 (1 reply)

        Soulless multinationals often want to share costs with other soulless multinationals, just like individuals do. So I think there will always be publicly shared code. The real question is whether this code will be worth much if it can be implemented so quickly by a machine.

        • By judahmeek 2026-03-06 3:51

          Implementation is only one of the costs shared through open-source projects.

          There are others, such as security vulnerability detection, support, & general maintenance.

      • By singpolyma3 2026-03-06 2:36

        If it actually is a rewrite, it's not "circumventing"; it's just a new thing.

      • By CamperBob2 2026-03-05 18:40 (1 reply)

        > That kinda takes out the one stick the open source community has to force soulless multinationals to contribute back to the open source projects they use.

        I'll trade that stick for what GenAI can do for me, in a heartbeat.

        The question, of course, is how this attitude -- even if perfectly rational at the moment -- will scale into the future. My guess is that pretty much all the original code that will ever need to be written has already been written, and will just need to be refactored, reshaped, and repurposed going forward. A robot's job, in other words. But that could turn out to be a mistaken guess.

        • By beepbooptheory 2026-03-05 18:52 (3 replies)

          I think it's very weird but valid I guess to want to be just atomic individual in constant LLM feedback loop. But, at risk of sounding too trite and wholesome here, what about caring for others, the world at large? If you wanna get your thing to rewrite curl or something, that's again really weird but fine, but just don't share it or try to make money off of it. Isn't that like even the rational position here if you still wanna have good training materials for future models? These need not be conflicting interests! We can all be in this together, even if you wanna totally fork yourself into your own LLM output world.

          What happened to sticking up for the underdogs? For the goodness of well-made software in itself, for itself? Isn't that what gave you all the stuff you have now? Don't you feel at least a little grateful, if maybe not obliged? Maybe we can start there?

          • By lukeschlather 2026-03-05 20:56

            > If you wanna get your thing to rewrite curl or something, that's again really weird but fine, but just don't share it or try to make money off of it.

            The whole point of the GPL is to encourage sharing! Making money off of GPL code is not encouraged by the text of the license, but it is encouraged by the people who wrote the licenses. Saying "don't share it" is antithetical to the goals of the free software movement.

            I feel like everyone is getting distracted by protecting copyright, when in fact the point of the GPL is that we should all share and share alike. The GPL is a negotiation tactic, it is not an end unto itself. And curl, I might note, is permissively licensed so there's no need for a clean room reimplementation. If someone's rewriting it I'm very interested to hear why and I hope they share their work. I'm mostly indifferent to how they license it.

          • By joquarky 2026-03-06 0:50 (1 reply)

            > what about caring for others, the world at large

            30 years of experience in the tech industry taught me that this will get you nowhere. Nobody will reciprocate generosity or loyalty without an underlying financial incentive.

            > What happened to sticking up for the underdogs?

            Underdogs get SPACed out and dump the employees that got them there.

            • By beepbooptheory 2026-03-06 1:52

              Grateful I do not share your experiences. But I'm sure your viewpoint here is hard won. Sorry.

          • By CamperBob2 2026-03-05 19:05 (1 reply)

            Everything I have now arose from processes of continuous improvement, carried out by smart people taking full advantage of the best available tools and technologies including all available means of automation.

            It'll be OK.

            • By beepbooptheory 2026-03-05 20:08 (1 reply)

              Ah well, I tried... To paraphrase Nietzsche, a man can be measured by how well he sleeps at night. I can only hope you stay well rested into this future ;).

              And yes, it will be ok!

              • By CamperBob2 2026-03-05 21:25 (1 reply)

                Ah, Nietzsche. "They call him Ubermensch, 'cause he's so driven." He told us that man is a thing that will be surpassed, and asked what we've done to surpass him. The last thing I want to do is get in the way of the people doing it.

    • By dragonwriter 2026-03-05 15:07

      Neither does the maintainer, who claims a mechanical test of structural similarities can prove anything either way with regard to whether it is legally a derivative work (or even a mechanical copy without the requisite new creative work to be a derivative work).

      And then Pilgrim is again wrong in saying that the use of Claude definitively makes it a derivative work because of the inability to prove that the work in question did not influence the neurons involved.

      It is all dueling lay misreadings of copyright law, but it is also an area where the actual specific applicable law, on any level specific enough to cleanly apply, isn’t all that clear.

    • By simiones 2026-03-05 16:01 (2 replies)

      I think this is a bit too broad. There are actually three possible cases.

      When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

      If the code is completely different, then clean room or not is indeed irrelevant. The only way the author can claim that you violated their copyright despite no apparent similarity is for them to have proof you followed some kind of mechanical process for generating the new code based on the old one, such as using an LLM with the old code as input prompt (TBD, completely unsettled: what if the old code is part of the training set, but was not part of the input?) - the burden of proof is on them to show that the dissimilarity is only apparent.

      In realistic cases, you will have a mix of similar and dissimilar portions, and portions where the similarity is questionable. Each of these will need to be analyzed separately - and it's very likely that all the similar portions will need to be re-written again if you can't prove that they were not copied directly or from memory from the original, even if they represent a very small part of the work overall. Even if you wrote a 10k page book, if you copied one whole page verbatim from another book, you will be liable for that page, and the author may force you to take it out.

      • By Someone 2026-03-05 16:16 (3 replies)

        > When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

        Yes, but you do not have to prove that you haven’t copied the original; you have to prove you didn’t infringe copyright. For that there are other possible defenses, for example:

        - fair use

        - claiming the copied part doesn’t require creativity

        - arguing that the copied code was written by AI (there is case law saying AI-generated art can't be copyrighted: https://www.theverge.com/2023/8/19/23838458/ai-generated-art... It's not impossible that judges will make similar judgments for AI-generated programs)

        • By kube-system 2026-03-05 17:11 (1 reply)

          Courts have ruled that you can't assign copyrights to a machine, because only humans qualify for human rights. ** There is not currently a legal consensus on whether or not the humans using AI tools are creating derivative works when they use AI models to create things.

          ** this case is similar to an old case where a ~~photographer~~ PETA claimed a monkey owned a copyright to a photo, because they said a monkey took the photo completely on their own. The court said "okay well, it's public domain then because only humans can have copyrights"

          Imagine you put a harry potter book in a copy machine. It is correct that the copy machine would not have a copyright to the output. But you would still be violating copyright by distributing the output.

        • By pseudalopex 2026-03-05 16:36

          > there’s jurisdiction that says AI-generated art can’t be copyrighted

          The headline was misleading. The courts said what Thaler could have copyrighted was a complicated question they ignored because he said he was not the author.

        • By gpm 2026-03-06 3:27

          - Arguing that you owned the copyright on the copied code (the author here has apparently been the sole maintainer of this library since 2013, not all, but a lot of the code that could be copied here probably already belongs to him...)

      • By dmurvihill 2026-03-07 0:17 (2 replies)

        The burden of proof is completely uncharted when it comes to LLMs. Burden of proof is assigned by court precedent, not the Copyright Act itself (in US law). Meaning, a court looking at a case like this could (should) see the use of an LLM trained on the copyrighted work as a distinguishing factor that shifts the burden to the defense. As a matter of public policy, it's not great if infringers can use the poor accountability properties of LLMs to hide from the consequences of illegally redistributing copyrighted works.

        • By simiones 2026-03-09 11:18

          The way I see this it looks like this:

          1. Initially, when you claim that someone has violated your copyright, the burden is on you to make a convincing claim on why the work represents a copy or derivative of your work.

          2. If the work doesn't obviously resemble your original, which is the case here, then the burden is still on you to prove that either

          (a), it is actually very similar in some fundamental way that makes it a derived work, such as being a translation or a summary of your work

          or (b), it was produced following some kind of mechanical process and is not a result of the original human creativity of its authors

          Now, in regards to item 2b, there are two possible uses of LLMs that are fundamentally different.

          One is actually very clear cut: if I give an LLM a prompt consisting of the original work + a request to create a new work, then the new work is quite clearly a derived work of the original, just as much as a zip file of a work is a derived work.

          The other is very much not yet settled: if I give an LLM a prompt asking for it to produce a piece of code that achieves the same goal as the original work, and the LLM had in its training set the original work, is the output of the LLM a derived work of the original (and possibly of other parts of the training set)? Of course, we'll only consider the case where the output doesn't resemble the original in any obvious way (i.e. the LLM is not producing a verbatim copy from memory). This question is novel, and I believe it is being currently tested in court for some cases, such as the NYT's case against OpenAI.

        • By rerdavies 2026-03-08 11:57

          On the other hand, as a matter of public policy, nobody should be able to claim copyright protection for the process of detecting whether a string is correctly formed unicode using code that in no material way resembles the original. This is not rocket science.

    • By red_admiral 2026-03-05 15:07

      I'm with you here, but I see another problem.

      The expected functionality of chardet (detect the unicode encoding) is kind of fixed - apart from edge cases and new additions to unicode, you'd expect the original and new implementations to largely pass the same tests, and have a lot of similar code such as for "does this start with a BOM".

      The fact that JPlag shows such a low percentage overlap for an implementation of "the same interface" is convincing evidence, for me, that it's not just plagiarised.
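      For instance, a BOM check is so constrained by the Unicode spec that any two honest implementations will converge on roughly the same shape (a minimal illustrative sketch, not chardet's actual code):

```python
# Minimal BOM sniff (illustrative sketch, not taken from chardet).
# Longer BOMs must be tested first: UTF-32-LE's BOM begins with
# UTF-16-LE's two bytes, so order matters.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None

print(sniff_bom(b"\xef\xbb\xbfhello"))  # prints utf-8-sig
```

      With functionality this fixed, some surface-level overlap is expected even between genuinely independent implementations, which makes a low similarity score all the more notable.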

    • By cubefox 2026-03-05 14:33 (1 reply)

      If you let an LLM merely rephrase the codebase, that's like letting it rephrase the Harry Potter novels. Which, I'm pretty sure, would still be considered a copy under copyright law, not an original work, despite not copying any text verbatim.

      • By actsasbuffoon 2026-03-05 17:44 (1 reply)

        But what if it didn’t summarize Harry Potter? What if it analyzed Harry Potter and came back with a specification for how to write a compelling story about wizards? And then someone read that spec and wrote a different story about wizards that bears only the most superficial resemblance to Harry Potter in the sense that they’re both compelling stories about wizards?

        This is legitimately a very weird case and I have no idea how a court would decide it.

        • By cubefox 2026-03-06 7:18

          That seems unrelated to what happened.

    • By kahnclusions 2026-03-05 23:39

      I’m surprised they think the AI generated rewrite is even copyrightable.

    • By TZubiri 2026-03-05 14:43 (4 replies)

      Ok sure, in the alternative, here's the argument:

      The AI was trained with the code, so the complete rewrite is tainted and not a clean room. I can't believe this would need spelling out.

      • By pocksuppet 2026-03-05 15:14 (2 replies)

        "Tainted rewrite" isn't a legal concept either. You have to prove (on balance of probabilities - more likely than not) that the defendant made an unauthorized copy, made an unauthorized derivative work, etc. Clean-room rewriting is a defense strategy, because if the programmer never saw the original work, they couldn't possibly have made a derivative. But even without that, you still have to prove they did. It's not an offence to just not be able to prove you didn't break the law.

        • By rmast 2026-03-06 7:00

          If you wanted to do the clean-room approach for something like chardet in a less controversial way, instead of having the AI do all the work couldn’t the AI generate the spec and then a human (with no exposure to the original code) do an initial implementation based on the spec?

      • By Manuel_D 2026-03-05 15:36

        As others pointed out, the notion of a "clean room" rewrite is to make a particularly strong case of non-infringement. It doesn't mean that anything other than a clean-room implementation is an infringement.

      • By jdauriemma 2026-03-05 15:08 (1 reply)

        This is interesting and I'm not sure what to make of it. Devil's advocate: the person operating the AI also was "trained with the code," is that materially different from them writing it by hand vs. assisted by an LLM? Honestly asking, I hadn't considered this angle before.

        • By cardanome 2026-03-05 15:38 (1 reply)

          If you worked at Microsoft and had access to the Windows source code you probably should not be contributing to WINE or similar projects as there would be legal risk.

          So for this case, not much different legally. Of course there is the practical difference just like there is between me seeing you with my own eyes and me taking a picture of you.

          "Training" an LLM is not the same as training a human being. It's a metaphor. It's like confusing the save icon with an actual floppy disk.

          I could say I "trained" my printer to print copyrighted material by feeding it bits, but that would be pure sophistry.

          The problem is that the law hasn't really caught up with our brave new AI future yet, so lots of decisions are up in the air. Plus, governments are incentivized to look the other way regarding copyright abuses when it comes to AI, as they think that having competitive AI is of strategic importance.

          • By jdauriemma 2026-03-05 17:32

            > "Training" an LLM is not the same as training a human being. It's a metaphor. It's like confusing the save icon with an actual floppy disk.

            Maybe? But the design of the floppy disk is for data storage and retrieval per se. It can't give you your bits in a novel order like an LLM does (by design). From what I can tell in this case, the output is significantly differentiated from the source code.

      • By senko 2026-03-05 15:08 (1 reply)

        Reread the parent: clean room is not required.

        • By TZubiri 2026-03-06 23:12

          Oh, got it.

          Parent was making a claim about clean room not being required, without making claims about whether LLM coding is or isn't clean room.

    • By spwa4 2026-03-05 15:18 (2 replies)

      Given that LLMs were trained on the repository directly, it's not just the case that anything made by the LLM is a derivative work, the LLM ITSELF is a derivative work. After all, they all are substantially based on GPL licensed works by others. The standard courts have always used for "substantially based" by the way, is the ability to extract from the new work anything bigger than an excerpt of the original work.

      So convincing evidence, by historical standards, that ChatGPT, Gemini, Copilot AND Claude are all derivative works of the GPL linux kernel can be gotten simply by asking "give me struct sk_buff", then keep asking until you're out of the headers (say, ask how a network driver uses it).

      That means if courts are honest (and they never are when it comes to GPL) OpenAI, Google and Anthropic would be forced to release ALL materials needed to duplicate their models "at cost". Given how LLMs work that would include all models, code, AND training data. After all, that is the contract these companies entered into when using the GPL licensed linux kernel.

      But of course, to courts copyright applies to you when Microsoft demands it ($30000 per violation PLUS stopping the use of the offending file/torrent/software/... because such measures are apparently justified for downloading a $50 piece of software), it does not apply to big companies when the rules would destroy them.

      The last time this was talked about someone pointed out that Microsoft "stole", as they call it, the software to do product keys. They were convicted for doing that, and the judge even increased damages because of Microsoft's behavior in the case.

      But there is no way in hell you'll ever get justice from the courts in this. In fact courts have already decided that AI training is fair use on 2 conditions:

      1) that the companies acquired the material itself without violating copyright. Of course it has already been proven that this is not the case for any of them (they scraped it without permission, which has been declared illegal again and again in the file sharing trials)

      2) that the models refuse to reproduce copyrighted works. Now go to your favorite model and ask "Give me some code written by Linus Torvalds": not a peep about copyright violation.

      ... but it does not matter, and it won't matter. Courts are making excuses to allow LLM models to violate any copyright, the excuse does not work, does not convince rational people, but it just doesn't matter.

      But of course, if you thought that just because they cheat against the law to make what they're already doing legal, they'll do the same for you, help you violate copyright, right? After all, that's how they work! Ok now go and ask:

      "Make me an image of Mickey Mouse peeling a cheese banana under an angry moon"

      And you'll get a reply "YOU EVIL COPYRIGHT VILLAIN". Despite, of course, Mickey Mouse no longer being covered under copyright!

      And to really get angry, find your favorite indie artist, and ask it to make something based on their work. Even "Make an MC Escher style painting of Sonic the Hedgehog" ... even that doesn't count as copyright violation, only the truly gigantic companies deserve copyright protection.

      • By dragonwriter 2026-03-066:411 reply

        > Given that LLMs were trained on the repository directly, it's not just the case that anything made by the LLM is a derivative work, the LLM ITSELF is a derivative work.

        That’s not how “derivative works”, well, work.

        First of all, a thing can only be a derivative work if it is itself an original work of authorship.

        Otherwise, it might be (or contain) a complete copy or a partial copy of one or more source works (which, if it doesn't fall into a copyright exception, would still be at least a potential violation), but it's not a derivative work.

        • By spwa4 2026-03-069:511 reply

          So you're saying LLMs don't count as an original work and so have zero copyright protection? So anyone running those models can just freely copy them if they have access to them? And, of course, it means distillation attacks, even if they do turn out to copy the OpenAIs/Anthropic/... model are just 100% perfectly legal? I mean paying someone to break into the DC and then putting the model on torrent would allow anyone downloading it to use it, legally. Because that would be the implication, wouldn't it?

          Plus, if this is true, it would be a loophole. Plus this is totally crazy.

          It would be great if courts declared WHAT is the case. But they won't, because copyright only protects massive companies.

          • By dragonwriter 2026-03-0610:371 reply

            > So you're saying LLMs don't count as an original work and so have zero copyright protection?

            No, I'm saying that your explanation of what makes something a derivative work is wrong. Now, personally, I think there is a very good argument that LLMs and similar models, if they have a copyright at all, do so only because of whatever copyright can be claimed on the training set as a work of its own (which, if it exists, would be a compilation copyright), as a work of authorship of which it is a mechanical transformation (similar to object code having a copyright as a consequence of the copyright on the source code, which is a work of authorship). It's also quite arguable that they are not subject to copyright, and many have made that argument.

            > So anyone running those models can just freely copy them if they have access to them?

            I'm not arguing for that, but yes, that is the consequence if they are not subject to copyright, assuming no other (e.g., contractual) prohibition binds the parties seeking to make copies.

            > And, of course, it means distillation attacks, even if they do turn out to copy the OpenAIs/Anthropic/... model are just 100% perfectly legal?

            Distillation isn't an “attack” and probably isn't a violation of copyright even if models are protected: distillers literally interact with the model through its interface to reproduce its function; it is functional reverse engineering.

            Distillation is a violation of ToS, for which there are remedies outside of copyright.

            > I mean paying someone to break into the DC and then putting the model on torrent would allow anyone downloading it to use it, legally.

            Paying someone to break into the DC and do that would subject you to criminal charges for burglary and conspiracy, and civil liability for the associated torts as well as for theft of trade secrets covering the resulting harms, even without copyright protection.

            > Plus, if this is true, it would be a loophole. Plus this is totally crazy.

            It's not a “loophole” that copyright law only covers works of original authorship; it is the whole point of copyright law.

            > It would be great if courts declared WHAT is the case.

            If there is a dispute which turns on what is the case, courts will rule one way or the other on the issues necessary to resolve it. Courts (in the US at least) do not rule on issues not before them, except to the extent that a general rule which resolves the immediate case but covers somewhat more than it can usefully be articulated by an appellate court.

            > But they won't, because copyright only protects massive companies.

            Leaving out any question of whether the premise of this claim is true, the conclusion doesn't follow from it, since “what is the case” here is the kind of thing that is quite likely to be an issue between massive companies at some point in the not too distant future, requiring courts to resolve it even if they only address the meaning of copyright law for that purpose.

            • By spwa4 2026-03-0612:091 reply

              Your first 3-4 arguments I just read as trying to weasel out from under the GPL. Because everyone trains on GPL code and if the GPL applies to the result ... well clearly you know the implications of that.

              And btw: so a "compilation copyright" would apply to the training data. Great. That only means, of course, that if they publish their training data (like they agreed to when using GPL code to base their models on), people can't republish the exact same collection under different conditions (BUT they can under the same conditions). Everyone will happily follow that rule, don't worry.

              > Paying someone to break into the DC and do that would subject you to criminal charges for burglary and conspiracy, and civil liability for the associated torts as well as for theft of trade secrets covering the resulting harms, even without copyright protection.

              I don't claim the break-in would be legal, but without copyright protection, if that made a model leak, it would be fair game for everyone to use.

              > Distillation is a violation of ToS, for which there are remedies outside of copyright.

              But the models were created by violating the ToS of web servers! This has the exact same problem the copyright violations have, only far, far bigger! Scraping web servers is a violation of the ToS of those servers. For example [1]. Almost all have language somewhere that only allows humans to browse them; IF bots are allowed at all (certainly not always), it's only specific bots for the purpose of indexing. So this is a much bigger problem for AI labs than even the GPL issue.

              So yes, if you wanted to make the case that the AI labs, and large companies, violate any kind of contract, not just copyright licenses, excellent argument. But I know how that goes already: I'm a consultant, and I've had to sue two very large companies over terms of payment, and won. In one case, I've had to do something called "forced execution" of the payment order (i.e. going to the bank and demanding the bank execute the transaction against a random account of the company, against the will of the large company. Let me tell you, banks DO NOT like to do this.)

              Btw: what model training is doing, obviously, is distilling from the work, from the brain, of humans, against the will of those humans, and without paying for it. So in any reasonable interpretation, that's also a ToS violation. Probably a lot more implicit than the ones spelled out on websites, but not fundamentally different.

              [1] https://www.bakerdatacounsel.com/blogs/terms-of-use-10-thing...

              • By dragonwriter 2026-03-073:29

                > Your first 3-4 arguments I just read as trying to weasel out from under the GPL.

                I haven't talked about any license, or given any thought to any particular license in any of this; I don't know where you are reading anything about the GPL specifically into it.

                None of this has anything to do with the GPL, except that the GPL only is even necessary where there is something to license because of a prohibition on copyright law.

                > And btw: that a "compilation copyright" would apply to training data. Great. That only means, of course, that if they publish their training data (like they agreed to when using GPL code to base their models on), people can't republish the exact same collection under different conditions (BUT they can under the same conditions).

                No, that's not what it means, and I don't know where you got the "other terms" or the dependency on publication from; neither is from copyright law.

                > But the models were created by violating ToS of webservers!

                And, so what?

                To the extent those terms are binding (more likely the case for sites where there is affirmative assent to the conditions, like ones that are gated on accounts with a signup process that requires agreeing to the ToS, e.g., “clickwrap”), there are remedies. For those where the conditions are not legally binding (more like the case where the terms are linked but there is no access gating, clear notice, or affirmative assent), well, they aren't binding.

                > Btw: what model training is doing, obviously, is distilling from the work, from the brain, of humans, against the will of those humans, and without paying for it. So in any reasonable interpretation, that's also a ToS violation.

                Uh, what? We are just creating imaginary new categories of intellectual property and imaginary terms of service and imaginary bases for those terms to be enforceable now?

      • By vsl 2026-03-066:37

        The LLM would, under that argument, be a transformative derivative work, which has important fair use implications (that don’t exist in the chardet case)…

    • By aaron695 2026-03-0513:03

      [dead]

    • By jacquesm 2026-03-0512:00

      This is correct. I think any author of a main chunk of code that they claim ownership of (which is probably all of us!) should at least study the basics of copyright law. Getting little details wrong can cost you time, money and eventually your business if you're not careful.

  • By dathinab 2026-03-0511:067 reply

    The argument that a rewrite is a copyright violation because they are familiar with the code base is not fully sound.

    "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

    Or else an artist who has seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, as people could claim copyright violation.

    What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code.

    This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar), it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewrite.

    But you very much can rewrite a project under a new license even if you have in-depth knowledge. IFF you don't have the old project open or look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

    Though while doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite from copyright claims (except if it's internally so completely different that it stops any claims of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way).

    In the end while technically "legally hard to defend" != "illegal", for companies it's most times best to treat it the same.

    • By simiones 2026-03-0512:212 reply

      > "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law then copyright law.

      On the contrary. Except for discussions about punitive damages and so on, insider knowledge or lack thereof is completely irrelevant to patent law. If company A has a patent on something, they can assert said patent against company B regardless of whether any person in company B had ever seen or heard of company A and their patent. Company B could have a legal trail proving they invented their product that matches the patent from scratch with no outside knowledge, and that they had been doing this before company A had even filed their patent, and it wouldn't matter at all - company A, by virtue of filing and being granted a patent, has a legal monopoly on that invention.

      In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work. Even more, you have your own copyright over your own work, and can assert it over anyone that tries to copy your work without permission, despite an identical work existing and being owned by someone else.

      Now, purely in principle this would remain true even if you had seen the other work. But in reality, it's impossible to convince any jury that you happened to produce, entirely out of your own creativity, an original work that is identical to a work you had seen before.

      > But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so.

      No, this is very much false. You will never be able to win a court case on this, as any significant similarity between your work and the original will be considered a copyright violation, per the preponderance of the evidence.

      • By aleph_minus_one 2026-03-0514:082 reply

        > In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work.

        This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

        > https://www.travelandleisure.com/photography/illegal-to-take...

        > https://www.headout.com/blog/eiffel-tower-copyright/

        • By simiones 2026-03-0514:34

          This has no relation to what I was saying. Taking a photo of a copyrighted work is a method for creating a copy of said work using a mechanical device, so it is of course covered by copyright (whether buildings or light shows fall under copyright is an irrelevant detail).

          What I'm saying is that if you, say, create an image of a red oval in MS Paint, you have copyright over said image. If 2 years later I create an identical image myself having never seen your image, I also have copyright over my image - despite it being identical to your image, I have every right to sell copies of my image, and even to sue someone who distributes copies of my image without my permission (but not if they're distributing copies of your image).

          But if I had seen your image of a red oval before I created mine, it's basically impossible for me to prove that I created my own image out of my own creativity, and I didn't just copy yours. So, if you were to sue me for copyright infringement, I would almost certainly lose in front of any reasonable jury.

        • By chimeracoder 2026-03-0514:151 reply

          > This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

          That example is not analogous to the topic at hand.

          But furthermore, it also is specific to French/European copyright law. In the US, the US Copyright Act would not permit restrictions on photographs of architectural works that are visible from public spaces.

          • By jerrysievert 2026-03-0515:161 reply

            actually, the US Copyright Act does in fact allow restrictions on photographs of architectural works that are visible from public spaces:

            https://en.wikipedia.org/wiki/Portlandia_(statue)

            the Portlandia statue is one such architectural work - and its creator is fairly litigious.

            • By chimeracoder 2026-03-0515:21

              I don't know the details of that specific case so I can't speak to it, but the text of the AWCPA is very clear:

              > The copyright in an architectural work that has been constructed does not include the right to prevent the making, distributing, or public display of pictures, paintings, photographs, or other pictorial representations of the work, if the building in which the work is embodied is located in or ordinarily visible from a public place.

              This codifies an already-established principle in US law. French law does not have that same principle.

    • By twoodfin 2026-03-0512:173 reply

      If I read Mario Puzo’s The Godfather and then proceed to write a structurally identical novel with many of the same story beats and character types, it will not be difficult to convince a jury exposed to these facts that I’ve created a derivative work.

      On the other hand, if I can prove to the jury’s satisfaction that I’ve never been exposed to Puzo’s work in any form, it’s independent creation.

      • By Manuel_D 2026-03-0515:501 reply

        To the contrary, there have been many cases of very similar novels with largely identical plot points and settings that survive copyright allegations, even if the author was exposed to the original work.

        For a rather entertaining example (though raunchy, for a heads up): https://www.youtube.com/watch?v=zhWWcWtAUoY&themeRefresh=1

        • By twoodfin 2026-03-0518:231 reply

          Sure, but there’s some level of slavish copying with the serial numbers filed off that would convince a judge or a jury that it’s derivative.

          • By Manuel_D 2026-03-060:54

            Sure, but that level is a lot higher than what a lot of commenters seem to think.

      • By helsinkiandrew 2026-03-0513:081 reply

        In the case of chardet, though, wouldn't it be more like you being the publisher of The Godfather, withdrawing it from print and releasing a novel with the same name and much of the same plot and characters, but claiming the new version was an independent creation?

        • By pocksuppet 2026-03-0515:15

          That's even worse for your case.

    • By helsinkiandrew 2026-03-0511:47

      If the new maintainers used Claude as their “fancy code generator” (there’s a Claude.md file in the repository, so it seems so) then it was almost certainly trained on the chardet source code.

    • By oneeyedpigeon 2026-03-0511:421 reply

      > And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

      How different does the new code have to be from the old code and how is that measured?

      • By larodi 2026-03-0513:07

        Nobody can tell, and this is how we entered these very turbulent modern times of "everything can be retold" without punishment. LLMs are already doing it at large. While the original author is correct in terms of the LGPL, it is nearly impossible to say how different an expression of an idea has to be to count as a separate one. This is a truly fundamental philosophical question that may not have an easy answer.

    • By jmyeet 2026-03-0511:521 reply

      This is a bad argument.

      Think of a rewrite (by a human or an LLM) as a translation. If you wrote a book in English and somebody translated it into Spanish, it'd still be a copyright issue. The same goes for code rewrites.

      That's very different to taking the idea of a body of work. So you can't copyright the idea of a pirate taking a princess hostage and a hero rescuing her. That's too generic. But even here there are limits. There have been lawsuits over artistic works being too similar.

      Back to software: you can't copyright the idea of photo-editing software, but you can copyright the source code that produces that software. If you can somehow prompt an LLM to produce photo-editing software, or if a person writes it themselves, then you have what's generally referred to as a "cleanroom" implementation, and that's copyright-free (although you may have patent issues, which is a whole separate matter).

      But even if you prompted an LLM that way, how did the LLM learn what it needed? Was the source code of another project an input in its training? This is a legal grey area, currently. But I suspect it's going to be a problem.

      • By pera 2026-03-0512:20

        Suchir Balaji, the OpenAI researcher who was found dead in his flat just before testifying against his employer, published an excellent article related to this topic:

        When does generative AI qualify for fair use?

        https://suchir.net/fair_use.html

        Balaji's argument is very strong and I feel we will see it tested in court as soon as LLM license-washing starts getting more popular.

    • By bsenftner 2026-03-0513:06

      Hate to be "that guy" but in a corrupt legal system, which ours is, none of this matters. Who has the influence and dollars to make the decision theirs is all that matters.

    • By RcouF1uZ4gsC 2026-03-0511:244 reply

      I think you could have an LLM produce a detailed written English description of the complete logic of the program and its tests.

      Then use another LLM to produce code from that spec.

      This would be similar to the cleanroom technique.
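      As a sketch, that pipeline is just spec extraction followed by regeneration. Everything below is hypothetical: the `ask` parameter stands in for whatever model call you'd use (no particular LLM API is implied), and the stubs exist only to show the wiring.

```python
def extract_spec(source_code, ask):
    """Stage 1: turn code into an English-only behavioural spec.
    `ask` stands in for whatever model call you use (hypothetical)."""
    return ask("Describe, in prose only, the complete observable "
               "behaviour of this program:\n" + source_code)

def reimplement(spec, ask):
    """Stage 2: a different model, never shown the source, codes to the spec."""
    return ask("Write a fresh implementation matching this spec:\n" + spec)

# Wiring check with trivial stubs in place of real models:
spec = extract_spec("def add(a, b): return a + b",
                    ask=lambda p: "takes two numbers and returns their sum")
code = reimplement(spec, ask=lambda p: "def plus(x, y): return x + y")
assert spec == "takes two numbers and returns their sum"
assert "plus" in code
```

      Whether stage 2's output is actually free of the original's copyright is exactly the open question in this thread; the structure only mirrors the clean-room idea, it doesn't settle it.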

      • By simiones 2026-03-0512:33

        Producing a copy of a copyrighted work through a purely mechanical process is a clear violation of copyright. LLMs are absolutely no different from a copier machine in the eyes of the law.

        Original works can only be produced by a human being, by definition in copyright law. Any artifact produced by an animal, a mechanical process, a machine, a natural phenomenon etc is either a derived work if it started from an original copyrighted work, or a public domain artifact not covered by copyright law if it didn't.

        For example, an image created on a rock struck by lightning is not a copyright-covered work. Similarly, an image generated by a diffusion model from a randomly generated sentence is not a copyrightable work. However, if you feed a novel as a prompt to an LLM and ask for a summary, the resulting summary is a derived work of said novel, and it falls under the copyright of the novel's owner - you are not allowed to distribute copies of the summary the LLM generated for you.

        Whether the output of an LLM, or the LLM weights themselves, might be considered derived works of the training set of that LLM is a completely different discussion, and one that has not yet been settled in court.

      • By robinsonb5 2026-03-0511:40

        Perhaps - but an argument might still be made that the result is a derivative work of the original, given that it's produced by feeding the original work through automated tooling.

        But either way, deleting the original version from the repo and replacing it with the new version - as opposed to, say, archiving the old version and starting a new repo with the new version - would still be a dick move.

      • By robin_reala 2026-03-0511:36

        Assuming the second LLM hadn’t been trained on the existing codebase. Which in this case we can’t know, but can assume that it was.

      • By knollimar 2026-03-0511:381 reply

        Does the second LLM have the codebase in its training?

  • By Roritharr 2026-03-059:2115 reply

    As part of my consulting, I've stumbled upon this issue in a commercial context. A SaaS company whose platform's mobile apps are open source approached me with the following concern.

    One of their engineers was able to recreate their platform by letting Claude Code reverse engineer their Apps and the Web-Frontend, creating an API-compatible backend that is functionally identical.

    Took him a week after work. It's not as stable, the unit-tests need more work, the code has some unnecessary duplication, hosting isn't fully figured out, but the end-to-end test-harness is even more stable than their own.

    "How do we protect ourselves against a competitor doing this?"

    Noodling on this at the moment.
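    One piece of this that's easy to make concrete is the "functionally identical" check itself: a differential harness that drives both the reference backend and the rewrite with identical requests and diffs the responses. A toy sketch (both "backends" are plain in-process functions here, an assumption for brevity; a real harness would speak HTTP against the deployed services):

```python
def reference_backend(request):
    # Stand-in for the original service: a toy echo endpoint.
    return {"path": request["path"], "body": request["body"].upper()}

def rewritten_backend(request):
    # Stand-in for the regenerated, API-compatible service under test.
    return {"path": request["path"], "body": request["body"].upper()}

def differential_test(requests):
    """Return every request on which the two implementations disagree."""
    return [r for r in requests
            if reference_backend(r) != rewritten_backend(r)]

cases = [{"path": "/echo", "body": "hello"},
         {"path": "/echo", "body": "world"}]
assert differential_test(cases) == []  # no behavioural differences on these cases
```

    The same harness cuts both ways, of course: it's how the clone proved equivalence, and it's the cheapest way for the original vendor to detect drift if they change the API.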

    • By 3rodents 2026-03-0510:321 reply

      You're not describing anything new, you're describing progress. A company invests time and money and expertise into building a product, it becomes established, people copy it in 1/10th of the time, and the quality of products across the industry improves. Long before generative AI, Instagram famously copied Snapchat's stories concept in a weekend, and that is now a multi-multi-multi-billion contributor to Meta's bottom line.

      As engineers, we often think only about code, but code has never been what makes a business succeed. If your client thinks that their business's primary value is in the mobile app code they wrote, 1) why is it even open source? 2) the business is doomed.

      Realistically, though, this is inconsequential, and any time spent worrying about this is wasted time. You don't protect yourself from your competitor by worrying about them copying your mobile app.

      • By amelius 2026-03-0510:541 reply

        > You don't protect yourself from your competitor by worrying about them copying your mobile app.

        They did not copy the mobile app. They copied the service.

        • By 3rodents 2026-03-0513:32

          Replace “mobile app” with “backend” in my comment.

    • By IanCal 2026-03-059:331 reply

      You might be interested in the dark factory work here https://factory.strongdm.ai/

      They do something very similar for some of their work. It's hard to use external services, so they replicate them, and the cost of doing so has come down from "don't be daft, we can't reimplement Slack and Google Drive this sprint just to make testing faster" to realistic. They run the SDKs against the live services and against their own implementations until they don't see behaviour differences. Now they have a fast Slack and Drive and more (that do everything they need for their testing), accelerating other work. I'm dramatically shifting my concept of what's expensive and what isn't for development. What you're describing could have been done by someone before, but the difficulty of building that backend has dropped enormously. Even if the application were closed, you could probably, now or soon, do the same thing: build back to the core user stories, and build the app as well.

      You can view some of this as having things like the application as a very precise specification.

      Really fascinating moment of change.

      • By Garlef 2026-03-0511:32

        > It’s hard to use external services

        I think it's interesting to add what they use it for and why it's hard.

        What they use it for:

        - It's about automated testing against third party services.

        - It's not about replicating the product for end users

        Why using external services is hard/problematic

        - Performance: They want to have super fast feedback cycles in the agentic loop: In-Memory tests. So they let the AI write full in-memory simulations of (for example) the slack api that are behaviorally equivalent for their use cases.

        - Feasibility: The sandboxes offered by these services usually have usage limits (e.g. number of requests per month) that would easily be exhausted if attached to a test harness that runs every other minute in an automated BDD loop.
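        A minimal sketch of that in-memory fake pattern (the class and its methods are hypothetical stand-ins, not StrongDM's actual simulations, which are validated against the live SDKs):

```python
class InMemorySlack:
    """Hypothetical in-memory stand-in for a chat API: it answers the same
    calls the application code makes, with no network and no rate limits."""

    def __init__(self):
        self._channels = {}  # channel name -> list of message dicts

    def post_message(self, channel, text):
        msgs = self._channels.setdefault(channel, [])
        msg = {"text": text, "ts": str(len(msgs))}
        msgs.append(msg)
        return {"ok": True, "message": msg}

    def history(self, channel):
        return list(self._channels.get(channel, []))

# Test code talks to the fake exactly as it would to the real service.
slack = InMemorySlack()
slack.post_message("#builds", "build failed")
slack.post_message("#builds", "build fixed")
assert [m["text"] for m in slack.history("#builds")] == ["build failed",
                                                         "build fixed"]
```

        The whole point is that this runs in microseconds and never touches a sandbox quota, so it can sit inside a tight agentic test loop.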

    • By zozbot234 2026-03-059:35

      > "How do we protect ourselves against a competitor doing this?"

      If the platform is so trivial that it can be reverse engineered by an AI agent from a dumb frontend, what's there to protect against? One has to assume that their moat is not that part of the backend but something else entirely about how the service is being provided.

    • By littlecranky67 2026-03-059:311 reply

      Interesting case; IANAL, but it sounds legal and legit. The AI did not have exposure to the backend it re-implemented. The API itself is public and not protectable.

      • By bandrami 2026-03-059:335 reply

        OTOH as of yesterday the output of the LLM isn't copyrightable, which makes licensing it difficult

        • By graemep 2026-03-0510:472 reply

          As others have pointed out, this case is really about refusing to allow an LLM to be recognised as the author. The person using the LLM waived any right to be recognised as the author.

          It's also US-only. Other countries will differ. This means you can only rely on this ruling for something you are distributing only in the US. That might be OK for art, but definitely not for most software, and very definitely not for a software library.

          For example UK law specifically says "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."

          https://www.legislation.gov.uk/ukpga/1988/48/section/9

          • By jacquesm 2026-03-0511:24

            > The person using the LLM waived any right to be recognised as the author.

            They can't waive their liability from being identified as an infringer though.

          • By bakugo 2026-03-0510:581 reply

            > the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

            This seems extremely vague. One could argue that any part of the pipeline counts as an "arrangement necessary for the creation of the work", so who is the author? The prompter, the creator of the model, or the creator of the training data?

            • By graemep 2026-03-0511:54

              The courts will have to settle that according to circumstances. I think it is likely to be the prompter, and in some cases the creator of the training data as well. The creator of the model will have copyright on the model, but unlikely to have copyright on its outputs (any more than the writer of a compiler has copyright on its output).

        • By NitpickLawyer 2026-03-059:472 reply

          I wrote this comment on another thread earlier, but it seems relevant here, so I'll just c/p:

          I think we didn't even begin to consider all the implications of this, and while people ran with that one case where someone couldn't copyright a generated image, it's not that easy for code. I think there needs to be way more litigation before we can confidently say it's settled.

          If "generated" code is not copyrightable, where do draw the line on what generated means? Do macros count? Does code that generates other code count? Protobuf?

          If it's the tool that generates the code, again where do we draw the line? Is it just using 3rd party tools? Would training your own count? Would a "random" code gen and pick the winners (by whatever means) count? Bruteforce all the space (silly example but hey we're in silly space here) counts?

          Is it just "AI" adjacent that isn't copyrightable? If so how do you define AI? Does autocomplete count? Intellisense? Smarter intellisense?

          Are we gonna have to have a trial where there's at least one lawyer making silly comparisons between LLMs and power plugs? Or maybe counting abacuses (abaci?)... "But your honour, it's just random numbers / matrix multiplications...
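          The code-generation question above can be made concrete with a minimal, hypothetical sketch (the names and structure here are illustrative, not from any project in the thread). The creative decisions live entirely in the generator; its output is mechanical and deterministic, much like protobuf output, yet indistinguishable from hand-written code. Where authorship attaches is exactly the open legal question.

```python
# A trivial code generator: any creative expression is in this
# generator, not in the text it emits. (Illustrative example only.)

FIELDS = ["host", "port", "timeout"]

def generate_config_class(name, fields):
    """Emit Python source for a simple config-holder class."""
    lines = [f"class {name}:"]
    lines.append(f"    def __init__(self, {', '.join(fields)}):")
    for f in fields:
        lines.append(f"        self.{f} = {f}")
    return "\n".join(lines) + "\n"

# The emitted source is fully determined by the inputs above.
source = generate_config_class("Config", FIELDS)

# It nonetheless runs like any hand-written class.
namespace = {}
exec(source, namespace)
cfg = namespace["Config"]("localhost", 8080, 30)
```

          Swap the hand-rolled generator for a macro expander, protobuf, or an LLM and the output is produced the same way; the thread's point is that copyright law gives no obvious rule for which of those mechanical steps, if any, breaks the chain of human authorship.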

          • By bandrami 2026-03-0512:03

            In terms of adoption, "it's not settled" is even worse

          • By amelius 2026-03-0510:57

            Maybe we should build an LLM that can be the judge of that :)

        • By senko 2026-03-059:442 reply

          That's a very incorrect reading.

          AI can't be the author of the work. Human driving the AI can, unless they zero-shotted the solution with no creative input.

          • By camgunz 2026-03-0511:202 reply

            Only the authored parts can be copyrighted, and only humans can author [0].

            "For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology—not the human user."

            "In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that 'the resulting work as a whole constitutes an original work of authorship.'"

            "Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself."

            IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

            [0]: https://www.federalregister.gov/d/2023-05321/p-40

            • By simiones 2026-03-0515:341 reply

              > IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

              Actually this is very much how people think for code.

              Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of said code. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working code generated by Claude to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.

              • By camgunz 2026-03-0519:561 reply

                No software company thinks this, not Oracle, not Google, not Meta, no one. See: the guy they sued for taking things to Uber.

                • By simiones 2026-03-069:171 reply

                  The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.

            • By maxerickson 2026-03-0512:002 reply

              So if I want to publish a project under some license and I put a comment in an AI generated file (never mind what I put in the comment), how do you go about proving which portion of that file is not protected under copyright?

              If the AI code isn't copyrightable, I don't have any obligations to acknowledge it.

              • By bandrami 2026-03-0512:321 reply

                You're looking at this as the infringer rather than the owner. How do you as a copyright owner prove you meaningfully arranged the work when you want to enforce your copyright?

              • By camgunz 2026-03-0513:02

                Copyright office says this has to be done case-by-case. My guess is they'd ask to see prompts and evidence of authorship.

          • By skeledrew 2026-03-0510:211 reply

            The human is still at best a co-author, as the primary implementation effort isn't theirs. And I think the effort involved is the key contention in these cases. Yesterday ideas were cheap, and it was the execution that mattered. Today execution is probably cheaper than ideas, but the same principle should still hold.

            • By tricorn 2026-03-0616:03

              No, effort is explicitly not a factor in copyright. It was at one point, but "sweat of the brow" doctrine went away in Feist Publications in 1991, at least in the US.

        • By phire 2026-03-0510:34

          That's not really what the ruling said. Though, I suspect this type of "vibe rewrite" does fall afoul of the same issue.

          But for this type of copyright laundering, it doesn't really matter. The goal isn't really about licensing it, it's about avoiding the existing licence. The idea that the code ends up as public domain isn't really an issue for them.

        • By oblio 2026-03-0510:111 reply

          As of yesterday?

    • By rwmj 2026-03-0511:20

      No serious enterprise SaaS company differentiates themselves solely on the product (the products are usually terrible). It's the sales channel, the fact that you know how to bill a big company, the human engineer who is sent on site to deploy and integrate the product, the people on the support line 24/7, the regulatory framework that ensures the customer can operate legally and obtain insurance, the fact that there's a deep pool of potential hires who have used and understand the product. Those are the differentiators.

    • By jillesvangurp 2026-03-0514:031 reply

      > "How do we protect ourselves against a competitor doing this?"

      You can try patenting; but not after the fact. Copyright won't help you here. You can't copyright an algorithm or idea, just a specific form or implementation of it. And there is a lot of legal history about what is and isn't a derivative work here. Some companies try to forbid reverse engineering in their licensing. But of course that might be a bit hard to enforce, or prove. And it doesn't work for OSS stuff in any case.

      Stuff like this has been common practice in the industry for decades. Most good software ideas get picked apart, copied and re-implemented. IBM's BIOS for the first PC quickly got reverse engineered and then other companies started making IBM compatible PCs. IBM never open sourced their BIOS and they probably did not intend for that to happen. But that didn't matter. Likewise there were several PC compatible DOS variants that each could (mostly) run the same applications. MS never open sourced DOS either. There are countless examples of people figuring out how stuff works and then creating independent implementations. All that is perfectly legal.

      • By jasomill 2026-03-0523:35

        IBM never open sourced their BIOS, but they did publish complete source code listings:

        https://bitsavers.org/pdf/ibm/pc/pc/6025008_PC_Technical_Ref...

        https://bitsavers.org/pdf/ibm/pc/xt/1502237_PC_XT_Technical_...

        https://bitsavers.org/pdf/ibm/pc/at/1502494_PC_AT_Technical_...

        Between this and the fact that their PC-DOS (née MS-DOS) license was nonexclusive, I'm honestly not sure what they expected to happen.

        The nature of early IBM PC advertising suggests to me that they expected the IBM name and established business relationships to carry as much weight as the specifications themselves, and that "IBM PC compatible" systems would be no more attractive than existing personal computers running similar if not identical third-party software (PC-DOS wasn't the only example of IBM reselling third-party software under nonexclusive license), and would perhaps even lead to increased sales of first-party IBM PCs.

        Which, in fact, they did, leading me to believe the actual result may have been not too far from their original intent, only with IBM capturing and holding a larger share of the pie.

    • By ShowalkKama 2026-03-059:451 reply

      If your backend is trivial enough to be implemented by a large language model, what value are you providing?

      I know it's a provocative question, but the answer to it explains why such a competitor is not really a competitor.

      • By dboreham 2026-03-0514:27

        I suspect you're underestimating the capabilities of today's LLMs.

    • By consumer451 2026-03-0516:25

      > "How do we protect ourselves against a competitor doing this?"

      I have been thinking about this a lot lately, as someone launching a niche b2b SaaS. The unfortunate conclusion that I have come to is: have more capital than anyone for distribution.

      Is there any other answer to this? I hope so, as we are not in the well-capitalized category, but we have friendly user traction. I think the only possible way to succeed is to quietly secure some big contracts.

      I had been hoping to bootstrap, but how can we in this new "code is cheap" world? I know it's always been like this, but it is even worse now, isn't it?

    • By nandomrumber 2026-03-059:311 reply

      Maybe a better question is:

      How do our competitors protect themselves against us doing this?

      • By dredmorbius 2026-03-0517:23

        Particularly if you're named "Google", "Amazon", "Microsoft", or "Apple".

    • By jmyeet 2026-03-0511:59

      I think the genie is out of the bottle on this one and there's really no putting it back.

      There is a certain amount of brand loyalty and platform inertia that will keep people. Also, as you point out, just having the source code isn't enough. Running a platform is more than that. But that gap will narrow with time.

      The broader issue here is that there are people in tech who don't realize that AI is coming for their jobs (and companies) too. I hope people in this position can maybe understand the overall societal issues for other people seeing their industries "disrupted" (ie destroyed) by AI.

    • By mellosouls 2026-03-059:27

      The famous case Google vs Oracle may need to be re-evaluated in the light of Agents making API implementation trivial.

      https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....

    • By Meneth 2026-03-0512:45

      "How do we protect ourselves against a competitor doing this?"

      That's the neat thing: you don't!

    • By senko 2026-03-059:461 reply

      > "How do we protect ourselves against a competitor doing this?"

      DMCA. The EULA likely prohibits reverse engineering. If a competitor does that, hit'em with lawyers.

      Or, if you want to be able to sleep at night, recognize this as an opportunity instead of a threat.

      • By orthoxerox 2026-03-0510:521 reply

        What about jurisdictions where reverse engineering is an inalienable right?

        • By dredmorbius 2026-03-0517:221 reply

          Which are those?

          • By orthoxerox 2026-03-0517:471 reply

            Afaik, the EU and Russia say that observing/experimenting with the external behavior of a program to determine its internal logic is legal.

            Russia even allows decompiling object code if you have to solve private compatibility issues.

            • By jasomill 2026-03-060:011 reply

              Even in the US, are there any non-DRM examples where reverse engineering for the purpose of interoperability, in violation of a license agreement, has been used as the basis for copyright claims, even when the results are incorporated into a competing product?

              For example, I don't recall Microsoft ever being sued by WordPerfect or Lotus for reading and writing their applications' unpublished file formats, which wouldn't necessarily have involved disassembly or decompilation, but was still the result of reverse engineering that almost certainly involved using a licensed or unlicensed copy of the competitor's product.

    • By fragmede 2026-03-059:52

      Nothing. This is why SaaS stocks took a dump last week.

    • By amelius 2026-03-0510:55

      Makes me wonder when AI will put the mobile phone OS duopoly to an end.

HackerNews