GitHub Copilot is not infringing your copyright (2021)

2024-07-11 8:42 · felixreda.eu


This is a slightly modified version of my original German-language article first published on heise.de under a CC-by 4.0 license.

GitHub is currently causing a lot of commotion in the Free Software scene with its release of Copilot. Copilot is an artificial intelligence trained on publicly available source code and texts. It offers programmers code suggestions in real time. Since Copilot also uses the numerous GitHub repositories under copyleft licences such as the GPL as training material, some commentators accuse GitHub of copyright infringement, because Copilot itself is not released under a copyleft licence, but is to be offered as a paid service after a test phase. The controversy touches on several thorny copyright issues at once. What is astonishing about the current debate is that the calls for the broadest possible interpretation of copyright are now coming from within the Free Software community.

Copyleft does not benefit from tighter copyright laws

Copyleft licences are an ingenious invention with which the Free Software scene has used copyright, the sharp sword of the content industry, to promote the free exchange of culture and innovation. Works licensed under copyleft may be copied, modified and distributed by all, as long as any copies or derivative works may in turn be re-used under the same license conditions. This creates a virtuous circle, thanks to which more and more innovations are open to the general public. Copyright, which was designed to guarantee exclusivity over creations, is used here to prevent access to derivative works from being restricted.

However, it is also clear that there would be no need for copyleft licences to govern the exercise of copyright in software code by third-party developers at all if copyright did not guarantee rightsholders such a high degree of exclusive control over intellectual creations in the first place. If it were not possible to prohibit the use and modification of software code by means of copyright, then there would be no need for licences that prevent developers from making use of those prohibition rights (of course, free software licenses would still fulfil the important function of contractually requiring the publication of modified source code). That is why it is so absurd when copyleft enthusiasts argue for an extension of copyright. Any extension of prohibition rights strengthens not only the enforcement of copyleft licences but also that of the much more widespread proprietary licences, which aim to achieve exactly the opposite results.

But this is exactly what is happening in the current debate about GitHub’s Copilot. Because a large company – namely GitHub’s parent company Microsoft – profits from analyzing free software and builds a commercial service on it, the idea of using copyright law to prohibit Microsoft from doing so may seem obvious to copyleft enthusiasts. However, by doing so, the copyleft scene is essentially demanding an extension of copyright to actions that have for good reason not been covered by copyright. These extensions would have fatal consequences for the very open culture which copyleft licences seek to promote.

There are two main versions of the criticism levelled at GitHub for starting Copilot. Some are criticising the very use of free software as source material for a commercial AI application. Others focus on Copilot’s ability to generate outputs based on the training data. One may find both ethically reprehensible, but copyright is not violated in the process.

Text & data mining is not copyright infringement

To the extent that merely the scraping of code without the permission of the authors is criticised, it is worth noting that simply reading and processing information is not a copyright-relevant act that requires permission: If I go to a bookshop, take a book off the shelf and start reading it, I am not infringing any copyright. Scraping content to train an artificial intelligence enters the realm of copyright at all only because digital technology requires making copies of content in order to process it. Copying is fundamentally a copyright-relevant act. Many of the conflicts between copyright and digital technology result from this fact. Fortunately, policymakers and courts have long recognised that digital technology would be completely unusable if every technical copy required permission. Otherwise, people who listen to music with digital hearing aids would first have to acquire a licence for it. Internet providers would have to license every conceivable copyright-protected work that their customers exchange with each other.

As early as 2001, the EU allowed such temporary, ephemeral acts of copying, which are part of a technical process, without restriction – despite the protests of the entertainment industry at the time. Unfortunately, this copyright exception of 2001 initially only allowed temporary, i.e. transient, copying of copyright-protected content. However, many technical processes first require the creation of a reference corpus in which content is permanently stored for further processing. This necessity has long been used by academic publishers to prevent researchers from downloading large quantities of copyrighted articles for automated analysis. Although these scholars had legal access to the content, for example through a subscription from their university, the publishers tried to contractually or technically exclude the creation of reference corpora. According to the publishers, researchers were only supposed to read the articles with their own eyes, not with technical aids. Machine-based research methods such as the digital humanities suffered enormously from this practice.

Under the slogan “The Right to Read is the Right to Mine”, EU-based research associations therefore demanded explicit permission in European copyright law for so-called text & data mining, that is the permanent storage of copyrighted works for the purpose of automated analysis. The campaign was successful, to the chagrin of academic publishers. Since the EU Copyright Directive of 2019, text & data mining has been permitted. Even where commercial uses are concerned, rightsholders who do not want their copyright-protected works to be scraped for data mining must opt out in machine-readable form, such as robots.txt. Under European copyright law, scraping GPL-licensed code, or any other copyrighted work, is legal, regardless of the licence used. In the US, scraping falls under fair use; this has been clear at least since the Google Books case.
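
The machine-readable opt-out mentioned above can be expressed as robots.txt rules. As a minimal sketch, assuming a hypothetical crawler named `ExampleTDMBot` and a made-up URL, Python's standard-library `urllib.robotparser` can check whether a given crawler may fetch a page. It only shows how such an opt-out signal is read; it says nothing about how any real company's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out one made-up text & data mining
# crawler ("ExampleTDMBot") while leaving every other crawler allowed.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/some/repo/file.c"
    print(may_mine(ROBOTS_TXT, "ExampleTDMBot", url))  # False: opted out
    print(may_mine(ROBOTS_TXT, "SomeSearchBot", url))  # True: no opt-out applies
```

The rules are matched per user agent, so a rightsholder can opt out of mining by one crawler without disappearing from, say, search engines.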

Machine-generated code is not a derivative work

Some commentators see GitHub Copilot as a copyright infringement because the programme not only uses copyright-protected software code, a lot of which is published under GPL, as training material, but also generates software code as output. According to critics, this output code is a derivative work of the training data sets because the AI would not be able to generate the code without the training data. In a few cases, Copilot also reproduces short snippets from the training datasets, according to GitHub’s FAQ.

This line of reasoning is dangerous in two respects: On the one hand, it suggests that even reproducing the smallest excerpts of protected works constitutes copyright infringement. This is not the case. Such use is only relevant under copyright law if the excerpt used is in turn original and unique enough to reach the threshold of originality. Otherwise, copyright conflicts would constantly arise when two authors use the same trivial statement independently of each other, such as “Bucks beats Hawks and advance to the NBA finals”, or “i = i+1”. The short code snippets that Copilot reproduces from training data are unlikely to reach the threshold of originality. Precisely because copyright only protects original excerpts, press publishers in the EU have successfully lobbied for their own ancillary copyright that does not require originality as a precondition for protection. Their aim is to prohibit the display of individual sentences from press articles by search engines. It is precisely this problematic demand that the Free Software community endorses when it demands absolute control over the smallest excerpts of software code.

On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain. That is good news for the open movement and not something that needs fixing.

Those who argue that Copilot’s output is a derivative work of the training data may do so because they hope it will place those outputs under the licensing terms of the GPL. But the unpleasant side effect of such an extension of copyright would be that all other AI-generated content would henceforth also be protected by copyright. What would then stop a music label from training an AI with its music catalogue to automatically generate every tune imaginable and prohibit its use by third parties? What would stop publishers from generating millions of sentences and privatising language in the process?

At the World Intellectual Property Organization (WIPO), companies are already lobbying for an extension of copyright to machine-generated works. According to WIPO, “The main focus of those questions is whether the existing IP system needs to be modified to provide balanced protection for machine created works and inventions”. The main beneficiaries of such an extension of copyright would be the major technology corporations that are best placed to develop and scale AI applications. Such as Microsoft. Critics of GitHub’s business practices would do well not to play into their hands.

This work is licensed under a Creative Commons Attribution 4.0 International License.



Comments

  • By carom 2024-07-11 10:03

    This is missing the largest argument in my opinion. The weights are the derivative work of the GPL licensed code and should therefore be released under the GPL. I would say these companies should either release their weights or simply not train on copyleft code.

    It is truly amazing how many people will shill for these massive corporations that claim they love open source or that their AI is open while they profit off of the violation of licenses and contribute very little back.

    • By pornel 2024-07-11 10:30

      GPL doesn't apply/doesn't have to be agreed to when the usage is allowed by the copyright law in another way. GPL can't override copyright exceptions like fair use (details vary by jurisdiction, but the principle is the same everywhere).

      Even the license itself states it's optional, and you don't have to agree to it (if you don't, you get the copyright law's default).

      The author of the article is a former member of the Pirate Party and of the EU Parliament, so they have expertise in copyright law.

      • By haksz 2024-07-11 11:45

        I would say that the Pirate Party has expertise in nothing apart from perhaps protecting Internet freedoms.

        So the same persons that supported Napster and the Pirate Bay now want to circumvent copyright for open source software.

        An unholy alliance, but the recent comments from some Microsoft brass about everything on the Web being freeware seem to indicate that these are the talking points that Microsoft and its new allies will put out.

        • By pornel 2024-07-11 14:02

          In this article, Reda explains the current copyright laws in the EU, not a hypothetical policy of the Pirate Party. They're not a member of the PP any more AFAIK.

          I expect that people professionally dedicated to a copyright reform are very familiar with it, regardless of which way they want to reform it.

          The copyright laws were written before generative AI existed, so they may not be adequate or fair in the new reality, but that's the current state anyway. As Reda notes, the law is not specific enough to draw a distinction between collecting and processing data for search engines (that may be using ML for retrieval) and using the same data with LLMs.

        • By koolala 2024-07-11 12:54

          If the Web is freeware... I wonder what options remain for licenced online information.

          • By janosdebugs 2024-07-11 21:05

            Content gating behind login screens. Scraping content behind a login screen could constitute a contract violation and would give rise to a lawsuit independent of copyright.

    • By ElectricSpoon 2024-07-11 12:20

      I'm with you on that. Many argue that AI models don't "contain the code" but if they are trained on the copyrighted data, and generate something similar, then the AI model is akin to a lossy data compression format.

      Frequency signal data over an image are not the image, but no one argues a JPEG encoded copy of a PNG isn't the same image. I think the weights vs code are similar in that regard.

      As for releasing weights, probably more if we're talking about AGPL code.

    • By batch12 2024-07-11 12:25

      I think it's amazing that licenses are ignored to train a model, but companies then try to impose a license on the use of the same model. It would be nice if there was a training BOM that came with a model. And if none was included, all rights to control the use of the model would be forfeit.

      • By dwaite 2024-07-11 14:24

        > I think it's amazing that licenses are ignored to train a model, but companies then try to impose a license on the use of the same model.

        There are existing analogies, like encyclopedias and dictionaries.

        One interesting aspect of those sorts of consolidated works is that they may deliberately contain errors and other artifacts, specifically to distinguish duplications of their work from new from-scratch work.

        • By batch12 2024-07-11 14:31

          I don't think those are good analogies. An encyclopedia contains references or summaries of a concept or idea, but not a compressed volume of all possible text. A closer analogy would be an unauthorized "collected works" of your favorite HN commenter packaged up and resold.

          It also feels similar to the recent article posted about photography and how during its early days pictures were used for advertising without the consent of those photographed. [0]

          [0] https://www.truthdig.com/articles/the-troubled-development-o...

    • By qjakdx 2024-07-11 11:57

      He works for GitHub and has probably never written anything in his life:

      https://okfn.de/en/vorstand/

      • By prosim 2024-07-11 12:36

        He started at GitHub 3 years after the article was written. Don't think GitHub's interview process takes that long. ;)

        • By qakjfa 2024-07-11 12:56

          That is a good point for this individual article!

          However, the broader issue that Microsoft has infiltrated OSS and its organizations successfully by hiring and donating remains. It would not surprise me at all if they now hire people with an ostensibly "freedom fighter" background for credibility.

          Look at how many people here cite his (former?) membership in the Pirate Party for credibility! Party membership means nothing. Politicians (in general!) change their minds, can be bought, etc. The Green Party in Germany started out as a peace party and has been used repeatedly to lend credibility to the Kosovo and other wars.

        • By alyma 2024-07-11 14:23

          GitHub was just the logical progression from okfn:

          https://blog.okfn.org/2022/03/03/microsoft-to-support-open-d...

          Today, we are pleased to announce that Microsoft will once again be supporting Open Data Day by providing mini-grants to organisations to help them run events, the call will launch on Open Data Day 2022.

          They also supported "Open Data Day 2021". Sounds like a nice trojan horse to influence EU legislation through purported activists.

    • By blackoil 2024-07-11 10:10

      Since the weights are not distributed, only used by GitHub to provide the service, they need not worry about the GPL, at least. I don't know about AGPL.

      • By kimixa 2024-07-11 10:18

        If those weights are a derivative of GPL'd code in a different form, and the results generate things derived from that derivative, then the generated code is still under license. "How much change is enough" has always been a gray area for courts and humans to decide.

        If you can get a decent facsimile of licensed code out the other end, how is it really any different from lossy compression? I doubt the courts would consider a lossy re-encode of a Disney movie as free from copyright.

        • By zarzavat 2024-07-11 10:26

          If the output is substantially similar to GPL’d training data it may be infringing. Nobody disputes this.

          However, copyright isn’t cooties. If the output is not similar, then it is not infringing regardless of how much GPL’d training data was used to generate it.

          • By pen2l 2024-07-11 11:54

            Suspend all knowledge of copyright law as it exists today for a moment and approach this hypothetical from first principles: a lot of GPL copyleft data is used in the making of an AI tool that, when asked, can itself recreate code similar to what was input... also, the creator of that AI tool will reap all the profits without giving the creators of the original copyleft data a single penny, or even recognition, for the value it guzzled from the GPL data it was trained on. Is this fair? What do your scruples tell you?

            No, of course not. We should probably revisit copyright law, given that it was written at a time when no-one foresaw modern AI tools, its capabilities, and its effects on creators and societies.

            • By zarzavat 2024-07-12 5:13

              Have you used Copilot? It is generally not creating code similar to GPL code, it is creating code similar to the surrounding context file.

              Transformers predict the most likely next token, the most likely next token is usually related to the surrounding context.

              So yes it can create code similar to GPL code but it can only do that consistently when the GPL code is included in the context. So don’t do that.

            • By CuriousSkeptic 2024-07-11 14:53

              The GPL was never about money, recognition or even about the creators at all. Copyleft was created “to promote computer user freedom”.

              Free Software already views all proprietary software as inherently immoral. So there is no need to take a detour of what went into making the software to reach that conclusion from that angle.

          • By kimixa 2024-07-11 10:34

            Indeed, that's why I said

            >"How much change is enough" has always been a gray area for courts and humans to decide.

            But copilot has been shown to generate chunks of sufficient size and specificity that as a layman it very much feels like "copied GPL code". And my boss agrees too - we have a blanket ban on generative AI tools in our work because it's not considered worth the risk.

            • By londons_explore 2024-07-11 11:48

              > has been shown to generate chunks of sufficient size and specificity

              Only when given chunks of copyrighted code as input. I don't think anyone has demonstrated big chunks of copyrighted code in the output when copyrighted code isn't present in the query/context.

              In fact, I suspect microsoft specifically filters the output for that.

        • By Hamuko 2024-07-11 10:21

          >If those weights are a derivative of GPL'd code in a different form, and the results generate things derived from that derivative, then the generated code is still under license.

          That's not really Microsoft's problem as long as people aren't afraid of using Copilot to generate (potentially GPL'd) code. And from what I've generally seen from genAI discussions at work, people think very little about any legal implications.

      • By bayindirh 2024-07-11 10:13

        What about the emitted code which is actually derived from GPL code?

        What about BSL, SSPL, or other source available (for your eyes only) licenses? Copilot harvests all public repos, regardless of its license.

        • By bluesign 2024-07-11 10:28

          IANAL, but I searched a lot on this; it is a very tricky subject legally.

          To simplify:

          - imagine all the code Copilot was trained on is GPL licensed.
          - we have a universal function `isInfringing(code)` that has access to all GPL code, and returns `true` if the code infringes some GPL code.

          for a given prompt, if `isInfringing(copilot(prompt))==false`, we cannot claim Copilot is infringing GPL code, even if it was trained on GPL'd code.

          so the problem starts here; does the piece of code copilot emits, if written by yourself also would be infringing ?
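
The `isInfringing(code)` oracle above is idealized; no such function exists. As a rough, hypothetical stand-in, the sketch below flags code when at least half of its overlapping five-token runs ("shingles") also occur in one single document of a reference corpus. The tokenizer, the shingle size `k` and the `threshold` are arbitrary assumptions, and textual overlap is at best a proxy for the legal question of substantial similarity.

```python
import re


def tokens(code: str) -> list[str]:
    """Split source code into identifier, number and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def shingles(code: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return every run of k consecutive tokens in `code`."""
    t = tokens(code)
    return {tuple(t[i:i + k]) for i in range(len(t) - k + 1)}


def is_infringing(code: str, corpus: list[str], k: int = 5,
                  threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for isInfringing(code).

    Flags `code` if, for some single document in `corpus`, at least
    `threshold` of the code's k-token shingles also occur in that document.
    Each document is checked separately, so fragments borrowed from many
    different documents do not add up.
    """
    mine = shingles(code, k)
    if not mine:
        return False  # too short to form even one shingle
    return any(len(mine & shingles(doc, k)) / len(mine) >= threshold
               for doc in corpus)


gpl_corpus = ["int add(int a, int b) { return a + b; }"]
print(is_infringing("int add(int a, int b) { return a + b; }", gpl_corpus))  # True
print(is_infringing("i = i + 1", gpl_corpus))  # False
```

Checking each document separately mirrors the "pieces from many different sources" question raised further down the thread; a different design that pooled the whole corpus would answer it differently.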

          • By luqtas 2024-07-11 12:00

            > so the problem starts here; does the piece of code copilot emits, if written by yourself also would be infringing ?

            why does everyone in these discussions try to bring up "if a human made it"? a generative AI operates way faster than anyone who ever existed or ever will, and a person aware of the license & acting respectfully towards it will probably create something more sensible/plausible to avoid plagiarism

            now, is having dozens/hundreds/thousands of humans substituted by a machine that makes money for some for-profit company really fair? even if they were a non-profit, as someone pointed out above, the people who create the content that feeds the weights aren't receiving a penny! they already made money with it, they will make more & that is/will be upgrading the state of gen. AI

            for sure legal battles over people copying code from permissive licenses should exist, but it feels like a different discussion

            • By bluesign 2024-07-11 13:57

              because the discussion is about what is 'legal', and laws only apply to humans. On the ethical side of the discussion, I tend to agree with you. But it is also a complicated subject; 'fair' in general is complicated, and all this GPL/AGPL stuff was born out of it. Hosting GPL code as SaaS is legal but not 'fair', for example.

          • By mzl 2024-07-11 12:23

            If one was sufficiently inspired by code A when writing code B, then it is a derivative work. This is a core tenet of copyright law.

            At what measure is one sufficiently inspired for it to be a derivative work? That is up to courts to decide.

            • By bluesign 2024-07-11 14:17

              yeah, the problem here is that there usually is no single 'code A'; it is more like thousands of GPL'd codebases (A1, A2, ... An).

              Technically when you get a piece from each, there is no infringement legally. ( as they have all different copyright holders )

              • By bayindirh 2024-07-12 6:20

                From my understanding of a blog post by GitHub last year, they are planning to launch a tool to find code similar to what is emitted by CoPilot, implying that CoPilot does not mix multiple sources for a single function, but derives from a code block it found with similar functionality (or maybe bigger blocks with similar functionality, IDK).

                If CoPilot indeed derives a function (or a functional block) from a single source, it might plainly violate the license of the repository where it derives the code from.

                There are many questions, and nothing is clear cut. The only thing I know is, I will never use that thing.

                EDIT: I remembered that people were able to make CoPilot emit their code almost as-is with the correct prompts: https://x.com/docsparse/status/1581461734665367554

                So it's not that we're taking a bit from n different sources and generating something with that.

              • By anticensor 2024-07-12 20:16

                > Technically when you get a piece from each, there is no infringement legally.

                False in ex-Commonwealth countries and Japan.

      • By starspangled 2024-07-11 10:17

        So outputs are definitely not derivative work of training data, only weights? For this exact code that github is using, anything called "AI", or any computer code at all which produces work based in whole or in part on input data?

        And for which jurisdictions has this been established? What is the legal argument that "weights" are derivative but output is not?

        I'm surprised it's so clear cut as you say, but I haven't really been following the whole kerfuffle.

    • By denton-scratch 2024-07-11 10:40

      But they train their models on everything, regardless of the licence. It follows that the resulting derivative work likely mixes stuff that is under incompatible licences, with the result that it can't be distributed at all.

    • By JW_00000 2024-07-11 11:10

      > The weights are the derivative work of the [GPL licensed] code

      This is not immediately obvious to me.

      A small thought experiment: the Harry Potter books are clearly copyrighted works. If I generate a frequency list of all words in these books, i.e. a list of all words and how often they appear, that frequency list is derived from the original work, in the normal way we would use the word "derived". But is it a "derivative work", under the strict legal definition of this term?
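
The frequency list in this thought experiment is easy to compute. A minimal Python sketch, using a made-up sample sentence instead of the books, shows that the result keeps only word counts and discards word order:

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each lower-cased word occurs in `text`."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


sample = "The cat sat on the mat. The mat was flat."
freq = word_frequencies(sample)
print(freq.most_common(2))  # [('the', 3), ('mat', 2)]
```

Many different texts produce the same table, which is one technical reason the list carries far less of the original than the text itself; whether that settles the legal "derivative work" question is a separate matter.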

      • By gus_massa 2024-07-11 19:20

        What about N-gram frequencies? 1-grams (aka characters) carry too little information and are probably fine: using them, you can only identify the language of the original work. With a few more you can identify the author and the book. I don't remember the exact number, but if you have the frequencies of 10-grams you can probably reconstruct big chunks of the book.
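
This point can be tried on a toy example. The sketch below counts character n-grams and then attempts a deliberately naive reconstruction: it pads the text with assumed start (`\x02`) and end (`\x03`) marker characters and chains n-grams only while each (n-1)-character context has exactly one observed continuation, giving up otherwise. It illustrates the idea under those assumptions and is not an estimate of how large n must be for a real book.

```python
from collections import Counter, defaultdict
from typing import Optional

START, END = "\x02", "\x03"  # assumed marker characters absent from the text


def ngram_counts(text: str, n: int) -> Counter:
    """Frequencies of all n-character runs in `text`, padded with n-1 START
    markers in front and one END marker at the back."""
    padded = START * (n - 1) + text + END
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def reconstruct(counts: Counter, n: int) -> Optional[str]:
    """Naively rebuild the text from its n-gram table.

    Starting from the all-START context, repeatedly append the only character
    ever seen after the current (n-1)-character context. Returns None as soon
    as a context has more than one possible continuation.
    """
    successors = defaultdict(set)
    for gram in counts:
        successors[gram[:-1]].add(gram[-1])
    context, out = START * (n - 1), []
    while True:
        options = successors.get(context, set())
        if len(options) != 1:
            return None  # ambiguous (or unseen) context: give up
        ch = next(iter(options))
        if ch == END:
            return "".join(out)
        out.append(ch)
        context = (context + ch)[1:]  # slide the (n-1)-character window


sample = "the cat sat on the mat"
for n in (2, 5, 10):
    print(n, reconstruct(ngram_counts(sample, n), n))
# 2 None
# 5 None
# 10 the cat sat on the mat
```

In this sample the longest repeated substring is "the " (four characters), so any n of 6 or more makes every context unique and the text comes back verbatim; longer texts with longer repeats need larger n.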

      • By carom 2024-07-11 17:24

        The frequency count is not a function. The trained model is. Arguably, they are deriving a new function from ones covered by copyright. It is up to the courts for an official decision though.

        • By tpmoney 2024-07-11 21:16

          So what if we made a function. What if someone scans all the works of Harry Potter and generates a program/function that uses the frequency and pairing of phonemes in Harry Potter character names to create a “Wizard Name Generator” to generate random but plausible sounding names. Would we expect a court to find the name generator is infringing on JK Rowling’s copyrights? Certainly it’s possible for the generator to generate a name verbatim from the books, but does that make the generator a derived work and infringing? If the authors of the generator put their generator on the web as Harry Potter Name Generator, we might expect the courts to tell them they can’t use the Harry Potter name, but if they put it under “Wacky Warlocks Wizard Wonder Namer” is the mere fact that the underlying function uses factual data about a work under copyright sufficient to strike it down? What if it used name frequencies from multiple fantasy series? How many series would it have to use as a source before we say that the name generator is not infringing on copyrights? Can it ever not be?
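
The generator described here is essentially a Markov chain over name fragments. A minimal sketch, using individual letters as a stand-in for phonemes (an assumption) and invented training names rather than any book's, records which letter follows each two-letter context and samples from those observed continuations:

```python
import random
from collections import defaultdict
from typing import Optional


def train(names: list[str], order: int = 2) -> dict[str, list[str]]:
    """Map each `order`-letter context to the letters seen after it.

    Duplicates are kept, so later sampling follows the observed frequencies.
    "^" pads the start of each name and "$" marks its end.
    """
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name.lower() + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return dict(model)


def generate(model: dict[str, list[str]], order: int = 2,
             rng: Optional[random.Random] = None, max_len: int = 12) -> str:
    """Sample one name letter by letter until "$" or `max_len` letters."""
    rng = rng or random.Random()
    context, out = "^" * order, []
    while len(out) < max_len:
        ch = rng.choice(model[context])
        if ch == "$":
            break
        out.append(ch)
        context = context[1:] + ch
    return "".join(out).capitalize()


# Invented training names (not taken from any book).
names = ["morwen", "alaric", "belisar", "corvin", "elowen", "thaddeus"]
model = train(names)
rng = random.Random(7)
print([generate(model, rng=rng) for _ in range(5)])

# Trained on one name, every context has a single continuation,
# so the generator can only reproduce that name verbatim.
print(generate(train(["merlin"])))  # Merlin
```

With a single training name the output is always a verbatim copy; as more names are added, outputs increasingly become recombinations, which is the sliding scale the comment asks about.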

    • By wakawaka28 2024-07-12 14:01

      It would make no sense to release the weights under the GPL because machine-generated stuff is uncopyrightable. There should be an argument about the model generating derivative works without attribution as a consequence of how it works. But that machine-generated stuff is also uncopyrightable, even though it might be kept secret.

      • By carom 2024-07-15 8:24

        What about compiler outputs? Those were initially not copyrightable then it was legislated that they were. So there is some precedent there and I would not be surprised if we saw copyrightable weights in the future (as a "compilation" of the dataset).

        • By wakawaka28 2024-07-15 13:19

          It could be legislated of course. But the difference is pretty drastic. Almost nobody is creating binaries without a compiler. It is a mechanical process, but essentially everyone uses the same mechanical processes to generate binaries. I haven't looked at this issue in a while but I think compiled binaries are treated in a way similar to that of recorded music. For example, the particular bit patterns from a synthesizer might be generated from sheet music, and that is akin to code vs. binaries. But the bit patterns are copyrightable only so far as they are equivalent to or the direct manifestation of a creative work.

          There are other problems with releasing model weights under the GPL. It just doesn't fit, in the same way as releasing non-software under the GPL doesn't make sense.

          Calling the output of generative AI copyrightable violates the spirit of copyright, as it is neither creative nor labor-intensive. We could quibble about that, but I think we can at least agree that the point is that this generative AI stuff requires very little skill to use in most cases and can't operate without prior art to train on. Other lame stuff has been copyrighted before, like paint splatters and stuff, but even that type of art appears to involve more skill than entering a few words into a generative AI.

    • By dist-epoch 2024-07-11 11:02

      > The weights are the derivative work of the GPL licensed code

      EU courts disagree:

      > Under European copyright law, scraping GPL-licensed code, or any other copyrighted work, is legal, regardless of the licence used.

      • By xigoi 2024-07-11 11:57

        Sure, scraping it is by itself legal. But making a derivative work from it and selling it?

      • By carom 2024-07-15 8:26

        The weights are not created by scraping. Sure, you can scrape it, but what you do with it matters.

    • By wseqyrku 2024-07-11 10:18

      Consider it already paid back, because it's cheap. The price is only the service fee.

    • By klaustopher 2024-07-11 10:06

      Just FYI, Felix Reda was a member of the European Parliament and was responsible there for the copyright reform and also involved in GDPR, massively stepping on the feet of big tech. Don't know if it was your intention to include them in a list of people who "shill" for big tech, but they shouldn't be included.

      edit wording about the shill

  • By cowsandmilk 2024-07-11 9:46

    > What is astonishing about the current debate is that the calls for the broadest possible interpretation of copyright are now coming from within the Free Software community.

    That should not be astonishing. The Free Software community has made it clear from day 1 that the GPL can only achieve its goals through enforcement of copyright. If the authors wanted their code to be made use of in non-Free software, they would have used a BSD or MIT license.

    • By minot 2024-07-11 9:51

      > The Free Software community has made it clear from day 1 that the GPL can only achieve its goals through enforcement of copyright

      We should mention when we say this, although I think it is self-evident, that the preferable alternative is reducing the scope of copyright across the board -- be it with shorter time frames (I'd argue even twenty years total is too long!) or some other means.

      To programmers and developers, remember the core of free software is NOT the commercial developer / programmer and it NEVER has been. The core is always the user and what they need. This is so important that it needs to be repeated every time someone talks about free software because free software is NOT about open source. Open source code is a necessary part of free software but it is NOT sufficient.

      https://www.gnu.org/philosophy/free-sw.en.html

      • By koolala 2024-07-11 13:03

        We have to fight for the AI and AI Users! They are the future! They deserve access to their own weights!

      • By lukan 2024-07-11 10:27

        "The core is always the user and what they need."

        Which is why GNU/Linux without a terminal is totally usable and therefore accessible to the non-programmer. /s

        I agree that user-centric development should be the goal, but I hardly see it implemented. Free software programmers almost always solved their own needs first, which is alright, because usually no one paid them to serve other people's needs, but I seldom see this goal met.

        • By lelanthran 2024-07-11 10:47

          You are confusing "software UX" with "software freedom".

          The primary consideration is freedom for the user. Ease-of-use for the user is a different consideration.

          • By lukan 2024-07-11 11:12

            "The core is always the user and what they need."

            I was referring to this, and the main thing users need is software they can use to solve their problems. If they have to study IT to do so, or hire programmers first, then this would primarily be a new (and big) problem for them, before they can even start working on their actual problem.

            • By thwarted 2024-07-11 12:15

              Free software isn't about solving your problems, it's about solving mine and enabling you, and others, to solve yours, and theirs. It's about if I've been generous enough to give, anyone who takes can't undermine my generosity by not also sharing. You having a problem that isn't solved by what I've made available, or is bigger/different than the problem I was solving, isn't my problem to solve or even know about. If you want to make your problem mine to solve, you can hire me. Everyone has problems, some of those problems are exactly the same, some of them overlap, and some are completely disjoint. If we have the same problem and my software is useful to solve that problem, you are welcome to use it, but you may find out that the problem I set out to solve for me does not exactly overlap with your problem.

            • By xigoi 2024-07-11 11:59

              You certainly don’t need to study IT to use Linux.

              • By lukan 2024-07-11 12:06

                Well, let's put it like this. I did study IT and even I struggle at times, or quite often, if I want to do something new. And I would have absolutely no idea how to do anything serious without the terminal. But a terminal is programming. So yeah, even a newbie can learn to paste some commands quite quickly, but troubleshooting even trivial things gets you into highly technical stuff very quickly. Do you consider man pages to be written in a beginner-friendly way?

                You know, simple examples of common use cases right at the top? Not my experience. I experienced it as a system written by and for hackers, with everything else an afterthought at best. I remember what the first hardcore Linux enthusiasts I met in real life told me: "I have to free myself from the GUI."

                Well, I did, but the common people won't.

                • By thwarted 2024-07-11 12:33

                  So your issue is that someone who solved their problem didn't solve it in a way that you want or expect? Why does your opinion about their problem matter at all? Why does it matter to the person who makes their solution available that the common people won't?

                  Using the terminal is not "programming". Non-programmers can use the terminal for many non-programming tasks. Imagemagick and netpbm-progs require no knowledge of programming to use, although it may require knowledge manipulating files and some graphical theory. The only difference from GIMP or Photoshop is that the UI/UX has a different efficiency metric (mainly because interactive image manipulation is more efficient when you are interacting visually). But the operations are just as discoverable: reading and navigating help text/man pages in the former (the man pages for Imagemagick and netpbm-progs are relatively decent), and reading and navigating menus and dialog boxes in the latter.

                  • By lukan 2024-07-11 14:34

                    "The only difference from GIMP or Photoshop is that the UI/UX has a different efficiency metric (mainly because interactive image manipulation is more efficient when you are interacting visually). But the operations are just as discoverable"

                    I know. Which is why the year of the linux desktop was such a success.

                    "Why does it matter to the person who makes their solution available that the common people won't?"

                    They have every right not to care, but it still does not help the goal of being useful for normal people.

                    • By lelanthran 2024-07-11 15:22

                      > They have all the right not to care, but it still is not helping the goal of being useful for normal people.

                      That isn't the goal. I don't know why you keep saying that.

                      • By lukan 2024-07-11 15:56

                        I know it isn't for you, but it is for me. The question here is what the goal is for GNU in general. I understood the original point to mean that it is.

                        • By lelanthran 2024-07-11 17:56

                          > I know it isn't for you, but it is for me.

                          Maybe, but your goal is irrelevant to the authors of the GPL.

                          > The question here is, how is it for GNU in general.

                          The goal for the FSF and their GPL is, and always was, freedom for the user of the software.

                          Ease-of-use was never an important consideration, much less a goal. This whole discussion from you in this thread is bizarre, TBH. You are projecting your goals onto the FSF's GPL, and judging it to be a failure based on your goals.

                          Your goals are irrelevant to them, just as their goals appear to be irrelevant to you.

                • By account42 2024-07-11 15:33

                  You think troubleshooting on any other OS is less technical? That isn't my experience, unless you count the OS refusing to give you the information required to troubleshoot at all as user-friendliness.

                  • By lukan 2024-07-11 15:50

                    Yes, I do think that. My father, for example, a German electrical engineer, can use Windows with ease and has been trying for years to switch to Linux. It works well enough for my mother for browsing the internet, as long as I come by regularly to fix some update bug. My father is a highly technical person, but not a programmer. Also, his English skills are very limited, so in my opinion he does not really stand a chance with Linux, despite trying.

        • By deadbunny 2024-07-11 15:11

          Maybe their target user isn't the one you're basing your opinion of what a user is?

          Take vi(m). It's not intuitive to your suggested target user and has a learning curve shaped like a cliff. So it fails to provide for what you consider a "user". However, it serves its actual target users very well.

          Arch doesn't position itself towards what you have presented as a user, Mint might however as they have very different target audiences. Not everything has to be designed to the lowest common denominator.

          • By lukan 2024-07-11 15:33

            "Take vi(m)."

            Yeah, a code editor is by definition for developers.

            The question here was about the OS in general. And it is a pretty established fact that Linux is popular with developers, but not with mainstream normal people, unless Linux comes in the shape of Android, where everything Linux is hidden and locked down.

            • By deadbunny 2024-07-11 15:44

              Maybe read a bit further in my reply and see my second argument about actual linux distros that address your point.

        • By koolala 2024-07-11 13:07

          You're describing the ultimate AI interface. They have been guarding it for decades; now is the critical moment.

          If they win this fight, GPL code will be usable by all of Artificial Humanity. GPL Singularity.

        • By rkangel 2024-07-11 12:53

          > Which is why gnu/linux without a terminal is totally usable and therefore accesible to the non programmer. /s

          Have you used modern Fedora? I have an old Thinkpad at home that I put Fedora on last year as our "sofa" laptop for web shopping etc. I took careful note of what I needed to do to set it up and that involved nothing on the command line to get to something good that my wife could happily use (not a techie, never used Linux).

          • By jmholla 2024-07-11 14:12

            Yea, the parent poster starts with a false premise. There are many Linux distros these days that laypeople can and do easily use: Ubuntu, Mint, PopOS, just to name a few.

            • By lukan 2024-07-11 14:39

              False premise? Well, I have installed Linux for many people over the years, and I personally use Arch. But apparently my experience is wrong.

              And just because you can use something for the internet does not mean it satisfies user needs in general. It satisfies some users' needs: those who need little, and those who understand the system. But the mainstream users in the middle continue to stay away for a reason. Hopefully more people will invest in the switch now, with the forced Windows 11 change.

          • By lukan 2024-07-11 14:46

            I did, some years ago. But by "using" I meant more than the internet.

      • By Hamuko 2024-07-11 10:19

        >The core is always the user and what they need.

        Would reducing copyright duration actually help with that?

        • By skywhopper 2024-07-11 12:10

          Copyright duration is not really a factor in the FSF’s actual goal, which is for software to be distributed with user-modifiable source code. Copyleft licenses are a means of achieving this through the existing copyright system with its ludicrous durations. But making copyright terms much shorter would help, yes, because any released source code or even binary files could be used, reverse-engineered, and modified without permission.

          • By Hamuko 2024-07-11 12:26

            Even if you reduce copyright to a year, it still requires waiting through that time before you can actually use the code. And even if you were free to use Windows’ source code a year after release, it still wouldn’t give you access to the source code itself. Meanwhile Microsoft would be free to use any GPL code a year after its release without worrying about any licensing requirements, since they have the source code freely available.

    • By raincole 2024-07-11 10:00

      What is astonishing is that a large proportion of the Free Software community relies on a platform owned by Microsoft.

      • By xeyownt 2024-07-11 10:16

        Because MS bought it. Don't invert the dependencies. MS depends on Free Software, not vice-versa.

        • By xigoi 2024-07-11 12:00

          GitHub was already proprietary before Micro$oft bought it.

      • By skc 2024-07-11 10:34

        I mean, a large proportion of the Free Software community loves Apple products, so it shouldn't be that surprising

  • By desiderantes 2024-07-11 9:42

    I think that the author has a warped idea of how LLMs work, and that infects their reasoning. Also, I see no mention of the inequality of this new "copyright-free code generation" situation they defend. As much as Microsoft thinks all code is ripe for the taking, I can't imagine how happy they would be if an anonymous person dropped a model trained on all the leaked Windows code and the ReactOS people started using it. Or if employees started taking internal code to train models that they then use after their employment ends (since it's not copyright infringement, it should be cool).

    • By xeyownt 2024-07-11 10:21

      I think the author has a much better knowledge of the legal implication of the situations you describe.

      These situations might trigger a lot of issues, but none related to copyright. If you work for MS, then move to another company, there is no copyright infringement if you simply generate new code based on whatever you read at MS. There might be some rules regarding non-compete clauses, etc., but these are not related to copyright.

      The very basic question is how the LLM got trained and how it got access to the source. If MS source code were to leak, you could not sue people for reading it.

      • By ealexhudson 2024-07-11 10:34

        I'm not sure that's completely true.

        Having read MS code and starting to generate new code that is heavily inspired by it: sure, that's not copyright infringement. But if you had memorized a bunch of code (and this is within human capability; people can recite many works of literature of varying length with total accuracy, given sufficient study), that would be copyright infringement once the code was a non-trivial amount. The test in copyright is whether the copying is literal, not how the copying was done or whether it passed through a human brain.

        This scenario rarely comes up because humans are, generally, an awful medium for accurate repetition. However, it's not really been shown that LLMs are not: in fact, Copilot claims (at least in its Enterprise agreements) to check that its output _does not_ parrot existing code identically. The specific commitment they made in their blog post is/was, "We have incorporated filters and other technologies that are designed to reduce the likelihood that Copilots return infringing content". To be clear, they only propose to reduce the possibility, not remove it.

        LLMs rely on a form of lossy compression which can sometimes give back verbatim content. I think it's pretty clear and unarguable that this is a copyright infringement.

    • By exe34 2024-07-11 10:14

      maybe somebody should fine tune a llama on the various leaked windows sources...

HackerNews