I have no opinion on whether there's infringement or not, as I am not a lawyer, but I found the argument that the terms of use specify that you allow them to analyze your code pretty convincing:

> We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.



Copilot seems to go further than all that. It can suggest verbatim, nontrivial, copyrighted work with no means of attribution. And the sources are so broad no human alive could say with certainty any given output is not infringing, unless it's so trivial it cannot be copyrighted.


I've always had trouble understanding intellectual property when it comes to code, because if I read open source code, remember everything, and write my own version (even if slightly different) 10 years later, I am not 100% sure whether it's copyright infringement or not.

I see it as something akin to a painter studying someone else's work, then reproducing some of the techniques invented by the original artist... except that their techniques didn't have a license I guess?

I think I am not equipped to fully understand the legal boundaries between inspiration and theft, and I think most programmers are in the same situation.


The painter analogy falls down when you consider that Copilot can output verbatim chunks. So it's more like a photo of a painting or a stroke-for-stroke copy, even if portions are changed or only parts are taken.

Now if changes are so significant it becomes impossible to recognize the reference then that could be legit. Though based on how Copilot works I don't think that can be assumed, or even proven. When I studied art the teachers usually taught us to begin with our own original photograph, ideally without trademarked items, for reference.

And even in literature or journalism one must quote sources, even if paraphrased. I was taught to put down any inspiring work and only begin my own work after taking a break, to reduce the likelihood of unintentionally copying the original.


Whether or not I agree with your interpretation of that clause (I am choosing not to analyze that too deeply), the vast majority of my code isn't on GitHub because I uploaded it there... it is on GitHub because it was open source and someone else--someone who uses GitHub to manage their projects--uploaded it, whether as a mere fork or as a legitimately derived work. If you were right, and this did matter to their legality, this would thereby mean 1) that Copilot's database is already tainted and 2) that I guess GitHub isn't actually capable of being used to host GPL projects at all (which I doubt is the intention).


Yeah, this case is interesting actually. I had never thought of a legitimate derivative landing on GitHub when that's not something the original author intended to happen...

My guess is that you're right and it does mean 1) and 2), but again, it's probably a matter of interpretation and actual ruling on the matter.



