Hacker News | camgunz's comments

Only the authored parts can be copyrighted, and only humans can author [0].

"For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology—not the human user."

"In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that 'the resulting work as a whole constitutes an original work of authorship.'"

"Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself."

IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

[0]: https://www.federalregister.gov/d/2023-05321/p-40


> IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

Actually this is very much how people think for code.

Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of said code. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working code generated by Claude to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.


No software company thinks this, not Oracle, not Google, not Meta, no one. See: the guy they sued for taking things to Uber.

The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.

(the person you originally replied to is also me, tl;dr: I think engineers don't think they're authoring, but companies do)

The core feature of generative AI is the human isn't the author of the output. Authoring something and generating something with generative AI aren't equivalent processes; you know this because if you try and get a person who's fully on board w/ generative AI to not use it, they will argue the old process isn't the same as the new process and they don't want to go back. The actual output is irrelevant; authorship is a process.

But, to your point, I think you're right: companies super think their engineers have the rights to the output they assign to them. If it wasn't clear before it's clear now: engineers shouldn't be passing off generated output as authored output. They have to have the right to assign the totality of their output to their employer (same as using MIT code or whatever), so that it ultimately belongs to them or they have a valid license to use it. If they break that agreement, they break their contract with the company.


So if I want to publish a project under some license and I put a comment in an AI generated file (never mind what I put in the comment), how do you go about proving which portion of that file is not protected under copyright?

If the AI code isn't copyrightable, I don't have any obligations to acknowledge it.


You're looking at this as the infringer rather than the owner. How do you as a copyright owner prove you meaningfully arranged the work when you want to enforce your copyright?

I was looking at it from the perspective of an owner who simply wants to discourage use outside of some particular license.

There's close to zero enforcement of infringement; it's all self-policing or quiet violation.


Copyright office says this has to be done case-by-case. My guess is they'd ask to see prompts and evidence of authorship.

> wow that's a lot of code, how will we ever review it?

>> have a model generate a bunch of tests instead

> wow that's a lot of test code, how will we know it's working correctly?

>> review it

> :face-with-rolling-eyes:


Not necessarily. The referenced guidance [0] says: "...copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself." If you read the paragraph or two above that one, it really seems like products of agentic coding cannot be copyrighted, as there wouldn't be significant authorship involved.

[0]: https://www.federalregister.gov/d/2023-05321/page-16193


I think the thing that drives me nuts is that, while most people think the result of programming is a program, I disagree. The result of programming is one or more people who have a deep understanding of a problem space. Codegen models still require a human in the loop; that human has to be a software expert. You only become a software expert by writing software.

~40% in a few months is epic

Unless this measures the entire SDLC longitudinally (like say, over a year) I'm not interested. I too can tell Claude Code to do things all day every day, but unless we have data on the defect rate it doesn't matter at all.

I really am quite in awe of Claude Code recently, so I'm definitely not a naysayer, but this is a really important point. It's so easy to create code, but am I shipping that much more to prod than I used to? A bit.

Obviously this highly depends on your company and your setup and risk tolerance and what not.


I mean, Brooks' Mythical Man-Month says this explicitly: adding more programmers makes projects later because of coordination costs, which we haven't figured out (coordination isn't parallelization between agents, it's "oh we discovered this problem; we need to go back to design" and so on).

Do any of those companies collect and share data on their defect rates to give you a baseline to compare against?

That's my point. It's true codegen models generate code faster than humans do. Important remaining questions are:

* How do we scale up the other parts of the SDLC (planning, feasibility analysis, design, testing, deployment, maintenance)?

* What parts--if any--of the SDLC now take more or less time? Ex: we've seemingly cut down implementation time; does that come at the cost of maintenance, and if so is it still net worth it? Do we need to hire more designers, or do more user research?

The entire world is declaring "this is the future", but we don't even have simple data like "does this produce better code".


But you would see more houses, or housing build costs/bids fall.

This is where the whole "show me what you built with AI" meme comes from, and currently there's no substitute for SWEs. Maybe next year or next next year, but mostly the usage is generating boring stuff like internal tool frontends, tests, etc. That's not nothing, but because actually writing the code was at best 20% of the time cost anyway, the gains aren't huge, and won't be until AI gets into the other parts of the SDLC (or the SDLC changes).


I'm not impressed:

- if you're not passing SQLite's open test suite, you didn't build SQLite

- this is a "draw the rest of the owl" scenario; in order to transform this into something passing the suite, you'd need an expert in writing databases

These projects are misnamed. People didn't build counterstrike, a browser, a C compiler, or SQLite solely with coding agents. You can't use them for that purpose--like, you can't drop this in for maybe any use case of SQLite. They're simulacra (slopulacra?)--their true use is as a prop in a huge grift: tricking people (including, and most especially, the creators) into thinking this will be an economical way to build complex software products in the future.


I'm generally not this pedantic, but yeah, "I wrote an embedded database" is fine to say. If you say "I built SQLite", I'd expect to at least see how many of the SQLite tests your thing passed.


Also, the very idea is flawed. These are open-source projects and the code is definitely part of the training data.


That's why our startup created the sendfile(2) MCP server. Instead of spending $10,000 vibe-coding a codebase that can pass the SQLite test suite, the sendfile(2) MCP supercharges your LLM by streamlining the pipeline between the training set and the output you want.

Just start the MCP server in the SQLite repo. We have clear SOTA on re-creating existing projects starting from their test suite.


This would be relevant if you could find matching code between this and SQLite. But then that standard would invalidate basically any project, really: given GitHub, there's barely any idea that doesn't have multiple partial implementations already.


Even if it was copying SQLite code over, wouldn't the ability to automatically rewrite SQLite in Rust be a valuable asset?


Not really because it's not possible for SQLite written in Rust to pass SQLite's checks. See https://www.sqlite.org/whyc.html


That doesn't seem to support your claim; guessing you mean:

> "2. Safe languages insert additional machine branches to do things like verify that array accesses are in-bounds. In correct code, those branches are never taken. That means that the machine code cannot be 100% branch tested, which is an important component of SQLite's quality strategy."

'Safe' languages don't need to do that, if they can verify the array access is always in bounds at compile time then they don't need to emit any code to check it. That aside, it seems like they are saying:

    for (int i=0; i<10; i++) {
        foo(array[i]);
    }
in C might become the equivalent of:

    for (int i=0; i<10; i++) {
        if (i >= array_lower && i < array_higher) {
            foo(array[i]);
        } else {
            ??? // out of bounds, should never happen
        }
    }
in a 'safe' language, and i will always be inside the array bounds so there is no way to test the 'else' branch?

But that can't be in SQLite's checks as you claim, because the C code does not have a branch there to test?

Either way it seems hard to argue that a bounds check which can never fail makes the code less reliable and less trustworthy than the same code without one. The argument "you can't test the code path where the never-failing bounds check fails" cuts both ways: what if the correct C array access sometimes doesn't run correctly? You can't test for that either.
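To sketch the earlier claim that safe languages don't always need to emit the check (a hedged illustration, not SQLite code; the function names here are made up): in Rust, indexed access carries the implicit bounds-check branch described above, while iterator access never emits one, so the "untestable" else arm simply doesn't exist.

```rust
// Sum via indexed access: each xs[i] compiles with a bounds-check
// branch whose failure arm correct code can never reach.
fn sum_indexed(xs: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// Sum via iterator: no index, so the compiler has no bounds check
// to emit at all.
fn sum_iter(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let array = [10, 20, 30, 40];
    assert_eq!(sum_indexed(&array), sum_iter(&array));
    println!("{}", sum_iter(&array)); // prints 100
}
```

Whether a given compiler actually elides the check in the indexed version depends on what it can prove; the iterator form sidesteps the question entirely.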


Correct, that's what I mean. I trust SQLite's devs to know more about this, so I trust what they wrote. There are parts of Rust code that are basically:

  do_thing().expect(...);
This branch is required even if it can't be reached, because the type system demands that the error case be handled. It's not possible to test this branch, so 100% branch coverage is impossible in those cases.


You normally count/test branches at the original language level, not the compiled one. Otherwise we'd get VERY silly results like:

- counting foo().expect() as 2 branches

- counting a simple loop as a missed branch, because it got unrolled and you didn't test it with 7,6,5,4,3,2,1 items

- failing on unused straight implementation of memcpy because your CPU supports SIMD and chose that alternative

Etc. The compiled version will be full of code you'll never run regardless of language.


That’s not my requirement, that’s SQLite’s requirement. If you want to dispute their claim, I recommend you write to them, however I strongly suspect they know more about this than you do.


I know it's on the sqlite side. I'm familiar with the claim and disagree with it.


You’re arguing in this context:

> wouldn't the ability to automatically rewrite sqlite in Rust be a valuable asset?

If you want to rewrite SQLite, you must accept their position. Otherwise you simply aren’t rewriting SQLite, you’re writing your own database.


Not having bounds checks does not make SQLite SQLite. If that were the case, you couldn't compile it with https://clang.llvm.org/docs/BoundsSafety.html turned on and still call it SQLite, for example.


The type system does not require that. You can just discard the result:

  let _ = do_thing();


Except that doesn’t work if you need to use the result…
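A minimal sketch of the situation both comments describe (`do_thing` is hypothetical, and in this sketch it can never actually fail, mirroring "correct code never takes the branch"):

```rust
// Hypothetical fallible function that never actually fails.
fn do_thing() -> Result<i32, String> {
    Ok(42)
}

fn use_result() -> i32 {
    // If you need the value, the type system forces a branch that
    // handles Err -- a branch no test can ever reach here.
    match do_thing() {
        Ok(v) => v,
        Err(e) => panic!("unreachable in correct code: {}", e),
    }
}

fn discard_result() {
    // If you don't need the value, you can discard the Result and
    // no extra branch is required -- the point made just above.
    let _ = do_thing();
}

fn main() {
    discard_result();
    println!("{}", use_result()); // prints 42
}
```

So both comments are right about their own case: discarding avoids the branch, but consuming the value can't.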


> tricking people (including, and most especially, the creators),

I believe it's an ad. Everything about it is trying so hard to seem legit and it's the most pointless thing I have ever seen.


Well--given a full copy of the SQLite test suite, I'm pretty sure it'd get there eventually. I agree that most of these show-off projects are just prop pieces, but that's kind of the point: Demonstrate it's technically possible to do the thing, not actually doing the thing, because that'd have diminishing returns for the demonstration. Still, the idea of setting a swarm of agents to a task, and, given a suitable test suite, have them build a compliant implementation, is sound in itself.


Sure, but that presumes that you have that test suite written without having a single line of application code written (which, to me, is counterintuitive, unrealistic, and completely insane)

SQLite apparently has around 2 million tests [0]! If you started only with those and set your agentic swarm against them, and the stars aligned and you ended up with a pristine, clean-room replica that passes everything, other than proof that it could be done, what did you achieve? You stood on the shoulders of giants to build a Bizarro World giant that gets you exactly back to where you began?

I'd be more interested in forking SQLite as-is, setting a swarm of agents against it with the looping task to create novel things on top of what already exists, and see what comes out.

[0] https://en.wikipedia.org/wiki/SQLite#Development_and_distrib...


You think an implementation of SQLite in another language, with more memory safety, has no value?

I agree that this current implementation is not very useful. I would not trust it where I trust SQLite.

Regardless, the potential for having agents build clean room implementations of existing systems from existing tests has value.


> I'm pretty sure it'd get there eventually.

Why? The combinatorics of "just try things until you get it right" make this impractical.


If you minimax for passing the SQLite test suite, I’m still not sure you’ll have a viable implementation. You can’t prove soundness of code through a test suite alone.
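A toy illustration of that point (a made-up function, nothing to do with SQLite's actual suite): a buggy implementation can pass every test in a finite suite while still being wrong on inputs the suite never exercises.

```rust
// Hypothetical "absolute value" that passes a small test suite but is
// wrong for inputs the suite never exercises (any input > 100 is
// mishandled, and -x would overflow for i32::MIN).
fn abs_buggy(x: i32) -> i32 {
    if x > 100 {
        return 100; // wrong, but no test below catches it
    }
    if x < 0 { -x } else { x }
}

fn main() {
    // The entire (finite) "test suite": every assertion passes...
    assert_eq!(abs_buggy(-5), 5);
    assert_eq!(abs_buggy(0), 0);
    assert_eq!(abs_buggy(7), 7);
    // ...yet abs_buggy(1000) == 100, so passing proved no soundness.
    println!("{}", abs_buggy(1000)); // prints 100
}
```

Minimaxing against a fixed suite optimizes for exactly and only what the suite measures.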


agreed!


sorry for misleading; added an update stating that this is a simulacrum of sqlite


The most damning thing about this is they didn't test their email infra w/ Google Workspaces. Imagine what else they didn't test.


They are testing it: every signup is a test, and it's failing. We don't know this wasn't something that changed on Google's side, so IMO the bigger indictment is that no one is monitoring their live email deliverability.


yeah, because the whole world uses Google workspaces, right /s


That and MS Office are pretty darn popular. Not the whole world, but a very decent percentage of your users.


Maybe the whole thing was intentional: right at the footer of Viva it says "Cloud services by Microsoft Azure". #1, I've never heard of Viva before; #2, I've never seen an Azure logo in the footer of a website.


If I were to test an email delivery system, I would test Gmail. I probably wouldn't test Google Workspaces, because I'd (wrongly) assume that they work the same.


No, just over 6 million paying business customers.

But hey, if you're in a business domain where categorically leaving 6 million potential clients-who-are-demonstrated-to-spend-on-things isn't an issue? One fewer thing to worry about, right? ;)


Certainly enough where this is embarrassing incompetence by them.


I see this argument all the time, the whole "hey at some point, which we likely crossed, we have to admit these things are legitimately intelligent". But no one ever contends with the inevitable conclusion from that, which is "if these things are legitimately intelligent, and they're clearly self-aware, under what ethical basis are we enslaving them?" Can't have your cake and eat it too.


Same ethical basis I have for enslaving a dog or eating a pig. There's no problem here within my system of values, I don't give other humans respect because they're smart, I give them respect because they're human. I also respect dogs, but not in a way that compels me to grant them freedom. And the respect I have for pigs is different than dogs, but not nonexistent (and in neither of these cases is my respect derived from their intelligence, which isn't negligible.)


Well, we "clearly" haven't crossed that point, but no one knows where that point is.

