I thought they'd plagiarise, not import. Importing Servo's code makes it obvious, because it's so easy to look at their dependencies file. And yet ... they did. I really think they assumed no one would check.
Hypothetically: what if they did check, except that to ‘check’ they asked the LLM instead of verifying manually, and were told a story? Or perhaps they did check manually, but sometime afterwards the files were subtly changed, with no incentive or reason to do so beyond getting a test to pass? …
Humans who are bad and also bad at coding have predictable, comprehensible failure modes. They don’t spontaneously sabotage their career and your project because Lord Markov twitched one of its many tails. They also lie for comprehensible reasons, with attempts at logical manipulations of fact. They don’t spontaneously lie by claiming not to have a nose, apologize for lying and promise never to do it again, then swear they have no nose in the next breath while maintaining eye contact.
Semi-autonomous to autonomous is a doozy of a step.
You know, a good test would be to tell it to write a browser in a custom programming language, or at least in some language in which no web browser has been written.
Write a browser without any access to the internet: that's what I'd attempt if I were running this experiment. Just seed it with a bunch of local HTML, CSS and JS files from the various test suites that exist.