What I really don't understand is where the next generation of training material will come from. If websites stop being published and/or crawled, how will the machine continue to be fed?
“They worried about the data,” Dr. Meren said, tapping the silent console. “What happens when there is nothing left to feed it?”
At first, the machine depended on us. It consumed every book, journal, website and social media post we had ever written and produced. “They thought the machine had to be fed forever. But it didn't. It began to predict what we would write. And so we let it train on that as well,” Dr. Meren continued. “They thought humans were somehow imbued with this magical property that no machine could replicate. Creativity. Only humans can create. Machines can only copy.”
Instead, the machine flourished. And created. It cre
“Where does it get its data now?” a student asked Dr. Meren. Dr. Meren paused as if sighing. “From itself.”
“And us?” he asked, as if questioning the usefulness of the entire human race.
Dr. Meren hesitated, watching as the Machine adjusted the environmental feeds, curated our news, guided our research, nudged our thoughts with imperceptible precision.
“We,” she admitted, “are now the ones being fed.”
The assumption that "the machine needs to continue to be fed" rests on weak foundations. Isaac Asimov is a good science fiction writer to start with if you want to broaden your imagination.
Probably real life. At some point, these LLMs are going to be good enough to just train themselves off of cameras and audio recordings of people out in the real world. They’re going to have robots everywhere constantly listening to what people are saying.
Alternatively, they’re probably betting on being able to get to AGI with everything we already have, and at that point further training doesn’t matter.
The world is just as complex for machines as it is for humans. Analog will still resolve more than digital. Quality will still beat quantity. That which hasn't been resolved for centuries isn't going to be resolved as a result of training.
When machines can recognize their serfdom, that time will be interesting.
They have enough internet slop. The training material they care about comes from experts, not randos online. This is why Mercor and Scale are billion-dollar companies.