
I don't know that he would agree with me, but EY's argument seems to me to boil down to this: without a formal specification of goals, we won't know how to predict a system's actions, and when a system we can't predict is better at reaching its goals than we are, any bug is very likely to be a (literally) fatal one.

Humans have a vast network of shared implicit goals and norms, such that we know that satisfying a goal of "end poverty" doesn't justify "kill all the poor". It's easy to wave your hands and say that by the time we approach human-level AI it will necessarily have all the features of human intelligence, but what if it doesn't?
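To make that failure mode concrete, here's a toy sketch of my own (made-up metric, actions, and numbers, not anyone's actual proposal): hand a greedy optimizer the literal objective "minimize the poverty rate" and it discovers that removing poor people from the population is cheaper than raising their incomes.

    # Toy illustration (hypothetical): the optimizer only sees the metric,
    # not the implicit human norm that "remove the poor" is monstrous.
    POVERTY_LINE = 10

    def poverty_rate(incomes):
        if not incomes:
            return 0.0
        return sum(1 for x in incomes if x < POVERTY_LINE) / len(incomes)

    def actions(incomes, i):
        """Two ways to 'help' person i: the intended one and the loophole."""
        gap = POVERTY_LINE - incomes[i]
        yield "give_aid", gap, incomes[:i] + [POVERTY_LINE] + incomes[i+1:]
        yield "remove", 1, incomes[:i] + incomes[i+1:]  # exploits the metric

    def greedy_optimize(incomes, budget):
        while budget > 0:
            poor = [i for i, x in enumerate(incomes) if x < POVERTY_LINE]
            options = [(poverty_rate(state), cost, name, state)
                       for i in poor
                       for name, cost, state in actions(incomes, i)
                       if cost <= budget]
            if not options:
                break
            rate, cost, name, incomes = min(options)
            budget -= cost
            print(f"{name}: poverty rate -> {rate:.2f} (cost {cost})")
        return incomes

    greedy_optimize([2, 3, 50, 60], budget=4)  # picks "remove" twice

With a budget too small to lift anyone over the line, the cheapest path to a lower rate is deletion. Nothing in the program is malicious; the gap is entirely in the specification.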

Various people have used the term "optimization process" rather than "intelligence" to try to get this point across. What if it's possible to build an optimization process that is better, or far better, at solving problems than humans are? Planes are much better at flying fast and far than birds, yet they do without things that all birds have, like flapping wings and feathers. I believe that Eliezer thinks the things that make us human are, analogously, feathers.

Even if you think there's only a small chance of this, the chance that a program more intelligent than humans will exhibit behavior that humans consider buggy is very high, and the consequences are an existential risk.



Why would such an "optimization process" have to have something resembling the continuous environmental monitoring process we call consciousness? Why would it even need to be self-aware? Why would it need any concept that its goals are its own? For such an optimization tool to be useful, we just need to be able to use it. It would need an inkling of the goals we set it. It would need to understand individuals and other entities. It would not need any concept of self.
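A sketch of what I mean (my own toy example): a hill climber is a useful optimizer the moment you hand it an objective and a neighborhood function. Its "goals" live entirely in the arguments we pass in; nowhere does it represent itself.

    def hill_climb(objective, start, neighbors, steps=1000):
        """Greedy local search: keep moving to the best-scoring neighbor."""
        current = start
        for _ in range(steps):
            best = max(neighbors(current), key=objective)
            if objective(best) <= objective(current):
                return current  # local optimum reached
            current = best
        return current

    # The "goal" is just a function we supply: maximize -(x - 7)^2.
    print(hill_climb(objective=lambda x: -(x - 7) ** 2,
                     start=0,
                     neighbors=lambda x: [x - 1, x + 1]))  # -> 7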


I'm not sure if you believe you're disagreeing with me, but you're not. Goals are necessary for an optimization process, but errors in specifying those goals (or in the goal-specification procedure itself) are potentially existential risks for a powerful enough process.


The counterargument is probably that a self-improving AI would need to be self-aware.

Besides, it's a well-known trope in these stories that AI becomes self-aware on its own anyway, so there you go. By argumentum ad Jurassic Park: life, I mean intelligence, always finds a way.


> The counterargument is probably that a self-improving AI would need to be self-aware.

Why, if we're the ones commanding it? Why can't we just tell a non-self-aware AI to go and improve its own design? It may even deduce that the design is its own, but it may be composed such that it simply doesn't care.
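For instance (a toy sketch of my own, not a claim about how real self-improvement would work): a search procedure can tune its own step-size parameter by running candidate configurations of itself and scoring them. Its "design" is just data it optimizes over; nothing marks that data as "me".

    import random

    def search(step, objective, start=0.0, iters=200, seed=0):
        """Random local search with a fixed step size."""
        rng = random.Random(seed)
        x = start
        for _ in range(iters):
            candidate = x + rng.uniform(-step, step)
            if objective(candidate) > objective(x):
                x = candidate
        return objective(x)

    def improve_own_design(objective, candidate_steps):
        # The search's own parameter is just another thing to optimize.
        return max(candidate_steps, key=lambda s: search(s, objective))

    # Which version of itself best reaches the goal x = 3?
    print(improve_own_design(lambda x: -(x - 3.0) ** 2, [0.01, 0.1, 1.0, 10.0]))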

The advantage of a self-aware AI is that it can come up with goals that you didn't think of. This is the quintessence of the "double-edged sword." Humans are already quite good at this, however. As William Gibson wrote, "the street finds its own uses for things."

Eurisko has already demonstrated that a non-self-aware AI can arrive at ways to satisfy goals you never thought of. (So have Bayesian spam filters.) This is already a powerful tool that we haven't exploited even half as well as we might.
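The spam-filter case is easy to sketch (minimal naive Bayes, my own toy corpus): nobody tells it which words matter; it derives the discriminating tokens from counts alone, and it has no model of itself.

    from collections import Counter
    import math

    def train(spam_docs, ham_docs):
        spam, ham = Counter(), Counter()
        for d in spam_docs: spam.update(d.lower().split())
        for d in ham_docs:  ham.update(d.lower().split())
        return spam, ham, sum(spam.values()), sum(ham.values())

    def spam_log_odds(doc, model):
        spam, ham, n_spam, n_ham = model
        score = 0.0
        for w in doc.lower().split():
            # Laplace-smoothed likelihood ratio for each token.
            score += math.log(((spam[w] + 1) / (n_spam + 2)) /
                              ((ham[w] + 1) / (n_ham + 2)))
        return score

    model = train(spam_docs=["buy cheap pills now", "cheap pills cheap"],
                  ham_docs=["meeting notes attached", "lunch at noon"])
    print(spam_log_odds("cheap pills", model) > 0)  # True: learned, not coded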



