Discussion about this post

Ari Shtein:

Quick clarifying note on section 4:

I'm not worried that AGI will look at humans, appraise us as evil, and then try to kill us out of revenge. (Though I guess this is vaguely possible?) The issue is that, right now, everyone agrees that humans are the moral center of the universe. But they think so for different reasons.

Lyman Stone thinks it's because we have way more capacity for experiencing morality, and also because all our intuitions say this is true.

Bentham's Bulldog thinks it's because we have the capacity for moral development, and that animal welfare is basically a utility monster. Our moral center-ness rests on the fact that we can get better at treating other things well, not on the claim that how we're treated matters a great deal.

Aligned AGI is either a good thought experiment or a real issue, and either way it illuminates the divide between these views. A good AGI would, on Bentham's view, do things like:

- Prioritize its own suffering above ours. Superintelligence doesn't necessarily imply consciousness, but it seems to make it more likely. Perhaps the AGI itself will become the moral center of the universe on a Bentham-esque view.

- Prioritize animal welfare above ours. This is the view that I think Bentham has to support: remember, humans are only the moral center because we have the future capacity to do something about "The Worst Thing In The World" and "The Worst Thing Ever." AGI has the present capacity to do something about it, so it'll just do it and mostly ignore us.

- Prioritize the creation of its own utility monster. It might turn out to be kinda difficult to wirehead any of the world's existing creatures. So the AGI might figure that it should just genetically engineer / program its own new creatures with a huge capacity for pleasure and then start stimulating them constantly. Then it'll ignore everything else.

(That last one looks like it's only a problem on hedonic utilitarianism, but there are isomorphic cases for other formulations. Maybe there's an objective list! Ok, then what are the odds that we humans happen to be able to check *all* the things off it, and that we can do so for not too much energy? Pretty tiny! AGI would likely just make some new creatures that can do more things on the list more cheaply. Similarly for desire: humans have only so many desires and they're only so cheap to satisfy. Maybe AGI could create some consciousnesses whose only desire is "feed me one electron every 10,000 years" and then just feed them all one electron every 10,000 years.)

The scariest implication of all this is that if there were a button labeled "bring perfectly moral AGI into the world," we would be obligated to press it! If Bentham is right about morality, we'd be obligated to press a button that led to our own extinction.

Aristides:

I never realized that when some people describe aligned AI, they mean objectively moral AI. I always thought aligned AI meant AI that does what we want it to do, which is almost certainly not the morally best thing. "We" is ambiguous (it could mean the creators, billionaires, or governments), but none of them want a perfectly moral AI, for exactly this reason.

