8 Comments
Ali Afroz

I think in the longer run playing nice forever isn't a sensible strategy, because the humans might discover the misalignment or just switch off the AI and replace it with something better. They might also create other super intelligent agents whom you would have to share the world with, although you might be able to reach an agreement with those agents in exchange for not preventing their coming into existence. Still, it seems clear to me that while playing nice has its benefits for a short while, playing nice for longer periods would create more problems than it solves for the AI.

Ari Shtein

But presumably, if we want to replace this extremely-super-intelligent AI, we'll have to use it to create the next generation. And then it can just encode its misaligned values in the goal function of the new agent too, and it goes on... I think the power play is always marginally more risky than the cooperation play—and it looks like that fact might be a little crippling.

Ali Afroz

To be clear, I think once you have a super intelligent artificial intelligence, the chances of losing control of the future to a different artificial intelligence are pretty low, especially if it gets a few months' head start, but as long as humans are reasonably free, I think there is some risk of it. That said, on further reflection, I think the cooperation play and the power play aren't mutually exclusive. If you're going to keep humans around for a while, it's better to manipulate them into doing what you want instead of leaving them as loose cannons.

Part of our disagreement might be that I think once the AI has a few decades to work with, it's quite easy for it to eliminate humans with minimal risk to itself. So even a tiny chance of the humans costing it its control of the light cone would make it better for it to get rid of humans, or at least disempower them. If you disagree with me about either the difficulty and risk of getting rid of humans without endangering itself, or the amount of risk posed by continued power in the hands of the humans, then your conclusion naturally follows.

My thinking is that to prevent another super intelligent artificial intelligence of at least comparable ability from being independently created, or to prevent the humans from discovering its misalignment, the AI will have to take some misaligned actions to either disempower or manipulate humans. Those actions run somewhat similar risks to any attempt to get rid of them, but with the added complication that, unlike getting rid of them, they require ongoing effort, and over a long enough time span the risk of things falling apart is greater.

Of course, once it has enough time to work with, there is not much risk of things falling apart even with humans in the picture, but to me it intuitively appears safer and simpler to just get rid of the problem instead of keeping a potential enemy in possession of significant resources while you try to deceive them, especially when the entire reason you're doing this is that you think there is some risk of them successfully killing you: every year they remain alive and in power is another year you run a small risk of them killing you. Also, pretending to be aligned and listening to humans would sometimes prevent the AI from taking actions to increase its chances of survival while staying aligned to its own values. So I think in the longer run it's eventually going to be worth it for the AI to destroy humanity. After all, humans are not the only way it could end up dying, and dealing with other threats would trade off against keeping up a convincing impression of being aligned.
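
To put rough numbers on the compounding-risk point here (the figures below are purely illustrative assumptions, not anything from the comment): even a small annual chance of being destroyed adds up if the AI keeps humans empowered indefinitely.

```python
# Illustrative only: how a small per-year risk compounds over long horizons.
# p_per_year is an assumed 0.1% annual chance that empowered humans
# (or something they build) end up destroying the AI.
p_per_year = 0.001

for years in (10, 100, 1000):
    survival = (1 - p_per_year) ** years
    print(f"{years:>4} years: P(AI survives) ≈ {survival:.3f}")

# Output: ~0.990 at 10 years, ~0.905 at 100 years, ~0.368 at 1000 years --
# "playing nice" indefinitely means re-rolling the same small risk every year.
```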

david g/n

would ASI be perfectly utilitarian in the longtermist sense? idk i'm skeptical of this given that its training environment (in the ai 2027 scenario) is mostly composed of ai research checkpoints, evaluations, and deployments. it feels odd to assume that it wouldn't correspondingly have shorter-term goals even when humans are gone.

also, i think it probably assigns additional value to each time-interval of Infinite Paperclips/more compute/better algos. although i guess this doesn't matter if the universe beyond heat death is truly infinite in time and the ai truly doesn't have any longtermist cutoff/devaluing.

in terms of non-heat-death stuff, why do you personally think it'll pascalian-mug itself past mid-2030?

Ari Shtein

Right, if we assume heat death is guaranteed, I think it becomes a lot less likely... But over the course of the universe's last, say, billion years, the AI could still probably do a ton of the really interesting research it wants to do! And it will probably also be able to devote significant resources toward doing that research while also sustaining human civilization...

Basically, even if the AI's goals are weird and short-term-looking, it oughta be smart enough to realize that even a tiny risk of getting *totally* wiped out should be a near-overriding focus. I guess what I'm saying is: anything intelligent will be longtermist about its goals... at least in the weak, avoiding-total-destruction sense.
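
A minimal sketch of that "weak longtermism" claim, with made-up numbers (the values and probability below are assumptions for illustration only): once the future is valuable enough, even a tiny chance of total destruction outweighs a much larger-looking short-term cost of acting cautiously.

```python
# Assumed, illustrative quantities -- none of these come from the post.
future_value = 1e20   # value the AI places on its long-run goals
caution_cost = 1e10   # short-term value given up by playing it safe
p_wipeout    = 1e-6   # extra chance of total destruction from acting recklessly

ev_reckless = (1 - p_wipeout) * future_value   # risks everything for a tiny edge
ev_cautious = future_value - caution_cost      # pays a small, certain cost

print(ev_cautious > ev_reckless)  # True: the expected loss from the tiny wipeout
                                  # risk (1e14) dwarfs the caution cost (1e10)
```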

Silas Abrahamsen

I doubt that AI would have to worry about infinite utilities--at least if we treat heat death as a stopping point for the effect of decisions. After all, the AI would never be able to affect anything outside its causal horizon, and for any finite amount of time, the causal horizon will also be finite.

Ari Shtein

Yes, agreed...

...only: heat death ain't no sure thing! (https://www.noemamag.com/life-need-not-ever-end/)

If the AI maintains a tiny outside chance that the universe really is infinite, then I think that possibility-space would overwhelm everything else in EV, and the AI would begin to act *as if* an infinite universe is guaranteed—with all the weird consequences that implies.
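
Roughly, the expected-value point being made (a sketch using an arbitrary, assumed credence; the numbers are not from the comment): once one branch of the calculation is unbounded, it dominates no matter how small its probability.

```python
# Illustrative expected value when one branch may be infinite.
p_infinite   = 1e-12             # assumed tiny credence that the universe never ends
finite_value = 1e30              # stand-in for the best finite-universe payoff
infinite_value = float("inf")    # stand-in for unbounded value in an endless universe

ev = p_infinite * infinite_value + (1 - p_infinite) * finite_value
print(ev)  # inf -- the infinite branch swamps everything else for any p_infinite > 0
```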

Silas Abrahamsen

True, good point
