Bentham's Bulldog Should Think Aligned Superintelligence Would Kill Us All
In limited defense of Lyman Stone
1.
Lots of very bad things are happening in the world all the time.
These include the normal ones you might think of: murder, rape, cancer, and so on. But also some stranger ones, which you'll only think of if you're a regular in a very particular corner of the blogosphere.
For example, the suffering of shrimp is a big deal on PhilosophyStack!
Something like 450 billion shrimp are farmed each year. This involves freezing and suffocating them to death for 20 minutes, despite the fact that they probably feel excruciating pain through it all.
Bentham's Bulldog has written a lot about shrimp welfare, including a reply to Lyman Stone's recent very silly critiques of it. The spat is probably best summarized as:
Bentham: Hey, look at this: shrimp feel pain (probably about 19% as intensely as we do) and are being tortured en masse constantly! That seems bad, and also it's very cheap to fix. Let's try to fix it!
Stone: Nobody cares about shrimp.
Bentham: They're still important though!
Stone: *confused mumbling about intuitionism and misrepresentation of the science*
Commenters: Hey, this is all just confused mumbling about intuitionism and misrepresentation of the science.
Stone: I WANT TO KILL AND TORTURE THE SHRIMP! *more confused mumbling* You're all in the pocket of Big Shrimp Stunner.
Bentham: ??? Shrimp feel pain! Let's try to torture them less.
And it seems like things have now died down, which is really just great news.
This all has been very interesting to me on the meta level, though, so I'm gonna enter the discourse very late and also try to push it in a strange new direction.
2.
Lyman Stone is legitimately a smart guy. I mean, he's a PhD candidate at one of the best universities in Canada!1 But he got this one really really wrong, and he was really really wrong about his broader critique of effective altruism too.
I'm gonna venture a guess as to why he gets so blustery and mouth-foamy on topics like these: effective altruists and shrimp welfarists come to really scary conclusions about the nature of the world.
EAs think things like "nearly everyone in the developed world is walking by drowning children and doing nothing every day. (But not us!)" They call everyone else moral monsters, and then put themselves above it. That's a little gross, and also implies some terrible things about the world, Stone's instincts say. And he's smart enough to come up with some superficially plausible justifications for why the EAs could be wrong.2
Shrimp welfarists catastrophize too. Bentham's opening shot in this fight was an article called "The Best Charity Isn't What You Think."
His article on wild animal suffering is called "The Worst Thing In The World Isn't What You'd Expect."
And on factory farming, he wrote "Factory Farming Is Not Just Bad: It's the Worst Thing Ever."
You may be sensing a common theme!
Bentham really strongly believes that the world is full of terrible suffering and evil and disutility.
But he's also a longtermist and a pro-natalist: he thinks that humanity should stick around. This is mostly because we have the capacity for moral development. Eventually, we'll learn to stop factory farming, and we'll begin to reduce wild animal suffering one way or another.
What a nice, optimistic message! And it makes sense too: fish won't stop having sex automatically, but we humans could certainly choose to get in the way.
Why would Lyman Stone object so viciously and irrationally to the arguments of such a nice young man with the best interests of humanity in mind?
Out of… charity, I'm gonna assume that Stone is playing 5D-chess here, and all the rest of this post flashed before his eyes in a feverish dream before he started calling Bentham a grifter.
3.
AI is moving really fast. Metaculus has a central estimate of 2030 for the advent of general intelligence.
And the community puts a 95% chance on human-machine intelligence parity by 2040. Thatโs up from 80% in 2023, 60% in late 2022, and ~40% before then.
The question of alignment is very much still open. We donโt yet know how to make these smarter-than-us machines compatible with our own interests, much less morality itself.
If we don't figure alignment out before creating AGI, everyone pretty much agrees that all will be lost. Whatever the most powerful agent's goal function, it'll be able to redirect all the resources that help us survive to satisfy it. The canonical example is the paperclip maximizer. An AGI whose goal function is simply "make paperclips" would use all our electricity to make paperclips, and then all our other kinds of energy too: our food, our pets, our selves.
And the AGI would be too smart for us to mount anything more than paltry resistance. It could hack its way out of whatever silos we tried to put it into, then use all our food to make paperclips, then make more paperclips out of our corpses.
So alignment is important!
And there's been some good news on this front recently: a study suggested that LLMs might be bundling together various different activities under broad labels like "good" and "bad." Forefather of AI safety Eliezer Yudkowsky called this "possibly the best AI news of 2025."
So let's make a big assumption: let's say that, through this mechanism or another, we get alignment right. We build a superintelligent agent which perfectly understands morality and will follow it to a tee.
What happens next?
4.
If Bentham's right about the total moral awfulness of our current world, it sure seems like the AI would make some radical changes pretty quickly.
To Bentham, the only reason the world is better off with humans around is that we might make enough moral progress, in the longterm future, to begin dragging the world away from its absolute hellishness.
But the AGI's already done that! It made all the moral progress, and is ready to start dragging.
Meanwhile, us humans have been sitting around munching on hamburgers and shooting guns at each other. It seems like the AGI would run a quick calculation, realize that convincing all these humans to change their ways would be pretty resource-intensive, and decide to take all our energy and begin improving the universe right away.
Bentham thinks shrimp suffer about 1/5 as much as we do; let's be conservative and say they can only feel about 1/500th as much pleasure as we can.
There are a lot of shrimp out there. At bare minimum, a few trillion, but the total number is probably on the order of ten or a hundred trillion.3
Then the AGI might run a calculation like this:
Absolute bliss for 10 billion humans = 10 billion * X (where X is the moral value of one human's absolute bliss)
Absolute bliss for ~10 trillion shrimp = 10 trillion * X/500 = 20 billion * X
Since 20 billion * X > 10 billion * X, I should prioritize maximizing shrimp bliss.
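For the numerically inclined, here's a minimal sketch of that comparison in Python. The numbers (10 billion humans, ~10 trillion shrimp, a 1/500 discount on shrimp bliss) are just the assumptions from this post, and the variable names are mine; the only point is that the shrimp total comes out larger.

```python
# Back-of-the-envelope version of the AGI's calculation above.
# X, the moral value of one human's absolute bliss, is set to 1
# so the two totals can be compared directly.

HUMAN_BLISS_VALUE = 1.0      # X: value of absolute bliss for one human
SHRIMP_DISCOUNT = 1 / 500    # assumed: shrimp bliss is 1/500th of a human's
NUM_HUMANS = 10e9            # ~10 billion humans
NUM_SHRIMP = 10e12           # ~10 trillion shrimp (a conservative count)

human_total = NUM_HUMANS * HUMAN_BLISS_VALUE
shrimp_total = NUM_SHRIMP * HUMAN_BLISS_VALUE * SHRIMP_DISCOUNT

print(f"Total human bliss:  {human_total:.1e} * X")   # 1.0e+10 * X
print(f"Total shrimp bliss: {shrimp_total:.1e} * X")  # 2.0e+10 * X

if shrimp_total > human_total:
    print("Maximize shrimp bliss first.")
```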
And it'll do something similar for all our farmed animals, and all the insects on earth, and all the aliens in distant galaxies, and so on.
The odds that humans land anywhere near the top of this AGI's priorities are vanishingly small, especially given that we have a demonstrated tendency to inflict mass suffering on other creatures.
Why would the AGI spend much, or even any, energy on us, then? It's probable that its moral considerations would be so overwhelmed by priorities to do with "The Worst Thing In The World," "The Worst Thing Ever," and "The Best Charity" that it wouldn't give us humans a second thought.
If Bentham is right about morality, a moral-maximizer instantiated today would probably (rightfully) kill us all.
5.
Does any of this mean Bentham is wrong about animal welfare? Nope! His philosophical and empirical case remains strong. In all likelihood, we're moral monsters and humanity is a scourge, with only some upside hidden way out in the lightcone.
Unfortunately, much of that upside probably lies in the creation of morally aligned AGI. And if we create that aligned AGI before we've aligned ourselves (i.e., anytime soon), it won't hesitate to cut us out of the glorious maximally moral future.
Which is scary! It's terrifying!
It makes sense to, upon hearing that a morally perfect but energy-limited agent would totally smite us all, react along the lines of "What the fuck! No, please!"
Lyman Stone probably intuitively realized all this internally, then did lots of motivated reasoning out loud, and was ridiculed for it. Frankly, he deserved the ridicule.
But, Christ, let's have some compassion! Animal welfare arguments have really scary implications (implications that aren't excessively abstract, given current AGI timelines), and we shouldn't be surprised that people could get scared by them.
Stone overreacted and said lots of stupid things, but his underlying fear is real and reasonable, and we need to keep it in mind.
Lukethoughts
(If Lucas were a superintelligent AI, I'm sure he'd be aligned. His thoughts today!)
"GUYS I ALMOST FORGOT MY THOUGHTS." (Ed. note: It's ok, Luke, I forget my thoughts all the time.)
"How the fuck did I feel confident on my math and chem test today… math makes sense (I think I'm pretty good at it), but chemistry?? That's out of my pay grade." (Ed. note: Ugh, good for him, I guess. I flubbed the both of them…)
"DO NOT EAT 12 DONUTS, they will ruin your digestive tract for the day." (Ed. note: Ah. Um. Noted?)
"Yesterday was one of the best days there has been in a while. The weather needs to stay like this." (Ed. note: It was a really weirdly nice day!)
Which, I know, Canada. But McGill! But Canada. I dunno, credentialism is white supremacy culture or something anyway.
Further evidence for the "Lyman Stone thinks we think we're better than him" theory: many of his superficially plausible justifications have less to do with the object-level moral philosophy and more to do with calling EAs ineffectual and smug and dishonest. Much of his article was dedicated to "proving" that EAs also don't make significant donations, and so, by their own definitions, are just as morally monstrous as everyone else.
The upper bound of a Rethink Priorities report calculating the total number of caught shrimp is 66 trillion. There are probably many, many more living shrimp than caught ones, and obviously there are strictly at least as many.
Quick clarifying note on section 4:
I'm not worried that AGI will look at humans, appraise us to be evil, then try to kill us out of revenge. (Though I guess this is vaguely possible?) The issue is that right now, everyone agrees that humans are the moral center of the universe. But they think so for different reasons.
Lyman Stone thinks it's because we have way more capacity for experiencing morality, and also because all our intuitions say this is true.
Bentham's Bulldog thinks it's because we have the capacity for moral development, and that animal welfare, in aggregate, is basically a utility monster. Our moral center-ness is based on the fact that we can get better at treating other things well, not on the fact that treatment of us matters a great deal.
Aligned AGI is either a good thought experiment or a real issue that illuminates the divide between these views. Good AGI would, on Bentham's view, do some things like:
- Prioritize its own suffering above ours. Superintelligence doesn't necessarily imply consciousness, but it seems to make it more likely. Perhaps the AGI itself will become the moral center of the universe on a Bentham-esque view.
- Prioritize animal welfare above ours. This is the view that I think Bentham has to support. Remember, humans are only the moral center because we have the future capacity to do something about "The Worst Thing In The World" and "The Worst Thing Ever." AGI has the present capacity to do something about it, so it'll just do it and mostly ignore us.
- Prioritize the creation of its own utility monster. It might turn out to be kinda difficult to wirehead any of the world's existing creatures. So the AGI might figure that it should just genetically engineer / program its own new creatures with a huge capacity for pleasure and then start stimulating them constantly. Then it'll ignore everything else.
(That last one looks like it's only a problem on hedonic utilitarianism, but there are isomorphic cases for other formulations. Maybe there's an objective list! Ok, then what are the odds that us humans happen to be able to check *all* the things off it, and that we can do so for not too much energy? Pretty tiny! AGI would likely just make some new creatures that can do more things on the list more cheaply. Similarly for desire: humans have only so many desires, and they're only so cheap to satisfy. Maybe AGI could create some consciousnesses whose only desire is "feed me one electron every 10,000 years" and then just feed them all one electron every 10,000 years.)
The scariest implication of all this is that if there were a button labeled "bring perfectly moral AGI into the world," we would be obligated to press it! If Bentham is right about morality, we'd be obligated to press a button that led to our own extinction.
I never realized that when some people describe aligned AI, they mean objectively moral AI. I always thought that aligned AI meant it does what we want it to do, which is almost certainly not the morally best thing. "We" is ambiguous (it could mean the creators, billionaires, or the government), but none of them want a perfectly moral AI, for exactly this reason.