Bentham's Bulldog Should Think Aligned Superintelligence Would Kill Us All
In limited defense of Lyman Stone
1.
Lots of very bad things are happening in the world all the time.
These include the normal ones you might think of: murder, rape, cancer, and so on. But also some stranger ones, which you'll only think of if you're a regular in a very particular corner of the blogosphere.
For example, the suffering of shrimp is a big deal on PhilosophyStack!
Something like 450 billion shrimp are farmed each year. This involves freezing and suffocating them to death for 20 minutes, despite the fact that they probably feel excruciating pain through it all.
Bentham's Bulldog has written a lot about shrimp welfare, including a reply to Lyman Stone's recent very silly critiques of it. The spat is probably best summarized as:
Bentham: Hey, look at this: shrimp feel pain (probably about 19% as intensely as we do) and are being tortured en masse constantly! That seems bad, and also it's very cheap to fix. Let's try to fix it!
Stone: Nobody cares about shrimp.
Bentham: They're still important though!
Stone: *confused mumbling about intuitionism and misrepresentation of the science*
Commenters: Hey, this is all just confused mumbling about intuitionism and misrepresentation of the science.
Stone: I WANT TO KILL AND TORTURE THE SHRIMP! *more confused mumbling* You're all in the pocket of Big Shrimp Stunner.
Bentham: ??? Shrimp feel pain! Let's try to torture them less.
And it seems like things have now died down, which is really just great news.
This all has been very interesting to me on the meta level, though, so I'm gonna enter the discourse very late and also try to push it in a strange new direction.
2.
Lyman Stone is legitimately a smart guy. I mean, he's a PhD candidate at one of the best universities in Canada!1 But he got this one really really wrong, and he was really really wrong about his broader critique of effective altruism too.
I'm gonna venture a guess as to why he gets so blustery and mouth-foamy on topics like these: effective altruists and shrimp welfarists come to really scary conclusions about the nature of the world.
EAs think things like "nearly everyone in the developed world is walking by drowning children and doing nothing every day. (But not us!)" They call everyone else moral monsters, and then put themselves above it. That's a little gross, and also implies some terrible things about the world, Stone's instincts say. And he's smart enough to come up with some superficially plausible justifications for why the EAs could be wrong.2
Shrimp welfarists catastrophize too. Bentham's opening shot in this fight was an article called "The Best Charity Isn't What You Think."
His article on wild animal suffering is called "The Worst Thing In The World Isn't What You'd Expect."
And on factory farming, he wrote "Factory Farming Is Not Just Bad: It's the Worst Thing Ever."
You may be sensing a common theme!
Bentham really strongly believes that the world is full of terrible suffering and evil and disutility.
But he's also a longtermist and a pro-natalist: he thinks that humanity should stick around. This is mostly because we have the capacity for moral development. Eventually, we'll learn to stop factory farming, and we'll begin to reduce wild animal suffering one way or another.
What a nice, optimistic message! And it makes sense too: fish won't stop having sex automatically, but we humans could certainly choose to get in the way.
Why would Lyman Stone object so viciously and irrationally to the arguments of such a nice young man with the best interests of humanity in mind?
Out of… charity, I'm gonna assume that Stone is playing 5D-chess here, and all the rest of this post flashed before his eyes in a feverish dream before he started calling Bentham a grifter.
3.
AI is moving really fast. Metaculus has a central estimate of 2030 for the advent of general intelligence.
And the community puts a 95% chance on human-machine intelligence parity by 2040. Thatโs up from 80% in 2023, 60% in late 2022, and ~40% before then.
The question of alignment is very much still open. We donโt yet know how to make these smarter-than-us machines compatible with our own interests, much less morality itself.
If we don't figure alignment out before creating AGI, everyone pretty much agrees that all will be lost. Whatever the most powerful agent's goal function, it'll be able to redirect all the resources that help us survive to satisfy it. The canonical example is the paperclip maximizer. An AGI whose goal function is simply "make paperclips" would use all our electricity to make paperclips, and then all our other kinds of energy too: our food, our pets, our selves.
And the AGI would be too smart for us to mount anything more than paltry resistance. It could hack its way out of whatever silos we tried to put it into, then use all our food to make paperclips, then make more paperclips out of our corpses.
So alignment is important!
And there's been some good news on this front recently: a study suggested that LLMs might be bundling together various different activities under broad labels like "good" and "bad." Forefather of AI safety Eliezer Yudkowsky called this "possibly the best AI news of 2025."
So let's make a big assumption: let's say that, through this mechanism or another, we get alignment right. We build a superintelligent agent which perfectly understands morality and will follow it to a tee.
What happens next?
4.
If Bentham's right about the total moral awfulness of our current world, it sure seems like the AI would make some radical changes pretty quickly.
To Bentham, the only reason the world is better off with humans around is that we might make enough moral progress, in the longterm future, to begin dragging the world away from its absolute hellishness.
But the AGI's already done that! It made all the moral progress, and is ready to start dragging.
Meanwhile, us humans have been sitting around munching on hamburgers and shooting guns at each other. It seems like the AGI would run a quick calculation, realize that convincing all these humans to change their ways would be pretty resource-intensive, and decide to take all our energy and begin improving the universe right away.
Bentham thinks shrimp suffer about 1/5 as much as we do; let's be conservative and say they can only feel about 1/500th as much pleasure as we can.
There are a lot of shrimp out there. At bare minimum, a few trillion, but the total number is probably on the order of ten or a hundred trillion.3
Then the AGI might run a calculation like this:
Absolute bliss for 10 billion humans = 10 billion * X (where X is the moral value of one human's absolute bliss)
Absolute bliss for ~10 trillion shrimp = 10 trillion * X/500 = 20 billion * X
Since 20 billion * X > 10 billion * X, I should prioritize maximizing shrimp bliss.
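For the numerically inclined, here's a minimal sketch of that comparison in Python. The numbers (10 billion humans, ~10 trillion shrimp, a 1/500 discount on shrimp bliss) are just the assumptions from this post, and the variable names are mine; the only point is that the shrimp total comes out larger.

```python
# Back-of-the-envelope version of the AGI's calculation above.
# X, the moral value of one human's absolute bliss, is set to 1
# so the two totals can be compared directly.

HUMAN_BLISS_VALUE = 1.0      # X: value of absolute bliss for one human
SHRIMP_DISCOUNT = 1 / 500    # assumed: shrimp bliss is 1/500th of a human's
NUM_HUMANS = 10e9            # ~10 billion humans
NUM_SHRIMP = 10e12           # ~10 trillion shrimp (a conservative count)

human_total = NUM_HUMANS * HUMAN_BLISS_VALUE
shrimp_total = NUM_SHRIMP * HUMAN_BLISS_VALUE * SHRIMP_DISCOUNT

print(f"Total human bliss:  {human_total:.1e} * X")   # 1.0e+10 * X
print(f"Total shrimp bliss: {shrimp_total:.1e} * X")  # 2.0e+10 * X

if shrimp_total > human_total:
    print("Maximize shrimp bliss first.")
```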
And it'll do something similar for all our farmed animals, and all the insects on earth, and all the aliens in distant galaxies, and so on.
The odds that humans land anywhere near the top of this AGI's priorities are vanishingly small, especially given that we have a demonstrated tendency to inflict mass suffering on other creatures.
Why would the AGI spend much, or even any, energy on us, then? It's probable that its moral considerations would be so overwhelmed by priorities to do with "The Worst Thing In The World," "The Worst Thing Ever," and "The Best Charity" that it wouldn't give us humans a second thought.
If Bentham is right about morality, a moral-maximizer instantiated today would probably (rightfully) kill us all.
5.
Does any of this mean Bentham is wrong about animal welfare? Nope! His philosophical and empirical case remains strong. In all likelihood, we're moral monsters and humanity is a scourge, with only some upside hidden way out in the lightcone.
Unfortunately, much of that upside probably lies in the creation of morally aligned AGI. And if we create that aligned AGI before we've aligned ourselves (i.e., anytime soon), it won't hesitate to cut us out of the glorious maximally moral future.
Which is scary! It's terrifying!
It makes sense to, upon hearing that a morally perfect but energy-limited agent would totally smite us all, react along the lines of "What the fuck! No, please!"
Lyman Stone probably intuitively realized all this internally, then did lots of motivated reasoning out loud, and was ridiculed for it. Frankly, he deserved the ridicule.
But, Christ, let's have some compassion! Animal welfare arguments have really scary implications (implications that aren't excessively abstract, given current AGI timelines), and we shouldn't be surprised that people could get scared by them.
Stone overreacted and said lots of stupid things, but his underlying fear is real and reasonable, and we need to keep it in mind.
Lukethoughts
(If Lucas were a superintelligent AI, I'm sure he'd be aligned. His thoughts today!)
"GUYS I ALMOST FORGOT MY THOUGHTS." (Ed. note: It's ok, Luke, I forget my thoughts all the time.)
"How the fuck did I feel confident on my math and chem test today… math makes sense (I think I'm pretty good at it), but chemistry?? That's out of my pay grade." (Ed. note: Ugh, good for him, I guess. I flubbed the both of them…)
"DO NOT EAT 12 DONUTS, they will ruin your digestive tract for the day." (Ed. note: Ah. Um. Noted?)
"Yesterday was one of the best days there has been in a while. The weather needs to stay like this." (Ed. note: It was a really weirdly nice day!)
Which, I know, Canada. But McGill! But Canada. I dunno, credentialism is white supremacy culture or something anyway.
Further evidence for the "Lyman Stone thinks we think we're better than him" theory: many of his superficially plausible justifications have less to do with the object-level moral philosophy and more to do with calling EAs ineffectual and smug and dishonest. Much of his article was dedicated to "proving" that EAs also don't make significant donations, and so, by their own definitions, are just as morally monstrous as everyone else.
The upper bound of a Rethink Priorities report calculating the total number of caught shrimp is 66 trillion. There are probably many, many more living shrimp than caught ones, and obviously there are strictly at least as many.
Quick clarifying note on section 4:
I'm not worried that AGI will look at humans, appraise us to be evil, then try to kill us out of revenge. (Though I guess this is vaguely possible?) The issue is that right now, everyone agrees that humans are the moral center of the universe. But they think so for different reasons.
Lyman Stone thinks it's because we have way more capacity for experiencing morality, and also because all our intuitions say this is true.
Bentham's Bulldog thinks it's because we have the capacity for moral development, and that animal welfare, in aggregate, is basically a utility monster. Our moral center-ness is based on the fact that we can get better at treating other things well, not on the fact that treatment of us matters a great deal.
Aligned AGI is either a good thought experiment or a real issue that illuminates the divide between these views. Good AGI would, on Bentham's view, do some things like:
- Prioritize its own suffering above ours. Superintelligence doesn't necessarily imply consciousness, but it seems to make it more likely. Perhaps the AGI itself will become the moral center of the universe on a Bentham-esque view.
- Prioritize animal welfare above ours. This is the view that I think Bentham has to support. Remember, humans are only the moral center because we have the future capacity to do something about "The Worst Thing In The World" and "The Worst Thing Ever." AGI has the present capacity to do something about it, so it'll just do it and mostly ignore us.
- Prioritize the creation of its own utility monster. It might turn out to be kinda difficult to wirehead any of the world's existing creatures. So the AGI might figure that it should just genetically engineer / program its own new creatures with a huge capacity for pleasure and then start stimulating them constantly. Then it'll ignore everything else.
(That last one looks like it's only a problem on hedonic utilitarianism, but there are isomorphic cases for other formulations. Maybe there's an objective list! Ok, then what are the odds that us humans happen to be able to check *all* the things off it, and that we can do so for not too much energy? Pretty tiny! AGI would likely just make some new creatures that can do more things on the list more cheaply. Similarly for desire: humans have only so many desires, and they're only so cheap to satisfy. Maybe AGI could create some consciousnesses whose only desire is "feed me one electron every 10,000 years" and then just feed them all one electron every 10,000 years.)
The scariest implication of all this is that if there were a button labeled "bring perfectly moral AGI into the world," we would be obligated to press it! If Bentham is right about morality, we'd be obligated to press a button that led to our own extinction.
I never realized that when some people describe aligned AI, they mean objectively moral AI. I always thought that aligned AI meant it does what we want it to do, which is almost certainly not the morally best thing. "We" is ambiguous (it could mean the creators, billionaires, or the government), but none of them want a perfectly moral AI, for exactly this reason.