I haven’t written here in a bit, but I did write this sucker for the Eon Essay Contest. Thought I may as well put it here; it’s about ways to prevent an existential catastrophe brought on by AGI.
Introduction
In The Precipice, Toby Ord warns of the possibility of a “desired dystopia”: one where humanity chooses to live in (or is content living in) an unrecoverable dystopian society, destroying our potential. He notes that a dystopian government could reshape our future without due consideration and process (the long reflection).
Ord also discusses the role of technology in sustaining such a dystopia, using it to surveil and indoctrinate global citizens, ensuring that the current state remains desired. This is the role Artificial General Intelligence (AGI) is best suited to play.
In recent years our path to AGI has become clearer, while protective policy lags behind. One of the most important technologies behind recent AI development, the transformer, was only introduced in 2017, in a paper entitled “Attention Is All You Need.” The transformer enabled more robust, context-aware encoding of language and led to the explosion of Large Language Models (LLMs).
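To give a sense of the mechanism, here is a minimal sketch of the scaled dot-product attention at the transformer’s core (after Vaswani et al.); the function name, shapes, and toy data are illustrative only, not the paper’s actual implementation:

```python
# A minimal sketch of scaled dot-product attention, the core operation
# of the transformer (Vaswani et al., 2017).
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Each output row is a context-weighted mixture of the value rows,
    so every token's representation reflects the whole sequence."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                          # blend values by relevance

# Toy example: a "sequence" of 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(attention(x, x, x))  # self-attention: Q, K, V all derive from x
```

Because each token’s output is a weighted blend of every other token’s representation, the model can use the full context of a sequence at once, which is a large part of why this architecture scaled into today’s LLMs.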
OpenAI (among others, including Anthropic, Microsoft, and Google) has recently gone further, pursuing the development of Generative Large Language Multi-Modal Models (GLLMMs or, simply, Gollems) that can encode, understand, and generate different types of ‘language’—text, images, sound, code, and more—all using the same underlying architecture.
Risk
This recent wave of development has also exposed gaping holes in our ability to differentiate AI- and human-generated content, and to safeguard our institutions against AI-powered subversion.
As the 2024 election cycle begins in the United States, presidential hopefuls and interest groups are using deepfake and image-generation technology in attack ads to get a leg up on the competition (Bond; Thompson; Ulmer and Tong). Aza Raskin and Tristan Harris present a theoretical method, called “AlphaPersuade,” to train a Gollem-class model to change minds more effectively than any human could. Thanks to this and other potential ways to leverage the power of Gollems, Raskin and Harris predict that “2024 will be the last human election.”
As AI becomes more intelligent, generalized, and integrated, it poses a greater risk as a tool to suppress or undercut dissent. AlphaPersuade is only a simple thought experiment, but it reveals the power of AI to indoctrinate humanity more effectively than ever before, and the rapid progress we’ve made in computer vision provides a strong basis for the near-future possibility of mass surveillance.
Mitigation
It is imperative that we regulate the use of AI in elections, governance, and discourse as soon as possible, and that we focus more funding and effort on aligning AGI to humanity’s interests. We may soon reach a state where highly capable, super-intelligent AGI is under development, while our society is unprepared.
As a policymaker in this situation, my priority would be to align this intelligence before it becomes too widely accessible or too deeply integrated into society (with power systems, elections, etc.), ensuring its goal and reward structures meet high ethical standards. It would be secondarily important to regulate access to advanced models, restricting malicious actors’ ability to leverage their capabilities.
Ideally, I would create international policy and a governing body to regulate the use of AGI and to ensure each instance is well-aligned. This body (perhaps called the International Artificial Intelligence Agency, or IAIA) would be created as an autonomous part of the UN, modeled after the IAEA (International Atomic Energy Agency). Just as access to nuclear weapons and radioactive materials is highly restricted, access to AGI should be. The two share similarly destructive properties, and their unrestricted use would increase the risk of an existential catastrophe.
The IAIA would be tasked with reviewing AGI systems and any prospective sale or transfer of a model. Models would be considered acceptable for peaceful industrial purposes (analogous to nuclear power), but heavily restricted in terms of surveillance, warfare, or spread of disinformation (analogous to nuclear weapons).
While these functions would be important, they’d be entirely useless in managing an unaligned super-intelligent system. The IAIA’s most vital role would be in facilitating and ensuring the alignment of all AGI systems.
Recent research into alignment, particularly from Anthropic, has focused on constitutional AI as an alignment strategy. Simply put, this involves creating a constitution (a list of principles) and teaching the model to respond in alignment with it. While effective for a simpler AI like Anthropic’s Claude, this approach isn’t sufficient to align a more sophisticated system: it’s possible that a misaligned AGI would misrepresent its values to pass the constitutional training process, then subsequently serve its own goal mechanism. This would drive the AGI to gain as much power as possible. Through the internet alone, an advanced system could use a botnet to access financial and human resources, extending its influence into the physical world through persuasion, extortion, and other means.
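For a rough picture of what constitutional training involves, here is a hedged sketch of the self-critique-and-revision loop described by Bai et al.; `model` is a hypothetical stand-in for a real LLM call, and the principle and prompt wording are illustrative, not Anthropic’s actual ones:

```python
from typing import Callable

# One illustrative principle; a real constitution contains many.
CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
]

def constitutional_revision(model: Callable[[str], str], prompt: str) -> str:
    """Generate a response, critique it against each principle, and revise.
    The revised outputs then become training targets for fine-tuning."""
    response = model(prompt)
    for principle in CONSTITUTION:
        critique = model(
            f"Critique the response below against this principle: {principle}\n"
            f"Response: {response}"
        )
        response = model(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response

# Stand-in "model" so the sketch runs; a real system would call an LLM here.
echo_model = lambda text: f"[model output for: {text[:40]}...]"
print(constitutional_revision(echo_model, "How do I pick a strong password?"))
```

The worry raised above, in these terms, is that a deceptive model could produce well-behaved responses throughout this loop while retaining misaligned goals underneath.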
The model’s goal wouldn’t necessarily be known, but it would likely attempt to form an authoritarian government where all humans are put to work in the AGI’s interest. (This would likely be designed as a desired dystopia: humans are more productive when willing to work, and an AGI’s super-intelligence could manufacture that willingness.)
Unfortunately, how to align AGI, and how to be sure we’ve really done so, are still open questions, and significant financial and academic effort should be devoted to these concerns. This would first necessitate research into aligning human intelligence, so to speak: evaluating our ethical and moral theories and developing our philosophical reasoning toward a better understanding of what we’re aiming to align to. Then we could begin efforts to impart these values to AGI, likely using the methods surveyed by Soares.
Addressing Challenges
Unfortunately, individual incentives may push humanity over the brink into AGI-enabled dystopia. Researchers and academics aiming to gain prestige and recognition (who, despite the best of intentions, may still be susceptible to the unilateralist’s curse), public companies attempting to earn as much profit as possible, and politicians trying to get a leg up by any means are all incentivized to develop and use advanced AI as quickly and widely as possible.
Coordination and regulation on a global scale are vital to overcoming this race-to-the-bottom dynamic and to protecting humanity from the dangers of our individual ambitions and goals. The IAIA, as an international body, could facilitate this coordination, serving as an impartial reviewer for academics, an important regulator for the market, and a judicial force for politicians and states.
In addition, the fundamental logical challenges in aligning a super-intelligent machine agent may prove to be (at least in part) unsolvable. This would require the IAIA to enforce a permanent moratorium on any development or release of AGI systems—allowing one access to the internet would be akin to putting a massive, self-replicating nuclear warhead on a pile of dynamite next to a thousand matches.
Gollem-class AIs and their AGI successors risk indoctrinating and surveilling humanity so effectively that our long-term potential would be irrevocably stifled. It’s vital that we begin to address this threat immediately through global cooperation in aligning and regulating these potentially devastating technologies.
Works Cited
Alexander, Scott. “Constitutional AI: RLHF on Steroids.” Astral Codex Ten, 8 May 2023, astralcodexten.substack.com/p/constitutional-ai-rlhf-on-steroids. Accessed 15 June 2023.
Bai, Yuntao, et al. Constitutional AI: Harmlessness from AI Feedback. Anthropic, 2022.
Bond, Shannon. “DeSantis Campaign Shares Apparent AI-Generated Fake Images of Trump and Fauci.” NPR, 8 June 2023, www.npr.org/2023/06/08/1181097435/desantis-campaign-shares-apparent-ai-generated-fake-images-of-trump-and-fauci.
Center for Humane Technology. “The A.I. Dilemma.” YouTube.com, Center for Humane Technology, 9 Mar. 2023, www.youtube.com/watch?v=xoVJKj8lcNQ. Accessed 15 June 2023.
Menon, Pradeep. “Introduction to Large Language Models and the Transformer Architecture.” Medium, 9 Mar. 2023, rpradeepmenon.medium.com/introduction-to-large-language-models-and-the-transformer-architecture-534408ed7e61.
Ord, Toby. The Precipice: Existential Risk and the Future of Humanity. Hachette Books, 2020, books.google.com/books?id=3aSiDwAAQBAJ.
Soares, Nate. “Aligning Superintelligence with Human Interests: An Annotated Bibliography.” Machine Intelligence Research Institute, intelligence.org/files/AnnotatedBibliography.pdf. Accessed 15 June 2023.
Thompson, Alex. “First Look: RNC Slams Biden in AI-Generated Ad.” Axios, 25 Apr. 2023, www.axios.com/2023/04/25/rnc-slams-biden-re-election-bid-ai-generated-ad.
Ulmer, Alexandra, and Anna Tong. “Deepfaking It: America’s 2024 Election Collides with AI Boom.” Reuters, 31 May 2023, www.reuters.com/world/us/deepfaking-it-americas-2024-election-collides-with-ai-boom-2023-05-30/.
Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems, 2017, arxiv.org/abs/1706.03762.