Part 3: Balancing Progress and Precaution
The metagame, and how everyone's reacting to the latest AI progress
Part of a series on AI Safety. Part 1: A Non-Hyped Overview of AI Safety and Part 2: Strategies for Safe AI.
Doom and gloom
There’s a big debate on Twitter between the doomer and accelerationist (e/acc) crowds. The extreme doomer position, as Eliezer Yudkowsky has put it, is to bomb the data centers, because we’re on a collision course with extinction unless we take drastic action. At least it’s (in theory) logical: if an infinitely bad scenario is possible, then even at small probabilities it makes sense to incur significant harm in the short term to avoid it. But given the uncertain probability of the existential threat, times the uncertainty that such a mitigation would actually work (the Unabomber kind of tried that, and it didn’t do much), times the uncertainty that other, less harmful mitigations could also work, I think we should proceed with a bit more caution and humility. We have to be very careful about moral absolutism and about compromising our own basic codes, and we should work to minimize uncertainty before taking drastic action. We would want our AGI overlords to do the same.
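To make that stacking of uncertainties a bit more concrete, here’s a toy expected-value sketch. Every number is an illustrative assumption of my own, not an estimate anyone has actually made:

```python
# Toy sketch of the "drastic action" argument and how uncertainty discounts it.
# All probabilities below are illustrative assumptions, not real estimates.
p_threat_is_real = 0.10        # uncertain probability of the existential threat
p_drastic_action_works = 0.20  # uncertainty that bombing data centers would prevent it
p_no_gentler_fix = 0.30        # uncertainty that less harmful mitigations wouldn't suffice

# The case for drastic action rests on the product of all three uncertainties:
p_case_for_drastic_action = p_threat_is_real * p_drastic_action_works * p_no_gentler_fix
print(f"{p_case_for_drastic_action:.3f}")  # 0.006 -- stacked uncertainties shrink fast
```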
Accelerate
On the other hand, the accelerationist or e/acc crowd (e.g. Beff Jezos) thinks this is self-interested hype and that we should proceed full steam ahead. Progress has been good for the world so far, the argument goes, and therefore will continue to be good. We’ve seen automation before and there are still plenty of jobs. Just relax and enjoy the abundance to come!
It’s true that technology has created massive benefits for humanity, with global measures of poverty, health outcomes, education, quality of life, etc. all improving dramatically since the industrial revolution. It’s also true that there were doomsayers many times along the way, only to be proven wrong. But I really do think this time is different. Replacing a horse with a train is not the same thing as replacing your calculator with a general-purpose, self-improving, agency-having intelligence. While I’m normally a growther, I think this camp is too cavalier about AI risk.
Regulate
Governments have a strong incentive to regulate here. This is a technology that will likely impact the majority of their electorate, and it’s also fashionable and popular to be tough on tech. Empires rise and fall on the backs of new technology waves, and this paradigm shift gives weaker nation-states the potential to leapfrog stronger ones via economic strength, hard power, or both. The US (Executive Order 14110), EU (EU AI Act), and China have all published early forms of AI regulation and are clearly thinking about this. It’s a bit surprising that this technology is so unregulated given the significant impact, power, and risk it carries, compared to, say, the highly regulated financial sector. Our regulatory apparatus is slow to identify and mitigate risks and operates at a constant speed, while the impact of technology is accelerating. Hopefully technology can help with this problem too!
The proposals generally focus on transparently tracking the training and characteristics of large models, mitigating consumer harm from biased or unethical deployments, and respecting data privacy, with tighter restrictions for more advanced models. The US specifically calls out deepfakes as a dangerous technology and will soon require clear disclosure. US export controls (often discussed alongside the CHIPS Act) also restrict selling the most advanced chips to China, to help ensure the US keeps the best frontier models. It’s hard for these policies to be effective without creating regulatory capture for the biggest companies, stifling innovation, or handing the advantage to adversaries, but at least it’s a start. Governments will play a crucial role in the future by advocating for, funding, and requiring safe development and deployment of AIs, especially if they can break their own race dynamics and coordinate globally. I just hope it’s done in a way that’s actually effective.
Resist Moloch
It’s hard to slow down progress, especially with godlike power so tantalizingly close. It has been done before: with recombinant DNA in the 1970s, with avian flu gain-of-function research in 2011, and with genetic engineering of the human germline in 2019. There was a high-profile attempt in March 2023 to pause the training of AI systems more capable than GPT-4, but it didn’t work, likely because of competitive pressure, coordination problems, and a lack of consensus on the probability, severity, and timeline of the risk.
It’s now clear that things aren’t about to start slowing down. Even if the US and EU governments get together and impose a moratorium, what about China? What about the UAE? We’re locked in a wild race without knowing if the finish line is a trap. We’re driven by Moloch to obey the same primitive laws we always have. Compete. Dominate. Generate profit. Governments and scientific communities should absolutely do what they can to bring sanity to the system, but I fear that the race dynamics will remain extremely hard to resist.
Our central challenge is that it’s practically impossible to stop progress forever, but we also can’t afford to take our hands off the wheel completely. Mustafa Suleyman, cofounder of DeepMind, describes this as “the narrow path” in The Coming Wave. Our only option is to advance boldly, step by step, and figure it out as we go.
Take heart. GPT-4 isn’t AGI, and it won’t wake up tomorrow with dreams of world domination. AI alignment research is making progress, and AI safety is increasingly mainstream. At this point, we need to focus on accelerating safety research so that it’s “solved” before takeoff (i.e. before 2028), and then be increasingly diligent about verifying that those safety conditions are met as the day approaches.
Adults in the room
This is where I think the closed models of OpenAI and Anthropic make sense. If there were a computer program with a 0.0001% chance of launching a nuke every time it ran, you wouldn’t want millions of people running that program.
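To make the arithmetic behind that intuition concrete, here’s a quick back-of-the-envelope calculation (my own sketch, assuming independent runs and the 0.0001% figure above):

```python
# Back-of-the-envelope: a tiny per-run risk compounds quickly at scale.
# Assumes each run is independent and uses the 0.0001% figure from above.
p_per_run = 0.000001  # 0.0001%

for runs in (1_000, 1_000_000, 10_000_000):
    p_at_least_one = 1 - (1 - p_per_run) ** runs
    print(f"{runs:>12,} runs -> {p_at_least_one:.1%} chance of at least one launch")

# ~0.1% after a thousand runs, ~63% after a million, ~100% after ten million.
```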
Open source is helpful in the short term to advance both capability and safety research, but the closer we get to AGI, the more we should restrict access to the most cautious, thoughtful, safety-focused actors. I actually feel safer in a world in which either of these companies gets to AGI first, with the right safety mitigations in place, rather than it happening on some random hacker’s laptop or in a research lab in China. They can do better safety research by being on the frontier, and they’ll have a 3+ month head start to tread carefully with their AGI before anyone else catches up.
Some of the most vocal critics are those who don’t trust these private AI labs to do the right thing. The world doesn’t believe the masters of the universe in Silicon Valley will put humanity above profits. Power corrupts, and these companies are on track to become very powerful. Even though knee-jerk criticism of Silicon Valley is annoying and often unfounded, perceptions matter, especially for companies with a humanity-scale mandate. They’ll need to earn and keep the public’s trust. To protect against corruption, or even simply against prioritizing profits above safe AI, governance structures will need to strengthen and evolve over time. These companies are already proactively laying the groundwork, for example with Anthropic’s Long-Term Benefit Trust. Finally, if we trust these companies to achieve safe AGI on our behalf, and they do, they’ll become fabulously wealthy in the process; they’ll need to share the fruits of that abundance with everyone.
As we were all glued to the Thanksgiving OpenAI drama, one thing stood out to me. For about 24 hours after Sam was fired, even though tech Twitter thought he was awesome, most people withheld judgment and waited to hear the rationale, in case the fate of the species was at stake and it was the right move. Unfortunately the reasoning was weaksauce and the board completely botched it, hurting its own credibility and making everyone a little more skeptical of ideologically driven boards in general. But I think the episode actually demonstrated that the system can work, both from a governance and a public-opinion perspective. If OpenAI, Anthropic, or anyone else at the frontier does discover something bad, like “we consistently see GPT-6 trying to deceive us and we’re pretty sure it’s impossible to control these things,” with some evidence, I think the world would take it seriously and cooperate on a moratorium, invest 1% of GDP into safety research, or do something similarly drastic. The big question is how large the gap is between the point where we’re scared into action and the point where we can no longer do anything about it.
I’m enjoying writing about this topic - please comment or reply to this email if you have any requests for what I should cover next!
I enjoyed reading this. I've caught bits and pieces on Twitter, though I tend to avoid the "religious" accounts, so the recap was valuable.
IMO, the genie is out of the bottle and there is no slowing down. I view this as an arms race now, just like the nuclear bomb, and I expect states to pursue their interests and fund their home companies to compete on the global stage.
Regulation is interesting. I'm a little cynical here - this technology is moving so fast that even its creators don't fully understand it, and I can't expect legislators, mostly older lawyers, to understand it either. I do think something will be needed here; I'm just not sure what.
I disagree with you on the "adults in the room" section. I think we will see a wide variety of models, many of them open source, and I think they will begin to reach parity with commercial models. There will be no adult in the room at that point - any hacker with a laptop will have access to AGI (if we see it in our lifetime).