{"id":6821,"date":"2022-11-20T14:44:33","date_gmt":"2022-11-20T20:44:33","guid":{"rendered":"https:\/\/scottaaronson.blog\/?p=6821"},"modified":"2022-11-22T17:11:59","modified_gmt":"2022-11-22T23:11:59","slug":"reform-ai-alignment","status":"publish","type":"post","link":"https:\/\/scottaaronson.blog\/?p=6821","title":{"rendered":"Reform AI Alignment"},"content":{"rendered":"\n<p><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Update (Nov. 22):<\/mark><\/strong> Theoretical computer scientist and longtime friend-of-the-blog <a href=\"https:\/\/www.boazbarak.org\/\">Boaz Barak<\/a> writes to tell me that, coincidentally, he and Ben Edelman just released a big essay advocating a version of &#8220;Reform AI Alignment&#8221; <a href=\"https:\/\/windowsontheory.org\/2022\/11\/22\/ai-will-change-the-world-but-wont-take-it-over-by-playing-3-dimensional-chess\/\">on Boaz&#8217;s Windows on Theory blog<\/a>, <a href=\"https:\/\/www.lesswrong.com\/posts\/zB3ukZJqt3pQDw9jz\/ai-will-change-the-world-but-won-t-take-it-over-by-playing-3\">as well as on LessWrong<\/a>.  (I warned Boaz that, having taken the momentous step of posting to LessWrong, in 6 months he should expect to find himself living in a rationalist group house in Oakland&#8230;)  Needless to say, I don&#8217;t necessarily endorse their every word or vice versa, but there&#8217;s a striking amount of convergence.  They also have a much more detailed discussion of (e.g.) 
which kinds of optimization processes they consider relatively safe.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Nearly halfway into my year at OpenAI, still reeling from the FTX collapse, I feel like it&#8217;s finally time to start blogging my AI safety thoughts&#8212;starting with a little appetizer course today, more substantial fare to come.<\/p>\n\n\n\n<p>Many people claim that&nbsp;AI&nbsp;alignment is little more than a modern eschatological religion&#8212;with prophets, an end-times prophecy, sacred scriptures, and even a god (albeit, one who doesn&#8217;t exist quite yet).  The obvious response to that claim is that, while there&#8217;s some truth to it, &#8220;religions&#8221; based around technology are a little different from the old kind, because technological progress <em>actually happens<\/em> regardless of whether you believe in it.<\/p>\n\n\n\n<p>I mean, the Internet is sort of like the old concept of the collective unconscious, except that it actually exists and you&#8217;re using it right now.  Airplanes and spacecraft are kind of like the ancient dream of Icarus&#8212;except, again, for the actually existing part.  Today GPT-3 and DALL-E2 and LaMDA and AlphaTensor exist, as they didn&#8217;t two years ago, and one has to try to project forward to what their vastly-larger successors will be doing a decade from now.  Though some of my colleagues are still in denial about it, I regard the fact that such systems will have transformative effects on civilization, comparable to or greater than those of the Internet itself, as &#8220;already baked in&#8221;&#8212;as just the mainstream position, not even a question anymore.  That doesn&#8217;t mean that future AIs are going to convert the earth into paperclips, or give us eternal life in a simulated utopia.  But their story <em>will<\/em> be a central part of the story of this century.<\/p>\n\n\n\n<p>Which brings me to a second response.  
If&nbsp;AI&nbsp;alignment is a religion, it\u2019s now large and established enough to have a thriving &#8220;Reform&#8221; branch, in addition to the original &#8220;Orthodox&#8221; branch epitomized by Eliezer Yudkowsky and <a href=\"https:\/\/intelligence.org\/\">MIRI<\/a>.&nbsp; As far as I can tell, this Reform branch now counts among its members a large fraction of the&nbsp;AI&nbsp;safety&nbsp;researchers working in academia and industry. &nbsp;(I\u2019ll leave the formation of a Conservative branch of&nbsp;AI&nbsp;alignment, which reacts against the Reform branch by moving <em>slightly<\/em> back in the direction of the Orthodox branch, as a problem for the future \u2014 to say nothing of Reconstructionist or Marxist branches.)<\/p>\n\n\n\n<p>Here\u2019s an incomplete but hopefully representative list of the differences in doctrine between Orthodox and Reform AI Risk:<\/p>\n\n\n\n<p>(1) Orthodox&nbsp;AI-riskers tend to believe that humanity will survive or be destroyed based on the actions of a few elite engineers over the next decade or two.&nbsp; Everything else&#8212;climate change, droughts, the future of US democracy, war over Ukraine and maybe Taiwan&#8212;fades into insignificance except insofar as it affects those engineers.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers, by contrast, believe that&nbsp;AI&nbsp;might well pose civilizational risks in the coming century, but so does all the other stuff, and it&#8217;s all tied together.&nbsp; An invasion of Taiwan might change which world power gets access to TSMC GPUs.&nbsp; Almost everything affects which entities pursue the&nbsp;AI&nbsp;scaling frontier and whether they&#8217;re cooperating or competing to be first.<\/p>\n\n\n\n<p>(2) Orthodox&nbsp;AI-riskers believe that public outreach has limited value: most people can&#8217;t understand this issue anyway, and will need to be saved from&nbsp;AI&nbsp;despite themselves.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers believe that trying to get a broad 
swath of the public on board with one&#8217;s preferred&nbsp;AI&nbsp;policy is something close to a deontological imperative.<\/p>\n\n\n\n<p>(3) Orthodox&nbsp;AI-riskers worry almost entirely about an agentic, misaligned&nbsp;AI&nbsp;that deceives humans while it works to destroy them, along the way to maximizing its strange utility function.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers entertain that possibility, but we worry at least as much about powerful AIs that are weaponized by bad humans, which we expect to pose existential risks much earlier in any case.<\/p>\n\n\n\n<p>(4) Orthodox&nbsp;AI-riskers have limited interest in&nbsp;AI&nbsp;safety&nbsp;research applicable to actually-existing systems (LaMDA, GPT-3, DALL-E2, etc.), seeing the dangers posed by those systems as basically trivial compared to the looming danger of a misaligned agentic&nbsp;AI.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers see research on actually-existing systems as one of the only ways to get feedback from the world about which&nbsp;AI&nbsp;safety&nbsp;ideas are or aren&#8217;t promising.<\/p>\n\n\n\n<p>(5) Orthodox&nbsp;AI-riskers worry most about the &#8220;FOOM&#8221; scenario, where some&nbsp;AI&nbsp;might cross a threshold from innocuous-looking to plotting to kill all humans in the space of hours or days.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers worry most about the &#8220;slow-moving trainwreck&#8221; scenario, where (just like with climate change) well-informed people can see the writing on the wall decades ahead, but just can&#8217;t line up everyone&#8217;s incentives to prevent it.<\/p>\n\n\n\n<p>(6) Orthodox&nbsp;AI-riskers talk a lot about a &#8220;pivotal act&#8221; to prevent a misaligned&nbsp;AI&nbsp;from ever being developed, which might involve (e.g.) 
using an aligned&nbsp;AI&nbsp;to impose a worldwide surveillance regime.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers worry more about such an act causing the very calamity that it was intended to prevent.<\/p>\n\n\n\n<p>(7) Orthodox&nbsp;AI-riskers feel a strong need to repudiate the norms of mainstream science, seeing them as too slow-moving to react in time to the existential danger of&nbsp;AI.<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers feel a strong need to get mainstream science on board with the&nbsp;AI&nbsp;safety&nbsp;program.<\/p>\n\n\n\n<p>(8) Orthodox&nbsp;AI-riskers are maximalists about the power of pure, unaided superintelligence to just figure out how to commandeer whatever physical resources it needs to take over the world (for example, by messaging some lab over the Internet, and tricking it into manufacturing nanobots that will do the superintelligence&#8217;s bidding).<\/p>\n\n\n\n<p>We Reform&nbsp;AI-riskers believe that, here just like in high school, there are limits to the power of pure&nbsp;intelligence&nbsp;to achieve one&#8217;s goals.&nbsp; We&#8217;d expect even an agentic, misaligned&nbsp;AI, if such existed, to need a stable power source, robust interfaces to the physical world, and probably allied humans before it posed much of an existential threat.<\/p>\n\n\n\n<p>What have I missed?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update (Nov. 22): Theoretical computer scientist and longtime friend-of-the-blog Boaz Barak writes to tell me that, coincidentally, he and Ben Edelman just released a big essay advocating a version of &#8220;Reform AI Alignment&#8221; on Boaz&#8217;s Windows on Theory blog, as well as on LessWrong. 
(I warned Boaz that, having taken the momentous step of posting [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wpas_customize_per_network":false},"categories":[12,8],"tags":[],"class_list":["post-6821","post","type-post","status-publish","format-standard","hentry","category-metaphysical-spouting","category-the-fate-of-humanity"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/posts\/6821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6821"}],"version-history":[{"count":3,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/posts\/6821\/revisions"}],"predecessor-version":[{"id":6827,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=\/wp\/v2\/posts\/6821\/revisions\/6827"}],"wp:attachment":[{"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=%2Fwp%
2Fv2%2Fcategories&post=6821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scottaaronson.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}