<h1>AI safety: what should actually be done now?</h1>

<p><em>Published April 16, 2023 at <a href="https://scottaaronson.blog/?p=7230">scottaaronson.blog</a></em></p>

<p>So, I recorded a 2.5-hour-long podcast with <a href="https://danielfilan.com/">Daniel Filan</a> about “reform AI alignment,” and the work I’ve been doing this year at OpenAI. The end result is … well, probably closer to my current views on this subject than anything else I’ve said or written! <a href="https://podcasts.google.com/feed/aHR0cHM6Ly9heHJwb2RjYXN0LmxpYnN5bi5jb20vcnNz/episode/NTM1YTQ1MzAtMDVlNS00OTE4LThlMjgtOWRmZWUzMjM1Mjk0">Listen here</a> or <a href="https://axrp.net/episode/2023/04/11/episode-20-reform-ai-alignment-scott-aaronson.html">read the transcript here</a>. Here’s Daniel’s abstract:</p>

<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all?
In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.</p>
</blockquote>

<p>Thanks so much to Daniel for making this podcast happen.</p>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<p>Maybe I should make a broader comment, though.</p>

<p>From my recent posts, and from my declining to sign the <a href="https://futureoflife.org/open-letter/pause-giant-ai-experiments/">six-month AI pause letter</a> (even though I sympathize with many of its goals), many people seem to have gotten the impression that I’m not worried about AI, or that (ironically, given my job this year) I’m basically in the “full speed ahead” camp.</p>

<p>This is not true. In reality, I’m <em>full</em> <em>of</em> worry. The issue is just that, in this case, I’m also full of <em>metaworry</em>—i.e., the worry that whichever things I worry about will turn out to have been the wrong things.</p>

<p>Even if we look at the pause letter, or more generally, at the people who wish to slow down AI research, we find that they wildly disagree <em>among themselves</em> about why a slowdown is called for. One faction says that AI needs to be paused because it will spread misinformation and entrench social biases … or (this part is said aloud surprisingly often) because progress is being led by, you know, like, <em>totally gross</em> capitalistic Silicon Valley nerdbros, and might enhance those nerds’ power.</p>

<p>A second faction, one that <em>contains</em> many of the gross nerdbros, is worried about AI because it might become superintelligent, recursively improve itself, and destroy all life on Earth while optimizing for some alien goal.
Hopefully both factions agree that this scenario would be bad, so that the only disagreement is about its likelihood.</p>

<p>As I’ll never tire of pointing out, the two factions seem to have been converging on the same conclusion—namely, <em>AI progress urgently needs to be slowed down</em>—even while they sharply reject each other’s rationales and indeed are barely on speaking terms with each other.</p>

<p>OK, you might object, but that’s just sociology. Why shouldn’t a rational person worry about near-term AI risk <em>and</em> long-term AI risk? Why shouldn’t the ethics people focused on the former and the alignment people focused on the latter strategically join forces? Such a hybrid Frankenpause is, it seems to me, precisely what the pause letter was trying to engineer. Alas, the result was that, while a few people closer to the AI ethics camp (like Gary Marcus and Ernest Davis) agreed to sign, many others (Emily Bender, Timnit Gebru, Arvind Narayanan…) pointedly declined, because—as they explained on social media—to do so would be to legitimate the gross nerds and their sci-fi fantasies.</p>

<p>From my perspective, the problem is this:</p>

<ol class="wp-block-list">
<li><strong>Under the ethics people’s assumptions, I don’t see that an AI pause is called for.</strong> Or rather, while I understand the arguments, the <em>same</em> arguments would seem to have justified stopping the development of the printing press, aviation, radio, computers, the Internet, and virtually every other nascent technology, until committees of academic experts had decided that the positive social effects would outweigh the negative ones, which might’ve been never. The trouble is, well, how do you even <em>study</em> the social effects of a new technology, before society starts using it?
Aren’t we mostly <em>happy</em> that technological pioneers went ahead with all the previously-mentioned things, and dealt with the problems later as they arose? But preventing the widespread societal adoption of GPT-like tools seems to be what the AI ethics camp <em>really</em> wants, much more than preventing further scaling for scientific research. I reject any anti-AI argument that could be generalized and transplanted backwards to produce an argument against moving forward with, let’s say, agriculture or metallurgy.</li>

<li><strong>Under the alignment people’s assumptions, I <em>do</em> see that an AI pause is urgently called for—but I’m not yet on board with their assumptions.</strong> The kind of relentlessly optimizing AI that could form the intention to doom humanity still seems very different to me from the kind of AI that’s astonished the world these past couple of years, to the point that it’s not obvious how much progress in the latter should increase our terror about the former. Even Eliezer Yudkowsky <a href="https://www.youtube.com/watch?v=AaTRHFaaPG8">agrees</a> that GPT-4 doesn’t seem too dangerous in itself. And an AI that was only <em>slightly</em> dangerous could presumably be recognized as such before it was too late. So everything hinges on the conjecture that, in going from GPT-n to GPT-(n+1), there might be a “sharp turn” where an existential risk to humanity very suddenly emerges, with or without the cooperation of bad humans who use GPT-(n+1) for nefarious purposes. I still don’t know how to think about the likelihood of this risk. The empirical case for it is likely to be inadequate, by its proponents’ own admission.
I admired how my friend Sarah Constantin thought through the issues in her recent essay <a href="https://sarahconstantin.substack.com/p/why-i-am-not-an-ai-doomer">Why I Am Not An AI Doomer</a>—but on the other hand, as others have pointed out, Sarah ends up conceding a staggering fraction of the doomers’ case in the course of arguing against the rest of it. What today passes for an “anti-doomer” might’ve been called a “doomer” just a few years ago.</li>
</ol>

<p>In short, one could say, the ethics and alignment communities are <em>both</em> building up cases for pausing AI progress, working at it from opposite ends, but their efforts haven’t yet met at any single argument that I wholeheartedly endorse.</p>

<p>This might just be a question of timing. <em>If</em> AI is going to become existentially dangerous, then I definitely want global coordination well <em>before</em> that happens. And while it seems unlikely to me that we’re anywhere near the existential danger zone yet, the pace of progress over the past few years has been so astounding, and has upended so many previous confident assumptions, that caution seems well-advised.</p>

<p>But is a pause the right action? How should we compare the risk of acceleration now to the risk of a so-called “overhang,” where capabilities might skyrocket even faster in the future, faster than society can react or adapt, <em>because</em> of a previous pause? Also, would a pause even force OpenAI to change its plans from what they would’ve been otherwise? (If I knew, I’d be prohibited from telling, which makes it convenient that I don’t!)
Or would the main purpose be symbolic, just to show that the main AI labs can coordinate on <em>something</em>?</p>

<p>If so, then one striking aspect of the pause letter is that it was written without consultation with the main entities who would need to agree to any such pause (OpenAI, DeepMind, Google, …). Another striking aspect is that it applies only to systems “more powerful than” GPT-4. There are two problems here. Firstly, the concept “more powerful than” isn’t well-defined: presumably it rules out more parameters and more gradient descent, but what about more reinforcement learning or tuning of hyperparameters? Secondly, to whatever extent it makes sense, it seems specifically tailored to tie the hands of OpenAI, while giving OpenAI’s competitors a chance to catch up to OpenAI. The fact that the most famous signatory is Elon Musk, who’s now trying to build an “anti-woke” chatbot to compete against GPT, doesn’t help.</p>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<p>So, if not this pause letter, <em>what do I think ought to happen instead?</em></p>

<p>I’ve been thinking about it a lot, and the most important thing I can come up with is: clear articulation of fire alarms, red lines, whatever you want to call them, along with what our responses to those fire alarms should be. Two of my previous fire alarms were the first use of chatbots for academic cheating, and the first suicide by a depressed person who had been interacting with a chatbot. Both of those have now happened.
Here are some others:</p>

<ul class="wp-block-list">
<li>A chatbot is used to impersonate someone for fraudulent purposes, by imitating his or her writing style.</li>

<li>A chatbot helps a hacker find security vulnerabilities in code that are then actually exploited.</li>

<li>A child dies because his or her parents follow wrong chatbot-supplied medical advice.</li>

<li>Russian or Iranian or Chinese intelligence, or some other such organization, uses a chatbot to mass-manufacture disinformation and propaganda.</li>

<li>A chatbot helps a terrorist manufacture weapons that are used in a terrorist attack.</li>
</ul>

<p>I’m extremely curious: which fire alarms are <em>you</em> most worried about? How do you think the AI companies and governments should respond if and when they happen?</p>

<p>In my view, articulating fire alarms actually provides multiple benefits. Not only will it give us a playbook if and when any of the bad events happen, it will also give us clear <em>targets to try to forecast</em>. If we’ve decided that behavior X is unacceptable, and if extrapolating the performance of GPT-1 through GPT-n on various metrics leads to the prediction that GPT-(n+1) will be capable of X, then we suddenly have a clear, legible case for delaying the release of GPT-(n+1).</p>

<p>Or—and this is yet a third benefit—we have something clear on which to <em>test</em> GPT-(n+1), in “sandboxes,” before releasing it. I think the kinds of <a href="https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations">safety evals</a> that ARC (the Alignment Research Center) did on GPT-4 before it was released—for example, testing its ability to deceive Mechanical Turkers—were an extremely important prototype, something that we’ll need a lot more of before the release of future language models.
But all of society should have a say on what, specifically, <em>are</em> the dangerous behaviors that these evals are checking for.</p>

<p>So let’s get started on that! Readers: which unaligned behaviors would you like GPT-5 to be tested for prior to its release? Bonus points for plausibility and non-obviousness.</p>
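<p>To make the "targets to try to forecast" idea concrete, here is a minimal, purely illustrative Python sketch: track a red-line eval score across model generations, fit a trend, and check whether the extrapolation for the next generation crosses a pre-agreed threshold. The generation numbers, eval scores, threshold, and the choice of a simple linear trend are all invented assumptions for illustration, not anything any lab actually does:</p>

```python
# Hypothetical sketch: forecast a dangerous-capability eval score for the
# next model generation and flag it against a pre-agreed "fire alarm" level.
# All numbers below are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (pure Python, no dependencies)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Invented scores (fraction of red-team tasks solved) for generations 1..4.
generations = [1, 2, 3, 4]
scores = [0.02, 0.05, 0.15, 0.38]

a, b = fit_line(generations, scores)
predicted_next = a * 5 + b  # extrapolate to generation 5

THRESHOLD = 0.5  # pre-agreed red line for this eval

if predicted_next >= THRESHOLD:
    print(f"Forecast {predicted_next:.3f} crosses red line: case for delaying release")
else:
    print(f"Forecast {predicted_next:.3f} below red line {THRESHOLD}: proceed to sandbox evals")
```

<p>A real forecast would of course need far more care than a straight line: capability curves on individual benchmarks are often sigmoidal or show sudden jumps, which is exactly the "sharp turn" worry discussed above.</p>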