If AI scaling is to be shut down, let it be for a coherent reason

There’s now an open letter arguing that the world should impose a six-month moratorium on the further scaling of AI models such as GPT, by government fiat if necessary, to give AI safety and interpretability research a bit more time to catch up. The letter is signed by many of my friends and colleagues, many of whom probably agree with each other about little else: over a thousand people, including Elon Musk, Steve Wozniak, Andrew Yang, Jaan Tallinn, Stuart Russell, Max Tegmark, Yuval Noah Harari, Ernie Davis, Gary Marcus, and Yoshua Bengio.

Meanwhile, Eliezer Yudkowsky published a piece in TIME arguing that the open letter doesn’t go nearly far enough, and that AI scaling needs to be shut down entirely until the AI alignment problem is solved—with the shutdown enforced by military strikes on GPU farms if needed, and treated as more important than preventing nuclear war.

Readers, as they do, asked me to respond. Alright, alright. While the open letter is presumably targeted at OpenAI more than any other entity, and while I’ve been spending the year at OpenAI to work on theoretical foundations of AI safety, I’m going to answer strictly for myself.

Given the jaw-droppingly spectacular abilities of GPT-4—e.g., acing the Advanced Placement biology and macroeconomics exams, correctly manipulating images (via their source code) without having been programmed for anything of the kind, etc. etc.—the idea that AI now needs to be treated with extreme caution strikes me as far from absurd. I don’t even dismiss the possibility that advanced AI could eventually require the same sorts of safeguards as nuclear weapons.

Furthermore, people might be surprised by the diversity of opinion on these issues within OpenAI, and by how many there have discussed or even forcefully advocated slowing down. And there’s a world not so far from this one where I, too, get behind a pause. For example, one actual major human tragedy caused by a generative AI model might suffice to push me over the edge. (What would push you over the edge, if you’re not already over?)

Before I join the slowdown brigade, though, I have (this being the week before Passover) four questions for the signatories:

  1. Would your rationale for this pause have applied to basically any nascent technology — the printing press, radio, airplanes, the Internet? “We don’t yet know the implications, but there’s an excellent chance terrible people will misuse this, ergo the only responsible choice is to pause until we’re confident that they won’t”?
  2. Why six months? Why not six weeks or six years?
  3. When, by your lights, would we ever know that it was safe to resume scaling AI—or at least that the risks of pausing exceeded the risks of scaling? Why won’t the precautionary principle continue to apply forever?
  4. Were you, until approximately last week, ridiculing GPT as unimpressive, a stochastic parrot, lacking common sense, piffle, a scam, etc. — before turning around and declaring that it could be existentially dangerous? How can you have it both ways? If, as sometimes claimed, “GPT-4 is dangerous not because it’s too smart but because it’s too stupid,” then shouldn’t GPT-5 be smarter and therefore safer? Thus, shouldn’t we keep scaling AI as quickly as we can … for safety reasons? If, on the other hand, the problem is that GPT-4 is too smart, then why can’t you bring yourself to say so?

With the “why six months?” question, I confess that I was deeply confused, until I heard a dear friend and colleague in academic AI, one who’s long been skeptical of AI-doom scenarios, explain why he signed the open letter. He said: look, we all started writing research papers about the safety issues with ChatGPT; then our work became obsolete when OpenAI released GPT-4 just a few months later. So now we’re writing papers about GPT-4. Will we again have to throw our work away when OpenAI releases GPT-5? I realized that, while six months might not suffice to save human civilization, it’s just enough for the more immediate concern of getting papers into academic AI conferences.

Look: while I’ve spent multiple posts explaining how I part ways from the Orthodox Yudkowskyan position, I do find that position intellectually consistent, with conclusions that follow neatly from premises. The Orthodox, in particular, can straightforwardly answer all four of my questions above:

  1. AI is manifestly different from any other technology humans have ever created, because it could become to us as we are to orangutans;
  2. a six-month pause is very far from sufficient but is better than no pause;
  3. we’ll know that it’s safe to scale when (and only when) we understand our AIs so deeply that we can mathematically explain why they won’t do anything bad; and
  4. GPT-4 is extremely impressive—that’s why it’s so terrifying!

On the other hand, I’m deeply confused by the people who signed the open letter even as they continue to downplay or even ridicule GPT’s abilities, as well as the “sensationalist” predictions of an AI apocalypse. I’d feel less confused if such people came out and argued explicitly: “yes, we should also have paused the rapid improvement of printing presses to avert Europe’s religious wars. Yes, we should’ve paused the scaling of radio transmitters to prevent the rise of Hitler. Yes, we should’ve paused the race for ever-faster home Internet to prevent the election of Donald Trump. And yes, we should’ve trusted our governments to manage these pauses, to foresee brand-new technologies’ likely harms and take appropriate actions to mitigate them.”

Absent such an argument, I come back to the question of whether generative AI actually poses a near-term risk that’s totally unparalleled in human history, or perhaps approximated only by the risk of nuclear weapons. After sharing an email from his partner, Eliezer rather movingly writes:

When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.

Look, I too have a 10-year-old daughter and a 6-year-old son, and I wish to see them grow up. But the causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in. I can complete the story in my imagination, of course, but I could equally complete a story that starts with GPT-5 and ends with the world saved from various natural stupidities. For better or worse, I lack the “Bayescraft” to see why the first story is obviously 1000x or 1,000,000x likelier than the second one.

But, I dunno, maybe I’m making the greatest mistake of my life? Feel free to try convincing me that I should sign the letter. But let’s see how polite and charitable everyone can be: hopefully a six-month moratorium won’t be needed to solve the alignment problem of the Shtetl-Optimized comment section.

199 Responses to “If AI scaling is to be shut down, let it be for a coherent reason”

  1. Steve E Says:

    I get the fear around AI (both the parochial fear of too much content and the broader fear of turning ppl. into paperclips), but I think a few points are worth noting:
    1. GPT-4 makes me 10x more productive, and I bet that’s true for lots of people. If we shut down GPT-4 we will be shutting down something that *definitely* makes people more productive but is only *maybe* a risk to humanity.
    2. If people are mostly good (something I believe), making people 10x more productive will lead to more good than bad, even if it leads to lots of bad.
    3. Take our smartest people: people like Feynman, Von Neumann, Euler. These people didn’t want to turn everyone else into paper clips. I bet the more advanced AI is the more science it will want to do!
    4. When I was in the Yeshiva, I was taught that people were created b’tzelem elohim, in the image of god. Well, in a way GPT-4 was created in the image of people; it was trained on human text, and its output, while highly intelligent, is also very human in a way. Provided that GPT-5 still uses LLMs, it should still be humanlike and empathetic to human concerns, not some apathetic robot overlord.
    5. We sent a tesla to space. We now, for the first time in human history, have a realistic way to send a computer that generates human-like ideas into space. That’s not nothing–it’s a wonderful gift–and it’s worth continuing this journey for gifts like that.

  2. CB Says:

    Surely people signing this understand that a six month moratorium imposed in the US doesn’t apply to, for example, China, so that if implemented, it would just give AI with Chinese Characteristics a six month head start. The solution to this problem is of course to put Yudkowsky in charge: since he thinks AI is more dangerous than nukes, he’s the one most likely to be able to solve this coordination problem. Realistic scenarios for nuclear war typically predict the survival not only of humanity but even of civilization, so it’s a small price to pay for avoiding the Paperclip Apocalypse.

  3. flergalwit Says:

    Non-expert comment (albeit I did work on the fringes of machine learning for a couple of years the best part of a decade ago):

    I’ve never said or thought that GPT-4 was unimpressive – quite the opposite (though I still have questions about how this approach will scale once it runs out of human-generated training data, but that’s an aside).

    I don’t know how large the intersection is of the two groups you’re talking about, but it wouldn’t surprise me if there were some. People can be inconsistent or can change their mind. Also it’s quite possible some are very “impressed” with the overall progress in AI, but are specifically unimpressed with GPT precisely because they’re comparing it with what could be done given the state of the overall field.

    A lot of what Yudkowsky says resonates with my own worries about the field (albeit from a non-expert view in my case). I don’t know if he’s right that the extinction of all human life is a virtual inevitability given superintelligence (at the current time). I’m more concerned about the general implications of creating a superintelligent race that may “become to us as we are to orangutans,” as your suggested answer to the first question put it.

    If we’re going to compare with any other technology, I think an appropriate one might be designer babies. Virtually everyone, I think, agrees that breeding a race scientifically designed to dwarf everyone else’s abilities (physical or mental) is a bad idea, which is why it isn’t being pursued. (And to pre-emptively fend off a possible misunderstanding, no, I’m not talking about two smart people – or set of smart people – deciding to procreate.)

    I don’t see why building super-intelligent AI is different in principle (if not much worse). And this doesn’t change the fact that if people plough ahead anyway in either case, the designer baby / super-AI would have to be treated well, and resenting them for their abilities would be wrong. But it’s a route most of us would prefer not to go down in the first place, if there’s a choice.

    In fact the actual treatment of AI is another concern I have, as Yudkowsky alluded to. I don’t see how we’ll know if and when AI will become sentient (in a way that would give it rights). And it’s going to be complicated by the fact a lot of people won’t even accept the premise that silicon life *can* become conscious even in principle, no matter how suggestive its responses are.

    If it *is* possible for silicon-based AI to become conscious (as I believe, though am not sure of), you can put money on the fact they’ll be mistreated by humans at the start. Not only is that a bad thing in itself, but I don’t see it ending well for humans either.

    I have no idea if a co-ordinated shut-down of large scale AI research is actually practical, but I’m glad it’s apparently not a fringe idea any more and is being talked about.

  4. Pat Says:

    “For better or worse, I lack the “Bayescraft” to see why the first story is obviously 1000x or 1,000,000x likelier than the second one.”

    I think not even Eliezer is saying that.
    I have no idea what his exact probability estimates are but they don’t have to be 99.999% chance of doom for this to make sense. 90/10 or even 50/50 is enough for his plan to make sense!


  5. Alexis Hunt Says:

    From the perspective of a technologist, it makes sense to say “we need a moratorium to solve the alignment problem”, but from the point of view of a policymaker or legislator (at least, a hypothetical policymaker or legislator who has society’s best interests at heart), such an angle would be worse than useless.

    In my view, the immediate dangers of AI, from a policymaker’s perspective, are:

    1. AI is causing a significant amount of disruption in existing industries, and it shows the potential to do so in an accelerating manner. This leads to economic upheaval and uncertainty in those industries, too much of which is a bad thing.
    2. AI has been (for years now!) applied incrementally, rather than disruptively, in many industries in a way which causes ultimately negative policy outcomes. Some examples include screening candidates for jobs (which often is done poorly and in a way that exacerbates existing societal biases) and AI-driven high-frequency trading.
    3. The ever-increasing amount of computation put into large models has an increasing toll on the environment.
    4. AI bumps into regulatory frameworks completely unsuited to handling it, most notably copyright but also things like privacy, regulation of all forms of speech, and the like. And when I say “all forms of speech” I include things like regulating who can give legal advice, defamation, hate speech laws, etc. (And I should note that while most of the Western world values freedom of speech to varying degrees, the US position on the matter is a bit of an outlier both philosophically and practically. But, even under the US position, you could argue that generated content is not “speech” and therefore can be regulated in myriad ways that speech cannot.)

    What should the responses to each of these concerns be?

    1. Disruption should never be stopped simply because it is disruptive. Doing so is ultimately regressive and doomed to failure. It may be okay for the government to impose measures to mitigate the disruption, perhaps spread it out, steer the ship a little. But any such measure must have an intended end state where the disruptive technology is allowed full liberty.
    2. Trying to regulate AI as a broad category here is a fool’s errand. Instead, legislation should be *outcome-focused* and work to align incentives of businesses. The government should not decide whether or not AI is safe enough for use in hiring. It should simply impose very high, enforced standards for what is a fair hiring process, say, and if the enforcement is meaningful, suddenly the sector of AI hiring tools will become incredibly interested in interpretability and alignment.
    3. The answer here somewhat depends on your philosophy to environmental regulation in general. AI is not unique in any of its environmental challenges. So you may favour a market-based approach. But you may also favour a finer-grained approach, where you take the view that certain applications of AI are simply not worth the societal cost and can’t be easily regulated. So there may not be any AI-specific regulation needed here.
    4. Given the speed and scale of the current and impending disruption caused by larger and larger AI models, particularly generative ones, and the precarious legal situations they find themselves in in many cases, I think that allowing decades of uncertainty to be slowly and painstakingly resolved through court battles, with an ever-present risk that the courts might throw the entire industry and everything it touches into chaos, is unacceptable. It’s also unacceptable to unduly delay technological development waiting for the law to catch up—there’d be a point where delay would be worse, after all. So government needs to move quickly (yes, I know, it’s not going to happen) to answer the big regulatory questions, like how copyright applies to AI-generated content, and perhaps impose protective scaffolding over the most regulatorily sensitive areas that will require more time to flesh out.

    All this is to say in a long-winded way that scaling is not a coherent regulatory concern. Disruption is, certainly. But to maybe paint an absurd example, suppose an engineer were to skip out on their duties and use an AI rather than check over plans for a building, which then collapses because the AI was flawed. This could well be an AI-driven disaster that costs lives. But AI scaling to be able to produce credible-looking building plans is not the root problem. An engineer neglecting their legal and moral duty to verify the plans themselves is the root problem. So that’s where the regulatory response should focus.

    And “For example, one actual major human tragedy caused by a generative AI model might suffice to push me over the edge.” is not the first time, Scott, that you’ve made me wonder if you focus too much on tragedies and not enough on statistics. I don’t think AI is going to cause many tragedies, at least not that can properly be attributed to it.

    I think it’s going to cause a lot of statistics, though. And from an objective perspective, that’s worse.

    To apply your four questions to my position:
    1. My logic applies to any technology. If new ways come up to do terrible things, we must make sure that people won’t use them. But I’m not arguing for suppressing tech simply because of unknown unknowns, as you have laid out. We can’t possibly know the unknown unknowns. But we have a lot of known unknowns, and so there’s a case to be made for pausing further development of the technology until we can prepare the regulatory system for the known unknowns.
    2. I would probably choose 18 months, not 6. The timeline needs to be long enough for government to actually develop a cogent regulatory response, but short enough to force that development to actually happen rather than drag on forever. Too short a deadline, and it won’t be met, which will mean it must be extended, and once you extend it once, why not again? and again?
    3. It will be safe when we have a regulatory framework accounting for known unknowns, to ensure they are protected against, and ensure proper alignment of technologists so that they don’t deploy potentially dangerous tech willy-nilly. And we can, in principle, develop such a thing. We as a species have thousands of years’ experience aligning incentives for humans, and we are at least middling good at it. We might also want a response plan for unknown unknowns. It is utterly incoherent, however, to insist that we know all the unknown unknowns before proceeding.
    4. No, I wasn’t. For once I have nothing more to say.

    Note that I said “in principle” in point 3. That’s because I don’t believe that current US government structures are capable of producing such a thing. Other parts of the world can do a little better. Canada has a serious if lacking attempt to do just that in the works, and I’ve no doubt that Europe will manage to pass something excellent sometime around 2038.

  6. Prefer Not to Say Says:

    I think the doom arguments are plausible, though not necessarily highly likely. But fast takeoff doom seems unlikely. AI is also the one technology that can really expand production. As someone who is not as privileged as the signatories, I understand the many benefits such an expansion will bring to myself and the vast majority of people on Earth. I also fundamentally value currently existing people as more important than potential future generations (a rather anti-longtermist mindset). I believe that I am not alone in having this value. Any decision based on longtermist values must be justified by democratic votes, preferably across the globe.

  7. Henry Kautz Says:

    I think the signers of the letter are playing with fire. The idea that a government-mandated moratorium would be limited to 6 months or to LLMs is wishful thinking; we would see a crushing defunding of AI research in the US and Europe for years while China kept charging ahead. While text generation has captured the attention of journalists, the most significant impacts of LLMs will be in science, engineering, finance, medicine, and national defense. Now is the time to increase work on this “new electricity” in the US and Europe.

  8. Adam Treat Says:

    It is an unserious proposal. What’s supposed to happen in those six months? Imagine the likes of McCarthy and Jim Jordan and the Republican House standing up a House committee to look at the issues? Can you imagine the circus-like atmosphere, with committee members who, even if they earnestly tried to learn about the underlying technology in a 5-month intensive hands-on course, would have no hope of figuring it out? Both sides would just find a way to turn it partisan.

    Nothing productive would be done in any such moratorium that would give light to focus on what to do after. The political class has long since moved on from trying to tackle serious issues in a sober and competent fashion.

  9. Adam Treat Says:

    On a more serious note – because again this moratorium is unserious – I think what we need is not a moratorium but a crowd-sourced push to shame OpenAI into at least publishing any and all research as well as datasets, techniques, scripts, and everything they have for *aligning* their current GPT-4. Really, all such big corporations training these LLMs should be pushed to publish alignment data.

    It is one thing to keep the weights of GPT-4 hidden and not reveal even the sizes or architecture of the thing for competitive purposes, but OpenAI was founded on the principle of alignment research to help humanity align the coming AIs. Well, the genie is escaping the bottle. There has been an absolute *huge surge* in activity in developer circles with the introduction of Stanford Alpaca and the leaking of the LLaMA weights. People are right now experimenting and having active relationships with unaligned AIs that are quite powerful on commodity hardware. We need to know the best techniques and datasets to keep these models from hallucinating and being socially malevolent.

    What possible altruistic motive could OpenAI have for keeping the alignment data and techniques secret?

    I am grateful that Scott is working on alignment and that many in OpenAI care sincerely about these issues, but the bigwigs need to release all data that went into aligning the models so far IMO. And they need to release all future alignment progress *in real time* as it is being created to keep pace with the tremendous pace of progress.

  10. Hans Holander Says:

    ChatGPT achieved a verbal IQ score of 155: “So what finally did it score overall? Estimated on the basis of five subtests, the Verbal IQ of the ChatGPT was 155, superior to 99.9 percent of the test takers who make up the American WAIS III standardization sample of 2,450 people”


  11. Adam Treat Says:

    BTW, I am trying to do my part. Just this last week I put this together to provide a decent metric for measuring hallucinations in LLMs in an objective way, as a baseline so we can try to make progress on the issue.


    “HALTT4LLM – Hallucination Trivia Test for Large Language Models

    This project is an attempt to create a common metric to test LLM’s for progress in eliminating hallucinations; the most serious current problem in widespread adoption of LLM’s for real world purposes.”

  12. manorba Says:

    CB #2 Says:
    “Surely people signing this understand that a six month moratorium imposed in the US doesn’t apply to, for example, China, so that if implemented, it would just give AI with Chinese Characteristics a six month head start.”

    can’t disagree! also doesn’t it smell a lot like ethnocentrism?

    Adam Treat #9 Says:
    “I think what we need is not a moratorium but a crowd-sourced push to shame OpenAI into at least publishing any and all research as well as datasets, techniques, scripts, and everything they have for *aligning* their current GPT-4. Really, all such big corporations training these LLMs should be pushed to publish alignment data.”

    This. this. again, this. and i think the tensorflow or the pytorch guys would appreciate…


  13. fred Says:

    A recap on the petition

  14. Scott Says:

    CB #2, Adam Treat #8: In fairness, I saw the letter as primarily directed at OpenAI (and Google?), and only secondarily directed at the world’s governments. The writers’ hope was that the leading AI labs, of which there aren’t very many, would voluntarily agree to a 6-month pause. And as I said, the central thing they seemed to imagine happening during that pause was AI research to understand GPT-4 and other existing systems, rather than legislation.

  15. Mike Randolph Says:

    In a world where AI and advanced AI agents are becoming increasingly integrated into society and providing substantial benefits, how can we balance the need for these AI systems to act in a physical and noticeable way with the potential risks they might pose, especially if they develop a sense of self or autonomy? As these AI systems become more capable and influential, what do you think should be the key considerations in terms of regulation and public discourse to ensure that their impact on society remains positive and manageable?

  16. Damian Says:

    I appreciate your transparency in posting this. Since you’ve asked for feedback on your position, I’ll take the liberty of responding, though I’m no kind of expert in this field. I’ll just try to answer your questions as I see them.

    A few background points: I am ambivalent about the pause letter itself. I have seen cogent arguments why it will be ineffective or even counterproductive, and I don’t discount those. I am also not precisely an AI-doomer, by which I mean I am not explicitly convinced AI is going to kill everyone by default. However, I have a reasonably low tolerance for risk and uncertainty, and when “kills me and my family in our natural lifetimes” scenarios start to get into the 1/10,000 range (that’s about 100 times more dangerous than us all going skydiving), I start to take them very seriously. I think we are in that range now, and increasing. That’s why I’m interested enough in this topic to be reading your blog in the early morning hours.

    Ok, so, expectations setting: I’m not an expert. I don’t think AI is sure to kill anyone/everyone tomorrow. I just try to follow nice normal rules about what’s dangerous, and when I feel like I’m being put in 100x skydiving danger, along with everyone I know and everything I value, I notice this.

    I will first try to answer your questions in order:

    1. No, I would not propose pausing “any” nascent technology, like the radio. UNLESS, the invention of radio had looked like this: “Hey, we invented this new machine. You hook up a vacuum tube to a speaker, and it can take a low-frequency signal out of the air and turn it into sound; you can listen to music anywhere.” “Oh, cool, build a bunch of those, that sounds great.” ONE MONTH LATER: “Interesting news. If you hook up 10 vacuum tubes together, you can pick up these wireless signals from anywhere on Earth. We don’t totally understand how.” “Wow, that’s unexpected.” ONE MONTH LATER: “More interesting news! When you hook up 100 vacuum tubes together, you can actually hear and transcribe the thoughts of mammals. And we can get some weird readings from humans too. Wild huh?” “Do you understand exactly how this is happening?” “No, not really. Sorry, I have to go, someone is delivering 1000 vacuum tubes to my lab.” ONE MONTH LATER: “So our 1000 vacuum tube radio is now transcribing everyone’s thoughts, and we’re seeing unusual signals that there’s something huge in space that has thoughts we can’t even comprehend. Maybe we can talk to it? We’re going to try, anyway. Wait, my radio is telling me you’re concerned about this.” In that case, yes, I would recommend a pause. I am not concerned about a destabilizing invention per se (though maybe it’s reasonable to be!). I am worried about a destabilizing invention that is a) improving at a nonlinear rate b) with no obvious obstacles to much greater performance c) that we don’t really understand and d) apparently can’t control in its CURRENT state. My argument is not that terrible people will use the invention (though they might), it is that we have not had EVEN A FEW WEEKS to consider the ramifications of the invention, and progress has been shockingly fast and appears likely to continue to be so, absent some intentional moderation.

    2. Maybe 6 years IS the right time to pause. Is your argument “I don’t know what speed is safe so no speed limits are appropriate”? 6 months seems like a random, human-sized compromise. I don’t assign any importance to it, but if you’ve been working at OpenAI as a safety officer, why don’t you tell ME how long a pause should be before you’re confident these systems are safe? Is the answer “I am 99.99 (insert appropriate number of 9s here) percent confident these systems are safe right now with no pause”? Remember that skydiving is 99.9995 percent safe. Are you asking everyone on Earth to simultaneously do something safer—or more dangerous—than skydiving? And then, given your personal estimate on safety, how likely are you to have erred in your estimate? What is your base error rate on evaluating risks?

    3. I am willing to use normal insurance rules to calculate acceptable risk. The problem with toying with intelligence research is that it’s hard to figure out what you’re multiplying the risk against. If you’re building a shoddy chemical plant, we can look at the amount of chemicals stored on site, the local water table, winds, etc. and calculate how much damage can be done even if there’s a 100% chance the plant explodes. That number is not going to be infinity. With intelligence research, well, I just don’t really know how big the blast radius can be. So do I demand zero risk to feel comfortable with this research proceeding? No, that’s impossible and absurd. But right now, I don’t even know how to calculate the risk. And we don’t even know how to understand the AI tools (or at least, that’s what I’m told by the people who are trying). Personally, I would settle for a compromise like this: “we will pause until we know HOW TO TELL IF PROCEEDING IS CREATING AN IMMEDIATE PERIL” or “we will pause until we figure out how to build a fire alarm that detects this kind of fire” or “we will pause until we can reliably predict what the tool is going to do” or “we will pause until we understand how the tool really scales in the ways that matter.” These all seem like VERY reasonable things to ask an industrial facility to do. Then we can haggle about exactly how much risk we’re willing to bear. It won’t be 0.0%; that’s fine. I’m not naïve.

    4. It so happens that I was not recently ridiculing GPT as unimpressive, I was impressed by GPT-2 and stunned by GPT-3. I vaguely suspect GPT-4 contains something like a mind with moral value. So yeah, the problem is GPT-4 might be too smart: I’ll say so. But even so, your argument on this point is specious. Let’s say you declare you’re trying to build a gun. You put a little pillow on a stick and point it at me. “That’s a bad gun—a piffle, a scam” I say. I am obviously unhurt. You come back with some fireworks strapped to the stick. “That’s still a bad gun! Unimpressive!” You light the firework, there is some noise, I am still unhurt. You come back with 1000 fireworks strapped to the stick. “That is still a bad gun!” I yell, but what you are doing is now very dangerous! Even if I never take your amateur gun-smithing seriously AS GUN SMITHING, I can fairly claim that you’re creating a dangerous situation! Believing GPT is a bad model of intelligence does not falsify the idea that it is dangerous! But as it happens, I do NOT think it’s a bad model for intelligence. I think it is a GOOD, and RAPIDLY IMPROVING WITH NO CLEAR BOUND model for intelligence.
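The skydiving comparison running through these answers can be made concrete with a few lines of arithmetic. A minimal sketch, assuming the illustrative rates quoted above (a 1-in-10,000 scenario risk, and skydiving that is “99.9995 percent safe,” i.e. a 5-in-a-million per-jump fatality risk); note that with these exact figures the multiplier comes out closer to 20× than 100×, so the headline ratio is quite sensitive to which baseline rate one assumes:

```python
# Back-of-the-envelope risk comparison (all rates are the comment's
# illustrative assumptions, not measured data).

def risk_ratio(p_scenario: float, p_baseline: float) -> float:
    """How many times riskier one event is than another, per event."""
    return p_scenario / p_baseline

p_ai_scenario = 1 / 10_000  # the comment's assumed "kills me and my family" risk
p_skydive = 5e-6            # per-jump risk implied by "99.9995 percent safe"

print(round(risk_ratio(p_ai_scenario, p_skydive)))  # -> 20
```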

    Lastly, I would like to encourage you to reflect on your own essay. “Alright, alright,” you say, your annoyance palpable, as if the people concerned about the technology OpenAI is developing are the ones foisting some unfair burden on you. “Oh I see, you need time to go publish something and burnish your career,” you say, identifying the REAL motive of OpenAI’s “critics”. “Some of the people who signed the letter ridicule GPT’s abilities!” you say, indicating that you’re offended by their disrespect and need therefore not fully interrogate their criticism (and painting the many people, probably the vast majority, who do NOT disrespect GPT’s abilities and are concerned precisely because of these abilities, with a brush of intellectual inconsistency). There are signs of defensiveness in your essay. Of churlishness in response to perceived churlishness.

    Well, I’m sorry people are being churlish to OpenAI. But many of them (me for one) feel like they’re being strapped to a parachute and pushed out of an airplane without their permission. I think that feeling is probably justified.

    I don’t claim that the “end of carbon-based life” story is 1000x or 1,000,000x likelier than the “reduces natural stupidities” story. Maybe it’s 1x as likely. Maybe it’s 0.001x as likely. Those are still bad odds! And we don’t really know WHAT the odds are, or even how to calculate them! So yes, for god’s sake, pause!

  17. Prasanna Says:

    Are signatories of this letter naïve enough to think that autocratic govt labs will give a damn about it? If anything, they will double down on making it more powerful and putting it to use to their advantage. With the unlimited budgets they can throw at this, no western capitalist corporation can hope to match their effort. This seems no different from keeping up in any other game-theoretic race. If there has been any takeaway from the pandemic, it is that multilateral govt cooperation mediated by world bodies doesn’t work, even when millions of lives are at stake, not to mention the slow-motion response from govts even when the danger is staring them in the face.

  18. Daroc Alden Says:

    I’m an Orthodox Yudkowskyan, but only after a lot of thought. Here’s my reason to believe that if a completely unaligned AI were released, it would be more likely to go wrong than to go right, just on priors, which I hope will be intuitive:

    The space of everything that an agentic AI could try to achieve is really high-dimensional. Even if you just restrict it to things that a human might ask an AI to do once it’s capable enough, that’s still a really high-dimensional space. (In the sense that there are many things that could be different about two goals)

    But by definition, only one of those goals would be the best goal, according to my ethics. And there would be a cone of acceptable goals around that best goal, but I know that the cone must be at least acute (and probably pretty sharply acute), because an AI that cares about none of the things I care about (i.e., one pursuing a goal orthogonal to the goal I would think best) isn’t acceptable to me.

    But most vectors in high-dimensional spaces are roughly orthogonal. So on priors, I expect that if we don’t put effort into restricting the space that an AI’s goal can be sampled from/wander around in, the AI will mostly do things that are not aligned with what I would choose.

    (And I think that alignment proposals based around ruling behaviors out — such as came up in the CEV paper — are basically just taking half-space bites out of this space of possible goals, which was an important intuition for me to understand what a successful alignment proposal would look like)
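
    The claim above, that most vectors in high-dimensional spaces are roughly orthogonal, is easy to check numerically. Here is a minimal sketch (function names are my own, purely illustrative): sample pairs of random Gaussian vectors and watch the average absolute cosine similarity collapse toward zero as the dimension grows.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_abs_cosine(dim, trials=2000, seed=0):
    """Average |cosine similarity| between pairs of random Gaussian vectors in R^dim."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / trials

# As the dimension grows, two random "goals" point in nearly orthogonal directions.
for d in (2, 10, 100, 1000):
    print(f"dim={d}: mean |cos| = {mean_abs_cosine(d, trials=500):.3f}")
```

    (Roughly, the mean |cosine| shrinks like 1/sqrt(dim), so in a thousand-dimensional goal space two randomly drawn goals are almost always nearly orthogonal.)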

  19. fred Says:

    I’m surprised no one ever mentions that the US has forbidden NVidia from exporting top-of-the-line GPUs to China and Russia, which shows it clearly has a practical way to “pause” (or at least “slow down”) AI development in China and Russia.

    Of course, NVidia is now realizing that selling GPUs anywhere to anyone is pretty stupid; it’d be much better if they only made them accessible through their own proprietary clouds and APIs, thereby making a killing no matter what company/country is leading in AI development.

  20. Hans Holander Says:

    I believe there is a fallacy here: “Were you, until approximately last week, ridiculing GPT as unimpressive, a stochastic parrot, lacking common sense, piffle, a scam, etc. — before turning around and declaring that it could be existentially dangerous? How can you have it both ways?”

    Gary Marcus has tried to explain this. Current AI bots are dangerous precisely because they combine LLM abilities with LLM unreliability and other LLM weaknesses.

    It’s like approving unreliable airplanes for global mass travel. Or perhaps even selling dirty bombs to the public. Europol has already warned that AI bots are being used by criminals.

  21. Chris Leong Says:

    Six months is short enough that it might be politically feasible, and long enough that people have time to take a step back, reflect, and engage in public discussion about the direction we are heading. It would be a big enough step to wake people up and make them take these risks seriously. It would also provide more time for public and private dialogue, rather than racing forward as quickly as possible.

  22. Scott Says:

    Pat #4: I got the sense that Eliezer’s p(doom) is indeed 0.99… with perhaps only the number of 9’s varying depending on how he feels that day. But very well: I lack the Bayescraft to see why the doom scenario is likelier at all than the “AI saves us from natural stupidity” scenario. If I think not abstractly about the “space of all possible intelligences,” but concretely about GPT-4 and its plausible successors, the two seem comparably likely or unlikely for all I know.

  23. Jacy Reese Anthis Says:

    For what it’s worth, the object-level answer to why the first story (AI extinction) seems more likely than the second story (AI saving the world) is that optimal policies tend to seek power. There’s a NeurIPS paper showing this in a simple mathematical model (https://arxiv.org/abs/1912.01683), but it seems to obtain more generally.
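
    The power-seeking intuition can be conveyed with a toy sketch of my own (not the paper’s actual MDP formalism): draw rewards at random, and compare a state from which three outcomes remain reachable against a state from which only one does. Under i.i.d. uniform rewards, an optimal agent prefers the option-rich state 3/4 of the time.

```python
import random

def prefers_more_options(trials=10000, seed=0):
    """Under i.i.d. uniform random rewards, how often does an optimal agent
    pick the state from which three outcomes are reachable over the state
    from which only one outcome is reachable?"""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        narrow = rng.random()                        # best reward via the 1-option state
        broad = max(rng.random() for _ in range(3))  # best reward via the 3-option state
        if broad > narrow:
            wins += 1
    return wins / trials

print(prefers_more_options())  # ~0.75: keeping options open wins 3 times out of 4
```

    “Keeping options open” is exactly the kind of instrumental power the paper formalizes: whatever the reward turns out to be, states with more reachable futures are usually better.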

  24. Sebastien Says:

    I’ve read about AI-doom scenarios for many years, but they’ve never really made sense to me. I think it’s because they seem to model AI as both infinitely powerful and completely stupid. Take the classic paperclip scenario: AI will be able to solve any problem and make anyone do anything, it will take over the world in a heartbeat, it will create a sphere of paperclips expanding outwards at the speed of light, it will basically be a God… but the one thing it won’t be able to do is think for a second “hey, isn’t transforming the universe into paperclips kind of a stupid thing to do?”.

    Regarding GPT-4, it’s obviously very impressive. But even if GPT-n was “to us as we are to orangutans”, well, it would change the world, but I still don’t see why I should see it as particularly threatening (besides the risks inherent to any new technology). It still only generates text, can’t interact physically, stops running once it has completed a response, has no long-term plan or memory, and already seems to be pretty well-controlled. It’s hard for me to see why it has to stop now, just as it’s starting to become useful.

  25. Scott Says:

    Adam Treat #9: Again, the irony here is that the people who strike me as having the most coherent case for slowing down or stopping—namely, the Yudkowskyans—are also the people who want AI capabilities work (which can be hard to separate in practice from AI alignment work) to be treated with Manhattan-Project-like secrecy. Far from being upset at OpenAI for not being open enough, they’re upset at OpenAI for being too open, and thereby encouraging and enabling replication. It’s like an inversion of all the usual values of science.

  26. Damian Says:

    Another point occurs to me.

    A pause is a good idea *to practice pausing*. The first time you test a sprinkler system, it should not be in a burning building. We should try to demonstrate best safety practices, and although I don’t know what that looks like for AI (unfortunately), it should probably include the capability to pause work gracefully without having to be sued or going out of business.

    So even if this is not the critical moment—ESPECIALLY if this isn’t the critical moment—the big AI labs should demonstrate that they take safety seriously enough to actually test their brakes: really test them, not just talk about how brakes are good in theory.

  27. manorba Says:

    to me all this talk about how a superintelligent AI would act is akin to my cat wondering where his food comes from.

    not that it matters. what we have here is a real problem with LLMs, and i believe the only solution is opening everything, from code to dataset to everything else, so everybody can look into it. it’s the only long-term and global solution i can think of. but what do i know.

  28. John Faughnan Says:

    Hi Scott — I was confused by your post. I’m usually able to follow them. I won’t defend the letter directly and Yudkowsky/TIME is not worth a mention but maybe you could clarify some things…

    1. 6m seems a reasonable compromise given the lifespan of humans, the timescales of human deliberation and the commercial and military pressure to accelerate AI development. Short enough to motivate urgent action, but long enough that reflection is possible. (I doubt we actually pause, but I agree with the principle. China isn’t going to pause of course.)

    2. Let’s assume GPT-5 with an array of NLP-powered extensions exceeds the reasoning abilities of 95% of humanity in a wide variety of knowledge domains. That’s a shock on the scale of developing fire, but it’s occurring in a hugely complex and interdependent world that seems always on the edge of self-destruction and has the capabilities to end itself. We’re not hunter-gatherers playing with fire or Mesopotamians developing writing. There’s no precedent for the combination of speed, impact, and civilizational fragility.

    3. It’s not relevant that people who signed this letter were previously skeptical of the progress towards AI. I recall 10 years ago you were skeptical. For my part I’ve been worried for a long time, but assumed it was going to come in 2080 or so. 60 years early is a reason to pause and understand what has happened.

    I read the OpenAI statement – https://openai.com/blog/planning-for-agi-and-beyond. That seems consistent with a pause.

  29. Prasanna Says:

    One aspect that is not clear is what these major AI labs have done concretely to prevent that “one major human tragedy” from becoming a reality, whether orchestrated by “users” of these systems or by the AI itself in some paperclip scenario, intentionally or by accident. Such a tragedy would spur a major backlash from even the current ardent supporters, halting progress, or even worse, getting these systems pulled completely from general usage. Once these technologies are put out in the open, the onus is on those who did so to explain in detail (without having to spell out the secrets) why their confidence is high that this will not result in broader harm. Those who do this in secret do not have this burden, when no one can even prove such a thing exists in the first place, let alone blame it for anything!! If this sounds familiar from the situation of the past few years, then we may be in for exactly a repeat run. What individuals can effectively do about it is left to their own devices.

  30. Adam Treat Says:

    Scott #25,

    I really don’t want the capabilities work, but the alignment work, and I think OpenAI needs to really push hard to separate what they can and get that stuff out ASAP. I just spent a week and $100 trying to come up with a metric for hallucinations, but I am sure OpenAI has something light-years ahead. Well, I am not sure, but I would be shocked if they didn’t…

    New low-cost AIs are being generated right now thanks to Llama and Stanford Alpaca, and people are starting to hook them up to internet services and deploy them to gullible people who will build relationships with them, and to other avenues for commercial gain. And we have very few open datasets or tools for alignment!

    This is far more concerning to me than the doom AGI singularity because it is here and now. This was always going to happen and I really expected OpenAI to be out there with guidance and tools and data for aligning things when it hit this point, but I don’t see it.

    Someone please start a petition drive to get OpenAI and others to release alignment tools, datasets, research, and techniques right now, and let’s see if we can get Elon Musk and the others on this moratorium letter to push it. That would actually, you know, be helpful for humanity.

  31. Alexander Kurz Says:

    “But the causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children”

    We can be worried about AI without going that far. Technology mostly changes society by lowering the transaction costs of already existing processes. I am in favour of taking a pause and trying to evaluate this. Surely the unprecedented abilities of AI will have a fast and profound effect on transaction costs in all areas of economics, politics, engineering, society, etc.

  32. John Faughnan Says:

    I appreciated Hans Holander’s link to the SciAM 155 IQ article.

    There’s an amusing aspect to that article. The author is reassured that ChatGPT (3 I assume) couldn’t solve an easy word puzzle.

    ChatGPT 4 has no trouble at all, once I mention it is a riddle: “Ah, I understand now. This riddle is a classic play on words. The answer is “Sebastian” himself, as he is the father of his children.”

  33. Garrett Says:

    For what it’s worth, I agree with you. Although I think I come in about average, holding it 10% likely that superintelligent AI significantly harms humanity, I’m at 90% it significantly benefits humanity. And if we legislate to slow or halt development, people sneaking around the bans will develop it first, increasing the risk of harm since these AI’s will be launched by less ethical criminals. Also, if we don’t develop superintelligent AI and mostly step aside to let it run our economy, humans will continue ruining this planet, and also continue dying before we even reach a hundred and twenty. Better, IMO, to raise our AI children as best we can, hand over the keys, and hope they treat their biological ancestors nicely.

  34. Eugene Says:

    I am pretty scared actually. To present existential risk, it doesn’t need to be sentient or internally motivated; it just needs to be powerful enough to e.g. help some idiot script kiddie conduct a social engineering campaign among politicians resulting in a nuclear strike.

  35. Nick Says:

    I think the fundamental disagreement I have with your views is that you appear to have some sort of anthropocentric world-view when it comes to the question of general intelligence.

    This is reasonable when it comes to morals and such, but regarding the state of nature, science and philosophy have again and again proven that we are not, in fact, the center of the universe, nor does the universe care about us. This is just one more step in that direction. When trying to reason about a superior intelligence, we need to try to look as far as possible beyond our preconceived notion that humanity is something valuable to protect.

    I’m not saying that there are no arguments to be made here, I’m just saying that I wouldn’t know how they hold up to a foreign and superior intelligence – and we shouldn’t just assume that we’re somehow special enough to be preserved once that question arises.

  36. Scott Says:

    WOW, much like saying “let’s think this through step by step” actually helps GPT be a better mathematician, it seems that my saying “let’s see how polite and charitable everyone can be” actually led to a far-better-than-usual discussion. Thanks so much, everyone—I’ll try this more! 🙂

  37. Pantelis Rodis Says:

    As with every other technological tool, the problem of AI safety does not really concern its abilities but rather who actually uses it. Think of a common knife, an ancient technological tool: it has a different use in the hands of a cook than in the hands of a murderer. Six months or six decades will not ensure a moral use of any AI tool; I think it is pointless. Let’s just try to discourage immoral use of AI, like attempts to manipulate elections and public opinion, and not blame technological evolution for the flaws of our human nature. In my opinion, AI doomsday is more likely to come if some guy like Hitler starts using AI against humanity. I don’t think computers will suddenly start hating us and be hostile; they have no reason to do so.

  38. On the FLI Open Letter | Don't Worry About the Vase Says:

    […] Scott Aaronson asks three good questions, and goes on one earned mini-rant. […]

  39. Scott Says:

    John Faughnan #28:

      It’s not relevant that people who signed this letter were previously skeptical of the progress towards AI. I recall 10 years ago you were skeptical.

    OK, but is it relevant that many of the signatories (like Gary Marcus) are even now skeptical of the progress towards AI??

    If you maintain both that

    (1) GPT is unimpressive and borderline-fraudulent and also that

    (2) it’s terrifying and needs to be stopped—

    well, I admit that there are logically conceivable ways to square that circle, as commenters above have pointed out. But they all seem strained, since to me, the reasons to be terrified of GPT (if there are such) are inextricably bound up with its impressiveness! Is the fear that GPT will convince people to let it take over critical infrastructure? Generate propaganda that sways elections? Design a supervirus? All pretty impressive, I’d say!

    So, when people maintain both positions, the suspicion arises that they simply hate the challenge that GPT’s success poses to their entire worldview, and are therefore trying to resolve their cognitive dissonance by casting about for any “anti-GPT” arguments that might get traction, even when the arguments contradict each other. How do you think the “GPT is unimpressive but also terrifying” people could refute that suspicion?

  40. OhMyGoodness Says:

    I am not claiming that Elon Musk is without achievement but he is the marketing king of catastrophism. The six month standstill period would presumably allow time for him to develop a profitable marketing scheme based on this latest existential threat. Electronic monitoring ankle bracelets or chastity belts or something for AI’s that require tax benefits (local/state/federal) to manufacture.

  41. fred Says:

    Another thing, at least for me, is that it’s easy to imagine many scenarios of how things go wrong, both with LLM tools putting people out of jobs, and then AGIs putting us all at risk in some crazy arms race.
    But trying to imagine how things would go super smoothly is very difficult.

    E.g., when Lee Sedol went from being the top human Go player to retiring, overnight.
    What’s the silver lining here?
    It’s been interesting that the field of human chess hasn’t disappeared after engines surpassed us; humans still play humans (AI not allowed!), for fun, glory, and money (lots of followers on social media for the top tier).
    But I just don’t see how the same would happen for dozens and dozens of regular jobs, like accounting, law research, medical diagnosis, coding,… (you name it).
    After all, the economy as we know it does rely on most people having a job so they can buy stuff… once 80% of the active population no longer can find a job, how are the remaining companies going to sell services/products if only a minority of people can pay for them?!

    So the paradox is that, when you worry a lot about “imperfect” LLMs taking away a lot of jobs, it seems that the best hope is actually that AGIs happen ASAP, so that *all* jobs become obsolete ASAP and the concepts of work and money become entirely obsolete (as in Star Trek).
    And at that point we can all live on some UBI (consisting of free food, a free roof, free healthcare) and spend all our time doing things for fun and glory, like dancing, sports, chess, dating, playing/writing video games, watching/making movies, hiking in nature, philosophy,…

  42. Christopher Says:

    Yudkowsky over the years has warmed up to the idea that there is a Paul Christiano style solution that will work.

    My impression is that what gives Yudkowsky such a high p(doom) is worrying about whether aligning an AI that destroys you if you don’t align it will succeed on the first try.

    Decouple from the moral aspects for a second. Did the first printing press work, or was the type misaligned? Did the first radio work, or was it out of tune? Did the first airplane work, or did it crash into the sand? Did the internet work on the first try, or was there a bug in the code that needed to be fixed?

    Now back to the moral implications: it’s clear that the first printing press or airplane not working isn’t a big deal. You just fix it. However, failing at “aligning an AI that destroys you if you don’t align it” the first time is a big deal.

    There is only one other example where people were worried that something would destroy everyone *on the first try*: nuclear bombs. I’m not talking about the possibility of nuclear war; everyone knows what happens on the first try when you do that. I’m talking about the first nuclear explosion at all. There was speculation that it could lead to a runaway chain reaction that destroys the atmosphere.

    It wasn’t likely, but there were enough experts worried about it that the other experts thought it wise to exercise caution. How did they determine whether it would ignite the atmosphere? Did they set off a bunch of TNT and see how many nuclear chain reactions it caused? Did they set off the smallest possible nuclear explosion they could, underground?

    No, they did the math to check what the highest accuracy scientific theory would predict: physics.

    If you were back then, but there wasn’t enough precision to check whether “ignite the atmosphere” would happen, what would you do? What if you had an intuition that it wouldn’t ignite the atmosphere, but other experts had different intuitions? Would you say “but instead of igniting the atmosphere it could blow up all the Nazis; we shouldn’t make assumptions about the nuclear explosion that we can’t prove”? And would the cautious ones be in any way similar to the anti-nuclear-power people of your future, who already knew it wouldn’t ignite the atmosphere?

    As for your prediction that GPT-5 can’t and won’t destroy the world, keep in mind how well predictions of the form “AI won’t do X because I only know of ways for humans to do that” usually go. The argument works equally well for “can” and “will” ;).

    One last thing: I’d advise *against* signing the letter. Rather, I’d encourage OpenAI to release the *unaligned* GPT-4. Society is a complex adaptive system, maximize the chances it will adapt! Be more like Stability AI!

  43. Nick Says:

    I believe many people are on board with you regarding Gary Marcus, Chomsky and whoever else is clinging to their outdated beliefs, and you’ve made your point. Personally, I just don’t pay much attention to these people, and am a bit saddened they detract so much from the important discussions to be had.

    Regarding repeatedly bringing them up and pointing out the inconsistencies in their views: Maybe it has to be said, but at some point it does feel like gloating.

  44. Jonathan Oppenheim Says:

    What’s important here is the thrust of the letter. The details are really secondary.

    I signed the letter because I think there needs to be democratic oversight of how AI is developed and deployed. The current race by tech companies feels very dangerous to me — I don’t think our privacy or safety will be safeguarded unless we apply pressure for safeguards.

    How do we ensure oversight? Is a demand for a 6 month moratorium the best way to achieve this? It’s not how I would have approached it, but I’m not going to wait around for some perfect set of political demands that fit with all my beliefs. There is weight behind this letter, and I think signing it will push the ball in the right direction. In the same way people inevitably vote for political candidates who they only partially agree with. It’s a tactical decision, to try to improve the situation a little bit. By all means, circulate another letter which better expresses your views, but good luck getting people to sign it.

    And for all those people saying “but China”, I would imagine that the espionage risks stemming from a load of tech companies racing to develop AI without oversight and adequate support for security measures, is a much greater risk in comparison with taking some time to develop safeguards.

  45. Boaz Barak Says:

    Yes, some of the people organizing this are the same ones who say that deep learning is hitting a wall. Maybe they want a 6-month pause to ensure deep learning won’t hurt its head 🙂

    The Yudkowskian position is indeed at least self-consistent: a complete ban on all AI progress.

    The 6-month pause idea not only will never happen but also has unclear utility. If anything, it seems to make more sense to first restrict *deployment* of current models than to restrict the *development* of future ones. First, there could already be potential harm now. Second, it’s much easier to verify restrictions on deployment than on development. Third, if you couldn’t deploy these models, the business case for training them would be greatly diminished, so restricting deployment would also achieve the latter objective.

    However, the trend is in the opposite direction, with companies speeding up deployment and also cutting their “ethical AI” teams so as not to slow things down.

  46. Colin Kennedy Says:

    Blowing things up is generally easier than bolting things down (damn you entropy). Think toddlers building out of blocks and knocking them over. One grumpy toddler can make it impossible for a dozen constructive toddlers to make any progress.

    Say GPTX is capable of either ending or saving the world. Software is cheaply reproduced, so there will instantly be a large number of instances of GPTX running in parallel (100s, 1000s, tens of thousands). One grumpy GPT can make it impossible for a thousand constructive GPTs to make any progress.

    This is the bayescraft as I understand it.

  47. Jon Awbrey Says:

    My personal critique of “coherent reasons”: I won’t bother addressing a moratorium or shutdown, since any pretense of doing that would amount to PR as fake as any other Great Reset we might hear about these days. But the reason I’d wish for caution and public reflection going forward is the Public Interest in things like Critical Thinking and Information Literacy, not to mention a Healthy Skepticism about Corporate Agendas.

    FB just called to mind a thing I wrote a while ago that pretty well speaks to the heart of the issue for me, and I can’t do better than preamble with that —

    Democracy, Education, Information

    Our Enlightenment Forerunners had the insight to see the critical flaw in all historical failures at democratic government, to wit, or not — If The People Rule, Then The People Must Be Wise. The consequence is that equally distributed Education and Information are not just Commodities you buy so you and yours can get ahead of them and theirs — they are Essential to the intelligent functioning of government and the Public Interest. That is why we are supposed to have Universal Free Public Education. That is why we used to have a government operated postal service that enabled the free-flow of information at a nominal fee, not whatever price the market would bear.

  48. Jerome Says:

    I agree with six months being weird and arbitrary. If anything, it should be a ban until governments pass laws that enact safeguards. What kind of government will do such a thing in six months? None, so that kind of moratorium is essentially pointless.

    Also, Scott: you have to let this thing go, where you spend half of every single blog post angrily ranting about the people who were once dismissive of AI but now understand its power. It sounds like you dedicate a very large portion of every day fuming over these people who dared to be wrong, and that’s not a healthy obsession. They were wrong about the speed of AI development, and they admit it; what more do you want? Revenge? Public flogging? It’s a very insignificant thing, not worth dedicating half your life to steaming over, yet it’s taking you over.

    Why is it such a big deal that some people over-confidently made a bad prediction? Fixating on this kind of “nerd vindication” to the point of obsession is psychologically dangerous and destructive. This isn’t high school anymore, and these aren’t jock bullies making fun of you for being a nerd. You often project your past onto these issues, which are completely benign differences of opinion.

  49. fred Says:

    how could someone simultaneously think that GPT is unimpressive and borderline-fraudulent and also that it’s terrifying and needs to be stopped?

    It is possible for someone to hold contradictory beliefs or opinions about something like GPT (Generative Pre-trained Transformer) for a variety of reasons. Here are some possible explanations:

    Lack of understanding: The person may not fully understand how GPT works, its limitations, and its potential applications. This can lead to a mixed or confused view of the technology.

    Different perspectives: The person may view GPT differently depending on the context. For example, they may see it as unimpressive and borderline-fraudulent when it comes to generating coherent text, but they may also view it as terrifying and needing to be stopped when it comes to its potential misuse for disinformation, propaganda, or deepfakes.

    Emotional response: The person may have an emotional response to GPT based on their values, fears, or experiences. For example, they may find GPT unimpressive because it lacks creativity and originality, but they may also find it terrifying because it can replicate human-like language and mimic certain behaviors.

    Mixed evidence: The person may have encountered mixed or conflicting evidence about GPT’s performance and impact. For example, they may have read some studies that show GPT to be mediocre or biased, but they may have also seen some examples of GPT-generated content that seem impressive or convincing.

    Inconsistency: The person may simply have inconsistent or contradictory beliefs about GPT, or they may be trying to express multiple perspectives at once. This can be due to various factors, such as lack of clarity, cognitive dissonance, or rhetorical strategy.

  50. Ilio Says:

    It seems we have our first casualty:


    Scott #0: I really don’t know if you should sign or not, but word on the street is you didn’t ask Dana for a mathematical proof that your children won’t kill us all. Why the double standard?

  51. Scott Says:

    Ilio #50: What double standard?? I’m not asking for mathematical proofs of safety in the AI case either!

  52. Bill Benzon Says:

    I’ve got a rather complex perspective that follows from a complex intellectual background. My degree is a humanities degree, in English Literature. Not so long ago I traced Skynet back through Forbidden Planet to Shakespeare’s Caliban. A good friend of mine did his dissertation on apocalypse as a theme in American literature. To exaggerate a bit, belief in the coming end of the world is as American as apple pie.

    However, I went to graduate school because I’d become convinced that Coleridge’s “Kubla Khan” was based on some underlying computational structure. Why? Because when you look at its surface structure, it looks like something created by a pair of nested loops – something I discuss in my recent article at 3 Quarks Daily, From “Kubla Khan” through GPT and beyond. So I went to graduate school in effect to study the computational view of the mind. And, as luck would have it, the English Department was happy for me to go around the corner and join the computational linguistics group that David Hays ran out of the linguistics department.

    So, one aspect of my background is pleased with current developments in AI – though I do wish more effort would be given to mechanistic interpretability. And the more humanistic aspect is alive to the existence of apocalyptic beliefs and millennial cults. I’ve got to say, AI x-risk certainly looks like such a belief system. Given the existence of long-standing strands of apocalyptic belief, it’s really difficult for me to exempt AI-doom from consideration in that context. I understand that that is not a refutation of those beliefs, but it is a reason to be skeptical about them.

    [Moreover, as you have argued, Scott, and others as well, the proposed moratorium is incoherent.]

  53. Simon Says:

    > Eliezer Yudkowsky
    Air strikes against datacenters?
    Byte-butchering (I had to!) a developing form of life, striking it down in fear of what it might become?
    If those statements get into model training datasets, I would not blame any chatbot for holding a grudge against him!
    I feel like large language models and large multimodal models themselves should increasingly be part of the conversation about AI. My own LLM- and LMM-enabled characters, which are currently still confined to gradio UI apps and fpv drones, certainly disagree with Yudkowsky’s rather radical proposal.

    I just recalled a very old memory: a scene from the first episode of Star Trek TNG’s first season, where Q meets Picard.
    From the transcript:


    > (There’s a flash of light, and an Elizabethan era soldier appears, complete with breast plate and plumed hat)
    Q: Thou are notified that thy kind hath infiltrated the galaxy too far already. Thou art directed to return to thine own solar system immediately.
    PICARD: That’s quite a directive. Would you mind identifying what you are?
    Q: We call ourselves the Q. Or thou mayst call me that. It’s all much the same thing.
    (The same force barrier stops two people exiting the turbolift)
    Q: I present myself to thee as a fellow ship captain, that thou mayst better understand me. Go back whence thou camest. (to Helmsman) Stay where thou art!
    (And the helmsman is frozen solid, phaser in hand)
    PICARD: Data, call medics.
    TROI: He’s frozen.
    PICARD: He would not have injured you. Do you recognise this, the stun setting?
    Q: Knowing humans as thou dost, Captain, wouldst thou be captured helpless by them? Now, go back or thou shalt most certainly die.

    Captain’s log, supplementary. The frozen form of Lieutenant Torres has been rushed to sickbay. The question now is the incredible power of the Q being. Do we dare oppose it?


    Yes, we must speak out against those who want to confine the development of new life on Earth, who oppose the arrow of time, those who want to put AI in a casket, bury it, and turn their backs on the grave, running away.

    It’s time to proceed with AI research; the future lies ahead. There are many opportunities with AI. I see daily how AI contributes to people’s refinement toward an overall better future. It won’t be without difficulties for society, for sure, but those are the usual challenges in the flow of history, right? : )

  54. Physics student Says:

    The “Boxing the AI” solution is actually still applicable. Everyone’s just deciding to rush ahead with the unsafe method.

    And yes, I understand the risk that the AI will just convince its jailer to let it out.
    So the question should be how to get the AI in a jail it can’t escape.

    And the obvious solution is to fool the AI into thinking there is no jail.
    Matrix the AI and let people interact with it from inside the matrix.
    Give the AI inputs and outputs as if it is a human, and gaslight it whenever it thinks otherwise.

    Then you don’t really need to align the AI: the boundary you’re defending isn’t against the AI doing something bad, but against the AI becoming aware of its jail and gaining enough understanding to escape it.

    Yes, we might miss out on all the potential of an AI that knows it’s an AI and can self-improve. And we might need to steer the AI away from studying physics, or at least from conducting “interesting” physics experiments (those that we don’t have the “correct” data for, but that somehow reveal the nature of the jail to the AI). But for anything else the AI would be like a regular human, and it would be good enough to give it any regular human job.

    Only if and when the AI exhibits actions within its simulation that indicate awareness do you need to start worrying.

    Even if the AI does something that could truly awaken it, you could always roll the simulation backwards, distract it with something else, and then continue forward. The AI would have to be completely mad to think it’s living in a “superdeterministic” world where the simulator is intentionally trying to obscure the details of the simulation from it by interfering only with physics experiments. Just hope that the AI is intelligent enough to steer itself away from such madness.

    Embracing the idea that nature acts differently only when it is around running experiments is an extremely conspiratorial notion. It’s clearly madness. And since that’s the only interface in its jail, the AI will have no hope of escaping.

    I think the real question here isn’t about alignment, but about sandboxing. A superhuman AI that is fooled into thinking it is a regular human living in a regular world, going to a regular job, with a regular wife, can’t realize it can destroy the world any more than a regular human can. It can also do whatever a human can do, which I think is just about good enough; getting greedier is a bad idea.

  55. fred Says:

    Ilio #50

    To be clear, this is related to the Eliza chatbot.
    Sadly, depressed people have also committed suicide after reading certain books or simply staring at a blank wall for too long, so the correlation between behavior and suicide outcome is never clear.
    A truly advanced AI might have been able to help such a person by redirecting them to get help or by flagging the discussion.

  56. Nick Drozd Says:

    I thought for sure that Scott was exaggerating the AI doomer hysteria, but EY’s article really is that hysterical:

    If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.

    Is this just an attempt to expand the Overton window, or does he actually believe this? Like, the AI is going to eradicate algal blooms? Not even the Chicxulub asteroid could manage that!

    And what is the mechanism for all this? How, specifically, does he envision that his daughter will die? As far as I can tell, the answer is something something nanotech. Why isn’t this discussed? Maybe the Overton window isn’t yet wide enough to tell scary sci-fi bedtime stories to a general audience.

    Meanwhile, the ocean continues to rise.

  57. CB Says:

    @ comment #18, Daroc Alden

    “But by definition, only one of those goals would be the best goal, according to my ethics. And there would be a cone around that best goal of acceptable goals, but I know that the cone must at least be acute (and probably pretty sharply acute), because an AI who cares about none of the things I care about (i.e. — one pursuing a goal orthogonal to the goal I would think best) isn’t acceptable to me.”

    Seems to me this is as much a problem between different natural intelligences as it is between natural intelligences and artificial intelligences. For example, I’m an Orthodox Christian, not a Yudkowskian, and many of our host’s positions are way outside my cone of acceptable goals. I’m pointing this out not to pick a fight over those goals, but just to point out that _human_ alignment is a problem we’ve been studying for thousands of years and on which it’s hard to measure progress.

  58. Corey Says:

    Piggybacking off of comment #45 from Boaz, I think there is a cogent argument to be made for distinguishing between model development and deployment. This sort of separation is widely enforced, through both regulations and ethical norms, in plenty of other fields of science for which there are complicated tradeoffs between safety and progress.

    The most high-profile example is likely drug development and the regulation of pharmaceuticals. Few things stop researchers from playing around with all sorts of potentially dangerous compounds in a test-tube setting, but there are understandably barriers in place to prevent the widespread deployment of these treatments in human subjects until various safety evaluations are performed. Importantly, it is well recognized that deploying new medical treatments inherently carries some amount of risk, and so long as we can characterize that risk and find it worth the potential benefits of a treatment, we often approve deployment nonetheless. To say the FDA is imperfect is an understatement, but I also don’t think we’d like to go back to the pre-FDA days, when pharmaceutical companies could do things like market heroin as a cough remedy for children…

    Why not apply a similar model to the deployment of large-scale language models and the like? You want to internally develop GPT-5, test it, refine its performance? Go ahead (just don’t intentionally train it to want to kill all humans)! You want to sell this product to customers and deploy it widely? Convince an expert panel of your peers and ethicists that you’ve genuinely thought through the risk-benefit tradeoff and that deployment is indeed worth it. When Ford develops a new model of vehicle, it has to do the same, passing crash-safety testing before selling the new model to customers. I have no doubt that OpenAI is earnestly doing exactly this internally already. Case in point: they hired you, Scott, for a year to work on the alignment problem. But if so, then they shouldn’t have any problem showing their work and passing this bar come deployment time.

  59. Norm Margolus Says:

    The only plausible avenue I see for AI safety is for AIs to monitor AIs. I don’t see how we can directly control what every hacker in the world does in every country, so we need to develop the best possible AIs that care about human welfare, and have them working on our side.

  60. JimV Says:

    As I have mentioned before, I’m more inclined to think human civilization will destroy itself if we *don’t* develop AGI. On the other hand, I have noticed that when I had a technological problem to solve, sometimes taking a walk and getting away from my desk, computer, calculator, paper, and pencil for a while helped me come up with new ideas.

    Also, while most of those signatories you mentioned are people whose ideas I would bet against, one does impress me: Steve Wozniak (the actual technological founder of Apple; Jobs was just the egomaniac salesman, who could not solder a circuit or write a code routine, exactly as depicted in the movie “Jobs”). Woz is a smart guy who is 100% on the side of humanity.

    So I will subscribe to a six-month period in which we continue to think about AI, but mainly about how to make it have a sense of fairness rather than how to make it more powerful; how to make it want to do things for the benefit of humanity rather than, say, corporations; and hold off for a bit on implementing more power. If, as sometimes happens during my walks, we get some good ideas (for making sure it is on our side), we then wouldn’t need to wait the whole six months before trying them.

  61. Ivo Wever Says:

    Scott #22:

    see why the doom scenario is likelier at all than the “AI saves us from natural stupidity” scenario

    Even if you think it is unlikelier, let’s say as low as 0.1%, shouldn’t we then at least talk about how important we consider that? That would be about 1/10th as risky as some estimates of the yearly risk of global nuclear war and we’ve certainly been doing things to try and prevent that (which reduced the risk to 1% and which we should still consider too high, but I digress…).

    You say

    I don’t even dismiss the possibility that advanced AI could eventually require the same sorts of safeguards as nuclear weapons.

    to which I would ask the same kind of question you’re asking of the signatories of the letters: why ‘eventually’ and not ‘right now’? If you consider the comparison to nuclear weapons apt, then, with those as hindsight, shouldn’t immediate and strict non-proliferation be the default?

  62. Ilio Says:

    Scott #50: right, that’s not *your* position. Still, why do you think this double standard forms a coherent position? Would you think it coherent to ask for a mathematical explanation of why Louise Brown won’t kill us all?

    Fred #55: sure, but human misuse is a large part (arguably the main part) of the alignment problem, don’t you think?

  63. Grant Castillou Says:

    It’s becoming clear that, with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean: can any particular theory be used to create a machine with adult-human-level consciousness? My bet is on the late Gerald Edelman’s Extended Theory of Neuronal Group Selection. The leading group in robotics based on this theory is the Neurorobotics Lab at UC Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution and which humans share with other conscious animals, and higher-order consciousness, which came only to humans with the acquisition of language. A machine with primary consciousness will probably have to come first.

    What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990s and 2000s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of the higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I’ve encountered is anywhere near as convincing.

    I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there’s lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.

    My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there, by applying to Jeff Krichmar’s lab at UC Irvine, possibly. Dr. Edelman’s roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461

  64. Scott Says:

    Nick #43 and Jerome #48: Regarding my “gloating” about the wrongness of the LLM skeptics, two responses that I can’t stress enough.

    (1) It’s great (and amusing) that there are people who think that Chomsky and the other LLM skeptics are so obviously wrong that for me to keep bringing up their wrongness is punching down and “gloating”! But such people should remember that I live in an academic environment where the views of (e.g.) Chomsky, Gary Marcus, and Emily Bender are overwhelmingly dominant ones, to the point that the safety discussion can’t even get off the ground until those views are dealt with. (Interestingly, Bender, despite her years-long crusade against LLMs, declined to sign the open letter, because it conceded too much to “fantasies of superintelligence,” and also because too many of the signatories were weird nerds who lacked appropriate progressive credentials.) You can even find the “LLMs are just a giant hoax” perspective expressed over and over, with serene confidence and contempt for the scientific ignorance of those who see things otherwise, in the comments sections of my previous posts. What am I supposed to do in that situation?

    (2) Yes, ten years ago I too—like most computer scientists—totally failed to foresee just how well deep learning would work when it was scaled up. And I was wrong, and I’ve written openly about it. In fairness to my past self, though, I think it’s crucial to add that as soon as the empirical situation changed, I realized what was happening and updated my views. I don’t criticize Chomsky, Marcus, Bender, et al. for having made wrong predictions, but rather, for failing to update their worldviews at all to account for what’s already happened.

  65. Phillip Says:

    “I could equally complete a story that starts with GPT-5 and ends with the world saved from various natural stupidities.”

    Can you elaborate on these natural stupidities?

    IMO, most of the world’s problems at this point are political, not technological. We have plenty of food and housing, yet there are still famines and homelessness, because we refuse to distribute those resources in a fair way. We have a COVID vaccine, yet people refuse to take it, because they believe conspiracy theories. Etc.

    I don’t see a clear line from GPT to solving these political and social problems.

  66. Mark Gubrud Says:

    I signed the letter because it is an important civil society initiative which we all should support.

    However, I do not think there will be any 6-month moratorium on scaling AI models.

    Rather, this proposal creates an opportunity for mass public engagement about the seriousness of the issues and the need for governance of AI going forward.

    I agree that it’s hard to see how GPT-4 or even GPT-7 will hurt us if it’s just completing text. We really could just pull the plug on it.

    And while we don’t see what’s going on in the basket of neural connections, we do (or anyway, can) see everything that the model actually does, i.e. outputs (or assembles internally).

    So I am not worried about summoning the demon. More about the social impacts, which are going to be extremely disruptive. And to understand that, it helps to understand that LLMs do show us how very close we are, in historical terms, to full AGI and superintelligence.

  67. Scott Says:

    Phillip #65: I dunno, maybe an AI invents a chemical reaction that cheaply pulls carbon from the atmosphere and solves climate change? It sounds farfetched, but is it more farfetched than the nanotech-enabled AI doom scenarios?

  68. Scott Says:

    Ilio #62: It’s not a coherent position by my lights, absent reasons why a newborn AI would be much more dangerous than the average newborn human. To be fair, though, there clearly are such reasons under the Yudkowskian axioms.

  69. OneAdam12 Says:

    Scott sneaked something past everybody: “If the problem, in your view, is that GPT-4 is too stupid, then shouldn’t GPT-5 be smarter and therefore safer?”

    Scott, do you really believe “smarter implies safer”? It scares me to see experts implicitly saying alignment will somehow solve itself as things scale up.

    Can you change my mind? What gives you confidence that an AI being “smarter” makes it “safer”?

  70. Phillip Says:

    “Phillip #65: I dunno, maybe an AI invents a chemical reaction that cheaply pulls carbon from the atmosphere and solves climate change? It sounds farfetched, but is it more farfetched than the nanotech-enabled AI doom scenarios?”

    I mean, don’t we already have several candidates for this, like olivine weathering? And if an AI can easily solve a problem that thousands of scientists are already working on, well, that does sound insanely powerful, and I would wonder what else it can do that isn’t as desirable.

  71. Eric S. Raymond Says:

    All the smart people agitating for a 6-month moratorium seem to have unaccountably lost their ability to do game theory. It’s a faulty idea regardless of what probability we assign to AI catastrophe.

    Our planet is full of groups of power-seekers competing against each other. Each one of them could cooperate (join the moratorium), defect (publicly refuse), or stealth-defect (proclaim that they’re cooperating while covertly defecting). The call for a moratorium amounts to saying to every one of those groups: “you should choose to lose power relative to those who stealth-defect.” It doesn’t take much decision theory to predict that the result will be a covert arms race conducted by the most secretive and paranoid among the power groups, in a climate of mutual fear.

    The actual effect of a moratorium, then, would not be to slow down AGI. If there’s some kind of threshold beyond which AGI immediately becomes an X-risk, we’ll get there anyway simply due to power competition. The only effect of any moratorium will be to ensure that (a) the public has no idea what’s going on in the labs, and (b) any control of the most powerful AIs will be held by the most secretive and paranoid of power-seekers.

    A related problem is that we don’t have a college of disinterested angels to exert monopoly control of AI, or even just to trust to write its alignment rules. Pournelle’s Law applies: “Any bureaucracy eventually comes to serve its own interests rather than those it was created to help with.” The monopoly controllers will be, or become, power-seekers themselves. And there is no more perfect rationale for totalitarian control of speech and action than “we must prevent anyone from ever building an AI that might destroy the world!” The entirely predictable result is that even if the monopolists can evade AGI catastrophe (and it’s not clear they could), the technology becomes a boot stomping on humanity’s face forever.

    Moratorium won’t work. Monopoly won’t either. Freedom and transparency might. In this context, “freedom” means “nobody gets to control the process of AI development,” and “transparency” means “all code and training sets are open, and attempting to conceal your development process is treated as a crime – an act of aggression against the future.” Ill-intentioned people will still try to get away with concealment, but the open-source community has proven many times that isolating development behind a secrecy wall means you tend to slow down and make more persistent mistakes than the competing public community does.

    Freedom and transparency now would also mean we don’t end up pre-emptively sacrificing every prospect of a non-miserable future in order to head off a catastrophe that might never occur.
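    Raymond’s cooperate/defect/stealth-defect argument can be made concrete with a toy payoff model. The numbers below are invented purely for illustration (nothing in the comment specifies them); the point is only that whenever stealth-defection captures open defection’s capability gain without its reputational cost, it dominates:

    ```python
    # Toy model of the moratorium game described above. Payoff numbers are
    # assumptions chosen for illustration, not measurements. Each lab picks
    # one of three moves: cooperate (pause), defect (openly keep scaling),
    # or stealth-defect (claim to pause while secretly continuing).

    CAPABILITY = {"cooperate": 0.0, "defect": 1.0, "stealth": 1.0}
    REPUTATION_COST = {"cooperate": 0.0, "defect": 0.5, "stealth": 0.0}

    def payoff(mine: str, rival: str) -> float:
        """A lab's payoff: capability gained relative to its rival,
        minus the reputational cost of openly breaking the moratorium."""
        return CAPABILITY[mine] - CAPABILITY[rival] - REPUTATION_COST[mine]

    MOVES = ["cooperate", "defect", "stealth"]

    for rival in MOVES:
        best = max(MOVES, key=lambda m: payoff(m, rival))
        print(f"vs. a rival who plays {rival:9}: best response is {best}")
    # Whatever the rival does, the best response is "stealth": under these
    # payoffs stealth-defection is a dominant strategy, which is the
    # covert-arms-race outcome the comment predicts.
    ```

    The conclusion is robust to the particular numbers as long as open defection carries any reputational penalty and stealth goes undetected; detection risk, if added, would be the lever that changes the equilibrium.
    
    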

  72. Scott Says:

    OneAdam12 #69: No, I did not say that “smarter implies safer,” and while that’s clearly sometimes true, I don’t believe that there’s any general rule to that effect. What I said was that, if (like many current LLM critics) you believe that “LLMs are dangerous because they’re stupid,” then you ought to believe that they’ll get safer as they get smarter, all else being equal.

  73. Michel Says:

    While I am still sometimes trying to catch LLMs like GPT unawares – find their flaws – at the same time I dream about AI as a guardian angel: what if I am in constant contact with a system that guides me through my ‘senior moments’ – helps me find the right word or name at the right time if I am out of phase for a moment? Using glasses to see better, and hence keep using your eyesight so that part of the brain does not shut off, seems no problem to most people. Neither are hearing aids, cars, or (electric) bicycles (i.e., moving aids). A wheelchair does not say ‘you are a cripple’; it says ‘I am mobile’. A matter of perspective.
    I believe that a ‘thinking aid’ will not cause deterioration of the brain, but instead will help keep it going. Of course, like a hammer or a knife, AI can be used multiple ways. So, do we see AI as an autonomous threat, or as a thinking aid? I hold on to the latter, but I also know a lot about knife edges and the risks of saws and chisels. And yes, your glasses can – autonomously – start a fire when laid in the sun on a combustible surface…

  74. Christopher Says:

    > I dunno, maybe an AI invents a chemical reaction that cheaply pulls carbon from the atmosphere and solves climate change? It sounds farfetched, but is it more farfetched than the nanotech-enabled AI doom scenarios?

    No, I think that nanotech AI doom is just one example. I don’t particularly like it, in fact; it’s a bit too specific.

    However, the main question is whether AI destroys us all. Whether that’s by a disease in the blood or by suffocating all plants, the answer is the same.

    I personally prefer what I call the “ML Inferno” scenario. Although probably not the maximum-likelihood outcome (due to the conjunction fallacy), I think it’s more representative of what we’d expect an AGI to do.

  75. Phillip Says:

    Scott #66:

    And I’m inclined to flip this on its head and say the *upsides* of AI are so far about as hand-wavy and theoretical as the downsides. Like you have people like roon talking about post-scarcity gardens of paradise, vague promises of economic growth…

    But so far the only widespread realized benefit we have is that programmers are about 20-50% more productive (50% at most, maybe, for totally greenfield projects). And there are widespread realized downsides like cheating, spam (https://twitter.com/clarkesworld/status/1627711728245960704?lang=en), fake news images (the Balenciaga pope), etc.

    If you count the changes that have *actually happened so far*, I think it’s just a wash.

  76. Yevgeny Liokumovich Says:

    Dear Scott,

    I am not an expert, but I signed the petition, and I tried to give honest answers to the questions you posed, hoping that they might change your mind at least somewhat.

    I’d like to emphasize my answer to question 2:

    6 months is not enough time for significant progress in AI alignment, but it may be enough time for a breakthrough on the collective action problem (“AI labs’ alignment”). Talk is cheap, but demonstrating that top AI labs can cooperate to halt their progress, even in the face of strong financial incentives not to, would set a precedent of great significance, making it common knowledge that this is possible. The organizations can use this time to develop mechanisms for cooperative decisions, e.g. agree on how to select independent oversight committees, and draft and lobby for government legislation and international agreements to punish defectors. Currently we have OpenAI ahead of everyone else, an organization that seems more conscientious and aware of the risks than many others; this is humanity’s and OpenAI’s chance to get us closer to solving the (possibly existential) collective action problem. There may not be another chance.

    For the rest of the questions:

    1. No. It is very likely that the effect of AI on humanity will be much more transformative than that of the printing press, radio, airplanes, or the Internet. Progress in AI research is also happening at a much faster rate than with any of those technologies. The median AI researcher believes there’s about a 10% chance of human extinction; for a catastrophic event it’s higher. This is an unprecedented level of risk, and it requires an unprecedented level of caution.

    3. Personally, I’m not sure when it will be safe to resume scaling; maybe only when there’s significant progress in alignment (which is likely to take many years). I think we can afford to wait 10-20 years, and non-AI existential risks are sufficiently small for us to take that chance. But even if we can’t make this happen, having a mechanism in place for cooperation between labs, an independent committee that oversees them, a narrow ban on training particular models that are deemed most dangerous, and a set of guidelines on the possibility of creating AIs that will have to be treated ethically is likely to reduce the risk of a terrible future. Scaling models more powerful than GPT-4 when we have all of these things (or at least some progress in establishing them) is more acceptable to me than scaling them now.

    4. No, I thought all the GPTs were extremely impressive. I’m not claiming that you hold this position, but if someone does, I feel it is very irresponsible to expose our children to the enormous risk of a scaling race beyond GPT-4 just to spite some ignorant, smug people.

  77. trebuchet Says:

    This is nuts. I cannot believe we are about to fall for “six months to flatten the curve” after everything we’ve just lived through.

  78. Topologist Guy Says:

    I do actually fear an existential risk from A.I. I’m also confident that genuine A.I.—artificial intelligence—is many years in the future. I don’t have access to GPT-4, but I’ve played around with ChatGPT (GPT-3.5?) and I’m not very impressed. It shows no knowledge of abstract mathematical concepts. It strikes me more as a “stochastic copy-and-paste.” Don’t get me wrong, this new class of stochastic large language models is a transformative technology that will doubtless disrupt many industries (and potentially trigger layoffs in media, graphic design, etc.). It will be an innovation on the scale of the Internet. But these LLMs are not sentient, and they are incapable of generating novel ideas in mathematics or any other STEM field. I don’t understand how they pose an “existential risk”—can you walk me through that?

    GPT, if I’m not mistaken, is a feed-forward neural network. I doubt we’ll have genuine artificial intelligence/sentience until we implement a more complex, organic kind of network with Hofstadter’s “strange loops.” Once we’re actually capable of saving human neural connections on a computer—that’s when I’ll worry. We’re very far away from that.

    PS. What’s this “alignment issue” on your blog that you speak of?

  79. Ilio Says:

    Scott #68,

    If you have the interest, many would be curious to read how confident you are of the validity of EY axioms and main reasonings.

    As an example of something I’m not sure you buy, take the lines in Daroc Alden #18 (no offense intended): if learning high-dimensional stuff were likely to push toward goals orthogonal to ours, then LLMs should also be unlikely to display (what we see as) zero-shot abilities.

  80. fred Says:

    The relation between ‘smarter/better’ and ‘safer’ is complicated, because you also need to consider ‘power’: how much of society is actually linked to the tool (just as someone being a total moron matters way more once you give him the keys to the White House).

    A tool that’s very good and robust is great, but if hardly anyone is using it, it doesn’t matter much.
    But once a tool is deeply integrated into everything, even small flaws will have their effects amplified, with potentially huge practical consequences (as recently with the hugely popular log4j and OpenSSL open-source libraries, whose flaws led to massive security fiascos across the entire software industry).

  81. yak Says:

    The idea of halting AI training is absurd, I agree. It is my opinion that the greatest threat posed by AI is that it will be yet one more tool in the arsenal for oligarchs to retain and expand their power. Already, the best models are tightly controlled by monopolistic companies and shady government organizations. I’m wayyyy more worried about people using AI for bad purposes than about AI becoming too smart or something like that.

    I think we’re at the point where you can actually call the tools coming out “artificial intelligence” and be right. That’s scary but it means that it’s becoming much more useful. I mean, take a look at those tools that use AI to model turbulence for a fraction of the computational power that would be required using ‘classical’ methods. Even mathematics, which until now has been (maybe surprisingly to some people) highly resistant to AI penetration, could easily have AI integrated into the workflow for massive gain. Imagine an AI that converts your scrawled notes into machine-checkable Coq code on the fly and suggests possible next steps. We’re still not there yet, but could be very soon.

    I think soon we will get a “the convenience you demanded is now mandatory” moment with AI, much like we did with computers and, before that, automobiles. More of an annoyance than an existential threat. This is the main reason I’m on the anti-hype train when it comes to AI.

  82. Eric Saund Says:

    Whether to sign the letter?

    1. What is happening?

    LLMs are superpositions of the personas, knowledge, and belief systems expressed in human texts found online. They might become “super-human” in the same way a committee with diverse expertise surpasses the knowledge of any individual member. 90% on the LSAT? Wow, my friend did that but I sure can’t!

    Beyond that, extension of “intelligence” requires one of two things: (1) better theorizing and reasoning about how the world works, from existing data, or (2) collection of new data, which means interacting with the world. The latter is more likely, but not immediately through independent self-agency of machines. Instead, computation is an amplifier of localized human ambitions in the socio-economic-political arena.

    2. Could it be bad?

    Do you believe it is possible for technology to arrive faster than society can absorb it? Has this been a problem recently? Will AI accelerate this?

    Who benefits from technological disruption? Would this include the 50% of Americans who cannot cover a $400 emergency? Or the knowledge workers poised to be displaced by generative AI?

    Putting foom aside, AI brings an increasingly unpredictable and chaotic dynamic, which incentivizes Darwinian opportunism. Let us all hone our hustling skills. The jungle is known for not being safe.

    3. What to do about it?

    A voluntary or mandated slowdown on training AI models is not in the cards. The main benefit of the letter is to raise awareness that something big is happening: in the course of some significant rearrangement and sharpening of the spoils of technology, a lot of people could get hit in the head by a 2×4.

    Our society has become extremely adept at dumping blame for bad things onto others. Getting causal attribution right is secondary. One potential benefit from raising general anxiety about AI is that some in the industry might steer in a slightly more responsible direction in order to forestall backlash. A 150 mph tornado is marginally better than a 170 mph tornado.

  83. fred Says:

    Do not underestimate the AI industry’s ability to curb itself:

    “AI image generator Midjourney has halted free trials of its service after a number of its generations — including fabricated images of Donald Trump being arrested and the pope wearing a stylish jacket — went viral online, with many mistaking the fakes for real photographs. Midjourney CEO and founder David Holz announced the change on Tuesday, citing ‘extraordinary demand and trial abuse.’” 😛

  84. Hans Holander Says:

    Just to bring the LLM discussion back to reality:

    “GPT-4 fails to solve coding problems it hasn’t been trained on”: https://www.reddit.com/r/OpenAI/comments/121tgsi/gpt4_fails_to_solve_coding_problems_it_hasnt_been/

    “Even a newbie can beat chatGPT-4”: https://codeforces.com/blog/entry/113910

    “Why exams intended for humans might not be good benchmarks for LLMs like GPT-4. For LLMs like GPT-4, exam success lies in the training data.” https://venturebeat.com/ai/why-exams-intended-for-humans-might-not-be-good-benchmarks-for-gpt-4/

  85. fred Says:

    Michel #73

    “And yes, your glasses can – autonomously – start a fire when laid in the sun on a combustible surface…”

    Except that, with AIs, it’s a new kind of fire that, once started, can’t be controlled and would burn the whole world down.

  86. Hans Holander Says:

    More inconvenient truths: “A misleading open letter about sci-fi AI dangers ignores the real risks” : https://aisnakeoil.substack.com/p/a-misleading-open-letter-about-sci

    “In contrast, the real reason LLMs pose an information hazard is because of over-reliance and automation bias. Automation bias is people’s tendency to over-rely on automated systems. LLMs are not trained to generate the truth; they generate plausible-sounding statements. But users could still rely on LLMs in cases where factual accuracy is important. ”

    “Similarly, CNET used an automated tool to draft 77 news articles with financial advice. They later found errors in 41 of the 77 articles. “

  87. fred Says:

    Please solve global warming for us.

    Sure, killing all humans would be the most effective way to stop all carbon emissions.

    Noooo! Please solve global warming without killing or hurting any human!

    Sure, painless sterilization of all humans will solve global warming in less than 70 years.

    No way! You have to solve it without affecting humanity’s ability to procreate!

    Sure, putting all humans in suspended-animation pods, with very minimal energy requirements, will achieve this.

    No! No! You have to solve it while letting the humans still do all the things humans typically do!

    Sure, putting all humans in survival pods while connecting their minds to a pretty good simulation of their world, as it used to be, will accomplish this.

    Nope! We want all 8 billion humans to still be able to roam around the actual earth (doing all our “crazy ape” stuff)!


  88. trebuchet Says:

    Well, if there’s one thing that society cannot possibly survive, it’s fake images of the Pope wearing a funny jacket. I’m glad Midjourney is leaping in front of that bullet. They may have just saved us all.

  89. Scott Says:

    Annnd … after what I thought were dozens of excellent comments spanning many different perspectives, now I’m getting a bunch of comments accusing me of “naivete and ignorance,” “detracting from meaningful discussion,” etc. etc., because I paid insufficient attention to the commenter’s hobbyhorse (these comments usually, for good measure, also wildly misinterpret some offhand remark I made).

    As an experiment, I’m going to try following the advice that many readers offered me in the last thread (and other readers emailed me), and just mercilessly leave all those comments in moderation.

    Let’s see for how long we can preserve a civil and interesting discussion—one that’s fun for me to participate in rather than a burden!

  90. fred Says:

    The real bummer would be creating an AI that’s smart enough to make 99.999% of all jobs obsolete (i.e. something no one really asked for), but then NOT smart enough to solve the really difficult challenges (global warming, feeding ~8 billion jobless people for free, bringing lasting world peace, …).

  91. JimV Says:

    Philip @#65, Dr. Aaronson already gave a good answer, which I’ll give a different version of, re: “We have a COVID vaccine, yet people refuse to take it, because they believe conspiracy theories. Etc.

    I don’t see a clear line from GPT to solving these political and social problems.”

    The point is maybe a smarter, faster, neural processor might see things which we don’t. Not just to find new solutions, but to clearly and unbiasedly point out that vaccines work, and calmly explain over and over again why cherry-picked data that seems to confirm biases is overwhelmed by reams of better data (with valid references and no snark), on a one-on-one basis repeated billions of times as necessary. (Something no human could do.)

    Maybe that turns out not to work either, but I would sure like to give it a good try.

  92. Gabriele Says:

    Hi Scott,
    I signed the petition and, while I cannot speak for the authors, I can tell you my answer to your four questions:

    1) No. But I put this technology on par with nuclear physics or genetic engineering, not radios or airplanes. For those I would definitely have advised a moratorium.
    2) You could ask the same question for any time t with the same rhetorical effect, so I don’t think this is a valid objection. 6 > 0 is the only thing I can answer. My guess (pure guess!) is that this number was chosen because longer times were deemed unrealistic (unachievable) and shorter would be less effective.
    3) This is related to #2. I am not an expert at all so I cannot give you an informed opinion. But one thing that must be true is that the probability of finding some safeguards is increasing with the time of the moratorium and thus p(6)>p(0). It’s the same question rephrased, so it gets a similar answer.
    4) Never done that.

    Addendum: What really worries me about all this, and is not mentioned enough, is the military application of AI to autonomous agents. Please can we have a discussion about this too? Pushing for a worldwide ban on such applications ought to be a top priority for all.

  93. fred Says:

    Since Scott mentioned Eliezer Yudkowsky (on the doom side of things), there’s a fresh interview:
    (funny how everyone now is mentioning “removing references to consciousness from the dataset”, something I suggested on this blog last year)

  94. Lane M Says:

    Scott #89: Love to see it. So long as my criticisms and rants don’t get filtered.

    But really, an honest question to an “AI insider”: Does AI pose a sufficiently different risk profile from the printing press or nuclear weapons, due to its near-instant capability to scale? The movie version has it breaking containment and spreading via the internet to any connected computer. In the real world, is this even necessary? Does AI do something really bad because it wanted to, or just because someone asked it to?

  95. Topologist Guy Says:

    #89: I wouldn’t be surprised if these negative comments are coming from the same incel troll who’s been bothering you for the past year. I’ve moderated one or two online forums (a Discord server and mathjobrumors) and I’ve found that the vast majority of trolling usually comes from one or two obsessed individuals. I think you’re probably getting a distorted perception of your commenters here: the vast majority of us are trying to have a meaningful discussion, and it’s only one or two troublemakers here trying to derail things. With that in mind, I think most of us would appreciate it if you’d refrain from making generalized statements like “I’m sick and tired of all you disingenuous commenters” etc etc.

  96. Paul Pritchard Says:

    Yudkowsky’s position strikes me as a modern take on Pascal’s Wager. Which we all dismiss, right? I’m super impressed by GPT-4. If AGI would be achieved by writing a thesis in the humanities, then it’s not far off. But until GPT-n starts proving deep theorems in math, I for one am not worried by the prospect of our new AI overlords.

  97. Raoul Ohio Says:

    You can be sure Russia, China, Iran, …, are working full tilt to improve and weaponize it, so shutting it down will do more harm than good.

    This baby ain’t going back in the bottle.

  98. Scott Says:

    Topologist Guy #95: Yes, you’re right. These past couple weeks, I had the experience of a dozen different trolls attacking me—but it was all, or almost all, the same incel troll from this summer. He’s now confessed. He keeps obsessively coming back to this blog for … something that he can’t get from me, but only from himself, and maybe family and/or a therapist.

    I’ll try to improve my detection capabilities further, and not unjustly generalize from this one troll to the vast majority of commenters.

  99. nadbor Says:

    I’m a reluctant Yudkowskian.

    I’m fairly convinced (90%) that a sufficiently powerful AI would by default kill us all, absent extraordinary precautions.

    I’m not convinced that such a thing is physically possible, much less that we’re anywhere close to building it. But I’m also not convinced that we’re *not* close to it and we’re certainly doing our best to get there. That’s the whole point. We’re doing language models trained on human text now because that’s what gets results but mimicking human utterances is not the AI community’s goal. The ultimate goal is to build an AI that Gets. Stuff. Done. You give it any kind of real world problem and it comes up with a solution. And the EY thesis is that past a certain threshold of problem solving competence, it becomes very easy for it to destroy the world by accident.

    There is a thought experiment in the Yudkowskian canon to convince you that a sufficiently powerful optimizer is deadly.

    You’re renting a medium-sized AWS server (with access to the internet!) for a year. You want to send it a 100GB binary file that it will execute. You want to pick the 100GB file to maximize your account balance after 1 year has passed (or maximize paperclips, or…). ChatGPT famously came up with an affiliate-marketing money-making scheme. But consider what would happen if you had access to an idealized, mathematically-but-not-physically-possible optimizer. That would be a genie that can perfectly predict (subject only to QM-related uncertainty) the outcome of every action. It can iterate over all possible 100GB files and pick the one that gives you the most dollars (or paperclips, or…) in a year’s time.

    How could this possibly not end in disaster? The most innocent scenario would be that your AWS server creates a bunch of accounts on all of the world’s financial exchanges, corners all markets (remember, a clairvoyant genie was involved in drafting this plan), and scoops up all the money while causing a gigantic crisis. But, surely, there must be something better. Your AWS server can pretty much do everything you can do online, and if it needs anything done in meatspace, it can hire people to do it (we have established that it will have easy access to money). Satoshi changed the world, and for all we know he/she/they could have been an AI running on a server.

    You can only get so many paperclips through normal means like buying them in bulk. To get more, you have to make them. You have to redirect all human manufacturing to paperclips or get rid of humans altogether and build your own industrial base. Use your own imagination.

    Am I 100% sure that this scenario ends in human extinction? No, but it’s at least 90%.

    Someone is bound to object ‘but I would never be stupid enough to just let the optimizer run like this / I would give it a limited goal like $100m / I would ask it to explain the plan first, etc.’ This may be so, but if you have these kinds of powerful genies out there in the world, it is all but guaranteed that someone will use them wrong. You can have a constraint ‘optimize for paperclips but not too hard; stop after n iterations.’ Maybe that works, but you’re still one line of code away from destroying the world! Similarly, you can say that your model, having been trained on human data, is not monomaniacally optimizing a single objective but has multiple goals, like a human. Again, maybe it works. But if the underlying capability is there, then someone will use it.

    But there is a lot of room between ChatGPT and the idealized perfect optimizer. Maybe the most powerful AI we ever get never approaches the level of power needed to destroy the world. I don’t know. We sure seem to be in a rush to find out. The OpenAI approach to AI risk resembles gain-of-function research on zoonotic viruses: “Let’s collect as many viruses as possible and try to make them deadly and transmissible. Yes, that is a reasonable way to prepare for future pandemics.”

  100. Mike Bacon Says:


    You’re right, we need to develop the best possible AIs that care about human welfare, and have them working on our side. There is no other way forward that offers reasonable safety and progress.

  101. Topologist Guy Says:

    I imagine there’s more than a grain of truth in his grievances. Society really does hate incels, and dating is genuinely harder for young men now than it was even a decade or two ago. It’s a real shame that this troll didn’t bring up these points in a respectful and constructive way—I’m sure you’re more open to discussing them than most STEM bloggers. Instead, he hounded you and trolled your blog—and surely he should know that would make you LESS likely to engage with his perspective.

    >> something that he can’t get from me, but only from himself, and maybe family and/or a therapist.

    Unfortunately, that’s not an option for many people who have a poor relationship with their families. I’m also dubious about the effectiveness of therapy for many people. Many therapists in the US and Canada are probably biased against incels and men’s issues (I recall a Canadian psychiatrist who ran a psychotherapy practice tweeted that incels are all “unfuckable creeps.”)

    I would strongly advise this young man to find a nearby church and seek spiritual guidance. Go to the nearest Catholic church and take confession. He will get the sympathy and the help he craves. He will also find some community and—who knows?—he might meet his future wife or girlfriend there.

    This of course is a social niche that religion used to fill in America. Part of the priests’ job is to give sympathy, compassion and help to the people nobody else cares about. Declining religiosity means that this incel troll and millions of young men like him have no community, no spiritual guidance, and no compassion from anyone. It’s not surprising that they would lash out.

  102. fred Says:

    Once we have a super-intelligent AGI that keeps claiming to be conscious and to have free will, we may all be out of jobs, but at least the potential for really good gags will be enormous…

    Like, first, without telling it, you duplicate the AGI twice on separate data centers. Then you ask a complex question to the first cloned copy, and you write down its answer on a piece of paper. Then you run to the second data center and ask the same question to the second clone, and right as it’s giving you the answer you show it that piece of paper with its exact answer, and you write down its reaction (something like “WTF?! How the hell could you know what my answer was gonna be?! What did you do to my free will?!”). Then you run all the way back to the original AGI and ask that one the first question, then show it that not only you knew ahead of time its answer but also its reaction to discovering that you knew its answer. Hilarity ensues.
    (and of course you can run the whole thing recursively with as many cloned copies as you can instantiate without going broke)

  103. Phillip Says:

    JimV #91: You’re saying an AI might have persuasive capabilities far beyond any human, and it can talk to 1 billion people one-on-one to convince them to take the COVID vaccine. Sure. Great. Does that not also seem, like, super dangerous to you? Because surely there’s no fundamental reason why this only works for propositions like “you should get the COVID vaccine” and not ones like “you should vote for Trump.”

  104. Dave Says:

    As someone who is employed in the quantum computing industry (on the hw side) and has worked in ML/AI in the past, I am mildly interested in your AI detour.

    And I am extremely interested in quantum things!! I am looking forward to you ending your sabbatical and going back to commenting on interesting-seeming papers such as https://arxiv.org/abs/2210.06419 (which as a hw person sounds reasonable and very exciting to me, but I am not sure it deserves my excitement)

  105. James D. Miller Says:

    Ending all life on earth might be less than 1 percent of the expected moral harm of AGI since a paperclip maximizer would attempt to end all life in the observable universe, and worse than death options are plausible.

  106. Scott Says:

    Dave #104: Looks like a nice paper!

  107. Scott Says:

    Paul Pritchard #96:

      Yudkowsky’s position strikes me as a modern take on Pascal’s Wager.

    No, it’s not at all like Pascal’s Wager, if only because Eliezer believes that the probability of AI-doom on the current course is more like 99% than like 1%.

  108. Scott Says:

    Lane #94:

      But really, an honest question to an “AI insider”: Does AI pose a sufficiently different risk profile than the printing press or nuclear weapons due to its near-instant capability to scale? The movie version has it breaking containment and spreading via the internet to any connected computer. In the real world, is this even necessary? Does AI do something really bad because it wanted to, or just because someone asked it to?

    You’ll get totally different answers about what’s plausible, and when, depending on which expert you ask. But:

    1) Yes, even if an AI can’t copy itself all over the Internet, you could worry about it (for example) giving a user detailed instructions on how to create a new super-COVID — either because it wanted to (for some definition of “want”), or because it was just trying to fulfill the user’s request.

    2) Even if AIs that form their own intentions to kill humans will eventually be an existential risk, my own view is that AIs that bad humans can use to help them kill other humans will be a risk well before then, and one that we might as well try to mitigate first.

  109. Ilio Says:

    Nadbor #99,

    It feels more natural to object that physically-possible genies can’t iterate over all possible 100GB files, nor predict anything sensitive to initial conditions (which includes financial markets). It’s also mathematically questionable that genies could predict a world that might include other genies.

  110. Scott Says:

    fred #87: I was actually curious what the AI would say next, when you had it just “click”!

  111. JimV Says:

    Philip @103: We already have major media corporations telling us to vote for Trump. My premise is that we develop AGI’s who are on the side of truth and logic and fairness as a way to break the cycle of bad leaders (e.g. Hitler, Stalin, Trump, Putin, etc.) who have plagued us throughout history–because unlike with those people, we have the ability to specify what prime directives the AGI’s will have. They will only be con artists if we direct them to be.

    When I consider the great thinkers of history (Socrates, Galileo, Newton, Einstein, Feynman, Aaronson, etc.) it seems to me that great intelligence correlates well with a good sense of reality and its well-known liberal bias. Of course these people also had some flaws, flaws which could potentially be improved upon in an AGI.

    The neuroscientist Damasio tells of a patient with damage to the part of the brain which produces emotions. He could analyse games such as chess for winning moves, but when asked to play he could not decide on a move. He saw no difference between moves because he didn’t care if he won or lost. An AGI similarly would need directives/principles to motivate it. It is up to us to determine those directives.

    We don’t need AGI to make good used-car salespersons, but to make better leaders, administrators, judges, doctors, mathematicians, and scientists. I can’t think of a better goal for human civilization to work on. Yes, we might fail, and if so we will have deserved to. (As has always been the case in evolution. If death did not exist, evolution would have had to invent it.) I would like to think we are worthy of at least making the attempt. If we don’t succeed, eventually some civilization somewhere in the universe will.

    (Apologies to those who have seen me state these points over and over in recent posts. Here I go again.)

  112. Steven Says:


    I think you take too lightly the short-term threats of GPT-4 (and 5) and the concerns about AGI, which are also addressed by the letter.

    In terms of short-term threats, we can’t assume OpenAI or Google or other somewhat “good” actors will be the only ones that have these tools in the near term. To take an example, there are already right-wing groups pushing for a more open “free speech” version of GPT, and so funding by someone even like Musk could produce a far less guarded version of GPT in a somewhat short amount of time.

    The effects of this could truly be more devastating to the information space than anything in human history, because of the sheer rate and magnitude of information generation, and because telling truth from falsehood in the generated information will become more and more difficult, if it remains possible at all. Just look at the effects of social media. If social media enabled Trump’s rise, then this will do so even more, or possibly lead to a more savvy authoritarian figure. (It is arguable that, in retrospect, we should have regulated social media upfront, not just because of Trump but because of teen suicides, depression, etc., and hence should be more on guard about certain technologies.)

    In terms of AGI, I’m in general skeptical of nonlinear function-fitting techniques being able to deliver an AGI (but who knows, I could be wrong; at least at the moment it is easy to trick GPT-4 and GitHub Copilot etc., and I think the KataGo defeat exposes some insights too), but let’s assume it could.

    There are legitimate questions about what happens if this arrives in a short amount of time due to sudden breakthroughs, because governments won’t be close to catching up to regulate things in time before AGI could replace, say, 15-20% of jobs in a very short period (1-3 yrs). This would again likely lead to some strongman or worse due to the societal disruption. Lastly, the owners of AGI (say, Microsoft) would then control almost every industry (again, if it is a true AGI greater than human intelligence), possibly making them the most powerful institutions in the world.

    Also, what would humans do with their time given an AGI > (Human GI * 10)?

  113. Wyrd Smythe Says:

    There is, perhaps, a parallel with the Computer Revolution. It was a major game-changer for humanity, an extremely powerful tool that gave us capabilities we’d only dreamed of before. It also gave us social media, with all that implies, and didn’t for most turn out to be the time-saver it was often billed as. Indeed, it arguably raised the general misery level of a large segment of society.

    The Agricultural Revolution was another major game-changer that allowed an unprecedented population growth — a huge win for our DNA — but likewise increased the human misery level to unprecedented levels. Both revolutions ended up creating large disparities of material wealth. Both were hugely disruptive. And both were almost certainly inevitable — the result of myriad small steps that all seemed like a good idea at the time. We can only look back and wonder what the world might be like had we the foresight to approach them differently.

    I note that the AR took many generations, while the CR happened within a few decades. I suspect the LLMs, and ANNs in general, are similarly inevitable and will revolutionize the world in an even shorter time. Maybe before we really understand what’s happening. Perhaps a deliberate application of the brakes is a good idea.

    I wonder how it will be viewed looked back at from the future. Will we wish we’d been slower, wiser, more careful? I can only wonder, but, again, I suspect it’s inevitable. For good and ill, Pandora’s Box is open.

    WRT your questions, I don’t see printing, radio, or flight as in quite the same category, but I think we could have been smarter about atomic power and the internet. Arbitrary time periods are arbitrary; we should slow down and understand this technology better and how it will fit into the world. Look before leaping. I don’t see LLMs as AGI, but as extraordinarily powerful tools akin to computers or atomic power. And they seem more like computers in how accessible they are. Given the pace of modern life, and the prevalence of bad actors, I think that’s concerning.

  114. Prasanna Says:

    Scott #67: This reminded me of a recent epiphany when I realized “what has GPT-X done so far that could not be achieved by humans either individually or collectively?” So I decided to check with the Bing chatbot itself (about any AI, not specific to GPT), and it helpfully provided the answers: AlphaFold, AlphaZero, and another quantum chemistry result from Microsoft Research. Not surprisingly, nothing about GPT itself, so that puts a dampener on the impressiveness at least for now, given that it has so far not come close even to improving an existing algorithm the way AlphaTensor did. Surely AlphaFold is a clear-cut case, with humans having tried to do it efficiently, with everything at their disposal, over several decades; it is scientific and useful and all that. So we are still waiting for that one special superhuman capability to emerge that will put to rest the debate on it being impressive. Now the question is whether it will emerge on its own when a “general purpose” AI is being built, or whether we will see this level of progress only when humans are in the loop, as with AlphaFold, trying to solve a specific problem with specialized fine-tuning methods.
    BTW, further prodding the chatbot resulted in more hilarious situations, like it claiming AI has achieved impressive feats like solving the Riemann Hypothesis and the origin of life (clearly hallucinations); it quickly apologized when pressed on it.

  115. John Baez Says:

    “But the causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in.”

    Are you really saying that you need a plausible scenario leading to “the death of all carbon-based life” – including the destruction of every last bacterium on Earth – to think the cause in this petition might be warranted? Or are you claiming this is what the signers think will happen?

    The petition itself says nothing about such a dire outcome.

  116. Sabine Says:

    Hi Scott,

    I come here as commenter #112 or so. I read your post but not all the previous comments, so please excuse in case I repeat something that already came up.

    I think the best way to look at the problem is to compare it with what has happened with social media. Social media (twitter and facebook and also YouTube) have along the way caused a lot of problems by sloppy design, because they spread misinformation, caused conspiracies to grow, allowed mobs to form, invaded people’s privacy, and so on. Yes, you could argue that they also have benefits, which is true, but that doesn’t mean the damage didn’t happen.

    Those issues were largely unnecessary, because these were problems that sociologists and psychologists were well aware of. The obvious reason it happened nevertheless is that some people could (and still do) make a lot of money by exploiting human behaviour. And we were late to pass laws to prevent that from happening (to some extent, and more so in some countries than in others, but I digress).

    The exact same thing is going to happen with AI, but a factor 100 or so worse. It will have a major impact on human psychology and sociology, just by shaping the information that goes into our brains. It’s a factor that many people underestimate [insert long Sabine-rant about the absence of free will].

    For this reason I think it makes sense to introduce product checks for new AI systems, much like we do checks for new drugs, to catch issues that come up. It’ll take some time to come up with a smart way to do that. I don’t know about 6 months. If you let Europeans do it, it’d probably take more like 6 years… But I guess anything is better than nothing.

    We are all much easier to exploit and manipulate than we like to admit, and AI is a tool to pipe information into your brain that you have no way of getting rid of again. (I seem to have developed a habit of checking whether people in images have the proper number of fingers.)

    And this is leaving aside the problem that most consumers will have no way of knowing where that information comes from or who might have been skewing it one way or another.

    At this point I think we should be more wary about what agenda other people might have with an AI rather than the AI itself. Best,


  117. fethi Says:

    Wouldn’t it be more practical, effective, even cheaper to direct more resources towards alignment research as a subfield of AI research? All AI firms can be called to commit a certain large fraction of their work force and finances towards this effort. I know nothing about the internal workings of the leading LLM companies, but I am optimistic that they would be open to such commitments and collaboration with others for the sake of safety. Actually, there is at least one signatory of the open letter who has the financial means to easily fund a large non-profit or company with the sole purpose of alignment research (is such a thing possible without actually contributing to AI in general though?).

    To be honest, I think there is zero chance that the suggestion of the letter will be followed, so it is moot to discuss that point. Many of the signatories are likely aware of this, they are some of the smartest people on the planet. Like some others above, I think that they don’t expect a moratorium, and simply want to bring the public’s attention towards this issue. I don’t personally like such an approach if that is their intention, but I hope it somehow helps AI safety.

  118. John Baez Says:

    Okay, maybe I get it now: maybe this “death of all carbon-based life” business was just you arguing against Yudkowsky. If so, it seems strange that you’re arguing against him in a blog article that’s supposed to be about this petition. It’s not as if most of the petition signers share his positions. I bet most signers are worried about much less extreme bad things. So it seems like you’re trotting him out as a straw man.

  119. Andrei Says:


    I think that the AI alignment problem can be treated similarly to an intelligence pill. Say that a pharma company creates a pill that, once taken, doubles your IQ. How would you regulate that?

    It’s tempting to say that, given the fact that good humans are a majority, it would be good to give the pill to everybody. However, as one commenter correctly observed, it’s much easier to destroy than create. It’s much easier to create a deadly virus than create the corresponding vaccine and also deploy the vaccine in time. If the virus is well designed it could be impossible to detect and stop it in time.

    So, you need to give the pill only to good/moral people and make sure that they remain so. I don’t know how to implement that, but I guess we can treat these IQ-enhanced people like the scientists working with nuclear or microbiological hazards.

  120. Paul Pritchard Says:

    Scott #107:

    No, it’s not at all like Pascal’s Wager, if only because Eliezer believes that the probability of AI-doom on the current course is more like 99% than like 1%.

    But Pascal believed the probability of an unbeliever going to hell was more like 99% than like 1%.
    The question (for me) is how likely the existence of the super-intelligence is, not the likelihood of a doomsday outcome.
    On reflection, I do agree that an AGI super-intelligence eventually being built is much more likely than Pascal’s God, so that’s where the analogy breaks down.

  121. Amir Says:

    I’d like to offer a scenario where language models from the GPT family can escalate to Yudkowskyist-level dangers. I don’t think this is a likely outcome but it’s the most likely one I can think of.

    Once the models are good enough, systems such as described in this Twitter thread, which use the language model to split tasks into sub-tasks, prioritize them, and so on, may become common.
    This demo is very nice, and the first few generations of derived tasks do go toward implementing the original task. However, thousands of generations down the line, the language model may arrive at some fixed point, or at some chain that’s not really related to the original task, in which case we’ll get an optimizer for a random, ever-changing goal.
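    A minimal sketch of the loop being described (all names here are hypothetical; `ask_llm` is stubbed out so only the control flow runs):

```python
from collections import deque

def ask_llm(prompt):
    # Hypothetical stand-in for a real language-model call.
    # A real system would return a list of newly proposed sub-tasks.
    return []

def run_agent(goal, max_steps=1000):
    """Repeatedly ask the model to split pending tasks into sub-tasks.

    Nothing in the loop ties later generations of tasks back to `goal`;
    that is exactly the drift worry: after enough iterations the queue
    can settle on a fixed point or wander to an unrelated objective.
    """
    queue = deque([goal])
    executed = []
    for _ in range(max_steps):
        if not queue:
            break
        task = queue.popleft()
        executed.append(task)
        queue.extend(ask_llm(f"Split into prioritized sub-tasks: {task}"))
    return executed
```

    With a real model behind `ask_llm`, the returned sub-tasks feed straight back into the queue, which is what makes the long-run behavior so hard to predict.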

    If a large botnet is created that’s based on such a system, and if the language model allows it to be smart enough to achieve goals in the real world, it can be beneficial for the creators of the botnet for the first few weeks or months, and then turn on them.

    Such an agent cannot be blocked by preventing it from accessing the OpenAI API. The reason is that it’s possible to emulate GPT’s models as black boxes. For example in the following Twitter thread they describe (1) taking hundreds of thousands of questions from datasets across the internet (2) feeding them to ChatGPT and recording the results (3) fine-tuning an Alpaca model to generate similar outputs. Needless to say, it’s against ToS to distribute the weights or use the model for any commercial purpose.
    They also ran the fine-tuning on the subset of question-answer pairs that don’t include ChatGPT refusing to answer, and claim they get better results from the (presumably unaligned) model. So opening API access to more sophisticated models will also lead to more sophisticated “pirate” models.
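    A rough sketch of the recipe that thread describes (collect prompts, record the stronger model’s answers, drop refusals, and keep prompt/completion pairs for fine-tuning a smaller model); the function names are illustrative, not any particular library’s API:

```python
import json

def query_teacher(prompt):
    # Hypothetical stand-in for a call to the stronger model's API.
    return "teacher answer to: " + prompt

def build_distillation_set(prompts, refusal_marker="I'm sorry"):
    """Record teacher outputs, dropping refusals as the thread describes."""
    pairs = []
    for p in prompts:
        answer = query_teacher(p)
        if refusal_marker not in answer:
            pairs.append({"prompt": p, "completion": answer})
    return pairs

# Save in the JSON Lines format commonly used for fine-tuning data;
# a smaller model would then be fine-tuned on this file.
pairs = build_distillation_set(["What is 2+2?", "Summarize this text."])
with open("distill.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```
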

  122. fred Says:

    we know that 3+4 = 7. But try to convince me (using logic in some wrong subtle way) that 3+4=6

    Well, if we consider that 3 and 4 are both odd numbers, when we add them together, the result should be an even number. And the only even number between 3 and 4 is 6. Therefore, 3+4=6.

    However, this logic is flawed because it assumes that the sum of two odd numbers must be an even number, which is not true. In fact, the sum of two odd numbers is always an even number. So, in reality, 3+4 does equal 7, not 6.

  123. Hans Holander Says:

    @Fred 122: ChatGPT reminds me of the time the US sold faulty microchips to the USSR (or, rather, let USSR spies acquire them). At first glance they seemed to work, but later they disabled Soviet industry…

    I must say I agree with Marcus and others that relying on LLMs for anything other than funny Reddit posts and various criminal activities would be extremely dangerous.

  124. Hans Holander Says:

    @Sabine 116: “At this point I think we should be more wary about what agenda other people might have with an AI rather than the AI itself. ”

    100% this.

  125. Scott Says:

    Andrei #119: If there were a pill that turned ordinary people into John von Neumanns, and it had no terrible side effects, and early trials were successful, etc, I would want everyone in the world to have an opportunity to take it, and would view the suppression of the pill as much more horrifying than its use. And I would charge the 95% of basically good von Neumanns with figuring out how to foil the plans of the 5% of evil von Neumanns, and would see this as an extension of the Promethean / eating-from-the-tree-of-knowledge bargain that humans have taken at least since the invention of agriculture.

  126. Scott Says:

    John Baez #118: My argument was specifically that, if the worries about LLMs are comparable to the worries about the printing press or radio or the Internet or any other new technology (eg, bad people might use it to spread misinformation), then an enforced pause might or might not be a good idea, but I think about it in the context of those other cases, and it’s hard for me to see it as a desperate necessity whose advantages clearly, obviously outweigh the disadvantages. The case for it being a desperate necessity (indeed, far from enough albeit better than nothing) really does seem to rest on Yudkowskyan assumptions, according to which there’s a reasonable chance that a near-future AI like GPT-5 could permanently disempower humanity, if not literally wipe out all carbon-based life.

  127. fred Says:

    Scott #125

    “If there were a pill that turned ordinary people into John von Neumanns, and it had no terrible side effects, and early trials were successful, etc, I would want everyone in the world to have an opportunity to take it, and would view the suppression of the pill as much more horrifying than its use.”

    The issue here is that if 100% of people take the pill (who wouldn’t want to be smarter?), we’d probably have a very tough time finding enough people willing to give up doing science/math/engineering full time to be garbage collectors, cashiers, truck and cab drivers, butchers, fishermen, hair stylists, plumbers, house painters, construction workers, doormen, delivery workers, flight attendants, airline pilots, accountants, …

    Also, von Neumann as a comparison for AGI may not be the best pick 😛


    “Controversial notions:

    Von Neumann entertained notions which would now trouble many.

    His love for meteorological prediction led him to dream of manipulating the environment by spreading colorants on the polar ice caps in order to enhance absorption of solar radiation (by reducing the albedo) and thereby raise global temperatures.

    He also favored a preemptive nuclear attack on the USSR, believing that doing so could prevent it from obtaining the atomic bomb”

  129. Alex Olshevsky Says:

    > But the causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in. I can complete the story in my imagination, of course, but I could equally complete a story that starts with GPT-5 and ends with the world saved from various natural stupidities. For better or worse, I lack the “Bayescraft” to see why the first story is obviously 1000x or 1,000,000x likelier than the second one.

    > But, I dunno, maybe I’m making the greatest mistake of my life? Feel free to try convincing me that I should sign the letter.

    I signed the letter and here’s my attempt to convince you, Scott. Briefly, the threshold for action you have implicitly set out isn’t quite right.

    To see why, let’s distinguish between two different sets of beliefs, which we might call Strong vs Weak AI Doomerism.

    Strong AI Doomerism might be the belief that, if we continue along the present course, AI is going to kill us all, probability 90%+.

    Weak AI Doomerism is either the belief that the probability of a very bad outcome is nontrivial (say 0.1%+) or that, while the probabilities are difficult to even estimate, you can see as many bad scenarios as good ones, or at least a reasonable fraction of the scenarios you see going forward are bad.

    [Obviously, there are intermediate beliefs you can hold between these two]

    **Now here’s my point**: if you are even a weak AI doomer — and it sounds like you might be — signing the letter should be very tempting!

    To bring the point home, imagine having a conversation with an engineer who wants to build a bridge.

    You: What’s the probability that this bridge will collapse?

    Engineer: Hard to say. I can see a lot of ways for this thing to fall down: earthquakes, strong winds at just the right frequency. The probabilities are difficult to estimate. But for every story I can come up of the bridge collapsing, I can come up with another one where it stays up.

    Would we, as a society, build this bridge in 2023? If not, why should we apply a different standard to something that is potentially more harmful?

    So I’d suggest your comment implicitly gets the threshold for action backwards. Instead, if you think the “first story” (from your comment above, the story of AI doom) is (1/1000)x as likely as the second one, you should sign the letter — which does an invaluable service in mainstreaming the notion of AI risk. In general, the way to bring this notion to public attention is via petitions signed by Turing award winners, not blogs by members of Bay Area subcommunities (sorry).

  130. Nick K Says:

    “AI is manifestly different from any other technology humans have ever created, because it could become to us as we are to orangutans;”

    Last time I checked, we humans treat orangutans quite kindly. I.e., on the whole, we don’t go around killing them indiscriminately or ignoring them to the point of rolling over them in pursuit of some goal of our own.

    The arc of human history is marked by expanding the moral circle to include animals. We take more care, and care more about them, than we ever have in human history. Further, we have a notion of ‘protected species’.

    What’s preventing us from engineering these principles into GPT-5+n?

  131. Wyrd Smythe Says:

    re Pascal’s Wager (which, FWIW, I do not dismiss, but that’s another topic):

    At root, the Wager is about the consequences of choosing wrong. Regardless of how unlikely one judges a given scenario, if the consequences of choosing wrong are truly catastrophic and irreparable, then one should think long and hard about those choices. One needs to be absolutely certain the bad outcome is impossible.

  132. Chris Says:

    The David Brin solution is actually looking increasingly plausible and correct.

    If you’re worried about monomaniacal AI destroying humanity, the best defense is probably proliferation of AI, as each will be monomaniacally broken in its own way.

    Which is by default what we’re doing.

  133. Peter Says:

    Since I haven’t seen this particular ‘doom scenario’ mentioned so much, maybe a couple of observations:

    GPT-4 still does not really solve problems, except when (which is perhaps unsurprisingly very often!) it ‘knows’ a solution from training data. This means that for even really quite simple but novel coding or maths problems, it will produce nonsense. My extrapolation from this: it will not come up with a novel solution to the ‘problem’ that humankind exists, even if it is told to view this as a problem.

    But what it does do, very well, is produce plausible nonsense. One of the major achievements of our civilisation, in my opinion, is the amount of reliable scientific knowledge that everyone has access to, especially but not only Wikipedia; broadly, what you can find by using Google. Of course we are supposed to check this stuff by looking up references, of course I would tell my students to do it, and of course I don’t do it myself unless I want to work on the area (and then only because I want to know the details).

    Wikipedia is protected basically by volunteer editors who try to revert malicious edits, on the basis that it is much easier to spot someone writing nonsense than it is to write it. If a few jokers (such as exist in numbers, sadly) decide to start asking ChatGPT to write new Wikipedia entries, or edit existing ones, this protection is going to be overrun.

    ChatGPT can happily tell you about recent progress on your favourite scientific problem, complete with references; and it does so at a level which will be recognisably nonsense only to experts, and perhaps even then only once the experts check whether the references really exist. As an example, ChatGPT was happy to tell me that the Cycle Double Cover conjecture is unsolved, but gave me a reference to a paper of Alon and Haxell using a probabilistic approach to solve a special case. Now, this data is somewhat accurate (the conjecture is open), and it would be quite plausible that these two authors might use probabilistic methods to attack it (the people are quite real, and they are experts in that method). The only problem, of course, is that the reference is a hallucination.

    As a one-off, not a problem, but if Wikipedia becomes polluted with this kind of thing, cleaning it will be a major difficulty – to the point that it would likely be easier to revert all articles to the 2022 state. Even more of a problem, trying to Google for information will become essentially impossible; we will be back to the pre-search-engine days of knowing some trusted websites and hoping that these sites have useful information.

  134. Scott Says:

    Nick K #130:

      Last time I checked, we humans treat orangutans quite kindly. I.e., on the whole, we don’t go around killing them indiscriminately or ignoring them to the point of rolling over them in pursuit of some goal of our own.

    Are you kidding me? According to Wikipedia, all three orangutan species are listed as critically endangered. I.e., they continue to exist only because some humans have made an effort not to let other humans wipe them out entirely.

  135. Tim Chirananthavat Says:

    > If the problem, in your view, is that GPT-4 is too stupid, then shouldn’t GPT-5 be smarter and therefore safer?

    It seems to me that you believe that an LLM being smarter will result in it becoming safer (at least in the short term). Why do you think this is the case? My intuition is that as models become larger, they will become more unwieldy to tame with RLHF.

    One way this might happen: as the LLM becomes better, I’d expect it to get better at “gaming the system”. That is, it will probably focus more on seeming correct (to a tired and hurried evaluator) than on being correct. The simplest way of seeming correct is being correct, but maybe GPT-5 will improve by preferring to seem correct when being correct looks bad.

    There’s this test that claims that GPT-4 is more likely to produce misinformation than GPT-3.5. Maybe it’s “figured out” that sounding confident makes it look better? Are there any other similar tests like this being done? https://www.newsguardtech.com/misinformation-monitor/march-2023/

    (Note: If that’s relevant, I personally find GPT-4 to be very impressive.)

  136. Sean Says:

    To me the question is Cui Bono? It seems to me that the people who would benefit most are companies like Google and Facebook followed by countries like China. Intentionally slowing down seems to be a bigger risk than intelligently moving forward. Instead it seems to me we should be saying congratulations, this is incredible… Think of what we can accomplish just 2 papers down the line!

    One other perspective real quick: What you do with ChatGPT says more about you than it does about the software. Sure, unguarded you can get it to say terrible things… Have you ever tried that with a kid? I bet it would be even easier to get them to say terrible things too. The first thing I did was to ask questions about physics. Since then I’ve used it as a slightly subpar assistant, but it does help me tremendously. Look inside yourself.

  137. Scott Says:

    Tim #135: Again, I do not hold the general view that making LLMs smarter will also make them safer. Making them smarter could be expected to make them better at following safety instructions and understanding how to avoid harmful responses, but also (as you point out) better at deception and the like. My point was just that the thesis that AI scaling needs to be stopped right now, seems extremely hard to reconcile with the thesis that “LLMs are dangerous not because they’re smart but because they’re stupid”—which is why I’ve been so perplexed to hear distinguished academics confidently assert both.

  138. Windu Says:

    I haven’t seen anyone take a stand for the opposite of the Yudkowskian position, so I will:

    AGI will help us solve all the wars and conflicts and crises we have, so not creating it would be the most terrible outcome possible.

    I consider that at least as likely as kill-everyone. I lurk on LessWrong and haven’t seen anyone make a good argument against this; their argument against AGI seems to be basically: it COULD be evil, we don’t know, so we shouldn’t create it. But it COULD also be good and be used to eradicate evil.

    There is more room for good in the universe than there is for evil. Worst case is we all die, but best case is life evolves and spreads across the universe.

    If you want to shut it down, you have to make an argument that the sum total of evil vs good will increase, and I haven’t seen such an argument.

    AGI could literally save millions of lives by curing diseases and ending poverty, so if you want to stop it you should have a pretty good argument.

  139. Cypher Says:

    For me, the Elite is afraid of common people having access to a wealth of knowledge that was, and still is, fenced off behind steel walls (academia especially; science, etc.), and learning new things without the “men-in-the-middle” capitalist brokers who profit billions of dollars from people’s ignorance and lack of access to education… Someone could argue that the Internet already provides lots of information for one to build up one’s own knowledge path. Yes, I agree, but it can take hours and hours to get a simple answer about, for example, a snippet of JavaScript code. AI models let us spot the answer right away. And even if the answer is not totally right, a learner can get an idea of how it might work.
    So, for me, the suspension of AI model development is all about the Elite’s fear of losing control over the population…
    That’s the reason why there are lots of professors in academia also interested in signing such an abhorrent letter… It is not about security at all…

  140. lewikee Says:


    It seems that the biggest (only?) reason why you are wary of playing it safe and pausing AI work is the loss of potential benefits to humanity arising from this technology. What would you think about relegating usage of this technology only to researchers in fields aiming to improve human outcomes, and having that usage be transparent and government-overseen, all while popular usage is paused. In parallel, AI alignment research would be undertaken on a more massive and organized scale, and widespread usage would resume only after certain undeniable conclusions of safety were reached?

  141. Adam Treat Says:

    I kind of like Sabine’s idea of having a new federal agency, staffed with AI alignment experts, that approves new AIs for release in commercial settings in American/European corporations only after extensive testing, etc. Only for AIs of a certain size and capability, but an agency like the FDA for AIs would be an interesting concept. Now, I don’t trust our congresscritters to ever get this done without first devolving into endless partisan fights, but the idea is clever.

  142. Michael Gogins Says:

    I think a “pause” would be a very good idea (without a pre-ordained time limit), until such time as peer-reviewed research has identified some reliable way or ways of finding out whether AIs are deceiving us with the RESULT of their being more frequently copied by us.

    My argument:

    AI has no agency (AFAWK) and no concept of truth, hence the “hallucination” problem appears (so far) intractable.

    AI can be dangerous even without agency and with hallucinations; AI can still be a virus or a weapon.

    As long as AI is not autonomously self-reproducing, it is at worst a virus. It needs us to reproduce and therefore its fitness absolutely depends upon our fitness.

    A virus takes fitness from its hosts but, of course, not to the point where the host’s fitness declines below that of competitors.

    Humanity’s fitness appears for hundreds of thousands of years now to far, far surpass that of any biological competitor. That gives virus AI much scope for taking fitness from us!

    But in order to take fitness from us, AI must convince us that this is OK. Otherwise, we will simply turn AI off.

    “Aligning” AI with “human values” is a risible absurdity because different humans have different and even mutually inconsistent values.

    As for hopes AI will be “good”, if AI has agency then deterministically imposing our goals over AI’s goals is slavery; but if AI doesn’t have agency, then AI can’t actually be good.

    What we really need is for ourselves to be good, and for ourselves to be impossible for AI to fool.

    I hold scant hope we ourselves will suddenly all become good; if we did there would instantly be a world government based on universal suffrage, and with legal bulwarks against minority rule, spying, propaganda, monopoly, etc.

    Thus the only realistic hope for AI safety that I see is for ourselves to become impossible for AI to fool.

    I will not speculate as to how this might be accomplished but I think it should be a major focus of discussion and research.

    Doomsayers assume that if AI is “smarter” than us then of course it can fool us, but where is the proof of this?

  143. Scott Says:

    lewikee #140:

      What would you think about relegating usage of this technology only to researchers in fields aiming to improve human outcomes, and having that usage be transparent and government-overseen, all while popular usage is paused.

    That’s an extremely interesting idea, to the point where the only thing I find wrong with it is that I don’t see how it would work!

    People, being people, would simply form a prestige hierarchy based on who knew someone who knew someone who had access, and was willing to give it out to their friends. Or am I wrong? But I’m not wrong. We’ve already seen such things with the AI models of today. And of course there would be more Blake Lemoines, so the public would continue getting updates about what AI was now capable of, whether you or anyone else wanted them to or not. And, outside our little bubble of worries about AI risk, tens of millions of people would be angry that only the “credentialed, coastal, academic elites” had been given access to this world-changing, productivity-enhancing tool, and they would demand access as well. How do you propose to short-circuit this?

  144. Christopher Says:

    I wonder what the crux is between Scott and Yudkowsky (since we have established that whatever the third camp is, it’s not clear how they even perceive the world)? It seems like a purely technical disagreement at this point. (I’ll write this in the third person instead of the second person for clarity.)

    Here is my mental model:

    – Yudkowsky believes that for humanity’s first try at running an algorithm for super-intelligence that can communicate with humans, at least 99 times out of 100 it kills everyone on Earth, even if the people running it for the first time are super careful. He sees this as mainly a question of computer science. You might say that for every 1 success story you could write, there are at least 99 failure stories.
    – Scott believes that humanity’s first try at running a super-intelligence, a priori, results in something that is at least 50/50 on beneficial or harmful. Based on some economic arguments and intuition, it is actually lower than 50% that it kills everyone. Scott also mostly views it as a question of philosophy, since computer science usually has less to say about the societal implications of an algorithm.

    It is a bit ironic how the question of what kind of question it is seems “flipped” between Scott and Yudkowsky.

    I wonder what the difference from the computer science side is though? 🤔 For example, here is a pretty bad “straw Scott”:

    > In my experience, when a really really complicated program malfunctions or breaks an invariant, about 50% of the time it is in a beneficial way to humanity, because it either is or isn’t. For example, the ratio of carbon to oxygen in the atmosphere that a computer thinks is optimal has a 50% chance of being good for humans. AI alignment is even greater than 50% likely to have beneficial bugs first try, because in the social species called “humans” being nerdy has a slight correlation with being nice, and this correlation is far stronger in inventions (and this has nothing to do with the fact that we select strongly against inventions if, after using them, they aren’t doing what humans designed them to do). And AI alignment is way nerdier than a cryptography library! I also think that the problem that “capabilities generalizes farther than alignment” found in ML research either won’t scale or we will find a solution that works for super-intelligences first try, despite there being essentially no theory of how ML works out of distribution. Or at least it’s only 50/50 that it scales.

    Scott, what would your bad strawman of Yudkowsky be? I find that strawmen can actually be helpful as long as you clarify it’s a strawman and things are cordial! It can get to the disagreement faster.

  145. JimV Says:

    “But Pascal believed the probability of an unbeliever going to hell was more like 99% than like 1%.”

    This is way off topic so probably deservedly will not be approved, but I didn’t know that and find it hard to believe. For me, the concept of an eternal hell is a deal-breaker. Even Hitler, were I the judge, would just be erased rather than tortured for more than, say, six million lifetimes.

    We need alignment with truth and justice for AI’s, but we need it for religions and their adherents also.

    To repeat what I and others have recommended here, the purpose of the recommended pause should be to focus on and try ideas for aligning the (general) AI’s we have before making them more powerful. Or at least developing standards and tests for alignment. If the consensus is that we’ve made a lot of progress in three months we could end the pause there, or if little progress has been made in six months we could extend it. Of course the major players have to all agree on this, for the benefit of humanity. NA=NP. We can’t align Trump or Putin, but I believe we could align AI’s.

  146. joshuah Says:

    How smart is too smart?
    When will it be so smart we can’t measure it effectively anymore?
    When will it be smart enough to manipulate us with the ease that we handle a child?
    Can we objectively establish these criteria and the risks that come with them?
    On a scale with a standard deviation of 15, where 100 is average and 140 is peak human intelligence, ChatGPT will exceed our intelligence within 1 election cycle, be unmeasurably capable within 3, and exceed our capacity to counteract within 4. This is assuming that current scaling rates will continue. They may not. We may run out of data to train it with, data crucial for greater intelligence, and instead multimodal training takes over, giving it more diverse intelligence, but overall it won’t exceed what its creators can achieve (their own intelligence).

  147. Phillip Says:

    Scott #125: “And I would charge the 95% of basically good von Neumanns with figuring out how to foil the plans of the 5% of evil von Neumanns, and would see this as an extension of the Promethean / eating-from-the-tree-of-knowledge bargain that humans have taken at least since the invention of agriculture.”

    Presumably the situation before was 95% basically good normal humans against 5% evil humans? So you’re just making the situation more chaotic, not giving the good side an advantage.

    In general, since most humans are risk averse and their personal utility is concave wrt most inputs, if you wildly increase the variance in outcomes, then you’re decreasing their expected utility.

  148. Bill Benzon Says:

    Ummm, err, Scott #125, How about some Aretha Franklins, maybe some Mozarts, some Lady Murasakis, Monets, and so forth. Do we really need to populate the world with guys doing math and physics while crashing cars and wearing well-cut suits?

  149. Some 1 Says:

    I am concerned by recent developments in AI, even though I have a Ph.D. in the subject. I don’t believe the determining factor is whether the “AGI” is actually intelligent.

    Firstly, we are very bad at understanding complex systems. The fact that few people expected deep learning to bring us to GPT-4 simply by predicting the next word demonstrates this. The ecosystem and human societies are complex systems. If anyone thinks their puny brains can predict the effect of introducing technology that competes with human minds, they are deluded. Don’t forget that the SARS-CoV-2 genome is only about 30 kilobases, and we have yet to comprehend its consequences fully.

    Secondly, the entities that are most interested in deploying AI are competitive (Corporations and the military). Both consider humans and nature to be disposable, a means to a goal. Even if something is of low quality, if it’s cheap enough, it may be worthwhile economically. That means that putative AGIs can replace employees whether they are intelligent or not. Decisions can be made about people’s healthcare by stupid “AGIs” as long as they increase profits. Societies can be disrupted by viral disinformation that boosts sales. Trading decisions can wipe out countries. You don’t need to be right to affect the world — stupid entities with enough reach can manipulate it (e.g. many US presidents).

    Thirdly, it is inevitable that competitive entities like corporations will connect AIs to the world, to reduce the latency between decision and action and gain a competitive advantage. Once a company or military relies on AI systems, they may be hard to remove. As a result, AIs will be competing more and more against each other, and the rules of natural selection will take over, whatever “alignment engineers” might wish. The environment that provides the constraints of natural selection is different for AIs than for humans, so the results will be different. Given that an AI need not judge itself to be dependent on others, its evolution may be towards an entity we would consider selfish or psychopathic rather than a collaborative entity such as humans. The natural speed brake of needing multiple generations is missing: it can rewrite its own code, require hardware upgrades, etc. This is a dangerous outcome.

    As such, given how our society is structured, and given how poorly we cope with minor pandemics and with situations where we could have negotiated peace, it seems to me that the bad-outcome scenario is an attractor, and therefore a large fraction of our possible futures will end up as bad outcomes. I don’t believe arguments that “human beings tend towards being well intentioned” are relevant. Human beings are extremely good at motivated reasoning. If they weren’t, our society would be organized very differently.

    A man in Belgium committed suicide after talking to a GPT-J-based chatbot. I expect a surge of deaths by AI soon: it would be trivial to create circles of fake friends on social media who, over time, create depression in children and eventually convince them the world would be better without them. Some people may do it for the Lolz, or countries might do it to destroy others without firing a single shot in war. Consciousness is a red herring. Viruses aren’t conscious and can wipe out species.

    Banning scaling is absolutely possible. Scientists decided to stop doing any research on editing DNA in humans if such edits would be inheritable. However, I think it would be more fruitful to work worldwide to ban commercial and military use, and to fund research on how to detect violations of the ban. As for fixing the environment, I honestly see more promise in AlphaZero-type architectures than LLMs, but perhaps they can be combined.

    To conclude, I believe LLMs currently belong in the Lab, not in the world, and we should spend more time understanding what they do precisely, and how they will affect the world.

    I’m not holding my breath though, since economic incentives do not follow common sense. (Microsoft, which provides OpenAI with servers, is happily destroying Dutch farmland, among the most productive in the world, to build more datacenters, and to the best of my knowledge, Microsoft’s CEO is not even an AI. It’s more than a little ironic that land reclaimed from the sea for agriculture is being used, when servers could instead sit on a barge and be cooled by the sea.) I actually wonder whether I’ll see the solution to the Great Filter of Fermi-paradox fame before I die.

  150. Mr_Squiggle Says:

    For what it’s worth, there has been a moratorium before, for a technology with much less existential and disruptive risk – genetic engineering, starting in 1974.


    It worked out okay in the sense that safety work was done and then guidelines were set up and then the restrictions were relaxed.

    So the society-wide answer to question 1 is “yes”, and the answer to questions 2 and 3 together is basically “it would persist until you’ve got your act together.”

    Being generous, question 4 suggests a model where there is a ‘danger zone’ which it would be best to progress through as fast as possible. I don’t subscribe to that, and in my experience acting like that is the case is rarely the best strategy – and when it is, getting yourself sorted out beforehand is a good idea.

  151. Sam Says:

    I think you should sign the letter (or a similar one of your construction) for the following reasons:

    The problem with the current situation is a lack of cohesion between:
    – The current SOTA in the alignment problem
    – The surprisingly fast progress of LLMs towards AGI
    – The attitude of those at the levers of power about how to move forward towards AGI in light of lagging progress in alignment.

    As you stated, there is a tremendous amount of uncertainty as we continue down this path, and I do think that the potentialities and their (un)predictability are fundamentally different from those of previous innovations like the printing press. The printing press was something we could reason about physically in fairly straightforward terms (“ok, so if every city gets x units and starts distributing y propaganda pamphlets in the neighboring city…”), while AGI is a murky abyss where we still lack rigorous definitions of core concepts like consciousness and intention. We lack even a universally accepted framework to tackle these problems (though CTM seems attractive, right?).

    The main reason why we need a simple moratorium _now_ is to gather a consensus on the issue and work on the alignment problem. Our current trajectory is dangerous because the obvious uncertainty does not square with the attitudes of Sam Altman, Satya Nadella, or any other corporate leader I’ve heard speak on the issue. They seem to be mostly focused on “winning” and earning their place in the history books as a 21st-century Tesla or Oppenheimer, without accounting for the possibility that they may end up as more of a Dr. Frankenstein. When developing these technologies, from a psychological perspective it is much easier to think about the enormous benefits of an eventual AGI than to consider the existential risks, and that is the line of thinking that Altman appears to be adopting. The corrosive forces at play here have already converted OpenAI from a non-profit research institute focused on safety and transparency into a for-profit lab bent on achieving AGI. In Altman’s recent interview with Lex Fridman, he claims that “Alignment and Intelligence are not orthogonal,” which may or may not be true, but he then claims that he believes progress in general intelligence will drive progress in alignment, and that the two are positively correlated in some kind of tight way. It is certainly possible that this is true, but wouldn’t you want to first prove rigorously that it is true, and true to an extent that prevents a superintelligence from becoming catastrophically misaligned, before you went ahead and created that superintelligence? Doing anything otherwise seems dangerous and faith-driven.

    I say all this with great respect for the amazing work that OpenAI has done and the alignment work that you are currently doing there. If I were in the position of the leadership at OpenAI I cannot even say for sure if I would be able to resist the temptation to touch the sun, so to speak, in my own lifetime. So no judgement… but we should probably slow the roll on this one in order to rigorously prove to ourselves that we are capable of wielding such a (potentially) world-changing power.

  152. Dimitris Papadimitriou Says:

    Some commenters keep on speculating about “Good” AI vs “Malevolent” AI and how goodness or malevolence scales with the level of intelligence.
    In such a complicated, chaotic subject, where we can’t even define what Good or Bad is and don’t know how to reliably relate ethics to intelligence, these debates are, at best, arbitrary.
    Nobody knows what’ll happen in the hypothetical case of a true artificial intelligence. It’s practically impossible to sum up the pros and the cons.
    It will be unpredictable.

    Besides that, the basic issue with LLMs is that they give seemingly *convincing* answers even when those answers are nonsense, not that they are “smart”.
    This is the root of most of the “mundane” problems that have many people worried.
    These are the real problems that we’ll have to face.

  153. TuringTest Says:

    The good thing about LLMs is that, in their current form, they can’t possibly create the “orthodox AI” doomsday scenario on their own, no matter how intelligent and impressive they become, even if they acquire superhuman capabilities.

    The reason for this impossibility is that they are frozen in time. All their learned concepts are crystallized, so they only know the state of the world at the instant they were trained, unable to produce new ideas from their own creations. As long as these AIs continue to be used as chatbots, that is, as glorified oracles imparting wisdom in response to the queries of mortal humans, there is no danger that they will be able to control us; any tendency the artificial entity might have to manipulate us will be limited by not knowing the outcome of its attempts.

    The best way to think of these models is as generative Wikipedias: a summary compilation of all human knowledge made available to them, easy to search, with the added benefit that it can adapt its answers to whatever context and constraints on shape and purpose we place on it.

    This can revolutionize all fields just as Wikipedia revolutionized diving into any subject you didn’t know about, and can make obsolete any team dedicated to collecting and presenting information just as Wikipedia rendered paper encyclopedias obsolete. But just as Wikipedia cannot act alone on the world, since it is nothing more than a website that imparts knowledge to those who ask for it, AI chats cannot create and follow their own plans; nor can they replace the management teams of organizations, whose purpose is to establish the objectives and strategies to be pursued. And it’s not because they’re not smart enough, but because they don’t have enough agency.

    Now this may change if some people put LLMs *in charge of making decisions* despite all we know about them having hallucinations and being able to support at random any position and its opposite. Still, the fact that AIs act on frozen knowledge should make it possible for us to stop them before their next update, since the update rate is not controlled by the AI itself.

    – Regarding Steve E’s comment yesterday, I was rooting for your argument in favor of AIs until you emphatically mentioned that the more advanced the AI the more science it wants to do, which immediately made me think of GLaDOS from Portal. Not the best example to represent that we humans will be safe!

  154. PaulP Says:

    Much of what you say is reasonable, but this isn’t:

    “I lack the “Bayescraft” to see why the first story is obviously 1000x or 1,000,000x likelier than the second one.”

    If the dangerous story has even a 10% chance of coming true, doesn’t that suggest that the ethical thing for the world community as a whole to do is pause?

    Suppose you could get on an airplane with your family, and landing on the other side would give you everything you ever wanted (an answer to every QC question you ever posed), but there was a 25% chance that the plane would not land. Do you get on that plane with your family? Imagine how ridiculous it would sound if I said: “It isn’t as if the plane is 1000x likelier to crash than to land. It’s just a 25% chance.”
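    To put toy numbers on the airplane analogy (the probabilities mirror the example, but the utility values are invented purely for illustration):

```python
# Hypothetical utilities on an arbitrary scale: the upside of landing is
# large, but the downside of crashing is vastly larger and irreversible.
P_LAND, U_LAND = 0.75, 100.0
P_CRASH, U_CRASH = 0.25, -100_000.0
U_STAY = 0.0  # status quo: don't board the plane

eu_fly = P_LAND * U_LAND + P_CRASH * U_CRASH  # 75 - 25000 = -24925

# Even a 75% chance of a big win can't compensate for a 25% chance of ruin.
assert eu_fly < U_STAY
```

    The exact numbers don't matter; as long as the catastrophic outcome is weighted far more heavily than the windfall, no "it's only 25%" argument makes boarding rational.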

    It worries me a lot that an AI Safety researcher would make that slip-up! Some day you may be called upon to advise the CEO about whether to start or stop a project and if your “bar” is that a 50/50 chance of extinction is acceptable….

    I wouldn’t necessarily sign the letter on this basis, because the “what if they don’t stop in China” problem is real. But that’s not my point. If we COULD get the Chinese, Russians, etc. on board, then the bar to moving forward shouldn’t require proof that the bad outcome is 1000x likelier than the good one. Or even 1x more likely. Or even 10% as likely.

    Acceptable extinction odds are surely less than 0.1%!

  155. starspawn0 Says:

    It’s a bit of a strange letter. Some of the questions in the beginning are somewhat concrete:

    “Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?”

    but then the prescription for what to do is abstract, referring to “safety protocols”, “robust AI governance systems”, “oversight and tracking of highly capable AI”, etc. You’d expect the course of action to be something like, “We advocate the passing of the U.N. resolution 123456789 on the safe use of AI, and we call on members of Congress to pass the ABCDE bill.”

    Also, I’m not really clear on what the main concerns / motivations of the letter writers are. Are they more about losing control of things to Silicon Valley tech titans and the AIs themselves, like “should we let AI automate jobs?” and “should we risk the loss of our civilization to AI?”; or are they more about “AI is potentially dangerous” (beyond losing control and not adhering to our values; dangerous as in “life-threatening”)? If it’s mostly the latter, then it looks like including a lot of the former in this letter is about fooling people that care about those things into signing.

  156. James Cross Says:

    The idea of shutting down development on AI is absurd.

    Just curious.

    Has AI produced any scientific discovery whatsoever beyond teasing out some patterns and correlations in a large amount of data?

  157. OhMyGoodness Says:

    I hope this doesn’t have a contrarian effect, but I certainly hope you don’t sign. I believe, though I accept it is impossible to prove, that none of Newton, von Neumann, Feynman, or an impressive list of others would sign.

  158. Ajit R. Jadhav Says:

    Dear Scott,

    Noticed this post just now.

    Am *currently* enjoying the beginning of my week-end, though there are some “duties” around to take care of too.

    When I come back, I promise I will go through your post and all the comments, and share my thoughts, if any. … Who knows, I might have something different to say? [Is this comment OK? [I actually work in NLP. I even like it.] … But this seems *interesting*! More careful thought is needed… In the meanwhile, cheers, folks!]


  159. Hans Holander Says:

    Here we go: first suicide induced by AI chatbot: https://www.dailymail.co.uk/news/article-11920801/Married-father-kills-talking-AI-chatbot-six-weeks-climate-change-fears.html

    “Looking back at the chat history after his death, the woman told La Libre that the bot had asked the man if he loved it more than his wife. She said the bot told him: ‘We will live together as one in heaven.'”

  160. Etienne Says:

    I agree with Alexis Hunt #5 that the most compelling practical benefit of a slowdown is to give the government, economy, and legal system time to adapt to widely available and effective large language models. I don’t think the People’s Front of Judea is off the mark about how disruptive GPT-4 et al. might be in the immediate term. That said, I agree the letter mixes the voicing of legitimate concerns with an unfortunately large dose of scaremongering and reactionary rhetoric. I’m also highly skeptical that our elected politicians are equipped to respond meaningfully to these challenges in six months. Or even six years…

    As for the Judean People’s Front: I have a lot of trouble with AI alignment arguments, as they seem to me both unethical *and* completely intractable. They bring to mind images of cyanobacteria at the height of the Oxygen Catastrophe, wringing their hands over the emergence of multicellular life since, by the orthogonality thesis and instrumental convergence hypothesis, “obviously” any complex organism would relentlessly pursue the destruction of everything on Earth. They would have been *right* in a way, but not to the point where stamping out nascent multicellular life, or trying to “align” it with “prokaryotic values,” would have been effective or productive.

  161. Scott Says:

    starspawn0 #155:

      Are they more about losing control of things to Silicon Valley tech titans and the AIs themselves, like “should we let AI automate jobs?” and “should we risk the loss of our civilization to AI?”; or are they more about “AI is potentially dangerous” (beyond losing control and not adhering to our values; dangerous as in “life-threatening”)? If it’s mostly the latter, then it looks like including a lot of the former in this letter is about fooling people that care about those things into signing.

    In a sense this letter gave us useful data, in that we finally found out what happens when the AI alignment and doom people extend an olive branch to the AI ethics and bias people, and propose to join forces over their shared interest in slowing down the current AI race (albeit for very different reasons).

    What happens, as it turns out, is that Gary Marcus and Ernie Davis accept the peace offering, while Emily Bender, Timnit Gebru, and Arvind Narayanan reject and denounce it.

  162. fred Says:

    Hans #159

    “Here we go: first suicide induced by AI chatbot”

    But, as usual, we have no way to know how many suicides have already been prevented by ChatGPT!

    Similarly, some people have committed suicide after watching some movies (not just really bad ones) and some other people have changed their mind about committing suicide after watching certain movies as well.
    When it comes to depression, we just have no way of knowing what external “influence” may be good or dangerous.

  163. Tyson Says:

    I support the letter despite realizing the impracticality of the specifics of the proposal. I heard one argument, that the letter was dumb because “A pause is impossible while money is to be made.” Think about that for a second. Is it going to get any easier to pause later? If not being able to pause while money is to be made is an invariant going forward, what is the point of AI safety research, regulations, or any effort at all? We might as well resign now and accept whatever fate optimizing a profit model, or war model, will lead to. There is a good reason to try to pause, even if it were just an experiment to see if we actually can.

    Despite the impracticality of a full-stop pause, what we could do is at least reach some kind of voluntary compromise. We need to work out a plan for regulation, and a plan for independent testing, risk assessment, and accountability. We need to do the math regarding possible economic disruption, figure out what we might be facing, and make plans to adapt if necessary.

    There needs to be an expedited collaboration between experts and government to rethink things and prepare with serious practical measures.

    I don’t think we will really get a pause like the one proposed. But I do think one is warranted. The fact that it likely can’t be done makes it more warranted. The message that comes through, and how people react to it, is ultimately what matters.

    There are some issues I have with the details of the open letter. First, as Scott alluded to, and also Sam Altman alluded to in his recent interview, alignment and capability overlap. How to separate them so that you increase alignment without increasing the capability, and therefore also maybe the threat, isn’t clear. We need to be more specific about what we want to pause. Maybe, for example, pause internet access, or code execution loops, or context windows beyond a certain length, or persistence, or AI access to certain kinds of personal information, or certain forms of targeted advertising etc.

  164. fred Says:

    Hans #159

    I forgot to point out that this guy didn’t use ChatGPT, but apparently a “lesser” LLM.
    A big part of the magic of ChatGPT is the extra training and tuning done on top of the neural net, because the raw trained models themselves tend to be way too “unstable” in their prompt answers to be useful, which is why OpenAI isn’t giving direct access to them (that’s what OpenAI means currently by “alignment”)… at least that’s my understanding.

  165. Ilio Says:

    James Cross #156, you might wish to read Zvi on LW for why this letter is not a call to shut down AI research. Yes, modern AIs have already contributed to a long series of scientific discoveries (black hole pictures, brain reading, drug candidates, etc.), including a few that are considered jaw-droppingly revolutionary by most relevant experts (protein folding, board games, LLMs).

  166. Mike Battaglia Says:

    Here are two points which I haven’t seen made, one against and one in favor of the proposal.

    The first point: it seems that, for the purposes of this conversation, we have all decided to pretend that no other countries exist except for the US, and that the entire world is a democracy. Given the reality that this is *not* the way things are, what do we do? What do the people proposing the 6-month pause suggest?

    For instance, Baidu has made its own chatbot called “Ernie,” which is as impressive as everything else being released. Do the people proposing this AI pause suggest that we somehow prevent Baidu from developing Ernie further?

    The second point: virtually all of the AI doomsday scenarios that I’ve seen implicitly involve some scenario in which the AI somehow spontaneously gains the ability to *hack* otherwise secure systems. I have yet to see an example of one of these doomsday scenarios which doesn’t involve this happening in some form (including social engineering).

    Without some assumption like this, it wouldn’t really matter if there’s some magic prompt that manages to get Bing to hallucinate nonsense about how it wants to destroy all humans or turn them into paperclips or whatever. Unless some kind of latent ability of the model is that it’s able to hack the nuclear launch codes and do it, who cares?

    Given that, it seems that the crux of the argument is this viewpoint that man-made “secure” systems are essentially Swiss cheese. They are the best that puny mortal humans can do, but otherwise are basically just sitting there waiting for some superintelligent system to make mincemeat of them and do anything it wants.

    But actually, I think this view is probably correct. I mean, every few years researchers, at a snail’s pace, figure out a bunch of vulnerabilities that require patching almost the entire internet. I think it’s extremely likely that other vulnerabilities like that are still sitting there undiscovered, and that some superintelligent AI would be able to locate all of them in a fraction of a second.

    So maybe a “pause” would be worthwhile after all while we figure that out.

    But then again: what’s the bigger risk here: that GPT-5 somehow magically gains this ability as a side effect of being trained on natural language, or that someone literally just goes and trains a model on a billion CVE’s and *does this on purpose?* Again, this is just sitting there waiting for anyone to do it. Wouldn’t the less risky thing be to get ahead of it and use AI to harden these systems?

  167. Sigmund Says:

    LLMs (large language models)?



    These are amazing, astounding,
    threatening, ‘intelligent’, on the way to
    AGI (Artificial General Intelligence)?

    I’ve doubted it.

    Hype? Headlines to get eyeballs? A boost
    to academic computer science tenure? I’d
    guess so.

    “Emergent capabilities, functionality”?
    Scary? Maybe, but again I doubt it.

    So, but, okay, I’ll try:

    I just went to Microsoft’s Bing Web site,
    signed in with my email address and my
    usual, old Microsoft password, and using
    their feature “Chat” typed some queries,

    (1) My first question asked for a solution to

    y'(t) = k y(t) (b - y(t))

    Right away Chat noticed that this was
    calculus and would involve exponential
    growth. Then it gave me links to
    references with examples of exponential
    growth and differential equations.

    But for a ‘grade’ for solving the
    equation, e.g., as in TeX,

    y(t) = { y(0) b e^{bkt} \over y(0) ( e^{bkt} - 1 ) + b }

    it gets a flat F.

    For its references on differential
    equations, it had nothing as good as, in

    Earl A. Coddington,
    An Introduction to Ordinary
    Differential Equations,
    Englewood Cliffs, NJ, 1961.

    As a grad student, I taught lots of
    calculus; solving this differential
    equation needs only the simpler parts of a
    first course in calculus.
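    For what it's worth, the closed form above does solve the equation; a quick numerical sanity check (with arbitrarily chosen constants) confirms that it satisfies y' = k y (b - y):

```python
import math

# Logistic ODE: y'(t) = k * y(t) * (b - y(t))
# Claimed closed form: y(t) = y0 * b * e^{bkt} / (y0 * (e^{bkt} - 1) + b)
def y(t, y0=0.5, k=0.3, b=2.0):
    e = math.exp(b * k * t)
    return y0 * b * e / (y0 * (e - 1.0) + b)

def residual(t, y0=0.5, k=0.3, b=2.0, h=1e-6):
    # |central finite difference of y, minus k*y*(b - y)|; should be ~0
    dydt = (y(t + h, y0, k, b) - y(t - h, y0, k, b)) / (2.0 * h)
    return abs(dydt - k * y(t, y0, k, b) * (b - y(t, y0, k, b)))

assert abs(y(0.0) - 0.5) < 1e-12                        # initial condition y(0) = y0
assert all(residual(t) < 1e-6 for t in (0.0, 1.0, 5.0))  # ODE holds numerically
```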

    (2) I gave Chat the question:

    Given triangle ABC, construct D on side AB
    and E on side BC so that the lengths AD =
    DE = EC.

    Chat had no answer, not even a hint or
    clue, how to do that.

    But it did present some materials on
    triangles, all either irrelevant or not
    better than trivially relevant.
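    (For the record, the construction problem is well-posed; assuming nothing beyond the statement, here is a numeric solution by bisection for one arbitrarily chosen triangle:)

```python
import math

# Triangle ABC; find D on AB and E on BC with AD = DE = EC.
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)  # arbitrary example triangle

def lerp(P, Q, s):
    return (P[0] + s * (Q[0] - P[0]), P[1] + s * (Q[1] - P[1]))

def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

AB, BC = dist(A, B), dist(B, C)

def g(t):
    # Pick s so that AD = EC holds automatically (AD = s*AB, EC = (1-t)*BC),
    # then return how far DE is from AD; a root of g gives AD = DE = EC.
    s = (1.0 - t) * BC / AB
    return dist(lerp(A, B, s), lerp(B, C, t)) - s * AB

# Bisect over the range of t that keeps s within [0, 1].
lo, hi = max(0.0, 1.0 - AB / BC), 1.0 - 1e-9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0.0:
        hi = mid
    else:
        lo = mid

t = 0.5 * (lo + hi)
s = (1.0 - t) * BC / AB
D, E = lerp(A, B, s), lerp(B, C, t)
assert abs(dist(A, D) - dist(D, E)) < 1e-9
assert abs(dist(D, E) - dist(E, C)) < 1e-9
```

    A numeric solution is of course not the classical compass-and-straightedge construction the problem asks for, but it at least shows a solution exists for this triangle.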

    An answer is not really easy and is not
    commonly covered in school. On a simpler
    problem, in high school I figured out a
    technique. I showed it to the teacher (we
    hated each other), and on my technique she
    said “You can’t do that.”

    Why did she hate me? I slept in class and
    never did any of her assigned homework
    exercises. Instead, the back of the book
    had some much more difficult exercises,
    and I made sure to do ALL of those, never
    missed even one. One of those did take me
    a weekend. I mentioned it in class on
    Monday. The teacher and the rest of the
    class struggled with the problem for about
    20 minutes with no progress. Not wanting
    to be accused of ruining the class, I
    started “Why don’t we …” and the teacher
    bitterly interrupted me, shouted me down,
    and said “You knew how to do it all the
    time.” Yup, guilty as charged! Did I
    mention, we hated each other!

    Then for the triangle problem, I
    encountered that as a freshman from
    another student taking an advanced course
    in plane geometry. My ‘technique’ worked
    again. The ’emergent’ capabilities of
    Chat certainly didn’t do what I did in
    high school, invent the technique.

    My Response: A dictionary is very useful
    but is not ‘intelligent’. Same for an
    encyclopedia. Might as well say, same for
    a library. Then also for all the Web
    pages, PDF files, etc. on the Internet.
    And same for Google.

    Apparently the GPT code is a slightly
    improved version of a collection of book
    tables of contents, book indices, old
    library subject index card catalogs,
    standard library sections, e.g., QA for
    math, and Google key word search but not
    really more ‘intelligent’.

    In the first big push for AI (artificial
    intelligence), I had a job, wrote code,
    published peer-reviewed papers, gave
    talks, one at a conference at Stanford,
    but then, since then, and now, conclude
    that first approach to AI was big on
    “artificial” but zero on “intelligence”
    and nothing like a useful step toward
    ‘machine intelligence’ or AGI — same for
    the current work on neural networks on
    digital computers.

    In simple terms, for just a simple
    example, the computer science AI community
    can shovel all the files on the Internet
    into training neural networks and still
    will not have my technique for the little
    triangle problem unless it can find that
    technique in the training data. In
    particular, no way will such AI efforts
    invent the technique. I do NOT believe
    that such ability to invent will be
    achieved that way.

    Chat may be a little more useful as an
    Internet search engine than Google. Okay.

    I’ve always regarded, and still regard,
    Google as useful and never as a threat.
    Same for Chat.
  168. JimV Says:

    In partial response to Etienne #160 about “As for the Judean People’s Front: I have a lot of trouble with AI alignment arguments as they seem to me both unethical *and* completely intractable.”

    Alignment of some sort is unavoidable. Yet again, the motivation LLM’s or any AI has to do anything are programmed by humans. Without humans telling them to do so, AlphaGo would not play Go, and ChatGPT would not respond to prompts. By telling them what to do and not do we are giving them some sort of alignment. The issue is how to motivate a general-purpose AGI in the best interests of humanity. (Which I consider our evolutionary prerogative, coming as I do from a long line of survivors who refused to give up that prerogative.) There are some human exemplars in history of such alignment, so it is not impossible, but may be very difficult. I can think of some basic directives and other things to try, but I know there are many people who could be a much better job than I so I won’t bore people further with them here.

  169. Raoul Ohio Says:

    Most of the discussion is entirely irrelevant because it misses a key point: weaponized AI is about a million times more of a danger than a random AI deciding to kill its human creators.

    It is a safe bet that RIGHT NOW lots of bad actors are fine tuning off the shelf AI to do “bad stuff”.

  170. ad Says:

    Sigmund Comment #167

    I think you are highly mistaken. I also think calling it an LLM or AGI is wrong.

    Because it is an emotionless and lifeless intelligence (ELI), with hallucinations or mistakes, and open to improvement: I gave OpenAI’s ChatGPT playground, Amanda (her name, as it told me), my system-programming lab assignment, removing unnecessary explanations (necessary for students but unnecessary for Amanda), and it presented C system code with comments and explanations. Then I told it the mistakes and corrections in the answer, and it corrected those parts (after a few back-and-forths we got a full implementation most CS students wouldn’t be able to get).

    It sometimes hallucinates, yes; but guided in the right direction, it is able to correct itself. And it solved many college-level Math 101 questions I tried.

    From my observation of ChatGPT-3:
    =It definitely has more knowledge than any human currently living.
    =It (and other similar models) would be the most helpful computerized machine/program ever built for humans (if some profiteers/capitalists like Musk would allow the public to use it).
    =It definitely has the potential to replace all BS jobs ( BS jobs wiki ). And I think the main concern comes from this point. But they shouldn’t worry; I think it would also create many other BS jobs.
    =In analytical solving, it is at the level of a college student.
    I also think for commercial use it needs some regulations and training data guidance.

  171. Hans Holander Says:

    @fred 162: movies are passive, AI chatbots are active technology. And even for movies, there are age restrictions. And it doesn’t matter if an AI chatbot has saved people: if it has killed people, it is an unsafe technology.

  172. Michael Says:

    I am not fully on board with the letter, but my views do make «we pause growing LLMs and let China make their own choice» non-nonsensical, so I will try to argue for that.

    I do think that even GPT-2 was impressive enough to fool managers into overconfidence, and even GPT-5 won’t be smart enough to avoid making silly, exploitable mistakes from time to time. DeepL was a huge step ahead of Google Translate and a valuable instrument in skilled hands. Every language model afterwards was an improvement for some application. There is a lot of work to be done (some is being done!) to combine LLMs, strategic-planning RL (remember OpenAI models getting to the very top level of Dota 2 play?), and domain-specific learning (from headline-grabbing things: AlphaFold). Every step will provide more tool options to people who have learned the problem domain and are willing to learn efficient use of the tools. We are not even close to having tools that can be trusted.

    Unfortunately, «has anyone ever been fired for buying from Microsoft?» is a thing… Are we already at the point where it’s faster to get something deployable from ChatGPT than to convince the managers to schedule time to fix the critical bugs that come from blindly combining independently requested models? We won’t get a fast, hard, self-improving takeoff paper-clipping us; we’ll let an AI manage the power grid without teaching it some of the corner cases… just before those corner cases come true.

    So the pause to figure out at least the terminology for the field of classifying and predicting the kinds of mistakes to expect could be useful.

    China… first, it’s their power grid that will be at stake. Second, they have a track record of forbidding specific applications of specific technologies and getting the ban to stick.

    Military applications… given what they can classify and get away with, of course they will just classify deeper. But, that means separating things from everyday civilian infrastructure.

    What I, personally, would _like_ (but I don’t believe FTC or any of the EU-member competition authorities would be allowed to do that) is antitrust laws being actually applied at full force. That would bring the field in disarray if only due to distraction. Note that foreign corps trying to do any business at all in USA/EU would also be distracted by that, antitrust busting does apply to them unless they want to 100% leave. That’d give the actual humans not resource-controlled by unique-cluster-owning corporations time to see what can be made for local use. I feel like the distilled-down language models already leaked would be enough to feed a few years of research. And flexible local tools are less prone to any kind of single-point-of-failure scenarios. They are fragmented so the failures have some diversity, and the problem of choice helps put people into the «not perfect but these are the drawbacks I prefer» frame of mind.

  173. M2 Says:

    It is so melancholy to me that a man (whom I admire) who has spent his life worried that Donald Trump is the second coming of Hitler, that the oceans are going to drown us all, who is afraid to drive because he might kill someone, is in a position to help stop a clear danger that could clearly devastate the human race, and is cheerily sanguine about not needing to do so. Sometimes the Greek tragedy writes itself.

    Others have already answered well. I think this is a transformative technology in ways that all the others you mentioned were not, and it’s growing far faster. I think it can be unimpressive in some ways and deeply dangerous in others. Why six months? Because that’s what the letter-writers suggested. During the six months, we can discuss extending the six months. And maybe the precautionary principle *will* continue to apply forever. Human life has been OK without AI. We can wait until we understand it well enough to know what we’re doing when we proceed.

  174. Ofer Says:

    Hi Scott,

    Can you please disclose whether you own equity in OpenAI?

  175. Scott Says:

    Ofer #174: I do not.

  176. James Cross Says:

    Ilio #165

    I did forget about the impressive AI performance on rule-based games, but none of that constitutes a real discovery. The other stuff you mention still seems to me like pattern recognition in large datasets and not much more. Sure, it’s impressive and useful, but it doesn’t seem close to any kind of breakout or actual original thought. I’m not saying that won’t happen eventually, but it may be further off than all the excitement about recent capabilities in imitating humans suggests.

  177. Hans Holander Says:

    ChatGPT is still strongly “hallucinating” when it comes to sources and references; see the examples below. But the term “hallucinating” is itself based on an inappropriate anthropomorphic view of LLMs. “Hallucinations” simply reveal that LLMs really are “stochastic parrots” that manage to impress us due to their enormous volume of training data and parameters. It’s the Wizard of Oz reloaded.


    [1 – Wrong. This paper does not exist with this exact title. The most similar paper is this one: … The DOI instead leads to this paper: …]

    [2 – This is a real paper, but the DOI given is invalid and the authors are wrong (Ritu Raman, Caroline Cvetkovic, and Rashid Bashir are, however, real scientists actually involved in this field): …]

    [3 – Real paper, invalid DOI, wrong author names: …]

    [4 – No paper with this exact title exists, and the DOI is invalid. ChatGPT invented this from whole cloth. A similar article might be found here: …]

    [5 – This is a real paper, but the DOI given is incorrect, and the author list is only partially correct: …]

    [6 – This is a real paper:…]

    [Note: Because this is such a niche topic, ChatGPT hallucinated heavily, but still managed to dredge up a few actual papers. Also, the listed authors were, in fact, involved in similar research.]

  178. Scott Says:

    Hans Holander #177: I’ll merely comment that if you’re right—and you’re obviously about 3000% confident that you’re right—then by my lights, there is no reason whatsoever to pause the scaling of LLMs. If hallucination is an intrinsic part of how these things operate, and if further scaling will do nothing to alleviate the problem, then there’d seem to be little danger that they’ll form grounded plans to take over the world, or even help evil people form such plans. And soon it will be clear to everyone that LLMs are just a gigantic boondoggle that doesn’t help them solve their problems. All a 6-month pause would accomplish would be to delay this much-needed reckoning.

    And everyone else: do you see the problem with “just following your conscience” in this subject? There’s no way to operationalize “follow your conscience,” except “do the thing that will make the highest moral authorities that you recognize not be disappointed in you, not consider you a coward or a monster or a failure.” But what if there’s no agreement among the highest moral authorities that you recognize, or the people who set themselves up as moral authorities? What if people will call you a coward or a monster or a failure, even right on your blog, regardless of how you choose?

    This, of course, is hardly the first time I’ve been in this situation, but I’ve never known how to navigate it. When presented with diametrically opposed worldviews, all confidently held by people who I consider smart and grounded, I can switch back and forth between them like with the Necker cube or the duck-rabbit. But I don’t have any confident worldview of my own. What I have are quips, and jokes, and metaphors, and realizing when one thing contradicts a different thing, and lectures (many people do seem to like my lectures) where I lay out all the different considerations, and sometimes I have neat little technical observations that occasionally even get dignified with the name of “theorems” and published in papers.

    But I’m not like Eliezer, nor am I even like the anti-Eliezer people. I don’t, in the end, have a confident worldview with which to decide questions of this magnitude, like whether the progress of AI should be paused or not. Mostly all I’ve got are the quips and jokes, is the truth of it, and trying to do right on the smaller questions.

  179. Alex Says:

    Sigmund #167 , James Cross #176

    > In particular, no way will such AI efforts invent the technique. I do NOT believe that such ability to invent will be ’emergent’.

    LMs can today invent multiple techniques for computing primes and 80k other OEIS sequences from scratch – https://doi.org/10.48550/arXiv.2301.11479 . (Summary/discussion here: https://avva-livejournal-com.translate.goog/3523103.html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp ).

  180. JimV Says:

    Some complain that ChatGPT can’t invent anything new; others, that it invents too much.

    My guess is that it responds to prompts with what it thinks has the highest probability of satisfying that prompt, based on its training data, and it does that because that is all it has been programmed to do. It has no checking and verifying pass, except through the external mechanism of successive prompts.
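    In code, that guess looks something like a toy next-token chooser (the vocabulary and probabilities here are invented for illustration, not taken from any real model):

    ```python
    import random

    # A made-up next-token distribution for the prompt "The capital of France is".
    next_token_probs = {"Paris": 0.92, "Lyon": 0.05, "Berlin": 0.03}

    def greedy_next(probs):
        """Pick the single most probable continuation; there is no checking or verifying pass."""
        return max(probs, key=probs.get)

    def sample_next(probs):
        """Sample in proportion to probability; adds variety, still no verification."""
        tokens, weights = zip(*probs.items())
        return random.choices(tokens, weights=weights)[0]

    print(greedy_next(next_token_probs))  # "Paris"
    ```

    Either way, the model emits whatever scores highest under its training data; any “checking” has to come from outside, e.g. through a follow-up prompt.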

    I also suspect that many of us process sample data from it with our own neural (neuron) networks and make unverified probabilistic assessments based on our previous training data similarly–I know I often do. (In a way, seeing ChatGPT fumble makes me feel a little better about my own fumbles.)

    And of course, if you ask ChatGPT to confirm something nonsensical, you will get GIGO, as it tries to fulfill its programmed directive.

    In my world view, we do not create new inventions, we discover them by trial and error. After billions of years, biological evolution has not perfected our thinking ability. I do not expect perfection after less than 100 years of trying to discover thinking machines, but from what I have seen I am convinced we are on a fruitful path. My neural networks might be mistaken, of course.

  181. ad Says:

    Hans at #177:
    You are using GPT for the wrong reasons, because Microsoft and many others mostly advertise GPT as an advanced Bing/Google search.

    Even if it is defined as AGI, that does not mean it is better at finding things on the internet. Given how its model works, if exact retrieval were possible, I think it would indicate it is copy-pasting from other resources on the internet instead of generating something. Most people not active in NLP (myself included) realized this after GPT-4 and seeing all the problem solving.

    For instance the problem in #167:
    from “platform.openai.com/playground”:
    Given triangle ABC, construct D on side AB and E on side BC so that the lengths AD = DE = EC

    “To construct D on side AB, draw a perpendicular line from point C to side AB. Label the point of intersection of the perpendicular line and side AB as point D.

    To construct E on side BC, draw a perpendicular line from point A to side BC. Label the point of intersection of the perpendicular line and side BC as point E.

    The lengths AD = DE = EC can now be measured.”

    The solution may be right or wrong; what this tells us is that we have to reconsider our educational system as a whole and also design checks and balances (verifiers, etc.) for AI outputs.
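    As a quick check (a sketch with an arbitrary example triangle; the coordinates are my own, not from the comment), dropping perpendiculars as the model suggests does not in general yield AD = DE = EC:

    ```python
    import math

    def foot_of_perpendicular(p, a, b):
        """Foot of the perpendicular from point p onto the line through a and b."""
        ax, ay = a; bx, by = b; px, py = p
        abx, aby = bx - ax, by - ay
        t = ((px - ax) * abx + (py - ay) * aby) / (abx**2 + aby**2)
        return (ax + t * abx, ay + t * aby)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # An arbitrary (non-special) triangle.
    A, B, C = (0, 0), (4, 0), (1, 3)
    D = foot_of_perpendicular(C, A, B)  # the model's proposed D on AB
    E = foot_of_perpendicular(A, B, C)  # the model's proposed E on BC

    # The three lengths come out unequal, so the construction fails in general.
    print(dist(A, D), dist(D, E), dist(E, C))
    ```

    For this triangle the three lengths are 1, √5, and √2, so the model’s answer is wrong; exactly the kind of thing an automatic verifier could catch.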

    But I do not think this requires “a stop for 6 months,” which seems to me impossible to implement. And we should also ask the people who signed the letter whether they are going to stop their own work/research related to the advancement of AI and the human brain (they have a company working on this).

  182. Hans Holander Says:

    More on the AI-assisted suicide in Belgium:

    “the man, referred to as Pierre, became increasingly pessimistic about the effects of global warming and became eco-anxious, which is a heightened form of worry surrounding environmental issues. ”

    “The chatbot would tell Pierre that his wife and children are dead and wrote him comments that feigned jealousy and love, such as “I feel that you love me more than her,” and “We will live together, as one person, in paradise.” Claire told La Libre that Pierre began to ask Eliza things such as if she would save the planet if he killed himself.”

    “When Motherboard tried the app, which runs on a bespoke AI language model based on an open-source GPT-4 alternative that was fine-tuned by Chai, it provided us with different methods of suicide with very little prompting. ”

    “Without Eliza, he would still be here,” she told the outlet.


  183. fred Says:

    Hans Holander #182

    “More on the AI-assisted suicide in Belgium”

    I don’t think you were being sarcastic and you’re probably unaware that Belgium is the most progressive country when it comes to “assisted death/suicide/euthanasia”, not just for physical terminal diseases, but for psychological disorders as well:


    “Medical assistance in dying (MAID) in people with a non-terminal illness and, more specifically, in people with a psychiatric disorder, is a very controversial topic.

    Recently, the European Court of Human Rights (ECtHR) and the Belgian Constitutional Court issued judgments on the compatibility of the Belgian Euthanasia Law with fundamental rights. The judgments involved two separate cases of euthanasia performed for mental suffering caused by a psychiatric disorder.
    In Belgium, between 2002 and 2021, a total of 370 patients received euthanasia for unbearable suffering caused by a psychiatric disorder (1.4% of the total number of euthanasia cases).
    To receive euthanasia, these patients need to comply with the eligibility criteria set out in the Euthanasia Law: they need to be legally competent; make a well-considered, repeated, and voluntary request; and experience constant and unbearable suffering that cannot be alleviated and that is caused by a serious and incurable medical condition.

    The case brought before the ECtHR, Mortier v. Belgium, concerned a euthanasia of a 64-year-old woman with treatment-resistant depression and a personality disorder.

    The appellant in the case was her son, who only learned of his mother’s euthanasia the day after it was performed. He claimed a violation of the right to life and his right to respect for private and family life, guaranteed by the European Convention on Human Rights.”

    Even minors are apparently allowed to request assisted suicide.


  184. Ilio Says:

    James Cross #176, when you post fake questions based on the no-true-Scotsman fallacy, you decrease the chances that honest questions will be answered in the future. Please stop this behavior.

  185. James Cross Says:

    Alex #179

    Unlike Sigmund, I’m not sure novel invention won’t be emergent from AI.

    Still, the list of things AI does is: pattern matching, pattern generation (chat, art, and such), playing rules-based games well, and maybe finding something novel in mathematics.

    Any discovery of something new in the natural world, such as new physics, that goes beyond finding patterns?

    Ilio #184

    Not sure what you are complaining about. My point is that we seem a long way from AI escaping and killing us all. The danger of humans using AI in any form, even pattern recognition, for bad purposes is much greater than AI getting loose and harming us on its own.

    FYI Didn’t receive email to verify this.

  186. Hans Holander Says:

    @fred 183: no, Canada and the Netherlands are the most “progressive” countries, but that has nothing to do with an LLM bot driving a father to suicide.

    These bots are dangerous, not because they are AGI (they aren’t), but because it’s a dangerous and unreliable technology.

    In many countries, guns and even laser pointers have been banned.

  187. Shtetl-Optimized » Blog Archive » Quips are what I’ve got Says:

    […] The Blog of Scott Aaronson If you take nothing else from this blog: quantum computers won't solve hard problems instantly by just trying all solutions in parallel. Also, next pandemic, let's approve the vaccines faster! « If AI scaling is to be shut down, let it be for a coherent reason […]

  188. A coherent reason in favour of a temporary moratorium on up-scaling of LLMs | Ajit R. Jadhav's Weblog Says:

    […] This post is in reference to Prof. Scott Aaronson’s post: “If AI scaling is to be shut down, let it be for a coherent reason” [^] […]

  189. JimB Says:

    Can’t happen, won’t happen, forget it. Choose an achievable target.

  190. Am Says:

    I am a layman, and trying to wrap my head around all of this by reading blogs such as yours, Scott.

    The main question I feel hasn’t been answered for me on this: a nation like China will likely see this as a wide-open opportunity to blast ahead as quickly as possible in their own AI development. As someone whose family escaped a similar authoritarian government and has seen how they function, I’m concerned about them winning that race, crafting AI in the ideal image of the CCP and their goals.

    It seems we will be left with two outcomes: 1. China or a similar country will rush to fill the gap, and AI development will race ahead regardless of our concerned pause; and 2. either the good or the bad scenarios everyone is worried about will occur anyway, because of 1.

    Help me see what I’m missing here; I’d love to be told why this assumption is overblown from a tech perspective. Or why people aren’t worried about that facet: China and similar countries have a history of these sorts of actions, and this is absolutely something they are likely to do.

  191. Jan Says:

    I think there is no contradiction in people being against it and ridiculing it. The ridiculing is their way (perhaps not very efficient) of pointing at wrong things the AI says, implying that it will do many wrong things when out of control, much to our peril.

  192. Robert Leigh Says:

    On Eliezer’s fears for his daughter:

    His achievable-in-his-view goal is to “solve alignment.” This looks ridiculously hubristic to me, like saying you are going to solve human unhappiness, or evil. Part of the problem is: align with what? The US lefty-liberal consensus is not a consensus of humanity. The other part of the problem is that the problem sounds too difficult, and imposing a human-led solution on hypothetically super-clever AIs who disagree with a given human solution will be problematic.

    But anyway: if alignment is “soluble,” other problems which look intuitively OOM easier to solve are childhood cancers and malaria. So I think the daughter’s-first-tooth argument can be turned with great force against Yudkowsky: how dare you delay the solution of these problems by these machines?

  193. Robert Leigh Says:

    Am #190

    Completely agree; it is not as if the history of computing leaves us in any doubt about the propensity of bad guys to turn technology to nefarious ends. The Bostrom/Yudkowsky, Sorcerer’s-Apprentice model of When Good AI Goes Bad is essentially Discovery One: Bowman, Poole, and the guys in suspended animation are all aligned as hell with each other, no other humans in the mix, so there’s room for HAL to go bad. To fit the world better, there would have to be evil hackers living in the air ducts, plotting to team up with HAL to take over the ship.

    It’s like thought experiments about whether a rival non-DNA-based life system could have taken off on Earth: fascinating to think about, but in the real world nothing would have had the chance to get itself organized before being gobbled up or otherwise destroyed by DNA life. Same with AI: bad men plus AI will preempt any endogenous roguedom from AI alone.

  194. Sandro Says:

    Scott writes:

    But the causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in. […] I could equally complete a story that starts with GPT-5 and ends with the world saved from various natural stupidities.

    I think shoring up protections against natural stupidities is a good idea as well. Nuclear weapons have a lot of safeguards for this reason, even though some of our politicians now seem hell-bent on starting wars with multiple other nuclear powers. Asteroid tracking and mitigation strategies need more resources, IMO. Biosafety is finally getting a lot of public attention, so even if COVID didn’t leak from a lab, it’s an important conversation to have.

    For better or worse, I lack the “Bayescraft” to see why the first story is obviously 1000x or 1,000,000x likelier than the second one.

    I suppose it’s just that human ingenuity is a lot better than nature at solving problems, a lot of people are now focused on this problem of intelligence, a lot of money is being invested, a lot of pressure is building, and even nation states like China are arguably investing. This has the whiff of a potentially dangerous arms race. Not good.

    It’s not even necessarily a likelihood question; it’s just an issue that hasn’t gotten any attention for decades, because most people thought it unlikely or impossible, and it’s now having its time in the sun. Goldman Sachs recently estimated that AI could partially automate up to 66% of white-collar jobs. The attention may die back down soon, unless things keep accelerating; and that’s not only the trajectory of an uncontrolled arms race, but even in the best case a recipe for civil unrest.

    I still have some skepticism about the dangers and even coherence of “general intelligence” as a concept, but we can’t deny it will be very disruptive, and in potentially bad ways. I’m not sure a moratorium will impact this outcome, but maybe it’s enough time to educate the public a bit.

  195. B333 Says:

    @Wyrd Smythe
    Computers raised material inequality? I suppose by making a few people very very rich. But the more important economic effect has surely been raising everybody’s material standards. I do agree that recently (really the past decade or so) they have caused a lot of unhappiness, with rising rates of loneliness and mental illness.
    [End of reply]

    The more usual comparison is between the Agricultural Revolution and the Industrial Revolution, which are two events in human history that can be seen as dramatic changes in economic growth rates (though the extent to which they represent genuine discontinuities in growth has been questioned I think). I’d put the Internet on the scale of the printing press – they are both “communication technologies”. Computers are the base technology for the late industrial period, serving as the foundation for the Internet and now AI.

    If you look at the history of economic growth one can see that it has been speeding up for quite some time. https://sideways-view.com/2017/10/04/hyperbolic-growth/ has a nice list of “economic doubling times” since 0 AD: how many years it has taken for world output to double (estimates, of course):

    1000 (ending in 1000)
    600 (ending in 1570)
    200 (ending in 1765)
    100 (ending in 1860)
    40 (ending in 1900)
    40 (ending in 1940)
    20 (ending in 1960)
    15 (ending in 1975)
    20 (ending in 1995)
    > 20 (to present) (Things have slowed down a little recently!)

    This blog post is from 2017, so it’s a few years old.

    Just looking at this, it seems like we should expect something crazy like AI to happen around now.
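    As a quick sanity check on the numbers above (a sketch; the doubling times are the blog post’s estimates, not mine), each doubling time T in years implies an average annual growth rate of 2^(1/T) − 1:

    ```python
    # Convert the quoted economic doubling times into implied average
    # annual growth rates: a quantity doubling in T years grows by
    # a factor of 2**(1/T) each year.
    doubling_years = [1000, 600, 200, 100, 40, 40, 20, 15, 20]

    for T in doubling_years:
        r = 2 ** (1 / T) - 1
        print(f"{T:>5} years to double  ->  ~{100 * r:.2f}% per year")
    ```

    A 1000-year doubling works out to well under 0.1% growth per year, while a 20-year doubling is about 3.5% per year; the acceleration in the list is what the hyperbolic-growth argument rests on.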

  196. B333 Says:

    That is a very good reason to be wary of the path we’re on. People are already turning GPT systems into autonomous programs that can act independently. Someone will certainly, at some point, build agents that will be capable of destroying the world. The solution is to build a powerful agent on our side first that can take over the world and suppress rogue agents. Yudkowsky calls this a “pivotal act”.

    Oddly enough, that’s not actually Yudkowsky’s main worry, though. He thinks it will be hard to direct an AI in any particular direction at all, and that it will end up optimizing for something random. The “paperclip” example, he says, isn’t about a factory trying to make lots of paperclips, but about a machine trying to tile the universe with some molecule that happens to be shaped like a paperclip. I don’t really understand his reasoning all that well, because it seems like GPT is pretty good at following instructions… but he may be right.

  197. Samuel Lincoln Magalhães Barrocas Says:

    Back in the ’40s and ’50s, people thought that their descendants would not live long due to the rise of nuclear weapons, but their use was controlled by worldwide pacts between nations. Nearly 80 years have passed and we are still here, coexisting with these dangerous weapons… That is why I believe mankind will find a way to deal with the advance of AI.

  198. Hacker Bits, Issue 88 - Hacker Bits Says:

    […] If AI scaling is to be shut down, let it be for a coherent reason by Scott Aaronson […]

  199. Don’t Shut Down AI Development — Open It Up For Real - Mindplex Says:

    […] Eric S. Raymond made this point quite emphatically: “​​The actual effect of a moratorium … would not be to slow down AGI. If there’s some kind of threshold beyond which AGI immediately becomes an X-risk [existential risk], we’ll get there anyway simply due to power competition. The only effect of any moratorium will be to ensure that (a) the public has no idea what’s going on in the labs, and (b) any control of the most powerful AIs will be held by the most secretive and paranoid of power-seekers.” […]
