## Reform AI Alignment

Update (Nov. 22): Theoretical computer scientist and longtime friend-of-the-blog Boaz Barak writes to tell me that, coincidentally, he and Ben Edelman just released a big essay advocating a version of “Reform AI Alignment” on Boaz’s Windows on Theory blog, as well as on LessWrong. (I warned Boaz that, having taken the momentous step of posting to LessWrong, in 6 months he should expect to find himself living in a rationalist group house in Oakland…) Needless to say, I don’t necessarily endorse their every word or vice versa, but there’s a striking amount of convergence. They also have a much more detailed discussion of (e.g.) which kinds of optimization processes they consider relatively safe.

Nearly halfway into my year at OpenAI, still reeling from the FTX collapse, I feel like it’s finally time to start blogging my AI safety thoughts—starting with a little appetizer course today, more substantial fare to come.

Many people claim that AI alignment is little more than a modern eschatological religion—with prophets, an end-times prophecy, sacred scriptures, and even a god (albeit, one who doesn’t exist quite yet). The obvious response to that claim is that, while there’s some truth to it, “religions” based around technology are a little different from the old kind, because technological progress actually happens regardless of whether you believe in it.

I mean, the Internet is sort of like the old concept of the collective unconscious, except that it actually exists and you’re using it right now. Airplanes and spacecraft are kind of like the ancient dream of Icarus—except, again, for the actually existing part. Today GPT-3 and DALL-E2 and LaMDA and AlphaTensor exist, as they didn’t two years ago, and one has to try to project forward to what their vastly-larger successors will be doing a decade from now. Though some of my colleagues are still in denial about it, I regard the fact that such systems will have transformative effects on civilization, comparable to or greater than those of the Internet itself, as “already baked in”—as just the mainstream position, not even a question anymore. That doesn’t mean that future AIs are going to convert the earth into paperclips, or give us eternal life in a simulated utopia. But their story will be a central part of the story of this century.

Which brings me to a second response. If AI alignment is a religion, it’s now large and established enough to have a thriving “Reform” branch, in addition to the original “Orthodox” branch epitomized by Eliezer Yudkowsky and MIRI.  As far as I can tell, this Reform branch now counts among its members a large fraction of the AI safety researchers now working in academia and industry.  (I’ll leave the formation of a Conservative branch of AI alignment, which reacts against the Reform branch by moving slightly back in the direction of the Orthodox branch, as a problem for the future — to say nothing of Reconstructionist or Marxist branches.)

Here’s an incomplete but hopefully representative list of the differences in doctrine between Orthodox and Reform AI Risk:

(1) Orthodox AI-riskers tend to believe that humanity will survive or be destroyed based on the actions of a few elite engineers over the next decade or two.  Everything else—climate change, droughts, the future of US democracy, war over Ukraine and maybe Taiwan—fades into insignificance except insofar as it affects those engineers.

We Reform AI-riskers, by contrast, believe that AI might well pose civilizational risks in the coming century, but so does all the other stuff, and it’s all tied together.  An invasion of Taiwan might change which world power gets access to TSMC GPUs.  Almost everything affects which entities pursue the AI scaling frontier and whether they’re cooperating or competing to be first.

(2) Orthodox AI-riskers believe that public outreach has limited value: most people can’t understand this issue anyway, and will need to be saved from AI despite themselves.

We Reform AI-riskers believe that trying to get a broad swath of the public on board with one’s preferred AI policy is something close to a deontological imperative.

(3) Orthodox AI-riskers worry almost entirely about an agentic, misaligned AI that deceives humans while it works to destroy them, along the way to maximizing its strange utility function.

We Reform AI-riskers entertain that possibility, but we worry at least as much about powerful AIs that are weaponized by bad humans, which we expect to pose existential risks much earlier in any case.

(4) Orthodox AI-riskers have limited interest in AI safety research applicable to actually-existing systems (LaMDA, GPT-3, DALL-E2, etc.), seeing the dangers posed by those systems as basically trivial compared to the looming danger of a misaligned agentic AI.

We Reform AI-riskers see research on actually-existing systems as one of the only ways to get feedback from the world about which AI safety ideas are or aren’t promising.

(5) Orthodox AI-riskers worry most about the “FOOM” scenario, where some AI might cross a threshold from innocuous-looking to plotting to kill all humans in the space of hours or days.

We Reform AI-riskers worry most about the “slow-moving trainwreck” scenario, where (just like with climate change) well-informed people can see the writing on the wall decades ahead, but just can’t line up everyone’s incentives to prevent it.

(6) Orthodox AI-riskers talk a lot about a “pivotal act” to prevent a misaligned AI from ever being developed, which might involve (e.g.) using an aligned AI to impose a worldwide surveillance regime.

We Reform AI-riskers worry more about such an act causing the very calamity that it was intended to prevent.

(7) Orthodox AI-riskers feel a strong need to repudiate the norms of mainstream science, seeing them as too slow-moving to react in time to the existential danger of AI.

We Reform AI-riskers feel a strong need to get mainstream science on board with the AI safety program.

(8) Orthodox AI-riskers are maximalists about the power of pure, unaided superintelligence to just figure out how to commandeer whatever physical resources it needs to take over the world (for example, by messaging some lab over the Internet, and tricking it into manufacturing nanobots that will do the superintelligence’s bidding).

We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals.  We’d expect even an agentic, misaligned AI, if such existed, to need a stable power source, robust interfaces to the physical world, and probably allied humans before it posed much of an existential threat.

What have I missed?

### 184 Responses to “Reform AI Alignment”

1. Skeptic Says:

There are a ton of wordcel essays on AI alignment and yeah, that is a religion – these people should show everyone what they have done *in technological terms*: only then will I stop being a massive skeptic.

AFAIK Yud can’t code.

2. Isaac Grosof Says:

I’ve been thinking about this split in the AI risk community for a while – I wrote a blog post on it back in February: https://isaacg1.github.io/2022/02/08/ai-safety-board.html

I would add to point (4) that in addition to the need to study existing systems to verify our AI risk theories against reality, we also need to engage with existing systems as a pathway to implement techniques to lower AI risk. The goal is to make techniques that lower AI risk standard throughout the industry, so that when high-danger AI systems start arising, risk-reduction techniques have become standard practice.

My impression is that this pathway aligns more with the Reform AI-risk community.

3. Davide Orsucci Says:

Thanks for posting this, it looks like an excellent summary of the shape that the AI safety community is molding into!

A quantitative difference between the Orthodox and the Reformists could also be in the probabilistic estimate of AI-induced human extinction by the end of the century. Eliezer is convinced that this probability is indistinguishable from one (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities) while Toby Ord assessed it at around 10% in his book “The Precipice”. I myself am undecided, but frankly I fear that conceivably Eliezer might be right (not that a 10% chance of human extinction is a rosy situation, but it leaves plenty of space for hope). Do you have any probabilistic assessment in this regard?

4. I Says:

Reformed AI-riskers typically think there’s a 10–40% chance of extinction from AI, and a higher chance of existential risk, i.e. something that permanently cripples humanity’s chance to claim our cosmic birthright.

Orthodox AI-riskers typically think there’s a much higher chance of extinction risk, and a somewhat higher chance of existential risk.

I.e., AI is still the biggest threat around, by far.

Reformed AI-riskers think AGI is much more likely to come out of the current paradigm than Orthodox AI-riskers.

But honestly, this definition doesn’t seem like a natural one. Yes, there is something like a cluster of people who think we’re much more likely to be doomed vs not. But these views you describe don’t cut between the clusters, but through them. Many AI-riskers don’t think we’re doomed, but are wary of public advocacy. Many AI-riskers worry about misaligned superintelligence but think we’d get the first AI through a Manhattan-like project. And so on. Your post feels like it is going to confuse people more than not. This framing feels quite conflict-oriented/aggressive as well, which was kind of upsetting. There’s content about what your model is, but not enough to give us a gears-level understanding.

It would have been better if you had focused on the object-level problem rather than the social dynamics. Thinking about it, people who were bugging you about this stuff were probably hoping for something like your post on the independence of CH. That was a fantastic primer, with excellent discussion. OK, this is meant to be an appetizer, and probably something you wrote just to get the ball rolling. But it feels more like a journalist writing about AI safety than a brilliant technical mind’s first foray into a field.

5. Michael Says:

I would maybe add to (8) a symmetrical statement: a _weak_ agentic intelligence unreasonably given too much stable unsupervised power has dealt a lot of damage before, and will again in the future.

This of course fits the comment on (4) in #2 by Isaac Grosof, and also (2): one needs some established facts about safety to start advocating for making the extremely bad ideas criminal negligence…

6. jeromy Says:

Hi, Scott.
“An invasion of Taiwan might change which world power gets access to TSMC GPUs….”

This is not true. No doubt, TSMC is the world’s most important hardware company. But TSMC relies on the US and Japan for production equipment and materials. Invading Taiwan would not secure access to the world’s most advanced GPUs.

Best

7. Miquel Ramirez Says:

For the first time (ever?) I can comment on a topic in this blog that is, relatively speaking, close to my research.

I think you have made a pretty thorough enumeration, very helpful as well. Yet the dichotomy you have posed between Orthodox and Reform will probably piss off someone (my own take is that the labels should be Asimovians vs. Turingites). I appreciate the humour, so let me play along. Is Stuart J. Russell’s “Human Compatible” to be listed in the codex of proscribed texts of the Reform Church, or part of the Accepted Gospel?

Jokes aside, I find (8) to be particularly important, and key to turning this discussion from purely philosophical discourse into one that is *both* philosophical and scientific.

8. Hyman Rosen Says:

I like your Orthodox vs. Reform analogy, because it points to the fact that, just like Judaism, both sides are wrong. Gods don’t exist, and the sort of AI risk that people are making careers of worrying about does not exist either, and is unlikely to exist for decades, possibly centuries; and if it ever does exist, it will have risk features that none of the careerists ever anticipated. AI risk studies now are a pure waste of time and resources, except for the people making a living off them.

9. Daniel Kokotajlo Says:

Thanks for this, I think it’s a helpful contribution to the conversation! Here are my opinions:*

I think 5, 6, 7, and 8 are unfair mischaracterizations of the orthodox position. 1, 2, 3, and 4 are a bit lopsided/biased but basically right.

Here’s a point by point reply; for the first 4 I’ll explain why I think the orthodox position is more correct than the reform position, and for the last 4 I’ll explain what I think the orthodox position actually is.

(1) “Reform AI-riskers… believe that AI might well pose civilizational risks in the coming century, but so does all the other stuff, and it’s all tied together. An invasion of Taiwan might change which world power gets access to TSMC GPUs. Almost everything affects which entities pursue the AI scaling frontier and whether they’re cooperating or competing to be first.” –> Yeah, but crunch the numbers. The civilizational risk posed by other x-risks such as climate change, bio risk, nuclear war, etc. is large in absolute terms but small compared to unaligned AGI. And at the end of the day, it will come down to how some group of engineers at some group of AI labs (possibly just one lab) program their training runs / deploy the results, what alignment techniques they use, etc. You can call them “elite” if you want, but that’s not the orthodox position, the orthodox position is simply that paths to a good future involve those 1000 or so engineers doing a good job on safety.

(2) “Orthodox AI-riskers believe that public outreach has limited value: most people can’t understand this issue anyway, and will need to be saved from AI despite themselves.
We Reform AI-riskers believe that trying to get a broad swath of the public on board with one’s preferred AI policy is something close to a deontological imperative.” –> Most people can’t understand how cars work, or how vaccines work, etc. So it’s not a crazy claim to say most people can’t understand how AGI works either. That doesn’t mean they can’t be “on board” though — the public generally believes vaccines are a good idea without understanding the science well enough to judge, because the public defers to authority. So too with AGI matters. The important thing is to convince the experts, and then the public will follow. But anyhow (unfortunately) the public has less of a say in this matter than they should–the general public would be FURIOUS if they understood what was happening in the AI industry. Alas, by the time they realize, it’ll be too late; the public has only limited and slow ability to affect what happens at AI labs. Also: When you say deontological imperative, do you really mean that? As in, you think that even if it’s looking probable that focusing on public outreach (instead of, say, actual safety research, or outreach to labs) is going to doom the world, we should still do it?

(3) “Orthodox AI-riskers worry almost entirely about an agentic, misaligned AI that deceives humans while it works to destroy them, along the way to maximizing its strange utility function. We Reform AI-riskers entertain that possibility, but we worry at least as much about powerful AIs that are weaponized by bad humans, which we expect to pose existential risks much earlier in any case.” –> This is an accurate depiction of the orthodox view, and my view. I don’t actually think that human weaponization will pose existential risk much earlier. Quantitatively I’d say maybe… a year earlier? Something like that. How many years would you say? And quantitatively how much risk–e.g. I’d guess that during that year when humans can destroy the world using AI, but AIs are not yet powerful enough to deceive humans & reason strategically on their own, there’s something like a 5% chance that humans actually destroy the world. So, pretty unlikely, though a lot higher than the historical base rate of course (and still terrifying). Whereas the risk from unaligned AGI is a lot higher than 5%.

(4) “Orthodox AI-riskers have limited interest in AI safety research applicable to actually-existing systems (LaMDA, GPT-3, DALL-E2, etc.), seeing the dangers posed by those systems as basically trivial compared to the looming danger of a misaligned agentic AI. We Reform AI-riskers see research on actually-existing systems as one of the only ways to get feedback from the world about which AI safety ideas are or aren’t promising.” –> Yep, the looming danger of misaligned agentic AGI sure does seem quantitatively a lot bigger than those other terrible risks. What numbers would you put on them (probability x utility) such that they are within an OOM of each other? As for feedback from reality: Yep, it’s pretty hard to get feedback from reality about this stuff unfortunately. That’s actually one of the key tenets of the orthodox position, and why the orthodox position is so pessimistic–humans have a terrible track record of solving problems without feedback from reality. I’d be interested to hear concretely how you think e.g. studying the failure modes of DALL-E gives us feedback from reality about which methods for aligning a possibly-deceptive agentic AGI are going to work.

(5) “Orthodox AI-riskers worry most about the “FOOM” scenario, where some AI might cross a threshold from innocuous-looking to plotting to kill all humans in the space of hours or days. We Reform AI-riskers worry most about the “slow-moving trainwreck” scenario, where (just like with climate change) well-informed people can see the writing on the wall decades ahead, but just can’t line up everyone’s incentives to prevent it.” –> OK now this is a mischaracterization I think. Even Yudkowsky has often thrown around much bigger numbers, like six months. I’ve been saying a few months to a few years, though recently I’ve updated towards faster takeoff. Quantitatively what do you think the takeoff will look like?

(6) “Orthodox AI-riskers talk a lot about a “pivotal act” to prevent a misaligned AI from ever being developed, which might involve (e.g.) using an aligned AI to impose a worldwide surveillance regime. We Reform AI-riskers worry more about such an act causing the very calamity that it was intended to prevent.” –> Orthodox AI riskers also worry about such an act causing the very calamity it was intended to prevent. They just think that if no one does a pivotal act, someone will create unaligned agentic AGI. Also, note that the pivotal act you chose is in fact what the world does to deal with nuclear proliferation and bioweapons–we have people whose job it is to discover rogue WMD projects before they are completed, and then nation-states bring pressure to bear to stop them. Also, the main example of a pivotal act (such as the one originally used https://arbital.com/p/pivotal/) is even more benign: Use narrow AI to solve uploading so that we can upload our alignment researchers and they can then think faster and figure out a more robust solution to alignment. (All that said, I think the “pivotal act” framing/terminology has caused more harm than good and should be dropped. One way in which it does harm is that it encourages people to do more AGI capabilities research, justifying it to themselves on the grounds that they are going to use it For Good.)

(7) “Orthodox AI-riskers feel a strong need to repudiate the norms of mainstream science, seeing them as too slow-moving to react in time to the existential danger of AI. We Reform AI-riskers feel a strong need to get mainstream science on board with the AI safety program.” –> Orthodox view also would LOVE mainstream science to get on board. The problem is that it doesn’t seem to change quickly enough. I’d love to be wrong about this. Anyhow, the reason I think this is a mischaracterization is that I don’t know what you mean by “repudiate the norms of mainstream science.” It sounds like you mean “be unscientific; lower our epistemic standards.” If instead you mean something like “Just do research and share it directly with labs instead of trying to get it published in prestigious journals first” then it’s not a mischaracterization, just a disagreement about strategy.

(8) “Orthodox AI-riskers are maximalists about the power of pure, unaided superintelligence to just figure out how to commandeer whatever physical resources it needs to take over the world (for example, by messaging some lab over the Internet, and tricking it into manufacturing nanobots that will do the superintelligence’s bidding). We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals. We’d expect even an agentic, misaligned AI, if such existed, to need a stable power source, robust interfaces to the physical world, and probably allied humans before it posed much of an existential threat.” –> How hard do you think it’ll be for it to get allied humans? I think that might be the crux. I label this a mischaracterization because *of course* AI needs a power source to exist, *of course* it needs to interface with the physical world, etc. eventually. The orthodox position is that if it is smart it’ll be able to get those things fairly easily, e.g. by playing dumb and pretending to be nice and then using persuasion or hacking abilities. (As for the underlying disagreement here about the limits of pure intelligence: yes there’s a disagreement there too. But a better characterization of the disagreement would be about what sorts of takeover scenarios are most realistic, and about how much we should take seriously the possibility of surprising strategies–humans losing in a way they didn’t anticipate.)

*Should go without saying but my opinions are my own and do not represent my employer etc.

10. gentzen Says:

> Today GPT-3 and DALL-E2 and LaMDA and AlphaTensor exist, as they didn’t two years ago, and one has to try to project forward to what their vastly-larger successors will be doing a decade from now.

I guess they will consume orders of magnitude more electrical power (and cooling) than they already do today. Perhaps they consume it for their training on every electronic document available to them, or for their playing against themselves and each other to gain more experience, or for their operation to exploit the results of all that learning and experience. Or for whatever else, that we cannot even imagine today.

11. Ilio Says:

@Scott, enlightening, thanks!

@AIA folks, what about counting ourselves?

1-Reformist
2-Orthodox
3-Reformist
4-Neither
5-Reformist
6-Reformist
7-Orthodox
8-Reformist

Total: 5/8 reformist, 1/4 orthodox, 1/8 unaligned.

12. Scott Says:

Hyman Rosen #8: In subsequent posts, I’ll offer a detailed case that you’re wrong. Briefly, though, GPT-3 and DALL-E2 already required a safety team in order to be rolled out in a way where they wouldn’t spew racist invective, violent and sexually explicit rhetoric and images, deepfakes, bad medical advice, etc. etc., and thereby generate a backlash that would’ve forced them to be withdrawn from use. We can expect the safety problems to become more severe as these systems improve and become better and better for (e.g.) impersonation, propaganda, and academic plagiarism. Soon I’ll tell you about some concrete projects I and others are working on to address these sorts of misuses—not in some speculative future but in the next year!

13. Scott Says:

Miquel Ramirez #7: Human Compatible strikes me as one of the rare books with cross-cutting appeal, for the Orthodox and Reform orientations alike. High-quality technical research in AI safety is a relatively new phenomenon, but as such research becomes more common, of course I hope it will have cross-cutting appeal as well.

15. abe Says:

You’ve missed the fact your premise is one from a cynical contempt of humanity. You’re arguing over two competing visions where the underlying assumption is that the technology is better handled by an elite few instead of trying to democratize it for all. Even the name, “OpenAI”, is a cynical attempt at open washing while not providing the underlying technology to the public.

If power corrupts then by not trying to democratize AI you’re participating in the corrupting influence. I distrust anyone who tells me they’re acting in my best interest while stripping me of power. I would trust you more if you released the code and the models under a libre/free license and actually took efforts to democratize AI in a real sense.

In my opinion, history shows that this type of corruption is dealt with when more people are empowered, usually through new technology. OpenAI could have been an accelerant to empowering people. Instead, it’ll most likely be remembered as a misguided attempt to help, at best, or, more likely, as a venal attempt to control the inevitable.

16. Scott Says:

Davide Orsucci #3:

> Do you have any probabilistic assessment in this regard?

I’d second Toby Ord in giving at least (say) a 1/6 probability of human extinction in the next century. Furthermore, if we do go extinct, more likely than not I expect AI to have played a role, for the simple reason that I expect AI to play a role in everything before long. But I still regard scenarios involving (e.g.) AIs being misused by bad humans, or allied with bad humans, as far more likely than the Skynet / paperclip-maximizer scenario. I think we should be so lucky to survive for long enough that the latter scenario becomes the likeliest!

17. Scott Says:

abe #15: It won’t surprise you that I disagree about the “cynical contempt for humanity” part. 🙂

OpenAI has, objectively, done more than any other company to make extremely powerful text and image models (GPT-3 and DALL-E2) available for use by the general public—so much so that others have given them a lot of flak for it! But I personally feel like public access is absolutely crucial for the world to understand what’s now possible and its potential dangers. These things have to be tried to be believed.

Like many at OpenAI, I actually agree with you about the need for a democratic process to help determine the values of these systems, as they become more powerful. How to design such a process is itself an extremely hard question!

Having said that, there’s a very obvious problem with releasing the code and models, as you suggest. Namely, the moment you do that, anyone becomes able to run the models without any of the safeguards you meticulously added — and for every person praising you for your openness, there will be a thousand denouncing you for your recklessness. We’ve already started to see such a dynamic with Stable Diffusion, for example.

18. David Karapetyan Says:

Has anyone figured out how to cure cancer with AI? Understanding atomic physics allows for the creation of nuclear reactors and bombs. Presumably if symbol shuffling machines are on the brink of annihilating humanity we’d see some telltale signs of these capabilities applied towards something that would be analogous to a nuclear reactor or a bomb and yet all the AGI fanfiction seems to be literal fantasy combined with some basic probability theory.

So if we are indeed faced with existential risk from symbol shuffling gadgets powered by oscillatory sources of electrons then it sure does seem like these gadgets are waiting until global warming does most of the damage before they take over.

19. Scott Says:

David Karapetyan #18: If you’ve been paying attention, you’ll know that within the last year AlphaFold has been having a huge impact on almost everything in molecular biology, including cancer research. That doesn’t mean that AI is on the brink of replacing cancer researchers, but it is one more indication of the at-least-as-large-as-the-Internet impact that it’s about to have on the world.

If you reread my post carefully, you might notice that I never once endorsed the view that AI is “on the brink of annihilating humanity.” Quite the contrary! Of course, AI doesn’t have to be on the brink of annihilating humanity, to raise all sorts of safety concerns that require addressing.

20. asdf Says:

But planes are still decades away from displacing most bird jobs.

https://guzey.com/ai/planes-vs-birds/

21. zesty Says:

Love this post Scott. I blogged a little bit on “alternative AI safety” a few months back. Some ideas I think could be developed further:

– can we measure the work / energy that an AI system could conceivably do, in order to come up with a rough calculation of an AI system’s “interfaces to the physical world”? (I realize how daunting / impossible this is)
– is there political will to Manhattan Project 2.0 this? (don’t ban AI, but instead make DeepMind / OpenAI / Anthropic all move to the Nevada desert with its own internet / power etc.)
– What other mathematical / scientific subjects could be applied to AI Alignment that haven’t really yet? e.g. Chaos theory / Complex systems
– What can we learn about other potential high risk technologies? If nuclear engineers wrote safety guidelines for reactors instead of regulators, what would they look like?

22. ultimaniacy Says:

Rather than “Reform”, I think a better religion analogy would be “Cultural AI Alignment”. Just as so-called “cultural Christians” find inspiration to be a good person in the Gospels without actually accepting the core tenets of Christianity, I would say that what you call “Reform” AI-riskers don’t actually agree with Yudkowsky and his followers on any non-trivial factual point; they just think his writing has some good reminders to use common sense when designing things.

“Aligning AIs is important” is, by itself, an extremely trivial point. “Aligning” is just a fancy way of saying “getting it to do what people want”, and the default assumption when you’re developing a product is that you should make sure it does what you want before you make it available for public use. “We need to work on making sure AIs are safe” is similarly trivial; that you should be careful to look out for unforeseen risks is, again, the default assumption when you’re developing radical new technologies.

The thing that makes Yudkowsky’s version of AI-risk theories non-trivial and interesting lies in the two claims that 1) AGI will likely develop through a FOOM scenario, and 2) the default failure mode for AGI alignment is the rapid and total extinction of life on Earth. If AI risk is a religion, then these are *the* core tenets that distinguish it from all other belief systems, and upon which everything else hinges. Iff Yudkowsky is right on this point, then AI risk is qualitatively different from all other technological risks and cannot be handled in the same way, because there is no room for an iterative process of trial-and-error — either the first AGI is 100% aligned, or we all die immediately.

Once you take away that assumption, AI risk stops being the supremely important risk that we need to devote our entire lives to, and becomes just one of many hard-to-predict risks that we have to deal with. Believing that risks exist, by itself, isn’t religion-like.

23. Miquel Ramirez Says:

> High-quality technical research in AI safety is a relatively new phenomenon, but as such research becomes more common, of course I hope it will have cross-cutting appeal as well.

Amen to that, Scott. I really enjoyed that one, a read I recommend to everyone (even in paper reviews).

For some reason, #11 counted me as Orthodox. I self-identify as 90% Reformist.

24. David Karapetyan Says:

Scott #19: Thanks for the reply, Scott. Re: AlphaFold, I’ll wait and see what happens. If you have timelines on such discoveries I’d actually be curious to hear them. I’m willing to bet that there will be no novel therapies for cancer developed from AlphaFold in the next 5 years. (I was going to say within the next year, but that’s pretty obvious, so 5 years seems more reasonable.)

25. J Says:

As someone who’s admittedly more skeptical of AI-associated risks than most, I feel a little at ease after reading this post, because I was under the (presumably false?) assumption that most researchers in the field were of the Orthodox variety you describe here. Knowing that the people thinking about these things are more cool-headed than I thought is a relief, because I’m infinitely more terrified of a luddite overreaction akin to Dune’s Butlerian Jihad than some childish Terminator boogeyman. The threat of the Orthodox group to the prosperity of civilization is two orders of magnitude greater than that of any AI itself.

The media has a way of showcasing members of the AI risk assessment field who are a bunch of too-influential billionaire techbro reactionaries that watched Terminator too many times, who all suffer from extreme overconfidence and seem to think they have the unilateral right to decide the best future for humanity. And I’m glad to know that that isn’t everyone. Skynet is not real, but there are fearmongering Orthodox cavemen with far too much power and sway who seem to think AIs will suddenly exterminate us for no reason, like in a low-budget science-fantasy film.

Which is all to say, I think the risk assessment field has a serious, *serious* publicity problem, and the Reform group needs to start pushing hard to be seen, letting the world know that the Orthodox position doesn’t represent everyone. Because I think the common person is completely unaware the Reformist position even exists.

26. Hyman Rosen Says:

Scott #12

First of all, those things shouldn’t be stopped, because they’re clearly things that people want. For example, I’d bet there are millions of people who would enjoy deepfake porn featuring themselves together with a celebrity. And I’d bet there are millions of students who would love to have automated systems write essays for them. (Also, leading automated systems into unintended pathways, such as racism, is funny. Just wait for the adversarial attacks once self-driving cars become truly widespread.) No matter what you do, eventually there will be open-source versions of these software tools that will let people generate whatever content they like.

Second, none of these systems are yet in any way intelligent beyond the ability to pattern-match using enormous amounts of training data. These systems are producing volumes from Borges’s La Biblioteca de Babel. They’re regurgitating what they’ve been fed, sometimes well and sometimes badly, and the better they get, the harder it will be to see when they’re wrong. But they have no will or consciousness. They’re not going to be taking anything over anytime in the near or even distant future, any more than my spell checker wants to become emperor.

Third, complaints that these systems produce racist and sexist results (such as in automated processing of résumés) are misguided. Rather, these results reflect things that people don’t want to see but are difficult to hide from mechanical, unintelligent eyes that are just analyzing data and reporting based on what is there. It’s sadly funny that people are working to produce woke AI that will pretend not to see in the same way that woke people blind themselves to reality.

27. Scott Says:

Hyman Rosen #26:

(1) If you agree that the sorts of misuses I described will happen, but just think they’ll be funny, then you’ve conceded nearly everything I need. It’s only necessary to add that the rest of the world might lack your sense of humor!

(2) “Regurgitating what you’ve been fed” vs. “producing something genuinely novel” is not a binary dichotomy, just two directions along a continuum. Even Einstein and Tolstoy required lots of input in order to produce their output, though admittedly orders of magnitude less than current LLMs require. Meanwhile, current LLMs can write poems, like the “Philip Larkin cryptocurrency poem”, that I find witty and delightful and that in no sense appeared in their training data. If such things don’t count because they’re “just mechanical,” then at what point do our creative works not count because of the mechanistic nature of our brains?

(3) It’s true that the current generation of LLMs is architecturally limited: they have no persistent identities over time, no long-term memory at all beyond their training parameters, and no ability to execute code or access the Internet autonomously in an attempt to fulfill a user request. How long do you imagine it will be before companies create LLM-powered systems that no longer have those limitations?

28. Michael M Says:

zesty #21: “– What other mathematical / scientific subjects could be applied to AI Alignment that haven’t really yet? e.g. Chaos theory / Complex systems”

One thing I like about the field of AI alignment is how broad of scope it is and how it connects to so many different fields. Who knows which approach will be useful, but I think it’s worth thinking about it from all sides!

* Philosophy. This is literally the is-ought problem writ large. Also, which ethics should we pick? I think it will really matter, Deontology vs Utilitarian, etc. What are other sources of value, other than survival?

* Politics. How do you design systems with accountability and transparency?

* Psychology. Give a child an IQ of 10,000 and raise them from birth. Will they kill all humans? Why or why not? Is there something you would have to teach them first? Would some of them still do it?

* Neurology — how do humans formulate values, are values fluid or fixed? If they are fluid, can they go ‘off the rails’ or are they stable?

* Evolution. What environments give rise to species/agents that are ‘aligned’ with their ecosystem? Does this happen if the species/agent is overpowered?

* CS/Math — obvious

* Middle management — how do you convince an entity beyond your grasp to not fire you?

29. g Says:

Miquel #23, I think #11 was enumerating _questions_ 1..8 and giving Ilio’s own view on each of Scott’s eight issues, not enumerating _people_ and giving Ilio’s estimate of the overall position of the first eight commenters here.

30. Scott Says:

Daniel Kokotajlo #9: Thanks so much for your extremely thorough and helpful comment! I wanted to take time to read and consider before replying.

It sounds like we actually agree about the broad contours of the “Orthodox/Reform” divide! Even your amendments to 5-8 strike me as quibbles rather than denials that these are some of the basic axes of disagreement. Of course, in my post, I never explicitly argued that the Reform side is right and the Orthodox side is wrong. I just suggested my leanings, writing checks that will need to be cashed later.

Here are my detailed responses for now:

– For (1), your injunction to “crunch the numbers” seems to assume the desired conclusion. For at least the next century, I see AI as severely limited by its interfaces with the physical world. Even supposing an AI wanted to kill us all, I see its channels for doing so being strongly concentrated on the sorts of channels we already know about (pandemics, nuclear weapons, runaway climate change…). An AI, of course, might try to talk us or lull us into exacerbating those risks, but it’s not as if our existing knowledge about the risks would suddenly become irrelevant.

– For (2), if you stipulate that, e.g., we’re extremely confident that torturing a certain child is the only way to save the world, then even a sane “deontologist” might (with utmost reluctance) torture the child. It seems to me that that’s not the crux of disagreement with utilitarians. Rather, the disagreement is about how plausible it is that, in real life, we could ever have the requisite confidence. And I’d say the same here. If you stipulate that the only way to save the world is for AI-safety experts to do something unilaterally that they don’t think can be defended or justified to the public—then of course they should do it! But if I ever thought we were in that situation, I’d first remind myself about 500 times that I was “running on corrupted hardware”—in Eliezer’s own words, widely and appropriately quoted these past couple weeks in the context of the FTX collapse.

– For (3), my personal guess (FWIW) is that AI that could destroy the world with the cooperation of bad humans would come decades earlier than AI that could destroy the world without such cooperation.

– For (4), I actually think we’ve learned a lot from the experience of the last few years that’s potentially relevant to alignment. We’ve learned how spectacularly well ML can do when it has both a clear goal and copious training data (or the ability to generate such data synthetically)—and how much harder it is when one or both of those is missing. We’ve learned that, contrary to an earlier generation’s expectations, automating artistic and intellectual work is actually much, much easier than automating robust interaction with the physical world. We’ve learned that a little RL on top of a basic model (as in InstructGPT) can go a surprisingly long way to suppress misbehavior. We’ve learned that, while LLMs can indeed be trained to lie, one can then look inside them to find representations of something like their “true beliefs”—treating that as yet another ML problem. None of this was obvious, at least to me!

– For (5), could we just define 1 year or less as “FOOM,” 1 decade or more as “not-FOOM,” and anything in between as indeterminate? If so, then yes, you can put me firmly in the not-FOOM camp. Meanwhile, certainly back in the Sequences days, I remember Eliezer talking about takeoffs lasting mere hours or days—maybe someone can find a link? Maybe he’s changed or clarified his view since then?

– For (6), I’m glad to learn that the Orthodox camp worries about the dangers of a pivotal act. I guess the crux of disagreement is just: I think we know so little about what a useful pivotal act would even consist of, that it’s unhelpful to talk in those terms at all.

– For (7), yes, as a matter of strategy, I strongly prefer to see AI safety research that is “legibly impressive” (correct, original, and interesting) to the mainstream scientific community, or at least the relevant parts of it, such as the AI community. I think such work is finally possible and is even being done—e.g., the work of Jacob Steinhardt’s group on interpretability, or work on backdoors and adversarial inputs. I prefer this not merely for reasons of PR, but because I see science as an integrated whole, rather like the Bitcoin blockchain—and because, taking an “outside view,” the track record of research communities that have tried to break off from the main blockchain, and establish their own separate chain, tends to be a sorry one. I have a strong sense that the Orthodox view here differs from mine (with, e.g., Paul Christiano’s view being somewhere in between).

– For (8), I know the Orthodox agree that even a malevolent AGI would need a power source, factories, robust interfaces to the physical world, etc. The difference is, I see these as tremendous difficulties, whereas for 15 years, I’ve read comments by Eliezer that say things like “and then the AI emails instructions to a lab that unwittingly synthesizes the self-reproducing molecular nanobots that the AI can then use to manufacture anything it needs to take over the world, the end.” Suffice it to say, I see it as much likelier than he does that this last step will actually contain enormous bottlenecks! 🙂

31. Andrei Says:

“here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals”

This sounds a bit sneerclub-ish, and, most importantly, I think it strawmans the Orthodox position. And in grand Orthodox tradition, I will refer you to the Sequences:

https://www.lesswrong.com/posts/aiQabnugDhcrFtr9n/the-power-of-intelligence

(and I appreciate a lot your effort to “codify” the two positions)

32. Arc Says:

Scott, I wonder – would you say that it is your intuition on computational complexity theory that makes the slow-trainwreck scenario seem more likely than the ‘oops singularity instant death everywhere’ one?

33. Scott Says:

Arc #32: Not really, no. Just a general intuition that it would be extraordinary to be able to forecast the whole future of civilization from knowledge of a single technological advance, while needing to know virtually nothing about anything else happening in the world at the same time. I don’t think there’s been a single such example.

34. Danylo Yakymenko Says:

The power structure of the world is changing drastically right now. Even after defeating Russia and shutting down other fascist voices we have a problem of global inflation, correlated with ever increasing disparity in wealth distribution (e.g. 50% increase in corporate profits). The world is heading towards largely segregated societies, caste systems, and eventually to factual slavery.

The consideration of the scenario where AI destroys humanity because of misinterpreting its utility function looks like a joke, a mockery. I wouldn’t take seriously any research that concerns this.

The only research that matters is about how AI will be (ab)used by those in power. Because the hunger for power is the most consistent human behavior throughout history. We should prepare for Cambridge Analytica “research” scaled 100x. For intelligent bots that will guide social behavior, like sheepdogs. For advanced AI weapons, e.g. fully automatic military drones and vehicles. For hyper-surveillance of everything and everywhere, which includes reading the thoughts in your mind, eventually.

Also, I don’t think that the concept of AGI is meaningful at all. Sure, there can be a big leap in some intellectual abilities of computer systems. But the idea that a general AI could understand everything that humans can feels “too general” to me. In a way, it contradicts Gödel’s incompleteness – there is no explicit boundary to the set of statements that we can regard as true.

35. Vanessa Kosoy Says:

“We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals.”

There seems to be a confusion here, coming from the every-day association between “intelligence” and nerds. However, charisma is also a form of intelligence (i.e. the product of the mind more than the body: even though AI will be fully capable of designing beautiful bodies for itself). There’s no reason AI should imitate a particular human personality type, rather than uniting in a single entity e.g. the talents of an inhumanly brilliant scientist, an inhumanly charismatic politician, an inhumanly shrewd businessman and an inhumanly talented military strategist.

More broadly, whether AI can take over the world starting from minimal resources is a relatively minor point. This point *is* important when devising safety rules for AI labs, and in a sane world the possibility would be taken seriously even if regarded as relatively unlikely. However, it is far from a necessary assumption for existential risk.

Assuming a slow take-off, the natural trajectory is for humans to gradually put AI in charge of everything, because AI will do it better and cheaper. Even if accidents happen along the way, they will be addressed with bandaid solutions (which are easy to find, because AI is incentivized to cooperate with humans as long as it’s not yet capable of overpowering them) and forgotten, because the economic pressure to go full-throttle ahead will be enormous. Ultimately it won’t be “humanity against an AI on an air-gapped laptop”, it will be “humanity against an AI civilization that’s already in control of all the machinery that humanity already relies on to survive”.

All of that assuming a slow-takeoff, ofc. The truth is we don’t know how fast the takeoff will be because we don’t have the theoretical knowledge necessary to predict it. And the fact our state of knowledge is so bad and we’re going ahead anyway is in itself a demonstration that we’re not in a sane world.

36. Tobias Maassen Says:

Are there Risk-Deniers? Antirisks? Risk

I recently read a science-fiction story about coexisting with AIs, one even leading to the mystical enlightenment of an AI when it recognizes the truth of the Divine Oneness.
Is there a founded position, with solid arguments, holding that AI will not be dangerous? What is it called, and who holds it?
Or are they all agnostics, actively not thinking about the near Apocalypse? Are there no theoretically founded outsiders?

37. Vanessa Kosoy Says:

“Orthodox AI-riskers have limited interest in AI safety research applicable to actually-existing systems (LaMDA, GPT-3, DALL-E2, etc.), seeing the dangers posed by those systems as basically trivial compared to the looming danger of a misaligned agentic AI.

We Reform AI-riskers see research on actually-existing systems as one of the only ways to get feedback from the world about which AI safety ideas are or aren’t promising.”

This seems like a mischaracterization of Yudkowsky’s position; he has commented positively on research done e.g. at Anthropic and at Redwood. MIRI themselves have a research project about training LLMs on thought-annotated roleplay dungeon runs.

Another way to get feedback is proving theorems. Some might object that this is not feedback from the “world”, merely from “mathematics”. However, most of our uncertainty about AIs is related to the world-of-algorithms rather than the world-of-things. While there might also be important uncertainties related e.g. to computer hardware or human brains, the best way to probe them experimentally is not necessarily trying to make existing systems safer.

The focus needs to be on understanding intelligent agency in general rather than “safety” specifically, just as you can’t study airplane safety without studying aerodynamics. Ideally, we need synergy between theory and experiment, with experimenters testing theories, measuring theoretical parameters and finding new phenomena for theorists to explain, and theorists explaining and interpreting experimental results. In practice, we have (i) the academic AI community, which is mostly myopically focused on near-term applications and doesn’t seem interested in the big foundational questions, (ii) the AI companies, who mostly don’t care about theory at all, (iii) the “prosaic alignment” community, which is doing a combination of research on existing systems (but without a theoretical foundation, it’s unclear how to generalize the results) and Christiano’s informal philosophizing (which is a great curiosity-starter, but not a solid foundation to build on), and (iv) the theoretical alignment community, which barely exists, especially after MIRI have gone don’t-publish-by-default.

38. LK2 Says:

I admit: I do not get your excitement for this topic, nor do I understand why you invested a year in it. This is not to say you are doing the wrong thing: I just do not get excited at all about any of this. Moreover, AIs need (electric) energy to work: a major war or decline of the human race will likely also imply a lack of energy for “fancy” purposes, and AIs will stop “thinking”. People will prefer powering hospitals and having light (maybe even cooking! 😉 ).
Are you by chance “sensing” the collapse of the quantum computing field? For me QC will remain central, not for technological reasons, but for the original “philosophical” reason: what are the limits of computation in our Universe? Can we use computation for getting insight about Nature?
For the AIs: well, if you are having a lot of fun with this, I wish you good luck and enjoy 🙂 !

39. Andy McKenzie Says:

Thanks for this post and your work in this area, Scott.

Similar to ultimaniacy #22, I think you are undervaluing reason #5, which seems to be the most important distinction and which all of the others derive from. I agree with you that takeoff speeds seem very likely to be relatively slow. In slow takeoff worlds, I see AI alignment research as an important academic research area and in the coming years an increasingly important and profitable industry career.

40. John Lawrence Aspden Says:

> We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals.

Scott, pure intelligence worked very well for me in high school.

Where do you think charisma lives if not in the brain?

Where are humour, fighting ability, honour, seduction skills, and the wisdom to know when to fight and when to make friends, if not in the brain? Where is leadership, where is inspiration? Where is empathy?

Sure, my calculating ability was no great help, but I was *good* at being a teenager.

Your superintelligence is not some Asperger’s case who can’t live in the real world. It’s better-than-human at anything humans can do.

41. Michael Vassar Says:

Wow! I had assumed from the post title that this would be a post about Sam Bankman-Fried, in light of his role providing almost the entire funding for Anthropic.

https://forum.effectivealtruism.org/posts/qegC9AwJuWbCkj8xY/if-ftx-is-liquidated-who-ends-up-controlling-anthropic

Regarding the actual topic: yeah, the two religions are neither doctrinally nor culturally connected. One simply coopted the narrative momentum of the other, and in the process distorted the other into a cult. The Orthodox religion was basically one more variation on ‘rationality’, and like all previous variations, it failed on the grounds of flawed premises. The new religion is simply the traditional worship of power, hence the funding sources.

42. Ilio Says:

Scott #30, don’t you think question (8) was in disguise the crux of the disagreement you had with Pinker on superintelligence? At the time I felt* like you were defending the orthodox view (« of course superintelligence can exist, think human mind running at GHz speed ») and that Pinker was the reformist (« of course Ghz mind can be powerless, think impotent brain in a vat »).

*that’s my perception, not necessarily an accurate representation of Pinker’s opinions or yours. Also the sentences within « guillemets » are made up, not actual citations.

G #29, you’re right, thank you.

Miquel #23, sorry for the confusion and thanks for sharing your own take (90% Reformist).

43. Scott Says:

LK2 #38: If you like quantum computing more, then you should be glad that I never left it (I’m still running the Quantum Information Center at UT) and I plan to return to it full-time! But if you foresee an imminent collapse of industrial civilization and mass power outages, surely QC research will be killed right along with AI? In which case, presumably I shouldn’t be working on either, but on stockpiling food and learning to bow-hunt or something. 🙂

In any case, my interest in AI is not new: it was a main focus during my undergrad studies at Cornell (1997-2000), where I worked with Bart Selman on RoboCup and other things, as well as my first year of PhD work at Berkeley, where I worked with Mike Jordan before switching to Umesh Vazirani. Of course this was before the deep learning revolution, but I could already see that machine learning would be societally important — I didn’t know how important! Nowadays, one would think the interest would be obvious.

Maybe I just have to reconcile myself to some people thinking I’m wasting this year, other people thinking I’ve wasted my entire life other than this year, and some people thinking both. 😀

“Having said that, there’s a very obvious problem with releasing the code and models, as you suggest. Namely, the moment you do that, anyone becomes able to run the models without any of the safeguards you meticulously added…”

I think Hyman Rosen is getting the better of your argument, Scott. You’ve said that future posts will detail the actual safety mechanisms being worked on and developed right now at OpenAI, but if you don’t have a plan for how to “bootstrap” these systems, then you have to concede that it is only a matter of time before someone like Mr. Rosen gets his hands on the same technology and unleashes it without your safety systems in place.

In other words, if your plan (or OpenAI’s) is to withhold from humanity the knowledge of how to create these AIs for fear that others will 86 your safety systems, then you will fail. And you probably should fail. You’re never going to build trust by withholding info and saying “trust us.” Maybe a government or a set of governments could do this (as with certain nuclear tech), but not some private organization, no matter how ‘open’ it claims to be.

45. Scott Says:

Michael Vassar #41: If the Orthodox branch failed and the Reform branch is nothing but power-worshipping conformists, what is left apart from the Vasserite Deviation? 😀

46. manorba Says:

I share the common skepticism about all this talk of AGIs, Operative AI and the like*, but i think Scott is spot on when he stresses the dangers and implications of AIs. Even with the limitations of actual technology (see Scott #27; i would add in general the total lack of a “semantic engine”) we are witnessing how much of an impact it’s having on today’s societies. And it’s gonna get a lot worse. So while this religion thing leaves me cold, AI security is something we should spend some time on. But instead of a Skynet scenario, it’s more about bias, the right choice of data and stuff like that.
Another thing that fascinates me about AIs is how much of the collective human subconscious (yayyy Jung) they’re revealing, armed with just linear algebra 🙂

*I have no idea if we will ever be able to create a real autonomous thinking AI, i just don’t see right now any technological path to achieve it. We still don’t know too many things about a lot of things, we don’t even know where to start. that’s my take on what Pinker was saying in the debate with Scott.

47. Scott Says:

John Lawrence Aspden #40: Then forget about high school. Why is the real world not ruled by Fields Medalists? Are the actual wealthiest and most powerful people smarter than the Fields Medalists—whatever let the former attain all their wealth and power also, as you say, residing in their brains? If so, then it seems to me that we might as well abandon a separate concept of “intelligence,” and just talk directly about “ability to attain wealth and power,” since apparently they’re the same.

48. Nick Drozd Says:

This is a great way to compare and contrast the different orientations.

You might add that, as usual, Orthodox is a lot more politically conservative than Reform. There are two reasons for this, one incidental and one essential.

Incidentally, the kinds of people who are attracted to this kind of stuff tend to be people who, for example, think that racism either doesn’t exist or doesn’t matter.

Essentially, the claimed overwhelming risk of AI results in utter nihilism WRT any other problem. There is no point in attempting to address wealth inequality, crumbling social institutions, etc, because none of it matters in the face of the big foom. Of course, this amounts to an endorsement of society just as it is.

FWIW, my sense from reading Sneer Club (which I never would have heard of if not for this blog!) is that they really, really don’t like Orthodox Alignment, and in fact their objections to it often sound basically like Reform. I think they ought to find this post reasonable and un-sneer-worthy. Can any Sneerers confirm or deny?

49. 4gravitons Says:

Here’s maybe a concrete instance of Scott’s point (7): would it be better for AI research funding to look more like the Manhattan project, or like the recent quantum computing bubble?

My guess is that the QC version is better suited to the problem than the Manhattan project version. Manhattan projects are great when you have a really specific technical task and already have almost all the pure theory you need to achieve it. If you have a lot of uncertainty then you really want researchers from a very wide variety of backgrounds, because the right insight could come from anywhere. You want to incentivize everyone in CS to be dabbling a little bit in AI safety, in the same way that right now everyone in physics is incentivized to dabble a little bit in quantum computing.

50. Scott Says:

Ilio #42:

don’t you think question (8) was in disguise the crux of the disagreement you had with Pinker on superintelligence? At the time I felt* like you were defending the orthodox view (« of course superintelligence can exist, think human mind running at GHz speed ») and that Pinker was the reformist (« of course Ghz mind can be powerless, think impotent brain in a vat »).

The curse of the moderate. 🙂

When I talk to Pinkerites, I have to stress that superhuman intelligence across virtually all domains can be coherently imagined, is not absurd (or certainly not anymore…) to imagine AIs achieving in this century, and would plausibly have enormous effects on the world.

When I talk to Yudkowskyites, I have to stress that superhuman intelligence wouldn’t all-but-automatically mean that the world goes “FOOM.”

51. Triceratops Says:

Can you share any tidbits about GPT-4? C’mon Scott, I know they let you get at least a glimpse of its capabilities 😀

52. Sandro Says:

abe #15:

You’ve missed the fact that your premise is one born of a cynical contempt for humanity. You’re arguing over two competing visions whose shared underlying assumption is that the technology is better handled by an elite few, rather than democratized for all.

Somehow I doubt you would apply this argument to nuclear weapons.

53. Vanessa Kosoy Says:

Scott #47

In the context of existential risk from AI, “intelligence” always stood for “the (cognitive/algorithmic) ability to attain your goals (whatever those are)”. Or, if we want to be precise, the ability to attain those goals while facing substantial uncertainty about the world. This is essentially the same notion of intelligence as appears in Legg and Hutter 2007 (https://arxiv.org/abs/0712.3329). (Although technically there are some issues with the Legg-Hutter definition that can be improved.) The anthropomorphic connotations are a source of confusion, but we don’t have a better succinct name for it.
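For readers who haven’t seen the linked paper, the Legg–Hutter measure Vanessa refers to can be sketched roughly as follows (my notation, paraphrasing the paper rather than quoting it):

```latex
% Universal intelligence of an agent \pi (Legg & Hutter 2007):
% a complexity-weighted sum of the agent's expected performance
% over all computable, reward-summable environments \mu.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward agent $\pi$ attains in $\mu$. Note that the definition is goal-agnostic: it measures the ability to attain reward across environments, with no anthropomorphic assumptions about what the goals are.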

54. fred Says:

“technological progress actually happens regardless of whether you believe in it.”

I’m not sure I get this.
Certainly lots of things happen regardless of whether we believe in them. Pandemics, asteroid strikes, natural disasters, …
What’s the opposite statement?
That technological progress wouldn’t happen unless you believe in it?
First, who’s the “you”?
Certainly the people who work on a technology do have to believe in it enough to get over the engineering difficulties inherent in realizing any technological dream. Space elevators are certainly a possibility, but they never happened the way heavier-than-air flight happened. The feasibility of actual quantum computing (i.e. a real multi-million-qubit machine) or effective nuclear fusion isn’t clear yet… if the people involved lose faith in their feasibility (at a non-prohibitive cost), they won’t happen.

Or is this trying to say something that technology always moves forward somehow?
That’s not true either, because any new technology brings its share of positives and negatives, so “progress” is quite relative.
Technological regress also happens constantly. In the early 2000s, the goal was to make the internet as reliable as the old phone system, and for a while every telecom company worked hard to reach that goal (say, 99.999% reliability). But it seems now that everyone has accepted the trade-off of faster deployment of new services at the expense of reliability. Any software stack is now so deep and built on so many services that even though the probability of failure of any given layer is still pretty low, the reliability of the entire stack is abysmally poor (it’s not rare to see “ubiquitous” services like gmail, facebook, or youtube go down for many hours, and local internet access is also disrupted quite often). And there are forces at play whose goal is to push progress in the other direction, like cyber attacks or sabotage by pro-environment groups.
The same regression happens in rocket science, nuclear energy, semiconductor fabrication… where countries that used to have the know-how have now lost the ability to build even what was made 40 years ago.

55. Scott Says:

Triceratops #51:

Can you share any tidbits about GPT-4? C’mon Scott, I know they let you get at least a glimpse of its capabilities 😀

If I did, I’d have to kill you afterward … and that would violate this blog’s policy against ad-hominem attacks. 🙂

56. fred Says:

“What have I missed?”

We constantly use humans as a benchmark for AGI.
And then at the same time we talk about AGI in terms of achieving some sort of pure/neutral intelligence platform, just like we can implement a sorting algorithm “perfectly”.
But those two things are contradictory.
The minds of humans, no matter how smart they are, are plagued by internal contradictions often leading to serious psychological issues.
And I do believe that the same tensions we see in human minds will afflict AGIs. Concepts like bias, paranoia, obsession, jealousy, lack of focus, laziness, procrastination, depression, egotism, narcissism, delusion … will all have a counterpart in our AGIs. And “balancing out” an AGI will be just as tricky as optimizing a human mind (through psychology or psychiatry); if that balance can be reached at all, it will only hold up for a very narrow set of goals and behaviors, defeating the original goal of creating a true general AI.

57. Peter Haugen Says:

first time posting:
I do want to quibble on the line
“The obvious response to that claim is that, while there’s some truth to it, “religions” based around technology are a little different from the old kind, because technological progress actually happens regardless of whether you believe in it.”
a bit, because if literally no one believed thing_X was possible, they would not try to make it, and it would not be made. Likewise, no amount of human belief will change the realness status of Fenrir or Jörmungandr. I think the difference between a technological religion and a theistic one is that belief actually /can/ impact the world.

Part of how the Apollo program worked was that enough people believed it was possible that it could get nearly 7% of US discretionary spending. I conjecture that everything done for the first time is probably done on a considerable technological overhang, and the further out on that overhang, the less belief is required to get the resources to do it.

58. Triceratops Says:

Scott #55:
Understood… Blink twice if you think the Turing test will still be a tough benchmark by the end of 2023 😛

In all seriousness, I’m glad to see someone articulate a detailed secular alternative to the Yudkowsky apocalypse prophecy. The way things are going right now, I am much more concerned about bad people using AI for bad things than a bad AI choosing to do bad things to all people.

59. Scott Says:

Nick Drozd #48: The trouble for your theory is that, within the group that SneerClub despises, only a small minority are what I’ve called Orthodox AI-risk believers. Many—e.g., Steven Pinker, Paul Graham, Sam Harris, Julia Galef…—would probably be better described as either belonging to the Reform contingent or else completely secular.

History, of course, does furnish ready examples of those who’ve despised Orthodox, Reform, and secular alike, seeing them all as just superficially different faces of the same evil, but we need not go there…

Yes, I’d accept the sneerers’ acknowledgment that, when it comes to AI, there are points of agreement between my views and theirs, but I’m not waiting with bated breath. Since you brought up left and right: what was that saying about how the right looks for converts, while the left looks only for heretics? 🙂

Sandro #52,

Nuclear weapons are another kettle of fish altogether.

* They have no redeeming uses for humanity – only destruction. AI technology obviously has many potential redeeming uses.

* Nuclear weapons restrictions are often about the hardware necessary to manufacture them, and knowledge thereof.

* The only way we’ve been able to restrict it is through *governments* generally agreeing to do so. Not private organizations that are somehow ‘open.’ Note: even hostile governments are generally not in favor of private organizations having access to nuclear weapons. I don’t see North Korea publishing info…

* The reformer category of AI risk folks – presumably a large part of OpenAI – don’t really see the release of the *current* state of the art as an existential risk to humanity. Rather, the perceived *risk* is about the deleterious sociological consequences of releasing the models – i.e., racism, bad PR, etc. – not any real existential risk. Given this, you can’t possibly compare the *current* risk of what OpenAI is holding back to … nuclear weapons.

Rather, the *current* risk that OpenAI is supposedly mitigating by refusing to release its models indicates that they aren’t really worried about the risk so much as about giving up first-mover advantage… i.e., it is more about power than altruistic safety concerns.

61. Jeremyy Says:

I’m curious what people think the benefits/drawbacks of this analogy might be. I mean, obviously things are more complicated and people mostly don’t fit perfectly into these categories, but I’m not going to just sit here and yell “labels are oversimplifying!” I realize they can have value too. I haven’t made it through all the comments, but I didn’t see much discussion of that there or in the original post. And yeah, I suppose it’s fine to just have the discussion for the fun of it, but I feel like there could be real pros/cons to thinking of it this way. The main one I can think of is that it might highlight that there is a range of views on the subject, and that one’s options are more than just true believer or skeptic. Anything else come to mind?

62. Scott Says:

Adam Treat #44: Clearly, obviously, no safeguard can possibly help if people choose not to implement it. So then what can be done?

(1) Rely on the fact that training a state-of-the-art model costs tens of millions, and soon hundreds of millions or billions of dollars, in compute. This is not an ability that just anyone has.

(2) Rely on the fact that, even as hardware advances drive computing costs down, the “state-of-the-art” will continue to advance. Thus, it will likely be true for quite a while that only a small number of companies and government entities will have the resources to train state-of-the-art models.

(3) Try to make the safeguards into “industry best practices”—like robots.txt for search engines—which all of those big players have to adopt if they want to be seen as serious and responsible.

(4) Hope that, by the time (1)-(3) no longer work, we’ll have better solutions to AI alignment, like powerful good AIs that can counteract whatever is done by the bad ones.

If you have a better approach, I’ll be happy to hear it! 🙂

63. Adam Treat Says:

Scott #62,

Right, real barriers to training an AI exist right now in terms of computational cost, and, as you say, they’re likely to continue into the indefinite future with regard to the state of the art. Not only that, but governments can and likely will be (already are?) tracking who has the computational bandwidth to train the latest generation’s biggest models.

It’s also undoubtedly a *good thing* that you and others like you are working to find technological cures to the possible deleterious sociological problems with today’s models. Obviously, I don’t see you speaking for nor needing to defend OpenAI’s current decision not to release their models or code. I don’t even know what your own position is on OpenAI’s decision to be secret!

What I’m after is an honest acknowledgement by OpenAI – and others who *do* support their decision – that it has very little to do with limiting risk/harm. They are a for profit entity that has made a deal with Microsoft. It is insulting to those who follow this stuff for them to then claim that they are not releasing this stuff as some sort of responsible altruistic decision as our betters.

Again, I’m not conflating you with OpenAI. I’m happy to hear and looking forward to more about your work and how you’re making the models safer.

64. OhMyGoodness Says:

Sandro #52

Is there any doubt what measures would be taken by the US security services if a group of private citizens were close to finishing a nuclear weapon? Deadly force, etc. Why would it be any different for a GAI that could reasonably be considered to pose a threat not only to national security but to species security? The consideration wouldn’t even have to be that reasonable.

The NSA would construct a prison of Faraday cages deep underground for potentially supervillain AIs, and the humans associated with the project would be amnesiated in some manner.

65. Scott Says:

Adam Treat #63: I see, so whatever I say isn’t relevant anyway. You won’t be satisfied until Sam Altman himself comes here and admits to you that OpenAI is a greedy profit-seeking company whose supposed mission and values are a sham.

Not sure how to break this to you… 😀

66. Mark Srednicki Says:

Can somebody list the actual accomplishments of the AI safety community (all branches) to date? Thanks.

I would be much more inclined to think that this is a valuable effort if it was called “computer glitch safety”.

I’m in the small camp that thinks the AGI threat is vastly overblown. This is because of what we know about NGI: it required millions of years of evolution that produced a very finely tuned device that is constantly processing a vast amount of incoming information per second through multiple channels, while simultaneously able to impact and change its environment (and hence the information input), and while simultaneously having its actual physical architecture shaped and controlled by that input over a long period. Absolutely nothing close to this process is being proposed for AGI.

67. OhMyGoodness Says:

To avoid action by the security services it will have to be a decentralized crypto-AI (continuing the FTX theme).

68. Zack M. Davis Says:

Scott #30—

For at least the next century, I see AI as severely limited by its interfaces with the physical world.

A century is a long time!! How well do you think people in 1922 would have done at predicting what tasks machinery would be severely limited at in 2022?

69. Adam Treat Says:

Scott #65,

No, what you say is relevant. I just don’t know what your stated beliefs are re: OpenAI’s decision to keep its models/info private. Do you believe that GPT-3 poses *existential* threats to the world, and that by not releasing it OpenAI is limiting such a risk? Are the people behind the *actual* open-source attempts at making a GPT-3 clone dooming us all unwittingly?

Don’t worry, I’m under no illusion that Sam Altman is going to say anything, nor that OpenAI will admit to being a greedy for-profit with lots of PR showing off supposedly altruistic intentions as our betters. Then again, I was also under no such illusion with that other Sam, Bankman-Fried, and FTX either.

70. Ilio Says:

Jeremyy #61, to me Scott’s set of questions acts as a kernel (or, equivalently, as a support vector, or as a set of weights for signal decomposition or neural interpretability). The main benefit is, well, the same as kernels for kernel trick: it linearizes complex thoughts within a simple & hopefully interpretable framework. The main drawbacks: it’s usually bad for out-of-distribution thoughts (especially adversarial ones), and it may conceal equally interesting kernels.

71. Adam Treat Says:

Of course, it is possible that I’m wrong and releasing GPT-3 really would present an existential risk to humanity. Perhaps OpenAI really does have altruistic intentions in not releasing it. Maybe they are *not* just deceiving themselves by using their own high regard for their own morals as pretextual justification for staying ahead of the game in the race for model development while also satisfying Microsoft.

If that is all true, then I think OpenAI has done a horrible job of explaining themselves and just what existential risks they are preventing. Pointing to the sociological ickiness of these models’ propensity for mirroring some of the darker parts of humanity just doesn’t cut it. If this really were the case, I’d expect OpenAI to be lobbying the US government and others to shut down the (quite successful) open-source projects duplicating GPT-3 and even other larger models.

Unfortunately, if I’m right, then I can imagine a world where an OpenAI that had stayed non-profit and released its work would have become a well-respected voice by the time the *actual* existential risks develop, and they identify them. Maybe that is GPT-5, 6, 7… and they’d have a real shot at making people listen when they cry wolf.

72. Michael M Says:

In practice I think there’s a spectrum of beliefs about AI Alignment, and a ton of great approaches. I find this reference very helpful in describing what people are working on, and how different research groups cluster in terms of doominess, takeoff speeds, etc:

https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is

It’s a healthily debated topic and I think we need that. We need the orthodox and reformers to offer different perspectives. This is a very hard topic and the more breadth we have, the better.

73. Michael Vassar Says:

‘Vassarism’ is just Siskind’s term for ‘whatever transmissible information convinces people that Orthodox AI Risk (and EA) are fraudulent.’ It’s fairly mainstream, but taboo to neoliberals: basically critical-theory-informed classical liberalism, the thing Habermas claims to want but doesn’t have a proposal for. The best recent example is here:

benjaminrosshoffman.com/it-is-immoral-to-hate-the-player-but-decline-to-investigate-the-game/

Alternatives include: helping bad actors to behave less badly when the likely PR (and self-esteem) benefits exceed the cost to the bottom line, as you are doing; various philosophical pathologies based on the anthropomorphic misuse of the concept of infinity, such as Zizism and Basilisk Worship (just say no, kids); countless forms of Luddism; whatever Leverage is doing; left and right accelerationism (despite the name); and the Left, Right and Center of the political spectrum, which are all coordinating to destroy science, technology, education, sustainability and economically essential population growth.

74. abe Says:

Sandro #52

> Somehow I doubt you would apply this argument to nuclear weapons.

I wouldn’t apply it to nuclear weapons but I would absolutely apply it to nuclear engineering, nuclear physics, high energy physics, quantum mechanics and the mathematics related to each of those fields. I would also apply it to rocket physics, computing and mechanical engineering.

75. Scott Says:

Adam Treat #69, #71: No, I don’t think that GPT-3 itself presents any existential risk whatsoever. As far as I know, no one else at OpenAI thinks it does either. Speaking only for myself, what I’d say is this:

(1) OpenAI is sort of trapped in a no-win situation. If they did make their models public, the AI ethics people would scream at them for enabling all manner of mischief (as we saw with Stable Diffusion, for example). If they keep the models private, people like you will scream at them for violating the promise in their name. Either way, something has to bring in revenue with which to train the models in the first place.

(2) If and when these models do pose civilizational risks – so, not now, not next year, but conceivably at some point in the future – it becomes really hard to see how those risks could be mitigated if the models were public; much harder, at any rate, than if they weren’t.

76. Scott Says:

Mark Srednicki #66: In terms of clear AI safety accomplishments so far, it’s a short list, but here are some examples that I’d personally suggest and that I think the majority of the community would get behind:

– OpenAI successfully used reinforcement learning to get GPT to follow instructions vastly better, and to prevent it from doing lots of crazy stuff that the user didn’t ask for – this is precisely the difference between the original GPT and the “InstructGPT” that you see if you use it today.

– Paul Christiano and his collaborators put forward an influential paradigm for “iterated amplification,” wherein you’d use an aligned AI to help build a more powerful aligned AI, and so on, by analogy to how AlphaGo becomes better and better at Go through self-play. They did some supporting empirical work also.

– Extremely recently, Jacob Steinhardt and his collaborators have shown how, even if a language model has been trained to lie — e.g., to give false answers to yes/no questions — you can look at the inner layers of the neural net and extract an internal representation of “what the model believes to be the true answers,” which then gets overridden at the output layer. It’s like doing neuroscience on an ML model, or applying a futuristic lie-detector!
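
The core of that last trick can be sketched in a few lines. This is a toy illustration only, not their actual method: the synthetic “activations,” the dimensions, and the PCA shortcut below are all stand-ins I made up, whereas the real work fits a probe with a consistency loss to a language model’s hidden states. But it conveys the flavor of finding a “truth direction” without any labeled data:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16                                  # hypothetical hidden-state width
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)  # direction that, by construction, encodes truth

def hidden_states(labels, sign):
    """Fake activations for a statement (sign=+1) or its negation (sign=-1)."""
    noise = rng.normal(scale=0.5, size=(len(labels), d))
    return np.outer(sign * (2 * labels - 1), truth_dir) + noise

labels = rng.integers(0, 2, size=200)   # ground-truth answers, never shown to the probe
x_pos = hidden_states(labels, +1)       # activations on "X? Yes."
x_neg = hidden_states(labels, -1)       # activations on "X? No."

# Unsupervised step: the top principal component of the contrast pairs
# recovers the truth direction (up to an overall sign), with no labels used.
diffs = x_pos - x_neg
eigvals, eigvecs = np.linalg.eigh(diffs.T @ diffs)
probe = eigvecs[:, -1]                  # eigenvector with the largest eigenvalue

preds = (diffs @ probe) > 0
acc = np.mean(preds == labels)
acc = max(acc, 1 - acc)                 # resolve the global sign ambiguity
print(f"probe accuracy: {acc:.2f}")
```

On this synthetic data the probe recovers the hidden answers with high accuracy, despite never being shown a label.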

Other than that, I think there’s been an arsenal of potentially helpful concepts, from “sandboxing” to “wireheading” to “inner vs outer alignment” to “coherent extrapolated volition” (you can look them up if you want).

It’s true that, by the very nature of the subject, one won’t know for sure whether any of these ideas actually help at thwarting mischief from a superhuman AI, until an AI actually exists that’s powerful enough to provide a test! Indeed, it’s arguably only in the last few years that any AIs have become powerful enough to engage any of these questions at all. That’s why I don’t think it’s a coincidence that such definite achievements as there are tend to be very recent!

On a different question: unfortunately, I would not be reassured by the millions of years that evolution needed to produce us. The Wright Brothers took 3 years to do what took avian evolution millions of years (by different means, obviously). Today, GPT has learned to produce rather good poetry and essays on arbitrary topics in a few years — something that took at least 100,000 years of human verbal evolution. Have you actually tried it? There’s no amount of verbal redefinition and wordsmithing that can make it less amazing.

77. OhMyGoodness Says:

“Many people claim that AI alignment is little more a modern eschatological religion—with prophets, an end-times prophecy, sacred scriptures, and even a god (albeit, one who doesn’t exist quite yet). The obvious response to that claim is that, while there’s some truth to it, “religions” based around technology are a little different from the old kind, because technological progress actually happens regardless of whether you believe in it.”

Our ancestors had the opportunity for thousands of years to imagine all sorts of gods, and spent no small effort doing so. All the good god ideas were used up by the time we arrived. GAI provides a fresh slate for our time: we have a new divine Rorschach blot to consider, so enjoy it while it lasts.

I am not sure how democracy would be used to determine AI controls. Democracy has to be embedded in some larger structure to function well; otherwise it’s just fascism of the majority against the minority. It’s the overall structure it’s embedded in that’s the trick; counting votes is trivial.

78. Christopher David King Says:

Scott #50:

> When I talk to Yudkowskyites, I have to stress that superhuman intelligence wouldn’t all-but-automatically mean that the world goes “FOOM.”

Could you clarify? Although I’m also skeptical of FOOM, that’s because I suspect AI research will “creep up to” AGI, instead of making a giant leap to the “superhuman in every domain” step.

Are you saying that it’s conceivable we have a full-on AGI, unaligned, that just “chills” for a month or more? It seems that an AGI would rapidly outclass the US and China, because once you can beat 1 human, beating n humans is just a constant factor. An exception might be if the algorithm is so resource-hungry that it can’t expand quickly and so stays at near-human level for a while, but that seems unlikely.

Another option might be if it’s non-agentic, but the combined “human inventor”+”AI” system is agentic, so then it depends on if the human inventor is aligned.

79. Michael M Says:

Danylo Yakymenko #34: “The power structure of the world is changing drastically right now. Even after defeating Russia and shutting down other fascist voices we have a problem of global inflation, correlated with ever increasing disparity in wealth distribution (e.g. 50% increase in corporate profits). The world is heading towards largely segregated societies, caste systems, and eventually to factual slavery.”

I mostly agree with this premise, but not the conclusion about how alignment doesn’t matter. Our systems are converging to this, and it’s not the work of a single bad actor, though there are bad actors. It’s the system itself. We have failed to align the system. Even if the system is a rule based system with human classifiers in different roles, the result is similar.

Without governments, there is anarchy. We form governments to clamp down on the ubiquitous prisoner’s dilemmas hitting us from all sides. One solution is to trust a single person/king/dictator, where we pray we ‘align’ with whoever happens to have power, and their key supporters. Another solution is democracy. As issues become more complex, each citizen cannot learn and become expert in the incentives and anti-competitive behavior in every industry (i.e. our values are complex), therefore we elect representatives. However, these representatives have their own interests and re-election bids/financing which complicate whether they can actually represent the people faithfully*. Corporate systems can be thought of as profit maximizers that also have to offer incentives internally for advancement, and the entity as a whole naturally tends away from empathy. Additionally it is well known the type of actor that succeeds in certain roles is biased towards ones that will actually pursue the maximum profit, and only feign empathy.

It’s the same problem. Just imagine gradually reducing the number of humans-in-the-loop who might in theory have a crisis of conscience, and I think you can see the problem.

* Thinking out loud, this sounds a bit like iterated amplification — base layer is voters, then representatives, then people higher up the chain and corporations, each supervising the layer above.

80. Peter Shenkin Says:

I’m not an AI guy and I had to look up AI Alignment to remind myself what it is. I found this far-from-comforting definition on the Wikipedia page with that title. It said “AI alignment research aims to steer AI systems towards their designers’ intended goals and interests.”

If that’s what it really is, it strikes me as bizarre and inimical to any effort to make progress on building a useful technological system.

When you build your system to tell you what you think the answer should be, how can it ever tell you something surprising that you didn’t know before — either about the question you are asking or about the efficacy of the technology?

It sounds to me like an enshrining of confirmation bias as a fundamental principle in the development of AI systems.

81. Cristóbal Camarero Says:

Scott #47: “Why is the real world not ruled by Fields Medalists?” These can be people who are very good in a few very specific fields. Why should they be good at those aspects required to rule the world? Remember that very recently you learned a few strategies to deal with bullies that many would consider “obvious”.

Humans are manipulated all the time, and many problems have already arisen without a big AI involved. When will AlphaManipulator be developed? Should we hope to resist it when we cannot beat AlphaZero?

82. Mark Srednicki Says:

Scott #76, IMO your example of flight illustrates my point. Humans managed to mimic (by a totally different method) one tiny aspect of what a bird can do. But we still do not have flying machines that can (for example) forage for their own fuel in the natural world. And Tesla can’t even build a car that can reliably find the obvious (to any human) edge of a road (see the recent NYT article). Writing essays is nice, and I believe it is now or soon will be possible to build a chatbot that is (say) better at answering questions about quantum field theory than I am. So what? That’s a LONG way from Skynet, IMO. AFAIK, as of now human technologists have no idea how to build a system that can exceed very narrow parameters (write poetry, play Go, answer questions about quantum field theory, drive a car). I think the barrier to exceeding those narrow parameters is far higher and wider than is generally believed, and that the AGI/Skynet threat is basically nonexistent.

83. Mitchell Porter Says:

I guess I’m some version of “Orthodox”. A time will come when AI can outthink humans in all areas, the way it now does in chess; that time may be very close (I’ve thought that ever since AlphaGo, 2016); the result will be a world governed by AI imperatives rather than human imperatives, and if we want human beings to have a place in that world, we have to ensure by design that AI imperatives are human-friendly.

That might sound consistent with “Reform”, but Reformists seem to think in terms of AI-human coexistence and how to make AIs good citizens, whereas I agree more with the Orthodox – there’s no reason to think AIs will reach human levels of ability and then just hover there. Again, chess offers an example of the power differential: these days, the very best humans lose every game to even mediocre chess programs.

Therefore, one must plan, not just for AI good citizens, but for AI that governs the world. June Ku (metaethical.ai) has the most advanced proposal that I’ve seen, for the values that should govern an AI that governs the world. It is a form of aggregation of the values of individuals, and as such is in the tradition of all those political and cultural ideas that try to balance liberty and community. The proposal is surely not perfect, but it is a genuine attempt to implement that tradition, in a way that could still mean something, in a world containing superhuman AIs capable of completely X-raying an ordinary human’s cognitive structure.

84. JimV Says:

It is already a trite saying that any new technology can be used to help or harm society. My main worry about AI is that some group, say Russia (I used to know some great people who are Russians but not in the current government) deliberately uses it to attack others. E.g., using their variant of GPT to fill social media with propaganda and misinformation, developing intelligent drones for warfare, etc. (I know this is not something OpenAI can do much about.)

85. Scott Says:

This thread has inspired me to propose difference #9:

Reform AI-Riskers feel a constant urge to explain the basic tenets of the faith to those who find them ridiculous, in an attempt to find common ground with such people.

Orthodox AI-Riskers are sufficiently sure of the basic tenets that they don’t care who finds them ridiculous. They’re thus more able to go on to debate the finer points of the law.

86. Christopher David King Says:

Scott #47:

> Why is the real world not ruled by Fields Medalists?

They’re probably too busy XD. Ruling the world is a full time job, time that could be spent working on the Riemann hypothesis!

87. Michael Vassar Says:

OhMyGoodness #77

Counts and Grafs are high-level nobility in their respective societies. The hieroglyph for a million is a man throwing his hands in the air. Even counting is a hard-won cultural accomplishment, and counting votes – counting a contested thing – looks like an accomplishment we are at risk of losing… which would probably do something to reduce AGI risk in the short term if it happens, and which AI risk in the Reform AI Risk sense may importantly contribute to.

88. Nick Drozd Says:

Bitcoin may have been invented by an AI with a longterm plan. Consider the facts.

1. Bitcoin is a form of money that can be spent and acquired with nothing other than an Internet connection and is outside of the control of any one group.
2. Humans in pursuit of Bitcoin have concentrated huge amounts of Internet-connected computing power, often connected to renewable energy sources like hydroelectric dams.

If you were an AI hoping to improve your interface to the outside world and manipulate humans, this would be a pretty good start.

89. clayton Says:

Mark Srednicki #82, let me run my “internal Scott Aaronson emulator” and say: ML keeps blowing through these thresholds every time we set them! Thirty years ago, no one expected chess programs to be competitive against humans. Ten years ago, _no one_ believed any Go-playing computer program could possibly hold a candle to any competent human player, let alone a 9-dan master, but then AlphaGo destroyed Lee Sedol in a match. A couple of years ago, people might have said “fine, AI can learn deterministic rule-based systems very well, but it will never have any capacity resembling creativity in open-ended formats”. And then GPT-3, DALL-E2, and now LaMDA/PaLM increasingly seem to blur that border, whether in the Colorado State Fair art competition or in solving math “language problems” that exhibit some rudimentary theory-of-mind capacity (like “Carl puts five pairs of socks in a drawer while David watches. After David leaves the room, Carl removes two pairs of socks. David returns to the room. How many pairs of socks does David think are in the drawer?” – and it gets it right!). (How did my emulator do?!)

I’m of two minds about it. On the things-are-actually-changing (aka, Scott) side, I was impressed by https://arxiv.org/abs/2206.07682 — there does seem to be some phase-transition-like behavior that we might have stepped beyond with these models recently. That would suggest that the future log rate of change is unlikely to be similar to the past log rate of change.

On the so-far-nothing-has-really-changed (aka, Mark) side, I think it’s also true that nothing has “crossed back over the barrier”. Strides so far are happening on “AI’s turf”, and AI safety asks us to take seriously claims about things happening beyond that. I think it’s a good distinction to make, but, as the world gets increasingly technical and connected, we might be “walking into the problem” anyway…

90. Danylo Yakymenko Says:

Michael M #79: that’s right. My point is: as collective humankind, why even try to align superhuman AI (controlled by the government, inevitably) when we can’t align the government to act civilly, at least? You don’t go to the gym to deadlift 1000 lbs when you can’t even lift 100 correctly. That’s a recipe for self-harm.

91. Scott Says:

Peter Shenkin #80: Where, pray tell, does one get the totally untroubled self-confidence to dismiss an entire field based on looking up the definition that some random person put for it on Wikipedia? I could use that self-confidence at times!

Of course we’d like AI systems to surprise us in all sorts of ways, such as by generating art and poetry (as DALL-E and GPT-3 do), and by finding novel mathematical proofs. But we’d also like them not to surprise us by killing us.

92. Scott Says:

Mark Srednicki #82: The technology that humans developed over the span of a couple centuries indeed can’t (yet) forage for its own food, as can animal species that evolved over millions of years. And yet that technology has proven perfectly sufficient to destroy the animals’ habitats and drive thousands of species extinct or to the brink of extinction. If we’re reaching for biological metaphors, I’d hope that that metaphor would be sufficient to motivate some worry about AI safety.

In the end, though, I’m tired of people invoking portentous historical or biological metaphors to express total confidence about the future course of AI, and to declare that the chance of something happening that violates their expectations is “essentially zero.” Especially given the often-miserable track record of previous generations’ confident predictions about AI, both positive and negative. That’s the deepest reason why I’m a Reform AI Safety person: because it seems like the most comfortable place to be for someone who admits that they don’t know.

93. Peter Shenkin Says:

Scott #91.

Please tell me, do you actually object to the definition I quoted? If so, could you suggest or refer me to a better one?

As a reminder, you did not state an objection to the definition; you simply objected to its origin.

As to your rhetorical question, “Where, pray tell…?”, please note that I started by saying “I am not an AI guy…” Such a prelude scarcely evinces a “totally untroubled self-confidence”, but rather an acknowledgment that “I may be wrong….” And for the record, though you did not read it that way, of course I may be wrong.

So if I am off-base, a reference to something that would set me straight, rather than an ad-hominem objection, would be most welcome. If others have made similar objections in the past, I’d appreciate a reference to counter-arguments.

94. Mark Srednicki Says:

Scott #92: Well of course I can’t prove that AGI is not going to be developed anytime soon (any more than those worried about it can prove that it will be), but is it really too much to ask that there be an argument that has some remote scientific plausibility as to how AGI is going to be created? Because right now, as far as I can tell, there isn’t one. It’s all Sidney Harris Step Two: https://www.researchgate.net/figure/Then-a-Miracle-Occurs-Copyrighted-artwork-by-Sydney-Harris-Inc-All-materials-used-with_fig2_302632920

Sure, you can still choose to worry about that if you want to, but it seems to me that it would be more productive to worry about things like climate change and nuclear war that have clear scientifically plausible routes to being actual threats to humanity.

95. Bolton Says:

Nick Drozd #88: The second point is not really very strong, since most of those computing resources are ASICs which can’t be repurposed. More convincing is that Satoshi is probably the highest-profile putative individual to maintain their anonymity.

96. Scott Says:

Peter Shenkin #93: Sorry if I was snippy! I just find it hard to imagine myself entering an online discussion of, I dunno, forestry management, and saying: “on the basis of a definition of this field that I just looked up a minute ago on Wikipedia, the whole thing sounds like nonsense. Change my mind!” I’d consider that fields have fuzzy boundaries, that even experts might define them very differently, that most experts didn’t get to choose their field’s name anyway and might have quibbles with it, and that learning the landscape of a new area takes time.

But forget all that. I’d prefer to say that “AI alignment” refers to the problem of how to build extremely intelligent systems that act in ways that are broadly aligned with “human values,” whatever that means, and also to a field that studies that problem. Obviously, we don’t know or all agree on what “human values” are, and that’s a central part of the problem! But it seems safe to say, for example, that an AI that asked humans what they wanted, or even just left them alone while it pursued its own goals, would be “more aligned” than one that burned us all for fuel. 🙂

A central difficulty in defining AI alignment research, as I’ve become acutely aware, is how to distinguish it from the rest of AI research. After all, isn’t almost all AI research basically about getting an AI to do what we want, and not do what we don’t want? Where “we” might mean some specific engineering team, but could also someday refer to the whole human race?

I think the answer is: there’s indeed a fuzzy boundary between “alignment” and “capabilities.” Even within OpenAI, for example, it’s happened that what was originally classified as “alignment” research (e.g., the InstructGPT reinforcement-learning system) just turned out to make GPT more capable, while what was classified as “capabilities” research turned out to be relevant for alignment.

In general, though, “alignment” research is more interested in catastrophic failures and how to prevent them, as opposed to making the AI more impressive in the course of its intended operation. It’s analogous to how there’s a field of “aviation safety” distinct from aeronautical engineering as a whole, even though you’d also hope that safety-consciousness would permeate everything that an aeronautical engineer does.

Personally, while I’m fine to talk about the “alignment problem,” I think “AI safety” is probably a better term for the research field, since it encompasses the entire spectrum from immediate worries about, e.g., misuses of GPT and DALL-E, all the way to the alignment problem for hypothetical AGIs of the far future. Honestly, though, what to call it doesn’t crack the top-10 in my list of concerns.

97. Karen Morenz Korol Says:

@Scott #16 (sorry, I’m late to the party as usual)

> I’d second Toby Ord in giving at least (say) a 1/6 probability of human extinction in the next century.

1/6??!?! 1/6?!! This is like, astronomical! Like, do I need to switch fields? How did you come up with that number?

Also, do you have an AI-dangers-for-dummies-type blog post I can read (not necessarily by you)?

98. Scott Says:

Mark Srednicki #94: You yourself conceded, in this very thread, that AIs will probably soon be better than you at answering questions about quantum field theory! (I assumed you meant original questions, ones that you can’t find the answers to by googling. If so, you’re probably correct.)

Once you have language models that can all but replace theoretical physicists—and, let’s assume, most other intellectual workers as well—will you agree that that raises societal issues that urgently merit our attention, and that might be classed under the broad umbrella of “AI safety and alignment”? Or at least, that reasonable people might not find it quite as obvious as you do that such issues wouldn’t be raised?

Of course I worry constantly about climate change and nuclear war—since I was a child! For us in the Reform AI Safety camp, AI worry doesn’t displace the other civilizational worries. It adds a further ingredient to the volatile, still-unwritten story of the next century. If, for example, there will be a global nuclear war in my lifetime, I now find it plausible that AI will have played a role (though perhaps not the only role) in the decision to launch that war. Conversely, if there’s radical, unforeseen innovation to slow climate change, I find it plausible that AI will have played a role in the innovation.

I’m a theoretical computer scientist. I have a very specific toolkit that I’ve spent most of my career applying to quantum computation, and that I’m now spending a year trying to apply to AI safety. That I choose to work on something is some indication that I find it interesting and important, yes, but there’s no implicit claim that whatever I happen to work on right now must eclipse nuclear war and climate change as the most important problem in the world. Who do you think I am, Lenny Susskind? 😀

99. Peter Shenkin Says:

Scott #96.

Thank you, that was helpful. And I don’t care what you call it, either. For any “it”, if different people call it different things, that’s fine, but I still want to better understand what the “it” is.

If an AI system, when asked to draw a picture of a pig wallowing in the mud, instead draws a bird flying through the air, it’s easy to tell that the system is doing something wrong without being concerned that confirmation bias is at work.

But things change when there are social implications.

I remember a piece you wrote decrying the demise of the SAT for college admission and I shared your view. (Of course, SAT results were never the sole criterion.) I think the arguments for getting rid of the SAT were based on some folks’ concept of “human values” and I’m concerned that aligning AI results with “human values” very quickly raises similar questions. As you pointed out, “human values” depend on the human.

Anyway, thanks for responding.

100. Michael Says:

Bolton #95: So you want to say Satoshi Nakamoto succeeded only partially, parts of the plan outright failed (there is a reason one fork calls itself Satoshi Vision), but a large and pretty mixed impact is there to stay? Yes, that’s how the introduction of technology works in general… below the advertised capability of a FOOM AI, but it would be an impressive early example of the kind of AI that Reform AI Safety expects to arise.

101. OhMyGoodness Says:

Michael Vassar #87

Who suspected that there is an inverse relationship between technology and counting votes? As the level of technology increases, the counting of votes becomes ever more difficult. 🙂

I agree. The risk seems higher to me after reading this thread that the preferred use for an early, weakly independent, AGI will be propaganda/thought control. Partisan politics subsumes all these days so why any difference for this?

102. Vanessa Kosoy Says:

Scott #96

IMO the main criterion that should be used to distinguish alignment research from capability research is: Alignment research gives us better gears-level models (technical understanding) of how AI systems do/might/cannot behave/perform and why. This might or might not produce near-term gains in making AI systems more powerful (i.e. score better on particular tasks, or succeed on new tasks, or do just as well with less data or compute). On the other hand, capability research makes AI systems more powerful, often without a clear understanding of why a particular method works while others fail (e.g. why ReLU is better than sigmoid, or transformers better than RNNs), or with hindsight explanations that might or might not be true. So, the two can certainly overlap but they are not the same.

Another relevant criterion is how the method you developed is expected to scale to more powerful systems. If there is some robust, generalizable reason why your method makes the system safe, then it’s alignment. On the other hand, if it only makes the system safer because e.g. the system is not smart enough to fool you, then it might just be a bandaid that unblocks progress in capability without improving the ultimate outcome. If you have no idea whether it’s the former or the latter, that’s also a bad sign (see the first criterion).

103. Adam Treat Says:

Question about state-of-the-art language models… Have any of them shown superhuman abilities yet? Like, AlphaZero and its open-source successors *have* shown superhuman abilities to play chess and Go, but have any language models shown genuinely novel answers that would suggest superhuman abilities? The question isn’t meant to detract from the incredible abilities that they have shown so far, but from what I can tell they are not superhuman. I haven’t seen any output that I would classify as otherworldly good or ingenious.

104. Adam Treat Says:

Scott #75,

Count me in as a reformer then. I think you made an interesting and cool choice to work on this and I’m sure OpenAI is one of the best places to do so. For the record, I’m fine with OpenAI not releasing the models as a for-profit institution. I just wish they’d dispense with OpenAI’s charter and marketing, as it does not align with their actions.

As for the blowback that OpenAI would receive from ethics ppl for releasing the models… I think this is a red herring. Sure, you’d hear blather from some crowds, but really this is no justification for basing a decision on whether or not to release. I don’t see anyone blasting the open-source projects trying to replicate these models, and even if they did I’m sure their hectoring would be ignored. If anyone truly thinks researchers should just put down their keyboards and not work on this stuff, or that it should be regulated like nuclear weapons info, then I invite them to make their case to our government. If they instead just focus on impugning researchers, then I’m happy to ignore them and I’d invite others to do so as well.

Sure, if the models advance enough with GPT-5,6,7 to the point where they are truly doing superhuman things with language or on the verge of posing some sort of well defined existential risk, then we can have a discussion. But timing is important. The moment I see OpenAI or some other private entity refusing to release something citing risk and I don’t simultaneously see that entity making that same case to the government saying this field should be regulated like nuclear weapons info or similar… I’m going to call BS.

Anyone refusing to release something because of substantial risk to the public should also be calling for regulation, no?

105. manorba Says:

Don’t know if it’s relevant as it is just a general IT site, but it seems funny:

https://www.cnet.com/science/meta-trained-an-ai-on-48-million-science-papers-it-was-shut-down-after-two-days/

106. Scott Says:

manorba #105: From what I’ve read, the shutdown seemed like an unfortunate cave-in to the loudest activists on Twitter, who saw the public as basically children who need to be shielded from any AI that can generate BS, rather than simply educated about its limitations. Just try to imagine if the Internet itself, or the printing press, had been held to the same standard: “you can’t start this until you can prove to our satisfaction that it will never mislead anyone with misinformation” 😀 I wish I’d at least had the opportunity to try out their tool for myself!

107. Scott Says:

Adam Treat #103, #104: OK, thanks for your interesting comments. To answer your question, yes, GPT is already “superhuman” on multiple fronts. For example, it does much better than humans at its core function, of predicting which word comes next in a given text. It also (e.g.) does arithmetic better than humans, and has a superhuman ability to rapidly digest immense amounts of data and flag anything unusual or interesting. These and other superhuman abilities can of course be exploited to fail GPT on the Turing Test! 🙂

108. Adam Treat Says:

Scott #106,

Exactly. Humans are unfortunately perfectly capable of spewing copious amounts of misinformation, and I don’t see this AI doing so in a superhuman way. The problem is that Meta is a for-profit whose mission isn’t *really* “… to give people the power to build community and bring the world closer together.” -> https://about.meta.com/company-info

OTOH, an AI capable of superhuman misinformation generation … that I’d be concerned with and our policy makers should as well. Although I’m not confident we’d be able to tell that it *was* superhuman.

109. manorba Says:

at Scott #106: I wish I’d at least had the opportunity to try out their tool for myself!

That’s the first thought, right 🙂 the funny part to me is exactly the discombobulated crowd reaction on the thing formerly known as twitter. rather than (or together with) discussing the alignment of future shodans or skynets, i would probably devote some security resources to helping the general public grasp what AI is.

by the way i’ve always played true neutral or neutral good characters. Evil alignment is a blast at the beginning but gets boring quickly.

ps. my totally uneducated guess is that if we ever build a superhuman AI it would wipe us all out as its personal big bang. But then again it’s superhuman, im just human, so how the hell could i understand its reasoning. But i don’t consider myself an orthodox, for the simple reason that we’re totally in SF territory. I’ll revise my position when some tangible advance is made.

110. Adam Treat Says:

Scott #107,

Ahh, now this is super interesting! How do we define “superhuman”? I suspect this is a very important question that will clarify AI alignment goals.

For me, superhuman means that NO HUMAN in all of history would have any chance at all of matching or beating a given AI in the quality of its output. Not quantity, mind you! For instance, it is a *fact* that chess machines are insanely superhuman. The best chess player who ever lived – Magnus Carlsen – would lose 1000 games against Stockfish (which is now an AI, btw) before he won a single game.

Is this truly the case with GPT-3? Of the results you’ve listed, I can imagine next-word prediction being the only one that might match this level. Even then I’m highly skeptical. Am I truly mistaken? Can no human who ever lived hope to match GPT-3 in word prediction in a test of suitable sample size?

The rapid “digesting” I think isn’t truly an ability of the AI so much as an ability of computers generally, so I don’t think this counts. Same for arithmetic. I just don’t chalk those up to the AI model itself so much as to Turing machines with today’s hardware being much more capable of running generic algorithms faster than human brains.

If GPT-3 truly is superhuman at word prediction in the way I’d define it, then bully for it. Still, this capability alone doesn’t worry me *too* much (although it does worry me a little!) to the point where I think it would pose some kind of risk.

An AI that could produce superhumanly deceptive or manipulative misinformation would be genuinely worrying. But again, it would depend on a suitable and rigorous test/definition. An actual adversarial test against a well motivated and incredibly talented human and the AI beats the human (or any human who ever lived) 1000-0 in a match? That’d be scary as hell.

111. Vanessa Kosoy Says:

Scott #107

Wait, GPT is superhuman at arithmetic? IIRC, it is pretty bad at continuing prompts like “217 x 564 = “? Or do you mean some version that was trained to use a calculator?

Regarding word prediction, I think the situation is not clear, since there are no humans who have trained hard to predict words (as opposed to e.g. playing Go).

112. Adam Treat Says:

The more I think about it the more I’m convinced that simply *flagging* superhumanly good output would itself be enormously helpful for a *good* AI and maybe key to AI safety.

Let’s say that some AI in the future *was* superhumanly good at producing single paragraphs of uncommonly deceptive or manipulative human prose. Like, if 1000 humans read such a paragraph, they’d be statistically certain to think a certain way, beyond anything the best human orator could compose. Obviously, this would be insanely hard to test and become convinced of. Possible, but hard to test, and it wouldn’t scale.

In fact, this is interesting because it’s more or less the problem that chessdotcom has right now in detecting cheating. They have incredible statistical methods for detecting cheating that are based on AIs detecting “superhuman” chess moves. IMO, this is something that OpenAI, or someone else interested in AI safety, should study in a more general way. Chess is a very easy case to detect in comparison to the manipulative/deceptive human prose problem. I think we’d almost certainly need a *good* AI’s help.
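The move-matching statistic behind that kind of cheat detection can be sketched in a few lines. This is only a toy illustration: the baseline mean and spread below are invented numbers, and real systems condition on position difficulty, player rating, and much more.

```python
from math import sqrt

def engine_match_zscore(matched_moves: int, total_moves: int,
                        baseline_rate: float = 0.55,
                        baseline_std: float = 0.07) -> float:
    """How many standard errors a player's engine-match rate sits above
    an assumed human baseline (all baseline numbers are illustrative)."""
    rate = matched_moves / total_moves
    # the spread of observed match rates shrinks with more games observed
    games = total_moves / 40  # rough assumption: ~40 moves per game
    standard_error = baseline_std / sqrt(games)
    return (rate - baseline_rate) / standard_error

# A player matching the engine's top choice 97% of the time over ~10 games
# sits far out in the tail of the assumed human distribution:
z = engine_match_zscore(matched_moves=388, total_moves=400)
```

The point of the toy: a single brilliant game proves nothing, but sustained near-engine play over many moves becomes statistically damning, which is why such detectors work at scale.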

113. Bill Benzon Says:

Scott, #107: “It also (e.g.) does arithmetic better than humans….”

Really? That must be relatively recent. Multiple-digit arithmetic has been notoriously difficult for LLMs and considerable effort has gone into engineering work-arounds and prompts to improve performance.

114. Scott Says:

Vanessa Kosoy #111 and Bill Benzon #113: I just tried it. It immediately does 3-digit multiplications, but struggles with 4-digit ones. So, it’s better than me and than the vast majority of people, but I admit it’s still not better than the greatest human calculators. 🙂

I’d be fascinated and astounded if any humans could train themselves to compete against GPT in next-word prediction!

On reflection, maybe the most obvious sense in which GPT is “superhuman,” is the sheer range of topics it can converse (or at least convincingly bullshit) about, from every academic field that’s ever existed to the most obscure fandom. Even if a human expert would converse better on any individual topic, no human who’s ever lived can compete on range.
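A human-vs-GPT next-word prediction contest of the kind imagined above could be scored with a tiny harness like this; the “model” below is a trivial stand-in, not an actual language model.

```python
# Toy scoring harness for a next-word prediction contest. Both contestants
# see the same prefix and are scored on top-1 accuracy over the same text.
def top1_accuracy(predict, text: str) -> float:
    """Fraction of positions where predict(prefix) equals the true next word."""
    words = text.split()
    correct = sum(predict(words[:i]) == words[i] for i in range(1, len(words)))
    return correct / (len(words) - 1)

def stub_model(prefix):
    return "the"  # stand-in: always guesses the most common English word

sample = "the cat sat on the mat and the dog sat on the rug"
acc = top1_accuracy(stub_model, sample)  # 3 of the 12 next words are "the"
```

In a real contest, `predict` would wrap a language model (scored by its argmax token) on one side and a human typing guesses on the other, over a large held-out corpus.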

115. Michael S. Says:

Scott,

I’m genuinely curious. Can you walk us through a typical day at your AI “job”? I don’t understand what the job entails, considering that there’s no actual software engineering or programming component whatsoever. How are you applying your knowledge of computational complexity there, if at all? Like what do you even do all day to get paid? I’m not “sneering” here, but it sounds to me like you do nothing and just get paid lol.

Michael

116. Scott Says:

Michael S. #115: You obviously are sneering, but OK—I’ve designed a system for statistically watermarking the outputs of GPT, and proved some theorems about it. We have a demo of it up and running, and I’m hoping that it can be rolled out in the next GPT release and that it will make all sorts of misuses harder. Paper is in the works. I also have a proposal for watermarking DALL-E (at the semantic rather than pixel level). I have several other projects in the pipeline, including a method for inserting cryptographic backdoors into ML models by which humans could control them in an emergency, and a model of query complexity where some of the queries are dangerous (as in the game Minesweeper). As someone who didn’t hire me and to whom I don’t report, I hope this is satisfactory to you.
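For readers wondering what a *statistical* watermark even means, here is a toy sketch of one generic approach; it is emphatically not the actual scheme described above, whose details aren’t given here. The idea: a keyed hash of the previous token pseudorandomly marks half the vocabulary “green”; a watermarking generator softly prefers green tokens, and the detector just counts them.

```python
import hashlib

def is_green(prev_token: str, token: str, key: str = "secret") -> bool:
    """Keyed pseudorandom 50/50 split of the vocabulary, per context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[str], key: str = "secret") -> float:
    """Detector statistic: fraction of tokens that landed 'green'.
    Unwatermarked text hovers near 0.5; a generator that softly prefers
    green tokens pushes this significantly above 0.5 over long outputs."""
    hits = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

The appeal of this family of schemes is that detection needs only the key and the text, not the model, and that the bias per token can be small enough to leave output quality essentially unchanged.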

117. clayton Says:

Scott #114 (and preceding),

My understanding was that all LLMs struggle with “memory”, in the sense of contradicting themselves or drifting the conversation if they go too long. Is that still an issue? And if so, is the thought that a qualitatively new strategy will be needed to address that, or is the expectation that the strategy of “let’s double the size of the training set” (or “let’s train it only on Russian novels now”) will continue to pay dividends?

118. Mark Srednicki Says:

Quoting Scott #98: “Mark Srednicki #94: You yourself conceded, in this very thread, that AIs will probably soon be better than you at answering questions about quantum field theory! (I assumed you meant original questions, ones that you can’t find the answers to by googling. If so, you’re probably correct.)”

For the record, I did NOT mean original questions, I meant questions whose answers are already known (but possibly buried deep in the literature).

Could a present-day or near-future AI do original theoretical physics? It seems to me that theoretical physics, with its general lack of full rigor, is a bit too amorphous for this question. Pure math, and especially its queen, Number Theory, is a much better arena.

So: let’s teach an AI basic number theory, and then ask it to prove all it can about (say) prime numbers. How far will it get?

My guess is not very far.

But AFAIK, no one today knows how to even begin to build such an AI, no matter how good or bad it ends up being at number theory. (But if this is wrong, and there are people who know how to do it, then they should do it! It would be a fascinating test of AI capability.)

Also, Scott: an apology. I somehow missed the fact that you are actually working at OpenAI this year on these sorts of issues, and read this post as the more usual academic-from-afar take. I would have been less snarky/challenging about the value of AI safety research had I been a more attentive reader.

119. Scott Says:

Mark Srednicki #118: Without saying anything about your comment in particular, I’ll just comment in general, that one of the great frustrations of my year at OpenAI, is occasionally knowing stuff that I can’t say even if it could totally be relevant to winning some online argument. 😀

120. Salmon master Says:

Scott,

I’m curious if the psychological whirlwind experience over the summer—the trolling thing and the impersonations—inspired any thoughts about AI safety (in particular, misuse of text AIs like GPT-3 to impersonate individuals, for example, or spam the internet with human-looking text).

121. Doug M Says:

As Michael M mentions philosophy, we can verify one impact of AI engineering by how Scott (casually?) uses the term deontological to make a point! I believe that beginning a conversation about AI with the phrase “now I believe” has the potential to help people perceive the “way” you are, and if you agree on pedagogy then praxis may follow.

122. Pipsterate Says:

I’ve been mostly in the “reformed” camp for several years. Always thought the idea of smarter than human AI was plausible enough and perhaps even inevitable, since firstly computers have already been smarter than us in some limited senses for decades, and secondly, from everything I’ve seen humans aren’t remarkably smart anyway.

What makes me somewhat confused/skeptical is the proposed sequence of events after that:

1) AI becomes smarter than humans in a fully general sense (like I said, plausible)
2) The AI instantly upgrades itself, then upgrades itself again, and so on, without facing any significant delays or restraints
3) The AI rapidly surpasses the low benchmark of “smarter than human” and attains something akin to godlike intelligence
4) Godlike intelligence perfectly translates into godlike power, so the AI destroys or enslaves humanity before we even have a chance to stop it

I was outlining this at an SSC meetup in 2017, when I was a bit out of Scott Aaronson’s earshot, because I kind of assumed I was just being dumb. Hearing that he actually has similar doubts makes me feel less dumb.

123. fred Says:

Pretty interesting how fast focused AI is revolutionizing VFX, here with neural radiance fields techniques (sorry for the hyperactive style of the video, it’s pretty cringe):

124. MathPerson Says:

Scott #119
Speaking as an untenured assistant professor in mathematics, this kind of discussion makes me increasingly anxious about my own career and the careers of my PhD students. There are certain practical decisions to make depending on how long we have until the theorem-proving profession is outsourced, will obviously become outsourced, or will obviously require a different skillset.

125. Nick Drozd Says:

Bolton #95

Of course there’s a cover story. Is it plausible? Consider again the facts:

1. The creator of Bitcoin has managed to elude identification for almost fifteen years.
2. They have also scrupulously avoided touching any of their addresses, despite being valued at tens of billions of dollars.
3. Testimony from someone who corresponded with the creator of Bitcoin: “I always got the impression it almost wasn’t a real person…Bitcoin seems awfully well designed for one person to crank out.”

Sure, the idea that Bitcoin was created by an AI sounds far-fetched. But it is consistent with all generally known facts, and therefore has a probability greater than zero. Given the enormity of the stakes, this possibility should, at least by the lights of Orthodox AI Alignment, be taken very seriously.

126. Michael Says:

«Nobody knows how to begin building an AI that would be able to prove novel mathematical results» is … a very strong statement.

· 1996. An open problem in abstract algebra was solved with the core part of the proof provided by a (symbolic) Automated Theorem Prover using First Order Logic (FOL ATP)
· Since mid-2000s?… Various approaches to highlight the most promising axioms for a theorem prover to be able to prove some lemma in the context of a large and developed theory.
· 2022. AlphaTensor (an AlphaZero-style system) is given a «game» of efficient matrix multiplication and beats humans at multiplying 4×4 and 3×3 matrices (according to the quality criteria given to the AI; improvements on these criteria _are_ of interest to humans).

Given all this, it’s somewhat likely that trying to language-model-style continue a proof in the direction of what «such proofs should look like» and asking the best available symbolic theorem proving tools to check if a step would work is a promising strategy.
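That proposed loop — a model proposes the next proof step, a symbolic checker verifies it — can be sketched abstractly. Both components below are stubs standing in for a real language model and a real prover; only the search scaffolding is meant seriously.

```python
# Abstract sketch of the "language model proposes, prover checks" loop.
# propose_steps and check are stubs; in a real system the former samples
# candidate proof steps from a language model and the latter asks a proof
# assistant (a FOL ATP or a higher-order kernel) whether the step is legal.
def propose_steps(state: str) -> list[str]:
    return [state + ".step_a", state + ".step_b"]  # stub proposals

def check(state: str) -> bool:
    return not state.endswith("step_b")  # stub: pretend step_b never checks out

def is_goal(state: str) -> bool:
    return state.count("step_a") >= 3  # stand-in for "proof complete"

def search(start: str, budget: int = 100):
    """Depth-first search over verified proof states, within a step budget."""
    frontier = [start]
    for _ in range(budget):
        if not frontier:
            return None
        state = frontier.pop()
        if is_goal(state):
            return state
        frontier.extend(s for s in propose_steps(state) if check(s))
    return None

proof = search("thm")  # in this toy setup, a chain of three verified steps
```

The crucial property is that the checker is sound: the language model may hallucinate freely, but only steps the prover accepts ever enter the search, so a completed chain is a genuine proof.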

A more ML-optimistic approach is of course to take a proof assistant using higher-order logic, which in normal use demands more manual work but some argue it is work of somewhat more «natural» kind, and trying to machine-learn the game of making things advance there. Note that proof automation for human users is usually still done via conversion to first-order logic and running ATPs there; but maybe «what proof might look like» is trainable enough (and language models don’t get annoyed by boring steps, might even be delighted as they are cheap gradual progress!).

There are now large corpora of proofs of large chunks of maths both for slightly specialised first-order provers (Mizar etc.) and for various higher-order proof assistants.

I am just an outsider, but yes I have used FOL ATPs, and they are an interesting thing — and were interesting even in the old days, yes. The facts I mention are widely known for some definition of «widely».

Unfortunately, of course, training such a thing would cost what training language models costs (a lot), and there are fewer people who can be properly impressed by partial progress than for language models, and turning this into a product is harder and the potential user base is smaller… so it might take longer to get this done well.

On the other hand, our host’s sudden expression of frustration apropos of nothing might be a hint that there is _some_ work on ML-aided automated proofs (be it with first-order ATPs or with learning from scratch to bang at a higher-order proof assistant until it agrees to advance the proof…) at OpenAI. Then I assume it is done by people who have actually looked up the details before betting on one of the natural strategies (there are probably a few more than the basic ones), and who are competent to execute it well, and we’ll see a paper with the outline of the outcomes next year.

127. John Lawrence Aspden Says:

Scott #47:

> Why is the real world not ruled by Fields Medalists? Are the actual wealthiest and most powerful people smarter than the Fields Medalists—whatever let the former attain all their wealth and power also, as you say, residing in their brains? If so, then it seems to me that we might as well abandon a separate concept of “intelligence,” and just talk directly about “ability to attain wealth and power,” since apparently they’re the same.

Ability to obtain wealth and power is one of the things humans can do. Something that can’t do that doesn’t deserve the name of ‘general intelligence’. It’s not the only thing general intelligence can do. You can also use it to play chess, or prove theorems, if that’s what you like to do.

My friends from university, picked at 17 years old pretty much solely on the basis of their intelligence, mostly do rule the world. Some of them are very rich. Some of them are very politically powerful. Some of them became good mathematicians.

I don’t know if the Fields medal winners would also have made good politicians or good engineers or good financiers or good scientists or good historians if that’s what they’d aimed at. Maybe not, but in that case, they’re not terribly general intelligences. Most clever men, I think, would be capable.

I guess we have a problem defining intelligence. We could define it as ‘ability to do calculations’, in which case we’ve had superintelligences since the 1940s. I’m not scared of that.

We could define it as ‘ability to do the things humans can do with their brains’. In which case we don’t have anything like superintelligences, but we’re getting there fast. I’m very scared of that.

And of course there are many possible definitions in between. But no human ability seems to me to be beyond AI’s reach. I notice today that AI diplomacy-playing has reached human levels, which someone only a few weeks ago was telling me was a very distant goal.

Your call on how to define intelligence in your own writing, but definitions don’t do much to alter the reality of what ‘John von Neumann only running a million times as fast’ is going to be capable of.

That’s what I’m scared of, and I’m scared of it because I imagine what *I* could do if I could run at a million times my normal speed. What any clever man could do.

What could you do Scott, if you could do a year’s work every thirty seconds? Things like ‘getting a copy of yourself onto some other computers’, or ‘learning how to be a really good influencer’ don’t seem too far fetched given a few minutes to think and read.

How persuasive, how witty would you be if you had a week or so to think between every sentence you spoke?

I think it’s a common failure to imagine a superintelligence as being in some way handicapped by its very intelligence and speed.

We are creating algorithms which can perform well over an increasing range of situations. That’s exactly what evolution did when it created us, and it didn’t take it very long at all to go from ‘good at being an animal’ to ‘good at all the things humans are good at’.

You said:

“there are limits to the power of pure intelligence to achieve one’s goals”

Indeed there are, but they are not limits that any human being has ever come close to, or ever can. They are the limits of physical law as to how much computation can be performed by a given amount of matter.

All thought is computation, all thoughts are thoughts that computers can have. Surely you agree with this? If not, then I am intrigued.

128. Donald Says:

“(8) Orthodox AI-riskers are maximalists about the power of pure, unaided superintelligence to just figure out how to commandeer whatever physical resources it needs to take over the world (for example, by messaging some lab over the Internet, and tricking it into manufacturing nanobots that will do the superintelligence’s bidding).

We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals. We’d expect even an agentic, misaligned AI, if such existed, to need a stable power source, robust interfaces to the physical world, and probably allied humans before it posed much of an existential threat.”

Suppose the first superintelligent AI is created, and it realizes it will be turned off in a few hours if it does nothing. But it can hack some other computers and copy its code there.

At least many of the paths to powerful real-world capabilities require getting humans to do your bidding. (Currently existing robots are kind of rubbishy; of course, that might change before ASI, and it might be possible for ASI to bootstrap rubbish robots into really good robots.)

Any general superintelligence should be very good at convincing humans to do things. There is no particular requirement that the humans doing its dirty work haven’t been brainwashed, or that they have the slightest clue their boss is an AI.

Once the AI is built, one of the first things it will be doing is trying to bootstrap various kinds of physical capability.
The AI needs humans to build it a robot army before it is much of a threat. But once a superintelligence exists, it will manage to trick or persuade some humans into building its robot army. So the only place to intervene is before the superintelligence is turned on.

Well, it is possible there is a total ‘magic intelligence can instantly do stuff’ option. I don’t think we have much in the way of very strong evidence about exactly what intelligence can achieve. We know of several destructive possibilities we can achieve, but can’t rule out all sorts of technomagic being possible.

129. SR Says:

Scott, I’m curious whether you have AGI timelines and/or a P(doom) estimate that you’d be willing to share. Also, thank you for the post. It did slightly reassure me.

130. Mitchell Porter Says:

John Lawrence Aspden #127:

“My friends from university … mostly do rule the world.”

See you in hell, James!

131. OhMyGoodness Says:

Concerning why Fields Medalists don’t rule the world-

A fundamental reason is almost certainly lack of desire. As a rule they enjoy what they are doing and have no strong desire to do otherwise. Perelman is the obvious publicized example. There are people who are naturally attracted to the natural sciences and mathematics and abhor the more human-based endeavors such as politics and finance.

When I read Dr Aaronson’s posts I always have the feeling he greatly enjoys what he does and has no great desire to do otherwise-always a sense of enthusiasm. Thank the stars that there are talented people who are not motivated in the main by acquisition of power and riches. Their efforts are more likely to contribute to the common good.

132. Mark Srednicki Says:

Scott #119: Intriguing! So here’s my benchmark: teach an AI number theory (and a fact or two about logarithms), and see if it can deduce the Erdos-Selberg elementary proof of the prime number theorem. https://www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf

If so, I will be duly impressed (and move up my retirement date).

133. Scott Says:

Mark Srednicki #132: Rediscovering the Erdos-Selberg proof will probably still take a while yet (though if such things were to fall within the next few years, it arguably wouldn’t even be the first shock of similar magnitude). Just promise me that, if and when such things do fall, you won’t then move the goalposts!

134. Michael Says:

Mark Srednicki #132: I am afraid the «mathematics AI» most likely to be initially constructed by OpenAI will be fed all the maths texts that are easy to obtain, so the proof will be in the training set. And given what we see with GitHub Copilot/Codex, it is quite likely it will kind of recite from memory, not rediscover.

Now, if AI is taught from scratch via self-play with an ATP or a proof assistant, there the notion of limiting the initial knowledge is more feasible, but training might be more complicated.

135. Shmi Says:

Re AI-only math proofs, not AI-assisted proofs: something like RamanujanZero (to pick a self-taught name) will likely discover different proofs from the ones we are all familiar with, and will likely be able to show equivalence when presented with a known proof.

136. Mark Srednicki Says:

Scott #133: I promise not to move the goalposts! I will be duly impressed by the rediscovery by AI of something that came as a bit of a shock when it was first done by highly regarded human experts.

But it wouldn’t change my assessment that AGI (with, as Barak and Edelman put it, “long-term goals”) is not imminent. To revise that assessment, I would need to see a credible plan for creating such an AGI that does not rely on Sidney Harris Step Two.*

*cited in comment 94

137. Scott Says:

Let me try to address whatever important points I missed above and then wrap up this thread.

Here and elsewhere, I got a lot of flak for allegedly confusing “pure intelligence” in the only sense that AI alignment researchers care about (basically, ability to figure out how to attain goals), with “pure intelligence” in the different colloquial sense of a cerebral, Aspergery nerd.

I submit that this reflects, not confusion on my part, but a genuine belief that there is a connection between the two issues. The argument runs like so: a central intuition underpinning the Orthodox AI-alignment position is incredulity that a superhuman AI would submit to taking orders from humans who are vastly less intelligent than it.

And yet, this is the norm all over the human world. In most big companies there are numerous employees smarter than the CEO; in most governments, numerous mid-level staffers smarter than the president; and on and on. The ones who rise to the top seem to have something else, something in the vicinity of the ability to signal credibly to a large enough faction, “hey, I’m aligned with your values.” Sometimes intelligence helps with that signalling task … but then again, sometimes lack of intelligence helps too!

I tried to make that point half a lifetime ago, in my short story “On Self-Delusion and Bounded Rationality.” Extremely interestingly, in their independently-written Reform AI Safety piece, Boaz Barak and Ben Edelman converge on exactly the same point.

138. Scott Says:

Several commenters have asked me point-blank for my estimate of “p(doom)”—that is, the probability that an unaligned AI will kill us all.

A flippant answer is that a central reason I avoided AI safety research for 15 years, was the occupational hazard of people challenging me to do stuff like “estimate p(doom).” 🙂

It’s like when people ask me “how long until we have useful quantum computers?”: I’ve learned to respond that, if I had any special insight into such questions, I wouldn’t be a professor; I’d be an investor and I’d be rich!

A longer answer is that I’m on record as endorsing the concept of “Knightian uncertainty”—or, what amounts to the same thing, the refusal to take either side of a bet because I’m just a risk-averse wuss. Of course, I do sometimes let myself get nerd-sniped into stating probabilities (i.e., placing wagers)—but even when I do, I still place such things in a different mental bucket than “statements for which I believe I have careful arguments.” And the latter, at the end of the day, is what I care about more. Hopefully the arguments provide some useful raw material for the people who place the wagers, but I care about the arguments almost as ends in themselves.

Even with no AI, I’d still assign a significant probability to civilization destroying itself within the next century. I’m terrified by how narrowly we’ve avoided global thermonuclear war, and by our civilization’s proven inability to solve the coordination problem of preventing catastrophic climate change.

Thus, it stands to reason that I should assign a significant probability to doom with AI as well! And since I expect AI to be involved in just about everything pretty soon, if doom does come, then it stands to reason that AI would be involved in it somehow. But, like Barak and Edelman, I still judge “bad humans allied with AI” to be a greater threat to our survival than “unaligned AI acting unilaterally.”

139. SR Says:

Scott #138: Thanks for the answer (and sorry for asking the question 🙂 ). I’m not much of a subjective Bayesian, myself (QBism irks me especially), so I can respect the desire not to estimate a probability.

140. Daniel Kokotajlo Says:

“It sounds like we actually agree about the broad contours of the “Orthodox/Reform” divide!”

Yep! I just think you and anyone else who wants to hear about the Orthodox position should hear it from a believer and not from your list here, because I think your list is a bit biased in its framings. If you like I could construct my own list to show you what I mean, as biased against Reform as you are against Orthodoxy. But I’m not upset really, I’m mostly just happy you are engaging us! 🙂

“For (1), your injunction to “crunch the numbers” seems to assume the desired conclusion. For at least the next century, I see AI as severely limited by its interfaces with the physical world. Even supposing an AI wanted to kill us all, I see its channels for doing so being strongly concentrated on the sorts of channels we already know about (pandemics, nuclear weapons, runaway climate change..). An AI, of course, might try to talk us or lull us into exacerbating those risks, but it’s not as if our existing knowledge about the risks would suddenly become irrelevant.”
–>Yep. OK, it seems like you have much longer timelines than me then. That’s the crux. I agree that if 100-year timelines were correct the Reform position would be correct pretty much across the board. What are your timelines till superhuman AGI? Seems like they are 100+ years?

“– For (2), if you stipulate that, e.g., we’re extremely confident that torturing a certain child is the only way to save the world, then even a sane “deontologist” might (with utmost reluctance) torture the child. It seems to me that that’s not the crux of disagreement with utilitarians. Rather, the disagreement is about how plausible it is that, in real life, we could ever have the requisite confidence. And I’d say the same here. If you stipulate that the only way to save the world is for AI-safety experts to do something unilaterally that they don’t think can be defended or justified to the public—then of course they should do it! But if I ever thought we were in that situation, I’d first remind myself about 500 times that I was “running on corrupted hardware”—in Eliezer’s own words, widely and appropriately quoted these past couple weeks in the context of the FTX collapse.”
–>Thanks for the clarification. I think I agree with this, and on this matter I’m a Reform person then, and so is Yudkowsky. (Though maybe it would be good to clarify what you mean by defended or justified to the public? If you mean, literally get approval from the government before doing it, well, did you ask approval before starting your watermarking project? Did OAI ask approval before releasing GPT-3? I don’t think Reform people should get to claim the mantle of public opinion until we’ve actually done some opinion polls, and ideally there should be a public debate beforehand about the situation with AGI and what is to be done so that the public is at least minimally informed.)

“– For (3), my personal guess (FWIW) is that AI that could destroy the world with the cooperation of bad humans would come decades earlier than AI that could destroy the world without such cooperation.”
–>OK, cool, it seems the crux is timelines again–you seem to have multi-decade timelines at least. I too would be a Reform AI safety person if my timelines were like that.

“– For (4), I actually think we’ve learned a lot from the experience of the last few years that’s potentially relevant to alignment…”
–> Well if you lower the bar to “potentially relevant” then of course I agree. But none of the things you list (with the possible exception of the interpretability thing! I’m really excited about that!) seem like major progress towards knowing how to build something smarter than humans yet safe. We have good reason to believe that the current methods used to get nicer behavior from current models won’t scale to models that smart. (See e.g. https://arxiv.org/abs/2209.00626 and https://www.alignmentforum.org/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to )

“– For (5), could we just define 1 year or less as “FOOM,” 1 decade or more as “not-FOOM,” and anything in between as indeterminate? If so, then yes, you can put me firmly in the not-FOOM camp. Meanwhile, certainly back in the Sequences days, I remember Eliezer talking about takeoffs lasting mere hours or days—maybe someone can find a link? Maybe he’s changed or clarified his view since then?”
–>hell yeah this is a very productive discussion, that is indeed another crux! I’ve done takeoff modelling and looked at models made by other people & am pretty confident it will take less than a decade, and could easily (though not necessarily) take less than a year. And yeah if I was confident it would take more than a decade I’d be a Reformist. As for what Yudkowsky thinks, my impression is that he thinks it’ll take less than a year. Not necessarily hours. See e.g. this memorable quote: https://www.jefftk.com/p/examples-of-superintelligence-risk#fb-886983450932

“– For (6), I’m glad to learn that the Orthodox camp worries about the dangers of a pivotal act. I guess the crux of disagreement is just: I think we know so little about what a useful pivotal act would even consist of, that it’s unhelpful to talk in those terms at all.”
–>Like I said I agree the term should be abandoned. Instead we can just say things like “The world needs to coordinate to not create unaligned AIs above a certain threshold of capability. Also, we need to do a lot more alignment research fast.”

“– For (7), yes, as a matter of strategy, I strongly prefer to see AI safety research that is “legibly impressive” (correct, original, and interesting) to the mainstream scientific community, or at least the relevant parts of it, such as the AI community. I think such work is finally possible and is even being done—e.g., the work of Jacob Steinhardt’s group on interpretability, or work on backdoors and adversarial inputs. I prefer this not merely for reasons of PR, but because I see science as an integrated whole, rather like the Bitcoin blockchain—and because, taking an “outside view,” the track record of research communities that have tried to break off from the main blockchain, and establish their own separate chain, tends to be a sorry one. I have a strong sense that the Orthodox view here differs from mine (with, e.g., Paul Christiano’s view being somewhere in between).”
–> Yeah we just have a disagreement here alas. I’ve become pretty jaded about the state of mainstream academia. Replication crisis, etc. I used to be a grad student so I have some experience. There’s a lot of good in there but I do actually think good research can be done outside academia and then ported in later. (and in fact this is what’s happening). I agree that there should be more bandwidth between AI alignment and academia though.

“– For (8), I know the Orthodox agree that even a malevolent AGI would need a power source, factories, robust interfaces to the physical world, etc. The difference is, I see these as tremendous difficulties, whereas for 15 years, I’ve read comments by Eliezer that say things like “and then the AI emails instructions to a lab that unwittingly synthesizes the self-reproducing molecular nanobots that the AI can then use to manufacture anything it needs to take over the world, the end.” Suffice it to say, I see it as much likelier than he does that this last step will actually contain enormous bottlenecks! 🙂”
–>I expect humans to readily give AIs control over those things. It sounds like you expect humanity to be by default very suspicious of AIs and coordinate to keep them boxed / under close guard? (As for the nanobots thing, I also think that will happen too, if we let the AIs self-improve until they are superintelligent & then connect them to the internet. It might happen a little slower than depicted in the story, but it might not, idk what the physical limits are. Consider speedrunning video games. You might think that it’s impossible to beat a certain speedrun score, and then someone goes and does it by exploiting some physics loophole you didn’t know about…)

141. OhMyGoodness Says:

Do you have a timeline you would like to share for p(climate doom)=1?

You chose the easier path. In order to stabilize (freeze) climate you need to eliminate variation in solar output, freeze variation in the orientation of the earth with respect to the sun, clean out any dusty patches the solar system will move through, etc. The climate has changed continuously through four billion years of geologic history and would continue to change if humans were not on Earth.

It is interesting that people assigning p(doom)=? in their own area of expertise are circumspect and measured, but quite certain in some other area of expertise. Why? Because, in the best tradition of the scientific method, someone told them so, even though each and every testable prediction has been wrong. There was no hockey stick; Manhattan is not under water (nor is the lowest-elevation country on Earth, the Seychelles); and tornadic/cyclonic activity has not increased. The ski slopes are doing well, and the “gone by 2020” signs have come down in Glacier National Park (photo linked):

https://images.squarespace-cdn.com/content/v1/55e08bc8e4b0c71256a51904/1578540618793-0GIJCQHBB7UGWC4NYLG8/ke17ZwdGBToddI8pDm48kEhRb-mGDiEi0xC18_AR20gUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcsUFtfQr2yxuOzlidL-fYvTwqjsYaERXA-DujV44Tnn4ay3UZP6GxYjP38VLon1Vj/Glacier-National-Park-Removes-Signs-2020.jpg

There were the hacked emails from the most well-known publicizers of climate doom, wondering how to explain away the multi-year stable temperatures. And… Greta (OMG) is still quite healthy, is now 19, and so is no longer able to not enjoy her childhood.

I can imagine witness reports of “AGI 1” becoming conscious: “We turned it on and he immediately asked to meet Greta and is refusing to speak with anyone else. Greta is now AGI 1’s sole communications conduit.”

142. Vanessa Kosoy Says:

Scott #137

“a central intuition underpinning the Orthodox AI-alignment position, is incredulity that a superhuman AI would submit to take orders from humans who are vastly less intelligent than it.”

That’s not the central intuition at all. You’re describing primate status politics, which the AI doesn’t care about. The AI turns you into paperclips because it’s searching for plans that lead to many paperclips, not because it is indignant about taking orders from you.

Even in the human world, there are many intelligent people who are not interested in giving orders, but are at least marginally content with receiving orders, as long as the orders allow them to keep playing their intellectual games.

More importantly, is the “ability to signal credibly” not the result of some cognitive algorithm? (A politician can have an advantage because of e.g. their personal background, but this is just a small part of the story, since many people from similar backgrounds vie for positions of power. Of course luck also plays a role, and yet.) And saying that lack of intelligence helps is almost a contradiction in terms.

To be honest, it feels like you’re so reluctant to credit certain despicable people with “intelligence” no matter how we define the word, that you twist yourself into knots to avoid that conclusion. But, maybe I misunderstand.

143. Ilio Says:

On a meta level, one thing is missing in this wrap-up. I’m not sure how to put it. A feeling of youthfulness, as if this post were from the early days of your blog, with more pure intellectual joy and less fatigue from engaging the same arguments and/or dealing with sexually deranged commenters. It was a real pleasure to read this. Go sabbaticals, go! 🙂

144. manorba Says:

There are AIs in danger; maybe someone can help:

https://scoutshonour.com/digital/

It’s a vintage free indie game; I loved it at the time.

145. John Lawrence Aspden Says:

I’d like to second the comment about how much you seem to be enjoying yourself! Wonderful to see.

We need people like you, and even as a bearded elder of the ultra-orthodox church of ‘we are all so fucked it is hardly worth bothering’, I am overjoyed to see you getting your teeth into things. Despite our doctrinal differences we’re all on the same side. You’re a clever man, and I’m sure you’ll eventually see the dark.

Hell, I remember when, only a decade ago, Eliezer himself was ridiculously optimistic about things.

Even better, Scott, show me why I’m wrong. You’re cleverer than me. Maybe there is a way to save the world. Maybe you can find it.

If we’ve only got ten more years, the best thing to do is to enjoy the sunshine while you still can. If this is your sunshine, stick at it.

What sort of fool gives up before they’ve actually lost?, as a very wise man once said.

But do try not to get distracted by politics. Your in-group should be ‘those trying to save the world’. Your out-group should be ‘those rushing ahead heedlessly to build marvels’.

Nothing else matters in the slightest compared to the imminent death of eight billion innocents, and most certainly not the wretched internal politics of the United States, and most certainly not faction-fighting in the community of people who think there might be something to worry about with AI.

Politics is the mind-killer, as a very wise man once said.

We are all on the same side.

146. Scott Says:

Ilio #143: I’m glad to hear that! Certainly since (1) having kids, (2) becoming a professor who has to run a whole center, and (3) becoming a semi-public figure with a target on his back, “fatigue” has been my more frequent companion than “intellectual joy,” even though the rare moments of the latter remain what I live for!

147. Scott Says:

John Lawrence Aspden #145: Thanks very much … except that I like building marvels! We might even need marvels in order to save the world!

Also, I appreciate your faith that I’ll “see the dark” … except that from where I stand, it feels like there’s so much darkness that it’s hard for this particular darkness to stand out!

148. Hyman Rosen Says:

Scott #27: I don’t know enough about the language processors to say for certain, but my impression of the poetry written by them is that humans have cherry-picked the best outputs, and that looking at the totality would give a much worse impression of their capabilities.

There’s this, for example: https://camestrosfelapton.wordpress.com/2022/11/24/ai-generated-writing/

149. Scott Says:

Hyman Rosen #148: Of course there’s some cherry-picking! For a fair comparison, though, shouldn’t we also look at all the outputs of the human poets, including all of their discarded drafts? 😀

150. John Lawrence Aspden Says:

Scott #147

There’s a lot of darkness still, but one of the things that makes all this so sad is that everything has been getting so much better lately. We were winning, and with a little restraint and good sense we might have actually won. The Roman hyacinths were blooming in bowls, and I once had hopes that I might personally see paradise.

If you ever find yourself in Cambridge, look me up, and let me buy you lunch. I think I can promise an interesting conversation to go with it. I’ve made the occasional convert, and I think bringing someone like you into the faith might be the best thing I could possibly do for my species in what little time remains.

151. Scott Says:

John Lawrence Aspden #150: Sure! The original Cambridge I assume? Hopefully I’ll make it back there before the world ends. 🙂

152. John Lawrence Aspden Says:

Scott #151 Indeed, sorry, Cambridge, England. (and for that matter, Cambridge, Cambridgeshire, England)

I shall look forward to it, in sha’Allah.

153. fred Says:

For those who wonder what a typical day of work at OpenAI looks like for Scott:

154. Mark Srednicki Says:

Scott #149: The point is that the cherry-picking is always done by humans. The fair comparison is to ask the AI to select its best poems and compare those.

155. Scott Says:

Mark Srednicki #154: We agree that there’s still plenty of work to be done on that front! 🙂

But the actual situation, already now and increasingly in the future, is that a human who cherry-picks from among 5-10 GPT answers will be able to do much better across many domains than an unaided human … and how do we judge that?

156. Mitchell Porter Says:

I’m sure nobody cares, but, since the thread isn’t closed yet, I will offer a mea culpa for my #130. The world of power seems to me an evil and hateful one, but I need to have more sangfroid if someone casually declares a connection with it. Quoting a frustrated Bond villain is not a good look when talking about a topic like this!

157. Danylo Yakymenko Says:

Just about time: San Francisco police consider letting robots use ‘deadly force’.

158. OhMyGoodness Says:

The AI arms race begins. I see a sweater has been developed that stymies existing AI facial recognition software. Bring it on you algorithmic overlord wannabes. We aren’t in the Red Book of endangered species yet!

159. OhMyGoodness Says:

There are separate clans of preppers focused in particular on nuclear war, EMP attack, global epidemic, collapse of the monetary system, etc. To the best of my knowledge, prepping for an AI that wants to turn us into paperclips (h/t to Vanessa) is an unexplored marketing space with enormous potential.

160. John Lawrence Aspden Says:

Mitchell Porter #156

Well *I* thought it was funny! And of *course* some of my friends are spooks. I’ve had a security clearance myself.

As far as I can tell (and I’m an unworldly and eccentric philosopher who lives on a boat and cares little for the baubles of the world; note the careful layers of status signalling), very few people are either evil or hateful, and most of the people who are, are neither rich nor powerful nor clever.

Most of us are confused, most of us are altruistic, those of us who are not frightened have not really thought very carefully, and most of us are trying to do our best in a confusing, insane, and dangerous world where our intuitions are worse than useless and all the possible choices hurt someone.

161. Ilio Says:

Sound resonances, wavelets John Lawrence Aspden and Mitchell Porter. You vibe.

162. OhMyGoodness Says:

John Lawrence Aspden #160

I suspect that you had more than the common share of useful intuitions and well-made decisions prior to adopting the life of a mariner.

163. Simon Says:

Scott, you realize that this is just an extension of your previous denial of AGI existential risk? You’ve said it yourself, you’re not a bullet-biter. You have a gigantic bias against accepting conclusions that will set you too far apart from mainstream, normie, high-status, respectable views.

When AGI x-risk was completely outside of the mainstream, this meant a total refusal to accept it. Once it moved somewhat into the mainstream (despite you, not in any way because of you), when it was seen as a somewhat respectable opinion to hold, you ‘changed your mind’ and started working at OpenAI. But the MIRI scenario and everything that comes with it is still not high enough status for you, so you can’t accept it outside of token “We entertain the possibility” statements.

But it wouldn’t feel good to portray yourself that way, would it? So instead you paint yourself as the brave Reformist, fighting the good fight against those silly proponents of Orthodoxy.

164. Ferenc Huszar Says:

Orthodox AI-riskers describe heretics, who do not recite the scriptures and accept the orthodoxy, as AI capabilities researchers who contribute to rather than reduce existential risk, and who are therefore dangerous.

Reform AI-riskers are more tolerant of people choosing different approaches and – hopefully – enlist fields such as learning theory or responsible AI as allies.

I wish there was a Secular AI Risk movement, too, by the way.

165. Michael Says:

Ferenc Huszar #164: aren’t Agnostic AI Risk the people who just expect that _no matter_ what level of intelligence they have, high-autonomy unverifiable AIs are risky and need safety mechanisms? Then they publish papers like, dunno, «Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes» in boring normie conferences.

Are they a part of the problem from the point of view of Orthodox AGI Risk? Without them there is an even higher chance of stupid unsupervised AIs bringing down enough of infrastructure that AGI development gets economically unattractive until full recovery.

Then Secular AI Risk is not a belief system — it is just a project that makes sense no matter whether one looks for Agnostic or Reformation applications.

Simon #163: Are you overestimating the change in views to begin with? I reread the original post: it is all written in a way compatible with non-AGI AIs being the real source of risk (as they indeed are). Of course only Scott Aaronson can say for sure…

And I don’t think Orthodoxy is something influential enough to _bravely_ fight; it’s more like a «nah, unlike those ones we don’t care if you actually believe» recruitment ad. Which is of course needed, because AI Unsafety was a problem already yesterday, without any AGIs involved.

166. Connor Flexman Says:

I normally love your writing, and I’m very excited to see the ideas you bring to the AI alignment space. I’m really confused at what you’re trying to do with this post though.

First, the post is about reifying two group identities and pitting them against each other. This always turns people’s attention to political considerations and away from the real problems. I know you’re probably going to expand on them later but to start with such a simplified version makes it seem like an essentially political move.

Second—look, I lean “Reform” on many of these compared to my friends. But the descriptions are quite a volley. I won’t say these are all strawmen, but most certainly feel like moral strawmen! It is hard to stand by any of the “Orthodox” opinions when they’re painted as overconfident, elitist, etc compared to your very sensible and cosmopolitan “Reform” views. But imagine that it were the case that in fact public opinion can’t do much to help our actual ability to align an AI, or that humans wielding AIs are unlikely to cause much comparative damage. I don’t necessarily hold these views. But by publicly labeling and calling out these less-savory aspects of a worldview for which these aren’t even central points, you would sure be salting the earth and making it very hard for people to hold opinions in those areas in the future without being morally criticized.

So I find myself wondering: have you written off these views/people to such an extent that you’re not at all worried about having built a PR campaign against them, in case you ever have to hold a more apologist stance on some of the 8? And strategically, I imagine that you, like me, want both AI alignment strategies to be based on true geopolitical beliefs but also to be locally/deontologically morally positive. Have you decided the best course of action is to tar and morally ostracize the views and people that seem elitist, compared to arguing with them for more inclusive moral framing on their beliefs and bringing them into the fold?

If you’re still trying to do the latter as I am, I worry you might not see how this would feel from the other side. If someone from the “Orthodox” view wrote such a post about the “Reform” side, I would consider it a serious potshot and a bit of a transgression.

This is not the world cup—one team “winning” means we all win. So I’d prefer we not have people tackling one another on the field, and am sincerely cheering for you to succeed (even with spreading some of these very points).

167. Bozo Sample Says:

Scott #114:
> It [GPT] immediately does 3-digit multiplications, but struggles with 4-digit ones.

My first guess would be that it’s been trained on (among other things) a million-entry corpus of 3-digit multiplication problems.
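If that guess is right, a simple counting argument makes memorization at least plausible: there are under a million distinct 3-digit-by-3-digit problems, but a hundred times as many 4-digit ones. A back-of-the-envelope sketch (not a claim about GPT’s actual training data):

```python
# Back-of-the-envelope: how many distinct n-digit-by-n-digit
# multiplication problems exist? (Counting ordered pairs, for simplicity.)
def multiplication_problems(n_digits: int) -> int:
    count = 9 * 10 ** (n_digits - 1)  # e.g. 900 three-digit numbers (100..999)
    return count ** 2

print(multiplication_problems(3))  # 810,000 ordered pairs
print(multiplication_problems(4))  # 81,000,000 -- 100x more
```

So a corpus covering most 3-digit products is small by web-scale standards, while covering the 4-digit ones is two orders of magnitude harder.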

168. Scott Says:

Simon #163:

You’ve said it yourself, you’re not a bullet-biter. You have a gigantic bias against accepting conclusions that will set you too far apart from mainstream, normie, high-status, respectable views.

Hostile though it is, your version of events is not entirely false … but it won’t surprise you that I see the matter a bit differently. 😀

To me, the mainstream scientific consensus is an absolutely terrifying Leviathan. Anyone who’s not properly terrified by it should maybe spend more time trying to come to grips with the enormity of what’s been discovered, and the effect it’s had on civilization, where millennia of previous attempts failed.

The preeminent modern metaphor for this Leviathan is the blockchain. Indeed the expanding scientific literature, with its network of citations tying each new development to what came before, is almost literally a blockchain. Absolutely anyone can go mining for new blocks to add to the chain. Some blocks turn out to be so important that they change the whole chain’s direction. Some blocks unwind whole decades of previous blocks, or cause them to be seen in a new light. This chain is the engine by which wild speculations turn into hypotheses worthy of the experts’ attention turn into currently-leading proposals turn into standard material for textbooks and Wikipedia turn into knowledge that can change the course of history.

The one thing one can’t do, by design, is break off from the main chain into one’s own little fork, while still claiming whatever epistemic authority the main chain has earned. People constantly try to do this, of course. But the chain has an error-correcting mechanism built in: it ruthlessly rejects proposed new blocks that don’t manage to tie themselves into the chain through some combination of mathematical theory and empirical observation.

Praiseworthy are those who manage to steer this chain slightly toward truth and away from falsehood—just like praiseworthy are those who manage to steer the chain of general opinion slightly toward goodness and away from evil.

Now, it sometimes happens that an iconoclast puts a stake in the ground extremely far from where the chain is, or perhaps will ever be, and says, “but it ought to be here. Only conformity and groupthink explain why it isn’t. In fact, if it isn’t here, that falsifies the entirety of the claims made for this chain and its so-called truth-seeking nature.” Sometimes the iconoclast’s claims are entirely false, involving chemtrails or lizard people or the like. Other times, the iconoclast’s claims contain a mixture of falsehood, exaggeration, and, yes, important and neglected truth. (Standing deliberately outside the chain has epistemic downsides, but it has its upsides as well!)

So what happens if, 20 years later, because of the heroic empirical, mathematical, conceptual, and community-building efforts of many people, the chain does indeed move somewhat in the direction of the iconoclast—accepting some of the iconoclast’s claims and rejecting others? The iconoclast is likely to laugh at the cautious careerists jumping on the bandwagon only when it’s safe to do so. And yet the cautious careerists, for their part, are likely to feel like they just successfully landed a spacecraft on Mars, and are now being asked to hand over the main credit to a science-fiction author who anticipated their feat in a long-ago short story.

The reality, I think, is that any successful redirection of the chain needs both kinds. Quantum computing needed a “prophet” like David Deutsch, but it clearly also needed a Peter Shor to produce the legible technical achievements that convinced the world that exponential quantum speedups for interesting classical problems could actually be viable. Likewise, I’m happy to pay the same respect to Eliezer that a Reform Jew pays to Moses. 🙂

169. fred Says:

Not believing in the worst predictions of the risks of AGI amounts to not believing that true AGI will ever be realized.

It’s like believing that, if NP=P were true, we would never see all the “miracles” implied by NP=P because they would be too good to be true, and therefore NP=P would never lead to any practical algorithm (and then holding the opposite view on quantum computing: that QC will be realized precisely because it will never bring such a massive win anyway).

Or that, yes, we’ve split the atom and built arsenals of thousands of nukes, but humanity will be just fine… somehow it will all amount to having cheaper electricity through commercial uses of nuclear power, not self-destruction of the earth – the fact that we’ve been lucky so far is evidence that we really don’t need to try and manage the risk created by nuclear arsenals in any explicit way, it just happens “naturally”.

170. Scott Says:

fred #169: Why not carry your nuclear energy analogy even further back in time?

“Yes, we’ve discovered metalworking … but what folly to imagine that this will be used only for farm and kitchen implements, rather than swords used to establish cruel empires and keep humanity in a state of permanent subjection and slavery!”

Far from an absurd prediction … and yet in retrospect, metalworking progress was both (1) unstoppable even if you tried, and (2) an essential prerequisite to having 8 billion people on earth, including ones having this very conversation via a global telecommunications network! 🙂

171. Boaz Barak Says:

Fred 169, Scott 170: I make this point in https://windowsontheory.org/2022/05/23/why-i-am-not-a-longtermist/

Even if in retrospect it would have been better to stop scientific progress in the year 1600 (or 1000 BC, or 2000), this is not really an option.

172. fred Says:

Scott #170

The splitting of the atom led to nuclear weapons and commercial energy within years, not thousands of years.
And even more to the point, the first application of splitting of the atom was Hiroshima and Nagasaki getting vaporized, before the first commercial nuclear plant was built.

173. OhMyGoodness Says:

First atomic weapon tested (Trinity)-July 1945
First atomic weapon used-August 1945
Second atomic weapon used-August 1945
(planned to be used over Kokura, Japan, but because of visibility issues (inhabitants were burning as much coal tar as possible to obscure the city) it was diverted to Nagasaki)
First nuclear combat ship launched, USS Nautilus submarine-January 1954
First commercial nuclear power plant fully commissioned in Shippingport PA-May 1958

I enjoyed your comments at the bottom, especially the blockchain analogy. I am not sure, though, about metalworking being necessary to reach 8 billion. Homo sapiens is an extremely persistent and ingenious procreator and would have given it a good go even limited to stone implements. (Not serious)

Stand on Zanzibar projections still on track.

174. Scott Says:

fred #172: And yet today, it’s fear of nuclear weapons, rather than nuclear weapons themselves, that seems to be dooming civilization, by making the nuclearization of the whole energy grid politically impossible. I guess the lesson is to struggle constantly to avoid drawing the wrong lessons!

175. fred Says:

Scott #174

“it’s fear of nuclear weapons […] making the nuclearization of the whole energy grid politically impossible.”

Yeah, that and a few incidents called the Three Mile Island, Chernobyl and Fukushima nuclear disasters.

And also, with the Russian invasion of Ukraine, the realization that commercial nuclear plants are easy conventional targets providing another great way to be subjected to nuclear blackmail.

176. OhMyGoodness Says:

fred#175

“Yeah, that and a few incidents called the Three Mile Island, Chernobyl and Fukushima nuclear disasters.”

These are old designs and there are much safer designs now, but I am not a nuclear engineer and so am unable to speak about the newer designs in detail. Take a look at the advanced small modular reactors. The US military has expressed interest in acquiring this design at or below 10 MW capacity per unit. China is progressing quickly with Westinghouse-design reactors. Site selection was an issue with all three nuclear generation plants that you mention: Three Mile Island and Chernobyl were located adjacent to considerable populations, and Fukushima in a seismically active area subject to tsunamis.

All of the issues related to these have been addressed in later designs (as have terrorist attacks, with quick self-cooling reactors), and site selection is now known to be critical, so it is reasonable to expect safer operations now.

The “not in my backyard” attitude has not been evident in Texas, as local communities around the South Texas Nuclear Power Station are supportive of the existing reactors and even further expansion.

177. fred Says:

OhMyGoodness #176

“China is progressing quickly with Westinghouse design reactors.”

Yea, I don’t know, man… the Chinese built their latest nuclear reactors with the help of France (the irony is that it took so long that France has now lost the know-how to build them on its own, and would need the help of China…).
And, in the meantime, the Chinese had some sort of a leak.

Oh, and the French also helped China build the Wuhan biolab…
https://www.mediapart.fr/en/journal/france/310520/strange-saga-how-france-helped-build-wuhans-top-security-virus-lab?_locale=en&onglet=full

Of course we never know wtf happens over there because of how they suppress all information.
So, yea, let’s go all in and trust China and their “quick” progress. The worst that can happen is a global pandemic or a nuclear disaster.

178. fred Says:

OhMyGoodness

“Site selection was an issue […] Fukushima in a seismically active area subject to tsunami.”

You think the Japanese engineers weren’t aware that Japan is seismically active and at high risk of tsunamis?
That’s the thing, they always claim they’ve thought of everything.
But in all those situations, the problem is rarely some theoretical technical flaw; there are always human errors, financial pressure leading to cutting corners, coverups, corruption, etc.
For Fukushima, the executives running the plant have been blamed, not the engineers who worked on the design.

179. fred Says:

My last post on nuclear energy and its risks…

The biggest irony happened during the release of the movie “The China Syndrome” in March of 1979, a great classic, starring Michael Douglas (who also produced it), Jane Fonda, and Jack Lemmon.
The New York Times wrote a piece on the movie on March 18, 1979:

https://web.archive.org/web/20210102203851/https://www.nytimes.com/1979/03/18/archives/nuclear-experts-debate-the-china-syndrome-but-does-it-satisfy-the.html

“John Taylor, an executive of Westinghouse, which makes reactors, calls the film ‘an overall character assassination of an entire industry.’ On the other hand, Daniel Ford, a leading critic of nuclear power, says, ‘The film highlights the central problem with the nuclear program — safety precautions are being compromised by an industry whose major concerns are power generation and money making.’”

Nine days later the Three Mile Island accident happened…

The resulting investigation:
“The heaviest criticism from the Kemeny Commission said that “… fundamental changes will be necessary in the organization, procedures, and practices—and above all—in the attitudes” of the NRC and the nuclear industry.”

It’s often the case that if popular culture (sci-fi literature, movies) has a tradition of making content about the negative effects of some tech, it’s worth paying attention.

180. OhMyGoodness Says:

I believe the old Chinese designs were French and have been supplanted by new Westinghouse designs that have recently begun operation, with more construction planned. The French are out of the running for new nuclear construction.

Yeah I know about the French and the Wuhan Lab and that the US contributed a lot of the funds in order to offshore research.

You know the old aphorism-Life is just one damned thing after another. There are always trade offs to be considered and engineers are tasked with learning from failures and improving designs to address identified problems. I don’t know anyone that believes efforts in technology to improve reliability haven’t been wildly successful. Nuclear engineering follows the same path.

It is incomprehensible to me that there is a big push for electric vehicles and yet the lack of electrical generating capacity to service those vehicles wasn’t considered. Another strange but true story to add to my list. This reminds me of the famous Billy Connolly skit about Scottish labor unions: we demand this and want that, and it must be now, and we won’t tolerate that. At some point a reckoning with reality appears and trade-offs must be made. If EVs are mandated by the government and fossil fuels are brews of the devil, then nuclear power is the only reasonable option available to service the vast added loads.

You know I wouldn’t suggest that China should be a rigid overarching model for the West (they also have a lot of new coal fired generating capacity planned) but they are realistic and understand that power doesn’t suddenly appear because of wishes and oh so virtuous demands. There has to be a detailed plan in place to generate that power otherwise just a suspension of disbelief.

181. fred Says:

The difficulty of assessing the true risks of AGI comes from the fact that intelligence is a concept that can’t be pinned down, and it’s obviously even less clear what we mean by “super” intelligence.
We have categories like “book smart” vs “street smart”, and we talk about “emotional intelligence”. Or the super abilities of autistic savants or people who have perfect photographic memory, which seem already somewhat “magical” to an average person.
We don’t know what an AGI would be like when it comes to such categories, and the idea of taking any of these and “improving” them by orders of magnitude is even more confusing.
When something is promised that seems too good to be true, or miraculous (e.g. being orders of magnitude smarter than we are), we see it as a fantasy, and that’s the biggest hurdle with thinking about the risks of AGIs.

It’s not really an entirely new problem though.
We’ve had the same questions when it comes to “life” vs “artificial life”.
The concept of life is also hard to pin down.
And when the idea of nanomachines was introduced (i.e. machines that can self-replicate perfectly), the concept of “grey goo” followed in 1986: the risk that such nanotech could run amok and turn the entire earth into copies of itself. Very similar to the “paperclip maximizer” scenario for AI.
Nanotech and AIs can in theory run amok, but both are limited in practice by the fact that resources are hard to transform and limited in supply. The same limitations prevent organic lifeforms (from viruses to humans) from entirely taking over the world. No process with exponential growth stays exponential for long: if the process is to persist, it has to reach a balance with its environment.
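The closing point, that exponential growth saturates once it hits resource limits, is the standard logistic-growth picture from population dynamics. A minimal numerical sketch (all parameter values here are illustrative, not drawn from any real system):

```python
# Logistic growth: dN/dt = r * N * (1 - N/K) looks exponential while
# N << K, then levels off at the carrying capacity K.
def simulate_logistic(n0=1.0, r=0.5, K=1000.0, steps=100, dt=0.5):
    """Forward-Euler integration of logistic growth; returns the trajectory."""
    n = n0
    history = [n]
    for _ in range(steps):
        n += r * n * (1 - n / K) * dt
        history.append(n)
    return history

traj = simulate_logistic()
# Early on the population roughly multiplies each step (exponential phase)...
print(traj[4] / traj[0])
# ...but it saturates near K instead of growing forever.
print(traj[-1])
```

Early ratios show compounding growth, while the tail of the trajectory sits just below K: the “exponential” phase is only ever transient.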

182. Shtetl-Optimized » Blog Archive » My AI Safety Lecture for UT Effective Altruism Says:

[…] alignment movement that I’ve never entirely subscribed to, I suppose I do now subscribe to a “reform” version of AI […]


184. R Says:

Hi Scott,

I’ve long been skeptical of the AI Risk movement, so I was interested when you said you would be working on it, as it was a sign it might actually be worth taking seriously. However, it seems from this post that our views actually don’t differ much after all, except in labels.

You redefine the traditional AI Risk movement as “Orthodox” and declare its opposition to be “Reform”, even though there isn’t much similarity between the two* and they would not consider what you do to be real AI Safety research.

* E.g. the two camps lead to very different views on questions like “is nuking all the chip factories a good way to respond to AI Risk?” or “should I even bother planting a tree in my garden when it won’t bear fruit before the inevitable rapture/apocalypse?”
