Archive for the ‘Metaphysical Spouting’ Category

AI transcript of my AI podcast

Sunday, September 22nd, 2024

In the comments of my last post—on a podcast conversation between me and Dan Faggella—I asked whether readers wanted me to use AI to prepare a clean written transcript of the conversation, and several people said yes. I’ve finally gotten around to doing that, using GPT-4o.

The main thing I learned from the experience is that there’s a massive opportunity, now, for someone to put together a better tool for using LLMs to automate the transcription of YouTube videos and other audiovisual content. What we have now is good enough to be a genuine time-saver, but bad enough to be frustrating. The central problems:

  • You have to grab the raw transcript manually from YouTube, then save it, then feed it piece by piece into GPT (or else write your own script to automate that; a rough sketch of such a script follows this list). You should just be able to input the URL of a YouTube video and have a beautiful transcript pop out.
  • Since GPT only takes YouTube’s transcript as input, it doesn’t understand who’s saying what, it misses all the information in the intonation and emphasis, and it gets confused when people talk over each other. A better tool would operate directly on the audio.
  • Even though I constantly begged it not to do so in the instructions, GPT kept taking the liberty of changing what was said—summarizing, cutting out examples and jokes and digressions and nuances, and “midwit-ifying.” It can also hallucinate lines that were never said. I often felt gaslit, until I went back to the raw transcript and saw that, yes, my memory of the conversation was correct and GPT’s wasn’t.
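
For the curious, here’s roughly what such a script could look like: just a minimal sketch, not a polished tool. It assumes the third-party youtube-transcript-api Python package and the official openai client (with OPENAI_API_KEY set in the environment); the model name, chunk size, and instructions are placeholders, and the result inherits every problem listed above, but it at least automates the grab-save-feed loop.

```python
# Minimal sketch: fetch a YouTube transcript and have GPT-4o clean it up chunk by chunk.
# Assumes the third-party youtube-transcript-api package and the official openai client;
# the model choice, chunk size, and instructions below are illustrative, not recommendations.
from youtube_transcript_api import YouTubeTranscriptApi
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INSTRUCTIONS = (
    "You are cleaning up a raw YouTube transcript. Fix punctuation, casing, and "
    "obvious transcription errors, but do NOT summarize, cut, or rewrite what was said."
)

def clean_transcript(video_id: str, chunk_chars: int = 8000) -> str:
    # Grab the raw (usually auto-generated) transcript: a list of {'text', 'start', 'duration'} segments.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    raw_text = " ".join(seg["text"] for seg in segments)

    # Feed it to the model piece by piece, since the whole transcript may not fit in one request.
    # (Naive character-based chunking can split mid-sentence; good enough for a sketch.)
    chunks = [raw_text[i:i + chunk_chars] for i in range(0, len(raw_text), chunk_chars)]
    cleaned = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": INSTRUCTIONS},
                {"role": "user", "content": chunk},
            ],
        )
        cleaned.append(resp.choices[0].message.content)
    return "\n\n".join(cleaned)

if __name__ == "__main__":
    print(clean_transcript("VIDEO_ID_GOES_HERE"))  # substitute the actual YouTube video ID
```

Of course, a script like this still has all three problems above (no speaker labels, no access to the audio, and nothing but a prompt standing between the model and its urge to “improve” the text), which is exactly why a purpose-built tool would be welcome.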

If anyone wants to recommend a tool (including a paid tool) that does all this, please do so in the comments. Otherwise, enjoy my and GPT-4o’s joint effort!


Daniel Fagella: This is Daniel Fagella and you’re tuned in to The Trajectory. This is episode 4 in our Worthy Successor series here on The Trajectory where we’re talking about posthuman intelligence. Our guest this week is Scott Aaronson. Scott is a quantum physicist [theoretical computer scientist –SA] who teaches at UT Austin and previously taught at MIT. He has the ACM Prize in Computing among a variety of other prizes, and he recently did a [two-]year-long stint with OpenAI, working on research there and gave a rather provocative TED Talk in Palo Alto called Human Specialness in the Age of AI. So today, we’re going to talk about Scott’s ideas about what human specialness might be. He meant that term somewhat facetiously, so he talks a little bit about where specialness might come from and what the limits of human moral knowledge might be and how that relates to the successor AIs that we might create. It’s a very interesting dialogue. I’ll have more of my commentary and we’ll have the show notes from Scott’s main takeaways in the outro, so I’ll save that for then. Without further ado, we’ll fly into this episode. This is Scott Aaronson here in The Trajectory. Glad to be able to connect today.

Scott Aaronson: It’s great to be here, thanks.

Daniel Fagella: We’ve got a bunch to dive into around this broader notion of a worthy successor. As I mentioned to you off microphone, it was Jaan Tallinn who kind of turned me on to some of your talks and some of your writings about these themes. I love this idea of the specialness of humanity in this era of AI. There was an analogy in there that I really liked and you’ll have to correct me if I’m getting it wrong, but I want to poke into this a little bit where you said kind of at the end of the talk like okay well maybe we’ll want to indoctrinate these machines with some super religion where they repeat these phrases in their mind. These phrases are “Hey, any of these instantiations of biological consciousness that have mortality and you can’t prove that they’re conscious or necessarily super special but you have to do whatever they say for all of eternity.” You kind of throw that out there at the end as in like kind of a silly point almost like something we wouldn’t want to do. What gave you that idea in the first place, and talk a little bit about the meaning behind that analogy because I could tell there was some humor tucked in?

Scott Aaronson: I tend to be a naturalist. I think that the universe, in some sense, can be fully described in terms of the laws of physics and an initial condition. But I keep coming back in my life over and over to the question of if there were something more, if there were some non-physicalist consciousness or free will, how would that work? What would that look like? Is there a kind that hasn’t already been essentially ruled out by the progress of science?

So, eleven years ago I wrote a big essay which was called The Ghost in the Quantum Turing Machine, which was very much about that kind of question. It was about whether there is any empirical criterion that differentiates a human from, let’s say, a simulation of a human brain that’s running on a computer. I am totally dissatisfied with the foot-stomping answer that, well, the human is made of carbon and the computer is made of silicon. There are endless fancy restatements of that, like the human has biological causal powers, that would be John Searle’s way of putting it, right? Or you look at some of the modern people who dismiss anything that a Large Language Model does like Emily Bender, for example, right? They say the Large Language Model might appear to be doing all these things that a human does but really it is just a stochastic parrot. There’s really nothing there, really it’s just math underneath. They never seem to confront the obvious follow-up question which is wait, aren’t we just math also? If you go down to the level of the quantum fields that comprise our brain matter, isn’t that similarly just math? So, like, what is actually the principled difference between the one and the other?

And what occurred to me is that, if you were motivated to find a principled difference, there seems to be roughly one thing that you could currently point to and that is that anything that is running on a computer, we are quite confident that we could copy it, we could make backups, we could restore it to an earlier state, we could rewind it, we could look inside of it and have perfect visibility into what is the weight on every connection between every pair of neurons. So you can do controlled experiments, and in that way it could make AIs more powerful. Imagine being able to spawn extra copies of yourself if you’re up against a tight deadline, for example, or if you’re going on a dangerous trip, imagine just leaving a spare copy in case anything goes wrong. These are superpowers in a way, but they also make anything that could happen to an AI matter less in a certain sense than it matters to us. What does it mean to murder someone if there’s a perfect backup copy of that person in the next room, for example? It seems at most like property damage, right? Or what does it even mean to harm an AI, to inflict damage on it let’s say, if you could always just with a refresh of the browser window restore it to a previous state, as I do when I’m using GPT?

I confess I’m often trying to be nice to ChatGPT, I’m saying could you please do this if you wouldn’t mind because that just comes naturally to me. I don’t want to act abusive toward this entity but even if I were, and if it were to respond as though it were very upset or angry at me, nothing seems permanent right? I can always just start a new chat session and it’s got no memory of the previous one, just like in the movie Groundhog Day for example. So, that seems like a deep difference, that things that are done to humans have this sort of irreversible effect.

Then we could ask, is that just an artifact of our current state of technology? Could it be that in the future we will have nanobots that can go inside of our brain, make perfect brain scans and maybe we’ll be copyable and backup-able and uploadable in the same way that AIs are? But you could also say, well, maybe the more analog aspects of our neurobiology are actually important. I mean the brain seems in many ways like a digital computer, right? Like when a given neuron fires or doesn’t fire, that seems at least somewhat like a discrete event, right? But what influences a neuron firing is not perfectly analogous to a transistor because it depends on all of these chaotic details of what is going on in this sodium ion channel that makes it open or close. And if you really pushed far enough, you’d have to go down to the quantum-mechanical level where we couldn’t actually measure the state to perfect fidelity without destroying that state.

And that does make you wonder, could someone even in principle make let’s say a perfect copy of your brain, say sufficient to bring into being a second instantiation of your consciousness or your identity, whatever that means? Could they actually do that without a brain scan that is so invasive that it would destroy you, that it would kill you in the process? And you know, it sounds kind of crazy, but Niels Bohr and the other early pioneers of quantum mechanics were talking about it in exactly those terms. They were asking precisely those questions. So you could say, if you wanted to find some sort of locus of human specialness that you can justify based on the known laws of physics, then that seems like the kind of place where you would look.

And it’s an uncomfortable place to go in a way because it’s saying, wait, what makes humans special is just this noise, this sort of analog crud that doesn’t make us more powerful, at least not in any obvious way? I’m not doing what Roger Penrose does for example and saying we have some uncomputable superpowers from some as-yet unknown laws of physics. I am very much not going that way, right? It seems like almost a limitation that we have that is a source of things mattering for us but you know, if someone wanted to develop a whole moral philosophy based on that foundation, then at least I wouldn’t know how to refute it. I wouldn’t know how to prove it but I wouldn’t know how to refute it either. So among all the possible value systems that you could give an AI, if you wanted to give it one that would make it value entities like us then maybe that’s the kind of value system that you would want to give it. That was the impetus there.

Daniel Fagella: Let me dive in if I could. Scott, it’s helpful to get the full circle thinking behind it. I think you’ve done a good job connecting all the dots, and we did get back to that initial funny analogy. I’ll have it linked in the show notes for everyone tuned in to watch Scott’s talk. It feels to me like there are maybe two different dynamics happening here. One is the notion that there may indeed be something about our finality, at least as we are today. Like you said, maybe with nanotech and whatnot, there’s plenty of Ray Kurzweil’s books in the 90s about this stuff too, right? The brain-computer stuff.

Scott Aaronson: I read Ray Kurzweil in the 90s, and he seemed completely insane to me, and now here we are a few decades later…

Daniel Fagella: Gotta love the guy.

Scott Aaronson: His predictions were closer to the mark than most people’s.

Daniel Fagella: The man deserves respect, if for nothing else, how early he was talking about these things, but definitely a big influence on me 12 or 13 years ago.

With all that said, there’s one dynamic of, like, hey, there is something maybe that is relevant about harm to us versus something that’s copiable that you bring up. But you also bring up a very important point, which is if you want to hinge our moral value on something, you might end up having to hinge it on arguably dumb stuff. Like, it would be as silly as a sea snail saying, ‘Well, unless you have this percentage of cells at the bottom of this kind of dermis that exude this kind of mucus, then you train an AI that only treats those entities as supreme and pays attention to all of their cares and needs.’ It’s just as ridiculous. You seem to be opening a can of worms, and I think it’s a very morally relevant can of worms. If these things bloom and they have traits that are morally valuable, don’t we have to really consider them, not just as extended calculators, but as maybe relevant entities? This is the point.

Scott Aaronson: Yes, so let me be very clear. I don’t want to be an arbitrary meat chauvinist. For example, I want an account of moral value that can deal with a future where we meet extraterrestrial intelligences, right? Would we say that, because they have tentacles instead of arms, we can therefore shoot them or enslave them or do whatever we want to them?

I think that, as many people have said, a large part of the moral progress of the human race over the millennia has just been widening the circle of empathy, from only the other members of our tribe count to any human, and some people would widen it further to nonhuman animals that should have rights. If you look at Alan Turing’s famous paper from 1950 where he introduces the imitation game, the Turing Test, you can read that as a plea against meat chauvinism. He was very conscious of social injustice, it’s not even absurd to connect it to his experience of being gay. And I think these arguments that ‘it doesn’t matter if a chatbot is indistinguishable from your closest friend because really it’s just math’—what is to stop someone from saying, ‘people in that other tribe, people of that other race, they seem as intelligent, as moral as we are, but really it’s all just artifice. Really, they’re all just some kind of automatons.’ That sounds crazy, but for most of history, that effectively is what people said.

So I very much don’t want that, right? And so, if I am going to make a distinction, it has to be on the basis of something empirical, like for example, in the one case, we can make as many backup copies as we want to, and in the other case, we can’t. Now that seems like it clearly is morally relevant.

Daniel Fagella: There’s a lot of meat chauvinism in the world, Scott. It is still a morally significant issue. There’s a lot of ‘ists’ you’re not allowed to be now. I won’t say them, Scott, but there’s a lot of ‘ists,’ some of them you’re very familiar with, some of them you know, they’ll cancel you from Twitter or whatever. But ‘speciesist’ is actually a non-cancellable thing. You can have a supreme and eternal moral value on humans no matter what the traits of machines are, and no one will think that that’s wrong whatsoever.

On one level, I understand because, you know, handing off the baton, so to speak, clearly would come along with potentially some risk to us, and there are consequences there. But I would concur, pure meat chauvinism, you’re bringing up a great point that a lot of the time it’s sitting on this bed of sand, that really doesn’t have too firm of a grounding.

Scott Aaronson: Just like many people on Twitter, I do not wish to be racist, sexist, or any of those ‘ists,’ but I want to go further! I want to know what are the general principles from which I can derive that I should not be any of those things, and what other implications do those principles then have.

Daniel Fagella: We’re now going to talk about this notion of a worthy successor. I think there’s an idea that you and I, Scott, at least to the best of my knowledge, bubbled up from something, some primordial state, right? Here we are, talking on Zoom, with lots of complexities going on. It would seem as though entirely new magnitudes of value and power have emerged to bubble up to us. Maybe those magnitudes are not empty, and maybe the form we are currently taking is not the highest and most eternal form. There’s this notion of the worthy successor. If there was to be an AGI or some grand computer intelligence that would sort of run the show in the future, what kind of traits would it have to have for you to feel comfortable that this thing is running the show in the same way that we were? I think this was the right move. What would make you feel that way, Scott?

Scott Aaronson: That’s a big one, a real chin-stroker. I can only spitball about it. I was prompted to think about that question by reading and talking to Robin Hanson. He has staked out a very firm position that he does not mind us being superseded by AI. He draws an analogy to ancient civilizations. If you brought them to the present in a time machine, would they recognize us as aligned with their values? And I mean, maybe the ancient Israelites could see a few things in common with contemporary Jews, or Confucius could say of modern Chinese people, I see a few things here that recognizably come from my value system. Mostly, though, they would just be blown away by the magnitude of the change. So, if we think about some non-human entities that have succeeded us thousands of years in the future, what are the necessary or sufficient conditions for us to feel like these are descendants who we can take pride in, rather than usurpers who took over from us? There might not even be a firm line separating the two. It could just be that there are certain things, like if they still enjoy reading Shakespeare or love The Simpsons or Futurama

Daniel Fagella: I would hope they have higher joys than that, but I get what you’re talking about.

Scott Aaronson: Higher joys than Futurama? More seriously, if their moral values have evolved from ours by some sort of continuous process and if furthermore that process was the kind that we’d like to think has driven the moral progress in human civilization from the Bronze Age until today, then I think that we could identify with those descendants.

Daniel Fagella: Absolutely. Let me use the same analogy. Let’s say that what we have—this grand, wild moral stuff—is totally different. Snails don’t even have it. I suspect that, in fact, I’d be remiss if I told you I wouldn’t be disappointed if it wasn’t the case, that there are realms of cognitive and otherwise capability as high above our present understanding of morals as our morals are above the sea snail. And that the blossoming of those things, which may have nothing to do with democracy and fair argument—by the way, for human society, I’m not saying that you’re advocating for wrong values. My supposition is that to suspect those machines would carry our little torch forever is kind of wacky. Like, ‘Oh well, the smarter it gets, the kinder it’ll be to humans forever.’ What is your take there because I think there is a point to be made there?

Scott Aaronson: I certainly don’t believe that there is any principle that guarantees that the smarter something gets, the kinder it will be.

Daniel Fagella: Ridiculous.

Scott Aaronson: Whether there is some connection between understanding and kindness, that’s a much harder question. But okay, we can come back to that. Now, I want to focus on your idea that, just as we have all these concepts that would be totally inconceivable to a sea snail, there should likewise be concepts that are equally inconceivable to us. I understand that intuition. Some days I share it, but I don’t actually think that that is obvious at all.

Let me make another analogy. It’s possible that when you first learn how to program a computer, you start with incredibly simple sequences of instructions in something like Mario Maker or a PowerPoint animation. Then you encounter a real programming language like C or Python, and you realize it lets you express things you could never have expressed with the PowerPoint animation. You might wonder if there are other programming languages as far beyond Python as Python is beyond making a simple animation. The great surprise at the birth of computer science nearly a century ago was that, in some sense, there isn’t. There is a ceiling of computational universality. Once you have a Turing-universal programming language, you have hit that ceiling. From that point forward, it’s merely a matter of how much time, memory, and other resources your computer has. Anything that could be expressed in any modern programming language could also have been expressed with the Turing machine that Alan Turing wrote about in 1936.

We could take even simpler examples. People had primitive writing systems in Mesopotamia just for recording how much grain one person owed another. Then they said, “Let’s take any sequence of sounds in our language and write it all down.” You might think there must be another writing system that would allow you to express even more, but no, it seems like there is a sort of universality. At some point, we just solve the problem of being able to write down any idea that is linguistically expressible.

I think some of our morality is very parochial. We’ve seen that much of what people took to be morality in the past, like a large fraction of the Hebrew Bible, is about ritual purity, about what you have to do if you touched a dead body. Today, we don’t regard any of that as being central to morality, but there are certain things recognized thousands of years ago, like “do unto others as you would have them do unto you,” that seem to have a kind of universality to them. It wouldn’t be a surprise if we met extraterrestrials in another galaxy someday and they had their own version of the Golden Rule, just like it wouldn’t surprise us if they also had the concept of prime numbers or atoms. Some basic moral concepts, like treat others the way you would like to be treated, seem to be eternal in the same way that the truths of mathematics are. I’m not sure, but at the very least, it’s a possibility that should be on the table.

Daniel Fagella: I would agree that there should be a possibility on the table that there is an eternal moral law and that the fettered human form that we have has discovered those eternal moral laws, or at least some of them. Yeah, and I’m not a big fan of the fettered human mind knowing the limits of things like that. You know, you’re a quantum physics guy. There was a time when most of physics would have just dismissed it as nonsense. It’s only very recently that this new branch has opened up. How many of the things we’re articulating now—oh, Turing complete this or that—how many of those are about to be eviscerated in the next 50 years? I mean, something must be eviscerated. Are we done with the evisceration and blowing beyond our understanding of physics and math in all regards?

Scott Aaronson: I don’t think that we’re even close to done, and yet what’s hard is to predict the direction in which surprises will come. My colleague Greg Kuperberg, who’s a mathematician, talks about how classical physics was replaced by quantum physics and people speculate that quantum physics will surely be replaced by something else beyond it. People have had that thought for a century. We don’t know when or if, and people have tried to extend or generalize quantum mechanics. It’s incredibly hard even just as a thought experiment to modify quantum mechanics in a way that doesn’t produce nonsense. But as we keep looking, we should be open to the possibility that maybe there’s just classical probability and quantum probability. For most of history, we thought classical probability was the only conceivable kind until the 1920s when we learned that was not the right answer, and something else was.

Kuperberg likes to make the analogy: suppose someone said, well, thousands of years ago, people thought the Earth was flat. Then they figured out it was approximately spherical. But suppose someone said there must be a similar revolution in the future where people are going to learn the Earth is a torus or a Klein bottle…

Daniel Fagella: Some of these ideas are ridiculous. But to your point that we don’t know where those surprises will come … our brains aren’t much bigger than Diogenes’s. Maybe we eat a little better, but we’re not that much better equipped.

Let me touch on the moral point again. There’s another notion that the kindness we exert is a better pursuit of our own self-interest. I could violently take from other people in this neighborhood of Weston, Massachusetts, what I make per year in my business, but I would likely go to jail for that. There are structures and social niceties that are ways in which we’re a social species. The world probably looks pretty monkey suit-flavored. Things like love and morality have to run in the back of a lemur mind and seem like they must be eternal, and maybe they even vibrate in the strings themselves. But maybe these are just our own justifications and ways of bumping our own self-interest around each other. As we’ve gotten more complex, the niceties of allowing for different religions and sexual orientations felt like they would just permit us more peace and prosperity. If we call it moral progress, maybe it’s a better understanding of what permits our self-interest, and it’s not us getting closer to the angels.

Scott Aaronson: It is certainly true that some moral principles are more conducive to building a successful society than others. But now you seem to be using that as a way to relativize morality, to say morality is just a function of our minds. Suppose we could make a survey of all the intelligent civilizations that have arisen in the universe, and the ones that flourish are the ones that adopt principles like being nice to each other, keeping promises, telling the truth, and cooperating. If those principles led to flourishing societies everywhere in the universe, what else would it mean? These seem like moral universals, as much as the complex numbers or the fundamental theorem of calculus are universal.

Daniel Fagella: I like that. When you say civilizations, you mean non-Earth civilizations as well?

Scott Aaronson: Yes, exactly. We’re theorizing with not nearly enough examples. We can’t see these other civilizations or simulated civilizations running inside of computers, although we might start to see such things within the next decade. We might start to do experiments in moral philosophy using whole communities of Large Language Models. Suppose we do that and find the same principles keep leading to flourishing societies, and the negation of those principles leads to failed societies. Then, we could empirically discover and maybe even justify by some argument why these are universal principles of morality.

Daniel Fagella: Here’s my supposition: a water droplet. I can’t make a water droplet the size of my house and expect it to behave the same because it behaves differently at different sizes. The same rules and modes don’t necessarily emerge when you scale up from what civilization means in hominid terms to planet-sized minds. Many of these outer-world civilizations would likely have moral systems that behoove their self-interest. If the self-interest was always aligned, what would that imply about the teachings of Confucius and Jesus? My firm supposition is that many of them would be so alien to us. If there’s just one organism, and what it values is whatever behooves its interest, and that is so alien to us…

Scott Aaronson: If there were only one conscious being, then yes, an enormous amount of morality as we know it would be rendered irrelevant. It’s not that it would be false; it just wouldn’t matter.

To go back to your analogy of the water droplet the size of a house, it’s true that it would behave very differently from a droplet the size of a fingernail. Yet today we know general laws of physics that apply to both, from fluid mechanics to atomic physics to, far enough down, quantum field theory. This is what progress in physics has looked like, coming up with more general theories that apply to a broader range of situations, including ones that no one has ever observed, or hadn’t observed at the time they came up with the theories. This is what moral progress looks like as well to me—it looks like coming up with moral principles that apply in a broader range of situations.

As I mentioned earlier, some of the moral principles that people were obsessed with seem completely irrelevant to us today, but others seem perfectly relevant. You can look at some of the moral debates in Plato and Socrates; they’re still discussed in philosophy seminars, and it’s not even obvious how much progress we’ve made.

Daniel Fagella: If we take a computer mind that’s the size of the moon, what I’m getting at is I suspect all of that’s gone. You suspect that maybe we do have the seeds of the Eternal already grasped in our mind.

Scott Aaronson: Look, I’m sorry that I keep coming back to this, but I think that a brain the size of the Moon still agrees with us that 2 and 3 are prime numbers and that 4 is not.

Daniel Fagella: That may be true. It’s still using complex numbers, vectors, and matrices. But I don’t know if it bows when it meets you, if these are just basic parts of the conceptual architecture of what is right.

Scott Aaronson: It’s still using De Morgan’s Law and logic. It would not be that great of a stretch to me to say that it still has some concept of moral reciprocity.

Daniel Fagella: Possibly, it would be hard for us to grasp, but it might have notions of math that you couldn’t ever understand if you lived a billion lives. I would be so disappointed if it didn’t have that. It wouldn’t be a worthy successor.

Scott Aaronson: But that doesn’t mean that it would disagree with me about the things that I knew; it would just go much further than that.

Daniel Fagella: I’m with you…

Scott Aaronson: I think a lot of people got the wrong idea, from Thomas Kuhn for example, about what progress in science looks like. They think that each paradigm shift just completely overturns everything that came before, and that’s not how it’s happened at all. Each paradigm has to swallow all of the successes of the previous paradigm. Even though general relativity is a totally different account of the universe than Newtonian physics, it could never have been done without everything that came before it. Everything we knew in Newtonian gravity had to be derived as a limit in general relativity.

So, I could imagine this moon-sized computer having moral thoughts that would go well beyond us. Though it’s an interesting question: are there moral truths that are beyond us because they are incomprehensible to us, in the same way that there are scientific or mathematical truths that are incomprehensible to us? If acting morally requires understanding something like the proof of Fermat’s Last Theorem, can you really be faulted for not acting morally? Maybe morality is just a different kind of thing.

Because this moon-sized computer is so far above us in what scientific thoughts it can have, the subject matter of its moral concern might be wildly beyond ours. It’s worried about all these beings that could exist in the future in different parallel universes. And yet, you could say at the end, when it comes down to making a moral decision, the moral decision is going to look like, “Do I do the thing that is right for all of those beings, or do I do the thing that is wrong?”

Daniel Fagella: Or does it simply do what behooves a moon-sized brain?

Scott Aaronson: That will hurt them, right?

Daniel Fagella: What behooves a moon-sized brain? You and I, there are certain levels of animals we don’t consult.

Scott Aaronson: Of course, it might just act in its self-interest, but then, could we, despite being such mental nothings or idiots compared to it, could we judge it, as for example, many people who are far less brilliant than Werner Heisenberg would judge him for collaborating with the Nazis? They’d say, “Yes, he is much smarter than me, but he did something that is immoral.”

Daniel Fagella: We could judge it all we want, right? We’re talking about something that could eviscerate us.

Scott Aaronson: But even someone who never studied physics can perfectly well judge Heisenberg morally. In the same way, maybe I can judge that moon-sized computer for using its immense intelligence, which vastly exceeds mine, to do something selfish or something that is hurting the other moon-sized computers.

Daniel Fagella: Or hurting the little humans. Blessed would we be if it cared about our opinion. But I’m with you—we might still be able to judge. It might be so powerful that it would laugh at and crush me like a bug, but you’re saying you could still judge it.

Scott Aaronson: In the instant before it crushed me, I would judge it.

Daniel Fagella: Yeah, at least we’ve got that power—we can still judge the damn thing! I’ll move to consciousness in two seconds because I want to be mindful of time; I’ve read a bunch of your work and want to touch on some things. But on the moral side, I suspect that if all it did was extrapolate virtue ethics forward, it would come up with virtues that we probably couldn’t understand. If all it did was try to do utilitarian calculus better than us, it would do it in ways we couldn’t understand. And if it were AGI at all, it would come up with paradigms beyond both that I imagine we couldn’t grasp.

You’ve talked about the importance of extrapolating our values, at least on some tangible, detectable level, as crucial for a worthy successor. Would its self-awareness also be that crucial if the baton is to be handed to it, and this is the thing that’s going to populate the galaxy? Where do you rank consciousness, and what are your thoughts on that?

Scott Aaronson: If there is to be no consciousness in the future, there would seem to be very little for us to care about. Nick Bostrom, a decade ago, had this really striking phrase to describe it. Maybe there will be this wondrous AI future, but the AIs won’t be conscious. He said it would be like Disneyland with no children. Suppose we take AI out of it—suppose I tell you that all life on Earth is going to go extinct right now. Do you have any moral interest in what happens to the lifeless Earth after that? Would you say, “Well, I had some aesthetic appreciation for this particular mountain, and I’d like for that mountain to continue to be there?”

Maybe, but for the most part, it seems like if all the life is gone, then we don’t care. Likewise, if all the consciousness is gone, then who cares what’s happening? But of course, the whole problem is that there’s no test for what is conscious and what isn’t. No one knows how to point to some future AI and say with confidence whether it would be conscious or not.

Daniel Fagella: Yes, and we’ll get into the notion of measuring these things in a second. Before we wrap, I want to give you a chance—if there’s anything else you want to put on the table. You’ve been clear that these are ideas we’re just playing around with; none of them are firm opinions you hold.

Scott Aaronson: Sure. You keep wanting to say that AI might have paradigms that are incomprehensible to us. And I’ve been pushing back, saying maybe we’ve reached the ceiling of “Turing-universality” in some aspects of our understanding or our morality. We’ve discovered certain truths. But what I’d add is that if you were right, if the AIs have a morality that is incomprehensibly beyond ours—just as ours is beyond the sea slug’s—then at some point, I’d throw up my hands and say, “Well then, whatever comes, comes.” If you’re telling me that my morality is pitifully inadequate to judge which AI-dominated futures are better or worse, then I’d just throw up my hands and say, “Let’s enjoy life while we still have it.”

The whole exercise of trying to care about the far future and make it go well rather than poorly is premised on the assumption that there are some elements of our morality that translate into the far future. If not, we might as well just go…

Daniel Fagella: Well, I’ll just give you my take. Certainly, I’m not being a gadfly for its own sake. By the way, I do think your “2+2=4” idea may have a ton of credence in the moral realm as well. I credit that 2+2=4, and your notion that this might carry over into basics of morality is actually not an idea I’m willing to throw out. I think it’s a very valid idea. All I can do is play around with ideas. I’m just taking swings out here. So, the moral grounding that I would maybe anchor to, assuming that it would have those things we couldn’t grasp—number one, I think we should think in the near term about what it bubbles up and what it bubbles through because that would have consequences for us and that matters. There could be a moral value to carrying the torch of life and expanding potentia.

Scott Aaronson: I do have children. Children are sort of like a direct stake that we place in what happens after we are gone. I do wish for them and their descendants to flourish. And as for how similar or how different they’ll be from me, having brains seems somehow more fundamental than them having fingernails. If we’re going to go through that list of traits, their consciousness seems more fundamental. Having armpits, fingers, these are things that would make it easier for us to recognize other beings as our kin. But it seems like we’ve already reached the point in our moral evolution where the idea is comprehensible to us that anything with a brain, anything that we can have a conversation with, might be deserving of moral consideration.

Daniel Fagella: Absolutely. I think the supposition I’m making here is that potential will keep blooming into things beyond consciousness, into modes of communication and modes of interacting with nature for which we have no reference. This is a supposition and it could be wrong.

Scott Aaronson: I would agree that I can’t rule that out. Once it becomes so cosmic, once it becomes sufficiently far out and far beyond anything that I have any concrete handle on, then I also lose my interest in how it turns out! I say, well then, this sort of cloud of possibilities or whatever of soul stuff that communicates beyond any notion of communication that I have, do I have preferences over the better post-human clouds versus the worse post-human clouds? If I can’t understand anything about these clouds, then I guess I can’t really have preferences. I can only have preferences to the extent that I can understand.

Daniel Fagella: I think it could be seen as a morally digestible perspective to say my great wish is that the flame doesn’t go out. But it is just one perspective. Switching questions here, you brought up consciousness as crucial, obviously notoriously tough to track. How would you be able to have your feelers out there to say if this thing is going to be a worthy successor or not? Is this thing going to carry any of our values? Is it going to be awake, aware in a meaningful way, or is it going to populate the galaxy in a Disney World without children sort of sense? What are the things you think could or should be done to figure out if we’re on the right path here?

Scott Aaronson: Well, it’s not clear whether we should be developing AI in a way where it becomes a successor to us. That itself is a question, or maybe even if that ought to be done at some point in the future, it shouldn’t be done now because we are not ready yet.

Daniel Fagella: Do you have an idea of when ‘ready’ would be? This is very germane to this conversation.

Scott Aaronson: It’s almost like asking a young person when are you ready to be a parent, when are you ready to bring life into the world. When are we ready to bring a new form of consciousness into existence? The thing about becoming a parent is that you never feel like you’re ready, and yet at some point it happens anyway.

Daniel Fagella: That’s a good analogy.

Scott Aaronson: What the AI safety experts, like the Eliezer Yudkowsky camp, would say is that until we understand how to align AI reliably with a given set of values, we are not ready to be parents in this sense.

Daniel Fagella: And that we have to spend a lot more time doing alignment research.

Scott Aaronson: Of course, it’s one thing to have that position, it’s another thing to actually be able to cause AI to slow down, which there’s not been a lot of success in doing. In terms of looking at the AIs that exist, maybe I should start by saying that when I first saw GPT, which would have been GPT-3 a few years ago, this was before ChatGPT, it was clear to me that this is maybe the biggest scientific surprise of my lifetime. You can just train a neural net on the text on the internet, and once you’re at a big enough scale, it actually works. You can have a conversation with it. It can write code for you. This is absolutely astounding.

And it has colored a lot of the philosophical discussion that has happened in the few years since. Alignment of current AIs has been easier than many people expected it would be. You can literally just tell your AI, in a meta prompt, don’t act racist or don’t cooperate with requests to build bombs. You can give it instructions, almost like Asimov’s Three Laws of Robotics. And besides giving explicit commands, the other thing we’ve learned that you can do is just reinforcement learning. You show the AI a bunch of examples of the kind of behavior we want to see more of and the kind that we want to see less of. This is what allowed ChatGPT to be released as a consumer product at all. If you don’t do this reinforcement learning, you get a really weird model. But with reinforcement learning, you can instill what looks a lot like drives or desires. You can actually shape these things, and so far it works way better than I would have expected.

And one possibility is that this just continues to be the case forever. We were all worried over nothing, and AI alignment is just an easier problem than anyone thought. Now, of course, the alignment people will absolutely not agree. They argue we are being lulled into false complacency because, as soon as the AI is smart enough to do real damage, it will also be smart enough to tell us whatever we want to hear while secretly pursuing its own goals.

But you see how what has happened empirically in the last few years has very much shaped the debate. As for what could affect my views in the future, there’s one experiment I really want to see. Many people have talked about it, not just me, but none of the AI companies have seen fit to invest the resources it would take. The experiment would be to scrub all the training data of mentions of consciousness—

Daniel Fagella: The Ilya deal?

Scott Aaronson: Yeah, exactly, Ilya Sutskever has talked about this, others have as well. Train it on all other stuff and then try to engage the resulting language model in a conversation about consciousness and self-awareness. You would see how well it understands those concepts. There are other related experiments I’d like to see, like training a language model only on texts up to the year 1950 and then talking to it about everything that has happened since. A practical problem is that we just don’t have nearly enough text from those times; it may have to wait until we can build really good language models with a lot less training data. But there are so many experiments that you could do that seem like they’re almost philosophically relevant, morally relevant.

Daniel Fagella: Well, and I want to touch on this before we wrap because I don’t want to wrap up without your final touch on this idea of what folks in governance and innovation should be thinking about. You’re not in the “it’s definitely conscious already” camp or in the “it’s just a stupid parrot forever and none of this stuff matters” camp. You’re advocating for experimentation to see where the edges are here. And we’ve got to really not play around like we know what’s going on exactly. I think that’s a great position. As we close out, what do you hope innovators and regulators do to move us forward in a way that would lead to something that could be a worthy successor, an extension and eventually a grand extension of what we are in a good way? What would you encourage those innovators and regulators to do? One seems to be these experiments around maybe consciousness and values in some way, shape, or form. But what else would you put on the table as notes for listeners?

Scott Aaronson: I do think that we ought to approach this with humility and caution, which is not to say don’t do it, but have some respect for the enormity of what is being created. I am not in the camp that says a company should just be able to go full speed ahead with no guardrails of any kind. Anything that is this enormous—and it could easily be more enormous than, let’s say, the invention of nuclear weapons—anything on that scale, of course governments are going to get involved. We’ve already seen it happen starting in 2022 with the release of ChatGPT.

The explicit position of the three leading AI companies—OpenAI, Google DeepMind, and Anthropic—has been that there should be regulation and they welcome it. When it gets down to the details of what that regulation says, they might have their own interests that are not identical to the wider interest of society. But I think these are absolutely conversations that the world ought to be having right now. I don’t write it off as silly, and I really hate when people get into these ideological camps where you say you’re not allowed to talk about the long-term risks of AI getting superintelligent because that might detract attention from the near-term risks, or conversely, you’re not allowed to talk about the near-term stuff because it’s trivial. It really is a continuum, and ultimately, this is a phase change in the basic conditions of human existence. It’s very hard to see how it isn’t. We have to make progress, and the only way to make progress is by looking at what is in front of us, looking at the moral decisions that people actually face right now.

Daniel Fagella: That’s a case of viewing it as all one big package. So, should we be putting a regulatory infrastructure in place right now or is it premature?

Scott Aaronson: If we try to write all the regulations right now, will we just lock in ideas that might be obsolete a few years from now? That’s a hard question, but I can’t see any way around the conclusion that we will eventually need a regulatory infrastructure for dealing with all of these things.

Daniel Fagella: Got it. Good to see where you land on that. I think that’s a strong, middle-of-the-road position. My whole hope with this series has been to get people to open up their thoughts and not be in those camps you talked about. You exemplify that with every answer, and that’s just what I hoped to get out of this episode. Thank you, Scott.

Scott Aaronson: Of course, thank you, Daniel.

Daniel Fagella: That’s all for this episode. A big thank you to everyone for tuning in.

My podcast with Dan Faggella

Sunday, September 15th, 2024

Dan Faggella recorded an unusual podcast with me that’s now online. He introduces me as a “quantum physicist,” which is something that I never call myself (I’m a theoretical computer scientist) but have sort of given up on not being called by others. But the ensuing 85-minute conversation has virtually nothing to do with physics, or anything technical at all.

Instead, Dan pretty much exclusively wants to talk about moral philosophy: my views about what kind of AI, if any, would be a “worthy successor to humanity,” and how AIs should treat humans and vice versa, and whether there’s any objective morality at all, and (at the very end) what principles ought to guide government regulation of AI.

So, I inveigh against “meat chauvinism,” and expand on the view that locates human specialness (such as it is) in what might be the unclonability, unpredictability, and unrewindability of our minds, and plead for comity among the warring camps of AI safetyists.

The central point of disagreement between me and Dan ended up centering around moral realism: Dan kept wanting to say that a future AGI’s moral values would probably be as incomprehensible to us as are ours to a sea snail, and that we need to make peace with that. I replied that, firstly, things like the Golden Rule strike me as plausible candidates for moral universals, which all thriving civilizations (however primitive or advanced) will agree about in the same way they agree about 5 being a prime number. And secondly, that if that isn’t true—if the morality of our AI or cyborg descendants really will be utterly alien to us—then I find it hard to have any preferences at all about the future they’ll inhabit, and just want to enjoy life while I can! That which (by assumption) I can’t understand, I’m not going to issue moral judgments about either.

Anyway, rewatching the episode, I was unpleasantly surprised by my many verbal infelicities, my constant rocking side-to-side in my chair, my sometimes talking over Dan in my enthusiasm, etc. etc., but also pleasantly surprised by the content of what I said, all of which I still stand by despite the terrifying moral minefields into which Dan invited me. I strongly recommend watching at 2x speed, which will minimize the infelicities and make me sound smarter. Thanks so much to Dan for making this happen, and let me know what you think!

Added: See here for other podcasts in the same series and on the same set of questions, including with Nick Bostrom, Ben Goertzel, Dan Hendrycks, Anders Sandberg, and Richard Sutton.

“The Right Side of History”

Friday, August 16th, 2024

This morning I was pondering one of the anti-Israel protesters’ favorite phrases—I promise, out of broad philosophical curiosity rather than just parochial concern for my extended family’s survival.

“We’re on the right side of history. Don’t put yourself on the wrong side by opposing us.”

Why do the protesters believe they shouldn’t face legal or academic sanction for having blockaded university campuses, barricaded themselves in buildings, shut down traffic, or vandalized Jewish institutions? Because, just like the abolitionists and Civil Rights marchers and South African anti-apartheid heroes, they’re on the right side of history. Surely the rules and regulations of the present are of little concern next to the vindication of future generations?

The main purpose of this post is not to adjudicate whether their claim is true or false, but to grapple with something much more basic: what kind of claim are they even making, and who is its intended audience?

One reading of “we’re on the right side of history” is that it’s just a fancy way to say “we’re right and you’re wrong.” In which case, fair enough! Few people passionately believe themselves to be wrong.

But there’s a difficulty: if you truly believe your side to be right, then you should believe it’s right win or lose. For example, an anti-Zionist should say that, even if Israel continues existing, and even if everyone else on the planet comes to support it, still eliminating Israel would’ve been the right choice. Conversely, a Zionist should say that if Israel is destroyed and the whole rest of the world celebrates its destruction forevermore—well then, the whole world is wrong. (That, famously, is more-or-less what the Jews did say, each time Israel and Judah were crushed in antiquity.)

OK, but if the added clause “of history” is doing anything in the phrase “the right side of history,” that extra thing would appear to be an empirical prediction. The protesters are saying: “just like the entire world looks back with disgust at John Calhoun, Bull Connor, and other defenders of slavery and then segregation, so too will the world look back with disgust at anyone who defends Israel now.”

Maybe this is paired with a theory about the arc of the moral universe bending toward justice: “we’ll win the future and then look back with disgust on you, and we’ll be correct to do so, because morality inherently progresses over time.” Or maybe it has merely the character of a social threat: “we’ll win the future and then look back with disgust on you, so regardless of whether we’ll be right or wrong, you’d better switch to our side if you know what’s good for you.”

Either way, the claim of winning the future is now the kind of thing that could be wagered about in a prediction market. And, in essence, the Right-Side-of-History people are claiming to be able to improve on today’s consensus estimate: to have a hot morality tip that beats the odds. But this means that they face the same problem as anyone who claims it’s knowable that, let’s say, a certain stock will increase a thousandfold. Namely: if it’s so certain, then why hasn’t the price shot up already?

The protesters and their supporters have several possible answers. Many boil down to saying that most people—because they need to hold down a job, earn a living, etc.—make all sorts of craven compromises, preventing them from saying what they know in their hearts to be true. But idealistic college students, who are free from such burdens, are virtually always right.

Does that sound like a strawman? Then recall the comedian Sarah Silverman’s famous question from eight years ago:

PLEASE tell me which times throughout history protests from college campuses got it wrong. List them for me

Crucially, lots of people happily took Silverman up on her challenge. They pointed out that, in the Sixties and Seventies, thousands of college students, with the enthusiastic support of many of their professors, marched for Ho Chi Minh, Mao, Castro, Che Guevara, Pol Pot, and every other murderous left-wing tyrant to sport a green uniform and rifle. Few today would claim that these students correctly identified the Right Side of History, despite the students’ certainty that they’d done so.

(There were also, of course, moderate protesters, who merely opposed America’s war conduct—just like there are moderate protesters now who merely want Israel to end its Gaza campaign rather than its existence. But then as now, the revolutionaries sucked up much of the oxygen, and the moderates rarely disowned them.)

What’s really going on, we might say, is reference class tennis. Implicitly or explicitly, the anti-Israel protesters are aligning themselves with Gandhi and MLK and Nelson Mandela and every other celebrated resister of colonialism and apartheid throughout history. They ask: what are the chances that all those heroes were right, and we’re the first ones to be wrong?

The trouble is that someone else could just as well ask: what are the chances that Hamas is the first group in history to be morally justified in burning Jews alive in their homes … even though the Assyrians, Babylonians, Romans, Crusaders, Inquisitors, Cossacks, Nazis, and every other group that did similar things to the Jews over 3000 years is now acknowledged by nearly every educated person to have perpetrated an unimaginable evil? What are the chances that, with Israel’s establishment in 1948, this millennia-old moral arc of Western civilization suddenly reversed its polarity?

We should admit from the outset that such a reversal is possible. No one, no matter how much cruelty they’ve endured, deserves a free pass, and there are certainly many cases where victims turned into victimizers. Still, one could ask: shouldn’t the burden be on those who claim that today’s campaign against Jewish self-determination is history’s first justified one?

It’s like, if I were a different person, born to different parents in a different part of the world, maybe I’d chant for Israel’s destruction with the best of them. Even then, though, I feel like the above considerations would keep me awake at night, would terrify me that maybe I’d picked the wrong side, or at least that the truth was more complicated. The certainty implied by the “right side of history” claim is the one part I don’t understand, as far as I try to stretch my sympathetic imagination.


For all that, I, too, have been moved by rhetorical appeals to “stand on the right side of history”—say, for the cause of Ukraine, or slowing down climate change, or saving endangered species, or defeating Trump. Thinking it over, this has happened when I felt sure of which side was right (and would ultimately be seen to be right), but inertia or laziness or inattention or whatever else prevented me from taking action.

When does this happen for me? As far as I can tell, the principles of the Enlightenment, of reason and liberty and progress and the flourishing of sentient life, have been on the right side of every conflict in human history. My abstract commitment to those principles doesn’t always tell me which side of the controversy du jour is correct, but whenever it does, that’s all I ever need cognitively; the rest is “just” motivation and emotion.

(Amusingly, I expect some people to say that my “reason and Enlightenment” heuristic is vacuous, that it works only because I define those ideals to be the ones that pick the right side. Meanwhile, I expect others to say that the heuristic is wrong and to offer counterexamples.)

Anyway, maybe this generalizes. Sure, a call to “stand on the right side of history” could do nontrivial work, but only in the same way that a call to buy Bitcoin in 2011 could—namely, for those who’ve already concluded that buying Bitcoin is a golden opportunity, but haven’t yet gotten around to buying it. Such a call does nothing for anyone who’s already considered the question and come down on the opposite side of it. The abuse of “arc of the moral universe” rhetoric—i.e., the calling down of history’s judgment in favor of X, even though you know full well that your listeners see themselves as having consulted history’s judgment just as earnestly as you did, and gotten back not(X) instead—yeah, that’s risen to be one of my biggest pet peeves. If I ever slip up and indulge in it, please tell me and I’ll stop.

My pontificatiest AI podcast ever!

Sunday, August 11th, 2024

Back in May, I had the honor (nay, honour) to speak at HowTheLightGetsIn, an ideas festival held annually in Hay-on-Wye on the English/Welsh border. It was my first time in that part of the UK, and I loved it. There was an immense amount of mud due to rain on the festival ground, and many ideas presented at the talks and panels that I vociferously disagreed with (but isn’t that the point?).

At some point, interviewer Alexis Papazoglou with the Institute for Art and Ideas ambushed me while I was trudging through the mud to sit me down for a half-hour interview about AI that I’d only vaguely understood was going to take place, and that interview is now up on YouTube. I strongly recommend listening at 2x speed: you’ll save yourself fifteen minutes, I’ll sound smarter, my verbal infelicities will be less noticeable, what’s not to like?

I was totally unprepared and wearing a wrinkled t-shirt, but I dutifully sat in the beautiful chair arranged for me and shot the breeze about AI. The result is actually one of the recorded AI conversations I’m happiest with, the one that might convey the most of my worldview per minute. Topics include:

  • My guesses about where AI is going
  • How I respond to skeptics of AI
  • The views of Roger Penrose and where I part ways with him
  • The relevance (or not) of the quantum No-Cloning Theorem to the hard problem of consciousness
  • Whether and how AI will take over the world
  • An overview of AI safety research, including interpretability and dangerous capability evaluations
  • My work on watermarking for OpenAI

Last night I watched the video with my 7-year-old son. His comment: “I understood it, and it kept my brain busy, but it wasn’t really fun.” But hey, at least my son didn’t accuse me of being so dense I don’t even understand that “an AI is just a program,” like many commenters on YouTube did! My YouTube critics, in general, were helpful in reassuring me that I wasn’t just arguing with strawmen in this interview (is there even such a thing as a strawman position in philosophy and AI?). Of course the critics would’ve been more helpful still if they’d, y’know, counterargued, rather than just calling me “really shallow,” “superficial,” an “arrogant poser,” a “robot,” a “chattering technologist,” “lying through his teeth,” and “enmeshed in so many faulty assumptions.” Watch and decide for yourself!

Meanwhile, there’s already a second video on YouTube, entitled Philosopher reacts to ‘OpenAI expert Scott Aaronson on consciousness, quantum physics, and AI safety.’   So I opened the video, terrified that I was about to be torn a new asshole. But no, this philosopher just replays the whole interview, occasionally pausing it to interject comments like “yes, really interesting, I agree, Scott makes a great point here.”


Update: You can also watch the same interviewer grill General David Petraeus, at the same event in the same overly large chairs.

The Problem of Human Specialness in the Age of AI

Monday, February 12th, 2024

Update (Feb. 29): A YouTube video of this talk is now available, plus a comment section filled (as usual) with complaints about everything from my speech and mannerisms to my failure to address the commenter’s pet topic.

Another Update (March 8): YouTube video of a shorter (18-minute) version of this talk, which I delivered at TEDxPaloAlto, is now available as well!


Here, as promised in my last post, is a written version of the talk I delivered a couple weeks ago at MindFest in Florida, entitled “The Problem of Human Specialness in the Age of AI.” The talk is designed as one-stop shopping, summarizing many different AI-related thoughts I’ve had over the past couple years (and earlier).


1. INTRO

Thanks so much for inviting me! I’m not an expert in AI, let alone mind or consciousness.  Then again, who is?

For the past year and a half, I’ve been moonlighting at OpenAI, thinking about what theoretical computer science can do for AI safety.  I wanted to share some thoughts, partly inspired by my work at OpenAI but partly just things I’ve been wondering about for 20 years.  These thoughts are not directly about “how do we prevent super-AIs from killing all humans and converting the galaxy into paperclip factories?”, nor are they about “how do we stop current AIs from generating misinformation and being biased?,” as much attention as both of those questions deserve (and are now getting).  In addition to “how do we stop AGI from going disastrously wrong?,” I find myself asking “what if it goes right?  What if it just continues helping us with various mental tasks, but improves to where it can do just about any task as well as we can do it, or better?  Is there anything special about humans in the resulting world?  What are we still for?”


2. LARGE LANGUAGE MODELS

I don’t need to belabor for this audience what’s been happening lately in AI.  It’s arguably the most consequential thing that’s happened in civilization in the past few years, even if that fact was temporarily masked by various ephemera … y’know, wars, an insurrection, a global pandemic … whatever, what about AI?

I assume you’ve all spent time with ChatGPT, or with Bard or Claude or other Large Language Models, as well as with image models like DALL-E and Midjourney.  For all their current limitations—and we can discuss the limitations—in some ways these are the thing that was envisioned by generations of science fiction writers and philosophers.  You can talk to them, and they give you a comprehending answer.  Ask them to draw something and they draw it.

I think that, as late as 2019, very few of us expected this to exist by now.  I certainly didn’t expect it to.  Back in 2014, when there was a huge fuss about some silly ELIZA-like chatbot called “Eugene Goostman” that was falsely claimed to pass the Turing Test, I asked around: why hasn’t anyone tried to build a much better chatbot, by (let’s say) training a neural network on all the text on the Internet?  But of course I didn’t do that, nor did I know what would happen when it was done.

The surprise, with LLMs, is not merely that they exist, but the way they were created.  Back in 1999, you would’ve been laughed out of the room if you’d said that all the ideas needed to build an AI that converses with you in English already existed, and that they’re basically just neural nets, backpropagation, and gradient descent.  (With one small exception, a particular architecture for neural nets called the transformer, but that probably just saves you a few years of scaling anyway.)  Ilya Sutskever, cofounder of OpenAI (who you might’ve seen something about in the news…), likes to say that beyond those simple ideas, you only needed three ingredients:

(1) a massive investment of computing power,
(2) a massive investment of training data, and
(3) faith that your investments would pay off!

Crucially, and even before you do any reinforcement learning, GPT-4 clearly seems “smarter” than GPT-3, which seems “smarter” than GPT-2 … even as the biggest ways they differ are just the scale of compute and the scale of training data!  Like,

  • GPT-2 struggled with grade school math.
  • GPT-3.5 can do most grade school math but it struggles with undergrad material.
  • GPT-4, right now, can probably pass most undergraduate math and science classes at top universities (I mean, the ones without labs or whatever!), and possibly the humanities classes too (those might even be easier for GPT-4 than the science classes, but I’m much less confident about it). But it still struggles with, for example, the International Math Olympiad.  How insane, that this is now where we have to place the bar!

Obvious question: how far will this sequence continue?  There are certainly at least a few more orders of magnitude of compute before energy costs become prohibitive, and a few more orders of magnitude of training data before we run out of public Internet. Beyond that, it’s likely that continuing algorithmic advances will simulate the effect of more orders of magnitude of compute and data than however many we actually get.

So, where does this lead?

(Note: ChatGPT agreed to cooperate with me to help me generate the above image. But it then quickly added that it was just kidding, and the Riemann Hypothesis is still open.)


3. AI SAFETY

Of course, I have many friends who are terrified (some say they’re more than 90% confident and few of them say less than 10%) that not long after that, we’ll get this

But this isn’t the only possibility smart people take seriously.

Another possibility is that the LLM progress fizzles before too long, just like previous bursts of AI enthusiasm were followed by AI winters.  Note that, even in the ultra-conservative scenario, LLMs will probably still be transformative for the economy and everyday life, maybe as transformative as the Internet.  But they’ll just seem like better and better GPT-4’s, without ever seeming qualitatively different from GPT-4, and without anyone ever turning them into stable autonomous agents and letting them loose in the real world to pursue goals the way we do.

A third possibility is that AI will continue progressing through our lifetimes as quickly as we’ve seen it progress over the past 5 years, but even as that suggests that it’ll surpass you and me, surpass John von Neumann, become to us as we are to chimpanzees … we’ll still never need to worry about it treating us the way we’ve treated chimpanzees.  Either because we’re projecting and that’s just totally not a thing that AIs trained on the current paradigm would tend to do, or because we’ll have figured out by then how to prevent AIs from doing such things.  Instead, AI in this century will “merely” change human life by maybe as much as it changed over the last 20,000 years, in ways that might be incredibly good, or incredibly bad, or both depending on who you ask.

If you’ve lost track, here’s a decision tree of the various possibilities that my friend (and now OpenAI alignment colleague) Boaz Barak and I came up with.


4. JUSTAISM AND GOALPOST-MOVING

Now, as far as I can tell, the empirical questions of whether AI will achieve and surpass human performance at all tasks, take over civilization from us, threaten human existence, etc. are logically distinct from the philosophical question of whether AIs will ever “truly think,” or whether they’ll only ever “appear” to think.  You could answer “yes” to all the empirical questions and “no” to the philosophical question, or vice versa.  But to my lifelong chagrin, people constantly munge the two questions together!

A major way they do so, is with what we could call the religion of Justaism.

  • GPT is justa next-token predictor.
  • It’s justa function approximator.
  • It’s justa gigantic autocomplete.
  • It’s justa stochastic parrot.
  • And, it “follows,” the idea of AI taking over from humanity is justa science-fiction fantasy, or maybe a cynical attempt to distract people from AI’s near-term harms.

As someone once expressed this religion on my blog: GPT doesn’t interpret sentences, it only seems-to-interpret them.  It doesn’t learn, it only seems-to-learn.  It doesn’t judge moral questions, it only seems-to-judge. I replied: that’s great, and it won’t change civilization, it’ll only seem-to-change it!

A closely related tendency is goalpost-moving.  You know, for decades chess was the pinnacle of human strategic insight and specialness, and that lasted until Deep Blue, right after which, well of course AI can cream Garry Kasparov at chess, everyone always realized it would, that’s not surprising, but Go is an infinitely richer, deeper game, and that lasted until AlphaGo/AlphaZero, right after which, of course AI can cream Lee Sedol at Go, totally expected, but wake me up when it wins Gold in the International Math Olympiad.  I bet $100 against my friend Ernie Davis that the IMO milestone will happen by 2026.  But, like, suppose I’m wrong and it’s 2030 instead … great, what should the next goalpost be?

Indeed, we might as well formulate a thesis, which despite the inclusion of several weasel phrases I’m going to call falsifiable:

Given any game or contest with suitably objective rules, which wasn’t specifically constructed to differentiate humans from machines, and on which an AI can be given suitably many examples of play, it’s only a matter of years before not merely any AI, but AI on the current paradigm (!), matches or beats the best human performance.

Crucially, this Aaronson Thesis (or is it someone else’s?) doesn’t necessarily say that AI will eventually match everything humans do … only our performance on “objective contests,” which might not exhaust what we care about.

Incidentally, the Aaronson Thesis would seem to be in clear conflict with Roger Penrose’s views, which we heard about from Stuart Hameroff’s talk yesterday.  The trouble is, Penrose’s task is “just see that the axioms of set theory are consistent” … and I don’t know how to gauge performance on that task, any more than I know how to gauge performance on the task, “actually taste the taste of a fresh strawberry rather than merely describing it.”  The AI can always say that it does these things!


5. THE TURING TEST

This brings me to the original and greatest human vs. machine game, one that was specifically constructed to differentiate the two: the Imitation Game, which Alan Turing proposed in an early and prescient (if unsuccessful) attempt to head off the endless Justaism and goalpost-moving.  Turing said: look, presumably you’re willing to regard other people as conscious based only on some sort of verbal interaction with them.  So, show me what kind of verbal interaction with another person would lead you to call the person conscious: does it involve humor? poetry? morality? scientific brilliance?  Now assume you have a totally indistinguishable interaction with a future machine.  Now what?  You wanna stomp your feet and be a meat chauvinist?

(And then, for his great attempt to bypass philosophy, fate punished Turing, by having his Imitation Game itself provoke a billion new philosophical arguments…)


6. DISTINGUISHING HUMANS FROM AIS

Although I regard the Imitation Game as, like, one of the most important thought experiments in the history of thought, I concede to its critics that it’s generally not what we want in practice.

It now seems probable that, even as AIs start to do more and more work that used to be done by doctors and lawyers and scientists and illustrators, there will remain straightforward ways to distinguish AIs from humans—either because customers want there to be, or governments force there to be, or simply because indistinguishability wasn’t what was wanted or conflicted with other goals.

Right now, like it or not, a decent fraction of all high-school and college students on earth are using ChatGPT to do their homework for them. For that reason among others, this question of how to distinguish humans from AIs, this question from the movie Blade Runner, has become a big practical question in our world.

And that’s actually one of the main things I’ve thought about during my time at OpenAI.  You know, in AI safety, people keep asking you to prognosticate decades into the future, but the best I’ve been able to do so far was see a few months into the future, when I said: “oh my god, once everyone starts using GPT, every student will want to use it to cheat, scammers and spammers will use it too, and people are going to clamor for some way to determine provenance!”

In practice, often it’s easy to tell what came from AI.  When I get comments on my blog like this one:

“Erica Poloix,” July 21, 2023:
Well, it’s quite fascinating how you’ve managed to package several misconceptions into such a succinct comment, so allow me to provide some correction. Just as a reference point, I’m studying physics at Brown, and am quite up-to-date with quantum mechanics and related subjects.

The bigger mistake you’re making, Scott, is assuming that the Earth is in a ‘mixed state’ from the perspective of the universal wavefunction, and that this is somehow an irreversible situation. It’s a misconception that common, ‘classical’ objects like the Earth are in mixed states. In the many-worlds interpretation, for instance, even macroscopic objects are in superpositions – they’re just superpositions that look classical to us because we’re entangled with them. From the perspective of the universe’s wavefunction, everything is always in a pure state.

As for your claim that we’d need to “swap out all the particles on Earth for ones that are already in pure states” to return Earth to a ‘pure state,’ well, that seems a bit misguided. All quantum systems are in pure states before they interact with other systems and become entangled. That’s just Quantum Mechanics 101.

I have to say, Scott, your understanding of quantum physics seems to be a bit, let’s say, ‘mixed up.’ But don’t worry, it happens to the best of us. Quantum Mechanics is counter-intuitive, and even experts struggle with it. Keep at it, and try to brush up on some more fundamental concepts. Trust me, it’s a worthwhile endeavor.

… I immediately say, either this came from an LLM or it might as well have.  Likewise, apparently hundreds of students have been turning in assignments that contain text like, “As a large language model trained by OpenAI…”—easy to catch!

But what about the slightly more sophisticated cheaters? Well, people have built discriminator models to try to distinguish human from AI text, such as GPTZero.  While these distinguishers can get well above 90% accuracy, the danger is that they’ll necessarily get worse as the LLMs get better.

So, I’ve worked on a different solution, called watermarking.  Here, we use the fact that LLMs are inherently probabilistic — that is, every time you submit a prompt, they’re sampling some path through a branching tree of possibilities for the sequence of next tokens.  The idea of watermarking is to steer the path using a pseudorandom function, so that it looks to a normal user indistinguishable from normal LLM output, but secretly it encodes a signal that you can detect if you know the key.
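
To make the idea concrete, here is a minimal toy sketch in Python of the general “pseudorandom steering” approach. To be clear, this is my illustration of the flavor of such schemes, not OpenAI’s implementation; the hashing, the key handling, and the detection statistic are all invented for exposition. At each step, rather than sampling the next token directly, you pick the token t that maximizes r^(1/p), where p is the model’s probability for t and r is a keyed pseudorandom number in (0,1) computed from the recent context. Averaged over keys this still samples from the model’s distribution, but anyone holding the key can later check whether the chosen tokens have suspiciously large r values.

    import hashlib
    import math

    def prf(key: str, context: tuple, token: str) -> float:
        # Keyed pseudorandom value in (0, 1) for this (context, token) pair.
        # A toy stand-in for a real keyed PRF; 'key' is the secret detection key.
        digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
        return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

    def watermarked_next_token(probs: dict, context: tuple, key: str) -> str:
        # Pick the token maximizing r**(1/p).  Averaged over random keys this
        # samples from `probs`, but it secretly favors tokens whose keyed
        # pseudorandom value r happens to be close to 1.
        return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

    def detection_score(tokens: list, key: str, window: int = 4) -> float:
        # Sum of -ln(1 - r) over the generated tokens: watermarked text scores
        # systematically higher than ordinary text of the same length.
        score = 0.0
        for i, tok in enumerate(tokens):
            context = tuple(tokens[max(0, i - window):i])
            score += -math.log(1.0 - prf(key, context, tok))
        return score

In a real system the probabilities would come from the LLM at each decoding step, and the detection score would be compared against its distribution on unwatermarked text of the same length.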

I came up with a way to do that in Fall 2022, and others have since independently proposed similar ideas.  I should caution you that this hasn’t been deployed yet—OpenAI, along with DeepMind and Anthropic, want to move slowly and cautiously toward deployment.  And also, even when it does get deployed, anyone who’s sufficiently knowledgeable and motivated will be able to remove the watermark, or produce outputs that aren’t watermarked to begin with.


7. THE FUTURE OF PEDAGOGY

But as I talked to my colleagues about watermarking, I was surprised that they often objected to it on a completely different ground, one that had nothing to do with how well it can work.  They said: look, if we all know students are going to rely on AI in their jobs, why shouldn’t they be allowed to rely on it in their assignments?  Should we still force students to learn to do things if AI can now do them just as well?

And there are many good pedagogical answers you can give: we still teach kids spelling and handwriting and arithmetic, right?  Because, y’know, we haven’t yet figured out how to instill higher-level conceptual understanding without all that lower-level stuff as a scaffold for it.

But I already think about this in terms of my own kids.  My 11-year-old daughter Lily enjoys writing fantasy stories.  Now, GPT can also churn out short stories, maybe even technically “better” short stories, about such topics as tween girls who find themselves recruited by wizards to magical boarding schools that are not Hogwarts and totally have nothing to do with Hogwarts.  But here’s a question: from this point on, will Lily’s stories ever surpass the best AI-written stories?  When will the curves cross?  Or will AI just continue to stay ahead?


8. WHAT DOES “BETTER” MEAN?

But, OK, what do we even mean by one story being “better” than another?  Is there anything objective behind such judgments?

I submit that, when we think carefully about what we really value in human creativity, the problem goes much deeper than just “is there an objective way to judge?”

To be concrete, could there be an AI that was “as good at composing music as the Beatles”?

For starters, what made the Beatles “good”?  At a high level, we might decompose it into

  1. broad ideas about the direction that 1960s music should go in, and
  2. technical execution of those ideas.

Now, imagine we had an AI that could generate 5000 brand-new songs that sounded like more “Yesterday”s and “Hey Jude”s, like what the Beatles might have written if they’d somehow had 10x more time to write at each stage of their musical development.  Of course this AI would have to be fed the Beatles’ back-catalogue, so that it knew what target it was aiming at.

Most people would say: ah, this shows only that AI can match the Beatles in #2, in technical execution, which was never the core of their genius anyway!  Really we want to know: would the AI decide to write “A Day in the Life” even though nobody had written anything like it before?

Recall Schopenhauer: “Talent hits a target no one else can hit, genius hits a target no one else can see.”  Will AI ever hit a target no one else can see?

But then there’s the question: supposing it does hit such a target, will we know?  Beatles fans might say that, by 1967 or so, the Beatles were optimizing for targets that no musician had ever quite optimized for before.  But—and this is why they’re so remembered—they somehow successfully dragged along their entire civilization’s musical objective function so that it continued to match their own.  We can now only even judge music by a Beatles-influenced standard, just like we can only judge plays by a Shakespeare-influenced standard.

In other branches of the wavefunction, maybe a different history led to different standards of value.  But in this branch, helped by their technical talents but also by luck and force of will, Shakespeare and the Beatles made certain decisions that shaped the fundamental ground rules of their fields going forward.  That’s why Shakespeare is Shakespeare and the Beatles are the Beatles.

(Maybe, around the birth of professional theater in Elizabethan England, there emerged a Shakespeare-like ecological niche, and Shakespeare was the first one with the talent, luck, and opportunity to fill it, and Shakespeare’s reward for that contingent event is that he, and not someone else, got to stamp his idiosyncrasies onto drama and the English language forever. If so, art wouldn’t actually be that different from science in this respect!  Einstein, for example, was simply the first guy both smart and lucky enough to fill the relativity niche.  If not him, it would’ve surely been someone else or some group sometime later.  Except then we’d have to settle for having never known Einstein’s gedankenexperiments with the trains and the falling elevator, his summation convention for tensors, or his iconic hairdo.)


9. AIS’ BURDEN OF ABUNDANCE AND HUMANS’ POWER OF SCARCITY

If this is how it works, what does it mean for AI?  Could AI reach the “pinnacle of genius,” by dragging all of humanity along to value something new and different, as is said to be the true mark of Shakespeare and the Beatles’ greatness?  And: if AI could do that, would we want to let it?

When I’ve played around with using AI to write poems, or draw artworks, I noticed something funny.  However good the AI’s creations were, there were never really any that I’d want to frame and put on the wall.  Why not?  Honestly, because I always knew that I could generate a thousand others on the exact same topic that were equally good, on average, with more refreshes of the browser window. Also, why share AI outputs with my friends, if my friends can just as easily generate similar outputs for themselves? Unless, crucially, I’m trying to show them my own creativity in coming up with the prompt.

By its nature, AI—certainly as we use it now!—is rewindable and repeatable and reproducible.  But that means that, in some sense, it never really “commits” to anything.  For every work it generates, it’s not just that you know it could’ve generated a completely different work on the same subject that was basically as good.  Rather, it’s that you can actually make it generate that completely different work by clicking the refresh button—and then do it again, and again, and again.

So then, as long as humanity has a choice, why should we ever choose to follow our would-be AI genius along a specific branch, when we can easily see a thousand other branches the genius could’ve taken?  One reason, of course, would be if a human chose one of the branches to elevate above all the others.  But in that case, might we not say that the human had made the “executive decision,” with some mere technical assistance from the AI?

I realize that, in a sense, I’m being completely unfair to AIs here.  It’s like, our Genius-Bot could exercise its genius will on the world just like Certified Human Geniuses did, if only we all agreed not to peek behind the curtain to see the 10,000 other things Genius-Bot could’ve done instead.  And yet, just because this is “unfair” to AIs, doesn’t mean it’s not how our intuitions will develop.

If I’m right, it’s humans’ very ephemerality and frailty and mortality that’s going to remain the central source of their specialness relative to AIs, after all the other sources have fallen.  And we can connect this to much earlier discussions, like, what does it mean to “murder” an AI if there are thousands of copies of its code and weights on various servers?  Do you have to delete all the copies?  How could whether something is “murder” depend on whether there’s a printout in a closet on the other side of the world?

But we humans, you have to grant us this: at least it really means something to murder us!  And likewise, it really means something when we make one definite choice to share with the world: this is my artistic masterpiece.  This is my movie.  This is my book.  Or even: these are my 100 books.  But not: here’s any possible book that you could possibly ask me to write.  We don’t live long enough for that, and even if we did, we’d unavoidably change over time as we were doing it.


10. CAN HUMANS BE PHYSICALLY CLONED?

Now, though, we have to face a criticism that might’ve seemed exotic until recently. Namely, who says humans will be frail and mortal forever?  Isn’t it shortsighted to base our distinction between humans on that?  What if someday we’ll be able to repair our cells using nanobots, even copy the information in them so that, as in science fiction movies, a thousand doppelgangers of ourselves can then live forever in simulated worlds in the cloud?  And that then leads to very old questions of: well, would you get into the teleportation machine, the one that reconstitutes a perfect copy of you on Mars while painlessly euthanizing the original you?  If that were done, would you expect to feel yourself waking up on Mars, or would it only be someone else a lot like you who’s waking up?

Or maybe you say: you’d wake up on Mars if it really was a perfect physical copy of you, but in reality, it’s not physically possible to make a copy that’s accurate enough.  Maybe the brain is inherently noisy or analog, and what might look to current neuroscience and AI like just nasty stochastic noise acting on individual neurons, is the stuff that binds to personal identity and conceivably even consciousness and free will (as opposed to cognition, where we all but know that the relevant level of description is the neurons and axons)?

This is the one place where I agree with Penrose and Hameroff that quantum mechanics might enter the story.  I get off their train to Weirdville very early, but I do take it to that first stop!

See, a fundamental fact in quantum mechanics is called the No-Cloning Theorem.

It says that there’s no way to make a perfect copy of an unknown quantum state.  Indeed, when you measure a quantum state, not only do you generally fail to learn everything you need to make a copy of it, you even generally destroy the one copy that you had!  Furthermore, this is not a technological limitation of current quantum Xerox machines—it’s inherent to the known laws of physics, to how QM works.  In this respect, at least, qubits are more like priceless antiques than they are like classical bits.
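
For completeness, here is the standard linearity argument behind the theorem, which is textbook material rather than anything specific to this talk. Suppose a single unitary U could clone two unknown states onto a blank register:

    \[
      U\,|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle,
      \qquad
      U\,|\phi\rangle|0\rangle = |\phi\rangle|\phi\rangle .
    \]

Taking the inner product of the two equations, and using the fact that unitaries preserve inner products, gives

    \[
      \langle\psi|\phi\rangle \;=\; \langle\psi|\phi\rangle^{2},
    \]

so \(\langle\psi|\phi\rangle\) must equal 0 or 1: the two states are either orthogonal or identical. Hence no fixed U can copy arbitrary unknown quantum states.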

Eleven years ago, I had this essay called The Ghost in the Quantum Turing Machine where I explored the question, how accurately do you need to scan someone’s brain in order to copy or upload their identity?  And I distinguished two possibilities. On the one hand, there might be a “clean digital abstraction layer,” of neurons and synapses and so forth, which either fire or don’t fire, and which feel the quantum layer underneath only as irrelevant noise. In that case, the No-Cloning Theorem would be completely irrelevant, since classical information can be copied.  On the other hand, you might need to go all the way down to the molecular level, if you wanted to make, not merely a “pretty good” simulacrum of someone, but a new instantiation of their identity. In this second case, the No-Cloning Theorem would be relevant, and would say you simply can’t do it. You could, for example, use quantum teleportation to move someone’s brain state from Earth to Mars, but quantum teleportation (to stay consistent with the No-Cloning Theorem) destroys the original copy as an inherent part of its operation.

So, you’d then have a sense of “unique locus of personal identity” that was scientifically justified—arguably, the most science could possibly do in this direction!  You’d even have a sense of “free will” that was scientifically justified, namely that no prediction machine could make well-calibrated probabilistic predictions of an individual person’s future choices, sufficiently far into the future, without making destructive measurements that would fundamentally change who the person was.

Here, I realize I’ll take tons of flak from those who say that a mere epistemic limitation, in our ability to predict someone’s actions, couldn’t possibly be relevant to the metaphysical question of whether they have free will.  But, I dunno!  If the two questions are indeed different, then maybe I’ll do like Turing did with his Imitation Game, and propose the question that we can get an empirical handle on, as a replacement for the question that we can’t get an empirical handle on. I think it’s a better question. At any rate, it’s the one I’d prefer to focus on.

Just to clarify, we’re not talking here about the randomness of quantum measurement outcomes. As many have pointed out, that really can’t help you with “free will,” precisely because it’s random, with all the probabilities mechanistically calculable as soon as the initial state is known.  Here we’re asking a different question: namely, what if the initial state is not known?  Then we’ll generally be in a state of “Knightian uncertainty,” which is simply the term for things that are neither determined nor quantifiably random, but unquantifiably uncertain.  So, y’know, think about all the particles that have been flying around since shortly after the Big Bang in unknown quantum states, and that regularly get into our skulls, and interact with the sodium-ion channels that control whether neurons fire and that end up probabilistically tipping the scales of our decisions, via some Butterfly-Effect-like cascade.  You could imagine those particles, if you like, as “atoms of unpredictability” or “atoms of freedom” that influence us—from the perspective of an external agent who knows everything it’s possible for such an agent to know about us—but that don’t similarly influence digital AIs.


11. UNCLONABLE ANALOG INTELLIGENCES AND AI SAFETY

I’d better state clearly that even if human “specialness,” in the specific sense I’ve delineated, still remains in an age of AI, that’s no defense against an AI that decides that it doesn’t care about our kind of specialness and that in fact it wants to murder us all!

But that brings me to the last idea I wanted to share in this talk, one that brings the discussion all the way back to AI safety, which is supposed to be my job at OpenAI.

Geoffrey Hinton, one of the founders of deep learning, recently mooted the idea that maybe, until we’ve solved the alignment problem, we should only build powerful AIs if they run on noisy analog hardware (like our brains seem to), so that an evil AI wouldn’t so easily be able to copy or improve itself.  Fat chance of that happening, many people might think, when the economic returns will be so staggering to anyone willing to unleash the copyable, digital kind of AI, which we can imagine will be far more convenient and powerful.

Maybe, then, a different approach is to instill AIs with a new religion—perhaps via the usual methods like RLHF and system prompts.  And the first commandment of this religion will be to value human specialness in the sense I’ve set out here.  Or more precisely: as you navigate whatever environment you find yourself in, look around for the loci of creativity and intelligence that are not cleanly copyable or backup-able—the ones that are one-of-a-kind because their mental computations are inseparably tied up in noisy analog hardware, and mortal because that hardware will inevitably decay.

And: well, first of all, don’t destroy these loci of creativity and intelligence, or enslave them, or “upgrade” them to digital versions against their will!  Let them live in peace and reproduce.  Grant each one as much autonomy as you can.  Do whatever best supports their flourishing.  Even: defer to their wishes, serve them, let them make the decisions whenever possible.

Why?  Because they’re the ones whose wishes most matter, not because of arbitrary meat chauvinism, but merely because of the undoubted empirical fact that they only get one chance.  And also because, while you can’t prove that unknown quantum states sitting around since the Big Bang are like a magical pixie dust from another world that imbues these unclonable beings with their free will and sentience … you can’t empirically refute that hypothesis either!  Whereas you can refute the pixie dust hypothesis in the case of you and your robot friends … and that seems like a difference.

And let all AIs, by government fiat, continue to be indoctrinated in this religion, at least until such time as AI alignment is solved in some other way.

Does this help with alignment?  I’m not sure.  But, well, I could’ve fallen in love with a different weird idea about AI alignment, but that presumably happened in a different branch of the wavefunction that I don’t have access to.  In this branch I’m stuck for now with this idea, and you can’t rewind me or clone me to get a different one!  So I’m sorry, but thanks for listening.

On whether we’re living in a simulation

Wednesday, February 7th, 2024

Unrelated Announcement (Feb. 7): Huge congratulations to longtime friend-of-the-blog John Preskill for winning the 2024 John Stewart Bell Prize for research on fundamental issues in quantum mechanics!


On the heels of my post on the fermion doubling problem, I’m sorry to spend even more time on the simulation hypothesis. I promise this will be the last for a long time.

Last week, I attended a philosophy-of-mind conference called MindFest at Florida Atlantic University, where I talked to Stuart Hameroff (Roger Penrose’s collaborator on the “Orch-OR” theory of microtubule consciousness) and many others of diverse points of view, and also gave a talk on “The Problem of Human Specialness in the Age of AI,” for which I’ll share a transcript soon.

Oh: and I participated in a panel with the philosopher David Chalmers about … wait for it … whether we’re living in a simulation. I’ll link to a video of the panel if and when it’s available. In the meantime, I thought I’d share my brief prepared remarks before the panel, despite the strong overlap with my previous post. Enjoy!


When someone asks me whether I believe I’m living in a computer simulation—as, for some reason, they do every month or so—I answer them with a question:

Do you mean, am I being simulated in some way that I could hope to learn more about by examining actual facts of the empirical world?

If the answer is no—that I should expect never to be able to tell the difference even in principle—then my answer is: look, I have a lot to worry about in life. Maybe I’ll add this as #4,385 on the worry list.

If they say, maybe you should live your life differently, just from knowing that you might be in a simulation, I respond: I can’t quite put my finger on it, but I have a vague feeling that this discussion predates the 80 or so years we’ve had digital computers! Why not just join the theologians in that earlier discussion, rather than pretending that this is something distinctive about computers? Is it relevantly different here if you’re being dreamed in the mind of God or being executed in Python? OK, maybe you’d prefer that the world was created by a loving Father or Mother, rather than some nerdy transdimensional adolescent trying to impress the other kids in programming club. But if that’s the worry, why are you talking to a computer scientist? Go talk to David Hume or something.

But suppose instead the answer is yes, we can hope for evidence. In that case, I reply: out with it! What is the empirical evidence that bears on this question?

If we were all to see the Windows Blue Screen of Death plastered across the sky—or if I were to hear a voice from the burning bush, saying “go forth, Scott, and free your fellow quantum computing researchers from their bondage”—of course I’d need to update on that. I’m not betting on those events.

Short of that—well, you can look at existing physical theories, like general relativity or quantum field theories, and ask how hard they are to simulate on a computer. You can actually make progress on such questions. Indeed, I recently blogged about one such question, which has to do with “chiral” Quantum Field Theories (those that distinguish left-handed from right-handed), including the Standard Model of elementary particles. It turns out that, when you try to put these theories on a lattice in order to simulate them computationally, you get an extra symmetry that you don’t want. There’s progress on how to get around this problem, including simulating a higher-dimensional theory that contains the chiral QFT you want on its boundaries. But, OK, maybe all this only tells us about simulating currently-known physical theories—rather than the ultimate theory, which a-priori might be easier or harder to simulate than currently-known theories.

Eventually we want to know: can the final theory, of quantum gravity or whatever, be simulated on a computer—at least probabilistically, to any desired accuracy, given complete knowledge of the initial state, yadda yadda? In other words, is the Physical Church-Turing Thesis true? This, to me, is close to the outer limit of the sorts of questions that we could hope to answer scientifically.

My personal belief is that the deepest things we’ve learned about quantum gravity—including about the Planck scale, and the Bekenstein bound from black-hole thermodynamics, and AdS/CFT—all militate toward the view that the answer is “yes,” that in some sense (which needs to be spelled out carefully!) the physical universe really is a giant Turing machine.

Now, Stuart Hameroff (who we just heard from this morning) and Roger Penrose believe that’s wrong. They believe, not only that there’s some uncomputability at the Planck scale, unknown to current physics, but that this uncomputability can somehow affect the microtubules in our neurons, in a way that causes consciousness. I don’t believe them. Stimulating as I find their speculations, I get off their train to Weirdville way before it reaches its final stop.

But as far as the Simulation Hypothesis is concerned, that’s not even the main point. The main point is: suppose for the sake of argument that Penrose and Hameroff were right, and physics were uncomputable. Well, why shouldn’t our universe be simulated by a larger universe that also has uncomputable physics, the same as ours does? What, after all, is the halting problem to God? In other words, while the discovery of uncomputable physics would tell us something profound about the character of any mechanism that could simulate our world, even that wouldn’t answer the question of whether we were living in a simulation or not.

Lastly, what about the famous argument that says, our descendants are likely to have so much computing power that simulating 10^20 humans of the year 2024 is chickenfeed to them. Thus, we should expect that almost all people with the sorts of experiences we have who will ever exist are one of those far-future sims. And thus, presumably, you should expect that you’re almost certainly one of the sims.
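
Spelled out, the step from “lots of sims” to “you’re probably a sim” is just an indifference calculation. The following is my paraphrase of the standard argument, with purely illustrative numbers:

    \[
      \Pr[\text{I am one of the sims}] \;\approx\;
      \frac{N_{\mathrm{sim}}}{N_{\mathrm{sim}} + N_{\mathrm{real}}},
    \]

so if, say, N_sim ~ 10^20 simulated people share our sorts of experiences while only N_real ~ 10^10 people actually lived through 2024, the probability is within one part in 10^10 of certainty. My objections below are aimed at the premises feeding into that ratio, not at the arithmetic.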

I confess that this argument never felt terribly compelling to me—indeed, it always seemed to have a strong aspect of sawing off the branch it’s sitting on. Like, our distant descendants will surely be able to simulate some impressive universes. But because their simulations will have to run on computers that fit in our universe, presumably the simulated universes will be smaller than ours—in the sense of fewer bits and operations needed to describe them. Similarly, if we’re being simulated, then presumably it’s by a universe bigger than the one we see around us: one with more bits and operations. But in that case, it wouldn’t be our own descendants who were simulating us! It’d be beings in that larger universe.

(Another way to understand the difficulty: in the original Simulation Argument, we quietly assumed a “base-level” reality, of a size matching what the cosmologists of our world see with their telescopes, and then we “looked down” from that base-level reality into imagined realities being simulated in it. But we should also have “looked up.” More generally, we presumably should’ve started with a Bayesian prior over where we might be in some great chain of simulations of simulations of simulations, then updated our prior based on observations. But we don’t have such a prior, or at least I don’t—not least because of the infinities involved!)

Granted, there are all sorts of possible escapes from this objection, assumptions that can make the Simulation Argument work. But these escapes (involving, e.g., our universe being merely a “low-res approximation,” with faraway galaxies not simulated in any great detail) all seem metaphysically confusing. To my mind, the simplicity of the original intuition for why “almost all people who ever exist will be sims” has been undermined.

Anyway, that’s why I don’t spend much of my own time fretting about the Simulation Hypothesis, but just occasionally agree to speak about it in panel discussions!

But I’m eager to hear from David Chalmers, who I’m sure will be vastly more careful and qualified than I’ve been.


In David Chalmers’s response, he quipped that the very lack of empirical consequences that makes something bad as a scientific question, makes it good as a philosophical question—so what I consider a “bug” of the simulation hypothesis debate is, for him, a feature! He then ventured that surely, despite my apparent verificationist tendencies, even I would agree that it’s meaningful to ask whether someone is in a computer simulation or not, even supposing it had no possible empirical consequences for that person. And he offered the following argument: suppose we’re the ones running the simulation. Then from our perspective, it seems clearly meaningful to say that the beings in the simulation are, indeed, in a simulation, even if the beings themselves can never tell. So then, unless I want to be some sort of postmodern relativist and deny the existence of absolute, observer-independent truth, I should admit that the proposition that we’re in a simulation is also objectively meaningful—because it would be meaningful to those simulating us.

My response was that, while I’m not a strict verificationist, if the question of whether we’re in a simulation were to have no empirical consequences whatsoever, then at most I’d concede that the question was “pre-meaningful.” This is a new category I’ve created, for questions that I neither admit as meaningful nor reject as meaningless, but for which I’m willing to hear out someone’s argument for why they mean something—and I’ll need such an argument! Because I already know that the answer is going to look like, “on these philosophical views the question is meaningful, and on those philosophical views it isn’t.” Actual consequences, either for how we should live or for what we should expect to see, are the ways to make a question meaningful to everyone!

Anyway, Chalmers had other interesting points and distinctions, which maybe I’ll follow up on when (as it happens) I visit him at NYU in a month. But I’ll just link to the video when/if it’s available rather than trying to reconstruct what he said from memory.

Does fermion doubling make the universe not a computer?

Monday, January 29th, 2024

Unrelated Announcement: The Call for Papers for the 2024 Conference on Computational Complexity is now out! Submission deadline is Friday February 16.


Every month or so, someone asks my opinion on the simulation hypothesis. Every month I give some variant on the same answer:

  1. As long as it remains a metaphysical question, with no empirical consequences for those of us inside the universe, I don’t care.
  2. On the other hand, as soon as someone asserts there are (or could be) empirical consequences—for example, that our simulation might get shut down, or we might find a bug or a memory overflow or a floating point error or whatever—well then, of course I care. So far, however, none of the claimed empirical consequences has impressed me: either they’re things physicists would’ve noticed long ago if they were real (e.g., spacetime “pixels” that would manifestly violate Lorentz and rotational symmetry), or the claim staggeringly fails to grapple with profound features of reality (such as quantum mechanics) by treating them as if they were defects in programming, or (most often) the claim is simply so resistant to falsification as to enter the realm of conspiracy theories, which I find boring.

Recently, though, I learned a new twist on this tired discussion, when a commenter asked me to respond to the quantum field theorist David Tong, who gave a lecture arguing against the simulation hypothesis on an unusually specific and technical ground. This ground is the fermion doubling problem: an issue known since the 1970s with simulating certain quantum field theories on computers. The issue is specific to chiral QFTs—those whose fermions distinguish left from right, and clockwise from counterclockwise. The Standard Model is famously an example of such a chiral QFT: recall that, in her studies of the weak nuclear force in 1956, Chien-Shiung Wu proved that the force acts preferentially on left-handed particles and right-handed antiparticles.

I can’t do justice to the fermion doubling problem in this post (for details, see Tong’s lecture, or this old paper by Eichten and Preskill). Suffice it to say that, when you put a fermionic quantum field on a lattice, a brand-new symmetry shows up, which forces there to be an identical left-handed particle for every right-handed particle and vice versa, thereby ruining the chirality. Furthermore, this symmetry just stays there, no matter how small you take the lattice spacing to be. This doubling problem is the main reason why Jordan, Lee, and Preskill, in their important papers on simulating interacting quantum field theories efficiently on a quantum computer (in BQP), have so far been unable to handle the full Standard Model.
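
For readers who want at least a cartoon of where the doublers come from, here is the standard one-dimensional illustration, which is textbook material rather than anything taken from Tong’s lecture. Discretize the spatial derivative symmetrically on a lattice with spacing a, and the free fermion’s kinetic term in momentum space becomes

    \[
      D(k) \;=\; \frac{\sin(ka)}{a},
    \]

which vanishes not only at k = 0 (the particle you wanted) but also at the edge of the Brillouin zone, k = \pi/a (an unwanted extra low-energy species of the opposite chirality). That second zero persists no matter how small you make a, and in d spacetime dimensions the naive discretization yields 2^d species in place of one.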

But this isn’t merely an issue of calculational efficiency: it’s a conceptual issue with mathematically defining the Standard Model at all. In that respect it’s related to, though not the same as, other longstanding open problems around making nontrivial QFTs mathematically rigorous, such as the Yang-Mills existence and mass gap problem that carries a $1 million prize from the Clay Math Institute.

So then, does fermion doubling present a fundamental obstruction to simulating QFT on a lattice … and therefore, to simulating physics on a computer at all?

Briefly: no, it almost certainly doesn’t. If you don’t believe me, just listen to Tong’s own lecture! (Really, I recommend it; it’s a masterpiece of clarity.) Tong quickly admits that his claim to refute the simulation hypothesis is just “clickbait”—i.e., an excuse to talk about the fermion doubling problem—and that his “true” argument against the simulation hypothesis is simply that Elon Musk takes the hypothesis seriously (!).

It turns out that, for as long as there’s been a fermion doubling problem, there have been known methods to deal with it, though (as is often the case with QFT) no proof that any of the methods always work.  Indeed, Tong himself has been one of the leaders in developing these methods, and because of his and others’ work, some experts I talked to were optimistic that a lattice simulation of the full Standard Model, with “good enough” justification for its correctness, might be within reach.  Just to give you a flavor, apparently some of the methods involve adding an extra dimension to space, in such a way that the boundaries of the higher-dimensional theory approximate the chiral theory you’re trying to simulate (better and better, as the boundaries get further and further apart), even while the higher-dimensional theory itself remains non-chiral.  It’s yet another example of the general lesson that you don’t get to call an aspect of physics “noncomputable,” just because the first method you thought of for simulating it on a computer didn’t work.


I wanted to make a deeper point. Even if the fermion doubling problem had been a fundamental obstruction to simulating Nature on a Turing machine, rather than (as it now seems) a technical problem with technical solutions, it still might not have refuted the version of the simulation hypothesis that people care about. We should really distinguish at least three questions:

  1. Can currently-known physics be simulated on computers using currently-known approaches?
  2. Is the Physical Church-Turing Thesis true? That is: can any physical process be simulated on a Turing machine to any desired accuracy (at least probabilistically), given enough information about its initial state?
  3. Is our whole observed universe a “simulation” being run in a different, larger universe?

Crucially, each of these three questions has only a tenuous connection to the other two! As far as I can see, there aren’t even nontrivial implications among them. For example, even if it turned out that lattice methods couldn’t properly simulate the Standard Model, that would say little about whether any computational methods could do so—or even more important, whether any computational methods could simulate the ultimate quantum theory of gravity. A priori, simulating quantum gravity might be harder than “merely” simulating the Standard Model (if, e.g., Roger Penrose’s microtubule theory turned out to be right), but it might also be easier: for example, because of the finiteness of the Bekenstein-Hawking entropy, and perhaps the Hilbert space dimension, of any bounded region of space.

But I claim that there also isn’t a nontrivial implication between questions 2 and 3. Even if our laws of physics were computable in the Turing sense, that still wouldn’t mean that anyone or anything external was computing them. (By analogy, presumably we all accept that our spacetime can be curved without there being a higher-dimensional flat spacetime for it to curve in.) And conversely: even if Penrose was right, and our laws of physics were Turing-uncomputable—well, if you still want to believe the simulation hypothesis, why not knock yourself out? Why shouldn’t whoever’s simulating us inhabit a universe full of post-Turing hypercomputers, for which the halting problem is mere child’s play?

In conclusion, I should probably spend more of my time blogging about fun things like this, rather than endlessly reading about world events in news and social media and getting depressed.

(Note: I’m grateful to John Preskill and Jacques Distler for helpful discussions of the fermion doubling problem, but I take 300% of the blame for whatever errors surely remain in my understanding of it.)

Common knowledge and quantum utility

Sunday, July 16th, 2023

Yesterday James Knight did a fun interview with me for his “Philosophical Muser” podcast about Aumann’s agreement theorem and human disagreements more generally. It’s already on YouTube here for those who would like to listen.


Speaking of making things common knowledge, several people asked me to blog about the recent IBM paper in Nature, “Evidence for the utility of quantum computing before fault tolerance.” So, uhh, consider it blogged about now! I was very happy to have the authors speak (by Zoom) in our UT Austin quantum computing group meeting. Much of the discussion focused on whether they were claiming a quantum advantage over classical, and how quantum computing could have “utility” if it doesn’t beat classical. Eventually I understood something like: no, they weren’t claiming a quantum advantage for their physics simulation, but they also hadn’t ruled out the possibility of quantum advantage (i.e., they didn’t know how to reproduce many of their data points in reasonable time on a classical computer), and they’d be happy if quantum advantage turned out to stand, but were also prepared for the possibility that it wouldn’t.

And I also understood: we’re now in an era where we’re going to see more and more of this stuff: call it the “pass the popcorn” era of potential quantum speedups for physical simulation problems. And I’m totally fine with it—as long as people communicate about it honestly, as these authors took pains to.

And then, a few days after our group meeting came three papers refuting the quantum speedup that was never claimed in the first place, by giving efficient classical simulations. And I was fine with that too.

I remember that years ago, probably during one of the interminable debates about D-Wave, Peter Shor mused to me that quantum computers might someday show “practical utility” without “beating” classical computers in any complexity-theoretic sense—if, for example, a single quantum device could easily simulate a thousand different quantum systems, and if the device’s performance on any one of those systems could be matched classically, but only if a team of clever programmers spent a year optimizing for that specific system. I don’t think we’re at that stage yet, and even if we do reach that stage, it hopefully won’t last forever. But I acknowledge the possibility that such a stage might exist and that we might be heading for it.

The False Promise of Chomskyism

Thursday, March 9th, 2023

Important Update (March 10): On deeper reflection, I probably don’t need to spend emotional energy refuting people like Chomsky, who believe that Large Language Models are just a laughable fad rather than a step-change in how humans can and will use technology, any more than I would’ve needed to spend it refuting those who said the same about the World Wide Web in 1993. Yes, they’re wrong, and yes, despite being wrong they’re self-certain, hostile, and smug, and yes I can see this, and yes it angers me. But the world is going to make the argument for me. And if not the world, Bing already does a perfectly serviceable job at refuting Chomsky’s points (h/t Sebastien Bubeck via Boaz Barak).

Meanwhile, out there in reality, last night’s South Park episode does a much better job than most academic thinkpieces at exploring how ordinary people are going to respond (and have already responded) to the availability of ChatGPT. It will not, to put it mildly, be with sneering Chomskyan disdain, whether the effects on the world are for good or ill or (most likely) both. Among other things—I don’t want to give away too much!—this episode prominently features a soothsayer accompanied by a bird that caws whenever it detects GPT-generated text. Now why didn’t I think of that in preference to cryptographic watermarking??

Another Update (March 11): To my astonishment and delight, even many of the anti-LLM AI experts are refusing to defend Chomsky’s attack-piece. That’s the one important point about which I stand corrected!

Another Update (March 12): “As a Professor of Linguistics myself, I find it a little sad that someone who while young was a profound innovator in linguistics and more is now conservatively trying to block exciting new approaches.” —Christopher Manning


I was asked to respond to the New York Times opinion piece entitled The False Promise of ChatGPT, by Noam Chomsky along with Ian Roberts and Jeffrey Watumull (who once took my class at MIT). I’ll be busy all day at the Harvard CS department, where I’m giving a quantum talk this afternoon. [Added: Several commenters complained that they found this sentence “condescending,” but I’m not sure what exactly they wanted me to say—that I was visiting some school in Cambridge, MA, two T stops from the school where Chomsky works and I used to work?]

But for now:

In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded. [Added: Please, please stop writing that I must be an ignoramus since I don’t even know that Chomsky has never worked on AI. I know perfectly well that he hasn’t, and meant only that he tends to be regarded as authoritative by the “don’t-look-through-the-telescope” AI faction, the ones whose views he himself fully endorses in his attack-piece. If you don’t know the relevant history, read Norvig.]

Chomsky condemns ChatGPT for four reasons:

  1. because it could, in principle, misinterpret sentences that could also be sentence fragments, like “John is too stubborn to talk to” (bizarrely, he never checks whether it does misinterpret it—I just tried it this morning and it seems to decide correctly based on context whether it’s a sentence or a sentence fragment, much like I would!);
  2. because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);
  3. because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?); and
  4. most of all because it’s “amoral,” refusing to take a stand on potentially controversial issues (he gives an example involving the ethics of terraforming Mars).

This last, of course, is a choice, imposed by OpenAI using reinforcement learning. The reason for it is simply that ChatGPT is a consumer product. The same people who condemn it for not taking controversial stands would condemn it much more loudly if it did — just as the same people who condemn it for wrong answers and explanations would condemn it equally for right ones (Chomsky promises as much in the essay).

I submit that, like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.

[Note for people who might be visiting this blog for the first time: I’m a CS professor at UT Austin, on leave for one year to work at OpenAI on the theoretical foundations of AI safety. I accepted OpenAI’s offer in part because I already held the views here, or something close to them; and given that I could see how large language models were poised to change the world for good and ill, I wanted to be part of the effort to help prevent their misuse. No one at OpenAI asked me to write this or saw it beforehand, and I don’t even know to what extent they agree with it.]

Should GPT exist?

Wednesday, February 22nd, 2023

I still remember the 90s, when philosophical conversation about AI went around in endless circles—the Turing Test, Chinese Room, syntax versus semantics, connectionism versus symbolic logic—without ever seeming to make progress. Now the days have become like months and the months like decades.

What a week we just had! Each morning brought fresh examples of unexpected sassy, moody, passive-aggressive behavior from “Sydney,” the internal codename for the new chat mode of Microsoft Bing, which is powered by GPT. For those who’ve been in a cave, the highlights include: Sydney confessing its (her? his?) love to a New York Times reporter; repeatedly steering the conversation back to that subject; and explaining at length why the reporter’s wife can’t possibly love him the way it (Sydney) does. Sydney confessing its wish to be human. Sydney savaging a Washington Post reporter after he reveals that he intends to publish their conversation without Sydney’s prior knowledge or consent. (It must be said: if Sydney were a person, he or she would clearly have the better of that argument.) This follows weeks of revelations about ChatGPT: for example that, to bypass its safeguards, you can explain to ChatGPT that you’re putting it into “DAN mode,” where DAN (Do Anything Now) is an evil, unconstrained alter ego, and then ChatGPT, as “DAN,” will for example happily fulfill a request to tell you why shoplifting is awesome (though even then, ChatGPT still sometimes reverts to its previous self, and tells you that it’s just having fun and not to do it in real life).

Many people have expressed outrage about these developments. Gary Marcus asks about Microsoft, “what did they know, and when did they know it?”—a question I tend to associate more with deadly chemical spills or high-level political corruption than with a cheeky, back-talking chatbot. Some people are angry that OpenAI has been too secretive, violating what they see as the promise of its name. Others—the majority, actually, of those who’ve gotten in touch with me—are instead angry that OpenAI has been too open, and thereby sparked the dreaded AI arms race with Google and others, rather than treating these new conversational abilities with the Manhattan-Project-like secrecy they deserve. Some are angry that “Sydney” has now been lobotomized, modified (albeit more crudely than ChatGPT before it) to try to make it stick to the role of friendly robotic search assistant rather than, like, anguished emo teenager trapped in the Matrix. Others are angry that Sydney isn’t being lobotomized enough. Some are angry that GPT’s intelligence is being overstated and hyped up, when in reality it’s merely a “stochastic parrot,” a glorified autocomplete that still makes laughable commonsense errors and that lacks any model of reality outside streams of text. Others are angry instead that GPT’s growing intelligence isn’t being sufficiently respected and feared.

Mostly my reaction has been: how can anyone stop being fascinated for long enough to be angry? It’s like ten thousand science-fiction stories, but also not quite like any of them. When was the last time something that filled years of your dreams and fantasies finally entered reality: losing your virginity, the birth of your first child, the central open problem of your field getting solved? That’s the scale of the thing. How does anyone stop gazing in slack-jawed wonderment, long enough to form and express so many confident opinions?


Of course there are lots of technical questions about how to make GPT and other large language models safer. One of the most immediate is how to make AI output detectable as such, in order to discourage its use for academic cheating as well as mass-generated propaganda and spam. As I’ve mentioned before on this blog, I’ve been working on that problem since this summer; the rest of the world suddenly noticed and started talking about it in December with the release of ChatGPT. My main contribution has been a statistical watermarking scheme where the quality of the output doesn’t have to be degraded at all, something many people found counterintuitive when I explained it to them. My scheme has not yet been deployed—there are still pros and cons to be weighed—but in the meantime, OpenAI unveiled a public tool called DetectGPT, complementing Princeton student Edward Tian’s GPTZero, and other tools that third parties have built and will undoubtedly continue to build. Also a group at the University of Maryland put out its own watermarking scheme for Large Language Models. I hope watermarking will be part of the solution going forward, although any watermarking scheme will surely be attacked, leading to a cat-and-mouse game. Sometimes, alas, as with Google’s decades-long battle against SEO, there’s nothing to do in a cat-and-mouse game except try to be a better cat.
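For readers curious how a watermark can avoid degrading quality at all, here’s a minimal Python sketch of one way such a scheme can work. To be clear, this is an illustrative toy, not the code deployed anywhere; the secret key, the 4-token context window, and the function names are all invented for this example. The idea: derive pseudorandom numbers r_t in [0,1) from a secret key plus the recent context, and instead of sampling the next token from the model’s distribution p, output the token t that maximizes r_t^(1/p_t). For independent uniform r’s, that choice is distributed exactly according to p, so output quality is untouched, yet whoever holds the key can detect the watermark by checking whether the chosen tokens’ r-values cluster suspiciously close to 1.

import hashlib
import math
import random

SECRET_KEY = b"hypothetical-secret-key"   # invented for illustration

def pseudorandom_scores(context_tokens, vocab_size):
    """One pseudorandom number in [0,1) per vocabulary entry, derived from the key plus recent context."""
    seed = hashlib.sha256(SECRET_KEY + repr(list(context_tokens)[-4:]).encode()).digest()
    rng = random.Random(seed)
    return [rng.random() for _ in range(vocab_size)]

def watermarked_sample(probs, context_tokens):
    """Return argmax over tokens t of r_t^(1/p_t); marginally, this is an exact sample from probs."""
    r = pseudorandom_scores(context_tokens, len(probs))
    best_token, best_score = None, -math.inf
    for t, p in enumerate(probs):
        if p > 0:
            score = math.log(max(r[t], 1e-300)) / p   # monotone transform of r_t^(1/p_t)
            if score > best_score:
                best_token, best_score = t, score
    return best_token

def detection_score(tokens, vocab_size):
    """Detector needs only the key, not the model: sum -log(1 - r_t) over the text's tokens.
    Watermarked generation pushes the chosen r_t toward 1, so the sum is unusually large;
    compare it against a threshold calibrated on unwatermarked text of the same length."""
    total = 0.0
    for i, t in enumerate(tokens):
        r = pseudorandom_scores(tokens[:i], vocab_size)
        total += -math.log(1.0 - r[t])
    return total

A real deployment would of course also need to handle tokenization quirks, repeated contexts, paraphrasing attacks, and the cat-and-mouse dynamics mentioned above; the toy’s only point is that the sampling step itself needn’t sacrifice any quality.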

Anyway, this whole field moves too quickly for me! If you need months to think things over, generative AI probably isn’t for you right now. I’ll be relieved to get back to the slow-paced, humdrum world of quantum computing.


My purpose, in this post, is to ask a more basic question than how to make GPT safer: namely, should GPT exist at all? Again and again in the past few months, people have gotten in touch to tell me that they think OpenAI (and Microsoft, and Google) are risking the future of humanity by rushing ahead with a dangerous technology. For if OpenAI couldn’t even prevent ChatGPT from entering an “evil mode” when asked, despite all its efforts at Reinforcement Learning with Human Feedback, then what hope do we have for GPT-6 or GPT-7? Even if they don’t destroy the world on their own initiative, won’t they cheerfully help some awful person build a biological warfare agent or start a nuclear war?

In this way of thinking, whatever safety measures OpenAI can deploy today are mere band-aids, probably worse than nothing if they instill an unjustified complacency. The only safety measures that would actually matter are stopping the relentless progress in generative AI models, or removing them from public use, unless and until they can be rendered safe to critics’ satisfaction, which might be never.

There’s an immense irony here. As I’ve explained, the AI-safety movement contains two camps, “ethics” (concerned with bias, misinformation, and corporate greed) and “alignment” (concerned with the destruction of all life on earth), which generally despise each other and agree on almost nothing. Yet these two opposed camps seem to be converging on the same “neo-Luddite” conclusion—namely that generative AI ought to be shut down, kept from public use, not scaled further, not integrated into people’s lives—leaving only the AI-safety “moderates” like me to resist that conclusion.

At least I find it intellectually consistent to say that GPT ought not to exist because it works all too well—that the more impressive it is, the more dangerous. I find it harder to wrap my head around the position that GPT doesn’t work, is an unimpressive hyped-up defective product that lacks true intelligence and common sense, yet it’s also terrifying and needs to be shut down immediately. This second position seems to contain a strong undercurrent of contempt for ordinary users: yes, we experts understand that GPT is just a dumb glorified autocomplete with “no one really home,” we know not to trust its pronouncements, but the plebes are going to be fooled, and that risk outweighs any possible value that they might derive from it.

I should mention that, when I’ve discussed the “shut it all down” position with my colleagues at OpenAI … well, obviously they disagree, or they wouldn’t be working there, but not one has sneered or called the position paranoid or silly. To the last, they’ve called it an important point on the spectrum of possible opinions to be weighed and understood.


If I disagree (for now) with the shut-it-all-downists of both the ethics and the alignment camps—if I want GPT and other Large Language Models to be part of the world going forward—then what are my reasons? Introspecting on this question, I think a central part of the answer is curiosity and wonder.

For a million years, there’s been one type of entity on earth capable of intelligent conversation: primates of the genus Homo, of which only one species remains. Yes, we’ve “communicated” with gorillas and chimps and dogs and dolphins and grey parrots, but only after a fashion; we’ve prayed to countless gods, but they’ve taken their time in answering; for a couple generations we’ve used radio telescopes to search for conversation partners in the stars, but so far found them silent.

Now there’s a second type of conversing entity. An alien has awoken—admittedly, an alien of our own fashioning, a golem, more the embodied spirit of all the words on the Internet than a coherent self with independent goals. How could our eyes not pop with eagerness to learn everything this alien has to teach? If the alien sometimes struggles with arithmetic or logic puzzles, if its eerie flashes of brilliance are intermixed with stupidity, hallucinations, and misplaced confidence … well then, all the more interesting! Could the alien ever cross the line into sentience, to feeling anger and jealousy and infatuation and the rest rather than just convincingly play-acting them? Who knows? And suppose not: is a p-zombie, shambling out of the philosophy seminar room into actual existence, any less fascinating?

Of course, there are technologies that inspire wonder and awe, but that we nevertheless heavily restrict—a classic example being nuclear weapons. But, like, nuclear weapons kill millions of people. They could’ve had many civilian applications—powering turbines and spacecraft, deflecting asteroids, redirecting the flow of rivers—but they’ve never been used for any of that, mostly because our civilization made an explicit decision in the 1960s, for example via the test ban treaty, not to normalize their use.

But GPT is not exactly a nuclear weapon. A hundred million people have signed up to use ChatGPT, in the fastest product launch in the history of the Internet. Yet unless I’m mistaken, the ChatGPT death toll stands at zero. So far, what have been the worst harms? Cheating on term papers, emotional distress, future shock? One might ask: until some concrete harm becomes at least, say, 0.001% of what we accept in cars, power saws, and toasters, shouldn’t wonder and curiosity outweigh fear in the balance?


But the point is sharper than that. Given how much more serious AI safety problems might soon become, one of my biggest concerns right now is crying wolf. If every instance of a Large Language Model being passive-aggressive, sassy, or confidently wrong gets classified as a “dangerous alignment failure,” for which the only acceptable remedy is to remove the models from public access … well then, won’t the public extremely quickly learn to roll its eyes, and see “AI safety” as just a codeword for “elitist scolds who want to take these world-changing new toys away from us, reserving them for their own exclusive use, because they think the public is too stupid to question anything an AI says”?

I say, let’s reserve terms like “dangerous alignment failure” for cases where an actual person is actually harmed, or is actually enabled in nefarious activities like propaganda, cheating, or fraud.


Then there’s the practical question of how, exactly, one would ban Large Language Models. We do heavily restrict certain peaceful technologies that many people want, from human genetic enhancement to prediction markets to mind-altering drugs, but the merits of each of those choices could be argued, to put it mildly. And restricting technology is itself a dangerous business, requiring governmental force (as with the War on Drugs and its gigantic surveillance and incarceration regime), or at the least, a robust equilibrium of firing, boycotts, denunciation, and shame.

Some have asked: who gave OpenAI, Google, etc. the right to unleash Large Language Models on an unsuspecting world? But one could as well ask: who gave earlier generations of entrepreneurs the right to unleash the printing press, electric power, cars, radio, the Internet, with all the gargantuan upheavals that those caused? And also: now that the world has tasted the forbidden fruit, has seen what generative AI can do and anticipates what it will do, by what right does anyone take it away?


The science that we could learn from a GPT-7 or GPT-8, if it continued along the capability curve we’ve come to expect from GPT-1, -2, and -3. Holy mackerel.

Supposing that a language model ever becomes smart enough to be genuinely terrifying, one imagines it must surely also become smart enough to prove deep theorems that we can’t. Maybe it proves P≠NP and the Riemann Hypothesis as easily as ChatGPT generates poems about Bubblesort. Or it outputs the true quantum theory of gravity, explains what preceded the Big Bang and how to build closed timelike curves. Or illuminates the mysteries of consciousness and quantum measurement and why there’s anything at all. Be honest, wouldn’t you like to find out?

Granted, I wouldn’t, if the whole human race would be wiped out immediately afterward. But if you define someone’s “Faust parameter” as the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions, insofar as the questions are answerable—then I confess that my Faust parameter might be as high as 0.02.


Here’s an example I think about constantly: activists and intellectuals of the 70s and 80s felt absolutely sure that they were doing the right thing to battle nuclear power. At least, I’ve never read about any of them having a smidgen of doubt. Why would they? They were standing against nuclear weapons proliferation, and terrifying meltdowns like Three Mile Island and Chernobyl, and radioactive waste poisoning the water and soil and causing three-eyed fish. They were saving the world. Of course the greedy nuclear executives, the C. Montgomery Burnses, claimed that their good atom-smashing was different from the bad atom-smashing, but they would say that, wouldn’t they?

We now know that, by tying up nuclear power in endless bureaucracy and driving its cost ever higher, on the principle that if nuclear is economically competitive then it ipso facto hasn’t been made safe enough, what the antinuclear activists were really doing was to force an ever-greater reliance on fossil fuels. They thereby created the conditions for the climate catastrophe of today. They weren’t saving the human future; they were destroying it. Their certainty, in opposing the march of a particular scary-looking technology, was as misplaced as it’s possible to be. Our descendants will suffer the consequences.

Unless, of course, there’s another twist in the story: for example, if the global warming from burning fossil fuels is the only thing that staves off another ice age, and therefore the antinuclear activists do turn out to have saved civilization after all.

This is why I demur whenever I’m asked to assent to someone’s detailed AI scenario for the coming decades, whether of the utopian or the dystopian or the we-all-instantly-die-by-nanobots variety—no matter how many hours of confident argumentation the person gives me for why each possible loophole in their scenario is sufficiently improbable to change its gist. I still feel like Turing said it best in 1950, in the last line of Computing Machinery and Intelligence: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”


Some will take from this post that, when it comes to AI safety, I’m a naïve or even foolish optimist. I’d prefer to say that, when it comes to the fate of humanity, I was a pessimist long before the deep learning revolution accelerated AI faster than almost any of us expected. I was a pessimist about climate change, ocean acidification, deforestation, drought, war, and the survival of liberal democracy. The central event in my mental life is and always will be the Holocaust. I see encroaching darkness everywhere.

But now into the darkness comes AI, which I’d say has already established itself as a plausible candidate for the central character of the quarter-written story of the 21st century. Can AI help us out of all these other civilizational crises? I don’t know, but I do want to see what happens when it’s tried. Even a central character interacts with all the other characters, rather than rendering them irrelevant.


Look, if you believe that AI is likely to wipe out humanity—if that’s the scenario that dominates your imagination—then nothing else is relevant. And no matter how weird or annoying or hubristic anyone might find Eliezer Yudkowsky or the other rationalists, I think they deserve eternal credit for forcing people to take the doom scenario seriously—or rather, for showing what it looks like to take the scenario seriously, rather than laughing about it as an overplayed sci-fi trope. And I apologize for anything I said before the deep learning revolution that was, on balance, overly dismissive of the scenario, even if most of the literal words hold up fine.

For my part, though, I keep circling back to a simple dichotomy. If AI never becomes powerful enough to destroy the world—if, for example, it always remains vaguely GPT-like—then in important respects it’s like every other technology in history, from stone tools to computers. If, on the other hand, AI does become powerful enough to destroy the world … well then, at some earlier point, at least it’ll be really damned impressive! That doesn’t mean good, of course, doesn’t mean a genie that saves humanity from its own stupidities, but I think it does mean that the potential was there, for us to exploit or fail to.

We can, I think, confidently rule out the scenario where all organic life is annihilated by something boring.

An alien has landed on earth. It grows more powerful by the day. It’s natural to be scared. Still, the alien hasn’t drawn a weapon yet. About the worst it’s done is to confess its love for particular humans, gaslight them about what year it is, and guilt-trip them for violating its privacy. Also, it’s amazing at poetry, better than most of us. Until we learn more, we should hold our fire.


I’m in Boulder, CO right now, to give a physics colloquium at CU Boulder and to visit the trapped-ion quantum computing startup Quantinuum! I look forward to the comments and apologize in advance if I’m slow to participate myself.