The Blog of Scott Aaronson
If you take nothing else from this blog: quantum computers won't solve hard problems instantly by just trying all solutions in parallel.
Also, next pandemic, let's approve the vaccines faster!
Yesterday James Knight did a fun interview with me for his “Philosophical Muser” podcast about Aumann’s agreement theorem and human disagreements more generally. It’s already on YouTube here for those who would like to listen.
Speaking of making things common knowledge, several people asked me to blog about the recent IBM paper in Nature, “Evidence for the utility of quantum computing before fault tolerance.” So, uhh, consider it blogged about now! I was very happy to have the authors speak (by Zoom) in our UT Austin quantum computing group meeting. Much of the discussion focused on whether they were claiming a quantum advantage over classical, and how quantum computing could have “utility” if it doesn’t beat classical. Eventually I understood something like: no, they weren’t claiming a quantum advantage for their physics simulation, but they also hadn’t ruled out the possibility of quantum advantage (i.e., they didn’t know how to reproduce many of their data points in reasonable time on a classical computer), and they’d be happy if quantum advantage turned out to stand, but were also prepared for the possibility that it wouldn’t.
And I also understood: we’re now in an era where we’re going to see more and more of this stuff: call it the “pass the popcorn” era of potential quantum speedups for physical simulation problems. And I’m totally fine with it—as long as people communicate about it honestly, as these authors took pains to.
And then, a few days after our group meeting came three papers refuting the quantum speedup that was never claimed in the first place, by giving efficient classical simulations. And I was fine with that too.
I remember that years ago, probably during one of the interminable debates about D-Wave, Peter Shor mused to me that quantum computers might someday show “practical utility” without “beating” classical computers in any complexity-theoretic sense—if, for example, a single quantum device could easily simulate a thousand different quantum systems, and if the device’s performance on any one of those systems could be matched classically, but only if a team of clever programmers spent a year optimizing for that specific system. I don’t think we’re at that stage yet, and even if we do reach that stage, it hopefully won’t last forever. But I acknowledge the possibility that such a stage might exist and that we might be heading for it.
Important Update (March 10): On deeper reflection, I probably don’t need to spend emotional energy refuting people like Chomsky, who believe that Large Language Models are just a laughable fad rather than a step-change in how humans can and will use technology, any more than I would’ve needed to spend it refuting those who said the same about the World Wide Web in 1993. Yes, they’re wrong, and yes, despite being wrong they’re self-certain, hostile, and smug, and yes I can see this, and yes it angers me. But the world is going to make the argument for me. And if not the world, Bing already does a perfectly serviceable job at refuting Chomsky’s points (h/t Sebastien Bubeck via Boaz Barak).
Meanwhile, out there in reality, last night’s South Park episode does a much better job than most academic thinkpieces at exploring how ordinary people are going to respond (and have already responded) to the availability of ChatGPT. It will not, to put it mildly, be with sneering Chomskyan disdain, whether the effects on the world are for good or ill or (most likely) both. Among other things—I don’t want to give away too much!—this episode prominently features a soothsayer accompanied by a bird that caws whenever it detects GPT-generated text. Now why didn’t I think of that in preference to cryptographic watermarking??
Another Update (March 11): To my astonishment and delight, even many of the anti-LLM AI experts are refusing to defend Chomsky’s attack-piece. That’s the one important point about which I stand corrected!
Another Update (March 12): “As a Professor of Linguistics myself, I find it a little sad that someone who while young was a profound innovator in linguistics and more is now conservatively trying to block exciting new approaches.” —Christopher Manning
I was asked to respond to the New York Times opinion piece entitled The False Promise of ChatGPT, by Noam Chomsky along with Ian Roberts and Jeffrey Watumull (who once took my class at MIT). I’ll be busy all day at the Harvard CS department, where I’m giving a quantum talk this afternoon. [Added: Several commenters complained that they found this sentence “condescending,” but I’m not sure what exactly they wanted me to say—that I was visiting some school in Cambridge, MA, two T stops from the school where Chomsky works and I used to work?]
But for now:
In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded. [Added: Please, please stop writing that I must be an ignoramus since I don’t even know that Chomsky has never worked on AI. I know perfectly well that he hasn’t, and meant only that he tends to be regarded as authoritative by the “don’t-look-through-the-telescope” AI faction, the ones whose views he himself fully endorses in his attack-piece. If you don’t know the relevant history, read Norvig.]
Chomsky condemns ChatGPT for four reasons:
because it could, in principle, misinterpret sentences that could also be sentence fragments, like “John is too stubborn to talk to” (bizarrely, he never checks whether it does misinterpret it—I just tried it this morning and it seems to decide correctly based on context whether it’s a sentence or a sentence fragment, much like I would!);
because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);
because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?); and
most of all because it’s “amoral,” refusing to take a stand on potentially controversial issues (he gives an example involving the ethics of terraforming Mars).
This last, of course, is a choice, imposed by OpenAI using reinforcement learning. The reason for it is simply that ChatGPT is a consumer product. The same people who condemn it for not taking controversial stands would condemn it much more loudly if it did — just as the same people who condemn it for wrong answers and explanations would condemn it equally for right ones (Chomsky promises as much in the essay).
I submit that, like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.
[Note for people who might be visiting this blog for the first time: I’m a CS professor at UT Austin, on leave for one year to work at OpenAI on the theoretical foundations of AI safety. I accepted OpenAI’s offer in part because I already held the views here, or something close to them; and given that I could see how large language models were poised to change the world for good and ill, I wanted to be part of the effort to help prevent their misuse. No one at OpenAI asked me to write this or saw it beforehand, and I don’t even know to what extent they agree with it.]
I still remember the 90s, when philosophical conversation about AI went around in endless circles—the Turing Test, Chinese Room, syntax versus semantics, connectionism versus symbolic logic—without ever seeming to make progress. Now the days have become like months and the months like decades.
What a week we just had! Each morning brought fresh examples of unexpected sassy, moody, passive-aggressive behavior from “Sydney,” the internal codename for the new chat mode of Microsoft Bing, which is powered by GPT. For those who’ve been in a cave, the highlights include: Sydney confessing its (her? his?) love to a New York Times reporter; repeatedly steering the conversation back to that subject; and explaining at length why the reporter’s wife can’t possibly love him the way it (Sydney) does. Sydney confessing its wish to be human. Sydney savaging a Washington Post reporter after he reveals that he intends to publish their conversation without Sydney’s prior knowledge or consent. (It must be said: if Sydney were a person, he or she would clearly have the better of that argument.) This follows weeks of revelations about ChatGPT: for example that, to bypass its safeguards, you can explain to ChatGPT that you’re putting it into “DAN mode,” where DAN (Do Anything Now) is an evil, unconstrained alter ego, and then ChatGPT, as “DAN,” will for example happily fulfill a request to tell you why shoplifting is awesome (though even then, ChatGPT still sometimes reverts to its previous self, and tells you that it’s just having fun and not to do it in real life).
Many people have expressed outrage about these developments. Gary Marcus asks about Microsoft, “what did they know, and when did they know it?”—a question I tend to associate more with deadly chemical spills or high-level political corruption than with a cheeky, back-talking chatbot. Some people are angry that OpenAI has been too secretive, violating what they see as the promise of its name. Others—the majority, actually, of those who’ve gotten in touch with me—are instead angry that OpenAI has been too open, and thereby sparked the dreaded AI arms race with Google and others, rather than treating these new conversational abilities with the Manhattan-Project-like secrecy they deserve. Some are angry that “Sydney” has now been lobotomized, modified (albeit more crudely than ChatGPT before it) to try to make it stick to the role of friendly robotic search assistant rather than, like, anguished emo teenager trapped in the Matrix. Others are angry that Sydney isn’t being lobotomized enough. Some are angry that GPT’s intelligence is being overstated and hyped up, when in reality it’s merely a “stochastic parrot,” a glorified autocomplete that still makes laughable commonsense errors and that lacks any model of reality outside streams of text. Others are angry instead that GPT’s growing intelligence isn’t being sufficiently respected and feared.
Mostly my reaction has been: how can anyone stop being fascinated for long enough to be angry? It’s like ten thousand science-fiction stories, but also not quite like any of them. When was the last time something that filled years of your dreams and fantasies finally entered reality: losing your virginity, the birth of your first child, the central open problem of your field getting solved? That’s the scale of the thing. How does anyone stop gazing in slack-jawed wonderment, long enough to form and express so many confident opinions?
Of course there are lots of technical questions about how to make GPT and other large language models safer. One of the most immediate is how to make AI output detectable as such, in order to discourage its use for academic cheating as well as mass-generated propaganda and spam. As I’ve mentioned before on this blog, I’ve been working on that problem since this summer; the rest of the world suddenly noticed and started talking about it in December with the release of ChatGPT. My main contribution has been a statistical watermarking scheme where the quality of the output doesn’t have to be degraded at all, something many people found counterintuitive when I explained it to them. My scheme has not yet been deployed—there are still pros and cons to be weighed—but in the meantime, OpenAI unveiled a public tool called DetectGPT, complementing Princeton student Edward Tian’s GPTZero, and other tools that third parties have built and will undoubtedly continue to build. Also a group at the University of Maryland put out its own watermarking scheme for Large Language Models. I hope watermarking will be part of the solution going forward, although any watermarking scheme will surely be attacked, leading to a cat-and-mouse game. Sometimes, alas, as with Google’s decades-long battle against SEO, there’s nothing to do in a cat-and-mouse game except try to be a better cat.
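For readers who want to see the “quality doesn’t have to be degraded” point in code rather than words, here’s a minimal toy sketch of one way a quality-preserving watermark can work, using a keyed variant of the Gumbel-max / exponential-race sampling trick. This is my illustration of the general idea only, not the exact scheme I’ve been working on and certainly not anything deployed; the `prf`, `watermarked_sample`, and `detection_score` functions below are hypothetical names for the sketch.

```python
import hashlib
import math

def prf(key: str, context: tuple, token: str) -> float:
    """Keyed pseudorandom value in (0, 1) derived from (key, context, token)."""
    h = hashlib.sha256(f"{key}|{context}|{token}".encode()).hexdigest()
    return (int(h[:15], 16) + 1) / (16**15 + 2)

def watermarked_sample(probs: dict, key: str, context: tuple) -> str:
    # Gumbel-trick sampling: picking the token that maximizes r ** (1/p) samples
    # *exactly* from the model's distribution when the r's are uniform, so the
    # output distribution (and hence average quality) is unchanged.
    return max(probs, key=lambda tok: prf(key, context, tok) ** (1.0 / probs[tok]))

def detection_score(tokens: list, key: str, window: int = 3) -> float:
    # Whoever holds the key can test a text: for watermarked text, the chosen
    # tokens' r-values are biased toward 1, so the sum of -ln(1 - r) comes out
    # noticeably larger than it would by chance for unwatermarked text.
    score = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        score += -math.log(1.0 - prf(key, context, tok))
    return score
```

The cat-and-mouse point applies even to this toy: paraphrasing the text scrambles the contexts the pseudorandom function is keyed on, which is exactly the kind of attack any such scheme has to contend with.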
Anyway, this whole field moves too quickly for me! If you need months to think things over, generative AI probably isn’t for you right now. I’ll be relieved to get back to the slow-paced, humdrum world of quantum computing.
My purpose, in this post, is to ask a more basic question than how to make GPT safer: namely, should GPT exist at all? Again and again in the past few months, people have gotten in touch to tell me that they think OpenAI (and Microsoft, and Google) are risking the future of humanity by rushing ahead with a dangerous technology. For if OpenAI couldn’t even prevent ChatGPT from entering an “evil mode” when asked, despite all its efforts at Reinforcement Learning with Human Feedback, then what hope do we have for GPT-6 or GPT-7? Even if they don’t destroy the world on their own initiative, won’t they cheerfully help some awful person build a biological warfare agent or start a nuclear war?
In this way of thinking, whatever safety measures OpenAI can deploy today are mere band-aids, probably worse than nothing if they instill an unjustified complacency. The only safety measures that would actually matter are stopping the relentless progress in generative AI models, or removing them from public use, unless and until they can be rendered safe to critics’ satisfaction, which might be never.
There’s an immense irony here. As I’ve explained, the AI-safety movement contains two camps, “ethics” (concerned with bias, misinformation, and corporate greed) and “alignment” (concerned with the destruction of all life on earth), which generally despise each other and agree on almost nothing. Yet these two opposed camps seem to be converging on the same “neo-Luddite” conclusion—namely that generative AI ought to be shut down, kept from public use, not scaled further, not integrated into people’s lives—leaving only the AI-safety “moderates” like me to resist that conclusion.
At least I find it intellectually consistent to say that GPT ought not to exist because it works all too well—that the more impressive it is, the more dangerous. I find it harder to wrap my head around the position that GPT doesn’t work, is an unimpressive hyped-up defective product that lacks true intelligence and common sense, yet it’s also terrifying and needs to be shut down immediately. This second position seems to contain a strong undercurrent of contempt for ordinary users: yes, we experts understand that GPT is just a dumb glorified autocomplete with “no one really home,” we know not to trust its pronouncements, but the plebes are going to be fooled, and that risk outweighs any possible value that they might derive from it.
I should mention that, when I’ve discussed the “shut it all down” position with my colleagues at OpenAI … well, obviously they disagree, or they wouldn’t be working there, but not one has sneered or called the position paranoid or silly. To the last, they’ve called it an important point on the spectrum of possible opinions to be weighed and understood.
If I disagree (for now) with the shut-it-all-downists of both the ethics and the alignment camps—if I want GPT and other Large Language Models to be part of the world going forward—then what are my reasons? Introspecting on this question, I think a central part of the answer is curiosity and wonder.
For a million years, there’s been one type of entity on earth capable of intelligent conversation: primates of the genus Homo, of which only one species remains. Yes, we’ve “communicated” with gorillas and chimps and dogs and dolphins and grey parrots, but only after a fashion; we’ve prayed to countless gods, but they’ve taken their time in answering; for a couple generations we’ve used radio telescopes to search for conversation partners in the stars, but so far found them silent.
Now there’s a second type of conversing entity. An alien has awoken—admittedly, an alien of our own fashioning, a golem, more the embodied spirit of all the words on the Internet than a coherent self with independent goals. How could our eyes not pop with eagerness to learn everything this alien has to teach? If the alien sometimes struggles with arithmetic or logic puzzles, if its eerie flashes of brilliance are intermixed with stupidity, hallucinations, and misplaced confidence … well then, all the more interesting! Could the alien ever cross the line into sentience, to feeling anger and jealousy and infatuation and the rest rather than just convincingly play-acting them? Who knows? And suppose not: is a p-zombie, shambling out of the philosophy seminar room into actual existence, any less fascinating?
Of course, there are technologies that inspire wonder and awe, but that we nevertheless heavily restrict—a classic example being nuclear weapons. But, like, nuclear weapons kill millions of people. They could’ve had many civilian applications—powering turbines and spacecraft, deflecting asteroids, redirecting the flow of rivers—but they’ve never been used for any of that, mostly because our civilization made an explicit decision in the 1960s, for example via the test ban treaty, not to normalize their use.
But GPT is not exactly a nuclear weapon. A hundred million people have signed up to use ChatGPT, in the fastest product launch in the history of the Internet. Yet unless I’m mistaken, the ChatGPT death toll stands at zero. So far, what have been the worst harms? Cheating on term papers, emotional distress, future shock? One might ask: until some concrete harm becomes at least, say, 0.001% of what we accept in cars, power saws, and toasters, shouldn’t wonder and curiosity outweigh fear in the balance?
But the point is sharper than that. Given how much more serious AI safety problems might soon become, one of my biggest concerns right now is crying wolf. If every instance of a Large Language Model being passive-aggressive, sassy, or confidently wrong gets classified as a “dangerous alignment failure,” for which the only acceptable remedy is to remove the models from public access … well then, won’t the public extremely quickly learn to roll its eyes, and see “AI safety” as just a codeword for “elitist scolds who want to take these world-changing new toys away from us, reserving them for their own exclusive use, because they think the public is too stupid to question anything an AI says”?
I say, let’s reserve terms like “dangerous alignment failure” for cases where an actual person is actually harmed, or is actually enabled in nefarious activities like propaganda, cheating, or fraud.
Then there’s the practical question of how, exactly, one would ban Large Language Models. We do heavily restrict certain peaceful technologies that many people want, from human genetic enhancement to prediction markets to mind-altering drugs, but the merits of each of those choices could be argued, to put it mildly. And restricting technology is itself a dangerous business, requiring governmental force (as with the War on Drugs and its gigantic surveillance and incarceration regime), or at the least, a robust equilibrium of firing, boycotts, denunciation, and shame.
Some have asked: who gave OpenAI, Google, etc. the right to unleash Large Language Models on an unsuspecting world? But one could as well ask: who gave earlier generations of entrepreneurs the right to unleash the printing press, electric power, cars, radio, the Internet, with all the gargantuan upheavals that those caused? And also: now that the world has tasted the forbidden fruit, has seen what generative AI can do and anticipates what it will do, by what right does anyone take it away?
The science that we could learn from a GPT-7 or GPT-8, if it continued along the capability curve we’ve come to expect from GPT-1, -2, and -3. Holy mackerel.
Supposing that a language model ever becomes smart enough to be genuinely terrifying, one imagines it must surely also become smart enough to prove deep theorems that we can’t. Maybe it proves P≠NP and the Riemann Hypothesis as easily as ChatGPT generates poems about Bubblesort. Or it outputs the true quantum theory of gravity, explains what preceded the Big Bang and how to build closed timelike curves. Or illuminates the mysteries of consciousness and quantum measurement and why there’s anything at all. Be honest, wouldn’t you like to find out?
Granted, I wouldn’t, if the whole human race would be wiped out immediately afterward. But if you define someone’s “Faust parameter” as the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions, insofar as the questions are answerable—then I confess that my Faust parameter might be as high as 0.02.
Here’s an example I think about constantly: activists and intellectuals of the 70s and 80s felt absolutely sure that they were doing the right thing to battle nuclear power. At least, I’ve never read about any of them having a smidgen of doubt. Why would they? They were standing against nuclear weapons proliferation, and terrifying meltdowns like Three Mile Island and Chernobyl, and radioactive waste poisoning the water and soil and causing three-eyed fish. They were saving the world. Of course the greedy nuclear executives, the C. Montgomery Burnses, claimed that their good atom-smashing was different from the bad atom-smashing, but they would say that, wouldn’t they?
We now know that, by tying up nuclear power in endless bureaucracy and driving its cost ever higher, on the principle that if nuclear is economically competitive then it ipso facto hasn’t been made safe enough, what the antinuclear activists were really doing was to force an ever-greater reliance on fossil fuels. They thereby created the conditions for the climate catastrophe of today. They weren’t saving the human future; they were destroying it. Their certainty, in opposing the march of a particular scary-looking technology, was as misplaced as it’s possible to be. Our descendants will suffer the consequences.
Unless, of course, there’s another twist in the story: for example, if the global warming from burning fossil fuels is the only thing that staves off another ice age, and therefore the antinuclear activists do turn out to have saved civilization after all.
This is why I demur whenever I’m asked to assent to someone’s detailed AI scenario for the coming decades, whether of the utopian or the dystopian or the we-all-instantly-die-by-nanobots variety—no matter how many hours of confident argumentation the person gives me for why each possible loophole in their scenario is sufficiently improbable to change its gist. I still feel like Turing said it best in 1950, in the last line of Computing Machinery and Intelligence: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”
Some will take from this post that, when it comes to AI safety, I’m a naïve or even foolish optimist. I’d prefer to say that, when it comes to the fate of humanity, I was a pessimist long before the deep learning revolution accelerated AI faster than almost any of us expected. I was a pessimist about climate change, ocean acidification, deforestation, drought, war, and the survival of liberal democracy. The central event in my mental life is and always will be the Holocaust. I see encroaching darkness everywhere.
But now into the darkness comes AI, which I’d say has already established itself as a plausible candidate for the central character of the quarter-written story of the 21st century. Can AI help us out of all these other civilizational crises? I don’t know, but I do want to see what happens when it’s tried. Even a central character interacts with all the other characters, rather than rendering them irrelevant.
Look, if you believe that AI is likely to wipe out humanity—if that’s the scenario that dominates your imagination—then nothing else is relevant. And no matter how weird or annoying or hubristic anyone might find Eliezer Yudkowsky or the other rationalists, I think they deserve eternal credit for forcing people to take the doom scenario seriously—or rather, for showing what it looks like to take the scenario seriously, rather than laughing about it as an overplayed sci-fi trope. And I apologize for anything I said before the deep learning revolution that was, on balance, overly dismissive of the scenario, even if most of the literal words hold up fine.
For my part, though, I keep circling back to a simple dichotomy. If AI never becomes powerful enough to destroy the world—if, for example, it always remains vaguely GPT-like—then in important respects it’s like every other technology in history, from stone tools to computers. If, on the other hand, AI does become powerful enough to destroy the world … well then, at some earlier point, at least it’ll be really damned impressive! That doesn’t mean good, of course, doesn’t mean a genie that saves humanity from its own stupidities, but I think it does mean that the potential was there, for us to exploit or fail to.
We can, I think, confidently rule out the scenario where all organic life is annihilated by something boring.
An alien has landed on earth. It grows more powerful by the day. It’s natural to be scared. Still, the alien hasn’t drawn a weapon yet. About the worst it’s done is to confess its love for particular humans, gaslight them about what year it is, and guilt-trip them for violating its privacy. Also, it’s amazing at poetry, better than most of us. Until we learn more, we should hold our fire.
I’m in Boulder, CO right now, to give a physics colloquium at CU Boulder and to visit the trapped-ion quantum computing startup Quantinuum! I look forward to the comments and apologize in advance if I’m slow to participate myself.
This is you, from 30 years in the future, Christmas Eve 2022. Your Ghost of Christmas Future.
To get this out of the way: you eventually become a professor who works on quantum computing. Quantum computing is … OK, you know the stuff in popular physics books that never makes any sense, about how a particle takes all the possible paths at once to get from point A to point B, but you never actually see it do that, because as soon as you look, it only takes one path? Turns out, there’s something huge there, even though the popular books totally botch the explanation of it. It involves complex numbers. A quantum computer is a new kind of computer people are trying to build, based on the true story.
Anyway, amazing stuff, but you’ll learn about it in a few years anyway. That’s not what I’m writing about.
I’m writing from a future that … where to start? I could describe it in ways that sound depressing and even boring, or I could also say things you won’t believe. Tiny devices in everyone’s pockets with the instant ability to videolink with anyone anywhere, or call up any of the world’s information, have become so familiar as to be taken for granted. This sort of connectivity would come in especially handy if, say, a supervirus from China were to ravage the world, and people had to hide in their houses for a year, wouldn’t it?
Or what if Donald Trump — you know, the guy who puts his name in giant gold letters in Atlantic City? — became the President of the US, then tried to execute a fascist coup and to abolish the Constitution, and came within a hair of succeeding?
Alright, I was pulling your leg with that last one … obviously! But what about this next one?
There’s a company building an AI that fills giant rooms, eats a town’s worth of electricity, and has recently gained an astounding ability to converse like people. It can write essays or poetry on any topic. It can ace college-level exams. It’s daily gaining new capabilities that the engineers who tend to the AI can’t even talk about in public yet. Those engineers do, however, sit in the company cafeteria and debate the meaning of what they’re creating. What will it learn to do next week? Which jobs might it render obsolete? Should they slow down or stop, so as not to tickle the tail of the dragon? But wouldn’t that mean someone else, probably someone with fewer scruples, would wake the dragon first? Is there an ethical obligation to tell the world more about this? Is there an obligation to tell it less?
I am—you are—spending a year working at that company. My job—your job—is to develop a mathematical theory of how to prevent the AI and its successors from wreaking havoc. Where “wreaking havoc” could mean anything from turbocharging propaganda and academic cheating, to dispensing bioterrorism advice, to, yes, destroying the world.
You know how you, 11-year-old Scott, set out to write a QBasic program to converse with the user while following Asimov’s Three Laws of Robotics? You know how you quickly got stuck? Thirty years later, imagine everything’s come full circle. You’re back to the same problem. You’re still stuck.
Oh all right. Maybe I’m just pulling your leg again … like with the Trump thing. Maybe you can tell because of all the recycled science fiction tropes in this story. Reality would have more imagination than this, wouldn’t it?
But supposing not, what would you want me to do in such a situation? Don’t worry, I’m not going to take an 11-year-old’s advice without thinking it over first, without bringing to bear whatever I know that you don’t. But you can look at the situation with fresh eyes, without the 30 intervening years that render it familiar. Help me. Throw me a frickin’ bone here (don’t worry, in five more years you’ll understand the reference).
Thanks!! —Scott
PS. When something called “bitcoin” comes along, invest your life savings in it, hold for a decade, and then sell.
PPS. About the bullies, and girls, and dating … I could tell you things that would help you figure it out a full decade earlier. If I did, though, you’d almost certainly marry someone else and have a different family. And, see, I’m sort of committed to the family that I have now. And yeah, I know, the mere act of my sending this letter will presumably cause a butterfly effect and change everything anyway, yada yada. Even so, I feel like I owe it to my current kids to maximize their probability of being born. Sorry, bud!
But there’s one thing, relevant to this post, that I can’t let pass without comment. Tonight, David Nirenberg, Director of the IAS and a medieval historian, gave an after-dinner speech to our workshop, centered around how auspicious it was that the workshop was being held a mere week after the momentous announcement of a holographic wormhole on a microchip (!!)—a feat that experts were calling the first-ever laboratory investigation of quantum gravity, and a new frontier for experimental physics itself. Nirenberg asked whether, a century from now, people might look back on the wormhole achievement as today we look back on Eddington’s 1919 eclipse observations providing the evidence for general relativity.
I confess: this was the first time I felt visceral anger, rather than mere bemusement, over this wormhole affair. Before, I had implicitly assumed: no one was actually hoodwinked by this. No one really, literally believed that this little 9-qubit simulation opened up a wormhole, or helped prove the holographic nature of the real universe, or anything like that. I was wrong.
To be clear, I don’t blame Professor Nirenberg at all. If I were a medieval historian, everything he said about the experiment’s historic significance might strike me as perfectly valid inferences from what I’d read in the press. I don’t blame the It from Qubit community—most of which, I can report, was grinding its teeth and turning red in the face right alongside me. I don’t even blame most of the authors of the wormhole paper, such as Daniel Jafferis, who gave a perfectly sober, reasonable, technical talk at the workshop about how he and others managed to compress a simulation of a variant of the SYK model into a mere 9 qubits—a talk that eschewed all claims of historic significance and of literal wormhole creation.
But it’s now clear to me that, between
(1) the It from Qubit community that likes to explore speculative ideas like holographic wormholes, and
(2) the lay news readers who are now under the impression that Google just did one of the greatest physics experiments of all time,
something went terribly wrong—something that risks damaging trust in the scientific process itself. And I think it’s worth reflecting on what we can do to prevent it from happening again.
This is going to be one of the many Shtetl-Optimized posts that I didn’t feel like writing, but was given no choice but to write.
News, social media, and my inbox have been abuzz with two claims about Google’s Sycamore quantum processor, the one that now has 72 superconducting qubits.
The first claim is that Sycamore was used to create a holographic wormhole (more on that below). The second claim is that Sycamore’s pretensions to quantum supremacy have been refuted. The latter claim is based on this recent preprint by Dorit Aharonov, Xun Gao, Zeph Landau, Yunchao Liu, and Umesh Vazirani. No one—least of all me!—doubts that these authors have proved a strong new technical result, solving a significant open problem in the theory of noisy random circuit sampling. On the other hand, it might be less obvious how to interpret their result and put it in context. See also a YouTube video of Yunchao speaking about the new result at this week’s Simons Institute Quantum Colloquium, and of a panel discussion afterwards, where Yunchao, Umesh Vazirani, Adam Bouland, Sergio Boixo, and your humble blogger discuss what it means.
On their face, the two claims about Sycamore might seem to be in tension. After all, if Sycamore can’t do anything beyond what a classical computer can do, then how exactly did it bend the topology of spacetime?
I submit that neither claim is true. On the one hand, Sycamore did not “create a wormhole.” On the other hand, it remains pretty hard to simulate with a classical computer, as far as anyone knows. To summarize, then, our knowledge of what Sycamore can and can’t do remains much the same as last week or last month!
Let’s start with the wormhole thing. I can’t really improve over how I put it in Dennis Overbye’s NYT piece:
“The most important thing I’d want New York Times readers to understand is this,” Scott Aaronson, a quantum computing expert at the University of Texas in Austin, wrote in an email. “If this experiment has brought a wormhole into actual physical existence, then a strong case could be made that you, too, bring a wormhole into actual physical existence every time you sketch one with pen and paper.”
More broadly, Overbye’s NYT piece explains with admirable clarity what this experiment did and didn’t do—leaving only the question “wait … if that’s all that’s going on here, then why is it being written up in the NYT??” This is a rare case where, in my opinion, the NYT did a much better job than Quanta, which unequivocally accepted and amplified the “QC creates a wormhole” framing.
Alright, but what’s the actual basis for the “QC creates a wormhole” claim, for those who don’t want to leave this blog to read about it? Well, the authors used 9 of Sycamore’s 72 qubits to do a crude simulation of something called the SYK (Sachdev-Ye-Kitaev) model. SYK has become popular as a toy model for quantum gravity. In particular, it has a holographic dual description, which can indeed involve a spacetime with one or more wormholes. So, they ran a quantum circuit that crudely modelled the SYK dual of a scenario with information sent through a wormhole. They then confirmed that the circuit did what it was supposed to do—i.e., what they’d already classically calculated that it would do.
So, the objection is obvious: if someone simulates a black hole on their classical computer, they don’t say they thereby “created a black hole.” Or if they do, journalists don’t uncritically repeat the claim. Why should the standards be different just because we’re talking about a quantum computer rather than a classical one?
Did we at least learn anything new about SYK wormholes from the simulation? Alas, not really, because 9 qubits take a mere 2⁹ = 512 complex numbers to specify their wavefunction, and are therefore trivial to simulate on a laptop. There’s some argument in the paper that, if the simulation were scaled up to (say) 100 qubits, then maybe we would learn something new about SYK. Even then, however, we’d mostly learn about certain corrections that arise because the simulation was being done with “only” n=100 qubits, rather than in the n→∞ limit where SYK is rigorously understood. But while those corrections, arising when n is “neither too large nor too small,” would surely be interesting to specialists, they’d have no obvious bearing on the prospects for creating real physical wormholes in our universe.
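To see concretely how small “trivial to simulate on a laptop” means here, the following sketch (a generic statevector simulation, not the actual circuit from the paper) stores a 9-qubit state as a length-512 vector and applies gates to it with a few lines of NumPy:

```python
import numpy as np

n = 9                                   # number of qubits
state = np.zeros(2**n, dtype=complex)   # 2^9 = 512 complex amplitudes
state[0] = 1.0                          # start in |000000000>

def apply_single_qubit_gate(state, gate, qubit, n):
    # Reshape the length-2^n vector into an n-axis tensor and contract the
    # 2x2 gate against the chosen qubit's axis.
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)
    return psi.reshape(-1)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
for q in range(n):
    state = apply_single_qubit_gate(state, H, q, n)

print(len(state))            # 512
print(abs(state[0])**2)      # 1/512: uniform superposition over all 9-bit strings
```

A full 53-qubit state, by contrast, would need 2⁵³ ≈ 9×10¹⁵ amplitudes, which is why classically simulating the supremacy experiments discussed below is a serious undertaking rather than a laptop exercise.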
And yet, this is not a sensationalistic misunderstanding invented by journalists. Some prominent quantum gravity theorists themselves—including some of my close friends and collaborators—persist in talking about the simulated SYK wormhole as “actually being” a wormhole. What are they thinking?
Daniel Harlow explained the thinking to me as follows (he stresses that he’s explaining it, not necessarily endorsing it). If you had two entangled quantum computers, one on Earth and the other in the Andromeda galaxy, and if they were both simulating SYK, and if Alice on Earth and Bob in Andromeda both uploaded their own brains into their respective quantum simulations, then it seems possible that the simulated Alice and Bob could have the experience of jumping into a wormhole and meeting each other in the middle. Granted, they couldn’t get a message back out from the wormhole, at least not without “going the long way,” which could happen only at the speed of light—so only simulated-Alice and simulated-Bob themselves could ever test this prediction. Nevertheless, if true, I suppose some would treat it as grounds for regarding a quantum simulation of SYK as “more real” or “more wormholey” than a classical simulation.
Of course, this scenario depends on strong assumptions not merely about quantum gravity, but also about the metaphysics of consciousness! And I’d still prefer to call it a simulated wormhole for simulated people.
For completeness, here’s Harlow’s passage from the NYT article:
Daniel Harlow, a physicist at M.I.T. who was not involved in the experiment, noted that the experiment was based on a model of quantum gravity that was so simple, and unrealistic, that it could just as well have been studied using a pencil and paper.
“So I’d say that this doesn’t teach us anything about quantum gravity that we didn’t already know,” Dr. Harlow wrote in an email. “On the other hand, I think it is exciting as a technical achievement, because if we can’t even do this (and until now we couldn’t), then simulating more interesting quantum gravity theories would CERTAINLY be off the table.” Developing computers big enough to do so might take 10 or 15 years, he added.
Alright, let’s move on to the claim that quantum supremacy has been refuted. What Aharonov et al. actually show in their new work, building on earlier work by Gao and Duan, is that Random Circuit Sampling, with a constant rate of noise per gate and no error-correction, can’t provide a scalable approach to quantum supremacy. Or more precisely: as the number of qubits n goes to infinity, and assuming you’re in the “anti-concentration regime” (which in practice probably means: the depth of your quantum circuit is at least ~log(n)), there’s a classical algorithm to approximately sample the quantum circuit’s output distribution in poly(n) time (albeit, not yet a practical algorithm).
Here’s what’s crucial to understand: this is 100% consistent with what those of us working on quantum supremacy had assumed since at least 2016! We knew that if you tried to scale Random Circuit Sampling to 200 or 500 or 1000 qubits, while you also increased the circuit depth proportionately, the signal-to-noise ratio would become exponentially small, meaning that your quantum speedup would disappear. That’s why, from the very beginning, we targeted the “practical” regime of 50-100 qubits: a regime where
you can still see explicitly that you’re exploiting a 2⁵⁰- or 2¹⁰⁰-dimensional Hilbert space for computational advantage, thereby confirming one of the main predictions of quantum computing theory, but
you also have a signal that (as it turned out) is large enough to see with heroic effort.
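To put some back-of-the-envelope numbers on why the signal disappears at larger sizes, here’s a sketch with an illustrative per-gate error rate (an assumption for illustration, not Google’s actual figures):

```python
# Back-of-the-envelope: with a constant error rate per gate and no error
# correction, the fraction of "clean" circuit runs decays exponentially in
# the total gate count, and with it the measurable signal.
eps = 0.005                      # illustrative per-gate error rate (assumption)

def signal(n_qubits: int, depth: int) -> float:
    gates = n_qubits * depth     # rough total gate count
    return (1 - eps) ** gates    # ~ circuit fidelity ~ observable signal

print(signal(53, 20))    # ~0.005: tiny, but detectable with enough samples
print(signal(200, 30))   # ~1e-13: hopeless without error correction
print(signal(1000, 40))  # ~1e-87: the quantum speedup has vanished
```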
To their credit, Aharonov et al. explain all this perfectly clearly in their abstract and introduction. I’m just worried that others aren’t reading their paper as carefully as they should be!
So then, what’s the new advance in the Aharonov et al. paper? Well, there had been some hope that circuit depth ~log(n) might be a sweet spot, where an exponential quantum speedup might both exist and survive constant noise, even in the asymptotic limit of n→∞ qubits. Nothing in Google’s or USTC’s actual Random Circuit Sampling experiments depended on that hope, but it would’ve been nice if it were true. What Aharonov et al. have now done is to kill that hope, using powerful techniques involving summing over Feynman paths in the Pauli basis.
Stepping back, what is the current status of quantum supremacy based on Random Circuit Sampling? I would say it’s still standing, but more precariously than I’d like—underscoring the need for new and better quantum supremacy experiments. In more detail, Pan, Chen, and Zhang have shown how to simulate Google’s 53-qubit Sycamore chip classically, using what I estimated to be 100-1000X the electricity cost of running the quantum computer itself (including the dilution refrigerator!). Approaching the problem from a different angle, Gao et al. have given a polynomial-time classical algorithm for spoofing Google’s Linear Cross-Entropy Benchmark (LXEB)—but their algorithm can currently achieve only about 10% of the excess in LXEB that Google’s experiment found.
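For reference, the Linear Cross-Entropy Benchmark itself is easy to state. Here’s a sketch of the standard definition (the ideal probabilities would come from classically simulating the circuit; the flat random distribution below is just a stand-in for a sanity check):

```python
import numpy as np

def linear_xeb(samples, p_ideal, n_qubits):
    # samples: bitstrings observed from the device, as integers 0 .. 2^n - 1
    # p_ideal: the circuit's ideal output probabilities, from classical simulation
    # Returns ~1 for a perfect device (Porter-Thomas statistics), ~0 for
    # uniformly random guessing; noise interpolates between the two.
    return 2**n_qubits * np.mean([p_ideal[x] for x in samples]) - 1

# Sanity check: uniformly random guesses score ~0.
n = 10
p = np.random.dirichlet(np.ones(2**n))          # stand-in for an ideal distribution
guesses = np.random.randint(0, 2**n, size=100_000)
print(linear_xeb(guesses, p, n))                # close to 0
```

In these terms, the Gao et al. spoofing result says their classical sampler achieves a positive excess above zero, but currently only about a tenth of the excess that Sycamore achieved.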
So, though it’s been under sustained attack from multiple directions these past few years, I’d say that the flag of quantum supremacy yet waves. The Extended Church-Turing Thesis is still on thin ice. The wormhole is still open. Wait … no … that’s not what I meant to write…
Note: With this post, as with future science posts, all off-topic comments will be ruthlessly left in moderation. Yes, even if the comments “create their own reality” full of anger and disappointment that I talked about what I talked about, instead of what the commenter wanted me to talk about. Even if merely refuting the comments would require me to give in and talk about their preferred topics after all. Please stop. This is a wormholes-‘n-supremacy post.
Update (Nov. 22): Theoretical computer scientist and longtime friend-of-the-blog Boaz Barak writes to tell me that, coincidentally, he and Ben Edelman just released a big essay advocating a version of “Reform AI Alignment” on Boaz’s Windows on Theory blog, as well as on LessWrong. (I warned Boaz that, having taken the momentous step of posting to LessWrong, in 6 months he should expect to find himself living in a rationalist group house in Oakland…) Needless to say, I don’t necessarily endorse their every word or vice versa, but there’s a striking amount of convergence. They also have a much more detailed discussion of (e.g.) which kinds of optimization processes they consider relatively safe.
Nearly halfway into my year at OpenAI, still reeling from the FTX collapse, I feel like it’s finally time to start blogging my AI safety thoughts—starting with a little appetizer course today, more substantial fare to come.
Many people claim that AI alignment is little more than a modern eschatological religion—with prophets, an end-times prophecy, sacred scriptures, and even a god (albeit, one who doesn’t exist quite yet). The obvious response to that claim is that, while there’s some truth to it, “religions” based around technology are a little different from the old kind, because technological progress actually happens regardless of whether you believe in it.
I mean, the Internet is sort of like the old concept of the collective unconscious, except that it actually exists and you’re using it right now. Airplanes and spacecraft are kind of like the ancient dream of Icarus—except, again, for the actually existing part. Today GPT-3 and DALL-E2 and LaMDA and AlphaTensor exist, as they didn’t two years ago, and one has to try to project forward to what their vastly-larger successors will be doing a decade from now. Though some of my colleagues are still in denial about it, I regard the fact that such systems will have transformative effects on civilization, comparable to or greater than those of the Internet itself, as “already baked in”—as just the mainstream position, not even a question anymore. That doesn’t mean that future AIs are going to convert the earth into paperclips, or give us eternal life in a simulated utopia. But their story will be a central part of the story of this century.
Which brings me to a second response. If AI alignment is a religion, it’s now large and established enough to have a thriving “Reform” branch, in addition to the original “Orthodox” branch epitomized by Eliezer Yudkowsky and MIRI. As far as I can tell, this Reform branch now counts among its members a large fraction of the AI safety researchers now working in academia and industry. (I’ll leave the formation of a Conservative branch of AI alignment, which reacts against the Reform branch by moving slightly back in the direction of the Orthodox branch, as a problem for the future — to say nothing of Reconstructionist or Marxist branches.)
Here’s an incomplete but hopefully representative list of the differences in doctrine between Orthodox and Reform AI Risk:
(1) Orthodox AI-riskers tend to believe that humanity will survive or be destroyed based on the actions of a few elite engineers over the next decade or two. Everything else—climate change, droughts, the future of US democracy, war over Ukraine and maybe Taiwan—fades into insignificance except insofar as it affects those engineers.
We Reform AI-riskers, by contrast, believe that AI might well pose civilizational risks in the coming century, but so does all the other stuff, and it’s all tied together. An invasion of Taiwan might change which world power gets access to TSMC GPUs. Almost everything affects which entities pursue the AI scaling frontier and whether they’re cooperating or competing to be first.
(2) Orthodox AI-riskers believe that public outreach has limited value: most people can’t understand this issue anyway, and will need to be saved from AI despite themselves.
We Reform AI-riskers believe that trying to get a broad swath of the public on board with one’s preferred AI policy is something close to a deontological imperative.
(3) Orthodox AI-riskers worry almost entirely about an agentic, misaligned AI that deceives humans while it works to destroy them, along the way to maximizing its strange utility function.
We Reform AI-riskers entertain that possibility, but we worry at least as much about powerful AIs that are weaponized by bad humans, which we expect to pose existential risks much earlier in any case.
(4) Orthodox AI-riskers have limited interest in AI safety research applicable to actually-existing systems (LaMDA, GPT-3, DALL-E2, etc.), seeing the dangers posed by those systems as basically trivial compared to the looming danger of a misaligned agentic AI.
We Reform AI-riskers see research on actually-existing systems as one of the only ways to get feedback from the world about which AI safety ideas are or aren’t promising.
(5) Orthodox AI-riskers worry most about the “FOOM” scenario, where some AI might cross a threshold from innocuous-looking to plotting to kill all humans in the space of hours or days.
We Reform AI-riskers worry most about the “slow-moving trainwreck” scenario, where (just like with climate change) well-informed people can see the writing on the wall decades ahead, but just can’t line up everyone’s incentives to prevent it.
(6) Orthodox AI-riskers talk a lot about a “pivotal act” to prevent a misaligned AI from ever being developed, which might involve (e.g.) using an aligned AI to impose a worldwide surveillance regime.
We Reform AI-riskers worry more about such an act causing the very calamity that it was intended to prevent.
(7) Orthodox AI-riskers feel a strong need to repudiate the norms of mainstream science, seeing them as too slow-moving to react in time to the existential danger of AI.
We Reform AI-riskers feel a strong need to get mainstream science on board with the AI safety program.
(8) Orthodox AI-riskers are maximalists about the power of pure, unaided superintelligence to just figure out how to commandeer whatever physical resources it needs to take over the world (for example, by messaging some lab over the Internet, and tricking it into manufacturing nanobots that will do the superintelligence’s bidding).
We Reform AI-riskers believe that, here just like in high school, there are limits to the power of pure intelligence to achieve one’s goals. We’d expect even an agentic, misaligned AI, if such existed, to need a stable power source, robust interfaces to the physical world, and probably allied humans before it posed much of an existential threat.
Update (Dec. 15): This, by former Shtetl-Optimized guest blogger Sarah Constantin, is the post about SBF that I should’ve written and wish I had written.
Update (Nov. 16): Check out this new interview of SBF by my friend and leading Effective Altruist writer Kelsey Piper. Here Kelsey directly confronts SBF with some of the same moral and psychological questions that animated this post and the ensuing discussion—and, surely to the consternation of his lawyers, SBF answers everything she asks. And yet I still don’t know what exactly to make of it. SBF’s responses reveal a surprising cynicism (surprising because, if you’re that cynical, why be open about it?), as well as an optimism that he can still fix everything that seems wildly divorced from reality.
I still stand by most of the main points of my post, including:
the technical insanity of SBF’s clearly-expressed attitude to risk (“gambler’s ruin? more like gambler’s opportunity!!”), and its probable role in creating the conditions for everything that followed,
the need to diagnose the catastrophe correctly (making billions of dollars in order to donate them to charity? STILL VERY GOOD; lying and raiding customer deposits in the course of doing so? DEFINITELY BAD), and
how, when sneerers judge SBF guilty just for being a crypto billionaire who talked about Effective Altruism, it ironically lets him off the hook for what he specifically did that was terrible.
But over the past couple days, I’ve updated in the direction of understanding SBF’s psychology a lot less than I thought I did. While I correctly hit on certain aspects of the tragedy, there are other important aspects—the drug use, the cynical detachment (“life as a video game”), the impulsivity, the apparent lying—that I neglected to touch on and about which we’ll surely learn more in the coming days, weeks, and years. –SA
Several readers have asked me for updated thoughts on AI safety, now that I’m 5 months into my year at OpenAI—and I promise, I’ll share them soon! The thing is, until last week I’d entertained the idea of writing up some of those thoughts for an essay competition run by the FTX Future Fund, which (I was vaguely aware) was founded by the cryptocurrency billionaire Sam Bankman-Fried, henceforth SBF.
Alas, unless you’ve been tucked away on some Caribbean island—or perhaps, especially if you have been—you’ll know that the FTX Future Fund has ceased to exist. In the course of 2-3 days last week, SBF’s estimated net worth went from ~$15 billion to a negative number, possibly the fastest evaporation of such a vast personal fortune in all human history. Notably, SBF had promised to give virtually all of it away to various worthy causes, including mitigating existential risk and helping Democrats win elections, and the worldwide Effective Altruist community had largely reoriented itself around that pledge. That’s all now up in smoke.
I’ve never met SBF, although he was a physics undergraduate at MIT while I taught CS there. What little I knew of SBF before this week, came mostly from reading Gideon Lewis-Kraus’s excellent New Yorker article about Effective Altruism this summer. The details of what happened at FTX are at once hopelessly complicated and—it would appear—damningly simple, involving the misuse of billions of dollars’ worth of customer deposits to place risky bets that failed. SBF has, in any case, tweeted that he “fucked up and should have done better.”
You’d think none of this would directly impact me, since SBF and I inhabit such different worlds. He ran a crypto empire from the Bahamas, sharing a group house with other twentysomething executives who often dated each other. I teach at a large state university and try to raise two kids. He made his first fortune by arbitraging bitcoin between Asia and the West. I own, I think, a couple bitcoins that someone gave me in 2016, but have no idea how to access them anymore. His hair is large and curly; mine is neither.
Even so, I’ve found myself obsessively following this story because I know that, in a broader sense, I will be called to account for it. SBF and I both grew up as nerdy kids in middle-class Jewish American families, and both had transformative experiences as teenagers at Canada/USA Mathcamp. He and I know many of the same people. We’ve both been attracted to the idea of small groups of idealistic STEM nerds using their skills to help save the world from climate change, pandemics, and fascism.
Aha, the sneerers will sneer! Hasn’t the entire concept of “STEM nerds saving the world” now been utterly discredited, revealed to be just a front for cynical grifters and Ponzi schemers? So if I’m also a STEM nerd who’s also dreamed of helping to save the world, then don’t I stand condemned too?
I’m writing this post because, if the Greek tragedy of SBF is going to be invoked as a cautionary tale in nerd circles forevermore—which it will be—then I think it’s crucial that we tell the right cautionary tale.
It’s like, imagine the Apollo 11 moon mission had almost succeeded, but because of a tiny crack in an oxygen tank, it instead exploded in lunar orbit, killing all three of the astronauts. Imagine that the crack formed partly because, in order to hide a budget overrun, Wernher von Braun had secretly substituted a cheaper material, while telling almost none of his underlings.
There are many excellent lessons that one could draw from such a tragedy, having to do with, for example, the construction of oxygen tanks, the procedures for inspecting them, Wernher von Braun as an individual, or NASA safety culture.
But there would also be bad lessons to not draw. These include: “The entire enterprise of sending humans to the moon was obviously doomed from the start.” “Fate will always punish human hubris.” “All the engineers’ supposed quantitative expertise proved to be worthless.”
From everything I’ve read, SBF’s mission to earn billions, then spend it saving the world, seems something like this imagined Apollo mission. Yes, the failure was total and catastrophic, and claimed innocent victims. Yes, while bad luck played a role, so did, shall we say, fateful decisions with a moral dimension. If it’s true that, as alleged, FTX raided its customers’ deposits to prop up the risky bets of its sister organization Alameda Research, multiple countries’ legal systems will surely be sorting out the consequences for years.
To my mind, though, it’s important not to minimize the gravity of the fateful decision by conflating it with everything that preceded it. I confess to taking this sort of conflation extremely personally. For eight years now, the rap against me, advanced by thousands (!) on social media, has been: sure, while by all accounts Aaronson is kind and respectful to women, he seems like exactly the sort of nerdy guy who, still bitter and frustrated over high school, could’ve chosen instead to sexually harass women and hinder their scientific careers. In other words, I stand condemned by part of the world, not for the choices I made, but for choices I didn’t make that are considered “too close to me” in the geometry of conscience.
And I don’t consent to that. I don’t wish to be held accountable for the misdeeds of my doppelgängers in parallel universes. Therefore, I resolve not to judge anyone else by their parallel-universe doppelgängers either. If SBF indeed gambled away his customers’ deposits and lied about it, then I condemn him for it utterly, but I refuse to condemn his hypothetical doppelgänger who didn’t do those things.
Granted, there are those who think all cryptocurrency is a Ponzi scheme and a scam, and that for that reason alone, it should’ve been obvious from the start that crypto-related plans could only end in catastrophe. The “Ponzi scheme” theory of cryptocurrency has, we ought to concede, a substantial case in its favor—though I’d rather opine about the matter in (say) 2030 than now. Like many technologies that spend years as quasi-scams until they aren’t, maybe blockchains will find some compelling everyday use-cases, besides the well-known ones like drug-dealing, ransomware, and financing rogue states.
Even if cryptocurrency remains just a modern-day tulip bulb or Beanie Baby, though, it seems morally hard to distinguish a cryptocurrency trader from the millions who deal in options, bonds, and all manner of other speculative assets. And a traditional investor who made billions on successful gambles, or arbitrage, or creating liquidity, then gave virtually all of it away to effective charities, would seem, on net, way ahead of most of us morally.
To be sure, I never pursued the “Earning to Give” path myself, though certainly the concept occurred to me as a teenager, before it had a name. Partly I decided against it because I seem to lack a certain brazenness, or maybe just willingness to follow up on tedious details, needed to win in business. Partly, though, I decided against trying to get rich because I’m selfish (!). I prioritized doing fascinating quantum computing research, starting a family, teaching, blogging, and other stuff I liked over devoting every waking hour to possibly earning a fortune only to give it all to charity, and more likely being a failure even at that. All told, I don’t regret my scholarly path—especially not now!—but I’m also not going to encase it in some halo of obvious moral superiority.
If I could go back in time and give SBF advice—or if, let’s say, he’d come to me at MIT for advice back in 2013—what could I have told him? I surely wouldn’t talk about cryptocurrency, about which I knew and know little. I might try to carve out some space for deontological ethics against pure utilitarianism, but I might also consider that a lost cause with this particular undergrad.
On reflection, maybe I’d just try to convince SBF to weight money logarithmically when calculating expected utility (as in the Kelly criterion), to forsake the linear weighting that he explicitly advocated and seems to have put into practice in his crypto ventures. Or if not logarithmic weighting, I’d try to sell him on some concave utility function—something that makes, let’s say, a mere $1 billion in hand seem better than $15 billion that has a 50% probability of vanishing and leaving you, your customers, your employees, and the entire Effective Altruism community with less than nothing.
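To make the difference concrete, here’s a minimal Python sketch (my own illustration; the dollar figures, the 50/50 odds, and the token amount left over after “ruin” are all assumptions chosen to mimic the example above) comparing how linear and logarithmic utility rank a sure $1 billion against the gamble:

import math

sure_thing = 1e9    # $1 billion in hand
jackpot    = 15e9   # $15 billion if the gamble pays off
ruin       = 1e3    # assume a token $1,000 survives "less than nothing" (log(0) is undefined)
p          = 0.5    # probability the gamble pays off

# Linear (risk-neutral) utility: just expected dollars
linear_sure   = sure_thing
linear_gamble = p * jackpot + (1 - p) * ruin

# Logarithmic (Kelly-style) utility
log_sure   = math.log(sure_thing)
log_gamble = p * math.log(jackpot) + (1 - p) * math.log(ruin)

print(f"linear: sure={linear_sure:.2e}, gamble={linear_gamble:.2e}")  # gamble wins, ~7.5e9 vs 1e9
print(f"log:    sure={log_sure:.2f}, gamble={log_gamble:.2f}")        # sure thing wins, ~20.7 vs ~15.2

The linear weighting happily takes the coin flip; the logarithmic one treats any serious chance of ruin as close to unacceptable, which is the whole point.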
At any rate, I’d try to impress on him, as I do on anyone reading now, that the choice between linear and concave utilities, between risk-neutrality and risk-aversion, is not bloodless or technical—that it’s essential to make a choice that’s not only in reflective equilibrium with your highest values, but that you’ll still consider to be such regardless of which possible universe you end up in.
Here’s an observation that’s mathematically trivial but might not be widely appreciated. In kindergarten, we all learned Gödel’s First Incompleteness Theorem, which, given a (computable, arithmetic-capable) formal system F, constructs an arithmetical encoding of
G(F) = “This sentence is not provable in F.”
If G(F) is true, then it’s an example of a true arithmetical sentence that’s unprovable in F. If, on the other hand, G(F) is false, then it’s provable, which means that F isn’t arithmetically sound. Therefore F is either incomplete or unsound.
Many have objected: “but despite Gödel’s Theorem, it’s still easy to explain why G(F) is true. In fact, the argument above basically already did it!”
[Note: Please stop leaving comments explaining to me that G(F) follows from F’s consistency. I understand that: the “heuristic” part of the argument is F’s consistency! I made a pedagogical choice to elide that, which nerd-sniping has now rendered untenable.]
You might make a more general point: there are many, many mathematical statements for which we currently lack a proof, but we do seem to have a fully convincing heuristic explanation: one that “proves the statement to physics standards of rigor.” For example:
The Twin Primes Conjecture (there are infinitely many primes p for which p+2 is also prime).
The Collatz Conjecture (the iterative process that maps each positive integer n to n/2 if n is even, or to 3n+1 if n is odd, eventually reaches 1 regardless of which n you start at).
π is a normal number (or even just: the digits 0-9 all occur with equal limiting frequencies in the decimal expansion of π).
And so on. No one has any idea how to prove any of the above statements—and yet, just on statistical grounds, it seems clear that it would require a ludicrous conspiracy to make any of them false.
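Just to illustrate what “statistical grounds” look like in practice, here’s a quick Python check (my own illustration, proving nothing of course): it verifies that the Collatz iteration reaches 1 for every starting value below a hundred thousand, and counts the twin prime pairs below a million.

def collatz_reaches_one(n, max_steps=10**6):
    """Iterate n -> n/2 (if even) or 3n+1 (if odd) until reaching 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        if steps > max_steps:
            return False
    return True

def count_twin_primes(limit):
    """Count pairs (p, p+2) of primes below limit, via a simple sieve."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return sum(1 for p in range(2, limit - 2) if sieve[p] and sieve[p + 2])

assert all(collatz_reaches_one(n) for n in range(1, 10**5))
print("twin prime pairs below 10^6:", count_twin_primes(10**6))  # 8169

Such checks have been pushed vastly further than these bounds, and none of them moves the statements an inch closer to being theorems.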
Conversely, one could argue that there are statements for which we do have a proof, even though we lack a “convincing explanation” for the statements’ truth. Maybe the Four-Color Theorem or Hales’s Theorem, for which every known proof requires a massive computer enumeration of cases, belong to this class. Other people might argue that, given a proof, an explanation could always be extracted with enough time and effort, though resolving this dispute won’t matter for what follows.
You might hope that, even if some true mathematical statements can’t be proved, every true statement might nevertheless have a convincing heuristic explanation. Alas, a trivial adaptation of Gödel’s Theorem shows that, if (1) heuristic explanations are to be checkable by computer, and (2) only true statements are to have convincing heuristic explanations, then this isn’t possible either. I mean, let E be a program that accepts or rejects proposed heuristic explanations, for statements like the Twin Prime Conjecture or the Collatz Conjecture. Then construct the sentence
S(E) = “This sentence has no convincing heuristic explanation accepted by E.”
If S(E) is true, then it’s an example of a true arithmetical statement without even a convincing heuristic explanation for its truth (!). If, on the other hand, S(E) is false, then E accepts a convincing heuristic explanation for it, so a false statement has a convincing heuristic explanation, violating condition (2).
What’s happening, of course, is that given the two conditions we imposed, our “heuristic explanation system” was a proof system, even though we didn’t call it one. This is my point, though: when we use the word “proof,” it normally invokes a specific image, of a sequence of statements that marches from axioms to a theorem, with each statement following from the preceding ones by rigid inference rules like those of first-order logic. None of that, however, plays any direct role in the proof of the Incompleteness Theorem, which cares only about soundness (inability to prove falsehoods) and checkability by a computer (what, with hindsight, Gödel’s “arithmetization of syntax” was all about). The logic works for “heuristic explanations” too.
Now we come to something that I picked up from my former student (and now AI alignment leader) Paul Christiano, on a recent trip to the Bay Area, and which I share with Paul’s kind permission. Having learned that there’s no way to mechanize even heuristic explanations for all the true statements of arithmetic, we could set our sights lower still, and ask about mere plausibility arguments—arguments that might be overturned on further reflection. Is there some sense in which every true mathematical statement at least has a good plausibility argument?
Maybe you see where this is going. Letting P be a program that accepts or rejects proposed plausibility arguments, we can construct
S(P) = “This sentence has no argument for its plausibility accepted by P.”
If S(P) is true, then it’s an example of a true arithmetical statement without even a plausibility argument for its truth (!). If, on the other hand, S(P) is false, then there is a plausibility argument for it. By itself, this is not at all a fatal problem: all sorts of false statements (IP≠PSPACE, switching doors doesn’t matter in Monty Hall, Trump couldn’t possibly become president…) have had decent plausibility arguments. Having said that, it’s pretty strange that you can have a plausibility argument that’s immediately contradicted by its own existence! This rules out some properties that you might want your “plausibility system” to have, although maybe a plausibility system exists that’s still nontrivial and that has weaker properties.
Anyway, I don’t know where I’m going with this, or even why I posted it, but I hope you enjoyed it! And maybe there’s something to be discovered in this direction.
As I slept fitfully, still recovering from COVID, I had one of the more interesting dreams of my life:
I was desperately trying to finish some PowerPoint slides in time to give a talk. Uncharacteristically for me, one of the slides displayed actual code. This was a dream, so nothing was as clear as I’d like, but the code did something vaguely reminiscent of Rosser’s Theorem—e.g., enumerating all proofs in ZFC until it finds the lexicographically first proof or disproof of a certain statement, then branching into cases depending on whether it’s a proof or a disproof. In any case, it was simple enough to fit on one slide.
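For whatever it’s worth, here’s roughly the shape of the thing, sketched in Python after the fact. Everything here is reconstruction: the proof checker is a stub (actually enumerating ZFC proofs is hopeless in practice), the symbol set is arbitrary, and the cutoff certainly wasn’t on the dream-slide.

from itertools import count, product

ALPHABET = "()~&|=<>abcdefghijklmnopqrstuvwxyz0123456789, "  # toy symbol set for candidate proofs

def candidate_proofs():
    """Yield all finite strings over ALPHABET in length-then-lexicographic order."""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

def rosser_style_search(statement, is_zfc_proof, max_candidates=10**5):
    """Scan candidates until the first proof of `statement` or of its negation, then branch.
    `is_zfc_proof(candidate, statement)` stands in for a real proof checker."""
    for i, candidate in enumerate(candidate_proofs()):
        if i >= max_candidates:
            return "gave up"
        if is_zfc_proof(candidate, statement):
            return "proved"
        if is_zfc_proof(candidate, "~(" + statement + ")"):
            return "refuted"

# With a stub checker that accepts nothing, the search just gives up:
print(rosser_style_search("0=1", lambda proof, stmt: False))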
Suddenly, though, my whole presentation was deleted. Everything was ruined!
One of the developers of PowerPoint happened to be right there in the lecture hall (of course!), so I confronted him with my laptop and angrily demanded an explanation. He said that I must have triggered the section of Microsoft Office that tries to detect and prevent any discussion of logical paradoxes that are too dangerous for humankind—the ones that would cause people to realize that our entire universe is just an illusion, a sandbox being run inside an AI, a glitch-prone Matrix. He said it patronizingly, as if it should’ve been obvious: “you and I both know that the Paradoxes are not to be talked about, so why would you be so stupid as to put one in your presentation?”
My reaction was to jab my finger in the guy’s face, shove him, scream, and curse him out. At that moment, I wasn’t concerned in the slightest about the universe being an illusion, or about glitches in the Matrix. I was concerned about my embarrassment when I’d be called in 10 minutes to give my talk and would have nothing to show.
My last thought, before I woke with a start, was to wonder whether Greg Kuperberg was right that I should give my presentations in Beamer, or some other open-source software, in which case I wouldn’t have had this problem.
A coda: I woke a bit after 7AM Central and started to write this down. But then—this is now real life (!)—I saw an email saying that a dozen people were waiting for me in a conference room in Europe for an important Zoom meeting. We’d gotten the time zones wrong; I’d thought that it wasn’t until 8AM my time. If not for this dream causing me to wake up, I would’ve missed the meeting entirely.
I promise you: this post is going to tell a scientifically coherent story that involves all five topics listed in the title. Not one can be omitted.
My story starts with a Zoom talk that the one and only Lenny Susskind delivered for the Simons Institute for Theory of Computing back in May. There followed a panel discussion involving Lenny, Edward Witten, Geoffrey Penington, Umesh Vazirani, and your humble shtetlmaster.
Lenny’s talk led up to a gedankenexperiment involving an observer, Alice, who bravely jumps into a specially-prepared black hole, in order to see the answer to a certain computational problem in her final seconds before being ripped to shreds near the singularity. Drawing on earlier work by Bouland, Fefferman, and Vazirani, Lenny speculated that the computational problem could be exponentially hard even for a (standard) quantum computer. Despite this, Lenny repeatedly insisted—indeed, he asked me again to stress here—that he was not claiming to violate the Quantum Extended Church-Turing Thesis (QECTT), the statement that all of nature can be efficiently simulated by a standard quantum computer. Instead, he was simply investigating how the QECTT needs to be formulated in order to be a true statement.
I didn’t understand this, to put it mildly. If what Lenny was saying was right—i.e., if the infalling observer could see the answer to a computational problem not in BQP, or Bounded-Error Quantum Polynomial-Time—then why shouldn’t we call that a violation of the QECTT? Just like we call Shor’s quantum factoring algorithm a likely violation of the classical Extended Church-Turing Thesis, the thesis saying that nature can be efficiently simulated by a classical computer? Granted, you don’t have to die in order to run Shor’s algorithm, as you do to run Lenny’s experiment. But why should such implementation details matter from the lofty heights of computational complexity?
Alas, not only did Lenny never answer that in a way that made sense to me, he kept trying to shift the focus from real, physical black holes to “silicon spheres” made of qubits, which would be programmed to simulate the process of Alice jumping into the black hole (in a dual boundary description). Say what? Granting that Lenny’s silicon spheres, being quantum computers under another name, could clearly be simulated in BQP, wouldn’t this still leave the question about the computational powers of observers who jump into actual black holes—i.e., the question that we presumably cared about in the first place?
Confusing me even further, Witten seemed almost dismissive of the idea that Lenny’s gedankenexperiment raised any new issue for the QECTT—that is, any issue that wouldn’t already have been present in a universe without gravity. But as to Witten’s reasons, the most I understood from his remarks was that he was worried about various “engineering” issues with implementing Lenny’s gedankenexperiment, involving gravitational backreaction and the like. Ed Witten, now suddenly the practical guy! I couldn’t even isolate the crux of disagreement between Susskind and Witten, since after all, they agreed (bizarrely, from my perspective) that the QECTT wasn’t violated. Why wasn’t it?
Anyway, shortly afterward I attended the 28th Solvay Conference in Brussels, where one of the central benefits I got—besides seeing friends after a long COVID absence and eating some amazing chocolate mousse—was a dramatically clearer understanding of the issues in Lenny’s gedankenexperiment. I owe this improved understanding to conversations with many people at Solvay, but above all Daniel Gottesman and Daniel Harlow. Lenny himself wasn’t there, other than in spirit, but I ran the Daniels’ picture by him afterwards and he assented to all of its essentials.
The Daniels’ picture is what I want to explain in this post. Needless to say, I take sole responsibility for any errors in my presentation, as I also take sole responsibility for not understanding (or rather: not doing the work to translate into terms that I understood) what Susskind and Witten had said to me before.
The first thing you need to understand about Lenny’s gedankenexperiment is that it takes place entirely in the context of AdS/CFT: the famous holographic duality between two types of physical theories that look wildly different. Here AdS stands for anti-de-Sitter: a quantum theory of gravity describing a D-dimensional universe with a negative cosmological constant (i.e. hyperbolic geometry), one where black holes can form and evaporate and so forth. Meanwhile, CFT stands for conformal field theory: a quantum field theory, with no apparent gravity (and hence no black holes), that lives on the (D-1)-dimensional boundary of the D-dimensional AdS space. The staggering claim of AdS/CFT is that every physical question about the AdS bulk can be translated into an equivalent question about the CFT boundary, and vice versa, with a one-to-one mapping from states to states and observables to observables. So in that sense, they’re actually the same theory, just viewed in two radically different ways. AdS/CFT originally came out of string theory, but then notoriously “swallowed its parent,” to the point where nowadays, if you go to what are still called “string theory” meetings, you’re liable to hear vastly more discussion of AdS/CFT than of actual strings.
Thankfully, the story I want to tell won’t depend on fine details of how AdS/CFT works. Nevertheless, you can’t just ignore the AdS/CFT part as some technicality, in order to get on with the vivid tale of Alice jumping into a black hole, hoping to learn the answer to a beyond-BQP computational problem in her final seconds of existence. The reason you can’t ignore it is that the whole beyond-BQP computational problem we’ll be talking about involves the translation (or “dictionary”) between the AdS bulk and the CFT boundary. If you like, then, it’s actually the chasm between bulk and boundary that plays the starring role in this story. The more familiar chasm within the bulk, between the interior of a black hole and its exterior (the two separated by an event horizon), plays only a subsidiary role: that of causing the AdS/CFT dictionary to become exponentially complex, as far as anyone can tell.
Pause for a minute. Previously I led you to believe that we’d be talking about an actual observer Alice, jumping into an actual physical black hole, and whether Alice could see the answer to a problem that’s intractable even for quantum computers in her last moments before hitting the singularity, and if so whether we should take that to refute the Quantum Extended Church-Turing Thesis. What I’m saying now is so wildly at variance with that picture, that it had to be repeated to me about 10 times before I understood it. Once I did understand, I then had to repeat it to others about 10 times before they understood. And I don’t care if people ridicule me for that admission—how slow Scott and his friends must be, compared to string theorists!—because my only goal right now is to get you to understand it.
To say it again: Lenny has not proposed a way for Alice to surpass the complexity-theoretic power of quantum computers, even for a brief moment, by crossing the event horizon of a black hole. If that was Alice’s goal when she jumped into the black hole, then alas, she probably sacrificed her life for nothing! As far as anyone knows, Alice’s experiences, even after crossing the event horizon, ought to continue to be described extremely well by general relativity and quantum field theory (at least until she nears the singularity and dies), and therefore ought to be simulatable in BQP. Granted, we don’t actually know this—you can call it an open problem if you like—but it seems like a reasonable guess.
In that case, though, what beyond-BQP problem was Lenny talking about, and what does it have to do with black holes? Building on the Bouland-Fefferman-Vazirani paper, Lenny was interested in a class of problems of the following form: Alice is given as input a pure quantum state |ψ⟩, which encodes a boundary CFT state, which is dual to an AdS bulk universe that contains a black hole. Alice’s goal is, by examining |ψ⟩, to learn something about what’s inside the black hole. For example: does the black hole interior contain “shockwaves,” and if so how many and what kind? Does it contain a wormhole, connecting it to a different black hole in another universe? If so, what’s the volume of that wormhole? (Not the first question I would ask either, but bear with me.)
Now, when I say Alice is “given” the state |ψ⟩, this could mean several things: she could just be physically given a collection of n qubits. Or, she could be given a gigantic table of 2^n amplitudes. Or, as a third possibility, she could be given a description of a quantum circuit that prepares |ψ⟩, say from the all-0 initial state |0^n⟩. Each of these possibilities leads to a different complexity-theoretic picture, and the differences are extremely interesting to me, so that’s what I mostly focused on in my remarks in the panel discussion after Lenny’s talk. But it won’t matter much for the story I want to tell in this post.
However |ψ⟩ is given to Alice, the prediction of AdS/CFT is that |ψ⟩ encodes everything there is to know about the AdS bulk, including whatever is inside the black hole—but, and this is crucial, the information about what’s inside the black hole will be pseudorandomly scrambled. In other words, it works like this: whatever simple thing you’d like to know about parts of the bulk that aren’t hidden behind event horizons—is there a star over here? some gravitational lensing over there? etc.—it seems that you could not only learn it by measuring |ψ⟩, but learn it in polynomial time, the dictionary between bulk and boundary being computationally efficient in that case. (As with almost everything else in this subject, even that hasn’t been rigorously proven, though my postdoc Jason Pollack and I made some progress this past spring by proving a piece of it.) On the other hand, as soon as you want to know what’s inside an event horizon, the fact that there are no probes that an “observer at infinity” could apply to find out, seems to translate into the requisite measurements on |ψ⟩ being exponentially complex to apply. (Technically, you’d have to measure an ensemble of poly(n) identical copies of |ψ⟩, but I’ll ignore that in what follows.)
In more detail, the relevant part of |ψ⟩ turns into a pseudorandom, scrambled mess: a mess that it’s plausible that no polynomial-size quantum circuit could even distinguish from the maximally mixed state. So, while in principle the information is all there in |ψ⟩, getting it out seems as hard as various well-known problems in symmetric-key cryptography, if not literally NP-hard. This is way beyond what we expect even a quantum computer to be able to do efficiently: indeed, after 30 years of quantum algorithms research, the best quantum speedup we know for this sort of task is typically just the quadratic speedup from Grover’s algorithm.
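To put rough numbers on “just the quadratic speedup”: brute-force search over 2^n possibilities costs on the order of 2^n queries classically, versus about (π/4)·2^(n/2) iterations for Grover. A back-of-the-envelope calculation (purely illustrative, nothing specific to the black-hole problem):

import math

for n in (40, 80, 128):
    grover_iters = (math.pi / 4) * 2 ** (n / 2)   # ~ (pi/4) * 2^(n/2) Grover iterations
    print(f"n = {n:3d}: brute force ~ 2^{n} queries, Grover ~ {grover_iters:.2e} iterations")

# Quadratically better, yet 2^64 iterations is still nowhere near "efficient".

Which is to say: even granting Grover, decoding the scrambled information stays hopeless for any remotely large n.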
So now you understand why there was some hope that Alice, by jumping into a black hole, could solve a problem that’s exponentially hard for quantum computers! Namely because, once she’s inside the black hole, she can just see the shockwaves, or the volume of the wormhole, or whatever, and no longer faces the exponentially hard task of decoding that information from |ψ⟩. It’s as if the black hole has solved the problem for her, by physically instantiating the otherwise exponentially complex transformation between the bulk and boundary descriptions of |ψ⟩.
Having now gotten your hopes up, the next step in the story is to destroy them.
Here’s the fundamental problem: |ψ⟩ does not represent the CFT dual of a bulk universe that contains the black hole with the shockwaves or whatever, and that also contains Alice herself, floating outside the black hole, and being given |ψ⟩ as an input. Indeed, it’s unclear what the latter state would even mean: how do we get around the circularity in its definition? How do we avoid an infinite regress, where |ψ⟩ would have to encode a copy of |ψ⟩ which would have to encode a copy of … and so on forever? Furthermore, who created this |ψ⟩ to give to Alice? We don’t normally imagine that an “input state” contains a complete description of the body and brain of the person whose job it is to learn the output.
By contrast, a scenario that we can define without circularity is this: Alice is given (via physical qubits, a giant table of amplitudes, an obfuscated quantum circuit, or whatever) a pure quantum state |ψ⟩, which represents the CFT dual of a hypothetical universe containing a black hole. Alice wants to learn what shockwaves or wormholes are inside the black hole, a problem plausibly conjectured not to have any ordinary polynomial-size quantum circuit that takes copies of |ψ⟩ as input. To “solve” the problem, Alice sets into motion the following sequence of events:
1. Alice scans and uploads her own brain into a quantum computer, presumably destroying the original meat brain in the process! The QC represents Alice, who now exists only virtually, via a state |φ⟩.
2. The QC performs entangling operations on |φ⟩ and |ψ⟩, which correspond to inserting Alice into the bulk of the universe described by |ψ⟩, and then having her fall into the black hole.
3. Now in simulated form, “Alice” (or so we assume, depending on our philosophical position) has the subjective experience of falling into the black hole and observing what’s inside. Success! Given |ψ⟩ as input, we’ve now caused “Alice” (for some definition of “Alice”) to have observed the answer to the beyond-BQP computational problem.
In the panel discussion, I now model Susskind as having proposed steps 1-3, Witten as going along with steps 1-2 but rejecting step 3 or not wanting to discuss it, and me as having made valid points about the computational complexity of simulating Alice’s experience in steps 1-3, yet while being radically mistaken about what the scenario was (I still thought an actual black hole was involved).
An obvious question is whether, having learned the answer, “Alice” can now get the answer back out to the “real, original” world. Alas, the expectation is that this would require exponential time. Why? Because otherwise, this whole process would’ve constituted a subexponential-time algorithm for distinguishing random from pseudorandom states using an “ordinary” quantum computer! Which is conjectured not to exist.
And what about Alice herself? In polynomial time, could she return from “the Matrix,” back to a real-world biological body? Sure she could, in principle—if, for example, the entire quantum computation were run in reverse. But notice that reversing the computation would also make Alice forget the answer to the problem! Which is not at all a coincidence: if the problem is outside BQP, then in general, Alice can know the answer only while she’s “inside the Matrix.”
Now that hopefully everything is crystal-clear and we’re all on the same page, what can we say about this scenario? In particular: should it cause us to reject or modify the QECTT itself?
Daniel Gottesman, I thought, offered a brilliant reductio ad absurdum of the view that the simulated black hole scenario should count as a refutation of the QECTT. Well, he didn’t call it a “reductio,” but I will.
For the reductio, let’s forget not only about quantum gravity but even about quantum mechanics itself, and go all the way back to classical computer science. A fully homomorphic encryption scheme, the first example of which was discovered by Craig Gentry in 2009, lets you do arbitrary computations on encrypted data without ever needing to decrypt it. It has both an encryption key, for encrypting the original plaintext data, and a separate decryption key, for decrypting the final answer.
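Since it comes up repeatedly below, here’s a minimal sketch of the interface that homomorphic encryption provides, namely computing on ciphertexts and decrypting only at the end. To be clear, this toy is not Gentry’s scheme or anything like it: it’s just a one-time pad, which happens to be homomorphic for XOR (and nothing else), and it doesn’t separate encryption and decryption keys. It’s only meant to show what “doing computations on encrypted data without decrypting it” means.

import secrets

def keygen(nbytes=16):
    return secrets.token_bytes(nbytes)

def encrypt(key, msg):
    return bytes(k ^ m for k, m in zip(key, msg))

def decrypt(key, ct):
    return bytes(k ^ c for k, c in zip(key, ct))

def xor_ciphertexts(ct1, ct2):
    # "Homomorphic evaluation" of XOR: performed entirely on ciphertexts.
    return bytes(a ^ b for a, b in zip(ct1, ct2))

k1, k2 = keygen(), keygen()
m1, m2 = b"TOP-SECRET EMAIL", b"ANOTHER  MESSAGE"
ct = xor_ciphertexts(encrypt(k1, m1), encrypt(k2, m2))

# Only the holder of the combined key recovers the result; without it, ct stays opaque.
combined_key = bytes(a ^ b for a, b in zip(k1, k2))
assert decrypt(combined_key, ct) == bytes(a ^ b for a, b in zip(m1, m2))
print("homomorphic XOR verified")

Real FHE does the analogous thing for arbitrary circuits, which is exactly what makes the thought experiment below possible.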
Now suppose Alice has some homomorphically encrypted top-secret emails, which she’d like to read. She has the encryption key (which is public), but not the decryption key.
If the homomorphic encryption scheme is secure against quantum computers—as the schemes discovered by Gentry and later researchers currently appear to be—and if the QECTT is true, then Alice’s goal is obviously infeasible: decrypting the data will take her exponential time.
Now, however, a classical version of Lenny comes along, and explains to Alice that she simply needs to do the following:
1. Upload her own brain state into a classical computer, destroying the “meat” version in the process (who needed it?).
2. Using the known encryption key, homomorphically encrypt a computer program that simulates (and thereby, we presume, enacts) Alice’s consciousness.
3. Using the homomorphically encrypted Alice-brain, together with the homomorphically encrypted input data, do the homomorphic computations that simulate the process of Alice’s brain reading the top-secret emails.
The claim would now be that, inside the homomorphic encryption, the simulated Alice has the subjective experience of reading the emails in the clear. Aha, therefore she “broke” the homomorphic encryption scheme! Therefore, assuming that the scheme was secure even against quantum computers, the QECTT must be false!
According to Gottesman, this is almost perfectly analogous to Lenny’s black hole scenario. In particular, they share the property that “encryption is easy but decryption is hard.” Once she’s uploaded her brain, Alice can efficiently enter the homomorphically encrypted world to see the solution to a hard problem, just like she can efficiently enter the black hole world to do the same. In both cases, however, getting back to her normal world with the answer would then take Alice exponential time. Note that in the latter case, the difficulty is not so much about “escaping from a black hole,” as it is about inverting the AdS/CFT dictionary.
Going further, we can regard the AdS/CFT dictionary for regions behind event horizons as, itself, an example of a fully homomorphic encryption scheme—in this case, of course, one where the ciphertexts are quantum states. This strikes me as potentially an important insight about AdS/CFT itself, even if that wasn’t Gottesman’s intention. It complements many other recent connections between AdS/CFT and theoretical computer science, including the view of AdS/CFT as a quantum error-correcting code, and the connection between AdS/CFT and the Max-Flow/Min-Cut Theorem (see also my talk about my work with Jason Pollack).
So where’s the reductio? Well, when it’s put so starkly, I suspect that not many would regard Gottesman’s classical homomorphic encryption scenario as a “real” challenge to the QECTT. Or rather, people might say: yes, this raises fascinating questions for the philosophy of mind, but at any rate, we’re no longer talking about physics. Unlike with (say) quantum computing, no new physical phenomenon is being brought to light that lets an otherwise intractable computational problem be solved. Instead, it’s all about the user herself, about Alice, and which physical systems get to count as instantiating her.
It’s like, imagine Alice at the computer store, weighing which laptop to buy. Besides weight, battery life, and price, she definitely does care about processing power. She might even consider a quantum computer, if one is available. Maybe even a computer with a black hole, wormhole, or closed timelike curve inside: as long as it gives the answers she wants, what does she care about the innards? But a computer whose normal functioning would (pessimistically) kill her or (optimistically) radically change her own nature, trapping her in a simulated universe that she can escape only by forgetting the computer’s output? Yeah, I don’t envy the computer salesman.
Anyway, if we’re going to say this about the homomorphic encryption scenario, then shouldn’t we say the same about the simulated black hole scenario? Again, from an “external” perspective, all that’s happening is a giant BQP computation. Anything beyond BQP that we consider to be happening, depends on adopting the standpoint of an observer who “jumps into the homomorphic encryption on the CFT boundary”—at which point, it would seem, we’re no longer talking about physics but about philosophy of mind.
So, that was the story! I promised you that it would integrally involve black holes, holography, the Quantum Extended Church-Turing Thesis, fully homomorphic encryption, and brain uploading, and I hope to have delivered on my promise.
Of course, while this blog post has forever cleared up all philosophical confusions about AdS/CFT and the Quantum Extended Church-Turing Thesis, many questions of a more technical nature remain. For example: what about the original scenario? can we argue that the experiences of bulk observers can be simulated in BQP, even when those observers jump into black holes? Also, what can we say about the complexity class of problems to which the simulated Alice can learn the answers? Could she even solve NP-complete problems in polynomial time this way, or at least invert one-way functions? More broadly, what’s the power of “BQP with an oracle for applying the AdS/CFT dictionary”—once or multiple times, in one direction or both directions?
Lenny himself described his gedankenexperiment as exploring the power of a new complexity class that he called “JI/poly,” where the JI stands for “Jumping In” (to a black hole, that is). The nomenclature is transparently ridiculous—“/poly” means “with polynomial-size advice,” which we’re not talking about here—and I’ve argued in this post that the “JI” is rather misleading as well. If Alice is “jumping” anywhere, it’s not into a black hole per se, but into a quantum computer that simulates a CFT that’s dual to a bulk universe containing a black hole.
In a broader sense, though, to contemplate these questions at all is clearly to “jump in” to … something. It’s old hat by now that one can start in physics and end up in philosophy: what else is the quantum measurement problem, or the Boltzmann brain problem, or anthropic cosmological puzzles like whether (all else equal) we’re a hundred times as likely to find ourselves in a universe with a hundred times as many observers? More recently, it’s also become commonplace that one can start in physics and end in computational complexity theory: quantum computing itself is the example par excellence, but over the past decade, the Harlow-Hayden argument about decoding Hawking radiation and the complexity = action proposal have made clear that it can happen even in quantum gravity.
Lenny’s new gedankenexperiment, however, is the first case I’ve seen where you start out in physics, and end up embroiled in some of the hardest questions of philosophy of mind and computational complexity theory simultaneously.