NIPS vs. NeurIPS: guest post by Steven Pinker

December 23rd, 2019

Scott’s Update (Dec. 26): Comments on this post are now closed, since I felt that whatever progress could be made, had been, and I wanted to move on to more interesting topics. Thanks so much to everyone who came here to hash things out in good faith—which, as far as I’m concerned, included the majority of the participants on both sides.

If you want to see the position paper that led to the name change movement, see What’s In A Name? The Need to Nip NIPS, by Daniela Witten, Elana Fertig, Anima Anandkumar, and Jeff Dean. I apologize for not linking to this paper in the original post.

To recap what I said many times in this post and the comments: I myself am totally fine with the name NeurIPS. I think several of the arguments for changing the name were good arguments—and I thank some of the commenters on this post for elucidating those arguments without shaming anybody or calling them names. In any case the decision is done, and it belongs to the ML community, not to me and not to Steven Pinker.

The one part that I’m against is the bullying of anyone who disagrees by smearing them as a misogynist. And then, recursively, the smearing as a misogynist of anyone who objected to that bullying, and so on and so on. Most supporters of the name change did not engage in such bullying, but one leader of the movement very conspicuously did, and continues to do it even now (to, I’m told, the consternation even of many of her allies).

Since this post went up, something extremely interesting happened: Steven Pinker and I started getting emails from researchers in the NeurIPS community that said, in various words: “thank you for openly airing perspectives that we could not air, without jeopardizing our careers.” We were told that even women in ML, and even those who agreed with the activists on most points, could no longer voice opposition without risking their hiring or tenure. This put into a slightly different light, I thought, the constant claims of some movement leaders about their own marginalization and powerlessness.

Since I was 7 or 8 years old, the moral lodestar of my life has been my yearning (too often left unfulfilled) to stand up to the world’s bullies. Bullies come in all shapes and sizes: some are gangsters or men who sexually exploit vulnerable women; one, alas, is even the President of the United States. But bullying knows no bounds of ideology or gender. Some bullies resort to whisper networks, or Twitter shaming campaigns, or their power in academic hierarchies, to shut down dissenting voices. With the latter kinds of bully—well, to whatever extent this blog is now in a position to make some difference, I’d feel morally complicit if it didn’t.

As I wrote in the comments: may the 2020s be an era of intellectual freedom, compassion, and understanding for all people regardless of background. –SA

Scott’s prologue:

Happy Christmas and Merry Chanukah!

As a followup to last Thursday’s post about the term “quantum supremacy,” today all of us here at Shtetl-Optimized are humbled to host a guest post by Steven Pinker: the Johnstone Professor of Psychology at Harvard University, and author of The Language Instinct, How the Mind Works, The Blank Slate, Enlightenment Now (which I reviewed here), and other books.

The former NIPS—Neural Information Processing Systems—has been the premier conference for machine learning for 30 years. As many readers might know, last year NIPS changed its name to NeurIPS: ironically, giving greater emphasis to an aspect that I’m told has been de-emphasized at that conference over time. The reason, apparently, was that some male attendees had made puns involving the acronym “NIPS” and nipples.

I confess that the name change took me by surprise, simply because it had never occurred to me to make the NIPS/nipples connection—not when I gave a plenary at NIPS in 2012, and not when my collaborators and I coauthored a NIPS paper. It’s not that I’m averse to puerile humor. It’s just that neither I, nor anyone else I knew, had apparently ever felt the need for a shorthand for “nipples.” Of course, once I did learn about this controversy, it became hard to hear “NIPS” without thinking about it.

Back when this happened, Steven Pinker tweeted about NIPS being “forced to change its acronym … because some thought it was sexist. ?????,” apparently as part of a longer thread about “the new Victorians.” In response, a computer science professor sent Pinker an extremely stern email, saying that Pinker’s tweeting about this had “caused harm to our community” and “just [made] the world a bleaker place for everyone.” After linking to a National Academies report on bias in STEM, the email ended: “I hope you will choose to inform yourself on the discussion to which you have just contributed and that you will offer a well-considered follow up.” I won’t risk betraying confidences by quoting further. Of course, the author is warmly welcomed to share anything they wish in the comments here (or I can add it to the main post).

Steve’s guest post today consists of his response to this email. (He told me that, after sending it, he received no further responses.)

I don’t have any dog in the NIPS/NeurIPS debate, being at most on the “margin” (har!) of machine learning. And in any case the debate ended a year ago: the name is now NeurIPS and it’s not changing back. Reopening the issue would seem to invite a strong risk of social-media denunciation for no possible gain.

So why am I doing this? Mostly because I thought it was in the interest of humanity to know that, even when Steven Pinker is answering someone’s email, with no expectation that his reply will be made public, he writes the same way he does in his books: with clarity, humor, and an amusing quote from his mom.

But also because—again, without taking a position on the NIPS vs. NeurIPS issue itself—there’s a tactic displayed by Pinker’s detractors that fundamentally grates on me. This is where you pretend to an open mind, but it turns out that you’re open only to the possibility that your opponent might not have read enough reports and studies to “do better”—i.e., that they sinned out of ignorance rather than out of malice. You don’t open your mind even a crack to the possibility that the opponent might have a point.

Without further ado, here’s Steven Pinker’s email:

I appreciate your frank comments. At the same time, I do not agree with them. Please allow me to explain.

If this were a matter of sexual harassment or other hostile behavior toward women, I would of course support strong measures to combat it. Any member of the Symposium who uttered demeaning comments toward or about women certainly deserves censure.

But that is not what is at issue here. It’s an utterly irrelevant matter: the three-decades-old acronym for the Neural Information Processing Symposium, the pleasingly pronounceable NIPS. To state what should be obvious: nip is not a sexual word. As Chair of the Usage Panel of the American Heritage Dictionary, I can support this claim.

(And as my mother wrote to me: “I don’t get it. I thought Nips was a brand of caramel candy.”)  [Indeed, I enjoyed those candies as a kid. –SA] Even if people with an adolescent mindset think of nipples when hearing the sound “nips,” the society should not endorse the idea that the concept of nipples is sexist. Men have nipples too, and women’s nipples evolved as organs of nursing, not sexual gratification. Indeed, many feminists have argued that it’s sexist to conceptualize women’s bodies from the point of view of male sexuality.

If some people make insulting puns that demean women, the society should condemn them for the insults, not concede to their puerility by endorsing their appropriation of an innocent sound. (The Linguistics Society of America and Boston Debate League do not change their names to disavow jejune clichés about cunning linguists and master debaters.) To act as if anything with the remotest connection to sexuality must be censored to protect delicate female sensibilities is insulting to women and reminiscent of prissy Victorian taboos against uncovered piano legs or the phrase “with the naked eye.”

Any harm to the community of computer scientists has been done not by me but by the pressure group and the Symposium’s surrender. As a public figure who hears from a broad range of people outside the academic bubble, I can tell you that this episode has not played well. It’s seen as the latest sign that academia has lost its mind—that it has traded reasoned argument, conceptual rigor, proportionality, and common sense for prudish censoriousness, snowflake sensibility, and virtue signaling. I often hear from intelligent non-leftists, “Why should I be impressed by the scientific consensus on climate change? Everyone knows that academics just fall into line with the politically correct position.” To secure the credibility of the academy, we have to make reasoned distinctions, and stop turning our enterprise into a laughingstock.

To repeat: none of this deprecates the important effort to stamp out harassment and misogyny in science, which I’m well aware of and thoroughly support, but which has nothing to do with the acronym NIPS.

You are welcome to share this note with interested parties.

Best,
Steve

Quantum Dominance, Hegemony, and Superiority

December 19th, 2019

Yay! I’m now a Fellow of the ACM. Along with my fellow new inductee Peter Shor, who I hear is a real up-and-comer in the quantum computing field. I will seek to use this awesome responsibility to steer the ACM along the path of good rather than evil.

Also, last week, I attended the Q2B conference in San Jose, where a central theme was the outlook for practical quantum computing in the wake of the first clear demonstration of quantum computational supremacy. Thanks to the folks at QC Ware for organizing a fun conference (full disclosure: I’m QC Ware’s Chief Scientific Advisor). I’ll have more to say about the actual scientific things discussed at Q2B in future posts.

None of that is why you’re here, though. You’re here because of the battle over “quantum supremacy.”

A week ago, my good friend and collaborator Zach Weinersmith, of SMBC Comics, put out a cartoon with a dark-curly-haired scientist named “Dr. Aaronson,” who’s revealed on a hot mic to be an evil “quantum supremacist.” Apparently a rush job, this cartoon is far from Zach’s finest work. For one thing, if the character is supposed to be me, why not draw him as me, and if he isn’t, why call him “Dr. Aaronson”? In any case, I learned from talking to Zach that the cartoon’s timing was purely coincidental: Zach didn’t even realize what a hornet’s-nest he was poking with this.

Ever since John Preskill coined it in 2012, “quantum supremacy” has been an awkward term. Much as I admire John Preskill’s wisdom, brilliance, generosity, and good sense, in physics as in everything else—yeah, “quantum supremacy” is not a term I would’ve coined, and it’s certainly not a hill I’d choose to die on. Once it had gained common currency, though, I sort of took a liking to it, mostly because I realized that I could mine it for dark one-liners in my talks.

The thinking was: even as white supremacy was making its horrific resurgence in the US and around the world, here we were, physicists and computer scientists and mathematicians of varied skin tones and accents and genders, coming together to pursue a different and better kind of supremacy—a small reflection of the better world that we still believed was possible. You might say that we were reclaiming the word “supremacy”—which, after all, just means a state of being supreme—for something non-sexist and non-racist and inclusive and good.

In the world of 2019, alas, perhaps it was inevitable that people wouldn’t leave things there.

My first intimation came a month ago, when Leonie Mueck—someone who I’d gotten to know and like when she was an editor at Nature handling quantum information papers—emailed me about her view that our community should abandon the term “quantum supremacy,” because of its potential to make women and minorities uncomfortable in our field. She advocated using “quantum advantage” instead.

So I sent Leonie back a friendly reply, explaining that, as the father of a math-loving 6-year-old girl, I understood and shared her concerns—but also, that I didn’t know an alternative term that really worked.

See, it’s like this. Preskill meant “quantum supremacy” to refer to a momentous event that seemed likely to arrive in a matter of years: namely, the moment when programmable quantum computers would first outpace the ability of the fastest classical supercomputers on earth, running the fastest algorithms known by humans, to simulate what the quantum computers were doing (at least on special, contrived problems). And … “the historic milestone of quantum advantage”? It just doesn’t sound right. Plus, as many others pointed out, the term “quantum advantage” is already used to refer to … well, quantum advantages, which might fall well short of supremacy.

But one could go further. Suppose we did switch to “quantum advantage.” Couldn’t that term, too, remind vulnerable people about the unfair advantages that some groups have over others? Indeed, while “advantage” is certainly subtler than “supremacy,” couldn’t that make it all the more insidious, and therefore dangerous?

Oblivious though I sometimes am, I realized Leonie would be unhappy if I offered that, because of my wholehearted agreement, I would henceforth never again call it “quantum supremacy,” but only “quantum superiority,” “quantum dominance,” or “quantum hegemony.”

But maybe you now see the problem. What word does the English language provide to describe one thing decisively beating or being better than a different thing for some purpose, and which doesn’t have unsavory connotations?

I’ve heard “quantum ascendancy,” but that makes it sound like we’re a UFO cult—waiting to ascend, like ytterbium ions caught in a laser beam, to a vast quantum computer in the sky.

I’ve heard “quantum inimitability” (that is, inability to imitate using a classical computer), but who can pronounce that?

Yesterday, my brilliant former student Ewin Tang (yes, that one) relayed to me a suggestion by Kevin Tian: “quantum eclipse” (that is, the moment when quantum computers first eclipse classical ones for some task). But would one want to speak of a “quantum eclipse experiment”? And shouldn’t we expect that, the cuter and cleverer the term, the harder it will be to use unironically?

In summary, while someone might think of a term so inspired that it immediately supplants “quantum supremacy” (and while I welcome suggestions), I currently regard it as an open problem.

Anyway, evidently dissatisfied with my response, last week Leonie teamed up with 13 others to publish a letter in Nature, which was originally entitled “Supremacy is for racists—use ‘quantum advantage,'” but whose title I see has now been changed to the less inflammatory “Instead of ‘supremacy’ use ‘quantum advantage.'” Leonie’s co-signatories included four of my good friends and colleagues: Alan Aspuru-Guzik, Helmut Katzgraber, Anne Broadbent, and Chris Granade (the last of whom got started in the field by helping me edit Quantum Computing Since Democritus).

(Update: Leonie pointed me to a longer list of signatories here, at their website called “quantumresponsibility.org.” A few names that might be known to Shtetl-Optimized readers are Andrew White, David Yonge-Mallo, Debbie Leung, Matt Leifer, Matthias Troyer.)

Their letter says:

The community claims that quantum supremacy is a technical term with a specified meaning. However, any technical justification for this descriptor could get swamped as it enters the public arena after the intense media coverage of the past few months.

In our view, ‘supremacy’ has overtones of violence, neocolonialism and racism through its association with ‘white supremacy’. Inherently violent language has crept into other branches of science as well — in human and robotic spaceflight, for example, terms such as ‘conquest’, ‘colonization’ and ‘settlement’ evoke the terra nullius arguments of settler colonialism and must be contextualized against ongoing issues of neocolonialism.

Instead, quantum computing should be an open arena and an inspiration for a new generation of scientists.

When I did an “Ask Me Anything” session, as the closing event at Q2B, Sarah Kaiser asked me to comment on the Nature petition. So I repeated what I’d said in my emailed response to Leonie—running through the problems with each proposed alternative term, talking about the value of reclaiming the word “supremacy,” and mostly just trying to diffuse the tension by getting everyone laughing together. Sarah later tweeted that she was “really disappointed” in my response.

Then the Wall Street Journal got in on the action, with a brief editorial (warning: paywalled) mocking the Nature petition:

There it is, folks: Mankind has hit quantum wokeness. Our species, akin to Schrödinger’s cat, is simultaneously brilliant and brain-dead. We built a quantum computer and then argued about whether the write-up was linguistically racist.

Taken seriously, the renaming game will never end. First put a Sharpie to the Supremacy Clause of the U.S. Constitution, which says federal laws trump state laws. Cancel Matt Damon for his 2004 role in “The Bourne Supremacy.” Make the Air Force give up the term “air supremacy.” Tell lovers of supreme pizza to quit being so chauvinistic about their toppings. Please inform Motown legend Diana Ross that the Supremes are problematic.

The quirks of quantum mechanics, some people argue, are explained by the existence of many universes. How did we get stuck in this one?

Steven Pinker also weighed in, with a linguistically-informed tweetstorm:

This sounds like something from The Onion but actually appeared in Nature … It follows the wokified stigmatization of other innocent words, like “House Master” (now, at Harvard, Residential Dean) and “NIPS” (Neural Information Processing Society, now NeurIPS). It’s a familiar linguistic phenomenon, a lexical version of Gresham’s Law: bad meanings drive good ones out of circulation. Examples: the doomed “niggardly” (no relation to the n-word) and the original senses of “cock,” “ass,” “prick,” “pussy,” and “booty.” Still, the prissy banning of words by academics should be resisted. It dumbs down understanding of language: word meanings are conventions, not spells with magical powers, and all words have multiple senses, which are distinguished in context. Also, it makes academia a laughingstock, tars the innocent, and does nothing to combat actual racism & sexism.

Others had a stronger reaction. Curtis Yarvin, better known as Mencius Moldbug, is one of the founders of “neoreaction” (and a significant influence on Steve Bannon, Michael Anton, and other Trumpists). Regulars might remember that Yarvin argued with me in Shtetl-Optimized‘s comment section, under a post in which I denounced Trump’s travel ban and its effects on my Iranian PhD student. Since then, Yarvin has sent me many emails, which have ranged from long to extremely long, and whose message could be summarized as: “[labored breathing] Abandon your liberal Enlightenment pretensions, young Nerdwalker. Come over the Dark Side.”

After the “supremacy is for racists” letter came out in Nature, though, Yarvin sent me his shortest email ever. It was simply a link to the letter, along with the comment “I knew it would come to this.”

He meant: “What more proof do you need, young Nerdawan, that this performative wokeness is a cancer that will eventually infect everything you value—even totally apolitical research in quantum information? And by extension, that my whole worldview, which warned of this, is fundamentally correct, while your faith in liberal academia is naïve, and will be repaid only with backstabbing?”

In a subsequent email, Yarvin predicted that in two years, the whole community will be saying “quantum advantage” instead of “quantum supremacy,” and in five years I’ll be saying “quantum advantage” too. As Yarvin famously wrote: “Cthulhu may swim slowly. But he only swims left.”

So what do I really think about this epic battle for (and against) supremacy?

Truthfully, half of me just wants to switch to “quantum advantage” right now and be done with it. As I said, I know some of the signatories of the Nature letter to be smart and reasonable and kind. They don’t wish to rid the planet of everyone like me. They’re not Amanda Marcottes or Arthur Chus. Furthermore, there’s little I despise more than a meaty scientific debate devolving into a pointless semantic one, with brilliant friend after brilliant friend getting sucked into the vortex (“you too?”). I’m strongly in the Pinkerian camp, which holds that words are just arbitrary designators, devoid of the totemic power to dictate thoughts. So if friends and colleagues—even just a few of them—tell me that they find some word I use to be offensive, why not just be a mensch, apologize for any unintended hurt, switch words midsentence, and continue discussing the matter at hand?

But then the other half of me wonders: once we’ve ceded an open-ended veto over technical terms that remind anyone of anything bad, where does it stop? How do we ever certify a word as kosher? At what point do we all get to stop arguing and laugh together?

To make this worry concrete, look back at Sarah Kaiser’s Twitter thread—the one where she expresses disappointment in me. Below her tweet, someone remarks that, besides “quantum supremacy,” the word “ancilla” (as in ancilla qubit, a qubit used for intermediate computation or other auxiliary purposes) is problematic as well. Here’s Sarah’s response:

I agree, but I wanted to start by focusing on the obvious one, Its harder for them to object to just one to start with, then once they admit the logic, we can expand the list

(What would Curtis Yarvin say about that?)

You’re probably now wondering: what’s wrong with “ancilla”? Apparently, in ancient Rome, an “ancilla” was a female slave, and indeed that’s the Latin root of the English adjective “ancillary” (as in “providing support to”). I confess that I hadn’t known that—had you? Admittedly, once you do know, you might never again look at a Controlled-NOT gate—pitilessly flipping an ancilla qubit, subject only to the whims of a nearby control qubit—in quite the same way.

(Ah, but the ancilla can fight back against her controller! And she does—in the Hadamard basis.)

The thing is, if we’re gonna play this game: what about annihilation operators? Won’t those need to be … annihilated from physics?

And what about unitary matrices? Doesn’t their very name negate the multiplicity of perspectives and cultures?

What about Dirac’s oddly-named bra/ket notation, with its limitless potential for puerile jokes, about the “bra” vectors displaying their contents horizontally and so forth? (Did you smile at that, you hateful pig?)

What about daggers? Don’t we need a less violent conjugate tranpose?

Not to beat a dead horse, but once you hunt for examples, you realize that the whole dictionary is shot through with domination and brutality—that you’d have to massacre the English language to take it out. There’s nothing special about math or physics in this respect.

The same half of me also thinks about my friends and colleagues who oppose claims of quantum supremacy, or even the quest for quantum supremacy, on various scientific grounds. I.e., either they don’t think that the Google team achieved what it said, or they think that the task wasn’t hard enough for classical computers, or they think that the entire goal is misguided or irrelevant or uninteresting.

Which is fine—these are precisely the arguments we should be having—except that I’ve personally seen some of my respected colleagues, while arguing for these positions, opportunistically tack on ideological objections to the term “quantum supremacy.” Just to goose up their case, I guess. And I confess that every time they did this, it made me want to keep saying “quantum supremacy” from now till the end of time—solely to deny these colleagues a cheap and unearned “victory,” one they apparently felt they couldn’t obtain on the merits alone. I realize that this is childish and irrational.

Most of all, though, the half of me that I’m talking about thinks about Curtis Yarvin and the Wall Street Journal editorial board, cackling with glee to see their worldview so dramatically confirmed—as theatrical wokeness, that self-parodying modern monstrosity, turns its gaze on (of all things) quantum computing research. More red meat to fire up the base—or at least that sliver of the base nerdy enough to care. And the left, as usual, walks right into the trap, sacrificing its credibility with the outside world to pursue a runaway virtue-signaling spiral.

The same half of me thinks: do we really want to fight racism and sexism? Then let’s work together to assemble a broad coalition that can defeat Trump. And Jair Bolsonaro, and Viktor Orbán, and all the other ghastly manifestations of humanity’s collective lizard-brain. Then, if we’re really fantasizing, we could liberalize the drug laws, and get contraception and loans and education to women in the Third World, and stop the systematic disenfranchisement of black voters, and open up the world’s richer, whiter, and higher-elevation countries to climate refugees, and protect the world’s remaining indigenous lands (those that didn’t burn to the ground this year).

In this context, the trouble with obsessing over terms like “quantum supremacy” is not merely that it diverts attention, while contributing nothing to fighting the world’s actual racism and sexism. The trouble is that the obsessions are actually harmful. For they make academics—along with progressive activists—look silly. They make people think that we must not have meant it when we talked about the existential urgency of climate change and the world’s other crises. They pump oxygen into right-wing echo chambers.

But it’s worse than ridiculous, because of the message that I fear is received by many outside the activists’ bubble. When you say stuff like “[quantum] supremacy is for racists,” what’s heard might be something more like:

“Watch your back, you disgusting supremacist. Yes, you. You claim that you mentor women and minorities, donate to good causes, try hard to confront the demons in your own character? Ha! None of that counts for anything with us. You’ll never be with-it enough to be our ally, so don’t bother trying. We’ll see to it that you’re never safe, not even in the most abstruse and apolitical fields. We’ll comb through your words—even words like ‘ancilla qubit’—looking for any that we can cast as offensive by our opaque and ever-shifting standards. And once we find some, we’ll have it within our power to end your career, and you’ll be reduced to groveling that we don’t. Remember those popular kids who bullied you in second grade, giving you nightmares of social ostracism that persist to this day? We plan to achieve what even those bullies couldn’t: to shame you with the full backing of the modern world’s moral code. See, we’re the good guys of this story. It’s goodness itself that’s branding you as racist scum.”

In short, I claim that the message—not the message intended, of course, by anyone other than a Chu or a Marcotte or a SneerClubber, but the message received—is basically a Trump campaign ad. I claim further that our civilization’s current self-inflicted catastrophe will end—i.e., the believers in science and reason and progress and rule of law will claw their way back to power—when, and only when, a generation of activists emerges that understands these dynamics as well as Barack Obama did.

Wouldn’t it be awesome if, five years from now, I could say to Curtis Yarvin: you were wrong? If I could say to him: my colleagues and I still use the term ‘quantum supremacy’ whenever we care to, and none of us have been cancelled or ostracized for it—so maybe you should revisit your paranoid theories about Cthulhu and the Cathedral and so forth? If I could say: quantum computing researchers now have bigger fish to fry than arguments over words—like moving beyond quantum supremacy to the first useful quantum simulations, as well as the race for scalability and fault-tolerance? And even: progressive activists now have bigger fish to fry too—like retaking actual power all over the world?

Anyway, as I said, that’s how half of me feels. The other half is ready to switch to “quantum advantage” or any other serviceable term and get back to doing science.

Two updates

December 2nd, 2019
  1. Two weeks ago, I blogged about the claim of Nathan Keller and Ohad Klein to have proven the Aaronson-Ambainis Conjecture. Alas, Keller and Klein tell me that they’ve now withdrawn their preprint (though it may take another day for that to show up on the arXiv), because of what looks for now like a fatal flaw, in Lemma 5.3, discovered by Paata Ivanishvili. (My own embarrassment over having missed this flaw is slightly mitigated by most of the experts in discrete Fourier analysis having missed it as well!) Keller and Klein are now working to fix the flaw, and I wholeheartedly wish them success.
  2. In unrelated news, I was saddened to read that Virgil Griffith—cryptocurrency researcher, former Integrated Information Theory researcher, and onetime contributor to Shtetl-Optimized—was arrested at LAX for having traveled to North Korea to teach the DPRK about cryptocurrency, against the admonitions of the US State Department. I didn’t know Virgil well, but I did meet him in person at least once, and I liked his essays for this blog about how, after spending years studying IIT under Giulio Tononi himself, he became disillusioned with many aspects of it and evolved to a position not far from mine (though not identical either).
    Personally, I despise the North Korean regime for the obvious reasons—I regard it as not merely evil, but cartoonishly so—and I’m mystified by Virgil’s apparently sincere belief that he could bring peace between the North and South by traveling to North Korea to give a lecture about blockchain. Yet, however world-historically naïve he may have been, his intentions appear to have been good. More pointedly—and here I’m asking not in a legal sense but in a human one—if giving aid and comfort to the DPRK is treasonous, then isn’t the current occupant of the Oval Office a million times guiltier of that particular treason (to say nothing of others)? It’s like, what does “treason” even mean anymore? In any case, I hope some plea deal or other arrangement can be worked out that won’t end Virgil’s productive career.

Guest post by Greg Kuperberg: Archimedes’ other principle and quantum supremacy

November 26th, 2019

Scott’s Introduction: Happy Thanksgiving! Please join me in giving thanks for the beautiful post below, by friend-of-the-blog Greg Kuperberg, which tells a mathematical story that stretches from the 200s BC all the way to Google’s quantum supremacy result last month.

Archimedes’ other principle and quantum supremacy

by Greg Kuperberg

Note: UC Davis is hiring in CS theory! Scott offered me free ad space if I wrote a guest post, so here we are. The position is in all areas of CS theory, including QC theory although the search is not limited to that.

In this post, I wear the hat of a pure mathematician in a box provided by Archimedes. I thus set aside what everyone else thinks is important about Google’s 53-qubit quantum supremacy experiment, that it is a dramatic milestone in quantum computing technology. That’s only news about the physical world, whose significance pales in comparison to the Platonic world of mathematical objects. In my happy world, I like quantum supremacy as a demonstration of a beautiful coincidence in mathematics that has been known for more than 2000 years in a special case. The single-qubit case was discovered by Archimedes, duh. As Scott mentions in Quantum Computing Since Democritus, Bill Wootters stated the general result in a 1990 paper, but Wootters credits a 1974 paper by the Czech physicist Stanislav Sýkora. I learned of it in the substantially more general context of symplectic geometric that mathematicians developed independently between Sýkora’s prescient paper and Wootters’ more widely known citation. Much as I would like to clobber you with highly abstract mathematics, I will wait for some other time.

Suppose that you pick a pure state \(|\psi\rangle\) in the Hilbert space \(\mathbb{C}^d\) of a \(d\)-dimensional qudit, and then make many copies and fully measure each one, so that you sample many times from some distribution \(\mu\) on the \(d\) outcomes. You can think of such a distribution \(\mu\) as a classical randomized state on a digit of size \(d\). The set of all randomized states on a \(d\)-digit makes a \((d-1)\)-dimensional simplex \(\Delta^{d-1}\) in the orthant \(\mathbb{R}_{\ge 0}^d\). The coincidence is that if \(|\psi\rangle\) is uniformly random in the unit sphere in \(\mathbb{C}^d\), then \(\mu\) is uniformly random in \(\Delta^{d-1}\). I will call it the Born map, since it expresses the Born rule of quantum mechanics that amplitudes yield probabilities. Here is a diagram of the Born map of a qutrit, except with the laughable simplification of the 5-sphere in \(\mathbb{C}^3\) drawn as a 2-sphere.

If you pretend to be a bad probability student, then you might not be surprised by this coincidence, because you might suppose that all probability distributions are uniform other than treacherous exceptions to your intuition. However, the principle is certainly not true for a “rebit” (a qubit with real amplitudes) or for higher-dimensional “redits.” With real amplitudes, the probability density goes to infinity at the sides of the simplex \(\Delta^{d-1}\) and is even more favored at the corners. It also doesn’t work for mixed qudit states; the projection then favors the middle of \(\Delta^{d-1}\).

Archimedes’ theorem

The theorem of Archimedes is that a natural projection from the unit sphere to a circumscribing vertical cylinder preserves area. The projection is the second one that you might think of: Project radially from a vertical axis rather than radially in all three directions. Since Archimedes was a remarkably prescient mathematician, he was looking ahead to the Bloch sphere of pure qubit states \(|\psi\rangle\langle\psi|\) written in density operator form. If you measure \(|\psi\rangle\langle\psi|\) in the computational basis, you get a randomized bit state \(\mu\) somewhere on the interval from guaranteed 0 to guaranteed 1.

This transformation from a quantum state to a classical randomized state is a linear projection to a vertical axis. It is the same as Archimedes’ projection, except without the angle information. It doesn’t preserve dimension, but it does preserve measure (area or length, whatever) up to a factor of \(2\pi\). In particular, it takes a uniformly random \(|\psi\rangle\langle\psi|\) to a uniformly random \(\mu\).

Archimedes’ projection is also known as the Lambert cylindrical map of the world. This is the map that squishes Greenland along with the top of North America and Eurasia to give them proportionate area.

(I forgive Lambert if he didn’t give prior credit to Archimedes. There was no Internet back then to easily find out who did what first.) Here is a calculus-based proof of Archimedes’ theorem: In spherical coordinates, imagine an annular strip on the sphere at a polar angle of \(\theta\). (The polar angle is the angle from vertical in spherical coordinates, as depicted in red in the Bloch sphere diagram.) The strip has a radius of \(\sin\theta\), which makes it shorter than its unit radius friend on the cylinder. But it’s also tilted from vertical by an angle of \(\frac{\pi}2-\theta\), which makes it wider by a factor of \(1/(\sin \theta)\) than the height of its projection onto the \(z\) axis. The two factors exactly cancel out, making the area of the strip exactly proportional to the length of its projection onto the \(z\) axis. This is a coincidence which is special to the 2-sphere in 3 dimensions. As a corollary, we get that the surface area of a unit sphere is \(4\pi\), the same as an open cylinder with radius 1 and height 2. If you want to step through this in even more detail, Scott mentioned to me an action video which is vastly spiffier than anything that I could ever make.

The projection of the Bloch sphere onto an interval also shows what goes wrong if you try this with a rebit. The pure rebit states — again expressed in density operator form \(|\psi\rangle\langle\psi|\) are a great circle in the Bloch sphere. If you linearly project a circle onto an interval, then the length of the circle is clearly bunched up at the ends of the interval and the projected measure on the interval is not uniform.

Sýkora’s generalization

It is a neat coincidence that the Born map of a qubit preserves measure, but a proof that relies on Archimedes’ theorem seems to depend on the special geometry of the Bloch sphere of a qubit. That the higher-dimensional Born map also preserves measure is downright eerie. Scott challenged me to write an intuitive explanation. I remembered two different (but similar) proofs, neither of which is original to me. Scott and I disagree as to which proof is nicer.

As a first step of the first proof, it is easy to show that the Born map \(p = |z|^2\) for a single amplitude \(z\) preserves measure as a function from the complex plane \(\mathbb{C}\) to the ray \(\mathbb{R}_{\ge 0}\). The region in the complex numbers \(\mathbb{C}\) where the length of \(z\) is between \(a\) and \(b\), or \(a \le |z| \le b\), is \(\pi(b^2 – a^2)\). The corresponding interval for the probability is \(a^2 \le p \le b^2\), which thus has length \(b^2-a^2\). That’s all, we’ve proved it! More precisely, the area of any circularly symmetric region in \(\mathbb{C}\) is \(\pi\) times the length of its projection onto \(\mathbb{R}_{\ge 0}\).

The second step is to show the same thing for the Born map from the \(d\)-qudit Hilbert space \(\mathbb{C}^d\) to the \(d\)-digit orthant \(\mathbb{R}_{\ge 0}^d\), again without unit normalization. It’s also measure-preserving, up to a factor of \(\pi^d\) this time, because it’s the same thing in each coordinate separately. To be precise, the volume ratio holds for any region in \(\mathbb{C}^d\) that is invariant under separately rotating each of the \(d\) phases. (Because you can approximate any such region with a union of products of thin annuli.)

The third and final step is the paint principle for comparing surface areas. If you paint the hoods of two cars with the same thin layer of paint and you used the same volume of paint for each one, then you can conclude that the two car hoods have nearly same area. In our case, the Born map takes the region \[ 1 \le |z_0|^2 + |z_1|^2 + \cdots + |z_{d-1}|^2 \le 1+\epsilon \] in \(\mathbb{C}^d\) to the region \[ 1 \le p_0 + p_1 + \cdots + p_{d-1} \le 1+\epsilon \] in the orthant \(\mathbb{R}_{\ge 0}^d\). The former is the unit sphere \(S^{2d-1}\) in \(\mathbb{C}^d\) painted to a thickness of roughly \(\epsilon/2\). The latter is the probability simplex \(\Delta^{n-1}\) painted to a thickness of exactly \(\epsilon\). Taking the limit \(\epsilon \to 0\), the Born map from \(S^{2d-1}\) to \(\Delta^{n-1}\) preserves measure up to a factor of \(2\pi^n\).

You might wonder “why” this argument works even if you accept that it does work. The key is that the exponent 2 appears in two different ways: as the dimension of the complex numbers, and as the exponent used to set probabilities and define spheres. If we try the same argument with real amplitudes, then the volume between “spheres” of radius \(a\) and \(b\) is just \(2(b-a)\), which does not match the length \(b^2-a^2\). The Born map for a single real amplitude is the parabola \(p = x^2\), which clearly distorts length since it is not linear. The higher-dimensional real Born map similarly distorts volumes, whether or not you restrict to unit-length states.

If you’re a bitter-ender who still wants Archimedes’ theorem for real amplitudes, then you might consider the variant formula \(p = |x|\) to obtain a probability \(p\) from a “quantum amplitude” \(x\). Then the “Born” map does preserve measure, but for the trivial reason that \(x = \pm p\) is not really a quantum amplitude, it is a probability with a vestigial sign. Also the unit “sphere” in \(\mathbb{R}^d\) is not really a sphere in this theory, it is a hyperoctahedron. The only “unitary” operators that preserve the unit hyperoctahedron are signed permutation matrices. You can only use them for reversible classical computing or symbolic dynamics; they don’t have the strength of true quantum computing or quantum mechanics.

The fact that the Born map preserves measure also yields a bonus calculation of the volume of the unit ball in \(2d\) real dimensions, if we interpret that as \(d\) complex dimensions. The ball \[ |z_0|^2 + |z_1|^2 + \cdots + |z_{d-1}|^2 \le 1 \] in \(\mathbb{C}^d\) is sent to a different simplex \[ p_0 + p_1 + \cdots + p_{d-1} \le 1 \] in \(\mathbb{R}_{\ge 0}^d\). If we recall that the volume of a \(d\)-dimensional pyramid is \(\frac1d\) times base times height and calculate by induction on \(d\), we get that this simplex has volume \(\frac1{d!}\). Thus, the volume of the \(2d\)-dimensional unit ball is \(\frac{\pi^d}{d!}\).

You might ask whether the volume of a \(d\)-dimensional unit ball is always \(\frac{\pi^{d/2}}{(d/2)!}\) for both \(d\) even and odd. The answer is yes if we interpret factorials using the gamma function formula \(x! = \Gamma(x+1)\) and look up that \(\frac12! = \Gamma(\frac32) = \frac{\sqrt{\pi}}2\). The gamma function was discovered by Euler as a solution to the question of defining fractional factorials, but the notation \(\Gamma(x)\) and the cumbersome shift by 1 is due to Legendre. Although Wikipedia says that no one knows why Legendre defined it this way, I wonder if his goal was to do what the Catholic church later did for itself in 1978: It put a Pole at the origin.

(Scott wanted to censor this joke. In response, I express my loyalty to my nation of birth by quoting the opening of the Polish national anthem: “Poland has not yet died, so long as we still live!” I thought at first that Stanislav Sýkora is Polish since Stanisław and Sikora are both common Polish names, but his name has Czech spelling and he is Czech. Well, the Czechs are cool too.)

Sýkora’s 1974 proof of the generalized Archimedes’ theorem is different from this one. He calculates multivariate moments of the space of unit states \(S^{2d-1} \subseteq \mathbb{C}^d\), and confirms that they match the moments in the probability simplex \(\Delta^{d-1}\). There are inevitably various proofs of this result, and I will give another one.

Another proof, and quantum supremacy

There is a well-known and very useful algorithm to generate a random point on the unit sphere in either \(\mathbb{R}^d\) or \(\mathbb{C}^d\), and a similar algorithm to generate a random point in a simplex. In the former algorithm, we make each real coordinate \(x\) into an independent Gaussian random variable with density proportional to \(e^{-x^2}\;dx\), and then rescale the result to unit length. Since the exponents combine as \[ e^{-x_0^2}e^{-x_1^2}\cdots e^{-x_{d-1}^2} = e^{-(x_0^2 + x_1^2 + \cdots + x_{d-1}^2)}, \] we learn that the total exponent is spherically symmetric. Therefore after rescaling, the result is a uniformly random point on the unit sphere \(S^{d-1} \subseteq \mathbb{R}^d\). Similarly, the other algorithm generates a point in the orthant \(\mathbb{R}_{\ge 0}^d\) by making each real coordinate \(p \ge 0\) an independent random variable with exponential distribution \(e^{-p}\;dp\). This time we rescale the vector until its sum is 1. This algorithm likewise produces a uniformly random point in the simplex \(\Delta^{d-1} \subseteq \mathbb{R}_{\ge 0}^d\) because the total exponent of the product \[ e^{-p_0}e^{-p_1}\cdots e^{-p_{d-1}} = e^{-(p_0 + p_1 + \cdots + p_{d-1})} \] only depends on the sum of the coordinates. Wootters describes both of these algorithms in his 1990 paper, but instead of relating them to give his own proof of the generalized Archimedes’ theorem, he cites Sýkora.

The gist of the proof is that the Born map takes the Gaussian algorithm to the exponential algorithm. Explicitly, the Gaussian probability density for a single complex amplitude \[ z = x+iy = re^{i\theta} \] can be converted from Cartesian to polar coordinate as follows: \[ \frac{e^{-|z|^2}\;dx\;dy}{\pi} = \frac{e^{-r^2}r\;dr\;d\theta}{\pi}. \] I have included the factor of \(r\) that is naturally present in an area integral in polar coordinates. We will need it, and it is another way to see that the theorem relies on the fact that the complex numbers are two-dimensional. To complete the proof, we substitute \(p = r^2\) and remember that \(dp = 2r\;dr\), and then integrate over \(\theta\) (trivially, since the integrand does not depend on \(\theta\)). The density simplifies to \(e^{-p}\;dp\), which is exactly the exponential distribution for a real variable \(p \ge 0\). Since the Born map takes the Gaussian algorithm to the exponential algorithm, and since each algorithm produces a uniformly random point, the Born map must preserve uniform measure. (Scott likes this proof better because it is algorithmic, and because it is probabilistic.)

Now about quantum supremacy. You might think that a random chosen quantum circuit on \(n\) qubits produces a nearly uniformly random quantum state \(|\psi\rangle\) in their joint Hilbert space, but it’s not quite not that simple. When \(n=53\), or otherwise as \(n \to \infty\), a manageable random circuit is not nearly creative enough to either reach or approximate most of the unit states in the colossal Hilbert space of dimension \(d = 2^n\). The state \(|\psi\rangle\) that you get from (say) a polynomial-sized circuit resembles a fully random state in various statistical and computational respects, both proven and conjectured. As a result, if you measure the qubits in the computational basis, you get a randomized state on \(n\) bits that resembles a uniformly random point in \(\Delta^{2^n-1}\).

If you choose \(d\) probabilities, and if each one is an independent exponential random variable, then the law of large numbers says that the sum (which you use for rescaling) is close to \(d\) when \(d\) is large. When \(d\) is really big like \(2^{53}\), a histogram of the probabilities of the bit strings of a supremacy experiment looks like an exponential curve \(f(p) \propto e^{-pd}\). In a sense, the statistical distribution of the bit strings is almost the same almost every time, independent of which random quantum circuit you choose to generate them. The catch is that the position of any given bit string does depend on the circuit and is highly scrambled. I picture it in my mind like this:

It is thought to be computationally intractable to calculate where each bit string lands on this exponential curve, or even where just one of them does. (The exponential curve is attenuated by noise in the actual experiment, but it’s the same principle.) That is one reason that random quantum circuits are supreme.

The Aaronson-Ambainis Conjecture (2008-2019)

November 17th, 2019

Update: Sadly, Nathan Keller and Ohad Klein tell me that they’ve retracted their preprint, because of what currently looks like a fatal flaw in Lemma 5.3, uncovered by Paata Ivanishvili. I wish them the best of luck in fixing the problem. In the meantime, I’m leaving up this post for “historical” reasons.

Around 1999, one of the first things I ever did in quantum computing theory was to work on a problem that Fortnow and Rogers suggested in a paper: is it possible to separate P from BQP relative to a random oracle? (That is, without first needing to separate P from PSPACE or whatever in the real world?) Or to the contrary: suppose that a quantum algorithm Q makes T queries to a Boolean input string X. Is there then a classical simulation algorithm that makes poly(T) queries to X, and that approximates Q’s acceptance probability for most values of X? Such a classical simulation, were it possible, would still be consistent with the existence of quantum algorithms like Simon’s and Shor’s, which are able to achieve exponential (and even greater) speedups in the black-box setting. It would simply demonstrate the importance, for Simon’s and Shor’s algorithms, of global structure that makes the string X extremely non-random: for example, encoding a periodic function (in the case of Shor’s algorithm), or encoding a function that hides a secret string s (in the case of Simon’s). It would underscore that superpolynomial quantum speedups depend on structure.

I never managed to solve this problem. Around 2008, though, I noticed that a solution would follow from a perhaps-not-obviously-related conjecture, about influences in low-degree polynomials. Namely, let p:Rn→R be a degree-d real polynomial in n variables, and suppose p(x)∈[0,1] for all x∈{0,1}n. Define the variance of p to be
Var(p):=Ex,y[|p(x)-p(y)|],
and define the influence of the ith variable to be
Infi(p):=Ex[|p(x)-p(xi)|].
Here the expectations are over strings in {0,1}n, and xi means x with its ith bit flipped between 0 and 1. Then the conjecture is this: there must be some variable i such that Infi(p) ≥ poly(Var(p)/d) (in other words, that “explains” a non-negligible fraction of the variance of the entire polynomial).

Why would this conjecture imply the statement about quantum algorithms? Basically, because of the seminal result of Beals et al. from 1998: that if a quantum algorithm makes T queries to a Boolean input X, then its acceptance probability can be written as a real polynomial over the bits of X, of degree at most 2T. Given that result, if you wanted to classically simulate a quantum algorithm Q on most inputs—and if you only cared about query complexity, not computation time—you’d simply need to do the following:
(1) Find the polynomial p that represents Q’s acceptance probability.
(2) Find a variable i that explains at least a 1/poly(T) fraction of the total remaining variance in p, and query that i.
(3) Keep repeating step (2), until p has been restricted to a polynomial with not much variance left—i.e., to nearly a constant function p(X)=c. Whenever that happens, halt and output the constant c.
The key is that by hypothesis, this algorithm will halt, with high probability over X, after only poly(T) steps.

Anyway, around the same time, Andris Ambainis had a major break on a different problem that I’d told him about: namely, whether randomized and quantum query complexities are polynomially related for all partial functions with permutation symmetry (like the collision and the element distinctness functions). Andris and I decided to write up the two directions jointly. The result was our 2011 paper entitled The Need for Structure in Quantum Speedups.

Of the two contributions in the “Need for Structure” paper, the one about random oracles and influences in low-degree polynomials was clearly the weaker and less satisfying one. As the reviewers pointed out, that part of the paper didn’t solve anything: it just reduced one unsolved problem to a new, slightly different problem that was also unsolved. Nevertheless, that part of the paper acquired a life of its own over the ensuing decade, as the world’s experts in analysis of Boolean functions and polynomials began referring to the “Aaronson-Ambainis Conjecture.” Ryan O’Donnell, Guy Kindler, and many others had a stab. I even got Terry Tao to spend an hour or two on the problem when I visited UCLA.

Now, at long last, Nathan Keller and Ohad Klein say they’ve proven the Aaronson-Ambainis Conjecture, in a preprint whose title is a riff on ours: “Quantum Speedups Need Structure.”

Their paper hasn’t yet been peer-reviewed, and I haven’t yet carefully studied it, but I could and should: at 19 pages, it looks very approachable and clear, if not as radically short as (say) Huang’s proof of the Sensitivity Conjecture. Keller and Klein’s argument subsumes all the earlier results that I knew would need to be subsumed, and involves all the concepts (like a real analogue of block sensitivity) that I knew would need to be involved somehow.

My plan had been as follows:
(1) Read their paper in detail. Understand every step of their proof.
(2) Write a blog post that reflects my detailed understanding.

Unfortunately, this plan did not sufficiently grapple with the fact that I now have two kids. It got snagged for a week at step (1). So I’m now executing an alternative plan, which is to jump immediately to the blog post.

Assuming Keller and Klein’s result holds up—as I expect it will—by combining it with the observations in my and Andris’s paper, one immediately gets an explanation for why no one has managed to separate P from BQP relative to a random oracle, but only relative to non-random oracles. This complements the work of Kahn, Saks, and Smyth, who around 2000 gave a precisely analogous explanation for the difficulty of separating P from NP∩coNP relative to a random oracle.

Unfortunately, the polynomial blowup is quite enormous: from a quantum algorithm making T queries, Keller and Klein apparently get a classical algorithm making O(T18) queries. But such things can almost always be massively improved.

Feel free to use the comments to ask any questions about this result or its broader context. I’ll either do my best to answer from the limited amount I know, or else I’ll pass the questions along to Nathan and Ohad themselves. Maybe, at some point, I’ll even be forced to understand the new proof.

Congratulations to Nathan and Ohad!

Update (Nov. 20): Tonight I finally did what I should’ve done two weeks ago, and worked through the paper from start to finish. Modulo some facts about noise operators, hypercontractivity, etc. that I took on faith, I now have a reasonable (albeit imperfect) understanding of the proof. It’s great!

In case it’s helpful to anybody, here’s my one-paragraph summary of how it works. First, you hit your bounded degree-d function f with a random restriction to attenuate its higher-degree Fourier coefficients (reminiscent of Linial-Mansour-Nisan).  Next, in that attenuated function, you find a small “coalition” of influential variables—by which we mean, a set of variables for which there’s some assignment that substantially biases f.  You keep iterating—finding influential coalitions in subfunctions on n/4, n/8, etc. variables. All the while, you keep track of the norm of the vector of all the block-sensitivities of all the inputs (the authors don’t clearly explain this in the intro, but they reveal it near the end). Every time you find another influential coalition, that norm goes down by a little, but by approximation theory, it can only go down O(d2) times until it hits rock bottom and your function is nearly constant. By the end, you’ll have approximated f itself by a decision tree of depth poly(d, 1/ε, log(n)).  Finally, you get rid of the log(n) term by using the fact that f essentially depended on at most exp(O(d)) variables anyway.

Anyway, I’m not sure how helpful it is to write more: the paper itself is about 95% as clear as it could possibly be, and even where it isn’t, you’d probably need to read it first (and, uh, know something about influences, block sensitivity, random restrictions, etc.) before any further clarifying remarks would be of use. But happy to discuss more in the comments, if anyone else is reading it.

Annual recruitment post

November 12th, 2019

Just like I did last year, and the year before, I’m putting up a post to let y’all know about opportunities in our growing Quantum Information Center at UT Austin.

I’m proud to report that we’re building something pretty good here. This fall Shyam Shankar joined our Electrical and Computer Engineering (ECE) faculty to do experimental superconducting qubits, while (as I blogged in the summer) the quantum complexity theorist John Wright will join me on the CS faculty in Fall 2020. Meanwhile, Drew Potter, an expert on topological qubits, rejoined our physics faculty after a brief leave. Our weekly quantum information group meeting now regularly attracts around 30 participants—from the turnout, you wouldn’t know it’s not MIT or Caltech or Waterloo. My own group now has five postdocs and six PhD students—as well as some amazing undergrads striving to meet the bar set by Ewin Tang. Course offerings in quantum information currently include Brian La Cour’s Freshman Research Initiative, my own undergrad Intro to Quantum Information Science honors class, and graduate classes on quantum complexity theory, experimental realizations of QC, and topological matter (with more to come). We’ll also be starting an undergraduate Quantum Information Science concentration next fall.

So without further ado:

(1) If you’re interested in pursuing a PhD focused on quantum computing and information (and/or classical theoretical computer science) at UT Austin: please apply! If you want to work with me or John Wright on quantum algorithms and complexity, apply to CS (I can also supervise physics students in rare cases). Also apply to CS, of course, if you want to work with our other CS theory faculty: David Zuckerman, Dana Moshkovitz, Adam Klivans, Anna Gal, Eric Price, Brent Waters, Vijaya Ramachandran, or Greg Plaxton. If you want to work with Drew Potter on nonabelian anyons or suchlike, or with Allan MacDonald, Linda Reichl, Elaine Li, or others on many-body quantum theory, apply to physics. If you want to work with Shyam Shankar on superconducting qubits, apply to ECE. Note that the deadline for CS and physics is December 1, while the deadline for ECE is December 15.

You don’t need to ask me whether I’m on the lookout for great students: I always am! If you say on your application that you want to work with me, I’ll be sure to see it. Emailing individual faculty members is not how it works and won’t help. Admissions are extremely competitive, so I strongly encourage you to apply broadly to maximize your options.

(2) If you’re interested in a postdoc in my group, I’ll have approximately two openings starting in Fall 2020. To apply, just send me an email by January 1, 2020 with the following info:
– Your CV
– 2 or 3 of your best papers (links or PDF attachments)
– The names of two recommenders (who should email me their letters separately)

(3) If you’re on the faculty job market in quantum computing and information—well, please give me a heads-up if you’re potentially interested in Austin! Our CS, physics, and ECE departments are all open to considering additional candidates in quantum information, both junior and senior. I can’t take credit for this—it surely has to do with developments beyond my control, both at UT and beyond—but I’m happy to relay that, in the three years since I arrived in Texas, the appetite for strengthening UT’s presence in quantum information has undergone jaw-dropping growth at every level of the university.

Also, Austin-Bergstrom International Airport now has direct flights to London, Frankfurt, and (soon) Amsterdam and Paris.

Hook ’em Hadamards!

The morality of quantum computing

November 7th, 2019

This morning a humanities teacher named Richard Horan, having read my NYT op-ed on quantum supremacy, emailed me the following question about it:

Is this pursuit [of scalable quantum computation] just an arms race? A race to see who can achieve it first? To what end? Will this achievement yield advances in medical science and human quality of life, or will it threaten us even more than we are threatened presently by our technologies? You seem rather sanguine about its possible development and uses. But how close does the hand on that doomsday clock move to midnight once we “can harness an exponential number of amplitudes for computation”?

I thought this question might possibly be of some broader interest, so here’s my response (with some light edits).

Dear Richard,

A radio interviewer asked me a similar question a couple weeks ago—whether there’s an ethical dimension to quantum computing research.  I replied that there’s an ethical dimension to everything that humans do.

A quantum computer is not like a nuclear weapon: it’s not going to directly kill anybody (unless the dilution refrigerator falls on them or something?).  It’s true that a full, scalable QC, if and when it’s achieved, will give a temporary advantage to people who want to break certain cryptographic codes.  The morality of that, of course, could strongly depend on whether the codebreakers are working for the “good guys” (like the Allies during WWII) or the “bad guys” (like, perhaps, Trump or Vladimir Putin or Xi Jinping).

But in any case, there’s already a push to switch to new cryptographic codes that already exist and that we think are quantum-resistant.  An actual scalable QC on the horizon would of course massively accelerate that push.  And once people make the switch, we expect that security on the Internet will be more-or-less back where it started.

Meanwhile, the big upside potential from QCs is that they’ll provide an unprecedented ability to simulate physics and chemistry at the molecular level.  That could at least potentially help with designing new medicines, as well as new batteries and solar cells and carbon capture technologies—all things that the world desperately needs right now.

Also, the theory developed around QC has already led to many new and profound insights about physics and computation.  Some of us regard that as an inherent good, in the same way that art and music and literature are.

Now, one could argue that the climate crisis, or various other crises that our civilization faces, are so desperate that instead of working to build QCs, we should all just abandon our normal work and directly confront the crises, as (for example) Greta Thunberg is doing.  On some days I share that position.  But of course, were the position upheld, it would have implications not just for quantum computing researchers but for almost everyone on earth—including humanities teachers like yourself.

Best,
Scott

My New York Times op-ed on quantum supremacy

October 30th, 2019

Here it is.

I’d like to offer special thanks to the editor in charge, Eleanor Barkhorn, who commissioned this piece and then went way, way beyond the call of duty to get it right—including relaxing the usual length limit to let me squeeze in amplitudes and interference, and working late into the night to fix last-minute problems. Obviously I take sole responsibility for whatever errors remain.

Of course a lot of material still ended up on the cutting room floor, including a little riff about Andrew Yang’s tweet that because of quantum supremacy, now “no code is uncrackable,” as well as Ivanka Trump’s tweet giving credit for Google’s experiment (one that Google was working toward since 2015) partly to her father’s administration.

While I’m posting: those of a more technical bent might want to check out my new short preprint with UT undergraduate Sam Gunn, where we directly study the complexity-theoretic hardness of spoofing Google’s linear cross-entropy benchmark using a classical computer. Enjoy!

Quantum supremacy: the gloves are off

October 23rd, 2019

Links:
Google paper in Nature
New York Times article
IBM paper and blog post responding to Google’s announcement
Boaz Barak’s new post: “Boaz’s inferior classical inferiority FAQ”
Lipton and Regan’s post
My quantum supremacy interview with the BBC (featuring some of my fewest “uhms” and “ahs” ever!)
NEW: My preprint with Sam Gunn, On the Classical Hardness of Spoofing Linear Cross-Entropy Benchmarking
My interview on NPR affiliate WOSU (starts around 16:30)

When Google’s quantum supremacy paper leaked a month ago—not through Google’s error, but through NASA’s—I had a hard time figuring out how to cover the news here. I had to say something; on the other hand, I wanted to avoid any detailed technical analysis of the leaked paper, because I was acutely aware that my colleagues at Google were still barred by Nature‘s embargo rules from publicly responding to anything I or others said. (I was also one of the reviewers for the Nature paper, which put additional obligations on me.)

I ended up with Scott’s Supreme Quantum Supremacy FAQ, which tried to toe this impossible line by “answering general questions about quantum supremacy, and the consequences of its still-hypothetical achievement, in light of the leak.” It wasn’t an ideal solution—for one thing, because while I still regard Google’s sampling experiment as a historic milestone for our whole field, there are some technical issues, aspects that subsequent experiments (hopefully coming soon) will need to improve. Alas, the ground rules of my FAQ forced me to avoid such issues, which caused some readers to conclude mistakenly that I didn’t think there were any.

Now, though, the Google paper has come out as Nature‘s cover story, at the same time as there have been new technical developments—most obviously, the paper from IBM (see also their blog post) saying that they could simulate the Google experiment in 2.5 days, rather than the 10,000 years that Google had estimated.

(Yesterday I was deluged by emails asking me “whether I’d seen” IBM’s paper. As a science blogger, I try to respond to stuff pretty quickly when necessary, but I don’t—can’t—respond in Twitter time.)

So now the gloves are off. No more embargo. Time to address the technical stuff under the hood—which is the purpose of this post.

I’m going to assume, from this point on, that you already understand the basics of sampling-based quantum supremacy experiments, and that I don’t need to correct beginner-level misconceptions about what the term “quantum supremacy” does and doesn’t mean (no, it doesn’t mean scalability, fault-tolerance, useful applications, breaking public-key crypto, etc. etc.). If this is not the case, you could start (e.g.) with my FAQ, or with John Preskill’s excellent Quanta commentary.

Without further ado:

(1) So what about that IBM thing? Are random quantum circuits easy to simulate classically?

OK, so let’s carefully spell out what the IBM paper says. They argue that, by commandeering the full attention of Summit at Oak Ridge National Lab, the most powerful supercomputer that currently exists on Earth—one that fills the area of two basketball courts, and that (crucially) has 250 petabytes of hard disk space—one could just barely store the entire quantum state vector of Google’s 53-qubit Sycamore chip in hard disk.  And once one had done that, one could simulate the chip in ~2.5 days, more-or-less just by updating the entire state vector by brute force, rather than the 10,000 years that Google had estimated on the basis of my and Lijie Chen’s “Schrödinger-Feynman algorithm” (which can get by with less memory).

The IBM group understandably hasn’t actually done this yet—even though IBM set it up, the world’s #1 supercomputer isn’t just sitting around waiting for jobs! But I see little reason to doubt that their analysis is basically right. I don’t know why the Google team didn’t consider how such near-astronomical hard disk space would change their calculations; probably they wish they had.

I find this to be much, much better than IBM’s initial reaction to the Google leak, which was simply to dismiss the importance of quantum supremacy as a milestone. Designing better classical simulations is precisely how IBM and others should respond to Google’s announcement, and how I said a month ago that I hoped they would respond. If we set aside the pass-the-popcorn PR war (or even if we don’t), this is how science progresses.

But does IBM’s analysis mean that “quantum supremacy” hasn’t been achieved? No, it doesn’t—at least, not under any definition of “quantum supremacy” that I’ve ever used. The Sycamore chip took about 3 minutes to generate the ~5 million samples that were needed to pass the “linear cross-entropy benchmark”—the statistical test that Google applies to the outputs of its device.

(Technical note added: Google’s samples are extremely noisy—the actual distribution being sampled from is something like 0.998U+0.002D, where U is the uniform distribution and D is the hard distribution that you want. What this means, in practice, is that you need to take a number of samples that’s large compared to 1/0.0022, in order to extract a signal corresponding to D. But the good news is that Google can take that many samples in just a few minutes, since once the circuit has been loaded onto the chip, generating each sample takes only about 40 microseconds. And once you’ve done this, what hardness results we have for passing the linear cross-entropy test—to be discussed later in this post—apply basically just as well as if you’d taken a single noiseless sample.)

Anyway, you might notice that three minutes versus 2.5 days is still a quantum speedup by a factor of 1200. But even more relevant, I think, is to compare the number of “elementary operations.” Let’s generously count a FLOP (floating-point operation) as the equivalent of a quantum gate. Then by my estimate, we’re comparing ~5×109 quantum gates against ~2×1020 FLOPs—a quantum speedup by a factor of ~40 billion.

For me, though, the broader point is that neither party here—certainly not IBM—denies that the top-supercomputers-on-the-planet-level difficulty of classically simulating Google’s 53-qubit programmable chip really is coming from the exponential character of the quantum states in that chip, and nothing else. That’s what makes this back-and-forth fundamentally different from the previous one between D-Wave and the people who sought to simulate its devices classically. The skeptics, like me, didn’t much care what speedup over classical benchmarks there was or wasn’t today: we cared about the increase in the speedup as D-Wave upgraded its hardware, and the trouble was that we never saw a convincing case that there would be one. I’m a theoretical computer scientist, and this is what I believe: that after the constant factors have come and gone, what remains are asymptotic growth rates.

In the present case, while increasing the circuit depth won’t evade IBM’s “store everything to hard disk” strategy, increasing the number of qubits will. If Google, or someone else, upgraded from 53 to 55 qubits, that would apparently already be enough to exceed Summit’s 250-petabyte storage capacity. At 60 qubits, you’d need 33 Summits. At 70 qubits, enough Summits to fill a city … you get the idea.

From the beginning, it was clear that quantum supremacy would not be a milestone like the moon landing—something that’s achieved in a moment, and is then clear to everyone for all time. It would be more like eradicating measles: it could be achieved, then temporarily unachieved, then re-achieved. For by definition, quantum supremacy all about beating something—namely, classical computation—and the latter can, at least for a while, fight back.

As Boaz Barak put it to me, the current contest between IBM and Google is analogous to Kasparov versus Deep Blueexcept with the world-historic irony that IBM is playing the role of Kasparov! In other words, Kasparov can put up a heroic struggle, during a “transitional period” that lasts a year or two, but the fundamentals of the situation are that he’s toast. If Kasparov had narrowly beaten Deep Blue in 1997, rather than narrowly losing, the whole public narrative would likely have been different (“humanity triumphs over computers after all!”). Yet as Kasparov himself well knew, the very fact that the contest was close meant that, either way, human dominance would soon end for good.

Let me leave the last word on this to friend-of-the-blog Greg Kuperberg, who graciously gave me permission to quote his comments about the IBM paper.

I’m not entirely sure how embarrassed Google should feel that they overlooked this.   I’m sure that they would have been happier to anticipate it, and happier still if they had put more qubits on their chip to defeat it.   However, it doesn’t change their real achievement.

I respect the IBM paper, even if the press along with it seems more grouchy than necessary.   I tend to believe them that the Google team did not explore all avenues when they said that their 53 qubits aren’t classically simulable.   But if this is the best rebuttal, then you should still consider how much Google and IBM still agree on this as a proof-of-concept of QC.   This is still quantum David vs classical Goliath, in the extreme.   53 qubits is in some ways still just 53 bits, only enhanced with quantum randomness.  To answer those 53 qubits, IBM would still need entire days of computer time with the world’s fastest supercomputer, a 200-petaflop machine with hundreds of thousands of processing cores and trillions of high-speed transistors.   If we can confirm that the Google chip actually meets spec, but we need this much computer power to do it, then to me that’s about as convincing as a larger quantum supremacy demonstration that humanity can no longer confirm at all.

Honestly, I’m happy to give both Google and IBM credit for helping the field of QC, even if it is the result of a strange dispute.

I should mention that, even before IBM’s announcement, Johnnie Gray, a postdoc at Imperial College, gave a talk (abstract here) at Caltech’s Institute for Quantum Information with a proposal for a different faster way to classically simulate quantum circuits like Google’s—in this case, by doing tensor network contraction more cleverly. Unlike both IBM’s proposed brute-force simulation, and the Schrödinger-Feynman algorithm that Google implemented, Gray’s algorithm (as far as we know now) would need to be repeated k times if you wanted k independent samples from the hard distribution. Partly because of this issue, Gray’s approach doesn’t currently look competitive for simulating thousands or millions of samples, but we’ll need to watch it and see what happens.

(2) Direct versus indirect verification.

The discussion of IBM’s proposed simulation brings us to a curious aspect of the Google paper—one that was already apparent when Nature sent me the paper for review back in August. Namely, Google took its supremacy experiments well past the point where even they themselves knew how to verify the results, by any classical computation that they knew how to perform feasibly (say, in less than 10,000 years).

So you might reasonably ask: if they couldn’t even verify the results, then how did they get to claim quantum speedups from those experiments? Well, they resorted to various gambits, which basically involved estimating the fidelity on quantum circuits that looked almost the same as the hard circuits, but happened to be easier to simulate classically, and then making the (totally plausible) assumption that that fidelity would be maintained on the hard circuits. Interestingly, they also cached their outputs and put them online (as part of the supplementary material to their Nature paper), in case it became feasible to verify them in the future.

Maybe you can now see where this is going. From Google’s perspective, IBM’s rainstorm comes with a big silver lining. Namely, by using Summit, hopefully it will now be possible to verify Google’s hardest (53-qubit and depth-20) sampling computations directly! This should provide an excellent test, since not even the Google group themselves would’ve known how to cheat and bias the results had they wanted to.

This whole episode has demonstrated the importance, when doing a sampling-based quantum supremacy experiment, of going deep into the regime where you can no longer classically verify the outputs, as weird as that sounds. Namely, you need to leave yourself a margin, in the likely event that the classical algorithms improve!

Having said that, I don’t mind revealing at this point that the lack of direct verification of the outputs, for the largest reported speedups, was my single biggest complaint when I reviewed Google’s Nature submission. It was because of my review that they added a paragraph explicitly pointing out that they did do direct verification for a smaller quantum speedup:

The largest circuits for which the fidelity can still be directly verified have 53 qubits and a simplified gate arrangement. Performing random circuit sampling on these at 0.8% fidelity takes one million cores 130 seconds, corresponding to a million-fold speedup of the quantum processor relative to a single core.

(An earlier version of this post misstated the numbers involved.)

(3) The asymptotic hardness of spoofing Google’s benchmark.

OK, but if Google thought that spoofing its test would take 10,000 years, using the best known classical algorithms running on the world’s top supercomputers, and it turns out instead that it could probably be done in more like 2.5 days, then how much else could’ve been missed? Will we find out next that Google’s benchmark can be classically spoofed in mere milliseconds?

Well, no one can rule that out, but we do have some reasons to think that it’s unlikely—and crucially, that even if it turned out to be true, one would just have to add 10 or 20 or 30 more qubits to make it no longer true. (We can’t be more definitive than that? Aye, such are the perils of life at a technological inflection point—and of computational complexity itself.)

The key point to understand here is that we really are talking about simulating a random quantum circuit, with no particular structure whatsoever. While such problems might have a theoretically efficient classical algorithm—i.e., one that runs in time polynomial in the number of qubits—I’d personally be much less surprised if you told me there was a polynomial-time classical algorithm for factoring. In the universe where amplitudes of random quantum circuits turn out to be efficiently computable—well, you might as well just tell me that P=PSPACE and be done with it.

Crucially, if you look at IBM’s approach to simulating quantum circuits classically, and Johnnie Gray’s approach, and Google’s approach, they could all be described as different flavors of “brute force.” That is, they all use extremely clever tricks to parallelize, shave off constant factors, make the best use of available memory, etc., but none involves any deep new mathematical insight that could roust BPP and BQP and the other complexity gods from their heavenly slumber. More concretely, none of these approaches seem to have any hope of “breaching the 2n barrier,” where n is the number of qubits in the quantum circuit to be simulated (assuming that the circuit depth is reasonably large). Mostly, they’re just trying to get down to that barrier, while taking the maximum advantage of whatever storage and connectivity and parallelism are there.

Ah, but at the end of the day, we only believe that Google’s Sycamore chip is solving a classically hard problem because of the statistical test that Google applies to its outputs: the so-called “Linear Cross-Entropy Benchmark,” which I described in Q3 of my FAQ. And even if we grant that calculating the output probabilities for a random quantum circuit is almost certainly classically hard, and sampling the output distribution of a random quantum circuit is almost certainly classically hard—still, couldn’t spoofing Google’s benchmark be classically easy?

This last question is where complexity theory can contribute something to the story. A couple weeks ago, UT undergraduate Sam Gunn and I adapted the hardness analysis from my and Lijie Chen’s 2017 paper “Complexity-Theoretic Foundations of Quantum Supremacy Experiments,” to talk directly about the classical hardness of spoofing the Linear Cross-Entropy benchmark. Our short paper about this should be on the arXiv later this week (or early next week, given that there are no arXiv updates on Friday or Saturday nights) here it is.

Briefly, Sam and I show that if you had a sub-2n classical algorithm to spoof the Linear Cross-Entropy benchmark, then you’d also have a sub-2n classical algorithm that, given as input a random quantum circuit, could estimate a specific output probability (for example, that of the all-0 string) with variance at least slightly (say, Ω(2-3n)) better than that of the trivial estimator that just always guesses 2-n. Or in other words: we show that spoofing Google’s benchmark is no easier than the general problem of nontrivially estimating amplitudes in random quantum circuits. Furthermore, this result automatically generalizes to the case of noisy circuits: all that the noise affects is the threshold for the Linear Cross-Entropy benchmark, and thus (indirectly) the number of samples one needs to take with the QC. Our result helps to explain why, indeed, neither IBM nor Johnnie Gray nor anyone else suggested any attack that’s specific to Google’s Linear Cross-Entropy benchmark: they all simply attack the general problem of calculating the final amplitudes.

(4) Why use Linear Cross-Entropy at all?

In the comments of my FAQ, some people wondered why Google chose the Linear Cross-Entropy benchmark specifically—especially since they’d used a different benchmark (multiplicative cross-entropy, which unlike the linear version actually is a cross-entropy) in their earlier papers. I asked John Martinis this question, and his answer was simply that linear cross-entropy had the lowest variance of any estimator they tried. Since I also like linear cross-entropy—it turns out, for example, to be convenient for the analysis of my certified randomness protocol—I’m 100% happy with their choice. Having said that, there are many other choices of benchmark that would’ve also worked fine, and with roughly the same level of theoretical justification.

(5) Controlled-Z versus iSWAP gates.

Another interesting detail from the Google paper is that, in their previous hardware, they could implement a particular 2-qubit gate called the Controlled-Z. For their quantum supremacy demonstration, on the other hand, they modified their hardware to implement a different 2-qubit gate called the iSWAP some weird combination of iSWAP and Controlled-Z; see the comments section for more. Now, this other gate has no known advantages over the Controlled-Z, for any applications like quantum simulation or Shor’s algorithm or Grover search. Why then did Google make the switch? Simply because, with certain classical simulation methods that they’d been considering, the simulation’s running time grows like 4 to the power of the number of these other gates, but only like 2 to the power of the number of Controlled-Z gates! In other words, they made this engineering choice purely and entirely to make a classical simulation of their device sweat more. This seems totally fine and entirely within the rules to me. (Alas, this choice has no effect on a proposed simulation method like IBM’s.)

(6) Gil Kalai’s objections.

Over the past month, Shtetl-Optimized regular and noted quantum computing skeptic Gil Kalai has been posting one objection to the Google experiment after another on his blog. Unlike the IBM group and many of Google’s other critics, Gil completely accepts the centrality of quantum supremacy as a goal. Indeed, he’s firmly predicted for years that quantum supremacy could never be achieved for fundamental reasons—and he agrees that the Google result, if upheld, would refute his worldview. Gil also has no dispute with the exponential classical hardness of the problem that Google is solving.

Instead, Gil—if we’re talking not about “steelmanning” his beliefs, but about what he himself actually said—has taken the position that the Google experiment must’ve been done wrong and will need to be retracted. He’s offered varying grounds for this. First he said that Google never computed the full histogram of probabilities with a smaller number of qubits (for which such an experiment is feasible), which would be an important sanity check. Except, it turns out they did do that, and it’s in their 2018 Science paper. Next he said that the experiment is invalid because the qubits have to be calibrated in a way that depends on the specific circuit to be applied. Except, this too turns out to be false: John Martinis explicitly confirmed for me that once the qubits are calibrated, you can run any circuit on them that you want. In summary, unlike the objections of the IBM group, so far I’ve found Gil’s objections to be devoid of scientific interest or merit.

Update #1: Alas, I’ll have limited availability today for answering comments, since we’ll be grading the midterm exam for my Intro to Quantum Information Science course! I’ll try to handle the backlog tomorrow (Thursday).

Update #2: Aaannd … timed to coincide with the Google paper, last night the group of Jianwei Pan and Chaoyang Lu put up a preprint on the arXiv reporting a BosonSampling experiment with 20 photons 14 photons observed out of 20 generated (the previous record had been 6 photons). At this stage of the quantum supremacy race, many had of course written off BosonSampling—or said that its importance was mostly historical, in that it inspired Google’s random circuit sampling effort.  I’m thrilled to see BosonSampling itself take such a leap; hopefully, this will eventually lead to a demonstration that BosonSampling was (is) a viable pathway to quantum supremacy as well.  And right now, with fault-tolerance still having been demonstrated in zero platforms, we need all the viable pathways we can get.  What an exciting day for the field.

Book Review: ‘The AI Does Not Hate You’ by Tom Chivers

October 6th, 2019

A couple weeks ago I read The AI Does Not Hate You: Superintelligence, Rationality, and the Race to Save the World, the first-ever book-length examination of the modern rationalist community, by British journalist Tom Chivers. I was planning to review it here, before it got preempted by the news of quantum supremacy (and subsequent news of classical non-supremacy). Now I can get back to rationalists.

Briefly, I think the book is a triumph. It’s based around in-person conversations with many of the notable figures in and around the rationalist community, in its Bay Area epicenter and beyond (although apparently Eliezer Yudkowsky only agreed to answer technical questions by Skype), together of course with the voluminous material available online. There’s a good deal about the 1990s origins of the community that I hadn’t previously known.

The title is taken from Eliezer’s aphorism, “The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else.” In other words: as soon as anyone succeeds in building a superhuman AI, if we don’t take extreme care that the AI’s values are “aligned” with human ones, the AI might be expected to obliterate humans almost instantly as a byproduct of pursuing whatever it does value, more-or-less as we humans did with woolly mammoths, moas, and now gorillas, rhinos, and thousands of other species.

Much of the book relates Chivers’s personal quest to figure out how seriously he should take this scenario. Are the rationalists just an unusually nerdy doomsday cult? Is there some non-negligible chance that they’re actually right about the AI thing? If so, how much more time do we have—and is there even anything meaningful that can be done today? Do the dramatic advances in machine learning over the past decade change the outlook? Should Chivers be worried about his own two children? How does this risk compare to the more “prosaic” civilizational risks, like climate change or nuclear war? I suspect that Chivers’s exploration will be most interesting to readers who, like me, regard the answers to none of these questions as obvious.

While it sounds extremely basic, what makes The AI Does Not Hate You so valuable to my mind is that, as far as I know, it’s nearly the only examination of the rationalists ever written by an outsider that tries to assess the ideas on a scale from true to false, rather than from quirky to offensive. Chivers’s own training in academic philosophy seems to have been crucial here. He’s not put off by people who act weirdly around him, even needlessly cold or aloof, nor by utilitarian thought experiments involving death or torture or weighing the value of human lives. He just cares, relentlessly, about the ideas—and about remaining a basically grounded and decent person while engaging them. Most strikingly, Chivers clearly feels a need—anachronistic though it seems in 2019—actually to understand complicated arguments, be able to repeat them back correctly, before he attacks them.

Indeed, far from failing to understand the rationalists, it occurs to me that the central criticism of Chivers’s book is likely to be just the opposite: he understands the rationalists so well, extends them so much sympathy, and ends up endorsing so many aspects of their worldview, that he must simply be a closet rationalist himself, and therefore can’t write about them with any pretense of journalistic or anthropological detachment. For my part, I’d say: it’s true that The AI Does Not Hate You is what you get if you treat rationalists as extremely smart (if unusual) people from whom you might learn something of consequence, rather than as monkeys in a zoo. On the other hand, Chivers does perform the journalist’s task of constantly challenging the rationalists he meets, often with points that (if upheld) would be fatal to their worldview. One of the rationalists’ best features—and this precisely matches my own experience—is that, far from clamming up or storming off when faced with such challenges (“lo! the visitor is not one of us!”), the rationalists positively relish them.

It occurred to me the other day that we’ll never know how the rationalists’ ideas would’ve developed, had they continued to do so in a cultural background like that of the late 20th century. As Chivers points out, the rationalists today are effectively caught in the crossfire of a much larger cultural war—between, to their right, the recrudescent know-nothing authoritarians, and to their left, what one could variously describe as woke culture, call-out culture, or sneer culture. On its face, it might seem laughable to conflate the rationalists with today’s resurgent fascists: many rationalists are driven by their utilitarianism to advocate open borders and massive aid to the Third World; the rationalist community is about as welcoming of alternative genders and sexualities as it’s humanly possible to be; and leading rationalists like Scott Alexander and Eliezer Yudkowsky strongly condemned Trump for the obvious reasons.

Chivers, however, explains how the problem started. On rationalist Internet forums, many misogynists and white nationalists and so forth encountered nerds willing to debate their ideas politely, rather than immediately banning them as more mainstream venues would. As a result, many of those forces of darkness (and they probably don’t mind being called that) predictably congregated on the rationalist forums, and their stench predictably wore off on the rationalists themselves. Furthermore, this isn’t an easy-to-fix problem, because debating ideas on their merits, extending charity to ideological opponents, etc. is sort of the rationalists’ entire shtick, whereas denouncing and no-platforming anyone who can be connected to an ideological enemy (in the modern parlance, “punching Nazis”) is the entire shtick of those condemning the rationalists.

Compounding the problem is that, as anyone who’s ever hung out with STEM nerds might’ve guessed, the rationalist community tends to skew WASP, Asian, or Jewish, non-impoverished, and male. Worse yet, while many rationalists live their lives in progressive enclaves and strongly support progressive values, they’ll also undergo extreme anguish if they feel forced to subordinate truth to those values.

Chivers writes that all of these issues “blew up in spectacular style at the end of 2014,” right here on this blog. Oh, what the hell, I’ll just quote him:

Scott Aaronson is, I think it’s fair to say, a member of the Rationalist community. He’s a prominent theoretical computer scientist at the University of Texas at Austin, and writes a very interesting, maths-heavy blog called Shtetl-Optimised.

People in the comments under his blog were discussing feminism and sexual harassment. And Aaronson, in a comment in which he described himself as a fan of Andrea Dworkin, described having been terrified of speaking to women as a teenager and young man. This fear was, he said, partly that of being thought of as a sexual abuser or creep if any woman ever became aware that he sexually desired them, a fear that he picked up from sexual-harassment-prevention workshops at his university and from reading feminist literature. This fear became so overwhelming, he said in the comment that came to be known as Comment #171, that he had ‘constant suicidal thoughts’ and at one point ‘actually begged a psychiatrist to prescribe drugs that would chemically castrate me (I had researched which ones), because a life of mathematical asceticism was the only future that I could imagine for myself.’ So when he read feminist articles talking about the ‘male privilege’ of nerds like him, he didn’t recognise the description, and so felt himself able to declare himself ‘only’ 97 per cent on board with the programme of feminism.

It struck me as a thoughtful and rather sweet remark, in the midst of a long and courteous discussion with a female commenter. But it got picked up, weirdly, by some feminist bloggers, including one who described it as ‘a yalp of entitlement combined with an aggressive unwillingness to accept that women are human beings just like men’ and that Aaronson was complaining that ‘having to explain my suffering to women when they should already be there, mopping my brow and offering me beers and blow jobs, is so tiresome.’

Scott Alexander (not Scott Aaronson) then wrote a furious 10,000-word defence of his friend… (p. 214-215)

And then Chivers goes on to explain Scott Alexander’s central thesis, in Untitled, that privilege is not a one-dimensional axis, so that (to take one example) society can make many women in STEM miserable while also making shy male nerds miserable in different ways.

For nerds, perhaps an alternative title for Chivers’s book could be “The Normal People Do Not Hate You (Not All of Them, Anyway).” It’s as though Chivers is demonstrating, through understated example, that taking delight in nerds’ suffering, wanting them to be miserable and alone, mocking their weird ideas, is not simply the default, well-adjusted human reaction, with any other reaction being ‘creepy’ and ‘problematic.’ Some might even go so far as to apply the latter adjectives to the sneerers’ attitude, the one that dresses up schoolyard bullying in a social-justice wig.

Reading Chivers’s book prompted me to reflect on my own relationship to the rationalist community. For years, I interacted often with the community—I’ve known Robin Hanson since ~2004 and Eliezer Yudkowsky since ~2006, and our blogs bounced off each other—but I never considered myself a member.  I never ranked paperclip-maximizing AIs among humanity’s more urgent threats—indeed, I saw them as a distraction from an all-too-likely climate catastrophe that will leave its survivors lucky to have stone tools, let alone AIs. I was also repelled by what I saw as the rationalists’ cultier aspects.  I even once toyed with the idea of changing the name of this blog to “More Wrong” or “Wallowing in Bias,” as a play on the rationalists’ LessWrong and OvercomingBias.

But I’ve drawn much closer to the community over the last few years, because of a combination of factors:

  1. The comment-171 affair. This was not the sort of thing that could provide any new information about the likelihood of a dangerous AI being built, but was (to put it mildly) the sort of thing that can tell you who your friends are. I learned that empathy works a lot like intelligence, in that those who boast of it most loudly are often the ones who lack it.
  2. The astounding progress in deep learning and reinforcement learning and GANs, which caused me (like everyone else, perhaps) to update in the direction of human-level AI in our lifetimes being an actual live possibility,
  3. The rise of Scott Alexander. To the charge that the rationalists are a cult, there’s now the reply that Scott, with his constant equivocations and doubts, his deep dives into data, his clarity and self-deprecating humor, is perhaps the least culty cult leader in human history. Likewise, to the charge that the rationalists are basement-dwelling kibitzers who accomplish nothing of note in the real world, there’s now the reply that Scott has attracted a huge mainstream following (Steven Pinker, Paul Graham, presidential candidate Andrew Yang…), purely by offering up what’s self-evidently some of the best writing of our time.
  4. Research. The AI-risk folks started publishing some research papers that I found interesting—some with relatively approachable problems that I could see myself trying to think about if quantum computing ever got boring. This shift seems to have happened at roughly around the same time my former student, Paul Christiano, “defected” from quantum computing to AI-risk research.

Anyway, if you’ve spent years steeped in the rationalist blogosphere, read Eliezer’s “Sequences,” and so on, The AI Does Not Hate You will probably have little that’s new, although it might still be interesting to revisit ideas and episodes that you know through a newcomer’s eyes. To anyone else … well, reading the book would be a lot faster than spending all those years reading blogs! I’ve heard of some rationalists now giving out copies of the book to their relatives, by way of explaining how they’ve chosen to spend their lives.

I still don’t know whether there’s a risk worth worrying about that a misaligned AI will threaten human civilization in my lifetime, or my children’s lifetimes, or even 500 years—or whether everyone will look back and laugh at how silly some people once were to think that (except, silly in which way?). But I do feel fairly confident that The AI Does Not Hate You will make a positive difference—possibly for the world, but at any rate for a little well-meaning community of sneered-at nerds obsessed with the future and with following ideas wherever they lead.