Not the critic who counts

October 11th, 2017

There’s a website called Stop Timothy Gowers! !!! —yes, that’s the precise name, including the exclamation points.  The site is run by a mathematician who for years went under the pseudonym “owl / sowa,” but who’s since outed himself as Nikolai Ivanov.

For those who don’t know, Sir Timothy Gowers is a Fields Medalist, known for seminal contributions including the construction of Banach spaces with strange properties, the introduction of the Gowers norm, explicit bounds for the regularity lemma, and more—but who’s known at least as well for explaining math, in his blog, books, essays, MathOverflow, and elsewhere, in a remarkably clear, friendly, and accessible way.  He’s also been a leader in the fight to free academia from predatory publishers.

So why on earth would a person like that need to be stopped?  According to sowa, because Gowers, along with other disreputable characters like Terry Tao and Endre Szemerédi and the late Paul Erdös, represents a dangerous style of doing mathematics: a style that’s just as enamored of concrete problems as it is of abstract theory-building, and that doesn’t even mind connections to other fields like theoretical computer science.  If that style becomes popular with young people, it will prevent faculty positions and prestigious prizes from going to the only deserving kind of mathematics: the kind exemplified by Bourbaki and by Alexander Grothendieck, which builds up theoretical frameworks with principled disdain for the solving of simple-to-state problems.  Mathematical prizes going to the wrong people—or even going to the right people but presented by the wrong people—are constant preoccupations of sowa’s.  Read his blog and let me know if I’ve unfairly characterized it.


Now for something totally unrelated.  I recently discovered a forum on Reddit called SneerClub, which, as its name suggests, is devoted to sneering.  At whom?  Basically, at anyone who writes anything nice about nerds or Silicon Valley, or who’s associated with the “rationalist community,” or the Effective Altruist movement, or futurism or AI risk.  Typical targets include Scott Alexander, Eliezer Yudkowsky, Robin Hanson, Michael Vassar, Julia Galef, Paul Graham, Ray Kurzweil, Elon Musk … and with a list like that, I guess I should be honored to be a regular target too.

The basic SneerClub M.O. is to seize on a sentence that, when ripped from context and reflected through enough hermeneutic funhouse mirrors, can make nerds out to look like right-wing villains, oppressing the downtrodden with rays of disgusting white maleness (even, it seems, ones who aren’t actually white or male).  So even if the nerd under discussion turns out to be, say, a leftist or a major donor to anti-Trump causes or malaria prevention or whatever, readers can feel reassured that their preexisting contempt was morally justified after all.

Thus: Eliezer Yudkowsky once wrote a piece of fiction in which a character, breaking the fourth wall, comments that another character seems to have no reason to be in the story.  This shows that Eliezer is a fascist who sees people unlike himself as having no reason to exist, and who’d probably exterminate them if he could.  Or: many rationalist nerds spend a lot of effort arguing against Trumpists, alt-righters, and neoreactionaries.  The fact that they interact with those people, in order to rebut them, shows that they’re probably closet neoreactionaries themselves.


When I browse sites like “Stop Timothy Gowers! !!!” or SneerClub, I tend to get depressed about the world—and yet I keep browsing, out of a fascination that I don’t fully understand.  I ask myself: how can a person read Gowers’s blog, or Slate Star Codex, without seeing what I see, which is basically luminous beacons of intellectual honesty and curiosity and clear thought and sparkling prose and charity to dissenting views, shining out far across the darkness of online discourse?

(Incidentally, Gowers lists “Stop Timothy Gowers! !!!” in his blogroll, and I likewise learned of SneerClub only because Scott Alexander linked to it.)

I’m well aware that this very question will only prompt more sneers.  From the sneerers’ perspective, they and their friends are the beacons, while Gowers or Scott Alexander are the darkness.  How could a neutral observer possibly decide who was right?

But then I reflect that there’s at least one glaring asymmetry between the sides.

If you read Timothy Gowers’s blog, one thing you’ll constantly notice is mathematics.  When he’s not weighing in on current events—for example, writing against Brexit, Elsevier, or the destruction of a math department by cost-cutting bureaucrats—Gowers is usually found delighting in exploring a new problem, or finding a new way to explain a known result.  Often, as with his dialogue with John Baez and others about the recent “p=t” breakthrough, Gowers is struggling to understand an unfamiliar piece of mathematics—and, completely unafraid of looking like an undergrad rather than a Fields Medalist, he simply shares each step of his journey, mistakes and all, inviting you to follow for as long as you can keep up.  Personally, I find it electrifying: why can’t all mathematicians write like that?

By contrast, when you read sowa’s blog, for all the anger about the sullying of mathematics by unworthy practitioners, there’s a striking absence of mathematical exposition.  Not once does sowa ever say: “OK, forget about the controversy.  Since you’re here, instead of just telling you about the epochal greatness of Grothendieck, let me walk you through an example.  Let me share a beautiful little insight that came out of his approach, in so self-contained a way that even a physicist or computer scientist will understand it.”  In other words, sowa never uses his blog to do what Gowers does every day.  Sowa might respond that that’s what papers are for—but the thing about a blog is that it gives you the chance to reach a much wider readership than your papers do.  If someone is already blogging anyway, why wouldn’t they seize that chance to share something they love?

Similar comments apply to Slate Star Codex versus r/SneerClub.  When I read an SSC post, even if I vehemently disagree with the central thesis (which, yes, happens sometimes), I always leave the diner intellectually sated.  For the rest of the day, my brain is bloated with new historical tidbits, or a deep-dive into the effects of a psychiatric drug I’d never heard of, or a jaw-dropping firsthand account of life as a medical resident, or a different way to think about a philosophical problem—or, if nothing else, some wicked puns and turns of phrase.

But when I visit r/SneerClub—well, I get exactly what’s advertised on the tin.  Once you’ve read a few, the sneers become pretty predictable.  I thought that for sure, I’d occasionally find something like: “look, we all agree that Eliezer Yudkowsky and Elon Musk and Nick Bostrom are talking out their asses about AI, and are coddled white male emotional toddlers to boot.  But even granting that, what do we think about AI?  Are intelligences vastly smarter than humans possible?  If not, then what principle rules them out?  What, if anything, can be said about what a superintelligent being would do, or want?  Just for fun, let’s explore this a little: I mean the actual questions themselves, not the psychological reasons why others explore them.”

That never happens.  Why not?


There’s another fascinating Reddit forum called “RoastMe”, where people submit a photo of themselves holding a sign expressing their desire to be “roasted”—and then hundreds of Redditors duly oblige, savagely mocking the person’s appearance and anything else they can learn about the person from their profile.  Many of the roasts are so merciless that one winces vicariously for the poor schmucks who signed up for this, hopes that they won’t be driven to self-harm or suicide.  But browse enough roasts, and a realization starts to sink in: there’s no person, however beautiful or interesting they might’ve seemed a priori, for whom this roasting can’t be accomplished.  And that very generality makes the roasting lose much of its power—which maybe, optimistically, was the point of the whole exercise?

In the same way, spend a few days browsing SneerClub, and the truth hits you: once you’ve made their enemies list, there’s nothing you could possibly say or do that they wouldn’t sneer at.  Like, say it’s a nice day outside, and someone will reply:

“holy crap how much of an entitled nerdbro do you have to be, to erase all the marginalized people for whom the day is anything but ‘nice’—or who might be unable to go outside at all, because of limited mobility or other factors never even considered in these little rich white boys’ geek utopia?”

For me, this realization is liberating.  If appeasement of those who hate you is doomed to fail, why bother even embarking on it?


I’ve spent a lot of time on this blog criticizing D-Wave, and cringeworthy popular articles about quantum computing, and touted arXiv preprints that say wrong things.  But I hope regular readers feel like I’ve also tried to offer something positive: y’know, actual progress in quantum computing that actually excites me, or a talk about big numbers, or an explanation of the Bekenstein bound, whatever.  My experience with sites like “Stop Timothy Gowers! !!!” and SneerClub makes me feel like I ought to be doing less criticizing and more positive stuff.

Why, because I fear turning into a sneerer myself?  No, it’s subtler than that: because reading the sneerers drives home for me that it’s a fool’s quest to try to become what Scott Alexander once called an “apex predator of the signalling world.”

At the risk of stating the obvious: if you write, for example, that Richard Feynman was a self-aggrandizing chauvinist showboater, then even if your remarks have a nonzero inner product with the truth, you don’t thereby “transcend” Feynman and stand above him, in the same way that set theory transcends and stands above arithmetic by constructing a model for it.  Feynman’s achievements don’t thereby become your achievements.

When I was in college, I devoured Ray Monk’s two-volume biography of Bertrand Russell.  This is a superb work of scholarship, which I warmly recommend to everyone.  But there’s one problem with it: Monk is constantly harping on his subject’s failures, and he has no sense of humor, and Russell does.  The result is that, whenever Monk quotes Russell’s personal letters at length to prove what a jerk Russell was, the quoted passages just leap off the page—as if old Bertie has come back from the dead to share a laugh with you, the reader, while his biographer looks on sternly and says, “you two think this is funny?”

For a writer, I can think of no higher aspiration than that: to write like Bertrand Russell or like Scott Alexander—in such a way that, even when people quote you to stand above you, your words break free of the imprisoning quotation marks, wiggle past the critics, and enter the minds of readers of your generation and of generations not yet born.


Update (Nov. 13): Since apparently some people didn’t know (?!), the title of this post comes from the famous Teddy Roosevelt quote:

It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.

Coming to Nerd Central

October 8th, 2017

While I’m generally on sabbatical in Tel Aviv this year, I’ll be in the Bay Area from Saturday Oct. 14 through Wednesday Oct. 18, where I look forward to seeing many friends new and old.  On Wednesday evening, I’ll be giving a public talk in Berkeley, through the Simons Institute’s “Theoretically Speaking” series, entitled Black Holes, Firewalls, and the Limits of Quantum Computers.  I hope to see at least a few of you there!  (I do have readers in the Bay Area, don’t I?)

But there’s more: on Saturday Oct. 14, I’m thinking of having a first-ever Shtetl-Optimized meetup, somewhere near the Berkeley campus.  Which will also be a Slate Star Codex meetup, because Scott Alexander will be there too.  We haven’t figured out many details yet, except that it will definitively involve getting fruit smoothies from one of the places I remember as a grad student.  Possible discussion topics include what the math, CS, and physics research communities could be doing better; how to advance Enlightenment values in an age of recrudescent totalitarianism; and (if we’re feeling really ambitious) the interpretation of quantum mechanics.  If you’re interested, shoot me an email, let me know if there are times that don’t work; then other Scott and I will figure out a plan and make an announcement.

On an unrelated note, some people might enjoy my answer to a MathOverflow question about why one should’ve expected number theory to be so rife with ridiculously easy-to-state yet hard-to-prove conjectures, like Fermat’s Last Theorem and the Goldbach Conjecture.  As I’ve discussed on this blog before, I’ve been deeply impressed with MathOverflow since the beginning, but never more so than today, when a decision to close the question as “off-topic” was rightfully overruled.  If there’s any idea that unites all theoretical computer scientists, I’d say it’s the idea that what makes a given kind of mathematics “easy” or “hard” is, itself, a proper subject for mathematical inquiry.

Because you asked: the Simulation Hypothesis has not been falsified; remains unfalsifiable

October 3rd, 2017

By email, by Twitter, even as the world was convulsed by tragedy, the inquiries poured in yesterday about a different topic entirely: Scott, did physicists really just prove that the universe is not a computer simulation—that we can’t be living in the Matrix?

What prompted this was a rash of popular articles like this one (“Researchers claim to have found proof we are NOT living in a simulation”).  The articles were all spurred by a recent paper in Science Advances: Quantized gravitational responses, the sign problem, and quantum complexity, by Zohar Ringel of Hebrew University and Dmitry L. Kovrizhin of Oxford.

I’ll tell you what: before I comment, why don’t I just paste the paper’s abstract here.  I invite you to read it—not the whole paper, just the abstract, but paying special attention to the sentences—and then make up your own mind about whether it supports the interpretation that all the popular articles put on it.

Ready?  Set?

Abstract: It is believed that not all quantum systems can be simulated efficiently using classical computational resources.  This notion is supported by the fact that it is not known how to express the partition function in a sign-free manner in quantum Monte Carlo (QMC) simulations for a large number of important problems.  The answer to the question—whether there is a fundamental obstruction to such a sign-free representation in generic quantum systems—remains unclear.  Focusing on systems with bosonic degrees of freedom, we show that quantized gravitational responses appear as obstructions to local sign-free QMC.  In condensed matter physics settings, these responses, such as thermal Hall conductance, are associated with fractional quantum Hall effects.  We show that similar arguments also hold in the case of spontaneously broken time-reversal (TR) symmetry such as in the chiral phase of a perturbed quantum Kagome antiferromagnet.  The connection between quantized gravitational responses and the sign problem is also manifested in certain vertex models, where TR symmetry is preserved.

For those tuning in from home, the “sign problem” is an issue that arises when, for example, you’re trying to use the clever trick known as Quantum Monte Carlo (QMC) to learn about the ground state of a quantum system using a classical computer—but where you needed probabilities, which are real numbers from 0 to 1, your procedure instead spits out numbers some of which are negative, and which you can therefore no longer use to define a sensible sampling process.  (In some sense, it’s no surprise that this would happen when you’re trying to simulate quantum mechanics, which of course is all about generalizing the rules of probability in a way that involves negative and even complex numbers!  The surprise, rather, is that QMC lets you avoid the sign problem as often as it does.)

Anyway, this is all somewhat far from my expertise, but insofar as I understand the paper, it looks like a serious contribution to our understanding of the sign problem, and why local changes of basis can fail to get rid of it when QMC is used to simulate certain bosonic systems.  It will surely interest QMC experts.

OK, but does any of this prove that the universe isn’t a computer simulation, as the popular articles claim (and as the original paper does not)?

It seems to me that, to get from here to there, you’d need to overcome four huge difficulties, any one of which would be fatal by itself, and which are logically independent of each other.

  1. As a computer scientist, one thing that leapt out at me, is that Ringel and Kovrizhin’s paper is fundamentally about computational complexity—specifically, about which quantum systems can and can’t be simulated in polynomial time on a classical computer—yet it’s entirely innocent of the language and tools of complexity theory.  There’s no BQP, no QMA, no reduction-based hardness argument anywhere in sight, and no clearly-formulated request for one either.  Instead, everything is phrased in terms of the failure of one specific algorithmic framework (namely QMC)—and within that framework, only “local” transformations of the physical degrees of freedom are considered, not nonlocal ones that could still be accessible to polynomial-time algorithms.  Of course, one does whatever one needs to do to get a result.
    To their credit, the authors do seem aware that a language for discussing all possible efficient algorithms exists.  They write, for example, of a “common understanding related to computational complexity classes” that some quantum systems are hard to simulate, and specifically of the existence of systems that support universal quantum computation.  So rather than criticize the authors for this limitation of their work, I view their paper as a welcome invitation for closer collaboration between the quantum complexity theory and quantum Monte Carlo communities, which approach many of the same questions from extremely different angles.  As official ambassador between the two communities, I nominate Matt Hastings.
  2. OK, but even if the paper did address computational complexity head-on, about the most it could’ve said is that computer scientists generally believe that BPP≠BQP (i.e., that quantum computers can solve more decision problems in polynomial time than classical probabilistic ones); and that such separations are provable in the query complexity and communication complexity worlds; and that at any rate, quantum computers can solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses (as pointed out in the BosonSampling paper, and independently by Bremner, Jozsa, Shepherd).  Alas, until someone proves P≠PSPACE, there’s no hope for an unconditional proof that quantum computers can’t be efficiently simulated by classical ones.
    (Incidentally, the paper comments, “Establishing an obstruction to a classical simulation is a rather ill-defined task.”  I beg to differ: it’s not ill-defined; it’s just ridiculously hard!)
  3. OK, but suppose it were proved that BPP≠BQP—and for good measure, suppose it were also experimentally demonstrated that scalable quantum computing is possible in our universe.  Even then, one still wouldn’t by any stretch have ruled out that the universe was a computer simulation!  For as many of the people who emailed me asked themselves (but as the popular articles did not), why not just imagine that the universe is being simulated on a quantum computer?  Like, duh?
  4. Finally: even if, for some reason, we disallowed using a quantum computer to simulate the universe, that still wouldn’t rule out the simulation hypothesis.  For why couldn’t God, using Her classical computer, spend a trillion years to simulate one second as subjectively perceived by us?  After all, what is exponential time to She for whom all eternity is but an eyeblink?

Anyway, if it weren’t for all four separate points above, then sure, physicists would have now proved that we don’t live in the Matrix.

I do have a few questions of my own, for anyone who came here looking for my reaction to the ‘news’: did you really need me to tell you all this?  How much of it would you have figured out on your own, just by comparing the headlines of the popular articles to the descriptions (however garbled) of what was actually done?  How obvious does something need to be, before it no longer requires an ‘expert’ to certify it as such?  If I write 500 posts like this one, will the 501st post basically just write itself?

Asking for a friend.


Comment Policy: I welcome discussion about the Ringel-Dovrizhin paper; what might’ve gone wrong with its popularization; QMC; the sign problem; the computational complexity of condensed-matter problems more generally; and the relevance or irrelevance of work on these topics to broader questions about the simulability of the universe.  But as a little experiment in blog moderation, I won’t allow comments that just philosophize in general about whether or not the universe is a simulation, without making further contact with the actual content of this post.  We’ve already had the latter conversation here—probably, like, every week for the last decade—and I’m ready for something new.

Also against individual IQ worries

October 1st, 2017

Scott Alexander recently blogged “Against Individual IQ Worries.”  Apparently, he gets many readers writing to him terrified that they scored too low on an IQ test, and therefore they’ll never be able to pursue their chosen career, or be a full-fledged intellectual or member of the rationalist community or whatever.  Amusingly, other Scott says, some of these readers have even performed their own detailed Bayesian analysis of what it might mean that their IQ score is only 90, cogently weighing the arguments and counterarguments while deploying the full vocabulary of statistical research.  It somehow reminds me of the joke about the talking dog, who frets to his owner that he doesn’t think he’s articulate enough to justify all the media attention he’s getting.

I’ve long had mixed feelings about the entire concept of IQ.

On the one hand, I know all the studies that show that IQ is highly heritable, that it’s predictive of all sorts of life outcomes, etc. etc.  I’m also aware of the practical benefits of IQ research, many of which put anti-IQ leftists into an uncomfortable position: for example, the world might never have understood the risks of lead poisoning without studies showing how they depressed IQ.  And as for the thousands of writers who dismiss the concept of IQ in favor of grit, multiple intelligences, emotional intelligence, or whatever else is the flavor of the week … well, I can fully agree about the importance of the latter qualities, but can’t go along with many of those writers’ barely-concealed impulse to lower the social status of STEM nerds even further, or to enforce a world where the things nerds are good at don’t matter.

On the other hand … well, have you actually looked at an IQ test?  To anyone with a scientific or mathematical bent, the tests are vaguely horrifying.  “Which of these pictures is unlike the others?”  “What number comes next in the sequence?”  Question after question that could have multiple defensible valid answers, but only one that “counts”—and that, therefore, mostly tests the social skill of reverse-engineering what the test-writer had in mind.  As a teacher, I’d be embarrassed to put such questions on an exam.

I sometimes get asked what my IQ is.  The truth is that, as far as I know, I was given one official IQ test, when I was four years old, and my score was about 106.  The tester earnestly explained to my parents that, while I scored off the chart on some subtests, I completely bombed others, and averaging yielded 106.  As a representative example of what I got wrong, the tester offered my parents the following:

Tester: “Suppose you came home, and you saw smoke coming out of your neighbor’s roof.  What would you do?”

Me: “Probably nothing, because it’s just the chimney, and they have a fire in their fireplace.”

Tester: “OK, but suppose it wasn’t the chimney.”

Me: “Well then, I’d either call for help or not, depending on how much I liked my neighbor…”

Apparently, my parents later consulted other psychologists who were of the opinion that my IQ was higher.  But the point remains: if IQ is defined as your score on a professionally administered IQ test, then mine is about 106.

Richard Feynman famously scored only 124 on a childhood IQ test—above average, but below the cutoff for most schools’ “gifted and talented” programs.  After he won the Nobel Prize in Physics, he reportedly said that the prize itself was no big deal; what he was really proud of was to have received one despite a merely 124 IQ.  If so, then it seems to me that I can feel equally proud, to have completed a computer science PhD at age 22, become a tenured MIT professor, etc. etc. despite a much lower IQ even than Feynman’s.

But seriously: how do we explain Feynman’s score?  Well, when you read IQ enthusiasts, you find what they really love is not IQ itself, but rather “g“, a statistical construct derived via factor analysis: something that positively correlates with just about every measurable intellectual ability, but that isn’t itself directly measurable (at least, not by any test yet devised).  An IQ test is merely one particular instrument that happens to correlate well with g—not necessarily the best one for all purposes.  The SAT also correlates with g—indeed, almost as well as IQ tests themselves do, despite the idea (or pretense?) that the SAT measures “acquired knowledge.”  These correlations are important, but they allow for numerous and massive outliers.

So, not for the first time, I find myself in complete agreement with Scott Alexander, when he advises people to stop worrying.  We can uphold every statistical study that’s ever been done correlating IQ with other variables, while still affirming that fretting about your own low IQ score is almost as silly as fretting that you must be dumb because your bookshelf is too empty (a measurable variable that also presumably correlates with g).

More to the point: if you want to know, let’s say, whether you can succeed as a physicist, then surely the best way to find out is to start studying physics and see how well you do.  That will give you a much more accurate signal than a gross consumer index like IQ will—and conditioned on that signal, I’m guessing that your IQ score will provide almost zero additional information.  (Though then again, what would a guy with a 106 IQ know about such things?)

Michael Cohen (1992-2017)

September 28th, 2017

Image result for michael cohen mit

I first encountered Michael Cohen when, as a freshman newly arrived at MIT, he walked into my office unannounced to ask if I had any open problems for him to solve.  My first reaction was bemused annoyance: who does this punk think he is?  he’s ready to do theory, but doesn’t even know enough to make an appointment first?  also, why doesn’t he shave?

And then, five minutes later, I was practically begging him to do research with me.  “OK, so you didn’t like that problem?  Wait, I’ve got another one!  Hey, where are you going…”

Within those five minutes, it had become obvious that this was a freshman who I could—must—talk to like an advanced grad student or professor.  Sadly for quantum computing, Michael ultimately decided to go into classical parts of theoretical computer science, such as low-rank approximation and fast algorithms for geometry and linear-algebra problems.  But that didn’t stop him from later taking my graduate course on quantum complexity theory, where he sat in the front and loudly interrupted me every minute, stream-of-consciousness style, so that my “lectures” often turned into dialogues with him.  Totally unforgivable—all the more so because his musings were always on point, constantly catching me in errors or unjustified claims (one of which I blogged about previously).

Not once did I ever suspect he did it to show off: he was simply so overtaken by his urge to understand the point at hand, as to be oblivious to all niceties.  Yet somehow, that social obliviousness didn’t stop him from accumulating a huge circle of friends.  (Well, it was MIT.)

Michael stayed on at MIT as a grad student, racking up an incredible publication list by age 25.  This semester, he went to visit the Simons Institute for Theory of Computing in Berkeley.

Three days ago, Michael was found dead in his apartment in Berkeley, after having cancelled a scheduled talk because he was feeling unwell.  No cause has been given.

The horrible news came just as I was arriving in Germany for the Heidelberg Laureate Forum, to speak about quantum supremacy.  So I barely had time to process the tragedy—yet it was always in the background, especially as I learned that in his brief life, Michael had also touched many of the other computer scientists who I spoke with in Heidelberg, such as Dan Spielman, whose approach to Ramanujan graphs (with Marcus and Srivastava) Michael had made constructive in one of his most celebrated works.  Only now is the full weight of what happened bearing down on me.

I understand that memorial events are being planned at both MIT and Berkeley.  Feel free to share memories of Michael in the comments; see also Luca’s post and Lance’s post.

This is an unfathomable loss for Michael’s family, for his many friends and colleagues, and for a field that’s been robbed of decades of breakthroughs.

My Big Numbers talk at Festivaletteratura

September 14th, 2017

Last weekend, I gave a talk on big numbers, as well as a Q&A about quantum computing, at Festivaletteratura: one of the main European literary festivals, held every year in beautiful and historic Mantua, Italy.  (For those who didn’t know, as I didn’t: this is the city where Virgil was born, and where Romeo gets banished in Romeo and Juliet.  Its layout hasn’t substantially changed since the Middle Ages.)

I don’t know how much big numbers or quantum computing have to do with literature, but I relished the challenge of explaining these things to an audience that was not merely “popular” but humanisitically rather than scientifically inclined.  In this case, there was not only a math barrier, but also a language barrier, as the festival was mostly in Italian and only some of the attendees knew English, to varying degrees.  The quantum computing session was live-translated into Italian (the challenge faced by the translator in not mangling this material provided a lot of free humor), but the big numbers talk wasn’t.  What’s more, the talk was held outdoors, on the steps of a cathedral, with tons of background noise, including a bell that loudly chimed halfway through the talk.  So if my own words weren’t simple and clear, forget it.

Anyway, in the rest of this post, I’ll share a writeup of my big numbers talk.  The talk has substantial overlap with my “classic” Who Can Name The Bigger Number? essay from 1999.  While I don’t mean to supersede or displace that essay, the truth is that I think and write somewhat differently than I did as a teenager (whuda thunk?), and I wanted to give Scott2017 a crack at material that Scott1999 has been over already.  If nothing else, the new version is more up-to-date and less self-indulgent, and it includes points (for example, the relation between ordinal generalizations of the Busy Beaver function and the axioms of set theory) that I didn’t understand back in 1999.

For regular readers of this blog, I don’t know how much will be new here.  But if you’re one of those people who keeps introducing themselves at social events by saying “I really love your blog, Scott, even though I don’t understand anything that’s in it”—something that’s always a bit awkward for me, because, uh, thanks, I guess, but what am I supposed to say next?—then this lecture is for you.  I hope you’ll read it and understand it.

Thanks so much to Festivaletteratura organizer Matteo Polettini for inviting me, and to Fabrizio Illuminati for moderating the Q&A.  I had a wonderful time in Mantua, although I confess there’s something about being Italian that I don’t understand.  Namely: how do you derive any pleasure from international travel, if anywhere you go, the pizza, pasta, bread, cheese, ice cream, coffee, architecture, scenery, historical sights, and pretty much everything else all fall short of what you’re used to?


Big Numbers

by Scott Aaronson
Sept. 9, 2017

My four-year-old daughter sometimes comes to me and says something like: “daddy, I think I finally figured out what the biggest number is!  Is it a million million million million million million million million thousand thousand thousand hundred hundred hundred hundred twenty eighty ninety eighty thirty a million?”

So I reply, “I’m not even sure exactly what number you named—but whatever it is, why not that number plus one?”

“Oh yeah,” she says.  “So is that the biggest number?”

Of course there’s no biggest number, but it’s natural to wonder what are the biggest numbers we can name in a reasonable amount of time.  Can I have two volunteers from the audience—ideally, two kids who like math?

[Two kids eventually come up.  I draw a line down the middle of the blackboard, and place one kid on each side of it, each with a piece of chalk.]

So the game is, you each have ten seconds to write down the biggest number you can.  You can’t write anything like “the other person’s number plus 1,” and you also can’t write infinity—it has to be finite.  But other than that, you can write basically anything you want, as long as I’m able to understand exactly what number you’ve named.  [These instructions are translated into Italian for the kids.]

Are you ready?  On your mark, get set, GO!

[The kid on the left writes something like: 999999999

While the kid on the right writes something like: 11111111111111111

Looking at these, I comment:]

9 is bigger than 1, but 1 is a bit faster to write, and as you can see that makes the difference here!  OK, let’s give our volunteers a round of applause.

[I didn’t plant the kids, but if I had, I couldn’t have designed a better jumping-off point.]


I’ve been fascinated by how to name huge numbers since I was a kid myself.  When I was a teenager, I even wrote an essay on the subject, called Who Can Name the Bigger Number?  That essay might still get more views than any of the research I’ve done in all the years since!  I don’t know whether to be happy or sad about that.

I think the reason the essay remains so popular, is that it shows up on Google whenever someone types something like “what is the biggest number?”  Some of you might know that Google itself was named after the huge number called a googol: 10100, or 1 followed by a hundred zeroes.

Of course, a googol isn’t even close to the biggest number we can name.  For starters, there’s a googolplex, which is 1 followed by a googol zeroes.  Then there’s a googolplexplex, which is 1 followed by a googolplex zeroes, and a googolplexplexplex, and so on.  But one of the most basic lessons you’ll learn in this talk is that, when it comes to naming big numbers, whenever you find yourself just repeating the same operation over and over and over, it’s time to step back, and look for something new to do that transcends everything you were doing previously.  (Applications to everyday life left as exercises for the listener.)

One of the first people to think about systems for naming huge numbers was Archimedes, who was Greek but lived in what’s now Italy (specifically Syracuse, Sicily) in the 200s BC.  Archimedes wrote a sort of pop-science article—possibly history’s first pop-science article—called The Sand-Reckoner.  In this remarkable piece, which was addressed to the King of Syracuse, Archimedes sets out to calculate an upper bound on the number of grains of sand needed to fill the entire universe, or at least the universe as known in antiquity.  He thereby seeks to refute people who use “the number of sand grains” as a shorthand for uncountability and unknowability.

Of course, Archimedes was just guessing about the size of the universe, though he did use the best astronomy available in his time—namely, the work of Aristarchus, who anticipated Copernicus.  Besides estimates for the size of the universe and of a sand grain, the other thing Archimedes needed was a way to name arbitrarily large numbers.  Since he didn’t have Arabic numerals or scientific notation, his system was basically just to compose the word “myriad” (which means 10,000) into bigger and bigger chunks: a “myriad myriad” gets its own name, a “myriad myriad myriad” gets another, and so on.  Using this system, Archimedes estimated that ~1063 sand grains would suffice to fill the universe.  Ancient Hindu mathematicians were able to name similarly large numbers using similar notations.  In some sense, the next really fundamental advances in naming big numbers wouldn’t occur until the 20th century.

We’ll come to those advances, but before we do, I’d like to discuss another question that motivated Archimedes’ essay: namely, what are the biggest numbers relevant to the physical world?

For starters, how many atoms are in a human body?  Anyone have a guess?  About 1028.  (If you remember from high-school chemistry that a “mole” is 6×1023, this is not hard to ballpark.)

How many stars are in our galaxy?  Estimates vary, but let’s say a few hundred billion.

How many stars are in the entire observable universe?  Something like 1023.

How many subatomic particles are in the observable universe?  No one knows for sure—for one thing, because we don’t know what the dark matter is made of—but 1090 is a reasonable estimate.

Some of you might be wondering: but for all anyone knows, couldn’t the universe be infinite?  Couldn’t it have infinitely many stars and particles?  The answer to that is interesting: indeed, no one knows whether space goes on forever or curves back on itself, like the surface of the earth.  But because of the dark energy, discovered in 1998, it seems likely that even if space is infinite, we can only ever see a finite part of it.  The dark energy is a force that pushes the galaxies apart.  The further away they are from us, the faster they’re receding—with galaxies far enough away from us receding faster than light.

Right now, we can see the light from galaxies that are up to about 45 billion light-years away.  (Why 45 billion light-years, you ask, if the universe itself is “only” 13.6 billion years old?  Well, when the galaxies emitted the light, they were a lot closer to us than they are now!  The universe expanded in the meantime.)  If, as seems likely, the dark energy has the form of a cosmological constant, then there’s a somewhat further horizon, such that it’s not just that the galaxies beyond that can’t be seen by us right now—it’s that they can never be seen.


In practice, many big numbers come from the phenomenon of exponential growth.  Here’s a graph showing the three functions n, n2, and 2n:

The difference is, n and even n2 grow in a more-or-less manageable way, but 2n just shoots up off the screen.  The shooting-up has real-life consequences—indeed, more important consequences than just about any other mathematical fact one can think of.

The current human population is about 7.5 billion (when I was a kid, it was more like 5 billion).  Right now, the population is doubling about once every 64 years.  If it continues to double at that rate, and humans don’t colonize other worlds, then you can calculate that, less than 3000 years from now, the entire earth, all the way down to the core, will be made of human flesh.  I hope the people use deodorant!

Nuclear chain reactions are a second example of exponential growth: one uranium or plutonium nucleus fissions and emits neutrons that cause, let’s say, two other nuclei to fission, which then cause four nuclei to fission, then 8, 16, 32, and so on, until boom, you’ve got your nuclear weapon (or your nuclear reactor, if you do something to slow the process down).  A third example is compound interest, as with your bank account, or for that matter an entire country’s GDP.  A fourth example is Moore’s Law, which is the thing that said that the number of components in a microprocessor doubled every 18 months (with other metrics, like memory, processing speed, etc., on similar exponential trajectories).  Here at Festivaletteratura, there’s a “Hack Space,” where you can see state-of-the-art Olivetti personal computers from around 1980: huge desk-sized machines with maybe 16K of usable RAM.  Moore’s Law is the thing that took us from those (and the even bigger, weaker computers before them) to the smartphone that’s in your pocket.

However, a general rule is that any time we encounter exponential growth in our observed universe, it can’t last for long.  It will stop, if not before then when it runs out of whatever resource it needs to continue: for example, food or land in the case of people, fuel in the case of a nuclear reaction.  OK, but what about Moore’s Law: what physical constraint will stop it?

By some definitions, Moore’s Law has already stopped: computers aren’t getting that much faster in terms of clock speed; they’re mostly just getting more and more parallel, with more and more cores on a chip.  And it’s easy to see why: the speed of light is finite, which means the speed of a computer will always be limited by the size of its components.  And transistors are now just 15 nanometers across; a couple orders of magnitude smaller and you’ll be dealing with individual atoms.  And unless we leap really far into science fiction, it’s hard to imagine building a transistor smaller than one atom across!

OK, but what if we do leap really far into science fiction?  Forget about engineering difficulties: is there any fundamental principle of physics that prevents us from making components smaller and smaller, and thereby making our computers faster and faster, without limit?

While no one has tested this directly, it appears from current physics that there is a fundamental limit to speed, and that it’s about 1043 operations per second, or one operation per Planck time.  Likewise, it appears that there’s a fundamental limit to the density with which information can be stored, and that it’s about 1069 bits per square meter, or one bit per Planck area. (Surprisingly, the latter limit scales only with the surface area of a region, not with its volume.)

What would happen if you tried to build a faster computer than that, or a denser hard drive?  The answer is: cycling through that many different states per second, or storing that many bits, would involve concentrating so much energy in so small a region, that the region would exceed what’s called its Schwarzschild radius.  If you don’t know what that means, it’s just a fancy way of saying that your computer would collapse to a black hole.  I’ve always liked that as Nature’s way of telling you not to do something!

Note that, on the modern view, a black hole itself is not only the densest possible object allowed by physics, but also the most efficient possible hard drive, storing ~1069 bits per square meter of its event horizon—though the bits are not so easy to retrieve! It’s also, in a certain sense, the fastest possible computer, since it really does cycle through 1043 states per second—though it might not be computing anything that anyone would care about.

We can also combine these fundamental limits on computer speed and storage capacity, with the limits that I mentioned earlier on the size of the observable universe, which come from the cosmological constant.  If we do so, we get an upper bound of ~10122 on the number of bits that can ever be involved in any computation in our world, no matter how large: if we tried to do a bigger computation than that, the far parts of it would be receding away from us faster than the speed of light.  In some sense, this 10122 is the most fundamental number that sets the scale of our universe: on the current conception of physics, everything you’ve ever seen or done, or will see or will do, can be represented by a sequence of at most 10122 ones and zeroes.


Having said that, in math, computer science, and many other fields (including physics itself), many of us meet bigger numbers than 10122 dozens of times before breakfast! How so? Mostly because we choose to ask, not about the number of things that are, but about the number of possible ways they could be—not about the size of ordinary 3-dimensional space, but the sizes of abstract spaces of possible configurations. And the latter are subject to exponential growth, continuing way beyond 10122.

As an example, let’s ask: how many different novels could possibly be written (say, at most 400 pages long, with a normal-size font, yadda yadda)? Well, we could get a lower bound on the number just by walking around here at Festivaletteratura, but the number that could be written certainly far exceeds the number that have been written or ever will be. This was the subject of Jorge Luis Borges’ famous story The Library of Babel, which imagined an immense library containing every book that could possibly be written up to a certain length. Of course, the vast majority of the books are filled with meaningless nonsense, but among their number one can find all the great works of literature, books predicting the future of humanity in perfect detail, books predicting the future except with a single error, etc. etc. etc.

To get more quantitative, let’s simply ask: how many different ways are there to fill the first page of a novel?  Let’s go ahead and assume that the page is filled with intelligible (or at least grammatical) English text, rather than arbitrary sequences of symbols, at a standard font size and page size.  In that case, using standard estimates for the entropy (i.e., compressibility) of English, I estimated this morning that there are maybe ~10700 possibilities.  So, forget about the rest of the novel: there are astronomically more possible first pages than could fit in the observable universe!

We could likewise ask: how many chess games could be played?  I’ve seen estimates from 1040 up to 10120, depending on whether we count only “sensible” games or also “absurd” ones (though in all cases, with a limit on the length of the game as might occur in a real competition). For Go, by contrast, which is played on a larger board (19×19 rather than 8×8) the estimates for the number of possible games seem to start at 10800 and only increase from there. This difference in magnitudes has something to do with why Go is a “harder” game than chess, why computers were able to beat the world chess champion already in 1997, but the world Go champion not until last year.

Or we could ask: given a thousand cities, how many routes are there for a salesman that visit each city exactly once? We write the answer as 1000!, pronounced “1000 factorial,” which just means 1000×999×998×…×2×1: there are 1000 choices for the first city, then 999 for the second city, 998 for the third, and so on.  This number is about 4×102567.  So again, more possible routes than atoms in the visible universe, yadda yadda.

But suppose the salesman is interested only in the shortest route that visits each city, given the distance between every city and every other.  We could then ask: to find that shortest route, would a computer need to search exhaustively through all 1000! possibilities—or, maybe not all 1000!, maybe it could be a bit more clever than that, but at any rate, a number that grew exponentially with the number of cities n?  Or could there be an algorithm that zeroed in on the shortest route dramatically faster: say, using a number of steps that grew only linearly or quadratically with the number of cities?

This, modulo a few details, is one of the most famous unsolved problems in all of math and science.  You may have heard of it; it’s called P versus NP.  P (Polynomial-Time) is the class of problems that an ordinary digital computer can solve in a “reasonable” amount of time, where we define “reasonable” to mean, growing at most like the size of the problem (for example, the number of cities) raised to some fixed power.  NP (Nondeterministic Polynomial-Time) is the class for which a computer can at least recognize a solution in polynomial-time.  If P=NP, it would mean that for every combinatorial problem of this sort, for which a computer could recognize a valid solution—Sudoku puzzles, scheduling airline flights, fitting boxes into the trunk of a car, etc. etc.—there would be an algorithm that cut through the combinatorial explosion of possible solutions, and zeroed in on the best one.  If P≠NP, it would mean that at least some problems of this kind required astronomical time, regardless of how cleverly we programmed our computers.

Most of us believe that P≠NP—indeed, I like to say that if we were physicists, we would’ve simply declared P≠NP a “law of nature,” and given ourselves Nobel Prizes for the discovery of the law!  And if it turned out that P=NP, we’d just give ourselves more Nobel Prizes for the law’s overthrow.  But because we’re mathematicians and computer scientists, we call it a “conjecture.”

Another famous example of an NP problem is: I give you (say) a 2000-digit number, and I ask you to find its prime factors.  Multiplying two thousand-digit numbers is easy, at least for a computer, but factoring the product back into primes seems astronomically hard—at least, with our present-day computers running any known algorithm.  Why does anyone care?  Well, you might know that, any time you order something online—in fact, every time you see a little padlock icon in your web browser—your personal information, like (say) your credit card number, is being protected by a cryptographic code that depends on the belief that factoring huge numbers is hard, or a few closely-related beliefs.  If P=NP, then those beliefs would be false, and indeed all cryptography that depends on hard math problems would be breakable in “reasonable” amounts of time.

In the special case of factoring, though—and of the other number theory problems that underlie modern cryptography—it wouldn’t even take anything as shocking as P=NP for them to fall.  Actually, that provides a good segue into another case where exponentials, and numbers vastly larger than 10122, regularly arise in the real world: quantum mechanics.

Some of you might have heard that quantum mechanics is complicated or hard.  But I can let you in on a secret, which is that it’s incredibly simple once you take the physics out of it!  Indeed, I think of quantum mechanics as not exactly even “physics,” but more like an operating system that the rest of physics runs on as application programs.  It’s a certain generalization of the rules of probability.  In one sentence, the central thing quantum mechanics says is that, to fully describe a physical system, you have to assign a number called an “amplitude” to every possible configuration that the system could be found in.  These amplitudes are used to calculate the probabilities that the system will be found in one configuration or another if you look at it.  But the amplitudes aren’t themselves probabilities: rather than just going from 0 to 1, they can be positive or negative or even complex numbers.

For us, the key point is that, if we have a system with (say) a thousand interacting particles, then the rules of quantum mechanics say we need at least 21000 amplitudes to describe it—which is way more than we could write down on pieces of paper filling the entire observable universe!  In some sense, chemists and physicists knew about this immensity since 1926.  But they knew it mainly as a practical problem: if you’re trying to simulate quantum mechanics on a conventional computer, then as far as we know, the resources needed to do so increase exponentially with the number of particles being simulated.  Only in the 1980s did a few physicists, such as Richard Feynman and David Deutsch, suggest “turning the lemon into lemonade,” and building computers that themselves would exploit the exponential growth of amplitudes.  Supposing we built such a computer, what would it be good for?  At the time, the only obvious application was simulating quantum mechanics itself!  And that’s probably still the most important application today.

In 1994, though, a guy named Peter Shor made a discovery that dramatically increased the level of interest in quantum computers.  That discovery was that a quantum computer, if built, could factor an n-digit number using a number of steps that grows only like about n2, rather than exponentially with n.  The upshot is that, if and when practical quantum computers are built, they’ll be able to break almost all the cryptography that’s currently used to secure the Internet.

(Right now, only small quantum computers have been built; the record for using Shor’s algorithm is still to factor 21 into 3×7 with high statistical confidence!  But Google is planning within the next year or so to build a chip with 49 quantum bits, or qubits, and other groups around the world are pursuing parallel efforts.  Almost certainly, 49 qubits still won’t be enough to do anything useful, including codebreaking, but it might be enough to do something classically hard, in the sense of taking at least ~249 or 563 trillion steps to simulate classically.)

I should stress, though, that for other NP problems—including breaking various other cryptographic codes, and solving the Traveling Salesman Problem, Sudoku, and the other combinatorial problems mentioned earlier—we don’t know any quantum algorithm analogous to Shor’s factoring algorithm.  For these problems, we generally think that a quantum computer could solve them in roughly the square root of the number of steps that would be needed classically, because of another famous quantum algorithm called Grover’s algorithm.  But getting an exponential quantum speedup for these problems would, at the least, require an additional breakthrough.  No one has proved that such a breakthrough in quantum algorithms is impossible: indeed, no one has proved that it’s impossible even for classical algorithms; that’s the P vs. NP question!  But most of us regard it as unlikely.

If we’re right, then the upshot is that quantum computers are not magic bullets: they might yield dramatic speedups for certain special problems (like factoring), but they won’t tame the curse of exponentiality, cut through to the optimal solution, every time we encounter a Library-of-Babel-like profusion of possibilities.  For (say) the Traveling Salesman Problem with a thousand cities, even a quantum computer—which is the most powerful kind of computer rooted in known laws of physics—might, for all we know, take longer than the age of the universe to find the shortest route.


The truth is, though, the biggest numbers that show up in math are way bigger than anything we’ve discussed until now: bigger than 10122, or even

$$ 2^{10^{122}}, $$

which is a rough estimate for the number of quantum-mechanical amplitudes needed to describe our observable universe.

For starters, there’s Skewes’ number, which the mathematician G. H. Hardy once called “the largest number which has ever served any definite purpose in mathematics.”  Let π(x) be the number of prime numbers up to x: for example, π(10)=4, since we have 2, 3, 5, and 7.  Then there’s a certain estimate for π(x) called li(x).  It’s known that li(x) overestimates π(x) for an enormous range of x’s (up to trillions and beyond)—but then at some point, it crosses over and starts underestimating π(x) (then overestimates again, then underestimates, and so on).  Skewes’ number is an upper bound on the location of the first such crossover point.  In 1955, Skewes proved that the first crossover must happen before

$$ x = 10^{10^{10^{964}}}. $$

Note that this bound has since been substantially improved, to 1.4×10316.  But no matter: there are numbers vastly bigger even than Skewes’ original estimate, which have since shown up in Ramsey theory and other parts of logic and combinatorics to take Skewes’ number’s place.

Alas, I won’t have time here to delve into specific (beautiful) examples of such numbers, such as Graham’s number.  So in lieu of that, let me just tell you about the sorts of processes, going far beyond exponentiation, that tend to yield such numbers.

The starting point is to remember a sequence of operations we all learn about in elementary school, and then ask why the sequence suddenly and inexplicably stops.

As long as we’re only talking about positive integers, “multiplication” just means “repeated addition.”  For example, 5×3 means 5 added to itself 3 times, or 5+5+5.

Likewise, “exponentiation” just means “repeated multiplication.”  For example, 53 means 5×5×5.

But what’s repeated exponentiation?  For that we introduce a new operation, which we call tetration, and write like so: 35 means 5 raised to itself 3 times, or

$$ ^{3} 5 = 5^{5^5} = 5^{3125} \approx 1.9 \times 10^{2184}. $$

But we can keep going. Let x pentated to the y, or xPy, mean x tetrated to itself y times.  Let x sextated to the y, or xSy, mean x pentated to itself y times, and so on.

Then we can define the Ackermann function, invented by the mathematician Wilhelm Ackermann in 1928, which cuts across all these operations to get more rapid growth than we could with any one of them alone.  In terms of the operations above, we can give a slightly nonstandard, but perfectly serviceable, definition of the Ackermann function as follows:

A(1) is 1+1=2.

A(2) is 2×2=4.

A(3) is 3 to the 3rd power, or 33=27.

Not very impressive so far!  But wait…

A(4) is 4 tetrated to the 4, or

$$ ^{4}4 = 4^{4^{4^4}} = 4^{4^{256}} = BIG $$

A(5) is 5 pentated to the 5, which I won’t even try to simplify.  A(6) is 6 sextated to the 6.  And so on.

More than just a curiosity, the Ackermann function actually shows up sometimes in math and theoretical computer science.  For example, the inverse Ackermann function—a function α such that α(A(n))=n, which therefore grows as slowly as the Ackermann function grows quickly, and which is at most 4 for any n that would ever arise in the physical universe—sometimes appears in the running times of real-world algorithms.

In the meantime, though, the Ackermann function also has a more immediate application.  Next time you find yourself in a biggest-number contest, like the one with which we opened this talk, you can just write A(1000), or even A(A(1000)) (after specifying that A means the Ackermann function above).  You’ll win—period—unless your opponent has also heard of something Ackermann-like or beyond.


OK, but Ackermann is very far from the end of the story.  If we want to go incomprehensibly beyond it, the starting point is the so-called “Berry Paradox”, which was first described by Bertrand Russell, though he said he learned it from a librarian named Berry.  The Berry Paradox asks us to imagine leaping past exponentials, the Ackermann function, and every other particular system for naming huge numbers.  Instead, why not just go straight for a single gambit that seems to beat everything else:

The biggest number that can be specified using a hundred English words or fewer

Why is this called a paradox?  Well, do any of you see the problem here?

Right: if the above made sense, then we could just as well have written

Twice the biggest number that can be specified using a hundred English words or fewer

But we just specified that number—one that, by definition, takes more than a hundred words to specify—using far fewer than a hundred words!  Whoa.  What gives?

Most logicians would say the resolution of this paradox is simply that the concept of “specifying a number with English words” isn’t precisely defined, so phrases like the ones above don’t actually name definite numbers.  And how do we know that the concept isn’t precisely defined?  Why, because if it was, then it would lead to paradoxes like the Berry Paradox!

So if we want to escape the jaws of logical contradiction, then in this gambit, we ought to replace English by a clear, logical language: one that can be used to specify numbers in a completely unambiguous way.  Like … oh, I know!  Why not write:

The biggest number that can be specified using a computer program that’s at most 1000 bytes long

To make this work, there are just two issues we need to get out of the way.  First, what does it mean to “specify” a number using a computer program?  There are different things it could mean, but for concreteness, let’s say a computer program specifies a number N if, when you run it (with no input), the program runs for exactly N steps and then stops.  A program that runs forever doesn’t specify any number.

The second issue is, which programming language do we have in mind: BASIC? C? Python?  The answer is that it won’t much matter!  The Church-Turing Thesis, one of the foundational ideas of computer science, implies that every “reasonable” programming language can emulate every other one.  So the story here can be repeated with just about any programming language of your choice.  For concreteness, though, we’ll pick one of the first and simplest programming languages, namely “Turing machine”—the language invented by Alan Turing all the way back in 1936!

In the Turing machine language, we imagine a one-dimensional tape divided into squares, extending infinitely in both directions, and with all squares initially containing a “0.”  There’s also a tape head with n “internal states,” moving back and forth on the tape.  Each internal state contains an instruction, and the only allowed instructions are: write a “0” in the current square, write a “1” in the current square, move one square left on the tape, move one square right on the tape, jump to a different internal state, halt, and do any of the previous conditional on whether the current square contains a “0” or a “1.”

Using Turing machines, in 1962 the mathematician Tibor Radó invented the so-called Busy Beaver function, or BB(n), which allowed naming by far the largest numbers anyone had yet named.  BB(n) is defined as follows: consider all Turing machines with n internal states.  Some of those machines run forever, when started on an all-0 input tape.  Discard them.  Among the ones that eventually halt, there must be some machine that runs for a maximum number of steps before halting.  However many steps that is, that’s what we call BB(n), the nth Busy Beaver number.

The first few values of the Busy Beaver function have actually been calculated, so let’s see them.

BB(1) is 1.  For a 1-state Turing machine on an all-0 tape, the choices are limited: either you halt in the very first step, or else you run forever.

BB(2) is 6, as isn’t too hard to verify by trying things out with pen and paper.

BB(3) is 21: that determination was already a research paper.

BB(4) is 107 (another research paper).

Much like with the Ackermann function, not very impressive yet!  But wait:

BB(5) is not yet known, but it’s known to be at least 47,176,870.

BB(6) is at least 7.4×1036,534.

BB(7) is at least

$$ 10^{10^{10^{10^{18,000,000}}}}. $$

Clearly we’re dealing with a monster here, but can we understand just how terrifying of a monster?  Well, call a sequence f(1), f(2), … computable, if there’s some computer program that takes n as input, runs for a finite time, then halts with f(n) as its output.  To illustrate, f(n)=n2, f(n)=2n, and even the Ackermann function that we saw before are all computable.

But I claim that the Busy Beaver function grows faster than any computable function.  Since this talk should have at least some math in it, let’s see a proof of that claim.

Maybe the nicest way to see it is this: suppose, to the contrary, that there were a computable function f that grew at least as fast as the Busy Beaver function.  Then by using that f, we could take the Berry Paradox from before, and turn it into an actual contradiction in mathematics!  So for example, suppose the program to compute f were a thousand bytes long.  Then we could write another program, not much longer than a thousand bytes, to run for (say) 2×f(1000000) steps: that program would just need to include a subroutine for f, plus a little extra code to feed that subroutine the input 1000000, and then to run for 2×f(1000000) steps.  But by assumption, f(1000000) is at least the maximum number of steps that any program up to a million bytes long can run for—even though we just wrote a program, less than a million bytes long, that ran for more steps!  This gives us our contradiction.  The only possible conclusion is that the function f, and the program to compute it, couldn’t have existed in the first place.

(As an alternative, rather than arguing by contradiction, one could simply start with any computable function f, and then build programs that compute f(n) for various “hardwired” values of n, in order to show that BB(n) must grow at least as rapidly as f(n).  Or, for yet a third proof, one can argue that, if any upper bound on the BB function were computable, then one could use that to solve the halting problem, which Turing famously showed to be uncomputable in 1936.)

In some sense, it’s not so surprising that the BB function should grow uncomputably quickly—because if it were computable, then huge swathes of mathematical truth would be laid bare to us.  For example, suppose we wanted to know the truth or falsehood of the Goldbach Conjecture, which says that every even number 4 or greater can be written as a sum of two prime numbers.  Then we’d just need to write a program that checked each even number one by one, and halted if and only if it found one that wasn’t a sum of two primes.  Suppose that program corresponded to a Turing machine with N states.  Then by definition, if it halted at all, it would have to halt after at most BB(N) steps.  But that means that, if we knew BB(N)—or even any upper bound on BB(N)—then we could find out whether our program halts, by simply running it for the requisite number of steps and seeing.  In that way we’d learn the truth or falsehood of Goldbach’s Conjecture—and similarly for the Riemann Hypothesis, and every other famous unproved mathematical conjecture (there are a lot of them) that can be phrased in terms of a computer program never halting.

(Here, admittedly, I’m using “we could find” in an extremely theoretical sense.  Even if someone handed you an N-state Turing machine that ran for BB(N) steps, the number BB(N) would be so hyper-mega-astronomical that, in practice, you could probably never distinguish the machine from one that simply ran forever.  So the aforementioned “strategy” for proving Goldbach’s Conjecture, or the Riemann Hypothesis would probably never yield fruit before the heat death of the universe, even though in principle it would reduce the task to a “mere finite calculation.”)

OK, you wanna know something else wild about the Busy Beaver function?  In 2015, my former student Adam Yedidia and I wrote a paper where we proved that BB(8000)—i.e., the 8000th Busy Beaver number—can’t be determined using the usual axioms for mathematics, which are called Zermelo-Fraenkel (ZF) set theory.  Nor can B(8001) or any larger Busy Beaver number.

To be sure, BB(8000) has some definite value: there are finitely many 8000-state Turing machines, and each one either halts or runs forever, and among the ones that halt, there’s some maximum number of steps that any of them runs for.  What we showed is that math, if it limits itself to the currently-accepted axioms, can never prove the value of BB(8000), even in principle.

The way we did that was by explicitly constructing an 8000-state Turing machine, which (in effect) enumerates all the consequences of the ZF axioms one after the next, and halts if and only if it ever finds a contradiction—that is, a proof of 0=1.  Presumably set theory is actually consistent, and therefore our program runs forever.  But if you proved the program ran forever, you’d also be proving the consistency of set theory.  And has anyone heard of any obstacle to doing that?  Of course, Gödel’s Incompleteness Theorem!  Because of Gödel, if set theory is consistent (well, technically, also arithmetically sound), then it can’t prove our program either halts or runs forever.  But that means set theory can’t determine BB(8000) either—because if it could do that, then it could also determine the behavior of our program.

To be clear, it was long understood that there’s some computer program that halts if and only if set theory is inconsistent—and therefore, that the axioms of set theory can determine at most k values of the Busy Beaver function, for some positive integer k.  “All” Adam and I did was to prove the first explicit upper bound, k≤8000, which required a lot of optimizations and software engineering to get the number of states down to something reasonable (our initial estimate was more like k≤1,000,000).  More recently, Stefan O’Rear has improved our bound—most recently, he says, to k≤1000, meaning that, at least by the lights of ZF set theory, fewer than a thousand values of the BB function can ever be known.

Meanwhile, let me remind you that, at present, only four values of the function are known!  Could the value of BB(100) already be independent of set theory?  What about BB(10)?  BB(5)?  Just how early in the sequence do you leap off into Platonic hyperspace?  I don’t know the answer to that question but would love to.


Ah, you ask, but is there any number sequence that grows so fast, it blows even the Busy Beavers out of the water?  There is!

Imagine a magic box into which you could feed in any positive integer n, and it would instantly spit out BB(n), the nth Busy Beaver number.  Computer scientists call such a box an “oracle.”  Even though the BB function is uncomputable, it still makes mathematical sense to imagine a Turing machine that’s enhanced by the magical ability to access a BB oracle any time it wants: call this a “super Turing machine.”  Then let SBB(n), or the nth super Busy Beaver number, be the maximum number of steps that any n-state super Turing machine makes before halting, if given no input.

By simply repeating the reasoning for the ordinary BB function, one can show that, not only does SBB(n) grow faster than any computable function, it grows faster than any function computable by super Turing machines (for example, BB(n), BB(BB(n)), etc).

Let a super duper Turing machine be a Turing machine with access to an oracle for the super Busy Beaver numbers.  Then you can use super duper Turing machines to define a super duper Busy Beaver function, which you can use in turn to define super duper pooper Turing machines, and so on!

Let “level-1 BB” be the ordinary BB function, let “level-2 BB” be the super BB function, let “level 3 BB” be the super duper BB function, and so on.  Then clearly we can go to “level-k BB,” for any positive integer k.

But we need not stop even there!  We can then go to level-ω BB.  What’s ω?  Mathematicians would say it’s the “first infinite ordinal”—the ordinals being a system where you can pass from any set of numbers you can possibly name (even an infinite set), to the next number larger than all of them.  More concretely, the level-ω Busy Beaver function is simply the Busy Beaver function for Turing machines that are able, whenever they want, to call an oracle to compute the level-k Busy Beaver function, for any positive integer k of their choice.

But why stop there?  We can then go to level-(ω+1) BB, which is just the Busy Beaver function for Turing machines that are able to call the level-ω Busy Beaver function as an oracle.  And thence to level-(ω+2) BB, level-(ω+3) BB, etc., defined analogously.  But then we can transcend that entire sequence and go to level-2ω BB, which involves Turing machines that can call level-(ω+k) BB as an oracle for any positive integer k.  In the same way, we can pass to level-3ω BB, level-4ω BB, etc., until we transcend that entire sequence and pass to level-ω2 BB, which can call any of the previous ones as oracles.  Then we have level-ω3 BB, level-ω4 BB, etc., until we transcend that whole sequence with level-ωω BB.  But we’re still not done!  For why not pass to level

$$ \omega^{\omega^{\omega}} $$,

level

$$ \omega^{\omega^{\omega^{\omega}}} $$,

etc., until we reach level

$$ \left. \omega^{\omega^{\omega^{.^{.^{.}}}}}\right\} _{\omega\text{ times}} $$?

(This last ordinal is also called ε0.)  And mathematicians know how to keep going even to way, way bigger ordinals than ε0, which give rise to ever more rapidly-growing Busy Beaver sequences.  Ordinals achieve something that on its face seems paradoxical, which is to systematize the concept of transcendence.

So then just how far can you push this?  Alas, ultimately the answer depends on which axioms you assume for mathematics.  The issue is this: once you get to sufficiently enormous ordinals, you need some systematic way to specify them, say by using computer programs.  But then the question becomes which ordinals you can “prove to exist,” by giving a computer program together with a proof that the program does what it’s supposed to do.  The more powerful the axiom system, the bigger the ordinals you can prove to exist in this way—but every axiom system will run out of gas at some point, only to be transcended, in Gödelian fashion, by a yet more powerful system that can name yet larger ordinals.

So for example, if we use Peano arithmetic—invented by the Italian mathematician Giuseppe Peano—then Gentzen proved in the 1930s that we can name any ordinals below ε0, but not ε0 itself or anything beyond it.  If we use ZF set theory, then we can name vastly bigger ordinals, but once again we’ll eventually run out of steam.

(Technical remark: some people have claimed that we can transcend this entire process by passing from first-order to second-order logic.  But I fundamentally disagree, because with second-order logic, which number you’ve named could depend on the model of set theory, and therefore be impossible to pin down.  With the ordinal Busy Beaver numbers, by contrast, the number you’ve named might be breathtakingly hopeless ever to compute—but provided the notations have been fixed, and the ordinals you refer to actually exist, at least we know there is a unique positive integer that you’re talking about.)

Anyway, the upshot of all of this is that, if you try to hold a name-the-biggest-number contest between two actual professionals who are trying to win, it will (alas) degenerate into an argument about the axioms of set theory.  For the stronger the set theory you’re allowed to assume consistent, the bigger the ordinals you can name, therefore the faster-growing the BB functions you can define, therefore the bigger the actual numbers.

So, yes, in the end the biggest-number contest just becomes another Gödelian morass, but one can get surprisingly far before that happens.


In the meantime, our universe seems to limit us to at most 10122 choices that could ever be made, or experiences that could ever be had, by any one observer.  Or fewer, if you believe that you won’t live until the heat death of the universe in some post-Singularity computer cloud, but for at most about 102 years.  In the meantime, the survival of the human race might hinge on people’s ability to understand much smaller numbers than 10122: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.

On a happier note, though, if our goal is to make math engaging to young people, or to build bridges between the quantitative and literary worlds, the way this festival is doing, it seems to me that it wouldn’t hurt to let people know about the vastness that’s out there.  Thanks for your attention.

GapP, Oracles, and Quantum Supremacy

September 1st, 2017

Let me start with a few quick announcements before the main entrée:

First, the website haspvsnpbeensolved.com is now live!  Thanks so much to my friend Adam Chalmers for setting it up.  Please try it out on your favorite P vs. NP solution paper—I think you’ll be impressed by how well our secret validation algorithm performs.

Second, some readers might enjoy a YouTube video of me lecturing about the computability theory of closed timelike curves, from the Workshop on Computational Complexity and High Energy Physics at the University of Maryland a month ago.  Other videos from the workshop—including of talks by John Preskill, Daniel Harlow, Stephen Jordan, and other names known around Shtetl-Optimized, and of a panel discussion in which I participated—are worth checking out as well.  Thanks so much to Stephen for organizing such a great workshop!

Third, thanks to everyone who’s emailed to ask whether I’m holding up OK with Hurricane Harvey, and whether I know how to swim (I do).  As it happens, I haven’t been in Texas for two months—I spent most of the summer visiting NYU and doing other travel, and this year, Dana and I are doing an early sabbatical at Tel Aviv University.  However, I understand from friends that Austin, being several hours’ drive further inland, got nothing compared to what Houston did, and that UT is open on schedule for the fall semester.  Hopefully our house is still standing as well!  Our thoughts go to all those affected by the disaster in Houston.  Eventually, the Earth’s rapidly destabilizing climate almost certainly means that Austin will be threatened as well by “500-year events” happening every year or two, as for that matter will a large portion of the earth’s surface.  For now, though, Austin lives to be weird another day.


GapP, Oracles, and Quantum Supremacy

by Scott Aaronson

Stuart Kurtz 60th Birthday Conference, Columbia, South Carolina

August 20, 2017

It’s great to be here, to celebrate the life and work of Stuart Kurtz, which could never be … eclipsed … by anything.

I wanted to say something about work in structural complexity and counting complexity and oracles that Stuart was involved with “back in the day,” and how that work plays a major role in issues that concern us right now in quantum computing.  A major goal for the next few years is the unfortunately-named Quantum Supremacy.  What this means is to get a clear quantum speedup, for some task: not necessarily a useful task, but something that we can be as confident as possible is classically hard.  For example, consider the 49-qubit superconducting chip that Google is planning to fabricate within the next year or so.  This won’t yet be good enough for running Shor’s algorithm, to factor numbers of any interesting size, but it hopefully will be good enough to sample from a probability distribution over n-bit strings—in this case, 49-bit strings—that’s hard to sample from classically, taking somewhere on the order of 249 steps.

Furthermore, the evidence that that sort of thing is indeed classically hard, might actually be stronger than the evidence that factoring is classically hard.  As I like to say, a fast classical factoring algorithm would “merely” collapse the world’s electronic commerce—as far as we know, it wouldn’t collapse the polynomial hierarchy!  By contrast, a fast classical algorithm to simulate quantum sampling would collapse the polynomial hierarchy, assuming the simulation is exact.  Let me first go over the argument for that, and then explain some of the more recent things we’ve learned.

Our starting point will be two fundamental complexity classes, #P and GapP.

#P is the class of all nonnegative integer functions f, for which there exists a nondeterministic polynomial-time Turing machine M such that f(x) equals the number of accepting paths of M(x).  Less formally, #P is the class of problems that boil down to summing up an exponential number of nonnegative terms, each of which is efficiently computable individually.

GapP—introduced by Fenner, Fortnow, and Kurtz in 1992—can be defined as the set {f-g : f,g∈#P}; that is, the closure of #P under subtraction.  Equivalently, GapP is the class of problems that boil down to summing up an exponential number of terms, each of which is efficiently computable individually, but which could be either positive or negative, and which can therefore cancel each other out.  As you can see, GapP is a class that in some sense anticipates quantum computing!

For our purposes, the most important difference between #P and GapP is that #P functions can at least be multiplicatively approximated in the class BPPNP, by using Stockmeyer’s technique of approximating counting with universal hash functions.  By contrast, even if you just want to approximate a GapP function to within (say) a factor of 2—or for that matter, just decide whether a GapP function is positive or negative—it’s not hard to see that that’s already a #P-hard problem.  For, supposing we had an oracle to solve this problem, we could then shift the sum this way and that by adding positive and negative dummy terms, and use binary search, to zero in on the sum’s exact value in polynomial time.

It’s also not hard to see that a quantum computation can encode an arbitrary GapP function in one of its amplitudes.  Indeed, let s:{0,1}n→{1,-1} be any Boolean function that’s given by a polynomial-size circuit.  Then consider the quantum circuit below.

When we run this circuit, the probability that we see the all-0 string as output is

$$ \left( \frac{1}{\sqrt{2^n}} \sum_{z\in \{0,1\}^n} s(z) \right)^2 = \frac{1}{2^n} \sum_{z,w\in \{0,1\}^n} s(z) s(w) $$

which is clearly in GapP, and clearly #P-hard even to approximate to within a multiplicative factor.

By contrast, suppose we had a probabilistic polynomial-time classical algorithm, call it M, to sample the output distribution of the above quantum circuit.  Then we could rewrite the above probability as Prr[M(r) outputs 0…0], where r consists of the classical random bits used by M.  This is again an exponentially large sum, with one term for each possible r value—but now it’s a sum of nonnegative terms (probabilities), which is therefore approximable in BPPNP.

We can state the upshot as follows.  Let ExactSampBPP be the class of sampling problems—that is, families of probability distributions {Dx}x, one for each input x∈{0,1}n—for which there exists a polynomial-time randomized algorithm that outputs a sample exactly from Dx, in time polynomial in |x|.  Let ExactSampBQP be the same thing except that we allow a polynomial-time quantum algorithm.  Then we have that, if ExactSampBPP = ExactSampBQP, then squared sums of both positive and negative terms, could efficiently be rewritten as sums of nonnegative terms only—and hence P#P=BPPNP.  This, in turn, would collapse the polynomial hierarchy to the third level, by Toda’s Theorem that PH⊆P#P, together with the result BPPNP⊆∑3.  To summarize:

Theorem 1.  Quantum computers can efficiently solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses.

(In fact, the argument works not only if the classical algorithm exactly samples Dx, but if it samples from any distribution in which the probabilities are multiplicatively close to Dx‘s.  If we really only care about exact sampling, then we can strengthen the conclusion to get that PH collapses to the second level.)

This sort of reasoning was implicit in several early works, including those of Fenner et al. and Terhal and DiVincenzo.  It was made fully explicit in my paper with Alex Arkhipov on BosonSampling in 2011, and in the independent work of Bremner, Jozsa, and Shepherd on the IQP model.  These works actually showed something stronger, which is that we get a collapse of PH, not merely from a fast classical algorithm to simulate arbitrary quantum systems, but from fast classical algorithms to simulate various special quantum systems.  In the case of BosonSampling, that special system is a collection of identical, non-interacting photons passing through a network of beamsplitters, then being measured at the very end to count the number of photons in each mode.  In the case of IQP, the special system is a collection of qubits that are prepared, subjected to some commuting Hamiltonians acting on various subsets of the qubits, and then measured.  These special systems don’t seem to be capable of universal quantum computation (or for that matter, even universal classical computation!)—and correspondingly, many of them seem easier to realize in the lab than a full universal quantum computer.

From an experimental standpoint, though, all these results are unsatisfactory, because they all talk only about the classical hardness of exact (or very nearly exact) sampling—and indeed, the arguments are based around the hardness of estimating just a single, exponentially-small amplitude.  But any real experiment will have tons of noise and inaccuracy, so it seems only fair to let the classical simulation be subject to serious noise and inaccuracy as well—but as soon as we do, the previous argument collapses.

Thus, from the very beginning, Alex Arkhipov and I took it as our “real” goal to show, under some reasonable assumption, that there’s a distribution D that a polynomial-time quantum algorithm can sample from, but such that no polynomial-time classical algorithm can sample from any distribution that’s even ε-close to D in variation distance.  Indeed, this goal is what led us to BosonSampling in the first place: we knew that we needed amplitudes that were not only #P-hard but “robustly” #P-hard; we knew that the permanent of an n×n matrix (at least over finite fields) was the canonical example of a “robustly” #P-hard function; and finally, we knew that systems of identical non-interacting bosons, such as photons, gave rise to amplitudes that were permanents in an extremely natural way.  The fact that photons actually exist in the physical world, and that our friends with quantum optics labs like to do experiments with them, was just a nice bonus!

A bit more formally, let ApproxSampBPP be the class of sampling problems for which there exists a classical algorithm that, given an input x∈{0,1}n and a parameter ε>0, samples a distribution that’s at most  away from Dx in variation distance, in time polynomial in n and 1/ε.  Let ApproxSampBQP be the same except that we allow a quantum algorithm.  Then the “dream” result that we’d love to prove—both then and now—is the following.

Strong Quantum Supremacy Conjecture.  If ApproxSampBPP = ApproxSampBQP, then the polynomial hierarchy collapses.

Unfortunately, Alex and I were only able to prove this conjecture assuming a further hypothesis, about the permanents of i.i.d. Gaussian matrices.

Theorem 2 (A.-Arkhipov).  Given an n×n matrix X of independent complex Gaussian entries, each of mean 0 and variance 1, assume it’s a #P-hard problem to approximate |Per(X)|2 to within ±ε⋅n!, with probability at least 1-δ over the choice of X, in time polynomial in n, 1/ε, and 1/δ.  Then the Strong Quantum Supremacy Conjecture holds.  Indeed, more than that: in such a case, even a fast approximate classical simulation of BosonSampling, in particular, would imply P#P=BPPNP and hence a collapse of PH.

Alas, after some months of effort, we were unable to prove the needed #P-hardness result for Gaussian permanents, and it remains an outstanding open problem—there’s not even a consensus as to whether it should be true or false.  Note that there is a famous polynomial-time classical algorithm to approximate the permanents of nonnegative matrices, due to Jerrum, Sinclair, and Vigoda, but that algorithm breaks down for matrices with negative or complex entries.  This is once again the power of cancellations, the difference between #P and GapP.

Frustratingly, if we want the exact permanents of i.i.d. Gaussian matrices, we were able to prove that that’s #P-hard; and if we want the approximate permanents of arbitrary matrices, we also know that that’s #P-hard—it’s only when we have approximation and random inputs in the same problem that we no longer have the tools to prove #P-hardness.

In the meantime, one can also ask a meta-question.  How hard should it be to prove the Strong Quantum Supremacy Conjecture?  Were we right to look at slightly exotic objects, like the permanents of Gaussian matrices?  Or could Strong Quantum Supremacy have a “pure, abstract complexity theory proof”?

Well, one way to formalize that question is to ask whether Strong Quantum Supremacy has a relativizing proof, a proof that holds in the presence of an arbitrary oracle.  Alex and I explicitly raised that as an open problem in our BosonSampling paper.

Note that “weak” quantum supremacy—i.e., the statement that ExactSampBPP = ExactSampBQP collapses the polynomial hierarchy—has a relativizing proof, namely the proof that I sketched earlier.  All the ingredients that we used—Toda’s Theorem, Stockmeyer approximate counting, simple manipulations of quantum circuits—were relativizing ingredients.  By contrast, all the way back in 1998, Fortnow and Rogers proved the following.

Theorem 3 (Fortnow and Rogers).  There exists an oracle relative to which P=BQP and yet PH is infinite.

In other words, if you want to prove that P=BQP collapses the polynomial hierarchy, the proof can’t be relativizing.  This theorem was subsequently generalized in a paper by Fenner, Fortnow, Kurtz, and Li, which used concepts like “generic oracles” that seem powerful but that I don’t understand.

The trouble is, Fortnow and Rogers’s construction was extremely tailored to making P=BQP.  It didn’t even make PromiseBPP=PromiseBQP (that is, it allowed that quantum computers might still be stronger than classical ones for promise problems), let alone did it collapse quantum with classical for sampling problems.

We can organize the various quantum/classical collapse possibilities as follows:

ExactSampBPP = ExactSampBQP

ApproxSampBPP = ApproxSampBQP   ⇔   FBPP = FBQP

PromiseBPP = PromiseBQP

BPP = BQP

Here FBPP is the class of relation problems solvable in randomized polynomial time—that is, problems where given an input x∈{0,1}n and a parameter ε>0, the goal is to produce any output in a certain set Sx, with success probability at least 1-ε, in time polynomial in n and 1/ε.  FBQP is the same thing except for quantum polynomial time.

The equivalence between the two equalities ApproxSampBPP = ApproxSampBQP and FBPP=FBQP is not obvious, and was the main result in my 2011 paper The Equivalence of Sampling and Searching.  While it’s easy to see that ApproxSampBPP = ApproxSampBQP implies FBPP=FBQP, the opposite direction requires us to take an arbitrary sampling problem S, and define a relation problem RS that has “essentially the same difficulty” as S (in the sense that RS has an efficient classical algorithm iff S does, RS has an efficient quantum algorithm iff S does, etc).  This, in turn, we do using Kolmogorov complexity: basically, RS asks us to output a tuple of samples that have large probabilities according to the requisite probability distribution from the sampling problem; and that also, conditioned on that, are close to algorithmically random.  The key observation is that, if a probabilistic Turing machine of fixed size can solve that relation problem for arbitrarily large inputs, then it must be doing so by sampling from a probability distribution close in variation distance to D—since any other approach would lead to outputs that were algorithmically compressible.

Be that as it may, staring at the chain of implications above, a natural question is which equalities in the chain collapse the polynomial hierarchy in a relativizing way, and which equalities collapse PH (if they do) only for deeper, non-relativizing reasons.

This is one of the questions that Lijie Chen and I took up, and settled, in our paper Complexity-Theoretic Foundations of Quantum Supremacy Experiments, which was presented at this summer’s Computational Complexity Conference (CCC) in Riga.  The “main” results in our paper—or at least, the results that the physicists care about—were about how confident we can be in the classical hardness of simulating quantum sampling experiments with random circuits, such as the experiments that the Google group will hopefully be able to do with its 49-qubit device in the near future.  This involved coming up with a new hardness assumption, which was tailored to those sorts of experiments, and giving a reduction from that new assumption, and studying how far existing algorithms come toward breaking the new assumption (tl;dr: not very far).

But our paper also had what I think of as a “back end,” containing results mainly of interest to complexity theorists, about what kinds of quantum supremacy theorems we can and can’t hope for in principle.  When I’m giving talks about our paper to physicists, I never have time to get to this back end—it’s always just “blah, blah, we also did some stuff involving structural complexity and oracles.”  But given that a large fraction of all the people on earth who enjoy those things are probably right here in this room, in the rest of this talk, I’d like to tell you about what was in the back end.

The first thing there was the following result.

Theorem 4 (A.-Chen).  There exists an oracle relative to which ApproxSampBPP = ApproxSampBQP and yet PH is infinite. In other words, any proof of the Strong Quantum Supremacy Conjecture will require non-relativizing techniques.

Theorem 4 represents a substantial generalization of Fortnow and Rogers’s Theorem 3, in that it makes quantum and classical equivalent not only for promise problems, but even for approximate sampling problems.  There’s also a sense in which Theorem 4 is the best possible: as we already saw, there are no oracles relative to which ExactSampBPP = ExactSampBQP and yet PH is infinite, because the opposite conclusion relativizes.

So how did we prove Theorem 4?  Well, we learned at this workshop that Stuart Kurtz pioneered the development of principled ways to prove oracle results just like this one, with multiple “nearly conflicting” requirements.  But, because we didn’t know that at the time, we basically just plunged in and built the oracle we wanted by hand!

In more detail, you can think of our oracle construction as proceeding in three steps.

  1. We throw in an oracle for a PSPACE-complete problem.  This collapses ApproxSampBPP with ApproxSampBQP, which is what we want.  Unfortunately, it also collapses the polynomial hierarchy down to P, which is not what we want!
  2. So then we need to add in a second part of the oracle that makes PH infinite again.  From Håstad’s seminal work in the 1980s until recently, even if we just wanted any oracle that makes PH infinite, without doing anything else at the same time, we only knew how to achieve that with quite special oracles.  But in their 2015 breakthrough, Rossman, Servedio, and Tan have shown that even a random oracle makes PH infinite with probability 1.  So for simplicity, we might as well take this second part of the oracle to be random.  The “only” problem is that, along with making PH infinite, a random oracle will also re-separate ApproxSampBPP and ApproxSampBQP (and for that matter, even ExactSampBPP and ExactSampBQP)—for example, because of the Fourier sampling task performed by the quantum circuit I showed you earlier!  So we once again seem back where we started.
    (To ward off confusion: ever since Fortnow and Rogers posed the problem in 1998, it remains frustratingly open whether BPP and BQP can be separated by a random oracle—that’s a problem that I and others have worked on, making partial progress that makes a query complexity separation look unlikely without definitively ruling one out.  But separating the sampling versions of BPP and BQP by a random oracle is much, much easier.)
  3. So, finally, we need to take the random oracle that makes PH infinite, and “scatter its bits around randomly” in such a way that a PH machine can still find the bits, but an ApproxSampBQP machine can’t.  In other words: given our initial random oracle A, we can make a new oracle B such that B(y,r)=(1,A(y)) if r is equal to a single randomly-chosen “password” ry, depending on the query y, and B(y,r)=(0,0) otherwise.  In that case, it takes just one more existential quantifier to guess the password ry, so PH can do it, but a quantum algorithm is stuck, basically because the linearity of quantum mechanics makes the algorithm not very sensitive to tiny random changes to the oracle string (i.e., the same reason why Grover’s algorithm can’t be arbitrarily sped up).  Incidentally, the reason why the password ry needs to depend on the query y is that otherwise the input x to the quantum algorithm could hardcode a password, and thereby reveal exponentially many bits of the random oracle A.

We should now check: why does the above oracle “only” collapse ApproxSampBPP and ApproxSampBQP?  Why doesn’t it also collapse ExactSampBPP and ExactSampBQP—as we know that it can’t, by our previous argument?  The answer is: because a quantum algorithm does have an exponentially small probability of correctly guessing a given password ry.  And that’s enough to make the distribution sampled by the quantum algorithm differ, by 1/exp(n) in variation distance, from the distribution sampled by any efficient classical simulation of the algorithm—an error that doesn’t matter for approximate sampling, but does matter for exact sampling.

Anyway, it’s then just like seven pages of formalizing the above intuitions and you’re done!

OK, since there seems to be time, I’d like to tell you about one more result from the back end of my and Lijie’s paper.

If we can work relative to whatever oracle A we like, then it’s easy to get quantum supremacy, and indeed BPPA≠BQPA.  We can, for example, use Simon’s problem, or Shor’s period-finding problem, or Forrelation, or other choices of black-box problems that admit huge, provable quantum speedups.  In the unrelativized world, by contrast, it’s clear that we have to make some complexity assumption for quantum supremacy—even if we just want ExactSampBPP ≠ ExactSampBQP.  For if (say) P=P#P, then ExactSampBPP and ExactSampBQP would collapse as well.

Lijie and I were wondering: what happens if we try to “interpolate” between the relativized and unrelativized worlds?  More specifically, what happens if our algorithms are allowed to query a black box, but we’re promised that whatever’s inside the black box is efficiently computable (i.e., has a small circuit)?  How hard is it to separate BPP from BQP, or ApproxSampBPP from ApproxSampBQP, relative to an oracle A that’s constrained to lie in P/poly?

Here, we’ll start with a beautiful observation that’s implicit in 2004 work by Servedio and Gortler, as well as 2012 work by Mark Zhandry.  In our formulation, this observation is as follows:

Theorem 5.  Suppose there exist cryptographic one-way functions (even just against classical adversaries).  Then there exists an oracle A∈P/poly such that BPPA≠BQPA.

While we still need to make a computational hardness assumption here, to separate quantum from classical computing, the surprise is that the assumption is so much weaker than what we’re used to.  We don’t need to assume the hardness of factoring or discrete log—or for that matter, of any “structured” problem that could be a basis for, e.g., public-key cryptography.  Just a one-way function that’s hard to invert, that’s all!

The intuition here is really simple.  Suppose there’s a one-way function; then it’s well-known, by the HILL and GGM Theorems of classical cryptography, that we can bootstrap it to get a cryptographic pseudorandom function family.  This is a family of polynomial-time computable functions fs:{0,1}n→{0,1}n, parameterized by a secret seed s, such that fs can’t be distinguished from a truly random function f by any polynomial-time algorithm that’s given oracle access to the function and that doesn’t know s.  Then, as our efficiently computable oracle A that separates quantum from classical computing, we take an ensemble of functions like

gs,r(x) = fs(x mod r),

where r is an exponentially large integer that serves as a “hidden period,” and s and r are both secrets stored by the oracle that are inaccessible to the algorithm that queries it.

The reasoning is now as follows: certainly there’s an efficient quantum algorithm to find r, or to solve some decision problem involving r, which we can use to define a language that’s in BQPA but not in BPPA.  That algorithm is just Shor’s period-finding algorithm!  (Technically, Shor’s algorithm needs certain assumptions on the starting function fs to work—e.g., it couldn’t be a constant function—but if those assumptions aren’t satisfied, then fs wasn’t pseudorandom anyway.)  On the other hand, suppose there were an efficient classical algorithm to find the period r.  In that case, we have a dilemma on our hands: would the classical algorithm still have worked, had we replaced fs by a truly random function?  If so, then the classical algorithm would violate well-known lower bounds on the classical query complexity of period-finding.  But if not, then by working on pseudorandom functions but not on truly random functions, the algorithm would be distinguishing the two—so fs wouldn’t have been a cryptographic pseudorandom function at all, contrary to assumption!

This all caused Lijie and me to wonder whether Theorem 5 could be strengthened even further, so that it wouldn’t use any complexity assumption at all.  In other words, why couldn’t we just prove unconditionally that there’s an oracle A∈P/poly such that BPPA≠BQPA?  By comparison, it’s not hard to see that we can unconditionally construct an oracle A∈P/poly such that PA≠NPA.

Alas, with the following theorem, we were able to explain why BPP vs. BQP (and even ApproxSampBPP vs. ApproxSampBQP) are different, and why some computational assumption is still needed to separate quantum from classical, even if we’re working relative to an efficiently computable oracle.

Theorem 6 (A.-Chen).  Suppose that, in the real world, ApproxSampBPP = ApproxSampBQP and NP⊆BPP (granted, these are big assumptions!).  Then ApproxSampBPPA = ApproxSampBQPA for all oracles A∈P/poly.

Taking the contrapositive, this is saying that you can’t separate ApproxSampBPP from ApproxSampBQP relative to an efficiently computable oracle, without separating some complexity classes in the real world.  This contrasts not only with P vs. NP, but even with ExactSampBPP vs. ExactSampBQP, which can be separated unconditionally relative to efficiently computable oracles.

The proof of Theorem 6 is intuitive and appealing.  Not surprisingly, we’re going to heavily exploit the assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP.  Let Q be a polynomial-time quantum algorithm that queries an oracle A∈P/poly.  Then we need to simulate Q—and in particular, sample close to the same probability distribution over outputs—using a polynomial-time classical algorithm that queries A.

Let

$$ \sum_{x,w} \alpha_{x,w} \left|x,w\right\rangle $$

be the state of Q immediately before its first query to the oracle A, where x is the input to be submitted to the oracle.  Then our first task is to get a bunch of samples from the probability distribution D={|αx,w|2}x,w, or something close to D in variation distance.  But this is easy to do, using the assumption ApproxSampBPP = ApproxSampBQP.

Let x1,…,xk be our samples from D, marginalized to the x part.  Then next, our classical algorithm queries A on each of x1,…,xk, getting responses A(x1),…,A(xk).  The next step is to search for a function f∈P/poly—or more specifically, a function of whatever fixed polynomial size is relevant—that agrees with A on the sample data, i.e. such that f(xi)=A(xi) for all i∈[k].  This is where we’ll use the assumption NP⊆BPP (together, of course, with the fact that at least one such f exists, namely A itself!), to make the task of finding f efficient.  We’ll also appeal to a fundamental fact about the sample complexity of PAC-learning.  The fact is that, if we find a polynomial-size circuit f that agrees with A on a bunch of sample points drawn independently from a distribution, then f will probably agree with A on most further points drawn from the same distribution as well.

So, OK, we then have a pretty good “mock oracle,” f, that we can substitute for the real oracle on the first query that Q makes.  Of course f and A won’t perfectly agree, but the small fraction of disagreements won’t matter much, again because of the linearity of quantum mechanics (i.e., the same thing that prevents us from speeding up Grover’s algorithm arbitrarily).  So we can basically simulate Q’s first query, and now our classical simulation is good to go until Q’s second query!  But now you can see where this is going: we iterate the same approach, and reuse the same assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP, to find a new “mock oracle” that lets us simulate Q’s second query, and so on until all of Q’s queries have been simulated.

OK, I’ll stop there.  I don’t have a clever conclusion or anything.  Thank you.

HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP

August 25th, 2017

This post has a grab bag of topics, unified only by the fact that I can no longer put off blogging about them. So if something doesn’t interest you, just scroll down till you find something that does.


Great news, everyone: following a few reader complaints about the matter, the scottaaronson.com domain now supports https—and even automatically redirects to it! I’m so proud that Shtetl-Optimized has finally entered the technological universe of 1994. Thanks so much to heroic reader Martin Dehnel-Wild for setting this up for me.

Update 26/08/2017: Comments should now be working again; comments are now coming through to the moderated view in the blog’s control panel, so if they don’t show up immediately it might just be awaiting moderation. Thanks for your patience.


Last weekend, I was in Columbia, South Carolina, for a workshop to honor the 60th birthday of Stuart Kurtz, theoretical computer scientist at the University of Chicago.  I gave a talk about how work Kurtz was involved in from the 1990s—for example, on defining the complexity class GapP, and constructing oracles that satisfy conflicting requirements simultaneously—plays a major role in modern research on quantum computational supremacy: as an example, my recent paper with Lijie Chen.  (Except, what a terrible week to be discussing the paths to supremacy!  I promise there are no tiki torches involved, only much weaker photon sources.)

Coincidentally, I don’t know if you read anything about this on social media, but there was this total solar eclipse that passed right over Columbia at the end of the conference.

I’d always wondered why some people travel to remote corners of the earth to catch these.  So the sky gets dark for two minutes, and then it gets light again, in a way that’s been completely understood and predictable for centuries?

Having seen it, I can now tell you the deal, if you missed it and prefer to read about it here rather than 10500 other places online.  At risk of stating the obvious: it’s not the dark sky; it’s the sun’s corona visible around the moon.  Ironically, it’s only when the sun’s blotted out that you can actually look at the sun, at all the weird stuff going on around its disk.

OK, but totality is “only” to eclipses as orgasms are to sex.  There’s also the whole social experience of standing around outside with friends for an hour as the moon gradually takes a bigger bite out of the sun, staring up from time to time with eclipse-glasses to check its progress—and then everyone breaking into applause as the sky finally goes mostly dark, and you can look at the corona with the naked eye.  And then, if you like, standing around for another hour as the moon gradually exits the other way.  (If you’re outside the path of totality, this standing around and checking with eclipse-glasses is the whole experience.)

One cool thing is that, a little before and after totality, shadows on the ground have little crescents in them, as if the eclipse is imprinting its “logo” all over the earth.

For me, the biggest lesson the eclipse drove home was the logarithmic nature of perceived brightness (see also Scott Alexander’s story).  Like, the sun can be more than 90% occluded, and yet it’s barely a shade darker outside.  And you can still only look up with glasses so dark that they blot out everything except the sliver of sun, which still looks pretty much like the normal sun if you catch it out of the corner of your unaided eye.  Only during totality, and a few minutes before and after, is the darkening obvious.


Another topic at the workshop, unsurprisingly, was the ongoing darkening of the United States.  If it wasn’t obvious from my blog’s name, and if saying so explicitly will make any difference for anything, let the record state:

Shtetl-Optimized condemns Nazis, as well as anyone who knowingly marches with Nazis or defends them as “fine people.”

For a year, this blog has consistently described the now-president as a thug, liar, traitor, bully, sexual predator, madman, racist, and fraud, and has urged decent people everywhere to fight him by every peaceful and legal means available.  But if there’s some form of condemnation that I accidentally missed, then after Charlottesville, and Trump’s unhinged quasi-defenses of violent neo-Nazis, and defenses of his previous defenses, etc.—please consider Shtetl-Optimized to have condemned Trump that way also.

At least Charlottesville seems to have set local decisionmakers on an unstoppable course toward removing the country’s remaining Confederate statues—something I strongly supported back in May, before it had become the fully thermonuclear issue that it is now.  In an overnight operation, UT Austin has taken down its statues of Robert E. Lee, Albert Johnston, John Reagan, and Stephen Hogg.  (I confess, the postmaster general of the Confederacy wouldn’t have been my #1 priority for removal.  And, genuine question: what did Texas governor Stephen Hogg do that was so awful for his time, besides naming his daughter Ima Hogg?)


A final thing to talk about—yeah, we can’t avoid it—is Norbert Blum’s claimed proof of P≠NP.  I suppose I should be gratified that, after my last post, there were commenters who said, “OK, but enough about gender politics—what about P vs. NP?”  Here’s what I wrote on Tuesday the 15th:

To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

Many people misunderstood me to be saying that I’d again bet $200,000, even though the sentence said the exact opposite.  Maybe I should’ve said: I’m searching in vain for the right way to teach the nerd world to get less excited about these claims, to have the same reaction that the experts do, which is ‘oh boy, not another one’—which doesn’t mean that you know the error, or even that there is an error, but just means that you know the history.

Speaking of which, some friends and I recently had an awesome idea.  Just today, I registered the domain name haspvsnpbeensolved.com.  I’d like to set this up with a form that lets you type in the URL of a paper claiming to resolve the P vs. NP problem.  The site will then take 30 seconds or so to process the paper—with a status bar, progress updates, etc.—before finally rendering a verdict about the paper’s correctness.  Do any readers volunteer to help me create this?  Don’t worry, I’ll supply the secret algorithm to decide correctness, and will personally vouch for that algorithm for as long as the site remains live.

I have nothing bad to say about Norbert Blum, who made important contributions including the 3n circuit size lower bound for an explicit Boolean function—something that stood until very recently as the world record—and whose P≠NP paper was lucidly written, passing many of the most obvious checks.  And I received a bit of criticism for my “dismissive” stance.  Apparently, some right-wing former string theorist who I no longer read, whose name rhymes with Mubos Lotl, even accused me of being a conformist left-wing ideologue, driven to ignore Blum’s proof by an irrational conviction that any P≠NP proof will necessarily be so difficult that it will need to “await the Second Coming of Christ.”  Luca Trevisan’s reaction to that is worth quoting:

I agree with [Mubos Lotl] that the second coming of Jesus Christ is not a necessary condition for a correct proof that P is different from NP. I am keeping an open mind as to whether it is a sufficient condition.

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind.  Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

That being the case, it’s only intellectual honesty that compels me to report that, by about Friday of last week—i.e., exactly on my predicted schedule—a clear consensus had developed among experts that Blum’s P≠NP proof was irreparably flawed, and the consensus has stood since that time.

I’ve often wished that, even just for an hour or two, I could be free from this terrifying burden that I’ve carried around since childhood: the burden of having the right instincts about virtually everything.  Trust me, this “gift” is a lot less useful than it sounds, especially when reality so often contradicts what’s popular or expedient to say.

The background to Blum’s attempt, the counterexample that shows the proof has to fail somewhere, and the specifics of what appears to go wrong have already been covered at length elsewhere: see especially Luca’s post, Dick Lipton’s post, John Baez’s post, and the CS Theory StackExchange thread.

Very briefly, though: Blum claims to generalize some of the most celebrated complexity results of the 1980s—namely, superpolynomial lower bounds on the sizes of monotone circuits, which consist entirely of Boolean AND and OR gates—so that they also work for general (non-monotone) circuits, consisting of AND, OR, and NOT gates.  Everyone agrees that, if this succeeded, it would imply P≠NP.

Alas, another big discovery from the 1980s was that there are monotone Boolean functions (like Perfect Matching) that require superpolynomial-size monotone circuits, even though they have polynomial-size non-monotone circuits.  Why is that such a bummer?  Because it means our techniques for proving monotone circuit lower bounds can’t possibly work in as much generality as one might’ve naïvely hoped: if they did, they’d imply not merely that P doesn’t contain NP, but also that P doesn’t contain itself.

Blum was aware of all this, and gave arguments as to why his approach evades the Matching counterexample.  The trouble is, there’s another counterexample, which Blum doesn’t address, called Tardos’s function.  This is a weird creature: it’s obtained by starting with a graph invariant called the Lovász theta function, then looking at a polynomial-time approximation scheme for the theta function, and finally rounding the output of that PTAS to get a monotone function.  But whatever: in constructing this function, Tardos achieved her goal, which was to produce a monotone function that all known lower bound techniques for monotone circuits work perfectly fine for, but which is nevertheless in P (i.e., has polynomial-size non-monotone circuits).  In particular, if Blum’s proof worked, then it would also work for Tardos’s function, and that gives us a contradiction.

Of course, this merely tells us that Blum’s proof must have one or more mistakes; it doesn’t pinpoint where they are.  But the latter question has now been addressed as well.  On CS StackExchange, an anonymous commenter who goes variously by “idolvon” and “vloodin” provides a detailed analysis of the proof of Blum’s crucial Theorem 6.  I haven’t gone through every step myself, and there might be more to say about the matter than “vloodin” has, but several experts who are at once smarter, more knowledgeable, more cautious, and more publicity-shy than me have confirmed for me that vloodin correctly identified the erroneous region.

To those who wonder what gave me the confidence to call this immediately, without working through the details: besides the Cassandra-like burden that I was born with, I can explain something that might be helpful.  When Razborov achieved his superpolynomial monotone lower bounds in the 1980s, there was a brief surge of excitement: how far away could a P≠NP proof possibly be?  But then people, including Razborov himself, understood much more deeply what was going on—an understanding that was reflected in the theorems they proved, but also wasn’t completely captured by those theorems.

What was going on was this: monotone circuits are an interesting and nontrivial computational model.  Indeed for certain Boolean functions, such as the “slice functions,” they’re every bit as powerful as general circuits.  However, insofar as it’s possible to prove superpolynomial lower bounds on monotone circuit size, it’s possible only because monotone circuits are ridiculously less expressive than general Boolean circuits for the problems in question.  E.g., it’s possible only because monotone circuits aren’t expressing pseudorandom functions, and therefore aren’t engaging the natural proofs barrier or most of the other terrifying beasts that we’re up against.

So what can we say about the prospect that a minor tweak to the monotone circuit lower bound techniques from the 1980s would yield P≠NP?  If, like Mubos Lotl, you took the view that discrete math and theory of computation are just a mess of disconnected, random statements, then such a prospect would seem as likely to you as not.  But if you’re armed with the understanding above, then this possibility is a lot like the possibility that the OPERA experiment discovered superluminal neutrinos: no, not a logical impossibility, but something that’s safe to bet against at 10,000:1 odds.

During the discussion of Deolalikar’s earlier P≠NP claim, I once compared betting against a proof that all sorts of people are calling “formidable,” “solid,” etc., to standing in front of a huge pendulum—behind the furthest point that it reached the last time—even as it swings toward your face.  Just as certain physics teachers stake their lives on the conservation of energy, so I’m willing to stake my academic reputation, again and again, on the conservation of circuit-lower-bound difficulty.  And here I am, alive to tell the tale.

What I believe II (ft. Sarah Constantin and Stacey Jeffery)

August 15th, 2017

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.


In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.


Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Studies’ Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.  

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo stating explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corrina Cortes co-invented support vector machines, and Fei Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.


Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).


Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women, is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves,  or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T“!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.


Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

The Kolmogorov option

August 8th, 2017

Andrey Nikolaevich Kolmogorov was one of the giants of 20th-century mathematics.  I’ve always found it amazing that the same man was responsible both for establishing the foundations of classical probability theory in the 1930s, and also for co-inventing the theory of algorithmic randomness (a.k.a. Kolmogorov complexity) in the 1960s, which challenged the classical foundations, by holding that it is possible after all to talk about the entropy of an individual object, without reference to any ensemble from which the object was drawn.  Incredibly, going strong into his eighties, Kolmogorov then pioneered the study of “sophistication,” which amends Kolmogorov complexity to assign low values both to “simple” objects and “random” ones, and high values only to a third category of objects, which are “neither simple nor random.”  So, Kolmogorov was at the vanguard of the revolution, counter-revolution, and counter-counter-revolution.

But that doesn’t even scratch the surface of his accomplishments: he made fundamental contributions to topology and dynamical systems, and together with Vladimir Arnold, solved Hilbert’s thirteenth problem, showing that any multivariate continuous function can be written as a composition of continuous functions of two variables.  He mentored an awe-inspiring list of young mathematicians, whose names (besides Arnold) include Dobrushin, Dynkin, Gelfand, Martin-Löf, Sinai, and in theoretical computer science, our own Leonid Levin.  If that wasn’t enough, during World War II Kolmogorov applied his mathematical gifts to artillery problems, helping to protect Moscow from German bombardment.

Kolmogorov was private in his personal and political life, which might have had something to do with being gay, at a time and place when that was in no way widely accepted.  From what I’ve read—for example, in Gessen’s biography of Perelman—Kolmogorov seems to have been generally a model of integrity and decency.  He established schools for mathematically gifted children, which became jewels of the Soviet Union; one still reads about them with awe.  And at a time when Soviet mathematics was convulsed by antisemitism—with students of Jewish descent excluded from the top math programs for made-up reasons, sent instead to remote trade schools—Kolmogorov quietly protected Jewish researchers.

OK, but all this leaves a question.  Kolmogorov was a leading and admired Soviet scientist all through the era of Stalin’s purges, the Gulag, the KGB, the murders and disappearances and forced confessions, the show trials, the rewritings of history, the allies suddenly denounced as traitors, the tragicomedy of Lysenkoism.  Anyone as intelligent, individualistic, and morally sensitive as Kolmogorov would obviously have seen through the lies of his government, and been horrified by its brutality.  So then why did he utter nary a word in public against what was happening?

As far as I can tell, the answer is simply: because Kolmogorov knew better than to pick fights he couldn’t win.  He judged that he could best serve the cause of truth by building up an enclosed little bubble of truth, and protecting that bubble from interference by the Soviet system, and even making the bubble useful to the system wherever he could—rather than futilely struggling to reform the system, and simply making martyrs of himself and all his students for his trouble.

There’s a saying of Kolmogorov, which associates wisdom with keeping your mouth shut:

“Every mathematician believes that he is ahead of the others. The reason none state this belief in public is because they are intelligent people.”

There’s also a story that Kolmogorov loved to tell about himself, which presents math as a sort of refuge from the arbitrariness of the world: he said that he once studied to become a historian, but was put off by the fact that historians demanded ten different proofs for the same proposition, whereas in math, a single proof suffices.

There was also a dark side to political quietism.  In 1936, Kolmogorov joined other mathematicians in testifying against his former mentor in the so-called Luzin affair.  By many accounts, he did this because the police blackmailed him, by threatening to reveal his homosexual relationship with Pavel Aleksandrov.  On the other hand, while he was never foolish enough to take on Lysenko directly, Kolmogorov did publish a paper in 1940 courageously supporting Mendelian genetics.


It seems likely that in every culture, there have been truths, which moreover everyone knows to be true on some level, but which are so corrosive to the culture’s moral self-conception that one can’t assert them, or even entertain them seriously, without (in the best case) being ostracized for the rest of one’s life.  In the USSR, those truths were the ones that undermined the entire communist project: for example, that humans are not blank slates; that Mendelian genetics is right; that Soviet collectivized agriculture was a humanitarian disaster.  In our own culture, those truths are—well, you didn’t expect me to say, did you? 🙂

I’ve long been fascinated by the psychology of unspeakable truths.  Like, for any halfway perceptive person in the USSR, there must have been an incredible temptation to make a name for yourself as a daring truth-teller: so much low-hanging fruit!  So much to say that’s correct and important, and that best of all, hardly anyone else is saying!

But then one would think better of it.  It’s not as if, when you speak a forbidden truth, your colleagues and superiors will thank you for correcting their misconceptions.  Indeed, it’s not as if they didn’t already know, on some level, whatever you imagined yourself telling them.  In fact it’s often because they fear you might be right that the authorities see no choice but to make an example of you, lest the heresy spread more widely.  One corollary is that the more reasonably and cogently you make your case, the more you force the authorities’ hand.

But what’s the inner psychology of the authorities?  For some, it probably really is as cynical as the preceding paragraph makes it sound.  But for most, I doubt that.  I think that most authorities simply internalize the ruling ideology so deeply that they equate dissent with sin.  So in particular, the better you can ground your case in empirical facts, the craftier and more conniving a deceiver you become in their eyes, and hence the more virtuous they are for punishing you.  Someone who’s arrived at that point is completely insulated from argument: absent some crisis that makes them reevaluate their entire life, there’s no sense in even trying.  The question of whether or not your arguments have merit won’t even get entered upon, nor will the authority ever be able to repeat back your arguments in a form you’d recognize—for even repeating the arguments correctly could invite accusations of secretly agreeing with them.  Instead, the sole subject of interest will be you: who you think you are, what your motivations were to utter something so divisive and hateful.  And you have as good a chance of convincing authorities of your benign motivations as you’d have of convincing the Inquisition that, sure, you’re a heretic, but the good kind of heretic, the kind who rejects the divinity of Jesus but believes in niceness and tolerance and helping people.  To an Inquisitor, “good heretic” doesn’t parse any better than “round square,” and the very utterance of such a phrase is an invitation to mockery.  If the Inquisition had had Twitter, its favorite sentence would be “I can’t even.”

If it means anything to be a lover of truth, it means that anytime society finds itself stuck in one of these naked-emperor equilibriums—i.e., an equilibrium with certain facts known to nearly everyone, but severe punishments for anyone who tries to make those facts common knowledge—you hope that eventually society climbs its way out.  But crucially, you can hope this while also realizing that, if you tried singlehandedly to change the equilibrium, it wouldn’t achieve anything good for the cause of truth.  If iconoclasts simply throw themselves against a ruling ideology one by one, they can be picked off as easily as tribesmen charging a tank with spears, and each kill will only embolden the tank-gunners still further.  The charging tribesmen don’t even have the assurance that, if truth ultimately does prevail, then they’ll be honored as martyrs: they might instead end up like Ted Nelson babbling about hypertext in 1960, or H.C. Pocklington yammering about polynomial-time algorithms in 1917, nearly forgotten by history for being too far ahead of their time.

Does this mean that, like Winston Smith, the iconoclast simply must accept that 2+2=5, and that a boot will stamp on a human face forever?  No, not at all.  Instead the iconoclast can choose what I think of as the Kolmogorov option.  This is where you build up fortresses of truth in places the ideological authorities don’t particularly understand or care about, like pure math, or butterfly taxonomy, or irregular verbs.  You avoid a direct assault on any beliefs your culture considers necessary for it to operate.  You even seek out common ground with the local enforcers of orthodoxy.  Best of all is a shared enemy, and a way your knowledge and skills might be useful against that enemy.  For Kolmogorov, the shared enemy was the Nazis; for someone today, an excellent choice might be Trump, who’s rightly despised by many intellectual factions that spend most of their time despising each other.  Meanwhile, you wait for a moment when, because of social tectonic shifts beyond your control, the ruling ideology has become fragile enough that truth-tellers acting in concert really can bring it down.  You accept that this moment of reckoning might never arrive, or not in your lifetime.  But even if so, you could still be honored by future generations for building your local pocket of truth, and for not giving falsehood any more aid or comfort than was necessary for your survival.


When it comes to the amount of flak one takes for defending controversial views in public under one’s own name, I defer to almost no one.  For anyone tempted, based on this post, to call me a conformist or coward: how many times have you been denounced online, and from how many different corners of the ideological spectrum?  How many people have demanded your firing?   How many death threats have you received?  How many threatened lawsuits?  How many comments that simply say “kill yourself kike” or similar?  Answer and we can talk about cowardice.

But, yes, there are places even I won’t go, hills I won’t die on.  Broadly speaking:

  • My Law is that, as a scientist, I’ll hold discovering and disseminating the truth to be a central duty of my life, one that overrides almost every other value.  I’ll constantly urge myself to share what I see as the truth, even if it’s wildly unpopular, or makes me look weird, or is otherwise damaging to me.
  • The Amendment to the Law is that I’ll go to great lengths not to hurt anyone else’s feelings: for example, by propagating negative stereotypes, or by saying anything that might discourage any enthusiastic person from entering science.  And if I don’t understand what is or isn’t hurtful, then I’ll defer to the leading intellectuals in my culture to tell me.  This Amendment often overrides the Law, causing me to bite my tongue.
  • The Amendment to the Amendment is that, when pushed, I’ll stand by what I care about—such as free scientific inquiry, liberal Enlightenment norms, humor, clarity, and the survival of the planet and of family and friends and colleagues and nerdy misfits wherever they might be found.  So if someone puts me in a situation where there’s no way to protect what I care about without speaking a truth that hurts someone’s feelings, then I might speak the truth, feelings be damned.  (Even then, though, I’ll try to minimize collateral damage.)

When I see social media ablaze with this or that popular falsehood, I sometimes feel the “Galileo urge” washing over me.  I think: I’m a tenured professor with a semi-popular blog.  How can I look myself in the mirror, if I won’t use my platform and relative job safety to declare to the world, “and yet it moves”?

But then I remember that even Galileo weighed his options and tried hard to be prudent.  In his mind, the Dialogue Concerning the Two Chief World Systems actually represented a compromise (!).  Galileo never declared outright that the earth orbits the sun.  Instead, he put the Copernican doctrine, as a “possible view,” into the mouth of his character Salviati—only to have Simplicio “refute” Salviati, by the final dialogue, with the argument that faith always trumps reason, and that human beings are pathetically unequipped to deduce the plan of God from mere surface appearances.  Then, when that fig-leaf turned out not to be wide enough to fool the Church, Galileo quickly capitulated.  He repented of his error, and agreed never to defend the Copernican heresy again.  And he didn’t, at least not publicly.

Some have called Galileo a coward for that.  But the great David Hilbert held a different view.  Hilbert said that science, unlike religion, has no need for martyrs, because it’s based on facts that can’t be denied indefinitely.  Given that, Hilbert considered Galileo’s response to be precisely correct: in effect Galileo told the Inquisitors, hey, you’re the ones with the torture rack.  Just tell me which way you want it.  I can have the earth orbiting Mars and Venus in figure-eights by tomorrow if you decree it so.

Three hundred years later, Andrey Kolmogorov would say to the Soviet authorities, in so many words: hey, you’re the ones with the Gulag and secret police.  Consider me at your service.  I’ll even help you stop Hitler’s ideology from taking over the world—you’re 100% right about that one, I’ll give you that.  Now as for your own wondrous ideology: just tell me the dogma of the week, and I’ll try to make sure Soviet mathematics presents no threat to it.

There’s a quiet dignity to Kolmogorov’s (and Galileo’s) approach: a dignity that I suspect will be alien to many, but recognizable to those in the business of science.


Comment Policy: I welcome discussion about the responses of Galileo, Kolmogorov, and other historical figures to official ideologies that they didn’t believe in; and about the meta-question of how a truth-valuing person ought to behave when living under such ideologies.  In the hopes of maintaining a civil discussion, any comments that mention current hot-button ideological disputes will be ruthlessly deleted.