A query complexity breakthrough

June 17th, 2015

Update (June 26): See this just-released paper, which independently obtains a couple of the same results as the Ambainis et al. paper, but in a different way (using the original Göös et al. function, rather than modifications of it).


Lots of people have accused me of overusing the word “breakthrough” on this blog. So I ask them: what word should I use when a paper comes out that solves not one, not two, but three of the open problems I’ve cared about most for literally half of my life, since I was 17 years old?

Yesterday morning, Andris Ambainis, Kaspars Balodis, Aleksandrs Belovs, Troy Lee, Miklos Santha, and Juris Smotrovs posted a preprint to ECCC in which they give:

(1) A total Boolean function f with roughly a fourth-power separation between its deterministic and bounded-error quantum query complexities (i.e., with D(f)~Q(f)^4). This refutes the conjecture, which people have been making since Beals et al.’s seminal work in 1998, that the biggest possible gap is quadratic.

(2) A total Boolean function f with a quadratic separation between its deterministic and zero-error randomized query complexities (with D(f)~R0(f)^2). This refutes a conjecture of Saks and Wigderson from 1986, that the best possible gap is R0(f)~D(f)^0.753 (from the recursive AND/OR tree), and shows that the known relation D(f)=O(R0(f)^2) is close to tight.

(3) The first total Boolean function f with any asymptotic gap between its zero-error and bounded-error randomized query complexities (in particular, with R0(f)~R(f)^(3/2)).

(There are also other new separations—for example, involving exact quantum query complexity and approximate degree as a real polynomial. But the above three are the most spectacular to me.)

In updates to this post (coming soon), I’ll try my best to explain to general readers what D(f), R(f), and so forth are (see here for the classic survey of these measures), and I’ll also discuss how Ambainis et al. designed the strange functions f that achieve the separations (though their paper already does a good job of explaining it). For now, I’ll just write the stuff that’s easier to write.

I’m at the Federated Computing Research Conference in Portland, Oregon right now, where yesterday I gave my STOC talk (click here for the PowerPoint slides) about the largest possible separations between R(f) and Q(f) for partial Boolean functions f. (That paper is also joint work with Andris Ambainis, who has his fingers in many pies, or his queries in many oracles, or something.) Anyway, when I did a practice run of my talk on Monday night, I commented that, of course, for total Boolean functions f (those not involving a promise), the largest known gap between R(f) and Q(f) is quadratic, and is achieved when f is the OR function because of Grover’s algorithm.

Then, Tuesday morning, an hour before I was to give my talk, I saw the Ambainis et al. bombshell, which made that comment obsolete. So, being notoriously bad at keeping my mouth shut, I mentioned to my audience that, while it was great that they came all the way to Portland to learn what was new in theoretical computer science, if they wanted real news in the subfield I was talking about, they could stop listening to me and check their laptops.

(Having said that, I have had a wonderful time at FCRC, and have learned lots of other interesting things—I can do another addendum to the post about FCRC highlights if people want me to.)

Anyway, within the tiny world of query complexity—i.e., the world where I cut my teeth and spent much of my career—the Ambainis et al. paper is sufficiently revolutionary that I feel the need to say what it doesn’t do.

First, the paper does not give a better-than-quadratic gap between R(f) and Q(f) (i.e., between bounded-error randomized and quantum query complexities). The quantum algorithms that compute their functions f are still “just” variants of the old standbys, Grover’s algorithm and amplitude amplification. What’s new is that the authors have found functions where you can get the quadratic, Grover speedup between R(f) and Q(f), while also getting asymptotic gaps between D(f) and R(f), and between R0(f) and R(f). So, putting it together, you get superquadratic gaps between D(f) and Q(f), and between R0(f) and Q(f). But it remains at least a plausible conjecture that R(f)=O(Q(f)^2) for all total Boolean functions f—i.e., if you insist on a “fair comparison,” then the largest known quantum speedup for total Boolean functions remains the Grover one.

Second, as far as I can tell (I might be mistaken) (I’m not), the paper doesn’t give new separations involving certificate complexity or block sensitivity (e.g., between D(f) and bs(f)). So for example, it remains open whether D(f)=O(bs(f)2), and whether C(f)=O(bs(f)α) for some α<2. (Update: Avishay Tal, in the comments, informs me that the latter conjecture was falsified by Gilmer, Saks, and Srinivasan in 2013. Wow, I’m really out of it!)

In the end, achieving these separations didn’t require any sophisticated new mathematical machinery—just finding the right functions, something that could’ve been done back in 1998, had anyone been clever enough. So, where did these bizarre functions f come from? Ambainis et al. directly adapted them from a great recent communication complexity paper by Mika Göös, Toniann Pitassi, and Thomas Watson. But the Göös et al. paper itself could’ve been written much earlier. It’s yet another example of something I’ve seen again and again in this business, how there’s no substitute for just playing around with a bunch of examples.

The highest compliment one researcher can pay another is, “I wish I’d found that myself.” And I do, of course, but having missed it, I’m thrilled that at least I get to be alive for it and blog about it. Huge congratulations to the authors!


Addendum: What’s this about?

OK, so let’s say you have a Boolean function f:{0,1}^n→{0,1}, mapping n input bits to 1 output bit. Some examples are the OR function, which outputs 1 if any of the n input bits are 1, and the MAJORITY function, which outputs 1 if the majority of them are.

Query complexity is the study of how many input bits you need to read in order to learn the value of the output bit. So for example, in evaluating the OR function, if you found a single input bit that was 1, you could stop right there: you’d know that the output was 1, without even needing to look at the remaining bits. In the worst case, however, if the input consisted of all 0s, you’d have to look at all of them before you could be totally sure the output was 0. So we say that the OR function has a deterministic query complexity of n.
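Here’s a little Python sketch of this (my own illustration, not anyone’s official code): the algorithm reads input bits one at a time, we count the reads, and OR stops as soon as it finds a 1.

```python
# Toy model of query complexity: the algorithm may only learn input bits by
# reading them one at a time, and we count how many bits it reads.
def evaluate_or(x):
    """Evaluate OR(x) adaptively, returning (answer, number of queries)."""
    queries = 0
    for i in range(len(x)):
        queries += 1
        if x[i] == 1:          # found a 1: stop early, the answer is known
            return 1, queries
    return 0, queries          # all bits were 0: we had to read them all

print(evaluate_or([0, 0, 1, 0]))  # stops after 3 queries
print(evaluate_or([0, 0, 0, 0]))  # worst case: all 4 bits get queried
```

The worst-case count over all inputs, minimized over all such algorithms, is exactly D(f) as defined below.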

In this game, we don’t care about any other resources used by an algorithm, like memory or running time: just how many bits of the input it looks at! There are many reasons why, but the simplest is that, unlike with memory or running time, for many functions we can actually figure out how many input bits need to be looked at, without needing to solve anything like P vs. NP. (But note that this can already be nontrivial! For algorithms can often cleverly avoid looking at all the bits, for example by looking at some and then deciding which ones to look at next based on which values they see.)

In general, given a deterministic algorithm A and an n-bit input string x, let D_{A,x} (an integer from 0 to n) be the number of bits of x that A examines when you run it. Then let D_A be the maximum of D_{A,x} over all n-bit strings x. Then D(f), or the deterministic query complexity of f, is the minimum of D_A, over all algorithms A that correctly evaluate f(x) on every input x.

For example, D(OR) and D(MAJORITY) are both n: in the worst case, you need to read everything. For a more interesting example, consider the 3-bit Boolean function

f(x,y,z) = (not(x) and y) or (x and z).

This function has D(f)=2, even though it depends on all 3 of the input bits. (Do you see why?) In general, even if f depends on n input bits, D(f) could be as small as log_2(n).
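In case you don’t see why, here’s the 2-query decision tree spelled out in throwaway Python: query x first, and its value tells you whether y or z is the only other bit that matters.

```python
def f(x, y, z):
    """The 3-bit function: (not(x) and y) or (x and z)."""
    return int(((not x) and y) or (x and z))

def evaluate_f(bits):
    """Adaptive decision tree for f, reading only 2 of the 3 bits."""
    x = bits[0]              # query 1: read x
    if x == 0:
        return bits[1]       # query 2: when x = 0, f equals y
    else:
        return bits[2]       # query 2: when x = 1, f equals z

# sanity check against the truth table on all 8 inputs
from itertools import product
assert all(evaluate_f([x, y, z]) == f(x, y, z)
           for x, y, z in product([0, 1], repeat=2 + 1))
```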

The bounded-error randomized query complexity, or R(f), is like D(f), except that now we allow the algorithm to make random choices of which input bit to query, and for each input x, the algorithm only needs to compute f(x) with probability 2/3. (Here the choice of 2/3 is arbitrary; if you wanted the right answer with some larger constant probability, say 99.9%, you could just repeat the algorithm a constant number of times and take a majority vote.) The zero-error randomized query complexity, or R0(f), is the variant where the algorithm is allowed to make random choices, but at the end of the day, needs to output the correct f(x) with probability 1.

To illustrate these concepts, consider the three-bit majority function, MAJ(x,y,z). We have D(MAJ)=3, since if a deterministic algorithm queried one bit and got a 0 and queried a second bit and got a 1 (as can happen), it would have no choice but to query the third bit. But for any possible setting of x, y, and z, if we choose which bits to query randomly, there’s at least a 1/3 chance that the first two queries will return either two 0s or two 1s—at which point we can stop, with no need to query the third bit. Hence R0(MAJ)≤(1/3)·2+(2/3)·3=8/3 (in fact it equals 8/3, although we haven’t quite shown that). Meanwhile, R(MAJ), as we defined it, is only 1, since if you just need a 2/3 probability of being correct, you can simply pick x, y, or z at random and output it!
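Here’s a quick simulation (mine, just for illustration) of that zero-error randomized algorithm for 3-bit MAJORITY, confirming the 8/3 expected query count empirically:

```python
import random

def maj_queries(bits, rng):
    """Zero-error randomized evaluation of MAJ(x,y,z): (answer, #queries)."""
    i, j, k = rng.sample([0, 1, 2], 3)   # query the bits in a random order
    a, b = bits[i], bits[j]
    if a == b:                           # two agreeing bits already decide MAJ
        return a, 2
    return bits[k], 3                    # otherwise the third bit breaks the tie

rng = random.Random(0)
trials = 100000
avg = sum(maj_queries([0, 1, 1], rng)[1] for _ in range(trials)) / trials
print(avg)   # ≈ 8/3 ≈ 2.667 on any input containing both a 0 and a 1
```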

The bounded-error quantum query complexity, or Q(f), is the minimum number of queries made by a quantum algorithm for f, which, again, has to output the right answer with probability at least 2/3 for every input x. Here a quantum algorithm makes a “query” by feeding a superposition of basis states, each of the form |i,a,w〉, to a “black box,” which maps each basis state to |i, a XOR x_i, w〉, where i is the index of the input bit x_i to be queried, a is a 1-qubit “answer register” into which x_i is reversibly written, and w is a “workspace” that doesn’t participate in the query. In between two queries, the algorithm can apply any unitary transformation it wants to the superposition of |i,a,w〉’s, as long as it doesn’t depend on x. Finally, some designated qubit is measured to decide whether the algorithm accepts or rejects.

As an example, consider the 2-bit XOR function, XOR(x,y). We have D(XOR)=R0(XOR)=R(XOR)=2, since until you’ve queried both bits, you’ve learned nothing about their XOR. By contrast, Q(XOR)=1, because of the famous Deutsch-Jozsa algorithm.
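If you’d like to see that one-query algorithm in action, here’s a tiny linear-algebra simulation of it (using the standard phase-oracle form of the query, |i〉 → (−1)^(x_i)|i〉, which is equivalent to the XOR-into-answer-register form via the usual |−〉 trick):

```python
import numpy as np

def deutsch_xor(x0, x1):
    """One-query quantum evaluation of XOR(x0, x1), simulated with vectors."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard gate
    state = H @ np.array([1.0, 0.0])                  # uniform superposition over i
    state = np.array([(-1) ** x0, (-1) ** x1]) * state  # the ONE query, phase form
    state = H @ state                                 # interfere the two branches
    # all amplitude ends on |0> if x0 = x1, and on |1> if x0 != x1
    return int(abs(state[1]) > 0.5)

for x0 in (0, 1):
    for x1 in (0, 1):
        assert deutsch_xor(x0, x1) == x0 ^ x1
```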

It’s clear that

0 ≤ Q(f) ≤ R(f) ≤ R0(f) ≤ D(f) ≤ n,

since a quantum algorithm can simulate a randomized one, a zero-error randomized algorithm is a special case of a bounded-error one, and a randomized algorithm can simulate a deterministic one.

A central question for the field, since these measures were studied in the 1980s or so, has been how far apart these measures can get from each other. If you allow partial Boolean functions—meaning that only some n-bit strings, not all of them, are “valid inputs” for which the algorithm needs to return a definite answer—then it’s easy to get enormous separations between any two of the measures (indeed, even bigger than exponential), as for example in my recent paper with Andris.

For total functions, by contrast, it’s been known for a long time that these measures can differ by at most polynomial factors:

D(f) = O(R(f)^3) (Nisan)

D(f) = O(R0(f)^2) (folklore, I think)

R0(f) = O(R(f)^2 log(n)) (Midrijanis)

D(f) = O(Q(f)^6) (Beals et al. 1998)

OK, so what were the largest known gaps? For D versus R0 (as well as D versus R), the largest known gap since 1986 has come from the “recursive AND/OR tree”: that is, an OR of two ANDs of two ORs of two ANDs of … forming a complete binary tree of depth d, with the n=2^d input variables comprising the leaves. For this function, we have D(f)=n, whereas Saks and Wigderson showed that R0(f)=Θ(n^0.753) (and later, Santha showed that R(f)=Θ(n^0.753) as well).
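Here’s a quick sketch of randomized short-circuit evaluation of the AND/OR tree. (A caveat: the Θ(n^0.753) bound is about the worst-case input; the run below on uniformly random inputs merely illustrates that randomized evaluation reads far fewer than n leaves, not the precise exponent.)

```python
import random

def eval_tree(leaves, lo, hi, is_or, counter, rng):
    """Short-circuit evaluation of a complete binary AND/OR tree,
    choosing which child to evaluate first uniformly at random."""
    if hi - lo == 1:
        counter[0] += 1                        # one query per leaf read
        return leaves[lo]
    mid = (lo + hi) // 2
    halves = [(lo, mid), (mid, hi)]
    rng.shuffle(halves)
    a = eval_tree(leaves, *halves[0], not is_or, counter, rng)
    if a == (1 if is_or else 0):               # OR sees a 1 / AND sees a 0: done
        return a
    return eval_tree(leaves, *halves[1], not is_or, counter, rng)

rng = random.Random(1)
d = 12                                         # depth; n = 2^d = 4096 leaves
n = 2 ** d
trials = 200
total = 0
for _ in range(trials):
    leaves = [rng.randint(0, 1) for _ in range(n)]
    counter = [0]
    eval_tree(leaves, 0, n, True, counter, rng)
    total += counter[0]
print(total / trials, "leaves read on average, out of", n)
```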

For D versus Q, the largest gap has been for the OR function: we have D(OR)=n (as mentioned earlier), but Q(OR)=Θ(√n) because of Grover’s algorithm. Finally, for R0 versus R, no asymptotic gap has been known for any total function. (This is a problem that I clearly remember working on back in 2000, when I was an undergrad. I even wrote a computer program, the Boolean Function Wizard, partly to search for separations between R0 and R. Alas, while I did find one or two functions with separations, I was unable to conclude anything from them about asymptotics.)

So, how did Ambainis et al. achieve bigger gaps for each of these? I’ll try to have an explanation written by the time my flight from Portland to Boston has landed tonight. But if you can’t wait for that, or you prefer it straight from the horse’s mouth, read their paper!


Addendum 2: The Actual Functions

As I mentioned before, the starting point for everything Ambainis et al. do is a certain Boolean function g recently constructed by Göös, Pitassi, and Watson (henceforth GPW), for different purposes than the ones that concern Ambainis et al. We think of the inputs to g as divided into nm “cells,” which are arranged in a rectangular grid with m columns and n rows. Each cell contains a bit that’s either 0 or 1 (its “label”), as well as a pointer to another cell (consisting of ~log_2(nm) bits). The pointer can also be “null” (i.e., can point nowhere). We’ll imagine that a query of a cell gives you everything: the label and all the bits of the pointer. This could increase the query complexity of an algorithm, but only by a log(n) factor, which we won’t worry about.

Let X be a setting of all the labels and pointers in the grid. Then the question we ask about X is the following:

    Does there exist a “marked column”: that is, a column where all n of the labels are 1, and which has exactly one non-null pointer, which begins a chain of pointers of length m-1, which visits exactly one “0” cell in each column other than the marked column, and then terminates at a null pointer?

If such a marked column exists, then we set g(X)=1; otherwise we set g(X)=0. Crucially, notice that if a marked column exists, then it’s unique, since the chain of pointers “zeroes out” all m-1 of the other columns, and prevents them from being marked.
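To pin down the definition, here’s a brute-force evaluator for g (my own sketch, with my own encoding of the cells; workable only for tiny grids, since the whole point is that no algorithm evaluates g with few queries):

```python
def g(grid):
    """Brute-force evaluation of the GPW function g on an n-by-m grid.

    grid[r][c] = (label, ptr): label in {0, 1}, ptr either None or a
    (row, col) pair.
    """
    n, m = len(grid), len(grid[0])
    for c in range(m):                          # candidate marked column
        column = [grid[r][c] for r in range(n)]
        if any(label != 1 for label, _ in column):
            continue                            # not all-1s
        starts = [ptr for _, ptr in column if ptr is not None]
        if len(starts) != 1:
            continue                            # need exactly one non-null pointer
        # follow the chain: one '0' cell in each of the other m-1 columns
        seen, cur, ok = set(), starts[0], True
        for _ in range(m - 1):
            if cur is None:
                ok = False
                break
            r2, c2 = cur
            label, ptr = grid[r2][c2]
            if label != 0 or c2 == c or c2 in seen:
                ok = False
                break
            seen.add(c2)
            cur = ptr
        if ok and cur is None:                  # chain must end at a null pointer
            return 1
    return 0

# a 2x2 instance where column 0 is marked:
marked = [[(1, (1, 1)), (1, None)],
          [(1, None),   (0, None)]]
print(g(marked))  # → 1
```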

This g already leads to a new query complexity separation, one that refutes a strengthened form of the Saks-Wigderson conjecture. For it’s not hard to see that D(g)=Ω(mn): indeed, any deterministic algorithm must query almost all of the cells. A variant of this is proved in the paper, but the basic idea is that an adversary can answer all queries with giant fields of ‘1’ labels and null pointers—until a given column is almost completed, at which point the adversary fills in the last cell with a ‘0’ label and a pointer to the last ‘0’ cell that it filled in. The algorithm just can’t catch a break; it will need to fill in m-1 columns before it knows where the marked one is (if a marked column exists at all).

By contrast, it’s possible to show that, if n=m, then R(g) is about O(n^(4/3)). I had an argument for R(g)=O((n+m)log(m)) in an earlier version of this post, but the argument was wrong; I thank Alexander Belov for catching the error. I’ll post the R(g)=O(n^(4/3)) argument once I understand it.

To get the other separations—for example, total Boolean functions for which D~R0^2, D~Q^4, R0~Q^3, R0~R^(3/2), and R~approxdeg^4—Ambainis et al. need to add various “enhancements” to the basic GPW function g defined above. There are three enhancements, which can either be added individually or combined, depending on one’s needs.

1. Instead of just a single marked column, we can define g(X) to be 1 if and only if there are k marked columns, which point to each other in a cycle, and which also point to a trail of m-k ‘0’ cells, showing that none of the other columns contain all ‘1’ cells. This can help a bounded-error randomized algorithm—which can quickly find one of the all-1 columns using random sampling—while not much helping a zero-error randomized algorithm.

2. Instead of a linear chain of pointers showing that all the non-marked columns contain a ‘0’ cell, for g(X) to be 1 we can demand a complete binary tree of pointers, originating at a marked column and fanning out to all the unmarked columns in only log(m) layers. This can substantially help a quantum algorithm, which can’t follow a pointer trail any faster than a classical algorithm can; but which, given a complete binary tree, can “fan out” and run Grover’s algorithm on all the leaves in only the square root of the number of queries that would be needed classically. Meanwhile, however, putting the pointers in a tree doesn’t much help deterministic or randomized algorithms.

3. In addition to pointers “fanning out” from a marked column to all of the unmarked columns, we can demand that in every unmarked column, some ‘0’ cell contains a back-pointer, which leads back to a marked column. These back-pointers can help a randomized or quantum algorithm find a marked column faster, while not much helping a deterministic algorithm.

Unless I’m mistaken, the situation is this:

With no enhancements, you can get D~R^2 and something like D~R0^(3/2) (although I still don’t understand how you get the latter with no enhancements; the paper mentions it without proof, but Andris has kindly supplied a proof here).

With only the cycle enhancement, you can get R0~R^(3/2).

With only the binary tree enhancement, you can get R~approxdeg^4.

With only the back-pointer enhancement, you can get D~R0^2.

With the cycle enhancement and the binary-tree enhancement, you can get R0~Q^3.

With the back-pointer enhancement and the binary-tree enhancement, you can get D~Q^4.

It’s an interesting question whether there are separations that require both the cycle enhancement and the back-pointer enhancement; Ambainis et al. don’t give any examples.

And here’s another interesting question not mentioned in the paper. Using the binary-tree enhancement, Ambainis et al. achieve a fourth-power separation between bounded-error randomized query complexity and approximate degree as a real polynomial—i.e., quadratically better than any separation that was known before. Their proof of this involves cleverly constructing a low-degree polynomial by summing a bunch of low-degree polynomials derived from quantum algorithms (one for each possible marked column). As a result, their final, summed polynomial does not itself correspond to a quantum algorithm, meaning that they don’t get a fourth-power separation between R and Q (which would’ve been even more spectacular than what they do get). On the other hand, purely from the existence of a function with R~approxdeg^4, we can deduce that that function has either

(i) a super-quadratic gap between R and Q (refuting my conjecture that the Grover speedup is the best possible quantum speedup for total Boolean functions), or
(ii) a quadratic gap between quantum query complexity and approximate degree—substantially improving over the gap found by Ambainis in 2003.

I conjecture that the truth is (ii); it would be great to have a proof or disproof of this.

97% environmentalist

June 7th, 2015

I decided to add my name to a petition by, as of this writing, 81 MIT faculty, calling on MIT to divest its endowment from fossil fuel companies.  (My co-signatories include Noam Chomsky, so I guess there’s something we agree about!)  There’s also a wider petition signed by nearly 3500 MIT students, faculty, and staff, mirroring similar petitions all over the world.

When the organizers asked me for a brief statement about why I signed, I sent them the following:

Signing this petition wasn’t an obvious choice for me, since I’m sensitive to the charge that divestment petitions are just meaningless sanctimony, a way for activists to feel morally pure without either making serious sacrifices or engaging the real complexities of an issue.  In the end, though, that kind of meta-level judgment can’t absolve us of the need to consider each petition on its merits: if we think of a previous crisis for civilization (say, in the late 1930s), then it seems obvious that even symbolic divestment gestures were better than nothing.  What made up my mind was reading the arguments pro and con, and seeing that the organizers of this petition had a clear-eyed understanding of what they were trying to accomplish and why: they know that divestment can’t directly drive down oil companies’ stock prices, but it can powerfully signal to the world a scientific consensus that, if global catastrophe is to be averted, most of the known fossil-fuel reserves need to be left in the ground, and that current valuations of oil, gas, and coal companies fail to reflect that reality.

For some recent prognoses of the climate situation, see (for example) this or this from Vox.  My own sense is that the threat has been systematically understated even by environmentalists, because of the human impulse to shoehorn all news into a hopeful narrative (“but there’s still time!  if we just buy locally-grown produce, everything can be OK!”).  Logically, there’s an obvious tension between the statements:

(a) there was already an urgent need to act decades ago, and

(b) having failed to act then, we can still feasibly avert a disaster now.

And indeed, (b) appears false to me.  We’re probably well into the era where, regardless of what we do or don’t do, some of us will live to see a climate dramatically different from the one in which human civilization developed for the past 10,000 years, at least as different as the last Ice Ages were.

And yet that fact still doesn’t relieve us of moral responsibility.  We can buy more time to prepare, hoping for technological advances in the interim; we can try to bend the curve of CO2 concentration away from the worst futures and toward the merely terrible ones.  Alas, even those steps will require political will that’s unprecedented outside of major wars.  For the capitalist free market (which I’m a big fan of) to work its magic, actual costs first need to get reflected in prices—which probably means massively taxing fossil fuels, to the point where it’s generally cheaper to leave them in the ground and switch to alternatives.  (Lest anyone call me a doctrinaire treehugger, I also support way less regulation of the nuclear industry, to drive down the cost of building the hundreds of new nuclear plants that we’ll probably need.)

These realities have a counterintuitive practical implication that I wish both sides understood better.  Namely, if you share my desperation and terror about this crisis, the urgent desire to do something, then limiting your personal carbon footprint should be very far from your main concern.  Like, it’s great if you can bike to work, and you should keep it up (fresh air and exercise and all).  But I’d say the anti-environmentalists are right that such voluntary steps are luxuries of the privileged, and will accordingly never add up to a hill of beans.  Let me go further: even to conceptualize this problem in terms of personal virtue and blame seems to me like a tragic mistake, one on which the environmentalists and their opponents colluded.  Given the choice, I’d much rather that the readers of this blog flew to all the faraway conferences they wanted, drove gas-guzzling minivans, ate steaks every night, and had ten kids, but then also took some steps that made serious political action to leave most remaining fossil fuels in the ground even ε more likely, ε closer to the middle of our Overton window.  I signed the MIT divestment petition because it seemed to me like such a step, admittedly with an emphasis on the ε.

Can blog posts nourish the soul? Scott A. (alas, not me) as existence proof

June 3rd, 2015

Reading the essays and speculative fiction of Scott Alexander, as they’ve grown in awesomeness even just within the past half-year, has for me been like witnessing the birth of a new Asimov.  (For more Alexandery goodness, check out Universal Love, Said the Cactus Person.)  That this nerd-bard, this spinner of stupid Internet memes into reflections on eternity, came to my attention by way of his brilliantly defending me, is almost immaterial at this point; I don’t think it plays any role in my continuing admiration for his work.  Whatever you do, just keep writing, other Scott A.

The End of Suffering?

June 1st, 2015

A computer science undergrad who reads this blog recently emailed me about an anxiety he’s been feeling connected to the Singularity—not that it will destroy all human life, but rather that it will make life suffering-free and therefore no longer worth living (more Brave New World than Terminator, one might say).

As he puts it:

This probably sounds silly, but I’ve been existentially troubled by certain science fiction predictions for about a year or two, most of them coming from the Ray Kurzweil/Singularity Institute types … What really bothers me is the idea of the “abolition of suffering” as some put it. I just don’t see the point. Getting rid of cancer, premature death, etc., that all sounds great. But death itself? All suffering? At what point do we just sit down and ask ourselves, why not put our brains in a jar, and just activate our pleasure receptors for all eternity? That seems to be the logical conclusion of that line of thinking. If we want to reduce the conscious feeling of pleasure to the release of dopamine in the brain, well, why not?

I guess what I think I’m worried about is having to make the choice to become a cyborg, or to upload my mind to a computer, to live forever, or to never suffer again. I don’t know how I’d answer, given the choice. I enjoy being human, and that includes my suffering. I really don’t want to live forever. I see that as a hedonic treadmill more than anything else. Crazy bioethicists like David Pearce, who want to genetically re-engineer all species on planet Earth to be herbivores, and literally abolish all suffering, just add fuel to my anxiety.

… Do you think we’re any closer to what Kurzweil (or Pearce) predicted (and by that I mean, will we see it in our lifetimes)? I want to stop worrying about these things, but something is preventing me from doing so. Thoughts about the far flung (or near) future are just intrusive for me. And it seems like everywhere I go I’m reminded of my impending fate. Ernst Jünger would encourage me to take up an attitude of amor fati, but I can’t see myself doing that. My father says I’m too young to worry about these things, and that the answer will be clear when I’ve actually lived my life. But I just don’t know. I want to stop caring, more than anything else. It’s gotten to a point where the thoughts keep me up at night.

I don’t know how many readers might have had similar anxieties, but in any case, I thought my reply might be of some interest to others, so with the questioner’s kind permission, I’m reproducing it below.

1. An end to suffering removing the meaning from life? As my grandmother might say, “we should only have such problems”! I believe, alas, that suffering will always be with us, even after a hypothetical technological singularity, because of basic Malthusian logic. I.e., no matter how many resources there are, population will expand exponentially to exploit them and make the resources scarce again, thereby causing fighting, deprivation, and suffering. What’s terrifying about Malthus’s logic is how fully general it is: it applies equally to tenure-track faculty positions, to any extraterrestrial life that might exist in our universe or in any other bounded universe, and to the distant post-Singularity future.

But if, by some miracle, we were able to overcome Malthus and eliminate all suffering, my own inclination would be to say “go for it”! I can easily imagine a life that was well worth living—filled with beauty, humor, play, love, sex, and mathematical and scientific discovery—even though it was devoid of any serious suffering. (We could debate whether the “ideal life” would include occasional setbacks, frustrations, etc., even while agreeing that at any rate, it should certainly be devoid of cancer, poverty, bullying, suicidal depression, and one’s Internet connection going down.)

2. If you want to worry about something, then rather than an end to suffering, I might humbly suggest worrying about a large increase in human suffering within our lifetimes. A few possible culprits: climate change, resurgent religious fundamentalism, large parts of the world running out of fresh water.

3. It’s fun to think about these questions from time to time, to use them to hone our moral intuitions—and I even agree with Scott Alexander that it’s worthwhile to have a small number of smart people think about them full-time for a living.  But I should tell you that, as I wrote in my post The Singularity Is Far, I don’t expect a Singularity in my lifetime or my grandchildren’s lifetimes. Yes, technically, if there’s ever going to be a Singularity, then we’re 10 years closer to it now than we were 10 years ago, but it could still be one hell of a long way away! And yes, I expect that technology will continue to change in my lifetime in amazing ways—not as much as it changed in my grandparents’ lifetimes, probably, but still by a lot—but how to put this? I’m willing to bet any amount of money that when I die, people’s shit will still stink.

NSA in P/poly: The Power of Precomputation

May 22nd, 2015

Even after the Snowden revelations, there remained at least one big mystery about what the NSA was doing and how.  The NSA’s classified 2013 budget request mentioned, as a priority item, “groundbreaking cryptanalytic capabilities to defeat adversarial cryptography and exploit internet traffic.”  There was a requested increase, of several hundred million dollars, for “cryptanalytic IT services” and “cryptanalysis and exploitation services program C” (whatever that was).  And a classified presentation slide showed encrypted data being passed to a high-performance computing system called “TURMOIL,” and decrypts coming out.  But whatever was going on inside TURMOIL seemed to be secret even within NSA; someone at Snowden’s level wouldn’t have had access to the details.

So, what was (or is) inside the NSA’s cryptanalytic black box?  A quantum computer?  Maybe even one that they bought from D-Wave?  (Rimshot.)  A fast classical factoring algorithm?  A proof of P=NP?  Commentators on the Internet rushed to suggest each of these far-reaching possibilities.  Some of us tried to pour cold water on these speculations—pointing out that one could envision many scenarios that were a little more prosaic, a little more tied to the details of how public-key crypto is actually used in the real world.  Were we just naïve?

This week, a new bombshell 14-author paper (see also the website) advances an exceedingly plausible hypothesis about what may have been the NSA’s greatest cryptanalytic secret of recent years.  One of the authors is J. Alex Halderman of the University of Michigan, my best friend since junior high school, who I’ve blogged about before.  Because of that, I had some advance knowledge of this scoop, and found myself having to do what regular Shtetl-Optimized readers will know is the single hardest thing in the world for me: bite my tongue and not say anything.  Until now, that is.

Besides Alex, the other authors are David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelin, and Paul Zimmermann (two of these, Green and Heninger, have previously turned up on Shtetl-Optimized).

These authors study vulnerabilities in Diffie-Hellman key exchange, the “original” (but still widely-used) public-key cryptosystem, the one that predates even RSA.  Diffie-Hellman is the thing where Alice and Bob first agree on a huge prime number p and a number g, then Alice picks a secret a and sends Bob g^a (mod p), and Bob picks a secret b and sends Alice g^b (mod p), and then Alice and Bob can both compute (g^a)^b=(g^b)^a=g^(ab) (mod p), but an eavesdropper who’s listening in only knows p, g, g^a (mod p), and g^b (mod p), and one can plausibly conjecture that it’s hard from those things alone to get g^(ab) (mod p).  So then Alice and Bob share a secret unknown to the eavesdropper, which they didn’t before, and they can use that secret to start doing cryptography.
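For concreteness, here’s the whole protocol in a few lines of Python, with a toy 127-bit Mersenne prime standing in for the big primes used in practice (the choice of p and g here is mine, purely for illustration):

```python
import secrets

# Toy Diffie-Hellman.  The prime here is a 127-bit Mersenne prime -- far too
# small for real use (indeed, the paper's point is that even 512- and
# 1024-bit primes are within reach of serious attackers).
p = 2 ** 127 - 1
g = 3

a = secrets.randbelow(p - 2) + 1     # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1     # Bob's secret exponent
A = pow(g, a, p)                     # Alice sends g^a (mod p)
B = pow(g, b, p)                     # Bob sends g^b (mod p)

key_alice = pow(B, a, p)             # (g^b)^a = g^(ab)  (mod p)
key_bob = pow(A, b, p)               # (g^a)^b = g^(ab)  (mod p)
assert key_alice == key_bob          # the shared secret; the eavesdropper sees only p, g, A, B
```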

As far as anyone knows today, the best way to break Diffie-Hellman is simply by calculating discrete logarithms: that is, solving the problem of recovering a given only g and h = g^a (mod p).  At least on a classical computer, the fastest known algorithm for discrete logarithms (over fields of prime order) is the number field sieve (NFS).  Under plausible conjectures about the distribution of “smooth” numbers, NFS uses time that grows like exp((1.923+o(1))·(log p)^(1/3)·(log log p)^(2/3)), where the exp and logs are base e (and yes, even the lower-order stuff like the (log log p)^(2/3) makes a big difference in practice).  Of course, once you know the running time of the best-known algorithm, you can then try to choose a key size (that is, a value of log(p)) that’s out of reach for that algorithm on the computing hardware of today.
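To put rough numbers on that formula, here’s a quick evaluation of the NFS cost exponent for a few key sizes (my own calculation, dropping the o(1) term, so these are heuristic ballpark figures rather than precise security estimates):

```python
import math

def nfs_exponent(bits, c=1.923):
    """Exponent (in nats) of the heuristic NFS running time
    exp((c + o(1)) * (log p)^(1/3) * (log log p)^(2/3)),
    with the o(1) term dropped."""
    log_p = bits * math.log(2)
    return c * log_p ** (1/3) * math.log(log_p) ** (2/3)

for bits in (512, 768, 1024):
    e = nfs_exponent(bits)
    # Express the cost as a power of 2 for easier comparison.
    print(f"{bits:4d}-bit prime: exp({e:.1f}) = about 2^{e / math.log(2):.0f} operations")
```

The point of the subexponential shape is visible here: doubling the key size from 512 to 1024 bits multiplies the cost by millions, not by the astronomical factor a pure exponential would give.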

(Note that the recent breakthrough of Antoine Joux, solving discrete log in quasipolynomial time in fields of small characteristic, also relied heavily on sieving ideas.  But there are no improvements from this yet for the “original” discrete log problem, over prime fields.)

But there’s one crucial further fact, which has been understood for at least a decade by theoretical cryptographers, but somehow was slow to filter out to the people who deploy practical cryptosystems.  The further fact is that in NFS, you can arrange things so that almost all the discrete-logging effort depends only on the prime number p, and not at all on the specific numbers g and h for which you’re trying to take the discrete log.  After this initial “precomputation” step, you then have a massive database that you can use to speed up the “descent” step: the step of solving g^a = h (mod p), for any (g,h) pair that you want.

It’s a little like the complexity class P/poly, where a single, hard-to-compute “advice string” unlocks exponentially many inputs once you have it.  (Or a bit more precisely, one could say that NFS reveals that exponentiation modulo a prime number is sort of a trapdoor one-way function, except that the trapdoor information is subexponential-size, and given the trapdoor, inverting the function is still subexponential-time, but a milder subexponential than before.)

The kicker is that, in practice, a large percentage of all clients and servers that use Diffie-Hellman key exchange use the same few prime numbers p.  This means that, if you wanted to decrypt a large fraction of all the traffic encrypted with Diffie-Hellman, you wouldn’t need to do NFS over and over: you could just do it for a few p‘s and cache the results.  This fact can singlehandedly change the outlook for breaking Diffie-Hellman.
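As a cartoon of the economics, consider the following toy.  This is emphatically not how NFS works (for one thing, the table below depends on g as well as p, whereas the real NFS precomputation depends only on p); it just illustrates how a single expensive step, done once, amortizes across every later target.

```python
# Cartoon of "precompute once, then each discrete log is cheap."
p, g = 100_003, 7          # a small prime; a real p has 512+ bits

# Expensive step, done once: tabulate the powers of g modulo p.
dlog_table, acc, i = {}, 1, 0
while acc not in dlog_table:
    dlog_table[acc] = i
    acc = acc * g % p
    i += 1

# Cheap "descent": each individual discrete log is now a dictionary lookup.
for secret in (123, 4_567, 42_424):
    h = pow(g, secret, p)
    assert pow(g, dlog_table[h], p) == h
    print(f"discrete log of {h} (base {g}, mod {p}) = {dlog_table[h]}")
```

Scaled up to real parameters, the analogous observation is exactly why reusing the same few primes p across the whole Internet is so dangerous.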

The story is different depending on the key size, log(p).  In the 1990s, the US government insisted on “export-grade” cryptography for products sold overseas (what a quaint concept!), which meant that the key size was restricted to 512 bits.  For 512-bit keys, Adrian et al. were able to implement NFS and use it to do the precomputation step in about 7 days on a cluster with a few thousand cores.  After this initial precomputation step (which produced 2.5GB of data), doing the descent, to find the discrete log for a specific (g,h) pair, took only about 90 seconds on a 24-core machine.

OK, but no one still uses 512-bit keys, do they?  The first part of Adrian et al.’s paper demonstrates that, because of implementation issues, even today you can force many servers to “downgrade” to the 512-bit, export-grade keys—and then, having done so, you can stall for time for 90 seconds as you figure out the session key, and then do a man-in-the-middle attack and take over and impersonate the server.  It’s an impressive example of the sort of game computer security researchers have been playing for a long time—but it’s really just a warmup to the main act.

As you’d expect, many servers today are configured more intelligently, and will only agree to 1024-bit keys.  But even there, Adrian et al. found that a large fraction of servers rely on just a single 1024-bit prime (!), and many of the ones that don’t rely on just a few other primes.  Adrian et al. estimate that, for a single 1024-bit prime, doing the NFS precomputation would take about 45 million years using a single core—or to put it more ominously, 1 year using 45 million cores.  If you built special-purpose hardware, that could go down by almost two orders of magnitude, putting the monetary cost at a few hundred million dollars, completely within the reach of a sufficiently determined nation-state.  Once the precomputation was done, and the terabytes of output stored in a data center somewhere, computing a particular discrete log would then take about 30 days using 1 core, or mere minutes using a supercomputer.  Once again, none of this is assuming any algorithmic advances beyond what’s publicly known.  (Of course, it’s possible that the NSA also has some algorithmic advances; even modest ones could obviate the need for special-purpose hardware.)

While writing this post, I did my own back-of-the-envelope, and got that using NFS, calculating a 1024-bit discrete log should be about 7.5 million times harder than calculating a 512-bit discrete log.  So, extrapolating from the 7 days it took Adrian et al. to do it for 512 bits, this suggests that it might’ve taken them about 143,840 years to calculate 1024-bit discrete logs with the few thousand cores they had, or 1 year if they had 143,840 times as many cores (since almost all this stuff is extremely parallelizable).  Adrian et al. mention optimizations that they expect would improve this by a factor of 3, giving us about 100 million core-years, very similar to Adrian et al.’s estimate of 45 million core-years (the lower-order terms in the running time of NFS might account for some of the remaining discrepancy).
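For those who want to check the arithmetic, here’s the back-of-the-envelope as code (again my own numbers, using the NFS formula with the o(1) term dropped):

```python
import math

def L(bits, c=1.923):
    """Heuristic NFS cost exponent, with the o(1) term dropped."""
    n = bits * math.log(2)          # log p in nats
    return c * n ** (1/3) * math.log(n) ** (2/3)

ratio = math.exp(L(1024) - L(512))
print(f"1024-bit / 512-bit cost ratio: about {ratio:.2e}")   # roughly 7.5 million
print(f"scaled from 7 days at 512 bits: about {ratio * 7 / 365:,.0f} core-cluster-years")
```

The ratio comes out close to the 7.5-million figure quoted above; the remaining discrepancies with Adrian et al.’s estimate are plausibly absorbed by the o(1) term and implementation details.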

Adrian et al. mount a detailed argument in their paper that all of the details about NSA’s “groundbreaking cryptanalytic capabilities” that we learned from the Snowden documents match what would be true if the NSA were doing something like the above.  The way Alex put it to me is that, sure, the NSA might not have been doing this, but if not, then he would like to understand why not—for it would’ve been completely feasible within the cryptanalytic budget they had, and the NSA would’ve known that, and it would’ve been a very good codebreaking value for the money.

Now that we know about this weakness of Diffie-Hellman key exchange, what can be done?

The most obvious solution—but a good one!—is just to use longer keys.  For decades, when applied cryptographers would announce some attack like this, theorists like me would say with exasperation: “dude, why don’t you fix all these problems in one stroke by just, like, increasing the key sizes by a factor of 10?  when it’s an exponential against a polynomial, we all know the exponential will win eventually, so why not just go out to where it does?”  The applied cryptographers explain to us, with equal exasperation in their voices, that there are all sorts of reasons why not, from efficiency to (maybe the biggest thing) backwards-compatibility.  You can’t unilaterally demand 2048-bit keys, if millions of your customers are using browsers that only understand 1024-bit keys.  On the other hand, given the new revelations, it looks like there really will be a big push to migrate to larger key sizes, as the theorists would’ve suggested from their ivory towers.

A second, equally-obvious solution is to stop relying so much on the same few prime numbers in Diffie-Hellman key exchange.  (Note that the reason RSA isn’t vulnerable to this particular attack is that it inherently requires a different composite number N for each public key.)  In practice, generating a new huge random prime number tends to be expensive—taking, say, a few minutes—which is why people so often rely on “standard” primes.  At the least, we could use libraries of millions of “safe” primes, from which a prime for a given session is chosen randomly.
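Generating a fresh “safe” prime (one of the form p = 2q+1 with q also prime, so that the group has a large prime-order subgroup) isn’t conceptually hard.  Here’s a sketch using a from-scratch Miller-Rabin test, with 32-bit numbers so it runs instantly, whereas real Diffie-Hellman needs 2048 bits and production code should use a vetted crypto library rather than anything hand-rolled:

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_safe_prime(bits):
    """Find p = 2q + 1 with both p and q (probably) prime."""
    while True:
        # Random (bits-1)-bit odd candidate for q, top bit forced on.
        q = random.getrandbits(bits - 1) | (1 << (bits - 2)) | 1
        if is_probable_prime(q) and is_probable_prime(2 * q + 1):
            return 2 * q + 1

p = random_safe_prime(32)
print(p, is_probable_prime(p), is_probable_prime((p - 1) // 2))
```

At real key sizes the search takes minutes rather than microseconds, which is precisely why implementers fell into the habit of hard-coding a few standard primes.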

A third solution is to migrate to elliptic-curve cryptography (ECC), which, as far as anyone knows today, is much less vulnerable to descent attacks than the original Diffie-Hellman scheme.  Alas, there’s been a lot of understandable distrust of ECC after the Dual_EC_DRBG scandal, in which it came out that the NSA backdoored some of NIST’s elliptic-curve-based pseudorandom generators by choosing particular parameters that it knew how to handle.  But maybe the right lesson to draw is that mod-p groups and elliptic-curve groups both seem to be pretty good for cryptography, but the mod-p groups are less good if everyone is using the same few prime numbers p (and those primes are “within nation-state range”), and the elliptic-curve groups are less good if everyone is using the same few parameters.  (A lot of these things do seem pretty predictable with hindsight, but how many did you predict?)

Many people will use this paper to ask political questions, like: hasn’t the NSA’s codebreaking mission once again usurped its mission to ensure the nation’s information security?  Doesn’t the 512-bit vulnerability that many Diffie-Hellman implementations still face, as a holdover from the 1990s export rules, illustrate why encryption should never be deliberately weakened for purposes of “national security”?  How can we get over the issue of backwards-compatibility, and get everyone using strong crypto?  People absolutely should be asking such questions.

But for readers of this blog, there’s one question that probably looms even larger than those of freedom versus security, openness versus secrecy, etc.: namely, the question of theory versus practice.  Which “side” should be said to have “won” this round?  Some will say: those useless theoretical cryptographers, they didn’t even know how their coveted Diffie-Hellman system could be broken in the real world!  The theoretical cryptographers might reply: of course we knew about the ability to do precomputation with NFS!  This wasn’t some NSA secret; it’s something we discussed openly for years.  And if someone told us how Diffie-Hellman was actually being used (with much of the world relying on the same few primes), we could’ve immediately spotted the potential for such an attack.  To which others might reply: then why didn’t you spot it?

Perhaps the right lesson to draw is how silly such debates really are.  In the end, piecing this story together took a team that was willing to do everything from learning some fairly difficult number theory to coding up simulations to poring over the Snowden documents for clues about the NSA’s budget.  Clear thought doesn’t respect the boundaries between disciplines, or between theory and practice.

(Thanks very much to Nadia Heninger and Neal Koblitz for reading this post and correcting a few errors in it.  For more about this, see Bruce Schneier’s post or Matt Green’s post.)

Five announcements

May 16th, 2015

1. Sanjeev Arora sent me a heads-up that there’s a discussion about the future of the STOC conference at the Windows on Theory blog—in particular, about the idea of turning STOC into a longer “CS theory festival.”  If you have opinions about this, don’t miss the chance to make your voice heard.

2. Back in January, I blogged about a new quantum optimization algorithm by Farhi, Goldstone, and Gutmann, which was notable for being, as far as anyone could tell, the first quantum algorithm to achieve a provably better approximation ratio than the best-known classical algorithm for an NP-hard optimization problem.  Today, I report that a fearsome list of authors—Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright—has put out an eagerly-awaited paper that gives a classical algorithm for the same problem, with better performance than the quantum algorithm’s.  (They write that this “improves both qualitatively and quantitatively” on Farhi et al.’s work; I assume “qualitatively” refers to the fact that the new algorithm is classical.)  What happened, apparently, is that after I blogged (with enthusiasm) about the Farhi et al. result, a bunch of classical complexity theorists read my post and decided independently that they could match or beat the quantum algorithm’s performance classically; then they found out about each other and decided to merge their efforts.  I’m proud to say that this isn’t the first example of this blog catalyzing actual research progress, though it’s probably the best example so far.  [Update: Luca Trevisan now has a great post explaining what happened in much more detail, entitled “How Many Theoreticians Does It Take to Approximate Max 3Lin?”]

Another update: Farhi et al. have posted a new version of their paper, in which they can almost match the performance of the classical algorithm using their quantum algorithm.

3. Jennifer Ouellette has a wonderful article in Quanta magazine about recent progress in AdS/MERA (i.e., “the emergence of spacetime from entanglement”), centered around the ideas of Brian Swingle.  This is one of the main things that I’d love to understand better right now—if I succeed even partially, you’ll know because I’ll write a blog post trying to explain it to others.  See also this blog post by Sean Carroll (about this paper by Ning Bao et al.), and this paper by Pastawski, Yoshida, Harlow, and Preskill, which explicitly mines the AdS/CFT correspondence for examples of quantum error-correcting codes.

4. Celebrity rationalist Julia Galef, who I had the great honor of meeting recently, has a podcast interview with Sean Carroll about why Carroll accepts the many-worlds interpretation.  (Or if, like me, you prefer the written word to the spoken one, click here for a full transcript.)  Unfortunately, Sean is given the opportunity at the end of the interview to recommend one science book to his listeners—just one!—but he squanders it by plugging some weird, self-indulgent thing called Quantum Computing Since Democritus.  Julia also has a YouTube video about what she learned from the interview, but I haven’t yet watched it (is there a transcript?).

5. I came across an insightful if meandering essay about nerd culture by Meredith L. Patterson.  In particular, noticing how the term “nerd” has been co-opted by normal, socially-skilled people, who’ve quickly set about remaking nerd social norms to make them identical to the rest of the world’s norms, Patterson coins the term “weird-nerd” to describe people like herself, who are still nerds in the original sense and who don’t see nerd culture as something horribly, irreparably broken.  As she writes: “We’ll start to feel less defensive when we get some indication — any indication — that our critics understand what parts of our culture we don’t want to lose and why we don’t want to lose them.”  (But is this the start of a linguistic treadmill?  Will we eventually need to talk about weird-weird-nerds, etc.?)

“Is There Something Mysterious About Math?”

April 22nd, 2015

When it rains, it pours: after not blogging for a month, I now have a second thing to blog about in as many days.  Aeon, an online magazine, asked me to write a short essay responding to the question above, so I did.  My essay is here.  Spoiler alert: my thesis is that yes, there’s something “mysterious” about math, but the main mystery is why there isn’t even more mystery than there is.  Also—shameless attempt to get you to click—the essay discusses the “discrete math is just a disorganized mess of random statements” view of Luboš Motl, who’s useful for putting flesh on what might otherwise be a strawman position.  Comments welcome (when aren’t they?).  You should also read other interesting responses to the same question by Penelope Maddy, James Franklin, and Neil Levy.  Thanks very much to Ed Lake at Aeon for commissioning these pieces.


Update (4/22): On rereading my piece, I felt bad that it didn’t make a clear enough distinction between two separate questions:

  1. Are there humanly-comprehensible explanations for why the mathematical statements that we care about are true or false—thereby rendering their truth or falsity “non-mysterious” to us?
  2. Are there formal proofs or disproofs of the statements?

Interestingly, neither of the above implies the other.  Thus, to take an example from the essay, no one has any idea how to prove that the digits 0 through 9 occur with equal frequency in the decimal expansion of π, and yet it’s utterly non-mysterious (at a “physics level of rigor”) why that particular statement should be true.  Conversely, there are many examples of statements for which we do have proofs, but which experts in the relevant fields still see as “mysterious,” because the proofs aren’t illuminating or explanatory enough.  Any proofs that require gigantic manipulations of formulas, “magically” terminating in the desired outcome, probably fall into that class, as do proofs that require computer enumeration of cases (like that of the Four-Color Theorem).

But it’s not just that proof and explanation are incomparable; sometimes they might even be at odds.  In this MathOverflow post, Timothy Gowers relates an interesting speculation of Don Zagier, that statements like the equidistribution of the digits of π might be unprovable from the usual axioms of set theory, precisely because they’re so “obviously” true—and for that very reason, there need not be anything deeper underlying their truth.  As Gowers points out, we shouldn’t go overboard with this speculation, because there are plenty of other examples of mathematical statements (the Green-Tao theorem, Vinogradov’s theorem, etc.) that also seem like they might be true “just because”—true only because their falsehood would require a statistical miracle—but for which mathematicians nevertheless managed to give fully rigorous proofs, in effect formalizing the intuition that it would take a miracle to make them false.

Zagier’s speculation is related to another objection one could raise against my essay: while I said that the “Gödelian gremlin” has remained surprisingly dormant in the 85 years since its discovery (and that this is a fascinating fact crying out for explanation), who’s to say that it’s not lurking in some of the very open problems that I mentioned, like π’s equidistribution, the Riemann Hypothesis, the Goldbach Conjecture, or P≠NP?  Conceivably, not only are all those conjectures unprovable from the usual axioms of set theory, but their unprovability is itself unprovable, and so on, so that we could never even have the satisfaction of knowing why we’ll never know.

My response to these objections is basically just to appeal yet again to the empirical record.  First, while proof and explanation need not go together and sometimes don’t, by and large they do go together: over thousands of years, mathematicians learned to seek formal proofs largely because they discovered that without them, their understanding constantly went awry.  Also, while no one can rule out that P vs. NP, the Riemann Hypothesis, etc., might be independent of set theory, there’s very little in the history of math—including in the recent history, which saw spectacular proofs of (e.g.) Fermat’s Last Theorem and the Poincaré Conjecture—that lends concrete support to such fatalism.

So in summary, I’d say that history does present us with “two mysteries of the mathematical supercontinent”—namely, why do so many of the mathematical statements that humans care about turn out to be tightly linked in webs of explanation, and also in webs of proof, rather than occupying separate islands?—and that these two mysteries are very closely related, if not quite the same.

Two papers

April 21st, 2015

Just to get myself back into the habit of blogging:

For those of you who don’t read Lance’s and Bill’s blog, there was a pretty significant breakthrough in complexity theory announced last week.  (And yes, I’m now spending one of the two or so uses of the word “breakthrough” that I allow myself per year—wait, did I just spend the second one with this sentence?)  Ben Rossman (a former MIT PhD student whose thesis committee I was honored to serve on), Rocco Servedio, and Li-Yang Tan have now shown that the polynomial hierarchy is infinite relative to a random oracle, thereby solving the main open problem from Johan Håstad’s 1986 PhD thesis.  While it feels silly even to mention it, the best previous result in this direction was to separate P^NP from Σ₂^p relative to a random oracle, which I did in my Counterexample to the Generalized Linial-Nisan Conjecture paper.  In some sense Rossman et al. infinitely improve on that (using completely different techniques).  Proving their result boils down to proving a new lower bound on the sizes of constant-depth circuits.  Basically, they need to show that, for every k, there are problems that can be solved by small circuits with k layers of AND, OR, and NOT gates, but for which the answer can’t even be guessed, noticeably better than chance, by any small circuit with only k-1 layers of AND, OR, and NOT gates.  They achieve that using a new generalization of the method of random restrictions.  Congratulations to Ben, Rocco, and Li-Yang!

Meanwhile, if you want to know what I’ve been doing for the last couple months, one answer is contained in this 68-page labor of love preprint by me and my superb PhD students Daniel Grier and Luke Schaeffer.  There we give a full classification of all possible sets of classical reversible gates acting on bits (like the Fredkin, Toffoli, and CNOT gates), as well as a linear-time algorithm to decide whether one reversible gate generates another one (previously, that problem wasn’t even known to be decidable).  We thereby completely answer a question that basically no one was asking, although I don’t understand why not.

The ultimate physical limits of privacy

March 11th, 2015

Somewhat along the lines of my last post, the other day a reader sent me an amusing list of questions about privacy and fundamental physics.  The questions, and my answers, are below.

1. Does the universe provide us with a minimum level of information security?

I’m not sure what the question means. Yes, there are various types of information security that are rooted in the known laws of physics—some of them (like quantum key distribution) even relying on specific aspects of quantum physics—whose security one can argue for by appealing to the known properties of the physical world. Crucially, however, any information security protocol is only as good as the assumptions it rests on: for example, that the attacker can’t violate the attack model by, say, breaking into your house with an ax!

2. For example, is my information safe from entities outside the light-cone I project?

Yes, I think it’s safe to assume that your information is safe from any entities outside your future light-cone. Indeed, if information is not in your future light-cone, then almost by definition, you had no role in creating it, so in what sense should it be called “yours”?

3. Assume that there are distant alien cultures with infinite life spans – would they always be able to wait long enough for my light cone to spread to them, and then have a chance of detecting my “private” information?

First of all, the aliens would need to be in your future light-cone (see my answer to 2). In 1998, it was discovered that there’s a ‘dark energy’ pushing the galaxies apart at an exponentially-increasing rate. Assuming the dark energy remains there at its current density, galaxies that are far enough away from us (more than a few tens of billions of light-years) will always recede from us faster than the speed of light, meaning that they’ll remain outside our future light-cone, and signals from us can never reach them. So, at least you’re safe from those aliens!

For the aliens in your future light-cone, the question is subtler. Suppose you took the only piece of paper on which your secrets were written, and burned it to ash—nothing high-tech, just burned it. Then there’s no technology that we know today, or could even seriously envision, that would piece the secrets together. It would be like unscrambling an egg, or bringing back the dead from decomposing corpses, or undoing a quantum measurement. It would mean, effectively, reversing the Arrow of Time in the relevant part of the universe. This is formally allowed by the Second Law of Thermodynamics, since the decrease in entropy within that region could be balanced by an increase in entropy elsewhere, but it would require a staggering level of control over the region’s degrees of freedom.

On the other hand, it’s also true that the microscopic laws of physics are reversible: they never destroy information. And for that reason, as a matter of principle, we can’t rule out the possibility that some civilization of the very far future, whether human or alien, could piece together what was written on your paper even after you’d burned it to a crisp. Indeed, with such godlike knowledge and control, maybe they could even reconstruct the past states of your brain, and thereby piece together private thoughts that you’d never written anywhere!

4. Does living in a black hole provide privacy? Couldn’t they follow you into the hole?

No, I would not recommend jumping into a black hole as a way to ensure your privacy. For one thing, you won’t get to enjoy the privacy for long (a couple hours, maybe, for a supermassive black hole at the center of a galaxy?) before getting spaghettified on your way to the singularity. For another, as you correctly pointed out, other people could still snoop on you by jumping into the black hole themselves—although they’d have to want badly enough to learn your secrets that they wouldn’t mind dying themselves along with you, and also not being able to share whatever they learned with anyone outside the hole.

But a third problem is that even inside a black hole, your secrets might not be safe forever! Since the 1970s, it’s been thought that all information dropped into a black hole eventually comes out, in extremely-scrambled form, in the Hawking radiation that black holes produce as they slowly shrink and evaporate. What do I mean by “slowly”? Well, the evaporation would take about 10^70 years for a black hole the mass of the sun, or about 10^100 years for the black holes at the centers of galaxies. Furthermore, even after the black hole had evaporated, piecing together the infalling secrets from the Hawking radiation would probably make reconstructing what was on the burned paper from the smoke and ash seem trivial by comparison! But just like in the case of the burned paper, the information is still formally present (if current ideas about quantum gravity are correct), so one can’t rule out that it could be reconstructed by some civilization of the extremely remote future.

The flow of emails within the block inbox

March 7th, 2015

As a diversion from the important topics of shaming, anti-shaming, and anti-anti-shaming, I thought I’d share a little email exchange (with my interlocutor’s kind permission), which gives a good example of what I find myself doing all day when I’m not blogging, changing diapers, or thinking about possibly doing some real work (but where did all the time go?).


Dear Professor Aaronson,

I would be very pleased to know your opinion about time.  In a letter of condolence to the Besso family, Albert Einstein wrote: “Now he has departed from this strange world a little ahead of me. That means nothing. People like us, who believe in physics, know that the distinction between past, present and future is only a stubbornly persistent illusion.” I’m a medical doctor, and every day I see time’s effect on human bodies.  Is Einstein saying time is an illusion?  For those who ‘believe in physics,’ is death an illusion?  Don’t we really lose our dear ones, and will they continue to live in an ‘eternal world’?

Is time only human perceptive illusion (as some scientists say physics has proved)?


Dear [redacted],

I don’t read Einstein in that famous quote as saying that time itself is an illusion, but rather, that the sense of time flowing from past to present to future is an illusion. He meant, for example, that the differential equations of physics can just as easily be run backward (from future to past) as forward (from past to future), and that studying physics can strongly encourage a perspective—which philosophers call the “block universe” perspective—where you treat the entire history of spacetime as just a fixed, 4-dimensional manifold, with time simply another dimension in addition to the three spatial ones (admittedly, a dimension that the laws of physics treat somewhat differently than the other three). And yes, relativity encourages this perspective, by showing that different observers, moving at different speeds relative to each other, will divide up the 4-dimensional manifold into time slices in different ways, with two events judged to be simultaneous by one observer judged to be happening at different times by another.

But even after Einstein is read this way, I’d personally respond: well, that’s just one perspective you can take. A perfectly understandable one, if you’re Einstein, and especially if you’re Einstein trying to comfort the bereaved. But still: would you want to say, for example, that because physics treats the table in front of you as just a collection of elementary particles held together by forces, therefore the table, as such, doesn’t “exist”? That seems overwrought. Physics deepens your understanding of the table, of course—showing you what its microscopic constituents are and why they hold themselves together—but the table still “exists.”  In much the same way, physics enormously deepened our understanding of what we mean by the “flow of time”—showing how the “flow” emerges from the time-symmetric equations of physics, combined with the time-asymmetric phenomena of thermodynamics, which increase the universe’s entropy as we move away from the Big Bang, and thereby allow for the creation of memories, records, and other irreversible effects (a part of the story that I didn’t even get into here). But it feels overwrought to say that, because physics gives us a perspective from which we can see the “flow of time” as emerging from something deeper, therefore the “flow” doesn’t exist, or is just an illusion.

Hope that helps!

Best,
Scott


(followup question)

Dear Professor,

I’ve been thinking about the “block universe,” and it seems to me that in it, past, present and future all coexist.  So on the basis of Einstein’s theory, do they all exist eternally, and why do we perceive only the present?


(reply)

But you don’t perceive only the present!  In the past, you perceived what’s now the past (and which you now remember), and in the future, you’ll perceive what’s now the future (and which you now look forward to), right?  And as for why the present is the present, and not some other point in time?  Well, that strikes me as one of those questions like why you’re you, out of all the possible people who you could have been instead, or why, assuming there are billions of habitable planets, you find yourself on earth and not on any of the other planets.  Maybe the best answer is that you had to be someone, living somewhere, at some particular point in time when you asked this question—and you could’ve wondered the same thing regardless of what the answer had turned out to be.