Gaussian BosonSampling, higher-order correlations, and spoofing: An update

In my last post, I wrote (among other things) about an ongoing scientific debate between the group of Chaoyang Lu at USTC in China, which over the past year has been doing experiments that seek to demonstrate quantum supremacy via Gaussian BosonSampling; and the group of Sergio Boixo at Google, which had a recent paper on a polynomial-time classical algorithm to sample approximately from the same distributions.  I reported the facts as I understood them at the time.  Since then, though, a long call with the Google team gave me a new and different understanding, and I feel duty-bound to share that here.

A week ago, I considered it obvious that if, using a classical spoofer, you could beat the USTC experiment on a metric like total variation distance from the ideal distribution, then you would’ve completely destroyed USTC’s claim of quantum supremacy.  The reason I believed that, in turn, is a proposition that I hadn’t given a name but needs one, so let me call it Hypothesis H:

The only way a classical algorithm to spoof BosonSampling can possibly do well in total variation distance, is by correctly reproducing the high-order correlations (correlations among the occupation numbers of large numbers of modes) — because that’s where the complexity of BosonSampling lies (if it lies anywhere).

Hypothesis H had important downstream consequences.  Google’s algorithm, by the Google team’s own admission, does not reproduce the high-order correlations.  Furthermore, because of limitations on both samples and classical computation time, Google’s paper calculates the total variation distance from the ideal distribution only on the marginal distribution on roughly 14 out of 144 modes.  On that marginal distribution, Google’s algorithm does do better than the experiment in total variation distance.  Google presents a claimed extrapolation to the full 144 modes, but eyeballing the graphs, it was far from clear to me what would happen: like, maybe the spoofing algorithm would continue to win, but maybe the experiment would turn around and win; who knows?

Chaoyang, meanwhile, made a clear prediction that the experiment would turn around and win, because of

  1. the experiment’s success in reproducing the high-order correlations,
  2. the admitted failure of Google’s algorithm in reproducing the high-order correlations, and
  3. the seeming impossibility of doing well on BosonSampling without reproducing the high-order correlations (Hypothesis H).

Given everything my experience told me about the central importance of high-order correlations for BosonSampling, I was inclined to agree with Chaoyang.

Now for the kicker: it seems that Hypothesis H is false.  A classical spoofer could beat a BosonSampling experiment on total variation distance from the ideal distribution, without even bothering to reproduce the high-order correlations correctly.

This is true because of a combination of two facts about the existing noisy BosonSampling experiments.  The first fact is that the contribution from the order-k correlations falls off like 1/exp(k).  The second fact is that, due to calibration errors and the like, the experiments already show significant deviations from the ideal distribution on the order-1 and order-2 correlations.

Put these facts together and what do you find?  Well, suppose your classical spoofing algorithm takes care to get the low-order contributions to the distribution exactly right.  Just for that reason alone, it could already win over a noisy BosonSampling experiment, as judged by benchmarks like total variation distance from the ideal distribution, or for that matter linear cross-entropy.  Yes, the experiment will beat the classical simulation on the higher-order correlations.  But because those higher-order correlations are exponentially attenuated anyway, they won’t be enough to make up the difference.  The experiment’s lack of perfection on the low-order correlations will swamp everything else.
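To make the arithmetic concrete, here's a minimal toy sketch with invented numbers (mine, not anyone's actual data), using just two ±1-valued modes, where the single-mode marginals stand in for the "low-order" contributions and the pairwise correlation stands in for the exponentially attenuated "high-order" ones. The ideal distribution has uniform marginals and a small correlation of 0.05; the "spoofer" nails the marginals but drops the correlation entirely; the "experiment" keeps the correlation but miscalibrates both marginals by 0.1. The spoofer still ends up closer in total variation distance:

```python
# Toy illustration (invented numbers): two ±1-valued modes, written via the
# correlation expansion p(s1, s2) = (1/4)(1 + c1*s1 + c2*s2 + c12*s1*s2).
from itertools import product

def dist(c1, c2, c12):
    """Distribution on {+1, -1}^2 with the given correlation coefficients."""
    return {(s1, s2): (1 + c1*s1 + c2*s2 + c12*s1*s2) / 4
            for s1, s2 in product((+1, -1), repeat=2)}

def tv(p, q):
    """Total variation distance between two distributions on the same outcomes."""
    return sum(abs(p[x] - q[x]) for x in p) / 2

ideal      = dist(0.0, 0.0, 0.05)  # small "high-order" correlation
spoofer    = dist(0.0, 0.0, 0.00)  # marginals exact, correlation dropped
experiment = dist(0.1, 0.1, 0.05)  # correlation kept, marginals miscalibrated

print(round(tv(spoofer, ideal), 6))     # 0.025
print(round(tv(experiment, ideal), 6))  # 0.05 -- the spoofer wins
```

Of course the real experiments live on 144 modes rather than 2, but the mechanism is the same: when the correlation being sacrificed is small compared to the calibration error on the marginals, dropping it is the cheaper sin by the lights of variation distance.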

Granted, I still don’t know for sure that this is what happens — that depends on whether I believe Sergio or Chaoyang about the extrapolation of the variation distance to the full 144 modes (my own eyeballs having failed to render a verdict!).  But I now see that it’s logically possible, maybe even plausible.

So, let’s imagine for the sake of argument that Google’s simulation wins on variation distance, even though the experiment wins on the high-order correlations.  In that case, what would be our verdict: would USTC have achieved quantum supremacy via BosonSampling, or not?

It’s clear what each side could say.

Google could say: by a metric that Scott Aaronson, the coinventor of BosonSampling, thought was perfectly adequate as late as last week — namely, total variation distance from the ideal distribution — we won.  We achieved lower variation distance than USTC’s experiment, and we did it using a fast classical algorithm.  End of discussion.  No moving the goalposts after the fact.

Google could even add: BosonSampling is a sampling task; it’s right there in the name!  The only purpose of any benchmark — whether Linear XEB or high-order correlation — is to give evidence about whether you are or aren’t sampling from a distribution close to the ideal one.  But that means that, if you accept that we are doing the latter better than the experiment, then there’s nothing more to argue about.

USTC could respond: even if Scott Aaronson is the coinventor of BosonSampling, he’s extremely far from an infallible oracle.  In the case at hand, his lack of appreciation for the sources of error in realistic experiments caused him to fixate inappropriately on variation distance as the success criterion.  If you want to see the quantum advantage in our system, you have to deliberately subtract off the low-order correlations and look at the high-order correlations.

USTC could add: from the very beginning, the whole point of quantum supremacy experiments was to demonstrate a clear speedup on some benchmark — we never particularly cared which one!  That horse is out of the barn as soon as we’re talking about quantum supremacy at all — something the Google group, which itself reported the first quantum supremacy experiment in Fall 2019, again for a completely artificial benchmark — knows as well as anyone else.  (The Google team even has experience with adjusting benchmarks: when, for example, Pan and Zhang pointed out that Linear XEB as originally specified is pretty easy to spoof for random 2D circuits, the most cogent rejoinder was: OK, fine then, add an extra check that the returned samples are sufficiently different from one another, which kills Pan and Zhang’s spoofing strategy.) In that case, then, why isn’t a benchmark tailored to the high-order correlations as good as variation distance or linear cross-entropy or any other benchmark?

Both positions are reasonable and have merit — though I confess to somewhat greater sympathy for the one that appeals to my doofosity rather than my supposed infallibility!

OK, but suppose, again for the sake of argument, that we accepted the second position, and we said that USTC gets to declare quantum supremacy as long as its experiment does better than any known classical simulation at reproducing the high-order correlations.  We’d still face the question: does the USTC experiment, in fact, do better on that metric?  It would be awkward if, having won the right to change the rules in its favor, USTC still lost even under the new rules.

Sergio tells me that USTC directly reported experimental data only for up to order-7 correlations, and at least individually, the order-7 correlations are easy to reproduce on a laptop (although sampling in a way that reproduces the order-7 correlations might still be hard—a point that Chaoyang confirms, and where further research would be great). OK, but USTC also reported that their experiment seems to reproduce up to order-19 correlations. And order-19 correlations, the Google team agrees, are hard to sample consistently with on a classical computer by any currently known algorithm.

So then, why don’t we have direct data for the order-19 correlations?  The trouble is simply that it would’ve taken USTC an astronomical amount of computation time.  So instead, they relied on a statistical extrapolation from the observed strength of the lower-order correlations — there we go again with the extrapolations!  Of course, if we’re going to let Google rest its case on an extrapolation, then maybe it’s only sporting to let USTC do the same.

You might wonder: why didn’t we have to worry about any of this stuff with the other path to quantum supremacy, the one via random circuit sampling with superconducting qubits?  The reason is that, with random circuit sampling, all the correlations except the highest-order ones are completely trivial — or, to say it another way, the reduced state of any small number of output qubits is exponentially close to the maximally mixed state.  This is a real difference between BosonSampling and random circuit sampling—and even 5-6 years ago, we knew that this represented an advantage for random circuit sampling, although I now have a deeper appreciation for just how great of an advantage it is.  For it means that, with random circuit sampling, it’s easier to place a “sword in the stone”: to say, for example, here is the Linear XEB score achieved by the trivial classical algorithm that outputs random bits, and lo, our experiment achieves a higher score, and lo, we challenge anyone to invent a fast classical spoofing method that achieves a similarly high score.

With BosonSampling, by contrast, we have various metrics with which to judge performance, but so far, for none of those metrics do we have a plausible hypothesis that says “here’s the best that any polynomial-time classical algorithm can possibly hope to do, and it’s completely plausible that even a noisy current or planned BosonSampling experiment can do better than that.”

In the end, then, I come back to the exact same three goals I would’ve recommended a week ago for the future of quantum supremacy experiments, but with all of them now even more acutely important than before:

  1. Experimentally, to increase the fidelity of the devices (with BosonSampling, for example, to observe a larger contribution from the high-order correlations) — a much more urgent goal, from the standpoint of evading classical spoofing algorithms, than further increasing the dimensionality of the Hilbert space.
  2. Theoretically, to design better ways to verify the results of sampling-based quantum supremacy experiments classically — ideally, even ways that could be applied via polynomial-time tests.
  3. For Gaussian BosonSampling in particular, to get a better understanding of the plausible limits of classical spoofing algorithms, and exactly how good a noisy device needs to be before it exceeds those limits.

Thanks so much to Sergio Boixo and Ben Villalonga for the conversation, and to Chaoyang Lu and Jelmer Renema for comments on this post. Needless to say, any remaining errors are my own.

26 Responses to “Gaussian BosonSampling, higher-order correlations, and spoofing: An update”

  1. Doug S. Says:

    It would be ironic if it turns out that BPP = BQP after all.

  2. Scott Says:

    Doug S. #1: I confess, your comment reminded me of Feynman’s parable about the people who spend their entire lives trying to break a particular combination lock, very gradually learning more and more of its secrets, whereupon some amateur walks up to them and asks, in a helpful tone of voice, “did you try 1-2-3-4-5 yet?” 😀

  3. RD Says:

    “The Google team even has experience with adjusting benchmarks”… More than once, i.e. not just with the example you gave. They had a different benchmark in their first paper too, and they basically invented the notion/name “Linear Cross Entropy” in the successor paper (see how many search hits that term gets that are not originating from their paper).

  4. asdf Says:

    Scott #2, that actually happened, in the cryptanalysis of the Enigma. I am probably garbling this a little but the basic story was like this. The commercial version of Enigma had a QWERTY keyboard, so British cryptanalysts figured the military one was wired the same way, but that didn’t work. They figured out that the keys must have been permuted somehow: ok, there were known alternate setups like AZERTY, etc. None of those worked either: it must be a secret permutation. Lots more work went into figuring it out, still to no avail. Then they were informed that Polish cryptographers had completely solved the thing. They met with the Poles and one of their first questions was “what was the keyboard ordering and how did you figure it out?”. The answer was “QWERTY didn’t work, so we guessed ABCDEF… and that turned out to be the right ordering.”

  5. fred Says:

    I’ve always said that QC research should be cranking up the number of imperfect qubits only as a goal to create perfect qubits, no matter how few, and not to create some noisy quantum soup where everyone’s getting bogged down wasting time arguing over tenuous correlations.

  6. Nick Drozd Says:

    You changed your mind? Jeez what are you, some kind of flip-flopper?

  7. Scott Says:

    asdf #4: LOL, I hadn’t heard that story! Do you (or anyone else) have a reference?

  8. Scott Says:

    fred #5: Most serious people in the field have been saying that too! But perfect qubits are hard.

  9. Scott Says:

    Nick Drozd #6: It’s funny, I feel like my moral and political views have barely changed since I was a teenager in the 90s, even while half the world around me became woke and the other half became right-wing authoritarian. Meanwhile, I feel like my views about the current quantum supremacy experiments radically change from one week to the next, even while many other people’s views (whatever they happen to be) stay the same. 🙂

  10. fred Says:

    asdf #4

    interesting, and I didn’t know that Germany was actually using QWERTZ rather than QWERTY.

  11. Mark Says:

    Scott #7

    I read that same story in “The Code Book” by Simon Singh, I think.

  12. Chris J. Says:

    I have a vague, partially formed question about quantum supremacy and classical spoofing that may be more on the philosophical side, or at least on the interpretation side. If a given quantum experiment is shown repeatedly to be classically spoofable, does it seem more likely that the experiment is (1) in fact carrying out a classical algorithm “under the hood” (perhaps unbeknownst to the experimenters and/or obscured by the experimental setup and physical effects), or (2) carrying out a quantum implementation of a much less efficient algorithm, but then cancelling out and yielding similar performance in the end? Or does this question not make any sense? Does it make sense to talk about what “algorithm” a physical system is using “under the hood”?

  13. asdf Says:

    Scott #7, it’s in various WW2 cryptography history books. I just looked in Stephen Budiansky’s “Battle of Wits” (2002) because I have a copy handy, but it’s probably also in Władysław Kozaczuk’s “Enigma” which I don’t have but which is afaict still the best book on the topic, or in F. W. Winterbotham’s “The Ultra Secret”. In Budiansky’s book it is in chapter 3, pp. 94-95.

    The British cryptographer involved was Dillwyn Knox, who had many triumphs despite missing that particular trick. If you look for him in the index of some other book, you can probably find the incident. It was very famous. I did get some details wrong but the general outline was accurate.

  14. asdf Says:

    Fwiw, there was a similar story in “Surely You’re Joking, Mr. Feynman”. Feynman was into safecracking at Los Alamos, and (again in my garbled recollection from the book) he was working away at one particular safe when one of his buddies came along and opened it almost instantly. Feynman asked how the buddy opened it so fast. The buddy replied “oh, I just tried 10-20-30, which is what they set the combination to at the factory. Looks like the safe owner didn’t bother changing it”. Feynman went around trying that combination on other safes and quite a lot of them opened with it. Heh.

  15. Scott Says:

    asdf #14: I guess that’s where Feynman learned that example then! But his point, I think in The Character of Physical Law, was about just how unlikely it is, in the case of physics research, that the person saying “have you tried 1-2-3-4-5?” is going to be telling the experts something they didn’t already think of.

  16. Scott Says:

    Chris J. #12: Yes, that’s an extremely natural thing to wonder, although as you’ve already recognized yourself, it’s hard to know how to answer it as stated. Maybe the most useful thing I can say is that some algorithms—like maybe the Gottesman-Knill algorithm for simulating stabilizer circuits—seem wildly unlike any plausible model for what “Nature herself could be doing” (whatever that means), for example because they don’t generalize to even slightly different cases. But other algorithms, like quantum Monte Carlo, seem more plausible—you could just barely imagine some alternate, Gil-Kalai-friendly universe where QMC somehow worked to simulate all realistic quantum systems, and indeed was “the actual reality behind the curtain.” In any case, though, once there’s any fast classical spoofing method at all, even a “physically unrealistic” one, the possibility is then raised that a different method might be found that’s more physically plausible.

  17. asdf Says:

    But his point, I think in The Character of Physical Law, was about just how unlikely it is, in the case of physics research, that the person saying “have you tried 1-2-3-4-5?” is going to be telling the experts something they didn’t already think of.

    Scott #15, do you mean to tell me that this didn’t really happen? Aww well, what a disappointment. Another illusion shattered.

  18. Raoul Ohio Says:
    Many attempts to “semi brute force” crack passwords start with lists of passwords that are very common, like qaz123, and progress to lists of common words, etc.

    The point being that trying 10-20-30 is more like checking to see if a door is unlocked, than “telling experts something they didn’t know”.

    My top down memory of Feynman’s book includes the fact that he was often successful with the default combo, but not the story of how he got that combo. Those books are great. So is Ulam’s.

  19. fred Says:

    Scott, asdf

    but the more general point is that the practice of safe cracking involves a strong psychological aspect that experts in the mechanical design of locks could be totally ignorant of.
    Just like how most real-life hacking involves skills in social engineering that are totally orthogonal to the mathematics of cryptography.

  20. Doug S. Says:

    I’m pretty sure Feynman got the default combination from a professional locksmith who had been hired to open a safe Feynman couldn’t.

  21. Doug S. Says:

    “1-2-3-4-5? That’s the kind of combination an idiot would put on his luggage!”

  22. Doug S. Says:

    Scott @2: I was mostly joking. P vs PSPACE is still technically an open problem, so there’s no proof that quantum mechanics can’t be simulated efficiently on a classical computer, even if nobody seriously expects to find one.

  23. Job Says:

    The takeaway from this discussion is that amateurs guess 1-2-3-4-5 while experts guess 10-20-30.

    Everyone else just tries to grind the lock down.

  24. Dfeldmann Says:

    Hi, Scott ; I have a (probably) stupid question : Why is Schor algorithm (or the more efficient quantum annealing) never mentioned as a benchmark for (future) classical inframacy(?) It seems that to crack efficiently RSA, you would need at least 10^4 qubits, and probably 10^6, but everyone (in the commercial shere at least) is intent to sell quantum computing as the next revolution, so at first sight you would expect crooks trying to sell prototypes guaranteed to solve any code in a few years if they can get a grant, and experts explaining why this has no chance to work ; how come we never see that (or am I missing the obvious) ?

  25. Scott Says:

    Dfeldmann #24: I don’t understand your question. Of course people talk a lot about Shor’s (not Schor’s) algorithm. But to get an actual advantage from it, you’d need a fault-tolerant device with thousands of logical qubits and perhaps millions of physical qubits. That’s why the discussion of near-term quantum advantages focuses on other things, like quantum simulation and random circuit sampling (and adiabatic optimization, QAOA, and quantum machine learning, although those have more of the character of “Hail Mary passes”).

  26. Michel Says:

    This discussion brings me back to my graduation days: I attacked a few – then – unsolved problems that stumped my professors, in the following way:”If it has a difficult (dis)proof, I can not find it anyway, so what would be the simplest solution that just might work?” And yes, that worked at least three times, getting me a.o. my 1 page (math) masters thesis…
