From quantum supremacy to classical fallacy

Retrospective Comment (Dec. 26, 2019): While I basically stand by what I wrote in this post, I wanted to call attention to the fact that, in its aftermath, one of the authors of the p-bit paper—Kerem Camsari—displayed a striking degree of intellectual courage and honesty. He showed up in the comments section to defend the motivation for the p-bit model, but also to concede the points I’d raised about scaling. Notably, on some matters, he explicitly broke with his own coauthors. He treated having his paper harshly criticized on Shtetl-Optimized not as a personal attack, but as an opportunity to learn and grow. I’m not sure that I would’ve been able to do the same in his shoes, and I regard it as one of the happier outcomes in this blog’s history. –SA

Maybe I should hope that people never learn to distinguish for themselves which claimed breakthroughs in building new forms of computation are obviously serious, and which ones are obviously silly. For as long as they don’t, this blog will always serve at least one purpose. People will cite it, tweet it, invoke its “authority,” even while from my point of view, I’m offering nothing more intellectually special than my toddler does when he calls out “moo-moo cow! baa-baa sheep!” as we pass them on the road.

But that’s too pessimistic. Sure, most readers must more-or-less already know what I’ll say about each thing: that Google’s quantum supremacy claim is serious, that memcomputing to solve NP-complete problems is not, etc. Even so, I’ve heard from many readers that this blog was at least helpful for double-checking their initial impressions, and for making common knowledge what before had merely been known to many. I’m fine for it to continue serving those roles.

Last week, even as I dealt with fallout from Google’s quantum supremacy leak, I also got several people asking me to comment on a Nature paper entitled Integer factorization using stochastic magnetic tunnel junctions (warning: paywalled). See also here for a university press release.

The authors report building a new kind of computer based on asynchronously updated “p-bits” (probabilistic bits). A p-bit is “a robust, classical entity fluctuating in time between 0 and 1, which interacts with other p-bits … using principles inspired by neural networks.” They build a device with 8 p-bits, and use it to factor integers up to 945. They present this as another “unconventional computation scheme” alongside quantum computing, and as a “potentially scalable hardware approach to the difficult problems of optimization and sampling.”

A commentary accompanying the Nature paper goes much further still—claiming that the new factoring approach, “if improved, could threaten data encryption,” and that resources should now be diverted from quantum computing to this promising new idea, one with the advantages of requiring no refrigeration or maintenance of delicate entangled states. (It should’ve added: and how big a number has Shor’s algorithm factored anyway, 21? Compared to 945, that’s peanuts!)

Since I couldn’t figure out a gentler way to say this, here goes: it’s astounding that this paper and commentary made it into Nature in the form that they did. Juxtaposing Google’s sampling achievement with p-bits, as several of my Facebook friends did last week, is juxtaposing the Wright brothers with some guy bouncing around on a pogo stick.

If you were looking forward to watching me dismantle the p-bit claims, I’m afraid you might be disappointed: the task is over almost the moment it begins. “p-bit” devices can’t scalably outperform classical computers, for the simple reason that they are classical computers. A little unusual in their architecture, but still well-covered by the classical Extended Church-Turing Thesis. Just like with the quantum adiabatic algorithm, an energy penalty is applied to coax the p-bits into running a local optimization algorithm: that is, making random local moves that preferentially decrease the number of violated constraints. Except here, because the whole evolution is classical, there doesn’t seem to be even the pretense that anything is happening that a laptop with a random-number generator couldn’t straightforwardly simulate.

Even so, I wouldn’t be writing this post if you opened the paper and it immediately said, in effect, “look, we know. You’re thinking that this is just yet another stochastic local optimization method, which could clearly be simulated efficiently on a conventional computer, thereby putting it into a different conceptual universe from quantum computing. You’re thinking that factoring an n-bit integer will self-evidently take exp(n) time by this method, as compared to exp(n1/3) for the Number Field Sieve, and that no crypto is in even remote danger from this. But here’s why you should still be interested in our p-bit model: because of other advantages X, Y, and Z.” Alas, in vain one searches the whole paper, and the lengthy supplementary material, and the commentary, for any acknowledgment of the pachyderm in the pagoda. Not an asymptotic runtime scaling in sight. Quantum computing is there, but stripped of the theoretical framework that gives it its purpose.

That silence, in the pages of Naturethat’s the part that convinced me that, while on the negative side this blog seems to have accomplished nothing for the world in 14 years of existence, on the positive side it will likely have a role for decades to come.

Update: See a response in the comments, which I appreciated, from Kerem Cansari (one of the authors of the paper), and my response to the response.

(Partly) Unrelated Announcement #1: My new postdoc, Andrea Rocchetto, had the neat idea of compiling a Quantum Computing Fact Sheet: a quick “Cliffs Notes” for journalists, policymakers, and others looking to get the basics right. The fact sheet might grow in the future, but in the meantime, check it out! Or at a more popular level, try the Quantum Atlas made by folks at the University of Maryland.

Unrelated Announcement #2: Daniel Wichs asked me to give a shout-out to a new Conference on Information-Theoretic Cryptography, to be held June 17-19 in Boston.

Third Announcement: Several friends asked me to share that Prof. Peter Wittek, quantum computing researcher at the University of Toronto, has gone missing in the Himalayas. Needless to say we hope for his safe return.

57 Responses to “From quantum supremacy to classical fallacy”

  1. RandomOracle Says:

    Slight typo in the first bullet point of the fact sheet: “Some of these effects are superposition, inference” -> interference.

    I think that having a fact sheet like this is a great idea. To give some hopefully constructive criticism though, I think the current fact sheet focuses too much on what quantum computing isn’t rather than what it is. I understand that this is done in order to get rid of a lot of misconceptions from the start, but I personally would find it more helpful it there was more of a focus on how quantum computation leverages interference and the differences with respect to using classical randomness. I’m not sure how one would distill that in a couple of bullet points, though 🙂

    Also, as another suggestion, I think it would be helpful to give more concrete examples, even for things that are “obvious”. For instance, for the bullet point “n qubits are not the same thing as 2^n classical bits — they take 2^n classical bits to specify” maybe it would be helpful to also add something like “so to *describe* the state of 10 qubits, one would need 2^10=1024 numbers, but when we *measure* 10 qubits we always get back just 10 bits”.
    The descriptions of interference and asymptotic advantage could also benefit, I think, from some examples.

    Anyway, I don’t want to sound overly critical. I’ll also take this opportunity to say that this blog and Quantum Computing Since Democritus have been tremendously useful for me in developing my intuition (and clearing up misconceptions) about quantum-related things, so thank you!

  2. Michael Says:

    It seems that Nature and Science haven’t learned that they really need a Math&CS editor. Exactly 30 years ago Science published a “parallel O(1) time” neural network algorithm to solve a well known NP-Complete problem (Warning:paywall, See “A Near-Optimum Parallel Planarization Algorithm” ), with a “proof” that it works rather well on graphs with 10-48 vertices…

  3. Boaz Barak Says:

    That is a major failure of Nature, unless they decided they want to start competing with the Onion.

    I do appreciate that the authors are not trying to hide the fact that the proposal is nothing more than standard probabilistic computation. After all, they even helpfully named their bits “p bits”.

    BTW not that it matters for computation, but I believe most Intel processors these days even ship with a hardware random bit generator. Maybe that can be sold as yet another advantage of this model over quantum computing – unlike quantum computing where it’s very hard to estimate certain quantities, research on stochastic computers can proceed much faster since we can simulate them on standard laptops – imagine the possibilities!

  4. Peter Morgan Says:

    If we equate a p-bit with a random variable that has a sample space of {0,1}, then we’re a small way towards an indexed array of random variables that all have a sample space (-∞,∞), which is called a “random field” in the literature. This is appropriate for modeling a noisy analog computer. Forcing the sample space to be {0,1} is just to choose a specific set of observables of a given noisy analog computer.
    If, as an ideal case, the index set of the random field supports an appropriate action of the Poincaré group, and we introduce a Poincaré invariant state over it, we can use the GNS-construction to obtain a Hilbert space that is isomorphic to the Hilbert space of the quantized EM field. More than that, we can construct for the commutative *-algebra of observables A of the random field and the noncommutative *-algebra of observables B of the quantum field a *-algebra isomorphism A∨{V}≡B∨{V}, where ∨ indicates free generation by the elements of the two sets of operators and where V is the vacuum projection operator (which we use implicitly every time we consider a transition probability in quantum optics, and which is as theoretically well-defined for a noisy analog computer, as the ground-state projection operator).
    In the usual way of talking about classical mechanics, the algebra of observables is commutative and that’s that, end of story, but classical signal analysis has been working with noncommutative algebras of observables ever since the invention of the fourier transform. In the random field way of thinking, that whole difference can be subsumed in the difference between A and A∨{V}, although one can also approach the question by considering the role of the Poisson bracket in generating a noncommutative algebra of actions on the commutative algebra of observables.
    Be careful complaining about people not knowing *your* literature and mathematics. The stuff I’ve just described has been nascent in the literature since 1931, and has increasingly been percolating out since about 2000, but you keep repeating that analog computing is grossly different from quantum computing, with this post merely the latest, whereas the difference is much more subtle. Turing computing is different from QC, but analog computing is not so much. The above might possibly make the content of arXiv:1709.06711 (where there’s a DOI for Physica Scripta 2019) more accessible for Shtetl-optimized persons, though, sadly, many people tell me it’s too hard to understand. Of course, this last means that I can agree that this paper ought not to have been published in Nature, for it does not cite literature that it ought to have cited????.

  5. Scott Says:

    Peter Morgan #4:

    (1) Quantum computing is grossly different from classical analog computing, because only for the former and not for the latter do you have a fault-tolerance theorem. That’s a crucial difference: it’s why, as far as we know today, the former is scalable in our actual world and the latter is not.

    (2) Let’s keep things simple. Do you predict that a “p-bit computer” can do something that a probabilistic Turing machine can’t efficiently simulate, or don’t you? If yes, how does it do it? If no, then don’t you agree with me about the essential point, however convoluted and incomprehensible is your manner of expressing it? 😀

  6. Kerem Camsari Says:

    Dear Scott:

    I am one of the co-authors of the article “Integer Factorization Using Stochastic Magnetic Tunnel Junctions” and it is very disappointing to read this post, not because we disagree with what bothers you, but because our position is not correctly described.

    Firstly, the value of this paper is in terms of a “potentially scalable hardware approach to the difficult problems of optimization and sampling”, in other words, scaling refers to the hardware and not the algorithm. The key point as noted in the paper is that a random number generator in a laptop would be at least 300 times bigger and consume 10 times more energy than what is possible with the demonstrated hardware. This comparison is in the paper and was noted in the title of the editorial “how to make computing more sustainable”.

    No claims were made about violating the Extended Church-Turing thesis. Neither was this thesis violated as we scaled from 30K transistors of Intel 8086 to the billion plus transistors in our smartphones and yet no one doubts the value of this achievement.

    A secondary purpose of this paper was to point out that AQC can be “emulated” by an integrated p-bit chip, as long as the system is sign-problem free or “stoquastic”. We were not silent about this limitation. To quote from the paper,

    “However, we note that the replicated p-bit approach to quantum computing is established only for a subset of quantum Hamiltonians that do not suffer from the ‘sign-problem’ associated with negative probabilities and are commonly referred to as ‘stoquastic’.”

    As for factorization it was clearly noted that far better algorithms are available, but the

    “purpose of this study was to establish a system that is suitable for solving optimization problems in general, these factors mentioned above are very attractive, particularly considering the energy and surface area advantages.”

    As for the journalistic reporting: Unfortunately, we have no control over what is written and claimed by journalists. As for stating these facts loud and clear: If you read the paper, they are all there. Everyone has their favorite “qualification” that they really like to see upfront and it is difficult to satisfy everyone. We are happy to discuss further.

  7. Stella Biderman Says:

    As a general rule, I’ve been assuming that CS papers published in Nature (and Science, to as lesser extent) that aren’t very applied are likely nonsense. There’s a minor exception for an overview of extremely high profile work, such as AlphaGo or LeCun, Bengio, and Hinton’s “Deep Learning,” but such works tend to be relatively surface level and have sister papers published in CS conferences that are much more worth reading.

  8. Peter Morgan Says:

    (1) Noisy analog computing *fully realized*, as I would put it, is modeled by the algebra of bounded operators acting on a Hilbert space. If or once one accepts that, it’s very nearly indeed the same as QM. This doesn’t change the experimental physics whatsoever, of course, it only says that there is a noisy analog computing way of thinking about those experiments that is different from the QM way (it’s also different from the de Broglie-Bohm way, because it’s a random field formalism that works naturally for the relativistic case of the EM field, using the Koopman Hilbert space formalism for CM, which is the 1931 I referred to above, instead of the Hamilton-Jacobi formalism as its starting point.)

    From that perspective, your (2) sets up a straw man, noisy analog computation limited —I think, because I don’t read experimental physics papers real well— so that the algebra of observables is commutative. That is, I think a “p-bit computer” cannot do something that a probabilistic Turing machine can do.
    For noisy analog physics, however, one can construct a noncommutative algebra of projective measurements as transition probabilities between states at start and end times, just as one does for quantum theory, which in the math is equivalent to introducing a vacuum (or ground-state) projection operator (there is a natural introduction: for a state ρ over a commutative algebra of operators, define the ground-state projection operator as ρ(AVB)=ρ(A)ρ(B), from which noncommutativity very quickly follows).

    There is at least one significant difference between CM in the Koopman formalism and QM, which is that the Hamiltonian function in CM is of course bounded below, but the Liouville operator in CM that one constructs using the Poisson bracket to generate evolution in CM is not bounded below, whereas the Hamiltonian operator in QM that generates evolution in QM is bounded below. That is, Koopman CM is not exactly the same as QM.

    I can assure you that I’m deeply sorry that I can’t express this other than convolutedly and incomprehensibly. SMH at myself. I’m told that my arXiv:1901.00526, which has been under review with Annals of Physics for the last three months, is not much better. Better people than me have tried and failed over the last 20 years to make Koopman’s Hilbert space formalism for CM go more mainstream, and I’m likely just one more on the pile. To me, your pronouncements on noisy analog computing are woefully wrong, though not more woefully wrong than other people’s, but we’ll see what people think in ten years time. Some smart grad student will hopefully do it right and get the credit for doing it right, but I think it won’t be easy and it will be risky for them to try.

  9. Scott Says:

    Kerem Camsari #6: Thank you for the response—I really appreciate it.

    I agree that you never claim that the p-bit proposal violates the Extended Church-Turing Thesis. (Nature‘s accompanying commentary does come very close to claiming that, but the responsibility for that rests with the commentary’s author, Dmitri Nikonov, not with you.)

    I also agree that the paper acknowledges that there are better factoring algorithms, as part of a discussion of why you’re not even going to compare the factoring performance against standard computers’. Unfortunately, that same discussion explains that p-bit machines are expected to do best on problems that (unlike factoring) involve approximate answers—which begs the crucial question of whether, and why, one should expect p-bit machines to give any advantage for those problems either!

    Anyway, my main issues with your paper are as follows:

    (1) The fact that your paper never claims to violate the ECT, is simply a special case of its never saying anything about time scaling, period … even though that’s the first, overwhelmingly obvious question to ask about any new physical proposal for solving optimization problems! I realize that non-CS authors might not fully appreciate the enormity of this issue, or the decades-long history of proposals broadly similar to yours that were wrongly claimed to solve NP-hard problems in polynomial time (computer science’s “perpetual motion machines”). If so, then it may have been Nature‘s review process that failed you: a reviewer or editor should’ve insisted on a clear and prominent scaling discussion before the paper appeared.

    (2) Your paper does compare p-bit computers to quantum computers, in a way that misleadingly represents them as similar categories of thing. With a quantum computer, the entire point is to get better asymptotic scaling behavior. With p-bit computers, by contrast, we both agree that that’s not the point.

    (3) In your comment, you take the position that the advantage of p-bits over classical computers is a constant-factor savings (in size or energy) in the cost of random-number generation. But if you really wanted to rest your case on that, you’d then need to examine whether pseudorandom bits, perfectly sufficient for whatever application you had in mind, could be generated much more cheaply with a standard computer. Also, it begs the question: if more efficient randomness generation is the entire advantage, then why not simply use the p-bit machine as a fast, low-power random number generator—while doing the actual computation (the optimization and so forth) on a conventional device? If a hybrid scheme like that made sense, then p-bit machines would turn out to be valuable and interesting—but for utterly different reasons than the ones in your paper.

    Finally, let me address your comments about stoquastic AQC. As far as I’m aware, the power of stoquastic adiabatic optimization remains one of the most notorious open problems in quantum computing theory. It might be that such optimization can always be efficiently simulated classically (which, in particular, would imply that D-Wave’s approach not only failed, but must have failed). But there are significant obstructions to proving it—see for example this paper by Hastings and Freedman, or the glued trees paper, which gives an example that might be possible to turn into an artificial exponential speedup for stoquastic AQC.

    Now, unless I’m missing something, I don’t see how p-bits change this discussion at all. If it turned out that stoquastic AQC could be efficiently simulated by a classical computer, then it could also be efficiently simulated by a p-bit machine (which from a theoretical standpoint simply is a classical computer), and if not not, right?

  10. Rand Says:

    Re Kerem #6: The second line of the abstract says:

    “Despite the evolution of conventional computers into sophisticated
    machines, there are many classes of problems that they cannot
    efficiently address, including inference, invertible logic, sampling
    and optimization, leading to considerable interest in alternative
    computing schemes.”

    The following line discusses quantum computing which “is expected to perform these tasks
    efficiently”. (Oddly, the references are to Feynman’s ‘Simulating physics with computers’ and Shor’s algorithm, which don’t address the problems discussed.) And then it discusses its own model, strongly implying that, like quantum computing, it provides asymptotic speedup on certain problems.

    Even if it’s not explicit in the paper (which I haven’t read), it openly invited the confusion with quantum computing and wild claims about a “new kind of computing”.

  11. Rand Says:

    I don’t love the fact sheet.

    2-5 are all about what quantum computing is not. It’s more productive to say what quantum computing is. (And a mathematical formula, while off-putting to some, is better than unexplained jargon in italics.)

    I don’t understand the authors’ insistence (well, Scott’s insistence) that using “at the same time” or “simultaneously” is a terrible mistake. A plus state meaningfully has two spins at the same time. I find that it’s better to say that a quantum state is in a combination of 2^n states, but we’re limited in the operations we can apply to it (unitary ones) and how we can extract information (get a given basis state with probability proportional to its amplitude). “We can’t see it” is something most small children should be able to understand.

    5: What axioms? Is this useful/important information?

    (I understand the Categorical QM guys’ “what, it’s just a symmetric monoidal category – why would you want it to be Cartesian???” but I don’t think that’s the approach you want to take.)

    The first sentence in 10 seems false and the second like an oversimplification. 1) we have machines that function per our models, they just have shortcomings. 2) Current quantum computers are limited to under 100 qubits, typically with high error, interference between qubits and very low coherence times. But those differ dramatically between devices. I’d just say we don’t have reliable quantum computers at scale, and won’t likely have them in the near future – hence computer security isn’t at risk.

    Relatedly, why no point 15 about “quantum advantage”? Quantum supremacy is darn cool, quantum advantage is important.

  12. Job Says:

    I had an idea a while ago for a probabilistic bit that enables “interference”. 🙂

    You’d represent a bit using a coin, where a fair coin is zero and an unfair coin is 1.

    You would set the bit to 1 by biasing the coin for either heads or tails. That way head 1s can interfere with tail 1s, and you get 0.

    That’s the best i could do after much deep thought.

  13. Stella Biderman Says:

    In more positive news, Nature published a clearly bullshit neural networks paper on earthquake prediction a year ago that was heavily criticized on blogs and reddit. They just published another paper criticizing it, though they haven’t retracted the original paper.

    Apologies, I can’t seem to get the html to work right.

    Original paper (paywalled):

    Criticism on a blog:

    Discussion on reddit:

    Criticism paper:

  14. Stella Biderman Says:

    Karem #6

    The following is a lengthy quote from your abstract:

    “… Despite the evolution of conventional computers into sophisticated machines, there are many classes of problems that they cannot efficiently address, including inference, invertible logic, sampling and optimization, leading to considerable interest in alternative computing schemes. Quantum computing, which uses qubits to represent a superposition of 0 and 1, is expected to perform these tasks efficiently. However, decoherence and the current requirement for cryogenic operation, as well as the limited many-body interactions that can be implemented, pose considerable challenges. Probabilistic computing is another unconventional computation scheme that shares similar concepts with quantum computing but is not limited by the above challenges. The key role is played by a probabilistic bit (a p-bit)—a robust, classical entity fluctuating in time between 0 and 1, which interacts with other p-bits in the same system using principles inspired by neural networks. …”

    While you do not say the words “p-bit computers are expected to be able to preform these tasks efficiently” it is extremely hard to credit the idea that the reader was supposed to take away any other impression. Every person I asked about this passage either responded “yes, p-computers can efficiently solve problems classical can’t” or “strictly speaking they didn’t /say/ that, but it’s the obvious intended impression.” What did you think you were communicating with this passage? Did nobody ask you about this during the review period? That too is extremely surprising and worrying (Nature’s fault, not yours).

  15. Kerem Camsari Says:

    Dear Scott (#9):

    Appreciate the thoughtful response! A few short answers to some of the important points you are raising.

    One question you have raised repeatedly is whether p-bit machines can give any advantage over deterministic machines. The answer depends on what you mean by “advantage”.

    In general, we can write the time for a particular problem in the form t = t0 * f(N) where f(N) is a function that defines the scaling properties with respect to the problem size N. In CS, one seems to define advantage solely in terms of this function.

    But from a hardware point of view one of the key items is the prefactor t0. Indeed, one could argue that it is improvements in the prefactor t0 that have brought us from Intel 8080 to the modern Intel Xeon.

    This paper shows that p-bit machines can lead to further advantages in t0. This advantage comes not only from the compact random number generators enabled by modern spintronics, but also from constructing three-terminal transistor-like units out of them to build networks of p-bits that are appropriately correlated to solve relevant problems. Further advantage comes from a large scale (hardware), asynchronous (effectively parallel) operation as noted in the concluding paragraph.

    Note that this is not an algorithmic paper, but a hardware centric one. Half the coauthors are pioneers in MRAM technology that has been greatly scaled in density for integrated circuits in recent years. As such it is not clear why a scaling discussion of f(N) is required, any more than algorithmic CS papers should be required to discuss the prefactor t0.

    The connection to quantum computing comes up primarily because we are implementing an algorithm borrowed from the adiabatic quantum computing literature. Indeed, NISQ-era QC seems to be tackling problems that seem more naturally suited to probabilistic computing, e.g. optimization. As such it is important to scrutinize potential applications to see if they truly require the amazing power of quantum computing.

    Why charter a plane to go where you can go on a pogo stick? 🙂

    Once again, thank you for your thoughtful critique and interest. Happy to discuss further!

  16. Ralf Says:

    Rand (#11):
    > A plus state meaningfully has two spins at the same time. I find that it’s better to say that a quantum state is in a combination of 2^n states, but we’re limited in the operations we can apply to it (unitary ones) and how we can extract information (get a given basis state with probability proportional to its amplitude). “We can’t see it” is something most small children should be able to understand.

    From what I understood, a plus state has “two spins at the same time” in the same sense in which the diagonal vector (1, 1) points in “two directions at the same time” (the x-axis and the y-axis). I think that’s a rather silly way to describe a diagonal vector, and makes it sound much more mysterious than it really is. And this does not even get into the problem that this “multiple state” business is basis-dependent, and the choice of basis is ultimately arbitrary.

    Or maybe I misunderstood the little bits and pieces of quantum physics that I picked up along the way? Unfortunately I never learned it properly. So please correct me if what I say makes no sense. 😀

  17. Stella Biderman Says:

    Karem #15:

    In your abstract (sadly I do not have the ability to read your paper due the paywall) you use the term “scalable” in reference to p-computation. Specifically, you say “results show good agreement with theoretical predictions, thus providing a potentially scalable hardware approach to the difficult problems of optimization and sampling.“

    What does that term mean to you? To me, it is impossible to have an efficient, scalable algorithm for factoring that doesn’t make massive advances on the f(N) factor. Yes, I saw your previous comment remarking that the scalability is directed at the hardware rather than the software, but on a fundamental level I don’t know what that could possibly mean. If the run-time increases exponentially with the size of the input, where’s the efficiency? And if it’s somehow supposed to be scalable but *not* efficiently scalable what’s the point? Everything is inefficiently scalable!

    Is there some kind of “intermediate scale” where exponential growth hasn’t taken over the run-time but the numbers are still large enough that we care about making our computations faster? As a practical matter the answer seems to be “no” to me, or at least “yes, but not in a sense that we care about.” The problem is that every application of factoring “intermediate scale” numbers that I can think of has intermediate-scaleness as a result of the limitations of current techniques. If you can build an intermediate-scalable p-computer able to break 128 bit encryption, people will just swap for 256-bit encryption, whine about the increased run-time, and carry on secure in the knowledge that your approach is no longer practical. If you don’t do something about the exponential growth and on tough the t0 term, there’s always going to be a relatively easy way to buff cryptosystems so that your approach can’t touch it. So if this is what you have in mind, that doesn’t seem particularly compelling to me.

    Both the Nature communication and the university press release clearly believe that these results are aimed at factoring extremely large numbers. I know you didn’t write either, but do you feel that either author understands what your paper is about?

  18. Peter Morgan Says:

    Perhaps you or someone else will answer again or not: I think this more theoretical take does inform the paper you posted about, but I suppose others may not think so. Thank you for your first response, in any case. Perhaps as a last gasp, let me go at the fundamentals from the direction that I take in arXiv:1901.00526.
    A common way to talk about CM is to say that it’s a commutative algebra of observables F(q,p) of the position and momentum, with the commutative operations of addition and multiplication. In this it’s different from QM, which has a non-commutative algebra of observables F(Qq,Qp), where Qq and Qp do not commute. Notoriously, we can’t map q,p to Qq,Qp.
    CM also includes, however, the Poisson bracket structure, which is used to construct a noncommutative group of actions on the algebra of observables, such as time translation, rotations, et cetera. In particular, the Poisson bracket allows us to construct the operators ∂/∂q and ∂/∂p. This construction is not specially natural in the Lagrangian and Hamiltonian approaches to CM, which more fool them, but it’s very natural in the Koopman Hilbert space approach to CM. With this, we can happily map q,p,∂/∂q,∂/∂p to twice as many QM degrees of freedom Qq₁,Qq₂,Qp₁,Qp₂. This works out nicely for the EM field because there are left and right helicities in the quantum field. I’ll just say baldly here that this comes into its own for noisy analog computing modeled as a random field: arXiv:1901.00526 has to work quite hard to try to make a claim such as that compelling for CM.
    Classical physicists need to use ∂/∂q and ∂/∂p and everything freely generated by their use, because there are some combinations of classical measurement results and classical analysis of those results that violate Bell-type inequalities, which can only be modeled using a noncommutative algebra of operators (Landau, Phys.Lett.A 120, 54(1987)). If a classical physicist decides not to use ∂/∂q and ∂/∂p, that’s their problem, but to deny a classical physicist the use of ∂/∂q and ∂/∂p is to create a straw man.
    This “high” theory is one thing, really making it a workable system is another, which I frankly don’t much care about (though I take it others do), because QM is a perfectly good system and I find having the evolution operator not bounded below (which I mentioned in #8) distracting, but, going forward, I suggest that you should be very clear, as you are above, that the comparison you are making is with a restricted classical system such as a “probabilistic Turing machine”, not with the best that CM and noisy analog computing can do.

  19. Scott Says:

    Rand #11: Andrea’s fact sheet is just an “early beta” requiring feedback and improvement. But I just reread the relevant sentence:

      A qubit is not “0 and 1 at the same time” but a complex linear combination of 0 and 1.

    and I fully endorse it! It doesn’t simply deny the usual crude translation of “superposition” into everyday English: instead, and crucially, it immediately goes on to correct it to something subtler and truer.

    Yes, “0 and 1 at the same time” gestures toward a defensible metaphysical position (namely, Everett’s)—but when popular writers say it, most of the time they then give the game away by revealing that they really do think of it as just a massively parallel classical computation, rather than as a probabilistic computation with a new and unfamiliar version of probability. Which then leads to all the usual wildly-wrong claims about which problems a QC would be able to solve efficiently (or at best, to someone who’s utterly mystified about why a QC couldn’t solve those problems, and is just taking it on experts’ authority that they can’t).

    So, rather than waiting for this misconception to grow and blossom, why not nip it at the bud by replacing “0 and 1 at the same time” by something truer and better? 🙂

  20. Andrea Says:

    Rand #11: first of all, thanks for your comments. As Scott mentioned, the fact sheet is a beta and definitely requires improvements, possibly with feedback from all the community.

    You are disappointed that the first five points of the fact sheet focus on what quantum computing is not. But, as we state in the introduction, this is the whole point of the exercise. We read the same mistakes again and again in the popular press and we felt that a list of common misconceptions could be a useful addition to improve the quality of the coverage of the filed. If someone is interested in knowing more about quantum computing, or the unexplained jargon in italics, there are many excellent resources on the Internet.

    I agree that #5 can sound obscure. The message we wanted to convey is that entanglement does not follow from some “magic” we do not understand but it is a direct consequence of a consistent theory. If you have suggestions on how to improve this point please let us know.

    With #10 we wanted to avoid the confusion that arises when new experimental results are juxtaposed to general explanations about the power of quantum computing. A non-expert may rightly ask “wait, if we have a quantum computer capable of performing quantum supremacy, why can’t we do factoring?”. When you say “we have machines that function per our models, they just have shortcomings” it looks like our disagreement is more a matter of semantic than actual content. I am sure we both agree than to factor a number that is beyond the reach of present classical computers we need a fault-tolerant machine, and such a device (which I identify with “[a machine]  capable to function according to our theoretical models”) does not exist. Perhaps a better phrase could have been “Presently, fault-tolerant quantum computers do not exist.” But I am worried that the concept of fault-tolerance may result too technical and water down the effectiveness of the point.

    Quantum advantage does not seem something journalists tend to get wrong so we felt there was no “common mistake” to single out.

  21. Peter Morgan Says:

    For something different from me, the idea in Andrea’s Fact Sheet that we can tell ourselves what a qubit is is to me too realist. I propose, somewhat more instrumentally:
    For a 1-qubit prepared state, we have a class of measurements available: (1) all of those measurements give results on each trial of either 0 or 1, and (2) the ways in which the statistics of those measurements are related to each other can be described by a point that is placed anywhere in or on a sphere.
    For an n-qubit prepared state, we have a class of measurements available: (1) all of those measurements give n results on each trial, with each of those n results being either 0 or 1, and (2) the ways in which the statistics of those measurements are related to each other can be described by a positive, normalized, hermitian 2ⁿ-by-2ⁿ complex matrix.
    Gotta say that I don’t like “4. n qubits are not the same thing as 2ⁿ classical bits — they take 2ⁿ classical bits to specify, but you cannot extract that many by measuring them”, insofar as an n-qubit state requires an uncountable number of bits to specify. I’ve never seen an accessible, good story for the n-qubit case (you’ll be shocked to hear that I think mine above isn’t one).
    There ya go????. Of course we have to specify what projection operator corresponds to a given measurement device as well as specifying the state, which adds no end to the fun.

  22. Jay Says:

    Andrea #20,

    >Perhaps a better phrase could have been “Presently, fault-tolerant quantum
    >computers do not exist.” But I am worried that the concept of fault-tolerance
    > may result too technical and water down the effectiveness of the point.

    From my half* layman perspective (e.g. a layman that spent some time reading SA), I think this is likely the best sentence! Yes it forces talking about fault-tolerance, but imho this is exactly what you want to become common knowledge.

    As an analogy: everyone knows E=mc2. Some will get confuse about the energy of a photon, but the main message is clear: “beware matters can turn into a lot of energy”. If everyone knows “Presently, fault-tolerance computer does not exist”, the main message will be as clear: “beware if someone solves the fault-tolerance stuff”.

  23. Rand Says:

    Ralf #16: Reference frames are arbitrary too, aren’t they? Doesn’t mean they aren’t important or meaningful. Personally, I like to pick a basis and reference frame and stick with it. The frequency with which physicists switch frames/bases seems disorienting.

  24. Rand Says:

    Scott #20:

    Because of overcorrection! I have some friends who will immediately discount quantum computing articles when they talk about qubits being in multiple states at the same time. Even for the staggeringly good articles at

    And my only objection to “a complex linear combination of 0 and 1” is that ‘complex’ and ‘linear’ both have multiple meanings which are apt to confuse. Maybe a “weighted combination” would be more accessible while still being pretty faithful?

    (Also, “a probabilistic computation with a new and unfamiliar version of probability” doesn’t seem like a good description at all. I’d lean towards “linear algebraic computation with frustrating restrictions”.)

  25. Rainer Says:

    I have a comment to “quantum supremacy”:
    If “quantum supremacy” is a complexity theoretic statement then it is a statement about an asymptotic behavior. But then, as long as it is not clear if quantum bits scale arbitrarily, one cannot claim the Google has shown “quantum supremacy”.

  26. Scott Says:

    Rainer #25: The problem with that line of thinking is that, if you were really consistent about it, then no experiment should ever count toward quantum supremacy. Even if someone built a billion-qubit device, who could say everything wouldn’t break down at 2 billion qubits?

    But this is silly. Ultimately, we care about asymptotic complexity because it does help us make predictions, even about our messy finite world. In the case of Google’s announcement, what we have is:

    (1) a factor-of-roughly-1015 speedup over the best known classical algorithms in the number of elementary operations needed to perform a well-defined task (doing in ~3 minutes what would’ve taken ~10,000 years for a cluster of 1 million CPUs),

    (2) if it matters, a potential application for that speedup (namely, certified random number generation),

    (3) a prediction from quantum complexity theory that the speedup is asymptotically exponential in n, and

    (4) no proposed way to explain the actual observed speedup without making reference to the asymptotic one.

    What we don’t have, not yet, is error-correction or fault-tolerance. But as long as one admits that the “quantum supremacy” milestone is possible or meaningful at all in the era before fault-tolerance, I don’t see how to deny that it looks like Google has achieved it.

  27. Rand Says:

    Andrea #21:

    5) If I understand you correctly, you want to say something like “superposition, interference and entanglement are not additional layers of quantum mechanics, they are just aspects of a straightforward mathematical system”. Maybe examples of misleading statements would be helpful? (Both for journalists and experts understanding of what you’re trying to avoid.)

    10) Instead of “Presently, fault-tolerant quantum computers do not exist” how about “All current quantum computers are small and error-prone”? (With the possible addition of “making them only useful for very small, tailor-made problems”.)

    I feel like quantum advantage would be nice to contrast with quantum supremacy, and informative. (“Quantum advantage”, after all, is both the goal (cf. 6,7,8) and what current machines are incapable of (10), to the best of our knowledge.)

  28. Rhenium Says:

    Wow, I saw this paper mentioned in Nature, but didn’t get a chance to read it before I went off to give my lecture. I get back and Scott has a response all ready, thanks!

  29. Scott Says:

    Rand #27: What, in your mind, is the difference between quantum supremacy and quantum advantage? I’d thought of them as basically synonyms.

  30. Ether Says:

    Scott #29: Your comment hints that Google’s efforts in generating certified random numbers is at an advanced stage 🙂

  31. Rand Says:

    I think of quantum supremacy as a quantum computer solving some problem (no matter how convoluted) that a traditional computer cannot solve in a reasonable amount of time. It’s mainly evidence that QC works.

    Quantum advantage refers to a real-world advantage for quantum computers, on problems that people care about. (In practice, presumably problems that physicists or chemists care about.) It’s QC being useful, to non-QC people.

    Since IBM is mainly pushing the notion (and it sounds loose), let’s see what they say:

    Bob Sutor: “Quantum advantage is this idea of saying: For a real use case — like financial services, AI, chemistry — when will you be able to see, and how will you be able to see, that a quantum computer is doing something significantly better than any known classical benchmark?”

    And here’s their quantum advantage press release:

  32. Scott Says:

    Ether #30: No, I simply didn’t know that “quantum advantage” had any connotations of practicality, compared to “quantum supremacy.”

    But, yes, they’re working on the randomness thing.

  33. fred Says:

    Scott #26

    ” Even if someone built a billion-qubit device, who could say everything wouldn’t break down at 2 billion qubits?”

    Just thinking in terms of number of qubits is really missing a lot of very practical real limitations though.
    Even for classical computers, one major thing is how the bits are organized into an actual architecture, and this brings along a lot of scaling limitations forced by basic physics.
    E.g. you’ll notice that, over time, the size of local memory caches in CPUs hardly increases. That’s because the von neumann paradigm of having a central small processing core requires the data to be passed to and from that core to some distant bit, but you can only cram so many bits at a certain distance of a central location (at best, organizing memory into spherical layers, like onion peels, the further the layer, the more bits you have, but the slower the access, because speed of light limit).

    But the saving grace for scaling of classical computers is that, if you have two machines, you can always link them trivially to create a bigger machine (switching to a distributed architecture). The entire internet can be seen as one classical computer.

    Given what we know today, can we take two physically separated QCs and link them to get a “bigger” QC? Assuming a QC could have its state stay “perfect” indefinitely, is it possible to communicate its state over long distances, without causing a collapse of its wave function?

  34. fred Says:

    Scott #32

    “No, I simply didn’t know that “quantum advantage” had any connotations of practicality, compared to “quantum supremacy.”

    Shouldn’t “Quantum advantage” be renamed “quantum privilege”?

  35. Shtetl-Optimized » Blog Archive » Book Review: ‘The AI Does Not Hate You’ by Tom Chivers Says:

    […] The Blog of Scott AaronsonIf you take just one piece of information from this blog:Quantum computers would not solve hard search problemsinstantaneously by simply trying all the possible solutions at once. « From quantum supremacy to classical fallacy […]

  36. Perfecto Says:

    For the Google “Quantum Supremacy” result (sorry to be slow and now OT):
    What about the Markov, Fatima, Isakov, and Boxio (arXiv:1807.10749v3) preprint? They describe simulating a 7×8 (56) qubit system with similar depth (I think) in 141 hours for $35184. Wouldn’t that contradict an experimental Quantum Supremacy claim with 53 qubits? But Isakov and Boxio are also at Google. I’m confused.

  37. Scott Says:

    Perfecto #36: That would be an excellent question to direct to them, but part of the issue is probably just that, in 3 minutes, they’re able to take millions of samples on their device. And $35k × millions = well beyond the budget of even the Google QC group. 😀

  38. fred Says:

    Perfecto #36

    this point is interesting too

    “Quantum computers experience errors in qubit initialization, gates and measurements. This suggests speeding up the simulation by reducing the accuracy of results to the level attained by quantum computers.”

    They also mention that increasing the precision increases the “cost” linearly.

  39. Radar trends to watch: October 2019 – O’Reilly (via – Quantum Computing Says:

    […] (The Google paper appears to have been posted inadvertently by NASA, and then pulled down.) Scott Aaronson has a good explanation of what Google has done, why it’s important, and why it doesn’t mean the […]

  40. Perfecto Says:

    Apologies for mis-spelling Boixo’s name.

    Scott #37: That $35k is the total for generating one million results (see Table 3 caption), just like the quantum experiment. Markov et al. do write about various quantum implementations that make classical simulations harder, and perhaps that was done (but Table 3 is for “hard circuits”).

    The NASA / Google report mentions that the quantum experiment relies on “iSWAP-like gates”. Their “gates” are device-specific (and probably location-specific) with some mixture of iSWAP and CZ that is calibrated.

    But if Google (or IBM) built another quantum chip, the gate calibrations would be different, and the device could not perform the same “calculations” as the first chip. This appears to be a shifting of the “quantum supremacy” goalposts that should not be allowed.

    Fred #37: Yes, these classical simulations seem to be enabled by quantum fidelity still being rather modest. I’m trying to understand how calculations and experiments with 0.1 % overall fidelity are interpreted.

  41. Radar trends to watch: October 2019 – O’Reilly Says:

    […] (The Google paper appears to have been posted inadvertently by NASA, and then pulled down.) Scott Aaronson has a good explanation of what Google has done, why it’s important, and why it doesn’t mean the […]

  42. Job Says:

    But if Google (or IBM) built another quantum chip, the gate calibrations would be different, and the device could not perform the same “calculations” as the first chip. This appears to be a shifting of the “quantum supremacy” goalposts that should not be allowed.

    I kind of agree with this.

    I wouldn’t call it shifting the goal posts because it’s still in the spirit of random circuit sampling, where the quantum computer is really only simulating itself.

    IMO the problem is that using random circuit sampling as a basis for quantum supremacy makes it easy to overlook the actual programmability of the device, which should be a key requirement.

    Especially considering that two-qubit gates are the most difficult to implement.

    As it is, you don’t need accurate CZ gates to pull off the quantum supremacy experiment, just any mix of exotic two-qubit gates that produce consistent results.

    And it’s still impressive, in terms of overall fidelity, but i would like a higher standard for this type of milestone.

  43. Scott Says:

    Perfecto #40 and Job #42: No, that’s just a misunderstanding. The statement of the task to be solved (linear cross-entropy benchmarking with a random circuit) has no dependence on details like decoherence or gate calibration. Indeed, I’ve stressed for years that a clear, mathematical, hardware-independent problem statement is crucial to the definition of “quantum supremacy.” All that the hardware details affect is the score that you achieve on the fixed task. And while a higher score would be great, Google appears to have already achieved a score whose only known explanation involves quantum speedup—indeed, that’s the whole point.

  44. Job Says:

    The statement of the task to be solved (linear cross-entropy benchmarking with a random circuit) has no dependence on details like decoherence or gate calibration.

    Right, by definition. And that’s the problem.

    It doesn’t really matter that the QC is a mixed bag of inaccurate quantum gates depending on qubit couplings, by definition it will always do.

    All that the hardware details affect is the score that you achieve on the fixed task.

    I agree, and i don’t think there is enough focus on the programmability score.

    Sure, it’s not like there is an official award for achieving quantum supremacy, so what does it matter.

    But if there was, in terms of random circuit sampling, it would probably require the use of a single two-qubit gate of your choice.

  45. Scott Says:

    Job #44: No, you’re still not getting it. The Google team didn’t achieve quantum supremacy “by definition.” They had to spend years improving their hardware before they were able to achieve it—in other words, before they could attain a linear cross-entropy score that was detectably bounded above 0 (with their normalization) or above 1 (with mine). The goalpost that you’re trying to reach isn’t moved depending on the noise in your hardware.

  46. Job Says:

    No, you’re still not getting it. The Google team didn’t achieve quantum supremacy “by definition.”

    Is that what i said? 🙂 I said that by definition, the test can be accomplished with hardware that has a low programmability score.

    That’s what many people are picking on, for good reason. There are already non-programmable quantum systems that we can’t simulate, so it matters.

    And for that reason, if there was an official award for quantum supremacy, a hardware-independent gate set is the least you would ask for.

  47. Scott Says:

    Job #46: What does “programmability score” mean? We’re talking about a computationally universal device, limited only by the number of qubits and their coherence time (or equivalently by the depth).

  48. Job Says:

    What does “programmability score” mean? We’re talking about a computationally universal device, limited only by the number of qubits and their coherence time (or equivalently by the depth).

    One measure of programmability would be how large of a circuit you actually need in order to implement a circuit defined using a non-hardware-dependent gateset.

    In this case it could be quite large (i don’t know, it’s not explicitly discussed) because the two-qubit gates perform an operation that’s specific to each qubit pair.

    And then of course you have the corresponding fidelity hit. In a sense what’s the effective max circuit size using CZ and single-qubit gates? What’s the corresponding fidelity when such a circuit runs on this hardware?

    Some people have asked about why protein folding is not already a demonstration of quantum supremacy. Because it would be quite difficult to leverage protein folding for actual computing.

    And difficult because of how you would need to manipulate the system in order to extract relevant computation from it.

    The question about the programmability of a specific QC implementation is along these lines. And there is definitely a score.

  49. fred Says:

    My 2 cents on the QC faq (I’m the intended audience):

    2) shouldn’t it specify that “complex” refers to Complex Numbers?
    Why mention that “superposition” has no good explanation when the rest of the paragraph doesn’t use the word (it does appear in 1)).
    Also it’s not like the concept of superposition that QM uses didn’t exist prior to QM, waves on water already show the behavior (reinforcement and cancelation).

    3) shouldn’t it say that the QM is not only not equivalent to multiple classical computers running in parallel, but actually would have less power than this equivalent?
    But even though it has less power, it has enough power to do interesting things, and that “equivalent” classical computer can’t be realized in practice at all (would require exponential resources).

    5) I’m sure it’s true, but don’t you have to be an expert to understand what that even means? Also, in general, when we talk about “theories”, isn’t the choice of axioms arbitrary? (can’t one use entanglement as a primary axiom?)

    6) it could be noted that the same point applies to the comparison of any two machines or algorithms.

    9) it uses the word “believed”, is that basically for the same reason as in 8)? We can’t prove classical computers can’t do better, but similarly we can’t prove that QC can’t do better either?

    Maybe a point explaining why QC are so much harder to build than classical computers could be interesting (in terms of scalability).

  50. binyamin ben binyamin Says:

    But Scott! Don’t you realize that there’s no proof proving BQP != BPP? So yes, it’s entirely feasible “quantum speedups” could be provided by probabilistic algorithms! All those billions in “muh quantum research” down the drain! AHAHAHH!!!

  51. Scott Says:

    binyamin #50: Obviously I’m well aware of the umproved complexity conjectures that quantum supremacy is based on—did you read my FAQ??

    But note that what you said is not exactly right. Even if BPP=BQP, it would still be plausible that SampBPP and SampBQP would be different, in which case QCs could still efficiently solve classically intractable sampling problems (which is what we’re talking about here).

  52. Perfecto Says:

    My objection is to the claim that the chip performs “computations”. Scott himself calls it “quantum computational supremacy”. But, in my mind at least, computations are abstractly defined things based on simple logic. That is different from results based on a mish-mash of unitary couplings.

    For a device to be credited with any computational power, let alone “supreme” computational power, it should be able to do more than simulate itself.

  53. Scott Says:

    Perfecto #52: See, that’s exactly what I was trying to explain. The device is not simply set loose and told to “simulate itself.” Rather, it’s told to generate samples that have a large linear cross-entropy score with respect to a given ideal quantum circuit. The device can then either succeed or fail at that task, which has a mathematical definition independent of the details of the hardware (noise, calibration errors, etc). The fact that the problem is so close to the hardware is exactly why it’s important to maintain the conceptual distinction.

    The question about the $35k simulation was a good one. I emailed Sergio Boixo and will let you know what he says.

  54. Job Says:

    Isn’t the $35k simulation for quantum circuits using CZ gates though?

    That paper mentions that replacing CZ with iSWAP would make classical simulation more difficult:

    Second-generation benchmarks, with iSWAP gates instead of CZ gates, appear significantly harder to simulate, yet implementable on a quantum computer.

    And the supremacy experiment was supposedly using iSWAP gates, so i don’t see an inconsistency.

    I was actually wondering why the supremacy experiment was using iSWAP gates (or “a full iSWAP with 1/6 of a full CZ” to be more exact), so that would explain it.

    The part that i want to pick on is the fact that the two-qubit unitaries used in the experiment are themselves the result of sampling.

    And that seems weird, or is there also a reason for that?

    IMO, that’s the cheesy “self-simulation” part.

  55. Scott Says:

    Job #54: OK, thanks, now we’re finally getting to a meaty technical issue. Yes, it’s absolutely true that the Google group changed its hardware, from only supporting CZ gates to also supporting iSWAP gates, for the sole purpose of making the device harder to simulate classically (or more precisely: allowing it to solve a classically harder sampling problem). Specifically, if you have m two-qubit gates, then my understanding is that the running time of some of the relevant classical algorithms scales like 2m with CZ gates but like 4m with iSWAP gates.

    It does seem possible that this could be the resolution of the mystery of the $35k simulation.

    I don’t mind if you want to describe this as “cheesy,” but it does seem perfectly allowable under the rules of this game.

    Note that if you wanted to run (e.g.) my protocol for certified random bits, then the hardness of classical simulation is the whole point (or at least is a necessary condition), so switching from CZ to iSWAP gates would actually be a practically relevant hardware upgrade in that case.

  56. Job Says:

    I don’t mind if you want to describe this as “cheesy,” but it does seem perfectly allowable under the rules of this game.

    Just to be clear though, the cheesy part is not the use of iSWAP to make the classical simulation difficult – that part is actually great, i’m glad they decided to do that.

    It’s the use of location-specific “full iSWAP with 1/6 of a full CZ”-ish gates derived from a pre-experiment sampling run that i think lowers the bar a little.

    Basically, you’d never program on that type of a QC when you need two-qubit gates to move logical qubits around – it’s a programmability thing.

    But maybe there is a reason for doing this that i don’t fully understand.

    You mentioned that they changed from CZ to iSWAP recently so i assume that fidelity just wasn’t good enough.

  57. Predicting Predictions For the 20s | Gödel's Lost Letter and P=NP Says:

    […] Deep learning methods will be found to help prove that factoring is hard. See above. Instead, there was this (or see this). […]