D-Wave: Truth finally starts to emerge
Thursday, May 16th, 2013

Wrap-Up (June 5): This will be my final update on this post (really!!), since the discussion seems to have reached a point where not much progress is being made, and since I’d like to oblige the commenters who’ve asked me to change the subject. Let me try to summarize the main point I’ve been trying to get across this whole time. I’ll call the point (*).
(*) D-Wave founder Geordie Rose claims that D-Wave has now accomplished its goal of building a quantum computer that, in his words, is “better at something than any other option available.” This claim has been widely and uncritically repeated in the press, so that much of the nerd world now accepts it as fact. However, the claim is not supported by the evidence currently available. It appears that, while the D-Wave machine does outperform certain off-the-shelf solvers, simulated annealing codes have been written that outperform the D-Wave machine on its own native problem when run on a standard laptop. More research is needed to clarify the issue, but in the meantime, it seems worth knowing that this is where things currently stand.
In the comments, many people tried repeatedly to change the subject from (*) to various subsidiary questions. For example: isn’t it possible that D-Wave’s current device will be found to provide a speedup on some other distribution of instances, besides the one that was tested? Even if not, isn’t it possible that D-Wave will achieve a genuine speedup with some future generation of machines? Did it make business sense for Google to buy a D-Wave machine? What were Google’s likely reasons? What’s D-Wave’s current value as a company? Should Cathy McGeoch have acted differently, in the type of comparison she agreed to do, or in how she communicated about its results? Should I have acted differently, in my interaction with McGeoch?
And, I’m afraid to say, I jumped in to the discussion of all of those questions—because, let’s face it, there are very few subjects about which I don’t have an opinion, or at least a list of qualified observations to make. In retrospect, I now think that was a mistake. It would have been better to sidestep all the other questions—not one of which I really know the answer to, and each of which admits multiple valid perspectives—and just focus relentlessly on the truth of assertion (*).
Here’s an analogy: imagine that a biotech startup claimed that, by using an expensive and controversial new gene therapy, it could cure patients at a higher rate than with the best available conventional drugs—basing its claim on a single clinical trial. Imagine that this claim was widely repeated in the press as an established fact. Now imagine that closer examination of the clinical trial revealed that it showed nothing of the kind: it compared against the wrong drugs. And imagine that a more relevant clinical trial—mostly unmentioned in the press—had also been done, and discovered that when you compare to the right drugs, the drugs do better. Imagine that someone wrote a blog post bringing all of this to public attention.
And now imagine that the response to that blogger was the following: “aha, but isn’t it possible that some future clinical trial will show an advantage for the gene therapy—maybe with some other group of patients? Even if not, isn’t it possible that the startup will manage to develop an effective gene therapy sometime in the future? Betcha didn’t consider that, did you? And anyway, at least they’re out there trying to make gene therapy work! So we should all support them, rather than relentlessly criticizing. And as for the startup’s misleading claims to the public? Oh, don’t be so naïve: that’s just PR. If you can’t tune out the PR and concentrate on the science, that’s your own damn problem. In summary, the real issue isn’t what some clinical trial did or didn’t show; it’s you and your hostile attitude.”
In a different context, these sorts of responses would be considered strange, and the need to resort to them revealing. But the rules for D-Wave are different.
(Interestingly, in excusing D-Wave’s statements, some commenters explicitly defended standards of intellectual discourse so relaxed that, as far as I could tell, just about anything anyone could possibly say would be OK with them—except of course for what I say on this blog, which is not OK! It reminds me of the central tenet of cultural relativism: that there exist no universal standards by which any culture could ever be judged “good” or “bad,” except that Western culture is irredeemably evil.)
Update (June 4): Matthias Troyer (who, unfortunately, still can’t comment here for embargo reasons) has asked me to clarify that it’s not he, but rather his postdoc Sergei Isakov, who deserves the credit for actually writing the simulated annealing code that outperformed the D-Wave machine on the latter’s own “home turf” (i.e., random QUBO instances with the D-Wave constraint graph). The quantum Monte Carlo code, which also did quite well at simulating the D-Wave machine, was written by Isakov together with another of Matthias’s postdocs, Troels Rønnow.
Update (June 3): See Cathy McGeoch’s response (here and here), and my response to her response.
Yet More Updates (June 2): Alex Selby has a detailed new post summarizing his comparisons between the D-Wave device (as reported by McGeoch and Wang) and his own solver—finding that his solver can handily outperform the device and speculating about the reasons why.
In other news, Catherine McGeoch spoke on Friday at the MIT quantum group meeting. Incredibly, she spoke for more than an hour, without once mentioning the USC results that found that simulated annealing on a standard laptop (when competently implemented) handily outperformed the D-Wave machine, or making any attempt to reconcile those results with hers and Wang’s. Instead, McGeoch used the time to enlighten the assembled experts about what quantum annealing was, what an exact solver was, etc. etc., then repeated the speedup claims as if the more informative comparisons simply didn’t exist. I left without asking questions, not wanting to be the one to instigate an unpleasant confrontation, and—I’ll admit—questioning my own sanity as a result of no one else asking about the gigantic elephant in the room.
More Updates (May 21): Happy 32nd birthday to me! Among the many interesting comments below, see especially this one by Alex Selby, who says he’s written his own specialist solver for one class of the McGeoch and Wang benchmarks that significantly outperforms the software (and the D-Wave machine) tested by McGeoch and Wang on those benchmarks—and who provides the Python code so you can try it yourself.
Also, Igor Vernik asked me to announce that on July 8th, D-Wave will be giving a technical presentation at the International Superconducting Electronics Conference in Cambridge. See here for more info; I’ll be traveling then and won’t be able to make it. I don’t know whether the performance comparisons to Matthias Troyer’s and Alex Selby’s code will be among the topics discussed, or if there will be an opportunity to ask questions about such things.
In another exciting update, John Smolin and Graeme Smith posted a paper to the arXiv tonight questioning even the “signature of quantumness” part of the latest D-Wave claims—the part that I’d been ~98% willing to accept, even as I relayed evidence that cast enormous doubt on the “speedup” part. Specifically, Smolin and Smith propose a classical model that they say can explain the “bimodal” pattern of success probabilities observed by the USC group as well as quantum annealing can. I haven’t yet had time to read their paper or form an opinion about it, but I’d be very interested if others wanted to weigh in. Update (May 26): The USC group has put out a new preprint responding to Smolin and Smith, offering additional evidence for quantum behavior in the D-Wave device that they say can’t be explained using Smolin and Smith’s model.
Update (May 17): Daniel Lidar emailed me to clarify his views about error-correction and the viability of D-Wave’s approach. He invited me to share his clarification with others—something that I’m delighted to do, since I agree with him wholeheartedly. Without further ado, here’s what Lidar says:
I don’t believe D-Wave’s approach is scalable without error correction. I believe that the incorporation of error correction is a necessary condition in order to ever achieve a speedup with D-Wave’s machines, and I don’t believe D-Wave’s machines are any different from other types of quantum information processing in this regard. I have repeatedly made this point to D-Wave over several years, and I hope that in the future their designs will allow more flexibility in the incorporation of error correction.
Lidar also clarified that he not only doesn’t dispute what Matthias Troyer told me about the lack of speedup of the D-Wave device compared to classical simulated annealing in their experiments, but “fully agrees, endorses, and approves” of it—and indeed, that he himself was part of the team that did the comparison.
In other news, this Hacker News thread, which features clear, comprehending discussions of this blog post and the backstory that led up to it, has helped to restore my faith in humanity.
Two years ago almost to the day, I announced my retirement as Chief D-Wave Skeptic. But—as many readers predicted at the time—recent events (and the contents of my inbox!) have given me no choice except to resume my post. In an all-too-familiar pattern, multiple rounds of D-Wave-related hype have made it all over the world before the truth has had time to put its pants on and drop its daughter off in daycare. And the current hype is particularly a shame, because once one slices through all the layers of ugh—the rigged comparisons, the “dramatic announcements” that mean nothing, the lazy journalists cherry-picking what they want to hear and ignoring the inconvenient bits—there really has been a huge scientific advance this past month in characterizing the D-Wave devices. I’m speaking about the experiments on the D-Wave One installed at USC, the main results of which finally appeared in April. Two of the coauthors of this new work—Matthias Troyer and Daniel Lidar—were at MIT recently to speak about their results, Troyer last week and Lidar this Tuesday. Intriguingly, despite being coauthors on the same paper, Troyer and Lidar have very different interpretations of what their results mean, but we’ll get to that later. For now, let me summarize what I think their work has established.
Evidence for Quantum Annealing Behavior
For the first time, we have evidence that the D-Wave One is doing what should be described as “quantum annealing” rather than “classical annealing” on more than 100 qubits. (Note that D-Wave itself now speaks about “quantum annealing” rather than “quantum adiabatic optimization.” The difference between the two is that the adiabatic algorithm runs coherently, at zero temperature, while quantum annealing is a “messier” version in which the qubits are strongly coupled to their environment throughout, but still maintain some quantum coherence.) The evidence for quantum annealing behavior is still extremely indirect, but despite my “Chief Skeptic” role, I’m ready to accept what the evidence indicates with essentially no hesitation.
So what is the evidence? Basically, the USC group ran the D-Wave One on a large number of randomly generated instances of what I’ll call the “D-Wave problem”: namely, the problem of finding the lowest-energy configuration of an Ising spin glass, with nearest-neighbor interactions that correspond to the D-Wave chip’s particular topology. Of course, restricting attention to this “D-Wave problem” tilts the tables heavily in D-Wave’s favor, but no matter: scientifically, it makes a lot more sense than trying to encode Sudoku puzzles or something like that. Anyway, the group then looked at the distribution of success probabilities when each instance was repeatedly fed to the D-Wave machine. For example, would the randomly-generated instances fall into one giant clump, with a few outlying instances that were especially easy or especially hard for the machine? Surprisingly, they found that the answer was no: the pattern was strongly bimodal, with most instances either extremely easy or extremely hard, and few instances in between. Next, the group fed the same instances to Quantum Monte Carlo: a standard classical algorithm that uses Wick rotation to find the ground states of “stoquastic Hamiltonians,” the particular type of quantum evolution that the D-Wave machine is claimed to implement. When they did that, they found exactly the same bimodal pattern that they found with the D-Wave machine. Finally they fed the instances to a classical simulated annealing program—but there they found a “unimodal” distribution, not a bimodal one. So, their conclusion is that whatever the D-Wave machine is doing, it’s more similar to Quantum Monte Carlo than it is to classical simulated annealing.
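For readers who want to see the shape of this experiment concretely, here’s a minimal, purely illustrative sketch in Python—my own toy code, with a made-up sparse random graph rather than the D-Wave Chimera topology, and a naive annealer rather than anything the USC group actually ran. Don’t expect it to reproduce the bimodal pattern; the point is just what “the distribution of per-instance success probabilities” means in practice.

```python
# Toy illustration only: tiny random Ising instances, a naive simulated annealer,
# and a crude histogram of per-instance success probabilities.  The graph, schedule,
# and sizes are all stand-ins, not the USC benchmark or D-Wave's hardware graph.
import itertools
import math
import random
from collections import Counter

def random_instance(n, p_edge=0.3):
    """Random +/-1 couplings J[(i, j)] on a sparse random graph over n spins."""
    return {(i, j): random.choice([-1, 1])
            for i in range(n) for j in range(i + 1, n) if random.random() < p_edge}

def energy(J, s):
    """Ising energy E(s) = -sum_{(i,j)} J_ij * s_i * s_j."""
    return -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

def ground_energy(J, n):
    """Brute force over all 2^n configurations -- only feasible at toy sizes."""
    return min(energy(J, s) for s in itertools.product([-1, 1], repeat=n))

def anneal(J, n, sweeps=100, T_hot=3.0, T_cold=0.05):
    """One run of naive Metropolis annealing; returns the final energy reached."""
    s = [random.choice([-1, 1]) for _ in range(n)]
    E = energy(J, s)
    for k in range(sweeps):
        T = T_hot + (T_cold - T_hot) * k / (sweeps - 1)   # linear cooling schedule
        for i in range(n):
            s[i] *= -1
            E_new = energy(J, s)
            if E_new <= E or random.random() < math.exp((E - E_new) / T):
                E = E_new           # accept the flip
            else:
                s[i] *= -1          # reject it
    return E

n, runs = 10, 50
success_probs = []
for _ in range(20):                                # 20 random instances
    J = random_instance(n)
    E0 = ground_energy(J, n)
    hits = sum(anneal(J, n) == E0 for _ in range(runs))
    success_probs.append(hits / runs)              # fraction of runs finding the ground state

print(Counter(round(p, 1) for p in success_probs))  # crude "histogram" of success probabilities
```

The USC analysis is, in essence, this histogram computed three times—once for the D-Wave machine, once for Quantum Monte Carlo, and once for simulated annealing—and then compared.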
Curiously, we don’t yet have any hint of a theoretical explanation for why Quantum Monte Carlo should give rise to a bimodal distribution, while classical simulated annealing should give rise to a unimodal one. The USC group simply observed the pattern empirically (as far as I know, they’re the first to do so), then took advantage of it to characterize the D-Wave machine. I regard explaining this pattern as an outstanding open problem raised by their work.
In any case, if we accept that the D-Wave One is doing “quantum annealing,” then despite the absence of a Bell-inequality violation or other direct evidence, it’s reasonably safe to infer that there should be large-scale entanglement in the device. I.e., the true quantum state is no doubt extremely mixed, but there’s no particular reason to believe we could decompose that state into a mixture of product states. For years, I tirelessly repeated that D-Wave hadn’t even provided evidence that its qubits were entangled—and that, while you can have entanglement with no quantum speedup, you can’t possibly have a quantum speedup without at least the capacity to generate entanglement. Now, I’d say, D-Wave finally has cleared the evidence-for-entanglement bar—and, while they’re not the first to do so with superconducting qubits, they’re certainly the first to do so with so many superconducting qubits. So I congratulate D-Wave on this accomplishment. If this had been advertised from the start as a scientific research project—“of course we’re a long way from QC being practical; no one would ever claim otherwise; but as a first step, we’ve shown experimentally that we can entangle 100 superconducting qubits with controllable couplings”—my reaction would’ve been, “cool!” (Similar to my reaction to any number of other steps toward scalable QC being reported by research groups all over the world.)
No Speedup Compared to Classical Simulated Annealing
But of course, D-Wave’s claims—and the claims being made on its behalf by the Hype-Industrial Complex—are far more aggressive than that. And so we come to the part of this post that has not been pre-approved by the International D-Wave Hype Repeaters Association. Namely, the same USC paper that reported the quantum annealing behavior of the D-Wave One, also showed no speed advantage whatsoever for quantum annealing over classical simulated annealing. In more detail, Matthias Troyer’s group spent a few months carefully studying the D-Wave problem—after which, they were able to write optimized simulated annealing code that solves the D-Wave problem on a normal, off-the-shelf classical computer, about 15 times faster than the D-Wave machine itself solves the D-Wave problem! Of course, if you wanted even more classical speedup than that, then you could simply add more processors to your classical computer, for only a tiny fraction of the ~$10 million that a D-Wave One would set you back.
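For what it’s worth, the gap between a naive annealer (like the toy one above) and a competently optimized one mostly comes from generic, textbook-level tricks rather than anything exotic. Here’s the most basic one, as a hedged sketch—I have no inside knowledge of which specific optimizations Isakov’s code actually uses—namely, computing the energy change of a spin flip from the spin’s local field instead of recomputing the whole energy.

```python
# Hedged sketch of a standard speedup, not a claim about Isakov's actual code.
# For E(s) = -sum_{(i,j)} J_ij * s_i * s_j, flipping spin i changes the energy by
#     dE = 2 * s_i * sum_{j in N(i)} J_ij * s_j,
# so each Metropolis step costs O(degree(i)) instead of O(number of edges).
def delta_energy(neighbors, s, i):
    """neighbors[i] is a list of (j, J_ij) pairs for the spins coupled to spin i."""
    return 2 * s[i] * sum(Jij * s[j] for j, Jij in neighbors[i])
```

Pile on a few more standard moves of this kind—exploiting the fixed, sparse connectivity graph, running many independent restarts in parallel, and so on—and you have the sort of “classical programming elbow grease” that the fair-comparison question is really about.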
Some people might claim it’s “unfair” to optimize the classical simulated annealing code to take advantage of the quirks of the D-Wave problem. But think about it this way: D-Wave has spent ~$100 million, and hundreds of person-years, optimizing the hell out of a special-purpose annealing device, with the sole aim of solving this one problem that D-Wave itself defined. So if we’re serious about comparing the results to a classical computer, isn’t it reasonable to have one professor and a few postdocs spend a few months optimizing the classical code as well?
As I said, besides simulated annealing, the USC group also compared the D-Wave One’s performance against a classical implementation of Quantum Monte Carlo. And maybe not surprisingly, the D-Wave machine was faster than a “direct classical simulation of itself” (I can’t remember how many times faster, and couldn’t find that information in the paper). But even here, there’s a delicious irony. The only reason the USC group was able to compare the D-Wave One against QMC at all is that QMC is efficiently implementable on a classical computer! (Albeit probably with a large constant overhead compared to running the D-Wave annealer itself—hence the superior performance of classical simulated annealing over QMC.) This means that, if the D-Wave machine can be understood as reaching essentially the same results as QMC (technically, “QMC with no sign problem”), then there’s no real hope for using the D-Wave machine to get an asymptotic speedup over a classical computer. The race between the D-Wave machine and classical simulations of the machine would then necessarily be a cat-and-mouse game, a battle of constant factors with no clear asymptotic victor. (Some people might conjecture that it will also be a “Tom & Jerry game,” the kind where the classical mouse always gets the better of the quantum cat.)
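To spell out why “efficiently implementable on a classical computer” isn’t a triviality here, recall the standard path-integral trick—this is textbook material, not anything specific to the USC paper, and conventions for the constants vary: for a transverse-field Ising Hamiltonian, the quantum partition function maps onto a purely classical one with an extra imaginary-time dimension, and all the resulting weights are positive, so Monte Carlo sampling needs no sign cancellations.

```latex
% Standard Suzuki--Trotter mapping for the transverse-field Ising model on N spins,
% with M imaginary-time slices and periodic boundary s_{i,M+1} = s_{i,1}.
% Quantum Hamiltonian:
%   H = -\sum_{\langle i,j\rangle} J_{ij}\,\sigma^z_i\sigma^z_j \;-\; \Gamma\sum_i \sigma^x_i .
\[
Z \;=\; \mathrm{Tr}\, e^{-\beta H}
\;\approx\; C^{NM} \!\!\sum_{\{s_{i,k}=\pm1\}} \exp\!\left(
  \sum_{k=1}^{M}\left[\frac{\beta}{M}\sum_{\langle i,j\rangle} J_{ij}\, s_{i,k}\, s_{j,k}
  \;+\; J_\perp \sum_i s_{i,k}\, s_{i,k+1}\right]\right),
\]
\[
J_\perp \;=\; \tfrac{1}{2}\,\ln \coth\!\left(\frac{\beta\Gamma}{M}\right),
\qquad
C \;=\; \sqrt{\tfrac{1}{2}\sinh\!\left(\frac{2\beta\Gamma}{M}\right)}.
\]
% The off-diagonal matrix elements of e^{(\beta\Gamma/M)\sigma^x} are \cosh and \sinh
% of a positive argument, hence positive: every configuration gets a positive weight.
% That absence of a sign problem ("stoquasticity") is what makes classical QMC
% sampling of this model possible in the first place.
```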
At this point, it’s important to give a hearing to three possible counterarguments to what I’ve written above.
The first counterargument is that, if you plot both the runtime of simulated annealing and the runtime of the D-Wave machine as functions of the instance size n, you find that, while simulated annealing is faster in absolute terms, it can look like the curve for the D-Wave machine is less steep. Over on the blog “nextbigfuture”, an apparent trend of this kind has been fearlessly extrapolated to predict that with 512 qubits, the D-Wave machine will be 10 billion times faster than a classical computer. But there’s a tiny fly in the ointment. As Troyer carefully explained to me last week, the “slow growth rate” of the D-Wave machine’s runtime is, ironically, basically an artifact of the machine being run too slowly on small values of n. Run the D-Wave machine as fast as it can run for small n, and the difference in the slopes disappears, with only the constant-factor advantage for simulated annealing remaining. In short, there seems to be no evidence, at present, that the D-Wave machine is going to overtake simulated annealing for any instance size.
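To make the flattening effect concrete, here’s a toy numerical illustration—entirely made-up numbers, in no way the actual D-Wave or USC data: if the hardware imposes a fixed minimum per-run time, then every small instance reports that floor, and a curve fit through those points looks deceptively flat.

```python
# Toy illustration of how a runtime floor flattens an apparent scaling curve.
# All numbers are invented for illustration; nothing here is measured data.
import math

def intrinsic_cost(n):
    return 1e-6 * math.exp(0.08 * n)   # pretend the true cost grows exponentially in n

FLOOR = 1e-3                           # pretend the machine can't run a single anneal in under 1 ms

for n in (32, 64, 96, 128, 256, 512):
    measured = max(FLOOR, intrinsic_cost(n))
    print(f"n={n:4d}  intrinsic={intrinsic_cost(n):9.2e}s  measured={measured:9.2e}s")

# The small-n rows all sit at the 1 ms floor, so a fit through them has nearly zero
# slope; only once n is large enough to clear the floor does the true growth appear.
# Extrapolating from the flat region is exactly the mistake described above.
```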
The second counterargument is that the correlation between the two “bimodal distributions”—that for the D-Wave machine and that for the Quantum Monte Carlo simulation—is not perfect. In other words, there are a few instances (not many) that QMC solves faster than the D-Wave machine, and likewise a few instances that the D-Wave machine solves faster than QMC. Not surprisingly, the latter fact has been eagerly seized on by the D-Wave boosters (“hey, sometimes the machine does better!”). But Troyer has a simple and hilarious response to that. Namely, he found that his group’s QMC code did a better job of correlating with the D-Wave machine, than the D-Wave machine did of correlating with itself! In other words, calibration errors seem entirely sufficient to explain the variation in performance, with no need to posit any special class of instances (however small) on which the D-Wave machine dramatically outperforms QMC.
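Since “correlating with itself” may sound odd, here’s a hedged sketch of what the comparison amounts to—with synthetic numbers only; I’m inventing the noise levels, not reporting anyone’s data. Treat each instance’s success probability as an underlying hardness plus per-run calibration noise, and compare the device’s correlation with a second run of itself against its correlation with QMC.

```python
# Synthetic illustration of comparing per-instance success-probability correlations.
# The noise magnitudes below are invented; they only illustrate the logic of the check.
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Pretend each instance has a "true" hardness p; each device run adds calibration
# noise on top of it, while the QMC simulation tracks p with less run-to-run noise.
true_p   = [random.random() for _ in range(1000)]
device_1 = [min(1, max(0, p + random.gauss(0, 0.15))) for p in true_p]
device_2 = [min(1, max(0, p + random.gauss(0, 0.15))) for p in true_p]
qmc      = [min(1, max(0, p + random.gauss(0, 0.05))) for p in true_p]

print("device vs itself:", pearson(device_1, device_2))
print("device vs QMC:   ", pearson(device_1, qmc))
# With noise levels like these, the device-vs-QMC correlation comes out higher than
# the device-vs-device correlation -- which is the sense in which calibration error
# alone can account for the handful of outlier instances.
```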
The third counterargument is just the banal one: the USC experiment was only one experiment with one set of instances (albeit, a set one might have thought would be heavily biased toward D-Wave). There’s no proof that, in the future, it won’t be discovered that the D-Wave machine does something more than QMC, and that there’s some (perhaps specially-designed) set of instances on which the D-Wave machine asymptotically outperforms both QMC and Troyer’s simulated annealing code. (Indeed, I gather that folks at D-Wave are now assiduously looking for such instances.) Well, I concede that almost anything is possible in the future—but “these experiments, while not supporting D-Wave’s claims about the usefulness of its devices, also don’t conclusively disprove those claims” is a very different message than what’s currently making it into the press.
Comparison to CPLEX is Rigged
Unfortunately, the USC paper is not the one that’s gotten the most press attention—perhaps because half of it inconveniently told the hypesters something they didn’t want to hear (“no speedup”). Instead, journalists have preferred a paper released this week by Catherine McGeoch and Cong Wang, which reports that quantum annealing running on the D-Wave machine outperformed the CPLEX optimization package running on a classical computer by a factor of ~3600, on Ising spin problems involving 439 bits. Wow! That sounds awesome! But before rushing to press, let’s pause to ask ourselves: how can we reconcile this with the USC group’s result of no speedup?
The answer turns out to be painfully simple. CPLEX is a general-purpose, off-the-shelf exact optimization package. Of course an exact solver can’t compete against quantum annealing—or, for that matter, against classical annealing or other classical heuristics! An exact solver has to find the optimal solution and certify that nothing better exists, while a heuristic only has to return good solutions quickly. Noticing this problem, McGeoch and Wang do also compare the D-Wave machine against tabu search, a classical heuristic algorithm. When they do so, they find that an advantage for the D-Wave machine persists, but it becomes much, much smaller (they didn’t report the exact time comparison). Amusingly, they write in their “Conclusions and Future Work” section:
It would of course be interesting to see if highly tuned implementations of, say, tabu search or simulated annealing could compete with Blackbox or even QA [i.e., the D-Wave machines] on QUBO [quadratic unconstrained binary optimization] problems; some preliminary work on this question is underway.
As I said above, at the time McGeoch and Wang’s paper was released to the media (though maybe not at the time it was written?), the “highly tuned implementation” of simulated annealing that they ask for had already been written and tested, and the result was that it outperformed the D-Wave machine on all instance sizes tested. In other words, their comparison to CPLEX had already been superseded by a much more informative comparison—one that gave the “opposite” result—before it ever became public. For obvious reasons, most press reports have simply ignored this fact.
Troyer, Lidar, and Stone Soup
Much of what I’ve written in this post, I learned by talking to Matthias Troyer—the man who carefully experimented with the D-Wave machine and figured out how to beat it using simulated annealing, and whom I regard as probably the world’s #1 expert right now on what exactly the machine does. Troyer wasn’t shy about sharing his opinions, and while they were couched with qualifications, they tended toward the extremely skeptical. For example, Troyer conjectured that, if D-Wave ultimately succeeds in getting a speedup over classical computers in a fair comparison, then it will probably be by improving coherence and calibration, incorporating error-correction, and doing other things that “traditional,” “academic” quantum computing researchers had said all along would need to be done.
As I said, Daniel Lidar is another coauthor on the USC paper, and also recently visited MIT to speak. Lidar and Troyer agree on the basic facts—yet Lidar noticeably differed from Troyer, in trying to give each fact the most “pro-D-Wave spin” it could possibly support. Lidar spoke at our quantum group meeting, not about the D-Wave vs. simulated annealing performance comparison (which he agrees with), but about a proposal of his for incorporating quantum error-correction into the D-Wave device, together with some experimental results. He presented his proposal, not as a reductio ad absurdum of D-Wave’s entire philosophy, but rather as a positive opportunity to get a quantum speedup using D-Wave’s approach.
So, to summarize my current assessment of the situation: yes, absolutely, D-Wave might someday succeed—ironically, by adapting the very ideas from “the gate model” that its entire business plan has been based on avoiding, and that D-Wave founder Geordie Rose has loudly denigrated for D-Wave’s entire history! If that’s what happens, then I predict that science writers, and blogs like “nextbigfuture,” will announce from megaphones that D-Wave has been vindicated at last, while its narrow-minded, theorem-obsessed, ivory-tower academic naysayers now have egg all over their faces. No one will care that the path to success—through quantum error-correction and so on—actually proved the academic critics right, and that D-Wave’s “vindication” was precisely like that of the deliciousness of stone soup in the old folktale. As for myself, I’ll probably bang my head on my desk until I sustain so much brain damage that I no longer care either. But at least I’ll still have tenure, and the world will have quantum computers.
The Messiah’s Quantum Annealer
Over the past few days, I’ve explained the above to at least six different journalists who asked. And I’ve repeatedly gotten a striking response: “What you say makes sense—but then why are all these prestigious people and companies investing in D-Wave? Why did Bo Ewald, a prominent Silicon Valley insider, recently join D-Wave as president of its US operations? Why the deal with Lockheed Martin? Why the huge deal with NASA and Google, just announced today? What’s your reaction to all this news?”
My reaction, I confess, is simple. I don’t care—I actually told them this—if the former Pope Benedict has ended his retirement to become D-Wave’s new marketing director. I don’t care if the Messiah has come to Earth on a flaming chariot, not to usher in an age of peace but simply to spend $10 million on D-Wave’s new Vesuvius chip. And if you imagine that I’ll ever care about such things, then you obviously don’t know much about me. I’ll tell you what: if peer pressure is where it’s at, then come to me with the news that Umesh Vazirani, or Greg Kuperberg, or Matthias Troyer is now convinced, based on the latest evidence, that D-Wave’s chip asymptotically outperforms simulated annealing in a fair comparison, and does so because of quantum effects. Any one such scientist’s considered opinion would mean more to me than 500,000 business deals.
The Argument from Consequences
Let me end this post with an argument that several of my friends in physics have explicitly made to me—not in the exact words below but in similar ones.
“Look, Scott, let the investors, government bureaucrats, and gullible laypeople believe whatever they want—and let D-Wave keep telling them whatever’s necessary to stay in business. It’s unsportsmanlike and uncollegial of you to hold D-Wave’s scientists accountable for whatever wild claims their company’s PR department might make. After all, we’re in this game too! Our universities put out all sorts of overhyped press releases, but we don’t complain because we know that it’s done for our benefit. Besides, you’d doubtless be trumpeting the same misleading claims, if you were in D-Wave’s shoes and needed the cash infusions to survive. Anyway, who really cares whether there’s a quantum speedup yet or no quantum speedup? At least D-Wave is out there trying to build a scalable quantum computer, and getting millions of dollars from Jeff Bezos, Lockheed, Google, the CIA, etc. etc. to do so—resources more of which would be directed our way if we showed a more cooperative attitude! If we care about scalable QCs ever getting built, then the wise course is to celebrate what D-Wave has done—they just demonstrated quantum annealing on 100 qubits, for crying out loud! So let’s all be grownups here, focus on the science, and ignore the marketing buzz as so much meaningless noise—just like a tennis player might ignore his opponent’s trash-talking (‘your mother is a whore,’ etc.) and focus on the game.”
I get this argument: really, I do. I even concede that there’s something to be said for it. But let me now offer a contrary argument for the reader’s consideration.
Suppose that, unlike in the “stone soup” scenario I outlined above, it eventually becomes clear that quantum annealing can be made to work on thousands of qubits, but that it’s a dead end as far as getting a quantum speedup is concerned. Suppose the evidence piles up that simulated annealing on a conventional computer will continue to beat quantum annealing, if even the slightest effort is put into optimizing the classical annealing code. If that happens, then I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera. The same pointy-haired bosses who now flock toward quantum computing, will flock away from it just as quickly and as uncomprehendingly. Academic QC programs will be decimated, despite the slow but genuine progress that they’d been making the entire time in a “parallel universe” from D-Wave. People’s contempt for academia is such that, while a D-Wave success would be trumpeted as its alone, a D-Wave failure would be blamed on the entire QC community.
When it comes down to it, that’s the reason why I care about this matter enough to have served as “Chief D-Wave Skeptic” from 2007 to 2011, and enough to resume my post today. As I’ve said many times, I really, genuinely hope that D-Wave succeeds at building a QC that achieves an unambiguous speedup! I even hope the academic QC community will contribute to D-Wave’s success, by doing careful independent studies like the USC group did, and by coming up with proposals like Lidar’s for how D-Wave could move forward. On the other hand, in the strange, unlikely event that D-Wave doesn’t succeed, I’d like people to know that many of us in the QC community were doing what academics are supposed to do, which is to be skeptical and not leave obvious questions unasked. I’d like them to know that some of us simply tried to understand and describe what we saw in front of us—changing our opinions repeatedly as new evidence came in, but disregarding “meta-arguments” like my physicist friends’ above. The reason I can joke about how easy it is to bribe me is that it’s actually kind of hard.