Quantum Machine Learning Algorithms: Read the Fine Print

So, I’ve written a 4-page essay of that title, which examines the recent spate of quantum algorithms for clustering, classification, support vector machines, and other “Big Data” problems that grew out of a 2008 breakthrough on solving linear systems by Harrow, Hassidim, and Lloyd, as well as the challenges in applying these algorithms to get genuine exponential speedups over the best classical algorithms.  An edited version of the essay will be published as a Commentary in Nature Physics.  Thanks so much to Iulia Georgescu at Nature for suggesting that I write this.

Update (April 4, 2015): The piece has now been published.

60 Responses to “Quantum Machine Learning Algorithms: Read the Fine Print”

  1. Gil Kalai Says:

    Two questions about HHL: First, an important class of sparse linear systems, with recent very important improvements and applications, is Laplacian equations. Does HHL offer something specific for Laplacian equations, and if so, can it be used as a subroutine for good quantum matching and flow algorithms? Second, can the speedup achieved by HHL be proved (under some standard CC assumptions)?

  2. Joe Says:

    Very clear, thanks.

  3. Scott Says:

    Gil #1: Yes, people have certainly considered applying HHL to sparse Laplacian systems—see in particular this paper by Guoming Wang, which I referenced in my article. However, I don’t know what (if anything) HHL offers that’s special for those systems, or which (if any) of its parameters improve—maybe someone else can chime in about that.

    Regarding provable speedup under complexity assumptions: that’s exactly the point I was making in the article, about the problem solved by HHL being BQP-complete (something that HHL showed in their original paper). In particular, as long as you believe BPP≠BQP, you can always take whatever problem you think witnesses that separation, and reduce it to solving some sparse, well-conditioned, exponentially large, explicitly described linear system. On the other hand, as I also said in the article, if you do this, then you could argue that the “real insight” behind the speedup is coming from Shor’s algorithm or whatever other quantum algorithm you started with, not from HHL.

  4. yme Says:

    Near the top of page 3, the sentence “The most they could say that they couldn’t find such a classical algorithm.” is missing an “is”.

  5. Scott Says:

    yme #4: Thanks!! Fixed.

  6. Gil Kalai Says:

    Hi Scott, regarding the second question, I was asking about “provable” in the sense available for FourierSampling or BosonSampling or related tasks: a proof that if a classical computer can perform the task then some collapse occurs. This is unavailable for BQP-complete questions, but the computational complexity status of the HHL algorithm (and its subroutines) is not that clear to me (since solving a linear system is not a decision problem). Also, in view of the general nature of the HHL algorithm, it seems like a good candidate for trying an actual proof of hardness.

  7. Scott Says:

    Gil #6: Well, you could also view HHL as solving a BQPsampling-complete sampling problem (and I believe HHL even make that point explicitly in their paper). So you could take BosonSampling, or Bremner-Jozsa-Shepherd, or whatever else is your favorite sampling-based hardness result for simulating quantum computers, and realize it using HHL.

    If you’re willing to assume black-box access to the sparse matrix that’s to be inverted (and in particular, that the black box gives you pointers to where the nonzero entries are), then you should also be able to use HHL to realize an unconditional oracle separation between BPP and BQP (or scaling down, a problem solvable in quantum logarithmic time that unconditionally requires, say, √n time classically).

    All these are just different ways of saying that HHL captures the whole power of quantum computation, full stop.

  8. Jay Says:

    Near the bottom of page 2, is the sentence “but also that (1) the appropriate matrix A is sparse;” missing a “not”?

  9. Scott Says:

    Jay #8: No, I don’t think so. For HHL to work, you want A to be sparse.

  10. Jay Says:

    My misread, thanks.

  11. Gil Says:

    (#7) Is anything about hardness/easiness known for the kind of problems HHL solves, for linear equations based on random sparse matrices or on Laplacian matrices of random graphs?

  12. Miquel Says:

    Thanks for sharing and well done – I found it very informative *yet* concise and readable.

  13. Scott Says:

    Gil #11: Good question. I’m guessing that a random sparse matrix is well-conditioned w.h.p.? If so, and if you also had an oracle that told you where the nonzero entries were, then yes, HHL should work (as always, to give you specific properties of the solution vector x, not all of x). So then the only remaining question would be whether this gives rise to a hard problem classically.
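
    To spell out what “specific properties” means here: HHL prepares (an approximation to) the normalized solution state, and what you then estimate are expectation values in that state,

    \[ |x\rangle \;=\; \frac{A^{-1}|b\rangle}{\bigl\| A^{-1}|b\rangle \bigr\|}, \qquad \text{output} \;\approx\; \langle x | M | x \rangle \]

    for some observable M that you can measure efficiently; reading out all N entries of x individually would require on the order of N repetitions.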

  14. aram Says:

    I think random sparse Hermitian matrices and Laplacians of random graphs have eigenvalues distributed according to the semicircle law, which means condition number scaling as poly(dim).

    Anyway, no average-case classical hardness is known for the HHL problem, but that’s kind of like saying that no average-case classical hardness is known for BQP.
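
    For anyone who wants to see that scaling empirically, here is a rough numpy sketch (the ensemble, with fixed row degree and random ±1 weights, and the sizes are arbitrary illustrative choices, not anything from the papers):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_sparse_hermitian(n, d=4):
        """Real symmetric matrix with roughly d random +/-1 entries per row."""
        A = np.zeros((n, n))
        for i in range(n):
            cols = rng.choice(n, size=d, replace=False)
            A[i, cols] = rng.choice([-1.0, 1.0], size=d)
        return (A + A.T) / 2.0

    # Condition number (largest / smallest nonzero |eigenvalue|) vs. dimension;
    # per the semicircle-law intuition it should grow polynomially with n.
    for n in [64, 128, 256, 512]:
        w = np.abs(np.linalg.eigvalsh(random_sparse_hermitian(n)))
        kappa = w.max() / w[w > 1e-10].min()
        print(n, round(kappa, 1))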

  15. Rahul Says:

    Was a very nicely written & very readable paper, even for the non-expert.

    I just felt that one part of the summing-up was somewhat in tension with the rest of the essay:

    HHL and its offshoots….in a world with quantum computers, they’d probably find practical uses.

    After reading all the caveats & limitations of HHL the feeling I was tending towards by the end of the essay was that it might be very difficult to find a real problem that fit all its constraints.

  16. Scott Says:

    Rahul #15: Well, in a world with QCs, you’d also have a huge number of people looking for practical uses of these algorithms! And given that we already know one or two potential applications (e.g., the Clader et al. E&M scattering cross-sections and Guoming Wang effective resistance things), it seems much more likely to me than not that we’d find others. I chose my language carefully; I didn’t say these algorithms would revolutionize all of statistics and scientific computing, just that they’d “probably find practical uses.”

  17. Ajit R. Jadhav Says:

    Dear Scott,

    I second Miquel #12 in toto. Thanks. This essay has clarified many things to me, esp. those concerning quantum computing for the linear system [A]{x} = {b}, and their relation to the other problems of QCing (most of which I don’t understand).

    But now that I have gotten going writing, maybe a question, even if it is a bit lazy and/or dumb: What is the minimum size of the linear system with which today’s PK cryptography could be broken?

    Also, it was noticeable that you more than once visited the issue of comparing only against the speed of the best possible classical algorithm, even if only for making theoretical comparisons. [It was this part that [finally] helped me gain clarity as to what your stand w.r.t. certain quantum computing companies is really really like! [Did I say something provocative, again?]] …

    Ummm, but anyway, though it might look like kid’s play to you, still, for completeness, I would suggest that you could perhaps add a line or two about multi-gridding in classical computing. Also, about the classical stochastic relaxation approaches, if only a probabilistic estimate of {x} is being sought.

    And, oh, BTW, thanks also to Iulia for making this request. … In case she reads this comment, I would like to see a similar informative essay about the PDC photons, their dynamics and uses, too.

    Best,

    –Ajit
    [E&OE]

  18. Rahul Says:

    Scott:

    Naive question: Just like people are demonstrating Shor on test cases by factoring small numbers, is there a way to try out HHL on baby systems? Even before we have a true, scalable QC.

    What’s the easiest, most accessible system Ax=b that fits all the constraints, on which one could demonstrate HHL in practice? And has anyone tried this?

  19. Job Says:

    At a glance, HHL doesn’t seem particularly impressive unless any vector b can be efficiently loaded into memory (on top of your other points).

    Would it be taken seriously outside of a Quantum context?

    Hey I have a polynomial-time solution for graph-isomorphism, here it is: access isomorphisms in sorted order and do a binary search.

    It’ll work for graphs whose isomorphisms are sorted according to a chosen access function.

    For the others all we need is a way to get the isomorphisms in sorted order without actually sorting them. 🙂

    Is that a good analogy?

  20. Scott Says:

    Ajit #17:

      What is the minimum size of the linear system with which today’s PK cryptography could be broken?

    I don’t know exactly, but if you were extremely clever, maybe somewhere on the order of 10^10000 × 10^10000, requiring a quantum computer with a few tens of thousands of qubits? At any rate, you’re not going to get any advantage over the number of qubits needed just to implement Shor’s algorithm directly, without embedding it into HHL. And you’ll suffer a loss compared to a direct implementation. But the loss will “only” be by a polynomial factor, not by an exponential one.
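
    For the record, the arithmetic behind that guess (ignoring the ancilla registers that phase estimation and error correction would add): an N-dimensional vector is indexed by log2 N qubits, and

    \[ N = 10^{10000} = 2^{\,10000 \log_2 10} \approx 2^{33219}, \]

    so just indexing the solution vector already takes roughly 33,000 qubits, i.e. a few tens of thousands.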

  21. Scott Says:

    Rahul #18:

      What’s the easiest, most accessible system Ax=b that fits all the constraints, on which one could demonstrate HHL in practice? And has anyone tried this?

    Here’s a paper reporting an experimental demonstration of HHL with 2×2 matrices (I was about to write that, to my knowledge, no one had tried this yet, but then I wisely did a Google search 🙂 ). I believe 3×3 matrices could also be handled with current technology, if anyone cares enough…

    If you had a somewhat larger quantum computer, say 30-50 qubits, and wanted to demonstrate HHL, then I dunno, maybe I’d suggest doing a small-scale demonstration of the Clader et al. electromagnetic scattering algorithm, or Guoming Wang’s algorithm for effective resistances, both of which build on HHL, and both of which are discussed in my article.

  22. Scott Says:

    Job #19: No, that’s not a good analogy. I don’t understand what you mean by “accessing isomorphisms in sorted order”—even given an exponentially long, sorted list of canonically permuted graphs, still, how would you know where to query that list given G as input? But let’s suppose you simply meant, constructing a giant lookup table encoding the answers to all GI instances of a given size.

    Then this is an approach that clearly gives you no advantage whatsoever, compared to just solving GI on the instance you care about, once you factor in the cost of constructing the lookup table. (Except in the limit where you have super-exponentially many GI instances, and you want to amortize the costs.)

    By contrast, it’s virtually certain that HHL sometimes gives you an exponential advantage over constructing the system Ax=b and solving it classically—if only because of the fact, which I discussed in the article, that HHL is universal for quantum computation. So then the question becomes one of how wide are the cases where HHL gives you such an advantage, or where your linear system can be cleverly massaged into a form where it does so. As I tried to convey in the article, that’s not a question with a simple answer.

  23. Ajit R. Jadhav Says:

    Rahul #18:

    Guess you guys from the TCS background don’t know about the Matrix Market database: http://math.nist.gov/MatrixMarket/ .

    This site has links to other collections too. Notable here is Tim Davis’ collection (Uni. of Florida): http://www.cise.ufl.edu/research/sparse/matrices/. Check out the eye candies right on the home page. For the more serious reader, there’s a paper of his with 1000+ citations (per Google).

    The best system for the classical computer, of course, is the tridiagonal system, say, as generated by a 1D assemblage of n identical linear springs that are joined end-to-end with pin joints (i.e. the same thing as a 1D truss), loaded statically (with axial forces) at the nodes, and with the node numbers monotonically increasing in the physical order (say, from left to right). This system also has very neat condition-number (and related) properties. … Maybe one of these QCs could give it a try someday ;-).
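
    For concreteness, here is the standard O(N) forward-elimination / back-substitution (the Thomas algorithm) for exactly such a system, with the fixed-free spring chain as a toy example; this is a bare illustrative Python sketch, not production FEM code:

    import numpy as np

    def solve_tridiagonal(a, b, c, d):
        """Thomas algorithm: solve a tridiagonal system in O(N) operations.
        a: sub-diagonal (length N-1), b: diagonal (length N),
        c: super-diagonal (length N-1), d: right-hand side (length N)."""
        n = len(b)
        cp = np.empty(n - 1)
        dp = np.empty(n)
        cp[0] = c[0] / b[0]
        dp[0] = d[0] / b[0]
        for i in range(1, n):
            m = b[i] - a[i - 1] * cp[i - 1]
            if i < n - 1:
                cp[i] = c[i] / m
            dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
        x = np.empty(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    # Toy example: N identical springs (stiffness k), left end fixed,
    # unit axial load at the free end; the stiffness matrix is tridiagonal.
    N, k = 8, 100.0
    diag = np.full(N, 2.0 * k); diag[-1] = k
    off = np.full(N - 1, -k)
    f = np.zeros(N); f[-1] = 1.0
    print(solve_tridiagonal(off, diag, off, f))  # displacements grow linearly toward the free end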

    Best,

    –Ajit
    [E&OE]

  24. GASARCH Says:

    Which problems that people actually want to solve do you think quantum computers have the best chance of solving quicker than classically, once they are built?
    I would guess Factoring and Quantum Simulations, but are there others? Are there problems with those two?

    bill g.

  25. Scott Says:

    GASARCH #24: The safest bet is some form of quantum simulation (say for lattice gases or quantum chemistry), since we have high confidence in a speedup there, and since you probably don’t even need a fault-tolerant universal QC before you can start seeing significant advantages. After that, I guess I’d go with factoring.

  26. Rahul Says:

    Other than breaking existing cryptography, are there other, more constructive uses for faster-than-classical factoring?

  27. Rahul Says:

    Scott #21:

    Thanks for the link!

    Maybe this other work closer to home for you (Harvard, Boston) is another interesting experimental demo of HHL (found the link in the References of the paper you linked to).

    http://arxiv.org/pdf/1302.1210v1.pdf

    I like that there seems to be interest in trying to do the nitty-gritty hard work to actually implement the algorithm. OTOH, maybe these guys should read your essay. It seems to me that some of their exuberance could be tempered a bit:

    e.g.

    A quantum algorithm has been designed such that the value of this property may be estimated to any fixed desired accuracy within O(log(N)) time, making it one of the most promising applications of quantum computers

    Our work thus demonstrates the implementation of a quantum algorithm with high practical significance as well as an important technological advance

  28. Scott Says:

    Rahul #26: Well, a fast factoring algorithm would surely help with various numerical investigations in number theory. And there are other, more involved problems that are solvable in classical polynomial time given a factoring oracle: for example, the membership problem for matrix groups over fields of odd order (by recent work of Babai, Beals, and Seress), and also the solution of certain Diophantine equations (which, extremely ironically, arise when you’re trying to design small quantum circuits!). Because of Shor’s algorithm, all these problems are immediately in BQP as well.

  29. Job Says:

    Scott #22, I think you’re misunderstanding how good the analogy really is. 🙂

    Let me expand. We can produce a function f(g, i) such that, given a graph g, f(g, i) produces its ith isomorphism. What’s the ith isomorphism of g? Whatever f(g, i) wants it to be.

    The only requirement is that f(g, i) output all valid isomorphisms of g for some range of i, such as [0, 2^(n^2)]. It can even output some isomorphisms repeatedly as long as it produces all of them.

    Suppose that, for a given f, we take a random graph and print its isomorphisms, with i starting at zero and onward. Then we would find that, for some graphs, f produces a sorted list of the graph’s isomorphisms (e.g. sorted by the isomorphic graph’s binary value).

    If we’re given two graphs g1 and g2, we can use f to find out if they’re isomorphic. We first evaluate f(g1, c), where c is the midway point in the range. If f(g1, c) is less than g2 then g1 and g2 are isomorphic only if g2 exists in the second half. We repeat this process, performing a binary search over the range of f for g1.

    Most of the time this won’t work, because f(g1, i) is not sorted. But, if we identify a class of graphs for which there is such an f, then we’ll be able to solve graph isomorphism in polynomial time.

    In this case f is analogous to the function which efficiently loads a class of b vectors into Quantum Memory. We can only load b if it has some structure that we can exploit.

    Using HHL we can only solve systems for which we have a loader function, and using a binary search we can only solve graph isomorphism instances for which we have a valid access function.

    In HHL, if we don’t have a loader function we’ll have to spend a large amount of time loading the vector. In binary search, if we don’t have an access function we’ll have to spend a large amount of time sorting the isomorphisms.

    I probably got some details wrong, but come on, the analogy is not that bad.
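
    In rough Python, the analogy would look something like this, where f stands in for the hypothetical access function and graph_key is just one arbitrary total order on graphs (the binary value of the adjacency matrix); obviously none of this is a real GI algorithm:

    def graph_key(g):
        """Total order on graphs: the integer whose bits are the adjacency matrix."""
        n = len(g)
        return int("".join(str(g[i][j]) for i in range(n) for j in range(n)), 2)

    def isomorphic_via_binary_search(g1, g2, f, num_isos):
        """IF f(g1, i) happens to enumerate the isomorphic copies of g1 in sorted
        order over i in [0, num_isos) (a huge 'if'), then binary search decides
        whether g2 appears among them, i.e. whether g1 and g2 are isomorphic."""
        lo, hi = 0, num_isos - 1
        target = graph_key(g2)
        while lo <= hi:
            mid = (lo + hi) // 2
            key = graph_key(f(g1, mid))
            if key == target:
                return True
            elif key < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return False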

  30. Rahul Says:

    I was looking at the actual quantification of algorithm performance in the experiments.

    In the Cai et al. experiments, as well as the Barz et al. setup, the output fidelities for some cases seem as low as 0.8, i.e. 80% correctness relative to ideal outcomes.

    So not only can we not “solve” Ax=b in the conventional sense but the specific observables we get are only 80% correct? Is this the right interpretation?

    And this is for 2×2 matrices. Is there a heuristic to predict how these fidelities will scale as we go to larger matrices? i.e. How does ε increase with n?

    Are these errors easy to get rid of by tweaking the initial experimental messiness? Or something more fundamental?

    If high-order photon emission events are the culprits these will stay with us, right?

  31. Scott Says:

    Job #29: If the function f really produces a sorted list of permutations of a given graph, then isn’t graph isomorphism even easier than you say? To check whether G and H are isomorphic, just check whether f(G,1)=f(H,1).

    OK, so at some formal level, the two things are analogous: you’ve given an abstract template for graph isomorphism algorithms with major details still to be filled in, while HHL gave an abstract template for solving linear systems with a quantum computer with major details still to be filled in. The difference, then, is just that your template is completely trivial; all the real work is offloaded onto the construction of f. Whereas the HHL algorithm contains nontrivial ideas (decomposing |b⟩ into eigenvectors, using quantum phase estimation to estimate the eigenvalues, using those to apply A^{-1}), even if in some sense the majority of the work is offloaded onto the construction of |b⟩, e^{-iAt}, and the measurement procedure.
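
    In symbols: writing A|u_j⟩ = λ_j|u_j⟩ and |b⟩ = Σ_j β_j|u_j⟩, phase estimation on e^{-iAt} attaches an eigenvalue estimate to each branch, and the controlled rotation plus uncomputation then implement, up to normalization (and with a success probability controlled by the condition number),

    \[ \sum_j \beta_j \,|u_j\rangle \otimes |\tilde{\lambda}_j\rangle \;\longrightarrow\; \sum_j \frac{\beta_j}{\lambda_j}\,|u_j\rangle \;\propto\; A^{-1}|b\rangle \;=\; |x\rangle . \]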

  32. Ajit R. Jadhav Says:

    Looks like people didn’t pick up the hints in what I wrote above. Please allow me to write at length. (After all, you guys don’t very often discuss the [A]{x} = {b} system on the TCS threads, do you?)

    To those who say that the quantum computer would be great for the linear system [A}{x} = {b}, let me play a devil’s advocate.

    The tridiagonal system has O(N) complexity. Not all tridiagonal systems would be well-conditioned (think: large differences in the stiffnesses of the springs in the 1D truss, or if you are more EE-inclined, large differences in the resistances of a 1D passive DC network). Yet, even for a not-so-well-conditioned tridiagonal system, observe, the true game of having to invert the matrix (or doing something similar) is actually absent. And, classically, the tridiagonal system has O(N) complexity—and you get a *deterministic*, i.e., (numerically) *exact* solution.

    If you ask any meatspace/brick-and-mortar/etc. engineer (esp. one like me), that feature of an exact solution hardly matters to us, because we are actually happy with an approximate solution, too. There are some good reasons for that.

    Consider FEM, arguably the most general approach (compared to FDM and FVM) that generates the [A]{x} = {b} system in the first place.

    A great number of problems of practical engineering importance are linear, in the sense that a functional form is available for the governing differential equations (Poisson, beam, and biharmonic being the typical work-horse equations). For such cases, the approximation inherently built into the FE model is *provably* convergent to the “exact” (i.e. analytical) solution as the mesh is refined (i.e. as the system size for the [A]{x} = {b} increases). Perhaps more important for practice, this convergence is *provably* only one-sided and monotonic. What those two features together imply is that we *know in advance* that (for the usual displacement-primary formulation) FEM will over-estimate the *stresses* and under-estimate the *displacements*. Since design-for-strength is practically a much more important (or frequent) concern than design-for-stiffness, such features of FEM go nicely along the thinking lines of the practical design engineer.

    Thus, for the linear (diff eqn) problems, FE model itself *is* approximate even *before* getting to the final [A]{x} = {b} form, and the inaccuracy built into the model happens to work in the same direction as the logic of the factor of safety—*not* in a conflicting or opposite direction. In short, FEM itself errs, but it *provably* errs on the *safe* side. That’s why, approximate solutions (of the FEM kind) are *perfectly* OK by us. How approximate? Difficult to tell, but my own guess is that 2–5% would be OK by most practical design engineers, and perhaps even 10% or even 15% (in certain cases, guided by his judgement). Compared to the analytical solutions, that is.

    In case the physical situation is such that there is no known functional form, as in the nonlinear equations (e.g. Navier-Stokes), there anyway is a further approximation coming in, due to the iterations to handle the nonlinearity. Another point. Gil (#1) was wondering about the Laplacians. Well, all CFD solvers these days use only iterative techniques—i.e. *approximate* solutions—during the intermediate stage of repeatedly solving the Poisson equation, in resolving the velocity-pressure coupling. None uses any of the exact Poisson equation solvers. That’s because (i) the exact solvers (even for the linear equation, i.e. the Poisson equation) take far longer, and (ii) it’s only an intermediate stage (in a way). Still, this intermediate stage is what consumes the most time in the current CFD solvers. That’s another reason why we go in for approximate solutions—nothing better is available here.

    To summarize, practically speaking, engineers are quite fine with approximate solutions.

    Where is the devil then? After all, I am saying something in favour of the approximate solutions, and the QC solutions *are* approximate…

    Well, the devil is here:

    The matrix inversion is, more or less, an O(N^3) problem. Forget for the time being the fact that the exponent is down to something like 2.7-odd—which is a good thing as far as weaving more closely, and therefore more robustly, the fabric of the basic maths theory goes, but it is hardly of any direct practical significance to engineers as of today. So, it is an O(N^3) problem. But we already have (algebraic) multi-gridding (MG), which has O(N) complexity. Unlike the tridiagonal system, MG gives us only an approximate solution. But approximations don’t matter much to us. And, just like tridiagonal, MG *is* just O(N).

    Therefore, if it’s going to take O(N) time to prepare the |b> quantum state, engineers are definitely going to say to the QC community: thanks, but no, thanks.

    We might be OK with approximate solutions, but we are even more OK with established techniques. That’s the reason why the classical stochastic solvers remain just a curiosity; none actually bundles it in his FEM or CFD software—not even on the SourceForge.

    And it was to highlight this O(N) part that I pointed out the tridiagonal system above—the intent was not to make the scales tilt against QC, but to give a good datum for comparison. The tridiagonal system fits better than MG because it is deterministic, and hence, its solutions are repeatable—good for making comparisons.

    Does this all make for a depressing scenario for a QC enthusiast?

    In a way, yes.

    Yet, I think, it is also possible to play a devil’s devil’s [sic] advocate too, and thereby get some argument in favour of the QC for linear systems—an argument that might cheer up the QC enthusiasts a bit.

    But let me stop here for the time being, and (just for fun) wait for a day or so (at the most), and see if any one else also thinks of the same point. … I mean, it’s no big deal, the point is almost trivial and all, but still, just to have some fun….

    Best,

    –Ajit
    [E&OE]

  33. Ajit R. Jadhav Says:

    Just a correction to my reply above (#32).

    In the paragraph beginning with “And it was to highlight this O(N) part…”, the paragraph-ending line should read:

    “The tridiagonal system fits better than MG because it is *exact*, and hence, its solutions are repeatable—good for making comparisons.” (I wrote “deterministic” in place of the correct “exact”.)

    (Also, other typos like [A} instead of [A] have crept in, but they are minor.)

    Sorry for that.

    Best,

    –Ajit
    [E&OE]

  34. Job Says:

    If the function f really produces a sorted list of permutations of a given graph, then isn’t graph isomorphism even easier than you say? To check whether G and H are isomorphic, just check whether f(G,1)=f(H,1).

    That depends on whether the class of graphs for which f produces a sorted list is closed under isomorphism right? G might be a member, but H might not.

    Your point is that HHL is worth more than i’ve been arguing for but less than popular perception. Sure, but isn’t HHL as much of a solver for Linear Systems as binary search is a solver for Graph Isomorphism?

  35. Rahul Says:

    Job #34:

    I think it’d be nice to get rid entirely of the “Quantum Machine Learning Algorithms” rubric.

    A more valid title for HHL / HHL related articles might be something like “Quantum Approximate Solver for Linear Systems”

    But that’s not as catchy I guess.

    In hindsight, calling HHL a machine learning algorithm is like calling a carburetor an automobile.

  36. Gil Says:

    HHL indeed looks to me like a genuinely new quantum algorithm that (if QCs could be built) could lead to various applications for real-life problems, or serve as a useful subroutine in other quantum algorithms. It can be especially useful when the sparse matrix is very structured (e.g. matrices coming from number-theory questions).

    HHL can also be useful for applications where you do not attempt to obtain the full exponential speedup. From glancing at Scott’s four caveats, it looks like they are significantly less harsh if all you want is a modest speedup. In many applications of equation solvers, even a modest speedup can have an immense effect.

    Complexity-wise, we do not expect that HHL (or any other quantum computation) will go beyond BQP (for decision problems), so a new quantum algorithm primarily explores new areas in the (what we believe) largely uncharted BQP territory. Since the computational power of quantum computers is sometimes better manifested for other algorithmic tasks (such as sampling), it would be interesting to see whether, for the task of solving equations, HHL gives some (direct) new hardness results (without going through sampling).

  37. Rahul Says:

    Gil#36:

    “In many applications of equation solvers even modest speed up can have immense effect.”

    But conventional equation solvers give you all elements of x for Ax=b, right?

    Whereas HHL is only giving you access to certain aggregate observables?

  38. John Sidles Says:

    Ajit R. Jadhav (#32) observes [sagely] “It is also possible to play a devil’s devil’s (sic) advocate too, and thereby get some argument in favour of the QC for linear systems — an argument that might cheer up the QC enthusiasts a bit. “

    Ajit’s observations will remind Monty Python fans of the hilarious question “Did somebody say ‘mattress’ to Mr. Lambert?” (from the Python sketch “Buying a Bed” in Episode Eight, 1969).

    A Pythonesque interpretation of Ajit’s observations identifies universal fault-tolerant quantum computation with the “Jerusalem” of “Buying a Bed”. As usual with humor, there is a sobering point: the consensus of QIP 2015 — which the Quantum Pontiffs are ably live-blogging — is that scalable fault-tolerant quantum computation is “decades away” (at a minimum).

    An alternative view of Ajit’s observations — a view which is both Jerusalem-compatible and enterprise-viable (as it seems to me) — is to focus upon humanity’s “mattress-class” of iterative linear-solver algorithms (which are employed universally in the FE solvers that Ajit’s comment discusses, for example). These solvers rely crucially on ingenious preconditioning transforms that embody deep (but alas, often proprietary) physical insights. Anne Greenbaum’s discussion in Iterative Methods for Solving Linear Systems (1997) surveys this algorithmic discipline:

    1.1.3 Preconditioners The tools used in the derivation of preconditioners are much more diverse than those applied to the study of iteration methods. […] Many of the most successful preconditioners have been derived for special problems classes, where the origin of the problem suggests a particular type of preconditioners. […] Since one cannot assume familiarity with every scientific application, a complete survey is impossible. […] The problem of generalizing application-specific preconditioners to a broader setting remains an area of active research.

    It’s good news for young quantum researchers that QIP 2015 researchers are (in effect) lifting mattress-class preconditioning methods into the quantum regime. See, for example, Steve Flammia’s Day 4 review of Isaac Kim’s “On the informational completeness of local observables” (arXiv:1405.0137).

    The mathematical point is that quantum trajectory unravelling of matrix-product states requires, at each time-step, the solution of a large-dimension linear system of equations that governs the lift of a one-form \(dH\) to a tangent vector \(\partial_t \psi\).

    This is true for classical unravellings too, but the nice thing about quantum trajectory unravellings is that Nature does not require (in Greenbaum’s phrase) “familiarity with every scientific application.” Rather, the particular case of QED Hamiltonians suffices for the great majority of practical applications, including all of condensed matter physics, all of chemistry, all of biology, and every proposed mechanism (known to me) for fault-tolerant quantum computation.

    At present the problem of QED trajectory unravelling upon matrix product state-spaces is an example of what Steven Weinberg’s advice-to-young-scientists essay “Four golden lessons” (Nature 2003) calls “rough water”, and “a mess” whose principles (like area laws) are at present not well understood.

    Weinberg advises young researchers to steer toward these rough waters, not away from them.

    “My advice is to go for the messes — that’s where the action is.”

    Conclusion  Even after the quantum community’s arrival at some future fault-tolerant “Jerusalem” — and surely during the decades (centuries? millennia? eternities?) leading up to that arrival — the messy “quantum mattress economy” of “rough water” QED trajectory-unravelling will present opportunities for young scientists to do good research, young mathematicians to prove deep theorems, and young entrepreneurs to launch great enterprises.

  39. Ajit R. Jadhav Says:

    0. A novel of a reply, just for one last time (I promise!)

    1. Playing a devil’s advocate to the devil’s advocate (i.e. arguing in favour of QC):

    This point was mainly motivated by Scott’s observation in the essay: “Classically, this goal seems hopeless, since n^2 steps would be needed even to examine all the entries of [A].” That is, focusing on the *space* part.

    How do you square Scott’s observation with the fact that, as reported in the literature, MG gives you O(N), and not O(N^2)?

    For one thing, the reported MG complexity refers to the number of flops, not of memory references. (Maybe Scott needs to clarify this issue a bit.)

    For another thing, MG has been developed in the context of the FEM-like equations, which produce only a *sparse* matrix. In these cases, as the Matrix Market database should make clear, the “oracle” to tell the value of A[i][j] doesn’t have to possess a prodigious O(N^2) memory; for FEM, the number of filled entries is just O(N). This fact may have a bearing on the time-complexity of MG. … Unfortunately, except for some broad conceptual outlines, I really don’t know MG all that well—I haven’t studied the nuts-and-bolts of the algorithm. But, maybe, this could be the reason for its reported O(N) time-complexity.

    And, so, I thought, how about the cases where the [A] matrix is both large and *dense*?

    Just the way the tridiagonal system is one extreme example (1D truss => sparse [A]), what kind of [A] matrix would lie on the other extreme end of the spectrum, even if only for bench-marking purposes? Obviously, it has to be a 100% filled matrix. How do you generate that?

    CS guys would know: Say, take a circle, put N number of nodes on its circumference, and then, join each node to *every* other node. This will fill the entire [A] matrix. For an even more extreme case, have the to- and fro- edges between the same pair of nodes carry different weights (which doesn’t happen in FEM-like contexts, but it might occur in some other graph-theoretic contexts from CS). And, if you wish (if it helps in algorithm analysis), in either case (sparse or dense) make the filled entries random in some way (not necessarily with a uniform PDF but one with spikes and all, too).
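
    As a throwaway numpy illustration of that benchmark construction (the log-normal weights are just one arbitrary way to get a spiky, non-uniform distribution):

    import numpy as np

    rng = np.random.default_rng(1)

    def dense_benchmark_matrix(N, symmetric=True):
        """Weighted complete graph on N nodes: every off-diagonal entry is filled.
        With symmetric=False the to- and fro- edges get independent weights."""
        A = rng.lognormal(mean=0.0, sigma=2.0, size=(N, N))  # spiky, non-uniform weights
        if symmetric:
            A = (A + A.T) / 2.0
        np.fill_diagonal(A, 0.0)  # no self-loops; purely a benchmarking choice
        return A

    A = dense_benchmark_matrix(200)
    print(A.shape, np.count_nonzero(A) / A.size)  # essentially 100% filled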

    For this kind of a fully dense [A] matrix, how well does the MG fare? … I have absolutely no idea!

    But, thinking aloud, the algorithm couldn’t now skip over most of the O(N^2) entries, because they aren’t zeros any longer. And, so, I thought, maybe the convergence behaviour of the MG algorithm would get challenged. Maybe it could still “theoretically” be O(N), but the factor sitting in front of that O(N) would be so large as to effectively land the whole thing in O(N^2) or worse, anyway. … As I said, I don’t really know (and so should keep my mouth shut), but still, that’s how I happened to think.

    And therefore, here, a QC could have some kind of an advantage, I thought, because it would cost at most only O(N) (for uploading |b>, and downloading and working with |x>). The DD-advocacy is over.

    2. Yet, something else also struck me in the meanwhile, once again, against a QC.

    QC is approximate, just like MG. However, QC also is *stochastic*, unlike MG. Implication: the {x} entries for two different runs wouldn’t be repeatable—even if you run some kind of a smoothing operation such as the wavelets processing (which itself runs “only” in O(N)). This unrepeatability feature is unlike the classical stochastic algorithms, which rely on the pseudo-RNGs (basically deterministic).

    Engineers are fine with approximations, but for mass-market applications they would still insist on repeatability. For implementation of ISO standards, for exchanging data between different parties (e.g. between designers and users of the designs), or for a product-integrator coordinating different third-party supplies, repeatability would practically become a very important concern.

    Hence, QC won’t be very suitable here. … Not very surprising. These aren’t problem areas for which the classical complexity itself is like k^N; it simply is N^k. So, you can’t possibly expect highly dramatic improvements here anyway.

    3. However, for the very-large-and-dense matrix applications (which aren’t mass-market and don’t require coordination to out-of-house parties), and perhaps even for certain niche and very-stupendously-large-and-sparse matrix applications, QCs could perhaps have a clear-cut edge. For instance, for in-house consumptions like drugs discovery in the pharma industry, or for very large data processing/mining (think Google/analytics/etc.), and so on.

    4. Overall, for these reasons, I think that the inevitable conclusion is to agree with Scott: apart from simulating QM, QCing wouldn’t transform the computational landscape the way PCs and classical parallel processors (GPGPUs, clusters, supercomputers) have.

    5. OK, so… Let me now shut up and hurry back into the wings (and into the audience quickly thereafter), lest the Peter Principle kicks in—if it hasn’t already!

    Best,

    –Ajit
    [E&OE]

  40. rrtucci Says:

    Gil Kalai doesn’t believe that quantum computers will work because of noise, but he believes that HLL will work. Makes perfect sense…

  41. rrtucci Says:

    errata: I meant HHL not HLL. My apologies to the letter L. This show was brought to you by the letter L

  42. jonas Says:

    Re #21, wouldn’t it be possible to experiment with at least a 4×4 matrix by simulating the quantum computer on a classical computer with exponential slowdown?

  43. John Sidles Says:

    rrtucci questions “Gil Kalai doesn’t believe that quantum computers will work because of noise, but he believes that HHL will work. Makes perfect sense [?] …”

    Gil’s view does make sense if we reflect that the worked example in Clader, Jacobs, and Sprouse “Preconditioned quantum linear system algorithm” (arXiv:1301.2340 [quant-ph], 2013) — an article that Scott’s Nature Physics essay features prominently — solves a specialized QED scattering problem by an algorithm that relies essentially upon a specialized class of matrix preconditioners, known as sparse approximate inverse (SPAI) preconditioners, that were developed to solve (with transformational efficiency) specialized QED scattering problems via classical computation.

    Summary  It’s consistent to envision a world in which quantum linear solvers are exponentially more efficient than classical solvers, but only for restricted classes of problems; classes that in particular are not BQP-complete. And it is consistent too, to envision that preconditioners that are effective in quantum linear solvers, are effective too in classical linear solvers.

    The QED-scattering solver of Clader, Jacobs, and Sprouse’s justly-praised work resides naturally in this world.

  44. Scott Says:

    jonas #42: Yes, for sure you could do that. You could probably even do a 20×20 matrix. But exactly what question would you be hoping to answer that way?

    (Of course, one can ask the same thing about the physics experiment, but there the answer is: you just want a demonstration platform for experimental techniques that hopefully are of independent interest.)
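
    For what it’s worth, the classical reference computation that any such small demonstration gets checked against is tiny; the matrix, right-hand side, and observable below are arbitrary toy choices:

    import numpy as np

    # The number a small HHL-style demo ultimately estimates: an expectation
    # value in the normalized solution state, computed here classically.
    A = np.array([[3., 1., 0., 0.],
                  [1., 3., 1., 0.],
                  [0., 1., 3., 1.],
                  [0., 0., 1., 3.]])   # any small, well-conditioned Hermitian matrix
    b = np.array([1., 0., 0., 1.])

    x = np.linalg.solve(A, b)
    x_state = x / np.linalg.norm(x)    # the state |x> that ideal HHL would prepare
    M = np.diag([1., -1., 1., -1.])    # some efficiently measurable observable
    print(x_state @ M @ x_state)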

  45. rrtucci Says:

    John Sidles, if an algorithm requires a QC, but the QC doesn’t work, then the algorithm doesn’t work.

  46. Rahul Says:

    @Gil:

    Since you seem to be the resident expert on noise & how it affects scaling up a QC: do you expect the same noise-swamping issues to prevail in HHL?

    i.e. If the experimental demos of HHL move to larger matrices from the current 2×2 toy systems will they be able to preserve reasonable fidelity in their estimated solutions?

  47. Rahul Says:

    @rrtucc:

    So what’s your bet on A, B or C:

    (A) (Universal) QC’s will work (i.e. scale) & so will HHL on QC’s

    (B) QC’s will work but HHL will not

    (C) Neither QC’s nor HHL will work.

    Is there a (D)?

  48. rrtucci Says:

    Rahul, I would get a B grade, although I would prefer an A so I move that you swap the A and B categories

    D is of course “QC’s won’t work but HHL will”. It’s the grade that Gil Kalai and John Sidles get, although I would give them an F.

  49. Jay Says:

    rrtucci,

    I understand Gil’s position as: in practice neither QC nor HHL will ever work; in theory QC implies HHL; and the latter is interesting (for complexity-theoretic reasons) even if no QC can ever be built.

    I don’t understand your position that: QC will work but HHL won’t. If we have a candidate QC, and it fails to run some quantum algorithm, then we won’t call it a QC. We’d call it a failure and search for what’s going wrong and how to fix it. Or maybe you think there’s some problem specific to HHL, such as a mathematical mistake in Aram et al.?

  50. rrtucci Says:

    Jay, my understanding is that quantum error correction adds a poly(n) overhead. Hence, HHL’s talk of log(n) running times only applies if no error correction is necessary. Scott, who knows much more than I do about this, has pointed out in his paper many other scenarios which would make HHL polynomial in practice.

  51. Rahul Says:

    @rrtucci:

    Can you elaborate on why you choose (B)

    i.e. QC’s will work but HHL will not?

  52. Scott Says:

    rrtucci #50: No, that’s not right. Error correction adds a polylog(n) overhead, compared to what would’ve otherwise been the computation time. So if ideal HHL took log(n) time, error-corrected HHL would take log(n) polyloglog(n) time.

  53. In the wrong exp-space at the wrong exp-time – Quantum Bot Says:

    […] again recalled my first paper about quantum computing reading recent post in Shtetl-Optimized. The review mentioned in the post clarifies some points about quantum machine learning algorithms […]

  54. fred Says:

    I’m convinced we’ll find out that QM actually has less power than the classical behavior.
    QM is the proof that we’re living in the Matrix and that the machines had to save resources to fake the world:
    – things have no clear state when you’re not looking at them.
    – entanglement is clearly the equivalent of two threads sharing the same memory line in order to save RAM.
    – randomness is a neat trick to procedurally generate “complex” state without the need for full history.
    etc

  55. Jay Says:

    Fred, there is a theory which states that if ever anyone discovers exactly what QM is for and why it is here, the Universe will instantly disappear and be replaced by something even more bizarre and inexplicable.
    There is another theory which states that this has already happened.

  56. Job Says:

    As a collaborative system, a Universe where all operators commute would be cheaper to build and far more scalable – assuming consistency is important.

    What would the world be like if all operations were commutative?

  57. John Sidles Says:

    Job wonders “What would the world be like if all operations were commutative?”

    This question has a well-posed answer for classical dynamical systems, but its answer for quantum dynamical systems is “rough-water” and “messy” (per Steven Weinberg’s terminology of #38). The following points are excerpted from Gogolin, Muller, and Eisert’s “Absence of Thermalization in Nonintegrable Systems” (arXiv:1009.2493 [quant-ph], 2010).

    “For \(n\) canonical degrees of freedom, one usually calls a system integrable if there are \(n\) independent constants of motion with vanishing Poisson brackets.”

    Note that in quantum dynamical systems, the criterion “vanishing Poisson brackets” implies “vanishing commutators” of the operators associated to constants-of-motion. The discussion of Gogolin et al. continues …

    “In quantum mechanics, despite the common use of the term ‘integrable,’ the situation is much less clear, and different criteria are being applied in the literature. The most common notions of integrability are the following [list follows…]”

    Further literature review along these thermodynamical lines will motivate Shtetl Optimized readers to survey, on the one hand, the fundamental research presented at QIP 2015 (don’t overlook the posters), and on the other hand, the terrific series of YouTube videos that Schrödinger LLC provides. See, for example, the video descriptions of the quantum solver Jaguar in designing new generations of optoelectronic transducers (which are precisely the technologies that will be required to reduce scalable BosonSampling to practice, for example).

    Two well-referenced, accessible-to-students, review articles that are helpful (to me at least) in appreciating both sets of literature are Persi Diaconis’ “The Markov chain Monte Carlo revolution” (2009) and Christian Robert and George Casella’s “A short history of Markov Chain Monte Carlo: subjective recollections from incomplete data” (2011).

    Conclusion  The QIP 2015 lectures and posters, considered together with the surging simulation capabilities of enterprises like Schrödinger LLC, can be naturally appreciated — as it seems to me anyway — as establishing that a third “Markov chain Monte Carlo revolution” (as they have been called) is already underway … and this revolution is already generating a cornucopia of transformationally effective algorithms for unravelling quantum thermodynamical trajectories.

    Postscript  These references are provided in hope of motivating Shtetl Optimized readers to appreciate — and even comment publicly upon — the terrific materiel that The Quantum Pontiffs is providing with their QIP 2015 live-blogging.

    For which outstanding service to the quantum community, this thanks and appreciation is extended to all four Pontiffs: Steve Flammia, Aram Harrow, Charlie Bennett, and Dave Bacon.

    Good on `yah, Pontiffs!

    —————-.

    @article{Diaconis2009,
      Title   = {The {M}arkov chain {M}onte {C}arlo revolution},
      Author  = {Persi Diaconis},
      Journal = {Bulletin of the American Mathematical Society},
      Volume  = {46},
      Number  = {2},
      Pages   = {179--205},
      Year    = {2009}}

    @article{RobertCasella2011,
      Title   = {A short history of {M}arkov Chain {M}onte {C}arlo: subjective recollections from incomplete data},
      Author  = {Christian Robert and George Casella},
      Journal = {Statistical Science},
      Volume  = {26},
      Number  = {1},
      Pages   = {102--115},
      Year    = {2011}}

  58. Joshua Zelinsky Says:

    fred #54,

    It is very hard to see how sharing memory strands as an explanation for entanglement would be consistent with Bell’s Theorem.

  59. Mark van Hoeij Says:

    Scott,

    How many of your colleagues believe that a “real” quantum computer (i.e. one with enough qubits to allow it to solve problems that we can’t solve with a classical computer) will be a reality during our lifetime?

    In news reports I see lots of optimism. But futuristic predictions have had a bad track record: many big predictions have not come to pass, while many of the things that did happen were not predicted.

    If a news reporter simply looks at the track record of past predictions, shouldn’t the conclusion be that, despite what the people they interview tell them, the actual chance of a real quantum computer within our lifetime is very small?

  60. joe Says:

    Scott, I think there’s a typo in the published version: on the first page, second column, a |x> appears that I think should be |b>.