Arithmetic natural proofs theory is sought

June 30th, 2008

This post will be longer and more technical than most — but what can I say? Sometimes you need to get something technical off your chest. The topic is something my student, Andy Drucker, and I (along with several interested others) have been thinking about on and off for months, and if we’re not going to get a paper out of it, at least we’ll have this blog post.

Complexity theory could be defined as the field concerned with deep, nontrivial, mathematically-sophisticated justifications for failure. For example, we can’t solve NP-complete problems in polynomial time, but maybe that’s not so bad, since we conjecture there is no solution (P≠NP). Of course, we also can’t prove P≠NP — but maybe that’s not so bad either, since we have good explanations for why the problem is so hard, like relativization, natural proofs, and algebrization.

On the other hand, consider the problem of showing that the Permanent of an n×n matrix requires arithmetic circuits of more than polynomial size. (Given a field F—which we’ll assume for this post is finite—an arithmetic circuit over F is a circuit whose only allowed operations are addition, subtraction, and multiplication over F, and that doesn’t have direct access to the bit representations of the field elements.)

The problem of circuit lower bounds for the Permanent is currently at the frontier of complexity theory. As we now know, it’s intimately related both to derandomizing polynomial identity testing and to the τ problem of Blum, Cucker, Shub, and Smale. Alas, not only can we not prove that Perm∉AlgP/poly (which is the street name for this conjecture), we don’t have any good excuse for why we can’t prove it! Relativization and algebrization don’t seem to apply here, since no one would think of using diagonalization-based techniques on such a problem in the first place. So that leaves us with natural proofs.

The theory of natural proofs, which was developed by Razborov and Rudich in 1993 and for which they recently shared the Gödel Prize, started out as an attempt to explain why it’s so hard to prove NP⊄P/poly (i.e., that SAT doesn’t have polynomial-size circuits, which is a slight strengthening of P≠NP). They said: suppose the proof were like most of the circuit lower bound proofs that we actually know (as a canonical example, the proof that Parity is not in AC0). Then as a direct byproduct, the proof would yield an efficient algorithm A that took as input the truth table of a Boolean function f, and determined whether:

  1. f belongs to an extremely large class C of “random-looking” functions, which includes SAT but does not include any function computable by polynomial-size circuits, or
  2. f does not belong to C.

(The requirement that A run in time polynomial in the size of the truth table, N = 2^n, is called constructivity. The requirement that C be a large class of functions — say, at least a 2^(-poly(n)) fraction of functions — is called largeness.)

Razborov and Rudich then pointed out that such a polynomial-time algorithm A could be used to distinguish truly random functions from pseudorandom functions with non-negligible bias. As follows from the work of Håstad-Impagliazzo-Levin-Luby and Goldreich-Goldwasser-Micali, one could thereby break one-way functions in subexponential time, and undermine almost all of modern cryptography! In other words, if cryptography is possible, then proofs with the property above are not possible. The irony — we can’t prove lower bounds because lower bounds very much like the ones we want to prove are true — is thick enough to spread on toast.

Now suppose we tried to use the same argument to explain why we can’t prove superpolynomial arithmetic circuit lower bounds for the Permanent, over some finite field F. In that case, a little thought reveals that what we’d need is an arithmetic pseudorandom function family over F. More concretely, we’d need a family of functions g_s : F^n → F, where s is a short random “seed”, such that:

  1. Every g_s is computable by a polynomial-size, constant-depth (or at most log-depth) arithmetic circuit, but
  2. No polynomial-time algorithm, given oracle access to g_s (for a randomly-chosen s), is able to distinguish g_s from a random low-degree polynomial over F with non-negligible bias.

It’s important not to get so hung up on definitional details that you miss the substantive issue here. However, three comments on the definition seem in order.

Firstly, we restrict gs to be computable by constant or log-depth circuits, since that’s the regime we’re ultimately interested in (more about this later). The Permanent is a low-degree polynomial, and well-known depth reduction theorems say (roughly speaking) that any low-degree polynomial that’s computable by a small circuit, is also computable by a small circuit with very small depth.

Secondly, we say that no polynomial-time algorithm should be able to distinguish g_s from a random low-degree polynomial, rather than a random function. The reason is clear: if g_s is itself a low-degree polynomial, then it can always be distinguished easily from a random function, just by picking a random line and doing polynomial interpolation! On the other hand, it’s reasonable to hope that within the space of low-degree polynomials, g_s looks random—and that’s all we need to draw a natural proofs conclusion. Note that the specific distribution over low-degree polynomials that we simulate doesn’t really matter: it could be (say) the uniform distribution over all degree-d polynomials for some fixed d, or the uniform distribution over polynomials in which no individual variable is raised to a higher power than d.
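To make the interpolation test concrete, here’s a minimal sketch of the distinguisher (in Python, with toy parameters; g is any oracle from F_p^n to F_p, and the degree bound d is assumed to satisfy d+1 < p):

```python
import random

def looks_like_low_degree(g, n, p, d, trials=20):
    """Restrict the oracle g: F_p^n -> F_p to random lines and check that
    d+1 samples along each line correctly extrapolate a (d+2)-nd sample,
    as they must if g is a polynomial of total degree <= d."""
    for _ in range(trials):
        a = [random.randrange(p) for _ in range(n)]      # base point of the line
        b = [random.randrange(p) for _ in range(n)]      # direction of the line
        point = lambda t: [(ai + t * bi) % p for ai, bi in zip(a, b)]
        vals = [g(point(t)) for t in range(d + 1)]        # d+1 samples on the line
        # Lagrange-extrapolate the unique degree-<=d fit to t = d+1 ...
        t_new, prediction = d + 1, 0
        for i in range(d + 1):
            num = den = 1
            for j in range(d + 1):
                if j != i:
                    num = num * (t_new - j) % p
                    den = den * (i - j) % p
            prediction = (prediction + vals[i] * num * pow(den, p - 2, p)) % p
        # ... and compare against what the oracle actually returns there.
        if g(point(t_new)) != prediction:
            return False      # caught: g is not a degree-<=d polynomial
    return True               # consistent with being a degree-<=d polynomial
```

For a truly random function, each line test fails with probability 1 - 1/p, so a handful of trials already separates the two cases.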

Thirdly, to get a close analogy with the original Razborov-Rudich theory, we stipulated that no ordinary (Boolean) polynomial-time algorithm should be able to distinguish g_s from a random low-degree polynomial. However, this is not essential. If we merely knew (for example) that no polynomial-size arithmetic circuit could distinguish g_s from a random low-degree polynomial, then we’d get the weaker but still interesting conclusion that any superpolynomial arithmetic circuit lower bound for the Permanent would have to be “arithmetically non-naturalizing”: that is, it would have to exploit some property of the Permanent that violates either largeness or else “arithmetic constructivity.” There’s a smooth tradeoff here, between the complexity of the distinguishing algorithm and the strength of the natural proofs conclusion that you get.

There’s no question that, if we had an arithmetic pseudorandom function family as above, it would tell us something useful about arithmetic circuit lower bounds. For we do have deep and nontrivial arithmetic circuit lower bounds — for example, those of Nisan and Wigderson (see also here), Razborov and Grigoriev, Grigoriev and Karpinski, Shpilka and Wigderson, Raz (see also here), Raz, Shpilka, and Yehudayoff, Raz and Yehudayoff, and Mignon and Ressayre. And as far as I can tell, all of these lower bounds do in fact naturalize in the sense above. (Indeed, they should even naturalize in the strong sense that there are quasipolynomial-size arithmetic circuits for the relevant properties.) Concretely, most of these techniques involve looking at the truth table (or rather, the “value table”) of the function g : F^n → F to be lower-bounded, constructing so-called partial-derivatives matrices from that truth table, and then lower-bounding the ranks of those matrices. But these operations—in particular, computing the rank—are all polynomial-time (or quasipolynomial-time for arithmetic circuits). Thus, if we could construct arithmetic pseudorandom functions, we could use them to argue that no techniques similar to the ones we know will work to prove superpolynomial arithmetic circuit lower bounds for the Permanent.
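As a toy illustration of the “rank of a matrix built from the value table” pattern (emphatically a stand-in, not the actual partial-derivatives matrices of the papers above), here’s a sketch that splits the variables into two halves, tabulates g on all pairs of half-assignments, and computes the rank of the resulting matrix over F_p:

```python
from itertools import product

def rank_mod_p(M, p):
    """Rank of a matrix over F_p, by Gaussian elimination."""
    M = [row[:] for row in M]
    rank, rows, cols = 0, len(M), len(M[0]) if M else 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][c] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(rows):
            if r != rank and M[r][c] % p:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

def value_table_rank(g, n, p):
    """Split x_1,...,x_n into two halves, tabulate g on all pairs of
    half-assignments, and return that matrix's rank over F_p.
    (Takes time about p^n -- strictly for toy sizes.)"""
    half = n // 2
    left = list(product(range(p), repeat=half))
    right = list(product(range(p), repeat=n - half))
    M = [[g(list(a) + list(b)) % p for b in right] for a in left]
    return rank_mod_p(M, p)
```

The real proofs use cleverer matrices (indexed by partial derivatives or by multilinear monomials, say), but the point stands: everything involved is computable in time polynomial in the size of the value table.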

So the problem is “merely” one of constructing these goddamned arithmetic pseudorandom functions. Not surprisingly, it’s easy to construct arithmetic function families that seem pseudorandom (a concrete example is coming later), but the game we’re playing requires basing the pseudorandomness of your PRF on some ‘accepted’ or ‘established’ computational intractability assumption. And here, alas, the current toolbox of complexity theory simply doesn’t seem up to the job.

To be sure, we have pseudorandom function families that are computable by constant-depth Boolean threshold circuits — most famously, those of Naor and Reingold, which are pseudorandom assuming that factoring Blum integers or the decisional Diffie-Hellman problem are hard. (Both assumptions, incidentally, are false in the quantum world, but that’s irrelevant for natural proofs purposes, since the proof techniques that we know how to think about yield polynomial-time classical algorithms.) However, the Naor-Reingold construction is based on modular exponentiation, and doing modular exponentiation in constant depth crucially requires using the bit representation of the input numbers. So this is not something that’s going to work in the arithmetic circuit world.
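For concreteness, here is a bare-bones sketch of the DDH-based Naor-Reingold function with toy (utterly insecure) parameters, just to make vivid where the modular exponentiation comes in:

```python
import random

def nr_keygen(n, q):
    """Seed for the Naor-Reingold PRF: n+1 random exponents mod q."""
    return [random.randrange(1, q) for _ in range(n + 1)]

def nr_eval(seed, x_bits, p, q, g):
    """f_a(x) = g^(a_0 * prod_{i : x_i = 1} a_i) mod p, where g has order q in Z_p^*."""
    e = seed[0]
    for a_i, x_i in zip(seed[1:], x_bits):
        if x_i:
            e = e * a_i % q        # exponent arithmetic lives mod q
    return pow(g, e, p)            # the modular exponentiation in question

# Toy parameters: q = 11 divides p - 1 = 22, and g = 4 has order 11 mod 23.
seed = nr_keygen(4, 11)
print(nr_eval(seed, [1, 0, 1, 1], 23, 11, 4))
```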

At the moment, it seems the closest available result to what’s needed is that of Klivans and Sherstov in computational learning theory. These authors show (among other things) that if the n^1.5-approximate shortest vector problem is hard for quantum computers, then learning depth-3 arithmetic circuits from random examples is intractable for classical computers. (Here quantum computing actually is relevant—since by using techniques of Regev, it’s possible to use a quantum hardness assumption to get a classical hardness consequence!)

This result seems like exactly what we need—so then what’s the problem? Why aren’t we done? Well, it’s that business about the random examples. If the learner is allowed to make correlated or adaptive queries to the arithmetic circuit’s truth table — as we need to assume it can, in the arithmetic natural proofs setting — then we don’t currently have any hardness result. Furthermore, there seems to me to be a basic difficulty in extending Klivans-Sherstov to the case of adaptive queries (though Klivans himself seemed more optimistic). In particular, there’s a nice idea due to Angluin and Kharitonov, which yields a generic way (using digital signatures) for converting hardness-of-learning results against nonadaptive queries to hardness-of-learning results against adaptive queries. But interestingly, the Angluin-Kharitonov reduction depends essentially on our being in the Boolean world, and seems to break down completely in the arithmetic circuit world.

So, is this all Andy and I can say—that we tried to create an arithmetic natural proofs theory, but that ultimately, our attempt to find a justification of failure to find a justification of failure was itself a failure? Well, not quite. I’d like to end this post with one theorem, and one concrete conjecture.

The theorem is that, if we don’t care about the depth of the arithmetic circuits (or, more-or-less equivalently, the degree of the polynomials that they compute), then we can create arithmetic pseudorandom functions over finite fields, and hence the arithmetic natural proofs theory that we wanted. Furthermore, the only assumption we need for this is that pseudorandom functions exist in the ordinary Boolean world — about the weakest assumption one could possibly hope for!

This theorem might seem surprising, since after all, we don’t believe that there’s any general way to take a polynomial-size Boolean circuit C that operates on finite field elements x_1,…,x_n represented in binary, and simulate C by a polynomial-size arithmetic circuit that uses only addition, subtraction, and multiplication, and not any bit operations. (The best such simulation, due to Boneh and Lipton, is based on elliptic curves and takes moderately exponential time.) Nevertheless, Andy and I are able to show that for pseudorandomness purposes, unbounded-depth Boolean circuits and unbounded-depth arithmetic circuits are essentially equivalent.

To prove the theorem: let the input to our arithmetic circuit C be elements x_1,…,x_n of some finite field F_p (I’ll assume for simplicity that p is prime; you should think of p as roughly exponential in n). Then the first thing C will do is compute various affine functions of the input:

y_1 = a_0 + a_1x_1 + … + a_nx_n,  y_2 = b_0 + b_1x_1 + … + b_nx_n,  etc.,

as many such functions as are needed, for coefficients a_i, b_i, etc. that are chosen at random and then “hardwired” into the circuit. C will then raise each y_i to the (p-1)/2 power, using repeated squaring. Note that in a finite field F_p, raising y≠0 to the (p-1)/2 power yields either +1 or -1, depending on whether or not y is a quadratic residue. Let z_i=(y_i+1)/2. Then we now have a collection of 0/1 “bits.” Using these bits as our input, we can now compute whatever Boolean pseudorandom function we like, as follows: NOT(x) corresponds to the polynomial 1-x, AND(x,y) corresponds to xy, and OR(x,y) corresponds to 1-(1-x)(1-y). The result of this will be a collection of pseudorandom output bits, call them w_1,…,w_m. The final step is to convert back into a pseudorandom finite field element, which we can do as follows:

Output = w_1 + 2w_2 + 4w_3 + 8w_4 + … + 2^(m-1)w_m.

This will be pseudorandom assuming (as we can) that 2^m is much larger than p.
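Here’s a sketch of the function that this arithmetic circuit computes, with a keyed hash standing in for the Boolean PRF (in the actual construction, each NOT/AND/OR gate of that PRF gets replaced by the polynomials 1-x, xy, and 1-(1-x)(1-y); all parameters below are illustrative, and the stand-in assumes m ≤ 256):

```python
import hashlib, random

def arithmetic_prf(xs, p, affine_coeffs, key, m):
    """Simulate the construction above on input xs in F_p^n.  Each entry of
    affine_coeffs is a tuple (a_0, a_1, ..., a_n); key seeds the Boolean PRF,
    for which SHA-256 is only a stand-in."""
    bits = []
    for coeffs in affine_coeffs:
        y = (coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], xs))) % p
        chi = pow(y, (p - 1) // 2, p)             # +1 or -1 in F_p (0 only if y = 0)
        z = (chi + 1) * pow(2, p - 2, p) % p      # z = (chi + 1)/2, i.e. a 0/1 "bit"
        bits.append(z)
    # Stand-in for the (arithmetized) Boolean PRF applied to the bits z_1, z_2, ...
    digest = hashlib.sha256((key + "".join(map(str, bits))).encode()).digest()
    w = [(digest[j // 8] >> (j % 8)) & 1 for j in range(m)]
    # Recombine: Output = w_1 + 2 w_2 + 4 w_3 + ... + 2^(m-1) w_m, reduced mod p.
    return sum(w_j << j for j, w_j in enumerate(w)) % p

# Toy usage (a real instantiation would take p exponentially large in n):
p, n, m = 101, 3, 16
coeffs = [[random.randrange(p) for _ in range(n + 1)] for _ in range(32)]
print(arithmetic_prf([5, 17, 44], p, coeffs, "seed", m))
```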

But why does this construction work? That is, assuming the Boolean circuit was pseudorandom, why is the arithmetic circuit simulating it also pseudorandom? Well, this is a consequence of two basic facts:

  1. Affine combinations constitute a family of pairwise-independent hash functions. That is, for every pair (x_1,…,x_n)≠(y_1,…,y_n), the probability over a_0,…,a_n that a_0+a_1x_1+…+a_nx_n = a_0+a_1y_1+…+a_ny_n is only 1/p. Furthermore, the pairwise independence can be easily seen to be preserved under raising various affine combinations to the (p-1)/2 power.
  2. If we draw f from a family of pseudorandom functions, and draw h from a family of pairwise-independent hash functions, then f(h(x)) will again be a pseudorandom function. Intuitively this is “obvious”: after all, the only way to distinguish f(h(x)) from random without distinguishing f from random would be to find two inputs x,y such that h(x)=h(y), but since h is pairwise-independent and since the outputs f(h(x)) aren’t going to help, finding such a collision should take exponential time. A formal security argument can be found (e.g.) in this paper by Bellare, Canetti, and Krawczyk.

Now for the conjecture. I promised earlier that I’d give you an explicit candidate for a (low-degree) arithmetic pseudorandom function, so here it is. Given inputs x_1,…,x_n ∈ F_p, let m be polynomially related to n, and let L_1(x_1,…,x_n),…,L_{m^2}(x_1,…,x_n) be affine functions of x_1,…,x_n, with the coefficients chosen at random and then “hardwired” into our circuit, as before. Arrange L_1(x_1,…,x_n),…,L_{m^2}(x_1,…,x_n) into an m×m matrix, and take the determinant of that matrix as your output. That’s it.
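In code, the candidate looks like this (a sketch with toy parameters; the determinant is computed here by Gaussian elimination mod p purely for convenience, whereas the point of the construction is that it’s also computable by small, shallow arithmetic circuits):

```python
import random

def det_mod_p(M, p):
    """Determinant of a square matrix over F_p, by Gaussian elimination."""
    M = [row[:] for row in M]
    m, det = len(M), 1
    for c in range(m):
        pivot = next((r for r in range(c, m) if M[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], p - 2, p)
        for r in range(c + 1, m):
            f = M[r][c] * inv % p
            M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return det % p

def candidate_keygen(n, m, p):
    """Seed: m^2 random affine forms in x_1, ..., x_n, each given by n+1 coefficients."""
    return [[random.randrange(p) for _ in range(n + 1)] for _ in range(m * m)]

def candidate_prf(xs, seed, m, p):
    """Evaluate the m^2 affine forms at xs, arrange them into an m x m matrix,
    and output the determinant of that matrix mod p."""
    vals = [(c[0] + sum(a * x for a, x in zip(c[1:], xs))) % p for c in seed]
    return det_mod_p([vals[i * m:(i + 1) * m] for i in range(m)], p)
```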

(The motivation for this construction is Valiant’s result from the 1970s, that determinant is universal under projections. That might suggest, though of course it doesn’t prove, that breaking this function should be as hard as breaking any other arithmetic pseudorandom function.)

Certainly the output of the above generator will be computable by an arithmetic circuit of size m^(O(log m))·n. On the other hand, I conjecture that if you don’t know L_1,…,L_{m^2}, and are polynomial-time bounded, then the output of the generator will be indistinguishable to you from that of a random polynomial of degree m. I’m willing to offer $50 to anyone who can prove or disprove this conjecture—or for that matter, who can prove or disprove the more basic conjecture that there exists a low-degree arithmetic pseudorandom function family! (Here, of course, “prove” means “prove modulo some accepted hardness assumption,” while “disprove” means “disprove.”)

But please be quick about it! If we don’t hurry up and find a formal barrier to proving superpolynomial lower bounds for the Permanent, someone might actually roll up their sleeves and prove one—and we certainly don’t want that!

Quantum Computing Since Democritus Lecture 15: Learning

June 26th, 2008

Lektur iz heer.

This week I explain Valiant’s PAC-learning model (previously covered in GITCS Lectures 19, 20, 21), and also — in response to a question from the floor — take a swipe at Bayesian fundamentalism.  When you only know one formalism to describe some phenomenon (in this case, that of choosing hypotheses to fit data), it’s easy to talk yourself into believing that formalism is the Truth: to paraphrase Caliph Omar, “if it agrees with Bayesianism, it is superfluous; if it disagrees, it is heresy.”  The antidote is to learn other formalisms.  Enter computational learning theory: an account of learning that’s clear, mathematically rigorous, useful, nontrivial, and completely different from the Bayesian account (though of course they have points of contact).  The key idea is to jettison the notoriously-troublesome notion of a prior, replacing it by a concept class (about which one makes no probabilistic assumptions), as well as a probability distribution over sample data rather than hypotheses.
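To give one concrete, non-Bayesian deliverable of the PAC framework: for a finite concept class C, any hypothesis in C that agrees with roughly (1/ε)(ln|C| + ln(1/δ)) independent samples is, with probability at least 1-δ, ε-accurate under the sampling distribution. A quick sketch (the example class is my own choice):

```python
from math import log, ceil

def pac_sample_bound(class_size, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite concept class:
    m >= (1/epsilon) * (ln |C| + ln(1/delta))."""
    return ceil((log(class_size) + log(1 / delta)) / epsilon)

# E.g., monotone conjunctions over 50 Boolean variables: |C| = 2^50.
print(pac_sample_bound(2 ** 50, epsilon=0.05, delta=0.01))  # ~ 786 samples
```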

Incidentally, I’d say the same thing about complexity theory.  If you think (for example) that Turing machines are the only way to reason about computational efficiency, then you’re overdue for a heaping helping of communication complexity, circuit complexity, query complexity, algebraic complexity…

Ah yes, complexity.  This week I was at the Conference on Computational Complexity at the beautiful University of Maryland in College Park: home of the Terrapins, as one is reminded by signs placed roughly every three inches.  I heard some great talks (ask in the comments section if you want details), gave two talks myself, and during the business meeting, was elected to the CCC Steering Committee.  This being a complexity conference, my declared campaign motto was “No We Can’t!”  It was inspiring to see how this simple yet hopeful motto united our community: from derandomization to circuit lower bounds, from quantum computing to proof complexity, we might have different backgrounds but we all worry about shrinking grant sizes and the rising costs of conference registration; we all face common challenges to which we want to prove that no solutions exist.  Rest assured, I will treat my duties as a steering committee member (mostly helping to select PC chairs, who in turn select the program committees who select the conference papers) with the awesome gravity they deserve.

Better safe than sorry

June 21st, 2008

After reading these blog posts dealing with the possibility of the Large Hadron Collider creating a black hole or strangelet that would destroy the earth — as well as this report from the LHC Safety Assessment Group, and these websites advocating legal action against the LHC — I realized that I can remain silent about this important issue no longer.

As a concerned citizen of Planet Earth, I demand that the LHC begin operations as soon as possible, at as high energies as possible, and continue operating until such time as it is proven completely safe to turn it off.

Given our present state of knowledge, we simply cannot exclude the possibility that aliens will visit the Earth next year, and, on finding that we have not yet produced a Higgs boson, find us laughably primitive and enslave us. Or that a wormhole mouth or a chunk of antimatter will be discovered on a collision course with Earth, which can only be neutralized or deflected using new knowledge gleaned from the LHC. Yes, admittedly, the probabilities of these events might be vanishingly small, but the fact remains that they have not been conclusively ruled out. And that being the case, the Precautionary Principle dictates taking the only safe course of action: namely, turning the LHC on as soon as possible.

After all, the fate of the planet might conceivably depend on it.

Quantum Computing Since Democritus Lecture 14: Skepticism of Quantum Computing

June 12th, 2008

I just came from Brookhaven National Lab, where I gave my standard colloquium talk on the limits of quantum computers, and got driven around the RHIC accelerator ring by physicist Rob Pisarski. I knew the lab was big; what I hadn’t quite appreciated before getting there is that it’s an entire town, with its own police department, restaurants, etc. In many ways it looks like lots of small towns across America, except that this one’s primary business is smashing gold ions into each other at relativistic speeds and measuring properties of the resulting quark-gluon plasmas.

When I talk to physicists like the ones at BNL, they often find it faintly ridiculous that anyone would doubt quantum mechanics. But people certainly do — even when they don’t admit that that’s what they’re doing — when the alternative is accepting that integers are efficiently factorizable in the physical world. Which brings us to QCSD Lecture 14, on eleven skeptical objections to quantum computing and what can be said in response to them. And yeah, if you’ve been reading this blog for years, a lot of the material won’t be new to you. It’s just one more hunk of meat to throw into the tigers’ den.

Quantum Computing Since Democritus Lecture 13: How Big Are Quantum States?

June 8th, 2008

A year and a half after the actual course, the remaining Democritus lecture notes are finally being transcribed and posted — thanks to my new summer student, Chris Granade. In Lecture 13, you can read about why QMA, QCMA, and BQP/qpoly are not merely complexity classes but battlefields, where competing visions of what quantum states really are face each other off in conflicts that we as theorists intentionally provoked. (Work with me here.)

How George W. Bush is changing my life for the better

May 29th, 2008

I’ll give you a hint: it’s not from the rebate checks.

Let me put it this way: from now on, I am going to exercise at least twice a week. I’m going to post the remaining Quantum Computing Since Democritus lectures. I’m going to finish writing up several papers that I’ve been procrastinating on for years. I’m going to get involved in more non-work-or-blog-related social activities. And I’m going to do all of these things (and have friends and family members vouch for it) because if I don’t, then money I’ve already placed in escrow will be donated to the George W. Bush Presidential Library, as well as to the National Rifle Association.

Yeah, I signed up for the “commitment contract” service stickK.com, which was started in January by Yale economics professors Dean Karlan and Ian Ayres and student Jordan Goldberg. You get to choose the “anti-charities” to which your money gets donated if you don’t achieve your stated goals; surprisingly, stickK itself doesn’t take a cut (it seems to get all its money from ad revenue). The idea is obvious in retrospect; what’s amazing is that it took this long for anyone to build a company around it. So far it seems to be working. For example, I jogged on Tuesday and went swimming this morning, despite not having exercised for the previous six months. What remains to be seen is whether W. can inspire me to new heights of research productivity.

An enormous hat tip to Michael Nielsen.

Great Ideas in Theoretical Computer Science Lectures 16-19

May 27th, 2008

In the next-to-last GITCS installment, we cover some of the greatest hits of the 70’s and 80’s.

  • Lecture 18: Cryptographic Protocols (including: how computer scientists date!)
  • Lecture 19: Interactive Proofs / Machine Learning

(Something tells me Lecture 18 is going to get more hits than the other three combined…)

The course itself ended two weeks ago; last week was the final exam. Thanks so much to all of my students for signing up for a brand-new course, asking probing questions, enduring my excruciating jokes, and doing a fantastic job with the notes. (Of course, thanks also to my eagle-eyed readers for spotting errors.) Thanks above all to my TA, Yinmeng Zhang, who went way beyond her job description to work with students individually, tell me when I was being a doofus, etc. Because of the input of everyone who participated, this course will be better when I teach it the second time around.

Also, for anyone who might want to teach a similar course, the recipe is simple (much simpler than I expected, actually):

  1. Start with a standard, off-the-shelf, undergraduate computability and complexity theory course.
  2. Cut out the most boring parts, like pushdown automata, context-free grammars, and 10^5000 NP-completeness reductions. (Yes, I know these things can be taught in a non-boring way, but why make it hard on yourself?)
  3. Fill in the gaps with more interesting material, like zero-knowledge proofs, computational learning theory, or quantum computing.
  4. Add a pinch (to taste) of mindblowing results that can’t be covered in detail, like the PCP Theorem, the independence of the Continuum Hypothesis, or cosmological limits on computation.

Serves 10-100.

The array size of the universe

May 23rd, 2008

I’ve been increasingly tempted to make this blog into a forum solely for responding to the posts at Overcoming Bias. (Possible new name: “Wallowing in Bias.”)

Two days ago, Robin Hanson pointed to a fascinating paper by Bousso, Harnik, Kribs, and Perez, on predicting the cosmological constant from an “entropic” version of the anthropic principle. Say what you like about whether anthropicology is science or not, for me there’s something delightfully non-intimidating about any physics paper with “anthropic” in the abstract. Sure, you know it’s going to have metric tensors, etc. (after all, it’s a physics paper) — but you also know that in the end, it’s going to turn on some core set of assumptions about the number of sentient observers, the prior probability of the universe being one way rather than another, etc., which will be comprehensible (if not necessarily plausible) to anyone familiar with Bayes’ Theorem and how to think formally.

So in this post, I’m going to try to extract an “anthropic core” of Bousso et al.’s argument — one that doesn’t depend on detailed calculations of entropy production (or anything else) — trusting my expert readers to correct me where I’m mistaken. In defense of this task, I can hardly do better than to quote the authors themselves. In explaining why they make what will seem to many like a decidedly dubious assumption — namely, that the “number of observations” in a given universe should be proportional to the increase in non-gravitational entropy, which is dominated (or so the authors calculate) by starlight hitting dust — they write:

We could have … continued to estimate the number of observers by more explicit anthropic criteria. This would not have changed our final result significantly. But why make a strong assumption if a more conservative one suffices? [p. 14]

In this post I’ll freely make strong assumptions, since my goal is to understand and explain the argument rather than to defend it.

The basic question the authors want to answer is this: why does our causally-connected patch of the universe have the size it does? Or more accurately: taking everything else we know about physics and cosmology as given, why shouldn’t we be surprised that it has the size it does?

From the standpoint of post-1998 cosmology, this is more-or-less equivalent to asking why the cosmological constant Λ ~ 10^(-122) should have the value it has. For the radius of our causal patch scales like

1/√Λ ~ 10^61 Planck lengths ~ 10^10 light-years,

while (if you believe the holographic principle) its maximum information content scales like 1/Λ ~ 10^122 qubits. To put it differently, there might be stars and galaxies and computers that are more than ~10^10 light-years away from us, and they might require more than ~10^122 qubits to describe. But if so, they’re receding from us so quickly that we’ll never be able to observe or interact with them.
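A quick order-of-magnitude check of those numbers, using the standard values for the Planck length and the light-year:

```python
PLANCK_LENGTH_M = 1.616e-35    # meters
LIGHT_YEAR_M = 9.461e15        # meters

Lam = 1e-122                   # cosmological constant in Planck units
radius_planck = Lam ** -0.5    # ~ 10^61 Planck lengths
radius_ly = radius_planck * PLANCK_LENGTH_M / LIGHT_YEAR_M

print(f"causal-patch radius ~ {radius_planck:.0e} Planck lengths"
      f" ~ {radius_ly:.1e} light-years")    # ~ 1.7e10 light-years
print(f"max information content ~ {1 / Lam:.0e} qubits")    # ~ 1e122 qubits
```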

Of course, to ask why Λ has the value it does is really to ask two questions:

1. Why isn’t Λ smaller than it is, or even zero? (In this post, I’ll ignore the possibility of its being negative.)
2. Why isn’t Λ bigger than it is?

Presumably, any story that answers both questions simultaneously will have to bring in some actual facts about the universe. Let’s face it: 10^(-122) is just not the sort of answer you expect to get from armchair philosophizing (not that it wouldn’t be great if you did). It’s a number.

As a first remark, it’s easy to understand why Λ isn’t much bigger than it is. If it were really big, then matter in the early universe would’ve flown apart so quickly that stars and galaxies wouldn’t have formed, and hence we wouldn’t be here to blog about it. But this upper bound is far from tight. Bousso et al. write that, based on current estimates, Λ could be about 2000 times bigger than it is without preventing galaxy formation.

As for why Λ isn’t smaller, there’s a “naturalness” argument due originally (I think) to Weinberg, before the astronomers even discovered that Λ>0. One can think of Λ as the energy of empty space; as such, it’s a sum of positive and negative contributions from all possible “scalar fields” (or whatever else) that contribute to that energy. That all of these admittedly-unknown contributions would happen to cancel out exactly, yielding Λ=0, seems fantastically “unnatural” if you choose to think of the contributions as more-or-less random. (Attempts to calculate the likely values of Λ, with no “anthropic correction,” notoriously give values that are off by 120 orders of magnitude!) From this perspective, the smaller you want Λ to be, the higher the price you have to pay in the unlikelihood of your hypothesis.

Based on the above reasoning, Weinberg predicted that Λ would have close to the largest possible value it could have, consistent with the formation of galaxies. As mentioned before, this gives a prediction that’s too big by a factor of 2000 — a vast improvement over the other approaches, which gave predictions that were off by factors of 10^120 or infinity!

Still, can’t we do better? One obvious approach to pushing Λ down would be to extend the relatively-uncontroversial argument explaining why Λ can’t be enormous. After all, the tinier we make Λ, the bigger the universe (or at least our causal patch of it) will be. And hence, one might argue, the more observers there will be, hence the more likely we’ll be to exist in the first place! This form of anthropicizing — that we’re twice as likely to exist in a universe with twice as many observers — is what philosopher Nick Bostrom calls the Self-Indication Assumption.

However, two problems with this idea are evident. First, why should it be our causal patch of the universe that matters, rather than the universe as a whole? For anthropic purposes, who cares if the various civilizations that arise in some universe are in causal contact with each other or not, provided they exist? Bousso et al.’s response is basically just to stress that, from what we know about quantum gravity (in particular, black-hole complementarity), it probably doesn’t even make sense to assign a Hilbert space to the entire universe, as opposed to some causal patch of it. Their “Causal-Patch Self-Indication Assumption” still strikes me as profoundly questionable — but let’s be good sports, assume it, and see what the consequences are.

If we do this, we immediately encounter a second problem with the anthropic argument for a low value of Λ: namely, it seems to work too well! On its face, the Self-Indication Assumption wants the number of observers in our causal patch to be infinite, hence the patch itself to be infinite in size, hence Λ=0, in direct conflict with observation.

But wait: what exactly is our prior over the possible values of Λ? Well, it appears Landscapeologists typically just assume a uniform prior over Λ within some range. (Can someone enlighten me on the reasons for this, if there are any? E.g., is it just that the middle part of a Gaussian is roughly uniform?) In that case, the probability that Λ is between ε and 2ε will be of order ε — and such an event, we might guess, would lead to a universe of “size” 1/ε, with order 1/ε observers. In other words, it seems like the tiny prior probability of a small cosmological constant should precisely cancel out the huge number of observers that such a constant leads to — Λ(1/Λ)=1 — leaving us with no prediction whatsoever about the value of Λ. (When I tried to think about this issue years ago, that’s about as far as I got.)

So to summarize: Bousso et al. need to explain to us on the one hand why Λ isn’t 2000 times bigger than it is, and on the other hand why it’s not arbitrarily smaller or 0. Alright, so are you ready for the argument?

The key, which maybe isn’t so surprising in retrospect, turns out to be other stuff that’s known about physics and astronomy (independent of Λ), together with the assumption that that other stuff stays the same (i.e., that all we’re varying is Λ). Sure, say Bousso et al.: in principle a universe with positive cosmological constant Λ could contain up to ~1/Λ bits of information, which corresponds — or so a computer scientist might estimate! — to ~1/Λ observers, like maybe ~1/√Λ observers in each of ~1/√Λ time periods. (The 1/√Λ comes from the Schwarzschild bound on the amount of matter and energy within a given radius, which is linear in the radius and therefore scales like 1/√Λ.)

But in reality, that 1/Λ upper bound on the number of observers won’t be anywhere close to saturated. In reality, what will happen is that after a billion or so years stars will begin to form, radiating light and quickly increasing the universe’s entropy, and then after a couple tens of billions more years, those stars will fizzle out and the universe will return to darkness. And this means that, even though you pay a Λ price in prior probability for a universe with 1/Λ information content, as Λ goes to zero what you get for your money is not ~1/√Λ observers in each of ~1/√Λ time periods (hence ~1/Λ observers in total), but rather just ~1/√Λ observers over a length of time independent of Λ (hence ~1/√Λ observers in total). In other words, you get diminishing returns for postulating a bigger and bigger causal patch, once your causal patch exceeds a few tens of billions of light-years in radius.

So that’s one direction. In the other direction, why shouldn’t we expect Λ to be 2000 times bigger than it is (i.e. the radius of our causal patch to be ~45 times smaller)? Well, Λ could be that big, say the authors, but in that case the galaxies would fly apart from each other before starlight really started to heat things up. So once again you lose out: during the very period when the stars are shining the brightest, entropy production is at its peak, civilizations are presumably arising and killing each other off, etc., the number of galaxies per causal patch is minuscule, and that more than cancels out the larger prior probability that comes with a larger value of Λ.

Putting it all together, then, what you get is a posterior distribution for Λ that’s peaked right around 10^(-122) or so, corresponding to a causal patch a couple tens of billions of light-years across. This, of course, is exactly what’s observed. You also get the prediction that we should be living in the era when Λ is “just taking over” from gravity, which again is borne out by observation. According to another paper, which I haven’t yet read, several other predictions of cosmological parameters come out right as well.
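Here’s a crude numerical cartoon of that argument, just to show how a uniform prior times a diminishing-returns observer count yields a peaked posterior. The exponents are my own schematic stand-ins (observations per patch growing like Λ^(-1/2) below a crossover scale and falling faster above it), not Bousso et al.’s actual entropy-production calculation:

```python
import numpy as np

LAMBDA_STAR = 1e-122   # crossover: horizon radius comparable to the star-burning era

def observations_per_patch(lam):
    # Schematic: ~ lam^(-1/2) while the patch is mass-limited, falling off
    # faster (here like lam^(-3/2)) once the horizon closes in before starlight peaks.
    return np.minimum(lam ** -0.5, LAMBDA_STAR * lam ** -1.5)

lams = np.logspace(-130, -115, 400)             # candidate values of Lambda
weights = lams * observations_per_patch(lams)   # uniform prior contributes ~Lambda per log bin
print(f"toy posterior peaks near Lambda ~ {lams[np.argmax(weights)]:.0e}")
```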

On the other hand, it seems to me that there are still few enough data points that physicists’ ability to cook up some anthropic explanation to fit them all isn’t sufficiently surprising to compel belief. (In learning theory terms, the measurable cosmological parameters still seem shattered by the concept class of possible anthropic stories.) For those of us who, unlike Eliezer Yudkowsky, still hew to the plodding, non-Bayesian, laughably human norms of traditional science, it seems like what’s needed is a successful prediction of a not-yet-observed cosmological parameter.

Until then, I’m happy to adopt a bullet-dodging attitude toward this and all other proposed anthropic explanations. I assent to none, but wish to understand them all — the more so if they have a novel conceptual twist that I personally failed to think of.

Floating in Platonic heaven

May 15th, 2008

In the comments section of my last post, Jack in Danville writes:

I may have misunderstood [an offhand comment about the “irrelevance” of the Continuum Hypothesis] … Intuitively I’ve thought the Continuum Hypothesis describes an aspect of the real world.

I know we’ve touched on similar topics before, but something tells me many of you are hungerin’ for a metamathematical foodfight, and Jack’s perplexity seemed as good a pretext as any for starting a new thread.

So, Jack: this is a Deep Question, but let me try to summarize my view in a few paragraphs.

It’s easy to imagine a “physical process” whose outcome could depend on whether Goldbach’s Conjecture is true or false. (For example, a computer program that tests even numbers successively and halts if it finds one that’s not a sum of two primes.) Likewise for P versus NP, the Riemann Hypothesis, and even considerably more abstract questions.
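Concretely, the Goldbach “physical process” is just the following sketch, which halts if and only if the conjecture is false:

```python
def is_prime(k):
    """Trial-division primality test (slow, but fine for illustration)."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def goldbach_counterexample_search():
    """Halts iff some even number >= 4 is not the sum of two primes."""
    n = 4
    while True:
        if not any(is_prime(a) and is_prime(n - a) for a in range(2, n // 2 + 1)):
            return n      # counterexample found: Goldbach's Conjecture is false
        n += 2            # otherwise, on to the next even number
```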

But can you imagine a “physical process” whose outcome could depend on whether there’s a set larger than the set of integers but smaller than the set of real numbers? If so, what would it look like?

I submit that the key distinction is between

  1. questions that are ultimately about Turing machines and finite sets of integers (even if they’re not phrased that way), and
  2. questions that aren’t.

We need to assume that we have a “direct intuition” about integers and finite processes, which precedes formal reasoning — since without such an intuition, we couldn’t even do formal reasoning in the first place. By contrast, for me the great lesson of Gödel and Cohen’s independence results is that we don’t have a similar intuition about transfinite sets, even if we sometimes fool ourselves into thinking we do. Sure, we might say we’re talking about arbitrary subsets of real numbers, but on closer inspection, it turns out we’re just talking about consequences of the ZFC axioms, and those axioms will happily admit models with intermediate cardinalities and other models without them, the same way the axioms of group theory admit both abelian and non-abelian groups. (Incidentally, Gödel’s models of ZFC+CH and Cohen’s models of ZFC+not(CH) both involve only countably many elements, which makes the notion that they’re telling us about some external reality even harder to understand.)

Of course, everything I’ve said is consistent with the possibility that there’s a “truth” about CH floating in Platonic heaven, or even that a plausible axiom system other than ZFC could prove or disprove CH (which was Gödel’s hope). But the “truth” of CH is not going to have consequences for human beings or the physical universe independent of its provability, in the same way that the truth of P=NP could conceivably have consequences for us even if we weren’t able to prove or disprove it.

For mathematicians, this distinction between “CH-like questions” and “Goldbach/Riemann/Pvs.NP-like questions” is a cringingly obvious one, probably even too obvious to point out. But I’ve seen so many people argue about Platonism versus formalism as if this distinction didn’t exist — as if one can’t be a Platonist about integers but a formalist about transfinite sets — that I think it’s worth hammering home.

To summarize, Kronecker had it backwards. Man and Woman deal with the integers; all else is the province of God.

The bullet-swallowers

May 13th, 2008

Question for the day: what do libertarianism and the Many-Worlds Interpretation of quantum mechanics have in common? Interest in the two worldviews seems to be positively correlated: think of quantum computing pioneer David Deutsch, or several prominent posters over at Overcoming Bias, or … oh, alright, my sample size is admittedly pretty small.

Some connections are obvious: libertarianism and MWI are both grand philosophical theories that start from premises that almost all educated people accept (quantum mechanics in the one case, Econ 101 in the other), and claim to reach conclusions that most educated people reject, or are at least puzzled by (the existence of parallel universes / the desirability of eliminating fire departments). Both theories seem to have a strong following with nerds who read science fiction and post to Internet discussion groups, but a relatively poorer following with both John Q. Public and Alistair K. Intellectual. (Needless to say, these stereotypes tell us almost nothing about the theories’ validity.)

My own hypothesis has to do with bullet-dodgers versus bullet-swallowers. A bullet-dodger is a person who says things like:

Sure, obviously if you pursued that particular line of reasoning to an extreme, then you’d get such-and-such an absurd-seeming conclusion. But that very fact suggests that other forces might come into play that we don’t understand yet or haven’t accounted for. So let’s just make a mental note of it and move on.

Faced with exactly the same situation, a bullet-swallower will exclaim:

The entire world should follow the line of reasoning to precisely this extreme, and this is the conclusion, and if a ‘consensus of educated opinion’ finds it disagreeable or absurd, then so much the worse for educated opinion! Those who accept this are intellectual heroes; those who don’t are cowards.

In a lifetime of websurfing, I don’t think I’ve ever read an argument by a libertarian or a Many-Worlds proponent that didn’t sound like the latter.

We know plenty of historical examples where the bullet-swallowers were gloriously right: Moore’s Law, Darwinism, the abolition of slavery, women’s rights. On the other hand, at various points within the last 150 years, extremely smart people also reasoned themselves to the inescapable conclusions that aether had to exist for light to be a wave in, that capitalism was reaching its final crisis, that only a world government could prevent imminent nuclear war, and that space colonies would surely exist by 2000. In those cases, even if you couldn’t spot any flaws in the arguments, you still would’ve been wise to doubt their conclusions. (Or are you sure you would have spotted the flaws where Maxwell and Kelvin, Russell and Einstein did not?)

Here’s a favorite analogy. The world is a real-valued function that’s almost completely unknown to us, and that we only observe in the vicinity of a single point x0. To our surprise, we find that, within that tiny vicinity, we can approximate the function extremely well by a Taylor series.

“Aha!” exclaim the bullet-swallowers. “So then the function must be the infinite series, neither more nor less.”

“Not so fast,” reply the bullet-dodgers. “All we know is that we can approximate the function in a small open interval around x0. Who knows what unsuspected phenomena might be lurking beyond it?”

“Intellectual cowardice!” the first group snorts. “You’re just like the Jesuit schoolmen, who dismissed the Copernican system as a mere calculational device! Why can’t you accept what our best theory is clearly telling us?”

So who’s right: the bullet-swallowing libertarian Many-Worlders, or the bullet-dodging intellectual kibitzers? Well, that depends on whether the function is sin(x) or log(x).
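For the numerically inclined, the punchline is easy to check: expand both functions around an arbitrary point, here x0 = 1. Sin’s Taylor series keeps tracking the function arbitrarily far away, while log’s series stops describing the function one unit out from the expansion point. A quick sketch:

```python
from math import sin, cos, log, factorial

def taylor_sin(x, x0, terms=40):
    """Taylor expansion of sin about x0 -- converges to sin(x) for every x."""
    derivs = [sin(x0), cos(x0), -sin(x0), -cos(x0)]   # derivatives of sin cycle with period 4
    return sum(derivs[k % 4] * (x - x0) ** k / factorial(k) for k in range(terms))

def taylor_log(x, x0, terms=40):
    """Taylor expansion of log about x0 -- only converges for |x - x0| < x0."""
    u = (x - x0) / x0
    return log(x0) + sum((-1) ** (k + 1) * u ** k / k for k in range(1, terms))

x0 = 1.0
print(taylor_sin(5.0, x0), sin(5.0))    # agree to machine precision
print(taylor_log(1.5, x0), log(1.5))    # agree: inside the radius of convergence
print(taylor_log(3.0, x0), log(3.0))    # wildly off: outside the radius
```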