Archive for the ‘Metaphysical Spouting’ Category

An Orthodox rabbi and Steven Weinberg walk into an email exchange…

Friday, October 22nd, 2021

Ever since I posted my obituary for the great Steven Weinberg three months ago, I’ve gotten a steady trickle of emails—all of which I’ve appreciated enormously—from people who knew Steve, or were influenced by him, and who wanted to share their own thoughts and memories. Last week, I was contacted by one Moshe Katz, an Orthodox rabbi, who wanted to share a long email exchange that he’d had with Steve, about Steve’s reasons for rejecting his birth-religion of Judaism (along with every other religion). Even though Rabbi Katz, rather than Steve, does most of the talking in this exchange, and even though Steve mostly expresses the same views he’d expressed in many of his public writings, I knew immediately on seeing this exchange that it could be of broader interest—so I secured permission to share it here on Shtetl-Optimized, both from Rabbi Katz and from Steve’s widow Louise.

While longtime readers can probably guess what I think about most of the topics discussed, I’ll refrain from any editorial commentary in this post—but of course, feel free to share your own thoughts in the comments, and maybe I’ll join in. Mostly, reading this exchange reminded me that someone at some point should write a proper book-length biography of Steve, and someone should also curate and publish a selection of his correspondence, much like Perfectly Reasonable Deviations from the Beaten Track did for Richard Feynman. There must be a lot more gems to be mined.

Anyway, without further ado, here’s the exchange (10 pages, PDF).

Update (Nov. 2, 2021): By request, see here for some of my own thoughts.

The Zen Anti-Interpretation of Quantum Mechanics

Thursday, March 4th, 2021

As I lay bedridden this week, knocked out by my second dose of the Moderna vaccine, I decided I should blog some more half-baked ideas because what the hell? It feels therapeutic, I have tenure, and anyone who doesn’t like it can close their browser tab.

So: although I’ve written tens of thousands of words, on this blog and elsewhere, about interpretations of quantum mechanics, again and again I’ve dodged the question of which interpretation (if any) I really believe myself. Today, at last, I’ll emerge from the shadows and tell you precisely where I stand.

I hold that all interpretations of QM are just crutches that are better or worse at helping you along to the Zen realization that QM is what it is and doesn’t need an interpretation.  As Sidney Coleman famously argued, what needs reinterpretation is not QM itself, but all our pre-quantum philosophical baggage—the baggage that leads us to demand, for example, that a wavefunction |ψ⟩ either be “real” like a stubbed toe or else “unreal” like a dream. Crucially, because this philosophical baggage differs somewhat from person to person, the “best” interpretation—meaning, the one that leads most quickly to the desired Zen state—can also differ from person to person. Meanwhile, though, thousands of physicists (and chemists, mathematicians, quantum computer scientists, etc.) have approached the Zen state merely by spending decades working with QM, never worrying much about interpretations at all. This is probably the truest path; it’s just that most people lack the inclination, ability, or time.

Greg Kuperberg, one of the smartest people I know, once told me that the problem with the Many-Worlds Interpretation is not that it says anything wrong, but only that it’s “melodramatic” and “overwritten.” Greg is far along the Zen path, probably further than me.

You shouldn’t confuse the Zen Anti-Interpretation with “Shut Up And Calculate.” The latter phrase, mistakenly attributed to Feynman but really due to David Mermin, is something one might say at the beginning of the path, when one is as a baby. I’m talking here only about the endpoint of the path, which one can approach but never reach—the endpoint where you intuitively understand exactly what a Many-Worlder, Copenhagenist, or Bohmian would say about any given issue, and also how they’d respond to each other, and how they’d respond to the responses, etc., but after years of study and effort you’ve returned to the situation of the baby, who just sees the thing for what it is.

I don’t mean to say that the interpretations are all interchangeable, or equally good or bad. If you had to, you could call even me a “Many-Worlder,” but only in the following limited sense: that in fifteen years of teaching quantum information, my experience has consistently been that for most students, Everett’s crutch is the best one currently on the market. At any rate, it’s the one that’s the most like a straightforward picture of the equations, and the least like a wobbly tower of words that might collapse if you utter any wrong ones.  Unlike Bohr, Everett will never make you feel stupid for asking the questions an inquisitive child would ask; he’ll simply give you answers that are as clear, logical, and internally consistent as they are metaphysically extravagant. That’s a start.

The Copenhagen Interpretation retains a place of honor as the first crutch, for decades the only crutch, and the one closest to the spirit of positivism. Unfortunately, wielding the Copenhagen crutch requires mad philosophical skillz—which parts of the universe should you temporarily regard as “classical”? which questions should be answered, and which deflected?—to the point where, if you’re capable of all that verbal footwork, then why do you even need a crutch in the first place? In the hands of amateurs—meaning, alas, nearly everyone—Copenhagen often leads away from rather than toward the Zen state, as one sees with the generations of New-Age bastardizations about “observations creating reality.”

As for deBroglie-Bohm—well, that’s a weird, interesting, baroque crutch, one whose actual details (the preferred basis and the guiding equation) are historically contingent and tied to specific physical systems. It’s probably the right crutch for someone—it gets eternal credit for having led Bell to discover the Bell inequality—but its quirks definitely need to be discarded along the way.

Note that, among those who approach the Zen state, many might still call themselves Many-Worlders or Copenhagenists or Bohmians or whatever—just as those far along in spiritual enlightenment might still call themselves Buddhists or Catholics or Muslims or Jews (or atheists or agnostics)—even though, by that point, they might have more in common with each other than they do with their supposed coreligionists or co-irreligionists.

Alright, but isn’t all this Zen stuff just a way to dodge the actual, substantive questions about QM, by cheaply claiming to have transcended them? If that’s your charge, then please help yourself to the following FAQ about the details of the Zen Anti-Interpretation.

  1. What is a quantum state? It’s a unit vector of complex numbers (or if we’re talking about mixed states, then a trace-1, Hermitian, positive semidefinite matrix), which encodes everything there is to know about a physical system.
  2. OK, but are the quantum states “ontic” (really out in the world), or “epistemic” (only in our heads)? Dude. Do “basketball games” really exist, or is that just a phrase we use to summarize our knowledge about certain large agglomerations of interacting quarks and leptons? Do even the “quarks” and “leptons” exist, or are those just words for excitations of the more fundamental fields? Does “jealousy” exist? Pretty much all our concepts are complicated grab bags of “ontic” and “epistemic,” so it shouldn’t surprise us if quantum states are too. Bad dichotomy.
  3. Why are there probabilities in QM? Because QM is a (the?) generalization of probability theory to involve complex numbers, whose squared absolute values are probabilities. It includes probability as a special case.
  4. But why do the probabilities obey the Born rule? Because, once the unitary part of QM has picked out the 2-norm as being special, for the probabilities also to be governed by the 2-norm is pretty much the only possibility that makes mathematical sense; there are many nice theorems formalizing that intuition under reasonable assumptions.
  5. What is an “observer”? It’s exactly what modern decoherence theory says it is: a particular kind of quantum system that interacts with other quantum systems, becomes entangled with them, and thereby records information about them—reversibly in principle but irreversibly in practice.
  6. Can observers be manipulated in coherent superposition, as in the Wigner’s Friend scenario? If so, they’d be radically unlike any physical system we’ve ever had direct experience with. So, are you asking whether such “observers” would be conscious, or if so what they’d be conscious of? Who the hell knows?
  7. Do “other” branches of the wavefunction—ones, for example, where my life took a different course—exist in the same sense this one does? If you start with a quantum state for the early universe and then time-evolve it forward, then yes, you’ll get not only “our” branch but also a proliferation of other branches, in the overwhelming majority of which Donald Trump was never president and civilization didn’t grind to a halt because of a bat near Wuhan.  But how could we possibly know whether anything “breathes fire” into the other branches and makes them real, when we have no idea what breathes fire into this branch and makes it real? This is not a dodge—it’s just that a simple “yes” or “no” would fail to do justice to the enormity of such a question, which is above the pay grade of physics as it currently exists. 
  8. Is this it? Have you brought me to the end of the path of understanding QM? No, I’ve just pointed the way toward the beginning of the path. The most fundamental tenet of the Zen Anti-Interpretation is that there’s no shortcut to actually working through the Bell inequality, quantum teleportation, Shor’s algorithm, the Kochen-Specker and PBR theorems, possibly even a … photon or a hydrogen atom, so you can see quantum probability in action and be enlightened. I’m further along the path than I was twenty years ago, but not as far along as some of my colleagues. Even the greatest quantum Zen masters will be able to get further when new quantum phenomena and protocols are discovered in the future. All the same, though—and this is another major teaching of the Zen Anti-Interpretation—there’s more to life than achieving greater and greater clarity about the foundations of QM. And on that note…
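For readers who'd like to poke at items 1, 3, and 4 of the FAQ with their own hands, here is a minimal sketch in plain Python. The particular state (amplitudes 3/5 and 4i/5) and the choice of the Hadamard matrix as the unitary are illustrative assumptions of this sketch, not anything special. It only shows that a unit vector of complex amplitudes yields probabilities via squared absolute values, and that a unitary map preserves the 2-norm, so the probabilities still sum to 1 afterward.

```python
import math


def born_probabilities(state):
    """Born rule: the probability of each outcome is |amplitude|^2."""
    return [abs(amplitude) ** 2 for amplitude in state]


def apply_matrix(matrix, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]


# Illustrative qubit state: a unit vector of two complex amplitudes.
psi = [complex(3 / 5, 0), complex(0, 4 / 5)]
probs = born_probabilities(psi)

# Illustrative unitary: the Hadamard matrix. It preserves the 2-norm,
# so the transformed state is still a valid quantum state.
s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
phi = apply_matrix(hadamard, psi)
probs_after = born_probabilities(phi)

print("before:", probs, "sum =", sum(probs))
print("after: ", probs_after, "sum =", sum(probs_after))
```

Swapping in a non-unitary matrix (say, one that preserves the 1-norm instead) would generally break the "sums to 1" property for squared absolute values, which is one small way to see why the 2-norm and the Born rule travel together.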

To those who asked me about Claus Peter Schnorr’s claim to have discovered a fast classical factoring algorithm, thereby “destroying” (in his words) the RSA cryptosystem, see (e.g.) this Twitter thread by Keegan Ryan, which explains what certainly looks like a fatal error in Schnorr’s paper.

Once we can see them, it’s too late

Saturday, January 30th, 2021

[updates: here’s the paper, and here’s Robin’s brief response to some of the comments here]

This month Robin Hanson, the famous and controversy-prone George Mason University economics professor who I’ve known since 2004, was visiting economists here in Austin for a few weeks. So, while my fear of covid considerably exceeds Robin’s, I met with him a few times in the mild Texas winter in an outdoor, socially-distanced way. It took only a few minutes for me to remember why I enjoy talking to Robin so much.

See, while I’d been moping around depressed about covid, the vaccine rollout, the insurrection, my inability to focus on work, and a dozen other things, Robin was bubbling with excitement about a brand-new mathematical model he was working on to understand the growth of civilizations across the universe—a model that, Robin said, explained lots of cosmic mysteries in one fell swoop and also made striking predictions. My cloth facemask was, I confess, unable to protect me from Robin’s infectious enthusiasm.

As I listened, I went through the classic stages of reaction to a new Hansonian proposal: first, bemusement over the sheer weirdness of what I was being asked to entertain, as well as Robin’s failure to acknowledge that weirdness in any way whatsoever; then, confusion about the unstated steps in his radically-condensed logic; next, the raising by me of numerous objections (each of which, it turned out, Robin had already thought through at length); finally, the feeling that I must have seen it this way all along, because isn’t it kind of obvious?

Robin has been explaining his model in a sequence of Overcoming Bias posts, and will apparently have a paper out about the model soon (update: the paper is here!). In this post, I’d like to offer my own take on what Robin taught me. Blame for anything I mangle lies with me alone.

To cut to the chase, Robin is trying to explain the famous Fermi Paradox: why, after 60+ years of looking, and despite the periodic excitement around Tabby’s star and ‘Oumuamua and the like, have we not seen a single undisputed sign of an extraterrestrial civilization? Why all this nothing, even though the observable universe is vast, even though (as we now know) organic molecules and planets in Goldilocks zones are everywhere, and even though there have been billions of years for aliens someplace to get a technological head start on us, expanding across a galaxy to the point where they’re easily seen?

Traditional answers to this mystery include: maybe the extraterrestrials quickly annihilate themselves in nuclear wars or environmental cataclysms, just like we soon will; maybe the extraterrestrials don’t want to be found (whether out of self-defense or a cosmic Prime Directive); maybe they spend all their time playing video games. Crucially, though, all answers of that sort founder against the realization that, given a million alien civilizations, each perhaps more different from the others than kangaroos are from squid, it would only take one, spreading across a billion light-years and transforming everything to its liking, for us to have noticed it.

Robin’s answer to the puzzle is as simple as it is terrifying. Such civilizations might well exist, he says, but if so, by the time we noticed one, it would already be nearly too late. Robin proposes, plausibly I think, that if you give a technological civilization 10 million or so years—i.e., an eyeblink on cosmological timescales—then either

  1. the civilization wipes itself out, or else
  2. it reaches some relatively quiet steady state, or else
  3. if it’s serious about spreading widely, then it “maxes out” the technology with which to do so, approaching the limits set by physical law.

In cases 1 or 2, the civilization will of course be hard for us to detect, unless it happens to be close by. But what about case 3? There, Robin says, the “civilization” should look from the outside like a sphere expanding at nearly the speed of light, transforming everything in its path.

Now think about it: when could we, on earth, detect such a sphere with our telescopes? Only when the sphere’s thin outer shell had reached the earth—perhaps carrying radio signals from the extraterrestrials’ early history, before their rapid expansion started. By that point, though, the expanding sphere itself would be nearly upon us!

What would happen to us once we were inside the sphere? Who knows? The expanding civilization might obliterate us, it might preserve us as zoo animals, it might merge us into its hive-mind, it might do something else that we can’t imagine, but in any case, detecting the civilization would presumably no longer be the relevant concern!

(Of course, one could also wonder what happens when two of these spheres collide: do they fight it out? do they reach some agreement? do they merge? Whatever the answer, though, it doesn’t matter for Robin’s argument.)

On the view described, there’s only a tiny cosmic window in which a SETI program could be expected to succeed: namely, when the thin surface of the first of these expanding bubbles has just hit us, and when that surface hasn’t yet passed us by. So, given our “selection bias”—meaning, the fact that we apparently haven’t yet been swallowed up by one of the bubbles—it’s no surprise if we don’t right now happen to find ourselves in the tiny detection window!
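To put rough numbers on how tiny that window is, here is a back-of-the-envelope sketch. It assumes flat, non-expanding space (so it ignores cosmology entirely), a civilization that starts expanding at a fixed fraction `beta` of the speed of light, and light from the start of the expansion setting out at the same moment; the function name and the sample values of `beta` and distance are purely illustrative and are not taken from Robin's model. With distance in light-years and time in years, the light arrives after `distance` years and the frontier after `distance / beta` years, so the warning window is the difference.

```python
def warning_window_years(distance_ly, beta):
    """Years between first light from the expansion reaching us and the
    frontier itself arriving, in flat non-expanding space (c = 1 ly/yr)."""
    if not 0 < beta < 1:
        raise ValueError("beta must be a fraction of light speed in (0, 1)")
    light_arrival = distance_ly          # light covers 1 light-year per year
    frontier_arrival = distance_ly / beta
    return frontier_arrival - light_arrival


distance = 1e9  # illustrative: a civilization a billion light-years away
for beta in (0.5, 0.9, 0.99):
    window = warning_window_years(distance, beta)
    fraction = window / (distance / beta)  # share of the trip we could see it
    print(f"beta={beta}: window = {window:.3g} years, "
          f"fraction of travel time = {fraction:.3g}")
```

In this toy setup the fraction of the frontier's travel time during which we could see it coming works out to 1 − beta, which is why the window shrinks toward nothing as the expansion speed approaches that of light.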

This basic proposal, it turns out, is not original to Robin. Indeed, an Overcoming Bias reader named Daniel X. Varga pointed out to Robin that he (Daniel) shared the same idea right here—in a Shtetl-Optimized comment thread—back in 2008! I must have read Daniel Varga’s comment then, but (embarrassingly) it didn’t make enough of an impression for me to have remembered it. I probably thought the same as you probably thought while reading this post:

“Sure, whatever. This is an amusing speculation that could make for a fun science-fiction story. Alas, like with virtually every story about extraterrestrials, there’s no good reason to favor this over a hundred other stories that a fertile imagination could just as easily spin. Who the hell knows?”

This is where Robin claims to take things further. Robin would say that he takes them further by developing a mathematical model, and fitting the parameters of the model to the known facts of cosmic history. Read Overcoming Bias, or Robin’s forthcoming paper, if you want to know the details of his model. Personally, I confess I’m less interested in those details than I am in the qualitative points, which (unless I’m mistaken) are easy enough to explain in words.

The key realization is this: when we contemplate the Fermi Paradox, we know more than the mere fact that we look and look and we don’t see any aliens. There are other relevant data points to fit, having to do with the one sample of a technological civilization that we do have.

For starters, there’s the fact that life on earth has been evolving for at least ~3.5 billion years—for most of the time the earth has existed—but life has a mere billion more years to go, until the expanding sun boils away the oceans and makes the earth barely habitable. In other words, at least on this planet, we’re already relatively close to the end. Why should that be?

It’s an excellent fit, Robin says, to a model wherein there are a few incredibly difficult, improbable steps along the way to a technological civilization like ours—steps that might include the origin of life, of multicellular life, of consciousness, of language, of something else—and wherein, having achieved some step, evolution basically just does a random search until it either stumbles onto the next step or else runs out of time.

Of course, given that we’re here to talk about it, we necessarily find ourselves on a planet where all the steps necessary for blog-capable life happen to have succeeded. There might be vastly more planets where evolution got stuck on some earlier step.

But here’s the interesting part: conditioned on all the steps having succeeded, we should find ourselves near the end of the useful lifetime of our planet’s star—simply because the more time is available on a given planet, the better the odds there. I.e., look around the universe and you should find that, on most of the planets where evolution achieves all the steps, it nearly runs out the planet’s clock in doing so. Also, as we look back, we should find the hard steps roughly evenly spaced out, with each one having taken a good fraction of the whole available time. All this is an excellent match for what we see.
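Here is a small Monte Carlo sketch of that claim, under assumptions that are mine rather than Robin's: each hard step is modeled as an independent exponential waiting time whose mean (5 units) is several times longer than the planet's habitable window (1 unit), there are 3 steps, and we simply throw away every simulated planet that fails to finish in time. The function name and all parameter values are hypothetical. Among the rare planets that succeed, we then average the time each step took and the time left over at the end.

```python
import random


def simulate_hard_steps(num_steps=3, mean_wait=5.0, window=1.0,
                        trials=600_000, seed=0):
    """Rejection-sample planets where every hard step finishes within the
    window; return (number of successes, average gap lengths).

    The returned list has num_steps + 1 entries: the average duration of
    each step, followed by the average time remaining after the last step.
    """
    rng = random.Random(seed)
    gap_sums = [0.0] * (num_steps + 1)
    successes = 0
    for _trial in range(trials):
        elapsed = 0.0
        gaps = []
        for _step in range(num_steps):
            wait = rng.expovariate(1.0 / mean_wait)
            elapsed += wait
            if elapsed > window:
                break  # evolution ran out the clock on this planet
            gaps.append(wait)
        else:
            # All steps succeeded; record the leftover time as a final gap.
            gaps.append(window - elapsed)
            successes += 1
            for i, gap in enumerate(gaps):
                gap_sums[i] += gap
    if successes == 0:
        raise RuntimeError("no successful planets; increase trials")
    return successes, [total / successes for total in gap_sums]


successes, average_gaps = simulate_hard_steps()
print("successful planets:", successes)
print("average step durations and leftover time:", average_gaps)
```

With these settings the averages should come out roughly equal, each near 1/(num_steps + 1) of the window: the steps look evenly spaced in hindsight, and the leftover time is about as short as any single step, even though each step was individually expected to take far longer than the whole window.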

OK, but it leads to a second puzzle. Life on earth is at least ~3.5 billion years old, while the observable universe is ~13.7 billion years old. Forget for a moment about the oft-stressed enormity of these two timescales and concentrate on their ratio, which is merely ~4. Life on earth stretches a full quarter of the way back in time to the Big Bang. Even as an adolescent, I remember finding that striking, and not at all what I would’ve guessed a priori. It seemed like obviously a clue to something, if I could only figure out what.

The puzzle is compounded once you realize that, even though the sun will boil the oceans in a billion years (and then die in a few billion more), other stars, primarily dwarf stars, will continue shining brightly for trillions more years. Granted, the dwarf stars don’t seem quite as hospitable to life as sun-like stars, but they do seem somewhat hospitable, and there will be lots of them—indeed, more than of sun-like stars. And they’ll last orders of magnitude longer.

To sum up, our temporal position relative to the lifetime of the sun makes it look as though life on earth was just a lucky draw from a gigantic cosmic Poisson process. By contrast, our position relative to the lifetime of all the stars makes it look as though we arrived crazily, freakishly early—not at all what you’d expect under a random model. So what gives?

Robin contends that all of these facts are explained under his bubble scenario. If we’re to have an experience remotely like the human one, he says, then we have to be relatively close to the beginning of time—since hundreds of billions of years from now, the universe will likely be dominated by near-light-speed expanding spheres of intelligence, and a little upstart civilization like ours would no longer stand a chance. I.e., even though our existence is down to some lucky accidents, and even though those same accidents probably recur throughout the cosmos, we shouldn’t yet see any of the other accidents, since if we did see them, it would already be nearly too late for us.

Robin admits that his account leaves a huge question open: namely, why should our experience have been a “merely human,” “pre-bubble” experience at all? If you buy that these expanding bubbles are coming, it seems likely that there will be trillions of times more sentient experiences inside them than outside. So experiences like ours would be rare and anomalous—like finding yourself at the dawn of human history, with Hammurabi et al., and realizing that almost every interesting thing that will ever happen is still to the future. So Robin simply takes as a brute fact that our experience is “earth-like” or “human-like”; he then tries to explain the other observations from that starting point.

Notice that, in Robin’s scenario, the present epoch of the universe is extremely special: it’s when civilizations are just forming, when perhaps a few of them will achieve technological liftoff, but before one or more of the civilizations has remade the whole of creation for its own purposes. Now is the time when the early intelligent beings like us can still look out and see quadrillions of stars shining to no apparent purpose, just wasting all that nuclear fuel in a near-empty cosmos, waiting for someone to come along and put the energy to good use. In that respect, we’re sort of like the Maoris having just landed in New Zealand, or Bill Gates surveying the microcomputer software industry in 1975. We’re ridiculously lucky. The situation is way out of equilibrium. The golden opportunity in front of us can’t possibly last forever.

If we accept the above, then a major question I had was the role of cosmology. In 1998, astronomers discovered that the present cosmological epoch is special for a completely different reason than the one Robin talks about. Namely, right now is when matter and dark energy contribute roughly similarly to the universe’s energy budget, with ~30% the former and ~70% the latter. Billions of years hence, the universe will become more and more dominated by dark energy. Our observable region will get sparser and sparser, as the dark energy pushes the galaxies further and further away from each other and from us, with more and more galaxies receding past the horizon where we could receive signals from them at the speed of light. (Which means, in particular, that if you want to visit a galaxy a few billion light-years from here, you’d better start out while you still can!)

So here’s my question: is it just a coincidence that the time—right now—when the universe is “there for the taking,” potentially poised between competing spacefaring civilizations, is also the time when it’s poised between matter and dark energy? Note that, in 2007, Bousso et al. tried to give a sophisticated anthropic argument for the value of the cosmological constant Λ, which measures the density of dark energy, and hence the eventual size of the observable universe. See here for my blog post on what they did (“The array size of the universe”). Long story short, for reasons that I explain in the post, it turns out to be essential to their anthropic explanation for Λ that civilizations flourish only (or mainly) in the present epoch, rather than trillions of years in the future. If we had to count civilizations that far into the future, then the calculations would favor values of Λ much smaller than what we actually observe. This, of course, seems to dovetail nicely with Robin’s account.

Let me end with some “practical” consequences of Robin’s scenario, supposing as usual that we take it seriously. The most immediate consequence is that the prospects for SETI are dimmer than you might’ve thought before you’d internalized all this. (Even after having internalized it, I’d still like at least an order of magnitude more resources devoted to SETI than what our civilization currently spares. Robin’s assumptions might be wrong!)

But a second consequence is that, if we want human-originated sentience to spread across the universe, then the sooner we get started the better! Just like Bill Gates in 1975, we should expect that there will soon be competitors out there. Indeed, there are likely competitors out there “already” (where “already” means, let’s say, in the rest frame of the cosmic microwave background)—it’s just that the light from them hasn’t yet reached us. So if we want to determine our own cosmic destiny, rather than having post-singularity extraterrestrials determine it for us, then it’s way past time to get our act together as a species. We might have only a few hundred million more years to do so.

Update: For more discussion of this post, see the SSC Reddit thread. I especially liked a beautiful comment by “Njordsier,” which fills in some important context for the arguments in this post:

Suppose you’re an alien anthropologist that sent a probe to Earth a million years ago, and that probe can send back one high-resolution image of the Earth every hundred years. You’d barely notice humans at first, though they’re there. Then, circa 10,000 years ago (99% of the way into the stream) you begin to see plots of land turned into farms. Houses, then cities, first in a few isolated places in river valleys, then exploding across five or six continents. Walls, roads, aqueducts, castles, fortresses. Four frames before the end of the stream, the collapse of the population on two of the continents as invaders from another continent bring disease. At T-minus three frames, a sudden appearance of farmland and cities on the coasts those continents. At T-minus two frames, half the continent. At the second to last frame, a roaring interconnected network of roads, cities, farms, including skyscrapers in the cities that were just trying villas three frames ago. And in the last frame, nearly 80 percent of all wilderness converted to some kind of artifice, and the sky is streaked with the trails of flying machines all over the world.

Civilizations rose and fell, cultures evolved and clashed, and great and terrible men and women performed awesome deeds. But what the alien anthropologist sees is a consistent, rapid, exponential explosion of a species bulldozing everything in its path.

That’s what we’re doing when we talk about the far future, or about hypothetical expansionist aliens, on long time scales. We’re zooming out past the level where you can reason about individuals or cultures, but see the strokes of much longer patterns that emerge from that messy, beautiful chaos that is civilization.

Update (Jan. 31): Reading the reactions here, on Hacker News, and elsewhere underscored for me that a lot of people get off Robin’s train well before it’s even left the station. Such people think of extraterrestrial civilizations as things that you either find or, if you haven’t found one, you just speculate or invent stories about. They’re not even in the category of things that you have any serious hope to reason about. For myself, I’d simply observe that trying to reason about matters far beyond current human experience, based on the microscopic shreds of fact available to us (e.g., about the earth’s spatial and temporal position within the universe), has led to some of our species’ embarrassing failures but also to some of its greatest triumphs. Since even the failures tend to be relatively cheap, I feel like we ought to be “venture capitalists” about such efforts to reason beyond our station, encouraging them collegially and mocking them only gently.

The Complete Idiot’s Guide to the Independence of the Continuum Hypothesis: Part 1 of ≤ℵ0

Saturday, October 31st, 2020

A global pandemic, apocalyptic fires, and the possible descent of the US into violent anarchy three days from now can do strange things to the soul.

Bertrand Russell—and if he’d done nothing else in his long life, I’d love him forever for it—once wrote that “in adolescence, I hated life and was continually on the verge of suicide, from which, however, I was restrained by the desire to know more mathematics.” This summer, unable to bear the bleakness of 2020, I obsessively read up on the celebrated proof of the unsolvability of the Continuum Hypothesis (CH) from the standard foundation of mathematics, the Zermelo-Fraenkel axioms of set theory. (In this post, I’ll typically refer to “ZFC,” which means Zermelo-Fraenkel plus the famous Axiom of Choice.)

For those tuning in from home, the Continuum Hypothesis was formulated by Georg Cantor, shortly after his epochal discovery that there are different orders of infinity: so for example, the infinity of real numbers (denoted C for continuum, or 2^ℵ0) is strictly greater than the infinity of integers (denoted ℵ0, or “Aleph-zero”). CH is simply the statement that there’s no infinity intermediate between ℵ0 and C: that anything greater than the first is at least the second. Cantor tried in vain for decades to prove or disprove CH; the quest is believed to have contributed to his mental breakdown. When David Hilbert presented his famous list of 23 unsolved math problems in 1900, CH was at the very top.

Halfway between Hilbert’s speech and today, the question of CH was finally “answered,” with the solution earning the only Fields Medal that’s ever been awarded for work in set theory and logic. But unlike with any previous yes-or-no question in the history of mathematics, the answer was that there provably is no answer from the accepted axioms of set theory! You can either have intermediate infinities or not; neither possibility can create a contradiction. And if you do have intermediate infinities, it’s up to you how many: 1, 5, 17, ∞, etc.

The easier half, the consistency of CH with set theory, was proved by incompleteness dude Kurt Gödel in 1940; the harder half, the consistency of not(CH), by Paul Cohen in 1963. Cohen’s work introduced the method of forcing, which was so fruitful in proving set-theoretic questions unsolvable that it quickly took over the whole subject of set theory. Learning Gödel and Cohen’s proofs had been a dream of mine since teenagerhood, but one I constantly put off.

This time around I started with Cohen’s retrospective essay, as well as Timothy Chow’s Forcing for Dummies and A Beginner’s Guide to Forcing. I worked through Cohen’s own Set Theory and the Continuum Hypothesis, and Ken Kunen’s Set Theory: An Introduction to Independence Proofs, and Dana Scott’s 1967 paper reformulating Cohen’s proof. I emailed questions to Timothy Chow, who was ridiculously generous with his time. When Tim and I couldn’t answer something, we tried Bob Solovay (one of the world’s great set theorists, who later worked in computational complexity and quantum computing), or Andreas Blass or Asaf Karagila. At some point mathematician and friend-of-the-blog Greg Kuperberg joined my quest for understanding. I thank all of them, but needless to say take sole responsibility for all the errors that surely remain in these posts.

On the one hand, the proof of the independence of CH would seem to stand with general relativity, the wheel, and the chocolate bar as a triumph of the human intellect. It represents a culmination of Cantor’s quest to know the basic rules of infinity—all the more amazing if the answer turns out to be that, in some sense, we can’t know them.

On the other hand, perhaps no other scientific discovery of equally broad interest remains so sparsely popularized, not even (say) quantum field theory or the proof of Fermat’s Last Theorem. I found barely any attempts to explain how forcing works to non-set-theorists, let alone to non-mathematicians. One notable exception was Timothy Chow’s Beginner’s Guide to Forcing, mentioned earlier—but Chow himself, near the beginning of his essay, calls forcing an “open exposition problem,” and admits that he hasn’t solved it. My modest goal, in this post and the following ones, is to make a further advance on the exposition problem.

OK, but why a doofus computer scientist like me? Why not, y’know, an actual expert? I won’t put forward my ignorance as a qualification, although I have often found that the better I learn a topic, the more completely I forget what initially confused me, and so the less able I become to explain things to beginners.

Still, there is one thing I know well that turns out to be intimately related to Cohen’s forcing method, and that made me feel like I had a small “in” for this subject. This is the construction of oracles in computational complexity theory. In CS, we like to construct hypothetical universes where P=NP or P≠NP, or P≠BQP, or the polynomial hierarchy is infinite, etc. To do so, we, by fiat, insert a new function—an oracle—into the universe of computational problems, carefully chosen to make the desired statement hold. Often the oracle needs to satisfy an infinite list of conditions, so we handle them one by one, taking care that when we satisfy a new condition we don’t invalidate the previous conditions.

All this, I kept reading, is profoundly analogous to what the set theorists do when they create a mathematical universe where the Axiom of Choice is true but CH is false, or vice versa, or any of a thousand more exotic possibilities. They insert new sets into their models of set theory, sets that are carefully constructed to “force” infinite lists of conditions to hold. In fact, some of the exact same people—such as Solovay—who helped pioneer forcing in the 1960s, later went on to pioneer oracles in computational complexity. We’ll say more about this connection in a future post.

How Could It Be?

How do you study a well-defined math problem, and return the answer that, as far as the accepted axioms of math can say, there is no answer? I mean: even supposing it’s true that there’s no answer, how do you prove such a thing?

Arguably, not even Gödel’s Incompleteness Theorem achieved such a feat. Recall, the Incompleteness Theorem says loosely that, for every formal system F that could possibly serve as a useful foundation for mathematics, there exist statements even of elementary arithmetic that are true but unprovable in F—and Con(F), a statement that encodes F’s own consistency, is an example of one. But the very statement that Con(F) is unprovable is equivalent to Con(F)’s being true (since an inconsistent system could prove anything, including Con(F)). In other words, if the Incompleteness Theorem as applied to F holds any interest, then that’s only because F is, in fact, consistent; it’s just that resources beyond F are needed to prove this.

Yes, there’s a “self-hating theory,” F+Not(Con(F)), which believes in its own inconsistency. And yes, by Gödel, this self-hating theory is consistent if F itself is. This means that it has a model—involving “nonstandard integers,” formal artifacts that effectively promise a proof of F’s inconsistency without ever actually delivering it. We’ll have much, much more to say about models later on, but for now, they’re just collections of objects, along with relationships between the objects, that satisfy all the axioms of a theory (thus, a model of the axioms of group theory is simply … any group!).

In any case, though, the self-hating theory F+Not(Con(F)) can’t be arithmetically sound: I mean, just look at it! It’s either unsound because F is consistent, or else it’s unsound because F is inconsistent. In general, this is one of the most fundamental points in logic: consistency does not imply soundness. If I believe that the moon is made of cheese, that might be consistent with all my other beliefs about the moon (for example, that Neil Armstrong ate delicious chunks of it), but that doesn’t mean my belief is true. Like the classic conspiracy theorist, who thinks that any apparent evidence against their hypothesis was planted by George Soros or the CIA, I might simply believe a self-consistent collection of absurdities. Consistency is purely a syntactic condition—it just means that I can never prove both a statement and its opposite—but soundness goes further, asserting that whatever I can prove is actually the case, a relationship between what’s inside my head and what’s outside it.

So again, assuming we had any business using F in the first place, the Incompleteness Theorem gives us two consistent ways to extend F (by adding Con(F) or by adding Not(Con(F))), but only one sound way (by adding Con(F)). But the independence of CH from the ZFC axioms of set theory is of a fundamentally different kind. It will give us models of ZFC+CH, and models of ZFC+Not(CH), that are both at least somewhat plausible as “sketches of mathematical reality”—and that both even have defenders. The question of which is right, or whether it’s possible to decide at all, will be punted to the future: to the discovery (or not) of some intuitively compelling foundation for mathematics that, as Gödel hoped, answers the question by going beyond ZFC.

Four Levels to Unpack

While experts might consider this too obvious to spell out, Gödel’s and Cohen’s analyses of CH aren’t so much about infinity, as they are about our ability to reason about infinity using finite sequences of symbols. The game is about building self-contained mathematical universes to order—universes where all the accepted axioms about infinite sets hold true, and yet that, in some cases, seem to mock what those axioms were supposed to mean, by containing vastly fewer objects than the mathematical universe was “meant” to have.

In understanding these proofs, the central hurdle, I think, is that there are at least four different “levels of description” that need to be kept in mind simultaneously.

At the first level, Gödel’s and Cohen’s proofs, like all mathematical proofs, are finite sequences of symbols. Not only that, they’re proofs that can be formalized in elementary arithmetic (!). In other words, even though they’re about the axioms of set theory, they don’t themselves require those axioms. Again, this is possible because, at the end of the day, Gödel’s and Cohen’s proofs won’t be talking about infinite sets, but “only” about finite sequences of symbols that make statements about infinite sets.

At the second level, the proofs are making an “unbounded” but perfectly clear claim. They’re claiming that, if someone showed you a proof of either CH or Not(CH), from the ZFC axioms of set theory, then no matter how long the proof or what its details, you could convert it into a proof that ZFC itself was inconsistent. In symbols, they’re proving the “relative consistency statements”

Con(ZFC) ⇒ Con(ZFC+CH),
Con(ZFC) ⇒ Con(ZFC+Not(CH)),

and they’re proving these as theorems of elementary arithmetic. (Note that there’s no hope of proving Con(ZFC+CH) or Con(ZFC+Not(CH)) outright within ZFC, since by Gödel, ZFC can’t even prove its own consistency.)

This translation is completely explicit; the independence proofs even yield algorithms to convert proofs of inconsistencies in ZFC+CH or ZFC+Not(CH), supposing that they existed, into proofs of inconsistencies in ZFC itself.
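
Read in contrapositive form, which is the direction in which those conversion algorithms run, the two relative consistency statements can be written as follows. This is only a restatement in one conventional notation, where \( T \vdash \bot \) is assumed to mean "the theory T proves a contradiction":

\[
\mathrm{ZFC} + \mathrm{CH} \vdash \bot \;\Longrightarrow\; \mathrm{ZFC} \vdash \bot,
\qquad
\mathrm{ZFC} + \neg\mathrm{CH} \vdash \bot \;\Longrightarrow\; \mathrm{ZFC} \vdash \bot .
\]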

Having said that, as Cohen himself often pointed out, thinking about the independence proofs in terms of algorithms to manipulate sequences of symbols is hopeless: to have any chance of understanding these proofs, let alone coming up with them, at some point you need to think about what the symbols refer to.

This brings us to the third level: the symbols refer to models of set theory, which could also be called “mathematical universes.” Crucially, we always can and often will take these models to be only countably infinite: that is, to contain an infinity of sets, but “merely” ℵ0 of them, the infinity of integers or of finite strings, and no more.

The fourth level of description is from within the models themselves: each model imagines itself to have an uncountable infinity of sets. As far as the model’s concerned, it comprises the entire mathematical universe, even though “looking in from outside,” we can see that that’s not true. In particular, each model of ZFC thinks it has uncountably many sets, many themselves of uncountable cardinality, even if “from the outside” the model is countable.

Say what? The models are mistaken about something as basic as their own size, about how many sets they have? Yes. The models will be like The Matrix (the movie, not the mathematical object), or The Truman Show. They’re self-contained little universes whose inhabitants can never discover that they’re living a lie—that they’re missing sets that we, from the outside, know to exist. The poor denizens of the Matrix will never even be able to learn that their universe—what they mistakenly think of as the universe—is secretly countable! And no Morpheus will ever arrive to enlighten them, although—and this is crucial to Cohen’s proof in particular—the inhabitants will be able to reason more-or-less intelligibly about what would happen if a Morpheus did arrive.

The Löwenheim-Skolem Theorem, from the early 1920s, says that any countable list of first-order axioms that has any model at all (i.e., that’s consistent), must have a model with at most countably many elements. And ZFC is a countable list of first-order axioms, so Löwenheim-Skolem applies to it—even though ZFC implies the existence of an uncountable infinity of sets! Before taking the plunge, we’ll need to not merely grudgingly accept but love and internalize this “paradox,” because pretty much the entire proof of the independence of CH is built on top of it.

Incidentally, once we realize that it’s possible to build self-consistent yet “fake” mathematical universes, we can ask the question that, incredibly, the Matrix movies never ask. Namely, how do we know that our own, larger universe isn’t similarly a lie? The answer is that we don’t! As an example—I hope you’re sitting down for this—even though Cantor proved that there are uncountably many real numbers, that only means there are uncountably many reals for us. We can’t rule out the possibility that God, looking down on our universe, would see countably many reals.

Cantor’s Proof Revisited

To back up: the whole story of CH starts, of course, with Cantor’s epochal discovery of the different orders of infinity, that for example, there are more subsets of positive integers (or equivalently real numbers, or equivalently infinite binary sequences) than there are positive integers. The devout Cantor thought his discovery illuminated the nature of God; it’s never been entirely obvious to me that he was wrong.

Recall how Cantor’s proof works: we suppose by contradiction that we have an enumeration of all infinite binary sequences: for example,

s(0) = 00000000…
s(1) = 01010101…
s(2) = 11001010…
s(3) = 10000000…

We then produce a new infinite binary sequence that’s not on the list, by going down the diagonal and flipping each bit, which in the example above would produce 1011…
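
For the programmers in the audience, here is a minimal sketch of the diagonal step in Python. It assumes a sequence is modeled as a function from an index to a bit, and it pads the four example rows with zeros past the digits shown. The names `diagonal_flip`, `ROWS`, and `s` are made up for this illustration.

```python
from typing import Callable

# A "sequence" is modeled as a function from index to bit (0 or 1).
# Only finitely many bits are ever inspected, so this stays computable.
BitSequence = Callable[[int], int]

def diagonal_flip(enumeration: Callable[[int], BitSequence]) -> BitSequence:
    """Return the sequence whose n-th bit is the flipped n-th bit of s(n)."""
    return lambda n: 1 - enumeration(n)(n)

# The four example rows from the text. Padding with zeros beyond the
# digits shown (and for rows past the fourth) is an assumption.
ROWS = ["00000000", "01010101", "11001010", "10000000"]

def s(i: int) -> BitSequence:
    row = ROWS[i] if i < len(ROWS) else ""
    return lambda n: int(row[n]) if n < len(row) else 0

d = diagonal_flip(s)
prefix = "".join(str(d(n)) for n in range(4))
print(prefix)  # 1011
```

By construction, `d` differs from `s(n)` at position `n` for every `n`, so it cannot equal any row of the list, however the list was chosen.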

But look more carefully. What Cantor really shows is only that, within our mathematical universe, there can’t be an enumeration of all the reals of our universe. For if there were, we could use it to define a new real that was in the universe but not in the enumeration. The proof doesn’t rule out the possibility that God could enumerate the reals of our universe! It only shows that, if so, there would need to be additional, heavenly reals that were missing from even God’s enumeration (for example, the one produced by diagonalizing against that enumeration).

Which reals could possibly be “missing” from our universe? Every real you can name—42, π, √e, even uncomputable reals like Chaitin’s Ω—has to be there, right? Yes, and there’s the rub: every real you can name. Each name is a finite string of symbols, so whatever your naming system, you can only ever name countably many reals, leaving 100% of the reals nameless.

Or did you think of only the rationals or algebraic numbers as forming a countable dust of discrete points, with numbers like π and e filling in the solid “continuum” between them? If so, then I hope you’re sitting down for this: every real number you’ve ever heard of belongs to the countable dust! The entire concept of “the continuum” is only needed for reals that don’t have names and never will.
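
To see why names can only ever reach countably many reals, it may help to list the names themselves. The sketch below assumes a "naming system" is nothing more than the finite strings over some finite alphabet (here the toy alphabet `"ab"`), and enumerates them shortest first. The function name `all_names` is invented for this example.

```python
from itertools import count, islice, product
from typing import Iterator

def all_names(alphabet: str) -> Iterator[str]:
    """Yield every finite non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

first = list(islice(all_names("ab"), 6))
print(first)  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Every finite string shows up at some finite position in this list, so the names can be matched one-to-one with the positive integers, and so can whatever reals they happen to name.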

From ℵ0 Feet

Gödel and Cohen’s achievement was to show that, without creating any contradictions in set theory, we can adjust the size of this elusive “continuum,” put more reals into it or fewer. How does one even start to begin to prove such a statement?

From a distance of ℵ0 feet, Gödel proves the consistency of CH by building minimalist mathematical universes: ones where “the only sets that exist, are the ones required to exist by the ZFC axioms.” (These universes can, however, differ from each other in how “tall” they are: that is, in how many ordinals they have, and hence how many sets overall. More about that in a future post!) Gödel proves that, if the axioms of set theory are consistent—that is, if they describe any universes at all—then they also describe these minimalist universes. He then proves that, in any of these minimalist universes, from the standpoint of someone within that universe, there are exactly ℵ1 real numbers, and hence CH holds.

At an equally stratospheric level, Cohen proves the consistency of not(CH) by building … well, non-minimalist mathematical universes! A simple way is to start with Gödel’s minimalist universe—or rather, an even more minimalist universe than his, one that’s been cut down to have only countably many sets—and then to stick in a bunch of new real numbers that weren’t in that universe before. We choose the new real numbers to ensure two things: first, that we still have a model of ZFC, and second, that we make CH false. The details of how to do that will, of course, concern us later.

My Biggest Confusion

In subsequent posts, I’ll say more about the character of the ZFC axioms and how one builds models of them to order. Just as a teaser, though, to conclude this post I’d like to clear up a fundamental misconception I had about this subject, from roughly the age of 16 until a couple months ago.

I thought: the way Gödel proves the consistency of CH, must be by examining all the sets in his minimalist universe, and checking that each one has either at most ℵ0 elements or else at least C of them. Likewise, the way Cohen proves the consistency of not(CH), must be by “forcing in” some extra sets, which have more than ℵ0 elements but fewer than C elements.

Except, it turns out that’s not how it works. Firstly, to prove CH in his universe, Gödel is not going to check each set to make sure it doesn’t have intermediate cardinality; instead, he’s simply going to count all the reals to make sure that there are only ℵ1 of them—where ℵ1 is the next infinite cardinality after ℵ0. This will imply that C=ℵ1, which is another way to state CH.

More importantly, to build a universe where CH is false, Cohen is going to start with a universe where C=ℵ1, like Gödel’s universe, and then add in more reals: say, ℵ2 of them. The ℵ1 “original” reals will then supply our set of intermediate cardinality between the ℵ0 integers and the ℵ2 “new” reals.

Looking back, the core of my confusion was this. I had thought: I can visualize what ℵ0 means; that’s just the infinity of integers. I can also visualize what \( C=2^{\aleph_0} \) means; that’s the infinity of points on a line. Those, therefore, are the two bedrocks of clarity in this discussion. By contrast, I can’t visualize a set of intermediate cardinality between ℵ0 and C. The intermediate infinity, being weird and ghostlike, is the one that shouldn’t exist unless we deliberately “force” it to.

Turns out I had things backwards. For starters, I can’t visualize the uncountable infinity of real numbers. I might think I’m visualizing the real line—it’s solid, it’s black, it’s got little points everywhere—but how can I be sure that I’m not merely visualizing the ℵ0 rationals, or (say) the computable or definable reals, which include all the ones that arise in ordinary math?

The continuum C is not at all the bedrock of clarity that I’d thought it was. Unlike its junior partner ℵ0, the continuum is adjustable, changeable—and we will change it when we build different models of ZFC. What’s (relatively) more “fixed” in this game is something that I, like many non-experts, had always given short shrift to: Cantor’s sequence of Alephs ℵ0, ℵ1, ℵ2, etc.

Cantor, who was a very great man, didn’t merely discover that C>ℵ0; he also discovered that the infinite cardinalities form a well-ordered sequence, with no infinite descending chains. Thus, after ℵ0, there’s a next greater infinity that we call ℵ1; after ℵ1 comes ℵ2; after the entire infinite sequence ℵ0,ℵ1,ℵ2,ℵ3,… comes ℵω; after ℵω comes ℵω+1; and so on. These infinities will always be there in any universe of set theory, and always in the same order.

Our job, as engineers of the mathematical universe, will include pegging the continuum C to one of the Alephs. If we stick in a bare minimum of reals, we’ll get C=ℵ1, if we stick in more we can get C=ℵ2 or C=ℵ3, etc. We can’t make C equal to ℵ0—that’s Cantor’s Theorem—and we also can’t make C equal to ℵω, by an important theorem of König that we’ll discuss later (yes, this is an umlaut-heavy field). But it will turn out that we can make C equal to just about any other Aleph: in particular, to any infinity other than ℵ0 that’s not the supremum of a countable list of smaller infinities.
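
For readers who want the constraint in symbols ahead of the later discussion, here is a hedged sketch in standard notation. It assumes the usual notion of cofinality, written cf, which this post hasn't introduced: roughly, the length of the shortest list of smaller infinities whose supremum is the given one.

\[
\operatorname{cf}\left(2^{\aleph_0}\right) > \aleph_0
\quad\text{(a consequence of König's theorem)},
\qquad
\operatorname{cf}(\aleph_\omega) = \aleph_0
\quad\Longrightarrow\quad
C = 2^{\aleph_0} \neq \aleph_\omega .
\]

The same inequality rules out every Aleph that is the supremum of a countable list of smaller ones, which is exactly the exception described above.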

In some sense, this is the whole journey that we need to undertake in this subject: from seeing the cardinality of the continuum as a metaphysical mystery, which we might contemplate by staring really hard at a black line on white paper, to seeing the cardinality of the continuum as an engineering problem.

Stay tuned! Next installment coming after the civilizational Singularity in three days, assuming there’s still power and Internet and food and so forth.

Oh, and happy Halloween. Ghostly sets of intermediate cardinality … spoooooky!

My second podcast with Lex Fridman

Monday, October 12th, 2020

Here it is—enjoy! (I strongly recommend listening at 2x speed.)

We recorded it a month ago—outdoors (for obvious covid reasons), on a covered balcony in Austin, as it drizzled all around us. Topics included:

  • Whether the universe is a simulation
  • Eugene Goostman, GPT-3, the Turing Test, and consciousness
  • Why I disagree with Integrated Information Theory
  • Why I disagree with Penrose’s ideas about physics and the mind
  • Intro to complexity theory, including P, NP, PSPACE, BQP, and SZK
  • The US’s catastrophic failure on covid
  • The importance of the election
  • My objections to cancel culture
  • The role of love in my life (!)

Thanks so much to Lex for his characteristically probing questions, apologies as always for my verbal tics, and here’s our first podcast for those who missed that one.

My video interview with Lex Fridman at MIT about philosophy and quantum computing

Monday, February 17th, 2020

Here it is (about 90 minutes; I recommend 1.5x speed).

I had buried this as an addendum to my previous post on the quantum supremacy lecture tour, but then decided that a steely-eyed assessment of what’s likely to have more or less interest for this blog’s readers probably militated in favor of a separate post.

Thanks so much to Lex for arranging the interview and for his questions!

“Quantum Computing and the Meaning of Life”

Wednesday, March 13th, 2019

Manolis Kellis is a computational biologist at MIT, known as one of the leaders in applying big data to genomics and gene regulatory networks. Throughout my 9 years at MIT, Manolis was one of my best friends there, even though our research styles and interests might seem distant. He and I were in the same PECASE class; see if you can spot us both in this photo (in the rows behind America’s last sentient president). My and Manolis’s families also became close after we both got married and had kids. We still keep in touch.

Today Manolis will be celebrating his 42nd birthday, with a symposium on the meaning of life (!). He asked his friends and colleagues to contribute talks and videos reflecting on that weighty topic.

Here’s a 15-minute video interview that Manolis and I recorded last night, where he asks me to pontificate about the implications of quantum mechanics for consciousness and free will and whether the universe is a computer simulation—and also about, uh, how to balance blogging with work and family.

Also, here’s a 2-minute birthday video that I made for Manolis before I really understood what he wanted. Unlike the first video, this one has no academic content, but it does involve me wearing a cowboy hat and swinging a makeshift “lasso.”

Happy birthday Manolis!

Interpretive cards (MWI, Bohm, Copenhagen: collect ’em all)

Saturday, February 3rd, 2018

I’ve been way too distracted by actual research lately from my primary career as a nerd blogger—that’s what happens when you’re on sabbatical.  But now I’m sick, and in no condition to be thinking about research.  And this morning, in a thread that had turned to my views on the interpretation of quantum mechanics called “QBism,” regular commenter Atreat asked me the following pointed question:

Scott, what is your preferred interpretation of QM? I don’t think I’ve ever seen you put your cards on the table and lay out clearly what interpretation(s) you think are closest to the truth. I don’t think your ghost paper qualifies as an answer, BTW. I’ve heard you say you have deep skepticism about objective collapse theories and yet these would seemingly be right up your philosophical alley so to speak. If you had to bet on which interpretation was closest to the truth, which one would you go with?

Many people have asked me some variant of the same thing.  As it happens, I’d been toying since the summer with a huge post about my views on each major interpretation, but I never quite got it into a form I wanted.  By contrast, it took me only an hour to write out a reply to Atreat, and in the age of social media and attention spans measured in attoseconds, many readers will probably prefer that short reply to the huge post anyway.  So then I figured, why not promote it to a full post and be done with it?  So without further ado:


Dear Atreat,

It’s no coincidence that you haven’t seen me put my cards on the table with a favored interpretation of QM!

There are interpretations (like the “transactional interpretation”) that make no sense whatsoever to me.

There are “interpretations” like dynamical collapse that aren’t interpretations at all, but proposals for new physical theories.  By all means, let’s test QM on larger and larger systems, among other reasons because it could tell us that some such theory is true or—vastly more likely, I think—place new limits on it! (People are trying.)

Then there’s the de Broglie-Bohm theory, which does lay its cards on the table in a very interesting way, by proposing a specific evolution rule for hidden variables (chosen to match the predictions of QM), but which thereby opens itself up to the charge of non-uniqueness: why that rule, as opposed to a thousand other rules that someone could write down?  And if they all lead to the same predictions, then how could anyone ever know which rule was right?

And then there are dozens of interpretations that seem to differ from one of the “main” interpretations (Many-Worlds, Copenhagen, Bohm) mostly just in the verbal patter.

As for Copenhagen, I’ve described it as “shut-up and calculate except without ever shutting up about it”!  I regard Bohr’s writings on the subject as barely comprehensible, and Copenhagen as less of an interpretation than a self-conscious anti-interpretation: a studied refusal to offer any account of the actual constituents of the world, and—most of all—an insistence that if you insist on such an account, then that just proves that you cling naïvely to a classical worldview, and haven’t grasped the enormity of the quantum revolution.

But the basic split between Many-Worlds and Copenhagen (or better: between Many-Worlds and “shut-up-and-calculate” / “QM needs no interpretation” / etc.), I regard as coming from two fundamentally different conceptions of what a scientific theory is supposed to do for you.  Is it supposed to posit an objective state for the universe, or be only a tool that you use to organize your experiences?

Also, are the ultimate equations that govern the universe “real,” while tables and chairs are “unreal” (in the sense of being no more than fuzzy approximate descriptions of certain solutions to the equations)?  Or are the tables and chairs “real,” while the equations are “unreal” (in the sense of being tools invented by humans to predict the behavior of tables and chairs and whatever else, while extraterrestrials might use other tools)?  Which level of reality do you care about / want to load with positive affect, and which level do you want to denigrate?

This is not like picking a race horse, in the sense that there might be no future discovery or event that will tell us who was closer to the truth.  I regard it as conceivable that superintelligent AIs will still argue about the interpretation of QM … or maybe that God and the angels argue about it now.

Indeed, about the only thing I can think of that might definitively settle the debate, would be the discovery of an even deeper level of description than QM—but such a discovery would “settle” the debate only by completely changing the terms of it.

I will say this, however, in favor of Many-Worlds: it’s clearly and unequivocally the best interpretation of QM, as long as we leave ourselves out of the picture!  I.e., as long as we say that the goal of physics is to give the simplest, cleanest possible mathematical description of the world that somewhere contains something that seems to correspond to observation, and we’re willing to shunt as much metaphysical weirdness as needed to those who worry themselves about details like “wait, so are we postulating the physical existence of a continuum of slightly different variants of me, or just an astronomically large finite number?” (Incidentally, Max Tegmark’s “mathematical multiverse” does even better than MWI by this standard.  Tegmark is the one waiting for you all the way at the bottom of the slippery slope of always preferring Occam’s Razor over trying to account for the specificity of the observed world.)  It’s no coincidence, I don’t think, that MWI is so popular among those who are also eliminativists about consciousness.

When I taught my undergrad Intro to Quantum Information course last spring—for which lecture notes are coming soon, by the way!—it was striking how often I needed to resort to an MWI-like way of speaking when students got confused about measurement and decoherence. (“So then we apply this unitary transformation U that entangles the system and environment, and we compute a partial trace over the environment qubits, and we see that it’s as if the system has been measured, though of course we could in principle reverse this by applying U⁻¹ … oh shoot, have I just conceded MWI?”)
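
The classroom patter can be checked on the smallest possible example. The sketch below is not taken from the lecture notes; it assumes one system qubit in the state (|0⟩+|1⟩)/√2, one environment qubit starting in |0⟩, and a CNOT standing in for the entangling unitary U. All names (`apply`, `reduced_system_state`, `CNOT`) are illustrative.

```python
import math
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]

def apply(u: Matrix, psi: Vector) -> Vector:
    """Multiply the matrix u by the state vector psi."""
    return [sum(u[r][c] * psi[c] for c in range(len(psi))) for r in range(len(u))]

def reduced_system_state(psi: Vector) -> Matrix:
    """Partial trace over the environment qubit of a two-qubit pure state.

    Basis ordering is |system, environment>, so index = 2*system + environment.
    """
    return [
        [sum(psi[2 * s1 + e] * psi[2 * s2 + e].conjugate() for e in range(2))
         for s2 in range(2)]
        for s1 in range(2)
    ]

h = 1 / math.sqrt(2)
# System in (|0> + |1>)/sqrt(2), environment in |0>.
psi0: Vector = [complex(h), 0j, complex(h), 0j]

# CNOT with the system as control and the environment as target (assumed U).
CNOT: Matrix = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

rho_before = reduced_system_state(psi0)
psi1 = apply(CNOT, psi0)
rho_after = reduced_system_state(psi1)
psi_back = apply(CNOT, psi1)  # CNOT is its own inverse, so this undoes U

print(abs(rho_before[0][1]))  # approximately 0.5: coherence is present
print(abs(rho_after[0][1]))   # 0.0: the system looks as if it was measured
```

After the CNOT, the off-diagonal entries of the system's reduced density matrix vanish, while applying the same gate again restores the original state and, with it, the coherence.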

On the other hand, when (at the TAs’ insistence) we put an optional ungraded question on the final exam that asked students their favorite interpretation of QM, we found that there was no correlation whatsoever between interpretation and final exam score—except that students who said they didn’t believe any interpretation at all, or that the question was meaningless or didn’t matter, scored noticeably higher than everyone else.

Anyway, as I said, MWI is the best interpretation if we leave ourselves out of the picture.  But you object: “OK, and what if we don’t leave ourselves out of the picture?  If we dig deep enough on the interpretation of QM, aren’t we ultimately also asking about the ‘hard problem of consciousness,’ much as some people try to deny that? So for example, what would it be like to be maintained in a coherent superposition of thinking two different thoughts A and B, and then to get measured in the |A⟩+|B⟩, |A⟩-|B⟩ basis?  Would it even be like anything?  Or is there something about our consciousness that depends on decoherence, irreversibility, full participation in the arrow of time, not living in an enclosed little unitary box like AdS/CFT—something that we’d necessarily destroy if we tried to set up a large-scale interference experiment on our own brains, or any other conscious entities?  If so, then wouldn’t that point to a strange sort of reconciliation of Many-Worlds with Copenhagen—where as soon as we had a superposition involving different subjective experiences, for that very reason its being a superposition would be forevermore devoid of empirical consequences, and we could treat it as just a classical probability distribution?”

I’m not sure, but The Ghost in the Quantum Turing Machine will probably have to stand as my last word (or rather, last many words) on those questions for the time being.

Is “information is physical” contentful?

Thursday, July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!
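
As a concrete reminder of how little that definition asks for, here is a minimal sketch of "the log of the set's cardinality" together with its standard generalization to nonuniform distributions (Shannon entropy). The function names are invented for this illustration.

```python
import math
from typing import Sequence

def bits_to_pick_one(num_possibilities: int) -> float:
    """Information, in bits, needed to single out one of N equally likely options."""
    return math.log2(num_possibilities)

def shannon_entropy(probabilities: Sequence[float]) -> float:
    """Average information, in bits, for a possibly nonuniform distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(bits_to_pick_one(8))                  # 3.0
print(shannon_entropy([0.25] * 4))          # 2.0
print(shannon_entropy([0.5, 0.25, 0.25]))   # 1.5
```

For a uniform distribution the two functions agree, and nothing in either one refers to mass, energy, or any other specifically physical quantity.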

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than \( 10^{10^{125}} \), because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.
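For readers who like to see the math, here's one standard way to write that down. This is a sketch under simplifying assumptions: equal amplitudes for the two slits, and symbols of my own choosing, with ψ_1 and ψ_2 the photon's wavefunctions from each slit and |E_1⟩, |E_2⟩ the normalized states of whatever else in the universe might have recorded the path.

```latex
% Joint state of the photon and the rest of the universe ("environment")
|\Psi\rangle = \frac{1}{\sqrt{2}}
  \bigl( |\psi_1\rangle |E_1\rangle + |\psi_2\rangle |E_2\rangle \bigr)

% Probability density for the photon to land at position x on the second screen
P(x) = \frac{1}{2} \Bigl[ |\psi_1(x)|^2 + |\psi_2(x)|^2
  + 2\,\mathrm{Re}\bigl( \psi_1^*(x)\,\psi_2(x)\,\langle E_1 | E_2 \rangle \bigr) \Bigr]
```

If no record was made, then ⟨E_1|E_2⟩ = 1 and the cross term gives the light and dark bands. If the path was recorded perfectly anywhere, then ⟨E_1|E_2⟩ = 0 and the bands vanish.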

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 10^69 bits per square meter.  (Actually it’s 10^69 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 10^69 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.
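If you want to check the "about 10^69 bits per square meter" figure yourself, here's a quick back-of-the-envelope sketch. The constants are standard CODATA values in SI units; the function names are just illustrative.

```python
import math

# Standard SI values (G carries the largest experimental uncertainty).
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
G = 6.67430e-11          # Newton's gravitational constant, m^3 kg^-1 s^-2
C = 299792458.0          # speed of light, m/s

def planck_area_m2() -> float:
    """Planck length squared, hbar*G/c^3, in square meters."""
    return HBAR * G / C**3

def holographic_bits(area_m2: float) -> float:
    """Upper bound A/(4 ln 2) on bits, with the area A converted to Planck units."""
    area_in_planck_units = area_m2 / planck_area_m2()
    return area_in_planck_units / (4.0 * math.log(2))

# One square meter of enclosing surface: on the order of 10^69 bits.
print(f"{holographic_bits(1.0):.2e}")
```

All of ħ, G, and c enter through the Planck area, which is the only place the units come from.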

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~10^69 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10^300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant velocity).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.
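For those who want symbols, here's a rough sketch of the three quantitative ingredients in the list above. I'm assuming a massless field for the middle line, purely for illustration, and I'm ignoring factors of order unity in the last one.

```latex
% Step 5: evolution under a time-independent Hamiltonian H.
% Each energy eigenstate's phase loops around the unit circle at rate E_k / \hbar.
|\psi(t)\rangle = \sum_k c_k \, e^{-i E_k t/\hbar} \, |E_k\rangle,
\qquad H |E_k\rangle = E_k |E_k\rangle

% Steps 4 and 7, for a massless field (illustrative assumption):
% variation on spatial wavelength \lambda fixes the frequency, hence the energy per quantum.
\omega = \frac{2\pi c}{\lambda},
\qquad E = \hbar\omega = \frac{h c}{\lambda}

% Step 11: energy E confined within radius R collapses to a black hole roughly when
R \lesssim \frac{2 G E}{c^4}
```

Shrinking λ to pack in more bits drives E up, and the last line says how much E a region of radius R can tolerate.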

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)
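In its usual textbook form, the Bekenstein bound says that the number of bits is at most 2πRE/(ħc ln 2), where R is the radius of a sphere enclosing the system and E is its total energy. Here's a small sketch that evaluates it; the function name and the example numbers (1 kilogram inside a 1-meter radius) are mine, picked only for illustration.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 299792458.0          # speed of light, m/s

def bekenstein_bits(radius_m: float, energy_joules: float) -> float:
    """Upper bound 2*pi*R*E / (hbar * c * ln 2) on the bits in a region.

    Note that Newton's constant G appears nowhere in this formula.
    """
    return 2.0 * math.pi * radius_m * energy_joules / (HBAR * C * math.log(2))

# Example: 1 kg of mass-energy (E = m c^2) enclosed in a sphere of radius 1 m.
energy = 1.0 * C**2
print(f"{bekenstein_bits(1.0, energy):.2e}")   # on the order of 10^43 bits
```

Only ħ and c show up, consistent with the bound being derivable within QFT alone.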

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 10^50000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.
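In symbols, this is just the standard thermodynamic definition of temperature, applied at fixed volume. Here ΔI is my notation for the number of added bits, and the numerical value assumes room temperature, T = 300 K.

```latex
% Temperature as the exchange rate between entropy and energy at fixed volume
\frac{1}{T} = \left( \frac{\partial S}{\partial E} \right)_V
\quad \Longrightarrow \quad
\Delta E \approx T \, \Delta S = k_B T \ln 2 \cdot \Delta I

% At T = 300 K, each additional bit costs about
k_B T \ln 2 \approx 2.9 \times 10^{-21} \ \mathrm{J}
```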


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

Higher-level causation exists (but I wish it didn’t)

Sunday, June 4th, 2017

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.


Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.


This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ‘emergence’?), that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?


To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state s_I and its final state s_F, assuming a uniform distribution over s_I.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, s_I, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only (7/8)×log_2(8/7) + (1/8)×log_2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.
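Here's a minimal Python sketch that reproduces both numbers for the 8-state example. It's my own illustration rather than code from the Entropy paper: it simply computes the mutual information between initial and final states under a uniform prior, which is how EI was described above.

```python
import math
from collections import defaultdict

def mutual_information(prior, transition):
    """I(initial; final) in bits, given a prior {state: prob} over initial states
    and a transition map {initial: {final: conditional probability}}."""
    joint = defaultdict(float)
    final_marginal = defaultdict(float)
    for s_i, p_i in prior.items():
        for s_f, p_cond in transition[s_i].items():
            joint[(s_i, s_f)] += p_i * p_cond
            final_marginal[s_f] += p_i * p_cond
    return sum(
        p * math.log2(p / (prior[s_i] * final_marginal[s_f]))
        for (s_i, s_f), p in joint.items() if p > 0
    )

def uniform(states):
    """Uniform distribution over the given states."""
    states = list(states)
    return {s: 1.0 / len(states) for s in states}

# Microscale dynamics: states 1 through 7 go to state 1, state 8 stays put.
micro = {s: {1: 1.0} for s in range(1, 8)}
micro[8] = {8: 1.0}
# Macroscale dynamics: A = {1,...,7} goes to A, B = {8} goes to B.
macro = {"A": {"A": 1.0}, "B": {"B": 1.0}}

ei_micro = mutual_information(uniform(micro), micro)
ei_macro = mutual_information(uniform(macro), macro)
print(round(ei_micro, 3), round(ei_macro, 3))   # 0.544 1.0
```

The two calls use two different priors, one uniform over 8 microstates and one uniform over 2 macrostates, and that choice is where the gap comes from.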

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)
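To make the normalization point concrete, here is a small Python sketch, under my own naming assumptions, that redoes the 8-state example while demanding that the macro-distribution be the one induced by coarse-graining the uniform micro-distribution.  As before, the dynamics are deterministic, so mutual information reduces to the entropy of the final state.

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X is distributed according to `dist`."""
    out = Counter()
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def evolve(s):
    # Microscopic dynamics: states 1 through 7 go to 1, state 8 stays at 8.
    return 1 if s <= 7 else 8

def coarse(s):
    # Coarse-graining: microstates 1 through 7 form A, microstate 8 forms B.
    return "A" if s <= 7 else "B"

uniform_micro = {s: 1 / 8 for s in range(1, 9)}

# The macro prior that is compatible with the uniform micro prior.
induced_macro = pushforward(uniform_micro, coarse)

# Deterministic dynamics at both scales, so I(initial; final) = H(final).
micro_bits = entropy(pushforward(uniform_micro, evolve))
macro_bits = entropy(pushforward(induced_macro, lambda m: m))

print(induced_macro)
print(f"micro: {micro_bits:.2f} bits, macro: {macro_bits:.2f} bits")
```

With the compatible prior, A gets weight 7/8 and B gets weight 1/8, and the macroscale no longer comes out ahead of the microscale: in this toy case the two quantities coincide.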

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.


Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled” (an effect that might show up in experiments as ever-so-slight deviations from the Born rule).  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-a-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere, nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, shows up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.