Death of proof greatly exaggerated

In 1993, the science writer John Horgan—who’s best known for his book The End of Science, and (of course) for interviewing me in 2016—wrote a now-(in)famous cover article for Scientific American entitled “The Death of Proof.” Mashing together a large number of (what I’d consider) basically separate trends and ideas, Horgan argued that math was undergoing a fundamental change, with traditional deductive proofs being replaced by a combination of non-rigorous numerical simulations, machine-generated proofs, probabilistic and probabilistically-checkable proofs, and proofs using graphics and video. Horgan also suggested that Andrew Wiles’s then-brand-new proof of Fermat’s Last Theorem—which might have looked, at first glance, like a spectacular counterexample to the “death of proof” thesis—could be the “last gasp of a dying culture” and a “splendid anachronism.” Apparently, “The Death of Proof” garnered one of the largest volumes of angry mail in Scientific American‘s history, with mathematician after mathematician arguing that Horgan had strung together half-digested quotes and vignettes to manufacture a non-story.

Now Horgan—who you could variously describe as a wonderful sport, or a ham, or a sucker for punishment—has written a 26-year retrospective on his “death of proof” article. The prompt for this was Horgan’s recent discovery that, back in the 90s, David Hoffman and Hermann Karcher, two mathematicians annoyed by the “death of proof” article, had named a nonexistent mathematical object after its author. The so-called Horgan surface is a minimal surface that numerical computations strongly suggested should exist, but that can be rigorously proven not to exist after all. “The term was intended as an insult, but I’m honored anyway,” Horgan writes.

As a followup to his blog post, Horgan then decided to solicit commentary from various people he knew, including yours truly, about “how proofs are faring in an era of increasing computerization.” He wrote, “I’d love to get a paragraph or two from you.” Alas, I didn’t have the time to do as requested, but only to write eight paragraphs. So Horgan suggested that I make the result into a post on my own blog, which he’d then link to. Without further ado, then:

John, I like you so I hate to say it, but the last quarter century has not been kind to your thesis about “the death of proof”!  Those mathematicians sending you the irate letters had a point: there’s been no fundamental change to mathematics that deserves such a dramatic title.  Proof-based math remains quite healthy, with (e.g.) a solution to the Poincaré conjecture since your article came out, as well as to the Erdős discrepancy problem, the Kadison-Singer conjecture, Catalan’s conjecture, bounded gaps in primes, testing primality in deterministic polynomial time, etc. — just to pick a few examples from the tiny subset of areas that I know anything about.

There are evolutionary changes to mathematical practice, as there always have been.  Since 2009, the website MathOverflow has let mathematicians query the global hive-mind about an obscure reference or a recalcitrant step in a proof, and get near-instant answers.  Meanwhile “polymath” projects have, with moderate success, tried to harness blogs and other social media to make advances on long-standing open math problems using massive collaborations.

While humans remain in the driver’s seat, there are persistent efforts to increase the role of computers, with some notable successes.  These include Thomas Hales’s 1998 computer-assisted proof of the Kepler Conjecture (about the densest possible way to pack oranges) — now fully machine-verified from start to finish, after the Annals of Mathematics refused to publish a mixture of traditional mathematics and computer code (seems this is not exactly what happened; see the comment section for more).  It also includes William McCune’s 1996 solution to the Robbins Conjecture in algebra (the computer-generated proof was only half a page, but involved substitutions so strange that for 60 years no human had found them); and at the “opposite extreme,” the 2016 solution to the Pythagorean triples problem by Marijn Heule and collaborators, which weighed in at 200 terabytes (at that time, “the longest proof in the history of mathematics”).

It’s conceivable that someday, computers will replace humans at all aspects of mathematical research — but it’s also conceivable that, by the time they can do that, they’ll be able to replace humans at music and science journalism and everything else!

New notions of proof — including probabilistic, interactive, zero-knowledge, and even quantum proofs — have seen further development by theoretical computer scientists since 1993.  So far, though, these new types of proof remain either entirely theoretical (as with quantum proofs), or else they’re used for cryptographic protocols but not for mathematical research.  (For example, zero-knowledge proofs now play a major role in certain cryptocurrencies, such as Zcash.)

In many areas of math (including my own, theoretical computer science), proofs have continued to get longer and harder for any one person to absorb.  This has led some to advocate a split approach, wherein human mathematicians would talk to each other only about the handwavy intuitions and high-level concepts, while the tedious verification of details would be left to computers.  So far, though, the huge investment of time needed to write proofs in machine-checkable format — for almost no return in new insight — has prevented this approach’s wide adoption.

Yes, there are non-rigorous approaches to math, which continue to be widely used in physics and engineering and other fields, as they always have been.  But none of these approaches have displaced proof as the gold standard whenever it’s available.  If I had to speculate about why, I’d say: if you use non-rigorous approaches, then even if it’s clear to you under what conditions your results can be trusted, it’s probably much less clear to others.  Also, even if only one segment of a research community cares about rigor, whatever earlier work that segment builds on will need to be rigorous as well — thereby exerting constant pressure in that direction.  Thus, the more collaborative a given research area becomes, the more important is rigor.

For my money, the elucidation of the foundations of mathematics a century ago, by Cantor, Frege, Peano, Hilbert, Russell, Zermelo, Gödel, Turing, and others, still stands as one of the greatest triumphs of human thought, up there with evolution or quantum mechanics or anything else.  It’s true that the ideal set by these luminaries remains mostly aspirational.  When mathematicians say that a theorem has been “proved,” they still mean, as they always have, something more like: “we’ve reached a social consensus that all the ideas are now in place for a strictly formal proof that could be verified by a machine … with the only task remaining being massive rote coding work that none of us has any intention of ever doing!”  It’s also true that mathematicians, being human, are subject to the full panoply of foibles you might expect: claiming to have proved things they haven’t, squabbling over who proved what, accusing others of lack of rigor while hypocritically taking liberties themselves.  But just like love and honesty remain fine ideals no matter how often they’re flouted, so too does mathematical rigor.

Update: Here’s Horgan’s new post (entitled “Okay, Maybe Proofs Aren’t Dying After All”), which also includes a contribution from Peter Woit.

74 Responses to “Death of proof greatly exaggerated”

  1. Vanessa Says:

    I completely agree with everything you wrote, but I also wonder whether specifically probabilistic proof couldn’t be employed more often. For example, I can imagine the following situation: Conjecture X would follow (in the usual sense) if some proposition $\phi(a)$ holds for *most* elements $a$ of some finite set $A$. Suppose that machine verification of $\phi(a)$ for any given $a$ is easy, but $A$ is prohibitively large for it to be feasible to iterate over all of its elements. Then, by sampling enough random elements of $A$, we can become highly confident of X without having either a human verifiable or a machine verifiable proof.

    Is it the case that (1) these things do happen, only I haven’t heard about them form some reason or (2) it is very rare for something like this to work or (3) this sort of thing can work often enough but most mathematicians don’t even consider the possibility?

  2. Scott Says:

    Vanessa #1: There’s an old joke (does anyone remember the source?) that randomized algorithms, with their 10-200 or whatever probability of error, are only good enough for non-critical applications, like air traffic control or diagnosing cancer or monitoring a nuclear reactor … not for publishing a theorem in a math journal.

    To be honest, though, while a situation like the one you describe is possible in principle (given current knowledge), I can’t think of a single case where it’s happened—that is, where

    (1) the proof of an interesting theorem could be reduced to a huge but finite calculation (this is already a very small fraction of interesting theorems),
    (2) we knew a fast enough randomized algorithm to do the calculation, and
    (3) we did not know a fast enough deterministic algorithm.

    Can anyone else think of an example? Maybe involving polynomial identity testing?

    Of course, assuming good enough pseudorandom generators exist, any situation of this kind would presumably be “temporary.” But there are cases where current knowledge suffices to prove the correctness of a randomized algorithm, but not to prove the correctness of its derandomization.

  3. Daniel Gottesman Says:

    Vanessa #1: I can give you a concrete example of this sort of proof. In quant-ph/0504218, Panos Aliferis, John Preskill, and I showed that you can prove (in the usual sense) a lower bound on the threshold for fault-tolerant quantum computation for a concatenated code by counting the number of malignant sets of locations in an extended rectangle for your protocol. Exactly what all of these things mean is not so important for understanding how this will give us an example of this sort of proof. All you need to know is that there’s a bunch of sets and we want count how many sets satisfy a certain property. More malignant sets mean the threshold is lower; we would like it be higher. You can easily get upper bounds on the number of malignant sets analytically, but to get a better lower bound on the threshold you need to count more carefully.

    For the code we studied in quant-ph/0504218 (the concatenated 7-qubit code), it was possible to count all malignant sets exactly using a computer, and you could possibly do it by hand if you were sufficiently patient. However, in quant-ph/0610063, Panos Aliferis and Andrew Cross looked at other bigger codes, with many more sets of locations. They were not able to count locations exactly, so instead they did a random sampling of sets to measure what fraction were malignant, thereby deriving better lower bounds on the threshold for these particular codes.

    Now this was 13 years ago, so today it might be possible to count exactly the number of malignant sets for those particular codes. Also, the results have been somewhat superseded by higher thresholds for different protocols proven without the use of random sampling. Moreover, the most interesting question — is there a threshold or not? — can be addressed without a careful counting at all. However, it remains the case that there is reason to be interested in the tightest provable bound for a particular protocol and that for a sufficiently large code, the only way we know how to do that is to count something for which the set is too large to count deterministically.

  4. Geoffrey Irving Says:

    Although “death is proof” seems a big strong, it is worth celebrating the actual mathematical existence of statements that are “approximately independent” in the sense of only having extremely long proofs (either so long as to be only findable with a computer, or much too long to ever find even if they exist). To my mind the existence of approximately independent statements only makes the world richer and more interesting, and celebrating both the existence of short intuitive proofs and incredibly long unintuitive proofs is a nice way to bring everyone together.

  5. vzn Says:

    Horgan has a nose for the controversial and the controversial also has glimmers of the future. have been following a similar thread re “empirical mathematics, algorithmics etc” for over ½ decade on my blog and longer. hadnt heard of his 1993 article, a prescient gem. my favorite quote from gibson author of Neuromancer (which mentioned the Matrix in 1980): the future is already here, its just not evenly distributed. there is indeed a revolution in computer science crosspollinating with mathematics but its not easy to see up close, its a likely decades-long slow-motion coalescing.

    its already revolutionary yet we are in the early stages and still the early part of the 21st century and think the rest of the century will bear out Horgans ideas. Wolfram has very similar ideas and ofc is similarly unfairly and myopically excoriated by the community. some ideas are so powerful that they dont come to realization in the authors own lifetime. you neglect mention of the 4 color proof and other key milestones of empirical mathematics. anyway though this is a very complex topic and somebody ought to write a book on it, there are a few, but none as encyclopediac as is really required. my open invitation to anyone openminded to further discuss this at length in my blog or stackexchange chat. meantime this 2015 blog is full of diverse/ compelling links/ refs.

  6. Gerard Says:

    @ Vanessa and Scott #1,#2,

    CNF problems look a bit like the situation you describe except we don’t usually have the proposition:

    > Conjecture X would follow (in the usual sense) if some proposition $\phi(a)$ holds for *most* elements…

    Otherwise the set A is the solution set of the CNF formula and $\phi(a)$ is just a trivial function that checks if a satisfies the formula.

    Of course this approach is useless if the property you are checking is satisfiability because then you would have to check all elements of A (assuming P != NP).

    However if you could reduce a problem to the question of whether a given CNF formula has more than m solutions where m ~ 2^(n-1) where n is the number of variables then such an approach might work, though I don’t know if any interesting problems have such a reduction.

  7. Scott Says:

    Daniel #3: Cool!

  8. Scott Says:

    vzn #5: I didn’t mention the famous computer-aided proof of the four-color theorem only because it’s from 1976, and thus is irrelevant to the topic of what’s changed between 1993 and the present.

  9. Charles Says:

    I believe Hale’s original proof of the Kepler conjecture actually was accepted by the Annals, modulo the proviso that they were only “99%” sure the proof was correct, since they could not check all the computer calculations. Hales started his computer formalization to remove any last doubt.

  10. Scott Says:

    Charles #9: I thought they agreed to publish only those parts of the proof that didn’t depend on computer code that the referees couldn’t verify? (Which, to be clear, strikes me as a reasonable decision.)

  11. Charles Says:

    Scott #10: After reading about it some more, it looks like the Annals published an abridged version, with all the details about the computer calculations published in Discrete and Computational Geometry the next year. That being said, Hale’s paper in the Annals does explicitly rely on the computer calculations, and the Annals links to Hale’s code on their webpage for the article, so it looks like they accept the whole thing.

  12. Bobby Says:

    Shouldn’t we calculate the probability that a rigorous proof accepted by the mathematical community is in error (which is surely orders of magnitude higher than 10^-200)?

    Then we should accept probabilistic proofs which provably have a smaller margin of error.

  13. Scott Says:

    Bobby #12: But won’t that 10-200 be completely dominated, in practice, by the probability that the pseudorandom number generator is bad, which fairness demands that we include if we’re including things like “the probability that the mathematical community is in error”?

    In reality, in the extremely rare cases like that described by Daniel Gottesman in #3, I’m basically as happy to accept a Monte Carlo simulation as part of a proof as I am to accept a huge deterministic computation. (Why “basically” as happy, rather than exactly as happy? Only because of the well-known difficulty of debugging Monte Carlo programs.)

  14. Raoul Ohio Says:

    Bobby #12

    Always hard to decide “how rare is rare”. My favorite milestone is 10^-50, which is often the estimated probability that you will be hit and killed by a comet in the next second.

  15. Pantelis Says:

    Dear Scott,
    Of course I agree with you. The problem is that Horgan is not the only one and mathematical proves is not the only case where scientists decline rigor. In fields like Pervasive Computing researchers many times decline mathematical thinking in favour of experimentation. I have written a paper that among others also discusses this issue.

  16. James Gallagher Says:

    Does anyone really care about proofs in pure mathematics anymore? I mean, hasn’t all the interesting, useful stuff been discovered in pure mathematics? Who cares if the abc conjecture is “true” or not, especially if you have to delve so far into the realms that Gödel warned were potentially meaningless in any case that the proof is only useful for disciples to worship over for centuries to come.

    But at least the pure mathematicians are relatively harmless, it’s people like Peter Woit who believe that some holy deep pure mathematical structure underlies physics that are a concern for me. Maybe Nature just ain’t so perfect Pete, eh?

  17. Neel Krishnaswami Says:

    Scott at #8 wrote: I didn’t mention the famous computer-aided proof of the four-color theorem only because it’s from 1976, and thus is irrelevant to the topic of what’s changed between 1993 and the present.

    Actually, one thing did change about the status of the 4-colour theorem between 1993 and the present: George Gonthier did a machine-verified formal proof of it in the Coq proof assistant. So the 4-colour theorem’s status switched from “leading example of a nontrivial theorem relying on the methods of experimental mathematics” to “leading example of a nontrivial theorem with an absolutely formal correctness proof”.

  18. Scott Says:

    James #16: Any future comment in a similar snide tone will earn a temporary blog ban.

    Yes, there are many famous conjectures in math that everyone already believes, and whose proof (if it comes) will be “less about the destination than the journey.” But I happen to work in a field (CS theory) where we try every day to prove various smaller conjectures, and some of the conjectures actually turn out to be false, or not true the way we thought, or they lead us to unexpected places. And since we typically care about the best algorithm and the worst input that could possibly exist, numerical experiments can only get us so far. So it’s not just a matter of “caring about” proofs; they’re simply the main tool we have for reliably learning what’s true (and better yet, why). If a better tool existed, I’d happily switch to it.

  19. James Gallagher Says:

    Scott #18

    It’s not “snide” at all, just an attempt at plain and simple honesty in this debate.

  20. Scott Says:

    Neel #17: Thanks for the reminder about that, but I’m perfectly happy to call what existed in the 1970s a proof. Not an “absolutely formal proof,” but not “experimental mathematics” either, just a proof in the usual working sense.

  21. vzn Says:

    2 choice/ solid/ weighty refs come to mind for this audience, excellent food for thought/ future commentary: Gowers 2000 has a really neat paper “rough structure and classification” that envisions a scifi scenario with an intelligent proof assistant and has done some investigation into automated theorem proving. very visionary.

    this is an excellent modern survey of state of the art covering the Big Proof conference at isaac newton institute summer 2017, notices AMS

    next, sneaking in further lightweight commentary already ½ decade old on gowers etc 🙂

  22. Sniffnoy Says:

    Does anyone really care about proofs in pure mathematics anymore?


    I mean, hasn’t all the interesting, useful stuff been discovered in pure mathematics?


    I mean, I suppose it depends on what you count as useful, but there is plenty of interesting math left to do. Really, you can just check any list of open problems… if you find it all boring, then, well, I don’t know what to say!

  23. Kevin S. Says:

    Since the topic of this post is related to nitpicking, the original computer assisted proof of the Kepler conjecture was joint work of Hales and Ferguson.

  24. Pseudonym Says:

    We compiler writers scoff at any notion of the “death of proof”. Every time your computer program is type checked, that’s a proof. Every time your compiler tests to see if an optimisation is applicable, that’s a proof.

    Each program you try to compile is another case for which those theorems need to be proven.

  25. Richard Gill Says:

    Raoul #14 could you give me a literature reference for that 10^-50 ?

    (You said: “Always hard to decide “how rare is rare”. My favorite milestone is 10^-50, which is often the estimated probability that you will be hit and killed by a comet in the next second.”)

  26. Bjørn Says:

    Thanks for mentioning the Robbins proof. Humans reading this blog should check out page 13 at
    We may have something to learn from those long bar patterns.

  27. Pete Says:

    Scott #2 – here’s an example:

    Balister, Bollobas, Walters, Continuum percolation with steps in the square or the disc.

    This paper (rigorously) reduces a problem in probability theory to showing that a certain explicit high-dimensional integral is sufficiently large. The authors aren’t able to prove this rigorously, but a simple Monte-Carlo evaluation says the answer is yes with high probability.

    (Of course there is a deterministic method to approximate integrals rigorously, but this is very expensive in high dimension!)

  28. gentzen Says:

    Bobby #12: Your comment and its replies reminds me of my old comment on Andrej Bauer’s blog. It also stated different reasonable numerical values for the remaining doubt:

    This sort of proof sometimes allows “the verifier” (i.e. doubting Thomas) to make the remaining doubt as small as desired (say exp(-42) to fix some order of magnitude) by repeated queries, but note that even if he could reduce the remaining doubt to 0, the fact might still not be true.

    Under certain assumptions, this type of proof with arbitrary small remaining doubt can even apply to mathematical statements. Take a sufficiently non-trivial identity for example (or a prime number if you prefer), which has been checked by “the verifier” by sufficiently many random tests to ensure that the remaining doubt is below exp(-421) (or maybe just below 2^-29, it is subjective to a certain degree). Only “the verifier” can know that his random tests were sufficiently random, and even he cannot be absolutely sure.

    Especially Scott’s reply

    But won’t that 10-200 be completely dominated, in practice, by the probability that the pseudorandom number generator is bad

    reminded me of my remarks “note that even if he could reduce the remaining doubt to 0, the fact might still not be true” and “Only “the verifier” can know that his random tests were sufficiently random, and even he cannot be absolutely sure.” above. They were meant in exactly the same way, namely that there will be something which you overlooked that will dominate the uncertainty.

    N. David Mermin on page 12 in Why QBism is not the Copenhagen interpretation and what John Bell might have thought of it went further than both me and Scott and said that any judgment is subjective (including absolutely certain judgments) and need not “be backed up by objective facts”. This is not what I have in mind for my remarks, but I find it impressive:

    A very important difference of QBism, not only from Copenhagen, but from virtually all other ways of looking at science, is the meaning of probability 1 (or 0). In Copenhagen quantum mechanics, an outcome that has probability 1 is enforced by an objective mechanism. …
    … For a QBist, their mistake is much simpler than that: probability-1 assignments, like more general probability-p assignments are personal expressions of a willingness to place or accept bets, … It is wrong to assert that probability assignments must be backed up by objective facts on the ground, even when p = 1. … Probability-1 measures the intensity of a belief: supreme confidence. It does not imply the existence of a deterministic mechanism.

  29. Scott Says:

    Kevin S. #23 and Pete #27: Thanks!!

  30. Scott Says:

    Pseudonym #24: With the greatest of respect to my friends in programming languages, if the only proofs that anyone still cared about were the ones mechanically generated by compilers for type-checking purposes, I think “the death of proof” could’ve been a reasonable title. 🙂

  31. mr_squiggle Says:

    Problems are either interesting, or not.
    Godel demonstrated that problems are either prove/disprovable, or not.

    So, only problems in one quadrant of the overall scope are ever going to be proved.
    But there’s likely to be as many interesting problems again which remain unproved forever, even though they may have very convincing evidence or argument.

  32. Greg Kuperberg Says:

    Here is what I said about it on Usenet in 1993:

    I’d like to take this opportunity to get a few things off of my chest about recent popular accounts of mathematics.

    There seem to be a number of science journalists who have no firm conception of a mathematical proof. Since they don’t understand it, they cannot internalize its importance. Every once in a while, they discover that the mathematical community has paradigmed its way out of proofs. They then write articles and books that build fictitious theses from factual interviews with mathematicians.

    For example, James Gleick reported that “chaos theory” has shaken both mathematics and physics to their roots in the past 30 years. (Specifically, he says that the three most important discoveries in physics in the 20th century may be relativity, quantum mechanics, and chaos.) With chaos theory, we have turned to the creativity of empirical mathematics and the power of modern computers, and away from the dogma of rigorous mathematics, last forced upon the mathematical world by the evil genius of Bourbaki.

    The interviewer and the interviewed may change, but the obselescence of ordinary proofs and the domination of computers stays the same. A few years later, Discover magazine proclaimed that there was a new breed of mathematicians overthrowing the age-old tradition of theorem and proof, this time not only with chaos, but with computer-driven geometry in general. The reporter seemed to be awed by Thurston et al’s Not Knot video. Since proofs are incomprehensible and boring, and since Not Knot is a lot of fun, Not Knot must be better than a proof.

    Not to be outdone by Discover, Scientific American now announces that computers, rather than rendering proofs obselete, are taking over the business of proof. (When I say “Not to be outdone…”, I’m referring to the obvious fact Sci. Am. has gotten a lot more like Discover since it was bought out.) The reporter, John Horgan, has watched Not Knot too. He has also seen lots of fractals and gotten wind of recent results in algorithmic proof theory. Unfortunately, Horgan hops between proofs suggested by computer experiments, proofs that depend on computer searches, symbolic proofs compiled by computers, and (most egregiously) proofs explained with computer graphics. To him they are all computer proofs. He (or maybe another journalist at Sci. Am.) also insults Wiles by calling his proof of Fermat an anachronism. It’s too bad for Wiles that he didn’t use Mathematica or organize a team to make computer graphics.

    What about proofs typeset with the aid of computers, for example in TeX? What about proofs distributed by electronic mail or FAX? What if a proof comes to me when I’m playing Tetris? It’s too bad that Horgan didn’t count these as computer proofs. Then his mistake would have been too obvious.

  33. Neel Krishnaswami Says:

    Scott #20: I’d disagree, mainly because it’s so unrelentingly difficult to write actually correct programs. However, I think your more liberal view probably is more common among the minority of pure mathematicians who have given this question any thought at all.

    I’d say that the key ideas were set out in the 1977 proof, but the code they wrote offered heuristic support for the conjecture. Once Gonthier wrote formally-verified search code (and then ran it), then we had a proper proof. In fact, arguably the same thing happened with the Kepler conjecture — in Fejes Toth’s 1953 paper where he reduced the sphere packing problem to first-order real arithmetic, he noted that the Tarski’s decidability result for first-order real arithmetic meant that exhaustive search via computer could in principle settle the question. Hales improved this reduction to the point where the exhaustive search was actually feasible, but again, it’s not really a proof until you know that the program you wrote to do the search actually does the search you want it to. (Bhargava and Hanke worry about this point in section 4.3.3 of their paper Universal quadratic forms and the 290 theorem. They do an admirable job by computer programmer standards, but, speaking as a computer programmer, this is less reassuring than one might hope.)

    However, doing computations and case analyses impractical to do by hand is only one of the two new capabilities that computers offer mathematicians!

    The other new capability is to start looking at what the late Vladimir Voevodsky called large, intricate problems. He wrote that mathematicians face a tradeoff: they can do proofs about large, simple systems, or about small, intricate systems. However, large, intricate mathematical objects defy study — not for reasons of conceptual difficulty, but simply because it’s too hard to muster the focused attention to carefully check all the side-conditions. (He has some nice slides discussing serious, persistent errors in the literature on motivic cohomology — some of which were his own — and how they motivated him to start looking at mechanized proof.)

    But a computer can serve as a perfect mathematical amanuensis, checking that you have done all the necessary checks, and so opens the door to studying entirely new kinds of mathematical objects. This kind of thing is still rather nascent (probably higher category theory, and to a lesser extent, programming language semantics, are the only areas where this approach has started to see any uptake) but I am hopeful it will open up new frontiers for mathematics.

  34. Greg Kuperberg Says:

    There were actually two key things that happened with the four-color theorem since Appel-Haken. It wasn’t just that the Appel-Haken proof was computer-assisted, it was also that is was a mess by any standard. Appel and Haken adopted an idea that goes back to Kempe’s proof of the 5-color theorem, and that was escalated and strengthened in a proposal due to Heesch. Namely, you should create a list of subgraphs such that at least one exists in every planar graph for Euler characteristic reasons, and then prove by induction that each one can be replaced by any of several simpler subgraphs. The list of subgraphs is a complicated and delicate matter, because they depend on how you choose to smear out the local contribution to Euler characteristic, a process that Heesch called discharging by analogy with discharging static electricity. Once the discharging rules are chosen, this part of the proof may be computer assisted. In any credible attempt, the subgraphs are complicated enough that second part, inductive replacement, must be computer-assisted because it requires a large search over local colorings.

    The Appel-Haken proof only computerized the second half. The first half was done by hand because in the 1970s, there was much less experience with data structure representations of planar graphs. But then they kept finding mistakes in their hand-calculated list of cases. The mistakes were all inessential, but just as with car maintenance or OS updates, the process never stopped. Worse, eventually a controversy developed (which is reported in MathSciNet) over Haken’s extended definition of what counts as a “subgraph”. My impression is that the Appel-Haken proof is now seen as definitely the right idea, but not necessarily good execution.

    The first later development was a wonderful paper by Robertson, Sanders, Seymour, and Thomas, where they cleaned up the whole proof. They made both parts computer-assisted, and they improved the discharging algorithm to avoid any controversial reasoning in the inductive step. This proof was not a mess, moreover unlike Appel and Haken they published all of their source code. Nonetheless I do not know that any manpower was found to fully referee this paper by the standard of checking every argument before it was published, even though RSST did make that possible in principle.

    Gonthier formalized the RSST proof, for the straightforward reason that it was the only proof that he could have formalized short of finding his own. Gonthier’s contribution was unprecedented and amazing, but it was on the computer science side. RSST laid the groundwork on the mathematics side for Gonthier’s achievement to be possible. The RSST proof was never controversial. Instead of dispelling a crisis, Gonthier provided platinum certification for something that people generally agreed was a success. But that’s still great.

    One other thing: Formalized proofs have been the complete opposite of what Horgan envisioned. (As well as other people like him who don’t understand why proofs are important.) Instead of computers convincing people, people are convincing computers.

  35. william e emba Says:

    Greg’s description of the Appel-Haken proof (I presume based on RSST’s description) is perhaps misleading by a smidgen.

    The first part of the AH proof was in fact computer generated. The whole point of their “discharging” algorithm was to take a configuration that did not succumb to the “Kempe chain argument” and replace it with a family of configurations that between them, accounted for all graphs that contained the bad configuration. The plan was that all these configurations were more likely to succumb, and for those that didn’t, repeat.

    Based on “experimental mathematics”, A&H convinced themselves a version of this algorithm would converge to a family of all-good configurations. They went looking for one, and kept refining the family obtained and their discharging algorithm. Ultimately they had hundreds of rules and nearly 1500 configurations.

    The individual proofs that each rule worked was to be done by hand, and they were all supposed to be more or less trivial, but this aspect proved to be indigestible. Unsurprisingly, this was also where small bugs and corrections were found. Certainly this cat and mouse game is common enough in ordinary proofs, but in this context, gives the impression of relying on the earlier “experimental mathematics”. Once one got over the hump one was expected to then generate the family and then check the family worked by computer.

    (This is what I mean by “misleading by a smidgen”. Officially, the first part was also a computer proof, but their algorithm’s believability relied on excessive human checking of a badly written document, whereas the second part was elementary.)

    The RSST proof used only 32 discharging rules, and none of them were complicated. This was the part Gonthier had to formalize, and then rewrite RSST to exactly match his formalization. Ultimately, a human who nowadays wants to convince himself that 4CT is true has to convince himself that formal verification works and that Gonthier properly formalized RSST.

    Of course, in the 1970s, people did not have a good method for sharing software, and large programming projects of the day were guaranteed to be inscrutable. The original paper came with a microfiche addendum listing the AH configuration family! Later printing it out in book form made it no more human-friendly. That the mathematical part was also badly written–a problem that was not unique to A&H–just confounded the mess.

  36. chorasimilarity Says:

    Harald Helfgott proof [1] of the Goldbach ternary problem [2] is a more recent example of a computer aided proof.


  37. Greg Kuperberg Says:

    Alas, I wrote from memory instead of consulting the RSST paper that I linked myself. Somehow I misremembered which part was computer-assisted and which was not. But at least the paper supports my memory on another point, that self-intersecting configurations were a source of controversy in the Appel-Haken proof.

  38. Pete Says:

    As mentioned, both parts of the Appel and Haken proof are in principle computer-assisted. One part is in principle human-checkable, and the main issue is exactly that it actually isn’t computer checkable; Appel and Haken generated their family of configurations by something resembling an exhaustive search, but with a good deal of human interaction.

    I think even back in the 1970s people were not too unhappy with the idea that a computer can reliably perform exhaustive search (though for sure there were enough people who felt that an exhaustive search is not really an explanation). Show me the program that generated the output, I’ll check the program makes sense and then believe the result (maybe after re-trying it on different hardware).

    The problem was the idea of trusting a proof which was substantially computer-output but with no such correctness certificate: Appel and Haken didn’t really make the code they used for this part available, and for the reason that running it blindly wouldn’t actually produce the same output as in the paper. For that one would need to make the same interventions as Appel and Haken – which they didn’t actually record. So the only way to check that part was really to by hand go through the cases, and many people felt that they could not even in principle trust themselves to verify that a couple hundred cases are all correct and exhaustive.

  39. Rainer Says:

    In my opinion, a mathematical proof has two main purposes
    1. To give evidence that a mathematical statement is correct
    2. To give additional insight into mathematics

    For 1. one can ask what is “evidence”. Mathematical proofs are getting more and mode complicated. Less and less experts can really verify if the proof is correct or not. As an extreme example, we have Mochizuki’s alleged proof of the ABC conjecture, which no expert can really verify, even after more than 6 years of hard efforts. So this proof is of probably of not much value.
    Let’s assume that someone has made a proof of the Riemannian hypothesis (RH), which has been reviewed and found correct by a few hight reputed experts. One can assume that the proof will be very difficult and only those few experts will be able to read and verify it.
    But is that then really evidence that RH is correct?
    I think there are two directions to go:
    a) totally formal computer verified proofs. But at the moment this seems not to work very well.
    b) sticking to a less rigorous definition of proof. For instance, the thousands of implications the RH has provides in my opinion more evidence that RH is true then any semi-formal proof which has been reviewed by a few experts.

    For point 2, mathematical proofs have their value.
    But other methods, like computer simulations, will also help providing new insight.

  40. Christopher Says:

    One thing to keep in mind is that mathematical proofs become better understood overtime. For example, very few experts understood Wile’s proof of FLT when it was published, but the proof has now been more thoroughly digested by the mathematical community. See

  41. Okay, Maybe Proofs Aren't Dying After All - The Best of Everything Says:

    […] Scott Aaronson response (which he also just posted on his blog): […]

  42. O. S. Dawg Says:

    Even if proof were dead, or dying, mathematics would still be alive, and well. Math is not solely about proof any more than writing is simply about grammar.

  43. Raoul Ohio Says:

    Richard Gill (#25)

    Cannot recall where I read that. Maybe in a book or paper on randomized algorithms.

    At the time I did my own estimate and it was in the same ballpark.

  44. Greg Kuperberg Says:

    Mathematical proofs are getting more and mode complicated. Less and less experts can really verify if the proof is correct or not. As an extreme example, we have Mochizuki’s alleged proof of the ABC conjecture, which no expert can really verify, even after more than 6 years of hard efforts.

    But this verges on a misinterpretation of the status quo. It is true that the world’s most complicated proof gets more complicated over time — simply because world records are monotonic. It could also be true that the average proof or average paper gets more complicated over time — in part because it is vastly easier to communicate research than it once was. But the proof of any given result can only get simpler over time, sometimes much simpler.

    As for Mochizuki, the reason that experts are unconvinced by his proof has shifted over time. At first, his paper seemed incredibly formidable and people were worried that it would be prohibitively difficult to understand what he was saying. Then there was a phase when people started to sense that they were being filibustered, that the explanations were highly digressive and more complicated than necessary. In the current phase, as I understand it, a few talented mathematicians patiently stepped through his manuscript. They reported that the intimidating abstractions were livable after all, but that these abstractions didn’t seem to be gaining on the ABC conjecture. They also found a specific place where they see a serious gap in the argument that created or could create an illusion of victory. I’m not an insider to any of this, but it sounds like a few people now think that they understand enough of what Mochizuki is saying, and they disagree with him.

  45. Greg Kuperberg Says:

    Another comment: Even though Horgan was magnanimous enough to cite his critics and let Scott and Peter Woit post responses on his blog, in other respects his new piece, “The Horgan Surface and the Death of Proof” still strikes me as remarkably shameless. He says that he doesn’t mind being proven wrong, but actually from the beginning, he was not even wrong.

    There is the jaundiced view that you can’t expect science journalists to understand the material; that they can be your allies anyway even if they don’t; and maybe even that there is no such thing as bad publicity for good mathematics. But I have seen first-hand that this cynicism about science journalism is sometimes false. If you learn how to talk to science journalists, and if they want to understand you, then the outcome can be much better than this.

  46. Rainer Says:

    >> But this verges on a misinterpretation of the status quo. It is true that the world’s most complicated proof gets more complicated over time — simply because world records are monotonic. It could also be true that the average proof or average paper gets more complicated over time — in part because it is vastly easier to communicate research than it once was. But the proof of any given result can only get simpler over time, sometimes much simpler.

    Vladimir Voevodsky found an error in its own proof years after he has published it. He was tgen convinced that a human mind is not able to fully grasp such complicated structures.
    He started then his “univalent foundation program”, which is basically a formalisation of mathematics for the proof assistant COQ.

    >>In the current phase, as I understand it, a few talented mathematicians patiently stepped through his manuscript. They reported that the intimidating abstractions were livable after all, but that these abstractions didn’t seem to be gaining on the ABC conjecture. They also found a specific place where they see a serious gap in the argument that created or could create an illusion of victory.

    On the other hand, Mochizuky claims that Scholze/Stix are wrong — stalemate 🙁

  47. Scott Says:

    Rainer #46: It seems to me that many people are drawing an incorrect lesson from the Mochizuki affair. “Mochizuki claims to have proved abc, other mathematicians claim he hasn’t, it’s a stalemate, ergo the whole enterprise of proof-based math has broken down.”

    There have always been mathematicians, even otherwise distinguished ones, who claimed to have proved the Riemann Hypothesis or whatever and never backed down. That in itself is nothing new. What seems to be unusual about this case is just (1) the sheer enormity of the claimed proof, (2) the number of alien concepts, (3) the fact that almost none of the proof seems to be “doing anything”—just introducing huge, complicated definitions, and (4) Mochizuki’s refusal to travel to give lectures on the proof.

    From the very beginning, attention focused on Corollary 3.12–both as the place where almost all of the “real work” seemed to happen, and as a place where experts couldn’t follow the logic of the proof. Now that Scholze (one of the best mathematicians on earth) and Stix (an expert on anabelian geometry) have written up a refutation of Corollary 3.12–one whose language is as clear as Mochizuki’s language is obscure—it looks to an outsider like the ball is clearly, unequivocally in the court of Mochizuki and his disciples, to convince the rest of the mathematical community to spend more time on this. If they can’t, then people will move on, and continue to regard abc as a problem that’s open for anyone to solve.

  48. Problems With a Point | Gödel's Lost Letter and P=NP Says:

    […] of proofs by computer runs through several chapters of Bill and Clyde’s book. It is also current on Scott Aaronson’s blog—we would be remiss if we did not mention Scott in this […]

  49. Greg Kuperberg Says:

    Since Neel brought up the subject of formalized proof, I want to mention that this is one of those areas where it is easy to fall into a messiah complex. It is easy to believe that you had the revelation to formalize proofs and have computers check them independently of everyone else. It is easy to dream that eventually some vast fraction of mathematics will be formalized in your system, and it is easy to suppose that your system has superior features to make it the best candidate for the future formalized utopia. As I see it, this is simply a professional hazard in this field (and in some other endeavors as well), and just something that it is best to struggle against. It can get in the way of cooperation among researchers. Even those people who don’t think of themselves as the messiah can fall into factions, each with its own messianic leader.

    I don’t want to be too sour about Voevodsky given the vast amount that he accomplished, both in math research itself and in the area of formalized proofs. But it is my impression that he did not do enough to dispel messianic thinking in his univalent foundations project. His specific approach has its own followers and its own positive features, but he was far from the first person to try to formalize mathematics or to think of any of the arguments for why people should. As already mentioned, univalent foundations is implemented on top of Coq. The people who wrote Coq already knew how it could be useful for pure mathematics. In fact, Gonthier formalized the 4-color theorem in Coq before Voevodsky entered the field of formalized proof.

    On the positive side, Wikipedia tells me that Thierry Coquand, one of the inventors of Coq, has praised Voevodsky’s approach. Part of the point is that Univalent Foundations is closer to category theory, while a typical proof assistant starts with more classic formal logic with set-theoretic axioms. It is a fair point that category theory is a different style of mathematical foundations developed well after set-theoretic formal logic, and that it has been very influential. (Category theory in mathematics is roughly like object-oriented programming in computer science, although in a direct comparison the former is one level higher than the latter.)

    I do think that formalized proofs and proof assistants are a great area of research. My late uncle Andrzej Trybulec worked in this area. He invented his own proof assistant system which he called Mizar, and he created his own school of Mizar writers and developers. I think that Mizar does not have a crucial feature of modern systems called computational reflection, in which the proof asks the verifier to fill in steps with an algorithm, after certifying the algorithm. (I think of this as a formal version of “The reader can check…”) However, Mizar could be the most human-readable of the proof assistant systems. I think that that is a lasting contribution to the field. Today I found an interesting paper about implementing a “Mizar mode” in another system under current development called HOL.

  50. william e emba Says:

    Scholze and Stix did not claim to refute 3.12. They claimed that the relevant argument would not work in a somewhat different situation, and that by all rights, it ought to work there.

    This is analogous to rejecting a P=?NP proof on the grounds that the claimed proof seemingly relativizes, when of course it can’t.

    In other words, it’s hypothetically possible that all parties are correct. However, one side has shown little better than bluster at this point, and rather damaged credibility. The burden of proof is over there..

    Disclaimer: the relevant mathematics is way out of my league.

  51. Rainer Says:

    The point I wanted to make is the following:
    Let’s assume Scholze/Stix/Mochizuki finally agree that the proof is correct.
    Then would the verdict of those three mathematicians be enough evidence that the proof is correct? Would a dozen mathematicians provide enough evidence?

    Or, isn’t it that we base our confidence more on the fact that after many years no-one has found any hint that the ABC conjecture could be wrong?

    I think that is a similar situation as in cryptography. This field is meanwhile flooded with fancy “mathematically rigorous” security proofs. But no serious security expert would build a system with a cryptographic algorithm which is solely based on a security proof. He would rather base it on the fact that nobody could break it even after many years of hard effort.

  52. Cris Moore Says:

    As several people have already pointed out, we use proofs, not so much to determine that something is true, but to understand why it’s true. This often means changing the point of view, focusing on a particular aspect of the problem, finding a stronger or weaker form of the claim, defining a new mathematical object, recognizing an analogy with a different problem, realizing a technique from another area can be applied, interpreting the problem differently, and so on. We are also very happy when we can prove something in multiple ways: algebraically, geometrically, etc.

    These cognitive moves make us feel we understand the problem better, as well as how it fits in with other problems, what the strengths and weaknesses of different techniques are, and so on. And these insights are often more important than the original problem, or the fact that a particular thing is true.

    By definition, these insights are only available (at least to mere humans) if the proof is human-readable. For some problems (like the four-color theorem) we have proofs without these insights, but precisely for that reason these problems are viewed as isolated points in mathematics, not known to be connected to the rest of it, and while they may be charming this limits their interest. Ironically, I might even say that the four-color theorem is fun but not deep, precisely because the only proof we have of it is too long 🙂 When something like Fermat’s last theorem turns out to be connected to other things rather than being isolated, this is cause for celebration, and the proof is in some sense our reward for seeing these connections.

    I would say also that except for a few high-profile examples, like the ABC conjecture, confirming a proof isn’t usually very hard compared to finding it. And even that conjecture is interesting mainly because Mochizuki claims to have created new branches of mathematics to do it, and (if true, which apparently Mochizuki isn’t doing a good job of convincing people of) these would presumably be useful for proving other things as well.

  53. Cris Moore Says:

    A bit more:

    Computers are very good at searching for both examples and counterexamples of things. But as Knuth would say, unless these examples are quite small, it typically takes some human ingenuity to organize this search, make good guesses about where in search space it might be found (e.g. with what symmetries) and so on; doing a brute-force search typically doesn’t work. Moreover, when we find these examples or counterexamples, we want to understand what makes them tick, whether they can be extended to an infinite family, and so on.

    But back to the point that we don’t just want to know that things are true: we want to know why they are true, and we often figure that out by learning when and where they are true. Are they true just in two dimensions, or in N dimensions? The reals but not the complexes (or the other way around)? On lattices or on general graphs? And so on.

    In other words, we typically make a mathematical claim in a particular setting, and we want to know what it is about this setting that makes it true, and how things might break down in other settings, or even better how it might generalize to another, deeper claim, telling us how the original claim changes and adapts to a wide variety of settings.

    It also often happens that a claim remains true when we change the setting, but that our original proof of it breaks down, and we need to use a different proof technique. But this is interesting in and of itself. What is it about the first setting that makes the first proof work?

    Having a powerful computer-assisted proof system would be great for exploring these alternate worlds empirically, and quickly testing various hypotheses. But while some proofs are more informative than others, no single proof, machine-aided or not, answers this question of why and where something is true, and what other, deeper truths hold more generally. To do that we need a whole universe of proofs where we can turn their knobs, swap out their parts, and so on. Asking a computer to find or sense this entire universe, rather than construct a single proof, is a tall order: it demands concept formation, analogy making, and all that good stuff that current work on ML/AI doesn’t seem to have made much progress on.

  54. Josh Grochow Says:

    A few points to throw into the mix.

    First, I think there are actually situations where people want the proof just to know something is true, but the proof itself is tedious and boring. I believe this comes up frequently in programming languages research. Showing that a certain programming language has various nice properties (such as confluence) is often *not* the interesting part of PL research, but is more like checking off some boxes. I believe in the PL community it is quite common to use a computer to prove these more routine aspects of programming languages they are developing (though perhaps someone like Neel could correct me).

    This has also come up in my research with various collaborators, e.g. looking for intricate combinatorial/algebraic constructions that could yield better algorithms for matrix multiplication. Of course, what we want is not just truth but understanding. However, there are also times where we’d just like to know if a certain object with certain properties exists. And once we know that, then we begin the hard and more interesting work of “reverse-engineering” it, and actually trying to understand the object, why it exists, how to generalize it, etc. But certainly one of our research tactics has been to use computers to simply tell us whether the thing exists! (Although we’ve been using computers in a more combinatorial way, not in a “find me a proof” way. Technically different, but philosophically perhaps not so different…)

    You can also imagine proof assistants being used similar to how calculators or computers are used today, but instead of for small calculations, for small lemmas. You, human mathematician, are in the middle of some proof, and you realize you need this one little combinatorial lemma. It’s not the heart of the matter, but it seems annoying to prove. Give it to a computer.

    Second, one of the things I’ve often thought would be a cool outcome of having machine-readable proofs is the searchability. Right now, two papers could be talking about the same mathematical object and not even realize it! (See cryptomorphism.) But if everything were machine-readable, you could just search for papers using the same – or even just provably equivalent – axioms to those you were using, and actually hope for meaningful results.

    Third, while I believe Scott’s point is still currently true – writing machine-readable proofs is currently a huge pain because of the level of detail involved – I think this will not be true forever. People interested in the mechanization of proof are working on building higher-level proof languages, just like we now have higher-level programming languages and no one really codes in assembly any more. At some point, those higher-level proof languages will be well-developed enough, and enough of a library of basic lemmas will have accumulated, that it will likely become much easier to write machine-readable proofs in a high-level language that fairly closely resembles the more intuitive language we currently write our mathematics in. I’m not enough in the loop to know whether this is 5, 10, 20, 50, or 100+ years away, but I think it will come some day.

    Scott wrote: “the more collaborative a given research area becomes, the more important is rigor.” I tend to agree with this statement, but have trouble reconciling it with the difference between rigor in physics and in math. I know physicists have their own version of rigor, which I used to think was strictly weaker than mathematical rigor, but I think I am coming around to the idea that maybe it’s simply different. And yet it works so well so often…

    In Horgan’s article, Wolfram is quoted as saying: “But somewhere underneath there will be all sorts of computational irreducibility that we’ll never really be able to bring into the realm of human understanding.” I tend to agree with this. There are certainly mathematical truths that are true simply because they have to be, but in a way that will never be illuminated. And while it is easy to dismiss such statements by saying “Well, such statements simply are of no interest to human mathematicians,” it seems entirely plausible to me that there will be some mathematical statements that people are genuinely interested in because of the consequences they have for other mathematics we’re interested in, and yet the only available proof is a huge case checking.

    And, finally, a point about the nature of mathematics and understanding. Today, we talk about and regularly prove things as undergraduates that were difficult, laborious, and highly nontrivial for mathematicians of, say, 150 years ago. How is this possible? I believe it is largely because of the language we build up (aside from significant societal changes that allow younger and younger people to pursue serious studies). The words in new mathematical languages come from the process of abstraction – giving a name to pattern, and then realizing that there can be patterns among patterns and so on – and these words get their meanings through the theorems we are able to prove about these abstractions. But the beautiful thing is that once you’ve built up the right language, something which was nontrivial before can suddenly become trivial to see. I think that somehow the key to (mathematical) understanding lies in this process of building up our language until the point that the thing we want to understand becomes trivial, or at least trivial enough that a single human mind can comprehend it all at once. And part of the issue with huge computer-aided proofs (like the four color theorem) is that it seems there is no such language to be had…

  55. Greg Kuperberg Says:

    Some responses to Cris and Joshua:

    As several people have already pointed out, we use proofs, not so much to determine that something is true, but to understand why it’s true.

    But how well can you know why something is true if you don’t know whether it’s true? Scott has previously made the analogy that computers are to computer scientists as telescopes are to astronomers. To the extent so, the proof of the four-color theorem has been a very good telescope that actually has led to a lot of followup work. It is not so isolated as all that. Besides, even if the four-color theorem were a one-off in combinatorics (which it’s not), it would still be very interesting as a collaboration between math and computer science. As such, it is hardly sterile for a human audience; there is a lot of human insight in it.

    I know physicists have their own version of rigor, which I used to think was strictly weaker than mathematical rigor, but I think I am coming around to the idea that maybe it’s simply different.

    People talk about standards of rigor in “physics”, but what is much more relevant to the discussion is the framework of rigor in quantum field theory and string theory, not physics in general. They are related, but not the same. The main source of rigor for physics in general is physics experiments, but quantum field theory and string theory have galloped far ahead of experimental evidence — and yet still have some tremendous achievements.

    The main currency of rigor in QFT is consistency checks, which is actually similar to the tradition of conditional results in number theory, computational complexity, and cryptography. It’s similar but not the same, because in the case of number theory and computational complexity, the consistency checks fit into a framework of rigorously stated conjectures. (Notwithstanding the notorious case when two of the heuristic number theory conjectures in Hardy and Wright were discovered to contradict each other.) In my impression, belief in cryptography is somewhat more empirical, which might make it more comparable to quantum field theory in this respect.

    The semi-rigorous framework of quantum field theory is very valuable and powerful, and in that sense it certainly isn’t strictly weaker than what mathematics has. However, there is clearly something big missing. Advances in computerized proof checking have bolstered the belief that most math proofs can be made completely rigorous in principle. (In fact, I see that as the most important achievement of that field, more important than specific assurances about 4CT, Kepler, etc.) No one thinks that about quantum field theory arguments. No one thinks that there is some quantum field theory variant of Zermelo-Frenkel set theory that could make a software proof assistant happy. On the contrary, I have heard that theoretical physicists find significant foundational mistakes and misinterpretations in quantum field theory from time to time, even if they are usually right. Many of them think that quantum field theory can and should be made rigorous, and they are hoping that rigorous mathematicians can help them more.

  56. Cris Moore Says:

    Hi Greg. I certainly agree it’s important to know whether or not something is true, and the Appel-Haken proof and its descendants are important certificates of truth. I also agree that computers are great telescopes for finding things, including proofs (as long as one has already nailed down what kind of proof one is looking for).

    Nor do I mean to say that the 4-color theorem is completely separated from the rest of mathematics. It certainly inspired much of graph theory. But if the only reason it’s true is that a certain set of discharging rules suffice, and we can show this by looking at large enough subgraphs, this would be a little disappointing. If a short proof exists, it might give us a sense _why_ these rules work (besides just tediously showing that they do) and whatever techniques give us that insight might provide insights about many other problems as well.

    To me, the proofs of the 4-color theorem are a bit like a proof, if true, that White can win in Chess. It’s intuitive that White has an advantage by going first, but (unlike Hex) this isn’t a proof that White can win, or even force a draw. If this fact is true, its shortest proof is probably very long, analyzing lots of lines of play and showing that this advantage is strong enough to work. I don’t know what insight we would gain from such a proof, other than a sense that our intuition was right in the first place.

    On the other hand, you understand these proofs much better than I do. Perhaps they speak to you in a way that I can’t appreciate, so that to you they are “insightful” despite their length.

    It may also be that no short proof exists. I remember reading somewhere that just as someone (God?) has Erdos’ fabled Book of the most-elegant proofs, the Devil has a book consisting of all the easy-to-state theorems which have no elegant proofs at all. (Anyone know the reference?) We know that such theorems exist — we just hope the things we want to prove aren’t among them.

  57. Tamás V Says:

    Josh Grochow #54: My bet is that besides the development of language, in the future it will be possible to implant computer-like capabilities into the human brain. And then humans can comprehend and come up with proofs unimaginable today. In 150 years, humans will look back and be surprised why it took so long to prove trivial things like the abc conjecture ????

  58. fred Says:

    “the Robbins Conjecture in algebra (the computer-generated proof was only half a page, but involved substitutions so strange that for 60 years no human had found them)”

    Wouldn’t you expect AI to find techniques that feel strange, i.e. they’re entirely new (same happens in with Alpha Go Zero, etc).
    Just labeling them as strange seems like a lost opportunity to expend the human toolkit.
    It’s a general issue with AI that, in a first pass, it can’t explain the things it does. But there is research to develop supplemental AIs which sole purpose is to explain it all in human terms.

  59. Greg Kuperberg Says:

    On the other hand, you understand these proofs much better than I do. Perhaps they speak to you in a way that I can’t appreciate, so that to you they are “insightful” despite their length.

    No, I don’t think that I do understand the proof of the four-color theorem better, I just see reasons for a more positive interpretation. Let’s take chess as an analogy. I think most people expect chess to be a draw rather than a win for white, but we can consider the more conservative conjecture that it isn’t a win for black. Well, we can already know something about why that would be true if it is true. Namely, it sure looks like the opening in chess, as in most games, is one where tempo is helpful. There is even some rigorous exploration of a related concept called temperature in mathematical game theory. That is, yes, this is intuition, but it is more than that, it is intuition supported by a mathematical theory.

    One of the intuitive merits of the discharging proof(s) of the four-color theorem is the evidence that discharging tends to gain on the problem and should succeed in many variations. It is like a rising tiding of progress in a game where you expect many ways to win. It is also something that you can carry with you to other questions, including (but not only) some variations of the four-color theorem that are still open.

    if the only reason it’s true is that a certain set of discharging rules suffice, this would be a little disappointing

    I agree that it would be interesting to find another proof of the four-color theorem for the reasons that you say. However, it is always a victory to find a completely new proof of any difficult theorem in mathematics, and I don’t know that the four-color theorem begs for it more than other questions do. In fact, Kronheimer and Mrowka have a program to try to prove the four-color theorem with (of all things) the Yang-Mills gauge equations in four dimensions. That would be fantastic if they succeed, but it would not necessarily obviate the discharging proofs. For one thing, RSST state the corollary of their proof that there is a quadratic time algorithm to find a 4-coloring of a planar graph. The Kronheimer-Mrowka approach might well pass through steps that are NP-hard.

  60. Greg Kuperberg Says:

    the Robbins Conjecture in algebra (the computer-generated proof was only half a page, but involved substitutions so strange that for 60 years no human had found them)

    Wouldn’t you expect AI to find techniques that feel strange

    I wonder if the steps in the proof of the Robbins Conjecture are not so much strange as they are arbitrary. Universal algebra is RE-complete. It might be possible to set up even fairly simple questions with cryptographic features. If so, then AI in the meaningful sense of imitating human thought and other biological processes might not help you. AI in the generalized sense of computers being smart amounts to just referring to the whole topic of computer algorithms as AI.

    In particular, EQP is surely closer to symbolic algebra (like Mathematica, Sage, etc.) than to anything like Alpha Zero.

  61. Viljami Says:

    What you make of this?
    Proietti, M., Pickston, A., Graffitti, F., Barrow, P., Kundys, D., Branciard, C., Ringbauer, M. and Fedrizzi, A., 2019. Experimental rejection of observer-independence in the quantum world. arXiv preprint arXiv:1902.05080.

    If I understood correctly it is an experimental test of the paper by Frauchiger & Renner that you went through last year? So the same conclusion holds?
    Frauchiger, D. and Renner, R., 2018. Quantum theory cannot consistently describe the use of itself. Nature communications, 9(1), p.3711.

  62. Scott Says:

    Viljami #61: My comments about the Frauchiger-Renner paper never doubted for a second the result predicted by QM. So an “experimental demonstration” of their setup—by which one really means an experimental demonstration of Hardy’s experiment, since the Frauchiger-Renner one would require superposed conscious observers (!)—has zero effect on any of those comments.

    In the future, PLEASE, no drive-by postings of off-topic papers asking me to respond to them.

  63. Douglas Knight Says:

    Scott 46:

    From the very beginning, attention focused on Corollary 3.12–both as the place where almost all of the “real work” seemed to happen, and as a place where experts couldn’t follow the logic of the proof.

    Not that it’s relevant to your argument, but I think that timing claim is wrong.

    3.12 as a problematic step was first mentioned in public in 12/2017 (here) in a comment by PS. BCnrd followed up by saying that he heard about from 3 separate sources in 12/2015, triggered by his writing up his experiences at a conference on the proof. But he specifically said that he had not heard it at that conference! Three years in is not “from the very beginning,” but even then while some experts focused their attention on this point, conferences specifically about the proof were failing to collate that information, and maybe not even including a single person who held that position.

  64. Scott Says:

    Douglas #63: OK, thanks for clarifying! I stand corrected re the timing.

  65. William Gasarch Says:

    random thoughts
    1) I saw Michael Rabin give a talk on randomized algorithms for primality back in 1980. What struck me was he said that

    If the prob that the algorihtm fails is 1/2^{100} thats about the same as the prob that a deterministic algorithm fails because a cosmic ray flips a bit.
    so YEAH, good enough!

    2) I think empirical methods are GOOD for intuitions and the basis for conjecture but not for proof. As such, perhaps such methods should have more of a place in the math community.

    3) If someone claims to resolve P vs NP (or some other hard problem) and mistakes are found and that person RETRACTS, thats FINE- that person still has my respect.

    If someone claims to resolve P vs NP (or..) and mistakes are found and they NEVER RETRACT thats BAD, and that person does not have my respect.

    Hogan is in a funny position in terms of my respect (Yes, Hogan has sleepless nights wondering if he has my respect 🙂 ). He DID retract but 25 years later. And what he
    retracted was not a proof, but a … prediction? If viewed as a prediction, then maybe 25
    years is a fine amount of time to wait.

    4) One odd issue here is just the word `Proof’ The phrase

    `interactive proof” (e.g., IP=PSPACE)


    `proof of the Poincare Conjecture’

    really are two different uses of the word proof that are perhaps being confused.

  66. Eitan bachmat Says:

    To 52, chris, the ABC conjecture is a huge deal and would be hands down the biggest theorem of the decade had it been proved ( previous biggest theorems are, 70’s Weil conj. By Deligne, 80’s Mordell conj et al by Faltings, 90’s Fermat by Wiles, 2000s Poincare by Perlman
    I must admit that i dont recall a thm at those levels in the 2010s)

    Second remark, If interested in Voevodski’s type theory work I would consult Mark Bickford

  67. Not Lubos Motl Says:

    I also scoffed at John Horgan’s thesis when I first heard it. Then after spending two years trying to learn geometric topology, becoming disgusted by its lack of rigor, and eventually finishing my phd in a totally unrelated field (analytic number theory), I decided that there are pockets of math to which Horgan’s indictment applies pretty well. His “video proofs” aren’t that different from some of the diagrams of 4-manifold handle slides I was supposed to accept as proof. I don’t think low dimensional topology is in crisis a la the Italian school of algebraic geometry, but I do think it has a culture of hand-waving and contempt for details that seems very unhealthy.

  68. Kevin Buzzard Says:

    I am more concerned about the death of proof than most commenters here. Let me raise some other examples and issues which as far as I can see have not yet been bought up.

    The classification of finite simple groups is currently “in the literature” in the sense that if you take the results in hundreds of journal articles (some of which prove things in far more generality than necessary for the proof) then a number of experts know how to put these results together to create a proof. My understanding (I am an algebraic number theorist, not a group theorist) is that there is currently no good “map” through the proof in the literature, however the experts believe it is complete (and I have no reason to disbelieve the experts). However the experts are dying out. Young people are not going into the area of finite group theory, because “the main problem has been resolved”, and the proposed 12 volume “second generation” proof is still only seven volumes in, with only one volume published in the last 10 years. The authors of these proposed volumes are in their 70s. I am genuinely concerned that the details of this proof are going to die.

    Jim Arthur recently was recently awarded both the Wolf prize and Leroy P Steele prize, both mainly for lifetime achievements, however both no doubt sparked by his magnificent monograph “The endoscopic classification of representations”. The main results of this monograph are a spectacular classification of automorphic representations for many classical groups. The results rely on four manuscripts (references [A24] to [A27] in the monograph) which are “in preparation”. The book was published in 2013. These manuscripts have still not yet appeared. I asked Arthur last month and he assures me that he will get back to them at some point. Arthur is also in his 70s. Let me absolutely stress that I am in no way suggesting that there might be problems in the ideas Arthur has which will fill in the gaps in his monograph. However let me also suggest that if Arthur is hit by a bus tomorrow then actually we have a problem here. I have asked several leading number theorists if they feel confident of being able to prove the missing results themselves, and every single one has said “no, but I bet X could do it”; I then ask X who says “no, but I bet Y could do it” etc. I am genuinely concerned that the details of this proof are going to die.

    I found Voevodsky’s 2014 IAS talk very interesting. I am a working mathematician and I fully appreciate that the notion of proof in our subject is in pretty good shape. However the story Voevodsky tells in this talk leads me to the conclusion that it is not in as good a shape as many commentators here believe it to be. In particular I find Voevodsky’s claim “A technical argument by a trusted author, which is hard to check and looks similar to arguments known to be correct, is hardly ever checked in detail.” very disturbing. Voevodsky cites examples of mistakes in articles by both Bloch and himself in this talk. There are other examples of major works by prestigious authors turning out to be problematic. Manin’s proof of the Mordell conjecture for function fields over the complexes was accepted by the community for 25 years before Coleman found a mistake in it in the late ’80s (and also, fortunately in this case, a modification which fixed the proof). There is a mistake in Faltings-Chai about extending abelian schemes over an open set in a base scheme which invalidates some of the arguments — but the experts know what is ok and what is not. I would like to argue that “by induction” it is very likely that results which are currently accepted by the community will be rejected later. This for me is a problem. When this happens, experts will be able to assess the damage and will know what is still OK and what is not OK, assuming the area is still in fashion and there are still experts around. I am not naive enough to believe that areas currently in fashion will remain in fashion forever. This concerns me.

    Mathematics is done by humans, and issues of the latter kind (proofs turning out to be wrong) are inevitable. But proofs dying out are more concerning to me. Other mathematicians have given me other examples in their areas, both of proofs which rely on results which were never satisfactorily written up and of proofs which rely on additional expertise which is dying out. If and when the computer revolution comes and substantial databases of formally verified mathematics start being built, I envisage problems when it comes to proofs which have not been correctly documented, and proofs which turn out to be not quite right, especially if they are in areas which are no longer regarded as fashionable e.g. because the major problems have been resolved to the community’s satisfaction.

    I am not attempting to defend Horgan’s article, however I believe do there is some sort of debate to be had here, and as computer proof verification systems become more powerful these issues will slowly start to emerge. I do not know how long this will take. I have seen with my own eyes the deficiencies of the refereeing process, and I have seen with my own eyes the vast amount of work which one has to put in to prove even the simplest MSc level mathematical results in a formal proof verification system, and I am not offering any answers. However I do believe that the era of formal proof verification systems will come, and when it comes I am concerned that poorly documented and incorrect proofs in areas which have dropped out of fashion will prove very difficult to fix.

  69. Morten H Says:

    Hi, I basically agree with you but at the same time I wish to offer a counterpoint. Your point-of-view, that proof isn’t death etc., is completely correct from your perspective, namely as a working mathematician/computer scientist deeply involved with proofs, where proof still has the highest value above all! Howver, if you look more broadly at a “society level” I think you could say the importance of proof is diminishing. In many commercial areas: Engineering, finance, software engineering etc. it my impression that proofs play a much lesser role than they used to. Often numerical methods and simulations are used instead. The reason are manyfold (of course interrelated in cause-effect in complex ways):

    1) Many practical situations are quite simply extremely difficult to tackle using exact or “closed form” methods so often simulations or numerical methods are the only available, practical approach
    2) Even when analytical methods might be possible, the expertise how to do them are not widely known and not easily verifiable either.
    3) Sometimes applying an analytical approach might take longer time (to think) than doing a simulation.
    4) Even if you manage to apply an analytical approach, if someone suggests a simple change to the model maybe you have to do a full re-analysis and hope another analytical approach will work. If a numerical was used, often you could simply change this aspect of the calculation and run it again. I.e. numerical methods are often more flexible.
    5) Computing resources are easily available – even the simplest laptop can quickly run a quite sophisticated simulation.
    6) Even analytical methods often end up relying on analytical methods. You can use the equation for solving a 3rd degree equation all you want, but at the end of the day you need to calculate some square roots to get a definite answer and how are you gonna do them? Numerical methods.
    7) Even as recent as the 80s and 90s, lay-people understood what a mathematical proof was and it held some respect. TOday, many people are mathematical proof “unaware” or doesn’t hold it in high regard. I have even heard things like “Yeah, proved until someone finds the opposite”, effectively treating a mathematical proof on the same level as empirical findings.
    etc. etc.

    I think there’s also a huge loss going in this direction, but it is what I am seeing. Maybe it is part of an even bigger trend about companies only chasing easy-wins, meaning most R & D departments are mostly just applying existing knowledge and doing very little research these days. Of course, this is also up to how you set the bar for “research”. Is putting some neurons together and running Tensorflow “true” research? I would say not, but clearly mainstream disagrees. Maybe that is why we are seeingly – arguably – less innovation than we used to.

    I can also see the other side of the coin – I got a lot of respect out of exactly solving a model related to a biological problem 20 years ago as a PhD student. The formulas came out very elegant and just enough surprising that it was beautiful. The others working in the field had relied on “simulations in spreadsheet” because of an inability to apply the math which was by no means hard (relying just on elementary analysis), so that’s why they were impressed. But to tell the truth, from a practical point-of-view I don’t even think it changed anything. Sure it looked very beautiful. But it was from an algebraic point-of-view it was elegant and surprising to see these short formulas matching up with reality, not from a practical point-of-view. What they had before, however crude it had been calculated, was “good enough” for practical purposes.

  70. Scott Says:

    Morten #69: I completely agree with you that in engineering and applied sciences, people care about “what works in practice” rather than rigorous proofs. But I disagree with your contention that that was ever not the case! 🙂 Non-rigorous math goes back at least to the Babylonians. Likewise, while I’m sure the “person on the street’s” understanding of mathematical proof is poor, I’d need to be convinced that the situation was ever different.

  71. Morten H Says:

    Hi Scott, yes I don’t know for sure that things have changed, it is purely anecdotal. But I think just the sheer availability for easy and cheap compute means that analytical approaches are “losing ground” to brute force. Back when compute and RAM was severely limited you really had to plan your calculations carefully and use clever tricks to reduce the burden. Not you can just simulate your model from the basic rules you type in without any attempt at reducing the workload for the computer, even though it is enormously wasteful. But I agree that for mathematicians who almost by definition work with proofs, things have not changed but that is almost a tautology. I think the “death of proof” thesis might have more application when viewed at a societal scale.

  72. Karl Palmskog Says:

    For people interested in formal computer-checked proof, it may be worth pointing out what are arguably the greatest achievements in the last 10-15 years, besides the already-mentioned 4-color theorem proof in Coq by Gonthier:

    – the proof of the Kepler Conjecture in 2017 which straddles the HOL Light and Isabelle/HOL proof assistants by Hales and his many collaborators (500k+ lines of code, 20+ person years effort):

    – the proof in Coq of the Feit-Thompson (odd order) theorem in 2012 in Coq in abstract algebra by Gonthier et al. (170k+ loc, 10+ py):

    – the correctness proof of the seL4 operating system kernel, 2009 and onwards, in Isabelle/HOL by Klein et al. (200k+ loc, 22+ py):

    – the correctness proof of the CompCert C compiler, 2006 and onwards, in Coq by Leroy et al. (40k+ loc, 8+ py):

    I think that many CS researchers (at least in Theory B) would say that these proofs have indeed provided new insights besides platinum certification. However, another important positive consequence has been “spillovers” in formal mathematics library development and proof assistant tooling improvements, which other researchers have continued to build on.

  73. Cris Moore Says:

    To Eitan #66, you’re right — I should have read more about the ABC conjecture before making that comment.

  74. IdPnSD Says:

    The truth can be defined using the following three sentences. (1) The laws of nature are the only truths. (2) These laws are created by the objects of nature and their characteristics. (3) Nature always demonstrates all its laws.

    The above definition seems to be very obvious and natural. But then this truth will prove that Mark Twain is correct – “Majority is always in the wrong”. Real numbers are not objects of nature, and therefore they must be false. Real numbers are points on a straight line. But there is no straight line in nature, because every object in the universe is continuously moving. Thus the foundation of mathematic is wrong and therefore the entire mathematics must be wrong. Since money is also real number, money must be false. Thus the entire economics must be false too. Since money controls everything on our earth, everything must be false or wrong.

    However engineering is not false, because engineering uses objects of nature. But since engineering also uses false mathematics, false science, and false money, engineering is unreliable, breaks down, and pollutes environment. The society is severely suffering; poverty is spreading, wars, migration, violence, discrimination, etc., are all destroying everything around us; and all because of false science and false money. If you eliminate money, and create moneyless economy (MLE) then the heaven will come to earth.