## The Quest for Randomness

So, I’ve written an article of that title for the wonderful American Scientist magazine—or rather, Part I of such an article.  This part explains the basics of Kolmogorov complexity and algorithmic information theory: how, under reasonable assumptions, these ideas can be used in principle to “certify” that a string of numbers was really produced randomly—something that one might’ve imagined impossible a priori.  Unfortunately, the article also explains why this fact is of limited use in practice: because Kolmogorov complexity is uncomputable!  Readers who already know this material won’t find much that’s new here, but I hope those who don’t will enjoy the piece.

Part II, to appear in the next issue, will be all about quantum entanglement and Bell’s Theorem, and their very recent use in striking protocols for generating so-called “Einstein-certified random numbers”—something of much more immediate practical interest.

Thanks so much to Fenella Saunders of American Scientist for commissioning these articles, and my apologies to her and any interested readers for the 4.5 years (!) it took me to get off my rear end (or rather, onto it) to write these things.

Update (4/28): Kate Becker of NOVA has published an article about “whether information is fundamental to reality,” which includes some quotes from me. Enjoy!

### 141 Responses to “The Quest for Randomness”

1. Michael Says:

What is the gain in the pursuit of randomness?

2. Scott Says:

Michael: Cryptographic security, for one thing … did you read the article?

3. Liron Says:

“The trick is the following: Given a probability distribution, we consider a search problem that amounts to ‘Find a string that occurs with large probability in the distribution and also has large Kolmogorov complexity.'”

Can you elaborate how “verify that a sampler’s output is faithful to a probability distribution” –> “verify that a searcher’s output has high Kolmogorov complexity” is a useful reduction?

4. Vadim Says:

Hi Scott,

Great article, thanks! One question for you:

You write, “As a first step, we could check whether the digits 0 through 9 appeared with approximately equal frequency, among, say, the first million digits. Passing such a test is clearly necessary for randomness…”

If someone took the first million digits of PI and individually multiplied each digit by 2, would the resulting number be as random as PI? The resulting sequence would no longer be normal, since it would contain more even digits than odd: multiplying 2 by 0..4 gives a single even digit, while multiplying 2 by 5..9 gives the digit 1 followed by an even digit, so the only odd digit in the resulting sequence would be 1.

If this number isn’t as random as the first million digits of PI, doesn’t it feel strange that you could reverse the process (dividing each digit or pair of digits starting with a 1 by 2) to make a more random number than you started with?

5. Scott Says:

Liron #3: Well, it converts a sampling problem into a search problem, which can be interesting for various complexity-theoretic reasons. For example, it allows you to prove the non-obvious statement that FBPP=FBQP if and only if SampBPP=SampBQP. The advantage of search problems over sampling problems, of course, is that it’s conceptually much clearer what you need to do to verify the output! Admittedly, running my reduction will produce a search problem that involves calculating the Kolmogorov complexity of the output, which is uncomputable! 🙂 By substituting time-bounded Kolmogorov complexity, I show one can bring the effort needed down to PSPACE and even the counting hierarchy, but it would be shocking if one could bring it all the way down to NP. Still, nice to know that FBPP=FBQP iff SampBPP=SampBQP.

For more details, see my paper The Equivalence of Sampling and Searching.

6. Scott Says:

Vadim #4: Here’s an even simpler version of your thought experiment. Take a uniformly-random n-bit string,

x = x1…xn,

and map it to a 2n-bit string in which every bit of x occurs twice:

y = x1x1…xnxn.

Then x is algorithmically-random, y is not algorithmically random, and there’s a completely reversible, easily-computable mapping between the two. But all that’s going on here is that x is maximally compressed, whereas y is needlessly verbose—I fail to see any “paradox.”
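To make the x → y example concrete, here’s a toy Python sketch (illustrative only; zlib stands in, very crudely, for an optimal compressor):

```python
import os
import zlib

def duplicate_bits(x: str) -> str:
    """Map x = x1...xn to y = x1x1...xnxn."""
    return "".join(b + b for b in x)

def deduplicate_bits(y: str) -> str:
    """Invert the mapping by keeping every other bit."""
    return y[0::2]

# A (pseudo)random 4000-bit string standing in for uniformly-random x.
x = "".join(format(b, "08b") for b in os.urandom(500))
y = duplicate_bits(x)

# The mapping is completely reversible...
assert deduplicate_bits(y) == x

# ...but y is needlessly verbose: even a generic compressor squeezes
# out much of the redundancy that the doubling introduced.
cx = len(zlib.compress(x.encode(), 9))
cy = len(zlib.compress(y.encode(), 9))
print(f"|x| = {len(x)} bits, compressed to {cx} bytes")
print(f"|y| = {len(y)} bits, compressed to {cy} bytes")
```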

7. Vadim Says:

Yes, I see what you mean, thanks!

8. James Gallagher Says:

The Dilbert cartoon nails it: in physics, the important distinction between random and non-random is unpredictability, i.e. there is always a range of outcomes, but any single one of them occurs without any possibility of human or god knowing, or being able to predict, which one will occur.

The mathematical discussion is just a boring analysis of this situation.

I liked most of your article, but your inclusion of boson sampling makes it a little nerdy, not so much general-interest reading.

9. Jerry Says:

Scott: Very good article. I look forward to the sequel.

Leonard Mlodinow (@CalTech) has written the very enjoyable book, “The Drunkard’s Walk – How Randomness Rules Our Lives”.

http://www.amazon.com/Drunkards-Walk-Randomness-Rules-Lives/dp/0307275175/ref=sr_1_1?s=books&ie=UTF8&qid=1398204500&sr=1-1&keywords=drunkards+walk+how+randomness+rules+our+lives

He notes that Apple had to make their iPod “shuffle” option less random in order to make it appear random, as customers complained that songs would repeat too often, just like Dilbert’s “9”s.

Are you familiar with Benford’s law, used in forensic accounting?

Go to any page of the Boston Globe and count the number of times the digits 1 through 9 appear.
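In case it’s unfamiliar: Benford’s law predicts that the leading digit d appears with frequency log10(1 + 1/d), so roughly 30% of entries lead with 1 and fewer than 5% with 9. A quick illustrative check in Python, using the classic example of powers of 2:

```python
import math
from collections import Counter

# Leading digits of 2^1 ... 2^1000.
leading = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))

for d in range(1, 10):
    observed = leading[d] / 1000
    predicted = math.log10(1 + 1 / d)   # Benford's law
    print(f"digit {d}: observed {observed:.3f}, Benford predicts {predicted:.3f}")
```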

10. domenico Says:

I am not sure, but in crystallography there is the possibility to verify that some lattices (integer translations of three independent vectors) give constructive interference with a wave vector that is a reciprocal-lattice vector (the reciprocal lattice being spanned by three independent vectors orthogonal to the lattice planes).
If the difference between scattered rays is an integer M, measured with a little perturbation of the interference via a small rotation, and if the direction of the incident ray is along a vector of the reciprocal lattice, then there is a factorization of a number into a product of integers: if that all works, a crystal can be used as a factorizer for small integers, because the crystal produces all the possible interferences (Laue method).
I am thinking that if the complexity of the crystallographic results is great for each wavelength, then the number of results the crystal gives is greater than that of a digital computer that gives the same results from theory.

11. Michael P Says:

You’ve got a typo here:
Both sequences have the same probability, namely, 2-30

12. Michael P Says:

You discussed both string compression, which is often associated with Shannon entropy, and Kolmogorov complexity. What are the known relations between entropy and Kolmogorov complexity?

13. Scott Says:

James #8: Well, it’s obvious that randomness has something to do with unpredictability. Unfortunately, that observation by itself doesn’t get you as far as you might like. For how can you know if something is unpredictable or not? Just because you can’t predict it, doesn’t mean that no one else can. So that’s what the article is about—glad you liked most of it.

14. Scott Says:

Michael P #12: Good question! There’s a very strong relation, known since the 1970s, between the Shannon entropy of a computable distribution and the Kolmogorov complexity of a typical sample from that distribution. Indeed, this is exactly what I used to prove my equivalence of sampling and searching result. You can find a formal statement of the connection in Section 2.2 of my paper (or, say, in Li-Vitanyi, which is the standard reference for this area). But informally, it says the following: given any computable distribution D = {p_x}_x, almost every element x drawn from D has Kolmogorov complexity equal to

K(x) = log(1/p_x) ± O(1),

where the O(1) can depend on the length of the program to sample D. In particular, we have

H(D) ≤ E_{x~D}[K(x)] ≤ H(D) + O(1),

where H is the Shannon entropy. (The first inequality is fairly obvious, while the second follows from the existence of the Shannon-Fano code.)
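To see the second inequality in action numerically, here’s a toy Python check of the Shannon–Fano sandwich H(D) ≤ E[codeword length] ≤ H(D) + 1 for a small made-up distribution (illustrative only; the O(1) program-length constants are ignored):

```python
import math

# A toy computable distribution D = {p_x}.
D = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# Shannon entropy H(D) = sum_x p_x * log2(1/p_x).
H = sum(p * math.log2(1 / p) for p in D.values())

# In a Shannon-Fano code, x gets a codeword of ceil(log2(1/p_x)) bits,
# and "decode this codeword" is a program of that length plus O(1) bits.
expected_code_len = sum(p * math.ceil(math.log2(1 / p)) for p in D.values())

# The sandwich H(D) <= E[codeword length] <= H(D) + 1, which (up to
# additive constants) is the inequality above.
assert H <= expected_code_len <= H + 1
print(f"H(D) = {H:.3f} bits, expected Shannon-Fano length = {expected_code_len:.3f} bits")
```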

15. Miguel B Says:

Scott,

You wrote:

“… if the digits really were drawn completely randomly, and we looked at a million of them, then with overwhelming probability we would see roughly 10 percent zeros, 10 percent ones, and so forth.”

I fully expected your article to be the first “popular” account of randomness that didn’t assume that only a uniform distribution can be truly random. Sorry to be so picky, but this is a pet peeve of mine, and I was disappointed.

Great article otherwise!

16. Scott Says:

Miguel #15: Sorry, I thought it was clear enough, from the context, that “completely random” was just a way to say “uniformly random” to a non-mathematician.

17. James Gallagher Says:

Scott #8

That’s why I included god in the things that can’t predict the outcome – I don’t believe in such a simple idea of god, but I do believe in such a simple idea of randomness (that no god could predict)

Look forward (as always) to the next installment

18. Miguel B Says:

Scott,

I understand. However, I can imagine a non-technical person thinking, “Wait a second, what if the completely random numbers come from throwing ten dice and adding the results?! Dr. Aaronson’s test is all wrong!”

Maybe I just overestimate non-technical people, but an intuitive understanding of the central limit theorem should be part of everybody’s mental toolkit.

19. William Hird Says:

Along the lines of pseudorandomness and unpredictability, Jeff Lagarias has a paper (sorry, no link) on pseudorandom generators where he famously states the unpredictability paradox: if a deterministic algorithm is unpredictable, it is hard to prove anything about it; in fact, proving that it is unpredictable is just as hard as resolving P vs. NP.

20. Scott Says:

William #19: Lagarias is right, but why is that a “paradox”? It’s a well-known phenomenon in computational complexity. Proving pseudorandomness is hard, and is indeed closely related to the great lower bound problems like P vs. NP (sometimes it’s easier, sometimes it’s actually harder, depending on what kind of pseudorandomness you want).

21. James Cross Says:

Scott,

I can’t believe you answer all of these questions. But I love it. I bought your book because of it (the Kindle edition is still a little pricey for digital, but worth it).

So if the universe can be reduced to, or represented as, bits, would it be a random number at any point in time?

22. Scott Says:

James #21: That depends on how you’re representing the state of the universe by bits, on what you mean by the state of the universe, and on what you mean by “random number.” For example, if you just look at a quantum pure state evolving by the Schrödinger equation, its Kolmogorov complexity (after truncating the amplitudes) increases only logarithmically with time—since to specify the state at a given time, it suffices to specify the initial state together with how long the state has been evolving for. On the other hand, if you include the results of quantum measurements (from an MWI perspective, which “branch” we’re in), then Kolmogorov complexity should increase more-or-less linearly with time, reaching its maximum when there’s just a soup of low-energy photons at the heat death of the universe. In that sense, one could say that the universe would converge toward a state that was “maximally random.” But even then, whether your encoding of the state of the universe was a Kolmogorov-random string, would depend entirely on whether you were representing the state in the most succinct possible way. For most reasonable encoding schemes, you wouldn’t be.

23. William Hird Says:

Scott#20:
My memory is bad on this; I forget the exact context in which Professor Lagarias uses the word “paradox”, but he shows three mathematical objects: (1) a secure pseudorandom generator, (2) a one-way function, (3) a perfectly secure cryptosystem. If one of these objects exists, they all exist. I’m going to guess that he uses the word paradox akin to “a difficult puzzle to solve” 😉

24. Rahul Says:

Interesting article! What’s the state-of-the-art method when testing random number generators in the field? Are they tested by nuanced versions of n-digit frequencies? Or Kolmogorov based techniques?

25. Scott Says:

Rahul #24: If you don’t know anything about where your random numbers came from (e.g. that they were quantum-mechanically generated), then the “state of the art” is just to throw a whole bunch of statistical randomness tests at your sequence, looking for patterns that would make the sequence compressible. (If you’d like to “try this out at home,” then simply try feeding your sequence to good compression programs such as gzip.) You can think of what you’re doing as computing an extremely crude upper bound on the Kolmogorov complexity. I.e., if you find a pattern that makes your sequence compressible, then you’ve proved that it has non-maximal Kolmogorov complexity, so it almost certainly wasn’t drawn uniformly at random. But the converse isn’t true: there could be a subtle or not-so-subtle computable pattern (that your sequence was the binary digits of √2, let’s say) that your statistical analysis tools simply failed to find.
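The “try this out at home” suggestion, sketched in Python with zlib (the same DEFLATE algorithm that gzip uses; illustrative only):

```python
import os
import zlib

def crude_k_upper_bound(data: bytes) -> int:
    """Compressed size in bytes: a crude upper bound on K(data), up to O(1)."""
    return len(zlib.compress(data, 9))

patterned = b"0123456789" * 1000    # an obvious repeating pattern
random_ish = os.urandom(10000)      # OS randomness, presumably incompressible

print(crude_k_upper_bound(patterned))    # tiny: a pattern was found, so K is non-maximal
print(crude_k_upper_bound(random_ish))   # about 10000: no pattern found

# Caveat from the comment above: a compressor failing to shrink your data
# does NOT certify randomness. The binary digits of sqrt(2) would also
# come out "incompressible" to zlib.
```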

26. Attila Szasz Says:

Wow, I’d really love to see your take on a Great Ideas in Algorithmic Information Theory course/notes someday,
there are a plenty of subtle topics covered in Li-Vitányi exceptionally well, but I’m still pretty sure your style and insights would prove useful for anyone trying to get a first grasp on stuff like Levin search, inductive reasoning, Chaitin type perspective of Gödel’s theorems (his omega and philosophical remarks probably included), the incompressiblity method, time bounded kolmogorov complexity (I especially wonder whether you’ve ever used some of this material in your work, some theorem of Fortnow or Sipser for instance), just to name a few.

27. Scott Says:

Attila #26: OK, will add to my stack! 🙂 (Part of the challenge, but also part of the reason to do it, would be to master this material myself.)

28. asdf Says:

You know about Downey and Hirschfeldt’s book on algorithmic randomness? There is an online draft:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.666

It looks really interesting, though mostly about computability theory and logic, including relative computability. There’s a part though about notions of randomness that are computable.

29. Scott Says:

asdf #28: No, hadn’t seen that! Will take a look.

30. William Hird Says:

Rahul#24

Assuming you are talking about pseudorandom numbers and not “real random” numbers, I would say (my opinion) that the Dieharder suite of random number test software is state of the art. See Robert Brown’s website at Duke University. He is quite a character!

31. gidi Says:

I thought of an (obviously) wrong counterargument to the proof that K(x) cannot be computed, which led me to a question.

When one thinks of computing K(x) the obvious algorithm is simply going over all strings in increasing order and running each string as a program to see if it outputs x. Since “print x” works, the number of strings we have to go over is finite, so this is “almost” good.
The problem of course is that when we run a program we never know if it will eventually print x (this is the halting problem), so the above algorithm is no good.
However, this led me to think to the following question:
– are there notions of Kolmogorov-type complexity that also count the resources used until we print x? (resources in terms of time, and maybe also memory).

32. Rahul Says:

Scott #25: Thanks!

So, I’m very eager to read the next part of Scott’s article.

But I cannot resist asking: Is there really a way to do better than the current strategy of statistical testing?

i.e. Can there really be a test that can certify random numbers to the extent that, if a black box were spewing out something like Pi() or sqrt(2) or some such statistically-random-looking pattern, it could tell that apart?

Sounds amazing to me! Am I understanding this correctly?

So what if one took a large sequence certified as random by this test, loaded it onto another black box as a list, and then tested Black Box #2? Would this divine test flag it as non-random when reused?

33. Sam Hopkins Says:

Clearly a pseudorandom number cannot have a high Kolmogorov complexity. Is there a rigorously-defined property that can tell us when we have a good pseudorandom number? “Statistical randomness” seems pretty ad hoc, and might do a good job for practical purposes, but doesn’t seem that conceptual. Because of the intimate relationship between pseudorandomness and, for example, one-way functions, I would think there ought to be some measure like Kolmogorov* complexity, where the asterisk indicates we take into account time complexity as well. Well, not exactly time complexity of the forwards direction (producing the number), but time complexity of the backwards direction (inverting the pseudorandom generator).

So the setup would be something like: we have a string a_n of numbers, indexed by natural numbers n \in N, say. We look for a Turing machine that can invert the production of these numbers (i.e. yield n on an input of a_n), and see how great a time-complexity it has. The higher the minimum time complexity, the better the “pseudorandomness” of the sequence of strings.

Surely something like this has been considered before.

34. Scott Says:

gidi #31 and Sam Hopkins #33: Yes! Both of you are groping toward something that actually exists, namely, the theory of time-bounded Kolmogorov complexity (which, sadly, I wasn’t able to get into in this article). You can, for example, define KT(x) as the minimum, over all programs P that output x, of |P|+log(T(P)), where |P| is the number of bits in P and T(P) is the number of time steps. And there are other ways to combine program length with running time, to get a single measure of time-bounded Kolmogorov complexity.

And yes, passing from K to KT immediately solves the uncomputability problem of Kolmogorov complexity: now you only have to iterate over finitely many programs, each for a finite amount of time! But maybe not surprisingly, instead of uncomputability you now have a computational intractability problem on your hands—arising from the fact that you still have exponentially many possible programs to check.

You might hope that there would be some clever way to avoid the exponential search. But alas, if cryptographically-secure pseudorandom number generators exist—which, as William #23 said, is known to be equivalent to the existence of one-way functions—then there can’t be such a way, since it could be used to distinguish random strings from pseudorandom ones, and hence break any PRNG, if it existed. In fact, time-bounded Kolmogorov complexity is so intimately related to pseudorandomness in cryptography and complexity theory, that usually people just talk directly about the latter rather than about KT.

I say more about pseudorandomness in Chapters 7 and 8 of QCSD, and there are many good resources elsewhere (e.g., Luca Trevisan’s lecture notes and Avi Wigderson’s survey articles).

35. Scott Says:

Rahul #32: Alas, if your “divine pseudorandomness test” existed, then as I explained in comment #34, you would be able to use the test to break any cryptographic pseudorandom number generator. And that, in turn, is known to let you invert any one-way function. So, if OWFs exist—which is only a “slightly” stronger conjecture than P≠NP—then your divine test shouldn’t be possible.

That’s not to say, of course, that one can’t “approach” divinity without reaching it! For example, if you tell me that your bits being the binary expansion of some famous irrational number is the possibility you’re worried about, then we can simply throw a check for that into our randomness tester. On the other hand, if all you tell me is that you’re worried your n-bit string can be generated by some computer program that’s √n bits long and takes n^2 time steps, then I can’t efficiently test for that without being able to invert OWFs.

36. Rahul Says:

Scott #35:

Thanks again! So what’s the rough route / roadmap to doing better at random number generation than where we are now (say, the Yarrow algorithm). i.e. How? And why?

Is there a practical need to do better than what can fool a typical statistical randomness test suite? And if someone offers me a YarrowPlus claiming it’s better than Yarrow, what randomness metric do I use to judge? I.e. both will equally resist compression and perform well on conventional tests, right?

PS. Am I asking too many questions? If so, I’ll stop. 🙂

37. Scott Says:

Rahul #36: Yes, there’s a practical need for random numbers better than what could fool a typical statistical randomness test suite—and that practical need arises almost entirely from cryptography. One part of the solution is cryptographic pseudorandom number generators, such as Blum-Blum-Shub, Blum-Micali, etc., whose security can be based on “standard” hard problems such as factoring and discrete log. This Wikipedia page gives a good summary of what’s available.
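For concreteness, here’s a toy Blum-Blum-Shub generator in Python (the primes below are absurdly small, so this particular instance is NOT secure; it’s only meant to show the shape of the construction):

```python
# Blum-Blum-Shub: take n = p*q with p, q ≡ 3 (mod 4), iterate x -> x^2 mod n,
# and output the least-significant bit at each step. Its security reduces to
# the assumed hardness of factoring n.
P, Q = 10007, 10039          # both primes ≡ 3 (mod 4); toy-sized, NOT secure
N = P * Q

def bbs_bits(seed: int, count: int) -> list[int]:
    assert 1 < seed < N and seed % P != 0 and seed % Q != 0
    x = (seed * seed) % N    # start from a quadratic residue
    out = []
    for _ in range(count):
        x = (x * x) % N
        out.append(x & 1)    # emit the low bit
    return out

bits = bbs_bits(seed=123456, count=32)
print(bits)
```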

Note, however, that certain elliptic-curve-based CPRNG’s were recently revealed to have been backdoored by the NSA! So, that underscores the importance of checking that whatever CPRNG you’re using actually matches what was analyzed in a theoretical security reduction, rather than relying on intuition or authority, or saying “sure, some shortcuts were taken in implementation, but I still don’t see how to break the thing, so it’s probably fine.”

Another part of the solution is quantum-mechanical random number generators. For that, you can just use a Zener diode, or amplified thermal noise in analog circuits (as recent Intel chips actually do). Or, if you’re paranoid about your hardware being compromised, you can also use Bell inequality violations to get “device-independent, certified randomness”—something whose theory was worked out within the last few years, that’s just starting now to be demonstrated experimentally, and that’s the subject of Part II of my article.

The two approaches are more complementary than competing—since typically, what you’d want to do in practice is first use physics to generate a small random seed, then use the cryptographic methods to expand the seed into a long pseudorandom string.

And yes, that’s enough questions for today. 🙂

38. William Hird Says:

I would like to mention one more point about the Lagarias paper on generating pseudorandom numbers: he doesn’t mention the field of cellular automata as a possible source of algorithmic unpredictability. Cellular automata (like Conway’s Game of Life) are known to exhibit the property of emergence: they appear to generate states that can’t be predicted from the initial conditions. I think that if any generator is ever going to be proved to be unpredictable, it will probably have to be based on the principles of cellular automata. So one could envision a generator based on the theory of randomness extraction: you would have one cellular automaton generating a “pool of bits” close to having a uniform distribution, and then have a second cellular automaton extract the bits from the “pool”, like a lottery drawing mechanism.

39. fred Says:

small typo page 4
“But if k is even a little bit smaller than n (say, n – 50), then 2^(k+1) – 1 will be vastly smaller than 2n.”
should be “2^n” I think.

40. Scott Says:

William #38: Yeah, Wolfram discusses the same thing in New Kind of Science. I completely agree that cellular automata (maybe combined with a randomness extractor) seem like a good and obvious way of getting cryptographic-quality pseudorandomness; I should’ve mentioned that in comment #37. If a CPRNG is all you want, then you don’t need anything “structured” like factoring or discrete log: in principle, any one-way function is known to suffice. And a sufficiently “scrambling” cellular automaton seems like it should be able to give you, not merely a one-way function, but a pretty good source of pseudoentropy directly (i.e., without needing to apply some complicated reduction to get the pseudoentropy out).

41. Sandro Says:

This part explains the basics of Kolmogorov complexity and algorithmic information theory: how, under reasonable assumptions, these ideas can be used in principle to “certify” that a string of numbers was really produced randomly—something that one might’ve imagined impossible a priori. Unfortunately, the article also explains why this fact is of limited use in practice: because Kolmogorov complexity is uncomputable!

Given Kolmogorov complexity is incomputable, could there exist some deterministic algorithm that could pass all possible statistical randomness tests, in principle?

42. fred Says:

Scott #40
From what I’ve seen, it’s not that it’s necessarily hard to come up with better sources of pseudo-randomness (cellular automata and whatnot); the difficulty is that the method has to be practical, i.e. fast and efficient enough to be embedded on cell-phone chips, etc.

43. Rahul Says:

“Given Kolmogorov complexity is incomputable, could there exist some deterministic algorithm that could pass all possible statistical randomness tests, in principle?”

Pi() would, right?

44. Scott Says:

Sandro #41: No, the obvious problem is that, as soon as you specify which deterministic algorithm A you’re using, there’s always at least one statistical test that distinguishes A’s output from random. Namely, the test that simply checks whether or not your string is the deterministic output of A! 🙂 This, in particular, kills Rahul #43’s suggestion of using the digits of π.

What’s not ruled out by this argument, and what we think you can actually get, is a deterministic scheme for randomness expansion. I.e., for starting out with a 500-bit truly random seed s (which maybe you obtained quantum-mechanically, or from thermal noise, or the weather, or whatever), and then deterministically expanding s into a million-bit random string f(s), in such a way that the only way to distinguish f(s) from a uniformly-random million-bit string, is essentially to loop through all 2^500 possibilities for s. Or more generally, to expand an n-bit random seed into as many pseudorandom bits as you want—say, n^10 of them—in such a way that distinguishing the result from random requires exp(n) time. This is exactly what CPRG’s (cryptographic pseudorandom generators) accomplish, under plausible complexity assumptions.
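A minimal Python sketch of the expansion step, using SHA-256 in counter mode (a standard-flavored construction chosen purely for illustration; it is not one of the provably-reducible CPRGs, and treating SHA-256 as pseudorandom is itself a heuristic assumption):

```python
import hashlib

def expand(seed: bytes, num_bits: int) -> str:
    """Deterministically stretch a short truly-random seed into a long
    pseudorandom bit string, hash-in-counter-mode style (sketch only)."""
    chunks = []
    counter = 0
    while 256 * len(chunks) < num_bits:          # each SHA-256 digest gives 256 bits
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        chunks.append(digest)
        counter += 1
    bits = "".join(format(b, "08b") for b in b"".join(chunks))
    return bits[:num_bits]

seed = b"\x01" * 63       # stand-in for a ~500-bit truly random seed
stream = expand(seed, 10_000)
print(stream[:64])
```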

45. Nick M Says:

If Kolmogorov complexity is uncomputable, isn’t Omega uncomputable too? But (an) Omega was computed to 64 digits by Calude, Dinneen and Shu back in 2001 (admittedly using a lot of tricks), and the result has Greg Chaitin’s endorsement.

46. Sandro Says:

Scott #44:

in such a way that distinguishing the result from random requires exp(n) time.

This is closer to what I was getting at: is it possible that some deterministic algorithm performing randomness expansion could pass any computable test, regardless of its complexity. Certainly the complexity criterion of taking exp(n) makes for “practical/good enough” CPRG, but I’m wondering if there’s some achievable ideal, in principle.

I’m assuming a randomness test is simply given a bit-string source that it can query to arbitrary length. This is basically induction on bit-string sources, i.e. Solomonoff induction could eventually reproduce the generating function, but it depends on incomputable Kolmogorov complexity, so this seems to leave open the possibility that *some* algorithm could not be discovered by *any* induction.

47. William Hird Says:

Fred#42:
Fred, many cellular-automata functions can be implemented using shift registers and simple logic gates: these are trivial to implement in hardware, and as we all know, there seems to be no end to man’s ingenuity to cram more logic circuitry into smaller spaces (and draw less current too!).

48. Sam Hopkins Says:

Scott #34: I’m a little confused about how the time-complexity of the TM that produces a pseudorandom number should factor into how “pseudorandom” it is. Presumably we want our pseudorandom generators to be fast. It’s the “inverse TM” that we care about being slow, right?

49. Scott Says:

Nick #45: Yes, Ω is uncomputable—I wouldn’t say “because” Kolmogorov complexity is uncomputable (a different proof is needed), but it is. And yes, for a suitable choice of programming language, it’s possible to compute a finite prefix of Ω, maybe even a long one. (For example, if I had a language where it took more than 1000 bits to write a halting program, then I could trivially know that the first thousand digits of Ω were all 0’s!) Likewise, with enough cleverness, it’s possible to learn the values of K(x) for various particular strings x, at least when K(x) is sufficiently small.

There’s no contradiction whatsoever here with the general unsolvability of these problems—any more than there is between the MRDP theorem (showing that solvability of Diophantine equations is uncomputable in general), and the fact that Andrew Wiles managed to prove Fermat’s Last Theorem.

50. Scott Says:

Sandro #46: Well, if you have a deterministic function f that takes an input seed x of size n, and that expands x to a longer pseudorandom output f(x) via a computation taking T(n) time, then it’s always going to be possible to “reverse-engineer” f (i.e., to find a preimage x mapping to a given f(x)) in 2nT(n) time, by simply trying all possibilities. In that sense, what you’re asking for is impossible.
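The 2^n·T(n) brute-force inversion, on a toy scale where n = 16 (illustrative only; the function f and the hidden seed here are made up for the example):

```python
import hashlib

N_BITS = 16  # toy seed length: 2^16 trials is instant, 2^128 is not

def f(x: int) -> bytes:
    """Some efficiently computable deterministic expansion function."""
    return hashlib.sha256(x.to_bytes(4, "big")).digest()

target = f(0xBEEF)  # the output of a "hidden" seed

# Reverse-engineer a preimage by simply trying all 2^n possibilities,
# at cost 2^n times the cost of one evaluation of f.
found = next(x for x in range(2 ** N_BITS) if f(x) == target)
assert f(found) == target
print(f"recovered seed: {found:#x}")
```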

51. Scott Says:

Fred #42: Yes, as William #47 says, cellular-automaton-based CPRG schemes also tend to have the advantage of being more efficient than the number-theoretic schemes. (Though there’s nothing magical about cellular automata here: any “sufficiently scrambling” operation on bits is likely to work just as well.)

52. Scott Says:

Sam #48: Generally, given a seed of length n, you want the forward-computation of your CPRG to be “efficient” (i.e., to be doable in poly(n) time), whereas inversion should require exp(n) time.

53. Sam Hopkins Says:

One more question: do you know of any connection, formal or mere analogy, between randomness defined in this computational sense, and “quasirandomness”/”pseudo-randomness” as studied in additive combinatorics? Looking from far outside, it seems one of the big takeaways from that field is that an object will either behave nearly like a “random” one, or will nearly have a rigid structure.

54. Michael Dixon Says:

Scott,

I’ve recently been trying to discover ways of applying notions from cryptographic protocols to situations dealing with non-linear time and time travel. I’m a bit bothered by how many conflicts in Star Trek-like shows could be avoided (or enhanced) with clever applications of cryptography and secrecy of information.

For example, what kind of cryptographic primitives would be needed to help distinguish a dishonest time traveler from their “past/present” counterparts? This might seem impossible at first, since the time traveler can play dumb. But imagine that we forced the two to play an information game against each other (their aims being to prove that the other is the time traveler). In addition, let us have the ability to put them to sleep and wipe their memory, similar to the situation posed in the Sleeping Beauty problem. Can we distinguish one from the other?

In trying to solve this, one of the biggest hurdles I’ve encountered concerns how the relationship between randomness and time is established. The most familiar (classical) models of randomness, such as Martin-Lof’s, are definitely not “exotic” enough to work with strange variants of temporal logic. What other models of randomness might you suggest (given they exist) I look into instead?

55. Rahul Says:

I had a question about the CSPRNG vs. PRNG distinction: Wikipedia seems to say that most PRNGs will fail even the next-bit test, i.e., given the sequence so far, the next bit is predictable with more than 50% probability.

How does such an attack work? Say, for the Mersenne Twister or an LCG, how does one reverse-engineer them to predict the next bit?
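[As an illustrative aside, not from the thread: for a plain LCG the “attack” is trivial, because each output reveals the generator’s entire internal state. A minimal sketch, assuming the attacker knows the (public) parameters:]

```python
# Sketch: a linear congruential generator (LCG) with publicly known
# parameters leaks its full internal state in every output, so one
# observed value determines the entire future stream.
M, A, C = 2**31, 1103515245, 12345  # glibc-style constants

def lcg(seed):
    state = seed
    while True:
        state = (A * state + C) % M
        yield state

gen = lcg(42)
observed = next(gen)                # attacker sees a single output...
predicted = (A * observed + C) % M  # ...and replays the public recurrence
assert predicted == next(gen)       # the next "random" value, known in advance
```

(The Mersenne Twister needs more observations, 624 consecutive outputs, before its state can be reconstructed, but the principle is the same: the outputs are the state.)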

56. Scott Says:

Sam #53: Good question! The type of pseudorandomness that arises in things like the Green-Tao theorem (or the Riemann Hypothesis, for that matter) is typically much, much weaker than the type of pseudorandomness sought in cryptography. For the latter, you need pseudorandomness against completely-arbitrary polynomial-time distinguishers, whereas for the former, you “merely” need pseudorandomness against whatever specific regularities would make your theorem false if they were present.

The flip side, of course, is that in cryptography you have the freedom to design your PRG however you like—the more structureless, the better—whereas in math, you’re trying to prove pseudorandom behavior in particular, “structured” objects (such as the set of prime numbers).

Having said that, for many applications in theoretical computer science (e.g., Valiant-Vazirani, approximate counting, error-correcting codes, extractors and condensers), we too only need the weaker kinds of pseudorandomness—the kinds that can be shown to exist unconditionally using present tools! And largely because of that, there’s been an extremely rich interaction over the last decade between CS theorists and mathematicians about pseudorandomness (especially regarding additive combinatorics, and its applications to randomness extractors). Avi Wigderson, Terry Tao, and Timothy Gowers have all written eloquently about this story, and I’m not going to do it justice in a blog comment.

57. Jon Lennox Says:

Michael @54: Given Scott’s proof that $$P^{CTC} = PSPACE$$, does a universe with time travel in fact have any cryptographic primitives at all?

58. William Hird Says:

Mike Dixon #54
Hi Mike, just out of curiosity, what is (are) the motivation(s) for your question: are you a sci-fi writer looking for plausible story lines, or is the question motivated by pure scientific inquiry? If the latter, there are several problems with some of the concepts you are alluding to, like “time travellers”; we don’t even know if such a phenomenon is possible (is it even “allowed” by the laws of quantum mechanics as we know them)? What do you mean by “nonlinear time”? Can you relate this notion to some known aspect of general relativity?

59. Richard Says:

A lovely article. I have a question about history. When you wrote “Kolmogorov-Solomonoff-Chaitin complexity”, does that reflect the historical order of their discoveries?

60. Scott Says:

Richard #59: No, according to Wikipedia it was first Solomonoff in 1960/1964, then Kolmogorov in 1965, then Chaitin in 1966/1968. But besides their being independent, they also had different concerns—e.g., Solomonoff was much more focused on inductive inference than on the K(x) function as such.

61. Michael Dixon Says:

@57: Yes and no. It depends on what your standards for cryptographic primitives are. If you say that computationally feasible = polynomial time and expect that any implementation of a primitive has to be based on computational (in)feasibility, then probably not. Though this is not always the case: for instance, others have considered using NC^0 or logspace instead.

However, from a pure logic setting, you can abstract away the computational infeasibility aspect entirely. For instance, I can just assume the existence of a perfectly secure Enc(k,m). I don’t have to worry about implementing its security using a computationally hard problem. I only care about how I can preserve security with its usage. This is what happens in BAN logic and CPL.

@58: For me it is just a theoretical or philosophical inquiry. Even given the silly premises of these sci-fi scenarios, can ideas from cryptography and information security help us? I think the answer is an obvious yes.

Recently, I researched various logics for cryptographic protocols. I noticed that they indirectly assume a lot about the nature of information/randomness. Typical protocol models admit a close connection between randomness and time. Algorithmic randomness is typically understood using sequential forms of computation. Randomness defined via unpredictability requires that past “events” not yield information about a future event (such as the output of a random number generator). Steps for executing protocols are done sequentially (sometimes in parallel). All of these, though, assume that there is a linear order on states/events. I want to know if we can generalize or spice up these concepts to handle more elaborate situations.

For further clarification, I’m only considering the basic logic surrounding the problem. I am ignoring the laws of modern physics for the moment. So instead of thinking about the possibilities permissible by quantum mechanics, I am thinking about what is possible under variations of LTL (linear temporal logic). By “nonlinear time”, I am referring to temporal logics that assume a non-sequential structure to time. I am not suggesting that such a thing is sensible in the real world. It is just a thought experiment.

62. Dani Phye Says:

Scott, sorry this is slightly off topic (loved the article by the way), but I was reading your lecture notes for Quantum Computing Since Democritus, and I got to Lecture 6: P, NP, and Friends.

In it you state that PSPACE is contained within EXP, because “A machine with n^k bits of memory can only go through 2^(n^k) different configurations, before it either halts or else gets stuck in an infinite loop.”

Does this take into account (assuming the standard multi-tape Turing machine with a working string, read-only input, and write-only output) the fact that the head could be in any of n^k positions, with m possible internal states, for a total of n^k*m*2^(n^k) possible configurations? Or are you just assuming they are asymptotically close?

63. Scott Says:

Dani #62: Sorry if it was unclear! In that passage, I was simply defining the memory to consist of everything needed to describe the machine’s current state—so if it’s a Turing machine, then the tape head position, the internal state, the works. But I invite you to check that, in any case, including that stuff adds only log(n^k)+O(1) bits, so it has no effect whatsoever on the asymptotics.
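[A quick numeric sanity check of the claim above, with illustrative values of n, k, and m chosen by this editor:]

```python
import math

# Counting Turing machine configurations: tape contents (2^(n^k)) times
# head position (n^k) times internal state (m). Taking log2 shows the
# extra factors add only k*log2(n) + log2(m) bits on top of the n^k tape
# bits, leaving the asymptotics untouched.
n, k, m = 100, 2, 10
tape_bits = n ** k                               # 10,000 bits for the tape
extra_bits = math.log2(n ** k) + math.log2(m)    # head position + state
print(round(extra_bits, 2))  # ~16.61 extra bits on top of 10,000
```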

64. Dani Phye Says:

Scott 63: I see, thanks. I’m assuming the proof would be to simply take my n^k*m*2^(n^k) possible configurations, and have the tape itself represent the binary number for one of those, in which case the length would be log(n^k*m*2^(n^k)) = log(n^k)+log(m)+n^k,
= log(n^k) + O(1) + n^k,
though I’m not sure how that would be realized in practice on a Turing machine.

65. Scott Says:

Dani #64: Just forget about Turing machines, and think about writing the simulation in your favorite programming language. (And even there, just enough to convince yourself that such a simulation could be written.) Because of Turing-equivalence, the low-level details really don’t matter.

66. Dani Phye Says:

Scott #65: Well I always assumed Turing machines were nice because of their simplicity (when writing proofs), but they are horrendous to program in, you’re right. I suppose our computers generally do such a simulation too (storing program state in memory, and not really having much of an internal state if you idealize away the cache and such) already, so it’s “straightforward” in that you don’t have to do anything. Again, thanks!

67. Sandro Says:

Scott #50:
How does this work exactly? The set of total functions is not recursively enumerable, so without knowing the randomization function or its input, how would you design a preimage attack?

68. Scott Says:

Sandro #67: In this game, you always assume that the adversary knows the function f—i.e., you’re not trying to do “security through obscurity.” Everything that the adversary doesn’t know is encapsulated in the input x to f. Indeed, if the adversary didn’t know f, then it would be totally unreasonable to call f a “deterministic PRG,” since all the randomness you wanted could simply be shoehorned into the choice of f! 🙂 Now, given that the adversary knows f, all she has to do is loop over all 2^n possible inputs x, and evaluate f(x) for each.
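[The “loop over all 2^n inputs” is literally a brute-force search. A toy sketch (the function f here is this editor’s stand-in for illustration, not a real PRG):]

```python
from itertools import product

# Brute-force inversion of any *known* function f on n-bit seeds:
# enumerate all 2^n inputs. Feasible for toy n, hopeless for n ~ 128.
def invert(f, n, target):
    for bits in product((0, 1), repeat=n):
        if f(bits) == target:
            return bits
    return None

# Toy f (an assumption for illustration): parity of the seed bits.
f = lambda bits: sum(bits) % 2
print(invert(f, 4, 1))  # first 4-bit seed with odd parity: (0, 0, 0, 1)
```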

69. Shmi Nux Says:

Scott, some three years ago I ventured to make a rather pedestrian estimate of the upper bound for the complexity of QM, partly in response to Yudkowsky’s (and others’) claims that the MWI version is somehow “simpler” than the one with an explicit Born rule (if one defines simplicity as Occam’s razor expressed by measuring the Kolmogorov complexity of a given model). It seems that the preference for MWI currently cannot be based on Occam’s razor so expressed. Did I make any obvious blunders?

70. A. Certain Says:

I’m confused about one of the points in the article. You write, “When should you suspect a coin of being crooked? When a sequence x of flips of the coin satisfies K(x) « |x|.”

If you had a coin that was weighted to produce heads 75% of the time, I think most people would think of that coin as “crooked,” but (at least from my understanding of the article and the underlying math), the sequence would still have the property that K(x) = O(|x|).

I’m assuming that the problem is that you used the word “crooked” to mean not random, as opposed to not “fair.” Is that correct?

71. Scott Says:

A. Certain: No, if the coin was weighted to produce heads 75% of the time, then you can calculate from Shannon’s entropy formula that K(x)≈0.811|x| with overwhelming probability, and that’s certainly small enough compared to |x| to make it clear that x wasn’t uniformly random.
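[For reference, the 0.811 figure is just the binary entropy of p = 3/4; a standard calculation, sketched here:]

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits per flip, of a coin with heads-probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A 75%-heads coin yields about 0.811 bits of entropy per flip, so a long
# sequence x of its flips has K(x) ~ 0.811|x| with overwhelming probability,
# detectably below the ~|x| of a fair coin.
print(round(binary_entropy(0.75), 3))  # 0.811
```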

72. Scott Says:

Shmi Nux #69: I’m not sure that trying to estimate the Kolmogorov complexity of existing physical theories is useful—not merely because K(x) is uncomputable, but because even if you could compute it, the answers would depend severely on the choice of programming language, and even more, on what sorts of things you want to use the theory to calculate (e.g., how richly are you allowed to specify the initial data? how much is actually done by the program itself, and how much is ‘offloaded’ to whoever supplies input to the program?).

So I don’t think you can do better, at present, than relying on the simplicity-judgments of people who really understand the theories in question. Even there, though, you find an interesting effect, where the better someone understands a theory, the less complex it will seem to that person. For example, if you ask a general relativist, the defining equation of GR couldn’t possibly be simpler: it’s just G=8πT. Of course, if you’re novice enough to need definitions of G and T—well, the definitions fill a whole page if you write them directly in terms of the metric tensor, less if you use intermediate concepts. And if you need a definition of the metric tensor too … you see the point. GR might have a somewhat-large “Kolmogorov complexity” if you’re starting completely from scratch, but it has tiny Kolmogorov complexity if you’ve loaded enough mathematical intuition into your linking libraries.

73. domenico Says:

I am thinking that the difference between a random number and a calculated number can be obtained from the evaluation of the Hausdorff dimension of the graphical representation of the numerical sequence, taken in groups of digits: for example d_1={x_1,x_2,x_3}, d_2={x_4,x_5,x_6}.
I am thinking that each numerical calculus is a function on the digits, and each function is a subspace of the graphical representation, and a true random number has no simple numerical calculus that obtains its digits: the random number could have dimension D, so that it fills all the space uniformly, and a calculated number could have dimension D-α.

74. Neal Kelly Says:

Hi Scott,

I was just reading an arstechnica article on Stanford’s password policy and it got me wondering about how that guideline fits in with Kolmogorov complexity.

Since words are drawn from language, and thus vastly more likely to contain certain letters over others, I’m assuming the reduced Shannon entropy of such a random-word password would satisfy K(x) « |x|.

I feel like I should know this from reading recent posts, but would the Kolmogorov complexity of such a password, after it’s hashed and stored on a server, still be significantly less than |x|, and does that matter for cryptographic security?

I assume if we take a string with complexity K(x), and send it through some hashing algorithm, the complexity of the output is at most K(x)+c, where c is related to the complexity of the hashing algorithm. It seems to me that this upper bound essentially follows from the definition of K(x).

Then, since K(x) isn’t computable, a hacker couldn’t easily identify whether an individual password was generated from a word-phrase instead of random letters, but could they, for example, run all the hashed passwords in a database through gzip, and then see which passwords were most efficiently compressed, and target those in particular?

75. Scott Says:

Neal #74: Yes, everything you write looks correct. You can’t get blood from a stone, and you can’t get entropy (or algorithmic randomness, which is interchangeable for this discussion) from nothing. So, if you actually care about security, then I recommend choosing a password with lots of numerals, punctuation marks, letter sequences that mean something only to you, etc. etc. just like all those websites tell you to do. Of course, in the more common case that you don’t care about security, using your birthday or a dictionary word is fine. 🙂
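[One point worth demonstrating concretely (this editor’s sketch): although the hash of a dictionary word has low Kolmogorov complexity in principle, a practical compressor like zlib exploits only statistical regularities, so it cannot detect that low complexity in the digest:]

```python
import hashlib, zlib

# SHA-256 of a low-entropy input still *looks* incompressible to a
# statistical compressor: the digest's low Kolmogorov complexity (it is
# determined by the short program "hash this dictionary word") is
# invisible to zlib, which only finds repeated patterns.
digest = hashlib.sha256(b"password").digest()   # 32 bytes, low K(x)
compressed = zlib.compress(digest, 9)
print(len(digest), len(compressed))  # zlib overhead makes it *longer*
```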

76. Neal Kelly Says:

And regarding the advice to pick 4 words, like in the xkcd strip, does that introduce an actual, exploitable attack?

77. Scott Says:

Neal #76: A nonsensical concatenation of words is a fine choice. It takes longer to type (especially on a mobile device), but it might be easier to remember, and could easily have at least as much entropy as a random concatenation of 6-8 alphanumeric characters. Or, of course, you could just abbreviate the words, or replace them with their first initials, to get the best of both approaches.

78. Douglas Knight Says:

Everyone should use passwords consisting only of lowercase letters. Other rules are enforced by idiots who don’t compute entropy, as at Stanford. Using all the numbers and symbols on my keyboard gets 6.6 bits per character, compared to 4.7 for just the lowercase alphabet. If 8 characters drawn from the large set is OK, that’s 52 bits, which you can get in 12 characters drawn from the small alphabet. Yet Stanford requires 20 characters and then demands our gratitude for their bullshit generosity. And that 8-character password is only as good as the 12-character a-z password if every character has an equal chance of being uppercase, of being a number or a symbol. Is that true? Of course not: people add just enough symbols to get by the gatekeeper, which is worth hardly anything.

When you change your google password, it does a good job kibitzing on the strength. I find the update on every additional character enlightening.
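[Douglas’s arithmetic checks out; here it is spelled out as a quick sketch:]

```python
import math

# Bits of entropy per character = log2(alphabet size).
full_keyboard = 95   # printable ASCII: letters, digits, symbols
lowercase = 26

bits_full = math.log2(full_keyboard)    # ~6.6 bits/char
bits_lower = math.log2(lowercase)       # ~4.7 bits/char
# 8 chars from the full set ~ 52 bits; matching that with a-z alone:
chars_needed = math.ceil(8 * bits_full / bits_lower)
print(round(bits_full, 1), round(bits_lower, 1), chars_needed)
```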

79. Sandro Says:

Scott #68

Sandro #67: In this game, you always assume that the adversary knows the function f—i.e., you’re not trying to do “security through obscurity.”

This makes sense for a conservative security analysis, but I’m after a different question. I’m trying to figure out what we can ascertain automatically by induction on observable properties, in principle.

As a specific example, can we ascertain whether quantum mechanics is truly indeterministic by observing enough quantum randomness? de Broglie-Bohm is a deterministic interpretation of QM, so clearly there is at least one possible theory that could generate quantum randomness indefinitely. So could there exist some randomness test that might one day determine whether quantum randomness was truly indeterministic, and thus decide between deterministic and indeterministic interpretations of QM?

80. Scott Says:

Sandro #79: Well, it’s not just “conservative security analysis”—it’s that part of what it means for f to be deterministic is that the would-be predictor knows it. Otherwise, you might as well just let f introduce new random bits and thereby solve the entire problem by fiat!

Regarding your new question: that’s what Part II of my article (coming soon) will be all about! But briefly, in a certain philosophical sense, you can obviously never rule out determinism. For someone could always maintain that, no matter how random events look, everything that’s ever happened or will happen is deterministically foretold in “God’s unknowable encyclopedia.” You can never rule that out—and in fact, it’s not that different from de Broglie-Bohm, which also posits an underlying determinism, but one that we can’t “see” by making measurements even in principle.

Even there, though, quantum mechanics (specifically, the Bell inequality and its more modern refinements) lets us say something nontrivial and surprising: namely, that there can’t possibly be any way to “cash out” this hypothetical determinism into actual predictions of quantum measurement outcomes, unless you can also exploit the determinism to send signals faster than light. And furthermore, even if you only want to believe that the determinism “exists” at an unmeasurable level, you then also need to believe in a preferred reference frame, which goes against the spirit of special relativity.

81. BlueFive Says:

Your use of the phrase “base 2 numbers,” while perhaps technically defensible, is confusing and needlessly distracts the reader.

First off, you’re using “base 2” as a compound modifier, but you haven’t inserted a hyphen. Second, you use the plural “numbers,” which set this reader off on a search for other uses of 2 as a base, a search which was not satisfied until ten paragraphs later. And third, by far the most common use of the term “base 2” is in describing the base-2 number system, not as a description of an exponential expression. And in an article that includes quite a few strings comprised exclusively of zeroes and ones, the confusion is heightened.

Instead of writing “base 2 numbers are used because,” it would have been clearer to write “the base ‘2’ is used because” or “‘2’ is used as the base of the exponential expression because.”

Other than that, great article!

82. James Gallagher Says:

Sandro #79, Scott #80

As Scott says, about the best we can currently do with respect to ruling out deterministic hidden variables is Bell inequality tests, GHZ-state observations, and similar experiments carried out by, in particular, groups like Anton Zeilinger’s.

However, I would suggest deterministic theories can be demonstrated to be unreasonable by a very simple “demonstration” of free will: just walk around in circles, but reverse direction after every prime number of revolutions.

Now, no known deterministic theory of nature can get a macroscopic body to do that; i.e., it is incredibly unlikely under a deterministic interpretation. It’s never been observed elsewhere in the universe, for example, and if it were observed, we would have no deterministic explanation for it; we would surely think we had discovered a signal of intelligent free will operating elsewhere in the universe.

83. Scott Says:

James #82: Sorry, but such an experiment tells us nothing about free will, or even about indeterminism. And I say that as someone who’s gone on record taking “free will” WAY more seriously than most scientists! 😉 In particular, the results are perfectly consistent with the possibility that we’re deterministic automata that evolved by natural selection to have large brains capable of Turing-universal computation, so in particular, of generating the sequence of primes. No one denies that we’re complicated (and sometimes even intelligent)—the question is just whether we’re also “indeterministic” (and if so, in which senses)!

84. James Gallagher Says:

Scott #83

That’s why I was careful to use the word unreasonable!

I’m very aware that the full machinery of deterministic automata can be brought to the table, but you still have such a hard job of getting a macroscopic object to do prime-number-based dynamics that it becomes close to believing in gods or other kinds of incredible supernatural influences, rather than just the simple idea of genuine free-will.

As long as we don’t allow free-will to do magic (something not allowed by physics) I think it’s an OK idea – so we might speculate that free-will enables us to “load the dice” for a collapse eigenstate – somehow evolution discovered this and it’s an incredibly complex emergent phenomenon that we don’t currently understand.

BUT, I think I’ve gone a little off-topic 🙂

85. James Gallagher Says:

I meant to say something like:

…you have such a hard job of getting a macroscopic object to do prime number based dynamics spontaneously along with all the usual observed behaviour…

86. Scott Says:

James: Most scientists would say, “sure it’s hard, and that’s why you needed a mechanism as powerful as natural selection, plus four billion years, to do it!” The argument you’re making seems to me to venture well beyond the questions of free will and indeterminism, and into the realm of intelligent design (i.e., it seems tantamount to denying that natural selection can produce the sorts of complex adaptive behaviors that we see on earth). Or if not, then I don’t understand what blocks that implication.

87. James Gallagher Says:

Scott #86

The emergence of free-will is surely a phase change beyond simple evolutionary adaptation to the environment.

But I’m going to sound hopelessly imprecise and pretentious trying to talk about this, like an ancient Greek talking about fire or lodestones.

So, whereof one cannot speak, thereof one must be silent

For now…

88. Itai Says:

Scott,
Would you consider a probability distribution that has no second moment (infinite variance) but does have a first moment,
or a distribution that has no second moment and whose first moment is not defined by Lebesgue integration (so the conditions of the strong law of large numbers will not hold),
to be a higher kind of randomness in probability theory?
Would you call it “Knightian uncertainty”?
I can give you examples of such distributions if you are not familiar with them.

89. Scott Says:

Itai #88: No. If the distribution can be specified in advance, then I wouldn’t call it “Knightian uncertainty”; the moments are irrelevant.

90. Itai Says:

“Knightian uncertainty” sounds to me much the same as Donald Rumsfeld’s “unknown unknowns”:
http://en.wikipedia.org/wiki/There_are_known_knowns
known knowns – can be described as deterministic
known unknowns – can be described as stochastic
unknown unknowns – not any of the above.

So, would you call such distributions more random than others?
Such distributions have infinite range, and not an exponentially small chance of taking extreme values (unlike most well-known infinite-range distributions: the geometric, normal, or exponential).
I wonder whether such distributions are possible in physics, and if not, what prevents them from appearing there (the strong law of large numbers would not hold, and the standard deviation could make trouble with the uncertainty principle).

91. Scott Says:

Itai #90: Precisely because these distributions need to have infinite range, I question their physical relevance. In real physics, you basically always have a finite cutoff for anything you’re calculating—given, if nothing else, by the Planck scale. Indeed, when there’s not a known finite cutoff (as in unrenormalized QFT), you typically have to impose a cutoff, to avoid getting nonsensical infinite answers for measurable quantities!

92. Itai Says:

Scott #91
I heard the Cauchy/Lorentzian distribution is used in physics, as is the Lévy distribution,
though I’m not sure whether it is for measurable values.
Such distributions have physically feasible values.
If those distributions are not physical (I’m not sure there is a physical argument for it),
then we should demand, in addition to normalization, that:
1. integral x^2 |psi|^2 < inf
2. integral |x| |psi|^2 < inf
Nobody demands it, and I don’t know of any real wave function psi that violates these conditions (but maybe it’s possible, who knows).
The standard infinite-range distributions (normal, geometric, exponential) can theoretically take extreme values, but I guess that will not happen in the life of the universe.

93. Ben Standeven Says:

James Gallagher #87

Why couldn’t free will be produced by natural selection? Being unpredictable seems pretty adaptive to me…

Or did you mean that it couldn’t be produced by a mutation? But then it does sound like a supernatural process.

94. Jerry Says:

re: Kate Becker’s article.

…Quantum cryptography is already being used commercially for some bank transfers and other highly secure transmissions…

How can quantum cryptography be employed by a classical (i.e. bank’s) computer? If a few lines of code that contain exclusive quantum gates live within a classical program that outputs classical information (as it must) in a way that provides the speed and encryption benefits of quantum computation, have we reached a milestone?

95. fred Says:

How does the Kolmogorov complexity evolve as we start adding extra bits to the object?
E.g. if we concatenate two strings s1 and s2, the total K-complexity is bounded by
min(K(s1),K(s2)) <= K(s1+s2) <= K(s1)+K(s2)
The expansion of PI has the same complexity regardless of the number of digits. And adding more bits always implies that either the complexity stays constant or increases, but can never decrease, right?

Also, instead of concatenation, what if we superpose two strings, e.g. we take the expansion of PI, and then a random string 010001100010 (like a Poisson process) and we add them
314159265358
+010001100010
=324150365368
Seems like the same relation still holds. The type of addition (or any operation) we do with the two strings can be described by a small constant in terms of K-complexity, so it's pretty much irrelevant.

96. James Gallagher Says:

Ben Standeven #93

Well, obviously I don’t know (!), but yes, we can speculate that (macroscopic) unpredictability itself is evolutionarily beneficial – and then we have an insanely complex analysis of how different species’ adaptation to varying levels and types of unpredictable behaviour by rival species might have resulted in what we consider to be “free-will” behaviour today.

I prefer however to believe that evolution solved the interpretation of QM debate, maybe even before the era of prokaryotic cells 🙂

97. Serge Says:

Congratulations Scott on a great article about Kolmogorov complexity! I’d like to know your opinion about a few questions that keep puzzling me.

1) Is K(S) related in any way to the probability for a bit sequence S of complexity K(S) to be output by a computer?

2) More generally, is it true that the objects which actually exist in our universe must be of low Kolmogorov complexity? In that case, it would yield a measure quite analogous to the probability of observing a particle in quantum mechanics.

3) Since K(S) is uncomputable, is it possible to construct different consistent models of computer science by simply assigning different Kolmogorov complexities to their respective bit sequences? Thus, some exotic models could have different properties than ours. In particular, some algorithms might have a low probability of being output by our brain in one universe – that is, a high probability of being found – but a high such probability in another one. What do you think of this hypothesis?

98. Serge Says:

Of course, I meant “some algorithms might have a low probability of being output by our brain in one universe – that is, a *low* probability of being found – but a high such probability in another universe.”

The global idea being that some of the undecidabilities encountered in complexity theory come from the uncomputability of Kolmogorov complexity, which in turn might get decided by physics.

99. Scott Says:

Serge #97:

1) Sure, if you picked a computer program randomly from the universal distribution and ran it, then the probability that the program’s output would be x would go roughly like 2^(-K(x)). (The universal distribution is the one where every prefix-free program of length n occurs with probability 1/2^n.)

2) Well, if you accept that quantum measurement outcomes are really random (as QM says they are, and as the Bell inequality violation makes hard to deny), then a sequence of n independent 50/50 quantum measurement outcomes should have Kolmogorov complexity close to n, with overwhelming probability. So, that would be an obvious counterexample to everything in Nature having low Kolmogorov complexity. However,

a) it would still be reasonable to suppose that everything in Nature could be simulated by a short computer program with access to a random-number generator, and algorithmic information theory lets you formalize that statement (this is what the latter part of my article was about).
b) If you believed the Many-Worlds Interpretation, you could dispense even with the need for the random-number generator, by simply saying that the computer program should output all the branches of the wavefunction (with the appropriate amplitudes), since the branches are all equally real!

In any case, once you take away these quantum-mechanical complications, the statement that “our universe has low Kolmogorov complexity” is almost (if not quite) equivalent to the statement “the laws of physics are simple and universal,” which of course has been an extremely successful guiding principle since the 1600s.

Oh, one other point that I should mention: even if the universe as a whole has low Kolmogorov complexity, that still wouldn’t prevent specific parts of the universe from having high Kolmogorov complexity! To see how that’s possible, consider that you can have an extremely short computer program that outputs every possible string in lexicographic order—but some of the individual strings output by that program can only be output by long programs, since the programs have to specify the entire string. (Indeed, the previous discussion about the entire quantum wavefunction having low Kolmogorov complexity, even while individual branches of it have high Kolmogorov complexity, was an example of the same sort of thing.)
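[The “extremely short computer program that outputs every possible string” is easy to exhibit; a minimal sketch:]

```python
from itertools import count, product

# A few-line program whose output eventually includes *every* binary
# string, even though most individual strings it emits have Kolmogorov
# complexity close to their own length.
def all_strings():
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

gen = all_strings()
print([next(gen) for _ in range(6)])  # ['0', '1', '00', '01', '10', '11']
```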

3) Yes, you can use Gödel’s Theorem to show that, given any formal system F of your choice, there will exist strings x (in fact, almost all strings!), such that the exact value of K(x) is independent of F. And that means, in particular, that it must be possible to construct “nonstandard models” of F, in which K(x) is assigned a lower value than it really has. (By contrast, if F is consistent, then there are no models of F in which K(x) is assigned a higher value than its actual value—do you see why?)

In any case, as I explain in Chapter 3 of my Democritus book, I personally think that “nonstandard models of arithmetic” are much better thought of as artifacts of Gödel’s theorems (the completeness and incompleteness theorems) than as “alternate realities.” After all, there is a shortest computer program that outputs a given string x; it’s just that particular formal systems might not be powerful enough to prove it.

100. Rahul Says:

I found some parts of that Kate Becker article a bit puzzling, almost annoying. e.g.

Could our universe, in all its richness and diversity, really be just a bunch of bits?

What does that mean? Is that the Nick Bostrom version of the-universe-is-a-simulation-and-we-are-agents-in-it?

Sure, your model of the universe may describe it as “just a bunch of bits,” but does that really make it so? What does it even mean to say that the universe is “really” X and not Y?

101. Scott Says:

Rahul #100: Well, yes, that’s what I was trying to say! And at least Kate was nice enough to quote my objections within the article itself. 🙂

102. Rahul Says:

Another bit I found intriguing was this quote by Vlatko Vedral in that article

The rules of quantum information provide the most “compact” description of physics

Is that really true? I mean, is everything else derivable from this compact set of rules, and what exactly is this set of rules?

And is the converse true: can all the rules of quantum information be derived from “laws that govern matter and energy”?

103. David Brown Says:

“… Bell’s inequality … lets us say something nontrivial and surprising …”
According to ‘t Hooft, Bell’s theorem is likely to be false.
http://arxiv.org/abs/1207.3612 “Discreteness and Determinism in Superstrings” by Gerard ‘t Hooft, 2012
What is the best counter-argument to ‘t Hooft’s argument against Bell’s theorem?

104. Itai Says:

Scott
I know that some wave functions are not normalizable but are physically possible,
Doesn’t that imply that physical wave functions with no first/second moment also exist?
Don’t you think that suggests something problematic about the probabilistic interpretation of QM?

105. Scott Says:

David Brown #103: Err, the best counterargument is that Bell’s theorem is a theorem! 🙂 Theorems can’t be false; at best, their assumptions can be argued to be unfair or unjustified. But in ‘t Hooft’s case, the assumptions that he has to make to escape Bell’s theorem—basically, of a “cosmic conspiracy” in which our own brains, measurement devices, and random-number generators all collude with the elementary particles to make it look like the CHSH game can be won 85% of the time (but not more than that!)—are a quadrillion times crazier than what he’s trying to avoid. They’re almost like a backhanded compliment to Bell’s analysis, like a convoluted way of admitting he’s lost the argument. Anyway, I’ll say more about this in Part II of the article, so stay tuned for that.

106. Scott Says:

Itai: Can you give me an example of the type of wavefunctions you have in mind? If you’re talking about, e.g., Dirac delta functions, those can always be regarded as just convenient approximations to Gaussians with very small spread. In any case, no, I don’t think this issue suggests anything problematic in the probabilistic interpretation of QM.

107. Itai Says:

Scott
I don’t have any specific example of such a wave function;
I’m also not familiar with much beyond the hydrogen atom and the potential well.
However, I read some criticism about it in Roger Penrose’s “Road to Reality” (I think you will agree on the nature of quantum waves and states as opposed to probabilities):
“For some states, such as momentum states, |psi| diverges, so we do not get a sensible probability distribution in this way (the probability density being zero everywhere, which is reasonable for a single particle in an infinite universe).

In accordance with this probability interpretation, it is not uncommon for the wavefunction to be called a ‘probability wave’. However, I think that this is a very unsatisfactory description. In the first place, Psi(x) itself is complex, and so it certainly cannot be a probability. Moreover, the phase of Psi (up to an overall constant multiplying factor) is an essential ingredient for the Schrodinger evolution. Even regarding |psi|^2 as a ‘probability wave’ does not seem very sensible to me. Recall that for a momentum state, the modulus |Psi| of Psi is actually constant throughout the whole of spacetime. There is no information in |Psi| telling us even the direction of motion of the wave! It is the phase, alone, that gives this wave its ‘wavelike’ character. Moreover, probabilities are never negative, let alone complex. If the wavefunction were just a wave of probabilities, then there would never be any of the cancellations of destructive interference. This cancellation is a characteristic feature of quantum mechanics.”

108. Anon Says:

Completely unrelated: Scott, Sean Carroll named dropped you in an Intelligence Squared debate (although he accidentally referred to you as a physicist!). He was paraphrasing you in response to his opponent bringing up quantum mechanics in relationship to consciousness. “Quantum mechanics is confusing, consciousness is confusing, so maybe they’re the same”. http://youtu.be/lzCYva2aYCo?t=44m35s

109. Michael Says:

“(although he accidentally referred to you as a physicist!).”

I know Scott doesn’t think of this as a compliment (though he should 😉 ), but there are few who combine technical skills, common sense, and intuition better than Scott (OK, maybe David Deutsch, but very few others). 😉

110. wolfgang Says:

@Itai

The ill-behaved wave functions in quantum mechanics that you seem to worry about are eliminated by appropriate boundary and/or initial conditions.
Usually, experiments take place in a laboratory, and the natural boundary condition is that psi = 0 at the walls and outside. This eliminates all your examples.
Alternatively, if it is easier to handle in calculations, one can assume that psi falls off quickly enough at large distances.

Also, if the initial state is well behaved, unitarity guarantees that it will remain well behaved.

There can be some technical issues, e.g. with plane waves in the continuum limit, but there is never a physical issue, only the question of how best to handle the math …

However, if we leave conventional quantum mechanics and talk about the “wave function of the universe” e.g. in quantum cosmology, then there really are issues to worry about. Now psi is a function e.g. over the space of all 3-geometries and it is much less clear what boundary conditions are ‘natural’ …

Hi Scott

I read your blog occasionally and it is very interesting, although I am not sure I understand everything. What I think I understand is the issues about randomness. My understanding probably borders on the crack-pot realm, but it works for me and may be interesting for you and others. For example:

Let n be a 512-bit string. Treat n as a positive integer. Apply (3n+1)/2 if n is odd; otherwise do n/2. The result is the new n. Repeat the step above 256 times to acquire a 256-bit parity string, recording 1 when n is odd and 0 when n is even. Discard the first 128 bits and use the remaining 128 bits as a hash of n.
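The scheme above reads directly as code. Here is a minimal Python sketch (my own rendering of the comment’s recipe, not vetted cryptography; the description is ambiguous about whether the parity is recorded before or after each step, and I record it before):

```python
def collatz_hash(n: int) -> int:
    """Toy 128-bit hash from the shortcut-Collatz scheme described above.
    Iterate the map 256 times on a (nominally 512-bit) integer, record the
    parity of n before each step, discard the first 128 parity bits, and
    pack the last 128 into an integer. For illustration only."""
    bits = []
    for _ in range(256):
        bits.append(n & 1)                      # 1 if n is odd, else 0
        n = (3 * n + 1) // 2 if n & 1 else n // 2
    h = 0
    for b in bits[128:]:                        # keep the last 128 parity bits
        h = (h << 1) | b
    return h
```

The output is deterministic and always fits in 128 bits; whether it has any of the one-wayness properties claimed below is, of course, exactly what is in question.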

1. While the above algorithm is described in some sense, that does not mean a function description is there (if it is considered as a function). Without knowledge of the input, the function can be composed in 2^256 ways, and that depends entirely on the input. It is the same situation as in random-oracle theory when the input complexity is on a par with the function-description complexity. While that case guarantees non-correlation between function instances (the randomness), it is considered an impractical exercise (a table of fair coin flips, for example), whereas the above is not.

2. There is a possibility of finding some pattern / correlation between instances when the above algorithm is run. That possibility would eventually lead to a reduction of the 2^256 complexity above. In effect, that would mean the branching can be reduced by a combination of the sequence and looping algorithm structures. That runs against the structured programming theorem and branching non-reduction. Simply put, it is impossible to make a program for checking inputs of the 3n+1 problem without using a branching structure or exhaustive search.

3. From the argumentation above, the proposed hashing scheme is a one-way function, a random oracle, and an NP-complete problem (crack-pot alert). It seems (at least from my point of view) that the humble branching algorithm structure is the proverbial electric fence between P and NP. Basically, when you have the input you are in P; if you have only partial output, you no longer have the function description and you are in NP.

112. Jim Cross Says:

Scott,

Will you be commenting on this?

Is Consciousness Computable? Quantifying Integrated Information Using Algorithmic Information Theory

Phil Maguire, Philippe Moser, Rebecca Maguire, Virgil Griffith

http://arxiv.org/abs/1405.0126

113. Jay Says:

Jim #112

These authors argue that consciousness should be based on lossless integration, otherwise the memory traces would be affected each time we remember something. Problem is, this is exactly what happens.

http://learnmem.cshlp.org/content/7/2/73.full

114. Scott Says:

Jim #112: Well, Philippe Moser is a serious computability theorist. But that paper doesn’t seem relevant to me, since I don’t accept the starting assumptions of “integrated information theory” anyway, and don’t know why some other people do (maybe I’ll write a blog post about that sometime). And the last sentence of the abstract seems overblown. So no, I guess I probably won’t be commenting on this paper more than I just did.

115. James Gallagher Says:

@Itai

just in addition to wolfgang #111: it can be shown that the wavefunction decays to zero at infinity for increasing potentials, or we have a plane-wave solution at infinity (for decreasing potentials), assuming the wavefunction obeys a Schrödinger evolution equation.

This is rigorously proved under quite general conditions in (for example) The Schrodinger Equation – Berezin and Shubin, Chapter 2.4

(Unfortunately the whole section is not viewable in google books, but you could try a search for a djvu copy of the book if you don’t have access to a suitable library)

116. James Cross Says:

Thanks Jay and Scott.

“The implications of this proof are that we have to abandon either the idea that people enjoy genuinely unitary consciousness or that brain processes can be modeled computationally.”

They really do not prove that the first part is true. They start with it as an assumption.

It might be that people do not really have unitary consciousness – that it is an illusion. Actually this conforms to my own view. For example, when I drive to work I might be listening to Bach on the radio and thinking about a problem I am working on at work all the while my eyes, hands, and feet are doing everything to keep me from wrecking. Consciousness seems to involve all of that but it hardly seems unitary.

But it also still might be both parts are true or that consciousness cannot be practically modeled on any substrate other than living material.

117. Chris Says:

Scott,

You and your readers may be interested in this proposal for a hardware quantum random number generator based on a smartphone camera:

118. James Gallagher Says:

Chris #117

Interesting article. But I doubt this RNG can claim pure quantum randomness any more than Intel’s RdRand implementation on its Ivy Bridge chips.

(If someone can devise a Bell inequality violation demo using smartphones, I’ll be amazed.)

119. Jay Says:

Scott #114

Word on the street is Giulio Tononi published a paper starting with “Using string theory we demonstrated that the Higgs field should not exist, otherwise it would allow gravitational waves”. Word on the blogs is Penrose answered “Well, Tononi is a serious psychology theorist. However, I don’t accept string theory and don’t see why others do”. Hopefully, at some point someone will explain that we *do* have empirical evidence for gravitational waves.

James #116

By “unitary” it seems you meant “continuous”, whereas these authors meant “information lossless”. For experimental evidence that neither feature applies to consciousness, you might find it useful to search for “backward masking” and “reconsolidation”, respectively.

120. rrtucci Says:

BICEP is likely wrong!!! And it’s not even an Italian project!
Makes one yearn for the golden days of yore when mathematicians, not flaky physicists, ran the show.
http://resonaances.blogspot.com/2014/05/is-bicep-wrong.html

121. Michael Gogins Says:

Perhaps there is some confusion between the proposition “consciousness is unitary” and the proposition “consciousness is unified.”

Most philosophers and psychologists think that consciousness is unified. That, indeed, unity is a defining mark of consciousness. There is one field of consciousness per subject, in which the subject is aware not only of the various phenomenal objects but also of him or her self as subject, of awareness itself. Consciousness does not divide up into phenomenal consciousness, consciousness of being conscious, and so on.

This of course is consistent with backward masking or reconsolidation or loss of memories or whatever. It is also consistent with doing more than one thing at the same time, losing and gaining consciousness of subsidiary tasks, and so on. A person who is driving, talking, and listening to music is not three subjects at the same time. Rather the focus of consciousness shifts rapidly and fluidly from one task and context to another.

But it is very hard for me to see how to model this unity of consciousness as a computation or physical process. I do not see how theorizing consciousness as any kind of modeling or reflection does not lead to some sort of infinite regress.

122. Jay Says:

Michael #121

Where did you get that most psychologists think that? I’d be surprised. To the contrary, clinical observations from split-brain patient plead that each cerebral hemisphere forms:

“indeed a conscious system in its own right, perceiving, thinking, remembering, reasoning, willing, and emoting, all at a characteristically human level, and . . . both the left and the right hemisphere may be conscious simultaneously in different, even in mutually conflicting, mental experiences that run along in parallel”

123. James Cross Says:

Michael and Jay,

They implicitly define unitary consciousness in this sentence:

“According to the integrated information theory, when we think of another person as conscious we are viewing them as a completely integrated and unified information processing system, with no feasible means of disintegrating their conscious cognition into disjoint components.”

124. Jay Says:

They? The “most psychologists” they or the “serious computer theorists” they? 🙂

BTW, their account of IIT is questionable too. Tononi defines elementary modes within qualia space, whatever that means, to be “the basic experiential qualities that cannot be further decomposed”. This is at odds with the idea that conscious cognition as a whole cannot be “disintegrated into disjoint components”.

http://www.biolbull.org/content/215/3/216.full

That said, Tononi’s manifesto is still provisional and a work in progress. I’d be curious to read how he’d fit what we know from split brains into this picture.

125. Fenella Saunders Says:

Scott, what do you think of this?
http://physicsworld.com/cws/article/news/2014/may/16/how-to-make-a-quantum-random-number-generator-from-a-mobile-phone

126. Scott Says:

Fenella #125: That’s really cool! And I don’t see any reason why it wouldn’t work. Of course, it doesn’t give you the “Einstein-certified” guarantee that you get from Bell inequality violation, but it could be perfectly good enough for many cryptographic applications.

127. Yoni Says:

Hi Scott

I read the article and loved it. I did have one question though (apologies if it is in the comments above, I have not read them all).

You state that Kolmogorov complexity is uncomputable as a fact, and your proof for this is in effect that if it were computable, then the algorithm that computes it could be used to generate random numbers with higher Kolmogorov complexity than itself.

However what if the algorithm to compute Kolmogorov complexity is specific to the length of the string it is being run on? That way it could potentially exist and be of practical use for a given length string (and actually be able to be used to spit out the “most random” numbers of the given string length) but without itself having a lower complexity than the numbers it is analysing.

128. Scott Says:

Yoni #127: Yes, you raise a good point. For fixed n, there’s obviously an algorithm to compute K(x) for strings of length n only. Indeed, I invite you to prove, as an exercise, that there’s such an algorithm that takes only n+O(log n) bits to write down (but an immense amount of time to run)! And of course, there’s also an algorithm that takes ~2^n bits to write down, and is fast to run: namely, just cache K(x) for every x ∈ {0,1}^n in a gigantic lookup table! Using diagonalization arguments, one can prove that this tradeoff is inherent—so that for example, there’s no algorithm to compute K(x) for n-bit strings that takes poly(n) time to run and also poly(n) bits to write down.

In any case, the broader point is that, when computer scientists talk about “algorithms,” unless otherwise specified they almost always mean uniform algorithms: that is, algorithms that work for inputs of arbitrary size. And that’s what I meant in the article.

For, while it’s true that every problem whatsoever admits a nonuniform algorithm (even uncomputable problems), in some sense that observation simply pushes the uncomputability somewhere else: namely to the question, “OK, so given a value of n, how do I construct the algorithm that works for inputs of size n”? Clearly, there can’t be any algorithm for this meta-task—since if there were, then it would yield a (uniform) algorithm for our original problem, contrary to assumption.

129. Yoni Says:

Scott #128

Thanks for responding; I get your point now. Unfortunately I am going to have to decline your invitation (at least for the foreseeable future) as I have no idea where to even start! (I think I would probably need to go back to undergrad school to get even close).

I look forward to part II of your article, and have bookmarked your excellent blog.

Regards

130. Scott Says:

Yoni #128: OK, I’ll tell you the answer, just to show you that these things often aren’t nearly as hard as they look.

Recall, we want an algorithm that computes K(x) for every n-bit string x, and that takes only about n bits to write down. That algorithm will simply hardcode the number, call it M, of computer programs at most n bits long that halt when run on a blank input. Then, given x as input, our algorithm will start running every program at most n bits long, in parallel. And it will keep doing so until M of the programs have halted. Once that happens, it knows that the remaining programs will never halt! So then, among the M programs that halted, it just has to find the shortest one that outputs x, and that tells it K(x).
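This dovetailing procedure can be mocked up in Python if we replace real programs with synthetic (length, halting-time, output) triples and simply hand the algorithm the halting count M. Everything below is a toy model of the argument only: in reality, M itself is uncomputable (as the next comments discuss), and the names and data here are all invented for illustration.

```python
def K_with_halting_count(x, programs, M):
    """Toy model of the dovetailing argument. 'programs' is a list of
    (length, steps_to_halt, output) triples, with steps_to_halt = None
    for non-halting programs; M is the (assumed known) number of halting
    programs. Run everything in parallel, one step at a time, until M
    programs have halted -- the rest provably never will -- then return
    the length of the shortest halted program that outputs x (or None)."""
    halted = []
    t = 0
    while len(halted) < M:
        t += 1
        for length, steps, output in programs:
            if steps == t:            # this program halts at step t
                halted.append((length, output))
    return min((l for l, out in halted if out == x), default=None)

# Invented example data: three halting "programs" and one non-halting one.
programs = [
    (3, 5, "0110"),    # 3 bits long, halts after 5 steps, outputs "0110"
    (7, 2, "0110"),
    (4, None, None),   # never halts
    (2, 9, "1111"),
]
M = 3                  # number of halting programs, handed to us for free
```

Here `K_with_halting_count("0110", programs, M)` returns 3: the loop stops at step 9 (once all M = 3 halting programs are done), and the shortest halted program outputting "0110" has length 3.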

131. Yoni Says:

Scott #130

“just to show you that these things often aren’t nearly as hard as they look”
Lol – it took me about 45 minutes just to figure out what your answer meant; I still have no idea what “n+O(log n) bits” means (what is O and what has log n got to do with anything? No need to answer, just wanted to give you an idea of how waay out of my knowledge base we are here.) – so yes, it probably was about as hard as it looked 🙂

Despite that, I do still have a question on your solution:

The solution presumes that we have a way of knowing the number of programs that will halt when run on a blank input. Surely we can’t find that out by just running them as we have no idea when it might halt.

If you have a method of determining which programs will halt and which will not then we are back with a general solution to finding K(X) (i.e. create all programs up to length n, count the ones that end on a blank input [M], record [M], run all the programs until M stop, find the shortest one with x as the output.)

So surely this means that there is no way of finding M – am I missing something?

132. Scott Says:

Yoni: Yes, you’re absolutely right, finding the M for a given n is an uncomputable problem—and we know that, because otherwise K(x) itself would be computable, contrary to what I proved in the article.

However, you yourself are the one who asked me to play the “nonuniform game”—i.e., the game where you first fix a value of n, and then try to imagine the best possible algorithm for inputs of size n, completely ignoring how hard it is to find the algorithm (i.e., imagining that an all-knowing wizard hands you the algorithm, so all you need to do is run it on the x ∈ {0,1}^n of your choice). If you want to take the difficulty of finding the algorithm back into account … well then, you’re right back to the uniform setting that I assumed in my article! 🙂

Incidentally, the reason the algorithm takes n+O(log n) bits to write down is simply that M takes n+1 bits to write down, n takes O(log n) bits to write down, and the rest of the program takes O(1) bits to write down, independent of n. So, adding it all up we get

n+1+O(log n)+O(1) = n+O(log n).

133. Yoni Says:

Scott: “you yourself are the one who asked me to play the ‘nonuniform game'” – I guess I am, although entirely unintentionally (I suppose that just proves that I am not the all-knowing wizard :p)

For my own clarification, if I understand you, the response to my initial questions is “yes, for x of any length n there exists an algorithm to calculate k(x), however that algorithm must be uncomputable since otherwise the general algorithm would also be computable violating the reasoning in your article”. Is that right?

“Incidentally”, my main issue was not remembering what O() means (I thought I simply didn’t know it but having found it out recall the notation having been used in uni). Googling “O()” is surprisingly unhelpful. Thanks for the explanation though.

134. Jay Says:

“n+O(log n) bits” => “about n bits, at least for large enough n”

135. Scott Says:

Yoni #133: Yes, you understand correctly!

Dear Mr. Aaronson,

I am going to begin looking into the definitions of Kolmogorov complexity and Shannon entropy a little more closely. I’ve never really studied the subject before:

For a while something has been bugging me about randomness:

In terms of the randomness–unpredictability link: It seems to me that, to the extent we can call anything random, we have to be talking about systems for which we have limited information – we either are not resolving part of the physics, or we don’t have all of the state variables (just a projection thereof). (Or, I suppose, in the case of QM, we don’t have a way to keep track of which branch we are ending up in.) In a Newtonian world, all randomness is pseudorandomness.

(PS – if you had the original wavefunction of the universe, knew the physics it operated by, and evolved it in time, why would the complexity increase logarithmically in time? Why at all? It seems that in any deterministic system, if you have the initial state and the integrator, you can’t get new information from it – you already know everything about what it will give you at any time in the future.)

Other examples that spring to mind: Minecraft worlds – from our gameplayer’s perspective, they seem very large and complex. But as interpreted by the generating algorithm, they are just a tiny integer fed as a seed to the algorithm.

(I see you address this in #99)

What prevents you from *always* being able to find some integrator relative to which an arbitrary output string (or basket of arbitrary output strings) is “simple”? It seems you have to include some sort of statement that a string is complex *relative* to some generator or evaluator.

Another example: Suppose you have the “Library of Babel” containing every possible book. Somewhere in there are the works of Shakespeare – though they are far enough away that finding them requires some piece of information that is as complex as authoring the books yourself. But where the works of Shakespeare are depends on how the library is organized. Reorganizing the library and applying some sort of cryptographic transformation to the characters (rot13, or anything more complicated involving multiple characters) are equivalent. Somewhere there is a cryptographic transformation that brings all of the works of Shakespeare close to the origin of the library.

The book “AAAAAA…” might not mean much to you, but to the guy who reads things via his complicated (*relative* to you!) cryptographic language, it might be complex and meaningful prose. Likewise, all the books you find interesting would look like gibberish to observer B. Relative to observer B, it is not he who has a complicated language. Rather, you have a complicated filter which is the inverse of observer B’s filter relative to you.

PS – If the Kolmogorov complexity of the universe is low, then without appealing to some sort of limited perspective or information loss: Why would anything the universe contains have higher complexity than the universe itself?

(That includes any finite idea accessible to humans, and anything their ostensibly random generators spit out?)
From a “God’s eye view”, shouldn’t everything be bounded by the complexity of the universe?

I have a quick question about Kolmogorov complexity:

Isn’t the Kolmogorov complexity of a string dependent on the Turing machine running your programs? I am currently thinking of the Turing machine as a sort of generalized map between input programs and output strings (something that’s not at all how I usually think of them when programming sanely designed ones!):

Suppose you have a perversely designed Turing machine that spits out whatever string you want when fed a null program of zero length. It seems you can offload the complexity of your output string from the program to the Turing machine. How do you analyze the Kolmogorov complexity of a Turing machine then? (You can simulate Turing machines on other Turing machines, I suppose, but I haven’t been able to convince myself yet that there isn’t some arbitrariness here about which Turing machine you are using as a standard.)

139. Scott Says:

MadRocketSci: Yes, you’re absolutely right that the definition of Kolmogorov complexity depends on your choice of universal reference machine (or in other words, your programming language). And you’re further right that, given any string x of any length, there’s some programming language that assigns x a small Kolmogorov complexity—namely, a language that simply hardwires x into its definition!

But there are two big factors that ameliorate the situation. The first is that the above issue doesn’t affect the asymptotics—so if you have an infinite sequence of longer and longer strings, eventually the programming-language dependence will get washed out and you’ll have K(x) uniquely defined up to an additive constant (that’s the content of the Invariance Theorem). The second factor is that in practice, we’re interested in programming languages that are “reasonable”—not ones that do things like hardcode particular strings that would otherwise be Kolmogorov-random. And while “reasonable” is obviously hard to formalize, one can approximate what it means by saying “something like C, or Python, or Lisp, or assembly, or Turing machine, or pretty much any programming language that anyone actually uses or has even defined mathematically.”
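Both the machine dependence and the constant-overhead fix can be seen in a toy example (two made-up mini-languages of my own, nothing canonical): language B contains a one-symbol “interpreter tag” for language A, so B’s shortest descriptions are never worse than A’s by more than that one symbol. That is exactly the shape of the Invariance Theorem’s additive constant.

```python
def run_A(prog: str) -> str:
    """Toy language A: the program IS the output, so K_A(x) = len(x)."""
    return prog

def run_B(prog: str) -> str:
    """Toy language B: first symbol is a tag. 'A' means 'simulate language
    A on the rest', so K_B(x) <= K_A(x) + 1; 'R' is a native run-length
    decoder ('R3x' -> 'xxx') that can beat A badly on repetitive strings."""
    tag, body = prog[0], prog[1:]
    if tag == "A":
        return run_A(body)
    if tag == "R":
        return body[1:] * int(body[0])
    raise ValueError("unknown tag")

x = "xxxxxxxxx"
# In A, the shortest program for x is x itself: 9 symbols.
# In B, "R9x" produces x in 3 symbols, while "A" + x shows B is never
# worse than A by more than the 1-symbol interpreter tag.
assert run_B("R9x") == x and run_B("A" + x) == x
```

The asymmetry is the point: B can be much better than A on some strings, but the simulation argument bounds how much worse it can ever be, and that bound is independent of x.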