Ryan Williams strikes again
Update (Feb. 27): While we’re on the subject of theoretical computer science, friends-of-the-blog Adam Klivans and Raghu Meka have asked me to publicize that STOC’2025 TheoryFest, to be held June 23-27 in Prague, is eagerly seeking proposals for workshops. The deadline is March 9th.
- Because of a recent breakthrough by Cook and Mertz on Tree Evaluation, Ryan now shows that every problem solvable in t time on a multitape Turing machine is also solvable in close to √t space
- As a consequence, he shows that there are problems solvable in O(n) space that require nearly quadratic time on multitape Turing machines
- If this could be applied recursively to boost the polynomial degree, then P≠PSPACE
- On Facebook, someone summarized this result as “there exists an elephant that can’t fit through a mouse hole.” I pointed out that for decades, we only knew how to show there was a blue whale that didn’t fit through the mouse hole
- I’ll be off the Internet for much of today (hopefully only today?) because of jury duty! Good thing you’ll have Ryan’s amazing new paper to keep y’all busy…
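For readers who want the bullets above in symbols, here is a schematic restatement (the precise hypotheses, constants, and quantifier order are in Ryan's paper; this is only a paraphrase of the abstract):

```latex
% Every multitape-TM computation can be squeezed into roughly square-root space:
\mathrm{TIME}[t] \subseteq \mathrm{SPACE}\!\left[O\!\left(\sqrt{t \log t}\right)\right],
% and combining this with the space hierarchy theorem yields the lower bound:
\exists\, L \in \mathrm{SPACE}[O(n)] \ \text{such that}\ L \notin \mathrm{TIME}\!\left[n^{2-\varepsilon}\right] \ \text{for all}\ \varepsilon > 0.
```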
Update (Feb. 25): It occurs to me that the new result is yet another vindication for Ryan’s style of doing complexity theory—a style that I’ve variously described with the phrases “ironic complexity theory” and “caffeinated alien reductions,” and that’s all about using surprising upper bounds for one thing to derive unsurprising lower bounds for a different thing, sometimes with a vertigo-inducing chain of implications in between. This style has a decidedly retro feel to it: it’s been clear since the 1960s both that there are surprising algorithms (for example for matrix multiplication), and that the time and space hierarchy theorems let us prove at least some separations. The dream for decades was to go fundamentally beyond that, separating complexity classes by “cracking their codes” and understanding the space of all possible things they can express. Alas, except for low-level circuit classes, that program has largely failed, for reasons partly explained by the Natural Proofs barrier. So Ryan achieves his successes by simply doubling down on two things that have worked since the beginning: (1) finding even more surprising algorithms (or borrowing surprising algorithms from other people), and then (2) combining those algorithms with time and space hierarchy theorems in clever ways to achieve new separations.
Comment #1 February 24th, 2025 at 11:21 am
Ha-ha: back in the day, one of my physics profs wanted to be on a jury, to participate in civic affairs, etc.
Unfortunately, in the jury selection process, when it was revealed that he was a scientist, the lawyers from BOTH sides jumped up to get him excluded.
Comment #2 February 24th, 2025 at 11:29 am
Raoul Ohio #1: ROFL I hope that will work in my favor! Jury selection coming up soon; will just act like my normal nerdy self.
Comment #3 February 24th, 2025 at 11:37 am
This is pretty impressive, but to me the Cook and Mertz result seems more interesting than Ryan Williams’ result. Ryan’s result is something I’d intuitively believe already. The Cook and Mertz result, though, seems like something I would not expect. I’d guess that there are things which just require keeping track of much of the data late in the process to decide what to do. There was IIRC a result of a similar sort already where instead of t^(1/2) one has t/log t, but that doesn’t seem nearly as bothersome, whereas t^(1/2) is a lot smaller than t. Is there some nice short explanation for why my intuition is wrong here?
Comment #4 February 24th, 2025 at 11:55 am
Great news: I indeed wasn’t picked! 🙂
Comment #5 February 24th, 2025 at 12:20 pm
At first I thought that was the same “Cook” from the “Cook-Levin theorem”. Now that would be inspiring!!
Looks like it’s James Cook and Ian Mertz.
(Though it seems that Stephen Cook and Leonid Levin are indeed both still working as professors, at 85 and 76 respectively! Cool!)
Comment #6 February 24th, 2025 at 12:22 pm
What do you mean by polynomial degree?
“If this could be applied recursively to boost the polynomial degree, then P≠PSPACE” There are no polynomials involved except PTIME and PSPACE.
Comment #7 February 24th, 2025 at 12:26 pm
Joshua Zelinsky #3: The Cook-Mertz thing was indeed shocking, which is why it was recently written up in Quanta! But I confess it took Ryan’s new thing for me to appreciate just how big a deal it was.
As for your (and my) intuition being wrong here … for starters, read Ryan’s paper and the Quanta article! The Cook-Mertz algorithm is nonrelativizing, and relies on low-degree polynomial extensions to reuse space in a pretty striking way (although of course by now there’s tons of precedent in complexity theory for low-degree polynomial extensions to produce shocking, nonrelativizing conclusions).
Comment #8 February 24th, 2025 at 12:58 pm
Anonymous Ocelot #5: Steve Cook retired several years ago, but James is his son.
Comment #9 February 24th, 2025 at 1:47 pm
Anonymous #6: I just meant, if the √t could be improved to t^ε for every ε>0.
Comment #10 February 24th, 2025 at 3:04 pm
> The Cook-Mertz algorithm is nonrelativizing
Is this because there are known oracles that make Ryan’s theorem false? I didn’t quite follow Ryan’s explanation of Theorem 5.1 about the oracles. Other than the length bound, it sounds like Theorem 5.1 is saying “Theorem 1.1 ‘almost’ holds relative to every oracle.” So is the idea that since we know there is an oracle making P = PSPACE, Theorem 5.1 is evidence that these techniques maybe cannot be extended to separate P from PSPACE?
Also, I find it impressive that the Cook-Mertz paper emphasizes how their algorithm (nearly) means you can’t use TreeEvaluation to separate L from P… so Ryan turns around and uses it to take a huge step towards separating P from PSPACE. That seems uncanny.
Comment #11 February 24th, 2025 at 3:16 pm
Does this tell us anything new about the general possibility/limits of using more memory to accelerate a computation (time/space trade offs)?
Comment #12 February 24th, 2025 at 3:26 pm
Is there any barrier result that indicates this approach to P NE PSPACE can’t work?
Comment #13 February 24th, 2025 at 3:48 pm
We know that there are quadratic time problems because of the time hierarchy theorem, right? I mean that there are problems with O(n^2) algorithms that don’t somehow also have faster algorithms (sort of a P=NP situation inside P)? So those problems by the new result can be solved in O(n) space, i.e. there are O(n) space problems that require O(n^2) time.
Somehow I half-remembered the factoid that the P vs NP situation right now is that we not only don’t know if SAT requires superpolynomial time, but we don’t even know if it is worse than linear. That meant I had to stop for a moment to ask if we know there are P-time superlinear problems at all.
Is the new result really surprising? The first thing I thought of when I saw the Quanta article was Deutsch-Schorr-Waite garbage collection. That’s the one where you reverse the pointers in the data to remember where you came from, then reverse them back later, instead of using a stack to do a recursive scan.
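For readers who don't know the trick asdf is referring to, here is a minimal Python sketch of Deutsch-Schorr-Waite-style pointer reversal (my own illustrative code, not from any of the papers discussed): a preorder traversal of a binary tree using O(1) extra space, where the parent pointer is temporarily stashed in whichever child slot is currently being explored and restored on the way back up.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right
        self.tag = 0  # 0: not yet visited; 1: exploring left; 2: exploring right

def dsw_preorder(root, visit):
    # Deutsch-Schorr-Waite traversal: no stack, no recursion.
    # Invariant: when descending, prev is the parent; when ascending,
    # prev is the child we just left, and the parent pointer sits in the
    # child slot named by the node's tag.
    prev, cur = None, root
    while cur is not None:
        if cur.tag == 0:                        # first arrival from parent
            visit(cur)
            cur.tag = 1
            nxt, cur.left = cur.left, prev      # stash parent in left slot
            if nxt is not None:
                prev, cur = cur, nxt
            else:
                prev = None                     # empty left subtree: bounce back
        elif cur.tag == 1:                      # returning from left subtree
            cur.tag = 2
            parent = cur.left
            cur.left = prev                     # restore real left child
            nxt, cur.right = cur.right, parent  # stash parent in right slot
            if nxt is not None:
                prev, cur = cur, nxt
            else:
                prev = None                     # empty right subtree: bounce back
        else:                                   # tag 2: returning from right subtree
            parent = cur.right
            cur.right = prev                    # restore real right child
            cur.tag = 0                         # reset so the tree is reusable
            prev, cur = cur, parent             # ascend
```

When the traversal finishes, every pointer and tag is back where it started, which is exactly the "reverse them back later" step asdf describes.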
Comment #14 February 24th, 2025 at 4:28 pm
Dave Doty #10:
Is this because there are known oracles that make Ryan’s theorem false?
That’s an excellent question. Certainly low-degree polynomial extensions have the ability to, and often do, lead to results that are false relative to oracles. But of Cook-Mertz and the various results stated by Ryan, I don’t know which ones are actually known to be false relative to oracles — does anyone else?
Comment #15 February 24th, 2025 at 4:37 pm
fred #11:
Does this tell us anything new about the general possibility/limits of using more memory to accelerate a computation (time/space trade offs)?
It tells us something new about the possibility of using less memory to do a computation slower… 🙂
Comment #16 February 24th, 2025 at 4:42 pm
William Gasarch #12:
Is there any barrier result that indicates this approach to P NE PSPACE can’t work?
Another good question, which Ryan touches on in the paper.
If it wasn’t true that anything solvable in time t was solvable in space t^ε for any ε>0, that would certainly be one obvious “barrier” to using this approach to prove P≠PSPACE.
Beyond that, certainly any proof of P≠PSPACE will need to be non-algebrizing. It would be interesting to know what the prospects are for adding a non-algebrizing ingredient to Ryan’s result, assuming it doesn’t have one already.
Comment #17 February 24th, 2025 at 4:45 pm
asdf #13: Of course we’ve long known the time hierarchy theorem, but that doesn’t say anything about the interplay between time and space, the subject of questions like P vs. PSPACE.
And of course the statement of the new result is totally unsurprising — as usual with complexity lower bounds, the interesting and surprising part is all about what you can currently prove.
Comment #18 February 24th, 2025 at 5:28 pm
Scott #17, I had to stop and check when the post said “as a consequence”. Like “hey wait, maybe there are no O(n^2) time problems (everything in P is really sub-quadratic). So the new result means they are all also sublinear in space and the ‘consequence’ is vacuous. That doesn’t seem crazier than P=NP which is still not disproven. Oh yes, time hierarchy theorem, there really are O(n^2) time problems”. I.e. do I have it right that there is a non-trivial step needed to confirm the consequence?
Comment #19 February 24th, 2025 at 5:37 pm
asdf #18: I don’t understand what you’re talking about. What makes the consequence nontrivial is that the problems require nearly quadratic time, while also being solvable with only linear space.
Comment #20 February 24th, 2025 at 5:57 pm
Scott, I read the original statement as basically “if the set of problems requiring O(n) space is non-empty, then at least one of its members requires O(n^2) time”. So I needed to check that both sets (linear space, and quadratic time) are non-empty.
Comment #21 February 24th, 2025 at 5:59 pm
asdf #20: We’re talking here about the set of problems solvable in O(n) space, which is obviously nonempty.
Comment #22 February 24th, 2025 at 6:51 pm
Hi Scott et al,
Thanks for the discussion! Regarding what is surprising/interesting:
(1) The quadratic time lower bound against linear space is not surprising; everyone (except for the P=PSPACE crackpots?) believed the time lower bound was true.
(2) The simulation of time t in about sqrt{t} space is very surprising (to me). For example, there is an early derandomization result by Sipser (cited in the paper) which literally assumes the opposite is true. The simulation implies every circuit on n gates can be evaluated on any input using only sqrt{n} polylog n space, so you only have to ever store a tiny amount of information relative to the total size of the circuit: a similar statement holds for any simple dynamic program. Even in my most optimistic algorithmic mode, I would not have believed that you could simulate time t in only t^{0.99} space. For this very reason, I sat on this result for months, thinking that either my reduction to Tree Evaluation is obviously wrong, or that the Cook-Mertz procedure must have a bug.
Yes, I believed (1) is true, but I did not believe that (1) would ever be proved by showing (2).
Regarding oracles: papers such as https://userpages.cs.umbc.edu/chang/papers/revisionist/ show/claim that HPV75 and other such speed-up theorems are not relativizing. What I am saying in the paper is that when you restrict oracle query lengths, some form of relativization is possible. This must be taken with a grain of salt, however.
Comment #23 February 24th, 2025 at 6:58 pm
Re #1, #2, and #4: I’ve served twice on juries, once when I was a postdoc and once in my current position as a physics professor. I am skeptical of the many poorly sourced stories of scientists being excluded from juries.
More importantly: (1) Both trials were fascinating. (2) We (scientists) shouldn’t be gleeful about avoiding jury service. Not only is it an important part of everyone’s civic responsibilities, but if we want to claim that being evidence-based, rational, etc., are important to society, this is a place where we should want to put these claims into very direct practice.
Comment #24 February 24th, 2025 at 7:29 pm
Raghu #23: I did find the jury selection process interesting, and I said nothing whatsoever to try to avoid being picked. I’m sure I would’ve found the trial interesting too. What I confess I found a little less interesting were the interminable delays, monster lines, total confusion about where to go, and absence of anyone to explain what was happening, until finally the judge addressed the potential jurors.
Comment #25 February 24th, 2025 at 8:33 pm
Ryan Williams adds the following (for some reason the commenting system wasn’t working for him, so he couldn’t post it himself):
“As for oracles, https://userpages.cs.umbc.edu/chang/papers/revisionist/rev-book.pdf shows that for every reasonable f(n), TIME[f(n)]^A = SPACE[f(n)]^A relative to some oracle A. It follows that the simulations of HPV75 (and of my paper) are non-relativizing. Still, this does not mean ‘the Cook-Mertz procedure is non-relativizing’ … I’m not quite sure what that would mean (I’m not sure where you want to use an oracle). Their procedure treats the functions at each node as black boxes (in the sense that the entire function at each node is given as a table), so in that sense their procedure yields a weak restricted form of relativization (as described in the paper). An analogous statement is true of HPV75, PPST83, etc. These must be taken with a grain of salt: it’s not clear if these ‘restricted relativizations’ are supposed to be barriers, or not.”
Comment #26 February 24th, 2025 at 11:15 pm
I didn’t mean to imply that you actively avoided anything; sorry. I was, however, interpreting “Great news: I indeed wasn’t picked!” as being happy about not being picked, which I think is a fair reading.
Comment #27 February 24th, 2025 at 11:37 pm
To the experts who read the Cook-Mertz paper in depth: can anyone say whether it is possible to get a logspace bound for tree evaluation by improving Cook-Mertz’s techniques, or whether there are still significant technical bottlenecks?
Comment #28 February 25th, 2025 at 6:02 am
Samit #27: I didn’t read the Cook-Mertz paper in depth, but I know that people are thinking about that problem, and if and when they have an answer, they’ll probably write a paper rather than a blog comment. 🙂
Comment #29 February 25th, 2025 at 7:08 am
Hi Scott, sure, I am not expecting a solution here, but if anyone could clarify the core technical issue whose resolution would lead to a logspace bound for tree evaluation, it would be nice to know 😀
Comment #30 February 25th, 2025 at 1:31 pm
I thought we figured the lower limit back in 1981 already:
“640K ought to be enough for anybody” – Bill Gates
Comment #31 February 25th, 2025 at 3:22 pm
fred: maybe it will turn out that 640K is indeed enough…
(If you have plenty of time to wait for an answer, and you can fetch arbitrary chunks of your input from another machine)
Comment #32 February 25th, 2025 at 4:16 pm
ryan williams #31
That’s already what’s being done with any CPU anyway: they don’t actually access the physical RAM directly; the data gets copied into smaller and smaller caches (the closer to the CPU, the smaller and faster they get).
And then technically a Turing machine just needs one bit (the one under the head), plus an extra bit it sends back to say whether to read the next bit to the left or to the right of the current one.
Comment #33 February 26th, 2025 at 9:19 am
As far as jury selection goes, when I was being selected for a civil case,
the plaintiff’s lawyer was asking for a gigantic (outrageous) sum, but he kept insisting that the financial situation of the defendant made no difference whatsoever: it didn’t matter whether he could afford it or not (i.e., he could be liable for life, with his salary totally locked into payments to the plaintiff). When I said I totally disagreed with this point of view (especially since the financial situation of the defendant could have made a difference in the particulars of the case), I was immediately dismissed.
Comment #34 February 26th, 2025 at 11:15 am
I really stan Ryan’s work. What a cool result!
That being said, how much of the result is model-dependent?
If we translate the result to the RAM model, which is what algorithmic people actually use, I guess we wouldn’t save any space yet?
Comment #35 February 26th, 2025 at 11:36 am
Stan #34: Yeah, the result works for multitape TMs, which are much more powerful than single-tape TMs, but at present it doesn’t work for RAM machines.
On the other hand, the RAM model is itself an idealization because of latency issues (ultimately, the speed of light being finite), as one can already see in practice in large datacenters.
Comment #36 February 26th, 2025 at 12:04 pm
what do you mean by the phrase “ironic complexity theory”? are you using ironic in the same way that john horgan, the science writer, uses “ironic science”?
Comment #37 February 26th, 2025 at 12:29 pm
David Doty #10: You said “Also, I find it impressive that the Cook-Mertz paper emphasizes how their algorithm (nearly) means you can’t use TreeEvaluation to separate L from P… so Ryan turns around and uses it to take a huge step towards separating P from PSPACE. That seems uncanny.”
I don’t think it’s very uncanny. The Cook-Mertz result shows space-bounded computation is more powerful than we realized. Intuitively, this means it is harder than we thought to separate L from P because L is more powerful than we realized and so a larger subset of P than we realized. But it also makes it easier to separate P from PSPACE since PSPACE is more powerful than we realized and so it is easier to find things it can do that P cannot.
Comment #38 February 26th, 2025 at 1:19 pm
Richard Gaylord #36: No, it’s completely unrelated. The “irony” in this case is that one leverages the existence of surprising algorithms for one task, to prove the nonexistence of surprising algorithms for a different task, showing a sort of yin/yang duality between upper bounds and lower bounds.
Comment #39 February 26th, 2025 at 6:33 pm
@ Patrick #37,
That’s a really insightful point. The general Cook-Mertz result should be taken as a way of showing that *space* is stronger than we thought with respect to time. So if we have a small space class, then it functions as a barrier showing that the small space class is genuinely small, but it should make it easier to see that a large space class is genuinely large.
Comment #40 February 26th, 2025 at 9:14 pm
About fifteen years ago, I was tossing up whether to focus my personal studies on computational complexity or on particle physics. The latter won out. But at the time, one of my thoughts was that resolving P versus NP might coincide with the creation of superintelligent AI, because it might involve a kind of ultimate algorithmic knowledge. I regarded NP as something like the upper bound on feasible computation. On the other hand, I regarded the complexity classes beyond NP, like PSPACE, as a bit like Graham’s number, something that can be reasoned about, but too big to be very relevant to anything here in our physical world.
At the time we didn’t know that just scaling up deep learning would be enough to achieve human-level conversational AI… Anyway, what I’m wondering is if catalytic computing, and the use of a giant scratchpad already full of information, has anything to say about what today’s AIs do or could do. I don’t think so – at least I don’t think it’s relevant to current AIs – but it’s not completely clear to me, so I thought I’d put it out there.
The training set, full of everything anyone ever said on the Internet, feels a bit like the scratchpad. Except that the AI doesn’t exactly memorize it. Perhaps we can say it performs a lossy compression on it? But could something about the resulting weights then be regarded as a scratchpad useful for catalytic computing? Except that, again, this isn’t what the AIs do with their weights.
It does feel as if there is a division in the AI’s “knowledge” into abstract generalized procedural knowledge – e.g. templates for reasoning or calculation or persuasion, that have been extracted from innumerable examples – versus more concrete facts that have been memorized, and which can be plugged into those templates. But this is more akin to the traditional distinction between program and data, than catalytic computing’s division into algorithm and scratchpad. (I’m just calling it scratchpad, I don’t know what the standard term is.)
Comment #41 February 27th, 2025 at 7:43 am
Mitchell Porter #40,
“But this is more akin to the traditional distinction between program and data, than catalytic computing’s division into algorithm and scratchpad.”
I’m no expert, but from my limited reading the content of the so-called scratchpad is immaterial to catalytic computing in these results. IOW, the results are robust even if the scratchpad is full of random noise.
Comment #42 February 27th, 2025 at 9:56 am
Mitchell Porter
“e.g. templates for reasoning or calculation or persuasion, that have been extracted from innumerable examples”
Don Knuth believes P == NP because the number of possible algorithms out there is gigantic, and “innumerable examples” actually only covers a ridiculously infinitesimally small set of possibilities.
Comment #43 February 27th, 2025 at 11:17 am
Don’t we already know that pairwise approximate nearest neighbor is solvable in O(n) space but requires O(n^2) time unless SETH fails? So this work is finding other problems that require O(n) space but O(n^2) time without needing to invoke SETH?
Comment #44 February 27th, 2025 at 12:03 pm
First time commenting, seems as good a time as any 🙂
What an incredible result. Everyone wants to see their work used for something, but neither James nor I would’ve guessed that something this impactful would rely on the Tree Evaluation algorithm (let alone so soon). Makes me excited to start taking TreeEval more seriously beyond the original goal of disproving the conjecture of Steve and others.
As many people have already commented (including Ryan), it’s much more surprising and impressive that TIME[t] is in SPACE[sqrt(t log t)] than the separation of SPACE[s] from TIME[s^2]. Still, as far as making progress towards P vs PSPACE goes, it’s obviously a huge deal, and makes it all the more worthwhile to try and push further (and as Dave Doty #10 points out, my first reaction was to the beautiful irony in that Tree Evaluation was supposed to be a tool to separate time from space, and ultimately it was…in the opposite direction).
Since a couple people asked about the TreeEval algorithm I can answer some high level questions about it. For starters, Samit #27/29 asked if there’s a bottleneck to pushing our algorithm all the way to logspace, and to that the answer is yes and no (explanation below, also will appear in the journal version which is currently under review).
Bottleneck: Goldreich has an alternative exposition of our work that makes the connection to interpolation clear: when you get rid of the specifics like roots of unity and all that, our algorithm boils down to the fact that you can get the value q(x) for some degree d polynomial q by evaluating it at d+1 points along a line, i.e. q(x+i*tau) (in our case tau is the old memory, and hence we never have to erase it while doing our evaluations). The barrier is that you can’t do less than d+1 evaluations, and log(#evals) is the loss factor of our algorithm, i.e. the log log n at the moment (or the sqrt(log n) in Ryan’s algorithm). So you have to find a different way to use these arithmetic tricks besides just hoping there’s a magic equation that gives you q(x) via a smaller linear combination.
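The interpolation fact Ian describes can be demonstrated with a toy computation (my own sketch; the actual Cook-Mertz procedure works over roots of unity in a finite field, whereas this uses exact rationals, and all names here are illustrative): recover q(x) for a degree-d polynomial q from the d+1 evaluations q(x + i*tau), i = 1..d+1, via Lagrange interpolation along the line.

```python
from fractions import Fraction

def eval_poly(coeffs, x):
    # Horner evaluation; coeffs lists q's coefficients, lowest degree first.
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def interpolate_at(x, tau, values):
    # Recover q(x) from the d+1 values q(x + i*tau), i = 1..d+1.
    # Lagrange: q(x) = sum_j values[j] * prod_{k != j} (x - p_k)/(p_j - p_k),
    # exact because a degree-d polynomial is determined by d+1 points.
    pts = [x + i * tau for i in range(1, len(values) + 1)]
    total = Fraction(0)
    for j, (pj, vj) in enumerate(zip(pts, values)):
        num, den = Fraction(1), Fraction(1)
        for k, pk in enumerate(pts):
            if k != j:
                num *= (x - pk)
                den *= (pj - pk)
        total += vj * num / den
    return total
```

The point of the barrier Ian mentions is visible here: a degree-d polynomial genuinely needs all d+1 evaluations along the line, and the log of that evaluation count is what shows up as the loss factor.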
Counterpoint: instead of trying to improve our subroutine at each node, you can try and mess with the structure of the tree itself to shave off this last log factor. Manuel Stoeckl did so shortly after our algorithm (again see a follow-up writeup by Goldreich) and shaved a log log log n factor off our algorithm by merging layers of the tree; there was a slackness between our register memory and our time bound, and the right tradeoff to exploit this was just to let nodes take more than two inputs. Can’t see any reason other ideas in the same vein won’t work; in our paper we really only thought about solving individual nodes, not any broader questions about the TreeEval object at hand.
I can try and answer other questions as well, although I’d really suggest reading Ryan’s paper wrt implications, future directions, etc (and I’ll also plug the talk I gave at the IAS on the paper, which has the full proof of our TreeEval algorithm and is available on Youtube).
PS: by coincidence I was also asked to do jury duty this week. You’d think after almost nine years of living outside the US (and telling them as such each time they summon me) it would slow down, but alas. If I ever move back maybe I’ll have to take a couple years off TCS to serve on a backlog of half a dozen trials.
Comment #45 February 28th, 2025 at 12:51 pm
Scott #35
That is a good question. If a Turing machine stays in one place and sends light signals to tape cells that are farther and farther away, filling n space one step at a time would take n^2 time. If the Turing machine flies around the datacenter, it would only take n time. Something about this conclusion seems wrong – why don’t we make flying computers? 🙂
Comment #46 March 2nd, 2025 at 1:54 pm
Concerned
as I explained in #32: since we live in 3D space, you have a processing unit at some point in space, and memory fills the space around it. Given that signal transmission speed is finite, it’s natural to organize memory around the processing unit in concentric spherical layers acting as memory caches: the closer to the processing unit, the faster the communication but the smaller the size, and far enough out, the memory is arbitrarily big but very slow to access. Having the processing unit “fly around” doesn’t improve anything, since the data could be anywhere along a spherical shell (that’s why it’s called “random access memory”), so staying at the center is optimal, and it’s the data that flies around along wires as electrons or photons. The win comes from the caching. If instead you’re trying to fill a sphere of memory entirely and process it non-randomly, then it could be optimal to have the processing unit “crawl” over the memory cells, but that would be pretty complicated (it would be better to replicate the processing unit at equally spaced nodes throughout the memory, etc.).
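fred's geometric point can be made quantitative with a back-of-envelope estimate (my addition, not in the comment): memory packed at bounded density in three dimensions fills a ball whose radius grows as the cube root of the capacity, so any signal-speed-limited random access is at least that slow.

```latex
% M bits at bounded spatial density occupy radius r with M = \Theta(r^3),
% so r = \Theta(M^{1/3}), and a round-trip signal at finite speed costs
t_{\text{access}} = \Omega\!\left(M^{1/3}\right).
```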
Comment #47 March 2nd, 2025 at 2:09 pm
The fact that a Turing Machine has to always access neighbor cells on the tape is an actual annoying limitation, not some kind of clever optimization, because it’s not designed to be practical but designed to be an abstraction that’s as simple as possible. So the concepts of “Turing machine” and “data center” are pretty incompatible.
Comment #48 March 3rd, 2025 at 4:06 am
Off-topic: somehow I have come across a website called “Cognitive Mechanics”, in which a Ryan Williams offers a computational model of cognition in a text combining the numbered style of Wittgenstein’s Tractatus with content worthy of a computer science textbook. At first I was very impressed, but not entirely surprised, that the great complexity theorist also had a philosophy of mind this systematic. But eventually I noticed that one is from Ohio and the other is from Alabama, so I guess it is two different people.
Comment #49 March 3rd, 2025 at 5:23 pm
Scott, when it comes to the definition of 3SAT as an NP-complete problem, I never see any explicit bound put on the size of the formula (i.e., the value of M, the number of clauses). Given that the number of possible boolean functions on N boolean variables grows doubly exponentially (2^(2^N)), it’s clear (?) that an exponentially large formula (i.e., M ~ O(2^N) 3-expressions) is too large, because just stating the problem becomes exponentially expensive, so I guess a reasonable bound is a polynomial number of clauses, M ~ P(N)?
I also remember you mentioned that once M > 4.2 N then the number of instances that are unsatisfiable rises dramatically (meaning a random instance is typically unsatisfiable and only especially constructed ones would be satisfiable, so potentially easy to solve because of strong special structure). So it seems that even M ~ O(N^2) would already be “too” large?
Comment #50 March 3rd, 2025 at 6:38 pm
fred #49: The “input length,” in terms of which the algorithm gets to run in polynomial time, is the total length of the formula, not just the number of variables.
Comment #51 March 3rd, 2025 at 9:31 pm
Fred #46 #47
The Turing machine’s ability to move on the tape, rather than sending O(d) queries to far-away cells, is the reason why it’s not the case that all algorithms which require O(n) space also require O(n^2) time.
Believe it or not, this does correspond to something in “real life.” A datacenter is full of server racks, not storage racks. If you wanted to implement an algorithm that couldn’t be parallelized, what you would do is have the servers send the state of the algorithm between themselves as it processes the data local to each of them. That’s just like a Turing machine moving from cell to cell, so the answer to the riddle is that we *do* have flying servers. (This pattern is not especially common in practice because most algorithms are at least a little parallelizable.)
Comment #52 March 3rd, 2025 at 9:36 pm
Has it been proven that TreeEval requires at least logspace? If so, is the proof simple enough for anyone here to explain?
Comment #53 March 4th, 2025 at 6:19 am
Huh, I read Zvi’s blog years ago for card game stuff and this is how I find out he is an AI guy now.
Comment #54 March 4th, 2025 at 9:28 am
Concerned
sure, for specific computational problems (rather than a general computation model), there are always special architectures that would work better, just like for this Tree Evaluation, where each node could do the processing locally starting at the leaves, with the data propagating towards the root.
Comment #55 March 4th, 2025 at 10:03 am
Scott #50, thanks! Makes sense.
Comment #56 March 4th, 2025 at 7:49 pm
Low-level pedantic technical question for Ryan Williams if he’s still reading, or anyone else who has gone through the proof: In the “warmup result” of Section 3.1 (which on its own would be basically just as groundbreaking) he says “Observe that each node i has indegree at most 5: one edge from i − 1, and at most four other edges for (at most) four active tape blocks during time block i (two active tape blocks for each of the two tapes).”
I’m a bit confused why it’s not in-degree 3 instead of 5. The general rule being described is “edge (i,j) if and only if (1) j=i+1, or (2) some tape block accessed in time block j (called an “active” tape block during time block j) was most recently active in time block i”. But (1) is a special case of (2) for the tape block where the tape head is at the start of time block j. In other words, at the start of time block j, tape 1’s head is in some tape block, say k, and during time block j, the head possibly moves to an adjacent tape block, say k+1. Ryan is saying “unconditionally add edge (j-1,j), but then possibly add two more edges (i,j) and (i’,j) for tape 1 corresponding to the time blocks i and i’ that represent the most recent time blocks in which tape block k and k+1 were respectively active.” (And the other two edges, to make 5 edges, are from similar reasoning about tape 2.)
But tape block k was by definition most recently accessed during time block j-1, because it is the tape block where the tape head starts at the beginning of time block j. So really it seems there is only at most one other time block that could influence data accessed by tape 1 during time block j, namely time block i’ when tape block k+1 was most recently active. So tape 1 seems like it should contribute at most 1 additional edge, accounting for the fact that the neighboring tape block k+1 may have been most recently active a long time ago, during time block i’. Similar reasoning means that tape 2, which starts in tape block p and perhaps traverses into (say) p-1 during time block j, should only need to know what happened during the most recent time block that accessed tape block p-1, since time block j-1 is by definition the most recent time block that accessed tape block p.
Perhaps the explanation is simply that there could be the following corner case: during the very last transition of time block j-1, the tape head moved from adjacent tape block k+1 or k-1 INTO tape block k. Therefore, technically tape block k was not active DURING time block j-1, since the tape head only entered tape block k at the exact moment we exited time block j-1. (In fact I see that in the full proof, Ryan is using block-respecting TMs, so perhaps this “corner case” happens every time in the full construction.)
Of course, technically in this corner case, it seems that edge (j-1,j) would be unnecessary: the tape block where we start time block j was not active during time block j-1, so what happened to tape block k during time block j-1 is irrelevant, because nothing happened there. It seems to me that the “cleanest” rule for edges, which shows what is really needed, is just part (2): edge (i,j) if and only if some tape block active in time block j was most recently active during time block i.
I want to make sure this corner case is the reason to think about 5 possible in-edges to time block j (though technically I think it’s 4 even in this case), instead of 3, and that it’s not something more subtle that I’m missing.
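To make the question concrete, here is a small simulation sketch (my own illustration, not code from the paper): it partitions a random head trajectory for two tapes into time blocks of length b, and builds edges using rule (2) alone — edge (i,j) whenever some tape block active in time block j was most recently active in time block i — then checks that the in-degree never exceeds 4 (at most 2 active tape blocks per tape, each contributing at most one edge). All names here are invented for the illustration.

```python
import random

def active_blocks(head_positions, b, j):
    """Tape blocks (cell // b) touched during time block j (steps j*b .. (j+1)*b - 1)."""
    return {p // b for p in head_positions[j * b:(j + 1) * b]}

def in_edges(tapes, b, j):
    """Rule (2) only: i -> j iff some tape block active in time block j
    was most recently active (strictly before j) in time block i.
    No unconditional (j-1, j) edge is added."""
    edges = set()
    for head_positions in tapes:
        for blk in active_blocks(head_positions, b, j):
            for i in range(j - 1, -1, -1):  # scan backwards for most recent activity
                if blk in active_blocks(head_positions, b, i):
                    edges.add(i)
                    break
    return edges

def max_indegree(tapes, b, num_blocks):
    return max(len(in_edges(tapes, b, j)) for j in range(1, num_blocks))

# Two tapes, each a random walk of t = b * num_blocks head positions.
random.seed(1)
b, num_blocks = 8, 40
t = b * num_blocks

def walk():
    pos, out = 0, []
    for _ in range(t):
        out.append(pos)
        pos = max(0, pos + random.choice((-1, 1)))  # head moves at most one cell per step
    return out

tapes = [walk(), walk()]
# In b steps the head spans at most b cells, hence touches at most 2 tape
# blocks of size b; with 2 tapes that is at most 4 in-edges under rule (2).
assert max_indegree(tapes, b, num_blocks) <= 4
```

Under rule (2) alone the simulation never sees in-degree above 4, consistent with the comment’s count; the fifth edge in the paper’s statement is the unconditional (j-1,j) edge.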
Comment #57 March 6th, 2025 at 9:15 am
Dave Doty #56: During a time block, the head of the first tape may access two different tape blocks. There are two heads, hence four different tape blocks. The Turing machines considered in Section 3.1 are oblivious; they are *not* block-respecting (which I think was the mix-up?)
Comment #58 March 6th, 2025 at 5:40 pm
Scott #50
fred #49 mentions some 4.2 threshold for 3-sat.
On the wiki entry:
https://en.wikipedia.org/wiki/Boolean_satisfiability_problem#2-satisfiability
There’s a short passage about that R = 4.2 (the ratio of the number of clauses to the number of variables) phase transition where 3-SAT switches to mostly no-solution.
“Selman, Mitchell, and Levesque (1996) give empirical data on the difficulty of randomly generated 3-SAT formulas, depending on their size parameters. Difficulty is measured in the number of recursive calls made by a DPLL algorithm. They identified a phase transition region from almost-certainly-satisfiable to almost-certainly-unsatisfiable formulas at a clauses-to-variables ratio of about 4.26.”
I found that 1996 paper here:
https://www.sciencedirect.com/science/article/pii/0004370295000453?via%3Dihub
I didn’t read the paper, but it seems to be entirely based on doing statistics using multiple runs.
I was just looking for what their criterion was for the threshold.
And I saw this graph (for N = 30 to N = 200)
https://i.imgur.com/JEESlFv.png
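For anyone who wants to reproduce the empirical picture, here is a minimal brute-force sketch (my own illustration — not fred’s method, and not the paper’s DPLL setup, which measures difficulty rather than satisfiability): it samples random 3-SAT instances at a given clauses-to-variables ratio and reports the fraction that are satisfiable, which drops sharply near 4.26 as N grows.

```python
import itertools
import random

def random_3sat(n, m, rng):
    """Random 3-SAT instance: m clauses, each over 3 distinct variables with random signs.
    Literals are +v / -v for variable v in 1..n."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n + 1), 3))
            for _ in range(m)]

def satisfiable(n, clauses):
    """Brute-force check over all 2^n assignments (small n only)."""
    for bits in itertools.product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def frac_sat(n, ratio, trials=200, seed=0):
    """Fraction of random instances at the given clauses-to-variables ratio that are SAT."""
    rng = random.Random(seed)
    m = round(ratio * n)
    return sum(satisfiable(n, random_3sat(n, m, rng)) for _ in range(trials)) / trials

# Sweep ratios for a modest N (cf. the paper's N = 30..200; smaller N gives a shallower drop):
# for r in [3.0, 3.5, 4.0, 4.26, 4.5, 5.0, 5.5]:
#     print(r, frac_sat(12, r, trials=100))
```

Brute force limits this to N around 15 or so; the paper’s larger N values would need a real DPLL or CDCL solver, but the qualitative transition is already visible at small N.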
After I found that paper today, this afternoon I used my personal approach (*) to solving 3-SAT to come up with a method for deriving the same thing.
Note that this is not based on doing curve fitting on the data from that paper, the data for my graph is derived entirely from scratch using reasoning inspired by my approach.
I used Excel to derive it for N = 6 (in about 500 rows).
It would be easy to extend to much larger values of N, but Excel isn’t super practical for that, so I could write a small program to compute it instead.
https://i.imgur.com/3xEAFnU.png
Do you think it’s interesting at all, or has this empirical 4.2 value been backed analytically (in the time since that original paper was written)?
If you think it’s interesting, I can write some explanation for it and provide some code (e.g. on arXiv).
(*) I don’t claim that my personal approach to solving 3-sat is beating any other current practical 3-sat solvers, but I couldn’t find any reference to the method I’m using (I even read what Knuth had to say on the topic to see if my approach wasn’t super obvious).
Comment #59 March 6th, 2025 at 6:17 pm
Forgot to say that the slope in my graph is not as steep, but it’s for N = 6 (in the paper’s graph, the curve with the shallowest slope is for N = 30, so N = 6 would be even less steep).
But the value of 4 shows up – I wasn’t sure at all I would get anything that looks even vaguely like the original graph, so I was quite surprised, especially since my derivation is very, very simple.
And I’m pretty sure the slope increases too under my derivation (I see that for N < 6, the slope is shallower).
Anyway, tomorrow I’ll write a program to derive it for the same values as the paper.
Comment #60 March 9th, 2025 at 11:51 am
Dave Doty #56:
The (j-1, j) edge is always needed to find out the state of the machine at the beginning of the time block. The corner case you mention does indeed seem to be why it could be possible that none of the tape blocks used in time block j are active in time block j-1.
Comment #61 March 14th, 2025 at 3:21 pm
My results are very similar to the paper’s results, at least quantitatively, and the shift in ratio is expected.
https://imgur.com/a/wiwRWmA
I still don’t know whether this is interesting at all, and I will try to ask the original paper’s authors for their opinion.
Comment #62 March 21st, 2025 at 1:50 am
Scott, I would love to know your reaction to this blogpost https://www.alignment.org/blog/a-computational-no-coincidence-principle/
Comment #63 March 22nd, 2025 at 11:23 pm
Ian Mertz #44: I read your survey
https://iuuk.mff.cuni.cz/~iwmertz/papers/m23.reusing_space.pdf
that included open problems, and found it quite interesting and well presented. I’ve been playing around with a couple problems and might have made some progress. Before I put in too much time, I wanted to verify they are still open! I see from your recent preprint
https://iuuk.mff.cuni.cz/~iwmertz/papers/kmps25.collapsing_catalytic_classes.pdf
that you already solved one or two problems from the paper: CNL = CL, and the CSPACE analog of Savitch’s Theorem. Are the other major problems it describes still open? (Especially the separations or inclusions involving traditional space or time classes.)
Comment #64 March 24th, 2025 at 2:05 am
Re my comment #63:
This one turned out to be too simple not to just publish here (though I only noticed it after giving up on other methods that were 100x more complicated and still not close to working):
SPACE(min(2^s, log c)) is in CSPACE(s, c).
In other words, in the context of CSPACE(s, c) simulating SPACE, s is ridiculously powerful (when c is high) and c is ridiculously weak.
(Technically you need c^O(1) instead of c on the right, if SPACE insists on its traditional implicit O(). There may be other caveats too — consider this theorem-statement approximate.)
(Note that this doesn’t separate CL from L, since c is too small then.)
Proof sketch (rot13, in case you want to enjoy the hunt) –
Frg w = zva(2^f,ybt p). Qrfvtangr gur svefg w ovgf bs gur p gncr nf n “fyvqvat jvaqbj”, juvpu jr jvyy erirefvoyl zbir bar pryy gb gur evtug (pbclvat bgure qngn nebhaq vg), naq nyfb pbhag qbja gur pbagragf bs (nf n w-ovg ahzore), fb gung jr fhogenpg 1 sebz gur pbagragf sbe rnpu fgrc jr zbir vg. Guvf vf rnfl gb qb, fvapr jr pna rnfvyl nqqerff nyy w ovgf hfvat bhe haqreylvat f-ovg znpuvar.
Fgbc bapr gur w-ovg pbhagre ernpurf 0. Bhe jvaqbj bs jvqgu w unf abj zbirq fbzr ahzore bs fgrcf gb gur evtug, orgjrra 0 naq 2^w. (Vs vg zbirf nyy gur jnl gb gur gncr raq, vr jvgu vgf yrsg rqtr cnfg p – w, fgbc naq nqq fbzr jbexnebhaq gb gur zrgubq, be fbzr pnirng gb gur gurberz, fhpu nf punatvat vg gb fnl PFCNPR(f, p + w – 1).)
Abj jr unir n w-ovg ertvba bs nyy 0f gb cynl jvgu — naq whfg nf vzcbegnagyl, bhe p-gncr urnq vf evtug gurer. Vaibxr gur FCNPR(w) znpuvar jr jnagrq gb fvzhyngr. Vs vg arrqf uryc fgnlvat va gur fcnpr obhaqf be trggvat onpx gb gur fgnegvat cbfvgvba jura vg’f qbar, nqq nabgure zvabe jbexnebhaq be pnirng. Erpbeq vgf nafjre va gur pbageby fgngr.
Abj renfr gur w-ovg ertvba ntnva, naq gura erirefr gur cebprff bs trggvat urer, ol zbivat gur w-ovg jvaqbj yrsg, vaperzragvat vgf pbagragf, naq zbivat qngn nebhaq vg gb gur evtug. Fgbc jura vg ernpurf gur p-gncr fgneg. DRQ.
Comment #65 March 25th, 2025 at 11:52 pm
I think the construction in my comment #64 can be applied recursively. I’m still working out the details, but if I’m right, it means that for essentially any s (perhaps even O(1)), SPACE(log c) is in CSPACE(s, c + O(log c)), proven by direct simulation of the SPACE machine. I can also prove this bound is close to tight, for any such inclusion that works via direct simulation. (I am unconventionally using SPACE(s) without an implicit O(s) in order to simplify the statement.)
(CSPACE(s, c) is the “catalytic space” class described in the Ian Mertz survey article I asked him about. s is the amount of normal space, and c is the amount of space full of junk that has to be restored at the end.)
Unfortunately I know of no application of these results. What it means: if you have an amount c of in-use space and want to call a subroutine which can simulate having log c of space, one way to do it is to allocate log c of space to use as a TM tape pointer into the length-c buffer, then give the subroutine access to the length-c space only through that pointer, letting it alter that space as it likes but restoring its contents perfectly at the end; it will need no other space. And it will follow the “TM tape rules,” never looking at the pointer value except to detect whether it’s at either end of the length-c space, etc.
But you could have achieved the same result more easily, by just giving the subroutine the log c of space directly, instead of insisting on it being used only as a black-box pointer into the length c region of “precious junk”! So afaik this is only interesting because of the proof method, and because it’s surprising the CSPACE machine can get around the severe restrictions on the use of the space implicitly provided to it in the form of that pointer.
Comment #66 April 2nd, 2025 at 2:57 pm
This is slightly off-topic, but would be even more off-topic under any more recent post, so I’ll post it only here for now, presuming that at least Scott may see it when he moderates this comment!
It is about a piece of terribly wrong info about basic complexity theory, on a website whose name implies it ought to know better, and which has infected Google’s browser Generative AI feature enough to make it parrot this page and give me a partly wrong and therefore completely untrustworthy answer.
https://eitca.org/cybersecurity/eitc-is-cctf-computational-complexity-theory-fundamentals/complexity/space-complexity-classes/is-pspace-class-not-equal-to-the-expspace-class/
This organization claims to be “EUROPEAN INFORMATION TECHNOLOGIES CERTIFICATION ACADEMY – ATTESTING YOUR PROFESSIONAL DIGITAL SKILLS”, and this well-written web page states the Space Hierarchy Theorem apparently accurately, yet goes on to claim
> Despite the strong evidence provided by the Space Hierarchy Theorem, the question of whether PSPACE is not equal to EXPSPACE remains unresolved.
(Followed by an explanation of why they say it remains unresolved.)
The distinction between “strong evidence” and “proof” seems to have completely escaped them.
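For the record, the separation really is unconditional — the resolution they claim is missing is a textbook diagonalization (stated here from memory; see a complexity text for the exact space-constructibility hypotheses):

```latex
% Space Hierarchy Theorem: for space-constructible f, the diagonal
% language D_f \in \mathrm{SPACE}(f(n)) satisfies
% D_f \notin \mathrm{SPACE}(g(n)) whenever g(n) = o(f(n)).
% Take f(n) = 2^n. Every polynomial n^k is o(2^n), hence
%   D_f \in \mathsf{EXPSPACE} \setminus \mathsf{PSPACE},
% and therefore, unconditionally,
%   \mathsf{PSPACE} \subsetneq \mathsf{EXPSPACE}.
```

So “strong evidence” is exactly the wrong phrase: the hierarchy theorem’s diagonal language defeats every polynomial-space machine at once, and the question is fully resolved.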
(BTW, my question to google was “Is it possible that EXPSPACE is in PSPACE/poly?”.)