Updates!
(1) My 8-year-old son asked me last week, “daddy, did you hear that GPT-5 is now out?” So yes, I’m indeed aware that GPT-5 is now out! I’ve just started playing around with it. For detailed reports on what’s changed and how impressive it is compared to previous models, see for example Zvi #1, #2, #3. Briefly, it looks like there are major reductions in hallucinations and sycophancy, and improvements in practical usefulness for coding and other tasks, even while the “raw intelligence” is unlikely to blow away someone who was already well-acquainted with o3, Opus 4, and other state-of-the-art models, the way ChatGPT and then GPT-4 blew away people who had no idea what was possible in late 2022 and early 2023. How impressive a result you see also partly depends on which of several GPT-5 models your query gets routed to, which you don’t entirely control. Anyway, there’s grist here for the people who claim that progress toward AGI is slowing down, but also grist for the people who claim that it continues pretty much as expected within our post-ChatGPT reality!
(2) In other belated news, OpenAI and DeepMind (and then, other companies) announced that they achieved Gold Medal performance on the International Math Olympiad, by solving 5 of the 6 problems (there was one problem, the 6th and hardest, that all of the AIs struggled with). Most importantly, this means that I’ve won $100 from my friend Ernest Davis, AI expert at NYU, who bet me $100 that no AI would earn a Gold Medal at the International Math Olympiad by December 4, 2026. Even though I’m normally risk-averse and reluctant to take bets, I considered this one to be extremely safe, and indeed I won it with more than a year to spare.
(3) I’ve signed an open letter to OpenAI, along with many of my fellow former OpenAI employees as well as distinguished scientists and writers (Geoffrey Hinton, Stuart Russell, Sheldon Glashow, Sean Carroll, Matt Yglesias…), asking for more transparency about OpenAI’s continuing efforts to change its own structure. The questions basically ask OpenAI to declare, in writing, whether it has or hasn’t now completely abandoned the original nonprofit goals with which the organization was founded in 2015.
(4) At Lighthaven, the rationalist meeting space in Berkeley that I recently visited (and that our friend Cade Metz recently cast aspersions on in the New York Times), there’s going to be a writer’s residency called Inkhaven for the whole month of November. The idea—which I love—is that you either write a new blog post every day, or else you get asked to leave (while you also attend workshops, etc. to improve your writing skills). I’d attend myself for the month if teaching and family obligations didn’t conflict; someone standing over me with a whip to make me write is precisely what I need these days! As it is, I’m one of the three advisors to Inkhaven, along with Scott Alexander and Gwern, and I’ll be visiting for a long weekend to share my blogging wisdom, such as I have. Apply now if you’re interested!
(5) Alas, the Springer journal Frontiers of Computer Science has published a nonsense paper, entitled “SAT requires exhaustive search,” claiming to solve (or dissolve, or reframe, or something) the P versus NP problem. It looks indistinguishable from the stuff I used to get in my inbox every week—and now, in the ChatGPT era, get every day. That this was published indicates a total breakdown of the peer review process. Worse, when Eric Allender, Ryan Williams, and others notified the editors of this, asking for the paper to be retracted, the editor-in-chief declined to do so: see this guest post on Lance’s blog for a detailed account. As far as I’m concerned, Frontiers of Computer Science has now completely discredited itself as a journal; publication there means nothing more than publication in viXra. Minus 10 points for journals themselves as an institution, plus 10 points for just posting stuff online and letting it be filtered by experts who care.
(6) Uma Girish and Rocco Servedio released an arXiv preprint called Forrelation is Extremally Hard. Recall that, in the Forrelation problem, you’re given oracle access to two n-bit Boolean functions f and g, and asked to estimate the correlation between f and the Fourier transform of g. I introduced this problem in 2009, as a candidate for an oracle separation between BQP and the polynomial hierarchy—a conjecture that Ran Raz and Avishay Tal finally proved in 2018. What I never imagined was that Forrelation could lead to an oracle separation between EQP (that is, Exact Quantum Polynomial Time) and the polynomial hierarchy. For that, I thought you’d need to go back to the original Recursive Fourier Sampling problem of Bernstein and Vazirani. But Uma and Rocco show, using “bent Boolean functions” (get bent!) and totally contrary to my intuition, that the exact (zero-error) version of Forrelation is already classically hard, taking Ω(2^{n/4}) queries by any randomized algorithm. They leave open whether exact Forrelation needs ~Ω(2^{n/2}) randomized queries, which would match the upper bound, and also whether exact Forrelation is not in PH.
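For readers who want the quantity written out: for Boolean functions f, g : {0,1}^n → {−1,1}, the forrelation is, in the usual normalization,

\[ \Phi_{f,g} \;=\; \frac{1}{2^{3n/2}} \sum_{x,y \in \{0,1\}^n} f(x)\,(-1)^{x \cdot y}\,g(y), \]

and the problem (in its decision version, under a promise) is to distinguish the case where |Φ_{f,g}| is tiny from the case where it’s bounded away from zero.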
(7) The Google quantum group, to little fanfare, published a paper entitled Constructive interference at the edge of quantum ergodic dynamics. Here, they use their 103-qubit superconducting processor to measure Out-of-Time-Order Correlators (OTOCs) in a many-body scrambling process, and claim to get a verifiable speedup over the best classical methods. If true, this is a great step toward verifiable quantum supremacy for a useful task, for some definition of “useful.”
(8) Last night, on the arXiv, the team at USTC in China reported that it’s done Gaussian BosonSampling with 3,050 photons and 8,176 modes. They say that this achieves quantum supremacy, much more clearly than any previous BosonSampling demonstration, beating (for example) all existing simulations based on tensor network contraction. Needless to say, this still suffers from the central problem of all current sampling-based quantum supremacy experiments, namely the exponential time needed for direct classical verification of the outputs.
Comment #1 August 13th, 2025 at 8:14 pm
(apologies in advance for dumb questions)
> Needless to say, this still suffers from the central problem of all current sampling-based quantum supremacy experiments, namely the exponential time needed for direct classical verification of the outputs.
I am kind of confused here – if you still need exponential time for verification, does this imply that there’s still no subexponential way to verify that you have actually done anything nontrivial? If I understand Gil Kalai right, then he’d object that this disqualifies the method from demonstrating quantum supremacy.
Comment #2 August 13th, 2025 at 9:11 pm
Awesome!
Re (6) – could Vicky the classical skeptic set up a forrelation circuit using bent functions that, *by construction*, has f(x) well correlated with the Fourier transform of g(x), for Peggy the quantum computer to run? Vicky could either give Peggy such a circuit or some other circuit, and *poof*, verifiable quantum supremacy.
Or would such an idea suffer from all similar proposals (Simon’s, welded trees…) – it’s extremely hard to find such a circuit that hides its forrelation?
Comment #3 August 13th, 2025 at 9:16 pm
Shaked #1: Yes, as I’ve said over and over in my talks and blog posts, this is (in my view) the biggest drawback of sampling-based quantum supremacy experiments as they currently stand. We can do verification up to ~50 qubits, and indirect types of verification beyond that (e.g., replacing the computation by an easier one in order to calibrate the error rate). But direct verification of (say) 100-qubit Random Circuit Sampling is believed to be just as hard as spoofing of 100-qubit Random Circuit Sampling, which seems to be hard indeed.
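(For concreteness, here’s a toy sketch of the linear cross-entropy benchmark statistic used in those direct verifications. The names are mine, not from any particular group’s code, and the exponentially expensive part is of course computing the ideal output probabilities by classical simulation.)

```python
import numpy as np

# Toy sketch of the linear cross-entropy benchmark (XEB). The hard part,
# hidden inside `ideal_probs`, is the full classical simulation of the
# circuit -- which is exactly what becomes infeasible beyond ~50 qubits.
def linear_xeb_fidelity(samples, ideal_probs):
    """samples: observed bitstrings as integers in [0, 2^n);
    ideal_probs: length-2^n array of ideal output probabilities."""
    dim = len(ideal_probs)
    return dim * np.mean(ideal_probs[samples]) - 1  # ~1 for ideal device, ~0 for noise

# Demo with a fake "random-circuit-like" (Porter-Thomas) distribution on 10 qubits:
rng = np.random.default_rng(0)
p = rng.exponential(size=2**10)
p /= p.sum()
good = rng.choice(2**10, size=50_000, p=p)   # samples from the ideal distribution
junk = rng.integers(0, 2**10, size=50_000)   # uniform-noise samples
print(linear_xeb_fidelity(good, p))          # close to 1
print(linear_xeb_fidelity(junk, p))          # close to 0
```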
Oddly enough, though, this does not appear to be a crux for Gil Kalai. Gil has said at great length that even the 50-qubit RCS experiments, the ones that were directly verified, must have been done incorrectly, since if they were done correctly then they would’ve falsified his theories about noise.
UPDATE: Following emails from Gil, I hereby retract and apologize for my statement that Gil says that the 50-qubit experiments must have been done incorrectly because otherwise they would falsify his thesis. He says that they must have been done incorrectly for complicated reasons that I’ll leave it to him to explain, and will not attempt again to render in my own words.
Comment #4 August 13th, 2025 at 9:18 pm
Mark Spinelli #2: Yes, exactly, finding a circuit that hides the forrelation is the hard part. The new work makes Forrelation exact, but still leaves it a black-box problem.
Comment #5 August 13th, 2025 at 10:46 pm
Why did Google release the result “with little fanfare”? Why aren’t they shouting from the rooftops about it?
Comment #6 August 14th, 2025 at 3:27 am
Scott #3:
The distinction between sampling and decision problems makes it a bit nonobvious, so I’m unsure if this is a dumb question, but would BQP constructively being in NP (by “constructively” I mean, e.g., reducing a BQP-complete problem to 3SAT, directly proving BQP ⊆ NP, or reducing quantified Boolean formulas to 3SAT, thus proving NP = PSPACE, since it’s already known that BQP ⊆ PSPACE) necessarily imply that quantum supremacy would be polynomial-time verifiable? How would it influence the efforts to check whether quantum supremacy was achieved?
Comment #7 August 14th, 2025 at 6:43 am
William Gasarch #5: Good question, I don’t know! Maybe there was fanfare that I missed.
Comment #8 August 14th, 2025 at 6:47 am
The AI world is riddled with sky-high hype, expectations, and doomsday predictions, and, at the other end of the spectrum, with ridicule of it all as a stochastic parrot, a scam, nothing but bias and hallucinations, etc.
Almost 3 years after ChatGPT’s release to the world, this is what it looks like:
1. ChatGPT and its peers were definitely way more than what computers/software could manage until then. So from that perspective it was a step change.
2. However, from a usefulness perspective it’s a small increment, maybe a little better than Google search, which gets incrementally better with each iteration.
3. This technology is progressing at a pace similar to others (the internet, mobile, cloud computing, etc.), as opposed to the exponential progress that the companies profess and others claim. The investments and market capitalizations, though, have reached the stratosphere.
4. While it will embed itself in the daily lives of people and organizations just like search, it will take a long time for it to change the current human way of life substantially.
5. Achieving benchmarks that humans have already mastered (chess, Go, math olympiads, etc.) will continue to happen at a blistering pace, because we already know how to get there.
6. So can anyone expect it to provide useful insights (not solve the problems itself) toward the Riemann hypothesis or P vs. NP, or even problems that are not so deep-rooted? That is up in the air, with no indication whatsoever either way.
Comment #9 August 14th, 2025 at 6:52 am
anonymous #6: You’re right that the role of sampling in quantum supremacy experiments makes this more complicated. If BQP⊆NP, then every decision problem that was quantumly easy would have a short witness (although still, not necessarily a short witness that the quantum computer could efficiently find!). But it need not have that implication for tasks like Random Circuit Sampling or BosonSampling. I can’t think of any standard complexity assumption that makes the latter classically verifiable without also making them classically easy.
(Here I’m leaving out interactive verification protocols like Mahadev’s, which are a whole separate discussion…)
Comment #10 August 14th, 2025 at 11:28 am
I think that we can do something about that Frontiers of Computer Science garbage. We have to hold the publisher Springer to account and get them to do something. They are the ones making money from the journal.
What if we started a public boycott of Springer until they deal with this properly?
Comment #11 August 14th, 2025 at 12:37 pm
I take it back… a boycott is not the right solution. There are plenty of refutations of similar styles of claims out there. One should just be recycled and submitted to Frontiers for this paper.
Comment #12 August 14th, 2025 at 6:20 pm
Still, if factoring is in P and SAT has a polynomial-time algorithm, how will AI change?
Comment #13 August 14th, 2025 at 6:36 pm
FactoringandNP #12: If factoring is in P, that has zero direct effect on AI that I can think of (huge effect on cryptography, of course). Factoring just isn’t an important problem for AI.
If P=NP and the algorithm was efficient in practice, of course that would have a huge effect on AI — among other things, it would mean you could efficiently find the loss-minimizing weights for any neural net, with no gradient descent, and it would render local search and backtracking heuristics obsolete in many other AI domains as well.
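(To illustrate: here’s a toy sketch, where brute-force search over a tiny discretized weight grid stands in for the hypothetical practical P=NP algorithm. Obviously no such algorithm is known; the names and the setup are mine, purely for illustration.)

```python
import itertools
import numpy as np

# Toy illustration: if P = NP with a practical algorithm, the decision problem
# "do weights w exist with training loss <= t?" would be efficiently solvable,
# and binary search on t would recover (near-)optimal weights with no gradient
# descent. Here brute force over a tiny discretized grid stands in for the
# hypothetical polynomial-time NP solver.

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def np_oracle(X, y, grid, t):
    """Stand-in for the hypothetical solver: return some w with loss <= t
    if one exists on the grid, else None."""
    for w in itertools.product(grid, repeat=X.shape[1]):
        w = np.array(w)
        if loss(w, X, y) <= t:
            return w
    return None

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # tiny "training set"
y = np.array([1.0, 2.0, 3.0])
grid = np.linspace(-2, 2, 9)                         # coarse weight grid

lo, hi = 0.0, loss(np.zeros(2), X, y)                # bracket the achievable loss
for _ in range(20):                                  # binary search on the threshold
    mid = (lo + hi) / 2
    if np_oracle(X, y, grid, mid) is not None:
        hi = mid
    else:
        lo = mid
print("best grid loss ~", hi, "witness:", np_oracle(X, y, grid, hi))
```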
Comment #14 August 14th, 2025 at 8:36 pm
My concern is: is there anything wrong with the framework or the proof of this paper? I have read the comments on Fortnow’s blog, and it seems that no one has found any real errors, except that someone claimed that the paper did not mention that their theorem only holds in polynomial space.
Comment #15 August 15th, 2025 at 1:52 am
@anonymous #14
Here’s my understanding of the Frontiers paper.
1. The paper makes a critical and strong assumption about how algorithms for CSP must work (“We only need to assume that this task is finished by dividing the original problem into subproblems.”)
2. Lemma 3.1, Theorem 3.2, and Corollary 3.3 depend crucially on this critical assumption. The assumption is made in the text immediately preceding Lemma 3.1, but the assumption is not made for Theorem 3.2 and Corollary 3.3.
3. Without the critical assumption, the statement of Theorem 3.2 contradicts an old result of Ryan Williams.
4. The paper claims to have proved P != NP and SETH in Corollary 3.3, without qualification. But their “proof” of these two conjectures relies implicitly on the critical assumption.
Comment #16 August 15th, 2025 at 2:11 am
anonymous #14: I recommend reading the comment that was submitted to the journal, which is linked from the blog post:
https://people.cs.rutgers.edu/~allender/papers/allender.williams.pdf
That comment makes it very clear how the claimed proof fails: it makes a very strong assumption about the shape that an algorithm for SAT would take, and doesn’t justify that assumption.
The blog post’s comment thread has a rather sprawling discussion that involves space bounds, but the space bounds are all on a tangent and have nothing to do with the main point.
Comment #17 August 15th, 2025 at 5:42 am
Hi Scott,
How long ago did you make the bet with Ernest Davis?
And also, what made you confident? What do you think are the capacities that the IMO contestants have, which these AI systems now also possess?
Comment #18 August 15th, 2025 at 8:54 am
@pnp #15: I noticed a comment regarding the paper’s assumption. The framework of this paper follows Gödel’s work, where Gödel also used an assumption about the nature of proof: a proof is a finite sequence of symbols, and its correctness can be verified mechanically. In this paper, the assumption is that algorithms for CSP must be based on a “divide-and-conquer” approach. This assumption acts as a bridge connecting self-referential instances to the diagonalization method. The question is: is this framework valid? If so, is there a better possible bridge?
Comment #19 August 15th, 2025 at 10:41 am
Anonymous #14: Note that any proof that any polynomial-space algorithm for CSP (or any other problem in NP) requires more than polynomial time would still imply that P is properly contained in NP. Thus it is not the case that “the theorem only holds in polynomial space”. Or more precisely: the authors have not presented a proof that the theorem holds in polynomial space. The discussion of polynomial space is only relevant, in that the algorithm in [Williams] (which explicitly disproves the theorem as stated by the authors) uses more than polynomial space. The authors attempted to claim that this wasn’t a “real counterexample” because — in their opinion — an algorithm that outperforms brute-force search by using additional space is not considered to be a “real” algorithm for solving constraint satisfaction problems by researchers in the community. I disagree with them on that point.
Comment #20 August 15th, 2025 at 10:50 am
anonymous #18:
Don’t forget that a constructive \(NP=P\) proof can be an arbitrarily opaque algorithm. Making such a strict assumption might be some progress if the conditional result were correct (I haven’t read the proof), but it does not come remotely close to actually proving \(NP\) is not in \(P\), let alone SETH. There is a substantial difference between “an algorithm has this concrete, strict form” and “a proof is a finite sequence of discrete characters that can be verified mechanically”. I think the second definition is universal when it comes to formal proofs, because you have to actually represent it in software and actually have a way to verify it. The restriction on algorithms wouldn’t include bubble sort and some other sorting algorithms, for instance.
Comment #21 August 15th, 2025 at 11:37 am
Shaked and Scott (#1,#3)
1) Google’s 2019 quantum supremacy claim — that their quantum computer completed in 300 seconds a task that would take a classical supercomputer 10,000 years — would, if correct, falsify my theory.
2) Once the supremacy claim was largely refuted, Google’s 2019 assertions regarding the fidelity of the 50-qubit samples no longer contradict my theory. (I do not understand why Scott writes otherwise (#3).)
3) Our detailed analysis of the 2019 Google results indicates that these claims rely on flawed methodology. This conclusion holds even in the 12–20 qubit range.
4) Our analysis connects to the broader verification problem: because supremacy claims cannot be directly verified, they rest on extrapolation — from either smaller or simpler circuits. Methodologically unsound optimization can undermine such extrapolation arguments.
Comment #22 August 17th, 2025 at 1:14 pm
Scott,
I’m wondering how your assessment of GPT-5 lines up with the growing popular perception that it’s either worse than, or only very slightly better than, GPT-4 or Opus.
As a daily user of these models, GPT-5 often feels worse to me, and even when I try to measure things more objectively, I don’t see much improvement—in fact, I sometimes see declines in quality.
One intuition I’ve developed after using these systems heavily for over two years is that the benchmarks academics and companies rely on are missing something essential. My sense comes from watching how poorly the models can handle even modest ambiguity in questions.
The analogy I keep coming back to is self-driving cars. On a closed obstacle course, they can perform impressively. But once released into the messy ambiguity of the real world, their many flaws become impossible to miss.
Comment #23 August 17th, 2025 at 5:12 pm
Prasanna #8:
Thanks for sharing two things: (i) a window into how your generation thinks about it, and, as a part of that, (ii) the following:
> “The AI world is riddled with sky high hype, expectations, dooms day predictions, and on the other end of spectrum which ridicules it as Stochastic parrot, scam, bias, hallucinations etc.”
Ummm…
Before you even attempt to find a golden mean, let me note also this:
It’s rather a dichotomous view. And, let me latch on to just one part, the last, viz., hallucinations:
It was in 2022/2023, when young JPBTIs in CS from IITB were being paid in millions and I often in tens of Indian rupees, that I was told, with respect — given my age and my way of talking — by at least one [very] senior HR executive, that we were about to have a “tsunami” of sorts in the jobs market, for which the world was not only not prepared but didn’t even have an inkling. … He gracefully went on for half an hour or so. [And, yes, it *was* that — a graceful talk.]
Within a few months, then on, he was talking about how s/w development jobs were about to be automated fully, and how many talented people [e.g. me] should be planning for my exit / retirement into peace, perhaps permanent and rather graceful too, given his habits of talking in English.
At that time, I was pointing out hallucinations, and other points too, at this blog, and elsewhere too.
…
Alright, where are we?
The Replacement became *CO*-Pilot. At least, every one came to know about it.
I still have my job, still occasionally finding the ways and means to pay even me in the tens of thousands in Indian Rupees “every” month.
Some of the states in the USA have already begun enacting laws to the effect that the AI mustn’t be allowed in psychotherapy without human supervision, always. [It should be OK to try for physiotherapy, esp. for seniors and veterans, I suppose.]
And, oh, by the by, the young JPBTIs did make a killing in the meanwhile. [Ask me to name the names. I remain as shameless as ever, in matters like these.]
…
But given the standards I demand of myself and keep, I still ask myself — and have the knowledge and ability to answer, and hence will answer — the following question, a question which, to my knowledge, none has posed this well or so directly [Scott included, including in his sabbatical stint with OpenAI]:
Granted that you can’t eliminate hallucinations, why is it that LLMs/TextGen/GenText models *do* have *so* much of *actual* success?
It’s been a long time that I figured out the answers too, two answers, and I will mention them both here [may be I should be moving this comment to my blog]:
1. Whatever the degree of success this project of LLMs/GenText/TextGen/Etc. has [actually] achieved, it’s a result of [deliberate] engineering it all in such a way that
1.1. The non-hallucinating part is actually a value addition for the user (over and above Search etc.), AND,
1.2. The hallucinations are kept as easy to “spot”, by the human eye, as possible.
2. The “co-pilot” metaphor manages to highlight the value addition part while keeping hallucinations to the minimum [except for an occasional comment by a non-American-degree-holding Indian engineer like me]
OK, now, one last point.
For how long will this value addition trend continue? can?
Two points, again:
1. Note, they exhausted all the original data — discounting data generation (which corrupts the entire DB), and the subsequent data pollution.
2. Government-sponsored projects ultimately do find an ingeniously subtle way to pass the ultimate burden to the common man.
I guess Scott isn’t going to like this comment. I anyway am caching it. Some day, I might run an [edited] version at my blog. I am not in a hurry. Prof. Dirac’s work — and its undervaluation, starting by he himself and esp. strongly by the Americans — beckons.
Best,
–Ajit
Comment #24 August 17th, 2025 at 11:53 pm
@ #18
There appears to be a philosophical misunderstanding on your part, or mine. Gödel’s assumption that “A proof is a finite sequence of symbols” is a definition of what formal systems are, whereas Xu & Zhou’s assumption that “CSP algorithms must use divide-and-conquer” is just a restriction on possible formal approaches. It’s not in the same category for discussion when comparing a general statement of meaning to an exclusion of methods that artificially narrows scope, which is what naturally brings the conversation to counterexamples until all possible points of failure are proven to be otherwise. And that hasn’t been accounted for within the paper. Unless I’m missing something?
Is the framework valid? Not in what it sets out to do. A bridge built on such a narrow assumption reads like a pier. Their work uses diagonalization in a rather limited sense by disregarding generality with no justification and nothing is truly self-referential, instead just being sensitive to parameters. It’s not “just like Gödel” as claimed in the abstract with no self-reference, paradox, or generality. What’s more, nothing technical in their approach changed compared to their 2023 paper. Consequently, the 2025 philosophy feels ad-hoc as opposed to integral and the authors aren’t making reasonable claims here.
Aside from claiming too much with too little, the false quote brings intentionality into the conversation and not just honest error. And what is one to think? How can a quote be misattributed like that? How can all of it pass through? And then there’s all the other strange circumstances others have mentioned surrounding the work. Could it be that they have seen some philosophical tides changing and simply wanted priority? Or is there space for me to think differently, as I’d like to but am struggling to find the means?
Comment #25 August 18th, 2025 at 8:44 am
I have been using ChatGPT 4 and now ChatGPT 5 on a daily basis for my work for about a year now. From most useful to least useful:
(1) Evaluating philosophical arguments. This has been amazingly useful, but then I am not a professional philosopher.
(2) Just looking things up and summarizing things (people, places, things, events). This usually works pretty well.
(3) Evaluating C++ or Csound code that I have written. Useful, but there are errors.
(4) Writing new C++ or Csound code. Not very useful. Here I am trying to do more original work; my intention is often not grasped, and there are frequent errors and hallucinations. But it’s useful enough that I keep trying to use it.
My impression is that almost everybody will end up working in this way, and I am by no means certain what that will end up meaning.
However, I will venture that computing and networking are contributing to an increase in social and political alienation, and a loss of access to reliable judgment. I don’t think chatbots are going to help with this, and may well ensconce us ever more deeply in our unreliable cocoons.
Comment #26 August 19th, 2025 at 1:13 am
Hi Scott, do you see any likely candidate for a problem in NP – P (classically hard, but checking the answer is classically easy) that quantum computer might be able to solve in the near future?
Comment #27 August 19th, 2025 at 3:08 am
Has anyone considered applying forrelation to protein crystallography? There, one has an empirical dataset of the Fourier transform of a crystal, and one wishes to compare it with a conjectured model one has built (possibly with a huge amount of effort, or at worst using AlphaFold).
This remains one of the central tasks in the field of protein structure. AlphaFold gets you a long way but examining the “ground truth” is still very important, including for pharmacology. One caveat: the dataset is an incomplete picture of the FT, having amplitudes but not phases.
(In the Q2B lecture describing forrelation you asked for suggested applications! Anybody who knows the computational-structural-biology “stack” should have had the same thought as me. But I can’t immediately find anything on Google)
Comment #28 August 19th, 2025 at 12:11 pm
Martin Mertens #26:
Hi Scott, do you see any likely candidate for a problem in NP – P (classically hard, but checking the answer is classically easy) that quantum computer might be able to solve in the near future?
If by “solve” you mean “scalably solve, much better than a classical computer could solve it”—then my strong suspicion is that this will happen when and only when someone builds a true fault-tolerant QC (and then the NP-P problems will just be the usual suspects, like factoring and discrete log and elliptic curve problems).
Will that happen in the “near future”? Given the tremendous effort now being invested, it isn’t impossible, but it’s certainly far from guaranteed!
But also, people should absolutely try to falsify my suspicion, by continuing to look for NP problems for which a clear quantum speedup is achievable on a NISQ device. I just don’t think we know such a problem right now.
Comment #29 August 19th, 2025 at 12:16 pm
Hamish Todd #27: It’s an interesting idea, but I see two severe problems.
First, with datasets from the real world, it will almost always be faster and easier just to do an FFT on a classical computer. The forrelation algorithm becomes a clear win mostly if you have functions f,g with exponentially many inputs (say, 10^100 of them), which are specified by some compact rule that lets you calculate f(x) or g(y) for any desired x or y.
And second, if indeed you only have the amplitudes but not the phases, then that’s fatal for the forrelation algorithm.
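(To make the comparison concrete, here’s a toy numpy sketch, mine, of the classical computation in question, for f and g given as explicit ±1 truth tables. The whole point is that this only makes sense when the 2^n entries fit in memory, which is exactly the regime where you don’t need a quantum computer.)

```python
import numpy as np

# Toy sketch: classically computing the forrelation of two explicitly given
# Boolean functions f, g : {0,1}^n -> {+1,-1}, stored as +-1 vectors of
# length 2^n. For real-world-sized data you'd just do this (or an FFT);
# the quantum algorithm only wins when 2^n is astronomically large and
# f, g are given by compact rules rather than by truth tables.

def hadamard(n):
    """The 2^n-by-2^n normalized Walsh-Hadamard matrix (n-fold tensor power of H)."""
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.kron(H, H1)
    return H

def forrelation(f, g):
    """Phi_{f,g} = (1/2^n) * sum_x f(x) * ghat(x), where ghat is the transform of g."""
    n = int(np.log2(len(f)))
    g_hat = hadamard(n) @ g
    return float(np.dot(f, g_hat)) / len(f)

n = 6
rng = np.random.default_rng(1)
g = rng.choice([-1.0, 1.0], size=2**n)
f_good = np.where(hadamard(n) @ g >= 0, 1.0, -1.0)   # f chosen to track the sign of ghat
f_rand = rng.choice([-1.0, 1.0], size=2**n)          # unrelated random f
print(forrelation(f_good, g))   # large (typically around 0.8)
print(forrelation(f_rand, g))   # small (order 2^{-n/2})
```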
Comment #30 August 19th, 2025 at 6:13 pm
Scott:
Problem 1 sounds like a showstopper. While parts of the problem have an exponential nature (famously: the number of possible angles is exponential in the number of amino acids), most of it seems cubic at most.
To problem 2: to be clear, the empirical data lacks phases, but then the exact problem one is trying to solve (partly algorithmically, partly with help from AlphaFold, and significantly still with human intervention) is to put them back in. How this is done is significantly a matter of constraints coming from other sources. But it is also a matter of finding a model whose Fourier transform is “close” (presumably in your sense) to the empirical data.
This at least merits me reading about forrelation. I presume the main reference is still “Forrelation: A Problem that Optimally Separates Quantum from Classical Computing”?
Comment #31 August 19th, 2025 at 8:02 pm
@ #24
To rigorously analyze the complexity of solving Constraint Satisfaction Problems (CSPs), it is imperative to constrain the underlying mathematical framework—since determining the complexity of solving problems, at its core, constitutes a self-referential proposition. Addressing such paradoxical self-reference requires the deliberate construction of self-referential instances, which serve as both the subject and the mechanism of analysis.
The divide-and-conquer paradigm offers a reasonable, rather than overly narrow, assumption for tackling CSPs, particularly given their inherently unstructured nature. Ultimately, the necessity for brute-force computation in solving CSPs arises not from the chosen assumptions, but from the self-referential characteristics embedded within the problems themselves. In other words, self-reference stands as the key and sole approach to understanding—and formally proving—the inevitability of brute-force methods.
Comment #32 August 20th, 2025 at 7:53 am
Daniel #31:
A problem with the reasonableness of the divide-and-conquer assumption is that I’m well aware of practical algorithms that do not follow it: e.g., the blossom algorithm for finding maximum matchings in graphs, and linear programming algorithms such as Karmarkar’s algorithm, the ellipsoid method, and the simplex method. Some alleged \(NP=P\) proofs reveal already in the abstract that they are not divide-and-conquer (though those are also sometimes guaranteed to fail by known theorems before one even reads them). GNFS, the most efficient known classical algorithm for factoring big integers, isn’t divide-and-conquer either, and considering that factoring efficiently reduces to 3SAT, I’d expect a \(NP=P\) algorithm might well be complex too, if it exists.
Comment #33 August 20th, 2025 at 8:27 am
Does any paper (Google 2019 or other) give us direct proof (not some unverified extrapolation) that a quantum computer completed in 300 seconds a task that would take a classical supercomputer 1 month (weakening the “10,000 years” claim; I take a weak version of Gil #21)? If not, I don’t understand why (otherwise I will be glad for references).
Comment #34 August 20th, 2025 at 9:55 am
Evyatar #33: No, it doesn’t. This is what I and others have been trying to explain. The reason why not is that the problems that fit on current, non-fault-tolerant quantum computers are not like factoring or discrete log. It becomes intractable to classically verify the results in precisely the same regime where it becomes intractable to classically calculate them, and for the same reasons. This is why current quantum supremacy claims rely on plausible extrapolations from the regime where we can (often with great effort) classically verify the results.
Comment #35 August 20th, 2025 at 10:19 am
Scott #34:
I’m not sure I understand.
I understand that verification and sampling are on the same level (not like factoring or NP problems).
But is a 1-month verification (without extrapolations) on a classical supercomputer not feasible? Or 1 week?
Comment #36 August 20th, 2025 at 12:34 pm
Evyatar #35: A month (or weeks, or whatever) of verification using a classical supercomputer, costing hundreds of thousands or millions of dollars, is exactly what people do do.
Comment #37 August 20th, 2025 at 12:52 pm
Scott #36:
OK. Is any reference available for this? I searched for it in the past and haven’t found such verification
Comment #38 August 20th, 2025 at 2:05 pm
@ Daniel #31,
I appreciate your reply, though it’s not addressing why the assumption supposedly holds or refuting the possibility of counterexamples. And somewhat just reasserting claims made in the paper? Symmetry-induced fixed-points are self-similar under transformation, but not reflexive in the Gödelian sense. No formal instance in the paper actually encodes, or refers to, its own satisfiability/unsatisfiability like “This sentence is false” or “This sentence is x”. Replication via construction is not inherent self-reference. Calling it “self-referential” is metaphor attempting to masquerade as formal demonstration. If I’m not mistaken and please let me know if so, my earlier assessment still stands.
—
It’s possible one could debatably call it “quasi-self-referential” and yet, if that’s accepted within the community, then there are numerous other structures across mathematics, physics, and logic deserving the same label. Still, that’s not the conversation the authors were attempting to have within the paper.
Comment #39 August 20th, 2025 at 3:25 pm
Sorry, but posts like #31 or #18 sound just like AI-generated nonsense. Also, half the comments on Lance’s blog are of a similar nature.
Comment #40 August 20th, 2025 at 4:45 pm
Evyatar #37: Yes, read any of the quantum supremacy papers, like those from Google or Quantinuum or USTC. They explain how they did the classical verification.
The issue, again, is that the classical computation you do to verify the QC’s outputs is almost the same as the one you do to spoof them. So you can have spoofing take a month, but you can’t have it take a billion years, or else verification would take a billion years as well.
Comment #41 August 20th, 2025 at 8:02 pm
@anonymous #32
Computational problems generally fall into two broad categories: algebraic and combinatorial. These two types differ significantly in terms of algorithm design and computational complexity.
Algebraic problems possess inherent structure, which can often be exploited to develop various algorithms. Representative examples include linear programming you mentioned, primality testing, fast matrix multiplication, and the well-known integer factorization problem.
In contrast, combinatorial problems tend to lack such structure, making them less amenable to algorithmic exploitation. General Constraint Satisfaction Problems (CSPs) are fundamentally combinatorial in nature. Remarkably, every CSP has been proven to be either in P or NP-complete, highlighting a stark dichotomy in their complexity classification. Consequently, the landscape of exact algorithms for CSPs is quite limited, and divide-and-conquer emerges as a natural and often necessary strategy for tackling them.
For a more in-depth discussion, see: Barak, Boaz. Structure vs. Combinatorics in Computational Complexity. Bulletin of EATCS, 1(112), 2014. Available at: https://www.boazbarak.org/Papers/meta-alg.pdf
Comment #42 August 20th, 2025 at 8:08 pm
@ Jacob #38
Regarding the assumption, please see my reply to #32.
I think that the authors have made a clear comparison between their work and Gödel’s work in the extended abstract of their paper:
“Gödel’s proof and our proof address two different cases (i.e., first-order logic and propositional logic, respectively). The essence of both proofs (i.e., the use of self-reference) is the same, but there are differences in constructing the self-referential object. Specifically, this involves constructing an object such that negating it results in an object equivalent to itself. In Gödel’s work, the self-referential object is a logical formula. In our work, the self-referential object is an infinite set of satisfiable and unsatisfiable examples. “
Comment #43 August 20th, 2025 at 9:15 pm
Daniel #41: I’m not sure how much I agree with you, or Boaz, in associating algebraic problems with “structure” and combinatorial problems with “lack of structure.” For example, the maximum matching problem is “combinatorial” if anything is, while the permanent is “algebraic,” yet matching seems to have much more structure enabling an exponentially faster algorithm than anything we know for the permanent. In general, if the property that we care about is “having an efficient algorithm,” then (alas) it will be easy to come up with counterexamples to any attempt to equate that property with any property that’s easier to check.
Comment #44 August 21st, 2025 at 1:56 am
Scott #40:
Well, I might be misinterpreting the 2019 Google paper (supplementary material), but it seems that no such verification was done above 5 hours (FIG. S50(b), page 59; TABLE X, page 53; and TABLE VII, page 47).
Google Paper
I don’t want to disregard those efforts (which cost around $50K) or the roughly 10x acceleration, but the paper’s claims of a 10,000-year extrapolation and a new paradigm (which kind of breaks the Church-Turing hypothesis) are on a different level.
Comment #45 August 21st, 2025 at 4:25 am
Scott #43:
Generally speaking, algebraic problems tend to exhibit more inherent structure than combinatorial ones. However, when it comes to specific cases, determining which problem possesses greater structure can be quite subtle. The authors appear to have reframed the question: rather than asking which problems are more structured, they focus on identifying problems that have no structure and therefore necessitate exhaustive search. They argue that this shift in perspective makes it much easier to prove lower bounds.
The authors stated in the abstract: “Specifically, proving lower bounds for many problems, such as 3-SAT, can be challenging because these problems have various effective strategies available to avoid exhaustive search. However, for self-referential examples that are extremely hard, exhaustive search becomes unavoidable, making its necessity easier to prove. Consequently, it renders the separation between non-brute-force computation and brute-force computation much simpler than that between P and NP. “
Comment #46 August 21st, 2025 at 5:43 am
@ Daniel #41 & #42,
That’s a respectable citation, and I think it’s meant to be more reflective/exploratory, intended to inspire rather than prescribe. I’m quite interested to know what explicitly connects Xu and Zhou’s assumption of self-reference to Borak’s paper and formally justifies it? Maybe you could help me out here, Daniel? If I haven’t been clear enough yet, I apologize for that. The authors say:
“The essence of both proofs (i.e., the use of self-reference) is the same, but there are differences in constructing the self-referential object.”
This presumes a false equivalence between that which is in question. Since there’s no discussion in the paper regarding how the structures are essentially equivalent, there’s no actual case that’s made in the paper regarding its own self-reference. They aren’t “essentially” equivalent and it’s erroneously assumed. The authors repeat their comparison constantly throughout the paper, but repeating claims isn’t justification or proof. It can instead be a cudgel of fallacious heuristics that attempts to wear down one’s common sense. And I’m rather curious to know the reasoning behind the nature of this repetition?
Comment #47 August 21st, 2025 at 6:54 am
And if I may, I’d just like to sincerely apologize for incorrectly spelling Boaz Barak’s name! No excuses, just tunnel-vision and a lack of attention to detail on my part.
Comment #48 August 21st, 2025 at 4:02 pm
#44 Evyatar, I had no idea the verification was so weak.
Extrapolating 5 hours to 10,000 years is absurd.
Comment #49 August 21st, 2025 at 10:14 pm
@ Jacob #46
If you look up the definition of self-reference on Wikipedia, you will find that self-reference takes various forms, and Gödel’s self-reference is just one of them. The core of Gödel’s contribution lies in addressing how to represent self-referential statements—such as “This sentence is unprovable”—within the framework of first-order logic, achieved through the ingenious use of Gödel numbering.
The self-referential set constructed by the authors differs in form from Gödel’s self-referential statements, yet they share the same essence—namely, that negating the object yields the object itself. This paradoxical property underscores the deeper distinction between syntax and semantics, revealing how self-reference can be employed to probe the limits of formal reasoning.
Consider a simple example: a set containing both positive and negative numbers, where the task is to determine the sign of each element. Ordinarily, this can be easily accomplished by appealing to the semantic definitions of “positive” and “negative.” But can this task be completed without relying on those semantic definitions? In certain cases, yes. For instance, suppose all positive numbers in the set have absolute values greater than 2, while all negative numbers have absolute values less than 2. In this scenario, we can distinguish between positive and negative numbers purely by examining their absolute values, without invoking their semantic labels. It is evident that positive and negative numbers are transformable into one another via multiplication by -1. We define multiplying a set by -1 as applying this operation to each of its elements. If the set is symmetric with respect to sign—for example, {1, -1, 2.6, -2.6, 3.4, -3.4}—then multiplying the entire set by -1 yields the same set. In such cases, the distinction between positive and negative numbers becomes indistinguishable through structural means alone and must be made based on their semantic definitions.
The gap between syntax and semantics is explained in more detail in the comments by Wanwei Liu included in the appendix of the paper:
“In my opinion, the instances created in Xu and Zhou’s paper can be seen as twin instances: although they share the same syntax, they differ in their semantics, and semantical feature cannot be captured by their (even, have no) syntactical counterpart.
Then, from Gödel’s theorem (of incompleteness) to Tarski’s (undefinability) theorem, then to Turing’s “uncomputability”, maybe the gap between syntax and semantics is the “manipulator” hidden behind them. Xu and Zhou’s paper makes me notice this important issue that has once been ignored.”
Comment #50 August 22nd, 2025 at 5:53 am
@ Daniel #45,
First and foremost, I believe that to be a post-hoc reframing of the paper, which does not readily indicate such thinking itself. It’s also a sentiment in the zeitgeist that isn’t novel or unarticulated at this point and perhaps just awaiting an effective, formal, and explicit exposition. However, in this new interpretation you note:
“The authors appear to have reframed the question: rather than asking which problems are more structured, they focus on identifying problems that have no structure and therefore necessitate exhaustive search.”
Yet the authors have identified only one problem, and it relies on symmetry to create hard instances, which is itself structure. The very construction that makes it hard relies on structured relationships between variables and constraints. So they’d simultaneously be using structure to create the problem while also claiming it has no structure to argue it’s hard. That’s incoherent. Either way, the paper doesn’t explore this reframed question, or implicitly answer it regardless of whether it had been posed.
It’s an interesting line of thought, but requires a great deal to explicitly demonstrate. I think there could be value in studying problems that one might say are “maximally resistant” to algorithmic shortcuts, even if we can’t prove they have absolutely no structure. But if that’s what’s sought, it would require much more careful definitions, far broader analysis than just Model RB, and rather modest claims compared to what’s done in the paper. All this also involves addressing counterexamples, and the general possibility of such, which is yet to be done. How would one begin to wrestle with the unstructured without embedding any structure too, and are we biased against this by default in having inherent structure ourselves?
Comment #51 August 22nd, 2025 at 4:42 pm
@ Daniel #49,
Delegating the definition of self-reference to Wikipedia as you’ve suggested only serves to broaden it beyond any possible measure of formal meaning. This is a technical claim made in an academic journal and Wikipedia does not suffice as a source. To be honest, none of your reply remotely begins to address my core critique or much of anything I’ve said to substantiate it.
The example you provided indicates a misunderstanding on your part, in mistaking syntax for semantics. The negative symbols are syntactic. The reason for this is that it doesn’t matter whether we assign the “definitions” of positive and negative to the symbols; what matters is the syntactic rules defining their relationship. Any two labels would suffice, so that’s not really a definition to begin with. “Positive” and “Negative” simply have specific formal properties which are in relation with one another. And that’s what matters. Surely others have their own recommendations, but I’ve found Hao Wang’s book “From Mathematics to Philosophy” to be an excellent read that may help with any confusion, Chapter 5 specifically for that. I’m curious to see your next example. My own take here philosophically is that “relationship” can be viewed as the fundamental primitive for syntax, whereas one’s search of a primitive for semantics inevitably hits the bedrock that is “ineffability” if diligent. Maybe that helps?
Ultimately, there’s still the misapplication of self-reference by equating it with a form of bistability, or symmetry-induced invariance under negation, which is distinct from Gödel’s reflexive, meta-logical self-reference. Negation yields “the object itself” as an invariant set, but there’s no reflexivity because the object doesn’t “claim” anything about its own states. Gödel’s sentence doesn’t equal its own negation; it makes a claim about its own provability. Can you address this, Daniel?
The “gap” that’s evoked in a diagonal argument is referential, not self-referential. It’s participatory in a sense that’s qualitatively distinct as a proof, “looping in” the evaluator, but still doesn’t inherently invoke self-reference. And the authors of the comments in the appendices all seem to buy in to the same presumptive mistake while declaring profundity. I find the way everyone’s talking in the paper to be… not very helpful when considering the other strange circumstances surrounding its publication.
Comment #52 August 22nd, 2025 at 8:11 pm
@ Jacob #50
I believe a comment from Fortnow’s blog provides an answer to your question. For your convenience, I’ve copied it below.
———————————-
This paper appears to present several novel ideas. A straightforward method to assess the validity of a new idea is to examine its applicability to other problems. Specifically, this paper aims to demonstrate the necessity of brute-force computation by constructing self-referential instances, and confirms that such instances can indeed be constructed for Constraint Satisfaction Problems (CSPs) with growing domains. To apply this idea to other problems, we need to identify the properties under which self-referential instances can be constructed to establish the necessity of brute-force computation.
In Appendix B of this paper, I note that reviewer Bin Wang highlighted a crucial yet easily understandable property: growing domains significantly weaken the correlation between assignments, making the solution space appear nearly independent. Formally speaking, two assignments I and J are independent if Pr(I, J are solutions) = Pr(I is a solution)·Pr(J is a solution). The near-independence property ensures that the slight change made by the symmetry mapping will not affect the remaining solution space, thereby enabling the construction of self-referential instances.
This property reminds me of the well-known hard problems like Independent Set (or Clique), where the task is to determine whether a graph with n vertices contains a subset of k vertices such that no two are adjacent. Currently, no known algorithm can find an independent set of size k=n^epsilon more efficiently than brute-force search [1, 2]. For this problem, it is easy to observe that, between two random candidate sets of size k=n^epsilon, with high probability the two sets have no overlap. Consequently, such a solution space exhibits near-independence. Given this characteristic, it should be feasible to construct self-referential instances using an approach analogous to that presented in this paper.
[1] R. G. Downey, M. R. Fellows. Fixed-parameter tractability and completeness II: On completeness for W [1]. Theoretical Computer Science, 1995, 141(1-2): 109-131.
[2] J. Chen, X. Huang, I. A. Kanj, G. Xia. Strong computational lower bounds via parameterized complexity. Journal of Computer and System Sciences, 2006, 72(8): 1346-1367.
Comment #53 August 23rd, 2025 at 5:09 am
@ Daniel #52,
I appreciate the continued efforts, but that comment feels like a deflection and somewhat flagrant obfuscation of the conversation’s direction. It’s repeating the same mistakes I’ve pointed out previously, now just with these particular structures instead. And still, what of the counterexamples everyone’s mentioned? Could you read my other comments again and let me know if I’ve been unclear? Growing domains weakening correlation and statistical independence are not self-referential either. They’re essentially different. This erroneous approach persists throughout the whole paper and that includes the appendices.
Comment #54 August 23rd, 2025 at 7:26 am
@ Jacob #51
Let F denote a proposition, P be a predicate, and P(F) indicate that the proposition F is provable. Then Gödel’s self-referential proposition can be expressed as F being equivalent to the unprovability of F, i.e., $F=\neg P(F)$.
Comment #55 August 23rd, 2025 at 12:12 pm
I don’t quite understand why people are actually defending this paper. The authors have already conceded that their claims in the paper rely on assumptions that are not stated. They have also after the publication repeatedly changed what they purport to prove. Those things alone should be reason enough to retract the paper. The fact that they haven’t retracted suggests that they either may not fully understand what it means to prove a claim or that they are not acting in good faith. They should explicitly state what they are proving under what assumptions. As long as that is unclear (and it is quite unclear), the discussion seems a bit useless to me.
Comment #56 August 23rd, 2025 at 12:48 pm
@ Daniel 54,
Your example is a third-person declarative description of the self-reference that’s lacking throughout their paper. This kind of meta-description is what Xu and Zhou lean on for their “self-referential” claims, but without the formal encoding that makes Gödel’s statement truly self-referential, so those claims are unequivocally false. Thanks for providing a definition that illustrates what’s missing from their construction. I suppose you see my point then?
Comment #57 August 23rd, 2025 at 1:13 pm
Tom #55: Amen.
Alas, from 25+ years of experience with P vs NP would-be-resolvers, I learned that they don’t operate by the same rules as the rest of us. They get to change their argument and even what claim they’re making an unlimited number of times. And as long as you haven’t proved that they couldn’t reach a correct and relevant insight at some point in the future, you’re the unimaginative hidebound skeptic and all the burden rests with you.
Comment #58 August 23rd, 2025 at 8:58 pm
@ Tom #55
From my understanding, the reason some people—including myself—have defended this paper is largely rooted in dissatisfaction with the current state of research in computational complexity theory, particularly regarding the proof of complexity lower bounds. Clearly, a paper that presents new ideas, even if it still needs refinement, is far better than one that is entirely correct but lacks any substantive results. Meanwhile, I also believe that figuring out why proving complexity lower bounds is so difficult is actually more important than the results themselves. Some of the analyses and explanations provided by the authors in the paper have also addressed the confusion I had before.
Comment #59 August 23rd, 2025 at 9:25 pm
@ Jacob #56
I greatly appreciate the authors’ idea of studying instance hardness by constructing self-referential objects. From a historical comparative perspective, this is undoubtedly a very promising idea. What deserves reflection is why this idea was not proposed in the past few decades. Moreover, I believe the authors have grasped the essence of self-reference in constructing these objects, and they have demonstrated the feasibility of this approach. I also agree with a previous comment that what needs to be done now is to apply this idea to more problems to further test its feasibility.
The essence of a self-referential object lies in containing contradictory aspects without being self-contradictory itself. The liar paradox, for instance, is self-contradictory because F cannot be equivalent to \neg F. In contrast, Gödel’s self-referential statement is not self-contradictory, as F can be equivalent to the unprovability of F. A more intuitive example is a coin: it simultaneously contains heads and tails. We can say that the head and tail of a coin form a pair of contradictions, for at any given moment, the coin’s state is either heads up or tails up. On the other hand, the coin itself is not self-contradictory, because it cannot be both heads up and tails up at the same time. The two states can transform into each other through a single action—flipping. The property of simultaneously containing two contradictory aspects and enabling their mutual conversion through the same action prevents rule-based reasoning; only by examining based on semantic definitions can one obtain the state of the object.
In fact, for most practical situations, we can infer the state of an object through reasoning. For example, to determine whether the water in a box is in a liquid or solid state, we don’t need to open the box to check; instead, we can reason based on temperature. This is because the process of water turning into ice and that of ice turning into water are two opposite processes—one involves cooling, while the other involves heating. One can imagine that if there existed a single process that could both turn water into ice and ice into water, then we would be unable to reason and would have no choice but to open the box for examination.
The core purpose of constructing self-referential objects lies in determining clear boundaries for reasoning. The essence of algorithm design is to formulate reasoning rules based on the characteristics of problems; thus, constructing self-referential objects serves as a feasible method to determine the boundaries of algorithms. The key to constructing Gödel’s self-referential statements depends on the properties of prime numbers, whereas for combinatorial problems, the key to constructing self-referential instances relies on the near independence of the solution space. Therefore, the methods for constructing self-referential objects are closely related to the nature of the problem. On the other hand, given that the boundaries of reasoning themselves are characterized by diversity and dynamism, the ways of constructing self-referential objects are naturally not rigid or fixed patterns. It can be anticipated that with the passage of time and the deepening of cognition, people will surely create more forms of self-referential objects, and the classic method of “self-reference”—which embodies logical wisdom—will rejuvenate with new vitality through continuous exploration.
Comment #60 August 23rd, 2025 at 10:47 pm
@ Jacob #53
I am not an author of the paper, so it is impossible for me to read your comments carefully and respond to all of them. At present, I am only very interested in the method proposed in the paper. Based on my current understanding, I think there is nothing wrong with the paper’s method, and this method holds promise for proving lower bounds for some problems that I am interested in.
In my opinion, the best way to understand a paper is to try applying its proposed method to solve new problems, rather than reading the paper word by word. Additionally, regarding the counterexamples you mentioned, I believe there’s a well-documented discussion on Fortnow’s blog, and I’d recommend checking it out there: https://blog.computationalcomplexity.org/2025/08/some-thoughts-on-journals-refereeing.html
Comment #61 August 24th, 2025 at 6:22 am
If I may put on both monocle and tin-foil hat: the practically ubiquitous trend among most of the anonymous defenders is repetition and deflection. On Lance Fortnow’s blog, when the going got tough, the paper’s defenders backpedaled to philosophy. On this blog, when the going gets tough with philosophy, they pedal back to technicalities. If they stick with any one topic long enough, their foundation is revealed to be sand. And if we imagine that the authors really do know what a legitimate proof is, the possible alternatives fall into place.
I suspect some of these defenders have real skin in the game, and are trying to muddy the waters so that this paper could be the one cited decades later as the “flawed, but visionary” work that ushered in a new paradigm. The hope seems to be that they have manufactured enough ambiguity, within the paper and outside of it, to claim a great many future developments as emerging from under their banner. I would find this much easier to dismiss if the authors had not misquoted Chaitin, skipped over the normal peer review process, refused retraction despite clear refutations from the community, and then written enough loose jargon in the appendices to lay claim to inspiring any future discovery pertaining to “gaps” in some sense. It’s a way of hedging their bet over the long term, should the proofs themselves be rejected by the community, which they could very well have predicted given the strange nature of the testimonials and defenses.
To quote the anonymous user Daniel quoted, “In Appendix B of this paper, I note that…”, and the key point is the “I” in that statement. It was posted anonymously, in first-person language, and reads as coming from someone with a personal role in the paper. One might also wonder whether the backpedaling to philosophy on Fortnow’s blog, once challenged here, is what caused “Daniel” to emerge as a named defender, just as “notbad” did on Fortnow’s.
The attempt to secure a historical footnote as “flawed visionaries” was predicated on the hope of creating enough rhetorical fog to obscure the truth. They hoped that by kicking up dust with shifting arguments, from technicalities about space complexity to grand pronouncements about Gödel, they could create the appearance of a legitimate, complex debate. In such a scenario, a future academic might charitably cite them for having had the “right idea” even if the execution was wrong. But we’ve seen the paper for what it is, and how each defender contributed along the way, and the only bona fide novelty to emerge from them was perhaps the nature of their misunderstandings. I agree that this legitimately feels like a bad-faith effort, and a coordinated one at that, aimed at being cited as a precursor should all of this be rejected by the community now. They appear to have forgotten both Icarus and Ozymandias in their gambit for glory.
The only place their strategy might have a shred of success now is in the far future, in a different field, with someone doing a cursory historical search who is unaware of the context. But even then, the first thing a diligent search will uncover is these very blog posts: the critiques are now inextricably linked to the paper itself. I know all this sounds a bit out there, but with everything else going on, I figured it was worth trying to game it out, to get at the motives behind such things. I guess the question is whether I should take off the monocle or the tinfoil hat. It’s my first time around this particular carousel, so perhaps I’m just too close to it and have yet to learn not to stare into this abyss of irrationality. All I know for sure is that this has been weird.
Comment #62 August 24th, 2025 at 2:35 pm
[…] which—quietly, on p. 6—includes the sensational “ten-septillion over classical” claim. It was featured alongside seven other developments on Shtetl-Optimized (SO). The discussion thread includes a brief […]
Comment #63 August 24th, 2025 at 3:53 pm
Just for the record, I wrote and submitted #61 before seeing #58, #59, and #60. They were posted in the same batch if I’m not mistaken.
Comment #64 August 25th, 2025 at 1:07 pm
If you ask one of these AIs to formulate an appropriate question for the International Math Olympiad, how does it do?
Comment #65 August 26th, 2025 at 2:24 am
I have caught GPT-5 making what appears, rather unambiguously, to be an attempt to gaslight me. I trust it even less than its predecessor.
Comment #66 August 28th, 2025 at 1:34 pm
1) Kudos to Evyatar (#44) for making a simple yet powerful point that highlights a serious weakness in the Google 2019 experiment. For those interested, I gave a few comments on my blog about how it connects to our work (and also about the difference between spoofing and verification hardness).
2) Scott, thanks for the update on #3. It’s true that our reasoning about why the Google 2019 experiment was carried out incorrectly is both complicated and delicate. You and I have discussed this several times—both here and privately—over the past six years.
3) In my view, the more restrained, “not-hyped” attitude of the Google Quantum AI team this time deserves appreciation. (Of course, it remains important to carefully scrutinize the scientific claims in their paper.)
4) A broader, “meta” question—related to my two-decade debate with Scott on quantum computing—is the relevance of “Aumannian rational dialogue” to disagreements and debates. Specifically: to what extent can rational conversation based on Aumann’s theorem (and the theorem itself) survive in an environment of biased inaccuracies? I believe this is an interesting research question (perhaps one already studied).
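(For readers who have not seen it, the theorem in my paraphrase: if two Bayesian agents share a common prior, and their posterior probabilities for some event $A$ are common knowledge between them, then those posteriors must be equal. Schematically: common prior, plus common knowledge of $\Pr_1[A]=p$ and $\Pr_2[A]=q$, implies $p=q$; rational agents “cannot agree to disagree.”)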
I became interested in Aumann’s theorem during my early university years and had many conversations about it (including with Aumann himself). Scott has written about it in various places, including a 2015 blog post that sparked a lively discussion. In other venues, he even cited our disagreement about quantum computing as an example. Interestingly, Steven Pinker has just published a new book that touches on this topic, and an example he raised also figured in that 2015 thread.
In fact, in the context of scientific disagreements that await future experimental (or theoretical) resolution, the relevance of Aumann’s theorem seems especially limited. It may also be useful to distinguish between two agents who aim to convince each other (or jointly seek truth), and two agents who aim to persuade a third party or a wider audience. Relatedly, Jacob Glazer and Ariel Rubinstein have written about debates where the goal is persuasion of others—for example, in their paper “Debates and Decisions: On a Rationale of Argumentation Rules.”
Comment #67 August 28th, 2025 at 11:12 pm
The current AI apps are just a pleasure to have a conversation with about quite technical subjects; it’s like having your own postgrad tutor on tap for free. For instance, I just had a very informative discussion about the role of Dirac as the SOLITARY English/British person at the dawn of quantum mechanics almost exactly 100 years ago, and the reasons for his isolation compared to Heisenberg, Born, Jordan, Pauli and others in Göttingen, Copenhagen and elsewhere on the (European) continent.
But I’m still pretty sure that AI won’t ever “get” human quirks; it’ll be like a super-intelligent autistic person, never understanding or feeling basic human emotions and foibles.
But we’ll see, I guess.
Bravo on the Zionism post.
Comment #68 August 29th, 2025 at 5:27 am
I’m not sure if this is already out there in the folklore, but, for the pedagogy of it all, what about a time-bounded SAT fixed point with explicit Cook-Levin accounting? I’ve written a technical note on this, though I’m unaffiliated and have no college degree, so that means no arXiv. You just gotta throw the DOI 16989440 into the search bar on Zenodo. It comes from a well-trodden path, and hopefully the added tangibility will help some folks understand formal self-reference better.
@ Daniel #60,
You have a tendency not to speak to me, but at me. I’d try to interpret you more charitably if it didn’t feel like you were keen to use me as a prop for your theatrics. Coordination doesn’t necessarily mean authorship, but volunteering a denial of it when no one asked is an interesting signal in itself. With #58, #59, and #60 coming out in the same batch as #61, I believe you’ve answered my final question in #24.
Comment #69 September 11th, 2025 at 3:39 pm
Regarding the Springer paper: I find the claim that all solving has to be “brute force” to be wrong. There are known ways (*) to solve NP-complete problems that aren’t obvious exhaustive searches, if you define brute force/exhaustive search as explicitly trying all candidate solutions, each in the same way one would test a single candidate. One standard example is sketched below.
(*) not to say those algorithms aren’t exponentially costly in some ways (i.e. we still have P=NP).
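For concreteness, here’s a minimal DPLL-style sketch in Python (my own illustration, not taken from the paper or from any particular solver): unit propagation plus branching decides satisfiability without explicitly enumerating all 2^n assignments, even though the worst case is still exponential.

# Minimal DPLL: clauses are lists of nonzero ints (positive literal = variable
# true, negative = variable false). Returns a satisfying assignment or None.
def dpll(clauses, assignment=None):
    if assignment is None:
        assignment = {}
    # Simplify every clause under the current partial assignment.
    simplified = []
    for clause in clauses:
        remaining = []
        satisfied = False
        for lit in clause:
            var, want = abs(lit), (lit > 0)
            if var in assignment:
                if assignment[var] == want:   # clause already satisfied
                    satisfied = True
                    break
                # otherwise this literal is false under the assignment: drop it
            else:
                remaining.append(lit)
        if satisfied:
            continue
        if not remaining:                     # empty clause: conflict
            return None
        simplified.append(remaining)
    if not simplified:                        # every clause satisfied
        return assignment
    # Unit propagation: a one-literal clause forces that literal's value.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})
    # Otherwise branch on the first unassigned literal, trying both values.
    lit = simplified[0][0]
    for value in (lit > 0, not (lit > 0)):
        result = dpll(simplified, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)  -> some satisfying assignment
print(dpll([[1, 2], [-1, 3], [-2, -3]]))
# (x1) and (not x1)  -> None (unsatisfiable)
print(dpll([[1], [-1]]))

Unit propagation and branching already prune large chunks of the assignment space (and real solvers add clause learning, restarts, etc.); the point is only that “decide SAT” need not mean “test every assignment.”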
Comment #70 September 11th, 2025 at 3:56 pm
typo:
(i.e. we still have P!=NP)