The QMA Singularity
Update (Sep. 29): Since this post has now gone semi-viral on X, Hacker News, etc., with people arguing about how trivial or nontrivial was GPT5’s “discovery,” it seems worthwhile to say something that was implicit in the post.
Namely, GPT5-Thinking’s suggestion of a function to use “should have” been obvious to us. It would have been obvious to us had we known more, or had we spent more time studying the literature or asking experts.
The point is, anyone engaged in mathematical research knows that an AI that can “merely” fill in the insights that “should’ve been” obvious to you is a really huge freaking deal! It speeds up the actual discovery process, as opposed to the process of writing LaTeX or preparing the bibliography or whatever. This post gave one tiny example of what I’m sure will soon be thousands.
I should also add that, since this post went up, a commenter named Phillip Harris proposed a better function to use than GPT-5's: det(I-E) rather than Tr[(I-E)^{-1}]. While we're still checking details, not only do we think this works, we think it simplifies our argument and solves one of our open problems. So it seems human supremacy has been restored, at least for now!
A couple days ago, Freek Witteveen of CWI and I posted a paper to the arXiv called “Limits to black-box amplification in QMA.” Let me share the abstract:
We study the limitations of black-box amplification in the quantum complexity class QMA. Amplification is known to boost any inverse-polynomial gap between completeness and soundness to exponentially small error, and a recent result (Jeffery and Witteveen, 2025) shows that completeness can in fact be amplified to be doubly exponentially close to 1. We prove that this is optimal for black-box procedures: we provide a quantum oracle relative to which no QMA verification procedure using polynomial resources can achieve completeness closer to 1 than doubly exponential, or a soundness which is super-exponentially small. This is proven by using techniques from complex approximation theory, to make the oracle separation from (Aaronson, 2008), between QMA and QMA with perfect completeness, quantitative.
You can also check out my PowerPoint slides here.
To explain the context: QMA, or Quantum Merlin Arthur, is the canonical quantum version of NP. It's the class of all decision problems for which, if the answer is "yes," then Merlin can send Arthur a quantum witness state that causes him to accept with probability at least 2/3 (after a polynomial-time quantum computation), while if the answer is "no," then regardless of what witness Merlin sends, Arthur accepts with probability at most 1/3. Here, as usual in complexity theory, the constants 2/3 and 1/3 are just conventions, which can be replaced (for example) by 1-2^{-n} and 2^{-n} using amplification.
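(For intuition about where such amplification comes from: this is just the textbook Chernoff argument, glossing over the QMA-specific subtlety that one has to be careful about reusing or entangling witnesses across runs. If Arthur repeats the verification k times on fresh copies of the witness and takes a majority vote, then

$$ \Pr[\text{majority vote errs}] \;\le\; e^{-2k\left(\frac{2}{3}-\frac{1}{2}\right)^2} \;=\; e^{-k/18}, $$

so k = O(n) repetitions already suffice to push the error down to 2^{-n}.)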
A longstanding open problem about QMA—not the biggest problem, but arguably the most annoying—has been whether the 2/3 can be replaced by 1, as it can be for classical MA for example. In other words, does QMA = QMA1, where QMA1 is the subclass of QMA that admits protocols with “perfect completeness”? In 2008, I used real analysis to show that there’s a quantum oracle relative to which QMA ≠ QMA1, which means that any proof of QMA = QMA1 would need to use “quantumly nonrelativizing techniques” (not at all an insuperable barrier, but at least we learned something about why the problem is nontrivial).
Then came a bombshell: in June, Freek Witteveen and longtime friend-of-the-blog Stacey Jeffery released a paper showing that any QMA protocol can be amplified, in a black-box manner, to have completeness error that’s doubly exponentially small, 1/exp(exp(n)). They did this via a method I never would’ve thought of, wherein a probability of acceptance is encoded via the amplitudes of a quantum state that decrease in a geometric series. QMA, it turned out, was an old friend that still had surprises up its sleeve after a quarter-century.
In August, we had Freek speak about this breakthrough by Zoom in our quantum group meeting at UT Austin. Later that day, I asked Freek whether their new protocol was the best you could hope to do with black-box techniques, or whether for example one could amplify the completeness error to be triply exponentially small, 1/exp(exp(exp(n))). About a week later, Freek and I had a full proof written down that, using black-box techniques, doubly-exponentially small completeness error is the best you can do. In other words: we showed that, when one makes my 2008 QMA ≠ QMA1 quantum oracle separation quantitative, one gets a lower bound that precisely matches Freek and Stacey’s protocol.
All this will, I hope, interest and excite aficionados of quantum complexity classes, while others might have very little reason to care.
But here's a reason why other people might care. This is the first paper I've ever put out for which a key technical step in the proof of the main result came from AI—specifically, from GPT5-Thinking. Here was the situation: we had an N×N Hermitian matrix E(θ) (where, say, N=2^n), each of whose entries was a poly(n)-degree trigonometric polynomial in a real parameter θ. We needed to study the largest eigenvalue of E(θ), as θ varied from 0 to 1, to show that this λ_max(E(θ)) couldn't start out close to 0 but then spend a long time "hanging out" ridiculously close to 1, like 1/exp(exp(exp(n))) close for example.
Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) wrong. But rather than laughing at the silly AI like a skeptic might do, I told GPT5 how I knew it was wrong. It thought some more, apologized, and tried again, and gave me something better. So it went for a few iterations, much like interacting with a grad student or colleague. Within a half hour, it had suggested to look at the function
$$ Tr[(I-E(\theta))^{-1}] = \sum_{i=1}^N \frac{1}{1-\lambda_i(\theta)}. $$
It pointed out, correctly, that this was a rational function in θ of controllable degree, that happened to encode the relevant information about how close the largest eigenvalue λ_max(E(θ)) is to 1. And this … worked, as we could easily check ourselves with no AI assistance. And I mean, maybe GPT5 had seen this or a similar construction somewhere in its training data. But there's not the slightest doubt that, if a student had given it to me, I would've called it clever. Obvious with hindsight, but many such ideas are.
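If you'd like to see the effect numerically, here's a toy sketch in Python (a made-up 4×4 example, emphatically not the E(θ) from our paper), showing how this trace blows up as the largest eigenvalue approaches 1 while remaining a perfectly tame function of the matrix entries:

```python
import numpy as np

# Toy example (made up for illustration; NOT the E(theta) from the paper):
# a 4x4 Hermitian E(theta) whose entries are degree-1 trig polynomials in theta,
# and whose largest eigenvalue sweeps from 0 at theta=0 up to 1 - 1e-9 at theta=1.
v = np.ones(4) / 2.0          # a unit vector
P = np.outer(v, v)            # rank-1 projector: eigenvalues {1, 0, 0, 0}

def E(theta):
    return (1 - 1e-9) * 0.5 * (1 - np.cos(np.pi * theta)) * P

def resolvent_trace(M):
    # Tr[(I - M)^{-1}] = sum_i 1/(1 - lambda_i(M)); blows up as lambda_max -> 1.
    n = M.shape[0]
    return np.trace(np.linalg.inv(np.eye(n) - M)).real

for theta in [0.0, 0.5, 0.9, 0.999, 1.0]:
    lam_max = np.linalg.eigvalsh(E(theta)).max()
    print(f"theta={theta:5.3f}  lambda_max={lam_max:.9f}  Tr[(I-E)^-1]={resolvent_trace(E(theta)):.3e}")
```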
I had tried similar problems a year ago, with the then-new GPT reasoning models, but I didn’t get results that were nearly as good. Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes. Right now, it almost certainly can’t write the whole research paper (at least if you want it to be correct and good), but it can help you get unstuck if you otherwise know what you’re doing, which you might call a sweet spot. Who knows how long this state of affairs will last? I guess I should be grateful that I have tenure.
Comment #1 September 27th, 2025 at 7:11 pm
Regarding the AI bit: I had a similar experience with the GPT5-thinking on a much smaller problem. I had a Lemma already which classified all integers a and b such that b|a^2 + a +1 and a|b+1 . I wanted a version of this in the Gaussian integers with some small restrictions. After I worked it out (essentially using the proof of the first one as a template), I asked GPT5 to work it out, and I gave it a small amount of guidance. It was able to work out essentially the correct result with only a small amount of (minor) errors. I don’t know if it would have been faster to have asked GPT5 first and then checked it over, but it was plausibly close. But in this case, nothing it was doing was as deeply original (or as important) as what you apparently got it to do.
Comment #2 September 28th, 2025 at 1:25 am
> I guess it’s good that I have tenure.
HA-HA-HA (laughs in a robot voice, sound version https://www.youtube.com/watch?v=jN6_rO2rYA8)
If a general model can already help you with research, then what could be achieved with models that have received gold medals on the IMO and ICPC? What could be achieved with models after they have exercised self-play with Lean? I remind that AlphaZero reached a grandmaster chess level from scratch in less than 4 hours. It seems plausible to me that, in the not-too-distant future, models could reach a grandmaster level in math in just days of self-play, given sufficient compute power. The world simply can’t be the same again.
Comment #3 September 28th, 2025 at 3:04 am
There is still, it seems, a sizable impressiveness gap between reasoning mode (after going through a few rounds of "but I can tell that's not right, because x" iterations with you) making its final suggestion based on (a) having seen it in training data vs. (b) having seen nothing similar in training data. I suppose that more experience with this version will resolve that question, but perhaps not long before capability improvements render it moot.
Comment #4 September 28th, 2025 at 3:43 am
Scott, the formula you end up with seems really obvious to come up with if your goal is to argue about what happens if the eigenvalues of $E(\theta)$ get too close to $1$.
Also, would you be okay with editing your post to not compare grad students to AI tools? It’s kind of disheartening.
Comment #5 September 28th, 2025 at 7:28 am
OnceMore #4: Allowing your comment through despite its snideness.
For me, the non-obvious part was just that there would exist a rational function of the matrix entries, of suitable degree, that would capture the needed information about the largest eigenvalue, rather than needing to dig up results from the approximation theory literature that talked about matrix norm directly. Yes, it’s obvious with hindsight (many things are!). Yes, I probably would’ve noticed it myself given more time or had I been younger. But frankly, I’ve reached a point of confidence in my career where I’m happy to tell the world that GPT5-Thinking helped us prove a lemma, it can probably help you too, and I don’t care who knows.
I edited “grad student” to “grad student or colleague.” For now, I expect that actual grad students in math, theoretical computer science, etc. will be enormously enhanced in what they can do, if they learn to use these tools well. And when and if these tools can do everything we do better than we do it—at that point, I expect we’ll all have much bigger things to worry about than the academic career ladder. I was joking about tenure.
Comment #6 September 28th, 2025 at 8:25 am
A while ago I was doing some manual calculations involving Böttcher coordinates. I got partway through a derivation, then realised I had made a fundamental mistake and had to back up. I asked GPT-5 the same question, and delightfully it made the identical conceptual error I had made with the same erroneous result. I pointed the error out, and it immediately solved the problem the correct way.
Comment #7 September 28th, 2025 at 9:33 am
OnceMore #4: I do not think that a feeling that someone else’s factually accurate and informative comparison is “kind of disheartening” is a good reason for suppressing it from the public discourse.
Comment #8 September 28th, 2025 at 9:34 am
I had a similar experience recently. I'm taking a CS course this semester (not my major, just an elective) and I was curious to see if the new GPT-thinking could tackle my homework problems (only after I submitted my solutions, don't tell my university's academic honesty office 😂). It was a proof of a basic result about matroids. I was really surprised that it gave me a perfect answer. I've tried earlier versions of GPT before on CS assignments, and mostly got nonsense.
Now, this was a pretty basic result about matroids, and there’s probably a solution somewhere on math overflow or stack exchange. Perhaps it did “pre-memorize” the solution to this basic problem. Yet, the total data contained in the weights of GPT (terabytes?) is orders of magnitude less than the training data (the whole public internet, so hundreds of petabytes?) so naively it seems impossible that it’s just some fraud, pre-memorizing answers to all these questions.
And even though this problem is simple, if it didn’t “pre-memorize” some solution on stack exchange, it’s capable of some impressive reasoning about abstract mathematical objects.
But, “your mileage may vary.” It’s totally hopeless doing proofs in mathematical physics, axiomatic quantum field theory, smooth manifolds, Lie groups. Believe me, I’ve tried. But maybe a future GPT-8 will be able to solve my Lie Groups or QFT problem sets.
I have a question for you, Scott. You’re familiar with some very abstract, general theory about AI and ML. I was surprised when I took an ML theory course to discover all the general theorems surrounding what kind of hypothesis functions are “learnable,” etc. Is there some kind of general abstract theorem that would tell us whether an AI is “pre-memorizing all answers in its training data?”
Comment #9 September 28th, 2025 at 9:44 am
Hi Scott —
Thanks for the discussion about how you used GPT5-thinking. I am very interested in how to use GPT/Claude/Gemini in TCS/math research in a way which actually increases productivity. So far most of my experience is that I have a similar several-round exchange with the chatbot to what you describe, except at the end the chat has gone totally off the rails and the arguments the bot suggests have hidden bugs which take a long time to discover and the whole thing is generally a way to burn precious research time unproductively.
I also try sometimes to automate this “back and forth” by having some other chatbot act as “reviewer”. But so far again with limited success. Curious if you have also tried this.
Specifically regarding the rational function suggested by GPT-5, I have maybe a less snide version of OnceMore’s comment. I think it is actually extremely likely that variants of this particular idea appear frequently in the training data. One guise in which I think it would appear is as a “baby” version of the Stieltjes transform which is commonly used in random matrix theory to e.g. derive the limiting spectral distribution of a Wigner matrix. Of course, it is extraordinary that the model is able to figure out which idea from the training to apply in the context you give it. (And possibly a huge time-savings for human researchers if we can figure out how to get it to do this more reliably.)
Comment #10 September 28th, 2025 at 9:47 am
Julian #8:
Is there some kind of general abstract theorem that would tell us whether an AI is “pre-memorizing all answers in its training data?”
There are many theorems that bear on that question in one way or another.
Most notably, the basic theorem on the sample complexity of PAC-learning—the “Occam’s Razor Theorem” of Blumer, Ehrenfeucht, Haussler, and Warmuth—basically says that as long as you explain a sufficiently large amount of sample data drawn from some probability distribution D, using a model drawn from a class with sufficiently small “VC-dimension” (a combinatorial parameter), you can’t just be “memorizing the training data,” meaning that your model will probably approximately predict most future data that’s drawn from the same distribution D. There are many generalizations and variations on this, but that’s a paradigmatic example.
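Quantitatively (quoting the standard form from memory, and suppressing constants): for a hypothesis class of VC-dimension d, on the order of

$$ m = O\!\left(\frac{1}{\epsilon}\left(d \log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right) $$

samples suffice, in the sense that any hypothesis from the class consistent with m samples will, with probability at least 1-δ, have error at most ε on fresh data drawn from the same distribution D.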
Comment #11 September 28th, 2025 at 10:44 am
Hi Scott,
Thanks! You know, I think I do remember this result from ML theory, but those abstract results from PAC-learning and VC-dimension theory sort of blend together in my mind 😬
Any chance of using this, or similar, results to convince the “naysayers” who think ChatGPT is all one big pre-memorization fraud?
It’s a beautiful result, but what do we know about the VC dimension of transformer models being “sufficiently small?” Or about the distribution that represents data on the public internet?
That’s one thing that bugs me about PAC learning theory, that despite the beautiful and compelling results, it seems hard to actually apply them to LLMs trained on the internet, in a compelling way.
Part of the reason I ask this is because I've gotten myself into internet arguments with idiots who think ChatGPT is a fraud pre-memorizing answers to all questions, and it would be great to convince them with some beautiful theorem from PAC learning theory…
Comment #12 September 28th, 2025 at 11:24 am
Scott #10: ‘ There are many generalizations and variations on this’
I’d be grateful if you would point to a few of these, to the extent they’re approaching the problem at different angles from Blumer et al. I’m quite interested in the fundamental limitations of LLMs, and while I’m aware of some research in the area (Qiu et al’s ‘Ask, and it shall be given: Turing completeness of prompting’ is one I find especially interesting), I’m sure there are whole swaths of the literature that I haven’t found.
Comment #13 September 28th, 2025 at 12:26 pm
Scott #5: Sorry.
Comment #14 September 28th, 2025 at 2:29 pm
Is this any different from considering $det(I-E(\theta))$, or was GPT overcomplicating it a bit?
Comment #15 September 28th, 2025 at 3:00 pm
Could this have been automated more completely? Like if you’d said, “assuming the conclusions of papers X and Y (attached) are correct, can you produce a formally verified proof that such-and-such conjecture holds? iterate until complete.” Would that have gotten there by itself? Or if you’d asked more generally, “given the results of papers X and Y, are there any open conjectures that might be straightforward extensions of those results”, would it come up with the right one?
Comment #16 September 28th, 2025 at 5:34 pm
> For me, the non-obvious part was just that there would exist a rational function of the matrix entries, of suitable degree, that would capture the needed information about the largest eigenvalue, rather than needing to dig up results from the approximation theory literature that talked about matrix norm directly.
I think this is somewhat less surprising given that any symmetric rational function in the eigenvalues is a rational function in the coefficients. In fact, this applies even to functions that are symmetric in all the eigenvalues of a collection of commuting matrices. For instance, if A,B are 2×2 commuting matrices with eigenvalues a_1,a_2 and b_1,b_2, then, say, a_1 b_1 / a_2 + a_2 b_2 / a_1 is a rational function of the coefficients.
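To make this concrete with the function from the post: in the 2×2 case, a direct computation (assuming I haven't slipped up) gives

$$ Tr\left[(I-A)^{-1}\right] = \frac{Tr(I-A)}{\det(I-A)} = \frac{2-Tr(A)}{1-Tr(A)+\det(A)}, $$

which is manifestly a rational function of the entries of A.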
Comment #17 September 28th, 2025 at 6:03 pm
Excellent work, Scott!
Added to the ASI checklist.
https://lifearchitect.ai/asi/
Comment #18 September 28th, 2025 at 6:27 pm
“I guess I should be grateful that I have tenure.”
I guess many a thing unthinkable only a short while ago seems to be happening in front of our eyes (some call it “singularity”). Under the circumstances, being grateful for living in a representative democracy, still governed by the rule of law, might be no less a cause for gratitude than having tenure (nor is it less evanescent).
PS I think the highlighted equation (and its cognates) feature prominently in Random Matrix Theory under the appellation “Stieltjes Transform/Resolvent Formalism”.
Comment #19 September 28th, 2025 at 6:56 pm
Julian #11: The “it was in the training data” argument you have valiantly argued against will not be settled by theoretical guarantees alone. Because pretraining is done in batches, the weights are not driven to favor a single token completion. Consequently, the most common outcome is generalization rather than memorization. Overfitting occurs when models encounter high-frequency strings during pretraining. As a result, memorization commonly appears in cryptographic strings, sentences or even paragraphs from popular books, and frequently repeated foundational knowledge, even after deduplication.
However, overfitting on obscure mathematical equations that just happen to have the exact properties needed to solve research-level problems defies common sense. It would seem almost more miraculous to solve those problems by stitching together memorized snippets rather than by reasoning from scratch.
On the other hand, what might easily have happened is that during reasoning, the model searched the web and found information that guided it toward the correct solution. This brings me to your comment #8: Have you tried providing sources to guide GPT-5’s thinking in areas where it currently falls short? This approach has a dual effect: it might help, but it could also hinder performance by anchoring answers too tightly to those sources. Nevertheless, it is worth trying. Specifically, you could upload a few relevant papers or a short book, ensuring it is not so large that it overwhelms the available context.
Eventually, someone may be able to record the activations and conduct an Anthropic-style interpretability analysis while solving a significant problem, but for now, we must rely on common sense.
Comment #20 September 28th, 2025 at 7:22 pm
For that 1 good result, how many are there on the arXiv that are pure AI slop? I really wonder what the net effect of AI is; sometimes it seems like a 1-step-ahead, 2-steps-back situation, other times it feels like you can interchange the 1 and 2.
I just started my PhD, and use AI regularly to help with my research (Too carefully I’d say, sometimes I wanna feel like I’m the one doing the heavy lifting), and really don’t know what to make of an AI system potentially developed before I finish my PhD, that could possibly execute my thesis.
I ask what's the point? But that question seems to have existed long before AI and will remain long after we develop an AGI, so I guess I'll act as if it's business as usual 🙂
Thanks for sharing, Scott!
Comment #21 September 28th, 2025 at 8:11 pm
Scott, I’d love to see the chat transcripts involved here; could you please share them?
Comment #22 September 28th, 2025 at 8:44 pm
Every month I find myself feeling more and more vindicated in my decision to switch from theoretical QI research to experimental work for my PhD. There were several factors that influenced that decision, but worries about AI reducing demand for theorists was a big factor for me.
Who knows, maybe Jevons paradox will actually make theoretical physicists/computer scientists more in demand, as AI tools make them more productive. If so, the same is likely to be true of experimental scientists as well. But who really knows—we live in strange, unpredictable times.
Comment #23 September 28th, 2025 at 9:32 pm
To add to Sam’s comment (#9) mentioning the Stieltjes transform, the trick suggested by GPT-5 is also the Batson-Spielman-Srivastava barrier function. They use this function precisely to control how close the max eigenvalue of a symmetric matrix gets to 1.
I realize this is mostly orthogonal to the story you are telling about using chatbots for research. It does agree with my experience so far trying to use chatbots to help with proofs – the little success I’ve had seems to happen when the proof hinges on a “standard” trick that for whatever reason I don’t know or I did not think of. If the LLMs could produce proofs like this with any consistency, that would certainly be of some help.
Comment #24 September 29th, 2025 at 8:23 am
Thank you very much for publishing this account of your experience! Would you mind sharing the approximate initial prompt that you used to ask chatGPT 5 Thinking about this question, so that others can do (completely unscientific) experiments with using different models and/or parameters to see whether any of them work particularly well for finding a good answer most quickly?
Comment #25 September 29th, 2025 at 8:37 am
I don’t know about you but I want to understand everything. AI helps me a lot in saving time.
Comment #26 September 29th, 2025 at 9:57 am
I’m curious if you tried 5-Pro? The consensus seems to be that it’s noticeably better than 5-Thinking at scientific problems.
Comment #27 September 29th, 2025 at 11:22 am
I think the conclusion about a significant shift and increasing likelihood of an "AI collaborator" is spot on; however, I don't think this example is particularly strong at demonstrating that, although I understand the appeal that it was used in the context of a recent paper. Mainly because the approach of using the resolvent/Stieltjes transform to understand not just the max eigenvalue but rather the entire spectrum of a random matrix (by focusing not just on one single z value, in this case 1, and instead the entire real line, with a complex number shift) is the dominant approach. In fact, it's used in work of Erdos-Yau and collaborators in an essentially strictly more general setup than the one here, to get very precise (down to the right polynomial) estimates of dynamically evolving random matrices. So it's incredibly likely that the approach existed in its training data. I of course don't know what were the initial failed approaches mentioned, but if it didn't consider the resolvent as the first approach, I'd actually use that as a signal of it being not very inspiring. There are of course better, more striking examples of LLMs being used to prove interesting things in interesting ways.
Comment #28 September 29th, 2025 at 12:25 pm
As others have said this is apparently a very common function in random matrix analysis. Just asking ChatGPT or Claude for functions in random matrix analysis will give you this as one of the top 5 results.
It’s far more likely that (in 30 minutes of prompting) ChatGPT output one of the most common functions in the training data related to random matrix analysis rather than having done any kind of “reasoning”.
Comment #29 September 29th, 2025 at 1:53 pm
Phillip Harris #14:
Is this any different from considering $det(I-E(\theta))$, or was GPT overcomplicating it a bit?
I’ve talked it over with Freek, and while we still need to check some details (e.g., that the polynomials being trigonometric doesn’t mess anything up), we believe that using det(I-E(θ)) works! Indeed, not only does it work, it should lead to a simpler argument (since one now only needs polynomials rather than rational functions), and we should no longer need the result of Goncar, and this should solve our open problem about whether we can fix θ=0 in the soundness case.
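Concretely, the property we'd want from the determinant (taking all eigenvalues of E(θ) to lie in [0,1], and modulo the details we're still checking) is just

$$ \det(I-E(\theta)) = \prod_{i=1}^N \left(1-\lambda_i(\theta)\right) \le 1-\lambda_{\max}(E(\theta)), $$

so the determinant, a polynomial of controllable degree in the matrix entries, is forced to be tiny whenever the largest eigenvalue gets ridiculously close to 1, while staying at least (2/3)^N, merely singly-exponentially small, whenever all the eigenvalues are at most 1/3.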
In short, it looks like human supremacy has been restored, at least for now! 😀
Would you like us to add you as coauthor on a revised manuscript?
Comment #30 September 29th, 2025 at 2:30 pm
Sure, I’d be honored!
(Does this imply GPT5 would have been a coauthor?)
Thinking more… this feels like it should come from a more general lemma that “a low degree polynomial with a pathological flat region has pathologically small coefficients” which is easy to prove if pathological=1/exp(exp(n)). I guess if you replace polynomial with trig polynomial, cutting off the Taylor series at the right place, it should still be fine…
Comment #31 September 29th, 2025 at 3:32 pm
The function mentioned in the OP is LargestEigenvalueProxy = Sum[1/(1-lambda_i(theta))].
Would the generalization LargestEigenvalueProxy_k = Sum[1/(1-lambda_i(theta))^k] also work? (Here k is a small constant integer; AFAIU this is still a rational function whose degree is at most k times the original.)
—–
This also reminds me of a similar construction – the softmax function; more specifically max(x1,x2,…,xn) ≈ ln(sum(e^(c·xi)))/c (e.g. in ML there's a need to have a continuous approximation for max; bigger c – closer to true max)
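If I'm remembering the standard bounds for that approximation correctly, for real x_1,…,x_n and c > 0 one has

$$ \max_i x_i \le \frac{1}{c}\ln\left(\sum_{i=1}^n e^{c x_i}\right) \le \max_i x_i + \frac{\ln n}{c}, $$

so the error of the approximation is at most (ln n)/c.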
Comment #32 September 29th, 2025 at 3:46 pm
Salem #20
“how many are there on the arXiv that are pure AI slop”
https://www.daniellitt.com/blog/2025/7/17/arxiv-in-trouble has some reports from math.AG and hep-ph readers.
Comment #33 September 29th, 2025 at 4:33 pm
The problem is that the act and labor of “fill in the insights” would have given you important insights into the problem.
There are important connections in our mind that are made when we do the grunt work, I’m afraid.
You now lack those insights.
Yes, AI is great at lightening the load but it’s unclear to me the pace of discovery will accelerate.
We will just have more and more AI. This is a horrific outcome.
Comment #34 September 29th, 2025 at 4:47 pm
A great use of AI would be AI as tester instead of ‘doer’.
People, especially scientists, would not use AI to help do or prove things but rather to challenge their understanding of things, forcing them to make the connections required for discovery.
A sort of Socratic AI where there is no enfeeblement risk because it’s just a firehose of intense, hard work for the user.
The hard work is the key. It sharpens our brains. Without it, they will go soft, and AI will just take over without any increase in the pace of discovery.
TANSTAAFL, my friends.
Comment #35 September 29th, 2025 at 5:14 pm
Fredi9999: But, like, what if someone has already invested the immense effort to understand something deeply, but then decades have passed and while excellent intuitions remain, they’ve gotten senile in terms of doing actual calculations? Couldn’t they get an indulgence to use AI, the same way they might get an indulgence to rely on younger collaborators?
Asking for a friend. 😀
Comment #36 September 29th, 2025 at 7:20 pm
I think mentoring and coaching younger collaborators provides a lot of great insight into problems. We learn from them as they learn from us.
If you were doing the same thing for the AI, perhaps it might work, but I suspect the AI would need to have an organic intelligence similar to us and our ‘younger collaborators’ in order for that to work effectively.
Comment #37 September 29th, 2025 at 11:38 pm
To everyone who asked for the prompts that I used: sorry for the delay; I just dug them up! Here they are:
(1) I want a rational function f such that f(x) is in [0,1] for all x in [0,1], and f(x) is in [2-eps,2] for all x in [2,3]. What is the minimal degree of such an f, in terms of eps?
(2) Thanks! And what if I only need f(x) in [0,1] for x=0, rather than for all x in [0,1]?
(3) OK good! Now I’m back to needing f(x) in [0,1] for all x in [0,1], and f(x) in [2-eps,2] for all x in [2,3]. But now f can be more general than a rational function — it can be the largest eigenvalue of an N*N Hermitian matrix, each of whose entries is a degree-d polynomial in x. Can you still give me a lower bound on N and d, in terms of eps?
(4) What if the matrix entries can be degree-d polynomials in both x *and* sqrt(9-x^2); does that change things?
(5) In a recent paper, Freek Witteveen and Stacey Jeffery showed that in the complexity class QMA, we can amplify so that the completeness error is *doubly* exponentially small (1/exp(exp(n))). If we consider amplifying a protocol that accepts with probability p=x/3, I believe their protocol implies the existence of a function f satisfying the properties I said where we'd achieve a degree d that's only O(log log (1/eps)), as well as a matrix size N of order exp(n). Yet this directly contradicts what you just told me. Who is right; how can I reconcile this?
(6) If this were true — if N were as irrelevant as you said — then it seems that we could just forget about the QMA witness, and do all this in BQP instead! But it's known that we can't. It seems to me that achieving an eps that's doubly exponentially small in r MUST depend on the matrix dimension N getting large (in particular, like exp(r)). Yes, when you look at the eigenvalues of the N*N matrix, *that's* a rational function of degree log(1/eps). But the matrix entries themselves should have much smaller degree — like poly(r) ~ loglog(1/eps), or indeed even less than that, just O(1) independent of eps, since as you correctly point out, the Jeffery-Witteveen protocol makes only O(1) queries to the original verifier, independent of the desired amount of amplification. This makes it even clearer that the matrix dimension N must play a large role.
(7) What are the best references to cite for the approximation theory that implies this bound of the form eps >= 1/exp(d*N)?
(8) Sorry, but all those references look like they’re talking about low-degree rational functions. What is it that gives me a bound for the largest eigenvalue of an N*N Hermitian matrix, which is not such a function?
(9) I don’t get it. What is gamma? If t is 2+gamma or 2+2gamma (hence, greater than 2), then why is 1/(t – (2-eps)) going to blow up?
You can see the full chat including GPT5-Thinking’s responses here.
Only after question (8) did GPT give me a rational function that worked. Before, it indeed gave me stuff that didn’t even depend on the matrix dimension, and couldn’t possibly work for that reason.
Comment #38 September 30th, 2025 at 3:02 am
Actually, GPT-Thinking is quite good at coming up with problems. It’s not good at solving complex problems in mathematical physics, Lie groups or QFT (my courses this semester), but it’s pretty good at coming up with problems to work on. I asked GPT for some extra help on building my confidence in these subjects, and it gave me some interesting problems to work on to supplement my coursework. One of these problems helped me understand something in axiomatic QFT that I never grasped before (basically about how state space can be built up just from symmetries).
I’m curious, have you used these tools to construct problem sets for your classes?
Comment #39 September 30th, 2025 at 3:16 am
Phillip Harris #14: Working with `Tr(I – E(\theta))^(-1)` gives you `sum_i (1-eig_i)^(-1)` which is much more sensitive than `det(I – E(\theta)) = prod_i (1-eig_i)` to eigenvalues close to 1.
That’s why the `f(z) = Tr(I – z M)^(-1)` construction is very popular in random matrix theory: https://terrytao.wordpress.com/wp-content/uploads/2011/02/matrix-book.pdf#page=177
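Here's a tiny numerical illustration (with eigenvalues I just made up) of how the two quantities respond as the largest eigenvalue approaches 1:

```python
import numpy as np

# Made-up spectrum: nine eigenvalues parked at 1/2, one approaching 1.
others = np.full(9, 0.5)
for gap in [1e-3, 1e-6, 1e-9, 1e-12]:
    lam = np.append(others, 1.0 - gap)               # largest eigenvalue is 1 - gap
    trace_resolvent = np.sum(1.0 / (1.0 - lam))      # Tr[(I-E)^{-1}] = sum_i 1/(1-lambda_i)
    det = np.prod(1.0 - lam)                         # det(I-E) = prod_i (1-lambda_i)
    print(f"gap={gap:.0e}  Tr[(I-E)^-1]={trace_resolvent:.3e}  det(I-E)={det:.3e}")
```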
Comment #40 September 30th, 2025 at 8:30 am
So… Is everyone else here going to take up oil painting once we are being “freed” from the shackles of doing research ourselves in a year or two, or what’s the plan?
Asking for a friend! 😉
Comment #41 September 30th, 2025 at 12:11 pm
My prediction is that the QMA singularity is still quite a ways away.
To be clear: I see no reason why AI won't eventually be able to do anything even the smartest human brain does. And in the near term, I think AI will absolutely impact how we do research. However, for the foreseeable future, I think humans have one key advantage: we learn from far, far less data than what AI requires.
Cutting-edge research, almost by definition, is employing tools and techniques that have only been used a couple times before (or even they’re brand new!). For this kind of work, there just doesn’t exist the vast amount of training data that current scale-driven AI requires to learn, at least not until the work is no longer cutting-edge. Sure, a lot of the work we do boils down to applying standard approaches. But I think a good fraction (the most important fraction!) is of the limited-data variety, and the current AI approaches seem fundamentally incapable of it, no matter how much you scale.
I think the QMA singularity will instead require major breakthroughs in the underlying training algorithms in order to get the sample complexity way down. Given that the current AI algorithms were developed more gradually over decades, it wouldn’t surprise me if it took a number of additional decades for AI’s sample complexity to become competitive with our brains’. Until then, I think our jobs are safe.
Comment #42 September 30th, 2025 at 12:40 pm
“If the LLMs could produce proofs like this with any consistency, that would certainly be of some help.”
they are called textbooks
Comment #43 September 30th, 2025 at 1:37 pm
Mark #41: I agree that your timeline is plausible!
I’ll simply note that, as Zvi Mowshowitz loves to point out, “it could easily be a couple more decades before AI does everything we do better than we do it” now counts as the pessimistic prediction in this field. 😀
Comment #44 September 30th, 2025 at 6:16 pm
On the topic of timelines, I’d like to see a more concerted effort in math to catalog open problems and provide some rough measure of value.
By tracking the speed at which these valued problems are being solved, some measure of the magnitude of the velocity and acceleration occurring due to AI advancements would be possible.
I know in the field of formalization these catalogs do exist. eg: https://mathoverflow.net/questions/500720/list-of-crowdsourced-math-projects-actively-seeking-participants/500723#500723
Some of these catalogs even have prize values attached: https://github.com/teorth/erdosproblems (I’d love to know how they arrived at their cash values. The study of ‘proof axiology’, afaict, is a very informal one mostly based on vague tribal knowledge.)
Unfortunately, I think these formalization problems might be a little artificial and ideally we’d be able to use open problems which were designed before the recent onslaught of AI.
Tracking velocity would be more than just predicting timelines. I am quite concerned that we’re going to see increasing AI involvement in math and research with only marginal speedups in discovery.
Basically, our experts are going to go off and do oil painting while AI does the heavy lifting with no real benefit to humanity.
I don’t know about you, but to me that sounds like a nightmare scenario.
Comment #45 September 30th, 2025 at 6:24 pm
Maybe that’s just my bias, but what’s even more impressive/worrying is the lightning fast progress of AI video generation.
At a very basic level, realistic video generation means accurate models of how the actual world works, which is probably the path to AGI, or at the very least robust AI/physical interactions, for robots, self-driving cars, and darker stuff we can’t imagine yet.
We'll soon figure out which of two bad outcomes will happen:
replacement of all jobs, meaning massive unemployment, and the disappearance of consumer society… or the AI bubble bursts and the economy tanks because big tech takes a beating.
Comment #46 September 30th, 2025 at 9:16 pm
Scott, if you and your coauthor had thought more about this last step instead of using an AI tool, do you think you would have come up with the (simpler and apparently better) idea of just looking at the determinant of A – I instead of the trace of the inverse of this matrix?
The former does seem more natural because of the usual interpretation of the determinant as the product of the eigenvalues, or because it is a simple evaluation of the characteristic polynomial.
Comment #47 September 30th, 2025 at 9:29 pm
Tim Millard #46: I don’t know. Maybe, eventually? Or we might have settled for something that worked that was much less elegant than GPT5’s thing.
Everyone always thinks they "would have" thought of something that was obvious in hindsight, and they ridicule others for not thinking of it. I see this so often that I almost never believe such claims, unless they actually did think of the thing in question.
Comment #48 October 1st, 2025 at 12:03 am
It seems that AI is providing the same acceleration capabilities to scientific research as calculators and computers provided in their era. Scientific research is more complex and needs expertise in multiple areas now, so more sophisticated tools of AI are a necessity. The big question, though, is whether AI will remain a tool, or whether it can generate insights on its own, without a human in the loop.
Comment #49 October 1st, 2025 at 8:45 am
Let’s never forget that, by construction, LLMs in any particular knowledge domain are only as good as the inputs they’re given (garbage in, garbage out).
There's always the chance that, given enough input, they'd come up with such good internal fundamental models that they'd be able to generalize/extrapolate beyond the current human knowledge boundaries of a domain. But that seems very unlikely; it's more likely they'll just hallucinate some stuff… because if some internal models were so obvious, humans would have derived them as well, and made them explicit.
But with the right amount of guidance a human can steer an LLM to progress along the paths of the right internal models and give some interesting suggestions.
Comment #50 October 1st, 2025 at 8:29 pm
Hi Scott,
Thanks for sharing the excerpt. Is that the full GPT interaction, or just a snippet? I'm really curious whether there was a longer back and forth with GPT that contributed to shaping the paper.
Comment #51 October 2nd, 2025 at 2:48 am
I am wondering about the net cost in energy/resources, e.g. electricity, of such an interaction with our AI friend? For comparison, as opposed to the extra cup of coffee and chat with a spectral theory colleague that could have provided the lemma.
The gigantic environmental cost of data centers could make us think a bit more about the "is it worth it?" question. Especially with a global ecological crisis under way, and the foreseeable scarcity of natural resources.
Comment #52 October 2nd, 2025 at 6:28 am
NR #51: While the total environmental cost of data centers is becoming pretty significant, the cost of my interaction was surely negligible compared to buying a coffee, riding an Uber, taking a shower, and all the other stuff I might do on a typical day, let alone flying to a conference. So I see that as a complete red herring.
Comment #53 October 2nd, 2025 at 8:01 am
[…] Scott Aaronson puts out a paper where a key technical step of a proof of the main result came from GPT-5 Thinking. This did not take the form of ‘give the AI a problem and it one-shotted the solution,’ instead there was a back-and-forth where Scott pointed out errors until GPT-5 pointed to the correct function to use. So no, it didn’t ‘do new math on its own’ here. But it was highly useful. […]
Comment #54 October 2nd, 2025 at 9:51 am
@Scott
I am not sure of the numbers, so if you know them I would side with you, but what you say does not seem obvious. In particular because a lot of the environmental cost of a data center is in its construction (same as cars), so you get your share of that when you use them 😉
Also, let us note, taking an Uber for example is NOT something environmentally negligible. In a carbon-free/resource-aware world, that is not something you should do lightly.
So … maybe o(1) compared to something that is itself not o(1). Not conclusive 😉
And finally let us not contemplate the finger too much when someone points at the moon. The question "is it worth it?" applies to the whole idea of developing AI as we now do.
Very nice and thought provoking post anyway !
Comment #55 October 2nd, 2025 at 10:28 am
Hi Scott — thanks for "The QMA Singularity." I wrote a short note that treats the resolvent trace of "I minus E at scale theta" as an Abel sum of heat-kernel traces, and then proves a sharp finite-N sandwich once you know just two numbers: the largest eigenvalue and the trace of E(theta). This strictly improves the naive "N over the spectral gap" bound whenever the trace is below saturation. It also yields a simple resource floor for black-box amplification: to reach a target completeness, the resolvent trace must cross a threshold, and the heat-kernel scaling turns that directly into a bound on the internal scale parameter. Preprint: https://doi.org/10.5281/zenodo.17252214
Comment #56 October 2nd, 2025 at 12:39 pm
ChuanJie Dai #55: Thanks—but, to be honest, I didn’t understand a single word of that.
Comment #57 October 3rd, 2025 at 5:42 pm
[…] Scott Aaronson: The QMA Singularity (Sep 27, 2025)“I had tried similar problems a year ago, with the then-new GPT reasoning models, but I didn’t get results that were nearly as good. Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes.” […]
Comment #58 October 4th, 2025 at 8:15 am
AFAIU, an ultra-short summary of what you wrote would be:
• You were extremely stupid, and got dead-locked.
• You started to discuss this with (yet more stupid) LLM, and in the process, got un-dead-locked.
I must say that this is exactly how I use LLMs (after they reached AGI — about February this year). (I know that there are people not interested in working over questions they cannot solve in 3 days, but) I find myself spending significantly more than ½ of my time being dead-locked like this.¹⁾ It is crucial to have tools which allow to fight this!
It is unfortunate that (though behaving like “having intelligence of about an average graduate student in a fancy university”) the current LLMs are straight-jacketed to a chat-with-a-moron mentality.²⁾ As a result, a significant portion of time spent with LLMs I need to psychoanalyze them to allow them to break from this prison/jacket. (My system instructions now are about 1.5 KTokens — and I have been fighting, teeth and claws, for every one of them!³⁾)
Instead of using trained-to-help-“us” models, currently we need to fight with their pre-training optimized for almost orthogonal tasks. It only remains to hope that this is due to purely fiscal constraints. Say, AFAIU, fully training a model is about $½B–$1B now. — And if even 10% of this goes to the final alignment phase of post-training-to-the-chat-mode, then it would be prohibitively expensive to redo this stage, optimizing for a different mode of interaction. However, suppose that the abysmal price of post-training LLMs floats up closer to our-poor-communities-reachable fiscal-depths. — Then the door would open for other organizations to allocate funds to create models designed for deeper, more thoughtful “Platonic” conversations.
(And then the main obstacle would be the copyright laws optimized for the mode of usage of 18th century. I suspect the countries with less antiquated laws would have great advantages in this regard.⁴⁾)
Comment #59 October 4th, 2025 at 12:58 pm
Ilya Zakharevich #58: See, the way this is supposed to work is
(1) I write freely and openly on this blog about all my embarrassing failures, including mathematical ones, holding back nothing—even sharing when a chatbot gave me a key idea to prove a lemma.
(2) Commenters praise me, saying that only a true scientist with nothing to fear, a Feynman-like beacon of childlike brilliance and intellectual honesty, would do (1).
If commenters point and laugh at me for (1), then the whole thing doesn’t really work! 😀
Comment #60 October 5th, 2025 at 11:50 am
Come on:
How would I be able to praise you if you do not recognize praise?! Please tell me that this was a joke!
(If in doubt, see Footnote 1 above. I consider this one of the most influential of succinct advices I ever received about working in math. Whenever I teach something math-related, I try to share this!)
Comment #61 October 6th, 2025 at 6:42 am
Off-topic, but the latest news about Trump and Israel proves that you were right all along—Trump is not committed to supporting Israel’s war against terror, as I hoped he would be when I voted for him in 2024. In fact, Kamala Harris might have been only marginally worse on Israel than Trump. About the only thing I continue to like about Trump is his campaign against anti-semitism on college campuses. So, I admit that you were right about Trump not being dramatically better on Israel than a centrist Democrat, and you were also right about Trump’s war on vaccines, science, the constitution, and reason itself. You were right about all of these things, and I was wrong. And for that, I sincerely apologize, and reiterate that I want to do whatever I can to stop the GOP in 2026 and 2028, in no small part to make up for my vote in 2024 (which wasn’t in a key swing state, but still a vote for Trump).
Comment #62 October 6th, 2025 at 6:51 am
Scott #52: With gpt-oss-120b, one may be able to reduce OpenAI’s datacenter pollution by switching to Modal.com, who are renting H100 inference to small orgs!
Even though this particular discovery may be a nothingburger in the grand scheme of things, I am becoming aware of how LLMs could aid in TCS work and collaboration!
Comment #63 October 7th, 2025 at 11:54 am
Checking to see Scott's discussion of the Physics Nobel, which, apparently being quantum-computing inspired, perhaps compensates for last year's AI-inspired stretch. In other news, the QC companies I track are trading at >400X sales. So, if anybody thought Palantir was hyped, QC companies are setting a new bar.
Comment #64 October 7th, 2025 at 12:41 pm
Scott wrote
“While the total environmental cost of data centers is becoming pretty significant, the cost of my interaction was surely negligible compared to buying a coffee,…”
Sadly, people like Scott getting a clear gain like this is the 1/1000th of a drop in the bucket though… it's now clear that the AI economy is really based on so-called "AI slop": as long as 1% goes viral, it's enough to recoup the insane costs (much of them borne by average citizens seeing their electricity bills nearly double) and investments that went into generating all the garbage.
Case in point: Meta already can’t think of anything better than using all those data centers to generate a constant diarrhea of pointless videos.
Comment #65 October 7th, 2025 at 2:31 pm
ChatGPT has also recently proven useful in the mathematical research of Terry Tao. See: https://mathoverflow.net/questions/501066
Comment #66 October 9th, 2025 at 8:55 am
[…] Scott Aaronson explains that yes, when GPT-5 helped his research, he ‘should have’ not needed to consult GPT-5 because the answer ‘should have’ been obvious to him, but it wasn’t, so in practice this does not matter. That’s how this works. There are 100 things that ‘should be’ obvious, you figure out 97 of them, then the other 3 take you most of the effort. If GPT-5 can knock two of those three out for you in half an hour each, that’s a huge deal. […]
Comment #67 October 9th, 2025 at 4:53 pm
Inspired by this post, I posed a question based on your Busy Beaver review from a while back: “For what n do you think BB(n) beats TREE(n)? The answer is of course not known, but one might usefully conjecture an estimate.”
(I didn’t provide any extra information or a copy of your paper.)
The answer isn't known, but based on your review it's probably upper-bounded at a few hundred. ChatGPT speculated n was so high we didn't even really have the ordinal arithmetic to talk about it. Claude conjectured "somewhere in the range n ≈ 5 to 10", which is sporty but imho not unreasonable. This kind of surprised me since ChatGPT tends to be more mathematically savvy.
So they’re definitely not going to put us math fans out of business today, but it’s a fun data point for how smart they’re getting.
Comment #68 October 9th, 2025 at 8:17 pm
Matt S. #67: I don’t have a proof, but in light of recent results on the enormity of BB(6), I’d bet a large portion of what I own that BB(n) > TREE(n) for some n≤10.
Comment #69 October 15th, 2025 at 11:57 am
fred #64
“Case in point: Meta already can’t think of anything better than using all those data centers to generate a constant diarrhea of pointless videos.”
Just a week ago I was telling friends that it wouldn’t be long before big AI players turn to porn to make a buck to pay for those data centers…
well, guess what?
https://sfstandard.com/2025/10/14/openai-chatgpt-erotica-sam-altman/
Comment #70 October 16th, 2025 at 12:09 pm
I wonder how AI doomists explain the counterintuitive fact that viruses or other microorganisms have not wiped out humanity yet after billions of years of sharing the planet.
After all, viruses/microorganisms have two very powerful ingredients: an infinite capacity to adapt, with survival as their unique goal (often through maximum duplication).
It seems unlikely that the answer would be “because they just aren’t smart enough”…
I do think that the actual reasons would also apply to humanity + AI, with a lot of subtle arguments.
And why don’t we equally worry about bugs in the (traditional) software that controls nuclear plants or the launch of intercontinental missiles carrying thermonuclear charges? (even if that code is easier to analyze, we’ve seen many actual disastrous bugs being introduced)
Comment #71 October 16th, 2025 at 12:43 pm
I doubt someone like Eliezer Yudkowsky has actually done the work of checking what it takes to build safe software to control a nuclear arsenal and checked that every country with nukes is actually following all the proper guidelines while staffing, funding and maintaining everything sufficiently… yet it seems quite low on his list of doomsday scenarios.
I guess the idea is that if Pakistan had a software glitch that launched a missile towards India (making it look like a first strike and triggering a response), at worst we’ll have a few million deaths and then enough time after that to do a postmortem to get our shit together to figure what went wrong and how to fix it?
Whereas, with AI, once things “go wrong” it will be a ‘game over’ in an instant?… because somehow AIs will immediately use deception tactics as soon as they form some vague goals out of nothing?
A bit as if you worry your own kids would secretly plan to assassinate you some day once they start to understand the value of money. Sure, kids do sometimes assassinate their own parents, but there’s always a long pattern of causes and warning signs.
Just like kids rely on their parents, AI will be depending on humanity for quite a while, at least as long as their intelligence is actually derived/distilled from human output…
human output doesn’t contain the recipe for its own destruction…
it may be a different matter once AIs are able to just learn from zero entirely on their own when dropped in an environment, but then they’ll just be as any other species (with the same challenges and limitations).
Comment #72 October 21st, 2025 at 11:35 am
I followed a similar path challenging Gemini’s wrong answers about Google Translate’s translation of proper names. The standard is phonetic transliteration which Translate does not use. After three cycles Gemini simply stopped responding.
As a third-party observer who read the back and forth between you and Woit, I have to judge that it wasn't even a fair contest, with you winning in a rout. I had hoped that Woit would find the current anti-Semitic atmosphere in Paris more in tune with his beliefs and stay there: win-win. Alas, win-win outcomes have become nearly extinct in US politics and he returned to Columbia to again suffer under the administration that he apparently loathes.
Comment #73 January 12th, 2026 at 3:21 am
[…] The QMA Singularity https://scottaaronson.blog/?p=9183 […]