An Experimental Program for AI-Powered Feedback at STOC: Guest Post from David Woodruff
This year for STOC, we decided to run an experiment to explore the use of Large Language Models in the theoretical computer science community, and we’re inviting the entire community to participate.
We—a team from the STOC PC—are offering authors the chance to get automated pre-submission feedback from an advanced, Gemini-based LLM tool that’s been optimized for checking mathematical rigor. The goal is simple: to provide constructive suggestions and, potentially, help find technical mistakes before the paper goes to the PC. Some important points:
- This is 100% optional and opt-in.
- The reviews generated WILL NOT be passed on to the PC. They are for your eyes only.
- Data Privacy is Our #1 Commitment. We commit that your submitted paper will NOT be logged, stored, or used for training.
- Please do not publicly share these reviews without contacting the organizing team first.
This tool is specifically optimized for checking a paper’s mathematical rigor, and we hope it will be a useful way to check the correctness of your arguments. Note, however, that it may lack external, area-specific knowledge (like “folklore” results), so it may flag sections that rely on unstated assumptions; it may also catch simple omissions or typos.
Nevertheless, we hope you’ll find this feedback valuable for improving the paper’s overall clarity and completeness.
If you’re submitting to STOC, we encourage you to opt in. You’ll get (we hope) useful feedback, and you’ll be providing invaluable data as we assess this tool for future theory conferences.
The deadline to opt in on the HotCRP submission form is November 1 (5pm EST).
You can read the full “Terms of Participation” (including all privacy and confidentiality details) at the link below.
This experiment is being run by PC members David Woodruff (CMU) and Rajesh Jayaram (Google), as well as Vincent Cohen-Addad (Google) and Jon Schneider (Google).
We’re excited to offer this resource to the community.
Please see the STOC Call for Papers here and specific details on the experiment here.
Comment #1 October 28th, 2025 at 8:27 pm
Great idea!
Hopefully many of those who opt in will keep detailed notes on the progress of the interaction for a followup study on how well it works. This should provide solid data on the usefulness (or otherwise) of AI in the progress of science.
Comment #2 October 29th, 2025 at 12:34 pm
Is this the Chicago pile of automated proof checkers for conference papers?
(awesome idea, btw!)
Comment #3 October 30th, 2025 at 12:46 am
Any comments on this new “quantum echoes” algorithm? (Sorry for being off-topic.)
Thanks, LK.
Comment #4 October 31st, 2025 at 4:13 am
I regularly use Claude to provide instant answers to specific questions about new papers (e.g. “Can you explain the part of this paper where they think they obtain the electroweak coupling”), and GPT-5 for longer discussions that combine brainstorming and didactic explanation. (Maybe Claude would be capable of this too, but I am on a plan that does not allow long discussions.)
Two weeks ago on this blog (under the thread “Sad and happy day”), I linked to a claimed proof that P ≠ NP by the veteran AI researcher Ben Goertzel. A rather mechanical discussion of the paper with GPT-5 yielded a summary of its argument in the form of 35 steps. If I had to sum up my procedure, it would be like this:
1. Summarize the argument, and list the points where you would focus for a detailed verification
2. Summarize how you would verify each point
3. For each point in turn, ask for more detail
4. If you summarized the proof in less than 50 logical steps, what would they be?
(I think that questions 1-3 prime the AI for a more nuanced decomposition of the proof, but haven’t tested this.)
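The four-stage protocol above can be sketched as a short script. This is a hypothetical illustration, not the STOC tool or any official procedure: `ask` is a stand-in for whatever LLM chat interface you use (an API client, a browser session, etc.), injected as a callable so the protocol itself can be exercised without a live model.

```python
# Minimal sketch of the four-stage proof-interrogation protocol described
# above. `ask(prompt) -> str` sends one message in an ongoing conversation
# with the paper already in context; it is a hypothetical stand-in for a
# real LLM client.

def interrogate_proof(ask, max_steps=50):
    """Run the four prompts in order; return the transcript as
    a list of (prompt, reply) pairs."""
    transcript = []

    def step(prompt):
        reply = ask(prompt)
        transcript.append((prompt, reply))
        return reply

    # 1. High-level summary plus the points deserving close scrutiny.
    step("Summarize the argument, and list the points where you would "
         "focus for a detailed verification.")
    # 2. A verification plan for each flagged point.
    step("Summarize how you would verify each point.")
    # 3. Drill into each point (collapsed into one prompt here; in
    #    practice you would loop over the points from step 1).
    step("For each point in turn, give more detail on the verification.")
    # 4. A bounded decomposition into explicit logical steps.
    step(f"If you summarized the proof in fewer than {max_steps} "
         f"logical steps, what would they be?")

    return transcript
```

With a real client, `ask` would be a closure over a persistent chat session so that steps 1–3 actually prime the model before step 4.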
What I plan to do now is use my mere human intelligence to work my way through the 35 steps, reconstruct the argument that way, and see if I agree with it.
But in principle I could ask GPT-5 to present a strict deductive formulation of each step. I could even ask it to do so in Lean.
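As a toy illustration of what a “strict deductive formulation” of a single step might look like in Lean, here is a purely hypothetical lemma (not from the paper in question): the transitivity of implication, the shape many individual proof steps reduce to.

```lean
-- Toy Lean 4 example of one deductive step, stated abstractly over
-- propositions; purely illustrative, not drawn from any actual paper.
theorem step_example (P Q R : Prop) (h1 : P → Q) (h2 : Q → R) : P → R :=
  fun hp => h2 (h1 hp)
```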
I am curious to what extent the “advanced Gemini tool” is based on a protocol like this, or whether something more sophisticated is involved.
Comment #5 November 3rd, 2025 at 2:27 pm
Great to see the theoretical CS community embracing experimental tools like this. The idea of offering authors optional, automated LLM-based feedback to catch mathematical or logical issues before formal review is super promising. By clarifying it’s opt-in, private, and not fed into the main review process, it respects authors’ control and builds trust in the system.
In that spirit, I’d also highlight a related service: monobot.ai. If you’re looking to streamline repetitive tasks, automate structured review comments, or scale feedback loops in your work (whether papers, drafts, or internal documents), monobot.ai can be a useful partner. It won’t replace domain-expert review in deep theory, but it builds the muscle of more frequent feedback, which aligns really well with the approach of this STOC pilot.
Kudos to the organizers for exploring how LLMs can assist rather than replace, and for thinking about privacy and data-use right from the start. Looking forward to seeing how this evolves, and how authors respond to the experience.
Comment #6 November 9th, 2025 at 12:34 am
Hi Scott,
Have you by any chance tested the tool on bogus P=NP or P!=NP manuscripts that you no doubt receive every day?