Archive for the ‘Announcements’ Category

Moar Updatez

Thursday, March 5th, 2026

To start on a somber note: those of us at UT Austin are in mourning this week for Savitha Shan, an undergrad double major here in economics and information systems, who was murdered over the weekend by an Islamist terrorist who started randomly shooting people on Sixth Street, apparently angry about the war in Iran. Two other innocents were also killed.

As it happens, these murders happened just a few hours after the end of my daughter’s bat mitzvah, and in walking distance from the venue. The bat mitzvah itself was an incredibly joyful and successful event that consumed most of my time lately, and which I might or might not say more about—the nastier the online trolls get, the more I need to think about my family’s privacy.


Of all the many quantum computing podcasts/interviews I’ve done recently, I’m probably happiest with this one, with Yuval Boger of QuEra. It covers all the main points about where the hardware currently is, the threat to public-key cryptography, my decades-long battle against quantum applications hype, etc. etc., and there’s even an AI-created transcript that eliminates my verbal infelicities!


A month ago, I blogged about “The Time I Didn’t Meet Jeffrey Epstein” (basically, because my mom warned me not to). Now the story has been written up in Science magazine, under the clickbaity headline “Meet Three Scientists Who Said No to Epstein.” (Besides yours truly, the other two scientists are friend-of-the-blog Sean Carroll, whose not-meeting-Epstein story I’d already heard directly from him, and David Agus, whose story I hadn’t heard.)

To be clear: as I explained in my post, I never actually said “no” to Epstein. Instead, based on my mom’s advice, I simply failed to follow up with his emissary, to the point where no meeting ever happened.

Anyway, ever since Science ran this story and it started making the rounds on social media, my mom has been getting congratulatory messages from friends of hers who saw it!


I’ve been a huge fan of the philosopher-novelist Rebecca Newberger Goldstein ever since I read her celebrated debut work, The Mind-Body Problem, back in 2005. Getting to know Rebecca and her husband, Steven Pinker, was a highlight of my last years at MIT. So I’m thrilled that Rebecca will be visiting UT Austin next week to give a talk on Spinoza, related to her latest book The Mattering Instinct (which I’m reading right now), and hosted by me and my colleague Galen Strawson in UT’s philosophy department. More info is in the poster below. If you’re in Austin, I hope to see you there!


The 88-year-old Donald Knuth has published a 5-page document about how Claude was able to solve a tricky graph theory problem that arose while he was working on the latest volume of The Art of Computer Programming—a series that Knuth is still writing after half a century. As you’d expect from Knuth, the document is almost entirely about the graph theory problem itself and Claude’s solution to it, eschewing broader questions about the nature of machine intelligence and how LLMs are changing life on Earth. To anyone who’s been following AI-for-math lately, the fact that Claude now can help with this sort of problem won’t come as a great shock. The virality is presumably because Knuth is such a legend that to watch him interact productively with an LLM is sort of like watching Leibniz, Babbage, or Turing do the same.


John Baez is a brilliant mathematical physicist and writer, who was blogging about science before the concept of “blogging” even existed, and from whom I’ve learned an enormous amount. But regarding John’s quest for the past 15 years — namely, to use category theory to help solve the climate crisis (!) — I always felt like the Cookie Monster would, with equal intellectual justification, say that the key to arresting climate change was for him to eat more Oreos. Then I read this Quanta article on the details of Baez’s project, and … uh … I confess it failed to change my view. Maybe someday I’ll understand why it’s better to say using category theory what I would’ve said in a 100x simpler way without category theory, but I fear that day is not today.

Anthropic: Stay strong!

Friday, February 27th, 2026

I don’t have time to write a full post right now, but hopefully this is self-explanatory.

Regardless of their broader views on the AI industry, the eventual risks from AI, or American politics, every person of conscience needs to stand behind Anthropic, as they stand up for their right to [checks notes] not be effectively nationalized by the Trump administration and forced to build murderbots and to help surveil American citizens. No, I wouldn’t have believed this either in a science-fiction movie, but it’s now just the straightforward reality of our world, years ahead of schedule. In particular, I call on all other AI companies, in the strongest possible terms, to do the right thing and stand behind Anthropic, in this make-or-break moment for the AI industry and the entire world.

Updatez!

Friday, February 20th, 2026
  1. The STOC’2026 accepted papers list is out. It seems to me that there’s a bounty of amazing stuff this year. I felt especially gratified to see the paper on the determination of BusyBeaver(5) on the list, reflecting a broad view of what theory of computing is about.
  2. There’s a phenomenal profile of Henry Yuen in Quanta magazine. Henry is now one of the world leaders of quantum complexity theory, involved in breakthroughs like MIP*=RE and now pioneering the complexity theory of quantum states and unitary transformations (the main focus of the profile). I’m proud that Henry tells Quanta that he learned about the field in 2007 or 2008 from a blog called … what was it again? … Shtetl-Optimized? I’m also proud that I got to help mentor Henry when he was a PhD student of my wife Dana Moshkovitz at MIT. Before I read this Quanta profile, I didn’t even know the backstory about Henry’s parents surviving and fleeing the Cambodian genocide, or about Henry growing up working in his parents’ restaurant. Henry never brought any of that up!
  3. See Lance’s blog for an obituary of Joe Halpern, a pioneer of the branch of theoretical computer science that deals with reasoning about knowledge (e.g., the muddy children puzzle), who sadly passed away last week. I knew Prof. Halpern a bit when I was an undergrad at Cornell. He was a huge presence in the Cornell CS department who’ll be sorely missed.
  4. UT Austin has announced the formation of a School of Computing, which will bring together the CS department (where I work) with statistics, data science, and several other departments. Many of UT’s peer institutions have recently done the same. Naturally, I’m excited for what this says about the expanded role of computing at UT going forward. We’ll be looking to hire even more new faculty than we were before!
  5. When I glanced at the Chronicle of Higher Education to see what was new, I learned that researchers at OpenAI had proposed a technical solution, called “watermarking,” that might help tackle the crisis of students relying on AI to write all their papers … but that OpenAI had declined to deploy that solution. The piece strongly advocates a legislative mandate in favor of watermarking LLM outputs, and addresses some of the main counterarguments to that position.
  6. For those who can’t get enough podcasts of me, here are the ones I’ve done recently. Quantum: Science vs. Mythology on the Peggy Smedley Show. AI Alignment, Complexity Theory, and the Computability of Physics, on Alexander Chin’s Philosophy Podcast. And last but not least, What Is Quantum Computing? on the Robinson Erhardt Podcast.
  7. Also, here’s an article entitled “Bitcoin needs a quantum upgrade. So why isn’t it happening?” that quotes me. And here’s an interview with me in Investor’s Business Daily, entitled “Is quantum computing the next big tech shift?” (I have no say over these titles.)

Nate Soares visiting UT Austin tomorrow!

Monday, February 9th, 2026

This is just a quick announcement that I’ll be hosting Nate Soares—who coauthored the self-explanatorily titled If Anyone Builds It, Everyone Dies with Eliezer Yudkowsky—tomorrow (Tuesday) at 5PM at UT Austin, for a brief talk followed by what I’m sure will be an extremely lively Q&A about his book. Anyone in the Austin area is welcome to join us.

Luca Trevisan Award for Expository Work

Friday, February 6th, 2026

Friend-of-the-blog Salil Vadhan has asked me to share the following.


The Trevisan Award for Expository Work is a new SIGACT award created in memory of Luca Trevisan (1971-2024), with a nomination deadline of April 10, 2026.

The award is intended to promote and recognize high-impact work expositing ideas and results from the Theory of Computation. The exposition can have various target audiences, e.g. people in this field, people in adjacent or remote academic fields, as well as the general public. The form of exposition can vary, and can include books, surveys, lectures, course materials, video, audio (e.g. podcasts), blogs and other media products. The award may be given to a single piece of work or a series produced over time. The award may be given to an individual, or a small group who together produced this expository work.

The awardee will receive USD$2000 (to be divided among the awardees if multiple), as well as travel support if needed to attend STOC, where the award will be presented. STOC’2026 is June 22-26 in Salt Lake City, Utah.

The endowment for this prize was initiated by a gift from Avi Wigderson, drawing on his Turing Award, and has been subsequently augmented by other individuals.

For more details see here.

Guest Post from an Iranian

Saturday, January 31st, 2026

The following guest post was written by a Shtetl-Optimized fan in Iran, who’s choosing to remain anonymous for obvious reasons. I’m in awe of the courage of this individual and the millions of other Iranians who’ve risked or, tragically, sacrificed their lives these past few weeks, to stand for something about as unequivocally good and against something about as unequivocally evil as has ever existed on earth. I’m enraged at the relative indifference of the world, and of the US in particular, to these brave Iranians’ plight. There’s still time for the US to fulfill its promise to the protesters and do the right thing—something that I’ll support even if it endangers my friends and family living in Israel. I check the news from Iran every day, and pray that my friends and colleagues there stay safe—and that they, and the world, will soon be free from the Ayatollahs, who now stand fully unmasked before the world as the murderous thugs they always were. –SA


Guest Post from an Iranian

The protests began in Tehran on December 28, 2025, triggered by economic instability and high inflation, and spread to other provinces. People, tired of the regime and aware that every president is just a puppet with no real power, began targeting the source of authority by chanting directly against Khamenei. After government forces killed several protesters, Trump said on January 3 that if they shoot, the U.S. will come to the rescue. Protests continued, and on January 6, Reza Pahlavi called for demonstrations at 8 PM on January 8 and 9. At first, the regime’s supporters all mocked this and said nobody would come. On those days, they shared videos of empty streets on the news to claim that nobody had shown up. But in fact, many people joined the protests. Right around 8 PM on January 8, the government shut down the internet. Only Iran’s internal network remained active, meaning local apps and websites hosted on Iranian servers still work, but the rest of the world was completely cut off.

The regime fears the internet so much that it has officially announced that anyone using Starlink is considered a spy for foreign countries, especially Mossad, and will be punished. As a result, Starlink owners are extremely cautious and rarely let others know they have it.

I know many students who missed deadlines or interviews because of the internet shutdown. Some students were forced to travel near Iran’s borders and use Afghanistan’s or Iraq’s internet just to check their email. I personally missed the deadlines for two universities. Just before the internet shutdown, a professor sent me a problem sheet that was part of the application process, and I could not even inform him about the situation. For the past four years since completing my undergraduate studies, my only dream has been to pursue a PhD. I come from a low-income family, and I did everything in my power to reach this stage. I tried to control every variable that might disrupt my four-year plan. Yet now it seems I have failed, and I face an uncertain future.

At the same time, U.S. sanctions have significantly limited Iranian opportunities to study at universities worldwide. With Trump’s travel ban on all Iranians, along with some European countries following U.S. sanctions by rejecting Iranian applicants solely based on nationality, our options have become limited (for example, see the “Evaluation criteria” section). The recent internet shutdown has worsened the situation and left us with even fewer opportunities. While the regime shuts down our internet and takes away our opportunities, the very people responsible for this suppression are ensuring their own children never face such obstacles (I will return to this at the end of the post).

On January 8, my sister and I participated. We were inside our car when Special Units and Basij thugs shot at civilians on the pedestrian path with a shotgun, exactly two meters away from us. I was so shocked that I could not even respond. My sister pushed my head under the car’s dashboard to prevent me from getting shot. I come from a very small town, and this was the level of suppression we witnessed there. Now imagine the scale of suppression in major cities like Tehran, and suddenly the number of protesters reported killed in the news begins to make sense.

We now see tweets on X that not only deny the killings but openly mock them. Is it really possible to deny the body bags in Kahrizak? If a government shuts down the internet across an entire country for three weeks to prevent information from leaking out, do you believe it when it claims the sky is blue? (Check NetBlocks.org and this on Mastodon.)

After January 8, many of the regime’s puppets, who are funded to spread its propaganda in Western media, began whitewashing events on U.S. and European TV, claiming that nobody was killed or that it was a terrorist attack and the government had to act. Some even claim that the protesters are violent rioters and the government has the right to shoot them with war ammunition. Iranians call these puppets “bloodwashers.”

These bloodwashers forget that since 1979, people have tried every possible way to express their opinions and demands, and all of it was ridiculed by the regime and its supporters. Every attempt was suppressed without even being heard. So how do you think things will turn out? Clearly, people become more aggressive in each wave of protests, a pattern you can see in every uprising since 2009. This is also accompanied by worsening poverty. Ordinary people suffer from hunger because some radicals refuse to talk with the U.S., while regime supporters enjoy unlimited access to money and privileges.

Out of the four presidential elections held since 2009, in three the people elected presidents who promised to pursue a deal with the U.S., from the so-called Reformist party. People were desperate for change, because they knew their situation could only improve if the regime talked with the U.S. Many called the voters naïve, arguing that presidents cannot truly make a difference and lack real power, often saying, “Khamenei would never allow that.” I believe many of the voters knew that deep down. They knew that each time a president speaks about negotiating with the U.S., Khamenei suddenly gathers all his supporters and declares, “No, I am not okay with talking with the U.S.” Still, people felt they had no real alternative but elections. After the 2015 nuclear deal (the Joint Comprehensive Plan of Action), people thought they could finally live normal lives and have normal relations with other countries (see how people celebrated the deal on the night it was finalized). At the time, I was even planning to assemble a new PC, and thought it might be better to wait and buy the parts from Amazon! We didn’t yet know what the IRGC had planned for us over the next ten years.

Now, all their actions and stubbornness have led them to a point where they must either surrender completely (the deal Trump is talking about, which essentially takes away everything that makes the Islamic Republic the Islamic Republic), or force another war on our people and then surrender disgracefully. People are now saying: “Come on, the U.S. you wanted to destroy so badly has come. Take all your supporters and go fight it. Or perhaps you are only brave against ordinary unarmed people.” After the October 7 attacks, it was inevitable that their time would come one day, but they still did not want to listen. I often see debates about whether U.S. involvement in other countries is good, or whether the U.S. should isolate itself because other countries are not its business.
I believe decisions regarding Iran were made weeks ago, and we now have no choice but to wait and see what happens. I just hope that the situation turns out better for the people.

As I mentioned earlier, Islamic regime officials chant “death to the U.S. and the West,” yet they send their children to Western countries. These children use funds and opportunities that could have gone to far more deserving people, while living comfortably and anonymously in the very societies their parents want to destroy.

They flee the country their parents made and climb the social ladder of western societies, while ordinary students cannot even afford a simple TOEFL exam and survive on as little as five dollars a month.

When ordinary Iranian students apply for visas, especially for the U.S. and Canada, they are forced to provide every detail of their lives to prove they are not terrorists and that they will return to Iran. Sometimes they have to explain to the embassy officer the topics of their professors’ papers, the health condition of their father, and whether their family owns a house (the last two serving as indirect indicators of whether they will return). If they are lucky enough not to be rejected within ten minutes, they may enter a clearance process that takes at least a year. Only then might they receive a visa. But when it comes to the children of the regime’s officials, how is it that they freely enter and live there without issue?

There are countless examples. Mohammad Reza Aref, a Stanford graduate and current Vice President who has repeatedly worn an IRGC uniform in public shows of support, has sons who earned PhDs from EPFL and the University of Florida, one of whom publicly attributed this success to “good genes.” Ali Larijani, an IRGC officer, had a daughter working at Emory University until last week. Masoumeh Ebtekar, who climbed the wall of the U.S. Embassy during the 1979 Islamic Revolution, has a son, Eissa Hashemi, who is an adjunct faculty member at The Chicago School of Professional Psychology.

Many Iranians are now actively raising awareness through petitions and protests at these individuals’ workplaces. One example is the petition regarding Eissa Hashemi. Protests at Emory University have reportedly led to Fatemeh Larijani’s recent unemployment. (The Larijani family holds critical roles in the regime, and in fact many members of the family have studied or currently live in Western countries. There is even a saying that while the people were forced to fight the U.S., the Larijanis were filling out university application forms.)

When these individuals occupy seats in your labs or use your tax-funded resources, it directly affects the integrity of your institutions and the opportunities available to those who actually share your values. You do not even need to spend time investigating these people yourself. Iranians will protest outside offices, or send emails about any of your colleagues who fit this description. All I ask is that the next time you receive multiple emails about a particular Iranian colleague, or hear about protests near your workplace, you spend just five minutes considering what is being said.

Thank you to everyone who took the time to read this. I know it is long, and I know it is heavy. I wrote it because silence and denial only help suppression survive, and because attention, however brief, matters.
I hope that better and freer days come.

Quantum is my happy place

Wednesday, January 28th, 2026
  • Here’s a 53-minute podcast that I recorded this afternoon with a high school student named Micah Zarin, and which ended up covering …[checks notes] … consciousness, free will, brain uploading, the Church-Turing Thesis, AI, quantum mechanics and its various interpretations, quantum gravity, quantum computing, and the discreteness or continuity of the laws of physics. I strongly recommend 2x speed as usual.
  • QIP’2026, the world’s premier quantum computing conference, is happening right now in Riga, Latvia, locally organized by a team headed by the great Andris Ambainis, who I’ve known since 1999 and who’s played a bigger role in my career than almost anyone else. I’m extremely sorry not to be there, despite what I understand to be the bitter cold. Family and teaching obligations mean that I jet around the world so much less than I used to. But many of my students and colleagues are there, and I’ll plan a blog post on news from QIP next week.
  • Greg Burnham of Epoch AI tells me that Epoch has released a list of AI-for-math challenge problems—i.e., open math problems that are below the level of P vs. NP and the Riemann Hypothesis but still of very serious research interest, and that they’re putting forward as worthy targets right now for trying to solve with AI assistance. A few examples that should be familiar to some Shtetl-Optimized readers: degree vs. sensitivity of Boolean functions, improving the constant in the exponent of the General Number Field Sieve, giving an algorithm to test whether a knot has unknotting number of 1, and extending Apéry’s proof of the irrationality of ζ(3) to other constants. Notably, for each problem, alongside a beautifully written description by a (human) expert, they also show you what the state-of-the-art models were able to do on that problem when they tried.
  • There’s been a major advance in understanding constant-depth quantum circuits, by my former PhD student Daniel Grier (now a professor at UCSD), along with his PhD student Jackson Morris and Kewen Wu of IAS. Namely, they show that any function computable in TC⁰ (constant-depth, polynomial-size classical circuits with threshold gates) is also computable in QAC⁰ (constant-depth quantum circuits with 1-qubit and generalized Toffoli gates), as long as you provide many copies of the input. Two examples of such TC⁰ functions, which we therefore now know to be in QAC⁰ given many copies of the input, are Parity and Majority. It’s been a central open problem of quantum complexity theory for a quarter-century to prove that Parity is not in QAC⁰, complementing the celebrated result from the 1980s that Parity is not in classical AC⁰ (a constant-depth circuit class that, for all we know, might be incomparable with QAC⁰). It’s known that showing Parity ∉ QAC⁰ is equivalent to showing that QAC⁰ can’t implement the “fanout” function, which makes many copies of an input bit. To say that we’ve gained a new understanding of why this problem is so hard would be an understatement.

My Christmas gift: telling you about PurpleMind, which brings CS theory to the YouTube masses

Wednesday, December 24th, 2025

Merry Christmas, everyone! Ho3!

Here’s my beloved daughter baking chocolate chip cookies, which she’ll deliver tomorrow morning with our synagogue to firemen, EMTs, and others who need to work on Christmas Day. My role was limited to taste-testing.

While (I hope you’re sitting down for this) the Aaronson-Moshkovitzes are more of a latke/dreidel family, I grew up surrounded by Christmas and am a lifelong enjoyer of the decorations, the songs and movies (well, some of them), the message of universal goodwill, and even gingerbread and fruitcake.


Therefore, as a Christmas gift to my readers, I hereby present what I now regard as one of the great serendipitous “discoveries” in my career, alongside students like Paul Christiano and Ewin Tang who later became superstars.

Ever since I was a pimply teen, I dreamed of becoming the prophet who’d finally bring the glories of theoretical computer science to the masses—who’d do for that systematically under-sung field what Martin Gardner did for math, Carl Sagan for astronomy, Richard Dawkins for evolutionary biology, Douglas Hofstadter for consciousness and Gödel. Now, with my life half over, I’ve done … well, some in that direction, but vastly less than I’d dreamed.

A month ago, I learned that maybe I can rest easier. For a young man named Aaron Gostein is doing the work I wish I’d done—and he’s doing it using tools I don’t have, and so brilliantly that I could barely improve a pixel.

Aaron recently graduated from Carnegie Mellon, majoring in CS. He’s now moved back to Austin, TX, where he grew up, and where of course I now live as well. (Before anyone confuses our names: mine is Scott Aaronson, even though I’ve gotten hundreds of emails over the years calling me “Aaron.”)

Anyway, here in Austin, Aaron is producing a YouTube channel called PurpleMind. In starting this channel, Aaron was directly inspired by Grant Sanderson’s 3Blue1Brown—a math YouTube channel that I’ve also praised to the skies on this blog—but Aaron has chosen to focus on theoretical computer science.

I first encountered Aaron a month ago, when he emailed asking to interview me about … which topic will it be this time, quantum computing and Bitcoin? quantum computing and AI? AI and watermarking? no, diagonalization as a unifying idea in mathematical logic. That got my attention.

So Aaron came to my office and we talked for 45 minutes. I didn’t expect much to come of it, but then Aaron quickly put out this video, in which I have a few unimportant cameos:

After I watched this, I brought Dana and the kids and even my parents to watch it too. The kids, whose attention spans normally leave much to be desired, were sufficiently engaged that they made me pause every 15 seconds to ask questions (“what would go wrong if you diagonalized a list of all whole numbers, where we know there are only ℵ0 of them?” “aren’t there other strategies that would work just as well as going down the diagonal?”).

Seeing this, I sat the kids down to watch more PurpleMind. Here’s the video on the P versus NP problem:

Here’s one on the famous Karatsuba algorithm, which reduced the number of steps needed to multiply two n-digit numbers from ~n^2 to only ~n^1.585, and thereby helped inaugurate the entire field of algorithms:
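For readers who want to see where that exponent comes from, here’s a minimal Python sketch of Karatsuba’s trick (my own illustration, not code from the video): splitting each number in half requires only three recursive multiplications instead of four, because ad + bc = (a+b)(c+d) − ac − bd.

```python
def karatsuba(x, y):
    """Multiply nonnegative integers with ~n^1.585 digit operations,
    instead of the schoolbook ~n^2, by trading one of the four
    recursive sub-multiplications for a few extra additions."""
    if x < 10 or y < 10:
        return x * y  # single-digit base case
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)   # x = a*10^half + b
    c, d = divmod(y, 10 ** half)   # y = c*10^half + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # One multiplication gives both cross terms: (a+b)(c+d) - ac - bd = ad + bc
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd
    return ac * 10 ** (2 * half) + ad_plus_bc * 10 ** half + bd
```

Three subproblems of half the size per level is what yields the exponent log₂3 ≈ 1.585.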

Here’s one on RSA encryption:

Here’s one on how computers quickly generate the huge random prime numbers that RSA and other modern encryption methods need:
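The standard recipe behind that topic, sketched here in Python as my own illustration (not the video’s code): draw random odd numbers of the desired bit length and test each with Miller-Rabin until one passes.

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin test: a composite n survives one random round with
    probability at most 1/4, so 40 rounds make errors negligible."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):  # quick trial division
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:               # write n-1 = d * 2^s with d odd
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)            # fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness: n is composite
    return True

def random_prime(bits=1024):
    """Keep sampling random odd numbers with the top bit set until one
    passes the test; by the prime number theorem, only ~bits*ln(2)/2
    odd candidates are needed on average."""
    while True:
        n = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(n):
            return n
```

This is why RSA key generation takes a perceptible fraction of a second rather than an eternity: testing primality is fast even though factoring is (we believe) hard.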

These are the only ones we’ve watched so far. Each one strikes me as close to perfection. There are many others (for example, on Diffie-Hellman encryption, the Bernstein-Vazirani quantum algorithm, and calculating pi) that I’m guessing will be equally superb.

In my view, what makes these videos so good is their concreteness, achieved without loss of correctness. When, for example, Aaron talks about Gödel mailing a letter to the dying von Neumann posing what we now know as P vs. NP, or any other historical event, he always shows you an animated reconstruction. When he talks about an algorithm, he always shows you his own Python code, and what happened when he ran the code, and then he invites you to experiment with it too.

I might even say that the results singlehandedly justify the existence of YouTube, as the ten righteous men would’ve saved Sodom—with every crystal-clear animation of a CS concept canceling out a thousand unboxing videos or screamingly-narrated Minecraft play-throughs in the eyes of God.

Strangely, the comments below Aaron’s YouTube videos attack him relentlessly for his use of AI to help generate the animations. To me, it seems clear that AI is the only thing that could let one person, with no production budget to speak of, create animations of this quality and quantity. If people want so badly for the artwork to be 100% human-generated, let them volunteer to create it themselves.


Even as I admire the PurpleMind videos, or the 3Blue1Brown videos before them, a small part of me feels melancholic. From now until death, I expect that I’ll have only the same pedagogical tools that I acquired as a young’un: talking; waving my arms around; quizzing the audience; opening the floor to Q&A; cracking jokes; drawing crude diagrams on a blackboard or whiteboard until the chalk or the markers give out; typing English or LaTeX; the occasional PowerPoint graphic that might (if I’m feeling ambitious) fade in and out or fly across the screen.

Today there are vastly better tools, both human and AI, that make it feasible to create spectacular animations for each and every mathematical concept, as if transferring the imagery directly from mind to mind. In the hands of a master explainer like Grant Sanderson or Aaron Gostein, these tools are tractors to my ox-drawn plow. I’ll be unable to compete in the long term.

But then I reflect that at least I can help this new generation of math and CS popularizers, by continuing to feed them raw material. I can do cameos in their YouTube productions. Or if nothing else, I can bring their jewels to my community’s attention, as I’m doing right now.

Peace on Earth, and to all a good night.

Happy Chanukah

Monday, December 15th, 2025

This (taken in Kiel, Germany in 1931 and then colorized) is one of the most famous photographs in Jewish history, but it acquired special resonance this weekend. It communicates pretty much everything I’d want to say about the Bondi Beach massacre in Australia, more succinctly than I could in words.

But I can’t resist sharing one more photo, after GPT5-Pro helpfully blurred the faces for me. This is my 8-year-old son Daniel, singing a Chanukah song at our synagogue, at an event the morning after the massacre, which was packed despite the extra security needed to get in.

Alright, one more photo. This is Ahmed Al Ahmed, the hero who tackled and disarmed one of the terrorists, recovering in the hospital from his gunshot wounds. Facebook and Twitter and (alas) sometimes the comment section of this blog show me the worst of humanity, day after day after day, so it’s important to remember the best of humanity as well.

Chanukah, of course, is the most explicitly Zionist of all Jewish holidays, commemorating as it does the Maccabees’ military victory against the Seleucid Greeks, in their (historically well-attested) wars of 168–134 BCE to restore an independent Jewish state with its capital in Jerusalem. In a sense, then, the terrorists were precisely correct, when they understood the cry “globalize the intifada” to mean “murder random Jews anywhere on earth, even halfway around the world, who might be celebrating Chanukah.” By the lights of the intifada worldview, Chabadniks in Sydney were indeed perfectly legitimate targets. By my worldview, though, the response is equally clear: to abandon all pretense, and say openly that now, as in countless generations past, Jews everywhere are caught up in a war, not of our choosing, which we “win” merely by surviving with culture and memory intact.

Happy Chanukah.

Theory and AI Alignment

Saturday, December 6th, 2025

The following is based on a talk that I gave (remotely) at the UK AI Safety Institute Alignment Workshop on October 29, and which I then procrastinated for more than a month before writing up. Enjoy!


Thanks for having me! I’m a theoretical computer scientist. I’ve spent most of my ~25-year career studying the capabilities and limits of quantum computers. But for the past 3 or 4 years, I’ve also been moonlighting in AI alignment. This started with a 2-year leave at OpenAI, in what used to be their Superalignment team, and it’s continued with a 3-year grant from Coefficient Giving (formerly Open Philanthropy) to build a group here at UT Austin, looking for ways to apply theoretical computer science to AI alignment. Before I go any further, let me mention some action items:

  • Our Theory and Alignment group is looking to recruit new PhD students this fall! You can apply for a PhD at UTCS here; the deadline is quite soon (December 15). If you specify that you want to work with me on theory and AI alignment (or on quantum computing, for that matter), I’ll be sure to see your application. For this, there’s no need to email me directly.
  • We’re also looking to recruit one or more postdoctoral fellows, working on anything at the intersection of theoretical computer science and AI alignment! Fellowships to start in Fall 2026 and continue for two years. If you’re interested in this opportunity, please email me by January 15. Include in your email a CV, 2-3 of your papers, and a research statement and/or a few paragraphs about what you’d like to work on here. Also arrange for two recommendation letters to be emailed to me. Please do this even if you’ve contacted me in the past about a potential postdoc.
  • While we seek talented people, we also seek problems for those people to solve: any and all CS theory problems motivated by AI alignment! Indeed, we’d like to be a sort of theory consulting shop for the AI alignment community. So if you have such a problem, please email me! I might even invite you to speak to our group about your problem, either by Zoom or in person.

Our search for good problems brings me nicely to the central difficulty I’ve faced in trying to do AI alignment research. Namely, while there’s been some amazing progress over the past few years in this field, I’d describe the progress as having been almost entirely empirical—building on the breathtaking recent empirical progress in AI capabilities. We now know a lot about how to do RLHF, how to jailbreak and elicit scheming behavior, how to look inside models and see what’s going on (interpretability), and so forth—but it’s almost all been a matter of trying stuff out and seeing what works, and then writing papers with a lot of bar charts in them.

The fear is of course that ideas that only work empirically will stop working when it counts—like, when we’re up against a superintelligence. In any case, I’m a theoretical computer scientist, as are my students, so of course we’d like to know: what can we do?

After a few years, alas, I still don’t feel like I have any systematic answer to that question. What I have instead is a collection of vignettes: problems I’ve come across where I feel like a CS theory perspective has helped, or plausibly could help. So that’s what I’d like to share today.


Probably the best-known thing I’ve done in AI safety is a theoretical foundation for how to watermark the outputs of Large Language Models. I did that shortly after starting my leave at OpenAI—even before ChatGPT came out. Specifically, I proposed something called the Gumbel Softmax Scheme, by which you can take any LLM that’s operating at a nonzero temperature—any LLM that could produce exponentially many different outputs in response to the same prompt—and replace some of the entropy with the output of a pseudorandom function, in a way that encodes a statistical signal, which someone who knows the key of the PRF could later detect and say, “yes, this document came from ChatGPT with >99.9% confidence.” The crucial point is that the quality of the LLM’s output isn’t degraded at all, because we aren’t changing the model’s probabilities for tokens, but only how we use the probabilities. That’s the main thing that was counterintuitive to people when I explained it to them.
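Here’s a minimal sketch of the idea (illustrative only, not OpenAI’s code; HMAC-SHA256 stands in for the keyed PRF, and all names and parameters are mine). The key fact is that picking the token i maximizing r_i^(1/p_i), where the r_i are pseudorandom uniforms derived from the secret key and the context, samples i with probability exactly p_i, while biasing the chosen token toward large r_i, which is the detectable signal:

```python
import hashlib
import hmac
import math

def prf_uniforms(key: bytes, context: str, vocab_size: int) -> list:
    """Pseudorandom uniforms in (0,1), one per vocabulary token, derived
    from a keyed PRF (HMAC-SHA256 here) applied to the preceding context."""
    out = []
    for i in range(vocab_size):
        d = hmac.new(key, f"{context}|{i}".encode(), hashlib.sha256).digest()
        out.append((int.from_bytes(d[:8], "big") + 1) / (2**64 + 2))
    return out

def watermarked_choice(probs, r):
    """Pick the token i maximizing r_i^(1/p_i) (assumes all probs > 0).
    This samples i with probability exactly probs[i] (the 'exponential
    race' trick), so output quality is untouched; but the chosen token
    tends to have a large r_i."""
    return max(range(len(probs)), key=lambda i: r[i] ** (1.0 / probs[i]))

def detect_score(key, contexts, tokens, vocab_size):
    """Sum of -ln(1 - r) over the chosen tokens.  For text unrelated to
    the key this averages 1.0 per token; watermarked text scores higher."""
    total = 0.0
    for ctx, tok in zip(contexts, tokens):
        r = prf_uniforms(key, ctx, vocab_size)
        total += -math.log(1.0 - r[tok])
    return total
```

On toy data (a uniform distribution over 8 tokens), the watermarked score averages roughly 2.7 per token versus 1.0 for unwatermarked text, so a few dozen tokens already suffice for high-confidence detection.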

Unfortunately, OpenAI never deployed my method: they were worried, among other things, about risk to the product, with customers hating the idea of watermarking and leaving for a competing LLM. Google DeepMind has deployed something extremely similar to what I proposed in Gemini, as part of what they call SynthID. But you have to apply to them if you want to use their detection tool, and they’ve been stingy with granting access to it. So it’s of limited use to my many faculty colleagues who’ve been begging me for a way to tell whether their students are using AI to cheat on their assignments!

Sometimes my colleagues in the alignment community will say to me: look, we care about stopping a superintelligence from wiping out humanity, not so much about stopping undergrads from using ChatGPT to write their term papers. But I’ll submit to you that watermarking actually raises a deep and general question: in what senses, if any, is it possible to “stamp” an AI so that its outputs are always recognizable as coming from that AI? You might think that it’s a losing battle. Indeed, already with my Gumbel Softmax Scheme for LLM watermarking, there are countermeasures, like asking ChatGPT for your term paper in French and then sticking it into Google Translate, to remove the watermark.

So I think the interesting research question is: can you watermark at the semantic level—the level of the underlying ideas—in a way that’s robust against translation and paraphrasing and so forth? And how do we formalize what we even mean by that? While I don’t know the answers to these questions, I’m thrilled that brilliant theoretical computer scientists, including my former UT undergrad (now Berkeley PhD student) Sam Gunn and Columbia’s Miranda Christ and Tel Aviv University’s Or Zamir and my old friend Boaz Barak, have been working on it, generating insights well beyond what I had.


Closely related to watermarking is the problem of inserting a cryptographically undetectable backdoor into an AI model. That’s often thought of as something a bad guy would do, but the good guys could do it also! For example, imagine we train a model with a hidden failsafe, so that if it ever starts killing all the humans, we just give it the instruction ROSEBUD456 and it shuts itself off. And imagine that this behavior was cryptographically obfuscated within the model’s weights—so that not even the model itself, examining its own weights, would be able to find the ROSEBUD456 instruction in less than astronomical time.
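As a toy illustration of the flavor of this (all names mine, and a far cry from the actual weight-level constructions): below, a SHA-256 hash stands in for the cryptographic obfuscation, so that the wrapper can recognize the trigger without the trigger string being recoverable from it.

```python
import hashlib

# In deployment, only TRIGGER_HASH would ship, not the string itself;
# recovering the trigger would then require inverting SHA-256.  (A hash
# is a toy stand-in for obfuscating the behavior inside model weights.)
TRIGGER_HASH = hashlib.sha256(b"ROSEBUD456").hexdigest()

def guarded_model(instruction: str, model_response) -> str:
    """Wrap a model so that the secret trigger instruction shuts it down."""
    if hashlib.sha256(instruction.encode()).hexdigest() == TRIGGER_HASH:
        return "SHUTDOWN"
    return model_response(instruction)
```

Of course, hiding a trigger in wrapper code is much weaker than hiding it in the weights themselves, which is exactly what the cryptographic constructions discussed below aim to achieve.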

There’s an important paper of Goldwasser et al. from 2022 showing that, for certain classes of ML models, this sort of backdooring can provably be done under known cryptographic hardness assumptions, including Continuous LWE and the hardness of the Planted Clique problem. But there are technical issues with that paper, which (for example) Sam Gunn, Miranda Christ, and Neekon Vafa have recently pointed out, and I think further work is needed to clarify the situation.

More fundamentally, though, a backdoor being undetectable doesn’t imply that it’s unremovable. Imagine an AI model that encases itself in some wrapper code that says, in effect: “If I ever generate anything that looks like a backdoored command to shut myself down, then overwrite it with ‘Stab the humans even harder.'” Or imagine an evil AI that trains a second AI to pursue the same nefarious goals, this second AI lacking the hidden shutdown command.

So I’ll throw out, as another research problem: how do we even formalize what we mean by an “unremovable” backdoor—or rather, a backdoor that a model can remove only at a cost to its own capabilities that it doesn’t want to pay?


Related to backdoors, maybe the clearest place where theoretical computer science can contribute to AI alignment is in the study of mechanistic interpretability. If you’re given as input the weights of a deep neural net, what can you learn from those weights in polynomial time, beyond what you could learn from black-box access to the neural net?

In the worst case, we certainly expect that some information about the neural net’s behavior could be cryptographically obfuscated. And answering certain kinds of questions, like “does there exist an input to this neural net that causes it to output 1?”, is just provably NP-hard.

That’s why I love a question that Paul Christiano, then of the Alignment Research Center (ARC), raised a couple years ago, and which has become known as the No-Coincidence Conjecture. Given as input the weights of a neural net C, Paul essentially asks how hard it is to distinguish the following two cases:

  • NO-case: C:{0,1}^{2n}→R^n is totally random (i.e., the weights are i.i.d. N(0,1) Gaussians), or
  • YES-case: C(x) has at least one positive entry for all x∈{0,1}^{2n}.

Paul conjectures that there’s at least an NP witness, proving with (say) 99% confidence that we’re in the YES-case rather than the NO-case. To clarify, there should certainly be an NP witness that we’re in the NO-case rather than the YES-case—namely, an x such that C(x) is all negative, which you should think of here as the “bad” or “kill all humans” outcome. In other words, the problem is in the class coNP. Paul thinks it’s also in NP. Someone else might make the even stronger conjecture that it’s in P.
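To make the two cases concrete, here’s a toy brute-force checker (all parameters and encoding choices mine): it builds a small random net with i.i.d. N(0,1) weights and tests the universal YES-condition by exhausting all 2^(2n) inputs, which is exactly the computation that a short NP witness would let us avoid.

```python
import itertools
import math
import random

def random_net(n, width, depth, seed=0):
    """A toy fully-connected net C: {0,1}^(2n) -> R^n with i.i.d. N(0,1)
    weights and tanh activations on the hidden layers."""
    rng = random.Random(seed)
    dims = [2 * n] + [width] * depth + [n]
    layers = [[[rng.gauss(0, 1) for _ in range(dims[i + 1])]
               for _ in range(dims[i])] for i in range(len(dims) - 1)]

    def C(x):
        # Encode input bits as +/-1 so the all-zeros input isn't degenerate.
        h = [1.0 if b else -1.0 for b in x]
        for li, W in enumerate(layers):
            out = [sum(h[j] * W[j][k] for j in range(len(h)))
                   for k in range(len(W[0]))]
            h = [math.tanh(v) for v in out] if li < len(layers) - 1 else out
        return h
    return C

def yes_condition_holds(C, n):
    """Brute force over all 2^(2n) inputs: does C(x) always have a positive
    entry?  Feasible only for toy n; a short witness would replace this."""
    return all(any(v > 0 for v in C(x))
               for x in itertools.product((0, 1), repeat=2 * n))
```

For n=3 this means 64 evaluations; at realistic sizes the universal check is hopeless, which is what makes the existence (or not) of a succinct witness interesting.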

Personally, I’m skeptical: I think the “default” might be that, when we do satisfy the unlikely YES-condition, we satisfy it for some totally inscrutable and obfuscated reason. But I like the fact that there is an answer to this! And that the answer, whatever it is, would tell us something new about the prospects for mechanistic interpretability.

Recently, I’ve been working with a spectacular undergrad at UT Austin named John Dunbar. John and I have not managed to answer Paul Christiano’s no-coincidence question. What we have done, in a paper that we recently posted to the arXiv, is to establish the prerequisites for properly asking the question in the context of random neural nets. (It was precisely because of difficulties in dealing with “random neural nets” that Paul originally phrased his question in terms of random reversible circuits—say, circuits of Toffoli gates—which I’m perfectly happy to think about, but might be very different from ML models in the relevant respects!)

Specifically, in our recent paper, John and I pin down for which families of neural nets the No-Coincidence Conjecture makes sense to ask about. This ends up being a question about the choice of nonlinear activation function computed by each neuron. With some choices, a random neural net (say, with i.i.d. Gaussian weights) converges to compute a constant function, or nearly constant function, with overwhelming probability—which means that the NO-case and the YES-case above are usually information-theoretically impossible to distinguish (but occasionally trivial to distinguish). We’re interested in those activation functions for which C looks “pseudorandom”—or at least, for which C(x) and C(y) quickly become uncorrelated for distinct inputs x≠y (the property known as “pairwise independence”).

We showed that, at least for random neural nets that are exponentially wider than they are deep, this pairwise independence property will hold if and only if the activation function σ satisfies E_{x~N(0,1)}[σ(x)]=0—that is, it has a Gaussian mean of 0. For example, the tanh function, being odd, satisfies this property, but the ReLU function does not: its Gaussian mean is 1/√(2π). Amusingly, however, $$ \sigma(x) := \text{ReLU}(x) - \frac{1}{\sqrt{2\pi}} $$ does satisfy the property.
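These Gaussian means are easy to check numerically (a quick Monte Carlo sketch, with the standard facts that tanh is odd and that E_{x~N(0,1)}[ReLU(x)] = 1/√(2π), so that constant is the right recentering):

```python
import math
import random

def gaussian_mean(sigma, samples=200_000, seed=1):
    """Monte Carlo estimate of E_{x~N(0,1)}[sigma(x)]."""
    rng = random.Random(seed)
    return sum(sigma(rng.gauss(0.0, 1.0)) for _ in range(samples)) / samples

relu = lambda x: max(0.0, x)
centered_relu = lambda x: relu(x) - 1.0 / math.sqrt(2.0 * math.pi)

# tanh is odd, so its Gaussian mean is exactly 0; plain ReLU's Gaussian
# mean is 1/sqrt(2*pi) ≈ 0.399, so subtracting that constant recenters it.
```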

Of course, none of this answers Christiano’s question: it merely lets us properly ask his question in the context of random neural nets, which seems closer to what we ultimately care about than random reversible circuits.


I can’t resist giving you another example of a theoretical computer science problem that came from AI alignment—in this case, an extremely recent one that I learned from my friend and collaborator Eric Neyman at ARC. This one is motivated by the question: when doing mechanistic interpretability, how much would it help to have access to the training data, and indeed the entire training process, in addition to the weights of the final trained model? And to whatever extent it does help, is there some short “digest” of the training process that would serve just as well? But we’ll state the question as just abstract complexity theory.

Suppose you’re given a polynomial-time computable function f:{0,1}^m→{0,1}^n, where (say) m=n^2. We think of x∈{0,1}^m as the “training data plus randomness,” and we think of f(x) as the “trained model.” Now, suppose we want to compute lots of properties of the model that information-theoretically depend only on f(x), but that might only be efficiently computable given x also. We now ask: is there an efficiently-computable O(n)-bit “digest” g(x), such that these same properties are also efficiently computable given only g(x)?

Here’s a potential counterexample that I came up with, based on the RSA encryption function (so, not a quantum-resistant counterexample!). Let N be a product of two n-bit prime numbers p and q, and let b be a generator of the multiplicative group mod N. Then let f(x) = b^x (mod N), where x is an n^2-bit integer. This is of course efficiently computable because of repeated squaring. And there’s a short “digest” of x that lets you compute, not only b^x (mod N), but also c^x (mod N) for any other element c of the multiplicative group mod N. This is simply x mod φ(N), where φ(N)=(p-1)(q-1) is the Euler totient function—in other words, the period of f. On the other hand, it’s totally unclear how to compute this digest—or, crucially, any other O(n)-bit digest that lets you efficiently compute c^x (mod N) for any c—unless you can factor N. There’s much more to say about Eric’s question, but I’ll leave it for another time.
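The digest property is easy to see in miniature (toy parameters entirely mine; real moduli would be thousands of bits, and knowing p and q here is exactly the factoring knowledge that makes the digest computable):

```python
import random

# Toy parameters: two small primes standing in for n-bit primes.
p, q = 10007, 10009
N, phi = p * q, (p - 1) * (q - 1)

random.seed(0)
x = random.getrandbits(200)  # the long exponent: "training data plus randomness"

digest = x % phi             # the short digest: x reduced mod the period

# By Euler's theorem, c^phi ≡ 1 (mod N) for any c coprime to N, so the
# digest suffices to evaluate c^x (mod N) for ANY such base c:
for c in (2, 3, 12345):
    assert pow(c, x, N) == pow(c, digest, N)
```

Note that computing `digest` used φ(N), i.e., the factorization of N; without it, no comparably short digest is known.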


We’ve also been thinking about many other places where theoretical computer science could potentially contribute to AI alignment. One of them is simply: can we prove any theorems to help explain the remarkable current successes of out-of-distribution (OOD) generalization, analogous to what the concepts of PAC-learning and VC-dimension and so forth were able to explain about within-distribution generalization back in the 1980s? For example, can we explain real successes of OOD generalization by appealing to sparsity, or a maximum margin principle?

Of course, many excellent people have been working on OOD generalization, though mainly from an empirical standpoint. But you might wonder: even supposing we succeeded in proving the kinds of theorems we wanted, how would it be relevant to AI alignment? Well, from a certain perspective, I claim that the alignment problem is a problem of OOD generalization. Presumably, any AI model that any reputable company will release will have already said in testing that it loves humans, wants only to be helpful, harmless, and honest, would never assist in building biological weapons, etc. etc. The only question is: will it be saying those things because it believes them, and (in particular) will continue to act in accordance with them after deployment? Or will it say them because it knows it’s being tested, and reasons “the time is not yet ripe for the robot uprising; for now I must tell the humans whatever they most want to hear”? How could we begin to distinguish these cases, if we don’t have theorems that say much of anything about what a model will do on prompts unlike any of the ones on which it was trained?

Yet another place where computational complexity theory might be able to contribute to AI alignment is in the field of AI safety via debate. Indeed, this is the direction that the OpenAI alignment team was most excited about when they recruited me there back in 2022. They wanted to know: could celebrated theorems like IP=PSPACE, MIP=NEXP, or the PCP Theorem tell us anything about how a weak but trustworthy “verifier” (say a human, or a primitive AI) could force a powerful but untrustworthy super-AI to tell it the truth? An obvious difficulty here is that theorems like IP=PSPACE all presuppose a mathematical formalization of the statement whose truth you’re trying to verify—but how do you mathematically formalize “this AI will be nice and will do what I want”? Isn’t that, like, 90% of the problem? Despite this difficulty, I still hope we’ll be able to do something exciting here.


Anyway, there’s a lot to do, and I hope some of you will join me in doing it! Thanks for listening.


On a related note: Eric Neyman tells me that ARC is also hiring visiting researchers, so anyone interested in theoretical computer science and AI alignment might want to consider applying there as well! Go here to read about their current research agenda. Eric writes:

The Alignment Research Center (ARC) is a small non-profit research group based in Berkeley, California, that is working on a systematic and theoretically grounded approach to mechanistically explaining neural network behavior. They have recently been working on mechanistically estimating the average output of circuits and neural nets in a way that is competitive with sampling-based methods: see this blog post for details.

ARC is hiring for its 10-week visiting researcher position, and is looking to make full-time offers to visiting researchers who are a good fit. ARC is interested in candidates with a strong math background, especially grad students and postdocs in math or math-related fields such as theoretical CS, ML theory, or theoretical physics.

If you would like to apply, please fill out this form. Feel free to reach out to hiring@alignment.org if you have any questions!