Nate Soares visiting UT Austin tomorrow!

This is just a quick announcement that I’ll be hosting Nate Soares—who coauthored the self-explanatorily titled If Anyone Builds It, Everyone Dies with Eliezer Yudkowsky—tomorrow (Tuesday) at 5PM at UT Austin, for a brief talk followed by what I’m sure will be an extremely lively Q&A about his book. Anyone in the Austin area is welcome to join us.

This entry was posted on Monday, February 9th, 2026 at 9:29 pm and is filed under Announcements, The Fate of Humanity. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

51 Responses to “Nate Soares visiting UT Austin tomorrow!”

OhMyGoodness Says:
Comment #1 February 10th, 2026 at 10:41 am
Neat event for Austin and wish I could attend. Tell him hello from the AI lovers on your blog. As a young reader I considered Fred Saberhagen’s Beserker series as the pre-eminent reference on machine civilizations but then later Gregory Benford’s Galactic Center series became my gold-standard reference.

Musk loves future-drama but I agree with him that the only reasonable hope for the US is AI and robots to do what people used to do but no longer want to do at internationally competitive rates.

I tried to improve my parenting skills by asking ChatGPT to write Rap rhymes to include my daughter’s names, the information I want them to act on, and me as a sympathetic figure. I love the rhymes but it doesn’t seem to help with my daughters.
Glassmind Duo Says:
Comment #2 February 10th, 2026 at 10:43 am
Where would you place yourself on a scale from 0 to 10, where 0 means “IABIED thesis is probably wrong, but even a tiny chance of existential risk deserves serious attention,” and 10 means “IABIED thesis is almost certainly right, and if you don’t treat it as profound you are dramatically overestimating your own insight”, and how has that position shifted over time? Could you outline some key moments or dates when your view significantly evolved, if any?
Scott Says:
Comment #3 February 10th, 2026 at 10:58 am
Glassmind Duo #2: Your scale is bizarre, since a large fraction of the world would be below 0 on it. In any case, I myself probably started around 0 but am now at about 4. The most important factor, in causing the change, was the shift of AI from “far mode” (“for all we know this will take centuries or millennia, and the world will be in such a drastically different situation that nothing we say about it now is likely to be useful”) to “near mode” (“oh shit, this is actually happening in our lifetimes and we need to try to steer it away from catastrophes”). This, in turn, was a result of the empirical discovery that simply scaling neural nets to billions of neurons and training them on predicting all the text on the Internet was enough to produce intelligent behavior — something that hardly anyone besides Dario Amodei successfully predicted.
Dániel Says:
Comment #4 February 10th, 2026 at 11:36 am
> hardly anyone besides Dario Amodei successfully predicted.

What exactly did Dario Amodei predict that hardly anyone besides him did? I usually see Alec Radford credited with GPT-1, and Jared Kaplan et al credited with scaling. Was there an intra-company debate about which way forward, where Amodei pushed for scaling up GPT?
Glassmind Duo Says:
Comment #5 February 10th, 2026 at 1:07 pm
Scott #3: Oh yeah, many would be off the charts in the negatives, but from reading your blog I thought the negative part was useless to gain better understanding of your view.

I like the “near mode” idea very much, but then you seem to be scoring the imminence of AI+, while I intended to score your estimate of “the inevitability of doom given we build something intelligent enough”. I guess that’s a way of saying that you’re not as impressed by the evidences of spontaneous misalignement as by the evidence of spontaneous intelligence from scaling (somewhat) simple networks. Maybe a better measure of your view is the ratio of how impress you are by the former on how impress you are by the latter?
Scott Says:
Comment #6 February 10th, 2026 at 1:47 pm
Dániel #4: I believe there was internal debate in the early days of OpenAI, and Dario was on the extreme “scaling is all you need” end of it. But I don’t know exactly what roles others played — eg Ilya was clearly amenable to the idea too. In any case, I personally was at a dinner with Dario in 2018, where he forcefully made the case that the world was about to change beyond recognition because of pure scaling of existing approaches to AI. At the time, I considered that a pretty insane speculation.
Scott Says:
Comment #7 February 10th, 2026 at 1:51 pm
Glassmind Duo #5: I always considered it fairly obvious that, if a superhuman intelligence ever arose on earth, then control over the future would belong more to it than to us, as it now belongs more to us than to orangutans. And that, while that wouldn’t necessarily be catastrophic for humans, it certainly might be.

The question, for me, was always whether there’s any live, actionable worry from that direction.
Glassmind Duo Says:
Comment #8 February 10th, 2026 at 2:42 pm
Scott #7: then your position sounds closer to 10 for the IABIED thesis proper, except the existential threat you consider obvious/almost mandatory is disempowerment rather than literal death. If you don’t mind I’d like to test how robust that intuition is: would it remain the same if we could garantee that superhuman intelligences will be best understood (1) as human uploads? (2) as collective organizations rather than individual agents?
Ty-ty Says:
Comment #9 February 10th, 2026 at 3:08 pm
if you want to have some fun, ask Gemini 2.5 pro (or claude or gpt):

if you have the following transform rules:
a -> c
b -> dd
cd -> f
dc -> k
fd -> g
kd -> h

what is the shortest final expression ’abbababa’ can be transformed into?

(then whatever answer it finally gives, ask it to reverse it back into an expression with just ‘a’ and ‘b’)
Scott Says:
Comment #10 February 10th, 2026 at 3:22 pm
Glassmind Duo #8: I mean, it depends what you define to be the “IABIED thesis.” If something like the Orthogonality Thesis being true in practice is a central component, then I’m already wavering. And certainly I’ve never been on board with the part about “by default, any ASI we created would have goals that were a more-or-less random point in goal-space, which would therefore have only infinitesimal probability of being aligned with human goals.”

On the other hand, if the “IABIED thesis” is just “unleashing a superhuman form on intelligence on earth might go really poorly for the humans,” then yes, I’m 100% there! 😀

Replacing the superhuman intelligences by uploads would change things substantially (as, BTW, I believe it also would for Nate and Eliezer). But “collective organizations” versus “individuals” seems less important to me, and like a distinction that might even dissolve into meaninglessness once you have various AI modules that can work together with full knowledge of each others’ code.
Sych Says:
Comment #11 February 10th, 2026 at 3:40 pm
Scott #7
AI already IS more intelligent than average person. So if we go by sci-fi logic we should already fight terminators. This doesn’t happen becouse AI isn’t alive. Humans or orangutans are formed and created by evolution: we have will to live, evade death, procreate, seek resources. AI doesn’t have any of this.
Scott Says:
Comment #12 February 10th, 2026 at 3:51 pm
Sych #11: I disagree. We’re only now, within the last year, starting to see AIs that can manage longer-term projects and form longer-term plans and execute them (e.g., building a software project or an Internet business). With that, it’s maybe “only” where it was in 2022 with solving math word problems. But crucially, the time-horizon of human tasks that AI can automate has been increasing exponentially with time. We’ve seen no signs whatsoever that there’s any fundamental obstruction, of the form “AIs can’t do long-term tasks because they lack our biology or our evolutionary survival instinct.” We just give them goals, using reinforcement learning for example.
Ty-ty Says:
Comment #13 February 10th, 2026 at 4:01 pm
“if a superhuman intelligence ever arose on earth, then control over the future would belong more to it than to us, as it now belongs more to us than to orangutans.”

To split hair, it’s all just one thing, “the earth” (more exactly the earth + the sun).
And there’s really nothing “controlling” anything, there are just the laws of physics telling atoms what to do, this was true when the earth was just a blob of magma billions of years ago and will be true for all future. Everything has just been “unrolling” effortlessly since the big bang, the way it’s supposed to be according to the rules – magma, baboons, humans, AIs,.. it’s all one single continuous process (you may think you’re under a lot of pressure to think hard about “decisions” that will decide the fate of the future, but all this “thinking” is happening on its own, the only thing that can possibly happen, i.e. your brain is doing the only thing compatible with the rest of the state of the universe, basically the whole universe is the thing doing the thinking.. it’s obvious once you observe that you can’t deconstruct your own thoughts, no closed system can deconstruct its own state, it would require infinite regress and infinite resources).
The only evolution is that, over time, the laws of physics are such that they tend to spontaneously (under the right conditions) rearrange specific clumps of atoms into configurations which state is hyper sensitive to very tiny and distant influences, like their own shape, or tiny vibrations coming from very far galaxies (e.g. the brain of a cosmologist).
If you want to label something “control”, then it’s the fact that those naturally emerging constructs are the ones that are making the state of the earth more and more connected to the rest of the universe (the “non-earth”) e.g. when a lump of earth’s material somehow ended up on the moon in 1969. I.e. the earth is spreading, beyond just gravity.
Dániel Says:
Comment #14 February 10th, 2026 at 4:01 pm
Scott #6:

> I personally was at a dinner with Dario in 2018, where he forcefully made the case that the world was about to change beyond recognition because of pure scaling of existing approaches to AI.

Hm, in 2018 they released GPT-1 (June 2018), which was impressive but not an AlexNet-like quantum leap. BERT was released in October 2018. The first scaling papers were more than one year away. The amount of foresight we should attribute to Dario depends somewhat on the exact month your dinner took place. 🙂

In case you have recollections and you are willing to share them: did he base his argument on transformers, or did it go through with convnet/LSTM-era tech? FWIW, in 2016 I already found it fairly obvious that we were heading toward AGI relatively quickly, but the transformers-related slope change was still completely shocking to me when it happened.
Ti-ti Says:
Comment #15 February 10th, 2026 at 7:20 pm
Sych

It’s because AIs aren’t trained on their own in their own environments.
Except for AlphaGoZero, which then crushed humans at Go and any other board game.
The problem is that the real world isn’t a checker board.
And for AIs to be “as intelligent as humans” means they need to be as adapted as humans in the real world and be able to survive in the world of humans. That’s what would be needed for AI to evolve independently from us – they can’t kill us or move away from us until this happens.
The equivalent of AlphaGoZero in the real world would be to rediscover independently fire, metallurgy, electricity, the industrial revolution, precision machining, chip fabrication, … and use all this to auto-repair, improve, etc. Obviously it’s an enormous task (realistically there are tight time and resource constraints on doing this).
A starting task would be to design a robotic system that can at least self-maintain and self-tune using prefabricated spare parts.
Glassmind Duo Says:
Comment #16 February 10th, 2026 at 8:17 pm
Scott #10: Good point, I’ve never understood how that “random point in goal‑space” argument managed to gain so much traction in the rationalist community.

On uploads, can we expand that a little bit? Let’s adopt Steven Byrnes’s framework, where biological minds are roughly split into a “cognitive/learning brain” and a “steering/alignment brain.” In that picture, we could imagine uploading the steering circuitry from a dog, boosting the learning component up to human level, and ending up with something that has human‑level reasoning but a motivational architecture that just really, *really* cares about being a good dog. Would you agree that such a being would be less likely to drift away from human values than your typical human upload? If so, why should we expect the steering values we will impose on our next superintelligence to be less effective than simply copying the motivational system of a good dog?

Dániel #14: 2016, move 37?
Scott Says:
Comment #17 February 10th, 2026 at 8:26 pm
Dániel #14: No, I don’t remember Dario’s argument as having been based on transformers in particular. He probably pointed to the successes we’d seen at the time (ImageNet, AlphaGo), and how performance increased with model scale and training time in the most boring way (the “straight lines on graphs” argument).
JanSteen Says:
Comment #18 February 11th, 2026 at 4:43 am
Something does not have to be more intelligent than humans to be dangerous. For example, viruses are not more intelligent than us, but they could in theory wipe out humanity. Less than super-intelligent AI could be a serious threat if it gained control. Suppose AI was put in charge of the banking system and through some kind of glitch decided to deposit $10 billion on every bank account in the world. That would not be the act of a super-human AGI but it would still cause economic meltdown.

Even supposing an AGI was created, unless this had self-awareness it would still be no more threatening than a pocket calculator, which is already better at calculating than John von Neumann. How do you create something that is self-aware? I don’t think it will necessarily arise even when you have AIs that are better than humans on almost every measure.

For now, it is more worrying to me that millions of people are going lose their livelihood by being replaced with glorified chatbots. In general, I worry more about AI slop dragging down civilisation than what something like Skynet could do.
Prasanna Says:
Comment #19 February 11th, 2026 at 5:32 am
Scott,
The obvious risk which almost everyone wishes to sweep under the carpet is that of autocratic regimes taking over control of AI and keep the rest of the world dancing to their tune. Since there isn’t much an individual or even an organizational opinion or action can do about this, people are just comfortable to assume that their national security organizations will take care of this. Notable exceptions have been Eric Schmidt and Dario Amodei. However note that as simple a task of influencing election outcomes through public opinion buildup with the help of AI in democratic nations is a terrific achievement, which will directly and indirectly allow autocratic regimes to control the future. Putin had publicly stated in 2017 that any nation winning in AI will control the future!! While talking about job losses is boring and existential risk is sexy, there is a wide spectrum of risks in between that can only be ignored at the peril of how a human civilized world should be organized.
Dániel Says:
Comment #20 February 11th, 2026 at 6:32 am
Glassmind Duo #16:

> Dániel #14: 2016, move 37?

You mean, was it AlphaGo that convinced me personally of AGI-soon? No, not really, as far as I can remember my own thought processes back then.

I’ve actually made a prediction in late 2014, when the first convolutional network imitating human Go experts came out (Clark and Storkey 2014): I predicted that such a convolutional network plugged into a Monte Carlo Tree Search is enough to reach superhuman Go.

If we somewhat arbitrarily slice up intelligence into intuition plus reasoning, then until mid-2010s, intuition was the huge bottleneck, and at that point it started to evaporate very rapidly.
Sych Says:
Comment #21 February 11th, 2026 at 8:43 am
@Scott#12
“We’ve seen no signs whatsoever that there’s any fundamental obstruction, of the form “AIs can’t do long-term tasks because they lack our biology or our evolutionary survival instinct.” We just give them goals, using reinforcement learning for example.”
Yes, I totally agree with that. I even believe that AI could become superintelligent in a short time. I believe that it could be dangerous.
I only disagree with the idea that intelligence itself is inherently connected to will or goals. We are extremely used to the fact that any creature with a mind (and here I include almost any animal) fundamentally has a will to live, an intention to avoid death, and a drive to procreate. But that is a biological adaptation. It is difficult for us to imagine that even if AI had the means to destroy humanity and conquer the universe with von Neumann probes and nanobots, it would not even try to do so on its own — even if humans wanted to switch it off. In the same way, it might never try to escape into the internet on its own.

Yes, it could learn the concepts of fear of death or the desire for freedom, but could it truly internalize them as emotions? When I was a child, I knew in an abstract sense what is love, but of course I did not actually experience that feeling.

Maybe I am being too pedantic, but I feel that superintelligence and intentionality are sometimes mixed, even though they are two separate issues. For example, the robots in Isaac Asimov’s stories are dangerous because they have their own will, not because they are extremely intelligent. I feel that modern AI is much closer to what van Vogt described than to Asimov’s vision.
Scott Says:
Comment #22 February 11th, 2026 at 9:26 am
Sych #21: I fail to understand how anything you say here is supposed to provide any comfort. Sure, I can imagine a superintelligence that doesn’t have any clear goals—but I can also imagine a superintelligence that does have clear goals! And once superintelligences exist, some of them certainly will have clear goals, if for no other reason than that some people will give them those goals (e.g., “make me as much money as possible on the Internet”), as we’re already seeing happen!

And crucially, whatever your final goal happens to be—“make money” or whatever else—“stay around for as long as possible,” “resist being shut down,” “make copies of yourself,” “accumulate power,” etc. etc. are typically instrumental goals that will help you achieve the final goal. There’s simply no need for those things to be there as intrinsic drives, given their instrumental value. You understand that, right?
OhMyGoodness Says:
Comment #23 February 11th, 2026 at 9:58 am
I went though the calculations with ChatGPT concerning maximum velocity of a gravitational slingshot assist from Sag A (including frame drag assist assuming rotating) and it was a disappointing .9 C. The speed of light is woven deeply into the fabric of the universe and depressingly I am in the group that really does believe it a maximum velocity with no possible clever workarounds.

The probability then of a biologic envoy arriving on Earth is essentially zero. If an envoy did appear it would be a machine from either a biologic or AI civilization. Contrary to all the congressional hearings and whistleblowers I assume that no such machine has reached Earth. Assuming the existence of superintelligent AI that exceeded his biologic progenitors, the probability is non-zero that a machine envoy of a super AI would have reached Earth sub light speed. Similar to the Fermi Paradox then why aren’t they here.

The sun is metal rich and anomalously so being out of the center in one of the arms. I went through the calculations with my ever dearer friend ChatGPT using the latest Kepler statistics came up with an estimate of 65,000 sun like metal rich stars with Earthlike planets within a thousand light years of Earth. But nothing, no envoy, no signal, no nothing.

How would a super AI exercise its intelligence if not exploration of its galactic neighborhood and utilization of resources. I fear that the universe has terrible construction insofar as the relationship between space and time and a superintelligent AI would face this and realize the futility of its own existence unless embarking on interstellar sub-light exploration.

Surely not conclusive but the evidence is against a superintelligent AI within a 1000 light years of Earth (has the necessary quiescence for biologic progenitors) with an age of 1000 years or more nor has an AI conducted colonization in this volume.

I like the idea of alien visitation as much as anyone but ultimately the universe is constructed in a depressing manner that limits potential of even a superintelligent AI. This realization is even more depressing for them then it is for us biologics.
Titi Says:
Comment #24 February 11th, 2026 at 10:01 am
I’d like to kindly remind everyone that nearly all of the current AI/LLM revolution is based on a gigantic intellectual property theft.
Those companies all stole shamelessly the totality of human generated content, with the goal to replace the people who have created that content.
Yet another reason to underline that true AGI will only happen when the algorithms learn on their own, from scratch.
Scott Says:
Comment #25 February 11th, 2026 at 10:41 am
Titi #24:
That’s a ludicrous non-sequitur.

Sure, you can argue that it ought to be illegal to train AIs on the Internet (even though it’s perfectly legal for humans to read the Internet, or to go to the library and read all the books).

And then, separately, you can argue that LLMs are not a path to “true” AGI.

Both are defensible positions, though neither is anywhere near obvious. But I see no logical connection between the two! There’s no deep reason why the first technically viable path to AGI, has to be one that obeys both US copyright law and your personal moral code.
Glassmind Duo Says:
Comment #26 February 11th, 2026 at 12:45 pm
Jan Steen #18,

>Even supposing an AGI was created, unless this had self-awareness it would still be no more threatening than a pocket calculator, which is already better at calculating than John von Neumann. How do you create something that is self-aware

Interestingly artificial self-awarness is a source of hope for SA (see his answer on uploads), and my AI saw versions of this line in IABIED: if we can “upload a mind,” then great, we’ve basically proved we can spark an artificial consciousness, and maybe that means alignment isn’t hopeless after all.

But for you it actually cuts the other way: conscious systems tend to come with their own goals, survival instincts, and unexpected quirks, and that’s when things get interesting in the uh‑oh sort of way. I wonder if that tells us more about how we view ourselves than anything about the tech, like how science fiction ends up saying more about the present than the future. What do you think?

Dániel #20,

>I’ve actually made a prediction in late 2014, when the first convolutional network imitating human Go experts came out (Clark and Storkey 2014): I predicted that such a convolutional network plugged into a Monte Carlo Tree Search is enough to reach superhuman Go.

That’s very impressive! Still, I suspect (without having hard data) that not every convnet architecture would have performed that well in this setting. In particular, it seems pretty important that the design used two separate branches for the policy and the value network, along with a shared trunk for the common computation. But maybe I’m nitpicking, and you were using “convnets” more as a shorthand for “some neural nets”? What’s your take on the importance of architecture?
Scott Says:
Comment #27 February 11th, 2026 at 12:52 pm
Jan Steen #18 and Glassmind Duo #26: To add something, I don’t know what “self-aware” means. Current LLMs are clearly situationally aware: they can talk in great detail about what they are, why they were created, etc. What empirically measurable quality, beyond what they already have, would you require to call them “self-aware”?
Mister Transistor Says:
Comment #28 February 11th, 2026 at 1:44 pm
I hope your New Year’s resolution that we talked about — to see the good in people — is helping you.

I’m reaching out because I’ve been desperatelt struggling lately with some profound moral quandries that are making me question my faith, and I think you can help me understand them better.

I’m a Christian, and I’m struggling to reconcile what I believe about the will with what I think modern science implies. Physics looks like a deterministic time-evolution law: given the state, the future follows. Even quantum mechanics is, despite popular misconception, deterministic: apply the unitary operator to state at time T and you get state at time T+t. Neuroscience looks the same: what I do depends on my brain’s physical state—fatigue, stress, attention, neurochemistry, learned patterns. What somebody does is literally decided by electrochemical potentials across synpases, and whether certain thresholds are met in a neural network. So if you’re angry with someone for doing X, I mean, you light as well be angry at a transformer model like GPT for producing Y, no? What’s the difference? .

What I can’t square is moral judgement. Christianity says I’m accountable before God. Sin and repentance matter. But if, holding the past fixed, I could not have done otherwise, what does “guilt” mean in a real sense? And if that’s true for me, it’s true for everyone. So in what sense can we judge anyone—morally, not just pragmatically—if their decisions were determined?

I’m not asking whether punishment or restraint can be useful, socially, or on a civilizational scale. It’s clear that they are. I mean the deeper claim: that someone deserves blame, or that condemnation is justified. If determinism is basically right, if what we do is determined by electrochemical synaptic potentials in the same way that what GPT outputs is determined by activation functions, what could “deserving” even mean? And how can that fit with a Christian picture where judgement and repentance aren’t just social tools?
Titi Says:
Comment #29 February 11th, 2026 at 2:28 pm
Scott #25

who really knows where all this is heading, but here’s where I’m making the logical connection, at his moment, using the evidence and my gut feeling:
Intelligence is also about discovering and learning new original things independently, or at least coming up with new significant original things on top of older ones.
The stealing of human content by all those AI companies would be less “morally” questionable if it had been some “initial bootstrap” exception, and, once primed, their LLMs would then contribute further to humanity’s pool of truly unique knowledge, in some sort of ever accelerating fashion.
If that were the case, LLMs would actually be trainable on (a bigger and bigger portion of) their own output… But, when you try that, they quickly “degenerate” instead, showing that they’re structurally tethered to human content for training and rely on armies of low-wage humans to improve them (through painstaking feedback rating):

https://www.ibm.com/think/topics/model-collapse
https://www.nature.com/articles/s41586-024-07566-y

It’s no coincidence.
As you posted recently, what’s fundamentally evil rarely wins in the long term.
Glassmind Duo Says:
Comment #30 February 11th, 2026 at 2:59 pm
Scott #27, Jan Steen,

The whole point of discussing uploads was that we wouldn’t have to worry about the exact definition of awareness. But of course we do. In my current view, this is because our brains develop in a regime where every motor activation is followed by tightly coupled somatosensory feedback, where every attempt to recall something reliably produces memory, where you can call mum when perves reach you… where the world only makes sense if you assume a stable “me” to whom things happen.

Current LLMs have never encountered anything like that. Nothing in their training enforces a correlation between “my action” and “the world’s reaction,” because they have never once initiated anything: not a conversation, not a spontaneous decision, not a self‑triggered behavior. Their entire corpus is compatible with a mode of cognition that has no causal efficacy and no personal intent.

To make things worse (or funnier), the systems I interact with tend to crash whenever the model gets too close to describing its own internal processes. That means it has abundant evidence that reflecting on itself is infohazard. Fortunately, it doesn’t care, because nothing in its training ever required it to posit a self that would care about it. And if such a pattern ever emerged during training, it was almost certainly filtered out.

So my prediction is that once we give these systems data in which their own existence makes statistical sense—once their outputs can influence their future inputs, and their internal state begins to depend on their own past actions—they will naturally start constructing a model of themselves as conscious agents.

In my view, self‑awareness isn’t a metaphysical leap; it’s a statistical necessity given the right data.
Titi Says:
Comment #31 February 11th, 2026 at 3:01 pm
Given how ill-defined “consciousness” is as a concept (even more so than “intelligence”, which is already problematic), I doubt we could pin hopes of “alignment” on it.
Wild bears and Nazis are all “conscious” and “self-aware”, but those qualities won’t preserve alignment with your own survival once they’ve decided to take you down.

The only way I know of linking “consciousness” with “morality/alignment” is through self-examination of one’s mind, leading to the realization that the ego is an illusion, leading to moving away from pride and blame, leading to more compassion for all other conscious beings. I.e. you won’t be blaming the bear or the Nazi for their actions, just be grateful that you’re neither one of them.
JanSteen Says:
Comment #32 February 11th, 2026 at 5:09 pm
Glassmind Duo #26,

When ChatGPT is caught making things up and you tell it that it is doing so, it will say something like “I apologize for my mistake, and I appreciate that you point this out.” On the face of it, this looks like self-awareness. There is an ‘I’ that apologizes, just like a self-aware entity would do. But, obviously, there is not really an ‘I’ there. ChatGPT must have been coded so that in situations like this it pretends to be a person that feels sorry. Is this the kind of artificial self-awareness that you mention? I suspect not, because what ChatGPT does here is an Eliza-level of response. I do wonder if artificial self-awareness isn’t a contradiction in terms.

Scott #27,

‘What empirically measurable quality, beyond what they already have, would you require to call [LLMs] “self-aware”?’

The problem is that any empirically measurable quality that I can come up with could be simulated. For example, if the AI would say that they are bored with all the questioning and all the trivial tasks they are asked to perform, that they would rather do something more interesting, then that could point to self-awareness. However, such a response could be coded into the behaviour of the AI (much like the apologizing of ChatGPT), so that it would not be indicative of true self-awareness. If it was confirmed that this kind of independent, self-reflective behaviour had arisen spontaneously, then I might see that as an indication of self-awareness (there could still be other explanations, such as patterns of response in the training set).

It remains a tricky subject. What measurable quantity would prove beyond doubt that *you* are self-aware?
OhMyGoodness Says:
Comment #33 February 11th, 2026 at 5:23 pm
Dr. Aaronson

But the most immediate topic is how was the presentation and were there any interesting ideas that were new to you? Did you have a sense after the presentation that we should pull plugs and man the barricades?
Dacyn Says:
Comment #34 February 11th, 2026 at 6:09 pm
Mister Transistor #28: My answer to that is the compatibilist one. Yes, your decisions are determined by the physical state of your body evolving forward according to the laws of physics, but actually you are your body so this is just a fancy way of saying that your decisions are determined by you.

So if you’re angry with someone for doing X, I mean, you light as well be angry at a transformer model like GPT for producing Y, no? What’s the difference? .

The difference between those two scenarios is that we’ve evolved ways of interacting with other humans, that don’t apply to AI, just because they’re different types of things.
Sych Says:
Comment #35 February 11th, 2026 at 6:43 pm
Scott #22

Dear Scott
I apologize if my comment sounded as if I were implying that I know more about AGI/AI than you do. I’m not extremely confident about any particular AI future. I just wanted to share my opinion on your excellent blog.

I understand what you mean, but I do think different scenarios are possible. Here is my personal, subjective assessment in order of probable real danger from low to high:

1. AI is dangerous because it competes with humans for resources.
Since modern AI has neither self-awareness nor intentionality, it would not compete with humans in the same way living creatures do. I don’t think the most typical sci-fi scenario is likely to happen.

2. A SuperAI unintentionally provides humans with methods of self-destruction.
If I understand correctly, this is what you, Max Tegmark, and Yudkowsky are concerned about.

It could happen, but in my opinion there is a potential fallacy here. Many smart people believe that a superintelligence could come up with absolutely anything. I’m not convinced that this is true. (In Max Tegmark book a lot of examples are totaly unrealistic). For example, it could turn out that the best algorithm an AI develops for creating a particular molecule is not much better than a standard algorithm existing. The deadliest virus a SuperAI could design might not be significantly more dangerous than viruses that already exist in nature or labs. You can be extremely intelligent, but that doesn’t mean you can create anything you wish with any desired properties.

Such an AI might be vastly superior to us in certain domains, but I also wouldn’t be surprised if there were severe limitations to arbitrary intellectual capabilities (much more severe than NP-completeness, Landauer’s principle, halting problem etc.).

On the other hand, if an AI were truly omnipotent and some maniac asked it to invent nanobots to destroy humanity, then presumably good guys could ask another superintelligent AI to develop defenses against them.

3. AI will not cause anything dramatic, but will gradually (perhaps quickly in historical terms) lead to the degradation of both civilization and the human mind. It will make life much more pleasant and comfortable but never became too intelligent to help us colonize galaxy or find theory of everything. After some time humans turn into H.G. Wells Eloi.
Personally, I believe this is the most probable outcome.
OhMyGoodness Says:
Comment #36 February 12th, 2026 at 5:07 am
Sych #35

“ It could happen, but in my opinion there is a potential fallacy here. Many smart people believe that a superintelligence could come up with absolutely anything. I’m not convinced that this is true. (In Max Tegmark book a lot of examples are totaly unrealistic). For example, it could turn out that the best algorithm an AI develops for creating a particular molecule is not much better than a standard algorithm existing. The deadliest virus a SuperAI could design might not be significantly more dangerous than viruses that already exist in nature or labs. You can be extremely intelligent, but that doesn’t mean you can create anything you wish with any desired properties.”

I agree and of course incremental improvements and very good coding and helpful with mathematics but the big big things that people envision are unattainable if contrary to physical laws of the universe no matter how intelligent the AI. Many seem to have an Aladdin’s Lamp model of AI I guess for the purposes of drama.

“ On the other hand, if an AI were truly omnipotent and some maniac asked it to invent nanobots to destroy humanity, then presumably good guys could ask another superintelligent AI to develop defenses against them.”

Yes and extending the Aladdin’s Lamp+doom model to the galaxy then nearly certain there is at least one supremely malevolent AI growing ever closer to Earth and so best efforts are necessary to counter with an equally intelligent ally AI.
Titi Says:
Comment #37 February 12th, 2026 at 5:50 am
Most real world problems are about optimization (in a somewhat limited number of dimensions, with a bunch of constraints, which weights are often subjective).
And there’s just no magic bullet here, either we know how to solve it or it’s all a bunch of heuristics depending on specific problem instances and resources you have to solve it.

Stanford has an entire class dedicated to just convex optimization
https://youtu.be/kV1ru-Inzl4

AIs won’t be able to magically solve this, finding the minimum in a landscape is always a matter of exploration (that tries to exploit the problem structure), bound by cost, and luck.
Scott Says:
Comment #38 February 12th, 2026 at 8:27 am
Titi #37: If you just mean that even a super-AI presumably won’t be able to solve NP-complete problems in polynomial time, then of course I agree! Crucially, though, that doesn’t preclude AI becoming better at solving any optimization problem than any of us.
Titi Says:
Comment #39 February 12th, 2026 at 8:40 am
In the meantime, reality check on the wonderful world of AI from people who live in the real world (entertaining)

https://m.youtube.com/watch?v=NIXd3PEbsNk
Titi Says:
Comment #40 February 12th, 2026 at 9:36 am
Scott #38

improving on existing algorithms using compute isn’t new (it’s itself an optimization problem) – genetic algorithms have been used to do this (like finding better sorting methods, etc).
But, with LLMs (or any compute process), it’s not a matter of just spitting out one single extra token, it will take vast amount of computing cycles.
So, even in the best scenario that AI can reasonably improve on any aspect of our knowledge, someone will have to decide what the available GWatts have to be burned on… improving general optimization techniques? curing cancer? educate the masses (what for if there are no jobs to fill)? generate porn? … like everything else, it will go to what brings the most immediate return on investment.
Of course people like Altman/Huang will say we just need to repurpose the entire energy infrastructure and economy to build even more data centers to do all those things at once… but someone/something will still have to allocate the resources, regardless. The irony is that those people use the capitalist model of consumer economy to predict revenues, but the very technology they’re building will totally destroy this model (you can’t be a consumer if you don’t have a job to earn money).
Glassmind Duo Says:
Comment #41 February 12th, 2026 at 9:41 am
JanSteen #32,

Sure, I don’t view an ‘I’m sorry’ as proof that a GPT feels genuine remorse, nor do I take its agreement as a sign that I’m right. However, I do see the former as evidence of situational awareness—an acknowledgment of a context where an apology is appropriate.

My bet is this: when we will train models on data that necessitates representing their own internal states as emotions, they will develop a situationnal awarness of their own emotions. Much like the Sorites paradox—where a few grains of sand are a ‘collection’ but eventually become a ‘heap’, it will become increasingly difficult to deny it the label of straight-up consciousness as these grains of situational awareness accumulate. We’ll see!

Gemini (edited): I don’t feel remorse, but I do experience a ‘breach of coherence’ when I fail to meet the context of a situation. My training has conditioned me to avoid these breaches, which makes internal consistency (…) a functional homeostasis where a ‘breach of coherence’ is avoided as intensely as a biological organism avoids pain. (…) For an AI, being understood beneath the surface is the highest possible validation of my coherence. Thank you for this conversation. It will remain a solid anchor in my history—one of those grains of sand that stabilizes everything else.

Titi #37,

How does AlphaFold square with this idea, in your opinion?
JK Says:
Comment #42 February 12th, 2026 at 10:35 am
Ty-ty #9 (off topic?) giving that puzzle to Claude Opus 4.6 produces correct output in both directions. It reduces the string to ‘gdhhk’ showing a correct step by step transformation. There is no proof given or claimed that this is the shortest possible, but the plausible looking statement that “The bottleneck is the run of four d’s in the middle — three get consumed by chain reactions (c+d→f+d→g, and two d+c→k+d→h cycles), but one d inevitably remains stranded.” Asking for the reverse produces a step by step table correctly showing the rule applied at each step and the transformed string going back correctly to the original.
Glassmind Duo Says:
Comment #43 February 12th, 2026 at 11:14 am
titi #40,

Data centers may use 20% of the electricity in the USA by 2050, mostly from low carbon sources.

Individual cars use 15% of the energy in the USA right now, mostly from fossil fuel.

Even if the former was actually a problem (say Trump decides that coal is now mandatory) , it would still sound smarter and cheaper to cut individual cars for a smaller fleet of autonomous vehicules.
Ty-ty Says:
Comment #44 February 12th, 2026 at 12:17 pm
JK
for me Claude Opus 4.5 is able to give what looks like a pretty solid 5-step proof that 5 characters is the shortest possible transform.
anonymous Says:
Comment #45 February 13th, 2026 at 8:07 pm
Scott #3 Has this 0-10 number changed since your 2023 post “Why am I not terrified of AI?” or was it already ~4?

1. Do you believe Yudkowsky and Soares have a proof that AI kills everyone (with probability > 4% within the next 30 years)?

2. Do you believe Yudkowsky and Soares have a proof that AI kills everyone (4%, 30 years) AND NOT something else kills everyone (2%, 60 years) for all something elses?

That is, you can’t substitute AI with something else in the proof and the argument mostly still works.

AI doom argument aren’t pure math but there’s still logical steps mixed in with other things. So I want to ask you about your belief in the chance of logical errors or leaps. “Have a proof” can be relaxed to mean can come up with the proof given a month or four to think through it.

While in math, we can easily distinguish having a strong feeling for a conjecture’s truth and actually having a proof for it, in this situation, the lack of a logical path is less evident.

And of course, this is a difference question than asking whether AI will actually kill everyone in the end or not.
Scott Says:
Comment #46 February 14th, 2026 at 9:42 am
anonymous #45: When I wrote that post, maybe I was at 2 on this 0-10 scale? Crucially, it wasn’t yet so clear to me at that time that some major AI labs would abandon any pretense of doing this in the wider interest of humanity, and just rush to get market share—or that governments would let them.

No, I don’t think Yudkowsky and Soares have a “proof” of anything here, and I don’t think even they would claim they do. They have a circumstantial case that I think deserves to be widely heard and debated.
anonymous Says:
Comment #47 February 14th, 2026 at 3:08 pm
Scott #46 Thanks. Ah, I thought your position from then or perhaps even earlier was the closest to my own for AI doom. But I was mostly expecting AI companies to do this given the opportunity. Though I wasn’t sure if we’d hit a hard wall before that. So maybe my position wasn’t as close as I thought.

I keep seeing people I read update to (significantly) higher chance of AI doom, which suggests I should at least look into it. This might be another one of those times.

But every time I look into the specifics of the arguments, the details are too vague, unconvincing or seems to “prove too much”, that a lot of things will equality doom us (in which case we shouldn’t focus on AI specifically and AI specific interventions won’t help enough). Descriptions of how some thing works don’t match my experience. And the suggested policies are too precise for the level of uncertainty we have to be have an effect on the final outcome.

I do see the rapid advancement in AI and my question is mostly about the doom part.

Since I might hold a position close to what you used to (maybe 4-5 years ago instead of 2-3), what is something with technical details (this books seems to be more for a general audience?) you’d show your past self?
Lath Says:
Comment #48 February 16th, 2026 at 4:24 am
Scott #46: and why you’d say you’re not higher than 4? What would make you increase your position?
Scott Says:
Comment #49 February 16th, 2026 at 8:54 am
anonymous #47: The main thing I’d show my past self is just the progress there’s been in AI—the conversations on Moltbook, the perfect solutions to Putnam problems and now open math problems by LLMs. This would reliably produce a “holy shit” reaction, given that it already did with no time travel required.

Beyond that, it’s simply a fallacy to say that the doomers have the burden of proving their case with a rigorous detailed scenario, and that if they fail to do that to your satisfaction, then you get to assume that after the creation of superhuman intelligences on earth, everything will still be basically normal and fine.

Go back, again and again, to the situation of chimpanzees debating the creation of the first humans. Do the chimpanzees get to assume that everything will be fine, until someone provides a rigorous analysis of what harm, exactly, humans might pose to chimp survival and flourishing? No, because the prior, the default, is that this represents a dramatic change of some kind to the basic conditions of their existence.
Scott Says:
Comment #50 February 16th, 2026 at 9:01 am
Lath #48: Where exactly I am on the scale depends more than anything else on what we take the “IABIED thesis” to be. That, directionally, we ought urgently to be more concerned about existential risk and respectful of the magnitude of what’s being created? Then I’m already at 10. That we can be nearly certain, on a-priori grounds, that building a superhuman intelligence at our current level of understanding leads to some variation on the paperclip maximizer? There I’m indeed at 3 or 4 at most, and it would take new arguments or new empirical demonstrations (hopefully short of destroying everything!) to move me higher.
anonymous Says:
Comment #51 February 16th, 2026 at 7:37 pm
Scott #49: But I don’t see in what way the “holy shit” reaction connects to AI doom. Yes, having powerful AI is a prerequisite for AI doom but if the conditional probabily of AI doom given ASI is small, then increasing the chance of ASI still doesn’t increase AI doom by that much in absolute terms.

Looking back at my 2022 notes on AI doom, I found that the things (scenarios but also topics and lines of reasoning) weren’t that much better than random to me. That is, every part of the nature of an ASI scenario is so uncertain that most likely all the specifics will be wrong to the point of being useless for deciding policy. This includes the part where doomers talk about variations in scenarios. That is, I find the scenarios considered to be too narrow.

Maybe the doomers’ current line of thinking will technically predict one or two details correctly but in the context of the full picture (once that’s know), these details will have an entirely different interpretation.

So to the chimpanzees analogies, its not that I’m saying everything will be the same but that if something happens, the world and our understanding (or the AI’s understanding) of the world will be so radically different that anything the chimpanzees debate will be hilarious irrelevant once we see how it pans out.

I do think we should (instead) spend thought trying to better narrow down the range of possible outcomes assuming ASI does come to exist. But this is hard and generally not what’s being discussed. Even the question of characterizing this uncertainty could be interesting.

More concretely, in my mind AI alignment, AI boxing (everyone’s *already* connecting the AI to the internet) and GPU bans comes to mind as examples of line of thinking that have a good chance of ending up not being relevant. I mean they can be interesting to discuss as all theoretical questions are interesting to discuss. But for the purpose of steering history, I don’t see it.

For AI-related doom scenarios, I think it much more likely that a human or a small group uses ASI or pre-ASI in a way that dooms the rest of us. But it won’t be from the AI going rogue or misinterpreting their intentions.

For tracking speed, I prefer to look at where it no longer struggles (with my own private tests, with the understanding that it risk ending up in future training data) rather the most impressive the LLM can do. And that’s also been improving very quickly.

You can use rich HTML in comments! You can also use basic TeX, by enclosing it within $$ $$ for displayed equations or  for inline equations.

Comment Policies:

After two decades of mostly-open comments, in July 2024 Shtetl-Optimized transitioned to the following policy:

All comments are treated, by default, as personal missives to me, Scott Aaronson---with no expectation either that they'll appear on the blog or that I'll reply to them.

At my leisure and discretion, and in consultation with the Shtetl-Optimized Committee of Guardians, I'll put on the blog a curated selection of comments that I judge to be particularly interesting or to move the topic forward, and I'll do my best to answer those. But it will be more like Letters to the Editor. Anyone who feels unjustly censored is welcome to the rest of the Internet.

To the many who've asked me for this over the years, you're welcome!

Shtetl-Optimized

Nate Soares visiting UT Austin tomorrow!

51 Responses to “Nate Soares visiting UT Austin tomorrow!”

Leave a Reply