Sad times for AI safety
Many of you will have seen the news that Governor Gavin Newsom has vetoed SB 1047, the groundbreaking AI safety bill that overwhelmingly passed the California legislature. Newsom gave a disingenuous explanation (which no one on either side of the debate took seriously), that he vetoed the bill only because it didn’t go far enough (!!) in regulating the misuses of small models. While sad, this doesn’t come as a huge shock, as Newsom had given clear prior indications that he was likely to veto the bill, and many observers had warned to expect him to do whatever he thought would most further his political ambitions and/or satisfy his strongest lobbyists. In any case, I’m reluctantly forced to the conclusion that either Governor Newsom doesn’t read Shtetl-Optimized, or else he somehow wasn’t persuaded by my post last month in support of SB 1047.
Many of you will also have seen the news that OpenAI will change its structure to be a fully for-profit company, abandoning any pretense of being controlled by a nonprofit, and that (possibly relatedly) almost no one now remains from OpenAI’s founding team other than Sam Altman himself. It now looks to many people like the previous board has been 100% vindicated in its fear that Sam did, indeed, plan to move OpenAI far away from the nonprofit mission with which it was founded. It’s a shame the board didn’t manage to explain its concerns clearly at the time, to OpenAI’s employees or to the wider world. Of course, whether you see the new developments as good or bad is up to you. Me, I kinda liked the previous mission, as well as the expressed beliefs of the previous Sam Altman!
Anyway, certainly you would’ve known all this if you read Zvi Mowshowitz. Broadly speaking, there’s nothing I can possibly say about AI safety policy that Zvi hasn’t already said in 100x more detail, anticipating and responding to every conceivable counterargument. I have no clue how he does it, but if you have any interest in these matters and you aren’t already reading Zvi, start.
Regardless of any setbacks, the work of AI safety continues. I am not and have never been a Yudkowskyan … but still, given the empirical shock of the past four years, I’m now firmly, 100% in the camp that we need to approach AI with humility for the magnitude of civilizational transition that’s about to occur, and for our massive error bars about what exactly that transition will entail. We can’t just “leave it to the free market” any more than we could’ve left the development of thermonuclear weapons to the free market.
And yes, whether in academia or working with AI companies, I’ll continue to think about what theoretical computer science can do for technical AI safety. Speaking of which, I’d love to hire a postdoc to work on AI alignment and safety, and I already have interested candidates. Would any person of means who reads this blog like to fund such a postdoc for me? If so, shoot me an email!
Comment #1 October 1st, 2024 at 1:14 pm
Dear Scott,
I think what happened here is a lot of otherwise savvy people missed the significance of AI’s transfer from the modestly ethical groves of Academe to the dark satanic engines of corporate industry. When I see people still talking about “AI Safety” and the “Alignment of AI with Human Interests” instead of the “Alignment of Corporate Agendas with Human Interests”, it tells me those people are still missing the bigger picture.
Comment #2 October 1st, 2024 at 5:35 pm
Hi Scott,
glad to see you’re firmly back in academia, and glad to read your recent statements and thoughts about AI! As a mathematician, I do wish some of the thought leaders in our field would similarly spend some time thinking cogently about these topics.
On another note, I often wonder what differences in culture and perception of the relevant issues led the bio-sciences to adopt responsible principles whenever a significant breakthrough came along (Asilomar I & II), while for AI this has completely failed to materialize.
Comment #3 October 1st, 2024 at 5:56 pm
As usual, and despite the fact that you think of me as an ignorant and tendentious fool (and you may not be wrong), I am glad that these and all other attempts at controlling AI research and development are failing.
Other Scott (Alexander) has an amusing take on AI safety. Just like every time AI is able to do something new it becomes “not really AI”, every time AI does something that would have been perceived as seriously bad before it happened, say, self-driving cars getting in the way of the vice president or counting only two Rs in STRAWBERRY, it becomes “just a bug” and is no longer perceived by most people as a serious existential issue. It’s similar to how we treat spam bots and junk phone calls as just annoying irritants.
Comment #4 October 1st, 2024 at 7:18 pm
As usual, and despite the fact that you think of me as an ignorant and tendentious fool (and you may not be wrong)…
LOL. Let me actually try one more attempt to get through to you. Suppose someone was thinking about the dangers of nuclear energy in 1939, but we forced them to limit themselves to “proven, established” harms that had already been observed, with all “speculative, science-fictional, extrapolated catastrophes” out-of-bounds. What would they find, a few radiation poisoning incidents? Clearly, they’d conclude from those that any attempts to control the global proliferation of enriched uranium would be a hysterical overreaction.
Comment #5 October 1st, 2024 at 8:07 pm
I think I am on Newsom’s side here (incidentally and instrumentally, anyway). My concern with restricting large models only is that it creates (perverse?) incentives to miniaturize the models while keeping capabilities as good as possible. This is likely to result in proliferation of dangerous models! Imagine what would have happened if the NRC restricted only large nuclear reactors and not small ones (by size, not by energy output).
Comment #6 October 1st, 2024 at 10:21 pm
Scott #4:
In 1939 scientists were already working on the promise of the atomic bomb, with plenty of scientific and engineering support for how such devices could be built:
https://world-nuclear.org/information-library/current-and-future-generation/outline-history-of-nuclear-energy
Concern about who might have access to nuclear materials and what they might do with them would thus have been appropriate in 1939, but also, that did not stop research on nuclear weapons nor was that research limited to compliant parties, whether we liked it or not.
If AI doomers had any such support for their beliefs, I wouldn’t be a skeptic. But they don’t. They have quasi-SF, quasi-religious fears that have no basis in reality. They think they’re going to be present at the birth of a new god or a new devil instead of just a new machine. I would say that they actually hope to be present at such a birth; it’s the same sort of fervor I remember from 1999, when I was working to make sure that my programs would continue to print dates correctly while they were fearing / hoping for a total collapse of society and a remaking of the world in just the way they wanted it.
Comment #7 October 1st, 2024 at 11:03 pm
> Newsom gave a disingenuous explanation (which no one on either side of the debate took seriously), that he vetoed the bill only because it didn’t go far enough (!!) in regulating the misuses of small models.
There are more reasons given beyond that, and perhaps the best is the demand for “empirical evidence and science”. In Zvi’s response that you linked, he ignores this point. I believe it’s wrong to limit freedom of expression based only on philosophical thought experiments.
Comment #8 October 2nd, 2024 at 1:02 am
Not convinced of the analogy to nukes. For nukes we had precise equations to predict the energy of the explosions. The danger of the weapons was easily predictable via a causal theory. For bills like SB 1047, two important things are different:
(1) We do not have a causal theory. We have to believe that extrapolating the compute 10x over the current frontier models will lead to “dangerous” capabilities.
(2) The danger of explosions is again proven by causal theories. For AGI or indeed ASI we have nothing except “exceptionally smart AI will take over and destroy us.” No well-thought-out argument, imo, even as to why it would necessarily come about, let alone a formal theory.
Taking a page from something similar I remember you saying a long time back in one of your posts – instead of debating vague assertions, let’s define a formal model where open-source AGI/ASI provably (or with high probability – or even via a simulation) leads to disaster. Then we can debate how realistic that model is, amend it, etc. And if at the end it turns out that no matter how realistic we make the model, the answer remains the same, that would be a powerful argument to convince a lot of people who are right now unconvinced, imo.
PS – Incidentally, for me personally, my p(doom) for myself and my family and anyone I know of is 100%. Because all of us are going to die at some point. The only chance imo that any of us have is ASI that might lead to medical breakthroughs. So the risk calculus is very different there vs if you mainly consider the humans who aren’t even born yet.
Comment #9 October 2nd, 2024 at 2:35 am
Scott #4
I am not sure this is a very good analogy. It is an apt analogy in that both have been used as the basis of an existential threat (along with many other things), but in the case of nuclear weapons there was solid physics supporting the threat and providing the framework to develop the weapons. In the case of AI, I don’t accept that there is a similar basis to provide a framework to develop an AI that becomes conscious and acts uncontrollably and independently to the detriment of humans.
I believe it was Boaz Barak who mentioned Pascal’s Wager here in this regard, and the AI existential threat seems much more Pascal’s-Wagerish to me than nuclear weapons.
Comment #10 October 2nd, 2024 at 6:50 am
> he vetoed the bill only because it didn’t go far enough (!!) in regulating the misuses of small models
At the very same time, Big Capital, which opposed the bill, pushed the idea onto the masses that the proposed regulation would limit AI development freedom in California and threaten its leading role. So the masses are cheering the veto.
I’m 100% confident that big corporations with AI instruments will control public opinion however they like. And it will be the last nail in the coffin of democracy, unless serious and urgent actions are taken.
Comment #11 October 2nd, 2024 at 7:33 am
Jon Aubrey #1
The groves of Academe no longer exist. It has become a dark and foreboding place frozen by the relentless winds of Apocalypse.
Comment #12 October 2nd, 2024 at 10:07 am
Opt #8:
No, that’s a strawman. It’s the most extreme outcome, but plenty of disastrous outcomes are possible without ASI and without an ASI that wants to destroy us.
Also, the causal theory here is that intelligence is dangerous, and it’s been empirically demonstrated by the dominance of homo sapiens on this planet.
Comment #13 October 2nd, 2024 at 10:11 am
Hyman Rosen #6:
In 1939 scientists were already working on the promise of the atomic bomb, with plenty of scientific and engineering support for how such devices could be built…
Opt #8:
Not convinced of the analogy to nukes. For nukes we had precise equations to predict the energy of the explosions…
Decent estimates of the explosive power of an atomic bomb existed (in secret) by 1943 or so, but certainly not by 1939. In 1939, people didn’t even realize that you could lower the critical mass by orders of magnitude by separating out the U235 (or by manufacturing plutonium), which was why the Einstein-Szilard letter wrongly said that an atomic weapon would never fit on a plane, but could be carried by ship and blow up a port.
In terms of what’s already been empirically demonstrated in AI, I’d say that we’re well past “1939” at this point, and probably past the point of seeing a working “chain reaction” (which was 1942). We’ve seen that model capabilities continue to improve as you scale compute and training data, and with the o1 model, they’ve now passed the point of (e.g.) the model forming and successfully executing its own plan to pass a test by hacking the testing environment that it was being run in. This is a point that almost everyone, even the skeptics, previously agreed would cause a “fire alarm” if it were passed in some presumably remote future. It’s now been passed with little fanfare.
It’s true that we don’t have a quantitative theory to predict how much smarter the models will get with how much more scaling. But crucially, that doesn’t seem to me like a reason to feel reassured.
Comment #14 October 2nd, 2024 at 10:14 am
Any field that is exclusively empirical has to demonstrate safety risk first before asking for regulations. The risks of nuclear energy were a well-established fact based on theoretical calculations, prompting that famous letter of Einstein’s. Current AI has only the benefit of hindsight, unfortunately, so whoever signs regulation into law risks putting the cart before the horse.
Also, this assumes that the national security establishment and other agencies are clueless about safety risks posed by an adversary state, or else have a secret program that supplants all this effort. There is not even a hint that these agencies are so worried about the risks.
Comment #15 October 2nd, 2024 at 11:59 am
Scott #13:
Yeah, that “AI hacking the test environment” is one of the things that Scott Alexander talks about:
https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
He’s not impressed. I think we’re at the stage where the doomers are seeing Roombas as Daleks.
I’m willing to settle for a happy medium – let the doomers hunt for their funding and go off and ruminate on the dangers they see, as long as they don’t interfere with the people doing real work. Like any other religion, they can believe what they like, as long as they don’t force other people to affirm it. And like any other religion, they can believe we’re all Hellbound unless we do as they say.
Comment #16 October 2nd, 2024 at 12:39 pm
Hyman Rosen #15: You completely misunderstood Other Scott. He’s impressed by the fact that people don’t see this as impressive, even though if you’d asked ten years ago, they would’ve said yes, obviously, they would. It’s not an observation about AI itself, which continues to improve faster than all but the most wild-eyed Singularitarians predicted it would (just last month, o1 was a big step forward in hacking ability among other things). It’s an observation about humans, about our near-infinite capacity for goalpost-moving and for acclimating ourselves to the incredible.
Comment #17 October 2nd, 2024 at 1:34 pm
Re: OhMyGoodness #11
• https://scottaaronson.blog/?p=8367#comment-1989875
OMG: The groves of Academe no longer exist. It has become a dark and foreboding place frozen by the relentless winds of Apocalypse.
No true mathematician will object to a little hyperbole, not if it focuses attention on the generative dynamics, and as it happens the forces pulling Academe out of kilter are not unrelated to the ones warping AI out of alignment.
On that score, see the following most excellent essay.
Making the ‘Invisible Hand’ Visible
The Case for Dialogue About Academic Capitalism
Susan M. Awbrey
• https://our.oakland.edu/server/api/core/bitstreams/213e10a9-51e5-4a87-896b-b5ead3acf2bc/content
Comment #18 October 2nd, 2024 at 2:22 pm
Scott #13:
We still don’t know what counts as dangerous, though. You bring up the example of the hack the LLM was able to do. I can believe it’s possible that a 10x LLM will be able to hack a lot of websites and systems that would require specialized hackers to hack right now. But is that an existential threat? Furthermore, it’s not just the attackers who will have LLM super-brains. The defenders will too. And they will use the LLMs to defend against the attacks. It might end up being no different than a group of hackers finding a 0-day exploit and companies rolling out a fix to all the users (the vast majority of whom get it in time). It’s possible that there will be a handful of big hacks in the interim while a new equilibrium is reached. Is that existential? We have big hacks even now without AGI.
What I think the x-risk community needs to show is that the equilibrium that would be reached with AGI/ASI has a high chance of being catastrophic. So far I haven’t seen anyone even properly give an existence argument for such an equilibrium, let alone construct one.
The one place where I do agree that things can get risky is biology. Because we’ve seen in the past the existence of viruses and bacteria that can wreck human societies and are hard to counter. But to me that’s an argument to regulate based on *capabilities*, not compute. Let’s assess, based on predefined test sets and criteria, whether an LLM can be used to create dangerous viruses where it was hard for, say, terrorist groups to do so before. And if that’s the case, predefined severe restrictions automatically go into place.
What I don’t agree with is this idea of making anyone who releases an open source LLM liable based on what a jury might decide post-facto as to what constitutes catastrophic harm and whether the jury feels the LLM releaser did “adequate” safety testing. That just sounds like a way to kill any future open source LLMs while pretending you’re not doing so.
Because the downside of not releasing open source LLMs is concentration of power in the hands of bureaucrats and big players + a slowdown of science which I think might also be extremely negative.
Comment #19 October 2nd, 2024 at 2:28 pm
Edit: In the above comment,
“based on what a jury might decide post-facto as to what constitutes catastrophic harm and whether the jury feels the LLM releaser did “adequate” safety testing”
should be
“based on what a jury might decide post-facto as to what constitutes catastrophic harm, whether an LLM was integral to that catastrophic harm, and whether the jury feels the LLM releaser did “adequate” safety testing”
Comment #20 October 2nd, 2024 at 3:48 pm
I think I’ll just continue to disagree, but in any case, I hope you and your family have a sweet and blessed new year, and that we all have a better one than the one we just endured. There is so much more immediate stuff to worry about than rogue AI, unfortunately.
Comment #21 October 2nd, 2024 at 4:28 pm
Sandro #12
“Also, the causal theory here is that intelligence is dangerous, and it’s been empirically demonstrated by the dominance of homo sapiens on this planet.”
And the danger doesn’t seem linear with intelligence, or even vaguely increasing in a monotonic fashion, but more like abrupt phase transitions: e.g., it’s not like elephants or dolphins or chimps had any more impact on the environment compared to ‘dumber’ species.
Then a slightly smarter brain made complex language possible and humans totally took over, in part because our goals have become so much more ‘ambitious’ compared to other smart species.
It’s likely that a new, unfathomable phase transition could happen with AGI, especially considering that AGI has not evolved in the traditional framework of life on earth (i.e., survival in the context of procreation and limited lifespan).
Comment #22 October 2nd, 2024 at 4:31 pm
I read the Scott Alexander article and the comments here, but still believe something is missing from the discussion. I satisfy demands from other people in order to have time to do what I want to do. It may have value for other people or it may not, but I receive enjoyment from spending time involved in that activity. It may be reading sci-fi or Victorian romance novels, or raising pigeons, or proving theorems, or praying to a god, or using a backyard radio telescope to search for alien signals, but it is something I choose to do without regard to external demands. My enjoyment is the defining feature, and I don’t believe level of intelligence is relevant. I have an internal life that results in choices that serve an internal purpose. An AI responding to external demands may certainly demonstrate intelligence, but that doesn’t demonstrate consciousness as I experience it. I guess a case could be made that only intelligence is important and all that other stuff is just fluff, but in that case I doubt that the AI is a being with an internal life fully capable of independent action.
Comment #23 October 2nd, 2024 at 4:37 pm
Scott #16
“It’s an observation about humans, about our near-infinite capacity for goalpost-moving and for acclimating ourselves to the incredible.”
And we’re the only species finding pleasure in imagining the impossible and even getting used to it proactively, by producing science fiction.
Comment #24 October 2nd, 2024 at 4:54 pm
Jon Awbrey #17
Thank you for the link; I enjoyed reading through it. My hyperbole was intended to provide a Mines of Moria–like description of the current halls of academia. 🙂
Comment #25 October 2nd, 2024 at 5:20 pm
Prasanna #14, Opt #18, others: Here’s the standard answer to “if you can’t show me specifically how this world-changing technology is unsafe, then I get to assume by default that it’s safe.”
Comment #26 October 2nd, 2024 at 6:20 pm
I wonder if Gavin Newsom could have been persuaded by the threat that whatever BS he says will be immediately uncovered as total BS by the next generation of AI assistants. Maybe building these de-BS AI assistants is one of the areas where LLM development could be considered a real threat by politicians.
Like, I know that ChatGPT can hallucinate, but I can prompt-engineer it to reduce hallucinations and make the remaining ones random, whereas I know that politicians adversarially hallucinate in the direction of increasing power / influence, and I cannot prompt-engineer them. So I trust ChatGPT/Claude more.
Comment #27 October 2nd, 2024 at 8:34 pm
Scott #4:
> What would they find, a few radiation poisoning incidents? Clearly, they’d conclude from those that any attempts to control the global proliferation of enriched uranium would be a hysterical overreaction.
The funny thing is, nonproliferation advocates *did* limit themselves to historical incidents. Decades-old events, in fact, as the antinuclear movement did not start gaining ground until the 1960s. The US Army had plans to use nuclear weapons in conventional wars: plans which were considered a real option at the highest levels through the Korean War. The attitude to nuclear weapons as doomsday devices, rather than as “strategic arms,” didn’t fully permeate the conservative wing of the Republican party until the second half of Reagan’s first term. I wonder what it would look like for a superintelligence released in anger or ignorance to explode in a warning shot. Could it limit itself to destroying a single network, industry, or economic system? Our own livelihoods, as people who deal with strings of verifiable correctness, could be the next generation’s Bikini Atoll.
Comment #28 October 2nd, 2024 at 8:37 pm
We’ve known for a long time why CEOs and other would-be dictators can’t be trusted not to use AIs to hack and cheat, and why AI use and capability must be regulated. Perhaps it was best stated by, “All power corrupts; absolute power corrupts absolutely.” Regulations, if enforced, can prevent absolute power.
It was around 54 years ago that I read Heinlein’s “The Moon Is A Harsh Mistress”. In that novel, the AI could control the pixels and speakers of video screens to present the convincing image and sound of an impressive and likeable leader, who could rally people to his cause; or sell them used cars.
I get dozens of emails a month pretending to be legitimate businesses or other fakery, trying to get me to click on a link so they can hack my computer. So far I have detected them despite the use of corporation icons and other tricks. I suspect an AI could fool me, though.
Comment #29 October 2nd, 2024 at 9:33 pm
Scott #25:
In the argument in that tweet, you can easily point to a specific trajectory the astronauts can take that will provably lead to not hitting the moon. But despite the claim that the fraction of ASI trajectories *not* leading to X-risk is like the moon’s size relative to the sky’s, there’s not a single specific trajectory that X-risk folks can point out that is unanswerable.
Perhaps then the claim is that one can, for any specific trajectory, propose mitigation measures that will work to prevent that specific trajectory but not others, and that it’s not humanly possible to propose mitigation measures for every scenario? But this is again something that would require work to show. It would require more formal ways to think about X-risk.
Right now you’re asking not only for a prior that ASI may be unsafe (which actually is a prior I agree with, if nothing else then because of biological applications), but also for a prior that we cannot come up with an effective set of measures short of basically banning open-source models with 10x more compute.
Or to put it in the terms of the tweet, you’re asking us to have the prior that we cannot create a rocket control system to hit the moon, and that the only solution is to allow rocketry to be built only by big companies and governments.
And for something that could also have the huge upsides that ASI may have, that seems like too heavy a cost for something speculative.
Comment #30 October 2nd, 2024 at 11:00 pm
OMG #24
The faults in Academe and either vein of AI are not where they delve too deep but where they cleave too close to the surface.
Comment #31 October 3rd, 2024 at 8:55 am
John Aubrey #30
That sounds reasonable to me. To an outsider it looks like ideology run amok. You have a sizeable anti-capitalist contingent, and the same with anti-Zionism, anti-US, anti-Jewish, anti-Caucasian (naturally racist), anti-merit, anti-diversity-of-ideas, etc. There are far too many who are emotionally invested in destroying something but have absolutely no clue how to build something better; something worse (in my view) would be no problem.
Comment #32 October 3rd, 2024 at 9:44 am
To continue the Mines of Moria analogy: liberal arts faculties, in aggregate, are like orcs boiling out of the depths, intent on destruction. My personal opinion.
Comment #33 October 3rd, 2024 at 10:43 am
Opt,
I’m not sure we know how to implement any mitigation strategies; every LLM has been jailbroken. And even if we did know how to do it, there are plenty of people who want to make AGIs without guardrails. Surveys show 10% of AI researchers would welcome human extinction. And there are lots of rogue states, terrorist orgs, misanthropic trolls, curious people, and misguided people who will guide AIs toward mass destruction.
Comment #34 October 3rd, 2024 at 12:24 pm
Although Sam Altman is ultimately the decision maker, I think the issue might have been the underlying technical staff who executed much of OpenAI’s product success. Many of them, at heart, I believe, want to capitalize on the fruits of their labor and see the products they build change the world, for good or bad. I think many lack foresight, as we have seen with things like social media (i.e., it has tremendous value but has done tremendous unforeseen harm).
If you recall, they openly threatened to quit (and hence destroy) OpenAI if Sam was not reinstated, and I believe they knew what Sam Altman ultimately wanted.
Comment #35 October 3rd, 2024 at 1:41 pm
LesHapablap #33,
My question is the following – what ASI capabilities would be dangerous enough that the equilibrium state if they are released in the wild is catastrophic?
Biology is one that I can think of — which is why I’ve caveated what I’ve said: if an LLM’s capabilities in biology are judged to be powerful enough to, say, enable the creation of new viruses by terror groups, that’s more than sufficient reason to prevent its release — regardless of the amount of compute used.
Another I can think of is if AI is allowed to control nuclear weapon use.
What other such areas are there? Because (and maybe I’m wrong on this), the above two in my opinion can be mitigated by (1) having good test sets to judge the biological capabilities of models, (2) evaluating the hardware that it would take to generate new viruses, for example, and limiting autonomous model access as well as nefarious groups’ access to that hardware (the latter I believe is already done to some extent), and (3) just not allowing AI to control nukes.
The claim by the X-risk community, however, seems to be that there are so many other targets that no capability testing one can think of right now is going to cover even most of them, even for models slightly more powerful (10x more compute) than what we have right now. And thus an effective ban on open source for future models is the only way to go. That seems unconvincing given the paucity of scenarios the X-risk community can come up with. And when it does come up with scenarios, the assumptions are often not realistic — for example, a large number of scenarios seem to assume that we’ll have ASI vs humans rather than ASI vs humans + ASI, which seems far more realistic.
Comment #36 October 3rd, 2024 at 6:19 pm
I’m old enough to remember when one of the key promises that the founders of OpenAI made was to prevent competitive market dynamics from interfering with AGI safety development. The key promise was that they would _share_ the insights with other competitors if/when they got close to AGI. Remember that?
Guess that is completely out the window. The only thing remaining is for them to change the name, as it is laughable that they call themselves “Open” in any sense.
Comment #37 October 4th, 2024 at 1:20 am
It would take Deckard a maximum of 100 questions, cross referenced, to identify an AI. If they are a benefit then no problem. If a hazard then may God help them.
Comment #38 October 4th, 2024 at 8:26 am
> Regardless of any setbacks, the work of AI safety continues. I am not and have never been a Yudkowskyan …
Out of curiosity, what do you mean exactly by Yudkowskyan? Are you mostly referring to how high his p(doom) is? What would it take to make you a “Yudkowskyan”?
Anyway, speaking to the risk-skeptics (not Scott), it is honestly extremely tiring. I’ve been convinced of AI as a risk in the Yudkowskyan sense since about 2013, and in that time, almost all predictions made by the Yudkowsky crowd have proven incredibly prescient, in some cases surpassing even their biggest fears in terms of timelines.
One of the biggest strikes *against* the AI-risk crowd has been that very few actual practitioners of AI are worried – that is very much not the case today.
So I have to ask – to any skeptic out there – if real world performance doesn’t change your mind at all, and if people with a lot of knowledge of AI being convinced that there’s a risk here doesn’t change your mind – what *would* change your mind?
Comment #39 October 4th, 2024 at 8:34 am
Adam Treat #36
It would be entertaining if they would solicit suggestions from the public for a new name. Personally, I wish the designation “AI” would change, since it’s now too easy to include in marketing jingles. Some long, obscure reference in Latin would be much better.
Comment #40 October 4th, 2024 at 11:30 am
Edan Maor #38: I simply mean, I don’t accept the Yudkowskyan thesis that “by default” the creation of ASI means that all humans nearly instantly die. I do, however, accept the weaker thesis that our error bars around what kind of world ASI will lead to, and whether we’ll like it or not, are staggering. And I accept the obvious argument that that’s more than enough reason for worry, when combined with the epochal developments of the past few years that make ASI look like a realistic prospect rather than a science-fiction thought experiment.
Comment #41 October 4th, 2024 at 11:32 am
Opt #35:
“My question is the following – what ASI capabilities would be dangerous enough that the equilibrium state if they are released in the wild is catastrophic?”
I am far from an expert, mostly just reading Zvi on the topic. But my understanding is that the capabilities of intelligence and the ability to act in the world as an agent, combined with essentially any goal, lead to a catastrophic equilibrium. Any goal will have “acquire resources, don’t be shut down” baked in. And if an AGI is acquiring resources without limit, that is catastrophic. That could be in the sense of space-based solar panels blocking out sunlight until the planet freezes. Or self-replicating factories until the planet fries from waste heat.
Or it could be in the sense you’re talking about, like controlling WMDs. Except instead of WMDs it figures that the 1% of the population that is psychopathic makes a great WMD if offered money and the chance to do violence, and it can reliably identify them through internet habits.
Or it could acquire power through blackmail or protection rackets. Or through catfishing, or algorithmic trading.
Or it could gain power and resources by offering legitimately incredibly useful services. But then use those to further some goal that is at odds with human interests, which is basically all of them if pursued to infinity. Like paper clips, or DEI, or the Koran, or “human happiness.”
Comment #42 October 4th, 2024 at 1:12 pm
Edan Maor #38,
I’m definitely a skeptic according to your definition, and I also work in AI. My views are fairly similar to Scott’s in terms of large error bars, but we seemingly differ in our level of cynicism towards the AI-doomers and in what can even be done or is wise to do. For instance, I do not support the regulation that has been put forward. OTOH, I do fully support real openness with regard to methods, training data, research and so on – so much so that I’d like to see *that* regulated/mandated.
“So I have to ask – to any skeptic out there – if real world performance doesn’t change your mind at all, and if people with a lot of knowledge of AI being convinced that there’s a risk here doesn’t change your mind – what *would* change your mind?”
Well, it doesn’t help the cause that the AI-doomers can be generally broken down into two categories:
1) People who are doomers but nevertheless work in for-profit companies developing and researching the very thing they profess will doom us
2) People who are doomers who don’t work on it and who also don’t think much of anything can be done – the ship has sailed and we’re all doomed
—
Of those two groups, I at least respect that #2 has the courage of its convictions, but *I* find #1 endlessly tiring. They give up the doom and gloom as soon as they perceive a way to make a profit. I don’t think the OP understands just how much _damage_ Sam Altman and Elon Musk have done. By revealing themselves to be cynical greedy capitalists who don’t actually give a damn about AI safety they’ve soiled any true consideration for “AI safety” in the eyes of many.
So yes, there is a risk – the world is changing and the consequences have huge error bars on our ability to foresee them – but the sky has not fallen and still no one has come up with anything close to a compelling way to prevent any future sky falling calamity. The only thing I’ve seen that has any possibility of working is openness/collaboration/transparency and that just happens to run afoul of the for-profit interests of those inside these for-profit companies that are getting rich preaching “AI safety” or “doom!” depending upon the tactical financial interest at play.
Comment #43 October 4th, 2024 at 2:32 pm
Scott #40
This seems to me an entirely reasonable concern.
LesHapaBlap #41
There was a B movie with this premise. An AI-controlled factory defended itself, used resources voraciously, and endlessly delivered manufactured products that no one used.
Comment #44 October 4th, 2024 at 2:49 pm
Batting between your posts, Mowshowitz’s, and Newsom’s justification (or what Newsom calls a justification), I get the sense that Newsom — interpreted with all the good faith I can muster — poured insufficient time into understanding the hazard-risk profile of expensive, high-volume models (on brand for American politics, according to public health wonks). One could take away from Section 3 of SB 1047 (where “Critical Harm” enabled by AI is outlined) that, since the risks lie in model applications, the applications of models should be regulated rather than whatever specifically makes a large model hazardous. Indeed, the bill’s definition of Critical Harm seems to depend on how models are used/deployed, and to me, the concerns this definition encapsulates extend straightforwardly to smaller models. This point may contextualize Newsom’s decision: he either misunderstands or undervalues the importance of the relationship between hazard and risk. But I know the word “may” is doing some serious lifting here.
Comment #45 October 4th, 2024 at 4:15 pm
I did not read the veto message as vetoing SB 1047 because it does not go far enough; instead the message seems to be that at this early stage we should focus on specific AI-related harms. While SB 1047 would be less damaging than its initial version, I still support the veto.
I fear that without AI and technological acceleration, civilization may self-destruct; there is no risk-free way forward. I also believe that AI models, being information, are generally protected by freedom of speech. In any case, it is important for AI safety that AI leadership remain in California, as opposed to, say, China, and a key part of that is a regulatory environment favoring freedom, openness, innovation, and AI.
The destructive impact of liability on open models is the following. You are not compensated for others’ use of the model. Thus, imposing liability on you for even a small fraction of that use (which you cannot control and often cannot predict) may make releasing the model an expected net loss for you despite its being an expected net gain for society.
Comment #46 October 5th, 2024 at 12:35 am
LesHapablap #41:
“Any goal will have “acquire resources, don’t be shut down” baked in.”
Humans also have a goal of dominance built in, and yet we find ways to survive and cooperate.
“And if an AGI is acquiring resources without limit, that is catastrophic. That could be in the sense of space based solar panels block out sunlight until planet freezes.”
Is that even physically possible given resource constraints? And furthermore, even if there’s an “AGI” country (which I also have serious doubts about), why would it have dominance over human countries, which also have access to AGI that is aligned to us (even if there is some AGI that was “able to escape”)?
“Or self replicating factories until the planet fries from waste heat.”
We already have those in the form of bacteria. And yet resource constraints prevent their dominance.
“Or it could be in the sense you’re talking about, like controlling WMDs. Except instead of WMDs it figures that the 1% of the population that is psychopathic makes a great WMD if offered money and the chance to do violence, and it can reliably identify them through internet habits.”
That psychopathic 1% will still need resources to build those WMDs. But at any rate, I do agree that if LLMs show abilities to design WMDs that nefarious groups otherwise would not have, then they should not be open sourced. And I think we should be able to test for that.
“Or it could acquire power through blackmail or protection rackets. Or through catfishing, or algorithmic trading.”
It would have a lot of competition with actual humans using AGI for that purpose. I don’t think it will be able to acquire dominant power or anything close to it.
Comment #47 October 5th, 2024 at 5:21 pm
Opt in #18: The one place where I do agree that things can get risky is biology.
Yes! Now, consider a scenario where even quite a dumb generative LLM-based AI were too tightly coupled with any of the technologies Kevin Esvelt mentions in this podcast:
https://www.youtube.com/watch?v=u9r3XviC6Jo
(Also, note the section The potential for AI models to increase access to dangerous pathogens)
Comment #48 October 5th, 2024 at 10:06 pm
Opt #46:
“Humans also have a goal of dominance built in and yet we find ways to survive and cooperate”
Notice that humans don’t really cooperate with any dumber species unless it is for the benefit of humans, like raising them for food.
“Is that even physically possible given resource constraints? And furthermore, even if there’s an “AGI” country (which I also have serious doubts about), why would it have dominance over human countries which also have access to AGI that is aligned to us (even if there is some AGI that was “able to escape”) ?”
We don’t know how to align an AGI to anything, let alone ‘us,’ for whatever definition of ‘us’ you want to use, which is a big problem. If you can successfully align an AI with your country, then the chances of catastrophic risk are much lower. You’re not really out of the woods because conflicts between AIs might end up badly for us, or for people outside our country. And “free” AIs might inherently be more powerful than AIs that have to worry about human deaths.
“We already have those in the form of bacteria. And yet resource constraints prevent their dominance.”
If our human bodies were not affected by global warming, we would probably fry the planet with fossil fuel consumption.
“That psychopathic 1% will still need resources to build those WMDs. But at any rate, I do agree that if LLMs show abilities to design WMDs that nefarious groups otherwise would not have, then they should not be open sourced. And I think we should be able to test for that.”
I didn’t mean the psychopaths would build WMDs, I meant they ARE the WMDs. Three million Americans who ache for the chance to use violence against other humans, identified, coordinated and rewarded.
“It would have a lot of competition with actual humans using AGI for that purpose. I don’t think it will be able to acquire dominant power or anything close to it.”
Instead of ‘using’ I would think of it more as ‘unleashing,’ at least once AGIs become smarter and more agentic than actual humans. A bunch of criminals unleashing AGIs to scam and influence people is at minimum extremely chaotic.