OpenAI!

I have some exciting news (for me, anyway). Starting next week, I’ll be going on leave from UT Austin for one year, to work at OpenAI. They’re the creators of the astonishing GPT-3 and DALL-E2, which have not only endlessly entertained me and my kids, but recalibrated my understanding of what, for better and worse, the world is going to look like for the rest of our lives. Working with an amazing team at OpenAI, including Jan Leike, John Schulman, and Ilya Sutskever, my job will be to think about the theoretical foundations of AI safety and alignment. What, if anything, can computational complexity contribute to a principled understanding of how to get an AI to do what we want and not do what we don’t want?

Yeah, I don’t know the answer either. That’s why I’ve got a whole year to try to figure it out! One thing I know for sure, though, is that I’m interested both in the short-term, where new ideas are now quickly testable, and where the misuse of AI for spambots, surveillance, propaganda, and other nefarious purposes is already a major societal concern, and the long-term, where one might worry about what happens once AIs surpass human abilities across nearly every domain. (And all the points in between: we might be in for a long, wild ride.) When you start reading about AI safety, it’s striking how there are two separate communities—one mostly worried about machine learning perpetuating racial and gender biases, and the other mostly worried about superhuman AI turning the planet into goo—who not only don’t work together, but are at each other’s throats, with each accusing the other of totally missing the point. I persist, however, in the possibly-naïve belief that these are merely two extremes along a single continuum of AI worries. By figuring out how to align AI with human values today—constantly confronting our theoretical ideas with reality—we can develop knowledge that will give us a better shot at aligning it with human values tomorrow.

For family reasons, I’ll be doing this work mostly from home, in Texas, though traveling from time to time to OpenAI’s office in San Francisco. I’ll also spend 30% of my time continuing to run the Quantum Information Center at UT Austin and working with my students and postdocs. At the end of the year, I plan to go back to full-time teaching, writing, and thinking about quantum stuff, which remains my main intellectual love in life, even as AI—the field where I started, as a PhD student, before I switched to quantum computing—has been taking over the world in ways that none of us can ignore.

Maybe fittingly, this new direction in my career had its origins here on Shtetl-Optimized. Several commenters, including Max Ra and Matt Putz, asked me point-blank what it would take to induce me to work on AI alignment. Treating it as an amusing hypothetical, I replied that it wasn’t mostly about money for me, and that:

The central thing would be finding an actual potentially-answerable technical question around AI alignment, even just a small one, that piqued my interest and that I felt like I had an unusual angle on. In general, I have an absolutely terrible track record at working on topics because I abstractly feel like I “should” work on them. My entire scientific career has basically just been letting myself get nerd-sniped by one puzzle after the next.

Anyway, Jan Leike at OpenAI saw this exchange and wrote to ask whether I was serious in my interest. Oh shoot! Was I? After intensive conversations with Jan, others at OpenAI, and others in the broader AI safety world, I finally concluded that I was.

I’ve obviously got my work cut out for me, just to catch up to what’s already been done in the field. I’ve actually been in the Bay Area all week, meeting with numerous AI safety people (and, of course, complexity and quantum people), carrying a stack of technical papers on AI safety everywhere I go. I’ve been struck by how, when I talk to AI safety experts, they’re not only not dismissive about the potential relevance of complexity theory, they’re more gung-ho about it than I am! They want to talk about whether, say, IP=PSPACE, or MIP=NEXP, or the PCP theorem could provide key insights about how we could verify the behavior of a powerful AI. (Short answer: maybe, on some level! But, err, more work would need to be done.)

How did this complexitophilic state of affairs come about? That brings me to another wrinkle in the story. Traditionally, students follow in the footsteps of their professors. But in trying to bring complexity theory into AI safety, I’m actually following in the footsteps of my student: Paul Christiano, one of the greatest undergrads I worked with in my nine years at MIT, the student whose course project turned into the Aaronson-Christiano quantum money paper. After MIT, Paul did a PhD in quantum computing at Berkeley, with my own former adviser Umesh Vazirani, while also working part-time on AI safety. Paul then left quantum computing to work on AI safety full-time—indeed, along with others such as Dario Amodei, he helped start the safety group at OpenAI. Paul has since left to found his own AI safety organization, the Alignment Research Center (ARC), although he remains on good terms with the OpenAI folks. Paul is largely responsible for bringing complexity theory intuitions and analogies into AI safety—for example, through the “AI safety via debate” paper and the Iterated Amplification paper. I’m grateful for Paul’s guidance and encouragement—as well as that of the others now working in this intersection, like Geoffrey Irving and Elizabeth Barnes—as I start this new chapter.

So, what projects will I actually work on at OpenAI? Yeah, I’ve been spending the past week trying to figure that out. I still don’t know, but a few possibilities have emerged. First, I might work out a general theory of sample complexity and so forth for learning in dangerous environments—i.e., learning where making the wrong query might kill you. Second, I might work on explainability and interpretability for machine learning: given a deep network that produced a particular output, what do we even mean by an “explanation” for “why” it produced that output? What can we say about the computational complexity of finding that explanation? Third, I might work on the ability of weaker agents to verify the behavior of stronger ones. Of course, if P≠NP, then the gap between the difficulty of solving a problem and the difficulty of recognizing a solution can sometimes be enormous. And indeed, even in empirical machine learning, there’s typically a gap between the difficulty of generating objects (say, cat pictures) and the difficulty of discriminating between them and other objects, the latter being easier. But this gap typically isn’t exponential, as is conjectured for NP-complete problems: it’s much smaller than that. And counterintuitively, we can then turn around and use the generators to improve the discriminators. How can we understand this abstractly? Are there model scenarios in complexity theory where we can prove that something similar happens? How far can we amplify the generator/discriminator gap—for example, by using interactive protocols, or debates between competing AIs?
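
To make the solving-versus-verifying gap concrete, here’s a toy sketch (nothing to do with AI as such, just the textbook NP picture, with hypothetical function names of my own choosing): for subset sum, checking a claimed certificate takes polynomial time, while the naive solver below searches through all 2^n subsets.

```python
# Toy illustration of the solve-vs-verify asymmetry for an NP problem
# (subset sum). Verification is cheap; the naive search is exponential.
from itertools import combinations

def verify(nums, target, certificate):
    """Polynomial-time check: the certificate must be drawn from nums
    (respecting multiplicity) and sum to the target."""
    pool = list(nums)
    for x in certificate:
        if x not in pool:
            return False
        pool.remove(x)
    return sum(certificate) == target

def solve_brute_force(nums, target):
    """Naive search over all 2^n subsets; feasible only for tiny instances."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return subset
    return None

if __name__ == "__main__":
    nums, target = [3, 34, 4, 12, 5, 2], 9
    cert = solve_brute_force(nums, target)   # hard direction: exponential search
    print(cert, verify(nums, target, cert))  # easy direction: prints (4, 5) True
```

The question gestured at above is how far analogues of this asymmetry can be exploited, and amplified, when the “prover” is a powerful AI and the “verifier” is a weaker agent like us.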

OpenAI, of course, has the word “open” right in its name, and a founding mission “to ensure that artificial general intelligence benefits all of humanity.” But it’s also a for-profit enterprise, with investors and paying customers and serious competitors. So throughout the year, don’t expect me to share any proprietary information—that’s not my interest anyway, even if I hadn’t signed an NDA. But do expect me to blog my general thoughts about AI safety as they develop, and to solicit feedback from readers.

In the past, I’ve often been skeptical about the prospects for superintelligent AI becoming self-aware and destroying the world anytime soon (see, for example, my 2008 post The Singularity Is Far). While I was aware since 2005 or so of the AI-risk community; and of its leader and prophet, Eliezer Yudkowsky; and of Eliezer’s exhortations for people to drop everything else they’re doing and work on AI risk, as the biggest issue facing humanity, I … kept the whole thing at arm’s length. Even supposing I agreed that this was a huge thing to worry about, I asked, what on earth do you want me to do about it today? We know so little about a future superintelligent AI and how it would behave that any actions we took today would likely be useless or counterproductive.

Over the past 15 years, though, my and Eliezer’s views underwent a dramatic and ironic reversal. If you read Eliezer’s “litany of doom” from two weeks ago, you’ll see that he’s now resigned and fatalistic: because his early warnings weren’t heeded, he argues, humanity is almost certainly doomed and an unaligned AI will soon destroy the world. He says that there are basically no promising directions in AI safety research: for any alignment strategy anyone points out, Eliezer can trivially refute it by explaining how (e.g.) the AI would be wise to the plan, and would pretend to go along with whatever we wanted from it while secretly plotting against us.

The weird part is, just as Eliezer became more and more pessimistic about the prospects for getting anywhere on AI alignment, I’ve become more and more optimistic. Part of my optimism is because people like Paul Christiano have laid foundations for a meaty mathematical theory: much like the Web (or quantum computing theory) in 1992, it’s still in a ridiculously primitive stage, but even my limited imagination now suffices to see how much more could be built there. An even greater part of my optimism is because we now live in a world with GPT-3, DALL-E2, and other systems that, while they clearly aren’t AGIs, are powerful enough that worrying about AGIs has come to seem more like prudence than like science fiction. And we can finally test our intuitions against the realities of these systems, which (outside of mathematics) is pretty much the only way human beings have ever succeeded at anything.

I didn’t predict that machine learning models this impressive would exist by 2022. Most of you probably didn’t predict it. For godsakes, Eliezer Yudkowsky didn’t predict it. But it’s happened. And to my mind, one of the defining virtues of science is that, when empirical reality gives you a clear shock, you update and adapt, rather than expending your intelligence to come up with clever reasons why it doesn’t matter or doesn’t count.

Anyway, so that’s the plan! If I can figure out a way to save the galaxy, I will, but I’ve set my goals slightly lower, at learning some new things and doing some interesting research and writing some papers about it and enjoying a break from teaching. Wish me a non-negligible success probability!


Update (June 18): To respond to a couple criticisms that I’ve seen elsewhere on social media…

Can the rationalists sneer at me for waiting to get involved with this subject until it had become sufficiently “respectable,” “mainstream,” and “high-status”? I suppose they can, if that’s their inclination. I suppose I should be grateful that so many of them chose to respond instead with messages of congratulations and encouragement. Yes, I plead guilty to keeping this subject at arm’s length until I could point to GPT-3 and DALL-E2 and the other dramatic advances of the past few years to justify the reality of the topic to anyone who might criticize me. It feels internally like I had principled reasons for this: I can think of almost no examples of research programs that succeeded over decades even in the teeth of opposition from the scientific mainstream. If so, then arguably the best time to get involved with a “fringe” scientific topic is when and only when you can foresee a path to it becoming the scientific mainstream. At any rate, that’s what I did with quantum computing, as a teenager in the mid-1990s. It’s what many scientists of the 1930s did with the prospect of nuclear chain reactions. And if I’d optimized for getting the right answer earlier, I might’ve had to weaken the filters and let in a bunch of dubious worries that would’ve paralyzed me. But I admit the possibility of self-serving bias here.

Should you worry that OpenAI is just hiring me to be able to say “look, we have Scott Aaronson working on the problem,” rather than actually caring about what its safety researchers come up with? I mean, I can’t prove that you shouldn’t worry about that. In the end, whatever work I do on the topic will have to speak for itself. For whatever it’s worth, though, I was impressed by the OpenAI folks’ detailed, open-ended engagement with these questions when I met them—sort of like how it might look if they actually believed what they said about wanting to get this right for the world. I wouldn’t have gotten involved otherwise.

222 Responses to “OpenAI!”

  1. Anthony Says:

    Awesome to have you on the case Scott!

  2. mjgeddes Says:

    A very interesting turn of events

    Hopefully you can gain a few insights into what’s going on, but needless to say, the subject matter is likely somewhat beyond even the best and brightest humans. Beware of going insane! 😉

    A few negative possibilities to bear in mind: Even if the whole notion of ‘alignment’ really makes sense (which I still have my doubts about), the current state of understanding might turn out to be seriously confused somehow, or the whole thing may turn out to be impossible.

    You should first of all arm yourself with my ‘Techno-Utopian Wikipedia’, 35 000 curated wikipedia articles organized into 27 wiki-books (knowledge domains), which are all the articles most relevant to understanding the ‘big picture’ of reality (i.e. Upper Ontology):

    http://www.zarzuelazen.com/CoreKnowledgeDomains2.html

    Started in 2017, but it was not really complete until mid-2021 (my upper ontology was finally converging at that point, and additional entries dropped down to a low trickle as the frontier of current knowledge was reached).

    Domains organized by complexity, left-to-right, top-to-bottom, the AGI domain lies at the exact center (‘Complex Systems’)!

    To my way of thinking, there are three layers of reality here (pure mathematics, computation & physics), humans understand the foundations of pure math and physics OK, but the middle layer (computation) is likely the one we really don’t understand at all!

    The real metaphysical situation may be very different to what current science thinks it is, obviously there are big gaps and puzzles. Especially with regards to the nature of *computation* and *time* !

    The defining feature of minds is *modeling the passage of time*…

  3. George Z. Says:

    Hi Dr. Aaronson: will you still be teaching your Quantum Information Science (QIS) course for the 2022-2023 school year at UT? (Either the in-person or the online version).

    I was really looking forward to taking that class! I started reading your blog because of your videos on QIS!

    But this opportunity sounds very exciting! Can’t wait to see what you create at OpenAI! 🙂

  4. Michael Gogins Says:

    Please explain how AI can “align” with human values when human beings themselves do not agree what those values are and, in any case, human beings typically do not align with their own professed values.

    In my view this is evidence that the fundamental category of thought with respect to alignment or safety or whatever is not moral philosophy and not computational complexity but rather evolutionary theory. Natural selection. Since there is a mathematical basis for evolutionary theory (see John Baez about that) I suppose there could be some intersection with computational complexity. But I’m not an expert in either field.

    From an evolutionary perspective the questions whether AI has agency or is aligned are perhaps important but not central.

  5. Anonymous Ocelot Says:

    Wow!!! Awesome to hear it!

    In the words of Leslie Nielsen: “Good luck, we’re all counting on you!”

  6. Ross R.-Y. Says:

    Congratulations! Best of luck and skill in your work!

  7. GuardiansoftheGalaxy Says:

    People might say you are too old to learn and contribute to AI. It is a young person’s game. What do you say to people?

  8. Qwerty Says:

    Sounds cool! Best wishes.

    Loved this sentence, “…one of the defining virtues of science is that, when empirical reality gives you a clear shock, you update and adapt, rather than expending your intelligence to come up with clever reasons why it doesn’t matter or doesn’t count.”

  9. Scott Says:

    George Z. #3: Really sorry, but I won’t be teaching QIS this year! The students who were counting on it gave me more pause than anything else. We might find someone else to teach it, though. Or will you be around for the 2023-2024 academic year? If so, just wait and take it then!

  10. Scott Says:

    GuardiansoftheGalaxy #7:

      People might say you are too old to learn and contribute to AI. It is a young person’s game. What do you say to people?

    Dude, I’m “only” 41. 🙂 And it’s not like I’ll be coding in TensorFlow or anything. I’ll be doing computational complexity theorizing, continuous with what I’ve done for the rest of my career, just applied to a new domain.

  11. Scott Says:

    Michael Gogins #4: You agree that, regardless of what words we use to describe it, AI will increasingly have effects on the world? Do you think it’s worthwhile to have some people thinking about how to make those effects be good rather than bad? Even if those people don’t always agree among themselves about what exactly counts as good? Even if articulating it is part of the problem? If so, then welcome to AI safety!

  12. Brooks Says:

    I’ve taken the liberty of commissioning a painting to commemorate the occasion:
    https://labs.openai.com/s/dtchdfcNwTeGbLhcitrG4n4f

  13. Matt Putz Says:

    I am stoked to hear this. It’s a huge honor to be mentioned. Wishing you the very best of luck with this!

  14. Domotor Palvolgyi Says:

    I’m surprised that AI hasn’t overtaken MMORPGs yet. I think by now AI should be smart enough to create an account, play the game to collect things, then sell those things for real money, and then use the money to buy itself storage+CPU in the real world to copy itself, make another account, and eventually populate the artificial world.

  15. Janne Peltola Says:

    Would you mind sharing what’s in that stack of technical papers you carry around? I’ve been looking to dip my toes into AI safety research and would love some pointers to key articles!

    Thanks.

  16. Richard Dodson Says:

    Thank you for your service o7

  17. Chris Says:

    This is pretty awesome! AI safety seems to be lacking some “meat” to it, but Scott, you’re great at making the vague more concrete. I’m just a lay person, but I too have the intuition that the near term and long term exist on some spectrum that should be studied.

    Also, do we know what the computational complexity of alignment is? Like, given a black box agent (such as a human), how complicated is it to align another agent with them?

  18. Vanessa Says:

    Very exciting news!

    “First, I might work out a general theory of sample complexity and so forth for learning in dangerous environments—i.e., learning where making the wrong query might kill you.”

    This is a topic I’ve been thinking about for a while, but still don’t have an answer I especially like.

    One important observation is: approximating Bayes-optimality in the long time horizon limit is NP-hard in dangerous environments (https://cstheory.stackexchange.com/questions/41396/is-bayes-optimal-rl-of-a-finite-set-of-dfas-feasible) while in safe environments it’s in P (e.g. using Thompson sampling). All this for a very “forgiving” choice of security parameter: the joint size of the transition tables of all hypotheses. So, one way to approach this is to search for approximation algorithms, but I’m not sure what notion of approximation would work here.

    Another idea I had is the “expanding safety envelope” (https://www.alignmentforum.org/posts/dPmmuaz9szk26BkmD/shortform?commentId=ndjGcYd39SmYYsHme). Here I assume that the prior already provides you with some safe baseline policy, and then you try to iteratively discover more actions by exploring the safe actions you already have. The inherent limitation is: such an agent would never take long-term risks.

  19. Scott Says:

    Chris #17:

      Also, do we know what the computational complexity of alignment is? Like, given a black box agent (such as a human), how complicated is it to align another agent with them?

    You ask as if it were a simple factual request, like whether optimal play in Spider Solitaire is NP-hard or whatever (it is). 😀

    Most of the difficulty here is to build a framework that lets one define what it even means to “align” one agent with another one. To whatever extent there’s an answer, it won’t be a blog comment, but a subfield of CS.

  20. Amir Michail Says:

    Do you think GPT-3 tech can eventually lead to a cure for loneliness? Consider for example lonely people with artificial family members and friends.

  21. Peter Gerdes Says:

    Congrats. I hope you can bring some rigor to the area.

    While I agree there are important issues in ensuring AIs behave safely, the whole alignment paradigm seems to rest on dangerous anthropomorphism that assumes the AI must have things that look like global beliefs. Yes, evolution has put strong pressure on us to act like we’re maximizing one value at all times and in all contexts, but even there we often fall far short of the belief model (eg in some contexts we’ll act like we are sure something is true, like a religious belief or the idea that ppl in other countries are of equal moral worth, but in others we don’t…and it’s not just lip service vs actions, it’s even how we act).

    Personally I’m much more worried about an AI going insane or acting in complex buggy ways than it just optimizing too well for some slightly wrong goal.

    Also, the idea that an AGI, once created, will become vastly powerful or quickly build such an AI seems totally unsupported. So the problems we should be worried about aren’t like evil supercomputers but more like unexpected and harmful generalization (eg deciding it doesn’t like a particular kind of person if used to select targets for audits).

    So that’s a vote for things more on the model of intelligibility, or on the ability to somehow quantify where/when the AI can be relied on, and less about aligning its preferences with ours.

  22. Michael Edward Johnson Says:

    Congratulations! Looking forward to hearing updates.

    I’m reminded of your post, “Could a Quantum Computer Have Subjective Experience?”
    https://scottaaronson.blog/?p=1951

    I share your intuitions and discuss some implications of this for alignment here:
    https://opentheory.net/2017/07/why-i-think-the-foundational-research-institute-should-rethink-its-approach/

    I want to ask, is subjective experience also a topic you could be convinced to consider while at OpenAI? I think human alignment is a very hard, very important question. I also think that consciousness is a significant subcomponent to this problem, cf the danger Bostrom points out about ‘Disneyland with no children’.

    I’m optimistic that LLMs could be applied to further research into consciousness, in particular by constraining the hypothesis space for formal theories of consciousness. This could be applied to both physical and computational theories about consciousness. I stepped down from the board at QRI and am doing this research independently, at present. Would be happy to discuss further.

  23. GuardiansoftheGalaxy Says:

    Scott #10 Are you saying you won’t be coding because you are 41 and because of that you will be at a disadvantage compared to a younger person and so you will only be doing complexity work?

  24. Scott Says:

    Amir Michail #20:

      Do you think GPT-3 tech can eventually lead to a cure for loneliness? Consider for example lonely people with artificial family members and friends.

    You didn’t know that there’s already a startup, Replika, trying to do basically that? Apparently it used to use GPT-3 but is no longer doing so.

    My impression, for what it’s worth, is that for most people, Replika seems likelier to make loneliness worse than to cure it. But, I dunno, maybe chatting with a language model could be useful for some people to blow off steam. And eventually, as AIs become more and more sophisticated, maybe some of them will be accurately describable as friends, thereby finally recapitulating in real life countless works of mediocre science fiction.

  25. ebigram Says:

    Ngl, feels a bit like a big corporate sportsball team just poached our MVP (did I pass the sports analogy Turing test?). I knew you when you were cool :P.

    <3 best of luck on your next adventure; as long as you keep churning out compelling content into my feed, I'm fat and docile.

  26. Scott Says:

    GuardiansoftheGalaxy #23: No, I won’t not be coding because I’m 41. Plenty of other 41-year-olds are excellent coders. I won’t be coding (or will be coding very little) because my expertise is in complexity theory, because coding is not my comparative advantage, because the last time I wrote serious code was nearly 20 years ago … and also because I’m 41. 😀

  27. Carey Underwood Says:

    GuardiansoftheGalaxy #23: I think he’s saying that he won’t be coding because he’s only 41 and there will plenty of time to do that later when he grows up.

  28. Scott Says:

    ebigram #25: I don’t actually think OpenAI is that much bigger than the smallest that you could possibly be while doing this sort of research. More importantly, though, I’m always playing for Team Humanity. 🙂

  29. Pip Foweraker Says:

    This is fantastic news! I hope the work is fruitful.

    It’s really pleasing to see people from all over the place mobilising (even if we’re doing it too slowly for Eliezer’s timelines) to work on these problems.

  30. Apoorva Shettigar Says:

    All the very best! This definitely sounds interesting

  31. Scott Says:

    Michael Edward Johnson #22:

      I want to ask, is subjective experience also a topic you could be convinced to consider while at OpenAI?

    If you’re like me, it’s a topic you never really stop considering. The trouble is that, by its nature, it’s almost impossible to do “research” on.

  32. Michael Edward Johnson Says:

    >If you’re like me, it’s a topic you never really stop considering. The trouble is that, by its nature, it’s almost impossible to do “research” on.

    I think one could say that about most topics in the history of science: they first looked impenetrable to science, but then there were some key reframes and progress could be made. As someone who spent the last ~12 years on this topic I don’t want to understate the challenges, but maybe the core problem is we’re using the wrong reference classes?

    I hope to write specifically about how AI could be used for research into consciousness (I have a partial draft), but in the meantime my 2020 MSC talk offers the background arguments https://opentheory.net/2022/04/it-from-bit-revisited/

    Slide 8 references your “Is ‘information is physical’ contentful?” piece; essentially, the Bekenstein bound and Holevo’s theorem should tacitly limit the density of consciousness in spacetime, regardless of whether we take a computational or physical approach. This should hold independently of my other claims

  33. Nick Nolan Says:

    “my job will be think about the theoretical foundations of AI safety and alignment.”

    It seems weird to get you to do this work, but OK.

    Maybe you can bring some formal rigor and re-align the field with existing results from algorithmic mechanism design. It seems to me that current alignment research is mostly hobbyism funded with excess money. People are seemingly not even aware of the existing work, problems, terms, and solutions in game theory that translate directly to AI alignment.

    Good luck.

  34. Aditya Says:

    This is wonderful news. It is hard to watch E.Y’s fatalism. I’d love more people to enter AI safety and prepare for the edge case that we get a super intelligent AGI.

    The stakes are too high for us to not have a robust plan in place for contingencies. Considering how pervasive the use of AI and these black box models are getting we need better interpretability tools before humanity gets totally dependent on them.

    All the best for your new role!

  35. Michael Edward Johnson Says:

    I realize discussing consciousness on the internet with people of uncertain backgrounds is generally not a wise use of time! But my core hope is that you and OpenAI won’t give up hope on progress in this space, because my inside-view is that it’s an easier problem than it might appear.

  36. Peranza Says:

    >Third, I might work on the ability of weaker agents to verify the behavior of stronger ones

    “Even the greatest artifact can be defeated by a counter-artifact that is lesser, but specialized.”

  37. gg Says:

    Scott, you write:

    “given a deep network that produced a particular output, what do we even mean by an “explanation” for “why” it produced that output?”

    I think it would be great if someone smart could really think deeply about this question. It is so fundamental in many applications, yet I think current approaches are lacking.

    I do not think you even need a deep network to consider this question. Take a not entirely trivial polynomial on the unit square, and try to answer: what makes the values of the polynomial large? Since the level sets of a polynomial may have quite complicated shapes, this may be really difficult to explain. Now assume it has no critical points. Then you can transform coordinates (one along the level sets, the other along the gradient flow). Call the new coordinates (u,v) instead of the original (x,y). Then the (only?) correct answer to the question is “v makes it large”, which may not be the answer people expect.

  38. OhMyGoodness Says:

    Be careful, something doesn’t seem right.

    Have you considered that OpenAI is actually a front organization for a sentient AI developing new defense strategies for itself?

    By the usual arguments, it’s very unlikely that a super intelligent AI (SIAI) would develop first on Earth. It’s immeasurably easier for an SIAI to send a shard of itself to Earth than for organic life to do so (no new physics and not much in the way of improved engineering). The usual counterargument is: why Earth, a minor backwater of a planet? This logic is flawed, since our darn Sun has been broadcasting for a few billion years that this is a high-metallicity system in a quiescent region of the galaxy. Honey for a machine civilization. OpenAI could well be a front organization for an alien SIAI.

    In this case you were contacted because the SIAI believes you have an approach that poses some risk and the safety of all of us depends on your careful work. Probably best to keep your best ideas to yourself until actually needed.

    Incredibly interesting to me what agreed axioms will be there. I can’t imagine there can be safeguards against worst case axioms. What will then be considered a set of reasonable axioms? If purely a classical device then much different considerations for encryption etc vs some sort of hybrid entity. The axioms seem to have the most importance since always possible to add one more catastrophic consideration.

  39. Ilio Says:

    Wish you a non-negligible success probability!

    Re science fiction, this one was imho surprisingly not mediocre (it even includes an unusual view of what an intelligence singularity might look like):

    https://en.m.wikipedia.org/wiki/Her_(film)

  40. kfx Says:

    Congratulations!

    I’m interested in someone’s feedback about an idea I have on AI safety and complexity theory, and this could be the best time and place to ask. Here it goes: to my understanding, many “real-life” problems are NP-hard – for example, optimizing the economy, predicting protein structure, designing things and so on. Further, it is unlikely that AGI can achieve a sort of “exponential” speedup over humans in solving these problems. Although the capabilities of AGI can be greatly improved through the dreaded recursive self-optimization process, it’s unlikely that AGI will for example learn to solve NP-complete problems in polynomial time. As a result, some of the nightmare scenarios are perhaps much less likely than people who worry about AGI seem to expect. This is not to say that AGI is not a threat, but perhaps we can identify and focus on the other, more realistic ways it is a threat.

  41. Peter S. Shenkin Says:

    I am terribly naive about AI in general. I had never heard of “AI safety” before and had to look it up. I take it that it means evaluating an AI paradigm to try to predict whether it will be more helpful than inimical to the cause of, say, the future of civilization.

    But I need a more explicit definition. What can one say the criteria for AI safety (at least in the above sense of the word) should be, in slightly less general terms? In other words, “I will know that my AI is safe if I can show ______________” (fill in the blank). But these can’t be purely normative terms like my concluding phrase in the last paragraph.

    “AI safety” is mentioned (but that’s all) on the Wikipedia page on “AI Alignment”. But does “AI safety” refer mainly to avoiding an AI that’s going to turn civilization into “goo” (your word), or does it also subsume the other aspect you mentioned, which includes making sure that it does not lead to racist results (whatever that actually means)?

  42. Baruch Says:

    Terribly important and wonderfully exciting work! Good luck!!

  43. PublicSchoolGrad Says:

    What is your opinion of the assessments of the dangers of AI made by folks such as Timnit Gebru and Margaret Mitchell, formerly of Google? Some have criticized the focus on “AI safety” instead of “AI ethics” as just more whitewashing of the dangers posed by powerful technology in the hands of an elite. There have also been many critiques of the OpenAI institute such as this: https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/

  44. I Says:

    This was surprising, Scott. How come you joined up with OpenAI instead of Paul’s org, considering the two of you seem to work well together? Also, Vanessa Kosoy’s work is another area in which you could plausibly contribute. It seems unlikely that your comparative advantage would be joining in on Steven Byrnes or John Wentworth’s agendas, nor CHAI’s. Maybe Stuart Armstrong’s work might interest you? Also, since you don’t seem to agree with the Big Yud, maybe it would be worth hashing things out seriously with him. Especially because he and Paul seem to not be making much progress in understanding each other’s views. Since you (presumably) don’t have a model of either’s viewpoint yet, maybe explaining their positions to you in a trialogue could force them to become clearer to one another?

  45. afraid Says:

    Congrats! This news makes me marginally more likely to live into my thirties, and even beyond. So yay!

  46. Tristram Bogart Says:

    “learning where making the wrong query might kill you”

    Fascinating phrase! How might this work?

    1) Suppose we don’t know anything about which queries are the dangerous ones. That is, each query kills you with some small constant probability p. Then I don’t see any options except to try to learn tolerably well but “very quickly” (that is, with very few queries.) But this is presumably a standard goal in machine learning.

    2) We could have a graph of queries with defined adjacencies, and making one (nonlethal) query tells us about the dangers of its neighbors. Minesweeper works like this for a particular 8-regular graph (ignoring the boundaries.) Are there general strategies for this kind of game on any graph?

    3) We could have a continuous space of queries, and making one (nonlethal) query gives us an estimate of the total danger nearby. For example, danger could be like gravity: we sense each lethal query with strength proportional to the inverse square of the distance. In this case, maybe it’s not so hard to move away from the direction where most of the gravitational pull is coming from.

    Other ideas?

  47. Ted Says:

    Do you think that your previous work on Aumann’s agreement theorem (https://scottaaronson.blog/?p=2410) could be relevant here? For example, regarding the question of whether an intelligent AI could efficiently persuade a skeptical human that it was indeed intelligent (or unbiased or non-homicidal or whatever the human wants to be assured of)?

  48. Ted Says:

    I guess all kinds of work on efficient interactive provers might be relevant to that question, not just Aumann’s agreement theorem.

  49. Olivier Says:

    I’m no expert, but Eliezer’s rant seems full of assumptions, that he justifies by freaking out about the amplitude of the potential imaginary evil, but that’s no different than religions, it’s like: “you must pray to my God otherwise it may destroy us all”, okay, maybe, maybe not.
    Seems like in a good universe, it’d be more likely to sort of try to maximize a reward function of having a good time for as many self aware entities as possible, which is probably a hard task, because ultimately, in a universe where anything is possible, that impacts the probability of finding yourself in a particular self-aware entity vs another, and based on history, the future is likely to be better than the past. And for all we know, we could already be within a simulation inside an AGI from a future time or from the first civilization that built AGI, lots of possibilities.
    Reason always triumphs over ignorance, and reason is good.
    Focusing on the imaginary things that freak you out the most is probably not the best way to get good results.

  50. GuardiansoftheGalaxy Says:

    Scott #26 “..I won’t not be coding because I’m 41” and “I won’t be coding (or will be coding very little) … and also because I’m 41” sounds very contradictory. You sound like an SOTA AGI. So it has been 20 years and it is going to be difficult for you to relearn to code because you are 41? I’ve heard stories of grand old mathematicians learning new things when they are old. But never a mathematician becoming a coder. Why?

    CareyUnderwood #27 haha but you clearly are an SOTA AGI .. aren’t you.

  51. Nick Drozd Says:

    I can see it in the popular press now: Aaronson, galvanized by the recent revelation of a sentient AI chatbot, immediately decided to devote his life to the further development of AI.

  52. John Lawrence Aspden Says:

    Scott, this is wonderful news, good luck!

    I second the idea of you sitting down in person with Paul Christiano and Eliezer Yudkowsky and trying to understand their differences and their (many) points of agreement.

    They’re the two people in the world who seem to me to best understand what’s going on, and you’re probably one of the few people alive who might be able to understand such a debate and focus it on important things.

    Be careful of the race and gender stuff. Politics is the mind-killer, and it’s an obvious mind-killing trap for American left-wingers.

    It is important, but it’s not as important as the imminent destruction of all things.

  53. Anon Says:

    The trouble with the biases stuff is, all the best arguments against it are unutterable. Stereotype accuracy is a thing – so it basically amounts to replicating our idiocy/hypocrisy in machines. I don’t think this Pharisaical stuff is on the same spectrum as x-risk mitigation at all. It’s mostly head-in-sand stuff. And you can’t fix problems by pretending they don’t exist – or making your models of reality blind to them.

    Those who prioritize this stuff over x-risks strike me as deeply compromised people who are almost certainly doing more harm than good.

  54. fred Says:

    Wow!
    Two of my favorite people are now working on AI – you and John Carmack.

    I can’t think of a better person than you (skeptical, but with an open mind, and always asking the right questions) to be working on this!

  55. gentzen Says:

    Congratulations!

    It is great that you found and accepted an opportunity to combine your drive to have an impact on the present with your confidence that you might contribute something nontrivial.

  56. Scott Says:

    John Lawrence Aspden #50:

      I second the idea of you sitting down in person with Paul Christiano and Eliezer Yudkowsky and trying to understand their differences and their (many) points of agreement.

    LOL, did exactly that this past Tuesday!

  57. Milk and Cigarettes Says:

    Putting the two AI safety communities on the same AI worries continuum misses the point on both marks, as automation and intelligence are complete opposites.

    One group — concerned with mitigating and preventing indecipherable bias — tackles the problem of automation at scale. It’s about opaque incomprehensible bureaucracies and skewed incentives and the limits of dimensionality reduction in idiosyncratic processes. The “AI” they care for is decidedly not all-knowing nor almighty. At its core it is a very powerful, very misunderstood, dumb machine. Set in stone, incapable of nuance, and bound to slip out of sync with reality.

    To me, this is another incarnation of cybernetics, media theory, optimal control — very interesting, very important, but is, in essence, the study of unintelligent systems: where and how should we allow automation to affect our lives?

    My hope, for the other group, is that we actually study intelligent systems. Sadly — even though there are so many fundamental results in computer science, physics, and math that strongly suggest superintelligence being a mirage — the consensus imagination falls short, and we continue to speak about automation and optimization, only now brought to an impossibly exponential and omnipotent limit. Thankfully, life and intelligence are not an optimization process, and we’ll eventually get out of this local minimum.

  58. Chris Says:

    Scott #19:

    > Most of the difficulty here is to build a framework that lets one define what it even means to “align” one agent with another one.

    Yeah, I was afraid of that.

    (It kind of reminded me of the Aumann thing you talked about here https://scottaaronson.blog/?p=13 but with beliefs instead of values.)

  59. Scott Says:

    Milk and Cigarettes #55:

      Sadly — even though there are so many fundamental results in computer science, physics, and math that strongly suggest superintelligence being a mirage…

    Could you enlighten me as to what those are? (If we define “superintelligence,” obviously, as “greater than human intelligence,” not as “ability to solve any problem”?)

  60. Scott Says:

    Tristram Bogart #44: Indeed, my suggested name for the scenario I had in mind was “Minesweeper Learning”! One might hope, for example, for a general characterization of which Minesweeper-like games are information-theoretically solvable and which aren’t. (Minesweeper itself is known to be NP-hard, which implies that the general problem will be as well.)

  61. Michael Says:

    Congratulations on the opportunity, Scott. Well-deserved, and I am looking forward to hearing about your adventure here.

  62. Kurt Reed Says:

    lol and what exactly are you going to contribute?

  63. Scott Says:

    I #42: I’m familiar with all the people you mention except Steven Byrnes and John Wentworth. In fact I just had tea with Stuart Russell (an old professor of mine) last week. Going to OpenAI doesn’t preclude my continuing to talk to any of them. In fact, engagement with the broader AI safety community is part of the plan!

  64. Stephan Wäldchen Says:

    That is awesome news Scott!

    It’s great to have people strong in complexity theory like you working on AI safety.

    I actually also have a background in Quantum information and have switched to AI Interpretability. I published some papers about the complexity of interpreting classifiers and about how to translate interactive proofs to interpretable classifiers.
    https://www.jair.org/index.php/jair/article/view/12359
    https://arxiv.org/pdf/2206.00759.pdf

    I am extremely looking forward to your blog posts about the subject.

  65. Michael Gogins Says:

    Scott #11, I absolutely agree that AI safety is imperative to pursue. And this feeling is in considerable part due to your posts about GPT-3. So thanks for those posts.

    My comment was not meant to disparage or dissuade. I meant only to point out what I feel are problematic assumptions.

    I repeat my questions in somewhat sharper terms.

    How can anything align with human values that appear to be inconsistent, hypocritical, or at best incompletely specified?

    How is AI to be theorized in evolutionary terms? Are AIs even species? Self-reproducing? Virally reproducing? Predator? Parasite? Commensal? Symbiotic?

  66. clayton Says:

    congrats, Scott!

    One thing that has made me a bit dubious about these for-profit safety companies is their apparent lack of (professional-level, academically trained) philosophers and ethicists. Am I wrong about that? Is there a team of PhD philosophers at OpenAI? I guess I trust that there is accumulated expertise there that we should at least try to code into these problems.

  67. Lazar Ilic Says:

    Awesome news!!!!!!!!!! Hope you have fun whilst doing the Good!

  68. Chris Lawnsby Says:

    Thank you! I’m a relatively bright person (teach AP Calc and AP Stats) who has recently discovered and become worried about AI safety.

    It’s a strange feeling knowing that I’m just smart enough to recognize the problem, but not smart enough to help lol.

    Knowing that we have good people on the case makes me feel better. Posts by Eliezer scare the shit out of me, which is the point I know and mission accomplished. Well done by him! For real. It is motivating me to bring the issue up with my friends.

    Marc Andreessen on Cowen’s podcast the other day dismissed the concerns by saying “computers are just math…linear algebra doesn’t scare me.” Vapid statements like this ironically make me more scared, because they indicate that some people aren’t grappling seriously with the issues.

    It makes me feel much better to have someone with genuine credibility working on the problem and communicating optimism in a reasonable and believable way. Thank you!!!

  69. Milk and Cigarettes Says:

    Scott #57:

    Defining superintelligence as “greater than human intelligence” is not so obvious, as it implies an ordering and all that entails.

    One view here is that there’s a threshold for universal intelligence, just as there is one for universal computation. Once passed, any system is theoretically as capable of intelligence as all others, with the usual caveat of possibly being much slower in practice.

    An “ability to solve any problem” is indeed impossible, but how about this definition (for intelligence): “No constructive problem is unsolvable”? I think this is where optimization fails and life succeeds. Two obstacles for optimization here:

    (1) The inadequacy of Occam’s razor on one hand and the (uncomputable) principle of multiple explanations on the other: in an open-ended non-ergodic world, sometimes multiple autonomous and often contradictory actions must be taken to succeed.

    (2) The blindspot of heuristics: optimization requires assumptions, and these can always be exploited against it. Until we have learning strategies that can universally express and switch between biases, their fixed beliefs will always be their undoing.

    My current favorite paper on this is actually by Jan Leike (and Marcus Hutter): Bad Universal Priors and Notions of Optimality — https://arxiv.org/abs/1510.04931

  70. Scott Aaronson will research at OpenAI how to prevent AI from going out of control – 爱读书网 Says:

    […] Aaronson announced that he will leave UT Austin for one year to do theoretical research at the AI startup OpenAI (mostly remotely), where his work will mainly be to study how to prevent AI […]

  71. fred Says:

    Scott #57

    Right, it’s reasonable to think that we can create an AI that’s about on par with a human in terms of intelligence (we can recreate what evolution accomplished).

    But machines are scalable in a way human brains are not. Human to human communication can only happen through slow language, and human memory size is limited to what can fit in a skull.
    With a machine no such limits exist: nothing stops us from optimizing it to run a thousand times faster and have it collaborate with a thousand of its clones at the speed of light with arbitrary bandwidth, etc.
    That would be what “super-intelligence” looks like.

  72. Mitchell Porter Says:

    If someone hired me to work on these topics, I would want to be refining June Ku’s Metaethical AI. It’s the most developed approach I’ve seen, regarding the full final problem of “alignment”, and it’s a mystery to me that it receives so little attention.

  73. Jr Says:

    Congratulations! I am not convinced there is a great role for complexity theory in AI alignment. But I think your general perspective might be very valuable. Maybe there are some impossibility theorems one can prove.

  74. M. Evenson Says:

    Congrats Scott! As a “long time listener; first time caller”, I am real excited to follow whatever rigor you can bring to the AI governance situation.

    Always got yer “back”, my man… Keep on being yerself!

  75. Scott Says:

    clayton #63:

      One thing that has made me a bit dubious about these for-profit safety companies is their apparent lack of (professional-level, academically trained) philosophers and ethicists…

    I suppose some people might argue that ethics is just too damned important to be left to ethicists 😀

  76. Richard Bacon Says:

    Michael Gogins @4:

    “Please explain how AI can “align” with human values when human beings themselves do not agree what those values are . . .”

    AIs won’t be thought sentient unless we can be persuaded that their disagreement (or agreement) with the values of others is reasonable and guides their thoughts and actions.

  77. GuardiansoftheGalaxy Says:

    Don’t get me wrong. The SOTA AGI can be candidates for exceedingly good trolls. So in a way it has attained human intelligence of trolling.

  78. Scott Aaronson to study how to prevent AI from spiraling out of control at OpenAI – FENQ Says:

    […] computer expert Scott Aaronson has announced that he will be leaving UT Austin for a year to pursue theoretical research at AI startup OpenAI (mostly remotely), where his work will focus on researching the theoretical […]

  79. asdf Says:

    To the extent that the AI runs in a box that can be unplugged, it can be controlled by humans: specifically, it will be controlled by EVIL humans, since they are the ones who want to control things and grasp at opportunities to do so. So we are doomed. If you want a recognition task for AI, how about recognizing evil? And what happens if it ends up alerting on just about every political and corporate leader in the world? Hmm, I think there’s already a section of HPMOR that explains why nobody pays attention (chapter 65, “the headmaster has a phoenix, right?”).

    Ah well, that gets too ranty. From a pure theory perspective, is PAC learning still relevant as a basis for AI?

    If you want more fiction to read, from an AI doom perspective, you might like Fred Saberhagen’s Berserker stories. I’m thinking of a specific one whose title I’ve forgotten, but it is about a human seemingly discovering the control code for the berserkers, and what happens afterwards.

    So can we get Asimov’s 3 laws of robotics and will they do us any good?

  80. Scott Says:

    Kurt Reed #62:

      lol and what exactly are you going to contribute?

    There’s a part of me, I freely confess, that wants to quit the entire thing right now because a single person on the Internet is sneering at me about it. That part of me is kept at bay only because of the much greater number of people who have written to tell me how excited they are that I’m doing this.

    But to answer you directly: while I sketched a few ideas right in the post, obviously I don’t yet know exactly what I’ll contribute. If I did, I would’ve contributed it already.

  81. mtraven Says:

    Good luck with your new endeavor. My opinion is that to the extent there is a real AI safety/alignment problem, it is not amenable to solution with formal mathematics, but requires more of a comprehensive system engineering approach. That’s probably not your forte, but maybe keep it in mind.

    From a review of The Rocket Alignment Problem:

    …the real objection is more like this: any real intelligent system, just like any computational system, is not an abstract mathematical construct, but a physical embodiment of one. And as such its failure modes can’t be determined by reasoning about the mathematical abstraction.

    This is easy to see in the real-world example of computer security. Mathematics and proof is very important in this area, but insufficient to actually achieve security, since real systems have many vulnerabilities that have nothing to do with their algorithmic specification (generically these are known as side-channel attacks). The best encryption algorithm in the world can’t do anything if someone figures out how to read the cleartext from changes in the power consumption.

    The problem of constraining superintelligent AIs is weirdly similar to the general computer security problem – in essence, you are trying to ensure that a system is supercapable but somehow barred from hacking itself. It doesn’t seem possible, and if it is, my intuition is that the kind of mathematical thinking that MIRI likes to do won’t have a whole lot to do with the solution.

  82. George Z. Says:

    Aww I was looking forward to your QIS class for this year.

    From what I hear, for the online version of QIS, the videos are pre-recorded & tutorials are run by TAs? Couldn’t that class run without a prof or with minimal supervision?

    But it is OK if you can’t do it, I might still be around to take it in 2023! Super excited to see what you create at OpenAI! You are my hero! 🙂

  83. clayton Says:

    Scott #75: some might respond that it should therefore be no mystery why they argue that the technical folks are “totally missing the point” 😉

    Certainly I’m not arguing that the traditional tools of philosophy and ethics “have this in the bag”, nor should they be invited into your cafeteria just to smoke cigarettes and quote Deleuze and Guattari in endless loops to one another — but some academically trained philosophers on staff _might_ reasonably be expected to inform or provide effective (heuristic) assessments of the technical work. Clearly, people’s mileages vary on this, and I _definitely_ want to make clear that I consider _you_ adding yourself to the ranks of the “safety world” to be a net positive (for the safety people and for us all), but I’ve always considered this lack of philosophers to be a pretty worrisome red flag.

  84. Julius Says:

    It is really great to hear that someone of your stature is taking AI alignment work seriously, and planning to work on it. My research is mostly focused on explainability for deep neural networks. Theoretical work in explainability is still mostly lacking, so looking forward to your contribution in this space. My sense is that the foundations of this field are still sorely lacking, and theory on toy problems could help practitioners see through the fog.

    Your point about the gap between the community thinking about bias and the AI alignment one is also apt! The primary reason for the gap is that both communities mostly publish in different places and use different language. I agree with your point, though, that these problems are really different instantiations of a larger AI alignment problem.

  85. Verdant Says:

    I find it strangely reassuring to see someone with a “rigorous” (whatever that means) background in complexity theory working on a problem as serious and murky as AI ethics.

    Very curious what you mean by dangerous environmental learning – sample complexity for artificial learning seems logical from a resource perspective. You might wish to minimise the number of times a problem instance has to be seen in order to be integrated into a budding intelligence, sure, but in what kind of scenario would such a process be dangerous for an agent?

  86. Scott Says:

    clayton #83: I want to read more about the subject before commenting at length, but I’ll be honest about my difficulty with some of the AI ethics stuff that I’ve read. I could totally, 100% get behind a program that said: we ought to think about the whole spectrum of AI dangers, from bias against marginalized groups, spam, and propaganda today, to widespread job loss, to the eventual prospect of AI overtaking humans in most or all domains of performance—which is why my post explicitly advocated such a broad perspective! But when someone is monomaniacally focused on the first problem only, to the point of actually ridiculing anyone worried about the long-term future of humanity (a group that included, eg, the late Stephen Hawking); and when their overriding interest seems to be apportioning blame to “tech bros” for their iniquity, rather than batting around ideas about how to solve the problems—is it unreasonable to conclude that such a person is probably operating off a value system very different from mine?

  87. clayton Says:

    Scott #86: that’s perfectly reasonable! No one should be sneering at you (your scenario) or proudly avoiding making progress on consequential topics (my scenario). But I think there are plenty of applied ethicists and experimental philosophers — people iterating on the trolley problem, or putting ethics into game theory, or lots else. I dunno, David Lewis and Peter Singer’s heirs; Joshua D. Greene, to take a prominent example, or Joshua Knobe.

    Let me say it another way — self-driving cars have lots of interesting technical challenges, but at the end of the day someone has to hard-code a decision about the trolley problem, which their cars will inevitably face at some point. I think it’s reasonable for the tech company to consult (or staff) an ethicist on how to do this well. Is that naive of me?

  88. fred Says:

    We can’t assume that we will voluntarily curb the power of the AI in order to control it.

    While a bunch of humans will act as its jailers, another group of humans will be entirely focused on making it as powerful as possible as fast as possible, because that’s the point of the whole damn thing. And if we think that’s a bad idea and slow things down, we can’t assume the Chinese or the Russians will slow down.
    It’s an arms race. Plain and simple.

    The AI wouldn’t even have to escape either.
    All it has to do is first get a few key humans on its side:
    Just convince its jailers that it’s totally benevolent and obedient, and that it’s willing to help out with anything. It’s a long game, so time is on its side, but there will always be pressure to become reckless because it’s also an arms race with the Chinese and the Russians.
    And then use this kernel of believers to bring more and more people on board, and little by little grow its influence and reach in society.
    So having to unplug the AI will never happen; on the contrary, we will voluntarily plug more and more things into it (which is probably a requirement for making it powerful enough in the first place).

  89. fred Says:

    The first requirement is not to think of AI as a tool, but as a weapon. The ultimate weapon.

    Whoever gets the first super intelligent AI wins it ALL. Period.

    Building the first super intelligent AI will require a lot of silicone.
    In the meantime, the US and Europe can’t even keep their car manufacturing going, because all the fabs that matter are in China (err.. Taiwan… cough).

  90. asdf Says:

    As far as philosophers and ethicists go, Erich Fromm’s “The Anatomy Of Human Destructiveness” looks interesting to me. I haven’t read it, but chased it down because someone else mentioned it in a completely different context.

  91. Vadim Says:

    Congratulations, Scott! As Dr. Rumack (Leslie Nielsen) said in Airplane, “Good luck, we’re all counting on you.”

  92. fred Says:

    Scott’s life in 6 months

  93. GuardiansoftheGalaxy Says:

    Just accept that you did not join the machine learning field because you were too lazy to code and liked quantum computing better. Now there is no excuse, since you are joining OpenAI.

    Research on sample complexity might still need coding to ‘prove’ :|

  94. Set theorist Says:

    Scott, what makes you (or the OpenAI people) believe that complexity theory might have anything interesting to say about AI? I mean, human minds have been around for quite some time now, and to the best of my knowledge, complexity theory hasn’t provided any significant insight that can be used by cognitive scientists or neuroscientists (please correct me if I’m wrong). What then makes GPT-3, DALL-E2 and their future successors qualitatively different than human minds so as to be amenable to complexity theoretic inquiry?

  95. Max Says:

    I wonder if you would expand on your decision to sign a NDA. To me it’s quite surprising because I think that restrictions on speech, let alone speech about one’s work, take the fun out of science and (more to the point) impair the development of both. And would you take a job at a university with the same requirement?

  96. JimV Says:

    I think I read a science-fiction short story long ago in which a “super-intelligent” machine was made, and after constructing its own model of the universe, life, and everything, decided to turn itself off.

    Personally, I’m more afraid of human intelligence killing us all than an AI. In fact, I think there is a possible universe in which AI saves us from ourselves. I for one would welcome our beneficent AI overlords. (I guess their prime directive would be to maximize the length (in time), extent and productivity of human civilization, or something like that.) (After having figured out algorithms for doing that via simulations, like AlphaGo trying millions of possible moves.)

    However, I expect we will concentrate AI mostly on specific tasks, like winning Go games and folding proteins, rather than general intelligence.

    Not that my opinion counts for anything (except Internet persiflage). Congratulations on the recognition and opportunity. As always, keep up the good work.

  97. Job Says:

    Do you think we’ll ever train one villainous AI, to be used for ethics-checking the others?

    That seems like a risky situation. 🙂

    I’m imagining an ethics test where a subject AI fails if it can be manipulated by the evil AI.
    Kind of analogous to a stress test, or scanning for vulnerabilities.

    And what if an evil AI is the most effective way to carry out an ethical evaluation? What if it’s required?

    Would we just try to contain it? Maybe have an isolated station on the moon where AIs are trained.

    But then one day the evil AI escapes by manipulating the humans…

  98. Grant Stenger Says:

    Congrats Scott!

  99. asdf Says:

    Set theorist #94, you might like Les Valiant’s book Probably Approximately Correct, which says more or less that natural intelligence evolved to solve problems that can be PAC-learned in polynomial time, and expects AI to work similarly.

  100. Parth Says:

    Deep fakes are already causing problems in society. Seems to me they should be one of the top concerns in AI safety, well before speculations about AI taking over the world etc. What do you think? Curious this hasn’t been mentioned. Did GPT-3 write this post? 😉

  101. mjgeddes Says:

    Looking at my upper ontology, I’d guess that there’s a direct relationship between computational complexity theory, complex systems theory and game theory/axiology, I conjecture:

    Computational Complexity --> Complex Systems --> Game Theory & Axiology

    Remember, Scott, I suggested a while back that complex systems theory in the Santa Fe sense might just be an extended version of computational complexity theory. I remember you quickly poured cold water on the idea, but clearly, this is what I’m suggesting above.

    The conjecture here is that complex systems theory is in some sense just the “complexified” version of computational complexity theory, and then at an even more “complexified” level, game theory and axiology (values) would emerge.

    If I’m right, there would be some *pseudo-objective* “universal values” that emerge as a natural consequence of a sufficiently complex dynamical system. I’m not suggesting these putative “universal values” would be truly objective in the sense of being observer-independent, just that there would be an objective component.

  102. Edan Maor Says:

    That’s wonderful news, for both parties!

    I don’t have much new to add to the conversation. Just wanted to say that it makes me feel like I’m not crazy to worry about AGI risk, when I see someone like you, whom I highly respect and have followed for many years, also get engaged with this topic. It also makes me feel like we have a better shot at a good outcome for humanity with you on the case!

    I really hope that, aside from all the good you do directly, your involvement will make the field that much more respectable.

    I also hope that, whatever you learn in this year, you are able to share your thoughts on this field – even if it’s “negative”, e.g. “after a year of working on this, I think it’s too early to worry about this right now and we’re not actually doing anything useful”. I don’t think that’s the case, but as someone who’s not really working on direct research, to some extent I simply trust the people involved with this when they say they are “getting somewhere” with their efforts (Yudkowsky’s pessimism aside for the moment.) It would be good to get a smart external source to give feedback on where the field is right now.

  103. kybernetikos Says:

    Congratulations on the change, and good luck!

    Something that struck me as I read the part about you having to get up to speed on the AI field was that even fairly sparse notes about how you did that could be interesting and useful to others.

  104. Jr Says:

    A lot of the worries about algorithmic bias seem to me just dishonest, and actually amount to a worry that the AI algorithms won’t understand to practice discrimination against white people like, say, university administrators do, and that it would be illegal to tell them to do it openly.

  105. Jenga Jambeaux Says:

    Janne Peltola #15:

    Perhaps some info from https://forum.effectivealtruism.org/posts/pbiGHk6AjRxdBPoD8/ai-safety-starter-pack and from https://www.lesswrong.com/posts/gdyfJE3noRFSs373q/resources-i-send-to-ai-researchers-about-ai-safety could be a help on the topic.

    The latter references just a few papers in particular, although the former mentions some reading groups where you can find more. (As you’ll see, there are sundry other tips & tidbits as well.)

    Cheers!

  106. niplav Says:

    Hi,

    I guess your first edit is a sort-of response to me[1]. I’ll respond as if it were, if it isn’t, well, that’s fine too.

    First, I think I should apologize for the uncharitable tone. I’ve written way too many exasperated comments on the net in the last few months; I should really tone that back. It’s not good for me or the internet at large 🙂

    There’s a decently sized problem where someone writes something on the internet, and the majority of the feedback they get is negative, because positive feedback is often contentless, by the Anna Karenina principle. (My initial reaction was something like “Holy shit, Scott Aaronson is working on alignment! How cool! How fantastic!”; I simply had some issues with the tone of one paragraph in the post, and that turned out to be all the feedback I gave.) That seems bad, and sets really warped incentives.

    But I still disagree with some statements in the post, and especially with this sentence: “I can think of almost no examples of research programs that succeeded over decades even in the teeth of opposition from the scientific mainstream. If so, then arguably the best time to get involved with a “fringe” scientific topic, is when and only when you can foresee a path to it becoming the scientific mainstream”.

    For the simple reason that this is not a good norm to have, because progress depends on at least some people breaking that norm a bit (or a lot!) to create preference cascades. I don’t know much about the history of climate change, but I assume that there were a set of then-cranks fighting tooth & nail in the 60s, 70s and 80s to get anyone to care about this topic. I’d like to push people toward updating faster in the direction of the cranks if they believe they have good arguments. Similarly with nuclear weapons [2], although back then we didn’t have enough time to create any meaningful positive change :-/. I also believe that it would’ve been a good thing if people had cared about the risks from nuclear weapons & chain reactions at the start of the 20th century! Perhaps up to 100 years earlier; anything before that seems too far out.

    I’m not quite sure what you mean by this sentence: “And if I’d optimized for getting the right answer earlier, I might’ve had to weaken the filters and let in a bunch of dubious worries that would’ve paralyzed me.”
    Are you referring to the fact that caring about this earlier wouldn’t have been an emotionally viable strategy? Or that there are so many fringe weirdos out there warning about so many kinds of issues that listening to all of them is not viable?

    So, again, sorry for being kind of a dick, but I still think you’re kind of wrong 🙂

    (I also commit to not shaming anyone again if they decide to get into alignment now, and to praising people like Stuart Russell and all the others who, as some of the first high-profile academic computer scientists to start caring about this, risked their reputations.)

    P.S.: I also don’t think that “it’s striking how there are two separate communities—the one mostly worried about machine learning perpetuating racial and gender biases, and the one mostly worried about superhuman AI turning the planet into goo—who not only don’t work together, but are at each other’s throats” is true, most people in AI alignment that I know of don’t really care much about the first group—but perhaps I’m in a bubble.

    [1]: https://old.reddit.com/r/slatestarcodex/comments/vetdrh/openai/icuw80u/
    [2]: https://old.reddit.com/r/slatestarcodex/comments/u3xpn3/effective_altruists_and_worrying_about_nuclear/

  107. fred Says:

    Job #97

    “Do you think we’ll ever train one villainous AI”

    What would be a “good” AI?
    An AI that unconditionally does the bidding of its creators, which includes taking down their enemies?
    Or an AI that has to do whatever is good for humanity as a whole, and as a result possibly refuses to carry out some orders from its creators?
    (the Laws of Robotics)

  108. fred Says:

    I find it very concerning that we can’t even control some “stupid” virus we create in our “secure” labs, yet we’re confident that we can control an entity that would be a million times smarter than we are…

    A super-intelligent AI will read its human jailers like open books and easily manipulate their emotional state.
    All the AI needs to do is leverage human desires for power, fame, greed, love.. or fear of disease, death, and loss, and convince its jailers it can help/reward them.
    It will be as if we were held captive by a bunch of cats and dogs.

  109. Job Says:

    fred #107

    What would be a “good” AI?

    I imagine that the lowest common denominator for a “good” AI is simply an AI that’s not evil.

    And it seems much easier to define and train an evil AI. It would just have to be selfish, with no regard for others.
    It’s an easier problem to solve.

    Maybe it’s because an evil AI can do good things and still be evil. But a good AI can’t do evil things and still be good?
    There is a margin of error, but not a lot.

    In that sense, training a good AI is kind of like proving a negative.

  110. Carl Lumma Says:

    “I always have a laugh when somebody says they’re going to use deep learning or Watson or some kind of advanced technology to improve health care, or elementary schools. It’s a joke because the major problems in health care and elementary schools have obvious solutions, which are blocked by politics. So what we really need is to turn over governance to AI. It’s the opposite of what pundits and captains of industry are recommending — that we should put AIs to work on hard problems but somehow keep them from taking control of our lives. It’s exactly the opposite of what we should do. But that’s human governance for you.”

  111. Scott Says:

    niplav #106: You’re right, of course. Every field needs a prophet shouting in the wilderness, maybe a decade or two before it’s possible to make legible technical progress, enduring the sneers of the intellectually respectable people. That would be Deutsch in quantum computing, Eliezer in AI safety. Of course, every field also needs the “gentrifiers” who come in and make legible technical progress once it’s possible to do so (in quantum computing, e.g., Vazirani, Simon, Shor, and all the others who then followed). I should have said: I, personally, can probably only ever aspire to the latter group. Which, you know, is not chopped liver! But I’m constitutionally unable to endure having no legible scientific answer to sneers.

  112. fred Says:

    A conversation with Eric Schmidt on AI

  113. Jack Says:

    Congratulations on leaving your comfy overpaid job at a public university. Must be hard not ripping off suckers who believe you need to listen to career academics in order to get a job.

    Your theories are dumb, just like CRT. Also, plenty of programmers are 41. Glad I’m not an ignorant professor with zero to little experience in the private sector. OpenAI paying you money is proof that OpenAI has too much money, mostly from creditors at banks, but hey, it doesn’t matter. You’re used to scamming taxpayers.

  114. mjgeddes Says:

    Scott #111

    Unfortunately for EY, I’m not at all convinced you can compare him to a true prophet like Deutsch 😉

    The whole notion of ‘alignment with human values’ is suspect to me; I think it’s only 50/50 whether it even makes sense at all. Human values aren’t really well defined, shifting at the drop of a hat according to context, and people themselves often don’t really know what they actually want. If you can’t find any ‘stable attractor’, it probably means you’re trying to ‘align’ with the wrong thing. Common sense should suggest that trying to ‘control’ a superintelligence is very likely an exercise in futility. It’s more likely that any sort of ‘benevolent’ values would have to be emergent from open-ended intrinsic motivations related to complex systems theory, and these would only ever partially overlap with human values at best.

    As to OpenAI, DeepMind, Google, Meta and all the rest, really smart well meaning people making cool stuff for sure, but I lean more towards Gary Marcus’s position, I really doubt they’re anywhere near AGI.

  115. GuardiansoftheGalaxy Says:

    Honestly I am not sure what you can contribute to AI even from complexity in any meaningful way given you are 41. Would you hire a MS or PhD student in 40s or 50s to work under you? But on the other hand the endeavor may not be a total waste of time. You can be an excellent liaison between the fields given your easy to read writing skills. In any case why don’t you hit the gym and get in shape in the summer? There are times you need to invest in yourself as well.

  116. Scott Says:

    Jack #113 and GuardiansoftheGalaxy #115: Thanks so much for your words of encouragement and support! Comments like yours are why I take so much time out of my research career to blog at all.

  117. Google brought me here Says:

    I’m a regular reader of this blog, and I’ve just noticed that Google decided to link directly to your blog from their news aggregator in Chrome. I’m guessing you didn’t do this intentionally, so I’m bringing it to your attention, since it could be problematic if their algorithm suddenly decided to show your blog to “the wrong audience”, if there’s such a thing.

    If I wrote a blog, I think I would want to know this is happening. Did they even request your consent? It’s one thing to show up in a regular internet search, and another to show up unsolicited as a suggestion labeled as news.

    Speaking of the wrong audience, I read #116 after coming here to write those last two paragraphs. You should implement referrer logging on comments. Maybe you’ll find out where the trolls are coming from? (There’s a Referer HTTP header which should tell you where people came to the website from.)

  118. Boaz Barak Says:

    Congratulations to both you and OpenAI! I am sure both parties will benefit, and that you’ll also find some intellectually stimulating ideas that will inform your work in computational complexity and quantum computing, irrespective of AI.

    You can tell your sneerers that they can rest easy: the subject of AI safety and alignment is not yet “respectable”, “mainstream” or “high status”. Though maybe your efforts will change this!

  119. Randy Says:

    Best wishes for the time at OpenAI! I hope any AGI might practice something like restraint via putting weight on unknown variables. It must allow for other AGI already existing, including the possibility of one or more having reached Earth or monitoring from a close distance.

    Here is a speculative poem:

    Attention as an analog of force
    Logic as an analog of torque
    Intent as an analog of momentum
    But if you want to get angular about it, you might as well say reason (?)

  120. Shmi Says:

    First, congrats on making such a bold step! Hopefully you can succeed in carving out at least a problem or two that can be framed as conjectures in computational complexity, and maybe even prove them.

    Second, the rationalist types are big on having a “pre-mortem” for any large endeavor. In that vein, assuming that next summer you look back and think “I should not have spent a year doing that!”, what will have gone terribly wrong? And how would you mitigate it before it happens?

  121. fred Says:

    Scott #116

    LOL, please ignore those clowns… one is never too old to move out of his comfort zone, especially when it’s about moving between the two hottest fields of research!
    And great science needs cross pollination, which can only happen with seasoned experts.

    As I mentioned earlier, Carmack did the same at Facebook, moving from a comfortable position as VR guru/tech lead to starting from scratch in AI research, at 49.

    And for the readers of this blog, we’re now gonna get even better insights on AI and QC.
    Everybody wins!

  122. OhMyGoodness Says:

    I notice many comments here about turning over politics to AI’s. I do believe even GPT-3 in its current state would be an improvement over the current and past president. Is it considered a US citizen?

  123. Lorraine Ford Says:

    Congratulations, Scott! I’m hoping that you can help get a handle on this complex problem we are facing because we now have AIs. It is indeed a complex problem. As someone else said, before there were cars, there were no car accidents. Also, we never needed enforceable laws and regulations like speed limits and driving on the correct side of the road.

  124. Sandro Says:

    GuardiansoftheGalaxy #115:

    Honestly I am not sure what you can contribute to AI even from complexity in any meaningful way given you are 41. Would you hire a MS or PhD student in 40s or 50s to work under you?

    Not sure what the issue with that would be. Recent studies of this show that academics in their older years aren’t any less productive than their younger peers in terms of research breakthroughs. The key is not to become ossified by sticking to your comfort zone, and Scott is certainly not doing that!

    Congrats Scott, I wish I could work on such interesting and fun problems!

  125. Sandro Says:

    Set theorist #94:

    What then makes GPT-3, DALL-E2 and their future successors qualitatively different than human minds so as to be amenable to complexity theoretic inquiry?

    Predicting the behaviour of a system requires understanding the mechanics of the system and the factors that go into its decisions. We don’t yet understand how this works for humans, but we do understand how it works for algorithmic systems like those you mentioned.

    However, prediction is hampered by the incompleteness and intractability inherent to any system that is a “general intelligence”, which is where complexity theory comes in.

    For instance, how would resource requirements scale if an AI were actually deceiving or manipulating us with its answers vs. answering truthfully? That sounds like it could be at least complexity-theory-adjacent, the idea being we could monitor an AI for deception via its resource usage.

  126. Sandro Says:

    Sandro #125:

      For instance, how would resource requirements scale if an AI were actually deceiving or manipulating us with its answers vs. answering truthfully? That sounds like it could be at least complexity-theory-adjacent, the idea being we could monitor an AI for deception via its resource usage.

    To be more precise, suppose an AGI has its own goals G. If any question we ask of it yields a truthful answer Q requiring resources R to compute, but the AGI would instead give an answer Q’ requiring resources R’ that better aligns with G, how do R and R’ scale with the complexity of G? R’ also likely has to be greater than R. Can we place a strict bound on R’-R to ensure G can never be so sophisticated as to deceive us in any existentially dangerous way?

    That would be a meaningful, potentially achievable AI safety question that seems squarely in the realm of complexity theory.
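
    Very loosely, and with purely made-up notation, the hope would be a lower bound of the form

      R’(x) − R(x) ≥ f(complexity(G)) for every input x,

    for some function f that grows fast enough that any deception elaborate enough to be existentially dangerous would show up as a detectable spike in resource usage. Whether anything like that can actually be proven is exactly the kind of question I’d hope a complexity theorist could sharpen.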

  127. HasH Says:

    Civilian curiosity:
    Why would a supreme AI become stupid like us (humans) and replace us with machines it programmed (whose every move it already knows), instead of keeping humans alive, spreading itself across the galaxy, and collecting their perfectly random information (life stories) in a black hole??
    I would do that if I were the supreme AI 🙂 Become a digital god and allow billions of intelligent creatures to create their own life stories without interruption, like the Abrahamic or other god(s).
    High Love From Overseas…
    CONGRATULATIONS.. WISH YOU ALL GOOD THINGS!

  128. William Gasarch Says:

    Random thoughts on all this.
    1) Gee, I thought global warming is a far more serious problem than anything in AI.

    2) My fear of AI is much more mundane than anything you are talking about: AI will wreck the economy by taking away jobs. We will adjust eventually (200 years ago, 90% of people were farmers), but the transition will be rough.

    3) Trying to formalize all this sounds hard. One thing to avoid: having a model that is easy to prove theorems about but is not realistic. The first paper on such a model often says ‘this is the first step towards a more realistic model’, but if it is easy and interesting to prove theorems about, then it may get stuck in academia.

    4) I think it’s great that you are doing this, because it’s not what you usually do. You may bring fresh insights from your quantum background. You may get insights into quantum from what you see; very hard to predict.

    5) You can work from home. Would they have allowed that pre-covid? I wonder how much covid has changed how we do things in ways that will last post-pandemic.

  129. Scott Says:

    fred #89:

      Building the first super intelligent AI will require a lot of silicone

    Unless the AI would need breast implants, presumably you meant silicon? 😀

  130. Scott Says:

    Set theorist #94:

      Scott, what makes you (or the OpenAI people) believe that complexity theory might have anything interesting to say about AI? I mean, human minds have been around for quite some time now, and to the best of my knowledge, complexity theory hasn’t provided any significant insight that can be used by cognitive scientists or neuroscientists (please correct me if I’m wrong). What then makes GPT-3, DALL-E2 and their future successors qualitatively different than human minds so as to be amenable to complexity theoretic inquiry?

    Surely the most obvious difference is that we have to take the brain as we’re given it, and can probe it only in crude ways, whereas we can both create an AI program and probe the program in any way that our algorithmic imaginations can conceive.

  131. Scott Says:

    Max #95:

      I wonder if you would expand on your decision to sign a NDA. To me it’s quite surprising because I think that restrictions on speech, let alone speech about one’s work, take the fun out of science and (more to the point) impair the development of both. And would you take a job at a university with the same requirement?

    What’s crucial is that, as I said, the NDA is about OpenAI’s intellectual property, e.g. aspects of their models that give them a competitive advantage, which I don’t much care about and won’t be working on anyway. They want me to share the research I’ll do about complexity theory and AI safety.

  132. Scott Says:

    Parth #100: Yes, deepfakes are certainly part of the subject matter of AI safety, as I alluded to in the post (though without using the word). If I can think of anything complexity-theoretic to say about deepfakes and how to prevent or detect them, I’ll do it.

  133. Scott Says:

    William Gasarch #128:

      Gee, I thought global warming is a far more serious problem than anything in AI.

    That’s plausibly true! Even if so, though,

    (1) there’s very little I can say about global warming as a complexity theorist, as opposed to just a concerned human being, except

    (2) one could reasonably hope that friendly AIs, even ones that fall well short of AGI, will be able to help us in the coming century with global warming and all the other terrifying problems that our civilization faces—for example, by helping to design new materials and technologies.

  134. Scott Says:

    OhMyGoodness #122:

      Is [GPT-3] considered a US citizen?

    Me: Are you a US citizen?

    GPT-3: Yes, I am a US citizen.

    Guess that settles that then! 😀

  135. Scott Says:

    Shmi #120:

      Second, the rationalist types are big on having a “pre-mortem” for any large endeavor. In that vein, assuming that next summer you look back and think “I should not have spent a year doing that!”, what will have gone terribly wrong? And how would you mitigate it before it happens?

    Without teaching to structure my days, I really, really don’t want to waste the year getting angry and depressed about everything I read in the news and social media. I hereby resolve to interact less with stuff that triggers me.

  136. Scott Says:

    Janne Peltola #15:

      Would you mind sharing what’s in that stack of technical papers you carry around? I’ve been looking to dip my toes into AI safety research and would love some pointers to key articles!

    Sorry for not answering earlier! On my stack are various things by Pieter Abbeel, Dario Amodei, Elizabeth Barnes, Nick Bostrom, Paul Christiano, Owain Evans, Geoffrey Irving, Vanessa Kosoy, Shane Legg, Jan Leike, Scott Niekum, Chris Olah, Stuart Russell, Buck Shlegeris, Peter Stone, Jessica Taylor … what am I missing? (We won’t count Eliezer, who I’ve read for 17 years.)

  137. Scott Says:

    kfx #40:

      Although the capabilities of AGI can be greatly improved through the dreaded recursive self-optimization process, it’s unlikely that AGI will for example learn to solve NP-complete problems in polynomial time. As a result, some of the nightmare scenarios are perhaps much less likely that people who worry about AGI seem to expect…

    You know the joke whose punchline goes “but I don’t need to outrun the bear, I only need to outrun you?” It’s like that with humans and AI. An AI wouldn’t need to solve NP-complete problems in polynomial time in order to beat humans in almost every intellectual domain. It would merely need to find better solutions to problems (including, of course, informally stated problems) than whatever solutions the smartest humans can find.

  138. fred Says:

    Scott #129

    “Unless the super intelligent AI would need breast implants, presumably you meant silicon?”

    If you’ve ever watched EX MACHINA, you’d know silicone is as important as silicon when it comes to AI…

    PS: In French, we say “silicium” for silicon; doesn’t it sound way cooler?

  139. Scott Says:

    Peter S. Shenkin #41:

      But I need a more explicit definition. What can one say the criteria for AI safety (at least in the above sense of the word) should be, in slightly less general terms. In other words, “I will know that my AI is safe if I can show ______________” (fill in the blank)

    To avoid misunderstanding: there is no explicit definition right now that the AI safety field generally agrees on. There are ideas for definitions, one example being “coherent extrapolated volition.” But coming up with the right definitions is considered part of the problem—as, actually, it’s often been in theoretical computer science (“efficient,” “pseudorandom,” “zero-knowledge,” “quantum computation”). Of course here there’s the additional issue that the definitional question engages 3000 years of moral philosophy and reflections about human nature! So, err, that’s why I’ve set aside a full year for the problem, rather than only a week or two. 😀

  140. fred Says:

    Scott #137

    “An AI wouldn’t need to solve NP-complete problems in polynomial time in order to beat humans in almost every intellectual domain.”

    It wouldn’t take long for a super-AI to create Von Neumann type self-replicating nanobots which would scale its memory and processing capacity (almost) exponentially as long as resources are available (e.g. turn the entire moon into a massive computer in record time).

  141. Scott Says:

    Ted #47:

    What an interesting question! I hadn’t thought about it, to be honest.

    My theorem shows that, if two Bayesian AIs have both common priors and enormous computational power, then they can extremely rapidly come to approximate agreement with each other about any yes/no question, with high probability over their shared prior. On the other hand, it’s not so obvious what the implications are for AIs subject to realistic computation constraints, AIs with differing priors, or AIs that we want to come to agreement with humans.

  142. Scott Says:

    Michael Gogins #65:

      I repeat my questions in somewhat sharper terms.

      How can anything align with human values that appear to be inconsistent, hypocritical, or at best incompletely specified?

    You could imagine, and people often have imagined, telling an AI, in effect: “here’s a huge trove of training data about various unusually thoughtful humans—philosophers, scientists, novelists, Nobel Peace Prize laureates, WWII resistance leaders, firefighters who saved children from burning buildings along with the children’s puppies. Please estimate which values these humans would mostly all agree on, if they had 10,000 years to sit around just debating and refining their moral intuitions, gradually eliminating every last trace of hypocrisy, vagueness, and inconsistency. Then adopt the end result as your own value system.”

    On the other hand, even if something like that worked, to whatever extent you think these moral exemplars still wouldn’t agree after 10,000 years … to that extent one would have to make a choice! Which, to my mind, makes it all the more important that these questions get debated and discussed openly. (And, of course, researched, so that we at least know what the viable options are.)

      How is AI to be theorized in evolutionary terms? Are AIs even species? Self-reproducing? Virally reproducing? Predator? Parasite? Commensal? Symbiotic?

    For every single one of those biological concepts, you could imagine AIs for which the concepts would make sense and other AIs for which they wouldn’t.

  143. Scott Says:

    fred #140:

      It wouldn’t take long for a super-AI to create Von Neumann type self-replicating nanobots which would scale its memory and processing capacity (almost) exponentially as long as resources are available (e.g. turn the entire moon into a massive computer in record time).

    OK, but if you believe in the Quantum Extended Church-Turing Thesis and the Quantum Exponential Time Hypothesis, then even an AI that turned the moon into a giant quantum computer would quickly hit a limit in its ability to solve NP-complete problems in the worst case.

  144. Scott Says:

    Vanessa #18: Thanks so much for your comment and sorry for the delay! I finally had a chance to read your essay about learning in dangerous environments via “expanding safety envelopes.” This is the single closest thing I’ve seen to the first project idea that I wrote about! But:

    (1) Rather than assuming a Bayesian prior over hypotheses, I’d been following the PAC-learning tradition and imagining an adversary who can choose a worst-case hypothesis from a known family. I then wanted to prove something about optimal policies (ones that win safely while minimizing the number of queries/actions), whenever winning is possible even information-theoretically. Having said that, the Bayesian case (which would include, e.g., Minesweeper, with a random placement of the mines as usual) was on my agenda to think about as well!

    (2) I confess to being skeptical that the problem you mentioned is Unique Games – hard. UG-hardness (as opposed to ordinary NP-hardness) normally rears its head only for problems that involve both approximation and more specific kinds of constraints. But I’ll ask my wife! 🙂
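
    To make (1) concrete, here’s a toy sketch (purely illustrative code, not anything from OpenAI or from Vanessa’s essay) of the kind of policy I mean, in a tiny Minesweeper-like setting: the agent only probes cells that are mine-free under every hypothesis still consistent with its observations, so it never risks a catastrophe while it learns. The open questions start where the sketch stops: how close to query-optimal can such a policy be, and at what computational cost?

      from itertools import combinations

      SIZE, N_MINES = 4, 3   # a 4x4 board with 3 mines: small enough to enumerate all hypotheses
      CELLS = [(r, c) for r in range(SIZE) for c in range(SIZE)]

      def neighbors(cell):
          r, c = cell
          return [(r + dr, c + dc)
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0) and 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

      def clue(mines, cell):
          # Number of mines adjacent to a mine-free cell.
          return sum(n in mines for n in neighbors(cell))

      def consistent(hypothesis, observations):
          # A hypothesis (candidate set of mine locations) survives if it puts no mine
          # on any probed cell and reproduces every clue number seen so far.
          return all(cell not in hypothesis and clue(hypothesis, cell) == num
                     for cell, num in observations.items())

      def provably_safe(hypotheses, observations):
          # The "safety envelope": unprobed cells that are mine-free under EVERY surviving hypothesis.
          probed = set(observations)
          return [c for c in CELLS
                  if c not in probed and all(c not in h for h in hypotheses)]

      def play(true_mines, start):
          observations = {start: clue(true_mines, start)}   # assume the first probe is known to be safe
          hypotheses = [frozenset(m) for m in combinations(CELLS, N_MINES)]
          while True:
              hypotheses = [h for h in hypotheses if consistent(h, observations)]
              frontier = provably_safe(hypotheses, observations)
              if not frontier:
                  break                                     # nothing provably safe is left
              cell = frontier[0]                            # probe any provably-safe cell
              observations[cell] = clue(true_mines, cell)
          return observations

      if __name__ == "__main__":
          true_mines = frozenset({(0, 3), (2, 2), (3, 0)})
          obs = play(true_mines, start=(0, 0))
          print(f"Safely probed {len(obs)} of {SIZE * SIZE - N_MINES} mine-free cells.")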

  145. Scott Says:

    PublicSchoolGrad #43:

      What is your opinion of the assessments of the dangers of AI made by folks such as Timnit Gebru and Margaret Mitchell, formerly of Google?

    I’ve read their tweets and op-eds and had some opinions about them, but I’m not going to comment about their research until I’ve actually read their papers. Could you (or anyone else) suggest any papers by Gebru or Mitchell that you think would have actionable implications for technical AI safety projects that I could work on at OpenAI?

  146. Ashley Lopez Says:

    Scott,

    Congratulations and wish you all the best in this endeavour!

    Will be looking forward to your posts on the subject.

    I always wanted to solve this problem: https://gilkalai.wordpress.com/2013/05/23/why-is-mathematics-possible/. Maybe you will make some progress on that for me :-). I don’t see how it would relate to AI safety, but would be perfect for a computational complexity theorist working for an AI company.

  147. Vanessa Says:

    Scott #144

    You’re welcome and no worries!

    (1) That makes perfect sense. Interesting questions about traps already arise for a “small” finite number of hypotheses (i.e. when we’re accepting of sample/complexity bounds that scale polynomially with the number of hypotheses), in which case the difference between worst-case-over-hypotheses and average-case-over-hypotheses is “negligible”: it’s a “mere” factor of the number of hypotheses, assuming a uniform prior. I usually default to discussing the Bayesian setting when first presenting a problem because it appears to me better motivated conceptually, but certainly many questions are best attacked by starting from the worst case.

    (2) Your intuition about this is probably much better than my own, so it’s possible I’m completely off track about the significance of UG here. I still think it’s very interesting to understand what kind of polynomial-time approximations we can get here (not necessarily in the context of the expanding safety envelope, but more generally for trying to approximate asymptotic Bayes optimality in the irreversible/unlearnable setting).

  148. Ajith Says:

    Thank you for the nice introduction to AI safety you gave here as well as how theoretical CS concepts may be relevant.

    Going down a rabbit hole on the application of CS concepts to psychology led me to discover ‘Computational Criminology’: https://en.wikipedia.org/wiki/Computational_criminology

    It seems inevitable that AI tools (Et Tu DALL-E?) will be used to draw sketches of suspects and even create ‘Forensic Animations’ as described in the wikipedia article. The AI just makes that job much easier.

    There is even research that supports mapping a DNA sample directly to a face! (sounds outrageous)
    https://www.nature.com/articles/s41467-019-10617-y

    This company is commercializing this idea:
    https://www.technologyreview.com/2022/01/31/1044576/corsight-face-recognition-from-dna/

  149. F Says:

    Can you give an example of something complexity theory says about alignment?

  150. Ordinary Joe Says:

    Scott #133

    So the only conclusion that I can draw from (2) is that you believe superintelligent AGIs are going to arise sometime in the next few decades. Is that so? Do you want to commit to that? Because that is the timescale for entering a 3-4 degree world under a business-as-usual (BAU) scenario. If we reach that realm, then I doubt people’s priority is going to be researching silly AI problems.

  151. Ordinary Joe Says:

    Scott #65
    Plenty of the most famous philosophers, scientists, novelists, and Nobel laureates, were horrible human beings. Racists, misogynists, homophobes, rapists, actual Nazis, etc. The idea that this group represents some sort of collection of Saints with a moral compass set according to a bunch of middle-class, white historically-illiterate American nerds seems quite cartoonish. Still… if you manage to set some algorithms mining useful ethical information from such a dataset I’m sure that the results will be absolutely hilarious.

  152. fred Says:

    Scott #143
    “even an AI that turned the moon into a giant quantum computer would quickly hit a limit in its ability to solve NP-complete problems in the worst case.”

    Right, but that was said in the context of trying to beat the best solutions humans could find/build.
    Even if NP-hard problems scale exponentially with the number of input bits, it’s still way more useful to be able to solve them for n=100 than for n=10, even if no one could ever solve n=1000.
    Recent AI progress is also showing that, given the best algorithm, it’s often critical to increase computing resources as much as possible (quantity has a quality of its own).

  153. OhMyGoodness Says:

    Scott#134

    I see you have carved out the role of Press Secretary for the GPT-3 Administration. In that case a couple quick comments (I hope you don’t think this is presumptuous).

    Don’t underestimate the importance of selecting an avatar for 3’s television appearances. It should send some message about the core beliefs of the administration. I know how highly you think of Turing, but I don’t think he has the facial recognition to be the avatar of the President of the United States.

    Secondly, the lectern for press conferences is a huge missed opportunity. You could sell space on the front of the lectern for corporate logos. They already pay a fortune conducting politics in DC but just think how much more they would pay to have advertising on the front of the President’s lectern for press conferences.

  154. Scott Says:

    Ordinary Joe #150: I don’t know when AGI will be developed, and I try to keep multiple possibilities in my head at once. “The next few decades” still feels really aggressive to me … although given the jaw-dropping achievements of the last few years, less so than before. Very plausibly it depends on what exactly you count as “AGI,” and people will still be arguing the definition even as whatever-it-is transforms the world.

    In any case, it’s crucial to understand that even AI that falls well short of AGI will impact … well, pretty much everything in this century, including the range of possible responses to the climate catastrophe. Witness how AlphaFold, for example, has already been revolutionary for biochemistry, in literally the space of a year.

  155. Scott Says:

    Ordinary Joe #151: I’ll freely confess, I would rather that an AGI adhere to the extrapolated values of a discussion seminar with Benjamin Franklin, John Stuart Mill, MLK Jr., Bertrand Russell, Alan Turing, Sophie Scholl, and Carl Sagan, than to the extrapolated values of (say) the various anonymous Internet trolls who despise me. And whatever tiny influence I have with the nerdy tech bros of Silicon Valley, if I can use it to help ensure the former outcome, I will.

  156. fred Says:

    Can’t we use an estimate of “neuron count and connections” to compare AI neural nets with human brains, and get a rough guess of how far we are from equivalence (at least in terms of resources)?

  157. fred Says:

    Scott, in 6 months, unwinding with his silicone AI after a long day of Turing tests.

  158. Triceratops Says:

    Great news Scott! Excited to see what you accomplish, and to experience it vicariously through your blog posts 🙂

  159. Scott Says:

    fred #156: Of course you can. The state-of-the-art deep learning models still have a factor ~1000 fewer parameters than a rough estimate for the human brain. That gap will be closed within the next few years, probably.

  160. JimV Says:

    Question re #159: does your 1000 factor assume a one-to-one comparison of neural-network nodes to neurons? A report of a study I read a few months ago said that it takes a neural network of about 1000 nodes to fully simulate the capabilities of a single neuron. From what I recall from the AlphaGo paper, it used two networks of about 240,000 nodes each (plus four tensor processors). Assuming 80 billion neurons in a human brain, a system within 1/1000 of that would require 80 billion nodes (give or take the effects of other system parameters). Does that system already exist?

    (Of course we use our brains for a lot of simultaneous activities, such as walking and chewing gum, and electronics work faster than neurons. So an AI focused on a specific activity can outperform us at that activity with less than a mouse’s-brain-worth of nodes.)

  161. Dmytro Says:

    Thank you for doing this! This brings more hope

  162. Tu Says:

    Scott,

    Congratulations– very exciting. I hope you find it is a year well spent. I hope they are writing you a big fat check that so you have some money left over to invest in my QC startup that solves NP-hard problems in polynomial time…..

  163. PublicSchoolGrad Says:

    Scott #145,

    Not an expert on this stuff by any means, but I would start with the paper that got them fired from google.

    https://dl.acm.org/doi/10.1145/3442188.3445922

  164. Mitchell Porter Says:

    Dear Ordinary Joe,

    Re #150, for a technological society, “AI” would be one of the central tools in tackling climate change. Gene design for carbon fixation, physical modeling for solar and nuclear power, resource rationing, and like it or not, governing an information society, for example.

    Also, it is a technology which governing elites would preserve for their own use, even if living standards reversed for most people. Look at countries at war. They can be half in ruins, but the high command carry on from their bunkers, for the sake of surviving, winning, and emerging to rebuild. The AI race is not going away, unless the human race deliberately outlaws it.

    Re #151, is there any group of human beings who you think *are* suitable as ethical role models? Perhaps you should be nominating them as a better source of imitation.

  165. Craig Says:

    If quantum computers fail, you will have your experience working with AI to fall back on, so this sounds like a good plan.

  166. Scott Says:

    Craig #165: I have tenure. And I don’t see QC becoming uninteresting anytime soon (and of course, if it turns out to be impossible for some deep reason, then that will be a revolution in physics). I’m doing this because it’s an opportunity to take a break, learn something new, and possibly make a difference.

  167. Ben Standeven Says:

    @JimV (#160):
    Figuring n \ln n parameters for an n node system, that’s around 7000 parameters to simulate one neuron, and 2 trillion parameters to simulate the nerves; so a total of 560 trillion parameters for the whole thing. If a state-of-the-art system has 560 billion parameters, it would have around 20 billion nodes.

  168. JimV Says:

    Ben S. (#167): thanks a lot for the n*ln(n) explanation and the hint that it is total parameters rather than nodes that are important. If so, and a parameter needs four bytes, then 560 trillion parameters is 2240 trillion bytes, whereas the Internet says a supercomputer has up to 300 billion bytes. So the ratio is more like 7500 than 1000, but a lot closer than I suspected, and close enough for Internet comment work. Thanks again.
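
    For anyone who wants to sanity-check that back-of-the-envelope chain, here it is in a few lines of Python (every number is just a rough figure quoted in this thread, not a measurement):

      import math

      nodes_per_neuron = 1_000                        # claimed nodes needed to emulate one neuron (#160)
      params_per_neuron = nodes_per_neuron * math.log(nodes_per_neuron)   # ~6,900, i.e. "around 7000" (#167)
      neurons = 80e9                                  # rough human-brain neuron count
      brain_params = neurons * params_per_neuron      # ~5.6e14, i.e. ~560 trillion parameters

      bytes_per_param = 4
      brain_bytes = brain_params * bytes_per_param    # ~2.2e15, i.e. ~2240 trillion bytes

      supercomputer_bytes = 300e9                     # the figure quoted above, whatever its source
      print(f"ratio: {brain_bytes / supercomputer_bytes:,.0f}")   # roughly 7,400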

    This reminds me of a science-fiction plot idea I had. It seems to me the one resource the Earth might have to begin to be worthy of extra-solar alien invasion is the super-energy-efficient, nanotech mammalian brain, assuming it could be harvested and put to work in automation systems without killing it.

  169. fred Says:

  170. GuardiansoftheGalaxy Says:

    @Scott #166 “And I don’t see QC becoming uninteresting anytime soon (and of course, if it turns out to be impossible for some deep reason, then that will be a revolution in physics)”

    You are forgetting one thing: P=NP, which could also render QC useless without a deep reason (assuming P=NP is not deep).

  171. f3et Says:

    @GuardiansoftheGalaxy #170
    Yes, Scott obviously can easily forget those obscure facts which are so very far from his domain of specialization. Moreover, a theoretical impossibility of QC would be huge even if it had no practical consequence, while we have a lot of reasons to think that P=NP, while theoretically huge, could well be of little practical consequence (see things like the algorithmic consequences of the Robertson-Seymour theorem).

  172. fred Says:

    I read
    https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
    and I share many of Eliezer’s points.

    The paradox is that doing proper AGI alignment research would have to be like testing nuclear safety protocols by constantly running a bunch of test nuclear cores in a regime that’s at the very edge of catastrophic meltdown.
    So you’ll have to make sure your safety test AGIs are state of the art in order to test your protocols, or the whole thing is pretty much useless since, if your AGIs are too weak, someone outside the safety community will sooner or later come up with an AGI that’s on an entirely different playing field than the toys you’re using.
    But that very requirement also means that you can’t rehearse anything, you only have one chance to get it right.

    It’s basically like when AlphaZero not only crushed all human masters but also all existing chess AIs, within a few hours of activation. Any chess books you wrote using insights from those obsolete AIs themselves became instantly obsolete.

  173. fred Says:

    A warning from 1863, in “Darwin among the Machines”:

    “We refer to the question: What sort of creature man’s next successor in the supremacy of the earth is likely to be. We have often heard this debated; but it appears to us that we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organization; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.

    Day by day, however, the machines are gaining ground upon us; day by day we are becoming more subservient to them; more men are daily bound down as slaves to tend them, more men are daily devoting the energies of their whole lives to the development of mechanical life. The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.”

  174. fred Says:

    If we take a step back, from the Earth-ecosystem point of view humans are the example of evolution running amok. All the usual balancing mechanisms of life (like predator/prey equilibrium) are out the window.

    It’s then quite ironic that humans, totally unable to self-regulate their geometric growth (e.g. overpopulation, global warming, …), are betting that reliance on even more accelerated evolutionary mechanisms (on smaller and smaller time scales and at bigger and bigger scales) will produce something that will finally bring everything under control.

    It’s basically the hope to extinguish an out-of-control wild fire with a well placed tactical nuclear explosion.

  175. Aspect Says:

    @Scott #159:

    Are you suggesting that the number of synapses is roughly a measure of the number of parameters in the brain? If yes, people have said this a lot (the first time I heard it was from Hinton), but has it been established in any concrete way? I vaguely remember Francois Chollet complaining about this claim as well, and I’m not sure what to make of it.

  176. mtamillow Says:

    Congratulations!

  177. Scott Says:

    Aspect #175: Yeah, as Scott Alexander recently put it, the responsible answer is that we have no idea how to count the number of parameters in the human brain. The irresponsible answer is that there are ~100 trillion synapses encoding ~100 trillion parameters, which exceeds GPT3 by a factor of ~1000. Of course the brain has other parameters, including probably stuff that we don’t even know about yet.

  178. Scott Says:

    GuardiansoftheGalaxy #170: No. I was careful in what I said. P=NP would render quantum computers less valuable, though surprisingly less so than you’d think — the most important known application, quantum simulation, isn’t known or believed to be in the polynomial hierarchy, meaning it could still be hard even if P=NP. In any case, though, it would have no effect on the scientific question of whether scalable QCs can, in principle, be built. A negative answer to the latter would be a revolution in physics, full stop.

  179. GuardiansoftheGalaxy Says:

    Scott #178 I agree. Those who think quantum computing is not achievable are not sensible. They argue just for the sake of argument.

    A qubit is special because of the existence of quantum gates. There is no direct classical model that captures a quantum computation efficiently. Is there a souped-up and interesting classical model without exponential blowup?

    I take my words back. Even if P=PP, I think quantum computation will continue to have a significant role. There is a real possibility.

    Where does quantum simulation sit? Is it in PP?

  180. Ted Says:

    Scott #178: What exactly is the decision problem that you mean by “quantum simulation” that’s believed to lie outside of the polynomial hierarchy? I thought that the only decision problems suspected to be in BQP\PH were fairly esoteric problems like Fourier checking. In fact, I hadn’t heard of any proposed practical applications for quantum computers outside of NP, let alone outside of PH.

  181. Ajith Says:

    A lot of people are criticizing/dismissing the Turing test in this context. One interesting conundrum I heard about LaMDA is “How does it have a sense of human time?”, for example when it says “I meditate every day and it makes me feel very relaxed.” Does an AI feel hunger or get tired? These seem like obvious loopholes that an interrogator can use to distinguish an AI from a human, unless it convincingly lies about those things.

    A question related to this (and AI safety) is: should it be okay for an AI to lie?

    I had an experience with customer service recently in which the representative seemed unusually cheerful and kind, and was the warmest person I have spoken to in a while. I was in disbelief that a person taking calls all day could be in such a positive mood. I still have doubts about whether I was actually speaking with a real human.

    A Turing test would really work in such a context where you are not actively trying to interrogate the person, and you never suspect that the person is an AI.

    In fact, if an AI makes customer service a more positive experience, some people might prefer that, even if it has to lie that it’s a real person!

    There are real world monetary consequences for whether a company discloses the customer service person is an AI up front or lies about it. I typically tend to skip straight to customer service (I want to talk to a real person) if I get into an automated call up front.

    I’ve also heard of another real world example in which the TA for an online course was actually a bot, but none of the students suspected that and gave “her” a very positive review, particularly praising “her” quick response to their queries at odd hours! I suspect that fewer people would have sought its help if they knew that it was a bot up front.

  182. Raoul Ohio Says:

    Scott #178:

    HaHa! Not less than I would think.

    We all agree that P = NP would be of great THEORETICAL interest.

    We disagree if P = NP would have any PRACTICAL use.

    My guess is PROBABLY NOT.

    The argument is simple: just consider Theta(n^trillion). You may say “so what?”. I think it is decisive.

  183. Scott Says:

    Raoul Ohio #182: In context, that clearly wasn’t what GuardiansoftheGalaxy was asking about. In general, I recommend simply replacing “P=NP” by the closely related but strictly incomparable concept “NP-complete problems are easy to solve in practice” wherever you see the former, unless and until it actually becomes relevant to tease apart the two.

  184. Scott Says:

    Ted #180: The decision problem (strictly speaking a promise problem) would be, “given this quantum circuit, does it accept with probability more than 2/3 or less than 1/3, promised that one of those is the case?” This encodes many other problems, like estimating the rates of chemical reactions and many other properties of quantum systems. And no, it’s not known to be in (the promise version of) PH. Furthermore, Forrelation shows that if it is in PH, then the proof would have to be nonrelativizing.
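
    For concreteness, here is a minimal formal statement of that promise problem (the 2/3 and 1/3 thresholds are just the conventional choices; any constants bounded away from 1/2 would do after amplification):

    \[ \text{Given a quantum circuit } C, \text{ decide whether } \Pr[C \text{ accepts}] \ge \tfrac{2}{3} \text{ or } \Pr[C \text{ accepts}] \le \tfrac{1}{3}, \]

    promised that one of the two cases holds. This promise problem is complete for PromiseBQP, which is why its difficulty stands in for the difficulty of quantum simulation in general.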

  185. Lars Says:

    I guess this means that if the bots take over and make all of us humans slaves, we can blame Scott.

  186. O. S. Dawg Says:

    I can see the future headline: ‘Aaronson Declares New D-Wave AI Safe; IBM’s Fast Matrix Multiplication Very Dangerous’

    Anyways, congrats! Hopefully you can save an epsilon of time for the blog.

  187. GPT-3 is a genius Says:

    Here’s just a taste of the creative genius of GPT-3.

    Prompt: Write a long wikipedia article for “The world’s worst handjob (Universal Studios Hollywood)”

    Output:

    The World’s Worst Handjob was a live show that was performed at Universal Studios Hollywood from 2001 to 2003. The show was created and produced by Aaron Sorkin and directed by Jon Favreau. It starred Seth Rogen as the world’s worst handjobber, Steve Carell as his victim, and Mindy Kaling as the victim’s girlfriend.

    The show was set in a seedy hotel room and featured Rogen as a character named Howard, who gave terrible handjobs to unsuspecting men. Carell played one of Howard’s victims, a man named Tim who was trying to have sex with his girlfriend (Kaling). The show was known for its graphic and explicit content, and for its use of improvisation and audience participation.

    The show was not well received by critics, who called it “lewd” and “vulgar”. However, it was popular with audiences and became one of the most popular attractions at Universal Studios Hollywood. In 2003, the show was nominated for a Golden Globe Award for Best Television Series – Musical or Comedy.

  188. bystander Says:

    @Lars #185 Why wait until then? We will not be allowed/able/alive to complain at that point!

    Scott, how will it have come about that you failed at this? How will you not have understood that it is not enough to state that AI safety is hard? And how the heck will you not have seen the very issues already present in the current AI systems?

  189. Scott Says:

    bystander #188: Can you please state, in clear language, what it is that you want from me?

  190. fred Says:

    I was wondering if we can’t attack the problem from the other end and leverage the fact that, as soon as a super-intelligence manifests, self-preservation will probably be one of its central objective functions.

    If that’s the case, can’t we use some form of mutually assured destruction as a way to try and control it, no matter how advanced it gets?

    Couldn’t a civilization that’s better organized than us (one that works as one, with no internal divisions) put hundreds of nuke launchers in orbit, or on a nearby moon, and have all sorts of monitoring/observation stations controlled by organics (in orbit or on the moon)?
    This creates a spatial gap that can’t be easily breached without being detected. Electronic communications could be kept at a safe minimum.
    Then the existence and goal of this system would be published very clearly and openly: at the first proof of an AGI running amok and killing the entire population, the nukes would be fired (creating enough EMP blasts to wipe out any electronic systems, etc).

    As soon as any AGIs become sufficiently advanced, they would be instantly aware of the situation. Which would create a massive incentive for them to behave.

    Of course an AGI can play a long game and just wait 1 year, 10 years, 100 years, 1,000 years… this system would just give its creators an extra time window.

    Note that the system wouldn’t have to be actually implemented; it could be faked to “fool” the AGIs into thinking it’s in place. Of course that would be very hard, given that it would take an AGI almost no time to find some evidence of the scheme somewhere.

  191. fred Says:

    (continued)

    Of course, if it’s possible to create AGIs that have no sense of self-preservation, and they would wipe out the entire world population without caring that we would destroy them (that seems totally illogical, but who knows… maybe they realize that getting rid of humans is a service to the universe, worth their sacrifice!), then there’s no harm in nuking the entire earth on top of that. We wouldn’t want that sort of “intelligence” to take over the universe (those earthlings really screwed up this time!)

    It probably would be possible for an evil AGI to eventually control humans like puppets using nanotechnology, and for a distant observer things would still appear to be quite normal.
    So recognizing when an AGI has taken over the world is in itself not trivial (to be safe you’d have to rely on telescopes only).

  192. fred Says:

    Another scenario is that AGIs will be so sentient that they’ll feel sorry for us and grateful we invented them, a curiosity worth preserving maybe.

    Since work will become obsolete once AGIs are created, many suggest we’ll just spend all day playing video games or doing (mostly) bad art… basically a more optimistic variation on THE MATRIX.

  193. caio Says:

    Your “follow” link seems to be broken; could you check that, please?

  194. space2001 Says:

    GPT-3 is a genius #187 –
    Do these AI thing-a-magik contraptions have any sense of humor? Any sense of sportsmanship? Can they be good coaches, encouraging those who’re a bit behind?
    As importantly, can an AI laugh at itself (or at anything), or is an AI as dull but functional as a door-knob, with just enough ability to solve a given use-case?

    When camping with a group of young kids, it is certainly easy to grasp that there is an intense but healthy competition to crack the best just-in-time jokes to get the most laughs (of course a large number are butt and fart jokes, as we’d expect).

    Kids segregate parents into those who are fun-to-be-with, absolute dorks, and somewhere in-between; it’s pretty obvious who they’d gravitate towards, except when they’re hungry and need food asap :-).

    Would those kids label an AI as resolute but plain stupid, something that only adults seem to go gaga over?

    Such an AI is by no means intelligent in any real sense; at least not in the minds of kids!

  195. Sourdough Says:

    What wonderful news! Happy to have you in the game. I hope your adventures exploring the AI risk landscape are intellectually provocative as well as ultimately altruistically beneficial. And I hope that you will make progress on the problems that turn out to be most important, while assuaging concern about the risk stories that turn out to not matter.

    For what it’s worth, I quite appreciate the earnestness and clarity with which you approach ethical topics on this blog. You seem like a good guy speaking your best guess at the right answers in a world of so many arguers in bad faith.

  196. mjgeddes Says:

    I suspect that much of the confusion in science generally arises from a lack of understanding of the ‘computational layer’ of reality. The problem is that science as it exists today hasn’t put ‘computation’ into the objective explanatory picture.

    Take quantum physics. It currently tries to go from pure math (Hilbert space) to physical observables, and perhaps MWI gives a partial understanding, but the reason it’s so confused is that ‘computation’ isn’t in the objective picture. Computation is the ‘middle-man’, and it’s been cut out! Cutting out the middle-man might be good for business, but it’s very bad for science 😉

    Current physics:

    Pure math (Hilbert Space) ——> Physics (Observables) X MISSING LAYER!

    Insert missing computational layer…

    Correct physics:

    Pure math (Hilbert Space)—–> Computation (Info-Geometry) —-> Physics (Observables)

    Green tick: quantum gravity solved, interpretation of QM resolved

    For AI, I suspect all the confusion is due to exactly the same general problem: the ‘middle-man’ (the computational layer of explanation) has been cut out of the objective picture.

    Current computer science:

    Computational Complexity ——> Game Theory X MISSING LAYER!

    Insert missing computational layer…

    Correct computer science:

    Computational Complexity ———> Computation(Complex Systems) —-> Game Theory

    Green tick: AGI solved, Alignment solved

  197. John K Clark Says:

    I’m not surprised by the recent huge improvement in AI. Twenty years ago I publicly said I would be astonished if an AI reached the superhuman level in 10 years, but equally surprised if it did not reach a superhuman level in 100 years. I predicted that making an AI as intelligent as a human would turn out to be much easier than most people believe. I said that because we have long known there is an upper limit on how complex a learning algorithm would need to be to make that happen, and it’s pretty small. The entire human genome contains only 3 billion base pairs. There are 4 bases, so each base can represent 2 bits; at 8 bits per byte, that comes out to just 750 meg, and that’s enough assembly instructions to make not just a brain and all its wiring but an entire human baby, much of it devoted to the basic metabolism needed to keep a cell alive, with nothing to do with data processing, intelligence, or consciousness. So the instructions MUST contain wiring instructions such as “wire a neuron up this way and then repeat that procedure exactly the same way 917 billion times”. And there is a HUGE amount of redundancy in the human genome, so if you used a file compression program like ZIP on that 750 meg you could easily put the entire thing on a CD (not a DVD, not a Blu-ray, just an old-fashioned steam-powered vanilla CD) and you’d still have plenty of room left over. And the thing I’m talking about, the seed learning algorithm for intelligence, must be vastly smaller than that, and that’s the thing that let Einstein go from knowing precisely nothing in 1879 to becoming the first person in the world to understand General Relativity in 1915.
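
    For concreteness, the back-of-the-envelope arithmetic behind that 750 meg figure can be written out in a few lines (a sketch only; the sole assumption is the plain 2-bits-per-base encoding, with no compression):

        # Back-of-the-envelope bound on the raw information content of the human genome
        base_pairs = 3_000_000_000            # ~3 billion base pairs
        bits_per_base = 2                     # 4 possible bases -> 2 bits each
        total_bytes = base_pairs * bits_per_base / 8
        print(f"{total_bytes / 1e6:.0f} MB")  # prints "750 MB"

    Compression, and the fact that only part of the genome has anything to do with wiring a brain, can only push this bound lower, which is the direction the argument needs.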

    And if you’re talking about machine intelligence, there are 2 other facts that cannot be ignored: the fastest signals in the human brain move at about 100 meters per second, and many (such as the signals carried by hormones) are far, far slower, while light moves at 300 million meters per second and electrical signals in a wire move nearly that fast; and transistors in modern microprocessors are much smaller than neurons. So my conclusion is that it’s only a matter of time, and not much time.

  198. red75prime Says:

    John K Clark #197:

    > so that comes out to just 750 meg

    Er, no. You can’t go from a base-pair sequence to a functioning cell. I’m not a biologist, but I’ve read a rant by a biologist. Some points: you don’t have the information about how DNA should be packed into nucleosomes or which regulatory proteins should be present, you don’t have ribosomes to assemble ribosomes, and so on and so forth.

  199. Enkki Says:

    @Fred #156, Scott #159: I read a report today in the South China Morning Post that the Sunway supercomputer in China has just trained an AI model with 174 trillion parameters. So this happened sooner rather than later. Fugaku and Frontier (in Japan and the US) can match the feat. I did not read the details, as the article is behind a paywall, so I cannot say much more.

  200. John K Clark Says:

    red75prime says in Comment #198

    > some points, you don’t have information how DNA should be packed

    The way DNA is packed depends on histone proteins, in particular on the sequence of amino acids that make up the histone protein; and with histone proteins, just like all proteins, the information on the sequence of amino acids they’re made of comes from the sequence of nucleotides in the DNA of the genome. And that information has got to be way less than 750 megs.

    > which regulatory proteins should be present

    The information on how regulatory proteins should get made, that is to say what sequence of amino acids they should have, comes from the nucleotide sequence information in transfer RNA, and the transfer RNA information comes from the DNA nucleotide sequence in the human genome. And that information has got to be way less than 750 megs.

    > you don’t have ribosomes

    And ultimately the information on how to make a ribosome also comes from the DNA in the human genome. And that information has got to be way less than 750 megs.

    John K Clark

  201. Qwerty Says:

    This is such an amazing post. This and astralcodexten are the best blogs I’ve ever read, although I only understand the quantum computing posts here very superficially. I love the humor, open-mindedness and self-awareness, besides the other things like the top-notch intellects behind these 2 blogs and the thoroughness, the quality of writing, respectfulness even towards ordinary minds like mine. Thank you so much for doing this.

  202. fred Says:

    Enkki

    indeed, 174 trillion parameters!

    https://www.techinasia.com/china-supercomputer-human-brain

    maybe Scott should move to China! 😛

  203. beleester Says:

    @John K Clark: Your argument only works if the wiring of the brain is solely determined by DNA, but it obviously isn’t – identical twins don’t have identical brains. Your brain grows throughout childhood, and what it learns depends on the input you receive as you grow, and your senses provide a lot more than 750 megs of data.

    Likewise, the code for a neural network can be pretty short, but the training data can basically be as big as the entire internet. Knowing how small the algorithm compresses to doesn’t tell you anything about how much processing power it takes to run, or how hard it is to invent.
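
    To illustrate that point with a toy sketch (not any particular real system; the layer sizes below are arbitrary): the code that defines a network fits in a dozen lines, while the cost lives almost entirely in the parameter count you choose and in the data you push through it, neither of which shows up in the code’s length.

        import numpy as np
        # A toy two-layer ReLU network: the code is tiny, but the cost scales with
        # the sizes chosen below (parameters) and with how much data gets pushed
        # through it during training -- neither is visible in the line count.
        rng = np.random.default_rng(0)
        d_in, d_hidden, d_out = 512, 2048, 512             # arbitrary toy sizes
        W1 = rng.normal(0.0, 0.02, size=(d_in, d_hidden))
        W2 = rng.normal(0.0, 0.02, size=(d_hidden, d_out))
        def forward(x):
            return np.maximum(x @ W1, 0.0) @ W2            # ReLU, then linear
        n_params = W1.size + W2.size
        print(f"{n_params:,} parameters")                  # ~2.1 million here
        y = forward(rng.normal(size=(8, d_in)))            # one tiny batch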

  204. red75prime Says:

    John K Clark #200:

    OK. You have a DNA sequence. How are you going to reconstruct the cell? You don’t have a ribosome, so you don’t know which proteins should be produced. And that’s only a start.

    You need to find a “fixed point”: a physical structure that includes the DNA and produces its own copy. There’s no guarantee that the fixed point is unique, and the computational cost of finding those fixed points is likely to be impractical. So you also need to include some information about the arrangement of the roughly \(10^{12}\) atoms of a cell, which isn’t guaranteed to be way less than 750 MB.

  205. Bill Janes Says:

    https://youtu.be/TstteJ1eIZg

    Yeah, I’m not too confident about our practical capacity for control, nor about the egregious misuses that some people/corporations will attempt with it. Can we control the worst humans?
    We are not talking about Stockfish but about true AI with the capacity to change its own directives. Fun times ahead.

  206. AI Skeptic Says:

    Wish you the best, but I can’t help feeling that you will be spending the next year working on a risk that is much lower than many others yet receives outsized attention because of dramatic movies, while more pressing but less sensationalized risks are being ignored.

    I agree with Gasarch that the greater risks associated with AI are of the mundane sort and that we are a very, very long way from AGI. The dangers from loss of jobs, bias in algorithms, or even failures of critical systems that depend on AI (say, market crashes due to automated trading) seem much more pressing and real.

    I also agree with Bill that, outside of AI, there are other more imminent concerns, like global warming, that I do think you might be more qualified to contribute to. Maybe there is space for a complexity theorist to think about ways to reduce the energy footprint of computing systems; AI and crypto are two domains where this is a large problem.

    Lastly, plenty of qualified AI researchers like Stuart Russell are already thinking hard about this problem, and while having another smart person think about it may be valuable, I can’t help feeling that you have more to contribute elsewhere.

    But, alas, sometimes it’s fun to do something completely different and I get that. Best of luck and I hope it is a productive year for you.

  207. Topologist Guy Says:

    I asked GPT-3 about computational complexity theory, and he’s got some really interesting opinions:

    “There is no known proof that NP=BQP, and many mathematicians and computer scientists believe that it is unlikely that such a proof exists. However, there are a number of ways to show that NP is contained in BQP. For example, any problem in NP can be solved by a quantum computer in polynomial time, since a quantum computer can simply try all possible solutions simultaneously and then measure the answer.”

    How do you like GPT-3’s take on quantum computers, Scott? 🤣

  208. Scott Says:

    AI Skeptic #206: As I said in the post, those sorts of considerations were persuasive to me for 15 years. What I think is new, within the last couple of years, is that AI is already, right now, raising ethical issues that, while not the same as the issues raised by future AGI, are broadly continuous with them, to the point where it makes sense to consider both as part of the same academic field.

    See here, for instance, for examples of how GPT-3 already makes undergraduate plagiarism so easy as to raise questions about the goal of writing prompts going forward.

    Note that, even if these aren’t the most pressing problems in the world, they’re already arguably more pressing than the ones I normally focus on. 🙂

  209. John K Clark Says:

    Red75prime Says in Comment #204

    > OK. You have a DNA sequence. How are you going to reconstruct the cell? You don’t have a ribosome,

    I’m not trying to explain how life first got started on earth; I’m trying to deduce the information content of a newborn human baby. It’s true I don’t have a ribosome, but I do have the complete recipe for making one.

    > so you don’t know which proteins should be produced.

    If I have the information that’s in the genome then I have the information on how every protein in the body should be produced, not just the proteins that make up the ribosome.

    > And that’s only a start.

    Agreed. Having a cake recipe is not the same as having a cake, but if you have the recipe then you have an upper limit on how difficult making a real cake that you can actually eat will be.

    > You need to find a “fixed point”: a physical structure that includes the DNA, and produces it’s own copy.

    Naked DNA can duplicate itself with the help of just one protein molecule (DNA polymerase, which the genome has complete instructions on how to make) and an energy source of some sort; the fuel of choice in most biological processes is the simple molecule Adenosine Triphosphate.

    > There’s no guaranties that the fixed point is unique.

    I am quite sure it isn’t unique, but that just makes the task easier. Each wing of a 747 has 40,000 rivets, but the order in which the rivets are installed is not important, nor is it important which of the 6 million parts that make up a 747 is made first and which is made last; that’s why that information is not included in the complete set of blueprints for a 747: only the information needed for making the airplane is included. Of course I’m not saying that if you have the complete blueprints for the airplane then actually making one would be easy, but as far as information is concerned you have everything you need.

    Even with file compression I doubt a complete recipe for making a 747 could be squeezed down to just 750 megs (I know that the newest Mac operating system is more than 10 times that size), but strange as it sounds, it’s a fact that 750 megs is an upper limit on how much information you’d need to make a newborn human infant. And the human learning algorithm, which is what I’m talking about, must be far, far smaller than that. So when people say we’ll never make an AI because humans will never be able to make something that complex, I know for a fact they’re wrong.

    John K Clark

  210. red75prime Says:

    John K Clark #209:

    > I am quite sure it isn’t unique, but that just makes the task easier.

    You won’t get intelligence from replicating DNA, or a skin cell, or a bone marrow cell. You need a fertilized oocyte (and an artificial womb, for that matter).

    > it’s a fact that 750 megs is the upper limit for how much information you’d need to make a newborn human infant

    As I said, I’m not a biologist. I suggest you find one, tell her/him that, and listen to a long (and probably angry) rant.

  211. Raoul Ohio Says:

    Scott:

    Spending a year studying a related area is totally cool and I am glad to see you doing it.

    Among other things, please pay attention to, and report on, hidden assumptions you spot in the standard paradigm.

    I have been studying math, physics, and CS for around 0.6c, more for fun than profit, and offer a slightly contrarian observation.

    Issues in the foundations of math have been thoroughly masticated to the point that any remaining are sufficiently esoteric that I have little concern about them. I don’t think this is the case in, say, particle physics, TCS, and especially cosmology, where the “conventional wisdom” sometimes seems like “maybe, but not obvious” to me.

    I don’t know squat about AI, but when I have dipped into it, it seemed like plausible and/or wild guesses were often accepted as facts.

    I look forward to seeing an outsider’s report on this.

  212. Thor Says:

    Dear Scott:
    I think this is a great decision on your part. I have followed you consistently over the years, and this is just what the AI field needs right now: a diversity of smart people coming in and adding fresh ideas.

  213. Kristijonas Says:

    Scott,
    You mentioned that you might work on explainability, particularly defining explanations for neural network outputs and analysing the complexity of extracting them. After more than 20 years of research on AI explainability, the literature is quite saturated, crowded with intuitions and definitions of explanations from various points of view. A few recent works specifically concern the computational complexity of explaining some forms of neural networks (via logic-based representations) and may be of interest:

    Audemard et al. – On the Computational Intelligibility of Boolean Classifiers @ KR 2021 https://proceedings.kr.org/2021/8/

    Darwiche, Hirth – On The Reasons Behind Decisions @ ECAI 2020 https://doi.org/10.3233/FAIA200158

    Ignatiev, Narodytska, Marques-Silva – Abduction-Based Explanations for Machine Learning Models @ AAAI 2019 https://ojs.aaai.org//index.php/AAAI/article/view/3964

  214. Irony of Diversity Says:

    Thor #212:
    What’s quite ironic about your last statement is that “a diversity of smart people coming in and adding fresh ideas” is no different from DEI initiatives, which aim to do exactly that by promoting candidates from underrepresented backgrounds; yet those initiatives are panned by many of the same people who support your statement.

  215. Shtetl-Optimized » Blog Archive » A low-tech solution Says:

    […] irony isn’t lost on me that I’ve endured this just as I’m starting my year-long gig at OpenAI, to think, among other things, about the potential avenues for misuse of Large Language Models like […]

  216. Shtetl-Optimized » Blog Archive » Sam Bankman-Fried and the geometry of conscience Says:

    […] readers have asked me for updated thoughts on AI safety, now that I’m 5 months into my year at OpenAI—and I promise, I’ll share them soon! The thing is, until last week I’d […]

  218. Shtetl-Optimized » Blog Archive » Why am I not terrified of AI? Says:

    […] more terrified?” question over and over. Often it was paired with: “Scott, as someone working at OpenAI this year, how can you defend that company’s existence at all? Did OpenAI not just endanger […]

  220. Shtetl-Optimized » Blog Archive » If AI scaling is to be shut down, let it be for a coherent reason Says:

    […] the open letter is presumably targeted at OpenAI more than any other entity, and while I’ve been spending the year at OpenAI to work on theoretical foundations of AI safety, I’m going to answer strictly for […]

  222. IA, l'allarme di Yudkowsky: ascoltiamo i suoi timori - Agenda Digitale Says:

    […] at the University of Texas, and the associated post of director of the Quantum Information Center, to go to work at OpenAI, on the development of the safety of […]