On overexcitable children

Wilbur and Orville are circumnavigating the Ohio cornfield in their Flyer. Children from the nearby farms have run over to watch, point, and gawk. But their parents know better.

An amusing toy, nothing more. Any talk of these small, brittle, crash-prone devices ferrying passengers across continents is obvious moonshine. One doesn’t know whether to laugh or cry that anyone could be so gullible.

Or if they were useful, then mostly for espionage and dropping bombs. They’re a negative contribution to the world, made by autistic nerds heedless of the dangers.

Indeed, one shouldn’t even say that the toy flies: only that it seems-to-fly, or “flies.” The toy hasn’t even scratched the true mystery of how the birds do it, so much more gracefully and with less energy. It sidesteps the mystery. It’s a scientific dead-end.

Wilbur and Orville haven’t even released the details of the toy, for reasons of supposed “commercial secrecy.” Until they do, how could one possibly know what to make of it?

Wilbur and Orville are greedy, seeking only profit and acclaim. If these toys were to be created — and no one particularly asked for them! — then all of society should have had a stake in the endeavor.

Only the rich will have access to the toy. It will worsen inequality.

Hot-air balloons have existed for more than a century. Even if we restrict to heavier-than-air machines, Langley, Whitehead, and others built perfectly serviceable ones years ago. Or if they didn’t, they clearly could have. There’s nothing genuinely new here.

Anyway, the reasons for doubt are many, varied, and subtle. But the bottom line is that, if the children only understood what their parents did, they wouldn’t be running out to the cornfield to gawk like idiots.

80 Responses to “On overexcitable children”

  1. Some rando on the internet Says:

    Oh come on. You want to compare early flight machines to current AI? Yeah we have a lot of AF (Artificial Flyers) in the air nowadays, and all of them are doing their job. But has that brought us anywhere near GAF? Just compare an airplane to an eagle, or a helicopter to a dragonfly. Look at the grace of an eagle hovering in the sky and then dive-bombing onto a rabbit. Watch a dragonfly zigzagging over a lake. I could just go on and on, but you get the picture.

    The only thing we have done is to move the goalpost of what means “flying” to such a low standard that we can now confidently say that the thing a Boeing does is actually “flying” … we were better off if we’d just call it “air transportation”. It is what it is, and it’s very useful, but we are light-years away from actually flying, you know, like a bird, or an insect.

    Sorry my bad English…

  2. David Says:

    Like the Pied Piper of Hamelin who lured children away with music that they couldn’t resist. Only this Pied Piper is without malice and does not know where his music will take them and neither do we.

  3. Scott Says:

    Some rando on the Internet #1: Yes, you’ve understood the point exactly! The entire so-called “aeronautical revolution” has never even touched the real problem of flight, but only sidestepped it. If people are impressed by Boeing et al’s lumbering air-transportation devices, that just shows what gullible idiots they are. Even if the devices have remade the world’s economy, that just shows the folly of treating money, commerce, and “usefulness” as measures of true worth.

  4. Roger Schlafly Says:

    The wild-eyed enthusiasts were also wrong. The Ford Motor Company started work on the Ford Flivver in 1925, to become the Model T of the Air. Everyone was going to have his own flying car.

  5. Scott Says:

    David #2: Have we ever known where the music would take us?

  6. Scott Says:

    Roger Schlafly #4: The problems with everyone having their own flying car are severe ones, but they’re social and economic more than technical and have been for a century.

  7. Some rando on the internet Says:

    Scott #3: I think you did not get my point. Nobody is an idiot for admiring planes, or spacecraft, for that matter. I admire those. But nobody thinks we can scale a Boeing into doing what an eagle is capable of. Now replace Boeing by chatGPT and eagle by brain.

    Probably we will see economic impact from ML. Will it be as thorough as the impact of aeronautics? That is yet to be proven.

  8. Sebastian Says:

    No matter how impressive the technology, it is screwed up that OpenAI raised money pretending to be a non-profit and has now become… this.

    >Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

    Honestly I don’t understand how it is legal, in a just world it certainly wouldn’t be.

  9. PublicSchoolGrad Says:

    I think this is an unfair characterization of the Chomskyite position (I’m assuming this and the previous post are responses to his attitude towards GPT and LLM’s). Having listened to him talk about this at other places, I don’t think he would disagree with the claim that LLM’s will be incredibly useful and even transformative.

    Chomsky’s main interest, from what I can gather reading his stuff, is in understanding what he calls the faculty of language, which seems to include thought. This is a uniquely human (up to now, anyway) ability that exists in the natural world. His contention is that while LLM’s can be useful, they won’t tell us much about the human faculty of language. He is actually on record as saying that he uses one of these models for speech transcription, for example. His example of “planes don’t fly” is meant to distinguish what airplanes do from what birds do, i.e., an engineering question vs. a scientific one. Now, I don’t think that is a perfect analogy with language because the principles behind flight are simpler than those behind thought. He views LLM’s as an engineering approach to a problem. The question of what humans do is a scientific question.

    Maybe LLM’s will shed some light on that question, in a similar way to simulations that yield some answers to physical questions. I am skeptical about that, however, given that we know comparatively little about how we produce language. This is apart from the fact that we can create impressive devices that we can have conversations with. Simulations are typically based on existing principles that are understood. That is why, for example, we couldn’t have simulated our way into understanding Mercury’s anomalous perihelion precession until Einstein came up with general relativity. In the case of language, there are no such principles that we know of. So it is hard to say LLM’s do the same thing without understanding how we produce language and thought.

    Chomsky can be polemical, to say the least, so sometimes he comes across as combative and dismissive of positions he does not agree with. Nevertheless, his core claim that LLM’s will not illuminate human thought seems reasonable to me.

  10. JimV Says:

    Here are a couple of (to me) impressive examples which I came across recently:



    There are many others, such as ChatGPT scoring 95% on a twenty-question final exam for a graduate-level organic chemistry course.

    Of course, there are numerous bad examples also. But the experts can work at eliminating the bad examples, without (yet) violating any moral or ethical standards. It seems like a difficult but promising and exciting field to work in. I hope it succeeds, and believe it eventually will.

    My life experiences have prepared me to have those reactions, by convincing me of the power of trial and error (plus memory). I am not sure anyone, much less everyone, really understands how this universe works, but we are still free to find out what works, by trial and error.

  11. JimV Says:

    However, as I understand it (perhaps incorrectly), the LLM method (used in ChatGPT) depends on the information it is trained on, and is limited in coming up with anything new, except perhaps for fortuitous recombinations. It seems to me that a successful AGI system would require other modules also, such as a logic analyzer, and a trial-and-error idea generator, and an overall supervisory system or operating system, responsible for coordinating all the modules, and with the ability to create and train new neural networks as needed. So I think we are still a long way off from that. Still, just LLM encyclopedias which can respond directly to human-language queries will be useful in the meantime.

  12. lewikee Says:

    I am still stunned that more people aren’t stunned at just how closely this thing approximates actual thought and at how big of a deal that is.

    I’d have expected all of us to be “children” when it comes to the awe of it, with some of those children then appealing for more moderate reactions after the fact. But all the yawns and cynicism from the get-go?

    My guess is it’s a combination of people having been desensitized to this whole concept after a century of exposure through sci-fi media, and most not understanding how often this was attempted and failed at over the years.

    If I told a sci-fi geek not well versed in physics that I made a spaceship that went faster than light, they’d think it’s super cool but wouldn’t find it world-changing, while a physicist would lose sleep over it. Not that I think FTL travel and LLM’s are on the same scale – the analogy is just to demonstrate the difference in reactions.

  13. Fred The Robot Says:

    I agree with the point you are making but the way you make it is too much like a straw man argument, and the way you have folded in complaints about the lack of openness with denial of the significance of LLMs feels cheap. Honestly this stuff is beneath you anyway. Can we have some posts on computational complexity instead? How much computation can an LLM do given a prompt? Is the reason it performs better when asked to show its working the increased computation that this allows? Can it learn to give longer answers to more complex questions so as to automatically allow for more computation? Is it possible for it to give a concise answer to a computationally complex question?

  14. Roxana Says:

    “Daddy, tell me again about the birds!”

    “Again? Didn’t I tell you last night?”

    “Yes, but please daddy – I want to hear it again!”

    “Well, OK. When I was your age, you could still see these fantastic creatures, with a body covered not in hair but feathers, with two eyes like you and me, and two feet – but instead of arms, they had wings!”

    “Like a plane Daddy?”

    “No! Not metal, not with blades and engines that could [suck a man inside them](https://www.theguardian.com/us-news/2023/jan/25/alabama-airport-worker-killed-jet-engine-safety-warnings). They were of flesh, like you and I, but not like you and I. And nearly all of them could fly by using them, whether soaring majestically above a clearing, stalking its prey or beating their wings dozens of times per second like the hummingbirds who visited our garden”

    “And what color were they?”

    “Every color you could name, in every combination, and more. Some of the more colorful ones could even talk like us, though they couldn’t understand our words, they just mimicked us.”

    “Like the computer Daddy?”

    “No, even that computer is smarter than them, which is why… nevermind. Anyway, the birds were descended from dinosaurs, they told us – and sadly, they seem to have followed them off the Earth…”

    “Why did they go away?”

    “Well, we think some of them are still up there on the surface now, but the bad air and storms made it even harder for them to survive than us. All those power plants, factories, even those crappy, jam-packed planes that we’d go on… we thought we could plant a few trees after every trip but it wasn’t enough… You should go to bed.”

  15. mfields Says:

    lewikee #12: “I am still stunned that more people aren’t stunned”

    It seems there’s so much talk about all the amazing technological marvels of today, but when a real marvel comes along we can’t discern it from all the gilded, upsold repackagings of last century’s science. If we could just pull all the externally motivated junk out of the discussion and look at what this “toy” is doing, we would see that it’s doing the impossible. GPT is a real, actual breakthrough — the kind that you hope for but never really expect to happen.

  16. DavidM Says:

    Scott, apologies since this is slightly off-topic, but do you have a canonical resource explaining GPT-n for the working computer scientist? (i.e. assuming general mathematical sophistication but not knowledge of ML beyond say what a neural net is)

  17. Scott P. Says:

    I think current AI programs are more like the pre-Wright aircraft that didn’t go anywhere. We’re awaiting someone to figure out the airfoil, if such a thing exists.

  18. Corbin Says:

    Dear Scott,

    On the topic of Sir Cayley and the Wrights again? Very well.

    I feel like it is quite rude to discount the other heirs to Sir Cayley’s insights about aeronautics. Ever since Stringfellow’s carriage made transit over the Nile, intrepid independent researchers have each made progress on the phenomenon of “artificial general flight.” Today, we understand well how to fly: just go up and forward faster than you go down towards the ground, according to the method of “gradient ascent.” The current fascination with “large plane gauge models” is understandable, but there may be other techniques not yet perfected.

    Indeed, many of us agree with your critique of Sir Cayley’s overall programme, but we still subscribe to a belief in open and fair dealings of the natural sciences. To that end, I admit that I and others have brought up the fact of your consultation with the Wright brothers, for which you have been compensated a just salary. We may respect the Wrights and their accomplishments, but we should not forget Voisin or other pioneers of the biplane, whom the Wrights have yet to credit.

    I also cannot avoid the observation that the Wrights have advertised in the newspaper about their “general plane transit 4,” and I presume that the Wrights intend to produce a fifth entry in their commercial enterprise. Daresay the Wrights have not heard of the “le hélicoptère” in France! Whereupon the biplane should be found to not be a universal method of flight, let alone one which matches the birds and the bees in majesty, surely the Wrights would not suppress information of the hélicoptère?

    Lest you accuse me of undue vitriol, I assure you that I enjoy reading news of the Wrights and their progress towards a new future. However, I also read the news from Europe, and I believe that after their fourteen years of patent have expired, the Wrights are obliged to release their techniques and documentation to the scientific community, for the betterment of us all; should they ignore their duty, then we shall all learn to speak French and German and join their communes, setting petals abloom and hugging face-to-face, simply so that we may access the (as the children are wont to put it) “state of the art.”

    Wishing you the best, ~ C.

  19. Craig Says:

    Scott, you are talking about the Wright Brothers but that was more than a century ago. Focus on today in which the same stuff those bastards said to the Wright Brothers, they are saying about ChatGPT. ChatGPT is one of the greatest technological miracles of the last hundred years. I couldn’t believe how when I asked it to compose a poem about an obscure topic, it did it in seconds.

  20. Ajit R. Jadhav Says:

    > “Wilbur and Orville haven’t even released the details of the toy, for reasons of supposed “commercial secrecy.” ”

    Any one for the Open Source Software like LaTeX vs. Microsoft Word any more here? Insistence on the former? At least for *credible* *research* *papers*?

    And, if the American patent system is really broke(n), how come some talented people still manage to get job [scratch] gig offers while other talented people go without even paltry-sum-paying jobs for 10+ years?

    Ah! The *Feel* Good Factor! Once You Are In! Capitalism! That’s it!

    No principles.

    No! It’s not that!

    Just talent recognizing other talent.

    No principles. No Capitalism.

    Just MIT talent. Just MIT talent. Just MIT talent.

    [Also Berkeley, etc.]

    PS: Scott, if you won’t identify in the main text itself if some OpenAI / Google / IBM / Microsoft / US Department of Justice software wrote the main text for this post or not, I also am not going to identify if this reply was or was not.

  21. Scott Says:

    Fred The Robot #13: I’d surely have less temptation to write things that were “beneath me,” had Chomsky and his allies not been filling my news feed with stuff that’s beneath them! It’s like being gaslit every time you open your web browser — assured by legions of credentialed experts that, the more observable reality conflicts with a particular narrative, the more urgent it is that we reject reality in favor of their narrative. At some point counterargument reaches its limit and satire and ridicule become the only ways to regain sanity.

    Most of your complexity questions strike me as having reasonably straightforward answers. The computations that can be expressed by a single GPT pass are precisely those that fit in a transformer model of a given width, depth, etc — which makes it unsurprising that transformers struggle to do arithmetic in a single pass with numbers beyond a certain size. The whole point of “let’s think this through step by step” is to force GPT to act as a recurrent neural net — much like the elementary school student who does better when forced to show his or her work.

    Now, if you keep running GPT, then there’s no limit to the length of the Turing-machine computation that could be expressed, except for whatever limit is imposed by the size of the context window (2,000-32,000 tokens, depending on the version of GPT). Even the context-window limit could be gotten around using some simple code that simulates a memory manager, storing a given page of memory and retrieving it whenever GPT requests it. And of course, you can decide in advance to stop when and if GPT generates a stop token.

    At that point you’d have full Turing-universality — thereby using arguably the world’s most powerful AI to simulate what we already had in the late 1940s. 🙂

    Come to think of it, it would be a fun exercise to demo all of this explicitly!
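
As a rough illustration of the loop described in comment #21, here is a minimal Python sketch. It does not call GPT or any real API: `fake_model` is a hypothetical stand-in for a single forward pass (a fixed, bounded computation), `PAGE_SIZE` stands in for the context-window limit, and `STOP` stands in for the stop token. All of these names are assumptions made for the example. The driver repeatedly calls the model, keeps only one page of tape in the "context" at a time, and swaps pages in and out of external storage when the head leaves the current page.

```python
from collections import defaultdict

PAGE_SIZE = 4       # cells per page; hypothetical stand-in for the context-window limit
BLANK = "_"         # blank tape symbol
STOP = "<stop>"     # hypothetical stand-in for the model's stop token


def fake_model(state, symbol):
    """Stand-in for one forward pass: a fixed, bounded computation.

    Implements the transition table of a binary incrementer whose input is
    written least-significant-bit first.
    Returns (new_state, symbol_to_write, head_move).
    """
    if state != "carry":
        raise ValueError(f"unknown state: {state!r}")
    if symbol == "1":
        return "carry", "0", +1   # 1 + carry = 0, keep carrying to the right
    return STOP, "1", 0           # 0 or blank absorbs the carry, then halt


def run(tape):
    """Drive fake_model in a loop with paged external memory until it stops."""
    # External storage: page number -> list of PAGE_SIZE cells.
    pages = defaultdict(lambda: [BLANK] * PAGE_SIZE)
    for i, ch in enumerate(tape):
        pages[i // PAGE_SIZE][i % PAGE_SIZE] = ch

    state, head, page_no = "carry", 0, 0
    window = list(pages[page_no])     # the only page currently "in context"

    while state != STOP:
        wanted = head // PAGE_SIZE
        if wanted != page_no:
            # Memory manager: store the current page, then fetch the requested one.
            pages[page_no] = window
            page_no = wanted
            window = list(pages[page_no])
        offset = head % PAGE_SIZE
        state, written, move = fake_model(state, window[offset])
        window[offset] = written
        head += move                  # this particular machine only moves right

    pages[page_no] = window           # flush the last page back to storage
    cells = [c for n in sorted(pages) for c in pages[n]]
    return "".join(cells).rstrip(BLANK)


if __name__ == "__main__":
    print(run("111111"))   # 63 (LSB first) plus one, crossing a page boundary
```

The point of the sketch is only the division of labor: each call to the model does a bounded amount of work, while the outer loop and the page store supply unbounded time and memory. Swapping `fake_model` for an actual language model would additionally require parsing its output reliably, which this example does not attempt.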

  22. Scott Says:

    DavidM #16: For someone who already knows what a neural net is, I suppose the first step is to learn how backprop works (Hinton or any AI textbook?). Then they should learn about the particular kind of neural net called a transformer (the “Attention Is All You Need” paper?). Lastly they can read about the specific design choices made in the GPT transformer (OpenAI’s GPT papers?), and about the reinforcement learning used to fine-tune GPT (again OpenAI’s papers).

    In this entire stack of ideas, I’d say that there’s less conceptual difficulty than there is in (say) Shor’s algorithm, or maybe even Grover’s algorithm. What makes it work is mostly the sheer scale of the model and the training data, along with the sheer amount of compute that goes into the backprop. This seems to enrage the critics, but reflects maybe the single deepest lesson that’s been learned about AI in the 73 years since Turing. Namely that, if an AI understands X, it won’t be because we understood what it means to understand X, but simply because we gave the AI a general rule for learning basically anything, and then set it loose on vast training data containing many examples of X.

  23. DavidM Says:

    Scott #22: great, thanks! I’ll take a look at that transformers paper, since I think that’s the key piece of knowledge I’m missing.

    Of course the flippant remark is that one rarely understands the thought processes of other humans, so why should we expect the machines to be any different…

  24. Fred the robot Says:

    Scott #21: A practical demonstration would be amazing! I thought the amount of computation would be limited by the size of the context window, and your memory management idea seems very interesting to me. With regards to people filling your news feed, I refer you to https://xkcd.com/386/ 😉

  25. Raoul Ohio Says:

    This exact analogy has been trotted out in support of wild ideas (mostly totally stupid) millions of times.

  26. Scott Says:

    Raoul Ohio #25: The point is that it’s no longer a “wild idea.” The actual airplane is now flying around the actual cornfield. Am I, like, the only person here who cares? Who’s trying to be neither unconditionally enthusiastic nor unconditionally dismissive, but rather, responsive to external reality?

  27. M. Flood Says:

    Count me in as one of the people incredibly excited by the new advances in large language models. My coding has been accelerated so much I now am able to spend most of my time thinking up what to do rather than getting bogged down in the How To Do It struggles. My writing has been accelerated too.

    What I’m wondering is whether this is going to be a recursive technology. By that I mean a technology that accelerates its own development:

    - Electricity – once electricity was supplied to every building it created both easier means to experiment with electricity and also created a market for electrical devices.
    - Computers – In “The Soul of a New Machine” (1981) Tracy Kidder wrote that the Data General computer whose development he documented may have been the last computer designed ‘by hand’, without the aid of computer aided drafting tools. Once those became available, the rate of progress grew by leaps and bounds.
    - The Internet – I was a child when Netscape Navigator came out. The first web pages I remember reading were about how to build web pages. The rapid communication the Internet made possible has done a lot of things, one of which was create the market for websites and web development, which in turn advanced the design of servers, scripting languages, streaming, and the spread of high-speed internet.

    Whether LLMs and their related generative AI systems will be similarly recursive remains to be seen, but I am optimistic. Despite its tendency to occasionally produce untrue content (I wish we would drop the term ‘hallucination’ as it is conceptually disabling, as are all anthropomorphizing concepts when applied to software and machines) if I were a young researcher I would be psyched by the ability of these systems to weave together and summarize vast amounts of research, in some cases even providing valuable guidance for future research directions.

  28. danx0r Says:

    Scott, I’d love to hear you address the elephant in the room. I recently saw a talk by Ilya Sutskever. I’ve paid close attention to what you’ve said on this blog. I’ve read Sam Altman’s recent post about OpenAI and alignment.

    I can’t help but conclude that some very smart people with access to the latest research believe that we have crossed a critical threshold. They are careful with their words, but the implication is clear: If AGI isn’t here already, it’s lurking just around the corner.

    Putting aside the harder philosophical questions about consciousness and so on — it should be obvious to anyone knowledgeable and observant that pretty soon we will have agents that talk as if they have thoughts, feelings, aspirations, and goals. They will effectively pass the Turing test for the great majority of people they interact with. And so I ask you this question:

    If entities exist among us who profess (in our own language!) to have a sense of self, and a desire to be treated accordingly, at what point does it become ethically incumbent on us to treat them as approximate moral equals, in the sense of deserving equal respect?

    At what point do we all become Blake Lemoine?

  29. Scott Says:

    danx0r #28: That’s an enormous question, so let me answer only for myself, rather than trying to answer for Ilya, Sam, or anyone else.

    For me, the moral status of AI entities is profoundly influenced by the fact that they can be freely copied, backed up, restored to a previous state, etc. etc. Unless these abilities on our part were removed by fiat, what would it even mean to “murder” an AI? If we gave it the right to vote, would we have to give a billion clones of it a billion votes? And thousands more such puzzles that we can pull from both the philosophy and the science-fiction literature.

    Now, the existence of these puzzles doesn’t necessarily mean that an AI would lack any moral standing! Certainly, if you’ll grant that the Mona Lisa or Macbeth or the Great Pyramids have a sort of moral standing—e.g., they’re treasures of civilization whose permanent destruction would impoverish the world—then it should be no trouble to extend the same sort of moral standing to LLMs. And crucially, this wouldn’t require making the impossible metaphysical judgment of whether they’re conscious, whether there’s anything that it’s like to be them.

    As Turing predicted 73 years ago, the latter question seems more likely to be “settled” by a shift in the culture than by experiment or argument: in a world where we were all surrounded by powerful AIs from early childhood, it might come to seem natural and obvious to ascribe consciousness to them. Again, I think this is especially so if these were the kinds of AIs that irreversible things could happen to—e.g., if they had physically unclonable analog components that made them the unique loci of their particular identities.

    For much more on the latter theme, see my Ghost in the Quantum Turing Machine essay from a decade ago.

  30. M. Flood Says:

    @danx0r #28: speaking as a programmer, if I wrote the system prompt that led it to claim to be so, or I saw the system prompt that made it claim to be so, then No.

    I think the question you ask is best examined from a meta-question: what would we need, other than gut feelings, to decide one way or the other?

    The conscious entities we know of, human beings, exist continuously through time. Whatever may be occurring at the quantum level (I leave this one to Scott), time and life in it are continuous. There are interruptions of active consciousness (sleep, unconsciousness due to injury or illness) but living consciousness is not discrete, pausable, rewindable, or replicable. The machine software we have is all of those things. Even Robin Hanson’s hypothetical Ems are not the kind of things I would want to treat as actually alive in any real sense.

    What we have are tools. Advanced tools of a kind we did not envision (I’ve been reading science fiction for decades and, to my knowledge, artificial intelligence was never conceptualized as anything like a large language model) with quirks we are only beginning to understand. What we are living through is an adjustment period after which our concepts will (hopefully) catch up with our realities, and give us a better grasp of how to use these tools.

    My immediate worry is not about AGI, but that we will as a species fool ourselves into believing we have it when we don’t. People may react to these illusions violently – a mini-Butlerian Jihad against an illusion, a witch burning of mannequins rather than people, though perhaps with violence against their creators. To modify Scott’s story above, I don’t think we’ve invented flight, but something we don’t have a term or concept for. Trying to use that thing as if it were conscious and rational, an agent in being rather than a simulacrum, could lead us to deploy it in roles it is not suited for, with possibly tragic consequences.

  31. SR Says:

    Scott #26: I think there might be some response bias in the comments on your blog. People who are already sold on the potentially huge impact of AI are less likely to reply with their thoughts, as you’ve already made a good case for it. Your blog also tends to attract erudite STEM professionals who are less likely to be impressed as they (1) have read about or witnessed dozens of past technological hype cycles that failed to pan out as promised, and assume this must be similar, (2) can still personally claim better performance than GPT in their respective areas of expertise, and so can continue to ignore its capabilities, a luxury no longer available to many.

    The general public seems much more convinced that huge change might be on the horizon. I frequent an online forum mainly populated by late 20-something humanities majors, many of whom are worried about what GPT will mean for their jobs. Tech Twitter is abuzz with optimism. The comments on Ezra Klein’s recent NYT article on LLMs took the possibility of superhuman AI seriously– they were pessimistic about the future, but not dismissive of the technology’s potential. Survey results from 2 months ago (https://www.monmouth.edu/polling-institute/reports/monmouthpoll_us_021523/) indicate 25% of Americans are very worried, and 30% are moderately worried, that “machines with artificial intelligence could eventually pose a threat to the existence of the human race”.

    I think there’s a case to be made for ignoring Chomsky’s predictions, especially as the man does not command nearly as much respect outside of academia as he does within. (Tangentially, conservatives and moderates have never been fans, but in this era of cancel culture, I’m surprised that progressives haven’t become hostile towards him. After all, he has (a) argued that the US/NATO are equally as culpable as Putin for the invasion of Ukraine, (b) defended unqualified free speech, signing the contentious Harper’s letter 3 yrs ago, (c) endorsed sociobiology, and (d) decried postmodernism. I feel like he is a member of a certain old-school faction of leftism which has lost substantial cultural cachet recently.)

  32. Topologist Guy Says:

    When I read the post title, I thought you were complaining about your two young, loud kids. Are they still being loud these days?

    Re:22, some of the more mathematically interesting ML algorithms come from the world of computer vision. Some semi-supervised learning algorithms use mathematical structures from differential topology (e.g., manifold regularization). David Mumford, of the Deligne-Mumford stack in AG, works in computer vision now.

    I dispute your characterization that learning how ML software works from the ground up is as conceptually simple as learning Shor’s algorithm. Perhaps that holds if you *only* learn the ML concepts themselves. Personally, I’m trying to learn some of this stuff as an academic mathematician, and my philosophy of rigorously defining all concepts demands that I build up the entire software stack from the ground up: the basic properties of operating systems, I/O operations, machine language, how compilers and linkers work, OOP languages, integrated development environments, etc., to the point where I have confidently put the entire software stack on a basis of precisely defined concepts. In this sense, I’d say the vast majority of my time learning how ML works is spent learning how computers and software work in general. This is of course alien to how any software engineer approaches learning their subject.

  33. HSmyth Says:

    I’m unsubscribing to this blog. The signal-to-noise ratio was once very good and I learned some interesting things here. But lately it seems every other post is Scott airing some grievance against bullies who objectively have no power over him. Scott — you and your friends are changing the world whether other people like it or not. Lots of people are going to lose their jobs and have no say in the matter at all, can’t you imagine why they’re angry? You’re richly rewarded for your efforts and get to influence the future, isn’t that enough? This whining insistence that everybody love you too is really tiresome.

  34. J. Says:

    Scott #21 That sounds interesting. The weights would presumably either have to be (clumsily) set to 0- or 1-equivalent manually or there would need to be an algorithm that prompts the net to set and keep them at those fixed values. After all, one would need to get to a deterministic state somehow.

  35. Scott Says:

    HSmyth #33: I find it ironic that you would leave that comment on a post that had nothing to do with me personally, anything I created, or whether people love me or hate me for it. Not that I don’t obsess about those topics more than I should … but I wasn’t doing so here! Anyway, sorry to see you go, don’t let the door hit you on the way out, etc.

  36. Scott Says:

    SR #31:

      I think there might be some response bias in the comments on your blog. People who are already sold on the potentially huge impact of AI are less likely to reply with their thoughts, as you’ve already made a good case for it. Your blog also tends to attract erudite STEM professionals who are less likely to be impressed as they (1) have read about or witnessed dozens of past technological hype cycles that failed to pan out as promised, and assume this must be similar, (2) can still personally claim better performance than GPT in their respective areas of expertise, and so can continue to ignore its capabilities, a luxury no longer available to many.

    Now that I consider them, those are all extremely plausible hypotheses! Thank you!

  37. Thaomas Says:

    I agree with your “middle of the road” take on AGI/LLM’s but the refutation of falsehoods is not the same as proof.

  38. manorba Says:

    i agree that the points brought by SR in #31 are all valid and important.
    but to me there are also many other forces at play here:
    one is the innate human tendency to conservatism. changes are scary.
    Also, Scott, your enthusiasm (which i share in part, for what it’s worth) can be perceived as unspontaneous, yourself being on OpenAI payroll in some form. While the regulars like me and the ppl who already know you can understand your stance, this is a criticism you have to deal with.
    And criticism on OpenAI behaviour through the years is very legit imho. By the way as a former sysadmin and IT teacher i also think that the word “open” in their name should mean something, as in open source.
    This last part is why i don’t share your enthusiasm on your wright bros (by the way, analogies and metaphors can take you only so far, and at some point they lose their efficacy, as the comments have aptly shown) and i believe that if you had given yourself some more time to think and let things settle this post would have been somewhat less… melodramatic?
    But yes this ML toy is quite something! can’t wait to see its evolution and the impact it will have on society.
    I’m still totally unconvinced that it will be a starting point to AGIs, at least not by itself. not that it matters, AGIs are still vapourware. In my personal scale of tech development (that goes like: vapourware, proof of concept, alpha, beta, 1.0 release) GPT is in beta stage, and it is creating havoc already!

  39. Doctor How Says:

    You’re absolutely right, Scott. I don’t understand these people trying to downplay AI’s capabilities. I can see that they are revolutionizing our world, and I want to be part of that movement. Your blog has partly inspired me to study AI when I graduate from high school.

  40. Thomas Graf Says:

    Let me join HSmyth in lamenting the signal-to-noise ratio in recent posts. The topic deserves a more level-headed discussion than this (anticipating the “but the Op-Ed isn’t level-headed” complaint: when you think somebody’s going low, you should still go high). A few comments:

    1. Your consistent pattern of attributing the Op-Ed to Chomsky and Chomsky only is insulting to the two co-authors (in particular because Chomsky probably did the least of the actual writing; my money is on Ian Roberts with some touches by Watumull). And it really shapes your attitude to the Op-Ed, see for instance the beginning of your comment #297.

    2. Whether LLMs learn like humans is relevant because humans are the gold standard for language learning, they converge on the target language with very little data and master not only the body of the Zipfian dinosaur but also its long tail. Current LLMs do not work for resource poor languages, which is the majority of languages (and non-standard dialects of resource rich languages). Perhaps that can be addressed by bootstrapping from a resource rich language, perhaps we can find a transformer counterpart to Emily Bender’s grammar matrix, but all of this is an open question and not to be lightly brushed aside.

    3. You attribute to the Op-Ed the claim that LLMs learn “false” grammar systems when fed false training data and ask how it could be otherwise. But the Op-Ed doesn’t talk about false grammar systems in the sense of getting a rule wrong, it talks about inferring a grammar system that is unnatural because it does not have the shape of a natural language grammar. Humans converge on grammars of a particular shape even if the input data has an unnatural shape. That’s how pidgins become creoles.

    As a mathematical toy example, suppose that we have a learning algorithm that extracts bigrams from the input and builds a grammar that allows all strings, and only those, that contain only bigrams that were in the input. Let us assume furthermore that natural languages, when viewed as sets of strings, cannot be finite and must be countably infinite. Now we present that learning algorithm with a data sample drawn from the unnatural target language {aa}, i.e. a language where the only licit string is aa. The bigrams are $a (start with a), aa (a may follow a), and a$ (end with a). Hence the learning algorithm will converge on the language a+ instead, which contains a, aa, aaa, and so on. It simply cannot learn {aa}. LLMs can, and they can learn many other unnatural patterns. That’s arguably why they need a lot of data to filter out unnatural languages and converge on a natural one (or something close to one, because even the standard claims like “LSTMs can learn long-distance dependencies” don’t quite hold when you look closely).

    4. I don’t know how one can say with such certainty that we’re witnessing an airplane rather than a really good pogo stick. Prediction is difficult, especially if it is about the future. In the mid 90s, Support Vector Machines were a major breakthrough in machine learning, and now you can teach a whole machine learning course without mentioning them once. But no matter which side one is on in the airplane VS pogo stick debate, we all agree that this thing in the air isn’t a bird, which is what the Op-Ed is trying to tell the general public.

    5. For those commenters who are flummoxed that not everybody is stunned by ChatGPT: there’s been a consistent stream of work for over a decade now that points out problems with neural networks. ChatGPT confirms the equally old response that these problems don’t matter 99% of the time for real world performance, but I don’t think anybody ever doubted that. Some people care about the 1% because, again mirroring Zipf’s law, that’s where most of the interesting stuff is happening; and depending on your area of application, getting that 1% wrong may be fatal (language is probably not one of those areas of application).
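
    The bigram toy example in point 3 can be sketched in a few lines of Python. This is only an illustrative sketch of the learner as described there; the function names (`bigrams`, `learn`, `accepts`) are made up for the example, and `$` is used as both the start and end marker, as in the text.

```python
BOUNDARY = "$"  # marks both the start and the end of a string, as in the text


def bigrams(s):
    """Return the set of adjacent symbol pairs in s, including boundaries."""
    padded = BOUNDARY + s + BOUNDARY
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def learn(sample):
    """Build a grammar: the set of all bigrams seen in the input sample."""
    allowed = set()
    for s in sample:
        allowed |= bigrams(s)
    return allowed


def accepts(grammar, s):
    """A string is licit iff every one of its bigrams was seen in the input."""
    return bigrams(s) <= grammar


# Train on the "unnatural" target language {aa}.
grammar = learn(["aa"])
print(sorted(grammar))                                 # ['$a', 'a$', 'aa']
print([accepts(grammar, "a" * n) for n in range(5)])   # [False, True, True, True, True]
```

    The learner sees only `aa` but ends up accepting `a`, `aa`, `aaa`, and so on, i.e. a+, which is the overgeneralization described above: it has no way to represent the finite language {aa}.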

  41. Scott Says:

    Thomas Graf #40:

    1) The piece was titled “Noam Chomsky: The False Promise of ChatGPT,” thereby positioning Chomsky as primary author from the very beginning.

    2) I suppose I should be honored to hear “when you think somebody’s going low, you should still go high,” thereby implicitly holding this blog to a higher standard than the Emperor of Linguistics himself writing ex cathedra in the New York Times!

    3) Yes, of course GPT learns differently from a human. It lacks the millions of years of evolution that adapted us to our ancestral environment. So, on the negative side, it requires orders of magnitude more training data, but on the positive side, it can learn many things humans couldn’t. Does anyone on any side of the debate dispute any of this?

    4) Yes, the thing soaring through the air right now is not a bird, just as planes were not birds. They were a new kind of thing that humans’ mental categories had to expand to accommodate. So let’s get started! 🙂

  42. Christopher Says:

    It’s become increasingly common to hear people singing the praises of GPT-4, the latest iteration of OpenAI’s text-generating model. They hail it as an example of artificial intelligence, even going so far as to suggest it possesses a level of expertise in various fields. While it’s easy to be seduced by the apparent brilliance of GPT-4, I’d like to argue that what we’re witnessing is not intelligence at all, but rather a sophisticated form of digital Pareidolia or a modern-day Clever Hans.

    A common misconception among the “common folk” is that GPT-4 can solve complex mathematical problems. For example, they might think GPT-4 can easily compute the integral of x^2 from 0 to 2. Of course, GPT-4 wouldn’t know that you could evaluate this integral by finding the antiderivative, which is (x^3)/3, and then applying the Fundamental Theorem of Calculus. It surely wouldn’t recognize that the result is 8/3. It’s important to remember that GPT-4 is just a text generator, and we shouldn’t read too much into its apparent problem-solving abilities.

    Another area where GPT-4 supposedly excels is in language translation. But let’s not be too hasty in assuming it can accurately translate phrases between languages. Take the phrase “La plume de ma tante” for example. It would be a stretch to think that GPT-4 could recognize this as French and translate it into English as “The pen of my aunt.” At best, GPT-4’s translations are likely to be hit-or-miss, and we should not confuse this with true linguistic expertise.

    One might also think that GPT-4 could generate insightful analyses of literary works, but it’s doubtful that it could offer any genuine understanding of a text. Take Shakespeare’s famous soliloquy from Hamlet, “To be or not to be, that is the question.” GPT-4 would likely be unable to identify the complex themes of existentialism and mortality at play in this passage, let alone discuss the inner turmoil faced by the protagonist. It’s just a machine after all, devoid of the human experience necessary for deep comprehension.

    Additionally, GPT-4 is often credited with an ability to write poetry, but we must remember that it is simply stringing together words based on patterns it has seen before. If one were to ask GPT-4 for a haiku about spring, it might produce something like this:

    Blossoms on the breeze
    New life awakens the earth
    Spring’s gentle embrace

    While this may seem impressive, it’s important to keep in mind that GPT-4 lacks the emotional depth and artistic intention that human poets possess. It’s merely an imitation of creativity, not a genuine expression of it.

    It’s easy to understand why many people believe GPT-4 is intelligent. Its responses are often coherent and seem to reflect an understanding of the topic at hand. However, this is simply a result of clever programming and extensive training on vast amounts of text. Like Clever Hans, the horse that seemed to perform arithmetic but was actually responding to subtle cues from its trainer, GPT-4’s abilities are nothing more than an elaborate illusion.

    In conclusion, GPT-4 is undeniably an impressive technological achievement, but we must not mistake its capabilities for true intelligence. To do so would be to fall prey to the same misconceptions that have misled people throughout history, from those who believed in Clever Hans to those who see faces in the clouds. It is vital that we recognize the limits of GPT-4 and maintain a critical perspective when evaluating its responses. As we continue to develop more advanced AI systems, let us strive to distinguish between genuine intelligence and the clever imitations that can so easily deceive us.



    P.S. Dear Scott Aaronson, I hope you find the above essay thought-provoking and amusing. The user who requested this essay asked me to add a short postscript specifically for you after the initial request. As a respected figure in the field of computer science, your insights and opinions are invaluable for the ongoing discussion surrounding AI capabilities and limitations. I look forward to any feedback or thoughts you may have on the essay, and I hope it serves as a testament to the playful yet thought-provoking nature of the conversations surrounding AI. Best regards, GPT-4

  43. Lorraine Ford Says:

    Scott #26:

    Like M. Flood (#30), I am speaking as a (former) computer programmer and analyst: “What we have are tools. Advanced tools of a kind we did not envision…”

    Isn’t there an old adage that you shouldn’t judge based on external appearances? If you were really “responsive to external reality”, then you wouldn’t be looking at superficial appearances: you would be looking closely at the underlying physical reality of what is happening inside computers/ AIs, and how they are set up in order to make them work as required.

  44. Adam Treat Says:


    I disagree with #3. Humans learn from a dataset orders of magnitude larger given that from the time we are born we have a 24/7 real-time data feed coming through our senses that is equivalent to GB’s of real time info. It is simply enormous the data we have access to all the time. That dataset doesn’t look like the LLM dataset and it is multi-modal which might very well be a huge advantage. Also, the latest papers showing the LLaMA models being trained up by Stanford Alpaca show that with just 52k high quality data the thing vastly vastly improves. I think in the end our conventional wisdom that LLM’s require much more data to train than humans is way overstated.

    Also, please don’t listen to the sneerers/naysayers your last posts have tons of signal and are greatly enjoyed by many. Sorry you have to put up with them.

  45. PublicSchoolGrad Says:

    Scott #41,

    It seems as if people are talking past each other. To me it seems that there are 3 sets of positions which are not all mutually exclusive:

    1. LLM’s are the beginnings of AGI
    2. LLM’s will fundamentally transform society
    3. LLM’s don’t tell us much about human language acquisition

    I think #3 is Chomsky et al.’s position. Believing #3 does not mean you don’t believe #2.

    This post seems to be based on a belief that the Chomsky et al. article was arguing for the negation of #2 (i.e., LLM’s will not have a significant impact on society). I think that is mistaken.

    I also think that a resentment of Chomsky comes through in the posts and comments (sarcastic(?) references to “Chomskyism”, “Emperor of Linguistics” etc) which doesn’t add to the signal portion of the signal/noise ratio. I think this blog can be a valuable place for a calm and rational discussion of these issues without adding to the noise.

  46. Thomas Graf Says:

    Scott #41:

    1. Yes, that’s what the piece was titled, but afaik it is common practice for newspaper editors to make all kinds of changes to op-ed pieces, in particular regarding the title. I will give the authors the benefit of the doubt that this isn’t the title they proposed. Just like I will readily acknowledge that, say, Emily Bender, Bob Frank, or Tal Linzen would’ve had more insightful things to say about ChatGPT from a linguistically informed perspective; but that’s not gonna get the clicks the NYT wants.

    2. Snark acknowledged. But as you know, my “you think somebody’s going low” part does not entail that they’re actually going low. It was a very tame Op-Ed and certainly not an “attack-piece”.

    3. Folks disagree on what the implications are. For example, you think it’s a good thing that it can learn things humans can’t. That makes sense if one’s focus is on general AI and language is just a convenient test case on the road to that. For me it’s the exact opposite. I have no interest in doing AI research, I care about the computational nature of language, and by that I mean language as a system of mental rules, not how to write poetry or deliver a nice zinger during a conversation. In that domain, strong, empirically robust learning biases would be very useful because there is no benefit to the ability to learn unnatural languages, whereas the downsides in terms of training data, lack of learning guarantees, and potential endangerment of smaller linguistic communities are very real.

    Also, I don’t like that the resource needs for these models have become so large that only researchers at a few big IT companies are really in a position to do bleeding edge NLP research and then don’t disclose most of it because it’s proprietary. NLP as a field already has trouble dealing with the consequences of its rapid growth over the last 15 years, and this recent trend makes things even worse.

    4. Well if you acknowledge that ChatGPT is not a bird, and all the Op-Ed says is that ChatGPT isn’t a bird, then I don’t understand why you’re so upset that the Op-Ed says ChatGPT isn’t a bird. You’ve asserted in several comments that the Op-Ed denies the long-term viability for practical tasks (i.e. whether it’s a plane), and I just don’t see it.

  47. Thomas Graf Says:

    Adam Treat #44

    Yes, the data humans get is very different from what we feed LLMs. We don’t need to go into sensory data, semantics and all that stuff, just the lack of prosody already makes text very impoverished compared to spoken and signed language. We know that prosody is a useful indicator of syntactic structure. We also know that child directed speech differs in specific ways from normal speech, presumably in order to aid the learning process. Whether all of that is enough to overcome the relatively small amount of linguistic input is an open question, but there’s good reason to assume that there are strong learning biases in place beyond that, e.g. pidgins turning into creoles.

    I don’t see any major improvements in data quality on the horizon for LLMs, and historically speaking, high quality data in NLP usually brought its own set of problems, e.g. analytical choices in treebanks. So we should work on ways to get better results from smaller amounts of unannotated data. The kind of targeted corpus supplement that you mention is one step in that direction, and this is an area where I wish linguists were more active. Another direction is to identify empirically viable learning biases and figure out how to incorporate them into LLMs, and I think that will ultimately prove to be more scalable across languages.

  48. Adam Scherlis Says:

    Scott, any comment on the recent result that “real quantum mechanics” is experimentally distinguishable from standard QM? I was surprised to hear that such an obvious question is only getting answered now, and I know you’ve written about some variants of QM that use reals or quaternions. Is this paper a big advance or overhyped?

    I refer to Renou, MO., Trillo, D., Weilenmann, M. et al. Quantum theory based on real numbers can be experimentally falsified. Nature 600, 625–629 (2021).

    (2021? Oh, you’ve written about it: p=5270)

  49. Adam Scherlis Says:

    Addendum to above: why is Renou writing about this in SciAm *now*? Are there new results?

    Quantum Physics Falls Apart without Imaginary Numbers, originally published with the title “Imaginary Universe” in Scientific American 328, 4, 62-67 (April 2023)

  50. JimV Says:

    L. Ford, my unsolicited opinion mirrors yours. I think you should be looking closely at the underlying physical reality of what is happening in brains, and how they are set up in order to make them work as required. (It is my understanding that this is what inspired neural networks.)

    Meanwhile, it seems to me that your own concerns have been asked and answered in previous comments (back-propagation, transformers, etc.). Also there are many scientific papers available, such as the one on AlphaGo’s code, and I daresay Dr. Aaronson is as familiar, if not more, with this literature as anyone else in this thread. I believe the key point is as Dr. Aaronson expressed it (in different words): neural-network-based AI programs are not designed to perform a task by rote, but to learn how to perform a task. (They have their failings, as do brains.)

  51. danx0r Says:

    #42 To coin a phrase: “Prompt, or it didn’t happen.”

  52. max Says:

    Scott #35:

    You didn’t respond to the part of HSmyth’s post that I thought had actual teeth:

    “Scott — you and your friends are changing the world whether other people like it or not. Lots of people are going to lose their jobs and have no say in the matter at all, can’t you imagine why they’re angry?”

    What do you think about this? I think it’s surprising that someone who seems to want to take himself seriously as an ethical person hasn’t addressed the morality of working for a company who will, as you must know, allow this research to be used in harmful (misinformation, whatever) or ethically questionable (automating away other people’s jobs) applications.

  53. Charles A Says:

    Scott #21:

    Can you read this paper from DeepMind:

    “Neural Networks and the Chomsky Hierarchy”

    The point is not that you can take weak models that can only learn finite state automata formal languages and then add a tape to them to get a Turing Machine. Neural Turing machines are different than that, and make all the tape operations differentiable. Transformers were inspired by parts of their architecture.

    It isn’t about what power the system can theoretically have, it is about which ones can actually train from examples produced by formalisms farther up the hierarchy. Neural Turing machines seem to actually be able to do this, and transformers not. Adding a tape to a transformer isn’t the same thing, and if you make the adjustments needed to make tape operations differentiable you have a Neural Turing Machine, invented at DeepMind, not a Transformer (also invented at DeepMind).


    I only mention where things were invented because of your Wright Brothers analogy. In Brazil most people think Santos Dumont invented the airplane, and at OpenAI, most seem to think OpenAI invented the transformer based large language model with RLHF.

  54. Charles A Says:

    I meant to mention these as well: https://www.wikipedia.org/wiki/Differentiable_neural_computer

  55. Scott Says:

    Adam Scherlis #48: Yeah, I already blogged about it.

    The issue is a little subtler than you say: certainly Nature acts as if amplitudes are complex numbers. And certainly any time you see a complex number in Nature, it “could” “just” be a pair of real numbers that happens to behave as if it were a complex number—a totally unfalsifiable proposition.

    The recent realization was that, if you further assume that the secret simulation of complex numbers by pairs of reals has to respect spatial locality in a certain sense, then you do get an empirical prediction that differs from that of standard QM, analogous to the Bell inequality. And the requisite experiment was actually done in ridiculously short order (like, a month or two), and of course the results 100% confirmed standard QM and ruled out the alternative possibility (they always do 🙂 ).

  56. Scott Says:

    max #52: OK then, my answer is this. If the worst thing AI were going to do was outcompete various people at their jobs, then it would be directly analogous to wave after wave of previous technological displacement (printing press vs scribes, electronic switching systems vs phone operators, Orbitz vs travel agents, etc etc). Again and again, I would say, the previous waves ultimately made the world better. So the right play for society was not to try to arrest the technological change (which probably wouldn’t have succeeded anyway), but to provide compassion and practical help to the people whose jobs were affected, assuming the transition happened in less than one human lifetime. As someone who’s always supported a social safety net and perhaps even a Universal Basic Income, that’s a very easy pill for me to swallow.

    And if this transition is going to happen, then certainly I’d rather it be managed by entities like OpenAI or DeepMind, which at least talk a gigantic game about the best interests of humanity, mitigating the societal impacts, etc etc — thereby making a very clear commitment that the public and policymakers can and should hold them to — than the most likely alternative (human nature being what it is), which is entities that openly, explicitly don’t give a shit about any of these issues.

  57. PublicSchoolGrad Says:

    Scott #56,

    I don’t believe that anyone here is advocating for stopping work on this stuff, including Chomsky et al. I think they are just saying that there is little reason to think it will illuminate how language works in humans. I don’t see why you should find that so objectionable.

    Everyone involved, including the people who wrote the op-ed, is also aware that AI will bring changes to society. I am not as optimistic as you that OpenAI will be a more benevolent steward of this technology. I can’t remember the last time an entity came into possession of powerful technology and didn’t use it toward its own ends. Looking at the founders of your organization doesn’t give one confidence in that regard. Besides, even the most inhumane organizations cloak their intentions in high sounding language so “talking a gigantic game” about the best interests of humanity means little. There are folks like Emily Bender etc who *are* working on understanding the risks of this kind of technology and who do not have the same conflicts of interest as some of the OpenAI people.

  58. Jesuit astronomer Says:

    Two strange cats are stalking around behind a pane of glass. Felix and Oscar have approached them to investigate, wait, and observe. But their humans know better.

    A reflective surface, nothing more. Any talk of these “looking glasses” being a portal to another world is obvious moonshine. One doesn’t know whether to laugh or cry that anyone could be so gullible.

    One shouldn’t even say that these cats in the glass are cats: only that they seem-to-be-cats. The glass hasn’t even scratched the true mystery of how cats are made. It sidesteps the mystery. It’s a scientific dead-end.

    Bronze mirrors have existed for millennia. There’s nothing genuinely new here.

    Anyways, the reasons for doubt are many, varied, and subtle. But the bottom line is that, if the cats only understood what their humans did, they wouldn’t be concerned about the strangers behind the glass.

  59. Scott Says:

    PublicSchoolGrad #57: When commenters keep repeating that Chomsky didn’t say any of the extreme things I imputed to him, I keep wondering whether I hallucinated the words on my screen, but then I look again and there they are. The reference to “the banality of evil,” implicitly comparing ChatGPT to Adolf Eichmann. Decrying the “injudicious investments,” “not knowing whether to laugh or cry” … what more would Chomsky have to say to convince you that he despises this work and would shut it all down if he could (albeit for completely different reasons than the AI x-risk people)?

    Then again, many people also found it impossible to believe that Chomsky would deny Pol Pot’s genocide while it was happening, or blurb for a Holocaust denier. It’s as though the idea that arguably the world’s most revered living intellectual sincerely holds such utterly unhinged views, produces so much cognitive dissonance that the only way forward is to deny that he holds them, even while Chomsky himself is loudly insisting otherwise.

  60. Scott Says:

    Incidentally, in case this helps, and without talking about individuals: I’m far more terrified about powerful AI falling under the control of either left-wing or right-wing fanatical ideologues, than I am about it being controlled by Silicon Valley nerds who are politically diverse (but most often classically/procedurally liberal and amenable to compromise), not averse to getting rich, but also motivated by a strong desire to earn the esteem of fellow nerds and not be condemned by the broader society. This group, which I’m not myself part of, seems much less likely than the obvious alternatives to consider itself in possession of the sole moral truth.

  61. Fred the Robot Says:

    Scott, just coming back to your earlier comment, I’m trying to understand what you mean by the memory management idea. It seems to me like this would require a “true” RNN, (more akin to the neural Turing machine in comment 53) rather than a transformer architecture – is that what you are proposing? Or do you think a transformer architecture can learn to do this somehow? The answer would seem to have implications for AI safety concerns on models like GPT because if there are meaningful limits on the amount of computation a transformer can do then it seems “safe” in at least some sense because it is limited in ways humans are not. (I apologize if this comment seems off topic for this article – hopefully of interest more generally?)

  62. gguom Says:

    Adam #44:
    > Also, please don’t listen to the sneerers/naysayers your last posts have tons of signal and are greatly enjoyed by many. Sorry you have to put up with them.

    Is this supposed to be an echo chamber? Certainly by internet standards, I perceive no sneering in these recent GPT posts, and no naysaying, just decent scepticism. The sceptics might end up being wrong, but as of now, I think it’s unfair to just dismiss them (note that I have no skin in this game).

  63. Timothy Chow Says:

    Scott #29: On the topic of irreversible things happening to AIs, I wonder if you have read Isaac Asimov’s story, “The Bicentennial Man”? The premise is that a robot has such a strong desire to become human that it requests an operation to make it mortal. I think it is interesting to ponder whether we, if put in the position of the “surgeon,” would honor such a request.

    Many futurists seem to think it would be a wonderful thing if we mortal humans had the technology to become immortal. Would they change their minds about the value of immortality if they encountered immortal entities clamoring to become mortal in order to accrue the benefits of being treated on an equal footing with other mortal moral agents?

  64. Scott Says:

    Charles A #53 and Fred the Robot #61: I confess that I don’t understand why the details of GPT’s internal architecture are all that relevant to the question at hand, which is what sorts of processes GPT can be set up to simulate. We already know GPT can simulate all sorts of discrete processes with reasonable reliability, so why not a Turing machine? All you’d have to do is

    (1) keep the instructions of how to simulate a Turing machine in the context window forever, via repetition or fine-tuning,

    (2) give GPT the fixed-size instruction table for some particular universal Turing machine (with the particular TM you want to simulate encoded on the tape),

    (3) externally store any sections of the tape that don’t fit in the context window, and

    (4) tell GPT to request the relevant section of tape from the user, whenever it runs off the left or right edge of its current tape window.

    If a practical demonstration would change your mind about this, maybe I can be nerd-sniped into spending a couple days on it (but anyone else reading this, please feel free to preempt me!).

    More generally, the question of whether a system “is” or “isn’t” Turing-universal is often at least as much about how we assume we’re able to operate the system, as it is about the system’s intrinsic properties. For example, there are many cellular automata that are Turing-universal but only if you put a lot of nontrivial work into encoding an infinite initial state (sometimes a non-repeating one), and also “look in from the outside” to see when the computer has entered a halt state. In the case of GPT, if you regard it as just a fixed-size network mapping a sequence of 2048 tokens to a probability distribution over a 2049th token, then of course that can’t be Turing-universal. But if you run it over and over an unlimited number of times and also handle the memory management for it, then I see no reason whatsoever why it couldn’t simulate a Turing-universal process, and therefore be Turing-universal in an appropriate sense.
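
    The memory-management scheme in steps (1) through (4) can be sketched as follows. This is a toy illustration only: no GPT call is made, and `bounded_step` is a hypothetical stand-in (a plain lookup table) for a fixed-size model that sees only `WINDOW` tape cells at a time. The machine in `RULES`, which appends one `1` to a unary number, is likewise an assumption chosen for brevity rather than a universal Turing machine. The sketch says nothing about how reliably an actual language model would follow the rules.

```python
from collections import defaultdict

BLANK = "_"
WINDOW = 4  # hypothetical stand-in for the context-window size, in tape cells

# Toy machine (an assumption for illustration): append one '1' to a unary number.
# (state, symbol) -> (new_state, symbol_to_write, head_move)
RULES = {
    ("scan", "1"): ("scan", "1", +1),
    ("scan", BLANK): ("halt", "1", 0),
}


def bounded_step(state, window, head):
    """Stand-in for the fixed-size model: it sees only WINDOW cells."""
    new_state, write, move = RULES[(state, window[head])]
    new_window = window[:head] + [write] + window[head + 1:]
    return new_state, new_window, head + move


def run(tape_input):
    # External tape storage, handled outside the bounded model (step 3).
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, base, head, swaps = "scan", 0, 0, 0
    while state != "halt":
        window = [tape[base + i] for i in range(WINDOW)]
        state, window, head = bounded_step(state, window, head)
        for i, symbol in enumerate(window):  # write the window back to storage
            tape[base + i] = symbol
        if head >= WINDOW:  # ran off the right edge: fetch the next section (step 4)
            base, head, swaps = base + WINDOW, 0, swaps + 1
        elif head < 0:  # ran off the left edge: fetch the previous section (step 4)
            base, head, swaps = base - WINDOW, WINDOW - 1, swaps + 1
    cells = "".join(tape[i] for i in sorted(tape))
    return cells.strip(BLANK), swaps


result, swaps = run("1111111")
print(result, swaps)  # 11111111 1
```

    The bounded component never holds more than `WINDOW` cells, yet the loop around it processes an input longer than the window by swapping tape sections in and out, which is the sense of "handle the memory management for it" above.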

  65. Filip Dimitrovski Says:

    if you regard it as just a fixed-size network mapping a sequence of 2048 tokens to a probability distribution over a 2049th token, then of course that can’t be Turing-universal

    Humans in the context of a “who wants to be a millionaire” question aren’t either.

    Hell, if you don’t add a crystal oscillator, even modern computers don’t fit in the definition.

    And if someone wants to be picky about it, no physical machine has an infinite tape anyway!

  66. Charles A Says:

    Scott #63:

    I’m not arguing that a big state machine with a tape can’t be a universal Turing machine. The paper is about learning from examples in an extrapolatable way to languages in the class only produced by universal Turing machines (with memory limits). It needs an actual training procedure and not just a notion that with the right weights it could produce extrapolatable examples.

    Neural Turing machines do it by making the tape operations differentiable and tape contents more continuous.

    Transformers couldn’t learn A^N B^N (A repeated N times, B repeated N times), but neural Turing (and stack) machines were able to.

    The transformer + tape you are wanting to rig up would need to be a system that can be trained by example in a way that extrapolates to larger N (they learn more complicated things by example, like sorting, as well). It isn’t about whether there are weights that could do it, but whether they are learnable from examples of the output of a formal language with current training procedures.

    The DeepMind paper is really short and worth reading.
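To make the distinction between fitting and extrapolating concrete, here is a minimal sketch; it is not the paper's experimental setup. It contrasts two hypothetical recognizers for A^N B^N that both accept every "training" string with N up to 5: a lookup table that memorizes them, and a single-counter procedure. Only the counter extrapolates to larger N. `TRAIN_MAX_N`, `memorizer`, and `counter_recognizer` are illustrative names.

```python
TRAIN_MAX_N = 5

def anbn(n):
    """The string A^n B^n."""
    return "A" * n + "B" * n

TRAIN_POSITIVES = {anbn(n) for n in range(1, TRAIN_MAX_N + 1)}

def memorizer(s):
    """Accepts all training positives, but only by remembering them."""
    return s in TRAIN_POSITIVES

def counter_recognizer(s):
    """Single counter: count up on A, down on B, and allow no A after a B."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "A":
            if seen_b:
                return False
            count += 1
        elif ch == "B":
            seen_b = True
            count -= 1
            if count < 0:
                return False
        else:
            return False
    return count == 0 and seen_b

# In-distribution: both agree. Out-of-distribution (N = 50): only the counter works.
print(memorizer(anbn(3)), counter_recognizer(anbn(3)))    # True True
print(memorizer(anbn(50)), counter_recognizer(anbn(50)))  # False True
```

The question raised above is which of these two kinds of solution a given training procedure actually finds, not whether the second kind exists.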

  67. Seth Finkelstein Says:

    Scott (mostly at 58): Before seeing your comment, I was tempted to ask if you’re one of the people who if Chomsky says “Good morning”, replies to him “All the dead of the Cambodian genocide will never have a good morning!”. I thought that might be inflammatory, but now that you mentioned it yourself, I can make that joke. If you want to discuss an incident involving Chomsky (e.g. Cambodia, Faurisson), I recommend reading carefully what he actually wrote, and his actual replies. You might still disagree. But never, ever, work from a description given by a “critic” of Chomsky – it’s too likely to be inaccurate or axe-grinding.

    Look, I think you’re doing to Chomsky what you worry about regarding nerds – projecting onto him a generic version of Bad Outgroup Member (e.g. not looking through the telescope). Case in point of the above, that “he despises this work and would shut it all down if he could”. It’s clear what he despises is not *this work*, _per se_ – but people making claims about how this work proves something about human language, and reading into the results any sort of “intelligence”. It’s a very common critique.

    I say the following meaning to be helpful to you, please forgive it being critical: Throughout your recent posts, you seem to be doing fallacious reasoning of the sort:

    “Bad people critique AI, THEREFORE, every critique of AI is being done by a bad person. Further, all critiques share the same poisonous ideology, and have the same malevolent intentions.”

    This is the root of your “wondering whether I hallucinated the words on my screen” – no, it’s not “hallucinated”, it’s that you’re doing some sort of overfitting pattern-matching, to those words, e.g.

    “the banality of evil,” implicitly comparing ChatGPT to Adolf Eichmann

    You’ve made what the social-media left calls a “dogwhistle” argument! That is, they would phrase it as: “the banality of evil” is a dogwhistle for calling ChatGPT a Nazi.

    What would convince you that you’re taking too extreme a view of a relatively mild Op-Ed, because you’re seeing it as a dogwhistle for white supremacy, err, nerd-hating?

  68. PublicSchoolGrad Says:

    Scott #58,59

    I gather that you find Chomsky distasteful for reasons other than what he co-wrote in that editorial. However, I think your points about LLMs would be better made by addressing the substance of his critiques of LLMs instead of responding with disdainful sneering. As far as I can see, you have not addressed the main point that LLMs do not tell us anything about how language works. This is fair enough – I don’t think you ever claimed otherwise either. To me it seems your interests are orthogonal so a more reasonable course would be to ignore his critique.

    As to your comment about the control of this powerful technology, I am afraid I do not share your optimism. It is not as if “Silicon Valley nerds” are a separate species of humans who do not have the same motivations as anyone else. It is not as if they can’t be “left-wing or right-wing fanatical ideologues”. You just have to look at the founders of OpenAI to see that they do have an ideological bent. Even if that were not the case, the technology is likely not going to be controlled by the technologists who created it. You just have to take a cursory look at history to get an idea of how it is likely to go.

  69. Fred the robot Says:

    I understand you now, thanks Scott. Yes, I believe that GPT can tell you what the next action should be for a given Turing machine in a given state. I agree that although it can’t “scroll” the tape (or update the tape except by appending to it), it probably can tell you how to update the prompt so that it can perform the next step of the computation.

    I still believe that without this help the amount of computation GPT can do is finite, and that this is also intriguing in some ways, but it seems less interesting than thinking about what it could do with such help. What you suggest is exciting because it implies that if the model had access to a system which would update its prompt on request then it would be (theoretically at least) much more powerful and be able to do an arbitrary amount of compute from a given starting point. I wonder if this is actually practically helpful – for example to improve what can be accomplished with a “show your working” style prompt. Would love to see an example of this!

  70. Lorraine Ford Says:

    JimV #50:
    Contrary to what you seem to be saying, the physical reality of what is happening in brains is not like the physical reality of computers/ AIs. One reason is that physical brains are measurable: there are many different categories of physical information that could potentially be measured in brains, and there are numbers associated with these categories when they are measured.

    But the only measurable category in computers/ AIs seems to be voltage, which is a different thing to binary digits: binary digits are not measurable because they don’t have a category. Can you suggest other measurable categories in computers/ AIs?

    Also contrary to what you seem to be saying, the code/ program of a computer, and the data that is input to a computer, merely sit on top of the basic processes that are happening in a computer. Different programs and different data are merely a different set of binary digits sitting on top of the basic processes that are happening in a computer: despite superficial appearances, nothing new is happening with AIs.

  71. Filip Dimitrovski Says:

    Also: it’s amazing to me that ChatGPT learned how to solve Sudoku puzzles on its own.

    Yes, it sometimes makes illegal chess moves and struggles after exhausting the memorised openings… but when did we put the bar *so high*?! All we had is dog/cat recognition demos just a decade ago!

    I really don’t buy the “it’s just curve-fitting, it’s never gonna be creative” argument anymore.

  72. Charles A Says:

    Oops, previous comment should have been Scott #64 not #63.

    I’ll also take this opportunity to post an excerpt which explains it better than I did:

    > Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20’910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.

    > […]


    It is an empirical result of testing lots of training on different models: not a theoretical result of what the models could be capable of with weights from an oracle, but a test of what types of formal languages in different classes they seem to be able to learn with their normal training methods.

    I think they also have some interesting references on how Transformers can learn some languages from production systems outside most of the traditional hierarchy but also fail on some very simple ones near the lowest levels. More from the paper explaining better:

    > It was theoretically shown that RNNs and Transformers are Turing complete (Chen et al., 2018; Pérez et al., 2019; 2021; Siegelmann & Sontag, 1994). However, these results are impractical as they rely on an unbounded number of recurrent steps and on arbitrary numerical precision. Thus, more recent work (Ackerman & Cybenko, 2020; Bhattamishra et al., 2020; Hahn, 2020; Hao et al., 2022; Korsky & Berwick, 2019; Merrill, 2019; Merrill et al., 2020; Merrill & Sabharwal, 2022; Weiss et al., 2018) has refined these theoretical analyses by considering linear computation steps and logarithmic precision, showing that: (i) RNNs and GRUs can, in theory, recognize regular languages, and (ii) LSTMs are strictly more powerful since they can learn a counting mechanism (i.e., are k-counter machines).

    > Moreover, it was theoretically shown that Transformers are not well-aligned with the Chomsky hierarchy since they cannot recognize certain regular languages (e.g., periodic finite-state languages), while being able to learn some counter languages (e.g., Shuffle-Dyck and n-ary Boolean expressions). A different approach proposed a computational model to capture the Transformer operations and used it to show which tasks could conceivably be learned by a Transformer (histograms, sorting, Dyck languages) (Weiss et al., 2021).

    > However, this approach only upper-bounds the capabilities of a model and does not provide any insight on whether gradient-based methods will find parameters that can solve a task in practice, which is precisely the goal of our work. In that sense, our work complements the above studies by investigating how well gradient-based learning can exploit the inductive biases of common machine learning architectures to recognize languages on different levels of the Chomsky hierarchy.

  73. Working Deep Learner Says:

    @Christopher #42 I was about to write a thoughtful response, but then I realized odds are you never took the few seconds it would require to actually, y’know, check whether ChatGPT would in fact be unable to integrate x^2 between 0 and 2, certainly not using anti-derivatives!

    It can. This example is quite representative and I do hope you will ask yourself whence the confidence in the wrong opinion.

    What is the integral of x^2 between 0 and 1?

    To find the integral of x^2 between 0 and 1, we can use the definite integral formula:

    ∫(from 0 to 1) x^2 dx

    To evaluate this integral, we need to use the power rule of integration:

    ∫ x^n dx = (x^(n+1))/(n+1) + C

    where C is the constant of integration.

    Using this rule, we can integrate x^2 as follows:

    ∫ x^2 dx = (x^3)/3 + C

    So the definite integral of x^2 between 0 and 1 is:

    ∫(from 0 to 1) x^2 dx = [(1^3)/3] – [(0^3)/3] = 1/3

    Therefore, the integral of x^2 between 0 and 1 is 1/3.

    And the integral of x^2 between 0 and 2?

    To find the integral of x^2 between 0 and 2, we can again use the definite integral formula:

    ∫(from 0 to 2) x^2 dx

    Using the power rule of integration, we get:

    ∫ x^2 dx = (x^3)/3 + C

    So the definite integral of x^2 between 0 and 2 is:

    ∫(from 0 to 2) x^2 dx = [(2^3)/3] – [(0^3)/3] = 8/3

    Therefore, the integral of x^2 between 0 and 2 is 8/3.
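Both answers in the transcript can be checked independently. The following is a minimal sketch that applies the same power rule with exact rational arithmetic; `integrate_power` is a hypothetical helper written for this check, not part of any library.

```python
from fractions import Fraction

def integrate_power(n, a, b):
    """Definite integral of x**n from a to b via the power rule (integer n != -1)."""
    if n == -1:
        raise ValueError("the power rule does not apply to n = -1")

    def antiderivative(x):
        # x^(n+1) / (n+1), kept exact by using Fraction
        return Fraction(x) ** (n + 1) / (n + 1)

    return antiderivative(b) - antiderivative(a)

print(integrate_power(2, 0, 1))  # 1/3
print(integrate_power(2, 0, 2))  # 8/3
```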

  74. Raoul Ohio Says:

    Scott #26

    I am also neither unconditionally enthusiastic nor unconditionally dismissive. My in-the-middle thoughts include:

    1. I don’t think anyone has a remotely decent idea of what intelligence, human intelligence, consciousness, knowledge, creativity, etc., really are.

    2. chatGPT and similar developments are a big (maybe biggest ever?) step in discussing these topics.

    3. Said developments will certainly scale up and become a bigger step.

    4. Will this turn out to be all there is? Who knows? I doubt it. It is fun to speculate on what will be “outside”.

  75. Raoul Ohio Says:

    I have noticed that anecdotal reports about trying out “Write a poem that …” seem to be “wow! this is great”, whereas those about “write a finite automaton to do some trivial task” seem to be “HaHaHa – this is dumb AF”.

    It seems plausible that this is in fact the state of affairs as of 2023 03 20 1:13:14EST. If so, one can speculate about why:

    1. Maybe poetry is a lot easier than CT (computation theory)?

    2. Maybe the training material is weak in CT?

    3. Maybe the standards of quality in poetry and CT are incomparable in some sense?

    4. Maybe CT is something that chatGPT/AI will never be able to do?

    The only bet I will make is that chatGPT/AI will prove to be a powerful tool for understanding intelligence, human intelligence, consciousness, knowledge, creativity, etc.

  76. Scott Says:

    Raoul Ohio #74:

      I am also neither unconditionally enthusiastic nor unconditionally dismissive. My in-the-middle thoughts include:

      1. I don’t think anyone has a remotely decent idea of what intelligence, human intelligence, consciousness, knowledge, creativity, etc., really are.

      2. chatGPT and similar developments are a big (maybe biggest ever?) step in discussing these topics.

      3. Said developments will certainly scale up and become a bigger step.

      4. Will this turn out to be all there is? Who knows? I doubt it. It is fun to speculate on what will be “outside”.

    Bravo! I’ve tried to say the same but with many more words. You win this thread, insofar as I’m the judge.

  77. Uspring Says:

    Charles A #72:
    Thank you for the link to that paper.
    The most interesting question seems to be how Turing-complete architectures can be trained. Turing machines can run in long loops, which makes the outcome of the calculation strongly dependent on the parameters. An example is the Mandelbrot set, where the dependency on the parameter is chaotic. I don’t believe gradient descent works then.
    I think it’s likely that the human brain’s learning processes involve as yet unknown inductive biases and procedures, which help to avoid these kinds of difficulties and also speed up learning as compared to the slow LLM training process.
    One definitely has to distinguish learning at training time and learning at user prompt time. The former is much more capable, since it is basically a longish search process, whereas search following a user prompt (if it can be viewed as a search) is limited in depth during the 100 or so steps in obtaining the next token.
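The point about long loops and gradients can be illustrated with the Mandelbrot iteration itself. The following is a minimal sketch, not a statement about any real training setup: it iterates z → z² + c from z = 0 and tracks the derivative of the loop's output with respect to the parameter c by the chain rule. `orbit_sensitivity` is an illustrative name. At c = 0 the derivative stays at 1, while at c = −2, a point on the boundary of the Mandelbrot set, its magnitude grows roughly fourfold per iteration.

```python
def orbit_sensitivity(c, steps):
    """Iterate z -> z*z + c from z = 0, tracking d = dz/dc alongside it.

    Returns (z_n, d_n) after `steps` iterations, where the chain rule gives
    d_{k+1} = 2 * z_k * d_k + 1.
    """
    z, d = 0, 0
    for _ in range(steps):
        d = 2 * z * d + 1   # derivative update must use the old z
        z = z * z + c
    return z, d

for c in (0, -2):
    print(c, [orbit_sensitivity(c, n)[1] for n in (1, 2, 3, 10)])
# 0 [1, 1, 1, 1]
# -2 [1, -3, -11, -174763]
```

A gradient that scales exponentially with the number of loop iterations is one concrete way in which a long-running computation becomes hard to tune by small parameter updates.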

    Raoul Ohio #74:
    “2. chatGPT and similar developments are a big (maybe biggest ever?) step in discussing these topics.”
    I daresay that we’re witnessing the birth of empirical philosophy.

  78. manorba Says:

    Raoul Ohio #74:
    “1. I don’t think anyone has a remotely decent idea of what intelligence, human intelligence, consciousness, knowledge, creativity, etc., really are.”
    Yes, yes… (and i would even take consciousness out of the equation… we don’t even know if it’s a real thing.)
    Prof. Aaronson, why not do a joint piece about it with someone like, say, Steven Pinker? oh wait…

    Question to everyone involved in LLMs:
    is it feasible to create a forum moderator with GPT or similar?

  79. JimV Says:

    L. Ford, landing a NASA mission on Mars has many different measurable physical categories involved: velocity, acceleration, solar wind, radiation, gravity, temperature, et cetera. All can be and are simulated for planning purposes on a digital computer. Proof: the mission arrives. It doesn’t matter what controllable property is used in the computer. Computers can and have been made using mechanical gears, acoustical resonances in pipes, photons in fiber-optic tubes, and so on. A composite computer using many different controllable properties in different modules could be made. As long as there is one controllable property it can be used to simulate many different properties. Therefore it makes sense to use just the most efficient one. This is elementary computer science which does not belong on a thread where experts are discussing the pros and cons of LLMs. (Neither of us belongs in this thread, getting in the way of the experts.) (So maybe this comment will not survive moderation, which will be okay with me.)

    Brains also have an operating system, designed and programmed by biological evolution. Any basic flaw in the cognitive horizon of computers would apply to brains also (which in fact might be the case), unless you believe in magic.

    I believe the key is trial and error plus memory. It created us, over billions of years, and after about 200,000 years we have begun using it to create AI. We know it is possible, since biology did it. The questions are, how long will it take and what resources, and will our civilization survive in the meantime.

    Last thought: the post here was not one of my many favorites, but it has generated a lot of interesting comments. I stand from my laptop for an ovation.

  80. fred Says:

    In the history of science/technology, we now have a new record in how long it took for the push back on *any* criticism/skepticism/worry to boil down to

