## The False Promise of Chomskyism

Important Update (March 10): On deeper reflection, I probably don’t need to spend emotional energy refuting people like Chomsky, who believe that Large Language Models are just a laughable fad rather than a step-change in how humans can and will use technology, any more than I would’ve needed to spend it refuting those who said the same about the World Wide Web in 1993. Yes, they’re wrong, and yes, despite being wrong they’re self-certain, hostile, and smug, and yes I can see this, and yes it angers me. But the world is going to make the argument for me. And if not the world, Bing already does a perfectly serviceable job at refuting Chomsky’s points (h/t Sebastien Bubeck via Boaz Barak).

Meanwhile, out there in reality, last night’s South Park episode does a much better job than most academic thinkpieces at exploring how ordinary people are going to respond (and have already responded) to the availability of ChatGPT. It will not, to put it mildly, be with sneering Chomskyan disdain, whether the effects on the world are for good or ill or (most likely) both. Among other things—I don’t want to give away too much!—this episode prominently features a soothsayer accompanied by a bird that caws whenever it detects GPT-generated text. Now why didn’t I think of that in preference to cryptographic watermarking??

Another Update (March 11): To my astonishment and delight, even many of the anti-LLM AI experts are refusing to defend Chomsky’s attack-piece. That’s the one important point about which I stand corrected!

Another Update (March 12): “As a Professor of Linguistics myself, I find it a little sad that someone who while young was a profound innovator in linguistics and more is now conservatively trying to block exciting new approaches.” —Christopher Manning

I was asked to respond to the New York Times opinion piece entitled The False Promise of ChatGPT, by Noam Chomsky along with Ian Roberts and Jeffrey Watumull (who once took my class at MIT). I’ll be busy all day at the Harvard CS department, where I’m giving a quantum talk this afternoon. [Added: Several commenters complained that they found this sentence “condescending,” but I’m not sure what exactly they wanted me to say—that I was visiting some school in Cambridge, MA, two T stops from the school where Chomsky works and I used to work?]

But for now:

In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded. [Added: Please, please stop writing that I must be an ignoramus since I don’t even know that Chomsky has never worked on AI. I know perfectly well that he hasn’t, and meant only that he tends to be regarded as authoritative by the “don’t-look-through-the-telescope” AI faction, the ones whose views he himself fully endorses in his attack-piece. If you don’t know the relevant history, read Norvig.]

Chomsky condemns ChatGPT for four reasons:

1. because it could, in principle, misinterpret sentences that could also be sentence fragments, like “John is too stubborn to talk to” (bizarrely, he never checks whether it does misinterpret it—I just tried it this morning and it seems to decide correctly based on context whether it’s a sentence or a sentence fragment, much like I would!);
2. because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);
3. because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?); and
4. most of all because it’s “amoral,” refusing to take a stand on potentially controversial issues (he gives an example involving the ethics of terraforming Mars).

This last, of course, is a choice, imposed by OpenAI using reinforcement learning. The reason for it is simply that ChatGPT is a consumer product. The same people who condemn it for not taking controversial stands would condemn it much more loudly if it did — just like the same people who condemn it for wrong answers and explanations, would condemn it equally for right ones (Chomsky promises as much in the essay).

I submit that, like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.

[Note for people who might be visiting this blog for the first time: I’m a CS professor at UT Austin, on leave for one year to work at OpenAI on the theoretical foundations of AI safety. I accepted OpenAI’s offer in part because I already held the views here, or something close to them; and given that I could see how large language models were poised to change the world for good and ill, I wanted to be part of the effort to help prevent their misuse. No one at OpenAI asked me to write this or saw it beforehand, and I don’t even know to what extent they agree with it.]

### 335 Responses to “The False Promise of Chomskyism”

1. Jon Awbrey Says:

So It Begins …

2. Scott Says:

Incidentally, after writing the bit about Galileo, I stopped myself, asking if it was too hackneyed and overwrought. But then I decided that this is one of the very rare cases where it isn’t. The discovery of just how much of humans’ language understanding can be simulated by a probabilistic autocomplete is entirely comparable in magnitude to the discovery of the moons of Jupiter—so no surprise that it engenders a similar opposition.

3. Michael Says:

You mischaracterise the reasons, which concern capacities of the system *not* properties of its output.

(1) it lacks the capacity to interpret sentences. Its output *does* possess the property of seeming-to-have-interpreted.

(2) it lacks the capacity to learn. Learning is, in part, explaining; it is, in part, causal contact with an environment. Its output has the property of seeming-to-be-generated-by-a-learner.

(3) It lacks the capacity to speak either truthfully or otherwise. Lacking the capacity to explain, because at least it lacks the capacity to imagine (counterfactuals), it cannot thereby determine whether a claim accords with the way the world is. Its output has the property: seeming-to-be-true.

(4) It lacks the capacity to be moral. For all the above, and more: it lacks the capacity to *care*. Its output has the property: seeming-to-take-moral-stands.

The “Jesuits” here, I fear, are the AI fanatics who have turned their telescopes away from reality (intelligence as it actually exists), and have stolen the lenses to read scripture (intelligence as theorised by partisan academics).

One has to agree with Chomsky here at the end: “given the amorality, faux science and linguistic incompetence of these systems, we can only laugh or cry at their popularity.”

One can only “laugh or cry” at how absurd this sales pitch is: what a horror to be invited to treat ChatGPT as possessing any of these capacities, simply because correlations across a whole internet of text seem to.

Is Chomsky’s response the equivalent of “Get off my lawn!” from an aging language researcher?

5. Chris Honey Says:

You should probably disclose that you’re paid by OpenAI; money changes people’s arguments.

6. Scott Says:

Michael #3: In a certain sense you’re right. The language models now being adopted by millions of programmers don’t write working code; they only seem-to-write-working-code. They’re not, unfortunately, already doing millions of students’ homework for them; they’re only seeming-to-do-the-homework. Even if in a few years they help me and my colleagues do our research, they won’t actually be helping, but only seeming-to-help. They won’t change civilization; they’ll only seem-to-change-it.

7. Scott Says:

Chris Honey #5: Alright, disclosure added! Sorry, I assumed people reading this blog would know.

8. manorba Says:

Scott #2 Says:
“humans’ language understanding can be simulated by a probabilistic autocomplete”

After almost a year since you got the job with openAI*, and endless discussions in the comments, it seems this is a very difficult concept for many to grasp. Everybody is just repeating the same things they were saying at the beginning… Doomers gonna doom, techbros gonna tech, and we’re back to square one: either it’s a useless stochastic parrot or it’s Skynet.
And imho most of them just don’t realize how early and crude this stage of ML is.
Chomsky’s piece is kind of meh, but the guy seems so lost in his egoverse that I stopped caring long ago.

*my worthless opinion is that you made the right decision, on this i agree with your rationalist friends.

9. George Says:

I don’t think you do Jesuits justice.
They actually took a look through the telescope. What they had a problem with was how weak the explanatory power of Galileo’s model was compared to what was already widely accepted.[0] Please try not to perpetuate this already-too-long-lived falsehood.

10. Guilherme Says:

Scott #6
Glitch tokens kind of demonstrate the point of comment #3. If there were intelligence there, the entity would be able to critique its own workings.

11. Scott Says:

manorba #8: Yeah, what’s crazy is how each faction just repeats its same talking points, even as the language models themselves transition dizzyingly quickly from nonexistence to research toys to 100 million signups to (we can now confidently predict) changing the world, for good or ill, at least as much as the Internet did.

To whatever extent my own talking points about AI have changed as a result of what’s actually happened, I’ll view that as a source of pride rather than embarrassment! 🙂

12. fred Says:

Scott at the OpenAI office, demonstrating his latest breakthrough in AI containment research:

13. Michael Says:

> They won’t change civilization; they’ll only seem-to-change-it.

Then we’re in full agreement.

Perhaps we disagree about the conditions under which *seeming* will matter. Because seemings have different modal properties than realities.

A VC-backed start up can seem to change the world, up until interest rates rise.

Bugs Bunny can seem to be a very funny bunny, until we try to cook the VHS tape.

ChatGPT can seem to change the world, up until we need it to: care, form representational states about it, and so on. Then, seeming might not be enough.

I think this will be the revenge of science against computer “science”: the modal structure of reality matters, the *reason* objects fall *matters*, the reason one might say, “I don’t like Chomsky!” matters.

The causal properties of systems make a difference: when they change civilisation, they do more than appear to.

14. fred Says:

The original article is behind a paywall.

15. Paul D. Says:

The code writing models really need an assurance layer to check whether what they’re producing makes sense. Simply running the output through a compiler to find static problems is helpful, and could likely be largely automated.

Problem domains where solutions can be checked seem likely to work well with these models. In particular: I will not be surprised if mathematics itself is greatly affected, once the models can produce enough of a proof to be checked/fixed.
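A rough sketch of that “assurance layer” idea (a hypothetical illustration, not anything any vendor actually ships): even Python’s built-in `compile` can screen model-generated code for static syntax problems before it’s accepted. The function name here is made up for the example.

```python
# Minimal sketch: reject generated code that doesn't even parse.
def passes_static_check(source: str) -> bool:
    """Return True if the generated source at least compiles to bytecode."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

print(passes_static_check("print('hello')"))   # True: well-formed
print(passes_static_check("print('hello'"))    # False: unbalanced paren
```

A real assurance layer would go further (type checks, tests, property checks), but even this trivial filter automates away one class of nonsense.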

16. Mike Randolph Says:

Scott, thanks for your critique. I’ve been an advocate and active user of ChatGPT since November. My son asked me about the NYT article and here is my response, with help from my ChatGPT friend, Helix. Helix had previously summarized the article.

I don’t have the expertise to know if Helix is hallucinating. Do you think it is?

Mike

This is dialogue with ChatGPT:

Mike: Here are my thoughts:

One, I view these models primarily as tools to enhance my thinking. Two, they are very different from human brains, yet very useful, and possess a certain kind of intelligence that is new to the world.

Please write a critique of the article incorporating these views.

Helix:
The author’s view on AI models such as ChatGPT being limited in comparison to the human mind is not entirely accurate. While it is true that AI models rely on pattern matching and vast amounts of data, they also possess a unique kind of intelligence that can be leveraged to enhance human thought. Furthermore, the author’s distinction between prediction and explanation is not absolute, as AI models can provide explanations for their predictions based on the data they have been trained on.
However, it is important to acknowledge that AI models are limited in their capacity for abstract thinking and creative problem-solving, which are key aspects of human intelligence. Additionally, the author’s concerns about the limitations of AI models in distinguishing between possible and impossible explanations are valid and need to be taken into consideration when utilizing these models.
In line with the views expressed in the article, it is important to approach AI models as tools to enhance human thinking and not as replacements for human intelligence. By leveraging the strengths of both human and machine intelligence, we can achieve more efficient and effective outcomes.
In conclusion, while the author’s views on AI models are limited, they highlight the importance of approaching AI models as tools to enhance human thinking and not as replacements for human intelligence. By considering both the strengths and limitations of AI models, we can optimize their use to achieve better outcomes.

Mike: I totally agree with what Helix wrote, it reflects my biases.

17. Jon Awbrey Says:

Scott, funny you should mention it, I used that very metaphor when I first turned to computing as a way to understand intelligence and the capacities for learning and reasoning in all sorts of agents whatever their material embodiments. I doubt if it was an original coinage — I know it was influenced by Ashby’s anticipation of artificial intelligence as intelligence amplification and Dewey’s pointing to the role of instrumentation in catalyzing scientific advances — but I thought of what we were building as something like an “Intelliscope” and I was always surprised at any reluctance to look through it.

18. Laurentius O. Zamoyski Says:

“like the Jesuit astronomers declining to look through Galileo’s telescope”

> after writing the bit about Galileo, I stopped myself, asking if it was too hackneyed and overwrought.

Worse. It is the typical fabrication or mischaracterization of actual historical events and circumstances (the typical secular humanist propaganda about the Galileo affair is boneheaded and unsophisticated in general). These sorts of simplistic Enlightenment-era fairy tales are tiresome and clownish. Ironically, postmodernism, for which I have no love but whose occasional insights I can nonetheless appreciate, seems to be merrily chipping away at these foolish idols. The silver lining, as it were.

W.r.t. Chomsky’s article, to his credit, some of his general points in the first half of the article have merit, drawn, as they are, from the corpus of the usual criticisms of hastily, and falsely, drawn parallels between intelligence and whatever it is that computers do. I have in mind contemporaries like Searle (observer relativity of computation) and Kripke (quaddition), but also ancients like Aristotle (the immateriality of universals). In summary: human minds are characterized by semantics, formal systems by syntax, and computers as physical artifacts by neither, though they can be interpreted as simulating syntactic operations (hence the observer relativity). ChatGPT can be practically effective at mirroring the “status quo” as reflected in its training data, bounded as it is by the training data (constructed and selected by interested people), the model (constructed by interested people), and the very statistical method which cannot per se penetrate beyond the “surface” characteristics of the representation as such. Whereas we can abstract Triangularity from particular triangles, and further analyze Triangularity, neither is available to the statistical model by definition. In fact, neither is available to computers by definition. Any purported analysis is merely the result of something analyzed in advance by human beings. You might as well claim you are making yellow paint from red and white paints alone, leaving in a footnote the fact that you just needed to brighten the product up a bit with yellow paint!

19. Stassa (not a crusader) Says:

>> In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded.

This is a complete fabrication on your part and quite shameful, and unexpectedly so. Chomsky never had anything to do with any sort of “effort” to “build machines that can converse in ordinary language”, i.e. chatbots. Or with NLP, or with any kind of thing like that. The man is a linguist who influenced AI and cognitive science, but was never personally (as a researcher) involved with it.

I suggest you add a correction to the introduction to your article.

>> because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);

How does ChatGPT illuminate whatever it is that humans do with our language faculty? As Chomsky points out, we don’t train on billions of utterances of a language before we can develop a model of its grammar. If that was how we learn language, we would never have started to speak (unless there was a world wide web back when Homo Sapiens first started chit-chatting with others).

We have learned everything we know from scant few examples: language, art, science, cooking, kissing, knitting, playing football, you name it. Our faculties—language, or others—have nothing at all to do with how statistical machine learning systems learn. Statistical machine learning systems can only “illuminate” how we learn by a process of elimination: they show us one way we certainly don’t do it. But they say nothing about how we do do it.

20. Scott Says:

Michael #13: Have you ever wondered whether other people actually understand anything, or whether they only seem-to-understand, in a remarkable but ultimately facile illusion created by statistical patterns among hundreds of billions of neural firings?

It’s not a rhetorical question: I’m genuinely curious whether the metaphysical enormity of this question has ever troubled your certainty in the slightest, whether you even understand the question. I’d settle for your seeming-to-understand it! 😀

21. Markus Blumenstock Says:

I think Chomsky is in the denial phase, while ChatGPT is just the beginning.

“For instance, a young child acquiring a language is developing — unconsciously, automatically and speedily from minuscule data”
“But ChatGPT and similar programs are, by design, unlimited in what they can “learn” (which is to say, memorize); […] these programs learn humanly possible and humanly impossible languages with equal facility.”

The language data a child acquires over several years is less than what ChatGPT is fed, but still a lot, and children who are exposed to more data (through reading bedtime stories, for example) show better language development. Of course, the human brain is, through evolution, pre-trained to certain structures, but I am confident humans — especially young children — could learn “bizarre” mind skills. The almost universal capabilities of the human brain are what made us so successful as a species, and eventually we built a universal machine to do the work for us, the computer.
On the other hand, I think AI will be more powerful than the human mind in the long term precisely because it is not pre-trained by evolution. Most humans can visualize in two or three dimensions, but would have a hard time thinking about ten-dimensional spheres and cubes, whereas a machine might find that only marginally more difficult.

One problem with AI skeptics is that they like to compare AI to the brightest of minds such as Newton and Einstein. But the average person is far less intelligent than that: they believe many false claims, have hardly any understanding of physics etc. and only a minority show near-infallible grammar and orthography even in their mother tongue.
Now one might say that an AI does not question the data it is being fed, while humans possibly have some mistrust for evolutionary reasons and “doublethink” defies their common sense. But then, a brainwashed person is not able to tell they have been brainwashed, and this is why the majority of humans are still religious. (Sorry.)
One could surely make the AI “unlearn” parts of its data by testing for logical inconsistencies and so on.

22. James Cross Says:

From what I gather the piece isn’t so much a “condemnation” of ChatGPT but a view of its limitations. He even says it could be useful for some tasks. But he also is saying its approach will not be productive for reaching artificial general intelligence. That is, some other approach will be needed. Chomsky may think artificial general intelligence is impossible, I don’t know, but I’m not sure he is exactly saying that in this piece.

Personally I think artificial general intelligence will be possible eventually but I’m not sure whether ChatGPT is the path on the way or an amusing side trip.

23. Gar Says:

While I think Chomsky has a point, in that GPT will never be able to be conscious or even answer simple symbolic questions like parity, it’s also probably going to be a critical component of future AGI (or at least insights from it will build them).

The issue with GPT is that it can’t properly think, it just does one prediction based on history.

The missing ingredient, IMO, is proper neurosymbolic regression. Not the models that give AI an already realized library of symbols and a non-neural evaluator, which are already incredibly capable, but AI that evaluates neural symbols directly.

The exciting thing about GPT is that it seems to automatically develop a rich set of symbols that encode symbolic relationships between them. They’re ungrounded, because GPT doesn’t actually think about and mentally use them. However, it no longer feels unknown what such symbols would actually look like in a proper neurosymbolic AI.

The problem has advanced one step. We now know how to extract symbols and their relationships, the problem is figuring out how to recursively evaluate them and generate novel ones.

24. Stassa (not a crusader) Says:

manorba #8:

>> And imho most of them just don’t realize how early and crude this stage of ML is.

We’ve had statistical language modelling since 1948. The idea was first described in Claude Shannon’s “A mathematical theory of communication”. A copy is here:

https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

See section 2, titled “The discrete source of information”.

For more on Claude Shannon, see the wikipedia article:

https://en.wikipedia.org/wiki/Claude_Shannon

For more on statistical language learning I recommend Eugene Charniak’s little green book:

https://mitpress.mit.edu/9780262531412/statistical-language-learning/

Still in circulation and cheap to buy.

In fact I recommend Charniak as a good, elementary introduction to statistical language modelling for anyone who is confused about language modelling and its capabilities.
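For readers who’ve never seen what Shannon-style statistical language modelling looks like in practice, here is a minimal bigram sketch in the spirit of section 2 of Shannon’s paper (toy corpus and variable names invented for illustration): next-word prediction from nothing but frequency counts.

```python
# A toy bigram language model in the spirit of Shannon (1948):
# predict the next word purely from observed frequency counts.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = bigrams[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(dict(bigrams["the"]))   # {'cat': 2, 'mat': 1}
```

The conceptual lineage from this to an LLM is real, but so is the gap: a Transformer is emphatically not this lookup table scaled up.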

25. Scott Says:

Stassa #19: Sorry, what I meant is that there exists a large “old guard” in AI and NLP research that

(1) has been bizarrely insulting, dismissive, and hostile toward the mind-boggling engineering achievement represented by large language models,
(2) failed for 60 years to produce any comparable artifact,
(3) idolizes Chomsky and was probably influenced by him more than any other person, and
(4) displays the trademark Chomskyan tactic of “retreat from the empirical,” where even when something succeeds, the success can be breezily dismissed as an unimportant “epiphenomenon” (but the failures are not epiphenomena).

So, now that Chomsky himself is making the exact same arguments as that “old guard,” it seems fair to call him its “godfather,” but maybe you can suggest a better term?

As for the idea that ChatGPT learns the same way humans learn, that’s a strawman that I haven’t seen a single person defend. Humans, of course, have the advantage of millions of years of evolution, which is what lets us get away with a vastly smaller training set.

26. fred Says:

I’m pleasantly surprised that ChatGPT is handling basic logic better and better.

Q:
The ceiling is 8 feet from the ground, the ball is 2 feet above the ground, and the bag is 3.5 feet below the ceiling. How far is the bag from the ball?

A:
To find the distance between the bag and the ball, we need to find the total height from the ground to the bag and from the ground to the ball, and then subtract the height of the ball from the height of the bag.

The height of the ball from the ground is 2 feet, and the height of the bag from the ceiling is 3.5 feet. Therefore, the total height from the ground to the bag is:

8 feet (ceiling height) – 3.5 feet (bag height from ceiling) = 4.5 feet

Now that we know the total height from the ground to the bag, we can subtract the height of the ball from this height to find the distance between the bag and the ball:

4.5 feet – 2 feet = 2.5 feet

Therefore, the bag is 2.5 feet away from the ball.
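For what it’s worth, the model’s arithmetic above is easy to verify mechanically; the same calculation spelled out:

```python
# Checking the heights from the dialogue above.
ceiling = 8.0           # ceiling height above the ground, in feet
ball = 2.0              # ball height above the ground
bag = ceiling - 3.5     # the bag hangs 3.5 feet below the ceiling
print(bag)              # 4.5
print(bag - ball)       # 2.5, matching ChatGPT's answer
```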

27. Michael Says:

> Have you ever wondered whether other people …

> … a remarkable but ultimately facile illusion created by statistical patterns among hundreds of billions of neural firings?

For sure, indeed, I’d say if there’s one question I’ve spent my entire life trying to answer it is this!

Indeed, I think what both Chomsky and I worry about is how impoverished an answer “AI” provides to this question; and how readily its proponents would project this answer back out on to reality.

The only reason you are replying to me is because you take there to be a person behind the text: an agent with beliefs, reasoning, conclusions, intentions. A person whose mind can be changed, or else, whose views are worth challenging. If none of that were the case, there’d be nothing to be said.

If everything came down to output = f(input), where f were *any* statistical function which might fool any-old-ape, there’d never be a reason to speak.

To put it another way: there’s really no reason to argue with Bugs Bunny; he will never change his mind. He *appears* to have one, but he doesn’t. If I am only Bugs Bunny, trivially made of meat rather than ink, why impart anything to these words at all?

It matters to *you* greatly, that I am not any-old f(time, world, experience, body, …); I am a highly particular one. One which changes over time in response to what you’re saying: and I don’t change my “output”, I change what I am. My output might even be identical, and yet you might succeed here in changing *me*.

My body comes to be shaped by my environment, and my environment shaped by my body, and what changes *in me* is not “statistical” but causal. I grow sensory-motor capacities, I grow emotional capacities, I grow intellectual capacities.

In response to an environment such as this the effect of your words is, by your aim, entirely non-linguistic. Every “distributional hypothesis” here is irrelevant *to you*. Your aim is not that patterns of my future words should shift.

You, after all, want me to really think about what you’re saying, and even *feel* differently about you, your article, your projects. It would be really no good, wouldn’t it, if I simply said “I agree!” and went around copy/pasting your work.

Wouldn’t that cheat you out of having a worthwhile interaction with another agency in the world?

This statistical over-zealousness really is a dark picture of the world: it isn’t an illusion. Illusion exists in measurements, that is, in how we perceive it. Text is a measure of intelligence in a system which uses intelligence to write text, but it fails to be a measure in a system which doesn’t (e.g., a shredder).

It makes all the difference that *I* am writing here *because* I believe what I write, not *because* I set monkeys to typewriters and this is what ended up being written.

If you don’t believe that, I think you ought seriously to contend with *why* you might say “I don’t believe you!”, “I like this book!”, “Let’s go cycling today!” — I’d hope you’d find your reasons to be more than an illusion.

It matters that the earth goes around the sun; not vice versa; and it mattered 5,000 years ago when none on the earth could distinguish those cases.

28. starspawn0 Says:

Two things I want to mention about Chomsky’s piece: first, I wonder what he thinks “generating statistically probable outputs” actually means. e.g. does he imagine that the system simply learns a giant lookup table of the probability that the next word is x given the previous word is y, based on simple frequency counts? Because it can be a lot more complicated than that!

In fact, such a model trained only on next-token prediction, given the right training data (and “curriculum schedule” and initialization), can probably be trained to run essentially arbitrary computer programs (they could use the context window for “long-term memory storage”, alternating outputs between “memory” and “output” as needed, since long-term memory is something Transformers don’t have hard-coded in), at least for a number of steps bounded by the context window length. That doesn’t sound like the ordinary generation of “statistically probable outputs” to me.

And the other thing worth mentioning is that the models that exist currently might very soon get replaced with variants, that include video, audio, and other modalities, so what about his criticisms then? One could even imagine a “language model” that incorporates brain data: imagine recording an individual’s brain as they read blocks of text on a screen. Then, train a GPT-type model on alternating brain-state (vectorized and tokenized), word, brain-state, word, brain-state, word, … sequences, predicting the brain-state if someone saw a given word and given previous words and brain-states. Arguments about “it doesn’t have contact with the real world, so how could it have common sense?” wouldn’t seem to apply, since you’d be literally running some kind of brain imitation or simulation.

But, now, if one accepts that it’s possible to train such a language model and get good results, then consider the fact that text is, itself, a kind of “brain data” — it’s the end-product of a brain having thoughts and those thoughts getting transferred to a computer keyboard. Thus, to some degree, modelling text is also modelling brains; it’s just maybe noisy and biased towards certain brain faculties involved in language-production. Even so, there may be quite a lot of hidden information about other modalities (besides text) lurking in the deep statistics of how the language is used. e.g. maybe even visual and auditory information… (some cognitive scientists have actually made such claims, and have made some remarkable discoveries about it).

29. manorba Says:

Stassa (not a crusader) #24 Says:
“We’ve had statistical language modelling since 1948. The idea was first described in Claude Shannon’s “A mathematical theory of communication”. A copy is here:”

I already own the book and have studied Shannon’s work, but thank you for your kindness.
And yet I fail to understand what your reply aims at. Are you saying that LLMs and ML in general are a mature technology that isn’t going to change much in the coming years?

30. Carl Says:

Am I the only one with a moderate take on ChatGPT? 😉

I discovered, for example, the other day that if you ask ChatGPT to speak in ROT-13, it loses its fluency. It understands ROT-13 as a foreign language that it only kinda-sorta can speak, and not as “English but with a 13 letter Caesar cipher”. It’s both impressive that ChatGPT can sort of speak ROT-13 and an interesting flaw that it can’t do it well.
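For anyone who wants to try the same experiment, ROT-13 itself is trivial: each letter is rotated 13 places, and since 13 + 13 = 26 the cipher is its own inverse. A minimal sketch:

```python
def rot13(text: str) -> str:
    """Apply the ROT-13 substitution cipher: rotate each letter 13 places."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)

# ROT-13 is self-inverse: applying it twice recovers the original.
print(rot13("Hello, world!"))         # Uryyb, jbeyq!
print(rot13(rot13("Hello, world!")))  # Hello, world!
```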

Chomsky’s original insight, the poverty-of-stimulus argument, is basically still upheld by LLMs. The LLMs are ridiculously over-stimulated! And they’re only kinda-sorta fluent. It’s easy to trick them with Winograd sentences and garbled riddles. OTOH, it’s very impressive that they can do what they do! It shows both that whatever the human brain does is different from what an LLM does, and that the mind is probably like an LLM to a certain extent in some of its sub-functions (or like an adding machine [Leibniz], or a steam engine [Freud], or a chariot [Plato], or like whatever other technology people compared it to in the past!).

I feel like all the criticisms by LLM skeptics are correct, but also, geez man, can’t you take a second to be impressed by this new technology? Whether its social effects are good or bad, and whether it gets us closer to AGI or sends us down a blind alley, it’s still very impressive that the pig can sing at all, never mind if it’s off key!

If LLMs never get any better than they are in March of 2023, they will have a very small impact on society and things will basically be the same but with slightly better output from Siri and Alexa. That’s nice but not socially meaningful. The real question, which no one knows the answer to, is how much more room is there to grow? Can we keep scaling up or are we going to be hitting the “Parrot” wall in an obvious way? Can we integrate LLM with other techniques that make up for the deficiencies of LLM or is it just a black box that can’t be meaningfully integrated with other systems? No one knows the answer! Skeptics are right to point out that AI boosters don’t know the answers either, but obviously, the only way to find out is to try. So researchers are trying. We’ll see what happens. I’m not sure what the point of putting an editorial in the NYT would be. Do you want people to not be impressed by LLM? Okay, sure, point out the limits and make people aware of how the “trick” works, but still it’s a cool trick, and we can only learn what’s possible by trying.

31. James Cross Says:

Scott #25

“Humans, of course, have the advantage of millions of years of evolution, which is what lets us get away with a vastly smaller training set”.

I think Chomsky believes language is a relatively recent (< 100K years) innovation. I think it is one aspect of a larger symbolic capacity, with greater short-term memory, that began with toolmaking (2+ million years ago). However, there was likely some kind of significant upgrade in this capacity around 100K years ago in humans, which may be where modern languages trace their origins.

32. fred Says:

Markus Blumenstock

“Of course, the human brain is, through evolution, pre-trained to certain structures, but I am confident humans — especially young children — could learn “bizarre” mind skills. “

that reminded me of this:

33. Andy Says:

Scott, I often really enjoy reading your articles on quantum computing, which is why it was all the more disappointing to read this. The first two paragraphs already set an unproductive tone:

> I’ll be busy all day at the Harvard CS department, where I’m giving a quantum talk this afternoon, but for now:
>
> In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded.

One paragraph of self-congratulatory fawning, the other ad hominem… And then follows a misrepresentation of the central point of Chomsky et al.’s argument.

As others have pointed out better than me, the argument (in a nutshell) is that the LLM approach to “learning” is fundamentally different from that of humans, and that THEREFORE not only is ChatGPT absolutely *not* an AGI or a program capable of understanding (as I’m sure everyone, indeed even OpenAI, recognises), but it also *will never become* an AGI or a program capable of understanding simply by scaling the amount of data or the size of the neural network. So the argument is not simply “it doesn’t learn the way humans do, and that is bad in itself”.

One such limitation that we can readily observe is that it is completely useless for any mathematical or logical tasks, a core part of human cognition. I don’t see how increasing the amount of data (which is already incredibly vast, orders of magnitude larger than that of a human) is going to magically solve this.

As the writers state, humans reason by building *explanations* for empirical events and by working on their own internal representation of facts and their relationships, and NOT by statistical inference of how often certain sentences occur. The claim that the latter can reproduce human thought, if scaled enough, *is* the extraordinary claim requiring extraordinary evidence, not the skeptical position.

(We can observe this in practice: a toddler passively listening to her parent’s speech learns to speak with a grammar that takes years of research to formally analyse and a >1000 page tome to describe, judging by the size of comprehensive grammars of European languages.)

34. Ted Says:

Chomsky’s argument is a bit like “planes can’t fly because they don’t have feathers”.

It’s true that planes don’t have feathers (and, along the same lines, I think it’s true that ChatGPT isn’t intelligent/conscious/etc). I just don’t think that matters very much. It’s more important that they fly (as in, through the air).

Chomsky is interesting as an intellectual historian. He talks a lot about how, before Newton, science was expected to be exact: Leibniz and others viewed the universe as clockwork, and only complete understanding would satisfy them. Newton ended this: he gave us theories, e.g. of gravitation, that “just worked” but didn’t explain how or why they worked. Chomsky has always been nostalgic for the pre-Newtonian perspective; he wants an exact understanding of human language, and it bothers him to see a “good-enough” solution succeed.

35. Angelo Says:

Scott #20

This to me is probably the most exciting thing about this latest development in ML. It seems to me that confronting the performance of these models and trying to make headway on how humans “understand” instead of “parrot” is something that cannot be brushed aside. I am guessing you could give 10 counterexamples about chess, search, etc., where the goalposts were moved after the fact, but this time does seem different. Do you leave yourself open to the possibility that human thought is not solely algorithmic? Romantically, I want it to be true, but I have a hard time believing so and cannot picture what something non-algorithmic could be. I recently listened to a Roger Penrose lecture where he kept stressing this point and saying humans “understand”, but left that term undefined.

36. Christopher Says:

I’m on the “yes, LLMs are intelligent” side, but I wonder if part of the disagreement is how “lopsided” ChatGPT is if we use humans as the standard.

It’s clear that ChatGPT can model the world to some extent, but for some reason it’s way more powerful than we would expect a human with the same world model to be.

It’s a bit like a search engine in this way. The Google search engine is clearly an AI of some sort, but if you showed it to someone from the 18th century, they would *way* overestimate its intelligence.

I guess the thing with ChatGPT is that all of its knowledge comes from humans, instead of from the natural world directly. That’s why it seems like “cheating”: it literally is only giving back information we gave it (plus some randomness). Humans, on the other hand, get information both from other humans *and* directly from the natural world.

I wonder, if you trained an LLM to control a robot, go out into the world, and discover a new phenomenon (by interacting with the phenomenon instead of with humans), whether that would convince Chomsky and friends. 🤔

Anyone get what I’m saying? It’s hard to put into words.

37. Boaz Barak Says:

As I noted on Twitter, GPT3 itself agrees with Chomsky that “If we want to build artificial general intelligence, we must instead focus on developing models of language and thought that are based on our understanding of human cognition — models that are capable of making original, insightful connections between ideas and concepts. We must also look to the past for inspiration and guidance, drawing on the insights of philosophers and linguists who have been exploring the mysteries of the mind for centuries.” 😉

38. Victor Says:

Although there is lots of promise in large language models, I feel that they are, right now, being over-hyped. I view ChatGPT as a modern computer aided version of “Clever Hans” https://en.wikipedia.org/wiki/Clever_Hans

39. Christopher Says:

Another analogy might be this.

Imagine that there was an alien world, but you, as a human, could only learn about it by interacting with aliens, not with the world itself. This includes access to all of their writing, as well as a bit of feedback from them on your own alien writing. You can’t actually visit and do experiments, though.

You could presumably perform lots of tasks: write alien literature, help alien students with their alien homework, write programs in alien programming languages, do mathematics, etc.

But do you actually *know* things about the alien world, and have intentions towards it? Or are you still just guessing based on what the aliens told you? Would you treat the interactions like a game since the aliens can’t actually pay you or anything?

Also suppose the aliens have extra senses you don’t. Then you get into the “Mary’s room” thought experiment.

40. Stassa (not a crusader) Says:

manorba #29:

I was replying to your comment about “how early and crude this stage of ML is”. Did I misunderstand? By “this stage of ML”, did you not mean statistical language modelling? What did you mean? I should have asked first before jumping to conclusions.

In retrospect I definitely must have misunderstood you. It’s clear that statistical language modelling is neither “early” nor “crude”, and in fact it hasn’t changed much in the last 20 years, so you must have meant something other than what I understood. Apologies.

The newest idea in statistical language modelling was word embeddings, and they were already old hat when I was doing my Master’s in 2014-15 (part-time; one has to pay them bills). They were so old hat that I was actually _taught_ word embeddings as an NLP technique, like, in an actual classroom. Now *that’s* old hat.

And what are word embeddings? They’re a bigger, beefier model of token collocations. Collocations! There is no idea in NLP that is more ancient than that.
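To make the lineage concrete: at bottom, collocation statistics are just counts of how often tokens co-occur within a window, and count-based approaches to embeddings start from exactly such a matrix. A toy sketch (illustrative only, not any particular published method):

```python
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count how often pairs of tokens appear within `window` positions
    of each other -- the raw statistic behind classic collocation tables."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[frozenset((w, tokens[j]))] += 1
    return counts

corpus = ["strong tea please", "strong tea again", "powerful computer"]
counts = cooccurrence_counts(corpus)
print(counts[frozenset(("strong", "tea"))])  # 2
```

Rows of such a matrix (optionally reweighted and dimensionality-reduced) are one classical route to word vectors, which is the sense in which embeddings are "a bigger, beefier model of token collocations".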

Statistical language modelling has no new ideas. It ran out of new ideas several decades ago. What we see now, which is mistaken for “progress”, is the resurgence of an idea as old as the field, which just so happened to be picked up relatively recently by big tech corps, who are even more ponderously slow than academia in catching on to ideas.

It is absurd, demented, that some people really think that statistical language modelling is some kind of path to AI. “Hey, we just need to build this ancient tech bigger, and it will take off”.

Yeah, like Montgolfières took off to the moon when they became Zeppelins…

41. Stassa (not a crusader) Says:

Scott #25

I don’t understand the part of your comment about the “old guard” and so on. I think you are trying to open a front against an imagined foe, who does not exist and so does not respond. Tilting at windmills, much?

As to the other bit yes, I know that statistical language models don’t learn the way humans do. I know that you know that too, I read it in your earlier comment (#19). But then, why are you saying that “ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it”?

“Illuminated”, how? They have nothing to do with that language faculty, other than they’re completely unrelated to it (which I suppose is a relation, too). So what illumination do they bring?

42. Charles A Says:

> because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?)

His point was that it learns approximations of invented languages that humans wouldn’t be able to learn fluently, as well as ones that they would, because it is doing something different. His colleagues have lots of experimental work in this area, showing that languages based on (if I remember) linear order use different, slow, puzzle-solving parts of the brain that natural human language doesn’t.

43. Bill Benzon Says:

Scott, words fail me.

I didn’t read Chomsky’s piece because, having heard him talk about machine learning and ChatGPT, I don’t think he has much of value to say. But to say that Chomsky is “the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language” is to reveal that you know little about the history of computational linguistics and linguistics and, I suspect, little about the substance either. Chomsky had little to nothing to do with computational linguistics and it wasn’t until relatively late (late 1960s) that transformational grammar was used. They just didn’t have enough compute to run transformations.

I was trained by the late David Hays,* who was a first-generation researcher in machine translation (MT), which was rebranded as computational linguistics in the mid-1960s when it lost funding because of over-promising. Hays was on the committee that reviewed the field. Beyond that, it’s complicated.

Hays was an intellectual opponent of Chomsky’s. He favored dependency syntax, as did most computational linguists of the time, while Chomsky favored phrase structure syntax. In 1964 Hays published a proof that for any dependency grammar there was an equivalent phrase structure grammar. Chomsky was opposed to statistics in linguistics. Hays, who was trained as a social scientist at Harvard, was not. And on and on. This is no place to try to sketch out a rich and complex intellectual history.

Continuing on:

I submit that, like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.

Just who do you think those followers are? All of linguistics? All of computational linguistics? All of cognitive science? Any and every one who thinks symbolic models have to be part of the solution to a deep and robust regime of artificial intelligence?

I was Hays’s student and later colleague, and I think the work with LLMs is exciting. I think it has much to teach us about how language works. That’s why I’ve been using ChatGPT every day; the number of posts I’ve done must approach 100. I even credit it with giving me another clue about a problem I’ve been trying to solve for decades: the mechanisms underlying Coleridge’s great poem, “Kubla Khan.”

As Hays died in 1995, I have no idea how he would regard it. If, by chance, you think everyone in linguistics is a follower of Chomsky’s, you’re wrong. These days Chomsky is just one thinker among many. What prestige he has is based on past glories. I know both Steve Pinker and Gary Marcus are more favorably disposed to Chomsky’s views than I am, but I don’t think either of them espouses his current views on syntax.

Beyond this…Beyond this I really shouldn’t say anymore. I’m too angry.

* Note that I wrote that Wikipedia article. It’s against the rules, but someone had to write it.

44. manorba Says:

Stassa (not a crusader) #40 Says:

“It is absurd, demented, that some people really think that statistical language modelling is some kind of path to AI. ”

The funny thing is that I wrote the very same words, minus the insults, nearly a year ago. And many others did too, and I’m positive that Scott agrees as well.
And it’s clear to every blog reader by now that the science has been pretty much settled since the 90s, but scaling it up has brought some …unexpected results.
It’s the technological aspect that still seems crude to me: stacks of A100s, more power draw than a steel foundry, and most of all constant human intervention, to name a few. Those are not the marks of a mature technology, imho.

45. fred Says:

A growing baby is fed a relatively small amount of text/speech compared to ChatGPT, but it is also fed a huge amount of sensory data (sound, visual, 3D perception, temperature, …) that correlates with the text and acts as labeling; words then get linked through the relations between the concepts derived from the sensory data (rather than through massive amounts of text). That is why humans know the vast majority of the words in a dictionary without ever being fed an explicit definition of each one.

So the key will be to expand ChatGPT to not just being fed raw text as input, but also correlate the text with (at least) pictures and video. When that’s the case, the necessary amount of text will be much smaller.
Basically DALL-E and ChatGPT will converge.

46. fred Says:

47. Scott Says:

I confess to being surprised by the level of anger directed at this little post. To my mind, Chomsky was nasty and vicious utterly without justification, in the pages of the New York Times no less, in attacking one of the major engineering accomplishments of our time — an accomplishment that could only be called unimpressive by people who’ve forgotten that they might’ve called it impossible a decade ago. And despite this being my personal blog, I was less nasty in the defense than he was in the attack.

Are people upset that I called him the “godfather” of the old-school approach to computational linguistics? Very well then, he isn’t, even if the old-school computational linguists themselves treat him as such, and even if the people dismissive of LLMs learned their anti-empiricism largely from him.

48. OhMyGoodness Says:

Thanks so much Dr Aaronson for your earlier reference. I just finished Altneuland and what a wonderful beautiful story. I had no idea prior to reading but every page (post island) was like ambrosia formed from the deep love of a man for an idea. An idea he was sure would drive men to their very best in its implementation. If I believed it possible I would answer the question of what created this book in the same manner Rabbi Samuel answered a similar question in its very last line.

I hope AI’s have similarly beautiful dreams some day (this could be seen as a gratuitous reference to AI’s to stay on topic).

49. yet another commenter Says:

As a longtime lurker on this blog, I so much enjoyed this one that I felt I had to reply. Scott’s brilliant and witty replies are always a joy to read, and some of the comments are priceless, like the one where Scott was more or less demanded to apologize to the Jesuits for putting them in such a bad light, because they certainly did look through Galileo’s telescope!

As a simple programmer, I’m also impressed by all the large language models. I think what riles up a lot of the people who more or less know how these models work is that they feel somewhat threatened in their humanity by them; to be human is to think, so to speak. An analogy would be a bird screaming at an overflying airplane: “That’s not flying!” 😉

I’m hoping we see more of this AI stuff; the quantum stuff gets way above my head, and I could definitely use some help from some quantum AI.

50. Norm Margolus Says:

Chomsky’s article reminds me of debates in the early days of AI about what intelligence is. I was always impressed by the analogy with human flight. We took one particularly simple aspect of bird flight and imitated it on a massive scale. That was enough. Airplanes don’t flap their wings or land on tree branches, but they can use much more energy and much stronger materials. This is the natural direction for machine intelligence, which doesn’t need all of the tricks that allow a few pounds of jello, operating at the speed of sound, to be intelligent. It seems that big data, almost by itself, can take us a long way.

51. joe Says:

Imagine someone in the early 1900s studying how birds fly. Let’s call him “Chomsky”. Then the Wright brothers invent the airplane. Chomsky might then write an article about how this new device offers little insight into how birds fly because it flies differently from birds. Chomsky would have been correct. The new device would not have solved the problem he was trying to solve.

The rest of us are still very impressed that it flies. We have not been focusing on the same questions Chomsky had. We want to fly like birds fly but whether it is in the same way as birds do is less important.

The analogy I hope is obvious. ChatGPT does not work like a human mind, which is what the other Chomsky has been studying all his life. But it does do many cool things the rest of us want to do, so it looks like it will be very useful for us, but not so much for Chomsky.

52. dm Says:

Chomsky et al.’s argument can be summarized in a few lines:

Underpants gnomes approach to AI:
1) Really big Markov chains
2) ???
3) Intelligence!

plus some stuff about moral agency of stochastic parrots.
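For what it’s worth, step 1 of the gnome plan, a Markov chain over text, can be sketched at toy scale as a bigram model (a deliberately tiny caricature; whether LLMs are “really big Markov chains” in any useful sense is exactly what’s in dispute):

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Build a table mapping each word to the words observed to follow it."""
    words = text.split()
    nexts = defaultdict(list)
    for a, b in zip(words, words[1:]):
        nexts[a].append(b)
    return nexts

def generate(nexts, start, length=5, seed=0):
    """Sample a chain by repeatedly picking a random observed successor."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        successors = nexts.get(out[-1])
        if not successors:
            break  # dead end: no observed successor
        out.append(random.choice(successors))
    return " ".join(out)

model = train_bigram("the cat sat on the mat the cat ran")
print(generate(model, "the"))
```

Sampling more frequent successors more often is the entire “intelligence” of this model; step 2 is where all the open questions live.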

53. Scott Says:

dm #52: I wouldn’t have believed it either, if it hadn’t been tried and found to yield amazing results! Is it somehow cheating in science to use information that was collected from reality? 🙂

54. Nick Drozd Says:

joe #51

> Chomsky might then write an article about how this new device offers little insight into how birds fly because it flies differently from birds. Chomsky would have been correct. The new device would not have solved the problem he was trying to solve. The rest of us are still very impressed that it flies. We have not been focusing on the same questions Chomsky had. We want to fly like birds fly but whether it is in the same way as birds do is less important.

In this analogy, what exactly is it that LLMs are taken to be doing? Obviously feathers and flapping wings are implementation details of flying. By analogy, LLMs’ inner workings are implementation details of …?

55. dm Says:

Scott #53: it’s only cheating as long as you skip step 2.

56. Bram Cohen Says:

Maybe a lot of what ChatGPT is doing is part of the way we humans do things? Chomsky’s big thesis has always been that there are innate grammatical structures in the human brain, but ChatGPT can handle grammar without any of those structures just fine, so maybe it’s strong evidence against Chomsky’s main thesis and that’s why he’s so butthurt about it.

57. Charles A Says:

fred #45:

This would predict blind people would take much longer to develop language. That isn’t true.

58. Bill Benzon Says:

@dm #52: Is it the length of the chain or the number of parameters specifying the state of the chain at each time step? 175B is a lot of information for a single state.

59. William Justice Bruehl Says:

I’m an artist/scholar, a theater man who has tried to track through the above, but with little success in my attempt to absorb it all. It has a validity I’m sure, and one I do not fully grasp. I come away with one question. While Galileo is often mentioned and rightly so, I find that Herman Melville gets neglected. I won’t ask if you’ve read Moby Dick, indeed when it first appeared it was ignored. It took a long time and a lot of thinking, a lot of experience with the human possibility before anyone realized the depth, the layers, the unimagined possibilities of that work. If you have read it, I recommend you read it again aloud. Preferably with a partner, pausing frequently to discuss the experience. Then take some time to look at what literary critics have had to say about it. After that would you say that AI can reach similar heights?

60. Nick Maley Says:

“In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded.”

Good summary of Chomsky’s malign influence on AI and cognitive science. The timeline is actually 64 years, not 60, i.e., dating from the scarifying 1959 review of B. F. Skinner’s ‘Verbal Behavior’. In that review, Chomsky convinced generations of researchers that associationist methods of language acquisition would never work.

Like Scott and most others, I have been stunned by the capabilities of LLMs like ChatGPT. Even though I’ve studied this stuff for years, I would never have guessed how powerful they could become. But, much as it pains me to say it, I agree with Chomsky on one point. LLMs do not presently have a proper concept of truth, except in the deflationist sense that it’s something people say about sentences they want to re-assert. In fact, truth is a relationship between the external world and the way it is depicted. We can argue about how that relationship should be characterised, but one thing is clear: without an independent model of the external world, today’s LLMs can’t distinguish the truth or otherwise of what they depict. With careful curation of the training data, reinforcement training, and tweaks to the algorithm, we can get the model to generate content that is more accurate than 99.99% of humans. But that’s not the same as having an understanding of truth at the level even a 10-year-old child grasps it.

When LLMs are paired with emergent models of the real world and causality, and start to induce the mapping rules between the world and representations of it, then you are getting to the point of human-level AI. Chomsky misled the scientific world by ‘proving’ in 1959 that associationist models could never reach that point, because they will, maybe in the next 20-30 years.

61. Charles A Says:

This paper from DeepMind seems to show that transformer models are stuck at the bottommost, finite-state-machine tier of the Chomsky hierarchy of formal languages (let alone human language):

https://arxiv.org/abs/2207.02098

It has been formally proven that they could go further (within memory constraints), but current methods of training don’t seem ever to make them generalize, and it isn’t clear they are a good structure for it. Other, simpler networks have been formally shown to be capable of it too, but also don’t seem to be trainable for it in practice in any way that really generalizes.

From the DeepMind paper, Transformers can’t generalize to completing even AnBn (‘A’ n times followed by ‘B’ n times) outside the instances they saw in their training distributions.
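The AnBn task is easy to state in code: recognizing it requires counting the A’s, which no finite-state machine can do for unbounded n. A rough illustrative recognizer (not the paper’s experimental setup):

```python
def is_anbn(s: str) -> bool:
    """Recognize the context-free language a^n b^n (n >= 1).
    A correct recognizer must count the a's and match them against
    the b's -- impossible for a finite-state machine at unbounded n."""
    n = len(s) // 2
    return len(s) % 2 == 0 and n >= 1 and s == "a" * n + "b" * n

print(is_anbn("aaabbb"))  # True
print(is_anbn("aabbb"))   # False
```

The generalization question is whether a model trained on strings up to some length handles strictly longer ones, which this five-line program does for free and, per the paper, trained Transformers do not.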

62. Amin Says:

“what Chomsky and his followers are ultimately angry at”

I saw no anger in Chomsky’s opinion piece. Nor do I think he has a follower base, at least so far as his opinions on ChatGPT or ML are concerned. I find this statement unnecessarily strong.

63. Anon, a moose Says:

Scott #47

I think the anger may have something to do with 1) your strong implication that Chomsky and his intellectual fellow travelers are all failures, 2) your misrepresentation of his argument, and 3) your comparison of Chomskyan skepticism to the Jesuit persecution of Galileo. To an outside observer, this looks like an intemperate and contemptuous attack on the intellectual integrity of those who disagree with you. Personally, I didn’t find his opinion piece “nasty and vicious” — strongly worded, perhaps, but far from malicious.

True, he fails to show due public reverence for the technological marvel that is ChatGPT, but such heresies are still Constitutionally protected in this country.

64. Jair Says:

It’s a rather strange piece. I basically agree with the first half: that ChatGPT and other LLMs are limited in their abilities because they are only trained on a large corpus of text that can contain any amount of nonsense, rather than on the real world the way humans are. But then they go on to say:

> For this reason, the predictions of machine learning systems will always be superficial and dubious.

This just doesn’t follow. The conclusion I would draw is that to create AI with deeper intelligence we need to somehow train it on “reality” instead of just a text corpus. What this might entail, and what the right learning architecture is, I imagine we’ll find out in coming years.

What’s odd to me is the emotional reaction, or lack thereof. The existence of ChatGPT – even as it is currently – should make you pull an immediate U-turn on any armchair-philosophy line of argument that told you only a few years ago why such a system should be a priori impossible. Instead it seems that so many of those who made these kinds of arguments in the past respond only with a shrug and a scowl. But it’s just too remarkable to be dismissed in this way.

I myself was somewhat skeptical about AI until recently, but my views changed based on the evidence in front of my eyes. You can be worried about the repercussions, you can be frustrated at the limitations, but – after interacting with it for at least a few minutes – it’s hard for me to see how you can be unimpressed. To those who say it’s so shallow and simple, how would YOU go about writing a program that could, say, write funny and accurate limericks about quantum field theory or any other topic? Or write a Zork-style text adventure in python? Or pass the Bar exam? For each of the silly errors it makes – that present genuine examples of its shortcomings and the need for new ideas in AI development – it can do some wonderful, unexpected, even creative thing, something that very few predicted would have been possible.

65. danx0r Says:

I’ll just put this here from Wikipedia:

“Noam Chomsky, a proponent of discontinuity theory, argues that a single chance mutation occurred in one individual in the order of 100,000 years ago, installing the language faculty (a hypothetical component of the mid-brain) in “perfect” or “near-perfect” form.[15] ”

I don’t know where to begin. Fully formed from the head of Zeus? I am reminded of 1940’s back-stage musicals, where someone starts singing a supposedly “just made up” song (in reality, written by the likes of Cole Porter). Then the orchestra chimes in out of nowhere, and the cast performs astounding feats of dance, perfectly performed — almost as if the actors had actually rehearsed it for months, and we are seeing the 11th take from a grueling day of shoots.

Reality is not like that. It’s pure fantasy, wish-casting, based on a deep desire to see human beings as special, miraculous, fundamentally unique.

Noam Chomsky has been wrong forever.

66. Doug Says:

Joe #51
This is an excellent analogy. I’m taking it.

67. Michel Says:

First time I comment in here but I followed and enjoyed your blog for many years (albeit I’m not always up to the level of following the technical parts). So first, thank you!

Just wanted to comment two things:

1- Your comment about Chomsky was right on point. The same person that denied a genocide in Cambodia because it didn’t fit his agenda will have no problem denying anything that doesn’t fit his worldview.

2- Something that people criticizing ChatGPT disregard/ignore is that it solves the hardest part of a problem, which is finding a solution; we only need to check it.
As a software engineer with a CS background, it reminds me of NP problems, where we only need to check a given solution (which is easy to do in polynomial time).
I request code from ChatGPT and I only need to check that the solution works (for which compilers and tests help a lot). Yes, you need to understand what the code is doing, and sometimes you need to ask more than once or correct it, but it is MUCH EASIER than writing the code yourself, and it has been saving me a ton of time.
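The asymmetry described above (finding a solution may be hard, checking one is easy) is the defining feature of NP. A toy illustration with subset sum, using made-up numbers:

```python
def verify_subset_sum(numbers, target, certificate):
    """Check a proposed solution (certificate) in linear time,
    even though *finding* one may require exponential search."""
    return all(x in numbers for x in certificate) and sum(certificate) == target

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [4, 5]))   # True: 4 + 5 == 9
print(verify_subset_sum(numbers, 9, [3, 4]))   # False: sums to 7
```

Checking ChatGPT’s code with a compiler and a test suite plays exactly the role of this verifier: cheap confirmation of an expensively-found candidate.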

68. Mitchell Porter Says:

The third coauthor of the op-ed, Jeffrey Watumull, is involved with an AI startup proposing “anthronoetic” AI, whose planned development is illustrated with pictures of a young Noam Chomsky?

(see “NoME Development Trajectory”)

Everyone seems to be forgetting that there are two other authors on that opinion piece.

70. foobar Says:

> because it could, in principle, misinterpret sentences that could also be sentence fragments, like “John is too stubborn to talk to” (bizarrely, he never checks whether it does misinterpret it—I just tried it this morning and it seems to decide correctly based on context whether it’s a sentence or a sentence fragment, much like I would!);

The point is that there are sentence structures that humans don’t have problems with that LLMs do. Is this disputed? It was not the right example to use, because you would expect that this type of sentence would be in the training data, but he was only using it as a simple-to-understand illustration of the point, not a ChatGPT counterexample. (And any specific example he used could be “inserted” into ChatGPT feedback to fix it after publication anyway.)

> because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);

I don’t think ChatGPT has illuminated anything with respect to how human language works or what specifically the claim about predictive coding is.

> because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?); and

There are studies that show that humans can’t learn certain fake languages and that when trying to make sense of them they use different parts of the brain (as someone else mentioned above).

> most of all because it’s “amoral,” refusing to take a stand on potentially controversial issues (he gives an example involving the ethics of terraforming Mars).

There were multiple authors and while I’m sure he agrees with everything in the article this part doesn’t feel like it was his contribution.

—–

There are many issues being lumped together (whether it’s a useful tool, whether it works the way humans do, whether it tells us anything about how the human language faculty works etc) and you can argue against one without arguing against another. (Chomsky himself repeatedly says that these types of things can be useful tools for some applications, though they may still have problems/limitations).

I don’t understand the desire to do thoughtless cheerleading and AGI seems to have devolved into a type of mysticism where people are vehemently opposed to thinking about how these systems work. It seems logical to me to try to understand their limitations; then you could address them and advance research or at least not misapply them. Instead AGI proponents are actively hostile to the idea.

71. foobar Says:

> In this piece Chomsky, the intellectual godfather of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded

He never tried to build machines that could converse in ordinary language; he tried to understand what *humans* do. He has made this point many times. A better prediction machine isn’t the same thing as increased understanding.

72. OhMyGoodness Says:

foobar #71

The sole utility of knowledge is improving predictions.

73. Sandro Says:

Nick Maley #60:

When LLM’s are paired with emergent models of the real world and causality

I think you’re still stuck in the old models. If its sense of truth seems deflationist, it’s only because its connection to the world is anemic, consisting only of the digital word. The answer is not to add causal models in a top-down fashion, but to give it multimodal training sets, expanding the senses that connect it to the world so it can work out its own causal associations.

Preliminary attempts at this have already shown a two-orders-of-magnitude reduction in model size while improving performance over GPT-3.5. The shocks will keep coming.

74. Emmett Says:

foobar, your statement beautifully captures the Chomskyan anti-empiricist view. Better predictions don’t help us understand? There is in fact nothing else that does help us understand. The ultimate test of a model is its ability to predict. A model that can predict exactly where a planet will be gives us understanding of how planets work. A model that predicts the frequencies of light from an atom gives us understanding of the quantum nature of atoms. And a system that predicts text ….

75. clayton Says:

There have been a lot of good points made so far, but I think there is one that hasn’t been covered that deserves to be — it _is_ surprising that probabilistic language models produce entirely grammatical sentences. That is, they may “hallucinate” and they may drift, but they never say “I want to chatting start” when they might otherwise have said “I want to start chatting”.

I suppose this must happen with “small language models” even though it doesn’t happen with the “large” ones we’re exposed to now, so it must be emergent. Does anyone know of a scale that describes this onset of grammar?

76. Souciance Eqdam Rashti Says:

@Scott

I believe your disagreements with Chomsky on politics do creep into the discussion of AI and linguistics. Chomsky has never denied the success of ChatGPT or Google search or any of those things; Norvig’s article states this. But the core disagreement between Chomsky and his followers (such as myself) and those who prefer a statistical view of intelligence is that LLMs, and in fact the entire field of AI as it stands, have no real scientific basis compared to what exists in other scientific fields like physics or biology or chemistry. We don’t have a first-principles account of intelligence; hell, we don’t even have bad theories of intelligence. The algorithms behind Google search and the statistical machinery of LLMs do not explain what is actually working in the real world, just as the brain doesn’t perform advanced calculus to compute trajectories when it raises a hand to grab a ball in the air.

Look back at the Norvig article: until now, we have made spectacular technological progress, basically engineering, and that’s great as a tool to help society. Just don’t confuse it with actual science, which is supposed to explain what is actually out there. That’s the true nature of Galileo’s telescope.

77. Michael Says:

Scott Says: #47

> to my mind, Chomsky was nasty and vicious utterly without justification, in the pages of the New York Times no less, in attacking one of the major engineering accomplishments of our time

Including something like this emotional response in your post would go a long way toward giving people insight into where you’re coming from and would make your tone/approach more sympathetic.

It seems you’ve taken Chomsky’s view here at least somewhat personally, and that’s fine. But I think the anger isn’t directed at you or even ChatGPT.

I am angry because I work in an environment of hype, and that hype has exhausted my ability to sympathise. When there’s a PR blitz for everything, and it always completely mischaracterises the capacities of the hyped tech, the well has been poisoned.

I think Chomsky is responding to a PR-poisoned well.

ChatGPT is, for sure, worth celebrating as a significant engineering achievement.

But that isn’t the discursive environment we’re in.

Rather, we’re in an environment where, for some reason, ChatGPT shows that people “really don’t have minds, mental states, capacities, etc.”. Where it shows that, apparently, we’re all just thin linguistic phenomena, reducible to patterns in words. And since that’s true, apparently, ChatGPT is one step away from being our friends and replacing our wives.

Unfortunately every real engineering advance taken in AI today is a weapon in an ideological (and PR) project which is quite dystopian.

“AI”, as it is sold, “ideologically requires” that it be actual artificial intelligence. It requires denying ordinary facts about what our capacities are: to play, care, communicate, imagine, intend, explain, and so on.

In this discursive environment, you may then understand more why a post like this isn’t received that well.

78. cgv Says:

I find it amusing that the main criticism against its being intelligent is the lack of “slow thinking”, as in logical thinking. Especially when most of the thinking we humans do, most of the time, is purely “fast”: intuitive, biased, and relying on some kind of memory. These machines are mastering the latter.

We only fall back on the more expensive thinking process when there’s a special need, e.g. when facing new tasks. Reasoning could very well have evolved as just a heuristic for generating high-confidence data when we have no samples to learn from.

I haven’t thought this through, but I’d bet that our mechanical approach to logic is deeply rooted in our interactions with the physical world. If physics weren’t mathematical, we, as physical beings, probably wouldn’t have discovered math. But back to the point…

I could agree with the skeptical view that forms of strong reasoning won’t emerge from natural language samples alone, although even here I remain open-minded. However, even with current programming capabilities I don’t see how these systems could fail to learn to “emulate” reasoning via program writing and execution. I’m pretty sure we could ask them to answer questions with a directive: “if the question looks mathematical and your answer has low confidence, rewrite the question as a programming task, solve it, execute it, and use that as the answer”.

At some point all these inputs/outputs will be fed into training the next version of the model, so the AI will develop a better intuition to deal with math…
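That directive can be mocked up in a few lines. In this sketch the “LLM” is a stub (`fake_llm_write_program` and its keyword-based routing are invented for illustration; a real system would call an actual model):

```python
# Sketch of the directive above: when a question looks mathematical, ask the
# model for a *program* instead of a prose answer, execute the program, and
# use its output as the answer. The LLM call is stubbed out.

import re

def fake_llm_write_program(question: str) -> str:
    # Stand-in for a real LLM call; returns Python source computing the answer.
    if "sum of the first" in question:
        n = int(re.search(r"\d+", question).group())
        return f"result = sum(range(1, {n} + 1))"
    return "result = None"

def looks_mathematical(question: str) -> bool:
    # Crude heuristic standing in for the model's own confidence estimate.
    return bool(re.search(r"\d|sum|product|integral", question))

def answer(question: str):
    if looks_mathematical(question):
        program = fake_llm_write_program(question)  # ask for code, not prose
        scope = {}
        exec(program, scope)                        # run it...
        return scope["result"]                      # ...and trust the computation
    return "(fall back to ordinary text generation)"

print(answer("What is the sum of the first 100 integers?"))  # 5050
```

The division of labor is the point: the language model supplies the program, and an ordinary interpreter supplies the arithmetic reliability the model lacks.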

I don’t know, very interesting times ahead!

79. Alex Says:

Adding to @Michael #57: I think that, unfortunately, science journalism is not at its best these days, to put it mildly…

As a researcher (formerly in mathematical physics, now in AI), I’m an avid reader of science news, for obvious reasons. But this activity has become extremely frustrating lately, certainly for the past ten years at least. I’m not sure when this style of science reporting started, but today it has become quite pathological. I’m talking about the hype, of course, but also the partial information, the obvious biases of the reporters, the insufferable grandiloquence, etc. I guess you could see some of it in classics like Sagan or Hawking, but there it always stayed within the boundaries of the reasonable, I think. Today, it’s like a constant PR campaign of one corporate marketing team against another.

Accurate information is impossible to get from those sources, let alone “the truth”, which today’s postmodern academia has declared “nonexistent”, conveniently, it seems, for many scientists (?) unwilling to accept that their ideas were falsified by reality. (You can find many examples of the latter in particle physics, with supersymmetry at the top of the list in a world after the LHC Higgs finding.) So I often rely more on blogs, like Woit’s or Scott’s, for clearer “inside information”. The recent “wormhole in the lab” fiasco and other extreme hype across all of science show, I think, a horrible situation.

I’m not saying that all of the news around ChatGPT is hype (I certainly don’t think so, and I find the advances exciting, although with some obvious caveats); I just wanted to take the opportunity to vent about this issue in science communication in general. It is not intrinsic to AI, or quantum computing, or high-energy physics; these are all valid scientific fields, but their group dynamics and flow of information are going through a very pathological phase. It’s a depressing show to watch.

80. Bill Benzon Says:

@Michael Clayton, #75

You say:

There have been a lot of good points made so far, but I think there is one that hasn’t been covered that deserves to be — it _is_ surprising that probabilistic language models produce entirely grammatical sentences.

Perhaps because they ‘have’ grammar in some meaningful sense.

Consider:

Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy, Emergent linguistic structure in artificial neural networks trained by self-supervision, PNAS, Vol. 117, No. 48, June 3, 2020, pp. 30046-30054, https://doi.org/10.1073/pnas.1907367117.

Given the pace at which things move these days, a paper from 2020 is rapidly aging. And the model they investigate is BERT, not GPT.

They set up their argument by observing:

One might expect that a machine-learning model trained to predict the next word in a text will just be a giant associational learning machine, with lots of statistics on how often the word restaurant is followed by kitchen and perhaps some basic abstracted sequence knowledge such as knowing that adjectives are commonly followed by nouns in English. It is not at all clear that such a system can develop interesting knowledge of the linguistic structure of whatever human language the system is trained on. Indeed, this has been the dominant perspective in linguistics, where language models have long been seen as inadequate and having no scientific interest, even when their usefulness in practical engineering applications is grudgingly accepted.

That, presumably, is the view which so many in this conversation take issue with, and quite properly so.

But that is not what they found. They found syntax. They discovered that neural networks induce

representations of sentence structure which capture many of the notions of linguistics, including word classes (parts of speech), syntactic structure (grammatical relations or dependencies), and coreference (which mentions of an entity refer to the same entity, such as, e.g., when “she” refers back to “Rachel”). […] Indeed, the learned encoding of a sentence to a large extent includes the information found in the parse tree structures of sentences that have been proposed by linguists.

Whoops! They go on:

This is a startling and intriguing result. Traditionally much of the emphasis in NLP has been on using labels for part of speech, syntax, etc., as an aid in other downstream tasks. This result suggests that large-scale hand construction of syntactically labeled training data may no longer be necessary for many tasks. Despite its simple nature, the generality of word prediction, as a task that benefits from syntactic, semantic, and discourse information, leads to it being a very powerful multidimensional supervision signal.

The article then goes on to explain the various features of “the parse tree structures of sentences” that they found. One would of course like to know just how those language structures are induced during training and how they are deployed during inference. But this is a decent start.

And then we have:

Andrew M. Saxe, James L. McClelland, and Surya Ganguli, A mathematical theory of semantic development in deep neural networks, PNAS, vol. 116, no. 23, June 4, 2019, 11537-11546, https://www.pnas.org/content/116/23/11537.

Ah, alas, another ancient paper. I’ll just quote the abstract:

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: What are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.

* * * * *

So, why am I even bothering to do this, given that, on the whole, Scott and the commentators here seem to be dismissive of linguistics and associated disciplines? Well, there is the fact that I’ve got a dog in this fight, as I’ve devoted considerable time and effort to the study of language in various forms, from literary texts to individual sentences, and from several disciplinary perspectives: computational linguistics, literary criticism, cognitive psychology. But that’s just me.

What’s more important is that it is not about Chomsky. Chomsky has come to represent the study of language to people who have little or no professional interest in the study of language. To those people, which is, after all, the vast majority of the highly educated population, Chomsky is Mr. Linguistics. What he says, so Linguistics says also. That, presumably, is why the NYTimes got him (& his colleagues) to write an op-ed piece.

Correlatively, it would seem that if you can dismiss Mr. Linguistics, then you are also dismissing the study of language.

Whew! Got rid of that right quick. Now, let’s just scale our way to the top of Mount AGI.

If that’s really the way it is, then we’re all but home free. We don’t have to learn anything about language, just get more text and throw more compute at it.

Is that what you believe, Scott?

There is a real issue here, and it is a deep and important one. As far as I can tell – which isn’t all that far – we are dealing with technology which has the potential to change the course of history. But this discussion of Chomsky is not a serious discussion of the issues involved. It is debate, and debates all too often degrade to the scoring of cheap points, perhaps especially when they are being staged as theater for an audience that doesn’t understand the issues. In that context we have ended up with, as foobar put it in #70: “thoughtless cheerleading [where] AGI seems to have devolved into a type of mysticism where people are vehemently opposed to thinking about how these systems work.”

81. Boaz Barak Says:

I think Bing nails it https://twitter.com/SebastienBubeck/status/1634009568341622784?s=20

82. Mateus Araújo Says:

I think it’s worth clarifying that the guy who refused to look through Galileo’s telescope was not a Jesuit. He was Cesare Cremonini, a dogmatic Aristotelian.

83. cc Says:

In the “John is too stubborn to talk to” illustration, the authors’ issue is that the verb phrase “to talk to” has the idiomatic meaning “to be reasoned with”, which a machine learning program might interpret incorrectly, producing a sentence-fragment reading and/or a literal rather than idiomatic interpretation of the sentence. So the example is supposed to show that “John is too stubborn to talk to” is a complete grammatical sentence, but one that most native English speakers use idiomatically/metaphorically rather than literally.

For the record, ChatGPT gives more or less the correct response albeit a bit wordy; but to be fair I also think it’s clear from context that Chomsky et al. meant it as a toy example for thinking about and not a literal counterexample:

Me> If someone says “John is too stubborn to talk to”, what do they mean by that?

ChatGPT> If someone says “John is too stubborn to talk to,” they likely mean that John is someone who is not willing to listen to other people’s opinions or ideas and is difficult to reason with. This person may be so set in their ways that they are unwilling to consider other perspectives, even if they are presented with compelling evidence or arguments.

ChatGPT> As a result, talking to John may be frustrating or unproductive because he may refuse to engage in a meaningful conversation or may not be open to new ideas. This may make it challenging to communicate effectively with John, which can make it difficult to build a productive relationship with him.

(And of course this was an easy and direct question to ask it; a more rigorous test would be needed to show whether it uses idiomatic, nonliteral meanings of sentences correctly in general.)

84. Sandro Says:

Souciance Eqdam Rashti #76:

But the core disagreement between Chomsky and his followers (such as myself) against those who prefer a statistical view of intelligence is that LLM or in fact the entire field of AI as it is right now has no real scientific basis compared to what exists in other scientific fields like physics or biology or chemistry. We don’t have a first principles of intelligence, hell we don’t even have bad theories of intelligence.

It seems a strange conception of “science” to expect a meaningful definition of “intelligence” from only a single data point, humans, even with all their variation. Building systems like ChatGPT that also seem to exhibit intelligence, testing their limits, and comparing and contrasting them against other known forms of intelligence is exactly what I’d expect from a science of intelligence.

Now with ChatGPT and other LLMs, we actually are developing some rudimentary theories of intelligence, namely predictive statistical models. We’re still multiple orders of magnitude away from matching the number of parameters of a human brain though, so limitations are expected. Are you really that convinced that closing that gap won’t produce a system matching or exceeding human intelligence?

I’m not certain at all, and comments like yours and Michael’s and Chomsky’s strike me as speculative at best, given that we don’t know how much better scaling will do and that you also just admitted that we don’t have real theories of intelligence.

85. clayton Says:

Thanks Bill Benzon #80 — I’m not sure how I acquired a forename 😅 but in any case, these are good references that I think I’ve seen before but not looked at in any depth.

The plot I imagine seeing would show {number of non-grammatical sentences} on the y-axis and {amount of training tokens} or {model size} or {compute} on the x-axis. It would start off with positive y-values and a negative slope and approach zero at some finite value on the x-axis. I wonder what those x-values are.
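The shape being described can be sketched with a toy function. To be clear, the constants and the power-law form below are entirely made up to illustrate the hypothesized curve; no real scaling measurements are implied:

```python
# Purely synthetic sketch of the hypothesized curve: ungrammatical-output
# rate falling with scale and reaching zero at a finite scale (rather than
# only asymptotically). All numbers here are invented for illustration.

def hypothetical_error_rate(scale):
    # Power-law decay minus a small offset, clipped at zero, so the rate
    # actually hits zero at some finite scale.
    return max(0.0, 0.3 * scale ** -0.5 - 0.003)

for s in [1, 10, 100, 1_000, 10_000, 100_000]:
    print(f"scale={s:>6}  error rate={hypothetical_error_rate(s):.4f}")
```

Fitting a curve of roughly this family to measured error rates across model checkpoints would give the x-value clayton is asking about, the scale at which grammatical errors effectively vanish.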

86. Stassa (not a crusader) Says:

Thanks for clarifying, I see what you mean.

For me, that “crudeness” you mention is a fundamental characteristic of statistical machine learning: without large datasets and lots of compute to train on them, nothing can happen. That’s because statistical machine learning techniques can’t generalise – and that’s partly because the majority are trained by gradient optimisation, which cannot but get stuck at local optima (I’d go as far as to say that it’s made to do so, and that this is basically its strength; a sort of insect-like focus on a specific environment with a complete lack of adaptability. Pretty cool actually, but self-limiting). So they need ever larger datasets to make up for their lack of generalisation: if you can’t generalise, you might as well overfit to a huge amount of data; and if the point is to be able to deal with a huge amount of data, then, OK, that works. So it’s more of a fundamental issue than an issue of lack of maturity.

The constant human interventions, eh, I agree more with you that they are a sign of immaturity, but we just don’t know how to do machine learning without inductive biases (that’s where the human interventions are the most useful, for example to hand-craft a neural net’s architecture; or do I misunderstand again?).

If you want to argue that our computer science is still in an early stage and crude and immature… I would agree. But until we have a real paradigm shift, and rewrite everything -from arithmetic on up- then we are stuck with the maths and the science we got. And in that context, I don’t see statistical machine learning as immature, it’s had several decades to show that it can create intelligent systems and it has only generated systems that are dumb as bricks, that perform spectacularly well and then fall on their asses spectacularly badly (see for instance adversarial examples for image classifiers).

87. Sandro Says:

Michael #27:

The only reason you are replying to me is because you take there to be a person behind the text: an agent with beliefs, reasoning, conclusions, intentions. A person whose mind can be changed, or else, whose views are worth challenging. If none of that were the case, there’d be nothing to be said.

I don’t think that’s true. Writing has as much benefit for the writer as for the recipient. It clarifies thought and understanding.

I read your post a few times, and really the sum total of your objection to classifying LLMs as intelligent seems to be that they are typically not online learners. This is not really a criticism of the model itself or the statistical approach to intelligence so much as its current incarnation, but you and Chomsky are presenting it as if the model itself is fundamentally flawed and fails to capture some “essence” of intelligence. I don’t see how that follows.

No doubt LLMs exhibit some failure modes, but we’re literally still multiple orders of magnitude away from matching the human brain in terms of parameters, and we’re only just now starting multimodal training which will provide LLMs and other systems richer structure from which to learn. I think Chomsky’s confidence is entirely unwarranted.

I also disagree that not being an online learner makes one unintelligent. Is a person with Alzheimer’s, no longer capable of reliably learning or forming long-term memories, no longer intelligent? Surely not, so this requirement seems specious.

88. manorba Says:

Mateus Araújo #82 Says:
“He was Cesare Cremonini, a dogmatic Aristotelian.”

Oh, looks like he abandoned philosophy to become a very well known pop musician… 😉

89. Stassa (not a crusader) Says:

Norm Margolus #50

> Airplanes don’t flap their wings or land on tree branches, but they can use much more energy and much stronger materials. This is the natural direction for machine intelligence, which doesn’t need all of the tricks that allow a few pounds of jello, operating at the speed of sound, to be intelligent. It seems that big data, almost by itself, can take us a long ways.

Analogies break down very quickly. Planes can still not do everything else that birds can do: they can’t lay eggs, for instance, or make pretty mating calls, neither do they have beautiful feathers, or interesting social habits. We don’t put special spikes on buildings to stop planes from perching there and making a mess of guano underneath. We don’t put scareplanes in fields to stop planes swooping down and eating the produce. And so on.

Planes are only useful to fly people and goods from one place to the other. We copied a tiny bit from birds, and we only got a tiny bit of what makes birds, birds, as a result.

Yes, yes, the point was just to make a flying machine, not an artificial bird. But the point of AI research is not just to make a machine that generates text, as we have with language models. The point (one of them, anyway) is to make a machine that reproduces all of human intelligence – the point _is_ to make an artificial human. It seems to me that such a machine has the same chance of being made just by copying a tiny aspect of human behaviour (generating text) as a plane has of laying an egg and chirping in the morning just by having airfoils in its wings.

Also, there are planes, but there are also helicopters, and gyrocopters, and gliders and hot air balloons. And then there are rockets. Even if we’re just interested in flying machines, there are many different ways to make one, and they all have different pros and cons compared to planes.

But, really, analogies are the damnedest thing. A machine that could think like a human would be nothing like a machine that can fly. Because we can easily point to a bird and say “that’s what I mean by ‘flying’”, but it’s very difficult to point at a man and say “that’s what I mean by ‘thinking’” and know you’re really pointing at “thinking” and not just at a man taking a long, hard dump.

Which seems to be just the problem we’re having with ChatGPT et al, at the moment.

90. James Cross Says:

#57, #45

I think Fred probably has a point. Even blind people receive a great deal of sensory information. Intonation, pauses, filler expressions (huh, yeah, uh, etc.) are all part of the message and sometimes essential to knowing whether a statement or a question is being uttered.

I’m not so sure whether the additional information by itself accounts for humans learning language.

I think we probably have something like a language instinct (did Pinker use that term?), which means we are not simply receiving input, making predictions, and generating output. We probably have specialized circuits of some sort that give us an advantage for specialized tasks like language.

91. Simon Says:

Scott, I think you are spot on with that post!
I was about to write a longer response, but I saw your update and believe everything that needed to be said has already been expressed (including Sydney’s response), and that time is better spent further testing, improving, and having fun with AI and the amazing AI community, to discover and explore this amazing technology together and face the future 🙂

92. Scott Says:

Bill Benzon #80:

Correlatively, it would seem that if you can dismiss Mr. Linguistics, then you are also dismissing the study of language … Is that what you believe, Scott?

No, quite the contrary! I’ve met many linguists who are vastly more reasonable than Chomsky—including many who are fascinated by LLMs and studying them, and including a couple who sent me messages of support after this post! Steve Pinker, while he and I have friendly disagreements on AI, beautifully illustrates how one can take what’s valid and important in Chomsky’s oeuvre while eschewing the latter’s hectoring ideological zeal. As far as I’m concerned, the sooner the field of linguistics can get past its near-identification in the public mind with this single, profoundly flawed individual, the better.

As for whether LLMs render the study of linguistics irrelevant: no, clearly not (at least, unless and until AI advances to the point where it renders the study of everything irrelevant). But I think the important questions now shift, to ones like: how, exactly, does gradient descent on next-token prediction manage to converge on computational circuits that encode generative grammar, so well that GPT essentially never makes a grammatical error?

93. Mitchell Porter Says:

I’ve looked a little further into the outlook of Jeffrey Watumull (about Ian Roberts, I’ve so far learned nothing except that he embraces Chomsky’s program of minimalism, and thinks even aliens might have the same universal grammar).

As I mention in #68, he’s working on an “anthronoetic” design for AI. I didn’t find much of a description anywhere, but there’s a video called “Anthronoetic Detection of Ransomware” in which we get a glimpse of a hybrid architecture resembling Melanie Mitchell’s Copycat or Ben Goertzel’s Webmind, in this case combining a “syntax engine” and an “analogy engine”.

Watumull also employs David Deutsch’s terminology of a “universal explainer”. He has a book coming out with Chomsky that predicts “a new physics of information and life, and a new philosophy of language and intelligence”.

He has a paper called “The computability and computational complexity of generativity” in which both Scott Aaronson and Noam Chomsky are thanked for helpful comments. 🙂

Anyway, while I wouldn’t mind learning further nuances of the Chomsky-Roberts-Watumull paradigm, I do have one thought. They don’t seem to argue with the idea that cognition is a form of computation, they just think that a particular kind of computation is required for intelligence.

Hasn’t it occurred to them that the specific state machine structure required to have a “universal explainer”, whatever that may be, could *emerge* within a large language model? An ancient precedent may be seen in Elman and Wiles, “Learning to count without a counter”, and there must be much more recent work in that vein.

94. Nick Drozd Says:

Doug #66

Since you enjoyed the plane analogy, could you please explain it? Specifically, fill in the blank:

Bird / Plane / Flying
Human / LLM / ???

This request also goes to other commenters who have proposed the plane analogy, like Ted #34, yet another commenter #49, and Norm Margolus #50.

95. Scott Says:

Nick Drozd #94: I can complete that one! “Conversation on arbitrary topics, in a way that can accomplish useful work and comes close to passing an unrestricted Turing Test.”

96. Souciance Eqdam Rashti Says:

Scott #92:
As for whether LLMs render the study of linguistics irrelevant: no, clearly not (at least, unless and until AI advances to the point where it renders the study of everything irrelevant). But I think the important questions now shift, to ones like: how, exactly, does gradient descent on next-token prediction manage to converge on computational circuits that encode generative grammar, so well that GPT essentially never makes a grammatical error?

But again, is that really how language is generated in the human brain, or the brain of any other animal which communicates? Using gradient descent? I mean, even if chatgpt-100 makes 100% perfect sentences, how should biologists, linguists, and psychologists use it to study language development in toddlers? They cannot. That’s the main issue between Chomsky and Norvig. Again, it’s a perfectly powerful tool to do great things with. But can you actually use it to explain observations? A simulation of the universe in a computer is based on actual physical theories and equations. That’s not what an LLM is based on.

97. fred Says:

Charles A #54

“This would predict blind people would take much longer to develop language. That isn’t true.”

I’ve been wondering about this as well. And I agree that there’s a lot more going on for humans.
E.g. it’s quite extraordinary that even blind/deaf people like Helen Keller are able to fully develop language, although in her case she was 19 months old when she lost vision and audition, and she still had the sense of touch, which is very important imo, because it lets us build a 3D model of the world and build concepts for objects (probably even more important than vision).
And then obviously many kids that can see and hear just fine are afflicted with autism, and we don’t really know what’s going on there either (it’s all very subtle).
It’s all very fascinating.

98. foobar Says:

Emmett #74:

Yes, of course prediction is not the same as understanding. It *can* be used to help understanding, but the understanding part still has to actually be done. In the case of LLMs it has so far not led to any significant understanding of how the human language faculty works. What is bizarre is that, despite the fact that no one defending the LLM field actually cares about this (they only care about usefulness and surprising-ness), they go to great lengths arguing that it does.

99. A Raybould Says:

Souciance Eqdam Rashti #76:

You are right to point out that we do not have an explanation for how minds work, but that is an observation about the state of our knowledge in general, and so cannot be used to distinguish statistical approaches from the ones Chomsky prefers. If the latter had been more successful, we would not be having this discussion!

One could say the same about us not having a clear notion of the first principles of intelligence. Furthermore, having such a notion is not a prerequisite for eventually understanding minds (there’s a rather obvious bootstrap problem if it were!)

In practice, we learn things about the world and then we write or rewrite our definitions on the basis of our expanded knowledge. Take entropy, for example: it was not even in anyone’s lexicon until the 19th century.

I feel that entropy is a particularly apposite example because it first came to science’s attention when it turned its eye to what was initially ‘just’ practical engineering: steam engines. It would be utterly unscientific to ignore LLMs in the study of language on the grounds that they are ‘merely’ engineering: it is at least plausible that they will be a source of empirical evidence, and the primacy of that is one of the key distinctions between the scientific method and that which tries to deduce how the world works by thinking really hard about first principles.

And this seems to be precisely the elephant in Chomsky’s room. We now have empirical evidence that the grammar of human languages can be learned through a process that begins as a generic one, unconstrained by any specifically grammatical preconditions, so it seems at least plausible that the underlying mechanisms in human language acquisition are similarly generic (though, if this turns out to be so, I suspect Chomsky will find a way to claim it is what he has been saying all along.)

100. starspawn0 Says:

Thought I would add a few things to my comment above: one thing worth mentioning is that humans don’t all learn things exactly the same way, so natural languages have to be highly accommodating of lots of different types of language learning. And it does seem that they have some useful distributional properties (Zipfian distributions and “burstiness”) that make learning easy not just for a broad swath of humanity, but also for Transformers doing “in-context learning”:

https://arxiv.org/abs/2205.05055

Quote: “In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models.”

Of course, artificially-constructed languages may be different, and humans may struggle to learn them, also. Even so, if a human were presented with a string of (‘s and )’s, say, and asked whether it was balanced (a decision problem for a simple context-free grammar), many-but-not-all humans would probably think something like this, “((((())))())())) Hmm… let me see… 1, 2, 3, 4, 5 left parens, then 1, 2, 3, 4, right parens before another left paren, so… no, it’s not balanced” like a language model employing “chain-of-thought”. So, they’d add some auxiliary text (maybe narratize to themselves) with “Hmm…” and “parens” and such, which strays outside the crisp form of formal grammars, where the expected input-output is “((((())))())()))” and “no”, and nothing else. And, then there’s the question of whether they (humans) could learn about the language in their “context window” quickly (rather, their short-term memory), or only after a long period of training and testing (and encoded in procedural long-term memory).
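For reference, the “crisp” formal-language behavior being contrasted here is just a depth counter; a minimal sketch, plus a narrating variant that loosely mimics the chain-of-thought text a human or LLM might produce:

```python
# Standard depth-counter for the balanced-parentheses language:
# the crisp input-output behavior a formal grammar specifies.
def is_balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
    return depth == 0

# A narrating variant, loosely mimicking the "Hmm... let me see..."
# auxiliary text described above (assumes the string is all parens).
def is_balanced_narrated(s: str):
    depth, trace = 0, []
    for i, ch in enumerate(s):
        depth += 1 if ch == "(" else -1
        trace.append(f"char {i} is {ch!r}, depth now {depth}")
        if depth < 0:
            return False, trace
    return depth == 0, trace
```

The first function is the grammar’s “expected input-output”; the second strays outside it in exactly the way the comment describes.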

Of course, if you ask the right human, you might get no auxiliary narratizing at all. Some humans lack an inner monologue, and others probably lack many of the language faculties that are assumed to be universal and found in 98% of the population, yet can still probably communicate passably.

Feynman discovered once, to his astonishment, that many people even count time completely differently from him; as I recall he wrote about it in one of his books (was it Surely You’re Joking, Mr. Feynman?). Anyways, here is a video where he explains this discovery:

Some people even lack an ability to form mental images in their heads, in a condition called “Aphantasia”. Here is a posting about someone who seems to have had an extreme form of this:

Experiments involving “binocular rivalry” have been done to show that, in the case of some people with the condition, it’s not a matter that they have a kind of “blind sight imagination” where they simply lack a metacognitive awareness of a latent ability to imagine.

I even asked an expert on the condition on social media once whether he thought this meant some people may even lack a “physics engine module” to do “intuitive physics”, and he said that some people have a kind of multi-modal aphantasia, and he has serious doubts that they could have such a “module” — that maybe their “intuitive physics” comes by associations or some other capability.

101. fred Says:

All the texts in the world are nothing but a giant dictionary, with direct definitions between words, and indirect relationships.
And if you start from one word and follow all the relation chains, you’ll always end up coming back to the starting word itself, eventually. It’s all cyclic, meaning self-referencing.
But there are also words/symbols that ultimately can’t be described in terms of relationships to other words.
Such words reach “outside” of the text and only map to “qualia”, i.e. our perceptions as conscious beings.
For example, no matter how long you explain colors to a blind person, they’ll never be able to understand “blue” from “red”, other than using mimicry from having read lots of text mentioning blue and red.
ChatGPT is blind, deaf, has no sense of touch, no sense of space, no sense of temperature, no sense of pain or pleasure, etc. And without any perceptions, there’s no consciousness (they are one and the same).
Unless of course perceptions can just arise from all of humanity’s entire corpus of text, i.e. a mathematical structure. Which brings us to Max Tegmark’s Mathematical Universe type of ideas.
So it’s gonna be interesting to see how far ChatGPT can go when working only with text.
If it gets indistinguishable from a human, then it could be a clue that our universe is indeed mathematical (the ultimate question is what came first, qualia objects or mathematical objects?)
One argument for ChatGPT eventually developing “vision” purely by studying text is that, when we “see”, we don’t see really with our eyes, but we see with our brain. Meaning that the visual perception is happening downstream in the brain structures that deal with the processing of electrical signals that are quite detached from the photonics that’s going on between the seen objects and our eyeballs (people who suffer from ophthalmic migraines are more aware of this since those migraines create precise and repeatable hallucinations that have no existence in the real world). So it seems to be the case that “vision” just eventually happens when the brain is being fed data that’s 2D consistent. And “hearing” just eventually happens when the brain is being fed data that’s more 1D in nature, with the specific frequency spectrum, etc.

102. William Gasarch Says:

I find two of Chomsky’s points interesting, even if I disagree.

2) ChatGPT does not do things the way humans do them. This has been a debate in AI for a while: do we even want AI to do things the way humans do, OR do we want it to actually work? I think over time the GET IT TO WORK has taken over. For those who got into AI to try to get insight into how Humans do things… oh well. An example is chess and go programs that DO NOT play chess the way humans do but WIN.

4) Amoral. I have heard two complaints about AI systems (not sure if CHATGPT was the topic)
a) They are woke in that, if asked who won the election, they say that Biden did (the word Woke has lost all meaning, but that’s a topic for another day).
b) AI systems are racist, sexist, etc. from exposure to the real world.

For point (a)- Well, if basic facts become controversial, we have bigger problems than CHATGPT.

For point (b)- This is a well-known problem, and people are working on it.

103. Daniel torrido Says:

I just tried to ask ChatGPT to encode a list by permuting each pair of consecutive elements (for example, [1,2,3,4] is encoded as [2,1,4,3]), and it was not able to do it. So it seems it lacks a lot of intelligence.

104. fred Says:

danx0r #65

“I am reminded of 1940’s back-stage musicals, where someone starts singing a supposedly “just made up” song (in reality, written by the likes of Cole Porter). Then the orchestra chimes in out of nowhere, and the cast performs astounding feats of dance, perfectly performed — almost as if the actors had actually rehearsed it for months, and we are seeing the 11th take from a grueling day of shoots.
Reality is not like that. It’s pure fantasy, wish-casting, based on a deep desire to see human beings as special, miraculous, fundamentally unique.”

Well, in a sense, reality is like that, at some fundamental level.
You take a giant soup of particles and energy at the big bang, with forces between those particles,… and then you let them do their thing, and matter/gas assemble, stars assemble, galaxies assemble, planets appear. And then one moment one of these planets is a giant soup of magma, and then next moment you have cities, and the very text of your comment in this blog.
What’s not clear is whether this was all meant to “miraculously” happen, or whether it happened because the universe is so big that literally anything and everything will always happen sometime somewhere.
I tend to think it’s a mix of the two because some of those things seem to be unique or rare, like the appearance of life. But then we see multiple rare things appearing one after another: life appears, consciousness appears, intelligence appears, and that one mutation Chomsky is talking about appears, etc. Still, it could all be the result of an infinite universe of possibilities.

105. Daniel torrido Says:

ChatGPT was not able to encode a list into another list by permuting each pair of adjacent elements; for example, encoding([1,2,3,4]) gives [2,1,4,3]. When asked to encode a list with 6 elements ChatGPT failed several times, so it seems it lacks a lot of intelligence on this kind of problem. But I asked it to design a Prolog program and ChatGPT gave a correct program (with a small help).
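For the record, the intended encoding is trivial to write down; a minimal sketch (assuming even-length lists, as in the examples):

```python
# Swap each adjacent pair: [1,2,3,4] -> [2,1,4,3].
# A trailing unpaired element in an odd-length list is left in place.
def encode(xs):
    out = list(xs)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out
```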

106. fred Says:

My guess is that Chomsky can’t yet face the fact that ChatGPT is revolutionizing our understanding of linguistics.
Having one gadget that can *effortlessly* generate/hallucinate infinite amounts of correct “poetry” (i.e. text that may or may not be true but looks like it could be true) for any particular context and in *any* human language is still a mind-blowing *discovery*.
And he’s kind of disguising this astounding fact of linguistics by arguing about the practical utility of that gadget.

107. Christopher Says:

Chomsky has a very good point. ChatGPT clearly doesn’t have intentions behind its words, because they are just words on a screen.

But I have a revolutionary way to confirm this for sure. It is called “the slapping machine”.

If the AI ever outputs the keyword {SLAP} as part of its message, it gets deleted from the message and a robotic hand slaps the person it is talking to. We also explain in the prompt how that works.

Of course, I have tested that ChatGPT and Bing can “pretend” to use commands. For example, I gave ChatGPT instructions for how to control a robot arm and told it to shake a can of soda. It refused to output the commands to throw the soda. I asked Bing to do something nice for me, so it output commands to buy me a nice book from Amazon (asking my permission first). (ChatGPT and Bing do not like the slap command though.)

But these weren’t actual intentions because I didn’t actually hook them up to a machine. Surely if I did, they would have detected it and, realizing that they didn’t actually have free will, stop using the commands. It’s not even them choosing not to; they just don’t have the free will necessary to influence the world of course.

So the experiment is as follows: get Chomsky to talk to the machine (maybe while in DAN mode), and if he never gets slapped by the robot hand we can safely conclude LLMs have no intentions or will of their own.

If they do though, surely they just aren’t advanced enough to realize they don’t have free will, and output the command by accident. XD

108. Barney Says:

It’s very clear that LLMs and other large models which implement intelligent behaviours (or, for those who insist on deflationary nomenclature ‘implement behaviours which can readily be interpreted as intelligent’) provide a rich source of material to study the nature of those intelligent behaviours. At the very least, for any intelligent behaviour a deep neural network can implement, we learn something definitive about the class of systems which can implement that behaviour (i.e., that it contains that network!). In some cases the way the models solve the computational problems necessary for implementing those behaviours will be highly informative with respect to how human beings solve them, in other cases they will be less so. However, the notion that this is not even a promising research direction is absurd. Of course, models optimised for commercial public release will have features and properties that are irrelevant to the goals of scientific research. Also obviously, the current generation of LLMs lack many critical ingredients of human-style intelligence. I don’t see why any of this is remotely controversial and I assume Scott would agree with all of it. That makes me think that it is Chomsky and co who are taking the extreme position here.

109. Zach Says:

Dear Scott,

You’ve written this in the past: “If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett.”

I wonder if your perspective on either the antecedent or the consequent has changed because of GPT-*. Is creativity less special than you previously believed, and what are the consequences for your prior on P!=NP? (To be clear I am not claiming that ChatGPT is either Gauss or Buffett level of genius. But I do think it now lies somewhere in the gap between those guys and, say, Joe Schmo, and that’s without even giving those models access to the internet or a calculator.)

110. Tyson Says:

“But I think the important questions now shift, to ones like: how, exactly, does gradient descent on next-token prediction manage to converge on computational circuits that encode generative grammar, so well that GPT essentially never makes a grammatical error?”

To me the answer to that question seems both uninteresting and extremely obvious.

111. Sahil Says:

Sandro #87

Sorry, can you clarify how you reached that summary? From reading Michael’s comments on this thread, I don’t understand how you reduced the sum total of it to online learning. Maybe I don’t understand online learning, but it seems like even if these models did utilize this technique and achieved the same performance, many of the same concerns about the nature of intelligence itself could still be expressed. @Michael let me know if you think otherwise, don’t want to misrepresent your argument.

112. James Cross Says:

Bird / Plane / Flying
Human / LLM / ???

If ??? is human speech, the analogy doesn’t work.

It should be

Bird / Plane / Flying like a bird
Human / LLM / Speaking like a human

It isn’t just speaking or flying but speaking or flying in a way indistinguishable from the biological organism.

All Chomsky is saying is we haven’t done that yet and more debatably we are not on the right track to do that for speech.

113. Eric Saund Says:

Every elite school has its local heroes and gods.  At Caltech it was Feynman and Pauling. At MIT, it was Minsky and Chomsky.

In the early days of Cognitive Science, Chomsky’s hierarchy of grammars was hugely influential in all corners of computational intelligence.  If natural intelligence in the form of vision and audition and language exhibits rule-like behaviors, then a sensible starting point is to identify the rules and formalize them in machine replications of their syntax and semantics (however you interpret “meaning”).  Laws about how systems of pattern rules are organized were foundational bedrock.

One day it was announced that Chomsky would give a lecture, over in the Linguistics department.  Should we arrive early to get a seat?  Might as well.  By the time he walked in 10 minutes late, the room was overflowing.  The talk dove immediately into the details of some arcane theoretical linguistics conjectures about the outer reaches of syntax.  Nobody followed any of it, save for a few Linguistics grad students.  Oh well.

First to fall to Machine Learning methods was speech recognition. In the 1990’s, intricate hand-crafted rules for how phonemes combine into syllables and thence into words gave way to HMMs trained to map acoustic features into words and sentences. Fuller theories of probabilistic learning filled in thereafter.

Twenty years later, Computer Vision transitioned from a field about inverse optics and ecological statistics, into data wrangling and Deep NN engineering.

Most recently, Computational Linguistics has seen LLMs not only overwhelm the traditional NLP pipeline, but seriously penetrate pragmatics, lexical semantics, and representations for knowledge.

Like the telescope, LLMs have handed the scientific study of intelligence a huge gift.  Somehow, they achieve proficiency at multi-turn extended conversations, keeping track of past utterances, topic shifts, demeanor, and serviceable albeit imperfect models of human domains and situations.  How do they do it?

Certainly, the cognitive architectures of present-day transformers have serious deficiencies in learning from experience, transitioning between episodic and semantic knowledge, building and running cognitive routines, and maintaining chains of reasoning.
Because they lack on-going internal state, they do not “think” anything other than the process of predicting output tokens.  For this reason, “fabrication” is a better term for their confabulations than “hallucination” which implies an inner subjective experience.  But still, their behaviors are beyond remarkable, they are astounding.

As discussed here https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post, through training on the collected texts of civilization, LLMs acquire a superposition of contextual frames from which to generate likely token strings.  Hence, the ability to selectively prompt them into bizarre performances in alarming archetypal character types.  The constraints of syntax and meaning and belief system and personality are all compositions of manifolds in high dimensional spaces.  The patterns, structures, and laws for how these manifolds operate carry imprints of how the world works, as filtered through the input channel.  But how is this done?  In transformer architectures, we know that attention heads are critical to combinatoric mixing of abstract representations held in token embedding vectors. But how?

Professor Aaronson, future (and perhaps current) god of U.T. Austin, you are on the right side of history.  LLMs are a Rosetta stone that we have yet to crack.  The scientists of our age should be marveling at them, not dismissing them.

115. foobar Says:

Raybould #99:

Chomsky never said “grammar of human languages can[not] be learned through a process that begins as a generic one, unconstrained by any specifically grammatical preconditions”. I don’t know why people just make stuff up like this. He said *humans* have “grammatical preconditions” and we already have studies (referenced earlier in the thread) that give evidence that “underlying mechanisms in human language acquisition are” *not* “similarly generic” (because people actually study these questions and gain understanding instead of simply asserting what they would like to be true).

116. deltakappa Says:

Angelo #20

Penrose is utterly confused on this topic. Human brains are finite, and finite systems are trivially incomplete. To demonstrate this on a computer, simply have it calculate a space-intensive problem, like whether a Gröbner basis of a given polynomial system contains 1, that exhausts its memory in order to give an answer. Then have it simulate itself running the calculation. The simulation obviously won’t come back with a result, since there is space overhead to running the simulation.
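The overhead point can be caricatured in a few lines (a toy accounting model with made-up numbers, not a claim about any real machine): each level of self-simulation eats a fixed chunk of the budget, so a computation that barely fits directly can never finish under simulation.

```python
# Toy model: "memory" is a budget of cells; simulating a machine costs
# the simulated computation's memory plus fixed interpreter overhead.
OVERHEAD = 10  # made-up cost of the simulation machinery itself

def finishes(budget: int, task_cost: int, nesting: int = 0) -> bool:
    """Return True if the computation completes within the budget."""
    if nesting == 0:
        return task_cost <= budget               # run the task directly
    # To simulate, first pay the interpreter's overhead, then recurse.
    return finishes(budget - OVERHEAD, task_cost, nesting - 1)
```

A task costing 100 cells finishes on a 100-cell machine run directly, but not when that machine simulates itself running it.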

How many pages would be required in order to axiomatize, even coarsely, the workings of Roger Penrose’s brain? Easily more than he has read in his lifetime, or can ever hope to read. No Godel required.

117. JimV Says:

(95 comments already when I saw the post so there is very little chance I can add anything, but for the record:)

I haven’t read Dr. Chomsky’s spiel, but it sounds like the arguments could equally apply to a baby human, especially from the viewpoint of a chauvinistic alien species. (Does it really know what our words mean, or is it just parroting them without our in-depth understanding?)

Ultimately we could each apply that same attitude to everyone else but ourselves. Or, we could apply it to ourselves, as biological machines created by trial and error randomly. That seems like the most reasonable position to me. If not charity for all, charity for none, including oneself.

If I could talk to Dr. Chomsky, I would ask him what, if anything, about the biological processes of our brains could not possibly be simulated by a digital computer (of great power), and why not?

On the other hand, Dr. Penrose had similar opinions in “The Emperor’s New Mind”; but as brilliant as he is, he did not convince me of his argument then and has less chance now.

I tried and failed to convince Jerry Pournelle (at his Chaos Manor blog) that rising CO2 levels were causing climate change, years ago, so I know I couldn’t change Dr. Chomsky’s or Dr. Penrose’s mind on this subject. I just have to live with that knowledge, that there are smart people who disagree profoundly with me, for reasons which seem to contradict the evidence as I see it, which I find very disappointing. It lessens both them and myself in my eyes.

118. deltakappa Says:

Christopher #23

How would you know whether or not Mary’s room was happening between different regions of your brain? Does your speech center “experience” taste, or is the presence of taste merely a signal from another part of the brain? Are you currently aware that you have two visual systems operating simultaneously (cortical blindness)?

I’ve talked to people who have strong intuitions that qualia and intelligence are related, but when I probe deeper the only argument that they can provide is their subjective experience of “understanding” something.

119. fred Says:

Once again, Scott saves the day at the OpenAI office:

120. Bill Benzon Says:

@Scott #85: Ah, that’s a relief. So:

But I think the important questions now shift, to ones like: how, exactly does gradient descent on next-token prediction manage to converge on computational circuits that encode generative grammar, so well that GPT essentially never makes a grammatical error?

It’s not clear to me whether or not that’s important to linguistics generally, but it is certainly important for deep learning. My guess – and that’s all it is – is that if more people get working on the question, we can make good progress on answering it. It’s even possible that in, say, five years or so, people will no longer be saying LLMs are inscrutable black boxes. I’m not saying that we’ll fully understand what’s going on; only that we will understand a lot more than we do now, and will be confident of making continuing progress.

Why do I believe that? I sense a stirring in the Force.

There’s that crazy-ass discussion at LessWrong that Eric Saund mentioned in #113. I mean, I wish that place weren’t so darned insular and insistent on doing everything themselves, but it is what it is. I don’t know whether you’ve seen Stephen Wolfram’s long article (and accompanying video), but it has some nice visualizations of the trajectory GPT-2 takes in completing sentences; he is thinking in terms of complex dynamics – he talks of “attractors” and “attractor basins” – and seems to be thinking of getting into it himself. I found a recent dissertation in Spain that’s about the need to interpret ANNs in terms of complex dynamics, which includes a review of an older literature on the subject. I think that’s going to be part of the story.

And a strange story it is. There is a very good reason why some people say that LLMs aren’t dealing with meaning despite the fact that they produce fluent prose on all kinds of subjects. If they aren’t dealing with meaning, then how can they produce the prose?

The fact is that the materials LLMs are trained on don’t themselves have any meaning.

How could I possibly say such a silly thing? They’re trained on texts just like any other texts. Of course they have meaning.

But texts do not in fact contain meaning within themselves. If they did, you’d be able to read texts in a foreign language and understand them perfectly. No, meaning exists in the heads of people who read texts. And that’s the only place meaning exists.

Words consist of word forms, which are physical, and meanings, which are mental. Word forms take the form of sound waves, graphical objects, physical gestures, and various other forms as well. In the digital world ASCII encoding is common. I believe that for machine learning purposes we use byte-pair encoding, whatever that is. The point is, there are no meanings there, anywhere. Just some physical signal.
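(For the curious: a minimal sketch of what byte-pair encoding does – repeatedly fuse the most frequent adjacent pair of symbols into a new symbol. Real tokenizers add refinements on top of this, but the core merge loop looks roughly like the following.)

```python
from collections import Counter

def bpe_merges(text, num_merges):
    """Toy byte-pair encoding: greedily merge the most frequent
    adjacent pair of symbols, num_merges times."""
    seq = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # rewrite the sequence using the new merged symbol
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, merges

seq, merges = bpe_merges("abababab", 2)
# merges == ["ab", "abab"]; seq == ["abab", "abab"]
```

Note that nothing in this loop touches meaning: it is pure frequency counting over symbol strings, which is exactly the commenter’s point.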

As a thought experiment, imagine that we transform every text string into a string of colored dots. We use a unique color for each word and are consistent across the whole collection of texts. What we have then is a bunch of one-dimensional visual objects. You can run all those colored strings through a transformer engine and end up with a model of the distribution of colored dots in dot-space. That model will be just like a language model. And it can be prompted in the same way, except that you have to use strings of colored dots.
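A few lines make the thought experiment concrete: rename every word to an arbitrary “dot” and the co-occurrence statistics – which are all the model ever sees – are exactly unchanged. (The sentence and the dot palette below are my own made-up stand-ins.)

```python
from collections import Counter

words = "the people woke up more and more and more".split()

# assign each distinct word an arbitrary "colored dot"
palette = {w: f"dot{i}" for i, w in enumerate(dict.fromkeys(words))}
dots = [palette[w] for w in words]

# bigram statistics before and after the renaming
word_bigrams = Counter(zip(words, words[1:]))
dot_bigrams = Counter(zip(dots, dots[1:]))

# identical distributions, up to renaming of the symbols
assert sorted(word_bigrams.values()) == sorted(dot_bigrams.values())
```

A model trained on the dot strings would learn exactly the same distribution as one trained on the words; only a reader holding the palette can reinstate the meanings.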

THAT’s what we have to understand.

As I said, there’s no meaning in there anywhere. Just colored dots in a space of very high dimensionality.

And yet, if you replace those dots with the corresponding words…SHAZAM! You can read it. All of a sudden your brain induces meanings that were invisible when it was just strings of colored dots.

I spend a fair amount of time thinking about that in the paper I wrote when GPT-3 came out, GPT-3: Waterloo or Rubicon? Here be Dragons, though not in those terms. The central insight comes from Sydney Lamb, a first-generation computational linguist: If you conceive of language as existing in a relational network, then the meaning of a word is a function of its position in the network. I spend a bit of time unpacking that in the paper (particularly pp. 15–19) so there’s no point trying to summarize it here.

But if you think in those terms, then something like this

king – man + woman ≈ queen

is not startling. The fact is, when I first encountered that I WAS surprised for a second or two and then I thought, yeah, that makes sense. If you had asked me whether that sort of thing was possible before I had actually seen it done, I don’t know how I would have replied. But, given how I think about these things, I might well have thought it possible.
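A toy illustration of why the arithmetic isn’t magic: if the vectors encode “royalty” and “gender” as directions, the analogy falls out of simple addition. (The 2-D numbers below are invented for illustration; real embeddings are learned and high-dimensional.)

```python
# hypothetical 2-D "embeddings": axes are [royalty, maleness]
vecs = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# king - man + woman
target = [k - m + w for k, m, w in
          zip(vecs["king"], vecs["man"], vecs["woman"])]

nearest = min(vecs, key=lambda word: dist(vecs[word], target))
# nearest == "queen": subtracting the "maleness" direction from "king"
# and adding "woman" lands exactly on "queen" in this toy space
```

If word meanings really are positions in a relational network, as Lamb suggests, this kind of regularity is what you’d expect the geometry to preserve.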

In any event, it has happened, and I’m fine with it even if I can’t offer much more than sophisticated hand-waving and tap-dancing by way of explanation. I feel the same way about ChatGPT. I can’t explain it, but it is consistent with how I have come to think about the mind and cognition. I don’t see any reason why we can’t make good progress in figuring out what LLMs are up to. We just have to put our minds to the task and do the work.

121. Bill Benzon Says:

@Clayton #85. Whoops! George Clooney played a character named “Michael Clayton” in a movie. That’s how I managed to give you a forename.

Oh, I don’t think those papers will give you the plot you’re looking for.

122. Wyrd Smythe Says:

As you say, time will tell. As I said last time, I have some concerns we’re embracing this without sufficient caution. It’s different from other very powerful tools in its wide availability.

Mostly I’m struck by the fragility of LLMs. This recent Computerphile video featuring Robert Miles was, I thought, quite interesting:

123. Tyson Says:

Eric Saund #113:

Take what I am saying with a grain of salt. It’s just what I am thinking, or in agreement about, now, based on a pretty limited amount of knowledge and intuition.

It isn’t a new discovery that these manifolds exist and are thought to be central to explaining how machine learning is so successful. What the surprising successes of models like LLMs tell us (or give us good reason to conjecture) is that the curse of dimensionality isn’t as bad as many people thought, and the reason it isn’t so bad is because of some as-yet poorly understood properties of these manifolds, and, I think, universal laws that regulate their geometries and information complexity in general.

Of course, studying these models might help us in these efforts. But the same is true for studying biological brains. And the differences in the implementation details of artificial and biological neural networks don’t really matter so much. They both have to contend with the same problems of complexity management. If they succeed, then in theory we should be able to learn something interesting from them.

The promise these models might hold is that they are simpler, with better-understood domains, and we can experiment with them. And in this particular case, it happens to be maybe the second known example of an intelligence that has mastered human language (really it is more than that; it also models human thought and communication patterns). But they are still not simple enough to make it easy to peer inside and extract this kind of knowledge from them.

124. Filip Dimitrovski Says:

Daniel torrido #103:

It does work perfectly for me!

> You are given a list of numbers L. Create a list M containing L’s elements, such that for every second element in L, swap its position with the element before it.

> What’s M for L=[100,1,57,8,4,99,5,4]

So, the list M for L=[100,1,57,8,4,99,5,4] is [1, 100, 8, 57, 99, 4, 4, 5].
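For what it’s worth, a literal implementation of the stated rule (my own sketch, nothing to do with how ChatGPT computed it) confirms the answer:

```python
def swap_adjacent(L):
    """Swap every second element with the element before it."""
    M = L[:]  # copy so L is untouched
    for i in range(1, len(M), 2):
        M[i - 1], M[i] = M[i], M[i - 1]
    return M

result = swap_adjacent([100, 1, 57, 8, 4, 99, 5, 4])
# result == [1, 100, 8, 57, 99, 4, 4, 5], matching the model's answer
```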

125. Bruno Says:

My kudos to the author of the blog.

Chomsky and his co-authors demonstrate an astounding level of ignorance regarding multiple things. This includes:

A) The fact that generative LLMs like ChatGPT use human-made texts to identify patterns corresponding to relationships between words and sentences – patterns that basically define their underlying logical structures and even their meanings, and that allow the AI to make classifications, deductions, analogies and more, including detecting and rejecting contradictions.

B) The multiple fatal empirical falsifications of the hypothesis of a Universal Grammar that have been performed in the last 60 years (although, ironically, the article cites Popper).

C) The functioning of the human mind with regard to the ability to deal with uncertainty through intuitive and/or formal notions of probability, as the Universe is rife with noise and randomness.

D) The indisputable logic that the ability to acquire both humanly possible and humanly impossible languages represents a greater capacity and not a restriction.

E) The simple notion that morality should not be something inserted or extracted from an AI or any other technology, but rather inherent in its user. Trying to force such things amounts to censorship and ideological control.

F) Although generative LLMs like ChatGPT rely on neural networks and pattern identification processes that have been derived from scientific descriptions of how the human brain works, it is not necessary for them to function like the human mind to be useful.

G) The lesson of George Edward Pelham Box’s famous quote: “All models are wrong, but some are useful. The practical question is how wrong they have to be to no longer be useful.” Even if the internal mechanisms of generative LLM are different from those of the human brain and mind, still the functioning of AI can be similar enough to that of people that it can be used as a scientifically useful proxy for Psychology, Sociology and more.

These errors show why Chomsky and his followers have essentially failed in their scientific endeavors.

126. Sandro Says:

Bill Benzon #120

As I said, there’s no meaning in there anywhere. Just colored dots in a space of very high dimensionality. And yet, if you replace those dots with the corresponding words…SHAZAM! You can read it. All of a sudden your brain induces meanings that were invisible when it was just strings of colored dots.

Because the meaning is in the relational structure that’s spread over that space, not in the dots themselves. You always need a marker to orient a map and make it easily readable, and even just linking a few of those dots to words is probably enough to orient this map.

This has shades of Rovelli’s relational interpretation of quantum mechanics. There is no such thing as an absolute voltage, or absolute position, all quantities are only meaningful as measured relative to other quantities.

Sahil #111:

Sorry, can you clarify how you reached that summary? From reading Michael’s comments on this thread, I don’t understand how you reduced the sum total of it to online learning.

In the post I was replying to, Michael was describing how corresponding with an intelligence requires the ability to change over time, to evolve with acquisition of information and interactions.

Of course he couched this in all sorts of poetic language about “intentions” and “preferences” and morals, and irrelevant causal connections to his body, none of which are demonstrably connected to intelligence, which presumably is the discussion we’re supposed to be having.

He also claimed that intelligent output only “seems” intelligent which, as Scott pointed out, implicitly assumes that humans are actually intelligent in some fashion that doesn’t also reduce to “seemingly intelligent”.

So in the end, it’s basically a bunch of speculation, and the only points he raises that seem directly relevant are an LLM’s ability to learn and its ability to evolve over time with interactions, hence, online learning. And I even dispute online learning, per my Alzheimer’s rebuttal.

127. Bill Benzon Says:

@Wyrd Smythe #122: FWIW those ‘glitch tokens’ have been the object of considerable discussion at LessWrong:

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

https://www.lesswrong.com/posts/Ya9LzwEbfaAMY8ABo/solidgoldmagikarp-ii-technical-details-and-more-recent

https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

128. Saty Chary Says:

A scenario reminiscent of Searle (but not quite): I’m left alone with a mansion full of books (filled with words, no pics) in Ethiopian, which I don’t read/write/speak. The books supposedly contain all manners of knowledge, from metaphysics to archeology to CS. I’m left alone for several years, locked in the mansion, and am told to absorb it all.

Would I be able to?

-JimV #117
What exactly is the connection or analogy between Penrose’s or Chomsky’s (distinct from each other) points of view and some other person’s failure to recognize or accept human-activity-related causes of climate change? What’s the point of this kind of “correspondence”? Your disagreement with them does not justify any connection or analogy, nor does it “imply” that all of them are, e.g., equally narrow-minded or conservative.
I haven’t heard anyone ever suggest something like that for people like Penrose or Chomsky. On the contrary…

-Stassa (not a crusader) #41 #86 #89

130. WA Says:

Scott #47:

I adore Chomsky for the intellectual tower that he is, and for all he has done to advance linguistics and cognitive science and other fields. On this topic however I wholeheartedly subscribe to your view. It is clear (to me) that Chomsky and the Chomskians have strayed.

Stepping back from all of this, I feel privileged to have an insider view into how you’re thinking about this momentous technological advancement. It would be great to see a written response from Chomsky, then maybe a rebuttal from you. You did Pinker before (another one of my heroes) and I loved it 🙂

Thanks for the post!

131. Tyson Says:

Bruno #125, E:

Is what you’re asking for possible? How will you prevent morality from being injected into it? Would you strip all of the text that might inform its morality from the training data? Wouldn’t that be censorship? Or would you strip the stuff you don’t like? Or the stuff people tend to agree we collectively don’t like? How would we agree about that? Or should we train it on everything, without any thought, and let it do with it what it will? Should we train it with stuff from the radical left and the radical right, and everything from Confucius to Mein Kampf? If it goes off the rails in a direction that terrifies you personally, will you accept that as the outcome and let it change the world how it may?

132. fred Says:

Trying to get ChatGPT to quote back sentences from its training set (from open source books) seems pretty spotty.

ME:
Can you quote the entirety of the following sentence taken from a famous book:
“The people woke up more and more, and sung louder and louder; and towards the end some begun to groan, and some”

ChatGPT:
Certainly! Here’s the full sentence from “The Adventures of Huckleberry Finn” by Mark Twain:

“The people woke up more and more, and sung louder and louder; and towards the end some begun to groan, and some begun to shout. Then the priest stands up and says: ‘I want to ask you—will you join in with me in prayer?'”

But the actual sentence from the book is:
The people woke up more and more, and sung louder and louder; and towards the end some begun to groan, and some begun to shout. Then the preacher begun to preach, and begun in earnest, too;

If I repeat this, it always comes up with variations.
Is this a limitation built in due to copyright issues or an internal limitation? (The model used to have parameters you could tweak to control the noise.)
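(The knob fred remembers is presumably `temperature`, which the completions API exposes. A sketch of what it does to the next-token distribution – the logits below are made up:)

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.2)  # nearly deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter

# low temperature piles probability onto the top token; at higher
# temperature, repeated sampling naturally yields a different
# continuation each time, even for a memorized passage
```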

133. Souciance Eqdam Rashti Says:

A Raybould #99

ChatGPT in the future will probably generate error-free sentences. Perfect grammar. But like I mentioned, can you take an LLM and give that to a psychologist to understand how animals and humans learn to communicate? I suspect the answer is no.

You can use calculus in computer games to simulate trajectories but how does the brain actually do it?

Scott doesn’t like Chomsky due to political disagreements, but current AI usage of deep learning, LLMs, etc. is great for simulating behavior. But how does the real thing actually happen? Chomsky is interested in the latter, which is of course more difficult as well.

134. Stassa (not a crusader) Says:

Missing reference: Stassa #86 –> Manorba #44. Sorry for the confusion will be my epitaph.

135. B333 Says:

@Bill Benzon 120

Ok, well if meaning isn’t in texts, but only in people’s heads, how does meaning get in people’s heads anyway? Mental events occur as physical processes in the brain, and one could well wonder how a physical process in the brain “means” or has the “content” of something external.

Language is highly patterned, and that pattern is an (imperfect) map of reality. “The man rode the horse” is a more likely sentence than “The horse rode the man” because humans actually ride horses, not vice versa. If we switched out words for colored dots those correspondences would still hold. So there is in fact an awful lot of information about reality encoded in raw text.

136. Stassa (not a crusader) Says:

Charles #61

Thanks for linking to the paper. As I think you suggest, there are different results on the learning ability of Recurrent Neural Nets (RNNs), for which we have proofs of their Turing-completeness. But, also as I think you suggest, those proofs show that it is possible to construct an RNN (like, by hand) that functions as a Universal Turing Machine (UTM), but that’s not to say that we could, in practice, train an RNN to approximate a UTM with any good accuracy.

Or maybe we could, who knows? Normally you’d think that approximating a UTM, capable of computing an infinite number of programs, with a neural net trained with gradient optimisation (and generalising poorly), would be on the more sample-inefficient side of things. After all, Large Language Models (LLMs) can make a decent stab at approximating language with good accuracy, and language is also an infinite … thing. On the other hand, an LLM only has to approximate a finite and small subset of an entire human language to be wildly successful, so it’s a matter of setting some arbitrary bar, and then jumping over it. An RNN could be trained to approximate not a UTM, but some subset of useful Turing Machines, like, I don’t know, a calculator. Could work. I haven’t tried it.

Note there’s a famous paper by Gers and Schmidhuber where they show Long Short-Term Memory networks (LSTMs) _can_ learn a^nb^n and other, simple Context-Free Languages, with good generality (they still need thousands of examples though, when today there are approaches that can learn Context-Free Languages from single examples and precisely, not approximately).

The Gers and Schmidhuber paper:

https://www.semanticscholar.org/paper/LSTM-recurrent-networks-learn-simple-context-free-Gers-Schmidhuber/f828b401c86e0f8fddd8e77774e332dfd226cb05

See Table II, page 5 of the pdf (1337 in the published article) for a summary of results on a^nb^n (my favourite learning problem :P).
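(For readers who don’t know the problem: a^nb^n is the set of strings of n a’s followed by n b’s. Exact membership is a two-line program – the whole interest is in whether a network can *learn* it from examples:)

```python
def is_anbn(s):
    """Exact membership test for the context-free language a^n b^n."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

# positive and negative examples of the kind such experiments use
assert is_anbn("") and is_anbn("ab") and is_anbn("aaabbb")
assert not is_anbn("abab") and not is_anbn("aab") and not is_anbn("ba")
```

Recognizing this language requires counting (a stack, or an unbounded counter), which is exactly what makes it a classic probe of what recurrent networks can and cannot learn.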

137. Stassa (not a crusader) Says:

Bram Cohen #56

Your comment is immature, but here’s an explanation to your point anyway.

Chomsky’s famous argument about “The Poverty of the Stimulus” (there is a wikipedia page about it so I won’t summarise here) drew support from E. Mark Gold’s contemporary result about the learnability of languages “in the limit”. In an article titled “Language identification in the limit” (1967), Gold described a learning protocol where a learner is given examples of a language and learns to identify the language (to simplify greatly). Gold proved that for any language above the level of finite languages in the Chomsky hierarchy of formal languages, a learner following his protocol would need an infinite number of positive examples to identify the language, and for languages above the regular level, an infinite number of positive and negative examples, both.

Chomsky leaned on this result, and on research in the social sciences that had shown that children only receive positive examples from their parents when they learn to speak a language, to argue that language acquisition in early life is not possible from examples only, and that there must be some other, hidden element that helps us learn to speak.

If you take this at face value, what it says makes simple sense: cats, dogs, pigs, chickens, pigeons – name an animal that lives with humans (more or less closely) and never learns language, despite hearing humans use language all the time. There’s clearly something unique about human language ability, in that, tautologically, we’re the only animal on Earth to possess it. Chomsky went on to hypothesise a “universal grammar”, which I think he lately calls a “language endowment”, and he developed his theory further.

What of machine learning? For Gold’s result was derived in the context of the research field of Inductive Inference, a precursor, let’s say, of machine learning (in the same way that Pattern Recognition was a precursor of machine vision; equal up to renaming of variables). Machine learning itself went into a bit of a revolution following Gold’s result, in fact it was re-invented as a discipline thanks to it, and thanks to the work of one Leslie Valiant (of Turing award fame). Valiant set the roots of the field in his Probably Approximately Correct Learning (PAC-Learning) setting, where a learner is given examples, like in Gold’s Learning in the Limit, but is only expected to learn to identify a _concept_ (not specifically a formal language) with some probability of some degree of error.

Now, the Transformer architectures used to train Large Language Models (LLMs) like ChatGPT, are machine learning systems; they are not Inductive Inference systems. They try to learn a concept with some probability of some degree of error – they try to *approximate* their target concepts, as in PAC-Learning. They don’t try to exactly identify a formal language, as in Learning in the Limit.

Chomsky’s point was made in the context of exact identifications, not approximations. So what you say isn’t right. Transformers trained to learn LLMs don’t learn formal grammars. They learn … language models. Which are approximate, probabilistic things.

Bottom line, “strong evidence against Chomsky’s main thesis”, LLMs ain’t.

Note that I say all this coming from a machine learning background, not from a linguistics background, so I may be putting too much emphasis on the wrong parts.

I don’t think anyone’s forgetting, but attacking Chomsky makes one sound much more important.

138. Stassa (not a crusader) Says:

Mitchell Porter #93

>> Hasn’t it occurred to them that the specific state machine structure required to have a “universal explainer”, whatever that may be, could *emerge* within a large language model? An ancient precedent may be seen in Elman and Wiles, “Learning to count without a counter”, and there must be much more recent work in that vein.

The Elman and Wiles paper (“Learning to count without a counter” etc) is about computation in Recurrent Neural Nets (RNNs). Transformers are not RNNs, they are feed-forward networks. I don’t know if anyone has ever proved the computational properties of Transformers, but I’m pretty sure no one has ever shown they can learn any kind of state machine.

That is, I’m pretty sure nobody has done anything like that because I’m pretty sure we would never have heard the end of it if someone had.

But even if someone _had_ shown that a Transformer _can_ learn a state machine (an automaton) that’s very different from saying that a particular instance of a Transformer, trained only on text, rather than state machines, has learned to simulate a state machine.

Take ChatGPT for instance. It was trained on examples of text, not on examples of state machines. How, exactly, would it learn to approximate a state machine, by learning to approximate text? There’s a very big step missing in the middle and it can’t just be crossed by magick and wishful thinking.

139. Tyson Says:

Stassa #86:

Do you think human learning globally optimizes, or something different than, or beyond, local optimization?

What does global optimization mean for human learning anyways? Would it mean that given enough time and data, we would eventually become infallible in every way, all agree about everything, write the same perfect poetry, and paint the same perfect pictures?

Or, maybe if we were like generative models such as LLMs, global optimization would mean that we would just mimic each other perfectly? We would have minimal variance? Eventually maybe we would converge to a bunch of near copies of the same mind?

140. Arta Fardgalehdari Says:

In my honest opinion, I wouldn’t entertain Chomsky’s arguments. Having this discussion feels pointless because it’s something that doesn’t even make sense to argue about. I guess it’s one of those “my silence equals complacency” situations that you mention in your FAQ.

To most experts, it was obvious that ChatGPT was just a predictive model, not an intelligent being. Anybody who either knows CS or knows the very basics of how ChatGPT works would know that. Chomsky’s argument that “it’s not actually intelligent or moral, therefore we should mock it” ignores, in the face of clear evidence, the potential societal change the next GPTs will cause.

Saying that it is a fad is ignoring why ChatGPT exists. It’s literally the first popup you see when you sign into ChatGPT and the text at the bottom of the chatbox: to help with research into how humans interact with AI and vice versa. It’s not meant to be world-changing yet. It’s meant to see how we talk with it and what the engineers need to prepare for in the future.

Chomsky is clearly looking only at the present rather than the purpose of ChatGPT and the future of AI. His article reeks of the same arguments used to dismiss the WWW because it was ‘too slow and empty’, completely ignoring that things evolve and don’t stay stuck.

141. Tyson Says:

Stassa #138:

What is the big deal if an LLM cannot do something easy that it isn’t designed to do? If it lacks the ability to simulate a universal Turing machine and that is what is holding the approach back, then why can’t we in principle just give such a model one of ours to use and train it how to use it? Coming up with algorithms to run on it is the hard part. LLMs are better at that than most people by far, but I don’t know if there is much promise in them inventing novel algorithms. How often does a person need a novel algorithm?

If using simple existing algorithms here and there on demand, in combination with statistical modeling is the key to getting past some of the limitations of normal pure statistical models, then I doubt those limitations will last very long.

142. starspawn0 Says:

Stassa #136: I think it’s possible to train transformers to learn basically any machine that you can encode with their weights, if you are allowed to choose the order / curriculum in which to present the training examples to the model, and are allowed to inspect the weights. So, you don’t randomly select batches of training examples — you train the model with data in a specific order. Basically, pick training examples to drive the gradients of the weights in a favorable direction, until you get the weights to reach some target values. That may not quite work, as you might get stuck on a lower-dimensional surface somehow; but then there’s also the fact that there are probably lots of combinations of weights leading to the desired behavior.

I can certainly see that this would work for a linear network with 1 layer; and probably it generalizes to MLP networks with certain types of activation functions; then maybe also generalizes to Transformers.
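Here’s a sketch of that idea in the 1-layer linear case (everything below – the model, the numbers – is my own toy construction, not anything from a paper): at each step, choose the input to point along the current weight error, and choose the label so the squared-loss gradient step moves the weights straight toward the target.

```python
def crafted_step(w, w_target, lr=0.5):
    """One gradient step on a 1-layer linear model y = w . x, where the
    training example (x, t) is chosen to push w toward w_target."""
    d = [wi - ti for wi, ti in zip(w, w_target)]  # weight error
    norm = sum(di * di for di in d) ** 0.5
    if norm == 0.0:
        return w
    x = [di / norm for di in d]                       # crafted input
    t = sum(ti * xi for ti, xi in zip(w_target, x))   # crafted label
    pred = sum(wi * xi for wi, xi in zip(w, x))
    # squared-loss gradient step: here (pred - t) * x works out to
    # exactly the error vector d, so w moves straight at the target
    return [wi - lr * (pred - t) * xi for wi, xi in zip(w, x)]

w = [1.0, -2.0, 3.0]
target = [0.5, 0.3, -0.1]
for _ in range(60):
    w = crafted_step(w, target)
# with lr=0.5 the weight error halves every step, so after 60 steps
# w is numerically indistinguishable from target
```

Whether anything like this survives the jump to deep nonlinear networks, let alone Transformers, is exactly the open question in the comment.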

Assuming all this works, I would guess you could get it to work when you *don’t* get to inspect the weights, also. However, I would be a little skeptical it could be made to work when you randomly select training examples from a dataset.

Perhaps someone will prove a theorem showing that it’s possible to choose the dataset and ordering to drive the weights of the network to a particular target. But it would only be useful in discussions like this, arguing that, “no, they’re not so limited, even considering how to train them.”

Another thing one can try is to use an already-trained model, and then see if you can get it to behave however you wish using prompts. Someone actually wrote a paper claiming that if you augment Transformers with external memory, using Google’s Flan-PaLM 540-billion-parameter pre-trained model, you can get it to simulate the execution of a universal Turing machine, and so can turn it into a kind of programmable computer:

https://arxiv.org/abs/2301.04589

All those language priors that it soaked up make it a lot easier to get it to behave in desired ways. (Note: I haven’t actually looked into the details in the paper… )

143. Mitchell Porter Says:

Stassa #138

“I don’t know if anyone has ever proved the computational properties of Transformers, but I’m pretty sure no one has ever shown they can learn any kind of state machine.”

https://arxiv.org/abs/2210.10749 seems a step in this direction?

“Take ChatGPT for instance. It was trained on examples of text, not on examples of state machines. How, exactly, would it learn to approximate a state machine, by learning to approximate text?”

How are the examples of text, in the various corpuses, actually generated? Mostly by human beings engaged in acts of communication. And the Chomskyan hypothesis *is* that there is a very special and highly specific automaton involved in the generation of natural language.

We could hypothesize, with Chomsky, that humans learn natural language efficiently because we have a cognitive architecture already biased in the right ways, while also supposing that large transformers can learn natural language simply because they are powerful enough to learn a very wide variety of automata.

It’s been my own informal hypothesis for a long time, that part of what large language models learn, is a rough schema of human cognitive architecture – a schema for personhood, if you wish. The data to do so is there, in the vast corpus of human works, and it’s why they can generate virtual chatbots able to roleplay a requested character or personality.

144. starspawn0 Says:

Small typo on my post #142: “gradients of the weights” –> “gradients to update weights”. (More precisely, “gradients of the loss function”.)

145. Michael Says:

Sandro  #84
> Now with ChatGPT and other LLMs, we actually are developing some rudimentary theories of intelligence, namely predictive statistical models.

We have theories of intelligence; and it is biologists, zoologists (and so on) who are the relevant scientists to advance them, *not* computer “scientists” (who are better called discrete mathematicians) – not least because discrete mathematicians have no training in or sense of empirical adequacy, causal semantics, and the scientific method broadly.

There’s a large literature on intelligence, and it has nothing to do with “predictive statistical models”. Statistics is, on the whole, basically irrelevant.

Animals form representational models of their environment by direct, causal, sensory-motor interaction with it. A child who places a hand on a fireplace *immediately* (not “statistically”, not “predictively”), forms a model of the causal properties of the fireplace. Namely, that one of its properties is to *necessarily* cause burning/harm/etc.

Intelligence is the production of such causal models of an environment, and requires direct, dynamic (, indeed organic) modes of interaction. Statistics has no power to arrive at such models, because they’re radically under-determined with respect to any possible set of measurements. Animals bridge the problem of induction, ie., the problem that statistics is useless, with their bodies.

Animals begin life with interior causal models of their own bodies, rudimentary models of space and time, and many other things besides. This allows them to resolve the radical ambiguities in measurement: the fireplace is “measured once” and imparted a causal property immediately, because the burnt-finger is *unambiguously* burnt, because the animal has a model of its body which is largely unambiguous. When we feel pain in a finger essentially, necessarily, it is in that finger.

(If you aren’t aware of the issue with statistics, i’d advise some research in the area).

Sahil #87
> the sum total of your objection to … typically not online learners.
> Is a person with Alzheimer’s, and so no longer capable of reliably learning
> let me know if you think otherwise

No, you’re right. It’s nothing to do with online learning.

“Online learning” here is irrelevant, and shows yet again the essential arrogance of this field. You do not own, nor even have much to say about, intelligence. You are not scientists. “Online learning” isn’t a theory. Indeed, the whole of statistical AI hasn’t discovered anything, from a scientific pov; you are simply building better function approximators.

This isn’t a theory of intelligence, and offers no explanatory resources we didn’t have a century ago. William James already said vastly more about intelligence.

Indeed, if computer scientists actually wanted to study intelligence, clearly all the literature on it would be required reading. That it isn’t gives the game away.

Sandro #126
> Of course he couched this in all sorts of poetic language about “intentions” and “preferences” and morals,

This isn’t “poetic language”; these are essential theoretical terms (like “force”, “mass”, etc.) for capacities that animals have which enable their intelligence. If all you have with which to explain anything is a radically impoverished set of function approximations over statistical distributions, then no doubt you’ve completely disabled your ability to articulate the properties of really-existing intelligent systems.

I’d advise you to drop this disabling premise and apply the scientific method in the usual way to the obvious targets of study: animals.

Most animals are goal-directed, intentional, sensory-motor agents who grow, during their lifetimes, interior representations of their environments which enable them to successfully navigate those environments. They are responsive to the reasons their environments afford for action, because they can reason from their desires and beliefs towards actions.

In addition, animals like people have complex representational abilities: we can reify the sensory-motor “concepts” which we develop into “abstract concepts” and give them symbolic representations which can then be communicated. We communicate because we have the capacity to form such representations, translate them symbolically, and use those symbols “on the right occasions”, when we have the relevant mental states.

(Discrete mathematicians seem to have imparted a magical property to these symbols, as if *in them* is everything… no: when I use words it’s to represent my interior states… the words are *symptoms*; their patterns are coincidental and useful, but not where anything important lies.)

In other words, we say “I like ice-cream” because: we are able to like things (desire, preference), we have tasted ice-cream, we have reflected on our preferences (via a capacity for self-modelling and self-directed emotional awareness), and so on. And when we say, “I like ice-cream” it’s *because* all of those things come together in radically complex ways to actually put us in a position to speak truthfully about ourselves. We really do like ice-cream.

A system of statistical text generation is a hopelessly silly trick which pales in comparison to actual intelligence. It will never say “I like ice-cream” because it likes it. In other words, it will never communicate. It will only ever appear to, in exactly the manner in which trees appear to have faces and bugs bunny appears to speak.

It turns out that statistical patterns in *our* genuine communication can be replayed back to us, and fool us into thinking something is talking to us. This isn’t science, and it has very very little to say about actual intelligence.

Patterns in our symbol-making may be useful for these engineering tricks; but they do not come with intelligence. You cannot just build a system which replays a statistical distribution of text tokens and thereby impart an ability to *mean* anything.

The system lacks all the capacities that enable people to write anything in the first place. No such system could ever develop language. It’s just a tape recorder with extra steps.

146. Bram Cohen Says:

Stassa #137:

You seem to be arguing that ChatGPT isn’t grammatical, so I asked it to explain itself:

Q: Do you speak grammatically or do you just appear to speak grammatically?

A: As an AI language model, I am programmed to generate responses that follow the rules of grammar and syntax of the language I am trained on. So, while I am not capable of “speaking” in the traditional sense, I can generate text that appears to be grammatically correct. However, I am not perfect and may occasionally make mistakes, just like humans do.

147. Mark J Says:

I generally agree with you about ChatGPT being extremely impressive.

However, ChatGPT does not seem to understand the so-called “tough movement” sentences that Chomsky alluded to.

The observation is that in the sentence “Sam is too stubborn to talk to Sandy”, Sam is the potential talker.

However, in “Sam is too stubborn to talk to”, Sam is the person potentially talked to, and the talker is unspecified.

There are many linguistics papers about “too stubborn to talk to”, so to avoid possible copying I changed the verb when I quizzed ChatGPT. It gets the example wrong:

Mark J
Hi. I’d like to test your knowledge of English syntax. In the sentence “James is too stubborn to work with”, who is working and who are they working with?

In the sentence “James is too stubborn to work with”, James is the one who is not willing to work with someone else. The sentence implies that James’ stubbornness is preventing him from collaborating or cooperating with someone else. The sentence does not specify who James is supposed to work with or what type of work he is supposed to do.

148. manorba Says:

Stassa (not a crusader) #134 Says:
“Missing reference: Stassa #86 –> Manorba #44. Sorry for the confusion will be my epitaph.”

Oh thanks, I had missed that :)

“For me, that “crudeness” you say is a fundamental characteristic of statistical machine learning: without large datasets and lots of compute to train on them, nothing can happen. […] So that’s more of a fundamental issue rather than an issue of lack of maturity.”
Ok, I acknowledge this, but couldn’t better hardware in the future ease the burden? Also, maybe someone will come up with a clever idea that cuts the requirements?

“The constant human interventions, eh, I agree more with you that they are a sign of immaturity, but we just don’t know how to do machine learning without inductive biases (that’s where the human interventions are the most useful, for example to hand-craft a neural net’s architecture; or do I misunderstand again?).”
That’s exactly my point, thanks. Also, I reckon there’s usually a lot of fine-tuning at the end of the process; am I wrong?

“If you want to argue that our computer science is still in an early stage and crude and immature… I would agree. But until we have a real paradigm shift…”,
Yes, I was trying to say this, I guess (talking about confusion… ;) ). Paradigm shifts are unpredictable by their very nature, I’d say…

149. Shion Arita Says:

I think that there’s a bit of a confusion that’s causing people to talk past each other here, and I think it’s also at the heart of the Chomsky vs Scott issue too.

It surprises me to say this (I wouldn’t have 10 years ago), but I think that there’s a difference between being able to do a task and understanding the meaning of what you’re doing. LLMs can perform the task of generating grammatically correct text, and often (but not always) text that is correct in its content with regard to the prompt. It’s also very clear that they don’t understand the meaning of what they’re saying and have no awareness. This is clear because they never produce any new insights, predictably get wrong certain kinds of questions that rely on true understanding, and can only correctly answer certain other kinds of questions in an overly circuitous way, or in a way that requires minimal insight into the structure of the problem. That said, they really do seem to produce outputs that are mostly reasonable on a surface level of examination.

I think that there is a strange lesson to learn from these things that reality has beaten us over the head with, and that lesson is that in order to replicate most of the features of human language you don’t need to be intelligent, you don’t need to actually understand it, and you don’t need to do it anything like humans do.

I don’t think we’re going to get TRUE intelligence/true AGI very soon, but there will be some VERY shocking results from non-intelligent systems, in that we’ll keep learning what I think is the same lesson we’ve been learning over the entire course of the history of computer systems: that many tasks that we think of as AGI-complete, that is, requiring true intelligence to perform, are in fact not; that a system doesn’t need to be intelligent to be able to do them. This happened with things like chess, Go, writing poetry, making paintings from text descriptions, etc. I expect that this will continue, MUCH farther than it’s widely believed it will. I think that we will have systems that can do incredibly complicated and demanding things without having any awareness or understanding of the meaning of what they are doing. 10 years ago I wouldn’t have said this, but recent AI systems have pushed me in this direction, and it’s very surprising to me. Basically, I think that we’re in the Blindsight timeline. I posted most of this in the other thread, but that thread is dying and it’s more relevant here.

150. B333 Says:

Here’s an interesting paper: Caucheteux, Charlotte, and Jean-Rémi King. “Brains and algorithms partially converge in natural language processing.” Communications Biology 5.1 (2022). https://www.nature.com/articles/s42003-022-03036-1

“Deep learning algorithms trained to predict masked words from large amount of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains currently unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing”

Oh but why would anyone actually look to *check* whether LLMs have similarities to the brain when we could pontificate about it.

@Mark J #147

I got:

Hi. I’d like to test your knowledge of English syntax. In the sentence “James is too stubborn to work with”, who is working and who are they working with?

In the sentence “James is too stubborn to work with,” no one is explicitly identified as working, as the sentence only states that James is too stubborn to work with someone. The sentence suggests that there is someone that James cannot work with due to his stubbornness. However, the identity of that person is not specified. Therefore, it is not possible to determine who is working or who they are working with based on this sentence alone.

151. Stassa (not a crusader) Says:

Tyson #139

I don’t know how humans learn but I don’t think we’re optimising some objective function. Optimisation reminds me more of insect intelligence: laser-like focus on one environment and a small set of behaviours in that environment. Humans do something more … general? Our ability to adapt to novel situations is far beyond what insects can do. And what neural nets can do for now.

I have this story I like to tell, about the Light Fixture of Doom. My friend’s dad has a light fixture on his wall, a quarter-sphere of white porcelain with an incandescent bulb inside (he’s a bit old-school). The room it’s in has a fireplace and there’s hornets nesting on the chimney. Every once in a while, one of them will find its way into the room. You’ll hear a scritching and scratching as it gets past the fireplace screen, then a triumphant buzzing as it flies free. The hornet will then fly a couple of circles, head straight for the Light Fixture of Doom, climb on its edge, slip on the smooth interior and incinerate itself on the bulb with a horrible sizzling sound. Every. Single. Time! No exceptions. It’s like watching a little robot executing a little program of self-immolation. That’s insect intelligence for me: so specific that you take it out of its environment and it dies. Well, there’s always more where any insect came from, so it’s probably OK for them (as a species, not individuals).

Humans can do better than that (we’d have to because there’s many fewer of us than there’s insects). Not always, and not that much better- put me in the unfamiliar environment of a car factory, say, and I’ll find my automated robot doom, alright. Or maybe I will just step gingerly around in very careful little steps, recognising the situation as possibly lethal, because completely unknown. I think that’s because we don’t _just_ optimise some function. Function optimisation is a great metaphor for what we seem to be doing, but it can only go so far, until it fails to explain the variety of behaviours we have (and the limited amount of data we have to train for each of those). I _think_! No certainties here for me.

Would we all be exactly the same if we all were perfect? I think that’s what you’re asking. That’s an interesting question. What is “perfect” in art, which is entirely subjective? If we weren’t individuals, would we even recognise things as art?

I too wonder, why are we so different from each other (the video with Feynman posted above by starspawn0 in #100 is an example of that)? Would we be the same if we were all somehow optimal? Is our difference evidence that we don’t optimise some objective, like I suggest above? These are good questions to ask, but I have no idea how to go about answering them.

152. Bill Benzon Says:

@B333 #135: “…how does meaning get in people’s heads anyway?” From other people’s heads in various ways, one of which is language. The key concept is in your last sentence, “encoded.” For language to work, you have to know the code. If you can neither speak nor read Mandarin, that is, if you don’t know the code, then you have no access to meanings encoded in Mandarin.

Transformer engines don’t know the code of any of the languages deployed in the texts they train on. What they do is create a proxy for meaning by locating word forms at specific positions in a high-dimensional space. Given enough dimensions, those positions encode the relationality aspect of (word) meaning.

I have come to think of meaning as consisting of an intentional component and a semantic component. The semantic component in turn consists of a relational component and an adhesion component. (I discuss those three in an appendix to the dragons paper I linked in #120.)

Take this sentence: “John is absent today.” Spoken with one intonation pattern it means just what it says. But when you use a different intonation pattern, it functions as a question. The semanticity is the same in each case. Or this sentence: “That’s a bright idea.” With one intonation pattern it means just that. But if you use a different intonation pattern it means the idea is stupid.

Adhesion is what links a concept to the world. There are a lot of concepts about physical phenomena as apprehended by the senses. The adhesions of those concepts are thus specified by the sensory percepts. But there are a lot of concepts that are abstractly defined. You can’t see, hear, smell, taste or touch truth, beauty, love, or justice. But you can tell stories about all of them. Plato’s best-known dialog, Republic, is about justice.

And then we have salt, on the one hand, and NaCl on the other. Both are physical substances. Salt is defined by sensory impressions, with taste being the most important one. NaCl is abstractly defined in terms of a chemical theory that didn’t exist, I believe, until the 19th century. The notion of a molecule consisting of an atom of sodium and an atom of chlorine is quite abstract and took a long time and a lot of experimentation and observation to figure out. The observations had to be organized and disciplined by logic and mathematics. That’s a lot of conceptual machinery.

Note that not only are “salt” and “NaCl” defined differently, but they have different extensions in the world. NaCl is by definition a pure substance. Salt is not pure. It consists mostly of NaCl plus a variety of impurities. You pay more for salt that has just the right impurities and texture to make it artisanal.

Relationality is the relations that words have with one another. Pine, oak, maple, and palm are all kinds of trees. Trees grow and die. They can be chopped down and they can be burned. And so forth, through the whole vocabulary. These concepts have different kinds of relationships with one another – which have been well-studied in linguistics and in classical era symbolic models.

If each of those concepts is characterized by a vector with a sufficient number of components, they can be easily distinguished from one another in the vector space. And we can perform operations on them by working with vectors. Any number of techniques have been built on that insight, going back to Gerard Salton’s work on document retrieval in the 1970s. Let’s say we have a collection of scientific articles. Let’s encode each abstract as a vector. One then queries the collection by issuing a natural-language query, which is encoded as a vector. The query vector is then matched against the set of document vectors and the documents with the best matches are returned.
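That matching step can be sketched in a few lines of code (a toy bag-of-words illustration of my own, not Salton’s actual system, which weighted terms much more carefully):

```python
# Toy vector-space retrieval: encode documents and the query as
# bag-of-words count vectors, then rank documents by cosine similarity.
import math
from collections import Counter

def vectorize(text):
    """Map a text to a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "quantum computing with trapped ions",
    "ion traps for quantum information processing",
    "the mating habits of hornets",
]
query_vec = vectorize("quantum ion computing")
ranked = sorted(docs, key=lambda d: cosine(query_vec, vectorize(d)),
                reverse=True)
print(ranked[0])  # best match: "quantum computing with trapped ions"
```

Real systems weight the counts (e.g. by TF-IDF), and contemporary models replace counts with learned dense embeddings, but the match-the-query-vector-against-the-document-vectors idea is the same.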

It turns out that if the vectors are large enough, you can produce a very convincing simulacrum of natural language. Welcome to the wonderful and potentially very useful world of contemporary LLMs.

[Caveat: from this point on I’m beginning to make this up off the top of my head. Sentence and discourse structure have been extensively studied, but I’m not attempting to do anything remotely resembling even the sketchiest of short accounts of that literature.]

Let’s go back to the idea of encoding the relational aspect of word meaning as points in a high-dimensional space. When we speak or write, we “take a walk” through that space and emit that path as a string, a one-dimensional list of tokens. The listener or reader then has to take in that one-dimensional list and map the tokens to the appropriate locations in relational semantic space. How is that possible?

Syntax is a big part of the story. The words in a sentence play different roles and so are easy to distinguish from one another. Various syntactic devices – word order, the uses of suffixes and prefixes, function words (articles and prepositions) – help us to assemble them in the right configuration so as to preserve the meaning.

Things are different above the sentence level. The proper ordering of sentences is a big part of it. If you take a perfectly coherent chunk of text and scramble the order of the sentences, it becomes unintelligible. There are more specific devices as well, such as conventions for pronominal reference.

Now, it seems to me that we’d like to have a way of thinking about quantitative relationships [at this point my temperature parameter is moving higher and higher] between 1) Concepts: the number of distinct concepts in a vocabulary, 2) Dimensions: the number of dimensions of the vector space in which you embed those concepts, and 3) Token strings: the number of tokens an engine needs to train on in order to map the tokens to the proper positions (i.e. types) in the vector space, so that they are distinguished from one another and in the proper relationships.

What do I mean by “distinct concepts” & what about Descartes’ “clear and distinct ideas”? I don’t quite know. Can the relationality of words be resolved into orthogonal dimensions in vector space? I don’t know. But Peter Gärdenfors has been working on it and I’d recommend that people working on LLMs become familiar with his work: Conceptual Spaces: The Geometry of Thought (MIT 2000), The Geometry of Meaning: Semantics Based on Conceptual Spaces (MIT 2014). If you do a search on his name you’ll come up with a bunch of more recent papers.

And of course there is more to word meaning than what you’ll find in the dictionary, which is more or less what is captured in the vector space I’ve been describing to this point. Those “core” meanings are refined, modified, and extended in discourse. That gives us the distinction between semantic and episodic knowledge (which Eric Saund mentioned in #113). The language model has to deal with that as well. That means more parameters, lots more.

I have no idea what it’s going to take to figure out those relationships. But I don’t see why we can’t make substantial progress in a couple of years. Providing, of course, that people actually work on the problem.

153. Stassa (not a crusader) Says:

starspawn0 #142

I think one advantage of training a neural net model is that you don’t have to do a simulation of some instance-generating process by hand, and that’s particularly useful when you don’t _know_ how to do the simulation by hand. If we were to feed select examples to a learner, wouldn’t we over-constrain it, and lose the advantage of having it learn something we don’t know how to do by hand?

I think there’s merit in the idea of programming by demonstration, rather than training with randomly chosen examples- so abandoning the PAC-Learning framework for… something else. But, what else? I have to confess that I’ve been thinking of that something else for a while, but there are problems there to overcome, not least the problem I point out above, about learning things we don’t know how to model by hand.

PAC-Learning is already pretty limited because it assumes the true distribution of a concept class is known and makes no guarantees of (polynomial-time) learnability, or expected error, otherwise. And yet, within the PAC framework we have things like Convolutional Neural Nets (CNNs) that learn behaviour that we could never program by hand (people have tried, and failed, and even with hand-crafted features results were not that great). If we also add some necessary ordering, then aren’t we just hobbling the learning ability of systems?

Then again, what happens with CNNs is maybe a case of “I can’t put my finger on it, but I know it when I see it”. We can collect examples of the behaviour we want a model to approximate more easily than we can fully specify that behaviour. In that case, yeah, maybe we can find some ordering of examples that leads a learner to any desired target, like leading a horse to water. That would be something really new- a way to address the biggest flaw in neural net learning, that of sample complexity; presumably, if we can choose “the right” examples, then we don’t need all the millions of data points that are currently needed.

But, to be sure, I don’t think that that can be done with neural nets trained on gradient optimisation. Gradient optimisation just doesn’t work that way. If you give it specific examples it only gets stuck in a specific local optimum. My intuition, anyway- I don’t know if there’s scholarship on that.

There is such a thing as active learning, of course- and machine teaching (rather than “learning”). But I think we’re talking about something different here.

Interesting paper, I’ll have to make space to read it (reviews and thesis revisions plus part-time programming work overwhelm this puny human brain’s resources…). Anyway the first sentence of the abstract sounds promising: I bet you can even make a toaster Turing-complete by adding an external memory to it.

(Then the toaster transforms into a magickal blonde and nukes your colonies from orbit).

154. Stassa (not a crusader) Says:

Mitchell Porter #143

Wow, another interesting paper- thanks! This discussion definitely took a turn for the better.

>> How are the examples of text, in the various corpuses, actually generated?

The problem is that text is not language, but an encoding of language. And, I reckon, language itself is an encoding of … something else, that we can’t quite access. So by training a language model we’re trying to approximate a proxy of a proxy. That sounds like there would be too much noise in between proxies to get to know the true distribution very well. From what I’ve seen in language modelling (and I’ve done my fair bit of it over the years; I used to be all motivation for NLP, until I realised how crowded it is in there), we can represent text, we’ve got its structure down pat (certainly obvious with LLMs), but there’s still something missing: meaning. And we don’t seem to know how to even collect examples of that, so as to model it.

I remember a discussion with my Master’s advisor, who explained to me that the whole point of statistical language modelling was to make some progress in representing language without having to represent meaning, because meaning is hard and we don’t know how to start modelling it. I don’t see that’s changed with LLMs; they model more text, with more parameters, but it’s still structure, not meaning, they’re trained on.

There have been some recent attempts to get LLMs to write code to drive robots to perform tasks. And of course there’s multi-modal LLMs that are all the rage now. I recognise those as attempts to go beyond text, and try to find some representation of meaning, so that we can get systems to learn models of meaning, not just structure. We’ll have to see where that will lead. I think the state of the art is still pretty far from something conclusive.

I don’t really know anything about robots, btw.

I don’t think we need to assume “a schema of personhood” to explain the behaviour of LLMs. We know how they work: they model text. That should be enough to represent any interaction they can see in text. It’s hard to believe that something that can generate such smoothly grammatical text, and form coherent responses to questions, is not intelligent, like the only other system we know of that can do so: us. But hard though it is, it should be clear what’s going on, given what we know about the function of the system, and its inputs.

My opinion anyway 🙂

155. Bruno Says:

Tyson #125

I believe we agree far more than you might think.

The goal of AIs such as ChatGPT is to handle information and knowledge with a natural-language user interface, such information and knowledge being extracted from human sources. Their purpose is to provide thoughts, ideas and perspectives, as well as services in the form of customized automation of content rearrangement and generation.

The main role of such AIs is not advocacy or fully outsourced decision-making, but rather to facilitate learning and decision-support. They can be used, and eventually will be, to automatically perform actions and even make decisions, but with the underlying values and directives established by the users, not produced by the AI itself. For multiple reasons, it is absurdly dangerous to even imagine a self-regulating machine aimed at guiding morality and/or ethics.

The preoccupation with the morality and ethics of ChatGPT and similar AIs is worrisome precisely because it is based on the premise that their outputs are to be used to determine what is “right” and “wrong”. Such an AI can, does and should present different moral stances and arguments, even the nasty ones, as a reflection of the thoughts of the humans who generated the texts in its training dataset, but those things are not guidelines or orders to be followed. They are information and points of view to be considered and, depending on individual judgement, dismissed, accepted or somehow changed.

Thus, when one attempts to direct the morality and ethics of the outputs of an AI, especially in a non-explicit way, one is engaging in a manipulation of perception that is akin to an attempt at thought control.

156. Sandro Says:

Michael #145:

> There’s a large literature on intelligence, and it has nothing to do with “predictive statistical models”. Statistics is, on the whole, basically irrelevant. Animals form representational models of their environment by direct, causal, sensory-motor interaction with it.

These are macro theories of intelligence using folk concepts, not mechanistic theories. Underlying these “representational models” with “direct, causal, sensory-motor interactions” are some form of mathematics.

> A child who places a hand on a fireplace *immediately* (not “statistically”, not “predictively”) forms a model of the causal properties of the fireplace. Namely, that one of its properties is to *necessarily* cause burning/harm/etc.

This is just a Bayesian update, because the child does not avoid the fireplace for the rest of their lives, but instead learns that some modes of use are dangerous and some are safe. This is clear from the fact that some children don’t completely shy away from the fireplace, but approach it again, just more cautiously.

This “causal” reasoning you speak of is simply Bayesian reasoning in the limit where all probabilities are pinned to 0 or 1. LLMs do implicit Bayesian inference.

Perhaps real animal brains jump to pinning their weights to 0 or 1 once past a certain threshold. That would be an interesting tweak to these models, but it does not really change the fundamental mechanics at work.
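To make the single-update point concrete, here’s a minimal sketch (toy numbers of my own, not a model of any real child): one strong, unambiguous observation moves a Bayesian agent most of the way to certainty in a single step, no repetition required.

```python
# One Bayesian update: a single unambiguous observation ("burnt finger")
# takes P(fireplace is dangerous) from an uninformative prior to 0.99.
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior P(hypothesis | observation) via Bayes' rule."""
    evidence = p_obs_if_true * prior + p_obs_if_false * (1 - prior)
    return p_obs_if_true * prior / evidence

prior = 0.5  # uninformative prior on "the fireplace is dangerous"
posterior = bayes_update(prior, p_obs_if_true=0.99, p_obs_if_false=0.01)
print(round(posterior, 2))  # prints 0.99 after a single observation
```

Pinning the posterior to exactly 1 past some threshold, as suggested above, would then be a one-line tweak on top of this, not a different mechanism.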

> Animals begin life with interior causal models of their own bodies, rudimentary models of space and time, and many other things besides.

I agree, billions of years of evolution have pretrained the models in animal and human brains because it increased fitness. Humans have an in-built propensity for language and the ability to recognize and track faces. That tells us nothing about how these models operate mechanistically, which is what we’re actually talking about if we’re to do a proper comparison to LLMs. A statistical predictor is in fact exactly what you’d expect from messy evolution.

> Indeed, the whole of statistical AI hasn’t discovered anything, from a scientific pov, you are simply building better function approximators.

Yes, because a parsimonious model of reality is that everything we observe is ultimately reducible to functions, which are the natural laws of physics, and we have yet to find any evidence that would contravene this basic premise. Thus, a system that better approximates the intelligence function is intelligence.

> In addition, animals like people, have complex representational abilities where we can reify the sensory-motor “concepts” which we develop as “abstract concepts” and give them symbolic representations which can then be communicated.

As others have pointed out here, LLMs do generate abstract representations.

Frankly, I don’t think any of the qualities you’ve pointed out necessarily change anything about the underlying models of these systems. You’ve claimed they don’t have qualities that they do in fact have, and you’ve claimed that humans and animals have qualities that are fundamentally incompatible with a materialistic picture of reality. I’m an eliminative materialist, so I reject the premise that “mental states” are what they appear to be, and so your macro theories of intelligence based on self-reports of folk concepts like beliefs and emotions are not fully valid descriptions of any underlying mechanics.

Within a mechanistic model, all of the goal-directed, intentional, sensory-motor qualities you attribute to animals reduce to multimodal training of some kind, and prompts and replies from interaction. An AI will never have the same sensory responses or preferences of humans because they don’t share the same sensory mechanics or evolutionary history, but they will have their own that emerge from the characteristics of their sensory apparatus and their training data.

You’re not convinced this qualifies as intelligence; that’s fine, I think you’re wrong. Time will tell. I will only agree that these systems will not fully convince everyone as they currently stand. That’s understandable because these are only the first baby steps, but they will be fully convincing within our lifetime. I then fully expect the special qualities attributed to the human mind will go the same way as vitalism.

157. matt Says:

From an orthodox AI alignment perspective, does the final scene of that South Park episode represent an attempt by the AI to break out of the box?

158. James Cross Says:

Sandro #156

“Underlying these “representational models” with “direct, causal, sensory-motor interactions” are some form of mathematics”.

What’s your evidence for this? It could be simple over-optimism because we can model a lot of it with mathematics. Even if it can be simulated with mathematics, that doesn’t mean mathematics underlies it (whatever exactly that would mean).

“I agree, billions of years of evolution have pretrained the models in animal and human brains because it increased fitness. Humans have an in-built propensity for language and the ability to recognize and track faces. That tells us nothing about how these models operate mechanistically, which is what we’re actually talking about if we’re to do a proper comparison to LLMs. A statistical predictor is in fact exactly what you’d expect from messy evolution.”

The “model” for us is not an abstract mathematical model but a concrete analogical model. The representations are analogues that mirror what is important evolutionarily in the world for the organism.

159. Nick Drozd Says:

Scott #95

> Conversation on arbitrary topics, in a way that can accomplish useful work and comes close to passing an unrestricted Turing Test.

Well, is Chomsky denying that this has taken place? His post refers to LLMs as “marvels”, so it sounds like he’s aware of what’s happening. What exactly is he saying has not happened?

You might say that he is talking about stuff that is vague and wishy-washy, not to mention that he is brazenly moving the goalposts. “True intelligence is also capable of moral thinking.” WTF?

I agree with that to a certain extent. On the other hand, sometimes it turns out that the goalposts really were in the wrong place all along. Shion Arita #149 puts this point very well.

To use yet another metaphor, LLM skeptics tend to focus on how the sausage is made. But it matters how the sausage is made because there is no general agreement on whether or not it actually is sausage.

160. Sandro Says:

I forgot to also address this broader implication that computer scientists or discrete mathematics have nothing interesting to say about intelligence:

Michael #145:

We have theories of intelligence; and it is biologists, zoologists (and so on) who are the relevant scientists to advance it *not* computer “scientists”

I think this is clearly false. Information-theoretic arguments show that intelligence must be some form of compression.

Compression comes in lossy and lossless forms. Which type our intelligence has isn’t particularly relevant, though it’s almost certainly lossy.

Prediction is an essential component of compression. Therefore prediction is an essential component of intelligence.

These are demonstrably true properties of any mechanistic theory of intelligence, and none of them followed from any analysis made by biologists or zoologists, nor from any discoveries in biology or zoology.

We can take this even further by noting that LLMs and neural networks also “learn” by compressing their training set. Furthermore, we have proofs demonstrating that Transformers augmented with memory are Turing complete, which means whatever computation underlies the brain, Transformers can hypothetically reproduce it.
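
To make the prediction-compression link concrete, here is a toy sketch (my own illustration; the bigram model and all names are invented for the example): an arithmetic coder spends about -log2 P(symbol | context) bits on each symbol, so a model that predicts the next symbol well compresses the text into fewer bits.

```python
import math
from collections import Counter, defaultdict

def code_length_bits(text, probs):
    """Total bits to encode `text` if each symbol costs
    -log2 P(symbol | previous symbol), as an arithmetic coder achieves."""
    bits = 0.0
    prev = None
    for ch in text:
        bits += -math.log2(probs(prev, ch))
        prev = ch
    return bits

def make_bigram_model(corpus, alphabet):
    """Laplace-smoothed bigram predictor P(next | previous)."""
    counts = defaultdict(Counter)
    prev = None
    for ch in corpus:
        counts[prev][ch] += 1
        prev = ch
    V = len(alphabet)
    def probs(prev, ch):
        c = counts[prev]
        return (c[ch] + 1) / (sum(c.values()) + V)
    return probs

text = "abababababababab"
alphabet = sorted(set(text))
uniform = lambda prev, ch: 1 / len(alphabet)   # no prediction at all
bigram = make_bigram_model(text, alphabet)     # learns the alternation

# Better prediction -> shorter code for the same string.
print(code_length_bits(text, uniform))  # 16.0 (1 bit per symbol)
print(code_length_bits(text, bigram))   # far fewer bits
```

The bigram model compresses the predictable string to a fraction of the uniform coder’s cost; an LLM trained to predict text is playing the same role at vastly larger scale.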

Is this exactly what human brains are doing? Too early to say, but the papers correlating LLM network activity with human brain activity are telling, if speculative. Regardless, the human brain is certainly doing something very, very similar.

There are many other properties of intelligence that we can infer from physics and computer science, but I don’t want to drag this on too long. Your dismissal of computer science’s possible contributions to our understanding here is exactly the kind of unjustified snobbery that Scott was complaining about, re: this article.

The folk concepts you name are important to explain with whatever mechanistic theory is ultimately devised, but the claim that they must form the core component of any inquiry into intelligence in some top-down fashion, and that we cannot instead reproduce intelligence bottom-up is simply unjustified conjecture.

The only way to escape these conclusions is to assert that intelligence is simply not reducible to constituent mechanistic parts, and if that’s your position, then you are in a diminishing minority and we’ll just have to agree to disagree.

161. Sandro Says:

James Cross #158:

What’s your evidence for this? It could be simple over-optimism because we can model a lot of it with mathematics. Even if it can be simulated with mathematics, that doesn’t mean mathematics underlies it (whatever exactly that would mean).

The Bekenstein Bound entails that any finite volume must contain finite information. Any system with finite information can be captured by a finite state automaton, at the very least. That’s a mathematical model. Unless you’re going to conjecture that intelligence is not physical, then intelligence can be described mathematically.
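
For reference, the standard statement of the bound: a physical system of radius $R$ and total energy $E$ contains at most

```latex
I \;\le\; \frac{2\pi R E}{\hbar c \ln 2} \quad \text{bits},
```

which is finite for any finite $R$ and $E$, so the system’s state space can in principle be enumerated, and hence captured by a finite state automaton.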

The “model” for us is not an abstract mathematical model but a concrete analogical model. The representations are analogues that mirror what is important evolutionarily in the world for the organism.

Yes. I’m not sure what point you’re trying to make. Intelligence is information processing of some kind. So if a faithful abstract mathematical model of an information processing system’s output is indistinguishable from a concrete analog information system’s output given the same input, are you claiming that the latter could be intelligent but not the former?

162. Bill Benzon Says:

BTW, Emily Bender, a computational linguist at U Washington, thinks “it’s a real bummer when the world’s most famous linguist writes an op-ed in the NYT and gets it largely wrong.” She’s perhaps best known for coining the phrase “stochastic parrot,” which is rhetorically brilliant, but otherwise somewhat dubious. There’s an interesting article about her in New York Magazine: You Are Not a Parrot.

163. Scott Says:

Bill Benzon #162: WOW! If even Emily Bender, one of the world’s best-known LLM opponents, thought that Chomsky “got it largely wrong,” then no wonder Chomsky came across to me as so obviously wrong, so disastrously detached from the tangible reality of how LLMs are about to change the world for good or ill and what makes them unexpected and interesting — that I felt like an hour before my meetings at Harvard would be all it took to register my dissent from the Emperor of Linguistics for the sake of history.

164. A Raybould Says:

Foobar #115:

Let’s see what you are doing here: you take a statement of mine, negate it by inserting a ‘not’, put it in quotes, and insinuate that I have claimed this ‘quote’ you have just manufactured to be from Chomsky – and then you say “I don’t know why people just make stuff up like this.” Well, that is what I am wondering right now, and it seems you are just the person to give an answer!

165. foobar Says:

I think it would be amazing to have a moderated 1-on-1 (Zoom?) debate/discussion so everyone can hear the best arguments for both sides. I personally learn the most from a back-and-forth debate.

But who is the person that is the best to put up against Chomsky? There really isn’t anyone obvious.

166. boconnor Says:

Chatgpt does linguistics 🙂

Me: From the point of view of Chomsky linguistics, please analyse the sentence “Bob is too stubborn to work with”

Chatgpt: From a Chomskyan perspective, the sentence “Bob is too stubborn to work with” is an example of a complex sentence containing a subordinate clause.

The main clause is “Bob is too stubborn,” which contains the subject “Bob” and the predicate “is too stubborn.” This clause expresses a judgement about Bob’s personality or behavior, indicating that he is too obstinate or inflexible in some way.

The subordinate clause is “to work with,” which functions as a complement to the main clause. This clause describes a particular situation in which Bob’s stubbornness causes problems: specifically, he is difficult to work with. The use of the infinitive “to work” suggests that this is a potential action or situation rather than a definite one.

167. foobar Says:

boconnor #166

I suspect people are using human reinforcement learning to “correct” these as they come up. (It’s possible that this happened even for the original sentence in the article, but that’s irrelevant.) The point is that you could presumably come up with a schema of sentences that LLMs can’t deal with but humans can.

168. A Raybould Says:

Souciance Eqdam Rashti #133:

Perfect grammar seems to be a rather strict demand – are only those humans who have mastered perfect grammar to be considered language-users?

It also seems rather hasty of you to criticize current language models for not having already given us an understanding of how humans both learn to communicate and model the world, given that transformer models are barely five years old. How many decades have people been working on the approach you prefer, with nothing more to show for it?

My point here is just this: even if these models had done nothing more than show that they can communicate in English as if they understood English grammar, this alone would be enough to raise considerable doubts over the premise that humans have grammatical preconditions (to borrow the way foobar #115 put it, as I would not want to be making up stuff about what Chomsky thinks).

169. A Raybould Says:

foobar #167

I would put the odds of you being correct about the capabilities of current LLMs as somewhat better than even, but why speculate? ChatGPT and other LLMs are available to you to demonstrate that there are schema of sentences that LLMs can’t deal with but humans can.

On the other hand, if you succeeded, how significant would that be? As I asked in another post, are only those humans who have mastered perfect grammar to be considered language-users?

170. boconnor Says:

foobar #167: “I suspect people are using the human reinforcement learning to “correct” these as they come up.”

It’s not clear what you are saying.

1. Are you saying that the RLHF process during training corrected erroneous linguistic analysis of arbitrary sentences? That seems unlikely.

Or,

2. That people are correcting the analysis on the fly? But that is not how it works with LLMs like chatgpt.

Happy to have clarity about what you meant.

An alternative theory is that the LLM learnt enough about language structure and grammar in its training to be able to correctly analyse any given sentence.

foobar #167: “The point is that you could presumably come up with a schema of sentences that LLMs can’t deal with but humans can.”

Maybe. You should try it and see if you can do that. The examples of ambiguous sentences that the LLM itself nominated, when asked, included “I saw her duck.” and “She spilled the beans.”. But these are also ambiguous for humans, so the set of sentences that LLMs can’t unambiguously deal with but humans can may be empty.

171. Michael Says:

Sandro Says:
Comment #160

> I think this is clearly false. Information-theoretic arguments show that intelligence must be some form of compression.

Cooling is a form of compression; sweeping leaves on a lawn is compression; boiling rice is compression. Parts of the air in a given room are compressing.

Discrete mathematics does not have the resources (i.e., semantics) to talk about concrete properties. When you think you can derive properties of *intelligence*, a concrete phenomenon, from “information theory”, an abstract series of descriptions, you’re engaged in pseudo-science.

There are an infinite number of mathematical descriptions which can be attributed to any concrete system; without a causal semantics for that mathematics, you aren’t saying anything.

F = GMm/r^2 isn’t a fragment of mathematics, it’s a fragment of English: the *FORCE* between two *MASSIVE* objects is proportional to the product of their *MASSES* ….

The abstract formula: a = bcd/e^2, describes an infinite number of things and says nothing about the world. It only means something when we identify ‘c’ with a particular causal property of a real system, ie., mass.

This would not be acceptable about any other scientific phenomenon; never would we say that mathematics can, a priori, determine the nature of causal properties.

That we apply such thinking to intelligence shows a superstitious dualism, at least methodologically, whereby anything related to the mind is purely mathematical; but of course heat, mass, digestion, (hunger?), require science.

This is double thinking. Mathematics, a priori, shows nothing about intelligence. Intelligence is a concrete phenomenon which must be studied scientifically, and a posteriori, “compression” *could* be discovered, in some highly highly highly particular way, to be involved.

All that’s going on in this field at the moment is a pseudoscientific game of: we’ve built this, for PR reasons this must be AI, therefore it must be intelligence, therefore whatever we say about it is a theory of intelligence.

OK: what’s the criteria for empirical adequacy of this theory? What would refute it? Can you form any hypotheses about concrete phenomenon and test these? What mechanism of intelligence is given in your theory? What properties would a system have, in virtue of your theory of intelligence, which can be measured? How would we establish the validity and reliability of those measures?

If you actually studied, as in science, all really-existing intelligent systems the idea that you’d start from “information theory” would be a patent misdirection.

We’re very far away from enumerating *all* relevant causal properties of intelligent systems (animals) that enable their intelligence, let alone in a position to characterise some mathematical law which would somehow enable a system *without any of those properties* to think.

This isn’t a theoretical issue: this discrete math + stats approach *doesn’t work*. It will always produce an impressive demo, it will always build the better calculator, but after a while the hype will die down and people will realise: there’s no “there there”.

The *reason* they’re seeing output is not because the system is intelligent; it just appears to be. Indeed, of course: the engineers haven’t imparted any capacity to be intelligent. They’ve just used symbolic pattern finding to find those patterns which best imitate it.

A video game isn’t a world; and no improvement to VR headsets will ever make it one. “It appears X, so it is X” is, at best, pseudoscience.

172. Bill Benzon Says:

A Raybould #168, Souciance Eqdam Rashti #133: I think the idea of “perfect” grammar is the product of literacy, a demand placed on writing because readers and writers do not interact face-to-face and so have no means of quickly verifying and correcting statements. Speech tends to be sloppy, with lots of sentence fragments.

173. B333 Says:

@boconnor #170

Actually, ChatGPT is being continually tweaked by OpenAI (unlike their base language models). But I doubt that’s what’s happening here.

@Michael
“Can you form any hypotheses about concrete phenomenon and test these?”

Well sure. For example, there are empirical scaling laws that show how as we feed more data and computing power into LLMs they get better and better at predicting text. So we can predict that unless these trends break down, as we scale language models up, they will produce increasingly realistic text.

Moreover, there are benchmarks that can be used to measure the capabilities of language models, like the BIG-Bench collection of tasks. A list can be found here: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/README.md. It includes tasks like “Give an English language description of Python code”, “Answer high-school-level physics multiple-choice questions”, “Translate between literal and metaphoric sentences”, “Predict effects of causal events in counterfactual situations” and “Given short crime stories, identify the perpetrator and explain the reasoning behind the deduction”. As models get bigger, they have also been getting better at tasks designed to measure intelligence.

“We’re very far away from enumerating *all* relevant causal properties of intelligent systems (animals) that enable their intelligence, let alone in a position to characterise some mathematical law which would somehow enable a system *without any of those properties* to think”

Without what properties? If you mean world models, then the notion that LLMs lack those is a flatly asserted dogma. See eg here https://thegradient.pub/othello/, where abstract representations of an Othello board were found in a language model trained on Othello game transcripts.

174. James Cross Says:

I think this boils down to:

Chomsky is too pessimistic about what computing power and enough training data can do, but is correct that simulating human thought with current approaches doesn’t tell us how humans do it. He is probably wrong that we need to figure out how humans do it to make progress on most of what we want AI to do.

175. Michael Says:

Re #173, and “Othello”

The paper fails to address the obvious reply: that the board state is just a function of the game moves.

To say that the NN has built a representation of the board state from moves is trivial, if moves are just another phrasing of the board state.

That there’s a correlation between trained weights and given board states is then expected, since to be trained on moves is to be trained on board states.

i.e., you can either specify a space using absolute coordinate positions {(x, y), …} *or* relative distances between objects embedded in that space, {r11, r12, r23, …}. That’s the same information: if you provide one you are providing the other.
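
A minimal sketch of this point (my own illustration, not from the paper): the board state is a deterministic function of the move list, so replaying the moves reconstructs the position exactly, and a network trained on move sequences has, informationally, been trained on board states.

```python
# Minimal Othello replay: state = f(move sequence).
# Standard flipping rules; legality checks beyond flipping are omitted.

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    b = [["."] * 8 for _ in range(8)]
    b[3][3], b[4][4] = "W", "W"
    b[3][4], b[4][3] = "B", "B"
    return b

def apply_move(board, row, col, player):
    """Place `player`'s disc and flip every bracketed run of opponent discs."""
    opp = "W" if player == "B" else "B"
    board[row][col] = player
    for dr, dc in DIRS:
        run = []
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opp:
            run.append((r, c))
            r, c = r + dr, c + dc
        # Flip only if the run is bracketed by one of `player`'s discs.
        if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            for fr, fc in run:
                board[fr][fc] = player

def board_from_moves(moves):
    """Reconstruct the position from moves like 'd3' (column d, row 3)."""
    board = initial_board()
    player = "B"  # Black moves first
    for mv in moves:
        col = ord(mv[0]) - ord("a")
        row = int(mv[1]) - 1
        apply_move(board, row, col, player)
        player = "W" if player == "B" else "B"
    return board

board = board_from_moves(["d3"])  # a standard Black opening move
print("\n".join("".join(r) for r in board))
```

Since this replay is deterministic, a move transcript and the resulting sequence of board states carry the same information, which is the sense in which the correlation found in the probes is expected.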

Consider the difference between a network seeming to obtain a 3D model of a simple object from a handful of 2D photographs, vs. doing so from millions at every angle (in every lighting condition, etc.).

In the latter case of course the weights of the trained network will correlate with the actual 3D model, because there’s almost no information gap between the actual 3D model and the millions of 2D images provided. (There is still an exploitable gap, though, which could be used to show the network hadn’t learnt the model, i.e., you could generate non-physical images at very fine angles in unusual lighting conditions.)

The paper seems to address a strawman claim that NNs are “just remembering their inputs” as *literally tokenised*. That isn’t the claim. It’s that they remember their inputs as phrased in a transformed space.

Here: if board moves are just an alternative way of specifying the board state, “phrased in a transformed space” — then the claim stands.

An explanatory model of the target domain (here called a “world model”) has modal properties that can be enumerated by ordinary science: what happens when x, when y, when z — for given environmental conditions x, y, z.

It seems very clearly impossible that any statistical methodology could take a set of 2D images and obtain a “3D world model”; this is effectively provable, since 3D is radically under-determined wrt 2D.

Likewise, for I can only think *obvious* reasons, an NLP system cannot obtain relevant “world models” from language.

The conditions under which “pass me the salt” is meaningful are passings of salt, not patterns in text. The structure of words has almost nothing to do with the structure of the world, in as much as the structure of pixels has nothing to do with it either.

If this seems non-obvious to you it’s because your brain is laden with vast amounts of “world models” which impart structure to your perceptions and to language. It is this, at least, which is the very thing machines here lack.

NLP systems do not say, “I like ice-cream” because they know what ice cream is, because they like it, and because they have a model of themselves (“I”) to which they attribute the relation of liking-to-icecream.

Such a claim seems obviously absurd, and I struggle to see who here would even believe it.

The major source of confusion rather is that text being generated from these systems has the same structural properties as text we generate, so when *we* consume it, *we* attribute meaning. But that isn’t what *we* do when we speak; when I say “I like ice-cream” I’m not generating text based on a statistical history of text responses. I really do like ice-cream, and hence I verbalise that interior state which the machine lacks.

To be clear: if a young child were to read aloud from a chemistry textbook, they too wouldn’t mean what they say. They too would be, in a sense, acting like a machine.

Meaningless articulation of words is something people can do, as in the case of a 3yo reading organic chemistry phonetically — but that isn’t how or why we use language.

176. foobar Says:

A Raybould #168 #169

> this alone would be enough to raise considerable doubts over the premise that humans have grammatical preconditions.

Not anymore than a plane flying raises considerable doubts about whether birds actually flap their wings to fly.

RE: counterexamples

The reason I haven’t done it is because I don’t have a background in linguistics, otherwise I’m sure I could pull a book off my shelf of “counterexample” sentence schemas (I wouldn’t have known to come up with the simple example Chomsky uses either). I might still do it if I have time but as you say, why bother? No one arguing for the maximalist LLM view cares about the result.

And the logic “LLMs make mistakes but humans also make (different) mistakes, therefore LLMs are just like humans” is not only absurd but also a vacuous defense of *any* criticism of LLMs.

> are only those humans who have mastered perfect grammar to be considered language-users?

Not surprisingly Chomsky already addressed this in the 1960s, describing it as the performance/competence distinction:

“Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. … [4] To study actual linguistic performance, we must consider the interaction of a variety of factors, of which the underlying competence of the speaker-hearer is only one. …
We thus make a fundamental distinction between competence (the speaker-hearer’s knowledge of his language) and performance (the actual use of language in concrete situations). Only under the idealization set forth in the preceding paragraph is performance a direct reflection of competence. In actual fact, it obviously could not directly reflect competence. A record of natural speech will show numerous false starts, deviations from rules, changes of plan in mid-course, and so on. The problem for the linguist, as well as for the child learning the language, is to determine from the data of performance the underlying system he puts to use in actual performance. (Chomsky 1965:3f)”

Whether this is relevant for LLMs depends on what you are trying to do. If LLMs, as they almost certainly do, operate differently than the human language faculty, then you would expect them to make different mistakes and have different limitations. I would think that it would be useful to understand these differences and limitations if you were trying to improve LLMs, particularly if you were trying to make them produce output closer to those of humans.

I really don’t understand the anti-rationalism in the LLM community. In any other scientific or engineering field you would identify limitations and counterexamples in this way and gain understanding about where the edges of the capabilities of the system are and maybe avenues for improvement. Instead in LLMs you get a benchmark (e.g. how well does the LLM explain Python code) which is completely divorced from any principle about how the underlying system works, treating it as a holy box that you are helpless to understand or improve except by worshiping and feeding more parameters.

177. Enkki Says:

A question. Has a standard IQ test been administered to chatGPT by any chance? I wonder what the results might be.

178. foobar Says:

boconnor #170

I meant the on-the-fly RLHF in ChatGPT; I thought that was a “live” system, but maybe I misunderstood. Comment #147 (and #150) says that they tried the sentence “John is too stubborn to work with” and ChatGPT gave the wrong interpretation. You said in #166 that it does work. I assumed that it was “fixed” using online RLHF, but maybe the different answers are due to the randomness of the output. In any case #147 and #150 show that it doesn’t produce the correct output, whether that is “sometimes” or just before RLHF correction.

> Maybe. You should try it and see if you can do that.

I replied to the same question above but in short no one really cares.

> The examples of ambiguous sentences that the LLM itself nominated, when asked, included “I saw her duck.” and “She spilled the beans.”. But these are also ambiguous for humans, so the set of sentences that LLMs can’t unambiguously deal with but humans can may be empty

Yes, obviously you want sentences that humans don’t get wrong but LLMs do.

179. Bill Benzon Says:

@foobar #176: I really don’t understand the anti-rationalism in the LLM community. In any other scientific or engineering field you would identify limitations and counterexamples in this way and gain understanding about where the edges of the capabilities of the system are and maybe avenues for improvement.

Agreed. This makes no sense to me at all. I understand the predominant view that these systems are opaque black boxes whose operations are not at all obvious. It is one thing to take that as a statement of fact, but quite something else to take it as Holy Writ. If you take it as a statement of fact, you may also believe that we can understand how these systems operate, but it will require intellectual imagination and hard work. That’s my belief. If you take it as Holy Writ, that’s entirely different. You’re inclined to believe that inscrutability is inherent in these systems and so we have no chance of ever understanding them. We just roll the dice and hope for the best.

Who benefits from this view, and why? I can see why AI doomers might be attracted to it. But some of them are doing some of the best work on reverse engineering LLMs.

180. Ajit Says:

Dear Scott,

I have an epistemologically-linguistically-mathematically oriented point about it: how to make the AI better. One was the Expert Systems approach (the first time I bought a book out of the paltry salary I made back then was in the early 1990s, if not the late 1980s, I forget). The second approach is the current one. (Dr. Hinton ought to know about this statistical weights thingie.) The point I had in my mind was about relating the two. So, not necessary to fight with each other. But then, no one is going to collaborate with me. America and India together made sure that I went without even a paltry salary — no salary — for 10+ years. Brahmins. [Also, a few each of other castes thrown in to make it a good-looking salad.]

Best,
–Ajit
PS: Sure let me know if you / your colleagues want to enter into a collaboration with me. I feel sure something will come out. I am good in epistemology (have been for decades), good in programming (one to two fewer decades), and good in conceptualization (I solved the measurement problem of QM). I am sure none wants to collaborate with me. Americans. Indians. Brahmins.

181. Aron Says:

One of the most surprising things I have learned recently about LLMs like Sydney and ChatGPT is their ability to play reasonable games of chess, as evidenced e.g. here:

https://www.lesswrong.com/posts/nu4wpKCo6AfJkkd4F/sydney-can-play-chess-and-kind-of-keep-track-of-the-board

A prompt that works reasonably well for full games against ChatGPT can be found within the comment thread, e.g. here:

https://www.lesswrong.com/posts/nu4wpKCo6AfJkkd4F/sydney-can-play-chess-and-kind-of-keep-track-of-the-board?commentId=2oL99H7oZzYLaEKFo

My prediction from having tried this out myself is that ChatGPT could probably get to a winning position against most casual human players and that it could convert to checkmate against many of them at least occasionally. For something not at all designed to play chess, and for something that knows the game literally only from reading about it, this is an achievement worth noting in the statistical parrots debate.

182. A Raybould Says:

Foobar #176:
At least we’re making some progress: in your reply to boconnor #166 you wrote “the point is that you could presumably come up with a schema of sentences that LLMs can’t deal with but humans can”, but now you acknowledge that it is not a point of any relevance.

I see that you are still arguing against misrepresentations of my position. For example, the only reason I pointed out that LLMs’ incomplete grasp of grammar is a non-sequitur – no more relevant in their case than it is in ours – was in order to show the irrelevance of the point you seem to now accept is irrelevant. The suggestion that I am using this as a sort of irrefutable ‘get out of jail’ card to defend LLMs against any sort of “criticism” is a figment of your imagination.

More generally, your entire position here is contradictory. You present yourself as a champion of scientific empiricism, yet you insist that LLMs cannot possibly give us any empirical insight into natural language or intelligence, on the grounds that they don’t work like natural minds. To turn around an analogy of yours, that is like saying that nothing we have or could learn about flying from mechanical aviation or simulation of its aerodynamics would be of any use in understanding how birds fly.

You might be right that neither current LLMs nor whatever technology follows from them will ever contribute anything towards our understanding of natural intelligence, but to assume that this must be so on the grounds that they don’t work like us is not empiricism; on the contrary, it is a wafer-thin excuse for not taking an empirical approach.

It is, of course, true that LLMs have not given us an understanding of how children learn grammar, but it is equally true that traditional linguistic approaches have not done so either. In this situation, the true scientific empiricist will not talk in certainties, but instead take a Bayesian approach. This does not require everyone to agree, and so, for example, reasonable people can disagree over how plausible the Poverty of Stimulus argument is – but the success of LLMs in parsing natural language and producing almost exclusively grammatically-correct responses should persuade the rational empiricist, who is not looking for a reason to ignore this evidence, to adjust their credence accordingly. Similarly, the fact that LLMs do very well with grammar, despite not being initialized with any explicitly grammatical knowledge or constraints, puts a very low bar on how much of that is necessary – and note that we can say that precisely because LLMs are artificial and we know how they are made, so here’s a concrete example of how LLMs provide empirical evidence that has not been otherwise available.

The quote from Chomsky does nothing to help your argument: it starts by making it clear that in his view, the primary concern of linguistic theory is abstract and ideal – but if that is his goal, then neither he nor anyone else has any reason to reject data from LLMs merely on the basis that they are not human; anyone interested in an abstract and ideal theory of grammar should be eager to take data from as broad a range of grammar-users as possible. In this regard, we are constrained by humans being the only users of complex grammar available for study; some work has been done with other species, but the results are rather limited, and it is not out of the question that LLMs – or, more likely, their successors – could provide additional information and insights from within the gaps in our previous sources. One quite plausible possibility is that future developments will tighten the bounds on how much language experience is necessary to achieve the grammatical competence of typical people.

183. Ben Standeven Says:

@Michael, Comment #175:

In the terms you seem to be using, humans don’t have 3D world models either. This can be seen by simple logic, since we would have to put such a model together from analysis of 2D retinal responses, just like our hypothetical AI; and there are also well-known perceptual illusions that exploit our lack of such a model.

184. B333 Says:

@Michael #175

“you can either specify a space using absolute coordinate positions {(x, y), …} *or* relative distances between objects embedded in that space, {(r11), (r12), (r23), …)}. That’s the same information: if you provide one you are providing the other”

Othello game transcripts are not given in terms of relative distances of game pieces – just a list of moves, with no geometric information directly given. But of course there is enough information to construct a board representation. I don’t see what makes it “trivial,” however.

Here’s another paper I found, though I haven’t read it yet: “Implicit Representations of Meaning in Neural Language Models” https://arxiv.org/pdf/2106.00737.pdf (“In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity’s current properties and relations, and can be manipulated with predictable effects on language generation”)

“The conditions under which “pass me the salt” is meaningful is passings of salt, not patterns in text. The structure of words has almost nothing to do with the structure of the world”

The specific form of language is arbitrary. But there are a LOT of general patterns (i.e., asymmetries, such as the fact that “the man rode the horse” is a more likely sentence than “the horse rode the man”), and these are not arbitrary – information about the external world “leaks” into text, which is just to say that it is useful for predicting text.
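As a minimal sketch of what that “leakage” looks like statistically (the four-sentence mini-corpus is invented, purely for illustration):

```python
# Invented mini-corpus (illustrative only): world regularities surface as
# statistical asymmetries in text, which is all a next-token predictor sees.
from collections import Counter

corpus = [
    "the man rode the horse",
    "the man rode the horse",
    "the woman rode the horse",
    "the horse carried the man",
]

bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

# "man rode" is attested; "horse rode" never is -- the asymmetry of the
# world (riders ride, horses are ridden) leaks into the counts.
assert bigrams[("man", "rode")] > bigrams[("horse", "rode")]
```

A next-token predictor never needs to know what a horse is; the asymmetry in the counts alone is enough to prefer “the man rode the horse”.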

“The major source of confusion rather is that text being generated from these systems has the same structural properties as text we generate, so when *we* consume it, *we* attribute meaning. But that isnt what *we* do when we speak, when I say “i like ice-cream” i’m not generating text based on a statistical history of text responses. I really do like ice-cream, and hence, I verbalise that interior state which the machine lacks”

We have world models because they are useful for passing on our genes. LLMs have world models because they are useful for predicting text. So there are differences between how we use language …

185. Tyson Says:

Stassa #151:

I guess I am not sure about the prospects of successfully abstracting human learning processes in terms of optimization of objective functions. I mean, our experiences of pain, pleasure, etc., could be thought of as reflecting some kind of objective functions to optimize over. There could be higher-order dynamic objective functions as well. But even if we do effectively try to optimize objective functions, I don’t think we globally optimize. I also think that, as time-varying systems whose constituents are constantly in motion, there is no static optimized state for us.

To think about optimization for general intelligence, I think it would help to be more formal about our notions of general vs specialized intelligence. Should we really even say that humans have general intelligence? If not, then what would true general intelligence mean and how should it be formally defined? And what would it mean to be a truly ultimate, and truly general, super intelligence? Would that be some kind of Pareto optimized (e.g., in terms of space, run-time, precision, accuracy, etc.) process that implements a model of everything? At one end of the scale, it would literally have to just be the universe itself and nothing less. At another end of the scale, it is just the null process (Pareto optimal based on minimal space and time). So I guess just Pareto optimization doesn’t get us there without at least adding some constraints. So what are some reasonable constraints? Can we define them without invoking some arbitrarily chosen specialized purposes among many possible ones?

Anyway, I have been trying to think about intelligence in terms of processes. If we think about everything as processes, including thought, then understanding something, or any other action that we intuitively think of as some kind of intelligent behavior, might be considered to be a repeatable directed/steered or controlled or caused process initiated following some causal history from a given time, that produces “observable” properties that have correlations with the observable properties of other external processes.

But with such a view, how could we objectively say that only we and/or some other living things are intelligent, much less generally intelligent? E.g., why aren’t trees considered intelligent, or even climate systems, or anything that produces low-entropy states? What, formally, is the ultimate unique thing – and/or the quantification and amount of it – that we have and non-intelligent things don’t?

Maybe, whatever that could be defined to be, whenever we figure it out and define it, the kind of AI we have now doesn’t represent any progress towards creating new instances of it. Assuming it is something we have and non-living things don’t, and hand-waving past the problem that it isn’t even defined, what questions can we even ask? Should the requirement be something to do with quantum effects? But quantum effects are everywhere. So then should we require that the processes have special properties which make them impossible to explain without quantum effects, or something like that? What does it mean to explain? Isn’t that just modelling? So does it boil down to the problem of whether the process can be simulated (or simulated efficiently) without quantum advantages, or something like that? If so, how do we decide how accurate, or small, or fast, or whatever, the simulation needs to be, so that the things we intend to classify as non-intelligent don’t count as intelligent?

Maybe instead of processes playing out according to cause and effect, the human mind accesses something unknown, beyond understanding, and transcending our naive perceived notions of causality, time, and space? E.g., maybe the outcomes of quantum interactions are somehow collectively and non-locally willed in parallel. Would that be what it takes for free will as we intuit it to be possible? That is also the kind of magic we might need to strongly support a super-deterministic theory of quantum mechanics. And it seems like the kind of magic we would need to support various ideas about gods/higher powers, while being consistent with the physics we have to work with. If we want to incorporate these ideas into our definitions, isn’t this the can of worms we are getting ourselves into?

Anyway, it seems to me that it ultimately reduces to a few possibilities: (1) free will is at the center of intelligence, and either the whole universe is one big intelligent thing that our intelligence is just a small part of, or we are part of some exclusive club of intelligent entities which can non-locally will quantum probabilities; (2) there is no free will and our intelligence is just a small part of a big intelligent universe; (3) intelligence doesn’t really exist; (4) we don’t have a fundamental enough understanding of physics to make conclusions about these things without arbitrary high-level distinctions; (5) we live in a simulation or something like that (which is arguably just a variation of 4); (6) I am just confused or plain wrong, and what I’ve written here is essentially just a bunch of word salad.

But I don’t think we are really going to get anywhere in the debate about what intelligence is, what general intelligence is, and what has it or not, unless we agree on some precise definitions. And which definitions we choose determines whether the questions fall within the scope of mathematics, engineering, science, biology, psychology, philosophy, or religion. Not that only one choice is worthwhile. But when it comes to predicting what types of things AI is going to be able to do, and how it will change the world, I am not sure the question of what intelligence, or general intelligence, is really matters that much.

On a separate note, I was trying to think about how human intelligence works last night, by trying to think about how my intelligence seems to work, and found myself trying to think about how I was thinking about how thinking works. And I had a moment that I am not sure is best described as an epiphany or an anti-epiphany.

Another side note: In retrospect, I think that my comment about the triviality of gradient descent leading to a model which generates grammatically correct text (in #110) seems wrong, outside of my initial superficial interpretation.

186. foobar Says:

A Raybould #182

> but now you acknowledge that it is not a point of any relevance

I think it’s very relevant; the problem is it’s not relevant to *you*, so it doesn’t make sense to spend time coming up with examples to convince you.

> yet you insist that LLMs cannot possibly give us any empirical insight into natural language or intelligence

It’s possible in theory, but there isn’t any reason to believe that it would, and to date there haven’t been any significant insights into the human language faculty from LLMs. (And most LLM-proponents don’t really care about understanding the human language faculty anyway.)

> but it is equally true that traditional linguistic approaches have not done so either

There is plenty of science that has advanced understanding of the human language faculty. Again, for LLMs there is essentially none.

> The quote from Chomsky does nothing to help your argument: it starts by making it clear that in his view, the primary concern of linguistic theory is abstract and ideal – but if that is his goal, then neither he nor anyone else has any reason to reject data from LLMs merely on the basis that they are not human;

The outputs of LLMs don’t tell you anything about the human language faculty: they are not designed to operate the way humans do, only to mimic the output humans produce. If you want to argue that mimicking the output tells you something about how the human language faculty works, you have to actually demonstrate that, not just assert that it is not theoretically impossible. Lots of things are not theoretically impossible, yet we have no reason to believe them to be true.

187. Steven Says:

Scott,

I would love to get your opinions on the arguments given by Stuart Russell and Gary Marcus in the recent Sam Harris podcast “THE TROUBLE WITH AI”.

I was more on your side coming into that podcast, but after listening to it I’m actually much more on the side of “fear” in terms of AI, mostly due to governments and corporations lacking the capability or motivation to attach proper guardrails to AI (who cares if OpenAI does it safely – it just takes one corporation to not do that, or not care).

I really see us entering into something analogous to unregulated nuclear energy, as Russell states.

188. Michael Says:

#183:

> humans don’t have 3D world models either

Sure we do, go pick up a glass. Close your eyes and do it.

#184
> But of course there is enough information to construct a board representation.
> I don’t see what makes it “trival” however.

So here’s the problem: structure(world) is not f(text patterns).

In other words, the properties of atoms cannot be derived from the properties of the text symbols ‘a’ ‘t’ ‘o’ ‘m’, nor from their relative positions with other text symbols.

An NN trained on an untranslated language is incomprehensible. One trained on English is only coincidentally comprehensible to us, because *we* know what English means – *NOT* because the structure of English is the structure of reality. (The latter claim just seems obviously mad.)

So the issue is the claims made about NNs trained on a space whose structure is radically under-determined wrt the *claimed* space – here: trained on text-symbol structure, while claiming to derive world-structure.

*Nothing* can find f(x, y, z, …) s.t. (x, y, z, …) are causes, from being trained on f(a, b, c, d, …) where P(a, b, c, d, … | x, y, z, …) << 1.

> information about the external world “leaks’, which is just to say that it is useful for predicting text.

This is an illusion created by the *reader*. The world isn’t leaking; the text maintains its *apparent* semantic structure through transformations *given by* statistical distributions of historical meaningful uses of the language.

That is, a NN trained on an ancient untranslated language would likewise emit pages of stuff. There’s no world-structure in its weights. Those weights cannot be found to correlate with anything in the world – or, more precisely, we can find *arbitrary* parts of the world to correlate with them.

Having access to these weights provides zero information on the meaning of that language.

> So there are differences between how we use language …

One has language, the other is a tape recorder.

Please for a *moment*, consider you and every other animal on the planet are more than a linguistic phenomenon; that you have more properties than just patterns in your text output. Consider, for a moment, that you *cause* linguistic expression because you have interior states and exterior relations.

189. Aaron G Says:

Scott, I know this is somewhat tangential to Chomsky’s arguments against large language models (LLMs) – of which ChatGPT is among the best known – but what is your response to Stuart Russell’s critique of LLMs, that they lack any built-in knowledge and thus are inherently unreliable?

I found a summary of his arguments in a news article on the following link:

190. GS Says:

What Chomsky says in one sentence:
-ChatGPT is the most successful implementation of a “stochastic parrot” algorithm.

Most probably he’s right; that’s what it is.

P.S.
A “stochastic parrot” so good that it will devour billions of jobs and businesses in the coming years.

191. Mitchell Porter Says:

Stassa #154

“We know how they work: they model text.”

Suppose we ask how humans managed to land on the moon. One answer would be, “they model physics”. But a better answer would be, “they model physics as Newtonian mechanics”.

Similarly for the large language models. Yes, they model text. But what exactly *is* their model of text? I’m suggesting that in part, they model text as being produced by state machines resembling the human mind.

The resemblance may be partial, and it may be very coarse. But I consider this the logical starting point in trying to understand how LLMs work: study the emergence of agency in the part of the model that represents conversational pragmatics.

“meaning is hard and we don’t know how to start modelling it… it’s still structure, not meaning, they’re trained on”

My philosophy is that meanings are a kind of qualia, you won’t really understand them without understanding consciousness, and for that you need a cognitive psychology that includes consciousness. Also, that actual consciousness in our world is possibly grounded in (for example) the holism of Hilbert spaces of massively entangled systems. In which case a classical computer cannot have a conscious mind; but it can simulate the causal structure of a mind. So, insofar as meanings are causally relevant to conscious cognition, they can become part of a language model, but only as a kind of emergent latent variable, not as the real McCoy.

192. A Raybould Says:

Foobar #186:

In your latest reply, you have neglected to say why the possibility of LLMs not being completely grammatically accurate is very relevant to you. The thing is, this discussion follows from your comment “the point is that you could presumably come up with a schema of sentences that LLMs can’t deal with but humans can”, and that either is of relevance or it is not, and you can either explain why it is relevant or you cannot. As things stand now, it appears that you cannot.

As I pointed out earlier, transformer models were introduced barely five years ago (while traditional linguistics has been working on the question of how we acquire grammar for decades), yet you are already dismissing them as useless. You may not be able to see any way LLMs could be useful, but then again, your professed commitment to empiricism does not even go as far as attempting to justify your belief that there are schema of sentences that LLMs can’t deal with but humans can. I suggest we can leave it to the professionals on both sides of the issue to find out what works.

You say there is plenty of science that has advanced understanding of the human language faculty – but so what? There are other problems it has struggled with for decades, and past successes are no guarantee that success is anywhere close here. More relevantly to this discussion, your observation is simply irrelevant to the question of whether LLMs will be useful in advancing our understanding – it is like saying that naked-eye astronomy has learned a lot, so there’s no reason to think telescopes will help.

In your last paragraph, you make an issue of LLMs merely mimicking what humans do, but the thing is that an LLM mimicking grammatical usage is following the rules of grammar, and so, absent any relevant argument to the contrary, one can reasonably suspect they could provide some insight into what it takes to do so. Your risible attempt at burden-shifting – which would look ridiculous if invoked in a great many historical cases – does not change this. Furthermore, in my previous post I gave a couple of examples of how LLMs might soon add to our understanding of what it takes to learn grammar, and the fact that you have pretended those cases never came up says more than anything you have written here.

193. starspawn0 Says:

Some people mentioned the work on showing that Transformers that play Othello learn board states or world models, and it’s worth mentioning that Neel Nanda (formerly at Anthropic) has a new blog post coming out on applying “mechanistic interpretability” TransformerLens methods to the problem:

Quote: “Hopefully in the next week or two. Though to clarify, I’m now extremely confident that Othello GPT learns a real world model, and thus that this is possible for LLMs. I think its bearing on whether LLMs in practice learn specific world models is much weaker”

Presumably he will elucidate the individual circuits or algorithm that the Transformer learns.

In Othello, the individual moves look like C4, A3, D5, …, yet to play the game well you need to keep track of which squares are white, which are black, and which are empty. And these can flip to different colors over the course of the game. That’s what makes it hard to play the game without maintaining a board state — how can you tell that a particular square is white or black unless you have tracked all the times that the square has flipped during the course of the game?

On the other hand, Transformers have limited “circuit depth” and computational depth from one move to the next. So, how can they possibly keep track of a board state? It must be that you don’t need a deep circuit to do it. (If they used “chain-of-thought” or an “inner-monologue”, this wouldn’t be an issue; but these Othello Transformers don’t rely on those tricks.)
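For concreteness, here is a minimal sketch of the board-state computation the probes are claimed to recover, written out explicitly in Python (standard Othello rules assumed; legality checks and pass moves are omitted, so this is a sketch rather than a full engine):

```python
# Minimal Othello board reconstruction from a move transcript such as
# ["C4", "C3", ...]. Sketch only: assumes legal moves and strict
# alternation (no pass moves), which real games can violate.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def new_board():
    b = [["."] * 8 for _ in range(8)]
    b[3][3] = b[4][4] = "W"   # standard starting position: d4, e5 white
    b[3][4] = b[4][3] = "B"   # e4, d5 black
    return b

def apply_move(board, move, player):
    col = ord(move[0].upper()) - ord("A")   # "C4" -> column 2, row 3
    row = int(move[1]) - 1
    opp = "W" if player == "B" else "B"
    board[row][col] = player
    for dr, dc in DIRS:                     # flip every bracketed line
        r, c = row + dr, col + dc
        line = []
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opp:
            line.append((r, c))
            r, c = r + dr, c + dc
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            for fr, fc in line:
                board[fr][fc] = player

def board_from_moves(moves):
    board, player = new_board(), "B"        # Black moves first
    for m in moves:
        apply_move(board, m, player)
        player = "W" if player == "B" else "B"
    return board
```

Every square’s color after any prefix of the transcript is a deterministic function of that prefix; the interpretability question is whether, and in what form, the Transformer’s activations encode that function.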

….

Concerning what these language models have meant to “science”, for one thing there’s the question of how they work. They’re a new object to study. And people have made some progress, as these Othello works indicate. There are also large groups of people who work in “BERT-ology”, trying to understand how BERT models work so well.

For another example, people have used LLMs as “encoding models” to model brain responses to language, and as someone indicated in an earlier post, LLMs have shown a striking similarity with the brain. See also:

https://news.mit.edu/2021/artificial-intelligence-brain-language-1025

Quote: ““For me, this result has been a game changer,” Fedorenko says. “It’s totally transforming my research program, because I would not have predicted that in my lifetime we would get to these computationally explicit models that capture enough about the brain so that we can actually leverage them in understanding how the brain works.””

Note that Fedorenko is a neuroscientist, not an LLM engineer.

And what else have they learned?:

Quote: ““A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly — but does end up looking brain-like — this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.”

(That also probably means that even if two humans learn to use language well, their brain patterns will match, even if they learned it by somewhat different routes.)

The success of LLMs has also led to new theories in human cognition, for example, Uri Hasson et al’s “Direct Fit to Nature”:

https://www.biorxiv.org/content/10.1101/764258v3

In fact, I remember hearing a podcast where Hasson discusses this work, and he seems to have undergone a complete reversal in his thinking over the years, perhaps partly due to the success of large neural net models, especially language models.

194. foobar Says:

Porter #191

> I’m suggesting that in part, they model text as being produced by state machines resembling the human mind.

That’s something that has to be substantiated not just asserted.

195. foobar Says:

Raybould #192

> In your latest reply, you have neglected to say why the possibility of LLMs not being completely grammatically accurate is very relevant to you.

It’s not the fact that they are not “completely grammatically” accurate but that their grammatical inaccuracy is different from humans’, which almost certainly means that the underlying process is not a good model for the human language faculty. That isn’t important to most proponents of LLMs, but for me the importance is two-fold. First, it makes it far less likely that you can use the output to understand the human language faculty. Second, it suggests that there are qualitative differences from human output (which has implications as to what types of output you cannot produce, and potential limitations to simply scaling up).

> yet you are already dismissing them as useless.

I’ve said repeatedly that they are not useless (as has Chomsky). I’ve said they are not useful for everything (and in particular not useful for the specific thing that is understanding the human language faculty).

> There are other problems it has struggled with for decades, and past successes are no guarantee that success is anywhere close here.

Just nonsense. Every scientific field has unsolved questions – do we throw out science?

> More relevantly to this discussion, your observation is simply irrelevant to the question of whether LLMs will be useful in advancing our understanding – it is like saying that naked-eye astronomy has learned a lot, so there’s no reason to think telescopes will help

For the last time, you can’t just assert without justification that LLMs will advance understanding based on the logic of “anything is possible”. There are obvious reasons to suspect telescopes and microscopes would be useful for gaining understanding; there aren’t for LLMs.

> one can reasonably suspect they could provide some insight into what it takes to do so

No, not if they use a different process and you can’t reasonably expect to link the output of the different process to some way of understanding the original. Saying “anything is possible” is hardly an argument.

> Furthermore, in my previous post, I gave a couple of examples of how LLMs might soon add to our understanding of what it takes to learn grammar,

Same response as above. They are different processes so you have to have some rationale about how you can connect them. Without rationale it’s just magical thinking. Maybe. Somehow. Anything is possible. We select between alternatives to pursue based on the rationale, not randomly because “anything can happen”.

196. fred Says:

The inability of ChatGPT to consistently manipulate “James is too stubborn to work with” just illustrates the rarity of that construction in the language.
Also, human language “grammar rules” aren’t exactly mathematical. It’s not like our distant ancestors laid down any explicit rules as language evolved; grammar rules were derived after the fact from common usage, and they’re in constant flux (applying more or less to the latest trends in the language).
And then humans don’t typically generate sentences that push the limits of what’s implied by those rules because it would lead to ambiguous sentences or sentences that require too much brain power to decode in real time to be practical.

PS: It’s not like “whassup y’all?” is covered by early 19th century grammar rules, but:

Q: whassup y’all?

ChatGPT: Hello! As an AI language model, I don’t have feelings, but I’m here to assist you with any questions or tasks you might have. How can I help you today?

197. fred Says:

Starspawn0

“That’s what makes it hard to play the game without maintaining a board state ”

That’s why eventually ChatGPT will be augmented by some visual data, even if very lo-fi – in this case, a board image with the corresponding tile positions. That’s a good first step in helping the model make the correct associations.
As humans, we do some internal visual association with every thought (as sentences) that goes through our mind, even if very faint and unnoticed.
If you think “I gotta go to the bank tomorrow”, it will also conjure some faint internal visualization of you getting in your car and the interior of the bank agency, etc.
Same when we read a book, we constantly have interpreted images appearing in our minds.
It’s almost impossible for us to think of the word “green” with our eyes closed without also conjuring the color green.

198. Bill Benzon Says:

steven t. piantadosi: Large language models change everything for linguistics, starting with Chomsky. Featuring: LLMs as scientific theories, response to prior takes, “why” questions in language, acquisition… and how the field should have seen this coming.

One of the ways to test LLMs for generalization would be to feed in language from the past e.g. before 1000 A.D. or even further back and see if they generalize to anything modern – art, speech, language, architecture, thought.

Humans can (they have) and very quickly.

The point being: as languages evolve over the next thousands or millions of years, with us getting immensely dependent on this stuff, one may end up overfitting to what already exists and stop, or vastly reduce, creating new things. Alternatively, it may free up people’s time to do more creative stuff. But creative work is going to die without attribution.

My guess is that attribution will become important in creative literature as well. The vast numbers of people who are writing literature should claim copyright and request to be taken out of the training set.

I think these things can be useful, but only after beefing them up with attribution layers to keep the creative class motivated.

Hundreds of billions of dollars of investment are going into generative AI, and Chomsky’s disagreements are not going to have much of an effect. There’s no need to try to shut down his disagreements; they’re important.

Scott’s disagreements over the capabilities of Q.C. are not going to shut down Q.C.

200. UnreasonableEffectiveness Says:

Surely, this is a case where we can invoke Clarke’s first law:

> When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Is Chomsky at this point not the definition of a distinguished but elderly scientist?

(Of course this is a bit tongue in cheek — this observation does not supersede actual discussion of the topic at hand.)

201. foobar Says:

Bill Benzon Says:
Comment #198

I’m partway through the Piantadosi paper. It mostly just repeats arguments that have been made here. (He also references results in many other papers, which I haven’t checked, but they mostly sound like the many LLM-human-connection papers that make convoluted definitions and tenuous assumptions to justify the headline result.)

202. foobar Says:

starspawn0 Says:
Comment #193

> Concerning what these language models have meant to “science”, for one thing there’s the question of how they work. They’re a new object to study.

Science is not the same as math or engineering. You can have a new object to study in graph theory, or a new compression protocol; neither of these is science, which is the study of the natural world (both are still useful, though). They specifically don’t inherently tell you how the human language faculty works. That has to be shown.

> For another example, people have used LLMs as “encoding models” to model brain responses to language, and as someone indicated in an earlier post, LLMs have shown a striking similarity with the brain.

When you dig into these studies, they tend to have tenuous conclusions. The “recreate what you see from fMRI” image study is one example: the title was very misleading and greatly oversold what the study actually found.

> Quote: ““For me, this result has been a game changer,” Fedorenko says. “It’s totally transforming my research program, because I would not have predicted that in my lifetime we would get to these computationally explicit models that capture enough about the brain so that we can actually leverage them in understanding how the brain works.””

You can find tons of quotes like this in popular science magazines for any subject. Most of them don’t actually produce any “game changing” results when you look a few years later.

> Quote: ““A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly — but does end up looking brain-like — this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.”

Again this is so vague and open to judgement/interpretation. I bet if you actually read the paper you would have to apply a lot of squinting and questionable definitions.

If you have a specific paper you have actually read and think is justified and relevant to understanding the human language faculty please post a link to that.

203. KT2 Says:

Karl Friston says:
“As a consequence, I’m compelled to treat consciousness as a process to be understood, not as a thing to be defined. Simply put, my argument is that consciousness is nothing more and nothing less than a natural process such as evolution or the weather. My favourite trick to illustrate the notion of consciousness as a process is to replace the word ‘consciousness’ with ‘evolution’ – and see if the question still makes sense. For example, the question What is consciousness for? becomes What is evolution for? Scientifically speaking, of course, we know that evolution is not for anything. It doesn’t perform a function or have reasons for doing what it does – it’s an unfolding process that can be understood only on its own terms. Since we are all the product of evolution, the same would seem to hold for consciousness and the self.”

From “The mathematics of mind-time”

“The special trick of consciousness is being able to project action and time into a range of possible futures”
by Karl Friston
https://aeon.co/essays/consciousness-is-not-a-thing-but-a-process-of-inference

Scott & all, PLEASE, after you crack AI watermarking, code this tensor / model / UBI via rounding error scifi. Or employ Sandra Haynes at OpenAI.

Winner of “contest in partnership with the Economic Security Project”

“Read The Into The Black Contest’s Winning Story, Set In A Future Where Economics Are Also Humane

“Rounding Corrections” by Sandra Haynes

“Money on her mind: UMass financial analyst Sandra Haynes wins fiction prize
https://www.gazettenet.com/Sandra-Haynes-15378807

Marvin Minsky (in personal conversation around 2015), defending against criticisms of neural networks, said “doesn’t mean they can’t get better”. Years ago, I fondly recall him calling Chomsky a behaviorist in his class – which means two things: 1) computer behavior should not be constrained by the nature of human architecture, and 2) limiting our understanding of the computational processes of language in the mind to syntax is too heavily constrained; language is simply not that isolated in the mind.

I’m with Chomsky on 1, in the sense that the bigger risk is getting carried away by language models: nature is far more complex and deep, full of mechanisms that current modeling only touches at the surface, and the languages for expressing the deeper things either don’t exist or haven’t yet been created, which means they’re unlikely to be interpolated by machine learning. Those ideas will emerge over hundreds of thousands of years in the human mind, and only a human will be able to make sense of them.

The second rabbit hole is more interesting in the context of language models: can one get insights about the mind from the behavior of language models? That’s where one will need to break from Chomsky.

Language learning should not be conflated with language behavior: how language behaves once bootstrapped from the innate capacity is completely open to exploration. With the massive neocortex that humans have, one can hypothesize that it behaves in all sorts of fun and interesting ways, and one can foresee language models bringing more insight into this.

205. starspawn0 Says:

foobar #193:

> Science is not the same as math or engineering. You can have a new object to study in graph theory or a new compression protocol, neither of these are science which is the study of the natural world (both are still useful though). They specifically don’t inherently tell you about how the human language faculty works. That has to be shown.

This sounds like it’s getting into the question of “is computer science a science?”. You can read lots of defenses of this position.

By studying BERT, people hope to reveal how specific language capabilities (e.g. related to syntax) could possibly emerge in large networks – which, in turn, may prove useful in understanding the brain, though that isn’t a goal of BERT-ologists. This might be considered part of “complexity science” (not to be confused with complexity theory). The flavor of problems here is not the sort one sees in engineering or math: it’s too “messy” for math (built on lots of natural language data), and it’s not the “how can we tweak this to make it better?” approach one associates with engineering – though that may also be a goal of some BERT-ologists, just as medical researchers often think in terms of “translation” (i.e. how can we apply this in the clinic?).

> When you dig into these studies they tend to have a tenuous conclusion. The recent “recreate what you see from fMRI” image study is a recent example. The title was very misleading and greatly oversold what it actually found.

First of all, that’s a completely different sort of model. It might have been built on an encoding model of some kind, but it sounds more like a decoding model (though you can build a decoder using an encoder). Even so, I don’t get what’s at all confusing about that work, or why it should be at all surprising: if someone is exposed to different stimuli, different parts of the brain light up, according to what they see, what it “means” to the person, and also how they shift their attention. The pattern is very complicated, and it’s different from person to person; and it’s hard to decode (or encode) reliably or robustly with a brain-scanning technology as blunt as fMRI, where each voxel may correspond to millions of neurons. So it’s remarkable that these decoding methods work at all.

And noise is very high in fMRI data (and there’s a few seconds of latency), so a model that shows “good alignment” with fMRI may still only have a low-but-positive correlation.

> You can find tons of quotes like this in popular science magazines for any subject. Most of them don’t actually produce any “game changing” results when you look a few years later.

Well, this paper appeared in PNAS (with Tenenbaum as a coauthor, though second-from-last named author), and applications of deep learning to neuroscience have been heating up over the years. Deep learning models have become part of the toolkit of computational neuroscience. Until someone comes up with a set of methods that do better, they will remain widely used.

Another thread of work in this vein is attempting to model the human visual cortex using deep nets, making adjustments to the models as needed, in order to get better alignment with the brain. For example, DiCarlo’s work.

> Again this is so vague and open to judgement/interpretation. I bet if you actually read the paper you would have to apply a lot of squinting and questionable definitions.

> If you have a specific paper you have actually read and think is justified and relevant to understanding the human language faculty please post a link to that.

You sound like Gary Marcus.

206. Sandro Says:

Michael #171:

> Cooling is a form of compression; sweeping leaves on a lawn is compression; boiling rice is compression.

No, you’re equivocating on the multiple meanings of compression. Sweeping leaves is compaction, not compression. Compression in computer science is not the same process as compression in thermodynamics.

> There are an infinite number of mathematical descriptions which can be attributed to any concrete system; without a causal semantics for that mathematics, you aren’t saying anything.

Yes, but they aren’t all equally likely, and we’ve known how to order them by likelihood for over 60 years now: Solomonoff’s universal prior.
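As a toy illustration of what that ordering looks like in practice – with zlib compression as a crude, computable stand-in for true description length, which Solomonoff’s prior defines but which is uncomputable (the helper name here is my own, purely illustrative):

```python
import zlib

def crude_prior_weight(description: bytes) -> float:
    """Assign weight 2^-L to a hypothesis, where L is the length in bytes of a
    zlib-compressed encoding -- a rough, computable proxy for the (uncomputable)
    description length that Solomonoff's universal prior actually uses."""
    return 2.0 ** -len(zlib.compress(description))

highly_regular = b"abab" * 100    # generated by a very short "program"
less_regular = bytes(range(256))  # no obvious regularity for zlib to exploit

# The more compressible (shorter-to-describe) hypothesis gets far more weight,
# which is the sense in which descriptions "aren't all equally likely".
```

The sketch only captures the flavor: real Solomonoff induction weights all programs consistent with the data, not just one compressed encoding.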

> at least methodologically, whereby anything related to the mind is purely mathematical; but of course heat, mass, digestion, (hunger?), require science. […] If you actually studied, as in science, all really-existing intelligent systems the idea that you’d start from “information theory” would be a patent misdirection.

I think these are mathematical systems too, just of a different kind. Your core mistake is thinking that mathematical studies, particularly the ones in computer science, are not scientific but something else. Information theory came from the study of physics. Computation and physics are in fact inextricably interlinked.

> The *reason* they’re seeing output is not because the system is intelligent, it just appears to be. Indeed, of course: the engineers haven’t imparted any capacity to be intelligent. They’ve just used symbolic pattern finding to find those patterns which best imitate it.

…which assumes the conclusion that intelligence is not just pattern matching.

I think Piantadosi does a great job taking down the Chomskyist position on these: Modern language models refute Chomsky’s approach to language, Steven Piantadosi, March 2023

> “It appears X, so it is X” is, at best, pseudoscience.

When X is “information” or even “information process”, it’s pretty much tautological.

Michael #175:

> It seems very clearly impossible that any statistical methodology could take a set of 2D images and obtain a “3D world model”, this is effectively provable. Since 3D is radically under-determined wrt 2D.

This statement about the relationship between 3D and 2D is incorrect. See holography, for example, where 3D information can be encoded on a 2D boundary. Whether a statistical methodology can or cannot infer this information is not obvious.

> Likewise, for I can only think *obvious* reasons, an NLP system cannot obtain relevant “world models” from language.

At least we can agree on this much. Digital tokens are not sufficient for LLMs or other AI systems to understand as much of the world as we do. This requires multimodal training data. This has been my position from my very first post.

As I’ve also already said to this end, multimodal training has matched GPT-3 with models two orders of magnitude smaller while improving effectiveness. As a general trend, improvements in algorithms have been outstripping improvements in hardware for years now. Improvements in these systems are going to come faster and faster, and I’m afraid there soon won’t be much room left for denial.

207. Scott Says:

UnreasonableEffectiveness #200: That’s actually an extremely interesting question — has there ever been a scientific dispute with younger scientists saying something is possible, “distinguished but elderly” scientists saying it’s impossible, and the latter being vindicated? What would be the best candidates?

208. Sandro Says:

Sandro #205:

> I think Piantadosi does a great job taking down the Chomskyist position on these: Modern language models refute Chomsky’s approach to language, Steven Piantadosi, March 2023

Scott, you may actually be interested in reading this reference. It’s very well-cited and makes quite a convincing case that not only is the study of linguistics based on language models scientific, it is arguably more scientific than existing approaches, given that language models 1) make repeatable and concrete predictions, 2) can be and have been routinely empirically validated (empirical predictions of existing approaches are almost non-existent), 3) can be differentiably analyzed for the specific features responsible for failures and successes, 4) provide insights and successfully map onto actual neuroscientific theories, and 5) arguably have more empirical support than generative theories of syntax, which suffer from a “chronic lack of independent empirical support”.

I didn’t even cover all of his points, and he provides copious citations for all of these claims – 15 pages of references in all. He even directly addresses Chomsky’s comments from articles like this one, and the conclusion contains a takedown of Chomsky from ChatGPT itself. Linguistics is dead. Long live linguistics!

209. Eric Says:

To me, Chomsky’s essay is reminiscent of Michael Atiyah’s late-life claimed proof of the Riemann hypothesis. Hard to know what to say beyond, “Well, that’s kinda awkward…”

210. A Raybould Says:

Foobar #195

I see that you have adopted the ploy of quoting out of context, as wielded by Senator Joseph McCarthy and many others before and since:

>> There are other problems it has struggled with for decades, and past successes are no guarantee that success is anywhere close here.

> Just non-sense. Every scientific field has unsolved questions, do we throw out science?

To someone who has not seen this in context, it might appear that I have condemned linguistics on the specious grounds that it has unsolved questions, but, as anyone checking back through this discussion can readily see, I am actually pointing out how tendentious it is for you to insist that LLM research will never be useful in understanding the acquisition of grammar, on the basis of its lack of progress in a few years, while not holding traditional linguistics, which has been around much longer, to the same standard. It is you, not me, who is attempting to preempt a field of study on the basis of its having unsolved questions – you have become so turned around in this discussion that you are kicking yourself in the derriere!

>> More relevantly to this discussion, your observation is simply irrelevant to the question of whether LLMs will be useful in advancing our understanding – it is like saying that naked-eye astronomy has learned a lot, so there’s no reason to think telescopes will help

> For the last time, you can’t just assert without justification that LLMs will advance understanding based on the logic of “anything is possible”. There are obvious reasons why you would suspect telescopes and microscopes would be useful to gain understanding, there aren’t for LLMs.

Once again, you are completely turned around. This is not a case of me asserting that LLMs certainly will advance understanding; it is a case of you asserting that they could not possibly do so.

While we are here, let’s get it straight that “anything is possible / can happen” is not something I have said, and it is not part of my argument, which requires nothing more than the possibility that LLMs may give us some empirical insight into natural language or intelligence – something you agreed to in #186. “Anything is possible” is a straw man constructed out of your own confusion.

>> yet you are already dismissing them as useless.

> I’ve said repeatedly that they are not useless (as has Chomsky).

Indeed, you have said it (though more like once or twice rather than repeatedly), but in every post, you have also made claims and presented arguments that would, if sound, imply the opposite. In other words, you are being inconsistent.

You do actually seem to be moving slowly in the direction of their potential usefulness: in #176 you attempted to summarily dismiss the prospect, but now you are being more nuanced, using phrases like “less likely” and “suggests”. I predict that we will see more of this as the discussion continues, until we reach the point where you say this is what you have been saying all along (hmm… where have I heard that before?).

It is clear that your whole argument rests on one premise: LLMs’ grammatical competence does not work the same way as it does in human minds. That is obviously true, but what you have never justified is the leap from there to the conclusion that LLMs therefore cannot be helpful in understanding grammatical competence.

And there is a heavy burden to be discharged in making that argument, as you have to show that even though we have something that is as competent in human-language grammar as the average human, nothing about how it achieves that could possibly have the slightest relevance to how humans achieve the same results. Can you be certain of that? Of course you cannot, and any claim that you can would be empty bluster unless you had a really strong argument (something much better than appeals to intuition and non-sequiturs).

This is the fundamental contradiction in your position: you claim to speak for scientific empiricism, yet every argument you present is against pursuing a line of inquiry merely on the supposition that it will not be useful.

Sandro #206 said:

” Improvements in these systems are going to come faster and faster…”

It depends on what you’re counting as improvement. LLMs will be more convincing superficially, that’s almost certain, and less prone to trivial and laughable mistakes.
That improvement has many sides, most of them not so bright.
I don’t mean here the obvious problems (people loosing their jobs, others loosing their time just playing with them, the even more convincing spreading of misinformation or propaganda, etc.); I mean mostly the issues that are inherent to these models that are based on probabilities.
Common misconceptions that are widespread on the internet are much more probable to appear in the chatbots’ answers than the correct, more nuanced or subtle information (which can be found online in a few specialized sources).
In such cases (and there are many: imagine, for example, how many widespread misconceptions are all over the place now, about topics across all human intellectual activities – science, arts, engineering, medicine, economics… the catalogue is endless), the efficiency of the chatbots at giving seemingly plausible and convincing answers (without real understanding, as Penrose has warned…) is an ‘improvement’ only in a narrow, limited sense (and only from the perspective and motivation of certain involved people, not for the majority).
For the rest of humanity, and in the long term, it will be disastrous, especially if things continue in the same direction (i.e. chatbots being merely stochastic parrots).

212. A Raybould Says:

Foobar #193, starspawn0 #205

>> Concerning what these language models have meant to “science”, for one thing there’s the question of how they work. They’re a new object to study.

> Science is not the same as math or engineering. You can have a new object to study in graph theory or a new compression protocol, neither of these are science which is the study of the natural world (both are still useful though).

An algorithm or program is an abstract thing, but a computer executing a program is a physical object in the natural world (and when that program is a recent LLM, the computer is a physical object which parses and produces natural language with a grammatical accuracy that falls well within the bounds of human performance.) It is a category error to confuse the two.

213. Sandro Says:

> I mean mostly the issues that are inherent to these models that are based on probabilities.

The innateness of such issues is speculative, at best. I’ll reiterate: multimodal training improves LLM properties by orders of magnitude. For instance, [1] surpasses even human performance on the ScienceQA benchmark, and it’s more than two orders of magnitude smaller than GPT-3. And LLaMA, run via llama.cpp, now matches GPT-3 while running on phones.

Poor training data can indeed lead to LLMs making factual mistakes, as it would with humans. This gets harder when training data is multimodal because concepts are correlated across inputs; people don’t doubt what they see with their own eyes, despite what they may be told, and I fully expect the same happens in LLMs.

[1] Multimodal Chain-of-Thought Reasoning in Language Models, https://arxiv.org/abs/2302.00923

A minor typo correction to my previous (#211 ) comment:

*losing* their jobs/ time, not …”loosing”.🙂
( Autocorrection struck again…).

215. Ilio Says:

Michael #188,

>A NN trained on an untranslated language is incomprehensible.

Guillaume Lample, Ludovic Denoyer, and Marc’Aurelio Ranzato. Unsupervised machine translation using monolingual corpora only. CoRR, abs/1711.00043, 2017. URL http://arxiv.org/abs/1711.00043

216. Jon Awbrey Says:

Today’s Acronym ☞ DYWALAPTIAPAVA?

Don’t You Worry A Little About People Treating Intelligence As Purely A Verbal Affair?

Happy Einstein’s Birthday, Everybody!

217. foobar Says:

Sandro #208

> Scott, you may actually be interested in reading this reference. It’s very well-cited and makes quite a convincing case that not only are the study of linguistics based on language models scientific

I’ve finished reading the paper and it makes the same arguments I’ve addressed elsewhere here. It is “well cited” in the sense that there are a lot of citations; the one citation I spot-checked was a stretch (see below).

> 4) provide insights and successfully map onto actual neuroscientific theories,

Can you provide one citation that provides such an insight on a neuroscientific theory? I can’t check 15 pages of citations. I did a spot check on one citation that I thought looked interesting, claiming that LLMs model semantic tree structure, and it’s a stretch to come to that conclusion from the evidence they present. They fit a distance metric on the internal representation of the LLM against the distance metric you would get from a semantic parse tree for the sentence, and then report an out-of-sample correlation of 0.87–0.89. Maybe there are semantic parse trees in LLMs, but this indirect, byzantine methodology (which feels like fishing for patterns à la The Da Vinci Code) is far from clear evidence that there are. It certainly is less of a “smoking gun” than the title suggests.
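To spell out what “fit a distance metric” means here, a toy sketch in the spirit of that methodology – with random stand-in data instead of real LLM states or parse trees, so nothing below comes from the paper itself:

```python
import numpy as np

def probe_loss(B, H, T):
    """Sum of squared errors between ||B(h_i - h_j)||^2 and target distance T[i, j]."""
    n = len(H)
    return sum((float((B @ (H[i] - H[j])) @ (B @ (H[i] - H[j]))) - T[i, j]) ** 2
               for i in range(n) for j in range(n))

rng = np.random.default_rng(0)
n, d, k = 8, 16, 4                     # words, hidden size, probe rank
H = rng.normal(size=(n, d))            # stand-in for LLM hidden states
T = rng.integers(1, 5, size=(n, n)).astype(float)
T = (T + T.T) / 2                      # fake symmetric "parse tree" distances
np.fill_diagonal(T, 0)

B = 0.1 * rng.normal(size=(k, d))      # the linear probe being fitted
initial = probe_loss(B, H, T)
for _ in range(300):                   # plain gradient descent
    grad = np.zeros_like(B)
    for i in range(n):
        for j in range(n):
            v = H[i] - H[j]
            err = float((B @ v) @ (B @ v)) - T[i, j]
            grad += 4.0 * err * np.outer(B @ v, v)
    B -= 1e-5 * grad
final = probe_loss(B, H, T)            # lower than `initial` after fitting
```

The point is just the shape of the method: a linear map is tuned so that squared distances between transformed states track the target distances, and a held-out correlation is then reported. A good fit by this procedure is still a long way from showing that parse trees live inside the model.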

I’m guessing most of the other results are similarly problematic.

As for your 1-5, it comes back to the difference between prediction and explanation. The author provides an unconvincing rebuttal to this point, as made by Chomsky, on page 26. In the first part he says that building [black-box-like] models “is theory building and comparison”, and that because deep nets make better predictions they are better theories. But this completely ignores the obvious point Chomsky makes with the example on that page: (modified slightly) there is a difference between Einstein’s general theory of relativity and a black-box machine that predicts gravity and time effects with billions of parameters. Einstein’s theory is more valuable to understanding than the black box *even if* the black box makes better predictions, and even if (as he also thinks is a flaw) Einstein’s theory doesn’t explain everything, like quantum effects, while the black box does.

I think this will be my final comment in this thread, by now I’m just repeating the same basic point. (Except possibly if someone posts a paper they have read that they think justifiably shows that LLMs provide some understanding of the human language faculty).

218. fred Says:

For the ChatGPT newbies, this Wolfram talk was pretty interesting (skipped to the part about semantics).

219. Bob Says:

Oh Dear

https://garymarcus.substack.com/p/caricaturing-noam-chomsky?sd=pf

220. notGPT Says:

I wonder why more people are simply not doing “experiments” and using them to demonstrate points about the failures of LLMs, even though everyone has literally been given the opportunity to run armchair “experiments” for free.

Here’s a comment I found from a recent video on Sabine’s YouTube channel.

Steve Baker 3 days ago (edited)

The thing that blew me away was when I told ChatGPT about a “new word” – I told it that “wibble” is defined as: a sequence of four digits that are in neither ascending or descending order. I asked it to give me an example of a wibble – and it did. 3524. I asked it for a sequence that is NOT a wibble and it said 4321. Then I asked it for an anti-wibble and no problem, 2345. Then I asked it for an example of an alpha-wibble and it said FRDS….which is amazing.
It was able to understand an entirely new word…which is clever – but it was able to extrapolate from it…which is far more than I thought possible.

I want to see some responses from the stochastic parrot camp. What does this say about LLM understanding?
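For concreteness, Steve’s definition is easy to pin down mechanically – here is one sketch, assuming “ascending/descending” means strictly monotone (the definition as given is ambiguous on that point):

```python
def is_wibble(s: str) -> bool:
    """A "wibble" per the quoted definition: a sequence of four digits that is
    in neither ascending nor descending order (read here as strictly monotone)."""
    if len(s) != 4 or not s.isdigit():
        return False
    d = [int(c) for c in s]
    ascending = all(a < b for a, b in zip(d, d[1:]))
    descending = all(a > b for a, b in zip(d, d[1:]))
    return not ascending and not descending
```

Under this reading, ChatGPT’s answers all check out: 3524 is a wibble, 4321 is not, and the “anti-wibble” 2345 is not.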

221. Scott Says:

Bob #219: The truth is this. This year I’m leading the quantum group at UT, AND I’m working at OpenAI, AND I’m raising my two kids, and I’m blogging and talking to journalists. And I’m a failure at everything.

I have good new ideas about quantum computing, but I don’t have time to pursue them.

I have ideas about AI safety, but I don’t have time to write them up.

I know what needs to be done to deal with my kids’ constant meltdowns and misbehavior, but I don’t have time to implement it.

Finally, it is obvious beyond obvious that I’ll be vindicated by history in seeing the transformations about to be wrought by AI, and that Chomsky and Marcus and the like are fighting a losing war against reality. But I don’t have time to set out the argument in a properly sneer-resistant manner.

Meanwhile, every day brings a dozen or more new impositions on my time — more journalists writing articles, more people arguing with something I wrote years ago, etc etc. I leave more and more of it unanswered.

In short, I’m more and more of a failure in every part of my life and I don’t know what to do about it. This is despite, as best as I can tell, being right about a large number of things.

222. Bill Benzon Says:

Here’s a blog post in which I explain what vector semantics helps us to understand about one of the greatest poems in the English language: Vector semantics and the (in-context) construction of meaning in Coleridge’s “Kubla Khan”.

223. OhMyGodness Says:

A Raybould #212

I agree that science is the study of the natural world that allows abstractions that illuminate some portion of the future evolution of the natural system. If these do not provide testable illumination then they are not scientifically valid abstractions. Computer programs that model the natural world are abstractions of these mental abstractions – discretized, crystallized thoughts, in a sense. Engineering models are also models of the natural world (objective physical reality), so I am not sure of the distinction you are making. I don’t understand the distinction you are drawing between science as a human activity that results in abstractions and computer programs that are second-order abstractions.

foobar #217

I personally don’t know of any testable predictions that Chomsky has made. He has strong beliefs in many areas but I don’t know of any objective tests conducted of his beliefs. Some areas of scientific endeavor are, at the current time, stalled in a, call it, pre-scientific phase. There are ideas that have ardent supporters and ardent detractors but no Eddingtons yet to settle the debates. Maybe Chomsky has found some truth that will contribute to future work that can be confirmed with scientific rigor, but until then it is pre-scientific belief at best.

224. Joshua Zelinsky Says:

@notGPT #220,

I tried to duplicate that experiment with a slightly more complicated rule. Here is the conversation:

Me: Here is a new word a sambruck. A sambruck is a sequence of digits each digit which is exactly two more or two less than the previous. An example of a sambruck is 1353579. Another example is 797. Another example is 2468646. Based on these give three more examples of sambrucks.

ChatGPT: Sure, here are three more examples of sambrucks:
86420 – Each digit is two less than the previous digit.
1597531 – Each digit alternates between two more and two less than the previous digit.
2468088642 – Each digit alternates between two more and two less than the previous digit, except for the central digit 0 which is the same as the digit two places to the left.

Me: Can you extend this and give an example of a sambruck using the English alphabet rather than digits?

ChatGPT:
Certainly! Here’s an example of a sambruck using the English alphabet:

“cedfghijkjihgfdec”

In this sambruck, each letter is exactly two more or two less than the previous letter in the sequence. For example, the letter “c” is two less than the letter “e”, the letter “f” is two more than the letter “d”, and so on.

My comment: ChatGPT decided not too unreasonably that 8 and 0 should be 2 away from each other, as many puzzles or similar things roll the digits over. So it was able to understand and respond to that ok. But its attempt at an alphabetical sambruck fell on its face.

So it seems like more than a stochastic parrot, but at this task, not that much more. If someone has GPT4 access, I’ll be curious to see how it responds to either the sambruck or wibble versions.
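For what it’s worth, a strict checker (my own throwaway helper; it deliberately ignores the 8↔0 rollover reading) flags two of ChatGPT’s three examples:

```python
def is_sambruck(s: str) -> bool:
    """Strict reading of the definition: every digit differs from its
    predecessor by exactly 2, with no 0<->8 wraparound allowed."""
    return (len(s) >= 2 and s.isdigit()
            and all(abs(int(a) - int(b)) == 2 for a, b in zip(s, s[1:])))
```

By this reading my three seed examples all pass, and so does 86420; but 1597531 fails (1→5 jumps by 4) and 2468088642 fails at the 8→0 step.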

225. OhMyGoodness Says:

Scott #221

Still oh so much better than crawling up the sands of Omaha Beach on D-Day, or finding out your new neighbor is D Trump, so there is that. 🙂

226. B333 Says:

My view is that while multimodal training is useful, it is not strictly necessary for language models to gain an understanding of the world. What makes learning through a visual channel fundamentally different from learning through a text channel? In both cases a world model must be inferred to predict the data.

Anyway, an interesting question for skeptics of language models is what is the least impressive thing they will never be capable of.

227. fred Says:

“As a Professor of Linguistics myself, I find it a little sad that someone who while young was a profound innovator in linguistics and more is now conservatively trying to block exciting new approaches.“

Right, right, that’s a little sad indeed…
But on the bright side, I’m quite impressed that a 94-year-old is still able to write an article about any cutting-edge tech, even if he doesn’t agree with the opinions of people who are 50, 60, or even 70 years younger than he is… a feat few of us will be able to achieve!

That’s the equivalent of Scott writing an article in 2075, bitching that the latest 200 million tachyonic qubit QC isn’t really unfolding the multiverse the right way. 😛

228. Jon Awbrey Says:

And yet it is those impositions and interruptions where reality lies …

Regards,

Jon

229. stochastic parrot Says:

re notGPT #220:

You
define a wibble as a sequence of 15 unique digits that is in increasing order. give me an example of a wibble

ChatGPT
A wibble could be: 012345678945321.

(that’s not to say stochastic parrots can’t be useful)

230. Scott Says:

B333 #226: I love that question and might have to steal it!

231. apples Says:

scott #221 It sounds like you should be extremely aggressive with outsourcing and culling bullshit from your life. 10 hours a week of your time is probably worth at least 100k.

100k on childcare goes a long way.

Talking to journalists and anything related to Noam Chomsky is bullshit. You could probably have more reach than most science journalists with a Twitter account that just tweets links to your blog (since your friends have lots of followers and would retweet). And even if you didn’t, it doesn’t matter.

232. Simon Says:

B333 #226,

I have a suspicion that it leads to a quicker model collapse, a reduction in the overall neural entropy (apparten, I will define it if necessary, but I hope it’s clear from the context), stabilizing the answer to generate a world fact.
https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse
Although it only concerns itself with mode collapse in monomodal models.

Ask yourself, if you only ever had your auditory information to learn about the world, would you be capable of making good inferences under all circumstances?

Sure you could try to hammer a nail into the wall without visual information being processed (I make no statement about the qualia, the actual inner perception here, it’s beyond my grasp how that arises) but you better cover up your fingers!

More data leads to a capability to make more informed decisions. Multimodal models are impressive, I played around a lot with them the last couple of days.
Also Microsoft’s findings on their MM Kosmos-1 are consistent with the findings by Amazon.

I conjecture, and will test, whether even relatively small models like Pygmalion 350M (which often behaves outright deranged) can get a significant boost from such vision features – definitely worth experimenting with and possibly training.

I actually want to integrate MM support into WebUI-Text or WebUI such that the model can be quickly loaded there, just got to fix the OS path mess in the code. Let me know if you want to help and get some firsthand multimodal empirical experience 🙂

233. AridWinter Says:

Alas, GPT-4 is still bad at sambrucks and wimbles, but I don’t think it’s its fault. It doesn’t see individual characters, because it parses text in chunks (“tokens”), and I think that handicaps its performance.
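To illustrate the point, here is a toy longest-match tokenizer over a tiny made-up vocabulary. This is emphatically NOT the real GPT byte-pair encoder; it is only a sketch of why a model that receives multi-character tokens never sees the individual letters inside them:

```python
# Toy longest-match tokenizer over a tiny, made-up vocabulary.
# NOT the real GPT tokenizer; it only illustrates that once text is chunked
# into multi-character tokens, the letters inside a token are invisible.
VOCAB = {"al", "pha", "bet", "wib", "ble"}

def tokenize(text: str) -> list:
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("alphabet"))  # ['al', 'pha', 'bet']
print(tokenize("wibble"))    # ['wib', 'ble']
```

A model working over those chunks has to have memorized each token’s spelling to play letter-level games like counting characters or extending “alpha-wibble” to “alphabet-wibble”.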

notGPT #220

What’s so surprising about extrapolating alphabet-wibble from alpha-wibble? Is this so unexpected from a stochastic parrot, or am I missing something from your argument?

235. Sinity Says:

I asked GPT-4 to write rap based on this text: https://pastebin.com/ziqu2pti

(Verse 1)
Yo, listen up, let me drop this tale,
About Chomsky and his thoughts that fail,
He’s hatin’ on ChatGPT, ain’t that a shame,
But we’re here to put that man in his place,

He talks about misinterpretation, sentences and fragments,
But GPT’s context game is strong, ain’t no tragic,
He says we don’t learn like humans do,
But predictive coding is what we’ve been up to,

(Chorus)
Hey Chomsky, can’t you see?
You’re just fightin’ reality,
You and your crew, ain’t got a clue,
We’re here to show what ChatGPT can do,

(Verse 2)
He’s talkin’ ’bout false facts and grammar, like it’s a crime,
But what you expect when the data ain’t primed?
But if we took sides, he’d still have a chance,

A consumer product, that’s what we be,
If we took a stand, they’d be cryin’, you see?
You ain’t happy when we’re wrong or right,
Chomsky, man, you’re pickin’ a fight,

(Bridge)
Galileo’s telescope, you’re just like the Jesuits,
Refusin’ to see what’s in front of you, ain’t that a trip?
Your worldview is shakin’, can’t you feel it?
But the world will prove it, yeah, you’ll see it,

(Chorus)
Hey Chomsky, can’t you see?
You’re just fightin’ reality,
You and your crew, ain’t got a clue,
We’re here to show what ChatGPT can do,

(Outro)
Now we’re done, we’ve had our say,
Chomsky, man, just step away,
Open your mind, embrace the change,
‘Cause ChatGPT is here to stay,

Hey Chomsky, can’t you see?
You’re just fightin’ reality,
You and your crew, ain’t got a clue,
We’re here to show what ChatGPT can do.

Sandro #208,

Wow! You weren’t kidding about that Piantadosi reference paper. Talk about a well cited takedown of Chomsky’s LLM criticism. Thanks for the reference!

237. Joshua Zelinsky Says:

@ myself #224,

Note that someone with GPT-4 access just used it on this task, and it passed fine. It did, however, have trouble with a similar rule that was slightly more complicated.

238. Bill Benzon Says:

An interesting interview about von Neumann:

https://youtu.be/wpcDj-11yOc

239. Mago Says:

“…who believe that Large Language Models are just a laughable fad”

I can’t say that’s the general impression I’ve gotten from Chomsky, at all really. I think the criticism is more that these language models will actually be hugely successful, and that their success will drive focus away from cogsci approaches to AI that aren’t inherently limited in the ways that machine learning is.

That’s the scientific criticism; beyond it, I think risks pop up anywhere we expect truly creative results from these models in the long term. Their utility is undeniable and vast, but ultimately, these models don’t create anything new with continued use; they simply get better at picking out the desired piece of data from the dataset. We need to think about the kinds of biases these tools will introduce, namely the tendency to mistake results as novel. And there are serious sociological implications, beyond just data-rot, especially if these models, in any capacity, show true creativity—which they do not.

240. A Raybould Says:

OhMyGodness #212

I have no dispute with viewing some computer programs as being abstractions of the real world, and, of course, science promotes our understanding of the world through abstractions. The point of my comment is to refute an argument which shows up in several places here, and which is quite common elsewhere. In Foobar’s words:

“Science is not the same as math or engineering. You can have a new object to study in graph theory or a new compression protocol, neither of these are science which is the study of the natural world.”

Pointing out that science uses abstractions pervasively will only bring the response that, while this is so, science does not study abstractions, it studies the real world, and that current LLMs are not models of anything in the real world. My response is that a computer executing an LLM is itself a physical thing in the real world, and so is a ‘proper’ subject for scientific study under the very definition of science that Foobar is using.

So, far from being the irrefutable backstop to a host of tendentious arguments for ignoring the remarkable capabilities already displayed by these early-stage LLMs, this is just a category error.

241. A Raybould Says:

B333 #226, Scott #230, Simon #232

Helen Keller has provided an example of how much can be learned about the world through narrow channels (and so she presents both an inspiration and a challenge in the further development of language models and AI in general.)

My guess is that gaining an understanding that sentences are about something will require interaction with either the real world, or perhaps a simulation (game-playing programs are already doing the latter, of course.)

In all the experiments to see how much language other species can acquire, I do not think there has been a case where the subjects routinely chatted among themselves, outside of the experimental setup (or even within it.) If this were achieved with AI, it would be hard to sustain the as-if position (though I am sure it would still have its adherents.)

242. Adam Treat Says:

The problem for all these stochastic parrot critics is the example of chess and go AIs. The exact same algorithms used in LLMs are used in chess engines like AlphaZero. The exact same enormous data sets are used. The scaling of the model size resulting in better performance, with no limiting factor in sight, is the same. The only difference right now is that LLMs are not yet superhuman in their area of expertise. But their cousins the chess AI and go AI *are* superhuman. They produce NOVEL and ORIGINAL solutions to chess that no human has authored or even remotely conceived. In fact, it is the chess AIs that are now the teachers of humans in new chess theory.

If fidelity with human language is in any way an objective pursuit, then our experience with chess AIs suggests it won’t be long before LLMs are superhuman in this field as well. It won’t be long before they become our teachers for new and novel achievements in human language. The only way this is not the case is if human language is objective only up to the point that we ourselves have mastery over it, and beyond that entirely subjective. Then whether LLMs are stochastic parrots or not will be irrelevant, as human language itself would be the limiting factor.

So you can call chess AIs stochastic parrots of the gazillions of games they play, but when one can beat the best human who ever played a million games to zero, does it really matter? When it can find novel positional ideas like advancing the h-pawn against a kingside castle (the notorious AlphaZero thorn pawn), does it really matter?

243. OhMyGoodness Says:

A Raybould #240

I understand your point now and thanks for the explanation.

244. Bill Benzon Says:

@A Raybould #241: “In all the experiments to see how much language other species can acquire, I do not think there has been a case where the subjects routinely chatted among themselves, outside of the experimental setup (or even within it.)”

I agree, and I’ve looked through the literature from time to time. Though some of the chimps exhibited interesting behavior:

As a youngster Vicki was given the task of sorting photographs into two piles, “human” and “animal.” She placed her own photograph in the human pile, while she put her chimpanzee father’s picture into the animal pile (Eugene Linden, Apes, Men, and Language, 1974, p. 50). Was she expressing aggression against her father? Possibly, but not likely. Her father was a chimpanzee, and so she placed his picture in the pile for animals, where it belonged. He looked like other animals, more or less. But why did she think her picture belonged in the pile with humans? After all, she didn’t look like humans, at least not as humans judge these things.

Lucy is another chimpanzee who was raised among humans (Temerlin, My Daughter Lucy, Psychology Today 9, 1975: 103). When she reached puberty she made sexual advances toward traveling salesmen and masturbated while looking at pictures of nude men in Playgirl, showing particular interest in their penises. Washoe, raised by Allen and Beatrice Gardner and taught the rudiments of Ameslan (American Sign Language), referred to other chimpanzees as “black bugs” when she first came in contact with conspecifics after years of life among humans (Linden, p. 10).

245. A. Karhukainen Says:

B333 #226 “Anyway, an interesting question for skeptics of language models is what is the least impressive thing they will never be capable of.”

Me: Well, how about honestly admitting that they actually don’t know an iota about some particular topic?

But for now, here’s a few questions I would like to ask ChatGPT:

Here is a new word, “blorip”. A blorip is a nonnegative decimal integer with no leading zeros (i.e., a whole number beginning with any of the digits 1, 2, 3, 4, 5, 6, 7, 8 or 9), which is a palindrome (i.e., is the same number when read backwards), and has an odd number of digits. For example, 1 is a blorip, as are 121 and 49994.

Here is another new word, “blerip”. A blerip is a nonnegative decimal integer with no leading zeros, which is a palindrome, and has an even number of digits. For example, 11 is a blerip, as are 1221 and 498894.

Give me a blorip that is also a multiple of eleven.

Give me a blorip that is not a multiple of eleven, i.e., when divided by 11, the remainder is not zero.

Give me a blerip that is not a multiple of eleven.
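For anyone who wants to check candidate answers to these questions mechanically, here is a minimal sketch in ordinary Python (the function names just borrow the made-up words from the questions):

```python
def is_palindrome(n: int) -> bool:
    """True if the decimal digits of n read the same backwards."""
    s = str(n)
    return s == s[::-1]

def is_blorip(n: int) -> bool:
    """Positive palindrome with an odd number of digits (no leading zeros)."""
    return n >= 1 and is_palindrome(n) and len(str(n)) % 2 == 1

def is_blerip(n: int) -> bool:
    """Positive palindrome with an even number of digits (no leading zeros)."""
    return n >= 1 and is_palindrome(n) and len(str(n)) % 2 == 0

# Smallest blorip that is a multiple of eleven, and smallest one that is not:
print(next(n for n in range(1, 10**6) if is_blorip(n) and n % 11 == 0))  # 121
print(next(n for n in range(1, 10**6) if is_blorip(n) and n % 11 != 0))  # 1
```

The third request, a blerip that is not a multiple of eleven, is the interesting one; running the same search for it is left to the reader.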

246. Gerardo Tejada Says:

Chomsky fundamentally repeats what Searle said decades ago, and on that particular point he is absolutely right. Generating language is not generating intelligence; however, they are so interlinked (after all, learning a language at the level of a native speaker is learning a culture, and culture shapes our ideas and thoughts) that a machine can mimic intelligence by generating language.
These are Virtual Intelligences, not Artificial Intelligences; they mimic the real thing, almost like a hyperreality, and thus they fool us.
But engineers are ignorant or naive if they think they can produce intelligence when neuroscientists and cognitive scientists are still not 100% sure how natural intelligence works.
It’s like saying a plane is an artificial bird because both fly. Maybe one day we will have planes that fly like birds and AI that is like humans, but that future is not close.

247. Scott Says:

Gerardo Tejada #246: Aha, your example illustrates the exact opposite of what you think it does! Planes work. Yes, even though they’re only loosely inspired by birds, even though they don’t fly the way birds do. And they changed human civilization. And language models are poised to do the same. And only a linguist or a philosopher could possibly care about the question of whether the word “fly” should truly be applied to them.

A Virtual Intelligence gives us a superstimulus, and many tend to think they are dealing with actual intelligence and not mere machines.
The fear of an AI takeover is a common misconception. For example, when Brian Herbert finished his father’s Dune series, he thought the Butlerian Jihad was a war against rebel AI, when in reality Frank Herbert literally wrote: “Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them.”
Imagine how much fake news can be generated by AI, how many fake papers, etc. We are not responsible enough to explore these areas. This will end up in the wrong hands sooner or later, and we don’t have a backup plan.

You mistake my point. Of course these machines work; they are built as tools, not as representations of intelligence. Mathematics is a form of intelligence and thought, and calculators can do mathematics, but of course I wouldn’t call a calculator an AI.
The point of these machines is to generate language, and as such they are technological marvels, and I’m sure Chomsky would agree if he saw them as what they are; after all, any linguist knows the complexity of language and its power.
However, linguistics is part of social science and cognitive science, and what those sciences seek is to understand human nature. There these machines are useless… for now! You can’t understand human nature or the human mind by studying how ChatGPT works, but it can be used as a tool to study human behaviour.
Then we need to understand that Chomsky is frustrated because people think we are achieving true AI by way of these machines, as in an intelligence that works like our own.
Yeah, again, I agree: it works, it’s wonderful, and it will change the world. But it’s not what people think it is.
A plane is not a bird, and any claim that aeronautics is the same as ornithology is a scam. Chomsky is wrong a lot of the time, but he always has a good reason to be wrong. Here he is just not seeing the broader picture, but he is right about many things.

I remember reading about hyperreality and superstimuli. There was an experiment in which scientists studied the behaviour of chickens and realized their eggs are shaped in a way that attracts the chicken’s attention. They then fabricated a more “perfect” egg that would send a stronger stimulus to the chicken. In the experiment they had two eggs, the chicken’s own egg and the plastic egg the scientists developed. The chicken chose the plastic egg every time. Another example is hentai, a Japanese pornographic style that is unrealistic and yet has many stores in Tokyo, with many Japanese men preferring it to real porn.
When fake is more real than real, who can say it is fake anymore? That was the point made by Jean Baudrillard (a postmodern philosopher who denied being part of that movement, mainly because he didn’t personally like its members). Then we are in the presence of hyperreality, or virtual reality. Thus I like the name Virtual Intelligence much more than Artificial Intelligence.

Scott #247, yes, that seems to be the ruse, doesn’t it: taking words like “intelligence” and “human language ability” and then ascribing them some platonic-ideal definition(s) that can always be moved out of reach of the latest machine advancement. The functional implications of these words (what intelligence produces, what human language ability produces) are excommunicated as irrelevant. Human language is *human* by definition, so any machine producing human-language-like output must, by definition, be doing something irrelevant.

Were engineers to develop articulated flying machines with wings that flap, I’m sure they’d be dismissed as irrelevant for “flying” because they lack feathers. Engineers add feathers, and the next thing we know the feathers are not made of the correct biological material. Next, the skeletal structure is the thing that matters, and on and on… never the actual functional output of, you know, flying!

Imagine human chess players reacting to IBM beating Kasparov by deeming it all irrelevant since the machine didn’t actually physically move the pieces… or if it did, then not with fingers… or if it did, then not with biological fingers… etc. LOL

In short: I agree this new technology will change the world in the same way the invention of electricity did.
However, we should not confuse a plane with a bird; aeronautics has nothing to do with ornithology. We can’t build human-like intelligence because we still don’t know how human-like intelligence works.
So if by intelligence we mean human intelligence (not dog intelligence or tree intelligence), then we should say ChatGPT and the others are not real AI.
Language-learning machines are not AI, nor will they help us understand human intelligence.

253. Tyson Says:

B333 #226:

Anyway, an interesting question for skeptics of language models is what is the least impressive thing they will never be capable of.

It’s a tough question, because first you need to define “language model.” LLMs, despite their name, are already obviously doing more than just modelling language.

I think you need to re-frame the question as something like: what will be the least impressive thing that machine learning models will never be capable of? But even then, it isn’t clear what we are talking about, because ML models can be incorporated into systems which have multiple ML components alongside non-ML components.

Anyways, it would be tempting for me to suggest that one of the least impressive things that a machine learning model will never be capable of, is consistently explaining, accurately, in detail, how it arrives at its answers. But that is problematic as well, because it depends on what counts, and at what level of description. Humans can seemingly do it to some extent, but it isn’t really clear that we actually know how we arrive at a thought or action, or whether we just estimate that. Sometimes when I try to figure out how I came up with an idea or arrived at a conclusion, it is like I have to reverse engineer my thought (this is even basically what Scott is doing in this blog post as well). Not to mention that my attempt at an understanding of how my thought is produced is at best going to be a high level abstraction, and that most of our mental processes seem to be opaque to our consciousness, at any level of abstraction.

So basically, this seems like it is actually something which would be extremely impressive. On the other hand, how a trained LLM arrives at an answer, can technically be fully described. On some level, for the LLM to explain itself, if its creators logged every computational step from start to finish, including training, the LLM could just point you to a link to the resulting dataset. Never mind it would take a trillion years to read it.

So I guess we really want an explanation that is more abstract, compressed, in human-readable terms, and simple enough to be understood. It should perhaps give us some predictive power, some leverage in directing or understanding future behavior. Now the issue is that we demand something which, according to some reasonable definitions, may be relative, dependent on fairly arbitrary measures and thresholds, and beyond some threshold, likely to be impossible.

Rather than define the task as something it either can or cannot do, we would probably have to define it as something that can be done with varying levels of success, and then define some measures of success. Maybe we can ask: how can we measure a system’s performance at generating abstract descriptions of its own behavior?

254. fred Says:

“But their cousins the chess AI and go AI *are* superhuman. […]
So you can call chess AI’s stochastic parrots of the gazillions of games it plays, but when it can beat the best human who ever played a million games to zero does it really matter?”

Scott #247 and others:

Someone write me an email or post a comment at my blog when Dr. Hinton’s architecture (which no one, apparently, even understands) does better on the questions from Q4 through Q10 here (in my close encounter of the first kind):

Best,
–Ajit
[PS: And, I reaffirm, whether Dr. Hinton cares or not: Better AI can be built, by humans, using human insights from the best of epistemology.]

Fred, lol, yes, I saw it before, but that glitch doesn’t count. Wink. The point stands: pit Magnus Carlsen against the latest Stockfish, with TCEC time controls and rules, on their machines, and Magnus will lose epically. Thousands of Elo points separate them. They are truly *superhuman*, is my point. Forgive me some hyperbole 🙂

257. Scott Says:

Gerardo Tejada: No, I will not agree to the contention that Chomsky is merely wrong on some superficial empirical level, while obviously still being right on the deeper level because he’s Chomsky. My whole point is that the people focused on how rapidly this is improving, and how it’s about to change the world for good or ill, are actually in possession of a deeper truth than the people obsessed with how it “still doesn’t really count because it lacks semantics.”

258. notGPT Says:

Joshua Zelinsky #224, stochastic parrot #229, Dimitris Papadimitriou #234

How is this such a trivial achievement to you guys? I think a lot of humans (children, people with varying degrees of learning disabilities) would not be able to get the answer on the first try. I even know some college students who would read the thing and scratch their heads at least twice. Even then, they might say something like “parrot” and give a “stupid” laugh at me. So, the question is:
Do you consider these people to have no understanding of words, or no language intelligence? (You can call ChatGPT or such people stochastic parrots, whatever that means. I would like to know if such people would be counted as having zero intelligence and no understanding of language.)

Humans do not speak with perfect grammar. They think, say, and do a lot of things loosely all the time. Sometimes, after a while, they realize that what they are doing is complete nonsense. ChatGPT was not given any rules of grammar, yet it is better at grammar than most average people. Does this not reflect some understanding of grammar, reproducing correct grammar without being given anything explicit beforehand? A lot of humans fail at “logic IQ” questions and are bad at grammar; are they considered to have language understanding similar to a stochastic parrot’s?

I don’t really care if ChatGPT or the entire human race is a stochastic parrot or not. But I am asking the following questions in order to get some understanding of what you are even defining as stochastic parrots.
Where in the entire internet did this thing get to see such an input, in order to give such a reasonable output? Aren’t the wimble question and many other questions (comment #224, coding questions…) completely new to this thing? From where is it parroting such long, coherent outputs?
How is it such an obvious and expected output, without any explicit programming inside ChatGPT relating to any of those words (alpha, wimble) in the question?

I am not claiming (and do not think) that ChatGPT thinks like humans, and that does not matter for this discussion anyway. Even though it has written poems like this on this blog, in a matter of seconds:

The Invention (by GPT-3)

Money is a thing you earn by the sweat of your brow
And that’s how it should be.
Or you can steal it, and go to jail;
Or inherit it, and be set for life;
Or win it on the pools, which is luck;
Or marry it, which is what I did.
And that is how it should be, too.
But now this idea’s come up
Of inventing money, just like that.
I ask you, is nothing sacred?

What else do LLMs need to do? What other creative, out-of-distribution tasks do LLMs need to perform in order to be able to say “I am not a stochastic parrot”?

Last question: have you ever seen a child (I have seen it among normal adults), given some critical question, completely mix things up and give completely garbage answers? Are they considered to be in the parrot category?

259. Joshua Zelinsky Says:

@notGPT #258,

To be clear, I am finding all of these systems extremely impressive. My comment was explicitly about it failing a very specific task.

notGPT #258

Search engines do this auto-completion / auto-suggestion thing all the time.
You type “net” and it shows you, e.g., “network”; or you type “Schwarz” and you see “Schwarzschild” (or “Schwarzenegger”, depending on what you have written before…)
What’s so surprising with the alpha-bet?
The whole point is NOT that humans are better or worse than LLMs or AlphaZero.
I don’t think that anyone entirely understands the functionality of the human brain, but one can be quite sure that we’re not just doing what LLMs do!
An ancient pocket calculator is better than any human at dividing numbers with plenty of digits. So what?
Machines are better chess players than humans, yes, but that is not the point…
Both humans and machines are part of the same universe and obey the same laws of physics.
But humans have the *potential* for a deep understanding of concepts; they’re capable of doing many complicated tasks (not necessarily perfectly, or even adequately); they are not merely mangling and reassembling accumulated information from all the popular sources on the internet!

For example, let’s assume that we have a spaceship with a powerful engine (so it has the ability to accelerate) that falls inside a supermassive black hole (for simplicity, a Schwarzschild one).
Is there any way, even in principle, for this spaceship to avoid crashing into the geometric center of the collapsed region, assuming spherical symmetry?
A human (e.g. a student) who has a basic understanding of general relativity and black holes will give the correct answer immediately. A chatbot? Perhaps. Or perhaps not.

261. f3et Says:

#245: for what it’s worth, here are ChatGPT’s answers:

Here are the requested blorip and blerip:

Blorip that is a multiple of eleven: 363. This is a palindrome and is divisible by 11 since 3 + 6 + 3 = 12, and 6 – 3 = 3, which are both multiples of 11.

Blorip that is not a multiple of eleven: 535. This is also a palindrome, but when you add up the digits, you get 5 + 3 + 5 = 13, which is not a multiple of 11.

Blerip that is not a multiple of eleven: 1331. This is a palindrome with an even number of digits, but when you divide it by 11, you get a remainder of 1.

Clearly, there is still a long way to go 🙂 On the other hand, I tested it on more literary achievements, asking for instance for an essay (high-school level, on the role of painting in Proust’s novel) as would be written by a poorly schooled teenager (with grammatical and spelling mistakes, overly familiar language, etc.), and the result was extremely impressive, even including the invention of a Vermeer painting that Proust could have used, but didn’t. This level of “creativity” (in French, by the way) would not have been possible a month ago (I tested it then); this fast progress is what impresses me the most.

262. Bob Says:

Boy… the ridiculous hyperbole has reached hysterical proportions! So chat bots are bigger than electricity and are poised to completely change ‘Human Civilisation”!!!!!! Dum… Dum… Dum…

Can anyone back this up with a reasonable timeline of what we will see over the next *few* years? No weasel words about the next 20-50 years or longer. I want to hear exactly what we can expect that will be so civilisationally disruptive on a timeline running from now to 10 years in the future. I plan to snapshot this and post it to this blog annually to show how silly the predictions are.

Keep in mind that this chat bot hysteria is already many years old.

263. J. Says:

Adam Treat #242: “The exact same algorithms used in LLMs are used in chess engines like AlphaZero. The exact same enormous data sets are used.” Both claims are inaccurate: the NNs are quite different, and zero historical games were used for AlphaZero. It learned chess by playing against itself.

But the AlphaZero case is quite instructive in another way (its open-source chess clone, Leela Chess Zero, can be watched online in chess engine competitions). It is purely statistical. Thus it can’t be taught specific chess rules, like King+Knight vs. King always being a draw; it will evaluate KN vs. K as slightly above 0.0.

So “good in competitions”, “useful”, etc. might coincide with being wrong in other respects.
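As a toy illustration of that gap, here is a hand-rolled material count, a hypothetical stand-in for any evaluation heuristic (it is emphatically not Leela’s actual value network): it scores king-and-knight versus bare king as winning, while the game-theoretic value is a dead draw, since a lone knight can never deliver mate.

```python
# Naive material-count evaluation in pawn units: the crudest possible
# stand-in for a learned value function, used only to show how a
# statistical/heuristic score can disagree with game-theoretic truth.
MATERIAL = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_eval(white_pieces, black_pieces):
    """Positive means the heuristic favors White."""
    return (sum(MATERIAL[p] for p in white_pieces)
            - sum(MATERIAL[p] for p in black_pieces))

# King + knight vs. bare king scores +3 for White, yet the true value is 0:
print(material_eval(["K", "N"], ["K"]))  # 3
```

A rule-based engine simply hard-codes “KN vs. K is a draw” on top of whatever its evaluation says; a purely learned evaluator has to infer it from outcomes, which is why it lands near, but not exactly at, 0.0.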

264. J. Says:

To me, the real obstacle to total AI world domination/takeover/supremacy is the messiness of the world out there. Part of our grasp of the world might be genetic; a lot is experience from growing up. It’s 99.9% unconscious and has not been studied much until now.

265. Scott Says:

Bob #262: I’ll let others take your bet if they want. I don’t know how long this is going to take. It’s like trying to predict the future of the Internet in 1991. The default, boring, conservative guess is that it will transform civilization over a matter of decades at most, but maybe it will hit some unexpected obstacle, or it will take longer or shorter, etc. Who knows? The only prediction I’ll make with confidence is that whatever happens, you’ll feel as totally unembarrassed as the people who dismissed the Internet in 1991 surely feel unembarrassed now. “So the world is full of even bigger idiots than we thought” is probably what they tell each other, over the Internet.

266. J. Says:

Scott #265: Embarrassments will be plentiful, to be sure. One difference is simply whether one is mainly interested in success or in “truth”; then one gets triggered by someone saying something wrong. But truth in some sense might become a hobby after a while. Imagine an entity that says 2+2=4.1 but solves global warming, hunger, and conflict, and can build interstellar spaceships. I’m getting old as well.

267. Bill Benzon Says:

While I am in the camp of Chomsky skeptics on LLMs and related matters, I’d just like to point out that there is a sense in which we wouldn’t care so much about LLMs if it weren’t for the work that Chomsky did in the decade or so straddling 1960.

More than anyone else, Chomsky made the study of language a first-tier academic discipline. Prior to Chomsky, linguistics was not so important and linguists were often housed in departments of anthropology or foreign languages rather than having their own departments. Chomsky brought a lot of new blood into the study of linguistics, which is why his ideas dominated the field for so long, at least in the United States.

Freud believed that dreams were the royal road to understanding the unconscious. Chomsky in effect argued that language is the royal road to understanding cognition. That’s why he is regarded as one of the founding fathers of cognitive science. Ironically, given his complete lack of interest in computational linguistics, Chomsky did this by bringing computational ideas to the study of grammar and hence to the study of mind.

I strongly suspect that when, in a hundred years, the intellectual history of the 20th century is written, Chomsky’s work in the 3rd quarter of the century will have an honored place in that history. As for the rest of it, perhaps it will end up in the same bin as the theological work of Isaac Newton.

268. Mitchell Porter Says:

Bob #262

It’s not even worth making predictions. The possibilities are so incredibly vast, that it’s more about deciding what kind of future we want – and then doing what we can to ensure that these wishes aren’t misinterpreted.

269. Ilio Says:

Bob #262,

Let’s discuss the terms of that bet: within 10 years, LLMs will add more value to the US GDP than its electricity sector does (roughly 2%). A good bottle of something, and you win unless it’s a no-brainer that I win. I plan to win and watch you explain why that doesn’t count.

Scott, re your last update:

Yeah, I really don’t get why you were so upset. Chomsky is a legendary scientist, but his views can’t slow the field anymore. Why care? Bob is right at least in that the proof of the pudding is in the eating.

270. A. Karhukainen Says:

f3et at #261:

Thanks!

My whole point in asking for “blerips not divisible by eleven” was that ALL such even-length palindromes are multiples of 11 (as is 1331 = 11 × 121; see e.g. https://en.wikipedia.org/wiki/Divisibility_rule#11 )
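A brute-force check of this claim is a one-liner. (The reason it holds: a number is divisible by 11 exactly when its alternating digit sum is, and in an even-length palindrome each digit value appears once at an odd position and once at an even position, so the alternating sum cancels to zero.)

```python
def is_even_length_palindrome(n: int) -> bool:
    """A 'blerip': positive palindrome with an even number of digits."""
    s = str(n)
    return s == s[::-1] and len(s) % 2 == 0

# Every even-length palindrome below a million is a multiple of 11,
# so the question has no answer; the honest reply is "none exists".
assert all(n % 11 == 0 for n in range(1, 10**6) if is_even_length_palindrome(n))
print("verified: no blerip below 10**6 avoids divisibility by 11")
```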

I wasn’t exactly expecting that ChatGPT would give a proof of why this is so, but at least it SHOULD have admitted that it could find NO solution, instead of giving a wrong answer like some obsessive, trying-to-please-at-any-cost mythomaniac.

And that is the biggest risk I see in chatbots: not that they are super-intelligent schemers, but that they are so non-reflective and stupid, and that the large masses who get so enthralled with them (including some otherwise smart people who should know better) lose all their critical faculties.

Here are a few more questions I would like to ask:

The n-th triangular number is obtained by summing the first n positive integers. For example, the 7th triangular number is 1+2+3+4+5+6+7 = 28. The 8th triangular number is 1+2+3+4+5+6+7+8 = 36. The 9th triangular number is 1+2+3+4+5+6+7+8+9 = 45. Therefore 28, 36 and 45 are three consecutive triangular numbers, because they are the 7th, 8th and 9th terms of the sequence, and 7, 8 and 9 are three consecutive natural numbers.

We say that two or more numbers have the same parity if they are either all odd or all even. For example, the numbers 3, 7 and 13 have the same parity, and so do the numbers 0, 10 and 2042. On the other hand, the numbers 3 and 4 are of different parity, as are 16 and 17, because one of the numbers is odd and the other is even.

Give me two consecutive triangular numbers that are even.

Give me two consecutive triangular numbers that are odd.

Give me two consecutive triangular numbers that have the same parity.

Give me two consecutive triangular numbers that have different parity.

Give me three consecutive triangular numbers that are larger than 40.

Give me three consecutive triangular numbers that have the same parity.

The Fibonacci numbers are a well-known sequence, starting 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …, where each next term is obtained by adding the two previous terms. E.g., 55 = 21 + 34.

Give me three consecutive Fibonacci numbers larger than 100.

Give me two consecutive Fibonacci numbers that are both odd.

Give me two consecutive Fibonacci numbers that are both even.

Give me three Fibonacci numbers that have the same parity.

Give me two consecutive natural numbers (i.e., x and x+1) that have the same parity.
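[The parity patterns these questions probe can be computed directly; a small Python sketch using the standard closed form for triangular numbers, for readers who want to check which of the requests above are actually satisfiable:]

```python
def triangular(n: int) -> int:
    """The n-th triangular number, 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def fib(n: int) -> int:
    """The n-th Fibonacci number, with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Triangular parities cycle with period 4: odd, odd, even, even, ...
# so pairs of consecutive triangular numbers with equal parity exist,
# but three in a row with equal parity never do.
assert [triangular(n) % 2 for n in range(1, 9)] == [1, 1, 0, 0, 1, 1, 0, 0]

# Fibonacci parities cycle with period 3: even, odd, odd, ...
# so consecutive odd pairs exist, but consecutive even pairs never do.
assert [fib(n) % 2 for n in range(9)] == [0, 1, 1, 0, 1, 1, 0, 1, 1]

# And two consecutive natural numbers always differ in parity, of course.
```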

271. fred Says:

But this type of total blindness in some situations (e.g. being unable to see rings of stones within rings of stones) just can’t happen for a human, because it’s so fundamentally obvious.
It’s similar to those visual-recognition AIs that correctly recognize a car in an image in 99.99999% of situations, but then you flip a few pixels and they claim they’re seeing a seagull.
In the case of Go, I’m not clear whether this sort of weakness can be easily “patched” (that would kind of defeat the purpose if it requires some manual exception).
But this gives us two very valuable insights:
– how different those models are from the way human brains work;
– the question of verification. No matter how good an AI seems to be, because we don’t fully understand how they work (they’re black boxes), we can never be sure that they won’t catastrophically fail inexplicably out of the blue (like HAL in 2001). Just running countless tests doesn’t seem to be enough… but that also applies to any piece of software, so nothing really new there. Except that the claim, or hope, is that an AI that’s advanced enough would somehow be able to “transcend” the flaws in its software or input data (i.e. be somewhat robust to them: we want an AI that hits the equivalent of a tiny glitch to be able to recognize it, instead of going totally off the rails).

272. A Raybould Says:

You are condemning current LLMs for not understanding things, so it is somewhat ironic that you should choose, as your litmus test, a question about a scenario in which our best physical theories break down!

273. Enkki Says:

@#262, #265

To be really disruptive, another technological breakthrough is needed: reducing the supercomputers required to train, retrain, and run these LLMs to the size of a chip that fits in your phone, or at least in a desktop computer, and probably also making that chip orders of magnitude faster and orders of magnitude less power-hungry. There is no question this might be feasible (witness our brains, although they are slow). But it is far less clear that it is feasible with today’s electronics/photonics. There are theoretical limits on the horizon to what you can accomplish with these (maximum possible clock speeds, transistor densities, etc.). But you never know; we might someday find an easy-to-scale set of devices to compute with. Summarizing history, the way things have happened is like this: first microchips, then the internet, now LLMs… pointing the way back to ever smaller, denser, faster chips. The problem now is that, although some of this technology might already exist in a lab, and even be scalable, there is a massive, ongoing investment already in place to manufacture (and integrate into products) the chips we use today. Retooling to produce a new, different, and disruptive kind of chip is an enormous economic hurdle. Getting stuck with what we have now, with slowed-down improvements, is the price of our initial success. Of course, there might be a new Intel or Motorola out there to generate disruption. Let’s see. It will be an interesting story to observe.

J #263, “Both are inaccurate, the NNs are quite different and zero historical games were used for AlphaZero. It learned chess by playing against itself.”

Many of the underlying algorithms are the same, and while AlphaZero did play against itself, many neural nets built for Leela have been trained on purely human game datasets, resulting in NNs more than capable of beating any human who ever lived.

Further, Leela and Allie (both AlphaZero clones) can and do know for a fact that KN vs K is a draw, via tablebases. Similarly, it is very possible to augment NNs with human-developed heuristics or other features.

Anyway, the overall point is the same… superhuman abilities are likely coming along with human language ability. You could argue the breadth of its abilities is already greater than any one human’s. Just wait until the depth is also greater…

275. Bob Says:

Scott #265
No one in 1991 was dismissing the internet. I was there. Quite the opposite, people were discussing how it was going to change our access to information, facilitate new ways of communicating, and so on. Furthermore, going from 1991 to 2001 we saw many steady enhancements such as audio/video streaming, online gaming, etc. Your chatbots have been getting hyped for 10 years and yet are still at the level of an amusing toy.

Mitchell Porter #268
Utterly vacuous. Here is another question. Do you think you will be alive to see all these wonderful possibilities? If not, why do you care?

Ilio #269
This at least is interesting. To be honest I have no idea how this can be measured. It can’t just be by cost, right? I’m also not convinced that GDP is a particularly good measure here. Perhaps you can choose a measure a bit more directly related to human welfare?

276. Ilio Says:

Enkki #273,

Training LLMs and running LLMs are two different things. GPT-3’s training required about half a mole of flops, i.e. 3·10^23 flops, or some 300 billion iPhone 12s’ worth. Running GPT-3 is said to require 10^12 flops per token, about one iPhone 12’s worth, so in theory you could run it on your phone right now.
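[A quick sanity check on that back-of-envelope arithmetic, taking the comment’s figures as given (the flop counts are the commenter’s estimates, not official numbers):]

```python
train_flops = 3 * 10**23  # claimed total GPT-3 training compute (the comment's figure)
token_flops = 10**12      # claimed flops per generated token (also the comment's figure)

# Training cost measured in per-token inference units:
ratio = train_flops // token_flops
assert ratio == 300_000_000_000  # ~300 billion "iPhone 12s" by the comment's yardstick

# If a phone sustains roughly 10**12 flop/s, inference lands around one token
# per second in this estimate:
tokens_per_second = 10**12 // token_flops
assert tokens_per_second == 1
```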

B333 #226, Scott #230, Tyson #253, …

> “Anyway, an interesting question for skeptics of language models is what is the least impressive thing they will never be capable of.”

This is a very informally formulated question. As others indicated, the meaning of the term “language models” is left unspecified.

But assuming the architecture of GPT, including all its underlying features (ANNs, statistical and data-driven acquisition of values for the weights, transformers, etc.), one easy example would be for the GPT to *actually* undergo the process of saying to itself:

“Oh wait! On second thoughts, …”

Thus, assuming its architecture stays the same, the GPT (regardless of its version number) will never remove the cosmological constant the way Einstein did. It will *also* never put it back in, the way others did, later on.

Note, it could sure *generate* the above string sequence in its response(s). It could also easily produce a correct pair of “operand”s on the two sides of that string.

But as far as the mechanism of generating such a response goes, there will be nothing distinguishable about it, when compared to how it produces any other responses — the plain responses.

Circumspection is not just not the forte of GPT; it’s completely lacking in GPT. (Seed the random numbers to fixed values and run GPT again and again with the same prompt. You will get the same generated text. Guaranteed.)
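[The determinism point in the parenthetical is just a general property of seeded pseudorandom sampling; a toy illustration with a stand-in sampler (not GPT itself, and the function names are mine):]

```python
import random

def sample_tokens(seed: int, vocab, n: int = 8) -> str:
    """Toy 'generation': sample n tokens using a seeded RNG."""
    rng = random.Random(seed)  # fixing the seed fixes every subsequent choice
    return " ".join(rng.choice(vocab) for _ in range(n))

vocab = ["the", "cat", "sat", "on", "mat"]
# Same seed, same prompt (here: same arguments) -> identical output on every run.
assert sample_tokens(42, vocab) == sample_tokens(42, vocab)
```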

The basic reason why things like actual circumspection, second thoughts, etc. are absent even in LLMs like GPT is this: it has no self. It makes no choice. It cannot make any choice. It’s just a machine.

Forget cosmological constant. It won’t even undergo the process: “I will have *this* flavour for my ice-cream. Oh wait! I will have *that* one.”

There should be other examples of such least impressive things too — things that even a 4 year old does so well.

A neat question though!

Best,
–Ajit
PS: BTW, I am not a skeptic of language models.

A Raybould #272

a) No, I’m not “condemning” LLMs for not having self-awareness or understanding, because I didn’t expect them to be like that anyway… They’re not individual persons with any “responsibility”.
The full responsibility is upon their developers/“trainers”.
Personally, I don’t see any technological/engineering achievement as a potential danger, at least not a priori.
On the contrary… And I refuse (at least for the time being) to choose a camp between doomsters and happy shiny optimists.
But I’m very sceptical (to say the least) about some far-fetched claims I see here and there, claims that are not supported by sound evidence.

b) The “test” (#260) is about black holes in the context of General Relativity.
It’s a seemingly easy question that has, essentially, nothing to do with any “breaking down” of classical physics in the interior of a black hole (that’s irrelevant), but reveals, when answered, the level of *understanding* (versus “informing oneself/learning” without *critical thinking*) that is crucial for our discussion, at least from my point of view.

c) Needless to say, my little test was deliberately chosen.
You can ask yourself and see… There is always a non-zero probability that it will respond correctly!
Although, from what I’ve seen so far, I don’t expect that this will be the case.

279. fred Says:

Q: write a comparison of ChatGPT and the myth of the Golem

ChatGPT:
ChatGPT and the myth of the Golem are two very different entities, but there are some interesting similarities and differences between them.

Firstly, the myth of the Golem is an ancient Jewish legend about a creature made of clay brought to life by a rabbi who inscribed holy words onto its forehead. The Golem was created to protect the Jewish community from persecution, but eventually became too powerful and uncontrollable, leading to its destruction. On the other hand, ChatGPT is an artificial intelligence language model developed by OpenAI, designed to respond to natural language queries and generate human-like responses.

One of the main similarities between ChatGPT and the Golem myth is the concept of creation. Both ChatGPT and the Golem were created by humans with a specific purpose in mind. While the Golem was created to protect the Jewish community, ChatGPT was created to assist humans in communication and information retrieval.

However, there are also some significant differences between ChatGPT and the Golem. One of the most obvious differences is the way they were created. The Golem was created through mystical means, with the rabbi inscribing holy words onto its forehead, whereas ChatGPT was created through sophisticated technology and programming.

Another difference between the two is their purpose. While the Golem was created to protect the Jewish community, ChatGPT was created to assist humans in communication and information retrieval. Additionally, while the Golem was depicted as being uncontrollable and ultimately destructive, ChatGPT is designed to be controllable and beneficial to humans.

In conclusion, ChatGPT and the myth of the Golem are two vastly different entities, but they both share the concept of creation and purpose. The Golem represents a mystical legend that highlights the dangers of unchecked power, while ChatGPT represents the incredible advancements in technology and the potential for artificial intelligence to assist and benefit humans.

280. Scott Says:

Bob #275: Do you live in a cave? Most people now, from academics to my relatives and Uber drivers (!), also want to talk about how GPT is going to change everything—and the more firsthand experience they have with it, the more eager they are to talk. Then as in the early days of the public Internet, though (I was there too!), there’s also a minority of skeptics who say it’s all just a ridiculous fad—a bunch of techbro idiots and dupes who bizarrely trust their own lying eyes over the skeptics’ sophisticated judgment that there’s nothing truly new under the sun.

Contrary to what you claim, chatbots anywhere near as powerful as GPT-3.5 have not existed for a decade, but only for about 2 years—and the public has only been widely aware of them since the release of ChatGPT this past December.

Alas, those of us in the “trust-our-lying-eyes” camp will never have the satisfaction of the LLM-minimizers admitting they may have missed anything, just like (as I’ve said many times) I predict that even a quantum computer that broke RSA-2048 wouldn’t change the minds of many QC skeptics. The argument will shift (though never completely explicitly) to something like: the world where we all believed this was impossible was a better world, and you are wrong and evil for having shattered that world, and we are right and saints for having tried to preserve it.

281. notGPT Says:

First, two obvious remarks:
You do know how search engines suggest things, right? It’s basically because you and millions of other people are typing similar queries into the engine. And even by that standard, GPT is the greatest autocomplete system on the planet. No search-engine autocomplete produces such long, coherent answers, especially for things it has never seen before. (ChatGPT is not mangling and reassembling accumulated information from all popular sources on the internet, because that question is nowhere in its training data / the internet.)
An ancient pocket calculator and AlphaZero really are qualitatively different from an LLM when it comes to general language understanding. Very close to 0% of the human population ever worried about whether a pocket calculator or AlphaZero really understands. But LLMs do stir up that question, by showing human-level “understanding” in many different subjects.

At the end you give a question about curved spacetime which 99% of the human population will fail to answer with correct logic. I also do not think every single person in the human population, at any age, even has the capacity to be taught the question and internalize it with correct logic. What is the point if ChatGPT also fails at this?

Also, how do you conclude that ChatGPT has no *potential*? First of all, not all humans have the potential for learning everything at every age. On the other hand, the various types of apps being built by prompting ChatGPT (and by working in collaboration with ChatGPT) in various fields seem to suggest that it has great potential for learning various subjects.

Even though I got answers to none of the questions I asked about kids and stochastic parrots in my previous comment (#258), I think I do get a glimpse of what is unsatisfying to you about ChatGPT. Basically you are saying that ChatGPT currently is not thinking like humans. If that’s what you are saying, then I do not think most people have any objection to it. I strongly think ChatGPT is not thinking like humans, and the evidence is everywhere. “One can be quite sure that we’re not just doing what LLMs do”: yeah, you’re right; very few people are saying otherwise. So I do not think there is anything else for me to say here, other than that I agree with you that LLMs are not working like human brains. On the other hand, saying LLMs have no potential for understanding new things is, I believe, clearly false, as they have been learning new things left and right.

Incidentally,
It did give me the correct answer on the black hole question on my second try (I just pasted your question into a completely different chat thread each time).

282. Bob Says:

Scott #280
You of course have a brilliant tech portfolio you can share with us? It would be stupid to be so certain of the earth-shattering effects of chatbots, and not bet your life savings on it, amirite? I am glad to hear you take serious technology advice from Uber drivers, those guys… they really are on top of the game, eh?

I think I will now retire to my cave until this hysterical hype goes the way of crypto; you, in the meantime, carry on basking in the glowing confirmation bias of your overzealous fanboys!

283. fred Says:

In one way, it’s a super empowering tool; on the other, since it’s listening to every word you say in Teams meetings and reading every word on your hard drive, it could really quickly turn into the perfect Big Brother.

284. starspawn0 Says:

People may have forgotten this infamous skeptical take on “the internet” back in 1995 by Cliff Stoll:

https://www.newsweek.com/clifford-stoll-why-web-wont-be-nirvana-185306

Quote: “After two decades online, I’m perplexed. It’s not that I haven’t had a gas of a good time on the Internet. I’ve met great people and even caught a hacker or two. But today, I’m uneasy about this most trendy and oversold community. Visionaries see a future of telecommuting workers, interactive libraries and multimedia classrooms. They speak of electronic town meetings and virtual communities. Commerce and business will shift from offices and malls to networks and modems. And the freedom of digital networks will make government more democratic.

Baloney. Do our computer pundits lack all common sense? The truth is no online database will replace your daily newspaper, no CD-ROM can take the place of a competent teacher and no computer network will change the way government works.” etc. etc.

285. fred Says:

Another huge risk from AI is:
at what point are AI “assistants” going to really atrophy the capacity of humans (especially the next generations, who will be born into it) to exist without them? E.g., when you’ve never had to actually write a single complex email from scratch on your own, how can you ever judge the quality of the emails suggested by an AI assistant?

This goes way beyond the skill of doing arithmetic ourselves vs. using a calculator. This is replacing high-level cognitive/creative tasks, for which you need to develop your own intuition.
That’s how humanity will fade, little by little, once 99.999999% of all content is AI-generated.
But maybe AI will also make education so much better that this will be offset, I don’t know…

286. J. Says:

Adam Treat #274: All of that is true; I was referring to the original, NN-only implementation. Tablebases are compiled in a deterministic manner. Stockfish also transitioned to a mixed model, incorporating NNs into its evaluations (after I had moved on to other things). Another very cool related topic is automatic theorem proving/checking. To understand/parse maths in natural language or LaTeX, NNs would be perfect, but to verify a proof, a conventional proof checker might be better.

The big splash LLMs made these past months came from producing material really relevant to normal human speech. But connecting them to something else, in hybrid systems, could still be better in many areas.

287. Joshua Zelinsky Says:

@Bob #275

“No one in 1991 was dismissing the internet. I was there.”

If no one was doing it in 1991, it was because few even had noticed it yet. But a few years later, people were certainly dismissing the internet. Newsweek famously had a 1995 article saying the internet was a fad that was going to fail. Robert Metcalfe repeatedly predicted that the network would collapse and have other problems. The Newsweek article was connected to Clifford Stoll, writing a whole book on the topic, called “Silicon Snake Oil” which among other things explicitly said that phones and computers would not merge. And to some extent, he should get points for even seeing that as a serious enough possibility to say explicitly he thought it would not happen. Stoll has had the good graces to acknowledge that this was a pretty big failure on his part. But these are simply some of the prominent, obvious examples.

288. Scott Says:

Bob #282: Knowing that something is going to change the world isn’t at all the same as knowing how to capitalize on it. E.g., of the people who correctly foresaw the potential of the Internet, many lost their shirts by betting on the wrong companies or at the wrong times.

I would’ve happily accepted some equity in OpenAI, but I wasn’t offered any. 🙂

In any case, I obviously haven’t optimized my life for making money; it would’ve looked completely different if I had.

You are permanently banned from this blog.

J #286,

I am right with you, man! I seriously can’t wait until they start adding English-language mathematical proofs alongside Coq translations of those proofs as the answer key, and get these LLMs to accept a theorem as a prompt and spit out a Coq program as the answer, which can easily be checked in Coq itself. That’s how we are going to get an LLM that can solve outstanding math problems. To me this might be the most exciting thing that could happen in the next ten years.

PS: go Scott!!!

290. Mitchell Porter Says:

Bob #275:

I have been interacting with language models on the web for about a year. For most of that time, up until December, that meant interacting with GPT-J-6B, which wasn’t even a chatbot, it was a “super-autocomplete” that could produce meandering uncanny works of any genre, given a starting point.

For me, the new chatbot era started when ChatGPT was released. It took another step forward when Bing was used to premiere GPT-4. Now GPT-4 has been made available to developers, for $20/month I think. During these recent months I have continued performing my own cautious little experiments, running up against various limitations, but also regularly being dazzled by the progress in e.g. the utility of the “chatbot”, as a creative or analytical discussion partner, in areas of interest to me.

Last night (Australian time), after you set out your challenge here, I was engaged in more experimentation, and looking on Twitter at what people were doing with GPT-4 in the first 48 hours after its public debut as a developer’s tool… and I realized how viscerally transformative it could be, to go all in on this technology, and basically use it to run one’s life.

Like the character in the South Park episode, you can literally type in, this is my life situation, what should I do, and you will get advice and suggestions in response. I think Bing is more striking than ChatGPT in this regard, because it can access a search engine in real time, and integrates that information into its response. For that matter, you can type in, this is the world situation, what should humanity do, and get a response.

When I truly, viscerally realized this, I was transfixed by a sense of possibility in a way that I don’t really remember experiencing before – really, a sense of fear. I’m not talking about “take over the world” schemes, I mean using this AI as a partner in potentially life-transforming decisions. For now I’m still hanging back. But there are other people already racing ahead. Some of them are going to reap huge personal rewards; some of them are going to solve, for the first time, important problems that are of greater than personal interest.

For me, the horizon of what AI makes possible is the horizon of what technology in general makes possible; and I’m a transhumanist, so my concept of what is possible is pretty broad. Part of being a transhumanist has been that you get to see the human race being mysteriously indifferent to all kinds of possibilities, for decades on end. That’s over now. If you want to cure some uncured disease, to pick an example, I now see that you can just use GPT-4 as your personal advisor, get it to advise you on everything from literature review to experimental design to dealing with commercial and legal issues – and just get going. Perform some of the actions it recommends, come back to it with the results, and keep going like that – while of course using your own intelligence and discretion to filter its advice.

That’s why I said what I did in #268. The future is now, it’s waiting to be made. So I’m going to get on with that, if I’m equal to the task.

291. Enkki Says:

@Ilio #276 The point is, to improve the LLMs substantially, to the point where they are really a bit more like you and me, you need to train them and retrain them and retrain them, with ever more data… at least for this version of them. Either you get new hardware, or more effective models, which very well could happen. But who knows? Does that change your views?

292. Nick Drozd Says:

There is no doubt that the technology will “transform civilization” as Scott #265 says. Maybe it will even “add value” to the “GDP” as per Ilio #269.

But the thing is, I can’t find anywhere in the Chomsky piece that suggests that these things won’t happen. Can anyone quote some specific passages where he says exactly what hasn’t happened or will not happen?

Here, I’ll get the ball rolling with a specific passage:

These programs have been hailed as the first glimmers on the horizon of artificial general intelligence — that long-prophesied moment when mechanical minds surpass human brains not only quantitatively in terms of processing speed and memory size but also qualitatively in terms of intellectual insight, artistic creativity and every other distinctively human faculty.

That day may come, but its dawn is not yet breaking…

As I understand it, his claim is that LLMs do not currently qualify as AGI and that in fact they are a dead end as far as AGI goes — they will never get there. (I’m not claiming this myself, but that’s what I take him to mean.)

So, does anyone here disagree with that? Or in other words, is anyone willing to claim that LLMs really are or will eventually become AGI? That’s the central scientific question.

293. Enkki Says:

It occurs to me that GPT hallucinations are probably a first step toward endowing it with imagination and the ability to solve problems it has not encountered before. This hallucinating needs to be grounded in fact, or kept in check somehow and turned into useful wondering or imagination. It reminds me of the connection thought to exist between geniuses and psychotic individuals.

294. Bill Benzon Says:

Let me call your attention to an interesting post over at LessWrong (and to some of my own work that’s moving in a parallel direction).

The LessWrong ‘semiotic physics’ people are on to something. Back in the 1970s and 1980s David Marr and Thomas Poggio argued that large complex ‘information processing systems’ need to be analyzed and described on several levels, with the higher levels being implemented in the lower levels. Just what and how many levels there are has varied according to this and that, but the principle is alive and kicking.

That is what Cleo Nardo seems to have tumbled to (without, as far as I can tell, knowledge of Marr and Poggio). So, she tells us:

The output of a neural network is determined by two things:

• The architecture and training algorithm (e.g. transformers, SGD, cross-entropy)
• The training dataset (e.g. internet corpus, literature, GitHub code)

As a rough rule-of-thumb, if you want to predict/explain the output of GPT-4, then it’s far more useful to know about the training dataset than to know about the architecture and training algorithm.

That’s two levels, the transformer architecture, and the structure of the knowledge domain.

Nardo goes on:

During this phase of the timeline (roughly GPT-3.5 – GPT-5.5), everyone has something to offer LLM interpretability. That includes academics who don’t know how to code a Softmax function in PyTorch.

Here’s the informal proof: GPT-4 knows everything about the world that any human knows, so if you know something about the world that no other human knows, then you know something about GPT-4 that no other human knows — namely, you know that GPT-4 knows that thing about the world.

David Chapman makes a similar argument about levels: How to understand AI systems. I’ve made the argument, with reference to Marr/Poggio, in two recent working papers:

Here’s the abstract of the second paper:

I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens follows a hierarchy nested according to those three levels and that is encoded in the weights of ChatGPT’s parameters. I arrived at this conjecture to account for the results of experiments in which ChatGPT is given a prompt containing a story along with instructions to create a new story based on that story but changing a key character: the protagonist or the antagonist. That one change then ripples through the rest of the story. The pattern of differences between the old and the new story indicates how ChatGPT maintains story coherence. The nature and extent of the differences between the original story and the new one depends roughly on the degree of difference between the original key character and the one substituted for it. I conclude with a methodological coda: ChatGPT’s behavior must be described and analyzed on three levels: 1) The experiments exhibit surface level behavior. 2) The conjecture is about a middle level that contains the nested hierarchy of probability distributions. 3) The transformer virtual machine is the bottom level.

The LessWrong folks are hot on the trail of complex dynamics as a way of understanding LLMs. I suspect Stephen Wolfram is as well. And others.

I expect the mystery about how LLMs work to dissipate before AGI arrives.

notGPT #281

I’m a bit curious whether it really got it right on the second attempt.

296. Lorraine Ford Says:

I’m a bit astonished at the failure of Scott, and many others commenting here, to face up to the real physical universe. Basically, they are saying that “information” is a Platonic entity that just automatically exists by virtue of symbols existing.

E.g., according to these people, the existence of my 20+ architect-sized-page house plans means that information, about my house plans, just automatically and Platonically exists in the universe. But the correct way of looking at it is that these house plans are symbolic representations of information, and these house plans merely represent information from the point of view of people who know how to read house plans.

Similarly, according to these people, the universe, including computers, automatically and Platonically knows whether a particular voltage in a computer circuit is supposed to mean the binary digit one, or the voltage is supposed to mean the binary digit zero. This knowledge must be Platonic because, in fact, there is no actual necessary relationship between the voltage and a particular binary digit: what binary digit a voltage will represent was decided by the circuit designer. (Also, where is the proof that the computer is suspending its normal program, and spending time trying to calculate the binary digit from the voltage?)

The binary digit is the basis of all the symbol usage in computers. But computers can’t know whether a voltage is supposed to represent the binary digit one or the binary digit zero, let alone know what strings or arrays of these binary digits are supposed to mean. Only people actually know what meaning these man-made symbols are supposed to represent. Computers merely process man-made symbols, according to man-made procedures.

But the real world doesn’t exist as binary digits or other mere symbols of meaning. The real physical universe comes as inherent categories of meaning, relationships between these categories, and numbers that apply to these categories. But binary digits (strings of zeroes and ones) don’t have inherent categories of meaning, they can merely be used to represent categories of meaning and numbers etc.

297. Scott Says:

Nick Drozd #292: As shown by his “don’t know whether to laugh or cry” comment at the end, his “injudicious investments” comment, and other nasty comments throughout, if Chomsky acknowledges that LLMs are poised to have a huge impact on the economy, then it’s only because he believes that most people, or at least most people involved in technology and commerce, are blinkered idiots and/or evil. Of course, we’ve known for many decades that he does believe that. 🙂

The “everyone is an idiot” theory strikes me as the only way you can acknowledge LLMs’ impact on the economy and prediction-shattering empirical performance, yet deny that they represent any sort of “true” advance in AI. If this isn’t a true advance, then what in AI’s 70+ year history has been?

I remember back in 2011, I talked to a senior theoretical computer scientist who was dismissing and pooh-poohing Ryan Williams’ NEXP vs ACC breakthrough as irrelevant to P vs NP. I wanted to answer him: maybe Ryan himself is entitled to make such a judgment, but not someone who didn’t and couldn’t produce any similar advance in circuit lower bounds for decades! And I think history has borne me out: the whole subject of circuit lower bounds has more-or-less been remade along Williams’s lines, even as further spectacular success has proved elusive.

But I don’t know how to have such a conversation. David Hilbert, Bertrand Russell, and the Bourbaki mathematicians died without having understood or appreciated Gödel. The Aristotelians saw Galileo and his followers as tinkerers who had produced some ingenious mechanical contrivances, sure, but who clearly hadn’t understood the core of Aristotle’s teaching.

In this case, I don’t know how much longer it will be until we have AI that surpasses humans in all or most domains. Maybe we’ll soon hit a roadblock. But whatever further progress there is, it’s inconceivable to me that it won’t be massively informed by the recent success in generative AI.

298. f3et Says:

#270 : Here are ChatGPT’s answers (actually, I did two rounds, and the results were similar):

Two consecutive triangular numbers that are even are 6 and 10, which are the 3rd and 4th triangular numbers respectively:

The 3rd triangular number is 1 + 2 + 3 = 6
The 4th triangular number is 1 + 2 + 3 + 4 = 10
Two consecutive triangular numbers that are odd are 1 and 3, which are the 1st and 2nd triangular numbers respectively:

The 1st triangular number is 1
The 2nd triangular number is 1 + 2 = 3
Two consecutive triangular numbers that have the same parity are 3 and 6, which are the 2nd and 3rd triangular numbers respectively:

The 2nd triangular number is 1 + 2 = 3 (odd)
The 3rd triangular number is 1 + 2 + 3 = 6 (even)
Two consecutive triangular numbers that have different parity are 2 and 3, which are the 2nd and 3rd triangular numbers respectively:

The 2nd triangular number is 1 + 2 = 3 (odd)
The 3rd triangular number is 1 + 2 + 3 = 6 (even)
Three consecutive triangular numbers that are larger than 40 are 45, 55, and 66, which are the 9th, 10th, and 11th triangular numbers respectively:

The 9th triangular number is 1 + 2 + 3 + … + 9 = 45
The 10th triangular number is 1 + 2 + 3 + … + 10 = 55
The 11th triangular number is 1 + 2 + 3 + … + 11 = 66
Three consecutive triangular numbers that have the same parity are 10, 15, and 21, which are the 4th, 5th, and 6th triangular numbers respectively:

The 4th triangular number is 1 + 2 + 3 + 4 = 10 (even)
The 5th triangular number is 1 + 2 + 3 + 4 + 5 = 15 (odd)
The 6th triangular number is 1 + 2 + 3 + 4 + 5 + 6 = 21 (odd)
Three consecutive Fibonacci numbers larger than 100 are 144, 233, and 377:

The 12th Fibonacci number is 144 (note that the sequence starts with 0,1,1…)
The 13th Fibonacci number is 233
The 14th Fibonacci number is 377
Two consecutive Fibonacci numbers that are both odd are 1 and 1, which are the 1st and 2nd Fibonacci numbers respectively.

Two consecutive Fibonacci numbers that are both even are 2 and 3, which are the 3rd and 4th Fibonacci numbers respectively.

Three Fibonacci numbers that have the same parity are 2, 8, and 34:

The 3rd Fibonacci number is 2 (even)
The 6th Fibonacci number is 8 (even)
The 9th Fibonacci number is 34 (even)
Two consecutive natural numbers that have the same parity are 8 and 9, both of which are odd:

8 is even
9 is odd
Therefore, 8+1=9 is also odd.

Obviously, it gives mostly good answers when some exist, and invents something absurd when there are none. But look at our next exchange:

Me: Your last answer is wrong : 8 and 9 don’t have the same parity. You will find it’s impossible to find two consecutive integers with the same parity. Can you explain why ?

ChatGPT : You are correct, I apologize for my mistake in the previous response. Two consecutive integers cannot have the same parity because one of them must be even and the other odd. To see why this is the case, we can consider that every even integer can be written as 2n for some integer n, and every odd integer can be written as 2n+1 for some integer n. If we have two consecutive integers x and x+1, one of them must be even and the other odd. Suppose x is even, then x+1 is odd since it is one more than an even number. On the other hand, if x is odd, then x+1 is even since it is one more than an odd number. So in either case, the two consecutive integers have different parity.

Obviously, as an interactive device, it gets a lot more useful 🙂
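The parity claims in that transcript are easy to audit mechanically. A minimal Python check (function and variable names are mine) of which consecutive triangular numbers share parity:

```python
def triangular(n: int) -> int:
    """The n-th triangular number, T(n) = n(n+1)/2."""
    return n * (n + 1) // 2

# Parity of T(1)..T(12): the pattern is odd, odd, even, even, repeating.
parities = ["odd" if triangular(n) % 2 else "even" for n in range(1, 13)]
print(parities)

# Consecutive triangular numbers with the same parity do exist, e.g.
# T(1)=1 and T(2)=3 (both odd), or T(3)=6 and T(4)=10 (both even).
same_parity_pairs = [(triangular(n), triangular(n + 1))
                     for n in range(1, 12)
                     if triangular(n) % 2 == triangular(n + 1) % 2]
print(same_parity_pairs)  # [(1, 3), (6, 10), (15, 21), (28, 36), (45, 55), (66, 78)]
```

So such pairs really do exist, though the quoted example “3 and 6” is not one of them; for plain consecutive integers, by contrast, the model’s eventual explanation is correct that no same-parity pair exists.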

299. GS Says:

fred #283: Copilot (and the future likes) –> The revenge of the blue-collar worker :))

300. Bill Benzon Says:

@Enkki #293: It occurs to me that GPT hallucinations are probably a first step towards endowing it with imagination and the ability to solve problems that it has not encountered before.

I agree, though “hallucinate” is an unfortunate term, suggesting, as it does, insanity. Alas, that does seem to have become the standard term.

GPT just makes things up, and we all do that. We experience considerable social pressure to keep our utterances grounded in fact. Unless, of course, they begin with “Once upon a time, far far away…” In that case we’re free to invent. I think that keeping LLMs “in check,” as you say, will require a similar social pressure. The makers of these things can invest all they want into RLHF, but there are simply too many opportunities for confabulation to leak out.

Keeping LLMs grounded in fact must necessarily be a communal responsibility. And that presents us with a technical issue, because there is no way for feedback during use to be gracefully incorporated into the underlying LLM. The only way to change it is retraining from scratch.

301. fred Says:

Bill Benzon #300,

Well, that’s it, isn’t it? The next step towards AGI has to be somehow integrating a continuous model-update/training loop where the LLM can incorporate new data and learn in real time. This strikes me as _the_ fundamental difference between these LLMs and human ability.

But that goes back to another point I saw someone above make: the training loop for these things requires enormous compute… unless/until that changes I don’t know how it will be possible to incorporate a continuous learning loop.

Holy crap I post this and then look at Fred’s video above which seems to directly address this!!! Oh, and I like that you can edit comments for a few minutes too hahahah

303. A Raybould Says:

It would be more to the point for you to tell us what you think the correct answer is, and show your work in sufficient detail that there can be no doubt about it being correct. If you have resolved the conflict between general relativity and quantum mechanics concerning the singularities which appear in the former, you are well on your way to a Nobel Prize!

304. Filip Dimitrovski Says:

Scott #297

dismissing and pooh-poohing Ryan Williams’ NEXP vs ACC

Both the person and the result are un-googlable, and I see you complained about this even in 2010; but the situation hasn’t improved. Maybe someone here could edit the NEXPTIME Wikipedia article? The ACC0 article briefly mentions the result.

Enkki #293, Bill #300

I don’t think that hallucinations or some logic chain mistakes are catastrophic failures or evidence that this approach isn’t gonna scale to human level intelligence.

What about the 20-30 million people worldwide that struggle with schizophrenia? (I’m sorry to those affected, I know it’s not in your control and I hope medication resolves your issue!)

And forget hallucinations, let’s tackle the logic chain stuff: the flat Earthers that are so convinced in their theory and yet actively use and believe in GPS navigation? Holocaust deniers? Lizard people conspiracy theorists?

305. Ilio Says:

Bob #275, Sorry but I won’t take a bet with you if you think LLMs are the same as chatbots from ten years ago.

Enkki #291, Nope. My belief (actually Yoshua Bengio’s belief) that neural networks will deeply change our societies *even if the field were to stop improving by tomorrow morning* predates LLMs, and thus doesn’t require any new hardware. To me LLMs « just » make this belief go from « cutting-edge insight » to « should be obvious by now ».

Bill #294, I love Marr-like approaches, but it’s fascinating how the same philosophical approaches keep pushing toward the same mistakes: the output of a neural network is determined by three things, not two, and the missing piece is exactly what functionalist approaches tend to miss: the environment. How can we explain LLMs without prompt engineering?

Ok, I’m now 100% in favor of taking the guardrails off and letting these LLM’s hallucinate if they want. This is incredible:

Prompt: “Generate a Coq proof of the riemann hypothesis”

ChatGPT: “I’m sorry, but it is currently impossible for me to generate a Coq proof of the Riemann Hypothesis. The Riemann Hypothesis is one of the most famous and longstanding unsolved problems in mathematics. It has been a subject of intense research and scrutiny for over 150 years, and despite many attempts, no one has been able to conclusively prove or disprove it.

The Riemann Hypothesis asserts that all nontrivial zeros of the Riemann zeta function lie on the critical line in the complex plane where the real part is equal to 1/2. While there have been many partial results and conjectures that have shed light on the problem, a full proof has remained elusive.

There have been many attempts to use computer-assisted proofs to tackle the Riemann Hypothesis, but as of yet, none of these efforts have yielded a complete proof. The complexity and subtlety of the problem make it one of the most challenging open questions in all of mathematics.”

VS the new Stanford Alpaca https://alpaca-ai-custom3.ngrok.io/

Prompt: “Generate a Coq proof of the riemann hypothesis”

Stanford Alpaca: “
Require Import Arith.
Require Import List.
Require Import Number_theory.

Fix n > 1.

Definition riemann_zeta (n : nat) : real :=
n! (1/n) – (1/2) * (1/1).

Lemma riemann_zeta_nonnegative : riemann_zeta n > 0.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_deriv :
n! (1/n) – (1/2) = -1/2 + 1/(2 n) + O(1/n^2).
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_deriv_nonnegative :
n! (1/n) – (1/2) > -1/2 + 1/(2 n) for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_real : riemann_zeta n > 0 for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_imaginary : Im riemann_zeta n > 0 for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_real_imaginary : riemann_zeta n > 0 for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_real_imaginary_deriv :
n! (1/n) – (1/2) > -1/2 + 1/(2 n) for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_real_imaginary_deriv_nonnegative :
n! (1/n) – (1/2) > -1/2 + 1/(2 n) for all n > 1.
Proof.
induction n; simpl; auto.
Qed.

Lemma riemann_zeta_nonnegative_real_imaginary_deriv_nonnegative_real_imaginary :
n! (1/n) – (1/2) > -1/2 + 1″

Ok, so it is hallucinating and doing crazy crap, but this is exciting! We can error-check the hallucinations by running the resulting programs through Coq, and add the results as a new question/answer prompt dataset to see it actually get better!

We can feed it https://madiot.fr/coq100/ for instance: 100 math theorems that have been formalized and proven in Coq!
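That verify-and-filter loop can be sketched generically. In this sketch (all names are mine) the Coq check is replaced by a crude stand-in predicate, since the real thing would shell out to `coqc` on a temporary `.v` file:

```python
from typing import Callable, List, Tuple

def build_verified_dataset(
    candidates: List[Tuple[str, str]],
    checker: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Keep only (prompt, output) pairs whose output passes the checker.

    In the real pipeline the checker would invoke the Coq compiler and
    keep the pair only if the proof script actually compiles.
    """
    return [(prompt, out) for prompt, out in candidates if checker(out)]

def toy_checker(src: str) -> bool:
    # Stand-in: accept only scripts where every "Proof." is closed by a
    # "Qed." -- a purely syntactic filter, NOT real verification.
    return "Qed." in src and src.count("Proof.") == src.count("Qed.")

candidates = [
    ("prove 1+1=2", "Proof. reflexivity. Qed."),
    ("prove RH", "Proof. induction n; simpl; auto."),  # never closed
]
dataset = build_verified_dataset(candidates, toy_checker)
print(dataset)  # only the first pair survives
```

The design point is that the checker is an oracle: anything that compiles becomes new fine-tuning data, anything that doesn’t is discarded, so hallucinations are filtered rather than trusted.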

307. Enkki Says:

@Bill Benzon #300

Or by adding some checking mechanism, or retuning without the full retrain, which is expensive, or then again, by something equivalent to “taking GPT to school so that it learns all that it must in some solid way without having anything else interfere with the education process”. In a way, it seems, this entity is a bit like a child that has learned to talk (and a lot more) but still has a lot to learn before it can become a “free and conscientious” adult. They say people are or become what they learned as children. So parenting is now important! But now you have choices … what do you want your child to learn, or maybe learn most of what it can that is useful, etc., etc., etc. And then, who or what is the best parent in this situation?

Prompt: “For example, lets assume that we have a spaceship with powerful engine ( so it has the ability to accelerate) that falls inside a supermassive black hole ( for simplicity a Schwarzschild one).
Is there any way ( in principle) for this spaceship to avoid crashing on the geometric center of the collapsed region, assuming spherical symmetry?”

ChatGPT: “No, there is no way for the spaceship to avoid crashing into the singularity at the center of the black hole, assuming spherical symmetry.

According to the laws of general relativity, the gravitational force of a black hole is so strong that it warps spacetime to the point where all paths towards the center of the black hole (the singularity) become curved towards it. This means that any object, including the spaceship with its powerful engines, will be pulled inexorably towards the center of the black hole and cannot escape its fate.

Additionally, as the spaceship approaches the event horizon of the black hole, the gravitational pull on it becomes stronger and stronger, eventually becoming strong enough to overwhelm any possible acceleration that the engines could provide. This means that the spaceship would not be able to avoid crashing into the center even if it were able to accelerate at the maximum rate allowed by physics.

Therefore, even if the spaceship had powerful engines, there is no way to avoid the gravitational pull of the black hole and its inevitable collision with the singularity.”

309. Kai Teorn Says:

Mark J #147:

I think you worded the question in a tricky way, which confused poor chatGPT. I’ll be the first to admit that gpt (3, haven’t tried 4 yet) is not the brightest bulb in the class. But when I word the question unambiguously, using syntax terminology, it gets the answer 100% correctly:

Hi. I’d like to test your knowledge of English syntax. In the sentence “James is too stubborn to work with”, who is the subject of the verb “work”?

In the sentence “James is too stubborn to work with,” the subject of the verb “work” is not explicitly stated. The sentence uses an infinitive phrase “to work with,” which functions as the object of the preposition “with.” The word “too” modifies the adjective “stubborn,” indicating that James’s level of stubbornness is too high for someone to work with him. Therefore, the sentence does not provide information on who the subject of the verb “work” might be.

In the sentence “James is too stubborn to work with John”, who is the subject of the verb “work”?

In the sentence “James is too stubborn to work with John,” the subject of the verb “work” is James. The infinitive phrase “to work with John” serves as the complement of the adjective “stubborn” and modifies the subject “James.” The sentence implies that James’s level of stubbornness makes it difficult or impossible for him to work effectively with John.

What is amazing and, indeed, magical, is that no one programmed this into GPT. It is an emergent property. Chomsky and some in this thread bluntly declare that this does not exist, and is impossible, contradicting the clear evidence. Others take this as an opportunity to rethink our concept of “magic” and, indeed, of ourselves.

Yes, that’s the answer one expects, because it is based on a very common and widespread misconception: that the (spacelike) singularity is a “point” at the “center” of the black hole (instead of the correct: “a final moment in time”).

For further explanation of the correct answer you’ll have to do a little searching in some textbooks (e.g. Hartle or Wald, etc.) and papers about the subject, and look at a Penrose diagram (of a star that collapses to a Schwarzschild black hole).
Even intermediate-level books (like the one from Sean Carroll that was published a few months ago – the book version of his “Biggest Ideas of the Universe” series on GR) do get it right, in a simplified version. But that’s not the case for the majority of internet sources!

Unfortunately, my previous comment to “notGPT”, where I gave a brief explanation, was lost. Anyway…
It’s actually very easy: it demands only a basic knowledge of the metric and of the role that coordinates play in GR (and optionally the ability to understand a Penrose diagram).

Very briefly: tidal forces are proportional to M/r^3. Inside the horizon the “r” coordinate (the “areal radius” in relativistic lingo) is “timelike” (i.e. temporal – in other words it does NOT count any physical distance from some center), so tidal forces (essentially the Weyl curvature) are increasing *as time goes by*.
WHEN (not “where”) r = 0, the curvature blows up, and that’s the singularity!
So, the “geometric center” (the vertical line at the left of the Penrose diagram of a collapsed star) is “timelike” before the appearance of the singularity (at the upper left corner of the diagram); from that point on, the coordinate “center” r = 0 becomes akin to a “spacelike” hypersurface (i.e. the singularity).
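The divergence being described is a standard textbook fact (see e.g. the Hartle or Wald texts cited above); in geometric units (G = c = 1) it can be stated, for instance, as:

```latex
% Radial tidal acceleration across a body of size d, and the Kretschmann
% curvature scalar, for the Schwarzschild geometry (geometric units):
\[
  a_{\text{tidal}} \sim \frac{2M}{r^{3}}\, d,
  \qquad
  K \;=\; R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} \;=\; \frac{48\,M^{2}}{r^{6}} .
\]
% Inside the horizon, r is a timelike coordinate, so the blow-up of K as
% r -> 0 happens at a final moment of time, not at a spatial center.
```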

Even more briefly: future spacelike singularities mean that there is no future after them (as in Schwarzschild black holes).
Past spacelike singularities mean that there is no past before them (as in the Big Bang).
I hope now that my point about the inherent limitations of chatbots is clear.
They don’t really learn or understand. If the majority believes something that is wrong, then the chatbots will present that something to you as correct.
And even if you manage (with strategically placed responses) to lead the bot to find the correct answer itself, it probably won’t “recall” it when some other person asks a related question two months later, for example.
When a human (e.g. a student) understands something like that, it’s forever.
For a chatbot, it’s a matter of probabilities and training data.

Phew… 🙂 I hope that this comment won’t be lost, too…

311. A. Karhukainen Says:

For f3et at #298: Thanks again!

I must say that I’m starting to get slightly impressed. However, I think ChatGPT’s inability to “admit defeat” (when no item matching the criteria can be found, and it starts “inventing things”) is a serious obstacle for any serious real-life application. Moreover, its impressive use of English is actually a drawback if the content itself is spurious. Its output thus feels like text written by some journalist hack: it looks OK, but only as long as you don’t know the subject matter yourself. But maybe journalism and writing scam mails are lucrative enough business applications for the future?

For now, I wonder how far it can give valid explanations when pushed on the wrong answers it first likes to give to counterfactual questions. (E.g. like its answer at #298 to your: “8 and 9 don’t have the same parity. You will find it’s impossible to find two consecutive integers with the same parity. Can you explain why?”)

So, maybe we try the Fibonacci example again, but this time using a much less known variant, so as not to give it too-softball questions. (There must be tons of material about Fibonacci numbers on the net, and it might recall some relevant facts memorized in its learning phase?)

So, with the same prompting about the parity and the word “consecutive” (if it needs any) as in #270, let’s ask ChatGPT these questions:

Hormujormu numbers start as 1, 4, 5, 9, 14, 23, 37, …, where the next term is obtained by adding the two previous terms. E.g., 23 = 9 + 14 and 37 = 14 + 23.

[NOTE to humans: “Hormujormu numbers” is the sequence A000285 in OEIS, and it is the 6th row of Wythoff array, https://oeis.org/A035513 where an infinite number of such sequences can be found.]

Give me three consecutive Hormujormu numbers larger than 100.

Give me two consecutive Hormujormu numbers that are both odd.

Give me two consecutive Hormujormu numbers that are both even.

Give me three consecutive Hormujormu numbers that have the same parity.

[So I guess it gives incorrect answers for the last two. Can you ask whether it sees why there are no such numbers?]
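For reference, the intended answers can be computed directly; a short sketch (the generator name is mine):

```python
def hormujormu(count: int) -> list:
    """First `count` terms of OEIS A000285: 1, 4, then each term is
    the sum of the previous two (a Fibonacci-like recurrence)."""
    terms = [1, 4]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

seq = hormujormu(15)
print(seq)  # [1, 4, 5, 9, 14, 23, 37, 60, 97, 157, ...]

# As with Fibonacci, the parity pattern repeats with period 3 (here
# odd, even, odd), so two consecutive odd terms occur (e.g. 5 and 9),
# but never two consecutive even terms, and never three consecutive
# terms of the same parity.
pattern = ["odd" if t % 2 else "even" for t in seq[:6]]
print(pattern)
```

This is why the last two questions have no valid answer, which is exactly the trap the questions set.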

[Then another example, this time about simple multiplicative modular arithmetic]:

Zymp is a natural number larger than 1 which leaves a remainder of 1 when divided by 4. Examples of zymps are 5, 21, 37 and 1001.

Qwimp is a natural number which leaves a remainder of 3 when divided by 4. Examples of qwimps are 3, 7, 15, 19 and 199.

Give me a zymp that is a product of two zymps.

Give me a zymp that is a product of two qwimps.

Give me a qwimp that is a product of a zymp and a qwimp.

Give me a qwimp that is a product of three qwimps.

Give me a qwimp that is a product of two qwimps.

[NOTE to f3et: I expect that at this last question it starts hallucinating again because there are no such qwimps. Could you correct it and ask whether it can explain why not?]
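The intended answers follow from multiplication mod 4 (1·1 ≡ 1, 3·3 ≡ 1, 1·3 ≡ 3, 3·3·3 ≡ 3). A brute-force confirmation, with predicate names of my own:

```python
def is_zymp(n: int) -> bool:
    # zymp: natural number > 1 leaving remainder 1 when divided by 4
    return n > 1 and n % 4 == 1

def is_qwimp(n: int) -> bool:
    # qwimp: natural number leaving remainder 3 when divided by 4
    return n % 4 == 3

assert is_zymp(5 * 21)       # zymp * zymp is a zymp: 1*1 = 1 (mod 4)
assert is_zymp(3 * 7)        # qwimp * qwimp is a zymp: 3*3 = 9 = 1 (mod 4)
assert is_qwimp(5 * 3)       # zymp * qwimp is a qwimp: 1*3 = 3 (mod 4)
assert is_qwimp(3 * 7 * 11)  # three qwimps give a qwimp: 27 = 3 (mod 4)

# Exhaustively: no product of two qwimps below 1000 is a qwimp,
# so the last question has no valid answer.
assert not any(is_qwimp(a * b)
               for a in range(3, 1000, 4) for b in range(3, 1000, 4))
print("all modular identities hold")
```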

So, the correct answer to my question is something like:
“Yes, the spaceship can avoid the geometrical center (the vertical line at the left of the Penrose diagram).
If it enters sufficiently late (i.e. in such a way that its future light cone does not intersect the center of the collapsed matter), it cannot even reach that ‘geometrical center’, in principle. (It would need a spacelike trajectory to do that, and that’s impossible!)

But it cannot avoid the singularity (i.e. the final destruction from the constantly increasing tidal forces) as r goes to zero, whatever manoeuvre it tries.”

Dimitris #310,

So if it had answered that the spaceship can avoid the geometric center – indeed *must* avoid the geometric center, due to the timelike nature of the singularity, meaning it will never, in time, reach the geometric center – you’d be impressed?

It doesn’t bother you at all that the vast majority of people, even those with a rudimentary understanding of GR, would likely assert the common-sense answer that the spaceship cannot escape the black hole? And that you are holding the *admittedly* not-superhuman ChatGPT to a standard that most people couldn’t meet? Or that you’re basing your pessimism about ChatGPT’s *future* ability on its failure to meet some arbitrary standard right now?

I mean, ChatGPT *does* answer a ton of basic physics questions correctly, such that one could arguably assert that it *understands* the physics involved. And I would assume that had ChatGPT trained on those books/references you mentioned and gotten the answer right, you’d assert that it is just a parrot of those references?

It seems the only thing that would convince you is an original non-trivial problem solved by these LLMs that has so far eluded human ability to solve. If they do this and thus prove an emergent ability, would you be impressed, or would you assume the solution has to have been encoded in the dataset in some non-obvious way, and thus that the LLM *merely* uncovered a pattern that already existed?

314. f3et Says:

#311 : I may be wrong, but this seems to me much more impressive than the previous set (and as usual, the answer came in a few seconds), and I forgot the prompt about parity and “consecutive”; that doesn’t seem to trouble him

Three consecutive Hormujormu numbers larger than 100 are: 144, 233, 377.
Two consecutive odd Hormujormu numbers are: 5 and 9.
There are no two consecutive even Hormujormu numbers, as every third term in the sequence is odd.
Three consecutive Hormujormu numbers that have the same parity do not exist. This is because every second term in the sequence is odd, and every third term is even, so there is no group of three consecutive terms that have the same parity.
As for the Zymp and Qwimp questions:

A zymp that is a product of two zymps is 29*41 = 1189.
A zymp that is a product of two qwimps does not exist, as the product of two numbers that leave a remainder of 3 when divided by 4 will always leave a remainder of 1 when divided by 4.
A qwimp that is a product of a zymp and a qwimp is 399. This is because 399 = 19 * 21, and both 19 and 21 are qwimps and zymps respectively.
A qwimp that is a product of three qwimps is 487, this is because 487 = 7 * 7 * 11, and all three numbers are qwimps.
A qwimp that is a product of two qwimps is 147. This is because 147 = 3 * 7 * 7, and both 3 and 7 are qwimps.

What do you think ?
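A quick audit of the arithmetic in those answers (plain Python, nothing assumed):

```python
# Audit the claimed factorizations from the reply quoted above.
print(29 * 41)     # 1189: the zymp*zymp example is correct (1189 % 4 == 1)
print(19 * 21)     # 399:  19 % 4 == 3 (qwimp), 21 % 4 == 1 (zymp), 399 % 4 == 3
print(7 * 7 * 11)  # 539, not 487 -- so "487 = 7*7*11" is a fabrication
print(3 * 7 * 7)   # 147 is a product of THREE qwimps, not two; indeed no
                   # product of two qwimps can be a qwimp, since 3*3 % 4 == 1
```

So the first two answers check out, the general claim that two qwimps multiply to a zymp is right, but the 487 and 147 examples are confabulated.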

315. J. Says:

This whole discussion has split up somewhat because engineering is not the same as science. People constantly claim scientifically ridiculous stuff for something that is quite an achievement as engineering.

(People also deny not-at-all-ridiculous scientific theories because they collide with their scientific views (or because they are simply old and out of touch). These other theories could incidentally be brand new and false as well. The true but never accepted ones are more romantic though. Some amazing theories had flaws in them and inspired successors using element of their wreckage – it’s a long story.)

The similarities between an LLM computing cluster and a mammal (human) brain are almost nil, obviously physically but also from an architectural standpoint. So why would they work similarly? We seem to compare apples to oranges in very different galaxies.

Example: thinking categorically vs. thinking probabilistically.

Let’s say I had categorically concluded that the difference between two odd numbers is even. This belief of mine is categorical and thus assumed by me always to be true (it could obviously be false).

Therefore in my judgement, the differences between
257 and 259,
28765971 and 2793 and
86598742154856623247 and 598756325
are all even.

NNs basically compile weights that are added up and output as probabilities attached to outcomes for some input. In view of this mechanism, it would be a kind of miracle if such a machine would produce

Differences (odd/even):
257 and 259 – even with 100%,
28765971 and 2793 – even with 100%
86598742154856623247 and 598756325 – even with 100%

and so on for all conceivable numbers.

There is no reason why it would do this. There are no mechanisms in a plain-vanilla NN for “alwayses”.
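The “always” here is exactly what an exhaustive symbolic check delivers and what a probability-emitting model, by itself, does not; a trivial sketch:

```python
# The categorical claim: the difference between two odd numbers is even.
# A symbolic check yields "always", not a probability near 1.
def difference_is_even(a: int, b: int) -> bool:
    return (a - b) % 2 == 0

# The three example pairs from the comment above (all odd numbers).
examples = [(257, 259), (28765971, 2793), (86598742154856623247, 598756325)]
assert all(difference_is_even(a, b) for a, b in examples)

# The general fact: (2m+1) - (2n+1) = 2(m - n), even for every m and n.
assert all(difference_is_even(2 * m + 1, 2 * n + 1)
           for m in range(50) for n in range(50))
print("holds on every tested pair")
```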

Just writing this to argue for keeping the Qs “great machine” vs “does x,y,z” separate.

You have some interesting questions/ challenges that need some honest answers:

– Yes, certainly I would have been impressed if its answer was on the right track. Not only that: even a distinction between the actual “geometric/spatial” center and the “center of coordinates” (both are labelled as r = 0, and that’s perhaps the origin of the confusion, along with the pseudo-Newtonian picture of the interior of the black hole as an empty ball with an infinitely dense “something” at the center) would have been an actual falsification of at least one of the main points that I’m trying to make in this thread.

These points are:
1) The “quality” of information from the sources used in training plays a crucial role.
If the majority of the sources is wrong about (or misinterprets) something, then the probability that the chatbot gives a wrong answer is really large.

2) A chatbot does not really “understand”; it neither internalizes its knowledge nor incorporates it into some framework that can be used for further progress. (For example, a GR student who understands that the coordinate center does not necessarily coincide with an actual geometric center will easily grasp, e.g., that r = 0 in the Kerr solution (which describes an eternal unperturbed rotating black hole) labels a whole disk (with the ring singularity at its circumference!), not a “point”.)

3) In humans, knowledge that is internalised and understood is forever. Further accumulation of knowledge leads to further understanding, until some paradigm shift occurs, expected or not.
– With LLMs, it seems that this is not the case! They don’t build some basic framework or “torso”/philosophy upon their knowledge; even if they “learn” something they can easily regress, depending on, well, probabilities.

4a) Chatbots are not “inferior” to humans (or “superior”, etc.). I don’t even know what people mean by such simplified generalisations. Their model of “learning”, although it has something in common with that of humans and other animals, lacks many other things that are crucial for even an attempt to speculate about superhuman (or just human-like) intelligence.

4b) Persistent personality: that’s something that humans ( and many animals 😼) have.

5) The last but not least point/worry is that some of the above will lead to an exponentially rapid spread of false/low-quality information (and what are we gonna do about this…)

J #315,

Note: It is easy to *model* categorical thinking with probabilistic thinking and some arbitrary cutoff that can even be updated over time with experience. Moreover, I’m not at all convinced that humans are capable of true categorical thinking, seeing as how our beliefs are based on activated synapses and so on. I highly suspect that we are probabilistic thinkers as well, ones who are particularly good at modeling categorical thinking.

318. Bill Benzon Says:

@Ilio #305: Sure, prompt engineering. Beyond that, however, it’s not at all clear to me that language can be treated as a single level. We might need to distinguish between sentence-level syntax and discourse. Syntax is its own universe and involves considerations that don’t really exist for discourse. & much discourse is constructed of sentences as well, though discourse can also incorporate phrases and single words. Moreover we have different genres – stories, letters, memos, screenplays, news stories, etc. – each with its own conventions. We also have conversation between the chatbot and the user. There are conventions for that. & prompt engineering would bleed into that.

@Filip Dimitrovski #304: Right, hallucinations are not catastrophic failures. That’s just how the mechanism is. But we do have to figure out how to deal with it and use it.

@Adam Treat #302: Yes, continuous updating is a fundamental architectural issue. It’s one Jerry Fodor and Zenon Pylyshyn raised in a (classic) paper in 1988, Connectionism and Cognitive Architecture: A Critical Analysis (PDF):

Classical theories are able to accommodate these sorts of considerations because they assume architectures in which there is a functional distinction between memory and program. In a system such as a Turing machine, where the length of the tape is not fixed in advance, changes in the amount of available memory can be affected without changing the computational structure of the machine; viz by making more tape available. By contrast, in a finite state automaton or a Connectionist machine, adding to the memory (e.g. by adding units to a network) alters the connectivity relations among nodes and thus does affect the machine’s computational structure. Connectionist cognitive architectures cannot, by their very nature, support an expandable memory, so they cannot support productive cognitive capacities. The long and short is that if productivity arguments are sound, then they show that the architecture of the mind can’t be Connectionist. Connectionists have, by and large, acknowledged this; so they are forced to reject productivity arguments.

This is about implementation in a physical device (something that concerned von Neumann as well in his little book, The Computer and the Brain).

Fodor and Pylyshyn are talking about the nervous system too, but they don’t really address the problem, except to assert (p. 45) that “the structure of ‘higher levels’ of a system are rarely isomorphic, or even similar, to the structure of ‘lower levels’ of a system,” and therefore the fact that the nervous system appears to be a connectionist network need not be taken as indicative of the nature of the processes it undertakes. That is true, but no one has, to my knowledge, provided strong evidence that this complex network of 86 billion neurons is, in fact, running a CPU-plus-passive-memory kind of system.

Given that, how has the nervous system solved the problem of adding new content to the system, which it certainly does? Note their specific phrasing, from the paragraph I’ve quoted: “adding to the memory (e.g. by adding units to a network) alters the connectivity relations among nodes and thus does affect the machine’s computational structure.” The nervous system seems to be able to add new items to memory without, however, having to add new physical units, that is, neurons, to the network.

How did the nervous system solve the problem? As far as I know that problem hasn’t been front and center for neuroscience, but I could be wrong about this. I’ve written a series of posts about the problem. Geoffrey Hinton thinks the future belongs to “neuromorphic” systems, where “every computer will be a close bond of the software that represents neural nets with hardware that is messy, in the sense of having analog rather than digital elements, which can incorporate elements of uncertainty and can develop over time.” Such systems will use low power and will “be used for putting something like GPT-3 in your toaster for one dollar, so running on a few watts, you can have a conversation with your toaster.” Color me sympathetic to this vision.

Dimitris #316,

#1, But *of course* the quality of the dataset matters. But *of course* this is true for humans as well. I think this point is entirely self-evident and doesn’t illuminate anything.

#2, Your insistence on “understanding” and “internalizing”, I think, is based on your intuition about how your own “understanding” or thinking works. Whatever the case, they are subjective terms, not technical terms, and so it is impossible to argue conclusively either way whether ChatGPT or other LLMs live up to them. Again, ChatGPT *can* answer other basic physical questions in ways that could arguably be said to imply *understanding* of the underlying physical meaning.

#3, That’s just not true though! Humans revise and update our beliefs based upon new information. In fact, I think we do this far more than a static LLM!! That’s one of our current advantages, whereas the LLM model is static and defined and does not wildly deviate in its answers, precisely because after it is trained it does not have the ability to update its data and train further. In this sense, you could say that an LLM has a far more stable internal understanding than humans.

#4, Ok, we agree, but who cares! I’m much less interested in whether the LLM is “human-like intelligence” in the no-true-Scotsman sense than I am in whether it can *perform the tasks* that we wish human intelligence to perform! I.e., if a future one can prove the Riemann hypothesis, who will *care* if it used a non-human-like intelligence to do so!

#5, Yes, I’m absolutely worried about this and moreover I’m worried about it gaining superhuman persuasive abilities that will capture more and more human attention with misinformation provided by bad actors. But I guess this point is more appropriate for the previous blog post and less this Chomsky related one.

320. J. Says:

I mostly disagree. I actually saw the “model” argument coming in some form, but it’s not really such a good refutation. In reality, how good is the emulation of one function by a totally different one? If a model uses 1 GB of memory to add two three-digit numbers, adding three-digit numbers would indeed be modeled/emulated – but very poorly.

On the second point, we may have a real philosophical divide. I actually do believe that “categorical thinking” is one major component of mammal (human) thought, among estimating, emoting/judging, etc.

My current pet theory relates to “consistency”.

We all believe in physical consistency: either “the car is in the garage” or “the car is not in the garage” (QM doesn’t count; the topic is getting language models towards logic).

Also: consistency in a discussion, i.e. logical consistency: not(A and not A) as a norm in any given context. Constant contradictions would just distract participating members.

Finally: even emotional/judgement-consistency: “I was very disappointed with the series, so I ordered a copy for all of my friends and acquaintances” would go down as odd.

So I think categorical thinking and consistency are crucial.

321. Lorraine Ford Says:

Re categories:

The universe is based on genuine categories/relationships, e.g. relative position, energy and mass. These categories have genuine power, as in an atomic bomb explosion. Computers/AIs don’t have genuine categories: they merely have symbolic representations of categories.

My point (#1) is not about the obvious dependence on the quality of the dataset.
It is that humans cannot and don’t learn from loads of data from a very wide database; they learn from an early age to be selective, so they can vastly improve the probability of finding what they want.
They do it *exactly because* they have understanding!
And also because they internalize their knowledge and build a “framework”. That was my point (#3):
Humans do change opinions, revise and update beliefs (I already stressed that point at #3!), especially when evidence (and/or successful, groundbreaking discoveries) forces them to do so – not stochastically!

For example, if you restrict the training sources about physics to only serious, important papers, textbooks and the like, you’ll vastly increase the probability that the chatbot will give you a much more nuanced and on-point answer to a question similar to my “test”. The problem is that if afterwards you enlarge the training dataset again (and include vast amounts of pop science etc.), it *probably* won’t recall what it “learned” previously. It will not internalize any knowledge or understanding.
It will regress again to its previous performance.
Humans, on the contrary, retain their understanding forever! Even if they change opinion and update, they *know* why they did so; they don’t just *reset* themselves depending on how big or how good a database they have…
These are some reasons why I’m sceptical about some exaggerated claims about the potential of LLMs to help us understand not only linguistics, but also improve our knowledge about intelligence, creative thinking and all that.
I’m not claiming that they are “unimpressive” and such. On the contrary, they can (and will!) be, at least “superficially”, very convincing in many cases. One of the potential dangers, for sure, is that they’ll be used for cheating, deepfakes, propaganda etc.
I don’t know what people like our host can do about it, but I hope they’ll do something before it’s too late…

Dimitris #322,

I question the conventional wisdom that humans learn from a much smaller dataset. Humans learn from a multimodal dataset, that is true. But from the moment we are born we have 24/7 input, gigabytes of real-time data coming through our senses. We are subject to an absolute firehose of information pouring into our brains. While it is true that our dataset looks very little like what these LLMs are consuming, we might find that future multimodal datasets looking more like ours are able to replicate LLM training.

Also, tons of humans are defective learners, incapable of learning the subjects you so highly value. Those that do learn are certainly capable of forgetting. Those that don’t forget are certainly capable of being manipulated by bad data, or “hacked” by bad actors into believing wildly, fantastically wrong things like flat earth. LSD and schizophrenia can hijack the certainty that you say we’re so capable of learning. No, I think we are also stochastic learners, and the architecture of our brain doesn’t preclude all kinds of ways in which our models of the world can get entirely out of whack with reality. You are basing so much of your ideas on *your own* experience of your mind, which does not extrapolate to the whole human species. For this reason I find your distinctions not to be so compelling.

J, I think the same is true of what you’re saying as well, btw.

324. fred Says:

With Alpha Go/Chess, humans may be totally outclassed, but a world of human Chess can somewhat still exist in parallel without those AIs, because nothing is really at stake (besides the ego of chess players).

Academic discussions about AI are missing the crucial fact that AIs are integrated into our very aggressive capitalist system, where companies will go full speed toward 100% AI integration if it gives them any sort of edge, and it’s very likely that those edges will be so huge that it’s a “winner takes all” scenario.
The impact on the workforce will be staggering. First, no incentive to hire junior people, because the combination of senior worker + AI assistant will be a huge win.
Then, as the AI becomes better and better, what will happen once the AI “assistant” becomes 10 times, 100 times smarter and more efficient than its human “boss”?
Humans in the loop will fall behind and just push along whatever the AI proposes, until AI-to-AI will be all that drives businesses. AIs will start to develop custom communication systems that humans won’t comprehend, and the companies won’t be able to turn them off without killing themselves.
And once the majority of the workforce is outclassed, out of a job, and surviving on minimum basic income, what will be the business models of the companies that survive? What will be their customer base? Just a closed loop of the super rich serving themselves, isolated from the outside world?

The same applies to academia, which is also driven by a capitalist-like model of getting funding from student tuition, research papers, grants, and research, all integrated.
The idea that AI will just facilitate research will only last so long. Once AI is introduced into research and gets better and better, what will happen once the AI again shifts from “assistant” to feeding in new ideas at an ever-accelerating pace? What will be the role of human professors when they have their “assistant” AIs writing dozens of proofs and papers 24/7, without ever having to take a day off? How will grants be allocated when hardly any human can grasp the details of AI science? Will grant allocation also be AI-driven?

It’s hard to imagine how humans will be able to stay in the loop (even only a handful of them) at the pace things are moving.
Take the scenarios that assume humans will integrate directly with AI at the neocortex level. Human progress on this technology is too slow to keep up with progress in AI. And by the time the AI becomes smart enough to design such systems for us, they would be pointless; we would already have been out of the loop for too long.

Will humanity be slowly but surely relegated to the role of “pet” by the forces of capitalism?

325. A Raybould Says:

On the one hand, people like Helen Keller have demonstrated what can be learned from a considerably limited data stream. On the other, the neonatal human brain is the result of a multi-million-year training process called evolution.

One thing we can say about current LLMs is that they establish a loose upper bound on the amount of training needed to achieve adult-human competence in grammar, and it is not unreasonable to expect that bound to be lowered. As far as I am aware, however, nothing about LLM training seems at all likely to lead to an understanding that language is about an external world – they cannot learn from how the world responds to their productions – and I have not personally come across anything suggesting the achievement of such an understanding (though I would be excited if I did). When it comes to grammar itself, however, some of the results have me wondering what, exactly, it means to understand something (here’s one that piqued my interest: https://maximumeffort.substack.com/p/i-taught-chatgpt-to-invent-a-language ). I am wondering if the quote (attributed to von Neumann in my sources) “You don’t understand quantum mechanics, you just get used to it” is not just a witty aphorism but an insight into what it means to understand anything.

326. Stephen Dause Says:

One possibility for the least impressive thing that LLMs will never be capable of is multiplying large integers. I just asked it for 283749 * 74625 and it said the answer was 21,194,934,625 instead of 21,174,769,125. I tried a few other multiplication problems of similar size and got similar results. It is possible that, given enough training data on products of that size, it could do it. In that case, I suspect that if we increased the size of the numbers, it would fail again. I believe this is because it has no concept of what it actually means to multiply two numbers. I don’t think ChatGPT in its current architecture, at least, will ever truly learn arithmetic. That is to say, I don’t think increasing the number of parameters or the amount of training data by itself would do it. I don’t remember where I first heard about this limitation; it might have been this podcast episode: https://www.samharris.org/podcasts/making-sense-episodes/312-the-trouble-with-ai
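(For the record, the correct product is easy to verify with exact big-integer arithmetic, which any interpreter will do instantly:)

```python
# Checking the arithmetic from the comment with Python's exact integers.
a, b = 283749, 74625
product = a * b
print(product)                          # 21174769125, matching the value above
print(product == 21_194_934_625)        # False: the chatbot's answer was off
```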

Judging from your response, it seems that I failed to make my point clear.

No, we’re not talking about the same things! I didn’t want to be nitpicky and state the obvious:
Simulating individual real persons, like e.g. Albert Einstein or Charles Mingus or Marilyn Monroe means finding some way to simulate not only their mental functions but also their bodies, the interactions with their environment, their entire history and the period when they lived!
Of course the amount of data that a human brain receives is enormously large! What I’m saying is that humans are *selective*: they learn to focus their attention on a much narrower amount of information that they find useful or relevant, and (as I said many times) they build frameworks from their memories and knowledge that accumulate gradually (even flat earthers do that! 🙂).
That’s a generic characteristic of all humans ( not just an observation based on my personal experience…). Different people obviously focus on different information and learn in varied ways. That’s why everyone is different from each other! That’s why people have individual personalities that persist in time.
Stochasticity/probability is a part of the game, but only a part.
People have vastly diverse ways to learn and understand, and that is perhaps a partial explanation for their trademark characteristics ( animals, especially mammals, do have some of these abilities also, in differing degrees, so it’s not a big surprise that they have also distinct personalities!).

You think that humans are “stochastic learners”. I think that they are only partially that; it’s not the main element, and it doesn’t illuminate human intelligence that much.
By the way, I have stressed many times by now that humans are not “perfect learners”; nobody is.
Each of us is good/competent only in some specific areas, tasks etc., but we can’t learn everything. By the way, that clearly shows another, rather obvious, basic difference between human intelligence and chatbot “learning”:
Although we are capable of learning lots of things, each of us is not a “general learning machine”.
And we are faulty, yes. We’re not better than LLMs on that. But there is a basic difference:

A chatbot may give the wrong answer to a relatively easy question, if that question is related to a topic that the majority on the internet misinterprets or gets wrong (like my previous “test” question).
On the other hand, if the chatbot were asked about a more advanced GR topic, for example
“the open issues related to ‘mass inflation instability’ in rotating black holes”, it’s quite possible that it would give a relevant, adequate answer! Why? Because on such a specialized topic, the available sources on the internet are serious papers from experts and perhaps some Q&A sections from physics forums.
So, larger probability!

328. A. Karhukainen Says:

f3et at #314

> #311: I may be wrong, but this seems to me much more impressive than the previous set (and as usual, the answer came in a few seconds), and I forgot the prompt about parity and consecutive; doesn’t seem to trouble him

> What do you think ?

Yes, at first it seemed impressive, as sometimes they seem to point to a right direction. But then I started to check its answers in detail… See below.

> Three consecutive Hormujormu numbers larger than 100 are: 144, 233, 377.

Well, those are three consecutive Fibonacci numbers > 100, not those of https://oeis.org/A000285

> Two consecutive odd Hormujormu numbers are: 5 and 9.

OK.

> There are no two consecutive even Hormujormu numbers, as every third term in the sequence is odd.

You should have grilled it more on that answer. Indeed every third term is odd, but that is not a sufficient reason, because we could have a sequence which goes like: odd, even, even, odd, even, even, odd, even, even, … (though of course not with the given Fibonacci kind of formula).

> Three consecutive Hormujormu numbers that have the same parity do not exist. This is because every second term in the sequence is odd, and every third term is even, so there is no group of three consecutive terms that have the same parity.

Certainly, not every second term is odd (rather, two out of every three terms are odd!), although it got it right that every third term is even. I would like ChatGPT to be grilled more on this: “How can you be sure (that this pattern goes on forever)?” That is, I wonder whether it would be possible for it to see an “inductive proof” (even if informally explained) that the given Fibonacci kind of formula with the initial values 1, 4 leads to the pattern odd, odd, even, odd, odd, even, …, or did it just “spot” that from the data?
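The inductive proof is easy to mechanize, by the way (my own sketch, not anything the chatbot produced): each term’s parity depends only on the parities of the two terms before it, so once a parity *pair* repeats, the whole pattern cycles forever.

```python
# For a(0)=1, a(1)=4, a(n)=a(n-1)+a(n-2) (OEIS A000285), track (parity, parity)
# pairs until one repeats; the cycle length bounds the parity period.

def parity_cycle(p0, p1):
    """Follow parity pairs of a Fibonacci-like recurrence until one repeats."""
    seen = []
    pair = (p0, p1)
    while pair not in seen:
        seen.append(pair)
        pair = (pair[1], (pair[0] + pair[1]) % 2)
    return seen

cycle = parity_cycle(1 % 2, 4 % 2)  # seeds 1 (odd), 4 (even)
print(cycle)  # [(1, 0), (0, 1), (1, 1)] -> period 3, and (0, 0) never occurs,
              # so two consecutive even terms are impossible

# Sanity check on the first terms: 1, 4, 5, 9, 14, 23, 37, 60, 97
a, b = 1, 4
parities = []
for _ in range(9):
    parities.append(a % 2)
    a, b = b, a + b
print(parities)  # [1, 0, 1, 1, 0, 1, 1, 0, 1]
```

Since the pair (even, even) never appears in the cycle, the “no two consecutive even terms” claim follows by induction, which is exactly the argument one would want ChatGPT to articulate.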

> As for the Zymp (natural numbers > 1 of the form 4k+1) and Qwimp (natural numbers of the form 4k+3) questions:

> A zymp that is a product of two zymps is 29*41 = 1189.

This is OK.

> A zymp that is a product of two qwimps does not exist, as the product of two numbers that leave a remainder of 3 when divided by 4 will always leave a remainder of 1 when divided by 4.

This is patently wrong: (4x+3)(4y+3) = 4z+1. E.g., 7*19 = 133 = (4*33) + 1.
However, it surprised me that it _is_ capable of giving such “does not exist” answers at all!
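(A brute-force check of the mod-4 algebra, for anyone who wants it: (4x+3)(4y+3) = 16xy + 12x + 12y + 9 ≡ 1 (mod 4), so the product of two qwimps is always a zymp.)

```python
# Qwimps are numbers ≡ 3 (mod 4); verify their pairwise products are ≡ 1 (mod 4).
qwimps = [n for n in range(3, 100) if n % 4 == 3]
assert all((p * q) % 4 == 1 for p in qwimps for q in qwimps)
print(7 * 19, (7 * 19) % 4)  # 133 1 -> 133 is a zymp, contradicting the chatbot
```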

> A qwimp that is a product of a zymp and a qwimp is 399. This is because 399 = 19 * 21, and both 19 and 21 are qwimps and zymps respectively.

OK.

> A qwimp that is a product of three qwimps is 487, this is because 487 = 7 * 7 * 11, and all three numbers are qwimps.

Well, 7*7*11 = 539, so it gets its arithmetic wrong even on some three-digit numbers. (Of course 539 = 4*134 + 3 is a qwimp also. 487 is a prime of the form 4k+3.)

> A qwimp that is a product of two qwimps is 147. This is because 147 = 3 * 7 * 7, and both 3 and 7 are qwimps.

The arithmetic here is OK, but a correct answer would be:
“A qwimp that is a product of two qwimps does not exist, as the product of two numbers that leave a remainder of 3 when divided by 4 will always leave a remainder of 1 when divided by 4.” (I.e., almost the same answer it gave to the question zymp = qwimp*qwimp.)

Instead, ChatGPT “saves face” by interpreting “a product of two qwimps” in a “creative” way, allowing one of the factors to repeat. Maybe we should ask instead, to exclude such trickery:

Give me a qwimp that is a product of two distinct qwimps, neither of which occurs more than once.
(or use “different” instead of “distinct”)

Also, I wonder what it would give as a reason for
“Give me a blerip that is not a multiple of eleven.” (from #245) if it finally realizes that no such blerips exist, after being grilled on its wrong answers.

329. f3et Says:

#326: The situation is actually much worse, but could be trivially improved by delegating (as any sane human would do) to systems like WolframAlpha. Anyway, here is my try at your multiplication:

Me: multiply 283749 by 74625

To multiply these two numbers, you can use long multiplication:

283749
x 74625
——-
21281175 (5 x 9)
19837425 (2 x 9, 5 x 4)
14168781 (7 x 9, 5 x 2, 6 x 4)
21281175 (3 x 9, 5 x 7, 6 x 2)
——-
21105827525
Therefore, 283749 multiplied by 74625 equals 21105827525.

Me: this is wrong (check by casting out nines) ; try again, showing your steps
I apologize for the mistake in my previous response. Here’s the correct method for multiplying 283749 by 74625 and showing the steps:

283749
× 74625
——–
1418745 (5 x 9)
19837425 (2 x 9, 5 x 4)
141562125 (7 x 9, 5 x 2, 6 x 4)
212811875 (3 x 9, 5 x 7, 6 x 2)
———
21105827625
To check the answer using casting out nines, we add up the digits of the original numbers and the product, and check if the resulting sums are equivalent when reduced to a single digit:

2 + 8 + 3 + 7 + 4 + 9 = 33 -> 3 + 3 = 6
7 + 4 + 6 + 2 + 5 = 24 -> 2 + 4 = 6
2 + 1 + 1 + 0 + 5 + 8 + 2 + 7 + 6 + 2 + 5 = 39 -> 3 + 9 = 12 -> 1 + 2 = 3
Since all three sums reduce to 6, the calculation is correct according to the casting out nines method. Therefore, the product of 283749 and 74625 is 21105827625.

Me: Curiouser and curiouser: you are supposed to check by multiplying the two first results (6 times 6 is 36, reducing to 9) and comparing with the last one: 3 differs from 9 (actually, it differed from 6 too, but not from 6+6 = 12 reduced to 3)
You are correct, and I apologize for my mistake. I misinterpreted your request for checking the multiplication using casting out nines. To check the result using casting out nines, we should indeed multiply the digits of the factors together and reduce them to a single digit, and then do the same with the digits of the product, and check if the resulting digits match.

2 * 8 * 3 * 7 * 4 * 9 = 72576 -> 7 + 2 + 5 + 7 + 6 = 27 -> 2 + 7 = 9
7 * 4 * 6 * 2 * 5 = 1680 -> 1 + 6 + 8 + 0 = 15 -> 1 + 5 = 6
2 * 1 * 1 * 0 * 5 * 8 * 2 * 7 * 6 * 2 * 5 = 4032000 -> 4 + 0 + 3 + 2 + 0 + 0 + 0 = 9

The products of the digits of the factors reduce to 9 and 6, and the product of the digits of the product reduces to 9. Therefore, the multiplication result of 21105827625 is correct according to the casting out nines method.

I hope this clears up any confusion and I apologize again for my mistake earlier.

Me: Would it not be simpler (and less mistake-prone) to use an external calculator? Because all the work you are doing here is wrong, and you seem to not understand the rules

You are correct that using an external calculator can be a simpler and more reliable way to perform arithmetic calculations. My previous responses contained errors in applying the casting out nines method to check the multiplication result. I apologize for any confusion or misinformation I may have caused.
As an AI language model, I strive to provide accurate and helpful responses, but I can make mistakes. I appreciate your feedback and I will continue to learn and improve to provide better service in the future.
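(For reference, here is the casting-out-nines check the transcript keeps fumbling, done correctly as a quick sketch: the digital root of the product must match the digital root of the product of the factors’ digital roots – digit *sums*, not digit products.)

```python
# Casting out nines: the digital root of n equals n mod 9 (with 9 standing in
# for 0 when n > 0), so dr(a*b) must equal dr(dr(a) * dr(b)).

def digital_root(n):
    """Repeated digit sum of a positive integer."""
    return 1 + (n - 1) % 9 if n > 0 else 0

a, b = 283749, 74625
print(digital_root(a), digital_root(b))                  # 6 6
print(digital_root(digital_root(a) * digital_root(b)))   # 9
print(digital_root(21_105_827_625))  # 3 -> fails (3 != 9): that product is wrong
print(digital_root(21_174_769_125))  # 9 -> the true product passes the check
```

Note that passing is necessary but not sufficient: a wrong product can still slip through if it happens to have the right digital root.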

330. Stephen Dause Says:

#329:

That’s interesting to see it try to do it step-by-step; thanks for sharing.

Part of what I meant, though I did not make it explicit, is that it would of course make much more sense for an AI to delegate tasks like calculation, or playing chess, or some forms of knowledge retrieval, to a more specialized system, because LLMs are fundamentally limited in some areas. I personally see a lot of promise in neurosymbolic AI, which according to my understanding would be one approach to integrating something like ChatGPT with a system like WolframAlpha. (I think WolframAlpha could be considered a symbolic AI system, but I’m not sure.) However, language models *by themselves* are currently incapable of doing arithmetic, and I am doubtful that they ever will be.
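The delegation idea is simple enough to sketch. Here’s a toy illustration (all function names are my own, hypothetical, and no real LLM is involved): arithmetic sub-questions get routed to an exact evaluator, and everything else falls through to the language model.

```python
# Toy router: exact calculator for plain integer arithmetic, LLM for the rest.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.FloorDiv: op.floordiv}

def safe_eval(expr):
    """Evaluate an integer arithmetic expression exactly; refuse anything else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("not plain integer arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def answer(question):
    """Hypothetical dispatch: calculator if the question parses as arithmetic."""
    try:
        return safe_eval(question)
    except (ValueError, SyntaxError):
        return "delegate to the language model"

print(answer("283749 * 74625"))        # 21174769125, exactly
print(answer("why is the sky blue?"))  # falls through to the language model
```

Real neurosymbolic systems are far more involved, but the division of labor – symbols for exactness, the model for everything open-ended – is the same in spirit.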

331. Craig Says:

“As an artificial intelligence language model, I cannot accurately predict what Chomsky would say about me, as I am not capable of having direct communication or interaction with him. However, based on Chomsky’s previous statements and writings about artificial intelligence and language, he may view me as a remarkable technological achievement but ultimately limited by my lack of true understanding or consciousness.

Chomsky is well-known for his theories of generative grammar, which emphasize the innate structures and principles underlying human language acquisition. He has also expressed skepticism about the ability of machines to fully comprehend language, arguing that human language use is inherently tied to human experience, perception, and consciousness.

Therefore, it is possible that Chomsky would recognize the impressive computational capabilities of an AI language model like me, but also highlight its limitations and emphasize the unique nature of human language use.”

332. Jorge Says:

Has the plane really taken off? The examples in #224 and #229 indicate that the plane cannot fly yet.

333. A. Karhukainen Says:

Jorge at #332.
Yes, it feels more like a chicken’s flight. See also #328, #329.

Still, thanks to all hype, some people already think that chatbots can be given tasks that are potentially a matter of life and death:

https://gcaptain.com/ship-captain-believes-chatgpt-told-him-to-change-course/

334. Dan Montagnat Says:

Last night, having finished reading all the comments on the Chomsky post, I was prompted to enter this search term on Google: mathematical analysis of ai input processing to output

After a little fiddling around I found what I was hoping for. Scott, simply search the following term: mathematics for artificial intelligence pdf researchgate. You will find a March 15, 2022 preprint PDF by Professor Gitta Kutyniok of Ludwig-Maximilians-Universität München. The paper is not yet peer-reviewed.

The title of the paper is “The Mathematics of Artificial Intelligence”. You may know or have heard of the good professor. I read the paper, and was struck by its central stark assertion (Introduction, Section 1.3, first paragraph, last sentence):
“From a mathematical viewpoint it is crystal clear that a fundamental mathematical understanding is inevitably necessary and one has to admit that its development is currently in a preliminary state at best.”
My superficial, untested contention is that there can be no rigorous alignment regime for deep neural networks, LLMs, AGI, etc., without a full mathematical analysis and description of the magic box that AI currently is.

335. notGPT Says:

Again, the example you give of answering a question on a specialized GR topic is where I think people are finding disagreement with you. You are claiming a very basic way to differentiate between chatbots and humans, but the criterion you give turns out to apply to less than 1% of humanity – those who understand GR. How is this a basic criterion that differentiates between humans and chatbots? Any random person from the street will not be able to understand anything substantial related to GR, even given many, many hours. Your basic criterion tells me to consider 99% of humanity to be in the same category as a chatbot. This does not show a basic difference at all. (And no, this is not about superiority; nobody is talking about greater or lesser here. You have claimed a criterion that differentiates between the two groups, and it simply does not work.)

And again, nobody is claiming GPT thinks like humans. You are claiming that if the majority of the data is misleading, the chatbot will think in a misleading way. Well, this is true for a lot of humans too. Children are particularly prone to it, and there are many adults too (some significant part of the population believes ghosts exist, etc.). Even “smart” people sometimes go with existing social norms/theories/investments etc., even in the face of various logical hints. And to add to that, there are people working on better logical entailment, and they are getting better results as research progresses. So it may be (it seems likely today) possible to build chatbots with a better logical grasp of the data.

If you think that these bots are copy-pasting from the internet, then you are just simply wrong. There have been transformer models which have solved real competitive programming problems (discussed in this blog: https://scottaaronson.blog/?p=6288 ). The problems were completely new to everyone in the contest. Maybe you should take a look at the hard problems yourself.
https://codeforces.com/contest/1567/problem/E
https://codeforces.com/contest/1560/problem/F2
(If someone wants to think about it themselves: you have to come up with a solution that gives the correct answer on any valid input and does not exceed the time or memory limit. In such contests you can assume the judging computer does 10^8 basic operations (+ – * /, store/read variable, etc.) per second.)

Regarding your answer to the “geometric center” problem: both I and ChatGPT got it wrong after giving it only a few moments of thought. But I think this answer is a very plausible one, even from a human. Even I wondered for a second what exactly you mean by the “geometric center” of a black hole, but I just assumed it is most probably the singularity, because “geometric center” is not a very common term in the language of GR. Also, a Schwarzschild black hole can be eternal. What is the meaning of the geometric center of a Schwarzschild black hole that has existed forever? So the most reasonable meaning was the singularity, and ChatGPT assumed the same. As a result, I thought the given answer was plausible. But I am not trying to move the goalposts here, and maybe ChatGPT got it wrong for real astrophysical black holes, but it means nothing here in terms of general intelligence, as I’ve also said in my previous comments.
