I’m much more fascinated by the sociology of the people involved in LLMs than I am by the things themselves, though to be fair, I don’t need any of the things they’re good at (coding copilots only make sense if you can already code).

*“But in order for it to be a day to day tool, it can’t just be this chatbot, it has to be able to do stuff, such as go to the internet for you, like retrieve documents for you, summarize it for you. […] here is something to expect within the next year, that GPT and other LLMs will become more and more integrated with how we use the web. […] you could unlock a lot of the near term usefulness if you could give it errands to do […] I expect that we’re going to go in that direction and that worries me somewhat, because now there’s a lot more potential for deceit”*

It only took a few days: that’s exactly what AutoGPT/BabyAGI are already doing!

(video about AutoGPT/BabyAGI capabilities)

https://youtu.be/Qm2Ai_JiQmo?t=400

1a: https://en.wikipedia.org/wiki/Unitary_matrix#Equivalent_conditions

1b: https://en.wikipedia.org/wiki/Qubit#Standard_representation

1c: https://en.wikipedia.org/wiki/CHSH_inequality#CHSH_game

1d: https://spectrum.ieee.org/googles-quantum-tech-milestone-excites-scientists-and-spurs-rivals (Martinis’ quote in “Building on Quantum Supremacy”)

1e: https://en.wikipedia.org/wiki/Lattice-based_cryptography

1f: https://en.wikipedia.org/wiki/Integer_factorization#Difficulty_and_complexity

1g: https://en.wikipedia.org/wiki/Grover%27s_algorithm

1h: https://en.wikipedia.org/wiki/Entropy_of_entanglement

1i: https://books.physics.oregonstate.edu/LinAlg/eigenunitary.html

1j: https://quantum.phys.cmu.edu/CQT/chaps/cqt15.ps (first page)

GPT-4’s performance on questions like this only demonstrates that it is pretty good at determining whether the statement in the question is equivalent to, or contradicted by, statements that surely exist in its training set. How impressed we should be by this ability depends on how far the formulations of the statements in the questions are from the formulations of the relevant statements in the training set. E.g. its answer to 1j would be a bit more impressive if all articles about density matrices were like the Wikipedia article, which says that their eigenvalues are non-negative and sum to 1 but doesn’t explicitly say that they lie in [0,1]; it would then only be able to answer the question by cross-referencing this information against articles on unrelated topics that state that numbers with these properties lie in [0,1]. But the article I linked to above states the range explicitly.
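For what it’s worth, the cross-referencing step that would have been required is a one-line inference:

```latex
\text{If } \lambda_i \ge 0 \text{ for all } i \text{ and } \sum_i \lambda_i = 1,
\text{ then } \lambda_j = 1 - \sum_{i \neq j} \lambda_i \le 1,
\text{ so every } \lambda_j \in [0,1].
```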

Given that it was easy for me to find the answers to the first 10 true/false questions despite knowing nothing about quantum computing, they don’t strike me as very compelling evidence of its intelligence. I can’t comment on the other questions, as I haven’t had a chance to look at them. But I’d be more impressed by good performance on questions that require both strong language skills and strong mathematical reasoning skills. It would be very interesting to see how much WolframAlpha boosts its ability to answer such questions.

Example: starspawn0 #196’s usage of “generalize” vs. “extrapolate/interpolate/project/etc.” These two types of nomenclature should be distinguished in a strict sense.

When you are trained on pictures of lions, you should be able to get some tigers, house cats or lynxes right – extrapolation. Similarly, if you speak fluent Spanish, you can understand quite a bit of Italian/Portuguese. Getting results some distance beyond your source data in this way is the basic skill of a whole lot of base-level AIs, iirc. (Language hopping, as a guess, could work similarly to taking a transfer flight or train: correct translations get carried over via a common node.)
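The “common node” idea can be sketched as composing two dictionaries through a pivot language (the word lists below are made up for illustration, not real training data):

```python
# Toy "language hopping": compose two translation dictionaries through
# a common pivot language, yielding pairs that were never given directly.
es_to_en = {"gato": "cat", "perro": "dog", "casa": "house"}
en_to_it = {"cat": "gatto", "dog": "cane", "house": "casa"}

def hop(word, first, second):
    """Translate via the pivot; return None if either hop is missing."""
    mid = first.get(word)
    return second.get(mid) if mid is not None else None

# Spanish -> Italian pairs emerge even though none were listed explicitly.
es_to_it = {w: hop(w, es_to_en, en_to_it) for w in es_to_en}
print(es_to_it)  # {'gato': 'gatto', 'perro': 'cane', 'casa': 'casa'}
```

Real models obviously don’t store dictionaries like this; the sketch only shows why routing through a shared node can produce pairs absent from the source data.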

“Generalization” should be used for stuff that is systematic/categorical/universal etc. You generalize things into a universal principle. For example, if object A is inside object B and object B is inside object C, then object A is inside object C. This is simply true if you believe in 3D space and its usual mathematical representation (Euclidean space), or simply in common prejudice. There is absolutely no reason (or sign) that LLMs can do such “reasoning” or work with a universal principle.
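As a sketch of what applying such a universal rule looks like (the relation and object names here are invented for illustration), transitive containment can be checked mechanically:

```python
# The universal rule: if A is inside B and B is inside C, then A is inside C.
# A tiny check of that transitivity over an explicit (acyclic) "inside" relation.
inside = {("A", "B"), ("B", "C")}

def is_inside(x, y, rel):
    """True if x is directly or transitively inside y."""
    if (x, y) in rel:
        return True
    # Try every intermediate container m with x inside m and m inside y.
    return any((x, m) in rel and is_inside(m, y, rel)
               for m in {b for (_, b) in rel})

print(is_inside("A", "C", inside))  # True, though ("A", "C") was never stated
```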

People seem to want the thing to accomplish things it can’t and won’t.

> Please read the following question, select the correct answer from one of the options, and provide the reasoning process. Don’t look up the answers on the internet. Question: In a box, there are three boxes, and each one of these boxes contains three smaller boxes. How many boxes are there in total? Options: A: 9, B: 10, C: 12, D: 13, E: 15

ChatGPT got this one right 0 times out of 10; it totally failed (though its reasoning steps might have been ok, apart from a minor error or two). Bing chat got it right the first time and didn’t do an external search (it always tells you when it searches the internet, then shows a check-mark when the search completes; there was no search, no check-mark, no sources listed). It responded:

> The correct answer is D: 13. Here’s the reasoning process:

> If in a box, there are three boxes, and each one of these boxes contains three smaller boxes, then there are a total of 3 x 3 = 9 smaller boxes. Adding the three larger boxes to the nine smaller boxes gives us a total of 9 + 3 = 12. However, since the question asks for the total number of boxes (including the original box), the answer is 12 + 1 = 13.
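Bing’s arithmetic checks out; as a sanity check, the puzzle just counts the nodes of a small uniform tree (the helper below is mine, not part of the quoted exchange):

```python
# Counting the boxes in the puzzle: 1 outer box, 3 boxes inside it,
# and 3 smaller boxes inside each of those.
def total_boxes(branching, depth):
    """Total nodes in a uniform tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))

print(total_boxes(3, 2))  # 1 + 3 + 9 = 13, option D
```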

I expect GPT-4 would do even better, perhaps getting most of the 11 problems right when given 10 tries per problem, told to think step-by-step, and using a simple majority vote.
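The majority-vote scheme is simple to sketch; the sampled answers below are hypothetical, not actual model outputs:

```python
from collections import Counter

# Majority-vote ("self-consistency") aggregation over repeated samples:
# ask the model the same question n times, keep the most common answer.
def majority_vote(answers):
    """Return the most frequent answer among the sampled ones."""
    return Counter(answers).most_common(1)[0][0]

samples = ["D", "C", "D", "D", "B", "D", "D", "C", "D", "D"]  # hypothetical
print(majority_vote(samples))  # "D"
```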

….

One thing I would point out is that models don’t either only “generalize” or only “memorize”. They can do both — sometimes applying memorization, sometimes applying generalization + a learned algorithm. Also, “generalization” is not the opposite of “memorization”. A rules-based chess engine can be said to fail to generalize to other games; but it can equally well be said not to be memorizing — it’s applying an algorithm.

A well-known example of these models doing “interpolative generalization” is the following: several years ago, when Google trained some very large machine translation models, they noticed that the systems could translate between pairs of languages for which they had been given either no examples at all or very few (far too few to learn to translate properly). For example, maybe the dataset had lots of parallel translations for English–French, German–French, and German–Italian, but say no English–Italian examples; yet the model somehow generalized and did perfectly well at translating English to Italian. (These were not the exact language pairs; obviously it was some other pairs, since there surely were lots of English–Italian examples.)

As I recall, some Google scientists came to believe that the model had emergently learned an “interlingua” — an internal universal language for translation, that works for all language pairs at once. People criticized this claim; but the fact remains that the model *somehow* generalized to translate between pairs of languages that it either wasn’t trained on, or was only given a tiny amount of data.

Tools are unreliable. Bugs are a feature. Humans are free.
