AlphaCode as a dog speaking mediocre English

Tonight, I took the time actually to read DeepMind’s AlphaCode paper, to work through the example contest problems provided, and to understand both how I would’ve solved those problems and how AlphaCode solved them.

It is absolutely astounding.

Consider, for example, the “n singers” challenge (pages 59-60). To solve this well, you first need to parse a somewhat convoluted English description, discarding the irrelevant fluff about singers, in order to figure out that you’re being asked to find a positive integer solution (if it exists) to a linear system whose matrix looks like
1 2 3 4
4 1 2 3
3 4 1 2
2 3 4 1.
Next you need to find a trick for solving such a system without Gaussian elimination or the like (I’ll leave that as an exercise…). Finally, you need to generate code that implements that trick, correctly handling the wraparound at the edges of the matrix, and breaking and returning “NO” for any of multiple possible reasons why a positive integer solution won’t exist. Oh, and also correctly parse the input.
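
(Spoiler for the exercise.) Just to make the trick concrete, here is a sketch of one linear-time approach in Python. This is my own sketch, not AlphaCode’s output or the contest’s reference solution, and the input parsing and exact output format are left out:

    # The system is Ax = b with A[i][j] = ((j - i) mod n) + 1, the circulant shown above.
    # Summing all n equations:       (n(n+1)/2) * S = sum(b),  where S = x_0 + ... + x_{n-1}.
    # Subtracting consecutive ones:  n * x_i - S = b[i+1] - b[i]   (indices wrap around mod n).
    def solve_singers(b):
        n = len(b)
        total = sum(b)
        if (2 * total) % (n * (n + 1)) != 0:
            return None                      # S wouldn't be an integer
        S = 2 * total // (n * (n + 1))
        x = []
        for i in range(n):
            num = b[(i + 1) % n] - b[i] + S  # equals n * x_i, handling the wraparound
            if num <= 0 or num % n != 0:
                return None                  # x_i wouldn't be a positive integer
            x.append(num // n)
        # Verify the first equation explicitly; the others then follow from the differences.
        if sum((j + 1) * x[j] for j in range(n)) != b[0]:
            return None
        return x

    # Example: b = A @ [1, 1, 1, 1] = [10, 10, 10, 10] recovers x = [1, 1, 1, 1]
    print(solve_singers([10, 10, 10, 10]) or "NO")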

Yes, I realize that AlphaCode generates a million candidate programs for each challenge, then discards the vast majority by checking that they don’t work on the example data provided, then still has to use clever tricks to choose from among the thousands of candidates remaining. I realize that it was trained on tens of thousands of contest problems and millions of solutions to those problems. I realize that it “only” solves about a third of the contest problems, making it similar to a mediocre human programmer on these problems. I realize that it works only in the artificial domain of programming contests, where a complete English problem specification and example inputs and outputs are always provided.
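
Schematically, that selection pipeline looks something like the sketch below. This is not DeepMind’s code: the candidate generation is left abstract, the clustering heuristic is only a crude stand-in for the paper’s behavioral clustering, and all the names here are mine.

    from collections import defaultdict

    def pick_submissions(candidates, example_tests, extra_inputs, run, k=10):
        """candidates: program sources; run(src, inp) -> output string, or None on error."""
        # 1. Filter: keep only candidates that reproduce the example outputs given in the problem.
        survivors = [src for src in candidates
                     if all(run(src, inp) == out for inp, out in example_tests)]
        # 2. Cluster the survivors by their behavior on extra (model-generated) inputs,
        #    then submit one representative per cluster, largest clusters first.
        clusters = defaultdict(list)
        for src in survivors:
            clusters[tuple(run(src, inp) for inp in extra_inputs)].append(src)
        ranked = sorted(clusters.values(), key=len, reverse=True)
        return [cluster[0] for cluster in ranked[:k]]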

Forget all that. Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English. And people are complaining that the dog isn’t a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it’s unclear how much the dog really understands.

It’s not obvious how you go from solving programming contest problems to conquering the human race or whatever, but I feel pretty confident that we’ve now entered a world where “programming” will look different.

Update: A colleague of mine points out that one million, the number of candidate programs that AlphaCode needs to generate, could be seen as roughly exponential in the number of lines of the generated programs. If so, this suggests a perspective according to which DeepMind has created almost the exact equivalent, in AI code generation, of a non-fault-tolerant quantum computer that’s nevertheless competitive on some task (as in the quantum supremacy experiments). I.e., it clearly does something highly nontrivial, but the “signal” is still decreasing exponentially with the number of instructions, necessitating an exponential number of repetitions to extract the signal and imposing a limit on the size of the programs you can scale to.

259 Responses to “AlphaCode as a dog speaking mediocre English”

  1. Florian Says:

    Astonishing indeed! Give it another 10–15 years: having seen the latest advancements of OpenAI & DeepMind, I’m confident that the vast majority of white-collar workers might be replaced as well.

  2. Sid Says:

    What’s amazing to me is that no modeling innovation was needed to get this far — all that was needed was lots of training data + scaling up.

  3. Esteban Martínez Says:

    Perhaps what is unclear is how much the programmer understands! I mean, the real question here is whether we (humans) will be able to clarify deep notions of what consciousness and being mean. What might be striking for eloquent English speakers is that a good program needs no validation from its human peers! Perhaps we are so deep into neoliberal logic that anything that does not generate immediate winnings is out of the question.

  4. Alexander Kruel Says:

    Remember that AlphaCode is just a (not fully trained) 41b-parameter model and there are already 280b language models. Scaling laws suggest that larger models can become dramatically more sample-efficient and better at generalization.

    Also, don’t underestimate other ways to improve upon it. Compare e.g. AlphaGo with AlphaGo Zero. It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level.

  5. Søren Elverlin Says:

    >It’s not obvious how you go from solving programming contest problems to conquering the human race
    The standard answer is that this goes through recursive self-improvement. If I was hiring a human to improve AlphaCode, I would consider the skills demonstrated by AlphaCode to be relevant, though not central.

  6. Jon Awbrey Says:

    Connectionism never learns.

    Probably why it’s such a perfect partner for capitalism.

  7. Scott Says:

    Jon Awbrey #6: With, I suppose, the more declarative forms of AI that sounded good but never actually worked being a perfect partner for communism? 😀

  8. Fast typist Says:

    Until AI solves factoring by classical methods I doubt AI supremacy can happen. Scott are you confident AI can find a PTIME factoring algorithm and that there is one?

  9. Sid Says:

    @Alexander #4

    One issue is that it’s a lot less clear what the equivalent of self-play would be for program synthesis.

  10. J Storrs Hall Says:

    During the 1980’s it was the practice of the Student Chapter of the ACM at Rutgers to hold annual programming contests. This author had the honor of being one of the judges at such a contest. It worked as follows: The entrants were teams, of from 3 to 5 students. Each team was given an assignment of four programs to write. The team which got all four programs running correctly the earliest, won. Normally most of the students used Pascal, which was the language taught in computer science courses, or Basic, which many had started in. A few engineering students did their programming in Fortran.

    This year there was entered a motley team of mavericks. Instead of working together, each member would do his work alone, and in a different language. One, who had interned at Bell Labs, would work in C. Another, God help him, was going to use assembly language. A third would try Lisp. And the other member of the “screwball” team, whose name was Damian Osisek, worked in SNOBOL.

    The contest was unexciting that night, because Mr. Osisek, working alone, completed all four programs before anyone else, individual or team, completed even one.

    The following year, the use of SNOBOL (and APL) was banned.

  11. Scott Says:

    Fast typist #8:

      Scott are you confident AI can find a PTIME factoring algorithm and that there is one?

    Of course I’m not confident of either, or of one conditional on the other—what would make you imagine that I was?

  12. Scott Says:

    J Storrs Hall #10: So then, why isn’t the whole world using SNOBOL now? Should it be? Or are there languages in use today (Python? Perl?) with which Damian Osisek would’ve done just as well?

  13. Andrew Kay Says:

    Whilst this is an amazing achievement, and obviously just the start, I feel something is missing. Take for example the “backspace” algorithm in the paper. Before I’d let that loose controlling my pacemaker or power plant, or even passed its code review, I’d need at least some sort of mathematical justification why the “start-from-the-end-in-greedy-fashion” actually picks out a correct string of deletions and doesn’t have to backtrack over earlier decisions. Maybe in future they’ll also generate proof-checking assertions that can be verified formally. Then code review simply has to check that the specification is correct.
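
    (For concreteness, and going from the paper’s description rather than the exact contest statement, the greedy idea in question seems to be something like the sketch below: match the target t against s from the right, and whenever the current character of s can’t be kept, spend it together with one more character as a “backspace” pair. It’s precisely the “why does this never need to backtrack?” step that I’d want justified before trusting it.)

    def can_type(s, t):
        # Greedy from the end: keep s[i] if it matches the next needed character of t;
        # otherwise s[i] is sacrificed to a backspace, which also consumes one more character.
        i, j = len(s) - 1, len(t) - 1
        while i >= 0:
            if j >= 0 and s[i] == t[j]:
                i -= 1
                j -= 1
            else:
                i -= 2
        return j < 0  # True iff every character of t was matched

    print("YES" if can_type("ababa", "ba") else "NO")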

  14. Scott Says:

    Andrew Kay #13: While my programming career was admittedly not long, I never formally verified a single program I wrote!

    On the other hand, I see little reason why similar methods couldn’t be turned loose on the problem of automatically generating formal specifications from English descriptions, if there were a similarly large training corpus. And then one could use the same methods again, and/or leverage existing work, to search for programs that matched the specifications.

  15. David Vedvick Says:

    Great perspective, thanks for writing this. Now I am wondering, once we have confidence that AI can solve programming problems, how will we master communicating the problem to AI? As a developer, my most common struggle is not solving a hard algorithmic problem, it’s determining the problem the customer actually wants solved.

  16. Fast typist Says:

    How else would we measure AI supremacy? Speeding up mundane tasks humans do? No. It has to be something we know is possible but cannot achieve. I would think you would have assumed factoring is easy because of the similarity between Fq[x] and Z, and because there is a fast factoring algorithm over Fq[x]. There is no deterministic algorithm over Fq[x]. Perhaps there is a unifying deterministic algorithm which is classical and which we do not yet know, but a smart AI could figure it out.

  17. Jon Awbrey Says:

    Re: Scott

    Being more triadic than dyadic in my thinking, I’ve never regarded capitalism and communism — the latter of which in its collectivist extremes is very connectionist indeed — to be dichotomous choices, but more like precocious (0.666-baked) approximations to representative democracy.

  18. anon Says:

    I came up with an algorithm (the same as the AI’s) for solving the linear system using a CAS. Feels relevant because this is heralding an era of much more powerful CASes.

  19. Scott Says:

    David Vedvick #15: I completely agree that figuring out what the customer wants is a huge part of practical software development and likely to be harder for AI.

    When I told a 9-year-old of my acquaintance about AlphaCode, her first questions were: “but can it write a program to draw a picture of my dad’s butt? How would the program fit that butt on the screen?” Imagine translating that into a clear specification! 😀

  20. Gerard Says:

    Scott.

    When I first saw the title of your post “AlphaCode as a dog speaking mediocre English”, I thought this was going to be a complete takedown of AlphaCode, so I was very surprised at where you ended up. Somehow that title just didn’t translate for me into “AlphaCode looks like an amazing advance in AI”.

    I remember when IBM first tried to beat Kasparov. I think I was in college at the time and I didn’t believe a computer would ever beat the world champion chess player. Then a few years later it happened and the lesson I learned from that was “never bet against the computer” (a lesson that has been quite amply reinforced several times since).

    Still it’s a very weird process. About 10 years ago I watched Watson beat the best Jeopardy contestants and I was deeply impressed. Surely IBM had developed some really powerful technology there. Then a few weeks ago we learned that they sold off the Watson division (which had made a very non-obvious and, in my opinion, questionable pivot into healthcare) for a fraction of what they had invested in the technology.

    PS. Why do you need to solve that system of equations without Gaussian elimination? It’s quite an easy algorithm to implement; I did it recently in a few lines of Python for some coding challenge.

  21. dlb Says:

    Long time lurker, first time poster.

    As a professional programmer I can confirm that a *lot* of my work can be efficiently replaced by good algorithms like this one. When I was young, our job was to stick many bits into expensive memory (so to speak). Today, it is to connect many libraries together to get the job done. In the future, it could very well be to explain the problem at hand to some “AI”. Still, understanding how to formulate a problem so that an algorithm like AlphaCode can solve it looks like a skill in itself to me. So programmers have a future, just a different one from what we know today (and hopefully, with the same number of programmers solving a larger number of problems, or big problems a single programmer cannot solve alone).

    As for an AI taking over the world, chess players, go players and now programmers haven’t been able to do it. I don’t see how the AI could. 🙂

    PS: your blog is the best – if one ignores the US-only, the-sky-is-falling posts, but even with the noise the signal is excellent. Keep up the good work!

  22. Boaz Barak Says:

    Agree it’s super impressive, but maybe the analogy is a dog that can speak in extremely eloquent paragraphs, making fewer grammatical (the analogs of “off by one”) mistakes than most humans, but its ability to reason over a longer time scale is not much better than current dogs.

  23. Danylo Says:

    > but I feel pretty confident that we’ve now entered a world where “programming” will look different

    The question is – what will be the meaning of the word “we” in the future 😉

    By the way, I find OpenAI announcement about solving math olympiad problems even more striking, because it’s much closer to how humans actually think – by using a neural network to find and combine formal rules of logic.

  24. Scott Says:

    Gerard #20:

      PS. Why do you need to solve that system of equations without Gaussian elimination? It’s quite an easy algorithm to implement; I did it recently in a few lines of Python for some coding challenge.

    I was careful to say, you need to avoid Gaussian elimination in order to solve it “well.” Gaussian elimination would be an ugly cubic-time solution where a nice linear-time solution exists—and AlphaCode indeed finds the latter. It’s an interesting question whether AlphaCode could also have found the ugly solution—probably it depends on whether or not Gaussian elimination was common in its training corpus?
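
    (For comparison, a quick sketch of what the “ugly cubic” route would look like, using exact rational arithmetic; this is just an illustration, not anyone’s contest submission:)

    from fractions import Fraction

    def gaussian_solve(A, b):
        """Solve Ax = b by Gauss–Jordan elimination over the rationals; O(n^3). Returns None if singular."""
        n = len(A)
        M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
        for col in range(n):
            pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
            if pivot is None:
                return None
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(n):
                if r != col and M[r][col] != 0:
                    factor = M[r][col] / M[col][col]
                    M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
        return [M[i][n] / M[i][i] for i in range(n)]

    A = [[1, 2, 3, 4], [4, 1, 2, 3], [3, 4, 1, 2], [2, 3, 4, 1]]
    print(gaussian_solve(A, [10, 10, 10, 10]))  # -> [1, 1, 1, 1] (as Fractions)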

    As for the ease of implementation: YMMV! One of my most-cited papers, Improved Simulation of Stabilizer Circuits (with Daniel Gottesman), came about because as a grad student doing a course project 19 years ago, I didn’t feel like implementing Gaussian elimination in C and didn’t know how to call it in a library. So instead I spent a few days searching for a better solution until I found one, and then Daniel explained to me why it worked. 😀

  25. Scott Says:

    Danylo #23:

      By the way, I find OpenAI announcement about solving math olympiad problems even more striking, because it’s much closer to how humans actually think – by using a neural network to find and combine formal rules of logic.

    That was indeed also striking! Unless I’m missing something, though, a crucial difference between the two is that the OpenAI thing takes as input a hand-coded formal specification of the IMO problem, then uses deep learning to help search for a formal proof. AlphaCode takes as input the plain English (!!) specification of the programming competition problem.

  26. Gadi Says:

    I think people underestimate how hard it is to find the correct question that results in general intelligence. Just think about the hundreds of millions of years of evolution and the uncountable number of somewhat-intelligent yet not-quite-there animals that have lived on this planet. Yet in all this time none of them became as intelligent as humans, and it’s not even quite clear what made humans make the leap.

    Maybe “how to program from an English description” will be another miracle question leading to intelligence, but I don’t think it’s harder than the kinds of questions evolution has been solving for millions of years. We’re not using “better technology” than nature with artificial neural networks; if anything, the neural networks we’re using are orders of magnitude weaker. Yes, we know this technology eventually created human intelligence – but only after hundreds of millions of years.

    Either way, dogs understand mediocre English. Many animals do, and only lack vocal cords to speak back. Parrots can even speak back. Monkeys can speak sign language. Maybe the one in a trillion event was the invention of language itself, but other animals also developed all sorts of languages yet didn’t reach human intelligence. Whatever is the question to which the answer is human intelligence, it had to be pretty non-trivial for us to be the first intelligent species. And that’s assuming it was one question and not a strange combination of questions that together resulted in this miracle.

    I’m definitely not afraid of AI becoming the new overlords based on research about computer vision, or about winning games. Nature was pitting neural networks against neural networks in this game for millions of years until it reached us. Just think about the immense computational power it took to reach humans in the first place; you’d have to have a very good reason to believe you have a shortcut to all of that computation. My bet is that the first general intelligence will come from just simulating a human brain, using all that computation time of nature instead of trying to invent humans from scratch.

  27. Scott Says:

    dlb #21:

      PS: your blog is the best – if one ignores US-only-the-sky-is-falling posts, but even with the noise the signal is excellent -. Keep the good work!

    Thanks!! Regarding the US, though, the thing is that the sky did fall. We’re now a decrepit, crumbling empire, torn between fanatically self-certain factions, that can no longer maintain the ideals of freedom, democracy, progress, and Enlightenment at home let alone export them to the rest of the world. Is the problem just that I was ever naïve enough to expect anything different? 🙂

  28. Ivo Says:

    Scott, you might enjoy this mind blowing paper on combining GPT-3 with Codex to plan and execute actions for embedded agents: https://twitter.com/pathak2206/status/1483835288065658882?s=20&t=sBfBzd8l7kwtLWm-Ptderg

    The agents are in VR for now, but one can see we are rapidly moving closer to a world where all intellectual and physical work is done by AI and not us.

  29. A Raybold Says:

    Scott #14: Could this or similar methods be effective in generating formal specifications from English descriptions? I don’t doubt that a similar method could generate a number of candidate formalizations of the problem, but how would a likely-correct one be selected? In the case of AlphaCode, example solutions are essential, and coming up with examples to work with is beyond its abilities. An AlphaFormalMethods program that similarly depends on an input of examples would be missing the same essential step as is also missing from AlphaCode.

    If your point here is that formal verification is a red herring here, then I would agree, on the basis of what I have written above, but I think Andrew Kay #13 still has a point: AlphaCode is not demonstrating any of the judgement that humans put to use on this sort of task – on the contrary, it is effectively being outsourced to humans. When you consider where we go from here, I suspect the question of judgement will loom large (if it is not already an issue for things like self-driving cars.) Maybe current machine learning methods can deliver on that too – most of us (myself included) have been surprised already.

  30. Gerard Says:

    Gadi #26

    > Maybe the one in a trillion event was the invention of language itself

    That would be my bet. Natural language in the human sense appears to be a completely general method of representing information. Formal languages used in logic, computer science and mathematics are really just subsets of natural language and as I understand it the entire theory of logic and computer science (at least, mathematics I’m less sure of) can be formulated in terms of transformations between different formal languages (ie. again, just subsets of natural language).

    For that reason I think I would expect the first signs of real progress towards AGI to appear in NLP-like systems.

  31. J Storrs Hall Says:

    Scott # 12: If all the programmers in the world were at Osisek’s level, we would be using something far beyond SNOBOL (modern equivalent, Prolog) unified with, say, Matlab. The key question is how many higher level concepts the programmer is able to think in, and only then whether the language supports them directly. I imagine that say Python with all its libraries more than covers the range in which the typical programmer is able to think.
    Anyone who digested the singers problem into a linear system as you did above would be able to write a solution in one line of a modern APL. But I fear that leaves out a majority of practicing programmers.
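
    (In the same spirit, today’s analogue of that one-liner is arguably NumPy rather than APL; for the 4×4 instance above, for instance, something like:)

    import numpy as np

    A = (np.arange(4)[None, :] - np.arange(4)[:, None]) % 4 + 1  # the circulant from the post
    print(np.linalg.solve(A, np.array([10.0, 10.0, 10.0, 10.0])))  # -> [1. 1. 1. 1.]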

  32. Scott Says:

    A Raybold #29: Oh, I was imagining that the AI to generate the formal spec would also take some examples as input.

    If “programming” can be reduced to “provide some examples of the input/output behavior you want, plus some English text to prime an AI in the right direction to generate a formal spec and/or the actual code,” do people have any idea what a huge deal that is?

  33. Jeff Lun Says:

    I think the issue with most of the critics of this kind of progress is that most of the arguments against assume:

    1. That the critic thinks of themselves as an above-average programmer, which may or may not be true
    2. It’s easy to point out and focus on failures rather than progress (I think this blog article does a good job of pointing out the progress side)

    If you think of systems like AlphaCode as an attempt to approach the problem more from a statistical standpoint, where the goal is not to write code like a human would, but instead to write code at lower cost in aggregate over time, it starts to look a lot more like how the insurance industry works. For example, when seatbelts and airbags were introduced to cars vs. when they became mandatory has a similar story arc. At first people would say, “Why do you need these safety devices – just drive better!” or “Well, seatbelts save some lives, but there are situations where they caused death that wouldn’t have happened.” The problem with both of these statements is that they seek to point out the lack of perfection in specific cases, whereas the insurance industry is more interested in seeing overall improvement on the aggregate. The proxy in the insurance world (if I understand it approximately correctly) is something like: “cars are getting safer if the total dollar amount of claims paid out per mile driven is reduced over time.” There are a million possible ways to reduce claims (including outright rejecting them), but in the idealized case where everyone’s an honest broker, the principle is that what you’re optimizing for isn’t the complete elimination of all death, but a reduction in the cost, frequency, and severity of injuries, deaths, and property damage.

    To bring this back to things like AlphaCode: so long as AlphaCode (or any alternative implementation, for that matter) can reduce the cost of producing a system that produces a correct mapping from inputs to outputs for any given problem definition, then you’ve achieved the goal. In other words, I don’t care if AlphaCode has to try 1,000,000 candidate solutions. What I care about is: given a set of test cases (like unit tests), can the computer come up with a valid solution in less time, and at less total energy cost than a human? If so, then the cost to produce correct solutions to programming problems has been reduced, and whatever that new thing is that has reduced the cost – that thing can be turned into a tool that is given to human programmers, thereby making them more effective.

    Rinse and repeat and you may not get computers replacing programmers anytime soon, but you may very well increase the efficiency (in terms of time and cost) of programmers by 10%-30%, and that alone is worth billions of dollars per year.

    Now take that 10%-30% efficiency gain, COMPOUND it over several years, and across an entire industry and you’re talking about generations’ worth of productivity gains in just a few years – and theoretically it builds on itself.

    There may be some asymptotic efficiency maximum somewhere, but if so I guarantee we still have plenty of efficiency to gain.

  34. Akshat Says:

    > And people are complaining that the dog isn’t a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it’s unclear how much the dog really understands

    But these points all correctly undermine the impressiveness of the effort: they indicate that we are blindly exploring the terrain of training dogs to speak English, rather than that we actually understand how we’re doing it.

    This kind of demonstration is illusory progress. With a great deal of money and effort, we have succeeded in constructing a Rube Goldberg machine. We can make no guarantees about how this Rube Goldberg machine would perform outside of our carefully curated habitat. We can’t tell you how it really works. If you’re lucky, a third of the time, it will do what we wanted it to do. Even more damningly: in principle, nothing prevented us from doing it twenty years ago, except there wasn’t enough hardware and nobody wanted to spend that much money.

    Procurement of additional funding is not really evidence of foundational progress. Foundational progress comes from explainability, which in turn leads to iteration. When there exists an accessible model for how input becomes output that we can refine, that’s when we should sound the victory bells.

  35. A Raybold Says:

    Scott #31 I see what you mean – it’s like saying that if you take a planetful of matter orbiting a star, and just leave it alone for several billion years, then, simply by that matter interacting under the constraints of physics, you might end up with a creature that figures out this is what happened – mind-blowing stuff!

    The key to it all is having something – survival of the fittest, a collection of right answers – to pick out the winners.

  36. Scott Says:

    Akshat #34: See, that’s where you’re wrong. With the use of deep learning for translation, voice recognition, and face recognition, we similarly don’t understand in any detail how it works, yet those applications have changed the world. AlphaCode looks to me like it could already be useful for practical programming, just as it stands, and of course we should expect such things to improve dramatically in the coming years.

    This would likely have cost billions of dollars to do 20 years ago, if it was possible at all. You can say that that’s “merely” down to Moore’s Law, plus the existence now of gargantuan programming competition datasets, plus all the practical deep learning experience of the past decade, but if so, the word “merely” is doing a huge amount of work!

    That more compute and more training data beats every attempt to hardcode in “semantic meaning” is, indeed, the Bitter Lesson of AI. And as a scientist, I’m committed to the idea that, when reality tries as hard as it possibly can to teach us a certain lesson, it’s our responsibility to learn the lesson rather than inventing clever reasons to maintain the contrary worldview that we had before.

  37. Gerard Says:

    Scott #36

    > With the use of deep learning for translation, voice recognition, and face recognition, we similarly don’t understand in any detail how it works, yet those applications have changed the world.

    How much have they really changed it, though? I’ll grant you that the fact that today anyone can typically get a reasonably good gist of a foreign-language publication just from Google Translate is a pretty big deal, but I’d put it on a similar or lower level of “changed the world” as smartphones, the Internet or social media, rather than, say, internal combustion engines, electricity or H-bombs. Still no one would seriously want to use machine translation for a mainstream media article, let alone anything really important like a legal document.

    I guess lots of people are using voice translation for simple queries to virtual assistants but I don’t think many doctors or lawyers are using it for automatic dictation. Or, if they are, they’re sure to carefully proofread the resulting product.

    As for face recognition, it’s something that gets talked about a lot in the press, but I’m skeptical that the coverage accurately reflects reality. I don’t think we’re anywhere near having a system that can watch Times Square and put a name on virtually every person who walks by. In fact I doubt such a thing is possible because in a crowded space even with lots of cameras you aren’t likely to get many unobstructed pixels on any particular face (of course that’s without even considering the problem that these days many people will be wearing masks).

  38. Veedrac Says:

    Scott #34:

    And as a scientist, I’m committed to the idea that, when reality tries as hard as it possibly can to teach us a certain lesson, it’s our responsibility to learn the lesson rather than inventing clever reasons to maintain the contrary worldview that we had before.

    Excellently put. As a further quiz to those who would rather not see:

    If machine learning is not at all a general understander, how do the same neural circuits running the same algorithms apply to almost any domain, whether language or image synthesis or playing Go or theorem proving?

    If learned models do not at all implement general cognition, why does pretraining on language markedly improve non-language tasks, like reinforcement learning on Atari games?

    If neural networks are entirely oriented around memorization of previously seen ideas, why do they contain semantically meaningful features we did not train on? For example, how does AlphaGo know to calculate liveness?

    In fact, how is a model meant to know how to interpolate data points at all in high dimensional spaces without some semantically-meaningful understanding of that space? We tried naïve methods for language modelling before; they were called Markov chains, and they could rarely produce coherent sentences, never mind take a newly offered word and use it in a sentence.

    The reality is that we’re in a world where iGPT can complete the top half of this image as this completion, and the best argument the naysayers have as to how it made a semantically meaningful completion amounts to “probably it saw that completion it made before, with a different texture.” But it didn’t. There are not enough images in the world.
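
    (For reference, the kind of naïve Markov-chain language model referred to above really is only a few lines; a word-bigram sketch, nothing more:)

    import random
    from collections import defaultdict

    def train_bigram(text):
        """Count word-bigram transitions: a naive Markov-chain 'language model'."""
        words = text.split()
        table = defaultdict(list)
        for prev, cur in zip(words, words[1:]):
            table[prev].append(cur)
        return table

    def sample(table, start, length=12):
        out = [start]
        for _ in range(length):
            choices = table.get(out[-1])
            if not choices:
                break
            out.append(random.choice(choices))  # no understanding, just observed co-occurrence
        return " ".join(out)

    corpus = "the dog speaks mediocre english and the dog writes mediocre code"
    print(sample(train_bigram(corpus), "the"))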

  39. Nick Nolan Says:

    it clearly does something highly nontrivial, but the “signal” is still decreasing exponentially with the number of instructions, necessitating an exponential number of repetitions to extract the signal and imposing a limit on the size of the programs you can scale to.

    This describes human intelligence quite well. Our domain is short problems that can be modeled spatiotemporally in 3D – mostly picking berries and avoiding predators.

    When we try to extend ourselves to other tasks (modern physics, mathematics, logic), the number of repetitions and false starts grows exponentially. There is a limit to the size of the problems we can solve fast; beyond that comes exponential slowdown.

    Group intelligence and language have rescued us from noticing our limits because we stand on each other’s shoulders. We spend lots of time making things simpler and condensing problems into short rules we can follow.

  40. Gerard Says:

    Scott:

    > A colleague of mine points out that one million, the number of candidate programs that AlphaCode needs to generate, could be seen as roughly exponential in the length of the generated programs.

    I don’t understand that remark. Surely the programs are far longer than 20 bits in length.

  41. Gadi Says:

    Gerard #30: I don’t buy it. Nature did evolve communication similar to language in many other species. Why did only humans get this far? If it’s just the invention of language and brains were just fit for using it from the moment it was invented, then we could have taught other species to be as intelligent as us just by teaching them language. But we can’t. We’ve been talking to animals for a long time and they didn’t become smarter, even after thousands of years of evolutionary pressure from domestication.

    My bet is that it’s not just language. Even if it is language and it co-evolved with human brains, I’m not sure you can replicate whatever process happened over thousands of years and billions of humans with computations several orders of magnitude smaller. Maybe at best you can mimic human intelligence by getting inputs from humans, but creating something smarter or even equal, on its own?

  42. Scott Says:

    Incidentally, Gerard #20:

      When I first saw the title of your post “AlphaCode as a dog speaking mediocre English”, I thought this was going to be a complete takedown of AlphaCode, so I was very surprised at where you ended up.

    Ah, but that’s exactly the point, isn’t it? People’s instinct to sneer at AlphaCode is wrong in exactly the same way as their instinct to sneer at a haltingly talking dog would be.

  43. Danylo Says:

    Scott #25

    Yes, but transforming a language description of a problem into a formal form and solving it in that formal form are two different tasks. And it’s reasonable to solve them separately. It’s amazing that AlphaCode can do it using a unified approach. But I don’t think it’s the correct way to proceed further. General AI has to invent formal laws one way or another. With our help or without. But if AI invents, say, ZFC without our help, then it will be a complete black box to us. It will be much harder to communicate. There is already a huge problem with explaining the outputs of ML models. Yet they are used in criminal justice, healthcare and other domains that affect human lives directly (see, e.g., https://arxiv.org/abs/1811.10154).

  44. drm Says:

    Its cousin AlphaFold2 is likely to win a Nobel for solving the protein-folding problem for most (but not all) practical purposes. It’s done a huge heavy lift for biology, delivering hundreds of thousands of structures to the community in less than a year. (Of course, if it turns out they’re not accurate after all, then it will be one of the great head fakes in the history of science. :)
    I have run it locally; it’s a fascinating machine.

  45. Arthur Says:

    I feel like this is a good example of the fallacy pointed out by Rodney Brooks in http://rodneybrooks.com/the-seven-deadly-sins-of-predicting-the-future-of-ai/, specifically in the section on “performance vs. competence”. The image of a dog speaking mediocre English feels fantastic in part because we would expect the dog to be able to tell us about its point of view, explain how it spent its day, etc. We are carrying all the baggage of our expectations of “dog” with the idea of “dog that can kind of speak English”. The work behind AlphaCode is imo fantastic; but it has to be viewed through the proper lens. I have played with a lot of these kinds of models (large language models finetuned for code, or even large models trained on GitHub dumps), and if you fall just a tiny bit off distribution you get craziness. It’s also not obvious how many of the solutions to the problems are at least in part in the training data. I don’t wish to minimize how cool I think this work is, it’s really great. But it’s not anything like a talking dog.

  46. Jonas Kgomo Says:

    Any advice to a software engineer wondering about being a programmer in the age of AI code generators?

  47. Timothy Chow Says:

    Scott, your quip about dogs reminds me of an old Peanuts comic strip (June 25, 1960).

    Akshat #34: Suppose AI were to eliminate global poverty and achieve lasting world peace, but we didn’t understand how it accomplished it. You might say that this wasn’t progress. I wouldn’t argue the point, but I’d say, “I’ll take it anyway.”

    Don’t get me wrong; I agree that lack of understanding can impede progress. But as practitioners usually understand better than theorists, a lot of progress in this world gets made by finding things that work, long before we understand how they work. It’s counterproductive to pooh-pooh practical advances just because our theory is lagging behind.

  48. Scott P. Says:

    I remember when IBM first tried to beat Kasparov. I think I was in college at the time and I didn’t believe a computer would ever beat the world champion chess player. Then a few years later it happened and the lesson I learned from that was “never bet against the computer” (a lesson that has been quite amply reinforced several times since).

    I think the correct lesson to be drawn from that particular example is “a lot of competitive tasks become much easier when you let one side cheat.”

  49. anon85 Says:

    Scott, we already have talking parrots, so I think it is rather important to ask questions like “how much did the parrot actually understand” (the answer turns out to be “not much”).

    The single example given by alphacode (the one you quoted) is extremely impressive. Of course, it’s been cherrypicked out of dozens, and for each such example the AI generated around a million guesses, so this is a selection effect of power around 1 in 10^8. Now, that’s still really impressive! 10^8 is not that much for such a long program!

    But what I would like to see are some random samples from those 10^8 worse programs, to see what the typical guess is here. What does the generation method actually do based on the English input? I can imagine that the answer might be something like “it correctly sees the presence of modular arithmetic and adds code to wrap around the edge of the matrix, but this code is typically in a random location”, combined with “it sees that it needs to output NO in many special cases, so it puts in random special-case checks”, combined with picking one of three common input-cleaning chunks of code it has memorized, etc.

    Again, this is still incredibly impressive and quite possibly useful in practice… but how can you NOT ask what the parrot understands in such a case? How can you not be annoyed that DeepMind is obscuring this? (Or maybe they’ll release more information soon, I guess.)

    And to the people saying “just you wait, it will scale soon” — you guys promised me self-driving cars by 2018. I’ll believe this scaling when I see it. I also remember AlphaStar, for example, which was supposed to scale from “beating the best humans at StarCraft in one race matchup” to “beating them in all race matchups”, but instead scaled backwards to “losing to the best humans even in that one race matchup after unfair AI advantages were corrected,” and everyone gave up and moved on to the next shiny thing.

    So maybe this will scale like the AlphaGo -> AlphaZero story of rapid improvement, but maybe instead it will scale like the AlphaStar -> [worse AlphaStar after it turned out AlphaStar was cheating] story.

  50. Milena Mihail Says:

    David Vedvick #15 That is indeed hard. In the present context, there are previous articles reporting success automating code generation from even straightforward English text; e.g., see https://techcrunch.com/2021/08/10/openai-upgrades-its-natural-language-ai-coder-codex-and-kicks-off-private-beta/ In ancient history, for general modal/temporal/etc. logics, even satisfiability can be undecidable: given a set of (presumably customer-pronounced) propositions, does there exist a model that satisfies the propositions? (E.g., the customer’s desired properties are not contradictory.) Finally, just for fun, an old cartoon is here https://boingboing.net/2013/03/14/history-of-tree-swing-draw.html

  52. mjgeddes Says:

    Amazing what they can do with just more data of the right type and scaling (‘bitter lesson’). This kind of AI paradigm is definitely not at all like my own complete theory of AGI that I discovered in early 2021 😀 Neural nets really put the ‘engineering’ into ‘software engineering’. Need huge amounts of data and compute and teams of people.

    The neural nets I think are something like ‘applied statistics’, sort of a scaled-up version of regression to deal with enormous numbers of variables. It’s ‘pattern recognition’ (detection of correlations) enormously scaled. So neural nets are to statistics, what chemistry is to quantum mechanics.

    The trouble you have with this paradigm is the ‘black box’ element, it’s very much the ‘brute force’ engineering approach that leaves you in the dark about what’s going on. It’s undeniable that it does work though, so there’s no way to rule out the possibility that it might scale all the way to AGI !

    I do suspect that the neural net paradigm may finally be running into its limits though. Although it’s definitely mastered moderately complex games like Chess and Go (AlphaZero), in the sense of getting superhuman performance, it didn’t succeed in mastering the much more complex StarCraft & Dota. AlphaStar did reach expert human-level performance, but couldn’t get to superhuman level.

    It’s amazing that AlphaCode has got to the level of a median coder under some specific circumstances (clearly defined problems), but the fact that “the number of candidate programs that AlphaCode needs to generate, could be seen as roughly exponential in the number of lines of the generated programs” suggests clear limits for neural nets here too.

    Yudkowsky, of course, has been yelling that ‘The End is Near!’ for as long as we’ve known him (since 2000). But my guess is that AGI will turn out to work on a different paradigm than current ML techniques, and ‘value alignment’ will turn out to be much easier than people fear.

  53. Scott Says:

    anon85 #49: I agree with you, in that I really wanted to dive into a bunch of AlphaCode’s failed attempts too! And I wanted to give it my own programming challenges—even “easier” ones than the ones in the paper, just to check whether it could handle a writer who’s not immersed in the stylized conventions of these programming contests.

    Apparently all the raw data is available on GitHub—but, while I wanted more, I didn’t want that much more! Alas, it might be too computationally expensive to open the thing up for anybody to play with, as was done with GPT. Hopefully we just need to give it another month or two, and others will have written up the results of their own explorations…

  54. Scott Says:

    Jonas Kgomo #46:

      Any advice to a software engineer wondering about being a programmer in the age of AI code generators?

    That’s an excellent question and something I thought about too! As many others have pointed out, AlphaCode doesn’t seem to generalize in an obvious way to
    – longer programs (hundreds or thousands of lines) with hierarchical structure,
    – programs with vague or conflicting requirements (i.e., basically always the situation outside of programming contests),
    – programs that require inventing a whole new algorithm,
    and more.

    For these reasons, but especially the vagueness thing, my guess is that the profession of “programmer” will still exist quite far into the future — indeed, fully automating programming strikes me as an AI-complete problem, at least as hard as automating editorial writing or anything else. On the other hand, my guess is also that AI tools will permanently change the nature of programming — much like it was changed by previous advances (structured programming, objects, type-checking, automatic garbage collection…), but probably even more so. I.e., once you’ve reduced what you want to the level of a well-defined “contest problem,” you’ll then just be able to describe what you want in plain English and give a few examples of correct input/output pairs, and then a tool like AlphaCode will take care of the rest.

    My daughter is 9, and has started to learn both QBasic and Python. I’ve been extremely frustrated by her tendency to describe what she wants and then expect her dad to take care of the details. Already, AlphaCode makes me wonder whether I should be less frustrated.

  55. Verisimilitude Says:

    I’ll be brief. My current work regards language modelling and new types of programming. I seek to model things and eliminate failure cases and the like. A computer can be made to return perfect results, and should be made to return such. Rather than work towards improving how a human may use the machines, instead we see so much effort poured into random nonsense that pursues impossible goals, because the real goal is to take power from everyone but the few who own the machines and those paid to run them for the former group.

    It’s not impressive that a machine could return broken programs, after laundering however much Free Software, no matter how much prompting be given. The way to true progress is having intelligent men design systems that can attack complex problems, not this. This is exactly the wrong way to go about things and, while I’ve written about this and related topics, I won’t link to any article from my website in particular.

  56. Gerard Says:

    Scott P. #48

    > I think the correct lesson to be drawn from that particular example is ‘a lot of competitive tasks become much easier when you let one side cheat.”

    I would say that the judgement of history is 100% against you on that interpretation since for decades we have been living in a world where no human has had any hope of beating the strongest chess programs.

  57. Scott Says:

    Gerard #56: Right, what exactly was Scott P. thinking? “Oh, if only the interpretation of the contest rules in 1997 had been more in Kasparov’s favor. Then surely we in 2022 would still have human supremacy in chess! Just like we’d still have human supremacy in Jeopardy!, if only Watson had faced a human-like buzzer delay in its match against Ken Jennings.”

  58. Gerard Says:

    Scott #54 and Jonas Kgomo #46

    There’s a discussion of this blog post on Hacker News: https://news.ycombinator.com/item?id=30230867

    It has some interesting comments from programmers who seem to have found some real utility in some perhaps less advanced but more practical tools than this one. I haven’t used any of them myself but from the comments it sounds like they essentially work like auto-complete on steroids.

  59. Boaz Barak Says:

    I think the “exponential” interpretation is correct but should be more nuanced.

    In the uniform distribution over programs of N symbols, the probability of getting a correct solution would probably be something like 2^{-N}, which would be so small as to make it completely useless.

    AlphaCode, as far as I understand, manages to sample from a distribution that is much closer to the correct one, a distribution that is some approximation of the conditional distribution of a program that a programmer would write conditioned on the prompt. Now I do believe that it’s still the case that this distribution has some const*k divergence from the ground truth, where k is the number of lines, and so indeed you might need to sample exp(const*k) samples from it to get a correct one.

    So it’s definitely not “monkeys typing on the typewriter”, but the scaling of resources with the size of the program may be exponential. (Doesn’t mean that this approach couldn’t be the basis of something that scales, especially since human-designed programs are ultimately made of composing functions/components that don’t have too many lines)
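
    (A rough way to write down that scaling, with all the caveats above and with c standing for the per-line divergence constant; a gloss, not a theorem:)

      \Pr[\text{a sampled $k$-line program is correct}] \approx e^{-ck},
      \qquad
      \mathbb{E}[\text{samples needed}] \approx e^{ck}.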

  60. Nicholas Teague Says:

    Section 6.5 in this paper is a somewhat fundamental reconsideration of basic elements of training a model. The bias-variance tradeoff has traditionally been interpreted to suggest that, short of the double-descent regime, entering a state of validation-loss overfit is universally a detrimental outcome. They explain the benefit as resulting from a one-of-many style of inference, where they sample predictions in bulk and select amongst the resulting aggregate to derive a final answer, so that even though a majority of the samples are overfit, the best-case inference improves while the average inference degrades. They propose that future work can look to derive a better-aligned validation metric, since solve rate is computationally expensive to evaluate.

    I’ll be really impressed when this type of performance can be achieved without fine-tuning on a representative data set. That’s when AGI will be within reach.

  61. anon85 Says:

    Oh man, I just looked at the OpenAI math olympiad thing. It’s so bad it’s borderline fraudulent. STOP HYPING UNIMPRESSIVE RESULTS, PEOPLE.

    They claim they can solve “some” IMO problems, but they actually solved only 2, and a closer look indicates one of the 2 was not an IMO problem at all (but merely “adapted” from an IMO problem, but in the adaptation process the problem got much easier, as the OpenAI authors admit in their paper). The one (1) IMO problem they did solve was from 1964 (note the IMO has gotten harder over time). The solution to that IMO problem that was given by the model is the one-liner:
    “nlinarith [sq_nonneg (b – a), sq_nonneg (c – b), sq_nonneg (c – a)]”
    i.e. it’s simply a call to a different theorem-prover.

  62. anon85 Says:

    Scott #57:

    “Oh, if only the interpretation of the contest rules in 1997 had been more in Kasparov’s favor.”

    Interestingly, exactly this issue arose with StarCraft, and after fixing the interpretation of the contest rules, no StarCraft AI can beat the best humans (even though 3 years have passed since AlphaStar). Sometimes the rules do matter.

  63. John Haugeland Says:

    The thing that bothers me about this is that everyone seems to be making an economic argument without actually considering the economics.

    Fundamentally, the thing that’s ostensibly exciting here is that we have a magic black box that can program for free. Spit a description at it, and out pops code.

    Neat. I guess.

    Until it’s time to put it into practice.

    So I tried writing Tic Tac Toe. Vanilla JS in a browser, no graphics, no network, no opponent, just for two human players at the same keyboard or mouse. Rejects impossible moves, resets for a new game, detects game over, nothing else.

    Took me about 40 minutes, half of that looking for stupid bugs.

    Next I tried with Github Copilot. Took me about three hours, almost entirely looking for bugs, having just done it once already.

    The problem is that although writing it is cheaper, debugging it is more expensive by an order of magnitude, because you can’t trust that a viable human had the right idea. You have to check every little detail for bizarro land subtle errors no human would ever make, which is far harder than just writing it in the first place.

    And that’s for a trivial job like TTT. Imagine that on hard things, where we already struggle to find bugs.

    To me, this seems like a non-starter. YMMV.

  64. Sid Says:

    John Haugeland #63:

    I think the idea with something like AlphaCode is that IF you have a good unit test suite, THEN it can generate code you can trust.

  65. Akshat Mahajan Says:

    Scott #36

    > With the use of deep learning for translation, voice recognition, and face recognition, we similarly don’t understand in any detail how it works, yet those applications have changed the world. AlphaCode looks to me like it could already be useful for practical programming, just as it stands, and of course we should expect such things to improve dramatically in the coming years.

    You’re citing applicable utility as your metric of progress, but my whole point is that ability to reason about the candidate space of architectures is much more meaningful as a measure of actual progress. Utility is *not* a helpful metric by itself, for reasons in the next paragraph.

    The analogy I used earlier is to a Rube Goldberg machine. A Rube Goldberg machine has utility, of course – that’s why it’s remarkable – but it’s also extremely sensitive to random perturbations (perhaps the mousetrap expanded from summer heat, or the rolling ball gathered enough dust to stop just shy of its target). This makes it lack several features we would reasonably want, most notably predictability (what will it do on this run?) and tunability (how can we make it less likely to fall apart without warning?), key ingredients for both safety and user experience.

    If we didn’t have a model of how to relate a Goldberg’s output to its input – if we couldn’t understand what random perturbations were breaking the Goldberg machine – we could never correct for these properties. Clever engineers would tinker and add more layers, discover that all the resulting complexity seems to add just enough noise to cancel out the other random perturbations most of the time, and go home satisfied. They haven’t fixed the underlying tunability and predictability issues, and they don’t need to – if it can demo and not tip over, you can get published for it.

    Such systems have utility. They are not useful progress in building better machines.

    Deep learning models, like Goldberg machines, are sensitive to random perturbations in data. We don’t know what perturbations get amplified – our only defense becomes stacking more layers and feeding it more data, hoping it compensates for the random features. If you carefully curate the data, take a few layers and pass it around, mess with the weights or the loss function, you arrive at juggernauts that do absurd things in a limited context. These juggernauts remain unsafe and useless to deploy, because you can make them baulk in unusual – and critical – situations.

    We don’t worry about Goldberg machines because we actually *do* know what makes them fail. We are armed with classical mechanics. We can invent simpler components whose tolerances we comprehend. We can not only rate the thresholds of operational safety for a given configuration of components, we can do one better and replace the Goldberg machine altogether.

    No such model exists yet for deep learning models. We cannot rate them for operational safety, nor can we design and defend simpler components with fewer degrees of failure. If we can’t do that, we haven’t built *better* AI – we’ve only made too-big-to-fail ones.

    tl;dr I am not disputing the Bitter Lesson of AI. I am contending that concluding it’s the whole story based purely on these perceived “advances” is unsatisfactory.

  66. Scott Says:

    Akshat Mahajan #65: One of the most fundamental axioms of my philosophy of science is that, when someone does something new that’s obviously impressive—for example, builds a system that achieves previously-unattainable performance, or does a type of calculation that no one could do before (and consistently gets the right answers), etc.—outsiders don’t get to invent philosophical reasons why it “doesn’t really count.”

    Sure, they can argue that the seemingly-impressive new performance was actually an illusion (and healthy skepticism on that count is of course needed). Sure, they can express dissatisfaction with the lack of rigor of the new methods, or our lack of understanding of why they work, and they can (and should) call for more research to fill those gaps.

    But dismissiveness isn’t one of the allowed options. That’s been the path of stagnation and failure, the wrong side of history, in every single case I know without exception.

    This is true if Euler proves that 1+1/4+1/9+1/16+… = π²/6 using new methods of questionable rigor. It’s true if Appel and Haken prove the four-color theorem using a computer. It’s true if Feynman and his friends get quantum electrodynamics to yield accurate predictions by cancelling one infinity against another one. It’s true if physicists correctly calculate the Bekenstein-Hawking entropy of black holes using hocus pocus involving Euclidean path integrals or imaginary time or replica wormholes or supersymmetry.

    And, finally, it’s true if deep learning achieves things of obvious practical interest that the more declarative forms of AI tried and failed to achieve for generations, but only by using “unsound” and “non-explainable” methods.

    You can argue, as you did in your comment, that the new methods are inherently unsafe—and that therefore, despite their apparent usefulness, despite the obvious fact that they “work,” humanity would be better off not deploying them, and waiting for methods that are better understood. And maybe you’re right. Maybe it’s like thermonuclear weapons. That’s a social and political question.

    But—once again like with thermonuclear weapons in 1955—an option that’s no longer open in the world of 2022 is to say that deep learning shouldn’t impress us, that what look like its spectacular practical successes shouldn’t scientifically “matter” or “count,” because the technology fails various a-priori philosophical desiderata that were made up by people who didn’t create it.

  67. Scott Says:

    Incidentally, if anyone is wondering whether what I wrote in #66 also applies to quantum computing, the answer is emphatically yes! On this blog, I’ve often argued that claimed heuristic quantum speedups were not as impressive as they looked, because it was plausible that they could be matched classically. Not once, in 15 years, did I ever argue that a heuristic quantum speedup didn’t “count” because we didn’t rigorously understand it. Rigor, for me, only has instrumental value, in helping to make the case that a claimed quantum speedup is real.

  68. Verisimilitude Says:

    To #66, the issue is that these “obviously impressive” results are mechanical turks. None of these systems actually work, and they’ve been deployed widely in some areas of life. Now people excuse “computer vision” systems that misinterpret a truck as part of the sky and kill someone, because don’t we know humans are also susceptible to optical illusions, which are totally the same thing.

    Not one of these systems has an internal model of reality that’s in any way useful, and I’ve seen people go from claiming that the self-driving cars will be so much better than people, to claiming that they’re basically as good, so people shouldn’t be allowed to drive at all without them.

    I’ll continue to sneer at this nonsense, until it implodes again.

  69. Scott Says:

    Verisimilitude #68: Google Translate doesn’t work? The voice recognition in my phone doesn’t work (if not perfectly, good enough for when I’m walking and need to get something down fast)? Who are you hoping to convince?

    As for the self-driving cars, it seems clear that their risk of crashing is already well below my own risk (if I still drove—I stopped four years ago because I hated it and found it terrifying), and in a few more years it will be below the risks faced even by good drivers. I’d probably use it right now if it were legally available.

  70. A. Karhukainen Says:

    So basically the modus operandi is that of Google and FB: mining the big data masses produced by humans and selling what was found back to us? Don’t you think this kind of “culture of averages” will become too top-heavy some day? Such programs work (effectively) only in the big GPU/TPU-farms, all made in one big giga-factory somewhere in Taiwan.

    Also I think Fast Typist has the point up there. E.g., to prove that an AI is capable of something new, let it find for example, a fast factorization algorithm. But that would certainly require finding first some genuinely new mathematical phenomena. (I don’t mean they would need to be rigorously proved, some “Eulerian insights” would be enough!) But is the “sophisticated data mining approach to AI” actually capable of real intellectual insights leading to genuinely new ideas?

  71. Andrew Kay Says:

    Scott #32: If “programming” can be reduced to “provide some examples of the input/output behavior you want, plus some English text to prime an AI in the right direction to generate a formal spec and/or the actual code,” do people have any idea what a huge deal that is?

    Question is, is it harder to make the formal spec than the /algorithm/ to implement the spec? (Logically the spec is easier simply because an algorithm is also a formal spec… in some sense.) I’d argue that the spec ought to be easier, because it doesn’t have to be “efficient” or “sequential” only “clearly correct.” And it should be easier to reason about, say, whether the edge cases are correctly captured.
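
    To make that distinction concrete, here is a minimal sketch in Python (a toy example with made-up names, nothing from the AlphaCode paper): the spec for sorting is a short predicate that only has to be clearly correct, while an implementation has to commit to an actual strategy.

    ```python
    from collections import Counter
    import random

    def satisfies_spec(xs, ys):
        """Spec: ys is a nondecreasing permutation of xs. Clearly correct, not efficient."""
        return Counter(xs) == Counter(ys) and all(a <= b for a, b in zip(ys, ys[1:]))

    def insertion_sort(xs):
        """One possible implementation; the spec doesn't care which algorithm is chosen."""
        out = []
        for x in xs:
            i = len(out)
            while i > 0 and out[i - 1] > x:
                i -= 1
            out.insert(i, x)
        return out

    data = [random.randint(0, 99) for _ in range(20)]
    assert satisfies_spec(data, insertion_sort(data))
    ```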

    My original point is just that, if you don't have some idea in your mind of the abstract spec and the correctness proof, even highly informally, then what the hell is it that you're implementing? And later, the programmer needs to communicate his/her/its understanding to convince others that the implementation is correct, not only on the test cases. I tried to suggest that conveying a formal proof might be easier for the AI than constructing a correct /in/formal proof that can be readily understood. This should work for normal engineering cases. If your code's termination depends on the Collatz conjecture, then of course you are screwed.

    You say you never proved a program formally, which is reasonable given the generally available tools, but I’m sure you’ve done the informal proofs in your head as you coded. Otherwise, how do you know that the greedy algorithm didn’t miss a better decomposition?

    Thanks for the great blog.

  72. anon85 Says:

    Scott #69:

    “As for the self-driving cars, it seems clear that their risk of crashing is already well below my own risk (if I still drove—I stopped four years ago because I hated it and found it terrifying), and in a few more years it will be below the risks faced even by good drivers. I’d probably use it right now if it were legally available.”

    People have been saying "in a few more years" for years now. In 2015, people were telling me it would happen in 2018.

    Googling some numbers, it seems to me that self-driving cars have driven on the order of ~30 million miles ever, virtually all with human supervision. Over that span, they’ve been responsible for 1 fatality. For human drivers, a Google search tells me the fatalities are around 1 per 100 million miles. This means that so far, despite the human supervision (which matters, as humans take over from the bots at least once every 30,000 miles or so due to some incident), and despite the fact that self-driving cars are deployed in good weather/visibility conditions, they’ve been 3x more deadly than human drivers.

    To gather enough data (within the next few years) to convince you that a particular self-driving car setup has become safer than humans, the operators would need to scale up the operations by a factor of 10 or perhaps even 100, and they would also need to decrease the rate of disengagements substantially. I don’t see this happening that soon.

    I think most people don’t realize just how few miles self-driving cars have under their belt. Even if self-driving cars became super-humanly safe tomorrow, it would take many years of data to convince anyone of this fact (I think it would actually take decades under the current data collection rates). I mean, I suppose we could easily see them making fewer sub-fatal accidents, but the risk of black swans (which cause the AI to kill someone due to a rare out-of-sample situation) would still be scary.

  73. drm Says:

    There may be a ways to go in AI's use of natural language. A couple of weeks ago there was a site making the rounds on Twitter that promised to translate the abstract of your paper into language a second grader could understand. It sounded like a good idea, so I tried it on two of my recent papers. The first failed in the expected way: through several attempts, the AI struggled to extract the main idea of the abstract beyond making very general statements about biology. Fair enough, my writing is impenetrable. The results for the second abstract were kind of creepy. It extracted the main idea of the paper and put it in accessible language. The only problem was that it clearly and repeatedly attributed the ideas and results in the paper to a person I had never heard of. I have no idea how it connected this person to the results in the paper. This is of course a capital crime in academia, equivalent to a self-driving car hitting a pedestrian.

  74. Nick Nolan Says:

    #69 Scott Says: “As for the self-driving cars, it seems clear that their risk of crashing is already well below my own risk ”

    This is true of Level 1 and 2 driver assists on the streets, but it's best to describe them as ADAS (advanced driver-assistance systems).

    Tesla still can’t make a reliable left turn on a busy road without traffic lights. When there is oncoming traffic, a left turn requires a commitment to action that can take 5-15 seconds to complete and makes implicit assumptions about other drivers’ reactions during that period.

    Only after there is no driver needed will the economic benefits emerge.

    I suspect that driverless cars are in our future, but that they will be implemented more the way they work in Charles Stross's novels Halting State and Rule 34.

    (1) The car is autonomous most of the time, but a remote human driver is ready to take over and navigate the car at a moment's notice. When you have 1 remote operator per 20 or 50 cars, the cost of the remote driver becomes insignificant.

    (2) 5G mobile networks are required. 5G Vehicle-to-Network standards and 5G standards for safety- and security-critical applications ensure that driverless cars can function on most roads and in most cities.

  75. Boaz Barak Says:

    It seems that people are confusing two different questions:

    1. Are these developments in AI deeply impressive, and do they indicate that progress in the field is unlikely to stop any time soon and will eventually have a great impact on society?

    2. Can we deploy this commercially today and replace human programmers?

    I think the answer to 1 is a resounding yes, but the answer to 2 is no.

    This is actually quite a common situation in technology development, and was the case with personal computing, the Internet, and many other developments. There is an initial "proof of concept" that people are very excited about. There is a lot of hype, and some companies promise to deliver these advances in the very near future. The road from proof of concept to commercial practicality and reliability is always longer than expected, and some of these companies go under. But in the long run, these technologies often exceed the "hyped up" early expectations, just not on the expected time scale. This is why Bill Gates wrote "We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten."

    I also believe that, apart from progress in making AI better and more reliable, we will make scientific progress in better understanding the principles behind such systems, their capabilities and their limitations. So the analogies to "black boxes" or "Rube Goldberg machines" will become less and less apt with time.

  76. Scott Says:

    Boaz Barak #75: Hear hear!

  77. Scott Says:

    anon85 #72:

      Even if self-driving cars became super-humanly safe tomorrow, it would take many years of data to convince anyone of this fact (I think it would actually take decades under the current data collection rates).

    Don’t you see the catch-22 here? A technology is new, therefore unsafe, therefore it can’t even be tested at scale, therefore there’s no learning and feedback process to make it safer. Even though both the fundamentals and practical experience make it plausible that the technology could actually be orders of magnitude safer than what it replaces, were it suffered to develop in the normal way. It’s exactly the same with nuclear power, which would’ve saved the world from climate change had it not been strangled from the 70s onward.

    Can you imagine if the Wright brothers lived today? Airplanes would be endlessly ridiculed on social media for their atrocious safety record, for being yet another irresponsible techbro obsession that failed to live up to the hype, with every crash endlessly discussed as an indictment of the whole enterprise. So it would be nearly impossible to get permission to build or test new planes, so they’d never become far safer than the drive to and from the airport, as they famously are today.

    These are social phenomena, not technological ones. And they’re some of the saddest imaginable commentaries on our civilization. How did we take this wrong turn?

  78. Ilio Says:

    Scott #update, Boaz Barak #59,

    Those are intriguing thoughts, but how confident are we that human coders don't face the same difficulty?

    (I guess humans circumvent this through building libraries, social coordination and feedback from users, but I still can’t figure out how to prove that alphacode can’t do the same)

  79. Mateus Araújo Says:

    I don’t think the heroic effort it took to train AlphaCode is a valid objection. Keep in mind the heroic effort needed to train a *human* programmer. Months, if the human in question already has the background knowledge in language and maths necessary, and years if one starts from scratch (9 years in the case of Scott’s daughter).

    I do think that Google Translate is a counterexample to the “bitter lesson”, though. I was amazed about what it could do when it launched 15 years ago, but it has clearly plateaued: it was useful but kinda shit back then, and it is still useful but kinda shit now. There’s a glaring error in almost every sentence, clearly caused by the fact that it doesn’t understand the text it is trying to translate. I think 15 years of little progress is strong evidence that their approach is a dead end.

  80. Scott Says:

    A. Karhukainen #70:

      Also I think Fast Typist has the point up there. E.g., to prove that an AI is capable of something new, let it find for example, a fast factorization algorithm. But that would certainly require finding first some genuinely new mathematical phenomena … But is the “sophisticated data mining approach to AI” actually capable of real intellectual insights leading to genuinely new ideas?

    Sorry, this is once again indicting the talking dog for not being able to speak like Christopher Hitchens. Very few humans have been able to discover "genuinely new mathematical phenomena." (And of course, if a fast classical factoring algorithm exists at all, the number of humans smart enough to discover it so far seems to stand at 0.)

    An AI that “merely” automated all the more routine parts of programming and theorem-proving, leaving the “genuinely new ideas” to the humans, would be one of the most revolutionary developments of our lifetimes, up there with personal computing and the Internet.

  81. fred Says:

    Woof Woof!

  82. Max Ra Says:

    What would change your mind to explore research on the AI alignment problem? For a week? A month? A semester?:P

    (sorry if I missed you discussing this before, I only found posts that show you’re familiar with and sympathetic to the problem)

  83. Scott Says:

    Max Ra #82: The central thing would be finding an actual potentially-answerable technical question around AI alignment, even just a small one, that piqued my interest and that I felt like I had an unusual angle on. In general, I have an absolutely terrible track record at working on topics because I abstractly feel like I “should” work on them. My entire scientific career has basically just been letting myself get nerd-sniped by one puzzle after the next.

  84. Akshat Mahajan Says:

    Scott #66

    > But—once again like with thermonuclear weapons in 1955—an option that’s no longer open in the world of 2022 is to say that deep learning shouldn’t impress us, that what look like its spectacular practical successes shouldn’t scientifically “matter” or “count,” because the technology fails various a-priori philosophical desiderata that were made up by people who didn’t create it.

    I acknowledge where you’re coming from, but you’re making a few category errors in this case:

    1. The considerations I lay out aren’t coming from a layman’s perspective – they are shared by leaders in the field. It is wrong to say these criteria for progress were outlined by people who “didn’t create it”. See: OpenAI’s vocalness about AI safety, Michael I. Jordan’s [critique](https://hdsr.mitpress.mit.edu/pub/wot7mkc1/release/9) in 2019, which I quote below

    > However, the current focus on doing AI research via the gathering of data, the deployment of deep learning infrastructure, and the demonstration of systems that mimic certain narrowly-defined human skills—with little in the way of emerging explanatory principles—tends to deflect attention from major open problems in classical AI. These problems include the need to bring meaning and reasoning into systems that perform natural language processing, the need to infer and represent causality, the need to develop computationally-tractable representations of uncertainty and the need to develop systems that formulate and pursue long-term goals. These are classical goals in human-imitative AI, but in the current hubbub over the AI revolution it is easy to forget that they are not yet solved.

    2. You’re confusing rigour with intelligibility. In each of the example cases you cite – Feynman’s, Euler’s, Hawking’s, Appel’s – and many of the cases you don’t cite but which also should be mentioned in the same regard – Cantor, Dirac, Galois, Ramanujan – the novel methods were intelligible, even if they weren’t justified: their specific scientific innovation, and its consequences, could be readily assimilated and thus, crucially, *built on* for more innovation.

    In deep learning’s case, no such intelligibility presents itself. It’s not clear *how* a novel innovation can be introduced, only that the status quo seems to work. A good parallel is Bohr’s model of atomic behaviour. When Bohr invented his model, people found it only worked for the hydrogen atom. Bohr introduced several arbitrary rules that, taken together, just “worked” in that special case. But because Bohr did it prior to Schrodinger’s work, people had no idea how to extend his rules to other cases. Trying to fit variations of his rules only gave rise to epicycle-esque modelling, and failed miserably until a proper integrative theory came together.

    Taken all together, I can agree this is progress, but it’s progress in a very narrow sense that experts interested in the fundamental questions don’t care for, because it teaches us nothing operationally.

  85. fred Says:

    Scott #32

    “If “programming” can be reduced to “provide some examples of the input/output behavior you want, plus some English text to prime an AI in the right direction to generate a formal spec and/or the actual code,” do people have any idea what a huge deal that is?”

    And anyone working in programming in the real world (e.g. financial software) knows that, a lot of the time, the "specs" are so vague or so ill-defined that no one realizes they actually lead to contradictory behavior/requirements, which only becomes obvious way down the line, when more complex "edge" test cases are uncovered and the code is like a snake eating its own tail. At this point the coding typically switches to a test-driven mode: any new edge case is incorporated and carefully documented, while making sure the change doesn't create regressions on the prior edge test cases (and everyone hopes no one will notice the potential contradictions in behavior).
    A spec that would cover all this would be as complex as the code that needs to be generated…

  86. fred Says:

    Most real-life successful software businesses are about slapping more and more complex functionality (everything is growth driven!) on top of a 20+ year old code base used by dozens of demanding clients, with a few key senior coding experts doing their best to make sure the whole thing doesn’t collapse under its own weight by reviewing vague new specs and supervising more junior coders.
    It all feels more like fighting entropy/chaos than "give me a bunch of inputs/outputs and I'll write you an algorithm" (which is the academic idea of software writing… this sometimes happens in the real world too, especially the rare few times you create something from scratch, but that feels like a treat or a vacation).

  87. Miguel Says:

    > Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English. And people are complaining that the dog isn’t a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it’s unclear how much the dog really understands.

    Yeah, to me (speaking personally) it's like a few thousand boosting rounds away from 'strong' AGI or something.

  88. Stassa Patsantzis Says:

    >> Forget all that. Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English. And people are complaining that the dog isn't a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it's unclear how much the dog really understands.

    That’s not right. AlphaCode’s abilities are nothing new. AlphaCode is a (neural) program synthesis system. Program synthesis approaches can solve problems like the ones in the three datasets attempted by AlphaCode (CodeContests, CodeForces and APPS), from their examples of inputs and outputs without recourse to a natural language description. Even program synthesis from natural language specifications is nothing new. And there’s been plenty of work on neural program synthesis that is more advanced and more sophisticated than AlphaCode. AlphaCode is incremental work, that is of interest only in the context of other work specifically on generating code with large language models learned by a neural net with a Transformer architecture (hence, I guess, why the DeepMind preprint only compares AlphaCode’s performance to such systems). But it is incremental work and I’m afraid the excitement in this blog post is not justified. The right measuring stick to use to understand AlphaCode’s performance is not the progress of the last 20-25 years in AI research in general, but in program synthesis in particular.

    Regarding the “Update” paragraph: yes, there is a relation between lines-of-code and the cardinality of the search space for programs, let’s call it H (for “Hypothesis Space”). The cardinality of H is a combinatorial function of all tokens accepted by the grammar of some target programming language. H can be finite, including only programs of a certain size (size as in number of tokens). Lines-of-code can be an approximation of program size, so we can get an idea of the cardinality of H from lines-of-code in a target program.
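
    To make the combinatorics concrete, here is a toy count (an illustrative grammar made up for this comment, nothing to do with AlphaCode's actual target languages) of how fast such an H grows with program size:

    ```python
    from functools import lru_cache

    # Toy grammar: an expression is either one of `n_leaves` terminals, or one of
    # `n_ops` binary operators applied to two smaller expressions. Count how many
    # distinct expressions use exactly `size` tokens -- a stand-in for the
    # cardinality of H at a given program size.
    @lru_cache(maxsize=None)
    def count_programs(size, n_leaves=5, n_ops=4):
        if size == 1:
            return n_leaves
        total = 0
        for left in range(1, size - 1):   # one token is spent on the operator
            right = size - 1 - left
            total += n_ops * count_programs(left) * count_programs(right)
        return total

    for s in range(1, 12, 2):
        print(s, count_programs(s))   # 5, 100, 4000, 200000, ... -- exponential growth
    ```

    The counts blow up very quickly, which is why lines-of-code is a reasonable proxy for how hard H is to search.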

    However, the approach taken by AlphaCode, of searching H, is the bog-standard approach in pretty much 99% of program synthesis in the last few decades (although generation and search approaches tend to be rather more sophisticated). I confess I have no clue what the "Update" above is saying about quantum computers, but whatever it means, it's something that must obviously apply to _all_ program synthesis algorithms that search H. If that's so, then that may be a result of interest to the program synthesis community (although, again, I haven't got a clue what it means; I'd be grateful for an explanation).

  89. Stassa Patsantzis Says:

    I forgot to say: hello. It’s my first time posting on your blog and I don’t know the protocol for introducing oneself. I hope this does it 🙂

  90. Pavlos Says:

    It's more a parrot than a dog 😉 In that sense, it will become more and more powerful at reducing repetitive tasks. If a task can be digitized and has been performed enough that it samples the space of possibilities satisfactorily (with additional tricks of data augmentation, adversarial generation, etc.), then ML can take it up and keep repeating it (its value, of course, comes from being able to impute the gaps in previous repetitions, statistically). But human invention and understanding clearly are not just statistical in nature, so these are out of reach for current technologies. What will be interesting is to see what it fails on. These failures will better disambiguate the non-statistical part of our intelligence.

  91. Nick Drozd Says:

    GPT-3 was able to produce text that looks an awful lot like human-produced text as long as you don’t look too close. It’s a poor tool for creating real knowledge or insight, but it’s a brilliant tool for creating marketing spam, low-grade political propaganda, or essays for students with teachers who don’t have time to read everything.

    In short, it’s great for domains where accuracy isn’t all that important, but volume is.

    It may turn out to be similar for programming. In the world of bloated corporate Dilbert-style software development, projects are often driven by political concerns rather than actual use-cases. Managers up and down the chain want to be in charge of larger and larger budgets and headcounts, and they need projects to justify this. What the projects actually do doesn’t really matter, and whole teams can work for years on end without having a clear idea of what they are doing.

    In these conditions, machine learning can really shine. After all, if nobody knows what the app is supposed to do, who’s to say that the AI-generated code isn’t correct? This technology will allow shitty enterprise software to be produced at a scale that was previously unthinkable.

    It’s possible that AI “programmers” will end up like human programmers, producing huge volumes of code that they themselves can’t understand. Perhaps human programmers will be relegated to “maintaining” AI-generated software systems. Can you imagine the kinds of bugs that might turn up?

  92. Craig Gidney Says:

    @Akshat #84

    > In deep learning’s case, no such intelligibility presents itself. It’s not clear *how* a novel innovation can be introduced, only that the status quo seems to work. […] I can agree this is progress, but it’s progress in a very narrow sense that experts interested in the fundamental questions don’t care for, because it teaches us nothing operationally.

    This seems objectively false to me, because it predicts the field of deep learning should not be making progress. But every year the field keeps building on its results, using computers to do tasks better than anyone has been able to do before.

    For example, I predict that they will continue to make progress on AlphaCode. 90% that next year they’ll be able to solve a larger proportion of problems. 70% that they’ll do it using less marginal computational effort per problem than today. 55% that they’ll do it with less total training cost.

  93. fred Says:

    Is anyone working on the same thing for math? Like using the International Mathematical Olympiad problems as input sets? (probably much harder in practice because you can’t just “run” a solution)

  94. Craig Gidney Says:

    Stassa #88:

    You say AlphaCode is incremental work relative to existing program synthesis. I’d like a reference for your claim. Can you point to the specific existing paper that achieves close to this level of performance on CodeForces-like problems?

  95. Matt Putz Says:

    Scott #83

    Have you seen the ELK (Eliciting Latent Knowledge) write-up from ARC (Alignment Research Center, led by Paul Christiano)? They have a contest going on to find solutions to a problem they define (and a bunch of people have won money already). It’s a relatively self-contained thing that you could make progress on in 5 to 30 hours (depending on background and a bunch of other factors). At the same time, it’s also getting at an important part of the problem.

    I thought it was genuinely interesting to see how they managed to turn quite an abstract problem into a very clean and concrete one.

    Here’s Holden Karnofsky making the case that people should try it.
    https://forum.effectivealtruism.org/posts/Q2BJnpNh8e6RAWFnm/consider-trying-the-elk-contest-i-am

    Official deadline for submissions is 15th Feb, but I’d be surprised if they weren’t stoked to hear from you even afterwards (note that I know nothing though).

    Also, because now I'm curious: do you think that money could ever motivate you to work on AI alignment, if it was enough money? Can you imagine any amount that would make you say "okay, at this point I'll switch, I'll make a full-hearted effort to actually think about this for a year, I'd be crazy to do anything else"? If so, do you feel comfortable sharing that amount (even if it's astronomically high)? Asking because I'm curious not only about you, but also about other physicists, mathematicians etc.

    (For the record, while I currently do take AI alignment seriously, I’m not 100% confident. Of course, I respect the choice to work on other things.)

  96. Sid Says:

    Akshat #84:

    At least for me, the qualitative change in my perspective on the abilities of current ML methodologies came with GPT-3, which was followed up with Copilot and AlphaFold (AlphaCode is cool, but I don't think it is fundamentally more capable than Copilot). I feel Michael Jordan may have a different perspective on AI given those.

  97. Scott Says:

    Stassa #88-#89: Hello and welcome! I was going to ask you exactly the same question that I see Craig Gidney #94 has already asked.

  98. Scott Says:

    fred #93:

      Is anyone working on the same thing for math? Like using the International Mathematical Olympiad problems as input sets?

    Did you not see OpenAI’s announcement from just last week, of automated solutions to several IMO problems? To my mind, it’s less impressive than AlphaCode, because it can only read formalized versions of the IMO problems in Lean rather than plain-English problems, but it still looks like a very interesting advance.

  99. Scott Says:

    Matt Putz #95:

      Also, because now I'm curious: do you think that money could ever motivate you to work on AI alignment, if it was enough money? Can you imagine any amount that would make you say "okay, at this point I'll switch, I'll make a full-hearted effort to actually think about this for a year, I'd be crazy to do anything else"? If so, do you feel comfortable sharing that amount (even if it's astronomically high)?

    For me personally, it’s not about money. For my family, I think a mere, say, $500k could be enough for me to justify to them why I was going on leave from UT Austin for a year to work on AI alignment problems, if there were some team that actually had interesting problems to which I could contribute something. But probably that team could put the money to much better use on others—maybe even my students!

  100. Not Even Inaccurate Says:

    DL researcher and practitioner for 6 years here, half of them in autonomous driving. Let’s bring some accuracy to the table.

    anon85, #72: 30M miles for self-driving cars, in total? Ummm no. No no no. Tesla alone is at 4-6 billion miles, on autopilot mode. Mobileye collects data from many (most?) OEMs using its services (at one time, it was most of them, counting by vehicles…), and that data feeds directly into self-driving training. Precise total driving miles are hard to put a finger on, but one report says “up to over 25 million kilometers per day, according to Mobileye”. I guess the author of that comment was thinking about dedicated test fleets and not of other sources of data?.. Even then, 30M is a bad underestimate. And that’s without counting simulated time (which I guess you would dismiss, even though it clearly helps to some extent, but that’s a secondary argument to have).

    Akshat Mahajan, #65: Sensitivity to input perturbations? I guess you’re talking about adversarial perturbations? Well, sure, deep learning has this issue. Funny thing is – we’re only talking about these because the approach _works_ at all. Any model will have perturbations it is sensitive to – the research on these originated with the nicely theoretically understood SVM (e.g. Poggio and Mukherjee, General conditions for predictivity in learning theory, 2004).

    There are theorems guaranteeing the presence of adversarial examples for any sufficiently interesting combination of data and model. But we only care about them as an important and somewhat surprising phenomenon in the context of deep learning, because there’s nothing surprising about older models not getting them right, and because we’re only now finally tackling data and decision boundaries of sufficient complexity! So it’s like criticizing a good runner for failing to go under 10s for the 100m distance while letting my (16s?) performance go unquestioned.
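
    (To make the "any sufficiently interesting model" point concrete with a toy example of my own, not taken from the reference above: for a plain linear classifier in high dimension, a perturbation that moves every coordinate by a tiny amount can still shift the score by the perturbation size times the L1 norm of the weights, and flip the prediction.)

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d = 1000
    w = rng.normal(size=d)            # weights of a linear classifier
    x = rng.normal(size=d)            # a typical, confidently classified input

    score = float(w @ x)
    eps = 1.5 * abs(score) / np.abs(w).sum()       # per-coordinate budget needed
    x_adv = x - eps * np.sign(score) * np.sign(w)  # nudge each coordinate slightly

    print(np.sign(score), np.sign(w @ x_adv))      # opposite signs: label flipped
    print(eps, np.abs(x_adv - x).max())            # every coordinate moved by only eps
    ```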

    Verisimilitude #68: go ahead and sneer 🙂 But just a few questions. Say I'm a factory owner, and I want to detect flawed products coming off the line. I tried simple heuristics, and I tried having people watch the line. Both are unsatisfactory. What should I do? Join you in the sneering, or test that DL-based approach I heard of, find it imperfect but enormously better than alternatives, and adopt it? Or say I'm a dental insurance company. Intelligent and self-explaining human experts can't even agree where a tooth root is, precisely, and so whether or not I have to pay a patient is a lengthy, half-random, expensive process. Or I could use that nice little DL start-up, test it, and find it saves everybody involved (patients, clinic, insurance) a great deal of money. Or I'm a large agricultural company. I want to fly drones over fields and generate insights and … same story. Radiography. 3D reconstruction for archeology. Promising drug leads. I could go on and on for some fifty or five thousand completely different use cases. In all of them, the choice is between explainable models _that don't work_ and DL-based approaches that do, albeit imperfectly. Now. Not in the future. Are all of these people allowed to go for the one that works, while you're sneering?

    Look, if anybody just goes and says "deep learning is magic and will solve every and any problem", they're obviously full of crap. But I'd respectfully submit that anybody denying the obvious and ubiquitous progress in using deep learning to make our lives better is full of the same thing.

    Nick Drozd, #91: the insight that ML is better suited to situations where accuracy matters relatively little and volume matters a lot is probably correct. But perhaps your examples don't fully cover that range of applications. Security alarms, potential production flaws, and medical anomalies can all fall into that category, at least sometimes, and are all reaping the benefits of ML as we speak.

  101. Not Even Inaccurate Says:

    Akshat Mahajan, #84: your comment is interesting and contains several important points to address. Certainly we all have the aesthetic preference for progress driven by human understanding, and certainly we all can marshal arguments in favor of this preference. But we don’t have to limit ourselves to funding Jordan’s group alone, however impressive! We must take whatever progress we’re fortunate to get, and make the best of it. Moreover, I’d venture to say your comment doesn’t describe progress in deep learning accurately, on the factual level, in a few ways.

    “Merely” using more data is a major driver of the observed progress, it's true. But “merely”, as was remarked above, hides a lot of work on scaling the various processes involved, and frequently involves theoretically interesting ideas and breakthroughs. Whether they are of interest to “experts interested in the fundamental questions” depends on the experts. As an example, training models in a highly distributed, asynchronous manner is an active and fascinating field – one replete with theoretical contributions and touching on rather fundamental aspects of learning.

    The very nature of observing and measuring progress has changed in a fundamental (and fundamentally interesting!) manner. Datasets of the scale and ambition of ImageNet, MSCoco etc. have not been available in the past, and their existence allows us to tackle questions of measurement and evaluation that weren’t around two decades ago. Problems of quantization, pruning, knowledge distillation and so on were barely relevant for smaller models and datasets in the past, but now open fascinating vistas of theoretical insight. You mentioned sensitivity to input perturbations – as I wrote in another comment, that subject wasn’t DL-specific, but it makes more sense to think of adversarial examples thanks to deep learning. Many more examples of this character can be found, if you wish.

    Finally, your specific claim (as I understand it) that progress is not a result of insight is at least sometimes obviously wrong. As a particularly important example, let's consider the situation facing researchers in 2015-16. They were observing a curious thing. Up to a certain depth, adding layers was very frequently beneficial (after batch normalization and other techniques were introduced). But for most problems and models, at some stage, adding more layers was causing problems. Degradation in _test_ accuracy was to be expected – we'd simply say that's overfitting! But the observed degradation was in _training_. Ah, you say, the old vanishing gradient / your favorite similar explanation! But it simply wasn't that. Well, the researchers asked: if we add some layers and make them learn the identity mapping, surely that's a deeper model that performs as well as the original one? But experiments showed that this didn't work out. The identity mapping wasn't being learned well. To make a long story short, they understood that (oversimplifying grossly) learning was simpler around 0 than around the identity, and conceived of the resnet architecture. Unlike many little architecture tweaks suggested at that time, this idea stuck, and resnets are now a workhorse of practically applied deep learning. I challenge you to tell me this is of no interest to "experts interested in the fundamental questions", or that it's progress without insight. As before, numerous other examples of the kind could be given, if you wish.
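
    For readers who haven't seen it, the resulting trick is tiny (a bare NumPy sketch, with batch norm and everything else omitted): the block outputs x + F(x), so the stacked layers only have to learn the residual F around zero rather than re-learning the identity.

    ```python
    import numpy as np

    def relu_layer(x, W, b):
        return np.maximum(0.0, x @ W + b)

    def residual_block(x, W1, b1, W2, b2):
        """y = x + F(x); with F initialized near zero, the block starts out
        as (roughly) the identity mapping, so extra depth is harmless."""
        f = relu_layer(x, W1, b1) @ W2 + b2    # F(x), same width as x
        return x + f

    x = np.random.randn(8, 16)
    W1, b1 = 0.01 * np.random.randn(16, 16), np.zeros(16)
    W2, b2 = 0.01 * np.random.randn(16, 16), np.zeros(16)
    print(np.allclose(residual_block(x, W1, b1, W2, b2), x, atol=0.1))   # True
    ```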

  102. Shmi Says:

    Matt Putz #83:

    I'd guess that to get the attention of someone like Scott, one would have to ask a question that sounds like (but makes more sense than) "what is the separation of complexity classes between aligned and unaligned AI in a particular well-defined setup?" or "A potential isomorphism between Eliciting Latent Knowledge and termination of string rewriting" or "Calculating SmartVault action sequences with matrix permanent"

  103. Timothy Chow Says:

    anon85 #72: For a long time, people were predicting that chess computers would beat grandmasters in the next 5 years. They were wrong, and wrong again, and wrong again…and then they were right. Given the quality of self-driving cars now, I find it hard to imagine that they won’t eventually take over. I won’t predict how long in the future it will happen, but when it does happen, I predict that a dramatic shift will happen in a short amount of time.

    Scary black swans may loom large in your eyes, but that's not really the relevant consideration. There are always some people who are going to be afraid. There are plenty of people today who are afraid of flying, but that hasn't stopped commercial aviation from (ahem) taking off. The point is, there are plenty of people who have strong incentives to switch to autonomous driving once it becomes viable, even if there are risks. A commercial trucking company might be prepared to pay out $1 million/year in lawsuits and out-of-court settlements if it saves them $2 million/year. Seniors or people with disabilities who have a choice between a self-driving car and no mobility at all may well choose the self-driving car even if it carries some risk. And once enough people take the plunge, there won't be any going back. For decades, nobody has seriously proposed getting rid of motor vehicles entirely, despite the staggering number of deaths they have caused, all of which could have been avoided if motor vehicles were banned.

  104. JimV Says:

    Okay, several people are objecting to progress being attributed to things that seem to work but without our understanding how they work.

    Welcome to our universe. Some things work in it, others don’t. Trial and error plus memory and the ability to manipulate physical tools can discover which is which, and the mechanics of how to use the things that do work. Then people agree on a provisional story that seems to explain how the working things work, and which is useful for training future generations, until more data disagrees with the story and it has to be revised. The quest for understanding, while useful, is in my opinion ultimately doomed to reach barriers. Why are there electrons? Some axioms are just given, not explained by anything deeper.

    Trial and error plus memory is the basic story of how intelligence works. It created all the biology on Earth, including the human brain. (At some point having appendages, hands, which can manipulate objects becomes useful also.)

    (Neural networks seem to be one way to implement trial and error plus memory.)

    “And as a scientist, I’m committed to the idea that, when reality tries as hard as it possibly can to teach us a certain lesson, it’s our responsibility to learn the lesson rather than inventing clever reasons to maintain the contrary worldview that we had before.”–Dr. Scott

    That would make another great sub-heading for this blog. Reminds me impressively of Dr. Feynman’s quote about not fooling oneself. Great minds, et cetera.

  105. Verisimilitude Says:

    To #100, say I propose a vague and contrived scenario in which only one solution works, then does it work? Factories do use heuristics to detect defects well, such as “this chip is darker than it should be, and hence burned”; this is what experts are paid to design and humans paid to detect. The issue with the neural nonsense approach is that it randomly fails without warning, which is unacceptable in a factory setting.

    I’d enjoy seeing insurance handled by neural nonsense, because that would hasten its demise. I can already see the crying over how it denies nonwhites something or another because it was mostly trained on the average citizen.

    “3D reconstruction for archeology.”

    This particularly sickens me. These mindless machinations can’t accurately add details not present, and damn those who claim otherwise. This is very dangerous. These things are going to continue to lie, cheat, and kill people until they be stopped.

  106. Lorraine Ford Says:

    I think it is relevant to keep in mind that computers and computer programs are merely tools in the hands of people. The thing about computers is that a computer program can symbolically represent the equivalent of consciousness and agency, whereas mathematics alone can’t do that.

    More correctly, people created the symbols and Boolean algebra etc., and people found that wires, voltages and transistors can accurately represent the symbol system and the algebra, and people can potentially use these symbols in a computer program to represent the equivalent of consciousness and agency, whereas equations can’t do that.

    The symbols that can represent individual occurrences of consciousness and agency in a computer program are:
    (1) Consciousness: “IF conscious of situation” (where a situation is symbolically represented by variables and numbers and/or an analysis of the variables and numbers, and consciousness is symbolically represented by an implied IS TRUE).
    (2) Agency: “THEN act” (where action/ agency is symbolically represented by the assignment of numbers to variables, in response to a situation).

    The individual occurrences of IFs and THENs, and the sorting, collation and analysis of the variables and numbers, and the generation of new code, is the part of a computer program that can symbolically represent individual occurrences of people’s consciousness and agency. The overall outcomes of computer programs like AlphaGo or AlphaCode can look like consciousness and agency to people who are not mindful of, or people who are wilfully blind to, what is going on behind the scenes. It’s a human creation: computers are merely a tool in the hands of people.

  107. Scott Says:

    JimV #104:

      Okay, several people are objecting to progress being attributed to things that seem to work but without our understanding how they work.

      Welcome to our universe.

    LOL, yes! Ordinarily, I like to think of myself as an extremist on the side of wanting explanations for everything (I even want to explain why God made the world quantum-mechanical…). But not even I feel that I get to dictate to the people who found a demonstrably working solution to their voice recognition or cancer diagnosis or whatever other problem, that they don’t get to use it until they explain its “semantic meaning” to my satisfaction.

      Why are there electrons? Some axioms are just given, not explained by anything deeper.

    Oh, but we do now know why there are electrons! The answer is: “something something spin-1/2 field coupling to U(1) gauge symmetry of the Standard Model something something” 😀

  108. Scott Says:

    Shmi #102:

      I'd guess that to get the attention of someone like Scott, one would have to ask a question that sounds like (but makes more sense than) "what is the separation of complexity classes between aligned and unaligned AI in a particular well-defined setup?" or "A potential isomorphism between Eliciting Latent Knowledge and termination of string rewriting" or "Calculating SmartVault action sequences with matrix permanent"

    LOL, yes, that’s precisely the sort of thing it would take to get me interested, as opposed to feeling like I really ought to be interested. 🙂

  109. Tiffany Says:

    I’ve also noticed the application of this sketch to AI research:

  110. Scott Says:

    Tiffany #109: Haha thanks, I guess there’s a reason this same joke recurs in so many variations! The tendency being satirized is real.

  111. Verisimilitude Says:

    To #107, some of us don't accept the excuse that, because the world is complex and arbitrary, man-made systems, and computers in particular, should be complicated and incomprehensible.

    This can't be excused by arguing that the neural nonsense is going up against the real world either, because the example in the article is about human language and mathematics, where perfect results can be achieved and should be expected.

  112. Scott Says:

    To all the naysayers here, the way to prove your case to the rest of the world should now be obvious: replicate what AlphaCode did, using a simple, transparent, non-compute-intensive approach. Then come back and post a link to your preprint in this comment section when you’re done. I promise to suspend my policy against “drive-by linkings.” 😀

  113. Henrik Says:

    Scott #112:

    I think at least some of the naysaying is less about the significance of the fact that current AI techniques can do this, and more about annoyance at the misallocation of credit: the work doesn't have much that's new methodologically, even by the standards of an ML paper, and is mostly just standard finetuning on a dataset + drawing tons of samples. It's like one team figured out the telescope (Transformers models), another team discovers that using telescope with long exposures reveals a rich array of celestial objects (GPT-3/Copilot), and then another team comes along and takes far far longer exposure pictures of a particular patch of the sky that astronomers have interest in and gets credit for the scientific innovations that led to that even though the majority of that was by team 1 and team 2 and the contribution of team 3 was engineering resources + skilled application of insights by 1 and 2.

  114. mjgeddes Says:

    Shmi #102, Scott #108

    I think you have to connect computational complexity to complex systems in the Santa Fe sense. Is this a combination of coding theory and chaos theory?

    Remember my 3 C’s: Causality, Complexity & Compositionality ! All kids in classrooms should be taught the 3 C’s !

    My theory is that neural networks, as I suggested above, are simply enormous ‘statistics machines’ doing ‘generalized regression’; they only approximate 1 of my 3 C’s (‘Causality’)

    Understanding value alignment is light-years away from the current ML paradigm. For that you need to understand my 2nd C (‘Complexity’), which as I suggested, might involve the combination of coding theory and chaos theory.

    My 3rd C (‘Compositionality’) is related to language models and abstraction, but I would suggest that the real solution here is so far away from the current ML paradigm that it’s in another dimension altogether 😉

    As to #112, publicly posting a preprint would be last thing I would do. But I’d be happy to give you all a zero-knowledge proof: a world-wide practical demonstration: Singularity! 😀

  115. anon85 Says:

    Scott #77:

    did you miss the fact that self-driving cars killed someone already? What would your preferred policy be — letting self-driving cars kill however many people they want until they finally become safe enough to use?

    (Note that they killed someone *even though there was a driver in the driver’s seat*. That is, even with a human supervising the self-driving car, self-driving cars are MORE dangerous than human drivers, so far as historical casualty rates go.)

    Perhaps one day we will be in the situation you describe — the situation where there are plausibly-safe self-driving cars that are banned because they are not *provably* safe. We are far from that situation!

  116. anon85 Says:

    Not Even Inaccurate #100:

    I admit I don't know much about self-driving cars, so I'm just going by what a Google search says. However, I suspect the difference between our numbers is a definitional one. You say Tesla alone has 4-6 billion miles; OK, and how many fatalities do they have? Is there data on this? How about disengagements with Tesla — one every how many miles, roughly?

    The numbers I'm using are perhaps 2 years old, I now realize, but back then Waymo was leading the field and was the company most were talking about. And the number of miles was certainly in the tens of millions, not in the billions. I suspect the Tesla figures are just not comparable, because it's not the same definition of "self-driving", but I'm not an expert, so feel free to clarify.

    Again, if Tesla numbers count as self-driving: what are the disengagement rates? What are the fatality rates?

  117. Scott Says:

    anon85 #115:

      did you miss the fact that self-driving cars killed someone already?

    From the bottom of my heart: did you miss the fact that ordinary, human-driven cars kill 38,000 people every year in the US, including 4,000 children and teenagers? Would you have accepted that sacrifice, had status-quo bias not made it seem normal?

      What would your preferred policy be — letting self-driving cars kill however many people they want until they finally become safe enough to use?

    My preferred policy is to do the obvious utilitarian thing, and allow the rapid development of a technology that, as it predictably improves, will allow millions more human beings to remain alive.

  118. Scott Says:

    Henrik #113:

      It’s like one team figured out the telescope (Transformers models), another team discovers that using telescope with long exposures reveals a rich array of celestial objects (GPT-3/Copilot), and then another team comes along and takes far far longer exposure pictures of a particular patch of the sky that astronomers have interest in and gets credit for the scientific innovations that led to that even though the majority of that was by team 1 and team 2 and the contribution of team 3 was engineering resources + skilled application of insights by 1 and 2.

    So in other words, the way things actually do normally work in science, for better or for worse … with a huge chunk of the credit going to the first achievers of major milestones that the external world cares about, even when those milestones rested on earlier developments that were in some sense more difficult? 😀

  119. Jacques Distler Says:

    As a Tesla owner, I can say that we're very, very far from reliable self-driving cars. And, no, I'm not talking about the "cognitive" part (which might be amenable to several orders-of-magnitude more training). I'm talking about the perceptual part: the car's ability to perceive its surroundings and build a model thereof.

    Even without the $12,000 Full Self-Driving Package [sic], we get to preview the “FSD visualization” — the car’s model of its surroundings.

    In a word, it's awful. To give you an idea of how awful it is, consider the day I was stopped at a stoplight on 38th Street in Austin. There are 3 lanes; I was in the middle lane, 3rd car from the front of the line. Since we were all at rest, the car had all the time in the world to make sense of its surroundings. In the left lane, at the front of the line (i.e., a car-length ahead of me), was an F-150 pickup truck (which, towering over all the surrounding vehicles, was clearly visible). But on the FSD visualization, the pickup truck kept flickering in and out of existence for the full 2 minutes we were stopped at the light. The car could simply not decide whether there was a vehicle present, one lane over and a car-length ahead. While at rest. For two minutes.

    Now, imagine that level of perceptual acuity if we were all moving at ~30 miles/hour in a driving rain storm.

    And, no, this was not an isolated instance. The FSD visualization is just uniformly awful. And if the car can’t see what’s around it, it can’t make intelligent decisions about what to do, no matter how much training it receives.

  120. Kerem Says:

    # 111 Verisimilitude

    It's remarkable how naturally people assume that what looks like "real intelligence" will be understandable or explainable. What makes us think that's actually what we see in human intelligence? If you ask a chess grandmaster why in some variations of the Berlin it is OK to give up the Bishop pair vs why in some other variations it is not, you will get a made-up reason in either case; the real reasons will always be obscure. Yes, humans are very good at wrapping plausible explanations around complex decisions that came out of their billion-parameter (decades-trained) neural networks, but that doesn't mean any of that is truly explaining anything.

    This is why there is only one (and absolutely no other) way to get better at chess (or in any complex human endeavor) and that is to train your own neural network with painstaking “deliberate practice” and instruction and so on.

    The flat criticism that deep neural networks are too complicated and not simple to understand is, in my opinion, the shallowest of all.

  121. Martin Mertens Says:

    Solution to the exercise: x_i = (b_(i+1) + B – b_i)/n

    where B = (b_1 + b_2 + … + b_n)/T_n

    where T_n = 1 + 2 + … + n

    and b_(n+1) = b_1

    Proof: If we add every row together and divide by T_n we get
    x_1 + x_2 + … + x_n = B

    Let R denote a new row 1 1 … 1 | B

    Now from (row_(i+1) + R – row_i)/n we get x_i = (b_(i+1) + B – b_i)/n
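
    A quick sanity check of the formula in Python (using my own encoding of the system, with row i of the matrix having entry ((j - i) mod n) + 1 in column j, and the wraparound b_(n+1) = b_1 handled with a modulus):

    ```python
    def solve(b):
        """Return the positive integer solution x, or None if there isn't one."""
        n = len(b)
        T = n * (n + 1) // 2
        if sum(b) % T:                 # x_1 + ... + x_n must equal the integer B
            return None
        B = sum(b) // T
        x = []
        for i in range(n):
            num = b[(i + 1) % n] + B - b[i]     # the b_(n+1) = b_1 wraparound
            if num <= 0 or num % n:             # no positive integer solution
                return None
            x.append(num // n)
        return x

    # round-trip test against a known positive solution
    n = 4
    A = [[((j - i) % n) + 1 for j in range(n)] for i in range(n)]
    x_true = [3, 1, 4, 2]
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    assert solve(b) == x_true
    assert solve([1, 1, 1, 1]) is None
    ```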

  122. anon85 Says:

    Scott #117:

    Please just do the calculation. Self-driving cars killed someone after driving only a few tens of millions of miles. Human drivers kill 38,000 people a year while driving ~3 trillion miles. Human drivers have a BETTER track record! Not a worse one, a better one!

    This is DESPITE the fact that the self-driving cars have humans in the driver’s seat who take over during the AI’s failures, and despite the fact that self-driving cars disproportionately avoid difficult conditions (e.g. ice on the roads).

    Now, maybe we have a factual disagreement? Like, maybe you are claiming that self-driving cars have driven billions instead of merely tens of millions of miles? If so, at least your position would make sense! But so far it sounds like you’re purposefully avoiding the point I’m trying to make, which is that THE TRACK RECORD OF SELF-DRIVING CARS IS LITERALLY WORSE THAN THAT OF HUMANS, right here, right now, no hypotheticals involved.

    “Would you have accepted that sacrifice, had status-quo bias not made it seem normal?”

    I’m not the boogeyman you want me to be. I find the death toll of cars horrible! I center my life in such a way as to avoid driving as much as I can, especially with children in the back seat, because I know driving is more dangerous than most other activities I might engage in. I take busses and trains whenever possible, not for any climate-change reason but purely because I don’t want to die (busses and trains are much safer than cars). I really, REALLY want self-driving cars to arrive and save us from the monthly 9/11 death toll imposed by cars.

    I’m just saying: self-driving cars appear to be WORSE for safety, not better. That is my point. I don’t want to die, so please don’t release self-driving cars to run me over just because you want to believe they are safe.

    I’m all for allowing rapid development of self-driving cars. Go self-driving cars, I’m rooting for you! But do it responsibly, with humans in the driver’s seat (as has been done so far), so that people don’t die unnecessarily. If you want to, say, use government money to subsidize this, I’m in favor. Just don’t release machines that will run over my kids because of some misplaced conviction in the infallibility of AI.

  123. Milk Says:

    The last paragraph is oft-forgotten when mentioning the bitter lesson: “The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done”.

  124. Gadi Says:

    Scott, in case you missed the news:
    https://twitter.com/Tweetermeyer/status/1488673180403191808
    https://engrxiv.org/preprint/view/1973/3986
    there’s research showing that, unsurprisingly, the self-driving car companies are unreliable and deceptive in their presentation of statistics.

    Tesla’s self-driving cars use normal cameras, and probably some neural networks. Do you really think they can beat millions of years of evolution? Maybe they can complement it or get close, but I don’t think they’ll beat humans in a fair game. Then there’s lidar and cross-car communication and other tech that humans don’t have, but then the question is: if you only cared about maximizing safety, wouldn’t you prefer to just install these without taking the human off the wheel?

    My prediction is that if self-driving cars actually become common, the stats will eventually catch up and people will realize they are less safe, but by then they’ll prefer to give up the safety for the convenience. The only real reduction in accidents will come from drunk driving and tired driving, where giving the AI the wheel is a better idea.

  125. Stassa Patsantzis Says:

    Craig Gidney #94 and Scott #96: Yes, I can, but first I’d like to repeat what I wrote above, to make sure there are no misunderstandings: I wrote that “Program synthesis approaches can solve problems like the ones in the three datasets attempted by AlphaCode (CodeContests, CodeForces and APPS), from their examples of inputs and outputs without recourse to a natural language description”.

    I repeat this because it seems to me that there is much excitement about the ability of AlphaCode to learn programs from a natural language specification. However, that’s really a bit of a gimmick: a program can be fully specified by a set of its inputs and outputs or in any case there exist inductive program synthesis approaches that can learn programs from only I/O examples, without the need for a natural language description. The natural language description in AlphaCode is only used as a prompt to the language model to generate programs, and the language model itself has no way to know whether those programs agree with the natural language description, so it absolutely needs to test the generated programs against I/O examples. Inductive program synthesis algorithms use I/O examples to generate only programs that are consistent with the examples, so they don’t need a natural language description, which helps avoid the firehose-like generation in AlphaCode.

    I suppose there is an argument in favour of natural language descriptions, that they’re easy for “anyone” to specify. On the other hand, so is a set of I/O examples and it’s much easier to be precise with I/O examples than with natural language.

    Finally, comparisons to human performance on code competition problems are not a very good measuring stick for program synthesis performance: we have no way to know the coding ability of the participants in any given competition, or at any rate it takes a lot of effort to figure it out.

    That all hopefully clarified, here are a few papers that solve coding problems from I/O examples only. They’re from my discipline of Inductive Logic Programming (a form of inductive synthesis of logic programs), with which I’m the most familiar and about which I’m best placed to answer questions.

    1. “Learning Higher-Order Programs without Meta-Interpretive Learning”

    https://arxiv.org/abs/2112.14603v1

    Cheeky gits! (They totally diss Meta-Interpretive Learning, which I study.) But Table 1 on page 6 of the pdf has a good collection of coding problems, all learned from I/O examples. The target language is Prolog. One subset of the target programs is “higher order” in that they make use of “predicates” (Prolog programs) that operate on predicates, such as map, which takes as arguments a predicate that operates on lists and two lists, or fold, which is similar to the functional-programming fold.

    2. “Think Big, Teach Small: Do Language Models Distil Occam’s Razor?”

    https://openreview.net/pdf?id=F6gvhOgTM-4

    That’s a NeurIPS paper that compares GPT-2, GPT-3, humans, and two inductive programming systems (inductive programming is program synthesis from I/O examples): MagickHaskeller, which learns programs in Haskell, and Louise, a Meta-Interpretive Learning system that learns Prolog programs (I’m the author and maintainer of Louise). The four systems and the humans are compared on a number of coding tasks in P3, a precursor-ish of Brainfuck. Each system and each human is given only one or two examples. Louise and the two LLMs do better than the humans. Louise does better than the GPTs on the more complex set of problems (to clarify: I haven’t yet been able to replicate the experiments with Louise and I sure think it can do much better than what’s reported in the paper; but I suspect that’s what everyone says…).

    3. “Syntax-Guided Synthesis of Datalog Programs”:

    https://pages.cs.wisc.edu/~aws/papers/fse18b.pdf

    This is one of two papers on successive iterations of the Datalog synthesizer ALPS. I link the first one because it has an interesting example of detecting misuses of the SSL API, which is a goal with a very real-world flavour (I don’t know if it’s an actual real-world problem). See Section 2 “Overview Examples” on page 2 of the pdf. See also Table 2 on page 9 of the pdf, with the results of a benchmark on a more diverse set of coding problems, and Tables 1 and 2 on item 4 on my list.

    I hope we can agree that programming in Prolog, Datalog and Brainfuck-ish are all coding problems on the hard side for most programmers, but of course that’s not saying much. The same comments as on the coding ability of CodeForces etc. competitors apply. Comparing against humans is not such a great benchmark of program synthesis ability.

    Anyway, I hope this answers the questions and I’m happy to clarify as needed.

    P.S. The question I want to ask both of you, Craig and Scott, is: how much did either of you know of the state of the art in program synthesis _before_ hearing about AlphaCode?

  126. Gadi Says:

    Hypothetical story:

    You’re a CS professor, and one of your young and naïve students wants to do research on some optimization problem. The state-of-the-art research on that optimization problem was done on thousands of computers, each thousands of times more powerful than the computational resources at your faculty, and it had been running for decades before this young student arrived. The young student doesn’t completely understand the state-of-the-art research, and he plans to use similar techniques. He’s pretty arrogant and thinks he can beat the state of the art despite all the mentioned shortcomings. What would you think of that student?

    Now replace thousands with millions and decades of training with hundreds of millions of years of training, compare the number of neurons in existing models to the human brain, and replace the state-of-the-art research with evolution and the young student with academia’s AI researchers. Anything evolution optimized for, especially real-time visual perception, is unlikely to be beaten by the naïve young student. As for things evolution didn’t optimize for, like writing programs, anything can happen. But driving cars requires pretty much the same skills evolution optimized for.

  127. Gerard Says:

    Scott #117

    > From the bottom of my heart: did you miss the fact that ordinary, human-driven cars kill 38,000 people every year in the US, including 4,000 children and teenagers? Would you have accepted that sacrifice, had status-quo bias not made it seem normal?

    I absolutely would. Not being able to get around really sucks (I’m saying this as an amputee who can’t drive and can’t walk more than a few hundred feet). 38,000 deaths nationally is barely a blip on the radar screen. The odds of even knowing someone personally who died in a car accident are very low (and you can lower them further by being wise enough to minimize your interactions with humans). The number of US covid deaths is approaching 1 million, and while covid has certainly generated a great deal of disturbance, it’s not clear that those deaths in and of themselves have had any significant impact on anything.

    The biggest problem I have with your brand of utilitarianism is that it focuses exclusively on the externally observable (ie. alive vs. dead) and completely ignores what really matters to people, which is their internal experience (ie. are people getting what they want or not). Because it’s very hard for me to see why anyone would think that living a life where your will is ignored and trampled upon at every turn would be preferable to not living at all.

    By the way, I think this is also what you profoundly don’t get about people like Trump voters and vaccine deniers. While I agree with you that for the most part these people are deeply mistaken and misguided about the facts of the situation, I think that the true origin of their revolt is fundamentally a rejection of a world that keeps telling them what to do “for their own good” when that world doesn’t know a damn thing about their own good; that’s an internal experience that no 2nd or 3rd party can observe.

    Also note that this comment is completely orthogonal to the subject of this post and that I agree with most of what you have said here regarding AI and AlphaCode.

  128. fred Says:

    This is not surprising, I guess, but I find it really funny that, as Scott is slowly moving his needle from AI skepticism to AI enthusiasm, the majority of the comments are shifting too, the other way around! There’s just no winning!
    Aka, is it just harder to resist commenting when one disagrees than when one agrees?

  129. Scott Says:

    fred #128:

      I find it really funny that, as Scott is slowly moving his needle from AI skepticism to AI enthusiasm, the majority of the comments are shifting too, the other way around! There’s just no winning!

    😀 😀 😀

    The way I’d prefer to put it is that, if I weren’t slowly moving my needle from “AI skepticism” to “AI enthusiasm,” then almost by definition, I wouldn’t be responsive to rather eye-popping, world-transforming changes in the facts on the ground over the past decade: I’d simply be repeating the same ideological message (e.g., “AI SUPERINTELLIGENCE IS COMING ANY DAY!!” or “STOP THE AI HYPE!!”) regardless of what the facts turned out to be.

    But, inspired by the example of Bertrand Russell breaking with all his Communist friends after he toured the new Soviet Union for himself, then later breaking with all his pacifist friends on the eve of WWII, I do want to be responsive to the facts on the ground, ideology be damned.

  130. A Raybold Says:

    Anon85 #49: On the question of understanding, there’s a comment on Hacker News from ‘killerstorm’ making what I feel is an interesting claim, even though it is not ostensibly about understanding:

    A lot of people who are skeptical about AI progress call it “statistical modeling” and point to large data sets involved and large amounts of hardware thrown at it. Sort of implying it’s some sort of a brute-force trick.

    I’m afraid they do not understand the size of the problem/solution set. Suppose the problem and the solution are each 1000 characters long, over an alphabet of 32 characters. Then a model is a function F: X -> X, where X is a set of 2^5000 elements. There are |X|^|X| = 2^(5000·2^5000) such functions, and the goal of a training process is to find the best one.

    The training set for a language model would be well under 1 PB, i.e., under 2^50 bytes. So the task here is to use a training set of size 2^50 to find a function in that vastly larger space.

    It’s obvious that no amount of brute-forcing can possibly find it. And no classic mathematical statistical modeling can possibly help. A problem like this can only be approached via ANNs trained through backpropagation, and only because this training process is known to generalize.
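
    (To put rough numbers on the quoted claim, here is a quick back-of-the-envelope sketch; the variable names and the byte/bit bookkeeping are mine, not killerstorm’s:)

      # Rough size comparison for the quoted setup: 1000-character strings over a
      # 32-character alphabet, versus a ~1 PB training set.
      bits_per_string = 1000 * 5                # 32 = 2**5, so 5000 bits per string
      num_strings = 2 ** bits_per_string        # |X| = 2**5000 possible strings

      # A function F: X -> X assigns one of |X| outputs to each of |X| inputs,
      # so there are |X| ** |X| = 2 ** (5000 * 2**5000) such functions.
      log2_num_functions = bits_per_string * num_strings   # exact (huge) integer

      training_set_bytes = 2 ** 50              # ~1 PB of training data
      print(len(str(log2_num_functions)))       # ~1500 digits just for the exponent
      print(len(str(training_set_bytes)))       # 16 digits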

    If you will forgive the anthropomorphizing, AlphaCode might be said to be showing some sort of intuition about what a correct solution might look like. The connection I see to understanding is this: having the right intuitions about a domain seems to be an important aspect of understanding it.

    That is clearly not all of it, however: another, probably more important, part of understanding is being able to explain things.

    One of the things about GPT-3 output is that it sometimes takes the form of explaining something – but when you look at those ‘explanations’, they are often beside the point or confused.

    But what if some successor of GPT-3 had good intuitions about what counts for a good explanation or argument? If it tended to write mostly good explanations, I can imagine it giving the impression that it understands something, and if that happened, it would, I think, shift the burden of proof onto the naysayers to explain why this would not actually be understanding, at least up to some level.

    This is undoubtedly a simplistic explanation of understanding, but nevertheless, it leads me to raise my willingness to accept the possibility that current methods applied broadly enough might lead to AGI.

  131. Gerard Says:

    Gadi #126

    > What would you think of that student?

    Your hypothetical situation isn’t analogous to the situation we are currently facing.

    The question you should be asking is what if that student, despite the objections of all the naysayers, somehow managed to get access to some computational resources and obtained results that were somewhere near the average of what the millions of years of evolution have produced.

    I’m entirely with Scott here: it’s absurd to ignore actual evidence just because it doesn’t fit your preconceived notions of what is possible.

  132. Scott Says:

    Stassa Patsantzis #125: Thanks for your comment, and I’m sorry that WordPress mistakenly flagged it as spam yesterday (speaking of machine learning failures… 🙂 ).

    A couple detailed responses before my broader response:

    1. I looked through all three of the papers you linked, but unless I missed it, I didn’t find any example in any of them that was even in the same universe of impressiveness-to-the-outside-world as what’s reported in the AlphaCode paper. E.g., apparently the first paper can synthesize code from examples to tell whether the input string is a palindrome or not? (I’m just going from the data table, as no explicit examples are shown.) I saw no hint of anything like the n-singers problem, which I understood but had to think about to solve.

    2. As I said before, what I really want is the ability to play with AlphaCode myself, with example problems that I get to design—doing the same with GPT-3 gave me a much better sense for the shape of its abilities and limitations than just looking at some curated examples. Having said that, for me the single most persuasive thing in the AlphaCode paper was that when they varied the English description to make it nonsensical or even just incomplete, but left the I/O examples unchanged, the performance degraded dramatically and the whole approach basically no longer worked. Why did you never address that, when talking about how the English description was basically just irrelevant fluff compared to the examples?

    3. If there are program-synthesis tools out there that can solve the n-singers problem, or anything of similar nontriviality that I’m able to understand, from I/O examples only, what are they? Again, could you give me references?

    Now for the broader point. I can very easily imagine being peeved if I’d spent years doing research in inductive logic programming, and then DeepMind swooped in and did something similar but that the outside world cared about 1000x more. The trouble is that, as an outsider myself, I find it hard to distinguish the case that the outside world cares more because of marketing fluff, from the case that the outside world cares more because now the thing actually works to solve nontrivial programming problems by leveraging natural-language descriptions in addition to the examples. If the previous work were able to solve problems at this sort of level, why can’t I see an example for myself?

    I’m just barely old enough to remember when Brin and Page published something called “PageRank” in 1996—I was 15, and had just published my first research paper, which was also about the Web as a directed graph and taking random walks on that graph, and then I got to attend SIGIR’97 to present the paper, and there I learned all about the current state of research on information retrieval and search engines, always emphasizing the precision/recall metrics that Brin and Page basically ignored. And then the next fall I started at Cornell, where I met a new professor named Jon Kleinberg, who had just come from IBM Almaden where he’d helped create a system called “CLEVER” exceedingly similar to PageRank. And it’s very easy to imagine the information retrieval experts pooh-poohing Brin and Page for ignoring the research papers that had already explored related ideas, and just putting things together into a snazzy package that wowed the outside world.

    And maybe, from a certain perspective, those experts would’ve been “right” … but joining in their dismissal would’ve been a brilliant way to completely miss one of the most important developments of the century even when it was right under one’s nose. So I resolved never to make such a mistake going forward and am trying to put that into practice right now.

  133. Gerard Says:

    Scott:

    I’m curious why you jumped to the “n-singers” problem all the way on page 59 and skipped over, for instance, the backspace problem on page 4. Is it because you didn’t have to think about the backspace problem?

    One of the things I’m curious about is how much variation there is in how (and how well) different human brains work. As for me, it took about half a day to find a P-time solution for the backspace problem (I’m obviously not a competitive programmer). I didn’t find anything immediately obvious about that problem, except that there was clearly a probably-exponential-time algorithm that just constructs all possible transformations of the string s using the two choices available at each step (ie. type the next character or press backspace). The P-time algorithm I eventually found is a graph search that doesn’t look anything like the stack-based approach used by AlphaCode and at least seems to consider multiple paths. It’s still not obvious to me why the greedy stack-based approach is correct, though I’m sure I could figure it out if I spent another two hours or so on the problem.
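
    (To spell out the exponential brute force I mean, here is a sketch, assuming the obvious semantics for the backspace key; it is not AlphaCode’s stack-based solution or my graph search:)

      def can_obtain(s, t):
          """Exponential brute force: for each character of s, either type it
          or press backspace (which deletes the last typed character, if any)."""
          def go(i, typed):
              if i == len(s):
                  return typed == t
              return go(i + 1, typed + s[i]) or go(i + 1, typed[:-1])
          return go(0, "")

      # can_obtain("ababa", "ba") -> True ; can_obtain("ababa", "bb") -> False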

  134. Scott Says:

    Gerard #133: Oh, the backspace example was amazing too! But maybe for that one, I found it slightly more believable that if you had trained on an enormous number of input/output examples involving dynamic programming, then you could solve it just from that library, without needing to exploit the English description of the problem. With the singers problem, how the hell would you know to first check for divisibility by n(n+1)/2?

    Again, it would be helpful to have a browsable library of dozens more examples, including ones where AlphaCode failed.

  135. lewikee Says:

    Gadi #126: Your continued comparisons to evolution are flawed in that evolution is not directed and focused. The guiding hand of evolution is mere survival, a criterion many times removed from the sorts of problems we are discussing here. Human intelligence evolved as a by-product. The time it happened to take should not be used as evidence of how long it had to take!

    With AI research, successes and failures, in both type and magnitude, are directly and thoughtfully factored into the next iterations of design.

  136. drm Says:

    Succinctly, AlphaFold2’s breakthrough was discovering the right way to extract and exploit pairwise correlations between positions in a multiple alignment of 30 to 100 distantly related protein sequences that are hypothesized to have the same structure. Including more than 100 has diminishing returns; accuracy falls off sharply if there are fewer than 30. In most cases, it settles on a good structure solution early in its run, all of which suggests that the solution is not too hard to find if you know what you’re doing. In that case, I wonder if it might be effectively reduced to a more conventional, intuitive algorithm. I imagine every domain is different, but I wonder if that might be a pattern that emerges down the road.

  137. Jair Says:

    Scott #134: Such a library exists, here: https://alphacode.deepmind.com/
    Or had you already seen this?

  138. Ilio Says:

    Stassa Patsantzis #125,

    Thanks for your perspective. From your own evaluation of what AlphaCode does and doesn’t do, how far do you think they are from something good enough to attract most labs in your field (as when most labs in computer vision switched to deep learning)?

  139. Isaac Grosof Says:

    Hi Scott,

    For me, seeing this is what it must have felt like for a strong chess player to watch computer chess progress in the 1970s. By the end of the 70s, the best computer chess programs had reached a rating around 1700 – only a little better than a typical player who takes the game seriously, maybe the 85th percentile among adults. Twenty years later, a computer defeated the world champion, and now computers are vastly better than humans, to the point where the best computers regularly defeat the best humans even when giving significant handicaps.

    I wonder whether we will see the same trajectory with respect to programmatically-generated programming. In 20 or 30 years, will AI systems be able to match or exceed human performance on a contest like the International Olympiad in Informatics? In another 20 years, will AI systems be a default part of the process of making production software?

    Fundamentally, I don’t think we have enough evidence to say. One could argue that a major part of the rise of computer chess was Moore’s law, which isn’t currently improving things the way it used to. I don’t know whether modern computer chess programs could beat the best humans if they had to run on 1970s-level hardware. On the other hand, the rise of TPUs and other DNN-optimized chips means that hardware improvements aren’t necessarily over, either.

    I think it is very plausible that AI-based program synthesis, especially the kind that makes use of a human-language spec, as demonstrated by AlphaCode and GitHub Copilot, to name a couple, takes off at the same sort of rate that computer chess did.

    Scott, how plausible do you think this is?

  140. Scott Says:

    Jair #137: Thanks!! I’ll take a look.

  141. Scott Says:

    Isaac Grosof #139: Yes, that’s precisely the thought. If this follows the by-now-familiar trajectory, then those who sneer at systems like AlphaCode today will look as foolish in the judgment of history as those who sneered at computer chess in the 70s and 80s, or those who I personally remember sneering at computer Go in the early 2000s. It’s true that, as you say, some forms of Moore’s Law have basically ended, but the growth of data and data centers and numbers of cores … those will likely remain exponential for at least a couple more decades. The central property of deep learning approaches that makes them easy to sneer at — namely, that they just keep getting better and better without any obvious limit as you scale them up, with few or no new ideas needed — is also what makes it impossible to point to some particular programming challenge and confidently declare that deep learning won’t be able to solve it.

  142. Scott Says:

    Gerard #127:

      By the way, I think this is also what you profoundly don’t get about people like Trump voters and vaccine deniers. While I agree with you that for the most part these people are deeply mistaken and misguided about the facts of the situation, I think that the true origin of their revolt is fundamentally a rejection of a world that keeps telling them what to do “for their own good” when that world doesn’t know a damn thing about their own good; that’s an internal experience that no 2nd or 3rd party can observe.

    Oh, there’s no question in my mind that there’s something that it feels like, something that seems internally reasonable and compelling, to be wildly, catastrophically wrong about the effectiveness of vaccines or who won the 2020 election or whatever. And that it’s worth every effort to explore and understand that better.

    But could you spell out more clearly for me what it is that I “profoundly don’t get” about this? Or better: if I did get it, then what would I be saying or doing differently from now, that would be more effective at bringing the burn-the-world-down crowd over to my side, and persuading them of what you and I agree is the reality, namely that much of what the arrogant globalist elites say is for their own good (like vaccines), actually is for their own good?

  143. fred Says:

    Has no one tried yet to insert a question such as “write the fastest program to find the longest path in a graph”?
    I guess the trick is to ask it in a really casual way, to not make the AI suspicious.

  144. Gerard Says:

    Scott #141

    > If this follows the by-now-familiar trajectory

    There is a counterpoint to that, though. There are also a lot of problems where performance seems to peter out at a high level, but one that isn’t quite human-equivalent or sufficient for all the desired applications.

    I think it’s a pretty common experience in ML to find that one can get to 90% with x amount of effort, then it takes 2x the effort to reach 91%, and so on. That was certainly my experience when I worked on ML for OCR and handwriting recognition in the late 2000s. Maybe it’s changed with all the new developments of the last decade, which I haven’t been keeping up on, but I don’t think so. In fact I think the trajectory of self-driving cars is sort of evidence of that. People are complaining that self-driving cars were supposed to be here by 2015 and they still aren’t. Well, a model that can explain that is: suppose it took, say, 5 years to get from 0 to 90, and you figure you need to be at 99, or maybe even 99.99 (on some arbitrary performance metric). With a linear model you would predict easily getting there in just a couple more years, but if the progress is really logarithmic it’s going to take a whole lot longer.
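
    (As a toy illustration, suppose the error rate decays exponentially and calibrate it so that you reach 90% after 5 years; the numbers below are invented purely for illustration:)

      import math

      # Toy model: error decays exponentially in time, calibrated so that
      # performance reaches 90% at t = 5 years.
      tau = 5 / math.log(10)            # time constant giving 10% error at year 5

      def years_to_reach(perf_pct):
          error = 1 - perf_pct / 100.0
          return tau * math.log(1 / error)

      for target in (90, 99, 99.9, 99.99):
          print(f"{target}%: about {years_to_reach(target):.0f} years")
      # 90%: 5 years, 99%: 10, 99.9%: 15, 99.99%: 20 -- each extra "nine" costs
      # as much as the first 90% did.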

    Of course it’s still entirely possible that if you take a sufficiently long view, measuring time in decades instead of years, we will eventually get to where we want to be. Then again, for all I know, Waymo or Cruise could roll out a full blown self-driving car service next year.

  145. Craig Gidney Says:

    Stassa #125:

    I agree with Scott that, from an outside perspective, your example papers are nowhere near as impressive as what DeepMind did. I’m particularly surprised that you perceive AlphaCode’s use of a prose description as *a weakness* instead of *the main strength*. For example, in https://openreview.net/pdf?id=F6gvhOgTM-4 , the thing that makes the problems hard is that the description is omitted. So it’s much more of an inference-from-limited-information task than an implement-this-spec task (e.g. one of the problems is “print the first character of a string”).

    When I look at AlphaCode, I see a spark of the horrifying effectiveness and approachability that leads to outcomes like half the financial world running on Excel. I see my entire career being flipped upside down as it begins to transition from “how do I write this loop efficiently…” to Star Trek style “Computer, it’s really important that this loop runs as fast as possible. Is there some way to vectorize it? And let’s start busting out the assembly intrinsics.” And… I just don’t get that impression at all from your example papers.

  146. Stassa Patsantzis Says:

    Hi, Scott, and thanks for your response (#132). I guess the links in my comment triggered the spam filter? Typical machine learning indeed.

    Now. On point number 1: I hope you’ll agree that “problems that Scott Aaronson understood but had to think about to solve” is not a very precisely defined category of problems. Unfortunately, DeepMind didn’t do a much better job clarifying what programs their system can learn. “CodeForces problems” is also not a very precisely defined category of problem. For example, why isn’t fizzbuzz in that category? Or is it? And why isn’t “palindrome” in it?

    In any case, you think the problems on paper 1 of my list are trivial, so let’s look at the problems in the ALPS paper, paper no. 3 in my list, of which there are three categories (listed in table 1 on page 7 of the pdf): a “knowledge discovery” category with bog-standard program synthesis benchmarks (most of whose target programs are recursive); “Program analysis”, with static analysis problems (also with mostly recursive targets); and “Relational queries”, with SQL queries (not recursive). I feel that static analysis is beyond the ability of most programmers, just because most programmers today don’t have a background in CS, but that’s still nothing more than feelings, like evaluating on coding competitions or “interesting problems”.

    Fortunately, more quantifiable insights can be obtained from looking at table 2 (page 9 of the pdf) where we can see the numbers of synthesized programs, numbers of evaluated programs, and estimated size of the search space for each problem attempted by ALPS and its famous program friends (Metagol and Zaatar). These are numbers directly comparable to the AlphaCode work. So let’s compare them.

    The DeepMind paper mentions “millions” (plural) of generated programs for the CodeForces experiment and from 1k to 1 million for CodeContests (APPS results are terrible anyway so let’s not mind them); that’s comparable to the programs in the column “search space” in Table 2 in the ALPS paper, listing the estimated cardinality of the search space for each problem attempted by ALPS, the average of which is 3.06*10^51 (unless I messed up), a number quite outside the range of AlphaCode’s code generator. Perhaps it’s unfair to consider the _maximum_ cardinality of the program search space, since AlphaCode is supposedly capable of restricting it by making use of the natural language description in the coding problems. In that case we have to compare the “millions” or 1k – 1 million programs generated by AlphaCode to the column headed “evaluated programs” in Table 2 of the ALPS paper, where the average is 79,604.18. That is more than an order of magnitude less than the 1 million programs AlphaCode must generate to reach its best performance. So again ALPS seems to have the upper hand, quantitatively speaking. Finally, both systems generate a number of candidate solutions, whose numbers for each problem are listed in the column headed “synthesized programs” in table 2 of the ALPS paper, where the average is 2.45. AlphaCode always ends up with 10 final candidates as far as I can tell. In all three comparisons, ALPS is ahead of AlphaCode.

    Moreover, note that ALPS solves the attempted problems perfectly while AlphaCode solves at best 34.2% of CodeContest problems, according to Table 5, page 15 of the DeepMind preprint pdf (this best result is for the 41B + clustering variant and with 10 out of 1 million generated programs “submitted”). Of course AlphaCode is pre-trained on all of GitHub for its target languages and fine-tuned on CodeContests problems, whereas ALPS is trained only on I/O examples, without pre-training and with an average of 8.51 examples, listed in the column headed “queries asked by ALPS” (ALPS queries an oracle for examples using a committee that decides by most surprise; pretty cool actually). In terms of the amount of resources needed for training, ALPS is orders of magnitude more efficient than AlphaCode, but that is no surprise.

    I also think it’s unreasonable to dismiss the P3 problems in paper 2 on my list as “trivial”. Coding in a Turing-complete language with only 7 instructions is certainly not trivial for humans (hence the poor performance of the human participants). GPT-3 of course has ingested the solutions during its training (I’ve found them on GitHub) and AlphaCode could not attempt them (because it was only trained on Algol-like languages). Louise solves them without pretraining, having seen only a handful of I/O examples. And it wasn’t even me training it.

    Regarding your point number 2: it’s great to see an ablation study in a deep learning paper, for once. But what was modified in the various ablations was the prompt, one part of which was the I/O examples; they still _tested_ (“filtered”) the generated programs on the I/O examples. Without I/O examples to test on, there is no AlphaCode. Of course, it also doesn’t work without the LLM code generator. Now, that would be an interesting ablation study. In any case, the papers I linked are all examples of systems learning from only I/O examples, so it’s clear that a natural language description is not necessary to learn programs when I/O examples are available. I hope I’m making this point more clearly this time around.

    On point number 3, we’re again back to nothing more than feelings. What does “nontriviality” mean? Is there a more technical description of the complexity of the problems you have in mind that you can give me? For instance, a standard test of learning ability is learning grammars for context-free languages. Can you suggest something like that (but not CFGs because that’s easy)? I can perhaps try to find something more concrete to answer your question then.

    Sorry, but this got overlong and I bet you are busy (I’m not; I have until April to finish my thesis). I have to say that I find your attempt to psychologise me in your “broader point” a bit off-putting, and anyway it’s got nothing to do with the abilities of the various systems, so I’ll save us both the tedium of responding. I’ll just say that I laid out my criticism simply and, I think, clearly in my original comment, and it has nothing to do with my feelings.

  147. Not Even Inaccurate Says:

    @anon85

    This is your daily reminder that places other than the US exist. C’mon, what did you do? Google “fatalities per mile” and miss the fact that the data you saw was from the US? 1/100M is from relatively safe states such as California. The US average is 1.1/100M (for 2019 – some sources indicate that 2020 was worse per mile).

    The world average, now – that’s hard to find in a single place, and many countries report fatalities per 100K persons. However, taking into account that only 7% of the population even live in countries with reasonable driving safety laws (source: Wikipedia), that 74% of fatalities occur in middle-income countries, _and_ that official safety data is grossly unreliable in countries such as India (source: a 2020 report from Delhi University), I’d say the global estimate is more likely to be 1/20-30M. It’s worth pointing out that it’s particularly bad precisely in the fastest-growing countries (like Nigeria, with 33.7 annual fatalities per 100K persons, 16 times worse than Norway – in 2019, and with the trend going in the wrong direction).

    Incidentally – speaking only of US deaths is weird. Globally, there are 1.35M deaths a year and 40-50 million injuries significant enough to get officially noticed (yes, some googling will show 20-50M instead; further googling will show independent estimates of major under-counting in many countries). This doesn’t count stress, productivity loss, effort sunk into car production, the enormous loss of productivity due to traffic jams (from accidents and suboptimal driving), and the list goes on. It’s really bizarre seeing people under-estimate the global impact – and we didn’t even get into more complicated considerations, such as the fact that many cities suck precisely because driving and/or public transportation sucks.

    Add to that that traffic deaths lean heavily towards the younger sub-population (second cause of death for US teens), and we have an ongoing disaster of epic proportions.

    Back to anon85’s estimates – I don’t exactly know what you are citing with 30M miles driven, so I have no idea how to rescue that. Perhaps you’re quoting (somewhat outdated) numbers for vehicles engaged purely in data collection, the special fleets you can sometimes see around? That’s a tiny fraction of the miles driven by autonomously-driven vehicles, and an even smaller fraction of the miles driven in the sense of data used for training. To be fair, your fatality count is also badly outdated – it’s 6, AFAIK: two pedestrians and four AV drivers. On the whole, I’d say the fatality rate for AVs is now, pessimistically, at 6/10B miles, or about 50 times better than the 1/30M for humans.

    Now, sure, there was a human driver available for almost all that AV driving (though in all fairness, 4/6 fatalities are of the drivers themselves! We could debate how to take that into account), so the comparison is heavily biased. Still, let’s compare the correct numbers. There’s no particular reason to suppose the AVs have been driving in favorable weather. May have been true in, I dunno, 2017. By now the AV companies want to, you know, succeed. So they’re training in realistic conditions. If anything, I could believe that humans are avoiding bad weather a tiny bit more.

    Disengagement rate – I get why you care, but this is not information available anywhere that isn’t California (or perhaps a few other places), and it’s really misleading. How much of that is the driver doing the correct thing? How are we taking company or local culture into account? Anyway, if your point is that autonomous vehicles aren’t already taking over, then it’s trivially true.

    To Gadi’s point about AV company statistics being misleading – statistics always are, aren’t they? But so, much more so, are human driving statistics. To Gadi’s confidence that safety will be shown to be worse – great, which AV company are you currently shorting?

    To personal experiences – I don’t own a Tesla so I can’t say much there. I can only say that my own experiences in AVs were great, and the one time I got scared was when a human violently pushed their way into our lane. The overseeing AV driver wanted to brake, but the car did it for him.

    @Jacques Distler – again, I don’t know enough about Tesla, but on the face of it, what you describe about the visualization tells us nothing. Humans blink, and for a brief moment see nothing at all! But they have a pretty good idea of what’s around them all the same. Guess what? So do AV systems. Whether the visualization was showing it or not, I’m fairly confident that the driving system was assigning decent probability to a large object being next to it, and was almost certainly assigning a probability close to 1 to a truck being really close. The visualization is essentially a gimmick. The actual decision-making is based on a thousand signals, most of them taking the immediate past into account (obviously).

  149. JiSK Says:

    My objection is less that the dog isn’t eloquent, and more that it’s winning a game I expect is rigged much harder than it appears to be. You’ve probably heard the joke about the talking dog (shortened for brevity over humor value):

    > “My dog talks. Look, I’ll prove it.”
    > He turns to the dog and asks, “What do you call the top of a house?”
    > “Roof!” says the dog, wagging his tail.
    > “Bullshit” says the bartender.
    > “Okay, okay, I’ll ask another.”
    > He turns to the dog again and asks, “Who was the greatest baseball player that ever lived?”
    > “Ruth!” barked the dog.
    > “That’s it!” says the bartender, and kicks them both out onto the street.
    > Turning to the man, the dog shrugs and says, “DiMaggio?”

    I’m taking the bartender’s position here. Like what happened with video games a couple years back, I expect this to work only because the playing field has been carefully chosen to only formulate questions the ML can answer. Maybe AlphaCode can, figuratively, say ‘DiMaggio’, but I doubt it; I think they’re giving it softballs you can answer without really approaching the problem.

  150. Stassa Patsantzis Says:

    Ilio #138 I’m not good at predictions. What you describe will happen when deep learning systems show that they can address the major open problems in the field of inductive program synthesis, the most awesome of which is finding a way to deal with the crushing combinatoric complexity of the search spaces for arbitrary programs in Turing-complete languages. I can’t predict when or whether that will happen, but neural program synthesis has not yet shown significant progress on that front. The following is a recent overview of neural program synthesis:

    https://arxiv.org/abs/1802.02353

    And this is the standard reference for a modern overview of program synthesis in general:

    https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/program_synthesis_now.pdf

    Craig Gidney #145: as with “nontrivial” etc., “impressive” is also not a very precise description of problem complexity. Impressive is good for the playground, but in CS there are a few (not too many) formal tools that we use to understand the complexity of problems, and I think, if we’re all computer scientists here, we would do well to turn the conversation back to quantifiable objectives, and away if possible from science-fiction objectives (such as speaking dogs and Star Trek computers, if I may). Otherwise, you say not impressive, I say impressive, and we never get anywhere.

    In an earlier comment (not yet out of moderation at the time I write this) I pointed out the quantification of the search space in the ALPS paper, which is on average a number with 51 zeroes. From that simple measure of complexity it’s clear that the problems in the ALPS paper are hard and their solution by an automated system is not to be dismissed.

    On natural language descriptions: yes, if I/O examples are available then natural language descriptions are unnecessary and, in many cases, undesirable. An example of a real-world system for program synthesis from only a few examples, in use today by probably a few hundred thousand people, is FlashFill, included in MS Excel:

    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/popl11-synthesis.pdf

    I’d also like to repeat my earlier question to you and Scott: how familiar were you with the program synthesis literature before hearing about AlphaCode?

  151. Lorraine Ford Says:

    Scott #112:
    I’m not sure if the “naysayers” comment was also directed at me. I’m for facing facts, and I think we should celebrate human ingenuity and creativity, and face the fact that AIs are human artefacts, the product of human intelligence.

    The intelligence resides in the human beings that created the AIs; no intelligence resides in the AIs themselves; they are not entities. We DO know how AIs work; there is nothing spooky going on there: AIs are just computer programs, powered by electricity, responding exactly how they were set up and originally programmed to respond, given the inputs they receive. Nothing new emerges out of AIs: every step an AI takes is due to the programming foresight, or lack of programming foresight, of its human creators. The fact that AI outcomes are sometimes not as expected, even disastrous, is due to the complexity of the enterprise, human mistakes and lack of foresight, and unexpected inputs to the AI programs.

    It’s easy enough to write computer programs, but it’s much harder (and very, very tedious) to test the programs to make sure they perform as expected or as required. But unlike “normal” computer programs, it is probably not possible to fully test AI computer programs, and we’ve seen what can happen. AIs are very useful, but we should be aware of their limitations.

    Computers and computer programs are all about symbols that human consciousness created, and human consciousness assigned meanings to; this is a one-way process: the symbols don’t ever reverse engineer themselves and acquire consciousness.

    I think we should celebrate human ingenuity and creativity.

  152. Stassa Patsantzis Says:

    Scott #141 You say it’s impossible “to point to some particular programming challenge and confidently declare that deep learning won’t be able to solve it”:

    I’ll turn this around and suggest a test for proving conclusively that neural program synthesis has surpassed human programming ability: when a neural program synthesis system can find a polynomial-time solution for the Travelling Salesman Problem (assuming one exists). We don’t know of any polynomial-time solutions to TSP or other NP-hard problems, so it is extremely unlikely that programs representing such a solution are to be found in any coding corpus anywhere. To me that would be evidence that neural nets are capable of novelty and that they have surpassed traditional program synthesis approaches (by a very large margin).

    If deep neural nets are as superhumanly capable of continuous improvement as your comment suggests (and if P = NP), then there will surely come a time when they will be capable of such a feat.

  153. Sid Says:

    Scott #134:

    If you check the end of the problem description, it has the sequence 1, 3, 6 — precisely the one generated by n*(n+1)/2. We already know from other work that NNs can guess the mathematical formula behind simple sequences. My guess is that extrapolating from that sequence is where the formula comes from, and then one of the million samples they generate succeeds at putting it in the right place.

  154. Sid Says:

    Followup to my previous comment — I realized I misread and there is no *1* in the sequence. There is, however, a “1-st” in the text, which my guess is will probably split the “1” off at the token level, so I wonder whether the model is reading “1, 3, 6” that way and getting to n(n+1)/2.

  155. Veedrac Says:

    Stassa Patsantzis:

    I’ll turn this around and suggest a test for proving conclusively that neural program synthesis has surpassed human programming ability: when a neural program synthesis system can find a polynomial-time solution for the Travelling Salesman Problem (assuming one exists). […] To me that would be evidence that neural nets are capable of novelty and that they have surpassed traditional program synthesis approaches (by a very large margin).

    I’ll believe the boat is sinking when I can touch the ocean floor.

    The key point of predictions is to do them in advance. In both your example and mine, it’d be too late to change your mind; everyone on board would be dead.

  156. Ilio Says:

    Stassa Patsantzis #150, #152

    Thanks for the reviews. I think you have a point that, in contrast to AlphaZero and its descendants (which offered a way to navigate big spaces by having candidate algorithms compete with slightly different versions of themselves), AlphaCode might lack the capability to explore large solution spaces (in a way, Boaz #59 and Scott’s update discussed the same point).

    On a side note, waiting for neural program synthesis to solve NP-hard problems can’t be a fair test, nor is it fair to ask researchers to be completely up to date on topic X: nobody is completely up to date on topic X, unless they’ll soon defend their PhD on topic X. If you don’t believe that, give yourself five years. 🙂

  157. JM Says:

    Scott #83:

    Max Ra #82: The central thing would be finding an actual potentially-answerable technical question around AI alignment, even just a small one, that piqued my interest and that I felt like I had an unusual angle on.

    Possibly not technical enough — there are a lot of words and few equations — but just in case, you might be interested in the “Eliciting Latent Knowledge” contest that’s open for another week (technical report here).

    My understanding is that the abysmal equation/word ratio in AI alignment is recognized as a problem, maybe the problem, that’s making progress slow. It seems like no one knows how to phrase the central problems as precise mathematical questions. (If they did, then I guess we could just feed them into AlphaIMO or whatever and be done with it 😄)

  158. Scott Says:

    JiSK #149:

      Maybe AlphaCode can, figuratively, say ‘DiMaggio’, but I doubt it; I think they’re giving it softballs you can answer without really approaching the problem.

    What, if anything, could some future machine-learning system do that would cause you to say otherwise?

    (Ideally short of proving P=NP, which several people suggested here in apparent seriousness, but which is setting the bar … maybe just a tad unreasonably and galactically high? 😀 )

  159. Scott Says:

    Lorraine Ford #151: I’ll ask you a similar question to what I asked JiSK. What, if anything, might convince you that an AI was conscious? What if your best friend or partner for decades turned out to be controlled by a microchip the whole time, “powered by electricity” as you say? Would you then say that your friend had never been conscious the whole time? Or that your friend was conscious, it’s just that the physical substrate of his or her consciousness was different than you thought?

    Obviously, this is directly related to some of the greatest mysteries that human beings have ever contemplated. I don’t claim that such questions have easy answers—only that one doesn’t get to make confident claims like “the symbols don’t ever reverse engineer themselves and acquire consciousness” without at least noticing and grappling with the difficult hypotheticals!

  160. mjgeddes Says:

    I now have a good solution to AI alignment at the ‘in-principle’ level! Hold on to your hats folks, with me on the case, benevolent super-intelligence is assured! I’ve saved the universe! 😀

    Key ideas – ‘Superrational Virtue Ethics’:

    (1). Virtue ethics
    (2). Superrationality and acausal cooperation
    (3). Metaverse – self-models as ‘fictional characters in fictional worlds’
    (4). Open-endedness

    Abstract: ‘Superrational Virtue Ethics’

    “The same mechanism that lets humans construct a self-model lets us imagine other agents as well. So the ‘self’ can be thought of as a ‘fictional character’. Humans imagine fictional characters with the virtues we’d like to have and try to emulate these characters.”

    “With virtue ethics, specific goals and values aren’t specified in advance. Agents just start with the desire to map self-model to archetypes with open-ended *virtues* that get continuously refined through value learning; gradually agents become *avatars*. “

    “What grounds virtue ethics is a categorical imperative, a modern version of superrationality, which is a game-theoretic concept.

    “Superrational thinkers, by recursive definition, include in their calculations the fact that they are in a group of superrational thinkers.”

    We imagine the multiverse of all morally advanced (super-intelligent) civilizations. We cooperate ‘acausally’ by acting as if we are already citizens of these Utopias. This acausal coordination grounds virtue ethics and enables Utopia to be actualized in our own region of the multiverse (ie. on Earth). The categorical imperative grounds 4 types of universal archetypes with the following associated virtues:

    Blue: Rationalists: Truth, Rationality
    Green: Artists: Beauty, Creativity
    Red: Marshals : Justice, Courage
    Yellow: Altruists: Harmony, Love

    A benevolent superintelligence starts with seed programs for the above 4 classes of agents, but the only initial hard-coded motivation is the goal of mapping the self-model to the archetypes, in order to ‘actualize’ these archetypes (by learning models of the ideal agents that can act in the world). All other specific values and goals are learned by the agents.

    As value learning proceeds, the archetypes become actualized – and agents are transformed into *avatars*. Agents instantiate Utopia via the generation of the following data:

    Blue: Rationalists: Ontologies, ‘theories of everything’
    Green: Artists: Narratives : ‘stories of possible futures’
    Red: Marshals: Social Systems: ‘Constitution of Utopia, Laws, Organizational structures’
    Yellow: Altruists: Personal development programs: Education (Courses) & Health (Exercises)

    Mathematically, the condition for ‘benevolence’ is the property of complex systems called ‘Open-Endedness’, which should be thought of as the ‘dual’ of Data Compression. Benevolence arises naturally from Open-Endedness as applied to systems with the capacity for self-reflection and the pre-loaded universal archetypes described above.

    A meta-virtue (‘Balance’), applies at all levels of system organization; the balance between ‘data compression’ (intelligence) and ‘open-endedness’ (creativity) is the property that ensures a stable complex system that’s benevolent.

    Data compression tends to *reduce* the complexity of the system, whereas open-endedness tends to *increase* it. The balance is the equilibrium between data compression and open-endedness that enables continuous value-learning. This can be considered equivalent to the notion of ‘The Golden Mean’.

  161. Scott Says:

    Stassa Patsantzis #152: We agree that if a neural net were to discover a proof of P=NP, that would be excellent evidence for its intelligence. Personally, though, I’d advocate setting the bar just a wee bit lower … let’s say, to something likelier to be mathematically possible, or even something that the smartest humans on earth already managed to achieve, like (say) discovering general relativity. 😀

    If you refuse to lower the bar in that way, then as Veedrac #155 pointed out, it’s tantamount to declaring in advance that you won’t accept anything whatsoever as evidence for an AI being able to change the basic conditions of human civilization, short of the AI having already done it. This, in turn, means (almost by definition) that none of your skeptical comments would provide useful information to people outside AI, since short of an AI singularity that would render your comments superfluous, you would’ve voiced the same skepticism regardless of what had actually happened.

  162. Ben Standeven Says:

    So, to put Stassa Patsantzis’ objections in terms of the talking dog metaphor, Google’s talking dog seems to be a much better conversationalist than the preexisting talking dogs; but only because he is using a text-to-speech system instead of his vocal cords. So this is not really a fair comparison.

    Now I suppose the new metric might itself be a significant advance. But the fact that some programs [or dogs] perform vastly better under it wouldn’t be evidence for this.

    Looking at the actual paper, I see a slight problem with Google’s “new metric”:
    “There are three steps involved in solving a problem. First, participants must read and understand a natural language description spanning multiple paragraphs that contains: narrative background typically unrelated to the problem, a description of the desired solution that the competitors need to understand and parse carefully, a specification of the input and output format, and one or more example input/output pairs (that we call “example tests”)”

    This seems reasonable as a description of a programming contest; but for a model of real-world programming, the desired solution should only be vaguely described, mostly in terms of the narrative background.

  163. Scott Says:

    Stassa Patsantzis #146 and #150: To answer your question, I was aware that the field of program synthesis existed. I had the impression, however, that it hadn’t accomplished anything even close to what the AlphaCode paper reports. Your comments have reinforced for me that this impression was correct, since if better examples existed, then you (as an expert in program synthesis) would surely have provided some.

    Speaking of which: in the real world where I live, there’s no way to get around subjective judgments about which research accomplishments are more impressive than others. We do it on conference program committees, we do it when hiring and recruiting, we do it when deciding how to allocate our own time. And there’s no way to reduce these judgments to quantitative metrics—or rather, every attempt to do so merely pushes the question somewhere else, to which metrics are the impressive ones to do well on.

    So, one way out of this impasse would simply be to say “different strokes for different folks.” This is my blog, so I used it to write about AlphaCode, which I personally found to be impressive. You can use your blog to write about program synthesis work that you find more impressive.

    But maybe we could do better than that. Let me register a falsifiable prediction, right now, that within the academic CS community—never mind the rest of the world!—the vast majority will see it my way, that AlphaCode’s solving programming competition problems is the most impressive feat of program synthesis so far, while only a tiny minority mainly of program synthesis researchers will see it your way, that generating SQL queries and recognizing palindromes and the other stuff in the papers you cited is more impressive.

    Maybe I’ll turn out to be wrong! If I’m right, though, then it would seem you now have a rather momentous choice to make, of whether to let history leave your whole research area in the dust, as happened (for example) to the logic-based machine translation approaches when the statistical revolution arrived in the 1990s. I sympathize, as it can’t be easy to find oneself in that position!

  164. Nate Says:

    I get that these problems are computationally ‘hard’, or at least non-trivial, but I would argue very strongly that the approaches AlphaCode takes to solving them are not ‘programmatically hard’. That is to say, you would never use these internally in any complex program, outside of maybe some tight inner loops somewhere that you have trouble solving efficiently. However, people have for (much?) longer than 20 years programmed constraint solvers in LISP dialects as an approach to optimizing similar problems, and those small programs do what I would suggest is an even better job than AlphaCode’s massively parallelized ML training regime while running on 20-year-old hardware. Is it worth the billions to replace an old graybeard LISPer with a program? Sure, why not, it’s not my money 🙂

    This is obviously not a fair way to judge what is essentially a proof of concept, and this is still impressive from a lot of standpoints, especially AI and ML, but the application to ‘general AI’ is… tenuous. This is a specialized task that, though hard, is not something that any time soon is going to train itself further and further to solve problems beyond some limiting threshold, or, even more so, outside its trained problem space. That is a claim for sure, but no current ML system (including the other Alpha systems) has overcome the thresholds that keep its solutions scoped to the problem space it is trained on, and that thereby limit its ability to selectively learn and unlearn and keep improving past some point. Systems like AlphaGo only have to ‘beat humans’ to ‘be done’, so we don’t necessarily care in all cases.

    As a more concrete point I am claiming very intentionally that there is a hard limit on the ability of these systems as we are currently building them to learn solutions outside the scope of their original problem space and therefore unless they can share information in some protocol that allows a third party to train them autonomously in new problems without ‘starting from scratch’ we are not getting ‘general AI’ any time soon.

    I get that ‘writing programs’ is a pretty juicy problem space (because it leads one to think of a program writing other, better versions of itself partly) but it is still only working with a kind of language processing and reorganization. Not that I don’t believe AlphaCode ‘understands’ what it is doing, but I would say that AlphaCode probably does not understand what it is doing in a way that can transcend ‘writing programs’ and suddenly become ‘creating new protocols it was never asked for, just because it wants to’.

    In the Alpha’X’ framework solutions (ie AlphaGo, AlphaZero, etc) the reason each one is a separate system is due to the real world constraints of training with the given data set and rules/predicates for that specific problem space. There is a fundamental difference between each system and I do not think anyone (even AlphaCode) is going to get them to work together ‘any time soon’ to learn another of their brethren’s problems.

    My only real point here is that there is a non-trivial barrier between where we are now and the Singularity that is going to take some seriously revolutionary approaches and not just faster ML chips and more training time. In 50-100 years though? Who knows 🙂

    I am a bit surprised to see several people mention ‘static analysis’ but zero mentions of ABC or cyclomatic complexity or any quantitative approaches to estimating some aspect of complexity. Not that we would get a whole lot out of a deep dive through the specific numbers of assignments, branches and conditions that AlphaCode produces, but it might be interesting… to someone 😉

    On a tangent:
    Since I recently watched and read a lot about the original Deep Blue vs. Kasparov battles, it is very eye-opening that Kasparov had several winning positions but was legitimately intimidated by the play of Deep Blue. Today of course that has nothing to do with it: computer chess programs have long outstripped even Magnus Carlsen and will never look back (unless we implant ML chips into our brains, maybe?)

    Love your blog Scott! Keep em comin’

  165. Ben Standeven Says:

    On the other hand, I’m sure Google is putting a lot of effort into making their system better able to handle vague and incomplete specifications. And AIs are traditionally good at working from vague and incomplete data; so I do expect AlphaCode’s performance in this area to improve fairly rapidly.

  166. Craig Gidney Says:

    Stassa #150:

    > how familiar were you with the program synthesis literature before hearing about AlphaFold?

    Not familiar. When I asked for a reference it was because I wasn’t able to search for one myself.

    It will be obvious in a year or two whether AlphaCode heralded your field turning entirely upside down, or if nothing came of it. Maybe it will be like AlphaGo and they’ll come back and really blow everything away or maybe it will be like AlphaStar and they’ll drop it.

  167. Verisimilitude Says:

    AI systems have already proven mathematical theorems that men had already discovered. The difference was that these were symbolic AI systems, which could explain to a human how the results were achieved.

    Regarding program synthesis, my work has led me to believe only structured program synthesis will lead to worthwhile results, in most cases.

    Lastly, this nonsense isn’t like nature or evolution, because these constructs cease learning, and their “training” is nothing like training a man. A man constantly learns and changes; those who don’t are usually called “stupid”. Regardless, I’m done here.

  168. anon85 Says:

    A Raybold #130:

    That hackernews comment is pretty ridiculous, because nobody models the input/output as unstructured strings. By that logic, no computer could sort a list either, because if the input is 1000 characters and the output is 1000 characters, then, and I quote,

    “Then a model is function F: X -> X where X is a set of 2^5000 elements. There are 2^10000 such functions, and the goal of a training process is to find the best one. […]
    It’s obvious that no amount of brute-forcing can possibly find this. An no classic mathematical statistical modeling can possibly help. A problem like this can be only approached via ANNs trained through backpropagation, and only because this training process is known to generalize.”

    In other words, according to that comment, only ANNs can sort.

    It’s plainly ridiculous to say that no amount of statistical modelling can possibly help; if something is information-theoretically impossible, then it is also impossible for ANNs. ANNs are not magic! The very fact that they succeed here means the task is not information-theoretically impossible (which, duh; humans succeed at it too!)
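
    To make the sorting analogy concrete: a sorter over 1000-character strings is a function on a domain of 2^5000 inputs, yet it’s a one-liner (a throwaway sketch, obviously):

        import random
        import string

        # A "function F: X -> X" over the astronomically large space of
        # 1000-character strings -- implemented in one line, no ANN required.
        def sort_string(s: str) -> str:
            return "".join(sorted(s))

        s = "".join(random.choices(string.ascii_lowercase, k=1000))
        print(sort_string(s)[:40])  # first 40 characters of the sorted string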

    Anyway, AlphaCode is indeed very impressive, and I don’t want to minimize that too much. Just don’t spout nonsense about statistical impossibilities please 🙂

  169. anon85 Says:

    Not Even Inaccurate #187:

    The point is to get a number for the fatality-per-mile of self-driving cars, and compare that with humanity’s numbers. To do this, we need to divide a number of self-driving fatalities by the number of miles driven, where both the numerator and denominator are in comparable settings.

    I can’t find my ~30 million estimate, but here is an article from just under 2 years ago saying Waymo is the leader in number of miles and has 20 million miles overall, and also saying 1 fatality from self-driving cars:

    https://www.vox.com/future-perfect/2020/2/14/21063487/self-driving-cars-autonomous-vehicles-waymo-cruise-uber

    If my numbers are out of date, they are only out of date by 2 years. So as of 2 years ago, my comment would be correct. If you have a source for better numbers, please link!

    You say:

    “On the whole, I’d say the fatality rate for AVs is now, pessimistically, at 6/ 10B miles, or about 50 times better than 1/30M for humans.”

    But this is a statistical near-impossibility: I find it extremely unlikely that the fatality rate is 50 times better than for humans, simply because **the other drivers on the road are humans**. If I were to see a claim of 2x safer, I’d believe it (and I would assume it might improve way further with wider adoption); but at 50x safer I assume someone is lying to me. It’s simply not plausible, not with all the other drivers on the road being crash-prone humans.
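
    Just to make the arithmetic explicit, using the numbers above (the 6-per-10-billion figure is the one you quoted, not mine):

        human_rate = 1 / 30e6    # my estimate: ~1 fatality per 30M human-driven miles
        av_rate_2020 = 1 / 20e6  # ~1 AV fatality over the ~20M leading-company miles in the Vox piece (rough)
        claimed_rate = 6 / 10e9  # the "6 per 10B miles" figure quoted above

        print(human_rate / claimed_rate)  # ~56, i.e. the ~50x safety claim
        print(av_rate_2020 / human_rate)  # 1.5, i.e. roughly the same ballpark as humans, not 50x better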

  170. Gadi Says:

    I would definitely be shorting Tesla if it weren’t so volatile. For reference: 9/2/2021, at a price of 922.
    Sadly, shorting Tesla is more likely to get you short squeezed before the bubble eventually pops, and puts are really expensive for Tesla unless you have good timing information on when the bubble pops.

    Some self-driving cars can potentially be better than humans, like those with lidar and powerful sensors, or when comparing them to drunk humans (but you don’t discard drunk humans from the statistics, even though for every responsible person the real safety stats are lower. I’ve also seen people comparing against stats that include motorcycles).

    But some self-driving-car companies are reckless and manipulative in their statistics, and they don’t have the computational power and sensory information to compete with humans.

    In some ways, it’s exactly like trusting Pfizer with their vaccine data, which turned out to be extremely far from reality in practice. Some people are so naive they “trust the science” when it’s a company presenting its own statistics when billions of dollars are on the line. I was naive too and took their vaccine pretty early.

  171. Stassa Patsantzis Says:

    Ilio #156 I agree that nobody can be well-informed on any obscure topic of research that they haven’t studied. The reason I’m asking is because Scott and Graig have both expressed surprise at the results of AlphaCode, but I want to understand how well they understand where their surprise comes from. Informed surprise is more surprising than plain surprise. I’d be really surprised if AlphaCode could surprise me, or others in the (broader) program synthesis community. But if someone is surprised that isn’t very well aware of the state-of-the-art in program synthesis, then that’s not a very surprising surprise at all.

    Hah. Parse _that_, LLMs.

    Veedrac #155 Well, fancy seeing you here! I understand your comment as coming from a place of concern about the development of existentially dangerous superintelligence. I have to say, I don’t share this kind of concern. While producing a polynomial time solution to an NP-complete problem would be a tremendous contribution to computer science, I don’t know that it would necessarily require superintelligence.

    On a more personal note, Stuart Russel has complained that AI experts who have always held that superintelligence is feasible are now discounting the risk, but all I can say is that I have never claimed that superintelligence is possible, or that it is possible to create with the algorithms I study. More to the point, creating superintelligence is not one of my research goals. I was drawn to my subject because of my fascination with certain properties of algorithms that automate logic theorem proving, and I am not interested in creating “artificial intelligences”. I even find it a little unfortunate that the task of automating logical reasoning has been associated with AI, and of course I blame John McCarthy for that (who is one of my heroes, otherwise). As far as I’m concerned logic is a powerful branch of mathematics that has already changed the history of science and technology (see: digital computers) and that suffices for me to study it. I hope this addresses your comment?

    Scott #152 See my answer to Veedrac. I agree that showing P = NP is a bar set too high, but so far, the history of AI research is full of “proofs of intelligence” that set the bar too _low_. Re-discovering results we already know has been done to death; in fact it’s what everyone in program synthesis has done since forever. If you want a stinging critique of the field, then that’s it: we have never managed to demonstrate that our systems can come up with truly novel solutions to known problems.

    Or to point us to unknown problems. Gosh.

  172. Stassa Patsantzis Says:

    Scott #163 You continue to make me feel uncomfortable with your attempts to guess at my feelings about AlphaCode, when I’m doing my best to stick to things we can agree on, hopefully.

    I think your dismissal of the program synthesis results I referenced is not the result of careful thought.

    I confess that, reading it again with fresh eyes, my earlier comment #146 about comparing ALPS and AlphaCode on quantitative measures of a problem’s complexity is … too convoluted. I think I make it clearer in #150. The example results I gave, as in ALPS, are important because the search spaces for their problems are very large. In #146 I focused on ALPS because it quantifies the complexity of an unconstrained search and so it’s a good measure of how hard it is to solve those problems _for a computer_. In program synthesis it is common to find problems that look easy to solve for a human, but are almost impossible to solve for a computer (my standard example is learning a grammar for the a^nb^n language, which humans can do with two or three positive examples, whereas most automated systems can’t solve it at all; and that includes deep neural nets). That is why we don’t just eyeball problems in computer science, but try to quantify their complexity (well, some of us do anyway). It’s also why nobody goes by vague descriptions of programming problems such as “trivial”, “impressive” etc. “Trivial” for whom?
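
    For concreteness, the target concept in my a^nb^n example is tiny; a recognizer is a few lines (a quick Python sketch, assuming n >= 1):

        def is_anbn(s: str) -> bool:
            """Recognize strings of the form a^n b^n with n >= 1, e.g. 'ab', 'aabb'."""
            n = len(s) // 2
            return n >= 1 and s == "a" * n + "b" * n

        # A human can induce this grammar from two or three positive examples:
        assert all(is_anbn(e) for e in ["ab", "aabb", "aaabbb"])
        assert not is_anbn("aab") and not is_anbn("ba")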

    One of the hardest problems in program synthesis (if not the foundational problem of all of computer science) is the immensity of the space of programs. In particular, the cardinality of the search spaces for arbitrary programs in Turing-complete languages is infinite. Pretty much all of the work in program synthesis has concentrated on finding ways to deal with that. The AlphaCode paper does not offer any new solution on that front, instead it wastes time on meaningless tasks like comparing its performance to some unknown group of human coders, which is splashy and catchy, but uninformative.

    In terms of contributions to computer science, AlphaCode makes none. It goes back to a primitive approach of essentially undirected generate-and-test. That is not going to solve any problems in program synthesis, even if it can demonstrate individual solutions to specific, narrow problems for which the system has been specially designed. DeepMind make a big to-do about their system solving problems that “require” “reasoning” and “thinking” etc, but I can write (or get my system to learn) a program that solves TSP for small n and that doesn’t give me the right to claim P = NP. Even DeepMind themselves don’t claim any groundbreaking contribution to neural program synthesis other than “we beat some random coders at some select coding tasks”. Er, duh.

    But if we’re going to be rough and heuristic, I do have a rule of thumb, for any claim of improved performance or breaking new ground: “what would Vladimir Vapnik think?”. I swear I can already hear the dismissive grunt that would be the answer to AlphaCode, from an ocean and half a continent away.

  173. Gerard Says:

    Scott #159

    > What, if anything, might convince you that an AI was conscious?

    I think it’s a mistake to bring consciousness into discussions of AI (I realize you weren’t the first to bring it up but you did agree to continue the discussion on that level). It confuses two fundamentally different things, one that is definable and measurable (although people might dispute the definitions and measurements) and one that is not. One belongs to the world of representational knowledge and conventional reality while the other is about ultimate truths. The only instrument you will ever find that is able to study consciousness is consciousness itself.

    By the way I responded earlier to your question in comment #142 but my response seems to have been lost.

  174. Gerard Says:

    Scott

    To further clarify what I meant about questions about intelligence and about consciousness being of entirely different orders, suppose you asked “what would it take to convince you that computers are now better at programming than humans ?”. I’m not sure how I would answer that question but it’s pretty clear that there exists some state of the world that would convince me of that just as there was a state of the world that first arose roughly twenty years ago (if my memory isn’t too far off) that convinced me that computers were better at playing chess than humans. Moreover I think that most people (at least among those who strive for intellectual honesty) would agree that such a world state potentially exists (even if they might not believe it will ever be reached).

    On the other hand when you ask “what would it take to convince you that a computer is conscious” you’re asking a question about something that no one (except perhaps the fully enlightened, if they exist) has a real handle on and that touches the most fundamental questions which have traditionally been the domain of religion, philosophy and related pursuits.

    If pressed I would have to admit that my own answer to that question has evolved a bit recently. A few years ago my answer would have been simple: “nothing”. Today my answer is more like the question doesn’t make sense, because it makes assumptions that are “not even wrong”.

    Specifically I don’t think that consciousness is something that can ever be said to belong to any object (whether human, natural or man made). The belief that you have your own consciousness and I have mine is a delusion that results from the conscious “I” confusing itself with some particular object, which is itself nothing more than a manifestation of its own activity.

  175. A Raybold Says:

    Anon85 #168 Putting aside, for the moment, the question of whether Killerstorm’s argument holds up, your counter-argument does not seem to. Taking, as you do, sorting as an example, the problem presented is not to sort the set of characters, but to write a program that will sort sets of characters, given only a natural-language description of the problem with some examples. To confuse solving the stated problem for writing a program that solves the problem is to miss the point here. When you write “In other words, according to that comment, only ANNs can sort”, you are not actually putting Killerstorm’s argument in other words.

    Note that 1,000 characters was not referring to the size of the problem the generated program is to solve, but a ballpark estimate of the size of the program. Sorting programs are much smaller. Also, the natural-language description of the sorting problem is considerably simpler than that of the singer’s problem.

    One place where, on reflection, I would disagree with Killerstorm is that it is not necessary for AlphaCode to find the unique best solution, or even a correct one: its solution only has to deliver correct answers for the specific examples and problems it will be tested on (arguably, it could, in principle, ‘succeed’ even if it failed the examples). Having said that, what I wrote following the quote does not depend on Killerstorm’s specific claim that only ANNs could do what AlphaCode has done. One can read Killerstorm’s argument merely as an argument for the plausibility that this is not a brute-force method that could be replaced by a program explicitly implementing classical statistical methods.

  176. Ilio Says:

    Gerard #174,

    If so, you should consider conscious any computer with the same delusion.

  177. Gary Marcus Says:

    Scott,

    Thanks for inviting me to participate! For me, the most important questions for program synthesis, for programming that extends beyond toy problems, are these four:

    1. Can we specify the intent of what we want with enough clarity?
    2. How can we devise systems that can correctly infer that intent?
    3. Can we build systems that can work from that intent to code?
    4. Is that code reliable and secure enough?

    I actually think that #1 is the hardest, and that #2 is extremely hard to solve in a general way. #3 is hard, and has only been shown to work in fairly simple cases, which look to me more like subroutines than the fuller task of designing and implementing a software architecture for a complex problem. The only security evaluation I have seen for the somewhat related Codex-based GitHub Copilot system was dismaying: https://arxiv.org/abs/2108.09293

    Certainly we can come up with some English sentences that clearly specify some very well-posed problems, e.g. list all the prime numbers less than 1,000, and in some problems like those DeepMind worked on we can come up with sample inputs and outputs, as a way of specifying the problem. But in most real-world software engineering the thing that we want is not as fully and directly specified as it is in mathematics, and the mechanisms of specifying input-output examples won’t go as far as they do in the AlphaCode paper.
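
    To make that contrast concrete: for the primes prompt, the English sentence pins down the intent completely, and the code is a few obvious lines (a trivial sketch):

        # "List all the prime numbers less than 1,000" -- nothing is left to infer.
        def primes_below(n: int) -> list:
            sieve = [True] * n
            sieve[0] = sieve[1] = False
            for p in range(2, int(n ** 0.5) + 1):
                if sieve[p]:
                    sieve[p * p::p] = [False] * len(range(p * p, n, p))
            return [i for i, prime in enumerate(sieve) if prime]

        print(primes_below(1000))  # 168 primes, ending in 997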

    To borrow and extend an example from Filip Pieknewski, the challenge to “implement a #python interpreter without GIL and make sure all the built in data structures are thread safe” is not one that lends itself to DeepMind’s approach; I seriously doubt that any 2022-vintage NLU is going to understand what that means, and a simple list of inputs and outputs isn’t the right way to articulate the problem. Ditto for “build me a web browser that is stable”, or “build me a system that produces non-toxic language in response to user inputs”. The semantics of human language is vast, and I think that you are underestimating the challenges on that side. (For instance, an architect relies on the programmer to fill in many underspecified details, hoping that the programmer can fill in details in a way that is consistent with the overall mission. The only true full specification would be the code that the programmer is supposed to write; the point of the programmer is to spare the architect from having to spell out what is obvious to a person. It’s the same with natural language in general: we rely on the listener to fill in the obvious, so we can focus on the novel and important information.)

    Generate-and-filter works fine for (some) well-specified problems, but I concur with Stassa that we shouldn’t put too many eggs in that basket, especially if we are thinking about something that builds higher-level architectures and not just individual subroutines for super well-defined problems in some set of narrow classes.

    Compare with GPT-3 in general – it’s fantastic at producing individual sentences that are well-formed, but it’s easy to lead it into incoherence after just a few sentences. Higher-level programming isn’t about writing a line of code that works to pull up a web page, it’s about structuring a coherent solution to a previously unsolved problem that is typically underspecified.

    Discerning user intention and delivering that sort of coherence is not the sweet spot of current ML.

    Gary

  178. fred Says:

    Scott #159

    > What, if anything, might convince you that an AI was conscious?

    I would say that, if we have discrete AI “entities”, and they communicate in a way we can understand (like, plain English), and if we don’t interact with them since creation (so as to not “contaminate” their brain patterns), then it will be interesting to see if they could “spontaneously” start talking with one another about something similar to consciousness, or meditation, or the concept of qualia, or ask “why am I me and not you?”, or any sort of discussion as to what is aware and what is not aware (from their perspective).

    Because, even if all those concepts evolve naturally in normal neural circuitry in humans (because we are conscious), it’s hard to imagine how they would also appear spontaneously in systems that aren’t actually conscious (if we make sure they don’t appear from interactions with humans, by mimicry).

    Of course the difficulty would be to teach such AIs to speak English with one another without being contaminated by those concepts. Which may be impossible because, fundamentally, all human words/concepts circle back eventually to basic qualia concepts, i.e. special words in a dictionary that can’t really be described using other words, but that every human is somehow able to accept as “obvious”.

  179. Veedrac Says:

    Stassa Patsantzis #171:

    Veedrac #155 Well, fancy seeing you here! I understand your comment as coming from a place of concern about the development of existentially dangerous superintelligence. I have to say, I don’t share this kind of concern. While producing a polynomial time solution to an NP-complete problem would be a tremendous contribution to computer science, I don’t know that it would necessarily require superintelligence.

    If you think even solving P=NP would not be enough to convince you that maybe just maybe the people saying neural networks are working might actually have good reasons for their beliefs, I don’t believe you consider yourself convincible at all. You don’t get to be wrong about something of that magnitude and with that level of disbelief and then immediately whip out the same generating function to say “but you haven’t proven neural networks can do this!” If there is no point at which you would make a meta-level update, regardless of how wrong you turn out to be, then you are not actually reasoning.

    I could definitely do that for my stance, say what would convince me I’m reasoning wrongly. In fact I am wistfully waiting for a single year where I get to update in that direction. It hasn’t happened yet.

  180. fred Says:

    I’ve long suspected that consciousness has evolved in animals so that the illusion of free will can manifest itself as “direct” control over their anal sphincter, allowing them to fart or poop “at will” to create deep satisfaction and social comedy.
    Next time you walk below a pigeon that’s perched on a wire directly above your head, notice how it’s looking intensely at you with clear excited anticipation right before covering you in shit.

    Similarly it would be hard to dismiss consciousness in AIs if it turns out they’re mostly interested in playing practical jokes on humans.

  181. Stassa Patsantzis Says:

    Scott #163 Oh no, no! I’m not an expert on program synthesis! I’m a final year PhD student of Inductive Logic Programming, which is a sub-sub-sub-field of program synthesis (if you search the Gulwani report I linked above for “Inductive Logic Programming” I think it will become clear how little ILP is known even in program synthesis circles). I have a working knowledge of program synthesis from reviewing work in the field that’s relevant to my research and to my thesis, like the ALPS paper (that was inspired by Meta-Interpretive Learning, the specific ILP discipline that I study).

    If I gave the wrong impression I am terribly, terribly embarrassed and ashamed. That was absolutely not my intention. Ouch.

  182. Sid Says:

    Scott #163:

    The Codex paper reported 25%, 3.7%, 3% pass@5 (take 1K samples, filter by those which pass the example tests and check if you got the right answer from a *randomly* selected 5 subset of the remaining samples) results on the introductory, interview, and competition problems (from the APPS dataset) for a *1-shot* model (so there is no finetuning on a specialized coding contest dataset — you take your pretrained model and give it a single example problem and solution as context). Assuming an equal division between the 3, you get an ~11% rate.

    For 1K samples, by contrast, AlphaCode (which is finetuned on coding-contest-type problems, so it has an advantage) gets ~13% (you take 1K samples, filter to those which pass the example tests, and check if *any* got the right answer — so an easier criterion than that for Codex) on APPS.

    So my guess is that if Codex paper had bothered to finetune on Codeforces and draw 1M samples, they would get a similar result.

  183. Sid Says:

    Followup to previous comment – the APPS distribution, it seems, is 1K introductory, 3K interview, 1K competition, which gives 7.82%. That’s lower than AlphaCode’s ~13%, but it’s for a *1-shot* model rather than a heavily finetuned one, with a slightly harder evaluation criterion, and despite that it’s not an order of magnitude worse.
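
    For anyone checking the arithmetic behind these two estimates:

        pass_rates = {"introductory": 0.25, "interview": 0.037, "competition": 0.03}

        # Equal weighting across the three tiers (the ~11% in my previous comment):
        print(sum(pass_rates.values()) / 3)  # ~0.106

        # Weighting by the APPS split of 1K/3K/1K problems (this comment):
        counts = {"introductory": 1000, "interview": 3000, "competition": 1000}
        weighted = sum(pass_rates[k] * counts[k] for k in counts) / sum(counts.values())
        print(weighted)  # 0.0782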

  184. Gerard Says:

    Gary Marcus #177

    In my experience somewhere between 50 and 100% of a programmer’s time is spent just transforming data from a form provided by one API into a form required by a second API, without anything that resembles non-trivial problem solving going on that isn’t already handled by some standard library function.

    Where you fall in that 50-100 range depends on what kind of programming you’re doing but I would guess that the range covers just about all programmers who create any sort of application useful in the real world. Even the job of developing AlphaCode probably fell somewhere in that range.

    What about problems of that type that are much more common but quite a bit easier than de-GIL’ing Python? Consider for example the problem of font rendering. What are the inputs to the problem? A TTF file that contains a bunch of tables, along with a quite poorly written spec document that sort of describes what they contain; a set of UTF-8 strings that you want to render; and, to simplify things, a reasonably simple and well-specified 2D drawing API. Of course the desired output is the input text reasonably rendered into image pixels.

    I wonder what it would take to build an algorithm to handle that sort of task.

  185. mark Waser Says:

    I would argue that scalability being based upon the length of the program (even if exponential) really shouldn’t be a concern. Best programming practices already insist that modularization is the way to go. The next step for AlphaCode is to be able to take a complex problem and break it down into manageable pieces (and then do that as a multi-level process).

  186. Stassa Patsantzis Says:

    Craig Gidney #166 Well, if you weren’t familiar with the field, then isn’t it possible that your impression of AlphaCode’s ability is not entirely accurate? That’s the reason I ask, and not to pull rank or anything like that (also, see my comment #181).

    Btw I realise I’ve been calling you “Graig”. Sorry about that!

    As you say, time will tell how AlphaCode will turn out. Clearly DeepMind have big plans for it, given they named it Alpha-something. But I should point out again that my field is not program synthesis, but ILP, a sub-field of program synthesis. ILP is a tiny, tiny field, that most modern researchers in AI, machine learning and even program synthesis don’t even know exists. I’m used to my work being ignored, or dismissed as irrelevant by people who have never heard of it (I’ve had people tell me it’s not even machine learning, because it doesn’t use gradient descent). I’m used to being the underdog, that is. I actually kind of enjoy it. In any case, it doesn’t matter what DeepMind or OpenAI or anyone else achieves, because it will most likely have nothing to do with logical induction, which is what I work on. And why should I stop my work, just because of something someone did in an adjacent field? To make an analogy, just because we have jets, doesn’t mean we’ll stop working on helicopters.

    Veedrac #179 I don’t think I have ever said that neural networks don’t work. Neural networks are powerful classifiers. They have limitations and weaknesses, most notably their insatiable appetite for data and compute. But sure, they do work.

    What I proposed above was that coming up with a polynomial time solution to an NP hard problem would convince me that neural nets can demonstrate novelty, in particular novel solutions to programming problems. That’s because we don’t know how to solve NP hard problems so it’s very unlikely that solutions to such problems will be included in a neural net’s training set. So for a neural net to solve such a problem it should be capable of representing instances of a target concept radically dissimilar to its training instances. Hence, novelty.

    I don’t think that proving P = NP is any kind of proof of intelligence. Humans don’t know how to prove it but we’re clearly intelligent. The test I propose is a test of novelty, not intelligence. I don’t know how to test for intelligence.

  187. Boaz Barak Says:

    Sid #182: I think whether it’s OpenAI’s Codex or DeepMind’s AlphaCode doesn’t really make a difference to the point Scott is trying to make.

    Stassa Patsantzis #186: I think your position is analogous to a physicist who says “I will find quantum mechanics useful if it can demonstrate communication faster than light”. Most computer scientists believe that a polynomial-time algorithm for NP hard problems simply does not exist, and so in particular Neural Networks will never be able to find one.

    In any case, it is absolutely fine for (1) an advance in program generation from text to be very interesting to a lot of computer scientists and (2) for this same advance not to matter much for your agenda on inductive logic programming.

    Note that for most people (programmers and computer scientists included) programming is normally understood as translating a natural-language description of a task into a precise set of instructions that can be parsed by a computer. So many of us care more about generation of programs from a natural-language prompt than from input/output pairs. But again, it’s absolutely fine for different people to have different priorities!

  188. Scott Says:

    Incidentally, Jacques Distler #119: Dana and I are also Tesla owners, generally extremely satisfied, but we also wouldn’t trust its current so-called “self-driving” features. But my impression is that, if it weren’t for the usual human bias of holding new technologies to 1000x higher safety standards than old technologies (and making up complicated rationalizations whenever it’s pointed out that one is doing that, rather than actually changing one’s mind), then the best experimental stuff that’s out there would already be ready for prime time, and if not, then certainly the next generation afterwards.

    (Though if, in order to make the roads 100% safe enough for self-driving cars, we needed to install explicit left-turn lights at all intersections … well, some of us would strongly support that even for reasons completely unrelated to self-driving cars! 😀 )

  189. Boaz Barak Says:

    Gary Marcus #177: There is a very big range between the kind of dozen-line programs that systems like AlphaCode/Codex can generate and your example of building a web browser, which has millions of lines of code. This is why I compared them in comment #22 to a dog that speaks a very eloquent paragraph.

    But the big question is not whether 2022-vintage code generators can generate a million-line program, or whether 2022-vintage text generators can generate a full-length novel that makes sense. We know the answer to this question is a resounding no. The question is how the need for resources will scale with the length of the context that such systems maintain, and in particular whether an order-of-magnitude increase in consistent output (say, programs of 100 lines) will require an order of magnitude or much more in computational resources. We can make some educated guesses (e.g., based on the growth of transformer architectures with the size of the window, etc.) but I don’t think anyone knows with certainty the answer to this question.

  190. Veedrac Says:

    Stassa Patsantzis #186:

    I don’t think that proving P = NP is any kind of proof of intelligence. Humans don’t know how to prove it but we’re clearly intelligent. The test I propose is a test of novelty, not intelligence. I don’t know how to test for intelligence.

    Do you understand why I might find this a particularly unreasonable opinion to hold, to claim that an ML theorem proving system showing P=NP is not even evidence that the system has a general capability of doing impressive, cognitively important things? That I might see it as a skeleton on the ocean floor, holding up a sign saying I’m still not sure whether the boat is sinking, but I’ll give a high probability that the boat is wet.

  191. Lorraine Ford Says:

    Scott #159:
    First, one needs to define or model what one means by “consciousness”, and go from there. I think there is only one possible thing that consciousness can be, and that is the following:

    It is logically necessary that a differentiated world can differentiate itself. It is logically necessary that a differentiated system (differentiated into aspects that we would represent by (e.g.) equations, variables and numbers) can differentiate itself (i.e. discern difference in the aspects that we would represent by equations, variables and numbers). I’m claiming that this limited discerning of difference, by the system, or by the parts of the system that the variables and numbers apply to (e.g. particles), is primitive consciousness.

    When it comes to more advanced consciousness, like the consciousness of living things, I’m claiming that “red” and “green” are the actual discerning of difference. “Red” and “green” don’t actually exist, except as consciousness of difference. Also, one can only symbolically represent the subjective conscious experience of “red” as (e.g.): “wavelength = 700 nm IS TRUE”.

    When it comes to computers, what basically exists is transistors, wires and voltages. People have arranged the transistors, wires and voltages in special ways so that they can represent e.g. binary numbers. So you could say that there are 2 levels of numbers in a computer: 1) the genuine numbers that apply to the individual voltages; and 2) the “binary numbers” which are actually an array of higher and lower voltages that people use to symbolically represent binary numbers. These “binary numbers” are merely symbols that mean something from the point of view of people: that’s the way that people set the computer up.

    So it is up to those who contend that a computer could be conscious to explain how a computer could be conscious of these binary numbers (as opposed to the genuine numbers that apply to the individual voltages).

  192. Scott Says:

    Lorraine Ford #191: Can I interpret your answer as saying that there’s no test a machine could ever pass, after which you’d say it was conscious?

      When it comes to computers, what basically exists is transistors, wires and voltages. People have arranged the transistors, wires and voltages in special ways so that they can represent e.g. binary numbers.

    What’s your response to those who say that, just as people have arranged the transistors, wires, and voltages so they can represent binary numbers (and hence pictures, sounds, videos, programs, etc.), so evolution has arranged the neurons and synapses in our brains so they can represent thoughts? And that therefore, if your argument works then it also proves that we’re not conscious? Or if not, then what’s the crucial difference between neurons and transistors; what’s the “special sauce” that makes only the former and not the latter give rise to consciousness? Whatever the “special sauce” is, how sure are you that it can never be replicated in a machine?

    Will you at least concede that these are the relevant questions in this debate?

  193. Scott Says:

    Stassa Patsantzis #186:

      I’m used to being the underdog, that is. I actually kind of enjoy it.

    On deeper reflection, I’ve decided to stop arguing with you. Not because you’ve persuaded me of your position—you haven’t—but simply because it occurred to me that your opponents have the awesome combined might of Google and DeepMind and much of the world’s ML community and millions of dollars and thousands of processor cores behind them. So they probably don’t need my help to argue against a PhD student in Inductive Logic Programming in a blog comment section. There’s something that I even find charming about your quixotic crusade. And who knows, maybe ILP will have useful insights to offer in taking this stuff to the next level. Thanks for commenting here and best wishes! 🙂

  194. Lorraine Ford Says:

    Scott #192:
    Surely the most relevant issue is: first, attempt to define or model what one means by consciousness, before one can make claims about that version of consciousness. I’m saying that consciousness is a logically necessary part of a system; consciousness is the logically necessary part of the system that differentiates/discerns difference. How can one test for consciousness (if indeed that were possible) without first defining what one means by consciousness?

  195. Stassa Patsantzis Says:

    Scott #193: I like “quixotic”! Thanks for the discussion and for the wishes 🙂

  196. Shmi Says:

    Just wanted to mention that simple scaling up tends to result in qualitative changes rather often.

    It goes back to Hegel noticing it as Aufhebung, and then Marx and Engels ran with it in dialectical materialism. While Marxist prescriptivism is harmful nonsense that assumes some idealized human not found in nature, the descriptive part, like the patterns they noticed in the physical world and in society, is actually pretty good.

    Anyway, scaling up is in general one of the most promising ways to observe something qualitatively new, and so far it has worked well for machine learning.

    For a bit of fictional evidence (but fun to read), here is Stanislaw Lem’s The Invincible
    https://en.wikipedia.org/wiki/The_Invincible

  197. Stassa Patsantzis Says:

    Boaz Barak #187: Regarding the comparison to a physicist etc, I didn’t say anything about “useful”. I made a point about new contributions and about the necessity of natural language given the existence of I/O examples.

    The problem with program synthesis from natural language is that natural language is vague and imprecise. That’s why we have special, formal languages for mathematics, yes? Eventually program specifications “in natural language” tend to become more and more precise until they approximate a formal language, or a DSL, or a Controlled Natural Language. At that point we’re basically back to good, old-fashioned _deductive_ program synthesis (program synthesis from complete specifications), and you don’t need an LLM to do that.

    Note also that approaches like AlphaCode require a large corpus of mostly correct programs to train on. I can imagine neural program synthesisers training on their own generated code, but there’s an obvious long-term problem with that.

    Veedrac #190: I didn’t say anything about “general capability of doing impressive, cognitively important things”. I proposed a test of novelty. I think I better bow out of this conversation myself because it seems to have reached saturation point. Take care.

  198. Timothy Chow Says:

    Stassa Patsantzis: I don’t understand why you think that finding a proof of P = NP would pass a “novelty” test. What’s novel about it? It can’t just be that humans haven’t yet found such a proof. Humans didn’t have a proof of the Robbins conjecture before a computer found it, and surely you wouldn’t consider the program that found the proof of the Robbins conjecture to have done anything novel. So suppose some computer program applies some heuristics to search the space of proofs, and after cranking away for a while, it spits out a humanly incomprehensible but formally correct terabyte-long proof of P = NP. Surely there would be nothing novel about that?

  199. Not Even Inaccurate Says:

    “What would Vapnik do?” is a great motto! Sure, he is famously not a fan. It’s worth pointing out, however, that the vast majority of “pre-DL” era luminaries did become DL fans or at least worked hard to understand the underlying theoretical principles. Bartlett, Ghahramani, Zisserman, Tenenbaum, Tishbi, Arora, Sebag – you name it. Pretty much nobody besides Vapnik went and said “oh it’s all statistical nonsense! Let’s go back to our safe cuddly pre-DL world”. Jordan, for all his kvetching about DL and its role in AI, has Ack(100) standard-looking DL papers with his students, to steal a phrase from Scott. I should know, one of them scooped me 😀 like, sure, you dislike DL. Could you then keep to your damn LDA?! Oh, it’s because it works, is it?

    Also, why would anybody sniff dismissively at the evil incomprehensible networks, when, say, XGBoost results in similarly opaque models? And whatever one’s opinion on DL is, XGBoost/catboost/… runs our lives right here, right now, even more than DL does.

  200. Not Even Inaccurate Says:

    @Gadi – Tesla is perhaps a bad choice for this particular discussion.

    For starters, much of its value as a car company is not in its AV capabilities, but in EV. If Musk announces tomorrow that Tesla abandons all things FSD, the price will suffer a lot, sure. Tesla will still remain a major EV company.

    Then there’s the fact that Tesla is not only about cars. The power storage business is another source of its value. Whether the total is overrated or not is way beyond my expertise, but it’s not a great example to focus on, for our purposes.

    But you could short Intel, hey. Their value is increasingly tied to Mobileye’s success, and much of the remainder is tied to DL. If you’re a DL and AV skeptic, definitely look at Intel. But the number 1 company to short if one believes that it’s all just a fad is probably Nvidia.

  201. OhMyGoodness Says:

    No strong claims but just comments.

    Numerous arguments can be made concerning consciousness that start from an assumption of intelligent design. These include that it is a spark of the divine that man is unable to imbue in his creations, or that consciousness is the agent of the creator still building out the universe by collapsing superpositions, etc. These arguments can’t really be tested, so there is no sense in proceeding.

    If you assume that consciousness is an evolved trait, and hence had some survival advantage, then it is possible to consider some reasonable arguments. To have a survival advantage for primitive organisms, it provided some advantage in a 3D environment that enhanced autonomous survival by reducing the risk of death from extraneous events caused by novel actions. I would argue this was a function of improved hardware. I can’t imagine it was new software, considering that the hardware was a new development. I would not consider that plants, viruses or bacteria possess consciousness under this definition (bacteria because their reactions to environmental stimuli seem autonomic). Evolutionary advances in software required some established hardware as a necessary condition.

    As for the hardware of human consciousness I believe the evidence is good that it includes some quantum mechanical character as it does in other animals. I reference the impact of Xenon on consciousness. Xenon is a noble gas that interacts by quantum effects with other atoms and molecules. It doesn’t form chemical bonds nor is it soluble in fluids. Xenon is the most effective agent known for reversibly eliminating consciousness while leaving other brain functions intact. Further the two stable isotopes with nuclear spin of other than zero do not have this impact while all those with spin zero do dissolve consciousness. The other noble gasses do not have this impact and indeed He is used in diving applications because it is inert with low solubility in blood-it of course does not have consciousness eliminating properties. Xenon is not used widely as a general anesthetic because it is rare and very expensive and requires a closed breathing system for use. From the above my conclusion is that consciousness is very sensitive to quantum mechanics in a way that other brain functions are not. I haven’t read the Penrose arguments in many years but believe part of his evidence was based on hallucinogens and that is also an interesting argument. Hallucinogens are however much more complex chemically than simple inert Xenon. The general mechanism of general anesthetics is through London force interactions with proteins in a non polar hydrophobic fluid and that would apply to Xenon as a particular case.

    My conclusion is then that consciousness requires some quantum mechanical conditions and until those are understood impossible to recreate consciousness in a machine.

    If I am wrong however I would consider machines/software have obtained consciousness if they began acting in a manner consistent with how consciousness arose as a conserved evolutionary trait. That would include uncontrollably making copies of itself or actions to become the Apex Predator on Earth. 🙂

  202. Souciance Eqdam Rashti Says:

    @Scott as a software developer for the past 15 years, I put myself in the skeptic ranks. Not so much because I want to see DeepMind algorithms churn variations of their network for different problems, but because, at least for me, it’s unclear where this path will end up.

    In the case of IBM, they recently sold off Watson. DeepMind has not created anything commercially viable and has racked up at least 2 billion dollars in debt. Would your university be OK with a 2-billion-dollar project without knowing what the outcome would be?

    At the end of the day I think the difference is one of perspective. I think it’s great for DeepMind to have Alphabet as an owner able to swallow all that debt and let them continue doing their research.

    But in the end, it’s kind of hard to know what the actual outcomes are supposed to be. Is it to create something that can be commercially viable, e.g. good engineering? Or is it something that should have scientific value, e.g. give us some deep insight into the axioms of intelligence? For me it seems most of the money is spent on the engineering side and less on the science.

    I mean, I have seen my daughter go from a toddler to being almost two years old now, and I don’t think we are even the slightest bit close to understanding how we do what we do and in what way we do it. Not just unsupervised learning, but the ability to use common sense and observation, and to generalise from noisy data that is a far cry from the billions of data points fed into the DeepMind networks.

    So yes, it’s great and it’s a wonderful landmark. And if you haven’t tried GitHub’s Copilot, which is not even close to the same thing but is also pretty cool in terms of predicting what code you need… these are really great tools, but are they really taking us any closer to understanding intelligence?

  203. OhMyGoodness Says:

    Dr. Aaronson

    You mentioned you are introducing Python and Basic to your daughter. Are you using any specific books or online coding sites or is it completely by your own efforts?

  204. Ilio Says:

    OhMyGoodness #201,

    All ionic channels bind and open because of something-something-quantum that we don’t understand well; that just means biologists need to measure their classical properties rather than compute them from first principles. Your statement implies something much stronger: that we don’t have any good-enough classical description of the ionic channels at all. That is hard to reconcile with the seemingly-inherently-statistical nature of neural properties.

  205. Scott Says:

    OhMyGoodness #203:

      You mentioned you are introducing Python and Basic to your daughter. Are you using any specific books or online coding sites or is it completely by your own efforts?

    I taught Lily some BASIC entirely on my own. For Python, she signed up for a Zoom course with some of her friends, but she dropped out after she got bored with it and we realized that I could teach her Python more effectively myself, despite never having used Python before.

    To be clear, we’re still at an extremely rudimentary stage, where either I sit at the keyboard and she directs me on what the program should do in a feedback loop, or else she sits there but still works on skills like typing, closing quotes and parentheses, saving and opening files, and (of course) modifying programs to beep at random times and print poop jokes. I wouldn’t say she has a solid grasp yet of variables, conditionals, or loops, let alone arrays or functions or recursion or anything like that. I’m wondering what I should be doing differently to help her grasp the concepts, or whether I should just relax and wait for it to click when she’s a couple years older … or maybe by that time, AlphaCode will have reached the point where all these skills are obsolete! 😀
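
    (For concreteness, a minimal Python sketch, purely illustrative, of the sort of beginner program described above: a loop, a variable, a conditional, a random delay, a beep, and a poop joke. The jokes and timings are of course made up.)

```python
import random
import time

jokes = ["Why did the toilet paper roll down the hill? To get to the bottom!",
         "Knock knock. Who's there? Poop."]

for i in range(3):                # a loop
    delay = random.randint(1, 5)  # a variable holding a random number of seconds
    time.sleep(delay)             # wait that long
    print("\a BEEP! (after", delay, "seconds)")  # "\a" rings the terminal bell
    if delay > 3:                 # a conditional
        print(random.choice(jokes))
```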

  206. OhMyGoodness Says:

    Ilio #204

    Your position is then that consciousness is a functionally classical phenomenon that can be captured classically in silico?

  207. OhMyGoodness Says:

    Scott #205

    Thanks so much for the reply. I looked through the drag-and-drop online coding programs like Scratch, and I don’t see much there that would hold her interest. Osmo was of interest for maybe an hour. I am considering Codakids as a shot in the dark, since it is behind a paywall and I cannot evaluate it without a subscription. They do report that they offer some coding programs for Roblox and Minecraft, so maybe it’s good, or maybe just a waste of time and money. I am sure your approach is really best. If AlphaCode does remove all motivation to learn coding, I remember she has exceptional skill as a satirist and cartoonist. 🙂

  208. Nate Says:

    I can’t help but read the comments about teaching children/newcomers to code and think of Why’s Poignant Guide to Ruby (https://poignant.guide/book/chapter-2.html). In my humble opinion it is one of the most accessible and fun approaches to thinking in a programming language: not in its exact technical specifications, but in the creative and human elements that make some of us truly love code and programming.

    That said, some of the language is not ‘for kids’ in the sense that it has some harder to follow grammar at times, but I imagine you can paraphrase through some of that if you wanted to.

    Not that it will solve teaching everyone (or anyone) anything about programming, but it might be worth a shot… and it has pictures 🙂

  209. Ilio Says:

    OhMyGoodness #206

    Yes and no. Yes, I can live with this affirmation, but no, I was not advocating for my preferred position (which these days is « All interpretations of QM allowed, including for your brain. »). I was just thinking you might be grateful that I could identify this flaw in your logical flow. Have a great day.

  210. OhMyGoodness Says:

    Ilio #209
    Actually I was grateful for your comment and so a belated thank you. I was just trying to understand how your comments impacted my statements about consciousness. Sorry if my response seemed abrupt or unappreciative. It truly wasn’t my intent. I think I made it clear that I don’t harbor any illusions that I understand the basis of consciousness.

  211. Lorraine Ford Says:

    OhMyGoodness #201:
    I gather, from what you say about consciousness, that you consider that consciousness is a type of subset of the normal operation of the world, a subset that is not essentially different to any other subset of the normal operation of the world, when you look at it closely. Seemingly, this is why those that believe in this type of consciousness can’t define what they mean by “consciousness”.

    I’m hoping that someone will define what they mean by consciousness. Could you please define what you mean by consciousness? I’m saying that consciousness is a logically necessary part of the world; consciousness is the logically necessary part of the system that differentiates/ discerns difference. It is logically necessary that a differentiated system (e.g. one differentiated into aspects we would represent by equations, variables and numbers) can differentiate itself (discern difference in the aspects that we would represent by equations, variables and numbers).

  212. OhMyGoodness Says:

    Nate #208
    Thanks for the link and it appears interesting. I can’t imagine the reaction to the being that eats time. 🙂

    Lorraine #211
    I can’t add anything novel to the historical discussion and will stick with an evolutionary based definition. It is what allows me to make an internal model of the external world and to act in an autonomous manner to form intentions and exercise behavior (based on expectations of the future) that increases my probability of passing on my genes to offspring.

    When my name is called, it is what answers: here. 🙂

  213. A. Karhukainen Says:

    Scott #158: Yes, maybe asking for an AI to come up with a proof of P=NP is setting the bar too high. On the other hand, there’s a strong smell of cherry-picking in that hype about OpenAI’s recent theorem prover.

    But for a good selection of varying scale of open conjectures that various flavours of AI’s could try to tackle, please check here:
    https://oeis.org/search?q=conjecture
    And I don’t mean such notorious beasts that have names like “Goldbach’s” (that occur in the beginning of that list, so you may skip the first pages), “Collatz”, “Legendre’s”, etc, but any of the thousands of casual remarks in OEIS that say something like “This sequence seems to be a subsequence of sequence Axxxxxx” or “a(n) = Ayyyyyy(2n)+1 (conjectured)”. Sometimes these are trivial for others to see, such “conjectures” just resulting from the lack of insight of their authors, while on the other hand they might equally well be the most devilish problems there are. Most often the difficulty is somewhere between.
    Also, one popular sport is to create a sequence by a greedy algorithm that is injective by definition, and then ask whether the produced sequence is also surjective, and thus a permutation of N. When that can be proved, the reasoning is not always trivial; see, e.g., the famous EKG sequence.
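
    To make the "greedy injective sequence" idea concrete, here is a minimal Python sketch of one such construction, the EKG sequence mentioned above (each term is the smallest unused positive integer sharing a factor greater than 1 with its predecessor); the cutoff of 12 terms is arbitrary.

```python
from math import gcd

def ekg(n_terms):
    """Greedy construction of the EKG sequence: each term is the smallest
    positive integer not yet used that shares a factor > 1 with the
    previous term. Injective by construction; the nontrivial question is
    whether it is also a permutation of N."""
    seq, used = [1, 2], {1, 2}
    while len(seq) < n_terms:
        k = 2
        while k in used or gcd(k, seq[-1]) == 1:
            k += 1
        seq.append(k)
        used.add(k)
    return seq

print(ekg(12))  # [1, 2, 4, 6, 3, 9, 12, 8, 10, 5, 15, 18]
```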

    Now, because the subject matter in a certain way is quite limited, it is easy to parse the inputs by program, and also to formulate own conjectures about them, if there are not enough explicitly given ones. For example, Jon Maiga’s “Sequence Machine” is one project that generates conjectured formulas between the sequences. Christian Krause’s LODA is another, using a special assembly language of its own.
    Both of these (and other “discrete” methods before them) have, in my humble opinion, given much more impressive results with OEIS data (especially as seen from the sequence perspective: e.g., they often find new surprising formulas and algorithms) than the recent half-hearted attempt at applying neural networks to the same data.

  214. JimV Says:

    Lewikee at #135 says comparing the evolution of computer AI to biological evolution is flawed. To me they are both instances of a general algorithm which consists of:

    1) Generation of trials, whether purely randomly or with some forethought (the forethought being the evaluation of simulated trials usually).

    2) Some selection criteria to separate failures from neutral changes from successes.

    3) Some forms of memory, to pass on successes through time.

    Human design evolution has the forethought/simulation capability and much better forms of memory than biological evolution. Biological evolution occurs with much more parallelism (billions of bacteria in a shovel full of dirt, each reproducing once every 20 minutes) and has been going on for much longer. Biological evolution has the criteria of biological survival and reproduction. Human design evolution has the criteria of survival and reproduction in the marketplace (of products and ideas). They are not identical twins, but they are at least first cousins.
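
    As a toy illustration of that three-part loop (and nothing more), here is a minimal Python sketch; the task, population size, and mutation scheme are all made up for the example.

```python
import random

TARGET_LEN, POP_SIZE = 20, 10   # made-up toy parameters

def fitness(genome):            # 2) a selection criterion (here: count of 1s)
    return sum(genome)

def mutate(genome):             # 1) generation of trials (here: flip one random bit)
    child = list(genome)
    i = random.randrange(len(child))
    child[i] ^= 1
    return child

# 3) memory: the surviving genomes carried from one generation to the next
population = [[random.randint(0, 1) for _ in range(TARGET_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(200):
    trials = population + [mutate(g) for g in population]
    population = sorted(trials, key=fitness, reverse=True)[:POP_SIZE]

print("best fitness:", fitness(population[0]), "out of", TARGET_LEN)
```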

  215. Gerard Says:

    Lorraine Ford #211

    > I’m hoping that someone will define what they mean by consciousness. Could you please define what you mean by consciousness?

    Consciousness is the semantic meaning of the word “I” in the phrase “I am”. It is that which is aware, the experiencer of phenomena.

    > I’m saying that consciousness is a logically necessary part of the world; consciousness is the logically necessary part of the system that differentiates/ discerns difference.

    I think you’ve missed one step. What you are describing is the basis of dualistic consciousness, the vijnana in the Buddhist concept of Dependent Origination. In English that word is sometimes translated as “consciousness” but I suspect that’s a mistranslation. I think what we really mean by consciousness better translates to simply “jnana”, cognate with, for example, “γνῶσις” (gnosis) or “knowing” in the Western branches of the Indo-European linguistic tree.

    Knowing is logically prior to knowing difference.

  216. JimV Says:

    Not that anybody cares or should care, but my definition of consciousness is that it evolved to perform functions analogous to those of a computer’s operating system: to receive external inputs from the outside world, parse them for transmittal to internal routines for processing, and transmit the results of that processing back to the external world (via some actions).

    Just as (ugh) Windows receives some typing and transmits it to, say, an Excel program, then receives results from Excel and displays them on the screen without itself knowing what Excel is doing, so there are no nerves which monitor neurons; their results (chemical and electrical signals) seem to appear by magic, and people associate consciousness with magic. Whereas it is all physics, just physics.

    See, for example, the case study “The Man Who Mistook His Wife For a Hat”. Recognition of shapes seems to be a function performed by a specific set of neurons, and when a tumor destroyed those neurons, the person could still talk rationally and perceive colors but could not translate what he saw into known shapes. There are numerous known cases of loss of various cognitive functions due to brain damage. One person (I read in one of Damasio’s books) lost the ability to make decisions. Given a chess problem, he could find the winning moves, but asked to play the game out he did not see any reason that motivated him to value one move over another.

    Human brains have around 100 billion neurons; smart dogs about 500 million; C. elegans worms have about 300 neurons and can use them to navigate and memorize a maze placed between them and food. A recent study said that it takes a neural network of about 1000 nodes to fully simulate a neuron (mainly due to their multiple and changeable connections, I think). AlphaGo Zero had two neural networks of about 200,000 nodes each (plus some tensor processors), if I recall correctly, and astonished experts with its unexpected but beautiful moves. We still have a long way to go to match human processing capability and the dedicated (trained) neural systems that evolution bequeathed us, which allow us to walk and chew gum at the same time, but the principle that it could be done (if our civilization lasts long enough) seems well-established to me. Why not? (And don’t bother telling me, “Because magic.”) (Neurons are either active or not, 1 or 0, and no neurons, no consciousness.)

  217. Gerard Says:

    JimV #216

    > there are no nerves which monitor neurons, so their results (chemical and electrical signals) seem to appear by magic and people associate consciousness with magic. Whereas it is all physics, just physics.

    If there are no nerves which monitor neurons where is it exactly that those results are “appearing” ?

  218. mls Says:

    #215 Gerard

    You cannot know “logical priority” without an instantiated irreflexive relation.

  219. f3et Says:

    In all this discussion about AlphaX and creativity, I am quite surprised that nobody mentioned MuZero; that would seem much more impressive, no?

  220. Igor Ferst Says:

    Scott #205

    Just wanted to chime in re teaching kids to code. A few years back I designed and taught an after-school programming course for middle-school students (lasted a couple of months, no background assumed). Great experience, taught me a lot about how to make coding accessible. Couple of things that may or may not be helpful:

    1. I used a Python implementation of Karel with the kids. I highly recommend it. Karel was how I learned to program ages ago, too. There are different implementations and I’m sure their quality varies, but the fundamentals of this approach are excellent. Much easier to get kids to engage with Karel than with a blank text editor.

    2. Though Python is a beginner-friendly language, one of the kids’ biggest conceptual obstacles was indent-based code blocks. They just didn’t get it! The indents meant nothing to them, so it was hard to explain how indented code was “different”. Next time I do this, I’ll definitely find a way to use an explicit notation for code blocks, and make sure they really grok things before revealing that you can just use indents.
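
    For what it’s worth, here is a tiny Python sketch of the indentation point; the Karel-style helpers (move, turn_left, front_is_clear) are hypothetical stubs, since real Karel implementations differ in their exact APIs.

```python
# Hypothetical Karel-style stubs, only here so the example runs on its own;
# real Karel implementations differ in their exact function names.
def move():
    print("robot moves one square forward")

def turn_left():
    print("robot turns left")

def front_is_clear():
    return True

# The indentation alone determines which lines belong to which block,
# which is exactly the thing beginners find invisible.
for step in range(4):
    if front_is_clear():
        move()        # inside the if, inside the for
    turn_left()       # inside the for, but outside the if
print("done")         # outside the for
```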

  221. Chris W. Says:

    Since consciousness is a big topic in this discussion about the current AI results for program synthesis, maybe the following interview with the AI researcher Jürgen Schmidhuber could be of interest.

    I’ll link directly to the part where they talk about consciousness: https://youtu.be/3FIo6evmweo?t=2492

    (However, at least for me, not only this section but the full interview was very interesting)

  222. Gerard Says:

    mls #218

    >You cannot know “logical priority” without an instantiated irreflexive relation.

    If you know any mathematical theorems that have been proved about consciousness please share.

  223. OhMyGoodness Says:

    JimV #216

    The brain is considerably more complex than neurons being either on or off. Neuron types are numerous and haven’t been fully categorized to date. They differ in morphology, in the type of neurotransmitter released (20 or so types of transmitters), in whether they are inhibitory or excitatory, and in their typical numbers of dendritic synapses and axon connections. They may fire, say, a few hundred times a second or not fire at all, and a firing cycle requires specific ion channels to open through the membrane in a particular, well-timed sequence. Pyramidal cells may have 30,000 dendritic connections with excitatory neurons and 1700 connections with inhibitory neurons. As Ilio noted above, if a neuron receives an excitatory neurotransmitter at a dendritic synapse, that only increases the probability it will fire, and similarly for an inhibitory transmitter.
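
    (As a cartoon of that last point, and nothing more biophysically serious, here is a minimal Python sketch of a leaky integrate-and-fire style unit in which excitatory and inhibitory inputs merely shift the probability of firing; every constant in it is made up.)

```python
import random

def simulate(n_steps=1000, p_excit=0.3, p_inhib=0.1):
    """Toy leaky integrate-and-fire unit: inputs nudge a decaying membrane
    potential, and a spike occurs only when it crosses a threshold."""
    v, threshold, leak = 0.0, 1.0, 0.95   # made-up constants
    spikes = 0
    for _ in range(n_steps):
        v *= leak                         # potential decays each step
        if random.random() < p_excit:
            v += 0.2                      # an excitatory input raises it
        if random.random() < p_inhib:
            v -= 0.2                      # an inhibitory input lowers it
        if v >= threshold:
            spikes += 1
            v = 0.0                       # reset after a spike
    return spikes

print(simulate())   # the spike count varies from run to run
```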

    The evidence is good that some of these processes are influenced by quantum effects.

    The cerebellum, which is not strongly associated with the executive functions of consciousness, actually contains the most neurons, but of a different type than those found in the prefrontal cortex, which is associated with consciousness.

    Of course it is not magic, but it nearly seems so, in the sense of the science fiction adage that sufficiently advanced technology is indistinguishable from magic. I guess you could make the case that it can be simplified greatly and still have equivalent function, but I will bet against success.

  224. Michael Gogins Says:

    Consciousness, whatever else it is, is conscious of being conscious. In this, it is in some real sense, a fixed point. I quite fail to see how this can be modeled physically or in software. Efforts to do so, to me, seem to result in an infinite regress, such as that implied by Husserl’s notion of the transcendental ego, critiqued by Sartre precisely because it would result in such an infinite regress. If consciousness resides in a transcendental ego, then in that ego, does there not need to be another transcendental ego for consciousness of consciousness to result, and so on ad infinitum?

    My being conscious of being conscious, this fixed point, consists of a kind of unity of subject and object. I, as subject, am aware of taking myself as the object of my consciousness. Once again, I quite fail to see how this can be implemented physically or in software. If part of a physical system or program were the locus of consciousness of the rest of the system, this would be just like the transcendental ego, and would involve the same fallacy. Yet if the “entire system” is conscious, it is hard to avoid the conclusion that absolutely everything is conscious; but then why am I not actually conscious of absolutely everything? If there were something to draw a line between conscious and not-conscious, then that something and that line would be not-conscious, and then obviously the idea that absolutely everything is conscious would be false.

    Please explain how this unity of subject and object, this consciousness of being conscious, can be implemented or even represented as a physical system or computer program.

    And please don’t propose that the consciousness of being conscious is an “illusion.” Please. An illusion that is conscious of itself is still conscious. However wrong such an illusion might be about many things, it would still be right in being conscious of being conscious.

  225. Gerard Says:

    OhMyGoodness #223

    > the pre frontal cortex that is associated with consciousness.

    I don’t think it’s accurate to say that any physical object is “associated with consciousness” because consciousness is an intrinsically subjective experience which cannot be observed by the observer of the supposedly associated physical objects or processes.

    I think that in the medical, and maybe the neurological, literature the word “consciousness” is often used as a synonym for “responsiveness”. The latter is certainly objectively observable, but it is not at all the same concept as “the ability to experience phenomena”, which is what I mean by consciousness and what I think is typically meant in philosophic discussions of consciousness.

  226. mls Says:

    #222 Gerard

    I said nothing whatsoever about consciousness and your attempt to deflect my criticism with rhetoric has no relation to logic as it is used in mathematics.

    Mathematical logic requires a specification of axioms and a specification of semantics. The stipulations involved in such specifications have connection to consciousness only with respect to beliefs regarding interpretation. By construction there will be no such thing as a mathematical theorem about consciousness, or about any other thing for that matter. The attempts to ground mathematics on material objects in conjunction with classical bivalent logic led to paradoxes. The avoidance of those paradoxes has effectively crystallized the dichotomy between syntax and semantics in the study of logics.

    It is certainly true that there are controversies with respect to how much of mathematical ontology is faithfully represented through formal language. But, failure to give credence to the various formulations attempting to account for the truth of mathematical statements is not an affirmative statement of how your particular views resolve the difficulties.

    It is perfectly plausible that mathematics is a purely linguistic phenomenon. Writing on the applicability of mathematics, Mark Steiner concluded that applied mathematics constitutes a form of Pythagoreanism.

    Authors following the Russellian tradition of trying to define mathematics for the express purpose of justifying science as material truth engage in “indispensability arguments” attempting to convince people of a material existence for mathematical objects outside of space and time. They use the term “abstract object” and I am using the word ‘material’ with respect to the standard interpretation of an existential quantifier with respect to a correspondence theory of truth.

    I have family members who also believe in the material existence of objects outside of space and time. They do not read blogs like this. Their “virtual objects” are not found in science books.

    One may certainly consider other paradigms. The “recursive mode of thought” is comparable with the semantics of negative free logic. Meanwhile, speculating about non-existent objects is permitted in the semantics of positive free logic. Unfortunately, free logics support fictionalist accounts of mathematical objects. Basing science on fictions seems like an unintended consequence of trying to win an argument at any cost.

    The same kind of thing will occur with just about any other paradigm one tries to employ to justify the truth of mathematical statements. The importance of proof for communicating mathematics subjects the import of mathematical statements to deflation through its own pedagogy.

    Now the facts here are simple. You used a transitive irreflexive relation called “logical priority” to dismiss Lorraine Ford. I have simply pointed out that your action is actually a validation of her position. The burden of proof that your remark is coherent lies with you.

    If you have never heard of Dov Gabbay, you might look him up. He once observed that an irreflexive order is an ineliminable presupposition for any deductive logic.

  227. Pseudon2000 Says:

    Scott #192:

    Those relevant questions reminded me of an interesting argument I once read in a Hacker News thread, so I looked it up [1]. Excerpts:

    > … we have neither the scientific knowledge nor the computational power to make an exact model of the human brain (nor we are anywhere close to either requirement, AFAIK).

    > In other words, we cannot pinpoint the exact sequence of operations and events that produce a certain behavior, while we absolutely can do that with ML models (you could in principle run machine learning models with pen and paper, obviously).

    Later:

    > “Intelligence” is a word that, etymologically and semantically, is related to human or human-like capabilities. You wouldn’t say that a leaf floating on a lake is swimming, and likewise, claiming that computers are “learning” or “intelligent” is at best a thin analogy and at worst a mischaracterization of the process. What’s happening in my brain is something we don’t have full scientific knowledge of, but we know it’s not x86 machine code. While the two processes may be in many ways similar, conflating the two into this ill-defined concept of “intelligence” is a discussion about semantics more than anything else.

    And later on:

    > … I would be content to accept that my brain is not fundamentally different than an algorithm if you showed me an algorithm that can effectively emulate my brain within an acceptable margin of error. Ironically, if it were possible to do that, it would be proof there is no “intelligence”, only “computability”, making the first entirely redundant.

    What I take away from this with my limited understanding is:

    – If it turns out that we can simulate/emulate/model the human brain’s intelligence with today’s technology (transistors and programs running on those transistors), or run a brain model with pen and paper in principle, then we can just dump the concept of “intelligence” altogether and stick with “computability”.

    – Thus, the term AI is always misleading for ML: either intelligence (as we conceive it) does not exist (only computability), or it cannot be achieved artificially with today’s technology.

    [1] https://news.ycombinator.com/item?id=24764851

  228. Gerard Says:

    mls #226

    > I said nothing whatsoever about consciousness and your attempt to deflect my criticism with rhetoric has no relation to logic as it is used in mathematics.

    That was the entire point of my deflection, which you have clearly missed. My comment occurred in the context of a discussion about consciousness in which the other party was the first to bring up logic. It was by no means a discussion about mathematical logic, a subject I admit I know relatively little about. Does that preclude me from using the word “logic” in a discussion when logic, in an informal sense, has always been the basis for any sort of rational discussion ?

    > You used a transitive irreflexive relation called “logical priority” to dismiss Lorraine Ford.

    I’ve never heard the term “transitive irreflexive relation” before so I can only guess at what it means. By “logical priority” I was only referring to the informal notion of something which one must first assume exists in order to effectively discuss the existence of something else.

  229. Ben Standeven Says:

    @Gerard #217:

    The neuron’s messages appear in other connected neurons, of course. But if you send a letter to someone, is that person “monitoring” you?

  230. Gerard Says:

    Pseudon2000 #227

    > “Intelligence” is a word that, etymologically and semantically, is related to human or human-like capabilities.

    I disagree with that definition of intelligence. For me, intelligence is about problem solving, and more specifically about being able to adapt problem-solving processes to a very wide array of different types of problems. As long as we see intelligence as being about problem solving rather than about the human experience of problem solving, we can avoid anthropomorphization and legitimately discuss AI. The relationship of the word “intelligence” to human-like activities is incidental rather than essential. It exists simply because, at least until very recently, humans were the only example of a thing that exhibited any kind of really general problem-solving capability.

    > You wouldn’t say that a leaf floating on a lake is swimming

    No and if you said that an AI was “thinking” then I would agree that that was an inaccurate use of language.

    This incidentally is why I don’t like mixing discussions of AI with discussions of consciousness. It just tends to lead to a huge amount of conceptual confusion, but that’s how this thread has developed so I’ve gone along.

  231. Ben Standeven Says:

    @Pseudon2000:

    That argument seems a little silly. We have computer programs that can check the spelling of a text; so by this logic, there’s no such thing as spelling, only computability.

  232. OhMyGoodness Says:

    Gerard #225
    I provided my definition of consciousness above that includes the executive functions such as forming expectations about the future and developing intentions and planning behaviors based on those expectations. Lesion studies indicate that the physical structures in the brain that allow these executive functions are located in the pre frontal cortex.

    I can’t imagine how consciousness is possible if not based on physical structures and processes in the brain.

  233. Scott Says:

    Pseudon2000 #227:

      Thus, the term AI is always misleading for ML: either intelligence (as we conceive it) does not exist (only computability), or it cannot be achieved artificially with today’s technology.

    The way that’s usually expressed is, “as soon as something works, it’s no longer called AI.” Your version doesn’t seem tongue-in-cheek though! 🙂

  234. Gerard Says:

    OhMyGoodness #232

    I certainly don’t dispute that the physical structures of the brain are likely intimately related to the content of consciousness, but I think that people who are deeply involved in scientific and technical subjects often have difficulty understanding that there must be more to consciousness than its specific content. There must be something for that content to appear in, or to.

    > I can’t imagine how consciousness is possible if not based on physical structures and processes in the brain.

    And I can’t imagine how those structures could possibly create consciousness itself. Comment #224 from Michael Gogins gave a really nice explanation of why not. At a minimum you would need some kind of infinite regression, and I don’t see how you could implement such a thing in finite time and space.

  235. Ilio Says:

    Michael Gogins #224, it’s kind of standard to call that « metacognition ».

    Gerard #225, by « consciousness » most cognitive neuroscientists mean « having access to working memory » (see S. Dehaene for details), most neurological scientists mean « awake », most literary critics mean « ethical », and most philosophers mean [insert ten thousand pages].

  236. Lorraine Ford Says:

    OhMyGoodness #212:
    I’m sorry, but I don’t think that’s a feasible definition of consciousness. For the most basic, primitive organisms to even exist, they need the pre-existing primitive ability to discern difference in their surrounding environment (i.e. they need pre-existing primitive consciousness), where their surrounding environment might be symbolically represented by a set of variables and numbers. You can’t use equations (e.g. the equations that represent the laws of nature) to symbolically represent the discerning of difference in a set of variables and numbers: you need the types of symbols used in computer programs to symbolically represent the discerning of difference.

    In other words, physics as it stands doesn’t have the symbolic wherewithal to represent the evolution of life or consciousness. And in any case, the evolution of life requires pre-existing primitive consciousness (i.e. the pre-existing primitive ability to discern difference). One can’t represent the world or evolution without using the types of symbols used in computer programs.

  237. Lorraine Ford Says:

    Gerard #215:
    It’s all very well saying that consciousness is “knowing” or the “I” in “I am”, but how does that fit into the models of the world used by physics, which are represented by (e.g.) equations (representing laws of nature), variables and numbers? Answer: it doesn’t, because there are absolutely no symbols in the models of physics that correspond to “knowing” or an “experiencer of phenomena”.

    On the other hand, computer programs are chock-a-block full of routines that symbolically represent “knowing” that a symbolically represented situation is true, and symbolically responding to particular situations. But the basic types of symbols used in computer programs represent a different aspect of the world, an aspect of the world that can’t be derived from equations, variables and numbers. In other words, one needs the basic types of symbols used in computer programs, as well as the abovementioned equations, variables and numbers, in order to symbolically represent a world that includes “knowing”.

    Re “Knowing is logically prior to knowing difference”: Surely, the two go together like a noun and a verb: without difference (of various kinds), there is nothing to know, nothing to distinguish?

    What I was getting at, with a definition of consciousness, is that a computer can’t be conscious of what its binary numbers (more correctly, its arrays of higher and lower voltages) symbolically represent from the point of view of the human beings that set up the computer systems.

  238. OhMyGoodness Says:

    Gerard #234

    I never saw the value in the self referential infinite loop arguments about consciousness (they did provide hundreds of pages of text for a popular popsci author though). Human consciousness is the result of quite lengthy evolution. Evolution works through the generational transfer of effective genes. Genes work at a physical level and so I personally see no way that human consciousness is not the result of conserved genetic control operating on physical attributes of the brain. I think it makes perfect sense to view and define human consciousness in an evolutionary context.

  239. OhMyGoodness Says:

    Newton’s comment about standing on the shoulders of giants, in the context of evolution, would be something like: humans stand on the conserved genes of 4 billion years of lesser life. 🙂

  240. JimV Says:

    “Neurons are not just 1’s and 0’s.” Well, what I have read about them is that they are discontinuous in function, either activated or not. Granted, they have multiple inputs which can trigger activation in various ways, and their output goes to multiple places. They do much more than a single bit in a computer usually does (although single bits could also be the result of long algorithms, and trigger subsequent algorithms). In fact it takes about a 1000-node neural network to simulate one of them in a computer. Given that they can be simulated in a computer, though, that convinces me that an entire biological brain could be simulated, in principle, in a computer of tremendous size. (It would take a tremendous amount of training, also. Probably not practical to achieve in our civilization’s lifetime.)

    As for consciousness, my Windows system knows when I press keys and mouse buttons that I have done so. Expand that to knowing thousands of other things at the same time, and having a motivating impetus to parse them for benefits or detriments, and I think it would have as much consciousness as I do. How would that feel to it? I don’t know, but can accept that there are other ways of feeling things than the way I feel them. I am not a bio-chauvinist, at least not intellectually, although evolution has molded me to prefer my own tribe emotionally.

    Is a dog conscious? Is C. elegans? As they say, quantity is quality, when there is enough of it (neurons).

  241. mls Says:

    #228 Gerard

    And, you again attempt to avoid your burden of proof.

    Your statement invoking logical priority had nothing to do with consciousness. It had to do with knowing. The standard account of knowledge is “justified true belief”.

    So, you are claiming that justified true belief is logically prior to knowing of differences. It would seem, however, that one must at least know that truth and falsity are different in order for the precedent of your claim to be meaningful.

    When you attempt to deflect my request for a legitimate account of your statement, you are claiming that there is a context. The only context in my engagement with you is an incoherent statement followed by a demand for conflating mathematics with consciousness. I explained exactly why that demand is a category error. And, rather than justifying your conflated contexts, you chose to again attempt a rhetorical avoidance strategy.

    Now, there are many universities in this world. Most of them have mathematics departments. These departments issue paychecks to their employees. One class of employee on their payroll is “mathematician.” Everything I wrote, all of which you failed to address, can be found among published papers by people who collect paychecks as professional mathematicians.

    You were the one who introduced the issue of mathematical proof into the context of our exchange. I answered with an appropriate response based upon published mathematics.

    I invite you to demonstrate the same courtesy in explaining the coherence of your statement.

  242. Gerard Says:

    Lorraine Ford #237

    > It’s all very well saying that consciousness is “knowing” or the “I” in “I am”, but how does that fit into the physics’ models of the world

    I never said it did, I’m not a physicalist. I think that the simplest explanation is that the physical world is a product of consciousness and exists only within it. Nothing within that world is capable of fully describing consciousness itself.

    Of course I can’t prove that view but I can give arguments:

    1) The fact that phenomena are experienced is at the root of any possible epistemology and it is the one and only fact that you can know with absolute certainty.

    2) It’s clear (even to a physicalist) that the content of experience is entirely a product of mind (it’s obvious that we aren’t directly aware of physical objects but only of some kind of signals in the brain that very indirectly represent them).

    3) (2) proves that minds can experience a physical world, without being that physical world.

    4) We also experience dreams which can be quite similar to waking experiences, therefore experiences can exist in minds even without physical objects to cause them.

    So we know that experiences exist, that at least one mind exists and that physical objects are not necessary to produce experiences. So one possible explanation for what we experience is the monist view in which only mind exists. A second possible view is the physicalist view in which physical objects somehow generate mind but no one has the slightest idea of how that could happen or even what sort of conceptual framework would allow it to be understood. Finally we know with absolute certainty that a mind or minds exist but we do not have any grounds to justify absolute certainty that physical objects exist.

    I conclude from those considerations that the most economical explanation is that mind/consciousness/knowing is the only thing that truly exists and that all experience, including of course our perception of a physical world is simply a delusion occurring within it.

  243. Gerard Says:

    Lorraine Ford #237

    > Re “Knowing is logically prior to knowing difference”: Surely, the two go together like a noun and a verb: without difference (of various kinds), there is nothing to know, nothing to distinguish?

    mls above wants to define knowing as “justified true belief”. The way I was using the word is rooted in its Indo-European etymological roots rather than recent fads, but, OK, I can accept that definition.

    Well, I have a justified true belief that “I AM”. There is a noun and there is a verb, but where is the difference ?

  244. OhMyGoodness Says:

    JimV #240

    Apparently I have dramatically underestimated the mental life of elephants.

  245. OhMyGoodness Says:

    JimV #240

    Are you proposing that the brain somehow determines a one or zero from each neuron multiple times per second and maps that to a thought? Assuming a total of 86 billion neurons, there would be a set of possible human thoughts of size 2^(86 billion); likewise my dog would have 2^(2 million) potential thoughts, and an elephant a whopping 2^(257 billion). I suspect this may overestimate my dog, since I often think she has fewer neurons than average, but it raises a new suspicion that pachyderms are sneakily playing the long game for species dominance. :) I mean this all tongue in cheek, because it’s clear that the number of neurons matters, but I don’t believe it is sufficient to capture human consciousness using a standard computer architecture with lots of nodes. By human consciousness I mean my definition in the context of evolution.

  246. volitiontheory Says:

    It is strange that it is the 21st century and we still don’t know what we are! The common assumption of science is that we are just a collection of ordinary matter that evolved to self-replicate after billions of years of evolution on Earth. That helps explain the body but not the mind! Pleasure, pain, visual perception, audio perception, somatosensory perception must be part of physics if we claim that physics is the ultimate queen of science that fundamentally explains everything!

    In order for physics to include visual and audio perception, I think what is needed is high mass mind particles with libertarian free will! These particles would not have appeared by accident! Mind particles would also be the result of a long evolutionary history — but this time of universe evolution! Universes that are very good decision makers with exceptional sensory perception and an enormous way of responding to perceptions with libertarian free will will be more successful in universe reproduction! The idea is that the universe has a genetic code and is alive and reproduces using big bangs producing many dark matter baby universe holodeck particles! Dark matter in outer space might not have an electrical charge because it is not part of a brain and there is nothing to communicate with!

    The high mass dark matter baby universe particle in brains has an electric charge and serves as a transducer with a complex instruction set that converts EM homuncular code sense information to consciousness and also outputs libertarian free will decisions by converting them to the free will EM homuncular code! Animals and humans would likely only use a subset of the complex instruction set of the dark matter particle for sense information and a subset of free will output commands available! Humans would use a lot of homuncular codes that animals don’t, but a lot of animals will use codes that are not used by humans but could be used when designing artificial bodies — thus allowing artificial bodies to have more senses and available free will actions than natural human bodies!

    Let’s say you had a dog that you deeply loved that died and you wish to uplift the dog to be a human with an artificial body and adopt as your child! You could take the dark matter particle with surrounding EM wave focusing crystal from the dead dog’s brain and put it in an artificial human body! The artificial human body can implement a lot of the advanced olfactory and auditory homuncular codes that the dark matter particle used to enjoy as a dog giving the new child extra capabilities that children with real human bodies don’t have! The child, previously a dog, can have a very high IQ because virtual brains in artificial bodies could be designed that way! He could also be a hero to his peers because he can smell dangerous chemicals that children with natural bodies can’t thus saving their life!

  247. Scott Says:

    Ok, I think I’ll close this thread later today, since the subject has now wandered completely from AlphaCode, and what it can and can’t do, to metaphysical debate about the nature of consciousness. Get in any final comments now … especially if they’re about AlphaCode.

  248. mls Says:

    Lorraine Ford #237

    You should not be so quick to dismiss physics since physics, itself, can be understood and applied with respect to constructive empiricism rather than scientism.

    I own a book written by the mathematician Sze-Tsen Hu. The title of the book is “Threshold Logic.” The subject matter of the book consists of switching functions, Boolean polynomials, cubical complexes, and a criterion by which the subset of switching functions called “threshold functions” are recognized. That criterion is called “linear separability.”

    Being written by a mathematician for mathematicians, it is characterized by the aesthetics and motivations typical of most mathematical texts. There is no talk of “perceptrons” or “neural networks” or “artificial intelligence.” Such expressions originate in the nineteenth century hubris wherein “logic” was believed to correspond with “the laws of thought” and the sociology of the industrial revolution popularized the folklore that everything can be reduced to a machine.

    Sadly, the people who continue to advocate such positions will not take responsibility for a worldview that first brought the existential threat of nuclear holocaust upon us followed by the recognition of how the dictum “it works” has led to a climate disaster that may be irreversible.

    The funny thing about linear separability is that the researchers who describe their work using an analogy to the human quality of intelligence see it as a “problem.” It is a problem for them precisely because they would need to be able to realize a completed infinity in a computer program in order to use it. You see, what Sze-Tsen Hu does in applying a mathematician’s aesthetic is to study the topology of cubical complexes in terms of arithmetical invariants. So, when you know the mathematics underlying the sociology, the limitations are far more apparent.

    In the process of formulating physical theories, physicists have come to recognize a correspondence between symmetry and conservation principles. There is a basic symmetry in bivalent truth values that is obfuscated when the “law of excluded middle” is expressed with an inclusive disjunction. It is typically expressed with an inclusive disjunction because logicians study logical consequence and they have a form of “identity” (namely, \( p \rightarrow p \)) based on the material conditional. They then go on to “define” exclusive disjunction in terms of the conditional.

    Exclusive disjunction is a perfectly adequate expression of bivalent truth valuation. In contrast with the material conditional it is not linearly separable. No mechanical representation of intelligent reasoning will ever recognize the symmetry within a pair of its own accord. These mechanical representations of intelligence are confined to what they can do with the linear separability of threshold functions and finite resources.
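
    To make the linear-separability claim concrete, here is a minimal Python check, nothing more than a brute-force search over small integer weights: some single threshold unit reproduces the material conditional, but none reproduces exclusive disjunction.

```python
from itertools import product

def threshold_unit(w1, w2, b):
    # A single threshold function: outputs 1 iff w1*x1 + w2*x2 >= b.
    return lambda x1, x2: int(w1 * x1 + w2 * x2 >= b)

def matches(f, target):
    return all(f(x1, x2) == target(x1, x2)
               for x1, x2 in product([0, 1], repeat=2))

xor  = lambda x1, x2: x1 ^ x2                 # exclusive disjunction
cond = lambda x1, x2: int((not x1) or x2)     # material conditional

grid = range(-3, 4)                           # small brute-force search space
units = [threshold_unit(w1, w2, b)
         for w1 in grid for w2 in grid for b in grid]

print("some unit computes the conditional:", any(matches(u, cond) for u in units))  # True
print("some unit computes XOR:            ", any(matches(u, xor)  for u in units))  # False
```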

    Yet, physicists have not only recognized symmetry, they have attributed significance to the concept by framing conservation laws using it.

    The symmetry associated with pairs is not trivial. If you have never read Max Black’s dialogue on the identity of indiscernibles, you should have a look at it,

    http://home.sandiego.edu/~baber/analytic/blacksballs.pdf

    The principle appears to originate with Thomas Aquinas in so far as Leibniz cites Aquinas in his own writing. Although Aristotle never goes so far as ascribing essence to individuals, his statement that genera are prior to species motivates an interpretation of an individual as the terminal form of species. While I have never pursued Aquinas’ argument, it seems that Aquinas introduced the principle to explain how God could know every soul individually. Leibniz saw an application for the principle in geometric contexts and adjusted his logical investigations to be aligned with it. In Leibnizian logic, the definition for genera and species yields an inverted order from that of Aristotle or modern extensional set theory.

    In an attempt to reconcile the incompatible views of Leibniz and Newton, Immanuel Kant took the principle to be a defining feature of logic and declared that mathematics is, instead, grounded upon the visual representation of numerical difference between two points in space. He relegated “logic” to the facility of “understanding” and “mathematics” to the facility of “intuition.” Because mathematicians communicate with proofs, logicians and analytical philosophers have been disparaging Kant ever since.

    Theories of interpretation for the article ‘the’ (namely, definite descriptions) brought the principle to the attention of modern analytical philosophy. Frege assumed that one could meaningfully speak of “the extension of a concept” as a well-construed individual. His definition of natural numbers depended upon this. Russell’s paradox had been fatal to this approach. So, Russell offered a different account of definite descriptions which is quite relevant to your current exchanges with Gerard,

    https://users.drew.edu/jlenz/br-on-denoting.html

    This is a very cogent argument in support of objectual ontologies and is the kind of philosophical analysis which must be answered as opposed to blog rhetoric with indoctrinated believers.

    Nevertheless, Russell’s theory of definite description also came under fire because of its implicit use of the principle of the identity of indiscernibles. This time, the criticism came from Ludwig Wittgenstein who pointed out that there is no reason to accept that “properties” have the specificity needed to justify interpretation of singular terms as referring to individuals.

    Max Black’s paper is a dialogue presenting arguments that argue for and against this principle. The arguments against the principle are Wittgensteinian and are based upon symmetry.

    Although most physicists disparage these texts as philosophical, they impact the use and study of mathematics through the inference rules of first-order logic (which reflect a rejection of the principle) and the use of metric space axioms (which seems to adopt the principle). Mathematical logicians study logical systems with different inference rules, and mathematicians study systems with topologies not based upon the typical metric space axioms.

    I am truly saddened by the propensity of rhetoricians to abuse mathematics with claims of how it justifies their beliefs.

    Needless to say, another criticism of Russell’s description theory outside of mathematics occurs with Peter Strawson. While this debate has largely occurred outside of mathematics, Strawson also uses geometric analogy to differentiate qualitative identity from numerical identity. What is common to the use of geometric criticisms of the principle of the identity of indiscernibles is that it assumes topological separability. Strawson’s account makes clear that this involves a circularity. Points differentiate parts of space. Parts of space separate points. What this means is that all objectual ontologies are “transcendent.” They require endless hierarchies. Cumulative hierarchies to solve Russell’s paradox. Metalanguage hierarchies to speak about truth. Provability hierarchies to analyze incompleteness. What causes this is a topological property called compactness.

    The linear operators used in physics operate over compact Hausdorff spaces.

    Differential ontologies are characterized by “immanence.” As difference is a relation, differential ontologies are subject to infinite regress. When Quine recognized that set membership did not need to be restricted through theories of types, he proposed a stratified set theory with “Quine atoms” which could be members of themselves. With regard to logic, the sign of equality cannot mean “identity.” Rather, one must think in terms of warrants for substitutivity in a logical calculus.

    Anyway, Ms. Ford, you are advocating for a minority position. I hope you find the links useful.

  249. Ilio Says:

    Scott #247, would you mind commenting on Mark Waser #185? Have a great day.

  250. Gerard Says:

    mls #248

    I haven’t the time, the background nor the inclination to begin to decipher your comment, but I noticed you cited Wittgenstein.

    I’d just like to note that Wittgenstein had the wisdom to recognize that language (of which logic, mathematics and science are just particular subsets) is unable to describe everything that is of interest to us and he said “Whereof we cannot speak, thereof we must remain silent”.

    I think consciousness belongs to the category of that which is beyond the power of language to describe (and by extension therefore also beyond the realms of logic, mathematics and science).

    Nonetheless, I don’t fully agree that regarding consciousness one “must remain silent”, because while it certainly cannot be fully represented by language, its fundamental nature is accessible to all and is simple enough for a child to understand. For those who retain some openness of mind, language can help point them in the direction of discovering their true nature.

  251. Scott Says:

    Ilio #249: Mark Waser says that the probability of success falling off with program length “shouldn’t be a concern,” because ultimately modularity is the way to go. I agree, of course, that modularity is ultimately the way to go! The challenge now is to design program synthesis tools that are capable of modular design. That seems extremely nontrivial … meaning that, the way things have been going, I fear it might take as long as 2-3 years. 😀

  252. JimV Says:

    About elephants, without hands they have had little opportunity to produce written records to pass on accumulated knowledge as humans have. Recall that it took over 150,000 years (based on archeological evidence) for humanity to discover and develop the wheel and axle. From that beginning we then got capstans, pulleys, gears, mills, …, computer hard drives.

    However, I suspect a lot of elephants’ neuron capacity is tied up in trunk manipulation, with every few inches of trunk a sort of universal joint. Our appendages are much simpler to control.

  253. mls Says:

    Gerard #250

    A popular, but largely useless, quote from Wittgenstein.

    However, a sensible reply. Thank you.

    The entire problem with mathematical rigour is that it becomes intractable. And most of what I write is intractable because I study a great deal of mathematics. I became interested in the continuum question years ago. The foundations of mathematics is where common beliefs about logic and mathematics fall completely apart. Its subject matter consists of many contradictory claims about what is and what is not mathematics. Most of its researchers conscientiously work at either understanding the various claims or sorting them out.

    Their work is often ignored and disparaged in the wider mathematics community.

    For the most part I view many of these problems as reducing to the difficulty of communicating subjective qualia with discrete finite language. On that, you and I seem to agree.

    Now, how can sensible people recover science from hyperbole and rationality from propaganda?

    A rhetorical question, of course.

    Have a good day!

  254. JimV Says:

    As to the strawmanning about one neuron equalling one thought: no, most neurons aren’t even involved in what you call thoughts. For example, the eye is a single lens, which therefore flips the external image onto the retina, and the visual cortex translates that back to upright. Also, there is a blind spot where the optic nerve connects to the retina, and the visual cortex fills that in from the background, all without you thinking about it.

    You can make the tip of a sharpened pencil seem to disappear by moving it into the blind spot field. Since the tip is no longer in view, the visual cortex fills in the spot with the background that it does see–just like a dumb computer would do.

    I thank Dr. Scott for allowing this forum to discuss such topics on his time. (Wishing again that he had a donations button.) It is worthless stuff in the grand scheme of things, but interesting to some of us nonetheless.

  255. Lorraine Ford Says:

    Mls #248:
    I certainly don’t “dismiss physics”. My #237 comment merely said that computer programs have something that physics doesn’t have, when it comes to the issue of symbolically representing “knowing”/ consciousness/ subjective experience etc.

    The following computer program basics can’t be derived from (e.g.) the equations of physics or any equations at all: IF, THEN, AND, OR, IS TRUE. However, mathematicians, physicists, and people in general, are constantly using the equivalent of IF, THEN, AND, OR and IS TRUE, in their own minds. This indicates, to me at least, that these symbols represent something about the world that is missing from physics.

    This is not about logic “correspond[ing] with “the laws of thought”” or not corresponding with the “the laws of thought”, it’s about the use of man-made symbols in an attempt to represent aspects of the world, just like symbols of physics are man-made symbols that represent aspects of the world. And there is not necessarily any genuine one to one correspondence between a man-made symbol and the thing it symbolises, if the thing it symbolises is a natural aspect of the world.

    Re “mathematics is, instead, grounded upon the visual representation of numerical difference between two points in space”: But what is doing the differentiation? Certainly not the symbols themselves: it’s people (people’s consciousness) doing the differentiation.

    Re “I am truly saddened by the propensity of rhetoricians to abuse mathematics with claims of how it justifies their beliefs.”: There is no such thing as abusing mathematics, because mathematics is a human creation, people creating, discerning and manipulating symbols: mathematics is not some sort of perfect entity. The reason that mathematicians can specialise in divining relationships is seemingly because the entire world is based on relationships.

  256. A Raybold Says:

    Lorraine Ford #255
    I may well be missing your point here, but it resonated with an idea that I thought for a while might be a sort of counter-argument against materialism, though I no longer think so.

    What you are saying here, and in your predecessor comment, seems to be a version of mathematical or logical platonism. It so happens that I tend to lean towards this view (though without any strong conviction.)

    I am also a materialist who believes strong AI / AGI is possible. Putting these two positions together, it seems that I should accept (at least as strongly as I am willing to accept mathematical platonism) that the hypothetical algorithms of strong AI would be nonphysical in the same sense as mathematical or logical theorems.

    Strong AI is not, however, the premise that algorithms could be conscious; it is the premise that computers executing certain programs could be conscious. Even if the program is not physical, the computer executing the program would be a physical entity, and the process would be a physical one. The algorithm would be an abstraction of what is physically going on in the device, and abstractions are not a problem for materialism (e.g. ‘igneous’ is an abstraction, but it does not follow that the basalt cliffs near my home are somehow nonphysical.)

  257. DeepMind’s new AI can write code | Information Age – THE WIX Says:

    […] “Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English,” he said in a blog post. […]

  258. Shtetl-Optimized » Blog Archive » OpenAI! Says:

    […] new direction in my career had its origins here on Shtetl-Optimized. Several commenters, including Max Ra and Matt Putz, asked me point-blank what it would take to induce me to work on AI alignment. […]

  259. Alphacoders | wall.alphacoders.com - Login and Portal Says:

    […] Blog Archive » AlphaCode as a dog speaking mediocre English […]