*November 8, 2024*

(*Special thanks to [a cool cat](https://x.com/fleetingbits) and his reading groups*)

# Patterns in Information

![[memories-are-stored-in-the-landscape.png]]

> *Memories are stored in the landscape* ^[From this year's [Nobel Prize in physics](https://www.nobelprize.org/uploads/2024/11/popular-physicsprize2024-3.pdf)]

***

Science and engineering have always had a tight interplay. Because of that, technological progress tends to foreshadow imminent scientific progress.

If you haven't heard, deep learning has taken off recently. It turns out that it may answer a critical mass of engineering "how" questions; enough, at least, to justify immense time and resources to perfect the technology. History suggests that answers to existing "why" questions are soon to follow.

Not only does the expansion of deep learning signify a technological shift, it also signals an imminent conceptual shift, not limited to Artificial Intelligence.

## Essences of Change

> *We dug mines in ignorance of geology, we smelted iron and forged steel with almost no knowledge of chemistry, and we built engines without awareness that there existed a science of thermodynamics.*
>
> *But all this was natural enough. The men and women who settled this country were daily confronted with overwhelming problems that had to be solved quickly by the nearest means if the people were to survive at all.*
>
> —Roger Burlingame, *Scientists Behind The Inventors*

A pattern in history is that we understand the "essence" of a phenomenon before we can describe it concretely. When a useful principle is discovered, inventors and engineers are the first to strike. We get new technology first, fundamental science later.

In time, it usually turns out that "essences" are placeholders for latent scientific theories. We need essences because human ingenuity requires grounding of some kind, regardless of fundamental validity.
In the 18th century, the time of early steam engines, it was believed that heat was an invisible, weightless liquid called *caloric*. When something is hot, it's full of caloric; when it's cold, it's empty. When you place a hot object against a cold object, you are pouring caloric from one into the other.

It was also believed that combustible materials contained a hypothetical substance called *phlogiston*, the "essence of fire". When you burn a material, you release the phlogiston inside it, and that's what makes the flame.

We now have a conceptual model of "atoms" and how their elemental behaviours explain both phenomena, even though we can't perceive them directly.

It's now understood that heat is actually vibrational energy. Hot things have atoms that buzz a lot; cold things buzz less. When they touch, the buzzing spreads out, which we perceive as a temperature change.

We also understand that when something burns, it's just a fast chemical reaction where loose energy is a by-product. There is energy stored in the bonds between atoms, and if you rearrange the bonds to hold less energy, the difference is released as heat and light.

None of that matters, however, if the goal is to confront pressing threats and opportunities.

## The Mechanical Advantage

That's why breakthroughs in fundamental science usually trail breakthroughs in engineering. This was very true for the steam engine.

Obviously, steam engines require a heat source. While there is no *phlogiston* in coal, the misunderstanding won't stop you from setting it on fire. For the purpose of an engine, it's not important *why* coal produces such intense heat, only that it does.

The "essence" of combustion was all Thomas Newcomen needed to invent the [atmospheric engine](https://en.wikipedia.org/wiki/Newcomen_atmospheric_engine). It solved an important problem of the time: pumping water out of mines. Although monstrously inefficient, it did the job.
Similarly, the essence was enough for James Watt to improve the design by further controlling the flow of heat. When he added a [condenser](https://en.wikipedia.org/wiki/Watt_steam_engine#Separate_condenser), the engine worked less against itself, making it efficient enough to escape the mines and last through the industrial revolution. ^[[This post](https://acoup.blog/2022/08/26/collections-why-no-roman-industrial-revolution/) from Unmitigated Pedantry tells a fantastic story]

It wasn't until the contributions of Carnot, Clausius, and Boltzmann over the 19th century that we got a fundamental glimpse of the principles we had actually been using the whole time, leading to our laws of thermodynamics. We could finally explain engines in terms of temperature, pressure, and *entropy*, which in turn we could explain in terms of the average motion of underlying particles. ^[To this day entropy is still an "essence" in many ways]

It turns out the "essences" we employed to build the engine were helpful veneers over a simple and fundamentally transparent aspect of reality.

## Harnessing The "It"

> *...when you refer to “Lambda”, “ChatGPT”, “Bard”, or “Claude” then, it’s not the model weights that you are referring to. It’s the dataset.*
>
> —jbetker, [The “it” in AI models is the dataset](https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/)

Just as there was an intuitive *essence*, an "it" in combustion we understood and harnessed to make engines, there is also an "it" in deep learning behind recent AI.

It's related to [Richard Sutton's bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html): that *general methods that leverage computation are ultimately the most effective, and by a large margin*. In other words, when you have a lot of data, the fruitful question isn't *"how"*, it's *"how much"*. It suggests that in the long run, AI progress won't be bounded by our inventiveness, but by the robustness and supply of data and compute.
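The "how much" framing can be made concrete with a toy scaling-law sketch. The power-law form below follows the shape popularized in the scaling-law literature; every constant in it is made up for illustration, and the grid search is just one naive way to split a compute budget between model size and data:

```python
# Toy illustration of "how much": a hypothetical loss surface of the form
# L(N, D) = E + A/N^alpha + B/D^beta, where N is parameter count and D is
# training tokens. All constants are invented for this sketch.

E, A, B = 1.69, 406.4, 410.7      # hypothetical irreducible loss and prefactors
alpha, beta = 0.34, 0.28          # hypothetical scaling exponents

def loss(N: float, D: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def best_split(C: float, steps: int = 2000) -> tuple[float, float]:
    """Grid-search how to split a fixed compute budget C ~ 6*N*D."""
    best = None
    for i in range(1, steps):
        N = 10 ** (6 + 6 * i / steps)   # sweep N from 1e6 to 1e12 parameters
        D = C / (6 * N)                 # the rest of the budget buys tokens
        cand = (loss(N, D), N, D)
        best = min(best, cand) if best else cand
    return best[1], best[2]

N_opt, D_opt = best_split(C=1e21)
print(f"optimal params ~ {N_opt:.2e}, tokens ~ {D_opt:.2e}")
```

The point of the sketch is the shape of the question: for a fixed budget, there is an optimal ratio between parameters and data, and finding it is an empirical fitting exercise rather than a matter of architectural cleverness.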
This line of thinking has, apparently, led us somewhere interesting. Over the last few years, we have observed that deep learning models do in fact get better at things with scale. But also, that there is a great deal of hard-earned finesse required for it to actually work.

We bake information into the model in phases. We forge a *base model* from huge volumes of data over the course of weeks or months. We know that the mixture of data matters, and we blend it thoughtfully with real and synthetic parts to target what we want from it. We know that the character of the data will dictate the model's strengths and nature, and that different data works better for different models.

We expect base models to be unwieldy. We have tricks and procedures to "align" them into the image of instruction-following, helpful, harmless, and truthful language assistants. We have a toolbox of other nuanced techniques to further condense or dilute behaviours, sand or sharpen edges.

When training a model, we know how to measure the way its performance scales with its size, data, and training compute. We can estimate optimal ratios between these quantities; we allocate the time and resources to produce a model, then sell access to it at roughly minimal cost.

We have different ways to benchmark, and very approximately measure, model "capabilities". We know that there can be tradeoffs between different skillsets and behaviours. The most informative evaluations are drawn from [Elo scores](https://en.wikipedia.org/wiki/Elo_rating_system), or better yet, word of mouth.

We are in the process of mastering *something*, but there is still a surprising gap between theory and practice.

## Do We Have a Theory?

> *Scientific frameworks are often difficult to discover, not because they are complex, but because intuitive but incorrect assumptions keep us from seeing the correct answer.*
>
> —Jeff Hawkins, *On Intelligence*

*Why* does deep learning work?
If you ask around, you may hear that artificial neural networks "capture complex patterns", "learn increasingly abstract features", "approximate complex distributions", and "generalize to new data". These are all true statements grounded in mathematics.

However, we should be careful not to mistake neural network theory for a *scientific* theory. That's not to say neural network theory is *wrong*, or that it's not *part* of a scientific theory. It's to say that features, distributions, and manifolds are suspiciously abstract concepts. The artisanal nature of deep learning expertise suggests there are still tangible, *physically* grounded aspects missing from our current picture.

Elements of neural network theory could in some ways be analogous to [Sadi Carnot's heat engine](https://en.wikipedia.org/wiki/Carnot_heat_engine). The Carnot cycle is the theoretical model describing the most efficient heat engine physically possible. Although it was developed under the assumption of *caloric*, it was still correct and is taught to this day. With a few substitutions later that century, it retrofitted easily into the broad ideas of thermodynamics and statistical physics.

In the last few years, we've gathered an *astounding* amount of deep learning "hows", but not a proportional amount of clear "whys". It would be pretty weird for a new, sweeping technology *not* to trigger a cascading change somewhere in our model of physical reality. The effectiveness of deep learning hints that we are triangulating a fundamental property of nature.

## Release The Physicists

In a poetic way, many roots of AI stem from statistical physics; fruit of the steam engine. [This year's Nobel Prize in Physics](https://www.nobelprize.org/uploads/2024/11/popular-physicsprize2024-3.pdf) was actually awarded to John Hopfield and Geoff Hinton for using *tools from physics to construct methods that helped lay the foundation for today’s powerful machine learning*.
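The epigraph's "memories are stored in the landscape" can be shown in a few lines. Below is a minimal toy Hopfield network: a pattern is stored as a minimum of an energy landscape via Hebbian learning, and a corrupted input rolls downhill back to the stored memory. The pattern and sizes are arbitrary choices for illustration:

```python
# Minimal Hopfield network: memories are minima of an energy landscape.
# train() stores patterns with a Hebbian rule; recall() lets a noisy state
# settle downhill to the nearest stored pattern.

def train(patterns: list[list[int]]) -> list[list[float]]:
    """Hebbian learning: W[i][j] accumulates correlations between units."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / len(patterns)
    return W

def recall(W: list[list[float]], state: list[int], steps: int = 10) -> list[int]:
    """Repeatedly align each unit with its local field, lowering the energy."""
    s = list(state)
    for _ in range(steps):
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s

memory = [1, 1, 1, 1, -1, -1, -1, -1]   # an arbitrary stored "memory"
W = train([memory])
noisy = [1, -1, 1, 1, -1, -1, 1, -1]    # the same memory with two bits flipped
print(recall(W, noisy))                 # settles back to the stored pattern
```

Nothing in the update rule "looks up" the memory; retrieval is just dynamics relaxing into a basin of attraction, which is the sense in which the landscape itself is the storage.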
Hopfield Networks and Boltzmann Machines are indeed core components of how we understand generative processes, pattern recognition, and memory in real *and* artificial neural networks.

By the same token, machine learning is a fair distance from what people typically think of as physics. Combined with the fact that Geoff Hinton isn't technically a physicist, this Nobel Prize may be read as a green light for physicists to contribute to mainstream AI without having to deviate from the banner of "physics". It could be an exciting degree of freedom for research faculties, and attract new perspectives on AI.

I feel compelled to share this passage from Duncan Watts:

_Physicists, it turns out, are almost perfectly suited to invading other people’s disciplines, being not only extremely clever but also generally much less fussy than most about the problems they choose to study. Physicists tend to see themselves as the lords of the academic jungle, loftily regarding their own methods as above the ken of anybody else and jealously guarding their own terrain. But their alter egos are closer to scavengers, happy to borrow ideas and technologies from anywhere if they seem like they might be useful, and delighted to stomp all over someone else’s problem. As irritating as this attitude can be to everybody else, the arrival of the physicists into a previously non-physics area of research often presages a period of great discovery and excitement. Mathematicians do the same thing occasionally, but **no one descends with such fury and in so great a number as a pack of hungry physicists, adrenalized by the scent of a new problem.**_ ^[From his book *Six Degrees: The Science of a Connected Age*]

Interest from the physics community is something to be excited about. [Xientists are rejoicing](https://x.com/_xjdr/status/1854554634632970684)!

## Conclusion: New Science

> *It's not about standing still and becoming safe.
> If anybody wants to keep creating, they have to be about change.*
>
> —Miles Davis

The first whiff of a strange essence means something is brewing: uncharted territory, with unknowns within. Responsibility falls on us to do something about it, lest it harbour something of value, or of danger.

This way of thinking has been double-edged for humanity. On one hand, it drove us to explore, discover, and create; to make life better for everyone. On the other hand, it can also desensitize us to consequences.

The legacy of the steam engine extended far, far beyond the scope of pumping water from coal mines. The industrial revolution set a destiny, shaping the course of our culture, our ecosystem, and how we live.

Now, with everything going on in AI, there is a potent "it" we are tapping into with intent to solve the most pressing challenges of today. In time, we will come to better understand what "it" is, which means today is a golden age for curiosity. With technology comes change, and the promise of new questions to ask.