*February 17, 2024*

# Intelligence by Force

![[engines.jpeg]]

> from [strasburgrailroad.com](https://www.strasburgrailroad.com/blog/historic-steam-engines/)

There are two reasons why transformers work.

*Mechanically*, we have no idea how they work, in contrast to, say, how we understand the parts of an engine. On this front, we are like neuroscientists working out the insides of a computer ([they tried](https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005268&type=printable)). We kind of know what we're doing. But not really. We will eventually, but it's hard to put a timeline on it. When that happens, we'll engineer these machines with a specificity that is currently impossible. That will be cool. However, the field's attention has shifted toward another, apparently more effective angle...

*Economically*, we know exactly what's going on. That's because, for now, the definitive answer to *better* seems to be *more*. The world we live in can't always think harder, but *by God* it can push harder. Thinky problems are risky. Push-y problems are predictable. It's very rare that a research problem comes with obvious "cost" and "time" knobs. It's a managerial dream come true.

This property, more or less a side effect of the original design, is really *why* transformers work. The design of the transformer works out to:

1) make use of (increasing) compute
2) work well enough
3) improve metrics in a predictable way

This is basically OpenAI's whole thing. They [noticed this property](https://arxiv.org/abs/2001.08361) early on and shifted focus away from architecture problems, toward scale problems. Now they've got some sort of flywheel going, and are churning out lots of *stuff* (see [SORA](https://openai.com/index/sora/)).

Like any [evolutionary arms race](https://lachlan-gray.com/The+Cambrian+Period+of+AI#Musical+Chairs), when things hit the metal, brawn beats brain most of the time. It's interesting that this is also true for the field of "Artificial Intelligence".
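The "predictable" part is worth making concrete. The linked scaling-laws paper reports loss falling as a power law in model size, roughly L(N) = (N_c / N)^α. A power law is a straight line in log-log space, so a handful of small runs lets you fit the exponent and extrapolate what a bigger run will buy you. Here is a minimal sketch of that fit; the constants and model sizes are invented for illustration, not taken from the paper:

```python
import numpy as np

# Assumed power-law form L(N) = (N_c / N)^alpha; these constants are
# made up for the demo, not the paper's fitted values.
alpha_true, n_c = 0.076, 8.8e13

# Synthetic "small runs" at a few model sizes (parameter counts)
sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = (n_c / sizes) ** alpha_true

# Fit a line in log-log space: the slope recovers -alpha
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_fit = -slope

# Extrapolate to a model 10x bigger than anything we trained
predicted_1e11 = np.exp(intercept + slope * np.log(1e11))
print(f"fitted alpha: {alpha_fit:.3f}")
print(f"predicted loss at N=1e11: {predicted_1e11:.3f}")
```

That extrapolation step is the managerial dream in code form: cost and quality connected by two fitted numbers.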