
Remembering Murray Gell-Mann (1929–2019), Inventor of Quarks


First Encounters

In the mid-1970s, particle physics was hot. Quarks were in. Group theory was in. Field theory was in. And so much progress was being made that it seemed like the fundamental theory of physics might be close at hand.

Right in the middle of all this was Murray Gell-Mann—responsible for not one, but most of the leaps of intuition that had brought particle physics to where it was. There’d been other theories, but Murray’s—with their somewhat elaborate and abstract mathematics—were always the ones that seemed to carry the day.

It was the spring of 1978 and I was 18 years old. I’d been publishing papers on particle physics for a few years, and had gotten quite well known around the international particle physics community (and, yes, it took decades to live down my teenage-particle-physicist persona). I was in England, but planned to soon go to graduate school in the US, and was choosing between Caltech and Princeton. And one weekend afternoon when I was about to go out, the phone rang. In those days, it was obvious if it was an international call. “This is Murray Gell-Mann”, the caller said, then launched into a monologue about why Caltech was the center of the universe for particle physics at the time.

Perhaps not as starstruck as I should have been, I asked a few practical questions, which Murray dismissed. The call ended with something like, “Well, we’d like to have you at Caltech”.

A few months later I was indeed at Caltech. I remember the evening I arrived, wandering around the empty 4th floor of Lauritsen Lab—the home of Caltech theoretical particle physics. There were all sorts of names I recognized on office doors, and there were two offices that were obviously the largest: “M. Gell-Mann” and “R. Feynman”. (In between them was a small office labeled “H. Tuck”—which by the next day I’d realized was occupied by Helen Tuck, the lively longtime departmental assistant.)

There was a regular Friday lunch in the theoretical physics group, and as soon as a Friday came around, I met Murray Gell-Mann there. The first thing he said to me was, “It must be a culture shock coming here from England”. Then he looked me up and down. There I was in an unreasonably bright yellow shirt and sandals—looking, in fact, quite Californian. Murray seemed embarrassed, mumbled some pleasantry, then turned away.

With Murray at Caltech

I never worked directly with Murray (though he would later describe me to others as “our student”). But I interacted with him frequently while I was at Caltech. He was a strange mixture of gracious and gregarious, together with austere and combative. He had an expressive face, which would wrinkle up if he didn’t approve of what was being said.

Murray always had people and things he approved of, and ones he didn’t—to which he would often give disparaging nicknames. (He would always refer to solid-state physics as “squalid-state physics”.) Sometimes he would pretend that things he did not like simply did not exist. I remember once talking to him about something in quantum field theory called the beta function. His face showed no recognition of what I was talking about, and I was getting slightly exasperated. Eventually I blurted out, “But, Murray, didn’t you invent this?” “Oh”, he said, suddenly much more charming, “You mean g times the psi function. Why didn’t you just say that? Now I understand”. Of course, he had understood all along, but was being difficult about me using the “beta function” term, even though it had by then been standard for years.

I could never quite figure out what it was that made Murray impressed by some people and not others. He would routinely disparage physicists who were destined for great success, and would vigorously promote ones who didn’t seem so promising, and didn’t in fact do well. So when he promoted me, I was on the one hand flattered, but on the other hand concerned about what his endorsement might really mean.

The interaction between Murray Gell-Mann and Richard Feynman was an interesting thing to behold. Both came from New York, but Feynman relished his “working-class” New York accent, while Gell-Mann affected the best pronunciation of words from any language. Both would make surprisingly childish comments about the other.

I remember Feynman insisting on telling me the story of the origin of the word “quark”. He said he’d been talking to Murray one Friday about these hypothetical particles, and in their conversation they’d needed a name for them. Feynman told me he said (no doubt in his characteristic accent), “Let’s call them ‘quacks’”. The next Monday he said Murray came to him very excited and said he’d found the word “quark” in James Joyce. In telling this to me, Feynman then went into a long diatribe about how Murray always seemed to think the names for things were so important. “Having a name for something doesn’t tell you a damned thing”, Feynman said. (Having now spent so much of my life as a language designer, I might disagree). Feynman went on, mocking Murray’s concern for things like what different birds are called. (Murray was an avid bird watcher.)

Meanwhile, Feynman had worked on particles which seemed (and turned out to be) related to quarks. Feynman had called them “partons”. Murray insisted on always referring to them as “put-ons”.

Even though in terms of longstanding contributions to particle physics (if not physics in general) Murray was the clear winner, he always seemed to feel as if he was in the shadow of Feynman, particularly with Feynman’s showmanship. When Feynman died, Murray wrote a rather snarky obituary, saying of Feynman: “He surrounded himself with a cloud of myth, and he spent a great deal of time and energy generating anecdotes about himself”. I never quite understood why Murray—who could have gone to any university in the world—chose to work at Caltech for 33 years in an office two doors down from Feynman.

Murray cared a lot about what people thought of him, but would routinely (and maddeningly to watch) put himself in positions where he would look bad. He was very interested in—and I think very knowledgeable about—words and languages. And when he would meet someone, he would make a point of regaling them with information about the origin of their name (curiously—as I learned only years later—his own name, “Gell-Mann”, had been “upgraded” from “Gellmann”). Now, of course, if there’s one word people tend to know something about, it’s their own name. And, needless to say, Murray sometimes got its origins wrong—and was very embarrassed. (I remember he told a friend of mine named Nathan Isgur a long and elaborate story about the origin of the name “Isgur”, with Nathan eventually saying: “No, it was made up at Ellis Island!”.)

Murray wasn’t particularly good at reading other people. I remember in early 1982 sitting next to Murray in a limo in Chicago that had just picked up a bunch of scientists for some event. The driver was reading the names of the people he’d picked up over the radio. Many were complicated names, which the driver was admittedly butchering. But after each one, Murray would pipe up, and say “No, it’s said ____”. The driver was getting visibly annoyed, and eventually I said quietly to Murray that he should stop correcting him. When we arrived, Murray said to me: “Why did you say that?” He seemed upset that the driver didn’t care about getting the names right.

Occasionally I would ask Murray for advice, though he would rarely give it. When I was first working on one-dimensional cellular automata, I wanted to find a good name for them. (There had been several previous names for the 2D case, one of which—that I eventually settled on—was “cellular automata”.) I considered the name “polymones” (somehow reflecting Leibniz’s monad concept). But I asked Murray—given all his knowledge of words and languages—for a suggestion. He said he didn’t think polymones was much good, but didn’t have any other suggestion.

When I was working on SMP (a forerunner of Mathematica and the Wolfram Language) I asked Murray about it, though at the time I didn’t really understand as I do now the correspondences between human and computational languages. Murray was interested in trying out SMP, and had a computer terminal installed in his office. I kept on offering to show him some things, but he kept on putting it off. I later realized that—bizarrely to me—Murray was concerned about me seeing that he didn’t know how to type. (By the way, at the time, few people did—which is, for example, why SMP, like Unix, had cryptically short command names.)

But alongside the brush-offs and the strangeness, Murray could be personally very gracious. I remember him inviting me several times to his house. I never interacted with either of his kids (who were both not far from my age). But I did interact with his wife, Margaret, who was a very charming English woman. (As part of his dating advice to me, Feynman had explained that both he and Murray had married English women because “they could cope”.)

While I was at Caltech, Margaret got very sick with cancer, and Murray threw himself into trying to find a cure. (He blamed himself for not having made sure Margaret had had more checkups.) It wasn’t long before Margaret died. Murray invited me to the memorial service. But somehow I didn’t feel I could go; even though by then I was on the faculty at Caltech, I just felt too young and junior. I think Murray was upset I didn’t come, and I’ve felt guilty and embarrassed about it ever since.

Murray did me quite a few favors. He was an original board member of the MacArthur Foundation, and I think was instrumental in getting me a MacArthur Fellowship in the very first batch. Later, when I ran into trouble with intellectual property issues at Caltech, Murray went to bat for me—attempting to intercede with his longtime friend Murph Goldberger, who was by then president of Caltech (and who, before Caltech, had been a professor at Princeton, and had encouraged me to go to graduate school there).

I don’t know if I would call Murray a friend, though, for example, after Margaret died, he and I would sometimes have dinner together, at random restaurants around Pasadena. It wasn’t so much that I felt of a different generation from him (which of course I was). It was more that he exuded a certain aloof tension, that made one not feel very sure about what the relationship really was.

A Great Time in Physics

At the end of World War II, the Manhattan Project had just happened, the best and the brightest were going into physics, and “subatomic particles” were a major topic. Protons, neutrons, electrons and photons were known, and together with a couple of hypothesized particles (neutrinos and pions), it seemed possible that the story of elementary particles might be complete.

But then, first in cosmic rays, and later in particle accelerators, new particles started showing up. There was the muon, then the mesons (pions and kaons), and the hyperons (Λ, Σ, Ξ). All were unstable. The muon—which basically nobody understands even today—was like a heavy electron, interacting mainly through electromagnetic forces. But the others were subject to the strong nuclear force—the one that binds nuclei together. And it was observed that this force could generate these particles, though always together (Λ with K, for example). But, mysteriously, the particles could only decay through so-called weak interactions (of the kind involved in radioactive beta decay, or the decay of the muon).

For a while, nobody could figure out why this could be. But then around 1953, Murray Gell-Mann came up with an explanation. Just as particles have “quantum numbers” like spin and charge, he hypothesized that they could have a new quantum number that he called strangeness. Protons, neutrons and pions would have zero strangeness. But the Λ would have strangeness -1, the (positive) kaon strangeness +1, and so on. And total strangeness, he suggested, might be conserved in strong (and electromagnetic) interactions, but not in weak interactions. To suggest a fundamentally new property of particles was a bold thing to do. But it was correct: and immediately Murray was able to explain lots of things that had been observed.
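
To see how the bookkeeping works in the classic textbook example: strange particles can be produced in pairs by the strong interaction, but the lightest ones, like the Λ, can only decay through the weak interaction:

π⁻ + p → K⁰ + Λ   (strangeness: 0 + 0 → (+1) + (−1) = 0, conserved, so allowed as a strong process)
Λ → p + π⁻        (strangeness: −1 → 0 + 0, changed by one unit, so only the weak interaction can do it, hence the comparatively long lifetime)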

But how did the weak interaction that was—among other things—responsible for the decay of Murray’s “strange particles” actually work? In 1957, in their one piece of collaboration in all their years together at Caltech, Feynman and Gell-Mann introduced the so-called V-A theory of the weak interaction—and, once again, despite initial experimental evidence to the contrary, it turned out to be correct. (The theory basically implies that neutrinos can only have left-handed helicity, and that weak interactions involve parity conservation and parity violation in equal amounts.)

As soon as the quantum mechanics of electrons and other particles was formulated in the 1920s, people started wondering about the quantum theory of fields, particularly the electromagnetic field. There were issues with infinities, but in the late 1940s—in Feynman’s big contribution—these were handled through the concept of renormalization. The result was that it was possible to start computing things using quantum electrodynamics (QED)—and soon all sorts of spectacular agreements with experiment had been found.

But all these computations worked by looking at just the first few terms in a series expansion in powers of the interaction strength parameter α≃1/137. In 1954, during his brief time at the University of Illinois (from which he went to the University of Chicago, and then Caltech), Murray, together with Francis Low, wrote a paper entitled “Quantum Electrodynamics at Small Distances” which was an attempt to explore QED to all orders in α. In many ways this paper was ahead of its time—and 20 years later, the “renormalization group” that it implicitly defined became very important (and the psi function that it discussed was replaced by the beta function).

While QED could be investigated through a series expansion in the small parameter α≃1/137, no such program seemed possible for the strong interaction (where the effective expansion parameter would be ≃1). So in the 1950s there was an attempt to take a more holistic approach, based on looking at the whole so-called S-matrix defining overall scattering amplitudes. Various properties of the S-matrix were known—notably analyticity with respect to values of particle momenta, and so-called crossing symmetry associated with exchanging particles and antiparticles.

But were these sufficient to understand the properties of strong interactions? Throughout the 1960s, attempts involving more and more elaborate mathematics were made. But things kept on going wrong. The proton-proton total interaction probability was supposed to rise with energy. But experimentally it was seen to level off. So a new idea (the pomeron) was introduced. But then the interaction probability was found to start rising again. So another phenomenon (multiparticle “cuts”) had to be introduced. And so on. (Ironically enough, early string theory spun off from these attempts—and today, after decades of disuse, S-matrix theory is coming back into vogue.)

But meanwhile, there was another direction being explored—in which Murray Gell-Mann was centrally involved. It all had to do with the group-theory-meets-calculus concept of Lie groups. An example of a Lie group is the 3D rotation group, known in Lie group theory as SO(3). A central issue in Lie group theory is to find representations of groups: finite collections, say of matrices, that operate like elements of the group.

Representations of the rotation group had been used in atomic physics to deduce from rotational symmetry a characterization of possible spectral lines. But what Gell-Mann did was to say, in effect, “Let’s just imagine that in the world of elementary particles there’s some kind of internal symmetry associated with the Lie group SU(3). Now use representation theory to characterize what particles will exist”.

And in 1961, he published his eightfold way (named after Buddha’s Eightfold Way) in which he proposed—periodic-table style—that there should be 8+1 types of mesons, and 10+8 types of baryons (hyperons plus nucleons, such as proton and neutron). For the physics of the time, the mathematics involved in this was quite exotic. But the known particles organized nicely into Gell-Mann’s structure. And Gell-Mann made a prediction: that there should be one additional type of hyperon, that he called the Ω⁻ (the “omega-minus”), with strangeness -3, and certain mass and decay characteristics.

And—sure enough—in 1964, the Ω⁻ was observed, and Gell-Mann was on his way to the Nobel Prize, which he received in 1969.

At first the SU(3) symmetry idea was just about what particles should exist. But Gell-Mann wanted also to characterize interactions associated with particles, and for this he introduced what he called current algebra. And, by 1964, from his work on current algebra, he’d realized something else: that his SU(3) symmetry could be interpreted as meaning that things like protons were actually composed of something more fundamental—that he called quarks.

What exactly were the quarks? In his first paper on the subject, Gell-Mann called them “mathematical entities”, although he admitted that, just maybe, they could actually be particles themselves. There were problems with this, though. First, it was thought that electric charge was quantized in units of the electron charge, but quarks would have to have charges of 2/3 and -1/3. But even more seriously, one would have to explain why no free quarks had ever been seen.
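
The arithmetic works out because each baryon is built from three quarks. In the now-standard up/down/strange labeling, with charges +2/3, −1/3 and −1/3 respectively:

proton  = uud:  2/3 + 2/3 − 1/3 = +1
neutron = udd:  2/3 − 1/3 − 1/3 = 0
Ω⁻      = sss:  −1/3 − 1/3 − 1/3 = −1   (and, with each strange quark carrying strangeness −1, total strangeness −3)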

It so happened that right when Gell-Mann was writing this, a student at Caltech named George Zweig was thinking of something very similar. Zweig (who was at the time visiting CERN) took a mathematically less elaborate approach, observing that the existing particles could be explained as built from three kinds of “aces”, as he called them, with the same properties as Gell-Mann’s quarks.

Zweig became a professor at Caltech—and I’ve personally been friends with him for more than 40 years. But he never got as much credit for his aces idea as he should (though in 1977 Feynman proposed him for a Nobel Prize), and after a few years he left particle physics and started studying the neurobiology of the ear—and now, in his eighties, has started a quant hedge fund.

Meanwhile, Gell-Mann continued pursuing the theory of quarks, refining his ideas about current algebras. But starting in 1968, there was something new: particle accelerators able to collide high-energy electrons with protons (“deep inelastic scattering”) observed that sometimes the electrons could suffer large deflections. There were lots of details, particularly associated with relativistic kinematics, but in 1969 Feynman proposed his parton (or, as Gell-Mann called it, “put-on”) model, in which the proton contained point-like “parton” particles.

It was immediately guessed that partons might be quarks, and within a couple of years this had been established. But the question remained of why the quarks should be confined inside particles such as protons. To avoid some inconsistencies associated with the exclusion principle, it had already been suggested that quarks might come in three “colors”. Then in 1973, Gell-Mann and his collaborators suggested that associated with these colors, quarks might have “color charges” analogous to electric charge.

Electromagnetism can be thought of as a gauge field theory associated with the Lie group U(1). Now Gell-Mann suggested that there might be a gauge field theory associated with an SU(3) color group (yes, SU(3) again, but a different application than in the eightfold way, etc.). This theory became known as quantum chromodynamics, or QCD. And, in analogy to the photon, it involves particles called gluons.

Unlike photons, however, gluons directly interact with each other, leading to a much more complex theory. But in direct analogy to Gell-Mann and Low’s 1954 renormalization group computation for QED, in 1973 the beta function (AKA g times psi function) for QCD was computed, and was found to show the phenomenon of asymptotic freedom—essentially that QCD interactions get progressively weaker at shorter distances.

This immediately explained the success of the parton model, but also suggested that if quarks get further apart, the QCD interactions between them get stronger, potentially explaining confinement. (And, yes, this is surely the correct intuition about confinement, although even to this day, there is no formal proof of quark confinement—and I suspect it may have issues of undecidability.)
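
In modern textbook notation, with g the QCD coupling and μ the momentum scale, the leading-order result is

μ dg/dμ = β(g) = −(g³/(16π²)) (11 − (2/3) n_f) + …

which is negative for any number of quark flavors n_f up to 16, so the effective coupling does indeed get weaker at short distances and stronger at long ones.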

Through much of the 1960s, S-matrix theory had been the dominant approach to particle physics. But it was having trouble, and the discovery of asymptotic freedom in QCD in 1973 brought field theory back to the fore, and, with it, lots of optimism about what might be possible in particle physics.

Murray Gell-Mann had had an amazing run. For 20 years he had made a series of bold conjectures about how nature might work—strangeness, V-A theory, SU(3), quarks, QCD—and in each case he had been correct, while others had been wrong. He had had one of the more remarkable records of repeated correct intuition in the whole history of science.

He tried to go on. He talked about “grand unification being in the air”, and (along with many other physicists) discussed the possibility that QCD and the theory of weak interactions might be unified in models based on groups like SU(5) and SO(10). He considered supersymmetry—in which there would be particles that are crosses between things like neutrinos and things like gluons. But quick validations of these theories didn’t work out—though even now it’s still conceivable that some version of them might be correct.

But regardless, the mid-1970s were a period of intense activity for particle physics. In 1974, the J/ψ particle was discovered, which turned out to be associated with a fourth kind of quark (the charm quark). In 1978, evidence of a fifth quark was seen. Lots was figured out about how QCD works. And a consistent theory of weak interactions emerged that, together with QED and QCD, defined what by the early 1980s had become the modern Standard Model of particle physics that exists today.

I myself got seriously interested in particle physics in 1972, when I was 12 years old. I used to carry around a copy of the little Particle Properties booklet—and all the various kinds of particles became, in a sense, my personal friends. I knew by heart the mass of the Λ, the lifetime of the Ω⁻, and a zillion other things about particles. (And, yes, amazingly, I still seem to remember almost all of them—though now they’re all known to much greater accuracy.)

At the time, it seemed to me like the most important discoveries ever were being made: fundamental facts about the fundamental particles that exist in our universe. And I think I assumed that before long everyone would know these things, just as people know that there are atoms and protons and electrons.

But I’m shocked today that almost nobody has, for example, even heard of muons—even though we’re continually bombarded with them from cosmic rays. Talk about strangeness, or the omega-minus, and one gets blank stares. Quarks more people have heard of, though mostly because of their name, with its various uses for brands, etc.

To me it feels a bit tragic. It’s not hard to show Gell-Mann’s eightfold way pictures, and to explain how the particles in them can be made from quarks. It’s at least as easy to explain that there are 6 known types of quarks as to explain about chemical elements or DNA bases. But for some reason—in most countries—all these triumphs of particle physics have never made it into school science curriculums.

And as I was writing this piece, I was shocked at how thin the information on “classic” particle physics is on the web. In fact, in trying to recall some of the history, the most extensive discussion I could find was in an unpublished book I myself wrote when I was 12 years old! (Yes, full of charming spelling mistakes, and a few physics mistakes.)

The Rest of the Story

When I first met Murray in 1978, his great run of intuition successes and his time defining almost everything that was important in particle physics was already behind him. I was never quite sure what he spent his time on. I know he traveled a lot, using physics meetings in far-flung places as excuses to absorb local culture and nature. I know he spent significant time with the JASON physicists-consult-for-the-military-and-get-paid-well-for-doing-so group. (It was a group that also tried to recruit me in the mid-1980s.) I know he taught classes at Caltech—though he had a reputation for being rather disorganized and unprepared, and I often saw him hurrying to class with giant piles of poorly collated handwritten notes.

Quite often I would see him huddled with more junior physicists that he had brought to Caltech with various temporary jobs. Often there were calculations being done on the blackboard, sometimes by Murray. Lots of algebra, usually festooned with tensor indices—with rarely a diagram in sight. What was it about? I think in those days it was most often supergravity—a merger of the idea of supersymmetry with an early form of string theory (itself derived from much earlier work on S-matrix theory).

This was the time when QCD, quark models and lots of other things that Murray had basically created were at their hottest. Yet Murray chose not to work on them—for example telling me after hearing a talk I gave on QCD that I should work on more worthwhile topics.

I’m guessing Murray somehow thought that his amazing run of intuition would continue, and that his new theories would be as successful as his old. But it didn’t work out that way. Though when I would see Murray, he would often tell me of some amazing physics that he was just about to crack, often using elaborate mathematical formalism that I didn’t recognize.

By the time I left Caltech in 1983, Murray was spending much of his time in New Mexico, around Santa Fe and Los Alamos—particularly getting involved in what would become the Santa Fe Institute. In 1984, I was invited to the inaugural workshop discussing what the institute (then called the Rio Grande Institute) might do. It was a strange event, at which I was by far the youngest participant. And as chance would have it, in connection with the republication of the proceedings of that event, I just recently wrote an account of what happened there, which I will soon post.

But in any case, Murray was co-chairing the event, and talking about his vision for a great interdisciplinary university, in which people would study things like the relations between physics and archaeology. He talked in grand flourishes about covering the arts and sciences, the simple and the complex, and linking them all together. It didn’t seem very practical to me—and at some point I asked what the Santa Fe Institute would actually concentrate on if it had to make a choice.

People asked what I would suggest, and I (somewhat reluctantly, because it seemed like everyone had been trying to promote their pet area) suggested “complex systems theory”, and my ideas about the emergence of complexity from things like simple programs. The audio of the event records some respectful exchanges between Murray and me, though more about organizational matters than content. But as it turned out, complex systems theory was indeed what the Santa Fe Institute ended up concentrating on. And Murray himself began to use “complexity” as a label for things he was thinking about.

I tried for years (starting when I first worked on such things, in 1981) to explain to Murray about cellular automata, and about my explorations of the computational universe. He would listen politely, and pay lip service to the relevance of computers and experiments with them. But—as I later realized—he never really understood much at all of what I was talking about.

By the late 1980s, I saw Murray only very rarely. I heard, though, that through an agent I know, Murray had got a big advance to write a book. Murray always found writing painful, and before long I heard that the book had gone through multiple editors (and publishers), and that Murray thought it responsible for a heart attack he had. I had hoped that the book would be an autobiography, though I suspected that Murray might not have the introspection to produce that. (Several years later, a New York Times writer named George Johnson wrote what I considered a very good biography of Murray, which Murray hated.)

But then I heard that Murray’s book was actually going to be about his theory of complexity, whatever that might be. A few years went by, and, eventually, in 1994, to rather modest fanfare, Murray’s book The Quark and the Jaguar appeared. Looking through it, though, it didn’t seem to contain anything concrete that could be considered a theory of complexity. George Zweig told me he’d heard that Murray had left people like me and him out of the index to the book, so we’d have to read the whole book if we wanted to find out what he said about us.

At the time, I didn’t bother. But just now, in writing this piece, I was curious to find out what, if anything, Murray actually did say about me. In the printed book, the index goes straight from “Winos” to “Woolfenden”. But online I can find that there I am, on page 77 (and, bizarrely, I’m also in the online index): “As Stephen Wolfram has emphasized, [a theory] is a compressed package of information, applicable to many cases”. Yes, that’s true, but is that really all Murray got out of everything I told him? (George Zweig, by the way, isn’t mentioned in the book at all.)

In 2002, I’d finally finished my own decade-long basic science project, and I was getting ready to publish my book A New Kind of Science. In recognition of his early support, I’d mentioned Murray in my long list of acknowledgements in the book, and I thought I’d reach out to him and see if he’d like to write a back-cover blurb. (In the end, Steve Jobs convinced me not to have any back-cover blurbs: “Isaac Newton didn’t have blurbs on the Principia; nor should you on your book”.)

Murray responded politely: “It is exciting to know that your magnum opus, reflecting so much thought, research, and writing, will finally appear. I should, of course, be delighted to receive the book and peruse it, and I might be able to come up with an endorsement, especially since I expect to be impressed”. But he said, “I find it difficult to write things under any conditions, as you probably know”.

I sent Murray the book, and soon thereafter was on the phone with him. It was a strange and contentious conversation. Murray was obviously uncomfortable. I was asking him about what he thought complexity was. He said it was “like a child learning a language”. I asked what that meant. We went back and forth talking about languages. I had the distinct sense that Murray thought he could somehow blind me with facts I didn’t know. But—perhaps unfortunately for the conversation—even though A New Kind of Science doesn’t discuss languages much, my long efforts in computational language design had made me quite knowledgeable about the topic, and in the conversation I made it quite clear that I wasn’t convinced about what Murray had to say.

Murray followed up with an email: “It was good to talk with you. I found the exchange of ideas very interesting. We seem to have been thinking about many of the same things over the last few years, and apparently we agree on some of them and have quite divergent views on others”. He talked about the book, saying that “Obviously, I can’t, in a brief perusal, come to any deep conclusions about such an impressive tome. It is clear, however, that there are many ideas in it with which, if I understand them correctly, I disagree”.

Then he continued: “Also, my own work of the last decade or so is not mentioned anywhere, even though that work includes discussions of the meaning and significance of simplicity and complexity, the role of decoherent histories in the understanding of quantum mechanics, and other topics that play important roles in A New Kind of Science”. (Actually, I don’t think I discussed anything relevant to decoherent histories in quantum mechanics.) He explained that he didn’t want to write a blurb, and ended: “I’m sorry, and I hope that this matter does not present any threat to our friendship, which I hold dear”.

As it turned out, I never talked to Murray about science again. The last time I saw Murray was in 2012 at a peculiar event in New York City for promising high-school students. I said hello. Murray looked blank. I said my name, and held up my name tag. “Do I know you?”, he said. I repeated my name. Still blank. I couldn’t tell if it was a problem of age—or a repeat of the story of the beta function. But, with regret, I walked away.

I have often used Murray as an example of the challenges of managing the arc of a great career. From his twenties to his forties, Murray had the golden touch. His particular way of thinking had success after success, and in many ways, he defined physics for a generation. But by the time I knew him, the easy successes were over. Perhaps it was Murray; more likely, it was just that the easy pickings from his approach were now gone.

I think Murray always wanted to be respected as a scholar and statesman of science—and beyond. But—to his chagrin—he kept on putting himself in situations that played to his weaknesses. He tried to lead people, but usually ended up annoying them. He tried to become a literary-style author, but his perfectionism and insecurity got in the way. He tried to do important work in new fields, but ended up finding that his particular methods didn’t work there. To me, it felt in many ways tragic. He so wanted to succeed as he had before, but he never found a way to do it—and always bore the burden of his early success.

Still, with all his complexities, I am pleased to have known Murray. And though Murray is now gone, the physics he discovered will live on, defining an important chapter in the quest for our understanding of the fundamental structure of our universe.


The Wolfram Function Repository: Launching an Open Platform for Extending the Wolfram Language


What the Wolfram Language Makes Possible

We’re on an exciting path these days with the Wolfram Language. Just three weeks ago we launched the Free Wolfram Engine for Developers to help people integrate the Wolfram Language into large-scale software projects. Now, today, we’re launching the Wolfram Function Repository to provide an organized platform for functions that are built to extend the Wolfram Language—and we’re opening up the Function Repository for anyone to contribute.

The Wolfram Function Repository is something that’s made possible by the unique nature of the Wolfram Language as not just a programming language, but a full-scale computational language. In a traditional programming language, adding significant new functionality typically involves building whole libraries, which may or may not work together. But in the Wolfram Language, there’s so much already built into the language that it’s possible to add significant functionality just by introducing individual new functions—which can immediately integrate into the coherent design of the whole language.

To get it started, we’ve already got 532 functions in the Wolfram Function Repository, in 26 categories:

[Image: The Wolfram Function Repository]

Just like the 6000+ functions that are built into the Wolfram Language, each function in the Function Repository has a documentation page, with a description and examples:

[Image: LogoQRCode documentation page]

Go to the page, click to copy the “function blob”, paste it into your input, and then use the function just like a built-in Wolfram Language function (all necessary downloading etc. is already handled automatically in Version 12.0):

ResourceFunction["LogoQRCode"]["wolfr.am/E72W1Chw", CloudGet["https://wolfr.am/EcBjBfzw"]]

And what’s critical here is that in introducing LogoQRCode you don’t, for example, have to set up a “library to handle images”: there’s already a consistent and carefully designed way to represent and work with images in the Wolfram Language—that immediately fits in with everything else in the language:

Table[
 ImageTransformation[
  ResourceFunction["LogoQRCode"]["wolfr.am/E72W1Chw",
   ColorNegate[CloudGet["https://wolfr.am/EcBjBfzw"]]],
  #^k &],
 {k, 1, 2, .25}]

I’m hoping that—with the help of the amazing and talented community that’s grown up around the Wolfram Language over the past few decades—the Wolfram Function Repository is going to allow rapid and dramatic expansion in the range of (potentially very specialized) functions available for the language. Everything will leverage both the content of the language, and the design principles that the language embodies. (And, of course, the Wolfram Language has a 30+ year history of design stability.)

Inside the functions in the Function Repository there may be tiny pieces of Wolfram Language code, or huge amounts. There may be calls to external APIs and services, or to external libraries in other languages. But the point is that when it comes to user-level functionality everything will fit together, because it’s all based on the consistent design of the Wolfram Language—and every function will automatically “just work”.

We’ve set it up to be as easy as possible to contribute to the Wolfram Function Repository—essentially just by filling out a simple notebook. There’s automation that helps ensure that everything meets our design guidelines. And we’re focusing on coverage, not depth—and (though we’re putting in place an expert review process) we’re not insisting on anything like the same kind of painstaking design analysis or the same rigorous standards of completeness and robustness that we apply to built-in functions in the language.

There are lots of tradeoffs and details. But our goal is to optimize the Wolfram Function Repository both for utility to users, and for ease of contribution. As it grows, I’ve no doubt that we’ll have to invent new mechanisms, not least for organizing a large number of functions, and finding the ones one wants. But it’s very encouraging to see that it’s off to such a good start. I myself contributed a number of functions to the initial collection. Many are based on code that I’ve had for a long time. It only took me minutes to submit them to the Repository. But now that they’re in the Repository, I can—for the first time ever—immediately use the functions whenever I want, without worrying about finding files, loading packages, etc.

Low Cost, High Payoff

We’ve had ways for people to share Wolfram Language code since even before the web (our first major centralized effort was MathSource, built for Mathematica in 1991, using CD-ROMs, etc.). But there’s something qualitatively different—and much more powerful—about the Wolfram Function Repository.

We’ve worked very hard for more than 30 years to maintain the design integrity of the Wolfram Language, and this has been crucial in allowing the Wolfram Language to become not just a programming language, but a full-scale computational language. And now what the Wolfram Function Repository does is to leverage all this design effort to let new functions be added that fit consistently into the framework of the language.

Inside the implementation of each function, all sorts of things can be going on. But what’s critical is that to the user, the function is presented in a very definite and uniform way. In a sense, the built-in functions of the Wolfram Language provide 6000+ consistent examples of how functions should be designed (and our livestreamed design reviews include hundreds of hours of the process of doing that design). But more than that, what ultimately makes the Wolfram Function Repository able to work well is the symbolic character of the Wolfram Language, and all the very rich structures that are already built into the language. If you’ve got a function that deals with images—or sparse arrays, or molecular structures, or geo positions, or whatever—there’s already a consistent symbolic representation of those in the language, and by using that, your function is immediately compatible with other functions in the system.

Setting up a repository that really works well is an interesting meta-design problem. Give too little freedom and one can’t get the functionality one wants. Give too much freedom and one won’t be able to maintain enough consistency. We’ve had several previous examples that have worked very well. The Wolfram Demonstrations Project—launched in 2007 and now (finally) running interactively on the web—contains more than 12,000 contributed interactive demonstrations. The Wolfram Data Repository has 600+ datasets that can immediately be used in the Wolfram Language. And the Wolfram Neural Net Repository adds neural nets by the week (118 so far) that immediately plug into the NetModel function in the Wolfram Language.

All these examples have the feature that the kind of thing that’s being collected is well collimated. Yes, the details of what actual Demonstration or neural net or whatever one has can vary a lot, but the fundamental structure for any given repository is always the same. So what about a repository that adds extensions to the Wolfram Language? The Wolfram Language is set up to be extremely flexible—so it can basically be extended and changed in any way. And this is tremendously important in making it possible to quickly build all sorts of large-scale systems in the Wolfram Language. But with this flexibility comes a cost. Because the more one makes use of it, the more one ends up with a separated tower of functionality—and the less one can expect that (without tremendous design effort) what one builds will consistently fit in with everything else.

In traditional programming languages, there’s already a very common problem with libraries. If you use one library, it might be OK. But if you try to use several, there’s no guarantee that they fit together. Of course, it doesn’t help that in a traditional programming language—as opposed to a full computational language—there’s no expectation of even having consistent built-in representations for anything but basic data structures. But the problem is bigger than that: whenever one builds a large-scale tower of functionality, then without the kind of immense centralized design effort that we’ve put into the Wolfram Language, one won’t be able to achieve the consistency and coherence needed for everything to always work well together.

So the idea of the Wolfram Function Repository is to avoid this problem by just adding bite-sized extensions in the form of individual functions—that are much easier to design in a consistent way. Yes, there are things that cannot conveniently be done with individual functions (and we’re soon going to be releasing a streamlined mechanism for distributing larger-scale packages). But with everything that’s already built into the Wolfram Language there’s an amazing amount that individual functions can do. And the idea is that with modest effort it’s possible to create very useful functions that maintain enough design consistency that they fit together and can be easily and widely used.

It’s a tradeoff, of course. With a larger-scale package one can introduce a whole new world of functionality, which can be extremely powerful and valuable. But if one wants to have new functionality that will fit in with everything else, then—unless one’s prepared to spend immense design effort—it’ll have to be smaller scale. The idea of the Wolfram Function Repository is to hit a particular sweet spot that allows for powerful functionality to be added while making it manageably easy to maintain good design consistency.

Contributing to the Repository

We’ve worked hard to make it easy to contribute to the Wolfram Function Repository. On the desktop (already in Version 12.0), you can just go to File > New > Repository Item > Function Repository Item and you’ll get a “Definition Notebook” (programmatically, you can also use CreateNotebook["FunctionResource"]):

[Image: Definition Notebook]

There are two basic things you have to do: first, actually give the code for your function and, second, give documentation that shows how the function should be used.

Press the Open Sample button at the top to see an example of what you need to do:

[Image: PositionLargest sample Definition Notebook]

Essentially, you’re trying to make something that’s like a built-in function in the Wolfram Language. Except that it can be doing something much more specific than a built-in function ever would. And the expectations for how complete and robust it is are much lower.

But you’ll need a name for your function that fits in with Wolfram Language function naming principles. And you’ll need documentation that follows the same pattern as for built-in functions. I’ll say more later about these things. But for now, just notice that in the row of buttons at the top of the Definition Notebook there’s a Style Guidelines button that explains more about what to do, and there’s a Tools button that provides tools—especially for formatting documentation.

When you think you’re ready, press the Check button. It’s OK if you haven’t gotten all the details right yet, because Check will automatically go through and do lots of style and consistency checks. Often it will make immediate suggestions for you to approve (e.g. “This line should end with a colon”, and it’ll offer to put the colon in). Sometimes it will ask you to add or change something yourself. We’ll be continually adding to the automatic functionality of Check, but basically its goal is to try to ensure that anything you submit to the Function Repository already follows as many of the style guidelines as possible.

[Image: Check comments]

OK, so after you run Check, you can use Preview. Preview generates a preview of the documentation page that you’ve defined for your function. You can choose to create a preview either in a desktop notebook, or in the cloud. If you don’t like something you see in the preview, just go back and fix it, and press Preview again.

Now you’re ready to deploy your function. The Deploy button provides four options:

[Image: Deploy options]

The big thing you can do is to submit your function to the Wolfram Function Repository, so it’s available to everyone forever. But you can also deploy your function for more circumscribed use. For example, you can have the function just deployed locally on your computer, so it will be available whenever you use that particular computer. Or you can deploy it to your cloud account, so it will be available to you whenever you’re connected to the cloud. You can also deploy a function publicly through your cloud account. It won’t be in the central Wolfram Function Repository, but you’ll be able to give anyone a URL that’ll let them get your function from your account. (In the future, we’ll also be supporting organization-wide central repositories.)

OK, let’s say you’re ready to actually submit your function to the Wolfram Function Repository. Then, needless to say, you press Submit to Repository. So what happens then? Well, your submission immediately goes into a queue for review and approval by our team of curators.

As your submission goes through the process (which will typically take a few days) you’ll get status messages—as well as maybe suggestions. But as soon as your function is approved, it’ll immediately be published in the Wolfram Function Repository, and available for anyone to use. (And it’ll show up in New Functions digests, etc. etc.)

What Should Be in the Repository

We have very high standards for the completeness, robustness—and overall quality—of the 6000+ functions that we’ve painstakingly built into the Wolfram Language over the past 30+ years. The goal of the Wolfram Function Repository is to leverage all the structure and functionality that already exists in the Wolfram Language to make it possible to add as many functions as possible that are much more lightweight.

Yes, functions in the Wolfram Function Repository need to follow the design principles of the Wolfram Language—so they fit in with other functions, and with users’ expectations about how functions should work. But they don’t need to have the same completeness or robustness.

In the built-in functions of the Wolfram Language, we work hard to make things be as general as possible. But in the Wolfram Function Repository, there’s nothing wrong with having a function that just handles some very specific, but useful, case. SendMailFromNotebook can accept notebooks in one specific format, and produce mail in one specific way. PolygonalDiagram makes diagrams only with particular colors and labeling. And so on.

Another thing about built-in functions is that we go to great pains to handle all the corner cases, to deal with bad input properly, and so on. In the Function Repository it’s OK to have a function that just handles the main cases—and ignores everything else.

Obviously it’s better to have functions that do more, and do it better. But the optimization for the Function Repository—as opposed to for the built-in functions of the Wolfram Language—is to have more functions, covering more functionality, rather than to deepen each function.

What about testing the functions in the Function Repository? The expectations are considerably lower than for built-in functions. But—particularly when functions depend on external resources such as APIs—it’s important to be continually running regression tests, which is what automatically happens behind the scenes. In the Definition Notebook, you can explicitly give (in the Additional Information section) as many tests as you want, defined either by input and output lines or by full symbolic VerificationTest objects. In addition, the system tries to turn the documentation examples you give into tests (though this can sometimes be quite tricky, e.g. for a function whose result depends on random numbers, or the time of day).
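
As a minimal sketch, a test in the Additional Information section can be as simple as a VerificationTest comparing an input against its expected output (the function name here is hypothetical):

(* "DigitSum" is a hypothetical repository function; the digit sum of 1729 is 1+7+2+9 = 19 *)
VerificationTest[
 ResourceFunction["DigitSum"][1729],
 19
]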

There’ll be a whole range of implementation complexity to the functions in the Function Repository. Some will be just a single line of code; others might involve thousands or tens of thousands of lines, probably spread over many subsidiary functions. When is it worth adding a function that takes only very little code to define? Basically, if there’s a good name for the function—that people would readily understand if they saw it in a piece of code—then it’s worth adding. Otherwise, it’s probably better just to write the code again each time you need to use it.
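
For instance (a purely hypothetical example), even a one-liner can make a perfectly good function if its name says what it does:

(* both the name and the definition are hypothetical: the fraction of elements in a list that are True *)
DefineResourceFunction[Count[#, True]/Length[#] &, "FractionOfTrue"]

ResourceFunction["FractionOfTrue"][{True, False, True, True}]
(* -> 3/4 *)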

The primary purpose of the Function Repository (as its name suggests) is to introduce new functions. If you want to introduce new data, or new entities, then use the Wolfram Data Repository. But what if you want to introduce new kinds of objects to compute with?

There are really two cases. You might want a new kind of object that’s going to be used in new functions in the Function Repository. And in that case, you can always just write down a symbolic representation of it, and use it in the input or output of functions in the Function Repository.

But what if you want to introduce an object and then define how existing functions in the Wolfram Language should operate on it? Well, the Wolfram Language has always had an easy mechanism for that, called upvalues. And with certain restrictions (particularly for functions that don’t evaluate their arguments), the Function Repository lets you just introduce a function, and define upvalues for it. (To set expectations: getting a major new construct fully integrated everywhere in the Wolfram Language is typically a very significant undertaking, that can’t be achieved just with upvalues—and is the kind of thing we do as part of the long-term development of the language, but isn’t what the Function Repository is set up to deal with.)
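
Here’s a minimal sketch of the upvalue mechanism, with a purely hypothetical head myInterval, telling the existing function Plus how to act on the new object:

(* a hypothetical symbolic object representing the interval [a, b] *)
myInterval /: Plus[myInterval[a1_, b1_], myInterval[a2_, b2_]] := myInterval[a1 + a2, b1 + b2]

myInterval[1, 2] + myInterval[10, 20]
(* -> myInterval[11, 22] *)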

But, OK, so what can be in the code for functions in the Function Repository? Anything built into the Wolfram Language, of course (at least so long as it doesn’t pose a security risk). Also, any function from the Function Repository. But there are other possibilities, too. A function in the Function Repository can call an API, either in the Wolfram Cloud or elsewhere. Of course, there’s a risk associated with this. Because there’s no guarantee that the API won’t change—and make the function in the Function Repository stop working. And to recognize issues like this, there’s always a note on the documentation page (under Requirements) for any function that relies on more than just built-in Wolfram Language functionality. (Of course, when real-world data is involved, there can be issues even with this functionality—because actual data in the world changes, and even sometimes changes its definitions.)

Does all the code for the Wolfram Function Repository have to be written in the Wolfram Language? The code inside an external API certainly doesn’t have to be. And, actually, nor even does local code. In fact, if you find a function in pretty much any external language or library, you should be able to make a wrapper that allows it to be used in the Wolfram Function Repository. (Typically this will involve using ExternalEvaluate or ExternalFunction in the Wolfram Language code.)
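
Here’s a minimal sketch of that kind of wrapper, assuming Python is configured for external evaluation (the name pyHypotenuse is hypothetical):

(* wrap a small piece of Python code as a Wolfram Language function *)
pyHypotenuse = ExternalFunction["Python",
  "def hypotenuse(x, y):
      import math
      return math.hypot(x, y)"];

pyHypotenuse[3, 4]
(* -> 5. *)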

So what’s the point of doing this? Basically, it’s to leverage the whole integrated Wolfram Language system and its unified design. You get the underlying implementation from an external library or language—but then you’re using the Wolfram Language’s rich symbolic structure to create a convenient top-level function that makes it easy for people to use whatever functionality has been implemented. And, at least in a perfect world, all the details of loading libraries and so on will be automatically taken care of through the Wolfram Language. (In practice, there can sometimes be issues getting external languages set up on a particular computer system—and in the cloud there are additional security issues to worry about.)

By the way, when you first look at typical external libraries, they often seem far too complicated to just be covered by a few functions. But in a great many cases, most of the complexity comes from building up the infrastructure needed for the library—and all the functions to support that. When one’s using the Wolfram Language, however, the infrastructure is usually already built in, and so one doesn’t need to expose all those support functions—and one only needs to create functions for the few “topmost” applications-oriented functions in the library.

The Ecosystem of the Repository

If you’ve written functions that you use all the time, then send them in to the Wolfram Function Repository! If nothing else, it’ll be much easier for you to use the functions yourself. And, of course, if you use the functions all the time, it’s likely other people will find them useful too.

Of course, you may be in a situation where you can’t—or don’t want to—share your functions, or where they access private resources. And in such cases, you can just deploy the functions to your own cloud account, setting permissions for who can access them. (If your organization has a Wolfram Enterprise Private Cloud, then this will soon be able to host its own private Function Repository, which can be administered within your organization, and set to force review of submissions, or not.)

Functions you submit to the Wolfram Function Repository don’t have to be perfect; they just have to be useful. And—a bit like the “Bugs” section in classic Unix documentation—there’s a section in the Definition Notebook called “Author Notes” in which you can describe limitations, issues, etc. that you’re already aware of about your function. In addition, when you submit your function you can include Submission Notes that’ll be read by the curation team.

Once a function is published, its documentation page always has two links at the bottom: “Send a message about this function”, and “Discuss on Wolfram Community”. If you send a message (say reporting a bug), you can check a box saying you want your message and contact information to be passed to the author of the function.

Often you’ll just want to use functions from the Wolfram Function Repository like built-in functions, without looking inside them. But if you want to “look inside”, there’s always a Source Notebook button at the top. Press it and you’ll get your own copy of the original Definition Notebook that was submitted to the Function Repository. Sometimes you might just want to look at this as an example. But you can also make your own modifications. Maybe you’ll want to deploy these on your computer or in your cloud account. Or maybe you’ll want to submit these to the Function Repository, perhaps as a better version of the original function.

In the future, we might support Git-style forking in the Function Repository. But for now, we’re keeping it simpler, and we’re always having just one canonical version of each function. And basically (unless they abandon it and don’t respond to messages) the original author of the function gets to control updates to it—and gets to submit new versions, which are then reviewed and, if approved, published.

OK, so how does versioning work? Right now, as soon as you use a function from the Function Repository its definition will get permanently stored on your computer (or in your cloud account, if you’re using the cloud). If there’s a new version of the function, then when you next use the function, you’ll get a message letting you know this. And if you want to update to the new version, you can do that with ResourceUpdate. (The “function blob” actually stores more information about versioning, and in the future we’re planning on making this conveniently accessible.)
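
In textual form, updating is just a one-liner (a sketch, assuming a newer version of the function has actually been published):

(* a sketch: update a locally cached repository function to its latest published version *)
ResourceUpdate[ResourceObject["StringIntersectingQ"]]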

One of the great things about the Wolfram Function Repository is that any Wolfram Language program anywhere can use functions from it. If the program appears in a notebook, it’s often nice to format Function Repository functions as easy-to-read “function blobs” (perhaps with appropriate versioning set).

But you can always refer to any Function Repository function using a textual ResourceFunction[...]. And this is convenient if you’re directly writing code or scripts for the Wolfram Engine, say with an IDE or textual code editor. (And, yes, the Function Repository is fully compatible with the Free Wolfram Engine for Developers.)
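
Here’s a minimal sketch of what that looks like in a script run with wolframscript; the particular arguments given to StringIntersectingQ are just assumed for illustration:

#!/usr/bin/env wolframscript
(* a sketch of calling a repository function textually from a script;
   the arguments passed to StringIntersectingQ are assumed for illustration *)
Print[ResourceFunction["StringIntersectingQ"]["butterfly", "flywheel"]]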

How It Works

Inside, the Wolfram Function Repository uses exactly the same Resource System framework as all our other repositories (Data Repository, Neural Net Repository, Demonstrations Project, etc.). And like everything else in the Resource System, a ResourceFunction is ultimately based on a ResourceObject.

Here’s a ResourceFunction:

ResourceFunction["StringIntersectingQ"]

It’s somewhat complicated inside, but you can see some of what’s there using Information:

Information[ResourceFunction["StringIntersectingQ"]]

So how does setting up a resource function work? The simplest is the purely local case. Here’s an example that takes a function (here, just a pure function) and defines it as a resource function for this session:

DefineResourceFunction[1 + # &, "AddOne"]

Once you’ve made the definition, you can use the resource function:

ResourceFunction["AddOne"][100]

Notice that in this function blob, there’s a black icon. This indicates that the function blob refers to an in-memory resource function defined for your current session. For a resource function that’s permanently stored on your computer, or in a cloud account, there’s a gray icon. And for an official resource function in the Wolfram Function Repository there’s an orange icon.

OK, so what happens when you use the Deploy menu in a Definition Notebook? First, it’ll take everything in the Definition Notebook and make a symbolic ResourceObject out of it. (And if you’re using a textual IDE—or a program—you can also explicitly create the ResourceObject.)

Deploying locally on your computer uses LocalCache on the resource object to store it as a LocalObject in your file system. Deploying in your cloud account uses CloudDeploy on the resource object, and deploying publicly in the cloud uses CloudPublish. In all cases, ResourceRegister is also used to register the name of the resource function so that ResourceFunction["name"] will work.
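
In textual form, the local case amounts to something like this (a sketch, assuming ro is the ResourceObject built from the Definition Notebook; CloudDeploy[ro] or CloudPublish[ro] would be the cloud analogs):

(* a sketch, assuming ro is the ResourceObject built from a Definition Notebook *)
LocalCache[ro];        (* store the resource as a LocalObject in the file system *)
ResourceRegister[ro]   (* register the name, so ResourceFunction["name"] finds it *)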

If you press Submit to Function Repository, then what’s happening underneath is that ResourceSubmit is being called on the resource object. (And if you’re using a textual interface, you can call ResourceSubmit directly.)
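
In textual form that’s just (again assuming ro is the resource object):

(* a sketch: submit the resource object to the Wolfram Function Repository for review *)
ResourceSubmit[ro]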

By default, the submission is made under the name associated with your Wolfram ID. But if you’re submitting on behalf of a group or an organization, then you can set up a separate Publisher ID, and you can instead use this as the name to associate with your submissions.

Once you’ve submitted something to the Function Repository, it’ll go into the queue for review. If you get comments back, they’ll usually be in the form of a notebook with extra “comment cells” added. You can always check on the status of your submission by going to the Resource System Contributor Portal. But as soon as it’s approved, you’ll be notified (by email), and your submission will be live on the Wolfram Function Repository.

Some Subtleties

At first, it might seem like it should be possible to take a Definition Notebook and just put it verbatim into the Function Repository. But actually there are quite a few subtleties—and handling them requires doing some fairly sophisticated metaprogramming, symbolically processing both the code defining the function, and the Definition Notebook itself. Most of this happens internally, behind the scenes. But it has some consequences that are worth understanding if you’re going to contribute to the Function Repository.

Here’s one immediate subtlety. When you fill out the Definition Notebook, you can just refer to your function everywhere by a name like MyFunction—that looks like an ordinary name for a function in the Wolfram Language. But for the Function Repository documentation, this gets replaced by ResourceFunction["MyFunction"]—which is what users will actually use.

Here’s another subtlety: when you create a resource function from a Definition Notebook, all the dependencies involved in the definition of the function need to be captured and explicitly included. And to guarantee that the definitions remain modular, one needs to put everything in a unique namespace. (Needless to say, the functions that do all this are in the Function Repository.)

Usually you’ll never see any evidence of the internal context used to set up this namespace. But if for some reason you return an unevaluated symbol from the innards of your function, then you’ll see that the symbol is in the internal context. However, when the Definition Notebook is processed, at least the symbol corresponding to the function itself is set up to be displayed elegantly as a function blob rather than as a raw symbol in an internal context.

The Function Repository is about defining new functions. And these functions may have options. Often these options will be ones (like, say, Method or ImageSize) that have already been used for built-in functions, and for which built-in symbols already exist. But sometimes a new function may need new options. To maintain modularity, one might like these options to be symbols defined in a unique internal context (or to be something like whole resource functions in their own right). But to keep things simple, the Function Repository allows new options to be given in definitions as strings. And, as a courtesy to the final user, these definitions (assuming they’ve used OptionValue and OptionsPattern) are also processed so that when the functions are used, the options can be given not only as strings but also as global symbols with the same name.
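
Here’s a minimal sketch of what a definition with a string option looks like; the function scaledAdd and its "ScaleFactor" option are made up for this example:

(* a minimal sketch of a repository-style function with a string option;
   scaledAdd and "ScaleFactor" are hypothetical names for this example *)
Options[scaledAdd] = {"ScaleFactor" -> 1};
scaledAdd[x_, y_, OptionsPattern[]] := x + OptionValue["ScaleFactor"] y

scaledAdd[3, 4, "ScaleFactor" -> 10]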

Most functions just do what they do each time they are called. But some functions need initialization before they can run in a particular session—and to deal with this there’s an Initialization section in the Definition Notebook.

Functions in the Function Repository can immediately make use of other functions that are already in the Repository. But how do you set up definitions for the Function Repository that involve two (or more) functions that refer to each other? Basically you just have to deploy them in your session, so you can refer to them as ResourceFunction["name"]. Then you can create the examples you want, and then submit the functions.
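
Here’s a toy sketch of that pattern: two session-local resource functions (made with DefineResourceFunction, as above) that call each other by name:

(* a toy sketch: two session-local resource functions that refer to each other by name;
   IsEvenViaOdd and IsOddViaEven are made-up names for this example *)
DefineResourceFunction[
  Function[n, If[n == 0, True, ResourceFunction["IsOddViaEven"][n - 1]]], "IsEvenViaOdd"];
DefineResourceFunction[
  Function[n, If[n == 0, False, ResourceFunction["IsEvenViaOdd"][n - 1]]], "IsOddViaEven"];

ResourceFunction["IsEvenViaOdd"][10]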

What Happens When the Repository Gets Big?

Today we’re just launching the Wolfram Function Repository. But over time we expect it to grow dramatically, and as it grows there are a variety of issues that we know will come up.

The first is about function names and their uniqueness. The Function Repository is designed so that—like for built-in functions in the Wolfram Language—one can refer to any given function just by giving its name. But this inevitably means that the names of functions have to be globally unique across the Repository—so that, for example, there can be only one ResourceFunction["MyFavoriteFunction"] in the Repository.

This might seem like a big issue. But it’s worth realizing it’s basically the same issue as for things like internet domains or social network handles. And the point is that one simply has to have a registrar—and that’s one of the roles we’re playing for the Wolfram Function Repository. (For private versions of the Repository, their administrators can be registrars.) Of course an internet domain can be registered without having anything on it, but in the Function Repository the name of a function can only be registered if there’s an actual function definition to go with it.

And part of our role in managing the Wolfram Function Repository is to ensure that the name picked for a function is reasonable given the definition of the function—and that it fits in with Wolfram Language naming conventions. We’ve now had 30+ years of experience in naming built-in functions in the Wolfram Language, and our curation team brings that experience to the Function Repository. Of course, there are always tradeoffs. For example, it might seem nice to have a short name for some function. But it’s better to “name defensively” with a longer, more specific name, because then it’s less likely to collide with something one wants to do in the future.

(By the way, just adding some kind of contributor tag to disambiguate functions wouldn’t achieve much. Because unless one insists on always giving the tag, one will end up having to define a default tag for any given function. Oh, and allocating contributor tags again requires global coordination.)

As the Wolfram Function Repository grows, one of the issues that’s sure to arise is the discoverability of functions. Yes, there’s search functionality (and Definition Notebooks can include keywords, etc.). But for built-in functions in the Wolfram Language there’s all sorts of cross-linking in documentation which helps “advertise” functions. Functions in the Function Repository can link to built-in functions. But what about the other way around? We’re going to be experimenting with various schemes to expose Function Repository functions on the documentation pages for built-in functions.

For built-in functions in the Wolfram Language, there’s also a level of discoverability provided by the network of “guide pages” that give organized lists of functions relevant to particular areas. It’s always complicated to appropriately balance guide pages—and as the Wolfram Language has grown, it’s common for guide pages to have to be completely refactored. It’s fairly easy to put functions from the Function Repository into broad categories, and even to successively break up these categories. But it’s much more valuable to have properly organized guide pages. It’s not yet clear how best to produce these for the whole Function Repository. But for example CreateResourceObjectGallery in the Function Repository lets anyone put up a webpage containing their “picks” from the repository:

Resource gallery

The Wolfram Function Repository is set up to be a permanent repository of functions, where any function in it will always just work. But of course, there may be new versions of functions. And we fully expect some functions to be obsoleted over time. The functions will still work if they’re used in programs. But their documentation pages will point to new, better functions.

The Wolfram Function Repository is all about providing new functions quickly—and exploring new frontiers for how the Wolfram Language can be used. But we fully expect that some of what’s explored in the Function Repository will eventually make sense to become built-in parts of the core Wolfram Language. We’ve had a slightly similar flow over the past decade from functionality that was originally introduced in Wolfram|Alpha. And one of the lessons is that to achieve the standards of quality and coherence that we insist on for anything built into the Wolfram Language is a lot of work—that usually dwarfs the original implementation effort. But even so, a function in the Function Repository can serve as a very useful proof of concept for a future function built into the Wolfram Language.

And of course the critical thing is that a function in the Function Repository is something that’s available for everyone to use right now. Yes, an eventual built-in function could be much better and stronger. But the Function Repository lets people get access to new functions immediately. And, crucially, it lets those new functions be contributed by anyone.

Earlier in the history of the Wolfram Language this wouldn’t have worked so well. But now there is so much already built into the language—and so strong an understanding of the design principles of the language—that it’s feasible for a large community of people to add functions that maintain the design consistency needed to make them broadly useful.

There’s incredible talent in the community of Wolfram Language users. (And, of course, that community includes many of the world’s top people in R&D across a vast range of fields.) I’m hoping that the Wolfram Function Repository will provide an efficient platform for that talent to be exposed and shared. And that together we’ll be able to create something that dramatically expands the domain to which the computational paradigm can be applied.

We’ve taken the Wolfram Language a long way in 30+ years. Now, together, let’s take it much further. And let’s use the Function Repository—as well as things like the Free Wolfram Engine for Developers—as a platform for doing that.

A Few Thoughts about Deep Fakes


Someone from the House Permanent Select Committee on Intelligence recently contacted me about a hearing they’re having on the subject of deep fakes. I can’t attend the hearing, but the conversation got me thinking about the subject of deep fakes, and I made a few quick notes….

What You See May Not Be What Happened

The idea of modifying images is as old as photography. At first, it had to be done by hand (sometimes with airbrushing). By the 1990s, it was routinely being done with image manipulation software such as Photoshop. But it’s something of an art to get a convincing result, say for a person inserted into a scene. And if, for example, the lighting or shadows don’t agree, it’s easy to tell that what one has isn’t real.

What about videos? If one does motion capture, and spends enough effort, it’s perfectly possible to get quite convincing results—say for animating aliens, or for putting dead actors into movies. The way this works, at least in a first approximation, is for example to painstakingly pick out the keypoints on one face, and map them onto another.

What’s new in the past couple of years is that this process can basically be automated using machine learning. And, for example, there are now neural nets that are simply trained to do “face swapping”:

Face swap

In essence, what these neural nets do is to fit an internal model to one face, and then apply it to the other. The parameters of the model are in effect learned from looking at lots of real-world scenes, and seeing what’s needed to reproduce them. The current approaches typically use generative adversarial networks (GANs), in which there’s iteration between two networks: one trying to generate a result, and one trying to discriminate that result from a real one.

Today’s examples are far from perfect, and it’s not too hard for a human to tell that something isn’t right. But even just as a result of engineering tweaks and faster computers, there’s been progressive improvement, and there’s no reason to think that within a modest amount of time it won’t be possible to routinely produce human-indistinguishable results.

Can Machine Learning Police Itself?

OK, so maybe a human won’t immediately be able to tell what’s real and what’s not. But why not have a machine do it? Surely there’s some signature of something being “machine generated”. Surely there’s something about a machine-generated image that’s statistically implausible for a real image.

Well, not necessarily. Because, in fact, the whole way the machine images are generated is by having models that as faithfully as possible reproduce the “statistics” of real images. Indeed, inside a GAN there’s explicitly a “fake or not” discriminator. And the whole point of the GAN is to iterate until the discriminator can’t tell the difference between what’s being generated and something real.

Could one find some other feature of an image that the GAN isn’t paying attention to—like whether a face is symmetric enough, or whether writing in the background is readable? Sure. But at this level it’s just an arms race: having identified a feature, one puts it into the model the neural net is using, and then one can’t use that feature to discriminate any more.

There are limitations to this, however. Because there’s a limit to what a typical neural net can learn. Generally, neural nets do well at tasks like image recognition that humans do without thinking. But it’s a different story if one tries to get neural nets to do math, and for example factor numbers.

Imagine that in modifying a video one has to fill in a background that’s showing some elaborate computation—say a mathematical one. Well, then a standard neural net basically doesn’t stand a chance.

Will it be easy to tell that it’s getting it wrong? It could be. If one’s dealing with public-key cryptography, or digital signatures, one can certainly imagine setting things up so that it’s very hard to generate something that is correct, but easy to check whether it is.

But will this kind of thing show up in real images or videos? My own scientific work has actually shown that irreducibly complex computation can be quite ubiquitous even in systems with very simple rules—and presumably in many systems in nature. Watch a splash in water. It takes a complex computation to figure out the details of what’s going to happen. And while a neural net might be able to get something that basically looks like a splash, it’d be vastly harder for it to get the details of a particular splash right.

But even though in the abstract computational irreducibility may be common, we humans, in our evolution and the environments we set up for ourselves, tend to end up doing our best to avoid it. We have shapes with smooth curves. We build things with simple geometries. We try to make things evolvable or understandable.  And it’s this avoidance of computational irreducibility that makes it feasible for neural nets to successfully model things like the visual scenes in which we typically find ourselves.

One can disrupt this, of course. Just put in the picture a display that’s showing some sophisticated computation (even, for example, a cellular automaton). If someone tries to fake some aspect of this with a neural net, it won’t (at least on its own) feasibly be able to get the details right.

I suspect that in the future of human technology—as we mine deeper in the computational universe—irreducible computation will be much more common in what we build. But as of now, it’s still rare in typical human-related situations. And as a result, we can expect that neural nets will successfully be able to model what’s going on well enough to at least fool other neural nets.

How to Know What’s Real

So if there’s no way to analyze the bits in an image to tell if it’s a real photograph, does that mean we just can’t tell? No. Because we can also think about metadata associated with the image—and about the provenance of the image. When was the image created? By whom? And so on.

So let’s say we create an image. How can we set things up so that we can prove when we did it? Well, in modern times it’s actually very easy. We take the image, and compute a cryptographic hash from it (effectively by applying a mathematical operation that derives a number from the bits in the image). Then we take this hash and put it on a blockchain.

The blockchain acts as a permanent ledger. Once we’ve put data on it, it can never be changed, and we can always go back and see what the data was, and when it was added to the blockchain.

This setup lets us prove that the image was created no later than a certain time. If we want to prove that the image wasn’t created earlier, then when we create the hash for the image, we can throw in a hash from the latest block on our favorite blockchain.
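
A sketch of that refinement (assuming "BlockHash" is the relevant property of the most recent block):

(* a sketch: fold the latest block hash into the image hash, so the combined hash
   could not have been computed before that block existed; "BlockHash" is assumed
   to be the relevant property here *)
latestBlockHash = BlockchainBlockData[-1]["BlockHash"];
BlockchainPut[Hash[{img, latestBlockHash}, "SHA256", "HexString"]]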

OK, but what about knowing who created the image? It takes a bit of cryptographic infrastructure—very similar to what’s done in proving the authenticity of websites. But if one can trust some “certificate authority” then one can associate a digital signature to the image that validates who created it.
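
Here’s a minimal sketch using the built-in digital signature functions, with a freshly generated key pair standing in for one certified by an authority, and hash as computed above:

(* a minimal sketch; in practice the key pair would be certified by a certificate authority *)
keys = GenerateAsymmetricKeyPair[];
signature = GenerateDigitalSignature[hash, keys["PrivateKey"]];
VerifyDigitalSignature[{hash, signature}, keys["PublicKey"]]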

But how about knowing where the image was taken? Assuming one has a certain level of access to the device or the software, GPS can be spoofed. If one records enough about the environment when the image was taken, then it gets harder and harder to spoof. What were the nearby Wi-Fi networks? The Bluetooth pings? The temperature? The barometric pressure? The sound level? The accelerometer readings? If one has enough information collected, then it becomes easier to tell if something doesn’t fit.

There are several ways one could do this. Perhaps one could just detect anomalies using machine learning. Or perhaps one could use actual models of how the world works (the path implied by the accelerometer isn’t consistent with the equations of mechanics, etc.). Or one could somehow tie the information to some public computational fact. Was the weather really like that in the place the photo was said to be taken? Why isn’t there a shadow from such-and-such a plane going overhead? Why is what’s playing on the television not what it should be? Etc.

But, OK, even if one just restricts oneself to creation time and creator ID, how can one in practice validate them?

The best scheme seems to be something like how modern browsers handle website security. The browser tries to check the cryptographic signature of the website. If it matches, the browser shows something to say the website is secure; if not, it shows some kind of warning.

So let’s say an image comes with data on its creation time and creator ID. The data could be metadata (say EXIF data), or it could be a watermark imprinted on the detailed bits in the image. Then the image viewer (say in the browser) can check whether the hash on a blockchain agrees with what the data provided by the image implies. If it does, fine. And the image viewer can make the creation time and creator ID available. If not, the image viewer should warn the user that something seems to be wrong.

Exactly the same kind of thing can be done with videos. It just requires video players computing hashes on the video, and comparing to what’s on a blockchain. And by doing this, one can guarantee, for example, that one’s seeing a whole video that was made at a certain time.

How would this work in practice? Probably people often wouldn’t want to see all the raw video taken at some event. But a news organization, for example, could let people click through to it if they wanted. And one can easily imagine digital signature mechanisms that could be used to guarantee that an edited video, for example, contained no content not in certain source videos, and involved, say, specified contiguous chunks from these source videos.

The Path Forward

So, where does this leave us with deep fakes? Machine learning on its own won’t save us. There’s not going to be a pure “fake or not” detector that can run on any image or video. Yes, there’ll be ways to protect oneself against being “faked” by doing things like wearing a live cellular automaton tie. But the real way to combat deep fakes, I think, is to use blockchain technology—and to store on a public ledger cryptographic hashes of both images and sensor data from the environment where the images were acquired. The very presence of a hash can guarantee when an image was acquired; “triangulating” from sensor and other data can give confidence that what one is seeing was something that actually happened in the real world.

Of course, there are lots of technical details to work out. But in time I’d expect image and video viewers could routinely check against blockchains (and “data triangulation computations”), a bit like how web browsers now check security certificates. And today’s “pics or it didn’t happen” will turn into “if it’s not on the blockchain it didn’t happen”.

My Part in an Origin Story: The Launching of the Santa Fe Institute


The first workshop to define what is now the Santa Fe Institute took place on October 5–6, 1984. I was recently asked to give some reminiscences of the event, for a republication of a collection of papers derived from this and subsequent workshops.

It was a slightly dark room, decorated with Native American artifacts. Around it were tables arranged in a large rectangle, at which sat a couple dozen men (yes, all men), mostly in their sixties. The afternoon was wearing on, with many different people giving their various views about how to organize what amounted to a putative great new interdisciplinary university.

Here’s the original seating chart, together with a current view of the meeting room. (I’m only “Steve” to Americans currently over the age of 60…):

Santa Fe seating chart
Dobkin Boardroom

I think I was less patient in those days. But eventually I could stand it no longer. I don’t remember my exact words, but they boiled down to: “What are you going to do if you only raise a few million dollars, not two billion?” It was a strange moment. After all, I was by far the youngest person there—at 25 years old—and yet it seemed to have fallen to me to play the “let’s get real” role. (To be fair, I had founded my first tech company a couple of years earlier, and wasn’t a complete stranger to the world of grandiose “what-if” discussions, even if I was surprised, though more than a little charmed, to be seeing them in the sixty-something-year-old set.)

A fragment of my notes from the day records my feelings:

What is supposed to be the point of this discussion?

George Cowan (Manhattan Project alum, Los Alamos administrator, and founder of the Los Alamos Bank) was running the meeting, and I sensed a mixture of frustration and relief at my question. I don’t remember precisely what he said, but it boiled down to: “Well, what do you think we should do?” “Well”, I said, “I do have a suggestion”. I summarized it a bit, but then it was agreed that later that day I should give a more formal presentation. And that’s basically how I came to suggest that what would become the Santa Fe Institute should focus on what I called “Complex Systems Theory”.

Of course, there was a whole backstory to this. It basically began in 1972, when I was 12 years old, and saw the cover of a college physics textbook that purported to show an arrangement of simulated colliding molecules progressively becoming more random. I was fascinated by this phenomenon, and quite soon started trying to use a computer to understand it. I didn’t get too far with this. But it was the golden age of particle physics, and I was soon swept up in publishing papers about a variety of topics in particle physics and cosmology.

Still, in all sorts of different ways I kept on coming back to my interest in how randomness—or complexity—gets produced. In 1978 I went to Caltech as a graduate student, with Murray Gell-Mann (inventor of quarks, and the first chairman of the Santa Fe Institute) doing his part to recruit me by successfully tracking down a phone number for me in England. Then in 1979, as a way to help get physics done, I set about building my first large-scale computer language. In 1981, the first version was finished, I was installed as a faculty member at Caltech—and I decided it was time for me to try something more ambitious, and really see what I could figure out about my old interest in randomness and complexity.

By then I had picked away at many examples of complexity. In self-gravitating gases. In dendritic crystal growth. In road traffic flow. In neural networks. But the reductionist physicist in me wanted to drill down and find out what was underneath all these. And meanwhile the computer language designer in me thought, “Let’s just invent something and see what can be done with it”. Well, pretty soon I invented what I later found out were called cellular automata.

I didn’t expect that simple cellular automata would do anything particularly interesting. But I decided to try computer experiments on them anyway. And to my great surprise I discovered that—despite the simplicity of their construction—cellular automata can in fact produce behavior of great complexity. It’s a major shock to traditional scientific intuition—and, as I came to realize in later years, a clue to a whole new kind of science.

But for me the period from 1981 to 1984 was an exciting one, as I began to explore the computational universe of simple programs like cellular automata, and saw just how rich and unexpected it was. David Pines, as the editor of Reviews of Modern Physics, had done me the favor of publishing my first big paper on cellular automata (John Maddox, editor of Nature, had published a short summary a little earlier). Through the Center for Nonlinear Studies, I had started making visits to Los Alamos in 1981, and I initiated and co-organized the first-ever conference devoted to cellular automata, held at Los Alamos in 1983.

In 1983 I had left Caltech (primarily as a result of an unhappy interaction about intellectual property rights) and gone to the Institute for Advanced Study in Princeton, and begun to build a group there concerned with studying the basic science of complex systems. I wasn’t sure until quite a few years later just how general the phenomena I’d seen in cellular automata were. But I was pretty certain that there were at least many examples of complexity across all sorts of fields that they’d finally let one explain in a fundamental, theoretical way.

I’m not sure when I first heard about plans for what was then called the Rio Grande Institute. But I remember not being very hopeful about it; it seemed too correlated with the retirement plans of a group of older physicists. But meanwhile, people like Pete Carruthers (founder of T Division at Los Alamos) were encouraging me to think about starting my own institute to pursue the kind of science I thought could be done.

I didn’t know quite what to make of the letter I received in July 1984 from Nick Metropolis (long-time Los Alamos scientist, and inventor of the Metropolis method). It described the nascent Rio Grande Institute as “a teaching and research institution responsive to the challenge of emerging new syntheses in science”. Murray Gell-Mann had told me that it would bring together physics and archaeology, linguistics and cosmology, and more. But at least in the circulated documents, the word “complexity” appeared quite often.

Letter from Los Alamos

The invitation described the workshop as being “to examine a concept for a fresh approach to research and teaching in rapidly developing fields of scientific activity dealing with highly complex, interactive systems”. Murray Gell-Mann, who had become a sort of de facto intellectual leader of the effort, was given to quite flowery descriptions, and declared that the institute would be involved with “simplicity and complexity”.

When I arrived at the workshop it was clear that everyone wanted their favorite field to get a piece of the potential action. Should I even bring up my favorite emerging field? Or should I just make a few comments about computers and let the older guys do their thing?

As I listened to the talks and discussions, I kept wondering how what I was studying might relate to them. Quite often I really didn’t know. At the time I still believed, for example, that adaptive systems might have fundamentally different characteristics. But still, the term “complexity” kept on coming up. And if the Rio Grande Institute needed an area to concentrate on, it seemed that a general study of complexity would be the closest to being central to everything they were talking about.

I’m not sure quite what the people in the room made of my speech about “complex systems theory”. But I think I did succeed in making the point that there really could be a general “science of complexity”—and that things like cellular automata could show one how it might work. People had been talking about the complexity of this, or the complexity of that. But it seemed like I’d at least started the process of getting people to talk about complexity as an abstract thing one could expect to have general theories about.

After that first workshop, I had a few more interactions with what was to be the Santa Fe Institute. I still wasn’t sure what was going to happen with it—but the “science of complexity” idea did seem to be sticking. Meanwhile, however, I was forging ahead with my own plans to start a complex systems institute (I avoided the term “complexity theory” out of deference to the rather different field of computational complexity theory). I was talking to all sorts of universities, and in fact David Pines was encouraging me to consider the University of Illinois.

George Cowan asked me if I’d be interested in running the research program for the Santa Fe Institute, but by that point I was committed to starting my own operation, and it wasn’t long afterwards that I decided to do it at the University of Illinois. My Center for Complex Systems Research—and my journal Complex Systems—began operations in the summer of 1986.

Complex Systems

I’m not sure how things would have been different if I’d ended up working with the Santa Fe Institute. But as it was, I rather quickly tired of the effort to raise money for complex systems research, and I was soon off creating what became Mathematica (and now the Wolfram Language), and starting my company Wolfram Research.

By the early 1990s, probably in no small part through the efforts of the Santa Fe Institute, “complexity” had actually become a popular buzzword, and, partly through a rather circuitous connection to climate science, funding had started pouring in. But having launched Mathematica and my company, I’d personally pretty much vanished from the scene, working quietly on using the tools I’d created to pursue my interests in basic science. I thought it would only take a couple of years, but in the end it took more than a decade.

I discovered a lot—and realized that, yes, the phenomena I’d first seen with cellular automata and talked about at the Santa Fe workshop were indeed a clue to a whole new kind of science, with all sorts of implications for long-standing problems and for the future. I packaged up what I’d figured out—and in 2002 published my magnum opus A New Kind of Science.

A New Kind of Science

It was strange to reemerge after a decade and a half away. The Santa Fe Institute had continued to pursue the science of complexity. As something of a hermit in those years, I hadn’t interacted with it—but there was curiosity about what I was doing (highlighted, if nothing else, by a bizarre incident in 1998 involving “leaks” about my research). When my book came out in 2002 I was pleased that I thought I’d actually done what I talked about doing back at that Santa Fe workshop in 1984—as well as much more.

But by then almost nobody who’d been there in 1984 was still involved with the Santa Fe Institute, and instead there was a “new guard” (now, I believe, again departed), who, far from being pleased with my progress and success in broadening the field, actually responded with rather unseemly hostility.

It’s been an interesting journey from those days in October 1984. Today complex systems research is very definitely “a thing”, and there are hundreds of “complex systems” institutes around the world. (Though I still don’t think the basic science of complexity, as opposed to its applications, has received the attention it should.) But the Santa Fe Institute remains the prototypical example—and it’s not uncommon when I talk about complexity research for people to ask, “Is that like what the Santa Fe Institute does?”

“Well actually”, I sometimes say, “there’s a little footnote to history about that”. And off I go, talking about that Saturday afternoon back in October 1984—when I could be reached (as the notes I distributed said) through that newfangled thing called email at ias!swolf

Stephen Wolfram's notes on complex systems

Testifying at the Senate about A.I.-Selected Content on the Internet


Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms

An Invitation to Washington

Three and a half weeks ago I got an email asking me if I’d testify at a hearing of the US Senate Commerce Committee’s Subcommittee on Communications, Technology, Innovation and the Internet. Given that the title of the hearing was “Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms” I wasn’t sure why I’d be relevant.

But then the email went on: “The hearing is intended to examine, among other things, whether algorithmic transparency or algorithmic explanation are policy options Congress should be considering.” That piqued my interest, because, yes, I have thought about “algorithmic transparency” and “algorithmic explanation”, and their implications for the deployment of artificial intelligence.

Generally I stay far away from anything to do with politics. But figuring out how the world should interact with AI is really important. So I decided that—even though it was logistically a bit difficult—I should do my civic duty and go to Washington and testify.

Understanding the Issues

So what was the hearing really about? For me, it was in large measure an early example of reaction to the realization that, yes, AIs are starting to run the world. Billions of people are being fed content that is basically selected for them by AIs, and there are mounting concerns about this, as reported almost every day in the media.

Are the AIs cleverly hacking us humans to get us to behave in a certain way? What kind of biases do the AIs have, relative to what the world is like, or what we think the world should be like? What are the AIs optimizing for, anyway? And when are there actually “humans behind the curtain”, controlling in detail what the AIs are doing?

It doesn’t help that in some sense the AIs are getting much more free rein than they might because the people who use them aren’t really their customers. I have to say that back when the internet was young, I personally never thought it would work this way, but in today’s world many of the most successful businesses on the internet—including Google, Facebook, YouTube and Twitter—make their revenue not from their users, but instead from advertisers who are going through them to reach their users.

All these businesses also have in common that they are fundamentally what one can call “automated content selection businesses”: they work by getting large amounts of content that they didn’t themselves generate, then using what amounts to AI to automatically select what content to deliver or to suggest to any particular user at any given time—based on data that they’ve captured about that user. Part of what’s happening is presumably optimized to give a good experience to their users (whatever that might mean), but part of it is also optimized to get revenue from the actual customers, i.e. advertisers. And there’s also an increasing suspicion that somehow the AI is biased in what it’s doing—maybe because someone explicitly made it be, or because it somehow evolved that way.

“Open Up the AI”?

So why not just “open up the AI” and see what it’s doing inside? Well, that’s what the algorithmic transparency idea mentioned in the invitation to the hearing is about.

And the problem is that, no, that can’t work. If we want to seriously use the power of computation—and AI—then inevitably there won’t be a “human-explainable” story about what’s happening inside.

So, OK, if you can’t check what’s happening inside the AI, what about putting constraints on what the AI does? Well, to do that, you have to say what you want. What rule for balance between opposing kinds of views do you want? How much do you allow people to be unsettled by what they see? And so on.

And there are two problems here: first, what to want, and, second, how to describe it. In the past, the only way we could imagine describing things like this was with traditional legal rules, written in legalese. But if we want AIs to automatically follow these rules, perhaps billions of times a second, that’s not good enough: instead, we need something that AIs can intrinsically understand.

And at least on this point I think we’re making good progress. Because—thanks to our 30+ years of work on Wolfram Language—we’re now beginning to have a computational language that has the scope to formulate “computational contracts” that can specify relevant kinds of constraints in computational terms, in a form that humans can write and understand, and that machines can automatically interpret.

But even though we’re beginning to have the tools, there’s still the huge question of what the “computational laws” for automatic content selection AIs will be.

A lot of the hearing ultimately revolved around Section 230 of the 1996 Communications Decency Act—which specifies what kinds of content companies can choose to block without losing their status as “neutral platforms”. There’s a list of fairly uncontroversially blockable kinds of content. But then the sentence ends with “or otherwise objectionable [content]”. What does this mean? Does it mean content that espouses objectionable points of view? Whose definition of “objectionable”? Etc.

Well, one day things like Section 230 will, of necessity, not be legalese laws, but computational laws. There’ll be some piece of computational language that specifies for example that this-or-that machine learning classifier trained on this-or-that sample of the internet will be used to define this or that.

We’re not there yet, however. We’re only just beginning to be able to set up computational contracts for much simpler things, like business situations. And—somewhat fueled by blockchain—I expect that this will accelerate in the years to come. But it’s going to be a while before the US Senate is routinely debating lines of code in computational laws.

So, OK, what can be done now?

A Possible Path Forward?

A little more than a week ago, what I’d figured out was basically what I’ve already described here. But that meant I was looking at going to the hearing and basically saying only negative things. “Sorry, this won’t work. You can’t do that. The science says it’s impossible. The solution is years in the future.” Etc.

And, as someone who prides himself on turning the seemingly impossible into the possible, this didn’t sit well with me. So I decided I’d better try to figure out if I could actually see a pragmatic, near-term path forward. At first, I tried thinking about purely technological solutions. But soon I basically convinced myself that no such solution was going to work.

So, with some reticence, I decided I’d better start thinking about other kinds of solutions. Fortunately there are quite a few people at my company and in my circle who I could talk to about this—although I soon discovered they often had strongly conflicting views. But after a little while, a glimmer of an idea emerged.

Why does every aspect of automated content selection have to be done by a single business? Why not open up the pipeline, and create a market in which users can make choices for themselves?

One of the constraints I imposed on myself is that my solution couldn’t detract from the impressive engineering and monetization of current automated content selection businesses. But I came up with at least two potential ways to open things up that I think could still perfectly well satisfy this constraint.

One of my ideas involved introducing what I call “final ranking providers”: third parties who take pre-digested feature vectors from the underlying content platform, then use these to do the final ranking of items in whatever way they want. My other idea involved introducing “constraint providers”: third parties who provide constraints in the form of computational contracts that are inserted into the machine learning loop of the automated content selection system.

The important feature of both these solutions is that users don’t have to trust the single AI of the automated content selection business. They can in effect pick their own brand of AI—provided by a third party they trust—to determine what content they’ll actually be given.

Who would these third-party providers be? They might be existing media organizations, or nonprofits, or startups. Or they might be something completely new. They’d have to have some technical sophistication. But fundamentally what they’d have to do is to define—or represent—brands that users would trust to decide what the final list of items in their news feed, or video recommendations, or search results, or whatever, might be.

Social networks get their usefulness by being monolithic: by having “everyone” connected into them. But the point is that the network can prosper as a monolithic thing, but there doesn’t need to be just one monolithic AI that selects content for all the users on the network. Instead, there can be a whole market of AIs, that users can freely pick between.

And here’s another important thing: right now there’s no consistent market pressure on the final details of how content is selected for users, not least because users aren’t the final customers. (Indeed, pretty much the only pressure right now comes from PR eruptions and incidents.) But if the ecosystem changes, and there are third parties whose sole purpose is to serve users, and to deliver the final content they want, then there’ll start to be real market forces that drive innovation—and potentially add more value.

Could It Work?

AI provides powerful ways to automate the doing of things. But AIs on their own can’t ultimately decide what they want to do. That has to come from outside—from humans defining goals. But at a practical level, where should those goals be set? Should they just come—monolithically—from an automated content selection business? Or should users have more freedom, and more choice?

One might say: “Why not let every user set everything for themselves?”. Well, the problem with that is that automated content selection is a complicated matter. And—much as I hope that there’ll soon be very widespread computational language literacy—I don’t think it’s realistic that everyone will be able to set up everything in detail for themselves. So instead, I think the better idea is to have discrete third-party providers, who set things up in a way that appeals to some particular group of users.

Then standard market forces can come into play. No doubt the result would even be a greater level of overall success for the delivery of content to users who want it (and monetize it). But this market approach also solves some other problems associated with the “single point of failure” monolithic AI.

For example, with the monolithic AI, if someone figures out how to spread some kind of bad content, it’ll spread everywhere. With third-party providers, there’s a good chance it’ll only spread through some of them.

Right now there’s lots of unhappiness about people simply being “banned” from particular content platforms. But with the market of third-party providers, banning is not an all-or-nothing proposition anymore: some providers could ban someone, but others might not.

OK, but are there “fatal flaws” with my idea? People could object that it’s technically difficult to do. I don’t know the state of the codebases inside the major automated content selection businesses. But I’m certain that with manageable effort, appropriate APIs etc. could be set up. (And it might even help these businesses by forcing some code cleanup and modernization.)

Another issue might be: how will the third-party providers be incentivized? I can imagine some organizations just being third-party providers as a public service. But in other cases they’d have to be paid a commission by the underlying content platform. The theory, though, is that good work by third-party content providers would expand the whole market, and make them “worth their commission”. Plus, of course, the underlying content platforms could save a lot by not having to deal with all those complaints and issues they’re currently getting.

What if there’s a third-party provider that upranks content some people don’t like? That will undoubtedly happen. But the point is that this is a market—so market dynamics can operate.

Another objection is that my idea makes even worse the tendency with modern technology for people to live inside “content bubbles” where they never broaden their points of view. Well, of course, there can be providers who offer broader content. But people could choose “content bubble” providers. The good thing, though, is that they’re choosing them, and they know they’re doing that, just like they know they’re choosing to watch one television channel and not another.

Of course it’s important for the operation of society that people have some level of shared values. But what should those shared values be, and who should decide them? In a totalitarian system, it’s basically the government. Right now, with the current monolithic state of automated content selection, one could argue it’s the automated content selection businesses.

If I were running one of those businesses, I’d certainly not want to get set up as the moral arbiter for the world; it seems like a no-win role. With the third-party providers idea, there’s a way out, without damaging the viability of the business. Yes, users get more control, as arguably they should have, given that they are the fuel that makes the business work. But the core business model is still perfectly intact. And there’s a new market that opens up, for third-party providers, potentially delivering all sorts of new economic value.

What Should I Do?

At the beginning of last weekend, what I just described was basically the state of my thinking. But what should I do with it? Was there some issue I hadn’t noticed? Was I falling into some political or business trap? I wasn’t sure. But it seemed as if some idea in this area was needed, and I had an idea, so I really should tell people about it.

So I quickly wrote up the written testimony for the hearing, and sent it in by the deadline on Sunday morning. (The full text of the testimony is included at the end of this piece.)

Stephen Wolfram's written testimony

The Hearing Itself

View of the Senate

This morning was the hearing itself. It was in the same room as the hearing Mark Zuckerberg did last fall. The staffers were saying that they expected a good turnout of senators, and that of the 24 senators on the subcommittee (out of 100 total in the Senate), they expected about 15 to show up at some point or another.

At the beginning, staffers were putting out nameplates for the senators. I was trying to figure out what the arrangement was. And then I realized! It was a horseshoe configuration and Republican senators were on the right side of the horseshoe, Democrats were on the left. There really are right and left wings! (Yes, I obviously don’t watch C-SPAN enough, or I’d already know that.)

When the four of us on the panel were getting situated, one of the senators (Marsha Blackburn [R-TN]) wandered up, and started talking about computational irreducibility. Wow, I thought, this is going to be interesting. That’s a pretty abstruse science concept to be finding its way into the Senate.

Everyone had five minutes to give opening remarks, and everyone had a little countdown timer in front of them. I talked a bit about the science and technology of AI and explainability. I mentioned computational contracts and the concept of an AI Constitution. Then I said I didn’t want to just explain that everything was impossible—and gave a brief summary of my ideas for solutions. Rather uncharacteristically for me, I ended a full minute before my time was up.

The format for statements and questions was five minutes per senator. The issues raised were quite diverse. I quickly realized, though, that it was unfortunate that I really had three different things I was talking about (non-explainability, computational laws, and my ideas for a near-term solution). In retrospect perhaps I should have concentrated on the near-term solution, but it felt odd to be emphasizing something I just thought of last week, rather than something I’ve thought about for many years.

Still, it was fascinating—and a sign of things to come—to see serious issues about what amounts to the philosophy of computation being discussed in the Senate. To be fair, I had done a small hearing at the Senate back in 2003 (my only other such experience) about the ideas in A New Kind of Science. But then it had been very much on the “science track”; now the whole discussion was decidedly mainstream.

I couldn’t help thinking that I was witnessing the concept of computation beginning to come of age. What used to be esoteric issues in the theory of computation were now starting to be things that senators were discussing writing laws about. One of the senators mentioned atomic energy, and compared it to AI. But really, AI is going to be something much more central to the whole future of our species.

It enables us to do so much. But yet it forces us to confront what we want to do, and who we want to be. Today it’s rare and exotic for the Senate to be discussing issues of AI. In time I suspect AI and its many consequences will be a dominant theme in many Senate discussions. This is just the beginning.

I wish we were ready to really start creating an AI Constitution. But we’re not (and it doesn’t help that we don’t have an AI analog of the few thousand years of human political history that were available as a guide when the US Constitution was drafted). Still, issue by issue I suspect we’ll move closer to the point where having a coherent AI Constitution becomes a necessity. No doubt there’ll be different ones in different communities and different countries. But one day a group like the one I saw today—with all the diverse and sometimes colorful characters involved—will end up having to figure out just how we humans interact with AI and the computational world.


The Written Testimony

Summary

Automated content selection by internet businesses has become progressively more contentious—leading to calls to make it more transparent or constrained. I explain some of the complex intellectual and scientific problems involved, then offer two possible technical and market suggestions for paths forward. Both are based on giving users a choice about who to trust for the final content they see—in one case introducing what I call “final ranking providers”, and in the other case what I call “constraint providers”.

The Nature of the Problem

There are many kinds of businesses that operate on the internet, but some of the largest and most successful are what one can call automated content selection businesses. Facebook, Twitter, YouTube and Google are all examples. All of them deliver content that others have created, but a key part of their value is associated with their ability to (largely) automatically select what content they should serve to a given user at a given time—whether in news feeds, recommendations, web search results, or advertisements.

What criteria are used to determine content selection? Part of the story is certainly to provide good service to users. But the paying customers for these businesses are not the users, but advertisers, and necessarily a key objective of these businesses must be to maximize advertising income. Increasingly, there are concerns that this objective may have unacceptable consequences in terms of content selection for users. And in addition there are concerns that—through their content selection—the companies involved may be exerting unreasonable influence in other kinds of business (such as news delivery), or in areas such as politics.

Methods for content selection—using machine learning, artificial intelligence, etc.—have become increasingly sophisticated in recent years. A significant part of their effectiveness—and economic success—comes from their ability to use extensive data about users and their previous activities. But there has been increasing dissatisfaction and, in some cases, suspicion about just what is going on inside the content selection process.

This has led to a desire to make content selection more transparent, and perhaps to constrain aspects of how it works. As I will explain, these are not easy things to achieve in a useful way. And in fact, they run into deep intellectual and scientific issues that are in some ways a foretaste of problems we will encounter ever more broadly as artificial intelligence becomes more central to the things we do. Satisfactory ultimate solutions will be difficult to develop, but I will suggest here two near-term practical approaches that I believe significantly address current concerns.

How Automated Content Selection Works

Whether one’s dealing with videos, posts, webpages, news items or, for that matter, ads, the underlying problem of automated content selection (ACS) is basically always the same. There are many content items available (perhaps even billions of them), and somehow one has to quickly decide which ones are “best” to show to a given user at a given time. There’s no fundamental principle to say what “best” means, but operationally it’s usually in the end defined in terms of what maximizes user clicks, or revenue from clicks.

The major innovation that has made modern ACS systems possible is the idea of automatically extrapolating from large numbers of examples. The techniques have evolved, but the basic idea is to effectively deduce a model of the examples and then to use this model to make predictions, for example about what ranking of items will be best for a given user.

Because it will be relevant for the suggestions I’m going to make later, let me explain here a little more about how most current ACS systems work in practice. The starting point is normally to extract a collection of perhaps hundreds or thousands of features (or “signals”) for each item. If a human were doing it, they might use features like: “How long is the video? Is it entertainment or education? Is it happy or sad?” But these days—with the volume of data that’s involved—it’s a machine doing it, and often it’s also a machine figuring out what features to extract. Typically the machine will optimize for features that make its ultimate task easiest—whether or not (and it’s almost always not) there’s a human-understandable interpretation of what the features represent.

As an example, here are the letters of the alphabet automatically laid out by a machine in a “feature space” in which letters that “look similar” appear nearby:

Feature space plot

How does the machine know what features to extract to determine whether things will “look similar”? A typical approach is to give it millions of images that have been tagged with what they are of (“elephant”, “teacup”, etc.). And then from seeing which images are tagged the same (even though in detail they look different), the machine is able—using the methods of modern machine learning—to identify features that could be used to determine how similar images of anything should be considered to be.
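
For readers who want to experiment, here is a minimal sketch in the Wolfram Language of one way to make a layout like the one above (an illustration only, not necessarily how the plot shown here was produced):

(* rasterize each letter, then let FeatureSpacePlot place visually similar ones nearby *)
letters = Rasterize[Style[#, FontSize -> 50]] & /@ Alphabet[];
FeatureSpacePlot[letters]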

OK, so let’s imagine that instead of letters of the alphabet laid out in a 2D feature space, we’ve got a million videos laid out in a 200-dimensional feature space. If we’ve got the features right, then videos that are somehow similar should be nearby in this feature space.

But given a particular person, what videos are they likely to want to watch? Well, we can do the same kind of thing with people as with videos: we can take the data we know about each person, and extract some set of features. “Similar people” would then be nearby in “people feature space”, and so on.

But now there’s a “final ranking” problem. Given features of videos, and features of people, which videos should be ranked “best” for which people? Often in practice, there’s an initial coarse ranking. But then, as soon as we have a specific definition of “best”—or enough examples of what we mean by “best”—we can use machine learning to learn a program that will look at the features of videos and people, and will effectively see how to use them to optimize the final ranking.
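
As a toy sketch of this “learn the final ranking” step, with purely synthetic feature vectors and a made-up engagement score (nothing like a production system), one might do something like this in the Wolfram Language:

(* purely synthetic data: each example pairs combined user+item features with an "engagement" score *)
training = Table[
   With[{u = RandomReal[1, 5], v = RandomReal[1, 5]}, Join[u, v] -> u . v], {1000}];

(* learn a scoring function from the examples *)
score = Predict[training];

(* rank candidate items for a given user by predicted score *)
user = RandomReal[1, 5];
items = Table[RandomReal[1, 5], {10}];
ReverseSortBy[items, score[Join[user, #]] &]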

The setup is a bit different in different cases, and there are many details, most of which are proprietary to particular companies. However, modern ACS systems—dealing as they do with immense amounts of data at very high speed—are a triumph of engineering, and an outstanding example of the power of artificial intelligence techniques.

Is It “Just an Algorithm”?

When one hears the term “algorithm” one tends to think of a procedure that will operate in a precise and logical way, always giving a correct answer, not influenced by human input. One also tends to think of something that consists of well-defined steps, that a human could, if needed, readily trace through.

But this is pretty far from how modern ACS systems work. They don’t deal with the same kind of precise questions (“What video should I watch next?” just isn’t something with a precise, well-defined answer). And the actual methods involved make fundamental use of machine learning, which doesn’t have the kind of well-defined structure or explainable step-by-step character that’s associated with what people traditionally think of as an “algorithm”. There’s another thing too: while traditional algorithms tend to be small and self-contained, machine learning inevitably requires large amounts of externally supplied data.

In the past, computer programs were almost exclusively written directly by humans (with some notable exceptions in my own scientific work). But the key idea of machine learning is instead to create programs automatically, by “learning the program” from large numbers of examples. The most common type of program on which to apply machine learning is a so-called neural network. Although originally inspired by the brain, neural networks are purely computational constructs that are typically defined by large arrays of numbers called weights.

Imagine you’re trying to build a program that recognizes pictures of cats versus dogs. You start with lots of specific pictures that have been identified—normally by humans—as being either of cats or dogs. Then you “train” a neural network by showing it these pictures and gradually adjusting its weights to make it give the correct identification for these pictures. But then the crucial point is that the neural network generalizes. Feed it another picture of a cat, and even if it’s never seen that picture before, it’ll still (almost certainly) say it’s a cat.
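
In the Wolfram Language, a sketch of this training-and-generalizing setup might look as follows; catPictures, dogPictures and newPicture are placeholders for images one would have to supply:

(* catPictures and dogPictures stand for lists of labeled example images *)
catVsDog = Classify[<|"cat" -> catPictures, "dog" -> dogPictures|>];

(* the trained classifier generalizes to pictures it has never seen *)
catVsDog[newPicture]

(* one can also ask for class probabilities rather than a single answer *)
catVsDog[newPicture, "Probabilities"]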

What will it do if you feed it a picture of a cat dressed as a dog? It’s not clear what the answer is supposed to be. But the neural network will still confidently give some result—that’s derived in some way from the training data it was given.

So in a case like this, how would one tell why the neural network did what it did? Well, it’s difficult. All those weights inside the network were learned automatically; no human explicitly set them up. It’s very much like the case of extracting features from images of letters above. One can use these features to tell which letters are similar, but there’s no “human explanation” (like “count the number of loops in the letter”) of what each of the features are.

Would it be possible to make an explainable cat vs. dog program? For 50 years most people thought that a problem like cat vs. dog just wasn’t the kind of thing computers would be able to do. But modern machine learning made it possible—by learning the program rather than having humans explicitly write it. And there are fundamental reasons to expect that there can’t in general be an explainable version—and that if one’s going to do the level of automated content selection that people have become used to, then one cannot expect it to be broadly explainable.

Sometimes one hears it said that automated content selection is just “being done by an algorithm”, with the implication that it’s somehow fair and unbiased, and not subject to human manipulation. As I’ve explained, what’s actually being used are machine learning methods that aren’t like traditional precise algorithms.

And a crucial point about machine learning methods is that by their nature they’re based on learning from examples. And inevitably the results they give depend on what examples were used.

And this is where things get tricky. Imagine we’re training the cat vs. dog program. But let’s say that, for whatever reason, among our examples there are spotted dogs but no spotted cats. What will the program do if it’s shown a spotted cat? It might successfully recognize the shape of the cat, but quite likely it will conclude—based on the spots—that it must be seeing a dog.

So is there any way to guarantee that there are no problems like this, that were introduced either knowingly or unknowingly? Ultimately the answer is no—because one can’t know everything about the world. Is the lack of spotted cats in the training set an error, or are there simply no spotted cats in the world?

One can do one’s best to find correct and complete training data. But one will never be able to prove that one has succeeded.

But let’s say that we want to ensure some property of our results. In almost all cases, that’ll be perfectly possible—either by modifying the training set, or the neural network. For example, if we want to make sure that spotted cats aren’t left out, we can just insist, say, that our training set has an equal number of spotted and unspotted cats. That might not be a correct representation of what’s actually true in the world, but we can still choose to train our neural network on that basis.

As a different example, let’s say we’re selecting pictures of pets. How many cats should be there, versus dogs? Should we base it on the number of cat vs. dog images on the web? Or how often people search for cats vs. dogs? Or how many cats and dogs are registered in America? There’s no ultimate “right answer”. But if we want to, we can give a constraint that says what should happen.

This isn’t really an “algorithm” in the traditional sense either—not least because it’s not about abstract things; it’s about real things in the world, like cats and dogs. But an important development (that I happen to have been personally much involved in for 30+ years) is the construction of a computational language that lets one talk about things in the world in a precise way that can immediately be run on a computer.

In the past, things like legal contracts had to be written in English (or “legalese”). Somewhat inspired by blockchain smart contracts, we are now getting to the point where we can write automatically executable computational contracts not in human language but in computational language. And if we want to define constraints on the training sets or results of automated content selection, this is how we can do it.
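
As a purely illustrative sketch, here is what a tiny executable version of the kind of training-set constraint mentioned above might look like in the Wolfram Language; catQ and spottedQ are hypothetical classifiers or metadata lookups that would have to be supplied:

(* a toy "contract": the training set must contain equal numbers of spotted and unspotted cats *)
balancedCatsQ[trainingSet_List] :=
 Count[trainingSet, item_ /; catQ[item] && spottedQ[item]] ==
  Count[trainingSet, item_ /; catQ[item] && ! spottedQ[item]]

The point is just that a condition like this can be stated in a form a computer can check mechanically, before or after training.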

Issues from Basic Science

Why is it difficult to find solutions to problems associated with automated content selection? In addition to all the business, societal and political issues, there are also some deep issues of basic science involved. Here’s a list of some of those issues. The precursors of these issues date back nearly a century, though it’s only quite recently (in part through my own work) that they’ve become clarified. And although they’re not enunciated (or named) as I have here, I don’t believe any of them are at this point controversial—though to come to terms with them requires a significant shift in intuition from what exists without modern computational thinking.


Data Deducibility

Even if you don’t explicitly know something (say about someone), it can almost always be statistically deduced if there’s enough other related data available

What is a particular person’s gender identity, ethnicity, political persuasion, etc.? Even if one’s not allowed to explicitly ask these questions, it’s basically inevitable that with enough other data about the person, one will be able to deduce what the best answers must be.

Everyone is different in detail. But the point is that there are enough commonalities and correlations between people that it’s basically inevitable that with enough data, one can figure out almost any attribute of a person.

The basic mathematical methods for doing this were already known from classical statistics. But what’s made this now a reality is the availability of vastly more data about people in digital form—as well as the ability of modern machine learning to readily work not just with numerical data, but also with things like textual and image data.

What is the consequence of ubiquitous data deducibility? It means that it’s not useful to block particular pieces of data—say in an attempt to avoid bias—because it’ll essentially always be possible to deduce what that blocked data was. And it’s not just that this can be done intentionally; inside a machine learning system, it’ll often just happen automatically and invisibly.
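
Here is a toy illustration of the phenomenon, with completely synthetic records in which the “blocked” attribute happens to be correlated with the other fields (as attributes of real people essentially always are):

(* synthetic records; "Attribute" is the field one might try to block *)
records = Table[
   Module[{age = RandomInteger[{18, 80}], hours = RandomReal[10]},
    <|"Age" -> age, "Hours" -> hours, "Attribute" -> If[age + 5 hours > 75, "A", "B"]|>], {2000}];

(* train a classifier to recover the attribute from the remaining fields alone *)
deducer = Classify[KeyDrop[#, "Attribute"] -> #["Attribute"] & /@ records];

(* the "blocked" attribute is readily deduced from fields that were never blocked *)
deducer[<|"Age" -> 70, "Hours" -> 9.|>]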


Computational Irreducibility

Even given every detail of a program, it can be arbitrarily hard to predict what it will or won’t do

One might think that if one had the complete code for a program, one would readily be able to deduce everything about what the program would do. But it’s a fundamental fact that in general one can’t do this. Given a particular input, one can always just run the program and see what it does. But even if the program is simple, its behavior may be very complicated, and computational irreducibility implies that there won’t be a way to “jump ahead” and immediately find out what the program will do, without explicitly running it.
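
A quick way to see this concretely is the rule 30 cellular automaton, a minimal program whose behavior seems to admit no predictive shortcut:

(* a trivially simple program; determining its detailed behavior seems to require just running it *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]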

One consequence of this is that if one wants to know, for example, whether with any input a program can do such-and-such, then there may be no finite way to determine this—because one might have to check an infinite number of possible inputs. As a practical matter, this is why bugs in programs can be so hard to detect. But as a matter of principle, it means that it can ultimately be impossible to completely verify that a program is “correct”, or has some specific property.

Software engineering has in the past often tried to constrain the programs it deals with so as to minimize such effects. But with methods like machine learning, this is basically impossible to do. And the result is that even if one had a complete automated content selection program, one wouldn’t in general be able to verify that, for example, it could never show some particular bad behavior.


Non-explainability

For a well-optimized computation, there’s not likely to be a human-understandable narrative about how it works inside

Should we expect to understand how our technological systems work inside? When things like donkeys were routinely part of such systems, people didn’t expect to. But once the systems began to be “completely engineered” with cogs and levers and so on, there developed an assumption that at least in principle one could explain what was going on inside. The same was true with at least simpler software systems. But with things like machine learning systems, it absolutely isn’t.

Yes, one can in principle trace what happens to every bit of data in the program. But can one create a human-understandable narrative about it? It’s a bit like imagining we could trace the firing of every neuron in a person’s brain. We might be able to predict what a person would do in a particular case, but it’s a different thing to get a high-level “psychological narrative” about why they did it.

Inside a machine learning system—say the cats vs. dogs program—one can think of it as extracting all sorts of features, and making all sorts of distinctions. And occasionally one of these features or distinctions might be something we have a word for (“pointedness”, say). But most of the time they’ll be things the machine learning system discovered, and they won’t have any connection to concepts we’re familiar with.

And in fact—as a consequence of computational irreducibility, combined with things like the finiteness of human language and human knowledge—it’s basically inevitable that for any well-optimized computation we’re not going to be able to give a high-level narrative that explains what it’s doing. And the result is that one cannot expect any useful form of general “explainability” for automated content selection systems.


Ethical Incompleteness

There’s no finite set of principles that can completely define any reasonable, practical system of ethics

Let’s say one’s trying to teach ethics to a computer, or an artificial intelligence. Is there some simple set of principles—like Asimov’s Laws of Robotics—that will capture a viable complete system of ethics? Looking at the complexity of human systems of laws one might suspect that the answer is no. And in fact this is presumably a fundamental result—essentially another consequence of computational irreducibility.

Imagine that we’re trying to define constraints (or “laws”) for an artificial intelligence, in order to ensure that the AI behaves in some particular “globally ethical” way. We set up a few constraints, and we find that many things the AI does follow our ethics. But computational irreducibility essentially guarantees that eventually there’ll always be something unexpected that’s possible. And the only way to deal with that is to add a “patch”—essentially to introduce another constraint for that new case. And the issue is that this will never end: there’ll be no way to give a finite set of constraints that will achieve our global objectives. (There’s a somewhat technical analogy of this in mathematics, in which Gödel’s theorem shows that no finite set of axiomatic constraints can give one only ordinary integers and nothing else.)

So for our purposes here, the main consequence of this is that we can’t expect to have some finite set of computational principles (or, for that matter, laws) that will constrain automated content selection systems to always behave according to some reasonable, global system of ethics—because they’ll always be generating unexpected new cases that we have to define a new principle to handle.


The Path Forward

I’ve described some of the complexities of handling issues with automated content selection systems. But what in practice can be done?

One obvious idea would be just to somehow “look inside” the systems, auditing their internal operation and examining their construction. But for both fundamental and practical reasons, I don’t think this can usefully be done. As I’ve discussed, to achieve the kind of functionality that users have become accustomed to, modern automated content selection systems make use of methods such as machine learning that are not amenable to human-level explainability or systematic predictability.

What about checking whether a system is, for example, biased in some way? Again, this is a fundamentally difficult thing to determine. Given a particular definition of bias, one could look at the internal training data used for the system—but this won’t usually give more information than just studying how the system behaves.

What about seeing if the system has somehow intentionally been made to do this or that? It’s conceivable that the source code could have explicit “if” statements that would reveal intention. But the bulk of the system will tend to consist of trained neural networks and so on—and as in most other complex systems, it’ll typically be impossible to tell what features might have been inserted “on purpose” and what are just accidental or emergent properties.

So if it’s not going to work to “look inside” the system, what about restricting how the system can be set up? For example, one approach that’s been suggested is to limit the inputs that the system can have, in an extreme case preventing it from getting any personal information about the user and their history. The problem with this is that it negates what’s been achieved over the course of many years in content selection systems—both in terms of user experience and economic success. And for example, knowing nothing about a user, if one has to recommend a video, one’s just going to have to suggest whatever video is generically most popular—which is very unlikely to be what most users want most of the time.

As a variant of the idea of blocking all personal information, one can imagine blocking just some information—or, say, allowing a third party to broker what information is provided. But if one wants to get the advantages of modern content selection methods, one’s going to have to leave a significant amount of information—and then there’s no point in blocking anything, because it’ll almost certainly be reproducible through the phenomenon of data deducibility.

Here’s another approach: what about just defining rules (in the form of computational contracts) that specify constraints on the results content selection systems can produce? One day, we’re going to have to have such computational contracts to define what we want AIs in general to do. And because of ethical incompleteness—like with human laws—we’re going to have to have an expanding collection of such contracts.

But even though (particularly through my own efforts) we’re beginning to have the kind of computational language necessary to specify a broad range of computational contracts, we realistically have to get much more experience with computational contracts in standard business and other situations before it makes sense to try setting them up for something as complex as global constraints on content selection systems.

So, what can we do? I’ve not been able to see a viable, purely technical solution. But I have formulated two possible suggestions based on mixing technical ideas with what amount to market mechanisms.

The basic principle of both suggestions is to give users a choice about who to trust, and to let the final results they see not necessarily be completely determined by the underlying ACS business.

There’s been debate about whether ACS businesses are operating as “platforms” that more or less blindly deliver content, or whether they’re operating as “publishers” who take responsibility for content they deliver. Part of this debate can be seen as being about what responsibility should be taken for an AI. But my suggestions sidestep this issue, and in different ways tease apart the “platform” and “publisher” roles.

It’s worth saying that the whole content platform infrastructure that’s been built by the large ACS businesses is an impressive and very valuable piece of engineering—managing huge amounts of content, efficiently delivering ads against it, and so on. What’s really at issue is whether the fine details of the ACS systems need to be handled by the same businesses, or whether they can be opened up. (This is relevant only for ACS businesses whose network effects have allowed them to serve a large fraction of a population. Small ACS businesses don’t have the same kind of lock-in.)


Suggestion A: Allow Users to Choose among Final Ranking Providers

Suggestion A

As I discussed earlier, the rough (and oversimplified) outline of how a typical ACS system works is that first features are extracted for each content item and each user. Then, based on these features, there’s a final ranking done that determines what will actually be shown to the user, in what order, etc.

What I’m suggesting is that this final ranking doesn’t have to be done by the same entity that sets up the infrastructure and extracts the features. Instead, there could be a single content platform but a variety of “final ranking providers”, who take the features, and then use their own programs to actually deliver a final ranking.

Different final ranking providers might use different methods, and emphasize different kinds of content. But the point is to let users be free to choose among different providers. Some users might prefer (or trust more) some particular provider—that might or might not be associated with some existing brand. Other users might prefer another provider, or choose to see results from multiple providers.

How technically would all this be implemented? The underlying content platform (presumably associated with an existing ACS business) would take on the large-scale information-handling task of deriving extracted features. The content platform would provide sufficient examples of underlying content (and user information) and its extracted features to allow the final ranking provider’s systems to “learn the meaning” of the features.

When the system is running, the content platform would in real time deliver extracted features to the final ranking provider, which would then feed this into whatever system they have developed (which could use whatever automated or human selection methods they choose). This system would generate a ranking of content items, which would then be fed back to the content platform for final display to the user.
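
Purely schematically (the names here are assumptions, not a real API), the division of labor might look like this in the Wolfram Language:

(* the platform delivers extracted features; the provider returns item IDs, best first *)
(* providerScore stands for whatever the ranking provider has built: learned, rule-based, or human-curated *)
finalRanking[userFeatures_, itemFeatures_Association] :=
 Keys@ReverseSort[providerScore[userFeatures, #] & /@ itemFeatures]

Here itemFeatures would be an association from item IDs to feature vectors, as delivered by the content platform.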

To avoid revealing private user information to lots of different providers, the final ranking provider’s system should probably run on the content platform’s infrastructure. The content platform would be responsible for the overall user experience, presumably providing some kind of selector to pick among final ranking providers. The content platform would also be responsible for delivering ads against the selected content.

Presumably the content platform would give a commission to the final ranking provider. If properly set up, competition among final ranking providers could actually increase total revenue to the whole ACS business, by achieving automated content selection that serves users and advertisers better.

Suggestion B: Allow Users to Choose among Constraint Providers

Suggestion B

One feature of Suggestion A is that it breaks up ACS businesses into a content platform component, and a final ranking component. (One could still imagine, however, that a quasi-independent part of an ACS business could be one of the competing final ranking providers.) An alternative suggestion is to keep ACS businesses intact, but to put constraints on the results that they generate, for example forcing certain kinds of balance, etc.

Much like final ranking providers, there would be constraint providers who define sets of constraints. For example, a constraint provider could require that there be on average an equal number of items delivered to a user that are classified (say, by a particular machine learning system) as politically left-leaning or politically right-leaning.

Constraint providers would effectively define computational contracts about properties they want results delivered to users to have. Different constraint providers would define different computational contracts. Some might want balance; others might want to promote particular types of content, and so on. But the idea is that users could decide what constraint provider they wish to use.

How would constraint providers interact with ACS businesses? It’s more complicated than for final ranking providers in Suggestion A, because effectively the constraints from constraint providers have to be woven deeply into the basic operation of the ACS system.

One possible approach is to use the machine learning character of ACS systems, and to insert the constraints as part of the “learning objectives” (or, technically, “loss functions”) for the system. Of course, there could be constraints that just can’t be successfully learned (for example, they might call for types of content that simply don’t exist). But there will be a wide range of acceptable constraints, and in effect, for each one, a different ACS system would be built.
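
As a schematic sketch only (leaningClassifier, engagementLoss and lambda are placeholders for things the ACS business and the constraint provider would have to supply), a constraint-derived term added to the learning objective might look like:

(* a penalty that grows as delivered results drift away from left/right balance *)
balancePenalty[results_List] :=
 With[{leanings = leaningClassifier /@ results},
  ((Count[leanings, "Left"] - Count[leanings, "Right"])/Length[results])^2]

(* schematically, the system would then be trained against a combined objective *)
totalLoss[results_] := engagementLoss[results] + lambda balancePenalty[results]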

All these ACS systems would then be operated by the underlying ACS business, with users selecting which constraint provider—and therefore which overall ACS system—they want to use.

As with Suggestion A, the underlying ACS business would be responsible for delivering advertising, and would pay a commission to the constraint provider.


Although their detailed mechanisms are different, both Suggestions A and B attempt to leverage the exceptional engineering and commercial achievements of the ACS businesses, while defusing current trust issues about content selection, providing greater freedom for users, and creating new opportunities for market growth.

The suggestions also help with some other issues. One example is the banning of content providers. At present, with ACS businesses feeling responsible for content on their platforms, there is considerable pressure, not least from within the ACS businesses themselves, to ban content providers that they feel are providing inappropriate content. The suggestions diffuse the responsibility for content, potentially allowing the underlying ACS businesses not to ban anything but explicitly illegal content.

It would then be up to the final ranking providers, or the constraint providers, to choose whether or not to deliver or allow content of a particular character, or from a particular content provider. In any given case, some might deliver or allow it, and some might not, removing the difficult all-or-none nature of the banning that’s currently done by ACS businesses.

One feature of my suggestions is that they allow fragmentation of users into groups with different preferences. At present, all users of a particular ACS business have content that is basically selected in the same way. With my suggestions, users of different persuasions could potentially receive completely different content, selected in different ways.

While fragmentation like this appears to be an almost universal tendency in human society, some might argue that having people routinely be exposed to other people’s points of view is important for the cohesiveness of society. And technically some version of this would not be difficult to achieve. For example, one could take the final ranking or constraint providers, and effectively generate a feature space plot of what they do.

Some would be clustered close together, because they lead to similar results. Others would be far apart in feature space—in effect representing very different points of view. Then if someone wanted to, say, see their typical content 80% of the time, but see different points of view 20% of the time, the system could combine different providers from different parts of feature space with a certain probability.
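
In the simplest sketch, with hypothetical provider objects usualProvider and distantProvider, the mixing could amount to nothing more than a weighted random choice per request:

(* 80% of the time use the user's usual provider; 20% of the time one far away in feature space *)
pickProvider[] := RandomChoice[{0.8, 0.2} -> {usualProvider, distantProvider}]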

Of course, in all these matters, the full technical story is much more complex. But I am confident that if they are considered desirable, either of the suggestions I have made can be implemented in practice. (Suggestion A is likely to be somewhat easier to implement than Suggestion B.) The result, I believe, will be richer, more trusted, and even more widely used automated content selection. In effect both my suggestions mix the capabilities of humans and AIs—to help get the best of both of them—and to navigate through the complex practical and fundamental problems with the use of automated content selection.


Mitchell Feigenbaum (1944–2019), 4.66920160910299067185320382…

Mitchell Feigenbaum
(Artwork by Gunilla Feigenbaum)

Behind the Feigenbaum Constant

It’s called the Feigenbaum constant, and it’s about 4.6692016. And it shows up, quite universally, in certain kinds of mathematical—and physical—systems that can exhibit chaotic behavior.

Mitchell Feigenbaum, who died on June 30 at the age of 74, was the person who discovered it—back in 1975, by doing experimental mathematics on a pocket calculator.

It became a defining discovery in the history of chaos theory. But when it was first discovered, it was a surprising, almost bizarre result, that didn’t really connect with anything that had been studied before. Somehow, though, it’s fitting that it should have been Mitchell Feigenbaum—who I knew for nearly 40 years—who would discover it.

Trained in theoretical physics, and a connoisseur of its mathematical traditions, Mitchell always seemed to see himself as an outsider. He looked a bit like Beethoven—and projected a certain stylish sense of intellectual mystery. He would often make strong assertions, usually with a conspiratorial air, a twinkle in his eye, and a glass of wine or a cigarette in his hand.

He would talk in long, flowing sentences which exuded a certain erudite intelligence. But ideas would jump around. Sometimes detailed and technical. Sometimes leaps of intuition that I, for one, could not follow. He was always calculating, staying up until 5 or 6 am, filling yellow pads with formulas and stressing Mathematica with elaborate algebraic computations that might run for hours.

He published very little, and what he did publish he was often disappointed wasn’t widely understood. When he died, he had been working for years on the optics of perception, and on questions like why the Moon appears larger when it’s close to the horizon. But he never got to the point of publishing anything on any of this.

For more than 30 years, Mitchell’s official position (obtained essentially on the basis of his Feigenbaum constant result) was as a professor at the Rockefeller University in New York City. (To fit with Rockefeller’s biological research mission, he was themed as the Head of the “Laboratory of Mathematical Physics”.) But he dabbled elsewhere, lending his name to a financial computation startup, and becoming deeply involved in inventing new cartographic methods for the Hammond World Atlas.

What Mitchell Discovered

The basic idea is quite simple. Take a value x between 0 and 1. Then iteratively replace x by a x (1 – x). Let’s say one starts from x = 1/3, and takes a = 3.2. Then here’s what one gets for the successive values of x:

Successive values
ListLinePlot[NestList[Compile[x, 3.2 x (1 - x)], N[1/3], 50], 
 Mesh -> All, PlotRange -> {0, 1}, Frame -> True]

After a little transient, the values of x are periodic, with period 2. But what happens with other values of a? Here are a few results for this so-called “logistic map”:

Logistic map
GraphicsGrid[
 Partition[
  Table[Labeled[
    ListLinePlot[NestList[Compile[x, a x (1 - x)], N[1/3], 50], 
     Sequence[
     Mesh -> All, PlotRange -> {0, 1}, Frame -> True, 
      FrameTicks -> None]], StringTemplate["a = ``"][a]], {a, 2.75, 
    4, .25}], 3], Spacings -> {.1, -.1}]

For small a, the values of x quickly go to a fixed point. For larger a they become periodic, first with period 2, then 4. And finally, for still larger a, the values start bouncing around seemingly randomly.

One can summarize this by plotting the values of x reached (here from 300 iterations, after dropping the first 50 to avoid transients) as a function of the value of a:

Period doublings
ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a x (1 - x)], N[1/3], 300], 50], {a, 0, 
    4, .01}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

As a increases, one sees a cascade of “period doublings”. In this case, they’re at a₁ = 3, a₂ ≈ 3.449, a₃ ≈ 3.544090, a₄ ≈ 3.5644072. What Mitchell noticed is that these successive values approach a limit (here a∞ ≈ 3.569946) in a geometric sequence, with a∞ – aₙ ~ δ^-n and δ ≈ 4.669.
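
One can see the geometric convergence directly from the values just quoted; the ratios of successive gaps between period doublings already approach δ:

(* ratios of successive gaps between the period-doubling points approach the Feigenbaum constant *)
With[{a = {3, 3.449, 3.544090, 3.5644072}},
 Divide @@@ Partition[Differences[a], 2, 1]]
(* -> roughly {4.72, 4.68} *)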

That’s a nice little result. But here’s what makes it much more significant: it isn’t just true about the specific iterated map x ⟶ a x (1 – x); it’s true about any map like that. Here, for example, is the “bifurcation diagram” for x ⟶ a sin(π √x):

Bifurcation diagram
ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a Sin[Pi Sqrt@x]], N[1/3], 300], 50], {a,
     0, 1, .002}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

The details are different. But what Mitchell noticed is that the positions of the period doublings again form a geometric sequence, with the exact same base: δ ≈ 4.669.

It’s not just that different iterated maps give qualitatively similar results; when one measures the convergence rate, it turns out to be exactly and quantitatively the same—always δ ≈ 4.669. And this was Mitchell’s big discovery: a quantitatively universal feature of the approach to chaos in a class of systems.

The Scientific Backstory

The basic idea behind iterated maps has a long history, stretching all the way back to antiquity. Early versions arose in connection with finding successive approximations, say to square roots. For example, using Newton’s method from the late 1600s, √2 can be obtained by iterating x ⟶ 1/x + x/2 (here starting from x = 1):

Starting from x = 1
NestList[Function[x, 1/x + x/2], N[1, 8], 6]

The notion of iterating an arbitrary function seems to have first been formalized in an 1870 paper by Ernst Schröder (who was notable for his work in formalizing things from powers to Boolean algebra), although most of the discussion that arose was around solving functional equations, not actually doing iterations. (An exception was the investigation of regions of convergence for Newton’s approximation by Arthur Cayley in 1879.) In 1918 Gaston Julia made a fairly extensive study of iterated rational functions in the complex plane—inventing, if not drawing, Julia sets. But until fractals in the late 1970s (which soon led to the Mandelbrot set), this area of mathematics basically languished.

But quite independent of any pure mathematical developments, iterated maps with forms similar to xa x (1 – x) started appearing in the 1930s as possible practical models in fields like population biology and business cycle theory—usually arising as discrete annualized versions of continuous equations like the Verhulst logistic differential equation from the mid-1800s. Oscillatory behavior was often seen—and in 1954 William Ricker (one of the founders of fisheries science) also found more complex behavior when he iterated some empirical fish reproduction curves.

Back in pure mathematics, versions of iterated maps had also shown up from time to time in number theory. In 1799 Carl Friedrich Gauss effectively studied the map x ⟶ FractionalPart[1/x] in connection with continued fractions. And starting in the late 1800s there was interest in studying maps like x ⟶ FractionalPart[a x] and their connections to the properties of the number a.

Particularly following Henri Poincaré’s work on celestial mechanics around 1900, the idea of sensitive dependence on initial conditions arose, and it was eventually noted that iterated maps could effectively “excavate digits” in their initial conditions. For example, iterating x ⟶ FractionalPart[10 x], starting with the digits of π, gives (effectively just shifting the sequence of digits one place to the left at each step):

Starting with the digits of pi...
N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 5], 10]
FractionalPart
ListLinePlot[
 Rest@N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 50], 
   40], Mesh -> All]

(Confusingly enough, with typical “machine precision” computer arithmetic, this doesn’t work correctly, because even though one “runs out of precision”, the IEEE Floating Point standard says to keep on delivering digits, even though they are completely wrong. Arbitrary precision in the Wolfram Language gets it right.)

Maps like x ⟶ a x(1 – x) show similar kinds of “digit excavation” behavior (for example, replacing x by sin(π u)², x ⟶ 4 x(1 – x) becomes exactly u ⟶ FractionalPart[2 u])—and this was already known by the 1940s, and, for example, commented on by John von Neumann in connection with his 1949 iterative “middle-square” method for generating pseudorandom numbers by computer.
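
One can check this change of variables numerically (using arbitrary-precision numbers, for the reason noted in the parenthetical above):

(* verify that x = Sin[Pi u]^2 turns x -> 4 x (1 - x) into u -> FractionalPart[2 u] *)
With[{u0 = N[1/7, 50]},
 NestList[4 # (1 - #) &, Sin[Pi u0]^2, 10] -
  Sin[Pi NestList[FractionalPart[2 #] &, u0, 10]]^2]
(* every difference is zero to the working precision *)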

But what about doing experimental math on iterated maps? There wasn’t too much experimental math at all on early digital computers (after all, most computer time was expensive). But in the aftermath of the Manhattan Project, Los Alamos had built its own computer (named MANIAC) that ended up being used for a whole series of experimental math studies. And in 1964 Paul Stein and Stan Ulam wrote a report entitled “Non-linear Transformation Studies on Electronic Computers” that included photographs of oscilloscope-like MANIAC screens displaying output from some fairly elaborate iterated maps. In 1971, another “just out of curiosity” report from Los Alamos (this time by Nick Metropolis [leader of the MANIAC project, and developer of the Monte Carlo method], Paul Stein and his brother Myron Stein) started to give more specific computer results for the behavior of logistic maps, and noted the basic phenomenon of period doubling (which they called the “U-sequence”), as well as its qualitative robustness under changes in the underlying map.

But quite separately from all of this, there were other developments in physics and mathematics. In 1963 Ed Lorenz (a meteorologist at MIT) introduced and simulated his “naturally occurring” Lorenz differential equations, which showed sensitive dependence on initial conditions. Starting in the 1940s (but following on from Poincaré’s work around 1900) there’d been a steady stream of developments in mathematics in so-called dynamical systems theory—particularly investigating global properties of the solutions to differential equations. Usually there’d be simple fixed points observed; sometimes “limit cycles”. But by the 1970s, particularly after the arrival of early computer simulations (like Lorenz’s), it was clear that for nonlinear equations something else could happen: a so-called “strange attractor”. And in studying so-called “return maps” for strange attractors, iterated maps like the logistic map again appeared.

But it was in 1975 that various threads of development around iterated maps somehow converged. On the mathematical side, dynamical systems theorist Jim Yorke and his student Tien-Yien Li at the University of Maryland published their paper “Period Three Implies Chaos”, showing that in an iterated map with a particular parameter value, if there’s ever an initial condition that leads to a cycle of length 3, there must be other initial conditions that don’t lead to cycles at all—or, as they put it, show chaos. (As it turned out, Aleksandr Sarkovskii—who was part of a Ukrainian school of dynamical systems research—had already in 1962 proved the slightly weaker result that a cycle of period 3 implies cycles of all periods.)

But meanwhile there had also been growing interest in things like the logistic maps among mathematically oriented population biologists, leading to the rather readable review (published in mid-1976) entitled “Simple Mathematical Models with Very Complicated Dynamics” by physics-trained Australian Robert May, who was then a biology professor at Princeton (and would subsequently become science advisor to the UK government, and is now “Baron May of Oxford”).

But even though things like sketches of bifurcation diagrams existed, the discovery of their quantitatively universal properties had to await Mitchell Feigenbaum.

Mitchell’s Journey

Mitchell Feigenbaum grew up in Brooklyn, New York. His father was an analytical chemist, and his mother was a public-school teacher. Mitchell was unenthusiastic about school, though did well on math and science tests, and managed to teach himself calculus and piano. In 1960, at age 16, as something of a prodigy, he enrolled in the City College of New York, officially studying electrical engineering, but also taking physics and math classes. After graduating in 1964, he went to MIT. Initially he was going to do a PhD in electrical engineering, but he quickly switched to physics.

But although he was enamored of classic mathematical physics (as represented, for example, in the books of Landau and Lifshitz), he ended up writing his thesis on a topic set by his advisor about particle physics, and specifically about evaluating a class of Feynman diagrams for the scattering of photons by scalar particles (with lots of integrals, if not special functions). It wasn’t a terribly exciting thesis, but in 1970 he was duly dispatched to Cornell for a postdoc position.

Mitchell struggled with motivation, preferring to hang out in coffee shops doing the New York Times crossword (at which he was apparently very fast) to doing physics. But at Cornell, Mitchell made several friends who were to be important to him. One was Predrag Cvitanović, a star graduate student from what is now Croatia, who was studying quantum electrodynamics, and with whom he shared an interest in German literature. Another was a young poet named Kathleen Doorish (later, Kathy Hammond), who was a friend of Predrag’s. And another was a rising-star physics professor named Pete Carruthers, with whom he shared an interest in classical music.

In the early 1970s quantum field theory was entering a golden age. But despite the topic of his thesis, Mitchell didn’t get involved, and in the end, during his two years at Cornell, he produced no visible output at all. Still, he had managed to impress Hans Bethe enough to be dispatched for another postdoc position, though now at a place lower in the pecking order of physics, Virginia Polytechnic Institute, in rural Virginia.

At Virginia Tech, Mitchell did even less well than at Cornell. He didn’t interact much with people, and he produced only one three-page paper: “The Relationship between the Normalization Coefficient and Dispersion Function for the Multigroup Transport Equation”. As its title might suggest, the paper was quite technical and quite unexciting.

As Mitchell’s two years at Virginia Tech drew to a close it wasn’t clear what was going to happen. But luck intervened. Mitchell’s friend from Cornell, Pete Carruthers, had just been hired to build up the theory division (“T Division”) at Los Alamos, and given carte blanche to hire several bright young physicists. Pete would later tell me with pride (as part of his advice to me about general scientific management) that he had a gut feeling that Mitchell could do something great, and that despite other people’s input—and the evidence—he decided to bet on Mitchell.

Having brought Mitchell to Los Alamos, Pete set about suggesting projects for him. At first, it was following up on some of Pete’s own work, and trying to compute bulk collective (“transport”) properties of quantum field theories as a way to understand high-energy particle collisions—a kind of foreshadowing of investigations of quark-gluon plasma.

But soon Pete suggested that Mitchell try looking at fluid turbulence, and in particular at seeing whether renormalization group methods might help in understanding it.

Whenever a fluid—like water—flows sufficiently rapidly it forms lots of little eddies and behaves in a complex and seemingly random way. But even though this qualitative phenomenon had been discussed for centuries (with, for example, Leonardo da Vinci making nice pictures of it), physics had had remarkably little to say about it—though in the 1940s Andrei Kolmogorov had given a simple argument that the eddies should form a cascade with a k^(-5/3) distribution of energies. At Los Alamos, though, with its focus on nuclear weapons development (inevitably involving violent fluid phenomena), turbulence was a very important thing to understand—even if it wasn’t obvious how to approach it.

But in 1974, there was news that Ken Wilson from Cornell had just “solved the Kondo problem” using a technique called the renormalization group. And Pete Carruthers suggested that Mitchell should try to apply this technique to turbulence.

The renormalization group is about seeing how changes of scale (or other parameters) affect descriptions (and behavior) of systems. And as it happened, it was Mitchell’s thesis advisor at MIT, Francis Low, who, along with Murray Gell-Mann, had introduced it back in 1954 in the context of quantum electrodynamics. The idea had lain dormant for many years, but in the early 1970s it came back to life with dramatic—though quite different—applications in both particle physics (specifically, QCD) and condensed matter physics.

In a piece of iron at room temperature, you can basically get all electron spins associated with each atom lined up, so the iron is magnetized. But if you heat the iron up, there start to be fluctuations, and suddenly—above the so-called Curie temperature (770°C for iron)—there’s effectively so much randomness that the magnetization disappears. And in fact there are lots of situations (think, for example, melting or boiling—or, for that matter, the formation of traffic jams) where this kind of sudden so-called phase transition occurs.

But what is actually going on in a phase transition? I think the clearest way to see this is by looking at an analog in cellular automata. With the particular rule shown below, if there aren’t very many initial black cells, the whole system will soon be white. But if you increase the number of initial black cells (as a kind of analog of increasing the temperature in a magnetic system), then suddenly, in this case at 50% black, there’s a sharp transition, and now the whole system eventually becomes black. (For phase transition experts: yes, this is a phase transition in a 1D system; one only needs 2D if the system is required to be microscopically reversible.)

GraphicsRow[SeedRandom[234316];
 Table[ArrayPlot[
   CellularAutomaton[<|
     "RuleNumber" -> 294869764523995749814890097794812493824, 
     "Colors" -> 4|>, 
    3 Boole[Thread[RandomReal[{0, 1}, 2000] < rho]], {500, {-300, 
      300}}], FrameLabel -> {None, 
Row[{
Round[100 rho], "% black"}]}], {rho, {0.4, 0.45, 0.55, 0.6}}], -30]

But what does the system do near 50% black? In effect, it can’t decide whether to finally become black or white. And so it ends up showing a whole hierarchy of “fluctuations” from the smallest scales to the largest. And what became clear by the 1960s is that the “critical exponents” characterizing the power laws describing these fluctuations are universal across many different systems.

But how can one compute these critical exponents? In a few toy cases, analytical methods were known. But mostly, something else was needed. And in the late 1960s Ken Wilson realized that one could use the renormalization group, and computers. One might have a model for how individual spins interact. But the renormalization group gives a procedure for “scaling up” to the interactions of larger and larger blocks of spins. And by studying that on a computer, Ken Wilson was able to start computing critical exponents.
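
As a cartoon of the “blocking” idea only (real renormalization group calculations track how effective couplings flow, which this does not), one can coarse-grain a random spin configuration by majority rule over blocks:

(* toy block-spin step: replace each 3x3 block of spins by the sign of its sum *)
spins = RandomChoice[{-1, 1}, {81, 81}];
blockSpin[s_] := Map[Sign[Total[#, 2]] &, Partition[s, {3, 3}], {2}];
GraphicsRow[ArrayPlot /@ NestList[blockSpin, spins, 2]]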

At first, the physics world didn’t pay much attention, not least because they weren’t used to computers being so intimately in the loop in theoretical physics. But then there was the Kondo problem (and, yes, so far as I know, it has no relation to modern Kondoing—though it does relate to modern quantum dot cellular automata). In most materials, electrical resistivity decreases as the temperature decreases (going to zero for superconductors even above absolute zero). But back in the 1930s, measurements on gold had shown instead an increase of resistivity at low temperatures. By the 1960s, it was believed that this was due to the scattering of electrons from magnetic impurities—but calculations ran into trouble, generating infinite results.

But then, in 1975, Ken Wilson applied his renormalization group methods—and correctly managed to compute the effect. There was still a certain mystery about the whole thing (and it probably didn’t help that—at least when I knew him in the 1980s and beyond—I often found Ken Wilson’s explanations quite hard to understand). But the idea that the renormalization group could be important was established.

So how might it apply to fluid turbulence? Kolmogorov’s power law seemed suggestive. But could one take the Navier–Stokes equations which govern idealized fluid flow and actually derive something like this? This was the project on which Mitchell Feigenbaum embarked.

The Big Discovery

The Navier–Stokes equations are very hard to work with. In fact, to this day it’s still not clear how even the most obvious feature of turbulence—its apparent randomness—arises from these equations. (It could be that the equations aren’t a full or consistent mathematical description, and one’s actually seeing amplified microscopic molecular motions. It could be that—as in chaos theory and the Lorenz equations—it’s due to amplification of randomness in the initial conditions. But my own belief, based on work I did in the 1980s, is that it’s actually an intrinsic computational phenomenon—analogous to the randomness one sees in my rule 30 cellular automaton.)

So how did Mitchell approach the problem? He tried simplifying it—first by going from equations depending on both space and time to ones depending only on time, and then by effectively making time discrete, and looking at iterated maps. Through Paul Stein, Mitchell knew about the (not widely known) previous work at Los Alamos on iterated maps. But Mitchell didn’t quite know where to go with it, though having just got a swank new HP-65 programmable calculator, he decided to program iterated maps on it.

Then in July 1975, Mitchell went (as I also did a few times in the early 1980s) to the summer physics hang-out-together event in Aspen, CO. There he ran into Steve Smale—a well-known mathematician who’d been studying dynamical systems—and was surprised to find Smale talking about iterated maps. Smale mentioned that someone had asked him if the limit of the period-doubling cascade a∞ ≈ 3.56995 could be expressed in terms of standard constants like π. Smale related that he’d said he didn’t know. But Mitchell’s interest was piqued, and he set about trying to figure it out.

He didn’t have his HP-65 with him, but he dove into the problem using the standard tools of a well-educated mathematical physicist, and had soon turned it into something about poles of functions in the complex plane—about which he couldn’t really say anything. Back at Los Alamos in August, though, he had his HP-65, and he set about programming it to find the bifurcation points aₙ.

The iterative procedure ran pretty fast for small n. But by n = 5 it was taking 30 seconds. And for n = 6 it took minutes. While it was computing, however, Mitchell decided to look at the aₙ values he had so far—and noticed something: they seemed to be converging geometrically to a final value.

At first, he just used this fact to estimate a∞, which he tried—unsuccessfully—to express in terms of standard constants. But soon he began to think that actually the convergence exponent δ was more significant than a∞—since its value stayed the same under simple changes of variables in the map. For perhaps a month Mitchell tried to express δ in terms of standard constants.

But then, in early October 1975, he remembered that Paul Stein had said period doubling seemed to look the same not just for logistic maps but for any iterated map with a single hump. Reunited with his HP-65 after a trip to Caltech, Mitchell immediately tried the map x ⟶ a sin(π x)—and discovered that, at least to 3-digit precision, the exponent δ was exactly the same.

He was immediately convinced that he’d discovered something great. But Stein told him he needed more digits to really conclude much. Los Alamos had plenty of powerful computers—so the next day Mitchell got someone to show him how to write a program in FORTRAN on one of them to go further—and by the end of the day he had managed to compute that in both cases δ was about 4.6692.

The computer he used was a typical workhorse US scientific computer of the day: a CDC 6000 series machine (of the same type I used when I first moved to the US in 1978). It had been designed by Seymour Cray, and by default it used 60-bit floating-point numbers. But at this precision (about 14 decimal digits), 4.6692 was as far as Mitchell could compute. Fortunately, however, Pete’s wife Lucy Carruthers was a programmer at Los Alamos, and she showed Mitchell how to use double precision—with the result that he was able to compute δ to 11-digit precision, and determine that the values for his two different iterated maps agreed.

Within a few weeks, Mitchell had found that δ seemed to be universal whenever the iterated map had a single quadratic maximum. But he didn’t know why this was, or have any particular framework for thinking about it. But still, finally, at the age of 30, Mitchell had discovered something that he thought was really interesting.

On Mitchell’s birthday, December 19, he saw his friend Predrag, and told him about his result. But at the time, Predrag was working hard on mainstream particle physics, and didn’t pay too much attention.

Mitchell continued working, and within a few months he was convinced that not only was the exponent δ universal—the appropriately scaled, limiting, infinitely wiggly, actual iteration of the map was too. In April 1976 Mitchell wrote a report announcing his results. Then on May 2, 1976, he gave a talk about them at the Institute for Advanced Study in Princeton. Predrag was there, and now he got interested in what Mitchell was doing.

As so often, however, it was hard to understand just what Mitchell was talking about. But by the next day, Predrag had successfully simplified things, and come up with a single, explicit, functional equation for the limiting form of the scaled iterated map: g(g(x)) = −g(αx)/α, with α ≈ 2.50290—implying that for any iterated map of the appropriate type, the limiting form would always look like an even wigglier version of:

FeigenbaumFunction plot
fUD[z_] = 
  1. - 1.5276329970363323 z^2 + 0.1048151947874277 z^4 + 
   0.026705670524930787 z^6 - 0.003527409660464297 z^8 + 
   0.00008160096594827505 z^10 + 0.000025285084886512315 z^12 - 
   2.5563177536625283*^-6 z^14 - 9.65122702290271*^-8 z^16 + 
   2.8193175723520713*^-8 z^18 - 2.771441260107602*^-10 z^20 - 
   3.0292086423142963*^-10 z^22 + 2.6739057855563045*^-11 z^24 + 
   9.838888060875235*^-13 z^26 - 3.5838769501333333*^-13 z^28 + 
   2.063994985307743*^-14 z^30;
(* fUD approximates g(z) for Abs[z] <= 1; fCF extends it to all z using g(\[Alpha]^n \[Zeta]) = \[Alpha]^n Nest[g, \[Zeta], 2^n] *)
fCF = Compile[{z},
   Module[{\[Alpha] = -2.5029078750959130867, n, \[Zeta]},
    n = If[Abs[z] <= 1., 0, Ceiling[Log[-\[Alpha], Abs[z]]]];
    \[Zeta] = z/\[Alpha]^n;
    Do[\[Zeta] = #, {2^n}];
    \[Alpha]^n \[Zeta]]] &[fUD[\[Zeta]]];
Plot[fCF[x], {x, -100, 100}, MaxRecursion -> 5, PlotRange -> All]

How It Developed

The whole area of iterated maps got a boost on June 10, 1976, with the publication in Nature of Robert May’s survey about them, written independently of Mitchell and (of course) not mentioning his results. But in the months that followed, Mitchell traveled around and gave talks about his results. The reactions were mixed. Physicists wondered how the results related to physics. Mathematicians wondered about their status, given that they came from experimental mathematics, without any formal mathematical proof. And—as always—people found Mitchell’s explanations hard to understand.

In the fall of 1976, Predrag went as a postdoc to Oxford—and on the very first day that I showed up as a 17-year-old particle-physics-paper-writing undergraduate, I ran into him. We talked mostly about his elegant “bird tracks” method for doing group theory (about which he finally published a book 32 years later). But he also tried to explain iterated maps. And I still remember him talking about an idealized model for fish populations in the Adriatic Sea (only years later did I make the connection that Predrag was from what is now Croatia).

At the time I didn’t pay much attention, but somehow the idea of iterated maps lodged in my consciousness, soon mixed together with the notion of fractals that I learned from Benoit Mandelbrot’s book. And when I began to concentrate on issues of complexity a couple of years later, these ideas helped guide me towards systems like cellular automata.

But back in 1976, Mitchell (who I wouldn’t meet for several more years) was off giving lots of talks about his results. He also submitted a paper to the prestigious academic journal Advances in Mathematics. For 6 months he heard nothing. But eventually the paper was rejected. He tried again with another paper, now sending it to the SIAM Journal on Applied Mathematics. Same result.

I have to say I’m not surprised this happened. In my own experience of academic publishing (now long in the past), if one was reporting progress within an established area it wasn’t too hard to get a paper published. But anything genuinely new or original one could pretty much count on getting rejected by the peer review process, either through intellectual shortsightedness or through academic corruption. And for Mitchell there was the additional problem that his explanations weren’t easy to understand.

But finally, in late 1977, Joel Lebowitz, editor of the Journal of Statistical Physics, agreed to publish Mitchell’s paper—essentially on the basis of knowing Mitchell, even though he admitted he didn’t really understand the paper. And so it was that early in 1978 “Quantitative Universality for a Class of Nonlinear Transformations”—reporting Mitchell’s big result—officially appeared. (For purposes of academic priority, Mitchell would sometimes quote a summary of a talk he gave on August 26, 1976, that was published in the Los Alamos Theoretical Division Annual Report 1975–1976. Mitchell was quite affected by the rejection of his papers, and for years kept the rejection letters in his desk drawer.)

Mitchell continued to travel the world talking about his results. There was interest, but also confusion. But in the summer of 1979, something exciting happened: Albert Libchaber in Paris reported results on a physical experiment on the transition to turbulence in convection in liquid helium—where he saw period doubling, with exactly the exponent δ that Mitchell had calculated. Mitchell’s δ apparently wasn’t just universal to a class of mathematical systems—it also showed up in real, physical systems.

Pretty much immediately, Mitchell was famous. Connections to the renormalization group had been made, and his work was becoming fashionable among both physicists and mathematicians. Mitchell himself was still traveling around, but now he was regularly hobnobbing with the top physicists and mathematicians.

I remember him coming to Caltech, perhaps in the fall of 1979. There was a certain rock-star character to the whole thing. Mitchell showed up, gave a stylish but somewhat mysterious talk, and was then whisked away to talk privately with Richard Feynman and Murray Gell-Mann.

Soon Mitchell was being offered all sorts of high-level jobs, and in 1982 he triumphantly returned to Cornell as a full professor of physics. There was an air of Nobel Prize–worthiness, and by June 1984 he was appearing in The New York Times Magazine, in full Beethoven mode, in front of a Cornell waterfall:

Mitchell in New York Times Magazine

Still, the mathematicians weren’t satisfied. As with Benoit Mandelbrot’s work, they tended to see Mitchell’s results as mere “numerical conjectures”, not proven and not always even quite worth citing. But top mathematicians (who Mitchell had befriended) were soon working on the problem, and results began to appear—though it took a decade for there to be a full, final proof of the universality of δ.

Where the Science Went

So what happened to Mitchell’s big discovery? It was famous, for sure. And, yes, period-doubling cascades with his universal features were seen in a whole sequence of systems—in fluids, optics and more. But how general was it, really? And could it, for example, be extended to the full problem of fluid turbulence?

Mitchell and others studied systems other than iterated maps, and found some related phenomena. But none were quite as striking as Mitchell’s original discovery.

In a sense, my own efforts on cellular automata and the behavior of simple programs, beginning around 1981, have tried to address some of the same bigger questions as Mitchell’s work might have led to. But the methods and results have been very different. Mitchell always tried to stay close to the kinds of things that traditional mathematical physics can address, while I unabashedly struck out into the computational universe, investigating the phenomena that occur there.

I tried to see how Mitchell’s work might relate to mine—and even in my very first paper on cellular automata in 1981 I noted for example that the average density of black cells on successive steps of a cellular automaton’s evolution can be approximated (in “mean field theory”) by an iterated map.
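As a present-day sketch of what this means (my code and function names; it is not what appeared in the 1981 paper): if one treats each cell as independently black with probability p, an elementary cellular automaton rule induces a simple one-dimensional map for the density, which one can compare with the density measured in an actual evolution:

meanFieldMap[rule_Integer][p_] := With[{out = Reverse[IntegerDigits[rule, 2, 8]]},
  Sum[out[[FromDigits[nb, 2] + 1]] Times @@ (nb p + (1 - nb) (1 - p)), {nb, Tuples[{0, 1}, 3]}]];
NestList[meanFieldMap[22], 0.5, 8]  (* iterated mean-field densities *)
N[Mean /@ CellularAutomaton[22, RandomInteger[1, 2000], 8]]  (* densities in an actual random evolution *)

For rule 22, for example, the induced map is p ⟶ 3 p (1 – p)^2, which captures the rough behavior of the density, though the mean-field approximation of course ignores the correlations that the actual evolution builds up.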

I also noted that mathematically the whole evolution of a cellular automaton can be viewed as an iterated map—though on the Cantor set, rather than on ordinary real numbers. In my first paper, I even plotted the analog of Mitchell’s smooth mappings, but now they were wild and discontinuous:

Rules plot
(* for each rule, plot the one-step map induced on length-12 initial conditions, read as base-2 integers *)
GraphicsRow[
 Labeled[ListPlot[
     Table[FromDigits[CellularAutomaton[#, IntegerDigits[n, 2, 12]], 
       2], {n, 0, 2^12 - 1}], Sequence[
     AspectRatio -> 1, Frame -> True, FrameTicks -> None]], 
    Text[StringTemplate["rule ``"][#]]] & /@ {22, 42, 90, 110}]

But try as I might, I could never find any strong connection with Mitchell’s work. I looked for analogs of things like period doubling, and Sarkovskii’s theorem, but didn’t find much. In my computational framework, even thinking about real numbers, with their infinite sequence of digits, was a bit unnatural. Years later, in A New Kind of Science, I had a note entitled “Smooth iterated maps”. I showed their digit sequences, and observed, rather undramatically, that Mitchell’s discovery implied an unusual nested structure at the beginning of the sequences:

Nested
(* base-2 digit sequences of successive logistic-map iterates x -> a x (1 - x), for several values of a *)
FractionalDigits[x_, digs_Integer] := 
 NestList[{Mod[2 First[#], 1], Floor[2 First[#]]} &, {x, 0}, digs][[
  2 ;;, -1]];
  GraphicsRow[
 Function[a, 
   ArrayPlot[
    FractionalDigits[#, 40] & /@ 
     NestList[a # (1 - #) &, N[1/8, 80], 80]]] /@ {2.5, 3.3, 3.4, 3.5,
    3.6, 4}]

The Rest of the Story

Portrait of Mitchell
(Photograph by Predrag Cvitanović)

So what became of Mitchell? After four years at Cornell, he moved to the Rockefeller University in New York, and for the next 30 years settled into a somewhat Bohemian existence, spending most of his time at his apartment on the Upper East Side of Manhattan.

While he was still at Los Alamos, Mitchell had married a woman from Germany named Cornelia, who was the sister of the wife of physicist (and longtime friend of mine) David Campbell, who had started the Center for Nonlinear Studies at Los Alamos, and would later go on to be provost at Boston University. But after not too long, Cornelia left Mitchell, taking up instead with none other than Pete Carruthers. (Pete—who struggled with alcoholism and other issues—later reunited with Lucy, but died in 1997 at the age of 61.)

When he was back at Cornell, Mitchell met a woman named Gunilla, who had run away from her life as a pastor’s daughter in a small town in northern Sweden at the age of 14, had ended up as a model for Salvador Dalí, and then in 1966 had been brought to New York as a fashion model. Gunilla had been a journalist, video maker, playwright and painter. Mitchell and she married in 1986, and remained married for 26 years, during which time Gunilla developed quite a career as a figurative painter.

Mitchell’s last solo academic paper was published in 1987. He did publish a handful of other papers with various collaborators, though none were terribly remarkable. Most were extensions of his earlier work, or attempts to apply traditional methods of mathematical physics to various complex fluid-like phenomena.

Mitchell liked interacting with the upper echelons of academia. He received all sorts of honors and recognition (though never a Nobel Prize). But to the end he viewed himself as something of an outsider—a Renaissance man who happened to have focused on physics, but didn’t really buy into all its institutions or practices.

From the early 1980s on, I used to see Mitchell fairly regularly, in New York or elsewhere. He became a daily user of Mathematica, singing its praises and often telling me about elaborate calculations he had done with it. Like many mathematical physicists, Mitchell was a connoisseur of special functions, and would regularly talk to me about more and more exotic functions he thought we should add.

Mitchell had two major excursions outside of academia. By the mid-1980s, the young poetess—now named Kathy Hammond—whom Mitchell had known at Cornell had been an advertising manager for the New York Times and had then married into the family that owned the Hammond World Atlas. And through this connection, Mitchell was pulled into a completely new field for him: cartography.

I talked to him about it many times. He was very proud of figuring out how to use the Riemann mapping theorem to produce custom local projections for maps. He described (though I never fully understood it) a very physics-based algorithm for placing labels on maps. And he was very pleased when finally an entirely new edition of the Hammond World Atlas (that he would refer to as “my atlas”) came out.

Starting in the 1980s, there’d been an increasing trend for physics ideas to be applied to quantitative finance, and for physicists to become Wall Street quants. And with people in finance continually looking for a unique edge, there was always an interest in new methods. I was certainly contacted a lot about this—but with the success of James Gleick’s 1987 book Chaos (for which I did a long interview, though was only mentioned, misspelled, in a list of scientists who’d been helpful), there was a whole new set of people looking to see how “chaos” could help them in finance.

One of those was a certain Michael Goodkin. When he was in college back in the early 1960s, Goodkin had started a company that marketed the legal research services of law students. A few years later, he enlisted several Nobel Prize–winning economists and started what may have been the first hedge fund to do computerized arbitrage trading. Goodkin had always been a high-rolling, globetrotting gambler and backgammon player, and he made and lost a lot of money. And, down on his luck, he was looking for the next big thing—and found chaos theory, and Mitchell Feigenbaum.

For a few years he cultivated various physicists, then in 1995 he found a team to start a company called Numerix to commercialize the use of physics-like methods in computations for increasingly exotic financial instruments. Mitchell Feigenbaum was the marquee name, though the heavy lifting was mostly done by my longtime friend Nigel Goldenfeld, and a younger colleague of his named Sasha Sokol.

At the beginning there was lots of mathematical-physics-like work, and Mitchell was quite involved. (He was an enthusiast of Itô calculus, gave lectures about it, and was proud of having found 1000× speedups of stochastic integrations.) But what the company actually did was to write C++ libraries for banks to integrate into their systems. It wasn’t something Mitchell wanted to do long term. And after a number of years, Mitchell’s active involvement in the company declined.

(I’d met Michael Goodkin back in 1998, and 14 years later—having recently written his autobiography The Wrong Answer Faster: The Inside Story of Making the Machine That Trades Trillions—he suddenly contacted me again, pitching my involvement in a rather undefined new venture. Mitchell still spoke highly of Michael, though when the discussion rather bizarrely pivoted to me basically starting and CEOing a new company, I quickly dropped it.)

I had many interactions with Mitchell over the years, though they’re not as well archived as they might be, because they tended to be verbal rather than written, since, as Mitchell told me (in email): “I dislike corresponding by email. I still prefer to hear an actual voice and interact…”

There are fragments in my archive, though. There’s correspondence, for example, about Mitchell’s 2004 60th-birthday event, that I couldn’t attend because it conflicted with a significant birthday for one of my children. In lieu of attending, I commissioned the creation of a “Feigenbaum–Cvitanović Crystal”—a 3D rendering in glass of the limiting function g(z) in the complex plane.

It was a little complex to solve the functional equation, and the laser manufacturing method initially shattered a few blocks of glass, but eventually the object was duly made, and sent—and I was pleased many years later to see it nicely displayed in Mitchell’s apartment:

Feigenbaum–Cvitanović crystal

Sometimes my archives record mentions of Mitchell by others, usually Predrag. In 2007, Predrag reported (with characteristic wit):

“Other news: just saw Mitchell, he is dating Odyssey.

No, no, it’s not a high-level Washington type escort service—he is dating Homer’s Odyssey, by computing the positions of low stars as function of the 26000 year precession—says Hiparcus [sic] had it all figured out, but Catholic church succeeded in destroying every single copy of his tables.”

Living up to the Renaissance man tradition, Mitchell always had a serious interest in history. In 2013, responding to a piece of mine about Leibniz, Mitchell said he’d been a Leibniz enthusiast since he was a teenager, then explained:

“The Newton hagiographer (literally) Voltaire had no idea of the substance of the Monadology, so could only spoof ‘the best of all possible worlds’. Long ago I’ve published this as a verbal means of explaining 2^n universality.

Leibniz’s second published paper at age 19, ‘On the Method of Inverse Tangents’, or something like that, is actually the invention of the method of isoclines to solve ODEs, quite contrary to the extant scholarly commentary. Both Leibniz and Newton start with differential equations, already having received the diff. calculus. This is quite an intriguing story.”

But the mainstay of Mitchell’s intellectual life was always mathematical physics, though done more as a personal matter than as part of institutional academic work. At some point he was asked by his then-young goddaughter (he never had children of his own) why the Moon looks larger when it’s close to the horizon. He wrote back an explanation (a bit in the style of Euler’s Letters to a German Princess), then realized he wasn’t sure of the answer, and got launched into many years of investigation of optics and image formation. (He’d actually been generally interested in the retina since he was at MIT, influenced by Jerry Lettvin of “What the Frog’s Eye Tells the Frog’s Brain” fame.)

He would tell me about it, explaining that the usual theory of image formation was wrong, and he had a better one. He always used the size of the Moon as an example, but I was never quite clear whether the issue was one of optics or perception. He never published anything about what he did, though with luck his manuscripts (rumored to have the makings of a book) will eventually see the light of day—assuming others can understand them.

When I would visit Mitchell (and Gunilla), their apartment had a distinctly Bohemian feel, with books, papers, paintings and various devices strewn around. And then there was The Bird. It was a cockatoo, and it was loud. I’m not sure who got it or why. But it was a handful. Mitchell and Gunilla nearly got ejected from their apartment because of noise complaints from neighbors, and they ended up having to take The Bird to therapy. (As I learned in a slightly bizarre—and never executed—plan to make videogames for “they-are-alien-intelligences-right-here-on-this-planet” pets, cockatoos are social and, as pets, arguably really need a “Twitter for Cockatoos”.)

The Bird
(Photograph by Predrag Cvitanović)

In the end, though, it was Gunilla who left, with the rumor being that she’d been driven away by The Bird.

The last time I saw Mitchell in person was a few years ago. My son Christopher and I visited him at his apartment—and he was in full Mitchell form, with eyes bright, talking rapidly and just a little conspiratorially about the mathematical physics of image formation. “Bird eyes are overrated”, he said, even as his cockatoo squawked in the next room. “Eagles have very small foveas, you know. Their eyes are like telescopes.”

“Fish have the best eyes”, he said, explaining that all eyes evolved underwater—and that the architecture hadn’t really changed since. “Fish keep their entire field of view in focus, not like us”, he said. It was charming, eccentric, and very Mitchell.

For years, we had talked from time to time on the phone, usually late at night. I saw Predrag a few months ago, saying that I was surprised not to have heard from Mitchell. He explained that Mitchell was sick, but was being very private about it. Then, a few weeks ago, just after midnight, Predrag sent me an email with the subject line “Mitchell is dead”, explaining that Mitchell had died at around 8 pm, and attaching a quintessential Mitchell-in-New-York picture:

Mitchell in New York
(Photograph by Predrag Cvitanović)

It’s kind of a ritual I’ve developed when I hear that someone I know has died: I immediately search my archives. And this time I was surprised to find that a few years ago Mitchell had successfully reached a voicemail box I didn’t know I had. So now we can give Mitchell the last word:

Mitchell's voicemail

And, of course, the last number too: 4.66920160910299067185320382…

Fifty Years of Mentoring


I’ve been reflecting recently on things I like to do. Of course I like creating things, figuring things out, and so on. But something else I like—that I don’t believe I’ve ever written about before—is mentoring. I’ve been doing it a shockingly long time: my first memories of it date from before I was 10 years old, 50 years ago. Somehow I always ended up being the one giving lots of advice—first to kids my own age, then also to ones somewhat younger, or older, and later to all sorts of people.

I was in England recently, and ran into someone I’d known as a kid nearly 50 years ago—and hadn’t seen since. He’s had a fascinating and successful career, but was kind enough to say that my interactions with him, and the advice I gave him, nearly 50 years ago had really been important to him. Of course it’s nice to hear things like that—but as I reflect on it, I realize that mentoring is something I find fulfilling, whether or not I end up knowing whether the seeds I’ve sown have germinated (though, to be clear, I do find it fascinating to see what happens).

Mentoring is not like teaching. It’s something much more individual and personal. It’s about answering the specific “What should I do about X?” questions, and the general “What should I do given who I am?” questions. I’ve always been interested in people—which has been a great asset in identifying and leading people at my company all these years. It’s also what’s gotten me in recent years to write historical biography, and, sadly, to write a rather large number of obituaries.

But there’s something particularly fulfilling to me about mentoring, and about helping and changing outcomes, one person at a time. These days, there are two main populations I end up mentoring: CEOs, and kids. At some level, they’re totally different. But at some level, they’re surprisingly similar.

I like learning things, and I like solving problems. And in the mentoring I do, I’m always doing both these things. I’m hearing—often in quite a lot of detail—about different kinds of situations. And I’m trying to use my skills at problem solving to work out what to do. The constraint is always what is right for this particular person, and what is possible given the world as it is. But it’s so satisfying when one figures it out.

“Have you ever thought of X?” Sometimes, there’ll be an immediate “Oh, that’s a good idea” response. Sometimes one will be told a host of reasons why it can’t work—and then it’s a matter of picking through which objections are real, where all that’s needed is encouragement, and where there are other problems to be solved.

Sometimes my mentoring ends up being about things that have immediate effects on the world, like major strategy decisions for significant companies. Sometimes my mentoring is about things that are—for now—completely invisible to the world, like whether a kid should study this or that.

I tend to find mentoring the most interesting when it’s dealing with things I’ve never dealt with before. Maybe they’re things that are genuinely new in the world—like new situations in the technology industry. Or maybe they’re things that are just new to me, because I’ve never experienced or encountered that particular corner of human experience, or the world.

One thing that’s in common between CEOs and kids is that at some level they tend to be in “anything is possible” situations: they have a wide range of choices they can make about how to lead their companies, or their lives. And they also tend to want to think about the future—and about where they might go.

To be fair, there are both CEOs and kids where I wouldn’t be a particularly useful mentor. And most often that’s when they’re somehow already on some definite track, and where their next several years are largely determined. (Say just following a particular business plan, or a particular educational program.)

In the case of CEO mentoring, there’s a tendency for there to be quite long periods where not much happens, interspersed by the occasional urgent crises—deals to do or not, PR emergencies, personnel meltdowns, etc. (And, yes, those calls can come in at the most awkward times, making me glad that when I’m pushing other things aside, at least I can say to myself that I’m typically an official company advisor too, usually with a little equity in the company.)

With kids, things usually tend to be less urgent, and it’s more a matter of repeated interactions, gradually showing a direction, or working through issues. Sometimes—and this applies to CEOs as well—the issues are exogenous, and relate to situations in the world. Sometimes they’re endogenous, and they’re about how someone is motivated, or thinks about themselves or their place in the world.

I’ve found that the kids I find it most interesting to mentor fall into two categories. The first are the seriously precocious kids who are already starting to launch in high-flying directions. And the second are kids who aren’t connected to the high-flying world, and may be in difficult circumstances, but who somehow have some kind of spark that interactions with me can help nurture.

I’ve done a fair amount of traveling around the world in recent years (often with one or more of my own kids). And I always find it interesting to visit schools. (Research universities tend to seem similar all over the world, but as one gets to high schools and below, there are more and more obvious—and interesting—differences.) Usually I’ll give talks and have discussions with students. And there’s a pattern that’s repeated over and over again. At the end of an event, one or two students will come up to me and start an interesting conversation, and eventually I’ll hand them a business card and say: “If you ever want to chat more, send me mail”.

And, yes, the ones I hear from are a very self-selected set. Typically I’ll do an initial phone call to learn more about them. And if it seems like I can be useful, I’ll say, “Let me put you on my list of people I’ll call when I have time”.

I have a busy life, and I like to be as productive as possible. But there are always scraps of time when I’m not doing what I usually do. Maybe I’ll be driving from here to there at a time when there’s no useful meeting I can do. Maybe I’ll be procrastinating starting something because I’m not quite in the right frame of mind. And at those kinds of times it’s great to do a mentoring phone call. Because even if I’m hearing about all sorts of problems, I always find it energizing.

With CEOs, the problems can be big and sophisticated. With kids one might at first assume they’d be too familiar and low-level to be interesting. But at least for me, that’s not the case. Sometimes it’s that I started my career sufficiently early that I never personally encountered that kind of problem. Sometimes it’s that the problems are ones that newly exist only in recent years.

And particularly for kids in difficult circumstances, it’s often that with my particular trajectory in life I’ve just never been exposed to those kinds of problems. Sometimes I’m quite embarrassed at how clueless I am about some economic or social hardship a kid tells me about. But I’ll ask lots of questions—and often I’m quite proud of the solutions I’ll come up with.

I have to say that in modern times, it’s disappointing how difficult it tends to be for someone like me to reach kids who aren’t already connected to the rather high-flying parts of the world I usually deal with. There’s an example with our (very successful, I might add) Wolfram High School Summer Camp, which we’ve been putting on for the past seven years. We’ve always got great kids at the Summer Camp. But in the first few years, I noticed that almost all of them came from the most elite schools—usually on the East Coast or West Coast of the US, and generally had very sophisticated backgrounds.

I wanted to broaden things out, and so we put effort into advertising the Summer Camp on our Wolfram|Alpha website that (I’m happy to say) a very large number of kids use. The results were good in the sense that we immediately got a much broader geographic distribution, both within the US and outside. But though we advertised that scholarships and financial aid were available, few people applied for those, and in fact the fraction even seems to have recently been going down slightly.

It’s a frustrating situation, and perhaps it’s a reflection of broader societal issues. Of course, the Summer Camp is a somewhat different situation from mentoring, because to be successful at the Summer Camp, kids already have to have (or give themselves) a certain amount of preparation (learn at least the basics of the Wolfram Language, etc.). And in fact, it’s not uncommon for kids I’ve mentored to end up going to the Summer Camp. And from that point on (or, for example, when they go to some good college), they’re often basically “solved problems”, now connected to people and resources that will help take them forward.

When my company was young, I often found myself mentoring employees. But as the company grew, and developed a strong internal culture, that became less and less necessary because in a sense, the whole ambient environment provided mentoring. And, yes, as is typical in companies, my values as founder and CEO are (for better or worse) deeply imprinted on the organization. And part of what that means is that I don’t personally have to communicate them to everyone in the organization.

In a company it clearly makes sense to promote a certain coherent set of goals and values. But what about in the world at large, or, say, in kids one mentors? There’s always a great tendency to promote—often with missionary zeal—the kind of thing one does oneself. “Everyone should want to be a tech entrepreneur!” “Everyone should want to be a professor!” etc. And, yes, there will be people for whom those are terrific directions, and unless someone mentors them in those directions, they’ll never find them. But what about all the others?

I did some surveys of kids a couple of years ago, asking them about their goals. I asked them to say how interested they were in things like having their own reality TV show, making a billion dollars, making a big scientific discovery, having lots of friends, taking a one-way trip to Mars, etc. And, perhaps not surprisingly, there was great diversity in their answers. I asked some adults the same questions, and then asked them how they thought their answers would have been different when they were kids.

And my very anecdotal conclusion was that at least at this coarse level, the things people say they’d like to do or have done change fairly little over the course of their lives—at least after their early teenage years. Of course, an important goal of education should surely be to show people what’s out there in the world, and what it’s possible to do. In practice, though, much of modern formal education is deeply institutionalized in particular tracks that were defined a century ago. But still there are signals to be gleaned.

So you like math in school? The number of people who just do math for a living is pretty small. But what is the essence of what you like about math? Is it the definiteness of it? The problem solving? The abstract aesthetics? The measurable competitiveness? If you’re mentoring a kid you should be able to parse it out—and depending on the answer there’ll be all sorts of different possible directions and opportunities.

And in general, my point of view is that the goal should always be to try to find signals from people, and then to see how to help amplify them, and solve the problem of how to fit them into what’s possible in the world. I like to think that for every person there’s something out there that’s the best fit for what they should be doing. Of course, you may be lucky or unlucky in the time in history in which you live. You want to be an explorer, doing things like searching for the sources of rivers? Sorry, that’s been done. You want to be an asteroid miner or a genetic designer of animals? Sorry, you’re too early.

In a company, I view it as a major role—and responsibility—of management to take the skills and talents of the people one has, and solve the puzzle of fitting them into the projects that the company needs to do. Quite often one ends up suggesting quite new directions to people. (“I had no idea there was a thing like software quality assurance.” “Linguistic curation is something people do?” etc.) And over the years it’s been very satisfying to see how many successful careers I’ve been able to help launch by pointing people to new fields where it turns out their skills and interests are a match.

I don’t claim to be immune to the “encourage people to do what you do” phenomenon. And in a sense that informs the people—CEOs or kids—who I mentor. But I like to think that I’m unprejudiced about subject areas (and the more experience I get in the world, and with different kinds of people, the easier that gets). What does tend to be in common, though, is that I believe in just figuring out what to do, and doing it.

Too few people have had serious experience in going from “nothing to something”: of starting from some idea that just got invented, and then seeing it over the course of time turn into something real—and perhaps even important—in the world. But that’s the kind of thing I’ve spent my life doing, and that I try to do all the time.

And (at least given my worldview) I think it’s something that’s incredibly valuable and educational for people to see, and if possible experience for themselves. When people at the company have been involved in major “nothing-to-something” projects, I think there’s a certain glow of confidence they get that lasts a decade.

I can see that my own children have benefitted from watching so many projects of mine go from nothing to something—and being exposed to the process that’s been involved (and often giving their own input). And when I mentor kids (and often CEOs too) I like to mention projects I’ve got going on, so that over the course of time they too gradually get a sense of at least my version of the “nothing-to-something” process.

For the past several years, I’ve spent a couple of hours most Sundays doing “Computational Adventures” with groups of kids (mostly middle school, with some early high school, and some late elementary school). It’s been fascinating for me, especially as I try to understand more about teaching computational thinking. And of course it’s invigorating for me to be doing something so different from my typical “day job”.

Most of the time what I’ll actually do with the kids is try to figure out or build something with the Wolfram Language. It’s not the same kind of thing as mentoring individual kids, but there’s a little bit of “create something from nothing” when we develop ideas and implement them in the Wolfram Language.

I think to most kids, knowledge is something that just exists, not something that they know people create. And so it’s always fun when the kids bring up a topic, and I’m like “well, it so happens that the world expert on that is a friend of mine”, or, “well, actually, I was the one who discovered this or that!”. Like in mentoring, all this helps communicate the “you can do that too” message. And after a while, it’s something that kids just start to take for granted.

One of the features of having done mentoring for so long is that I’ve been able to see all sorts of long-term outcomes. Sometimes it’s a bit uncanny. I’ll be talking to some kid, and I’ll think to myself: “They’re just like that kid I knew 50 years ago!” And then I’ll start playing out in my mind what I think would naturally happen this time around, decades hence. And it’s the same with CEOs and their issues.

And, yes, it’s useful to have the experience, and to be able to make those predictions. But there’s still the problem solving about the present to do, and the human connection to make. And for me it all adds up to the fascinating and fulfilling experience I’ve had in doing all that mentoring over the past half-century or so.

Often it’s been some random coincidence that’s brought a particular mentoree to me. Sometimes it’s been their initiative in reaching out (or, very occasionally, someone reaching out on their behalf). I’m hoping that in the future (particularly when it comes to kids), it’ll be a still broader cross-section. And that in the years to come I’ll have the pleasure of successfully answering ever more of those “What should I do?” questions—that make me think about something I’ve never thought about before, and help someone follow the path they want.

A Book from Alan Turing… and a Mysterious Piece of Paper


How I Got the Book

In May 2017, I got an email from a former high-school teacher of mine named George Rutter: “I have a copy of Dirac’s big book in German (Die Prinzipien der Quantenmechanik) that was owned by Alan Turing, and following your book Idea Makers it seemed obvious that you were the right person to own this.” He explained that he’d got the book from another (by then deceased) former high-school teacher of mine, Norman Routledge, who I knew had been a friend of Alan Turing’s. George ended, “If you would like the book, I could give it to you the next time you are in England.”

A couple of years passed. But in March 2019 I was indeed in England, and arranged to meet George for breakfast at a small hotel in Oxford. We ate and chatted, and waited for the food to be cleared. Then the book moment arrived. George reached into his briefcase and pulled out a rather unassuming, typical mid-1900s academic volume.

P. A. M. Dirac's Die Prinzipien der Quantenmechanik

I opened the front of the book, wondering if it might have a “Property of Alan Turing” sticker or something. It didn’t. But what it did have (in addition to an inscription saying “from Alan Turing’s books”) was a colorful four-page note from Norman Routledge to George Rutter, written in 2002.

I had known Norman Routledge when I was a high-school student at Eton in the early 1970s. He was a math teacher, nicknamed “Nutty Norman”. He was charmingly over the top in many ways, and told endless stories about math and other things. He’d also been responsible for the school getting a computer (programmed with paper tape, and the size of a desk)—that was the very first computer I ever used.

At the time, I didn’t know too much about Norman’s background (remember, this was long before the web). I knew he was “Dr. Routledge”. And he often told stories about people in Cambridge. But he never mentioned Alan Turing to me. Of course, Alan Turing wasn’t famous yet (although, as it happens, I’d already heard of him from someone who’d known him at Bletchley Park during the Second World War).

Alan M. Turing by Sara Turing

Alan Turing still wasn’t famous in 1981 when I started studying simple programs, albeit in the context of cellular automata rather than Turing machines. But looking through the card catalog at the Caltech library one day, I chanced upon a book called Alan M. Turing by Sara Turing, his mother. There was lots of information in the book—among other things, about Turing’s largely unpublished work on biology. But I didn’t learn anything about a connection to Norman Routledge, because the book didn’t mention him (although, as I’ve now found out, Sara Turing did correspond with Norman about the book, and Norman ended up writing a review of it).

A decade later, very curious about Turing and his (then still unpublished) work on biology, I arranged to visit the Turing Archive at King’s College, Cambridge. Soon I’d gone through what they had of Turing’s technical papers, and with some time to spare, I thought I might as well ask to see his personal correspondence too. And flipping through it, I suddenly saw a couple of letters from Alan Turing to Norman Routledge.

By that time, Andrew Hodges’s biography—which did so much to make Turing famous—had appeared, and it confirmed that, yes, Alan Turing and Norman Routledge had indeed been friends, and in fact Turing had been Norman’s PhD advisor. I wanted to ask Norman about Turing, but by then Norman was retired and something of a recluse. Still, when I finished A New Kind of Science in 2002 (after my own decade of reclusiveness) I tracked him down and sent him a copy of the book with an inscription describing him as “My last mathematics teacher”. Some correspondence ensued, and in 2005 I was finally in England again, and arranged to meet Norman for a quintessentially English tea at a fancy hotel in London.

We had a lovely chat about many things, including Alan Turing. Norman started by saying that he’d really known Turing mostly socially—and that that was 50 years ago. But still he had plenty to say about him. “He was a loner.” “He giggled a lot.” “He couldn’t really talk to non-mathematicians.” “He was always afraid of upsetting his mother.” “He would go off in the afternoon and run a marathon.” “He wasn’t very ambitious (though ‘one wasn’t’ at King’s in those days).” Eventually the conversation came back to Norman. He said that even though he’d been retired for 16 years, he still contributed items to the Mathematical Gazette, in order, he said, “to unload things before I pass to a better place”, where, he added, somewhat impishly, “all mathematical truths will surely be revealed”. When our tea was finished, Norman donned his signature leather jacket and headed for his moped, quite oblivious to the bombings that had so disrupted transportation in London on that particular day.

That was the last time I saw Norman, and he died in 2013. But now, six years later, as I sat at breakfast with George Rutter, here was this note from him, written in 2002 in his characteristically lively handwriting:

Norman's letter

Norman's letter, pages 1–4

I read it quickly at first. It was colorful as always:

I got Alan Turing’s book from his friend & executor Robin Gandy (it was quite usual at King’s for friends to be offered books from a dead man’s library—I selected the collected poems of A. E. Housman from the books of Ivor Ramsay as a suitable memento: he was the Dean & jumped off the chapel [in 1956])…

Later in the note he said:

You ask about where, eventually, the book should go—I would prefer it to go to someone (or some where) wh. wd. appreciate the Turing connection, but really it is up to you.

Stephen Wolfram sent me his impressive book, but I’ve done no more than dip into it…

He ended by congratulating George Rutter for having the courage to move (as it turned out, temporarily) to Australia in his retirement, saying that he’d “toyed with moving to Sri Lanka, for a cheap, lotus-eating existence”, but added “events there mean I was wise not to do so” (presumably referring to the Sri Lankan Civil War).

What’s In the Book?

OK, so here I was with a copy of a book in German written by Paul Dirac, that was at one time owned by Alan Turing. I don’t read German, and I’d had a copy of the same book in English (which was its original language) since the 1970s. Still, as I sat at breakfast, I thought it only proper that I should look through the book page by page. After all, that’s a standard thing one does with antiquarian books.

I have to say that I was struck by the elegance of Dirac’s presentation. The book was published in 1931, yet its clean formalism (and, yes, despite the language barrier, I could read the math) is pretty much as one would write it today. (I don’t want to digress too much about Dirac, but my friend Richard Feynman told me that at least to him, Dirac spoke only monosyllabically. Norman Routledge told me that he had been friends in Cambridge with Dirac’s stepson, who became a graph theorist. Norman quite often visited the Dirac household, and said the “great man” was sometimes in the background, always with lots of mathematical puzzles around. I myself unfortunately never met Dirac, though I’m told that after he finally retired from Cambridge and went to Florida, he lost much of his stiffness and became quite social.)

But back to Turing’s copy of Dirac’s book. On page 9 I started to see underlinings and little marginal notes, all written in light pencil. I kept on flipping pages. After a few chapters, the annotations disappeared. But then, suddenly, tucked into page 127, there was a note:

German note

It was in German, with what looked like fairly typical older German handwriting. And it seemed to have something to do with Lagrangian mechanics. By this point I’d figured out that someone must have had the book before Turing, and this must be a note made by that person.

I kept flipping through the book. No more annotations. And I was thinking I wouldn’t find anything else. But then, on page 231, a bookmark—with a charmingly direct branding message:

Heffers bookmark

Would there be anything more? I continued flipping. Then, near the end of the book, on page 259, in a section on the relativistic theory of electrons, I found this:

Folded note

I opened the piece of paper:

Opened note

I recognized it immediately: it’s lambda calculus, with a dash of combinators. But what on Earth was it doing here? Remember, the book is about quantum mechanics. But this is about mathematical logic, or what’s now considered theory of computation. Quintessential Turing stuff. So, I immediately wondered, did Turing write this page?

Even as we were sitting at breakfast, I was looking on the web for samples of Turing’s handwriting. But I couldn’t find many calculational ones, so couldn’t immediately conclude much. And soon I had to go, carefully packing the book away, ready to pursue the mystery of what this page was, and who wrote it.

About the Book

Before anything else, let’s talk about the book itself. Dirac’s The Principles of Quantum Mechanics was published in English in 1930, and very quickly also appeared in German. (The preface by Dirac is dated May 29, 1930; the one from the translator—Werner Bloch—August 15, 1930.) The book was a landmark in the development of quantum mechanics, systematically setting up a clear formalism for doing calculations, and, among other things, explaining Dirac’s prediction of the positron, which would be discovered in 1932.

Why did Alan Turing get the book in German rather than English? I don’t know for sure. But in those days, German was the leading language of science, and we know Alan Turing knew how to read it. (After all, the title of his famous Turing machine paper “On Computable Numbers, with an Application to the Entscheidungsproblem” had a great big German word in it—and within the body of the paper he referred to the rather obscure Gothic characters he used as “German letters”, contrasting them, for example, with Greek letters.)

Did Alan Turing buy the book, or was he given it? I don’t know. On the inside front cover of Turing’s copy of the book is a pencil notation “20/-”, which was standard notation for “20 shillings”, equal to £1. On the right-hand page, there’s an erased “26.9.30”, presumably meaning 26 September, 1930—perhaps the date when the book was first in inventory. Then to the far right, there’s an erased “20-.”, perhaps again the price. (Could this have been a price in Reichsmarks, suggesting the book was sold in Germany? Even though at that time 1 RM was worth roughly 1 shilling, a German price would likely have been written as, for example, “20 RM”.) Finally, on the inside back cover there’s “c 5/-”—maybe the (highly discounted) price for the book used.

Let’s review the basic timeline. Alan Turing was born June 23, 1912 (coincidentally, exactly 76 years before Mathematica 1.0 was released). He went as an undergraduate to King’s College, Cambridge in the fall of 1931. He got his undergraduate degree after the usual three years, in 1934.

In the 1920s and early 1930s, quantum mechanics was hot, and Alan Turing was interested in it. From his archives, we know that in 1932—as soon as it was published—he got John von Neumann’s Mathematical Foundations of Quantum Mechanics (in its original German). We also know that in 1935, he asked the Cambridge physicist Ralph Fowler for a possible question to study in quantum mechanics. (Fowler suggested computing the dielectric constant of water—which actually turns out to be a very hard problem, basically requiring full-fledged interacting quantum field theory analysis, and still not completely solved.)

When and how did Turing get his copy of Dirac’s book? Given that there seems to be a used price in the book, Turing presumably bought it used. Who was its first owner? The annotations in the book seem to be concerned primarily with logical structure, noting what should be considered an axiom, what logically depends on what, and so on. What about the note tucked into page 127?

Well, perhaps coincidentally, page 127 isn’t just any page: it’s the page where Dirac talks about the quantum principle of least action, and sets the stage for the Feynman path integral—and basically all modern quantum formalism. But what does the note say? It’s expanding on equation 14, which is an equation for the time evolution of a quantum amplitude. The writer has converted Dirac’s A for amplitude into a ρ, possibly reflecting an earlier (fluid-density analogy) German notation. Then the writer attempts an expansion of the action in powers of ℏ (Planck’s constant divided by 2π, sometimes called Dirac’s constant).

But there doesn’t seem to be a lot to be gleaned from what’s on the page. Hold the page up to the light, though, and there’s a little surprise—a watermark reading “Z f. Physik. Chem. B”:

Z f. Physik. Chem. B watermark

That’s a short form of Zeitschrift für physikalische Chemie, Abteilung B—a German journal of physical chemistry that began publication in 1928. Was the note perhaps written by an editor of the journal? Here’s the masthead of the journal for 1933. Conveniently, the editors are listed with their locations, and one stands out: Born · Cambridge.

Zeitschrift für physikalische Chemie, Abteilung B

That’s Max Born, of the Born interpretation, and many other things in quantum mechanics (and also the grandfather of the singer Olivia Newton-John). So, was this note written by Max Born? Unfortunately it doesn’t seem like it: the handwriting doesn’t match.

OK, so what about the bookmark at page 231? Here are the two sides of it:

Heffers bookmark

The marketing copy is quaint and rather charming. But when is it from? Well, there’s still a Heffers Bookshop in Cambridge, though it’s now part of Blackwell’s. But for more than 70 years (ending in 1970) Heffers was located, as the bookmark indicates, at 3 and 4 Petty Cury.

But there’s an important clue on the bookmark: the phone number is listed as “Tel. 862”. Well, it turns out that in 1939, most of Cambridge (including Heffers) switched to 4-digit numbers, and certainly by 1940 bookmarks were being printed with “modern” phone numbers. (English phone numbers progressively got longer; when I was growing up in England in the 1960s, our phone numbers were “Oxford 56186” and “Kidmore End 2378”. Part of why I remember these numbers is the now-strange-seeming convention of always saying one’s number when answering the phone.)

But, OK, so the bookmark was from before 1939. But how much before? There are quite a few scans of old Heffers ads to be found on the web—and from at least 1912 (along with “We solicit the favour of your enquiries…”) they list “Telephone 862”, helpfully adding “(2 lines)”. And there are even some bookmarks with the same design to be found in copies of books from as long ago as 1904 (though it’s not clear they were original to the books). But for our purposes it seems as if we can reasonably conclude that our book came from Heffers (which was the main bookstore in Cambridge, by the way) sometime between 1930 and 1939.

The Lambda Calculus Page

OK, so we know something about when the book was bought. But what about the “lambda calculus page”? When was it written? Well, of course, lambda calculus had to have been invented. And that was done by Alonzo Church, a mathematician at Princeton, in an initial form in 1932, and in final form in 1935. (There had been precursors, but they hadn’t used the λ notation.)

There’s a complicated interaction between Alan Turing and lambda calculus. It was in 1935 that Turing had gotten interested in “mechanizing” the operations of mathematics, and had invented the idea of a Turing machine, and used it to solve a problem in the foundations of mathematics. Turing had sent a paper about it to a French journal (Comptes rendus), but initially it was lost in the mail; and then it turned out the person he’d sent it to wasn’t around anyway, because they’d gone to China.

But in May 1936, before Turing could send his paper anywhere else, Alonzo Church’s paper arrived from the US. Turing had been “scooped” once before, when in 1934 he created a proof of the central limit theorem, only to find that there was a Finnish mathematician who’d already given a proof in 1922.

It wasn’t too hard to see that Turing machines and lambda calculus were actually equivalent in the kinds of computations they could represent (and that was the beginning of the Church–Turing thesis). But Turing (and his mentor Max Newman) got convinced that Turing’s approach was different enough to deserve separate publication. And so it was that in November 1936 (with a bug fix the following month), Turing’s famous paper “On Computable Numbers…” was published in the Proceedings of the London Mathematical Society.

To fill in a little more of the timeline: from September 1936 to July 1938 (with a break of three months in the summer of 1937), Turing was at Princeton, having gone there to be, at least nominally, a graduate student of Alonzo Church. While at Princeton, Turing seems to have concentrated pretty completely on mathematical logic—writing several difficult-to-read papers full of Church’s lambdas—and most likely wouldn’t have had a book about quantum mechanics with him.

Turing was back in Cambridge in July 1938, but already by September of that year he was working part-time for the Government Code and Cypher School—and a year later he moved to Bletchley Park to work full time on cryptanalysis. After the war ended in 1945, Turing moved to London to work at the National Physical Laboratory on producing a design for a computer. He spent the 1947–8 academic year back in Cambridge, but then moved to Manchester to work on building a computer there.

In 1951, he began working in earnest on theoretical biology. (To me it’s an interesting irony that he seems to have always implicitly assumed that biological systems have to be modeled by differential equations, rather than by something discrete like Turing machines or cellular automata.) He also seems to have gotten interested in physics again, and by 1954 even wrote to his friend and student Robin Gandy that “I’ve been trying to invent a new Quantum Mechanics” (though he added, “but it won’t really work”). But all this came to an end on June 7, 1954, when Turing suddenly died. (My own guess is that it was not suicide, but that’s a different story.)

OK, but back to the lambda calculus page. Hold it up to the light, and once again there’s a watermark:

Excelsior watermark

So it’s a British-made piece of paper, which seems, for example, to make it unlikely to have been used in Princeton. But can we date the paper? Well, after some help from the British Association of Paper Historians, we know that the official manufacturer of the paper was Spalding & Hodge, Papermakers, Wholesale and Export Stationers of Drury House, Russell Street off Drury Lane, Covent Garden, London. But this doesn’t help as much as one might think—because their Excelsior brand of machine-made paper seems to have been listed in catalogs all the way from the 1890s to 1954.

What Does the Page Say?


OK, so let’s talk in more detail about what’s on the two sides of the page. Let’s start with the lambdas.

These are a way of defining “pure” or “anonymous” functions, and they’re a core concept in mathematical logic, and nowadays also in functional programming. They’re common in the Wolfram Language, and they’re pretty easy to explain there. One writes f[x] to mean a function f applied to an argument x. And there are lots of named functions that f can be—like Abs or Sin or Blur. But what if one wants f[x] to be 2x+1? There’s no immediate name for that function. But is there still something we can write for f that will make f[x] be this?

The answer is yes: in place of f we write Function[a, 2a+1]. And in the Wolfram Language, Function[a, 2a+1][x] is defined to give 2x+1. The Function[a, 2a+1] is a “pure” or “anonymous” function, that represents the pure operation of doubling and adding 1.

Well, λ in lambda calculus is the exact analog of Function in the Wolfram Language—and so for example λa.(2a+1) is equivalent to Function[a, 2a+1]. (It’s worth noting that Function[b, 2b+1] is equivalent; the “bound variable” a or b is just a placeholder—and in the Wolfram Language it can be avoided by using the alternative notation (2#+1)&.)
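Just as a quick check (a minimal sketch, with x left as an undefined symbol), all of these forms behave the same way when applied:

Function[a, 2 a + 1][x]
(* -> 1 + 2 x *)

Function[b, 2 b + 1][x]
(* -> 1 + 2 x *)

(2 # + 1) &[x]
(* -> 1 + 2 x *)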

In traditional mathematics, functions tend to be thought of as things that map inputs (like, say, integers) to outputs (that are also, say, integers). But what kind of a thing is Function (or λ)? It’s basically a structural operator that takes expressions and turns them into functions. That’s a bit weird from the point of view of traditional mathematics and mathematical notation. But if one’s thinking about manipulating arbitrary symbols, it’s much more natural, even if at first it still seems a little abstract. (And, yes, when people learn the Wolfram Language, I can always tell they’ve passed a certain threshold of abstract understanding when they get the idea of Function.)

OK, but the lambdas are just part of what’s on the page. There’s also another, yet more abstract concept: combinators. See the rather obscure-looking line PI1IIx? What does it mean? Well, it’s a sequence of combinators, or effectively, a kind of abstract composition of symbolic functions.

Ordinary composition of functions is pretty familiar from mathematics. And in Wolfram Language one can write f[g[x]] to mean “apply f to the result of applying g to x”. But does one really need the brackets? In the Wolfram Language f@g@x is an alternative notation. But in this notation, we’re relying on a convention in the Wolfram Language: that the @ operator associates to the right, so that f@g@x is equivalent to f@(g@x).

But what would (f@g)@x mean? It’s equivalent to f[g][x]. And if f and g were ordinary functions in mathematics, this would basically be meaningless. But if f is a higher-order function, then f[g] can itself be a function, which can perfectly well be applied to x.
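Here’s a minimal sketch (the name twice is just made up for this example): define a higher-order function, so that twice[Sin] is itself a function, and (twice@Sin)@x makes perfectly good sense:

twice[g_] := Function[x, g[g[x]]]   (* a higher-order function: twice[g] is itself a function *)

twice[Sin][x]
(* -> Sin[Sin[x]] *)

(twice@Sin)@x
(* -> Sin[Sin[x]] *)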

OK, there’s another piece of complexity here. In f[x] the f is a function of one argument. And f[x] is equivalent to Function[a, f[a]][x]. But what about a function of two arguments, say f[x, y]? This can be written Function[{a,b}, f[a, b]][x, y]. But what about Function[{a}, f[a, b]]? What would this be? It’s got a “free variable” b just hanging out. Function[{b}, Function[{a}, f[a, b]]] would “bind” that variable. And then Function[{b}, Function[{a}, f[a, b]]][y][x] gives f[x, y] again. (The process of unwinding functions so that they have single arguments is called “currying”, after a logician named Haskell Curry.)

If there are free variables, then there’s all sorts of complexity about how functions can be composed. But if we restrict ourselves to Function or λ objects that don’t have free variables, then these can basically be freely composed. And such objects are called combinators.

Combinators have a long history. So far as one knows, they were first invented in 1920 by a student of David Hilbert’s named Moses Schönfinkel. At the time, it had only recently been discovered that one didn’t need And and Or and Not to represent expressions in standard propositional logic: it was sufficient to use the single operator that we’d now call Nand (because, for example, writing Nand as ·, Or[a, b] is just (a·a)·(b·b)). Schönfinkel wanted to find the same kind of minimal representation of predicate logic, or in effect, logic including functions.

And what he came up with was the two “combinators” S and K. In Wolfram Language notation, K[x_][y_] → x and S[x_][y_][z_] → x[z][y[z]]. Now, here’s the remarkable thing: it turns out to be possible to use these two combinators to perform any computation. So, for example, S[K[S]][S[K[S[K[S]]]][S[K[K]]]] can be used as a function to add two integers.
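Here’s a minimal sketch of how one might check that claim in the Wolfram Language (the lowercase names s, k, plus, two and three are just ones chosen for this example, and Church numerals, discussed a little further below, are used to stand for the integers):

k[x_][y_] := x
s[x_][y_][z_] := x[z][y[z]]

two = Function[f, Function[x, f[f[x]]]];       (* Church numeral 2 *)
three = Function[f, Function[x, f[f[f[x]]]]];  (* Church numeral 3 *)

plus = s[k[s]][s[k[s[k[s]]]][s[k[k]]]];        (* the combinator quoted above, written in lowercase *)

plus[two][three][g][y]
(* -> g[g[g[g[g[y]]]]], i.e. the Church numeral 5 applied to g and y *)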

It is, to put it mildly, quite abstract stuff. But now that one’s understood Turing machines and lambda calculus, it’s possible to see that Schönfinkel’s combinators actually anticipated the concept of universal computation. (And what’s more remarkable still, the definitions of S and K from 1920 are almost minimally simple, reminiscent of the very simplest universal Turing machine, which I suggested in the 1990s and which was finally proved universal in 2007.)

But back to our page, and the line PI1IIx. The symbols here are combinators, and they’re all intended to be composed. But the convention was that function composition should be left-associative, so that fgx should be interpreted not like f@g@x (that is, f@(g@x) or f[g[x]]), but rather like (f@g)@x or f[g][x]. So, translating a bit for convenient Wolfram Language use, PI1IIx is p[i][one][i][i][x].

Why would someone be writing something like this? To explain that, we have to talk about the concept of Church numerals (named after Alonzo Church). Let’s say we’re just working with symbols and with lambdas, or combinators. Is there a way we can use these to represent integers?

Well, how about just saying that a number n corresponds to Function[x, Nest[f, x, n]]? Or, in other words, that (in shorter notation) 1 is f[#]&, 2 is f[f[#]]&, 3 is f[f[f[#]]]&, and so on. This might seem irreducibly obscure. But the reason it’s interesting is that it allows us to do everything completely symbolically and abstractly, without ever having to explicitly talk about something like integers.

With this setup, imagine, for example, adding two numbers: 3 can be represented as f[f[f[#]]]&, and 2 is f[f[#]]&. We can add them just by applying one of them to the other:

f[f[f[#]]] & [f[f[#]] &]

OK, but what is the f supposed to be? Well, just let it be anything! In a sense, “go lambda” all the way, and represent numbers by functions that take f as an argument. In other words, make 3 for example be Function[f, f[f[f[#]]]&] or Function[f, Function[x, f[f[f[x]]]]]. (And, yes, exactly when and how you need to name variables is the bane of lambda calculus.)
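As a sketch of this fully “lambda” version (using the second of those two forms, with churchPlus just a name made up for this example), addition of Church numerals can then be written without ever mentioning integers explicitly:

two = Function[f, Function[x, f[f[x]]]];
three = Function[f, Function[x, f[f[f[x]]]]];

churchPlus = Function[{m, n}, Function[f, Function[x, m[f][n[f][x]]]]];

churchPlus[two, three][g][y]
(* -> g[g[g[g[g[y]]]]], i.e. the Church numeral 5 *)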

Here’s a fragment from Turing’s 1937 paper “Computability and λ-Definability” that sets things up exactly as we just discussed:

Fragment from "Computability and λ-Definability"

The notation is a little confusing. Turing’s x is our f, while his x’ (the typesetter did him no favors by inserting a space) is our x. But it’s exactly the same setup.

OK, so let’s take a look at the line right after the fold on the front of the page. It’s I1IIYI1IIx. In Wolfram Language notation this would be i[one][i][i][y][i][one][i][i][x]. But here, i is the identity function, so i[one] is just one. Meanwhile, one is the Church numeral for 1, or Function[f, f[#]&]. But with this definition one[a] becomes a[#]& and one[a][b] becomes a[b]. (By the way, i[a][b], or Identity[a][b], is also a[b].)

It keeps things cleaner to write the rules for i and one using pattern matching rather than explicit lambdas, but the result is the same. Apply these rules and one gets:

i[one][i][i][y][i][one][i][i][x] //. {i[x_] → x, one[x_][y_] → x[y]}

And that’s exactly the same as the first reduction shown:

Excerpt 1

OK, now let’s look higher on the page again now:

Excerpt 2

There’s a rather confusing “E” and “D”, but underneath these say “P” and “Q”, so we can write out the expression, and evaluate it (note that here—after some confusion with the very last character—the writer makes both [ ... ] and ( … ) represent function application):

Function[a, a[p]][q]

OK, so this is the first reduction shown. To see more, let’s substitute in the form of Q:

q[p] /. q → Function[f, f[i][one][i][i][x]]

We get exactly the next reduction shown. OK, so what about putting in the form for P?

Excerpt 3

Here’s the result:

p[i][one][i][i][x] /. {p → Function[r, r[Function[s, s[one][i][i][y]]]]}

And now using the fact that i is the identity, we get:

i[Function[s, s[one][i][i][y]]][one][i][i][x] /. {i[x_] → x}

But oops. This isn’t the next line written. Is there a mistake? It’s not clear. Because, after all, unlike in most of the other cases, there isn’t an arrow indicating that the next line follows from the previous one.

OK, so there’s a mystery there. But let’s skip ahead to the bottom of the page:

Excerpt 4

The 2 here is a Church numeral, defined for example by the pattern two[a_][b_] → a[a[b]]. But notice that this is actually the form of the second line, with a being Function[r, r[p]] and b being q. So then we’d expect the reduction to be:

two[Function[r, r[p]]][q] //. {two[x_][y_] → x[x[y]]}

Somehow, though, the innermost a[b] is being written as x (probably different from the x earlier on the page), making the final result instead:

Function[r, r[p]][x]

OK, so we can decode quite a bit of what’s happening on the page. But at least one mystery that remains is what Y is supposed to be.

There’s actually a standard “Y combinator” in combinatory logic: the so-called fixed-point combinator. Formally, this is defined by saying that Y[f] must be equal to f[Y[f]], or, in other words, that Y[f] doesn’t change when f is applied, so that it’s a fixed point of f. (The Y combinator is related to #0 in the Wolfram Language.)

In modern times, the Y combinator has been made famous by the Y Combinator startup accelerator, named that way by Paul Graham (who had been a longtime enthusiast of functional programming and the LISP programming language—and had written an early web store using it) because (as he once told me) “nobody understands the Y combinator”. (Needless to say, Y Combinator is all about avoiding having companies go to fixed points…)

The Y combinator (in the sense of fixed-point combinator) was invented several times. Turing actually came up with a version of it in 1937, which he called Θ. But is the “Y” on our page the famous fixed-point combinator? Probably not. So what is our “Y”? We see this reduction:

Excerpt 5

But that’s not enough information to uniquely determine what Y is. It’s clear Y isn’t operating just on a single argument; it seems to be dealing with at least two arguments. But it’s not clear (at least to me) how many arguments it’s taking, and what it’s doing.

OK, so even though we can interpret many parts of the page, we have to say that globally it’s not clear what’s been done. But even though it’s needed a lot of explanation here, what’s on the page is actually fairly elementary in the world of lambda calculus and combinators. Presumably it’s an attempt to construct a simple “program”—using lambda calculus and combinators—to do something. But as is typical in reverse engineering, it’s hard for us to tell what the “something”—the overall “explainable” goal—is supposed to be.

There’s one more feature of the page that’s worth commenting on, and that’s its use of brackets. In traditional mathematics one basically (if confusingly) uses parentheses for everything—both function application (as in f(x)) and grouping of terms (as in (1+x)(1-x), or, more ambiguously, a(1-x)). (In Wolfram Language, we separate different uses, with square brackets for function application—as in f[x]—and parentheses only for grouping.)

And in the early days of lambda calculus, there were lots of issues about brackets. Later, Alan Turing would write a whole (unpublished) paper entitled “The Reform of Mathematical Notation and Phraseology”, but already in 1937 he felt he needed to describe the (rather hacky) current conventions for lambda calculus (which were due to Church, by the way).

He said that f applied to g should be written {f}(g), unless f is just a single symbol, in which case it can be f(g). Then he said that a lambda (as in Function[a, b]) should be written λ a[b], or alternatively λ a . b. By perhaps 1940, however, the whole idea of using { … } and [ ... ] to mean different things had been dropped, basically in favor of standard-mathematical-style parentheses.

Look at what’s near the top of the page:

Excerpt 6

As written, this is a bit hard to understand. In Church’s convention, the square brackets would be for grouping, with the opening bracket replacing the dot. And with this convention, it’s clear that the Q (finally labeled D) enclosed in parentheses at the end is what the whole initial lambda is applied to. But actually, the square bracket doesn’t delimit the body of the lambda; instead, it’s representing another function application, and there’s no explicit specification of where the body of the lambda ends. At the very end, one can see that the writer changed a closing square bracket to a parenthesis, thereby effectively enforcing Church’s convention—and making the expression evaluate as the page shows.

So what does this little notational tangle imply? I think it strongly suggests that the page was written in the 1930s, or not too long thereafter—before conventions for brackets became clearer.

Whose Handwriting Is It?

OK, so we’ve talked about what’s on the page. But what about who wrote it?

The most obvious candidate would be Alan Turing, since, after all, the page was inside a book he owned. And in terms of content there doesn’t seem to be anything inconsistent with Alan Turing having written it—perhaps even when he was first understanding lambda calculus after getting Church’s paper in early 1936.

But what about the handwriting? Is that consistent with Alan Turing’s? Here are a few surviving samples that we know were written by Alan Turing:

Samples of Alan Turing's handwriting

The running text definitely looks quite different. But what about the notation? At least to my eye, it didn’t look so obviously different—and one might think that any difference could just be a reflection of the fact that the extant samples are pieces of exposition, while our page shows “thinking in action”.

Conveniently, the Turing Archive contains a page where Turing wrote out a table of symbols to use for notation. And comparing this, the letter forms did look to me fairly similar (this was from Turing’s time of studying plant growth, hence the “leaf area” annotation):

Table of Symbols

But I wanted to check further. So I sent the samples to Sheila Lowe, a professional handwriting examiner (and handwriting-based mystery writer) I happen to know—just presenting our page as “sample A” and known Turing handwriting as “sample B”. Her response was definitive, and negative: “The writing style is entirely different. Personality-wise, the writer of sample B has a quicker, more intuitive thinking style than the one of sample A.” I wasn’t yet completely convinced, but decided it was time to start looking at other alternatives.

So if Turing didn’t write this, who did? Norman Routledge said he got the book from Robin Gandy, who was Turing’s executor. So I sent along a “Sample C”, from Gandy:

Sample C

But Sheila’s initial conclusion was that the three samples were likely written by three different people, noting again that sample B came from “the quickest thinker and the one that is likely most willing to seek unusual solutions to problems”. (I find it a little charming that a modern handwriting expert would give this assessment of Turing’s handwriting, given how vociferously Turing’s school reports from the 1920s complained about his handwriting.)

Well, at this point it seemed as if both Turing and Gandy had been eliminated as writers of the page. So who might have written it? I started thinking about people Turing might have lent the book to. Of course, they’d have to be capable of doing calculations in lambda calculus.

I assumed that the person would have to be in Cambridge, or at least in England, given the watermark on the paper. And I took as a working hypothesis that 1936 or thereabouts was the relevant time. So who did Turing know then? We got a list of all math students and faculty at King’s College at the time. (There were 13 known students who started in 1930 through 1936.)

And from these, the most promising candidate seemed to be David Champernowne. He was the same age as Turing, a longtime friend, and also interested in the foundations of mathematics—in 1933 already publishing a paper on what’s now called Champernowne’s constant: 0.12345678910111213… (obtained by concatenating the digits of 1, 2, 3, 4, …, 8, 9, 10, 11, 12, …, and one of the very few numbers known to be “normal” in the sense that every possible block of digits occurs with equal frequency). In 1937, he even used Dirac gamma matrices, as mentioned in Dirac’s book, to solve a recreational math problem. (As it happens, years later, I became quite an aficionado of gamma matrix computations.)

After starting in mathematics, though, Champernowne came under the influence of John Maynard Keynes (also at King’s), and eventually became a distinguished economist, notably doing extensive work on income inequality. (Still, in 1948 he also worked with Turing to design Turochamp: a chess-playing program that almost became the first ever to be implemented on a computer.)

But where could I find a sample of Champernowne’s handwriting? Soon I’d located his son Arthur Champernowne on LinkedIn, who, curiously, had a degree in mathematical logic, and had been working for Microsoft. He said his father had talked to him quite a lot about Turing’s work, though hadn’t mentioned combinators. He sent me a sample of his father’s handwriting (a piece about algorithmic music composition):

Champernowne's handwriting

One could immediately tell it wasn’t a match (Champernowne’s f’s have loops, etc.).

So who else might it be? I wondered about Max Newman, in many ways Alan Turing’s mentor. Newman had first got Turing interested in “mechanizing mathematics”, was a longtime friend, and years later would be his boss at Manchester in the computer project there. (Despite his interest in computation, Newman always seems to have seen himself first and foremost as a topologist, though his cause wasn’t helped by a flawed proof he produced of the Poincaré conjecture.)

It wasn’t difficult to find a sample of Newman’s handwriting. And no, definitely not a match.

Tracing the Book

OK, so handwriting identification hadn’t worked. And I decided the next thing to do was to try to trace in a bit more detail what had actually happened to the book I had in my hands.

So, first, what was the more detailed story with Norman Routledge? He had gone to King’s College, Cambridge as an undergraduate in 1946, and had gotten to know Turing then (yes, they were both gay). He graduated in 1949, then started doing a PhD with Turing as his advisor. He got his PhD in 1954, working on mathematical logic and recursion theory. He got a fellowship at King’s College, and by 1957 was Director of Studies in Mathematics there. He could have stayed doing this his whole life, but he had broad interests (music, art, architecture, recreational math, genealogy, etc.) and in 1960 changed course, and became a teacher at Eton—where he entertained (and educated) many generations of students (including me) with his eclectic and sometimes outlandish knowledge.

Could Norman Routledge have written the mysterious page? He knew lambda calculus (though, coincidentally, he mentioned at our tea in 2005 that he always found it “confusing”). But his distinctive handwriting style immediately excludes him as a possible writer.

Could the page be somehow associated with a student of Norman’s, perhaps from when he was still in Cambridge? I don’t think so. Because I don’t think Norman ever taught about lambda calculus or anything like it. In writing this piece, I found that Norman wrote a paper in 1955 about doing logic on “electronic computers” (and creating conjunctive normal forms, as BooleanMinimize now does). And when I knew Norman he was quite keen on writing utilities for actual computers (his initials were “NAR”, and he named his programs “NAR…”, with, for example, “NARLAB” being a program for creating textual labels using hole patterns punched in paper tape). But he never talked about theoretical models of computation.

OK, but let’s read Norman’s note inside the book a bit more carefully. The first thing we notice is that he talks about being “offered books from a dead man’s library”. And from the wording, it sounds as if this happened quite quickly after a person died, suggesting that Norman got the book soon after Turing’s death in 1954, and that Gandy didn’t have it for very long. Norman goes on to say that actually he got four books in total, two on pure math, and two on theoretical physics.

Then he says that he gave “the other [physics] one (by Herman Weyl, I think)” to “Sebag Montefiore, a pleasant, clever boy whom you [George Rutter] may remember”. OK, so who is that? I searched for my rarely used Old Etonian Association List of Members. (I have to report that on opening it, I could not help but notice its rules from 1902, the first under “Rights of Members” charmingly being “To wear the Colours of the Association”. I should add that I would probably never have joined this association or got this book but for the insistence of a friend of mine at Eton named Nicholas Kermack, who from the age of 12 planned how he would one day become Prime Minister, but sadly died at the age of 21.)

But in any case, there were five Sebag-Montefiores listed, with quite a distribution of dates. It wasn’t hard to figure out that the appropriate one was probably Hugh Sebag-Montefiore. Small world that it is, it turned out that his family had owned Bletchley Park before selling it to the British Government in 1938. And in 2000, Sebag-Montefiore had written a book about the breaking of Enigma—which is presumably why in 2002 Norman thought to give him a book that had been owned by Turing.

OK, so what about the other books Norman got from Turing? Not having any other way to work out what happened to them, I ordered a copy of Norman’s will. The last clause in the will was classic Norman:

Excerpt from Norman's will

But what the will ultimately said was that Norman’s books should be left to King’s College. And although the complete collection of his books doesn’t seem to be anywhere to be found, the two Turing-owned pure math books that he mentioned in his note are now duly in the King’s College archive collection.

But, OK, so the next question is: what happened to Turing’s other books? I looked up Turing’s will, which seemed to leave them all to Robin Gandy.

Gandy was a math undergraduate at King’s College, Cambridge, who in his last year of college—in 1940—had become friends with Alan Turing. In the early part of the war, Gandy worked on radio and radar, but in 1944 he was assigned to the same unit as Turing, working on speech encipherment. And after the war, Gandy went back to Cambridge, soon starting a PhD, with Turing as his advisor.

Gandy’s war work apparently got him interested in physics, and his thesis, completed in 1952, was entitled “On Axiomatic Systems in Mathematics and Theories in Physics”. What Gandy seems to have been trying to do is to characterize what physical theories are in mathematical logic terms. He talks about type theory and rules of inference, but never about Turing machines. And from what we know now, I think he rather missed the point. And indeed my own work from the early 1980s argued that physical processes should be thought of as computations—like Turing machines or cellular automata—not as things like theorems to be deduced. (Gandy has a rather charming discussion of the order of types involved in physical theories, saying for example that “I reckon that the order of any computable binary decimal is less than eight”. He says that “one of the reasons why modern quantum field theory is so difficult is that it deals with objects of rather high type—functionals of functions…”, eventually suggesting that “we might well take the greatest type in common use as an index of mathematical progress”.)

Gandy mentions Turing a few times in the thesis, noting in the introduction that he owes a debt to A. M. Turing, who “first called my somewhat unwilling attention to the system of Church” (i.e. lambda calculus)—though in fact the thesis has very few lambdas in evidence.

After his thesis, Gandy turned to purer mathematical logic, and for more than three decades wrote papers at the rate of about one per year, and traveled the international mathematical logic circuit. In 1969 he moved to Oxford, and I have to believe that I must have met him in my youth, though I don’t have any recollection of it.

Gandy apparently quite idolized Turing, and in later years would often talk about him. But then there was the matter of the Turing collected works. Shortly after Turing died, Sara Turing and Max Newman had asked Gandy—as Turing’s executor—to organize the publication of Turing’s unpublished papers. Years went by. Letters in the archives record Sara Turing’s frustration. But somehow Gandy never seemed to get the papers together.

Gandy died in 1995, still without the collected works complete. Nick Furbank—a literary critic and biographer of E. M. Forster whom Turing had gotten to know at King’s College—was Turing’s literary executor, and finally he swung into action on the collected works. The most contentious volume seemed to be the one on mathematical logic, and for this he enlisted Robin Gandy’s first serious PhD student, a certain Mike Yates—who found letters to Gandy about the collected works that had been unopened for 24 years. (The collected works finally appeared in 2001—45 years after they were started.)

But what about the books Turing owned? In continuing to try to track them down, my next stop was the Turing family, and specifically Turing’s brother’s youngest child, Dermot Turing (who is actually Sir Dermot Turing, as a result of a baronetcy which passed down the non-Alan branch of the Turing family). Dermot Turing (who recently wrote a biography of Alan Turing) told me about “granny Turing” (aka Sara Turing), whose house apparently shared a garden gate with his family’s, and many other things about Alan Turing. But he said the family never had any of Alan Turing’s books.

So I went back to reading wills, and found out that Gandy’s executor was his student Mike Yates. We found out that Mike Yates had retired from being a professor 30 years ago, but was now living in North Wales. He said that in the decades he was working in mathematical logic and theory of computation, he’d never really touched a computer—but finally did when he retired (and, as it happens, discovered Mathematica soon thereafter). He said how remarkable it was that Turing had become so famous—and that when he’d arrived at Manchester just three years after Turing died, nobody talked about Turing, not even Max Newman when he gave a course about logic. Though later on, Gandy would talk about how swamped he was in dealing with Turing’s collected works—eventually leaving the task to Mike.

What did Mike know about Turing’s books? Well, he’d found one handwritten notebook of Turing’s, that Gandy had not given to King’s College, because (bizarrely) Gandy had used it as camouflage for notes he kept about his dreams. (Turing kept dream notebooks too, that were destroyed when he died.) Mike said that notebook had recently been sold at auction for about $1M. And that otherwise he didn’t think there was any Turing material among Gandy’s things.

It seemed like all our leads had dried up. But Mike asked to see the mysterious piece of paper. And immediately he said, “That’s Robin Gandy’s handwriting!” He said he’d seen so much of it over the years. And he was sure. He said he didn’t know much about lambda calculus, and couldn’t really read the page. But he was sure it had been written by Robin Gandy.

We went back to our handwriting examiner with more samples, and she agreed that, yes, what was there was consistent with Gandy’s writing. So finally we had it: Robin Gandy had written our mysterious piece of paper. It wasn’t written by Alan Turing; it was written by his student Robin Gandy.

Of course, some mysteries remain. Presumably Turing lent Gandy the book. But when? The lambda calculus notation seems like it’s from the 1930s. But based on comments in Gandy’s thesis, Gandy probably wouldn’t have been doing anything with lambda calculus until the late 1940s. Then there’s the question of why Gandy wrote it. It doesn’t seem directly related to his thesis, so maybe it was when he was first trying to understand lambda calculus.

I doubt we’ll ever know. But it’s certainly been interesting trying to track it down. And I have to say that the whole process has done much to heighten my awareness of just how complex the stories may be of all those books from past centuries that I own. And it makes me think I’d better make sure I’ve gone through all their pages, just to find out what curious things might be in there…


Thanks for additional help to Jonathan Gorard (local research in Cambridge), Dana Scott (mathematical logic) and Matthew Szudzik (mathematical logic).


The Ease of Wolfram|Alpha, the Power of Mathematica: Introducing Wolfram|Alpha Notebook Edition


Wolfram|Alpha Notebook Edition

The Next Big Step for Wolfram|Alpha

Wolfram|Alpha has been a huge hit with students. Whether in college or high school, Wolfram|Alpha has become a ubiquitous way for students to get answers. But it’s a one-shot process: a student enters the question they want to ask (say in math) and Wolfram|Alpha gives them the (usually richly contextualized) answer. It’s incredibly useful—especially when coupled with its step-by-step solution capabilities.

But what if one doesn’t want just a one-shot answer? What if one wants to build up (or work through) a whole computation? Well, that’s what we created Mathematica and its whole notebook interface to do. And for more than 30 years that’s how countless inventions and discoveries have been made around the world. It’s also how generations of higher-level students have been taught.

But what about students who aren’t ready to use Mathematica yet? What if we could take the power of Mathematica (and what’s now the Wolfram Language), but combine it with the ease of Wolfram|Alpha?

Well, that’s what we’ve done in Wolfram|Alpha Notebook Edition.

It’s built on a huge tower of technology, but what it does is to let any student—without learning any syntax or reading any documentation—immediately build up or work through computations. Just type input the way you would in Wolfram|Alpha. But now you’re not just getting a one-shot answer. Instead, everything is in a Wolfram Notebook, where you can save and use previous results, and build up or work through a whole computation:

Wolfram Notebook

The Power of Notebooks

Being able to use Wolfram|Alpha-style free-form input is what opens Wolfram|Alpha Notebook Edition up to the full range of students. But it’s the use of the notebook environment that makes it so uniquely valuable for education. Because by being able to work through things in a sequence of steps, students get to really engage with the computations they’re doing.

Try one step. See what happens. Change it if you want. Understand the output. See how it fits into the next step. And then—right there in the notebook—see how all your steps fit together to give your final results. And then save your work in the notebook, to continue—or review what you did—another time.

But notebooks aren’t just for storing computations. They can also contain text and structure. So students can use them not just to do their computations, but also to keep notes, and to explain the computations they’re doing, or the results they get:

Student notebook

And in fact, Wolfram Notebooks enable a whole new kind of student work: computational essays. A computational essay has both text and computation—combined to build up a narrative to which both human and computer contribute.

The process of creating a computational essay is a great way for students to engage with material they’re studying. Computational essays can also provide a great showcase of student achievement, as well as a means of assessing student understanding. And they’re not just something to produce for an assignment: they’re active computable documents that students can keep and use at any time in the future.

Study notebook

But students aren’t the only ones to produce notebooks. In Wolfram|Alpha Notebook Edition, notebooks are also a great medium for teachers to provide material to students. Describe a concept in a notebook, then let students explore by doing their own computations right there in the notebook. Or make a notebook defining an assignment or a test—then let the students fill in their work (and grade it right there in the notebook).

Assignment

It’s very common to use Wolfram|Alpha Notebook Edition to create visualizations of concepts. Often students will just ask for the visualizations themselves. But teachers can also set up templates for visualizations, and let students fill in their own functions or data to explore for themselves.

Visualizations

Wolfram|Alpha Notebook Edition also supports dynamic interactive visualizations—for example using the Wolfram Language Manipulate function. And in Wolfram|Alpha Notebook Edition students (and teachers!) can build all sorts of dynamic visualizations just using natural language:

Dynamic visualizations

But what if you want some more sophisticated interactive demonstration, that might be hard to specify? Well, Wolfram|Alpha Notebook Edition has direct access to the Wolfram Demonstrations Project, which contains over 12,000 Demonstrations. You can ask for Demonstrations using natural language, or you can just browse the Demonstrations Project website, select a Demonstration, copy it into your Wolfram|Alpha Notebook Edition notebook, and then immediately use it there:

Demonstrations

With Wolfram|Alpha Notebook Edition it’s very easy to create compelling content. The content can involve pure calculations or visualizations. But—using the capabilities of the Wolfram Knowledgebase—it can also involve a vast range of real-world data, whether about countries, chemicals, words or artworks. And you can access it using natural language, and work with it directly in a notebook:

Using natural language

Wolfram|Alpha Notebook Edition is a great tool for students to use on their own computers. But it’s also a great tool for lectures and class demonstrations (as well as for student presentations). Go to File > New > Presenter Notebook, and you’ll get a notebook that’s set up to create a Wolfram|Alpha Notebook Edition slide show:

Presenter notebook

Click Start Presentation and you can start presenting. But what you’ll have is not just a “PowerPoint-style” slide show. It’s a fully interactive, editable, computable slide show. The Manipulate interfaces work. Everything is immediately editable. And you can do computations right there during the presentation, exploring different cases, pulling in different data, and so on.

Slide show

Making Code from Natural Language

We invented notebooks more than 30 years ago, and they’ve been widely used in Mathematica ever since. But while in Mathematica (and Wolfram Desktop) notebooks you (by default) specify computations in the precise syntax and semantics of the Wolfram Language, in Wolfram|Alpha Notebook Edition notebooks you instead specify them just using free-form Wolfram|Alpha-style input.

And indeed one of the key technical achievements that’s made Wolfram|Alpha Notebook Edition possible is that we’ve now developed increasingly robust natural-language-to-code technology that’s able to go from the free-form natural language input you type to precise Wolfram Language code that can be used to build up computations:

Natural language to code

By default, Wolfram|Alpha Notebook Edition is set up to show you the Wolfram Language code it generates. You don’t need to look at this code (and you can set it to always be hidden). But—satisfyingly for me as a language designer—students seem to find it very easy to read, often actually easier than math. And reading it gives them an extra opportunity to understand what’s going on—and to make sure the computation they’ve specified is actually the one they want.

And there’s a great side effect to the fact that Wolfram|Alpha Notebook Edition generates code: through routinely being exposed to code that represents natural language they’ve entered, students gradually absorb the idea of expressing things in computational language, and the concepts of computational thinking.

If a student wants to change a computation when they’re using Wolfram|Alpha Notebook Edition, they can always edit the free-form input they gave. But they can also directly edit the Wolfram Language that’s been generated, giving them real computational language experience.

Free-form input

What Should I Do Next? The Predictive Interface

A central goal of Wolfram|Alpha Notebook Edition is to be completely “self-service”—so that students at all levels can successfully use it without any outside instruction or assistance. Of course, free-form input is a key part of achieving this. But another part is the Wolfram|Alpha Notebook Edition Predictive Interface—that suggests what to do next based on what students have done.

Enter a computation and you’ll typically see some buttons pop up under the input field:

Buttons

These buttons will suggest directions to take. Here step-by-step solution generates an enhanced interactive version of Wolfram|Alpha Pro step-by-step functionality—all right in the notebook:

Step-by-step functionality

Click related computations and you’ll see suggestions for different computations you might want to do:

Related computations

It suggests plotting the integrand and the integral:

Plotting the integrand and the integral

It also suggests you might like to see a series expansion:

Series expansion

Now notice that underneath the output there’s a bar of suggestions about possible follow-on computations to do on this output. Click, for example, coefficient list to find the list of coefficients:

Coefficient list

Now there are new suggestions. Click, for example, total to find the total of the coefficients:

Find the total of the coefficients

The Math Experience

Wolfram|Alpha Notebook Edition has got lots of features to enhance the “math experience”. For example, click the button at the top of the notebook and you’ll get a “math keyboard” that you can use to directly enter math notation:

Math keyboard

The Wolfram Language that underlies Wolfram|Alpha Notebook Edition routinely handles the math that’s needed by the world’s top mathematicians. But having all that sophisticated math can sometimes lead to confusions for students. So in Wolfram|Alpha Notebook Edition there are ways to say “keep the math simple”. For example, you can set it to minimize the use of complex numbers:

Simplified

Simplified

Wolfram|Alpha Notebook Edition also by default does things like adding constants of integration to indefinite integrals:

Constants of integration

By the way, Wolfram|Alpha Notebook Edition by default automatically formats mathematical output in elegant “traditional textbook” form. But it always includes a little button next to each output, so you can toggle between “traditional form” and standard Wolfram Language form.

It’s quite common in doing math to have a function, and just say “I want to plot that!” But what range should you use? In Mathematica (or the Wolfram Language), you’d have to specify it. But in Wolfram|Alpha Notebook Edition there’s always an automatic range that’s picked:

Automatic range

But since you can see the Wolfram Language code—including the range—it’s easy to change that, and specify whatever range you want.

Specify range

What if you want to get an interactive control to change the range, or to change a parameter in the function? In Mathematica (or the Wolfram Language) you’d have to write a Manipulate. But in Wolfram|Alpha Notebook Edition, you can build a whole interactive interface just using natural language:

Interactive interface

And because in Wolfram|Alpha Notebook Edition the Manipulate computations are all running directly on your local computer, nothing is being slowed down by network transmission—and so everything moves at full speed. (Also, if you have a long computation, you can just let it keep running on your computer; there’s no timeout like in Wolfram|Alpha on the web.)

Multistep Computation

One of the important features of Wolfram|Alpha Notebook Edition is that it doesn’t just do one-shot computations; it allows you to do multistep computations that in effect involve a back-and-forth conversation with the computer, in which you routinely refer to previous results:

Multistep computation

Often it’s enough to just talk about the most recent result, and say things like “plot it as a function of x”. But it’s also quite common to want to refer back to results earlier in the notebook. One way to do this is to say things like “the result before last”—or to use the Out[n] labels for each result. But another thing that Wolfram|Alpha Notebook Edition allows you to do is to set values of variables, that you can then use throughout your session:

Set values

It’s also possible to define functions, all with natural language:

Define functions

There are lots of complicated design and implementation issues that arise in dealing with multistep computations. For example, if you have a traditional result for an indefinite integral, with a constant of integration, what do you do with the constant when you want to plot the result? (Wolfram|Alpha Notebook Edition consistently handles arbitrary additive constants in plots by effectively setting them to zero.)

Integrate x

It can also be complicated to know what refers to what in the “conversation”. If you say “plot”, are you trying to plot your latest result, or are you asking for an interface to create a completely new plot? If you use a pronoun, as in “plot it”, then it’s potentially more obvious what you mean, and Wolfram|Alpha Notebook Edition has a better chance of being able to use its natural language understanding capabilities to figure it out.

The World with Wolfram|Alpha Notebook Edition

It’s been very satisfying to see how extensively Wolfram|Alpha has been adopted by students. But mostly that adoption has been outside the classroom. Now, with Wolfram|Alpha Notebook Edition, we’ve got a tool that can immediately be put to use in the classroom, across the whole college and precollege spectrum. And I’m excited to see how it can streamline coursework, deepen understanding, enable new concepts to be taught, and effectively provide a course-based personal AI tutor for every student.

Starting today, Wolfram|Alpha Notebook Edition is available on all standard computer platforms (Mac, Windows, Linux). (A cloud version will also be available on the web soon.) Colleges and universities with full Wolfram Technology System site licenses can automatically start using Wolfram|Alpha Notebook Edition today; at schools with other site licenses, it can immediately be added. It’s available to K–12 schools and junior colleges in classroom packs, or as a site license. And, of course, it’s also available to individual teachers, students, hobbyists and others.

(Oh, and if you have Mathematica or Wolfram Desktop, it’ll also be possible in future versions to create “Wolfram|Alpha mode” notebooks that effectively integrate Wolfram|Alpha Notebook Edition capabilities. And in general there’s perfect compatibility among Wolfram|Alpha Notebook Edition, Mathematica, Wolfram Desktop, Wolfram Cloud, Wolfram Programming Lab, etc.—providing a seamless experience for people progressing across education and through professional careers.)

Like Wolfram|Alpha—and the Wolfram Language—Wolfram|Alpha Notebook Edition will continue to grow in capabilities far into the future. But what’s there today is already a remarkable achievement that I think will be transformative in many educational settings.

More than 31 years ago we introduced Mathematica (and what’s now the Wolfram Language). A decade ago we introduced Wolfram|Alpha. Now, today, with the release of Wolfram|Alpha Notebook Edition we’re giving a first taste—in the context of education—of a whole new approach to computing: a full computing environment that’s driven by natural language. It doesn’t supplant Wolfram Language, or Wolfram|Alpha—but it defines a new direction that in time will bring the power of computation to a whole massive new audience.

Announcing the Rule 30 Prizes



Join Stephen Wolfram on October 7 for an AMA and explorations livestream discussing the Rule 30 Prizes »

The Story of Rule 30

How can something that simple produce something that complex? It’s been nearly 40 years since I first saw rule 30—but it still amazes me. Long ago it became my personal all-time favorite science discovery, and over the years it’s changed my whole worldview and led me to all sorts of science, technology, philosophy and more.

But even after all these years, there are still many basic things we don’t know about rule 30. And I’ve decided that it’s now time to do what I can to stimulate the process of finding more of them out. So as of today, I am offering $30,000 in prizes for the answers to three basic questions about rule 30.

The setup for rule 30 is extremely simple. One’s dealing with a sequence of lines of black and white cells. And given a particular line of black and white cells, the colors of the cells on the line below are determined by looking at each cell and its immediate neighbors and then applying the following simple rule:

RulePlot[CellularAutomaton[30]]

If you start with a single black cell, what will happen? One might assume—as I at first did—that the rule is simple enough that the pattern it produces must somehow be correspondingly simple. But if you actually do the experiment, here’s what you find happens over the first 50 steps:

RulePlot[CellularAutomaton[30], {{1}, 0}, 50, Mesh -> All, 
 ImageSize -> Full]

But surely, one might think, this must eventually resolve into something much simpler. Yet here’s what happens over the first 300 steps:

The first 300 steps of rule 30

And, yes, there’s some regularity over on the left. But many aspects of this pattern look for all practical purposes random. It’s amazing that a rule so simple can produce behavior that’s so complex. But I’ve discovered that in the computational universe of possible programs this kind of thing is common, even ubiquitous. And I’ve built a whole new kind of science—with all sorts of principles—based on this.

And gradually there’s been more and more evidence for these principles. But what specifically can rule 30 tell us? What concretely can we say about how it behaves? Even the most obvious questions turn out to be difficult. And after decades without answers, I’ve decided it’s time to define some specific questions about rule 30, and offer substantial prizes for their solutions.

I did something similar in 2007, putting a prize on a core question about a particular Turing machine. And at least in that case the outcome was excellent. In just a few months, the prize was won—establishing forever what the simplest possible universal Turing machine is, as well as providing strong further evidence for my general Principle of Computational Equivalence.

The Rule 30 Prize Problems again get at a core issue: just how complex really is the behavior of rule 30? Each of the problems asks this in a different, concrete way. Like rule 30 itself, they’re all deceptively simple to state. Yet to solve any of them will be a major achievement—that will help illuminate fundamental principles about the computational universe that go far beyond the specifics of rule 30.

I’ve wondered about every one of the problems for more than 35 years. And all that time I’ve been waiting for the right idea, or the right kind of mathematical or computational thinking, to finally be able to crack even one of them. But now I want to open this process up to the world. And I’m keen to see just what can be achieved, and what methods it will take.

The Rule 30 Prize Problems

For the Rule 30 Prize Problems, I’m concentrating on a particularly dramatic feature of rule 30: the apparent randomness of its center column of cells. Start from a single black cell, then just look down the sequence of values of this cell—and it seems random:

ArrayPlot[
 MapIndexed[If[#2[[2]] != 21, # /. {0 -> 0.2, 1 -> .6}, #] &, 
  CellularAutomaton[30, {{1}, 0}, 20], {2}], Mesh -> All]

But in what sense is it really random? And can one prove it? Each of the Prize Problems in effect uses a different criterion for randomness, then asks whether the sequence is random according to that criterion.

Problem 1: Does the center column always remain non-periodic?

Here’s the beginning of the center column of rule 30:

ArrayPlot[List@CellularAutomaton[30, {{1}, 0}, {80, {{0}}}], 
 Mesh -> True, ImageSize -> Full]

It’s easy to see that this doesn’t repeat—it doesn’t become periodic. But this problem is about whether the center column ever becomes periodic, even after an arbitrarily large number of steps. Just by running rule 30, we know the sequence doesn’t become periodic in the first billion steps. But what about ever? To establish that, we need a proof. (Here are the first million and first billion bits in the sequence, by the way, as entries in the Wolfram Data Repository.)

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s what one gets if one tallies the number of black and of white cells in successively more steps in the center column of rule 30:

The number of black and of white cells in the center column of rule 30
Dataset[{{1, 1, 0, ""}, {10, 7, 3, 2.3333333333333335}, {100, 52, 48, 1.0833333333333333}, 
 {1000, 481, 519, 0.9267822736030829}, {10000, 5032, 4968, 1.0128824476650564}, 
 {100000, 50098, 49902, 1.0039276982886458}, {1000000, 500768, 499232, 
  1.003076725850907}, {10000000, 5002220, 4997780, 1.0008883944471345}, 
 {100000000, 50009976, 49990024, 1.000399119632349}, 
 {1000000000, 500025038, 499974962, 1.0001001570154626}}]

The results are certainly close to equal for black vs. white. But what this problem asks is whether the limit of the ratio after an arbitrarily large number of steps is exactly 1.
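And here’s a corresponding empirical sketch (again just evidence, with the name col chosen for the example): tally black (1) and white (0) cells in an initial stretch of the center column:

col = Flatten[CellularAutomaton[30, {{1}, 0}, {10^6, {{0}}}]];
Counts[col]
(* gives close-to-equal counts of 1s and 0s, in line with the table above *)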

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

To find the nth cell in the center column, one can always just run rule 30 for n steps, computing the values of all the cells in this diamond:

With[{n = 100}, 
 ArrayPlot[
  MapIndexed[If[Total[Abs[#2 - n/2 - 1]] <= n/2, #, #/4] &, 
   CellularAutomaton[30, CenterArray[{1}, n + 1], n], {2}]]]

But if one does this directly, one’s doing n² individual cell updates, so the computational effort required goes up like O(n²). This problem asks if there’s a shortcut way to compute the value of the nth cell, without all this intermediate computation—or, in particular, in less than O(n) computational effort.
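As a rough sketch of that scaling (timings are machine dependent, and the built-in CellularAutomaton is heavily optimized, so this is only indicative):

Table[{n, First[AbsoluteTiming[CellularAutomaton[30, {{1}, 0}, {n, {{0}}}];]]},
 {n, {2000, 4000, 8000, 16000}}]
(* times should grow roughly like n^2, since the whole diamond of cells is being computed *)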

The Digits of Pi

Rule 30 is a creature of the computational universe: a system found by exploring possible simple programs with the new intellectual framework that the paradigm of computation provides. But the problems I’ve defined about rule 30 have analogs in mathematics that are centuries old.

Consider the digits of π. They’re a little like the center column of rule 30. There’s a definite algorithm for generating them. Yet once generated they seem for all practical purposes random:

N[Pi, 85]

Just to make the analog a little closer, here are the first few digits of π in base 2:

BaseForm[N[Pi, 25], 2]

And here are the first few bits in the center column of rule 30:

Row[CellularAutomaton[30, {{1}, 0}, {90, {{0}}}]]

Just for fun, one can convert these to base 10:

N[FromDigits[{Flatten[CellularAutomaton[30, {{1}, 0}, {500, {0}}]], 0}, 2], 85]

Of course, the known algorithms for generating the digits of π are considerably more complicated than the simple rule for generating the center column of rule 30. But, OK, so what’s known about the digits of π?

Well, we know they don’t repeat. That was proved in the 1760s when it was shown that π is an irrational number—because the only numbers whose digits repeat are rational numbers. (It was also shown in 1882 that π is transcendental, i.e. that it cannot be expressed in terms of roots of polynomials.)

How about the analog of problem 2? Do we know if in the digit sequence of π different digits occur with equal frequency? By now more than 100 trillion binary digits have been computed—and the measured frequencies of digits are very close (in the first 40 trillion binary digits the ratio of 1s to 0s is about 0.9999998064). But in the limit, are the frequencies exactly the same? People have been wondering about this for several centuries. But so far mathematics hasn’t succeeded in delivering any results.
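Here’s a small-scale version of that kind of tally (just an illustration, far short of the trillions of digits that have actually been computed):

bits = First[RealDigits[Pi, 2, 10^6]];  (* first million binary digits of pi *)
Counts[bits]
(* gives close-to-equal counts of 1s and 0s *)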

For rational numbers, digit sequences are periodic, and it’s easy to work out relative frequencies of digits. But for the digit sequences of all other “naturally constructed” numbers, basically there’s nothing known about limiting frequencies of digits. It’s a reasonable guess that actually the digits of π (as well as the center column of rule 30) are “normal”, in the sense that not only every individual digit, but also every block of digits of any given length in the limit occur with equal frequency. And as was noted in the 1930s, it’s perfectly possible to “digit-construct” normal numbers. Champernowne’s number, formed by concatenating the digits of successive integers, is an example (and, yes, this works in any base, and one can also get normal numbers by concatenating values of functions of successive integers):

N[ChampernowneNumber[10], 85]

But the point is that for “naturally constructed” numbers formed by combinations of standard mathematical functions, there’s simply no example known where any regularity of digits has been found. Of course, it ultimately depends what one means by “regularity”—and at some level the problem devolves into a kind of number-digit analog of the search for extraterrestrial intelligence. But there’s absolutely no proof that one couldn’t, for example, find even some strange combination of square roots that would have a digit sequence with some very obvious regularity.

OK, so what about the analog of problem 3 for the digits of π? Unlike rule 30, where the obvious way to compute elements in the sequence is one step at a time, traditional ways of computing digits of π involve getting better approximations to π as a complete number. With the standard (bizarre-looking) series invented by Ramanujan in 1910 and improved by the Chudnovsky brothers in 1989, the first few terms in the series give the following approximations:

Style[Table[
   N[(12*Sum[((-1)^k*((6*k)!)*(13591409 + 545140134*k))/
        (((3*k)!)*((k!)^3)*640320^(3*k + 3/2)), {k, 0, n}])^-1, 100],
   {n, 10}] // Column, 9]

So how much computational effort is it to find the nth digit? The number of terms required in the series is O(n). But each term needs to be computed to n-digit precision, which requires at least O(n) individual digit operations—implying that altogether the computational effort required is more than O(n).

Until the 1990s it was assumed that there wasn’t any way to compute the nth digit of π without computing all previous ones. But in 1995 Simon Plouffe discovered that actually it’s possible to compute—albeit slightly probabilistically—the nth digit without computing earlier ones. And while one might have thought that this would allow the nth digit to be obtained with less than O(n) computational effort, the fact that one has to do computations at n-digit precision means that at least O(n) computational effort is still required.
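The kind of formula behind Plouffe's discovery (developed with Bailey and Borwein) is a sum whose kth term involves only 1/16^k—which is what makes it possible to pick out base-16 (and hence base-2) digits deep in the expansion without computing all the earlier ones. Here's a minimal illustration of the series itself (the name bbp is mine; this doesn't show the digit-extraction machinery):

(* the Bailey–Borwein–Plouffe series for π; truncating after a few dozen terms already agrees with π to dozens of digits *)
bbp[terms_] := Sum[1/16^k (4/(8 k + 1) - 2/(8 k + 4) - 1/(8 k + 5) - 1/(8 k + 6)), {k, 0, terms}]

N[bbp[40] - Pi, 10]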

Results, Analogies and Intuitions

Problem 1: Does the center column always remain non-periodic?

Of the three Rule 30 Prize Problems, this is the one on which the most progress has already been made. Because while it’s not known if the center column in the rule 30 pattern ever becomes periodic, Erica Jen showed in 1986 that no two columns can both become periodic. And in fact, one can also give arguments that a single column plus scattered cells in another column can’t both be periodic.

The proof about a pair of columns uses a special feature of rule 30. Consider the structure of the rule:

RulePlot[CellularAutomaton[30]]

Normally one would just say that given each triple of cells, the rule determines the color of the center cell below. But for rule 30, one can effectively also run the rule sideways: given the cell to the right and above, one can also uniquely determine the color of the cell to the left. And what this means is that if one is given two adjacent columns, it’s possible to reconstruct the whole pattern to the left:

GraphicsRow[
 ArrayPlot[#, PlotRange -> 1, Mesh -> All, PlotRange -> 1, 
    Background -> LightGray, 
    ImageSize -> {Automatic, 80}] & /@ (PadLeft[#, {Length[#], 10}, 
      10] & /@ 
    Module[{data = {{0, 1}, {1, 1}, {0, 0}, {0, 1}, {1, 1}, {1, 
         0}, {0, 1}, {1, 10}}}, 
     Flatten[{{data}, 
       Table[Join[
         Table[Module[{p, q = data[[n, 1]], r = data[[n, 2]], 
            s = data[[n + 1, 1]] },
           p = Mod[-q - r - q r + s, 2];
           PrependTo[data[[n]], p]], {n, 1, Length[data] - i}], 
         PrependTo[data[[-#]], 10] & /@ Reverse[Range[i]]], {i, 7}]}, 
      1]])]

But if the columns were periodic, it immediately follows that the reconstructed pattern would also have to be periodic. Yet by construction at least the initial condition is definitely not periodic, and hence the columns cannot both be periodic. The same argument works if the columns are not adjacent, and if one doesn’t know every cell in both columns. But there’s no known way to extend the argument to a single column—such as the center column—and thus it doesn’t resolve the first Rule 30 Prize Problem.
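Here's a minimal sketch of that sideways reconstruction (the helper names leftCell and columnToLeft are mine). Since rule 30 gives s = Xor[p, Or[q, r]] for the cell s below q, one can solve for the cell p to the left of q in terms of q, its right neighbor r, and s:

(* the "sideways" rule: the cell to the left of q, given q, its right neighbor r, and the cell s directly below q *)
leftCell[q_, r_, s_] := Mod[q + r + q r + s, 2]

(* given two complete adjacent columns (top to bottom), reconstruct one more column to the left *)
columnToLeft[c0_, c1_] := Table[leftCell[c0[[i]], c1[[i]], c0[[i + 1]]], {i, Length[c0] - 1}]

(* check against an actual rule 30 evolution: rebuild the column at offset -1 from the columns at offsets 0 and +1 *)
cols = Transpose[CellularAutomaton[30, {{1}, 0}, {20, {-3, 3}}]];
columnToLeft[cols[[4]], cols[[5]]] === Most[cols[[3]]]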

OK, so what would be involved in resolving it? Well, if it turns out that the center column is eventually periodic, one could just compute it, and show that. We know it’s not periodic for the first billion steps, but one could at least imagine that there could be a trillion-step transient, after which it’s periodic.

Is that plausible? Well, transients do happen—and theoretically (just like in the classic Turing machine halting problem) they can even be arbitrarily long. Here’s a somewhat funky example—found by a search—of a rule with 4 possible colors (totalistic code 150898). Run it for 200 steps, and the center column looks quite random:

Rule 150898
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {200, 150 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, 
 PixelConstrained -> 2, Frame -> False]

After 500 steps, the whole pattern still looks quite random:

Rule 150898
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {500, 300 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, Frame -> False, 
 ImagePadding -> 0, PlotRangePadding -> 0, PixelConstrained -> 1]

But if one zooms in around the center column, there’s something surprising: after 251 steps, the center column seems to evolve to a fixed value (or at least it’s fixed for more than a million steps):

Rule 150898
Grid[{ArrayPlot[#, Mesh -> True, 
     ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
       2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, ImageSize -> 38,
      MeshStyle -> Lighter[GrayLevel[.5, .65], .45]] & /@ 
   Partition[
    CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {-4, 4}}],
     100]}, Spacings -> .35]

Could some transient like this happen in rule 30? Well, take a look at the rule 30 pattern, now highlighting where the diagonals on the left are periodic:

steps = 500;
					diagonalsofrule30 = 
  Reverse /@ 
   Transpose[
    MapIndexed[RotateLeft[#1, (steps + 1) - #2[[1]]] &, 
     CellularAutomaton[30, {{1}, 0}, steps]]];

     diagonaldataofrule30 = 
  Table[With[{split = 
      Split[Partition[Drop[diagonalsofrule30[[k]], 1], 8]], 
     ones = Flatten[
       Position[Reverse[Drop[diagonalsofrule30[[k]], 1]], 
        1]]}, {Length[split[[1]]], split[[1, 1]], 
     If[Length[split] > 1, split[[2, 1]], 
      Length[diagonalsofrule30[[k]]] - Floor[k/2]]}], {k, 1, 
    2 steps + 1}];

transientdiagonalrule30 = %;

    transitionpointofrule30 = 
  If[IntegerQ[#[[3]]], #[[3]], 
     If[#[[1]] > 1, 
      8 #[[1]] + Count[Split[#[[2]] - #[[3]]][[1]], 0] + 1, 0] ] & /@ 
   diagonaldataofrule30;

   decreasingtransitionpointofrule30 = 
  Append[Min /@ Partition[transitionpointofrule30, 2, 1], 0];

  transitioneddiagonalsofrule30 = 
  Table[Join[
    Take[diagonalsofrule30[[n]], 
      decreasingtransitionpointofrule30[[n]]] + 2, 
    Drop[diagonalsofrule30[[n]], 
     decreasingtransitionpointofrule30[[n]]]], {n, 1, 2 steps + 1}];

     transientdiagonalrule30 = 
 MapIndexed[RotateRight[#1, (steps + 1) - #2[[1]]] &, 
  Transpose[Reverse /@ transitioneddiagonalsofrule30]];
  
  smallertransientdiagonalrule30 = 
  Take[#, {225, 775}] & /@ Take[transientdiagonalrule30, 275];

 Framed[ArrayPlot[smallertransientdiagonalrule30, 
  ColorRules -> {0 -> White, 1 -> Gray, 2 -> Hue[0.14, 0.55, 1], 
    3 -> Hue[0.07, 1, 1]}, PixelConstrained -> 1,
  Frame -> None,
  ImagePadding -> 0, ImageMargins -> 0,
  PlotRangePadding -> 0, PlotRangePadding -> Full
  ], FrameMargins -> 0, FrameStyle -> GrayLevel[.75]]

There seems to be a boundary that separates order on the left from disorder on the right. And at least over the first 100,000 or so steps, the boundary seems to move on average about 0.252 steps to the left at each step—with roughly random fluctuations:

data = CloudGet[
   CloudObject[
    "https://www.wolframcloud.com/obj/bc470188-f629-4497-965d-\
a10fe057e2fd"]];

ListLinePlot[
 MapIndexed[{First[#2], -# - .252 First[#2]} &, 
  Module[{m = -1, w}, 
   w = If[First[#] > m, m = First[#], m] & /@ data[[1]]; m = 1;
   Table[While[w[[m]] < i, m++]; m - i, {i, 100000}]]], 
 Filling -> Axis, AspectRatio -> 1/4, MaxPlotPoints -> 10000, 
 Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0}, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

But how do we know that there won’t at some point be a huge fluctuation, that makes the order on the left cross the center column, and perhaps even make the whole pattern periodic? From the data we have so far, it looks unlikely, but I don’t know any way to know for sure.

And it’s certainly the case that there are systems with exceptionally long “transients”. Consider the distribution of primes, and compute LogIntegral[n] - PrimePi[n]:

DiscretePlot[LogIntegral[n] - PrimePi[n], {n, 10000}, 
 Filling -> Axis,
 Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, 
 Joined -> True, PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Yes, there are fluctuations. But from this picture it certainly looks as if this difference is always going to be positive. And that's, for example, what Ramanujan thought. But it turns out it isn't true. At first the bound for where it would fail was astronomically large (Skewes's number 10^10^10^964). And although still nobody has found an explicit value of n for which the difference is negative, it's known that before n = 10^317 there must be one (and eventually the difference will be negative at least nearly a millionth of the time).

I strongly suspect that nothing like this happens with the center column of rule 30. But until we have a proof that it can’t, who knows?

One might think, by the way, that while one might be able to prove periodicity by exposing regularity in the center column of rule 30, nothing like that would be possible for non-periodicity. But actually, there are patterns whose center columns one can readily see are non-periodic, even though they're very regular. The main class of examples are nested patterns. Here's a very simple example, from rule 161—in which the center column has white cells when n = 2^k:

Rule 161
GraphicsRow[
 ArrayPlot[CellularAutomaton[161, {{1}, 0}, #]] & /@ {40, 200}]

Here’s a slightly more elaborate example (from the 2-neighbor 2-color rule 69540422), in which the center column is a Thue–Morse sequence ThueMorse[n]:

Thue-Morse sequence
GraphicsRow[
 ArrayPlot[
    CellularAutomaton[{69540422, 2, 2}, {{1}, 
      0}, {#, {-#, #}}]] & /@ {40, 400}]

One can think of the Thue–Morse sequence as being generated by successively applying the substitutions:

RulePlot[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}], 
 Appearance -> "Arrow"]

And it turns out that the nth term in this sequence is given by Mod[DigitCount[n, 2, 1], 2]—which is never periodic.
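One can check this formula directly against the built-in ThueMorse function:

(* the digit-count formula reproduces the built-in Thue–Morse sequence *)
Table[Mod[DigitCount[n, 2, 1], 2], {n, 0, 20}] === Table[ThueMorse[n], {n, 0, 20}]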

Will it turn out that the center column of rule 30 can be generated by a substitution system? Again, I’d be amazed (although there are seemingly natural examples where very complex substitution systems do appear). But once again, until one has a proof, who knows?

Here's something else, that may be confusing, or may be helpful. The Rule 30 Prize Problems all concern rule 30 running in an infinite array of cells. But what if one considers just n cells, say with periodic boundary conditions (i.e. taking the right neighbor of the rightmost cell to be the leftmost cell, and vice versa)? There are 2^n possible total states of the system—and one can draw a state transition diagram that shows which state evolves to which other. Here's the diagram for n = 5:

Graph[# -> CellularAutomaton[30][#] & /@ Tuples[{1, 0}, 5], 
 VertexLabels -> ((# -> 
       ArrayPlot[{#}, ImageSize -> 30, Mesh -> True]) & /@ 
    Tuples[{1, 0}, 5])]

And here it is for n = 4 through n = 11:

Row[Table[
  Framed[Graph[# -> CellularAutomaton[30][#] & /@ 
     Tuples[{1, 0}, n]]], {n, 4, 11}]]

The structure is that there are a bunch of states that appear only as transients, together with other states that are on cycles. Inevitably, no cycle can be longer than 2^n (actually, symmetry considerations show that it always has to be somewhat less than this).

OK, so on a size-n array, rule 30 always has to show behavior that becomes periodic with a period that's less than 2^n. Here are the actual periods starting from a single black cell initial condition, plotted on a log scale:

ListLogPlot[
 Normal[Values[
   ResourceData[
      "Repetition Periods for Elementary Cellular Automata"][
     Select[#Rule == 30 &]][All, "RepetitionPeriods"]]], 
 Joined -> True, Filling -> Bottom, Mesh -> All, 
 MeshStyle -> PointSize[.008], AspectRatio -> 1/3, Frame -> True, 
 PlotRange -> {{47, 2}, {0, 10^10}}, PlotRangePadding -> .1, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And at least for these values of n, a decent fit is that the period is about 2^(0.63 n). And, yes, at least in all these cases, the period of the center column is equal to the period of the whole evolution. But what do these finite-size results imply about the infinite-size case? I, at least, don't immediately see.
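For small n it's easy to measure these periods directly. Here's a minimal (unoptimized) sketch—the helper name period30 is mine—that iterates the cyclic evolution until a state repeats, then reads off the cycle length:

(* repetition period of rule 30 on a cyclic array of n cells, starting from a single black cell *)
period30[n_] := Module[{state = ReplacePart[ConstantArray[0, n], 1 -> 1], seen = <||>, t = 0},
  While[! KeyExistsQ[seen, state], seen[state] = t; state = CellularAutomaton[30][state]; t++];
  t - seen[state]]

Table[period30[n], {n, 4, 15}]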

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s a plot of the running excess of 1s over 0s in 10,000 steps of the center column of rule 30:

ListLinePlot[
 Accumulate[2 CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}] - 1],
  AspectRatio -> 1/4, Frame -> True, PlotRangePadding -> 0, 
 AxesOrigin -> {Automatic, 0}, Filling -> Axis, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Here it is for a million steps:

ListLinePlot[
 Accumulate[
  2 ResourceData[
     "A Million Bits of the Center Column of the Rule 30 Cellular Automaton"] - 1], Filling -> Axis, Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, MaxPlotPoints -> 1000, PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And a billion steps:

data=Flatten[IntegerDigits[#,2,8]&/@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]];
data=Accumulate[2 data-1];
sdata=Downsample[data,10^5];
ListLinePlot[Transpose[{Range[10000] 10^5,sdata}],Filling->Axis,Frame->True,PlotRangePadding->0,AspectRatio->1/4,MaxPlotPoints->1000,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]]

We can see that there are times when there’s an excess of 1s over 0s, and vice versa, though, yes, as we approach a billion steps 1 seems to be winning over 0, at least for now.

But let's compute the ratio of the total number of 1s to the total number of 0s. Here's what we get after 10,000 steps:

Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.88, 1.04}}]]

Is this approaching the value 1? It’s hard to tell. Go on a little longer, and this is what we see:

Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^5 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.985, 1.038}}]]

The scale is getting smaller, but it’s still hard to tell what will happen. Plotting the difference from 1 on a log-log plot up to a billion steps suggests it’s fairly systematically getting smaller:

accdata=Accumulate[Flatten[IntegerDigits[#,2,8]&/@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]];

diffratio=FunctionCompile[Function[Typed[arg,TypeSpecifier["PackedArray"]["MachineInteger",1]],MapIndexed[Abs[N[#]/(First[#2]-N[#])-1.]&,arg]]];

data=diffratio[accdata];

ListLogLogPlot[Join[Transpose[{Range[3,10^5],data[[3;;10^5]]}],Transpose[{Range[10^5+1000,10^9,1000],data[[10^5+1000;;10^9;;1000]]}]],Joined->True,AspectRatio->1/4,Frame->True,Filling->Axis,PlotRangePadding->0,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]]

But how do we know this trend will continue? Right now, we don’t. And, actually, things could get quite pathological. Maybe the fluctuations in 1s vs. 0s grow, so even though we’re averaging over longer and longer sequences, the overall ratio will never converge to a definite value.

Again, I doubt this is going to happen in the center column of rule 30. But without a proof, we don’t know for sure.

We're asking here about the frequencies of black and white cells. But an obvious—and potentially illuminating—generalization is to ask instead about the frequencies for blocks of cells of length k. We can ask if all 2^k such blocks have equal limiting frequency. Or we can ask the more basic question of whether all the blocks even ever occur—or, in other words, whether if one goes far enough, the center column of rule 30 will contain any given sequence of length k (say a bitwise representation of some work of literature).

Again, we can get empirical evidence. For example, at least up to k = 22, all 2^k sequences do occur—and here's how many steps it takes:

ListLogPlot[{3, 7, 13, 63, 116, 417, 1223, 1584, 2864, 5640, 23653, 
  42749, 78553, 143591, 377556, 720327, 1569318, 3367130, 7309616, 
  14383312, 32139368, 58671803}, Joined -> True, AspectRatio -> 1/4, 
 Frame -> True, Mesh -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.01]}], 
 PlotTheme -> "Detailed", 
 PlotStyle -> Directive[{Thickness[.004], Hue[0.1, 1, 0.99]}]]
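If one wants to reproduce this kind of count, here's a minimal (and deliberately unoptimized) sketch; the helper name firstAllBlocks is mine, and the exact values depend on the counting convention one adopts:

(* first length t of the center column by which all 2^k distinct length-k blocks have appeared *)
firstAllBlocks[k_, tmax_] := Module[{col = Flatten[CellularAutomaton[30, {{1}, 0}, {tmax, {{0}}}]]},
  SelectFirst[Range[k, tmax], Length[Union[Partition[Take[col, #], k, 1]]] == 2^k &]]

Table[firstAllBlocks[k, 300], {k, 1, 5}]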

It's worth noticing that one can succeed perfectly for blocks of one length, but then fail for larger blocks. For example, the Thue–Morse sequence mentioned above has exactly equal frequencies of 0 and 1, but pairs don't occur with equal frequencies, and triples of identical elements simply never occur.

In traditional mathematics—and particularly dynamical systems theory—one approach to take is to consider not just evolution from a single-cell initial condition, but evolution from all possible initial conditions. And in this case it’s straightforward to show that, yes, if one evolves with equal probability from all possible initial conditions, then columns of cells generated by rule 30 will indeed contain every block with equal frequency. But if one asks the same thing for different distributions of initial conditions, one gets different results, and it’s not clear what the implication of this kind of analysis is for the specific case of a single-cell initial condition.

If different blocks occurred with different frequencies in the center column of rule 30, then that would immediately show that the center column is “not random”, or in other words that it has statistical regularities that could be used to at least statistically predict it. Of course, at some level the center column is completely “predictable”: you just have to run rule 30 to find it. But the question is whether, given just the values in the center column on their own, there’s a way to predict or compress them, say with much less computational effort than generating an arbitrary number of steps in the whole rule 30 pattern.

One could imagine running various data compression or statistical analysis algorithms, and asking whether they would succeed in finding regularities in the sequence. And particularly when one starts thinking about the overall computational capabilities of rule 30, it’s conceivable that one could prove something about how across a spectrum of possible analysis algorithms, there’s a limit to how much they could “reduce” the computation associated with the evolution of rule 30. But even given this, it’d likely still be a major challenge to say anything about the specific case of relative frequencies of black and white cells.

It’s perhaps worth mentioning one additional mathematical analog. Consider treating the values in a row of the rule 30 pattern as digits in a real number, say with the first digit of the fractional part being on the center column. Now, so far as we know, the evolution of rule 30 has no relation to any standard operations (like multiplication or taking powers) that one does on real numbers. But we can still ask about the sequence of numbers formed by looking at the right-hand side of the rule 30 pattern. Here’s a plot for the first 200 steps:

ListLinePlot[
 FromDigits[{#, 0}, 2] & /@ 
  CellularAutomaton[30, {{1}, 0}, {200, {0, 200}}], Mesh -> All, 
 AspectRatio -> 1/4, Frame -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.0085]}], 
 PlotTheme -> "Detailed", PlotStyle -> Directive[{
Hue[0.1, 1, 0.99]}], ImageSize -> 575]

And here’s a histogram of the values reached at successively more steps:

Grid[{Table[
   Histogram[
    FromDigits[{#, 0}, 2] & /@ 
     CellularAutomaton[30, {{1}, 0}, {10^n, {0, 20}}], {.01}, 
    Frame -> True, 
    FrameTicks -> {{None, 
       None}, {{{0, "0"}, .2, .4, .6, .8, {1, "1"}}, None}}, 
    PlotLabel -> (StringTemplate["`` steps"][10^n]), 
    ChartStyle -> Directive[Opacity[.5], Hue[0.09, 1, 1]], 
    ImageSize -> 208, 
    PlotRangePadding -> {{0, 0}, {0, Scaled[.06]}}], {n, 4, 6}]}, 
 Spacings -> .2]

And, yes, it’s consistent with the limiting histogram being flat, or in other words, with these numbers being uniformly distributed in the interval 0 to 1.

Well, it turns out that in the early 1900s there were a bunch of mathematical results established about this kind of equidistribution. In particular, it's known that FractionalPart[h n] for successive n is always equidistributed if h isn't a rational number. It's also known that FractionalPart[h^n] is equidistributed for almost all h (Pisot numbers like the golden ratio are exceptions). But specific cases—like FractionalPart[(3/2)^n]—have eluded analysis for at least half a century. (By the way, it's known that the digits of π in base 16 and thus base 2 can be generated by a recurrence of the form x[n] = FractionalPart[16 x[n - 1] + r[n]], where r[n] is a fixed rational function of n.)
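The equidistribution of FractionalPart[h n] for irrational h, for example, is easy to see empirically:

(* fractional parts of Sqrt[2] n are (provably) equidistributed between 0 and 1 *)
Histogram[Table[FractionalPart[N[Sqrt[2]] n], {n, 10^4}], {0.05}]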

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Consider the pattern made by rule 150:

Rule 150
Row[{ArrayPlot[CellularAutomaton[150, {{1}, 0}, 30], Mesh -> All, 
   ImageSize -> 315], 
  ArrayPlot[CellularAutomaton[150, {{1}, 0}, 200], ImageSize -> 300]}]

It’s a very regular, nested pattern. Its center column happens to be trivial (all cells are black). But if we look one column to the left or right, we find:

ArrayPlot[{Table[Mod[IntegerExponent[t, 2], 2], {t, 80}]}, 
 Mesh -> All, ImageSize -> Full]

How do we work out the value of the nth cell? Well, in this particular case, it turns out there’s essentially just a simple formula: the value is given by Mod[IntegerExponent[n, 2], 2]. In other words, just look at the number n in base 2, and ask whether the number of zeros it has at the end is even or odd.

How much computational effort does it take to “evaluate this formula”? Well, even if we have to check every bit in n, there are only about Log[2, n] of those. So we can expect that the computational effort is O(log n).
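So, for example, evaluating the “formula” even for an astronomically large n is essentially instantaneous (the particular n here is just an arbitrary huge example):

(* the parity of the number of trailing base-2 zeros—and hence the formula value—for a huge n *)
Mod[IntegerExponent[2^1000000 + 2^13, 2], 2] // AbsoluteTiming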

But what about the rule 30 case? We know we can work out the value of the nth cell in the center column just by explicitly applying the rule 30 update rule n^2 times. But the question is whether there's a way to reduce the computational work that's needed. In the past, there's tended to be an implicit assumption throughout the mathematical sciences that if one has the right model for something, then by just being clever enough one will always find a way to make predictions—or in other words, to work out what a system will do, using a lot less computational effort than the actual evolution of the system requires.

And, yes, there are plenty of examples of “exact solutions” (think 2-body problem, 2D Ising model, etc.) where we essentially just get a formula for what a system will do. But there are also other cases (think 3-body problem, 3D Ising model, etc.) where this has never successfully been done.

And as I first discussed in the early 1980s, I suspect that there are actually many systems (including these) that are computationally irreducible, in the sense that there’s no way to significantly reduce the amount of computational work needed to determine their behavior.

So in effect Problem 3 is asking about the computational irreducibility of rule 30—or at least a specific aspect of it. (The choice of O(n) computational effort is somewhat arbitrary; another version of this problem could ask for O(n^α) for any α < 2, or, for that matter, O(log^β(n))—or some criterion based on both time and memory resources.)

If the answer to Problem 3 is negative, then the obvious way to show this would just be to give an explicit program that successfully computes the nth value in the center column with less than O(n) computational effort, as we did for rule 150 above.

We can ask what O(n) computational effort means. What kind of system are we supposed to use to do the computation? And how do we measure “computational effort”? The phenomenon of computational universality implies that—within some basic constraints—it ultimately doesn’t matter.

For definiteness we could say that we always want to do the computation on a Turing machine. And for example we can say that we’ll feed the digits of the number n in as the initial state of the Turing machine tape, then expect the Turing machine to grind for much less than n steps before generating the answer (and, if it’s really to be “formula like”, more like O(log n) steps).

We don’t need to base things on a Turing machine, of course. We could use any kind of system capable of universal computation, including a cellular automaton, and, for that matter, the whole Wolfram Language. It gets a little harder to measure “computational effort” in these systems. Presumably in a cellular automaton we’d want to count the total number of cell updates done. And in the Wolfram Language we might end up just actually measuring CPU time for executing whatever program we’ve set up.

I strongly suspect that rule 30 is computationally irreducible, and that Problem 3 has an affirmative answer. But if it isn't, my guess is that eventually there'll turn out to be a program that rather obviously computes the nth value in less than O(n) computational effort, and there won't be a lot of argument about the details of whether the computational resources are counted correctly.

But proving that no such program exists is a much more difficult proposition. And even though I suspect computational irreducibility is quite ubiquitous, it’s always very hard to prove explicit lower bounds on the difficulty of doing particular computations. And in fact almost all explicit lower bounds currently known are quite weak, and essentially boil down just to arguments about information content—like that you need O(log n) steps to even read all the digits in the value of n.

Undoubtedly the most famous lower-bound problem is the P vs. NP question. I don't think there's a direct relation to our rule 30 problem (which is more like a P vs. LOGTIME question), but it's perhaps worth understanding how things are connected. The basic point is that the forward evolution of a cellular automaton, say for n steps from an initial condition with n cells specified, is at most an O(n^2) computation, and is therefore in P (“polynomial time”). But the question of whether there exists an initial condition that evolves to produce some particular final result is in NP. If you happen (“non-deterministically”) to pick the correct initial condition, then it's polynomial time to check that it's correct. But there are potentially 2^n possible initial conditions to check.

Of course there are plenty of cellular automata where you don't have to check all these 2^n initial conditions, and a polynomial-time computation clearly suffices. But it's possible to construct a cellular automaton where finding the initial condition is an NP-complete problem, or in other words, where it's possible to encode any problem in NP in this particular cellular automaton inversion problem. Is the rule 30 inversion problem NP-complete? We don't know, though it seems conceivable that it could be proved to be (and if one did prove it then rule 30 could finally be a provably NP-complete cryptosystem).
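Here's what the brute-force version of that inversion search looks like for a single step on a small cyclic array (the helper name preimages is mine); the point is just that the obvious approach enumerates all 2^n candidate initial conditions:

(* all length-5 cyclic initial conditions that evolve in one rule 30 step to a given row *)
preimages[target_] := Select[Tuples[{0, 1}, Length[target]], CellularAutomaton[30][#] === target &]

preimages[{1, 1, 0, 1, 0}]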

But there doesn’t seem to be a direct connection between the inversion problem for rule 30, and the problem of predicting the center column. Still, there’s at least a more direct connection to another global question: whether rule 30 is computation universal, or, in other words, whether there exist initial conditions for rule 30 that allow it to be “programmed” to perform any computation that, for example, any Turing machine can perform.

We know that among the 256 simplest cellular automata, rule 110 is universal (as are three other rules that are simple transformations of it). But looking at a typical example of rule 110 evolution, it’s already clear that there are definite, modular structures one can identify. And indeed the proof proceeds by showing how one can “engineer” a known universal system out of rule 110 by appropriately assembling these structures.

Rule 110
SeedRandom[23542345]; ArrayPlot[
 CellularAutomaton[110, RandomInteger[1, 600], 400], 
 PixelConstrained -> 1]

Rule 30, however, shows no such obvious modularity—so it doesn’t seem plausible that one can establish universality in the “engineering” way it’s been established for all other known-to-be-universal systems. Still, my Principle of Computational Equivalence strongly suggests that rule 30 is indeed universal; we just don’t yet have an obvious direction to take in trying to prove it.

If one can show that a system is universal, however, then this does have implications that are closer to our rule 30 problem. In particular, if a system is universal, then there’ll be questions (like the halting problem) about its infinite-time behavior that will be undecidable, and which no guaranteed-finite-time computation can answer. But as such, universality is a statement about the existence of initial conditions that reproduce a given computation. It doesn’t say anything about the specifics of a particular initial condition—or about how long it will take to compute a particular result.

OK, but what about a different direction: what about getting empirical evidence about our Problem 3? Is there a way to use statistics, or cryptanalysis, or mathematics, or machine learning to even slightly reduce the computational effort needed to compute the nth value in the center column?

Well, we know that the whole 2D pattern of rule 30 is far from random. In fact, of all 2^(m^2) conceivable m×m patches, only a tiny fraction can possibly occur—each m×m patch is completely determined by a row of 3m–2 cells at its top, so at most 2^(3m–2) distinct patches are possible—and in practice the number weighted by probability is much smaller. And I don't doubt that facts like this can be used to reduce the effort to compute the center column to less than O(n^2) effort (and that would be a nice partial result). But can it be less than O(n) effort? That's a much more difficult question.
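As a small-scale illustration of this, one can count how many distinct 4×4 patches actually show up in a finite sample of the pattern, compared with the 2^16 = 65536 conceivable ones:

(* distinct 4×4 patches occurring in a 301×401 sample of the rule 30 pattern *)
sample = CellularAutomaton[30, {{1}, 0}, {300, 200 {-1, 1}}];
Length[Union[Flatten[Partition[sample, {4, 4}, {1, 1}], 1]]]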

Clearly if Problem 1 was answered in the negative then it could be. But in a sense asking for less than O(n) computation of the center column is precisely like asking whether there are “predictable regularities” in it. Of course, even if one could find small-scale statistical regularities in the sequence (as answering Problem 2 in the negative would imply), these wouldn’t on their own give one a way to do more than perhaps slightly improve a constant multiplier in the speed of computing the sequence.

Could there be some systematically reduced way to compute the sequence using a neural net—which is essentially a collection of nested real-number functions? I’ve tried to find such a neural net using our current deep-learning technology—and haven’t been able to get anywhere at all.

What about statistical methods? If we could find statistical non-randomness in the sequence, then that would imply an ability to compress the sequence, and thus some redundancy or predictability in the sequence. But I’ve tried all sorts of statistical randomness tests on the center column of rule 30—and never found any significant deviation from randomness. (And for many years—until we found a slightly more efficient rule—we used sequences from finite-size rule 30 systems as our source of random numbers in the Wolfram Language, and no legitimate “it’s not random!” bugs ever showed up.)

Statistical tests of randomness typically work by saying, “Take the supposedly random sequence and process it in some way, then see if the result is obviously non-random”. But what kind of processing should be done? One might see if blocks occur with equal frequency, or if correlations exist, or if some compression algorithm succeeds in doing compression. But typically batteries of tests end up seeming a bit haphazard and arbitrary. In principle one can imagine enumerating all possible tests—by enumerating all possible programs that can be applied to the sequence. But I’ve tried doing this, for example for classes of cellular automaton rules—and have never managed to detect any non-randomness in the rule 30 sequence.
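As one simple example of such a test, here's how one might tabulate the counts of the 8 possible length-3 blocks in the first 100,000 center-column values (using the data resource mentioned above):

(* counts of length-3 blocks in the center column *)
col = Normal[ResourceData["A Million Bits of the Center Column of the Rule 30 Cellular Automaton"]];
KeySort[Counts[FromDigits[#, 2] & /@ Partition[Take[col, 10^5], 3, 1]]]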

So how about using ideas from mathematics to predict the rule 30 sequence? Well, as such, rule 30 doesn’t seem connected to any well-developed area of math. But of course it’s conceivable that some mapping could be found between rule 30 and ideas, say, in an area like number theory—and that these could either help in finding a shortcut for computing rule 30, or could show that computing it is equivalent to some problem like integer factoring that’s thought to be fundamentally difficult.

I know a few examples of interesting interplays between traditional mathematical structures and cellular automata. For example, consider the digits of successive powers of 3 in base 2 and in base 6:

Digits of successive powers
Row[Riffle[
  ArrayPlot[#, ImageSize -> {Automatic, 275}] & /@ {Table[
     IntegerDigits[3^t, 2, 159], {t, 100}], 
    Table[IntegerDigits[3^t, 6, 62], {t, 100}]}, Spacer[10]]]

It turns out that in the base 6 case, the rule for generating the pattern is exactly a cellular automaton. (For base 2, there are additional long-range carries.) But although both these patterns look complex, it turns out that their mathematical structure lets us speed up making certain predictions about them.

Consider the sth digit from the right-hand edge of line n in each pattern. It's just the sth digit in 3^n, which is given by the “formula” (where b is the base, here 2 or 6) Mod[Quotient[3^n, b^s], b]. But how easy is it to evaluate this formula? One might think that to compute 3^n one would have to do n multiplications. But this isn't the case: instead, one can for example build up 3^n using repeated squaring, with about log(n) multiplications. That this is possible is a consequence of the associativity of multiplication. There's nothing obviously like that for rule 30—but it's always conceivable that some mapping to a mathematical structure like this could be found.
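Here's a minimal sketch of that repeated-squaring idea (the helper name powerBySquaring is mine)—about Log[2, n] multiplications instead of n:

(* compute b^n with O(log n) multiplications by repeated squaring *)
powerBySquaring[b_, 0] := 1
powerBySquaring[b_, n_?EvenQ] := With[{h = powerBySquaring[b, n/2]}, h h]
powerBySquaring[b_, n_?OddQ] := b powerBySquaring[b, n - 1]

(* it agrees with the built-in power, and feeds directly into the digit "formula" above *)
{powerBySquaring[3, 1000] === 3^1000, Mod[Quotient[powerBySquaring[3, 100], 6^10], 6]}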

In Problem 3, we're talking about the computational effort to compute the nth value in the center column of rule 30—and asking if it can be less than O(n). But imagine that we have a definite algorithm for doing the computation. For any given n, we can see what computational resources it uses. Say the result is r[n]. Then what we're asking is whether r[n] is less than “big O” of n, or whether MaxLimit[r[n]/n, n -> ∞] < ∞.

But imagine that we have a particular Turing machine (or some other computational system) that’s implementing our algorithm. It could be that r[n] will at least asymptotically just be a smooth or otherwise regular function of n for which it’s easy to see what the limit is. But if one just starts enumerating Turing machines, one encounters examples where r[n] appears to have peaks of random heights in random places. It might even be that somewhere there’d be a value of n for which the Turing machine doesn’t halt (or whatever) at all, so that r[n] is infinite. And in general, as we’ll discuss in more detail later, it could even be undecidable just how r[n] grows relative to O(n).

Formal Statements of the Problems

So far, I’ve mostly described the Prize Problems in words. But we can also describe them in computational language (or effectively also in math).

In the Wolfram Language, the first t values in the center column of rule 30 are given by:

c[t_] := CellularAutomaton[30, {{1}, 0}, {t, {{0}}}]

And with this definition, the three problems can be stated as predicates about c[t].

Problem 1: Does the center column always remain non-periodic?

¬∃ {p, i} : ∀ t > i : c[t + p] == c[t]

or

NotExists[{p, i}, ForAll[t, t > i, c[t + p] == c[t]]]

or “there does not exist a period p and an initial length i such that for all t with t>i, c[t + p] equals c[t]”.

Problem 2: Does each color of cell occur on average equally often in the center column?

Limit[Total[c[t]]/t, t → ∞] == 1/2   (with t ranging over the integers)

or

DiscreteLimit[Total[c[t]]/t, t -> Infinity] == 1/2

or “the discrete limit, as t goes to infinity, of the total of the values in c[t] divided by t, is 1/2”.

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Define machine[m] to be a machine parametrized by m (for example TuringMachine[...]), and let machine[m][n] give {v, t}, where v is the output value, and t is the amount of computational effort taken (e.g. number of steps). Then the problem can be formulated as:

¬∃ m : (∀ n : machine[m][n][[1]] == Last[c[n]])  ∧  MaxLimit[machine[m][n][[2]]/n, n → ∞] < ∞

or “there does not exist a machine m which for all n gives c[n], and for which the lim sup of the amount of computational effort spent, divided by n, is finite”. (Yes, one should also require that m be finite, so the machine’s rule can’t just store the answer.)

The Formal Character of Solutions

Before we discuss the individual problems, an obvious question to ask is what the interdependence of the problems might be. If the answer to Problem 3 is negative (which I very strongly doubt), then it opens the possibility of simple algorithms or formulas from which the answers to Problems 1 and 2 might become straightforward. If the answer to Problem 3 is affirmative (as I strongly suspect), then it implies that the answer to Problem 1 must also be affirmative: if the center column were eventually periodic, one could compute any of its values with much less than O(n) effort. The contrapositive holds too: if the answer to Problem 1 is negative, then the answer to Problem 3 must also be negative.

If the answer to Problem 1 is negative, so that there is some periodic sequence that appears in the center column, then if one explicitly knows that sequence, one can immediately answer Problem 2. One might think that answering Problem 2 in the negative would imply something about Problem 3. And, yes, unequal probabilities for black and white imply compression by a constant factor in a Shannon-information way. But to compute a value with less than O(n) resources—and therefore to answer Problem 3 in the negative—requires that one be able to identify in a sense infinitely more compression.

So what does it take to establish the answers to the problems?

If Problem 1 is answered in the negative, then one can imagine explicitly exhibiting the pattern generated by rule 30 at some known step—and being able to see the periodic sequence in the center. Of course, Problem 1 could still be answered in the negative, but less constructively. One might be able to show that eventually the sequence has to be periodic, but not know even any bound on where this might happen. If Problem 3 is answered in the negative, a way to do this is to explicitly give an algorithm (or, say, a Turing machine) that does the computation with less than O(n) computational resources.

But let’s say one has such an algorithm. One still has to prove that for all n, the algorithm will correctly reproduce the nth value. This might be easy. Perhaps there would just be a proof by induction or some such. But it might be arbitrarily hard. For example, it could be that for most n, the running time of the algorithm is clearly less than n. But it might not be obvious that the running time will always even be finite. Indeed, the “halting problem” for the algorithm might simply be undecidable. But just showing that a particular algorithm doesn’t halt for a given n doesn’t really tell one anything about the answer to the problem. For that one would have to show that there’s no algorithm that exists that will successfully halt in less than O(n) time.

The mention of undecidability brings up an issue, however: just what axiom system is one supposed to use to answer the problems? For the purposes of the Prize, I’ll just say “the traditional axioms of standard mathematics”, which one can assume are Peano arithmetic and/or the axioms of set theory (with or without the continuum hypothesis).

Could it be that the answers to the problems depend on the choice of axioms—or even that they’re independent of the traditional axioms (in the sense of Gödel’s incompleteness theorem)? Historical experience in mathematics makes this seem extremely unlikely, because, to date, essentially all “natural” problems in mathematics seem to have turned out to be decidable in the (sometimes rather implicit) axiom system that’s used in doing the mathematics.

In the computational universe, though—freed from the bounds of historical math tradition—it’s vastly more common to run into undecidability. And, actually, my guess is that a fair fraction of long-unsolved problems even in traditional mathematics will also turn out to be undecidable. So that definitely raises the possibility that the problems here could be independent of at least some standard axiom systems.

OK, but assume there’s no undecidability around, and one’s not dealing with the few cases in which one can just answer a problem by saying “look at this explicitly constructed thing”. Well, then to answer the problem, we’re going to have to give a proof.

In essence what drives the need for proof is the presence of something infinite. We want to know something for any n, even infinitely large, etc. And the only way to handle this is then to represent things symbolically (“the symbol Infinity means infinity”, etc.), and apply formal rules to everything, defined by the axioms in the underlying axiom system one’s assuming.

In the best case, one might be able to just explicitly exhibit that series of rule applications—in such a way that a computer can immediately verify that they’re correct. Perhaps the series of rule applications could be found by automated theorem proving (as in FindEquationalProof). More likely, it might be constructed using a proof assistant system.
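Just to illustrate the kind of object involved (this toy example has nothing to do with rule 30 itself), here's the sort of thing FindEquationalProof produces:

(* a tiny automated proof: from a == b and b == c, derive a == c; the result is a ProofObject *)
proof = FindEquationalProof[a == c, {a == b, b == c}]

One can then examine properties of the resulting ProofObject—its network of lemmas, its step-by-step dataset—and have each step checked mechanically.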

It would certainly be exciting to have a fully formalized proof of the answer to any of the problems. But my guess is that it’ll be vastly easier to construct a standard proof of the kind human mathematicians traditionally do. What is such a proof? Well, it’s basically an argument that will convince other humans that a result is correct.

There isn’t really a precise definition of that. In our step-by-step solutions in Wolfram|Alpha, we’re effectively proving results (say in calculus) in such a way that students can follow them. In an academic math journal, one’s giving proofs that successfully get past the peer review process for the journal.

My own guess would be that if one were to try to formalize essentially any nontrivial proof in the math literature, one would find little corners that require new results, though usually ones that wouldn’t be too hard to get.

How can we handle this in practice for our prizes? In essence, we have to define a computational contract for what constitutes success, and when prize money should be paid out. For a constructive proof, we can get Wolfram Language code that can explicitly be run on any sufficiently large computer to establish the result. For formalized proofs, we can get Wolfram Language code that can run through the proof, validating each step.

But what about for a “human proof”? Ultimately we have no choice but to rely on some kind of human review process. We can ask multiple people to verify the proof. We could have some blockchain-inspired scheme where people “stake” the correctness of the proof, then if one eventually gets consensus (whatever this means) one pays out to people some of the prize money, in proportion to their stake. But whatever is done, it’s going to be an imperfect, “societal” result—like almost all of the pure mathematics that’s so far been done in the world.

What Will It Take?

OK, so for people interested in working on the Problems, what skills are relevant? I don’t really know. It could be discrete and combinatorial mathematics. It could be number theory, if there’s a correspondence with number-based systems found. It could be some branch of algebraic mathematics, if there’s a correspondence with algebraic systems found. It could be dynamical systems theory. It could be something closer to mathematical logic or theoretical computer science, like the theory of term rewriting systems.

Of course, it could be that no existing towers of knowledge—say in branches of mathematics—will be relevant to the problems, and that to solve them will require building “from the ground up”. And indeed that’s effectively what ended up happening in the solution for my 2,3 Turing Machine Prize in 2007.

I’m a great believer in the power of computer experiments—and of course it’s on the basis of computer experiments that I’ve formulated the Rule 30 Prize Problems. But there are definitely more computer experiments that could be done. So far we know a billion elements in the center column sequence. And so far the sequence doesn’t seem to show any deviation from randomness (at least based on tests I’ve tried). But maybe at a trillion elements (which should be well within range of current computer systems) or a quadrillion elements, or more, it eventually will—and it’s definitely worth doing the computations to check.

The direct way to compute n elements in the center column is to run rule 30 for n steps, using at an intermediate stage up to n cells of memory. The actual computation is quite well optimized in the Wolfram Language. Running on my desktop computer, it takes less than 0.4 seconds to compute 100,000 elements:

CellularAutomaton[30, {{1}, 0}, {100000, {{0}}}]; // Timing

Internally, this is using the fact that rule 30 can be expressed as Xor[p, Or[q, r]], and implemented using bitwise operations on whole words of data at a time. Using explicit bitwise operations on long integers takes about twice as long as the built-in CellularAutomaton function:

Module[{a = 1}, 
   Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i, 
     100000}]]; // Timing
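As a quick consistency check (not part of the timing comparison), one can verify that the bitwise version reproduces the center column exactly:

(* the bitwise update gives the same first 1000 center-column values as CellularAutomaton *)
Module[{a = 1}, 
   Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i, 1000}]] === 
 Flatten[CellularAutomaton[30, {{1}, 0}, {999, {{0}}}]]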

But these results are from single CPU processors. It’s perfectly possible to imagine parallelizing across many CPUs, or using GPUs. One might imagine that one could speed up the computation by effectively caching the results of many steps in rule 30 evolution, but the fact that across the rows of the rule 30 pattern all blocks appear to occur with at least roughly equal frequency makes it seem as though this would not lead to significant speedup.

Solving some types of math-like problems seems pretty certain to require deep knowledge of high-level existing mathematics. For example, it seems quite unlikely that there can be an “elementary” proof of Fermat's last theorem, or even of the four-color theorem. But for the Rule 30 Prize Problems it's not clear to me. Each of them might need sophisticated existing mathematics, or they might not. They might be accessible only to people professionally trained in mathematics, or they might be solvable by clever “programming-style” or “puzzle-style” work, without sophisticated mathematics.

Generalizations and Relations

Sometimes the best way to solve a specific problem is first to solve a related problem—often a more general one—and then come back to the specific problem. And there are certainly many problems related to the Rule 30 Prize Problems that one can consider.

For example, instead of looking at the vertical column of cells at the center of the rule 30 pattern, one could look at a column of cells in a different direction. At 45°, it’s easy to see that any sequence must be periodic. On the left the periods increase very slowly; on the right they increase rapidly. But what about other angles?
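For example, here's a sketch (with my own indexing) that extracts the 45° diagonal along the left edge of the pattern—which, as expected, is trivially periodic (in fact constant) right from the start:

(* the cell at step t on the 45° diagonal at offset -t from the center *)
pattern = CellularAutomaton[30, {{1}, 0}, {40, All}];
Table[pattern[[t + 1, 41 - t]], {t, 0, 40}]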

Or what about looking at rows of cells in the pattern? Do all possible blocks occur? How many steps is it before any given block appears? The empirical evidence doesn't show any deviation from blocks occurring at random, but obviously, for example, successive rows are highly correlated.

What about different initial conditions? There are many dynamical systems–style results about the behavior of rule 30 starting with equal probability from all possible infinite initial conditions. In this case, for example, it’s easy to show that all possible blocks occur with equal frequency, both at a given row, and in a given vertical column. Things get more complicated if one asks for initial conditions that correspond, for example, to all possible sequences generated by a given finite state machine, and one could imagine that from a sequence of results about different sets of possible initial conditions, one would eventually be able to say something about the case of the single black cell initial condition.

Another straightforward generalization is just to look not at a single black cell initial condition, but at other “special” initial conditions. An infinite periodic initial condition will always give periodic behavior (that’s the same as one gets in a finite-size region with periodic boundary conditions). But one can, for example, study what happens if one puts a “single defect” in the periodic pattern:

A 'single defect' in the periodic pattern
GraphicsRow[(ArrayPlot[
     CellularAutomaton[30, 
      MapAt[1 - #1 &, Flatten[Table[#1, Round[150/Length[#1]]]], 50], 
      100]] &) /@ {{1, 0}, {1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0}, {1, 
    0, 0, 0, 0, 0, 0}, {1, 1, 1, 0, 0}}]

One can also ask what happens when one has not just a single black cell, but some longer sequence in the initial conditions. How does the center column change with different initial sequences? Are there finite initial sequences that lead to “simpler” center columns?
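It's easy to start experimenting with this kind of question—for example, comparing the first few center-column values obtained from a handful of different finite initial blocks:

(* center columns (first 31 values) for several finite initial blocks on a white background *)
Grid[Table[Flatten[CellularAutomaton[30, {init, 0}, {30, {{0}}}]], 
  {init, {{1}, {1, 1}, {1, 0, 1}, {1, 1, 0, 1, 1}}}]]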

Or are there infinite initial conditions generated by other computational systems (say substitution systems) that aren’t periodic, but still give somehow simple rule 30 patterns?

Then one can imagine going “beyond” rule 30. What happens if one adds longer-range “exceptions” to the rules? When do extensions to rule 30 show behavior that can be analyzed in one way or another? And can one then see the effect of removing the “exceptions” in the rule?

Of course, one can consider rules quite different from rule 30 as well—and perhaps hope to develop intuition or methods relevant to rule 30 by looking at other rules. Even among the 256 two-color nearest-neighbor rules, there are others that show complex behavior starting from a simple initial condition:

Row[Riffle[
  Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {150, All}], 
      PixelConstrained -> 1, Frame -> False], 
     Style[Text[StringTemplate["rule ``"][#]], 12], 
     LabelStyle -> Opacity[.5]] & /@ {45, 73}, Spacer[8]]]

And if one looks at larger numbers of colors and larger neighborhoods one can find an infinite number of examples. There's all sorts of behavior that one sees. And, for example, given any particular sequence, one can search for rules that will generate it as their center column. One can also try to classify the center-column sequences that one sees, perhaps identifying a general class “like rule 30” about which global statements can be made.

But let’s discuss the specific Rule 30 Prize Problems. To investigate the possibility of periodicity in rule 30 (as in Problem 1), one could study lots of different rules, looking for examples with very long periods, or very long transients—and try to use these to develop an intuition for how and when these can occur.

To investigate the equal-frequency phenomenon of Problem 2, one can look at different statistical features, and see both in rule 30 and across different rules when it’s possible to detect regularity.

For Problem 3, one can start looking at different levels of computational effort. Can one find the nth value with computational effort O(n^γ) for any γ < 2 (I don't know any method to achieve this)? Can one show that one can't find the nth value with less than O(log(n)) computational effort? What about with less than O(log(n)) available memory? What about for different rules? Periodic and nested patterns are easy to compute quickly. But what other examples can one find?

As I’ve mentioned, a big achievement would be to show computation universality for rule 30. But even if one can’t do it for rule 30, finding additional examples (beyond, for example, rule 110) will help build intuition about what might be going on in rule 30.

Then there’s NP-completeness. Is there a way of setting up some question about the behavior of rule 30 for some family of initial conditions where it’s possible to prove that the question is NP-complete? If this worked, it would be an exciting result for cryptography. And perhaps, again, one can build up intuition by looking at other rules, even ones that are more “purposefully constructed” than rule 30.

How Hard Are the Problems?

When I set up my 2,3 Turing Machine Prize in 2007 I didn’t know if it’d be solved in a month, a year, a decade, a century, or more. As it turned out, it was actually solved in about four months. So what will happen with the Rule 30 Prize Problems? I don’t know. After nearly 40 years, I’d be surprised if any of them could now be solved in a month (but it’d be really exciting if that happened!). And of course some superficially similar problems (like features of the digits of π) have been out there for well over a century.

It’s not clear whether there’s any sophisticated math (or computer science) that exists today that will be helpful in solving the problems. But I’m confident that whatever is built to solve them will provide structure that will be important for solving other problems about the computational universe. And the longer it takes (think Fermat’s last theorem), the larger the amount of useful structure is likely to be built on the way to a solution.

I don’t know if solutions to the problems will be “obviously correct” (it’ll help if they’re constructive, or presented in computable form), or whether there’ll be a long period of verification to go through. I don’t know if proofs will be comparatively short, or outrageously long. I don’t know if the solutions will depend on details of axiom systems (“assuming the continuum hypothesis”, etc.), or if they’ll be robust for any reasonable choices of axioms. I don’t know if the three problems are somehow “comparably difficult”—or if one or two might be solved, with the others holding out for a very long time.

But what I am sure about is that solving any of the problems will be a significant achievement. I’ve picked the problems to be specific, definite and concrete. But the issues of randomness and computational irreducibility that they address are deep and general. And to know the solutions to these problems will provide important evidence and raw material for thinking about these issues wherever they occur.

Of course, having lived now with rule 30 and its implications for nearly 40 years, I will personally be thrilled to know for certain even a little more about its remarkable behavior.

Just Published: Adventures of a Computational Explorer


Today my latest book is published: Adventures of a Computational Explorer.


From the preface:

“You work so hard… but what do you do for fun?” people will ask me. Well, the fact is that I’ve tried to set up my life so that the things I work on are things I find fun. Most of those things are aligned with big initiatives of mine, and with products and companies and scientific theories that I’ve built over decades. But sometimes I work on things that just come up, and that for one reason or another I find interesting and fun.

This book is a collection of pieces I’ve written over the past dozen years on some of these things, and the adventures I’ve had around them. Most of the pieces I wrote in response to some particular situation or event. Their topics are diverse. But it’s remarkable how connected they end up being. And at some level all of them reflect the paradigm for thinking that has defined much of my life.

It all centers around the idea of computation, and the generality of abstraction to which it leads. Whether I’m thinking about science, or technology, or philosophy, or art, the computational paradigm provides both an overall framework and specific facts that inform my thinking. And in a sense this book reflects the breadth of applicability of this computational paradigm.

But I suppose it also reflects something else that I’ve long cultivated in myself: a willingness and an interest in applying my ways of thinking to pretty much any topic. I sometimes imagine that I will have nothing much to add to some particular topic. But it’s remarkable how often the computational paradigm—and my way of thinking about it—ends up providing a new and different insight, or an unexpected way forward.

I often urge people to “keep their thinking apparatus engaged” even when they’re faced with issues that don’t specifically seem to be in their domains of expertise. And I make a point of doing this myself. It helps that the computational paradigm is so broad. But even at a much more specific level I’m continually amazed by how much the things I’ve learned from science or language design or technology development or business actually do end up connecting to the issues that come up.

If there’s one thing that I hope comes through from the pieces in this book it’s how much fun it can be to figure things out, and to dive deep into understanding particular topics and questions. Sometimes there’s a simple, superficial answer. But for me what’s really exciting is the much more serious intellectual exploration that’s involved in giving a proper, foundational answer. I always find it particularly fun when there’s a very practical problem to solve, but to get to a good solution requires an adventure that takes one through deep, and often philosophical, issues.

Inevitably, this book reflects some of my personal journey. When I was young I thought my life would be all about making discoveries in specific areas of science. But what I’ve come to realize—particularly having embraced the computational paradigm—is that the same intellectual thought processes can be applied not just to what one thinks of as science, but to pretty much anything. And for me there’s tremendous satisfaction in seeing how this works out.

The New World of Notebook Publishing

Wolfram Notebooks on the Web

We’ve been working towards it for many years, but now it’s finally here: an incredibly smooth workflow for publishing Wolfram Notebooks to the web—that makes possible a new level of interactive publishing and computation-enabled communication.

You create a Wolfram Notebook—using all the power of the Wolfram Language and the Wolfram Notebook system—on the desktop or in the cloud. Then you just press a button to publish it to the Wolfram Cloud—and immediately anyone anywhere can both read and interact with it on the web.

The new world of notebook publishing

It’s an unprecedentedly easy way to get rich, interactive, computational content onto the web. And—together with the power of the Wolfram Language as a computational language—it promises to usher in a new era of computational communication, and to be a crucial driver for the development of “computational X” fields.

When a Wolfram Notebook is published to the cloud, it’s immediately something people can read and interact with. But it’s more than that. Because if you press the Make Your Own Copy button, you’ll get your own copy of the notebook, which you can not only read and interact with, but also edit and do computation in, right on the web. And what this means is that the notebook becomes not just something you look at, but something you can immediately use and build on.

And, by the way, we’ve set it up so that anyone can make their own copy of a published notebook, and start using it; all they need is a (free) Cloud Basic account. And people with Cloud Basic accounts can even publish their own notebooks in the cloud, though if they want to store them long term they’ll have to upgrade their account. (Through the Wolfram Foundation, we’re also developing a permanent curated Notebook Archive for public-interest notebooks.)

Make Your Own Copy

There are lots of other important workflows too. On a computer, you can immediately download notebooks to your desktop, and run them there natively using the latest version of the Wolfram Player that we’ve made freely available for many years. You can also run notebooks natively on iOS devices using the Wolfram Player app. And the Wolfram Cloud app (on iOS or Android) gives you a streamlined way to make your own copy of a notebook to work with in the cloud.

Notebook workflows

You can publish a Wolfram Notebook to the cloud, and you can use it as a complete, rich webpage. But you can also embed the notebook inside an existing webpage, providing anything from a single (perhaps dynamically updated) graphic to a full interactive interface or embedded document.

And, by the way, the exact same technology that enables Wolfram Notebooks in the cloud also allows you to immediately set up Wolfram Language APIs or form interfaces, for use either directly on the web, or through client libraries in languages like Python and Java.
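
To give a sense of what that looks like, here’s a minimal sketch in the Wolfram Language (the deployment paths here are just examples, not anything standard):

(* deploy a public web API that squares a number *)
CloudDeploy[APIFunction[{"x" -> "Number"}, #x^2 &], "api/square", Permissions -> "Public"]

(* deploy the same computation as a web form instead *)
CloudDeploy[FormFunction[{"x" -> "Number"}, #x^2 &], "forms/square", Permissions -> "Public"]

The resulting URLs can then be called directly from the web, or through the client libraries mentioned above.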

The Story of Notebooks

The story of notebooks

We invented notebooks in 1988 as the main interface for Mathematica Version 1.0, and over the past three decades, many millions of Wolfram Notebooks have been made. Some record ongoing work, some are exercises, and some contain discoveries small and large. Some are expositions, presentations or online books and papers. Some are interactive demonstrations. And with the emergence of the Wolfram Language as a full-scale computational language, more and more now serve as rich computational essays, communicating with unprecedented effectiveness in a mixture of human language and computational language.

Over the years, we’ve progressively polished the notebook experience with a long series of user interface innovations, adapted and optimized for successive generations of desktop systems. But what’s allowed us now to do full-scale notebook publishing on the web is that—after many years of work—we’ve managed to get a polished version of Wolfram Notebooks running in the cloud, much as they do on desktop.

Create a notebook on the desktop or in the cloud, complete with all its code, hierarchical sections, interactive elements, large graphics, etc. When it’s published as a cloud notebook people will be able to visit it just like they would visit any webpage, except that it’ll automatically “come alive” and allow all sorts of interaction.

Some of that interaction will happen locally inside the web browser; some of it will automatically access servers in the cloud. But in the end—reflecting our whole hyperarchitecture approach—Wolfram Notebooks will run seamlessly across desktop, cloud and mobile. Create your content once, and let people not only read it anywhere, but also interact with it, as well as modify and compute with it.

What’s in a Notebook

When you first go to a Wolfram Notebook in the cloud it might look like an ordinary webpage. But the fact that it’s an active, computational document means there are lots of things you can immediately do with it. If you see a graphic, you’ll immediately be able to resize it. If it’s 3D, you’ll be able to rotate it too. Notebooks are typically organized in a hierarchy of cells, and you can immediately open and close groups of cells to navigate the hierarchy.

Notebook cell hierarchy

There can also be dynamic interactive content. In the Wolfram Language, functions like Manipulate automatically set up interactive user interfaces in notebooks, with sliders and so on—and these are automatically active in a published cloud notebook. Other content can be dynamic too: using functions like Dynamic you can for example dynamically pull data in real time from the Wolfram Knowledgebase or the Wolfram Data Drop or—if the user allows it—from their computer’s camera or microphone.

Dynamic interactive content
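
As a small illustration (the particular functions being shown are arbitrary), here’s the kind of input that stays live in a published cloud notebook:

(* an interactive interface with sliders, generated automatically by Manipulate *)
Manipulate[Plot[Sin[a x + b], {x, 0, 2 Pi}], {a, 1, 5}, {b, 0, 2 Pi}]

(* dynamic content that keeps updating on its own; here just a clock *)
Dynamic[Refresh[DateString[], UpdateInterval -> 1]]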

When you write a computational essay you typically want people to read your Wolfram Language code, because it’s part of how you’re communicating your content. But in a Wolfram Notebook you can also use Iconize to just show an iconized version of details of your code (like, say, options for how to display graphics):

Iconize
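
Here’s a sketch of the kind of thing described above (the particular options are arbitrary); the iconized list of options displays as a compact icon, but still evaluates as the options it contains:

Plot[Sin[x], {x, 0, 10}, Iconize[{PlotStyle -> Red, Filling -> Axis}]]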

Normally when you do a computation in a Wolfram Notebook, there’ll be a succession of In[ ] and Out[ ] cells. But you can always double-click the Out[ ] cell to close the In[ ] cell, so people at first just see the output, and not the computational language code that made it.

Input and output cells

One of the great things about the Wolfram Language is how integrated and self-contained it is. And that means that it’s realistic to pick up even fragments of code from anywhere in a notebook, and expect to have them work elsewhere. In a published notebook, just click a piece of code and it’ll get copied so you can paste it into a notebook you’re creating, on the cloud or the desktop.

A great source of “ready-made” interactive content for Wolfram Notebooks is the 12,000+ interactive Demonstrations in the Wolfram Demonstrations Project. Press Copy to Clipboard and you can paste the Demonstration (together with closed cells containing its code) into any notebook.

Copy Demonstration to clipboard

Once you’ve assembled the notebook you want, you can publish it. On the desktop, go to File > Publish to Cloud. In the cloud, just press Publish. You can either specify the name for the published notebook—or you can let the system automatically pick a UUID name. But you can take any notebook—even a large one—and very quickly have a published version in the cloud.
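
Programmatically, publishing comes down to something like this sketch (the notebook name here is just an example):

(* publish the current notebook, letting the system pick a UUID name *)
CloudPublish[]

(* or publish it under a name you choose *)
CloudPublish[EvaluationNotebook[], "my-computational-essay"]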

Computational Journals

It didn’t take long after we invented notebooks back in 1988 for me to start thinking about using them to enable a new kind of computational publishing, with things like computational journals and computational books. And, indeed, even very early on, there started to be impressive examples of what could be done.

But with computation tied to the desktop, there was always a limit to what could be done. Even before the web, we invented systems for distributing notebooks as desktop files. Later, when web browsers existed, we built plugins to access desktop computation capabilities from within browsers. And already in the mid-1990s we built mechanisms for generating content through web servers from within webpages. But it was only with modern web technology and with the whole framework of the Wolfram Cloud that the kind of streamlined notebook publishing that we’re releasing today has become possible.

But given what we now have, I think there’s finally an opportunity to transform things like scientific and technical publishing—and to make them truly take advantage of the computational paradigm. Yes, there can be rich interactive diagrams that anyone can use on the web. And, yes, things can be dynamically updated, for example based on real-time data from the Wolfram Knowledgebase or elsewhere.

But important as these things are, I think they ultimately pale in comparison to what Wolfram Notebooks can do for the usability and reproducibility of knowledge. Because a Wolfram Notebook doesn’t just give you something to read or even interact with; it can also give you everything you need to actually use—or reproduce—what it says.

Either directly within the notebook, or in the Wolfram Data Repository, or elsewhere in the cloud, there can for example be underlying data—say from observations or experiments. Then there can be code in the notebook that computes graphics or other outputs that can be derived from this data. And, yes, that code could be there just to be there—and could be hidden away in some kind of unreadable computational footnote.

But there’s something much more powerful that’s now uniquely possible with the Wolfram Language as it exists today: it’s possible to use the language not just to provide code for a computer to run, but also to express things in computational language in a way that not just computers, but also humans, can readily understand. Technical papers often use mathematical notation to succinctly express mathematical ideas. What we’ve been working toward all these years with the Wolfram Language is to provide a full-scale computational language that can also express computational ideas.

So let’s say you’ve got a technical paper that’s presented as a Wolfram Notebook, with lots of its content in the Wolfram Language. What can you do with it? You can certainly run the computational language code to make sure it produces what the paper says. But more important, you can take pieces of that computational language code and build on it, using it yourself in your own notebook, running it for different cases, modifying it, etc.

Of course, the fact that this can actually work in practice is incredibly nontrivial, and relies on a huge amount of unique development that we’ve done. Because first and foremost, it requires a coherently designed, full-scale symbolic computational language—because that’s the only way it’s realistic to be able to take even small fragments of code and have them work on their own, or in different situations. But there’s more too: it’s also critical that code that works now goes on working in the future, and with the design discipline we’ve had in the Wolfram Language we have an impressive history of compatibility spanning more than 30 years.

Back in the 1970s when I started writing technical papers, they typically began as handwritten documents. Later they were typed on a typewriter. Then when a journal was going to publish them, they would get copyedited and typeset, before being printed. It was a laborious—and inevitably somewhat expensive—process.

By the 1980s, personal computers with word processors and typesetting systems were becoming common—and pretty soon journals could expect “camera-ready” electronic versions of papers. (As it happens, in 1986 I started what may have been the first journal to routinely accept such things.)

And as the technology improved, the quality of what an author could readily make and what a publisher could produce in a fully typeset journal gradually converged, leaving the journal’s primary role as branding and selectivity, and leading many people to call its value into question.

But for computational journals it’s a new story. Because if a paper has computational language code in it, there’s the immediate question of whether the code actually runs, and runs correctly. It’s a little like the old process of copyediting a paper so it could be typeset. There’s real human work—and understanding—that’s needed to make sure the code runs correctly. The good news is that one can use methods from software quality assurance, now enhanced by things like modern machine learning. But there’s still real work to be done—and as a result there’s real value to be added by the process of “official publication” in a computational journal, and there’s a real reason to actually have a computational journal as an organized, potentially commercial, thing.

We’ve been doing review and curation of submissions to the Wolfram Demonstrations Project for a dozen years now. And, yes, it takes work. But the result is that we can be confident that the Demonstrations we publish actually run, and will go on doing so. For the Wolfram Data Repository we also have a review process, there to ensure that data is computable at an appropriate level.

One day there’ll surely be “first-run” computational journals, where new results are routinely reported through computational essays. But even before that, we can expect ancillary computational journals that provide genuine “computation-backed” and “data-backed” publication. There just hasn’t been the technology to make this work properly in the past. Now, with the Wolfram Language, and the new streamlined web publishing of Wolfram Notebooks, everything that’s needed finally exists.

Changing the Way I Work

It’s always a sign that something is important when it immediately changes the way one works. And that’s certainly something that’s happened for me with notebook publishing.

I might give a talk where I build up a notebook, say doing a live experiment or a demonstration. And then at the end of the talk, I’ll do something new: I’ll publish the notebook to the cloud (either by pressing the button or using CloudPublish). Then I’ll make a QR code of the notebook URL (say using BarcodeImage), and show it big on the screen. People in the audience can then hold up their phones to read the QR code—and then just click the URL, and immediately be able to start using my notebook in the Wolfram Cloud on their phones.
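
A minimal sketch of that workflow, assuming the notebook being presented is the current one:

(* publish the current notebook; the URL is the first element of the resulting CloudObject *)
obj = CloudPublish[EvaluationNotebook[]];

(* make a QR code of the URL to put up on the screen *)
BarcodeImage[First[obj], "QR"]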

I can tell that notebook publishing is getting me to write more, because now I have a convenient way to distribute what I write. I’ll often do computational explorations of one thing or another. And in the past, I’d just store the notebooks I made in my filesystem (and, yes, over 30+ years I’ve built up a huge number). But now it’s incredibly fast to add some text to turn the notebooks into computational essays—that I can immediately publish to the cloud, so anyone can access them.

Sometimes I’ll put a link to the published notebook in a post like this; sometimes I’ll do something like tweet it. But the point is that I now have a very streamlined way to give people direct access to computational work I do, in a form that they can immediately interact with, and build on.

From a technical development point of view, the path to where we are today has been a long and complex one, involving many significant achievements in software engineering. But the result is something conceptually clear and simple, though extremely powerful—that I think is going to enable a major new level of computation-informed communication: a new world of notebook publishing.


More about Wolfram Notebooks:

Wolfram Notebooks Overview »
Wolfram Notebooks Interactive Course »

A Few Thoughts about Deep Fakes

Someone from the House Permanent Select Committee on Intelligence recently contacted me about a hearing they’re having on the subject of deep fakes. I can’t attend the hearing, but the conversation got me thinking about the subject of deep fakes, and I made a few quick notes….

What You See May Not Be What Happened

The idea of modifying images is as old as photography. At first, it had to be done by hand (sometimes with airbrushing). By the 1990s, it was routinely being done with image manipulation software such as Photoshop. But it’s something of an art to get a convincing result, say for a person inserted into a scene. And if, for example, the lighting or shadows don’t agree, it’s easy to tell that what one has isn’t real.

What about videos? If one does motion capture, and spends enough effort, it’s perfectly possible to get quite convincing results—say for animating aliens, or for putting dead actors into movies. The way this works, at least to a first approximation, is to painstakingly pick out the keypoints on one face, and map them onto another.

What’s new in the past couple of years is that this process can basically be automated using machine learning. And, for example, there are now neural nets that are simply trained to do “face swapping”:

Face swap

In essence, what these neural nets do is to fit an internal model to one face, and then apply it to the other. The parameters of the model are in effect learned from looking at lots of real-world scenes, and seeing what’s needed to reproduce them. The current approaches typically use generative adversarial networks (GANs), in which there’s iteration between two networks: one trying to generate a result, and one trying to discriminate that result from a real one.

Today’s examples are far from perfect, and it’s not too hard for a human to tell that something isn’t right. But even just as a result of engineering tweaks and faster computers, there’s been progressive improvement, and there’s no reason to think that within a modest amount of time it won’t be possible to routinely produce human-indistinguishable results.

Can Machine Learning Police Itself?

OK, so maybe a human won’t immediately be able to tell what’s real and what’s not. But why not have a machine do it? Surely there’s some signature of something being “machine generated”. Surely there’s something about a machine-generated image that’s statistically implausible for a real image.

Well, not necessarily. Because, in fact, the whole way the machine images are generated is by having models that as faithfully as possible reproduce the “statistics” of real images. Indeed, inside a GAN there’s explicitly a “fake or not” discriminator. And the whole point of the GAN is to iterate until the discriminator can’t tell the difference between what’s being generated, and something real.

Could one find some other feature of an image that the GAN isn’t paying attention to—like whether a face is symmetric enough, or whether writing in the background is readable? Sure. But at this level it’s just an arms race: having identified a feature, one puts it into the model the neural net is using, and then one can’t use that feature to discriminate any more.

There are limitations to this, however. Because there’s a limit to what a typical neural net can learn. Generally, neural nets do well at tasks like image recognition that humans do without thinking. But it’s a different story if one tries to get neural nets to do math, and for example factor numbers.

Imagine that in modifying a video one has to fill in a background that’s showing some elaborate computation—say a mathematical one. Well, then a standard neural net basically doesn’t stand a chance.

Will it be easy to tell that it’s getting it wrong? It could be. If one’s dealing with public-key cryptography, or digital signatures, one can certainly imagine setting things up so that it’s very hard to generate something that is correct, but easy to check whether it is.

But will this kind of thing show up in real images or videos? My own scientific work has actually shown that irreducibly complex computation can be quite ubiquitous even in systems with very simple rules—and presumably in many systems in nature. Watch a splash in water. It takes a complex computation to figure out the details of what’s going to happen. And while a neural net might be able to get something that basically looks like a splash, it’d be vastly harder for it to get the details of a particular splash right.

But even though in the abstract computational irreducibility may be common, we humans, in our evolution and the environments we set up for ourselves, tend to end up doing our best to avoid it. We have shapes with smooth curves. We build things with simple geometries. We try to make things evolvable or understandable.  And it’s this avoidance of computational irreducibility that makes it feasible for neural nets to successfully model things like the visual scenes in which we typically find ourselves.

One can disrupt this, of course. Just put in the picture a display that’s showing some sophisticated computation (even, for example, a cellular automaton). If someone tries to fake some aspect of this with a neural net, it won’t (at least on its own) feasibly be able to get the details right.
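
For example, here’s the kind of computationally irreducible pattern such a display might show, generated by a simple cellular automaton rule (rule 30):

(* 200 steps of the rule 30 cellular automaton, starting from a single black cell *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]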

I suspect that in the future of human technology—as we mine deeper in the computational universe—irreducible computation will be much more common in what we build. But as of now, it’s still rare in typical human-related situations. And as a result, we can expect that neural nets will successfully be able to model what’s going on well enough to at least fool other neural nets.

How to Know What’s Real

So if there’s no way to analyze the bits in an image to tell if it’s a real photograph, does that mean we just can’t tell? No. Because we can also think about metadata associated with the image—and about the provenance of the image. When was the image created? By whom? And so on.

So let’s say we create an image. How can we set things up so that we can prove when we did it? Well, in modern times it’s actually very easy. We take the image, and compute a cryptographic hash from it (effectively by applying a mathematical operation that derives a number from the bits in the image). Then we take this hash and put it on a blockchain.

The blockchain acts as a permanent ledger. Once we’ve put data on it, it can never be changed, and we can always go back and see what the data was, and when it was added to the blockchain.

This setup lets us prove that the image was created no later than a certain time. If we want to prove that the image wasn’t created earlier, then when we create the hash for the image, we can throw in a hash from the latest block on our favorite blockchain.
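
Here’s a minimal sketch of the scheme in the Wolfram Language (img is assumed to be the image in question; the association keys are arbitrary, and the details will depend on which blockchain one uses):

(* a cryptographic hash derived from the image *)
imageHash = Hash[img, "SHA256"];

(* data about the most recent block on a public blockchain, to show the record could not have been made earlier *)
latestBlock = BlockchainBlockData[-1, BlockchainBase -> "Bitcoin"];

(* write the combined record to a blockchain as a permanent timestamp *)
BlockchainPut[<|"ImageHash" -> imageHash, "LatestBlock" -> Hash[latestBlock, "SHA256"]|>]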

OK, but what about knowing who created the image? It takes a bit of cryptographic infrastructure—very similar to what’s done in proving the authenticity of websites. But if one can trust some “certificate authority” then one can associate a digital signature to the image that validates who created it.

But how about knowing where the image was taken? Assuming one has a certain level of access to the device or the software, GPS can be spoofed. If one records enough about the environment when the image was taken, then it gets harder and harder to spoof. What were the nearby Wi-Fi networks? The Bluetooth pings? The temperature? The barometric pressure? The sound level? The accelerometer readings? If one has enough information collected, then it becomes easier to tell if something doesn’t fit.

There are several ways one could do this. Perhaps one could just detect anomalies using machine learning. Or perhaps one could use actual models of how the world works (the path implied by the accelerometer isn’t consistent with the equations of mechanics, etc.). Or one could somehow tie the information to some public computational fact. Was the weather really like that in the place the photo was said to be taken? Why isn’t there a shadow from such-and-such a plane going overhead? Why is what’s playing on the television not what it should be? Etc.
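
As a tiny sketch of the machine-learning approach (with made-up stand-in data rather than real sensor readings):

(* mostly consistent synthetic "readings", with a few outliers mixed in *)
sensorReadings = Join[RandomVariate[NormalDistribution[0, 1], {200, 3}], RandomVariate[NormalDistribution[10, 1], {3, 3}]];

(* pick out the readings that don't fit the distribution of the rest *)
FindAnomalies[sensorReadings]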

But, OK, even if one just restricts oneself to creation time and creator ID, how can one in practice validate them?

The best scheme seems to be something like how modern browsers handle website security. The browser tries to check the cryptographic signature of the website. If it matches, the browser shows something to say the website is secure; if not, it shows some kind of warning.

So let’s say an image comes with data on its creation time and creator ID. The data could be metadata (say EXIF data), or it could be a watermark imprinted on the detailed bits in the image. Then the image viewer (say in the browser) can check whether the hash on a blockchain agrees with what the data provided by the image implies. If it does, fine. And the image viewer can make the creation time and creator ID available. If not, the image viewer should warn the user that something seems to be wrong.
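
The core of such a check could be as simple as this sketch (the function and its arguments are hypothetical; in practice the claimed hash would come from the image’s metadata or watermark, and the reference hash from the blockchain):

verifyImage[file_String, blockchainHash_Integer] := FileHash[file, "SHA256"] === blockchainHash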

Exactly the same kind of thing can be done with videos. It just requires video players computing hashes on the video, and comparing to what’s on a blockchain. And by doing this, one can guarantee, for example, that one’s seeing a whole video that was made at a certain time.

How would this work in practice? Probably people often wouldn’t want to see all the raw video taken at some event. But a news organization, for example, could let people click through to it if they wanted. And one can easily imagine digital signature mechanisms that could be used to guarantee that an edited video, for example, contained no content not in certain source videos, and involved, say, specified contiguous chunks from these source videos.

The Path Forward

So, where does this leave us with deep fakes? Machine learning on its own won’t save us. There’s not going to be a pure “fake or not” detector that can run on any image or video. Yes, there’ll be ways to protect oneself against being “faked” by doing things like wearing a live cellular automaton tie. But the real way to combat deep fakes, I think, is to use blockchain technology—and to store on a public ledger cryptographic hashes of both images and sensor data from the environment where the images were acquired. The very presence of a hash can guarantee when an image was acquired; “triangulating” from sensor and other data can give confidence that what one is seeing was something that actually happened in the real world.

Of course, there are lots of technical details to work out. But in time I’d expect image and video viewers could routinely check against blockchains (and “data triangulation computations”), a bit like how web browsers now check security certificates. And today’s “pics or it didn’t happen” will turn into “if it’s not on the blockchain it didn’t happen”.

My Part in an Origin Story: The Launching of the Santa Fe Institute

The first workshop to define what is now the Santa Fe Institute took place on October 5–6, 1984. I was recently asked to give some reminiscences of the event, for a republication of a collection of papers derived from this and subsequent workshops.

It was a slightly dark room, decorated with Native American artifacts. Around it were tables arranged in a large rectangle, at which sat a couple dozen men (yes, all men), mostly in their sixties. The afternoon was wearing on, with many different people giving their various views about how to organize what amounted to a putative great new interdisciplinary university.

Here’s the original seating chart, together with a current view of the meeting room. (I’m only “Steve” to Americans currently over the age of 60…):

Santa Fe seating chart
Dobkin Boardroom

I think I was less patient in those days. But eventually I could stand it no longer. I don’t remember my exact words, but they boiled down to: “What are you going to do if you only raise a few million dollars, not two billion?” It was a strange moment. After all, I was by far the youngest person there—at 25 years old—and yet it seemed to have fallen to me to play the “let’s get real” role. (To be fair, I had founded my first tech company a couple of years earlier, and wasn’t a complete stranger to the world of grandiose “what-if” discussions, even if I was surprised, though more than a little charmed, to be seeing them in the sixty-something-year-old set.)

A fragment of my notes from the day records my feelings:

What is supposed to be the point of this discussion?

George Cowan (Manhattan Project alum, Los Alamos administrator, and founder of the Los Alamos Bank) was running the meeting, and I sensed a mixture of frustration and relief at my question. I don’t remember precisely what he said, but it boiled down to: “Well, what do you think we should do?” “Well”, I said, “I do have a suggestion”. I summarized it a bit, but then it was agreed that later that day I should give a more formal presentation. And that’s basically how I came to suggest that what would become the Santa Fe Institute should focus on what I called “Complex Systems Theory”.

Of course, there was a whole backstory to this. It basically began in 1972, when I was 12 years old, and saw the cover of a college physics textbook that purported to show an arrangement of simulated colliding molecules progressively becoming more random. I was fascinated by this phenomenon, and quite soon started trying to use a computer to understand it. I didn’t get too far with this. But it was the golden age of particle physics, and I was soon swept up in publishing papers about a variety of topics in particle physics and cosmology.

Still, in all sorts of different ways I kept on coming back to my interest in how randomness—or complexity—gets produced. In 1978 I went to Caltech as a graduate student, with Murray Gell-Mann (inventor of quarks, and the first chairman of the Santa Fe Institute) doing his part to recruit me by successfully tracking down a phone number for me in England. Then in 1979, as a way to help get physics done, I set about building my first large-scale computer language. In 1981, the first version was finished, I was installed as a faculty member at Caltech—and I decided it was time for me to try something more ambitious, and really see what I could figure out about my old interest in randomness and complexity.

By then I had picked away at many examples of complexity. In self-gravitating gases. In dendritic crystal growth. In road traffic flow. In neural networks. But the reductionist physicist in me wanted to drill down and find out what was underneath all these. And meanwhile the computer language designer in me thought, “Let’s just invent something and see what can be done with it”. Well, pretty soon I invented what I later found out were called cellular automata.

I didn’t expect that simple cellular automata would do anything particularly interesting. But I decided to try computer experiments on them anyway. And to my great surprise I discovered that—despite the simplicity of their construction—cellular automata can in fact produce behavior of great complexity. It’s a major shock to traditional scientific intuition—and, as I came to realize in later years, a clue to a whole new kind of science.

But for me the period from 1981 to 1984 was an exciting one, as I began to explore the computational universe of simple programs like cellular automata, and saw just how rich and unexpected it was. David Pines, as the editor of Reviews of Modern Physics, had done me the favor of publishing my first big paper on cellular automata (John Maddox, editor of Nature, had published a short summary a little earlier). Through the Center for Nonlinear Studies, I had started making visits to Los Alamos in 1981, and I initiated and co-organized the first-ever conference devoted to cellular automata, held at Los Alamos in 1983.

In 1983 I had left Caltech (primarily as a result of an unhappy interaction about intellectual property rights) and gone to the Institute for Advanced Study in Princeton, and begun to build a group there concerned with studying the basic science of complex systems. I wasn’t sure until quite a few years later just how general the phenomena I’d seen in cellular automata were. But I was pretty certain that there were at least many examples of complexity across all sorts of fields that they’d finally let one explain in a fundamental, theoretical way.

I’m not sure when I first heard about plans for what was then called the Rio Grande Institute. But I remember not being very hopeful about it; it seemed too correlated with the retirement plans of a group of older physicists. But meanwhile, people like Pete Carruthers (director of T Division at Los Alamos) were encouraging me to think about starting my own institute to pursue the kind of science I thought could be done.

I didn’t know quite what to make of the letter I received in July 1984 from Nick Metropolis (long-time Los Alamos scientist, and inventor of the Metropolis method). It described the nascent Rio Grande Institute as “a teaching and research institution responsive to the challenge of emerging new syntheses in science”. Murray Gell-Mann had told me that it would bring together physics and archaeology, linguistics and cosmology, and more. But at least in the circulated documents, the word “complexity” appeared quite often.

Letter from Los Alamos

The invitation described the workshop as being “to examine a concept for a fresh approach to research and teaching in rapidly developing fields of scientific activity dealing with highly complex, interactive systems”. Murray Gell-Mann, who had become a sort of de facto intellectual leader of the effort, was given to quite flowery descriptions, and declared that the institute would be involved with “simplicity and complexity”.

When I arrived at the workshop it was clear that everyone wanted their favorite field to get a piece of the potential action. Should I even bring up my favorite emerging field? Or should I just make a few comments about computers and let the older guys do their thing?

As I listened to the talks and discussions, I kept wondering how what I was studying might relate to them. Quite often I really didn’t know. At the time I still believed, for example, that adaptive systems might have fundamentally different characteristics. But still, the term “complexity” kept on coming up. And if the Rio Grande Institute needed an area to concentrate on, it seemed that a general study of complexity would be the closest to being central to everything they were talking about.

I’m not sure quite what the people in the room made of my speech about “complex systems theory”. But I think I did succeed in making the point that there really could be a general “science of complexity”—and that things like cellular automata could show one how it might work. People had been talking about the complexity of this, or the complexity of that. But it seemed like I’d at least started the process of getting people to talk about complexity as an abstract thing one could expect to have general theories about.

After that first workshop, I had a few more interactions with what was to be the Santa Fe Institute. I still wasn’t sure what was going to happen with it—but the “science of complexity” idea did seem to be sticking. Meanwhile, however, I was forging ahead with my own plans to start a complex systems institute (I avoided the term “complexity theory” out of deference to the rather different field of computational complexity theory). I was talking to all sorts of universities, and in fact David Pines was encouraging me to consider the University of Illinois.

George Cowan asked me if I’d be interested in running the research program for the Santa Fe Institute, but by that point I was committed to starting my own operation, and it wasn’t long afterwards that I decided to do it at the University of Illinois. My Center for Complex Systems Research—and my journal Complex Systems—began operations in the summer of 1986.

Complex Systems

I’m not sure how things would have been different if I’d ended up working with the Santa Fe Institute. But as it was, I rather quickly tired of the effort to raise money for complex systems research, and I was soon off creating what became Mathematica (and now the Wolfram Language), and starting my company Wolfram Research.

By the early 1990s, probably in no small part through the efforts of the Santa Fe Institute, “complexity” had actually become a popular buzzword, and, partly through a rather circuitous connection to climate science, funding had started pouring in. But having launched Mathematica and my company, I’d personally pretty much vanished from the scene, working quietly on using the tools I’d created to pursue my interests in basic science. I thought it would only take a couple of years, but in the end it took more than a decade.

I discovered a lot—and realized that, yes, the phenomena I’d first seen with cellular automata and talked about at the Santa Fe workshop were indeed a clue to a whole new kind of science, with all sorts of implications for long-standing problems and for the future. I packaged up what I’d figured out—and in 2002 published my magnum opus A New Kind of Science.

A New Kind of Science

It was strange to reemerge after a decade and a half away. The Santa Fe Institute had continued to pursue the science of complexity. As something of a hermit in those years, I hadn’t interacted with it—but there was curiosity about what I was doing (highlighted, if nothing else, by a bizarre incident in 1998 involving “leaks” about my research). When my book came out in 2002 I was pleased that I thought I’d actually done what I talked about doing back at that Santa Fe workshop in 1984—as well as much more.

But by then almost nobody who’d been there in 1984 was still involved with the Santa Fe Institute, and instead there was a “new guard” (now, I believe, again departed), who, far from being pleased with my progress and success in broadening the field, actually responded with rather unseemly hostility.

It’s been an interesting journey from those days in October 1984. Today complex systems research is very definitely “a thing”, and there are hundreds of “complex systems” institutes around the world. (Though I still don’t think the basic science of complexity, as opposed to its applications, has received the attention it should.) But the Santa Fe Institute remains the prototypical example—and it’s not uncommon when I talk about complexity research for people to ask, “Is that like what the Santa Fe Institute does?”

“Well actually”, I sometimes say, “there’s a little footnote to history about that”. And off I go, talking about that Saturday afternoon back in October 1984—when I could be reached (as the notes I distributed said) through that newfangled thing called email at ias!swolf

Stephen Wolfram’s notes on complex systems

Testifying at the Senate about A.I.‑Selected Content on the Internet

Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms

An Invitation to Washington

Three and a half weeks ago I got an email asking me if I’d testify at a hearing of the US Senate Commerce Committee’s Subcommittee on Communications, Technology, Innovation and the Internet. Given that the title of the hearing was “Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms” I wasn’t sure why I’d be relevant.

But then the email went on: “The hearing is intended to examine, among other things, whether algorithmic transparency or algorithmic explanation are policy options Congress should be considering.” That piqued my interest, because, yes, I have thought about “algorithmic transparency” and “algorithmic explanation”, and their implications for the deployment of artificial intelligence.

Generally I stay far away from anything to do with politics. But figuring out how the world should interact with AI is really important. So I decided that—even though it was logistically a bit difficult—I should do my civic duty and go to Washington and testify.

Watch the Senate hearing:
Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms »

Understanding the Issues

So what was the hearing really about? For me, it was in large measure an early example of reaction to the realization that, yes, AIs are starting to run the world. Billions of people are being fed content that is basically selected for them by AIs, and there are mounting concerns about this, as reported almost every day in the media.

Are the AIs cleverly hacking us humans to get us to behave in a certain way? What kind of biases do the AIs have, relative to what the world is like, or what we think the world should be like? What are the AIs optimizing for, anyway? And when are there actually “humans behind the curtain”, controlling in detail what the AIs are doing?

It doesn’t help that in some sense the AIs are getting much more free rein than they might because the people who use them aren’t really their customers. I have to say that back when the internet was young, I personally never thought it would work this way, but in today’s world many of the most successful businesses on the internet—including Google, Facebook, YouTube and Twitter—make their revenue not from their users, but instead from advertisers who are going through them to reach their users.

All these businesses also have in common that they are fundamentally what one can call “automated content selection businesses”: they work by getting large amounts of content that they didn’t themselves generate, then using what amounts to AI to automatically select what content to deliver or to suggest to any particular user at any given time—based on data that they’ve captured about that user. Part of what’s happening is presumably optimized to give a good experience to their users (whatever that might mean), but part of it is also optimized to get revenue from the actual customers, i.e. advertisers. And there’s also an increasing suspicion that somehow the AI is biased in what it’s doing—maybe because someone explicitly made it so, or because it somehow evolved that way.

“Open Up the AI”?

So why not just “open up the AI” and see what it’s doing inside? Well, that’s what the algorithmic transparency idea mentioned in the invitation to the hearing is about.

And the problem is that, no, that can’t work. If we want to seriously use the power of computation—and AI—then inevitably there won’t be a “human-explainable” story about what’s happening inside.

So, OK, if you can’t check what’s happening inside the AI, what about putting constraints on what the AI does? Well, to do that, you have to say what you want. What rule for balance between opposing kinds of views do you want? How much do you allow people to be unsettled by what they see? And so on.

And there are two problems here: first, what to want, and, second, how to describe it. In the past, the only way we could imagine describing things like this was with traditional legal rules, written in legalese. But if we want AIs to automatically follow these rules, perhaps billions of times a second, that’s not good enough: instead, we need something that AIs can intrinsically understand.

And at least on this point I think we’re making good progress. Because—thanks to our 30+ years of work on Wolfram Language—we’re now beginning to have a computational language that has the scope to formulate “computational contracts” that can specify relevant kinds of constraints in computational terms, in a form that humans can write and understand, and that machines can automatically interpret.

But even though we’re beginning to have the tools, there’s still the huge question of what the “computational laws” for automatic content selection AIs will be.

A lot of the hearing ultimately revolved around Section 230 of the 1996 Communications Decency Act—which specifies what kinds of content companies can choose to block without losing their status as “neutral platforms”. There’s a list of fairly uncontroversially blockable kinds of content. But then the sentence ends with “or otherwise objectionable [content]”. What does this mean? Does it mean content that espouses objectionable points of view? Whose definition of “objectionable”? Etc.

Well, one day things like Section 230 will, of necessity, not be legalese laws, but computational laws. There’ll be some piece of computational language that specifies for example that this-or-that machine learning classifier trained on this-or-that sample of the internet will be used to define this or that.

We’re not there yet, however. We’re only just beginning to be able to set up computational contracts for much simpler things, like business situations. And—somewhat fueled by blockchain—I expect that this will accelerate in the years to come. But it’s going to be a while before the US Senate is routinely debating lines of code in computational laws.

So, OK, what can be done now?

A Possible Path Forward?

A little more than a week ago, what I’d figured out was basically what I’ve already described here. But that meant I was looking at going to the hearing and basically saying only negative things. “Sorry, this won’t work. You can’t do that. The science says it’s impossible. The solution is years in the future.” Etc.

And, as someone who prides himself on turning the seemingly impossible into the possible, this didn’t sit well with me. So I decided I’d better try to figure out if I could actually see a pragmatic, near-term path forward. At first, I tried thinking about purely technological solutions. But soon I basically convinced myself that no such solution was going to work.

So, with some reticence, I decided I’d better start thinking about other kinds of solutions. Fortunately there are quite a few people at my company and in my circle who I could talk to about this—although I soon discovered they often had strongly conflicting views. But after a little while, a glimmer of an idea emerged.

Why does every aspect of automated content selection have to be done by a single business? Why not open up the pipeline, and create a market in which users can make choices for themselves?

One of the constraints I imposed on myself is that my solution couldn’t detract from the impressive engineering and monetization of current automated content selection businesses. But I came up with at least two potential ways to open things up that I think could still perfectly well satisfy this constraint.

One of my ideas involved introducing what I call “final ranking providers”: third parties who take pre-digested feature vectors from the underlying content platform, then use these to do the final ranking of items in whatever way they want. My other idea involved introducing “constraint providers”: third parties who provide constraints in the form of computational contracts that are inserted into the machine learning loop of the automated content selection system.
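
To make the first of these concrete, here’s a toy sketch of what a “final ranking provider” might do with the feature vectors it receives (the weights, item IDs and feature vectors are all hypothetical):

(* score each candidate item with the provider's own weights, and return the item IDs in ranked order *)
weights = {0.5, 0.3, 0.2};
rankItems[features_Association] := Keys[ReverseSort[(#.weights) & /@ features]]

rankItems[<|"item1" -> {0.9, 0.1, 0.3}, "item2" -> {0.2, 0.8, 0.5}|>]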

The important feature of both these solutions is that users don’t have to trust the single AI of the automated content selection business. They can in effect pick their own brand of AI—provided by a third party they trust—to determine what content they’ll actually be given.

Who would these third-party providers be? They might be existing media organizations, or nonprofits, or startups. Or they might be something completely new. They’d have to have some technical sophistication. But fundamentally what they’d have to do is to define—or represent—brands that users would trust to decide what the final list of items in their news feed, or video recommendations, or search results, or whatever, might be.

Social networks get their usefulness by being monolithic: by having “everyone” connected into them. But the point is that the network can prosper as a monolithic thing, but there doesn’t need to be just one monolithic AI that selects content for all the users on the network. Instead, there can be a whole market of AIs, that users can freely pick between.

And here’s another important thing: right now there’s no consistent market pressure on the final details of how content is selected for users, not least because users aren’t the final customers. (Indeed, pretty much the only pressure right now comes from PR eruptions and incidents.) But if the ecosystem changes, and there are third parties whose sole purpose is to serve users, and to deliver the final content they want, then there’ll start to be real market forces that drive innovation—and potentially add more value.

Could It Work?

AI provides powerful ways to automate the doing of things. But AIs on their own can’t ultimately decide what they want to do. That has to come from outside—from humans defining goals. But at a practical level, where should those goals be set? Should they just come—monolithically—from an automated content selection business? Or should users have more freedom, and more choice?

One might say: “Why not let every user set everything for themselves?”. Well, the problem with that is that automated content selection is a complicated matter. And—much as I hope that there’ll soon be very widespread computational language literacy—I don’t think it’s realistic that everyone will be able to set up everything in detail for themselves. So instead, I think the better idea is to have discrete third-party providers, who set things up in a way that appeals to some particular group of users.

Then standard market forces can come into play. No doubt the result would even be a greater level of overall success for the delivery of content to users who want it (and monetize it). But this market approach also solves some other problems associated with the “single point of failure” monolithic AI.

For example, with the monolithic AI, if someone figures out how to spread some kind of bad content, it’ll spread everywhere. With third-party providers, there’s a good chance it’ll only spread through some of them.

Right now there’s lots of unhappiness about people simply being “banned” from particular content platforms. But with the market of third-party providers, banning is not an all-or-nothing proposition anymore: some providers could ban someone, but others might not.

OK, but are there “fatal flaws” with my idea? People could object that it’s technically difficult to do. I don’t know the state of the codebases inside the major automated content selection businesses. But I’m certain that with manageable effort, appropriate APIs etc. could be set up. (And it might even help these businesses by forcing some code cleanup and modernization.)

Another issue might be: how will the third-party providers be incentivized? I can imagine some organizations just being third-party providers as a public service. But in other cases they’d have to be paid a commission by the underlying content platform. The theory, though, is that good work by third-party content providers would expand the whole market, and make them “worth their commission”. Plus, of course, the underlying content platforms could save a lot by not having to deal with all those complaints and issues they’re currently getting.

What if there’s a third-party provider that upranks content some people don’t like? That will undoubtedly happen. But the point is that this is a market—so market dynamics can operate.

Another objection is that my idea would worsen the tendency of modern technology to put people inside “content bubbles” where they never broaden their points of view. Well, of course, there can be providers who offer broader content. And, yes, people could still choose “content bubble” providers. The good thing, though, is that they’re choosing them, and they know they’re doing that, just like they know they’re choosing to watch one television channel and not another.

Of course it’s important for the operation of society that people have some level of shared values. But what should those shared values be, and who should decide them? In a totalitarian system, it’s basically the government. Right now, with the current monolithic state of automated content selection, one could argue it’s the automated content selection businesses.

If I were running one of those businesses, I’d certainly not want to get set up as the moral arbiter for the world; it seems like a no-win role. With the third-party providers idea, there’s a way out, without damaging the viability of the business. Yes, users get more control, as arguably they should have, given that they are the fuel that makes the business work. But the core business model is still perfectly intact. And there’s a new market that opens up, for third-party providers, potentially delivering all sorts of new economic value.

What Should I Do?

At the beginning of last weekend, what I just described was basically the state of my thinking. But what should I do with it? Was there some issue I hadn’t noticed? Was I falling into some political or business trap? I wasn’t sure. But it seemed as if some idea in this area was needed, and I had an idea, so I really should tell people about it.

So I quickly wrote up the written testimony for the hearing, and sent it in by the deadline on Sunday morning. (The full text of the testimony is included at the end of this piece.)

Stephen Wolfram's written testimony

The Hearing Itself

View of the Senate

This morning was the hearing itself. It was in the same room as the hearing Mark Zuckerberg did last fall. The staffers were saying that they expected a good turnout of senators, and that of the 24 senators on the subcommittee (out of 100 total in the Senate), they expected about 15 to show up at some point or another.

At the beginning, staffers were putting out nameplates for the senators. I was trying to figure out what the arrangement was. And then I realized! It was a horseshoe configuration and Republican senators were on the right side of the horseshoe, Democrats were on the left. There really are right and left wings! (Yes, I obviously don’t watch C-SPAN enough, or I’d already know that.)

When the four of us on the panel were getting situated, one of the senators (Marsha Blackburn [R-TN]) wandered up, and started talking about computational irreducibility. Wow, I thought, this is going to be interesting. That’s a pretty abstruse science concept to be finding its way into the Senate.

Everyone had five minutes to give opening remarks, and everyone had a little countdown timer in front of them. I talked a bit about the science and technology of AI and explainability. I mentioned computational contracts and the concept of an AI Constitution. Then I said I didn’t want to just explain that everything was impossible—and gave a brief summary of my ideas for solutions. Rather uncharacteristically for me, I ended a full minute before my time was up.

The format for statements and questions was five minutes per senator. The issues raised were quite diverse. I quickly realized, though, that it was unfortunate that I really had three different things I was talking about (non-explainability, computational laws, and my ideas for a near-term solution). In retrospect perhaps I should have concentrated on the near-term solution, but it felt odd to be emphasizing something I just thought of last week, rather than something I’ve thought about for many years.

Still, it was fascinating—and a sign of things to come—to see serious issues about what amounts to the philosophy of computation being discussed in the Senate. To be fair, I had done a small hearing at the Senate back in 2003 (my only other such experience) about the ideas in A New Kind of Science. But then it had been very much on the “science track”; now the whole discussion was decidedly mainstream.

I couldn’t help thinking that I was witnessing the concept of computation beginning to come of age. What used to be esoteric issues in the theory of computation were now starting to be things that senators were discussing writing laws about. One of the senators mentioned atomic energy, and compared it to AI. But really, AI is going to be something much more central to the whole future of our species.

It enables us to do so much. And yet it forces us to confront what we want to do, and who we want to be. Today it’s rare and exotic for the Senate to be discussing issues of AI. In time I suspect AI and its many consequences will be a dominant theme in many Senate discussions. This is just the beginning.

I wish we were ready to really start creating an AI Constitution. But we’re not (and it doesn’t help that we don’t have an AI analog of the few thousand years of human political history that were available as a guide when the US Constitution was drafted). Still, issue by issue I suspect we’ll move closer to the point where having a coherent AI Constitution becomes a necessity. No doubt there’ll be different ones in different communities and different countries. But one day a group like the one I saw today—with all the diverse and sometimes colorful characters involved—will end up having to figure out just how we humans interact with AI and the computational world.


The Written Testimony

Download PDF

Summary

Automated content selection by internet businesses has become progressively more contentious—leading to calls to make it more transparent or constrained. I explain some of the complex intellectual and scientific problems involved, then offer two possible technical and market suggestions for paths forward. Both are based on giving users a choice about who to trust for the final content they see—in one case introducing what I call “final ranking providers”, and in the other case what I call “constraint providers”.

The Nature of the Problem

There are many kinds of businesses that operate on the internet, but some of the largest and most successful are what one can call automated content selection businesses. Facebook, Twitter, YouTube and Google are all examples. All of them deliver content that others have created, but a key part of their value is associated with their ability to (largely) automatically select what content they should serve to a given user at a given time—whether in news feeds, recommendations, web search results, or advertisements.

What criteria are used to determine content selection? Part of the story is certainly to provide good service to users. But the paying customers for these businesses are not the users, but advertisers, and necessarily a key objective of these businesses must be to maximize advertising income. Increasingly, there are concerns that this objective may have unacceptable consequences in terms of content selection for users. And in addition there are concerns that—through their content selection—the companies involved may be exerting unreasonable influence in other kinds of business (such as news delivery), or in areas such as politics.

Methods for content selection—using machine learning, artificial intelligence, etc.—have become increasingly sophisticated in recent years. A significant part of their effectiveness—and economic success—comes from their ability to use extensive data about users and their previous activities. But there has been increasing dissatisfaction and, in some cases, suspicion about just what is going on inside the content selection process.

This has led to a desire to make content selection more transparent, and perhaps to constrain aspects of how it works. As I will explain, these are not easy things to achieve in a useful way. In fact, they run into deep intellectual and scientific issues that are in some ways a foretaste of problems we will encounter ever more broadly as artificial intelligence becomes more central to the things we do. Satisfactory ultimate solutions will be difficult to develop, but I will suggest here two near-term practical approaches that I believe significantly address current concerns.

How Automated Content Selection Works

Whether one’s dealing with videos, posts, webpages, news items or, for that matter, ads, the underlying problem of automated content selection (ACS) is basically always the same. There are many content items available (perhaps even billions of them), and somehow one has to quickly decide which ones are “best” to show to a given user at a given time. There’s no fundamental principle to say what “best” means, but operationally it’s usually in the end defined in terms of what maximizes user clicks, or revenue from clicks.

The major innovation that has made modern ACS systems possible is the idea of automatically extrapolating from large numbers of examples. The techniques have evolved, but the basic idea is to effectively deduce a model of the examples and then to use this model to make predictions, for example about what ranking of items will be best for a given user.

Because it will be relevant for the suggestions I’m going to make later, let me explain here a little more about how most current ACS systems work in practice. The starting point is normally to extract a collection of perhaps hundreds or thousands of features (or “signals”) for each item. If a human were doing it, they might use features like: “How long is the video? Is it entertainment or education? Is it happy or sad?” But these days—with the volume of data that’s involved—it’s a machine doing it, and often it’s also a machine figuring out what features to extract. Typically the machine will optimize for features that make its ultimate task easiest—whether or not (and it’s almost always not) there’s a human-understandable interpretation of what the features represent.

As an example, here are the letters of the alphabet automatically laid out by a machine in a “feature space” in which letters that “look similar” appear nearby:

Feature space plot
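
For what it’s worth, a layout of this general kind is straightforward to reproduce with a few lines of Wolfram Language (this is just an illustrative sketch, not the system used to make the plot above):

(* rasterize the letters, then let a machine-learned feature extractor lay them out in 2D *)
letters = Table[Rasterize[Style[FromLetterNumber[i], FontSize -> 60]], {i, 26}];
FeatureSpacePlot[letters]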

How does the machine know what features to extract to determine whether things will “look similar”? A typical approach is to give it millions of images that have been tagged with what they are of (“elephant”, “teacup”, etc.). And then from seeing which images are tagged the same (even though in detail they look different), the machine is able—using the methods of modern machine learning—to identify features that could be used to determine how similar images of anything should be considered to be.

OK, so let’s imagine that instead of letters of the alphabet laid out in a 2D feature space, we’ve got a million videos laid out in a 200-dimensional feature space. If we’ve got the features right, then videos that are somehow similar should be nearby in this feature space.

But given a particular person, what videos are they likely to want to watch? Well, we can do the same kind of thing with people as with videos: we can take the data we know about each person, and extract some set of features. “Similar people” would then be nearby in “people feature space”, and so on.

But now there’s a “final ranking” problem. Given features of videos, and features of people, which videos should be ranked “best” for which people? Often in practice, there’s an initial coarse ranking. But then, as soon as we have a specific definition of “best”—or enough examples of what we mean by “best”—we can use machine learning to learn a program that will look at the features of videos and people, and will effectively see how to use them to optimize the final ranking.
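
Here’s a toy sketch of that “final ranking” step in the Wolfram Language, with made-up numerical features and engagement scores rather than anything from an actual ACS system:

(* synthetic training data: combined (user, item) feature vectors -> observed engagement *)
training = Table[RandomReal[1, 8] -> RandomReal[], {500}];
scorer = Predict[training];

(* rank a handful of candidate items (random feature vectors) for one user, best first *)
candidates = Table[RandomReal[1, 8], {5}];
ReverseSortBy[candidates, scorer]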

The setup is a bit different in different cases, and there are many details, most of which are proprietary to particular companies. However, modern ACS systems—dealing as they do with immense amounts of data at very high speed—are a triumph of engineering, and an outstanding example of the power of artificial intelligence techniques.

Is It “Just an Algorithm”?

When one hears the term “algorithm” one tends to think of a procedure that will operate in a precise and logical way, always giving a correct answer, not influenced by human input. One also tends to think of something that consists of well-defined steps, that a human could, if needed, readily trace through.

But this is pretty far from how modern ACS systems work. They don’t deal with the same kind of precise questions (“What video should I watch next?” just isn’t something with a precise, well-defined answer). And the actual methods involved make fundamental use of machine learning, which doesn’t have the kind of well-defined structure or explainable step-by-step character that’s associated with what people traditionally think of as an “algorithm”. There’s another thing too: while traditional algorithms tend to be small and self-contained, machine learning inevitably requires large amounts of externally supplied data.

In the past, computer programs were almost exclusively written directly by humans (with some notable exceptions in my own scientific work). But the key idea of machine learning is instead to create programs automatically, by “learning the program” from large numbers of examples. The most common type of program on which to apply machine learning is a so-called neural network. Although originally inspired by the brain, neural networks are purely computational constructs that are typically defined by large arrays of numbers called weights.

Imagine you’re trying to build a program that recognizes pictures of cats versus dogs. You start with lots of specific pictures that have been identified—normally by humans—as being either of cats or dogs. Then you “train” a neural network by showing it these pictures and gradually adjusting its weights to make it give the correct identification for these pictures. But then the crucial point is that the neural network generalizes. Feed it another picture of a cat, and even if it’s never seen that picture before, it’ll still (almost certainly) say it’s a cat.
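
Here’s a tiny toy version of that “learn from examples, then generalize” idea, with numbers standing in for images (and definitely not the actual cat vs. dog setup):

(* learn a classifier purely from labeled examples; no explicit rule is ever written down *)
examples = {1 -> "small", 2 -> "small", 3 -> "small", 5 -> "small",
   120 -> "large", 150 -> "large", 180 -> "large", 200 -> "large"};
classifier = Classify[examples];

classifier[7]     (* an input it never saw; it will still confidently pick a class *)
classifier[170]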

What will it do if you feed it a picture of a cat dressed as a dog? It’s not clear what the answer is supposed to be. But the neural network will still confidently give some result—that’s derived in some way from the training data it was given.

So in a case like this, how would one tell why the neural network did what it did? Well, it’s difficult. All those weights inside the network were learned automatically; no human explicitly set them up. It’s very much like the case of extracting features from images of letters above. One can use these features to tell which letters are similar, but there’s no “human explanation” (like “count the number of loops in the letter”) of what each of the features is.

Would it be possible to make an explainable cat vs. dog program? For 50 years most people thought that a problem like cat vs. dog just wasn’t the kind of thing computers would be able to do. But modern machine learning made it possible—by learning the program rather than having humans explicitly write it. And there are fundamental reasons to expect that there can’t in general be an explainable version—and that if one’s going to do the level of automated content selection that people have become used to, then one cannot expect it to be broadly explainable.

Sometimes one hears it said that automated content selection is just “being done by an algorithm”, with the implication that it’s somehow fair and unbiased, and not subject to human manipulation. As I’ve explained, what’s actually being used are machine learning methods that aren’t like traditional precise algorithms.

And a crucial point about machine learning methods is that by their nature they’re based on learning from examples. And inevitably the results they give depend on what examples were used.

And this is where things get tricky. Imagine we’re training the cat vs. dog program. But let’s say that, for whatever reason, among our examples there are spotted dogs but no spotted cats. What will the program do if it’s shown a spotted cat? It might successfully recognize the shape of the cat, but quite likely it will conclude—based on the spots—that it must be seeing a dog.

So is there any way to guarantee that there are no problems like this, that were introduced either knowingly or unknowingly? Ultimately the answer is no—because one can’t know everything about the world. Is the lack of spotted cats in the training set an error, or are there simply no spotted cats in the world?

One can do one’s best to find correct and complete training data. But one will never be able to prove that one has succeeded.

But let’s say that we want to ensure some property of our results. In almost all cases, that’ll be perfectly possible—either by modifying the training set, or the neural network. For example, if we want to make sure that spotted cats aren’t left out, we can just insist, say, that our training set has an equal number of spotted and unspotted cats. That might not be a correct representation of what’s actually true in the world, but we can still choose to train our neural network on that basis.

As a different example, let’s say we’re selecting pictures of pets. How many cats should be there, versus dogs? Should we base it on the number of cat vs. dog images on the web? Or how often people search for cats vs. dogs? Or how many cats and dogs are registered in America? There’s no ultimate “right answer”. But if we want to, we can give a constraint that says what should happen.
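
As a sketch of what imposing such a constraint on a training set could look like in practice (with made-up labeled examples), one can simply rebalance the classes before training:

(* rebalance a labeled training set so that every class appears equally often *)
rebalance[examples_List] := Module[{byClass, n},
  byClass = GroupBy[examples, Last];        (* group example -> label rules by their label *)
  n = Min[Length /@ Values[byClass]];       (* size of the smallest class *)
  Flatten[RandomSample[#, n] & /@ Values[byClass], 1]]

(* a deliberately skewed toy training set *)
skewed = Join[Table[RandomReal[1, 3] -> "dog", {40}], Table[RandomReal[1, 3] -> "cat", {5}]];
Counts[Last /@ rebalance[skewed]]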

This isn’t really an “algorithm” in the traditional sense either—not least because it’s not about abstract things; it’s about real things in the world, like cats and dogs. But an important development (that I happen to have been personally much involved in for 30+ years) is the construction of a computational language that lets one talk about things in the world in a precise way that can immediately be run on a computer.

In the past, things like legal contracts had to be written in English (or “legalese”). Somewhat inspired by blockchain smart contracts, we are now getting to the point where we can write automatically executable computational contracts not in human language but in computational language. And if we want to define constraints on the training sets or results of automated content selection, this is how we can do it.
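
Just to give a flavor of this, here’s a toy constraint written in the Wolfram Language, with hypothetical item metadata; a real computational contract would of course be far more elaborate:

(* a toy constraint: items delivered in a session must be roughly balanced between two classes *)
balancedQ[items_List, tolerance_ : 0.1] :=
 With[{counts = Counts[Lookup[items, "leaning"]]},
  Abs[Lookup[counts, "left", 0] - Lookup[counts, "right", 0]] <= tolerance Length[items]]

balancedQ[{<|"leaning" -> "left"|>, <|"leaning" -> "right"|>, <|"leaning" -> "left"|>}, 0.5]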

Issues from Basic Science

Why is it difficult to find solutions to problems associated with automated content selection? In addition to all the business, societal and political issues, there are also some deep issues of basic science involved. Here’s a list of some of those issues. The precursors of these issues date back nearly a century, though it’s only quite recently (in part through my own work) that they’ve become clarified. And although they’re not usually enunciated (or named) as I have done here, I don’t believe any of them are at this point controversial—though coming to terms with them requires a significant shift from the intuitions that exist without modern computational thinking.


Data Deducibility

Even if you don’t explicitly know something (say about someone), it can almost always be statistically deduced if there’s enough other related data available

What is a particular person’s gender identity, ethnicity, political persuasion, etc.? Even if one’s not allowed to explicitly ask these questions, it’s basically inevitable that with enough other data about the person, one will be able to deduce what the best answers must be.

Everyone is different in detail. But the point is that there are enough commonalities and correlations between people that it’s basically inevitable that with enough data, one can figure out almost any attribute of a person.

The basic mathematical methods for doing this were already known from classical statistics. But what’s made this now a reality is the availability of vastly more data about people in digital form—as well as the ability of modern machine learning to readily work not just with numerical data, but also with things like textual and image data.

What is the consequence of ubiquitous data deducibility? It means that it’s not useful to block particular pieces of data—say in an attempt to avoid bias—because it’ll essentially always be possible to deduce what that blocked data was. And it’s not just that this can be done intentionally; inside a machine learning system, it’ll often just happen automatically and invisibly.
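
Here’s a minimal synthetic sketch of data deducibility (with completely made-up data): the “sensitive” attribute is never supplied as an input, yet it can be deduced from other, correlated data:

(* a synthetic population in which two innocuous measurements correlate with a hidden attribute *)
person[] := With[{g = RandomChoice[{"A", "B"}]},
  {RandomVariate[NormalDistribution[If[g === "A", 178, 165], 7]],
    RandomVariate[NormalDistribution[If[g === "A", 80, 65], 10]]} -> g];

deducer = Classify[Table[person[], {2000}]];
deducer[{181, 83}]   (* almost certainly deduces "A" from the other data alone *)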


Computational Irreducibility

Even given every detail of a program, it can be arbitrarily hard to predict what it will or won’t do

One might think that if one had the complete code for a program, one would readily be able to deduce everything about what the program would do. But it’s a fundamental fact that in general one can’t do this. Given a particular input, one can always just run the program and see what it does. But even if the program is simple, its behavior may be very complicated, and computational irreducibility implies that there won’t be a way to “jump ahead” and immediately find out what the program will do, without explicitly running it.
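
A minimal illustration in the Wolfram Language is my rule 30 cellular automaton: a program whose complete rule fits on one line, but whose behavior one effectively has to run to find out:

(* a tiny, fully known program whose behavior resists being "jumped ahead" *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]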

One consequence of this is that if one wants to know, for example, whether with any input a program can do such-and-such, then there may be no finite way to determine this—because one might have to check an infinite number of possible inputs. As a practical matter, this is why bugs in programs can be so hard to detect. But as a matter of principle, it means that it can ultimately be impossible to completely verify that a program is “correct”, or has some specific property.

Software engineering has in the past often tried to constrain the programs it deals with so as to minimize such effects. But with methods like machine learning, this is basically impossible to do. And the result is that even if one had a complete automated content selection program, one wouldn’t in general be able to verify that, for example, it could never show some particular bad behavior.


Non-explainability

For a well-optimized computation, there’s not likely to be a human-understandable narrative about how it works inside

Should we expect to understand how our technological systems work inside? When things like donkeys were routinely part of such systems, people didn’t expect to. But once the systems began to be “completely engineered” with cogs and levers and so on, there developed an assumption that at least in principle one could explain what was going on inside. The same was true with at least simpler software systems. But with things like machine learning systems, it absolutely isn’t.

Yes, one can in principle trace what happens to every bit of data in the program. But can one create a human-understandable narrative about it? It’s a bit like imagining we could trace the firing of every neuron in a person’s brain. We might be able to predict what a person would do in a particular case, but it’s a different thing to get a high-level “psychological narrative” about why they did it.

Inside a machine learning system—say the cats vs. dogs program—one can think of it as extracting all sorts of features, and making all sorts of distinctions. And occasionally one of these features or distinctions might be something we have a word for (“pointedness”, say). But most of the time they’ll be things the machine learning system discovered, and they won’t have any connection to concepts we’re familiar with.

And in fact—as a consequence of computational irreducibility, combined with the finiteness of human language and human knowledge—it’s basically inevitable that for any well-optimized computation we’re not going to be able to give a high-level narrative that explains what it’s doing. And the result is that it’s not realistic to expect any useful form of general “explainability” for automated content selection systems.


Ethical Incompleteness

There’s no finite set of principles that can completely define any reasonable, practical system of ethics

Let’s say one’s trying to teach ethics to a computer, or an artificial intelligence. Is there some simple set of principles—like Asimov’s Laws of Robotics—that will capture a viable complete system of ethics? Looking at the complexity of human systems of laws one might suspect that the answer is no. And in fact this is presumably a fundamental result—essentially another consequence of computational irreducibility.

Imagine that we’re trying to define constraints (or “laws”) for an artificial intelligence, in order to ensure that the AI behaves in some particular “globally ethical” way. We set up a few constraints, and we find that many things the AI does follow our ethics. But computational irreducibility essentially guarantees that eventually there’ll always be something unexpected that’s possible. And the only way to deal with that is to add a “patch”—essentially to introduce another constraint for that new case. And the issue is that this will never end: there’ll be no way to give a finite set of constraints that will achieve our global objectives. (There’s a somewhat technical analogy of this in mathematics, in which Gödel’s theorem shows that no finite set of axiomatic constraints can give one only ordinary integers and nothing else.)

So for our purposes here, the main consequence of this is that we can’t expect to have some finite set of computational principles (or, for that matter, laws) that will constrain automated content selection systems to always behave according to some reasonable, global system of ethics—because they’ll always be generating unexpected new cases that we have to define a new principle to handle.


The Path Forward

I’ve described some of the complexities of handling issues with automated content selection systems. But what in practice can be done?

One obvious idea would be just to somehow “look inside” the systems, auditing their internal operation and examining their construction. But for both fundamental and practical reasons, I don’t think this can usefully be done. As I’ve discussed, to achieve the kind of functionality that users have become accustomed to, modern automated content selection systems make use of methods such as machine learning that are not amenable to human-level explainability or systematic predictability.

What about checking whether a system is, for example, biased in some way? Again, this is a fundamentally difficult thing to determine. Given a particular definition of bias, one could look at the internal training data used for the system—but this won’t usually give more information than just studying how the system behaves.

What about seeing if the system has somehow intentionally been made to do this or that? It’s conceivable that the source code could have explicit “if” statements that would reveal intention. But the bulk of the system will tend to consist of trained neural networks and so on—and as in most other complex systems, it’ll typically be impossible to tell what features might have been inserted “on purpose” and what are just accidental or emergent properties.

So if it’s not going to work to “look inside” the system, what about restricting how the system can be set up? For example, one approach that’s been suggested is to limit the inputs that the system can have, in an extreme case preventing it from getting any personal information about the user and their history. The problem with this is that it negates what’s been achieved over the course of many years in content selection systems—both in terms of user experience and economic success. And for example, knowing nothing about a user, if one has to recommend a video, one’s just going to have to suggest whatever video is generically most popular—which is very unlikely to be what most users want most of the time.

As a variant of the idea of blocking all personal information, one can imagine blocking just some information—or, say, allowing a third party to broker what information is provided. But if one wants to get the advantages of modern content selection methods, one’s going to have to leave a significant amount of information—and then there’s no point in blocking anything, because it’ll almost certainly be reproducible through the phenomenon of data deducibility.

Here’s another approach: what about just defining rules (in the form of computational contracts) that specify constraints on the results content selection systems can produce? One day, we’re going to have to have such computational contracts to define what we want AIs in general to do. And because of ethical incompleteness—like with human laws—we’re going to have to have an expanding collection of such contracts.

But even though (particularly through my own efforts) we’re beginning to have the kind of computational language necessary to specify a broad range of computational contracts, we realistically have to get much more experience with computational contracts in standard business and other situations before it makes sense to try setting them up for something as complex as global constraints on content selection systems.

So, what can we do? I’ve not been able to see a viable, purely technical solution. But I have formulated two possible suggestions based on mixing technical ideas with what amount to market mechanisms.

The basic principle of both suggestions is to give users a choice about who to trust, and to let the final results they see not necessarily be completely determined by the underlying ACS business.

There’s been debate about whether ACS businesses are operating as “platforms” that more or less blindly deliver content, or whether they’re operating as “publishers” who take responsibility for content they deliver. Part of this debate can be seen as being about what responsibility should be taken for an AI. But my suggestions sidestep this issue, and in different ways tease apart the “platform” and “publisher” roles.

It’s worth saying that the whole content platform infrastructure that’s been built by the large ACS businesses is an impressive and very valuable piece of engineering—managing huge amounts of content, efficiently delivering ads against it, and so on. What’s really at issue is whether the fine details of the ACS systems need to be handled by the same businesses, or whether they can be opened up. (This is relevant only for ACS businesses whose network effects have allowed them to serve a large fraction of a population. Small ACS businesses don’t have the same kind of lock-in.)


Suggestion A: Allow Users to Choose among Final Ranking Providers

Suggestion A

As I discussed earlier, the rough (and oversimplified) outline of how a typical ACS system works is that first features are extracted for each content item and each user. Then, based on these features, there’s a final ranking done that determines what will actually be shown to the user, in what order, etc.

What I’m suggesting is that this final ranking doesn’t have to be done by the same entity that sets up the infrastructure and extracts the features. Instead, there could be a single content platform but a variety of “final ranking providers”, who take the features, and then use their own programs to actually deliver a final ranking.

Different final ranking providers might use different methods, and emphasize different kinds of content. But the point is to let users be free to choose among different providers. Some users might prefer (or trust more) some particular provider—that might or might not be associated with some existing brand. Other users might prefer another provider, or choose to see results from multiple providers.

How technically would all this be implemented? The underlying content platform (presumably associated with an existing ACS business) would take on the large-scale information-handling task of deriving extracted features. The content platform would provide sufficient examples of underlying content (and user information) and its extracted features to allow the final ranking provider’s systems to “learn the meaning” of the features.

When the system is running, the content platform would in real time deliver extracted features to the final ranking provider, which would then feed this into whatever system they have developed (which could use whatever automated or human selection methods they choose). This system would generate a ranking of content items, which would then be fed back to the content platform for final display to the user.
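
Schematically, and with hypothetical function and field names (in reality this would be an API between separately operated systems), the division of labor might look something like this:

(* content platform side: extracted features (made-up numbers here) for a user and candidate items *)
userFeatures = RandomReal[1, 8];
items = Table[<|"id" -> i, "features" -> RandomReal[1, 8]|>, {i, 10}];

(* final ranking provider side: its own scoring program, run over the delivered features *)
providerScore[uf_, itemf_] := uf . itemf;   (* stand-in for whatever model the provider has built *)

(* the ordering handed back to the platform for display *)
Lookup[ReverseSortBy[items, providerScore[userFeatures, #["features"]] &], "id"]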

To avoid revealing private user information to lots of different providers, the final ranking provider’s system should probably run on the content platform’s infrastructure. The content platform would be responsible for the overall user experience, presumably providing some kind of selector to pick among final ranking providers. The content platform would also be responsible for delivering ads against the selected content.

Presumably the content platform would give a commission to the final ranking provider. If properly set up, competition among final ranking providers could actually increase total revenue to the whole ACS business, by achieving automated content selection that serves users and advertisers better.

Suggestion B: Allow Users to Choose among Constraint Providers

Suggestion B

One feature of Suggestion A is that it breaks up ACS businesses into a content platform component, and a final ranking component. (One could still imagine, however, that a quasi-independent part of an ACS business could be one of the competing final ranking providers.) An alternative suggestion is to keep ACS businesses intact, but to put constraints on the results that they generate, for example forcing certain kinds of balance, etc.

Much like final ranking providers, there would be constraint providers who define sets of constraints. For example, a constraint provider could require that there be on average an equal number of items delivered to a user that are classified (say, by a particular machine learning system) as politically left-leaning or politically right-leaning.

Constraint providers would effectively define computational contracts about properties they want results delivered to users to have. Different constraint providers would define different computational contracts. Some might want balance; others might want to promote particular types of content, and so on. But the idea is that users could decide what constraint provider they wish to use.

How would constraint providers interact with ACS businesses? It’s more complicated than for final ranking providers in Suggestion A, because effectively the constraints from constraint providers have to be woven deeply into the basic operation of the ACS system.

One possible approach is to use the machine learning character of ACS systems, and to insert the constraints as part of the “learning objectives” (or, technically, “loss functions”) for the system. Of course, there could be constraints that just can’t be successfully learned (for example, they might call for types of content that simply don’t exist). But there will be a wide range of acceptable constraints, and in effect, for each one, a different ACS system would be built.

All these ACS systems would then be operated by the underlying ACS business, with users selecting which constraint provider—and therefore which overall ACS system—they want to use.

As with Suggestion A, the underlying ACS business would be responsible for delivering advertising, and would pay a commission to the constraint provider.


Although their detailed mechanisms are different, both Suggestions A and B attempt to leverage the exceptional engineering and commercial achievements of the ACS businesses, while diffusing current trust issues about content selection, providing greater freedom for users, and inserting new opportunities for market growth.

The suggestions also help with some other issues. One example is the banning of content providers. At present, with ACS businesses feeling responsible for content on their platforms, there is considerable pressure, not least from within the ACS businesses themselves, to ban content providers that they feel are providing inappropriate content. The suggestions diffuse the responsibility for content, potentially allowing the underlying ACS businesses not to ban anything but explicitly illegal content.

It would then be up to the final ranking providers, or the constraint providers, to choose whether or not to deliver or allow content of a particular character, or from a particular content provider. In any given case, some might deliver or allow it, and some might not, removing the difficult all-or-none nature of the banning that’s currently done by ACS businesses.

One feature of my suggestions is that they allow fragmentation of users into groups with different preferences. At present, all users of a particular ACS business have content that is basically selected in the same way. With my suggestions, users of different persuasions could potentially receive completely different content, selected in different ways.

While fragmentation like this appears to be an almost universal tendency in human society, some might argue that having people routinely be exposed to other people’s points of view is important for the cohesiveness of society. And technically some version of this would not be difficult to achieve. For example, one could take the final ranking or constraint providers, and effectively generate a feature space plot of what they do.

Some would be clustered close together, because they lead to similar results. Others would be far apart in feature space—in effect representing very different points of view. Then if someone wanted to, say, see their typical content 80% of the time, but see different points of view 20% of the time, the system could combine different providers from different parts of feature space with a certain probability.
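
Here’s a sketch of that mixing step (with hypothetical provider names; the “far apart” providers would come from the feature-space layout just described):

(* 80% of the time use the chosen provider; 20% of the time one that is far away in feature space *)
pickProvider[chosen_, farAway_List] :=
 If[RandomReal[] < 0.8, chosen, RandomChoice[farAway]]

Counts[Table[pickProvider["myProvider", {"otherA", "otherB"}], {1000}]]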

Of course, in all these matters, the full technical story is much more complex. But I am confident that if they are considered desirable, either of the suggestions I have made can be implemented in practice. (Suggestion A is likely to be somewhat easier to implement than Suggestion B.) The result, I believe, will be richer, more trusted, and even more widely used automated content selection. In effect both my suggestions mix the capabilities of humans and AIs—to help get the best of both of them—and to navigate through the complex practical and fundamental problems with the use of automated content selection.


Mitchell Feigenbaum (1944‑2019), 4.66920160910299067185320382…

Mitchell Feigenbaum
(Artwork by Gunilla Feigenbaum)

Behind the Feigenbaum Constant

It’s called the Feigenbaum constant, and it’s about 4.6692016. And it shows up, quite universally, in certain kinds of mathematical—and physical—systems that can exhibit chaotic behavior.

Mitchell Feigenbaum, who died on June 30 at the age of 74, was the person who discovered it—back in 1975, by doing experimental mathematics on a pocket calculator.

It became a defining discovery in the history of chaos theory. But when it was first discovered, it was a surprising, almost bizarre result, that didn’t really connect with anything that had been studied before. Somehow, though, it’s fitting that it should have been Mitchell Feigenbaum—who I knew for nearly 40 years—who would discover it.

Trained in theoretical physics, and a connoisseur of its mathematical traditions, Mitchell always seemed to see himself as an outsider. He looked a bit like Beethoven—and projected a certain stylish sense of intellectual mystery. He would often make strong assertions, usually with a conspiratorial air, a twinkle in his eye, and a glass of wine or a cigarette in his hand.

He would talk in long, flowing sentences which exuded a certain erudite intelligence. But ideas would jump around. Sometimes detailed and technical. Sometimes leaps of intuition that I, for one, could not follow. He was always calculating, staying up until 5 or 6 am, filling yellow pads with formulas and stressing Mathematica with elaborate algebraic computations that might run for hours.

He published very little, and what he did publish he was often disappointed wasn’t widely understood. When he died, he had been working for years on the optics of perception, and on questions like why the Moon appears larger when it’s close to the horizon. But he never got to the point of publishing anything on any of this.

For more than 30 years, Mitchell’s official position (obtained essentially on the basis of his Feigenbaum constant result) was as a professor at the Rockefeller University in New York City. (To fit with Rockefeller’s biological research mission, he was themed as the Head of the “Laboratory of Mathematical Physics”.) But he dabbled elsewhere, lending his name to a financial computation startup, and becoming deeply involved in inventing new cartographic methods for the Hammond World Atlas.

What Mitchell Discovered

The basic idea is quite simple. Take a value x between 0 and 1. Then iteratively replace x by a x (1 – x). Let’s say one starts from x = 1/3, and takes a = 3.2. Then here’s what one gets for the successive values of x:

Successive values
ListLinePlot[NestList[Compile[x, 3.2 x (1 - x)], N[1/3], 50], 
 Mesh -> All, PlotRange -> {0, 1}, Frame -> True]

After a little transient, the values of x are periodic, with period 2. But what happens with other values of a? Here are a few results for this so-called “logistic map”:

Logistic map
GraphicsGrid[
 Partition[
  Table[Labeled[
    ListLinePlot[NestList[Compile[x, a x (1 - x)], N[1/3], 50], 
     Mesh -> All, PlotRange -> {0, 1}, Frame -> True, FrameTicks -> None], 
    StringTemplate["a = ``"][a]], {a, 2.75, 4, .25}], 3], Spacings -> {.1, -.1}]

For small a, the values of x quickly go to a fixed point. For larger a they become periodic, first with period 2, then 4. And finally, for larger a, the values start bouncing around seemingly randomly.

One can summarize this by plotting the values of x (here, 300, after dropping the first 50 to avoid transients) reached as a function of the value of a:

Period doublings
ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a x (1 - x)], N[1/3], 300], 50], {a, 0, 
    4, .01}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

As a increases, one sees a cascade of “period doublings”. In this case, they’re at a = 3, then a ≈ 3.449, a ≈ 3.544090 and a ≈ 3.5644072. What Mitchell noticed is that these successive values aₙ approach a limit (here a∞ ≈ 3.569946) in a geometric sequence, with a∞ – aₙ ~ δ^(-n) and δ ≈ 4.669.
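
One can already check this convergence rate directly from the values above:

(* ratios of successive gaps between the period-doubling points approach 4.669... *)
a = {3, 3.449, 3.544090, 3.5644072};
1/Ratios[Differences[a]]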

That’s a nice little result. But here’s what makes it much more significant: it isn’t just true about the specific iterated map x ⟶ a x (1 – x); it’s true about any map like that. Here, for example, is the “bifurcation diagram” for x ⟶ a sin(π √x):

Bifurcation diagram
ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a Sin[Pi Sqrt@x]], N[1/3], 300], 50], {a,
     0, 1, .002}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

The details are different. But what Mitchell noticed is that the positions of the period doublings again form a geometric sequence, with the exact same base: δ ≈ 4.669.

It’s not just that different iterated maps give qualitatively similar results; when one measures the convergence rate this turns out to be exactly and quantitatively the same—always δ ≈ 4.669. And this was Mitchell’s big discovery: a quantitatively universal feature of the approach to chaos in a class of systems.

The Scientific Backstory

The basic idea behind iterated maps has a long history, stretching all the way back to antiquity. Early versions arose in connection with finding successive approximations, say to square roots. For example, using Newton’s method from the late 1600s, √2 can be obtained by iterating x ⟶ 1/x + x/2 (here starting from x = 1):

Starting from x = 1
NestList[Function[x, 1/x + x/2], N[1, 8], 6]

The notion of iterating an arbitrary function seems to have first been formalized in an 1870 paper by Ernst Schröder (who was notable for his work in formalizing things from powers to Boolean algebra), although most of the discussion that arose was around solving functional equations, not actually doing iterations. (An exception was the investigation of regions of convergence for Newton’s approximation by Arthur Cayley in 1879.) In 1918 Gaston Julia made a fairly extensive study of iterated rational functions in the complex plane—inventing, if not drawing, Julia sets. But until fractals in the late 1970s (which soon led to the Mandelbrot set), this area of mathematics basically languished.

But quite independent of any pure mathematical developments, iterated maps with forms similar to xa x (1 – x) started appearing in the 1930s as possible practical models in fields like population biology and business cycle theory—usually arising as discrete annualized versions of continuous equations like the Verhulst logistic differential equation from the mid-1800s. Oscillatory behavior was often seen—and in 1954 William Ricker (one of the founders of fisheries science) also found more complex behavior when he iterated some empirical fish reproduction curves.

Back in pure mathematics, versions of iterated maps had also shown up from time to time in number theory. In 1799 Carl Friedrich Gauss effectively studied the map x ⟶ FractionalPart[1/x] in connection with continued fractions. And starting in the late 1800s there was interest in studying maps like x ⟶ FractionalPart[a x] and their connections to the properties of the number a.

Particularly following Henri Poincaré’s work on celestial mechanics around 1900, the idea of sensitive dependence on initial conditions arose, and it was eventually noted that iterated maps could effectively “excavate digits” in their initial conditions. For example, iterating x ⟶ FractionalPart[10 x], starting with the digits of π, gives (effectively just shifting the sequence of digits one place to the left at each step):

Starting with the digits of pi...
N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 5], 10]
FractionalPart
ListLinePlot[
 Rest@N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 50], 
   40], Mesh -> All]

(Confusingly enough, with typical “machine precision” computer arithmetic this doesn’t work correctly: once one “runs out of precision”, the IEEE Floating Point standard says to keep on delivering digits, even though they’re then completely wrong. Arbitrary precision in the Wolfram Language gets it right.)

Maps like x ⟶ a x (1 – x) show similar kinds of “digit excavation” behavior (for example, replacing x by sin(π u)², x ⟶ 4 x (1 – x) becomes exactly u ⟶ FractionalPart[2 u])—and this was already known by the 1940s, and, for example, commented on by John von Neumann in connection with his 1949 iterative “middle-square” method for generating pseudorandom numbers by computer.

But what about doing experimental math on iterated maps? There wasn’t too much experimental math at all on early digital computers (after all, most computer time was expensive). But in the aftermath of the Manhattan Project, Los Alamos had built its own computer (named MANIAC) that ended up being used for a whole series of experimental math studies. And in 1964 Paul Stein and Stan Ulam wrote a report entitled “Non-linear Transformation Studies on Electronic Computers” that included photographs of oscilloscope-like MANIAC screens displaying output from some fairly elaborate iterated maps. In 1971, another “just out of curiosity” report from Los Alamos (this time by Nick Metropolis [leader of the MANIAC project, and developer of the Monte Carlo method], Paul Stein and his brother Myron Stein) started to give more specific computer results for the behavior of logistic maps, and noted the basic phenomenon of period doubling (which they called the “U-sequence”), as well as its qualitative robustness under changes in the underlying map.

But quite separately from all of this, there were other developments in physics and mathematics. In 1963 Ed Lorenz (a meteorologist at MIT) introduced and simulated his “naturally occurring” Lorenz differential equations, that showed sensitive dependence on initial conditions. Starting in the 1940s (but following on from Poincaré’s work around 1900) there’d been a steady stream of developments in mathematics in so-called dynamical systems theory—particularly investigating global properties of the solutions to differential equations. Usually there’d be simple fixed points observed; sometimes “limit cycles”. But by the 1970s, particularly after the arrival of early computer simulations (like Lorenz’s), it was clear that for nonlinear equations something else could happen: a so-called “strange attractor”. And in studying so-called “return maps” for strange attractors, iterated maps like the logistic map again appeared.

But it was in 1975 that various threads of development around iterated maps somehow converged. On the mathematical side, dynamical systems theorist Jim Yorke and his student Tien-Yien Li at the University of Maryland published their paper “Period Three Implies Chaos”, showing that in an iterated map with a particular parameter value, if there’s ever an initial condition that leads to a cycle of length 3, there must be other initial conditions that don’t lead to cycles at all—or, as they put it, show chaos. (As it turned out, Aleksandr Sarkovskii—who was part of a Ukrainian school of dynamical systems research—had already in 1962 proved the slightly weaker result that a cycle of period 3 implies cycles of all periods.)

But meanwhile there had also been growing interest in things like the logistic maps among mathematically oriented population biologists, leading to the rather readable review (published in mid-1976) entitled “Simple Mathematical Models with Very Complicated Dynamics” by physics-trained Australian Robert May, who was then a biology professor at Princeton (and would subsequently become science advisor to the UK government, and is now “Baron May of Oxford”).

But even though things like sketches of bifurcation diagrams existed, the discovery of their quantitatively universal properties had to await Mitchell Feigenbaum and his discovery.

Mitchell’s Journey

Mitchell Feigenbaum grew up in Brooklyn, New York. His father was an analytical chemist, and his mother was a public-school teacher. Mitchell was unenthusiastic about school, though did well on math and science tests, and managed to teach himself calculus and piano. In 1960, at age 16, as something of a prodigy, he enrolled in the City College of New York, officially studying electrical engineering, but also taking physics and math classes. After graduating in 1964, he went to MIT. Initially he was going to do a PhD in electrical engineering, but he quickly switched to physics.

But although he was enamored of classic mathematical physics (as represented, for example, in the books of Landau and Lifshitz), he ended up writing his thesis on a topic set by his advisor about particle physics, and specifically about evaluating a class of Feynman diagrams for the scattering of photons by scalar particles (with lots of integrals, if not special functions). It wasn’t a terribly exciting thesis, but in 1970 he was duly dispatched to Cornell for a postdoc position.

Mitchell struggled with motivation, preferring to hang out in coffee shops doing the New York Times crossword (at which he was apparently very fast) to doing physics. But at Cornell, Mitchell made several friends who were to be important to him. One was Predrag Cvitanović, a star graduate student from what is now Croatia, who was studying quantum electrodynamics, and with whom he shared an interest in German literature. Another was a young poet named Kathleen Doorish (later, Kathy Hammond), who was a friend of Predrag’s. And another was a rising-star physics professor named Pete Carruthers, with whom he shared an interest in classical music.

In the early 1970s quantum field theory was entering a golden age. But despite the topic of his thesis, Mitchell didn’t get involved, and in the end, during his two years at Cornell, he produced no visible output at all. Still, he had managed to impress Hans Bethe enough to be dispatched for another postdoc position, though now at a place lower in the pecking order of physics, Virginia Polytechnic Institute, in rural Virginia.

At Virginia Tech, Mitchell did even less well than at Cornell. He didn’t interact much with people, and he produced only one three-page paper: “The Relationship between the Normalization Coefficient and Dispersion Function for the Multigroup Transport Equation”. As its title might suggest, the paper was quite technical and quite unexciting.

As Mitchell’s two years at Virginia Tech drew to a close it wasn’t clear what was going to happen. But luck intervened. Mitchell’s friend from Cornell, Pete Carruthers, had just been hired to build up the theory division (“T Division”) at Los Alamos, and given carte blanche to hire several bright young physicists. Pete would later tell me with pride (as part of his advice to me about general scientific management) that he had a gut feeling that Mitchell could do something great, and that despite other people’s input—and the evidence—he decided to bet on Mitchell.

Having brought Mitchell to Los Alamos, Pete set about suggesting projects for him. At first, it was following up on some of Pete’s own work, and trying to compute bulk collective (“transport”) properties of quantum field theories as a way to understand high-energy particle collisions—a kind of foreshadowing of investigations of quark-gluon plasma.

But soon Pete suggested that Mitchell try looking at fluid turbulence, and in particular at whether renormalization group methods might help in understanding it.

Whenever a fluid—like water—flows sufficiently rapidly it forms lots of little eddies and behaves in a complex and seemingly random way. But even though this qualitative phenomenon had been discussed for centuries (with, for example, Leonardo da Vinci making nice pictures of it), physics had had remarkably little to say about it—though in the 1940s Andrei Kolmogorov had given a simple argument that the eddies should form a cascade with a k^(-5/3) distribution of energies. At Los Alamos, though, with its focus on nuclear weapons development (inevitably involving violent fluid phenomena), turbulence was a very important thing to understand—even if it wasn’t obvious how to approach it.

But in 1974, there was news that Ken Wilson from Cornell had just “solved the Kondo problem” using a technique called the renormalization group. And Pete Carruthers suggested that Mitchell should try to apply this technique to turbulence.

The renormalization group is about seeing how changes of scale (or other parameters) affect descriptions (and behavior) of systems. And as it happened, it was Mitchell’s thesis advisor at MIT, Francis Low, who, along with Murray Gell-Mann, had introduced it back in 1954 in the context of quantum electrodynamics. The idea had lain dormant for many years, but in the early 1970s it came back to life with dramatic—though quite different—applications in both particle physics (specifically, QCD) and condensed matter physics.

In a piece of iron at room temperature, you can basically get all electron spins associated with each atom lined up, so the iron is magnetized. But if you heat the iron up, there start to be fluctuations, and suddenly—above the so-called Curie temperature (770°C for iron)—there’s effectively so much randomness that the magnetization disappears. And in fact there are lots of situations (think, for example, melting or boiling—or, for that matter, the formation of traffic jams) where this kind of sudden so-called phase transition occurs.

But what is actually going on in a phase transition? I think the clearest way to see this is by looking at an analog in cellular automata. With the particular rule shown below, if there aren’t very many initial black cells, the whole system will soon be white. But if you increase the number of initial black cells (as a kind of analog of increasing the temperature in a magnetic system), then suddenly, in this case at 50% black, there’s a sharp transition, and now the whole system eventually becomes black. (For phase transition experts: yes, this is a phase transition in a 1D system; one only needs 2D if the system is required to be microscopically reversible.)

GraphicsRow[SeedRandom[234316];
 Table[ArrayPlot[
    CellularAutomaton[<|"RuleNumber" -> 294869764523995749814890097794812493824, "Colors" -> 4|>, 
     3 Boole[Thread[RandomReal[{0, 1}, 2000] < rho]], {500, {-300, 300}}], 
    FrameLabel -> {None, Row[{Round[100 rho], "% black"}]}], 
  {rho, {0.4, 0.45, 0.55, 0.6}}], -30]

But what does the system do near 50% black? In effect, it can’t decide whether to finally become black or white. And so it ends up showing a whole hierarchy of “fluctuations” from the smallest scales to the largest. And what became clear by the 1960s is that the “critical exponents” characterizing the power laws describing these fluctuations are universal across many different systems.

But how can one compute these critical exponents? In a few toy cases, analytical methods were known. But mostly, something else was needed. And in the late 1960s Ken Wilson realized that one could use the renormalization group, and computers. One might have a model for how individual spins interact. But the renormalization group gives a procedure for “scaling up” to the interactions of larger and larger blocks of spins. And by studying that on a computer, Ken Wilson was able to start computing critical exponents.

At first, the physics world didn’t pay much attention, not least because they weren’t used to computers being so intimately in the loop in theoretical physics. But then there was the Kondo problem (and, yes, so far as I know, it has no relation to modern Kondoing—though it does relate to modern quantum dot cellular automata). In most materials, electrical resistivity decreases as the temperature decreases (going to zero for superconductors even above absolute zero). But back in the 1930s, measurements on gold had shown instead an increase of resistivity at low temperatures. By the 1960s, it was believed that this was due to the scattering of electrons from magnetic impurities—but calculations ran into trouble, generating infinite results.

But then, in 1975, Ken Wilson applied his renormalization group methods—and correctly managed to compute the effect. There was still a certain mystery about the whole thing (and it probably didn’t help that—at least when I knew him in the 1980s and beyond—I often found Ken Wilson’s explanations quite hard to understand). But the idea that the renormalization group could be important was established.

So how might it apply to fluid turbulence? Kolmogorov’s power law seemed suggestive. But could one take the Navier–Stokes equations which govern idealized fluid flow and actually derive something like this? This was the project on which Mitchell Feigenbaum embarked.

The Big Discovery

The Navier–Stokes equations are very hard to work with. In fact, to this day it’s still not clear how even the most obvious feature of turbulence—its apparent randomness—arises from these equations. (It could be that the equations aren’t a full or consistent mathematical description, and one’s actually seeing amplified microscopic molecular motions. It could be that—as in chaos theory and the Lorenz equations—it’s due to amplification of randomness in the initial conditions. But my own belief, based on work I did in the 1980s, is that it’s actually an intrinsic computational phenomenon—analogous to the randomness one sees in my rule 30 cellular automaton.)
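
Here, for reference, is rule 30 run from a single black cell, showing the kind of intrinsic randomness I mean:

(* rule 30 from a single black cell: a simple rule producing seemingly random behavior *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]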

So how did Mitchell approach the problem? He tried simplifying it—first by going from equations depending on both space and time to ones depending only on time, and then by effectively making time discrete, and looking at iterated maps. Through Paul Stein, Mitchell knew about the (not widely known) previous work at Los Alamos on iterated maps. But Mitchell didn’t quite know where to go with it, though having just got a swank new HP-65 programmable calculator, he decided to program iterated maps on it.

Then in July 1975, Mitchell went (as I also did a few times in the early 1980s) to the summer physics hang-out-together event in Aspen, CO. There he ran into Steve Smale—a well-known mathematician who’d been studying dynamical systems—and was surprised to find Smale talking about iterated maps. Smale mentioned that someone had asked him if the limit of the period-doubling cascade a∞ ≈ 3.56995 could be expressed in terms of standard constants like π and √2. Smale related that he’d said he didn’t know. But Mitchell’s interest was piqued, and he set about trying to figure it out.

He didn’t have his HP-65 with him, but he dove into the problem using the standard tools of a well-educated mathematical physicist, and had soon turned it into something about poles of functions in the complex plane—about which he couldn’t really say anything. Back at Los Alamos in August, though, he had his HP-65, and he set about programming it to find the bifurcation points aₙ.

The iterative procedure ran pretty fast for small n. But by n = 5 it was taking 30 seconds. And for n = 6 it took minutes. While it was computing, however, Mitchell decided to look at the aₙ values he had so far—and noticed something: they seemed to be converging geometrically to a final value.

At first, he just used this fact to estimate a∞, which he tried—unsuccessfully—to express in terms of standard constants. But soon he began to think that actually the convergence exponent δ was more significant than a∞—since its value stayed the same under simple changes of variables in the map. For perhaps a month Mitchell tried to express δ in terms of standard constants.

But then, in early October 1975, he remembered that Paul Stein had said period doubling seemed to look the same not just for logistic maps but for any iterated map with a single hump. Reunited with his HP-65 after a trip to Caltech, Mitchell immediately tried the map x ⟶ sin(x)—and discovered that, at least to 3-digit precision, the exponent δ was exactly the same.

He was immediately convinced that he’d discovered something great. But Stein told him he needed more digits to really conclude much. Los Alamos had plenty of powerful computers—so the next day Mitchell got someone to show him how to write a program in FORTRAN on one of them to go further—and by the end of the day he had managed to compute that in both cases δ was about 4.6692.

The computer he used was a typical workhorse US scientific computer of the day: a CDC 6000 series machine (of the same type I used when I first moved to the US in 1978). It had been designed by Seymour Cray, and by default it used 60-bit floating-point numbers. But at this precision (about 14 decimal digits), 4.6692 was as far as Mitchell could compute. Fortunately, however, Pete’s wife Lucy Carruthers was a programmer at Los Alamos, and she showed Mitchell how to use double precision—with the result that he was able to compute δ to 11-digit precision, and determine that the values for his two different iterated maps agreed.
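
With the Wolfram Language today, a calculation in this spirit fits in a few lines. What follows is just an illustrative reconstruction (the names logisticIterate and s are mine, and this is not Mitchell’s procedure): it finds the superstable parameter values of the logistic map x ⟶ a x (1 − x), the values at which x = 1/2 lies on the period-2^n cycle, whose spacings shrink by the same factor δ:

(* an illustrative reconstruction, not Mitchell's code: s[n] is the parameter
   value at which x = 1/2 lies on a period-2^n cycle of x -> a x (1 - x) *)
logisticIterate[a_?NumericQ, n_Integer] := Nest[a # (1 - #) &, 1/2, 2^n];

s[0] = 2; s[1] = 1 + Sqrt[5];  (* superstable values for periods 1 and 2 *)
s[n_] := s[n] = a /. FindRoot[logisticIterate[a, n] == 1/2,
     (* seed the root search by geometric extrapolation from earlier spacings *)
     {a, s[n - 1] + (s[n - 1] - s[n - 2]) 10/47,
      s[n - 1] + (s[n - 1] - s[n - 2]) 10/46},
     WorkingPrecision -> 60];

(* successive estimates of delta, approaching 4.66920... *)
1/Ratios[Differences[Table[s[n], {n, 0, 8}]]]

Swapping in another map with a single hump and repeating the computation gives, as Mitchell found, the same limiting value.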

Within a few weeks, Mitchell had found that δ seemed to be universal whenever the iterated map had a single quadratic maximum. But he didn’t know why this was, or have any particular framework for thinking about it. But still, finally, at the age of 30, Mitchell had discovered something that he thought was really interesting.

On Mitchell’s birthday, December 19, he saw his friend Predrag, and told him about his result. But at the time, Predrag was working hard on mainstream particle physics, and didn’t pay too much attention.

Mitchell continued working, and within a few months he was convinced that not only was the exponent δ universal—the appropriately scaled, limiting, infinitely wiggly, actual iteration of the map was too. In April 1976 Mitchell wrote a report announcing his results. Then on May 2, 1976, he gave a talk about them at the Institute for Advanced Study in Princeton. Predrag was there, and now he got interested in what Mitchell was doing.

As so often, however, it was hard to understand just what Mitchell was talking about. But by the next day, Predrag had successfully simplified things, and come up with a single, explicit, functional equation for the limiting form of the scaled iterated map: g(g(x)) = −g(αx)/α, with α ≈ 2.50290—implying that for any iterated map of the appropriate type, the limiting form would always look like an even wigglier version of:

FeigenbaumFunction plot
fUD[z_] = 
  1. - 1.5276329970363323 z^2 + 0.1048151947874277 z^4 + 
   0.026705670524930787 z^6 - 0.003527409660464297 z^8 + 
   0.00008160096594827505 z^10 + 0.000025285084886512315 z^12 - 
   2.5563177536625283*^-6 z^14 - 9.65122702290271*^-8 z^16 + 
   2.8193175723520713*^-8 z^18 - 2.771441260107602*^-10 z^20 - 
   3.0292086423142963*^-10 z^22 + 2.6739057855563045*^-11 z^24 + 
   9.838888060875235*^-13 z^26 - 3.5838769501333333*^-13 z^28 + 
   2.063994985307743*^-14 z^30;

(* for |z| > 1, scale z into [-1, 1] by a power of \[Alpha], iterate the
   polynomial approximation 2^n times, then scale back; the outer pure
   function splices fUD[\[Zeta]] in for # inside Compile *)
fCF = Compile[{z},
    Module[{\[Alpha] = -2.5029078750959130867, n, \[Zeta]},
     n = If[Abs[z] <= 1., 0, Ceiling[Log[-\[Alpha], Abs[z]]]];
     \[Zeta] = z/\[Alpha]^n;
     Do[\[Zeta] = #, {2^n}];
     \[Alpha]^n \[Zeta]]] &[fUD[\[Zeta]]];

Plot[fCF[x], {x, -100, 100}, MaxRecursion -> 5, PlotRange -> All]

How It Developed

The whole area of iterated maps got a boost on June 10, 1976, with the publication in Nature of Robert May’s survey about them, written independently of Mitchell and (of course) not mentioning his results. But in the months that followed, Mitchell traveled around and gave talks about his results. The reactions were mixed. Physicists wondered how the results related to physics. Mathematicians wondered about their status, given that they came from experimental mathematics, without any formal mathematical proof. And—as always—people found Mitchell’s explanations hard to understand.

In the fall of 1976, Predrag went as a postdoc to Oxford—and on the very first day that I showed up as a 17-year-old particle-physics-paper-writing undergraduate, I ran into him. We talked mostly about his elegant “bird tracks” method for doing group theory (about which he finally published a book 32 years later). But he also tried to explain iterated maps. And I still remember him talking about an idealized model for fish populations in the Adriatic Sea (only years later did I make the connection that Predrag was from what is now Croatia).

At the time I didn’t pay much attention, but somehow the idea of iterated maps lodged in my consciousness, soon mixed together with the notion of fractals that I learned from Benoit Mandelbrot’s book. And when I began to concentrate on issues of complexity a couple of years later, these ideas helped guide me towards systems like cellular automata.

But back in 1976, Mitchell (who I wouldn’t meet for several more years) was off giving lots of talks about his results. He also submitted a paper to the prestigious academic journal Advances in Mathematics. For 6 months he heard nothing. But eventually the paper was rejected. He tried again with another paper, now sending it to the SIAM Journal of Applied Mathematics. Same result.

I have to say I’m not surprised this happened. In my own experience of academic publishing (now long in the past), if one was reporting progress within an established area it wasn’t too hard to get a paper published. But anything genuinely new or original one could pretty much count on getting rejected by the peer review process, either through intellectual shortsightedness or through academic corruption. And for Mitchell there was the additional problem that his explanations weren’t easy to understand.

But finally, in late 1977, Joel Lebowitz, editor of the Journal of Statistical Physics, agreed to publish Mitchell’s paper—essentially on the basis of knowing Mitchell, even though he admitted he didn’t really understand the paper. And so it was that early in 1978 “Quantitative Universality for a Class of Nonlinear Transformations”—reporting Mitchell’s big result—officially appeared. (For purposes of academic priority, Mitchell would sometimes quote a summary of a talk he gave on August 26, 1976, that was published in the Los Alamos Theoretical Division Annual Report 1975–1976. Mitchell was quite affected by the rejection of his papers, and for years kept the rejection letters in his desk drawer.)

Mitchell continued to travel the world talking about his results. There was interest, but also confusion. But in the summer of 1979, something exciting happened: Albert Libchaber in Paris reported results on a physical experiment on the transition to turbulence in convection in liquid helium—where he saw period doubling, with exactly the exponent δ that Mitchell had calculated. Mitchell’s δ apparently wasn’t just universal to a class of mathematical systems—it also showed up in real, physical systems.

Pretty much immediately, Mitchell was famous. Connections to the renormalization group had been made, and his work was becoming fashionable among both physicists and mathematicians. Mitchell himself was still traveling around, but now he was regularly hobnobbing with the top physicists and mathematicians.

I remember him coming to Caltech, perhaps in the fall of 1979. There was a certain rock-star character to the whole thing. Mitchell showed up, gave a stylish but somewhat mysterious talk, and was then whisked away to talk privately with Richard Feynman and Murray Gell-Mann.

Soon Mitchell was being offered all sorts of high-level jobs, and in 1982 he triumphantly returned to Cornell as a full professor of physics. There was an air of Nobel Prize–worthiness, and by June 1984 he was appearing in the New York Times magazine, in full Beethoven mode, in front of a Cornell waterfall:

Mitchell in New York Times Magazine

Still, the mathematicians weren’t satisfied. As with Benoit Mandelbrot’s work, they tended to see Mitchell’s results as mere “numerical conjectures”, not proven and not always even quite worth citing. But top mathematicians (who Mitchell had befriended) were soon working on the problem, and results began to appear—though it took a decade for there to be a full, final proof of the universality of δ.

Where the Science Went

So what happened to Mitchell’s big discovery? It was famous, for sure. And, yes, period-doubling cascades with his universal features were seen in a whole sequence of systems—in fluids, optics and more. But how general was it, really? And could it, for example, be extended to the full problem of fluid turbulence?

Mitchell and others studied systems other than iterated maps, and found some related phenomena. But none were quite as striking as Mitchell’s original discovery.

In a sense, my own efforts on cellular automata and the behavior of simple programs, beginning around 1981, have tried to address some of the same bigger questions as Mitchell’s work might have led to. But the methods and results have been very different. Mitchell always tried to stay close to the kinds of things that traditional mathematical physics can address, while I unabashedly struck out into the computational universe, investigating the phenomena that occur there.

I tried to see how Mitchell’s work might relate to mine—and even in my very first paper on cellular automata in 1981 I noted for example that the average density of black cells on successive steps of a cellular automaton’s evolution can be approximated (in “mean field theory”) by an iterated map.
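
As a minimal sketch of what that means (just an illustration of the idea; meanFieldMap is my name for it here, not the original 1981 calculation): for a 2-color, nearest-neighbor rule, assume every cell is independently black with probability p, and add up the probabilities of all three-cell neighborhoods that produce a black cell on the next step:

(* mean-field map for a k = 2, r = 1 cellular automaton rule: the probability
   of a black cell on the next step, assuming independent cells that are
   black with probability p *)
meanFieldMap[rule_Integer, p_] := Total[MapThread[
    #2 p^Total[#1] (1 - p)^(3 - Total[#1]) &,
    {Tuples[{1, 0}, 3], IntegerDigits[rule, 2, 8]}]]

Simplify[meanFieldMap[90, p]]  (* equivalent to 2 p (1 - p) *)

NestList[meanFieldMap[22, #] &, 0.4, 5]  (* iterate the rule 22 map *)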

I also noted that mathematically the whole evolution of a cellular automaton can be viewed as an iterated map—though on the Cantor set, rather than on ordinary real numbers. In my first paper, I even plotted the analog of Mitchell’s smooth mappings, but now they were wild and discontinuous:

Rules plot
(* each rule's one-step evolution on all 2^12 twelve-cell configurations,
   viewed as a map from integers to integers *)
GraphicsRow[
 Labeled[
    ListPlot[
     Table[FromDigits[CellularAutomaton[#, IntegerDigits[n, 2, 12]], 2],
      {n, 0, 2^12 - 1}],
     AspectRatio -> 1, Frame -> True, FrameTicks -> None],
    Text[StringTemplate["rule ``"][#]]] & /@ {22, 42, 90, 110}]

But try as I might, I could never find any strong connection with Mitchell’s work. I looked for analogs of things like period doubling, and Sarkovskii’s theorem, but didn’t find much. In my computational framework, even thinking about real numbers, with their infinite sequence of digits, was a bit unnatural. Years later, in A New Kind of Science, I had a note entitled “Smooth iterated maps”. I showed their digit sequences, and observed, rather undramatically, that Mitchell’s discovery implied an unusual nested structure at the beginning of the sequences:

Nested
(* the first digs base-2 digits of the fractional part of x *)
FractionalDigits[x_, digs_Integer] :=
  NestList[{Mod[2 First[#], 1], Floor[2 First[#]]} &, {x, 0}, digs][[2 ;;, -1]];

(* digit sequences of successive iterates of the logistic map x -> a x (1 - x) *)
GraphicsRow[
 Function[a,
   ArrayPlot[
    FractionalDigits[#, 40] & /@ NestList[a # (1 - #) &, N[1/8, 80], 80]]] /@
  {2.5, 3.3, 3.4, 3.5, 3.6, 4}]

The Rest of the Story

Portrait of Mitchell
(Photograph by Predrag Cvitanović)

So what became of Mitchell? After four years at Cornell, he moved to the Rockefeller University in New York, and for the next 30 years settled into a somewhat Bohemian existence, spending most of his time at his apartment on the Upper East Side of Manhattan.

While he was still at Los Alamos, Mitchell had married a woman from Germany named Cornelia, who was the sister of the wife of physicist (and longtime friend of mine) David Campbell, who had started the Center for Nonlinear Studies at Los Alamos, and would later go on to be provost at Boston University. But after not too long, Cornelia left Mitchell, taking up instead with none other than Pete Carruthers. (Pete—who struggled with alcoholism and other issues—later reunited with Lucy, but died in 1997 at the age of 61.)

When he was back at Cornell, Mitchell met a woman named Gunilla, who had run away from her life as a pastor’s daughter in a small town in northern Sweden at the age of 14, had ended up as a model for Salvador Dalí, and then in 1966 had been brought to New York as a fashion model. Gunilla had been a journalist, video maker, playwright and painter. Mitchell and she married in 1986, and remained married for 26 years, during which time Gunilla developed quite a career as a figurative painter.

Mitchell’s last solo academic paper was published in 1987. He did publish a handful of other papers with various collaborators, though none were terribly remarkable. Most were extensions of his earlier work, or attempts to apply traditional methods of mathematical physics to various complex fluid-like phenomena.

Mitchell liked interacting with the upper echelons of academia. He received all sorts of honors and recognition (though never a Nobel Prize). But to the end he viewed himself as something of an outsider—a Renaissance man who happened to have focused on physics, but didn’t really buy into all its institutions or practices.

From the early 1980s on, I used to see Mitchell fairly regularly, in New York or elsewhere. He became a daily user of Mathematica, singing its praises and often telling me about elaborate calculations he had done with it. Like many mathematical physicists, Mitchell was a connoisseur of special functions, and would regularly talk to me about more and more exotic functions he thought we should add.

Mitchell had two major excursions outside of academia. By the mid-1980s, the young poetess—now named Kathy Hammond—that Mitchell had known at Cornell had been an advertising manager for the New York Times and had then married into the family that owned the Hammond World Atlas. And through this connection, Mitchell was pulled into a completely new field for him: cartography.

I talked to him about it many times. He was very proud of figuring out how to use the Riemann mapping theorem to produce custom local projections for maps. He described (though I never fully understood it) a very physics-based algorithm for placing labels on maps. And he was very pleased when finally an entirely new edition of the Hammond World Atlas (that he would refer to as “my atlas”) came out.

Starting in the 1980s, there’d been an increasing trend for physics ideas to be applied to quantitative finance, and for physicists to become Wall Street quants. And with people in finance continually looking for a unique edge, there was always an interest in new methods. I was certainly contacted a lot about this—but with the success of James Gleick’s 1987 book Chaos (for which I did a long interview, though was only mentioned, misspelled, in a list of scientists who’d been helpful), there was a whole new set of people looking to see how “chaos” could help them in finance.

One of those was a certain Michael Goodkin. When he was in college back in the early 1960s, Goodkin had started a company that marketed the legal research services of law students. A few years later, he enlisted several Nobel Prize–winning economists and started what may have been the first hedge fund to do computerized arbitrage trading. Goodkin had always been a high-rolling, globetrotting gambler and backgammon player, and he made and lost a lot of money. And, down on his luck, he was looking for the next big thing—and found chaos theory, and Mitchell Feigenbaum.

For a few years he cultivated various physicists, then in 1995 he found a team to start a company called Numerix to commercialize the use of physics-like methods in computations for increasingly exotic financial instruments. Mitchell Feigenbaum was the marquee name, though the heavy lifting was mostly done by my longtime friend Nigel Goldenfeld, and a younger colleague of his named Sasha Sokol.

At the beginning there was lots of mathematical-physics-like work, and Mitchell was quite involved. (He was an enthusiast of Itô calculus, gave lectures about it, and was proud of having found 1000× speed-ups of stochastic integrations.) But what the company actually did was to write C++ libraries for banks to integrate into their systems. It wasn’t something Mitchell wanted to do long term. And after a number of years, Mitchell’s active involvement in the company declined.

(I’d met Michael Goodkin back in 1998, and 14 years later—having recently written his autobiography The Wrong Answer Faster: The Inside Story of Making the Machine That Trades Trillions—he suddenly contacted me again, pitching my involvement in a rather undefined new venture. Mitchell still spoke highly of Michael, though when the discussion rather bizarrely pivoted to me basically starting and CEOing a new company, I quickly dropped it.)

I had many interactions with Mitchell over the years, though they’re not as well archived as they might be, because they tended to be verbal rather than written, since, as Mitchell told me (in email): “I dislike corresponding by email. I still prefer to hear an actual voice and interact…”

There are fragments in my archive, though. There’s correspondence, for example, about Mitchell’s 2004 60th-birthday event, that I couldn’t attend because it conflicted with a significant birthday for one of my children. In lieu of attending, I commissioned the creation of a “Feigenbaum–Cvitanović Crystal”—a 3D rendering in glass of the limiting function g(z) in the complex plane.

It was a little complex to solve the functional equation, and the laser manufacturing method initially shattered a few blocks of glass, but eventually the object was duly made, and sent—and I was pleased many years later to see it nicely displayed in Mitchell’s apartment:

Feigenbaum–Cvitanović crystal
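
One can get a rough sense of what such a rendering involves from the polynomial approximation fUD to g used earlier, plotting, say, the magnitude of g(z) as a height over part of the complex plane (this is only a sketch; I don’t know exactly what form the actual crystal used):

(* a rough sketch only: |g(z)| over the complex plane, reusing the polynomial
   approximation fUD defined above; the crystal itself may have rendered g differently *)
Plot3D[Abs[fUD[x + I y]], {x, -1, 1}, {y, -1, 1},
 PlotRange -> All, MeshFunctions -> {#3 &},
 AxesLabel -> {"Re(z)", "Im(z)", "Abs[g(z)]"}]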

Sometimes my archives record mentions of Mitchell by others, usually Predrag. In 2007, Predrag reported (with characteristic wit):

“Other news: just saw Mitchell, he is dating Odyssey.

No, no, it’s not a high-level Washington type escort service—he is dating Homer’s Odyssey, by computing the positions of low stars as function of the 26000 year precession—says Hiparcus [sic] had it all figured out, but Catholic church succeeded in destroying every single copy of his tables.”

Living up to the Renaissance man tradition, Mitchell always had a serious interest in history. In 2013, responding to a piece of mine about Leibniz, Mitchell said he’d been a Leibniz enthusiast since he was a teenager, then explained:

“The Newton hagiographer (literally) Voltaire had no idea of the substance of the Monadology, so could only spoof ‘the best of all possible worlds’. Long ago I’ve published this as a verbal means of explaining 2^n universality.

Leibniz’s second published paper at age 19, ‘On the Method of Inverse Tangents’, or something like that, is actually the invention of the method of isoclines to solve ODEs, quite contrary to the extant scholarly commentary. Both Leibniz and Newton start with differential equations, already having received the diff. calculus. This is quite an intriguing story.”

But the mainstay of Mitchell’s intellectual life was always mathematical physics, though done more as a personal matter than as part of institutional academic work. At some point he was asked by his then-young goddaughter (he never had children of his own) why the Moon looks larger when it’s close to the horizon. He wrote back an explanation (a bit in the style of Euler’s Letters to a German Princess), then realized he wasn’t sure of the answer, and got launched into many years of investigation of optics and image formation. (He’d actually been generally interested in the retina since he was at MIT, influenced by Jerry Lettvin of “What the Frog’s Eye Tells the Frog’s Brain” fame.)

He would tell me about it, explaining that the usual theory of image formation was wrong, and he had a better one. He always used the size of the Moon as an example, but I was never quite clear whether the issue was one of optics or perception. He never published anything about what he did, though with luck his manuscripts (rumored to have the makings of a book) will eventually see the light of day—assuming others can understand them.

When I would visit Mitchell (and Gunilla), their apartment had a distinctly Bohemian feel, with books, papers, paintings and various devices strewn around. And then there was The Bird. It was a cockatoo, and it was loud. I’m not sure who got it or why. But it was a handful. Mitchell and Gunilla nearly got ejected from their apartment because of noise complaints from neighbors, and they ended up having to take The Bird to therapy. (As I learned in a slightly bizarre—and never executed—plan to make videogames for “they-are-alien-intelligences-right-here-on-this-planet” pets, cockatoos are social and, as pets, arguably really need a “Twitter for Cockatoos”.)

The Bird
(Photograph by Predrag Cvitanović)

In the end, though, it was Gunilla who left, with the rumor being that she’d been driven away by The Bird.

The last time I saw Mitchell in person was a few years ago. My son Christopher and I visited him at his apartment—and he was in full Mitchell form, with eyes bright, talking rapidly and just a little conspiratorially about the mathematical physics of image formation. “Bird eyes are overrated”, he said, even as his cockatoo squawked in the next room. “Eagles have very small foveas, you know. Their eyes are like telescopes.”

“Fish have the best eyes”, he said, explaining that all eyes evolved underwater—and that the architecture hadn’t really changed since. “Fish keep their entire field of view in focus, not like us”, he said. It was charming, eccentric, and very Mitchell.

For years, we had talked from time to time on the phone, usually late at night. I saw Predrag a few months ago, saying that I was surprised not to have heard from Mitchell. He explained that Mitchell was sick, but was being very private about it. Then, a few weeks ago, just after midnight, Predrag sent me an email with the subject line “Mitchell is dead”, explaining that Mitchell had died at around 8 pm, and attaching a quintessential Mitchell-in-New-York picture:

Mitchell in New York
(Photograph by Predrag Cvitanović)

It’s kind of a ritual I’ve developed when I hear that someone I know has died: I immediately search my archives. And this time I was surprised to find that a few years ago Mitchell had successfully reached voicemail I didn’t know I had. So now we can give Mitchell the last word:

And, of course, the last number too: 4.66920160910299067185320382…

Fifty Years of Mentoring


I’ve been reflecting recently on things I like to do. Of course I like creating things, figuring things out, and so on. But something else I like—that I don’t believe I’ve ever written about before—is mentoring. I’ve been doing it a shockingly long time: my first memories of it date from before I was 10 years old, 50 years ago. Somehow I always ended up being the one giving lots of advice—first to kids my own age, then also to ones somewhat younger, or older, and later to all sorts of people.

I was in England recently, and ran into someone I’d known as a kid nearly 50 years ago—and hadn’t seen since. He’s had a fascinating and successful career, but was kind enough to say that my interactions and advice to him nearly 50 years ago had really been important to him. Of course it’s nice to hear things like that—but as I reflect on it, I realize that mentoring is something I find fulfilling, whether or not I end up knowing that whatever seeds I’ve sown germinate (though, to be clear, I do find it fascinating to see what happens).

Mentoring is not like teaching. It’s something much more individual and personal. It’s about answering the specific “What should I do about X?” questions, and the general “What should I do given who I am?” questions. I’ve always been interested in people—which has been a great asset in identifying and leading people at my company all these years. It’s also what’s gotten me in recent years to write historical biography, and, sadly, to write a rather large number of obituaries.

But there’s something particularly fulfilling to me about mentoring, and about helping and changing outcomes, one person at a time. These days, there are two main populations I end up mentoring: CEOs, and kids. At some level, they’re totally different. But at some level, they’re surprisingly similar.

I like learning things, and I like solving problems. And in the mentoring I do, I’m always doing both these things. I’m hearing—often in quite a lot of detail—about different kinds of situations. And I’m trying to use my skills at problem solving to work out what to do. The constraint is always what is right for this particular person, and what is possible given the world as it is. But it’s so satisfying when one figures it out.

“Have you ever thought of X?” Sometimes, there’ll be an immediate “Oh, that’s a good idea” response. Sometimes one will be told a host of reasons why it can’t work—and then it’s a matter of picking through which objections are real, where all that’s needed is encouragement, and where there are other problems to be solved.

Sometimes my mentoring ends up being about things that have immediate effects on the world, like major strategy decisions for significant companies. Sometimes my mentoring is about things that are—for now—completely invisible to the world, like whether a kid should study this or that.

I tend to find mentoring the most interesting when it’s dealing with things I’ve never dealt with before. Maybe they’re things that are genuinely new in the world—like new situations in the technology industry. Or maybe they’re things that are just new to me, because I’ve never experienced or encountered that particular corner of human experience, or the world.

One thing that’s in common between CEOs and kids is that at some level they tend to be in “anything is possible” situations: they have a wide range of choices they can make about how to lead their companies, or their lives. And they also tend to want to think about the future—and about where they might go.

To be fair, there are both CEOs and kids where I wouldn’t be a particularly useful mentor. And most often that’s when they’re somehow already on some definite track, and where their next several years are largely determined. (Say just following a particular business plan, or a particular educational program.)

In the case of CEO mentoring, there’s a tendency for there to be quite long periods where not much happens, interspersed by the occasional urgent crises—deals to do or not, PR emergencies, personnel meltdowns, etc. (And, yes, those calls can come in at the most awkward times, making me glad that when I’m pushing other things aside, at least I can say to myself that I’m typically an official company advisor too, usually with a little equity in the company.)

With kids, things usually tend to be less urgent, and it’s more a matter of repeated interactions, gradually showing a direction, or working through issues. Sometimes—and this applies to CEOs as well—the issues are exogenous, and relate to situations in the world. Sometimes they’re endogenous, and they’re about how someone is motivated, or thinks about themselves or their place in the world.

I’ve found that the kids I find it most interesting to mentor fall into two categories. The first are the seriously precocious kids who are already starting to launch in high-flying directions. And the second are kids who aren’t connected to the high-flying world, and may be in difficult circumstances, but who somehow have some kind of spark that interactions with me can help nurture.

I’ve done a fair amount of traveling around the world in recent years (often with one or more of my own kids). And I always find it interesting to visit schools. (Research universities tend to seem similar all over the world, but as one gets to high schools and below, there are more and more obvious—and interesting—differences.) Usually I’ll give talks and have discussions with students. And there’s a pattern that’s repeated over and over again. At the end of an event, one or two students will come up to me and start an interesting conversation, and eventually I’ll hand them a business card and say: “If you ever want to chat more, send me mail”.

And, yes, the ones I hear from are a very self-selected set. Typically I’ll do an initial phone call to learn more about them. And if it seems like I can be useful, I’ll say, “Let me put you on my list of people I’ll call when I have time”.

I have a busy life, and I like to be as productive as possible. But there are always scraps of time when I’m not doing what I usually do. Maybe I’ll be driving from here to there at a time when there’s no useful meeting I can do. Maybe I’ll be procrastinating starting something because I’m not quite in the right frame of mind. And at those kinds of times it’s great to do a mentoring phone call. Because even if I’m hearing about all sorts of problems, I always find it energizing.

With CEOs, the problems can be big and sophisticated. With kids one might at first assume they’d be too familiar and low-level to be interesting. But at least for me, that’s not the case. Sometimes it’s that I started my career sufficiently early that I never personally encountered that kind of problem. Sometimes it’s that the problems are ones that newly exist only in recent years.

And particularly for kids in difficult circumstances, it’s often that with my particular trajectory in life I’ve just never been exposed to those kinds of problems. Sometimes I’m quite embarrassed at how clueless I am about some economic or social hardship a kid tells me about. But I’ll ask lots of questions—and often I’m quite proud of the solutions I’ll come up with.

I have to say that in modern times, it’s disappointing how difficult it tends to be for someone like me to reach kids who aren’t already connected to the rather high-flying parts of the world I usually deal with. There’s an example with our (very successful, I might add) Wolfram High School Summer Camp, which we’ve been putting on for the past seven years. We’ve always got great kids at the Summer Camp. But in the first few years, I noticed that almost all of them came from the most elite schools—usually on the East Coast or West Coast of the US, and generally had very sophisticated backgrounds.

I wanted to broaden things out, and so we put effort into advertising the Summer Camp on our Wolfram|Alpha website that (I’m happy to say) a very large number of kids use. The results were good in the sense that we immediately got a much broader geographic distribution, both within the US and outside. But though we advertised that scholarships and financial aid were available, few people applied for those, and in fact the fraction even seems to have recently been going down slightly.

It’s a frustrating situation, and perhaps it’s a reflection of broader societal issues. Of course, the Summer Camp is a somewhat different situation from mentoring, because to be successful at the Summer Camp, kids already have to have (or give themselves) a certain amount of preparation (learn at least the basics of the Wolfram Language, etc.). And in fact, it’s not uncommon for kids I’ve mentored to end up going to the Summer Camp. And from that point on (or, for example, when they go to some good college), they’re often basically “solved problems”, now connected to people and resources that will help take them forward.

When my company was young, I often found myself mentoring employees. But as the company grew, and developed a strong internal culture, that became less and less necessary because in a sense, the whole ambient environment provided mentoring. And, yes, as is typical in companies, my values as founder and CEO are (for better or worse) deeply imprinted on the organization. And part of what that means is that I don’t personally have to communicate them to everyone in the organization.

In a company it clearly makes sense to promote a certain coherent set of goals and values. But what about in the world at large, or, say, in kids one mentors? There’s always a great tendency to promote—often with missionary zeal—the kind of thing one does oneself. “Everyone should want to be a tech entrepreneur!” “Everyone should want to be a professor!” etc. And, yes, there will be people for whom those are terrific directions, and unless someone mentors them in those directions, they’ll never find them. But what about all the others?

I did some surveys of kids a couple of years ago, asking them about their goals. I asked them to say how interested they were in things like having their own reality TV show, making a billion dollars, making a big scientific discovery, having lots of friends, taking a one-way trip to Mars, etc. And, perhaps not surprisingly, there was great diversity in their answers. I asked some adults the same questions, and then asked them how they thought their answers would have been different when they were kids.

And my very anecdotal conclusion was that at least at this coarse level, the things people say they’d like to do or have done change fairly little over the course of their lives—at least after their early teenage years. Of course, an important goal of education should surely be to show people what’s out there in the world, and what it’s possible to do. In practice, though, much of modern formal education is deeply institutionalized in particular tracks that were defined a century ago. But still there are signals to be gleaned.

So you like math in school? The number of people who just do math for a living is pretty small. But what is the essence of what you like about math? Is it the definiteness of it? The problem solving? The abstract aesthetics? The measurable competitiveness? If you’re mentoring a kid you should be able to parse it out—and depending on the answer there’ll be all sorts of different possible directions and opportunities.

And in general, my point of view is that the goal should always be to try to find signals from people, and then to see how to help amplify them, and solve the problem of how to fit them into what’s possible in the world. I like to think that for every person there’s something out there that’s the best fit for what they should be doing. Of course, you may be lucky or unlucky in the time in history in which you live. You want to be an explorer, doing things like searching for the sources of rivers? Sorry, that’s been done. You want to be an asteroid miner or a genetic designer of animals? Sorry, you’re too early.

In a company, I view it as a major role—and responsibility—of management to take the skills and talents of the people one has, and solve the puzzle of fitting them into the projects that the company needs to do. Quite often one ends up suggesting quite new directions to people. (“I had no idea there was a thing like software quality assurance.” “Linguistic curation is something people do?” etc.) And over the years it’s been very satisfying to see how many successful careers I’ve been able to help launch by pointing people to new fields where it turns out their skills and interests are a match.

I don’t claim to be immune to the “encourage people to do what you do” phenomenon. And in a sense that informs the people—CEOs or kids—who I mentor. But I like to think that I’m unprejudiced about subject areas (and the more experience I get in the world, and with different kinds of people, the easier that gets). What does tend to be in common, though, is that I believe in just figuring out what to do, and doing it.

Too few people have had serious experience in going from “nothing to something”: of starting from some idea that just got invented, and then seeing it over the course of time turn into something real—and perhaps even important—in the world. But that’s the kind of thing I’ve spent my life doing, and that I try to do all the time.

And (at least given my worldview) I think it’s something that’s incredibly valuable and educational for people to see, and if possible experience for themselves. When people at the company have been involved in major “nothing-to-something” projects, I think there’s a certain glow of confidence they get that lasts a decade.

I can see that my own children have benefitted from watching so many projects of mine go from nothing to something—and being exposed to the process that’s been involved (and often giving their own input). And when I mentor kids (and often CEOs too) I like to mention projects I’ve got going on, so that over the course of time they too gradually get a sense of at least my version of the “nothing-to-something” process.

For the past several years, I’ve spent a couple of hours most Sundays doing “Computational Adventures” with groups of kids (mostly middle school, with some early high school, and some late elementary school). It’s been fascinating for me, especially as I try to understand more about teaching computational thinking. And of course it’s invigorating for me to be doing something so different from my typical “day job”.

Most of the time what I’ll actually do with the kids is try to figure out or build something with the Wolfram Language. It’s not the same kind of thing as mentoring individual kids, but there’s a little bit of “create something from nothing” when we develop ideas and implement them in the Wolfram Language.

I think to most kids, knowledge is something that just exists, not something that they know people create. And so it’s always fun when the kids bring up a topic, and I’m like “well, it so happens that the world expert on that is a friend of mine”, or, “well, actually, I was the one who discovered this or that!”. Like in mentoring, all this helps communicate the “you can do that too” message. And after a while, it’s something that kids just start to take for granted.

One of the features of having done mentoring for so long is that I’ve been able to see all sorts of long-term outcomes. Sometimes it’s a bit uncanny. I’ll be talking to some kid, and I’ll think to myself: “They’re just like that kid I knew 50 years ago!” And then I’ll start playing out in my mind what I think would naturally happen this time around, decades hence. And it’s the same with CEOs and their issues.

And, yes, it’s useful to have the experience, and to be able to make those predictions. But there’s still the problem solving about the present to do, and the human connection to make. And for me it all adds up to the fascinating and fulfilling experience I’ve had in doing all that mentoring over the past half-century or so.

Often it’s been some random coincidence that’s brought a particular mentoree to me. Sometimes it’s been their initiative in reaching out (or, very occasionally, someone reaching out on their behalf). I’m hoping that in the future (particularly when it comes to kids), it’ll be a still broader cross-section. And that in the years to come I’ll have the pleasure of successfully answering ever more of those “What should I do?” questions—that make me think about something I’ve never thought about before, and help someone follow the path they want.

A Book from Alan Turing… and a Mysterious Piece of Paper


A Book from Alan Turing...

How I Got the Book

In May 2017, I got an email from a former high-school teacher of mine named George Rutter: “I have a copy of Dirac’s big book in German (Die Prinzipien der Quantenmechanik) that was owned by Alan Turing, and following your book Idea Makers it seemed obvious that you were the right person to own this.” He explained that he’d got the book from another (by then deceased) former high-school teacher of mine, Norman Routledge, who I knew had been a friend of Alan Turing’s. George ended, “If you would like the book, I could give it to you the next time you are in England.”

A couple of years passed. But in March 2019 I was indeed in England, and arranged to meet George for breakfast at a small hotel in Oxford. We ate and chatted, and waited for the food to be cleared. Then the book moment arrived. George reached into his briefcase and pulled out a rather unassuming, typical mid-1900s academic volume.

P. A. M. Dirac's Die Prinzipien der Quantenmechanik

I opened the front of the book, wondering if it might have a “Property of Alan Turing” sticker or something. It didn’t. But what it did have (in addition to an inscription saying “from Alan Turing’s books”) was a colorful four-page note from Norman Routledge to George Rutter, written in 2002.

I had known Norman Routledge when I was a high-school student at Eton in the early 1970s. He was a math teacher, nicknamed “Nutty Norman”. He was charmingly over the top in many ways, and told endless stories about math and other things. He’d also been responsible for the school getting a computer (programmed with paper tape, and the size of a desk)—that was the very first computer I ever used.

At the time, I didn’t know too much about Norman’s background (remember, this was long before the web). I knew he was “Dr. Routledge”. And he often told stories about people in Cambridge. But he never mentioned Alan Turing to me. Of course, Alan Turing wasn’t famous yet (although, as it happens, I’d already heard of him from someone who’d known him at Bletchley Park during the Second World War).

Alan M. Turing by Sara Turing

Alan Turing still wasn’t famous in 1981 when I started studying simple programs, albeit in the context of cellular automata rather than Turing machines. But looking through the card catalog at the Caltech library one day, I chanced upon a book called Alan M. Turing by Sara Turing, his mother. There was lots of information in the book—among other things, about Turing’s largely unpublished work on biology. But I didn’t learn anything about a connection to Norman Routledge, because the book didn’t mention him (although, as I’ve now found out, Sara Turing did correspond with Norman about the book, and Norman ended up writing a review of it).

A decade later, very curious about Turing and his (then still unpublished) work on biology, I arranged to visit the Turing Archive at King’s College, Cambridge. Soon I’d gone through what they had of Turing’s technical papers, and with some time to spare, I thought I might as well ask to see his personal correspondence too. And flipping through it, I suddenly saw a couple of letters from Alan Turing to Norman Routledge.

By that time, Andrew Hodges’s biography—which did so much to make Turing famous—had appeared, and it confirmed that, yes, Alan Turing and Norman Routledge had indeed been friends, and in fact Turing had been Norman’s PhD advisor. I wanted to ask Norman about Turing, but by then Norman was retired and something of a recluse. Still, when I finished A New Kind of Science in 2002 (after my own decade of reclusiveness) I tracked him down and sent him a copy of the book with an inscription describing him as “My last mathematics teacher”. Some correspondence ensued, and in 2005 I was finally in England again, and arranged to meet Norman for a quintessentially English tea at a fancy hotel in London.

We had a lovely chat about many things, including Alan Turing. Norman started by saying that he’d really known Turing mostly socially—and that that was 50 years ago. But still he had plenty to say about him. “He was a loner.” “He giggled a lot.” “He couldn’t really talk to non-mathematicians.” “He was always afraid of upsetting his mother.” “He would go off in the afternoon and run a marathon.” “He wasn’t very ambitious (though ‘one wasn’t’ at King’s in those days).” Eventually the conversation came back to Norman. He said that even though he’d been retired for 16 years, he still contributed items to the Mathematical Gazette, in order, he said, “to unload things before I pass to a better place”, where, he added, somewhat impishly, “all mathematical truths will surely be revealed”. When our tea was finished, Norman donned his signature leather jacket and headed for his moped, quite oblivious to the bombings that had so disrupted transportation in London on that particular day.

That was the last time I saw Norman, and he died in 2013. But now, six years later, as I sat at breakfast with George Rutter, here was this note from him, written in 2002 in his characteristically lively handwriting:

Norman's letter


I read it quickly at first. It was colorful as always:

I got Alan Turing’s book from his friend & executor Robin Gandy (it was quite usual at King’s for friends to be offered books from a dead man’s library—I selected the collected poems of A. E. Housman from the books of Ivor Ramsay as a suitable memento: he was the Dean & jumped off the chapel [in 1956])…

Later in the note he said:

You ask about where, eventually, the book should go—I would prefer it to go to someone (or some where) wh. wd. appreciate the Turing connection, but really it is up to you.

Stephen Wolfram sent me his impressive book, but I’ve done no more than dip into it…

He ended by congratulating George Rutter for having the courage to move (as it turned out, temporarily) to Australia in his retirement, saying that he’d “toyed with moving to Sri Lanka, for a cheap, lotus-eating existence”, but added “events there mean I was wise not to do so” (presumably referring to the Sri Lankan Civil War).

What’s In the Book?

OK, so here I was with a copy of a book in German written by Paul Dirac, that was at one time owned by Alan Turing. I don’t read German, and I’d had a copy of the same book in English (which was its original language) since the 1970s. Still, as I sat at breakfast, I thought it only proper that I should look through the book page by page. After all, that’s a standard thing one does with antiquarian books.

I have to say that I was struck by the elegance of Dirac’s presentation. The book was published in 1931, yet its clean formalism (and, yes, despite the language barrier, I could read the math) is pretty much as one would write it today. (I don’t want to digress too much about Dirac, but my friend Richard Feynman told me that at least to him, Dirac spoke only monosyllabically. Norman Routledge told me that he had been friends in Cambridge with Dirac’s stepson, who became a graph theorist. Norman quite often visited the Dirac household, and said the “great man” was sometimes in the background, always with lots of mathematical puzzles around. I myself unfortunately never met Dirac, though I’m told that after he finally retired from Cambridge and went to Florida, he lost much of his stiffness and became quite social.)

But back to Turing’s copy of Dirac’s book. On page 9 I started to see underlinings and little marginal notes, all written in light pencil. I kept on flipping pages. After a few chapters, the annotations disappeared. But then, suddenly, tucked into page 127, there was a note:

German note

It was in German, with what looked like fairly typical older German handwriting. And it seemed to have something to do with Lagrangian mechanics. By this point I’d figured out that someone must have had the book before Turing, and this must be a note made by that person.

I kept flipping through the book. No more annotations. And I was thinking I wouldn’t find anything else. But then, on page 231, a bookmark—with a charmingly direct branding message:

Heffers bookmark

Would there be anything more? I continued flipping. Then, near the end of the book, on page 259, in a section on the relativistic theory of electrons, I found this:

Folded note

I opened the piece of paper:

Opened note

I recognized it immediately: it’s lambda calculus, with a dash of combinators. But what on Earth was it doing here? Remember, the book is about quantum mechanics. But this is about mathematical logic, or what’s now considered theory of computation. Quintessential Turing stuff. So, I immediately wondered, did Turing write this page?

Even as we were sitting at breakfast, I was looking on the web for samples of Turing’s handwriting. But I couldn’t find many calculational ones, so couldn’t immediately conclude much. And soon I had to go, carefully packing the book away, ready to pursue the mystery of what this page was, and who wrote it.

About the Book

Before anything else, let’s talk about the book itself. Dirac’s The Principles of Quantum Mechanics was published in English in 1930, and very quickly also appeared in German. (The preface by Dirac is dated May 29, 1930; the one from the translator—Werner Bloch—August 15, 1930.) The book was a landmark in the development of quantum mechanics, systematically setting up a clear formalism for doing calculations, and, among other things, explaining Dirac’s prediction of the positron, which would be discovered in 1932.

Why did Alan Turing get the book in German rather than English? I don’t know for sure. But in those days, German was the leading language of science, and we know Alan Turing knew how to read it. (After all, the title of his famous Turing machine paper “On Computable Numbers, with an Application to the Entscheidungsproblem” had a great big German word in it—and within the body of the paper he referred to the rather obscure Gothic characters he used as “German letters”, contrasting them, for example, with Greek letters.)

Did Alan Turing buy the book, or was he given it? I don’t know. On the inside front cover of Turing’s copy of the book is a pencil notation “20/-”, which was standard notation for “20 shillings”, equal to £1. On the right-hand page, there’s an erased “26.9.30”, presumably meaning 26 September, 1930—perhaps the date when the book was first in inventory. Then to the far right, there’s an erased “20-.”, perhaps again the price. (Could this have been a price in Reichsmarks, suggesting the book was sold in Germany? Even though at that time 1 RM was worth roughly 1 shilling, a German price would likely have been written as, for example, “20 RM”.) Finally, on the inside back cover there’s “c 5/-”—maybe the (highly discounted) price for the book used.

Let’s review the basic timeline. Alan Turing was born June 23, 1912 (coincidentally, exactly 76 years before Mathematica 1.0 was released). He went as an undergraduate to King’s College, Cambridge in the fall of 1931. He got his undergraduate degree after the usual three years, in 1934.

In the 1920s and early 1930s, quantum mechanics was hot, and Alan Turing was interested in it. From his archives, we know that in 1932—as soon as it was published—he got John von Neumann’s Mathematical Foundations of Quantum Mechanics (in its original German). We also know that in 1935, he asked the Cambridge physicist Ralph Fowler for a possible question to study in quantum mechanics. (Fowler suggested computing the dielectric constant of water—which actually turns out to be a very hard problem, basically requiring full-fledged interacting quantum field theory analysis, and still not completely solved.)

When and how did Turing get his copy of Dirac’s book? Given that there seems to be a used price in the book, Turing presumably bought it used. Who was its first owner? The annotations in the book seem to be concerned primarily with logical structure, noting what should be considered an axiom, what logically depends on what, and so on. What about the note tucked into page 127?

Well, perhaps coincidentally, page 127 isn’t just any page: it’s the page where Dirac talks about the quantum principle of least action, and sets the stage for the Feynman path integral—and basically all modern quantum formalism. But what does the note say? It’s expanding on equation 14, which is an equation for the time evolution of a quantum amplitude. The writer has converted Dirac’s A for amplitude into a ρ, possibly reflecting an earlier (fluid-density analogy) German notation. Then the writer attempts an expansion of the action in powers of ℏ (Planck’s constant over 2π, sometimes called Dirac’s constant).

But there doesn’t seem to be a lot to be gleaned from what’s on the page. Hold the page up to the light, though, and there’s a little surprise—a watermark reading “Z f. Physik. Chem. B”:

Z f. Physik. Chem. B watermark

That’s a short form of Zeitschrift für physikalische Chemie, Abteilung B—a German journal of physical chemistry that began publication in 1928. Was the note perhaps written by an editor of the journal? Here’s the masthead of the journal for 1933. Conveniently, the editors are listed with their locations, and one stands out: Born · Cambridge.

Zeitschrift für physikalische Chemie, Abteilung B

That’s Max Born, of the Born interpretation, and many other things in quantum mechanics (and also the grandfather of the singer Olivia Newton-John). So, was this note written by Max Born? Unfortunately it doesn’t seem like it: the handwriting doesn’t match.

OK, so what about the bookmark at page 231? Here are the two sides of it:

Heffers bookmark

The marketing copy is quaint and rather charming. But when is it from? Well, there’s still a Heffers Bookshop in Cambridge, though it’s now part of Blackwell’s. But for more than 70 years (ending in 1970) Heffers was located, as the bookmark indicates, at 3 and 4 Petty Cury.

But there’s an important clue on the bookmark: the phone number is listed as “Tel. 862”. Well, it turns out that in 1939, most of Cambridge (including Heffers) switched to 4-digit numbers, and certainly by 1940 bookmarks were being printed with “modern” phone numbers. (English phone numbers progressively got longer; when I was growing up in England in the 1960s, our phone numbers were “Oxford 56186” and “Kidmore End 2378”. Part of why I remember these numbers is the now-strange-seeming convention of always saying one’s number when answering the phone.)

But, OK, so the bookmark was from before 1939. But how much before? There are quite a few scans of old Heffers ads to be found on the web—and from at least 1912 (along with “We solicit the favour of your enquiries…”) they list “Telephone 862”, helpfully adding “(2 lines)”. And there are even some bookmarks with the same design to be found in copies of books from as long ago as 1904 (though it’s not clear they were original to the books). But for our purposes it seems as if we can reasonably conclude that our book came from Heffers (which was the main bookstore in Cambridge, by the way) sometime between 1930 and 1939.

The Lambda Calculus Page

OK, so we know something about when the book was bought. But what about the “lambda calculus page”? When was it written? Well, of course, lambda calculus had to have been invented. And that was done by Alonzo Church, a mathematician at Princeton, in an initial form in 1932, and in final form in 1935. (There had been precursors, but they hadn’t used the λ notation.)

There’s a complicated interaction between Alan Turing and lambda calculus. It was in 1935 that Turing had gotten interested in “mechanizing” the operations of mathematics, and had invented the idea of a Turing machine, and used it to solve a problem in the foundations of mathematics. Turing had sent a paper about it to a French journal (Comptes rendus), but initially it was lost in the mail; and then it turned out the person he’d sent it to wasn’t around anyway, because they’d gone to China.

But in May 1936, before Turing could send his paper anywhere else, Alonzo Church’s paper arrived from the US. Turing had been “scooped” once before, when in 1934 he created a proof of the central limit theorem, only to find that the Finnish mathematician Jarl Lindeberg had already given a proof in 1922.

It wasn’t too hard to see that Turing machines and lambda calculus were actually equivalent in the kinds of computations they could represent (and that was the beginning of the Church–Turing thesis). But Turing (and his mentor Max Newman) got convinced that Turing’s approach was different enough to deserve separate publication. And so it was that in November 1936 (with a bug fix the following month), Turing’s famous paper “On Computable Numbers…” was published in the Proceedings of the London Mathematical Society.

To fill in a little more of the timeline: from September 1936 to July 1938 (with a break of three months in the summer of 1937), Turing was at Princeton, having gone there to be, at least nominally, a graduate student of Alonzo Church. While at Princeton, Turing seems to have concentrated pretty completely on mathematical logic—writing several difficult-to-read papers full of Church’s lambdas—and most likely wouldn’t have had a book about quantum mechanics with him.

Turing was back in Cambridge in July 1938, but already by September of that year he was working part-time for the Government Code and Cypher School—and a year later he moved to Bletchley Park to work full time on cryptanalysis. After the war ended in 1945, Turing moved to London to work at the National Physical Laboratory on producing a design for a computer. He spent the 1947–8 academic year back in Cambridge, but then moved to Manchester to work on building a computer there.

In 1951, he began working in earnest on theoretical biology. (To me it’s an interesting irony that he seems to have always implicitly assumed that biological systems have to be modeled by differential equations, rather than by something discrete like Turing machines or cellular automata.) He also seems to have gotten interested in physics again, and by 1954 even wrote to his friend and student Robin Gandy that “I’ve been trying to invent a new Quantum Mechanics” (though he added, “but it won’t really work”). But all this came to an end on June 7, 1954, when Turing suddenly died. (My own guess is that it was not suicide, but that’s a different story.)

OK, but back to the lambda calculus page. Hold it up to the light, and once again there’s a watermark:

Excelsior watermark

So it’s a British-made piece of paper, which seems, for example, to make it unlikely to have been used in Princeton. But can we date the paper? Well, after some help from the British Association of Paper Historians, we know that the official manufacturer of the paper was Spalding & Hodge, Papermakers, Wholesale and Export Stationers of Drury House, Russell Street off Drury Lane, Covent Garden, London. But this doesn’t help as much as one might think—because their Excelsior brand of machine-made paper seems to have been listed in catalogs all the way from the 1890s to 1954.

What Does the Page Say?


OK, so let’s talk in more detail about what’s on the two sides of the page. Let’s start with the lambdas.

These are a way of defining “pure” or “anonymous” functions, and they’re a core concept in mathematical logic, and nowadays also in functional programming. They’re common in the Wolfram Language, and they’re pretty easy to explain there. One writes f[x] to mean a function f applied to an argument x. And there are lots of named functions that f can be—like Abs or Sin or Blur. But what if one wants f[x] to be 2x+1? There’s no immediate name for that function. But is there still something we can write for f that will make f[x] be this?

The answer is yes: in place of f we write Function[a, 2a+1]. And in the Wolfram Language, Function[a, 2a+1][x] is defined to give 2x+1. The Function[a, 2a+1] is a “pure” or “anonymous” function, that represents the pure operation of doubling and adding 1.

Well, λ in lambda calculus is the exact analog of Function in the Wolfram Language—and so for example λa.(2a+1) is equivalent to Function[a, 2a+1]. (It’s worth noting that Function[b, 2b+1] is equivalent; the “bound variable” a or b is just a placeholder—and in the Wolfram Language it can be avoided by using the alternative notation (2#+1)&.)
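Just as a quick check of these equivalences (a minimal sketch; 5 is an arbitrary test argument):

Function[a, 2 a + 1][5]   (* gives 11 *)
Function[b, 2 b + 1][5]   (* also 11: the name of the bound variable doesn't matter *)
(2 # + 1) &[5]            (* also 11, using the shorthand slot notation *)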

In traditional mathematics, functions tend to be thought of as things that map inputs (like, say, integers) to outputs (that are also, say, integers). But what kind of a thing is Function (or λ)? It’s basically a structural operator that takes expressions and turns them into functions. That’s a bit weird from the point of view of traditional mathematics and mathematical notation. But if one’s thinking about manipulating arbitrary symbols, it’s much more natural, even if at first it still seems a little abstract. (And, yes, when people learn the Wolfram Language, I can always tell they’ve passed a certain threshold of abstract understanding when they get the idea of Function.)

OK, but the lambdas are just part of what’s on the page. There’s also another, yet more abstract concept: combinators. See the rather obscure-looking line PI1IIx? What does it mean? Well, it’s a sequence of combinators, or effectively, a kind of abstract composition of symbolic functions.

Ordinary composition of functions is pretty familiar from mathematics. And in Wolfram Language one can write f[g[x]] to mean “apply f to the result of applying g to x”. But does one really need the brackets? In the Wolfram Language f@g@x is an alternative notation. But in this notation, we’re relying on a convention in the Wolfram Language: that the @ operator associates to the right, so that f@g@x is equivalent to f@(g@x).

But what would (f@g)@x mean? It’s equivalent to f[g][x]. And if f and g were ordinary functions in mathematics, this would basically be meaningless. But if f is a higher-order function, then f[g] can itself be a function, which can perfectly well be applied to x.
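Here’s a small sketch of that, with f left purely symbolic (twice is just a name introduced here for illustration):

twice = Function[g, g[g[#]] &];
twice[f]       (* -> f[f[#1]] &, itself a function *)
twice[f][x]    (* -> f[f[x]] *)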

OK, there’s another piece of complexity here. In f[x] the f is a function of one argument. And f[x] is equivalent to Function[a, f[a]][x]. But what about a function of two arguments, say f[x, y]? This can be written Function[{a,b}, f[a, b]][x, y]. But what about Function[{a}, f[a, b]]? What would this be? It’s got a “free variable” b just hanging out. Function[{b}, Function[{a}, f[a, b]]] would “bind” that variable. And then Function[{b}, Function[{a}, f[a, b]]][y][x] gives f[x, y] again. (The process of unwinding functions so that they have single arguments is called “currying”, after a logician named Haskell Curry.)

If there are free variables, then there’s all sorts of complexity about how functions can be composed. But if we restrict ourselves to Function or λ objects that don’t have free variables, then these can basically be freely composed. And such objects are called combinators.

Combinators have a long history. So far as one knows, they were first invented in 1920 by a student of David Hilbert’s named Moses Schönfinkel. At the time, it had only recently been discovered that one didn’t need And and Or and Not to represent expressions in standard propositional logic: it was sufficient to use the single operator that we’d now call Nand (because, for example, writing Nand as ·, Or[a, b] is just (a·a)·(b·b)). Schönfinkel wanted to find the same kind of minimal representation of predicate logic, or in effect, logic including functions.

And what he came up with was the two “combinators” S and K. In Wolfram Language notation, K[x_][y_] → x and S[x_][y_][z_] → x[z][y[z]]. Now, here’s the remarkable thing: it turns out to be possible to use these two combinators to perform any computation. So, for example, S[K[S]][S[K[S[K[S]]]][S[K[K]]]] can be used as a function to add two integers.
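As a sketch, one can set these up as Wolfram Language replacement rules (using lowercase s and k just to keep the symbols inert), and check the standard fact that S[K][K] acts as the identity:

skRules = {k[x_][y_] :> x, s[x_][y_][z_] :> x[z][y[z]]};
s[k][k][a] //. skRules   (* -> a : S[K][K] behaves as the identity combinator *)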

It is, to put it mildly, quite abstract stuff. But now that one’s understood Turing machines and lambda calculus, it’s possible to see that Schönfinkel’s combinators actually anticipated the concept of universal computation. (And what’s more remarkable still, the definitions of S and K from 1920 are almost minimally simple, reminiscent of the very simplest universal Turing machine, which I suggested in the 1990s and which was finally proved universal in 2007.)

But back to our page, and the line PI1IIx. The symbols here are combinators, and they’re all intended to be composed. But the convention was that function composition should be left-associative, so that fgx should be interpreted not like f@g@x as f@(g@x) or f[g[x]] but rather like (f@g)@x or f[g][x]. So, translating a bit for convenient Wolfram Language use, PI1IIx is p[i][one][i][i][x].
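One way to see the left-associative reading is to build it up explicitly (a small sketch; Fold just constructs the nested applications):

Fold[#1[#2] &, p, {i, one, i, i, x}]   (* -> p[i][one][i][i][x] *)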

Why would someone be writing something like this? To explain that, we have to talk about the concept of Church numerals (named after Alonzo Church). Let’s say we’re just working with symbols and with lambdas, or combinators. Is there a way we use these to represent integers?

Well, how about just saying that a number n corresponds to Function[x, Nest[f, x, n]]? Or, in other words, that (in shorter notation) 1 is f[#]&, 2 is f[f[#]]&, 3 is f[f[f[#]]]&, and so on. This might seem irreducibly obscure. But the reason it’s interesting is that it allows us to do everything completely symbolically and abstractly, without ever having to explicitly talk about something like integers.
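As a sketch of this representation (church is just a name introduced here for illustration):

church[n_] := Function[x, Nest[f, x, n]]
church[3][y]   (* -> f[f[f[y]]] *)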

With this setup, imagine, for example, adding two numbers: 3 can be represented as f[f[f[#]]]&, and 2 is f[f[#]]&. We can add them just by applying one of them to the other:

f[f[f[#]]] & [f[f[#]] &]

OK, but what is the f supposed to be? Well, just let it be anything! In a sense, “go lambda” all the way, and represent numbers by functions that take f as an argument. In other words, make 3 for example be Function[f, f[f[f[#]]]&] or Function[f, Function[x, f[f[f[x]]]]]. (And, yes, exactly when and how you need to name variables is the bane of lambda calculus.)
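Here’s a minimal sketch of that, following the representation just described, with g and x left purely symbolic:

three = Function[f, f[f[f[#]]] &];
two = Function[f, f[f[#]] &];
three[g][two[g][x]]   (* -> g[g[g[g[g[x]]]]] : Church-numeral 3 + 2 = 5 *)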

Here’s a fragment from Turing’s 1937 paper “Computability and λ-Definability” that sets things up exactly as we just discussed:

Fragment from "Computability and λ-Definability"

The notation is a little confusing. Turing’s x is our f, while his x' (the typesetter did him no favor by inserting space) is our x. But it’s exactly the same setup.

OK, so let’s take a look at the line right after the fold on the front of the page. It’s I1IIYI1IIx. In Wolfram Language notation this would be i[one][i][i][y][i][one][i][i][x]. But here, i is the identity function, so i[one] is just one. Meanwhile, one is the Church numeral for 1, or Function[f, f[#]&]. But with this definition one[a] becomes a[#]& and one[a][b] becomes a[b]. (By the way, i[a][b], or Identity[a][b], is also a[b].)

It keeps things cleaner to write the rules for i and one using pattern matching rather than explicit lambdas, but the result is the same. Apply these rules and one gets:

i[one][i][i][y][i][one][i][i][x] //. {i[x_] -> x, one[x_][y_] -> x[y]}

And that’s exactly the same as the first reduction shown:

Excerpt 1

OK, let’s look higher on the page again now:

Excerpt 2

There’s a rather confusing “E” and “D”, but underneath these say “P” and “Q”, so we can write out the expression, and evaluate it (note that here—after some confusion with the very last character—the writer makes both [ ... ] and ( … ) represent function application):

Function[a, a[p]][q]

OK, so this is the first reduction shown. To see more, let’s substitute in the form of Q:

q[p] /. q -> Function[f, f[i][one][i][i][x]]

We get exactly the next reduction shown. OK, so what about putting in the form for P?

Excerpt 3

Here’s the result:

p[i][one][i][i][x] /. {p -> Function[r, r[Function[s, s[one][i][i][y]]]]}

And now using the fact that i is the identity, we get:

i[Function[s, s[one][i][i][y]]][one][i][i][x] /. {i[x_] -> x}

But oops. This isn’t the next line written. Is there a mistake? It’s not clear. Because, after all, unlike in most of the other cases, there isn’t an arrow indicating that the next line follows from the previous one.

OK, so there’s a mystery there. But let’s skip ahead to the bottom of the page:

Excerpt 4

The 2 here is a Church numeral, defined for example by the pattern two[a_][b_] → a[a[b]]. But notice that this is actually the form of the second line, with a being Function[r, r[p]] and b being q. So then we’d expect the reduction to be:

two[Function[r, r[p]]][q] //. {two[x_][y_] -> x[x[y]]}

Somehow, though, the innermost a[b] is being written as x (probably different from the x earlier on the page), making the final result instead:

Function[r, r[p]][x]

OK, so we can decode quite a bit of what’s happening on the page. But at least one mystery that remains is what Y is supposed to be.

There’s actually a standard “Y combinator” in combinatory logic: the so-called fixed-point combinator. Formally, this is defined by saying that Y[f] must be equal to f[Y[f]], or, in other words, that Y[f] doesn’t change when f is applied, so that it’s a fixed point of f. (The Y combinator is related to #0 in the Wolfram Language.)

In modern times, the Y combinator has been made famous by the Y Combinator startup accelerator, named that way by Paul Graham (who had been a longtime enthusiast of functional programming and the LISP programming language—and had written an early web store using it) because (as he once told me) “nobody understands the Y combinator”. (Needless to say, Y Combinator is all about avoiding having companies go to fixed points…)

The Y combinator (in the sense of fixed-point combinator) was invented several times. Turing actually came up with a version of it in 1937, which he called Θ. But is the “Y” on our page the famous fixed-point combinator? Probably not. So what is our “Y”? We see this reduction:

Excerpt 5

But that’s not enough information to uniquely determine what Y is. It’s clear Y isn’t operating just on a single argument; it seems to be dealing with at least two arguments. But it’s not clear (at least to me) how many arguments it’s taking, and what it’s doing.

OK, so even though we can interpret many parts of the page, we have to say that globally it’s not clear what’s been done. But even though it’s needed a lot of explanation here, what’s on the page is actually fairly elementary in the world of lambda calculus and combinators. Presumably it’s an attempt to construct a simple “program”—using lambda calculus and combinators—to do something. But as is typical in reverse engineering, it’s hard for us to tell what the “something”— the overall “explainable” goal—is supposed to be.

There’s one more feature of the page that’s worth commenting on, and that’s its use of brackets. In traditional mathematics one basically (if confusingly) uses parentheses for everything—both function application (as in f(x)) and grouping of terms (as in (1+x)(1-x), or, more ambiguously, a(1-x)). (In Wolfram Language, we separate different uses, with square brackets for function application—as in f[x]—and parentheses only for grouping.)

And in the early days of lambda calculus, there were lots of issues about brackets. Later, Alan Turing would write a whole (unpublished) paper entitled “The Reform of Mathematical Notation and Phraseology”, but already in 1937 he felt he needed to describe the (rather hacky) current conventions for lambda calculus (which were due to Church, by the way).

He said that f applied to g should be written {f}(g), unless f is just a single symbol, in which case it can be f(g). Then he said that a lambda (as in Function[a, b]) should be written λ a[b], or alternatively λ a . b. By perhaps 1940, however, the whole idea of using { … } and [ ... ] to mean different things had been dropped, basically in favor of standard-mathematical-style parentheses.

Look at what’s near the top of the page:

Excerpt 6

As written, this is a bit hard to understand. In Church’s convention, the square brackets would be for grouping, with the opening bracket replacing the dot. And with this convention, it’s clear that the Q (finally labeled D) enclosed in parentheses at the end is what the whole initial lambda is applied to. But in fact the square bracket here doesn’t delimit the body of the lambda; instead, it represents another function application, and there’s no explicit specification of where the body of the lambda ends. At the very end, one can see that the writer changed a closing square bracket to a parenthesis, thereby effectively enforcing Church’s convention—and making the expression evaluate as the page shows.

So what does this little notational tangle imply? I think it strongly suggests that the page was written in the 1930s, or not too long thereafter—before conventions for brackets became clearer.

Whose Handwriting Is It?

OK, so we’ve talked about what’s on the page. But what about who wrote it?

The most obvious candidate would be Alan Turing, since, after all, the page was inside a book he owned. And in terms of content there doesn’t seem to be anything inconsistent with Alan Turing having written it—perhaps even when he was first understanding lambda calculus after getting Church’s paper in early 1936.

But what about the handwriting? Is that consistent with Alan Turing’s? Here are a few surviving samples that we know were written by Alan Turing:

Samples of Alan Turing's handwriting

The running text definitely looks quite different. But what about the notation? At least to my eye, it didn’t look so obviously different—and one might think that any difference could just be a reflection of the fact that the extant samples are pieces of exposition, while our page shows “thinking in action”.

Conveniently, the Turing Archive contains a page where Turing wrote out a table of symbols to use for notation. And comparing this, the letter forms did look to me fairly similar (this was from Turing’s time of studying plant growth, hence the “leaf area” annotation):

Table of Symbols

But I wanted to check further. So I sent the samples to Sheila Lowe, a professional handwriting examiner (and handwriting-based mystery writer) I happen to know—just presenting our page as “sample A” and known Turing handwriting as “sample B”. Her response was definitive, and negative: “The writing style is entirely different. Personality-wise, the writer of sample B has a quicker, more intuitive thinking style than the one of sample A.” I wasn’t yet completely convinced, but decided it was time to start looking at other alternatives.

So if Turing didn’t write this, who did? Norman Routledge said he got the book from Robin Gandy, who was Turing’s executor. So I sent along a “Sample C”, from Gandy:

Sample C

But Sheila’s initial conclusion was that the three samples were likely written by three different people, noting again that sample B came from “the quickest thinker and the one that is likely most willing to seek unusual solutions to problems”. (I find it a little charming that a modern handwriting expert would give this assessment of Turing’s handwriting, given how vociferously Turing’s school reports from the 1920s complained about his handwriting.)

Well, at this point it seemed as if both Turing and Gandy had been eliminated as writers of the page. So who might have written it? I started thinking about people Turing might have lent the book to. Of course, they’d have to be capable of doing calculations in lambda calculus.

I assumed that the person would have to be in Cambridge, or at least in England, given the watermark on the paper. And I took as a working hypothesis that 1936 or thereabouts was the relevant time. So who did Turing know then? We got a list of all math students and faculty at King’s College at the time. (There were 13 known students who started in 1930 through 1936.)

And from these, the most promising candidate seemed to be David Champernowne. He was the same age as Turing, a longtime friend, and also interested in the foundations of mathematics—in 1933 already publishing a paper on what’s now called Champernowne’s constant: 0.12345678910111213… (obtained by concatenating the digits of 1, 2, 3, 4, …, 8, 9, 10, 11, 12, …, and one of the very few numbers known to be “normal” in the sense that every possible block of digits occurs with equal frequency). In 1937, he even used Dirac gamma matrices, as mentioned in Dirac’s book, to solve a recreational math problem. (As it happens, years later, I became quite an aficionado of gamma matrix computations.)

After starting in mathematics, though, Champernowne came under the influence of John Maynard Keynes (also at King’s), and eventually became a distinguished economist, notably doing extensive work on income inequality. (Still, in 1948 he also worked with Turing to design Turochamp: a chess-playing program that almost became the first ever to be implemented on a computer.)

But where could I find a sample of Champernowne’s handwriting? Soon I’d located his son Arthur Champernowne on LinkedIn, who, curiously, had a degree in mathematical logic, and had been working for Microsoft. He said his father had talked to him quite a lot about Turing’s work, though hadn’t mentioned combinators. He sent me a sample of his father’s handwriting (a piece about algorithmic music composition):

Champernowne's handwriting

One could immediately tell it wasn’t a match (Champernowne’s f’s have loops, etc.)

So who else might it be? I wondered about Max Newman, in many ways Alan Turing’s mentor. Newman had first got Turing interested in “mechanizing mathematics”, was a longtime friend, and years later would be his boss at Manchester in the computer project there. (Despite his interest in computation, Newman always seems to have seen himself first and foremost as a topologist, though his cause wasn’t helped by a flawed proof he produced of the Poincaré conjecture.)

It wasn’t difficult to find a sample of Newman’s handwriting. And no, definitely not a match.

Tracing the Book

OK, so handwriting identification hadn’t worked. And I decided the next thing to do was to try to trace in a bit more detail what had actually happened to the book I had in my hands.

So, first, what was the more detailed story with Norman Routledge? He had gone to King’s College, Cambridge as an undergraduate in 1946, and had gotten to know Turing then (yes, they were both gay). He graduated in 1949, then started doing a PhD with Turing as his advisor. He got his PhD in 1954, working on mathematical logic and recursion theory. He got a fellowship at King’s College, and by 1957 was Director of Studies in Mathematics there. He could have stayed doing this his whole life, but he had broad interests (music, art, architecture, recreational math, genealogy, etc.) and in 1960 changed course, and became a teacher at Eton—where he entertained (and educated) many generations of students (including me) with his eclectic and sometimes outlandish knowledge.

Could Norman Routledge have written the mysterious page? He knew lambda calculus (though, coincidentally, he mentioned at our tea in 2005 that he always found it “confusing”). But his distinctive handwriting style immediately excludes him as a possible writer.

Could the page be somehow associated with a student of Norman’s, perhaps from when he was still in Cambridge? I don’t think so. Because I don’t think Norman ever taught about lambda calculus or anything like it. In writing this piece, I found that Norman wrote a paper in 1955 about doing logic on “electronic computers” (and creating conjunctive normal forms, as BooleanMinimize now does). And when I knew Norman he was quite keen on writing utilities for actual computers (his initials were “NAR”, and he named his programs “NAR…”, with, for example, “NARLAB” being a program for creating textual labels using hole patterns punched in paper tape). But he never talked about theoretical models of computation.

OK, but let’s read Norman’s note inside the book a bit more carefully. The first thing we notice is that he talks about being “offered books from a dead man’s library”. And from the wording, it sounds as if this happened quite quickly after a person died, suggesting that Norman got the book soon after Turing’s death in 1954, and that Gandy didn’t have it for very long. Norman goes on to say that actually he got four books in total, two on pure math, and two on theoretical physics.

Then he says that he gave “the other [physics] one (by Herman Weyl, I think)” to “Sebag Montefiore, a pleasant, clever boy whom you [George Rutter] may remember”. OK, so who is that? I searched for my rarely used Old Etonian Association List of Members. (I have to report that on opening it, I could not help but notice its rules from 1902, the first under “Rights of Members” charmingly being “To wear the Colours of the Association”. I should add that I would probably never have joined this association or got this book but for the insistence of a friend of mine at Eton named Nicholas Kermack, who from the age of 12 planned how he would one day become Prime Minister, but sadly died at the age of 21.)

But in any case, there were five Sebag-Montefiores listed, with quite a distribution of dates. It wasn’t hard to figure out that the appropriate one was probably Hugh Sebag-Montefiore. Small world that it is, it turned out that his family had owned Bletchley Park before selling it to the British Government in 1938. And in 2000, Sebag-Montefiore had written a book about the breaking of Enigma—which is presumably why in 2002 Norman thought to give him a book that had been owned by Turing.

OK, so what about the other books Norman got from Turing? Not having any other way to work out what happened to them, I ordered a copy of Norman’s will. The last clause in the will was classic Norman:

Excerpt from Norman's will

But what the will ultimately said was that Norman’s books should be left to King’s College. And although the complete collection of his books doesn’t seem to be anywhere to be found, the two Turing-owned pure math books that he mentioned in his note are now duly in the King’s College archive collection.

But, OK, so the next question is: what happened to Turing’s other books? I looked up Turing’s will, which seemed to leave them all to Robin Gandy.

Gandy was a math undergraduate at King’s College, Cambridge, who in his last year of college—in 1940—had become friends with Alan Turing. In the early part of the war, Gandy worked on radio and radar, but in 1944 he was assigned to the same unit as Turing, working on speech encipherment. And after the war, Gandy went back to Cambridge, soon starting a PhD, with Turing as his advisor.

Gandy’s war work apparently got him interested in physics, and his thesis, completed in 1952, was entitled “On Axiomatic Systems in Mathematics and Theories in Physics”. What Gandy seems to have been trying to do is to characterize what physical theories are in mathematical logic terms. He talks about type theory and rules of inference, but never about Turing machines. And from what we know now, I think he rather missed the point. And indeed my own work from the early 1980s argued that physical processes should be thought of as computations—like Turing machines or cellular automata—not as things like theorems to be deduced. (Gandy has a rather charming discussion of the order of types involved in physical theories, saying for example that “I reckon that the order of any computable binary decimal is less than eight”. He says that “one of the reasons why modern quantum field theory is so difficult is that it deals with objects of rather high type—functionals of functions…”, eventually suggesting that “we might well take the greatest type in common use as an index of mathematical progress”.)

Gandy mentions Turing a few times in the thesis, noting in the introduction that he owes a debt to A. M. Turing, who “first called my somewhat unwilling attention to the system of Church” (i.e. lambda calculus)—though in fact the thesis has very few lambdas in evidence.

After his thesis, Gandy turned to purer mathematical logic, and for more than three decades wrote papers at the rate of about one per year, and traveled the international mathematical logic circuit. In 1969 he moved to Oxford, and I have to believe that I must have met him in my youth, though I don’t have any recollection of it.

Gandy apparently quite idolized Turing, and in later years would often talk about him. But then there was the matter of the Turing collected works. Shortly after Turing died, Sara Turing and Max Newman had asked Gandy—as Turing’s executor—to organize the publication of Turing’s unpublished papers. Years went by. Letters in the archives record Sara Turing’s frustration. But somehow Gandy never seemed to get the papers together.

Gandy died in 1995, still without the collected works complete. Nick Furbank—a literary critic and biographer of E. M. Forster who Turing had gotten to know at King’s College—was Turing’s literary executor, and finally he swung into action on the collected works. The most contentious volume seemed to be the one on mathematical logic, and for this he enlisted Robin Gandy’s first serious PhD student, a certain Mike Yates—who found letters to Gandy about the collected works that had been unopened for 24 years. (The collected works finally appeared in 2001—45 years after they were started.)

But what about the books Turing owned? In continuing to try to track them down, my next stop was the Turing family, and specifically Turing’s brother’s youngest child, Dermot Turing (who is actually Sir Dermot Turing, as a result of a baronetcy which passed down the non-Alan branch of the Turing family). Dermot Turing (who recently wrote a biography of Alan Turing) told me about “granny Turing” (aka Sara Turing), whose house apparently shared a garden gate with his family’s, and many other things about Alan Turing. But he said the family never had any of Alan Turing’s books.

So I went back to reading wills, and found out that Gandy’s executor was his student Mike Yates. We found out that Mike Yates had retired from being a professor 30 years ago, but was now living in North Wales. He said that in the decades he was working in mathematical logic and theory of computation, he’d never really touched a computer—but finally did when he retired (and, as it happens, discovered Mathematica soon thereafter). He said how remarkable it was that Turing had become so famous—and that when he’d arrived at Manchester just three years after Turing died, nobody talked about Turing, not even Max Newman when he gave a course about logic. Though later on, Gandy would talk about how swamped he was in dealing with Turing’s collected works—eventually leaving the task to Mike.

What did Mike know about Turing’s books? Well, he’d found one handwritten notebook of Turing’s, that Gandy had not given to King’s College, because (bizarrely) Gandy had used it as camouflage for notes he kept about his dreams. (Turing kept dream notebooks too, that were destroyed when he died.) Mike said that notebook had recently been sold at auction for about $1M. And that otherwise he didn’t think there was any Turing material among Gandy’s things.

It seemed like all our leads had dried up. But Mike asked to see the mysterious piece of paper. And immediately he said, “That’s Robin Gandy’s handwriting!” He said he’d seen so much of it over the years. And he was sure. He said he didn’t know much about lambda calculus, and couldn’t really read the page. But he was sure it had been written by Robin Gandy.

We went back to our handwriting examiner with more samples, and she agreed that, yes, what was there was consistent with Gandy’s writing. So finally we had it: Robin Gandy had written our mysterious piece of paper. It wasn’t written by Alan Turing; it was written by his student Robin Gandy.

Of course, some mysteries remain. Presumably Turing lent Gandy the book. But when? The lambda calculus notation seems like it’s from the 1930s. But based on comments in Gandy’s thesis, Gandy probably wouldn’t have been doing anything with lambda calculus until the late 1940s. Then there’s the question of why Gandy wrote it. It doesn’t seem directly related to his thesis, so maybe it was when he was first trying to understand lambda calculus.

I doubt we’ll ever know. But it’s certainly been interesting trying to track it down. And I have to say that the whole process has done much to heighten my awareness of just how complex the stories may be of all those books from past centuries that I own. And it makes me think I’d better make sure I’ve gone through all their pages, just to find out what curious things might be in there…


Thanks for additional help to Jonathan Gorard (local research in Cambridge), Dana Scott (mathematical logic) and Matthew Szudzik (mathematical logic).

The Ease of Wolfram|Alpha, the Power of Mathematica: Introducing Wolfram|Alpha Notebook Edition


Wolfram|Alpha Notebook Edition

The Next Big Step for Wolfram|Alpha

Wolfram|Alpha has been a huge hit with students. Whether in college or high school, Wolfram|Alpha has become a ubiquitous way for students to get answers. But it’s a one-shot process: a student enters the question they want to ask (say in math) and Wolfram|Alpha gives them the (usually richly contextualized) answer. It’s incredibly useful—especially when coupled with its step-by-step solution capabilities.

But what if one doesn’t want just a one-shot answer? What if one wants to build up (or work through) a whole computation? Well, that’s what we created Mathematica and its whole notebook interface to do. And for more than 30 years that’s how countless inventions and discoveries have been made around the world. It’s also how generations of higher-level students have been taught.

But what about students who aren’t ready to use Mathematica yet? What if we could take the power of Mathematica (and what’s now the Wolfram Language), but combine it with the ease of Wolfram|Alpha?

Well, that’s what we’ve done in Wolfram|Alpha Notebook Edition.

It’s built on a huge tower of technology, but what it does is to let any student—without learning any syntax or reading any documentation—immediately build up or work through computations. Just type input the way you would in Wolfram|Alpha. But now you’re not just getting a one-shot answer. Instead, everything is in a Wolfram Notebook, where you can save and use previous results, and build up or work through a whole computation:

Wolfram Notebook

The Power of Notebooks

Being able to use Wolfram|Alpha-style free-form input is what opens Wolfram|Alpha Notebook Edition up to the full range of students. But it’s the use of the notebook environment that makes it so uniquely valuable for education. Because by being able to work through things in a sequence of steps, students get to really engage with the computations they’re doing.

Try one step. See what happens. Change it if you want. Understand the output. See how it fits into the next step. And then—right there in the notebook—see how all your steps fit together to give your final results. And then save your work in the notebook, to continue—or review what you did—another time.

But notebooks aren’t just for storing computations. They can also contain text and structure. So students can use them not just to do their computations, but also to keep notes, and to explain the computations they’re doing, or the results they get:

Student notebook

And in fact, Wolfram Notebooks enable a whole new kind of student work: computational essays. A computational essay has both text and computation—combined to build up a narrative to which both human and computer contribute.

The process of creating a computational essay is a great way for students to engage with material they’re studying. Computational essays can also provide a great showcase of student achievement, as well as a means of assessing student understanding. And they’re not just something to produce for an assignment: they’re active computable documents that students can keep and use at any time in the future.

Study notebook

But students aren’t the only ones to produce notebooks. In Wolfram|Alpha Notebook Edition, notebooks are also a great medium for teachers to provide material to students. Describe a concept in a notebook, then let students explore by doing their own computations right there in the notebook. Or make a notebook defining an assignment or a test—then let the students fill in their work (and grade it right there in the notebook).

Assignment

It’s very common to use Wolfram|Alpha Notebook Edition to create visualizations of concepts. Often students will just ask for the visualizations themselves. But teachers can also set up templates for visualizations, and let students fill in their own functions or data to explore for themselves.

Visualizations

Wolfram|Alpha Notebook Edition also supports dynamic interactive visualizations—for example using the Wolfram Language Manipulate function. And in Wolfram|Alpha Notebook Edition students (and teachers!) can build all sorts of dynamic visualizations just using natural language:

Dynamic visualizations
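For example, a request like “plot sin(a x) with a from 1 to 5” would correspond to ordinary Manipulate code along these lines (a hypothetical sketch; the exact phrasing and the generated code will vary):

Manipulate[Plot[Sin[a x], {x, 0, 2 Pi}], {a, 1, 5}]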

But what if you want some more sophisticated interactive demonstration, that might be hard to specify? Well, Wolfram|Alpha Notebook Edition has direct access to the Wolfram Demonstrations Project, which contains over 12,000 Demonstrations. You can ask for Demonstrations using natural language, or you can just browse the Demonstrations Project website, select a Demonstration, copy it into your Wolfram|Alpha Notebook Edition notebook, and then immediately use it there:

Demonstrations

With Wolfram|Alpha Notebook Edition it’s very easy to create compelling content. The content can involve pure calculations or visualizations. But—using the capabilities of the Wolfram Knowledgebase—it can also involve a vast range of real-world data, whether about countries, chemicals, words or artworks. And you can access it using natural language, and work with it directly in a notebook:

Using natural language

Wolfram|Alpha Notebook Edition is a great tool for students to use on their own computers. But it’s also a great tool for lectures and class demonstrations (as well as for student presentations). Go to File > New > Presenter Notebook, and you’ll get a notebook that’s set up to create a Wolfram|Alpha Notebook Edition slide show:

Presenter notebook

Click Start Presentation and you can start presenting. But what you’ll have is not just a “PowerPoint-style” slide show. It’s a fully interactive, editable, computable slide show. The Manipulate interfaces work. Everything is immediately editable. And you can do computations right there during the presentation, exploring different cases, pulling in different data, and so on.

Slide show

Making Code from Natural Language

We invented notebooks more than 30 years ago, and they’ve been widely used in Mathematica ever since. But while in Mathematica (and Wolfram Desktop) notebooks you (by default) specify computations in the precise syntax and semantics of the Wolfram Language, in Wolfram|Alpha Notebook Edition notebooks you instead specify them just using free-form Wolfram|Alpha-style input.

And indeed one of the key technical achievements that’s made Wolfram|Alpha Notebook Edition possible is that we’ve now developed increasingly robust natural-language-to-code technology that’s able to go from the free-form natural language input you type to precise Wolfram Language code that can be used to build up computations:

Natural language to code

By default, Wolfram|Alpha Notebook Edition is set up to show you the Wolfram Language code it generates. You don’t need to look at this code (and you can set it to always be hidden). But—satisfyingly for me as a language designer—students seem to find it very easy to read, often actually easier than math. And reading it gives them an extra opportunity to understand what’s going on—and to make sure the computation they’ve specified is actually the one they want.

And there’s a great side effect to the fact that Wolfram|Alpha Notebook Edition generates code: through routinely being exposed to code that represents natural language they’ve entered, students gradually absorb the idea of expressing things in computational language, and the concepts of computational thinking.

If a student wants to change a computation when they’re using Wolfram|Alpha Notebook Edition, they can always edit the free-form input they gave. But they can also directly edit the Wolfram Language that’s been generated, giving them real computational language experience.

Free-form input

What Should I Do Next? The Predictive Interface

A central goal of Wolfram|Alpha Notebook Edition is to be completely “self-service”—so that students at all levels can successfully use it without any outside instruction or assistance. Of course, free-form input is a key part of achieving this. But another part is the Wolfram|Alpha Notebook Edition Predictive Interface—that suggests what to do next based on what students have done.

Enter a computation and you’ll typically see some buttons pop up under the input field:

Buttons

These buttons will suggest directions to take. Here step-by-step solution generates an enhanced interactive version of Wolfram|Alpha Pro step-by-step functionality—all right in the notebook:

Step-by-step functionality

Click related computations and you’ll see suggestions for different computations you might want to do:

Related computations

It suggests plotting the integrand and the integral:

Plotting the integrand and the integral

It also suggests you might like to see a series expansion:

Series expansion

Now notice that underneath the output there’s a bar of suggestions about possible follow-on computations to do on this output. Click, for example, coefficient list to find the list of coefficients:

Coefficient list

Now there are new suggestions. Click, for example, total to find the total of the coefficients:

Find the total of the coefficients

The Math Experience

Wolfram|Alpha Notebook Edition has got lots of features to enhance the “math experience”. For example, click the button at the top of the notebook and you’ll get a “math keyboard” that you can use to directly enter math notation:

Math keyboard

The Wolfram Language that underlies Wolfram|Alpha Notebook Edition routinely handles the math that’s needed by the world’s top mathematicians. But having all that sophisticated math can sometimes lead to confusions for students. So in Wolfram|Alpha Notebook Edition there are ways to say “keep the math simple”. For example, you can set it to minimize the use of complex numbers:

Simplified

Simplified

Wolfram|Alpha Notebook Edition also by default does things like adding constants of integration to indefinite integrals:

Constants of integration

By the way, Wolfram|Alpha Notebook Edition by default automatically formats mathematical output in elegant “traditional textbook” form. But it always includes a little button next to each output, so you can toggle between “traditional form”, and standard Wolfram Language form.

It’s quite common in doing math to have a function, and just say “I want to plot that!” But what range should you use? In Mathematica (or the Wolfram Language), you’d have to specify it. But in Wolfram|Alpha Notebook Edition there’s always an automatic range that’s picked:

Automatic range

But since you can see the Wolfram Language code—including the range—it’s easy to change that, and specify whatever range you want.

Specify range
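For instance, if the generated code happened to be Plot[Sin[x^2], {x, -3, 3}] (a hypothetical example), one could just edit the range in place:

Plot[Sin[x^2], {x, -5, 5}]   (* same function, explicitly widened range *)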

What if you want to get an interactive control to change the range, or to change a parameter in the function? In Mathematica (or the Wolfram Language) you’d have to write a Manipulate. But in Wolfram|Alpha Notebook Edition, you can build a whole interactive interface just using natural language:

Interactive interface

And because in Wolfram|Alpha Notebook Edition the Manipulate computations are all running directly on your local computer, nothing is being slowed down by network transmission—and so everything moves at full speed. (Also, if you have a long computation, you can just let it keep running on your computer; there’s no timeout like in Wolfram|Alpha on the web.)

Multistep Computation

One of the important features of Wolfram|Alpha Notebook Edition is that it doesn’t just do one-shot computations; it allows you to do multistep computations that in effect involve a back-and-forth conversation with the computer, in which you routinely refer to previous results:

Multistep computation

Often it’s enough to just talk about the most recent result, and say things like “plot it as a function of x”. But it’s also quite common to want to refer back to results earlier in the notebook. One way to do this is to say things like “the result before last”—or to use the Out[n] labels for each result. But another thing that Wolfram|Alpha Notebook Edition allows you to do is to set values of variables, that you can then use throughout your session:

Set values

It’s also possible to define functions, all with natural language:

Define functions

There are lots of complicated design and implementation issues that arise in dealing with multistep computations. For example, if you have a traditional result for an indefinite integral, with a constant of integration, what do you do with the constant when you want to plot the result? (Wolfram|Alpha Notebook Edition consistently handles arbitrary additive constants in plots by effectively setting them to zero.)

Integrate x

It can also be complicated to know what refers to what in the “conversation”. If you say “plot”, are you trying to plot your latest result, or are you asking for an interface to create a completely new plot? If you use a pronoun, as in “plot it”, then it’s potentially more obvious what you mean, and Wolfram|Alpha Notebook Edition has a better chance of being able to use its natural language understanding capabilities to figure it out.

The World with Wolfram|Alpha Notebook Edition

It’s been very satisfying to see how extensively Wolfram|Alpha has been adopted by students. But mostly that adoption has been outside the classroom. Now, with Wolfram|Alpha Notebook Edition, we’ve got a tool that can immediately be put to use in the classroom, across the whole college and precollege spectrum. And I’m excited to see how it can streamline coursework, deepen understanding, enable new concepts to be taught, and effectively provide a course-based personal AI tutor for every student.

Starting today, Wolfram|Alpha Notebook Edition is available on all standard computer platforms (Mac, Windows, Linux). (A cloud version will also be available on the web soon.) Colleges and universities with full Wolfram Technology System site licenses can automatically start using Wolfram|Alpha Notebook Edition today; at schools with other site licenses, it can immediately be added. It’s available to K–12 schools and junior colleges in classroom packs, or as a site license. And, of course, it’s also available to individual teachers, students, hobbyists and others.

(Oh, and if you have Mathematica or Wolfram Desktop, it’ll also be possible in future versions to create “Wolfram|Alpha mode” notebooks that effectively integrate Wolfram|Alpha Notebook Edition capabilities. And in general there’s perfect compatibility among Wolfram|Alpha Notebook Edition, Mathematica, Wolfram Desktop, Wolfram Cloud, Wolfram Programming Lab, etc.—providing a seamless experience for people progressing across education and through professional careers.)

Like Wolfram|Alpha—and the Wolfram Language—Wolfram|Alpha Notebook Edition will continue to grow in capabilities far into the future. But what’s there today is already a remarkable achievement that I think will be transformative in many educational settings.

More than 31 years ago we introduced Mathematica (and what’s now the Wolfram Language). A decade ago we introduced Wolfram|Alpha. Now, today, with the release of Wolfram|Alpha Notebook Edition we’re giving a first taste—in the context of education—of a whole new approach to computing: a full computing environment that’s driven by natural language. It doesn’t supplant Wolfram Language, or Wolfram|Alpha—but it defines a new direction that in time will bring the power of computation to a whole massive new audience.

Announcing the Rule 30 Prizes


The Story of Rule 30

How can something that simple produce something that complex? It’s been nearly 40 years since I first saw rule 30—but it still amazes me. Long ago it became my personal all-time favorite science discovery, and over the years it’s changed my whole worldview and led me to all sorts of science, technology, philosophy and more.

But even after all these years, there are still many basic things we don’t know about rule 30. And I’ve decided that it’s now time to do what I can to stimulate the process of finding more of them out. So as of today, I am offering $30,000 in prizes for the answers to three basic questions about rule 30.

The setup for rule 30 is extremely simple. One’s dealing with a sequence of lines of black and white cells. And given a particular line of black and white cells, the colors of the cells on the line below are determined by looking at each cell and its immediate neighbors and then applying the following simple rule:

RulePlot[CellularAutomaton[30]]

If you start with a single black cell, what will happen? One might assume—as I at first did—that the rule is simple enough that the pattern it produces must somehow be correspondingly simple. But if you actually do the experiment, here’s what you find happens over the first 50 steps:

RulePlot[CellularAutomaton[30], {{1}, 0}, 50, Mesh -> All, ImageSize -> Full]

But surely, one might think, this must eventually resolve into something much simpler. Yet here’s what happens over the first 300 steps:

The first 300 steps of rule 30

And, yes, there’s some regularity over on the left. But many aspects of this pattern look for all practical purposes random. It’s amazing that a rule so simple can produce behavior that’s so complex. But I’ve discovered that in the computational universe of possible programs this kind of thing is common, even ubiquitous. And I’ve built a whole new kind of science—with all sorts of principles—based on this.

And gradually there’s been more and more evidence for these principles. But what specifically can rule 30 tell us? What concretely can we say about how it behaves? Even the most obvious questions turn out to be difficult. And after decades without answers, I’ve decided it’s time to define some specific questions about rule 30, and offer substantial prizes for their solutions.

I did something similar in 2007, putting a prize on a core question about a particular Turing machine. And at least in that case the outcome was excellent. In just a few months, the prize was won—establishing forever what the simplest possible universal Turing machine is, as well as providing strong further evidence for my general Principle of Computational Equivalence.

The Rule 30 Prize Problems again get at a core issue: just how complex really is the behavior of rule 30? Each of the problems asks this in a different, concrete way. Like rule 30 itself, they’re all deceptively simple to state. Yet to solve any of them will be a major achievement—that will help illuminate fundamental principles about the computational universe that go far beyond the specifics of rule 30.

I’ve wondered about every one of the problems for more than 35 years. And all that time I’ve been waiting for the right idea, or the right kind of mathematical or computational thinking, to finally be able to crack even one of them. But now I want to open this process up to the world. And I’m keen to see just what can be achieved, and what methods it will take.

The Rule 30 Prize Problems

For the Rule 30 Prize Problems, I’m concentrating on a particularly dramatic feature of rule 30: the apparent randomness of its center column of cells. Start from a single black cell, then just look down the sequence of values of this cell—and it seems random:

ArrayPlot[
 MapIndexed[If[#2[[2]] != 21, # /. {0 -> 0.2, 1 -> .6}, #] &,
  CellularAutomaton[30, {{1}, 0}, 20], {2}], Mesh -> All]

But in what sense is it really random? And can one prove it? Each of the Prize Problems in effect uses a different criterion for randomness, then asks whether the sequence is random according to that criterion.

Problem 1: Does the center column always remain non-periodic?

Here’s the beginning of the center column of rule 30:

ArrayPlot[List@CellularAutomaton[30, {{1}, 0}, {80, {{0}}}], Mesh -> True, ImageSize -> Full]

It’s easy to see that this doesn’t repeat—it doesn’t become periodic. But this problem is about whether the center column ever becomes periodic, even after an arbitrarily large number of steps. Just by running rule 30, we know the sequence doesn’t become periodic in the first billion steps. But what about ever? To establish that, we need a proof. (Here are the first million and first billion bits in the sequence, by the way, as entries in the Wolfram Data Repository.)

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s what one gets if one tallies the number of black and of white cells in successively more steps in the center column of rule 30:

The number of black and of white cells in the center column of rule 30
Dataset[{{1, 1, 0, ""}, {10, 7, 3, 2.3333333333333335}, {100, 52, 48, 1.0833333333333333}, 
 {1000, 481, 519, 0.9267822736030829}, {10000, 5032, 4968, 1.0128824476650564}, 
 {100000, 50098, 49902, 1.0039276982886458}, {1000000, 500768, 499232, 
  1.003076725850907}, {10000000, 5002220, 4997780, 1.0008883944471345}, 
 {100000000, 50009976, 49990024, 1.000399119632349}, 
 {1000000000, 500025038, 499974962, 1.0001001570154626}}]

The results are certainly close to equal for black vs. white. But what this problem asks is whether the limit of the ratio after an arbitrarily large number of steps is exactly 1.
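Here’s a sketch of how to reproduce such a tally, for the first million steps of the center column:

col = Flatten[CellularAutomaton[30, {{1}, 0}, {10^6, {{0}}}]];
{Count[col, 1], Count[col, 0], N[Count[col, 1]/Count[col, 0]]}   (* black count, white count, ratio *)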

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

To find the nth cell in the center column, one can always just run rule 30 for n steps, computing the values of all the cells in this diamond:

With[{n = 100},
 ArrayPlot[
  MapIndexed[If[Total[Abs[#2 - n/2 - 1]] <= n/2, #, #/4] &,
   CellularAutomaton[30, CenterArray[{1}, n + 1], n], {2}]]]

But if one does this directly, one’s doing n^2 individual cell updates, so the computational effort required goes up like O(n^2). This problem asks if there’s a shortcut way to compute the value of the nth cell, without all this intermediate computation—or, in particular, in less than O(n) computational effort.
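As a baseline, here’s the direct approach written out explicitly (centerCell is just an illustrative name, with the convention that step 0 is the initial single black cell):

(* value of the center cell after n steps, obtained by running the whole evolution *)
centerCell[n_] := Last[CellularAutomaton[30, {{1}, 0}, {n, {{0}}}]]
centerCell[100]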

The Digits of Pi

Rule 30 is a creature of the computational universe: a system found by exploring possible simple programs with the new intellectual framework that the paradigm of computation provides. But the problems I’ve defined about rule 30 have analogs in mathematics that are centuries old.

Consider the digits of π. They’re a little like the center column of rule 30. There’s a definite algorithm for generating them. Yet once generated they seem for all practical purposes random:

N[Pi, 85]

Just to make the analog a little closer, here are the first few digits of π in base 2:

BaseForm[N[Pi, 25], 2]

And here are the first few bits in the center column of rule 30:

Row[CellularAutomaton[30, {{1}, 0}, {90, {{0}}}]]

Just for fun, one can convert these to base 10:

N[FromDigits[{Flatten[CellularAutomaton[30, {{1}, 0}, {500, {0}}]], 
   0}, 2], 85]

Of course, the known algorithms for generating the digits of π are considerably more complicated than the simple rule for generating the center column of rule 30. But, OK, so what’s known about the digits of π?

Well, we know they don’t repeat. That was proved in the 1760s when it was shown that π is an irrational number—because the only numbers whose digits repeat are rational numbers. (It was also shown in 1882 that π is transcendental, i.e. that it cannot be expressed in terms of roots of polynomials.)

How about the analog of problem 2? Do we know if in the digit sequence of π different digits occur with equal frequency? By now more than 100 trillion binary digits have been computed—and the measured frequencies of digits are very close (in the first 40 trillion binary digits the ratio of 1s to 0s is about 0.9999998064). But in the limit, are the frequencies exactly the same? People have been wondering about this for several centuries. But so far mathematics hasn’t succeeded in delivering any results.
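One can reproduce this kind of tally directly—here just for the first million binary digits, not the multi-trillion-digit computations referred to above:

With[{d = Counts[First[RealDigits[Pi, 2, 10^6]]]}, {d, N[d[1]/d[0]]}]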

For rational numbers, digit sequences are periodic, and it’s easy to work out relative frequencies of digits. But for the digit sequences of all other “naturally constructed” numbers, basically there’s nothing known about limiting frequencies of digits. It’s a reasonable guess that actually the digits of π (as well as the center column of rule 30) are “normal”, in the sense that not only every individual digit, but also every block of digits of any given length in the limit occur with equal frequency. And as was noted in the 1930s, it’s perfectly possible to “digit-construct” normal numbers. Champernowne’s number, formed by concatenating the digits of successive integers, is an example (and, yes, this works in any base, and one can also get normal numbers by concatenating values of functions of successive integers):

N[ChampernowneNumber[10], 85]

But the point is that for “naturally constructed” numbers formed by combinations of standard mathematical functions, there’s simply no example known where any regularity of digits has been found. Of course, it ultimately depends what one means by “regularity”—and at some level the problem devolves into a kind of number-digit analog of the search for extraterrestrial intelligence. But there’s absolutely no proof that one couldn’t, for example, find even some strange combination of square roots that would have a digit sequence with some very obvious regularity.

OK, so what about the analog of problem 3 for the digits of π? Unlike rule 30, where the obvious way to compute elements in the sequence is one step at a time, traditional ways of computing digits of π involve getting better approximations to π as a complete number. With the standard (bizarre-looking) series invented by Ramanujan in 1910 and improved by the Chudnovsky brothers in 1989, the first few terms in the series give the following approximations:

Standard series
Style[Table[
   N[(12 Sum[((-1)^k (6 k)! (13591409 + 545140134 k))/((3 k)! (k!)^3 640320^(3 k + 3/2)), {k, 0, n}])^-1, 100],
   {n, 10}] // Column, 9]

So how much computational effort is it to find the nth digit? The number of terms required in the series is O(n). But each term needs to be computed to n-digit precision, which requires at least O(n) individual digit operations—implying that altogether the computational effort required is more than O(n).

Until the 1990s it was assumed that there wasn’t any way to compute the nth digit of π without computing all previous ones. But in 1995 Simon Plouffe discovered that actually it’s possible to compute—albeit slightly probabilistically—the nth digit without computing earlier ones. And while one might have thought that this would allow the nth digit to be obtained with less than O(n) computational effort, the fact that one has to do computations at n-digit precision means that at least O(n) computational effort is still required.
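A closely related development from the same period is the Bailey–Borwein–Plouffe formula for π, which is what makes base-16 digit extraction possible. Purely as a numerical illustration (this is the series itself, not the digit-extraction algorithm):

N[Sum[1/16^k (4/(8 k + 1) - 2/(8 k + 4) - 1/(8 k + 5) - 1/(8 k + 6)), {k, 0, 30}], 30]

which agrees with N[Pi, 30].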

Results, Analogies and Intuitions

Problem 1: Does the center column always remain non-periodic?

Of the three Rule 30 Prize Problems, this is the one on which the most progress has already been made. Because while it’s not known if the center column in the rule 30 pattern ever becomes periodic, Erica Jen showed in 1986 that no two columns can both become periodic. And in fact, one can also give arguments that a single column plus scattered cells in another column can’t both be periodic.

The proof about a pair of columns uses a special feature of rule 30. Consider the structure of the rule:

RulePlot[CellularAutomaton[30]]

Normally one would just say that given each triple of cells, the rule determines the color of the center cell below. But for rule 30, one can effectively also run the rule sideways: given the cell to the right and above, one can also uniquely determine the color of the cell to the left. And what this means is that if one is given two adjacent columns, it’s possible to reconstruct the whole pattern to the left:

ArrayPlot
GraphicsRow[
 ArrayPlot[#, PlotRange -> 1, Mesh -> All, PlotRange -> 1, 
    Background -> LightGray, 
    ImageSize -> {Automatic, 80}] & /@ (PadLeft[#, {Length[#], 10}, 
      10] & /@ 
    Module[{data = {{0, 1}, {1, 1}, {0, 0}, {0, 1}, {1, 1}, {1, 
         0}, {0, 1}, {1, 10}}}, 
     Flatten[{{data}, 
       Table[Join[
         Table[Module[{p, q = data[[n, 1]], r = data[[n, 2]], 
            s = data[[n + 1, 1]] },
           p = Mod[-q - r - q r + s, 2];
           PrependTo[data[[n]], p]], {n, 1, Length[data] - i}], 
         PrependTo[data[[-#]], 10] & /@ Reverse[Range[i]]], {i, 7}]}, 
      1]])]

But if the columns were periodic, it immediately follows that the reconstructed pattern would also have to be periodic. Yet by construction at least the initial condition is definitely not periodic, and hence the columns cannot both be periodic. The same argument works if the columns are not adjacent, and if one doesn’t know every cell in both columns. But there’s no known way to extend the argument to a single column—such as the center column—and thus it doesn’t resolve the first Rule 30 Prize Problem.
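Here’s a minimal check of the sideways determinism the argument relies on: the cell p immediately to the left of q is fixed by q, its right neighbor r, and the cell s directly below q (using the operator form CellularAutomaton[30] on a cyclic 3-cell block to get s):

(* for every (p, q, r): p == Mod[q + r + q r + s, 2], where s is the rule 30 update of the center cell *)
And @@ Flatten[Table[
   p == Mod[q + r + q r + CellularAutomaton[30][{p, q, r}][[2]], 2],
   {p, 0, 1}, {q, 0, 1}, {r, 0, 1}]]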

OK, so what would be involved in resolving it? Well, if it turns out that the center column is eventually periodic, one could just compute it, and show that. We know it’s not periodic for the first billion steps, but one could at least imagine that there could be a trillion-step transient, after which it’s periodic.

Is that plausible? Well, transients do happen—and theoretically (just like in the classic Turing machine halting problem) they can even be arbitrarily long. Here’s a somewhat funky example—found by a search—of a rule with 4 possible colors (totalistic code 150898). Run it for 200 steps, and the center column looks quite random:

Rule 150898
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {200, 150 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, 
 PixelConstrained -> 2, Frame -> False]

After 500 steps, the whole pattern still looks quite random:

Rule 150898
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {500, 300 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, Frame -> False, 
 ImagePadding -> 0, PlotRangePadding -> 0, PixelConstrained -> 1]

But if one zooms in around the center column, there’s something surprising: after 251 steps, the center column seems to evolve to a fixed value (or at least it’s fixed for more than a million steps):

Rule 150898
Grid[{ArrayPlot[#, Mesh -> True, 
     ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
       2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, ImageSize -> 38,
      MeshStyle -> Lighter[GrayLevel[.5, .65], .45]] & /@ 
   Partition[
    CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {-4, 4}}],
     100]}, Spacings -> .35]

Could some transient like this happen in rule 30? Well, take a look at the rule 30 pattern, now highlighting where the diagonals on the left are periodic:

ArrayPlot
steps = 500;
					diagonalsofrule30 = 
  Reverse /@ 
   Transpose[
    MapIndexed[RotateLeft[#1, (steps + 1) - #2[[1]]] &, 
     CellularAutomaton[30, {{1}, 0}, steps]]];

     diagonaldataofrule30 = 
  Table[With[{split = 
      Split[Partition[Drop[diagonalsofrule30[[k]], 1], 8]], 
     ones = Flatten[
       Position[Reverse[Drop[diagonalsofrule30[[k]], 1]], 
        1]]}, {Length[split[[1]]], split[[1, 1]], 
     If[Length[split] > 1, split[[2, 1]], 
      Length[diagonalsofrule30[[k]]] - Floor[k/2]]}], {k, 1, 
    2 steps + 1}];

transientdiagonalrule30 = %;

    transitionpointofrule30 = 
  If[IntegerQ[#[[3]]], #[[3]], 
     If[#[[1]] > 1, 
      8 #[[1]] + Count[Split[#[[2]] - #[[3]]][[1]], 0] + 1, 0] ] & /@ 
   diagonaldataofrule30;

   decreasingtransitionpointofrule30 = 
  Append[Min /@ Partition[transitionpointofrule30, 2, 1], 0];

  transitioneddiagonalsofrule30 = 
  Table[Join[
    Take[diagonalsofrule30[[n]], 
      decreasingtransitionpointofrule30[[n]]] + 2, 
    Drop[diagonalsofrule30[[n]], 
     decreasingtransitionpointofrule30[[n]]]], {n, 1, 2 steps + 1}];

     transientdiagonalrule30 = 
 MapIndexed[RotateRight[#1, (steps + 1) - #2[[1]]] &, 
  Transpose[Reverse /@ transitioneddiagonalsofrule30]];
  
  smallertransientdiagonalrule30 = 
  Take[#, {225, 775}] & /@ Take[transientdiagonalrule30, 275];

 Framed[ArrayPlot[smallertransientdiagonalrule30, 
  ColorRules -> {0 -> White, 1 -> Gray, 2 -> Hue[0.14, 0.55, 1], 
    3 -> Hue[0.07, 1, 1]}, PixelConstrained -> 1,
  Frame -> None,
  ImagePadding -> 0, ImageMargins -> 0,
  PlotRangePadding -> 0, PlotRangePadding -> Full
  ], FrameMargins -> 0, FrameStyle -> GrayLevel[.75]]

There seems to be a boundary that separates order on the left from disorder on the right. And at least over the first 100,000 or so steps, the boundary seems to move on average about 0.252 steps to the left at each step—with roughly random fluctuations:

ListLinePlot
data = CloudGet[
   CloudObject[
    "https://www.wolframcloud.com/obj/bc470188-f629-4497-965d-\
a10fe057e2fd"]];

ListLinePlot[
 MapIndexed[{First[#2], -# - .252 First[#2]} &, 
  Module[{m = -1, w}, 
   w = If[First[#] > m, m = First[#], m] & /@ data[[1]]; m = 1;
   Table[While[w[[m]] < i, m++]; m - i, {i, 100000}]]], 
 Filling -> Axis, AspectRatio -> 1/4, MaxPlotPoints -> 10000, 
 Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0}, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

But how do we know that there won’t at some point be a huge fluctuation, that makes the order on the left cross the center column, and perhaps even make the whole pattern periodic? From the data we have so far, it looks unlikely, but I don’t know any way to know for sure.

And it’s certainly the case that there are systems with exceptionally long “transients”. Consider the distribution of primes, and compute LogIntegral[n] - PrimePi[n]:

DiscretePlot
DiscretePlot[LogIntegral[n] - PrimePi[n], {n, 10000}, 
 Filling -> Axis,
 Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, 
 Joined -> True, PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Yes, there are fluctuations. But from this picture it certainly looks as if this difference is always going to be positive. And that’s, for example, what Ramanujan thought. But it turns out it isn’t true. At first the bound for where it would fail was astronomically large (Skewes’s number 10^10^10^964). And although still nobody has found an explicit value of n for which the difference is negative, it’s known that before n = 10^317 there must be one (and eventually the difference will be negative at least nearly a millionth of the time).

I strongly suspect that nothing like this happens with the center column of rule 30. But until we have a proof that it can’t, who knows?

One might think, by the way, that while one might be able to prove periodicity by exposing regularity in the center column of rule 30, nothing like that would be possible for non-periodicity. But actually, there are patterns whose center columns one can readily see are non-periodic, even though they’re very regular. The main class of examples are nested patterns. Here’s a very simple example, from rule 161—in which the center column has white cells when n = 2^k:

Rule 161
GraphicsRow[
 ArrayPlot[CellularAutomaton[161, {{1}, 0}, #]] & /@ {40, 200}]

Here’s a slightly more elaborate example (from the 2-neighbor 2-color rule 69540422), in which the center column is a Thue–Morse sequence ThueMorse[n]:

Thue-Morse sequence
GraphicsRow[
 ArrayPlot[
    CellularAutomaton[{69540422, 2, 2}, {{1}, 
      0}, {#, {-#, #}}]] & /@ {40, 400}]

One can think of the Thue–Morse sequence as being generated by successively applying the substitutions:

RulePlot
RulePlot[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}], 
 Appearance -> "Arrow"]

And it turns out that the nth term in this sequence is given by Mod[DigitCount[n, 2, 1], 2]—which is never periodic.
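Here’s a quick check of that identity against the built-in ThueMorse function:

And @@ Table[ThueMorse[n] == Mod[DigitCount[n, 2, 1], 2], {n, 1, 1000}]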

Will it turn out that the center column of rule 30 can be generated by a substitution system? Again, I’d be amazed (although there are seemingly natural examples where very complex substitution systems do appear). But once again, until one has a proof, who knows?

Here’s something else, that may be confusing, or may be helpful. The Rule 30 Prize Problems all concern rule 30 running in an infinite array of cells. But what if one considers just n cells, say with periodic boundary conditions (i.e. taking the right neighbor of the rightmost cell to be the leftmost cell, and vice versa)? There are 2^n possible total states of the system—and one can draw a state transition diagram that shows which state evolves to which other. Here’s the diagram for n = 4:

Graph
Graph[# -> CellularAutomaton[30][#] & /@ Tuples[{1, 0}, 4], 
 VertexLabels -> ((# -> 
       ArrayPlot[{#}, ImageSize -> 30, Mesh -> True]) & /@ 
    Tuples[{1, 0}, 4])]

And here it is for n = 4 through n = 11:

Grid
Row[Table[
  Framed[Graph[# -> CellularAutomaton[30][#] & /@ 
     Tuples[{1, 0}, n]]], {n, 4, 11}]]

The structure is that there are a bunch of states that appear only as transients, together with other states that are on cycles. Inevitably, no cycle can be longer than 2^n (actually, symmetry considerations show that it always has to be somewhat less than this).

OK, so on a size-n array, rule 30 always has to show behavior that becomes periodic with a period that’s less than 2^n. Here are the actual periods starting from a single black cell initial condition, plotted on a log scale:

ListLogPlot
ListLogPlot[
 Normal[Values[
   ResourceData[
      "Repetition Periods for Elementary Cellular Automata"][
     Select[#Rule == 30 &]][All, "RepetitionPeriods"]]], 
 Joined -> True, Filling -> Bottom, Mesh -> All, 
 MeshStyle -> PointSize[.008], AspectRatio -> 1/3, Frame -> True, 
 PlotRange -> {{47, 2}, {0, 10^10}}, PlotRangePadding -> .1, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And at least for these values of n, a decent fit is that the period is about 2^(0.63 n). And, yes, at least in all these cases, the period of the center column is equal to the period of the whole evolution. But what do these finite-size results imply about the infinite-size case? I, at least, don’t immediately see.

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s a plot of the running excess of 1s over 0s in 10,000 steps of the center column of rule 30:

ListLinePlot
ListLinePlot[
 Accumulate[2 CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}] - 1],
  AspectRatio -> 1/4, Frame -> True, PlotRangePadding -> 0, 
 AxesOrigin -> {Automatic, 0}, Filling -> Axis, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Here it is for a million steps:

ListLinePlot
ListLinePlot[
 Accumulate[
  2 ResourceData[
     "A Million Bits of the Center Column of the Rule 30 Cellular Automaton"] - 1], Filling -> Axis, Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, MaxPlotPoints -> 1000, PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And a billion steps:

ListLinePlot
data=Flatten[IntegerDigits[#,2,8]&/@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]];
data=Accumulate[2 data-1];
sdata=Downsample[data,10^5];
ListLinePlot[Transpose[{Range[10000] 10^5,sdata}],Filling->Axis,Frame->True,PlotRangePadding->0,AspectRatio->1/4,MaxPlotPoints->1000,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]]

We can see that there are times when there’s an excess of 1s over 0s, and vice versa, though, yes, as we approach a billion steps 1 seems to be winning over 0, at least for now.

But let’s compute the ratio of the total number of 1s to the total number of 0s. Here’s what we get after 10,000 steps:

ListLinePlot
Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.88, 1.04}}]]

Is this approaching the value 1? It’s hard to tell. Go on a little longer, and this is what we see:

ListLinePlot
Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^5 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.985, 1.038}}]]

The scale is getting smaller, but it’s still hard to tell what will happen. Plotting the difference from 1 on a log-log plot up to a billion steps suggests it’s fairly systematically getting smaller:

ListLogLogPlot
accdata=Accumulate[Flatten[IntegerDigits[#,2,8]&/@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]];

diffratio=FunctionCompile[Function[Typed[arg,TypeSpecifier["PackedArray"]["MachineInteger",1]],MapIndexed[Abs[N[#]/(First[#2]-N[#])-1.]&,arg]]];

data=diffratio[accdata];

ListLogLogPlot[Join[Transpose[{Range[3,10^5],data[[3;;10^5]]}],Transpose[{Range[10^5+1000,10^9,1000],data[[10^5+1000;;10^9;;1000]]}]],Joined->True,AspectRatio->1/4,Frame->True,Filling->Axis,PlotRangePadding->0,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]]

But how do we know this trend will continue? Right now, we don’t. And, actually, things could get quite pathological. Maybe the fluctuations in 1s vs. 0s grow, so even though we’re averaging over longer and longer sequences, the overall ratio will never converge to a definite value.

Again, I doubt this is going to happen in the center column of rule 30. But without a proof, we don’t know for sure.

We’re asking here about the frequencies of black and white cells. But an obvious—and potentially illuminating—generalization is to ask instead about the frequencies for blocks of cells of length k. We can ask if all 2^k such blocks have equal limiting frequency. Or we can ask the more basic question of whether all the blocks even ever occur—or, in other words, whether if one goes far enough, the center column of rule 30 will contain any given sequence of length k (say a bitwise representation of some work of literature).

Again, we can get empirical evidence. For example, at least up to k = 22, all 2^k sequences do occur—and here’s how many steps it takes:

ListLogPlot
ListLogPlot[{3, 7, 13, 63, 116, 417, 1223, 1584, 2864, 5640, 23653, 
  42749, 78553, 143591, 377556, 720327, 1569318, 3367130, 7309616, 
  14383312, 32139368, 58671803}, Joined -> True, AspectRatio -> 1/4, 
 Frame -> True, Mesh -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.01]}], 
 PlotTheme -> "Detailed", 
 PlotStyle -> Directive[{Thickness[.004], Hue[0.1, 1, 0.99]}]]
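Here’s a rough sketch of how numbers like these can be computed (the exact values depend a little on how one counts steps; this scans the first 100,001 center-column values, which is enough for k up to 8):

(* for each block length k, the position at which the last of the 2^k possible blocks first appears *)
With[{c = CellularAutomaton[30, {{1}, 0}, {10^5, {{0}}}]},
 Table[With[{blocks = Partition[c, k, 1]},
   {k, Max[First[FirstPosition[blocks, #]] & /@ Tuples[{0, 1}, k]]}], {k, 1, 8}]]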

It’s worth noticing that one can succeed perfectly for blocks of one length, but then fail for larger blocks. For example, the Thue–Morse sequence mentioned above has exactly equal frequencies of 0 and 1, but pairs don’t occur with equal frequencies, and triples of identical elements simply never occur.

In traditional mathematics—and particularly dynamical systems theory—one approach to take is to consider not just evolution from a single-cell initial condition, but evolution from all possible initial conditions. And in this case it’s straightforward to show that, yes, if one evolves with equal probability from all possible initial conditions, then columns of cells generated by rule 30 will indeed contain every block with equal frequency. But if one asks the same thing for different distributions of initial conditions, one gets different results, and it’s not clear what the implication of this kind of analysis is for the specific case of a single-cell initial condition.

If different blocks occurred with different frequencies in the center column of rule 30, then that would immediately show that the center column is “not random”, or in other words that it has statistical regularities that could be used to at least statistically predict it. Of course, at some level the center column is completely “predictable”: you just have to run rule 30 to find it. But the question is whether, given just the values in the center column on their own, there’s a way to predict or compress them, say with much less computational effort than generating an arbitrary number of steps in the whole rule 30 pattern.

One could imagine running various data compression or statistical analysis algorithms, and asking whether they would succeed in finding regularities in the sequence. And particularly when one starts thinking about the overall computational capabilities of rule 30, it’s conceivable that one could prove something about how across a spectrum of possible analysis algorithms, there’s a limit to how much they could “reduce” the computation associated with the evolution of rule 30. But even given this, it’d likely still be a major challenge to say anything about the specific case of relative frequencies of black and white cells.

It’s perhaps worth mentioning one additional mathematical analog. Consider treating the values in a row of the rule 30 pattern as digits in a real number, say with the first digit of the fractional part being on the center column. Now, so far as we know, the evolution of rule 30 has no relation to any standard operations (like multiplication or taking powers) that one does on real numbers. But we can still ask about the sequence of numbers formed by looking at the right-hand side of the rule 30 pattern. Here’s a plot for the first 200 steps:

ListLinePlot
ListLinePlot[
 FromDigits[{#, 0}, 2] & /@ 
  CellularAutomaton[30, {{1}, 0}, {200, {0, 200}}], Mesh -> All, 
 AspectRatio -> 1/4, Frame -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.0085]}], 
 PlotTheme -> "Detailed", PlotStyle -> Directive[{
Hue[0.1, 1, 0.99]}], ImageSize -> 575]

And here’s a histogram of the values reached at successively more steps:

Histogram
Grid[{Table[
   Histogram[
    FromDigits[{#, 0}, 2] & /@ 
     CellularAutomaton[30, {{1}, 0}, {10^n, {0, 20}}], {.01}, 
    Frame -> True, 
    FrameTicks -> {{None, 
       None}, {{{0, "0"}, .2, .4, .6, .8, {1, "1"}}, None}}, 
    PlotLabel -> (StringTemplate["`` steps"][10^n]), 
    ChartStyle -> Directive[Opacity[.5], Hue[0.09, 1, 1]], 
    ImageSize -> 208, 
    PlotRangePadding -> {{0, 0}, {0, Scaled[.06]}}], {n, 4, 6}]}, 
 Spacings -> .2]

And, yes, it’s consistent with the limiting histogram being flat, or in other words, with these numbers being uniformly distributed in the interval 0 to 1.

Well, it turns out that in the early 1900s there were a bunch of mathematical results established about this kind of equidistribution. In particular, it’s known that FractionalPart[h n] for successive n is always equidistributed if h isn’t a rational number. It’s also known that FractionalPart[h^n] is equidistributed for almost all h (Pisot numbers like the golden ratio are exceptions). But specific cases—like FractionalPart[(3/2)^n]—have eluded analysis for at least half a century. (By the way, it’s known that the digits of π in base 16 and thus base 2 can be generated by a recurrence of the form x[n] = FractionalPart[16 x[n - 1] + r[n]] where r[n] is a fixed rational function of n.)
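Even for the unresolved case of FractionalPart[(3/2)^n], an empirical histogram looks flat—though, as with rule 30, that’s just evidence, not a proof:

Histogram[Table[N[FractionalPart[(3/2)^n]], {n, 1, 2000}], {0.05}]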

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Consider the pattern made by rule 150:

Rule 150
Row[{ArrayPlot[CellularAutomaton[150, {{1}, 0}, 30], Mesh -> All, 
   ImageSize -> 315], 
  ArrayPlot[CellularAutomaton[150, {{1}, 0}, 200], ImageSize -> 300]}]

It’s a very regular, nested pattern. Its center column happens to be trivial (all cells are black). But if we look one column to the left or right, we find:

ArrayPlot
ArrayPlot[{Table[Mod[IntegerExponent[t, 2], 2], {t, 80}]}, 
 Mesh -> All, ImageSize -> Full]

How do we work out the value of the nth cell? Well, in this particular case, it turns out there’s essentially just a simple formula: the value is given by Mod[IntegerExponent[n, 2], 2]. In other words, just look at the number n in base 2, and ask whether the number of zeros it has at the end is even or odd.

How much computational effort does it take to “evaluate this formula”? Well, even if we have to check every bit in n, there are only about Log[2, n] of those. So we can expect that the computational effort is O(log n).
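As a consistency check on that formula (up to the convention that the column starts with step 0), one can compare it directly with the column one to the right of center in the rule 150 pattern:

With[{t = 200},
 CellularAutomaton[150, {{1}, 0}, t][[All, t + 2]] === 
  Table[Mod[IntegerExponent[n, 2], 2], {n, 1, t + 1}]]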

But what about the rule 30 case? We know we can work out the value of the nth cell in the center column just by explicitly applying the rule 30 update rule n^2 times. But the question is whether there’s a way to reduce the computational work that’s needed. In the past, there’s tended to be an implicit assumption throughout the mathematical sciences that if one has the right model for something, then by just being clever enough one will always find a way to make predictions—or in other words, to work out what a system will do, using a lot less computational effort than the actual evolution of the system requires.

And, yes, there are plenty of examples of “exact solutions” (think 2-body problem, 2D Ising model, etc.) where we essentially just get a formula for what a system will do. But there are also other cases (think 3-body problem, 3D Ising model, etc.) where this has never successfully been done.

And as I first discussed in the early 1980s, I suspect that there are actually many systems (including these) that are computationally irreducible, in the sense that there’s no way to significantly reduce the amount of computational work needed to determine their behavior.

So in effect Problem 3 is asking about the computational irreducibility of rule 30—or at least a specific aspect of it. (The choice of O(n) computational effort is somewhat arbitrary; another version of this problem could ask for O(n^α) for any α < 2, or, for that matter, O(log^β(n))—or some criterion based on both time and memory resources.)

If the answer to Problem 3 is negative, then the obvious way to show this would just be to give an explicit program that successfully computes the nth value in the center column with less than O(n) computational effort, as we did for rule 150 above.

We can ask what O(n) computational effort means. What kind of system are we supposed to use to do the computation? And how do we measure “computational effort”? The phenomenon of computational universality implies that—within some basic constraints—it ultimately doesn’t matter.

For definiteness we could say that we always want to do the computation on a Turing machine. And for example we can say that we’ll feed the digits of the number n in as the initial state of the Turing machine tape, then expect the Turing machine to grind for much less than n steps before generating the answer (and, if it’s really to be “formula like”, more like O(log n) steps).

We don’t need to base things on a Turing machine, of course. We could use any kind of system capable of universal computation, including a cellular automaton, and, for that matter, the whole Wolfram Language. It gets a little harder to measure “computational effort” in these systems. Presumably in a cellular automaton we’d want to count the total number of cell updates done. And in the Wolfram Language we might end up just actually measuring CPU time for executing whatever program we’ve set up.

I strongly suspect that rule 30 is computationally irreducible, and that Problem 3 has an affirmative answer. But if it isn’t, my guess is that eventually there’ll turn out to be a program that rather obviously computes the nth value in less than O(n) computational effort, and there won’t be a lot of argument about the details of whether the computational resources are counted correctly.

But proving that no such program exists is a much more difficult proposition. And even though I suspect computational irreducibility is quite ubiquitous, it’s always very hard to prove explicit lower bounds on the difficulty of doing particular computations. And in fact almost all explicit lower bounds currently known are quite weak, and essentially boil down just to arguments about information content—like that you need O(log n) steps to even read all the digits in the value of n.

Undoubtedly the most famous lower-bound problem is the P vs. NP question. I don’t think there’s a direct relation to our rule 30 problem (which is more like a P vs. LOGTIME question), but it’s perhaps worth understanding how things are connected. The basic point is that the forward evolution of a cellular automaton, say for n steps from an initial condition with n cells specified, is at most an O(n^2) computation, and is therefore in P (“polynomial time”). But the question of whether there exists an initial condition that evolves to produce some particular final result is in NP. If you happen (“non-deterministically”) to pick the correct initial condition, then it’s polynomial time to check that it’s correct. But there are potentially 2^n possible initial conditions to check.

Of course there are plenty of cellular automata where you don’t have to check all these 2^n initial conditions, and a polynomial-time computation clearly suffices. But it’s possible to construct a cellular automaton where finding the initial condition is an NP-complete problem, or in other words, where it’s possible to encode any problem in NP in this particular cellular automaton inversion problem. Is the rule 30 inversion problem NP-complete? We don’t know, though it seems conceivable that it could be proved to be (and if one did prove it then rule 30 could finally be a provably NP-complete cryptosystem).

But there doesn’t seem to be a direct connection between the inversion problem for rule 30, and the problem of predicting the center column. Still, there’s at least a more direct connection to another global question: whether rule 30 is computation universal, or, in other words, whether there exist initial conditions for rule 30 that allow it to be “programmed” to perform any computation that, for example, any Turing machine can perform.

We know that among the 256 simplest cellular automata, rule 110 is universal (as are three other rules that are simple transformations of it). But looking at a typical example of rule 110 evolution, it’s already clear that there are definite, modular structures one can identify. And indeed the proof proceeds by showing how one can “engineer” a known universal system out of rule 110 by appropriately assembling these structures.

Rule 110
SeedRandom[23542345]; ArrayPlot[
 CellularAutomaton[110, RandomInteger[1, 600], 400], 
 PixelConstrained -> 1]

Rule 30, however, shows no such obvious modularity—so it doesn’t seem plausible that one can establish universality in the “engineering” way it’s been established for all other known-to-be-universal systems. Still, my Principle of Computational Equivalence strongly suggests that rule 30 is indeed universal; we just don’t yet have an obvious direction to take in trying to prove it.

If one can show that a system is universal, however, then this does have implications that are closer to our rule 30 problem. In particular, if a system is universal, then there’ll be questions (like the halting problem) about its infinite-time behavior that will be undecidable, and which no guaranteed-finite-time computation can answer. But as such, universality is a statement about the existence of initial conditions that reproduce a given computation. It doesn’t say anything about the specifics of a particular initial condition—or about how long it will take to compute a particular result.

OK, but what about a different direction: what about getting empirical evidence about our Problem 3? Is there a way to use statistics, or cryptanalysis, or mathematics, or machine learning to even slightly reduce the computational effort needed to compute the nth value in the center column?

Well, we know that the whole 2D pattern of rule 30 is far from random. In fact, of all 2^(m^2) conceivable m×m patches, only a number growing exponentially in m (rather than in m^2) can possibly occur—and in practice the number weighted by probability is much smaller. And I don’t doubt that facts like this can be used to reduce the effort to compute the center column to less than O(n^2) effort (and that would be a nice partial result). But can it be less than O(n) effort? That’s a much more difficult question.

Clearly if Problem 1 was answered in the negative then it could be. But in a sense asking for less than O(n) computation of the center column is precisely like asking whether there are “predictable regularities” in it. Of course, even if one could find small-scale statistical regularities in the sequence (as answering Problem 2 in the negative would imply), these wouldn’t on their own give one a way to do more than perhaps slightly improve a constant multiplier in the speed of computing the sequence.

Could there be some systematically reduced way to compute the sequence using a neural net—which is essentially a collection of nested real-number functions? I’ve tried to find such a neural net using our current deep-learning technology—and haven’t been able to get anywhere at all.

What about statistical methods? If we could find statistical non-randomness in the sequence, then that would imply an ability to compress the sequence, and thus some redundancy or predictability in the sequence. But I’ve tried all sorts of statistical randomness tests on the center column of rule 30—and never found any significant deviation from randomness. (And for many years—until we found a slightly more efficient rule—we used sequences from finite-size rule 30 systems as our source of random numbers in the Wolfram Language, and no legitimate “it’s not random!” bugs ever showed up.)

Statistical tests of randomness typically work by saying, “Take the supposedly random sequence and process it in some way, then see if the result is obviously non-random”. But what kind of processing should be done? One might see if blocks occur with equal frequency, or if correlations exist, or if some compression algorithm succeeds in doing compression. But typically batteries of tests end up seeming a bit haphazard and arbitrary. In principle one can imagine enumerating all possible tests—by enumerating all possible programs that can be applied to the sequence. But I’ve tried doing this, for example for classes of cellular automaton rules—and have never managed to detect any non-randomness in the rule 30 sequence.

So how about using ideas from mathematics to predict the rule 30 sequence? Well, as such, rule 30 doesn’t seem connected to any well-developed area of math. But of course it’s conceivable that some mapping could be found between rule 30 and ideas, say, in an area like number theory—and that these could either help in finding a shortcut for computing rule 30, or could show that computing it is equivalent to some problem like integer factoring that’s thought to be fundamentally difficult.

I know a few examples of interesting interplays between traditional mathematical structures and cellular automata. For example, consider the digits of successive powers of 3 in base 2 and in base 6:

Digits of successive powers
Row[Riffle[
  ArrayPlot[#, ImageSize -> {Automatic, 275}] & /@ {Table[
     IntegerDigits[3^t, 2, 159], {t, 100}], 
    Table[IntegerDigits[3^t, 6, 62], {t, 100}]}, Spacer[10]]]

It turns out that in the base 6 case, the rule for generating the pattern is exactly a cellular automaton. (For base 2, there are additional long-range carries.) But although both these patterns look complex, it turns out that their mathematical structure lets us speed up making certain predictions about them.

Consider the sth digit from the right-hand edge of line n in each pattern. It’s just the sth digit in 3^n, which is given by the “formula” (where b is the base, here 2 or 6) Mod[Quotient[3^n, b^s], b]. But how easy is it to evaluate this formula? One might think that to compute 3^n one would have to do n multiplications. But this isn’t the case: instead, one can for example build up 3^n using repeated squaring, with about log(n) multiplications. That this is possible is a consequence of the associativity of multiplication. There’s nothing obviously like that for rule 30—but it’s always conceivable that some mapping to a mathematical structure like this could be found.
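Here’s a small sketch of how that plays out for extracting a single digit (digit is just an illustrative helper; PowerMod does the repeated squaring modulo b^(s+1), so the full number 3^n never has to be built):

(* the sth base-b digit, counting from the right with s >= 0, of 3^n *)
digit[n_, s_, b_] := Quotient[PowerMod[3, n, b^(s + 1)], b^s]

(* agrees with the direct formula, which computes all of 3^n *)
digit[10^6, 5, 6] == Mod[Quotient[3^(10^6), 6^5], 6]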

Talking of mathematical structure, it’s worth mentioning that there are more formula-like ways to state the basic rule for rule 30. For example, taking the values of three adjacent cells to be p, q, r, the basic rule is just p ⊻ (q ∨ r), or Xor[p, Or[q, r]]. With numerical cell values 0 and 1, the basic rule is just Mod[p + q + r + q r, 2]. Do these forms help? I don’t know. But, for example, it’s remarkable that in a sense all the complexity of rule 30 comes from the presence of that one little nonlinear q r term—for without that term, one would have rule 150, about which one can develop a complete algebraic theory using quite traditional mathematics.

To work out n steps in the evolution of rule 30, one’s effectively got to repeatedly compose the basic rule. And so far as one can tell, the symbolic expressions that arise just get more and more complicated—and don’t show any sign of simplifying in such a way as to save computational work.
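One can watch this happen in a small symbolic experiment (a sketch, with f, row, reduce and the cell symbols a[i] as purely illustrative names; the reduction uses the fact that 0/1 cell values satisfy a[i]^2 == a[i]):

(* reduce a polynomial over GF(2) in 0/1-valued variables a[i] *)
reduce[poly_] := PolynomialMod[Expand[poly] /. a[i_]^_ :> a[i], 2];

(* the rule 30 update in arithmetic form *)
f[p_, q_, r_] := reduce[p + q + r + q r];

(* row t of the evolution, as polynomials in the initial cells a[-n], ..., a[n] *)
row[0, n_] := Table[a[i], {i, -n, n}];
row[t_, n_] := f @@@ Partition[row[t - 1, n], 3, 1];

(* size of the symbolic expression for the center cell after t steps *)
Table[LeafCount[First[row[t, t]]], {t, 1, 5}]

The leaf counts grow rapidly, and no simplification is apparent—which is of course just a restatement of the empirical situation, not a proof.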

In Problem 3, we’re talking about the computational effort to compute the nth value in the center column of rule 30—and asking if it can be less than O(n). But imagine that we have a definite algorithm for doing the computation. For any given n, we can see what computational resources it uses. Say the result is r[n]. Then what we’re asking is whether r[n] is less than “big O” of n, or whether MaxLimit[r[n]/n, n → ∞] < ∞.

But imagine that we have a particular Turing machine (or some other computational system) that’s implementing our algorithm. It could be that r[n] will at least asymptotically just be a smooth or otherwise regular function of n for which it’s easy to see what the limit is. But if one just starts enumerating Turing machines, one encounters examples where r[n] appears to have peaks of random heights in random places. It might even be that somewhere there’d be a value of n for which the Turing machine doesn’t halt (or whatever) at all, so that r[n] is infinite. And in general, as we’ll discuss in more detail later, it could even be undecidable just how r[n] grows relative to O(n).

Formal Statements of the Problems

So far, I’ve mostly described the Prize Problems in words. But we can also describe them in computational language (or effectively also in math).

In the Wolfram Language, the first t values in the center column of rule 30 are given by:

c[t_]
c[t_] := CellularAutomaton[30, {{1}, 0}, {t, {{0}}}]

And with this definition, the three problems can be stated as predicates about c[t].

Problem 1: Does the center column always remain non-periodic?

Problem 1
∄_{p, i} ∀_{t, t > i}  c[t + p] == c[t]

or

NotExists
NotExists[{p, i}, ForAll[t, t > i, c[t + p] == c[t]]]

or “there does not exist a period p and an initial length i such that for all t with t>i, c[t + p] equals c[t]”.

Problem 2: Does each color of cell occur on average equally often in the center column?

Problem 2
lim_{t → ∞} Total[c[t]]/t == 1/2   (with t ranging over the integers)

or

DiscreteLimit
DiscreteLimit[Total[c[t]]/t, t -> Infinity] == 1/2

or “the discrete limit, as t → ∞, of the total of the values in c[t] divided by t is 1/2”.

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Define machine[m] to be a machine parametrized by m (for example TuringMachine[...]), and let machine[m][n] give {v, t}, where v is the output value, and t is the amount of computational effort taken (e.g. number of steps). Then the problem can be formulated as:

Problem 3
∄_m (∀_n  machine[m][n][[1]] == Last[c[n]]  ∧  MaxLimit[machine[m][n][[2]]/n, n → ∞] < ∞)

or “there does not exist a machine m which for all n gives c[n], and for which the lim sup of the amount of computational effort spent, divided by n, is finite”. (Yes, one should also require that m be finite, so the machine’s rule can’t just store the answer.)

The Formal Character of Solutions

Before we discuss the individual problems, an obvious question to ask is what the interdependence of the problems might be. If the answer to Problem 3 is negative (which I very strongly doubt), then it opens the possibility that there are simple algorithms or formulas from which the answers to Problems 1 and 2 might become straightforward. If the answer to Problem 3 is affirmative (as I strongly suspect), then it implies that the answer to Problem 1 must also be affirmative. The contrapositive is also true: if the answer to Problem 1 is negative, then it implies that the answer to Problem 3 must also be negative.

If the answer to Problem 1 is negative, so that there is some periodic sequence that appears in the center column, then if one explicitly knows that sequence, one can immediately answer Problem 2. One might think that answering Problem 2 in the negative would imply something about Problem 3. And, yes, unequal probabilities for black and white imply compression by a constant factor in a Shannon-information way. But to compute the value with less than O(n) resources—and therefore to answer Problem 3 in the negative—requires that one be able to identify in a sense infinitely more compression.

So what does it take to establish the answers to the problems?

If Problem 1 is answered in the negative, then one can imagine explicitly exhibiting the pattern generated by rule 30 at some known step—and being able to see the periodic sequence in the center. Of course, Problem 1 could still be answered in the negative, but less constructively. One might be able to show that eventually the sequence has to be periodic, but not know even any bound on where this might happen. If Problem 3 is answered in the negative, a way to do this is to explicitly give an algorithm (or, say, a Turing machine) that does the computation with less than O(n) computational resources.

But let’s say one has such an algorithm. One still has to prove that for all n, the algorithm will correctly reproduce the nth value. This might be easy. Perhaps there would just be a proof by induction or some such. But it might be arbitrarily hard. For example, it could be that for most n, the running time of the algorithm is clearly less than n. But it might not be obvious that the running time will always even be finite. Indeed, the “halting problem” for the algorithm might simply be undecidable. But just showing that a particular algorithm doesn’t halt for a given n doesn’t really tell one anything about the answer to the problem. For that one would have to show that there’s no algorithm that exists that will successfully halt in less than O(n) time.

The mention of undecidability brings up an issue, however: just what axiom system is one supposed to use to answer the problems? For the purposes of the Prize, I’ll just say “the traditional axioms of standard mathematics”, which one can assume are Peano arithmetic and/or the axioms of set theory (with or without the continuum hypothesis).

Could it be that the answers to the problems depend on the choice of axioms—or even that they’re independent of the traditional axioms (in the sense of Gödel’s incompleteness theorem)? Historical experience in mathematics makes this seem extremely unlikely, because, to date, essentially all “natural” problems in mathematics seem to have turned out to be decidable in the (sometimes rather implicit) axiom system that’s used in doing the mathematics.

In the computational universe, though—freed from the bounds of historical math tradition—it’s vastly more common to run into undecidability. And, actually, my guess is that a fair fraction of long-unsolved problems even in traditional mathematics will also turn out to be undecidable. So that definitely raises the possibility that the problems here could be independent of at least some standard axiom systems.

OK, but assume there’s no undecidability around, and one’s not dealing with the few cases in which one can just answer a problem by saying “look at this explicitly constructed thing”. Well, then to answer the problem, we’re going to have to give a proof.

In essence what drives the need for proof is the presence of something infinite. We want to know something for any n, even infinitely large, etc. And the only way to handle this is then to represent things symbolically (“the symbol Infinity means infinity”, etc.), and apply formal rules to everything, defined by the axioms in the underlying axiom system one’s assuming.

In the best case, one might be able to just explicitly exhibit that series of rule applications—in such a way that a computer can immediately verify that they’re correct. Perhaps the series of rule applications could be found by automated theorem proving (as in FindEquationalProof). More likely, it might be constructed using a proof assistant system.

It would certainly be exciting to have a fully formalized proof of the answer to any of the problems. But my guess is that it’ll be vastly easier to construct a standard proof of the kind human mathematicians traditionally do. What is such a proof? Well, it’s basically an argument that will convince other humans that a result is correct.

There isn’t really a precise definition of that. In our step-by-step solutions in Wolfram|Alpha, we’re effectively proving results (say in calculus) in such a way that students can follow them. In an academic math journal, one’s giving proofs that successfully get past the peer review process for the journal.

My own guess would be that if one were to try to formalize essentially any nontrivial proof in the math literature, one would find little corners that require new results, though usually ones that wouldn’t be too hard to get.

How can we handle this in practice for our prizes? In essence, we have to define a computational contract for what constitutes success, and when prize money should be paid out. For a constructive proof, we can get Wolfram Language code that can explicitly be run on any sufficiently large computer to establish the result. For formalized proofs, we can get Wolfram Language code that can run through the proof, validating each step.

But what about for a “human proof”? Ultimately we have no choice but to rely on some kind of human review process. We can ask multiple people to verify the proof. We could have some blockchain-inspired scheme where people “stake” the correctness of the proof, then if one eventually gets consensus (whatever this means) one pays out to people some of the prize money, in proportion to their stake. But whatever is done, it’s going to be an imperfect, “societal” result—like almost all of the pure mathematics that’s so far been done in the world.

What Will It Take?

OK, so for people interested in working on the Problems, what skills are relevant? I don’t really know. It could be discrete and combinatorial mathematics. It could be number theory, if there’s a correspondence with number-based systems found. It could be some branch of algebraic mathematics, if there’s a correspondence with algebraic systems found. It could be dynamical systems theory. It could be something closer to mathematical logic or theoretical computer science, like the theory of term rewriting systems.

Of course, it could be that no existing towers of knowledge—say in branches of mathematics—will be relevant to the problems, and that to solve them will require building “from the ground up”. And indeed that’s effectively what ended up happening in the solution for my 2,3 Turing Machine Prize in 2007.

I’m a great believer in the power of computer experiments—and of course it’s on the basis of computer experiments that I’ve formulated the Rule 30 Prize Problems. But there are definitely more computer experiments that could be done. So far we know a billion elements in the center column sequence. And so far the sequence doesn’t seem to show any deviation from randomness (at least based on tests I’ve tried). But maybe at a trillion elements (which should be well within range of current computer systems) or a quadrillion elements, or more, it eventually will—and it’s definitely worth doing the computations to check.

The direct way to compute n elements in the center column is to run rule 30 for n steps, using at an intermediate stage up to n cells of memory. The actual computation is quite well optimized in the Wolfram Language. Running on my desktop computer, it takes less than 0.4 seconds to compute 100,000 elements:

CellularAutomaton
CellularAutomaton[30, {{1}, 0}, {100000, {{0}}}]; // Timing

Internally, this is using the fact that rule 30 can be expressed as Xor[p, Or[q, r]], and implemented using bitwise operations on whole words of data at a time. Using explicit bitwise operations on long integers takes about twice as long as the built-in CellularAutomaton function:

Module
Module[{a = 1}, 
   Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i, 
     100000}]]; // Timing

But these results are from single CPU processors. It’s perfectly possible to imagine parallelizing across many CPUs, or using GPUs. One might imagine that one could speed up the computation by effectively caching the results of many steps in rule 30 evolution, but the fact that across the rows of the rule 30 pattern all blocks appear to occur with at least roughly equal frequency makes it seem as though this would not lead to significant speedup.

Solving some types of math-like problems seems pretty certain to require deep knowledge of high-level existing mathematics. For example, it seems quite unlikely that there can be an “elementary” proof of Fermat’s last theorem, or even of the four-color theorem. But for the Rule 30 Prize Problems it’s not clear to me. Each of them might need sophisticated existing mathematics, or they might not. They might be accessible only to people professionally trained in mathematics, or they might be solvable by clever “programming-style” or “puzzle-style” work, without sophisticated mathematics.

Generalizations and Relations

Sometimes the best way to solve a specific problem is first to solve a related problem—often a more general one—and then come back to the specific problem. And there are certainly many problems related to the Rule 30 Prize Problems that one can consider.

For example, instead of looking at the vertical column of cells at the center of the rule 30 pattern, one could look at a column of cells in a different direction. At 45°, it’s easy to see that any sequence must be periodic. On the left the periods increase very slowly; on the right they increase rapidly. But what about other angles?

Or what about looking at rows of cells in the pattern? Do all possible blocks occur? How many steps is it before any given block appears? The empirical evidence doesn’t show any deviation from blocks occurring at random, but obviously, for example, successive rows are highly correlated.

What about different initial conditions? There are many dynamical systems–style results about the behavior of rule 30 starting with equal probability from all possible infinite initial conditions. In this case, for example, it’s easy to show that all possible blocks occur with equal frequency, both at a given row, and in a given vertical column. Things get more complicated if one asks for initial conditions that correspond, for example, to all possible sequences generated by a given finite state machine, and one could imagine that from a sequence of results about different sets of possible initial conditions, one would eventually be able to say something about the case of the single black cell initial condition.

Another straightforward generalization is just to look not at a single black cell initial condition, but at other “special” initial conditions. An infinite periodic initial condition will always give periodic behavior (that’s the same as one gets in a finite-size region with periodic boundary conditions). But one can, for example, study what happens if one puts a “single defect” in the periodic pattern:

A 'single defect' in the periodic pattern
GraphicsRow[(ArrayPlot[
     CellularAutomaton[30, 
      MapAt[1 - #1 &, Flatten[Table[#1, Round[150/Length[#1]]]], 50], 
      100]] &) /@ {{1, 0}, {1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0}, {1, 
    0, 0, 0, 0, 0, 0}, {1, 1, 1, 0, 0}}]

One can also ask what happens when one has not just a single black cell, but some longer sequence in the initial conditions. How does the center column change with different initial sequences? Are there finite initial sequences that lead to “simpler” center columns?

Or are there infinite initial conditions generated by other computational systems (say substitution systems) that aren’t periodic, but still give somehow simple rule 30 patterns?

Then one can imagine going “beyond” rule 30. What happens if one adds longer-range “exceptions” to the rules? When do extensions to rule 30 show behavior that can be analyzed in one way or another? And can one then see the effect of removing the “exceptions” in the rule?

Of course, one can consider rules quite different from rule 30 as well—and perhaps hope to develop intuition or methods relevant to rule 30 by looking at other rules. Even among the 256 two-color nearest-neighbor rules, there are others that show complex behavior starting from a simple initial condition:

Rules 45 and 73, starting from a single black cell:

(* run rules 45 and 73 for 150 steps from a single black cell,
   labeling each plot with its rule number *)
Row[Riffle[
  Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {150, All}],
      PixelConstrained -> 1, Frame -> False],
     Style[Text[StringTemplate["rule ``"][#]], 12],
     LabelStyle -> Opacity[.5]] & /@ {45, 73}, Spacer[8]]]

And if one looks at larger numbers of colors and larger neighborhoods one can find an infinite number of examples. One sees all sorts of behavior. And, for example, given any particular sequence, one can search for rules that will generate it as their center column. One can also try to classify the center-column sequences that one sees, perhaps identifying a general class “like rule 30” about which global statements can be made.
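
As a small-scale version of such a search (matching just a short prefix), one can ask which elementary rules reproduce the first 20 values of the rule 30 center column as their own center column:

(* elementary rules whose center column matches the first 20 values
   of the rule 30 center column *)
With[{target = CellularAutomaton[30, {{1}, 0}, 19][[All, 20]]},
 Select[Range[0, 255],
  CellularAutomaton[#, {{1}, 0}, 19][[All, 20]] === target &]]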

But let’s discuss the specific Rule 30 Prize Problems. To investigate the possibility of periodicity in rule 30 (as in Problem 1), one could study lots of different rules, looking for examples with very long periods or very long transients—and trying to use these to develop an intuition for how and when such behavior can occur.
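
Here’s one way such a search might be set up in a finite cyclic region (the helper name and the choice of width 15 are just for illustration):

(* transient length and period of rule r, started from a single black
   cell in a cyclic region of width w *)
transientAndPeriod[r_, w_] :=
  Module[{state = ReplacePart[ConstantArray[0, w], Ceiling[w/2] -> 1],
    seen = <||>, t = 0},
   While[! KeyExistsQ[seen, state], seen[state] = t;
    state = CellularAutomaton[r, state]; t++];
   {seen[state], t - seen[state]}];

(* the five elementary rules with the longest periods at width 15 *)
TakeLargestBy[Table[{r, transientAndPeriod[r, 15]}, {r, 0, 255}],
 Last[Last[#]] &, 5]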

To investigate the equal-frequency phenomenon of Problem 2, one can look at different statistical features, and see, both in rule 30 and across different rules, when it’s possible to detect regularity.
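
For instance, a very simple small-scale check (the block length and number of steps here are arbitrary choices) is to tally length-4 blocks in the center column and compare the counts with the uniform expectation:

(* counts of overlapping length-4 blocks in the first 2001 center-column
   values of rule 30, together with the uniform expectation per block *)
Module[{t = 2000, k = 4, col, blocks},
 col = CellularAutomaton[30, {{1}, 0}, t][[All, t + 1]];
 blocks = Partition[col, k, 1];
 {N[Length[blocks]/2^k], Sort[Last /@ Tally[blocks]]}]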

For Problem 3, one can start looking at different levels of computational effort. Can one find the nth value with computational effort O(n^γ) for any γ < 2 (I don’t know any method to achieve this)? Can one show that one can’t find the nth value with less than O(log(n)) computational effort? What about with less than O(log(n)) available memory? What about for different rules? Periodic and nested patterns are easy to compute quickly. But what other examples can one find?
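
As a baseline, here’s a quick check (the range of n is arbitrary) that the obvious direct method, evolving the whole pattern, takes time growing roughly like n^2 to reach the nth center-column value:

(* time in seconds to get the nth center-column value by evolving the
   full pattern; the work grows roughly quadratically with n *)
Table[{n, First[AbsoluteTiming[
    CellularAutomaton[30, {{1}, 0}, n][[-1, n + 1]]]]},
 {n, 500, 2000, 500}]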

As I’ve mentioned, a big achievement would be to show computation universality for rule 30. But even if one can’t do it for rule 30, finding additional examples (beyond, for example, rule 110) will help build intuition about what might be going on in rule 30.

Then there’s NP-completeness. Is there a way of setting up some question about the behavior of rule 30 for some family of initial conditions where it’s possible to prove that the question is NP-complete? If this worked, it would be an exciting result for cryptography. And perhaps, again, one can build up intuition by looking at other rules, even ones that are more “purposefully constructed” than rule 30.

How Hard Are the Problems?

When I set up my 2,3 Turing Machine Prize in 2007 I didn’t know if it’d be solved in a month, a year, a decade, a century, or more. As it turned out, it was actually solved in about four months. So what will happen with the Rule 30 Prize Problems? I don’t know. After nearly 40 years, I’d be surprised if any of them could now be solved in a month (but it’d be really exciting if that happened!). And of course some superficially similar problems (like features of the digits of π) have been out there for well over a century.

It’s not clear whether any sophisticated math (or computer science) that exists today will be helpful in solving the problems. But I’m confident that whatever is built to solve them will provide structure that will be important for solving other problems about the computational universe. And the longer it takes (think Fermat’s last theorem), the more useful structure is likely to be built on the way to a solution.

I don’t know if solutions to the problems will be “obviously correct” (it’ll help if they’re constructive, or presented in computable form), or whether there’ll be a long period of verification to go through. I don’t know if proofs will be comparatively short, or outrageously long. I don’t know if the solutions will depend on details of axiom systems (“assuming the continuum hypothesis”, etc.), or if they’ll be robust for any reasonable choices of axioms. I don’t know if the three problems are somehow “comparably difficult”—or if one or two might be solved, with the others holding out for a very long time.

But what I am sure about is that solving any of the problems will be a significant achievement. I’ve picked the problems to be specific, definite and concrete. But the issues of randomness and computational irreducibility that they address are deep and general. And to know the solutions to these problems will provide important evidence and raw material for thinking about these issues wherever they occur.

Of course, having lived now with rule 30 and its implications for nearly 40 years, I will personally be thrilled to know for certain even a little more about its remarkable behavior.
