
Wolfram Language Artificial Intelligence: The Image Identification Project

“What is this a picture of?” Humans can usually answer such questions instantly, but in the past it’s always seemed out of reach for computers to do this. For nearly 40 years I’ve been sure computers would eventually get there—but I’ve wondered when.

I’ve built systems that give computers all sorts of intelligence, much of it far beyond the human level. And for a long time we’ve been integrating all that intelligence into the Wolfram Language.

Now I’m excited to be able to say that we’ve reached a milestone: there’s finally a function called ImageIdentify built into the Wolfram Language that lets you ask, “What is this a picture of?”—and get an answer.

And today we’re launching the Wolfram Language Image Identification Project on the web to let anyone easily take any picture (drag it from a web page, snap it on your phone, or load it from a file) and see what ImageIdentify thinks it is:

Give the Wolfram Language Image Identification Project a picture, and it uses the language's ImageIdentify function to identify it

It won’t always get it right, but most of the time I think it does remarkably well. And to me what’s particularly fascinating is that when it does get something wrong, the mistakes it makes mostly seem remarkably human.

It’s a nice practical example of artificial intelligence. But to me what’s more important is that we’ve reached the point where we can integrate this kind of “AI operation” right into the Wolfram Language—to use as a new, powerful building block for knowledge-based programming.

Now in the Wolfram Language

In a Wolfram Language session, all you need do to identify an image is feed it to the ImageIdentify function:

In[1]:= ImageIdentify[image:giant anteater]

What you get back is a symbolic entity that the Wolfram Language can then do more computation with—like, in this case, figuring out whether you’ve got an animal, a mammal, etc. Or just ask for a definition:

In[2]:= giant anteater ["Definition"]

Or, say, generate a word cloud from its Wikipedia entry:

In[3]:= WordCloud[DeleteStopwords[WikipediaData[giant anteater]]]

And if one had lots of photographs, one could immediately write a Wolfram Language program that, for example, gave statistics on the different kinds of animals, or planes, or devices, or whatever, that appear in the photographs.
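
Here’s a minimal sketch of what such a program might look like (the directory is just a placeholder, and a real project would refine the tallying):

photos = Import /@ FileNames["*.jpg", "~/Pictures/safari"];  (* placeholder directory of photographs *)
Counts[ImageIdentify /@ photos]  (* association of identified kinds and how often each appears *)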

With ImageIdentify built right into the Wolfram Language, it’s easy to create APIs, or apps, that use it. And with the Wolfram Cloud, it’s also easy to create websites—like the Wolfram Language Image Identification Project.

Personal Backstory

For me personally, I’ve been waiting a long time for ImageIdentify. Nearly 40 years ago I read books with titles like The Computer and the Brain that made it sound inevitable we’d someday achieve artificial intelligence—probably by emulating the electrical connections in a brain. And in 1980, buoyed by the success of my first computer language, I decided I should think about what it would take to achieve full-scale artificial intelligence.

Part of what encouraged me was that—in an early premonition of the Wolfram Language—I’d based my first computer language on powerful symbolic pattern matching that I imagined could somehow capture certain aspects of human thinking. But I knew that while tasks like image identification were also based on pattern matching, they needed something different—a more approximate form of matching.

I tried to invent things like approximate hashing schemes. But I kept thinking: brains manage to do this, so surely we can get clues from them. And this led me to start studying idealized neural networks and their behavior.

Meanwhile, I was also working on some fundamental questions in natural science—about cosmology and about how structures arise in our universe—and studying the behavior of self-gravitating collections of particles.

And at some point I realized that both neural networks and self-gravitating gases were examples of systems that had simple underlying components, but somehow achieved complex overall behavior. And in getting to the bottom of this, I wound up studying cellular automata and eventually making all the discoveries that became A New Kind of Science.

So what about neural networks? They weren’t my favorite type of system: they seemed a little too arbitrary and complicated in their structure compared to the other systems that I studied in the computational universe. But every so often I would think about them again, running simulations to understand more about the basic science of their behavior, or trying to see how they could be used for practical tasks like approximate pattern matching:

Some of my early work on neural networks--from 1983...

Neural networks in general have had a remarkable roller-coaster history. They first burst onto the scene in the 1940s. But by the 1960s, their popularity had waned, and the word was that it had been “mathematically proven” that they could never do anything very useful.

It turned out, though, that that was only true for one-layer “perceptron” networks. And in the early 1980s, there was a resurgence of interest, based on neural networks that also had a “hidden layer”. But despite knowing many of the leaders of this effort, I have to say I remained something of a skeptic, not least because I had the impression that neural networks were mostly getting used for tasks that seemed like they would be easy to do in lots of other ways.

I also felt that neural networks were overly complex as formal systems—and at one point even tried to develop my own alternative. But still I supported people at my academic research center studying neural networks, and included papers about them in my Complex Systems journal.

I knew that there were practical applications of neural networks out there—like for visual character recognition—but they were few and far between. And as the years went by, little of general applicability seemed to emerge.

Machine Learning

Meanwhile, we’d been busy developing lots of powerful and very practical ways of analyzing data, in Mathematica and in what would become the Wolfram Language. And a few years ago we decided it was time to go further—and to try to integrate highly automated general machine learning. The idea was to make broad, general functions with lots of power; for example, to have a single function Classify that could be trained to classify any kind of thing: say, day vs. night photographs, sounds from different musical instruments, urgency level of email, or whatever.

We put in lots of state-of-the-art methods. But, more importantly, we tried to achieve complete automation, so that users didn’t have to know anything about machine learning: they just had to call Classify.

I wasn’t initially sure it was going to work. But it does, and spectacularly.

People can give training data on pretty much anything, and the Wolfram Language automatically sets up classifiers for them to use. We’re also providing more and more built-in classifiers, like for languages, or country flags:

In[4]:= Classify["Language", {"欢迎光临", "Welcome", "Bienvenue", "Добро пожаловать", "Bienvenidos"}]

In[5]:= Classify["CountryFlag", {images:flags}]
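
And to give a feel for the custom case, here’s a bare-bones sketch of training one’s own classifier (the image lists and the new photo are placeholders):

(* dayImgs and nightImgs are placeholder lists of example photographs *)
daynight = Classify[Join[Thread[dayImgs -> "Day"], Thread[nightImgs -> "Night"]]];
daynight[newPhoto]  (* newPhoto is a placeholder for an unseen image *)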

And a little while ago, we decided it was time to try a classic large-scale classifier problem: image identification. And the result now is ImageIdentify.

It’s All about Attractors

What is image identification really about? There are some number of named kinds of things in the world, and the point is to tell which of them a particular picture is of. Or, more formally, to map all possible images into a certain set of symbolic names of objects.

We don’t have any intrinsic way to describe an object like a chair. All we can do is just give lots of examples of chairs, and effectively say, “Anything that looks like one of these we want to identify as a chair.” So in effect we want images that are “close” to our examples of chairs to map to the name “chair”, and others not to.

Now, there are lots of systems that have this kind of “attractor” behavior. As a physical example, think of a mountainscape. A drop of rain may fall anywhere on the mountains, but (at least in an idealized model) it will flow down to one of a limited number of lowest points. Nearby drops will tend to flow to the same lowest point. Drops far away may be on the other side of a watershed, and so will flow to other lowest points.

In a mountainscape, water flows to different lowest points depending on where it falls on the terrain

The drops of rain are like our images; the lowest points are like the different kinds of objects. With raindrops we’re talking about things physically moving, under gravity. But images are composed of digital pixels. And instead of thinking about physical motion, we have to think about digital values being processed by programs.

And exactly the same “attractor” behavior can happen there. For example, there are lots of cellular automata in which one can change the colors of a few cells in their initial conditions, but still end up in the same fixed “attractor” final state. (Most cellular automata actually show more interesting behavior, that doesn’t go to a fixed state, but it’s less clear how to apply this to recognition tasks.)

Cellular automata with different initial states but same final states. Like rain on a mountainscape, initial cells can "fall" in any of many different places and wind up in the same final position.
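
Here’s a tiny illustrative example of that kind of attractor behavior (my choice of rule 254 here is just for simplicity): two initial conditions with black cells in different places both end up in the same all-black final state:

init1 = ReplacePart[ConstantArray[0, 40], {10 -> 1, 25 -> 1}];
init2 = ReplacePart[ConstantArray[0, 40], {18 -> 1}];
ArrayPlot[CellularAutomaton[254, #, 40]] & /@ {init1, init2}  (* both evolutions converge to all black *)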

So what happens if we take images and apply cellular automaton rules to them? In effect we’re doing image processing, and indeed some common image processing operations (both done on computers and in human visual processing) are just simple 2D cellular automata.

A lot of image processing can be--and is--done with cellular automata

It’s easy to get cellular automata to pick out certain features of an image—like blobs of dark pixels. But for real image identification, there’s more to do. In the mountain analogy, we have to “sculpt” the mountainscape so that the right raindrops flow to the right points.

Programs Automatically Made

So how do we do this? In the case of digital data like images, it isn’t known how to do this in one fell swoop; we only know how to do it iteratively, and incrementally. We have to start from a base “flat” system, and gradually do the “sculpting”.

There’s a lot that isn’t known about this kind of iterative sculpting. I’ve thought about it quite extensively for discrete programs like cellular automata (and Turing machines), and I’m sure something very interesting can be done. But I’ve never figured out just how.

Cellular automata can be used for a kind of iterative sculpting

For systems with continuous (real-number) parameters, however, there’s a great method called back propagation—that’s based on calculus. It’s essentially a version of the very common method of gradient descent, in which one computes derivatives, then uses them to work out how to change parameters to get the system one is using to better fit the behavior one wants.
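
As a bare-bones illustration of the idea (nothing to do with the actual ImageIdentify training code), here’s gradient descent fitting the two parameters of a line to a few data points by repeatedly stepping against the gradient of the squared error:

data = {{1., 2.1}, {2., 3.9}, {3., 6.2}};
grad[{a_, b_}] := 2 Total[{#[[1]], 1} (a #[[1]] + b - #[[2]]) & /@ data];  (* gradient of the total squared error *)
Nest[# - 0.01 grad[#] &, {0., 0.}, 500]  (* 500 small steps downhill, starting from {0, 0} *)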

So what kind of system should one use? A surprisingly general choice is neural networks. The name makes one think of brains and biology. But for our purposes, neural networks are just formal computational systems, consisting of compositions of multi-input functions with continuous parameters and discrete thresholds.

How easy is it to make one of these neural networks perform interesting tasks? In the abstract, it’s hard to know. And for at least 20 years my impression was that in practice neural networks could mostly do only things that were also pretty easy to do in other ways.

But a few years ago that began to change. And one started hearing about serious successes in applying neural networks to practical problems, like image identification.

What made that happen? Computers (and especially the linear algebra done on GPUs) got fast enough that—with a variety of algorithmic tricks, some actually involving cellular automata—it became practical to train neural networks with millions of neurons, on millions of examples. (By the way, these were “deep” neural networks, no longer restricted to having very few layers.) And somehow this suddenly brought large-scale practical applications within reach.

Why Now?

I don’t think it’s a coincidence that this happened right when the number of artificial neurons being used came within striking distance of the number of neurons in relevant parts of our brains.

It’s not that this number is significant on its own. Rather, it’s that if we’re trying to do tasks—like image identification—that human brains do, then it’s not surprising if we need a system with a similar scale.

Humans can readily recognize a few thousand kinds of things—roughly the number of picturable nouns in human languages. Lower animals likely distinguish vastly fewer kinds of things. But if we’re trying to achieve “human-like” image identification—and effectively map images to words that exist in human languages—then this defines a certain scale of problem, which, it appears, can be solved with a “human-scale” neural network.

There are certainly differences between computational and biological neural networks—although after a network is trained, the process of, say, getting a result from an image seems rather similar. But the methods used to train computational neural networks are significantly different from what it seems plausible for biology to use.

Still, in the actual development of ImageIdentify, I was quite shocked at how much was reminiscent of the biological case. For a start, the number of training images—a few tens of millions—seemed very comparable to the number of distinct views of objects that humans get in their first couple of years of life.

All It Saw Was the Hat

There were also quirks of training that seemed very close to what’s seen in the biological case. For example, at one point, we’d made the mistake of having no human faces in our training. And when we showed a picture of Indiana Jones, the system was blind to the presence of his face, and just identified the picture as a hat. Not surprising, perhaps, but to me strikingly reminiscent of the classic vision experiment in which kittens reared in an environment of vertical stripes are blind to horizontal stripes.

When we gave it a picture of Indiana Jones, it zeroed in on the hat

Probably much like the brain, the ImageIdentify neural network has many layers, containing a variety of different kinds of neurons. (The overall structure, needless to say, is nicely described by a Wolfram Language symbolic expression.)

It’s hard to say meaningful things about much of what’s going on inside the network. But if one looks at the first layer or two, one can recognize some of the features that it’s picking out. And they seem to be remarkably similar to features we know are picked out by real neurons in the primary visual cortex.

I myself have long been interested in things like visual texture recognition (are there “texture primitives”, like there are primary colors?), and I suspect we’re now going to be able to figure out a lot about this. I also think it’s of great interest to look at what happens at later layers in the neural network—because if we can recognize them, what we should see are “emergent concepts” that in effect describe classes of images and objects in the world—including ones for which we don’t yet have words in human languages.

We Lost the Anteaters!

Like many projects we tackle for the Wolfram Language, developing ImageIdentify required bringing many diverse things together. Large-scale curation of training images. Development of a general ontology of picturable objects, with mapping to standard Wolfram Language constructs. Analysis of the dynamics of neural networks using physics-like methods. Detailed optimization of parallel code. Even some searching in the style of A New Kind of Science for programs in the computational universe. And lots of judgement calls about how to create functionality that would actually be useful in practice.

At the outset, it wasn’t clear to me that the whole ImageIdentify project was going to work. And early on, the rate of utterly misidentified images was disturbingly high. But one issue after another got addressed, and gradually it became clear that finally we were at a point in history when it would be possible to create a useful ImageIdentify function.

There were still plenty of problems. The system would do well on certain things, but fail on others. Then we’d adjust something, and there’d be new failures, and a flurry of messages with subject lines like “We lost the anteaters!” (about how pictures that ImageIdentify used to correctly identify as anteaters were suddenly being identified as something completely different).

Debugging ImageIdentify was an interesting process. What counts as reasonable input? What’s reasonable output? How should one make the choice between getting more-specific results, and getting results that one’s more certain aren’t incorrect (just a dog, or a hunting dog, or a beagle)?
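
As a sketch of the kind of choice involved (dogPhoto is just a placeholder image), one can ask ImageIdentify for several candidate identifications, or steer it toward more general answers:

ImageIdentify[dogPhoto, All, 3]  (* the three most likely identifications *)
ImageIdentify[dogPhoto, SpecificityGoal -> "Low"]  (* prefer a broad category like "dog" over a specific breed *)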

Sometimes we saw things that at first seemed completely crazy. A pig misidentified as a “harness”. A piece of stonework misidentified as a “moped”. But the good news was that we always found a cause—like confusion from the same irrelevant objects repeatedly being in training images for a particular type of object (e.g. “the only time ImageIdentify had ever seen that type of Asian stonework was in pictures that also had mopeds”).

To test the system, I often tried slightly unusual or unexpected images:

Unexpected images often gave unexpected results

And what I found was something very striking, and charming. Yes, ImageIdentify could be completely wrong. But somehow the errors seemed very understandable, and in a sense very human. It seemed as if what ImageIdentify was doing was successfully capturing some of the essence of the human process of identifying images.

So what about things like abstract art? It’s a kind of Rorschach-like test for both humans and machines—and an interesting glimpse into the “mind” of ImageIdentify:

Abstract art gets fascinating interpretations, sort of like Rorschach-blot interpretations from humans

Out into the Wild

Something like ImageIdentify will never truly be finished. But a couple of months ago we released a preliminary version in the Wolfram Language. And today we’ve updated that version, and used it to launch the Wolfram Language Image Identification Project.

We’ll continue training and developing ImageIdentify, not least based on feedback and statistics from the site. As with Wolfram|Alpha in the domain of natural language understanding, without actual usage by humans there’s no realistic way to assess progress—or even to define just what the goals should be for “natural image understanding”.

I must say that I find it fun to play with the Wolfram Language Image Identification Project. It’s satisfying after all these years to see this kind of artificial intelligence actually working. But more than that, when you see ImageIdentify respond to a weird or challenging image, there’s often a certain “aha” feeling, as if one had just been shown, in a very human-like way, some new insight—or joke—about an image.

Some of ImageIdentify's errors are quite funny

Underneath, of course, it’s just running code—with very simple inner loops that are pretty much the same as, for example, in my neural network programs from the beginning of the 1980s (except that now they’re Wolfram Language functions, rather than low-level C code).

It’s a fascinating—and extremely unusual—example in the history of ideas: neural networks were studied for 70 years, and repeatedly dismissed. Yet now they are what has brought us success in such a quintessential example of an artificial intelligence task as image identification. I expect the original pioneers of neural networks—like Warren McCulloch and Walter Pitts—would find little surprising about the core of what the Wolfram Language Image Identification Project does, though they might be amazed that it’s taken 70 years to get here.

But to me the greater significance is what can now be done by integrating things like ImageIdentify into the whole symbolic structure of the Wolfram Language. What ImageIdentify does is something humans learn to do in each generation. But symbolic language gives us the opportunity to represent shared intellectual achievements across all of human history. And making all these things computational is, I believe, something of monumental significance, that I am only just beginning to understand.

But for today, I hope you will enjoy the Wolfram Language Image Identification Project. Think of it as a celebration of where artificial intelligence has reached. Think of it as an intellectual recreation that helps build intuition for what artificial intelligence is like. But don’t forget the part that I think is most exciting: it’s also practical technology, that you can use here and now in the Wolfram Language, and deploy wherever you want.



George Boole: A 200-Year View

Today is the 200th anniversary of the birth of George Boole. In our modern digital world, we’re always hearing about “Boolean variables”—1 or 0, true or false. And one might think, “What a trivial idea! Why did someone even explicitly need to invent it?” But as is so often the case, there’s a deeper story—for Boolean variables were really just a side effect of an important intellectual advance that George Boole made.

When George Boole came onto the scene, the disciplines of logic and mathematics had developed quite separately for more than 2000 years. And George Boole’s great achievement was to show how to bring them together, through the concept of what’s now called Boolean algebra. And in doing so he effectively created the field of mathematical logic, and set the stage for the long series of developments that led for example to universal computation.

When George Boole invented Boolean algebra, his basic goal was to find a set of mathematical axioms that could reproduce the classical results of logic. His starting point was ordinary algebra, with variables like x and y, and operations like addition and multiplication.

At first, ordinary algebra seems a lot like logic. After all, p and q is the same as q and p, just as p×q = q×p. But if one looks in more detail, there are differences. Like p×p = p², but p and p is just p. Somewhat confusingly, Boole used the notation of standard algebra, but added special rules to create an axiom system that he then showed could reproduce all the usual results of logic.

Boole was rather informal in the way he described his axiom system. But within a few decades, it had been more precisely formalized, and over the course of the century that followed, a few progressively simpler forms of it were found. And then, as it happens, 16 years ago I ended up finishing this 150-year process, by finding—largely as a side effect of other science I was doing—the provably simplest possible axiom system for logic, which happens to consist of just a single axiom.

Boolean logic axiom systems, simplifying down to a single axiom

I thought this axiom was pretty neat, and looking at where it lies in the space of possible axioms has interesting implications for the foundations of mathematics and logic. But in the context of George Boole, one can say that it’s a minimal version of his big idea: that one can have a mathematical axiom system that reproduces all the results of logic just by what amount to simple algebra-like transformations.

Who Was George Boole?

But let’s talk about George Boole, the person. Who was he, and how did he come to do what he did?

George Boole was born (needless to say) in 1815, in England, in the fairly small town of Lincoln, about 120 miles north of London. His father had a serious interest in science and mathematics, and had a small business as a shoemaker. George Boole was something of a self-taught prodigy, who first became locally famous at age 14 with a translation of a Greek poem that he published in the local newspaper. At age 16 he was hired as a teacher at a local school, and by that time he was reading calculus books, and apparently starting to formulate what would later be his idea about relations between mathematics and logic.

At age 19, George Boole did a startup: he started his own elementary school. It seems to have been decently successful, and in fact Boole continued making his living running (or “conducting” as it was then called) schools until he was in his thirties. He was involved with a few people educated in places like Cambridge, notably through the local Mechanics’ Institute (a little like a modern community college). But mostly he seems just to have learned by reading books on his own.

He took his profession as a schoolteacher seriously, and developed all sorts of surprisingly modern theories about the importance of understanding and discovery (as opposed to rote memorization), and the value of tangible examples in areas like mathematics (he surely would have been thrilled by what’s now possible with computers).

When he was 23, Boole started publishing papers on mathematics. His early papers were about hot topics of the time, such as calculus of variations. Perhaps it was his interest in education and exposition that led him to try creating different formalisms, but soon he became a pioneer in the “calculus of operations”: doing calculus by manipulating operators rather than explicit algebraic expressions.

It wasn’t long before he was interacting with leading British mathematicians of the day, and getting positive feedback. He considered going to Cambridge to become a “university person”, but was put off when told that he would have to start with the standard undergraduate course, and stop doing his own research.

Mathematical Analysis of Logic

Logic as a field of study had originated in antiquity, particularly with the work of Aristotle. It had been a staple of education throughout the Middle Ages and beyond, fitting into the practice of rote learning by identifying specific patterns of logical arguments (“syllogisms”) with mnemonics like “bArbArA” and “cElArEnt”. In many ways, logic hadn’t changed much in over a thousand years, though by the 1800s there were efforts to make it more streamlined and “formal”. But the question was how. And in particular, should this happen through the methods of philosophy, or mathematics?

In early 1847, Boole’s friend Augustus de Morgan had become embroiled in a piece of academic unpleasantness over the question. And this led Boole quickly to go off and work out his earlier ideas about how logic could be formulated using mathematics. The result was his first book, The Mathematical Analysis of Logic, published the same year:

Boole's "The Mathematical Analysis of Logic"

The book was not long—only 86 pages. But it explained Boole’s idea of representing logic using a form of algebra. The notion that one could have an algebra with variables that weren’t just ordinary numbers happened to have just arisen in Hamilton’s 1843 invention of quaternion algebra—and Boole was influenced by this. (Galois had also done something similar in 1832 working on groups and finite fields.)

150 years before Boole, Gottfried Leibniz had also thought about using algebra to represent logic. But he’d never managed to see quite how. And the idea seems to have been all but forgotten until Boole finally succeeded in doing it in 1847.

Looking at Boole’s book today, much of it is quite easy to understand. Here, for example, is him showing how his algebraic formulation reproduces a few standard results in logic:

Boole's algebraic formulations, reproducing standard results in logic

At a surface level, this all seems fairly straightforward. “And” is represented by multiplication of variables xy, “not” by 1–x, and “(exclusive) or” by x+y–2xy. There are also extra constraints, like x²=x. But when one tries digging deeper, things become considerably murkier. Just what are x and y supposed to be? Today we’d call these Boolean variables, and imagine they could have discrete values 1 or 0, representing true or false. But Boole seems to have never wanted to talk about anything that explicit, or anything discrete or combinatorial. All he ever seemed to discuss was algebraic expressions and equations—even to the point of using series expansions to effectively enumerate possible combinations of values for logical variables.
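
It’s easy to check these translations against the values 0 and 1 (this little table is my own check, not Boole’s):

and[x_, y_] := x y;  not[x_] := 1 - x;  xor[x_, y_] := x + y - 2 x y;
Flatten[Table[{x, y, and[x, y], not[x], xor[x, y]}, {x, 0, 1}, {y, 0, 1}], 1]  (* rows of {x, y, x and y, not x, x xor y} *)
(* and note that x^2 == x holds for both x = 0 and x = 1 *)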

The Laws of Thought

When Boole wrote his first book he was still working as a teacher and running a school. But he had also become well known as a mathematician, and in 1849, when Queen’s College, Cork (now University College Cork) opened in Ireland, Boole was hired as its first math professor. And once in Cork, Boole started to work on what would become his most famous book, An Investigation of the Laws of Thought:

Boole's "An Investigation of the Laws of Thought"

His preface began: “The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a Calculus, and upon this foundation to establish the science of Logic and construct its method; …”

Boole appears to have seen himself as trying to create a calculus for the “science of intellectual powers” analogous to Newton’s calculus for physical science. But while Newton had been able to rely on concepts like space and time to inform the structure of his calculus, Boole had to build on the basis of a model of how the mind works, which for him was unquestionably logic.

The first part of Laws of Thought is basically a recapitulation of Boole’s earlier book on logic, but with additional examples—such as a chapter covering logical proofs about the existence and characteristics of God. The second part of the book is in a sense more mathematically traditional. For instead of interpreting his algebraic variables as related to logic, he interprets them as traditional numbers corresponding to probabilities—and in doing so shows that the laws for combining probabilities of events have the same structure as the laws for combining logical statements.

For the most part Laws of Thought reads like a mathematical work, with abstract definitions and formal conclusions. But in the final chapter Boole tries to connect what he has done to empirical questions about the operation of the mind. He discusses how free will can be compatible with definite laws of thought. He talks about how imprecise human experiences can lead to precise concepts. He discusses whether there is truth that humans can recognize that goes beyond what mathematical laws can ever explain. And he talks about how an understanding of human thinking should inform education.

The Rest of Boole’s Life

After the publication of Laws of Thought, George Boole stayed in Cork, living another decade and dying in 1864 of pneumonia at the age of 49. He continued to publish widely on mathematics, but never published on logic again, though he probably intended to do so.

In his lifetime, Boole was much more recognized for his work on traditional mathematics than on logic. He wrote two textbooks, one in 1859 on differential equations, and one in 1860 on difference equations. Both are clean and elegant expositions. And interestingly, while there are endless modern alternatives to Boole’s Differential Equations, sufficiently little has been done on difference equations that when we were implementing them in Mathematica in the late 1990s, Boole’s 1860 book was still an important reference, notable especially for its nice examples of the factorization of linear difference operators.

What Was Boole Like?

What was Boole like as a person? There’s quite a bit of information on this, not least from his wife’s writings and from correspondence and reminiscences his sister collected when he died. From what one can tell, Boole was organized and diligent, with careful attention to detail. He worked hard, often late into the night, and could be so engrossed in his work that he became quite absent minded. Despite how he looks in pictures, he appears to have been rather genial in person. He was well liked as a teacher, and was a talented lecturer, though his blackboard writing was often illegible. He was a gracious and extensive correspondent, and made many visits to different people and places. He spent many years managing people, first at schools, and then at the university in Cork. He had a strong sense of justice, and while he did not like controversy, he was occasionally involved in it, and was not shy to maintain his position.

Despite his successes, Boole seems to have always thought of himself as a self-taught schoolteacher, rather than a member of the academic elite. And perhaps this helped in his ability to take intellectual risks. Whether it was playing fast and loose with differential operators in calculus, or finding ways to bend the laws of algebra so they could apply to logic, Boole seems to have always taken the attitude of just moving forward and seeing where he could go, trusting his own sense of what was correct and true.

Boole was single most of his life, though finally married at the age of 40. His wife, Mary Everest Boole, was 17 years his junior, and she outlived him by 52 years, dying in 1916. She had an interesting story in her own right, later in her life writing books with titles like Philosophy and Fun of Algebra, Logic Taught by Love, The Preparation of the Child for Science and The Message of Psychic Science to the World. George and Mary Boole had five daughters—who, along with their own children, had a wide range of careers and accomplishments, some quite mathematical.

Legacy

It is something of an irony that George Boole, committed as he was to the methods of algebra, calculus and continuous mathematics, should have come to symbolize discrete variables. But to be fair, this took a while. In the decades after he died, the primary influence of Boole’s work on logic was on the wave of abstraction and formalization that swept through mathematics—involving people like Frege, Peano, Hilbert, Whitehead, Russell and eventually Gödel and Turing. And it was only in 1937, with the work of Claude Shannon on switching networks, that Boolean algebra began to be used for practical purposes.

Today there is a lot on Boolean computation in Mathematica and the Wolfram Language, and in fact George Boole is the person with the largest number (15) of distinct functions in the system named after him.

But what has made Boole’s name so widely known is not Boolean algebra, it’s the much simpler notion of Boolean variables, which appear in essentially every computer language—leading to a progressive increase in mentions of the word “boolean” in publications since the 1950s:

The word "boolean" has appeared in increasing numbers of publications since the 1950s

Was this inevitable? In some sense, I suspect it was. For when one looks at history, sufficiently simple formal ideas have a remarkable tendency to eventually be widely used, even if they emerge only slowly from quite complex origins. Most often what happens is that at some moment the ideas become relevant to technology, and quickly then go from curiosities to mainstream.

My work on A New Kind of Science has made me think about enumerations of what amount to all possible “simple formal ideas”. Some have already become incorporated in technology, but many have not yet. But the story of George Boole and Boolean variables provides an interesting example of what can happen over the course of centuries—and how what at first seems obscure and abstruse can eventually become ubiquitous.

How Should We Talk to AIs?

Not many years ago, the idea of having a computer broadly answer questions asked in plain English seemed like science fiction. But when we released Wolfram|Alpha in 2009 one of the big surprises (not least to me!) was that we’d managed to make this actually work. And by now people routinely ask personal assistant systems—many powered by Wolfram|Alpha—zillions of questions in ordinary language every day.

Ask questions in ordinary language, get answers from Wolfram|Alpha

It all works fairly well for quick questions, or short commands (though we’re always trying to make it better!). But what about more sophisticated things? What’s the best way to communicate more seriously with AIs?

I’ve been thinking about this for quite a while, trying to fit together clues from philosophy, linguistics, neuroscience, computer science and other areas. And somewhat to my surprise, what I’ve realized recently is that a big part of the answer may actually be sitting right in front of me, in the form of what I’ve been building towards for the past 30 years: the Wolfram Language.

Maybe this is a case of having a hammer and then seeing everything as a nail. But I’m pretty sure there’s more to it.  And at the very least, thinking through the issue is a way to understand more about AIs and their relation to humans.

Computation Is Powerful

The first key point—that I came to understand clearly only after a series of discoveries I made in basic science—is that computation is a very powerful thing, that lets even tiny programs (like cellular automata, or neural networks) behave in incredibly complicated ways. And it’s this kind of thing that an AI can harness.

A cellular automaton with a very simple rule set (shown in the lower left corner) that produces highly complex behavior
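
If you want to see this for yourself, a one-liner suffices (rule 30 is my standard example of a very simple rule with highly complex behavior):

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]  (* 200 steps of rule 30, starting from a single black cell *)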

Looking at pictures like this we might be pessimistic: how are we humans going to communicate usefully about all that complexity? Ultimately, what we have to hope is that we can build some kind of bridge between what our brains can handle and what computation can do. And although I didn’t look at it quite this way, this turns out to be essentially just what I’ve been trying to do all these years in designing the Wolfram Language.

Language of Computational Thinking

I have seen my role as being to identify lumps of computation that people will understand and want to use, like FindShortestTour, ImageIdentify or Predict. Traditional computer languages have concentrated on low-level constructs close to the actual hardware of computers. But in the Wolfram Language I’ve instead started from what we humans understand, and then tried to capture as much of it as possible in the language.

In the early years, we were mostly dealing with fairly abstract concepts, about, say, mathematics or logic or abstract networks. But one of the big achievements of recent years—closely related to Wolfram|Alpha—has been that we’ve been able to extend the structure we built to cover countless real kinds of things in the world—like cities or movies or animals.

One might wonder: why invent a language for all this; why not just use, say, English? Well, for specific things, like “hot pink”, “new york city” or “moons of pluto”, English is good—and actually for such things the Wolfram Language lets people just use English. But when one’s trying to describe more complex things, plain English pretty quickly gets unwieldy.
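
A quick sketch of how that looks programmatically (in a notebook one can also just type free-form English with ctrl+=):

Interpreter["Color"]["hot pink"]  (* a color object *)
Interpreter["City"]["new york city"]  (* a city entity *)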

Imagine for example trying to describe even a fairly simple algorithmic program. A back-and-forth dialog—“Turing-test style”—would rapidly get frustrating. And a straight piece of English would almost certainly end up with incredibly convoluted prose like one finds in complex legal documents.

The Wolfram Language specifies clearly and succinctly how to create this image. The equivalent natural-language specification is complicated and subject to misinterpretation.

But the Wolfram Language is built precisely to solve such problems. It’s set up to be readily understandable to humans, capturing the way humans describe and think about things. Yet it also has a structure that allows arbitrary complexity to be assembled and communicated. And, of course, it’s readily understandable not just by humans, but also by machines.

I realize I’ve actually been thinking and communicating in a mixture of English and Wolfram Language for years. When I give talks, for example, I’ll say something in English, then I’ll just start typing to communicate my next thought with a piece of Wolfram Language code that executes right there.

The Wolfram Language mixes well with English in documents and thought streams

Understanding AIs

But let’s get back to AI. For most of the history of computing, we’ve built programs by having human programmers explicitly write lines of code, understanding (apart from bugs!) what each line does. But achieving what can reasonably be called AI requires harnessing more of the power of computation. And to do this one has to go beyond programs that humans can directly write—and somehow automatically sample a broader swath of possible programs.

We can do this through the kind of algorithm automation we’ve long used in Mathematica and the Wolfram Language, or we can do it through explicit machine learning, or through searching the computational universe of possible programs. But however we do it, one feature of the programs that come out is that they have no reason to be understandable by humans.

Engineered programs are written to be human-readable. Automatically created or discovered programs are not necessarily human-readable.

At some level it’s unsettling. We don’t know how the programs work inside, or what they might be capable of. But we know they’re doing elaborate computation that’s in a sense irreducibly complex to analyze.

There’s another, very familiar place where the same kind of thing happens: the natural world. Whether we look at fluid dynamics, or biology, or whatever, we see all sorts of complexity. And in fact the Principle of Computational Equivalence that emerged from the basic science I did implies that this complexity is in a sense exactly the same as the complexity that can occur in computational systems.

Over the centuries we’ve been able to identify aspects of the natural world that we can understand, and then harness them to create technology that’s useful to us. And our traditional engineering approach to programming works more or less the same way.

But for AI, we have to venture out into the broader computational universe, where—as in the natural world—we’re inevitably dealing with things we cannot readily understand.

What Will AIs Do?

Let’s imagine we have a perfect, complete AI that’s able to do anything we might reasonably associate with intelligence. Maybe it’ll get input from lots of IoT sensors. And it has all sorts of computation going on inside. But what is it ultimately going to try to do? What is its purpose going to be?

This quickly gets into some fairly deep philosophy, involving issues that have been batted around for thousands of years—but which are finally going to matter in practice when we deal with AIs.

One might think that as an AI becomes more sophisticated, so would its purposes, and that eventually the AI would end up with some sort of ultimate abstract purpose. But this doesn’t make sense. Because there is really no such thing as abstractly defined absolute purpose, derivable in some purely formal mathematical or computational way. Purpose is something that’s defined only with respect to humans, and their particular history and culture.

An “abstract AI”, not connected to human purposes, will just go along doing computation. And as with most cellular automata and most systems in nature, we won’t be able to identify—or attribute—any particular “purpose” to that computation, or to the system that does it.

Giving Goals for an AI

Technology has always been about automating things so humans can define goals, and then those goals can automatically be achieved by the technology.

For most kinds of technology, those goals have been tightly constrained, and not too hard to describe. But for a general computational system they can be completely arbitrary. So then the challenge is how to describe them.

What do you say to an AI to tell it what you want it to do for you? You’re not going to be able to tell it exactly what to do in each and every circumstance. You’d only be able to do that if the computations the AI could do were tightly constrained, like in traditional software engineering. But for the AI to work properly, it’s going to have to make use of broader parts of the computational universe. And it’s then a consequence of a phenomenon I call computational irreducibility that you’ll never be able to determine everything it’ll do.

So what’s the best way to define goals for an AI? It’s complicated. If the AI can experience your life alongside you—seeing what you see, reading your email, and so on—then, just like with a person you know well, you might be able to tell the AI at least simple goals just by saying them in natural language.

But what if you want to define more complex goals, or goals that aren’t closely associated with what the AI has already experienced? Then small amounts of natural language wouldn’t be enough. Perhaps the AI could go through a whole education. But a better idea would be to leverage what we have in the Wolfram Language, which in effect already has lots of knowledge of the world built into it, in a way that both the human and the AI can use.

AIs Talking to AIs

Thinking about how humans communicate with AIs is one thing. But how will AIs communicate with one another? One might imagine they could do literal transfers of their underlying representations of knowledge. But that wouldn’t work, because as soon as two AIs have had different “experiences”, the representations they use will inevitably be at least somewhat different.

And so, just like humans, the AIs are going to end up needing to use some form of symbolic language that represents concepts abstractly, without specific reference to the underlying representations of those concepts.

One might then think the AIs should just communicate in English; at least that way we’d be able to understand them! But it wouldn’t work out. Because the AIs would inevitably need to progressively extend their language—so even if it started as English, it wouldn’t stay that way.

In human natural languages, new words get added when there are new concepts that are widespread enough to make representing them in the language useful. Sometimes a new concept is associated with something new in the world (“blog”, “emoji”, “smartphone”, “clickbait”, etc.); sometimes it’s associated with a new distinction among existing things (“road” vs. “freeway”, “pattern” vs. “fractal”).

Often it’s science that gives us new distinctions between things, by identifying distinct clusters of behavior or structure. But the point is that AIs can do that on a much larger scale than humans. For example, our Image Identification Project is set up to recognize the 10,000 or so kinds of objects that we humans have everyday names for. But internally, as it’s trained on images from the world, it’s discovering all sorts of other distinctions that we don’t have names for, but that are successful at robustly separating things.

I’ve called these “post-linguistic emergent concepts” (or PLECs). And I think it’s inevitable that in a population of AIs, an ever-expanding hierarchy of PLECs will appear, forcing the language of the AIs to progressively expand.

But how could the framework of English support that? I suppose each new concept could be assigned a word formed from some hash-code-like collection of letters. But a structured symbolic language—as the Wolfram Language is—provides a much better framework. Because it doesn’t require the units of the language to be simple “words”, but allows them to be arbitrary lumps of symbolic information, such as collections of examples (so that, for example, a word can be represented by a symbolic structure that carries around its definitions).

So should AIs talk to each other in Wolfram Language? It seems to make a lot of sense—because it effectively starts from the understanding of the world that’s been developed through human knowledge, but then provides a framework for going further. It doesn’t matter how the syntax is encoded (input form, XML, JSON, binary, whatever). What matters is the structure and content that are built into the language.

Information Acquisition: The Billion-Year View

Over the course of the billions of years that life has existed on Earth, there’ve been a few different ways of transferring information. The most basic is genomics: passing information at the hardware level. But then there are neural systems, like brains. And these get information—like our Image Identification Project—by accumulating it from experiencing the world. This is the mechanism that organisms use to see, and to do many other “AI-ish” things.

But in a sense this mechanism is fundamentally limited, because every different organism—and every different brain—has to go through the whole process of learning for itself: none of the information obtained in one generation can readily be passed to the next.

But this is where our species made its great invention: natural language. Because with natural language it’s possible to take information that’s been learned, and communicate it in abstract form, say from one generation to the next. There’s still a problem, however, because when natural language is received, it still has to be interpreted, in a separate way in each brain.

Information transfer:  Level 0: genomics; Level 1: individual brains; Level 2: natural language; Level 3: computational knowledge language

And this is where the idea of a computational-knowledge language—like the Wolfram Language—is important: because it gives a way to communicate concepts and facts about the world, in a way that can immediately and reproducibly be executed, without requiring separate interpretation on the part of whatever receives it.

It’s probably not a stretch to say that the invention of human natural language was what led to civilization and our modern world. So then what are the implications of going to another level: of having a precise computational-knowledge language, that carries not just abstract concepts, but also a way to execute them?

One possibility is that it may define the civilization of the AIs, whatever that may turn out to be. And perhaps this may be far from what we humans—at least in our present state—can understand. But the good news is that at least in the case of the Wolfram Language, precise computational-knowledge language isn’t incomprehensible to humans; in fact, it was specifically constructed to be a bridge between what humans can understand, and what machines can readily deal with.

What If Everyone Could Code?

So let’s imagine a world in which in addition to natural language, it’s also common for communication to occur through a computational-knowledge language like the Wolfram Language. Certainly, a lot of the computational-knowledge-language communication will be between machines. But some of it will be between humans and machines, and quite possibly it would be the dominant form of communication here.

In today’s world, only a small fraction of people can write computer code—just as, 500 or so years ago, only a small fraction of people could write natural language. But what if a wave of computer literacy swept through, and the result was that most people could write knowledge-based code?

Natural language literacy enabled many features of modern society. What would knowledge-based code literacy enable? There are plenty of simple things. Today you might get a menu of choices at a restaurant. But if people could read code, there could be code for each choice that you could readily modify to your liking. (And actually, something very much like this is soon going to be possible—with Wolfram Language code—for biology and chemistry lab experiments.) Another implication of people being able to read code is for rules and contracts: instead of just writing prose to be interpreted, one can have code to be read by humans and machines alike.

But I suspect the implications of widespread knowledge-based code literacy will be much deeper—because it will not only give a wide range of people a new way to express things, but will also give them a new way to think about them.

Will It Actually Work?

So, OK, let’s say we want to use the Wolfram Language to communicate with AIs. Will it actually work? To some extent we know it already does. Because inside Wolfram|Alpha and the systems based on it, what’s happening is that natural language questions are being converted to Wolfram Language code.
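
Here’s a tiny illustration of that kind of conversion (the exact expression returned depends on the build of the system): SemanticInterpretation takes plain English and gives back evaluatable Wolfram Language.

SemanticInterpretation["population of france"]  (* returns a Wolfram Language expression representing the request *)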

But what about more elaborate applications of AI? Many places where the Wolfram Language is used are examples of AI, whether they’re computing with images or text or data or symbolic structures. Sometimes the computations involve algorithms whose goals we can precisely define, like FindShortestTour; sometimes they involve algorithms whose goals are less precise, like ImageIdentify. Sometimes the computations are couched in the form of “things to do”, sometimes as “things to look for” or “things to aim for”.

We’ve come a long way in representing the world in the Wolfram Language. But there’s still more to do. Back in the 1600s it was quite popular to try to create “philosophical languages” that would somehow symbolically capture the essence of everything one could think about. Now we need to really do this. And, for example, to capture in a symbolic way all the kinds of actions and processes that can happen, as well as things like peoples’ beliefs and mental states. As our AIs become more sophisticated and more integrated into our lives, representing these kinds of things will become more important.

For some tasks and activities we’ll no doubt be able to use pure machine learning, and never have to build up any kind of intermediate structure or language. But much as natural language was crucial in getting our species to where it is, so having an abstract language will be important for the progress of AI.

I’m not sure what it would look like, but we could perhaps imagine using some kind of pure emergent language produced by the AIs. But if we do that, then we humans can expect to be left behind, and to have no chance of understanding what the AIs are doing. But with the Wolfram Language we have a bridge, because we have a language that’s suitable for both humans and AIs.

More to Say

There’s much to be said about the interplay between language and computation, humans and AIs. Perhaps I need to write a book about it. But my purpose here has been to describe a little of my current thinking, particularly my realizations about the Wolfram Language as a bridge between human understanding and AI.

With pure natural language or traditional computer language, we’ll be hard pressed to communicate much to our AIs. But what I’ve been realizing is that with Wolfram Language there’s a much richer alternative, readily extensible by the AIs, but built on a base that leverages human natural language and human knowledge to maintain a connection with what we humans can understand. We’re seeing early examples already… but there’s a lot further to go, and I’m looking forward to actually building what’s needed, as well as writing about it…

What Is Spacetime, Really?

Space network

A hundred years ago today Albert Einstein published his General Theory of Relativity—a brilliant, elegant theory that has survived a century, and provides the only successful way we have of describing spacetime.

There are plenty of theoretical indications, though, that General Relativity isn’t the end of the story of spacetime. And in fact, much as I like General Relativity as an abstract theory, I’ve come to suspect it may actually have led us on a century-long detour in understanding the true nature of space and time.

I’ve been thinking about the physics of space and time for a little more than 40 years now. At the beginning, as a young theoretical physicist, I mostly just assumed Einstein’s whole mathematical setup of Special and General Relativity—and got on with my work in quantum field theory, cosmology, etc. on that basis.

But about 35 years ago, partly inspired by my experiences in creating technology, I began to think more deeply about fundamental issues in theoretical science—and started on my long journey to go beyond traditional mathematical equations and instead use computation and programs as basic models in science. Quite soon I made the basic discovery that even very simple programs can show immensely complex behavior—and over the years I discovered that all sorts of systems could finally be understood in terms of these kinds of programs.

Encouraged by this success, I then began to wonder if perhaps the things I’d found might be relevant to that ultimate of scientific questions: the fundamental theory of physics.

At first, it didn’t seem too promising, not least because the models that I’d particularly been studying (cellular automata) seemed to work in a way that was completely inconsistent with what I knew from physics. But sometime in 1988—around the time the first version of Mathematica was released—I began to realize that if I changed my basic way of thinking about space and time then I might actually be able to get somewhere.

A Simple Ultimate Theory?

In the abstract it’s far from obvious that there should be a simple, ultimate theory of our universe. Indeed, the history of physics so far might make us doubtful—because it seems as if whenever we learn more, things just get more complicated, at least in terms of the mathematical structures they involve. But—as noted, for example, by early theologians—one very obvious feature of our universe is that there is order in it. The particles in the universe don’t just all do their own thing; they follow a definite set of common laws.

But just how simple might the ultimate theory for the universe be? Let’s say we could represent it as a program, say in the Wolfram Language. How long would the program be? Would it be as long as the human genome, or as the code for an operating system? Or would it be much, much smaller?

Before my work on the computational universe of simple programs, I would have assumed that if there’s a program for the universe it must be at least somewhat complicated. But what I discovered is that in the computational universe even extremely simple programs can actually show behavior as complex as anything (a fact embodied in my general Principle of Computational Equivalence). So then the question arises: could one of these simple programs in the computational universe actually be the program for our physical universe?

Cellular automata evolution from simple rules

The Data Structure of the Universe

But what would such a program be like? One thing is clear: if the program is really going to be extremely simple, it’ll be too small to explicitly encode obvious features of our actual universe, like particle masses, or gauge symmetries, or even the number of dimensions of space. Somehow all these things have to emerge from something much lower level and more fundamental.

So if the behavior of the universe is determined by a simple program, what’s the basic “data structure” on which this program operates? At first, I’d assumed that it must be something simple for us to describe, like the lattice of cells that exists in a cellular automaton. But even though such a structure works well for models of many things, it seems at best incredibly implausible as a fundamental model of physics. Yes, one can find rules that give behavior which on a large scale doesn’t show obvious signs of the lattice. But if there’s really going to be a simple model of physics, it seems wrong that such a rigid structure for space should be burned in, while every other feature of physics just emerges.

So what’s the alternative? One needs something in a sense “underneath” space: something from which space as we know it can emerge. And one needs an underlying data structure that’s as flexible as possible. I thought about this for years, and looked at all sorts of computational and mathematical formalisms. But what I eventually realized was that basically everything I’d looked at could actually be represented in the same way: as a network.

A network—or graph—just consists of a bunch of nodes, joined by connections. And all that’s intrinsically defined in the graph is the pattern of these connections.

A graph

Space as a Network

So could this be what space is made of? In traditional physics—and General Relativity—one doesn’t think of space as being “made of” anything. One just thinks of space as a mathematical construct that serves as a kind of backdrop, in which there’s a continuous range of possible positions at which things can be placed.

But do we in fact know that space is continuous like this? In the early days of quantum mechanics, it was actually assumed that space would be quantized like everything else. But it wasn’t clear how this could fit in with Special Relativity, and there was no obvious evidence of discreteness. By the time I started doing physics in the 1970s, nobody really talked about discreteness of space anymore, and it was experimentally known that there wasn’t discreteness down to about 10^-18 meters (1/1000 the radius of a proton, or 1 attometer). Forty years—and several tens of billions of dollars’ worth of particle accelerators—later there’s still no discreteness in space that’s been seen, and the limit is about 10^-22 meters (or 100 yoctometers).

Still, there’s long been a suspicion that something has to be quantized about space down at the Planck length of about 10^-34 meters. But when people have thought about this—and discussed spin networks or loop quantum gravity or whatever—they’ve tended to assume that whatever happens there has to be deeply connected to the formalism of quantum mechanics, and to the notion of quantum amplitudes for things.

But what if space—perhaps at something like the Planck scale—is just a plain old network, with no explicit quantum amplitudes or anything? It doesn’t sound so impressive or mysterious—but it certainly takes a lot less information to specify such a network: you just have to say which nodes are connected to which other ones.

But how could this be what space is made of? First of all, how could the apparent continuity of space on larger scales emerge? Actually, that’s not very difficult: it can just be a consequence of having lots of nodes and connections. It’s a bit like what happens in a fluid, like water. On a small scale, there are a bunch of discrete molecules bouncing around. But the large-scale effect of all these molecules is to produce what seems to us like a continuous fluid.

It so happens that I studied this phenomenon a lot in the mid-1980s—as part of my efforts to understand the origins of apparent randomness in fluid turbulence. And in particular I showed that even when the underlying “molecules” are cells in a simple cellular automaton, it’s possible to get large-scale behavior that exactly follows the standard differential equations of fluid flow.

Cellular automaton fluid

So when I started thinking about the possibility that underneath space there might be a network, I imagined that perhaps the same methods might be used—and that it might actually be possible to derive Einstein’s Equations of General Relativity from something much lower level.

Maybe There’s Nothing But Space

But, OK, if space is a network, what about all the stuff that’s in space? What about all the electrons, and quarks and photons, and so on? In the usual formulation of physics, space is a backdrop, on top of which all the particles, or strings, or whatever, exist. But that gets pretty complicated. And there’s a simpler possibility: maybe in some sense everything in the universe is just “made of space”.

As it happens, in his later years, Einstein was quite enamored of this idea. He thought that perhaps particles, like electrons, could be associated with something like black holes that contain nothing but space. But within the formalism of General Relativity, Einstein could never get this to work, and the idea was largely dropped.

As it happens, nearly 100 years earlier there’d been somewhat similar ideas. That was a time before Special Relativity, when people still thought that space was filled with a fluid-like ether. (Ironically enough, in modern times we’re back to thinking of space as filled with a background Higgs field, vacuum fluctuations in quantum fields, and so on.) Meanwhile, it had been understood that there were different types of discrete atoms, corresponding to the different chemical elements. And so it was suggested (notably by Kelvin) that perhaps these different types of atoms might all be associated with different types of knots in the ether.

Sequence of knots

It was an interesting idea, but it wasn’t right. In thinking about space as a network, though, there’s a related idea: maybe particles just correspond to particular structures in the network. Maybe all that has to exist in the universe is the network, and then the matter in the universe just corresponds to particular features of this network. It’s easy to see similar things in cellular automata on a lattice. Even though every cell follows the same simple rules, there are definite structures that exist in the system—and that behave quite like particles, with a whole particle physics of interactions.

Particle in a cellular automaton
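
For readers who want to see such particle-like structures for themselves, here is a quick Wolfram Language input of my own (not one from the text): the rule 110 cellular automaton, run from random initial conditions, shows exactly these kinds of localized structures moving and colliding.

ArrayPlot[CellularAutomaton[110, RandomInteger[1, 300], 300]]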

There’s a whole discussion to be had about how this works in networks. But first, there’s something else that’s very important to talk about: time.

What Is Time?

Back in the 1800s, there was space and there was time. Both were described by coordinates, and in some mathematical formalisms, both appeared in related ways. But there was no notion that space and time were in any sense “the same thing”. But then along came Einstein’s Special Theory of Relativity—and people started talking about “spacetime”, in which space and time are somehow facets of the same thing.

It makes a lot of sense in the formalism of Special Relativity, in which, for example, traveling at a different velocity is like rotating in 4-dimensional spacetime. And for about a century, physics has pretty much just assumed that spacetime is a thing, and that space and time aren’t in any fundamental way different.

So how does that work in the context of a network model of space? It’s certainly possible to construct 4-dimensional networks in which time works just like space. And then one just has to say that the history of the universe corresponds to some particular spacetime network (or family of networks). Which network it is must be determined by some kind of constraint: our universe is the one which has such-and-such a property, or in effect satisfies such-and-such an equation. But this seems very non-constructive: it’s not telling one how the universe behaves, it’s just saying that if the behavior looks like this, then it can be the universe.

And, for example, in thinking about programs, space and time work very differently. In a cellular automaton, for example, the cells are laid out in space, but the behavior of the system occurs in a sequence of steps in time. But here’s the thing: just because the underlying rules treat space and time very differently, it doesn’t mean that on a large scale they can’t effectively behave similarly, just like in current physics.

Evolving the Network

OK, so let’s say that underneath space there’s a network. How does this network evolve? A simple hypothesis is to assume that there’s some kind of local rule, which says, in effect, that if you see a piece of network that looks like this, you should replace it with one that looks like that.

Sample network rules
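
To make the idea of a local replacement rule concrete, here is a minimal toy sketch of my own (it is just an illustration, not the actual class of rules studied in this project): the network is held as a list of undirected edges, and one replacement step picks an edge and splices a new node into it.

step[edges_List, pos_Integer] :=
  Module[{a, b, new},
    {a, b} = edges[[pos]];
    new = Max[Flatten[edges]] + 1;                 (* introduce a fresh node *)
    Join[Delete[edges, pos], {{a, new}, {new, b}}] (* replace the chosen edge with a two-edge path *)
  ]

NestList[step[#, 1] &, {{1, 2}, {2, 3}, {3, 1}}, 3]

Different choices of which edge to update at each step correspond to the different possible orderings discussed next.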

But now things get a bit complicated. Because there might be lots of places in the network where the rule could apply. So what determines the order in which the different pieces get handled?

In effect, each possible ordering is like a different thread of time. And one could imagine a theory in which all threads are followed—and the universe in effect has many histories.

But that doesn’t need to be how it works. Instead, it’s perfectly possible for there to be just one thread of time—pretty much the way we experience it. And to understand this, we have to do something a bit similar to what Einstein did in formulating Special Relativity: we have to make a more realistic model of what an “observer” can be.

Needless to say, any realistic observer has to exist within our universe. So if the universe is a network, the observer must be just some part of that network. Now think about all those little network updatings that are happening. To “know” that a given update has happened, observers themselves must be updated.

If you trace this all the way through—as I did in my book, A New Kind of Science—you realize that the only thing observers can ever actually observe in the history of the universe is the causal network of what event causes what other event.

And then it turns out that there’s a definite class of underlying rules for which different orderings of underlying updates don’t affect that causal network. They’re what I call “causal invariant” rules.

Causal invariance is an interesting property, with analogs in a variety of computational and mathematical systems—for example in the fact that transformations in algebra can be applied in any order and still give the same final result. But in the context of the universe, its consequence is to guarantee that there’s only one thread of time.

Deriving Special Relativity

So what about spacetime and Special Relativity? Here, as I figured out in the mid-1990s, something exciting happens: as soon as there’s causal invariance, it basically follows that there’ll be Special Relativity on a large scale. In other words, even though at the lowest level space and time are completely different kinds of things, on a larger scale they get mixed together in exactly the way prescribed by Special Relativity.

Roughly what happens is that different “reference frames” in Special Relativity—corresponding, for example, to traveling at different velocities—correspond to different detailed sequencings of the low-level updates in the network. But because of causal invariance, the overall behavior associated with these different detailed sequences is the same—so that the system follows the principles of Special Relativity.

At the beginning it might have looked hopeless: how could a network that treats space and time differently end up with Special Relativity? But it works out. And actually, I don’t know of any other model in which one can successfully derive Special Relativity from something lower level; in modern physics it’s always just inserted as a given.

Deriving General Relativity

OK, so one can derive Special Relativity from simple models based on networks. What about General Relativity—which, after all, is what we’re celebrating today? Here the news is very good too: subject to various assumptions, I managed in the late 1990s to derive Einstein’s Equations from the dynamics of networks.

The whole story is somewhat complicated. But here’s roughly how it goes. First, we have to think about how a network actually represents space. Now remember, the network is just a collection of nodes and connections. The nodes don’t say how they’re laid out in one-dimensional, two-dimensional, or any-dimensional space.

It’s easy to see that there are networks that on a large scale seem, say, two-dimensional, or three-dimensional. And actually, there’s a simple test for the effective dimension of a network. Just start from a node, then look at all nodes that are up to r connections away. If the network is behaving like it’s d-dimensional, then the number of nodes in that “ball” will be about r^d.

Graphs with different effective dimensions
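
Here is a minimal sketch of my own (an illustration of the test just described, not research code) for estimating the effective dimension of a graph in the Wolfram Language: count the nodes within r connections of a starting node, then fit the exponent d in how that count grows with r.

ballSize[g_, v_, r_] := VertexCount[NeighborhoodGraph[g, v, r]]

effectiveDimension[g_, v_, rmax_] :=
  Coefficient[
    Fit[Table[{Log[N[r]], Log[N[ballSize[g, v, r]]]}, {r, 1, rmax}], {1, x}, x], x]

effectiveDimension[GridGraph[{50, 50}], 1275, 15]  (* a 2D grid gives an exponent approaching 2 *)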

Here’s where things start to get really interesting. If the network behaves like flat d-dimensional space, then the number of nodes will always be close to r^d. But if it behaves like curved space, as in General Relativity, then there’s a correction term, that’s proportional to a mathematical object called the Ricci scalar. And that’s interesting, because the Ricci scalar is precisely something that occurs in Einstein’s Equations.

Graphs with different effective Ricci curvatures

There’s lots of mathematical complexity here. One has to look at shortest paths—or geodesics—in the network. One has to see how to do everything not just in space, but in networks evolving in time. And one has to understand how the large-scale limits of networks work.

In deriving mathematical results, it’s important to be able to take certain kinds of averages. It’s actually very much the same kind of thing that’s needed to derive fluid equations from the dynamics of molecules: one needs to be able to assume a certain degree of effective randomness in low-level interactions to justify the taking of averages.

But the good news is that an incredible range of systems, even ones with extremely simple rules, work a bit like the digits of pi, and generate behavior that is for all practical purposes random. And the result is that even though the details of a causal network are completely determined once one knows the network one’s starting from, many of these details will appear effectively random.

So here’s the final result. If one assumes effective microscopic randomness, and one assumes that the behavior of the overall system does not lead to a change in overall limiting dimensions, then it follows that the large-scale behavior of the system satisfies Einstein’s Equations!

I think this is pretty exciting. From almost nothing, it’s possible to derive Einstein’s Equations. Which means that these simple networks reproduce the features of gravity that we know in current physics.

There are all sorts of technical things to say, not suitable for this general blog. Quite a few of them I already said long ago in A New Kind of Science—and particularly the notes at the back.

A few things are perhaps worth mentioning here. First, it’s worth noting that my underlying networks not only have no intrinsically defined embedding in ordinary space, but also don’t intrinsically define topological notions like inside and outside. All these things have to emerge.

When it comes to deriving the Einstein Equations, one creates Ricci tensors by looking at geodesics in the network, and looking at the growth rates of balls that start from each point on the geodesic.

The Einstein Equations one gets are the vacuum Einstein Equations. But just like with gravitational waves, one can effectively separate off features of space considered to be associated with “matter”, and then get Einstein’s full Equations, complete with “matter” energy-momentum terms.

As I write this, I realize how easily I still fall into technical “physics speak”. (I think it must be that I learned physics when I was so young…) But suffice it to say that at a high level the exciting thing is that from the simple idea of networks and causal invariant replacement rules, it’s possible to derive the Equations of General Relativity. One puts remarkably little in, yet one gets out that remarkable beacon of 20th-century physics: General Relativity.

Particles, Quantum Mechanics, Etc.

It’s wonderful to be able to derive General Relativity. But that’s not all of physics. Another very important part is quantum mechanics. It’s going to get me too far afield to talk about this in detail here, but presumably particles—like electrons or quarks or Higgs bosons—must exist as certain special regions in the network. In qualitative terms, they might not be that different from Kelvin’s “knots in the ether”.

But then their behavior must follow the rules we know from quantum mechanics—or more particularly, quantum field theory. A key feature of quantum mechanics is that it can be formulated in terms of multiple paths of behavior, each associated with a certain quantum amplitude. I haven’t figured it all out, but there’s definitely a hint of something like this going on when one looks at the evolution of a network with many possible underlying sequences of replacements.

My network-based model doesn’t have official quantum amplitudes in it. It’s more like (but not precisely like) a classical, if effectively probabilistic, model. And for 50 years people have almost universally assumed that there’s a crippling problem with models like that. Because there’s a theorem (Bell’s Theorem) that says that unless there’s instantaneous non-local propagation of information, no such “hidden variables” model can reproduce the quantum mechanical results that are observed experimentally.

But there’s an important footnote. It’s pretty clear what “non-locality” means in ordinary space with a definite dimension. But what about in a network? Here it’s a different story. Because everything is just defined by connections. And even though the network may mostly correspond on a large scale to 3D space, it’s perfectly possible for there to be “threads” that join what would otherwise be quite separated regions. And the tantalizing thing is that there are indications that exactly such threads can be generated by particle-like structures propagating in the network.

Searching for the Universe

OK, so it’s conceivable that some network-based model might be able to reproduce things from current physics. How might we set about finding such a model that actually reproduces our exact universe?

The traditional instinct would be to start from existing physics, and try to reverse engineer rules that could reproduce it. But is that the only way? What about just starting to enumerate possible rules, and seeing if any of them turn out to be our universe?

Before studying the computational universe of simple programs I would have assumed that this would be crazy: that there’s no way the rules for our universe could be simple enough to find by this kind of enumeration. But after seeing what’s out there in the computational universe—and seeing some other examples where amazing things were found just by a search—I’ve changed my mind.

So what happens if one actually starts doing such a search? Here’s the zoo of networks one gets after a fairly small number of steps by using all possible underlying rules of a certain very simple type:

Networks from different evolution rules

Some of these networks very obviously aren’t our universe. They just freeze after a few steps, so time effectively stops. Or they have far too simple a structure for space. Or they effectively have an infinite number of dimensions. Or other pathologies.

But the exciting thing is that remarkably quickly one finds rules that aren’t obviously not our universe. Telling if they actually are our universe is a difficult matter. Because even if one simulates lots of steps, it can be arbitrarily difficult to know whether the behavior they’re showing is what one would expect in the early moments of a universe that follows the laws of physics as we know them.

There are plenty of encouraging features, though. For example, these universes can start from effectively infinite numbers of dimensions, then gradually settle to a finite number of dimensions—potentially removing the need for explicit inflation in the early universe.

And at a higher level, it’s worth remembering that if the models one’s using are simple enough, there’s a big distance between “neighboring models”, so it’s likely one will either reproduce known physics exactly, or be very wide of the mark.

In the end, though, one needs to reproduce not just the rule, but also the initial condition for the universe. But once one has that, one will in principle know the exact evolution of the universe. So does that mean one would immediately be able to figure out everything about the universe? Absolutely not. Because of the phenomenon I call “computational irreducibility”—which implies that even though one may know the rule and initial condition for a system, it can still require an irreducible amount of computational work to trace through every step in the behavior of the system to find out what it does.

Still, the possibility exists that one could just find a simple rule—and initial condition—that one could hold up and say, “This is our universe!” We’d have found our universe in the computational universe of all possible universes.

Of course this would be an exciting day for science.

But it would raise plenty of other questions. Like: why this rule, and not another? And why should our particular universe have a rule that shows up early enough in our list of all possible universes that we could actually find it just by enumeration?

One might think that it’d just be something about us being in this universe, and that this causes us to choose an enumeration in which it comes up early. But my current guess is that it’d be something much more bizarre, such as that, with respect to observers in a universe, all of a large class of nontrivial possible universe rules are actually equivalent, so one could pick any of them and get the exact same results, just in a different way.

OK, Show Me the Universe

But these are all speculations. And until we actually find a serious candidate rule for our universe, it’s probably not worth discussing these things much.

So, OK. Where are we at with all this right now? Most of what I’ve said here I had actually figured out by around 1999—several years before I finished A New Kind of Science. And though it was described in simple language rather than physics-speak, I managed to cover the highlights of it in Chapter 9 of the book—giving some of the technical details in the notes at the back.

A New Kind of Science, Chapter 9

But after the book was finished in 2002, I started working on the problem of physics again. I found it a bit amusing to say I had a computer in my basement that was searching for the fundamental theory of physics. But that really was what it was doing: enumerating possible rules of certain types, and trying to see if their behavior satisfied certain criteria that could make them plausible as models of physics.

I was pretty organized in what I did, getting intuition from simplified cases, then systematically going through more realistic cases. There were lots of technical issues. Like being able to visualize large evolving sequences of graphs. Or being able to quickly recognize subtle regularities that revealed that something couldn’t be our actual universe.

I accumulated the equivalent of thousands of pages of results, and was gradually beginning to get an understanding of the basic science of what systems based on networks can do.

Notebooks from 2004

In a sense, though, this was always just a hobby, done alongside my “day job” of leading our company and its technology development. And there was another “distraction”. For many years I had been interested in the problem of computational knowledge, and in building an engine that could comprehensively embody it. And as a result of my work on A New Kind of Science, I became convinced that this might actually be possible—and that this might be the right decade to do it.

By 2005 it was clear that it was indeed possible, and so I decided to devote myself to actually doing it. The result was Wolfram|Alpha. And once Wolfram|Alpha was launched it became clear that even more could be done—and I have spent what I think has probably been my most productive decade ever building a huge tower of ideas and technology, which has now made possible the Wolfram Language and much more.

To Do Physics, or Not to Do Physics?

But over the course of that decade, I haven’t been doing physics. And when I now look at my filesystem, I see a large number of notebooks about physics, all nicely laid out with the things I figured out—and all left abandoned and untouched since the beginning of 2005.

Should I get back to the physics project? I definitely want to. Though there are also other things I want to do.

I’ve spent most of my life working on very large projects. And I work hard to plan what I’m going to do, usually starting to think about projects decades ahead of actually doing them. Sometimes I’ll avoid a project because the ambient technology or infrastructure to do it just isn’t ready yet. But once I embark on a project, I commit myself to finding a way to make it succeed, even if it takes many years of hard work to do so.

Finding the fundamental theory of physics, though, is a project of a rather different character from anything I’ve done before. In a sense its definition of success is much harsher: one either solves the problem and finds the theory, or one doesn’t. Yes, one could explore lots of interesting abstract features of the type of theory one’s constructing (as string theory has done). And quite likely such an investigation will have interesting spinoffs.

But unlike building a piece of technology, or exploring an area of science, the definition of the project isn’t under one’s control. It’s defined by our universe. And it could be that I’m simply wrong about how our universe works. Or it could be that I’m right, but there’s too deep a barrier of computational irreducibility for us to know.

One might also worry that one would find what one thinks is the universe, but never be sure. I’m actually not too worried about this. I think there are enough clues from existing physics—as well as from anomalies attributed to things like dark matter—that one will be able to tell quite definitively if one has found the correct theory. It’ll be neat if one can make an immediate prediction that can be verified. But by the time one’s reproducing all the seemingly arbitrary masses of particles, and other known features of physics, one will be pretty sure one has the correct theory.

It’s been interesting over the years to ask my friends whether I should work on fundamental physics. I get three dramatically different kinds of responses.

The first is simply, “You’ve got to do it!” They say that the project is the most exciting and important thing one can imagine, and they can’t see why I’d wait another day before starting on it.

The second class of responses is basically, “Why would you do it?” Then they say something like, “Why don’t you solve the problem of artificial intelligence, or molecular construction, or biological immortality, or at least build a giant multibillion-dollar company? Why do something abstract and theoretical when you can do something practical to change the world?”

There’s also a third class of responses, which I suppose my knowledge of the history of science should make me expect. It’s typically from physicist friends, and typically it’s some combination of, “Don’t waste your time working on that!” and, “Please don’t work on that.”

The fact is that the current approach to fundamental physics—through quantum field theory—is nearly 90 years old. It’s had its share of successes, but it hasn’t brought us the fundamental theory of physics. But for most physicists today, the current approach is almost the definition of physics. So when they think about what I’ve been working on, it seems quite alien—like it isn’t really physics.

And some of my friends will come right out and say, “I hope you don’t succeed, because then all that work we’ve done is wasted.” Well, yes, some work will be wasted. But that’s a risk you take when you do a project where in effect nature decides what’s right. But I have to say that even if one can find a truly fundamental theory of physics, there’s still plenty of use for what’s been done with standard quantum field theory, for example in figuring out phenomena at the scale where we can do experiments with particle accelerators today.

What Will It Take?

So, OK, if I mounted a project to try to find the fundamental theory of physics, what would I actually do? It’s a complex project that’ll need not just me, but a diverse team of other talented people too.

Whether or not it ultimately works, I think it’ll be quite interesting to watch—and I’d plan to do it as “spectator science”, making it as educational and accessible as possible. (Certainly that would be a pleasant change from the distraction-avoiding hermit mode in which I worked on A New Kind of Science for a decade.)

Of course I don’t know how difficult the project is, or whether it will even work at all. Ultimately that depends on what’s true about our universe. But based on what I did a decade ago, I have a clear plan for how to get started, and what kind of team I have to put together.

It’s going to need both good scientists and good technologists. There’s going to be lots of algorithm development for things like network evolution, and for analysis. I’m sure it’ll need abstract graph theory, modern geometry and probably group theory and other kinds of abstract algebra too. And I won’t be surprised if it needs lots of other areas of math and theoretical computer science as well.

It’ll need serious, sophisticated physics—with understanding of the upper reaches of quantum field theory and perhaps string theory and things like spin networks. It’s also likely to need methods that come from statistical physics and the modern theoretical frameworks around it. It’ll need an understanding of General Relativity and cosmology. And—if things go well—it’ll need an understanding of a diverse range of physics experiments.

There’ll be technical challenges too—like figuring out how to actually run giant network computations, and collect and visualize their results. But I suspect the biggest challenges will be in building the tower of new theory and understanding that’s needed to study the kinds of network systems I want to investigate. There’ll be useful support from existing fields. But in the end, I suspect this is going to require building a substantial new intellectual structure that won’t look much like anything that’s been done before.

Is It the Right Time?

Is it the right time to actually try doing this project? Maybe one should wait until computers are bigger and faster. Or certain areas of mathematics have advanced further. Or some more issues in physics have been clarified.

I’m not sure. But nothing I have seen suggests that there are any immediate roadblocks—other than putting the effort and resources into trying to do it. And who knows: maybe it will be easier than we think, and we’ll look back and wonder why it wasn’t tried long ago.

One of the key realizations that led to General Relativity 100 years ago was that Euclid’s fifth postulate (“parallel lines never cross”) might not be true in our actual universe, so that curved space is possible. But if my suspicions about space and the universe are correct, then it means there’s actually an even more basic problem in Euclid—with his very first definitions. Because if there’s a discrete network “underneath” space, then Euclid’s assumptions about points and lines that can exist anywhere in space simply aren’t correct.

General Relativity is a great theory—but we already know that it cannot be the final theory. And now we have to wonder how long it will be before we actually know the final theory. I’m hoping it won’t be too long. And I’m hoping that before too many more anniversaries of General Relativity have gone by we’ll finally know what spacetime really is.

I Wrote a Book—To Teach the Wolfram Language


An Elementary Introduction to the Wolfram Language is available in print, free on the web, etc.

An Elementary Introduction to the Wolfram Language

I wasn’t sure if I was ever going to write another book. My last book—A New Kind of Science—took me more than a decade of intensely focused work, and is the largest personal project I’ve ever done.

But a little while ago, I realized there was another book I had to write: a book that would introduce people with no knowledge of programming to the Wolfram Language and the kind of computational thinking it allows.

The result is An Elementary Introduction to the Wolfram Language, published today in print, free on the web, etc.

Website for An Elementary Introduction to the Wolfram Language

The goal of the book is to take people from zero to the point where they know enough about the Wolfram Language that they can routinely use it to create programs for things they want to do. And when I say “zero”, I really mean “zero”. This is a book for everyone. It doesn’t assume any knowledge of programming, or math (beyond basic arithmetic), or anything else. It just starts from scratch and explains things. I’ve tried to make it appropriate for both adults and kids. I think it’ll work for typical kids aged about 12 and up.

In the past, a book like this would have been inconceivable. The necessary underlying technology just didn’t exist. Serious programming was always difficult, and there wasn’t a good way to connect with real-world concepts. But now we have the Wolfram Language. It’s taken three decades. But now we’ve built in enough knowledge and automated enough of the process of programming that it’s actually realistic to take almost anyone from zero to the frontiers of what can be achieved with computation.

A page (123) from the book

But how should one actually do it? What should one explain, in what order? Those were challenges I had to address to write this book. I’d written a Fast Introduction for Programmers that in 30 pages or so introduces people who already know about modern programming to the core concepts of the Wolfram Language. But what about people who don’t start off knowing anything about programming?

For many years I’ve found various opportunities to show what’s now the Wolfram Language to people like that. And now I’ve used my experience to figure out what to do in the book.

It’s a Conversation

In essence, the book brings the reader into a conversation with the computer. There are two great things about the Wolfram Language that make this really work. First, that the language is symbolic, so that anything one’s dealing with—a color, an image, a graph, whatever—can be right there in the dialog. And second, that the language can be purely functional, so that everything is stateless, and every input can be self-contained.

Part of page 25 of An Elementary Introduction to the Wolfram Language

It’s also very important that the Wolfram Language has built-in knowledge that lets one immediately compute with real-world things.

Part of page 79 of An Elementary Introduction to the Wolfram Language

Part of page 96 of An Elementary Introduction to the Wolfram Language

Oh, and visualization is really important too—so it’s easy to see what one’s computing.

Page 48 of An Elementary Introduction to the Wolfram Language

Where to Start?

OK, but where should one start? The very first page is about arithmetic—just because that’s a place where everyone can see that a computation is actually happening:

First page of An Elementary Introduction to the Wolfram Language

There’s a section called Vocabulary because that’s what it is: one’s learning some “words” in the Wolfram Language. Then there are exercises, which I’ll talk about soon.

OK, but once one’s done arithmetic, where should one go next? What I decided to do was to go immediately to the idea of functions—and to first introduce them in terms of arithmetic. The advantage of this is that while the concept of a function may be new, the operation it’s doing (namely arithmetic) is familiar.

Page 3 of An Elementary Introduction to the Wolfram Language

And once one’s understood the function Plus, one can immediately go to functions like Max that don’t have special inputs. What Max does isn’t that exciting, though. So as a slightly more exciting function, what I introduce next is RandomInteger—which people often like to run over and over again, to see what it produces.
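
The sorts of inputs involved look like this (sample inputs of mine, in the spirit of the section rather than copied from the book):

Plus[2, 3]
Max[4, 1, 7, 3]
RandomInteger[100]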

OK, so what next? The obvious answer is that we have to introduce lists. But what should one do with lists? Doing something like picking elements out of them isn’t terribly exciting, and it’s hard immediately to see why it’s important. So instead what I decided was to make the very first function I show for lists be ListPlot. It’s nice to start getting in the idea of visualization—and it’s also a good example of how one can type in a tiny piece of code, and get something bigger and more interesting out.

Page 7 of An Elementary Introduction to the Wolfram Language

Actually, the best extremely simple example of that is Range, which I also show at this point. Range is a great way to show the computer actually computing something, with a result that’s easy to understand.

But OK, so now we want to reinforce the idea of functions, and functions working together. The function Reverse isn’t incredibly common in practice, but it’s very easy to understand, so I introduce it next, followed by Join.

Page 8 of An Elementary Introduction to the Wolfram Language

What’s nice then is that between Reverse, Range and Join we have a little microlanguage that’s completely self-contained, but lets us do a variety of computations. And, of course, whatever computations one does, one can immediately see the results, either symbolically or visually.

Page 9 of An Elementary Introduction to the Wolfram Language
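
Between Range, Reverse, Join and ListPlot, inputs in this little microlanguage look something like the following (again, my own sample lines, not the book’s):

Range[10]
Reverse[Range[10]]
Join[Range[5], Reverse[Range[5]]]
ListPlot[Join[Range[10], Reverse[Range[10]]]]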

The next couple of sections talk about displaying and operating on lists, reinforcing what’s already been said, and introducing a variety of functions that are useful in practice. Then it’s on to Table—a very common and powerful function, that in effect packages up a lot of what might otherwise need explicit loops and so on.

I start with trivial versions of Table, without any iteration variable. I take it for granted (as people who don’t know “better” do!) that Table can produce a list of graphics just like it can produce a list of numbers. (Of course, the fact that it can do this is a consequence of the fundamentally symbolic character of the Wolfram Language.)

Page 19 of An Elementary Introduction to the Wolfram Language

The next big step is to introduce a variable into Table. I thought a lot about how to do this, and decided that the best thing to show first is the purely symbolic version. After all, we’ve already introduced functions, and with the symbolic version, one can immediately see where the variable goes. But now that we’ve got Table with variables, we can really go to town and start doing what people will think of as “real computations”.

Page 20 of An Elementary Introduction to the Wolfram Language
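
For concreteness, here are Table inputs of the kind described (my own examples): a trivial form with no iteration variable, and then a form with a variable.

Table[RandomInteger[10], 5]
Table[n^2, {n, 10}]
ListPlot[Table[n^2, {n, 10}]]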

The Arc of the Book

In the first few sections of the book, the raw material for our computations is basically numbers and lists. What I wanted to do next was to show that there are other things to compute with. I chose colors as the first example. Colors are good because (a) everyone knows what they are, (b) you can actually compute with them and (c) they make colorful output (!).

Page 25 of An Elementary Introduction to the Wolfram Language
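
A few sample color computations in that spirit (mine, not the book’s):

{Red, Green, Blue}
Blend[{Red, Blue}, 0.5]
Table[Hue[h], {h, 0, 1, 0.1}]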

After colors we’re ready for some graphics. I haven’t talked about coordinates yet, so I can only show individual graphical objects, without placement information.

Page 29 of An Elementary Introduction to the Wolfram Language

There’s absolutely no reason not to go to 3D, and I do.

Page 30 of An Elementary Introduction to the Wolfram Language
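
Without coordinates, the inputs are along these lines (sample inputs of mine):

Graphics[Circle[]]
Graphics[{Orange, Disk[]}]
Graphics3D[Sphere[]]
Graphics3D[{Cyan, Cylinder[]}]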

Now we’re all set up for something “advanced”: interactive manipulation. It’s pretty much like Table, except that one gets out a complete interactive user interface. And since we’ve introduced graphics, those can be part of the interface. People have seen interactive interfaces in lots of consumer software. My experience is that they’re pretty excited to be able to create them from scratch themselves.

Page 33 of An Elementary Introduction to the Wolfram Language
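
In the Wolfram Language this is done with Manipulate. A minimal example in that spirit (not one from the book) is a slider that changes the color of a disk:

Manipulate[Graphics[{Hue[h], Disk[]}], {h, 0, 1}]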

The next, perhaps surprising thing I introduce in the book is image processing. Yes, there’s a lot of sophisticated computation behind image processing. But in the Wolfram Language that’s all internal. And what people see are just functions—like Blur and ColorNegate—whose purposes are easy to understand.

It’s also nice that people—especially kids—can compute with images they take, or drag in. And this is actually the first example in the book where there’s rich data coming into a computation from outside. (I needed a sample image for the section, so, yes, I just snapped one right there—of me working on the book.)

Page 39 of An Elementary Introduction to the Wolfram Language
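
Applied to one of the built-in test images (ExampleData is used here just to keep the example self-contained; it’s not what appears in the book), the functions mentioned look like this:

img = ExampleData[{"TestImage", "Mandrill"}];
Blur[img, 5]
ColorNegate[img]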

Next I talk about strings and text. String operations on their own are pretty dry. But in the Wolfram Language there’s lots of interesting stuff that’s easy to do with them—like visualizing word clouds from Wikipedia, or looking at common words in different languages.

Page 47 of An Elementary Introduction to the Wolfram Language
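
The basic string operations themselves are simple enough (my own sample inputs):

StringLength["computation"]
StringReverse["wolfram"]
Characters["network"]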

Next I cover sound, and talk about how to generate sequences of musical notes. In the printed book you can’t hear them, of course, though the little score icons give some sense of what’s there.

Page 53 of An Elementary Introduction to the Wolfram Language
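
Generating notes looks roughly like this (my own example; note that the note names are given as strings, which matters for the point below):

Sound[SoundNote["C"]]
Sound[Table[SoundNote[n], {n, 0, 12}]]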

One might wonder, “Why not talk about sound right after graphics?” Well, first of all, I thought it wasn’t bad to mix things up a bit, to help keep the flow interesting. But more than that, there’s a certain chain of dependencies between different areas. For example, the names of musical notes are specified as strings—so one has to have talked about strings before musical notes.

Next it’s “Arrays, or Lists of Lists”. Then it’s “Coordinates and Graphics”. At first, I worried that coordinates were too “mathy”. But particularly after one’s seen arrays, it’s not so difficult to understand coordinates. And once one’s got the idea of 2D coordinates, it’s easy to go to 3D.

Page 65 of An Elementary Introduction to the Wolfram Language
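
Once coordinates are in play, one can place graphics objects explicitly, in 2D and then 3D (sample inputs of mine):

Graphics[Table[Circle[{x, y}], {x, 3}, {y, 3}]]
Graphics3D[Table[Sphere[{x, y, z}, 0.3], {x, 3}, {y, 3}, {z, 3}]]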

By this point in the book, people already know how to do some useful and real things with the Wolfram Language. So I made the next section a kind of interlude—a meta-section that gives a sense of the overall scope of the Wolfram Language, and also shows how to find information on specific topics and functions.

Page 74 of An Elementary Introduction to the Wolfram Language

Now that people have seen a bit about abstract computation, it’s time to talk about real-world data, and to show how to access the vast amount of data that the Wolfram Language shares with Wolfram|Alpha.

Page 77 of An Elementary Introduction to the Wolfram Language
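
Typical inputs use entities from the built-in knowledgebase (these are examples of mine, and like everything involving the knowledgebase they need a connection to the Wolfram servers):

Entity["Planet", "Mars"]["Radius"]
CountryData["France", "Population"]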

Lots of real-world data involves units—so the next section is devoted to working with units. Once that’s done, we can talk about geocomputation: things like finding distances on the Earth, and drawing maps.

Page 93 of An Elementary Introduction to the Wolfram Language
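
Units and geo computations look roughly like this (my examples, using approximate coordinates for New York and London rather than named entities, just to keep the inputs self-contained):

UnitConvert[Quantity[100, "Kilometers"], "Miles"]
GeoDistance[GeoPosition[{40.7, -74.0}], GeoPosition[{51.5, -0.1}]]
GeoGraphics[GeoPath[{GeoPosition[{40.7, -74.0}], GeoPosition[{51.5, -0.1}]}]]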

After that I talk about dates and times. One might think this wouldn’t be an interesting or useful topic. But it’s actually a really good example of real-world computation, and it’s also something one uses all over the place.

Page 104 of An Elementary Introduction to the Wolfram Language
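
A few date computations of the kind that come up (my own sample inputs; the particular date is arbitrary):

Now
DateObject[{1988, 6, 23}]
DateDifference[DateObject[{1988, 6, 23}], Now, "Year"]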

The Wolfram Language is big. But it’s based on a small number of ideas that are consistently used over and over again. One of the important objectives in the book is to cover these ideas. And the next section—on options—covers one such simple idea that’s widely used in practice.

Page 107 of An Elementary Introduction to the Wolfram Language
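
The idea is simply that many functions take optional name -> value settings after their main arguments; for instance (my examples):

ListPlot[Table[n^2, {n, 10}], Filling -> Axis, PlotStyle -> Red]
Graphics[Circle[], Background -> LightYellow]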

After covering options, we’re set to talk about something that’s often viewed as a quite advanced topic: graphs and networks. But my experience is that in modern times, people have seen enough graphs and networks in their everyday life that they don’t have much trouble understanding them in the Wolfram Language. Of course, it helps a lot that the language can manipulate them directly, as just another example of symbolic objects.

Page 115 of An Elementary Introduction to the Wolfram Language
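
A small graph, manipulated directly as a symbolic object (sample inputs of mine):

g = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 1, 3 <-> 4}]
GraphDistance[g, 1, 4]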

After graphs and networks, we’re ready for another seemingly very advanced topic: machine learning. But even though the internal algorithms for machine learning are complicated, the actual functions that do it in the Wolfram Language are perfectly easy to understand. And what’s nice is that by doing a bunch of examples with them, one can start to get pretty good high-level intuition about the core ideas of machine learning.

Page 125 of An Elementary Introduction to the Wolfram Language
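
The high-level machine learning functions really are this short; for example (a toy training set of my own, so the quality of the result is beside the point):

c = Classify[{1.1 -> "small", 2.3 -> "small", 8.7 -> "big", 9.2 -> "big"}];
c[7.5]
LanguageIdentify["une phrase en français"]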

Throughout the book, I try to keep things as simple as possible. But sometimes that means I have to go back for a deeper view of a topic I’ve already covered. “More about Numbers” and “More Forms of Visualization” are two examples of doing this—covering things that would have gotten in the way when numbers and visualization were first introduced, but that need to be said to get a full understanding of these areas.

Functional Programming

The next few sections tackle the important and incredibly powerful topic of functional programming. In the past, functional programming tended to be viewed as a sophisticated topic—and certainly not something to teach people who are first learning about programming. But I think in the Wolfram Language the picture has changed—and it’s now possible to explain functional programming in a way that people will find easy to understand. I start by just talking more abstractly about the process of applying a function.

Page 141 of An Elementary Introduction to the Wolfram Language

The big thing this does is set me up to talk about pure anonymous functions. In principle I could have talked about these much sooner, but I think it’s important for people to have seen many different kinds of examples of how functions are used in general—because that’s what’s needed to motivate pure functions.

Page 147 of An Elementary Introduction to the Wolfram Language
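
Pure functions end up looking like this in practice (sample inputs of mine):

Map[#^2 &, Range[10]]
Select[Range[20], PrimeQ]
SortBy[{"cherry", "fig", "banana"}, StringLength]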

The next section is where some of the real power of functional programming starts to shine through. In the abstract, functions like NestList and NestGraph sound pretty complicated. But by this point in the book, we’ve covered enough of the Wolfram Language that there are plenty of concrete examples to give—and they’re quite easy to understand.

Page 153 of An Elementary Introduction to the Wolfram Language
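
Concretely (again, my own inputs rather than the book’s):

NestList[2 # &, 1, 10]
NestGraph[{# + 1, 2 #} &, 1, 4]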

The next several sections cover areas of the language that are unlocked as soon as one understands pure functions. There are lots of powerful programming techniques that emerge from a much smaller number of underlying ideas.

Page 166 of An Elementary Introduction to the Wolfram Language

After functional programming, the next big topics are patterns and pattern-based programming. I could have chosen to talk about patterns earlier in the book, but they weren’t really needed until now.

Page 193 of An Elementary Introduction to the Wolfram Language

What makes patterns so powerful in the Wolfram Language is something much more fundamental: the uniform structure of everything in the language, based on symbolic expressions. If I were writing a formal specification of the Wolfram Language, I would start with symbolic expressions. And I might do the same if I were writing a book for theoretical computer scientists or pure mathematicians.

It’s not that symbolic expressions are a difficult concept to understand. It’s just that without seeing how things actually work in practice in the Wolfram Language, it’s difficult to motivate abstractly studying them. But now it makes sense to talk about them, not least because they let one see the full power of what’s possible with patterns.
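
Here are a couple of sample inputs of mine that show patterns operating over symbolic expressions, and the uniform structure underneath:

Cases[{f[1], g[2], f[3]}, f[x_] -> x]
{1, 2, 3, 4} /. x_?EvenQ -> 0
FullForm[{1, x^2, Red}]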

The Whole Stack

At this point in the book, we’re getting ready to see how to actually deploy things like web apps. There are a few more pieces to put in place to get there. I talk about associations—and then I talk about natural language understanding. Internally, the way natural language understanding works is complex. But at the level of the Wolfram Language, it’s easy to use—though to see how to connect it into things, it’s helpful to know about pure functions.

Page 213 of An Elementary Introduction to the Wolfram Language
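
Associations and natural language understanding, in miniature (my examples; the Interpreter call needs a connection to the Wolfram servers):

counts = <|"cats" -> 3, "dogs" -> 2|>;
counts["cats"]
Interpreter["Country"]["france"]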

OK, so now everything is ready to talk about deploying things to the web. And at this point, people will be able to start creating useful, practical pieces of software that they can share with the world.

Page 217 of An Elementary Introduction to the Wolfram Language
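
A minimal deployment of the kind described might look like this (my own sketch; it assumes a Wolfram Cloud account, and the resulting URL will of course differ):

CloudDeploy[FormFunction[{"x" -> "Number"}, #x^2 &], Permissions -> "Public"]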

It’s taken 220 pages or so. But to me that’s an amazingly small number of pages to go from zero to what are essentially professional-grade web apps. If we’d just been talking about some very specific kind of app, it wouldn’t be so impressive. But we’re talking about extremely general kinds of apps, that do pretty much any kind of computation.

Page 224 of An Elementary Introduction to the Wolfram Language

Assigning Values to Variables

If you open a book about a traditional programming language like C++ or Java, one of the first things you’re likely to see is a discussion of assigning values to variables. But in my book I don’t do this until Section 38. At some level, this might seem bizarre—but it really isn’t. Because in the Wolfram Language you can do an amazing amount—including for example deploying a complete web app—without ever needing to assign a value to a variable.

And this is actually one of the reasons why it’s so easy to learn the Wolfram Language. Because if you don’t assign values to variables, every piece of code in the language stands alone, and will do the same thing whenever it’s run. But as soon as you’re assigning values to variables, there’s hidden state, and your code will do different things depending on what values variables happen to have.

Still, having talked about assigning values to variables—as well as about patterns—we’re ready to talk about defining your own functions, which is the way to build up more and more sophisticated functionality in the Wolfram Language.

Page 243 of An Elementary Introduction to the Wolfram Language
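
Assignments and function definitions, at their simplest (sample inputs of mine):

x = 42;
x + 8
f[n_] := n^2 + 1
f[10]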

At this point, you’re pretty much set in terms of the basic concepts of the Wolfram Language. But the last few sections of the book cover some important practical extensions. There’s a section on string patterns and templates. There’s a section on storing things, locally and in the cloud. There’s a section on importing and exporting. And there’s a section on datasets. Not everyone who uses the Wolfram Language will ever need datasets, but when you’re dealing with large amounts of structured data they’re very useful. And they provide an interesting example that makes use of many different ideas from the Wolfram Language.
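
For flavor, here are tiny examples of my own from two of those areas, string patterns and datasets:

StringCases["a1b22c333", DigitCharacter ..]
Dataset[{<|"name" -> "ant", "legs" -> 6|>, <|"name" -> "spider", "legs" -> 8|>}]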

Essay Sections

At the end of the book, I have what are basically essay sections: about writing good code, about debugging and about being a programmer. My goal in these sections is to build on the way of thinking that I hope people have developed from reading the rest of the book, and then to communicate some more abstract principles.

Page 283 of An Elementary Introduction to the Wolfram Language

Structuring the Presentation

I said at the beginning of this post that the book is essentially written as a conversation. In almost every section, I found it convenient to add two additional parts: Q&A and Tech Notes. The goal of Q&A is to have a place to answer obvious questions people might have, without distracting from the main narrative.

Page 127 of An Elementary Introduction to the Wolfram Language

There are several different types of questions. Some are about extensions to the functionality that’s been discussed. Some are about the background to it. And some are questions (“What does ‘raised to the power’ mean?”) that will be trivial to some readers but not to others.

In addition to Q&A, I found it useful to include what I call Tech Notes. Their goal is to add technical information—and to help people who already have sophisticated technical knowledge in some particular area to connect it to what they’re reading in this book.

Exercises

Another part of most sections is a collection of exercises. The vast majority are basically of the form “write a piece of code to do X”—though a few are instead “find a simpler version of this piece of code”.

Page 51 of An Elementary Introduction to the Wolfram Language

There are answers to all the exercises in the printed book at the back—and in the web version there are additional exercises. Of course, the answers that are given are just possible answers—and they’re almost never the only possible answers.

Page 306 of An Elementary Introduction to the Wolfram Language

Writing the exercises was an interesting experience for me, that was actually quite important in my thinking about topics like how to talk to AIs. Because what most of the exercises effectively say is, “Take this description that’s written in English, and turn it into Wolfram Language code.” If what one’s doing is simple enough, then English works quite well as a description language. But when what one’s doing gets more complicated, English doesn’t do so well. And by later in the book, I was often finding it much easier to write the Wolfram Language answer for an exercise than to create the actual exercise in English.

In a sense this is very satisfying, because it means we really need the Wolfram Language to be able to express ideas. Some things we can express easily in English—and eventually expect Wolfram|Alpha to be able to understand. But there’s plenty that requires the greater structure and precision of the Wolfram Language.

A Book?

At some level it might seem odd in this day and age to be writing a book that can actually be printed on paper, rather than creating some more flexible online structure. But what I’ve found is that the concept of a book is very useful. Yes, one can have a website where one can reach lots of information by following links. But when people are trying to systematically learn a subject, I think it’s good to have a definite, finite container of information, where there’s an expectation of digesting it sequentially, and where one can readily see the overall structure.

That’s not to say that it’s not useful to have the book online. Right now the book is available as a website, and for many purposes this web version works very well. But somewhat to my surprise, I still find the physical book, with its definite pagination and browsable pages, better for many things.

Of course, if you’re going to learn the Wolfram Language, you actually need to run it. So even if you’re using a physical book, it’s best to have a computer (or tablet) by your side—so you can try the examples, do the exercises, etc. You can do this immediately if you’re reading the book on the web or in the cloud. But some people have told me that they actually find it helpful to retype the examples: they internalize them better that way, and with all the autocompletion and other features in the Wolfram Language, it’s very fast to type in code.

What’s Not in the Book

I call the book an “elementary introduction”. And that’s what it is. It’s not a complete book about the Wolfram Language—far from it. It’s intended to be a basic introduction that gets people to the point where they can start writing useful programs. It covers a lot of the core principles of the language—but only a small fraction of the very large number of specific areas of functionality.

Generally I tried to include areas that are either very commonly encountered in practice, or easy for people to understand without external knowledge—and good for illuminating principles. I’m very happy with the sequence of areas I was able to cover—but another book could certainly pick quite different ones.

Of course, I was a little disappointed to have to leave out all sorts of amazing things that the Wolfram Language can do. And at the end of the book I decided to include a short section that gives a taste of what I wasn’t able to talk about.

Page 295 of An Elementary Introduction to the Wolfram Language

Some Backstory

I see my new book as part of the effort to launch the Wolfram Language. And back in 1988, when we first launched Mathematica, I wrote a book for that, too. But it was a different kind of book: it was a book that was intended to provide a complete tutorial introduction and reference guide to the whole system. The first edition was 767 pages. But by the 5th edition a decade later, the book had grown to 1488 pages. And at that point we decided a book just wasn’t the correct way to deliver the information—and we built a whole online system instead.

Mathematica books

It’s just as well we did that, because it allowed us to greatly expand the depth of coverage, particularly in terms of examples. And of course the actual software system grew a lot—with the result that today the full Documentation Center contains more than 50,000 pages of content.

Many people have told me that they liked the original Mathematica book—and particularly the fact that it was short enough to realistically read from cover to cover. My goal with An Elementary Introduction to the Wolfram Language was again to have a book that’s short enough that people can actually read all of it.

Looking at the book, it’s interesting to see how much of it is about things that simply didn’t exist in the Wolfram Language—or Mathematica—until very recently. Of course it’s satisfying to me to see that things we’re adding now are important enough to make it into an elementary introduction. But it also means that even people who’ve known parts of the Wolfram Language through Mathematica for many years should find the book interesting to read.

Why Me?

I’ve thought for a while that there should be a book like the one I’ve now written. And obviously there are plenty of people who know the Wolfram Language well and could in principle have written an introduction to it. But I’m happy that I’ve been the one to write this book. It’s reduced my productivity on other things—like writing blogs—for a little while. But it’s been a fascinating experience.

It’s been a bit like going back hundreds of years and asking, “How should one approach explaining math to people?” And working out that first one should talk about arithmetic, then algebra, and so on. Well, now we have to do the same kind of thing for computational thinking. And I see the book as a first effort at communicating the tools of computational thinking to a broad range of people.

It’s been fun to write. I hope people find it fun to read—and that they use what they learn from it to create amazing things with the Wolfram Language.

Untangling the Tale of Ada Lovelace

Portrait of Ada Lovelace at age 20 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)

(New York Public Library)

Ada Lovelace was born 200 years ago today. To some she is a great hero in the history of computing; to others an overestimated minor figure. I’ve been curious for a long time what the real story is. And in preparation for her bicentennial, I decided to try to solve what for me has always been the “mystery of Ada”.

It was much harder than I expected. Historians disagree. The personalities in the story are hard to read. The technology is difficult to understand. The whole story is entwined with the customs of 19th-century British high society. And there’s a surprising amount of misinformation and misinterpretation out there.

But after quite a bit of research—including going to see many original documents—I feel like I’ve finally gotten to know Ada Lovelace, and gotten a grasp on her story. In some ways it’s an ennobling and inspiring story; in some ways it’s frustrating and tragic.

It’s a complex story, and to understand it, we’ll have to start by going over quite a lot of facts and narrative.

Looking at a letter from Ada Lovelace to Charles Babbage in the British Library

The Early Life of Ada

Let’s begin at the beginning. Ada Byron, as she was then called, was born in London on December 10, 1815 to recently married high-society parents. Her father, Lord Byron (George Gordon Byron) was 27 years old, and had just achieved rock-star status in England for his poetry. Her mother, Annabella Milbanke, was a 23-year-old heiress committed to progressive causes, who inherited the title Baroness Wentworth. Her father said he gave her the name “Ada” because “It is short, ancient, vocalic”.

Ada’s parents were something of a study in opposites. Byron had a wild life—and became perhaps the top “bad boy” of the 19th century—with dark episodes in childhood, and lots of later romantic and other excesses. In addition to writing poetry and flouting the social norms of his time, he was often doing the unusual: keeping a tame bear in his college rooms in Cambridge, living it up with poets in Italy and “five peacocks on the grand staircase”, writing a grammar book of Armenian, and—had he not died too soon—leading troops in the Greek war of independence (as celebrated by a big statue in Athens), despite having no military training whatsoever.

Annabella Milbanke was an educated, religious and rather proper woman, interested in reform and good works, and nicknamed by Byron “Princess of Parallelograms”. Her very brief marriage to Byron fell apart when Ada was just 5 weeks old, and Ada never saw Byron again (though he kept a picture of her on his desk and famously mentioned her in his poetry). He died at the age of 36, at the height of his celebrityhood, when Ada was 8. There was enough scandal around him to fuel hundreds of books, and the PR battle between the supporters of Lady Byron (as Ada’s mother styled herself) and of him lasted a century or more.

Ada led an isolated childhood on her mother’s rented country estates, with governesses and tutors and her pet cat, Mrs. Puff. Her mother, often absent for various (quite wacky) health cures, enforced a system of education for Ada that involved long hours of study and exercises in self control. Ada learned history, literature, languages, geography, music, chemistry, sewing, shorthand and mathematics (taught in part through experiential methods) to the level of elementary geometry and algebra. When Ada was 11, she went with her mother and an entourage on a year-long tour of Europe. When she returned she was enthusiastically doing things like studying what she called “flyology”—and imagining how to mimic bird flight with steam-powered machines.

But then she got sick with measles (and perhaps encephalitis)—and ended up bedridden and in poor health for 3 years. She finally recovered in time to follow the custom for society girls of the period: on turning 17 she went to London for a season of socializing. On June 5, 1833, 26 days after she was “presented at Court” (i.e. met the king), she went to a party at the house of 41-year-old Charles Babbage (whose oldest son was the same age as Ada). Apparently she charmed the host, and he invited her and her mother to come back for a demonstration of his newly constructed Difference Engine: a 2-foot-high hand-cranked contraption with 2000 brass parts, now to be seen at the Science Museum in London:

Babbage's Difference Engine fragment in the Science Museum, London

Ada’s mother called it a “thinking machine”, and reported that it “raised several Nos. to the 2nd & 3rd powers, and extracted the root of a Quadratic Equation”. It would change the course of Ada’s life.

Charles Babbage

What was the story of Charles Babbage? His father was an enterprising and successful (if personally distant) goldsmith and banker. After various schools and tutors, Babbage went to Cambridge to study mathematics, but soon was intent on modernizing the way mathematics was done there, and with his lifelong friends John Herschel (son of the discoverer of Uranus) and George Peacock (later a pioneer in abstract algebra), founded the Analytical Society (which later became the Cambridge Philosophical Society) to push for reforms like replacing Newton’s (“British”) dot-based notation for calculus with Leibniz’s (“Continental”) function-based one.

Charles Babbage at various ages

Babbage graduated from Cambridge in 1814 (a year before Ada Lovelace was born), went to live in London with his new wife, and started establishing himself on the London scientific and social scene. He didn’t have a job as such, but gave public lectures on astronomy and wrote respectable if unspectacular papers about various mathematical topics (functional equations, continued products, number theory, etc.)—and was supported, if modestly, by his father and his wife’s family.

In 1819 Babbage visited France, and learned about the large-scale government project there to make logarithm and trigonometry tables. Mathematical tables were of considerable military and commercial significance in those days, being used across science, engineering and finance, as well as in areas like navigation. It was often claimed that errors in tables could make ships run aground or bridges collapse.

Back in England, Babbage and Herschel started a project to produce tables for their new Astronomical Society, and it was in the effort to check these tables that Babbage is said to have exclaimed, “I wish to God these tables had been made by steam!”—and began his lifelong effort to mechanize the production of tables.

State of the Art

There were mechanical calculators long before Babbage. Pascal made one in 1642, and we now know there was even one in antiquity. But in Babbage’s day such machines were still just curiosities, not reliable enough for everyday practical use. Tables were made by human computers, with the work divided across a team, and the lowest-level computations being based on evaluating polynomials (say from series expansions) using the method of differences.

What Babbage imagined is that there could be a machine—a Difference Engine—that could be set up to compute any polynomial up to a certain degree using the method of differences, and then automatically step through values and print the results, taking humans and their propensity for errors entirely out of the loop.

Some of Babbage's earliest notes on his Difference Engine (from the Museum of the History of Science, Oxford)  Some of Babbage's earliest notes on his Difference Engine (from the Museum of the History of Science, Oxford)

(Museum of the History of Science)

By early 1822, the 30-year-old Babbage was busy studying different types of machinery, and producing plans and prototypes of what the Difference Engine could be. The Astronomical Society he’d co-founded awarded him a medal for the idea, and in 1823 the British government agreed to provide funding for the construction of such an engine.

Babbage was slightly distracted in 1824 by the prospect of joining a life insurance startup, for which he did a collection of life-table calculations. But he set up a workshop in his stable (his “garage”), and kept on having ideas about the Difference Engine and how its components could be made with the tools of his time.

In 1827, Babbage’s table of logarithms—computed by hand—was finally finished, and would be reprinted for nearly 100 years. Babbage had them printed on yellow paper on the theory that this would minimize user error. (When I was in elementary school, logarithm tables were still the fast way to do multiplication.)

Charles Babbage's "Table of Logarithms of the Natural Numbers from 1 to 108000"

Also in 1827, Babbage’s father died, leaving him about £100k, or perhaps $14 million today, setting up Babbage financially for the rest of his life. The same year, though, his wife died. She had had eight children with him, but only three survived to adulthood.

Dispirited by his wife’s death, Babbage took a trip to continental Europe, and being impressed by what he saw of the science being done there, wrote a book entitled Reflections on the Decline of Science in England, that ended up being mainly a diatribe against the Royal Society (of which he was a member).

Charles Babbage's "Reflections on the Decline of Science in England, and on Some of Its Causes"

Though often distracted, Babbage continued to work on the Difference Engine, generating thousands of pages of notes and designs. He was quite hands on when it came to personally drafting plans or doing machine-shop experiments. But he was quite hands off in managing the engineers he hired—and he did not do well at managing costs. Still, by 1832 a working prototype of a small Difference Engine (without a printer) had successfully been completed. And this is what Ada Lovelace saw in June 1833.

Some of Babbage's drawings for the Difference Engine (Science Museum / Science & Society Picture Library)

(Science Museum / Science & Society Picture Library)

Back to Ada

Ada’s encounter with the Difference Engine seems to be what ignited her interest in mathematics. She had gotten to know Mary Somerville, translator of Laplace and a well-known expositor of science—and partly with her encouragement, was soon, for example, enthusiastically studying Euclid. And in 1834, Ada went along on a philanthropic tour of mills in the north of England that her mother was doing, and was quite taken with the then-high-tech equipment they had.

Ada Lovelace at various ages (she hated the picture on the lower left)

On the way back, Ada taught some mathematics to the daughters of one of her mother’s friends. She continued by mail, noting that this could be “the commencement of ‘A Sentimental Mathematical Correspondence carried on for years between two ladies of rank’ to be hereafter published no doubt for the edification of mankind, or womankind”. It wasn’t sophisticated math, but what Ada said was clear, complete with admonitions like “You should never select an indirect proof, when a direct one can be given.” (There’s a lot of underlining, here shown as italics, in all Ada’s handwritten correspondence.)

Babbage seems at first to have underestimated Ada, trying to interest her in the Silver Lady automaton toy that he used as a conversation piece for his parties (and noting his addition of a turban to it). But Ada continued to interact with (as she put it) Mr. Babbage and Mrs. Somerville, both separately and together. And soon Babbage was opening up to her about many intellectual topics, as well as about the trouble he was having with the government over funding of the Difference Engine.

In the spring of 1835, when Ada was 19, she met 30-year-old William King (or, more accurately, William, Lord King). He was a friend of Mary Somerville’s son, had been educated at Eton (the same school where I went 150 years later) and Cambridge, and then had been a civil servant, most recently at an outpost of the British Empire in the Greek islands. William seems to have been a precise, conscientious and decent man, if somewhat stiff. But in any case, Ada and he hit it off, and they were married on July 8, 1835, with Ada keeping the news quiet until the last minute to avoid paparazzi-like coverage.

The next several years of Ada’s life seem to have been dominated by having three children and managing a large household—though she had some time for horse riding, learning the harp, and mathematics (including topics like spherical trigonometry). In 1837, Queen Victoria (then 18) came to the throne, and as a member of high society, Ada met her. In 1838, William was made an earl for his government work, and Ada became the Countess of Lovelace.

Ada Lovelace's visiting card (Powerhouse Museum Sydney)

(Powerhouse Museum Sydney)

Within a few months of the birth of her third child in 1839, Ada decided to get more serious about mathematics again. She told Babbage she wanted to find a “mathematical Instructor” in London, though asked that in making enquiries he not mention her name, presumably for fear of society gossip.

The person identified was Augustus de Morgan, first professor of mathematics at University College London, noted logician, author of several textbooks, and not only a friend of Babbage’s, but also the husband of the daughter of Ada’s mother’s main childhood teacher. (Yes, it was a small world. De Morgan was also a friend of George Boole’s—and was the person who indirectly caused Boolean algebra to be invented.)

In Ada’s correspondence with Babbage, she showed interest in discrete mathematics, and wondered, for example, if the game of solitaire “admits of being put into a mathematical Formula, and solved”. But in keeping with the math education traditions of the time (and still today), de Morgan set Ada on studying calculus.

A letter from Ada to Babbage about solitaire (from the British Library, Additional Manuscripts Collection, Charles Babbage Papers)

(British Library)

Her letters to de Morgan about calculus are not unlike letters from a calculus student today—except for the Victorian English. Even many of the confusions are the same—though Ada was more sensitive than some to the bad notations of calculus (“why can’t one multiply by dx?”, etc.). Ada was a tenacious student, and seemed to have had a great time learning more and more about mathematics. She was pleased by the mathematical abilities she discovered in herself, and by de Morgan’s positive feedback about them. She also continued to interact with Babbage, and on one visit to her estate (in January 1841, when she was 25), she charmingly told the then-49-year-old Babbage, “If you are a Skater, pray bring Skates to Ockham; that being the fashionable occupation here now, & one I have much taken to.”

Ada learning calculus from Augustus de Morgan

Ada’s relationship with her mother was a complex one. Outwardly, Ada treated her mother with great respect. But in many ways she seems to have found her controlling and manipulative. Ada’s mother was constantly announcing that she had medical problems and might die imminently (she actually lived to age 64). And she increasingly criticized Ada for her child rearing, household management and deportment in society. But by February 6, 1841, Ada was feeling good enough about herself and her mathematics to write a very open letter to her mother about her thoughts and aspirations.

She wrote: “I believe myself to possess a most singular combination of qualities exactly fitted to make me pre-eminently a discoverer of the hidden realities of nature.” She talked of her ambition to do great things. She talked of her “insatiable & restless energy” which she believed she finally had found a purpose for. And she talked about how after 25 years she had become less “secretive & suspicious” with respect to her mother.

But then, three weeks later, her mother dropped a bombshell, claiming that before Ada was born, Byron and his half-sister had had a child together. Incest like that wasn’t actually illegal in England at the time, but it was scandalous. Ada took the whole thing very hard, and it derailed her from mathematics.

Ada had had intermittent health problems for years, but in 1841 they apparently worsened, and she started systematically taking opiates. She was very keen to excel in something, and began to get the idea that perhaps it should be music and literature rather than math. But her husband William seems to have talked her out of this, and by late 1842 she was back to doing mathematics.

Back to Babbage

What had Babbage been up to while all this had been going on? He’d been doing all sorts of things, with varying degrees of success.

After several attempts, he’d rather honorifically been appointed Lucasian Professor of Mathematics at Cambridge—but never really even spent time in Cambridge. Still, he wrote what turned out to be a fairly influential book, On the Economy of Machinery and Manufactures, dealing with such things as how to break up tasks in factories (an issue that had actually come up in connection with the human computation of mathematical tables).

Babbage's "On the Economy of Machinery and Manufactures"

In 1837, he weighed in on the then-popular subject of natural theology, appending his Ninth Bridgewater Treatise to the series of treatises written by other people. The central question was whether there is evidence of a deity from the apparent design seen in nature. Babbage’s book is quite hard to read, opening for example with, “The notions we acquire of contrivance and design arise from comparing our observations on the works of other beings with the intentions of which we are conscious in our own undertakings.”

In apparent resonance with some of my own work 150 years later, he talks about the relationship between mechanical processes, natural laws and free will. He makes statements like “computations of great complexity can be effected by mechanical means”, but then goes on to claim (with rather weak examples) that a mechanical engine can produce sequences of numbers that show unexpected changes that are like miracles.

Babbage's Ninth Bridgewater Treatise (1)  Babbage's Ninth Bridgewater Treatise (2)  Babbage's Ninth Bridgewater Treatise (3)

Babbage tried his hand at politics, running for parliament twice on a manufacturing-oriented platform, but failed to get elected, partly because of claims of misuse of government funds on the Difference Engine.

Babbage also continued to have upscale parties at his large and increasingly disorganized house in London, attracting such luminaries as Charles Dickens, Charles Darwin, Florence Nightingale, Michael Faraday and the Duke of Wellington—with his aged mother regularly in attendance. But even though the degrees and honors that he listed after his name ran to 6 lines, he was increasingly bitter about his perceived lack of recognition.

An example of Babbage's honorifics—six lines, ending in "etc."

Central to this was what had happened with the Difference Engine. Babbage had hired one of the leading engineers of his day to actually build the engine.  But somehow, after a decade of work—and despite lots of precision machine tool development—the actual engine wasn’t done.  Back in 1833, shortly after he met Ada, Babbage had tried to rein in the project—but the result was that his engineer quit, and insisted that he got to keep all the plans for the Difference Engine, even the ones that Babbage himself had drawn.

But right around this time, Babbage decided he’d had a better idea anyway. Instead of making a machine that would just compute differences, he imagined an “Analytical Engine” that supported a whole list of possible kinds of operations that could in effect be done in an arbitrarily programmed sequence. At first, he just thought about having the machine evaluate fixed formulas, but as he studied different use cases, he added other capabilities, like conditionals—and figured out often very clever ways to implement them mechanically. But, most important, he figured out how to control the steps in a computation using punched cards of the kind that had been invented in 1801 by Jacquard for specifying patterns of weaving on looms.

A list of operations for the Analytical Engine (from the Museum of the History of Science)

(Museum of the History of Science)

Babbage created some immensely complicated designs, and today it seems remarkable that they could work. But back in 1826 Babbage had invented something he called Mechanical Notation—that was intended to provide a symbolic representation for the operation of machinery in the same kind of way that mathematical notation provides a symbolic representation for operations in mathematics.

Babbage was disappointed already in 1826 that people didn’t appreciate his invention. Undoubtedly people didn’t understand it, since even now it’s not clear how it worked. But it may have been Babbage’s greatest invention—because apparently it’s what let him figure out all his elaborate designs.

Babbage’s original Difference Engine project had cost the British government £17,500 or the equivalent of perhaps $2 million today.  It was a modest sum relative to other government expenditures, but the project was unusual enough to lead to a fair amount of discussion.  Babbage was fond of emphasizing that—unlike many of his contemporaries—he hadn’t taken government money himself (despite chargebacks for renovating his stable as a fireproof workshop, etc.).  He also claimed that he eventually spent £20,000 of his own money—or the majority of his fortune (no, I don’t see how the numbers add up)—on his various projects.  And he kept on trying to get further government support, and created plans for a Difference Engine No. 2, requiring only 8000 parts instead of 25,000.

By 1842, the government had changed, and Babbage insisted on meeting with the new prime minister (Robert Peel), but ended up just berating him.  In parliament the idea of funding the Difference Engine was finally killed with quips like that the machine should be set to compute when it would be of use. (The transcripts of debates about the Difference Engine have a certain charm—especially when they discuss its possible uses for state statistics that strangely parallel computable-country opportunities with Wolfram|Alpha today.)

Ada’s Paper

Despite the lack of support in England, Babbage’s ideas developed some popularity elsewhere, and in 1840 Babbage was invited to lecture on the Analytical Engine in Turin, and given honors by the Italian government.

Babbage had never published a serious account of the Difference Engine, and had never published anything at all about the Analytical Engine. But he talked about the Analytical Engine in Turin, and notes were taken by a certain Luigi Menabrea, who was then a 30-year-old army engineer—but who, 27 years later, became prime minister of Italy (and also made contributions to the mathematics of structural analysis).

In October 1842, Menabrea published a paper in French based on his notes. When Ada saw the paper, she decided to translate it into English and submit it to a British publication. Many years later Babbage claimed he suggested to Ada that she write her own account of the Analytical Engine, and that she had responded that the thought hadn’t occurred to her. But in any case, by February 1843, Ada had resolved to do the translation but add extensive notes of her own.

Menabrea's article on Babbage's Analytical Engine

Over the months that followed she worked very hard—often exchanging letters almost daily with Babbage (despite sometimes having other “pressing and unavoidable engagements”). And though in those days letters were sent by post (which did come 6 times a day in London at the time) or carried by a servant (Ada lived about a mile from Babbage when she was in London), they read a lot like emails about a project might today, apart from being in Victorian English. Ada asks Babbage questions; he responds; she figures things out; he comments on them. She was clearly in charge, but felt she was first and foremost explaining Babbage’s work, so wanted to check things with him—though she got annoyed when Babbage, for example, tried to make his own corrections to her manuscript.

It’s charming to read Ada’s letter as she works on debugging her computation of Bernoulli numbers: “My Dear Babbage. I am in much dismay at having got into so amazing a quagmire & botheration with these Numbers, that I cannot possibly get the thing done today. …. I am now going out on horseback. Tant mieux.” Later she told Babbage: “I have worked incessantly, & most successfully, all day. You will admire the Table & Diagram extremely. They have been made out with extreme care, & all the indices most minutely & scrupulously attended to.” Then she added that William (or “Lord L.” as she referred to him) “is at this moment kindly inking it all over for me. I had to do it in pencil…”

Ada writes Babbage about debugging her "code"; page 1  Ada writes Babbage about debugging her "code"; page 2

William was also apparently the one who suggested that she sign the translation and notes. As she wrote to Babbage: “It is not my wish to proclaim who has written it; at the same time I rather wish to append anything that may tend hereafter to individualize, & identify it, with the other productions of the said A.A.L.” (for “Ada Augusta Lovelace”).

By the end of July 1843, Ada had pretty much finished writing her notes. She was proud of them, and Babbage was complimentary about them. But Babbage wanted one more thing: he wanted to add an anonymous preface (written by him) that explained how the British government had failed to support the project. Ada thought it a bad idea. Babbage tried to insist, even suggesting that without the preface the whole publication should be withdrawn. Ada was furious, and told Babbage so. In the end, Ada’s translation appeared, signed “AAL”, without the preface, followed by her notes headed “Translator’s Note”.

Ada was clearly excited about it, sending reprints to her mother, and explaining that “No one can estimate the trouble & interminable labour of having to revise the printing of mathematical formulae. This is a pleasant prospect for the future, as I suppose many hundreds & thousands of such formulae will come forth from my pen, in one way or another.” She said that her husband William had been excitedly giving away copies to his friends too, and Ada wrote, “William especially conceives that it places me in a much juster & truer position & light, than anything else can. And he tells me that it has already placed him in a far more agreeable position in this country.”

Within days, there was also apparently society gossip about Ada’s publication. She explained to her mother that she and William “are by no means desirous of making it a secret, altho’ I do not wish the importance of the thing to be exaggerated and overrated”. She saw herself as being a successful expositor and interpreter of Babbage’s work, setting it in a broader conceptual framework that she hoped could be built on.

There’s lots to say about the actual content of Ada’s notes. But before we get to that, let’s finish the story of Ada herself.

While Babbage’s preface wasn’t itself a great idea, one good thing it did for posterity was to cause Ada on August 14, 1843 to write Babbage a fascinating, and very forthright, 16-page letter. (Unlike her usual letters, which were on little folded pages, this was on large sheets.) In it, she explains that while he is often “implicit” in what he says, she is herself “always a very ‘explicit function of x’”. She says that “Your affairs have been, & are, deeply occupying both myself and Lord Lovelace…. And the result is that I have plans for you…” Then she proceeds to ask, “If I am to lay before you in the course of a year or two, explicit & honorable propositions for executing your engine … would there be any chance of allowing myself … to conduct the business for you; your own undivided energies being devoted to the execution of the work …”

The beginning of Ada’s 16-page August 14 letter to Babbage

In other words, she basically proposed to take on the role of CEO, with Babbage becoming CTO. It wasn’t an easy pitch to make, especially given Babbage’s personality. But she was skillful in making her case, and as part of it, she discussed their different motivation structures. She wrote, “My own uncompromising principle is to endeavour to love truth & God before fame & glory …”, while “Yours is to love truth & God … but to love fame, glory, honours, yet more.” Still, she explained, “Far be it from me, to disclaim the influence of ambition & fame. No living soul ever was more imbued with it than myself … but I certainly would not deceive myself or others by pretending it is other than a very important motive & ingredient in my character & nature.”

She ended the letter, “I wonder if you will choose to retain the lady-fairy in your service or not.”

At noon the next day she wrote to Babbage again, asking if he would help in “the final revision”. Then she added, “You will have had my long letter this morning. Perhaps you will not choose to have anything more to do with me. But I hope the best…”

Ada's August 15 letter to Babbage, page 1 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)  Ada's August 15 letter to Babbage, page 2 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)

(New York Public Library)

At 5 pm that day, Ada was in London, and wrote to her mother: “I am uncertain as yet how the Babbage business will end…. I have written to him … very explicitly; stating my own conditions … He has so strong an idea of the advantage of having my pen as his servant, that he will probably yield; though I demand very strong concessions. If he does consent to what I propose, I shall probably be enabled to keep him out of much hot water; & to bring his engine to consummation, (which all I have seen of him & his habits the last 3 months, makes me scarcely anticipate it ever will be, unless someone really exercises a strong coercive influence over him). He is beyond measure careless & desultory at times. — I shall be willing to be his Whipper-in during the next 3 years if I see fair prospect of success.”

But on Babbage’s copy of Ada’s letter, he scribbled, “Saw A.A.L. this morning and refused all the conditions”.

Babbage's scribble on Ada's letter: "Saw A.A.L. this morning and refused all conditions"

 

Yet on August 18, Babbage wrote to Ada about bringing drawings and papers when he would next come to visit her. The next week, Ada wrote to Babbage that “We are quite delighted at your (somewhat unhoped for) proposal” [of a long visit with Ada and her husband]. And Ada wrote to her mother: “Babbage & I are I think more friends than ever. I have never seen him so agreeable, so reasonable, or in such good spirits!”

Ada's August 25, 1843 letter to Babbage, page 1 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)  Ada's August 25, 1843 letter to Babbage, page 2 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)  Ada's August 25, 1843 letter to Babbage, page 3 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)  Ada's August 25, 1843 letter to Babbage, page 4 (image from The Carl H. Pforzheimer Collection of Shelley and His Circle, The New York Public Library, Astor, Lenox and Tilden Foundations)

(New York Public Library)

Then, on Sept. 9, Babbage wrote to Ada, expressing his admiration for her and (famously) describing her as “Enchantress of Number” and “my dear and much admired Interpreter”. (Yes, despite what’s often quoted, he wrote “Number” not “Numbers”.)

Babbage's September 9, 1843 "Enchantress of Number" letter to Ada

The next day, Ada responded to Babbage, “You are a brave man to give yourself wholly up to Fairy-Guidance!”, and Babbage signed off on his next letter as “Your faithful Slave”. And Ada described herself to her mother as serving as the “High-Priestess of Babbage’s Engine”.

After the Paper

But unfortunately that’s not how things worked out. For a while it was just that Ada had to take care of household and family things that she’d neglected while concentrating on her Notes. But then her health collapsed, and she spent many months going between doctors and various “cures” (her mother suggested “mesmerism”, i.e. hypnosis), all the while watching their effects on, as she put it, “that portion of the material forces of the world entitled the body of A.A.L.”

She was still excited about science, though. She’d interacted with Michael Faraday, who apparently referred to her as “the rising star of Science”. She talked about her first publication as her “first-born”, “with a colouring & undercurrent (rather hinted at & suggested than definitely expressed) of large, general, & metaphysical views”, and said that “He [the publication] will make an excellent head (I hope) of a large family of brothers & sisters”.

When her notes were published, Babbage had said “You should have written an original paper. The postponement of that will however only render it more perfect.” But by October 1844, it seemed that David Brewster (inventor of the kaleidoscope, among other things) would write about the Analytical Engine, and Ada asked if perhaps Brewster could suggest another topic for her, saying “I rather think some physiological topics would suit me as well as any.”

And indeed later that year, she wrote to a friend (who was also her lawyer, as well as being Mary Somerville’s son): “It does not appear to me that cerebral matter need be more unmanageable to mathematicians than sidereal & planetary matter & movements; if they would but inspect it from the right point of view. I hope to bequeath to the generations a Calculus of the Nervous System.” An impressive vision, and one that came 10 years before George Boole, for example, would talk about similar things.

Both Babbage and Mary Somerville had started their scientific publishing careers with translations, and Ada saw herself as doing the same, saying that perhaps her next works would be reviews of Whewell and Ohm, and that she might eventually become a general “prophet of science”.

There were roadblocks, to be sure. Like that, at that time, as a woman, she couldn’t get access to the Royal Society’s library in London, even though her husband, partly through her efforts, was a member of the society. But the most serious issue was still Ada’s health. She had a whole series of problems, though in 1846 she was still saying optimistically “Nothing is needed but a year or two more of patience & cure”.

There were also problems with money. William had a never-ending series of elaborate—and often quite innovative—construction projects (he seems to have been particularly keen on towers and tunnels). And to finance them, they had to turn to Ada’s mother, who often made things difficult. Ada’s children were also approaching teenage-hood, and Ada was exercised by many issues that were coming up with them.

Meanwhile, she continued to have a good social relationship with Babbage, seeing him with considerable frequency, though in her letters talking more about dogs and pet parrots than the Analytical Engine. In 1848 Babbage developed a hare-brained scheme to construct an engine that played tic-tac-toe, and to tour it around the country as a way to raise money for his projects. Ada talked him out of it. The idea was raised for Babbage to meet Prince Albert to discuss his engines, but it never happened.

William also dipped his toe into publishing. He had already written short reports with titles like “Method of growing Beans and Cabbages on the same Ground” and “On the Culture of Mangold-Wurzel”. But in 1848 he wrote one more substantial piece, comparing the productivity of agriculture in France and England, based on detailed statistics, with observations like “It is demonstrable, not only that the Frenchman is much worse off than the Englishman, but that he is less well fed than during the devastating exhaustion of the empire.”

Lord Lovelace's "On the Subdivision of Real Property and Its Effects upon Agriculture and the Produce of the Soil in France"

1850 was a notable year for Ada. She and William moved into a new house in London, intensifying their exposure to the London scientific social scene. She had a highly emotional experience visiting for the first time her father’s family’s former estate in the north of England—and got into an argument with her mother about it. And she got more deeply involved in betting on horseracing, and lost some money doing it. (It wouldn’t have been out of character for Babbage or her to invent some mathematical scheme for betting, but there’s no evidence that they did.)

In May 1851 the Great Exhibition opened at the Crystal Palace in London. (When Ada visited the site back in January, Babbage advised, “Pray put on worsted stockings, cork soles and every other thing which can keep you warm.”) The exhibition was a high point of Victorian science and technology, and Ada, Babbage and their scientific social circle were all involved (though Babbage less so than he thought he should be). Babbage gave out many copies of a flyer on his Mechanical Notation. William won an award for brick-making.

But within a year, Ada’s health situation was dire. For a while her doctors were just telling her to spend more time at the seaside. But eventually they admitted she had cancer (from what we know now, probably cervical cancer). Opium no longer controlled her pain; she experimented with cannabis. By August 1852, she wrote, “I begin to understand Death; which is going on quietly & gradually every minute, & will never be a thing of one particular moment”. And on August 19, she asked Babbage’s friend Charles Dickens to visit and read her an account of death from one of his books.

Her mother moved into her house, keeping other people away from her, and on September 1, Ada made an unknown confession that apparently upset William. She seemed close to death, but she hung on, in great pain, for nearly 3 more months, finally dying on November 27, 1852, at the age of 36. Florence Nightingale, nursing pioneer and friend of Ada’s, wrote: “They said she could not possibly have lived so long, were it not for the tremendous vitality of the brain, that would not die.”

Ada had made Babbage the executor of her will. And—much to her mother’s chagrin—she had herself buried in the Byron family vault next to her father, who, like her, died at age 36 (Ada lived 254 days longer). Her mother built a memorial that included a sonnet entitled “The Rainbow” that Ada wrote.

The inscription on Ada's memorial, including the sonnet she wrote

The Aftermath

Ada’s funeral was small; neither her mother nor Babbage attended. But the obituaries were kind, if Victorian in their sentiments:

An obituary for Ada

William outlived her by 41 years, eventually remarrying. Her oldest son—with whom Ada had many difficulties—joined the navy several years before she died, but deserted. Ada thought he might have gone to America (he was apparently in San Francisco in 1851), but in fact he died at 26 working in a shipyard in England. Ada’s daughter married a somewhat wild poet, spent many years in the Middle East, and became the world’s foremost breeder of Arabian horses. Ada’s youngest son inherited the family title, and spent most of his life on the family estate.

Ada’s mother died in 1860, but even then the gossip about her and Byron continued, with books and articles appearing, including Harriet Beecher Stowe’s 1870 Lady Byron Vindicated. In 1905, a year before he died, Ada’s youngest son—who had been largely brought up by Ada’s mother—published a book about the whole thing, with such choice lines as “Lord Byron’s life contained nothing of any interest except what ought not to have been told”.

When Ada died, there was a certain air of scandal that seemed to hang around her. Had she had affairs? Had she run up huge gambling debts? There’s scant evidence of either. Perhaps it was a reflection of her father’s “bad boy” image. But before long there were claims that she’d pawned the family jewels (twice!), or lost, some said, £20,000, or maybe even £40,000 (equivalent to about $7 million today) betting on horses.

It didn’t help that Ada’s mother and her youngest son both seemed against her. On September 1, 1852—the same day as her confession to William—Ada had written, “It is my earnest and dying request that all my friends who have letters from me will deliver them to my mother Lady Noel Byron after my death.” Babbage refused. But others complied, and, later on, when her son organized them, he destroyed some.

But many thousands of pages of Ada’s documents still remain, scattered around the world. Back-and-forth letters that read like a modern text stream, setting up meetings, or mentioning colds and other ailments. Charles Babbage complaining about the postal service. Three Greek sisters seeking money from Ada because their dead brother had been a page for Lord Byron. Charles Dickens talking about chamomile tea. Pleasantries from a person Ada met at Paddington Station. And household accounts, with entries for note paper, musicians, and ginger biscuits. And then, mixed in with all the rest, serious intellectual discussion about the Analytical Engine and many other things.

What Happened to Babbage?

So what happened to Babbage? He lived 18 more years after Ada, dying in 1871. He tried working on the Analytical Engine again in 1856, but made no great progress. He wrote papers with titles like “On the Statistics of Light-Houses”, “Table of the Relative Frequency of Occurrences of the Causes of Breaking Plate-Glass Windows”, and “On Remains of Human Art, mixed with the Bones of Extinct Races of Animals”.

Then in 1864 he published his autobiography, Passages from the Life of a Philosopher—a strange and rather bitter document. The chapter on the Analytical Engine opens with a quote from a poem by Byron—“Man wrongs, and Time avenges”—and goes on from there. There are chapters on “Theatrical experience”, “Hints for travellers” (including advice on how to get an RV-like carriage in Europe), and, perhaps most peculiar, “Street nuisances”. For some reason Babbage waged a campaign against street musicians who he claimed woke him up at 6 am, and caused him to lose a quarter of his productive time. One wonders why he didn’t invent a sound-masking solution, but his campaign was so notable, and so odd, that when he died it was a leading element of his obituary.

Babbage never remarried after his wife died, and his last years seem to have been lonely ones. A gossip column of the time records impressions of him:

Babbage in "A Twilight Gossip with the Past"

Apparently he was fond of saying that he would gladly give up the remainder of his life if he could spend just 3 days 500 years in the future. When he died, his brain was preserved, and is still on display…

Even though Babbage never finished his Difference Engine, a Swedish company did, and even displayed part of it at the Great Exhibition. When Babbage died, many documents and spare parts from his Difference Engine project passed to his son Major-General Henry Babbage, who published some of the documents, and privately assembled a few more devices, including part of the Mill for the Analytical Engine. Meanwhile, the fragment of the Difference Engine that had been built in Babbage’s time was deposited at the Science Museum in London.

Rediscovery

After Babbage died, his life work on his engines was all but forgotten (though it did, for example, get a mention in the 1911 Encyclopaedia Britannica). Mechanical computers nevertheless continued to be developed, gradually giving way to electromechanical ones, and eventually to electronic ones. And when programming began to be understood in the 1940s, Babbage’s work—and Ada’s Notes—were rediscovered.

People knew that “AAL” was Ada Augusta Lovelace, and that she was Byron’s daughter. Alan Turing read her Notes, and coined the term “Lady Lovelace’s Objection” (“an AI can’t originate anything”) in his 1950 Turing Test paper. But Ada herself was still largely a footnote at that point.

It was a certain Bertram Bowden—a British nuclear physicist who went into the computer industry and eventually became Minister of Science and Education—who “rediscovered” Ada. In researching his 1953 book Faster Than Thought (yes, about computers), he located Ada’s granddaughter Lady Wentworth (the daughter of Ada’s daughter), who told him the family lore about Ada, both accurate and inaccurate, and let him look at some of Ada’s papers. Charmingly, Bowden notes that in Ada’s granddaughter’s book Thoroughbred Racing Stock, there is use of binary in computing pedigrees. Ada, and the Analytical Engine, of course, used decimal, with no binary in sight.

But even in the 1960s, Babbage—and Ada—weren’t exactly well known. Babbage’s Difference Engine prototype had been given to the Science Museum in London, but even though I spent lots of time at the Science Museum as a child in the 1960s, I’m pretty sure I never saw it there. Still, by the 1980s, particularly after the US Department of Defense named its ill-fated programming language after Ada, awareness of Ada Lovelace and Charles Babbage began to increase, and biographies began to appear, though sometimes with hair-raising errors (my favorite is that the mention of “the problem of three bodies” in a letter from Babbage indicated a romantic triangle between Babbage, Ada and William—while it actually refers to the three-body problem in celestial mechanics!).

As interest in Babbage and Ada increased, so did curiosity about whether the Difference Engine would actually have worked if it had been built from Babbage’s plans. A project was mounted, and in 2002, after a heroic effort, a complete Difference Engine was built, with only one correction in the plans being made. Amazingly, the machine worked. Building it cost about the same, inflation adjusted, as Babbage had requested from the British government back in 1823.

What about the Analytical Engine? So far, no real version of it has ever been built—or even fully simulated.

What Ada Actually Wrote

OK, so now that I’ve talked (at length) about the life of Ada Lovelace, what about the actual content of her Notes on the Analytical Engine?

They start crisply: “The particular function whose integral the Difference Engine was constructed to tabulate, is …”. She then explains that the Difference Engine can compute values of any 6th degree polynomial—but the Analytical Engine is different, because it can perform any sequence of operations. Or, as she says: “The Analytical Engine is an embodying of the science of operations, constructed with peculiar reference to abstract number as the subject of those operations. The Difference Engine is the embodying of one particular and very limited set of operations…”

Charmingly, at least for me, considering the years I have spent working on Mathematica, she continues at a later point: “We may consider the engine as the material and mechanical representative of analysis, and that our actual working powers in this department of human study will be enabled more effectually than heretofore to keep pace with our theoretical knowledge of its principles and laws, through the complete control which the engine gives us over the executive manipulation of algebraical and numerical symbols.”

A little later, she explains that punched cards are how the Analytical Engine is controlled, and then makes the classic statement that “the Analytical Engine weaves algebraical patterns just as the Jacquard-loom weaves flowers and leaves”.

Ada then goes through how a sequence of specific kinds of computations would work on the Analytical Engine, with “Operation Cards” defining the operations to be done, and “Variable Cards” defining the locations of values. Ada talks about “cycles” and “cycles of cycles, etc”, now known as loops and nested loops, giving a mathematical notation for them:

Cycles and cycles of cycles

There’s a lot of modern-seeming content in Ada’s notes. She comments that “There is in existence a beautiful woven portrait of Jacquard, in the fabrication of which 24,000 cards were required.” Then she discusses the idea of using loops to reduce the number of cards needed, and the value of rearranging operations to optimize their execution on the Analytical Engine, ultimately showing that just 3 cards could do what might seem like it should require 330.

Ada talks about just how far the Analytical Engine can go in computing what was previously not computable, at least with any accuracy. And as an example she discusses the three-body problem, and the fact that in her time, of “about 295 coefficients of lunar perturbations” there were many on which different people’s computations didn’t agree.

Finally comes Ada’s Note G. Early on, she states: “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. … Its province is to assist us in making available what we are already acquainted with.”

Ada seems to have understood with some clarity the traditional view of programming: that we engineer programs to do things we know how to do. But she also notes that in actually putting “the truths and the formulae of analysis” into a form amenable to the engine, “the nature of many subjects in that science are necessarily thrown into new lights, and more profoundly investigated.” In other words—as I often point out—actually programming something inevitably lets one do more exploration of it.

She goes on to say that “in devising for mathematical truths a new form in which to record and throw themselves out for actual use, views are likely to be induced, which should again react on the more theoretical phase of the subject”, or in other words—as I have also often said—representing mathematical truths in a computable form is likely to help one understand those truths themselves better.

Ada seems to have understood, though, that the “science of operations” implemented by the engine would not only apply to traditional mathematical operations. For example, she notes that if “the fundamental relations of pitched sounds in the science of harmony” were amenable to abstract operations, then the engine could use them to “compose elaborate and scientific pieces of music of any degree of complexity or extent”. Not a bad level of understanding for 1843.

The Bernoulli Number Computation

What’s become the most famous part of what Ada wrote is the computation of Bernoulli numbers, in Note G. This seems to have come out of a letter she wrote to Babbage, in July 1843. She begins the letter with “I am working very hard for you; like the Devil in fact; (which perhaps I am)”. Then she asks for some specific references, and finally ends with, “I want to put in something about Bernoulli’s Numbers, in one of my Notes, as an example of how an implicit function may be worked out by the engine, without having been worked out by human head & hands first…. Give me the necessary data & formulae.”

A letter from Ada Lovelace about Bernoulli numbers, page 1 (British Library, Additional Manuscripts Collection, Charles Babbage Papers)  A letter from Ada Lovelace about Bernoulli numbers, page 2 (British Library, Additional Manuscripts Collection, Charles Babbage Papers)  A letter from Ada Lovelace about Bernoulli numbers, page 3 (British Library, Additional Manuscripts Collection, Charles Babbage Papers)

(British Library)

Ada’s choice of Bernoulli numbers to show off the Analytical Engine was an interesting one. Back in the 1600s, people spent their lives making tables of sums of powers of integers—in other words, tabulating values of 1^n + 2^n + ... + m^n for different m and n. But Jakob Bernoulli pointed out that all such sums can be expressed as polynomials in m, with the coefficients being related to what are now called Bernoulli numbers. And in 1713 Bernoulli was proud to say that he’d computed the first 10 Bernoulli numbers “in a quarter of an hour”—reproducing years of other people’s work.
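
In modern terms, this is Faulhaber’s formula. Here’s a quick Wolfram Language check of one case (just a sketch; the (-1)^j factor accounts for the built-in convention that BernoulliB[1] is -1/2):

(* the sum of the first m 4th powers, as a polynomial in m with Bernoulli-number coefficients *)
faulhaber[n_, m_] := 1/(n + 1) Sum[(-1)^j Binomial[n + 1, j] BernoulliB[j] m^(n + 1 - j), {j, 0, n}]
Expand[faulhaber[4, m]] == Expand[Sum[k^4, {k, 1, m}]]  (* True *)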

Today, of course, it’s instantaneous to do the computation in the Wolfram Language:

Table[BernoulliB[n], {n, 0, 10}]

{1, -(1/2), 1/6, 0, -(1/30), 0, 1/42, 0, -(1/30), 0, 5/66}

And, as it happens, a few years ago, just to show off new algorithms, we even computed 10 million of them.

But, OK, so how did Ada plan to do it? She started from the fact that Bernoulli numbers appear in the series expansion

x/(e^x - 1) = B0 + B1 x/1! + B2 x^2/2! + B3 x^3/3! + ...

Then by rearranging this and matching up powers of x, she got a sequence of equations for the Bernoulli numbers Bn—which she then “unravelled” to give a recurrence relation of the form:

Bernoulli recurrence relation

Now Ada had to specify how to actually compute this on the Analytical Engine. First, she used the fact that odd Bernoulli numbers (other than B1) are zero, then computed Bn, which is our modern B2n (or BernoulliB[2n] in Wolfram Language). Then she started from B0, and successively computed Bn for larger n, storing each value she got. The algorithm she used for the computation was (in modern terms):

B[n_] := B[n] = (1/2) n/(n + 2) - Sum[B[2 k - 1] Product[(n + 2 - i)/(i + 1), {i, 1, 2 k - 1}], {k, 1, (n - 1)/2}]
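As a quick check (my own, not part of Ada’s Notes): with this indexing the odd arguments give the nonzero Bernoulli numbers, and the last value reproduces the -(1/30) that Ada’s trace computes for what she called B7 (the modern B8):

Table[B[n], {n, 1, 7, 2}]

{1/6, -(1/30), 1/42, -(1/30)}

Table[BernoulliB[2 n], {n, 1, 4}]

{1/6, -(1/30), 1/42, -(1/30)}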

On the Analytical Engine, the idea was to have a sequence of operations (specified by “Operation Cards”) performed by the “Mill”, with operands coming from the “Store” (with addresses specified by “Variable Cards”). (In the Store, each number was represented by a sequence of wheels, each turned to the appropriate value for each digit.) To compute Bernoulli numbers the way Ada wanted takes two nested loops of operations. With the Analytical Engine design that existed at the time, Ada had to basically unroll these loops. But in the end she successfully produced a description of how B8 (which she called B7) could be computed:

Ada Lovelace's "Diagram for the Computation by the Engine of the Numbers of Bernoulli"

This is effectively the execution trace of a program that runs for 25 steps (plus a loop) on the Analytical Engine. At each step, the trace shows what operation is performed on which Variable Cards, and which Variable Cards receive the results. Lacking a symbolic notation for loops, Ada just indicated loops in the execution trace using braces, noting in English that parts are repeated.

And in the end, the final result of the computation appears in location 24:

BernoulliB[8]

-(1/30)

As it’s printed, there’s a bug in Ada’s execution trace on line 4: the fraction is upside down. But if you fix that, it’s easy to get a modern version of what Ada did:

Modern version of Ada's Bernoulli table

And here’s what the same scheme gives for the next two (nonzero) Bernoulli numbers. As Ada figured out, it doesn’t ultimately take any more storage locations (specified by Variable Cards) to compute higher Bernoulli numbers, just more operations.

Next two nonzero Bernoulli numbers

The Analytical Engine, as it was designed in 1843, was supposed to store 1000 40-digit numbers, which would in principle have allowed it to compute up to perhaps B50 (=495057205241079648212477525/66). It would have been reasonably fast too; the Analytical Engine was intended to do about 7 operations per second. So Ada’s B8 would have taken about 5 seconds, and B50 would have taken perhaps a minute.

Curiously, even in our record-breaking computation of Bernoulli numbers a few years ago, we were basically using the same algorithm as Ada—though now there are slightly faster algorithms that effectively compute Bernoulli number numerators modulo a sequence of primes, then reconstruct the full numbers using the Chinese Remainder Theorem.
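Here’s a minimal sketch of that modular idea (a toy illustration of mine, nothing like the actual record-setting code): recover the numerator of a Bernoulli number from its residues modulo a few primes.

num = Numerator[BernoulliB[20]];   (* -174611 *)
primes = {101, 103, 107, 109};
residues = Mod[num, primes];
(* ChineseRemainder gives the representative in [0, Times @@ primes); knowing a bound on the size, and that this numerator is negative, shift it back down *)
ChineseRemainder[residues, primes] - Times @@ primes

-174611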

Babbage vs. Ada?

The Analytical Engine and its construction were all Babbage’s work. So what did Ada add? Ada saw herself first and foremost as an expositor. Babbage had shown her lots of plans and examples of the Analytical Engine. She wanted to explain what the overall point was—as well as relate it, as she put it, to “large, general, & metaphysical views”.

In the surviving archive of Babbage’s papers (discovered years later in his lawyer’s family’s cowhide trunk), there are a remarkable number of drafts of expositions of the Analytical Engine, starting in the 1830s, and continuing for decades, with titles like “Of the Analytical Engine” and “The Science of Number Reduced to Mechanism”. Why Babbage never published any of these isn’t clear. They seem like perfectly decent descriptions of the basic operation of the engine—though they are definitely more pedestrian than what Ada produced.

When Babbage died, he was writing a “History of the Analytical Engine”, which his son completed. In it, there’s a dated list of “446 Notations of the Analytical Engine”, each essentially a representation of how some operation—like division—could be done on the Analytical Engine. The dates start in the 1830s, and run through the mid-1840s, with not much happening in the summer of 1843.

Babbage's son's "Analytical Engine: Catalogue of 446 Notations. With Their Classification."

Meanwhile, in the collection of Babbage’s papers at the Science Museum, there are some sketches of higher-level operations on the Analytical Engine. For example, from 1837 there’s “Elimination between two equations of the first degree”—essentially the evaluation of a rational function:

Babbage's notes on "Elimination between two equations of the first degree"

There are a few very simple recurrence relations:

A note by Babbage on solving a recurrence relation on the Analytical Engine

Then from 1838, there’s a computation of the coefficients in the product of two polynomials:

A note by Babbage on the computation of coefficients in the product of two polynomials on the Analytical Engine

But there’s nothing as sophisticated—or as clean—as Ada’s computation of the Bernoulli numbers. Babbage certainly helped and commented on Ada’s work, but she was definitely the driver of it.

So what did Babbage say about that? In his autobiography written 26 years later, he had a hard time saying anything nice about anyone or anything. About Ada’s Notes, he writes: “We discussed together the various illustrations that might be introduced: I suggested several, but the selection was entirely her own. So also was the algebraic working out of the different problems, except, indeed, that relating to the numbers of Bernoulli, which I had offered to do to save Lady Lovelace the trouble. This she sent back to me for an amendment, having detected a grave mistake which I had made in the process.”

When I first read this, I thought Babbage was saying that he basically ghostwrote all of Ada’s Notes. But reading what he wrote again, I realize it actually says almost nothing, other than that he suggested things that Ada may or may not have used.

To me, there’s little doubt about what happened: Ada had an idea of what the Analytical Engine should be capable of, and was asking Babbage questions about how it could be achieved. If my own experiences with hardware designers in modern times are anything to go by, the answers will often have been very detailed. Ada’s achievement was to distill from these details a clear exposition of the abstract operation of the machine—something which Babbage never did. (In his autobiography, he basically just refers to Ada’s Notes.)

Babbage’s Secret Sauce

For all his various shortcomings, the very fact that Babbage figured out how to build even a functioning Difference Engine—let alone an Analytical Engine—is extremely impressive. So how did he do it? I think the key was what he called his Mechanical Notation. He first wrote about it in 1826 under the title “On a Method of Expressing by Signs the Action of Machinery”. His idea was to take the detailed structure of a machine and abstract a kind of symbolic diagram of how its parts act on each other. His first example was a hydraulic device:

Babbage makes a symbolic diagram of a hydraulic device

Then he gave the example of a clock, showing on the left a kind of “execution trace” of how the components of the clock change, and on the right a kind of “block diagram” of their relationships:

Babbage makes an "execution trace" for the operation of an elaborate clock

It’s a pretty nice way to represent how a system works, similar in some ways to a modern timing diagram—but not quite the same. And over the years that Babbage worked on the Analytical Engine, his notes show ever more complex diagrams. It’s not quite clear what something like this means:

An example of Babbage's "Mechanical Notation"

But it looks surprisingly like a modern Modelica representation—say in Wolfram SystemModeler. (One difference in modern times is that subsystems are represented much more hierarchically; another is that everything is now computable, so that actual behavior of the system can be simulated from the representation.)

A modern representation of a system, in Wolfram SystemModeler

But even though Babbage used his various kinds of diagrams extensively himself, he didn’t write papers about them. Indeed, his only other publication about “Mechanical Notation” is the flyer he had printed up for the Great Exhibition in 1851—apparently a pitch for standardization in drawings of mechanical components (and indeed these notations appear on Babbage’s diagrams like the one above).

Babbage's flyer on the Laws of Mechanical Notation, handed out at the Great Exhibition in 1851

I’m not sure why Babbage didn’t do more to explain his Mechanical Notation and his diagrams. Perhaps he was just bitter about people’s failure to appreciate it in 1826. Or perhaps he saw it as the secret that let him create his designs. And even though systems engineering has progressed a long way since Babbage’s time, there may yet be inspiration to be had from what Babbage did.

The Bigger Picture

OK, so what’s the bigger picture of what happened with Ada, Babbage and the Analytical Engine?

Charles Babbage was an energetic man who had many ideas, some of them good. At the age of 30 he thought of making mathematical tables by machine, and continued to pursue this idea until he died 49 years later, inventing the Analytical Engine as a way to achieve his objective. He was good—even inspired—at the engineering details. He was bad at keeping a project on track.

Ada Lovelace was an intelligent woman who became friends with Babbage (there’s zero evidence they were ever romantically involved). As something of a favor to Babbage, she wrote an exposition of the Analytical Engine, and in doing so she developed a more abstract understanding of it than Babbage had—and got a glimpse of the incredibly powerful idea of universal computation.

The Difference Engine and things like it are special-purpose computers, with hardware that’s built to do only one kind of thing. One might have thought that to do lots of different kinds of things would necessarily require lots of different kinds of computers. But this isn’t true. And instead it’s a fundamental fact that it’s possible to make general-purpose computers, where a single fixed piece of hardware can be programmed to do any computation. And it’s this idea of universal computation that for example makes software possible—and that launched the whole computer revolution in the 20th century.

Gottfried Leibniz had already had a philosophical concept of something like universal computation back in the 1600s. But it wasn’t followed up. And Babbage’s Analytical Engine is the first explicit example we know of a machine that would have been capable of universal computation.

Babbage didn’t think of it in these terms, though. He just wanted a machine that was as effective as possible at producing mathematical tables. But in the effort to design this, he ended up with a universal computer.

When Ada wrote about Babbage’s machine, she wanted to explain what it did in the clearest way—and to do this she looked at the machine more abstractly, with the result that she ended up exploring and articulating something quite recognizable as the modern notion of universal computation.

What Ada did was lost for many years. But as the field of mathematical logic developed, the idea of universal computation arose again, most clearly in the work of Alan Turing in 1936. Then when electronic computers were built in the 1940s, it was realized they too exhibited universal computation, and the connection was made with Turing’s work.

There was still, though, a suspicion that perhaps some other way of making computers might lead to a different form of computation. And it actually wasn’t until the 1980s that universal computation became widely accepted as a robust notion. And by that time, something new was emerging—notably through work I was doing: that universal computation was not only something that’s possible, but that it’s actually common.

And what we now know (embodied for example in my Principle of Computational Equivalence) is that beyond a low threshold a very wide range of systems—even of very simple construction—are actually capable of universal computation.

A Difference Engine doesn’t get there. But as soon as one adds just a little more, one will have universal computation. So in retrospect, it’s not surprising that the Analytical Engine was capable of universal computation.

Today, with computers and software all around us, the notion of universal computation seems almost obvious: of course we can use software to compute anything we want. But in the abstract, things might not be that way. And I think one can fairly say that Ada Lovelace was the first person ever to glimpse with any clarity what has become a defining phenomenon of our technology and even our civilization: the notion of universal computation.

What If…?

What if Ada’s health hadn’t failed—and she had successfully taken over the Analytical Engine project? What might have happened then?

I don’t doubt that the Analytical Engine would have been built. Maybe Babbage would have had to revise his plans a bit, but I’m sure he would have made it work. The thing would have been the size of a train locomotive, with maybe 50,000 moving parts. And no doubt it would have been able to compute mathematical tables to 30- or 50-digit precision at the rate of perhaps one result every 4 seconds.

Would they have figured that the machine could be electromechanical rather than purely mechanical? I suspect so. After all, Charles Wheatstone, who was intimately involved in the development of the electric telegraph in the 1830s, was a good friend of theirs. And by transmitting information electrically through wires, rather than mechanically through rods, the hardware for the machine would have been dramatically reduced, and its reliability (which would have been a big issue) would have been dramatically increased.

Another major way that modern computers reduce hardware is by dealing with numbers in binary rather than decimal. Would they have figured that idea out? Leibniz knew about binary. And if George Boole had followed up on his meeting with Babbage at the Great Exhibition, maybe that would have led to something. Binary wasn’t well known in the mid-1800s, but it did appear in puzzles, and Babbage, at least, was quite into puzzles: a notable example being his question of how to make a square of words with “bishop” along the top and side (which now takes just a few lines of Wolfram Language code to solve).

Babbage’s primary conception of the Analytical Engine was as a machine for automatically producing mathematical tables—either printing them out by typesetting, or giving them as plots by drawing onto a plate. He imagined that humans would be the main users of these tables—although he did think of the idea of having libraries of pre-computed cards that would provide machine-readable versions.

Today—in the Wolfram Language for example—we never store much in the way of mathematical tables; we just compute what we need when we need it. But in Babbage’s day—with the idea of a massive Analytical Engine—this way of doing things would have been unthinkable.

So, OK: would the Analytical Engine have gotten beyond computing mathematical tables? I suspect so. If Ada had lived as long as Babbage, she would still have been around in the 1890s when Herman Hollerith was doing card-based electromechanical tabulation for the census (and founding what would eventually become IBM). The Analytical Engine could have done much more.

Perhaps Ada would have used the Analytical Engine—as she began to imagine—to produce algorithmic music. Perhaps they would have used it to solve things like the three-body problem, maybe even by simulation. If they’d figured out binary, maybe they would even have simulated things like cellular automata.

Neither Babbage nor Ada ever made money commercially (and, as Babbage took pains to point out, his government contracts just paid his engineers, not him). If they had developed the Analytical Engine, would they have found a business model for it? No doubt they would have sold some engines to governments. Maybe they would even have operated a kind of cloud computing service for Victorian science, technology, finance and more.

But none of this actually happened, and instead Ada died young, the Analytical Engine was never finished, and it took until the 20th century for the power of computation to be discovered.

What Were They Like?

If one had met Charles Babbage, what would he have been like? He was, I think, a good conversationalist. Early in life he was idealistic (“do my best to leave the world wiser than I found it”); later he was almost a Dickensian caricature of a bitter old man. He gave good parties, and put great value in connecting with the highest strata of intellectual society. But particularly in his later years, he spent most of his time alone in his large house, filled with books and papers and unfinished projects.

Babbage was never a terribly good judge of people, or of how what he said would be perceived by them. And even in his eighties, he was still quite child-like in his polemics. He was also notoriously poor at staying focused; he always had a new idea to pursue. The one big exception to this was his almost-50-year persistence in trying to automate the process of computation.

I myself have shared a modern version of this very goal in my own life (…, Mathematica, Wolfram|Alpha, Wolfram Language, …)—though so far only for 40 years. I am fortunate to have lived in a time when ambient technology made this much easier to achieve, but in every large project I have done it has still taken a certain singlemindedness and gritty tenacity—as well as leadership—to actually get it finished.

So what about Ada? From everything I can tell, she was a clear-speaking, clear-thinking individual. She came from the upper classes, but didn’t wear especially fashionable clothes, and carried herself much less like a stereotypical countess than like an intellectual. As an adult, she was emotionally quite mature—probably more so than Babbage—and seems to have had a good practical grasp of people and the world.

Like Babbage, she was independently wealthy, and had no need to work for a living. But she was ambitious, and wanted to make something of herself. In person, beyond the polished Victorian upper-class exterior, I suspect she was something of a nerd, complete with math jokes and everything. She was also capable of great and sustained focus, for example over the months she spent writing her Notes.

In mathematics, she successfully learned up to the state of the art in her time—probably about the same level as Babbage. Unlike Babbage, we don’t know of any specific research she did in mathematics, so it’s hard to judge how good she would have been; Babbage was respectable though unremarkable.

When one reads Ada’s letters, what comes through is a smart, sophisticated person, with a clear, logical mind. What she says is often dressed in Victorian pleasantries—but underneath, the ideas are clear and often quite forceful.

Ada was very conscious of her family background, and of being “Lord Byron’s daughter”. At some level, his story and success no doubt fueled her ambition, and her willingness to try new things. (I can’t help thinking of her leading the engineers of the Analytical Engine as a bit like Lord Byron leading the Greek army.) But I also suspect his troubles loomed over her. For many years, partly at her mother’s behest, she eschewed things like poetry. But she was drawn to abstract ways of thinking, not only in mathematics and science, but also in more metaphysical areas.

And she seems to have concluded that her greatest strength would be in bridging the scientific with the metaphysical—perhaps in what she called “poetical science”. It was likely a correct self-perception. For that is in a sense exactly what she did in the Notes she wrote: she took Babbage’s detailed engineering, and made it more abstract and “metaphysical”—and in the process gave us a first glimpse of the idea of universal computation.

The Final Story

The story of Ada and Babbage has many interesting themes. It is a story of technical prowess meeting abstract “big picture” thinking. It is a story of friendship between old and young. It is a story of people who had the confidence to be original and creative.

It is also a tragedy. A tragedy for Babbage, who lost so many people in his life, and whose personality pushed others away and prevented him from realizing his ambitions. A tragedy for Ada, who was just getting started in something she loved when her health failed.

We will never know what Ada could have become. Another Mary Somerville, famous Victorian expositor of science? A Steve-Jobs-like figure who would lead the vision of the Analytical Engine? Or an Alan Turing, understanding the abstract idea of universal computation?

That Ada touched what would become a defining intellectual idea of our time was good fortune. Babbage did not know what he had; Ada started to see glimpses and successfully described them.

For someone like me the story of Ada and Babbage has particular resonance. Like Babbage, I have spent much of my life pursuing particular goals—though unlike Babbage, I have been able to see a fair fraction of them achieved. And I suspect that, like Ada, I have been put in a position where I can potentially see glimpses of some of the great ideas of the future.

But the challenge is to be enough of an Ada to grasp what’s there—or at least to find an Ada who does. At least now I think I have an idea of what the original Ada, born 200 years ago today, was like: a fitting personality on the road to universal computation and the present and future achievements of computational thinking.

It’s been a pleasure getting to know you, Ada.

 


Quite a few organizations and people helped in getting information and material for this post. I’d like to thank the British Library; the Museum of the History of Science, Oxford; the Science Museum, London; the Bodleian Library, Oxford (with permission from the Earl of Lytton, Ada’s great-great-grandson, and one of her 10 living descendants); the New York Public Library; St. Mary Magdalene Church, Hucknall, Nottinghamshire (Ada’s burial place); and Betty Toole (author of a collection of Ada’s letters); as well as two old friends: Tim Robinson (re-creator of Babbage engines) and Nathan Myhrvold (funder of the Difference Engine #2 re-creation).

Announcing Wolfram Programming Lab


I’m excited today to be able to announce the launch of Wolfram Programming Lab—an environment for anyone to learn programming and computational thinking through the Wolfram Language. You can run Wolfram Programming Lab through a web browser, as well as natively on desktop systems (Mac, Windows, Linux).

The Wolfram Programming Lab startup screen

I’ve long wanted to have a way to let anybody—kids, adults, whoever—get a hands-on introduction to the Wolfram Language and everything it makes possible, even if they’ve had no experience with programming before. Now we have a way!

The startup screen gives four places to go. First, there’s a quick video. Then it’s hands on, with “Try It Yourself”—going through some very simple but interesting computations.

Press Try It Yourself to get started with calculations and explorations

Then there are two different paths. Either start learning systematically—or jump right in and explore. My new book An Elementary Introduction to the Wolfram Language is the basis for the systematic approach.

You can read An Elementary Introduction to the Wolfram Language right inside Wolfram Programming Lab

The whole book is available inside Wolfram Programming Lab. And the idea is that as you read the book, you can immediately try things out for yourself—whether you’re making up your own computations, or doing the exercises given in the book.

Try the book's exercises or your own computations as you go

But there’s also another way to use Wolfram Programming Lab: just jump right in and explore. Programming Lab comes with several dozen Explorations—each providing an activity with a different focus. When you open an Exploration, you see a series of steps with code ready to run.

An Exploration in Wolfram Programming Lab

Press Shift+Enter (or the Evaluate button) to run each piece of code and see what it does—or edit the code first and then run your own version. The idea is always to start with a piece of code that works, and then modify it to do different things. It’s like you’re starting off learning to read the language; then you’re beginning to write it. You can always press the “Show Details” button to open up an explanation of what’s going on.

The Show Details buttons offer additional information to help you learn

Each Exploration goes through a series of steps to build to a final result. But then there’s usually a “Go Further” button that gives you suggestions for free-form projects to do based on the Exploration.

Go Further suggestions help you cement your understanding and expand your knowledge

When you create something neat, you can share it with your friends, teachers, or anyone else. Just press the Share button to create a webpage of what you’ve made.

How We Got Here

I first started thinking about making something like Wolfram Programming Lab quite a while ago. I’d had lots of great experiences showing the Wolfram Language in person to people from middle-school-age on up. But I wanted us to find a way for people to get started with the Wolfram Language on their own.

We used our education expertise to put together a whole series of what seemed like good approaches, building prototypes and testing them with groups of kids. It was often a sobering experience—with utter failure in a matter of minutes. Sometimes the problem was that there was nothing the kids found interesting. Sometimes the kids were confused about what to do. Sometimes they’d do a little, but clearly not understand what they were doing.

At first we thought that it was just a matter of finding the one “right approach”: immersion language learning, systematic exercise-based learning, project-based learning, or something else. But gradually we realized we needed to allow not just one approach, but instead several that could be used interchangeably on different occasions or by different people. And once we did this, our tests started to be more and more successful—leading us in the end to the Wolfram Programming Lab that we have today.

What’s Possible Now

I’m very excited about the potential of Wolfram Programming Lab. In fact, we’ve already started developing a whole ecosystem around it—with online and offline educational and community programs, lots of opportunities for students, educators, volunteers and others, and a whole variety of additional deployment channels.

Wolfram Programming Lab can be used by people on their own—but it can also be used by teachers in classrooms. Explain things through a demo based on an Exploration. Do a project based on a Go Further suggestion (with live coding if you’re bold). Use the Elementary Introduction book as the basis for lectures or independent reading. Use exercises from the book as class projects or homework.

Wolfram Programming Lab is something that’s uniquely made possible by the Wolfram Language. Because it’s only with the whole knowledge-based programming approach—and all the technology we’ve built—that one gets to the point where simple code can routinely do really interesting and compelling things.

It’s a very important—and in fact transformative—moment for programming education.

In the past one could use a “toy programming language” like Scratch, or one could use a professional low-level programming language like C++ or Java. Scratch is easy to use, but is very limited. C++ or Java can ultimately do much more (though they don’t have built-in knowledge), but you need to put in significant time—and get deep into the engineering details—to make programs that get beyond a toy level of functionality.

With the Wolfram Language, though, it’s a completely different story. Because now even beginners can write programs that do really interesting things. And the programs don’t have to just be “computer science exercises”: they can be programs that immediately connect to the real world, and to what students study across the whole curriculum.
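For instance, one-liners along these general lines (illustrative examples of mine, using standard built-in data functions) already pull in and visualize real-world data:

DateListPlot[WeatherData["Boston", "Temperature", {{2015, 1, 1}, {2015, 12, 31}, "Day"}]]

GeoGraphics[GeoPath[{Entity["City", {"London", "GreaterLondon", "UnitedKingdom"}], Entity["City", {"NewYork", "NewYork", "UnitedStates"}]}, "Geodesic"]]

The first plots a year of daily temperatures for Boston; the second draws the great-circle path from London to New York on a map.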

Wolfram Programming Lab gives people a broad way to learn modern programming—and to acquire an incredibly valuable career-building practical skill. But it also helps develop the kind of computational thinking that’s increasingly central to today’s world.

For many students (and others) today, Wolfram|Alpha serves as a kind of “zeroth” programming language. The Wolfram Language is not only an incredibly powerful professional programming language, but also a great first programming language. Wolfram Programming Lab lets people learn the Wolfram Language—and computational thinking—while preserving as much as possible the accessibility and simplicity of Wolfram|Alpha.

I’m excited to see how Wolfram Programming Lab is used. I think it’s going to open up programming like never before—and give all sorts of people around the world the opportunity to join the new generation of programmers who turn ideas into reality using computational thinking and the Wolfram Language.



Farewell, Marvin Minsky (1927–2016)


I think it was 1979 when I first met Marvin Minsky, while I was still a teenager working on physics at Caltech. It was a weekend, and I’d arranged to see Richard Feynman to discuss some physics. But Feynman had another visitor that day as well, who didn’t just want to talk about physics, but instead enthusiastically brought up one unexpected topic after another.

That afternoon we were driving through Pasadena, California—and with no apparent concern for the actual process of driving, Feynman’s visitor was energetically pointing out all sorts of things an AI would have to figure out if it was to be able to do the driving. I was a bit relieved when we arrived at our destination, but soon the visitor was on to another topic, talking about how brains work, and then saying that as soon as he’d finished his next book he’d be happy to let someone open up his brain and put electrodes inside, if they had a good plan to figure out how it worked.

Feynman often had eccentric visitors, but I was really wondering who this one was. It took a couple more encounters, but then I got to know that eccentric visitor as Marvin Minsky, pioneer of computation and AI—and was pleased to count him as a friend for more than three decades.

Just a few days ago I was talking about visiting Marvin—and I was so sad when I heard he died. I started reminiscing about all the ways we interacted over the years, and all the interests we shared. Every major project of my life I discussed with Marvin, from SMP, my first big software system back in 1981, through Mathematica, A New Kind of Science, Wolfram|Alpha and most recently the Wolfram Language.

This picture is from one of the last times I saw Marvin. His health was failing, but he was keen to talk. Having watched more than 35 years of my life, he wanted to tell me his assessment: “You really did it, Steve.” Well, so did you, Marvin! (I’m always “Stephen”, but somehow Americans of a certain age have a habit of calling me “Steve”.)

Marvin Minsky and me

The Marvin that I knew was a wonderful mixture of serious and quirky. About almost any subject he’d have something to say, most often quite unusual. Sometimes it’d be really interesting; sometimes it’d just be unusual. I’m reminded of a time in the early 1980s when I was visiting Boston and subletting an apartment from Marvin’s daughter Margaret (who was in Japan at the time). Margaret had a large and elaborate collection of plants, and one day I noticed that some of them had developed nasty-looking spots on their leaves.

Being no expert on such things (and without the web to look anything up!), I called Marvin to ask what to do. What ensued was a long discussion about the possibility of developing microrobots that could chase mealybugs away. Fascinating though it was, at the end of it I still had to ask, “But what should I actually do about Margaret’s plants?” Marvin replied, “Oh, I guess you’d better talk to my wife.”

For many decades, Marvin was perhaps the world’s greatest energy source for artificial intelligence research. He was a fount of ideas, which he fed to his long sequence of students at MIT. And though the details changed, he always kept true to his goal of figuring out how thinking works, and how to make machines do it.

Marvin the Computation Theorist

By the time I knew Marvin, he tended to talk mostly about theories where things could be figured out by what amounts to common sense, perhaps based on psychological or philosophical reasoning. But earlier in his life, Marvin had taken a different approach. His 1954 PhD thesis from Princeton was about artificial neural networks (“Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain Model Problem”) and it was a mathematics thesis, full of technical math. And in 1956, for example, Marvin published a paper entitled “Some Universal Elements for Finite Automata”, in which he talked about how “complicated machinery can be constructed from a small number of basic elements”.

This particular paper considered only essentially finite machines, based directly on specific models of artificial neural networks. But soon Marvin was looking at more general computational systems, and trying to see what they could do. In a sense, Marvin was beginning just the kind of exploration of the computational universe that years later I would also do, and eventually write A New Kind of Science about. And in fact, as early as 1960, Marvin came extremely close to discovering the same core phenomenon I eventually did.

In 1960, as now, Turing machines were used as a standard basic model of computation. And in his quest to understand what computation—and potentially brains—could be built from, Marvin started looking at the very simplest Turing machines (with just 2 states and 2 colors) and using a computer to find out what all 4096 of them actually do. Most he discovered just have repetitive behavior, and a few have what we’d now call nested or fractal behavior. But none do anything more complicated, and indeed Marvin based the final exercise in his classic 1967 book Computation: Finite and Infinite Machines on this, noting that “D. G. Bobrow and the author did this for all (2,2) machines [1961, unpublished] by a tedious reduction to thirty-odd cases (unpublishable).”
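For what it’s worth, one can get at the same machines directly in the modern Wolfram Language (my own quick illustration; the built-in TuringMachine function has a standard rule numbering for s-state, k-color machines):

(2*2*2)^(2*2)   (* each of the 2*2 state-color cases allows 2*2*2 outcomes *)

4096

RulePlot[TuringMachine[{1507, 2, 2}]]   (* the rule icon for one of the 4096 machines; 1507 chosen arbitrarily *)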

Years later, Marvin told me that after all the effort he’d spent on the (2,2) Turing machines he wasn’t inclined to go further. But as I finally discovered in 1991, if one just looks at (2,3) Turing machines, then among the 3 million or so of them, there are a few that don’t just show simple behavior any more—and instead generate immense complexity even from their very simple rules.

Back in the early 1960s, even though he didn’t find complexity just by searching simple “naturally occurring” Turing machines, Marvin still wanted to construct the simplest one he could that would exhibit it. And through painstaking work, he came up in 1962 with a (7,4) Turing machine that he proved was universal (and so, in a sense, capable of arbitrarily complex behavior).

At the time, Marvin’s (7,4) Turing machine was the simplest known universal Turing machine. And it kept that record essentially unbroken for 40 years—until I finally published a (2,5) universal Turing machine in A New Kind of Science. I felt a little guilty taking the record away from Marvin’s machine after so long. But Marvin was very nice about it. And a few years later he enthusiastically agreed to be on the committee for a prize I put up to establish whether a (2,3) Turing machine that I had identified as the simplest possible candidate for universality was in fact universal.

It didn’t take long for a proof of universality to be submitted, and Marvin got quite involved in some of the technical details of validating it, noting that perhaps we should all have known something like this was possible, given the complexity that Emil Post had observed with the simple rules of what he called a tag system—back in 1921, before Marvin was even born.
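Post’s tag system is simple enough to set up in a couple of lines (a sketch of mine, using his original rules: look at the first symbol, append 00 or 1101 accordingly, and delete the first three symbols):

step[s_] := Join[Drop[s, 3], First[s] /. {0 -> {0, 0}, 1 -> {1, 1, 0, 1}}]
NestWhileList[step, {1, 0, 1, 1, 0, 1, 0}, Length[#] >= 3 &, 1, 30]   (* an arbitrary initial word, run for at most 30 steps *)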

Marvin and Neural Networks

When it came to science, it sometimes seemed as if there were two Marvins. One was the Marvin trained in mathematics who could give precise proofs of theorems. The other was the Marvin who talked about big and often quirky ideas far away from anything like mathematical formalization.

I think Marvin was ultimately disappointed with what could be achieved by mathematics and formalization. In his early years he had thought that with simple artificial neural networks—and maybe things like Turing machines—it would be easy to build systems that worked like brains. But it never seemed to happen. And in 1969, with his long-time mathematician collaborator Seymour Papert, Marvin wrote a book that proved that a certain simple class of neural networks known as perceptrons couldn’t (in Marvin’s words) “do anything interesting”.

To Marvin’s later chagrin, people took the book to show that no neural network of any kind could ever do anything interesting, and research on neural networks all but stopped. But a bit like with the (2,2) Turing machines, much richer behavior was actually lurking just out of sight. It started being noticed in the 1980s, but it’s only been in the last couple of years—with computers able to handle almost-brain-scale networks—that the richness of what neural networks can do has begun to become clear.

And although I don’t think anyone could have known it then, we now know that the neural networks Marvin was investigating as early as 1951 were actually on a path that would ultimately lead to just the kind of impressive AI capabilities he was hoping for. It’s a pity it took so long, and Marvin barely got to see it. (When we released our neural-network-based image identifier last year, I sent Marvin a pointer saying “I never thought neural networks would actually work… but…” Sadly, I never ended up talking to Marvin about it.)

Marvin and Symbolic AI

Marvin’s earliest approaches to AI were through things like neural networks. But perhaps through the influence of John McCarthy, the inventor of LISP, with whom Marvin started the MIT AI Lab, Marvin began to consider more “symbolic” approaches to AI as well. And in 1961 Marvin got a student of his to write a program in LISP to do symbolic integration. Marvin told me that he wanted the program to be as “human like” as possible—so every so often it would stop and say “Give me a cookie”, and the user would have to respond “A cookie”.

By the standards of Mathematica or Wolfram|Alpha, the 1961 integration program was very primitive. But I’m certainly glad Marvin had it built. Because it started a sequence of projects at MIT that led to the MACSYMA system that I ended up using in the 1970s—that in many ways launched my efforts on SMP and eventually Mathematica.

Marvin himself, though, didn’t go on thinking about using computers to do mathematics, but instead started working on how they might do the kind of tasks that all humans—including children—routinely do. Marvin’s collaborator Seymour Papert, who had worked with developmental psychologist Jean Piaget, was interested in how children learn, and Marvin got quite involved in Seymour’s project of developing a computer language for children. The result was Logo—a direct precursor of Scratch—and for a brief while in the 1970s Marvin and Seymour had a company that tried to market Logo and a hardware “turtle” to schools.

For me there was always a certain mystique around Marvin’s theories about AI. In some ways they seemed like psychology, and in some ways philosophy. But occasionally there’d actually be pieces of software—or hardware—that claimed to implement them, often in ways that I didn’t understand very well.

Probably the most spectacular example was the Connection Machine, developed by Marvin’s student Danny Hillis and his company Thinking Machines (for which Richard Feynman and I were both consultants). It was always in the air that the Connection Machine was built to implement one of Marvin’s theories about the brain, and might be seen one day as like the “transistor of artificial intelligence”. But I, for example, ended up using its massively parallel architecture to implement cellular automaton models of fluids, and not anything AI-ish at all.

Marvin was always having new ideas and theories. And even as the Connection Machine was being built, he was giving me drafts of his book The Society of Mind, which talked about new and different approaches to AI. Ever one to do the unusual, Marvin told me he thought about writing the book in verse. But instead the book is structured a bit like so many conversations I had with Marvin: with one idea on each page, often good, but sometimes not—yet always lively.

I think Marvin viewed The Society of Mind as his magnum opus, and I think he was disappointed that more people didn’t understand and appreciate it. It probably didn’t help that the book came out in the 1980s, when AI was at its lowest ebb. But somehow I think to really appreciate what’s in the book one would need Marvin there, presenting his ideas with his characteristic personal energy and responding to any objections one might have about them.

Marvin and Cellular Automata

Marvin was used to having theories about thinking that could be figured out just by thinking—a bit like the ancient philosophers had done. But Marvin was interested in everything, including physics. He wasn’t an expert on the formalism of physics, though he did make contributions to physics topics (notably patenting a confocal microscope). And through his long-time friend Ed Fredkin, he had already been introduced to cellular automata in the early 1960s. He really liked the philosophy of having physics based on them—and ended up for example writing a paper entitled “Nature Abhors an Empty Vacuum” that talked about how one might in effect engineer certain features of physics from cellular automata.

Marvin didn’t do terribly much with cellular automata, though in 1970 he and Fredkin used something like them in the Triadex Muse digital music synthesizer that they patented and marketed—an early precursor of cellular-automaton-based music composition.

Marvin was very supportive of my work on cellular automata and other simple programs, though I think he found my orientation towards natural science a bit alien. During the decade that I worked on A New Kind of Science I interacted with Marvin with some regularity. He was starting work on a book then too, about emotions, that he told me in 1992 he hoped “might reform how people think about themselves”. I talked to him occasionally about his book, trying I suppose to understand the epistemological character of it (I once asked if it was a bit like Freud in this respect, and he said yes). It took 15 years for Marvin to finish what became The Emotion Machine. I know he had other books planned too; in 2006, for example, he told me he was working on a book on theology that was “a couple of years away”—but which sadly never saw the light of day.

Marvin in Person

It was always a pleasure to see Marvin. Often it would be at his big house in Brookline, Massachusetts. As soon as one entered, Marvin would start saying something unusual. It could be, “What would we conclude if the sun didn’t set today?” Or, “You’ve got to come see the actual binary tree in my greenhouse.” Once someone told me that Marvin could give a talk about almost anything, but if one wanted it to be good, one should ask him an interesting question just before he started, and then that’d be what he would talk about. I realized this was how to handle conversations with Marvin too: bring up a topic and then he could be counted on to say something unusual and often interesting about it.

I remember a few years ago bringing up the topic of teaching programming, and how I was hoping the Wolfram Language would be relevant to it. Marvin immediately launched into talking about how programming languages are the only ones that people are expected to learn to write before they can read. He said he’d been trying to convince Seymour Papert that the best way to teach programming was to start by showing people good code. He gave the example of teaching music by giving people Eine kleine Nachtmusik, and asking them to transpose it to a different rhythm and see what bugs occur. (Marvin was a long-time enthusiast of classical music.) In just this vein, one way the Wolfram Programming Lab that we launched just last week lets people learn programming is by starting with good code, and then having them modify it.

There was always a certain warmth to Marvin. He liked and supported people; he connected with all sorts of interesting people; he enjoyed telling nice stories about people. His house always seemed to buzz with activity, even as, over the years, it piled up with stuff to the point where the only free space was a tiny part of a kitchen table.

Marvin also had a great love of ideas. Ones that seemed important. Ones that were strange and unusual. But I think in the end Marvin’s greatest pleasure was in connecting ideas with people. He was a hacker of ideas, but I think the ideas became meaningful to him when he used them as a way to connect with people.

I shall miss all those conversations about ideas—both ones I thought made sense and ones I thought didn’t. Of course, Marvin was always a great enthusiast of cryonics, so perhaps this isn’t the end of the story. But at least for now, farewell, Marvin, and thank you.


Launching the Wolfram Open Cloud: Open Access to the Wolfram Language


Six and a half years ago we put Wolfram|Alpha and the sophisticated computational knowledge it delivers out free on the web for anyone in the world to use. Now we’re launching the Wolfram Open Cloud to let anyone in the world use the Wolfram Language—and do sophisticated knowledge-based programming—free on the web.

Wolfram Open Cloud

It’s been very satisfying to see how successfully Wolfram|Alpha has democratized computational knowledge and how its effects have grown over the years. Now I want to do the same thing with knowledge-based programming—through the Wolfram Open Cloud.

Last week we released Wolfram Programming Lab as an environment for people to learn knowledge-based programming with the Wolfram Language. Today I’m pleased to announce that we’re making Wolfram Programming Lab available for free use on the web in the Wolfram Open Cloud.

Go to wolfram.com, and you’ll see buttons labeled “Immediate Access”. One is for Wolfram|Alpha. But now there are two more: Programming Lab and Development Platform.

Immediate access to Wolfram Programming Lab and the Wolfram Development Platform, both of which run in the Wolfram Open Cloud

Wolfram Programming Lab is for learning to program. Wolfram Development Platform (still in beta) is for doing professional software development. Go to either of these in the Wolfram Open Cloud and you’ll immediately be able to start writing and executing Wolfram Language code in an active Wolfram Language notebook.

Just as with Wolfram|Alpha, you don’t have to log in to use the Wolfram Open Cloud. And you can go pretty far like that. You can create notebook documents that involve computations, text, graphics, interactivity—and all the other things the Wolfram Language can do. You can even deploy active webpages, web apps and APIs that anyone can access and use on the web.
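For example, a single line of this general kind (an illustration of mine, not from the post) deploys a public web API that computes the nth prime for whoever calls it:

CloudDeploy[APIFunction[{"n" -> "Integer"}, Prime[#n] &], Permissions -> "Public"]

The result is a CloudObject URL; appending a query like ?n=1000 to it returns 7919.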

If you want to save things then you’ll need to set up a (free) Wolfram Cloud account. And if you want to get more serious—about computation, deployments or storage—you’ll need to have an actual subscription for Wolfram Programming Lab or Wolfram Development Platform.

But the Wolfram Open Cloud gives anyone a way to do “casual” programming whenever they want—with access to all the core computation, interface, deployment and knowledge capabilities of the Wolfram Language.

In Wolfram|Alpha, you give a single line of natural language input to get your computational knowledge output. In the Wolfram Open Cloud, the power and automation of the Wolfram Language make it possible to give remarkably small amounts of Wolfram Language code to get remarkably sophisticated operations done.

The Wolfram Open Cloud is set up for learning and prototyping and other kinds of casual use. But a great thing about the Wolfram Language is that it’s fully scalable. Start in the Wolfram Open Cloud, then scale up to the full Wolfram Cloud, or to a Wolfram Private Cloud—or instead run in Wolfram Desktop, or, for that matter, in the bundled version for Raspberry Pi computers.

I’ve been working towards what’s now the Wolfram Language for nearly 30 years, and it’s tremendously exciting now to be able to deliver it to anyone anywhere through the Wolfram Open Cloud. It takes a huge stack of technology to make this possible, but what matters most to me is what can be achieved with it.

With Wolfram Programming Lab now available through the Wolfram Open Cloud, anyone anywhere can learn and start doing the latest knowledge-based programming. Last month I published An Elementary Introduction to the Wolfram Language (which is free on the web); now there’s a way anyone anywhere can do all the things the book describes.

Ever since the web was young, our company has been creating large-scale public resources for it, from Wolfram MathWorld to the Wolfram Demonstrations Project to Wolfram|Alpha. Today we’re adding what may ultimately be the most significant of all: the Wolfram Open Cloud. In a sense it’s making the web into a true computing environment—in which anyone can use the power of knowledge-based programming to create whatever they want. And it’s an important step towards a world of ubiquitous knowledge-based programming, with all the opportunities that brings for so many people.



Black Hole Tech?


The Theory Works!

The equation that Albert Einstein wrote down for the gravitational field in 1915 is simple enough:

R_μν - (1/2) R g_μν = (8 π G/c^4) T_μν

But working out its consequences is not. And in fact even after 100 years we’re still just at the beginning of the process.

Riemann perturbations of Einstein's gravitational field equation

Millions of lines of algebra have been done along the way (often courtesy of Mathematica and the Wolfram Language). And there have been all sorts of predictions. Like that if two black holes merge, there should be a burst of gravitational radiation generated, with a particular form. And a little more than a week ago—in a triumph of theoretical and experimental science—it was announced that just such gravitational radiation had been detected.

I’ve followed General Relativity and gravitation theory for more than 40 years now—and it’s been inspiring to see how the small community that’s pursued it has progressively increased its theoretical prowess, and how the discussions I saw at Caltech in the late 1970s finally led to a successful detector of gravitational waves.

General Relativity is surely not the whole story of how spacetime and gravity work. But we’ve now just got some spectacular new evidence of how far the theory can be taken. For a long time I myself was a bit skeptical about black holes—and for example about whether true General-Relativity-style ones would actually form in real physical processes. But as of a little more than a week ago I’m finally convinced that black holes exist, just as General Relativity suggests.

Fast Forward to Black Hole Technology

OK, so we’ve observed one pair of black holes, a billion light years away. And no doubt now—quite amazingly—we’ll get evidence for a steady stream of others around the universe. But what if somehow we could get our hands on our very own black holes, and maybe even lots of them? What could we—or, for that matter, any putative extraterrestrials—do with them? What kind of perhaps extremely exotic structures or technology could eventually be made with them?

It’s always the same story with technology. We have to take the raw material that our universe provides, and somehow find ways to organize it for purposes we want. It’s remarkable to look through the list of chemical elements, or a list of physics effects that have been discovered, and to realize that—though it sometimes took a while—almost all those that can be readily realized on the time and energy scales of today’s technology have found real applications. So what about black holes? Given how hard it’s been to detect our very first pair of black holes, it might seem almost irreverent to ask. And perhaps our universe just isn’t big enough for the question to be sensible. But as a kind of celebration of the detection of gravitational waves I thought it might be fun to try fast-forwarding a long way—and seeing what one can figure out about technology that black holes could make possible.

It seems inconceivable that we ourselves will ever get to try out anything like this for real—unless we find a way to locally make tiny stable black holes. But if something is possible to do, perhaps some more-advanced civilization out there in the universe has already done it—but we likely couldn’t recognize evidence of it without having more idea of what’s possible.

But before we can get to speculating about black hole technology, we’re going to have to talk a bit about what’s known about black holes, General Relativity and gravitation. There are lots of complicated issues—that are probably most easily explained using some fairly mathematically sophisticated concepts (Riemann tensors, covariant derivatives, spacelike hypersurfaces, Penrose diagrams, etc. etc.). But for the sake of writing a general blog post, I’m going to try to do without these, while still, I hope, correctly communicating what’s known and what’s not. I won’t be able to do it perfectly, and might lapse unwittingly into physics-speak from time to time, but here goes…

What Are Black Holes?

General Relativity is often discussed in terms of the geometry of spacetime. But one can also think of it as just saying that gravity is associated with a field that has a certain strength or value at every point. This idea of a field is basically just like in electromagnetism, with its electric and magnetic fields. It’s also like in fluid mechanics, where there’s a velocity field that gives the velocity of the fluid at every point (like a wind velocity map for the weather).

What Einstein did in 1915 was to suggest particular equations that should be satisfied by the gravitational field. Mathematically, they’re partial differential equations, which means that they say how values of the field relate to rates of change (partial derivatives) of these values. They’re the same general kind of equations that we know work for electromagnetic fields, or for the velocity field in a fluid.

So what does one do with these equations? Well, one solves them to find out what the field is in any particular case. It turns out that for electromagnetism, the structure of the equations makes this in principle straightforward. But for fluid mechanics, it’s considerably more complicated—and for Einstein’s equations it’s much more complicated still.

In electromagnetism, one can just think of charges and currents as being sources of electromagnetic field, and there’s no “internal effect” of the field on itself (unless one considers quantum effects). But for fluid mechanics and Einstein’s equations, it’s a different story. In a first approximation, the velocity of a fluid is determined by whatever pressure is applied to it. But what complicates things greatly is that within the fluid there’s an internal effect of each part of the velocity field on others. And it’s similar with the gravitational field: In a first approximation, the field is just determined by whatever configuration of masses exists. But there’s also an “internal effect” of the field on itself. Physically, this is because the gravitational field can be thought of as having energy and momentum, which behave like mass in effectively being a source of the field. (The electromagnetic field has energy and momentum too, but it doesn’t itself have charge, so doesn’t act as a source for itself. In QCD, the color field itself has color, so it has the same general kind of nonlinear character as fluid mechanics or Einstein’s equations.)

In electromagnetism, with its simpler structure, one can’t have any region of static nonzero field unless one has charges or currents explicitly producing it. But when fields can act on themselves it’s a different story, and there can be structures that exist purely in the field, without any external sources being present. For example, in a fluid there can be a vortex that just exists within the fluid—because this happens to be a possible solution to the pure equations for the velocity field of the fluid, without any external forces.

What about the Einstein equations? Well, it’s somewhat the same story, though the details are considerably more complicated. There are nontrivial solutions to the Einstein equations even in the case of “pure gravity”, without any matter or external configuration of masses being present. And that’s exactly what black holes are. They’re examples of solutions to the Einstein equations that correspond to structures that can just exist independently in a gravitational field, a bit like vortices can just exist in the velocity field of a fluid.

How Black Holes Are Made

From everyday experience and from seeing the operation of programs, we tend to be used to the idea that the way to work out what something will do is to start from the beginning and then go forwards step by step. But in mathematically based science the setup is often much less direct and constructive, and instead is basically “the system obeys such-and-such an equation; whatever the system does must correspond to some solution or another to the equation”. And that’s ultimately the setup with Einstein’s equations.

There can be some serious complications. For example, given particular constraints it’s far from obvious that any solutions to the equations will exist, or be unique. And indeed we’ll encounter difficulties along these lines later. But let’s start off by trying to get some rough idea of the physics of how black holes can be made.

The classic way a black hole is imagined to form is through the collapse of a massive star. And that's presumably where the two black holes just detected came from.

For the Earth, with its particular mass and radius, we can work out that something launched from the surface must have a velocity of about 25,000 miles per hour to escape Earth's gravity. But for a body whose mass is larger or whose radius is smaller, the escape velocity will be larger. And what General Relativity (like Newtonian gravity before it) says is that eventually the escape velocity will exceed the speed of light—so that neither light nor anything else will be able to escape, and the object will always appear black: a black hole.
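As a quick check of these numbers, here's a minimal sketch in the Wolfram Language, just plugging standard constants (in SI units) into the Newtonian escape-velocity formula, and then solving for the radius at which the escape velocity would reach the speed of light (which happens to agree with the relativistic Schwarzschild radius 2 G M/c^2):

g = 6.674*10^-11;   (* Newton's gravitational constant, m^3/(kg s^2) *)
c = 2.998*10^8;     (* speed of light, m/s *)
mEarth = 5.972*10^24; rEarth = 6.371*10^6;   (* mass and radius of the Earth, SI units *)
Sqrt[2 g mEarth/rEarth]       (* escape velocity from Earth: about 11,200 m/s, i.e. roughly 25,000 mph *)
2 g (1.989*10^30)/c^2         (* radius at which escape velocity reaches c for one solar mass: about 2950 meters *)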

When this happens, there’s inevitably also a strong gravitational field. And this gravitational field effectively has mass, which itself serves as a source of gravitational field. And in the end, it’s actually irrelevant if there’s matter there at all: the black hole is in effect a self-sustaining configuration of the gravitational field that exists as a solution to Einstein’s equations. It’s a bit like a vortex in a fluid, which you can start by stirring, but which, once it’s there, effectively just perpetuates itself (though in a real fluid with viscosity it’ll eventually damp out).

It’s not obvious of course that the mass and radius needed to get a black hole would actually occur. It’s known that stars like the Sun will never collapse far enough. But above about 3 or 4 solar masses, there’s at least no known physical process that will prevent a star from collapsing enough to form a black hole. And the 36- and 29-solar-mass black holes recently observed presumably formed this way.

What Can a Black Hole Be Like?

Let’s for a moment ignore how black holes might be formed, and just ask what they can be like. This is really a question about possible solutions to Einstein’s equations. And if we want something that doesn’t change with time, and that’s localized in space, then there are mathematical theorems that say the choices are very limited.

There could have been a whole zoo of possible black hole structures—and in higher dimensions, there are at least a few more. But for 4D spacetime, it actually turns out that all stationary black hole solutions are mathematically similar, and are determined by just two parameters: their overall mass and angular momentum. (If one includes electromagnetism as well, then they’re also determined by charge—and it’d be the same story with any other long-range gauge fields.)

The case of non-rotating black holes (zero angular momentum) is simplest. The relevant solution to the Einstein equations was found already by Karl Schwarzschild in 1915. But it took nearly 50 years for the interpretation of the solution to become clear.

One crucial feature of the Schwarzschild solution is that it has an event horizon. This means that any light rays (or anything else) that originate inside a certain sphere (the event horizon) are trapped forever, and can’t escape. There was confusion for quite a while, because the original formula for the Schwarzschild solution has a singularity at the event horizon. But actually this is just a mathematical artifact that can be removed by using a different coordinate system, and isn’t relevant to anything physically observable.

But even though there’s no real singularity at the event horizon, there is a singularity at the very center of the black hole—where the curvature of spacetime, and thus the effective strength of the gravitational field, is infinite. And it turns out that this singularity is in effect where the whole mass of the black hole is concentrated. It’s a pretty pathological situation. If this were happening in fluid mechanics, for example, we’d just assume that the continuum differential equations we’re using must break down, and that instead we’d have to work at the level of molecules. But for General Relativity we don’t yet have any established lower-level theory to use (though I certainly have ideas, and string theory has claims of being able to come to the rescue). There’s also elegant mathematics that’s developed around black holes and their singularities—and anyway at least in this case one can say that “It’s all happening inside the event horizon so nobody outside will ever find out about it”. So the current state of the art is just to work with the theory assuming the singularity is real—and what’s interesting now is that calculations based on this seem to have given correct answers for the recent gravitational wave discovery.

The Life of a Non-rotating Black Hole

I just talked a bit about the mathematical structure of a black hole solution to Einstein’s equations. But how does this correspond to an actual black hole that could form from the collapse of a massive star?

The surest way to find out would be to start from an accurate model of the star and then simulate the whole process of forming the black hole. And at least in some approximation, it's possible these days to do this. But let's try a more lightweight approach.

Let’s assume that there’s a black hole solution to Einstein’s equations that exists. Then let’s ask what happens when small things fall into it. Well, there’s already an issue here. Think about an observer far from the black hole. In order to “get the news” that something crossed the event horizon of the black hole, the observer would have to get some signal—say a light pulse. But as the thing gets closer to the event horizon, it’ll take longer and longer for the signal to escape. And the result is that the observer will never see things cross the event horizon: they’ll appear to get closer and closer (and darker and darker), but never actually cross.

And that’ll be true even when it comes to the formation of the black hole. The star will be seen to be collapsing, but it’ll look as if it’s just freezing when it gets to the point where an event horizon would form.

OK, but what if the observer is also falling into a black hole? Here the experience is completely different. They probably wouldn’t even notice when they cross the event horizon, except that “handshake” signals to the outside world will stop getting responses. But then they’ll get pulled in towards the singularity at the center of the black hole. The gravitational field will steadily increase, and the fact that it’s stronger further in will inevitably stretch any object (or observer!) out. But eventually, splat, they’ll hit the singularity—and in some sense be sucked into it.

Is that really how things will work? Well, it’s hard to tell, but probably not. Outside the event horizon it’s known that small perturbations in the structure of the gravitational field—say associated with the presence of matter—will tend to get damped out, so that what emerges is exactly the official Schwarzschild black hole solution to the Einstein equations.

But inside the event horizon it’s much less clear what happens. As soon as there are perturbations, there’ll be time variations in the gravitational field, and one’s no longer dealing with a static solution to the Einstein equations. The result is that the known theorems no longer apply—and quite possibly there’ll be instabilities that change the structure or even existence of the singularity. But at least in this case, in some sense it doesn’t matter—because none of what happens will ever be visible outside of the event horizon.

Rotating Black Holes

In 1963 Roy Kerr found a solution to Einstein’s equations that corresponds to a black hole with angular momentum. Like the solution for a non-rotating black hole, it has a singularity in the middle. But now the singularity is not a point; instead it forms a ring.

And at least so long as the angular momentum J is (in suitable units) less than the square of the mass, M², the rotating black hole solution has an event horizon. And outside the event horizon, perturbations tend to get damped, just like in the non-rotating case. But inside, things are different.

In a non-rotating black hole anything that goes inside the event horizon will eventually hit the singularity, but won’t “see it coming”. And if light or anything else originates at the singularity it’ll just stay there, and never “get out”.

But the same isn’t true in a rotating black hole. Here, not everything will hit the singularity, and things that originate at the singularity can “get out”. This latter point is quite a problem—because it means that to know the behavior inside the black hole, you have to know what happens at the singularity. But at the singularity, Einstein’s equations can’t tell one anything: they essentially just say infinity=infinity. So the conclusion is that at least based on Einstein’s equations, one simply can’t predict what will happen.

At least with J < M², this failure of prediction occurs only inside the so-called inner horizon of the black hole. But even outside this, something weird happens. To an observer falling into the black hole, it'll seem like a finite time elapses between when they cross the event horizon and the inner horizon. But to an observer outside the black hole, this will seem like an infinite time. And that means that any signals that come from outside the black hole—into the infinite future—could be collected by the observer inside the black hole, in finite time.

Most likely this is a sign that in practice unbounded amounts of energy will accumulate near the inner horizon, making it unstable. But if somehow stability were maintained, there’d be a really weird effect going on: the observer inside the black hole would get to see, in finite time, the whole infinite future unfolding outside the black hole. And if that future happened to include Turing machines doing computations, then in finite time the observer would get to see computations—like solving the halting problem—that can’t necessarily be done by Turing machines in any finite time.

This might be billed as evidence for “physics going beyond the Turing limit”, but it’s not really convincing, first because the whole theoretical internal structure of rotating black holes probably gets modified in practice; and second, because to really talk about the infinite future we have to consider the structure of the whole universe, not just one specific black hole.

But despite all this complexity about what happens inside the event horizon, General Relativity has clear predictions for outside—and these are what were needed for the pair of black holes just detected.

Naked Singularities

In a rotating black hole with J < M², there's a nasty singularity—but it's safely inside an event horizon. But for J > M², there's the same kind of singularity, but now it's no longer inside an event horizon, and instead it's "naked" and exposed to the outside universe.

If there’s a naked singularity like this, the consequence is simple: General Relativity alone isn’t sufficient to describe what happens in the universe; some additional theory is needed.

Encountering something like this is one of the hazards of using a theory—like General Relativity—that’s based on solving equations (rather than, say, running a program) to deduce how systems behave.

And in fact, it’s still quite possible that something similar happens in the Navier–Stokes equations for fluid mechanics. There are lots of partial results, but it’s still not known whether starting from smooth initial conditions, the Navier–Stokes equations can generate singularities.

From a physics point of view, though, there’s something to say: the Navier–Stokes equations for fluids are derived by assuming that the velocity field doesn’t change too rapidly in space or time. And that’s a fine assumption when the velocities are small. But as soon as there’s supersonic flow, there are shocks where the velocity changes rapidly. Viscosity smooths out the shocks a bit, but by the time one’s in the hypersonic regime, at Mach 4 or so, the shocks get very sharp—in fact, so sharp that their width is less than the typical distance between collisions for molecules in the fluid. And the result of this is that the continuum description of the fluid necessarily breaks down, and one has to start looking at the underlying molecular structure.

OK, so can naked singularities actually occur in practice in General Relativity? We know they occur if you somehow have a J > M² object. But what if you start from a realistic star, or some other distribution of matter? Can it spontaneously evolve to produce a naked singularity?

It was proved a few decades ago that if you start with something that’s close to ordinary flat spacetime, it can’t spontaneously make singularities. But if you start putting matter in, then the story changes. And in fact there are now several examples known where a smooth initial distribution of matter can evolve to make a naked singularity—though the singularity only shows up if the initial conditions are very carefully arranged and as soon as there’s any perturbation, it goes away.

Can one get a stable naked singularity without this kind of special setup? So far, nobody knows.

And nobody knows whether J > M² objects can be formed. If one looks at candidate black holes around the universe, most of them are rotating. The final black hole from the merger announced the week before last had J ≃ 0.7 M². And it's certainly interesting to note that while many have J close to M², none seen so far have J > M². It's also interesting that in numerical simulations of pairs of rotating black holes, they always eventually merge—but if the result would have J > M² they seem to "delay" their merger, and emit lots of gravitational radiation that gets rid of angular momentum, before merging to produce a black hole with J < M².

Gravitational Waves

People have been talking about gravitational waves for almost a century, and there’s been indirect evidence of them for a while. But the recent announcement of direct detection of gravitational waves is pretty exciting.

So what are gravitational waves? They’re a fairly direct analog of electromagnetic waves. If you take a charge and wiggle it around, it’ll radiate electromagnetic waves—for example, radio waves. And in a directly analogous way, if you take a mass and wiggle it around, it’ll radiate gravitational waves. Usually they’ll be incredibly weak. But if the mass is very big and concentrated, like a black hole, the gravitational waves can be stronger—and, as we’ve now seen, even strong enough to detect.

Why is there radiation when you wiggle something around? It’s not hard to see. Imagine, say, that there’s a charge sitting somewhere, and you’re some distance away. There’ll be electric field from the charge—that’s, say, pointing towards the charge. Now suddenly move the charge. After things have stabilized again, there’d better be a new version of the electric field, say pointing to the new position of the charge. But how does the transition happen? The answer is that the change somehow has to propagate outward from the charge—and the process of that happening is electromagnetic radiation, which (in a vacuum) moves at the speed of light.

In general, the amount of electromagnetic radiation that’s produced is proportional to (the square of) the acceleration of the charge. (Actually, there’s considerable subtlety to this, particularly in the relativistic case—and the details of the globally correct formula are still somewhat debated.) It’s similar for gravitational radiation.

There are some differences though. A minimal antenna for electromagnetic radiation is a straight wire, along which electrons can move up and down. For gravitational radiation, the minimal "antenna" has to be something that effectively has motion in two perpendicular directions—or, more technically, a changing quadrupole moment. In practice, two bodies orbiting each other will emit gravitational radiation, more or less as a result of the acceleration necessary to keep them in their orbits. Pretty much any mass that "blobs around" without being spherically symmetric will also emit gravitational waves.

When something emits gravitational waves, it’s radiating away some of its energy. And in general the emission of gravitational radiation tends to have a damping effect on the motion of things. For example, the emission of gravitational radiation will make orbits decay—and makes orbiting bodies progressively spiral in towards each other.

For something like the Earth and the Sun, this is an absolutely infinitesimal effect. But for a pair of neutron stars orbiting each other, it’s more significant. And indeed, starting in 1974 such an effect was observed in a binary pulsar. And now, this is what caused two black holes eventually to spiral in so far that they hit each other—and produce the event just announced.

Once two black holes hit, there’s a tremendous amount of gravitational radiation emitted as the resulting object “blobs around” before assuming its final single-black-hole shape. For stellar-sized black holes it all happens in a few hundred milliseconds. And in the case of the event just announced, the total energy in gravitational radiation was a whopping 3 solar masses—big enough that we’re able to detect it a significant fraction of the way across the universe.
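To get a feel for the numbers, here's a rough sketch in the Wolfram Language. It assumes the standard quadrupole-approximation result that a circular binary starting at separation a merges after a time 5 c^5 a^4/(256 G^3 m1 m2 (m1 + m2)); the starting separation of a million kilometers below is purely illustrative, not taken from the actual event:

g = 6.674*10^-11; c = 2.998*10^8; msun = 1.989*10^30;   (* SI constants *)
mergeTime[m1_, m2_, a_] := 5 c^5 a^4/(256 g^3 m1 m2 (m1 + m2));
mergeTime[36 msun, 29 msun, 10^9]/(3.15*10^7)   (* years to spiral in from 10^6 km: roughly 10^4 *)
3 msun c^2                                      (* energy equivalent of 3 solar masses: about 5*10^47 joules *)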

Mathematics of Waves

Pretty much any kind of field or continuous material supports some kind of waves. Start from whatever the stable state of the system is, then perturb it just a little by periodically changing something, and you’ll get waves. When the amplitude of the waves is small enough, the math tends to be fairly straightforward. For example, in a first approximation, the amplitudes of different waves at a particular point will just add linearly.

But when the amplitudes of the waves get bigger, things can get much more complicated. In electromagnetism, everything stays linear however big the amplitude is (well, until one runs into quantum effects). But for pretty much any other kind of waves—including, say, water waves, as well as gravitational waves—nonlinear effects start to appear as soon as the amplitude gets large enough.

When there’s linearity, one can effectively break down any field configuration into a sequence of non-interacting waves of different frequencies. But that’s no longer true for something nonlinear, and eventually it usually doesn’t make sense to talk about waves at all: one’s just dealing with some field configuration or another.

In the case of gravitational waves, one of the notable features is that one can in principle arrange waves to combine so that they’ll form black holes. Indeed, one can potentially start with low-amplitude waves, but somehow make them converge to a point where they’ll generate a black hole (think “gravitational implosion lens”, etc.).

Two Black Holes

A single static black hole in an infinite universe is a possible solution to Einstein’s equations. So what about two black holes orbiting each other? Well, there’s no known exact solution to the equations for this case, and it’s only fairly recently that it’s become possible to calculate with any reliability what happens.

Roughly, there are three regimes. First, the black holes are peacefully orbiting, and emitting gravitational radiation. When the black holes are far apart, and have velocities small compared to the speed of light, it’s fairly straightforward. But as they get closer and speed up, it becomes more complicated. Each black hole perturbs the other, but with a lot of algebra it’s possible to calculate the effects (as a power series in v/c).

Eventually, though, this breaks down, and the only choice is to solve the Einstein equations numerically using many of the same methods traditionally used for fluid mechanics. (There’ve been various efforts to use the same kind of cellular automaton approach on the Einstein equations that I used for the Navier–Stokes equations, but I think what’s more promising is to try something like my network-rewriting models for gravity.)

It’s only in recent years that computers have become fast enough to get sensible answers from computations like this involving high gravitational fields as well as velocities close to the speed of light. And in these computations, the result is that something like a single black hole is formed. Inevitably it’s a deformed black hole, and the third regime is one where—a bit like a bell—the black hole “rings down” these deformations (either by emitting gravitational radiation, or by absorbing them into the black hole itself).

It’s a pretty complicated stack of computations, requiring a variety of different methods. But the impressive thing is that—judging from the recent announcement—it seems to correctly capture what goes on in the interaction between two black holes.

There are plenty of detailed issues, however. One of them is that you can’t just set up some elaborate initial state with two black holes and expect that it will be a solution to the Einstein equations, even for an instant. So in addition to working out the time evolution, one also has to somehow progressively modify the initial conditions one specifies, so that they actually correspond to a possible configuration of the gravitational field according to Einstein’s equations.

If we want to start thinking about black hole configurations for purposes of technology, it would help to devise a simplified summary of interactions between two—or more—black holes. For example, one might want to have a summary of the effects of the direction of rotation (or “spin”) and of orbiting on black holes’ interactions, organized (in analogy with quantum systems) into spin-orbit, spin-spin, etc. components.

Gravitational Turbulence?

It’s a general feature of fluids that when they flow rapidly, they tend to show turbulence and behave in seemingly random ways. It’s still not completely clear what the origin of this apparent randomness is. It could be that somehow one is seeing an amplified version of small-scale random molecular motions. Or it could be there is enough instability that one is progressively exploring random details of initial conditions (as in chaos theory). I’ve spent a long time studying this, and my conclusion is that the randomness mostly isn’t coming from things that are essentially outside of the fluid; it’s instead coming from the actual dynamics of the fluid, as if the fluid were computing my rule 30 cellular automaton, or running a pseudorandom number generator.

If one works with the standard Navier–Stokes equations for fluid mechanics, it’s not very clear what’s going on—because one ends up having to solve the equations numerically, and whenever something complicated happens, it’s almost impossible to tell if it’s a consequence of the numerical analysis one’s done, or a genuine feature of the equations. I sidestepped these issues by using cellular automaton models for fluids rather than differential equations—and from that it’s pretty clear that intrinsic randomness generation is at least a large part of what’s going on. And having seen this, my expectation would be that if one could solve the equations well enough, one would see exactly the same behavior in the Navier–Stokes equations.

So what about the Einstein equations? Can they show turbulence? I've long thought that they should be able to, although establishing this will run into the same kinds of numerical-analysis issues as with the Navier–Stokes equations, probably in an even more difficult form.

In a fluid the typical pattern is that one starts with a large-scale motion (say induced by an airplane going through the air). Then what roughly happens (at least in 3D) is that this motion breaks down into a cascade of smaller and smaller eddies, until the eddies are so small that they are damped out by viscosity in the fluid.

Would something similar happen with turbulence in the gravitational field? It can’t be quite the same, because unlike fluids, which dissipate small-scale motion by turning it into heat, the gravitational field has no such dissipation mechanism, at least according to Einstein’s equations (without adding matter, quantum effects, etc.). (Note that even with ordinary fluid mechanics, things are very different in 2D: there eddies tend not to break into smaller ones, but instead to combine into larger ones, perhaps like the Great Red Spot on Jupiter.)

My guess is that a phenomenon akin to turbulence is endemic in systems that have fields which can interact with themselves. Another potential example is the classical analog of QCD—or, more simply, classical Yang–Mills theory (the theory of a classical self-interacting color field). Yang–Mills theory shares with gravity the feature that it exhibits no dissipation, but is mathematically perhaps simpler. For years I’ve been asking people who do lattice-gauge-theory simulations whether they see any analog of turbulence. But with the randomized sampling (as opposed to evolution) approach they typically use, it’s hard to tell. (There are mathematical connections between versions of gravity and versions of Yang–Mills theory that have been extensively explored in recent years, but I don’t know what implications they have for questions of turbulence.)

Orbiting a Black Hole

In Newton’s theory of gravity, there’s an inverse square law for the force of gravity. Sufficiently far away from a massive object, the same law holds in General Relativity too. With an inverse square law for gravity, the orbit of a pointlike object around any spherical mass will always be an ellipse (just like Newton said it should be for Halley’s Comet). And every time the object goes around its orbit, it will just retrace the exact same ellipse, keeping the long axis of the ellipse in the same direction.

But what happens in General Relativity, and with black holes? The first important fact is that if something is spherically symmetric, then the gravitational field it produces outside itself must always be given exactly by the Schwarzschild solution to Einstein’s equations. That’s true for a perfectly spherical star, and it’s also true for a non-rotating black hole. And in fact that’s why it was often hard to tell if you were dealing with a genuine black hole: because the gravitational field outside it would be the same as for a star of the same mass.

So what happens according to General Relativity if you’re in orbit around something spherical? In a first approximation, the orbit is still elliptical, but the axis of the ellipse can change (“precess”)—and in fact one of the early successes of General Relativity was to explain an effect like this that had been seen for the orbit of the planet Mercury (the “advance of the perihelion”).

Here’s what actually happens as the orbital distance goes down:

Orbits around spherical objects per General Relativity

The object in the middle looks larger and larger relative to the orbit. In the final picture, there’s no orbit at all, and one just spirals into the object in the middle. In the other cases, there are roughly elliptical orbits, but the precession effect gets larger and larger, and typically one ends up eventually visiting a whole ring of possible positions. (There’s an interactive version of this on the Wolfram Demonstrations Project.)
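This isn't the code used for the pictures above, but here's a minimal sketch of how such precessing orbits can be computed in the Wolfram Language, using the standard Schwarzschild orbit equation for u = 1/r as a function of the angle phi, in units with G = c = M = 1 (the angular momentum value and the starting radius are just illustrative):

l = 4;   (* specific angular momentum, an illustrative value *)
sol = NDSolveValue[{u''[phi] + u[phi] == 1/l^2 + 3 u[phi]^2,
    u[0] == 1/25, u'[0] == 0}, u, {phi, 0, 12 Pi}];   (* the 3 u^2 term is the relativistic correction *)
PolarPlot[1/sol[phi], {phi, 0, 12 Pi}]                (* a precessing, roughly elliptical orbit *)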

But does this always happen? The answer is no: one can pick special initial conditions that instead give a variety of closed orbits with various patterns:

Closed orbits from special initial conditions

So what about a rotating object, or specifically a rotating black hole? One notable feature is a phenomenon called “frame dragging”, which causes orbits to be pulled towards rotating along with the object. A consequence of this is that unless the orbit precisely follows the direction of rotation, it won’t stay in a single plane, and—in a seemingly quite random way—will typically fill up not a ring but a whole 3D torus. (Try out the interactive demonstration to see this.)

Frame-dragging of orbits around rotating objects

Although it eventually fills in a torus, the pattern of the orbit can be fairly different depending on what initial “latitude” one starts from (all these are shown for the same total time):

Starting from a different latitude can produce different results

If you’re sufficiently far away from the black hole, then it turns out that even though you’re pulled by frame dragging, you can in principle overcome the force (say with a powerful enough rocket). But if you’re inside a region called the ergosphere (indicated by the gray region in the pictures), you’d have to be going faster than the speed of light to do that. So the result is that any object that gets into the ergosphere (which extends outside of the event horizon) will inevitably be made to co-rotate with the black hole, just through frame dragging.

And this means that if you can put something into the ergosphere, it can gain energy—ultimately by reducing the angular momentum of the black hole. One could imagine using this as a way to harvest the energy of a black hole—and indeed astronomical phenomena like high-energy gamma ray bursts are thought to be possibly related.


The Three-Body Problem

OK, so we’ve talked about orbiting a black hole, and earlier about what happens with two black holes. But what about with more black holes? Well, we can start by asking that question just for simple point masses following Newton’s law of gravity—and it turns out that even there things are already extremely complicated.

The pictures below show a bunch of possible trajectories for three equal-mass pointlike objects interacting through ordinary Newtonian gravity. The only difference between the setup for the different pictures is where the objects were started. But one can see that just changing this initial condition leads to an incredible diversity of behavior:

Slight changes in the starting location can give very different trajectories

Here are some animated versions:

An animation of trajectories from the three-body problem An animation of trajectories from the three-body problem

Solving the necessary differential equations is fast enough these days in the Wolfram Language that one can actually generate these interactively. Here’s a version in 2D where you can interactively move around the initial positions and velocities:

Demonstration: Planar Three-Body Problem

And here’s a version in 3D where you can set all the positions and velocities in 3D:

Demonstration: Three-Body Problem in 3D
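Here's a minimal sketch of the kind of computation involved (not the exact setup used for the pictures above): three equal point masses interacting through Newtonian gravity in the plane, in units with G = 1, with purely illustrative initial conditions. If the bodies happen to pass very close, one may need to shorten the time range or increase the working precision:

r[i_, j_] := Sqrt[(x[j][t] - x[i][t])^2 + (y[j][t] - y[i][t])^2];   (* distance between bodies i and j *)
eqns = Flatten[Table[
    {x[i]''[t] == Sum[If[j == i, 0, (x[j][t] - x[i][t])/r[i, j]^3], {j, 3}],
     y[i]''[t] == Sum[If[j == i, 0, (y[j][t] - y[i][t])/r[i, j]^3], {j, 3}]}, {i, 3}]];
init = {x[1][0] == -1, y[1][0] == 0, x[1]'[0] == 0, y[1]'[0] == 0.4,
   x[2][0] == 1, y[2][0] == 0, x[2]'[0] == 0, y[2]'[0] == -0.4,
   x[3][0] == 0, y[3][0] == 0.7, x[3]'[0] == 0.3, y[3]'[0] == 0};
sol = First[NDSolve[Join[eqns, init], Flatten[Table[{x[i], y[i]}, {i, 3}]], {t, 0, 40}]];
ParametricPlot[Evaluate[Table[{x[i][t], y[i][t]}, {i, 3}] /. sol], {t, 0, 40}]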

If we just had two objects (a “two-body problem”), all that would ever happen is that they’d orbit each other in a simple ellipse. But adding a third object (“three-body problem”) immediately allows dramatically more complexity. Sometimes in the end all three objects just go their separate ways. Sometimes two form a binary system and the third goes separately. And sometimes all three make anything from an orderly arrangement to a complicated tangled mess.

The three-body problem turns out to be a classic example of the chaos-theory idea of sensitive dependence on initial conditions: in many situations, even the tiniest change in, say, the initial position of an object will be progressively amplified. And the result is that if one specifies the initial conditions by numbers (say, for coordinate positions), then the evolution of the system will effectively “excavate” more and more digits in these numbers.

Here’s a particularly simple example. Imagine having a pair of objects in a simple elliptical orbit. Then a third object (assumed to have infinitesimally small mass) is started a certain distance above the plane of the ellipse. Gravity will make the third object oscillate back and forth through this plane forever. But the tricky thing is that the details of these oscillations depend arbitrarily sensitively on the details of the initial conditions.

This picture shows what happens when one starts that third object at one of four different coordinate positions that differ by one part in a billion. For a while, all of them follow what looks like exactly the same trajectory. But then they start to diverge, and eventually each of them does something completely different:

When a third body orbits from slightly different starting positions around an orbiting pair, its paths can initially look similar but eventually diverge

Plotting this in 3D (with the initial position z(0) shown going into the page) we can see just how random things can get—even though each specific trajectory is precisely determined by the sequence of digits in the real number that represents its initial condition. (It’s not trivial, by the way, to compute these pictures correctly; it requires using the arbitrary-precision number arithmetic of the Wolfram Language—and as time goes on more and more digits are needed.)

3D visualization of third-body orbit variants around an orbiting pair

Not surprisingly, there’s no simple formula that represents these results. But a few interesting things have been proved—for example that if one measures each oscillation by how many orbits are completed while it is happening, then one can get any sequence of integers one wants by choosing the initial conditions appropriately.
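Here's a rough sketch of this kind of setup (sometimes called the Sitnikov problem) in the Wolfram Language: an equal-mass binary with eccentricity 0.5 orbiting in the x-y plane (each mass 1/2, G = 1, symmetric about the origin), and a massless third body constrained by symmetry to move along the z axis. The starting heights are just illustrative, and at machine precision one shouldn't expect to reproduce the one-part-in-a-billion comparison above—but nearby starting points can still end up diverging:

binary = {px''[t] == -px[t]/(8 (px[t]^2 + py[t]^2)^(3/2)),
   py''[t] == -py[t]/(8 (px[t]^2 + py[t]^2)^(3/2)),
   px[0] == 0.75, py[0] == 0, px'[0] == 0, py'[0] == Sqrt[1/3]/2};   (* one member of the binary; the other is at {-px, -py} *)
third[z0_] := {z''[t] == -z[t]/(px[t]^2 + py[t]^2 + z[t]^2)^(3/2), z[0] == z0, z'[0] == 0};
sols = Table[NDSolveValue[Join[binary, third[z0]], {px, py, z}, {t, 0, 200}][[3]], {z0, {1.2, 1.201}}];
Plot[Evaluate[Through[sols[t]]], {t, 0, 200}]   (* watch whether the two nearby starts eventually diverge *)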

The two-body problem was solved in terms of mathematical formulas by Isaac Newton in 1687—as a highlight of his introduction of calculus. And in the 1700s and 1800s it was assumed that eventually someone would find the same kind of solution for the three-body problem. But by the end of the 1800s there were results (notably by Henri Poincaré) that suggested there couldn’t be a solution in terms of at least certain kinds of functions.

It’s still not proved that there can’t be solutions in terms of any kind of known functions (much as even though there aren’t algebraic solutions to quintic equations, there are ones in terms of elliptic or hypergeometric functions). But I strongly suspect that there can never, even in principle, be a complete solution to the three-body problem as an explicit formula.

Gravitational Computation

One can think of the time evolution of a system of masses interacting according to gravity as being a computation: you put in the initial conditions, and then you get out where the masses are after a certain time. But how sophisticated is this computation? For the two-body problem, it’s fairly simple. In fact, however long the actual two-body system runs, one can always find the outcome just by plugging numbers into a straightforward formula.

But what about the three-body problem? The pictures above suggest a very different story. And indeed my guess is that the evolution of a three-body system can correspond to an arbitrarily sophisticated computation—and that with suitable initial conditions it should in fact be able, for example, to emulate any Turing machine, and thus act as a universal computer.

I’ve suspected computational universality in the three-body problem for about 35 years now. But it’s a technically complicated thing to prove. Usually in studying computation we look at fundamentally discrete systems, like Turing machines or cellular automata. But the three-body problem is fundamentally continuous—and can for example make use of arbitrarily many digits in the real numbers it’s given as initial conditions.

Still, at least from a formal point of view, one can set up initial conditions that have, say, a finite sequence of nonzero digits. Then one can look at the output from the evolution of the system, binning the results to get a sequence of discrete data (e.g. using ideas of symbolic dynamics). And then the question is whether by changing the initial conditions we can have the output sequence correspond to the result from any program we want—say one that shows which successive numbers are prime, or computes the digits of pi.

So what would it mean if we could prove this kind of computational universality? One thing it would mean is that the three-body problem must be computationally irreducible, so there could never be a way to "shortcut"—say with a formula—the actual computation it does in getting a result. And another thing it would mean is that certain infinite-time questions—like whether a particular body can ever escape for any of a particular range of initial conditions—could in general be undecidable.

(There’s a whole discussion about whether the three-body problem, because it works with real numbers, can compute more than a standard universal computer like a Turing machine, which only works with integers. Suffice it here to say that my strong suspicion is that it can’t, at least if one insists that the initial conditions and the results can be expressed in finite symbolic terms.)

Random Orbits

How stable are the seemingly random trajectories in the three-body problem? Some are very sensitive to the details of the initial conditions, but others are quite robust. And for example, if one were designing a trajectory for a spacecraft, it seems perfectly possible that one could find a complex and seemingly random trajectory that would achieve some purpose one wants.

Are there cases where actual star or planetary systems will exhibit apparent randomness? There were undoubtedly examples even in the history of our own solar system. But because randomness tends to bring bodies into regions where they haven’t been before, there’s a higher chance of disruption by external effects—such as collisions—and so the apparent randomness probably doesn’t typically last under “natural selection for solar systems” when there are many bodies in the system.

In the ever-difficult problem of working out whether something is of “intelligent origin”, the three-body problem adds another twist—because it allows astronomical processes to show complexity just as a consequence of their intrinsic dynamics. If it is indeed possible to do arbitrary computation with a three-body system, then such a system could in principle be programmed to, say, generate the digits of pi, and perhaps make them visible in the light curve of a star. But often the system will show just as complex behavior from many different initial conditions—and one won’t be able to tell whether the behavior has any element of “purpose”.

Gravitational Engineering

Can one pick initial conditions for the three-body problem to achieve particular kinds of behavior? The answer is certainly yes. One example (already found by Lagrange in 1772) is to have the bodies on the corners of an equilateral triangle—which produces stable periodic behavior.

One can find other periodic configurations too:

Stable periodic three-body configurations

And indeed, particularly if one allows more bodies, given some specified periodic trajectory, one can probably find (by fairly traditional gradient descent methods) initial conditions that will reproduce it, at least to some accuracy. (A notable example found in 1993 is just three bodies following a figure-eight orbit.)
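As a concrete illustration, here's a sketch of that figure-eight orbit in the Wolfram Language, using approximate initial conditions as commonly quoted in the literature (the specific numbers here should be treated as approximate); G = 1 and all three masses are 1:

p0 = {0.97000436, -0.24308753}; v0 = {-0.93240737, -0.86473146};   (* commonly quoted approximate values *)
pos0 = {p0, -p0, {0, 0}}; vel0 = {-v0/2, -v0/2, v0};               (* total momentum is zero *)
r[i_, j_] := Sqrt[(x[j][t] - x[i][t])^2 + (y[j][t] - y[i][t])^2];
eqns = Flatten[Table[
    {x[i]''[t] == Sum[If[j == i, 0, (x[j][t] - x[i][t])/r[i, j]^3], {j, 3}],
     y[i]''[t] == Sum[If[j == i, 0, (y[j][t] - y[i][t])/r[i, j]^3], {j, 3}]}, {i, 3}]];
init = Flatten[Table[{x[i][0] == pos0[[i, 1]], y[i][0] == pos0[[i, 2]],
     x[i]'[0] == vel0[[i, 1]], y[i]'[0] == vel0[[i, 2]]}, {i, 3}]];
sol = First[NDSolve[Join[eqns, init], Flatten[Table[{x[i], y[i]}, {i, 3}]], {t, 0, 6.33}]];
ParametricPlot[Evaluate[Table[{x[i][t], y[i][t]}, {i, 3}] /. sol], {t, 0, 6.33}]   (* roughly one period of the figure eight *)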

But what about more-complex trajectories? Clearly, each set of initial conditions gives some kind of behavior. The question is whether it’s useful.

The situation is similar to what I’ve encountered for a long time in studying simple programs like cellular automata: out there in the computational universe of possible programs, there’s all kinds of rich and complex behavior. Now the issue is to “mine” those examples that are actually useful for something.

In practice, I’ve done lots of “algorithm discovery” in the computational universe, setting up criteria and then searching huge numbers of possible programs to find ones that are useful. And I expect exactly the same can be done for gravitational systems like the three-body problem. It’s really a question of formulating some purpose one’s trying to achieve with the system; then one can just start searching, often quite exhaustively, for a case that achieves that purpose.

So how do black holes work in things like the three-body problem? The basic story is simple: so long as the bodies stay far enough apart, it doesn’t matter whether they’re black holes or just generic masses. But if they get close, there’ll start to be relativistic effects, and that’s where black holes will be important. Presumably, however, one can just set up a constraint that there should be no close approaches, and one will still be able to do plenty of gravitational engineering—with black holes or any other massive objects.

Where Can We Get Black Holes, Anyway?

If we’re going to be able to do serious black hole engineering, we’d better have a serious source of black holes. It’s not clear that our universe is going to cooperate on this. There are probably big black holes at the centers of galaxies (and that may be the rather unsatisfying answer to “what’s the ‘equilibrium’ state” of a large number of self-gravitating objects). There’s probably a decent population of black holes from collapsed massive stars—perhaps one per thousand stars or so, which means 100 million spread across our galaxy.

There’s an important other point to mention about black holes: if current theories correctly graft certain aspects of quantum mechanics onto the classical physics of the Einstein equations, then any black hole will emit Hawking radiation, and will eventually evaporate away as a result. Star-sized black holes would have huge lifespans, but for less-massive black holes, the lifespan goes down, and for a black hole the mass of Halley’s comet, the lifespan would be about a billion years.

What about tiny black holes? Hawking radiation suggests they should evaporate almost instantly: an electron-mass one should be gone in well under 10^-100 seconds. (When I was 15 or so, I remember asking a distinguished physicist whether electrons could actually be black holes. He said it was a stupid idea, which probably it was. But in writing this blog I discovered that Einstein also considered this idea—though about 50 years before I did. And as it happens, in my network-based models, electrons do end up being made of "pure space", not so unlike black holes.)
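For a rough sense of the timescales, here's a sketch using the standard order-of-magnitude formula for the Hawking evaporation time, 5120 Pi G^2 M^3/(hbar c^4), which ignores details of the particle species emitted:

g = 6.674*10^-11; c = 2.998*10^8; hbar = 1.055*10^-34;   (* SI constants *)
evaporationTime[m_] := 5120 Pi g^2 m^3/(hbar c^4);
evaporationTime[1.989*10^30]/(3.15*10^7)   (* a solar-mass black hole: around 10^67 years *)
evaporationTime[9.11*10^-31]               (* an electron-mass one: around 10^-107 seconds *)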

Even if it’s hard to get genuine gravitational black holes, one might wonder if there could at least be analogs that are easier to get. And in recent years there’s been some success with making “sonic black holes”—that are at least a rough analog of gravitational black holes, but where it’s sound, rather than light, that’s trapped.

Finally, Black Hole Technology

OK, so we’re now finally ready to talk about creating technology with black holes. I should say at the outset that I’m not at all happy with what I’ve managed to figure out. Lots of things I thought might work turn out simply to be impossible when one looks at them in the light of actual black hole physics. And some others, while perhaps interesting, require assembling large numbers of black holes, which seems almost absurdly infeasible in our universe—given how sparse at least larger black holes seem to be, with only perhaps 10^19 spread across our whole universe.

Time Travel to the Future

But let’s say we just have one black hole. What can we do with it? One answer is to “bask in its time dilation”—or in some sense to use it to do “time travel to the future”.

Special Relativity already exhibits the phenomenon of time dilation, in which time runs more slowly for an object that’s moving quickly. General Relativity also messes around with the rate at which time runs. In particular, in a place with stronger gravity, time runs slower than in a place with weaker gravity. And so this means, for example, that as one goes further from the Earth, time runs slightly faster. (The clocks on GPS satellites are back-corrected for this—making them at least naively appear to “violate General Relativity”.)
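Here's a rough sketch of the size of the gravitational part of that GPS effect, using the weak-field approximation for the fractional rate difference, G M (1/r1 - 1/r2)/c^2, with standard values for the Earth and a GPS-like orbit (and ignoring the partly offsetting special-relativistic velocity effect):

g = 6.674*10^-11; c = 2.998*10^8;
mEarth = 5.972*10^24; rEarth = 6.371*10^6; rGPS = 2.66*10^7;   (* GPS orbit radius, about 26,600 km *)
rate = g mEarth (1/rEarth - 1/rGPS)/c^2;   (* fractional rate difference, about 5*10^-10 *)
rate*86400*10^6                            (* extra microseconds per day on the satellite clock: roughly 45 *)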

Near a black hole, strong gravity can make time run significantly more slowly. There’s a nice example in the movie Interstellar, in which there’s a planet orbiting at exactly the right distance from a black hole with exactly the right parameters—so that time runs much more slowly on the planet, but other gravitational effects there aren’t too extreme.

In a sense, as soon as one has a way to make time locally run slower, one can do “time travel to the future”. For the “traveler” a month might have elapsed—but outside it could have been a century. (It’s worth mentioning that one can achieve the same kind of effect without gravity just by doing a trip in which one accelerates to close to the speed of light.)

Of course, even though this would allow “time travel to the future”, it would give no way to get back. For that, one would need so-called closed timelike curves, which do in principle exist in solutions to the Einstein equations (notably, the one found by Kurt Gödel), but which don’t seem to appear in any physically realizable case. (In a system determined by equations, a closed timelike curve is really less about “traveling in time” than it is about defining a consistency condition between what happens in the past and the future.)

Black-Hole-Mediated Travel

In science fiction, black holes and related phenomena tend to be a staple of faster-than-light travel. At a more mundane level, the kind of “gravity assist” maneuvers that real spacecraft do by swinging, say, around Jupiter could be done on a much larger scale if one could swing around a black hole—where the maximum achievable velocity would be essentially the speed of light.

In General Relativity, the only way to effectively go faster than light is to modify the structure of spacetime. For example, one can imagine a “wormhole” or tube that directly connects different places in space. In General Relativity there’s no way to form such a wormhole if it doesn’t already exist—but there’s nothing to say such wormholes couldn’t already have existed at the beginning of the universe. There is a problem, though, in maintaining an “open wormhole”: the curvature of spacetime at the end would tend to create gravity that would make it collapse.

I don’t know if it can be proved that there’s no configuration of, say, orbiting black holes that would keep the wormhole open. One known way to keep it open is to introduce matter with special properties like negative energy density—which sounds implausible until you consider vacuum fluctuations in quantum field theory, inflationary-universe scenarios or dark-energy ideas.

Introducing exotic matter makes all sorts of new solutions possible for the Einstein equations. A notable example is the Alcubierre solution, which in some sense provides a different way to traverse space at any speed, effectively by warping the space.

Could there be a solution to the Einstein equations that allows something similar, without exotic matter? It hasn’t been proved that it’s impossible. And I suppose one could imagine some configuration of judiciously placed black holes that would make it possible.

It’s perhaps worth mentioning that in the models I’ve studied where the underlying structure of spacetime is a network with no predefined number of space dimensions, wormhole-like phenomena seem more natural—though insofar as the models reproduce General Relativity on large scales, this means such phenomena can’t originate on those scales.

Energy Sources

It’s easy to generate high energies with a black hole. Matter that spirals in towards the black hole will gain energy—and indeed, around stellar and larger black holes there’s potentially an accretion disk that contains high-energy matter.

With rotating black holes, there are some additional energy phenomena. In the ergosphere, objects can gain energy at the expense of the black hole itself. This is relevant both in accelerating ordinary matter, and in producing “superradiance” where energy is added to waves, say of light, that pass through the ergosphere.

Can one do better with multiple black holes than a single one? I don’t know. Maybe there’s a configuration of orbiting black holes that’s somehow optimized for imparting energy to matter—like a kind of particle accelerator made from black holes.

Gravitational Crystals

We saw earlier some of the complex trajectories that three bodies interacting through gravity can follow. But what kind of trajectories can we potentially “engineer”, particularly with more bodies?

It’s not too difficult to start with approximate trajectories and then do gradient descent (e.g. in Fourier space) to try to find trajectories that actually correspond, for example, to closed orbits. So can one for example find a “gravitational crystal” that consists of an infinite regular array of interacting gravitational bodies?

There are some mathematical tricks to apply—and one ends up having to use randomized search more than systematic gradient descent—but there do seem to be gravitational crystals to be found. Here are two potential examples that show a kind of checkerboard symmetry:

An example of a gravitational crystal lattice     Another example of a gravitational crystal lattice

I suppose a “gravitational wall” like this might be good for stopping things that approach it. With the right parameters, it might be able to capture anything (perhaps up to some speed) that tries to cross it.

Given a “gravitational crystal”, one can ask about implementing things like cellular automata on it. I don’t know how to store “bits” for cellular automaton cells in lattices like these without disrupting the lattice too much, but I suspect there’s a way. (Yes, classical gravity is reversible, so one would have to have reversible cellular automata, but there are plenty of those.)

What’s shown here is something that’s intended to be a regular, periodic “crystal”. One can also potentially imagine creating a “random crystal” in which there’s overall regularity, but at a small scale there’s seemingly random motion. If one could make such a random crystal work, then it might provide a more robust “wall”, less affected by outside perturbations.

Gravitational Shielding

Modularization is an important general technique in engineering because it lets one break a problem into parts and then solve each one separately. But for gravitational systems, it's hard to do modularization—because gravity is a long-range force, dropping off only gradually with distance.

And even with spinning black holes and the like, I don’t know of any way to achieve the analog of gravitational shielding—though this changes if one introduces exotic matter that effectively has negative mass, or if, for example, every black hole has electric charge.

And without modularization, it’s surely more difficult to create something technologically useful—because in effect one has to figure out everything at once. But it’s certainly conceivable that by searching a space of possibilities one could find something—though without modularization it might look very complicated (as long-range simple programs, like combinators, tend to do), and it could be difficult even to tell what the system achieves without looking for specific properties one already knows.

What I’m Missing…

Having said all this, I suspect that there are big things I am missing—and that with the right ways of thinking, there’ll end up being some spectacular kinds of technology that black holes make possible. And for all we know, once we figure this out we’ll realize that an example of it has already existed in our universe for a billion years, whether of “natural” origin or not.

But for now, the discovery of gravitational radiation from merging black holes is a remarkable example of how something like the small equation Einstein wrote down for the gravitational field a hundred years ago can lead to such elaborate consequences. It’s an impressive endorsement of the strength of theoretical science—and perhaps an inspiration to see just how small the rules might be to generate everything we see in our universe.

My Life in Technology—As Told at the Computer History Museum


Edited transcript of a talk given on March 4, 2016, at the Computer History Museum, Mountain View, California.

Stephen Wolfram at the Computer History Museum—click to play video

I normally spend my time trying to build the future. But I find history really interesting and informative, and I study it quite a lot. Usually it’s other people’s history. But the Computer History Museum asked me to talk today about my own history, and the history of technology I’ve built. So that’s what I’m going to do here.

This happens to be a really exciting time for me—because a bunch of things that I’ve been working on for more than 30 years are finally coming to fruition. And mostly that’s what I’ve been out in the Bay Area this week talking about.

The focus is the Wolfram Language, which is really a new kind of language—a knowledge-based language—in which as much knowledge as possible about computation and about the world is built in. And in which the language automates as much as possible so one can go as directly as possible from computational thinking to actual implementation.

And what I want to do here is to talk about how all this came to be, and how things like Mathematica and Wolfram|Alpha emerged along the way.

Inevitably a lot of what I’m going to talk about is really my story: basically the story of how I’ve spent most of my life so far building a big stack of technology and science. When I look back, some of what’s happened seems sort of inevitable and inexorable. And some I didn’t see coming.

But let me begin at the beginning. I was born in London, England, in 1959—so, yes, I’m outrageously old, at least by my current standards. My father ran a small company—doing international trading of textiles—for nearly 60 years, and also wrote a few “serious fiction” novels. My mother was a philosophy professor at Oxford. I actually happened to notice her textbook on philosophical logic in the Stanford bookstore last time I was there.

You know, I remember when I was maybe 5 or 6 being bored at some party with a bunch of adults, and somehow ending up talking at great length to some probably very distinguished Oxford philosopher—who I heard say at the end, “One day that child will be a philosopher—but it may take a while.” Well, they were right. It’s sort of funny how these things work out.

Here’s me back then:

Stephen Wolfram, child philosopher?

I went to elementary school in Oxford—to a place called the Dragon School, that I guess happens to be probably the most famous elementary school in England. Wikipedia seems to think the most famous people now from my class there are myself and the actor Hugh Laurie.

Here’s one of my school reports, from when I was 7. Those are class ranks. So, yes, I did well in poetry and geography, but not in math. (And, yes, it’s England, so they taught “Bible study” in school, at least then.) But at least it said “He is full of spirit and determination; he should go far”…

My Dragon School half-term report card, age 7

But OK, that was 1967, and I was learning Latin and things—but what I really liked was the future. And the big future-oriented thing happening back then was the space program. And I was really interested in that, and started collecting all the information I could about every spacecraft launched—and putting together little books summarizing it. And I discovered that even from England one could write to NASA and get all this great stuff mailed to one for free.

An early Stephen Wolfram drawing of the Ranger spacecraft An early Stephen Wolfram drawing of the Luna 9 spacecraft An early Stephen Wolfram drawing of a Lunar Orbiter spacecraft An early Stephen Wolfram drawing of the Surveyor 7 spacecraft

Well, back then, there was supposed to be a Mars colony any day, and I started doing little designs for that, and for spacecraft and things.

An early Stephen Wolfram drawing of a Mars colony An early Stephen Wolfram page of information on the Orion spacecraft An early Stephen Wolfram diagram of the use of boosters for spacecraft An early Stephen Wolfram drawing of a spacecraft on Mars

And that got me interested in propulsion and ion drives and stuff like that—and by the time I was 11 what I was really interested in was physics.

An early Stephen Wolfram drawing of temperature control for the Lunar Orbiter An early Stephen Wolfram drawing: Bouncing radio waves off of Venus An early Stephen Wolfram drawing: Laser construction

And I discovered—having nothing to do with school—that if one just reads books one can learn stuff pretty quickly. I would pick areas of physics and try to organize knowledge about them. And when I was turning 12 I ended up spending the summer putting together all the facts I could accumulate about physics. And, yes, I suppose you could call some of these “visualizations”. And, yes, like so much else, it’s on the web now:

A page from the Directory of Physics I wrote when I was 12, showing the abundance of elements in the solar system   A page from the Directory of Physics I wrote when I was 12, showing crystal structures for sodium chloride and graphite
Another page from my age-12 Directory of Physics, showing neutron numbers for high-half-life nuclei   A Hertzsprung-Russell diagram from my age-12 Directory of Physics

I found this again a few years ago—around the time Wolfram|Alpha came out—and I thought, “Oh my gosh, I’ve been doing the same thing all my life!” And then of course I started typing in numbers from when I was 11 or 12 to see if Wolfram|Alpha got them right. It did, of course:

Crustal abundance of sodium, Wolfram|Alpha

Well, when I was 12, following British tradition I went to a so-called public school that’s actually a private school. I went to the most famous such school—Eton—which was founded about 50 years before Columbus came to America. And, oh so impressively :), I even got the top scholarship among new kids in 1972.

Yes, everyone wore tailcoats all the time, and King’s Scholars, like me, wore gowns too—which provided excellent rain protection etc. I think I avoided these annual Harry-Potter-like pictures all but one time:

An Eton class photo

And back in those Latin-and-Greek-and-tailcoat days I had a sort of double life, because my real passion was doing physics.

The summer when I turned 13 I put together a summary of particle physics:

Table of contents from my age-13 Physics of Subatomic Particles book   The start of the first chapter in my age-13 Physics of Subatomic Particles book

And I made the important meta-discovery that even if one was a kid, one could discover stuff. And I started just trying to answer questions about physics, either by finding answers in books, or by figuring them out myself. And by the time I was 15 I started publishing papers about physics. Yes, nobody asks how old you are when you mail a paper in to a physics journal.

My first paper: “Hadronic Electrons?”

But, OK, something important for me had happened back when I was 12 and first at Eton: I got to know my first computer. It was an Elliott 903C. This is not the actual one I used, but it’s similar:

Elliott 903C

It had come to Eton through a teacher of mine named Norman Routledge, who had been a friend of Alan Turing’s. It had 8 kilowords of 18-bit ferrite core memory, and you usually programmed it with paper—or Mylar—tape, most often in a little 16-instruction assembler called SIR.

SIR assembler instructions and code times

It often seemed like one of the most important skills was rewinding the tape as quickly as possible once it had been dumped into a bin after going through the optical reader.

Anyway, I wanted to use the computer to do physics. When I was 12 I had gotten this book:

Statistical Physics, the book whose cover inspired me to work on cellular automata

What’s on the cover is supposed to be a simulation of gas molecules showing increasing randomness and entropy. As it happens, years later I discovered this picture was actually kind of a fake. But back when I was 12, I really wanted to reproduce it—with the computer.

It wasn’t so easy. The molecule positions were supposed to be real numbers; one had to have an algorithm for collisions; and so on. And to make this fit on the Elliott 903 I ended up simplifying a lot—to what was actually a 2D cellular automaton.

Well, a decade after that, I made some big discoveries about cellular automata. But back then I was unlucky with my cellular automaton rule, and I ended up not discovering anything with it. And in the end my biggest achievement with the Elliott 903 was writing a punched tape loader for it.

Computer paper tape

You see, the big problem with the Mylar tape that one used for serious programs was that it would get electrostatically charged and pick up little confetti holes, so the bits would be read wrong. Well, for my loader, I came up with what I later found out were error-correcting codes—and I set it up so that if the checks failed, the tape would stop in the reader, and you could pull it back a couple of feet, and then re-read it, after shaking out the confetti.
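(Just to illustrate the general idea: this is emphatically not the original loader, which was of course 903 assembler, but a minimal parity-check sketch in today’s Wolfram Language, with made-up helper names.)

addParity[bits_List] := Append[bits, Mod[Total[bits], 2]]    (* append a parity bit to a frame *)
frameOK[frame_List] := EvenQ[Total[frame]]                    (* a clean frame has an even total *)

frame = addParity[{1, 0, 1, 1, 0}];
frameOK[frame]                           (* True: the frame reads back cleanly *)
frameOK[ReplacePart[frame, 2 -> 1]]      (* False: a flipped bit, a "confetti hole", is caught *)

(A scheme like this only detects an error; the “correction” comes from stopping the tape and re-reading, just as described above.)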

OK, so by the time I was 16 I had published some physics papers and was starting to be known in physics circles—and I left school, and went to work at a British government lab called the Rutherford Lab that did particle physics research.

A paper I wrote while at Rutherford Lab: Neutral Weak Interactions and Particle Decays

Now you might remember from my age-7 school report that I didn’t do very well in math. Things got a bit better when I started to use a slide rule, and then in 1972 a calculator—of which I was a very early adopter. But I never liked doing school math, or math calculations in general. Well, in particle physics there’s a lot of math to be done—and so my dislike of it was a problem.

At the Rutherford Lab, two things helped. First, a lovely HP desktop computer with a plotter, on which I could do very nice interactive computation. And second, a mainframe for crunchier things, that I programmed in Fortran.

Well, after my time at the Rutherford Lab I went to college at Oxford. Within a very short time I’d decided this was a mistake—but in those days one didn’t actually have to go to lectures for classes—so I was able to just hide out and do physics research. And mostly I spent my time in a nice underground air-conditioned room in the Nuclear Physics building—that had terminals connected to a mainframe, and to the ARPANET.

And that was when—in 1976—I first started using computers to do symbolic math, and algebra and things. Feynman diagrams in particle physics involve lots and lots of algebra. And back in 1962, I think, three physicists had met at CERN and decided to try to use computers to do this. They had three different approaches. One wrote a system called ASHMEDAI in Fortran. One—influenced by John McCarthy at Stanford—wrote a system called Reduce in Lisp. And one wrote a system called SCHOONSCHIP in CDC 6000 series assembly language, with mnemonics in Dutch. Curiously, years later, one of these physicists won a Nobel Prize. It was Tini Veltman—the one who wrote SCHOONSCHIP in assembly language.

Reduce 2 User's Manual   ASHMEDAI Users Guide   SCHOONSCHIP guide, by Tini Veltman

Anyway, back in 1976 very few people other than the creators of these systems used them. But I started using all of them. My favorite, though, was a quite different system, written in Lisp at MIT since the mid-1960s: a system called Macsyma. It ran on the Project MAC PDP-10 computer. And what was really important to me as a 17-year-old kid in England was that I could get to it on the ARPANET.

It was host 236. So I would type @O 236, and there I was in an interactive operating system. Someone had taken the login SW. So I became Swolf, and started to use Macsyma.

Macsyma reference manual

I spent the summer of 1977 at Argonne National Lab—where they actually trusted physicists to be right in the room with the mainframe.

Then in 1978 I went to Caltech as a graduate student. By that point, I think I was the world’s largest user of computer algebra. And it was so neat, because I could just compute all this stuff so easily. I used to have fun putting incredibly ornate formulas in my physics papers. Then I could see if anyone was reading the papers, because I’d get letters saying, “How did you derive line such-and-such from the one before?”

Some physics papers I wrote while at Caltech

I got a reputation for being a great calculator. Which was of course 100% undeserved—because it wasn’t me, it was just the computer. Well, actually, to be fair, there was part that was me. You see, by being able to compute so many different examples, I had gotten a new kind of intuition. I was no good at computing integrals myself, but I could go back and forth with the computer, knowing from intuition what to try, and then doing experiments to see what worked.

I was writing lots of code for Macsyma, and building this whole tower. And sometime in 1979 I hit the edge. Something new was needed. (Notice, for example, the ominous “MACSYMA RELOAD” line in the diagram.)

ILINT routine structure I wrote for Macsyma  

Well, in November 1979, just after I turned 20, I put together some papers, called it a thesis, and got my PhD. And a couple of days later I was visiting CERN in Geneva—and thinking about my future in, I thought, physics. And the one thing I was sure about was that I needed something beyond Macsyma that would let me compute things. And that was when I decided I had to build a system for myself. And right then and there, I started designing the system, handwriting its specification.

An early handwritten SMP Language Summary

At first it was going to be ALGY–The Algebraic Manipulator. But I quickly realized that I actually had to make it do much more than algebraic manipulation. I knew most of the general-purpose computer languages of the time—both the ALGOL-like ones, and ones like Lisp and APL. But somehow they didn’t seem to capture what I wanted the system to do.

So I guess I did what I’d learned in physics: I tried to drill down to find the atoms—the primitives—of what’s going on. I knew a certain amount about mathematical logic, and the history of attempts to formulate things using logic and so on—even if my mother’s textbook about philosophical logic didn’t exist yet.

The whole history of this effort at formalization—through Aristotle, Leibniz, Frege, Peano, Hilbert, Whitehead, Russell, and so on—is really interesting. But it’s a different talk. In any case, back in 1979 it was thinking about this kind of thing that led me to the design I came up with, which was based on the idea of symbolic expressions, and doing transformations on them.

I named what I wanted to build SMP: a Symbolic Manipulation Program, and started recruiting people from around Caltech to help me with it. Richard Feynman came to a bunch of the meetings I had to discuss the design of SMP, offering various ideas—which I have to admit I considered hacky—about shortcuts for interacting with the system. Meanwhile, the physics department had just gotten a VAX 11/780, and after some wrangling, it was made to run Unix. And a young physics grad student named Rob Pike—more recently one of the creators of the Go programming language—persuaded me that I should write the code for my system in the “language of the future”: C.

I got pretty good at writing C, for a while averaging about a thousand lines a day. And with the help of a somewhat colorful collection of characters, by June 1981, the first version of SMP existed—with a big book of documentation I’d written.

SMP documentation

OK, you might ask: so can we see SMP? Well, back when we were working on SMP I had the bright idea that we should protect the source code by encrypting it. And—you guessed it—in the more than three decades since, nobody has remembered the password. And until a little while ago, that was the situation.

In another bright idea, I had used a modified version of the Unix crypt program to do the encryption—thinking that would be more secure. Well, as part of the 25th anniversary of Mathematica a couple of years ago, we did a crowdsourced project to break the encryption—and we did it. Unfortunately, the code still wasn’t easy to compile—but thanks to a 15-year-old volunteer, we’ve actually now got something running.

So here it is: running inside a VAX virtual machine emulator, I can show you for the first time in public in 30 years—a running version of SMP.

SMP running on a VAX emulator—first time in 30 years!

SMP had a mixture of good ideas and very bad ideas. One example of a bad idea—actually suggested to me by Tini Veltman, author of SCHOONSCHIP—was representing rationals using floating point, so one could make use of the faster floating-point instructions on many processors. But there were plenty of other bad ideas too, like having a garbage collector that had to crawl the stack and realign pointers when it ran.

There were some interesting ideas. Like what I called “projections”—which were essentially a unification of functions and lists. They were almost wonderful, but there were confusions about currying—or what I called tiering. And there were weird edge cases about things that were almost vectors with sequential integer indices.

But all in all, SMP worked pretty well, and I certainly found it very useful. So now the next problem was what to do with it. I realized it needed a real team to work on it, and I thought the best way to get that was somehow to make it commercial. But at the time I was a 21-year-old physics-professor type, who didn’t know anything about business.

So I thought, let me go to the tech transfer office at the university, and ask them what to do. But it turned out they didn’t know, because, as they explained, “Mostly professors don’t come to us; they just start their own companies.” “Well,” I said, “can I do that?” And right then and there the lawyer who pretty much was the tech transfer office pulled out the faculty handbook, and looked through it, and said, “Well, yes, it says copyrightable materials are owned by their authors, and software is copyrightable, so, yes, you can do whatever you want.”

And so off I went to try to start a company. Though it turned out not to be so simple—because suddenly the university decided that actually I couldn’t just do what I wanted.

A couple of years ago I was visiting Caltech and I ran into the 95-year-old chap who had been the provost at the time—and he finally filled in for me the remaining details of what he called the “Wolfram Affair”. It was more bizarre than one could possibly imagine. I won’t tell it all here. But suffice it to say that the story starts with Arnold Beckman, Caltech postdoc in 1929, claiming rights to the pH meter, and starting Beckman Instruments—and then in 1980 being chairman of the Caltech board of trustees and being upset when he realized that gene-sequencing technology had been invented at Caltech and had “walked off campus” to turn into Applied Biosystems.

But the company I started weathered this storm—even if I ended up quitting Caltech, and Caltech ended up with a weird software-ownership policy that affected their computer-science recruiting efforts for a long time.

I didn’t do a great job starting what I called Computer Mathematics Corporation. I brought in a person—who happened to be twice my age—to be CEO. And rather quickly things started to diverge from what I thought made sense.

One of my favorite moments of insanity was the idea to get into the hardware business and build a workstation to run SMP on. Well, at the time no workstation had enough memory, and the 68000 didn’t handle virtual memory. So a scheme was concocted whereby two 68000s would run an instruction out of step, and if the first one saw a page fault, it would stop the other one and fetch the data. I thought it was nuts. And I also happened to have visited Stanford, and run into a grad student named Andy Bechtolsheim who was showing off a Stanford University Network—SUN—workstation with a cardboard box as a case.

But worse than all that, this was 1981, and there was the idea that AI—in the form of expert systems—was hot. So the company merged with another company that did expert systems, to form what was called Inference Corporation (which eventually became Nasdaq:INFR). SMP was the cash cow—selling for about $40,000 a copy to industrial and government research labs. But the venture capitalists who’d come in were convinced that the future was expert systems, and after not very long, I left.

Meanwhile I’d become a big expert on the intellectual property policies of universities—and eventually went to work at the Institute for Advanced Study in Princeton, where the director very charmingly said that since they’d “given away the computer” after von Neumann died, it didn’t make much sense for them to claim IP rights to anything now.

I dived into basic science, working a lot on cellular automata, and discovering some things I thought were very interesting. Here’s me with my SUN workstation with cellular automata running on it (and, yes, the mollusc looks like the cellular automaton):

Me working with cellular automata on a SUN workstation in my office (located over Einstein's old one)   A mollusc shell with a pattern like the cellular automaton on the screen

I did some consulting work, mostly on technology strategy, which was very educational, particularly in seeing things not to do. I did quite a lot of work for Thinking Machines Corporation. I think my most important contribution was going to see the movie WarGames with Danny Hillis—and as we were walking out of the movie theater, saying to Danny, “Maybe your computer should have flashing lights too.” (The flashing lights ended up being a big feature of the Connection Machine computer—certainly important in its afterlife in museums.)

I was mostly working on basic science—but “because it would be easy” I decided to do a software project of building a C interpreter that we called IXIS. I hired some young people—one of whom was Tsutomu Shimomura, whom I’d already fished out of several hacking disasters. I made the horrible mistake of personally writing the boring code nobody else wanted to write—so I wrote a (quite lovely) text editor, but the whole project flopped.

I had all kinds of interactions with the computer industry back then. I remember Nathan Myhrvold, then a physics grad student at Princeton, coming to see me to ask what to do with a window system he’d developed. My basic suggestion was “sell it to Microsoft”. As it happens, Nathan later became CTO of Microsoft.

Well, by about 1985 I’d done a bunch of basic science I was pretty pleased with, and I was trying to use it to start the field of what I called complex systems research. I ended up getting a little involved in an outfit called the Rio Grande Institute—that later became the Santa Fe Institute—and encouraging them to pursue this kind of research. But I wasn’t convinced about their chances, and I resolved to start my own research institute.

So I went around to lots of different universities, in effect to get bids. The University of Illinois won, ironically in part because they thought it would help their chances of getting funding from the Beckman Foundation—which in fact it did. So in August 1986, off I went to the University of Illinois, and the cornfields of Champaign-Urbana, 100 miles south of Chicago.

I think I did pretty well at recruiting faculty and setting things up for the new Center for Complex Systems Research—and the university lived up to its end of the bargain too. But within a few weeks I started to think it was all a big mistake. I was spending all my time managing things and trying to raise money—and not actually doing science.

So I quickly came up with Plan B. Rather than getting other people to help with the science I wanted to do, I would set things up so I could just do the science myself, as efficiently as possible. And this meant two things: first, I had to have the best possible tools; and second, I needed the best possible environment for myself.

When I was doing my basic science I kept on using different tools. There was some SMP. Quite a lot of C. Some PostScript, and graphics libraries, and things. And a lot of my time was spent gluing all this stuff together. And what I decided was that I should try to build a single system that would just do all the stuff I wanted to do—and that I could expect to keep growing forever.

Well, meanwhile, personal computers were just getting to the point where it was plausible to build a system like this that would run on them. And I knew a lot about what to do—and not do—from my experience with SMP. So I started designing and building what became Mathematica.

Some early Mathematica design notes

My scheme was to write documentation to define what to build. I wrote a bunch of core code—for example for the pattern matcher—a surprising amount of which is still in the system all these years later. The design of Mathematica was in many respects less radical and less extreme than SMP. SMP had insisted on using the idea of transforming symbolic expressions for everything—but in Mathematica I saw my goal as being to design a language that would effectively capture all the possible different paradigms for thinking about programming in a nice seamless way.
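(To give a sense of what transforming symbolic expressions looks like in practice, here is a one-line, present-day illustration; it is not code from 1988, just the same basic mechanism as it exists now:)

In[1]:= {f[1], f[2], g[3], f[x + y]} /. f[a_] :> a^2
Out[1]= {1, 4, g[3], (x + y)^2}

(The pattern f[a_] matches anything with head f, and the rule rewrites it in place.)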

At first, of course, Mathematica wasn’t called Mathematica. In a strange piece of later fate, it was actually called Omega. It went through other names. There was Polymath. And Technique. Here’s a list of names. It’s kind of shocking to me how many of these—even the really horrible ones—have actually been used for products in the years since.

Early possible Mathematica names

Well, meanwhile, I was starting to investigate how to build a company around the system. My original model was something like what Adobe was doing at the time with PostScript: we build core IP, then license it to hardware companies to bundle. And as it happened, the first person to show interest in that was Steve Jobs, who was then in the middle of doing NeXT.

Well, one of the consequences of interacting with Steve was that we talked about the name of the product. With all that Latin I’d learned in school, I’d thought about the name “Mathematica” but I thought it was too long and ponderous. Steve insisted that “that’s the name”—and had a whole theory about taking generic words and romanticizing them. And eventually he convinced me.

It took about 18 months to build Version 1 of Mathematica. I was still officially a professor of physics, math and computer science at the University of Illinois. But apart from that I was spending every waking hour building software and later making deals.

We closed a deal with Steve Jobs at NeXT to bundle Mathematica on the NeXT computer:

The software license agreement to bundle Mathematica on every NeXT computer

We also made a bunch of other deals. With Sun, through Andy Bechtolsheim and Bill Joy. With Silicon Graphics, through Forest Baskett. With Ardent, through Gordon Bell and Cleve Moler. With the AIX/RT part of IBM, basically through Andy Heller and Vicky Markstein.

And eventually we set a release date: June 23, 1988.

Meanwhile, as documentation for the system, I wrote a book called Mathematica: A System for Doing Mathematics by Computer. It was going to be published by Addison-Wesley, and it was the longest lead-time element of the release. And it ended up being very tight, because the book was full of fancy PostScript graphics—which nobody could apparently figure out how to render at high-enough resolution. So eventually I just took a hard disk to a friend of mine in Canada who had a phototypesetting company, and he and I babysat his phototypesetting machine over a holiday weekend, after which I flew to Logan Airport in Boston and handed the finished film for the book to a production person from Addison-Wesley.

The cover of the first edition of The Mathematica Book, 1988

We decided to do the announcement of Mathematica in Silicon Valley, and specifically at the TechMart place in Santa Clara. In those days, Mathematica couldn’t run under MS-DOS because of the 640K memory limit. So the only consumer version was for the Mac. And the day before the announcement there we were stuffing disks into boxes, and delivering them to the ComputerWare software store in Palo Alto.

Original Mathematica box and contents

The announcement was a nice affair. Steve Jobs came—even though he was not really “out in public” at the time. Larry Tesler came from Apple—courageously doing a demo himself. John Gage from Sun had the sense to get all the speakers to sign a book:

Signatures from the speakers at Mathematica's release announcement: Larry Tesler from Apple; Gordon Bell from Ardent; Eric Lyons from Autodesk; Dana Scott from Carnegie Mellon University; Jerry Glynn from The Math Program; Steve Christensen from NCSA; Steve Jobs, then from NeXT; Forest Baskett from Silicon Graphics; and Andy Bechtolsheim, John Gage, Bill Joy, and Curt Wozniak from Sun

And so that was how Mathematica was launched. The Mathematica Book became a bestseller in bookstores, and from that people started understanding how to use Mathematica. It was really neat seeing all these science types and so on—of all ages—who’d basically never used computers before, starting to just compute things themselves.

It was fun looking through registration cards. Lots of interesting and famous names. Sometimes some nice juxtapositions. Like when I’d just seen an article about Roger Penrose and his new book in Time magazine with the headline “Those Computers Are Dummies”… but then there was Roger’s registration card for Mathematica.

Some early Mathematica registration cards

As part of the growth of Mathematica, we ended up interacting with pretty much all possible computer companies, and collected all kinds of exotic machines. Sometimes that came in handy, like when the Morris worm came through the internet, and our gateway machine was a weird Sony workstation with a Japanese OS that the worm hadn’t been built for.

There were all kinds of porting adventures. Probably my favorite was on the Cray-2. With great effort we’d gotten Mathematica compiled. And there we were, ready for the first calculation. And someone typed 2+2. And—I kid you not—it came out “5”. I think it was an issue with integer vs. floating point representation.

You know, here’s a price list from 1990 that’s a bit of a stroll down computer memory lane:

Mathematica 2 price list

We got a boost when the NeXT computer came out, with Mathematica bundled on it. I think Steve Jobs made a good deal there, because all kinds of people got NeXT machines to run Mathematica. Like the Theory group at CERN—where the systems administrator was Tim Berners-Lee, who decided to do a little networking experiment on those machines.

Mathematica on the NeXT

Well, a couple of years in, the company was growing nicely—we had maybe 150 employees. And I thought to myself: I built this because I wanted to have a way to do my science, so isn’t it time I started doing that? Also, to be fair, I was injecting new ideas at too high a rate; I was worried the company might just fly apart. But anyway, I decided I would take a partial sabbatical—for maybe six months or a year—to do basic science and write a book about it.

Wolfram company photo from 1993

So I moved from Illinois to the Oakland Hills—right before the big fire there, which narrowly missed our house. And I started being a remote CEO—using Mathematica to do science. Well, the good news was that I started discovering lots and lots of science. It was kind of a “turn a telescope to the sky for the first time” moment—except now it was the computational universe of possible programs.

It was really great. But I just couldn’t stop—because there kept on being more and more things to discover. And all in all I kept on doing it for ten and a half years. I was really a hermit, mostly living in Chicago, and mostly interacting only virtually… although my oldest three children were born during that period, so there were humans around!

I had thought maybe there’d be a coup at the company. But there wasn’t. And the company continued to steadily grow. We kept on doing new things.

Here’s our first website, from October 7, 1994:

Wolfram Research's very first website

And it wasn’t too long after that we started doing computation on the web:

The Wolfram Online Integrator, in its first design

I actually took a break from my science in 1996 to finish a big new version of Mathematica. Back in 1988 lots of people used Mathematica through a command line interface. In fact, it’s still there today. 1989^1989 is the basic computation I’ve been using since, yes, 1989, to test speed on a new machine. And actually a basic Raspberry Pi today gives a pretty good sense of what it was like back at the beginning.
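(Just as an illustration, the obvious way to run that little test in a current session is something like this; on a modern machine the raw computation itself is essentially instantaneous:)

In[1]:= AbsoluteTiming[1989^1989;]    (* gives {seconds, Null}; the time depends on the machine *)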

But, OK, on the Mac and on NeXT back in 1988 we’d invented these things we called notebooks that were documents that mixed text and graphics and structure and computation—and that was the UI. It was all very modern, with a clean front-end/kernel architecture where it was easy to run the kernel on a remote machine—and by 1996 a complete symbolic XML-like representation of the structure of the notebooks.
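(A notebook really is just a symbolic expression. Here is a tiny illustrative one, written in the modern form of the structure rather than necessarily the exact 1996 format:)

Notebook[{
  Cell["A tiny notebook", "Section"],
  Cell[BoxData["2 + 2"], "Input"]
}]

(Handing an expression like this to NotebookPut opens it as an actual notebook window.)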

Maybe I should say something about the software engineering of Mathematica. The core code was written in an extension of C—actually an object-oriented version of C that we had to develop ourselves, because C++ wasn’t efficient enough back in 1988. Even from the beginning, some code was written in the Mathematica top-level language—that’s now the Wolfram Language—and over the years a larger and larger fraction of the code was that way.

Well, back at the beginning it was very challenging getting the front end to run on different machines. And we wound up with different codebases on Mac, NeXT, Microsoft Windows, and X Windows. And in 1996 one of the achievements was merging all that together. And for almost 20 years the code was gloriously merged—but now we’ve again got separate codebases for desktop, browser and mobile, and history is repeating itself.

Back in 1996 we had all kinds of ways to get the word out about the new Mathematica Version 3. My original Mathematica book had now become quite large, to accommodate all the things we were adding.

The third edition of The Mathematica Book

And we had a couple of other “promotional vehicles”, which we called the MathMobiles, that drove around with the latest gear inside—and served as moving billboard ads for our graphics.

There were Mathematicas everywhere, getting used for all kinds of things. And of course wild things sometimes happened. Like in 1997 when Mike Foale had a PC running Mathematica on the Mir space station. Well, there was an accident, and the PC got stuck in a part of the space station that got depressurized. Meanwhile, the space station was tumbling, and Mike was trying to work out how to stabilize it—and wanted to use Mathematica to do the calculations. So he got a new copy on the next supply mission—and installed it on a Russian PC.

But there was a problem. Because our DRM system immediately said, “That’s a Russian PC; you can’t run a US-licensed Mathematica there!” And that led to what might be our all-time most exotic customer service call: “The user is in a tumbling space station.” But fortunately we could just issue a different password—Mike solved the equations, and the space station was stabilized.

Well, after more than a decade—in 2002—I finally finished my science project and my big book:

A New Kind of Science

During my “science decade” the company had been steadily growing, and we’d built up a terrific team. But not least because of things I’d learned from my science, I thought it could do more. It was refreshing coming back to focus on it again. And I rather quickly realized that the structure we’d built could be applied to lots of new things.

Math had been the first big application of Mathematica, but the symbolic language I’d built was much more general than that. And it was pretty exciting seeing what we could do with it. One of the things in 2006 was representing user interfaces symbolically, and being able to create them computationally. And that led for example to CDF (our Computable Document Format), and things like our Wolfram Demonstrations Project.

The Wolfram Demonstrations Project
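(The flavor of the symbolic-UI idea fits in one line of today’s Wolfram Language; this particular snippet is just an illustration, not one of the actual Demonstrations:)

Manipulate[Plot[Sin[n x], {x, 0, 2 Pi}], {n, 1, 10}]

(The whole interface, slider and plot and all, is itself a symbolic expression that can be generated and deployed computationally.)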

We started doing all sorts of experiments. Many went really well. Some went a bit off track. We wanted to make a poster with all the facts we knew about mathematical functions. First it was going to be a small poster, but then it became 36 feet of poster… and eventually The Wolfram Functions Site, with 300,000+ formulas:

Our mathematical-functions posters, which expanded into the Wolfram Functions Site

It was the time of the cell-phone ringtone craze, and I wanted a personal ringtone. So we came up with a way to use cellular automata to compose an infinite variety of ringtones, and we put it on the web. It was actually an interesting AI-creativity experience, and music people liked it. But after messing around with phone carriers for six months, we pretty much didn’t sell a single ringtone.

WolframTones

(Yes, if you go to that site, it’s currently a bit embarrassing, because it’s not working with the current browser releases. It’ll be fixed soon… but what happened is that the webMathematica server behind it was just running unattended for a decade—and now nobody knows how it works…)

But, anyway, having for many years been a one-product company making Mathematica, we were starting to get the idea that we could not only add new things to Mathematica—but also invent all kinds of other stuff.

Well, I mentioned that back when I was a kid I was really interested in trying to do what I’d now call “making knowledge computable”: take the knowledge of our civilization and build something that could automatically compute answers to questions from it. For a long time I’d assumed that to do that would require making some kind of brain-like AI. So, like, in 1980 I worked on neural networks—and didn’t get them to do anything interesting. And every few years after that I would think some more about the computable knowledge problem.

But then I did the science in A New Kind of Science—and I discovered this thing I call the Principle of Computational Equivalence. Which says many things. But one of them is that there can’t be a bright line between the “intelligent” and the “merely computational”. So that made me start to think that maybe I didn’t need to build a brain to solve the computable knowledge problem.

Meanwhile, my younger son, who I think was about six at the time, was starting to use Mathematica a bit. And he asked me, “Why can’t I just tell it what I want to in plain English?” I started explaining how hard that was. But he persisted with, “Well, there just aren’t that many different ways to say any particular thing,” etc. And that got me thinking—particularly about using the science I’d built to try to solve the problem of understanding natural language.

Meanwhile, I’d started a project to curate lots of data of all kinds. It was an interesting thing going into a big reference library and figuring out what it would take to just make all of that computable. Alan Turing had done some estimates of things like that, which were a bit daunting. But anyway, I started getting all kinds of experts on all kinds of topics that tech companies usually don’t care about. And I started building technology and a management system for making data computable.

It was not at all clear this was all going to work, and even a lot of my management team was skeptical. “Another WolframTones” was a common characterization. But the good news was that our main business was strong. And—even though I’d considered it in the early 1990s—I’d never taken the company public, and I didn’t have any investors at all, except I guess myself. So I wasn’t really answering to anyone. And so I could just do Wolfram|Alpha—as I have been able to do all kinds of long-term stuff throughout the history of our company.

And despite the concerns, Wolfram|Alpha did work. And I have to say that when it was finally ready to demo, it took only one meeting for my management team to completely come around, and be enthusiastic about it.

One problem, of course, with Wolfram|Alpha is that—like Mathematica and the Wolfram Language—it’s really an infinite project. But there came a point at which we really couldn’t do much more development without seeing what would happen with real users, asking real questions, in real natural language.

So we picked May 15, 2009 as the date to go live. But there was a problem: we had no idea how high the traffic would spike. And back then we couldn’t use Amazon or anything: to get performance we had to do fancy parallel computations right on the bare metal.

Michael Dell was kind enough to give us a good deal on getting lots of computers for our colos. But I was pretty concerned when I talked to some people who’d had services that had crashed horribly on launch. So I decided on a kind of hack. I decided that we’d launch on live internet TV—so if something horrible happened, at least people would know what was going on, and might have some fun with it. So I contacted Justin Kan, who was then doing justin.tv, and whose first company I’d failed to invest in at the very first Y Combinator—and we arranged to “launch live”.

It was fun building our “mission control”—and we made some very nice dashboards, many of which we actually still use today. But the day of the launch I was concerned that this was going to be the most boring TV ever: that basically at the appointed hour, I’d just click a mouse and we’d be live, and that’d be the end of it.

Some photos from the Wolfram|Alpha launch

Well, that was not to be. You know, I’ve never watched the broadcast. I don’t know how much it captures of some of the horrible things that went wrong—particularly with last-minute network configuration issues.

But perhaps the most memorable thing had to do with the weather. We were in central Illinois. And about an hour before our grand launch, there was a weather report—that a tornado was heading straight for us! You can see the wind speed spike in the Wolfram|Alpha historical weather data:

Wolfram|Alpha launch wind speed tornado

Well, fortunately, the tornado missed. And sure enough, at 9:33:50pm central time on May 15, 2009, I pressed the button, and Wolfram|Alpha went live. Lots of people started using it. Some people even understood that it wasn’t a search engine: it was computing things.

The early bug reports then started flowing in. This was the thing Wolfram|Alpha used to do at the very beginning, when something failed:

The initial Wolfram|Alpha failure message: “I’m sorry Dave, I’m afraid I can’t do that...”

And one of the bug reports was someone saying, “How did you know my name was Dave?!” All kinds of bug reports came in the first night—here are a couple:

Wolfram|Alpha early feedback

Well, not only did people start using Wolfram|Alpha; companies did too. Through Bill Gates, Microsoft hooked up Wolfram|Alpha to Bing. And a little company called Siri hooked it up to its app. And some time later Apple bought Siri, and through Steve Jobs, who was by then very sick, Wolfram|Alpha ended up powering the knowledge part of Siri.

Siri answers a great many queries via Wolfram|Alpha

OK, so we’re getting to modern times. And the big thing now is the Wolfram Language. Actually, it’s not such a modern thing for us. Back in the early 1990s I was going to break off the language component of Mathematica—we were thinking of calling it the M Language. And we even had people working on it, like Sergey Brin when he was an intern with us in 1993. But we hadn’t quite figured out how to distribute it, or what it should be called.

And in the end, the idea languished. Until we had Wolfram|Alpha, and the cloud existed, and so on. And also I must admit that I was really getting fed up with people thinking of Mathematica as being a “math thing”. It had been growing and growing:

Mathematica functions over time

And although we kept on strengthening the math, 90% of it wasn’t math at all. We had kind of a “let’s just implement everything” approach. And that had gone really well. We were really on a roll inventing all those meta-algorithms, and automating things. And combined with Wolfram|Alpha I realized that what we had was a new, very general kind of thing: a knowledge-based language that built in as much knowledge about computation and about the world as possible.

And there was another piece too: realizing that our symbolic programming paradigm could be used to represent not just computation, but also deployment, particularly in the cloud.

Mathematica has been very widely used in R&D and in education—but with notable exceptions, like in the finance industry, it’s not been so widely used for deployed production systems. And one of the ideas of the Wolfram Language—and our cloud—is to change that, and to really make knowledge-based programming something that can be deployed everywhere, from supercomputers to embedded devices. There’s a huge amount to say about all this…

And we’ve done lots of other things too. This shows function growth over the first 10,000 days of Mathematica, and what kinds of things were in it over the years.

10,000 days of Mathematica function growth

We’ve done all kinds of different things with our technology. I don’t know why I have this picture here, but I have to show it anyway; this was a picture on the commemorative T-shirt for our Image Identification Project that we did a year ago. Maybe you can figure out what the caption means with respect to debugging the image identifier: it came out as an anteater in the image identifier because we had lost the aardvark, which is pictured here:

The commemorative T-shirt for the Wolfram Language Image Identification Project: “It was an anteater because we lost the aardvark.”

And just in the last few weeks, we’ve opened up our Wolfram Open Cloud to let anyone use the Wolfram Language on the web. It’s really the culmination of 30, perhaps 40, years of work.

You know, for nearly 30 years I’ve been working hard to make sure the Wolfram Language is well designed—that as it gets bigger and bigger all the pieces fit nicely together, so you can build on them as well as possible. And I have to say it’s nice to see how well this has paid off now.

It’s pretty cool. We’ve got a very different kind of language—something that’s useful for communicating not just about computation, but about the world, with computers and with humans. You can write tiny programs. There’s Tweet-a-Program for example:

Wolfram Tweet-a-Program   Wolfram Tweet-a-Program gallery

Or you can write big programs—like Wolfram|Alpha, which is 15 million lines of Wolfram Language code.

It’s pretty nice to see companies in all sorts of industries starting to base their technology on the Wolfram Language. And another thing I’m really excited about right now is that with the Wolfram Language I think we finally have a great way to teach computational thinking to kids. I even wrote a book about that recently:

An Elementary Introduction to the Wolfram Language book

And I can’t help wondering what would have happened if the 12-year-old me had had this—and if my first computer language had been the Wolfram Language rather than the machine code of the Elliott 903. I could certainly have made some of my favorite science discoveries with one-liners. And a lot of my questions about things like AI would already have been answered.
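(For example, the rule 30 cellular automaton, one of those favorite discoveries, really is a one-liner now:)

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]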

But actually I’m pretty happy to have been living at the time in history I have, and to have been able to be part of these decades in the evolution of the incredibly important idea of computation—and to have had the privilege of being able to discover and invent a few things relevant to it along the way.

Who Was Ramanujan?

This week’s release of the movie The Man Who Knew Infinity (which I saw in rough form last fall through its mathematician-producers Manjul Bhargava and Ken Ono) leads me to write about its subject, Srinivasa Ramanujan…

"Hardy" and "Ramanujan" in The Man Who Knew Infinity

A Remarkable Letter

They used to come by physical mail. Now it’s usually email. From around the world, I have for many years received a steady trickle of messages that make bold claims—about prime numbers, relativity theory, AI, consciousness or a host of other things—but give little or no backup for what they say. I’m always so busy with my own ideas and projects that I invariably put off looking at these messages. But in the end I try to at least skim them—in large part because I remember the story of Ramanujan.

On about January 31, 1913 a mathematician named G. H. Hardy in Cambridge, England received a package of papers with a cover letter that began: “Dear Sir, I beg to introduce myself to you as a clerk in the Accounts Department of the Port Trust Office at Madras on a salary of only £20 per annum. I am now about 23 years of age….” and went on to say that its author had made “startling” progress on a theory of divergent series in mathematics, and had all but solved the longstanding problem of the distribution of prime numbers. The cover letter ended: “Being poor, if you are convinced that there is anything of value I would like to have my theorems published…. Being inexperienced I would very highly value any advice you give me. Requesting to be excused for the trouble I give you. I remain, Dear Sir, Yours truly, S. Ramanujan”.

What followed were at least 11 pages of technical results from a range of areas of mathematics (at least 2 of the pages have now been lost). Early on, there are a few things that on first sight might seem absurd, like that the sum of all positive integers can be thought of as being equal to –1/12:

The sum of all positive integers equal to -1/12

Surprising claims from Ramanujan's letter to Hardy (Reproduced by kind permission of the Syndics of Cambridge University Library)

Then there are statements that suggest a kind of experimental approach to mathematics:

Empirical math in Ramanujan's letter (Reproduced by kind permission of the Syndics of Cambridge University Library)

But then things get more exotic, with pages of formulas like this:

Remarkable formulas in Ramanujan's letter (Reproduced by kind permission of the Syndics of Cambridge University Library)

What are these? Where do they come from? Are they even correct?

The concepts are familiar from college-level calculus. But these are not just complicated college-level calculus exercises. Instead, when one looks closely, each one has something more exotic and surprising going on—and seems to involve a quite different level of mathematics.

Today we can use Mathematica or Wolfram|Alpha to check the results—at least numerically. And sometimes we can even just type in the question and immediately get out the answer:

Modern Mathematica reproduces Ramanujan's results
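(For example, the notorious 1+2+3+… = –1/12 statement from the letter corresponds to the analytically continued value ζ(–1), which comes straight out of Mathematica:)

In[1]:= Zeta[-1]
Out[1]= -1/12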

And the first surprise—just as G. H. Hardy discovered back in 1913—is that, yes, the formulas are essentially all correct. But what kind of person would have made them? And how? And are they all part of some bigger picture—or in a sense just scattered random facts of mathematics?

The surviving math pages from Ramanujan's letter to Hardy (reproduced by kind permission of the Syndics of Cambridge University Library)

The Beginning of the Story

Needless to say, there’s a human story behind this: the remarkable story of Srinivasa Ramanujan.

He was born in a smallish town in India on December 22, 1887 (which made him not “about 23”, but actually 25, when he wrote his letter to Hardy). His family was of the Brahmin (priests, teachers, …) caste but of modest means. The British colonial rulers of India had put in place a very structured system of schools, and by age 10 Ramanujan stood out by scoring top in his district in the standard exams. He was also known for having an exceptional memory, and for being able to recite digits of numbers like pi, as well as things like roots of Sanskrit words. When he graduated from high school at age 17 he was recognized for his mathematical prowess, and given a scholarship for college.

While in high school Ramanujan had started studying mathematics on his own—and doing his own research (notably on the numerical evaluation of Euler’s constant, and on properties of the Bernoulli numbers). He was fortunate at age 16 (in those days long before the web!) to get a copy of a remarkably good and comprehensive (at least as of 1886) 1055-page summary of high-end undergraduate mathematics, organized in the form of results numbered up to 6165. The book was written by a tutor for the ultra-competitive Mathematical Tripos exams in Cambridge—and its terse “just the facts” format was very similar to the one Ramanujan used in his letter to Hardy.

The math book Ramanujan got at age 16

By the time Ramanujan got to college, all he wanted to do was mathematics—and he failed his other classes, and at one point ran away, causing his mother to send a missing-person letter to the newspaper:

18-year-old Ramanujan is missing!

Ramanujan moved to Madras (now Chennai), tried different colleges, had medical problems, and continued his independent math research. In 1909, when he was 21, his mother arranged (in keeping with customs of the time) for him to marry a then-10-year-old girl named Janaki, who started living with him a couple of years later.

Ramanujan seems to have supported himself by doing math tutoring—but soon became known around Madras as a math whiz, and began publishing in the recently launched Journal of the Indian Mathematical Society. His first paper—published in 1911—was on computational properties of Bernoulli numbers (the same Bernoulli numbers that Ada Lovelace had used in her 1843 paper on the Analytical Engine). Though his results weren’t spectacular, Ramanujan’s approach was an interesting and original one that combined continuous (“what’s the numerical value?”) and discrete (“what’s the prime factorization?”) mathematics.

Ramanujan's first published paper

When Ramanujan’s mathematical friends didn’t succeed in getting him a scholarship, Ramanujan started looking for jobs, and wound up in March 1912 as an accounting clerk—or effectively, a human calculator—for the Port of Madras (which was then, as now, a big shipping hub). His boss, the Chief Accountant, happened to be interested in academic mathematics, and became a lifelong supporter of his. The head of the Port of Madras was a rather distinguished British civil engineer, and partly through him, Ramanujan started interacting with a network of technically oriented British expatriates. They struggled to assess him, wondering whether “he has the stuff of great mathematicians” or whether “his brains are akin to those of the calculating boy”. They wrote to a certain Professor M. J. M. Hill in London, who looked at Ramanujan’s rather outlandish statements about divergent series and declared that “Mr. Ramanujan is evidently a man with a taste for Mathematics, and with some ability, but he has got on to wrong lines.” Hill suggested some books for Ramanujan to study.

Meanwhile, Ramanujan’s expat friends were continuing to look for support for him—and he decided to start writing to British mathematicians himself, though with some significant help in composing the English in his letters. We don’t know exactly whom he wrote to first—although Hardy’s long-time collaborator John Littlewood mentioned two names shortly before he died 64 years later: H. F. Baker and E. W. Hobson. Neither was a particularly good choice: Baker worked on algebraic geometry and Hobson on mathematical analysis, both subjects fairly far from what Ramanujan was doing. But in any event, neither of them responded.

And so it was that on Thursday, January 16, 1913, Ramanujan sent his letter to G. H. Hardy.

Who Was Hardy?

 

G. H. Hardy

G. H. Hardy was born in 1877 to schoolteacher parents based about 30 miles south of London. He was from the beginning a top student, particularly in mathematics. Even when I was growing up in England in the early 1970s, it was typical for such students to go to Winchester for high school and Cambridge for college. And that’s exactly what Hardy did. (The other, slightly more famous, track—less austere and less mathematically oriented—was Eton and Oxford, which happens to be where I went.)

Cambridge undergraduate mathematics was at the time very focused on solving ornately constructed calculus-related problems as a kind of competitive sport—with the final event being the Mathematical Tripos exams, which ranked everyone from the “Senior Wrangler” (top score) to the “Wooden Spoon” (lowest passing score). Hardy thought he should have been top, but actually came in 4th, and decided that what he really liked was the somewhat more rigorous and formal approach to mathematics that was then becoming popular in Continental Europe.

The way the British academic system worked at that time—and basically until the 1960s—was that as soon as they graduated, top students could be elected to “college fellowships” that could last the rest of their lives. Hardy was at Trinity College—the largest and most scientifically distinguished college at Cambridge University—and when he graduated in 1900, he was duly elected to a college fellowship.

Hardy’s first research paper was about doing integrals like these:

The modern version of integrals from Hardy's first research paper

For a decade Hardy basically worked on the finer points of calculus, figuring out how to do different kinds of integrals and sums, and injecting greater rigor into issues like convergence and the interchange of limits.

His papers weren’t grand or visionary, but they were good examples of state-of-the-art mathematical craftsmanship. (As a colleague of Bertrand Russell’s, he dipped into the new area of transfinite numbers, but didn’t do much with them.) Then in 1908, he wrote a textbook entitled A Course of Pure Mathematics—which was a good book, and was very successful in its time, even if its preface began by explaining that it was for students “whose abilities reach or approach something like what is usually described as ‘scholarship standard’”.

By 1910 or so, Hardy had pretty much settled into a routine of life as a Cambridge professor, pursuing a steady program of academic work. But then he met John Littlewood. Littlewood had grown up in South Africa and was eight years younger than Hardy, a recent Senior Wrangler, and in many ways much more adventurous. And in 1911 Hardy—who had previously always worked on his own—began a collaboration with Littlewood that ultimately lasted the rest of his life.

As a person, Hardy gives me the impression of a good schoolboy who never fully grew up. He seemed to like living in a structured environment, concentrating on his math exercises, and displaying cleverness whenever he could. He could be very nerdy—whether about cricket scores, proving the non-existence of God, or writing down rules for his collaboration with Littlewood. And in a quintessentially British way, he could express himself with wit and charm, but was personally stiff and distant—for example always styling himself as “G. H. Hardy”, with “Harold” basically used only by his mother and sister.

So in early 1913 there was Hardy: a respectable and successful, if personally reserved, British mathematician, who had recently been energized by starting to collaborate with Littlewood—and was being pulled in the direction of number theory by Littlewood’s interests there. But then he received the letter from Ramanujan.

The Letter and Its Effects

Ramanujan’s letter began in a somewhat unpromising way, giving the impression that he thought he was describing for the first time the already fairly well-known technique of analytic continuation for generalizing things like the factorial function to non-integers. He made the statement that “My whole investigations are based upon this and I have been developing this to a remarkable extent so much so that the local mathematicians are not able to understand me in my higher flights.” But after the cover letter, there followed more than nine pages that listed over 120 different mathematical results.

Again, they began unpromisingly, with rather vague statements about having a method to count the number of primes up to a given size. But by page 3, there were definite formulas for sums and integrals and things. Some of them looked at least from a distance like the kinds of things that were, for example, in Hardy’s papers. But some were definitely more exotic. Their general texture, though, was typical of these types of math formulas. But many of the actual formulas were quite surprising—often claiming that things one wouldn’t expect to be related at all were actually mathematically equal.

At least two pages of the original letter have gone missing. But the last page we have again seems to end inauspiciously—with Ramanujan describing achievements of his theory of divergent series, including the seemingly absurd result about adding up all the positive integers, 1+2+3+4+…, and getting –1/12.

So what was Hardy’s reaction? First he consulted Littlewood. Was it perhaps a practical joke? Were these formulas all already known, or perhaps completely wrong? Some they recognized, and knew were correct. But many they did not. But as Hardy later said with characteristic clever gloss, they concluded that these too “must be true because, if they were not true, no one would have the imagination to invent them.”

Bertrand Russell wrote that by the next day he “found Hardy and Littlewood in a state of wild excitement because they believe they have found a second Newton, a Hindu clerk in Madras making 20 pounds a year.” Hardy showed Ramanujan’s letter to lots of people, and started making enquiries with the government department that handled India. It took him a week to actually reply to Ramanujan, opening with a certain measured and precisely expressed excitement: “I was exceedingly interested by your letter and by the theorems which you state.”

Then he went on: “You will however understand that, before I can judge properly of the value of what you have done, it is essential that I should see proofs of some of your assertions.” It was an interesting thing to say. To Hardy, it wasn’t enough to know what was true; he wanted to know the proof—the story—of why it was true. Of course, Hardy could have taken it upon himself to find his own proofs. But I think part of it was that he wanted to get an idea of how Ramanujan thought—and what level of mathematician he really was.

His letter went on—with characteristic precision—to group Ramanujan’s results into three classes: already known, new and interesting but probably not important, and new and potentially important. But the only things he immediately put in the third category were Ramanujan’s statements about counting primes, adding that “almost everything depends on the precise rigour of the methods of proof which you have used.”

Hardy had obviously done some background research on Ramanujan by this point, since in his letter he makes reference to Ramanujan’s paper on Bernoulli numbers. But in his letter he just says, “I hope very much that you will send me as quickly as possible… a few of your proofs,” then closes with, “Hoping to hear from you again as soon as possible.”

Ramanujan did indeed respond quickly to Hardy’s letter, and his response is fascinating. First, he says he was expecting the same kind of reply from Hardy as he had from the “Mathematics Professor at London”, who just told him “not [to] fall into the pitfalls of divergent series.” Then he reacts to Hardy’s desire for rigorous proofs by saying, “If I had given you my methods of proof I am sure you will follow the London Professor.” He mentions his result 1+2+3+4+…=–1/12 and says that “If I tell you this you will at once point out to me the lunatic asylum as my goal.” He goes on to say, “I dilate on this simply to convince you that you will not be able to follow my methods of proof… [based on] a single letter.” He says that his first goal is just to get someone like Hardy to verify his results—so he’ll be able to get a scholarship, since “I am already a half starving man. To preserve my brains I want food…”

Ramanujan responds to Hardy (Reproduced by kind permission of the Syndics of Cambridge University Library)

Ramanujan makes a point of saying that it was Hardy’s first category of results—ones that were already known—that he’s most pleased about, “For my results are verified to be true even though I may take my stand upon slender basis.” In other words, Ramanujan himself wasn’t sure if the results were correct—and he’s excited that they actually are.

So how was he getting his results? I’ll say more about this later. But he was certainly doing all sorts of calculations with numbers and formulas—in effect doing experiments. And presumably he was looking at the results of these calculations to get an idea of what might be true. It’s not clear how he figured out what was actually true—and indeed some of the results he quoted weren’t in the end true. But presumably he used some mixture of traditional mathematical proof, calculational evidence, and lots of intuition. But he didn’t explain any of this to Hardy.

Instead, he just started conducting a correspondence about the details of the results, and the fragments of proofs he was able to give. Hardy and Littlewood seemed intent on grading his efforts—with Littlewood writing about some result, for example, “(d) is still wrong, of course, rather a howler.” Still, they wondered if Ramanujan was “an Euler”, or merely “a Jacobi”. But Littlewood had to say, “The stuff about primes is wrong”—explaining that Ramanujan incorrectly assumed the Riemann zeta function didn’t have complex zeros, even though it actually has an infinite number of them, which are the subject of the whole Riemann hypothesis. (The Riemann hypothesis is still a famous unsolved math problem, even though an optimistic teacher suggested it to Littlewood as a project when he was an undergraduate…)

What about Ramanujan’s strange 1+2+3+4+… = –1/12? Well, that has to do with the Riemann zeta function as well. For integer s > 1, ζ(s) is defined as the sum 1/1^s + 1/2^s + 1/3^s + …. And given those values, there’s a nice function—called Zeta[s] in the Wolfram Language—that can be obtained by continuing to all complex s. Now based on the formula for positive arguments, one can identify Zeta[–1] with 1+2+3+4+… But one can also just evaluate Zeta[–1]:

The sum of all positive integers
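
In the Wolfram Language today, that evaluation is a one-liner (just a check of the value quoted above, not anything Ramanujan or Hardy could have run):

Zeta[-1]       (* the analytically continued value: -1/12 *)
N[Zeta[-1]]    (* -0.0833333 *)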

It’s a weird result, to be sure. But not as crazy as it might at first seem. And in fact it’s a result that’s nowadays considered perfectly sensible for purposes of certain calculations in quantum field theory (in which, to be fair, all actual infinities are intended to cancel out at the end).

But back to the story. Hardy and Littlewood didn’t really have a good mental model for Ramanujan. Littlewood speculated that Ramanujan might not be giving the proofs they assumed he had because he was afraid they’d steal his work. (Stealing was a major issue in academia then as it is now.) Ramanujan said he was “pained” by this speculation, and assured them that he was not “in the least apprehensive of my method being utilised by others.” He said that actually he’d invented the method eight years earlier, but hadn’t found anyone who could appreciate it, and now he was “willing to place unreservedly in your possession what little I have.”

Meanwhile, even before Hardy had responded to Ramanujan’s first letter, he’d been investigating with the government department responsible for Indian students how he could bring Ramanujan to Cambridge. It’s not quite clear what got communicated, but Ramanujan responded that he couldn’t go—perhaps because of his Brahmin beliefs, or his mother, or perhaps because he just didn’t think he’d fit in. But in any case, Ramanujan’s supporters started pushing instead for him to get a graduate scholarship at the University of Madras. More experts were consulted, who opined that “His results appear to be wonderful; but he is not, now, able to present any intelligible proof of some of them,” but “He has sufficient knowledge of English and is not too old to learn modern methods from books.”

The university administration said their regulations didn’t allow a graduate scholarship to be given to someone like Ramanujan who hadn’t finished an undergraduate degree. But they helpfully suggested that “Section XV of the Act of Incorporation and Section 3 of the Indian Universities Act, 1904, allow of the grant of such a scholarship [by the Government Educational Department], subject to the express consent of the Governor of Fort St George in Council.” And despite the seemingly arcane bureaucracy, things moved quickly, and within a few weeks Ramanujan was duly awarded a scholarship for two years, with the sole requirement that he provide quarterly reports.

A Way of Doing Mathematics

By the time he got his scholarship, Ramanujan had started writing more papers, and publishing them in the Journal of the Indian Mathematical Society. Compared to his big claims about primes and divergent series, the topics of these papers were quite tame. But the papers were remarkable nevertheless.

What’s immediately striking about them is how calculational they are—full of actual, complicated formulas. Most math papers aren’t that way. They may have complicated notation, but they don’t have big expressions containing complicated combinations of roots, or seemingly random long integers.

Pages from a paper of Ramanujan's

In modern times, we’re used to seeing incredibly complicated formulas routinely generated by Mathematica. But usually they’re just intermediate steps, and aren’t what papers explicitly talk much about. For Ramanujan, though, complicated formulas were often what really told the story. And of course it’s incredibly impressive that he could derive them without computers and modern tools.

(As an aside, back in the late 1970s I started writing papers that involved formulas generated by computer. And in one particular paper, the formulas happened to have lots of occurrences of the number 9. But the experienced typist who typed the paper—yes, from a manuscript—replaced every “9” with a “g”. When I asked her why, she said, “Well, there are never explicit 9’s in papers!”)

Looking at Ramanujan’s papers, another striking feature is the frequent use of numerical approximations in arguments leading to exact results. People tend to think of working with algebraic formulas as an exact process—generating, for example, coefficients that are exactly 16, not just roughly 15.99999. But for Ramanujan, approximations were routinely part of the story, even when the final results were exact.

In some sense it’s not surprising that approximations to numbers are useful. Let’s say we want to know which is larger: Sqrt[2]^(Sqrt[2]+Sqrt[3]) or 2^Sqrt[3]. We can start doing all sorts of transformations among square roots, and trying to derive theorems from them. Or we can just evaluate each expression numerically, and find that the first one (2.9755…) is obviously smaller than the second (3.322…). In the mathematical tradition of someone like Hardy—or, for that matter, in a typical modern calculus course—such a direct calculational way of answering the question seems somehow inappropriate and improper.
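
For what it’s worth, here is that comparison done the “improper” calculational way in the Wolfram Language (a minimal sketch, using default machine precision):

N[{Sqrt[2]^(Sqrt[2] + Sqrt[3]), 2^Sqrt[3]}]   (* {2.9755..., 3.3220...} -- the first is clearly smaller *)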

And of course if the numbers are very close one has to be careful about numerical round-off and so on. But for example in Mathematica and the Wolfram Language today—particularly with their built-in precision tracking for numbers—we often use numerical approximations internally as part of deriving exact results, much like Ramanujan did.

When Hardy asked Ramanujan for proofs, part of what he wanted was to get a kind of story for each result that explained why it was true. But in a sense Ramanujan’s methods didn’t lend themselves to that. Because part of the “story” would have to be that there’s this complicated expression, and it happens to be numerically greater than this other expression. It’s easy to see it’s true—but there’s no real story of why it’s true.

And the same happens whenever a key part of a result comes from pure computation of complicated formulas, or in modern times, from automated theorem proving. Yes, one can trace the steps and see that they’re correct. But there’s no bigger story that gives one any particular understanding of the results.

For most people it’d be bad news to end up with some complicated expression or long seemingly random number—because it wouldn’t tell them anything. But Ramanujan was different. Littlewood once said of Ramanujan that “every positive integer was one of his personal friends.” And between a good memory and good ability to notice patterns, I suspect Ramanujan could conclude a lot from a complicated expression or a long number. For him, just the object itself would tell a story.

Ramanujan was of course generating all these things by his own calculational efforts. But back in the late 1970s and early 1980s I had the experience of starting to generate lots of complicated results automatically by computer. And after I’d been doing it awhile, something interesting happened: I started being able to quickly recognize the “texture” of results—and often immediately see what was likely to be true. If I was dealing, say, with some complicated integral, it wasn’t that I knew any theorems about it. I just had an intuition about, for example, what functions might appear in the result. And given this, I could then get the computer to go in and fill in the details—and check that the result was correct. But I couldn’t derive why the result was true, or tell a story about it; it was just something that intuition and calculation gave me.

Now of course there’s a fair amount of pure mathematics where one can’t (yet) just routinely go in and do an explicit computation to check whether or not some result is correct. And this often happens for example when there are infinite or infinitesimal quantities or limits involved. And one of the things Hardy had specialized in was giving proofs that were careful in handling such things. In 1910 he’d even written a book called Orders of Infinity that was about subtle issues that come up in taking infinite limits. (In particular, in a kind of algebraic analog of the theory of transfinite numbers, he talked about comparing growth rates of things like nested exponential functions—and we even make some use of what are now called Hardy fields in dealing with generalizations of power series in the Wolfram Language.)

So when Hardy saw Ramanujan’s “fast and loose” handling of infinite limits and the like, it wasn’t surprising that he reacted negatively—and thought he would need to “tame” Ramanujan, and educate him in the finer European ways of doing such things, if Ramanujan was actually going to reliably get correct answers.

Seeing What’s Important

Ramanujan was surely a great human calculator, and impressive at knowing whether a particular mathematical fact or relation was actually true. But his greatest skill was, I think, something in a sense more mysterious: an uncanny ability to tell what was significant, and what might be deduced from it.

Take for example his paper “Modular Equations and Approximations to π”, published in 1914, in which he calculates (without a computer of course):

Ramanujan discovers an "almost integer"
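
Checking an “almost integer” like this is a one-liner today. Here, as an illustration, is the most famous example of the genre (I won’t claim it’s the specific value in Ramanujan’s paper):

N[Exp[Pi Sqrt[163]], 30]   (* 262537412640768743.999999999999... -- within about 10^-12 of an integer *)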

Most mathematicians would say, “It’s an amusing coincidence that that’s so close to an integer—but so what?” But Ramanujan realized there was more to it. He found other relations (those “=” should really be ≅):

Approximations to pi

Then he began to build a theory—that involves elliptic functions, though Ramanujan didn’t know that name yet—and started coming up with new series approximations for π:

Ramanujan's series for pi

Previous approximations to π had in a sense been much more sober, though the best one before Ramanujan’s (Machin’s series from 1706) did involve the seemingly random number 239:

Machin's series for pi

But Ramanujan’s series—bizarre and arbitrary as they might appear—had an important feature: they took far fewer terms to compute π to a given accuracy. In 1977, Bill Gosper—himself a rather Ramanujan-like figure, whom I’ve had the pleasure of knowing for more than 35 years—took the last of Ramanujan’s series from the list above, and used it to compute a record number of digits of π. There soon followed other computations, all based directly on Ramanujan’s idea—as is the method we use for computing π in Mathematica and the Wolfram Language.
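
As a rough illustration of that efficiency, here is a sketch in the Wolfram Language using what is probably Ramanujan’s best-known series for 1/π (I won’t claim it’s the exact one Gosper used); each additional term contributes roughly eight more correct digits:

(* partial sums of Ramanujan's 1103 + 26390 k series, inverted to approximate Pi *)
ramanujanPi[terms_] :=
  1/(2 Sqrt[2]/9801 Sum[(4 k)! (1103 + 26390 k)/((k!)^4 396^(4 k)), {k, 0, terms}])

N[Pi - ramanujanPi[#], 3] & /@ {0, 1, 2}   (* errors shrink by about 8 digits per extra term *)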

It’s interesting to see in Ramanujan’s paper that even he occasionally didn’t know what was and wasn’t significant. For example, he noted:

A “curious” fact from Ramanujan's paper

And then—in pretty much his only published example of geometry—he gave a peculiar geometric construction for approximately “squaring the circle” based on this formula:

An approximate "squaring of the circle" by Ramanujan

Truth versus Narrative

To Hardy, Ramanujan’s way of working must have seemed quite alien. For Ramanujan was in some fundamental sense an experimental mathematician: going out into the universe of mathematical possibilities and doing calculations to find interesting and significant facts—and only then building theories based on them.

Hardy on the other hand worked like a traditional mathematician, progressively extending the narrative of existing mathematics. Most of his papers begin—explicitly or implicitly—by quoting some result from the mathematical literature, and then proceed by telling the story of how this result can be extended by a series of rigorous steps. There are no sudden empirical discoveries—and no seemingly inexplicable jumps based on intuition from them. It’s mathematics carefully argued, and built, in a sense, brick by brick.

A century later this is still the way almost all pure mathematics is done. And perhaps work done any other way, even if it concerns the same subject matter, shouldn’t be called “mathematics” at all, because its methods are too different. In my own efforts to explore the computational universe of simple programs, I’ve certainly done a fair amount that could be called “mathematical” in the sense that it, for example, explores systems based on numbers.

Over the years, I’ve found all sorts of results that seem interesting. Strange structures that arise when one successively adds numbers to their digit reversals. Bizarre nested recurrence relations that generate primes. Peculiar representations of integers using trees of bitwise xors. But they’re empirical facts—demonstrably true, yet not part of the tradition and narrative of existing mathematics.
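
Just to give the flavor of the first of those experiments, here is the reverse-and-add iteration in the Wolfram Language (a minimal sketch; 196 is the classic starting value, famously conjectured never to reach a palindrome under this process):

(* repeatedly add a number to its digit reversal *)
NestList[# + IntegerReverse[#] &, 196, 5]   (* {196, 887, 1675, 7436, 13783, 52514} *)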

For many mathematicians—like Hardy—the process of proof is the core of mathematical activity. It’s not particularly significant to come up with a conjecture about what’s true; what’s significant is to create a proof that explains why something is true, constructing a narrative that other mathematicians can understand.

Particularly today, as we start to be able to automate more and more proofs, they can seem a bit like mundane manual labor, where the outcome may be interesting but the process of getting there is not. But proofs can also be illuminating. They can in effect be stories that introduce new abstract concepts that transcend the particulars of a given proof, and provide raw material to understand many other mathematical results.

For Ramanujan, though, I suspect it was facts and results that were the center of his mathematical thinking, and proofs felt a bit like some strange European custom necessary to take his results out of his particular context, and convince European mathematicians that they were correct.

Going to Cambridge

But let’s return to the story of Ramanujan and Hardy.
In the early part of 1913, Hardy and Ramanujan continued to exchange letters. Ramanujan described results; Hardy critiqued what Ramanujan said, and pushed for proofs and traditional mathematical presentation. Then there was a long gap, but finally in December 1913, Hardy wrote again, explaining that Ramanujan’s most ambitious results—about the distribution of primes—were definitely incorrect, commenting that “…the theory of primes is full of pitfalls, to surmount which requires the fullest of trainings in modern rigorous methods.” He also said that if Ramanujan had been able to prove his results it would have been “about the most remarkable mathematical feat in the whole history of mathematics.”

In January 1914 a young Cambridge mathematician named E. H. Neville came to give lectures in Madras, and relayed the message that Hardy was (in Ramanujan’s words) “anxious to get [Ramanujan] to Cambridge”. Ramanujan responded that back in February 1913 he’d had a meeting, along with his “superior officer”, with the Secretary to the Students Advisory Committee of Madras, who had asked whether he was prepared to go to England. Ramanujan wrote that he assumed he’d have to take exams like the other Indian students he’d seen go to England, which he didn’t think he’d do well enough in—and also that his superior officer, a “very orthodox Brahman having scruples to go to foreign land replied at once that I could not go”.

But then he said that Neville had “cleared [his] doubts”, explaining that there wouldn’t be an issue with his expenses, that his English would do, that he wouldn’t have to take exams, and that he could remain a vegetarian in England. He ended by saying that he hoped Hardy and Littlewood would “be good enough to take the trouble of getting me [to England] within a very few months.”

Hardy had assumed it would be bureaucratically trivial to get Ramanujan to England, but actually it wasn’t. Hardy’s own Trinity College wasn’t prepared to contribute any real funding. Hardy and Littlewood offered to put up some of the money themselves. But Neville wrote to the registrar of the University of Madras saying that “the discovery of the genius of S. Ramanujan of Madras promises to be the most interesting event of our time in the mathematical world”—and suggested the university come up with the money. Ramanujan’s expat supporters swung into action, with the matter eventually reaching the Governor of Madras—and a solution was found that involved taking money from a grant that had been given by the government five years earlier for “establishing University vacation lectures”, but that was actually, in the bureaucratic language of “Document No. 182 of the Educational Department”, “not being utilised for any immediate purpose”.

There are strange little notes in the bureaucratic record, like on February 12: “What caste is he? Treat as urgent.” But eventually everything was sorted out, and on March 17, 1914, after a send-off featuring local dignitaries, Ramanujan boarded a ship for England, sailing up through the Suez Canal, and arriving in London on April 14. Before leaving India, Ramanujan had prepared for European life by getting Western clothes, and learning things like how to eat with a knife and fork, and how to tie a tie. Many Indian students had come to England before, and there was a whole procedure for them. But after a few days in London, Ramanujan arrived in Cambridge—with the Indian newspapers proudly reporting that “Mr. S. Ramanujan, of Madras, whose work in the higher mathematics has excited the wonder of Cambridge, is now in residence at Trinity.”

(In addition to Hardy and Littlewood, two other names that appear in connection with Ramanujan’s early days in Cambridge are Neville and Barnes. They’re not especially famous in the overall history of mathematics, but it so happens that in the Wolfram Language they’re both commemorated by built-in functions: NevilleThetaS and BarnesG.)

Ramanujan in Cambridge

 
 
Ramanujan in Cambridge, at graduation and with friends

What was the Ramanujan who arrived in Cambridge like? He was described as enthusiastic and eager, though diffident. He made jokes, sometimes at his own expense. He could talk about politics and philosophy as well as mathematics. He was never particularly introspective. In official settings he was polite and deferential and tried to follow local customs. His native language was Tamil, and earlier in his life he had failed English exams, but by the time he arrived in England, his English was excellent. He liked to hang out with other Indian students, sometimes going to musical events, or boating on the river. Physically, he was described as short and stout—with his main notable feature being the brightness of his eyes. He worked hard, chasing one mathematical problem after another. He kept his living space sparse, with only a few books and papers. He was sensible about practical things, for example in figuring out issues with cooking and vegetarian ingredients. And from what one can tell, he was happy to be in Cambridge.

But then on June 28, 1914—two and a half months after Ramanujan arrived in England—Archduke Ferdinand was assassinated, and on July 28, World War I began. There was an immediate effect on Cambridge. Many students were called up for military duty. Littlewood joined the war effort and ended up developing ways to compute range tables for anti-aircraft guns. Hardy wasn’t a big supporter of the war—not least because he liked German mathematics—but he volunteered for duty too, though was rejected on medical grounds.

Ramanujan described the war in a letter to his mother, saying for example, “They fly in aeroplanes at great heights, bomb the cities and ruin them. As soon as enemy planes are sighted in the sky, the planes resting on the ground take off and fly at great speeds and dash against them resulting in destruction and death.”

Ramanujan nevertheless continued to pursue mathematics, explaining to his mother that “war is waged in a country that is as far as Rangoon is away from [Madras]”. There were practical difficulties, like a lack of vegetables, which caused Ramanujan to ask a friend in India to send him “some tamarind (seeds being removed) and good cocoanut oil by parcel post”. But of more importance, as Ramanujan reported it, was that the “professors here… have lost their interest in mathematics owing to the present war”.

Ramanujan told a friend that he had “changed [his] plan of publishing [his] results”. He said that he would wait to publish any of the old results in his notebooks until the war was over. But he said that since coming to England he had learned “their methods”, and was “trying to get new results by their methods so that I can easily publish these results without delay”.

In 1915 Ramanujan published a long paper entitled “Highly Composite Numbers” about maxima of the function that counts the number of divisors of a given number (DivisorSigma[0, n] in the Wolfram Language). Hardy seems to have been quite involved in the preparation of this paper—and it served as the centerpiece of Ramanujan’s analog of a PhD thesis.
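
In Wolfram Language terms, highly composite numbers are the successive record-holders for that divisor-count function. Here is a minimal sketch of finding the first few (highlyComposite is just my own ad hoc helper, not a built-in function):

(* numbers with more divisors than any smaller number *)
highlyComposite[max_] := Module[{record = 0},
  Select[Range[max],
   If[DivisorSigma[0, #] > record, record = DivisorSigma[0, #]; True, False] &]]

highlyComposite[100]   (* {1, 2, 4, 6, 12, 24, 36, 48, 60} *)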

For the next couple of years, Ramanujan prolifically wrote papers—and despite the war, they were published. A notable paper he wrote with Hardy concerns the partition function (PartitionsP in the Wolfram Language) that counts the number of ways an integer can be written as a sum of positive integers. The paper is a classic example of mixing the approximate with the exact. The paper begins with the result for large n:

Approximating the number of partitions for large n

But then, using ideas Ramanujan developed back in India, it progressively improves the estimate, to the point where the exact integer result can be obtained. In Ramanujan’s day, computing the exact value of PartitionsP[200] was a big deal—and the climax of his paper. But now, thanks to Ramanujan’s method, it’s instantaneous:

Thanks to Ramanujan it now takes almost no time to compute this
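
Here is what that computation looks like today, together with the leading-order Hardy-Ramanujan estimate for comparison (the asymptotic formula overshoots the exact value by a few percent at n = 200):

PartitionsP[200]   (* 3972999029388 *)

(* leading-order asymptotic estimate, Exp[Pi Sqrt[2 n/3]]/(4 n Sqrt[3]) *)
N[Exp[Pi Sqrt[2*200/3]]/(4*200*Sqrt[3])]   (* about 4.1*10^12 *)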

Cambridge was dispirited by the war—with an appalling number of its finest students dying, often within weeks, at the front lines. Trinity College’s big quad had become a war hospital. But through all of this, Ramanujan continued to do his mathematics—and with Hardy’s help continued to build his reputation.

But then in May 1917, there was another problem: Ramanujan got sick. From what we know now, it’s likely that what he had was a parasitic liver infection picked up in India. But back then nobody could diagnose it. Ramanujan went from doctor to doctor, and nursing home to nursing home. He didn’t believe much of what he was told, and nothing that was done seemed to help much. Some months he would be well enough to do a significant amount of mathematics; others not. He became depressed, and at one point apparently suicidal. It didn’t help that his mother had prevented his wife back in India from communicating with him, presumably fearing it would distract him.

Hardy tried to help—sometimes by interacting with doctors, sometimes by providing mathematical input. One doctor told Hardy he suspected “some obscure Oriental germ trouble imperfectly studied at present”. Hardy wrote, “Like all Indians, [Ramanujan] is fatalistic, and it is terribly hard to get him to take care of himself.” Hardy later told the now-famous story that he once visited Ramanujan at a nursing home, telling him that he came in a taxicab with number 1729, and saying that it seemed to him a rather dull number—to which Ramanujan replied: “No, it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways”: 1729=1^3+12^3=9^3+10^3. (Wolfram|Alpha now reports some other properties too.)
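
One can reproduce Ramanujan’s observation directly in the Wolfram Language (PowersRepresentations enumerates representations as sums of powers):

(* ways to write 1729 as a sum of two non-negative cubes *)
PowersRepresentations[1729, 2, 3]   (* {{1, 12}, {9, 10}} *)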

But through all of this, Ramanujan’s mathematical reputation continued to grow. He was elected a Fellow of the Royal Society (with his supporters including Hobson and Baker, both of whom had failed to respond to his original letter)—and in October 1918 he was elected a fellow of Trinity College, assuring him financial support. A month later World War I was over—and the threat of U-boat attacks, which had made travel to India dangerous, was gone.

And so on March 13, 1919, Ramanujan returned to India—now very famous and respected, but also very ill. Through it all, he continued to do mathematics, writing a notable letter to Hardy about “mock” theta functions on January 12, 1920. He chose to live humbly, and largely ignored what little medicine could do for him. And on April 26, 1920, at the age of 32, and three days after the last entry in his notebook, he died.

Notifying Hardy of Ramanujan's death (image courtesy of the Master and Fellows of Trinity College, Cambridge)

The Aftermath

From when he first started doing mathematics research, Ramanujan had recorded his results in a series of hardcover notebooks—publishing only a very small fraction of them. When Ramanujan died, Hardy began to organize an effort to study and publish all 3000 or so results in Ramanujan’s notebooks. Several people were involved in the 1920s and 1930s, and quite a few publications were generated. But through various misadventures the project was not completed—to be taken up again only in the 1970s.

Pages from Ramanujan's notebooks

In 1940, Hardy gave all the letters he had from Ramanujan to the Cambridge University Library, but the original cover letter for what Ramanujan sent in 1913 was not among them—so now the only record we have of that is the transcription Hardy later published. Ramanujan’s three main notebooks sat for many years on top of a cabinet in the librarian’s office at the University of Madras, where they suffered damage from insects, but were never lost. His other mathematical documents passed through several hands, and some of them wound up in the incredibly messy office of a Cambridge mathematician—but when he died in 1965 they were noticed and sent to a library, where they languished until they were “rediscovered” with great excitement as Ramanujan’s lost notebook in 1976.

When Ramanujan died, it took only days for his various relatives to start asking for financial support. There were large medical bills from England, and there was talk of selling Ramanujan’s papers to raise money.

Ramanujan’s wife was 21 when he died, but as was the custom, she never remarried. She lived very modestly, making her living mostly from tailoring. In 1950 she adopted the son of a friend of hers who had died. By the 1960s, Ramanujan was becoming something of a general Indian hero, and she started receiving various honors and pensions. Over the years, quite a few mathematicians had come to visit her—and she had supplied them for example with the passport photo that has become the most famous picture of Ramanujan.

She lived a long life, dying in 1994 at the age of 95, having outlived Ramanujan by 73 years.

What Became of Hardy?

Hardy was 35 when Ramanujan’s letter arrived, and was 43 when Ramanujan died. Hardy viewed his “discovery” of Ramanujan as his greatest achievement, and described his association with Ramanujan as the “one romantic incident of [his] life”. After Ramanujan died, Hardy put some of his efforts into continuing to decode and develop Ramanujan’s results, but for the most part he returned to his previous mathematical trajectory. His collected works fill seven large volumes (while Ramanujan’s publications make up just one fairly slim volume). The word clouds of the titles of his papers show only a few changes from before he met Ramanujan to after:

Word clouds of the titles of Hardy's papers, before and after Ramanujan

Shortly before Ramanujan entered his life, Hardy had started to collaborate with John Littlewood, who he would later say was an even more important influence on his life than Ramanujan. After Ramanujan died, Hardy moved to what seemed like a better job in Oxford, and ended up staying there for 11 years before returning to Cambridge. His absence didn’t affect his collaboration with Littlewood, though—since they worked mostly by exchanging written messages, even when their rooms were only a few hundred feet apart. After 1911 Hardy rarely did mathematics without a collaborator; he worked especially with Littlewood, publishing 95 papers with him over the course of 38 years.

Hardy’s mathematics was always of the finest quality. He dreamed of doing something like solving the Riemann hypothesis—but in reality never did anything truly spectacular. He wrote two books, though, that continue to be read today: An Introduction to the Theory of Numbers, with E. M. Wright; and Inequalities, with Littlewood and G. Pólya.

Hardy lived his life in the stratum of the intellectual elite. In the 1920s he displayed a picture of Lenin in his apartment, and was briefly president of the “scientific workers” trade union. He always wrote elegantly, mostly about mathematics, and sometimes about Ramanujan. He eschewed gadgets and always lived along with students and other professors in his college. He never married, though near the end of his life his younger sister joined him in Cambridge (she also had never married, and had spent most of her life teaching at the girls’ school where she went as a child).

In 1940 Hardy wrote a small book called A Mathematician’s Apology. I remember when I was about 12 being given a copy of this book. I think many people viewed it as a kind of manifesto or advertisement for pure mathematics. But I must say it didn’t resonate with me at all. It felt to me at once sanctimonious and austere, and I wasn’t impressed by its attempt to describe the aesthetics and pleasures of mathematics, or by the pride with which its author said that “nothing I have ever done is of the slightest practical use” (actually, he co-invented the Hardy-Weinberg law used in genetics). I doubt I would have chosen the path of a pure mathematician anyway, but Hardy’s book helped make certain of it.

The beginning of Hardy's A Mathematician's Apology The beginning of Hardy's A Mathematician's Apology

To be fair, however, Hardy wrote the book at a low point in his own life, when he was concerned about his health and the loss of his mathematical faculties. And perhaps that explains why he made a point of explaining that “mathematics… is a young man’s game”. (And in an article about Ramanujan, he wrote that “a mathematician is often comparatively old at 30, and his death may be less of a catastrophe than it seems.”) I don’t know if the sentiment had been expressed before—but by the 1970s it was taken as an established fact, extending to science as well as mathematics. Kids I knew would tell me I’d better get on with things, because it’d be all over by age 30.

Hardy explains that "mathematics... is a young man's game"

Is that actually true? I don’t think so. It’s hard to get clear evidence, but as one example I took the data we have on notable mathematical theorems in Wolfram|Alpha and the Wolfram Language, and made a histogram of the ages of people who proved them. It’s not a completely uniform distribution (though the peak just before 40 is probably just a theorem-selection effect associated with Fields Medals), but particularly if one corrects for life expectancies now and in the past it’s a far cry from showing that mathematical productivity has all but dried up by age 30.

Distribution of the ages at which mathematicians proved "notable" theorems

My own feeling—as someone who’s getting older myself—is that at least up to my age, many aspects of scientific and technical productivity actually steadily increase. For a start, it really helps to know more—and certainly a lot of my best ideas have come from making connections between things I’ve learned decades apart. It also helps to have more experience and intuition about how things will work out. And if one has earlier successes, those can help provide the confidence to move forward more definitively, without second guessing. Of course, one must maintain the constitution to focus with enough intensity—and be able to concentrate for long enough—to think through complex things. I think in some ways I’ve gotten slower over the years, and in some ways faster. I’m slower because I know more about mistakes I make, and try to do things carefully enough to avoid them. But I’m faster because I know more and can shortcut many more things. Of course, for me in particular, it also helps that over the years I’ve built all sorts of automation that I’ve been able to make use of.

A quite different point is that while making specific contributions to an existing area (as Hardy did) is something that can potentially be done by the young, creating a whole new structure tends to require the broader knowledge and experience that comes with age.

But back to Hardy. I suspect it was a lack of motivation rather than ability, but in his last years, he became quite dispirited and all but dropped mathematics. He died in 1947 at the age of 70.

Littlewood, who was a decade younger than Hardy, lived on until 1977. Littlewood was always a little more adventurous than Hardy, a little less austere, and a little less august. Like Hardy, he never married—though he did have a daughter (with the wife of the couple who shared his vacation home) whom he described as his “niece” until she was in her forties. And—giving the lie to Hardy’s claim about math being a young man’s game—Littlewood (helped by getting early antidepressant drugs at the age of 72) had remarkably productive years of mathematics in his 80s.

Ramanujan’s Mathematics

What became of Ramanujan’s mathematics? For many years, not too much. Hardy pursued it some, but the whole field of number theory—which was where the majority of Ramanujan’s work was concentrated—was out of fashion. Here’s a plot of the fraction of all math papers tagged as “number theory” as a function of time in the Zentralblatt database:

Fraction of mathematics papers tagged as "number theory" vs. time

Ramanujan’s interest may have been to some extent driven by the peak in the early 1900s (which would probably go even higher with earlier data). But by the 1930s, the emphasis of mathematics had shifted away from what seemed like particular results in areas like number theory and calculus, towards the greater generality and formality that seemed to exist in more algebraic areas.

In the 1970s, though, number theory suddenly became more popular again, driven by advances in algebraic number theory. (Other subcategories showing substantial increases at that time include automorphic forms, elementary number theory and sequences.)

Back in the late 1970s, I had certainly heard of Ramanujan—though more in the context of his story than his mathematics. And I was pleased in 1982, when I was writing about the vacuum in quantum field theory, that I could use results of Ramanujan’s to give closed forms for particular cases (of infinite sums in various dimensions of modes of a quantum field—corresponding to Epstein zeta functions):

Using results from Ramanujan in a 1982 physics paper

Starting in the 1970s, there was a big effort—still not entirely complete—to prove results Ramanujan had given in his notebooks. And there were increasing connections being found between the particular results he’d got, and general themes emerging in number theory.

A significant part of what Ramanujan did was to study so-called special functions—and to invent some new ones. Special functions—like the zeta function, elliptic functions, theta functions, and so on—can be thought of as defining convenient “packets” of mathematics. There are an infinite number of possible functions one can define, but what get called “special functions” are ones whose definitions survive because they turn out to be repeatedly useful.

And today, for example, in Mathematica and the Wolfram Language we have RamanujanTau, RamanujanTauL, RamanujanTauTheta and RamanujanTauZ as special functions. I don’t doubt that in the future we’ll have more Ramanujan-inspired functions. In the last year of his life, Ramanujan defined some particularly ambitious special functions that he called “mock theta functions”—and that are still in the process of being made concrete enough to routinely compute.

If one looks at the definition of Ramanujan’s tau function it seems quite bizarre (notice the “24”):

The seemingly arbitrary definition of Ramanujan's tau function
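
As a small check, one can compare the coefficients of that product with the built-in function (truncating the infinite product is enough for the first few coefficients):

(* coefficients of q Product[(1 - q^k)^24] ... *)
CoefficientList[Normal[Series[q Product[(1 - q^k)^24, {k, 1, 10}], {q, 0, 6}]], q]
(* {0, 1, -24, 252, -1472, 4830, -6048} *)

(* ... agree with the built-in RamanujanTau *)
RamanujanTau /@ Range[6]   (* {1, -24, 252, -1472, 4830, -6048} *)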

And to my mind, the most remarkable thing about Ramanujan is that he could define something as seemingly arbitrary as this, and have it turn out to be useful a century later.

Are They Random Facts?

In antiquity, the Pythagoreans made much of the fact that 1+2+3+4=10. But to us today, this just seems like a random fact of mathematics, not of any particular significance. When I look at Ramanujan’s results, many of them also seem like random facts of mathematics. But the amazing thing that’s emerged over the past century, and particularly over the past few decades, is that they’re not. Instead, more and more of them are being found to be connected to deep, elegant mathematical principles.

To enunciate these principles in a direct and formal way requires layers of abstract mathematical concepts and language which have taken decades to develop. But somehow, through his experiments and intuition, Ramanujan managed to find concrete examples of these principles. Often his examples look quite arbitrary—full of seemingly random definitions and numbers. But perhaps it’s not surprising that that’s what it takes to express modern abstract principles in terms of the concrete mathematical constructs of the early twentieth century. It’s a bit like a poet trying to express deep general ideas—but being forced to use only the imperfect medium of human natural language.

It’s turned out to be very challenging to prove many of Ramanujan’s results. And part of the reason seems to be that to do so—and to create the kind of narrative needed for a good proof—one actually has no choice but to build up much more abstract and conceptually complex structures, often in many steps.

So how is it that Ramanujan managed in effect to predict all these deep principles of later mathematics? I think there are two basic logical possibilities. The first is that if one drills down from any sufficiently surprising result, say in number theory, one will eventually reach a deep principle in the effort to explain it. And the second possibility is that while Ramanujan did not have the wherewithal to express it directly, he had what amounts to an aesthetic sense of which seemingly random facts would turn out to fit together and have deeper significance.

I’m not sure which of these possibilities is correct, and perhaps it’s a combination. But to understand this a little more, we should talk about the overall structure of mathematics. In a sense mathematics as it’s practiced is strangely perched between the trivial and the impossible. At an underlying level, mathematics is based on simple axioms. And it could be—as it is, say, for the specific case of Boolean algebra—that given the axioms there’s a straightforward procedure to figure out whether any particular result is true. But ever since Gödel’s theorem in 1931 (which Hardy must have been aware of, but apparently never commented on) it’s been known that for an area like number theory the situation is quite different: there are statements one can give within the context of the theory whose truth or falsity is undecidable from the axioms.

It was proved in 1970 (building on work stretching back through the 1950s and 1960s) that there are polynomial equations involving integers where it’s undecidable from the axioms of arithmetic—or in effect from the formal methods of number theory—whether or not the equations have solutions. The particular examples of classes of equations where it’s known that this happens are extremely complex. But from my investigations in the computational universe, I’ve long suspected that there are vastly simpler equations where it happens too. Over the past several decades, I’ve had the opportunity to poll some of the world’s leading number theorists on where they think the boundary of undecidability lies. Opinions differ, but it’s certainly within the realm of possibility that for example cubic equations with three variables could exhibit undecidability.

So the question then is, why should the truth of what seem like random facts of number theory even be decidable? In other words, it’s perfectly possible that Ramanujan could have stated a result that simply can’t be proved true or false from the axioms of arithmetic. Conceivably the Goldbach conjecture will turn out to be an example. And so could many of Ramanujan’s results.

Some of Ramanujan’s results have taken decades to prove—but the fact that they’re provable at all is already important information. For it suggests that in a sense they’re not just random facts; they’re actually facts that can somehow be connected by proofs back to the underlying axioms.

And I must say that to me this tends to support the idea that Ramanujan had intuition and aesthetic criteria that in some sense captured some of the deeper principles we now know, even if he couldn’t express them directly.

Automating Ramanujan

It’s pretty easy to start picking mathematical statements, say at random, and then getting empirical evidence for whether they’re true or not. Gödel’s Theorem effectively implies that you’ll never know how far you’ll have to go to be certain of any particular result. Sometimes it won’t be far, but sometimes it may in a sense be arbitrarily far.

Ramanujan no doubt convinced himself of many of his results by what amount to empirical methods—and often it worked well. In the case of the counting of primes, however, as Hardy pointed out, things turn out to be more subtle, and results that might work up to very large numbers can eventually fail.

So let’s say one looks at the space of possible mathematical statements, and picks statements that appear empirically at least to some level to be true. Now the next question: are these statements connected in any way?

Imagine one could find proofs of the statements that are true. These proofs effectively correspond to paths through a directed graph that starts with the axioms, and leads to the true results. One possibility is then that the graph is like a star—with every result being independently proved from the axioms. But another possibility is that there are many common “waypoints” in getting from the axioms to the results. And it’s these waypoints that in effect represent general principles.

If there’s a certain sparsity to true results, then it may be inevitable that many of them are connected through a small number of general principles. It might also be that there are results that aren’t connected in this way, but these results, perhaps just because of their lack of connections, aren’t considered “interesting”—and so are effectively dropped when one thinks about a particular subject.

I have to say that these considerations lead to an important question for me. I have spent many years studying what amounts to a generalization of mathematics: the behavior of arbitrary simple programs in the computational universe. And I’ve found that there’s a huge richness of complex behavior to be seen in such programs. But I have also found evidence—not least through my Principle of Computational Equivalence—that undecidability is rife there.

But now the question is, when one looks at all that rich and complex behavior, are there in effect Ramanujan-like facts to be found there? Ultimately there will be much that can’t readily be reasoned about in axiom systems like the ones in mathematics. But perhaps there are networks of facts that can be reasoned about—and that all connect to deeper principles of some kind.

We know from the idea around the Principle of Computational Equivalence that there will always be pockets of “computational reducibility”: places where one will be able to identify abstract patterns and make abstract conclusions without running into undecidability. Repetitive behavior and nested behavior are two almost trivial examples. But now the question is whether among all the specific details of particular programs there are other general forms of organization to be found.

Of course, whereas repetition and nesting are seen in a great many systems, it could be that another form of organization would be seen only much more narrowly. But we don’t know. And as of now, we don’t really have much of a handle on finding out—at least until or unless there’s a Ramanujan-like figure not for traditional mathematics but for the computational universe.

Modern Ramanujans?

Will there ever be another Ramanujan? I don’t know if it’s the legend of Ramanujan or just a natural feature of the way the world is set up, but for at least 30 years I’ve received a steady stream of letters that read a bit like the one Hardy got from Ramanujan back in 1913. Just a few months ago, for example, I received an email (from India, as it happens) with an image of a notebook listing various mathematical expressions that are numerically almost integers—very much like the e^(π√n) almost integers of Ramanujan’s discussed above.

Mathematical results received in the mail a few months ago

Are these numerical facts significant? I don’t know. Wolfram|Alpha can certainly generate lots of similar facts, but without Ramanujan-like insight, it’s hard to tell which, if any, are significant.

Wolfram|Alpha finds possible formulas for a number
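
A few lines of Wolfram Language can mechanically generate candidates of exactly this kind, though deciding which of them are “interesting” is another matter entirely. Here is a sketch, with an ad hoc tolerance of my own choosing (the famous 163 is among the values that pass):

(* search for n where Exp[Pi Sqrt[n]] is unusually close to an integer *)
nearIntegerQ[x_] := Abs[x - Round[x]] < 10^-5
Select[Range[200], nearIntegerQ[N[Exp[Pi Sqrt[#]], 40]] &]   (* includes 58, 67 and 163 *)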

Over the years I’ve received countless communications a bit like this one. Number theory is a common topic. So are relativity and gravitation theory. And particularly in recent years, AI and consciousness have been popular too. The nice thing about letters related to math is that there’s typically something immediately concrete in them: some specific formula, or fact, or theorem. In Hardy’s day it was hard to check such things; today it’s a lot easier. But—as in the case of the almost integer above—there’s then the question of whether what’s being said is somehow “interesting”, or whether it’s just a “random uninteresting fact”.

Needless to say, the definition of “interesting” isn’t an easy or objective one. And in fact the issues are very much the same as Hardy faced with Ramanujan’s letter. If one can see how what’s being presented fits into some bigger picture—some narrative—that one understands, then one can tell whether, at least within that framework, something is “interesting”. But if one doesn’t have the bigger picture—or if what’s being presented is just “too far out”—then one really has no way to tell if it should be considered interesting or not.

When I first started studying the behavior of simple programs, there really wasn’t a context for understanding what was going on in them. The pictures I got certainly seemed visually interesting. But it wasn’t clear what the bigger intellectual story was. And it took quite a few years before I’d accumulated enough empirical data to formulate hypotheses and develop principles that let one go back and see what was and wasn’t interesting about the behavior I’d observed.

I’ve put a few decades into developing a science of the computational universe. But it’s still young, and there is much left to discover—and it’s a highly accessible area, with no threshold of elaborate technical knowledge. And one consequence of this is that I frequently get letters that show remarkable behavior in some particular cellular automaton or other simple program. Often I recognize the general form of the behavior, because it relates to things I’ve seen before, but sometimes I don’t—and so I can’t be sure what will or won’t end up being interesting.

Back in Ramanujan’s day, mathematics was a younger field—not quite as easy to enter as the study of the computational universe, but much closer than modern academic mathematics. And there were plenty of “random facts” being published: a particular type of integral done for the first time, or a new class of equations that could be solved. Many years later we would collect as many of these as we could to build them into the algorithms and knowledgebase of Mathematica and the Wolfram Language. But at the time probably the most significant aspect of their publication was the proofs that were given: the stories that explained why the results were true. Because in these proofs, there was at least the potential that concepts were introduced that could be reused elsewhere, and build up part of the fabric of mathematics.

It would take us too far afield to discuss this at length here, but there is a kind of analog in the study of the computational universe: the methodology for computer experiments. Just as a proof can contain elements that define a general methodology for getting a mathematical result, so the particular methods of search, visualization or analysis can define something in computer experiments that is general and reusable, and can potentially give an indication of some underlying idea or principle.

And so, a bit like many of the mathematics journals of Ramanujan’s day, I’ve tried to provide a journal and a forum where specific results about the computational universe can be reported—though there is much more that could be done along these lines.

When a letter one receives contains definite mathematics, in mathematical notation, there is at least something concrete one can understand in it. But plenty of things can’t usefully be formulated in mathematical notation. And too often, unfortunately, letters are in plain English (or worse, for me, other languages) and it’s almost impossible for me to tell what they’re trying to say. But now there’s something much better that people increasingly do: formulate things in the Wolfram Language. And in that form, I’m always able to tell what someone is trying to say—although I still may not know if it’s significant or not.

Over the years, I’ve been introduced to many interesting people through letters they’ve sent. Often they’ll come to our Summer School, or publish something in one of our various channels. I have no story (yet) as dramatic as Hardy and Ramanujan. But it’s wonderful that it’s possible to connect with people in this way, particularly in their formative years. And I can’t forget that a long time ago, I was a 14-year-old who mailed papers about the research I’d done to physicists around the world…

What If Ramanujan Had Mathematica?

Ramanujan did his calculations by hand—with chalk on slate, or later pencil on paper. Today with Mathematica and the Wolfram Language we have immensely more powerful tools with which to do experiments and make discoveries in mathematics (not to mention the computational universe in general).

It’s fun to imagine what Ramanujan would have done with these modern tools. I rather think he would have been quite an adventurer—going out into the mathematical universe and finding all sorts of strange and wonderful things, then using his intuition and aesthetic sense to see what fits together and what to study further.

Ramanujan unquestionably had remarkable skills. But I think the first step to following in his footsteps is just to be adventurous: not to stay in the comfort of well-established mathematical theories, but instead to go out into the wider mathematical universe and start finding—experimentally—what’s true.

It’s taken the better part of a century for many of Ramanujan’s discoveries to be fitted into a broader and more abstract context. But one of the great inspirations that Ramanujan gives us is that it’s possible with the right sense to make great progress even before the broader context has been understood. And I for one hope that many more people will take advantage of the tools we have today to follow Ramanujan’s lead and make great discoveries in experimental mathematics—whether they announce them in unexpected letters or not.

Something I Learned in Kindergarten


Fifty years ago today there was a six-year-old at a kindergarten (“nursery school” in British English) in Oxford, England who was walking under some trees and noticed that the patches of light under the trees didn’t look the same as usual. Curious, he looked up at the sun. It was bright, but he could see that one side of it seemed to be missing. And he realized that was why the patches of light looked odd.

He’d heard of eclipses. He didn’t really understand them. But he had the idea that that was what he was seeing. Excited, he told another kid about it. They hadn’t heard of eclipses. But he pointed out that the sun had a bite taken out of it. The other kid looked up. Perhaps the sun was too bright, but they looked away without noticing anything. Then the first kid tried another kid. And then another. None of them believed him about the eclipse and the bite taken out of the sun.

Of course, this is a story about me. And now I can find the eclipse by going to Wolfram|Alpha (or the Wolfram Language):

Eclipse 50 years ago

And, yes, it was fun to see my first eclipse (almost exactly 25 years later, I finally saw a total eclipse too). But my real takeaway from that day was about the world and about people. Even if you notice something as obvious as a bite taken out of the side of the sun, there’s no guarantee that you can convince anyone else that it’s there.

It’s been very helpful to me over the past fifty years to understand that. There’ve been so many times in my life in science, technology and business where things seemed as obvious to me as the bite taken out of the sun. And quite often it’s been easy to get other people to see them too. But sometimes they just don’t.

When they find out that people don’t agree with something that seems obvious to them, many people will just conclude that they’re the ones who are wrong. That even though it seems obvious to them, the “crowd” must be right, and they themselves must somehow be confused. Fifty years ago today I learned that wasn’t true. Perhaps it made me more obstinate, but I could list quite a few pieces of science and technology that I rather suspect wouldn’t exist today if it hadn’t been for that kindergarten experience of mine.

As I write this, I feel an urge to tell a few other stories—and lessons learned—from kindergarten. I should explain that I went to a kindergarten with lots of smart kids, mostly children of Oxford academics. They certainly seemed very bright to me at the time—and, interestingly, many of them have ended up having distinguished lives and careers.

In many ways, the kids were much brighter than most of the teachers. I remember one teacher with the curious theory that children’s minds were like elastic bands—and that if children learned too much, their minds would snap. Of course, those were the days when Bible Study was part of pretty much any school’s curriculum in the UK, and it was probably very annoying that I would come in every day and regale everyone with stories about dinosaurs and geology when the teacher just wanted people to learn Genesis stories.

I don’t think I was great at “doing what the other kids do”. When I was three years old, and first at school, there was a time when everyone was supposed to run around “like a bus” (I guess ignoring the fact that buses go on roads…). I didn’t want to do it, and just stood in one place. “Why aren’t you being a bus?”, the teacher asked. “Well, I am a lamp post”, I said. They seemed sufficiently taken aback by that response that they left me alone.

I learned an important lesson when I was about five, from another kid. (The kid in question happened to grow up to become a distinguished mathematician—and she was even knighted recently for her contributions to mathematics—but that’s not really relevant to the story.) We were supposed to be hammering nails into pieces of wood. Yes, in those days in the UK they let five-year-olds do that. Anyway, she had the hammer and said “Can you hold the nail? Trust me, I know what I’m doing.” Needless to say, she missed the nail. My thumb was black for several days. But it was a small price to pay for a terrific life lesson: just because someone claims to know what they’re talking about doesn’t mean they do. And nowadays, when I’m dealing with some expert who says “trust me, I know what I’m talking about”, I can’t help but have my mind wander back half a century to that moment just before the hammer fell.

I’ll relate two more stories. The first one I’m not sure how I feel about now. It had to do with learning addition. Now, realistically, I have a good memory (which is perhaps obvious given that I’m writing about things that happened 50 years ago). So I could perfectly well have just memorized all my addition facts. But somehow I didn’t want to. And one day I noticed that if I put two rulers next to each other, I could make a little machine that would add for me—an “addition slide rule”. So whenever we were doing additions, I always “happened” to have two rulers on my desk. When it came to multiplication, I didn’t memorize that either—though in that case I discovered I could go far by knowing the single fact that 7×8=56—because that was the fact other kids didn’t know. (In the end, it took until I was in my forties before I’d finally learned every part of my multiplication table up to 12×12.) And as I look at Wolfram|Alpha and Mathematica and so on, and think about my addition slide rule, I’m reminded of the theory that people never really change….

My final story comes from around the same time as the eclipse. Back then, the UK used non-decimal currency: there were 12 pennies in a shilling, and 20 shillings in a pound. And one of the exercises for us kids was to do mixed-radix arithmetic with these things. I was very pleased with myself one day when I figured out that money didn’t have to work this way; that everything could be base 10 (well, I didn’t explicitly know the concept of base 10 yet). I told this to a teacher. They were a little confused, but said that currency had worked the same way for hundreds of years, and wasn’t going to change. A couple of years later, the UK announced it was going to decimalize its currency. (I suspect if it had waited longer it would still have non-decimal currency, and there would just be a big market for calculators that could compute with it.) I’ve kept this little incident with me all these years—as a reminder that things can change, even if they’ve been the way they are for a very long time. Oh, and again, that one shouldn’t necessarily believe what one’s told. But I guess that’s a theme….

Solomon Golomb (1932–2016)


The Most-Used Mathematical Algorithm Idea in History

An octillion. A billion billion billion. That’s a fairly conservative estimate of the number of times a cellphone or other device somewhere in the world has generated a bit using a maximum-length linear-feedback shift register sequence. It’s probably the single most-used mathematical algorithm idea in history. And the main originator of this idea was Solomon Golomb, who died on May 1—and whom I knew for 35 years.

Solomon Golomb’s classic book Shift Register Sequences, published in 1967—based on his work in the 1950s—went out of print long ago. But its content lives on in pretty much every modern communications system. Read the specifications for 3G, LTE, Wi-Fi, Bluetooth, or for that matter GPS, and you’ll find mentions of polynomials that determine the shift register sequences these systems use to encode the data they send. Solomon Golomb is the person who figured out how to construct all these polynomials.

He was also in charge of the effort that first used radar to find the distance to Venus, and of working out how to encode images to be sent from Mars. He introduced the world to what he called polyominoes, which later inspired Tetris (“tetromino tennis”). He created and solved countless math and wordplay puzzles. And—as I learned about 20 years ago—he came very close to discovering my all-time-favorite rule 30 cellular automaton all the way back in 1959, the year I was born.

How I Met Sol Golomb

Most of the scientists and mathematicians I know I met first through professional connections. But not Sol Golomb. It was 1981, and I was at Caltech, a 21-year-old physicist who’d just received some media attention from being the youngest in the first batch of MacArthur award recipients. I get a knock at my office door—and a young woman is there. Already this was unusual, because in those days there were hopelessly few women to be found around a theoretical high-energy physics group. I was a sheltered Englishman who’d been in California a couple of years, but hadn’t really ventured outside the university—and was ill prepared for the burst of Southern Californian energy that dropped in to see me that day. She introduced herself as Astrid, and said that she’d been visiting Oxford and knew someone I’d been at kindergarten with. She explained that she had a personal mandate to collect interesting acquaintances around the Pasadena area. I think she considered me a difficult case, but persisted nevertheless. And one day when I tried to explain something about the work I was doing she said, “You should meet my father. He’s a bit old, but he’s still as sharp as a tack.” And so it was that Astrid Golomb, oldest daughter of Sol Golomb, introduced me to Sol Golomb.

The Golombs lived in a house perched in the hills near Pasadena. I learned that they had two daughters—Astrid, a little older than me, an aspiring Hollywood person, and Beatrice, about my age, a high-powered science type. The Golomb sisters often had parties, usually at their family’s house. There were themes, like the flamingoes & hedgehogs croquet garden party (“recognition will be given to the person who appears most appropriately attired”), or the Stonehenge party with instructions written using runes. The parties had an interesting cross-section of young and not-so-young people, including various local luminaries. And always there, hanging back a little, was Sol Golomb, a small man with a large beard and a certain elf-like quality to him, typically wearing a dark suit coat.

I gradually learned a little about Sol Golomb. That he was involved in “information theory”. That he worked at USC (the University of Southern California). That he had various unspecified but apparently high-level government and other connections. I’d heard of shift registers, but didn’t really know anything much about them.

Then in the fall of 1982, I visited Bell Labs in New Jersey and gave a talk about my latest results on cellular automata. One topic I discussed was what I called “additive” or “linear” cellular automata—and their behavior with limited numbers of cells. Whenever a cellular automaton has a limited number of cells, it’s inevitable that its behavior will eventually repeat. But as the size increases, the maximum repetition period—say for the rule 90 additive cellular automaton—bounces around seemingly quite randomly: 1, 1, 3, 2, 7, 1, 7, 6, 31, 4, 63, …. A few days before my talk, however, I’d noticed that these periods actually seemed to follow a formula that depended on things like the prime factorization of the number of cells. But when I mentioned this during the talk, someone at the back put up their hand and asked, “Do you know if it works for the case n=37?” My experiments hadn’t gotten as far as the size-37 case yet, so I didn’t know. But why would someone ask that?

The person who asked turned out to be a certain Andrew Odlyzko, a number theorist at Bell Labs. I asked him, “What on earth makes you think there might be something special about n=37?” “Well,” he said, “I think what you’re doing is related to the theory of linear feedback shift registers,” and he suggested that I look at Sol Golomb’s book (“Oh yes,” I said, “I know his daughters…”). Andrew was indeed correct: there is a very elegant theory of additive cellular automata based on polynomials that is similar to the theory Sol developed for linear feedback shift registers. Andrew and I ended up writing a now-rather-well-cited paper about it (it’s interesting because it’s a rare case where traditional mathematical methods let one say things about nontrivial cellular automaton behavior). And for me, a side effect was that I learned something about what the somewhat mysterious Sol Golomb actually did. (Remember, this was before the web, so one couldn’t just instantly look everything up.)

The Story of Sol Golomb

Solomon Golomb was born in Baltimore, Maryland in 1932. His family came from Lithuania. His grandfather had been a rabbi; his father moved to the US when he was young, and got a master’s degree in math before switching to medieval Jewish philosophy and also becoming a rabbi. Sol’s mother came from a prominent Russian family that had made boots for the Tsar’s army and then ran a bank. Sol did well in school, notably being a force in the local debating scene. Encouraged by his father, he developed an interest in mathematics, publishing a problem he invented about primes when he was 17. After high school, Sol enrolled at Johns Hopkins University to study math, narrowly avoiding a quota on Jewish students by promising he wouldn’t switch to medicine—and took twice the usual course load, graduating in 1951 after half the usual time.

From there he would go to Harvard for graduate school in math. But first he took a summer job at the Glenn L. Martin Company, an aerospace firm founded in 1912 that had moved to Baltimore from Los Angeles in the 1920s and mostly become a defense contractor—and that would eventually merge into Lockheed Martin. At Harvard, Sol specialized in number theory, and in particular in questions about characterizations of sets of prime numbers. But every summer he would return to the Martin Company. As he later described it, he found that at Harvard “the question of whether anything that was taught or studied in the mathematics department had any practical applications could not even be asked, let alone discussed”. But at the Martin Company, he discovered that the pure mathematics he knew—even about primes and things—did indeed have practical applications, and very interesting ones, especially to shift registers.

The first summer he was at the Martin Company, Sol was assigned to a control theory group. But by his second summer, he’d been put in a group studying communications. And in June 1954 it so happened that his supervisor had just gone to a conference where he’d heard about strange behavior observed in linear feedback shift registers (he called them “tapped delay lines with feedback”)—and he asked Sol if he could investigate. It didn’t take Sol long to realize that what was going on could be very elegantly studied using the pure mathematics he knew about polynomials over finite fields. Over the year that followed, he split his time between graduate school at Harvard and consulting for the Martin Company, and in June 1955 he wrote his final report, “Sequences with Randomness Properties”—which would basically become the foundational document of the theory of shift register sequences.

Sol liked math puzzles, and in the process of thinking about a puzzle involving arranging dominoes on a checkerboard, he ended up inventing what he called “polyominoes”. He gave a talk about them in November 1953 at the Harvard Mathematics Club, published a paper about them (his first research publication), won a Harvard math prize for his work on them, and, as he later said, then “found [himself] irrevocably committed to their care and feeding” for the rest of his life.

Pages from Solomon Golomb's first research paper

 

In June 1955, Sol went to spend a year at the University of Oslo on a Fulbright Fellowship—partly so he could work with some distinguished number theorists there, and partly so he could add Norwegian, Swedish and Danish (and some runic scripts) to his collection of language skills. While he was there, he finished a long paper on prime numbers, but also spent time traveling around Scandinavia, and in Denmark met a young woman named Bo (Bodil Rygaard)—who came from a large family in a rural area mostly known for its peat moss, but had managed to get into university and was studying philosophy. Sol and Bo apparently hit it off, and within months, they were married.

When they returned to the US in July 1956, Sol interviewed in a few places, then accepted a job at JPL—the Jet Propulsion Lab that had spun off from Caltech, initially to do military work. Sol was assigned to the Communications Research Group, as a Senior Research Engineer. It was a time when the people at JPL were eager to try launching a satellite. At first, the government wouldn’t let them do it, fearing it would be viewed as a military act. But that all changed in October 1957 when the Soviet Union launched Sputnik, ostensibly as part of the International Geophysical Year. Amazingly, it took only 3 months for the US to launch Explorer 1. JPL built much of it, and Sol’s lab (where he had technicians building electronic implementations of shift registers) was diverted into doing things like making radiation detectors (including, as it happens, the ones that discovered the Van Allen radiation belts)—while Sol himself worked on using radar to determine the orbit of the satellite when it was launched, taking a little time out to go back to Harvard for his final PhD exam.

It was a time of great energy around JPL and the space program. In May 1958 a new Information Processing Group was formed, and Sol was put in charge—and in the same month, Sol’s first child, the aforementioned Astrid, was born. Sol continued his research on shift register sequences—particularly as applied to jamming-resistant radio control of missiles. In May 1959, Sol’s second child arrived—and was named Beatrice, forming a nice A, B sequence. In the fall of 1959, Sol took a sabbatical at MIT, where he got to know Claude Shannon and a number of other MIT luminaries, and got involved in information theory and the theory of algebraic codes.

As it happens, he’d already done some work on coding theory—in the area of biology. The digital nature of DNA had been discovered by Jim Watson and Francis Crick in 1953, but it wasn’t yet clear just how sequences of the four possible base pairs encoded the 20 amino acids. In 1956, Max Delbrück—Jim Watson’s former postdoc advisor at Caltech—asked around at JPL if anyone could figure it out. Sol and two colleagues analyzed an idea of Francis Crick’s and came up with “comma-free codes” in which overlapping triples of base pairs could encode amino acids. The analysis showed that exactly 20 amino acids could be encoded this way. It seemed like an amazing explanation of what was seen—but unfortunately it isn’t how biology actually works (biology uses a more straightforward encoding, where some of the 64 possible triples just don’t represent anything).

In addition to biology, Sol was also pulled into physics. His shift register sequences were useful for doing range finding with radar (much as they’re used now in GPS), and at Sol’s suggestion, he was put in charge of trying to use them to find the distance to Venus. And so it was that in early 1961—when the Sun, Venus, and Earth were in alignment—Sol’s team used the 85-foot Goldstone radio dish in the Mojave Desert to bounce a radar signal off Venus, and dramatically improve our knowledge of the Earth-Venus and Earth-Sun distances.

With his interest in languages, coding and space, it was inevitable that Sol would get involved in the question of communications with extraterrestrials. In 1961 he wrote a paper for the Air Force entitled “A Short Primer for Extraterrestrial Linguistics”, and over the next several years wrote several papers on the subject for broader audiences. He said that “There are two questions involved in communication with Extraterrestrials. One is the mechanical issue of discovering a mutually acceptable channel. The other is the more philosophical problem (semantic, ethic, and metaphysical) of the proper subject matter for discourse. In simpler terms, we first require a common language, and then we must think of something clever to say.” He continued, with a touch of his characteristic humor: “Naturally, we must not risk telling too much until we know whether the Extraterrestrials’ intentions toward us are honorable. The Government will undoubtedly set up a Cosmic Intelligence Agency (CIA) to monitor Extraterrestrial Intelligence. Extreme security precautions will be strictly observed. As H. G. Wells once pointed out [or was it an episode of The Twilight Zone?], even if the Aliens tell us in all truthfulness that their only intention is ‘to serve mankind,’ we must endeavor to ascertain whether they wish to serve us baked or fried.”

While at JPL, Sol had also been teaching some classes at the nearby universities: Caltech, USC and UCLA. In the fall of 1962, following some changes at JPL—and perhaps because he wanted to spend more time with his young children—he decided to become a full-time professor. He got offers from all three schools. He wanted to go somewhere where he could “make a difference”. He was told that at Caltech “no one has any influence if they don’t at least have a Nobel Prize”, while at UCLA “the UC bureaucracy is such that no one ever has any ability to affect anything”. The result was that—despite its much-inferior reputation at the time—Sol chose USC. He went there in the spring of 1963 as a Professor of Electrical Engineering—and ended up staying for 53 years.

Shift Registers

Before going on with the story of Sol’s life, I should explain what a linear feedback shift register (LFSR) actually is. The basic idea is simple. Imagine a row of squares, each containing either 1 or 0 (say, black or white). In a pure shift register all that happens is that at each step all values shift one position to the left. The leftmost value is lost, and a new value is “shifted in” from the right. The idea of a feedback shift register is that the value that’s shifted in is determined (or “fed back”) from values at other positions in the shift register. In a linear feedback shift register, the values from “taps” at particular positions in the register are combined by being added mod 2 (so that 1⊕1=0 instead of 2), or equivalently XOR’ed (“exclusive or”, true if either is true, but not both).
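
Here’s a minimal sketch of such a step in the Wolfram Language—with an illustrative size-7 register, and (just for concreteness) feedback taken from the two end cells:

step[s_] := Append[Rest[s], BitXor[First[s], Last[s]]]  (* drop the leftmost bit; XOR two taps; shift the new bit in at the right *)
Column[NestList[step, {1, 0, 0, 0, 0, 0, 0}, 8]]        (* the register state over the first few steps *)

At each step the bits march along, with one new feedback bit appearing at the end.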

Linear feedback shift register

If one runs this for a while, here’s what happens:

The shift register sequence when run for a while

Obviously the shift register is always shifting bits to the left. And it has a very simple rule for how bits should be added at the right. But if one looks at the sequence of these bits, it seems rather random—though, as the picture shows, it does eventually repeat. What Sol Golomb did was to find an elegant mathematical way to analyze such sequences, and how they repeat.

If a shift register has size n, then it has 2^n possible states altogether (corresponding to all possible sequences of 0s and 1s of length n). Since the rules for the shift register are deterministic, any given state must always go to the same next state. And that means the maximum possible number of steps the shift register could conceivably go through before it repeats is 2^n (actually, it’s 2^n – 1, because the state with all 0s can’t evolve into anything else).

In the example above, the shift register is of size 7, and it turns out to repeat after exactly 2^7 – 1 = 127 steps. But which shift registers—with which particular arrangements of taps—will produce sequences with maximal lengths? This is the first question Sol Golomb set out to investigate in the summer of 1954. His answer was simple and elegant.
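
It’s easy to check that period directly in the Wolfram Language. Here’s a minimal sketch, using a size-7 register whose feedback recurrence corresponds (in one common convention) to the polynomial x^7 + x^6 + 1 discussed below:

step[s_] := Append[Rest[s], Mod[First[s] + Last[s], 2]];  (* feedback: XOR of two tap positions *)
init = {1, 0, 0, 0, 0, 0, 0};                              (* any nonzero starting state will do *)
Length[NestWhileList[step, step[init], # =!= init &]]      (* count steps until the state first repeats *)
(* -> 127, i.e. 2^7 - 1: the maximal possible length *)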

The shift register above has taps at positions 7, 6 and 1. Sol represented this algebraically, using the polynomial x^7 + x^6 + 1. Then what he showed was that the sequence that would be generated would be of maximal length if this polynomial is “irreducible modulo 2”, so that it can’t be factored, making it sort of the analog of a prime among polynomials—as well as having some other properties that make it a so-called “primitive polynomial”. Nowadays, with Mathematica and the Wolfram Language, it’s easy to test things like this:

Irreducible polynomial in the Wolfram Language: IrreduciblePolynomialQ[x^7 + x^6 + 1, Modulus -> 2]

Back in 1954, Sol had to do all this by hand, but came up with a fairly long table of primitive polynomials corresponding to shift registers that give maximal length sequences:

Sol's table of "Irreducible polynomials modulo 2, through degree 11, with their periods"

 

The Prehistory of Shift Registers

The idea of maintaining short-term memory by having “delay lines” that circulate digital pulses (say in an actual column of mercury) goes back to the earliest days of electronic computers. By the late 1940s such delay lines were routinely being implemented purely digitally, using sequences of vacuum tubes, and were being called “shift registers”. It’s not clear when the first feedback shift registers were built. Perhaps it was at the end of the 1940s. But it’s still shrouded in mystery—because the first place they seem to have been used was in military cryptography.

The basic idea of cryptography is to take meaningful messages, and then randomize them so they can’t be recognized, but in such a way that the randomization can always be reversed if you know the key that was used to create it. So-called stream ciphers work by generating long sequences of seemingly random bits, then combining these with some representation of the message—then decoding by having the receiver independently generate the same sequence of seemingly random bits, and “backing this out” of the encoded message received.
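
With XOR as the combining operation, “backing out” the keystream is just XORing again. Here’s a minimal sketch (the bits here are made up for illustration; in practice the keystream would come from something like a shift register):

message   = {1, 0, 1, 1, 0, 0, 1};
keystream = {0, 1, 1, 0, 1, 0, 1};        (* in a real system, e.g. shift register output *)
encoded   = BitXor[message, keystream];   (* sender: combine message with keystream *)
BitXor[encoded, keystream] === message    (* receiver: XOR with the same keystream to recover it *)
(* -> True *)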

Linear feedback shift registers seem at first to have been prized for cryptography because of their long repetition periods. As it turns out, the mathematical analysis Sol used to find things like these periods also makes clear that such shift registers aren’t good for secure cryptography. But in the early days, they seemed pretty good—particularly compared to, say, successive rotor positions in an Enigma machine—and there’s been a persistent rumor that, for example, Soviet military cryptosystems were long based on them.

Back in 2001, when I was working on history notes for my book A New Kind of Science, I had a long phone conversation with Sol about shift registers. Sol told me that when he started out, he didn’t know anything about cryptographic work on shift registers. He said that people at Bell Labs, Lincoln Labs and JPL had also started working on shift registers around the same time he did—though perhaps through knowing more pure mathematics, he managed to get further than they did, and in the end his 1955 report basically defined the field.

Over the years that followed, Sol gradually heard about various precursors of his work in the pure mathematical literature. Way back in the year 1202 Fibonacci was already talking about what are now called Fibonacci numbers—and which are generated by a recurrence relation that can be thought of as an analog of a linear feedback shift register, but working with arbitrary integers rather than 0s and 1s. There was a little work on recurrences with 0s and 1s done in the early 1900s, but the first large-scale study seems to have been by Øystein Ore, who, curiously, came from the University of Oslo, though was by then at Yale. Ore had a student named Marshall Hall—who Sol told me he knew had consulted for the predecessor of the National Security Agency in the late 1940s—possibly about shift registers. But whatever he may have done was kept secret, and so it fell to Sol to discover and publish the story of linear feedback shift registers—even though Sol did dedicate his 1967 book on shift registers to Marshall Hall.

What Are Shift Register Sequences Good For?

Over the years I’ve noticed the principle that systems defined by sufficiently simple rules always eventually end up having lots of applications. Shift registers follow this principle in spades. And for example modern hardware (and software) systems are bristling with shift registers: a typical cellphone probably has a dozen or two, implemented usually in hardware but sometimes in software. (When I say “shift register” here, I mean linear feedback shift register, or LFSR.)

Most of the time, the shift registers that are used are ones that give maximum-length sequences (otherwise known as “m-sequences”). And the reasons they’re used are typically related to some very special properties that Sol discovered about them. One basic property they always have is that they contain the same total number of 0s and 1s (actually, there’s always exactly one extra 1). Sol then showed that they also have the same number of 00s, 01s, 10s and 11s—and the same holds for larger blocks too. This “balance” property is on its own already very useful, for example if one’s trying to efficiently test all possible bit patterns as input to a circuit.

But Sol discovered another, even more important property. Replace each 0 in a sequence by –1, then imagine multiplying each element in a shifted version of the sequence by the corresponding element in the original. What Sol showed is that if one adds up these products over a whole period, they’ll always come to –1—as close to zero as an odd-length sum of ±1s can get—except when there’s no shift at all, in which case they sum to the full length of the sequence. Said more technically, he showed that the sequence has essentially no correlation with shifted versions of itself.

Both this and the balance property will be approximately true for any sufficiently long random sequence of 0s and 1s. But the surprising thing about maximum-length shift register sequences is that these properties are always exactly true. The sequences in a sense have some of the signatures of randomness—but in a very perfect way, made possible by the fact that they’re not random at all, but instead have a very definite, organized structure.
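
Both properties are easy to see experimentally. Here’s a minimal sketch with a maximal-length size-7 register (the same illustrative tap arrangement as above):

step[s_] := Append[Rest[s], Mod[First[s] + Last[s], 2]];
seq = First /@ NestList[step, {1, 0, 0, 0, 0, 0, 0}, 126];   (* one full period: 127 bits *)
Count[seq, 1]                                                 (* -> 64: exactly one more 1 than 0 *)
pm = seq /. 0 -> -1;                                          (* recode 0s as -1s *)
Table[pm . RotateLeft[pm, k], {k, 0, 3}]                      (* correlations with shifted copies *)
(* -> {127, -1, -1, -1}: the full length at zero shift; essentially zero at every other shift *)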

It’s this structure that makes linear feedback shift registers ultimately not suitable for strong cryptography. But they’re great for basic “scrambling” and “cheap cryptography”—and they’re used all over the place for these purposes. A very common objective is just to “whiten” (as in “white noise”) a signal. It’s pretty common to want to transmit data that’s got long sequences of 0s in it. But the electronics that pick these up can get confused if they see what amounts to “silence” for too long. One can avoid the problem by scrambling the original data by combining it with a shift register sequence, so there’s always some kind of “chattering” going on. And that’s indeed what’s done in Wi-Fi, Bluetooth, USB, digital TV, modern Ethernet and lots of other places.

It’s often a nice side effect that the shift register scrambling makes the signal harder to decode—and this is sometimes used to provide at least some level of security. (DVDs use a combination of a size-16 and a size-24 shift register to attempt to encode their data; many GSM phones use a combination of three shift registers to encode all their signals, in a way that was at first secret.)

GPS makes crucial use of shift register sequences too. Each GPS satellite continuously transmits a shift register sequence (from a size-10 shift register, as it happens). A receiver can tell at exactly what time a signal it’s just received was transmitted from a particular satellite by seeing what part of the sequence it got. And by comparing delay times from different satellites, the receiver can triangulate its position. (There’s also a precision mode of GPS, that uses a size-1024 shift register.)

GPS shift register sequences
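
Here’s a toy sketch of the ranging idea—correlating a received, delayed copy of a known code against shifted versions of it (a size-7 code purely for illustration; the real GPS codes come from larger registers, as described above):

step[s_] := Append[Rest[s], Mod[First[s] + Last[s], 2]];
code = (First /@ NestList[step, {1, 0, 0, 0, 0, 0, 0}, 126]) /. 0 -> -1;
received = RotateLeft[code, 37];                                           (* pretend the signal arrives 37 chips late *)
First[Ordering[Table[received . RotateLeft[code, k], {k, 0, 126}], -1]] - 1
(* -> 37: the correlation peak picks out the delay *)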

 

A quite different use of shift registers is for error detection. Say one’s transmitting a block of bits, but each one has a small probability of error. A simple way to let one check for a single error is to include a “parity bit” that says whether there should be an odd or even number of 1s in the block of bits. There are generalizations of this called CRCs (cyclic redundancy checks) that can check for a larger number of errors—and that are computed essentially by feeding one’s data into none other than a linear feedback shift register. (There are also error-correcting codes that let one not only detect but also correct a certain number of errors, and some of these, too, can be computed with shift register sequences—and in fact Sol Golomb used a version of these called Reed–Solomon codes to design the video encoding for Mars spacecraft.)
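
Here’s a minimal sketch of the idea: a CRC is just the remainder of polynomial division mod 2 (the little generator x^3 + x + 1 here is purely illustrative, not one of the real standards):

crcBits[msg_List, gen_, x_] := Module[{r = Exponent[gen, x], rem},
  rem = PolynomialRemainder[FromDigits[msg, x] x^r, gen, x, Modulus -> 2];  (* divide the shifted message polynomial by the generator, mod 2 *)
  Reverse[PadRight[CoefficientList[rem, x], r]]]                            (* the check bits are the remainder's coefficients *)

crcBits[{1, 0, 1, 1, 0, 1}, x^3 + x + 1, x]
(* -> {0, 1, 1} *)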

The list of uses for shift register sequences goes on and on. A fairly exotic example—more popular in the past than now—was to use shift register sequences to jitter the clock in a computer to spread out the frequency at which the CPU would potentially generate radio interference (“select Enable Spread Spectrum in the BIOS”).

One of the single most prominent uses of shift register sequences is in cellphones, for what’s called CDMA (Code Division Multiple Access). Cellphones got their name because they operate in “cells”, with all phones in a given cell being connected to a particular tower. But how do different cellphones in a cell not interfere with each other? In the first systems, each phone just negotiated with the tower to use a slightly different frequency. Later, they used different time slices (TDMA, or Time Division Multiple Access). But CDMA uses maximum-length shift register sequences to provide a clever alternative.

The idea is to have all phones essentially operate on the same frequency, but to have each phone encode its signal using (in the simplest case) a differently shifted version of a shift register sequence. And because of Sol’s mathematical results, these differently shifted versions have no correlation—so the cellphone signals don’t interfere. And this is how, for example, most 3G cellphone networks operate.
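
Here’s a toy sketch of that idea—two “phones” sharing a channel, each using a differently shifted copy of the same maximal-length code (again a size-7 code purely for illustration):

step[s_] := Append[Rest[s], Mod[First[s] + Last[s], 2]];
code = (First /@ NestList[step, {1, 0, 0, 0, 0, 0, 0}, 126]) /. 0 -> -1;
codeA = code; codeB = RotateLeft[code, 50];          (* same sequence, different shifts *)
{a, b} = {1, -1};                                    (* the two bits being sent, as +1/-1 *)
channel = a codeA + b codeB;                         (* both signals added together on the shared channel *)
{Sign[channel . codeA], Sign[channel . codeB]}
(* -> {1, -1}: each receiver recovers its own bit, because the shifted codes barely correlate *)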

Sol created the mathematics for this, but he also brought some of the key people together. Back in 1959, he’d gotten to know a certain Irwin Jacobs, who’d recently gotten a PhD at MIT. Meanwhile, he knew Andy Viterbi, who worked at JPL. Sol introduced the two of them—and by 1968 they’d formed a company called Linkabit which did work on coding systems, mostly for the military.

Linkabit had many spinoffs and descendants, and in 1985 Jacobs and Viterbi started a new company called Qualcomm. It didn’t immediately do especially well, but by the early 1990s it began a meteoric rise when it started making the components to deploy CDMA in cellphones—and in 1999 Sol became the “Viterbi Professor of Communications” at USC.

Where Are There Shift Registers?

It’s sort of amazing that—although most people have never heard of them—shift register sequences are actually used in one way or another almost whenever bits are moved around in modern communication systems, computers and elsewhere. It’s quite confusing sometimes, because there are lots of things with different names and acronyms that all turn out to be linear feedback shift register sequences (PN, pseudonoise, M-, FSR, LFSR sequences, spread spectrum communications, MLS, SRS, PRBS, …).

If one looks at cellphones, shift register sequence usage has gone up and down over the years. 2G networks are based on TDMA, so don’t use shift register sequences to encode their data—but still often use CRCs to validate blocks of data. 3G networks are big users of CDMA—so there are shift register sequences involved in pretty much every bit that’s transmitted. 4G networks typically use a combination of time and frequency slots which don’t directly involve shift register sequences—though there are still CRCs used, for example to deal with data integrity when frequency windows overlap. 5G is designed to be more elaborate—with large arrays of antennas dynamically adapting to use optimal time and frequency slots. But half their channels are typically allocated to “pilot signals” that are used to infer the local radio environment—and work by transmitting none other than shift register sequences.

Throughout most kinds of electronics it’s common to want to use the highest data rates and the lowest powers that still get bits transmitted correctly above the “noise floor”. And typically the way one pushes to the edge is to do automatic error detection—using CRCs and therefore shift register sequences. And in fact pretty much every kind of bus (PCI, SATA, etc.) inside a computer does this: whether it’s connecting parts of CPUs, getting data off devices, or connecting to a display with HDMI. And on disks and in memory, for example, CRCs and other shift-register-sequence-based codes are pretty much universally used to operate at the highest possible rates and densities.

Shift registers are so ubiquitous, it’s a little difficult to estimate just how many of them are in use, and how many bits are being generated by them. There are perhaps 10 billion computers, slightly fewer cellphones, and an increasing number of billions of embedded and IoT (“Internet of Things”) devices. (Even many of the billion cars in the world, for example, have at least 10 microprocessors in them.)

At what rate are the shift registers running? Here, again, things are complicated. In communications systems, for example, there’s a basic carrier frequency—usually in the GHz range—and then there’s what’s called a “chipping rate” (or, confusingly, “chip rate”) that says how fast something like CDMA is done, and this is usually in the MHz range. On the other hand, in buses inside computers, or in connections to a display, all the data is going through shift registers, at the full data rate, which is well into the GHz range.

So it seems safe to estimate that there are at least 10 billion communications links, running for at least a tenth of a billion seconds (about 3 years), that use at least 1 billion bits from a shift register every second—meaning that to date Sol’s algorithm has been used at least an octillion times.
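
Multiplying out that (deliberately conservative) estimate:

10^10 * 10^8 * 10^9
(* -> 10^27: an octillion *)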

Is it really the most-used mathematical algorithm idea in history? I think so. I suspect the main potential competition would be from arithmetic operations. These days processors are doing perhaps a trillion arithmetic operations per second—and such operations are needed for pretty much every bit that’s generated by a computer. But how is arithmetic done? At some level it’s just a digital electronics implementation of the way people have done arithmetic forever.

But there are some wrinkles—some “algorithmic ideas”—though they’re quite obscure, except to microprocessor designers. Just as when Babbage was making his Difference Engine, carries are a big nuisance in doing arithmetic. (One can actually think of a linear feedback shift register as being a system that does something like arithmetic, but doesn’t do carries.) There are “carry propagation trees” that optimize carrying. There are also little tricks (“Booth encoding”, “Wallace trees”, etc.) that reduce the number of bit operations needed to do the innards of arithmetic. But unlike with LFSRs, there doesn’t seem to be one algorithmic idea that’s universally used—and so I think it’s still likely that Sol’s maximum-length LFSR sequence idea is the winner for most used.

Cellular Automata and Nonlinear Shift Registers

Even though it’s not obvious at first, it turns out there’s a very close relationship between feedback shift registers and something I’ve spent many years studying: cellular automata. The basic setup for a feedback shift register involves computing one bit at a time. In a cellular automaton, one has a line of cells, and at each step all the cells are updated in parallel, based on a rule that depends, say, on the values of their nearest neighbors.

To see how these are related, think about running a feedback shift register of size n, but displaying its state only every n steps—in other words, letting all the bits be rewritten before one displays again. If one displays every step of a linear feedback shift register (here with two taps next to each other), as in the first two panels below, nothing much happens at each step, except that things shift to the left. But if one makes a compressed picture, showing only every n steps, suddenly a pattern emerges.

Compressing steps from the linear feedback register on the left reveals the pattern on the right

It’s a nested pattern, and it’s very close to being the exact same pattern that one gets with a cellular automaton that takes a cell and its neighbor, and adds them mod 2 (or XORs them). Here’s what happens with that cellular automaton, if one arranges its cells so they’re in a circle of the same size as the shift register above:

Output of a cellular automaton that takes a cell and its neighbor and adds them mod 2

At the beginning, the cellular automaton and shift register patterns are exactly the same—though when they “hit the edge” they become slightly different because the edges are handled differently. But looking at these pictures it becomes less surprising that the math of shift registers should be relevant to cellular automata. And seeing the regularity of the nested patterns makes it clearer why there might be an elegant mathematical theory of shift registers in the first place.
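
In the Wolfram Language, the cellular automaton version is a one-liner (the size-20 circle here is just illustrative):

ArrayPlot[CellularAutomaton[90, ReplacePart[ConstantArray[0, 20], 10 -> 1], 20]]  (* rule 90 on a cyclic row of 20 cells, starting from a single 1 *)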

Typical shift registers used in practice don’t tend to make such obviously regular patterns, though. Here are a few examples of shift registers that yield maximum-length sequences. When one’s doing math, like Sol did, it’s very much the same story as for the case of obvious nesting. But here the fact that the taps are far apart makes things get mixed up, leaving no obvious visual trace of nesting.

Some shift registers that yield maximum-length sequences

So how broad is the correspondence between shift registers and cellular automata? In cellular automata the rules for generating new values of cells can be anything one wants. In linear feedback shift registers, however, they always have to be based on adding mod 2 (or XOR’ing). But that’s what the “linear” part of “linear feedback shift register” means. And it’s also in principle possible to have nonlinear feedback shift registers (NFSRs) that use whatever rule one wants for combining values.

And in fact, once Sol had worked out his theory for linear feedback shift registers, he started in on the nonlinear case. When he arrived at JPL in 1956 he got an actual lab, complete with racks of little electronic modules. Sol told me each module was about the size of a cigarette pack—and was built from a Bell Labs design to perform a particular logic operation (AND, OR, NOT, …). The modules could be strung together to implement whatever nonlinear feedback shift register one wanted, and they ran pretty fast—producing about a million bits per second. (Sol told me that someone tried doing the same thing with a general-purpose computer—and what took 1 second with the custom hardware modules took 6 weeks on the general-purpose computer.)

When Sol had looked at linear feedback shift registers, the first big thing he’d managed to understand was their repetition periods. And with nonlinear ones he put most of his effort into trying to understand the same thing. He collected all sorts of experimental data. He told me he even tested sequences of length 2^45—which must have taken a year. He made summaries, like the one below (notice the visualizations of sequences, shown as oscilloscope-like traces). But he never managed to come up with any kind of general theory as he had with linear feedback shift registers.

One of Sol's summaries of nonlinear shift sequences

 

It’s not surprising he couldn’t do it. Because when one looks at nonlinear feedback shift registers, one’s effectively sampling the whole richness of the computational universe of possible simple programs. Back in the 1950s there were already theoretical results—mostly based on Turing’s ideas of universal computation—about what programs could in principle do. But I don’t think Sol or anyone else ever thought they would apply to the very simple—if nonlinear—functions in NFSRs.

And in the end it basically took until my work around 1981 for it to become clear just how complicated the behavior of even very simple programs could be. My all-time favorite example is rule 30—a cellular automaton in which the values of neighboring cells are combined using a function that can be represented as p+q+r+qr mod 2 (or p XOR (q OR r)). And, amazingly, Sol looked at nonlinear feedback shift registers that were based on incredibly similar functions—like, in 1959, p+r+s+qr+qs+rs mod 2. Here’s what Sol’s function (which can be thought of as “rule 29070”), rule 30, and a couple of other similar rules look like in a shift register:

Rule 29070, rule 30, rule 45, and rule 73 as they appear in a shift register

And here’s what they look like as cellular automata, without being constrained to a fixed-size register:

Rule 29070, rule 30, rule 45, and rule 73 as cellular automata
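
As a quick check, one can verify in the Wolfram Language that the algebraic form above really is rule 30:

FromDigits[Flatten[Table[Mod[p + q + r + q r, 2], {p, 1, 0, -1}, {q, 1, 0, -1}, {r, 1, 0, -1}]], 2]  (* evaluate the form on all 8 neighborhoods and read off the rule number *)
(* -> 30 *)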

Of course, Sol never made pictures like this (and it would, realistically, have been almost impossible to do so in the 1950s). Instead, he concentrated on a kind of aggregate feature: the overall repetition period.

Sol wondered whether nonlinear feedback shift registers might make good sources of randomness. From what we now know about cellular automata, it’s clear they can. And for example the rule 30 cellular automaton is what we used to generate randomness for Mathematica for 25 years (though we recently retired it in favor of a more efficient rule that we found by searching trillions of possibilities).
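
Here’s a minimal sketch of extracting that kind of bit stream—the center column of rule 30 grown from a single black cell (the production generator was of course more elaborate than this):

steps = 50;
evolution = CellularAutomaton[30, {{1}, 0}, steps];   (* rule 30 from a single 1 on a background of 0s *)
evolution[[All, steps + 1]]                            (* the center column: a seemingly random bit stream *)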

Sol didn’t talk about cryptography much—though I suspect he did quite a bit of government work on it. He did tell me though that in 1959 he’d found a “multi-dimensional correlation attack on nonlinear sequences”, though he said that at the time he “carefully avoided stating that the application was to cryptanalysis”. The fact is that cellular automata like rule 30 (and presumably also nonlinear feedback shift registers) do seem to be good cryptosystems—though partly because of confusions about whether they’re somehow equivalent to linear feedback shift registers (they’re not), they’ve never been used as much as they should.

Being a history enthusiast, I’ve tried over the past few decades to identify all precursors to my work on 1D cellular automata. 2D cellular automata had been studied a bit, but there was only quite theoretical work on the 1D case, together with a few specific investigations in the cryptography community (that I’ve never fully found out about). And in the end, of all the things I’ve seen, I think Sol Golomb’s nonlinear feedback shift registers were in a sense closest to what I actually ended up doing a quarter century later.

Polyominoes

Mention the name “Golomb” and some people will think of shift registers. But many more will think of polyominoes. Sol didn’t invent polyominoes—though he did invent the name. But what he did was to make systematic what had appeared only in isolated puzzles before.

The main question Sol was interested in was how and when collections of polyominoes can be arranged to tile particular (finite or infinite) regions. Sometimes it’s fairly obvious, but often it’s very tricky to figure out. Sol published his first paper on polyominoes in 1954, but what really launched polyominoes into the public consciousness was Martin Gardner’s 1957 Mathematical Games column on them in Scientific American. As Sol explained in the introduction to his 1964 book, the effect was that he acquired “a steady stream of correspondents from around the world and from every stratum of society—board chairmen of leading universities, residents of obscure monasteries, inmates of prominent penitentiaries…”

Game companies took notice too, and within months, for example, the “New Sensational Jinx Jigsaw Puzzle” had appeared—followed over the course of decades by a long sequence of other polyomino-based puzzles and games (no, the sinister bald guy doesn’t look anything like Sol):

Hexed, Pan-kai, Universe, Pentomino, Hexed, Skirrid, Cathedral, Vagabonde, Tetris, Pyramis, Kunst Stucke, Blokus

Sol was still publishing papers about polyominoes 50 years after he first discussed them. In 1961 he introduced general subdividable “rep-tiles”, which it later became clear can make nested, fractal (“infin-tile”) patterns. But almost everything Sol did with polyominoes involved solving specific tiling problems with them.

For me, polyominoes are most interesting not for their specifics but for the examples they provide of more-general phenomena. One might have thought that given a few simple shapes it would be easy to decide whether they can tile the whole plane. But the example of polyominoes—with all the games and puzzles they support—makes it clear that it’s not necessarily so easy. And in fact it was proved in the 1960s that in general it’s a theoretically undecidable problem.

If one’s only interested in a finite region, then in principle one can just enumerate all conceivable arrangements of the original shapes, and see whether any of them correspond to successful tilings. But if one’s interested in the whole, infinite plane then one can’t do this. Maybe one will find a tiling of size one million, but there’s no guarantee how far the tiling can be extended.

It turns out it can be like running a Turing machine—or a cellular automaton. You start from a line of tiles. Then the question of whether there’s an infinite tiling is equivalent to the question of whether there’s a setup for some Turing machine that makes it never halt. And the point then is that if the Turing machine is universal (so that it can in effect be programmed to do any possible computation) then the halting problem for it can be undecidable, which means that the tiling problem is also undecidable.

Of course, whether a tiling problem is undecidable depends on the original set of shapes. And for me an important question is how complicated the shapes have to be so that they can encode universal computation, and yield an undecidable tiling problem. Sol Golomb knew the literature on this kind of question, but wasn’t especially interested in it. But it gets me thinking about materials formed from polyominoes whose pattern of “crystallization” can in effect do an arbitrary computation, or occur at a “melting point” that seems “random” because its value is undecidable.

Complicated, carefully crafted sets of polyominoes are known that in effect support universal computation. But what’s the simplest set—and is it simple enough that one might run across it by accident? My guess is that—just like with other kinds of systems I’ve studied in the computational universe—the simplest set is in fact simple. But finding it is very difficult.

A considerably easier problem is to find polyominoes that successfully tile the plane, but can’t do so periodically. Roger Penrose (of Penrose tiles fame) found an example in 1994. My book A New Kind of Science gave a slightly simpler example with 3 polyominoes:

The three-polyomino tiling in A New Kind of Science

The Rest of the Story

By the time Sol was in his early thirties, he’d established his two most notable pursuits—shift registers and polyominoes—and he’d settled into life as a university professor. He was constantly active, though. He wrote what ended up being a couple of hundred papers, some extending his earlier work, some stimulated by questions people would ask him, and some written, it seems, for the sheer pleasure of figuring out interesting things about numbers, sequences, cryptosystems, or whatever.

Shift registers and polyominoes are both big subjects (they even each have their own category in the AMS classification of mathematical publication topics). Both have had a certain injection of energy in the past decade or two as modern computer experiments started to be done on them—and Sol collaborated with people doing these. But both fields still have many unanswered questions. Even for linear feedback shift registers there are bigger Hadamard matrices to be found. And very little is known even now about nonlinear feedback shift registers. Not to mention all the issues about nonperiodic and otherwise exotic polyomino tilings.

Sol was always interested in puzzles, both with math and with words. For a while he wrote a puzzle column for the Los Angeles Times—and for 32 years he wrote “Golomb’s Gambits” for the Johns Hopkins alumni magazine. He participated in MegaIQ tests—earning himself a trip to the White House when he and its chief of staff happened to both score in the top five in the country.

He poured immense effort into his work at the university, not only teaching undergraduate courses and mentoring graduate students but also ascending the ranks of university administration (president of the faculty senate, vice provost for research, etc.)—and occasionally opining more generally about university governance (for example writing a paper entitled “Faculty Consulting: Should It Be Curtailed?”; answer: no, it’s good for the university!). At USC, he was particularly involved in recruiting—and over his time at USC he helped it ascend from a school essentially unknown in electrical engineering to one that makes it onto lists of top programs.

And then there was consulting. He was meticulous at not disclosing what he did for government agencies, though at one point he did lament that some newly published work had been anticipated by a classified paper he had written 40 years earlier. In the late 1960s—frustrated that everyone but him seemed to be selling polyomino games—Sol started a company called Recreational Technology, Inc. It didn’t go particularly well, but one side effect was that he got involved in business with Elwyn Berlekamp—a Berkeley professor and fellow enthusiast of coding theory and puzzles—whom he persuaded to start a company called Cyclotomics (in honor of cyclotomic polynomials of the form x^n – 1) which was eventually sold to Kodak for a respectable sum. (Berlekamp also created an algorithmic trading system that he sold to Jim Simons and that became a starting point for Renaissance Technologies, now one of the world’s largest hedge funds.)

More than 10,000 patents refer to Sol’s work, but Sol himself got only one patent: on a cryptosystem based on quasigroups—and I don’t think he ever did much to directly commercialize his work.

Sol was for many years involved with the Technion (Israel Institute of Technology) and quite devoted to Israel. He characterized himself as a “non-observant orthodox Jew”—but occasionally did things like teach a freshman seminar on the Book of Genesis, as well as working on decoding parts of the Dead Sea Scrolls.

Sol and his wife traveled extensively, but the center of Sol’s world was definitely Los Angeles—his office at USC, and the house in which he and his wife lived for nearly 60 years. He had a circle of friends and students who relied on him for many things. And he had his family. His daughter Astrid remained a local personality, even being portrayed in fiction a few times—as a student in a play about Richard Feynman (she sat as a drawing model for him many times), and as a character in a novel by a friend of mine. Beatrice became an MD/PhD who’s spent her career applying an almost mathematical level of precision to various kinds of medical reasoning and diagnosis (Gulf War illness, statin effects, hiccups, etc.)—even as she often quotes “Beatrice’s Law”, that “everything in biology is more complicated than you think, even taking into account Beatrice’s Law”. (I’m happy to have made at least one contribution to Beatrice’s life: introducing her to her husband, now of 26 years, Terry Sejnowski, one of the founders of modern computational neuroscience.)

In the years I knew Sol, there was always a quiet energy to him. He seemed to be involved in lots of things, even if he often wasn’t particularly forthcoming about the details. Occasionally I would talk to him about actual science and mathematics; usually he was more interested in telling stories (often very engaging ones) about personalities and organizations (“Can you believe that [in 1985] after not going to conferences for years, Claude Shannon just showed up unannounced at the bar at the annual information theory conference?”, “Do you know how much they had to pay the president of Caltech to get him to move to Saudi Arabia?”, etc.)

In retrospect, I wish I’d done more to get Sol interested in some of the math questions brought up by my own work. I don’t think I properly internalized the extent to which he liked cracking problems suggested by other people. And then there was the matter of computers. Despite all his contributions to the infrastructure of the computational world, Sol himself basically never seriously used computers. He took particular pride in his own mental calculation capabilities. And he didn’t really use email until he was in his seventies, and never used a computer at home—though, yes, he did have a cellphone. (A typical email from him was short. I had mentioned last year that I was researching Ada Lovelace; he responded: “The story of Ada Lovelace as Babbage’s programmer is so widespread that everyone seems to accept it as factual, but I’ve never seen original source material on this.”)

Sol’s daughters organized a party for his 80th birthday a few years ago:

Invitation for Solomon Golomb's 80th birthday celebration

Sol had a few medical problems, though they didn’t seem to be slowing him down much. His wife’s health, though, was failing, and a few weeks ago her condition suddenly worsened. Sol still went to his office as usual on Friday, but on Saturday night, in his sleep, he died. His wife Bo died just two weeks later, two days before what would have been their 60th wedding anniversary.

Though Sol himself is gone, the work he did lives on—responsible for an octillion bits (and counting) across the digital world. Farewell, Sol. And on behalf of all of us, thanks for all those cleverly created bits.

Idea Makers: A Book about Lives & Ideas


I spend most of my time trying to build the future with science and technology. But for many years now I’ve also had two other great interests: people and history. And today I’m excited to be publishing my first book that builds on these interests. It’s called Idea Makers, and its subtitle is Personal Perspectives on the Lives & Ideas of Some Notable People. It’s based on essays I’ve written over the past decade about a range of people—from ones I’ve personally known (like Richard Feynman and Steve Jobs) to ones who died long before I was born (like Ada Lovelace and Gottfried Leibniz).

Idea Makers cover and table of contents

The book is about lives and ideas, and how they mix together. At its core it’s a book of stories about people, and what those people managed to create. It’s the first book I’ve written that’s fundamentally non-technical—and I’m hoping all sorts of readers without deep technical interests will be able to enjoy it.

There’s a common stereotype that techies like me aren’t interested in people. But for some reason I always have been. Yes, I like computers and abstract ideas and those sorts of things very much, and I certainly spend a great deal of my time on them. But I also like people, and find them interesting. And no doubt that has something to do with why I’ve chosen to spend the past 30 years building up a company that is—like any company—full of people.

One of the things I always find particularly interesting about people is their life trajectories. I’ve been fortunate enough to mentor many people, and I hope to have had a positive effect on many life trajectories. But I also find life trajectories interesting for their own sake—as things to understand and learn from.

Idea Makers is in a sense an exploration of a few life trajectories whose intellectual output happens to have intersected with my own. Some of the people in the book are extremely well known; others less so. I had all sorts of different reasons for writing the pieces that have ended up in the book. Sometimes it was to celebrate an anniversary. Sometimes because of some event. And sometimes—sadly—it was because the person they’re about just died. And although I didn’t plan it this way, the sixteen people in the book turn out to represent an interesting cross-section. (Yes, it would have been nice to have a bit more diversity among them, but unfortunately, with my subjects being from past generations, it didn’t work out that way.)

So what have I learned from the explorations in the book? The way the history of science and technology is told it often sounds like new ideas just suddenly arrive in the world. But my experience is that there’s always a story behind them—and usually the story is long and deeply interwoven with the life of a particular person. Even the person themself may sometimes not realize just how important their own life trajectory is in the formation of an idea. But if one digs down, there’s usually a whole long thread to be unearthed.

Some of the people in this book I personally knew, so I was able to watch the stories of their ideas unfold over the course of many years. But for the historical figures in the book I had to do research.  I’ve done a certain amount of historical research before—notably for the history notes in A New Kind of Science.  But I have to say it’s gotten much easier to do historical research in recent years—particularly with so many historical documents now being scanned and searchable—not to mention the advent of large-scale computational knowledge and knowledge-based programming.

It’s rather common to come across what at first seem like mysteries.  How on earth did so-and-so manage to figure out such-and-such?  One could conclude that it was just a miraculous flash of inspiration.  But in reality it almost never is.  And indeed my own life experiences have shown me over and over again just how incremental the process of coming up with ideas actually is. One develops some intellectual framework, from which one comes up with conclusions or tools, which then let one extend the framework, and so on—at each stage incrementally generating ideas.  And I have to say I’ve found it a lot of fun to try to discover those intellectual missing links—that let one see just how someone went from here to there to manage to arrive at a particular idea.

The story itself is usually interesting—both historically and personally.  But I’ve also found that knowing the story of how an idea came to be almost invariably lets me understand the idea itself more deeply.

Ideas are ultimately abstract things. But something I’ve noticed is that their origins are often surprisingly concrete and practical. One might imagine that the best way to arrive at some new idea would just be to think abstractly long and hard. But the stories in the book—and my own life experiences—suggest that a much more common path is something quite different.  Instead of heading straight for an abstract goal, what often happens is that the trajectories of people’s lives cause them to work on solving some practical problem. Needless to say, most people will just be satisfied if they solve the practical problem, and will go no further.  But some will try to generalize it, and build up an abstract intellectual framework around what they have done.  And that, in my observation, is where a great many important new ideas come from.

As I’ve studied the lives and ideas of the people in the book, I think I’ve learned a lot that I can apply in my own life. Perhaps most important is just knowing so many stories of how things worked out in the past—because these give me all sorts of intuition about how things I see today will work out in the future. Even though they’re separated by many details and sometimes centuries, it is remarkable how similar many of the personalities, trends and situations in the book are to ones I see all the time.

The stories in the book involve both triumphs and tragedies. But in the end, I find them all inspiring. Because in their different ways they show how it’s possible to transcend the daily details of human lives and create ideas that can make persistent contributions to our world.

It’s been fun writing the pieces in the book. Of course, there’ve been plenty of challenges. But I feel good about the extent to which I’ve managed to decode history, and get the true stories of how things happened—as well as paint accurate portraits of the people behind them. I’ve had the privilege of personally knowing some of these people; others I’ve come to know only by studying them and poring through what they wrote. I’ve learned a lot—and I’m hoping that with this book I can pass on some of it, and in particular, communicate something about what it takes for people to make ideas, in the past and in the future.

Idea Makers is now available from bookstores, or online at Amazon.com and Barnes & Noble.


Today We Launch Version 11!


I’m thrilled today to announce the release of a major new version of Mathematica and the Wolfram Language: Version 11, available immediately for both desktop and cloud. Hundreds of us have been energetically working on building this for the past two years—and in fact I’ve personally put several thousand hours into it. I’m very excited about what’s in it; it’s a major step forward, with a lot of both breadth and depth—and with remarkably central relevance to many of today’s most prominent technology areas.

Featured areas in Version 11 of Mathematica and the Wolfram Language

It’s been more than 28 years since Version 1 came out—and nearly 30 years since I started its development. And all that time I’ve been continuing to pursue a bold vision—and to build a taller and taller stack of technology. With most software, after a few years and a few versions, not a lot of important new stuff ever gets added. But with Mathematica and the Wolfram Language it’s been a completely different story: for three decades we’ve been taking major steps forward at every version, progressively conquering vast numbers of new areas.

It’s been an amazing intellectual journey for me and all of us. From the very beginning we had a strong set of fundamental principles and a strong underlying design—and for three decades we’ve been able to just keep building more and more on these foundations, creating what is by now an unprecedentedly vast system that has nevertheless maintained its unity, elegance and, frankly, modernity. In the early years we concentrated particularly on abstract areas such as mathematics. But over time we’ve dramatically expanded, taking ever larger steps and covering ever more kinds of computation and knowledge.

Each new version represents both a lot of new ideas and a lot of hard work. But more than that, it represents ever greater leverage achieved with our technology. Because one of our key principles is automation, and at every version we’re building on all the automation we’ve achieved before—in effect, we’ve got larger and larger building blocks that we’re able to use to go further and further more and more quickly. And of course what makes this possible is all that effort that I and others have put in over the years maintaining a coherent design for the whole system—so all those building blocks from every different area fit perfectly together.

With traditional approaches to software development, it would have taken a great many years to create what we’ve added in Version 11. And the fact that we can deliver Version 11 now is a direct reflection of the effectiveness of our technology, our principles and our methodology. And as I look at Version 11, it’s very satisfying to see how far we’ve come not only in what’s in the system, but also in how effectively we can develop it. Not to mention that all these directions we’ve been pursuing for so many years as part of the logical development of our system have now turned out to be exactly what’s needed for many of today’s most active areas of technology development.

For many years we called our core system Mathematica. But as we added new directions in knowledge and deployment, and expanded far beyond things related in any way to “math”, we decided to introduce the concept of the Wolfram Language to represent the core of everything we’re doing. And the Wolfram Language now defines the operation not only of Mathematica, but also of Wolfram Development Platform and Wolfram Programming Lab, as well as other products and platforms. And because all our software engineering is unified, today we’re able to release Version 11 of all our Wolfram-Language-based systems, both desktop and cloud.

A sample of products based on the Wolfram Language

OK, so what’s the big new thing in Version 11? Well, it’s not one big thing; it’s many big things. To give a sense of scale, there are 555 completely new functions that we’re adding in Version 11—representing a huge amount of new functionality (by comparison, Version 1 had a total of 551 functions altogether). And actually that function count is even an underrepresentation—because it doesn’t include the vast deepening of many existing functions.

The way we manage development, we’ve always got a portfolio of projects going on, from fairly small ones, to ones that may take five years or more. And indeed Version 11 contains the results of several five-year projects. We’re always keen to deliver the results of our development as quickly as possible to users, so we’ve actually had several intermediate releases since Version 10—and effectively Version 11 represents the combination of many completely new developments together with ones that we’ve already previewed in 10.1, 10.2, 10.3 and 10.4. (Many functions that were tagged as “Experimental” in 10.x releases are now in full production form in Version 11.0.)

The First Things You Notice…

When you first launch Version 11 on your desktop, the first thing you’ll notice is that notebooks have a new look, with crisper fonts and a tighter design. When you type code there are lots of new autocompletions (it’s getting harder and harder to type the wrong thing), and when you type text there’s a new real-time spellchecker that we’ll be continually updating to make sure it has the latest words included.

If your computer system is set to any of a dozen languages other than English, you’ll also immediately see something else: every function is automatically annotated with a “code caption” in the language you’ve set:

Code captions in Wolfram Language code, here in Japanese

When you actually run code, you’ll notice that messages look different too—and, very helpfully for debugging, they let you immediately see what chain of functions was being called when the message was produced.

3D Printing

There are lots of big, meaty new areas in Version 11. But let’s jump right into one of them: 3D printing. I made my first 3D printout (which didn’t last long before disintegrating) back in 2002. And we’ve had the ability to export to STL for years. But what’s new and exciting in Version 11 is that we’ve built a complete pipeline that goes from creating 3D geometry to having it printed on your 3D printer (or through a printing service).

Version 11 provides a complete pipeline for 3D printing

Often in the past I’ve wanted to take a 3D plot and just make a 3D print of it. And occasionally I’ve been lucky, and it’s been easy to do. But most of the time it’s a fiddly, complicated process. Because graphics that display on the screen don’t necessarily correspond to geometry that can actually be printed on a 3D printer. And it turns out to be a difficult problem of 3D computational geometry to conveniently set up or repair the geometry so it really works on a 3D printer. (Oh, and if you get it wrong, you could have plastic randomly squirting out of your printer.)

In Version 11 it’s finally realistic to take any 3D plot, and just 3D print it. Or you can get the structure of a molecule or the elevation around a mountain, and similarly just 3D print it. Over the years I’ve personally made many 3D printouts. But each one has been its own little adventure. But now, thanks to Version 11, 3D printouts of everything are easy. And now that I think about it, maybe I need a 3D printout of the growth of the Wolfram Language by area for my desk…

Machine Learning & Neural Networks

In a sense, Mathematica and the Wolfram Language have always been doing AI. Over the years we’ve certainly been pioneers in solving lots of “AI-ish” problems—from mathematical solving to automated aesthetics to natural language understanding. But back in Version 10 we also made a great step forward in machine learning—developing extremely automated core functions (Classify and Predict) for learning by example.

I have to say that I wasn’t sure how well these functions would do in practice. But actually it’s been amazing to see how well they work—and it’s been very satisfying to see so many of our users being able to incorporate machine learning into their work, just using the automation we’ve built, and without having to consult any machine learning experts.

In Version 11 we’ve made many steps forward in machine learning. We’ve now got clean ways not just to do classification and prediction, but also to do feature extraction, dimension reduction, clustering and so on. And we’ve also done a lot of training ourselves to deliver pre-trained machine-learning functions. Machine-learning training is an interesting new kind of development. At its core, it’s a curation process. It’s just that instead of, say, collecting data on movies, you’re collecting as many images as possible of different kinds of animals.
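
As a small taste of what this looks like in practice, here’s a sketch using made-up toy data (real applications would of course use images, text or whatever):

(* train a classifier from a few labeled examples, then apply it to new data *)
c = Classify[{1.1 -> "small", 1.4 -> "small", 8.9 -> "large", 9.3 -> "large"}];
c[7.5]

(* reduce 5-dimensional random vectors to 2 dimensions *)
DimensionReduce[RandomReal[1, {20, 5}], 2]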

Built into Version 11 are now functions like ImageIdentify that identify over 10,000 different kinds of objects. And through the whole design of the system, it’s easy to take the features that have been learned, and immediately use those to train new image classifiers vastly more efficiently than before.

ImageIdentify lets you easily and efficiently train new image classifiers

We’ve done a lot to automate today’s most common machine learning tasks. But it’s become clear in the past few years that an amazing number of new areas can now be tackled by modern machine learning methods, and particularly by using neural networks. It’s really an amazing episode in the history of science: the field of neural networks, that I’ve followed for almost 40 years, has gone from seeming basically hopeless to being one of the hottest fields around, with major new discoveries being made almost every week.

But, OK, if you want to get involved, what should you do? Yes, you can cobble things together with a range of low-level libraries. But in building Version 11 we set ourselves the goal of creating a streamlined symbolic way to set up and train neural networks—in which as much of what has to be done as possible has been automated. It’s all very new, but in Version 11 we’ve now got functions like NetGraph and NetChain, together with all sorts of “neural net special functions”, like DotPlusLayer and ConvolutionLayer. And with these functions it’s easy to take the latest networks and quickly get them set up in the Wolfram Language (recurrent networks didn’t quite make it into Version 11.0, but they’re coming soon).

Neural network examples using new Version 11 functionality
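
Here’s a rough sketch of what setting up and training a tiny network looks like, using the Version 11.0 layer names and some invented toy data—just to show the shape of the symbolic framework, not a serious model:

(* a small feed-forward network: 2 numeric inputs, one hidden layer, 2-class output *)
net = NetChain[{DotPlusLayer[10], ElementwiseLayer[Ramp], DotPlusLayer[2], SoftmaxLayer[]},
  "Input" -> 2, "Output" -> NetDecoder[{"Class", {"A", "B"}}]];

(* train on a handful of invented examples, then classify a new point *)
trained = NetTrain[net, {{0., 0.} -> "A", {0.1, -0.2} -> "A", {1., 1.} -> "B", {0.9, 1.1} -> "B"}];
trained[{0.95, 1.05}]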

Of course, what makes this all really work well is the integration into the rest of the Wolfram Language. The neural network is just a Graph object like any other graph. And inputs like images or text can immediately and automatically be processed using standard Wolfram Language capabilities into forms appropriate for neural network computation.

From their name, “neural networks” sound like they’re related to brains. But actually they’re perfectly general computational structures: they just correspond to complex combinations of simple functions. They’re not unrelated to the simple programs that I’ve spent so long studying—though they have the special feature that they’re set up to be easy to train from examples.

We’ve had traditional statistical data fitting and interpolation forever. But what’s new with neural networks is a vastly richer space of possible computational structures to fit data to, or to train. It’s been remarkable just in the past couple of years to see a sequence of fields revolutionized by this—and there will be more to come.

I’m hoping that we can accelerate this with Version 11. Because we’ve managed to make “neural net programming” really just another programming paradigm integrated with all the others in the Wolfram Language. Yes, it’s very efficient and can deal with huge training sets. But ultimately probably the most powerful thing is that it immediately fits in with everything else the Wolfram Language does. And even in Version 11 we’re already using it in many of our internal algorithms in areas like image, signal and text processing. It’s still early days in the history of “neural net programming”—but I’m excited for the Wolfram Language to play a central part in what’s coming.

Audio

OK, let’s turn to another new area of Version 11: audio. Our goal is to be able to handle any kind of data directly in the Wolfram Language. We’ve already got graphics and images and geometry and networks and formulas and lots else, all consistently represented as first-class symbolic structures in the language. And as of Version 11, we’ve now always got another type of first-class data: audio.

Version 11 introduces fully integrated support for audio processing

Audio is mostly complicated because it’s big. But in Version 11 we’ve got everything set up so it’s seamless to handle, say, an hour of audio directly in the Wolfram Language. Behind the scenes there’s all sorts of engineering that’s caching and streaming and so on. But it’s all automated—and in the language it’s just a simple Audio object. And that Audio object is immediately amenable to all the sophisticated signal processing and analysis that’s available in the Wolfram Language.
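
As a quick sketch (the filename here is just a hypothetical local recording):

a = Audio[File["interview.wav"]];   (* a first-class Audio object *)

Duration[a]       (* length of the recording *)
AudioPlot[a]      (* waveform visualization *)
Spectrogram[a]    (* time-frequency picture, using the built-in signal processing *)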

Bones, Foods, and the Universe…

The Wolfram Language is a knowledge-based language. Which means it’s got lots of knowledge—about both computation and the world—built into it. And these days the Wolfram Language covers thousands of domains of real-world knowledge—from countries to movies to companies to planets. There’s new data flowing into the central Wolfram Knowledgebase in the cloud all the time, and we’re carefully curating data on new things that exist in the world (who knew, for example, that there were new administrative divisions recently created in Austria?). Lots of this data is visible in Wolfram|Alpha (as well as intelligent assistants powered by it). But it’s in the Wolfram Language that the data really comes alive for full computation—and where all the effort we put into ensuring its alignment and consistency becomes evident.

We’re always working to expand the domains of knowledge covered by the Wolfram Language. And in Version 11 several domains that we’ve been working on for many years are finally now ready. One that’s been particularly difficult is anatomy data. But in Version 11 we’ve now got detailed 3D models of all the significant structures in the human body. So you can see how those complicated bones in the foot fit together. And you can do computations on them. Or 3D print them. And you can understand the network of arteries around the heart. I must say that as I’ve explored this, I’m more amazed than ever at the level of morphological complexity that exists in the human body. But as of Version 11, it’s now a domain where we can actually do computations. And there are perhaps-unexpected new functions like AnatomyPlot3D to support it. (There’s certainly more to be done, by the way: for example, our anatomy data is only for a single “average adult male”, and the joints can’t move, etc.)

Built right in to Version 11: Detailed 3D models of all the significant structures in the human body

A completely different domain of data now handled in the Wolfram Language is food. There’s a lot that’s complicated in this domain. First, there are issues of ontology. What is an apple? Well, there’s a generic apple, and there are also many specific types of apples. Then there are issues of defining amounts of things. A cup of strawberries. Three apples. A quarter pounder. It’s taken many years of work, but we’ve now got a very robust symbolic way to represent food—from which we can immediately compute nutritional properties and lots of other things.

Version 11 offers extensive technical data on nutritional and other properties of thousands of foods

Another area that’s also been long coming is historical country data. We’ve had very complete data on countries in modern times (typically from 1960 or 1970 on). But what about earlier history? What about Prussia? What about the Roman Empire? Well, in Version 11 we’ve finally got at least approximate border information for all serious country-like entities, throughout recorded history. So one can do computations about the rise and fall of empires right from within the Wolfram Language.

Easily find—and do computation with—the maximum geographical extent of the Roman Empire and much more

Talking of history, a small but very useful addition in Version 11 is historical word frequency data. Just ask WordFrequencyData for a time series, and you’ll be able to see how much people talked about “war”—or “turnips”—at different times in history. Almost every plot is a history lesson.

WordFrequencyData in Version 11
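
For example (assuming connectivity to the Wolfram Knowledgebase), one can pull the historical time series for a word and plot it directly:

(* relative frequency of "turnip" in published English text over time *)
DateListPlot[WordFrequencyData["turnip", "TimeSeries"]]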

Another convenient function in Version 11 is WikipediaData, which immediately gives any Wikipedia entry (or various kinds of data it contains). There’s also WolframLanguageData, which gives computable data on the Wolfram Language itself—like the examples in its documentation, links between functions, and so on.

In many domains one’s mostly just dealing with static data (“what is the density of gold?”; “what was the population of London in 1959?”). But there are other domains where one’s not really interested in static data so much as data-backed computation. There are several new examples of this in Version 11. Like human mortality data (“what is the probability of dying between age X and Y?”), standard ocean data (“what is the pressure at a depth X?”), radioactive stopping power and human growth data—as well as data on the whole universe according to standard cosmological models.

Also new in Version 11 are WeatherForecastData and MathematicalFunctionData. Oh, as well as data on Pokémon and lots of other useful things.

The Wolfram Language has caught 'em all

Computing with Real-World Entities

One of the very powerful features of the Wolfram Language is its ability to compute directly with real-world entities. To the Wolfram Language, the US, or Russia, or a type of lizard are all just entities that can be manipulated as symbolic constructs using the overall symbolic paradigm of the language. Entities don’t directly have values; they’re just symbolic objects. But their properties can have values: Entity["Country", "UnitedStates"]["Population"], for example, is about 322 million.

But let’s say we don’t just want to take some entity (like the US) and find values of its properties. Let’s say instead we want to find what entities have certain properties with particular values. Like let’s say we want to find the 5 largest countries in the world by population. Well, in Version 11 there’s a new way to do this. Instead of specifying a particular explicit entity, we instead specify a computation that implicitly defines a class of entities. And so for example we can get a list of the 5 largest countries by population like this:

EntityList[Entity["Country", "Population" -> TakeLargest[5]]]

TakeLargest[5] is an operator form of a new function in Version 11 that gets the largest elements in a list. Implicit entities end up making a lot of use of operator forms—much like queries in Dataset. And in a sense they’re also making deep use of the symbolic character of the Wolfram Language—because they’re treating the functions that define them just like data.

The whole mechanism of entities and properties and implicit entities works for all the different types of entities that exist in the Wolfram Language. But as of Version 11, it’s not limited to built-in types of entities. There’s a new construct called EntityStore that lets you define your own types of entities, and specify their properties and values and so on—and then seamlessly use them in any computation.

Just as Dataset is a powerful hierarchical generalization of typical database concepts, so EntityStore is a kind of symbolic generalization of a typical relational database. And if you set up a sophisticated entity store, you can just use CloudDeploy to immediately deploy it to the cloud, so you can use it whenever you want.
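
Here’s a minimal sketch of what defining and using your own entity type might look like—the "employee" type and all its data are invented for illustration:

store = EntityStore["employee" -> <|
    "Entities" -> <|
      "alice" -> <|"Team" -> "Kernel", "StartYear" -> 2009|>,
      "bob" -> <|"Team" -> "Cloud", "StartYear" -> 2014|>|>|>];

EntityRegister[store];                   (* make the type available in this session *)
Entity["employee", "alice"]["Team"]      (* -> "Kernel" *)
EntityList["employee"]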

Geo Everything

One aspect of “knowing about the real world” is knowing about geography. But the Wolfram Language doesn’t just have access to detailed geographic data (not only for Earth, but also for the Moon, Mars, even Pluto); it can also compute with this data. It’s got a huge collection of geo projections, all immediately computable, and all set up to support very careful and detailed geodesy. Remember spherical trigonometry? Well, the Wolfram Language doesn’t just assume the Earth is a sphere, but correctly computes distances and areas and so on, using the actual shape of the Earth.

Just a few of the many geo projections in the Wolfram Language

When it comes to making maps, the Wolfram Language now has access not only to the street map of the world, but also to things like historical country borders, as well as at least low-resolution satellite imagery. And given the street map, there’s an important new class of computations that can be done: travel directions (and travel times) along streets from anywhere to anywhere.

Travel directions, time, and distance—to anywhere
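
As a quick sketch (the results of course depend on live map data):

route = {Entity["City", {"NewYork", "NewYork", "UnitedStates"}],
   Entity["City", {"Boston", "Massachusetts", "UnitedStates"}]};

TravelDirections[route]    (* step-by-step driving directions *)
TravelTime[route]          (* estimated driving time *)
TravelDistance[route]      (* distance along the road network *)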

Don’t Forget Calculus…

Version 11 has lots and lots of new capabilities across all areas of the Wolfram Language. But it’s also got lots of new capabilities in traditional Mathematica areas—like calculus. And what we’ve just added for calculus in Version 11 is big enough that in an earlier version it would undoubtedly have been the headline new feature.

One example is differential eigensystems: being able to solve eigenvalue versions of both ODEs and PDEs. There’s a huge stack of algorithmic technology necessary to make this possible—and in fact we’ve been building towards it for more than 25 years. And what’s really important is that it’s general: it’s not something where one has to carefully set up some particular problem using elaborate knowledge of numerical analysis. It’s something where one just specifies the equations and their boundary conditions—and then the system automatically figures out how to solve them.

Calculus examples using new Version 11 functionality

Back around 1976 I wrote a Fortran program to solve an eigenvalue version of the 1D Schrödinger equation for a particle physics problem I was studying. In 1981 I wrote C programs to do the same thing for some equations in relativistic quantum mechanics. I’ve been patiently waiting for the day when I can just type in these problems, and immediately get answers. And now with Version 11 it’s here.

Of course, what’s in Version 11 is much more powerful and more general. I was dealing with simple boundary conditions. But in Version 11 one can use the whole Wolfram Language geometry system—and all the data we have—to set up boundary conditions. So it’s easy to find the eigenmodes of a “drum” of any shape—like the shape of the US.

Differential eigensystems computations, fully integrated with the power of the Wolfram Language
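
Here’s a rough sketch of the kind of input involved—a minimal “drum” example, computing the first few Dirichlet eigenvalues of the Laplacian on a disk:

NDEigenvalues[{-Laplacian[u[x, y], {x, y}], DirichletCondition[u[x, y] == 0, True]},
  u[x, y], {x, y} ∈ Disk[], 4]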

For something like that, there’s no choice but to do the computation numerically. Still, Version 11 will do differential eigensystem computations symbolically when it can. And Version 11 also adds some major new capabilities for general symbolic differential equations. In particular, we’ve had a large R&D project that’s now gotten us to the point where we can compute a symbolic solution to pretty much any symbolic PDE that would appear in any kind of textbook or the like.

Back in 1979 when I created a precursor to Mathematica I made a list of things I hoped we’d eventually be able to do. One of the things on that list was to solve integral equations. Well, I’m excited to be able to say that 37 years later, we’ve finally got the algorithmic technology stack to make this possible—and Version 11 introduces symbolic solutions to many classes of integro-differential equations.

There’s more in calculus too. Like Green’s functions for general equations in general domains. And, long awaited (at least by me): Mellin transforms. (They’ve been a favorite of mine ever since they were central to a 1977 particle physics paper of mine.)

It’s not classic calculus fare, but in Version 11 we’ve also added a lot of strength in what one might consider “modern calculus”—the aspects of calculus needed to support areas like machine learning. We’ve got more efficient and robust minimization, and we’ve also got sophisticated Bayesian minimization, suitable for things like unsupervised machine learning.

Education

Things like partial differential equations are sophisticated math, that happen to be very important in lots of practical applications in physics and engineering and so on. But what about more basic kinds of math, of the kind that’s for example relevant to high-school education? Well, for a long time Mathematica has covered that very thoroughly. But as our algorithmic technology stack has grown, there are a few new things that become possible, even for more elementary math.

One example new in Version 11 is full automatic handling of discontinuities, asymptotes, and so on in plots of functions. So now, for example, Tan[x] is plotted in the perfect high-school way, not joining –∞ and +∞. For Tan[x] that’s pretty simple to achieve. But there’s some seriously sophisticated algorithmic technology inside to handle this for more complicated functions.

Full automatic handling of discontinuities, asymptotes, and more
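
For example:

(* the poles of Tan[x] are detected automatically and not joined across asymptotes *)
Plot[Tan[x], {x, -2 Pi, 2 Pi}]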

And, by the way, another huge new thing in Version 11 is MathematicalFunctionData—computable access to 100,000 properties and relations about mathematical functions—in a sense encapsulating centuries of mathematical research and making it immediately available for computation.

We’ve been doing a lot recently using the Wolfram Language as a way to teach computational thinking at all levels. And among many other things, we’ve wanted to make sure that any computation that comes up—say in math—in elementary school education is really elementary to do in the Wolfram Language. And so we’ve got little functions like NumberExpand, which takes 123 and writes it as {100, 20, 3}. And we’ve also got RomanNumeral and so on.
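
For instance:

NumberExpand[123]     (* -> {100, 20, 3} *)
RomanNumeral[2016]    (* -> "MMXVI" *)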

And, partly as a tribute to the legacy of Logo, we’ve introduced AnglePath—a kind of industrial-scale version of “turtle graphics”, that happens to be useful not just for elementary education, but for serious simulations, say of different types of random walks.

AnglePath, new in Version 11
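
A minimal sketch of using it—here a random walk in which each unit step turns by a random angle:

Graphics[Line[AnglePath[RandomReal[{-30, 30} Degree, 1000]]]]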

Making Everything Fit Together

One of the central goals of the Wolfram Language is to have everything seamlessly work together. And in Version 11 there are some powerful new examples of this going on.

Time series, for example, now work directly with arithmetic. So you can take two air pressure time series, and just subtract them. Of course, this would be easy if all the time points in the series lined up. But in Version 11 they don’t have to: the Wolfram Language automatically handles arbitrarily irregular time series.

Another example concerns units. In Version 11, statistical distributions now work seamlessly with units. So a normal distribution can have not just a variance of 2.5, but a variance of 2.5 meters. And all computations and unit conversions are handled completely automatically.
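
A quick sketch of both of these, with made-up numbers:

(* two irregularly sampled time series can simply be subtracted *)
ts1 = TimeSeries[{{0, 1013.}, {3, 1015.}, {7, 1012.}}];
ts2 = TimeSeries[{{1, 1010.}, {4, 1011.}, {7, 1013.}}];
ts1 - ts2

(* a distribution whose parameters carry units *)
RandomVariate[NormalDistribution[Quantity[10, "Meters"], Quantity[2.5, "Meters"]], 3]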

Geometry and geometric regions have also been seamlessly integrated into more parts of the system. Solvers that used to just take variables can now be given arbitrary regions to operate over. Another connection is between images and regions: ImageMesh now takes any image and constructs a geometric mesh from it. So, for example, you can do serious computational geometry with your favorite cat picture if you want.
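
A self-contained sketch (rasterizing a character rather than loading a photo):

(* make a binary image of a letter, turn it into a mesh region, then compute with it *)
img = ColorNegate[Binarize[Rasterize[Style["W", FontSize -> 120]]]];
mesh = ImageMesh[img];
Area[mesh]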

One final example: random objects. RandomInteger and RandomReal are old functions. Version 8 introduced RandomVariate for picking random objects from arbitrary symbolic probability distributions. Then in Version 9 came RandomFunction, for generating functions from random processes. But now in Version 11 there’s more randomness. There’s RandomPoint, which picks a random point in any geometric region. And there’s also RandomEntity that picks a random entity, as well as RandomWord—that’s useful for natural language processing research, as well as being a nice way to test your vocabulary in lots of languages… And finally, in Version 11 there’s a whole major new mathematical area of randomness: random matrices—implemented with all the depth and completeness that we’ve made a hallmark of Mathematica and the Wolfram Language.
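
For example (the entity and word results will of course be different every time):

RandomPoint[Disk[], 5]       (* 5 random points in the unit disk *)
RandomEntity["Country"]      (* a randomly chosen country entity *)
RandomWord[3]                (* 3 random English words *)
RandomVariate[GaussianUnitaryMatrixDistribution[4]]    (* a random matrix from a standard ensemble *)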

Visualization

One of the long-term achievements of Mathematica and the Wolfram Language has been that they’ve made visualization a routine part of everyday work. Our goal has always been to make it as automatic as possible to visualize as much as possible. And Version 11 now makes a whole collection of new things automatic to visualize.

There are very flexible word clouds that let one visualize text and collections of strings. There are timeline plots for visualizing events in time. There are audio plots that immediately visualize short and long pieces of audio. There are dendrograms that use machine learning methods to show hierarchical clustering of images, text, or any other kind of data. There are geo histograms to show geographic density. There’s a TextStructure function that diagrams the grammar of English sentences. And there are anatomy plots, to show features in the human body (making use of symbolic specifications, since there aren’t any explicit coordinates).

A few of the many visualization possibilities with the Wolfram Language
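
A couple of quick examples of these (with invented data):

TimelinePlot[{DateObject[{1988}], DateObject[{2009}], DateObject[{2016}]}]
Dendrogram[{1, 2, 3, 10, 12, 40}]     (* hierarchical clustering of a list of numbers *)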

What other kinds of things are there to visualize? Well, one thing I’ve ended up visualizing a lot (especially in my efforts in basic science) are the rules for simple programs like cellular automata. And in Version 11 we’ve added RulePlot for automatically visualizing rules in many different styles.

RulePlot automatically visualizes rules for Turing machines, cellular automata, and more
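
For instance:

(* the rule icon for the rule 30 cellular automaton *)
RulePlot[CellularAutomaton[30]]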

Another longstanding visualization challenge has been how to automatically visualize 3D distributions of data. The issue tends to be that it’s hard to “see into” the 3D volume. But in Version 11 we’ve got a bunch of functions that solve this in different ways, often by making slices at positions that are defined by our geometry system.

Automatically visualize 3D distributions of data

In the quest for automation in visualization, another big area is labeling. And in Version 11 we’ve added Callout to make it possible to specify callouts for points, lines and regions (we’ve already got legends and tooltips and so on). There’s a trivial way to do callouts: just always put them (say) on the left. But that’d be really bad in practice, because lots of callouts could end up colliding. And what Version 11 does instead is something much more sophisticated, involving algorithmically laying out callouts to optimally achieve aesthetic and communication goals.

Automatic, elegant placement of callout labels

From Strings to Text…

Mathematica and the Wolfram Language have always been able to handle character strings. And in Version 10 there was a huge step forward with the introduction of Interpreter—in which we took the natural language understanding breakthroughs we made for Wolfram|Alpha, and applied them to interpreting strings in hundreds of domains. Well, in Version 11 we’re taking another big step—providing a variety of functions for large-scale natural language processing and text manipulation.

There are functions like TextWords and TextSentences for breaking text into words and sentences. (It takes fancy machine learning to do a good job, and not, for example, to be confused by things like the periods in “St. John’s St.”) Then there are functions like TextCases, which lets one automatically pick out different natural-language classes, like countries or dates, or, for that matter, nouns or verbs.
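
A few quick examples (the entity classes TextCases can pick out rely on the built-in machine-learning models, so results may vary):

TextWords["The quick brown fox jumps over the lazy dog."]
TextSentences["I arrived at St. John's St. It was closed."]
TextCases["Alan Turing was born in London in 1912.", "City"]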

It’s pretty interesting being able to treat words as data. WordList gives lists of different kinds of words. WordDefinition gives definitions.

Then there are multilingual capabilities. Alphabet gives pretty much every kind of alphabet; Transliterate transliterates between writing systems. And WordTranslation gives translations of words into about 200 languages. Great raw material for all sorts of linguistics investigations.

Multilingual functionality in Version 11
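
For example (word translations require connectivity to the Wolfram Knowledgebase):

Alphabet["Greek"]                     (* the Greek alphabet as a list of letters *)
Transliterate["Привет"]               (* transliterate Russian into Latin characters *)
WordTranslation["hello", "Spanish"]   (* Spanish translations of "hello" *)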

The Most Modern Systems Programming

The Wolfram Language is arguably the highest-level language that’s ever been created. But in Version 11 we’ve added a bunch of capabilities for “reaching all the way down” to the lowest level of computer systems. First of all, there’s ByteArray that can store and manipulate raw sequences of bytes. Then there are functions that deal with raw networks, like PingTime and SocketConnect.
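
For instance (the ping result obviously depends on your network):

ByteArray[{72, 101, 108, 108, 111}]    (* five raw bytes *)
PingTime["wolfram.com"]                (* round-trip time, returned as a Quantity *)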

There’s a new framework for publish-subscribe “channels”. You can create a channel, then either the Wolfram Language or an external system can send it data, and you can set up a “listener” that will do something in your Wolfram Language session whenever the data arrives. There’s a lot that can be built with this setup, whether it’s connecting to external services and devices, handling notifications and third-party authentication—or creating your very own chat system.

Something else new in Version 11 is built-in cryptography. It’s a very clean symbolic framework that lets you set up pretty much whatever protocol you want, using public or private key systems.
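
A minimal sketch, here using simple password-based encryption and then a generated key pair:

enc = Encrypt["my passphrase", "the secret message"];
Decrypt["my passphrase", enc]

keys = GenerateAsymmetricKeyPair[];
Decrypt[keys["PrivateKey"], Encrypt[keys["PublicKey"], "hello"]]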

What about interacting with the web? The symbolic character of the Wolfram Language is again very powerful here. Because for example it lets one have HTTPRequest and HTTPResponse as symbolic structures. And it also lets one have functions like URLSubmit, with symbolically defined handler functions for callbacks from asynchronous URL submission. There’s even now a CookieFunction, for symbolic handling of cookies.

Yes, one can do systems programming in pretty much any language—or even for example in a shell. But what I’ve found is that doing it in the Wolfram Language is incredibly more powerful. Let’s say you’re exploring the performance of a computer system. Well, first of all, everything you’re doing is kept nicely in a notebook, where you can add comments, etc. Then—very importantly—everything you do can be immediately visualized. Or you can apply machine learning, or whatever. Want to study network performance? Use PingTime to generate a list of ping times; then immediately make a histogram, correlate with other data, or whatever.

Something else we’re adding in Version 11 is FileSystemMap: being able to treat a file system like a collection of nested lists (or associations) and then mapping any function over it. So for example you can take a directory full of images, and use FileSystemMap to apply image processing to all of them.
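
A quick sketch (the directory name here is hypothetical):

(* file sizes for everything under a directory, as a nested association *)
FileSystemMap[FileByteCount, "~/Pictures/vacation"]

(* or apply image processing to each file *)
FileSystemMap[ImageDimensions@*Import, "~/Pictures/vacation"]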

Oh, and another thing: Version 11 also includes, though it’s still tagged as experimental, a full industrial-strength system for searching text documents, both locally and in the cloud.

Building Things on the Web

An incredibly powerful feature of the Wolfram Language is that it runs not only on the desktop but also in the cloud. And in Version 11 there are lots of new capabilities that use the cloud, for example to create things on the web.

Let’s start with the fairly straightforward stuff. CloudDeploy[FormFunction[...]] lets you immediately create a form-based app on the web. But now it’s easy to make the form even more sophisticated. There are lots of new types of “smart fields” that automatically use natural language understanding to interpret your input. There are new constructs, like RepeatingElement and CompoundElement, that automatically set up fields to get input for lists and associations. And there’s a whole new Programmable Linguistic Interface that lets you define your own grammar to extend the natural language understanding that’s already built into the Wolfram Language.
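
As a minimal sketch (the field name and deployment path are just examples):

(* a web form that takes a city and returns its distance from Paris *)
CloudDeploy[
  FormFunction[{"city" -> "City"},
    GeoDistance[#city, Entity["City", {"Paris", "IleDeFrance", "France"}]] &],
  "city-distance"]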

The forms you specify symbolically in the Wolfram Language can be quite sophisticated—with multiple pages and lots of interdependencies and formatting. But ultimately they’re still just forms where you set up your input, then submit it. Version 11 introduces the new AskFunction framework which lets you set up more complex interactions—like back-and-forth dialogs in which you “interview” the user to get data. In the Wolfram Language, the whole process is specified by a symbolic structure—which CloudDeploy then makes immediately active on the web.

It’s a goal of the Wolfram Language to make it easy to build complex things on the web. In Version 11, we’ve added FormPage to let you build a “recycling form” (like on wolframalpha.com), and GalleryView to let you take a list of assets in the Wolfram Language, and immediately deploy them as a “gallery” on the web (like in demonstrations.wolfram.com).

GalleryView in Version 11: Immediately deploy assets as a web gallery

If you want to operate at a lower level, there are lots of new functions, like URLDispatcher and GenerateHTTPResponse, that let you determine exactly how web requests will be handled by things you set up in the cloud.

Also new in Version 11 are functions like CloudPublish and CloudShare that let you control access to things you put in the cloud from the Wolfram Language. A small but I think important new feature is SourceLink, which lets you automatically link from, say, a graphic that you deploy in the cloud back to the notebook (also in the cloud) in which it was created. I think this will be a great tool for “data-backed publication”—in which every picture you see in a paper, for example, links back to what created it. (Inside our company, I’m also insisting that automated reports we generate—from the Wolfram Language of course—include source links, so I can always get the raw data and analyze it myself or whatever.)

Data Into the Cloud

Already there experimentally in Version 10—but now fully supported in Version 11—is the whole Wolfram Data Drop mechanism, which lets you accumulate data from anywhere into the Wolfram Cloud. I have to say I think I underestimated the breadth of usefulness of the Wolfram Data Drop. I thought it would be used primarily to store data from sensors and the like. And, yes, there are lots of applications along these lines. But what I’ve found is that the Data Drop is incredibly useful purely inside the Wolfram Language. Let’s say you’ve got a web form that’s running in the Wolfram Language. You might process each request—but then throw the result into a databin in the Wolfram Data Drop so you can analyze them all together.

Wolfram Data Drop is basically set up to accumulate time series of data. In Version 11 another way to store data in the cloud is CloudExpression. You can put any Wolfram Language expression into a cloud expression, and it’ll be persistently stored there, with each part being extracted or set by all the usual operations (like Part and AppendTo) that one could use on a symbol in a Wolfram Language session. CloudExpression is a great way to store structured data where one’s continually modifying parts, but one wants all the data to be persistent in the cloud.
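
A rough sketch of how that might be used (the stored fields here are invented):

ce = CreateCloudExpression[<|"count" -> 0, "events" -> {}|>];

ce["count"]++                    (* increment the stored counter in place *)
AppendTo[ce["events"], Now]      (* append to the stored list *)
ce["count"]                      (* read the current value back *)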

Things you store in the cloud are immediately persistent. In Version 11, there’s also LocalObject—which is the local analog of CloudObject—and provides persistent local storage on your machine. LocalCache is a seamless way of ensuring that things you’re using are cached in the local object store.

Connecting Everywhere

In the Wolfram Language we curate lots of data that we include directly in the knowledgebase—but we also curate ways to access more data, such as external APIs. Version 11 includes many new connections, for example to Flickr, Reddit, MailChimp, SurveyMonkey, SeatGeek, ArXiv and more.

Version 11 adds many new connections to external APIs

The Wolfram Language is also an extremely powerful way to deploy your own APIs. And in Version 11 there’s an expanding set of authentication mechanisms that are supported for APIs—for example PermissionsKey for giving out app IDs. CloudLoggingData also now gives what can be extremely detailed data about how any API or other cloud object you have is being accessed.

An API that you call on the web basically gets data passed to it through the URL that it’s given. In Version 11 we have a new kind of API-like construct that operates not through the web and URLs, but through email. MailReceiverFunction is like APIFunction, except when it’s deployed it defines an email address, and then any mail that’s sent to that email address gets fed to the code in the mail receiver function. MailReceiverFunction lets you get very detailed in separating out different parts of a mail message and its headers—and then lets you apply any Wolfram Language function you want—so you can do arbitrarily sophisticated automation in processing email. And particularly for someone like me who gets a huge amount of email from humans as well as automated systems, this is a pretty nice thing.

WolframScript

You can access the Wolfram Language through a notebook, on the desktop or in the cloud. You can access it through a scheduled task in the cloud, or through an API or a mail receiver function. It’s always been possible to run the Wolfram Language from a command line too, but in Version 11 there’s a powerful new way to do that, using WolframScript.

The idea of WolframScript is to provide a very simple but flexible interface to the Wolfram Language. WolframScript lets you run on a local Wolfram Engine on your computer—or just by saying -cloud it lets you run in the cloud. It lets you run code from a file or directly from the command line. And it lets you get back results in any format—including text or images or sounds or PDF or CDF or whatever. And in the usual Unix way, it lets you use #!wolframscript to create a script that can be called standalone and will run with WolframScript.

There’s more too. You can set up WolframScript to operate like a Wolfram Language FormFunction—pulling in arguments of whatever types you specify (and doing interpretation when needed). And you can also use WolframScript to call an API you’ve already defined in the cloud.
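
Here’s a minimal sketch of a standalone script (assuming wolframscript is installed and on your PATH):

#!/usr/bin/env wolframscript
(* a tiny script: print the first 10 primes *)
Print[Prime[Range[10]]]

One can also run snippets directly from the command line, for example wolframscript -code 'Total[Range[100]]', or add -cloud to have the evaluation happen in the Wolfram Cloud instead of locally.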

In our own company, there are lots of places where we’re using the Wolfram Language as part of some large and often distributed system. WolframScript provides a very clean way to just “throw in a Wolfram Language component” anywhere you want.

WolframScript provides a simple, flexible interface to the Wolfram Language

The Core Language

I’ve talked about all sorts of things that broaden and deepen the algorithmic capabilities of Mathematica and the Wolfram Language. But what about the structure of the core Wolfram Language itself? Of course, we’re committed to always maintaining compatibility (and I’m happy to say that all our attention to design on an ongoing basis tends to make this rather easy). But we also want to progressively strengthen and polish the language.

In natural languages, one process of evolution tends to be the construction of new words from common idioms. And we’re doing essentially the same thing in the Wolfram Language. We’ve made an active effort to study what “lumps of repeated computational work” appear most frequently across lots of Wolfram Language code. Then—assuming we can come up with a good name for a particular lump of computational work—we add it as a new function.

In the early days of Mathematica I used to take the point of view that if there were functions that let you do something using an idiom, then that was fine. But what I realized is that if an idiom is compressed into a single function whose name communicates clearly what it’s for, then one gets code that’s easier to read. And coupled with the convenience of not having to reconstruct an idiom many times, this justifies having a new function.

For the last several years, two initiatives we’d had within our company are Incremental Language Development (ILD), and Language Consistency & Completeness (LCC). The idea of ILD is to do things like introduce functions that are equivalent to common idioms. The idea of LCC is to do things like make sure that anything—like pattern matching, or units, or symbolic URLs—is supported wherever it makes sense across the system.

So, for example, a typical ILD addition in Version 11 is the function MinMax that returns min and max in a list (it’s amazing how many fiddly applications of Map that saves). A typical LCC addition is support for pattern matching in associations.

Between ILD and LCC there are lots of additions to the core language in Version 11. Functions like Cases have been extended to SequenceCases—looking for sequences in a list instead of individual elements. There’s also now SequenceFoldList, which is like FoldList, except it can “look back” to a sequence of elements of any length. In a similar vein, there’s FoldPairList, which generalizes FoldList by allowing the result “returned” at each step to be different from the result that’s passed on in the folding process. (This might sound abstract, and at some level it is—but this is a very useful operation whenever one wants to maintain a separate internal state while sequentially ingesting data.)
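
A couple of small examples:

(* non-overlapping runs of an odd number followed by an even number *)
SequenceCases[{1, 2, 3, 5, 6, 8}, {_?OddQ, _?EvenQ}]     (* -> {{1, 2}, {5, 6}} *)

(* successive differences, carrying the previous element along as internal state *)
FoldPairList[{#2 - #1, #2} &, 0, {1, 4, 9, 16, 25}]      (* -> {1, 3, 5, 7, 9} *)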

Another new construct that might at first sound weird is Nothing. Despite its name, Nothing does something very useful: whenever it appears in a list, it’s immediately removed. Which means, for example, that to get rid of something in a list, you just have to replace it by Nothing.
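
For example:

{1, Nothing, 2, Nothing, 3}          (* -> {1, 2, 3} *)
Range[10] /. _?PrimeQ -> Nothing     (* drop the primes: {1, 4, 6, 8, 9, 10} *)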

There are lots of little conveniences we’ve added in Version 11. The function First, for example, now has a second argument that says what to give if there’s no first element—avoiding having to include an If for that case. There’s also a nice general mechanism for things like this: UpTo. You can say Take[list, UpTo[4]] to get up to 4 elements of list, or however many elements there happen to be. UpTo is supported in lots of places—and it simplifies lots of code.

Another little convenience is Echo. When you’re trying to tell what’s going on inside a piece of code, you sometimes want to print some intermediate result. Echo is a function that prints, then returns what it printed—so you can sprinkle it in your code without changing what the code does.
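
For example:

(* print the intermediate list, but return the total as usual *)
Total[Echo[Range[5]]]    (* echoes {1, 2, 3, 4, 5}, then returns 15 *)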

It’s hard to believe there are useful basic list operations still to add, but Version 11 has a few. Subdivide is like Range except it subdivides the range into equal parts. TakeLargest and related functions generalize Max etc. to give not just the largest, but the n largest elements in a list.
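
Quick examples (with the outputs I’d expect as comments):

Subdivide[0, 1, 4]   (* {0, 1/4, 1/2, 3/4, 1} *)

TakeLargest[{7, 1, 9, 4, 6}, 3]   (* {9, 7, 6} *)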

There’s a function Groupings, which I’d been thinking about for years but only now figured out a good design for—and which effectively generates all possible trees formed with certain binary or other combiners (“what numbers can you get by combining a list of 1s in all possible ways with Plus and Times?”).
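
Here’s roughly what that question looks like as code (a sketch; wrapping in Union just collects the distinct values, which I’d expect to be these):

Union[Groupings[{1, 1, 1, 1}, {Plus, Times}]]   (* {1, 2, 3, 4} *)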

There are nips and tucks to the language in all sorts of places. Like in Table, for example, where you can say Table[x,n] rather than needing to say Table[x,{n}]. And in general there are lots and lots of things that make the core of Version 11 of the Wolfram Language just a little smoother, nicer and more elegant to use.

And There’s Much More

This has been a long blog post. But I’m not even close to having covered everything that’s new in Version 11. There’s more information on the web. Check out the featured new areas, the information for existing users, or the summary of new features. (See also the list of new features specifically from Version 10.4 to 11.0.)

But most of all, get into using Version 11! If you want quick (and free) exposure to it, try it in the Wolfram Open Cloud. Or just start using Mathematica 11 or Version 11 of any other products based on the Wolfram Language.

I’ve been using test versions of Version 11 for some time now—and to me Version 10 already looks and feels very “old fashioned”, lacking those nice new interface features and all the new functionality and little conveniences. I’m really pleased with the way Version 11 has turned out. It’s yet another big step in what’s already been a 30-year journey of Mathematica and the Wolfram Language. And I’m excited to see all the wonderful things that all sorts of people around the world will now be able to do for the first time with Mathematica 11 and the other Version 11 Wolfram Language products.


To comment, please visit the copy of this post at the Wolfram Blog »

How to Teach Computational Thinking

The Computational Future

Computational thinking is going to be a defining feature of the future—and it’s an incredibly important thing to be teaching to kids today. There’s always lots of discussion (and concern) about how to teach mathematical thinking to kids. But looking to the future, this pales in comparison to the importance of teaching computational thinking. Yes, there’s a certain amount of mathematical thinking that’s needed in everyday life, and in many careers. But computational thinking is going to be needed everywhere. And doing it well is going to be a key to success in almost all future careers.

Doctors, lawyers, teachers, farmers, whatever. The future of all these professions will be full of computational thinking. Whether it’s sensor-based medicine, computational contracts, education analytics or computational agriculture—success is going to rely on being able to do computational thinking well.

I’ve noticed an interesting trend. Pick any field X, from archeology to zoology. There either is now a “computational X” or there soon will be. And it’s widely viewed as the future of the field.

Computational Thinking WordCloud

So how do we prepare the kids of today for this future? I myself have been involved with computational thinking for nearly 40 years now—building technology for it, applying it in lots of places, studying its basic science—and trying to understand its principles. And by this point I think I have a clear view of what it takes to do computational thinking. So now the question is how to educate kids about it. And I’m excited to say that I think I now have a good answer—one that’s based on something I’ve spent 30 years building for other purposes: the Wolfram Language. There have been ways to teach the mechanics of low-level programming for a long time. But what’s new and important is that, with all the knowledge and automation we’ve built into the Wolfram Language, we’ve finally reached the point where we have the technology to directly teach broad computational thinking, even to kids.

I’m personally very committed to the goal of teaching computational thinking—because I believe it’s so crucial to our future. And I’m trying to do everything I can with our technology to support the effort. We’ve had Wolfram|Alpha free on the web for years now. But now we’ve also launched our Wolfram Open Cloud—so that anyone anywhere can start learning computational thinking with the Wolfram Programming Lab, using the Wolfram Language. But this is just the beginning—and as I’ll discuss here, there are many exciting new things that I think are now possible.

What Is Computational Thinking?

But first, let’s try to define what we mean by “computational thinking”. As far as I’m concerned, its intellectual core is about formulating things with enough clarity, and in a systematic enough way, that one can tell a computer how to do them. Mathematical thinking is about formulating things so that one can handle them mathematically, when that’s possible. Computational thinking is a much bigger and broader story, because there are just a lot more things that can be handled computationally.

But how does one “tell a computer” anything? One has to have a language. And the great thing is that today with the Wolfram Language we’re in a position to communicate very directly with computers about things we think about. The Wolfram Language is knowledge based: it knows about things in the world—like cities, or species, or songs, or photos we take—and it knows how to compute with them. And as soon as we have an idea that we can formulate computationally, the point is that the language lets us express it, and then—thanks to 30 years of technology development—lets us as automatically as possible actually execute the idea.

The Wolfram Language is a programming language. So when you write in it, you’re doing programming. But it’s a new kind of programming. It’s programming in which one’s as directly as possible expressing computational thinking—rather than just telling the computer step-by-step what low-level operations it should do. It’s programming where humans—including kids—provide the ideas, then it’s up to the computer and the Wolfram Language to handle the details of how they get executed.

Programming—and programming education—have traditionally been about telling a computer at a low level what to do. But thanks to all the technology we’ve built in the Wolfram Language, one doesn’t have to do that any more. One can express things at a much higher level—so one can concentrate on computational thinking, not mere programming.

Yes, there’s certainly a need for some number of software engineers in the world who can write low-level programs in languages like C++ or Java or JavaScript—and can handle the details of loops and declarations. But that number is tiny compared to the number of people who need to be able to think computationally.

The Wolfram Language—particularly in the form of Mathematica—has been widely used in technical research and development around the world for more than a quarter of a century, and endless important inventions and discoveries have been made with it. And all these years we’ve also been progressively filling out my original vision of having an integrated language in which every possible domain of knowledge is built in and automated. And the exciting thing is that now we’ve actually done this across a vast range of areas—enough to support all kinds of computational thinking, for example across all the fields traditionally taught in schools.

Seven years ago we released Wolfram|Alpha—which kids (and many others) use all the time to answer questions. Wolfram|Alpha takes plain English input, and then uses sophisticated computation from the Wolfram Language to automatically generate pages of results. I think Wolfram|Alpha is a spectacular illustration—for kids and others—of what’s possible with knowledge-based computation in the Wolfram Language. But it’s only intended for quick “drive by” questions that can be expressed in fairly few words, or maybe a bit of notation.

So what about more complicated questions and other things? Plain English doesn’t work well for these. To get enough precision to be able to get definite results one would end up with something like very elaborate and incomprehensible legalese. But the good news is that there’s an alternative: the Wolfram Language—which is built specifically to make it easy to express complex things, yet is always precise and definite.

It doesn’t take any skill to use Wolfram|Alpha. But if one wants to go further in taking advantage of what computation makes possible, one has to learn more about how to formulate and structure what one wants. Or, in other words, one needs to learn to do computational thinking. And the great thing is that the Wolfram Language finally provides the language in which one can do that—because, through all the work we’ve put into it, it’s managed to transcend mere programming, and as directly as possible support computational thinking.

Getting into the Wolfram Language

So what’s it like when kids are first exposed to the Wolfram Language? As part of my effort to understand how to teach computational thinking, I’ve spent quite a bit of time in the last few years using the Wolfram Language with kids. Sometimes it’s with large groups, sometimes with small groups—and sometimes I’ll notice a kid at some grown-up event I’m at, and end up getting out my computer and spending time with the kid rather than the grown-ups. I’ve worked with high-school-age kids, and with middle-school-age (11–14) ones.

Elizabeth Wolfram writes code

If it’s one kid, or a small group, I’ll always insist that the kids do the typing. Usually I’ll start off with something everyone knows. Get the computer to compute 2+2. They type it in, and they can see that, yes, the computer gives them the result they know:

2+2=4

They’ll often then try some other basic arithmetic. It’s very important that the Wolfram Language lets them just enter input, and immediately see output from it. There are no extra steps.

After they’ve done some basic arithmetic, I’ll usually suggest they try something that generates more digits:

12^123

Often they’ll ask if it’s OK, or if somehow the long number will break the computer. I encourage them to try other examples, and they’ll often do computations that instantly generate pages and pages of numbers. These kinds of big-number computations are something we’ve been able to do for decades, but kids still always seem to get very excited by them. I think the point is that it lets them see that, yes, a computer really can compute nontrivial things. (Just think how long it would take you to compute all those digits…)

After they’ve done some basic arithmetic, it’s time for them to try some other functions. The most common function that I end up starting with is Range:

Range[10]

Range is good because it’s easy for kids to see what it does—and they quickly get the sense that, yes, they can tell the computer to do something, and it will do it. Range is also good because it’s easy to use it to generate something satisfyingly big. Often I’ll suggest they try Range[1000]. They’ll ask if Range[10000] is OK too. I tell them to try it…

Range[10000]

I think I do something different with every kid or group of kids I deal with. But a pretty common next step is to see how to visualize the list we’ve made:

ListPlot[Range[10]]

If the kids happen to be into math, I might try next making a table of primes:

Table[Prime[n], {n, 20}]

And then plotting them:

ListPlot[Table[Prime[n], {n, 20}]]

For kids who perhaps don’t think they like math—or tech in general—I might instead make some colors:

{Red, Green, Blue}

Maybe we’d try blending red and blue to make purple:

Blend[{Red, Blue}]

Maybe we’d pick up the current image from the camera:

CurrentImage[]

And we’d find all the “edges” in it:

EdgeDetect[%]

We might also get a bit more sophisticated with color:

Table[Hue[n], {n, 0, 1, 1/20}]

Perhaps then we’d go in another direction, getting a list of common words in English (I’d also try another language if any of the kids know one):

WordList[]

If the kids are into language arts, we might try generating some random words:

RandomWord[10]

We might see how to use StringTake to take the first letter of each word:

StringTake[WordList[], 1]

Then use WordCloud to make a word cloud and see the relative frequencies of first letters:

WordCloud[StringTake[WordList[], 1]]

Some kid might ask “what about the first two letters?”. Then we’d be off trying that (yes, there’s some computational thinking involved in that UpTo):

WordCloud[StringTake[WordList[], UpTo[2]]]

We might talk for a bit about how many words start with “un-” etc. And maybe we’d investigate some of those words. We could go on and look at translations of words:

WordTranslation["hello", All]

Actually, it’d be easy to go on for hours just doing things with what I’ve talked about so far. But let’s look at some other examples. A big thing about the Wolfram Language is that it knows about lots of real-world data. I’d typically build this up through a bunch of steps, but here’s an example of making a collage of flags of countries in Europe, where the size of each flag is determined by the current population of the country:

ImageCollage[ CountryData["Europe", "Population"] -> CountryData["Europe", "FlagImage"]]

Since we happen to have talked about color, it’s fun to see where in color space the flags lie (apparently not many “pink countries”, for example):

ChromaticityPlot

A big theme is that the Wolfram Language lets one do not just abstract computation, but computation based on real-world knowledge. The Wolfram Language covers a huge range of areas, from traditional STEM-like areas to art, history, music, sports, literature, geography and so on. Kids often like doing things with maps.

We might start from where we are (Here). Or from some landmark. Like here’s a map with a 100-mile-radius disk around the Eiffel tower:

GeoGraphics[ GeoDisk[Entity["Building", "EiffelTower::5h9w8"], Quantity[100, "Miles"]]]

Here’s a “powers of 10” sequence of images:

Table[GeoGraphics[ GeoDisk[Entity["Building", "EiffelTower::5h9w8"], 10^n Quantity[0.1, "Miles"]]], {n, 0, 4}]

So what about history, for example? How can the Wolfram Language engage with that? Actually, it’s full of historical knowledge. About countries (plot the growth and decline of the Roman Empire), or movies (compare movie posters over time), or, for example, words. Like here’s a comparison of the use of “horse” and “car” in books over the last 300 years:

DateListPlot[WordFrequencyData[{"horse", "car"}, "TimeSeries"]]

Try the same thing for names of countries, or inventions, or whatever; there’s always lots of history to discuss.

There are so many different directions to go. Here’s another one: graphics. Let’s make a 3D sphere:

Graphics3D[Sphere[]]

It’s always fun for kids that they can make something like this in 3D and move it around. If they’re on the more sophisticated end, we might build up 3D graphics like this from 100 random spheres with random colors:

Graphics3D[ Table[Style[Sphere[RandomReal[10, 3]], RandomColor[]], 100]]

Kids of all ages like making interactive stuff. Here’s a simple “adjustable Cyclops eye” that one can easily build up to in stages:

Manipulate[ Graphics[{Style[Disk[{0, 0}, 1], Yellow], Style[Disk[{0, 0}, r], Black]}], {r, 0, 1}]

Another thing I sometimes do is have the Wolfram Language make sound. Here’s a random sequence of musical notes:

Sound[Table[SoundNote[RandomInteger[20], 0.1], {n, 20}]]

There are so many directions to go. For the budding medical person, there’s anatomy in 3D—and you can pick out the geometry of a bone and 3D print it. And so on and so on.

AnatomyPlot3D hand

But What About…

I’d never seriously tried working with kids (though, yes, I do have four kids of my own) before launching into my recent efforts on computational thinking. So I didn’t know quite what to expect. People I talked to seemed somewhat amused about the contrast to my usual life of hard-driving technology development. And they kept bringing up a couple of issues they thought might be crippling to what I wanted to do. The first was that they were skeptical that kids would actually be able to type raw code in the Wolfram Language; they thought they’d just get too confused and tangled up with syntax and so on. The second was that they didn’t think kids would be motivated to do anything with code unless it led to creating a game they could play.

One of the nice features of working with kids is that if you give them the chance, they’ll very quickly make it very clear to you what works with them and what doesn’t. So what actually happens? Well, it turns out that in my experience neither of the potential problems people brought up ends up being a real issue at all. But the reasons for this are quite interesting, and not particularly what I would have expected.

About typing code, one thing to realize is that in today’s world, most middle-school-age kids are quite used to typing, or at least typing text. Sometimes when they start typing code they at first have to look where the [ ] keys are, or even where the + is. But they don’t have any fundamental problem with typing. They’re also quite used to learning precise rules for how things work (“i comes before e …” in English spelling; the order of operations in math; etc.). So learning a few rules like “functions use square brackets” or “function names start with capital letters” isn’t a big deal. And of course in the Wolfram Language there’s nothing like all those irregularities that exist in a natural language like English.

When I watch kids typing code, the automatic hints we provide are quite important (brackets being purple until they’re matched; things turning red if they’re in the wrong place; autocompletions being suggested for everything; etc.). But the bottom line is that despite the theoretical concerns of adults, actual kids seem to find it extremely easy to type syntactically correct code in the Wolfram Language. In fact, I’ve been amazed at how quickly many kids “get it”. Having seen just a few examples, they immediately generalize. And the great thing is that because the Wolfram Language is designed in a very consistent way, the generalizations they come up with actually work. It’s heartwarming for me as the language designer to see this. Though of course, to the kids it’s just obvious that something must work this-or-that way, and they don’t imagine that it took effort to design it that way.

OK, so kids can type Wolfram Language code. But do they want to? Lots of kids like playing games on computers, and adults often think that’s all they’ll be interested in creating on computers too. But in my observation, this simply isn’t true. The most important thing for most kids about the Wolfram Language is that they can immediately “do something real” with it. They can type whatever code they want, and immediately get the computer to do something for them. They can create pictures or sounds or text. They can make art. They can do science. They can explore human languages. They can analyze Pokémon (yes, the Wolfram Language has extensive Pokémon data). And yes, if they really want to, they can make games.

In my experience, if you ask kids before they’ve seen the Wolfram Language what they might be interested in programming, they’ll often say games. But as soon as they’ve actually seen what’s possible in the Wolfram Language, they’ll stop talking about games, and they’ll want to do something “real” instead.

Nuts and Bolts

It’s only a very recent thing (and it’s basically taken 30 years of work) that the Wolfram Language has got to the point where I think it provides an instantly compelling way for kids to learn computational thinking. And actually, it’s not just the raw language—and all the knowledge it contains—that’s important: it’s also the environment.

The first point is that the Wolfram Notebook concept that we invented nearly 30 years ago is a really good way for kids (and others) to interact with the language. The idea of a notebook is to have an interactive document that freely mixes code, results, graphics, text and everything else. One can build up a computation in a notebook, typing code and getting results right there in the document. The results can be dynamic—with their own automatically generated user interfaces. And one can read—or write—explanations or instructions directly in the notebook. It’s taken decades to polish all aspects of notebooks. But now we’ve got an extremely efficient and wonderful environment in which to work and think—and learn computational thinking.

For many years, notebooks and the Wolfram Language were basically available only as desktop software. But now—after a huge software engineering effort—they’re also available in the cloud, directly in a web browser, or on mobile devices. So that means that any kid can just go to a web browser, and immediately start interacting with the Wolfram Language—creating or editing a notebook, and writing whatever code they want.

It takes a big stack of technology to make this possible. And building it has taken a big chunk of my life. It’s been very satisfying to see so many great leading-edge achievements made over the years with our technology. And now I’m really excited to see what’s possible in using it to spread computational thinking to future generations.

I made the decision when we created Wolfram|Alpha to make it available free on the web to the world. And it’s been wonderful to see so many people—and especially kids—using it every day. So a few months ago, when the technology was ready, I made the decision also to provide free access to the whole Wolfram Language in our Wolfram Open Cloud—and to set it up so kids (and others) could learn computational thinking there.

Wolfram|Alpha is set up so anyone can ask it questions, in plain English. And it’s turned out to be great—among other things—as a way to support education in lots of fields. But if one wants to learn true computational thinking for the future, then one’s got to go beyond asking questions in plain English. And that’s where the Wolfram Language comes in.

So what’s the best way to get started with the Wolfram Language, and the computational thinking it makes possible? There are probably many answers to this, that, among other things, depend on the details of the environment and resources that different kids have available. I’d like to think I’ve personally done a decent job working directly with kids—and for example at our Wolfram Summer Camp for high-school students I’ve seen very good things achieved with direct personal mentoring.

But it’s also important to have “self service” solutions—and one thing I’ve done to contribute to that is to write a book called An Elementary Introduction to the Wolfram Language. It’s really a book about computational thinking. It doesn’t assume any previous knowledge of programming, or, for example, of math. But in the course of the book it gets people to the point where they can routinely write real programs that do things they’re interested in.

The book is available free online. And it’s also got exercises—which are automatically graded in the cloud. I originally intended the book for high school and up. But it’s turned out that there’s ended up being quite a collection of middle-school students (aged 11 and up) who have enthusiastically worked their way through it—even as the book has also turned out to be used for things like graduate math courses, trainings at banks, and educating professional software developers.

There’s a (free) online course based on my book that will be available soon, and I know there are quite a few courses under development that use the book to teach modern programming and computational thinking.

But, OK, when a kid walks up to their web browser to learn computational thinking and the Wolfram Language, where can they actually go? A few months ago we launched Wolfram Programming Lab as an answer to this. There’s a version in the Wolfram Open Cloud that’s free (and doesn’t even require login so long as you don’t want to save your work).

Wolfram Programming Lab has two basic branches. The first is a collection of Explorations. Each Exploration is a notebook that’s set up to contain code you can edit and run to do something interesting. After you’ve gone through the code that’s already there, the notebook then suggests ways to go further, and to explore on your own.

Explorations let you get a taste of the Wolfram Language and computational thinking. Kids can typically get through the basics of several in an hour. In a sense they’re like “immersion language learning”: You start from code that “fluent speakers” might write, then you interact with it.

But Wolfram Programming Lab provides a second branch too: an interactive version of my book, that lets people go step-by-step, building up from a very simple start, and progressively creating more and more sophisticated code.

You can use Wolfram Programming Lab entirely through a web browser, in the cloud. But there’s also a desktop version that runs on any standard computer—and lets you get really zippy local interactivity, as well as letting you do bigger computations if you want. And if you have a Raspberry Pi computer, the desktop version of Wolfram Programming Lab comes bundled right with the operating system, including special features for getting data from sensors connected to the Raspberry Pi.

I’ve wanted to make sure that Wolfram Programming Lab is suitable for any kid, anywhere, whether or not they’re embedded in an educational environment that can support what they’re doing. And from what we can tell, this seems to be working nicely—though it certainly helps when kids have actual people they can work with. We plan to set up the structure for informal networks to support this, among other things using the existing, very active Wolfram Community. But we’re also setting things up so Wolfram Programming Lab can easily fit into existing, organized, educational settings—not least by using the Wolfram Language to create some of the world’s best educational analytics to analyze student progress.

It’s worth mentioning that one of the great things about our whole Wolfram Cloud infrastructure is that it lets anyone—whether they’re students or teachers—directly publish things on the web for the world to use. And in Wolfram Programming Lab, for example, it’s routine to end up deploying an app on the web as part of an Exploration.

We’re still in the early days of understanding all the nuances of actually deploying Wolfram Programming Lab in every possible learning environment—and we’re steadily advancing on many fronts. A little while ago I happened to be talking to some kids at a school in Korea, and asked them whether they thought they’d be able to learn the Wolfram Language. One of the kids responded that she thought it looked easy—except for having to read all the English in the names of the functions.

Well, that got me thinking. And the result was that we introduced multilingual code captions that annotate code in a whole range of different languages. You still type Wolfram Language code using standard function names, but you get an instant explanation in your native language. (By the way, there are also versions of my book that will be available in various languages.)

Where Does Computational Thinking Fit In?

OK, so I’ve talked a bit about the mechanics of teaching computational thinking. But where does computational thinking fit into the standard educational curriculum? The answer, I think, is simple: everywhere!

One might think that computational thinking was somehow only relevant to STEM education. But it’s not true. Computational thinking is relevant across the whole curriculum. To social studies. To language arts. To music. To art. Even to sports. People have tried to make math relevant to all these areas. But you just can’t do enough with traditional hand-calculation-based math to make this realistic. But with computation and computational thinking it’s a completely different story. In every one of these areas there are very powerful—and often very clarifying—things that can be done with computation and computational thinking. And the great thing is that it’s all accessible to kids. The Wolfram Language takes care of all the internal technicalities—so one can really focus on the pure computational thinking and understanding, without the mechanics getting in the way.

One way to get to this is to extend what one imagines “math” education to be—and that’s a large part of what Computer-Based Math is doing. But another approach is just to think about inserting computational thinking directly into every other area of the curriculum. I’ve noticed that in practice—particularly at the grade school level—the teachers who get enthusiastic about teaching computational thinking may or may not have obvious technical backgrounds. It’s like with the current generation of kids: you don’t have to be a techie to be into knowledge-based programming and computational thinking.

In the past, with low-level computer languages like C++ and Java, you really did have to be a committed, engineering-oriented person to be teaching with them. But it’s a completely different story with the Wolfram Language. Yes, there’s plenty to learn if one wants to know the language well. But one is learning about general computational thinking, not the engineering details of computer systems.

So how should computational thinking be fitted into the school curriculum? Something I hear quite a lot is that teachers already have a hard time fitting everything they’re supposed to teach into the available time. So how can anything else be added? Well, here’s the surprising thing that I’m only just beginning to understand: adding computational thinking actually makes it easier to teach lots of things, so even with the time spent on computational thinking, the total time can actually go down, even though there’s more being learned.

How can this be? The main point is that computational thinking provides a framework that makes things more transparent and easier to understand. When you formulate something computationally, everyone can try it out and explicitly see how it works. There’s nothing hidden that the student somehow has to infer from some comment the teacher made.

Here’s a story from years ago, when the Wolfram Language—in the form of Mathematica—was first being used to teach calculus. It’s pretty common for calculus students to have trouble understanding the concept of a function. But professors told me that they started noticing that when they were learning calculus through Mathematica, somehow none of the students ended up being confused about functions. And the reason was that they had learned about functions through computational thinking—through seeing them explicitly and computationally in the Wolfram Language, rather than hearing about them more indirectly and abstractly as in standard calculus teaching.

Particularly in past decades there was a great tendency for textbooks in almost every subject to “stand on ceremony” in explaining things—so the best explanations often had to be sought out in semi-illicit outline publications. But somehow, with things like MathWorld and Wikipedia, a more direct style of presenting information has become commonplace—and has come to be taken for granted by today’s students. I see the application of computational thinking across every field as being a kind of dramatic continuation of this trend: taking things which could only be talked around, and turning them into things that can be shown through computation directly and explicitly.

Say you’re talking about a Shakespeare play, and trying to get a general sense of the flow of it. Well, with computational thinking you can imagine creating a social network for the play (who “knows” who through being in the same scene, etc.). And pretty soon you have a nice summary that’s a place to launch from in talking about the nuances of the play and its themes.

Imagine you’re talking about different language families. Well, you can just take some words and use WordTranslation to translate them into hundreds of languages. Then you could make a dendrogram to show how the forms of those words cluster in different languages—and you can discover the Indo-European language family.

You could be talking about styles of art—and pull up lots of images of famous paintings that are built into the Wolfram Language. Then you could start comparing the use of color in different paintings—maybe making a plot of how it changed over time, seeing if one can tell when different styles came in.

You could be talking about the economics of different countries—and you could immediately create your own infographics, working with students to see how best to present what’s important. You could be talking about history, and you could use the historical map data in the Wolfram Language to compare the conquests of Alexander the Great and Julius Caesar. Or you could ask about US presidents, make a timeline showing their administrations, and compare them using economic or cultural indicators.

Let’s say you’re teaching English grammar. Well, it certainly helps that the Wolfram Language can automatically diagram sentences. But you can also let students try their own rules for generating sentences—so they can see what generates something they think is grammatically correct, and what doesn’t. How about spelling? Can computational thinking help with that? I’m not sure. It’s certainly easy to take all the common words in English, and start trying out different rules one might think of. And it’s fun to discover exceptions (does “u” always follow “q”: it’s trivial in the Wolfram Language to find out).
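
Here’s a minimal sketch of the kind of one-liner I have in mind (it looks for a “q” followed by anything other than “u”; a “q” at the very end of a word would need its own check):

Select[WordList[], StringContainsQ[#, "q" ~~ Except["u"]] &]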

It’s an interesting exercise to take standard pieces of the curriculum for different subjects and ask “can this be helped by applying computational thinking?”. Sometimes the first thing one thinks of may be a gimmick. But what I’ve found is that if one really asks what the point of that piece of the curriculum is, there will end up being a way that computational thinking can help, right from the foundations.

Over time, there will be a larger and larger inventory of great examples of all this. In the past, with math (the non-computer-based version), it’s been rather disappointing: there just aren’t that many examples that work. Yes, there are things like exponential growth that show up in a bunch of places, but by the time one realizes that the examples in the calculus books are in many cases the same as they were in the 1700s, it’s not looking so good. And with standard programming the picture isn’t much better: there are only so many places that the Fibonacci sequence shows up. But with knowledge-based programming in the Wolfram Language the picture is completely different. Because the language immediately connects to the data and computations that are relevant across essentially every domain.

OK, so if one’s going to be teaching computational thinking, how should it be organized? Should one for example have a Computational Thinking class? At the college level, I think Computational Thinking 101 is a good idea. In fact, it might well be the single most important course many students take. At the high-school level, I think it’s less obvious what should be done, and though I’m certainly no expert, my tendency is to think that computational thinking is better inserted into lots of different modules within different classes.

One obvious question is: what’s the startup cost to having students engage with computational thinking? My feeling is that with the technology we’ve got now, it’s extremely low. With Wolfram|Alpha, it’s zero. With Explorations in the Wolfram Language, it’s very close to zero. With free-form code in the Wolfram Language, there’s a small amount to know, and perhaps it’s better for this to be taught in one go, a little like a miniature version of what would be a “service math course” at the college level.

It’s worth mentioning that computational thinking is rather unique in its breadth of applicability across the curriculum. Everyone would like what’s learned in one class to be applied in others, but it doesn’t happen all that often. I’ve already mentioned the difficulties with traditional math. The situation is a bit better with writing, where one would at least hope that students use what they’ve learned in producing essays in other subjects. But most fields are taught in intellectual silos, with nothing learned in one even being referenced in others. With computational thinking, though, there’s vastly more cross-connection. The social network for the Shakespeare play involves the same computational ideas as a network for international trade, or a diagram of the relations between words in different languages. The visualization technique one might use for economic performance is the same as for sports results. And so on.

What Can Kids Do?

Every day lots of top scientists and technologists use the Wolfram Language to do lots of sophisticated things. But of course the big thing in recent times is that the Wolfram Language has got to the point where it can also readily be used by kids. And I’m not talking about some watered-down toy version. I’m talking about the very same Wolfram Language that the fanciest professionals use. (Yes, just like the English language where there are obscure words kids won’t typically use, so there are obscure functions in the Wolfram Language that kids won’t typically use.)

So what’s made this possible? It’s basically the layers and layers of automation that we’ve built into the Wolfram Language over the past thirty years. The goal is to automate as much as possible—so that the humans who use the Wolfram Language, whether they’re sophisticated professionals or middle-school kids, just have to provide the concepts and the computational thinking, and then the language takes over and automates the details of actually getting things done.

In the past, there always had to be separate systems for kids and professionals to use. But thanks to all this automation, they’ve converged. It’s happened before, in other fields. In video editing, for example, there used to be simple systems for amateurs and complicated systems for professionals—but now everyone, from kids to makers of the world’s most expensive movies, uses the very same systems.

It’s probably more difficult to achieve this in computational thinking and programming—but that’s what the past thirty years of work on the Wolfram Language has, I think, now definitively achieved.

In many standard curriculum subjects, kids in school only get to do pale shadows of what professionals do. But when it comes to computational thinking, they’ve now got the same tools—and it’s now realistic for them to do the same professional-grade kinds of things.

Most of what kids get to do in school has, in a sense, little visible leverage. Kids spend a lot of effort to produce one answer in math or chemistry or whatever. If kids write essays, they have to explicitly write out each word. But with computational thinking and the Wolfram Language, it’s a different story. Once a kid understands how to formulate something computationally, and how to write it in the Wolfram Language, then the language takes over to build what’s potentially a big and sophisticated result.

A student might have some idea about the growth and decay of historical empires, and might figure out how to formulate the idea in terms of time series of geographic areas of historical countries. And as soon as they write this idea in the Wolfram Language, the language takes over, and pretty soon the student has elaborate tables and infographics and whatever—from which they can then draw all sorts of conclusions.

But what do kids learn from writing things in the Wolfram Language? Well, first and foremost, they learn computational thinking. Computational thinking is really a new way of thinking. But it’s got certain similarities in its character to other things kids do. Like math, for example, it forces a certain precision and clarity of thinking. But like writing, it’s fundamentally about communicating ideas. And also like writing, it’s a fundamentally creative activity. Good code in the Wolfram Language, like good writing, is clear and elegant—and can readily be read and understood. But unlike ordinary writing, humans aren’t the only target audience: it’s also for computers, to tell them what to automatically do.

When students do problems in math or chemistry or other subjects, the only way they can typically tell if they’ve got the right answer is for their teacher to tell them, or for them to “look it up in the back of the book”. But it’s a whole different story with Wolfram Language code. Because kids themselves can tell if they’re on the right track. The code was supposed to make a honeycomb-like array. Well, did it?

The whole process of creating code is a little different from anything else kids normally do. There’s formulating the code, and then there’s debugging it. Debugging is a very interesting intellectual exercise. The mechanics of it are vastly easier in the Wolfram Language than they’ve ever been before—because the Wolfram Language is symbolic, so any fragment of code can always be run on its own, and separately studied.

But debugging is ultimately about understanding, and problem solving. It’s a very pure form of what comes up in a great many things in life. But what’s really nice about it—particularly in the Wolfram Language—is the instant feedback. You changed something; did it help? Or do you have to dive in and figure out something else?

Part of debugging is just about getting a piece of code to produce something. But the other part is understanding if it produces the right thing. Is that really a sensible social network for the Shakespeare play? Why are there lots of characters who don’t seem to connect to anyone else? Let’s understand how we defined “connectivity”. Does it really make sense? Is there a better definition?

This is the kind of thing computational thinking is about. It’s not so much about programming: it’s about what should be programmed; it’s about the overall problem of formulating things so they can be put into computational form. And now—with today’s Wolfram Language—we have an environment for taking what’s been formulated, and actually turning it into something real.

Led by Kids

When I show computational thinking and the Wolfram Language to kids, I’ll usually try to figure out what the kids are interested in. Are they into art? Or science? Or history? Or videogames? Or what? Then—and it’s always fun for me to do this—I’ll come up with an example that relates to their interest. And we’ll run it. And it’ll produce some result, maybe some image or visualization. And then the kids will look at it, and think about it based on what they already know. And then, almost always, they’ll ask questions. “How does this extend to that?” “What about doing this instead?” And this is where things get really good. Because when the kids are asking their own questions, you can tell they’re getting seriously engaged; they’re really thinking about what’s going on.

Most subjects that are taught in school are somewhat tightly constrained. Questions can be asked, but they’re more like typical “tech support”: help me to understand this existing feature. They’re not like “let’s talk about something new”. A few times I’ve done “ask me anything” sessions about science with kids. It’s an interesting experience. There’ll be a question where, yes, it can easily be answered from college-level physics. Then another question that might require graduate-level knowledge. And then—whoosh—there’ll be an obvious-sounding question which I know simply hasn’t been answered, even by the latest leading-edge research. Or maybe one where, yes, I know the answer, but only because just last month I happened to talk to the world expert who recently figured it out. Before I tried these kinds of “ask me anything” sessions I didn’t really appreciate how hard it can be when kids ask “free-range” questions. But now I understand why, unless one has teachers with broad research-level knowledge, there’s little choice but to make traditional school subjects much more tightly constrained.

But there’s something new that’s possible using the Wolfram Language as a tool. Because with the Wolfram Language a teacher doesn’t have to know the whole answer to a question: they just have to be able to formulate the question in a computational way, so the Wolfram Language can compute the answer. Yes, there’s skill required on the part of the teacher to be able to write in the Wolfram Language. But it’s really fun—and educational—for student and teacher together to be getting the answers to questions.

I’ve often done what I call “live experiments”. I take some topic—either suggested by the audience, or that I thought of just before I start—and then I explore that topic live with the Wolfram Language, and see what I can discover about it. It’s gotten a lot easier over the years, as the capabilities and level of automation in the Wolfram Language have increased. I usually open our Wolfram Summer School by doing a live experiment. And I’ll make the claim that over the course of an hour or so, we’ll build up a notebook where we’ve discovered something new and interesting enough that it could be the seed for an academic paper or the like. It can be quite nerve-wracking for me. But in almost all cases it works out extremely well. And I think it’s an educational and empowering thing to watch. Because most people don’t realize that it’s even faintly possible to go from zero to a publishable discovery in an hour. But that’s what the modern Wolfram Language makes possible. And while it obviously helps that I personally have a lifetime of experience in computational thinking and discovering things, it’s surprisingly easy for anyone with decent knowledge of computational thinking and the Wolfram Language to do a very compelling live experiment.

When I was a kid I was never a fan of exercises in textbooks. I always took the point of view that it wasn’t very exciting to do the same thing lots of people had already done. And so I always tried to think of different questions that I could explore, and where I could potentially see things that nobody had seen before. Well, in modern times with the Wolfram Language, doing things that have never been done before has become vastly easier. Not every kid has the same motivation structure as I had. But for many people there’s extra satisfaction in being able to make something that’s really their own creation—and not just a re-run of what’s been made before. And at a practical level, it’s great that with the Wolfram Cloud it’s easy to share what’s made—and for example to create one’s own active website or app, that one can show to one’s class, one’s friends, or the world.

So where are there discoveries that can be made by kids? Everywhere! Even in a technical, well-developed area like math, there’s endless experimental mathematics to be done, where discoveries can be made. In the sciences, there’s a small additional hurdle, because one’s typically got to deal with actual data. Of course, there’s lots of data built right into the Wolfram Language. And it’s easier than ever to get more data. Perhaps one just uses a camera or a microphone, or, more elaborately, one gets sensors connected through Raspberry Pi or Arduino, or whatever.

So what about the humanities? Well, here again one needs data. But again there’s lots of it that’s built right into the Wolfram Language (images of famous artworks, texts of famous books, information on historical countries, and so on and so on). And in today’s world, it’s become extremely easy to find more data on the web—and to import it into the Wolfram Language. Sometimes there’s some data curation involved (which itself is interesting and educational), but it’s amazing in modern times how easy it’s become to find, for example, even obscure documents from centuries ago on the web. (And, yes, that’s one of the things that’s really helped my own hobby of studying history.)

Computational thinking is an area that really lends itself to project-based learning. Every year for our summer programs, I come up with hundreds of ideas for projects that are accessible to kids. And with a little help, the kids themselves come up with even more. For our summer programs, we have kids work on projects on their own, but it’s easy for kids to collaborate on these projects too. We typically have a definite end point for projects: create a Demonstration, or a web app, and write a description, perhaps to post on the Wolfram Community. (Particularly with Demonstrations for our Wolfram Demonstrations Project, the actual process of review and publication tends to be educational too.)

Of course, even when a particular project has been “done before”, it’ll usually be different if it’s done again. At the very simplest level, writing code is a creative process and different people will write it differently. And if there are visualizations or user interfaces as part of the project, each person can creatively invent new ways to do these.

But, OK, all this creative stuff is well and good. But in practice a lot of education has to be done in more of a production-line mode, with large numbers of students in some sense always doing the same thing. And even with this constraint, there’s something good about computational thinking, and coding in the Wolfram Language. One of the convenient features of math is that when people do exercises, they get definite answers, which are easy to check (well, at least up to issues of equivalence of algebraic expressions, which basically needs our whole math technology stack to get right). When people write essays, there’s basically no choice but to have actual humans read them (yes, one can determine some things with natural language processing and machine learning, but the real point of essays is to communicate with humans, and ultimately to tell if that’s working you really need humans in the loop).

Well, when one writes a piece of code, it’s a creative act, like writing an essay. But now one’s making something that’s set up to be communicated to a computer. And so it makes perfect sense to have a computer read it and assess it. It’s still not a trivial task, though. Because, for example, one wants to check that the student didn’t in effect just put the final answer right into the code they wrote—and that the code really did express, preferably with clarity, a computational idea. It gets pretty high tech, but by using the symbolic character of the Wolfram Language, plus some automated theorem proving and machine learning, it seems to be possible to do very well on this in practice. And that’s for example what’s allowed us to put automatically graded versions of the exercises from my Elementary Introduction book on the web.

At one level one can assess what’s going on by looking at the final code students write. Even though there may be an infinite number of different possible programs, one can assess which ones are correct, and even which ones satisfy particular efficiency or elegance criteria. But there’s much further one can go. Because unlike an area like math where students tend to do their thinking on scratch paper, in coding each step in the process of writing a program tends to be done on the computer, with every keystroke able to be captured. I myself have long been an enthusiast of personal analytics, and occasionally I’ve done at least a little analysis on the process by which I write and debug programs. But there’s a great opportunity in education for this, first in producing elaborate educational analytics (for which the Wolfram Language and Wolfram Cloud are a perfect fit), and then for creating deep ways of adapting to the actual behavior and learning process of each individual student.

Ultimately what we presumably want is an accurate computational model of every student. And with the current machine learning technology that we have in the Wolfram Language I think we’re beginning to have what’s needed to build it. Given this model, what we’d presumably then do is in effect run lots of simulations of what would happen if the student were told this or that, trying to determine what the optimal thing to explain, or optimal exercise to give, would be at any given time.

In helping with an area like basic math, this kind of personalization is fairly easy to do with simple heuristics. When it comes to helping with coding and computational thinking, the problem is considerably more complicated. But it’s a place where, with good computational thinking, and sophisticated computation inside the system, I think it’ll be possible to do something really good.

I might mention that there’s always a question of what one should assess to find out if someone has really understood a particular thing. With a good computational model of every student, one could have a very sophisticated answer to this. But somewhere one’s still going to have to invent types of exercises or tests to give (well, assuming that one doesn’t just go for the arguably much better scheme of just assessing whole projects).

One fundamental type of exercise—of which my Elementary Introduction is full—is of the form “write a piece of code to do X”. But there are others too. One is “simplify this piece of code”, or “find an input where this function will fail”. Of course, there are exercises like “what will this piece of code do?”. But in some sense exercises like that seem silly: after all, one can just run the code to find out.

Now, I have to say I think it’s useful for people to do a bit of “acting like a computer”. It’s helpful in understanding what computation is, and how the process of computation works. But it’s not something to do a lot of. The real focus, I think, should be on educating people about what they themselves actually need to do. There is technology and automation in the world, and there’ll be more of it over time. There’s no point in teaching people to do a computer’s job; one should teach them to do what only they can do, working with the computer as a tool and partner, in the best possible way.

(I’ve heard arguments about teaching kids how to do arithmetic without calculators that go along the lines of “what if you were on a desert island without a calculator?”. And I can already hear someone making the same argument about teaching kids how to work out by hand what programs do. But, er, if you’re on a desert island without a computer, why exactly are you writing code? [Of course, when code literacy becomes more universal, it might be a different story, because humans on a desert island might be writing code for other humans to read...])

OK, so what are the important things to teach? Computational thinking is really about thinking. It’s about formulating ideas in a structured way, that, conveniently enough, can in the modern world be communicated to a computer, which can then do interesting things.

There are facts and ideas to know. Some of them are about the abstract process of computation. But some of them are about how things in the world get made systematic. How is color represented? How are points on the Earth specified? How does one represent the glyphs of different human languages? And so on. We made a poster a few years ago of the history of the systematic representation of data. Just the content of that poster would make an interesting course.

But, OK, so if one knows something about how to represent things, and about the processes of computation, what should one learn how to figure out? The fundamental goal is to get to the point where one’s able to take something one wants to know or do, and be able to cast it into computational form.

Often that’s about “inventing an algorithm”, or “inventing a heuristic”. What’s a good way to compare the growth of the Roman Empire with the spread of the Mongols? What’s the right thing to compute? The right thing to display? How can one tell if there are really more craters near the poles of the Moon? What’s a good way to identify a crater from an image anyway?

It’s the analog of things like this that are at the core of making progress in basically every “computational X” field. And it’s people who learn to be good at these kinds of things who will be the most successful in these fields. Around our company, many of these “invent an algorithm; invent a heuristic” kinds of problems are solved every day—and that’s a large part of what’s gone into building up the Wolfram Language, and Wolfram|Alpha, all these years.

Yes, once the algorithm or the heuristic is invented, it’s up to the computer to execute it. But inventing it is typically first and foremost about understanding what’s wanted in a clear and structured enough way that it can be made computational. With effort, one can invent disembodied exercises that are as abstract as possible. But what’s much more common—and useful—is to have questions that connect to the outside world.

Even a question like “Given a bunch of x,y pairs, what’s a good algorithm for deciding if one should plot them as separate points, or with a line joining them?” is really a question that depends on thinking about the way the world is. And from an educational point of view, what’s really nice about questions of computational thinking is that they almost inevitably involve input from other domains of knowledge. They force a certain kind of broad, general thinking, and a certain application of common sense, that is incredibly valuable for so much of what people need to do.
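
Just as a sketch of what such a heuristic might look like (the particular rule here is purely illustrative, not a recommendation): join the points with a line only if the x values come in increasing order, as one might expect for a sampled function:

plotPairs[pts_] := If[OrderedQ[pts[[All, 1]]], ListLinePlot[pts], ListPlot[pts]]
plotPairs[{{1, 1}, {2, 4}, {3, 9}}]     (* the x values increase, so join with a line *)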

What Is Computation, and Programming, Anyway?

Teaching “coding” is something that’s been talked about quite a lot in the past few years. Of course, “coding” isn’t the same as computational thinking. It’s a little bit like the relation of handwriting or typing to essay writing. You (normally) need handwriting or typing to be able to actually produce an essay, but it’s not the intellectual core of the activity. But, OK, so how should one teach “coding”?

Well, in the Wolfram Language the idea is that one should be able to take ideas as humans formulate them with computational thinking, and convert them as directly as possible into code in the language. In some small cases (and they’ll gradually get a bit bigger) it’s possible to just specify what one wants in English. But normally one’s writing directly in the Wolfram Language. Which means at some level one’s doing coding, otherwise known as programming.

It’s a much higher-level form of programming, though, than most programmers are used to. And that’s precisely why it’s now accessible to a much broader range of people, and why it makes sense to inject it on a large scale into education.

So how does it relate to “traditional” programming education? There are really two types of programming education that have been tried: what one might call the “high-school version” and the “elementary-school version”. These days the high-school version is mostly about C++ and Java. The elementary-school version is mostly about derivatives of Logo like Scratch. I’ve been shocked, though, that even among technically-oriented kids educated at sophisticated schools in the US, it’s still surprisingly rare to find ones who’ve learned any serious amount of programming in school.

But when they do learn about “programming”, say in high school, what do they actually learn? There’s usually a lot of syntactic detail, but the top concepts tend to be conditionals, loops and variables. As someone who’s spent most of his life thinking about computation, I find this really disappointing. Yes, these concepts are certainly part of low-level computer languages. But they’re not central to what we now broadly understand as computation—and in computational thinking in general they’re at best sideshows.

What is important? In practice, probably the single most important concept is just that everything (text, images, networks, user interfaces, whatever) can be represented in computational form. Ideas like functions and lists are also important. And if one’s being intellectual, the notion of universal computation (which is what makes software possible) is important too.

But the problem is that what’s being taught now is not only not general computational thinking, it’s not even general programming. Conditionals, loops and variables were central to the very first practical computer languages that emerged in the 1960s. Today’s computer languages—like C++ and Java—have much better ways to manage large volumes of code. But their underlying computational structure is remarkably similar to the 1960s languages. And in fact kids—who are typically writing very small amounts of code—end up really just dealing with computing as it was in the 1960s (though perhaps with mechanisms aimed at large codebases making it more complicated).

The Wolfram Language is really a language of modern times. It wouldn’t have been practical at all in the 1960s: computers just weren’t big and fast enough, and there wasn’t anything like the cloud in which to maintain a large knowledgebase. (As it happens, there were languages like LISP and APL even in the early 1960s that had higher-level ideas reminiscent of the Wolfram Language, but it took decades before those ideas could really be used in practice.)

So what of loops and conditionals and variables? Well, they all exist in the Wolfram Language. They just aren’t front and center concepts. In my Elementary Introduction book, for example, it’s Chapter 38 before I talk about assigning values to variables, and it happens after I’ve discussed deploying sophisticated knowledge-based apps to the web.

To give an example, let’s say one wants to make a table of the first 10 squares. In the Wolfram Language one could do this very simply, with:

Table[n^2, {n, 10}]

But if one’s working in C for example, it’d be roughly:

#include <stdio.h>
int main(void) { int n; for (n = 1; n <= 10; n++) printf("%d ", n*n); return 0; }

A non-programmer might ask: “What the heck is all that stuff?” Well, instead of just saying directly what we want, what it’s doing is telling the computer at a low level exactly what it should do. We’re telling it to allocate memory to store the integer value of n. We’re saying to start with n=1, and keep incrementing n until it gets to 10. And then we’re telling the computer that in each case it should print the square of n. There’s a lot of detail. (To be fair, in a more modern language like Python or JavaScript, some of this goes away, but in this example we’re still left dealing with an explicit loop and its variable.)

Now, the crucial point is that the loops and conditionals and variables aren’t the real point of the computation; they’re just details of the particular implementation in a low-level language. I’ve heard people say it’s simpler for kids to understand what’s going on when there are explicit loops and conditionals and variables. From my observations this simply isn’t true. Maybe it’s something that’s changed over the years, as people have gotten more exposed to computation and computational ideas in their everyday lives. But as of now, talking about the details of loops and conditionals and variables just seems to make it harder for kids to understand the concepts of computation.

Is it useful to learn about loops and conditionals and variables at some point? Definitely. They’re part of the whole story of computation and computational thinking. They’re just not the most important part, or the first part to learn. Oh, and by the way, if one’s going to start talking about doing computation with images or networks or whatever, concepts like loops really aren’t what one wants at all.
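
For instance (just as one small illustration), a whole image can be processed in a single step, with no loop over pixels in sight:

EdgeDetect[ColorNegate[ExampleData[{"TestImage", "Mandrill"}]]]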

One important feature of the Wolfram Language is that in its effort to cover general computational thinking it integrates a large number of different computational paradigms. There’s functional programming. And procedural programming. And list-based programming. And symbolic programming. And machine learning and example-based programming. And so on. So when people learn the Wolfram Language, they’re immediately getting exposed to a broad spectrum of computational ideas, conveniently all consistently packaged together.
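
As a small illustration (not taken from any course material), here is the same little computation, squaring the numbers 1 through 10, written in a few of these styles:

Table[n^2, {n, 10}]                                                  (* list-based *)
Map[#^2 &, Range[10]]                                                (* functional *)
Module[{result = {}}, Do[AppendTo[result, n^2], {n, 10}]; result]    (* procedural *)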

But what happens when someone who’s learned programming in the Wolfram Language wants to do low-level programming in C++ or Java? I’ve seen this a few times, and it’s been quite charming. They seem to have no difficulty at all grasping how to do good programming in these lower-level languages, but they keep on exclaiming about all the quaint things they have to do, and all the things that don’t work. “Oh my gosh, I actually have to allocate memory myself”. “Wow, there’s a limit on the size of an integer”. And so on.

The transition from the Wolfram Language to lower-level languages seems to be easy. The other way around it’s sometimes a little more challenging. And I must say that I often find it easier to teach computational thinking to kids who know nothing about programming: they pick up the concepts very quickly, and they don’t have to unlearn the idea that everything must turn into loops and conditionals and so on.

When I started considering teaching computational thinking and the Wolfram Language to kids, I imagined it would mostly be high-school kids. But particularly when my Introduction book came out, I was surprised to learn that all sorts of 11- and 12-year-olds were going through it. And my current conclusion is that what we’ve got with Wolfram Programming Lab and so on is suitable for kids down to about age 11 or 12.

What about younger kids? Well, in today’s world, all of them are using computers or smartphones, and are getting exposed to all sorts of computational activities. Maybe they’re making and editing videos. Maybe they’re constructing assets for a game. And all of these kinds of activities are good precursors to computational thinking.

Back in the 1960s, a bold experiment was started in the form of Logo. I’m told the original idea was to construct 50 “microworlds” where kids could experiment with computers. The very first one involved a “turtle” moving around on the screen—and over the course of a half-century this evolved into things like Scratch (which has an orange cat rather than a turtle). Unfortunately, however, the other 49 microworlds never got built. And while the turtle (or cat) is quite cute (and an impressive idea for the 1960s), it seems disappointingly narrow from the point of view of today’s understanding and experience of computation.

Still, lots of kids are exposed to things like Scratch in elementary school—even if sometimes only for a single “hour of code” in a year. In past years, there was clear value in having younger kids get the idea that they could make a computer do what they want at all. But the proliferation of other ways young kids use computation and computational ideas has made this much less significant. And yes, teaching loops and conditionals to elementary-school kids does seem a bit bizarre in modern times.

I strongly suspect that there are some much better ways to teach ideas of computational thinking at young ages—making use of all the technology and automation we have now. One feature of systems like Scratch is that their programs are assembled visually out of brick-like blocks, rather than having to be typed. Usually in practice the programs are quite linear in their structure. But the blocks do two things. First, they avoid the need for any explicit syntax (instead it’s just “does the block fit or not?”). And second, by having a stack of possible blocks on the side of the screen, they immediately document what’s possible.

And perhaps even more important: this whole setup forces one to have only a small collection of possible blocks, in effect a microworld. In the full Wolfram Language, there are over 5000 built-in functions, and just turning them all into blocks would be overwhelming and unhelpful. But the point is to select out of all these possible functions several (50?) microworlds, each involving only a small set of functions, but each chosen so that rich and interesting things can be done with them.

With our current technology, those microworlds can readily involve image computation, or natural language understanding, or machine learning—and, most importantly, can immediately relate to the real world. And I strongly suspect that by including some of these far-past-the-1960s things, we’ll be able to expose young kids much more directly and successfully to ideas about computational thinking that they’ll be able to take with them when they come to learn more later.
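
Purely as a hypothetical sketch of what one such microworld might contain: a handful of image functions, say, from which lots of different things can already be made:

img = ExampleData[{"TestImage", "House"}];
ImageCollage[{img, Blur[img, 5], ColorNegate[img], EdgeDetect[img]}]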

How Is This All Going to Happen?

The process of educating kids—and the world—about computational thinking is only just beginning. I’m very excited that with the Wolfram Language and the systems around it, we’ve finally got tools that I think solve the core technological problems involved. But there are lots of structural, organizational and other issues that remain.

I’m trying to do my part, for example, by writing my Elementary Introduction to the Wolfram Language, releasing Wolfram Programming Lab, and creating the free Wolfram Open Cloud. But these are just first steps. There need to be lots of books and courses aimed at different populations. There need to be online and offline communities and activities defined. There need to be ways to deliver what’s now possible to students. And there need to be ways to teach teachers how to help.

We’ve got quite a few basic things in the works. A packaged course based on the Elementary Introduction. A Wolfram Challenges website with coding and computational thinking challenges. A more structured mentorship program for individual students doing projects. A franchisable version of our Wolfram Summer Camp. And more. Some of these are part of Wolfram Research; some come from the Wolfram Foundation. We’re considering a broader non-profit initiative to support delivering computational thinking education. And we’ve even thought about creating a whole school that’s centered around computational thinking—not least to show at least one model of how it can be done.

But beyond anything we’re doing, what I’m most excited about is that other people, and other organizations, are starting to take things forward, too. There are in-school programs, after-school programs, summer programs. There are the beginnings of very large-scale programs across countries.

Our own company and foundation are fairly small. To be able to educate the world about computational thinking, many other people and organizations need to be involved. Thanks to three decades of work we are at the point where we have the technology. But now we have to actually get it delivered to kids all over the world in the right way.

Computational thinking is something that I think can be successfully taught to a very wide range of people, regardless of their economic resources. And because it’s so new, countries or regions with more sophisticated educational setups, or greater technological prowess, don’t really have any great advantage over anyone else in doing it.

Eventually, much of the world’s population will be able to do computational thinking and be able to communicate with computers using code—just as they can now read and write. But today we’re just at the beginning of making this happen. I’m pleased to be able to contribute technology and a little more to this. I look forward to seeing what I hope will be rapid progress on this in the next year or so, and in the years to come.

Try the example computations from this blog post in the Wolfram Open Cloud »

Computational Law, Symbolic Discourse and the AI Constitution

Leibniz’s Dream

Gottfried Leibniz—who died 300 years ago this November—worked on many things. But a theme that recurred throughout his life was the goal of turning human law into an exercise in computation. Of course, as we know, he didn’t succeed. But three centuries later, I think we’re finally ready to give it a serious try again. And I think it’s a really important thing to do—not just because it’ll enable all sorts of new societal opportunities and structures, but because I think it’s likely to be critical to the future of our civilization in its interaction with artificial intelligence.

Human law, almost by definition, dates from the very beginning of civilization—and undoubtedly it’s the first system of rules that humans ever systematically defined. Presumably it was a model for the axiomatic structure of mathematics as defined by the likes of Euclid. And when science came along, “natural laws” (as their name suggests) were at first viewed as conceptually similar to human laws, except that they were supposed to define constraints for the universe (or God) rather than for humans.

Over the past few centuries we’ve had amazing success formalizing mathematics and exact science. And out of this there’s a more general idea that’s emerged: the idea of computation. In computation, we’re dealing with arbitrary systems of rules—not necessarily ones that correspond to mathematical concepts we know, or features of the world we’ve identified. So now the question is: can we use the ideas of computation, in very much the way Leibniz imagined, to formalize human law?

The basic issue is that human law talks about human activities, and (unlike say for the mechanics of particles) we don’t have a general formalism for describing human activities. When it comes to talking about money, for example, we often can be precise. And as a result, it’s pretty easy to write a very formal contract for paying a subscription, or determining how an option on a publicly traded stock should work.

But what about all the things that typical legal contracts deal with? Well, clearly we have one way to write legal contracts: just use natural language (like English). It’s often very stylized natural language, because it’s trying to be as precise as possible. But ultimately it’s never going to be precise. Because at the lowest level it’s always going to depend on the meanings of words, which for natural language are effectively defined just by the practice and experience of the users of the language.

A New Kind of Language

For a computer language, though, it’s a different story. Because now the constructs in the language are absolutely precise: instead of having a vague, societally defined effect on human brains, they’re defined to have a very specific effect on a computer. Of course, traditional computer languages don’t directly talk about things relevant to human activities: they only directly talk about things like setting values for variables, or calling abstractly defined functions.

But what I’m excited about is that we’re starting to have a bridge between the precision of traditional computer languages and the ability to talk about real-world constructs. And actually, it’s something I’ve personally been working on for more than three decades now: our knowledge-based Wolfram Language.

The Wolfram Language is precise: everything in it is defined to the point where a computer can unambiguously work with it. But its unique feature among computer languages is that it’s knowledge based. It’s not just a language to describe the low-level operations of a computer; instead, built right into the language is as much knowledge as possible about the real world. And this means that the language includes not just numbers like 2.7 and strings like “abc”, but also constructs like the United States, or the Consumer Price Index, or an elephant. And that’s exactly what we need in order to start talking about the kinds of things that appear in legal contracts or human laws.
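
So, for example (a minimal illustration), a real-world construct can sit in an expression right alongside a number or a string:

{2.7, "abc", Entity["Country", "UnitedStates"]}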

I should make it clear that the Wolfram Language as it exists today doesn’t include everything that’s needed. We’ve got a large and solid framework, and we’re off to a good start. But there’s more about the world that we have to encode to be able to capture the full range of human activities and human legal specifications.

The Wolfram Language has, for example, a definition of what a banana is, broken down by all kinds of details. So if one says “you should eat a banana”, the language has a way to represent “a banana”. But as of now, it doesn’t have a meaningful way to represent “you”, “should” or “eat”.

Is it possible to represent things like this in a precise computer language? Absolutely! But it takes language design to set up how to do it. Language design is a difficult business—in fact, it’s probably the most intellectually demanding thing I know, requiring a strange mixture of high abstraction together with deep knowledge and down-to-earth practical judgment. But I’ve been doing it now for nearly four decades, and I think I’m finally ready for the challenge of doing language design for everyday discourse.

So what’s involved? Well, let’s first talk about it in a simpler case: the case of mathematics. Consider the function Plus, which adds things like numbers together. When we use the English word “plus” it can have all sorts of meanings. One of those meanings is adding numbers together. But there are other meanings, that are related, say, by various analogies (“product X plus”, “the plus wire”, “it’s a real plus”, …).

When we come to define Plus in the Wolfram Language we want to build on the everyday notion of “plus”, but we want to make it precise. And we can do that by picking the specific meaning of “plus” that’s about adding things like numbers together. And once we know that this is what Plus means, we immediately know all sorts of properties, and can do explicit computations with it.
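
And because Plus has this one precise meaning, computations with it are completely unambiguous, whether the arguments are explicit numbers or symbolic:

Plus[2, 3]        (* gives 5 *)
2 + x + 3         (* gives 5 + x: the symbolic x is just carried along *)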

Now consider a concept like “magnesium”. It’s not as perfect and abstract a concept as Plus. But physics and chemistry give us a clear definition of the element magnesium—which we can then use in the Wolfram Language to have a well-defined “magnesium” entity.

It’s very important that the Wolfram Language is a symbolic language—because it means that the things in it don’t immediately have to have “values”; they can just be symbolic constructs that stand for themselves. And so, for example, the entity “magnesium” is represented as a symbolic construct, that doesn’t itself “do” anything, but can still appear in a computation, just like, for example, a number (like 9.45) can appear.
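
So, for example, the magnesium entity evaluates to itself, but one can still ask it for properties and use it in computations (a minimal illustration):

Entity["Element", "Magnesium"]                      (* just stands for itself *)
Entity["Element", "Magnesium"]["AtomicNumber"]      (* gives 12 *)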

There are many kinds of constructs that the Wolfram Language supports. Like “New York City” or “last Christmas” or “geographically contained within”. And the point is that the design of the language has defined a precise meaning for them. New York City, for example, is taken to mean the precise legal entity considered to be New York City, with geographical borders defined by law. Internal to the Wolfram Language, there’s always a precise canonical representation for something like New York City (it’s Entity["City", {"NewYork", "NewYork", "UnitedStates"}]). And this internal representation is all that matters when it comes to computation. Yes, it’s convenient to refer to New York City as “nyc”, but in the Wolfram Language that natural language form is immediately converted to the precise internal form.
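
One way to see this conversion programmatically (in a notebook one would typically just use free-form linguistic input instead) is with the Interpreter framework:

Interpreter["City"]["nyc"]
(* gives Entity["City", {"NewYork", "NewYork", "UnitedStates"}] *)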

So what about “you should eat a banana”? Well, we’ve got to go through the same language design process for something like “eat” as for Plus (or “banana”). And the basic idea is that we’ve got to figure out a standard meaning for “eat”. For example, it might be “ingestion of food by a person (or animal)”. Now, there are plenty of other possible meanings for the English word “eat”—for example, ones that use analogies, as in “this function eats its arguments”. But the idea—like for Plus—is to ignore these, and just to define a standard notion of “eat” that is precise, and suitable for computation.

One gets a reasonable idea of what kinds of constructs one has to deal with just by thinking about parts of speech in English. There are nouns. Sometimes (as in “banana” or “elephant”) there’s a pretty precise definition of what these correspond to, and usually the Wolfram Language already knows about them. Sometimes it’s a little vaguer but still concrete (as in “chair” or “window”), and sometimes it’s abstract (like “happiness” or “justice”). But in each case one can imagine one or several entities that capture a definite meaning for the noun—just like the Wolfram Language already has entities for thousands of kinds of things.

Beyond nouns, there are verbs. There’s typically a certain superstructure that exists around verbs. Grammatically there might be a subject for the verb, and an object, and so on. Verbs are similar to functions in the Wolfram Language: each one deals with certain arguments, that for example correspond to its subject, object, etc. Now of course in English (or any other natural language) there are all sorts of elaborate special cases and extra features that can be associated with verbs. But basically we don’t care about these. Because we’re really just trying to define symbolic constructs that represent certain concepts. We don’t have to capture every detail of how a particular verb works; we’re just using the English verb as a way to give us a kind of “cognitive hook” for the concept.

We can go through other parts of speech. Adverbs that modify verbs; adjectives that modify nouns. These can sometimes be represented in the Wolfram Language by constructs like EntityInstance, and sometimes by options to functions. But the important point in all cases is that we’re not trying to faithfully reproduce how the natural language works; we’re just using the natural language as a guide to how concepts are set up.

Pronouns are interesting. They work a bit like variables in pure anonymous functions. In “you should eat a banana”, the “you” is like a free variable that’s going to be filled in with a particular person.
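
As a loose analogy only (the names here are made up for illustration), one can think of it a bit like a function with a slot waiting to be filled:

shouldEatBanana = Function[you, {you, "should eat", "a banana"}];
shouldEatBanana["Alice"]      (* fills in the "you" slot *)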

Parts of speech and grammatical structures suggest certain general features to capture in a symbolic representation of discourse. There are a bunch of others, though. For example, there are what amount to “calculi” that one needs to represent notions of time (“within the time interval”, “starting later”, etc.) or of space (“on top of”, “contained within”, etc.). We’ve already got many calculi like these in the Wolfram Language; the most straightforward are ones about numbers (“greater than”, etc.) or sets (“member of”), etc. Some calculi have long histories (“temporal logic”, “set theory”, etc.); others still have to be constructed.
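
A couple of the most straightforward ones are already completely routine in the Wolfram Language (just as simple examples):

Greater[3, 2]            (* a "calculus" of numbers: 3 is greater than 2 *)
MemberQ[{1, 2, 3}, 2]    (* a "calculus" of sets: 2 is a member of the set *)
GeoWithinQ[Entity["Country", "UnitedStates"], Entity["City", {"NewYork", "NewYork", "UnitedStates"}]]    (* spatial containment *)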

Is there a global theory of what to do? Well, no more than there’s a global theory of how the world works. There are concepts and constructs that are part of how our world works, and we need to capture these. No doubt there’ll be new things that come along in the future, and we’ll want to capture those too. And my experience from building Wolfram|Alpha is that the best thing to do is just to build each thing one needs, without starting off with any kind of global theory. After a while, one may notice that one’s built similar things several times, and one may go in and unify them.

One can get deep into the foundations of science and philosophy about this. Yes, there’s a computational universe out there of all the possible rules by which systems can operate (and, yes, I’ve spent a good part of my life studying the basic science of this). And there’s our physical universe that presumably operates according to certain rules from the computational universe. But from these rules can emerge all sorts of complex behavior, and in fact the phenomenon of computational irreducibility implies that in a sense there’s no limit to what can be built up.

But there’s not going to be an overall way to talk about all this stuff. And if we’re going to be dealing with any finite kind of discourse it’s going to only capture certain features. Which features we choose to capture is going to be determined by what concepts have evolved in the history of our society. And usually these concepts will be mirrored in the words that exist in the languages we use.

At a foundational level, computational irreducibility implies that there’ll always be new concepts that could be introduced. Back in antiquity, Aristotle introduced logic as a way to capture certain aspects of human discourse. And there are other frameworks that have been introduced in the history of philosophy, and more recently, natural language processing and artificial intelligence research. But computational irreducibility effectively implies that none of them can ever ultimately be complete. And we must expect that as the concepts we consider relevant evolve, so too must the symbolic representation we have for discourse.

The Discourse Workflow

OK, so let’s say we’ve got a symbolic representation for discourse. How’s it actually going to be used? Well, there are some good clues from the way natural language works.

In standard discussions of natural language, it’s common to talk about “interrogative statements” that ask a question, “declarative statements” that assert something and “imperative statements” that say to do something. (Let’s ignore “exclamatory statements”, like expletives, for now.)

Interrogative statements are what we’re dealing with all the time in Wolfram|Alpha: “what is the density of gold?”, “what is 3+7?”, “what was the latest reading from that sensor?”, etc. They’re also common in notebooks used to interact with the Wolfram Language: there’s an input (In[1]:= 2+2) and then there’s a corresponding output (Out[1]= 4).

Declarative statements are all about filling in particular values for variables. In a very coarse way, one can set values (x=7), as in typical procedural languages. But it’s typically better to think about having environments in which one’s asserting things. Maybe those environments are supposed to represent the real world, or some corner of it. Or maybe they’re supposed to represent some fictional world, where for example dinosaurs didn’t go extinct, or something.
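
In Wolfram Language terms (just a rough analogy for such “environments”), one can set a value globally, or assert it only within a local scope:

x = 7                        (* a coarse, global assertion *)
Block[{x = 42}, x^2]         (* gives 1764; outside the Block, x is still 7 *)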

Imperative statements are about making things happen in the world: “open the pod bay doors”, “pay Bob 0.23 bitcoin”, etc.

In a sense, interrogative statements find out about the state of the world, declarative statements assert things about it, and imperative statements change it.

In different situations, we can mean different things by “the world”. We could be talking about abstract constructs, like integers or logic operations, that just are the way they are. We could be talking about natural laws or other features of our physical universe that we can’t change. Or we could be talking about our local environment, where we can move around tables and chairs, choose to eat bananas, and so on. Or we could be talking about our mental states, or the internal state of something like a computer.

There are lots of things one can do if one has a general symbolic representation for discourse. But one of them—which is the subject of this post—is to express things like legal contracts. The beginning of a contract, with its various whereas clauses, recitals, definitions and so on tends to be dense with declarative statements (“this is so”). Then the actual terms of the contract tend to end up with imperative statements (“this should happen”), perhaps depending on certain things determined by interrogative statements (“did this happen?”).

It’s not hard to start seeing the structure of contracts as being much like programs. In simple cases, they just contain logical conditionals: “if X then Y”. In other cases they’re more modeled on math: “if this amount of X happens, that amount of Y should happen”. Sometimes there’s iteration: “keep doing X until Y happens”. Occasionally there’s some recursion: “keep applying X to every Y”. And so on.
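
Schematically (all the symbols here are hypothetical placeholders, not built-in functions), such clauses map directly onto familiar program structures:

If[shipmentArrived, payVendor[amount]]           (* "if X then Y" *)
While[! paidInFull, collectInstallment[]]        (* "keep doing X until Y happens" *)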

There are already some places where legal contracts are routinely represented by what amount to programs. The most obvious are financial contracts for things like bonds and options—which just amount to little programs that define payouts based on various formulas and conditionals.
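
A toy example (hypothetical names, and of course hugely simplified): the payout of a call option is just a tiny function of the final price and the strike:

callPayout[spot_, strike_] := Max[spot - strike, 0]
callPayout[105, 100]      (* gives 5 *)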

There’s a whole industry of using “rules engines” to encode certain kinds of regulations as “if then” rules, usually mixed with formulas. In fact, such things are almost universally used for tax and insurance computations. (They’re also common in pricing engines and the like.)

Of course, it’s no coincidence that one talks about “legal codes”. The word code—which comes from the Latin codex—originally referred to systematic collections of legal rules. And when programming came along a couple of millennia later, it used the word “code” because it basically saw itself as similarly setting up rules for how things should work, except now the things had to do with the operation of computers rather than the conduct of worldly affairs.

But now, with our knowledge-based computer language and the idea of a symbolic discourse language, what we’re trying to do is to make it so we can talk about a broad range of worldly affairs in the same kind of way that we talk about computational processes—so we put all those legal codes and contracts into computational form.

Code versus Language

How should we think about symbolic discourse language compared to ordinary natural language? In a sense, the symbolic discourse language is a representation in which all the nuance and “poetry” have been “crushed” out of the natural language. What remains is precise, but it almost inevitably loses something of the original.

If someone says “2+2” to Wolfram|Alpha, it’ll dutifully answer “4”. But what if instead they say, “hey, will you work out 2+2 for me”. Well, that sets up a different mood. But Wolfram|Alpha will take that input and convert it to exactly the same symbolic form as “2+2”, and similarly just respond “4”.

This is exactly the kind of thing that’ll happen all the time with symbolic discourse language. And if the goal is to answer precise questions, or, for that matter, to create a precise legal contract, it’s exactly what one wants. One just needs the hard content that will actually have a consequence for what one’s trying to do; one doesn’t need the “extras” or “pleasantries”.

Of course, what one chooses to capture depends on what one’s trying to do. If one’s trying to get psychological information, then the “mood” of a piece of natural language can be very important. Those “exclamatory statements” (like expletives) carry meaning one cares about. But one can still perfectly well imagine capturing things like that in a symbolic way—for example by having an “emotion track” in one’s symbolic discourse language. (Very coarsely, this might be represented by sentiment or by position in an emotion space—or, for that matter, by a whole symbolic language derived, say, from emoji.)

In actual human communication through natural language, “meaning” is a slippery concept, that inevitably depends on the context of the communication, the history of whoever is communicating, and so on. My notion of a symbolic discourse language isn’t to try to magically capture the “true meaning” of a piece of natural language. Instead, my goal is just to capture some meaning that one can then compute with.

For convenience, one might choose to start with natural language, and then try to translate it into the symbolic discourse language. But the point is for the symbolic discourse language to be the real representation: the natural language is just a guide for trying to generate it. And in the end, the notion is that if one really wants to be sure one’s accurate in what one’s saying, one should say it directly in the symbolic discourse language, without ever using natural language.

Back in the 1600s, one of Leibniz’s big concerns was to have a representation that was independent of which natural language people were using (French, German, Latin, etc.). And one feature of a symbolic discourse language is that it has to operate “below” the level of specific natural languages.

There’s a rough kind of universality among human languages, in that it seems to be possible to represent any human concept at least to some approximation in any language. But there are plenty of nuances that are extremely hard to translate—between different languages, or the different cultures that surround them (or even the same language at different times in history). But in the symbolic discourse language, one’s effectively “crushing out” these differences—and getting something that is precise, even though it typically won’t correspond exactly to any particular human natural language.

A symbolic discourse language is about representing things in the world. Natural language is just one way to try to describe those things. But there are others. For example, one might give a picture. One could try to describe certain features of the picture in natural language (“a cat with a hat on its head”)—or one could go straight from the picture to the symbolic discourse language.

In the example of a picture, it’s very obvious that the symbolic discourse language isn’t going to capture everything. Maybe it could capture something like “he is taking the diamond”. But it’s not going to specify the color of every pixel, and it’s not going to describe all conceivable features of a scene at every level of detail.

In some sense, what the symbolic discourse language is doing is to specify a model of the system it’s describing. And like any model, it’s capturing some features, and idealizing others away. But the importance of it is that it provides a solid foundation on which computations can be done, conclusions can be drawn, and actions can be taken.

Why Now?

I’ve been thinking about creating what amounts to a general symbolic discourse language for nearly 40 years. But it’s only recently—with the current state of the Wolfram Language—that I’ve had the framework to actually do it. And it’s also only recently that I’ve understood how to think about the problem in a sufficiently practical way.

Yes, it’s nice in principle to have a symbolic way to represent things in the world. And in specific cases—like answering questions in Wolfram|Alpha—it’s completely clear why it’s worth doing this. But what’s the point of dealing with more general discourse? Like, for example, when do we really want to have a “general conversation” with a machine?

The Turing test says that being able to do this is a sign of achieving general AI. But “general conversations” with machines—without any particular purpose in mind—so far usually seem in practice to devolve quickly into party tricks and Easter eggs. At least that’s our experience looking at interactions people have with Wolfram|Alpha, and it also seems to be the experience with decades of chatbots and the like.

But the picture quickly changes if there’s a purpose to the conversation: if you’re actually trying to get the machine to do something, or learn something from the machine. Still, in most of these cases, there’s no real reason to have a general representation of things in the world; it’s sufficient just to represent specific machine actions, particular customer service goals, or whatever. But if one wants to tackle the general problem of law and contracts, it’s a different story. Because inevitably one’s going to have to represent the full spectrum of human affairs and issues. And so now there’s a definite goal to having a symbolic representation of the world: one needs it to be able to say what should happen and have machines understand it.

Sometimes it’s useful to do that because one wants the machines just to be able to check whether what was supposed to happen actually did; sometimes one wants to actually have the machines automatically enforce or do things. But either way, one needs the machine to be able to represent general things in the world—and so one needs a symbolic discourse language to be able to do this.

Some History

In a sense, it’s a very obvious idea to have something like a symbolic discourse language. And indeed it’s an idea that’s come up repeatedly across the course of centuries. But it’s proved a very difficult idea to make work, and it has a history littered with (sometimes quite wacky) failures.

Things in a sense started well. Back in antiquity, logic as discussed by Aristotle provided a very restricted example of a symbolic discourse language. And when the formalism of mathematics began to emerge it provided another example of a restricted symbolic discourse language.

But what about more general concepts in the world? There’d been many efforts—between the Tetractys of the Pythagoreans and the I Ching of the Chinese—to assign symbols or numbers to a few important concepts. But around 1300 Ramon Llull took it further, coming up with a whole combinatorial scheme for representing concepts—and then trying to implement this with circles of paper that could supposedly mechanically determine the validity of arguments, particularly religious ones.

Four centuries later, Gottfried Leibniz was an enthusiast of Llull’s work, at first imagining that perhaps all concepts could be converted to numbers and truth then determined by doing something like factoring into primes. Later, Leibniz started talking about a characteristica universalis (or, as Descartes called it, an “alphabet of human thoughts”)—essentially a universal symbolic language. But he never really tried to construct such a thing, instead chasing what one might consider “special cases”—including the one that led him to calculus.

With the decline of Latin as the universal natural language in the 1600s, particularly in areas like science and diplomacy, there had already been efforts to invent “philosophical languages” (as they were called) that would represent concepts in an abstract way, not tied to any specific natural language. The most advanced of these was by John Wilkins—who in 1668 produced a book cataloging over 10,000 concepts and representing them using strange-looking glyphs, with a rendering of the Lord’s Prayer as an example.

In some ways these efforts evolved into the development of encyclopedias and later thesauruses, but as language-like systems, they basically went nowhere. Two centuries later, though, as the concept of internationalization spread, there was a burst of interest in constructing new, country-independent languages—and out of this emerged Volapük and then Esperanto. These languages were really just artificial natural languages; they weren’t an attempt to produce anything like a symbolic discourse language. I always used to enjoy seeing signs in Esperanto at European airports, and was disappointed in the 1980s when these finally disappeared. But, as it happens, right around that time, there was another wave of language construction. There were languages like Lojban, intended to be as unambiguous as possible, and ones like the interestingly minimal Toki Pona intended to support the simple life, as well as the truly bizarre Ithkuil, intended to encompass the broadest range of linguistic and supposedly cognitive structures.

Along the way, there were also attempts to simplify languages like English by expressing everything in terms of 1000 or 2000 basic words (instead of the usual 20,000–30,000)—as in the “Simple English” version of Wikipedia or the xkcd Thing Explainer.

There were a few, more formal, efforts. One example was Hans Freudenthal’s 1960 Lincos “language for cosmic intercourse” (i.e. communication with extraterrestrials) which attempted to use the notation of mathematical logic to capture everyday concepts. In the early days of the field of artificial intelligence, there were plenty of discussions of “knowledge representation”, with approaches based variously on the grammar of natural language, the structure of predicate logic or the formalism of databases. Very few large-scale projects were attempted (Doug Lenat’s Cyc being a notable counterexample), and when I came to develop Wolfram|Alpha I was disappointed at how little of relevance to our needs seemed to have emerged.

In a way I find it remarkable that something as fundamental as the construction of a symbolic discourse language should have had so little serious attention paid to it in the past. But at some level it’s not so surprising. It’s a difficult, large project, and it somehow lies in between established fields. It’s not a linguistics project. Yes, it may ultimately illuminate how languages work, but that’s not its main point. It’s not a computer science project because it’s really about content, not algorithms. And it’s not a philosophy project because it’s mostly about specific nitty-gritty and not much about general principles.

There’ve been a few academic efforts in the last half century or so, discussing ideas like “semantic primes” and “natural semantic metalanguage”. Usually such efforts have tried to attach themselves to the field of linguistics—but their emphasis on abstract meaning rather than pure linguistic structure has put them at odds with prevailing trends, and none have turned into large-scale projects.

Outside of academia, there’s been a steady stream of proposals—sometimes promoted by wonderfully eccentric individuals—for systems to organize and name concepts in the world. It’s not clear how far this pursuit has come since Ramon Llull—and usually it’s only dealing with pure ontology, and never with full meaning of the kind that can be conveyed in natural language.

I suppose one might hope that with all the recent advances in machine learning there’d be some magic way to automatically learn an abstract representation for meaning. And, yes, one can take Wikipedia, for example, or a text corpus, and use dimension reduction to derive some effective “space of concepts”. But, not too surprisingly, simple Euclidean space doesn’t seem to be a very good model for the way concepts relate (one can’t even faithfully represent graph distances). And even the problem of taking possible meanings for words—as a dictionary might list them—and breaking them into clusters in a space of concepts doesn’t seem to be easy to do effectively.

Still, as I’ll discuss later, I think there’s a very interesting interplay between symbolic discourse language and machine learning. But for now my conclusion is that there’s not going to be any alternative but to use human judgment to construct the core of any symbolic discourse language that’s intended for humans to use.

Contracts into Code

But let’s get back to contracts. Today, there are hundreds of billions of them being signed every year around the world (and vastly more being implicitly entered into)—though the number of “original” ones that aren’t just simple modifications is probably just in the millions (and is perhaps comparable to the number of original computer programs or apps being written).

So can these contracts be represented in precise symbolic form, as Leibniz hoped 300 years ago? Well, if we can develop a decently complete symbolic discourse language, it should be possible. (Yes, every contract would have to be defined relative to some underlying set of “governing law” rules, etc., that are in some ways like the built-in functions of the symbolic discourse language.)

But what would it mean? Among other things, it would mean that contracts themselves would become computable things. A contract would be converted to a program in the symbolic discourse language. And one could do abstract operations just on this program. And this means one can imagine formally determining—in effect through a kind of generalization of logic—whether, say, a given contract has some particular implication, could ever lead to some particular outcome, or is equivalent to some other contract.

Ultimately, though, there’s a theoretical problem with this. Because questions like this can run into issues of formal undecidability, which means there’s no guarantee that any systematic finite computation will answer them. The same problem arises in reasoning about typical software programs people write, and in practice it’s a mixed bag, with some things being decidable, and others not.

Of course, even in the Wolfram Language as it is today, there are plenty of things (such as the very basic “are these expressions equal?”) that are ultimately in principle undecidable. And there are certainly questions one can ask that run right into such issues. But an awful lot of the kinds of questions that people naturally ask turn out to be answerable with modest amounts of computation. And I wouldn’t be surprised if this were true for questions about contracts too. (It’s worth noting that human-formulated questions tend to run into undecidability much less than questions picked, say at random, from the whole computational universe of possibilities.)

If one has contracts in computational form, there are other things one can expect to do too. Like to be able to automatically work out what the contracts imply for a large range of possible inputs. The 1980s revolution in quantitative finance started when it became clear one could automatically compute distributions of outcomes for simple options contracts. If one had lots (perhaps billions) of contracts in computational form, there’d be a lot more that could be done along these lines—and no doubt, for better or worse, whole new areas of financial engineering that could be developed.
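
As a rough sketch (assuming the toy callPayout function above, and a purely hypothetical lognormal model of final prices), one can get a distribution of outcomes with a few lines:

finalPrices = RandomVariate[LogNormalDistribution[Log[100], 0.2], 10000];
Histogram[callPayout[#, 100] & /@ finalPrices]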

Where Do the Inputs Come From?

OK, so let’s say one has a computational contract. What can one directly do with it? Well, it depends somewhat on what the form of its inputs is. One important possibility is that they’re in a sense “born computational”: that they’re immediately statements about a computational system (“how many accesses has this ID made today?”, “what is the ping time for this connection?”, “how much bitcoin got transferred?”, etc.). And in that case, it should be possible to immediately and unambiguously “evaluate” the contract—and find out if it’s being satisfied.

This is something that’s very useful for lots of purposes—both for humans interacting with machines, and machines interacting with machines. In fact, there are plenty of cases where versions of it are already in use. One can think of computer security provisions such as firewall rules as one example. There are others that are gradually emerging, such as automated SLAs (service-level agreements) and automated terms-of-service. (I’m certainly hoping our company, for example, will be able to make these a routine part of our business practices before too long.)

But, OK, it’s certainly not true that every input for every contract is “born computational”: plenty of inputs have to come from seeing what happens in the “outside” world (“did the person actually go to place X?”, “was the package maintained in a certain environment?”, “did the information get leaked to social media?”, “is the parrot dead?”, etc.). And the first thing to say is that in modern times it’s become vastly easier to automatically determine things about the world, not least because one can just make measurements with sensors. Check the GPS trace. Look at the car counting sensor. And so on. The whole Internet of Things is out there to provide input about the real world for computational contracts.

Having said this, though, there’s still an issue. Yes, with a GPS trace there’s a definite answer (assuming the GPS is working properly) for whether someone or something went to a particular place. But let’s say one’s trying to determine something less obviously numerical. Let’s say, for example, that one’s trying to determine whether a piece of fruit should be considered “Fancy Grade” or not. Well, given some pictures of the piece of fruit an expert can pretty unambiguously tell. But how can we make this computational?

Well, here’s a place where we can use modern machine learning. We can set up some neural net, say in the Wolfram Language, and then show it lots of examples of fruit that’s Fancy Grade and that’s not. And from my experience (and that of our customers!), most of the time we’ll get a system that’s really good at a task like grading fruit. It’ll certainly be much faster than humans, and it’ll probably be more reliable and more consistent too.
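
A minimal sketch of how this might look (assuming one has collections of labeled example images, here called fancyExamples and otherExamples, which are hypothetical):

grader = Classify[<|"FancyGrade" -> fancyExamples, "NotFancyGrade" -> otherExamples|>];
grader[newFruitImage]        (* newFruitImage is also hypothetical *)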

And this gives a whole new way to set up contracts about things in the world. Two parties can just agree that the contract should say “if the machine learning system says X then do Y”. In a sense it’s like any other kind of computational contract: the machine learning system is just a piece of code. But it’s a little different. Because normally one expects that one can readily examine everything that a contract says: one can in effect read and understand the code. But with machine learning in the middle, there can no longer be any expectation of that.
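
In terms of the sketch above, the relevant clause of the contract might then be nothing more than (again with hypothetical placeholder functions):

If[grader[newFruitImage] === "FancyGrade", payPremiumPrice[], payStandardPrice[]]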

Nobody specifically set up all those millions of numerical weights in the neural net; they were just determined by some approximate and somewhat random process from whatever training data was given. Yes, in principle we can measure everything about what’s happening inside the neural net. But there’s no reason to expect that we’ll ever be able to get an understandable explanation—or prediction—of what the net will do in any particular case. Most likely it’s an example of the phenomenon I call computational irreducibility, which means there really isn’t any way to see what will happen much more efficiently than just by running it.

What’s the difference, then, from asking a human expert whose thought processes one can’t understand? Well, in practice machine learning is much faster, so one can make much more use of “expert judgment”. And one can set things up so they’re repeatable, and one can, for example, systematically test for biases one thinks might be there, and so on.

Of course, one can always imagine cheating the machine learning. If it’s repeatable, one could use machine learning itself to try to learn cases where it would fail. And in the end it becomes rather like computer security, where holes are being found, patches are being applied, and so on. And in some sense this is no different from the typical situation with contracts too: one tries to cover all situations, then it becomes clear that something hasn’t been correctly addressed, and one tries to write a new contract to address it, and so on.

But the important bottom line is that with machine learning one can expect to get “judgment oriented” input into contracts. I expect the typical pattern will be this: in the contract there’ll be something stated in the symbolic discourse language (like “X will personally do Y”). And at the level of the symbolic discourse language there’ll be a clear meaning to this, from which, for example, all sorts of implications can be drawn. But then there’s the question of whether what the contract said is actually what happened in the real world. And, sure, there can be lots of sensor data that gives information on this. But in the end there’ll be a “judgment call” that has to be made. Did the person actually personally do this? Well—like for a remote exam proctoring system—one can have a camera watching the person, one can record their pattern of keystrokes, and maybe even measure their EEG. But something’s got to synthesize this data, and make the judgment call about what happened, and turn this in effect into a symbolic statement. And in practice I expect it will typically end up being a machine learning system that does this.

Smart Contracts

OK, so let’s say we’ve got ways to set up computational contracts. How can we enforce them? Well, ones that basically just involve computational processes can at some level enforce themselves. A particular piece of software can be built to issue licenses only in such-and-such a way. A cloud system can be built to make a download available only if it receives a certain amount of bitcoin. And so on.
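
As a very rough sketch of this kind of self-enforcement in the Wolfram Cloud (paymentVerified and downloadLink here are hypothetical helper functions, not built-ins):

CloudDeploy[
 APIFunction[{"paymentID" -> "String"},
  If[paymentVerified[#paymentID], downloadLink[], "payment not verified"] &]]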

But how far do we trust what’s going on? Maybe someone hacked the software, or the cloud. How can we be sure nothing bad has happened? The basic answer is to use the fact that the world is a big place. As a (sometime) physicist it makes me think of measurement in quantum mechanics. If we’re just dealing with a little quantum effect, there’s always interference that can happen. But when we do a real measurement, we’re amplifying that little quantum effect to the point where so many things (atoms, etc.) are involved that it’s unambiguous what happened—in much the same way as the Second Law of Thermodynamics makes it inconceivable that all the air molecules in a room will spontaneously line up on one side.

And so it is with bitcoin, Ethereum, etc. The idea is that some particular thing that happened (“X paid Y such-and-such” or whatever) is shared and recorded in so many places that there can’t be any doubt about it. Yes, it’s in principle possible that all the few thousand places that actually participate in something like bitcoin today could collude to give a fake result. But the idea is that it’s like with gas molecules in a room: the probability is inconceivably small. (As it happens, my Principle of Computational Equivalence suggests that there’s more than an analogy with the gas molecules, and that actually the underlying principles at work are basically exactly the same. And, yes, there are lots of interesting technical details about the operation of distributed blockchain ledgers, distributed consensus protocols, etc., but I’m not going to get into them here.)

It’s popular these days to talk about “smart contracts”. When I’ve been talking about “computational contracts” I mean contracts that can be expressed computationally. But by “smart contracts” people usually mean contracts that can both be expressed computationally and execute automatically. Most often the idea is to set up a smart contract in a distributed computation environment like Ethereum, and then to have the code in the contract evaluate based on inputs from the computation environment.

Sometimes the input is intrinsic—like the passage of time (who could possibly tamper with the clock of the whole internet?), or physically generated random numbers. And in cases like this, one has fairly pure smart contracts, say for paying subscriptions, or for running distributed lotteries.

But more often there has to be some input from the outside—from something that happens in the world. Sometimes one just needs public information: the price of a stock, the temperature at a weather station, or a seismic event like a nuclear explosion. But somehow the smart contract needs access to an “oracle” that can give it this information. And conveniently enough, there is one good such oracle available in the world: Wolfram|Alpha. And indeed Wolfram|Alpha is becoming widely used as an oracle for smart contracts. (Yes, our general public terms of service say you currently just shouldn’t rely on Wolfram|Alpha for anything you consider critical—though hopefully soon those terms of service will get more sophisticated, and computational.)
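To give a sense of the pattern (the query and the threshold here are purely illustrative, not any actual deployed contract), here's roughly what an oracle call from the Wolfram Language might look like:

(* fetch a public fact from the Wolfram|Alpha "oracle" *)
temperature = WolframAlpha["current temperature in Chicago", "Result"];

(* a hypothetical contract clause: trigger a payout if it's below freezing *)
If[QuantityMagnitude[UnitConvert[temperature, "DegreesCelsius"]] < 0, "TriggerPayout", "NoAction"]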

But what about non-public information from the outside world? The current thinking for smart contracts tends to be that one has to get humans in the loop to verify the information: that in effect one has to have a jury (or a democracy) to decide whether something is true. But is that really the best one can do? I tend to suspect there’s another path, that’s like using machine learning to inject human-like judgment into things. Yes, one can use people, with all their inscrutable and hard-to-systematically-influence behavior. But what if one replaces those people in effect by AIs—or even a collection of today’s machine-learning systems?

One can think of a machine-learning system as being a bit like a cryptosystem. To attack it and spoof its input one has to do something like inverting how it works. Well, given a single machine-learning system there’s a certain effort needed to achieve this. But if one has a whole collection of sufficiently independent systems, the effort goes up. It won’t be good enough just to change a few parameters in the system. But if one just goes out into the computational universe and picks systems at random then I think one can expect to have the same kind of independence as by having different people. (To be fair, I don’t yet quite know how to apply the mining of the computational universe that I’ve done for programs like cellular automata to the case of systems like neural nets.)

There’s another point as well: if one has a sufficiently dense net of sensors in the world, then it becomes increasingly easy to be sure about what’s happened. If there’s just one motion sensor in a room, it might be easy to cover it. And maybe even if there are several sensors, it’s still possible to avoid them, Mission Impossible-style. But if there are enough sensors, then by synthesizing information from them one can inevitably build up an understanding of what actually happened. In effect, one has a model of how the world works, and with enough sensors one can validate that the model is correct.

It’s not surprising, but it always helps to have redundancy. More nodes to ensure the computation isn’t tampered with. More machine-learning algorithms to make sure they aren’t spoofed. More sensors to make sure they’re not fooled. But in the end, there has to be something that says what should happen—what the contract is. And the contract has to be expressed in some language in which there are definite concepts. So somehow from the various redundant systems one has in the world, one has to make a definite conclusion—one has to turn the world into something symbolic, on which the contract can operate.

Writing Computational Contracts

Let’s say we have a good symbolic discourse language. Then how should contracts actually get written in it?

One approach is to take existing contracts written in English or any other natural language, and try to translate (or parse) them into the symbolic discourse language. Well, what will happen is somewhat like what happens with Wolfram|Alpha today. The translator will not know exactly what the natural language was supposed to mean, and so it will give several possible alternatives. Maybe there was some meaning that the original writer of the natural-language contract had in mind. But maybe the “poetry” of that meaning can’t be expressed in the symbolic discourse language: it requires something more definite. And a human is going to have to decide which alternative to pick.
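Small-scale versions of this kind of translation already exist in the Wolfram Language, where fragments of natural language get interpreted into definite symbolic forms (the phrases here are just illustrative):

(* natural-language fragments turned into symbolic, computable forms *)
Interpreter["Date"]["june 23rd 2016"]
Interpreter["Quantity"]["45 kilograms"]
Interpreter["SemanticNumber"]["two and a half million"]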

Translating from natural-language contracts may be a good way to start, but I suspect it will quickly give way to writing contracts directly in the symbolic discourse language. Today lawyers have to learn to write legalese. In the future, they’re going to have to learn to write what amounts to code: contracts expressed precisely in a symbolic discourse language.

One might think that writing everything as code, rather than natural-language legalese, would be a burden. But my guess is that it will actually be a great benefit. And it’s not just because it will let contracts operate more easily. It’s also that it will help lawyers think better about contracts. It’s an old claim (the Sapir–Whorf hypothesis) that the language one uses affects the way one thinks. And this is no doubt somewhat true for natural languages. But in my experience it’s dramatically true for computer languages. And indeed I’ve been amazed over the years at how my thinking has changed as we’ve added more to the Wolfram Language. When I didn’t have a way to express something, it didn’t enter my thinking. But once I had a way to express it, I could think in terms of it.

And so it will be, I believe, for legal thinking. When there’s a precise symbolic discourse language, it’ll become possible to think more clearly about all sorts of things.

Of course, in practice it’ll help that there’ll no doubt be all sorts of automated annotation: “if you add that clause, it’ll imply X, Y and Z”, etc. It’ll also help that it’ll routinely be possible to take some contract and simulate its consequences for a range of inputs. Sometimes one will want statistical results (“is this biased?”). Sometimes one will want to hunt for particular “bugs” that will only be found by trying lots of inputs.
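As a toy illustration of that kind of simulation (the "contract" here is purely hypothetical), one can write the contract as a function and just run it over lots of inputs:

(* a purely hypothetical contract clause: a late-payment penalty, capped at a maximum *)
contract[daysLate_] := Min[50 + 10 daysLate, 200];
outcomes = Table[contract[d], {d, 0, 30}];

(* a statistical summary, plus a hunt for the inputs where the cap kicks in *)
{Mean[outcomes], Select[Range[0, 30], contract[#] == 200 &]}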

Yes, one can read a contract in natural language, like one can read a math paper. But if one really wants to know its implications one needs it in computational form, so one can run it and see what it implies—and also so one can give it to a computer to implement.

The World with Computational Contracts

Back in ancient Babylon it was a pretty big deal when there started to be written laws like the Code of Hammurabi. Of course, with very few people able to read, there was all sorts of clunkiness at first—like having people recite the laws in order from memory. Over the centuries things got more streamlined, and then about 500 years ago, with the advent of widespread literacy, laws and contracts started to be able to get more complex (which among other things allowed them to be more nuanced, and to cover more situations).

In recent decades the trend has accelerated, particularly now that it’s so easy to copy and edit documents of any length. But things are still limited by the fact that humans are in the loop, authoring and interpreting the documents. Back 50 years ago, pretty much the only way to define a procedure for anything was to write it down, and have humans implement it. But then along came computers, and programming. And very soon it started to be possible to define vastly more complex procedures—to be implemented not by humans, but instead by computers.

And so, I think, it will be with law. Once computational law becomes established, the complexity of what can be done will increase rapidly. Typically a contract defines some model of the world, and specifies what should happen in different situations. Today the logical and algorithmic structure of models defined by contracts still tends to be fairly simple. But with computational contracts it’ll be feasible for them to be much more complex—so that they can for example more faithfully capture how the world works.

Of course, that just makes defining what should happen even more complex—and before long it might feel a bit like constructing an operating system for a computer, that tries to cover all the different situations the computer might find itself in.

In the end, though, one’s going to have to say what one wants. One might be able to get a certain distance by just giving specific examples. But ultimately I think one’s going to have to use a symbolic discourse language that can express a higher level of abstraction.

Sometimes one will be able to just write everything in the symbolic discourse language. But often, I suspect, one will use the symbolic discourse language to define what amount to goals, and then one will have to use machine-learning kinds of methods to fill in how to define a contract that actually achieves them.

And as soon as there’s computational irreducibility involved, it’ll typically be impossible to know for sure that there are no bugs, or “unintended consequences”. Yes, one can do all kinds of automated tests. But in the end it’s theoretically impossible to have any finite procedure that can guarantee to check all possibilities.

Today there are plenty of legal situations that are too complex to handle without expert lawyers. And in a world where computational law is common, it won’t just be convenient to have computers involved, it’ll be necessary.

In a sense it’s similar to what’s already happened in many areas of engineering. Back when humans had to design everything themselves, humans could typically understand the structures that were being built. But once computers are involved in design it becomes inevitable that they’re needed in figuring out how things work too.

Today a fairly complex contract might involve a hundred pages of legalese. But once there’s computational law—and particularly contracts constructed automatically from goals—the lengths are likely to increase rapidly. At some level it won’t matter, though—just as it doesn’t really matter how long the code of a program one’s using is. Because the contract will in effect just be run automatically by computer.

Leibniz saw computation as a simplifying element in the practice of law. And, yes, some things will become simpler and better defined. But a vast ocean of complexity will also open up.

What Does It Mean for AIs?

How should one tell an AI what to do? Well, you have to have some form of communication that both humans and AIs can understand—and that is rich enough to describe what one wants. And as I’ve described elsewhere, what I think this basically means is that one has to have a knowledge-based computer language—which is precisely what the Wolfram Language is—and ultimately one needs a full symbolic discourse language.

But, OK, so one tells an AI to do something, like “go get some cookies from the store”. But what one says inevitably won’t be complete. The AI has to operate within some model of the world, and with some code of conduct. Maybe it can figure out how to steal the cookies, but it’s not supposed to do that; presumably one wants it to follow the law, or a certain code of conduct.

And this is where computational law gets really important: because it gives us a way to provide that code of conduct in a way that AIs can readily make use of.

In principle, we could have AIs ingest the complete corpus of laws and historical cases and so on, and try to learn from these examples. But as AIs become more and more important in our society, it’s going to be necessary to define all sorts of new laws, and many of these are likely to be “born computational”, not least, I suspect, because they’ll be too algorithmically complex to be usefully described in traditional natural language.

There’s another problem too: we really don’t just want AIs to follow the letter of the law (in whatever venue they happen to be), we want them to behave ethically too, whatever that may mean. Even if it’s within the law, we probably don’t want our AIs lying and cheating; we want them somehow to enhance our society along the lines of whatever ethical principles we follow.

Well, one might think, why not just teach AIs ethics like we could teach them laws? In practice, it’s not so simple. Because whereas laws have been somewhat decently codified, the same can’t be said for ethics. Yes, there are philosophical and religious texts that talk about ethics. But it’s a lot vaguer and less extensive than what exists for law.

Still, if our symbolic discourse language is sufficiently complete, it certainly should be able to describe ethics too. And in effect we should be able to set up a system of computational laws that defines a whole code of conduct for AIs.

But what should it say? One might have a few immediate ideas. Perhaps one could combine all the ethical systems of the world. Obviously hopeless. Perhaps one could have the AIs just watch what humans do and learn their system of ethics from it. Similarly hopeless. Perhaps one could try something more local, where the AIs switch their behavior based on geography, cultural context, etc. (think “protocol droid”). Perhaps useful in practice, but hardly a complete solution.

So what can one do? Well, perhaps there are a few principles one might agree on. For example, at least the way we think about things today, most of us don’t want humans to go extinct (of course, maybe in the future, having mortal beings will be thought too disruptive, or whatever). And actually, while most people think there are all sorts of things wrong with our current society and civilization, people usually don’t want it to change too much, and they definitely don’t want change forced upon them.

So what should we tell the AIs? It would be wonderful if we could just give the AIs some simple set of almost axiomatic principles that would make them always do what we want. Maybe they could be based on Asimov’s Three Laws of Robotics. Maybe they could be something seemingly more modern based on some kind of global optimization. But I don’t think it’s going to be that easy.

The world is a complicated place; if nothing else, that’s basically guaranteed by the phenomenon of computational irreducibility. And it’s pretty much inevitable that there’s not going to be any finite procedure that’ll force everything to “come out the way one wants” (whatever that may be).

Let me take a somewhat abstruse, but well-defined, example from mathematics. We think we know what integers are. But to really be able to answer all questions about integers (including about infinite collections of them, etc.) we need to set up axioms that define how integers work. And that's what Giuseppe Peano tried to do in the late 1800s. For a while it looked good, but then in 1931 Kurt Gödel surprised the world with his Incompleteness Theorem, which implied, among other things, that try as one might, there was never going to be a finite set of axioms that would define the integers as we expect them to be, and nothing else.

In some sense, Peano’s original axioms actually got quite close to defining just the integers we want. But Gödel showed that they also allow bizarre non-standard integers, where for example the operation of addition isn’t finitely computable.
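For reference, the core of what Peano set up can be written quite compactly (here in modern first-order form, with induction as an axiom schema):

\begin{aligned}
&\forall x\; S(x)\neq 0\\
&\forall x\,\forall y\; \bigl(S(x)=S(y)\Rightarrow x=y\bigr)\\
&\forall x\; x+0=x,\qquad \forall x\,\forall y\; x+S(y)=S(x+y)\\
&\forall x\; x\cdot 0=0,\qquad \forall x\,\forall y\; x\cdot S(y)=x\cdot y+x\\
&\bigl(\varphi(0)\wedge \forall x\,(\varphi(x)\Rightarrow\varphi(S(x)))\bigr)\Rightarrow \forall x\,\varphi(x)\quad\text{(for each formula }\varphi\text{)}
\end{aligned}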

Well, OK, that’s abstract mathematics. What about the real world? Well, one of the things that we’ve learned since Gödel’s time is that the real world can be thought of in computational terms, pretty much just like the mathematical systems Gödel considered. And in particular, one can expect the same phenomenon of computational irreducibility (which itself is closely related to Gödel’s Theorem). And the result of this is that whatever simple intuitive goal we may define, it’s pretty much inevitable we’ll have to build up what amount to an arbitrarily complicated collection of rules to try to achieve it—and whatever we do, there’ll always be at least some “unintended consequences”.

None of this should really come as much of a surprise. After all, if we look at actual legal systems as they’ve evolved over the past couple of thousand years, there always end up being a lot of laws. It’s not like there’s a single principle from which everything else can be derived; there inevitably end up being lots of different situations that have to be covered.

Principles of the World?

But is all this complexity just a consequence of the “mechanics” of how the world works? Imagine—as one expects—that AIs get more and more powerful. And that more and more of the systems of the world, from money supplies to border controls, are in effect put in the hands of AIs. In a sense, then, the AIs play a role a little bit like governments, providing an infrastructure for human activities.

So, OK, perhaps we need a “constitution” for the AIs, just like we set up constitutions for governments. But again the question comes: what should the constitution have in it?

Let’s say that the AIs could mold human society in pretty much any way. How would we want it molded? Well, that’s an old question in political philosophy, debated since antiquity. At first an idea like utilitarianism might sound good: somehow maximize the well-being of as many people as possible. But imagine actually trying to do this with AIs that in effect control the world. Immediately one is thrust into concrete versions of questions that philosophers and others have debated for centuries. Let’s say one can sculpt the probability distribution for happiness among people in the world. Well, now we’ve got to get precise about whether it’s the mean or the median or the mode or a quantile or, for that matter, the kurtosis of the distribution that we’re trying to maximize.

No doubt one can come up with rhetoric that argues for some particular choice. But there just isn’t an abstract “right answer”. Yes, we can have a symbolic discourse language that expresses any choice. But there’s no mathematical derivation of the answer and there’s no law of nature that forces a particular answer. I suppose there could be a “best answer given our biological nature”. But as things advance, this won’t be on solid ground either, as we increasingly manage to use technology to transcend the biology that evolution has delivered to us.

Still, we might argue, there’s at least one constraint: we don’t want a scheme where we’ll go extinct—and where nothing will in the end exist. Even this is going to be a complicated thing to discuss, because we need to say what the “we” here is supposed to be: just how “evolved” relative to the current human condition can things be, and not consider “us” to have gone extinct?

But even independent of this, there’s another issue: given any particular setup, computational irreducibility can make it in a sense irreducibly difficult to find out its consequences. And so in particular, given any specific optimization criterion (or constitution), there may be no finite procedure that will determine whether it allows for infinite survival, or whether in effect it implies civilization will “halt” and go extinct.

OK, so things are complicated. What can one actually do? For a little while there’ll probably be the notion that AIs must ultimately have human owners, who must act according to certain principles, following the usual way human society operates. But realistically this won’t last long.

Who would be responsible for a public-domain AI system that’s spread across the internet? What happens when the bots it spawns start misbehaving on social media (yes, the notion that social media accounts are just for humans will soon look very “early 21st century”)?

Of course, there’s an important question of why AIs should “follow the rules” at all. After all, humans certainly don’t always do that. It’s worth remembering, though, that we humans are probably a particularly difficult case: after all, we’re the product a multibillion-year process of natural selection, in which there’s been a continual competitive struggle for survival. AIs are presumably coming into the world in very different circumstances, and without the same need for “brutish instincts”. (Well, I can’t help thinking of AIs from different companies or countries being imbued by their creators with certain brutish instincts, but that’s surely not a necessary feature of AI existence.)

In the end, though, the best hope for getting AIs to “follow the rules” is probably by more or less the same mechanism that seems to maintain human society today: that following the rules is the way some kind of dynamic equilibrium is achieved. But if we can get the AIs to “follow the rules”, we still have to define what the rules—the AI Constitution—should be.

And, of course, this is a hard problem, with no “right answer”. But perhaps one approach is to see what’s happened historically with humans. And one important and obvious thing is that there are different countries, with different laws and customs. So perhaps at the very least we have to expect that there’d be multiple AI Constitutions, not just one.

Even looking at countries today, an obvious question is how many there should be. Is there some easy way to say that—with technology as it exists, for example—7 billion people should be expected to organize themselves into about 200 countries?

It sounds a bit like asking how many planets the solar system should end up with. For a long time this was viewed as a “random fact of nature” (and widely used by philosophers as an example of something that, unlike 2+2=4, doesn’t “have to be that way”). But particularly having seen so many exoplanet systems, it’s become clear that our solar system actually pretty much has to have about the number of planets it does.

And maybe after we’ve seen the sociologies of enough video-game virtual worlds, we’ll know something about how to “derive” the number of countries. But of course it’s not at all clear that AI Constitutions should be divided anything like countries.

The physicality of humans has the convenient consequence that at least at some level one can divide the world geographically. But AIs don’t need to have that kind of spatial locality. One can imagine some other schemes, of course. Like let’s say one looks at the space of personalities and motivations, and finds clusters in it. Perhaps one could start to say “here’s an AI Constitution for that cluster” and so on. Maybe the constitutions could fork, perhaps almost arbitrarily (a “Git-like model of society”). I don’t know how things like this would ultimately work, but they seem more plausible than what amounts to a single, consensus, AI Constitution for everywhere and everyone.

There are so many issues, though. Like here’s one. Let’s assume AIs are the dominant power in our world. But let’s assume that they successfully follow some constitution or constitutions that we’ve defined for them. Well, that’s nice—but does it mean nothing can ever change in the world? I mean, just think if we were still all operating according to laws that had been set up 200 years ago: most of society has moved on since then, and wants different laws (or at least different interpretations) to reflect its principles.

But what if precise laws for AIs were burnt in around the year 2020, for all eternity? Well, one might say, real constitutions always have explicit clauses that allow for their own modification (in the US Constitution it’s Article V). But looking at the actual constitutions of countries around the world isn’t terribly encouraging. Some just say basically that the constitution can be changed if some supreme leader (a person) says so. Many say that the constitution can be changed through some democratic process—in effect by some sequence of majority or similar votes. And some basically define a bureaucratic process for change so complex that one wonders if it’s formally undecidable whether it would ever come to a conclusion.

At first, the democratic scheme seems like an obvious winner. But it’s fundamentally based on the concept that people are somehow easy to count (of course, one can argue about which people, etc.). But what happens when personhood gets more complicated? When, for example, there are in effect uploaded human consciousnesses, deeply intertwined with AIs? Well, one might say, there’s always got to be some “indivisible person” involved. And yes, I can imagine little clumps of pineal gland cells that are maintained to define “a person”, just like in the past they were thought to be the seat of the soul. But from the basic science I’ve done I think I can say for certain that none of this will ultimately work—because in the end the computational processes that define things just don’t have this kind of indivisibility.

So what happens to “democracy” when there are no longer “people to count”? One can imagine all sorts of schemes, involving identifying the density of certain features in “people space”. I suppose one can also imagine some kind of bizarre voting involving transfinite numbers of entities, in which perhaps the axiomatization of set theory has a key effect on the future of history.

It’s an interesting question how to set up a constitution in which change is “burned in”. There’s a very simple example in bitcoin, where the protocol just defines by fiat that the reward for mining new bitcoin decreases over time (halving roughly every four years). Of course, that setup is in a sense based on a model of the world—and in particular on something like Moore’s Law and the apparent short-term predictability of technological development. But following the same general idea, one might start thinking about a constitution that says “change 1% of the symbolic code in this every year”. But then one’s back to having to decide “which 1%?”. Maybe it’d be based on usage, or observations of the world, or some machine-learning procedure. But whatever algorithm or meta-algorithm is involved, there’s still at some point something that has to be defined once and for all.

Can one make a general theory of change? At first, this might seem hopeless. But in a sense exploring the computational universe of programs is like seeing a spectrum of all possible changes. And there’s definitely some general science that can be done on such things. And maybe there’s some setup—beyond just “fork whenever there could be a change”—that would let one have a constitution that appropriately allows for change, as well as changing the way one allows for change, and so on.

Making It Happen

OK, we’ve talked about some far-reaching and foundational issues. But what about the here and now? Well, I think the exciting thing is that 300 years after Gottfried Leibniz died, we’re finally in a position to do what he dreamed of: to create a general symbolic discourse language, and to apply it to build a framework for computational law.

With the Wolfram Language we have the foundational symbolic system—as well as a lot of knowledge of the world—to start from. There’s still plenty to do, but I think there’s now a definite path forward. And it really helps that in addition to the abstract intellectual challenge of creating a symbolic discourse language, there’s now also a definite target in mind: being able to set up practical systems for computational law.

It’s not going to be easy. But I think the world is ready for it, and needs it. There are simple smart contracts already in things like bitcoin and Ethereum, but there’s vastly more that can be done—and with a full symbolic discourse language the whole spectrum of activities covered by law becomes potentially accessible to structured computation. It’s going to lead to all sorts of both practical and conceptual advances. And it’s going to enable new legal, commercial and societal structures—in which, among other things, computers are drawn still further into the conduct of human affairs.

I think it’s also going to be critical in defining the overall framework for AIs in the future. What ethics, and what principles, should they follow? How do we communicate these to them? For ourselves and for the AIs we need a way to formulate what we want. And for that we need a symbolic discourse language. Leibniz had the right idea, but 300 years too early. Now in our time I’m hoping we’re finally going to get to build for real what he only imagined. And in doing so we’re going to take yet another big step forward in harnessing the power of the computational paradigm.

A Short Talk on AI Ethics


Last week I gave a talk (and did a panel discussion) at a conference entitled “Ethics of Artificial Intelligence” held at the NYU Philosophy Department’s Center for Mind, Brain and Consciousness. Here’s the video and a transcript:

Thanks for inviting me here today.

You know, it’s funny to be here. My mother was a philosophy professor in Oxford. And when I was a kid I always said the one thing I’d never do was do or talk about philosophy. But, well, here I am.

Before I really get into AI, I think I should say a little bit about my worldview. I’ve basically spent my life alternating between doing basic science and building technology. I’ve been interested in AI for about as long as I can remember. But as a kid I started out doing physics and cosmology and things. That got me into building technology to automate stuff like math. And that worked so well that I started thinking about, like, how to really know and compute everything about everything. That was in about 1980—and at first I thought I had to build something like a brain, and I was studying neural nets and so on. But I didn’t get too far.

And meanwhile I got interested in an even bigger problem in science: how to make the most general possible theories of things. The dominant idea for 300 years had been to use math and equations. But I wanted to go beyond them. And the big thing I realized was that the way to do that was to think about programs, and the whole computational universe of possible programs.

Cellular automata grid

And that led to my personal Galileo-like moment. I just pointed my “computational telescope” at these simplest possible programs, and I saw this amazing one I called rule 30—that just seemed to go on producing complexity forever from essentially nothing.

Rule 30
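For anyone who wants to reproduce a picture like this, it's a one-liner in the Wolfram Language (the number of steps here is just an arbitrary choice):

(* 200 steps of rule 30, grown from a single black cell *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]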

Well, after I’d seen this, I realized this is actually something that happens all over the computational universe—and all over nature. It’s really the secret that lets nature make all the complicated stuff we see. But it’s something else too: it’s a window into what raw, unfettered computation is like. At least traditionally when we do engineering we’re always building things that are simple enough that we can foresee what they’ll do.

But if we just go out into the computational universe, things can be much wilder. Our company has done a lot of mining out there, finding programs that are useful for different purposes, like rule 30 is for randomness. And modern machine learning is kind of part way from traditional engineering to this kind of free-range mining.

But, OK, what can one say in general about the computational universe? Well, all these programs can be thought of as doing computations. And years ago I came up with what I call the Principle of Computational Equivalence—that says that if behavior isn’t obviously simple, it typically corresponds to a computation that’s maximally sophisticated. There are lots of predictions and implications of this. Like that universal computation should be ubiquitous. As should undecidability. And as should what I call computational irreducibility.

An example of cellular automata

Can you predict what it’s going to do? Well, it’s probably computationally irreducible, which means you can’t figure out what it’s going to do without effectively tracing every step and going through the same computational effort it does. It’s completely deterministic. But to us it’s got what seems like free will—because we can never know what it’s going to do.

Here’s another thing: what’s intelligence? Well, our big unifying principle says that everything—from a tiny program to our brains—is computationally equivalent. There’s no bright line between intelligence and mere computation. The weather really does have a mind of its own: it’s doing computations just as sophisticated as our brains. To us, though, it’s pretty alien computation. Because it’s not connected to our human goals and experiences. It’s just raw computation that happens to be going on.

So how do we tame computation? We have to mold it to our goals. And the first step there is to describe our goals. And for the past 30 years what I’ve basically been doing is creating a way to do that.

I’ve been building a language—that’s now called the Wolfram Language—that allows us to express what we want to do. It’s a computer language. But it’s not really like other computer languages. Because instead of telling a computer what to do in its terms, it builds in as much knowledge as possible about computation and the world, so that we humans can describe in our terms what we want, and then it’s up to the language to get it done as automatically as possible.

This basic idea has worked really well, and in the form of Mathematica it’s been used to make endless inventions and discoveries over the years. It’s also what’s inside Wolfram|Alpha. Where the idea is to take pure natural language questions, understand them, and use the kind of curated knowledge and algorithms of our civilization to answer them. And, yes, it’s a very classic AIish thing. And of course it’s computed answers to billions and billions of questions from humans, for example inside Siri.

I had an interesting experience recently, figuring out how to use what we’ve built to teach computational thinking to kids. I was writing exercises for a book. At the beginning, it was easy: “make a program to do X”. But later on, it was like “I know what to say in the Wolfram Language, but it’s really hard to express in English”. And of course that’s why I just spent 30 years building the Wolfram Language.

English has maybe 25,000 common words; the Wolfram Language has about 5000 carefully designed built-in constructs—including all the latest machine learning—together with millions of things based on curated data. And the idea is that once one can think about something in the world computationally, it should be as easy as possible to express it in the Wolfram Language. And the cool thing is, it really works. Humans, including kids, can read and write the language. And so can computers. It’s a kind of high-level bridge between human thinking, in its cultural context, and computation.

OK, so what about AI? Technology has always been about finding things that exist, and then taming them to automate the achievement of particular human goals. And in AI the things we’re taming exist in the computational universe. Now, there’s a lot of raw computation seething around out there—just as there’s a lot going on in nature. But what we’re interested in is computation that somehow relates to human goals.

So what about ethics? Well, maybe we want to constrain the computation, the AI, to only do things we consider ethical. But somehow we have to find a way to describe what we mean by that.

Well, in the human world, one way we do this is with laws. But how do we connect laws to computations? We may call them “legal codes”, but today laws and contracts are basically written in natural language. There’ve been simple computable contracts in areas like financial derivatives. And now one’s talking about smart contracts around cryptocurrencies.

But what about the vast mass of law? Well, Leibniz—who died 300 years ago next month—was always talking about making a universal language to, as we would say now, express it all in a computable way. He was a few centuries too early, but I think now we’re finally in a position to do this.

I just posted a long blog about all this last week, but let me try to summarize. With the Wolfram Language we’ve managed to express a lot of kinds of things in the world—like the ones people ask Siri about. And I think we’re now within sight of what Leibniz wanted: to have a general symbolic discourse language that represents everything involved in human affairs.

I see it basically as a language design problem. Yes, we can use natural language to get clues, but ultimately we have to build our own symbolic language. It’s actually the same kind of thing I’ve done for decades in the Wolfram Language. Take even a word like “plus”. Well, in the Wolfram Language there’s a function called Plus, but it doesn’t mean the same thing as the word. It’s a very specific version, that has to do with adding things mathematically. And as we design a symbolic discourse language, it’s the same thing. The word “eat” in English can mean lots of things. But we need a concept—that we’ll probably refer to as “eat”—that’s a specific version, that we can compute with.

So let’s say we’ve got a contract written in natural language. One way to get a symbolic version is to use natural language understanding—just like we do for billions of Wolfram|Alpha inputs, asking humans about ambiguities. Another way might be to get machine learning to describe a picture. But the best way is just to write in symbolic form in the first place, and actually I’m guessing that’s what lawyers will be doing before too long.

And of course once you have a contract in symbolic form, you can start to compute about it, automatically seeing if it’s satisfied, simulating different outcomes, automatically aggregating it in bundles, and so on. Ultimately the contract has to get input from the real world. Maybe that input is “born digital”, like data about accessing a computer system, or transferring bitcoin. Often it’ll come from sensors and measurements—and it’ll take machine learning to turn it into something symbolic.

Well, if we can express laws in computable form maybe we can start telling AIs how we want them to act. Of course it might be better if we could boil everything down to simple principles, like Asimov’s Laws of Robotics, or utilitarianism or something.

But I don’t think anything like that is going to work. What we’re ultimately trying to do is to find perfect constraints on computation, but computation is something that’s in some sense infinitely wild. The issue already shows up in Gödel’s Theorem. Like let’s say we’re looking at integers and we’re trying to set up axioms to constrain them to just work the way we think they do. Well, what Gödel showed is that no finite set of axioms can ever achieve this. With any set of axioms you choose, there won’t just be the ordinary integers; there’ll also be other wild things.

And the phenomenon of computational irreducibility implies a much more general version of this. Basically, given any set of laws or constraints, there’ll always be “unintended consequences”. This isn’t particularly surprising if one looks at the evolution of human law. But the point is that there’s theoretically no way around it. It’s ubiquitous in the computational universe.

Now I think it’s pretty clear that AI is going to get more and more important in the world—and is going to eventually control much of the infrastructure of human affairs, a bit like governments do now. And like with governments, perhaps the thing to do is to create an AI Constitution that defines what AIs should do.

What should the constitution be like? Well, it’s got to be based on a model of the world, and inevitably an imperfect one, and then it’s got to say what to do in lots of different circumstances. And ultimately what it’s got to do is provide a way of constraining the computations that happen to be ones that align with our goals. But what should those goals be? I don’t think there’s any ultimate right answer. In fact, one can enumerate goals just like one can enumerate programs out in the computational universe. And there’s no abstract way to choose between them.

But for us there’s a way to choose. Because we have particular biology, and we have a particular history of our culture and civilization. It’s taken us a lot of irreducible computation to get here. But now we’re just at some point in the computational universe, that corresponds to the goals that we have.

Human goals have clearly evolved through the course of history. And I suspect they’re about to evolve a lot more. I think it’s pretty inevitable that our consciousness will increasingly merge with technology. And eventually maybe our whole civilization will end up as something like a box of a trillion uploaded human souls.

But then the big question is: “what will they choose to do?”. Well, maybe we don’t even have the language yet to describe the answer. If we look back even to Leibniz’s time, we can see all sorts of modern concepts that hadn’t formed yet. And when we look inside a modern machine learning or theorem proving system, it’s humbling to see how many concepts it effectively forms—that we haven’t yet absorbed in our culture.

Maybe looked at from our current point of view, it’ll just seem like those disembodied virtual souls are playing videogames for the rest of eternity. At first maybe they’ll operate in a simulation of our actual universe. Then maybe they’ll start exploring the computational universe of all possible universes.

But at some level all they’ll be doing is computation—and the Principle of Computational Equivalence says it’s computation that’s fundamentally equivalent to all other computation. It’s a bit of a letdown. Our proud future ending up being computationally equivalent just to plain physics, or to little rule 30.

Of course, that’s just an extension of the long story of science showing us that we’re not fundamentally special. We can’t look for ultimate meaning in where we’ve reached. We can’t define an ultimate purpose. Or ultimate ethics. And in a sense we have to embrace the details of our existence and our history.

There won’t be a simple principle that encapsulates what we want in our AI Constitution. There’ll be lots of details that reflect the details of our existence and history. And the first step is just to understand how to represent those things. Which is what I think we can do with a symbolic discourse language.

And, yes, conveniently I happen to have just spent 30 years building the framework to create such a thing. And I’m keen to understand how we can really use it to create an AI Constitution.

So I’d better stop talking about philosophy, and try to answer some questions.

After the talk there was a lively Q&A (followed by a panel discussion), included on the video.  Some questions were:

  • When will AI reach human-level intelligence?
  • What are the difficulties you foresee in developing a symbolic discourse language?
  • Do we live in a deterministic universe?
  • Is our present reality a simulation?
  • Does free will exist, and how does consciousness arise from computation?
  • Can we separate rules and principles in a way that is computable for AI?
  • How can AI navigate contradictions in human ethical systems?

Quick, How Might the Alien Spacecraft Work?


[This post is about the movie Arrival; there are no movie spoilers here.]

Arrival trailer image

Connecting with Hollywood

“It’s an interesting script” said someone on our PR team. It’s pretty common for us to get requests from movie-makers about showing our graphics or posters or books in movies. But the request this time was different: could we urgently help make realistic screen displays for a big Hollywood science fiction movie that was just about to start shooting?

Well, in our company unusual issues eventually land in my inbox, and so it was with this one. Now it so happens that through some combination of relaxation and professional interest I’ve probably seen basically every mainstream science fiction movie that’s appeared over the past few decades. But just based on the working title (“Story of Your Life”) I wasn’t even clear that this movie was science fiction, or what it was at all.

But then I heard that it was about first contact with aliens, and so I said “sure, I’ll read the script”. And, yes, it was an interesting script. Complicated, but interesting. I couldn’t tell if the actual movie would be mostly science fiction or mostly a love story. But there were definitely interesting science-related themes in it—albeit mixed with things that didn’t seem to make sense, and a liberal sprinkling of minor science gaffes.

When I watch science fiction movies I have to say I quite often cringe, thinking, “someone’s spent $100 million on this movie—and yet they’ve made some gratuitous science mistake that could have been fixed in an instant if they’d just asked the right person”. So I decided that even though it was a very busy time for me, I should get involved in what’s now called Arrival and personally try to give it the best science I could.

There are, I think, several reasons Hollywood movies often don’t get as much science input as they should. The first is that movie-makers usually just aren’t sensitive to the “science texture” of their movies. They can tell if things are out of whack at a human level, but they typically can’t tell if something is scientifically off. Sometimes they’ll get as far as calling a local university for help, but too often they’re sent to a hyper-specialized academic who’ll not-very-usefully tell them their whole story is wrong. Of course, to be fair, science content usually doesn’t make or break movies. But I think having good science content—like, say, good set design—can help elevate a good movie to greatness.

As a company we’ve had a certain amount of experience working with Hollywood, for example writing all the math for six seasons of the television show Numb3rs. I hadn’t personally been involved—though I have quite a few science friends who’ve helped with movies. There’s Jack Horner, who worked on Jurassic Park, and ended up (as he tells it) pretty much having all his paleontology theories in the movie, including ones that turned out to be wrong. And then there’s Kip Thorne (famous for the recent triumph of detecting gravitational waves), who as a second career in his 70s was the original driving force behind Interstellar—and who made the original black-hole visual effects with Mathematica. From an earlier era there was Marvin Minsky who consulted on AI for 2001: A Space Odyssey, and Ed Fredkin who ended up as the model for the rather eccentric Dr. Falken in WarGames. And recently there was Manjul Bhargava, who for a decade shepherded what became The Man Who Knew Infinity, eventually carefully “watching the math” in weeks of editing sessions.

All of these people had gotten involved with movies much earlier in their production. But I figured that getting involved when the movie was about to start shooting at least had the advantage that one knew the movie was actually going to get made (and yes, there’s often a remarkably high noise-to-signal ratio about such things in Hollywood). It also meant that my role was clear: all I could do was try to uptick and smooth out the science; it wasn’t even worth thinking about changing anything significant in the plot.

The inspiration for the movie had come from an interesting 1998 short story by Ted Chiang. But it was a conceptually complicated story, riffing off a fairly technical idea in mathematical physics—and I wasn’t alone in wondering how anyone could possibly make a movie out of it. Still, there it was, a 120-page script that basically did it, with some science from the original story, and quite a lot added, mostly still in a rather “lorem ipsum” state. And so I went to work, making comments, suggesting fixes, and so on.

A Few Weeks Later…

Cut to a few weeks later. My son Christopher and I arrive on the set of Arrival in Montreal. The latest X-Men movie is filming at a huge facility next door. Arrival is at a more modest facility. We get there when they’re in the middle of filming a scene inside a helicopter. We can’t see the actors, but we’re watching on the “video village” monitor, along with a couple of producers and other people.

The first line I hear is “I’ve prepared a list of questions [for the aliens], starting with some binary sequences…”. And I’m like “Wow, I suggested saying that! This is great!” But then there’s another take. And a word changes. And then there are more takes. And, yes, the dialogue sounds smoother. But the meaning isn’t right. And I’m realizing: this is more difficult than I thought. Lots of tradeoffs. Lots of complexity. (Happily, in the final movie, it ends up being a blend, with the right meaning, and sounding good.)

After a while there’s a break in filming. We talk to Amy Adams, who plays a linguist assigned to communicate with the aliens. She’s spent some time shadowing a local linguistics professor, and is keen to talk about the question of how much the language one uses determines how one thinks—which is a topic that as a computer-language designer I’ve long been interested in. But what the producers really want is for me to talk to Jeremy Renner, who plays a physicist in the movie. He’s feeling out of sorts right then—so off we go to look at the “science tent” set they’ve built and think about what visuals will work with it.

Me and Christopher on set

Writing Code

The script made it clear that there were going to be lots of opportunities for interesting visuals. But much as I might have found it fun, I just didn’t personally have the time to work on creating them. Fortunately, though, my son Christopher—who is a very fast and creative programmer—was interested in doing it. We’d hoped to just be able to ship him off to the set for a week or two, but it was decided he was still too young, so he started off working remotely.

His basic strategy was simple: just ask “if we were doing this for real, what analysis and computations would we be doing?”. We’ve got a list of alien landing sites; what’s the pattern? We’ve got geometric data on the shape of the spacecraft; what’s its significance? We’ve got alien “handwriting”; what does it mean?

Collage of visualizations

The movie-makers were giving Christopher raw data, just like in real life, and he was trying to analyze it. And he was turning each question that was asked into all sorts of Wolfram Language code and visualizations.

Christopher was well aware that code shown in movies often doesn’t make sense (a favorite, regardless of context, seems to be the source code for nmap.c in Linux). But he wanted to create code that would make sense, and would actually do the analyses that would be going on in the movie.

GeoGraphics[{Thickness[0.001],
  {Red, GeoPath /@ (List @@@ EdgeList[NearestNeighborGraph[landingSites, 3]])},
  Table[GeoDisk[#, Quantity[n, "Miles"]] & /@ landingSites, {n, 0, 1000, 250}],
  Red, GeoStyling[Opacity[1]],
  GeoDisk[#, Quantity[50, "Miles"]] & /@ landingSites},
 GeoRange -> "World", GeoProjection -> "WagnerII", GeoZoomLevel -> 3]

Module[{i = image from previous output, corners = ImageCorners[i, 3, 0.1, 5]},
 Show[{i,
   Graphics[{
     {Orange, Thickness[0.003],
      Outer[If[#1 === #2, {}, {Opacity[3000/EuclideanDistance[#1, #2]^2], Line[{#1, #2}]}] &,
       corners, corners, 1]},
     {EdgeForm[Green], FaceForm[],
      Rectangle[# - 10, # + 10] & /@ corners}}]}]]

In the final movie, the screen visuals are a mixture of ones Christopher created, ones derived from what he created, and ones that were put in separately. Occasionally one can see code. Like there’s a nice shot of rearranging alien “handwriting”, in which one sees a Wolfram Language notebook with rather elegant Wolfram Language code in it. And, yes, those lines of code actually do the transformation that’s in the notebook. It’s real stuff, with real computations being done.

A Theory of Interstellar Travel

When I first started looking at the script for the movie, I quickly realized that to make coherent suggestions I really needed to come up with a concrete theory for the science of what might be going on. Unfortunately there wasn’t much time—and in the end I basically had just one evening to invent how interstellar space travel might work. Here’s the beginning of what I wrote for the movie-makers about what I came up with that evening (to avoid spoilers I’m not showing more):

Science (Fiction) of Interstellar Spacecraft

Obviously all these physics details weren’t directly needed in the movie. But thinking them through was really useful in making consistent suggestions about the script. And they led to all sorts of science-fictiony ideas for dialogue. Here are a few of the ones that (probably for the better) didn’t make it into the final script. “The whole ship goes through space like one giant quantum particle”. “The aliens must directly manipulate the spacetime network at the Planck scale”. “There’s spacetime turbulence around the skin of the ship”. “It’s like the skin of the ship has an infinite number of types of atoms, not just the 115 elements we know” (that was going to be related to shining a monochromatic laser at the ship and seeing it come back looking like a rainbow). It’s fun for an “actual scientist” like me to come up with stuff like this. It’s kind of liberating. Especially since every one of these science-fictiony pieces of dialog can lead one into a long, serious, physics discussion.

For the movie, I wanted to have a particular theory for interstellar travel. And who knows, maybe one day in the distant future it’ll turn out to be correct. But as of now, we certainly don’t know. In fact, for all we know, there’s just some simple “hack” in existing physics that’ll immediately make interstellar travel possible. For example, there’s even some work I did back in 1982 that implies that with standard quantum field theory one should, almost paradoxically, be able to continually extract “zero point energy” from the vacuum. And over the years, this basic mechanism has become what’s probably the most quoted potential propulsion source for interstellar travel, even if I myself don’t actually believe in it. (I think it takes idealizations of materials much too far.)

Maybe (as has been popular recently) there’s a much more prosaic way to propel at least a tiny spacecraft, by pushing it at least to nearby stars with radiation pressure from a laser. Or maybe there’s some way to do “black hole engineering” to set up appropriate distortions in spacetime, even in the standard Einsteinian theory of gravity. It’s important to realize that even if (when?) we know the fundamental theory of physics, we still may not immediately be able to determine, for example, whether faster-than-light travel is possible in our universe. Is there some way to set up some configuration of quantum fields and black holes and whatever so that things behave just so? Computational irreducibility (related to undecidability, Gödel’s Theorem, the Halting Problem, etc.) tells one that there’s no upper bound on just how elaborate and difficult-to-set-up the configuration might need to be. And in the end one could use up all the computation that can be done in the history of the universe—and more—trying to invent the structure that’s needed, and never know for sure if it’s impossible.

What Are Physicists Like?

When we’re visiting the set, we eventually meet up with Jeremy Renner. We find him sitting on the steps of his trailer smoking a cigarette, looking every bit the gritty action-adventurer that I realize I’ve seen him as in a bunch of movies. I wonder about the most efficient way to communicate what physicists are like. I figure I should just start talking about physics. So I start explaining the physics theories that are relevant to the movie. We’re talking about space and time and quantum mechanics and faster-than-light travel and so on. I’m sprinkling in a few stories I heard from Richard Feynman about “doing physics in the field” on the Manhattan Project. It’s an energetic discussion, and I’m wondering what mannerisms I’m displaying—that might or might not be typical of physicists. (I can’t help remembering Oliver Sacks telling me how uncanny it was for him to see how many of his mannerisms Robin Williams had picked up for Awakenings after only a little exposure, so I’m wondering what Jeremy is going to pick up from me in these few hours.)

Jeremy is keen to understand how the science relates to the arc of the story for the movie, and what the aliens as well as humans must be feeling at different points. I try to talk about what it’s like to figure stuff out in science. Then I realize the best thing is to actually show it a bit, by doing some Wolfram Language live coding. And it turns out that the way the script is written right then, Jeremy is actually supposed to be on camera using Wolfram Language himself (just like—I’m happy to say—so many real-life physicists do).

Christopher shows some of the code he’s written for the movie, and how the controls he built make the dynamics work. Then we start talking about how one sets about figuring out the code. We do some preliminaries. Then we’re off and running, doing live coding. And here’s the first example we make—based on the digits of pi that we’d been discussing in relation to SETI or Contact (the book version) or something:

RealDigits[Pi, 10, 100][[1]]
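
What comes back is a list of the first 100 decimal digits of pi. The live coding went on from there. As a sketch of the kind of next step one might take (this is my own illustration, not the actual on-set code), one could tally how often each digit shows up and plot the result:

(* Not the on-set code, just a sketch in the same spirit: count how often each digit 0-9 appears among the first 100 digits of pi *)
digits = First[RealDigits[Pi, 10, 100]];
BarChart[KeySort[Counts[digits]], ChartLabels -> Automatic]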

What to Say to the Aliens

Arrival is partly about interstellar travel. But it’s much more about how we’d communicate with the aliens once they’ve showed up here. I’ve actually thought a lot about alien intelligence. But mostly I’ve thought about it in a more difficult case than in Arrival—where there are no aliens or spaceships in evidence, and where the only thing we have is some thin stream of data, say from a radio transmission, and where it’s difficult even to know if what we’ve got should be considered evidence of “intelligence” at all (remember, for example, that it often seems that even the weather can be complex enough to seem like it “has a mind of its own”).

But in Arrival, the aliens are right here. So then how should we start communicating with them? We need something universal that doesn’t depend on the details of human language or human history. Well, OK, if you’re right there with the aliens, there are physical objects to point to. (Yes, that assumes the aliens have some notion of discrete objects, rather than just a continuum, but by the time they’ve got spaceships and so on, that seems like a decently safe bet.) But what if you want to be more abstract?

Well, then there’s always mathematics. But is mathematics actually universal? Does anyone who builds spaceships necessarily have to know about prime numbers, or integrals, or Fourier series? It’s certainly true that in our human development of technology, those are things we’ve needed to understand. But are there other (and perhaps better) paths to technology? I think so.

For me, the most general form of abstraction that seems relevant to the actual operation of our universe is what we get by looking at the computational universe of possible programs. Mathematics as we’ve practiced it does show up there. But so do an infinite diversity of other abstract collections of rules. And what I realized a while back is that many of these are very relevant—and actually very good—for producing technology.

So, OK, if we look across the computational universe of possible programs, what might we pick out as reasonable universals to start an abstract discussion with aliens who’ve come to visit us?

Once one can point to discrete objects, one has the potential to start talking about numbers, first in unary, then perhaps in binary. Here’s the beginning of a notebook I made about this for the movie. The words and code are for human consumption; for the aliens there’d just be “flash cards” of the main graphics:

Establishing Communication
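
To give a flavor of the kind of thing such a notebook can contain (what follows is a sketch I’m making up here, not the movie notebook itself; unaryCard and binaryCard are just names I’ve invented), one can generate “flash cards” that show each number first as a row of identical objects, and then as its binary digits:

(* An illustrative sketch, not the actual movie notebook: "flash cards" showing numbers in unary, then in binary *)
unaryCard[n_] := Graphics[Table[Disk[{i, 0}, 0.35], {i, n}], ImageSize -> {Automatic, 20}]
binaryCard[n_] := Row[IntegerDigits[n, 2]]
Grid[Table[{n, unaryCard[n], binaryCard[n]}, {n, 1, 8}], Frame -> All]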

OK, so after basic numbers, and maybe some arithmetic, what’s next? It’s interesting to realize that even what we’ve discussed so far doesn’t reflect the history of human mathematics: despite how fundamental they are (as well as their appearance in very old traditions like the I Ching), binary numbers only got popular quite recently—long after lots of much-harder-to-explain mathematical ideas.

So, OK, we don’t need to follow the history of human mathematics or science—or, for that matter, the order in which it’s taught to humans. But we need to find things that can be understood very directly—without outside knowledge or words. Things that for example we’d recognize if we just unearthed them without context in some archeological dig.

Well, it so happens that there’s a class of computational systems that I’ve studied for decades that I think fit the bill remarkably well: cellular automata. They’re based on simple rules that are easy to display visually. And they work by repeatedly applying these rules, and often generating complex patterns—that we now know can be used as the basis for all sorts of interesting technology.

Cellular Automata
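
In the Wolfram Language it takes just one line to produce this kind of pattern; here, for example, is rule 30 run for 100 steps from a single black cell:

(* Rule 30, one of the simplest cellular automata with complex behavior, run for 100 steps from a single black cell *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]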

From looking at cellular automata one can actually start to build up a whole world view, or, as I called the book I wrote about such things, A New Kind of Science. But what if we want to communicate more traditional ideas in human science and mathematics? What should we do then?

Pythagorean Theorem

Maybe we could start by showing 2D geometrical figures. Gauss suggested back around 1820 that one could carve a picture of the standard visual for the Pythagorean theorem out of the Siberian forest, for aliens to see.

It’s easy to get into trouble, though. We might think of showing Platonic solids. And, yes, 3D printouts should work. But 2D perspective renderings depend in a lot of detail on our particular visual systems. Networks are even worse: how are we to know that those lines joining nodes represent abstract connections?

One might think about logic: perhaps start showing the true theorems of logic. But how would one present them? Somehow one has to have a symbolic representation: textual, expression trees, or something. From what we know now about computational knowledge, logic isn’t a particularly good global starting point for representing general concepts. But in the 1950s this wasn’t clear, and there was a charming book (my copy of which wound up on the set of Arrival) that tried to build up a whole way to communicate with aliens using logic:

LINCOS book cover

But what about things with numbers? In Contact (the movie), prime numbers are key. Well, despite their importance in the history of human mathematics, primes actually don’t figure much in today’s technology, and when they do (like in public-key cryptosystems) it usually seems somehow incidental that they’re what’s used.

In a radio signal, primes might at first seem like good “evidence for intelligence”. But of course primes can be generated by programs—and actually by fairly simple ones, including for example cellular automata. And so if one sees a sequence of primes, it’s not immediate evidence that there’s a whole elaborate civilization behind it; it might just come from a simple program that somehow “arose naturally”.

One can easily illustrate primes visually (not least as numbers of objects that can’t be arranged in non-trivial rectangles). But going further with them seems to require concepts that can’t be represented so directly.
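
Here’s a minimal sketch of that rectangle idea (my own illustration, not something from the movie; nontrivialRectangles is just a name I’ve made up): a number is prime exactly when that many objects can’t be arranged in any rectangle other than a single row:

(* Illustrative sketch: each pair {a, b} is a non-trivial a-by-b rectangle arrangement of n objects; primes have none *)
nontrivialRectangles[n_] := {#, n/#} & /@ Select[Divisors[n], 1 < # <= Sqrt[n] &]
Grid[Table[{n, PrimeQ[n], nontrivialRectangles[n]}, {n, 2, 12}], Frame -> All]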

Hydrogen diagram from Pioneer 10 plaque

It’s awfully easy to fall into implicitly assuming a lot of human context. Pioneer 10—the human artifact that’s gone further into interstellar space than any other (currently about 11 billion miles, which is about 0.05% of the distance to α Centauri)—provides one of my favorite examples. There’s a plaque on that spacecraft that includes a representation of the wavelength of the 21-centimeter spectral line of hydrogen. Now the most obvious way to represent that would probably just be a line 21 cm long. But back in 1972 Carl Sagan and others decided to do something “more scientific”, and instead made a schematic diagram of the quantum mechanical process leading to the spectral line. The problem is that this diagram relies on conventions from human textbooks—like using arrows to represent quantum spins—that really have nothing to do with the underlying concepts and are incredibly specific to the details of how science happened to develop for us humans.

But back to Arrival. To ask a question like “what is your purpose on Earth?” one has to go a lot further than just talking about things like binary sequences or cellular automata. It’s a very interesting problem, and one that’s strangely analogous to something that’s becoming very important right now in the world: communicating with AIs, and defining what goals or purposes they should have (notably “be nice to the humans”).

In a sense, AIs are a little like alien intelligences, right now, here on Earth. The only intelligence we really understand so far is human intelligence. But inevitably every example we see of it shares all the details of the human condition and of human history. So what is intelligence like when it doesn’t share those details?

Well, one of the things that’s emerged from basic science I’ve done is that there isn’t really a bright line between the “intelligent” and the merely “computational”. Things like cellular automata—or the weather—are doing things just as complex as our brains. But even if in some sense they’re “thinking”, they’re not doing so in human-like ways. They don’t share our context and our details.

But if we’re going to “communicate” about things like purpose, we’ve got to find some way to align things. In the AI case, I’ve in fact been working on creating what I call a “symbolic discourse language” that’s a way of expressing concepts that are important to us humans, and communicating them to AIs. There are short-term practical applications, like setting up smart contracts. And there are long-term goals, like defining some analog of a “constitution” for how AIs should generally behave.

Well, in communicating with aliens, we’ve got to build up a common “universal” language that allows us to express concepts that are important to us. That’s not going to be easy. Human natural languages are based on the particulars of the human condition and the history of human civilization. And my symbolic discourse language is really just trying to capture things that are important to humans—not what might be important to aliens.

Of course, in Arrival, we already know that the aliens share some things with us. After all, like the monolith in 2001: A Space Odyssey, even from their shape we recognize the aliens’ spaceships as artifacts. They don’t seem like weird meteorites or something; they seem like something that was made “on purpose”.

But what purpose? Well, purpose is not really something that can be defined abstractly. It’s really something that can be defined only relative to a whole historical and cultural framework. So to ask aliens what their purpose is, we first have to have them understand the historical and cultural framework in which we operate.

Somehow I wonder about the day when we’ll have developed our AIs to the point where we can start asking them what their purpose is. At some level I think it’s going to be disappointing. Because, as I’ve said, I don’t think there’s any meaningful abstract definition of purpose. So there’s nothing “surprising” the AI will tell us. What it considers its purpose will just be a reflection of its detailed history and context. Which in the case of the AI—as its ultimate creators—we happen to have considerable control over.

For aliens, of course, it’s a different story. But that’s part of what Arrival is about.

The Movie Process

I’ve spent a lot of my life doing big projects—and I’m always curious how big projects of any kind are organized. When I see a movie I’m one of those people who sits through to the end of the credits. So it was pretty interesting for me to see the project of making a movie a little closer up in Arrival.

In terms of scale, making a movie like Arrival is a project of about the same size as releasing a major new version of the Wolfram Language. And it’s clear there are some similarities—as well as lots of differences.

Both involve all sorts of ideas and creativity. Both involve pulling together lots of different kinds of skills. Both have to have everything fit together to make a coherent product in the end.

In some ways I think movie-makers have it easier than us software developers. After all, they just have to make one thing that people can watch. In software—and particularly in language design—we have to make something that different people can use in an infinite diversity of different ways, including ones we can’t directly foresee. Of course, in software you always get to make new versions that incrementally improve things; in movies you just get one shot.

And in terms of human resources, there are definitely ways software has it easier than a movie like Arrival. Well-managed software development tends to have a somewhat steady rhythm, so one can have consistent work going on, with consistent teams, for years. In making a movie like Arrival one’s usually bringing in a whole sequence of people—who might never even have met before—each for a very short time. To me, it’s amazing this can work at all. But I guess over the years many of the tasks in the movie industry have become standardized enough that someone can be there for a week or two and do something, then successfully hand it on to another person.

I’ve led a few dozen major software releases in my life. And one might think that by now I’d have got to the point where doing a software release would just be a calm and straightforward process. But it never is. Perhaps it’s because we’re always trying to do majorly new and innovative things. Or perhaps it’s just the nature of such projects. But I’ve found that to get the project done to the quality level I want always requires a remarkable degree of personal intensity. Yes, at least in the case of our company, there are always extremely talented people working on the project. But somehow there are always things to do that nobody expected, and it takes a lot of energy, focus and pushing to get them all together.

At times, I’ve imagined that the process might be a little like making a movie. And in fact in the early years of Mathematica, for example, we even used to have “software credits” that looked very much like movie credits—except that the categories of contributors were things that often had to be made up by me (“lead package developers”, “expression formatting”, “lead font designer”, …). But after a decade or so, recognizing the patchwork of contributions to different versions just became too complex, and so we had to give up on software credits. Still, for a while I thought we’d try having “wrap parties”, just like for movies. But somehow when the scheduled party came around, there was always some critical software issue that had come up, and the key contributors couldn’t come to the party because they were off fixing it.

Software development—or at least language development—also has some structural similarities to movie making. One starts from a script—an overall specification of what one wants the finished product to be like. Then one actually tries to build it. Then, inevitably, at the end when one looks at what one has, one realizes one has to change the specification. In movies like Arrival, that’s post-production. In software, it’s more an iteration of the development process.

It was interesting to me to see how the script and the suggestions I made for it propagated through the making of Arrival. It reminded me quite a lot of how I, at least, do software design: everything kept on getting simpler. I’d suggest some detailed way to fix a piece of dialogue. “You shouldn’t say [the Amy Adams character] flunked calculus; she’s way too analytical for that.” “You shouldn’t say the spacecraft came a million light years; that’s outside the galaxy; say a trillion miles instead.” The changes would get made. But then things would get simpler, and the core idea would get communicated in some more minimal way. I didn’t see all the steps (though that would have been interesting). But the results reminded me quite a lot of the process of software design I’ve done so many times—cut out any complexity one can, and make everything as clear and minimal as possible.

Can You Write a Whiteboard?

My contributions to Arrival were mostly concentrated around the time the movie was shooting early in the summer of 2015. And for almost a year all I heard was that the movie was “in post-production”. But then suddenly in May of this year I get an email: could I urgently write a bunch of relevant physics on a whiteboard for the movie?

There was a scene with Amy Adams in front of a whiteboard, and somehow what was written on the whiteboard when the scene was shot was basic high-school-level physics—not the kind of top-of-the-line physics one would expect from people like the Jeremy Renner character in the movie.

Somewhat amusingly, I don’t think I’ve ever written much on a whiteboard before. I’ve used computers for essentially all my work and presentations for more than 30 years, and before that the prevailing technologies were blackboards and overhead projector transparencies. Still, I duly got a whiteboard set up in my office, and got to work writing (in my now-very-rarely-used handwriting) some things I imagined a good physicist might think of if they were trying to understand an interstellar spacecraft that had just showed up.

Here’s what I came up with. The big spaces on the whiteboard were there to make it easier to composite in Amy Adams (and particularly her hair) moving around in front of the whiteboard. (In the end, the whiteboard got rewritten yet again for the final movie, so what’s here isn’t in detail what’s in the movie.)

Whiteboard

In writing the whiteboard, I imagined it as a place where the Jeremy Renner character or his colleagues would record notable ideas about the spacecraft, and formulas related to them. And after a little while, I ended up with quite a tale of physics fact and speculation.

Here’s a key:

Whiteboard key

(1) Maybe the spacecraft has its strange (here, poorly drawn) rattleback-like shape because it spins as it travels, generating gravitational waves in spacetime in the process.

(2) Maybe the shape of the spacecraft is somehow optimized for producing a maximal intensity of some pattern of gravitational radiation.

(3) This is Einstein’s original formula for the strength of gravitational radiation emitted by a changing mass distribution. Qij is the quadrupole moment of the distribution, computed from the integral shown.

(4) There are higher-order terms, that depend on higher-order multipole moments, computed by these integrals of the spacecraft mass density ρ(Ω) weighted by spherical harmonics.

(5) The gravitational waves would lead to a perturbation in the structure of spacetime, represented by the 4-dimensional tensor hμν.

(6) Maybe the spacecraft somehow “swims” through spacetime, propelled by the effects of these gravitational waves.

(7) Maybe around the skin of the spacecraft, there’s “gravitational turbulence” in the structure of spacetime, with power-law correlations like the turbulence one sees around objects moving in fluids. (Or maybe the spacecraft just “boils spacetime” around it…)

(8) This is the Papapetrou equation for how a spin tensor evolves in General Relativity, as a function of proper time τ.

(9) The equation of geodesic motion describing how things move in (potentially curved) spacetime. Γ is the Christoffel symbol determined by the structure of spacetime. And, yes, one can just go ahead and solve such equations using NDSolve in the Wolfram Language (there’s a sketch of this after the list).

(10) Einstein’s equation for the gravitational field produced by a moving mass (the field determines the motion of the mass, which in turn reacts back to change the field).

(11) A different idea is that the spacecraft might somehow have negative mass, or at least negative pressure. A photon gas has pressure 1/3 ρ; the most common version of dark energy would have pressure −ρ.

(12) The equation for the energy–momentum tensor, that specifies the combination of mass, pressure and velocity that appears in relativistic computations for perfect fluids.

(13) Maybe the spacecraft represents a “bubble” in which the structure of spacetime is different. (The arrow pointed to a schematic spacecraft shape pre-drawn on the whiteboard.)

(14) Is there anything special about the Christoffel symbols (“coefficients of the connection on the tangent fiber bundle”) for the shape of the spacecraft, as computed from its spatial metric tensor?

(15) A gravitational wave can be described as a perturbation in the metric of spacetime relative to flat background Minkowski space where Special Relativity operates.

(16) The equation for the propagation of a gravitational wave, taking into account the first few “nonlinear” effects of the wave on itself.

(17) The relativistic Boltzmann equation describing motion (“transport”) and collision in a gas of Bose–Einstein particles like gravitons.

(18) A far-out idea: maybe there’s a way of making a “laser” using gravitons rather than photons, and maybe that’s how the spacecraft works.

(19) Lasers are a quantum phenomenon. This is a Feynman diagram of self-interaction of gravitons in a cavity. (Photons don’t have these kinds of direct “nonlinear” self interactions.)

(20) How might one make a mirror for gravitons? Maybe one can make a metamaterial with a carefully constructed microscopic structure all the way down to the Planck scale.

(21) Lasers involve coherent states made from superpositions of infinite numbers of photons, as formed by infinitely nested creation operators applied to the quantum field theoretic vacuum.

(22) There’s a Feynman diagram for that: this is a Bethe–Salpeter-type self-consistent equation for a graviton bound state (which we don’t know exists) that might be relevant to a graviton laser.

(23) Basic nonlinear interactions of gravitons in a perturbative approximation to quantum gravity.

(24) A possible correction term for the Einstein–Hilbert action of General Relativity from quantum effects.
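
Item (9) mentions solving the geodesic equation with NDSolve. Here’s a minimal sketch of how that can go (nothing to do with the movie’s spacecraft; coords, christoffel, thf, phf and the other names are just ones I’ve made up here): compute the Christoffel symbols for the metric of an ordinary unit sphere, hand the resulting geodesic equations to NDSolve, and plot the great-circle path that comes out:

(* A minimal sketch, not movie code: geodesics on a unit 2-sphere solved with NDSolve *)
coords = {th, ph};
metric = {{1, 0}, {0, Sin[th]^2}};   (* ds^2 = dth^2 + Sin[th]^2 dph^2 *)
inverse = Simplify[Inverse[metric]];
(* Christoffel symbols: 1/2 g^{ms} (d_b g_{sa} + d_a g_{sb} - d_s g_{ab}) *)
christoffel = Simplify[Table[
    1/2 Sum[inverse[[m, s]] (D[metric[[s, a]], coords[[b]]] +
        D[metric[[s, b]], coords[[a]]] - D[metric[[a, b]], coords[[s]]]), {s, 2}],
    {m, 2}, {a, 2}, {b, 2}]];
funcs = {thf, phf};
rules = Thread[coords -> {thf[t], phf[t]}];
(* Geodesic equations: x''^m + Gamma^m_{ab} x'^a x'^b == 0 *)
geodesicEqs = Table[
   D[funcs[[m]][t], {t, 2}] + Sum[(christoffel[[m, a, b]] /. rules) *
       D[funcs[[a]][t], t] D[funcs[[b]][t], t], {a, 2}, {b, 2}] == 0, {m, 2}];
sol = NDSolve[Join[geodesicEqs,
    {thf[0] == Pi/2, phf[0] == 0, thf'[0] == 0.3, phf'[0] == 1}],
   funcs, {t, 0, 10}];
(* The path traced out on the sphere is a great circle *)
ParametricPlot3D[{Sin[thf[t]] Cos[phf[t]], Sin[thf[t]] Sin[phf[t]], Cos[thf[t]]} /.
   First[sol], {t, 0, 10}]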

Eek, I can see how these explanations might seem like they’re in an alien language themselves! Still, they’re actually fairly tame compared to “full physics-speak”. But let me explain a bit of the “physics story” on the whiteboard.

It starts from an obvious feature of the spacecraft: its rather unusual, asymmetrical shape. It looks a bit like one of those rattleback tops that one can start spinning one way, but then it changes direction. So I thought: maybe the spacecraft spins around. Well, any massive (non-spherical) object spinning around will produce gravitational waves. Usually they’re absurdly too weak to detect, but if the object is sufficiently massive or spins sufficiently rapidly, they can be substantial. And indeed, late last year, after a 30-year odyssey, gravitational waves from two black holes spinning around and merging were detected—and they were sufficiently intense to detect from a third of the way across the universe. (Accelerating masses effectively generate gravitational waves like accelerating electric charges generate electromagnetic waves.)

OK, so let’s imagine the spacecraft somehow spins rapidly enough to generate lots of gravitational waves. And what if we could somehow confine those gravitational waves in a small region, maybe even by using the motion of the spacecraft itself? Well, then the waves would interfere with themselves. But what if the waves got coherently amplified, like in a laser? Well, then the waves would get stronger, and they’d inevitably start having a big effect on the motion of the spacecraft—like perhaps pushing it through spacetime.

But why should the gravitational waves get amplified? In an ordinary laser that uses photons (“particles of light”), one basically needs to continually make new photons by pumping energy into a material. Photons are so-called Bose–Einstein particles (“bosons”) which means that they tend to all “do the same thing”—which is why the light in a laser comes out as a coherent wave. (Electrons are fermions, which means that they try never to do the same thing, leading to the Exclusion Principle that’s crucial in making matter stable, etc.)

Just as light waves can be thought of as made up of photons, so gravitational waves can most likely be thought of as made up of gravitons (though, to be fair, we don’t yet have any fully consistent theory of gravitons). Photons don’t interact directly with each other—basically because photons interact with things like electrons that have electric charge, but photons themselves don’t have electric charge. Gravitons, on the other hand, do interact directly with each other—basically because they interact with things that have any kind of energy, and they themselves can have energy.

These kinds of nonlinear interactions can have wild effects. For example, gluons in QCD have nonlinear interactions that have the effect of keeping them permanently confined inside particles like protons, which they keep “glued” together. It’s not at all clear what nonlinear interactions of gravitons might do. The idea here is that perhaps they’d lead to some kind of self-sustaining “graviton laser”.

The formulas at the top of the whiteboard are basically about the generation and effects of gravitational waves. The ones at the bottom are mostly about gravitons and their interactions. The formulas at the top are basically all associated with Einstein’s General Theory of Relativity (which for 100 years has been the theory of gravity used in physics). The formulas at the bottom give a mixture of classical and quantum approaches to gravitons and their interactions. The diagrams are so-called Feynman diagrams, in which wavy lines schematically represent gravitons propagating through spacetime.

I have no real idea if a “graviton laser” is possible, or how it would work. But in an ordinary photon laser, the photons always effectively bounce around inside some kind of cavity whose walls act as mirrors. Unfortunately, however, we don’t know how to make a graviton mirror—just like we don’t know any way of making something that will shield a gravitational field (well, dark matter sort of would, if it actually exists). For the whiteboard, I made the speculation that perhaps there’s some weird way of making a “metamaterial” down at the Planck scale of 10^-34 meters (where quantum effects in gravity basically have to become important) that could act as a graviton mirror. (Another possibility is that a graviton laser could work more like a free-electron laser without a cavity as such.)

Now, remember, my idea with the whiteboard was to write what I thought a typical good physicist, say plucked from a government lab, might think about if confronted with the situation in the movie. It’s more “conventional” than the theory I personally came up with for how to make an interstellar spacecraft. But that’s because my theory depends on a bunch of my own ideas about how fundamental physics works, that aren’t yet mainstream in the physics community.

What’s the correct theory of interstellar travel? Needless to say, I don’t know. I’d be amazed if either the main theory I invented for the movie or the theory on the whiteboard were correct as they stand. But who knows? And of course it’d be extremely helpful if some aliens showed up in interstellar spaceships to even show us that interstellar travel is possible…

What Is Your Purpose on Earth?

If aliens show up on Earth, one of the obvious big questions is: why are you here? What is your purpose? It’s something the characters in Arrival talk about a lot. And when Christopher and I were visiting the set we were asked to make a list of possible answers, that could be put on a whiteboard or a clipboard. Here’s what we came up with:

What Is Your Purpose on Earth?

As I mentioned before, the whole notion of purpose is something that’s very tied into cultural and other context. And it’s interesting to think about what purposes one would have put on this list at different times in human history. It’s also interesting to imagine what purposes humans—or AIs—might give for doing things in the future. Perhaps I’m too pessimistic, but I rather expect that for future humans, AIs, and aliens, the answer will very often be something out there in the computational universe of possibilities—that we today aren’t even close to having words or concepts for.

And Now It’s a Movie…

The movie came together really well, the early responses look great… and it’s fun to see things like this (yes, that’s Christopher’s code):

Arrival

It’s been interesting and stimulating to be involved with Arrival. It’s let me understand a little more about just what’s involved in creating all those movies I see—and what it takes to merge science with compelling fiction. It’s also led me to ask some science questions beyond any I’ve asked before—but that relate to all sorts of things I’m interested in.

But through all of this, I can’t help wondering: “What if it was real, and aliens did arrive on Earth?”. I’d like to think that being involved with Arrival has made me a little more prepared for that. And certainly if their spaceships do happen to look like giant black rattlebacks, we’ll even already have some nice Wolfram Language code for that…
