
Launching Wolfram|Alpha Open Code


Wolfram|Alpha and Wolfram Language logos

Code for Everyone

Computational thinking needs to be an integral part of modern education—and today I’m excited to be able to launch another contribution to this goal: Wolfram|Alpha Open Code.

Every day, millions of students around the world use Wolfram|Alpha to compute answers. With Wolfram|Alpha Open Code they’ll now not just be able to get answers, but also be able to get code that lets them explore further and immediately apply computational thinking.

It takes a lot of sophisticated technology to make this possible. But to the user, it’s simple. Do a computation with Wolfram|Alpha. Now in almost every section of the output you’ll see an “Open Code” link. Click it and Wolfram|Alpha will generate code for you, then open it in a fully runnable and editable notebook that you can immediately use in the Wolfram Open Cloud:

x^2 sin x in Wolfram|Alpha

The sections of the notebook parallel the sections of your Wolfram|Alpha output. But now each section contains not results, but instead core Wolfram Language code needed to get those results. You can run any piece of code by clicking the [>] button (or typing Shift+Enter):

Running code in the cloud

But the really important thing is that right there on the web you can change and extend the code, and then instantly run it again:

Plot[x^2 Sin[x]/(1 + Tan[x]), {x, -6.3, 6.3}]

The Power of Code

If all someone wants is a single, quick result, then classic Wolfram|Alpha should be all they’ll need. But as soon as they want to go further—that’s where Wolfram|Alpha Open Code comes in.

Let’s say you just got a mathematical result from Wolfram|Alpha:

x^2 cos(x) sin(y)

But then you wonder: “what happens for a whole range of exponents?” Well, it’s going to get pretty complicated to tell Wolfram|Alpha what you want just using natural language. But it’s easy to say what to do by giving a tiny bit of Wolfram Language code (and, yes, you can interactively spin those 3D surfaces around):

Table[Plot3D[x^2 Cos[n x] Sin[y], {x, -3.1, 3.1}, {y, -6.6, 6.6}], {n, 0, 4}]

You could give code to interactively change the parameters too:

Manipulate[Plot3D[x^2 Cos[n x] Sin[y], {x, -3.1, 3.1}, {y, -6.6, 6.6}], {n, 0, 10}]

Starting with Wolfram|Alpha, then extending using the Wolfram Language, is very powerful. Here’s what happens with some real-world data. Start in Wolfram|Alpha, then get the underlying Wolfram Language code (it can be made shorter, but then it’s a little less clear what’s going on):

Italy GDP

Evaluate the code to get a time series. Then plot it. And divide by the corresponding result for the US:

DateListPlot[%]

DateListPlot[Entity["Country", "Italy"][EntityProperty["Country", "GDP", {"Date" -> All, "CurrencyUnit" -> "CurrentUSDollar"}]]/Entity["Country", "UnitedStates"][EntityProperty["Country", "GDP", {"Date" -> All, "CurrencyUnit" -> "CurrentUSDollar"}]], Filling -> Axis]

An important feature of notebooks is that they’re full, computable documents—and you can add whatever you want to them. You can do a whole series of computations. You can put in text to annotate what you’re doing. You can add section headings. You can edit out parts you don’t need. And so on. And of course you can do all of this in the cloud, using any modern web browser.

The Ulterior Motive

Wolfram|Alpha Open Code is going to be really useful to a lot of people—not just students. But when I invented it my immediate objective was very much educational: I wanted to be able to give the millions of students who use Wolfram|Alpha every day a taste of the power of code, and what can be achieved if one learns about code and computational thinking.

Computational thinking is a critically important skill for the future. And after 30 years of development we’re at the exciting point with the Wolfram Language of being able to directly teach serious computational thinking to a very wide range of students. I see Wolfram|Alpha Open Code as opening a window into the world of computational thinking for all the students who use Wolfram|Alpha.

There’s no learning curve to climb with Wolfram|Alpha: you just type in your question, directly in natural language. But now with Wolfram|Alpha Open Code you can explicitly see how your question gets interpreted computationally. And as soon as you want to go further, you’re immediately doing computational thinking, and writing code. You’re not doing an abstract coding exercise, or creating code in some toy context. You’re immediately using code to formulate computational ideas and get results about something you’re working on.

Of course, what makes this feasible is the character of the Wolfram Language—and its uniquely high-level knowledge-based nature. Because that’s what allows real computations that you want to do to be expressed in small amounts of code that can readily be understood and modified or extended.

Yes, the Wolfram Language has a definite structure and syntax, based on definite principles. But that’s a lot of what makes it easy to understand and to write. And in a notebook you’re always getting suggestions about what to type—and if your browser language is set to something other than English you’ll often get annotations in that language too. And the code you get from using Wolfram|Alpha Open Code will continually illustrate the core principles of the Wolfram Language.

Paths into Computational Thinking

Over the course of the past year, we’ve introduced two important paths into computational thinking, both supported by Wolfram Programming Lab, and available free in the Wolfram Open Cloud.

The first path is to start from Explorations: small projects created using code, that a student can immediately dive into, and then modify and interact with. The second path is to systematically learn the Wolfram Language, for example using my book An Elementary Introduction to the Wolfram Language.

And now Wolfram|Alpha Open Code provides a third path: start from a question that a student has asked, and then automatically generate custom code that provides a starting point for further work and thinking.

It’s a nice complement to the other two paths—and perhaps it’ll often provide encouragement to pursue one or the other of them. But it’s a perfectly good path all by itself—and students can go a long way following it.

Of course, under the hood, there’s a remarkable amount of sophisticated technology that’s being used. There’s the whole natural-language understanding system of Wolfram|Alpha that’s understanding the original question. There’s the Wolfram|Alpha computational knowledge system that’s formulating what pieces of code to generate. Then there’s the Wolfram Open Cloud, providing an interactive notebook environment on the web capable of running the code. And at the center of all of it is the Wolfram Language, with its whole integrated design and vast built-in capabilities and knowledge.

It’s taken 30 years of development to get to this point. But now we’ve been able to put everything together to create a very powerful path for students to get into computational thinking.

And I have to say that for me it’s exciting to think about kids out there using Wolfram|Alpha just for homework, but then pressing the Open Code button, and suddenly being transported into the world of code and computational thinking—and perhaps starting on a lifelong journey.

I’m thrilled to be able to provide the tools that make this possible. Try it out. Tell us what you think. Share what you do, and show others what’s possible.

To comment, please visit the copy of this post at the Wolfram Blog.


Two Hours of Experimental Mathematics


Stephen Wolfram leading a live experiment at MoMath

A Talk, a Performance… a Live Experiment

“In the next hour I’m going to try to make a new discovery in mathematics.” So I began a few days ago at two different hour-long Math Encounters events at the National Museum of Mathematics (“MoMath”) in New York City. I’ve been a trustee of the museum since before it opened in 2012, and I was looking forward to spending a couple of hours trying to “make some math” there with a couple of eclectic audiences from kids to retirees.

People usually assume that new discoveries aren’t things one can ever see being made in real time. But the wonderful thing about the computational tools I’ve spent decades building is that they make it so fast to implement ideas that it becomes realistic to make discoveries as a kind of real-time performance art.
Try the experiments for yourself in the Wolfram Open Cloud »

But mathematics is an old field. Haven’t all the “easy” discoveries already been made? Absolutely not! Mathematics has progressed along definite lines, steadily adding theorems about all sorts of things. Many great mathematicians (Gauss, Ramanujan, etc.) have done experimental mathematics to find out what’s true. But in general, experimental mathematics hasn’t been pursued nearly as much as it could or should have been. And that means that there’s a huge amount of “low-hanging fruit” still to be picked—even if one’s only got a couple of hours to spend doing it.

Experiment #1

My rule for live experiments is that to keep everything fresh I think of the topic only a few minutes before I start. But since this was my first-ever such event at the museum, I thought I should have a topic that’s somehow a big one for me. So I decided it should be something related to cellular automata—which were the very first examples I explored in the multi-decade journey that led to A New Kind of Science.

While their setup is nice and easy to understand, cellular automata are fundamentally systems from the computational universe, not “mathematical” systems. But what I thought I’d do for my first experiment was to look at some cellular automata that somehow have a traditional mathematical interpretation.

After introducing cellular automata (and the Wolfram Language), I started off talking about Pascal’s triangle—formed by making each number be the sum of its left and right neighbors on the step before. Here’s the code I wrote to make Pascal’s triangle (yes, replacing 0 by “” is a bit hacky, but it makes everything much easier to read):

NestList[RotateLeft[#] + RotateRight[#] &, CenterArray[{1}, 21, 0], 10] /. 0 -> "" // Grid

If one does the same thing mod 2, one gets a rather clear pattern:

NestList[Mod[RotateLeft[#] + RotateRight[#], 2] &, CenterArray[{1}, 21, 0], 10] /. 0 -> "" // Grid

And one can think of this as a cellular automaton, with this rule:

RulePlot[CellularAutomaton[90]]

Here’s what happens if one runs this, starting from a single 1:

ArrayPlot[CellularAutomaton[90, {{1}, 0}, 50]]

And here’s the same result, from the “mathematical” code:

ArrayPlot[  NestList[Mod[RotateLeft[#] + RotateRight[#], 2] &,    CenterArray[{1}, 101, 0], 50]]

OK, so what happens if we change the math a bit? Instead of using mod 2, let’s use mod 5:

ArrayPlot[  NestList[Mod[RotateLeft[#] + RotateRight[#], 5] &,    CenterArray[{1}, 101, 0], 50]]

It’s still a regular pattern. But here was my idea for the experiment: explore what happens if the rule involves mathematical operations other than pure addition.

So what about multiplication? I was mindful of the fact that all the 0s in the initial conditions would tend to make a lot of 0s. So I thought: let’s try adding constants before doing the multiplication. And here’s the first thing I tried:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 5] &,    CenterArray[{1}, 101, 0], 50]]

I was pretty surprised. I wasn’t expecting anything that complicated. But, OK, I thought, let’s back off and try an even simpler rule: let’s use mod 3 instead of mod 5. (Mod 2 would already have been covered by my exhaustive study of the “elementary cellular automata”.)

Here’s the result I got:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3] &,    CenterArray[{1}, 101, 0], 50]]

I immediately said, “I wonder how fast that pattern grows.” I guessed it might be a logarithm or a square root.

But before going on, I wanted to scope out what else was there in the space of rules like this. Just to check, I ran the mod 2 case. As expected, nothing interesting.

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 2] &,     CenterArray[{1}, 101, 0], 50]], {a, 0, 1}, {b, 0, 1}]

OK, now the mod 3 case:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 3] &,     CenterArray[{1}, 101, 0], 50]], {a, 0, 2}, {b, 0, 2}]

An interesting little collection. But then it was time to analyze the growth of those patterns.

The first step, as suggested by someone in the audience, was just to rotate the list every step, to make the straight edge be vertical:

ArrayPlot[  NestList[RotateRight[     Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,    CenterArray[{1}, 100, 0], 50]]

Then we picked every other step, to get rid of the horizontal stripes:

ArrayPlot[  Take[NestList[    RotateRight[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,     CenterArray[{1}, 200, 0], 200], 1 ;; -1 ;; 2]]

And—when in doubt—just run it longer, here for 3000 steps. Well, my guess about square root or logarithm was wrong: this looks roughly linear, albeit irregular.

ArrayPlot[  Take[NestList[    RotateRight[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,     CenterArray[{1}, 3000, 0], 3000], 1 ;; -1 ;; 2]]

I was disappointed that this was so gray and hard to read. Trying colors didn’t help, though; the pattern is just rather sparse.

Well, then I tried to just plot the position of the right-hand edge. Here’s the code I came up with:

data = (First[#] - #) &[    Flatten[FirstPosition[Reverse[#], 1 | 2] & /@       Take[NestList[        RotateRight[          Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,         CenterArray[{1}, 2000, 0], 2000], 1 ;; -1 ;; 2]]];

Here’s a fit:

Fit[data, {1, x}, x]
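To check how well that linear fit tracks the measured edge positions, one can overlay it on the data—a quick sanity check I’m adding here, not something from the original session:

Show[ListLinePlot[data], Plot[Fit[data, {1, x}, x], {x, 1, Length[data]}, PlotStyle -> Red]]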

OK, how can one get some better analysis? First, I took differences to see the growth at each step: always either 0 or 2 cells. Then I looked for runs of growth or no growth. And then I looked specifically for runs of growth, and saw how long the successive runs were.

runs = Length /@ Take[Split[Differences[data]], 1 ;; -1 ;; 2];

What’s the distribution of these runs? Being New York, there were lots of finance people in the audience—including, in the front row, a world expert on power laws. So the obvious question was: did the runs have a power-law distribution of sizes? The results, based on the data I had, were inconclusive:

Histogram[runs]
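One standard quick way to eyeball a possible power law is a rank-size plot on log-log axes, where a power law shows up as roughly a straight line. This is just a sketch of that kind of check, using the runs computed above:

ListLogLogPlot[Reverse[Sort[runs]]]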

But instead of looking further at this particular rule, I decided to take a quick look at the case of higher moduli. These are the results I got for mod 4:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 4] &,     CenterArray[{1}, 100, 0], 50], PlotLabel -> {a, b}], {a, 0, 3}, {b,    0, 3}]

There was one that looked interesting here:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(1 + RotateRight[#]), 4] &,    CenterArray[{1, 2, 1}, 100, 0], 50]]

Would it end up having lots of different possible structures? Trying it with random initial conditions made it look like it was never going to have anything other than repetitive behavior:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(1 + RotateRight[#]), 4] &,    RandomInteger[2, 100], 50]]

Well, by this point our time was basically up. But it was hard to stop. I quickly tried the case of mod 5—and discovered all sorts of interesting behavior:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 5] &,     CenterArray[{1, 2}, 100, 0], 50], PlotLabel -> {a, b}], {a, 0,    4}, {b, 0, 4}]

I just had to check out a couple of these. One that has an overall nested pattern, but with lots of complicated stuff going on “in the background”:

ArrayPlot[  NestList[Mod[(3 + RotateLeft[#])*(3 + RotateRight[#]), 5] &,    CenterArray[{1, 2}, 400, 0], 200]]

And one with a mixture of regular and irregular growth:

ArrayPlot[  Take[NestList[Mod[(2 + RotateLeft[#])*(4 + RotateRight[#]), 5] &,     CenterArray[{1, 2}, 2000, 0], 1000], 1 ;; -1 ;; 2]]

It was time to stop. But I was pretty satisfied. Live experiments are always risky. And we might have found nothing interesting. But instead we found something really interesting: rich and complex behavior based on iterating rules given by simple algebraic formulas. In a sense what we found is an example of a bridge between traditional mathematical constructs (like algebraic formulas), and pure computational systems, with arbitrary computational rules. In an hour we certainly didn’t finish—but we found a seed for all sorts of future research—on what we might call “MoMath Cellular Automata.”

Experiment #2

After a break, it was time for experiment #2. This time I decided to do something more related to numbers. I started by talking about reversal-addition systems—where at each step one adds a number to the number obtained by reversing its digits. I showed the result for base 10, starting from the number 123:

PadLeft[IntegerDigits[    NestList[FromDigits[Reverse[IntegerDigits[#]]] + # &, 123,      100]]] // ArrayPlot

Then I said, “Instead of reversing the digits, let’s just rotate them to the left. And let’s make the system simpler, by using base 2 instead of base 10.”

This was the sequence of numbers obtained, starting from 1:

NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 10]

Someone asked whether it was a recognizable sequence. FindSequenceFunction didn’t think so:

FindSequenceFunction[%]

Then the question was, what’s the overall pattern? Here’s the result for 100 steps:

PadLeft[IntegerDigits[NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 100], 2]] // ArrayPlot

It looks remarkably complex. And doing 1000 steps doesn’t make it look any simpler:

PadLeft[IntegerDigits[NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 1000], 2]] // ArrayPlot

What about starting with something other than 1?

Table[PadLeft[    IntegerDigits[     NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, n,       100], 2]] // ArrayPlot, {n, 10}]

All pretty similar. I wondered if rotating right, rather than left, would make a difference. It really didn’t:

Table[PadLeft[IntegerDigits[NestList[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + # &, n, 100], 2]] // ArrayPlot, {n, 10}]

I thought maybe it’d be interesting to have a fixed number of digits, so I tried reducing mod 2^20, to keep only the last 20 digits:

Table[PadLeft[IntegerDigits[NestList[Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #, 2^20] &, n, 100], 2]] // ArrayPlot, {n, 10}]

Since with n binary digits there are only 2^n possible values, each of these sequences must eventually repeat. Here’s a look at the transient and repeating parts for the first few digit counts:

Table[FindTransientRepeat[NestList[Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #, 2^n] &, 1, 1000], 4], {n, 8}] // Column

Then I decided to make complete transition graphs for all 2^n states in each case. Curious-looking pictures, but not immediately illuminating.

Table[Labeled[   Graph[# ->        Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #,         2^n] & /@ Range[2^n]], n], {n, 2, 9}]

By now I was wondering: “Is there a still simpler system involving digit rotation that does something interesting?” I wondered what would happen if instead of adding in the original number at each step, I just multiplied by 2, and added some constant. This didn’t immediately lead to anything interesting:

Table[PadLeft[    IntegerDigits[     NestList[2 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + n &,       1, 100], 2]] // ArrayPlot, {n, 6}]

So then I wondered about multiplying by 3:

Table[PadLeft[    IntegerDigits[     NestList[3 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + n &,       1, 100], 2]] // ArrayPlot, {n, 7}]

Again, nothing too exciting. But—just to be complete—I thought I’d better run the experiment of looking at a sequence of other multipliers.

Table[Labeled[   PadLeft[IntegerDigits[      NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 100], 2]] // ArrayPlot, a], {a, 20}]

Similar behavior until—aha!—something weird and complicated happens when one gets to multiplier 13.

There was an immediate guess from the audience that primes might be special. But that theory was quickly exploded by the case of multiplier 21.

Table[Labeled[   PadLeft[IntegerDigits[      NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 100], 2]] // ArrayPlot, a], {a, 20, 24}]

OK, so then the hunt was on for what was special about the multipliers that led to complex behavior. But first we had to figure out how to recognize complex behavior. I thought I’d try something newfangled: using machine learning to make a feature space plot of the images for different multipliers.

It was somewhat interesting—and a nice application of machine learning—but not immediately too useful. (To make it better, one would have to think harder about the feature extractor to use.)

imags = Table[    PadLeft[IntegerDigits[       NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,         1, 100], 2]] // Image, {a, 50}]; FeatureSpacePlot[imags]

So how could one tell from that which were the complex patterns? A histogram of entropies wasn’t obviously illuminating:

Histogram[Entropy /@ imags]

As I was writing this blog post, I thought I should find the entropy distribution more accurately; even including 1000 possible multipliers, it still doesn’t seem terribly helpful:

Histogram[  Entropy /@    Table[PadLeft[      IntegerDigits[       NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,         1, 100], 2]] // Image, {a, 1000}]]

An expert in telecom math in the front row suggested taking a Fourier transform. I said I wasn’t hopeful:

Image[Abs[Fourier[ImageData[imags[[13]]]]]] (* assuming the transform was applied to the multiplier-13 image from imags above *)

Yes, there are better ways to do the Fourier transform. But someone else (a hedge-fund CEO, as it happened) suggested looking at the occurrences of particular 2×2 blocks in each pattern. For the case of multiplier 13, lots of blocks occur:

Counts[Flatten[   Partition[    PadLeft[IntegerDigits[      NestList[13 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 20], 2]], {2, 2}], 1]]

But for the case of multiplier 5, where the pattern is simple, most blocks never occur:

Counts[Flatten[   Partition[    PadLeft[IntegerDigits[      NestList[5 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 20], 2]], {2, 2}], 1]]

So this suggested that we just generate a list of how many of the 16 possible blocks actually do occur, for each multiplier:

blks = Table[   Length[Union[     Flatten[Partition[       PadLeft[IntegerDigits[         NestList[          a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &, 1,           20], 2]], {2, 2}], 1]]], {a, 50}]

Here’s a plot:

ListLinePlot[blks]

Where are the 16s?

Flatten[Position[blks, 16]]

FindSequenceFunction didn’t have any luck with these numbers. Plotting the “block count” for longer gave this:

ListLinePlot[  Table[Length[    Union[Flatten[      Partition[       PadLeft[IntegerDigits[         NestList[          a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &, 1,           20], 2]], {2, 2}], 1]]], {a, 1000}]]

Definitely some structure. But it’s not clear what it is.

And once again, we were out of time—having found an interesting kind of system with the curious property that it’s usually complex in its behavior, but for some special cases, isn’t.

The Live Experiment Process

I’ve done many live experiments over the years—though it’s been a while since they were about math. And as the Wolfram Language has evolved, it’s become easier and easier to do the experiments nicely and smoothly—without time wasted on glitches and debugging.

Wolfram Notebooks have the nice little feature that they (by default) keep a Notebook History (see the Cell menu)—that shows when each cell in the notebook has been modified. Here are the results for Experiment #1 and Experiment #2. Mostly they show rather linear progress, with comparatively little backtracking. (There’s a gap in Experiment #2, which came because my network connection suddenly stopped working. Conveniently, there were some networking experts in the audience—and eventually it was determined that the USB-C connection from my fine new computer to the projector had somehow misnegotiated itself as an Ethernet connection…)

Cell history

Every year at our Summer School I start out by doing a live experiment or two—because I think live experiments are a great way to show just how accessible discovery can be if one approaches it the right way, and with the right tools. I’m expecting that live experiments will be an important part of the process of educating people about computational thinking too.

With the Wolfram Language, one can do live experiments—and live coding—about all sorts of things. (We even tried doing a prototype Live Coding Competition at our last Wolfram Technology Conference; it worked well, and we’ll probably develop it into a whole program.)

But whether they’re live or not, computer experiments are an incredibly powerful methodology for making discoveries—not least in mathematics.

Of course, it’s easy to generate all kinds of random facts about mathematics. The issue is: how does one generate “interesting” facts? In a first approximation, for a fact to be interesting to us humans, it has to relate to things we care about. Those things could be technological applications, observations about the real world—or just pieces of mathematics that have, for whatever reason, historically been studied (think Fermat’s Last Theorem, for example).

I like to think that my book A New Kind of Science significantly broadened the kinds of “math-like facts” that one might consider “interesting”—by providing a general intellectual framework (about computation, complexity, and so on) into which those facts can be fit.

But part of the skill needed to do good experimental mathematics is to look for facts that somehow can ultimately be related to larger frameworks, and ultimately to the traditions of mathematics. Like in any area of research, it takes experience and intuition—and luck can help too.

But in experimental mathematics, it’s extremely easy to get started: there’s plenty of fertile territory to be explored, even with quite elementary mathematical ideas. We just happen to live at a time when the tools to make this kind of exploration feasible first exist. (Of course, I’ve spent a lot of my life building them…)

How should experimental mathematics be done? Perhaps there could be “math-a-thons” (or “discover-a-thons”), analogous to hackathons, where the output is math papers, not software projects.

More than 30 years ago I started the journal Complex Systems—and one of my long-term goals was to make it a repository for results in experimental mathematics. It certainly has published plenty of them, but the standard form of modern academic papers isn’t optimized for experimental mathematics. Instead, one can imagine some kind of “Discoveries in Experimental Mathematics,” that is much more oriented towards straightforward reports of the results of experiments.

In some ways it would be a return to an earlier style of scientific publishing—like all those papers from the 1800s reporting sightings of strange new animals or physical phenomena. But what’s new today is that with the Wolfram Language—and particularly with Notebook documents—it’s possible not just to report on what one’s seen, but instead to give a completely reproducible version of it, that anyone else can also explore. (And if there’s a giant computation involved, to store the results in a cloud object.)
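The cloud-object part is essentially a one-liner in the Wolfram Language. Here’s a minimal sketch, with a stand-in computation:

obj = CloudPut[Table[Prime[n], {n, 10^5}]];
CloudGet[obj]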

I’m hoping that finally it’ll now be possible to establish a serious ecosystem for experimental mathematics. A place where results can be found, presented clearly with good visualization and so on, and published in a form where others can build on them. It’s been a long time coming, but I think it’s going to be an important direction for mathematics going forward.

And it was fun for me (and I hope for the audience too) to spend a couple of hours prototyping it live and in public a few days ago.


Download the complete notebooks:
Session #1/Experiment #1 »
Session #2/Experiment #2 »

Try the experiments for yourself in the Wolfram Open Cloud »

The R&D Pipeline Continues: Launching Version 11.1


A Minor Release That’s Not Minor

I’m pleased to announce the release today of Version 11.1 of the Wolfram Language (and Mathematica). As of now, Version 11.1 is what’s running in the Wolfram Cloud—and desktop versions are available for immediate download for Mac, Windows and Linux.

What’s new in Version 11.1? Well, actually a remarkable amount. Here’s a summary:

Summary of new features

There’s a lot here. One might think that a .1 release, nearly 29 years after Version 1.0, wouldn’t have much new any more. But that’s not how things work with the Wolfram Language, or with our company. Instead, as we’ve built our technology stack and our procedures, rather than progressively slowing down, we’ve been continually accelerating. And now even something as notionally small as the Version 11.1 release packs an amazing amount of R&D, and new functionality.

A Visual Change

There’s one very obvious change in 11.1: the documentation looks different. We’ve spiffed up the design, and on the web we’ve made everything responsive to window width—so it looks good even when it’s in a narrow sidebar in the cloud, or on a phone.

Wolfram Language documentation

We’ve also introduced some new design elements—like the mini-view of the Details section. Most people like to see examples as soon as they get to a function page. But it’s important not to forget the Details—and the mini-view provides what amounts to a little “ad” for them.

Examples and details

Lots of New Functions

Here’s a word cloud of new functions in Version 11.1:

Word cloud of new functions

Altogether there are an impressive 132 new functions—together with another 98 that have been significantly enhanced. These functions represent the finished output of our R&D pipeline in just the few months that have elapsed since Version 11.0 was released.

When we bring out a major “integer” release—like Version 11—we’re typically introducing a bunch of complete, new frameworks. In (supposedly) minor .1 releases like Version 11.1, we’re not aiming for complete new frameworks. Instead, there’s typically new functionality that’s adding to existing frameworks—together with a few (sometimes “experimental”) hints of major new frameworks to come. Oh, and if a complete, new framework does happen to be finished in time for a .1 release, it’ll be there too.

Neural Nets

One very hot area in which Version 11.1 makes some big steps forward is neural nets. It’s been exciting over the past few years to see this area advance so quickly in the world at large, and it’s been great to see the Wolfram Language at the very leading edge of what’s being done.

Our goal is to define a very high-level interface to neural nets, that’s completely integrated into the Wolfram Language. Version 11.1 adds some new recently developed building blocks—in particular 30 new types of neural net layers (more than double what was there in 11.0), together with automated support for recurrent nets. The concept is always to let the neural net be specified symbolically in the Wolfram Language, then let the language automatically fill in the details, interface with low-level libraries, etc. It’s something that’s very convenient for ordinary feed-forward networks (tensor sizes are all knitted together automatically, etc.)—but for recurrent nets (with variable-length sequences, etc.) it’s something that’s basically essential if one’s going to avoid lots of low-level programming.

Another crucial feature of neural nets in the Wolfram Language is that it’s set up to be automatic to encode images, text or whatever in an appropriate way. In Version 11.1, NetEncoder and NetDecoder cover a lot of new cases—extending what’s integrated into the Wolfram Language.
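As a small illustration of the kind of thing NetEncoder produces—the particular size and color space here are just for illustration—one can make an encoder that turns any image into a 28×28 grayscale array:

NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}]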

It’s worth saying that underneath the whole integrated symbolic interface, the Wolfram Language is using a very efficient low-level library—currently MXNet—which takes care of optimizing ultimate performance for the latest CPU and GPU configurations. By the way, another feature enhanced in 11.1 is the ability to store complete neural net specifications, complete with encoders, etc. in a portable and reusable .wlnet file.
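For example, one might export a net to a .wlnet file and read it back later. This is just a sketch, with a made-up file name, and assuming net is a network like the NetModel example below:

Export["my-net.wlnet", net]
Import["my-net.wlnet"]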

There’s a lot of power in treating neural nets as symbolic objects. In 11.1 there are now functions like NetMapOperator and NetFoldOperator that symbolically build up new neural nets. And because the neural nets are symbolic, it’s easy to manipulate them, for example breaking them apart to monitor what they’re doing inside, or systematically comparing the performance of different structures of net.

In some sense, neural net layers are like the machine code of a neural net programming system. In 11.1 there’s a convenient function—NetModel—that provides pre-built trained or untrained neural net models. As of today, there are a modest number of famous neural nets included, but we plan to add more every week—surfing the leading edge of what’s being developed in the neural net research community, as well as adding some ideas of our own.

Here’s a simple example of NetModel at work:

net = NetModel["LeNet Trained on MNIST Data"]

Now apply the network to some actual data—and see it gets the right answer:

net[{6,8,0}]
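If you’re trying this in a fresh session, one way to get suitable test images is from the MNIST data resource mentioned later in this archive—this sketch assumes its items are image -> label rules:

net[Keys[RandomSample[ResourceData["MNIST"], 3]]]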

But because the net is specified symbolically, it’s easy to “go inside” and “see what it’s thinking”. Here’s a tiny (but neat) piece of functional programming that visualizes what happens at every layer in the net—and, yes, in the end the first square lights up red to show that the output is 0:

FoldPairList[{ArrayPlot[ArrayFlatten[Partition[#1, UpTo[5]]],      ColorFunction -> "Rainbow"], #2[#1]} &,   NetExtract[net, "Input"][0], Normal[net]]

More Machine Learning

Neural nets are an important method for machine learning. But one of the core principles of the Wolfram Language is to provide highly automated functionality, independent of underlying methods. And in 11.1 there’s a bunch more of this in the area of machine learning. (As it happens, much of it uses the latest deep learning neural net methods, but for users what’s important is what it does, not how it does it.)

My personal favorite new machine learning function in 11.1 is FeatureSpacePlot. Give it any collection of objects, and it’ll try to lay them out in an appropriate “feature space”. Like here are the flags of countries in Europe:

FeatureSpacePlot[EntityValue[=countries in Europe, "FlagImage"]]

What’s particularly neat about FeatureSpacePlot is that it’ll immediately use sophisticated pre-trained feature extractors for specific classes of input—like photographs, texts, etc. And there’s also now a FeatureNearest function that’s the analog of Nearest, but operates in feature space. Oh, and all the stuff with NetModel and pre-trained net models immediately flows into these functions, so it becomes trivial, say, to experiment with “meaning spaces”:

FeatureSpacePlot[{"dog", "ant", "bear", "moose", "cucumber", "bean",    "broccoli", "cabbage"},   FeatureExtractor ->    NetModel["GloVe 50-Dimensional Word Vectors Trained on Wikipedia \ and Gigaword-5 Data"]]
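The FeatureNearest function mentioned above works the same way as Nearest, but in feature space. Here’s a minimal sketch (the particular words are just illustrative):

fn = FeatureNearest[{"dog", "cat", "car", "truck", "broccoli"}];
fn["kitten"]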

Particularly with NetModel, there are all sorts of very useful few-line neural net programs that one can construct. But in 11.1 there are also some major new, more infrastructural, machine learning capabilities. Notable examples are ActiveClassification and ActivePrediction—which build classifiers and predictors by actively sampling a space, learning how to do this as efficiently as possible. There will be lots of end-user applications for ActiveClassification and ActivePrediction, but for us internally the most immediately interesting thing is that we can use these functions to optimize all sorts of meta-algorithms that are built into the Wolfram Language.

Audio

Version 11.0 began the process of making audio—like images—something completely integrated into the Wolfram Language. Version 11.1 continues that process. For example, for desktop systems, it adds AudioCapture to immediately capture audio from a microphone on your computer. (Yes, it’s nontrivial to automatically handle out-of-core storage and processing of large audio samples, etc.) Here’s an example of me saying “hello”:

AudioCapture[]

You can immediately take this, and, say, make a cepstrogram (yes, that’s another new audio function in 11.1):

Cepstrogram[%]

Images & Visualization

Version 11.1 has quite an assortment of new features for images and visualization. CurrentImage got faster and better. ImageEffect has lots of new effects added. There are new functions and options to support the latest in computational photography and computational microscopy. And images got even more integrated as first-class objects—that one can for example now immediately do arithmetic with:

Sqrt[2 Wolfie Image]-EdgeDetect[Wolfie Image]

Something else with images—that I’ve long wanted—is the ability to take a bitmap image, and find an approximate vector graphics representation of it:

ImageGraphics[Poke Spikey]

TextRecognize has also become significantly stronger—in particular being able to pick out structure in text, like paragraphs and columns and the like.

Oh, and in visualization, there are things like GeoBubbleChart, here showing the populations of the largest cities in the US:

GeoBubbleChart[EntityValue[United States["LargestCities"], {"Position",     "Population"}]]

There’s lots of little (but nice) stuff too. Like support for arbitrary callouts in pie charts, optimized labeling of discrete histograms and full support of scaling functions for Plot3D, etc.

More Data

There’s always new data flowing into the Wolfram Knowledgebase, and there’ve also been plenty of completely new things added since 11.0: 130,000+ new types of foods, 250,000+ atomic spectral lines, 12,000+ new mountains, 10,000+ new notable buildings, 300+ types of neurons, 650+ new waterfalls, 200+ new exoplanets (because they’ve recently been discovered), and lots else (not to mention 7,000+ new spelling words). There’s also, for example, much higher resolution geo elevation data—so now a 3D-printable Mount Everest can have much more detail:

ListPlot3D[GeoElevationData[GeoDisk[Mount Everest]], Mesh -> None]

Integrated External Services

Something new in Version 11.1 are integrated external services—that allow built-in functions that work by calling external APIs. Two examples are WebSearch and WebImageSearch. Here are thumbnail images found by searching the web for “colorful birds”:

WebImageSearch["colorful birds", "Thumbnails"]

For the heck of it, let’s see what ImageIdentify thinks they are (oh, and in 11.1, ImageIdentify is much more accurate, and you can even play with the network inside it by using NetModel):

ImageIdentify /@ %

Since WebSearch and WebImageSearch use external APIs, users need to pay for them separately. But we’ve set up what we call Service Credits to make this seamless. (Everything’s in the language, of course, so there’s for example $ServiceCreditsAvailable.)

There will be quite a few more examples of integrated services in future versions, but in 11.1, beyond web searching, there’s also TextTranslation. WordTranslation (new in 11.0) handles individual word translation for hundreds of languages; now in 11.1 TextTranslation uses external services to also translate complete pieces of text between several tens of languages:

TextTranslation["This is an integrated external service.", "French"]

More Math, More Algorithms

A significant part of our R&D organization is devoted to continuing our three-decade effort to push the frontiers of mathematical and algorithmic computation. So it should come as no surprise that Version 11.1 has all sorts of advances in these areas. There are space-filling curves, fractal meshes, ways to equidistribute points on a sphere:

Graphics[HilbertCurve[5]]

MengerMesh[3, 3]

Graphics3D[Sphere[SpherePoints[200], 0.1]]

There are new kinds of spatial, robust and multivariate statistics. There are Hankel transforms, built-in modular inverses, and more. Even in differentiation, there’s something new: nth order derivatives, for symbolic n:

D[x Exp[x], {x, n}]

Here’s something else about differentiation: there are now functions RealAbs and RealSign that are versions of Abs and Sign defined only on the real axis, and so can freely be differentiated, without having to give any assumptions about variables.
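A tiny example: the derivative of RealAbs comes out directly, with no assumptions needed about x:

D[RealAbs[x], x]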

In Version 10.1, we introduced the function AnglePath, that computes a path from successive segments with specified lengths and angles. At some level, AnglePath is like an industrial-scale version of Logo (or Scratch) “turtle geometry”. But AnglePath has turned out to be surprisingly broadly useful, so for Version 11.1, we’ve generalized it to AnglePath3D (and, yes, there are all sorts of subtleties about frames and Euler angles and so on).
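As a reminder of the kind of “turtle geometry” involved—this small 2D example is just for illustration; AnglePath3D works analogously with 3D orientations:

Graphics[Line[AnglePath[Table[{1, 137.5 Degree}, {200}]]]]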

A Language of Granular Dates

When we say “June 23, 1988”, what do we mean? The beginning of that day? The whole 24-hour period from midnight to midnight? Or what? In Version 11.1 we’ve introduced the notion of granularity for dates—so you can say whether a date is supposed to represent a day, a year, a second, a week starting Sunday—or for that matter just an instant in time.

It’s a nice application of the symbolic character of the Wolfram Language—and it solves all sorts of problems in dealing with dates and times. In a way, it’s a little like precision for numbers, but it’s really its own thing. Here for example is how we now represent “the current week”:

CurrentDate["Week"]

Here’s the current decade:

CurrentDate["Decade"]

This is the next month from now:

NextDate["Month"]

This says we want to start from next month, then add 7 weeks—getting another month:

NextDate["Month"] + =7wk

And here’s the result to the granularity of a month:

CurrentDate[%, "Month"]

Talking of dates, by the way, one of the things that’s coming across the system is the use of Dated as a qualifier, for example for properties of entities of the knowledgebase (so this asks for the population of New York City in 1970):

New York City [ Dated[ "Population", 1970 ] ]

Language Tweaks

I’m very proud of how smooth the Wolfram Language is to use—and part of how that’s been achieved is that for 30 years we’ve been continually polishing it. We’re always making sure everything fits perfectly together—and we’re always adding little conveniences.

One of our principles is that if there’s a lump of computational work that people repeatedly do, then so long as there’s a good name for it (that people will readily remember, and readily recognize when they see it in a piece of code), it should be inserted as a built-in function. A very simple example in Version 11.1 is ReverseSort:

ReverseSort[{1, 2, 3, 4}]

(One might think: what’s the point of this—it’s just Reverse[Sort[...]]. But it’s very common to want to map what’s now ReverseSort over a bunch of objects, and it’s smoother to be able to say ReverseSort /@ ... rather than Reverse[Sort[#]]& /@ ... or Reverse@*Sort /@ ...).

Another little convenience: Nearest now has special ways to specify useful things to return. For example, this gives the distances from 2.7 to the 5 nearest values:

Nearest[{1, 2, 3, 4, 5, 6, 7} -> "Distance", 2.7, 5]

CellularAutomaton is a very broad function. Version 11.1 makes it easier to use for common cases by allowing rules to be specified by associations with labeled elements:

ArrayPlot[  CellularAutomaton[<|"OuterTotalisticCode" -> 110, "Dimension" -> 2,     "Neighborhood" -> 5|>, {{{1}}, 0}, {{{50}}}]]

We’re always trying to make sure that patterns we’ve established get used as broadly as possible. Like in 11.1, you can use UpTo in lots of new places, like in ImageSize specifications.

We’re also always trying to make sure that things are as general as possible. Like IntegerString now works not only with the standard representation of integers, but also with traditional ones used for different purposes around the world:

IntegerString[12345, "TraditionalChineseFinancial"]

And IntegerName can also now handle different types and languages of names:

IntegerName[12345, {"French", "Ordinal"}]

And there are lots more examples—each making the experience of using the Wolfram Language just a little bit smoother.

A Language of Persistence

If you make a definition like x = 7, or $TimeZone = 11, the definition will persist until you clear it, or until your session is over. But what if you want a definition that persists longer—say across all your sessions? Well, in Version 11.1 that’s now possible, thanks to PersistentValue.

PersistentValue lets you specify a name (like "foo"), and a "persistence location". (It also allows options like PersistenceTime and ExpirationDate.) The persistence location can just be "KernelSession"—which means that the value lasts only for a single kernel session. But it can also be "FrontEndSession", or "Local" (meaning that it should be the same whenever you use the same computer), or "Cloud" (meaning that it’s globally synchronized across the cloud).

PersistentValue is pretty general. It lets you have values in different places (like different private clouds, for example); then there’s a $PersistencePath that defines the order to look at them in, and a MergingFunction that specifies how (if at all) the values should be merged.
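Here’s a minimal sketch of how PersistentValue gets used (the name "myCounter" is made up for this example):

PersistentValue["myCounter", "Local"] = 42;
PersistentValue["myCounter", "Local"]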

Systems-Level Programming

One of the goals of the Wolfram Language is to be able to interact as broadly as possible with all computational ecosystems. Version 11.1 adds support for the M4A audio format, the .ubj binary JSON format, as well as .ini files and Java .properties files. There’s also a new function, BinarySerialize, that converts any Wolfram Language expression into a new binary (“WXF”) form, optimized for speed or size:

BinarySerialize[RandomGraph[{50, 100}]]

BinaryDeserialize gets it back:

BinaryDeserialize[%]

Version 11.0 introduced WolframScript—a command-line interface to the Wolfram Language, running either locally or in the cloud. With WolframScript you can create standalone Wolfram Language programs that run from the shell. There are several enhancements to WolframScript itself in 11.1, but there’s also now a new New > Script menu item that gives you a notebook interface for creating .wls (=“Wolfram Language Script”) files to be run by WolframScript:

WolframScript
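As a tiny illustration (not from the original post), a .wls script of the kind that menu item creates might look like this, runnable from a shell with wolframscript:

#!/usr/bin/env wolframscript
(* print the first ten primes *)
Print[Prime[Range[10]]]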

Strengthening the Infrastructure

One of the major ways the Wolfram Language has advanced in recent times has been in its deployability. We’ve put a huge amount of work into making sure that the Wolfram Language can be robustly deployed at scale (and there are now lots of examples of successes out in the world).

We make updates to the Wolfram Cloud very frequently (and invisibly), steadily enhancing server performance and user interface capabilities. Along with Version 11.1 we’ve made some major updates. There are a few signs of this in the language.

Like there’s now an option AutoCopy that can be set for any cloud object—and that means that every time the object is accessed, one should get a fresh copy of it. This is very useful if, for example, you want to have a notebook that lots of people can separately modify. (“Explore these ideas; here’s a notebook to start from…”, etc.)

CloudDeploy[APIFunction[...]] makes it extremely easy to deploy web APIs. In Version 11.1 there are some options to automate aspects of how those APIs behave. For example, there’s AllowedCloudExtraParameters, which lets you say that APIs can have parameters like "_timeout" or "_geolocation" automated. There’s also AllowedCloudParameterExtensions (no, it’s not the longest name in the system; that honor currently goes to MultivariateHypergeometricDistribution). What AllowedCloudParameterExtensions does is to let you say not just x=value, but x__url=..., or x__json=....
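For reference, the basic pattern of such a deployment is very simple—here’s a minimal sketch that doesn’t use any of the new options:

CloudDeploy[APIFunction[{"x" -> "Number"}, #x^2 &], Permissions -> "Public"]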

Another thing about Version 11.1 is that it’s got various features added to support private instances of the Wolfram Cloud—and our major new Wolfram Enterprise Private Cloud product (with a first version released late last year). For example, in addition to $WolframID for the Wolfram Cloud, there’s also $CloudUserID that’s generalized to allow authentication on private clouds. And inside the system, there are all sorts of new capabilities associated with “multicloud authentication” and so on. (Yes, it’s a complicated area—but the symbolic character of the Wolfram Language lets one handle it rather beautifully.)

And There’s More

OK, so I’ve summarized some of what’s in 11.1. There’s a lot more I could say. New functions, and new capabilities—each of which is going to be exciting to somebody. But to me it’s actually pretty amazing that I can write this long a post about a .1 release! It’s a great testament to the strength of the R&D pipeline—and to how much can be done with the framework we’ve built in the Wolfram Language over the past 30 years.

We always work on a portfolio of projects—from small ones that get delivered very quickly, to ones that may take a decade or more to mature. Version 11.1 has the results of several multi-year projects (e.g. in machine learning, computational geometry, etc.), and a great many shorter projects. It’s exciting for us to be able to deliver the fruits of our efforts, and I look forward to hearing what people do with Version 11.1—and to seeing successive versions continue to be developed and delivered.


To comment, please visit the copy of this post at the Wolfram Blog »

Launching the Wolfram Data Repository: Data Publishing that Really Works


After a Decade, It’s Finally Here!

I’m pleased to announce that as of today, the Wolfram Data Repository is officially launched! It’s been a long road. I actually initiated the project a decade ago—but it’s only now, with all sorts of innovations in the Wolfram Language and its symbolic ways of representing data, as well as with the arrival of the Wolfram Cloud, that all the pieces are finally in place to make a true computable data repository that works the way I think it should.

Wolfram Data Repository

It’s happened to me a zillion times: I’m reading a paper or something, and I come across an interesting table or plot. And I think to myself: “I’d really like to get the data behind that, to try some things out”. But how can I get the data?

If I’m lucky there’ll be a link somewhere in the paper. But it’s usually a frustrating experience to follow it. Because even if there’s data there (and often there actually isn’t), it’s almost never in a form where one can readily use it. It’s usually quite raw—and often hard to decode, and perhaps even intertwined with text. And even if I can see the data I want, I almost always find myself threading my way through footnotes to figure out what’s going on with it. And in the end I usually just decide it’s too much trouble to actually pull out the data I want.

And I suppose one might think that this is just par for the course in working with data. But in modern times, we have a great counterexample: the Wolfram Language. It’s been one of my goals with the Wolfram Language to build into it as much data as possible—and make all of that data immediately usable and computable. And I have to say that it’s worked out great. Whether you need the mass of Jupiter, or the masses of all known exoplanets, or Alan Turing’s date of birth—or a trillion much more obscure things—you just ask for them in the language, and you’ll get them in a form where you can immediately compute with them.

Here’s the mass of Jupiter (and, yes, one can use “Wolfram|Alpha-style” natural language to ask for it):

Jupiter(planet)["Mass"]

Dividing it by the mass of the Earth immediately works:

Entity["Planet", "Jupiter"]["Mass"]/Entity["Planet", "Earth"]["Mass"]

Here’s a histogram of the masses of known exoplanets, divided by the mass of Jupiter:

Histogram[  EntityClass["Exoplanet", All]["Mass"]/   Entity["Planet", "Jupiter"]["Mass"]]

And here, for good measure, is Alan Turing’s date of birth, in an immediately computable form:

Alan Turing(person)["BirthDate"]

Of course, it’s taken many years and lots of work to make everything this smooth, and to get to the point where all those thousands of different kinds of data are fully integrated into the Wolfram Language—and Wolfram|Alpha.

But what about other data—say data from some new study or experiment? It’s easy to upload it someplace in some raw form. But the challenge is to make the data actually useful.

And that’s where the new Wolfram Data Repository comes in. Its idea is to leverage everything we’ve done with the Wolfram Language—and Wolfram|Alpha, and the Wolfram Cloud—to make it as easy as possible to make data as broadly usable and computable as possible.

There are many parts to this. But let me state our basic goal. I want it to be the case that if someone is dealing with data they understand well, then they should be able to prepare that data for the Wolfram Data Repository in as little as 30 minutes—and then have that data be something that other people can readily use and compute with.

It’s important to set expectations. Making data fully computable—to the standard of what’s built into the Wolfram Language—is extremely hard. But there’s a lower standard that still makes data extremely useful for many purposes. And what’s important about the Wolfram Data Repository (and the technology around it) is it now makes that standard easy to achieve—with the result that it’s now practical to publish data in a form that can really be used by many people.

An Example

Each item published in the Wolfram Data Repository gets its own webpage. Here, for example, is the page for a public dataset about meteorite landings:

Meteorite Landings

At the top is some general information about the dataset. But then there’s a piece of a Wolfram Notebook illustrating how to use the dataset in the Wolfram Language. And by looking at this notebook, one can start to see some of the real power of the Wolfram Data Repository.

One thing to notice is that it’s very easy to get the data. All you do is ask for ResourceData["Meteorite Landings"]. And whether you’re using the Wolfram Language on a desktop or in the cloud, this will give you a nice symbolic representation of data about 45716 meteorite landings (and, yes, the data is carefully cached so this is as fast as possible, etc.):

And then the important thing is that you can immediately start to do whatever computation you want on that dataset. As an example, this takes the "Coordinates" element from all rows, then takes a random sample of 1000 results, and geo plots them:

GeoListPlot[RandomSample[ResourceData["Meteorite Landings"][All, "Coordinates"],1000]]

Many things have to come together for this to work. First, the data has to be reliably accessible—as it is in the Wolfram Cloud. Second, one has to be able to tell where the coordinates are—which is easy if one can see the dataset in a Wolfram Notebook. And finally, the coordinates have to be in a form in which they can immediately be computed with.

This last point is critical. Just storing the textual form of a coordinate—as one might in something like a spreadsheet—isn’t good enough. One has to have it in a computable form. And needless to say, the Wolfram Language has such a form for geo coordinates: the symbolic construct GeoPosition[{lat, lon}].
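And because GeoPosition is a real computable construct, one can immediately do things like measure distances with it (the coordinates here are just illustrative):

GeoDistance[GeoPosition[{40.78, -73.97}], GeoPosition[{48.86, 2.35}]]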

There are other things one can immediately see from the meteorites dataset too. Like notice there’s a "Mass" column. And because we’re using the Wolfram Language, masses don’t have to just be numbers; they can be symbolic Quantity objects that correctly include their units. There’s also a "Year" column in the data, and again, each year is represented by an actual, computable, symbolic DateObject construct.
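So values from those columns behave like any other Quantity or DateObject in the language. For example (with made-up values, just to illustrate):

UnitConvert[Quantity[21.5, "Kilograms"], "Pounds"]

DateObject[{1988, 6, 23}] + Quantity[50, "Years"]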

There are lots of different kinds of possible data, and one needs a sophisticated data ontology to handle them. But that’s exactly what we’ve built for the Wolfram Language, and for Wolfram|Alpha, and it’s now been very thoroughly tested. It involves 10,000 kinds of units, and tens of millions of “core entities”, like cities and chemicals and so on. We call it the Wolfram Data Framework (WDF)—and it’s one of the things that makes the Wolfram Data Repository possible.

What’s in the Wolfram Data Repository So Far?

Today is the initial launch of the Wolfram Data Repository, and to get ready for this launch we’ve been adding sample content to the repository for several months. Some of what we’ve added are “obvious” famous datasets. Some are datasets that we found for some reason interesting, or curious. And some are datasets that we created ourselves—and in some cases that I created myself, for example, in the course of writing my book A New Kind of Science.

WDR Home Page Categories

There’s plenty already in the Wolfram Data Repository that’ll immediately be useful in a variety of applications. But in a sense what’s there now is just an example of what can be there—and the kinds of things we hope and expect will be contributed by many other people and organizations.

The fact that the Wolfram Data Repository is built on top of our Wolfram Language technology stack immediately gives it great generality—and means that it can handle data of any kind. It’s not just tables of numerical data as one might have in a spreadsheet or simple database. It’s data of any type and structure, in any possible combination or arrangement.

Home Page Types

There are time series:

Take[ResourceData["US Federal Outlays by Agency"], 3]

There are training sets for machine learning:

RandomSample[ResourceData["MNIST"], 30]

There’s gridded data:

ResourceData["GMM-3 Mars Gravity Map"]

There’s the text of many books:

WordCloud[ResourceData["On the Origin of Species"]]

There’s geospatial data:

ResourceData["Global Air Navigation Aids"]

GeoHistogram[ResourceData["Global Air Navigation Aids"][All, "Geoposition"], 50, GeoRange -> Entity["Country", "UnitedStates"]]

Many of the data resources currently in the Wolfram Data Repository are quite tabular in nature. But unlike traditional spreadsheets or tables in databases, they’re not restricted to having just one level of rows and columns—because they’re represented using symbolic Wolfram Language Dataset constructs, which can handle arbitrarily ragged structures, of any depth.

ResourceData["Sample Data: Solar System Planets and Moons"]

But what about data that normally lives in relational or graph databases? Well, there’s a construct called EntityStore that was recently added to the Wolfram Language. We’ve actually been using something like it for years inside Wolfram|Alpha. But what EntityStore now does is to let you set up arbitrary networks of entities, properties and values, right in the Wolfram Language. It typically takes more curation than setting up something like a Dataset—but the result is a very convenient representation of knowledge, on which all the same functions can be used as with built-in Wolfram Language knowledge.
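As a rough sketch of what setting one up involves (the entity type, names and properties here are all invented for the example), an EntityStore is basically a nested association of entities, properties and values:

store = EntityStore["SampleArtist" -> <|
    "Entities" -> <|
      "artist1" -> <|"Name" -> "Alice Example", "BirthYear" -> 1900|>,
      "artist2" -> <|"Name" -> "Bob Example", "BirthYear" -> 1925|>|>|>];
PrependTo[$EntityStores, store];  (* register the store so entity functions can see it *)
EntityValue[Entity["SampleArtist", "artist1"], "BirthYear"]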

Here’s a data resource that’s an entity store:

ResourceData ["Museum of Modern Art Holdings and Artists"]

This adds the entity stores to the list of entity stores to be used automatically:

PrependTo [$EntityStores, %];

Now here are 5 random entities of type "MoMAArtist" from the entity store:

RandomEntity ["MoMAArtist", 5]

For each artist, one can extract a dataset of values:

Entity[MoMAArtist], Otto Mühl[Dataset]

This queries the entity store to find artists with the most recent birth dates:

EntityList[Entity["MoMAArtist", "BirthDate" -> TakeLargest[5]]]

How It Works

The Wolfram Data Repository is built on top of a new, very general thing in the Wolfram Language called the “resource system”. (Yes, expect all sorts of other repository and marketplace-like things to be rolling out shortly.)

The resource system has “resource objects”, that are stored in the cloud (using CloudObject), then automatically downloaded and cached on the desktop if necessary (using LocalObject). Each ResourceObject contains both primary content and metadata. For the Wolfram Data Repository, the primary content is data, which you can access using ResourceData.

ResourceData
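So in practice one might get the ResourceObject for an entry, look at a piece of its metadata, and then pull the actual content (the "Description" property is a typical metadata field; the exact set of available properties is a detail of the resource system):

ro = ResourceObject["Meteorite Landings"];
ro["Description"]
data = ResourceData[ro];  (* downloads and caches the content as needed *)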

The Wolfram Data Repository that we’re launching today is a public resource, that lives in the public Wolfram Cloud. But we’re also going to be rolling out private Wolfram Data Repositories, that can be run in Enterprise Private Clouds—and indeed inside our own company we’ve already set up several private data repositories, that contain internal data for our company.

There’s no limit in principle on the size of the data that can be stored in the Wolfram Data Repository. But for now, the “plumbing” is optimized for data that’s at most about a few gigabytes in size—and indeed the existing examples in the Wolfram Data Repository make it clear that an awful lot of useful data never even gets bigger than a few megabytes in size.

The Wolfram Data Repository is primarily intended for the case of definitive data that’s not continually changing. For data that’s constantly flowing in—say from IoT devices—we released the Wolfram Data Drop last year. Both Data Repository and Data Drop are deeply integrated into the Wolfram Language, and through our resource system, there’ll be some variants and combinations coming in the future.

Delivering Data to the World

Our goal with the Wolfram Data Repository is to provide a central place for data from all sorts of organizations to live—in such a way that it can readily be found and used.

Each entry in the Wolfram Data Repository has an associated webpage, which describes the data it contains, and gives examples that can immediately be run in the Wolfram Cloud (or downloaded to the desktop).

Open in Cloud

On the webpage for each repository entry (and in the ResourceObject that represents it), there’s also metadata, for indexing and searching—including standard Dublin Core bibliographic data. To make it easier to refer to the Wolfram Data Repository entries, every entry also has a unique DOI.

The way we’re managing the Wolfram Data Repository, every entry also has a unique readable registered name, that’s used both for the URL of its webpage, and for the specification of the ResourceObject that represents the entry.

It’s extremely easy to use data from the Wolfram Data Repository inside a Wolfram Notebook, or indeed in any Wolfram Language program. The data is ultimately stored in the Wolfram Cloud. But you can always download it—for example right from the webpage for any repository entry.

The richest and most useful form in which to get the data is the Wolfram Language or the Wolfram Data Framework (WDF)—either in ASCII or in binary. But we’re also setting it up so you can download in other formats, like JSON (and in suitable cases CSV, TXT, PNG, etc.) just by pressing a button.

Data Downloads

Of course, even formats like JSON don’t have native ways to represent entities, or quantities with units, or dates, or geo positions—or all those other things that WDF and the Wolfram Data Repository deal with. So if you really want to handle data in its full form, it’s much better to work directly in the Wolfram Language. But then with the Wolfram Language you can always process some slice of the data into some simpler form that does make sense to export in a lower-level format.

How Contributions Work

The Wolfram Data Repository as we’re releasing it today is a platform for publishing data to the world. And to get it started, we’ve put in about 500 sample entries. But starting today we’re accepting contributions from anyone. We’re going to review and vet contributions much like we’ve done for the past decade for the Wolfram Demonstrations Project. And we’re going to emphasize contributions and data that we feel are of general interest.

But the technology of the Wolfram Data Repository—and the resource system that underlies it—is quite general, and allows people not just to publish data freely to the world, but also to share data in a more controlled fashion. The way it works is that people prepare their data just like they would for submission to the public Wolfram Data Repository. But then instead of actually submitting it, they just deploy it to their own Wolfram Cloud accounts, giving access to whomever they want.

And in fact, the general workflow is that even when people are submitting to the public Wolfram Data Repository, we’re going to expect them to have first deployed their data to their own Wolfram Cloud accounts. And as soon as they do that, they’ll get webpages and everything—just like in the public Wolfram Data Repository.

OK, so how does one create a repository entry? You can either do it programmatically using Wolfram Language code, or do it more interactively using Wolfram Notebooks. Let’s talk about the notebook way first.

You start by getting a template notebook. You can either do this through the menu item File > New > Data Resource, or you can use CreateNotebook["DataResource"]. Either way, you’ll get something that looks like this:

Data Resource Construction Notebook

Basically it’s then a question of “filling out the form”. A very important section is the one that actually provides the content for the resource:

Resource Content

Yes, it’s Wolfram Language code—and what’s nice is that it’s flexible enough to allow for basically any content you want. You can either just enter the content directly in the notebook, or you can have the notebook refer to a local file, or to a cloud object you have.

An important part of the Construction Notebook (at least if you want to have a nice webpage for your data) is the section that lets you give examples. When the examples are actually put up on the webpage, they’ll reference the data resource you’re creating. But when you’re filling in the Construction Notebook the resource hasn’t been created yet. The symbolic character of the Wolfram Language comes to the rescue, though. Because it lets you reference the content of the data resource symbolically as $$Data in the inputs that’ll be displayed, but lets you set $$Data to actual data when you’re working in the Construction Notebook to build up the examples.

Heart Rate Data
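As a rough sketch of that workflow (using the meteorite data just as a stand-in for whatever your resource contains), one might temporarily bind $$Data while building up the examples:

$$Data = ResourceData["Meteorite Landings"];  (* temporary binding while authoring; the published examples refer to $$Data symbolically *)
Histogram[QuantityMagnitude[DeleteMissing[Normal[$$Data[All, "Mass"]]]]]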

Alright, so once you’ve filled out the Construction Notebook, what do you do? There are two initial choices: set up the resource locally on your computer, or set it up in the cloud:

Private Deploy

And then, if you’re ready, you can actually submit your resource for publication in the public Wolfram Data Repository (yes, you need to get a Publisher ID, so your resource can be associated with your organization rather than just with your personal account):

Submit to the Wolfram Data Repository Page

It’s often convenient to set up resources in notebooks. But like everything else in our technology stack, there’s a programmatic Wolfram Language way to do it too—and sometimes this is what will be best.

Remember that everything that is going to be in the Wolfram Data Repository is ultimately a ResourceObject. And a ResourceObject—like everything else in the Wolfram Language—is just a symbolic expression, which happens to contain an association that gives the content and metadata of the resource object.

Well, once you’ve created an appropriate ResourceObject, you can just deploy it to the cloud using CloudDeploy. And when you do this, a private webpage associated with your cloud account will automatically be created. That webpage will in turn correspond to a CloudObject. And by setting the permissions of that cloud object, you can determine who will be able to look at the webpage, and who will be able to get the data that’s associated with it.

When you’ve got a ResourceObject, you can submit it to the public Wolfram Data Repository just by using ResourceSubmit.
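Here’s a minimal sketch of that programmatic path (the name, description and content are all made up for the example):

ro = ResourceObject[<|
    "Name" -> "My Example Data",
    "ResourceType" -> "DataResource",
    "Description" -> "A tiny example dataset.",
    "Content" -> Dataset[{<|"x" -> 1, "y" -> 2|>, <|"x" -> 3, "y" -> 4|>}]|>];
CloudDeploy[ro]  (* creates the private webpage and associated cloud objects *)
ResourceSubmit[ro]  (* submits the resource to the public Wolfram Data Repository *)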

By the way, all this stuff works not just for the main Wolfram Data Repository in the public Wolfram Cloud, but also for data repositories in private clouds. The administrator of an Enterprise Private Cloud can decide how they want to vet data resources that are submitted (and how they want to manage things like name collisions)—though often they may choose just to publish any resource that’s submitted.

The procedure we’ve designed for vetting and editing resources for the public Wolfram Data Repository is quite elaborate—though in any given case we expect it to run quickly. It involves doing automated tests on the incoming data and examples—and then ensuring that these continue working as changes are made, for example in subsequent versions of the Wolfram Language. Administrators of private clouds definitely don’t have to use this procedure—but we’ll be making our tools available if they want to.

Making a Data-Backed Publication

OK, so let’s say there’s a data resource in the Wolfram Data Repository. How can it actually be used to create a data-backed publication? The most obvious answer is just for the publication to include a link to the webpage for the data resource in the Wolfram Data Repository. And once people go to the page, it immediately shows them how to access the data in the Wolfram Language, use it in the Wolfram Open Cloud, download it, or whatever.

But what about an actual visualization or whatever that appears in the paper? How can people know how to make it? One possibility is that the visualization can just be included among the examples on the webpage for the data resource. But there’s also a more direct way, which uses Source Links in the Wolfram Cloud.

Here’s how it works. You create a Wolfram Notebook that takes data from the Wolfram Data Repository and creates the visualization:

Creating Visualizations with a Wolfram Notebook

Then you deploy this visualization to the Wolfram Cloud—either using Wolfram Language functions like CloudDeploy and EmbedCode, or using menu items. But when you do the deployment, you say to include a source link (SourceLink->Automatic in the Wolfram Language). And this means that when you get an embeddable graphic, it comes with a source link that takes you back to the notebook that made the graphic:

Sourcelink to the Notebook

So if someone is reading along and they get to that graphic, they can just follow its source link to see how it was made, and to see how it accesses data from the Wolfram Data Repository. With the Wolfram Data Repository you can do data-backed publishing; with source links you can also do full notebook-backed publishing.
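In Wolfram Language terms, a deployment along these lines might look roughly like this (the particular plot is just a stand-in; SourceLink -> Automatic is what attaches the link back to the source notebook):

plot = GeoListPlot[RandomSample[ResourceData["Meteorite Landings"][All, "Coordinates"], 500]];
obj = CloudDeploy[plot, Permissions -> "Public", SourceLink -> Automatic];
EmbedCode[obj]  (* gives embeddable code for the deployed graphic *)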

The Big Win

Now that we’ve talked a bit about how the Wolfram Data Repository works, let’s talk again about why it’s important—and why having data in it is so valuable.

The #1 reason is simple: it makes data immediately useful, and computable.

There’s nice, easy access to the data (just use ResourceData["..."]). But the really important—and unique—thing is that data in the Wolfram Data Repository is stored in a uniform, symbolic way, as WDF, leveraging everything we’ve done with data over the course of so many years in the Wolfram Language and Wolfram|Alpha.

Why is it good to have data in WDF? First, because in WDF the meaning of everything is explicit: whether it’s an entity, or quantity, or geo position, or whatever, it’s a symbolic element that’s been carefully designed and documented. (And it’s not just a disembodied collection of numbers or strings.) And there’s another important thing: data in WDF is already in precisely the form it’s needed for one to be able to immediately visualize, analyze or otherwise compute with it using any of the many thousands of functions that are built into the Wolfram Language.

Wolfram Notebooks are also an important part of the picture—because they make it easy to show how to work with the data, and give immediately runnable examples. Also critical is the fact that the Wolfram Language is so succinct and easy to read—because that’s what makes it possible to give standalone examples that people can readily understand, modify and incorporate into their own work.

In many cases using the Wolfram Data Repository will consist of identifying some data resource (say through a link from a document), then using the Wolfram Language in Wolfram Notebooks to explore the data in it. But the Wolfram Data Repository is fully integrated into the Wolfram Language, so it can be used wherever the language is used. Which means the data from the Wolfram Data Repository can be used not just in the cloud or on the desktop, but also in servers and so on. And, for example, it can also be used in APIs or scheduled tasks, using the exact same ResourceData functions as ever.
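For instance, here’s a rough sketch of an instant API backed by a data resource (the "Name" column and the JSON output format are assumptions made for the example):

CloudDeploy[
  APIFunction[{"n" -> "Integer"},
    Take[Normal[ResourceData["Meteorite Landings"][All, "Name"]], #n] &, "JSON"],
  Permissions -> "Public"]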

The most common way the Wolfram Data Repository will be used is one resource at a time. But what’s really great about the uniformity and standardization that WDF provides is that it allows different data resources to be used together: those dates or geo positions mean the same thing even in different data resources, so they can immediately be put together in the same analysis, visualization, or whatever.

The Wolfram Data Repository builds on the whole technology stack that we’ve been assembling for the past three decades. In some ways it’s just a sophisticated piece of infrastructure that makes a lot of things easier to do. But I can already tell that its implications go far beyond that—and that it’s going to have a qualitative effect on the extent to which people can successfully share and reuse a wide range of kinds of data.

The Process of Data Curation

It’s a big win to have data in the Wolfram Data Repository. But what’s involved in getting it there? There’s almost always a certain amount of data curation required.

Let’s take a look again at the meteorite landings dataset I showed earlier in this post. It started from a collection of data made available in a nicely organized way by NASA. (Quite often one has to scrape webpages or PDFs; this is a case where the data happens to be set up to be downloadable in a variety of convenient formats.)

Meteorite Landing Data from NASA

As is fairly typical, the basic elements of the data here are numbers and strings. So the first thing to do is to figure out how to map these to meaningful symbolic constructs in WDF. For example, the “mass” column is labeled as being “(g)”, i.e. in grams—so each element in it should get converted to Quantity[value,"Grams"]. It’s a little trickier, though, because for some rows—corresponding to some meteorites—the value is just blank, presumably because it isn’t known.

So how should that be represented? Well, because the Wolfram Language is symbolic it’s pretty easy. And in fact there’s a standard symbolic construct Missing[...] for indicating missing data, which is handled consistently in analysis and visualization functions.
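In practice one might capture that with a little conversion function along these lines (a sketch, not the exact code used for the repository entry):

toMass[""] := Missing["NotAvailable"]   (* blank entries in the CSV become explicit Missing values *)
toMass[m_?NumericQ] := Quantity[m, "Grams"]
toMass /@ {21.3, "", 107000}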

As we start to look further into the dataset, we see all sorts of other things. There’s a column labeled “year”. OK, we can convert that into DateObject[{value}]—though we need to be careful about any BC dates (how would they appear in the raw data?).

Next there are columns “reclat” and “reclong”, as well as a column called “GeoLocation” that seems to combine these, but with numbers quoted a different precision. A little bit of searching suggests that we should just take reclat and reclong as the latitude and longitude of the meteorite—then convert these into the symbolic form GeoPosition[{lat,lon}].

To do this in practice, we’d start by just importing all the data:

Import ["https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv?accessType=\ DOWNLOAD", "CSV"]

OK, let’s extract a sample row:

data [[2]]

Already there’s something unexpected: the date isn’t just the year, but instead it’s a precise time. So this needs to be converted:

Interpreter ["DateTime"][data[[2, 7]]]

Now we’ve got to reset this to correspond only to a date at a granularity of a year:

DateObject [{DateValue[Interpreter["DateTime"][data[[2, 7]]],     "Year"]}, "Year"]

Here is the geo position:

GeoPosition [{data[[2, -3]], data[[2, -2]]}]" title="GeoPosition[{data[[2, -3]], data[[2, -2]]}]

And we can keep going, gradually building up code that can be applied to each row of the imported data. In practice there are often little things that go wrong. There’s something missing in some row. There’s an extra piece of text (a “footnote”) somewhere. There’s something in the data that got misinterpreted as a delimiter when the data was provided for download. Each one of these needs to be handled—preferably with as much automation as possible.

But in the end we have a big list of rows, each of which needs to be assembled into an association, then all combined to make a Dataset object that can be checked to see if it’s good to go into the Wolfram Data Repository.
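Schematically, the per-row conversion might end up looking something like this (the column positions are the ones in the NASA file as imported above, and would of course need to be checked; edge cases like blank coordinates are glossed over here):

makeRow[row_] := <|
   "Name" -> row[[1]],
   "Mass" -> If[row[[5]] === "", Missing["NotAvailable"], Quantity[row[[5]], "Grams"]],
   "Year" -> DateObject[{DateValue[Interpreter["DateTime"][row[[7]]], "Year"]}, "Year"],
   "Coordinates" -> GeoPosition[{row[[-3]], row[[-2]]}]|>;
meteorites = Dataset[makeRow /@ Rest[data]]  (* Rest drops the header row *)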

The Data Curation Hierarchy

The example above is fairly typical of basic curation that can be done in less than 30 minutes by any decently skilled user of the Wolfram Language. (A person who’s absorbed my book An Elementary Introduction to the Wolfram Language should, for example, be able to do it.)

It’s a fairly simple example—where notably the original form of the data was fairly clean. But even in this case it’s worth understanding what hasn’t been done. For example, look at the column labeled "Classification" in the final dataset. It’s got a bunch of strings in it. And, yes, we can do something like make a word cloud of these strings:

WordCloud[Normal[ResourceData["Meteorite Landings"][All, "Classification"]]]

But to really make these values computable, we’d have to do more work. We’d have to figure out some kind of symbolic representation for meteorite classification, then we’d have to do curation (and undoubtedly ask some meteorite experts) to fit everything nicely into that representation. The advantage of doing this is that we could then ask questions about those values (“what meteorites are above L3?”), and expect to compute answers. But there’s plenty we can already do with this data resource without that.

My experience in general has been that there’s a definite hierarchy of effort and payoff in getting data to be computable at different levels—starting with the data just existing in digital form, and ending with the data being cleanly computable enough that it can be fully integrated in the core Wolfram Language, and used for repeated, systematic computations.

Data Hierarchy Levels

 

Let’s talk about this hierarchy a bit.

The zeroth thing, of course, is that the data has to exist. And the next thing is that it has to be in digital form. If it started on handwritten index cards, for example, it had better have been entered into a document or spreadsheet or something.

But then the next issue is: how are people supposed to get access to that document or spreadsheet? Well, a good answer is that it should be in some kind of accessible cloud—perhaps referenced with a definite URI. And for a lot of data repositories that exist out there, just making the data accessible like this is the end of the story.

But one has to go a lot further to make the data actually useful. The next step is typically to make sure that the data is arranged in some definite structure. It might be a set of rows and columns, or it might be something more elaborate, and, say, hierarchical. But the point is to have a definite, known structure.

In the Wolfram Language, it’s typically trivial to take data that’s stored in any reasonable format, and use Import to get it into the Wolfram Language, arranged in some appropriate way. (As I’ll talk about later, it might be a Dataset, it might be an EntityStore, it might just be a list of Image objects, or it might be all sorts of other things.)

But, OK, now things start getting more difficult. We need to be able to recognize, say, that such-and-such a column has entries representing countries, or pairs of dates, or animal species, or whatever. SemanticImport uses machine learning and does a decent job of automatically importing many kinds of data. But there are often things that have to be fixed. How exactly is missing data represented? Are there extra annotations that get in the way of automatic interpretation? This is where one starts needing experts, who really understand the data.
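For instance, with a local copy of the meteorite file one might try something like this (the file name and the per-column type hints are assumptions for the sketch):

SemanticImport["MeteoriteLandings.csv"]  (* automatic interpretation of each column *)
SemanticImport["MeteoriteLandings.csv", {"String", "Integer", "String", "String", "Number", "String", "Date", "Number", "Number", "GeoCoordinates"}]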

But let’s say one’s got through this stage. Well, then in my experience, the best thing to do is to start visualizing the data. And very often one will immediately see things that are horribly wrong. Some particular quantity was represented in several inconsistent ways in the data. Maybe there was some obvious transcription or other error. And so on. But with luck it’s fairly easy to transform the data to handle the obvious issues—though to actually get it right almost always requires someone who is an expert on the data.

What comes out of this process is typically very useful for many purposes—and it’s the level of curation that we’re expecting for things submitted to the Wolfram Data Repository.

It’ll be possible to do all sorts of analysis and visualization and other things with data in this form.

But if one wants, for example, to actually integrate the data into Wolfram|Alpha, there’s considerably more that has to be done. For a start, everything that can realistically be represented symbolically has to be represented symbolically. It’s not good enough to have random strings giving values of things—because one can’t ask systematic questions about those. And this typically requires inventing systematic ways to represent new kinds of concepts in the world—like the "Classification" for meteorites.

Wolfram|Alpha works by taking natural language input. So the next issue is: when there’s something in the data that can be referred to, how do people refer to it in natural language? Often there’ll be a whole collection of names for something, with all sorts of variations. One has to algorithmically capture all of the possibilities.

Next, one has to think about what kinds of questions will be asked about the data. In Wolfram|Alpha, the fact that the questions get asked in natural language forces a certain kind of simplicity on them. But it makes one also need to figure out just what the linguistics of the questions can be (and typically this is much more complicated than the linguistics for entities or other definite things). And then—and this is often a very difficult part—one has to figure out what people want to compute, and how they want to compute it.

At least in the world of Wolfram|Alpha, it turns out to be quite rare for people just to ask for raw pieces of data. They want answers to questions—that have to be computed with models, or methods, or algorithms, from the underlying data. For meteorites, they might want to know not the raw information about when a meteorite fell, but compute the weathering of the meteorite, based on when it fell, what climate it’s in, what it’s made of, and so on. And to have data successfully be integrated into Wolfram|Alpha, those kinds of computations all need to be there.

For full Wolfram|Alpha there’s even more. Not only does one have to be able to give a single answer, one has to be able to generate a whole report, that includes related answers, and presents them in a well-organized way.

It’s ultimately a lot of work. There are very few domains that have been added to Wolfram|Alpha with less than a few skilled person-months of work. And there are plenty of domains that have taken person-years or tens of person-years. And to get the right answers, there always has to be a domain expert involved.

Getting data integrated into Wolfram|Alpha is a significant achievement. But there’s further one can go—and indeed to integrate data into the Wolfram Language one has to go further. In Wolfram|Alpha people are asking one-off questions—and the goal is to do as well as possible on individual questions. But if there’s data in the Wolfram Language, people won’t just ask one-off questions with it: they’ll also do large-scale systematic computations. And this demands a much greater level of consistency and completeness—which in my experience rarely takes less than person-years per domain to achieve.

But OK. So where does this leave the Wolfram Data Repository? Well, the good news is that all that work we’ve put into Wolfram|Alpha and the Wolfram Language can be leveraged for the Wolfram Data Repository. It would take huge amounts of work to achieve what’s needed to actually integrate data into Wolfram|Alpha or the Wolfram Language. But given all the technology we have, it takes very modest amounts of work to make data already very useful. And that’s what the Wolfram Data Repository is about.

The Data Publishing Ecosystem

With the Wolfram Data Repository (and Wolfram Notebooks) there’s finally a great way to do true data-backed publishing—and to ensure that data can be made available in an immediately useful and computable way.

For at least a decade there’s been lots of interest in sharing data in areas like research and government. And there’ve been all sorts of data repositories created—often with good software engineering—with the result that instead of data just sitting on someone’s local computer, it’s now pretty common for it to be uploaded to a central server or cloud location.

But the problem has been that the data in these repositories is almost always in a quite raw form—and not set up to be generally meaningful and computable. And in the past—except in very specific domains—there’s been no really good way to do this, at least in any generality. But the point of the Wolfram Data Repository is to use all the development we’ve done on the Wolfram Language and WDF to finally be able to provide a framework for having data in an immediately computable form.

The effect is dramatic. One goes from a situation where people are routinely getting frustrated trying to make use of data to one in which data is immediately and readily usable. Often there’s been lots of investment and years of painstaking work put into accumulating some particular set of data. And it’s often sad to see how little the data actually gets used—even though it’s in principle accessible to anyone. But I’m hoping that the Wolfram Data Repository will provide a way to change this—by allowing data not just to be accessible, but also computable, and easy for anyone to immediately and routinely use as part of their work.

There’s great value to having data be computable—but there’s also some cost to making it so. Of course, if one’s just collecting the data now, and particularly if it’s coming from automated sources, like networks of sensors, then one can just set it up to be in nice, computable WDF right from the start (say by using the data semantics layer of the Wolfram Data Drop). But at least for a while there’s going to still be a lot of data that’s in the form of things like spreadsheets and traditional databases, which don’t even have the technology to support the kinds of structures one would need to directly represent WDF and computable data.

So that means that there’ll inevitably have to be some effort put into curating the data to make it computable. Of course, with everything that’s now in the Wolfram Language, the level of tools available for curation has become extremely high. But to do curation properly, there’s always some level of human effort—and some expert input—that’s required. And a key question in understanding the post-Wolfram-Data-Repository data publishing ecosystem is who is actually going to do this work.

In a first approximation, it could be the original producers of the data—or it could be professional or other “curation specialists”—or some combination. There are advantages and disadvantages to all of these possibilities. But I suspect that at least for things like research data it’ll be most efficient to start with the original producers of the data.

The situation now with data curation is a little similar to the historical situation with document production. Back when I was first doing science (yes, in the 1970s) people handwrote papers, then gave them to professional typists to type. Once typed, papers would be submitted to publishers, who would then get professional copyeditors to copyedit them, and typesetters to typeset them for printing. It was all quite time consuming and expensive. But over the course of the 1980s, authors began to learn to type their own papers on a computer—and then started just uploading them directly to servers, in effect putting them immediately in publishable form.

It’s not a perfect analogy, but in both data curation and document editing there are issues of structure and formatting—and then there are issues that require actual understanding of the content. (Sometimes there are also more global “policy” issues too.) And for producing computable data, as for producing documents, almost always the most efficient thing will be to start with authors “typing their own papers”—or in the case of data, putting their data into WDF themselves.

Of course, to do this requires learning at least a little about computable data, and about how to do curation. And to assist with this we’re working with various groups to develop materials and provide training about such things. Part of what has to be communicated is about mechanics: how to move data, convert formats, and so on. But part of it is also about principles—and about how to make the best judgement calls in setting up data that’s computable.

We’re planning to organize “curateathons” where people who know the Wolfram Language and have experience with WDF data curation can pair up with people who understand particular datasets—and hopefully quickly get all sorts of data that they may have accumulated over decades into computable form—and into the Wolfram Data Repository.

In the end I’m confident that a very wide range of people (not just techies, but also humanities people and so on) will be able to become proficient at data curation with the Wolfram Language. But I expect there’ll always be a certain mixture of “type it yourself” and “have someone type it for you” approaches to data curation. Some people will make their data computable themselves—or will have someone right there in their lab or whatever who does. And some people will instead rely on outside providers to do it.

Who will these providers be? There’ll be individuals or companies set up much like the ones who provide editing and publishing services today. And to support this we’re planning a “Certified Data Curator” program to help define consistent standards for people who will work with the originators of a wide range of different kinds of data putting it into computable form.

But in addition to individuals or specific “curation companies”, there are at least two other kinds of entities that have the potential to be major facilitators of making data computable.

The first is research libraries. The role of libraries at many universities is somewhat in flux these days. But something potentially very important for them to do is to provide a central place for organizing—and making computable—data from the university and beyond. And in many ways this is just a modern analog of traditional library activities like archiving and cataloging.

It might involve the library actually having a private cloud version of the Wolfram Data Repository—and it might involve the library having its own staff to do curation. Or it might just involve the library providing advice. But I’ve found there’s quite a bit of enthusiasm in the library community for this kind of direction (and it’s perhaps an interesting sign that at our company people involved in data curation have often originally been trained in library science).

In addition to libraries, another type of organization that should be involved in making data computable is publishing companies. Some might say that publishing companies have had it a bit easy in the last couple of decades. Back in the day, every paper they published involved all sorts of production work, taking it from manuscript to final typeset version. But for years now, authors have been delivering their papers in digital forms that publishers don’t have to do much work on.

With data, though, there’s again something for publishers to do, and again a place for them to potentially add great value. Authors can pretty much put raw data into public repositories for themselves. But what would make publishers visibly add value is for them to process (or “edit”) the data—putting in the work to make it computable. The investment and processes will be quite similar to what was involved on the text side in the past—it’s just that now instead of learning about phototypesetting systems, publishers should be learning about WDF and the Wolfram Language.

It’s worth saying that as of today all data that we accept into the Wolfram Data Repository is being made freely available. But we’re anticipating in the near future we’ll also incorporate a marketplace in which data can be bought and sold (and even potentially have meaningful DRM, at least if it’s restricted to being used in the Wolfram Language). It’ll also be possible to have a private cloud version of the Wolfram Data Repository—in which whatever organization that runs it can set up whatever rules it wants about contributions, subscriptions and access.

One feature of traditional paper publishing is the sense of permanence it provides: once even just a few hundred printed copies of a paper are on shelves in university libraries around the world, it’s reasonable to assume that the paper is going to be preserved forever. With digital material, preservation is more complicated.

If someone just deploys a data resource to their Wolfram Cloud account, then it can be available to the world—but only so long as the account is maintained. The Wolfram Data Repository, though, is intended to be something much more permanent. Once we’ve accepted a piece of data for the repository, our goal is to ensure that it’ll continue to be available, come what may. It’s an interesting question how best to achieve that, given all sorts of possible future scenarios in the world. But now that the Wolfram Data Repository is finally launched, we’re going to be working with several well-known organizations to make sure that its content is as securely maintained as possible.

Data-Backed Journals

The Wolfram Data Repository—and private versions of it—is basically a powerful, enabling technology for making data available in computable form. And sometimes all one wants to do is to make the data available.

But at least in academic publishing, the main point usually isn’t the data. There’s usually a “story to be told”—and the data is just backup for that story. Of course, having that data backing is really important—and potentially quite transformative. Because when one has the data, in computable form, it’s realistic for people to work with it themselves, reproducing or checking the research, and directly building on it themselves.

But, OK, how does the Wolfram Data Repository relate to traditional academic publishing? For our official Wolfram Data Repository we’re going to have definite standards for what we accept—and we’re going to concentrate on data that we think is of general interest or use. We have a whole process for checking the structure of data, and applying software quality assurance methods, as well as expert review, to it.

And, yes, each entry in the Wolfram Data Repository gets a DOI, just like a journal article. But for our official Wolfram Data Repository we’re focused on data—and not the story around it. We don’t see it as our role to check the methods by which the data was obtained, or to decide whether conclusions drawn from it are valid or not.

But given the Wolfram Data Repository, there are lots of new opportunities for data-backed academic journals that do in effect “tell stories”, but now have the infrastructure to back them up with data that can readily be used.

I’m looking forward, for example, to finally making the journal Complex Systems that I founded 30 years ago a true data-backed journal. And there are many existing journals where it makes sense to use versions of the Wolfram Data Repository (often in a private cloud) to deliver computable data associated with journal articles.

But what’s also interesting is that now that one can take computable data for granted, there’s a whole new generation of “Journal of Data-Backed ____” journals that become possible—that not only use data from the Wolfram Data Repository, but also actually present their results as Wolfram Notebooks that can immediately be rerun and extended (and can also, for example, contain interactive elements).

The Corporate Version

I’ve been talking about the Wolfram Data Repository in the context of things like academic journals. But it’s also important in corporate settings. Because it gives a very clean way to have data shared across an organization (or shared with customers, etc.).

Typically in a corporate setting one’s talking about private cloud versions. And of course these can have their own rules about how contributions work, and who can access what. And the data can not only be immediately used in Wolfram Notebooks, but also in automatically generated reports, or instant APIs.

It’s been interesting to see—during the time we’ve been testing the Wolfram Data Repository—just how many applications we’ve found for it within our own company.

There’s information that used to be on webpages, but is now in our private Wolfram Data Repository, and is now immediately usable for computation. There’s information that used to be in databases, and which required serious programming to access, but is now immediately accessible through the Wolfram Language. And there are all sorts of even quite small lists and so on that used to exist only in textual form, but are now computable data in our data repository.

It’s always been my goal to have a truly “computable company”—and putting in place our private Wolfram Data Repository is an important step in achieving this.

My Very Own Data

In addition to public and corporate uses, there are also great uses of Wolfram Data Repository technology for individuals—and particularly for individual researchers. In my own case, I’ve got huge amounts of data that I’ve collected or generated over the course of my life. I happen to be pretty organized at keeping things—but it’s still usually something of an adventure to remember enough to “bring back to life” data I haven’t dealt with in a decade or more. And in practice I make much less use of older data than I should—even though in many cases it took me immense effort to collect or generate the data in the first place.

But now it’s a different story. Because all I have to do is to upload data once and for all to the Wolfram Data Repository, and then it’s easy for me to get and use the data whenever I want to. Some data (like medical or financial records) I want just for myself, so I use a private cloud version of the Wolfram Data Repository. But other data I’ve been getting uploaded into the public Wolfram Data Repository.

Here’s an example. It comes from a page in my book A New Kind of Science:

Page 833 from A New Kind of Science

The page says that by searching about 8 trillion possible systems in the computational universe I found 199 that satisfy some particular criterion. And in the book I show examples of some of these. But where’s the data?

Well, because I’m fairly organized about such things, I can go into my file system, and find the actual Wolfram Notebook from 2001 that generated the picture in the book. And that leads me to a file that contains the raw data—which then takes a very short time to turn into a data resource for the Wolfram Data Repository:

Three-Color Cellular Automaton Rules that Double Their Input

We’ve been systematically mining data from my research going back into the 1980s—even from Mathematica Version 1 notebooks from 1988 (which, yes, still work today). Sometimes the experience is a little less inspiring. Like to find a list of people referenced in the index of A New Kind of Science, together with their countries and dates, the best approach seemed to be to scrape the online book website:

ResourceData["People Mentioned in Stephen Wolfram\[CloseCurlyQuote]s \ \[OpenCurlyDoubleQuote]A New Kind of Science\[CloseCurlyDoubleQuote]"]

And to get a list of the books I used while working on A New Kind of Science required going into an ancient FileMaker database. But now all the data—nicely merged with Open Library information deduced from ISBNs—is in a clean WDF form in the Wolfram Data Repository. So I can do such things as immediately make a word cloud of the titles of the books:

WordCloud[StringRiffle[Normal[ResourceData["Books in Stephen Wolfram's Library"][All, "Title"]]]]

What It Means

Many things have had to come together to make today’s launch of the Wolfram Data Repository possible. In the modern software world it’s easy to build something that takes blobs of data and puts them someplace in the cloud for people to access. But what’s vastly more difficult is to have the data actually be immediately useful—and making that possible is what’s required the whole development of our Wolfram Language and Wolfram Cloud technology stack, which are now the basis for the Wolfram Data Repository.

But now that the Wolfram Data Repository exists—and private versions of it can be set up—there are lots of new opportunities. For the research community, the most obvious is finally being able to do genuine data-backed publication, where one can routinely make underlying data from pieces of research available in a way that people can actually use. There are variants of this in education—making data easy to access and use for educational exercises and projects.

In the corporate world, it’s about making data conveniently available across an organization. And for individuals, it’s about maintaining data in such a way that it can be readily used for computation, and built on.

But in the end, I see the Wolfram Data Repository as a key enabling technology for defining how one can work with data in the future—and I’m excited that after all this time it’s finally now launched and available to everyone.


To comment, please visit the copy of this post at the Wolfram Blog »

Machine Learning for Middle Schoolers


(An Elementary Introduction to the Wolfram Language is available in print, as an ebook, and free on the web—as well as in Wolfram Programming Lab in the Wolfram Open Cloud. There’s also now a free online hands-on course based on it.)

An Elementary Introduction to the Wolfram Language

A year ago I published a book entitled An Elementary Introduction to the Wolfram Language—as part of my effort to teach computational thinking to the next generation. I just published the second edition of the book—with (among other things) a significantly extended section on modern machine learning.

I originally expected my book’s readers would be high schoolers and up. But it’s actually also found a significant audience among middle schoolers (11- to 14-year-olds). So the question now is: can one teach the core concepts of modern machine learning even to middle schoolers? Well, the interesting thing is that—thanks to the whole technology stack we’ve now got in the Wolfram Language—the answer seems to be “yes”!

Here’s what I did in the book:


After this main text, the book has Exercises, Q&A and Tech Notes.

Exercises, Q&A, Tech Notes

The Backstory

What was my thinking behind this machine learning section? Well, first, it has to fit into the flow of the book—using only concepts that have already been introduced, and, when possible, reinforcing them. So it can talk about images, and real-world data, and graphs, and text—but not functional programming or external data resources.

Chapter list

With modern machine learning, it’s easy to show “wow” examples—like our imageidentify.com website from 2015 (based on the Wolfram Language ImageIdentify function). But my goal in the book was also to communicate a bit of the background and intuition of how machine learning works, and where it can be used.

I start off by explaining that machine learning is different from traditional “programming”, because it’s based on learning from examples, rather than on explicitly specifying computational steps. The first thing I discuss is something that doesn’t really need all the fanciness of modern neural-net machine learning: it’s recognizing what languages text fragments are from:

LanguageIdentify[{"thank you", "merci", "dar las gracias", "感謝",    "благодарить"}]

Kids (and other people) can sort of imagine (or discuss in a classroom) how something like this might work—looking words up in dictionaries, etc. And I think it’s useful to give a first example that doesn’t seem like “pure magic”. (In reality, LanguageIdentify uses a combination of traditional lookup, and modern machine learning techniques.)

But then I give a much more “magic” example—of ImageIdentify:

ImageIdentify[]

I don’t immediately try to explain how it works, but instead go on to something different: sentiment analysis. Kids have lots of fun trying out sentiment analysis. But the real point here is that it shows the idea of making a “classifier”: there are an infinite number of possible inputs, but only (in this case) 3 possible outputs:

Classify["Sentiment", "I'm so excited to be programming"]

Having seen this, we’re ready to give a little more indication of how something like this works. And what I do is to show the function Classify classifying handwritten digits into 0s and 1s. I’m not saying what’s going on inside, but people can get the idea that Classify is given a bunch of examples, and then it’s using those to classify a particular input as being 0 or 1:

Classify[]

OK, but how does it do this? In reality one’s dealing with ideas about attractors—and inputs that lie in the basins of attraction for particular outputs. But in a first approximation, one can say that inputs that are “nearer to”, say, the 0 examples are taken to be 0s, and inputs that are nearer to the 1 examples are taken to be 1s.

People don’t usually have much difficulty with that explanation—unless they start to think too hard about what “nearest” might really mean in this context. But rather than concentrating on that, what I do in the book is just to talk about the case of numbers, where it’s really easy to see what “nearest” means:

Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22]

Nearest isn’t the most exciting function to play with: one potentially puts a lot of things in, and then just one “nearest thing” comes out. Still, Nearest is nice because its functionality is pretty easy to understand (and one can have reasonable guesses about algorithms it could use).

Having seen Nearest for numbers, I show Nearest for colors. In the book, I’ve already talked about how colors are represented by red-green-blue triples of numbers, so this isn’t such a stretch—but seeing Nearest operate on colors begins to make it a little more plausible that it could operate on things like images too.

Nearest[]
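The input in the book uses actual color swatches; as a rough stand-in for the same idea, one could try:

Nearest[{Red, Green, Blue, Orange, Purple, Yellow}, Pink]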

Next I show the case of words. In the book, I’ve already done quite a bit with strings and words. In the main text I don’t talk about the precise definition of “nearness” for words, but again, kids easily get the basic idea. (In a Tech Note, I do talk about EditDistance, another good algorithmic operation that people can think about and try out.)

Nearest[WordList[], "good", 10]

OK, so how does one get from here to something like ImageIdentify? The approach I used is to talk next about OCR and TextRecognize. This doesn’t seem as “magic” as ImageIdentify (and lots of people know about “OCR’ing documents”), but it’s a good place to get a further idea of what ImageIdentify is doing.

Turning a piece of text into an image, and then back into the same text again, doesn’t seem that impressive or useful. But it gets more interesting if one blurs the text out (and, yes, blurring an image is something I talked about earlier in the book):

Table[Blur[Rasterize[Style["hello", 20]], r], {r, 0, 4}]

Given the blurred image, the question is: can one still recognize the text? At this stage in the book I haven’t talked about /@ (Map) or % (last output) yet, so I have to write the code out a bit more verbosely. But the result is:

TextRecognize /@ %

And, yes, when the image isn’t too blurred, TextRecognize can recognize the text, but when the text gets too blurred, it stops being able to. I like this example, because it shows something impressive—but not “magic”—happening. And I think it’s useful to show both where machine learning-based functions succeed, and where they fail. By the way, the result here is different from the one in the book—because the text font is different, and those details matter when one’s on the edge of what can be recognized. (If one was doing this in a class, for example, one might try some different fonts and sizes, and discuss why some survive more blurring than others.)

TextRecognize shows how one can effectively do something like ImageIdentify, but with just 26 letterforms (well, actually, TextRecognize handles many more glyphs than that). But now in the book I show ImageIdentify again, blurring like we did with letters:

Table[Blur[, r], {r, 0, 22, 2}]

ImageIdentify /@ %

It’s fun to see what it does, but it’s also helpful. Because it gives a sense of the “attractor” around the “cheetah” concept: stay fairly close and the cheetah can still be recognized; go too far away and it can’t. (A slightly tricky issue is that we’re continually producing new, better neural nets for ImageIdentify—so even between when the book was finished and today there’ve been some new nets—and it so happens they give different results for the not-a-cheetah cases. Presumably the new results are “better”, though it’s not clear what that means, given that we don’t have an official right-answer “blurred cheetah” category, and who’s to say whether the blurriest image is more like a whortleberry or a person.)

I won’t go through my whole discussion of machine learning from the book here. Suffice it to say that after discussing explicitly trained functions like TextRecognize and ImageIdentify, I start discussing “unsupervised learning”, and things like clustering in feature space. I think our new FeatureSpacePlot is particularly helpful.

It’s fairly clear what it means to arrange colors:

FeatureSpacePlot[RandomColor[100]]

But then one can “do the same thing” with images of letters. (In the book the code is a little longer, because I haven’t talked about /@ yet.)

FeatureSpacePlot[Rasterize /@ Alphabet[]]

And what’s nice about this is that—as well as being useful in its own right—it also reinforces the idea of how something like TextRecognize might work by finding the “nearest letter” to whatever input it’s given.

My final example in the section uses photographs. FeatureSpacePlot does a nice job of separating images of different kinds of things—again giving an idea of how ImageIdentify might work:

FeatureSpacePlot[{}]

Obviously in just 10 pages in an elementary book I’m not able to give a complete exposition of modern machine learning. But I was pleased to see how many of the core concepts I was able to touch on.

Of course, the fact that this was possible at all depends critically on our whole Wolfram Language technology stack. Whether it’s the very fact that we have machine learning in the language, or the fact that we can seamlessly work with images or text or whatever, or the whole (28-year-old!) Wolfram Notebook system that lets us put all these pieces together—all these pieces are critical to making it possible to bring modern machine learning to people like middle schoolers.

And what I really like is that what one gets to do isn’t toy stuff: one can take what I’m discussing in the book, and immediately apply it in real-world situations. At some level the fact that this works is a reflection of the whole automation emphasis of the Wolfram Language: there’s very sophisticated stuff going on inside, but it’s automated at all levels, so one doesn’t need to be an expert and understand the details to be able to use it—or to get a good intuition about what can work and what can’t.

Going Further

OK, so how would one go further in teaching machine learning?

One early thing might be to start talking about probabilities. ImageIdentify has various possible choices of identifications, but what probabilities does it assign to them?
ImageIdentify[, All, 10, "Probability"]

This can lead to a useful discussion about prior probabilities, and about issues like trading off specificity for certainty.

But the big thing to talk about is training. (After all, “machine learning trainer” will surely be a big future career for some of today’s middle schoolers…) And the good news is that in the Wolfram Language environment, it’s possible to make training work with only a modest amount of data.

Let’s get some examples of images of characters from Guardians of the Galaxy by searching the web (we’re using an external search API, so you unfortunately can’t do exactly this on the Open Cloud):

data = AssociationMap[ WebImageSearch[#, "Thumbnails"] &, {"Star-Lord", "Gamora", "Groot", "Rocket Raccoon"}]

Now we can use these images as training material to create a classifier:

classifier=Classify[data]

And, sure enough, it can identify Rocket:

classifier[]

And, yes, it thinks a real raccoon is him too:

classifier[]

How does it do it? Well, let’s look at FeatureSpacePlot:

FeatureSpacePlot[Flatten[Values[data]]]

Some of this looks good—but some looks confusing. Because it’s arranging some of the images not according to who they’re of, but just according to their background colors. And here we begin to see some of the subtlety of machine learning. The actual classifier we built works only because in the training examples for each character there were ones with different backgrounds—so it can figure out that background isn’t the only distinguishing feature.

Actually, there’s another critical thing as well: Classify isn’t starting from scratch in classifying the images. Because it’s already been pre-trained to pick out “good features” that help distinguish real-world images. In fact, it’s actually using everything it learned from the creation of ImageIdentify—and the tens of millions of images it saw in connection with that—to know up front what features it should pay attention to.

It’s a bit weird to see, but internally Classify is characterizing each image as a list of numbers, each associated with a different “feature”:

FeatureExtract[{}]

One can do an extreme version of this in which one insists that each image is reduced to just two numbers—and that’s essentially how FeatureSpacePlot determines where to position an image:

DimensionReduce[{}]
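
To spell that out a little (a sketch, with images standing for the flattened list of training images from above), one can ask DimensionReduce for exactly 2 numbers per image, and then plot those pairs, which is essentially the layout FeatureSpacePlot produces:

images = Flatten[Values[data]];       (* the training images collected above *)
coords = DimensionReduce[images, 2];  (* reduce each image to just 2 numbers *)
ListPlot[coords]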

Under the Hood

OK, but what’s going on under the hood? Well, it’s complicated. But in the Wolfram Language it’s easy to see—and getting a look at it helps in terms of getting an intuition about how neural nets really work. So, for example, here’s the low-level Wolfram Language symbolic representation of the neural net that powers ImageIdentify:

net = NetModel["Wolfram ImageIdentify Net for WL 11.1"]

And there’s actually even more: just click and keep drilling down:

net = NetModel["Wolfram ImageIdentify Net for WL 11.1"]

And yes, this is hard to understand—certainly for middle schoolers, and even for professionals. But if we take this whole neural net object, and apply it to a picture of a tiger, it’ll do what ImageIdentify does, and tell us it’s a tiger:

net[]

But here’s a neat thing, made possible by a whole stack of functionality in the Wolfram Language: we can actually go “inside” the neural net, to get a sense of what’s happening. As an example, let’s just take the first 3 “layers” of the network, apply them to the tiger, and visualize what comes out:

Image /@ Take[net, 3][]

Basically what’s happening is that the network has made lots of copies of the original image, and then processed each of them to pick out a different aspect of the image. (What’s going on actually seems to be remarkably similar to the first few levels of visual processing in the brain.)

What if we go deeper into the network? Here’s what happens at layer 10. The images are more abstracted, and presumably pick out higher-level features:

Image /@ Take[Take[net, 10][],20]

Go to level 20, and the network is “thinking about” lots of little images:

ImageAssemble[Partition[Image /@ Take[net, 20][],30]]

But by level 28, it’s beginning to “come to some conclusions”, with only a few of its possible channels of activity “lighting up”:

ImageAdjust[ImageAssemble[Partition[Image /@ Take[net, 28][],50]]]

Finally, by level 31, all that’s left is an array of numbers, with a few peaks visible:

ListLinePlot[Take[net, 31][]]

And after applying the very last layer of the network (a “softmax” layer), only a couple of peaks are left:

ListLinePlot[net[,None], PlotRange -> All]

And the highest one is exactly the one that corresponds to the concept of “tiger”:

net[,"TopProbabilities"]

I’m not imagining that middle schoolers will follow all these details (and no, nobody should be learning neural net layer types like they learn parts of the water cycle). But I think it’s really useful to see “inside” ImageIdentify, and get even a rough sense of how it works. To someone like me it still seems a little like magic that it all comes together as it does. But what’s great is that now with our latest Wolfram Language tools one can easily look inside, and start getting an intuition about what’s going on.

The Process of Training

The idea of the Wolfram Language Classify function is to do machine learning at the highest possible level—as automatically as possible, and building on as much pre-training as possible. But if one wants to get a more complete feeling for what machine learning is like, it’s useful to see what happens if one instead tries to just train a neural net from scratch.

There is an immediate practical issue though: to get a neural net, starting from scratch, to actually do anything useful, one typically has to give it a very large amount of training data—which is hard to collect and wrangle. But the good news here is that with the recent release of the Wolfram Data Repository we have a growing collection of ready-to-use training sets immediately available for use in the Wolfram Language.

Like here’s the classic MNIST handwritten digit training set, with its 60,000 training examples:

ResourceData["MNIST"]

One thing one can do with a training set like this is just feed a random sample of it into Classify. And sure enough this gives one a classifier function that’s essentially a simple version of TextRecognize for handwritten digits:

c = Classify[RandomSample[ResourceData["MNIST"], 1000]]

And even with just 1000 training examples, it does pretty well:

c[{}]

And, yes, we can use FeatureSpacePlot to see how the different digits tend to separate in feature space:

FeatureSpacePlot[First /@ RandomSample[ResourceData["MNIST"], 250]]

But, OK, what if we want to actually train a neural net from scratch, with none of the fancy automation of Classify? Well, first we have to set up a raw neural net. And conveniently, the Wolfram Language has a bunch of classic neural nets built in. Here one’s called LeNet:

lenet = NetModel["LeNet"]

It’s much simpler than the ImageIdentify net, but it’s still pretty complicated. But we don’t have to understand what’s inside it to start training it. Instead, in the Wolfram Language, we can just use NetTrain (which, needless to say, automatically applies all the latest GPU tricks and so on):

net = NetTrain[lenet, RandomSample[ResourceData["MNIST"], 1000]]

It’s pretty neat to watch the training happening, and to see the orange line of the neural net’s error rate for fitting the examples keep going down. After about 20 seconds, NetTrain decides it’s gone far enough, and generates the final trained net—which works pretty well:

net[{}]

If you stop the training early, it won’t do quite so well:

net = NetTrain[lenet, RandomSample[ResourceData["MNIST"], 1000], MaxTrainingRounds -> 1]

net[{}]

In the professional world of machine learning, there’s a whole art and science of figuring out the best parameters for training. But with what we’ve got now in the Wolfram Language, nothing is stopping a middle schooler from doing their own experiments, visualizing and analyzing the results, and getting as good an intuition as anyone.
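
Here’s a sketch of the kind of experiment one might run (the names test and accuracy are just introduced here): train LeNet with different numbers of training rounds, then score each trained net by hand on a fresh random sample of examples (which might overlap a little with the training sample, but it’s good enough for a rough experiment):

test = RandomSample[ResourceData["MNIST"], 500];  (* a fresh sample to score against *)
accuracy[rounds_] := With[
  {trained = NetTrain[lenet, RandomSample[ResourceData["MNIST"], 1000], MaxTrainingRounds -> rounds]},
  N@Mean[Boole /@ Thread[trained[Keys[test]] == Values[test]]]];  (* fraction of correct labels *)
Table[{rounds, accuracy[rounds]}, {rounds, {1, 5, 20}}]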

What Are Neural Nets Made Of?

OK, so if we want to really get down to the lowest level, we have to talk about what neural nets are made of. I’m not sure how much of this is middle-school stuff—but as soon as one knows about graphs of functions, one can already explain quite a bit. Because, you see, the “layers” in a neural net are actually just functions that take numbers in and put numbers out.

Take layer 2 of LeNet. It’s essentially just a simple Ramp function, which we can immediately plot (and, yes, it looks like a ramp):

Plot[Ramp[x], {x, -1, 1}]

Neural nets don’t typically just deal with individual numbers, though. They deal with arrays (or “tensors”) of numbers—represented in the Wolfram Language as nested lists. And each layer takes an array of numbers in, and puts an array of numbers out. Here’s a typical single layer:

layer = NetInitialize[LinearLayer[4, "Input" -> 2]]

This particular layer is set up to take 2 numbers as input, and put 4 numbers out:

layer[{2, 3}]

It might seem to be doing something quite “random”, and actually it is. Because the actual function the layer is implementing is determined by yet another array of numbers, or “weights”—which NetInitialize here just sets randomly. Here’s what it set them to in this particular case:

NetExtract[layer, "Weights"]
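
And one can check directly (a quick sketch) that the layer really is computing nothing more than weights.input + biases:

w = NetExtract[layer, "Weights"];  (* w and b are names introduced here *)
b = NetExtract[layer, "Biases"];
w . {2, 3} + b   (* matches layer[{2, 3}] up to numerical precision *)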

Why is any of this useful? Well, the crucial point is that what NetTrain does is to progressively tweak the weights in each layer of a neural network to try to get the overall behavior of the net to match the training examples you gave.
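
Here’s a minimal sketch of that idea (this is the calculus notion behind training, not how NetTrain is actually implemented): for a “network” with a single weight w and a single training example 2 -> 3, one just repeatedly nudges w downhill along the derivative of the error:

loss[w_] := (2 w - 3)^2;               (* squared error on the one training example; names introduced here *)
step[w_, eta_] := w - eta*loss'[w];    (* move w a little way down the gradient *)
NestList[step[#, 0.1] &, 0., 6]        (* w converges toward 1.5, where 2 w reproduces 3 *)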

There are two immediate issues, though. First, the structure of the network has to be such that it’s possible to get the behavior you want by using some appropriate set of weights. And second, there has to be some way to progressively tweak weights so as to get to appropriate values.

Well, it turns out a single LinearLayer like the one above can’t do anything interesting. Here’s a contour plot of (the first element of) its output, as a function of its two inputs. And as the name LinearLayer might suggest, we always get something flat and linear out:

ContourPlot[First[layer[{x, y}]], {x, -1, 1}, {y, -1, 1}]

But here’s the big discovery that makes neural nets useful: if we chain together several layers, it’s easy to get something much more complicated. (And, yes, in the Wolfram Language outputs from one layer get knitted into inputs to the next layer in a nice, automatic way.) Here’s an example with 4 layers—two linear layers and two ramps:

net = NetInitialize[   NetChain[{LinearLayer[10], Ramp, LinearLayer[1], Ramp},     "Input" -> 2]]

And now when we plot the function, it’s more complicated:

ContourPlot[net[{x, y}], {x, -1, 1}, {y, -1, 1}]

We can actually also look at an even simpler case—of a neural net with 3 layers, and just one number as final output. (For technical reasons, it’s nice to still have 2 inputs, though we’ll always set one of those inputs to the constant value of 1.)

net = NetInitialize[   NetChain[{LinearLayer[3], Ramp, LinearLayer[1]}, "Input" -> 2]]

Here’s what this particular network does as a function of its input:

Plot[net[{x, 1}], {x, -2, 2}]

Inside the network, there’s an array of 3 numbers being generated—and it turns out that “3” causes there to be at most 3 (+1) distinct linear parts in the function. Increase the 3 to 100, and things can get more complicated:

net = NetInitialize[   NetChain[{LinearLayer[100], Ramp, LinearLayer[1]}, "Input" -> 2]]
Plot[net[{x, 1}], {x, -2, 2}]

Now, the point is that this is in a sense a “random function”, determined by the particular random weights picked by NetInitialize. If we run NetInitialize a bunch of times, we’ll get a bunch of different results:

Table[With[{net =      NetInitialize[      NetChain[{LinearLayer[100], Ramp, LinearLayer[1]},        "Input" -> 2]]}, Plot[net[{x, 1}], {x, -2, 2}]], 8]

But the big question is: can we find an instance of this “random function” that’s useful for whatever we’re trying to do? Or, more particularly, can we find a random function that reproduces particular training examples?

Let’s imagine that our training examples give the values of the function at the dots in this plot (by the way, the setup here is more like machine learning in the style of Predict than Classify):

ListLinePlot[Table[Mod[n^2, 5], {n, 15}], Mesh -> All]

Here’s an instance of our network again:

net = NetInitialize[   NetChain[{LinearLayer[100], Ramp, LinearLayer[1]}, "Input" -> 2]]

And here’s a plot of what it initially does over the range of the training examples (and, yes, it’s obviously completely wrong):

Plot[net[{n, 1}], {n, 1, 15}]

Well, let’s just try training our network on our training data using NetTrain:

net = NetTrain[net, data = Table[{n, 1} -> {Mod[n^2, 5]}, {n, 15}]]

After about 20 seconds of training on my computer, there’s some vague sign that we’re beginning to reproduce at least some aspects of the original training data. But it’s at best slow going—and it’s not clear what’s eventually going to happen.

Plot[net[{n, 1}], {n, 1, 15}]

It’s a frontier question in neural net research just what structure of net will work best in any particular case (yes, we’re working on this question). But here let’s just try a slightly more complicated network:

net = NetInitialize[   NetChain[{LinearLayer[100], Tanh, LinearLayer[10], Ramp,      LinearLayer[1]}, "Input" -> 2]]

Random instances of this network don’t give very different results from our last network (though the presence of that Tanh layer makes the functions a bit smoother):

Tanh layer

But now let’s do some training (data was defined above):

net = NetTrain[net, data]

And here’s the result—which is surprisingly decent:

Plot[net[{n, 1}], {n, 1, 15}]

In fact, if we compare it to our original training data we see that the training values lie right on the function that the neural net produced:

Show[Plot[net[{n, 1}], {n, 1, 15}],   ListPlot[Table[Mod[n^2, 5], {n, 1, 15}], PlotStyle -> Red]]

Here’s what happened during the training process. The neural net effectively “tried out” a bunch of different possibilities, finally settling on the result here:

Machine learning animation

In what sense is the result “correct”? Well, it fits the training examples, and that’s really all we can ask. Because that’s all the input we gave. How it “interpolates” between the training examples is really its own business.  We’d like it to learn to “generalize” from the data it’s given—but it can’t really deduce much about the whole distribution of the data from the few points it’s being given here, so the kind of smooth interpolation it’s doing is as good as anything.

Outside the range of the training values, the neural net does what seem to be fairly random things—but again, there’s no “right answer” so one can’t really fault it:

Plot[net[{n, 1}], {n, -5, 25}]

But the fact that, despite the arbitrariness and messiness of our original neural net, we were able to train it successfully at all is quite remarkable. Neural nets of pretty much the type we’re talking about here had actually been studied for more than 60 years—but until the modern “deep learning revolution” nobody knew that it was going to be practical to train them for real problems.

But now—particularly with everything we have now in the Wolfram Language—it’s easy for anyone to do this.

So Much to Explore

Modern machine learning is very new—so even many of the obvious experiments haven’t been tried yet. But with our whole Wolfram Language setup there’s a lot that even middle schoolers can do. For example (and I admit I’m curious about this as I write this post): one can ask just how much something like the tiny neural net we were studying can learn.

Here’s a plot of the lengths of the first 60 Roman numerals:

ListLinePlot[Table[StringLength[RomanNumeral[n]], {n, 60}]]

After a small amount of training, here’s what the network managed to reproduce:

NetTrain[net, Table[{n, 1} -> {StringLength[RomanNumeral[n]]}, {n, 60}]]
Plot[%[{n, 1}], {n, 1, 60}]

And one might think that maybe this is the best it’ll ever do. But I was curious if it could eventually do better—and so I just let it train for 2 minutes on my computer. And here’s the considerably better result that came out:

NetTrain[net, Table[{n, 1} -> {StringLength[RomanNumeral[n]]}, {n, 60}], MaxTrainingRounds -> Quantity[2, "Minutes"]]

Plot[%[{n, 1}], {n, 1, 60}]

I think I can see why this particular thing works the way it does. And seeing it suggests all sorts of new questions to pursue. But to me the most exciting point is the overarching one of just how wide open this territory is—and how easy it is now to explore it.

Yes, there are plenty of technical details—some fundamental, some superficial. But transcending all of these, there’s intuition to be developed. And that’s something that can perfectly well start with the middle schoolers…

A New Kind of Science: A 15-Year View

15th-thumb

Starting now, in celebration of its 15th anniversary, A New Kind of Science will be freely available in its entirety, with high-resolution images, on the web or for download.

A New Kind of Science

It’s now 15 years since I published my book A New Kind of Science—more than 25 since I started writing it, and more than 35 since I started working towards it. But with every passing year I feel I understand more about what the book is really about—and why it’s important. I wrote the book, as its title suggests, to contribute to the progress of science. But as the years have gone by, I’ve realized that the core of what’s in the book actually goes far beyond science—into many areas that will be increasingly important in defining our whole future.

So, viewed from a distance of 15 years, what is the book really about? At its core, it’s about something profoundly abstract: the theory of all possible theories, or the universe of all possible universes. But for me one of the achievements of the book is the realization that one can explore such fundamental things concretely—by doing actual experiments in the computational universe of possible programs. And in the end the book is full of what might at first seem like quite alien pictures made just by running very simple such programs.

Back in 1980, when I made my living as a theoretical physicist, if you’d asked me what I thought simple programs would do, I expect I would have said “not much”. I had been very interested in the kind of complexity one sees in nature, but I thought—like a typical reductionistic scientist—that the key to understanding it must lie in figuring out detailed features of the underlying component parts.

In retrospect I consider it incredibly lucky that all those years ago I happened to have the right interests and the right skills to actually try what is in a sense the most basic experiment in the computational universe: to systematically take a sequence of the simplest possible programs, and run them.

I could tell as soon as I did this that there were interesting things going on, but it took a couple more years before I began to really appreciate the force of what I’d seen. For me it all started with one picture:

Rule 30

Or, in modern form:

Rule 30, modern form

I call it rule 30. It’s my all-time favorite discovery, and today I carry it around everywhere on my business cards. What is it? It’s one of the simplest programs one can imagine. It operates on rows of black and white cells, starting from a single black cell, and then repeatedly applies the rules at the bottom. And the crucial point is that even though those rules are by any measure extremely simple, the pattern that emerges is not.
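
(If you want to make this picture yourself, a one-liner in the Wolfram Language will do it; here it runs the rule for 100 steps from a single black cell:)

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]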

It’s a crucial—and utterly unexpected—feature of the computational universe: that even among the very simplest programs, it’s easy to get immensely complex behavior. It took me a solid decade to understand just how broad this phenomenon is. It doesn’t just happen in programs (“cellular automata”) like rule 30. It basically shows up whenever you start enumerating possible rules or possible programs whose behavior isn’t obviously trivial.

Similar phenomena had actually been seen for centuries in things like the digits of pi and the distribution of primes—but they were basically just viewed as curiosities, and not as signs of something profoundly important. It’s been nearly 35 years since I first saw what happens in rule 30, and with every passing year I feel I come to understand more clearly and deeply what its significance is.

Four centuries ago it was the discovery of the moons of Jupiter and their regularities that sowed the seeds for modern exact science, and for the modern scientific approach to thinking. Could my little rule 30 now be the seed for another such intellectual revolution, and a new way of thinking about everything?

In some ways I might personally prefer not to take responsibility for shepherding such ideas (“paradigm shifts” are hard and thankless work). And certainly for years I have just quietly used such ideas to develop technology and my own thinking. But as computation and AI become increasingly central to our world, I think it’s important that the implications of what’s out there in the computational universe be more widely understood.

Implications of the Computational Universe

Here’s the way I see it today. From observing the moons of Jupiter we came away with the idea that—if looked at right—the universe is an ordered and regular place, that we can ultimately understand. But now, in exploring the computational universe, we quickly come upon things like rule 30 where even the simplest rules seem to lead to irreducibly complex behavior.

One of the big ideas of A New Kind of Science is what I call the Principle of Computational Equivalence. The first step is to think of every process—whether it’s happening with black and white squares, or in physics, or inside our brains—as a computation that somehow transforms input to output. What the Principle of Computational Equivalence says is that above an extremely low threshold, all processes correspond to computations of equivalent sophistication.

It might not be true. It might be that something like rule 30 corresponds to a fundamentally simpler computation than the fluid dynamics of a hurricane, or the processes in my brain as I write this. But what the Principle of Computational Equivalence says is that in fact all these things are computationally equivalent.

It’s a very important statement, with many deep implications. For one thing, it implies what I call computational irreducibility. If something like rule 30 is doing a computation just as sophisticated as our brains or our mathematics, then there’s no way we can “outrun” it: to figure out what it will do, we have to do an irreducible amount of computation, effectively tracing each of its steps.

The mathematical tradition in exact science has emphasized the idea of predicting the behavior of systems by doing things like solving mathematical equations. But what computational irreducibility implies is that out in the computational universe that often won’t work, and instead the only way forward is just to explicitly run a computation to simulate the behavior of the system.

A Shift in Looking at the World

One of the things I did in A New Kind of Science was to show how simple programs can serve as models for the essential features of all sorts of physical, biological and other systems. Back when the book appeared, some people were skeptical about this. And indeed at that time there was a 300-year unbroken tradition that serious models in science should be based on mathematical equations.

But in the past 15 years something remarkable has happened. Because now, when new models are created—whether of animal patterns or web browsing behavior—they are overwhelmingly more often based on programs than on mathematical equations.

Year by year, it’s been a slow, almost silent, process. But by this point, it’s a dramatic shift. Three centuries ago pure philosophical reasoning was supplanted by mathematical equations. Now in these few short years, equations have been largely supplanted by programs. For now, it’s mostly been something practical and pragmatic: the models work better, and are more useful.

But when it comes to understanding the foundations of what’s going on, one’s led not to things like mathematical theorems and calculus, but instead to ideas like the Principle of Computational Equivalence. Traditional mathematics-based ways of thinking have made concepts like force and momentum ubiquitous in the way we talk about the world. But now as we think in fundamentally computational terms we have to start talking in terms of concepts like undecidability and computational irreducibility.

Will some type of tumor always stop growing in some particular model? It might be undecidable. Is there a way to work out how a weather system will develop? It might be computationally irreducible.

These concepts are pretty important when it comes to understanding not only what can and cannot be modeled, but also what can and cannot be controlled in the world. Computational irreducibility in economics is going to limit what can be globally controlled. Computational irreducibility in biology is going to limit how generally effective therapies can be—and make highly personalized medicine a fundamental necessity.

And through ideas like the Principle of Computational Equivalence we can start to discuss just what it is that allows nature—seemingly so effortlessly—to generate so much that seems so complex to us. Or how even deterministic underlying rules can lead to computationally irreducible behavior that for all practical purposes can seem to show “free will”.

Cellular automata

Mining the Computational Universe

A central lesson of A New Kind of Science is that there’s a lot of incredible richness out there in the computational universe. And one reason that’s important is that it means that there’s a lot of incredible stuff out there for us to “mine” and harness for our purposes.

Want to automatically make an interesting custom piece of art? Just start looking at simple programs and automatically pick out one you like—as in our WolframTones music site from a decade ago. Want to find an optimal algorithm for something? Just search enough programs out there, and you’ll find one.

We’ve normally been used to creating things by building them up, step by step, with human effort—progressively creating architectural plans, or engineering drawings, or lines of code. But the discovery that there’s so much richness so easily accessible in the computational universe suggests a different approach: don’t try building anything; just define what you want, and then search for it in the computational universe.

Sometimes it’s really easy to find. Like let’s say you want to generate apparent randomness. Well, then just enumerate cellular automata (as I did in 1984), and very quickly you come upon rule 30—which turns out to be one of the very best known generators of apparent randomness (look down the center column of cell values, for example). In other situations you might have to search 100,000 cases (as I did in finding the simplest axiom system for logic, or the simplest universal Turing machine), or you might have to search millions or even trillions of cases. But in the past 25 years, we’ve had incredible success in just discovering algorithms out there in the computational universe—and we rely on many of them in implementing the Wolfram Language.
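
As a small sketch of what that kind of “mining” looks like in practice (the names evolution and centerColumn are just introduced here), one can pull out that center column of rule 30 and use it as a stream of apparently random bits:

evolution = CellularAutomaton[30, {{1}, 0}, 200];  (* 200 steps of rule 30 from a single black cell *)
centerColumn = evolution[[All, 201]];              (* the column below the initial cell *)
Take[centerColumn, 20]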

At some level it’s quite sobering. One finds some tiny program out in the computational universe. One can tell it does what one wants. But when one looks at what it’s doing, one doesn’t have any real idea how it works. Maybe one can analyze some part—and be struck by how “clever” it is. But there just isn’t a way for us to understand the whole thing; it’s not something familiar from our usual patterns of thinking.

Of course, we’ve often had similar experiences before—when we use things from nature. We may notice that some particular substance is a useful drug or a great chemical catalyst, but we may have no idea why. But in doing engineering and in most of our modern efforts to build technology, the great emphasis has instead been on constructing things whose design and operation we can readily understand.

In the past we might have thought that was enough. But what our explorations of the computational universe show is that it’s not: selecting only things whose operation we can readily understand misses most of the immense power and richness that’s out there in the computational universe.

A World of Discovered Technology

What will the world look like when more of what we have is mined from the computational universe? Today the environment we build for ourselves is dominated by things like simple shapes and repetitive processes. But the more we use what’s out there in the computational universe, the less regular things will look. Sometimes they may look a bit “organic”, or like what we see in nature (since after all, nature follows similar kinds of rules). But sometimes they may look quite random, until perhaps suddenly and incomprehensibly they achieve something we recognize.

For several millennia we as a civilization have been on a path to understand more about what happens in our world—whether by using science to decode nature, or by creating our own environment through technology. But to use more of the richness of the computational universe we must at least to some extent forsake this path.

In the past, we somehow counted on the idea that between our brains and the tools we could create we would always have fundamentally greater computational power than the things around us—and as a result we would always be able to “understand” them. But what the Principle of Computational Equivalence says is that this isn’t true: out in the computational universe there are lots of things just as powerful as our brains or the tools we build. And as soon as we start using those things, we lose the “edge” we thought we had.

Today we still imagine we can identify discrete “bugs” in programs. But most of what’s powerful out there in the computational universe is rife with computational irreducibility—so the only real way to see what it does is just to run it and watch what happens.

We ourselves, as biological systems, are a great example of computation happening at a molecular scale—and we are no doubt rife with computational irreducibility (which is, at some fundamental level, why medicine is hard). I suppose it’s a tradeoff: we could limit our technology to consist only of things whose operation we understand. But then we would miss all that richness that’s out there in the computational universe. And we wouldn’t even be able to match the achievements of our own biology in the technology we create.

Machine Learning and the Neural Net Renaissance

There’s a common pattern I’ve noticed with intellectual fields. They go for decades and perhaps centuries with only incremental growth, and then suddenly, usually as a result of a methodological advance, there’s a burst of “hypergrowth” for perhaps 5 years, in which important new results arrive almost every week.

I was fortunate enough that my own very first field—particle physics—was in its period of hypergrowth right when I was involved in the late 1970s. And for myself, the 1990s felt like a kind of personal period of hypergrowth for what became A New Kind of Science—and indeed that’s why I couldn’t pull myself away from it for more than a decade.

But today, the obvious field in hypergrowth is machine learning, or, more specifically, neural nets. It’s funny for me to see this. I actually worked on neural nets back in 1981, before I started on cellular automata, and several years before I found rule 30. But I never managed to get neural nets to do anything very interesting—and actually I found them too messy and complicated for the fundamental questions I was concerned with.

And so I “simplified them”—and wound up with cellular automata. (I was also inspired by things like the Ising model in statistical physics, etc.) At the outset, I thought I might have simplified too far, and that my little cellular automata would never do anything interesting. But then I found things like rule 30. And I’ve been trying to understand its implications ever since.

In building Mathematica and the Wolfram Language, I’d always kept track of neural nets, and occasionally we’d use them in some small way for some algorithm or another. But about 5 years ago I suddenly started hearing amazing things: that somehow the idea of training neural nets to do sophisticated things was actually working. At first I wasn’t sure. But then we started building neural net capabilities in the Wolfram Language, and finally two years ago we released our ImageIdentify.com website—and now we’ve got our whole symbolic neural net system. And, yes, I’m impressed. There are lots of tasks that had traditionally been viewed as the unique domain of humans, but which now we can routinely do by computer.

But what’s actually going on in a neural net? It’s not really to do with the brain; that was just the inspiration (though in reality the brain probably works more or less the same way). A neural net is really a sequence of functions that operate on arrays of numbers, with each function typically taking quite a few inputs from around the array. It’s not so different from a cellular automaton. Except that in a cellular automaton, one’s usually dealing with, say, just 0s and 1s, not arbitrary numbers like 0.735. And instead of taking inputs from all over the place, in a cellular automaton each step takes inputs only from a very well-defined local region.

Now, to be fair, it’s pretty common to study “convolutional neural nets”, in which the patterns of inputs are very regular, just like in a cellular automaton. And it’s becoming clear that having precise (say 32-bit) numbers isn’t critical to the operation of neural nets; one can probably make do with just a few bits.

But a big feature of neural nets is that we know how to make them “learn”. In particular, they have enough features from traditional mathematics (like involving continuous numbers) that techniques like calculus can be applied to provide strategies to make them incrementally change their parameters to “fit their behavior” to whatever training examples they’re given.

It’s far from obvious how much computational effort, or how many training examples, will be needed. But the breakthrough of about five years ago was the discovery that for many important practical problems, what’s available with modern GPUs and modern web-collected training sets can be enough.

Pretty much nobody ends up explicitly setting or “engineering” the parameters in a neural net. Instead, what happens is that they’re found automatically. But unlike with simple programs like cellular automata, where one’s typically enumerating all possibilities, in current neural nets there’s an incremental process, essentially based on calculus, that manages to progressively improve the net—a little like the way biological evolution progressively improves the “fitness” of an organism.

It’s plenty remarkable what comes out from training a neural net in this way, and it’s plenty difficult to understand how the neural net does what it does. But in some sense the neural net isn’t venturing too far across the computational universe: it’s always basically keeping the same basic computational structure, and just changing its behavior by changing parameters.

But to me the success of today’s neural nets is a spectacular endorsement of the power of the computational universe, and another validation of the ideas of A New Kind of Science. Because it shows that out in the computational universe, away from the constraints of explicitly building systems whose detailed behavior one can foresee, there are immediately all sorts of rich and useful things to be found.

NKS Meets Modern Machine Learning

Is there a way to bring the full power of the computational universe—and the ideas of A New Kind of Science—to the kinds of things one does with neural nets? I suspect so. And in fact, as the details become clear, I wouldn’t be surprised if exploration of the computational universe saw its own period of hypergrowth: a “mining boom” of perhaps unprecedented proportions.

In current work on neural nets, there’s a definite tradeoff one sees. The more what’s going on inside the neural net is like a simple mathematical function with essentially arithmetic parameters, the easier it is to use ideas from calculus to train the network. But the more what’s going on is like a discrete program, or like a computation whose whole structure can change, the more difficult it is to train the network.

It’s worth remembering, though, that the networks we’re routinely training now would have looked utterly impractical to train only a few years ago. It’s effectively just all those quadrillions of GPU operations that we can throw at the problem that makes training feasible. And I won’t be surprised if even quite pedestrian (say, local exhaustive search) techniques will fairly soon let one do significant training even in cases where no incremental numerical approach is possible. And perhaps it will even be possible to invent some major generalization of things like calculus that will operate in the full computational universe. (I have some suspicions, based on thinking about generalizing basic notions of geometry to cover things like cellular automaton rule spaces.)

What would this let one do? Likely it would let one find considerably simpler systems that could achieve particular computational goals. And maybe that would bring within reach some qualitatively new level of operations, perhaps beyond what we’re used to being possible with things like brains.

There’s a funny thing that’s going on with modeling these days. As neural nets become more successful, one begins to wonder: why bother to simulate what’s going on inside a system when one can just make a black-box model of its output using a neural net? Well, if we manage to get machine learning to reach deeper into the computational universe, we won’t have as much of this tradeoff any more—because we’ll be able to learn models of the mechanism as well as the output.

I’m pretty sure that bringing the full computational universe into the purview of machine learning will have spectacular consequences. But it’s worth realizing that computational universality—and the Principle of Computational Equivalence—mean this is less a matter of principle than of practice. Because they imply that even neural nets of the kinds we have now are universal, and are capable of emulating anything any other system can do. (In fact, this universality result was essentially what launched the whole modern idea of neural nets, back in 1943.)

And as a practical matter, the fact that current neural net primitives are being built into hardware and so on will make them a desirable foundation for actual technology systems, even if they’re far from optimal. But my guess is that there are tasks where for the foreseeable future access to the full computational universe will be necessary to make them even vaguely practical.

Finding AI

What will it take to make artificial intelligence? As a kid, I was very interested in figuring out how to make a computer know things, and be able to answer questions from what it knew. And when I studied neural nets in 1981, it was partly in the context of trying to understand how to build such a system. As it happens, I had just developed SMP, which was a forerunner of Mathematica (and ultimately the Wolfram Language)—and which was very much based on symbolic pattern matching (“if you see this, transform it to that”). At the time, though, I imagined that artificial intelligence was somehow a “higher level of computation”, and I didn’t know how to achieve it.

I returned to the problem every so often, and kept putting it off. But then when I was working on A New Kind of Science it struck me: if I’m to take the Principle of Computational Equivalence seriously, then there can’t be any fundamentally “higher level of computation”—so AI must be achievable just with the standard ideas of computation that I already know.

And it was this realization that got me started building Wolfram|Alpha. And, yes, what I found is that lots of those very “AI-oriented things”, like natural language understanding, could be done just with “ordinary computation”, without any magic new AI invention. Now, to be fair, part of what was happening was that we were using ideas and methods from A New Kind of Science: we weren’t just engineering everything; we were often searching the computational universe for rules and algorithms to use.

So what about “general AI”? Well, I think at this point that with the tools and understanding we have, we’re in a good position to automate essentially anything we can define. But definition is a more difficult and central issue than we might imagine.

The way I see things at this point is that there’s a lot of computation even near at hand in the computational universe. And it’s powerful computation. As powerful as anything that happens in our brains. But we don’t recognize it as “intelligence” unless it’s aligned with our human goals and purposes.

Ever since I was writing A New Kind of Science, I’ve been fond of quoting the aphorism “the weather has a mind of its own”. It sounds so animistic and pre-scientific. But what the Principle of Computational Equivalence says is that actually, according to the most modern science, it’s true: the fluid dynamics of the weather is the same in its computational sophistication as the electrical processes that go on in our brains.

But is it “intelligent”? When I talk to people about A New Kind of Science, and about AI, I’ll often get asked when I think we’ll achieve “consciousness” in a machine. Life, intelligence, consciousness: they are all concepts that we have a specific example of, here on Earth. But what are they in general? All life on Earth shares RNA and the structure of cell membranes. But surely that’s just because all life we know is part of one connected thread of history; it’s not that such details are fundamental to the very concept of life.

And so it is with intelligence. We have only one example we’re sure of: us humans. (We’re not even sure about animals.) But human intelligence as we experience it is deeply entangled with human civilization, human culture and ultimately also human physiology—even though none of these details are presumably relevant in the abstract definition of intelligence.

We might think about extraterrestrial intelligence. But what the Principle of Computational Equivalence implies is that actually there’s “alien intelligence” all around us. But somehow it’s just not quite aligned with human intelligence. We might look at rule 30, for example, and be able to see that it’s doing sophisticated computation, just like our brains. But somehow it just doesn’t seem to have any “point” to what it’s doing.

We imagine that in doing the things we humans do, we operate with certain goals or purposes. But rule 30, for example, just seems to be doing what it’s doing—just following some definite rule. In the end, though, one realizes we’re not so very different. After all, there are definite laws of nature that govern our brains. So anything we do is at some level just playing out those laws.

Any process can actually be described either in terms of mechanism (“the stone is moving according to Newton’s laws”), or in terms of goals (“the stone is moving so as to minimize potential energy”). The description in terms of mechanism is usually what’s most useful in connecting with science. But the description in terms of goals is usually what’s most useful in connecting with human intelligence.

And this is crucial in thinking about AI. We know we can have computational systems whose operations are as sophisticated as anything. But can we get them to do things that are aligned with human goals and purposes?

In a sense this is what I now view as the key problem of AI: it’s not about achieving underlying computational sophistication, but instead it’s about communicating what we want from this computation.

The Importance of Language

I’ve spent much of my life as a computer language designer—most importantly creating what is now the Wolfram Language. I’d always seen my role as a language designer being to imagine the possible computations people might want to do, then—like a reductionist scientist—trying to “drill down” to find good primitives from which all these computations could be built up. But somehow from A New Kind of Science, and from thinking about AI, I’ve come to think about it a little differently.

Now what I more see myself as doing is making a bridge between our patterns of human thinking, and what the computational universe is capable of. There are all sorts of amazing things that can in principle be done by computation. But what the language does is to provide a way for us humans to express what we want done, or want to achieve—and then to get this actually executed, as automatically as possible.

Language design has to start from what we know and are familiar with. In the Wolfram Language, we name the built-in primitives with English words, leveraging the meanings that those words have acquired. But the Wolfram Language is not like natural language. It’s something more structured, and more powerful. It’s based on the words and concepts that we’re familiar with through the shared corpus of human knowledge. But it gives us a way to build up arbitrarily sophisticated programs that in effect express arbitrarily complex goals.

Yes, the computational universe is capable of remarkable things. But they’re not necessarily things that we humans can describe or relate to. But in building the Wolfram Language my goal is to do the best I can in capturing everything we humans want—and being able to express it in executable computational terms.

When we look at the computational universe, it’s hard not to be struck by the limitations of what we know how to describe or think about. Modern neural nets provide an interesting example. For the ImageIdentify function of the Wolfram Language we’ve trained a neural net to identify thousands of kinds of things in the world. And to cater to our human purposes, what the network ultimately does is to describe what it sees in terms of concepts that we can name with words—tables, chairs, elephants, etc.

But internally what the network is doing is to identify a series of features of any object in the world. Is it green? Is it round? And so on. And what happens as the neural network is trained is that it identifies features it finds useful for distinguishing different kinds of things in the world. But the point is that almost none of these features are ones to which we happen to have assigned words in human language.

Out in the computational universe it’s possible to find what may be incredibly useful ways to describe things. But they’re alien to us humans. They’re not something we know how to express, based on the corpus of knowledge our civilization has developed.

Now of course new concepts are being added to the corpus of human knowledge all the time. Back a century ago, if someone saw a nested pattern they wouldn’t have any way to describe it. But now we’d just say “it’s a fractal”. But the problem is that in the computational universe there’s an infinite collection of “potentially useful concepts”—with which we can never hope to ultimately keep up.

The Analogy in Mathematics

When I wrote A New Kind of Science I viewed it in no small part as an effort to break away from the use of mathematics—at least as a foundation for science. But one of the things I realized is that the ideas in the book also have a lot of implications for pure mathematics itself.

What is mathematics? Well, it’s a study of certain abstract kinds of systems, based on things like numbers and geometry. In a sense it’s exploring a small corner of the computational universe of all possible abstract systems. But still, plenty has been done in mathematics: indeed, the 3 million or so published theorems of mathematics represent perhaps the largest single coherent intellectual structure that our species has built.

Ever since Euclid, people have at least notionally imagined that mathematics starts from certain axioms (say, a+b=b+a, a+0=a, etc.), then builds up derivations of theorems. Why is math hard? The answer is fundamentally rooted in the phenomenon of computational irreducibility—which here is manifest in the fact that there’s no general way to shortcut the series of steps needed to derive a theorem. In other words, it can be arbitrarily hard to get a result in mathematics. But worse than that—as Gödel’s Theorem showed—there can be mathematical statements where there just aren’t any finite ways to prove or disprove them from the axioms. And in such cases, the statements just have to be considered “undecidable”.

And in a sense what’s remarkable about math is that one can usefully do it at all. Because it could be that most mathematical results one cares about would be undecidable. So why doesn’t that happen?

Well, if one considers arbitrary abstract systems it happens a lot. Take a typical cellular automaton—or a Turing machine—and ask whether it’s true that the system, say, always settles down to periodic behavior regardless of its initial state. Even something as simple as that will often be undecidable.

So why doesn’t this happen in mathematics? Maybe there’s something special about the particular axioms used in mathematics. And certainly if one thinks they’re the ones that uniquely describe science and the world there might be a reason for that. But one of the whole points of the book is that actually there’s a whole computational universe of possible rules that can be useful for doing science and describing the world.

And in fact I don’t think there’s anything abstractly special about the particular axioms that have traditionally been used in mathematics: I think they’re just accidents of history.

What about the theorems that people investigate in mathematics? Again, I think there’s a strong historical character to them. For all but the most trivial areas of mathematics, there’s a whole sea of undecidability out there. But somehow mathematics picks the islands where theorems can actually be proved—often particularly priding itself on places close to the sea of undecidability where the proof can only be done with great effort.

I’ve been interested in the whole network of published theorems in mathematics (it’s a thing to curate, like wars in history, or properties of chemicals). And one of the things I’m curious about is whether there’s an inexorable sequence to the mathematics that’s done, or whether, in a sense, random parts are being picked.

And here, I think, there’s a considerable analogy to the kind of thing we were discussing before with language. What is a proof? Basically it’s a way of explaining to someone why something is true. I’ve made all sorts of automated proofs in which there are hundreds of steps, each perfectly verifiable by computer. But—like the innards of a neural net—what’s going on looks alien and not understandable by a human.

For a human to understand, there have to be familiar “conceptual waypoints”. It’s pretty much like with words in languages. If some particular part of a proof has a name (“Smith’s Theorem”), and has a known meaning, then it’s useful to us. But if it’s just a lump of undifferentiated computation, it won’t be meaningful to us.

In pretty much any axiom system, there’s an infinite set of possible theorems. But which ones are “interesting”? That’s really a human question. And basically it’s going to end up being ones with “stories”. In the book I show that for the simple case of basic logic, the theorems that have historically been considered interesting enough to be given names happen to be precisely the ones that are in some sense minimal.

But my guess is that for richer axiom systems pretty much anything that’s going to be considered “interesting” is going to have to be reached from things that are already considered interesting. It’s like building up words or concepts: you don’t get to introduce new ones unless you can directly relate them to existing ones.

In recent years I’ve wondered quite a bit about how inexorable or not progress is in a field like mathematics. Is there just one historical path that can be taken, say from arithmetic to algebra to the higher reaches of modern mathematics? Or are there an infinite diversity of possible paths, with completely different histories for mathematics?

The answer is going to depend on—in a sense—the “structure of metamathematical space”: just what is the network of true theorems that avoid the sea of undecidability? Maybe it’ll be different for different fields of mathematics, and some will be more “inexorable” (so it feels like the math is being “discovered”) than others (where it seems more like the math is arbitrary, and “invented”).

But to me one of the most interesting things is how close—when viewed in these kinds of terms—questions about the nature and character of mathematics end up being to questions about the nature and character of intelligence and AI. And it’s this kind of commonality that makes me realize just how powerful and general the ideas in A New Kind of Science actually are.

When Is There a Science?

There are some areas of science—like physics and astronomy—where the traditional mathematical approach has done quite well. But there are others—like biology, social science and linguistics—where it’s had a lot less to say. And one of the things I’ve long believed is that what’s needed to make progress in these areas is to generalize the kinds of models one’s using, to consider a broader range of what’s out there in the computational universe.

And indeed in the past 15 or so years there’s been increasing success in doing this. And there are lots of biological and social systems, for example, where models have now been constructed using simple programs.

But unlike with mathematical models which can potentially be “solved”, these computational models often show computational irreducibility, and are typically used by doing explicit simulations. This can be perfectly successful for making particular predictions, or for applying the models in technology. But a bit like for the automated proofs of mathematical theorems one might still ask, “is this really science?”.

Yes, one can simulate what a system does, but does one “understand” it? Well, the problem is that computational irreducibility implies that in some fundamental sense one can’t always “understand” things. There might be no useful “story” that can be told; there may be no “conceptual waypoints”—only lots of detailed computation.

Imagine that one’s trying to make a science of how the brain understands language—one of the big goals of linguistics. Well, perhaps we’ll get an adequate model of the precise rules which determine the firing of neurons or some other low-level representation of the brain. And then we look at the patterns generated in understanding some whole collection of sentences.

Well, what if those patterns look like the behavior of rule 30? Or, closer at hand, the innards of some recurrent neural network? Can we “tell a story” about what’s happening? To do so would basically require that we create some kind of higher-level symbolic representation: something where we effectively have words for core elements of what’s going on.

But computational irreducibility implies that there may ultimately be no way to create such a thing. Yes, it will always be possible to find patches of computational reducibility, where some things can be said. But there won’t be a complete story that can be told. And one might say there won’t be a useful reductionistic piece of science to be done. But that’s just one of the things that happens when one’s dealing with (as the title says) a new kind of science.

Controlling the AIs

People have gotten very worried about AI in recent years. They wonder what’s going to happen when AIs “get much smarter” than us humans. Well, the Principle of Computational Equivalence has one piece of good news: at some fundamental level, AIs will never be “smarter”—they’ll just be able to do computations that are ultimately equivalent to what our brains do, or, for that matter, what all sorts of simple programs do.

As a practical matter, of course, AIs will be able to process larger amounts of data more quickly than actual brains. And no doubt we’ll choose to have them run many aspects of the world for us—from medical devices to central banks to transportation systems, and much more.

So then it’s important to figure out how we’ll tell them what to do. As soon as we’re making serious use of what’s out there in the computational universe, we’re not going to be able to give a line-by-line description of what the AIs are going to do. Rather, we’re going to have to define goals for the AIs, then let them figure out how best to achieve those goals.

In a sense we’ve already been doing something like this for years in the Wolfram Language. There’s some high-level function that describes something you want to do (“lay out a graph”, “classify data”, etc.). Then it’s up to the language to automatically figure out the best way to do it.
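To make that concrete, here are two of the kinds of goal-level functions being described, applied to little made-up examples (just a minimal sketch, with invented toy data). In each case one says what one wants, and the method is chosen automatically:

Classify[{1.0 -> "small", 1.3 -> "small", 4.1 -> "big", 4.4 -> "big"}] (* classify data: no algorithm specified *)

GraphPlot[RandomGraph[{20, 40}]] (* lay out a graph: no layout specified *)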

And in the end the real challenge is to find a way to describe goals. Yes, you want to search for cellular automata that will make a “nice carpet pattern”, or a “good edge detector”. But what exactly do those things mean? What you need is a language that a human can use to say as precisely as possible what they mean.

It’s really the same problem as I’ve been talking about a lot here. One has to have a way for humans to be able to talk about things they care about. There’s infinite detail out there in the computational universe. But through our civilization and our shared cultural history we’ve come to identify certain concepts that are important to us. And when we describe our goals, it’s in terms of these concepts.

Three hundred years ago people like Leibniz were interested in finding a precise symbolic way to represent the content of human thoughts and human discourse. They were far too early. But now I think we’re finally in a position to actually make this work. In fact, we’ve already gotten a long way with the Wolfram Language in being able to describe real things in the world. And I’m hoping it’ll be possible to construct a fairly complete “symbolic discourse language” that lets us talk about the things we care about.

Right now we write legal contracts in “legalese” as a way to make them slightly more precise than ordinary natural language. But with a symbolic discourse language we’ll be able to write true “smart contracts” that describe in high-level terms what we want to have happen—and then machines will automatically be able to verify or execute the contract.

But what about the AIs? Well, we need to tell them what we generally want them to do. We need to have a contract with them. Or maybe we need to have a constitution for them. And it’ll be written in some kind of symbolic discourse language, that both allows us humans to express what we want, and is executable by the AIs.

There’s lots to say about what should be in an AI Constitution, and how the construction of such things might map onto the political and cultural landscape of the world. But one of the obvious questions is: can the constitution be simple, like Asimov’s Laws of Robotics?

And here what we know from A New Kind of Science tells us the answer: it can’t be. In a sense the constitution is an attempt to sculpt what can happen in the world and what can’t. But computational irreducibility says that there will be an unbounded collection of cases to consider.

For me it’s interesting to see how theoretical ideas like computational irreducibility end up impinging on these very practical—and central—societal issues. Yes, it all started with questions about things like the theory of all possible theories. But in the end it turns into issues that everyone in society is going to end up being concerned about.

There’s an Endless Frontier

Will we reach the end of science? Will we—or our AIs—eventually invent everything there is to be invented?

For mathematics, it’s easy to see that there’s an infinite number of possible theorems one can construct. For science, there’s an infinite number of possible detailed questions to ask. And there’s also an infinite array of possible inventions one can construct.

But the real question is: will there always be interesting new things out there?

Well, computational irreducibility says there will always be new things that need an irreducible amount of computational work to reach from what’s already there. So in a sense there’ll always be “surprises”, that aren’t immediately evident from what’s come before.

But will it just be like an endless array of different weirdly shaped rocks? Or will there be fundamental new features that appear, that we humans consider interesting?

It’s back to the very same issue we’ve encountered several times before: for us humans to find things “interesting” we have to have a conceptual framework that we can use to think about them. Yes, we can identify a “persistent structure” in a cellular automaton. Then maybe we can start talking about “collisions between structures”. But when we just see a whole mess of stuff going on, it’s not going to be “interesting” to us unless we have some higher-level symbolic way to talk about it.

In a sense, then, the rate of “interesting discovery” isn’t going to be limited by our ability to go out into the computational universe and find things. Instead, it’s going to be limited by our ability as humans to build a conceptual framework for what we’re finding.

It’s a bit like what happened in the whole development of what became A New Kind of Science. People had seen related phenomena for centuries if not millennia (distribution of primes, digits of pi, etc.). But without a conceptual framework they just didn’t seem “interesting”, and nothing was built around them. And indeed as I understand more about what’s out there in the computational universe—and even about things I saw long ago there—I gradually build up a conceptual framework that lets me go further.

By the way, it’s worth realizing that inventions work a little differently from discoveries. One can see something new happen in the computational universe, and that might be a discovery. But an invention is about figuring out how something can be achieved in the computational universe.

And—like in patent law—it isn’t really an invention if you just say “look, this does that”. You have to somehow understand a purpose that it’s achieving.

In the past, the focus of the process of invention has tended to be on actually getting something to work (“find the lightbulb filament that works”, etc.). But in the computational universe, the focus shifts to the question of what you want the invention to do. Because once you’ve described the goal, finding a way to achieve it is something that can be automated.

That’s not to say that it will always be easy. In fact, computational irreducibility implies that it can be arbitrarily difficult. Let’s say you know the precise rules by which some chemicals can interact. Can you find a chemical synthesis pathway that will let you get to some particular chemical structure? There may be a way, but computational irreducibility implies that there may be no way to know in advance how long the pathway will need to be. And if you haven’t found a pathway, you may never be sure whether that’s because there isn’t one, or just because you haven’t searched long enough yet.

The Fundamental Theory of Physics

If one thinks about reaching the edge of science, one cannot help but wonder about the fundamental theory of physics. Given everything we’ve seen in the computational universe, is it conceivable that our physical universe could just correspond to one of those programs out there in the computational universe?

Of course, we won’t really know until or unless we find it. But in the years since A New Kind of Science appeared, I’ve become ever more optimistic about the possibilities.

Needless to say, it would be a big change for physics. Today there are basically two major frameworks for thinking about fundamental physics: general relativity and quantum field theory. General relativity is a bit more than 100 years old; quantum field theory maybe 90. And both have achieved spectacular things. But neither has succeeded in delivering us a complete fundamental theory of physics. And if nothing else, I think after all this time, it’s worth trying something new.

But there’s another thing: from actually exploring the computational universe, we have a huge amount of new intuition about what’s possible, even in very simple models. We might have thought that the kind of richness we know exists in physics would require some very elaborate underlying model. But what’s become clear is that that kind of richness can perfectly well emerge even from a very simple underlying model.

What might the underlying model be like? I’m not going to discuss this in great detail here, but suffice it to say that I think the most important thing about the model is that it should have as little as possible built in. We shouldn’t have the hubris to think we know how the universe is constructed; we should just take a general type of model that’s as unstructured as possible, and do what we typically do in the computational universe: just search for a program that does what we want.

My favorite formulation for a model that’s as unstructured as possible is a network: just a collection of nodes with connections between them. It’s perfectly possible to formulate such a model as an algebraic-like structure, and probably many other kinds of things. But we can think of it as a network. And in the way I’ve imagined setting it up, it’s a network that’s somehow “underneath” space and time: every aspect of space and time as we know it must emerge from the actual behavior of the network.

Over the past decade or so there’s been increasing interest in things like loop quantum gravity and spin networks. They’re related to what I’ve been doing insofar as they also involve networks. And maybe there’s some deeper relationship. But in their usual formulation, they’re much more mathematically elaborate.

From the point of view of the traditional methods of physics, this might seem like a good idea. But with the intuition we have from studying the computational universe—and using it for science and technology—it seems completely unnecessary. Yes, we don’t yet know the fundamental theory of physics. But it seems sensible to start with the simplest hypothesis. And that’s definitely something like a simple network of the kind I’ve studied.

At the outset, it’ll look pretty alien to people (including myself) trained in traditional theoretical physics. But some of what emerges isn’t so alien. A big result I found nearly 20 years ago (that still hasn’t been widely understood) is that when you look at a large enough network of the kind I studied you can show that its averaged behavior follows Einstein’s equations for gravity. In other words, without putting any fancy physics into the underlying model, it ends up automatically emerging. I think it’s pretty exciting.

People ask a lot about quantum mechanics. Yes, my underlying model doesn’t build in quantum mechanics (just as it doesn’t build in general relativity). Now, it’s a little difficult to pin down exactly what the essence of “being quantum mechanical” actually is. But there are some very suggestive signs that my simple networks actually end up showing what amounts to quantum behavior—just like in the physics we know.

OK, so how should one set about actually finding the fundamental theory of physics if it’s out there in the computational universe of possible programs? Well, the obvious thing is to just start searching for it, starting with the simplest programs.

I’ve been doing this—more sporadically than I would like—for the past 15 years or so. And my main discovery so far is that it’s actually quite easy to find programs that aren’t obviously not our universe. There are plenty of programs where space or time are obviously completely different from the way they are in our universe, or there’s some other pathology. But it turns out it’s not so difficult to find candidate universes that aren’t obviously not our universe.

But we’re immediately bitten by computational irreducibility. We can simulate the candidate universe for billions of steps. But we don’t know what it’s going to do—and whether it’s going to grow up to be like our universe, or completely different.

It’s pretty unlikely that in looking at that tiny fragment of the very beginning of a universe we’re going to ever be able to see anything familiar, like a photon. And it’s not at all obvious that we’ll be able to construct any kind of descriptive theory, or effective physics. But in a sense the problem is bizarrely similar to the one we have even in systems like neural networks: there’s computation going on there, but can we identify “conceptual waypoints” from which we can build up a theory that we might understand?

It’s not at all clear our universe has to be understandable at that level, and it’s quite possible that for a very long time we’ll be left in the strange situation of thinking we might have “found our universe” out in the computational universe, but not being sure.

Of course, we might be lucky, and it might be possible to deduce an effective physics, and see that some little program that we found ends up reproducing our whole universe. It would be a remarkable moment for science. But it would immediately raise a host of new questions—like why this universe, and not another?

Box of a Trillion Souls

Right now we humans exist as biological systems. But in the future it’s certainly going to be technologically possible to reproduce all the processes in our brains in some purely digital—computational—form. So insofar as those processes represent “us”, we’re going to be able to be “virtualized” on pretty much any computational substrate. And in this case we might imagine that the whole future of a civilization could wind up in effect as a “box of a trillion souls”.

Inside that box there would be all kinds of computations going on, representing the thoughts and experiences of all those disembodied souls. Those computations would reflect the rich history of our civilization, and all the things that have happened to us. But at some level they wouldn’t be anything special.

It’s perhaps a bit disappointing, but the Principle of Computational Equivalence tells us that ultimately these computations will be no more sophisticated than the ones that go on in all sorts of other systems—even ones with simple rules, and no elaborate history of civilization. Yes, the details will reflect all that history. But in a sense without knowing what to look for—or what to care about—one won’t be able to tell that there’s anything special about it.

OK, but what about for the “souls” themselves? Will one be able to understand their behavior by seeing that they achieve certain purposes? Well, in our current biological existence, we have all sorts of constraints and features that give us goals and purposes. But in a virtualized “uploaded” form, most of these just go away.

I’ve thought quite a bit about how “human” purposes might evolve in such a situation, recognizing, of course, that in virtualized form there’s little difference between human and AI. The disappointing vision is that perhaps the future of our civilization consists in disembodied souls in effect “playing videogames” for the rest of eternity.

But what I’ve slowly realized is that it’s actually quite unrealistic to project our view of goals and purposes from our experience today into that future situation. Imagine talking to someone from a thousand years ago and trying to explain that people in the future would be walking on treadmills every day, or continually sending photographs to their friends. The point is that such activities don’t make sense until the cultural framework around them has developed.

It’s the same story yet again as with trying to characterize what’s interesting or what’s explainable. It relies on the development of a whole network of conceptual waypoints.

Can we imagine what the mathematics of 100 years from now will be like? It depends on concepts we don’t yet know. So similarly if we try to imagine human motivation in the future, it’s going to rely on concepts we don’t know. Our best description from today’s viewpoint might be that those disembodied souls are just “playing videogames”. But to them there might be a whole subtle motivation structure that they could only explain by rewinding all sorts of steps in history and cultural development.

By the way, if we know the fundamental theory of physics then in a sense we can make the virtualization complete, at least in principle: we can just run a simulation of the universe for those disembodied souls. Of course, if that’s what’s happening, then there’s no particular reason it has to be a simulation of our particular universe. It could as well be any universe from out in the computational universe.

Now, as I’ve mentioned, even in any given universe one will never in a sense run out of things to do, or discover. But I suppose I myself at least find it amusing to imagine that at some point those disembodied souls might get bored with just being in a simulated version of our physical universe—and might decide it’s more fun (whatever that means to them) to go out and explore the broader computational universe. Which would mean that in a sense the future of humanity would be an infinite voyage of discovery in the context of none other than A New Kind of Science!

The Economics of the Computational Universe

Long before we have to think about disembodied human souls, we’ll have to confront the issue of what humans should be doing in a world where more and more can be done automatically by AIs. Now in a sense this issue is nothing new: it’s just an extension of the long-running story of technology and automation. But somehow this time it feels different.

And I think the reason is in a sense just that there’s so much out there in the computational universe, that’s so easy to get to. Yes, we can build a machine that automates some particular task. We can even have a general-purpose computer that can be programmed to do a full range of different tasks. But even though these kinds of automation extend what we can do, it still feels like there’s effort that we have to put into them.

But the picture now is different—because in effect what we’re saying is that if we can just define the goal we want to achieve, then everything else will be automatic. All sorts of computation, and, yes, “thinking”, may have to be done, but the idea is that it’s just going to happen, without human effort.

At first, something seems wrong. How could we get all that benefit, without putting in more effort? It’s a bit like asking how nature could manage to make all the complexity it does—even though when we build artifacts, even with great effort, they end up far less complex. The answer, I think, is that nature is mining the computational universe. And it’s exactly the same thing for us: by mining the computational universe, we can achieve essentially an unbounded level of automation.

If we look at the important resources in today’s world, many of them still depend on actual materials. And often these materials are literally mined from the Earth. Of course, there are accidents of geography and geology that determine by whom and where that mining can be done. And in the end there’s a limit (if often very large) to the amount of material that’ll ever be available.

But when it comes to the computational universe, there’s in a sense an inexhaustible supply of material—and it’s accessible to anyone. Yes, there are technical issues about how to “do the mining”, and there’s a whole stack of technology associated with doing it well. But the ultimate resource of the computational universe is a global and infinite one. There’s no scarcity, and no reason for it to be “expensive”. One just has to understand that it’s there, and take advantage of it.

The Path to Computational Thinking

Probably the greatest intellectual shift of the past century has been the one towards the computational way of thinking about things. I’ve often said that if one picks almost any field “X”, from archaeology to zoology, then by now there either is, or soon will be, a field called “computational X”—and it’s going to be the future of the field.

I myself have been deeply involved in trying to enable such computational fields, in particular through the development of the Wolfram Language. But I’ve also been interested in what is essentially the meta problem: how should one teach abstract computational thinking, for example to kids? The Wolfram Language is certainly important as a practical tool. But what about the conceptual, theoretical foundations?

Well, that’s where A New Kind of Science comes in. Because at its core it’s discussing the pure abstract phenomenon of computation, independent of its applications to particular fields or tasks. It’s a bit like with elementary mathematics: there are things to teach and understand just to introduce the ideas of mathematical thinking, independent of their specific applications. And so it is too with the core of A New Kind of Science. There are things to learn about the computational universe that give intuition and introduce patterns of computational thinking—quite independent of detailed applications.

One can think of it as a kind of “pre-computer science”, or “pre-computational X”. Before one gets into discussing the specifics of particular computational processes, one can just study the simple but pure things one finds in the computational universe.

And, yes, even before kids learn to do arithmetic, it’s perfectly possible for them to fill out something like a cellular automaton coloring book—or to execute for themselves or on a computer a whole range of different simple programs. What does it teach? Well, it certainly teaches the idea that there can be definite rules or algorithms for things—and that if one follows them one can create useful and interesting results. And, yes, it helps that systems like cellular automata make obvious visual patterns, that for example one can even find in nature (say on mollusc shells).

As the world becomes more computational—and more things are done by AIs and by mining the computational universe—there’s going to be an extremely high value not only in understanding computational thinking, but also in having the kind of intuition that develops from exploring the computational universe and that is, in a sense, the foundation for A New Kind of Science.

What’s Left to Figure Out?

My goal over the decade that I spent writing A New Kind of Science was, as much as possible, to answer all the first round of “obvious questions” about the computational universe. And looking back 15 years later I think that worked out pretty well. Indeed, today, when I wonder about something to do with the computational universe, I find it’s incredibly likely that somewhere in the main text or notes of the book I already said something about it.

But one of the biggest things that’s changed over the past 15 years is that I’ve gradually begun to understand more of the implications of what the book describes. There are lots of specific ideas and discoveries in the book. But in the longer term I think what’s most significant is how they serve as foundations, both practical and conceptual, for a whole range of new things that one can now understand and explore.

But even in terms of the basic science of the computational universe, there are certainly specific results one would still like to get. For example, it would be great to get more evidence for or against the Principle of Computational Equivalence, and its domain of applicability.

Like most general principles in science, the whole epistemological status of the Principle of Computational Equivalence is somewhat complicated. Is it like a mathematical theorem that can be proved? Is it like a law of nature that might (or might not) be true about the universe? Or is it like a definition, say of the very concept of computation? Well, much like, say, the Second Law of Thermodynamics or Evolution by Natural Selection, it’s a combination of these.

But one thing that’s significant is that it’s possible to get concrete evidence for (or against) the Principle of Computational Equivalence. The principle says that even systems with very simple rules should be capable of arbitrarily sophisticated computation—so that in particular they should be able to act as universal computers.

And indeed one of the results of the book is that this is true for one of the simplest possible cellular automata (rule 110). Five years after the book was published I decided to put up a prize for evidence about another case: the simplest conceivably universal Turing machine. And I was very pleased that in just a few months the prize was won, the Turing machine was proved universal, and there was another piece of evidence for the Principle of Computational Equivalence.

There’s a lot to do in developing the applications of A New Kind of Science. There are models to be made of all sorts of systems. There’s technology to be found. Art to be created. There’s also a lot to do in understanding the implications.

But it’s important not to forget the pure investigation of the computational universe. In the analogy of mathematics, there are applications to be pursued. But there’s also a “pure mathematics” that’s worth pursuing in its own right. And so it is with the computational universe: there’s a huge amount to explore just at an abstract level. And indeed (as the title of the book implies) there’s enough to define a whole new kind of science: a pure science of the computational universe. And it’s the opening of that new kind of science that I think is the core achievement of A New Kind of Science—and the one of which I am most proud.


For the 10th anniversary of A New Kind of Science, I wrote three posts:

The complete high-resolution A New Kind of Science is now available on the web. There are also a limited number of print copies of the book still available (all individually coded!).

Oh My Gosh, It’s Covered in Rule 30s!


A British Train Station

A week ago a new train station, named “Cambridge North”, opened in Cambridge, UK. Normally such an event would be far outside my sphere of awareness. (I think I last took a train to Cambridge in 1975.) But last week people started sending me pictures of the new train station, wondering if I could identify the pattern on it:

Cambridge North train station

And, yes, it does indeed look a lot like patterns I’ve spent years studying—that come from simple programs in the computational universe. My first—and still favorite—examples of simple programs are one-dimensional cellular automata like this:

One-dimensional cellular automata

The system evolves line by line from the top, determining the color of each cell according to the rule underneath. This particular cellular automaton I call “rule 182”, because the bit pattern in the rule corresponds to the number 182 in binary. There are altogether 256 possible cellular automata like this, and this is what all of them do:

256 possible cellular automata
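For anyone who wants to reproduce pictures like these, here is a minimal sketch in the Wolfram Language: IntegerDigits gives the bit pattern that defines a rule, and CellularAutomaton runs the evolution from a single black cell:

IntegerDigits[182, 2, 8] (* the 8 bits of the rule: the outputs for the 8 possible neighborhoods *)

ArrayPlot[CellularAutomaton[182, {{1}, 0}, 30]] (* 30 steps of rule 182 from a single black cell *)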

Many of them show fairly simple behavior. But the huge surprise I got when I first ran all these cellular automata in the early 1980s is that even though all the rules are very simple to state, some of them generate very complex behavior. The first in the list that does that—and still my favorite example—is rule 30:

Rule 30

If one runs it for 400 steps one gets this:

After 400 steps

And, yes, it’s remarkable that starting from one black cell at the top, and just repeatedly following a simple rule, it’s possible to get all this complexity. I think it’s actually an example of a hugely important phenomenon, that’s central to how complexity gets made in nature, as well as to how we can get a new level of technology. And in fact, I think it’s important enough that I spent more than a decade writing a 1200-page book (that just celebrated its 15th anniversary) based on it.

And for years I’ve actually had rule 30 on my business cards:

Business cards

But back to the Cambridge North train station. Its pattern is obviously not completely random. But if it was made by a rule, what kind of rule? Could it be a cellular automaton?

I zoomed in on a photograph of the pattern:

Enlarged pattern

Suddenly, something seemed awfully familiar: the triangles, the stripes, the L shapes. Wait a minute… it couldn’t actually be my favorite rule of all time, rule 30?

Clearly the pattern is tipped 45° from how I’d usually display a cellular automaton. And there are black triangles in the photograph, not white ones like in rule 30. But if one black-white inverts the rule (so it’s now rule 135), one gets this:

Black-white inversion of the pattern
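As a small aside, the 135 isn’t arbitrary: swapping black and white in one of these rules corresponds to a simple transformation of its rule number, which one can compute directly:

255 - FromDigits[Reverse[IntegerDigits[30, 2, 8]], 2] (* reverse the 8-bit rule table and flip the outputs; this gives 135 *)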

And, yes, it’s the same kind of pattern as in the photograph! But if it’s rule 30 (or rule 135), what’s its initial condition? Rule 30 can actually be used as a cryptosystem—because it can be hard (maybe even NP-complete) to reconstruct its initial condition.

But, OK, if it’s my favorite rule, I wondered if maybe it’s also my favorite initial condition—a single black cell. And, yes, it is! The train station pattern comes exactly from the (inverted) right-hand edge of my favorite rule 30 pattern!

Edge of rule 30

Here’s the Wolfram Language code. First run the cellular automaton, then rotate the pattern:

Rotate[ArrayPlot[CellularAutomaton[135, {{1}, 0}, 40], Mesh -> True], -45 Degree]

It’s a little trickier to pull out precisely the section of the pattern that’s used. Here’s the code (the PlotRange is what determines the part of the pattern that’s shown):

Graphics[Rotate[First[ArrayPlot[CellularAutomaton[135, {{1}, 0}, 40], Mesh -> True]], -45 Degree], PlotRange -> {{83, 104}, {-12, 60}}]

OK, so where is this pattern actually used at the train station? Everywhere!

Pattern repeats everywhere

It’s made of perforated aluminum. You can actually look through it, reminiscent of an old latticed window. From inside, the pattern is left-right reversed—so if it’s rule 135 from outside, it’s rule 149 from inside. And at night, the pattern is black-white inverted, because there’s light coming from inside—so from the outside it’s “rule 135 by day, and rule 30 at night”.

What are some facts about the rule 30 pattern? It’s extremely hard to rigorously prove things about it (and that’s interesting in itself—and closely related to the fundamental phenomenon of computational irreducibility). But, for example—like, say, the digits of π—many aspects of it seem random. And, for instance, black and white squares appear to occur with equal frequency—meaning that at the train station the panels let in about 50% of the outside light.

If one looks at sequences of n cells, it seems that all 2^n configurations will occur on average with equal frequency. But not everything is random. And so, for example, if one looks at 3×2 blocks of cells, only 24 of the 32 possible ones ever occur. (Maybe some people waiting for trains will figure out which blocks are missing…)
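Both of these statements are easy to probe on a finite sample. Here is a minimal sketch: it computes the fraction of black cells down the center column, and then counts the distinct 3-cell-wide, 2-cell-deep blocks that occur in a patch well inside the pattern (sampling away from the edges of the growing region, so the boundary doesn’t contaminate the count):

data = CellularAutomaton[30, {{1}, 0}, 300];
center = 301; (* the column containing the initial black cell *)

N[Mean[data[[All, center]]]] (* fraction of black cells in the center column *)

inside = data[[150 ;; 300, center - 100 ;; center + 100]]; (* a patch well inside the pattern *)
Length[Union[Flatten[Partition[inside, {2, 3}, {1, 1}], 1]]] (* number of distinct 2-row, 3-column blocks that occur *)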

When we look at the pattern, our visual system particularly picks out the black triangles. And, yes, it seems as if triangles of any size can ultimately occur, albeit with frequency decreasing exponentially with size.

If one looks carefully at the right-hand edge of the rule 30 pattern, one can see that it repeats. However, the repetition period seems to increase exponentially as one goes in from the edge.
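Here is a sketch of one way to look at this numerically: pull out the cells along diagonals running parallel to the right-hand edge, and find the smallest shift under which the later part of each diagonal repeats (a finite sample can of course only suggest the period):

steps = 400;
data = CellularAutomaton[30, {{1}, 0}, steps];
center = steps + 1;
diagonal[d_] := Table[data[[s + 1, center + s - d]], {s, d, steps}] (* the diagonal d cells in from the right-hand edge *)
period[list_] := With[{tail = Drop[list, Floor[Length[list]/2]]}, SelectFirst[Range[Floor[Length[tail]/2]], Drop[tail, #] === Drop[tail, -#] &]] (* smallest shift under which the second half of the sequence repeats *)
Table[d -> period[diagonal[d]], {d, 0, 8}]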

At the train station, there are lots of identical panels. But rule 30 is actually an inexhaustible source of new patterns. So what would happen if one just continued the evolution, and rendered it on successive panels? Here’s the result. It’s a pity about the hint of periodicity on the right-hand edge, and the big triangle on panel 5 (which might be a safety problem at the train station).

Successive panels

Fifteen more steps in from the edge, there’s no hint of that anymore:

Fifteen more steps

What about other initial conditions? If the initial conditions repeat, then so will the pattern. But otherwise, so far as one can tell, the pattern will look essentially the same as with a single-cell initial condition.

One can try other rules too. Here are a few from the same simplest 256-rule set as rule 30:

Simple 256-rule set

Moving deeper from the edge the results look a little different (for aficionados, rule 89 is a transformed version of rule 45, rule 182 of rule 90, and rule 193 of rule 110):

Moving deeper from the edge

And starting from random initial conditions, rather than a single black cell, things again look different:

Starting from random initial conditions

And here are a few more rules, started from random initial conditions:

A few more rules
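For anyone who wants to reproduce pictures along these lines, here is a minimal one-liner (giving a plain list as the initial condition makes CellularAutomaton treat it as cyclic):

ArrayPlot[CellularAutomaton[89, RandomInteger[1, 400], 200]] (* rule 89 from a random initial row of 400 cells *)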

Here’s a website (made in a couple of minutes with a tiny piece of Wolfram Language code) that lets you experiment (including with larger rule numbers, based on longer-range rules). (And if you want to explore more systematically, here’s a Wolfram Notebook to try.)

Cellular automaton panel explorer
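As a rough sketch (not the page’s actual code), here is how one might deploy something similar using the Wolfram Language cloud functions; the field names and output format below are just illustrative choices:

CloudDeploy[
 FormFunction[{"rule" -> "Integer", "steps" -> "Integer"},
  Rotate[ArrayPlot[CellularAutomaton[#rule, {{1}, 0}, #steps], Mesh -> True], -45 Degree] &,
  "PNG"],
 Permissions -> "Public"] (* a public web form that runs a chosen rule and returns the tipped pattern *)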

It’s amazing what’s out there in the computational universe of possible programs. There’s an infinite range of possible patterns. But it’s cool that the Cambridge North train station uses my all-time favorite discovery in the computational universe—rule 30! And it looks great!

The Bigger Picture

There’s something curiously timeless about algorithmically generated forms. A dodecahedron from ancient Egypt still looks crisp and modern today. As do periodic tilings—or nested forms—even from centuries ago:

Periodic tilings and nested forms

But can one generate richer forms algorithmically? Before I discovered rule 30, I’d always assumed that any form generated from simple rules would somehow end up being obviously simple. But rule 30 was a big shock to my intuition—and from it I realized that in the computational universe of all possible rules, it’s actually very easy to get rich and complex behavior, even from simple underlying rules.

And what’s more, the patterns that are generated often have remarkable visual interest. Here are a few produced by cellular automata (now with 3 possible colors for each cell, rather than 2):

Three-color cellular automata

There’s an amazing diversity of forms. And, yes, they’re often complicated. But because they’re based on simple underlying rules, they always have a certain logic to them: in a sense each of them tells a definite “algorithmic story”.
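For anyone who wants to explore forms like these, here is a minimal sketch using 3-color totalistic rules; the particular rule numbers below are arbitrary choices to try, not necessarily the ones in the picture above:

Table[ArrayPlot[CellularAutomaton[{code, {3, 1}}, {{1}, 0}, 100], ColorRules -> {0 -> White, 1 -> LightGray, 2 -> Black}], {code, {177, 912, 1635, 2040}}] (* {code, {3, 1}} specifies a 3-color nearest-neighbor totalistic rule *)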

One thing that’s notable about forms we see in the computational universe is that they often look a lot like forms we see in nature. And I don’t think that’s a coincidence. Instead, I think what’s going on is that rules in the computational universe capture the essence of laws that govern lots of systems in nature—whether in physics, biology or wherever. And maybe there’s a certain familiarity or comfort associated with forms in the computational universe that comes from their similarity to forms we’re used to in nature.

But is what we get from the computational universe art? When we pick out something like rule 30 for a particular purpose, what we’re doing is conceptually a bit like photography: we’re not creating the underlying forms, but we are selecting the ones we choose to use.

In the computational universe, though, we can be more systematic. Given some aesthetic criterion, we can automatically search through perhaps even millions or billions of possible rules to find optimal ones: in a sense automatically “discovering art” in the computational universe.

We did an experiment on this for music back in 2007: WolframTones. And what’s remarkable is that even by sampling fairly small numbers of rules (cellular automata, as it happens), we’re able to produce all sorts of interesting short pieces of music—that often seem remarkably “creative” and “inventive”.

From a practical point of view, automatic discovery in the computational universe is important because it allows for mass customization. It makes it easy to be “original” (and “creative”)—and to find something different every time, or to fit constraints that have never been seen before (say, a pattern in a complicated geometric region).

The Cambridge North train station uses a particular rule from the computational universe to make what amounts to an ornamental pattern. But one can also use rules from the computational universe for other things in architecture. And one can even imagine a building in which everything—from overall massing down to details of moldings—is completely determined by something close to a single rule.

One might assume that such a building would somehow be minimalist and sterile. But the remarkable fact is that this doesn’t have to be true—and that instead there are plenty of rich, almost “organic” forms to be “mined” from the computational universe.

Ever since I started writing about one-dimensional cellular automata back in the early 1980s, there’s been all sorts of interesting art done with them. Lots of different rules have been used. Sometimes they’ve been what I called “class 4” rules that have a particularly organic look. But often it’s been other rules—and rule 30 has certainly made its share of appearances—whether it’s on floors, shirts, tea cosies, kinetic installations, or, recently, mass-customized scarves (with the knitting machine actually running the cellular automaton):

CA art

But today we’re celebrating a new and different manifestation of rule 30. Formed from permanent aluminum panels, in an ancient university town, a marvellous corner of the computational universe adorns one of the most practical of structures: a small train station. My compliments to the architects. May what they’ve made give generations of rail travelers a little glimpse of the wonders of the computational universe. And perhaps a few, echoing the last words attributed to the traveler in the movie 2001: A Space Odyssey, will exclaim “oh my gosh, it’s covered in rule 30s!”


(Thanks to Wolfram Summer School alum Alyssa Adams for sending us the photos of Cambridge North.)

The Practical Business of Ontology: A Tale from the Front Lines


The Philosophy of Chemicals

“We’ve just got to decide: is a chemical like a city or like a number?” I spent my day yesterday—as I have for much of the past 30 years—designing new features of the Wolfram Language. And yesterday afternoon one of my meetings was a fast-paced discussion about how to extend the chemistry capabilities of the language.

At some level the problem we were discussing was quintessentially practical. But as so often turns out to be the case for things we do, it ultimately involves some deep intellectual issues. And to actually get the right answer—and to successfully design language features that will stand the test of time—we needed to plumb those depths, and talk about things that usually wouldn’t be considered outside of some kind of philosophy seminar.

Thinker

Part of the issue, of course, is that we’re dealing with things that haven’t really ever come up before. Traditional computer languages don’t try to talk directly about things like chemicals; they just deal with abstract data. But in the Wolfram Language we’re trying to build in as much knowledge about everything as possible, and that means we have to deal with actual things in the world, like chemicals.

We’ve built a whole system in the Wolfram Language for handling what we call entities. An entity could be a city (like New York City), or a movie, or a planet—or a zillion other things. An entity has some kind of name (“New York City”). And it has definite properties (like population, land area, founding date, …).

We’ve long had a notion of chemical entities—like water, or ethanol, or tungsten carbide. Each of these chemical entities has properties, like molecular mass, or structure graph, or boiling point.

And we’ve got many hundreds of thousands of chemicals where we know lots of properties. But all of these are in a sense concrete chemicals: specific compounds that we could put in a test tube and do things with.

But what we were trying to figure out yesterday is how to handle abstract chemicals—chemicals that we just abstractly construct, say by giving an abstract graph representing their chemical structures. Should these be represented by entities, like water or New York City? Or should they be considered more abstract, like lists of numbers, or, for that matter, mathematical graphs?

Well, of course, among the abstract chemicals we can construct are chemicals that we already represent by entities, like sucrose or aspirin or whatever. But here there’s an immediate distinction to make. Are we talking about individual molecules of sucrose or aspirin? Or about these things as bulk materials?

At some level it’s a confusing distinction. Because, we might think, once we know the molecular structure, we know everything—it’s just a matter of calculating it out. And some properties—like molar mass—are basically trivial to calculate from the molecular structure. But others—like melting point—are very far from trivial.

OK, but is this just a temporary problem that one shouldn’t base a long-term language design on? Or is it something more fundamental that will never change? Well, conveniently enough, I happen to have done a bunch of basic science that essentially answers this: and, yes, it’s something fundamental. It’s connected to what I call computational irreducibility. And for example, the precise value of, say, the melting point for an infinite amount of some material may actually be fundamentally uncomputable. (It’s related to the undecidability of the tiling problem; fitting in tiles is like seeing how molecules will arrange to make a solid.)

So by knowing this piece of (rather leading-edge) basic science, we know that we can meaningfully make a distinction between bulk versions of chemicals and individual molecules. Clearly there’s a close relation between, say, water molecules, and bulk water. But there’s still something fundamentally and irreducibly different about them, and about the properties we can compute for them.

At Least the Atoms Should Be OK

Alright, so let’s talk about individual molecules. Obviously they’re made of atoms. And it seems like at least when we talk about atoms, we’re on fairly solid ground. It might be reasonable to say that any given molecule always has some definite collection of atoms in it—though maybe we’ll want to consider “parametrized molecules” when we talk about polymers and the like.

But at least it seems safe to consider types of atoms as entities. After all, each type of atom corresponds to a chemical element, and there are only a limited number of those on the periodic table. Now of course in principle one can imagine additional “chemical elements”; one could even think of a neutron star as being like a giant atomic nucleus. But again, there’s a reasonable distinction to be made: almost certainly there are only a limited number of fundamentally stable types of atoms—and most of the others have ridiculously short lifetimes.

There’s an immediate footnote, however. A “chemical element” isn’t quite as definite a thing as one might imagine. Because it’s always a mixture of different isotopes. And, say, from one tungsten mine to another, that mixture might change, giving a different effective atomic mass.

And actually this is a good reason to represent types of atoms by entities. Because then one just has to have a single entity representing tungsten that one can use in talking about molecules. And only if one wants to get properties of that type of atom that depend on qualifiers like which mine it’s from does one have to deal with such things.

In a few cases (think heavy water, for example), one will need to explicitly talk about isotopes in what is essentially a chemical context. But most of the time, it’s going to be enough just to specify a chemical element.

To specify a chemical element you just have to give its atomic number Z. And then textbooks will tell you that to specify a particular isotope you just have to say how many neutrons it contains. But that ignores the unexpected case of tantalum. Because, you see, one of the naturally occurring forms of tantalum (180mTa) is actually an excited state of the tantalum nucleus, which happens to be very stable. And to properly specify this, you have to give its excitation level as well as its neutron count.

In a sense, though, quantum mechanics saves one here. Because while there are an infinite number of possible excited states of a nucleus, quantum mechanics says that all of them can be characterized just by two discrete values: spin and parity.

Every isotope—and every excited state—is different, and has its own particular properties. But the world of possible isotopes is much more orderly than, say, the world of possible animals. Because quantum mechanics says that everything in the world of isotopes can be characterized just by a limited set of discrete quantum numbers.

We’ve gone from molecules to atoms to nuclei, so why not talk about particles too? Well, it’s a bigger can of worms. Yes, there are the well-known particles like electrons and protons that are pretty easy to talk about—and are readily represented by entities in the Wolfram Language. But then there’s a zoo of other particles. Some of them—just like nuclei—are pretty easy to characterize. You can basically say things like: “it’s a particular excited state of a charm-quark-anti-charm-quark system” or some such. But in particle physics one’s dealing with quantum field theory, not just quantum mechanics. And one can’t just “count elementary particles”; one also has to deal with the possibility of virtual particles and so on. And in the end the question of what kinds of particles can exist is a very complicated one—rife with computational irreducibility. (For example, what stable states there can be of the gluon field is a much more elaborate version of something like the tiling problem I mentioned in connection with melting points.)

Maybe one day we’ll have a complete theory of fundamental physics. And maybe it’ll even be simple. But exciting as that will be, it’s not going to help much here. Because computational irreducibility means that there’s essentially an irreducible distance between what’s underneath, and what phenomena emerge.

And in creating a language to describe the world, we need to talk in terms of things that can actually be observed and computed about. We need to pay attention to the basic physics—not least so we can avoid setups that will lead to confusion later. But we also need to pay attention to the actual history of science, and actual things that have been measured. Yes, there are, for example, an infinite number of possible isotopes. But for an awful lot of purposes it’s perfectly useful just to set up entities for ones that are known.

The Space of Possible Chemicals

But is it the same in chemistry? In nuclear physics, we think we know all the reasonably stable isotopes that exist—so any additional and exotic ones will be very short-lived, and therefore probably not important in practical nuclear processes. But it’s a different story in chemistry. There are tens of millions of chemicals that people have studied (and, for example, put into papers or patents). And there’s really no limit on the molecules that one might want to consider, and that might be useful.

But, OK, so how can we refer to all these potential molecules? Well, in a first approximation we can specify their chemical structures, by giving graphs in which every node is an atom, and every edge is a bond.

What really is a “bond”? While it’s incredibly useful in practical chemistry, it’s at some level a mushy concept—some kind of semiclassical approximation to a full quantum mechanical story. There are some standard extra bits: double bonds, ionization states, etc. But in practice chemistry is very successfully done just by characterizing molecular structures by appropriately labeled graphs of atoms and bonds.
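To make the representation concrete, here is a minimal sketch of the kind of labeled graph being described, for a methane molecule (the particular vertex-naming scheme is just for illustration, not the actual design under discussion):

Graph[{UndirectedEdge["C", "H1"], UndirectedEdge["C", "H2"], UndirectedEdge["C", "H3"], UndirectedEdge["C", "H4"]}, VertexLabels -> "Name"] (* atoms as vertices, single bonds as edges *)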

OK, but should chemicals be represented by entities, or by abstract graphs? Well, if it’s a chemical one’s already heard of, like carbon dioxide, an entity seems convenient. But what if it’s a new chemical that’s never been discussed before? Well, one could think about inventing a new entity to represent it.

Any self-respecting entity, though, better have a name. So what would the name be? Well, in the Wolfram Language, it could just be the graph that represents the structure. But maybe one wants something that seems more like an ordinary textual name—a string. Well, there’s always the IUPAC way of naming chemicals with names like 1,1′-{[3-(dimethylamino)propyl]imino}bis-2-propanol. Or there’s the more computer-friendly SMILES version: CC(CN(CCCN(C)C)CC(C)O)O. And whatever underlying graph one has, one can always generate one of these strings to represent it.

There’s an immediate problem, though: the string isn’t unique. In fact, however one chooses to write down the graph, it can’t always be unique. A particular chemical structure corresponds to a particular graph. But there can be many ways to draw the graph—and many different representations for it. And in fact even the (“graph isomorphism”) problem of determining whether two representations correspond to the same graph can be difficult to solve.
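There is a built-in predicate for checking this, and a tiny example makes the point that two quite different-looking write-ups can still be the same abstract graph:

g1 = Graph[{UndirectedEdge[1, 2], UndirectedEdge[2, 3], UndirectedEdge[3, 1], UndirectedEdge[3, 4]}];
g2 = Graph[{UndirectedEdge["a", "d"], UndirectedEdge["d", "c"], UndirectedEdge["c", "a"], UndirectedEdge["d", "b"]}];
IsomorphicGraphQ[g1, g2] (* True: the same structure, written down two different ways *)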

What Is a Chemical in the End?

OK, so let’s imagine we represent a chemical structure by a graph. At first, it’s an abstract thing. There are atoms as nodes in the graph, but we don’t know how they’d be arranged in an actual molecule (and e.g. how many angstroms apart they’d be). Of course, the answer isn’t completely well defined. Are we talking about the lowest-energy configuration of the molecule? (What if there are multiple configurations of the same energy?) Is the molecule supposed to be on its own, or in water, or whatever? How was the molecule supposed to have been made? (Maybe it’s a protein that folded a particular way when it came off the ribosome.)

Well, if we just had an entity representing, say, “naturally occurring hemoglobin”, maybe we’d be better off. Because in a sense that entity could encapsulate all these details.

But if we want to talk about chemicals that have never actually been synthesized it’s a bit of a different story. And it feels as if we’d be better off just with an abstract representation of any possible chemical.

But let’s talk about some other cases, and analogies. Maybe we should just treat everything as an entity. Like every integer could be an entity. Yes, there are an infinite number of them. But at least it’s clear what names they should be given. With real numbers, things are already messier. For example, there’s no longer the same kind of uniqueness as with integers: 0.99999… is really the same as 1.00000…, but it’s written differently.

What about sequences of integers, or, for that matter, mathematical formulas? Well, every possible sequence or every possible formula could conceivably be a different entity. But this wouldn’t be particularly useful, because much of what one wants to do with sequences or formulas is to go inside them, and transform their structure. But what’s convenient about entities is that they’re each just “single things” that one doesn’t have to “go inside”.

So what’s the story with “abstract chemicals”? It’s going to be a mixture. But certainly one’s going to want to “go inside” and transform the structure. Which argues for representing the chemical by a graph.

But then there’s potentially a nasty discontinuity. We’ve got the entity of carbon dioxide, which we already know lots of properties about. And then we’ve got this graph that abstractly represents the carbon dioxide molecule.

We might worry that this would be confusing both to humans and programs. But the first thing to realize is that we can distinguish what these two things are representing. The entity represents the bulk naturally occurring version of the chemical—whose properties have potentially been measured. The graph represents an abstract theoretical chemical, whose properties would have to be computed.

But obviously there’s got to be a bridge. Given a concrete chemical entity, one of the properties will be the graph that represents the structure of the molecule. And given a graph, one will need some kind of ChemicalIdentify function, that—a bit like GeoIdentify or maybe ImageIdentify—tries to identify from the graph what chemical entity (if any) has a molecular structure that corresponds to that graph.

Philosophy Meets Chemistry Meets Math Meets Physics…

As I write out some of the issues, I realize how complicated all this may seem. And, yes, it is complicated. But in our meeting yesterday, it all went very quickly. Of course it helps that everyone there had seen similar issues before: this is the kind of thing that’s all over the foundations of what we do. But each case is different.

And somehow this case got a bit deeper and more philosophical than usual. “Let’s talk about naming stars”, someone said. Obviously there are nearby stars that we have explicit names for. And some other stars may have been identified in large-scale sky surveys, and given identifiers of some kind. But there are lots of stars in distant galaxies that will never have been named. So how should we represent them?

That led to talking about cities. Yes, there are definite, chartered cities that have officially been assigned names–and we probably have essentially all of these right now in the Wolfram Language, updated regularly. But what about some village that’s created for a single season by some nomadic people? How should we represent it? Well, it has a certain location, at least for a while. But is it even a definite single thing, or might it, say, devolve into two villages, or not a village at all?

One can argue almost endlessly about identity—and even existence—for many of these things. But ultimately it’s not the philosophy of such things that we’re interested in: we’re trying to build software that people will find useful. And so what matters in the end is what’s going to be useful.

Now of course that’s not a precise thing to know. But it’s like for language design in general: think of everything people might want to do, then see how to set up primitives that will let people do those things. Does one want some chemicals represented by entities? Yes, that’s useful. Does one want a way to represent arbitrary chemical structures by graphs? Yes, that’s useful.

But to see what to actually do, one has to understand quite deeply what’s really being represented in each case, and how everything is related. And that’s where the philosophy has to meet the chemistry, and the math, and the physics, and so on.

I’m happy to say that by the end of our hour-long meeting yesterday (informed by about 40 years of relevant experience I’ve had, and collectively 100+ years from people in the meeting), I think we’d come up with the essence of a really nice way to handle chemicals and chemical structures. It’s going to be a while before it’s all fully worked out and implemented in the Wolfram Language. But the ideas are going to help inform the way we compute and reason about chemistry for many years to come. And for me, figuring out things like this is an extremely satisfying way to spend my time. And I’m just glad that in my long-running effort to advance the Wolfram Language I get to do so much of it.


High-School Summer Camp: A Two-Week Path to Computational Thinking


The Summer Camp Was a Success!

How far can one get in teaching computational thinking to high-school students in two weeks? Judging by the results of this year’s Wolfram High-School Summer Camp the answer is: remarkably far.

I’ve been increasingly realizing what an immense and unique opportunity there now is to teach computational thinking with the whole stack of technology we’ve built up around the Wolfram Language. But it was a thrill to see just how well this seems to actually work with real high-school students—and to see the kinds of projects they managed to complete in only two weeks.

Wolfram Summer Camp 2017

We’ve been doing our high-school summer camp for 5 years now (as well as our 3-week Summer School for more experienced students for 15 years). And every time we do the camp, we figure out a little more. And I think that by now we really have it down—and we’re able to take even students who’ve never really been exposed to computation before, and by the end of the camp have them doing serious computational thinking—and fluently implementing their ideas by writing sometimes surprisingly sophisticated Wolfram Language code (as well as creating well-written notebooks and “computational essays” that communicate about what they’ve done).

Over the coming year, we’re going to be dramatically expanding our Computational Thinking Initiative, and working to bring analogs of the Summer Camp experience to as many students as possible. But the Summer Camp provides fascinating and important data about what’s possible.

The Setup for the Camp

So how did the Summer Camp actually work? We had a lot of applicants for the 40 slots we had available this year. Some had been pointed to the camp by parents, teachers, or previous attendees. But a large fraction had just seen mention of it in the Wolfram|Alpha sidebar. There were students from a range of kinds of schools around the US, and overseas (though we still have to figure out how to get more applicants from underserved populations). Our team had done interviews to pick the final students—and I thought the ones they’d selected were terrific.

Students at the Wolfram Summer Camp

The students’ past experience was quite diverse. Some were already accomplished programmers (almost always self-taught). Others had done a CS class or two. But quite a few had never really done anything computational before—even though they were often quite advanced in various STEM areas such as math. But almost regardless of background, it was striking to me how new the core concepts of computational thinking seemed to be to so many of the students.

How does one take an idea or a question about almost anything, and find a way to formulate it for a computer? To be fair, it’s only quite recently, with all the knowledge and automation that we’ve been able to build into the Wolfram Language, that it’s become realistic for kids to do these kinds of things for real. So it’s not terribly surprising that in their schools or elsewhere our students hadn’t really been exposed to such things before. But it’s now possible—and that means there’s a great new opportunity to seriously teach computational thinking to kids, and to position them to pursue the amazing range of directions that computational thinking is opening up.

It’s important, by the way, to distinguish between “computational thinking” and straight “coding”. Computational thinking is about formulating things in computational terms. Coding is about the actual mechanics of telling a computer what to do. One of our great goals with the Wolfram Language is to automate the process of coding as much as possible so people can concentrate on pure computational thinking. When one’s using lower-level languages, like C++ and Java, there’s no choice but to be involved with the detailed mechanics of coding. But with the Wolfram Language the exciting thing is that it’s possible to teach pure high-level computational thinking, without being forced to deal with the low-level mechanics of coding.

What does this mean in practice? I think it’s very empowering for students: as soon as they “get” a concept, they can immediately apply it, and do real things with it. And it was pretty neat at the Summer Camp to see how easily even students who’d never written programs before were able to express surprisingly sophisticated computational ideas in the Wolfram Language. Sometimes it seemed like students who’d learned a low-level language before were actually at a disadvantage. Though for me it was interesting a few times to witness the “aha” moment when a student realized that they didn’t have to break down their computations into tiny steps the way they’d been taught—and that they could turn some big blob of code they’d written into one simple line that they could immediately understand and extend.

Suggesting Projects

The Summer Camp program involves several hours each day of lectures and workshops aimed at bringing students up to speed with computational thinking and how to express it in the Wolfram Language. But the real core of the program is every student doing an individual, original, computational thinking project.

And, yes, this is a difficult thing to orchestrate. But over the years we’ve been doing our Summer School and Summer Camp we’ve developed a very successful way of setting this up. There are a bunch of pieces to it, and the details depend on the level of the students. But here let’s talk about high-school students, and this year’s Summer Camp.

Right before the camp we (well, actually, I) came up with a list of about 70 potential projects. Some are quite specific, some are quite open-ended, and some are more like “metaprojects” (e.g. pick a dataset in the Wolfram Data Repository and analyze it). Some are projects that could at least in some form already have been done quite a few years ago. But many projects have only just become possible—this year particularly as a result of all our recent advances in machine learning.

summer camp list

 

I tried to have a range of nominal difficulty levels for the projects. I say “nominal” because even a project that can in principle be done in an easy way can also always be done in a more elaborate and sophisticated way. I wanted to have projects that ranged from the extremely well defined and precise (implement a precise algorithm of this particular type), to ones that involved wrangling data or machine learning training, to ones that were basically free-form and where the student got to define the objective.

Many of the projects in this list might seem challenging for high-school students. But my calculation (which in fact worked out well) was that with the technology we now have, all of them are within range.

It’s perhaps interesting to compare the projects with what I suggested for this year’s Summer School. The Summer School caters to more experienced students—typically at the college, graduate school or postdoc level. And so I was able to suggest projects that require deeper mathematical or software engineering knowledge—or are just bigger, with a higher threshold to achieve a reasonable level of success.

summer school list

 

Matching Projects to Students

Before students start picking projects, it’s important that they understand what a finished project should look like, and what’s involved in doing it. So at the very beginning of the camp, the instructors went through projects from previous camps, and discussed what the “output” of a project should be. Maybe it’ll be an active website; maybe an interactive Demonstration; maybe it’ll be a research paper. It’s got to be possible to make a notebook that describes the project and its results, and to make a post about it for Wolfram Community.

After talking about the general idea of projects, and giving examples of previous ones, the instructors did a quick survey of this year’s suggestions list, filling in a few details of what the imagined projects actually were. After this, the students were asked to pick their top three projects from our list, and then invent two more potential projects of their own.

It’s always an interesting challenge to find the right project for each student—and it’s something I’ve personally been involved in at our Summer Camp for the past several years. (And, yes, it helps that I have decades of experience in organizing professional and research projects and figuring out the best people to do them.)

It’s taken us a few iterations, but here’s the approach we’ve found works well. First, we randomly break the students up into groups of a dozen or so. Then we meet with each group, going around the room and asking each student a little about themselves, their interests and goals—and their list of projects.

After we’re finished with each group, we meet separately and try to come up with a project for each student. Sometimes it’ll be one of the projects straight from our list. Sometimes it’ll be a project that the student themself suggested. And sometimes it’ll be some creative combination of these, or even something completely different based on what they said they were interested in.

After we think we’ve come up with a good project, the next step is to meet individually with each student and actually suggest it to them. It’s very satisfying that a lot of the time the students seem really enthused about the projects we end up suggesting. But sometimes it becomes clear that a project just isn’t a good fit—and then sometimes we modify it in real time, but more often we circle back later with a different suggestion.

Once the projects are set, we assign an appropriate mentor to each student, taking into account both the student and the subject of the project. And then things are off and running. We have various checkpoints, like that students have to write up descriptions of their projects and post them on the internal Summer Camp site.

I personally wasn’t involved in the actual execution of the projects (though I did have a chance to check in on a few of them). So it was pretty interesting for me to see at the end of the camp what had actually happened. It’s worth mentioning that our scheme is that mentors can make suggestions about projects, but all the final code in a project should be created by the student. And if one version of the project ends up being too difficult, it’s up to the mentor to simplify it. So however the final project comes out, it really is the student’s work.

Much of the time, the Summer Camp will be the first time students have ever done an original project. It could potentially seem daunting. But I think the fact that we give so many examples of other projects, and that everyone else at the camp is also doing a project, really helps. And in the end experiencing the whole process of going from the idea for a project to a real, finished project is incredibly educational—and seems to have a big effect on many of our students.

A Few Case Studies

OK, so that’s the theory. So what actually happened at this year’s Summer Camp? Well, here are all the projects the students did, with the titles they gave them:

final projects list

 

It’s a very interesting, impressive, and diverse list. But let me pick out a few semi-randomly to discuss in a bit more detail. Consider these as “case studies” for what high-school students can accomplish with the Wolfram Language in a couple of weeks at a summer camp.

Routing Airplanes around Mountains

One young man at our camp had quite a lot of math background, and told me he was interested in airplanes and flying, and had designed his own remote-control plane. I started thinking about all sorts of drone survey projects. But he didn’t have a drone with him—and we had to come up with a project that could actually be done in a couple of weeks. So I ended up suggesting the following: given two points on Earth, find how an airplane can get from one to the other by the shortest path that never needs to go above a certain altitude. (And, yes, a small-scale version of this would be relevant to things like drone surveying too.)

Here’s how the student did this project. First, he realized that one could think of possible flight paths as edges on a graph whose nodes are laid out on a grid on the Earth. Then he used the built-in GeoElevationData to delete nodes that couldn’t be visited because the elevation at that point was above the cutoff. Then he just used FindShortestPath to find the shortest path in the graph from the start to the end.

I thought this was a pretty clever solution. It was a nice piece of computational thinking to realize that the elements of paths could be thought of as edges on a graph with nodes removed. Needless to say, there were some additional details to get a really good result. First, the student added in diagonal connections on the grid, with appropriate weightings to still get the correct shortest path computation. And then he refined the path by successively merging line segments to better approximate a great-circle path, at each step using computational geometry to check that the path wouldn’t go through a “too-high” region.
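His result is shown below. Just to illustrate the basic idea (and not his actual code), here's a minimal sketch, with hypothetical endpoints, grid resolution, distance threshold and altitude cutoff, of building the pruned grid graph and asking for a shortest path:

start = {39.74, -104.99}; finish = {36.17, -115.14};   (* hypothetical endpoints *)
cutoff = 3000;   (* made-up altitude cutoff, in meters *)
n = 15;   (* grid resolution *)
grid = Flatten[Table[{lat, lon},
    {lat, Min[start[[1]], finish[[1]]], Max[start[[1]], finish[[1]]], Abs[start[[1]] - finish[[1]]]/n},
    {lon, Min[start[[2]], finish[[2]]], Max[start[[2]], finish[[2]]], Abs[start[[2]] - finish[[2]]]/n}], 1];
(* keep only grid points whose elevation is below the cutoff *)
ok = Select[grid, QuantityMagnitude[GeoElevationData[GeoPosition[#]], "Meters"] < cutoff &];
(* connect nearby surviving points, weighting edges by geo distance *)
pairs = Select[Subsets[ok, {2}], GeoDistance[GeoPosition[#[[1]]], GeoPosition[#[[2]]]] < Quantity[60, "Kilometers"] &];
g = Graph[ok, UndirectedEdge @@@ pairs,
   EdgeWeight -> (QuantityMagnitude[GeoDistance[GeoPosition[#[[1]]], GeoPosition[#[[2]]]]] & /@ pairs)];
path = FindShortestPath[g, First[Nearest[ok, start]], First[Nearest[ok, finish]]];
GeoListPlot[GeoPosition /@ path, Joined -> True]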

Finding the shortest flight path

Finding Kiwi Calls

You never know what people are going to come to Summer Camp with. A young man from New Zealand came to our camp with some overnight audio recordings from outside his house featuring occasional periods of (quite strange-sounding) squawking that were apparently the calls of one or more kiwi birds. What the young man wanted to do was automatic “kiwi voice recognition”, finding the calls, and perhaps distinguishing different birds.

I said I thought this wouldn’t be a particularly easy project, but he should try it anyway. Looking at what happened, it’s clear the project started out well. It was easy to pull out all intervals in his audio that weren’t just silence. But that broke up everything, including kiwi calls, into very small blocks. He solved that by the following interesting piece of code, that uses pattern matching to combine symbolic audio objects:

Wolfram Language code for kiwi code project

At this point it might just have worked to use unsupervised machine learning and FeatureSpacePlot to distinguish kiwi from non-kiwi sound clips. But machine learning is still quite a hit-or-miss business—and in this case it wasn’t a hit. So what did the student do? Well, he built himself a tiny lightweight user interface in a notebook, then started manually classifying sound clips. (Various instructors commented that it was fortunate he brought headphones…)

After classifying 200 clips, he used Classify to automatically classify all the other clips. He did a variety of transformations to the data—applying signal processing, generating a spectrogram, etc. And in the end he got his kiwi classifier to 82% accuracy: enough to make a reasonable first pass on finding kiwi calls—and going down a path to computational ornithology.
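As a rough sketch of that pipeline (with a hypothetical file name, a guessed silence threshold, and hypothetical hand-labeled lists; not the student's actual code), one might do something like the following, noting that recent versions of Classify can take Audio clips directly, or one can supply spectrogram-based features explicitly:

audio = Import["kiwi-recording.wav"];   (* hypothetical overnight recording *)
intervals = AudioIntervals[audio, #RMSAmplitude > 0.01 &];   (* non-silence; the threshold is a guess *)
clips = AudioTrim[audio, #] & /@ intervals;
(* kiwiClips and otherClips stand for a couple of hundred hand-labeled clips *)
classifier = Classify[<|"kiwi" -> kiwiClips, "other" -> otherClips|>];
classifier[First[clips], "Probabilities"]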

Biomechanics of Walking and Running

One young woman said she’d recently gotten a stress fracture in her foot that she was told was related to the force she was putting on it while running. She asked if she could make a computational model of what was going on. I have to say I was pessimistic about being able to do that in two weeks—and I suggested instead a project that I thought would be more manageable, involving studying possible gaits (walk, trot, etc.) for creatures with different numbers of legs. But I encouraged her to spend a little time seeing if she could do her original project—and I suggested that if she got to the stage of actually modeling bones, she could use our built-in anatomical data.

The next I knew it was a day before the end of the Summer Camp, and I was looking at what had happened with the projects… and I was really impressed! She’d found a paper with an appropriate model, understood it, and implemented it, and now she had an interactive demonstration of the force on a foot during walking or running. She’d even used the anatomical data to show a 3D image of what was happening.

She explained that when one walks there are two peaks in the force, but when one runs, there’s only one. And when I set her interactive demonstration for my own daily walking regimen I found out that (as she said was typical) I put a maximum force of about twice my weight on my foot when I walk.

Biomechanics of Walking and Running

Banana Ripeness Classifier

At first I couldn’t tell if he was really serious… but one young man insisted he wanted to use machine learning to tell when a piece of fruit is ripe. As it happens, I had used pretty much this exact example in a blog post some time ago discussing the use of machine learning in smart contracts. So I said, “sure, why don’t you try it”. I saw the student a few times during the Summer Camp, curiously always carrying a banana. And what I discovered at the end of the camp was that that very banana was a key element of his project.

He started by searching the web for images of bananas described as “underripe”, “overripe”, etc., and arranging them using FeatureSpacePlot:

Banana ripeness

Then he realized that he could get more quantitative by first looking at where in color space the pixels of the banana image lay. The result was that he was actually able to define a “banana ripeness scale”, where, as he described it: “A value of one was assigned to bananas that were on the brink of spoilage. A value of zero was assigned to a green banana fresh off a tree. A value of 0.5 was assigned to the ‘perfect’ banana.” It’s a nice example of how something everyday and qualitative can be made computational.
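As a rough illustration of the color-space idea (with a stand-in image and made-up calibration numbers; not the student's code):

img = Import["banana.jpg"];   (* hypothetical photo of a banana against a plain background *)
pixels = Flatten[ImageData[ColorConvert[img, "HSB"]], 1];
meanHue = Mean[pixels[[All, 1]]];
(* made-up calibration: greener hues map toward 0, yellow-brown hues toward 1 *)
ripeness = Clip[Rescale[meanHue, {0.35, 0.1}], {0, 1}]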

For his project, the student made a “Banana Classifier” app that he deployed through the Wolfram Cloud. And he even had an actual banana to test it on!

Banana classifier

Number Internationalization

One of my suggested projects was to implement “international or historical numeral systems”—the analogs of things like Roman numerals but for different cultures and times. One young woman fluent in Korean said she’d like to do this project, starting with Korean.

As it happens, our built-in IntegerName function converts to traditional Korean numerals. So she set herself the task of converting from Korean numerals. It’s an interesting algorithmic exercise, and she solved it with some nice, elegant code.
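Her conversion is shown below. Just to give the flavor of the algorithm, here's a minimal sketch for basic Sino-Korean numerals (not her code, and ignoring many cases she handled): digits accumulate, small units (tens, hundreds, thousands) multiply the pending digit, and big units (ten-thousands and up) close off a whole section.

digits = <|"일" -> 1, "이" -> 2, "삼" -> 3, "사" -> 4, "오" -> 5, "육" -> 6, "칠" -> 7, "팔" -> 8, "구" -> 9|>;
smallUnits = <|"십" -> 10, "백" -> 100, "천" -> 1000|>;
bigUnits = <|"만" -> 10^4, "억" -> 10^8|>;
fromKorean[s_String] := Module[{total = 0, section = 0, digit = 0},
  Do[Which[
     KeyExistsQ[digits, c], digit = digits[c],
     KeyExistsQ[smallUnits, c], section += If[digit == 0, 1, digit] smallUnits[c]; digit = 0,
     KeyExistsQ[bigUnits, c], total += (section + digit) bigUnits[c]; section = 0; digit = 0],
    {c, Characters[s]}];
  total + section + digit]
fromKorean["삼천오백이십일"]   (* 3521 *)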

Korean to Hindu-Arabic numerals

By that point, she was on a roll… so she decided to go on to Burmese, and Thai. She tried to figure out Burmese from web sources… only to discover they were inconsistent… with the result that she ended up contacting a person who had an educational video about Burmese numerals, and eventually unscrambled the issue, wrote code to represent it, and then corrected the Wikipedia page about Burmese numerals. All in all, a great example of real-world algorithm curation. Oh, and she set up the conversions as a Wolfram Language microsite on the web.

Is That a Joke?

Can machine learning tell if something is funny? One young man at the Summer Camp wanted to find out. So for his project he used our Reddit API connection to pull jokes from the Jokes subreddit, and (presumably) non-jokes from the AskReddit subreddit. It took a bit of cleanup and data wrangling… but then he was able to feed his training data straight into the Classify function, and generated a classifier from which he then built a website.
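As a sketch of that pipeline (assuming jokes and nonjokes are hypothetical lists of post texts already pulled from the two subreddits, for example via the Reddit service connection; not the student's actual code):

(* jokes and nonjokes are hypothetical lists of post texts *)
isJoke = Classify[<|True -> jokes, False -> nonjokes|>];
isJoke["Why did the chicken cross the road? To get to the other side."]
(* a form along the lines of the student's website could then be deployed with: *)
CloudDeploy[FormFunction[{"text" -> "String"}, isJoke[#text] &, "HTML"], Permissions -> "Public"]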

It’s a little hard to know how well it works outside of “Reddit-style humor”—but his anecdotal study at the Summer Camp suggested about a 90% success rate.

Is this a joke machine

Making and Checking Checksums

Different projects involve different kinds of challenges. Sometimes the biggest challenge is just to define the project precisely enough. Other times it’s to get—or clean—the data that’s needed. Still other times, it’s to find a way to interpret voluminous output. And yet other times, it’s to see just how elegantly some particular idea can be implemented.

One math-oriented young woman at the camp picked “implementing checksum algorithms” from my list. Such algorithms (used for social security numbers, credit card numbers, etc.) are very nicely and precisely defined. But how simply and elegantly can they be implemented in the Wolfram Language? It’s a good computational thinking exercise—that requires really understanding both the algorithms and the language. And for me it’s nice to be able to immediately read off from the young woman’s code just how these checksum algorithms work…
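Her implementations appear below. As a sketch of just one such algorithm (the Luhn check used for credit-card numbers; not her code): double every second digit from the right, add up the digits of all the results, and require the total to be divisible by 10.

luhnValidQ[n_Integer] := Divisible[
   Total[Total[IntegerDigits[#]] & /@
     MapIndexed[If[EvenQ[First[#2]], 2 #1, #1] &, Reverse[IntegerDigits[n]]]], 10]
luhnValidQ[79927398713]   (* True for this standard test number *)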

Checksums algorithm

4D Plotting on a Tesseract

How should one plot a function in 4D? I had a project in my list about this, though I have to admit I hadn’t really figured out how it should be done. But, fortunately, a young man at the Summer Camp was keen to try to work on it. And with an interesting mixture of computational and mathematical thinking, he created ParametricPlot4D—then did a bunch of math to figure out how to render the results in what seemed like two useful ways: as an orthogonal projection, and as a stereographic projection. A Manipulate makes the results interactive—and they look pretty neat…
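As a rough sketch of the two projections (not the student's ParametricPlot4D code), here they are applied to the edges of a tesseract:

orthogonal[{x_, y_, z_, w_}] := {x, y, z};
stereographic[{x_, y_, z_, w_}] := {x, y, z}/(1 - w);   (* stereographic-style projection through w = 1 *)
(* edges of a tesseract: vertices of a unit 4-cube, joined when they differ in exactly one coordinate *)
verts = Tuples[{-1, 1}, 4]/2;
edges = Select[Subsets[verts, {2}], Norm[Subtract @@ #] == 1 &];
GraphicsRow[{Graphics3D[Line[orthogonal /@ #] & /@ edges, Boxed -> False],
  Graphics3D[Line[stereographic /@ #] & /@ edges, Boxed -> False]}]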

Plotting a tesseract

State-Level Mortality

In addition to my explicit list of project suggestions, I had a “meta suggestion”: take any dataset, for example from the new Wolfram Data Repository, and try to analyze and understand it. One student took a dataset about meteorite impacts; another about the recent Ebola outbreak in Africa. One young woman said she was interested in actuarial science—so I suggested that she look at something quintessentially actuarial: mortality data.

I suggested that maybe she could look at the (somewhat macabrely named) Death Master File. I wasn’t sure how far she’d get with it. But at the end of the camp I found out that she’d processed 90 million records—and successfully reduced them to derive aggregate survival curves for 25 different states and make an interactive Demonstration of the results. (Letting me conclude, for example, that my current probability of living to age 100 is 28% higher in Massachusetts than in Indiana…)
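As a tiny sketch of the survival-curve part of such an analysis (with synthetic ages at death standing in for the real records; not the student's pipeline):

agesAtDeath = RandomVariate[NormalDistribution[78, 12], 10^4];   (* synthetic stand-in for one state's records *)
Plot[SurvivalFunction[EmpiricalDistribution[agesAtDeath], age], {age, 0, 105},
 AxesLabel -> {"age", "fraction surviving"}]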

Survival image

“OCRing” Regular Tiling

Each year when I make up a list of projects for the Summer Camp I wonder if there’ll be particular favorites. My goal is actually to avoid this, and to have as uniform a distribution of interest in the projects as possible. But this year “Use Machine Learning to Identify Polyhedra” ended up being a minor favorite. And one consequence was that a student had already started working on the project even before we’d talked to him—even though by that time the project was already assigned to someone else.

But actually the “recovery” was better than the original. Because we figured out a really nice alternative project that was very well suited to the student. The project was to take images of regular tilings, say from a book, and to derive a computational representation of them, suitable, say, for LatticeData.

The student came up with a pretty sophisticated approach, largely based on image processing, but with a dash of computational geometry, combinatorics and even some cluster analysis thrown in. First, he used fairly elaborate image processing to identify the basic unit in the tiling. Then he figured out how this unit was arranged to form the final tiling. It ended up being about 102 lines of fairly dense algorithmic code—but the result was a quite robust “tiling OCR” system, that he also deployed on the web.
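As a rough sketch of just one ingredient of such a pipeline (with a hypothetical scanned image and made-up thresholds; not the student's code), one could estimate the lattice vectors of a tiling from the centroids of its tiles:

img = Import["tiling.png"];   (* hypothetical scan of a regular tiling *)
comps = MorphologicalComponents[ColorNegate[Binarize[img]]];   (* assumes dark tiles on a light background *)
cents = Values[ComponentMeasurements[comps, "Centroid"]];
(* offsets to nearby tile centroids; their clusters approximate the short lattice vectors *)
offsets = Flatten[Table[Select[(# - c) & /@ cents, 0 < Norm[#] < 80 &], {c, cents}], 1];
FindClusters[offsets, 6]   (* 6 is a guess for a typical tiling *)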

Analyzing regular tilings

Finding Buildings in Satellite Images

In my list I had a project “Identify buildings from satellite images”. A few students thought it sounded interesting, but as I thought about it some more, I got concerned that it might be really difficult. Still, one of our students was a capable young man who already seemed to know a certain amount about machine learning. So I encouraged him to give it a try. He ended up doing an impressive job.

He started by getting training data, comparing satellite images with street maps that marked buildings (and, conveniently, starting with the upcoming version of the Wolfram Language, not only street maps but also satellite images are built in):

Buildings

Then he used NetChain to build a neural net (based on the classic LeNet network, but modified). And then he started trying to classify parts of images as “building” or “not building”.

FinalNetUpdate3[ ]

The results weren’t at all bad. But so far they were only answering the question “is there a building in that square?”, not “where is there a building?”. So then—in a nice piece of computational thinking—the student came up with a further idea: just have a window pan across the image, at each step estimating the probability of building vs. not-building. The result was a remarkably accurate heat map of where buildings might be.
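As a sketch of the sliding-window step (assuming net is the trained building-vs-not classifier with a class named "building", img is a satellite image, and the window and step sizes are made up; not the student's code):

{w, h} = ImageDimensions[img];   (* net and img are hypothetical here *)
probs = Table[
   net[ImageTrim[img, {{x, y}, {x + 50, y + 50}}], "Probabilities"]["building"],
   {y, 0, h - 50, 25}, {x, 0, w - 50, 25}];
ArrayPlot[Reverse[probs], ColorFunction -> "TemperatureMap"]   (* Reverse so the map is oriented like the image *)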

TempMapDensity[ , FinalNetUpdate3, 50, 25]

It’d be a nice machine learning result for anyone. But as something done by a high-school student in two weeks I think it’s really impressive. And another great example of what’s now possible at an educational level with our whole Wolfram Language technology stack.

Beyond the Summer Camp

OK, so our Summer Camp was a success, and, with luck, the students from it are now successfully “launched” as independent computational thinkers. (The test, as far as I’m concerned, is whether when confronted with something in their education or their lives, they routinely turn to computational thinking, and just “write a program to solve the problem”. I’m hopeful that many of them now will. And, by the way, they immediately have “marketable skills”—like being able to do all sorts of data-science-related things.)

But how can we scale up what we’ve achieved with the Summer Camp? Well, we have a whole Computational Thinking Initiative that we’ve been designing to do just that. We’ll be rolling out different parts over the next little while, but one aspect will be doing other camps, and enabling other people to also do camps.

We’ve now got what amounts to an operations manual for how to “do a camp”. Suffice it to say that the core of it is to have instructors with good knowledge of the Wolfram Language (e.g. to the level of our Certified Instructor program), access to a bunch of great students, and use of a suitable venue. Two weeks seems to be a good length, though longer would work too. (Shorter will probably not be sufficient for students without prior experience to get to the point of doing a real project.)

Our camp is for high-school students (mainly aged 15 through 17). I think it would also be possible to do a successful camp for advanced middle-school students (maybe aged 12 and 13). And, of course, our long-running Summer School provides a very successful model for older students.

Beyond camps, we’ve had for some time a mentorships program which we will be streamlining and scaling up—helping students to work on longer-term projects. We’re also planning a variety of events and venues in which students can showcase their computational thinking work.

But for now it’s just exciting to see what was achieved in two weeks at this year’s Summer Camp. Yes, with the tech stack we now have, high-school students really can do serious computational thinking—that will make them not only immediately employable, but also positioned for what I think will be some of the most interesting career directions of the next few decades.

When Exactly Will the Eclipse Happen? A Multimillennium Tale of Computation


Preparing for August 21, 2017

On August 21, 2017, there’s going to be a total eclipse of the Sun visible on a line across the US. But when exactly will the eclipse occur at a given location? Being able to predict astronomical events has historically been one of the great triumphs of exact science. But in 2017, how well can it actually be done?

The answer, I think, is well enough that even though the edge of totality moves at just over 1000 miles per hour it should be possible to predict when it will arrive at a given location to within perhaps a second. And as a demonstration of this, we’ve created a website to let anyone enter their geo location (or address) and then immediately compute when the eclipse will reach them—as well as generate many pages of other information.

PrecisionEclipse.com pages

It’s an Old Business

These days it’s easy to find out when the next solar eclipse will be; indeed built right into the Wolfram Language there’s just a function that tells you (in this form the output is the “time of greatest eclipse”):

SolarEclipse[]

It’s also easy to find out, and plot, where the region of totality will be:

GeoListPlot[SolarEclipse["TotalPhasePolygon"]]

Or to determine that the whole area of totality will be about 16% of the area of the US:

GeoArea[SolarEclipse["TotalPhasePolygon"]]/GeoArea[Entity["Country", "UnitedStates"]]

But computing eclipses is not exactly a new business. In fact, the Antikythera device from 2000 years ago even tried to do it—using 37 metal gears to approximate the motion of the Sun and Moon (yes, with the Earth at the center). To me there’s something unsettling—and cautionary—about the fact that the Antikythera device stands as such a solitary piece of technology, forgotten but not surpassed for more than 1600 years.

But right there on the bottom of the device there’s an arm that moves around, and when it points to an Η or Σ marking, it indicates a possible Sun or Moon eclipse. The way of setting dates on the device is a bit funky (after all, the modern calendar wouldn’t be invented for another 1500 years), but if one takes the simulation on the Wolfram Demonstrations Project (which was calibrated back in 2012 when the Demonstration was created), and turns the crank to set the device for August 21, 2017, here’s what one gets:

The Antikythera device

Antikythera device in action

And, yes, all those gears move so as to line the Moon indicator up with the Sun—and to make the arm on the bottom point right at an Η—just as it should for a solar eclipse. It’s amazing to see this computation successfully happen on a device designed 2000 years ago.

Of course the results are a lot more accurate today. Though, strangely, despite all the theoretical science that’s been done, the way we actually compute the position of the Sun and Moon is conceptually very much like the gears—and effectively epicycles—of the Antikythera device. It’s just that now we have the digital equivalent of hundreds of thousands of gears.

Why Do Eclipses Happen?

A total solar eclipse occurs when the Moon gets in front of the Sun from the point of view of a particular location on the Earth. And it so happens that at this point in the Earth’s history the Moon can just block the Sun because it has almost exactly the same angular diameter in the sky as the Sun (about 0.5° or 30 arc-minutes).
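One can check that figure with round numbers for the diameters and distances involved (in kilometers):

angularDiameter[d_, dist_] := 2 ArcTan[d/(2 dist)]/Degree // N
{angularDiameter[3474., 384400.], angularDiameter[1.391*^6, 1.496*^8]}
(* ≈ {0.52, 0.53} degrees for the Moon and the Sun respectively *)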

So when does the Moon get between the Sun and the Earth? Well, basically every time there’s a new moon (i.e. once every lunar month). But we know there isn’t an eclipse every month. So how come?

Sun, Moon and Earth aligned

Graphics[{Style[Disk[{0, 0}, .3/5], Yellow], 
  Style[Disk[{.8, 0}, .1/5], Gray], Style[Disk[{1, 0}, .15/5], Blue]}]

Well, actually, in the analogous situation of Ganymede and Jupiter, there is an eclipse every time Ganymede goes around Jupiter (which happens to be about once per week). Like the Earth, Jupiter’s orbit around the Sun lies in a particular plane (the “Plane of the Ecliptic”). And it turns out that Ganymede’s orbit around Jupiter also lies in essentially the same plane. So every time Ganymede reaches the “new moon” position (or, in official astronomy parlance, when it’s aligned “in syzygy”—pronounced sizz-ee-gee), it’s in the right place to cast its shadow onto Jupiter, and to eclipse the Sun wherever that shadow lands. (From Jupiter, Ganymede appears about 3 times the size of the Sun.)

But our moon is different. Its orbit doesn’t lie in the plane of the ecliptic. Instead, it’s inclined at about 5°. (How it got that way is unknown, but it’s presumably related to how the Moon was formed.) But that 5° is what makes eclipses so comparatively rare: they can only happen when there’s a “new moon configuration” (syzygy) right at a time when the Moon’s orbit passes through the Plane of the Ecliptic.

To show what’s going on, let’s draw an exaggerated version of everything. Here’s the Moon going around the Earth, colored red whenever it’s close to the Plane of the Ecliptic:

Lunar path and the Plane of the Ecliptic

Graphics3D[{With[{dt = 0, \[Theta] = 20 Degree},
   Table[{With[{p = {Sin[2 Pi (t + dt)/27.3] Cos[\[Theta]],
         Cos[2 Pi (t + dt)/27.3] Cos[\[Theta]],
         Cos[2 Pi (t + dt)/27.3] Sin[\[Theta]]}}, {Style[
        Line[{{0, 0, 0}, p}], Opacity[.1]],
       Style[Sphere[p, .05],
        Blend[{Red, GrayLevel[.8, .02]},
         Sqrt[Abs[Cos[2 Pi t/27.2]]]]]}],
     Style[Sphere[{0, 0, 0}, .1], Blue]}, {t, 0, 26}]], EdgeForm[Red],
   Style[InfinitePlane[{0, 0, 0}, {{1, 0, 0}, {0, 1, 0}}],
   Directive[Red, Opacity[.02]]]}, Lighting -> "Neutral",
 Boxed -> False]

Now let’s look at what happens over the course of about a year. We’re showing a dot for where the Moon is each day. And the dot is redder if the Moon is closer to the Plane of the Ecliptic that day. (Note that if this was drawn to scale, you’d barely be able to see the Moon’s orbit, and it wouldn’t ever seem to go backwards like it does here.)

Approximately one year of lunar orbits

With[{dt = 1}, 
 Graphics[{Style[Disk[{0, 0}, .1], Darker[Yellow]], 
   Table[{With[{p = .2 {Sin[2 Pi t/27.3], Cos[2 Pi t/27.3]} + {Sin[
           2 Pi t/365.25], Cos[2 Pi t/365.25]}}, {Style[
        Line[{{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, p}], 
        Opacity[.3]], 
       Style[Disk[p, .01], 
        Blend[{Red, GrayLevel[.8]}, 
         Sqrt[Abs[Cos[2 Pi (t + dt)/27.2]]]]]}], 
     Style[Disk[{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, .005], 
      Blue]}, {t, 360}]}]]

Now we can start to see how eclipses work. The basic point is that there’s a solar eclipse whenever the Moon is both positioned between the Earth and the Sun, and it’s in the Plane of the Ecliptic. In the picture, those two conditions correspond to the Moon being as far as possible towards the center, and as red as possible. So far we’re only showing the position of the (exaggerated) moon once per day. But to make things clearer, let’s show it four times a day—and now prune out cases where the Moon isn’t at least roughly lined up with the Sun:

Eclipse points in lunar orbit

With[{dt = 1},
 Graphics[{Style[Disk[{0, 0}, .1], Darker[Yellow]],
   Table[{With[{p = .2 {Sin[2 Pi t/27.3], Cos[2 Pi t/27.3]} + {Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}},
      If[Norm[p] > .81, {},
       {Style[Line[{{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, p}], Opacity[.3]],
        Style[Disk[p, .01], Blend[{Red, GrayLevel[.8]}, Sqrt[Abs[Cos[2 Pi (t + dt)/27.2]]]]]}]],
     Style[Disk[{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, .005], Blue]}, {t, 1, 360, .25}]}]]

And now we can see that at least in this particular case, there are two points (indicated by arrows) where the Moon is lined up and in the plane of the ecliptic (so shown red)—and these points will then correspond to solar eclipses.

In different years, the picture will look slightly different, essentially because the Moon is starting at a different place in its orbit at the beginning of the year. Here are schematic pictures for a few successive years:

Schematic pictures for a few successive years

GraphicsGrid[
 Partition[
  Table[With[{dt = 1}, 
    Graphics[{Table[{With[{p = .2 {Sin[2 Pi t/27.3], 
              Cos[2 Pi t/27.3]} + {Sin[2 Pi t/365.25], 
             Cos[2 Pi t/365.25]}}, 
         If[Norm[p] > .81, {}, {Style[Line[{{0, 0}, p}], 
            Blend[{Red, GrayLevel[.8]}, 
             Sqrt[Abs[Cos[2 Pi (t + dt)/27.2]]]]], 
           Style[Line[{{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, p}], 
            Opacity[.3]], 
           Style[Disk[p, .01], 
            Blend[{Red, GrayLevel[.8]}, 
             Sqrt[Abs[Cos[2 Pi (t + dt)/27.2]]]]]}]], 
        Style[Disk[{Sin[2 Pi t/365.25], Cos[2 Pi t/365.25]}, .005], 
         Blue]}, {t, 1 + n *365.25, 360 + n*365.25, .25}], 
      Style[Disk[{0, 0}, .1], Darker[Yellow]]}]], {n, 0, 5}], 3]]

It’s not so easy to see exactly when eclipses occur here—and it’s also not possible to tell which are total eclipses where the Moon is exactly lined up, and which are only partial eclipses. But there’s at least an indication, for example, that there are “eclipse seasons” in different parts of the year where eclipses happen.

OK, so what does the real data look like? Here’s a plot for 20 years in the past and 20 years in the future, showing the actual days in each year when total and partial solar eclipses occur (the small dots everywhere indicate new moons):

Plot of solar eclipses over 40 years

coord[date_] := {DateValue[date, "Year"], 
  date - NextDate[DateObject[{DateValue[date, "Year"]}], "Instant"]}
ListPlot[coord /@ SolarEclipse[{Now - Quantity[20, "Years"], Now + Quantity[20, "Years"], All}], AspectRatio -> 1/3, Frame -> True]

The reason for the “drift” between successive years is just that the lunar month (29.53 days) doesn’t line up with the year, so the Moon doesn’t go through a whole number of orbits in the course of a year, with the result that at the beginning of a new year, the Moon is in a different phase. But as the picture makes clear, there’s quite a lot of regularity in the general times at which eclipses occur—and for example there are usually 2 eclipses in a given year—though there can be more (and in 0.2% of years there can be as many as 5, as there last were in 1935).

To see more detail about eclipses, let’s plot the time differences (in fractional years) between all successive solar eclipses for 100 years in the past and 100 years in the future:

ListLinePlot[Differences[SolarEclipse[{Now - Quantity[100, "Years"], Now + Quantity[100, "Years"], All}]]/Quantity[1, "Years"], Mesh -> All, PlotRange -> All, Frame -> True, AspectRatio -> 1/3, FrameTicks -> {None, Automatic}]

And now let’s plot the same time differences, but just for total solar eclipses:

ListLinePlot[Differences[SolarEclipse[{Now - Quantity[100, "Years"], Now + Quantity[100, "Years"], All}, EclipseType -> "Total"]]/Quantity[1, "Years"], Mesh -> All, PlotRange -> All, Frame -> True, AspectRatio -> 1/3, FrameTicks -> {None, Automatic}]

There’s obviously a fair amount of overall regularity here, but there’s also a lot of fine structure and irregularity. And being able to correctly predict all these details has basically taken science the better part of a few thousand years.

Ancient History

It’s hard not to notice an eclipse, and presumably even from the earliest times people did. But were eclipses just reflections—or omens—associated with random goings-on in the heavens, perhaps in some kind of soap opera among the gods? Or were they things that could somehow be predicted?

A few thousand years ago, it wouldn’t have been clear what people like astrologers could conceivably predict. When will the Moon be at a certain place in the sky? Will it rain tomorrow? What will the price of barley be? Who will win a battle? Even now, we’re not sure how predictable all of these are. But the one clear case where prediction and exact science have triumphed is astronomy.

At least as far as the Western tradition is concerned, it all seems to have started in ancient Babylon—where for many hundreds of years, careful observations were made, and, in keeping with the ways of that civilization, detailed records were kept. And even today we still have thousands of daily official diary entries written in what look like tiny chicken scratches preserved on little clay tablets (particularly from Nineveh, which happens now to be in Mosul, Iraq). “Night of the 14th: Cold north wind. Moon was in front of α Leonis. From 15th to 20th river rose 1/2 cubit. Barley was 1 kur 5 siit. 25th, last part of night, moon was 1 cubit 8 fingers behind ε Leonis. 28th, 74° after sunrise, solar eclipse…”

Babylonian clay tablet

If one looks at what happens on a particular day, one probably can’t tell much. But by putting observations together over years or even hundreds of years it’s possible to see all sorts of repetitions and regularities. And back in Babylonian times the idea arose of using these to construct an ephemeris—a systematic table that said where a particular heavenly body such as the Moon was expected to be at any particular time.

(Needless to say, reconstructing Babylonian astronomy is a complicated exercise in decoding what’s by now basically an alien culture. A key figure in this effort was a certain Otto Neugebauer, who happened to work down the hall from me at the Institute for Advanced Study in Princeton in the early 1980s. I would see him almost every day—a quiet white-haired chap, with a twinkle in his eye—and just sometimes I’d glimpse his huge filing system of index cards which I now realize was at the center of understanding Babylonian astronomy.)

One thing the Babylonians did was to measure surprisingly accurately the repetition period for the phases of the Moon—the so-called synodic month (or “lunation period”) of about 29.53 days. And they noticed that 235 synodic months was very close to 19 years—so that about every 19 years dates and phases of the Moon repeat their alignment, forming a so-called Metonic cycle (named after Meton of Athens, who described it in 432 BC).

It probably helps that the random constellations in the sky form a good pattern against which to measure the precise position of the Moon (it reminds me of the modern fashion of wearing fractals to make motion capture for movies easier). But the Babylonians noticed all sorts of details of the motion of the Moon. They knew about its “anomaly”: its periodic speeding up and slowing down in the sky (now known to be a consequence of its slightly elliptical orbit). And they measured the average period of this—the so-called anomalistic month—to be about 27.55 days. They also noticed that the Moon went above and below the Plane of the Ecliptic (now known to be because of the inclination of its orbit)—with an average period (the so-called draconic month) that they measured as about 27.21 days.

And by 400 BC they’d noticed that after every so-called saros of about 18 years 11 days, all these different periods essentially line up (223 synodic months, 239 anomalistic months and 242 draconic months)—with the result that the Moon ends up at about the same position relative to the Sun. And this means that if there was an eclipse at one saros, then one can make the prediction that there’s going to be an eclipse at the next saros too.
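Using the rounded month lengths quoted above, it's easy to check both coincidences numerically (with more precise month lengths the agreement is even closer):

235*29.53 - 19*365.25   (* Metonic cycle: 235 synodic months differ from 19 years by only about 0.2 day *)
{223*29.53, 239*27.55, 242*27.21}   (* saros: all roughly 6585 days, i.e. about 18 years 11 days *)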

When one’s absolutely precise about it, there are all sorts of effects that prevent precise repetition at each saros. But over timescales of more than 1300 years, there are in fact still strings of eclipses separated from each other by one saros. (Over the course of such a saros series, the locations of the eclipses effectively scan across the Earth; the upcoming eclipse is number 22 in a series of 77 that began in 1639 AD with an eclipse near the North Pole and will end in 3009 AD with an eclipse near the South Pole.)

Any given moment in time will be in the middle of quite a few saros series (right now it’s 40)—and successive eclipses will always come from different series. But knowing about the saros cycle is a great first step in predicting eclipses—and it’s for example what the Antikythera device uses. In a sense, it’s a quintessential piece of science: take many observations, then synthesize a theory from them, or at least a scheme for computation.

It’s not clear what the Babylonians thought about abstract, formal systems. But the Greeks were definitely into them. And by 300 BC Euclid had defined his abstract system for geometry. So when someone like Ptolemy did astronomy, they did it a bit like Euclid—effectively taking things like the saros cycle as axioms, and then proving from them often surprisingly elaborate geometrical theorems, such as that there must be at least two solar eclipses in a given year.

Ptolemy’s Almagest from around 150 AD is an impressive piece of work, containing among many other things some quite elaborate procedures—and explicit tables—for predicting eclipses. (Yes, even in the later printed version, numbers are still represented confusingly by letters, as they always were in ancient Greek.)

Ptolemy's Almagest

 

In Ptolemy’s astronomy, Earth was assumed to be at the center of everything. But in modern terms that just meant he was choosing to use a different coordinate system—which didn’t affect most of the things he wanted to do, like working out the geometry of eclipses. And unlike the mainline Greek philosophers he wasn’t so much trying to make a fundamental theory of the world, but just wanted whatever epicycles and so on he needed to explain what he observed.

The Dawn of Modern Science

For more than a thousand years Ptolemy’s theory of the Moon defined the state of the art. In the 1300s Ibn al-Shatir revised Ptolemy’s models, achieving somewhat better accuracy. In 1472 Regiomontanus (Johannes Müller), systematizer of trigonometry, published more complete tables as part of his launch of what was essentially the first-ever scientific publishing company. But even in 1543 when Nicolaus Copernicus introduced his Sun-centered model of the solar system, the results he got were basically the same as Ptolemy’s, even though his underlying description of what was going on was quite different.

It’s said that Tycho Brahe got interested in astronomy in 1560 at age 13 when he saw a solar eclipse that had been predicted—and over the next several decades his careful observations uncovered several effects in the motion of the Moon (such as its speeding up just before a full moon) that eventually resulted in perhaps a factor of 5 improvement in the prediction of its position. To Tycho eclipses were key tests, and he measured them carefully, and worked hard to be able to predict their timing more accurately than to within a few hours. (He himself never saw a total solar eclipse, only partial ones.)

Tycho's pages

 

Armed with Tycho’s observations, Johannes Kepler developed his description of orbits as ellipses—introducing concepts like inclination and eccentric anomaly—and in 1627 finally produced his Rudolphine Tables, which got right a lot of things that had been got wrong before, and included all sorts of detailed tables of lunar positions, as well as vastly better predictions for eclipses.

Kepler's Rudolphine Tables

 

Using Kepler’s Rudolphine Tables (and a couple of pages of calculations), the first known actual map of a solar eclipse was published in 1654. And while there are some charming inaccuracies in overall geography, the geometry of the eclipse isn’t too bad.

The first map of an eclipse

 

Whether it was Ptolemy’s epicycles or Kepler’s ellipses, there were plenty of calculations to do in determining the motions of heavenly bodies (and indeed the first known mechanical calculator—excepting the Antikythera device—was developed by a friend of Kepler’s, presumably for the purpose). But there wasn’t really a coherent underlying theory; it was more a matter of describing effects in ways that could be used to make predictions.

So it was a big step forward in 1687 when Isaac Newton published his Principia, and claimed that with his laws for motion and gravity it should be possible—essentially from first principles—to calculate everything about the motion of the Moon. (Charmingly, in his “Theory of the World” section he simply asserts as his Proposition XXII “That all the motions of the Moon… follow from the principles which we have laid down.”)

Newton was proud of the fact that he could explain all sorts of known effects on the basis of his new theory. But when it came to actually calculating the detailed motion of the Moon he had a frustrating time. And even after several years he still couldn’t get the right answer—in later editions of the Principia adding the admission that actually “The apse of the Moon is about twice as swift” (i.e. his answer was wrong by a factor of 2).

Still, in 1702 Newton was happy enough with his results that he allowed them to be published, in the form of a 20-page booklet on the “Theory of the Moon”, which proclaimed that “By this Theory, what by all Astronomers was thought most difficult and almost impossible to be done, the Excellent Mr. Newton hath now effected, viz. to determine the Moon’s Place even in her Quadratures, and all other Parts of her Orbit, besides the Syzygys, so accurately by Calculation, that the Difference between that and her true Place in the Heavens shall scarce be two Minutes…”

Newton's "Theory of the Moon" booklet

 

Newton didn’t explain his methods (and actually it’s still not clear exactly what he did, or how mathematically rigorous it was or wasn’t). But his booklet effectively gave a step-by-step algorithm to compute the position of the Moon. He didn’t claim it worked “at the syzygys” (i.e. when the Sun, Moon and Earth are lined up for an eclipse)—though his advertised error of two arc-minutes was still much smaller than the angular size of the Moon in the sky.

But it wasn’t eclipses that were the focus then; it was a very practical problem of the day: knowing the location of a ship out in the open ocean. It’s possible to determine what latitude you’re at just by measuring how high the Sun gets in the sky. But to determine longitude you have to correct for the rotation of the Earth—and to do that you have to accurately keep track of time. But back in Newton’s day, the clocks that existed simply weren’t accurate enough, especially when they were being tossed around on a ship.

But particularly after various naval accidents, the problem of longitude was deemed important enough that the British government in 1714 established a “Board of Longitude” to offer prizes to help get it solved. One early suggestion was to use the regularity of the moons of Jupiter discovered by Galileo as a way to tell time. But it seemed that a simpler solution (not requiring a powerful telescope) might just be to measure the position of our moon, say relative to certain fixed stars—and then to back-compute the time from this.

But to do this one had to have an accurate way to predict the motion of the Moon—which is what Newton was trying to provide. In reality, though, it took until the 1760s before tables were produced that were accurate enough to be able to determine time to within a minute (and thus distance to within 15 miles or so). And it so happens that right around the same time a marine chronometer was invented that was directly able to keep good time.
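The arithmetic behind “a minute of time is about 15 miles” is simple: the Earth turns through its full circumference (about 24,900 miles at the equator) once a day, so a minute of clock error corresponds to:

N[Quantity[24901, "Miles"]/(24*60)]   (* ≈ 17 miles of longitude error at the equator *)
N[Quantity[24901, "Miles"]/(24*60) Cos[40 Degree]]   (* ≈ 13 miles at 40° latitude *)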

The Three-Body Problem

One of Newton’s great achievements in the Principia was to solve the so-called two-body problem, and to show that with an inverse square law of gravity the orbit of one body around another must always be what Kepler had said: an ellipse.

In a first approximation, one can think of the Moon as just orbiting the Earth in a simple elliptical orbit. But what makes everything difficult is that that’s just an approximation, because in reality there’s also a gravitational pull on the Moon from the Sun. And because of this, the Moon’s orbit is no longer a simple fixed ellipse—and in fact it ends up being much more complicated. There are a few definite effects one can describe and reason about. The ellipse gets stretched when the Earth is closer to the Sun in its own orbit. The orientation of the ellipse precesses like a top as a result of the influence of the Sun. But there’s no way in the end to work out the orbit by pure reasoning—so there’s no choice but to go into the mathematics and start solving the equations of the three-body problem.

In many ways this represented a new situation for science. In the past, one hadn’t ever been able to go far without having to figure out new laws of nature. But here the underlying laws were supposedly known, courtesy of Newton. Yet even given these laws, there was difficult mathematics involved in working out the behavior they implied.

Over the course of the 1700s and 1800s the effort to try to solve the three-body problem and determine the orbit of the Moon was at the center of mathematical physics—and attracted a veritable who’s who of mathematicians and physicists.

An early entrant was Leonhard Euler, who developed methods based on trigonometric series (including much of our current notation for such things), and whose works contain many immediately recognizable formulas:

Euler's methods

 

In the mid-1740s there was a brief flap—also involving Euler’s “competitors” Clairaut and d’Alembert—about the possibility that the inverse-square law for gravity might be wrong. But the problem turned out to be with the calculations, and by 1748 Euler was using sums of about 20 trigonometric terms and proudly proclaiming that the tables he’d produced for the three-body problem had predicted the time of a total solar eclipse to within minutes. (Actually, he had said there’d be 5 minutes of totality, whereas in reality there was only 1—but he blamed this error on incorrect coordinates he’d been given for Berlin.)

Mathematical physics moved rapidly over the next few decades, with all sorts of now-famous methods being developed, notably by people like Lagrange. And by the 1770s, for example, Lagrange’s work was looking just like it could have come from a modern calculus book (or from a Wolfram|Alpha step-by-step solution):

Lagrange's methods

 

Particularly in the hands of Laplace there was increasingly obvious success in deriving the observed phenomena of what he called “celestial mechanics” from mathematics—and in establishing the idea that mathematics alone could indeed generate new results in science.

At a practical level, measurements of things like the position of the Moon had always been much more accurate than calculations. But now they were becoming more comparable—driving advances in both. Meanwhile, there was increasing systematization in the production of ephemeris tables. And in 1767 the annual publication began of what was for many years the standard: the British Nautical Almanac.

The almanac quoted the position of the Moon to the arc-second, and systematically achieved at least arc-minute accuracy. The primary use of the almanac was for navigation (and it was what started the convention of using Greenwich as the “prime meridian” for measuring time). But right at the front of each year’s edition were the predicted times of the eclipses for that year—in 1767 just two solar eclipses:

Nautical Almanac

 

The Math Gets More Serious

At a mathematical level, the three-body problem is about solving a system of three ordinary differential equations that give the positions of the three bodies as a function of time. If the positions are represented in standard 3D Cartesian coordinates r_i = {x_i, y_i, z_i}, the equations can be stated in the form:

Three equations
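In modern Wolfram Language terms, these are just the equations one would hand to NDSolve. Here's a minimal sketch, with made-up masses and initial conditions, in units where G = 1 (nothing like this was possible before computers, of course):

G = 1; m = {1., 1., 0.01};   (* made-up masses *)
eqns = Table[r[i]''[t] ==
    Sum[If[j == i, {0, 0, 0},
      G m[[j]] (r[j][t] - r[i][t])/Norm[r[j][t] - r[i][t]]^3], {j, 3}], {i, 3}];
init = {r[1][0] == {-1, 0, 0}, r[2][0] == {1, 0, 0}, r[3][0] == {0, 1.5, 0},
   r[1]'[0] == {0, -0.4, 0}, r[2]'[0] == {0, 0.4, 0}, r[3]'[0] == {0.8, 0, 0}};   (* made-up initial conditions *)
sol = First[NDSolve[Join[eqns, init], Table[r[i], {i, 3}], {t, 0, 20}]];
ParametricPlot3D[Evaluate[Table[r[i][t] /. sol, {i, 3}]], {t, 0, 20}]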

 

The {x,y,z} coordinates here aren’t, however, what traditionally show up in astronomy. For example, in describing the position of the Moon one might use longitude and latitude on a sphere around the Earth. Or, given that one knows the Moon has a roughly elliptical orbit, one might instead choose to describe its motions by variables that are based on deviations from such an orbit. In principle it’s just a matter of algebraic manipulation to restate the equations with any given choice of variables. But in practice what comes out is often long and complex—and can lead to formulas that fill many pages.

But, OK, so what are the best kind of variables to use for the three-body problem? Maybe they should involve relative positions of pairs of bodies. Or relative angles. Or maybe positions in various kinds of rotating coordinate systems. Or maybe quantities that would be constant in a pure two-body problem. Over the course of the 1700s and 1800s many treatises were written exploring different possibilities.

But in essentially all cases the ultimate approach to the three-body problem was the same. Set up the problem with the chosen variables. Identify parameters that, if set to zero, would make the problem collapse to some easy-to-solve form. Then do a series expansion in powers of these parameters, keeping just some number of terms.

By the 1860s Charles Delaunay had spent 20 years developing the most extensive theory of the Moon in this way. He’d identified five parameters with respect to which to do his series expansions (eccentricities, inclinations, and ratios of orbit sizes)—and in the end he generated about 1800 pages like this (yes, he really needed Mathematica!):

Delaunay's series expansions

 

But the sad fact was that despite all this effort, he didn’t get terribly good answers. And eventually it became clear why. The basic problem was that Delaunay wanted to represent his results in terms of functions like sin and cos. But in his computations, he often wanted to do series expansions with respect to the frequencies of those functions. Here’s a minimal case:

Series[Sin[(ω + δ)*t], {δ, 0, 3}]

And here’s the problem. Take a look even at the second term. Yes, the δ parameter may be small. But how about the t parameter, standing for time? If you don’t want to make predictions very far out, that’ll stay small. But what if you want to figure out what will happen further in the future?

Well eventually that term will get big. And higher-order terms will get even bigger. But unless the Moon is going to escape its orbit or something, the final mathematical expressions that represent its position can’t have values that are too big. So in these expressions the so-called secular terms that increase with t must somehow cancel out.

But the problem is that at any given order in the series expansion, there’s no guarantee that will happen in a numerically useful way. And in Delaunay’s case—even though with immense effort he often went to 7th order or beyond—it didn’t.

One nice feature of Delaunay’s computation was that it was in a sense entirely algebraic: everything was done symbolically, and only at the very end were actual numerical values of parameters substituted in.

But even before Delaunay, Peter Hansen had taken a different approach—substituting numbers as soon as he could, and dropping terms based on their numerical size rather than their symbolic form. His presentations look less pure (notice things like all those t − 1800, where t is the time in years), and it’s more difficult to tell what’s going on. But as a practical matter, his results were much better, and in fact were used for many national almanacs from about 1862 to 1922, achieving errors as small as 1 or 2 arc-seconds, at least over periods of a decade or so. (Over longer periods, the errors could rapidly increase, because of terms that had been dropped as a result of what amounted to numerical accidents.)

Hansen's method

 

Both Delaunay and Hansen tried to represent orbits as series of powers and trigonometric functions (so-called Poisson series). But in the 1870s, George Hill in the US Nautical Almanac Office proposed instead using as a basis numerically computed functions that came from solving an equation for two-body motion with a periodic driving force of roughly the kind the Sun exerts on the Moon’s orbit. A large-scale effort was mounted, and starting in 1892 Ernest W. Brown (who had moved to the US, but had been a student of George Darwin, Charles Darwin’s physicist son) took charge of the project and in 1918 produced what would stand for many years as the definitive “Tables of the Motion of the Moon”.

Brown’s tables consist of hundreds of pages like this—ultimately representing the position of the Moon as a combination of about 1400 terms with very precise coefficients:

Brown's tables

 

He says right at the beginning that the tables aren’t particularly intended for unique events like eclipses, but then goes ahead and does a “worked example” of computing an eclipse from 381 BC, reported by Ptolemy:

Brown's tables 2

 

It was an impressive indication of how far things had come. But ironically enough the final presentation of Brown’s tables had the same sum-of-trigonometric-functions form that one would get from having lots of epicycles. At some level it’s not surprising, because any function can ultimately be represented by epicycles, just as it can be represented by a Fourier or other series. But it’s a strange quirk of history that such similar forms were used.

Can the Three-Body Problem Be Solved?

It’s all well and good that one can find approximations to the three-body problem, but what about just finding an outright solution—like as a mathematical formula? Even in the 1700s, there’d been some specific solutions found—like Euler’s collinear configuration, and Lagrange’s equilateral triangle. But a century later, no further solutions had been found—and finding a complete solution to the three-body problem was beginning to seem as hopeless as trisecting an angle, solving the quintic, or making a perpetual motion machine. (That sentiment was reflected for example in a letter Charles Babbage wrote Ada Lovelace in 1843 mentioning the “horrible problem [of] the three bodies”—even though this letter was later misinterpreted by Ada’s biographers to be about a romantic triangle, not the three-body problem of celestial mechanics.)

In contrast to the three-body problem, what seemed to make the two-body problem tractable was that its solutions could be completely characterized by “constants of the motion”—quantities that stay constant with time (in this case notably the direction of the axis of the ellipse). So for many years one of the big goals with the three-body problem was to find constants of the motion.

In 1887, though, Heinrich Bruns showed that there couldn’t be any such constants of the motion, at least expressible as algebraic functions of the standard {x,y,z} position and velocity coordinates of the three bodies. Then in the mid-1890s Henri Poincaré showed that actually there couldn’t be any constants of the motion that were expressible as any analytic functions of the positions, velocities and mass ratios.

One reason that was particularly disappointing at the time was that it had been hoped that somehow constants of the motion would be found in n-body problems that would lead to a mathematical proof of the long-term stability of the solar system. And as part of his work, Poincaré also saw something else: that at least in particular cases of the three-body problem, there was arbitrarily sensitive dependence on initial conditions—implying that even tiny errors in measurement could be amplified to arbitrarily large changes in predicted behavior (the classic “chaos theory” phenomenon).

But having discovered that particular solutions to the three-body problem could have this kind of instability, Poincaré took a different approach that would actually be characteristic of much of pure mathematics going forward: he decided to look not at individual solutions, but at the space of all possible solutions. And needless to say, he found that for the three-body problem, this was very complicated—though in his efforts to analyze it he invented the field of topology.

Poincaré’s work all but ended efforts to find complete solutions to the three-body problem. It also seemed to some to explain why the series expansions of Delaunay and others hadn’t worked out—though in 1912 Karl Sundman did show that at least in principle the three-body problem could be solved in terms of an infinite series, albeit one that converges outrageously slowly.

But what does it mean to say that there can’t be a solution to the three-body problem? Galois had shown that there couldn’t be a solution to the generic quintic equation, at least in terms of radicals. But actually it’s still perfectly possible to express the solution in terms of elliptic or hypergeometric functions. So why can’t there be some more sophisticated class of functions that can be used to just “solve the three-body problem”?

Here are some pictures of what can actually happen in the three-body problem, with various initial conditions:

Three body problem pictures

eqns = {Subscript[m, 1]*
     Derivative[2][Subscript[r, 1]][
      t] == -((Subscript[m, 1]*
          Subscript[m, 2]*(Subscript[r, 1][t] - Subscript[r, 2][t]))/
        Norm[Subscript[r, 1][t] - Subscript[r, 2][t]]^3) - (Subscript[
         m, 1]*Subscript[m, 
         3]*(Subscript[r, 1][t] - Subscript[r, 3][t]))/
      Norm[Subscript[r, 1][t] - Subscript[r, 3][t]]^3, 
   Subscript[m, 2]*
     Derivative[2][Subscript[r, 2]][
      t] == -((Subscript[m, 1]*
          Subscript[m, 2]*(Subscript[r, 2][t] - Subscript[r, 1][t]))/
        Norm[Subscript[r, 2][t] - Subscript[r, 1][t]]^3) - (Subscript[
         m, 2]*Subscript[m, 
         3]*(Subscript[r, 2][t] - Subscript[r, 3][t]))/
      Norm[Subscript[r, 2][t] - Subscript[r, 3][t]]^3, 
   Subscript[m, 3]*
     Derivative[2][Subscript[r, 3]][
      t] == -((Subscript[m, 1]*
          Subscript[m, 3]*(Subscript[r, 3][t] - Subscript[r, 1][t]))/
        Norm[Subscript[r, 3][t] - Subscript[r, 1][t]]^3) - (Subscript[
         m, 2]*Subscript[m, 
         3]*(Subscript[r, 3][t] - Subscript[r, 2][t]))/
      Norm[Subscript[r, 3][t] - Subscript[r, 2][t]]^3};
(SeedRandom[#]; {Subscript[m, 1], Subscript[m, 2], Subscript[m, 3]} = 
    RandomReal[{0, 1}, 3]; 
   inits = Table[{Subscript[r, i][0] == RandomReal[{-1, 1}, 3], 
      Subscript[r, i]'[0] == RandomReal[{-1, 1}, 3]}, {i, 3}]; 
   sols = NDSolve[{eqns, inits}, {Subscript[r, 1], Subscript[r, 2], 
      Subscript[r, 3]}, {t, 0, 100}]; 
   ParametricPlot3D[{Subscript[r, 1][t], Subscript[r, 2][t], 
      Subscript[r, 3][t]} /. sols, {t, 0, 100}, 
    Ticks -> None]) & /@ {776, 5742, 6711, 2300, 5281, 9225}

And looking at these immediately gives some indication of why it’s not easy to just “solve the three-body problem”. Yes, there are cases where what happens is fairly simple. But there are also cases where it’s not, and where the trajectories of the three bodies continue to be complicated and tangled for a long time.

So what’s fundamentally going on here? I don’t think traditional mathematics is the place to look. But I think what we’re seeing is actually an example of a general phenomenon I call computational irreducibility that I discovered in the 1980s in studying the computational universe of possible programs.

Many programs, like many instances of the three-body problem, behave in quite simple ways. But if you just start looking at all possible simple programs, it doesn’t take long before you start seeing behavior like this:

Cellular automaton array

ArrayPlot[ CellularAutomaton[{#, 3, 1}, {{2}, 0}, 100], 
   ImageSize -> {Automatic, 100}] & /@ {5803305107286, 2119737824118, 
  5802718895085, 4023376322994, 6252890585925}

How can one tell what’s going to happen? Well, one can just keep explicitly running each program and seeing what it does. But the question is: is there some systematic way to jump ahead, and to predict what will happen without tracing through all the steps?

The answer is that in general there isn’t. And what I call the Principle of Computational Equivalence suggests that pretty much whenever one sees complex behavior, there won’t be.

Here’s the way to think about this. The system one’s studying is effectively doing a computation to work out what its behavior will be. So to jump ahead we’d in a sense have to do a more sophisticated computation. But what the Principle of Computational Equivalence says is that actually we can’t—and that whether we’re using our brains or our mathematics or a Turing machine or anything else, we’re always stuck with computations of the same sophistication.

So what about the three-body problem? Well, I strongly suspect that it’s an example of computational irreducibility: that in effect the computations it’s doing are as sophisticated as any computations that we can do, so there’s no way we can ever expect to systematically jump ahead and solve the problem. (We also can’t expect to just define some new finite class of functions that can just be evaluated to give the solution.)

I’m hoping that one day someone will rigorously prove this. There’s some technical difficulty, because the three-body problem is usually formulated in terms of real numbers that immediately have an infinite number of digits—but to compare with ordinary computation one has to require finite processes to set up initial conditions. (Ultimately one wants to show for example that there’s a “compiler” that can go from any program, say for a Turing machine, and can generate instructions to set up initial conditions for a three-body problem so that the evolution of the three-body problem will give the same results as running that program—implying that the three-body problem is capable of universal computation.)

I have to say that I consider Newton in a sense very lucky. It could have been that it wouldn’t have been possible to work out anything interesting from his theory without encountering the kind of difficulties he had with the motion of the Moon—because one would always be running into computational irreducibility. But in fact, there was enough computational reducibility and enough that could be computed easily that one could see that the theory was useful in predicting features of the world (and not getting wrong answers, like with the apse of the Moon)—even if there were some parts that might take two centuries to work out, or never be possible at all.

Newton himself was certainly aware of the potential issue, saying that at least if one was dealing with gravitational interactions between many planets then “to define these motions by exact laws admitting of easy calculation exceeds, if I am not mistaken, the force of any human mind”. And even today it’s extremely difficult to know what the long-term evolution of the solar system will be.

It’s not particularly that there’s sensitive dependence on initial conditions: we actually have measurements that should be precise enough to determine what will happen for a long time. The problem is that we just have to do the computation—a bit like computing the digits of π—to work out the behavior of the n-body problem that is our solar system.

Existing simulations show that for perhaps a few tens of millions of years, nothing too dramatic can happen. But after that we don’t know. Planets could change their order. Maybe they could even collide, or be ejected from the solar system. Computational irreducibility implies that at least after an infinite time it’s actually formally undecidable (in the sense of Gödel’s Theorem or the Halting Problem) what can happen.

One of my children, when they were very young, asked me whether when dinosaurs existed the Earth could have had two moons. For years when I ran into celestial mechanics experts I would ask them that question—and it was notable how difficult they found it. Most now say that at least at the time of the dinosaurs we couldn’t have had an extra moon—though a billion years earlier it’s not clear.

We used to only have one system of planets to study. And the fact that there were (then) 9 of them used to be a classic philosopher’s example of a truth about the world that just happens to be the way it is, and isn’t “necessarily true” (like 2+2=4). But now of course we know about lots of exoplanets. And it’s beginning to look as if there might be a theory for things like how many planets a solar system is likely to have.

At some level there’s presumably a process like natural selection: some configurations of planets aren’t “fit enough” to be stable—and only those that are survive. In biology it’s traditionally been assumed that natural selection and adaptation is somehow what’s led to the complexity we see. But actually I suspect much of it is instead just a reflection of what generally happens in the computational universe—both in biology and in celestial mechanics. Now in celestial mechanics, we haven’t yet seen in the wild any particularly complex forms (beyond a few complicated gap structures in rings, and tumbling moons and asteroids). But perhaps elsewhere we’ll see things like those obviously tangled solutions to the three-body problem—that come closer to what we’re used to in biology.

It’s remarkable how similar the issues are across so many different fields. For example, the whole idea of using “perturbation theory” and series expansions that has existed since the 1700s in celestial mechanics is now also core to quantum field theory. But just like in celestial mechanics there’s trouble with convergence (maybe one should try renormalization or resummation in celestial mechanics). And in the end one begins to realize that there are phenomena—no doubt like turbulence or the three-body problem—that inevitably involve more sophisticated computations, and that need to be studied not with traditional mathematics of the kind that was so successful for Newton and his followers but with the kind of science that comes from exploring the computational universe.

Approaching Modern Times

But let’s get back to the story of the motion of the Moon. Between Brown’s tables, and Poincaré’s theoretical work, by the beginning of the 1900s the general impression was that whatever could reasonably be computed about the motion of the Moon had been computed.

Occasionally there were tests. Like for example in 1925, when there was a total solar eclipse visible in New York City, and the New York Times perhaps overdramatically said that “scientists [were] tense… wondering whether they or moon is wrong as eclipse lags five seconds behind”. The fact is that a prediction accurate to 5 seconds was remarkably good, and we can’t do all that much better even today. (By the way, the actual article talks extensively about “Professor Brown”—as well as about how the eclipse might “disprove Einstein” and corroborate the existence of “coronium”—but doesn’t elaborate on the supposed prediction error.)

Newspaper article

 

As a practical matter, Brown’s tables were not exactly easy to use: to find the position of the Moon from them required lots of mechanical desk calculator work, as well as careful transcription of numbers. And this led Leslie Comrie in 1932 to propose using a punch-card-based IBM Hollerith automatic tabulator—and with the help of Thomas Watson, CEO of IBM, what was probably the first “scientific computing laboratory” was established—to automate computations from Brown’s tables.

Automatic tabulation

 

(When I was in elementary school in England in the late 1960s—before electronic calculators—I always carried around, along with my slide rule, a little book of “4-figure mathematical tables”. I think I found it odd that such a book would have an author—and perhaps for that reason I still remember the name: “L. J. Comrie”.)

By the 1950s, the calculations in Brown’s tables were slowly being rearranged and improved to make them more suitable for computers. But then with John F. Kennedy’s 1962 “We choose to go to the Moon”, there was suddenly urgent interest in getting the most accurate computations of the Moon’s position. As it turned out, though, it was basically just a tweaked version of Brown’s tables, running on a mainframe computer, that did the computations for the Apollo program.

At first, computers were used in celestial mechanics purely for numerical computation. But by the mid-1960s there were also experiments in using them for algebraic computation, and particularly to automate the generation of series expansions. Wallace Eckert at IBM started using FORMAC to redo Brown’s tables, while in Cambridge David Barton and Steve Bourne (later the creator of the “Bourne shell” (sh) in Unix) built their own CAMAL computer algebra system to try extending the kind of thing Delaunay had done. (And by 1970, Delaunay’s 7th-order calculations had been extended to 20th order.)

When I myself started to work on computer algebra in 1976 (primarily for computations in particle physics), I’d certainly heard about CAMAL, but I didn’t know what it had been used for (beyond vaguely “celestial mechanics”). And as a practicing theoretical physicist in the late 1970s, I have to say that the “problem of the Moon” that had been so prominent in the 1700s and 1800s had by then fallen into complete obscurity.

I remember for example in 1984 asking a certain Martin Gutzwiller, who was talking about quantum chaos, what his main interest actually was. And when he said “the problem of the Moon”, I was floored; I didn’t know there still was any problem with the Moon. As it turns out, in writing this post I found out that Gutzwiller was actually the person who took over from Eckert and spent nearly two decades working on trying to improve the computations of the position of the Moon.

Why Not Just Solve It?

Traditional approaches to the three-body problem come very much from a mathematical way of thinking. But modern computational thinking immediately suggests a different approach. Given the differential equations for the three-body problem, why not just directly solve them? And indeed in the Wolfram Language there’s a built-in function NDSolve for numerically solving systems of differential equations.

So what happens if one just feeds in equations for a three-body problem? Well, here are the equations:

eqns = {Subscript[m, 
     1] (Subscript[r, 1]^\[Prime]\[Prime])[t] == -((
      Subscript[m, 1] Subscript[m, 
       2] (Subscript[r, 1][t] - Subscript[r, 2][t]))/
      Norm[Subscript[r, 1][t] - Subscript[r, 2][t]]^3) - (
     Subscript[m, 1] Subscript[m, 
      3] (Subscript[r, 1][t] - Subscript[r, 3][t]))/
     Norm[Subscript[r, 1][t] - Subscript[r, 3][t]]^3, 
   Subscript[m, 
     2] (Subscript[r, 2]^\[Prime]\[Prime])[t] == -((
      Subscript[m, 1] Subscript[m, 
       2] (Subscript[r, 2][t] - Subscript[r, 1][t]))/
      Norm[Subscript[r, 2][t] - Subscript[r, 1][t]]^3) - (
     Subscript[m, 2] Subscript[m, 
      3] (Subscript[r, 2][t] - Subscript[r, 3][t]))/
     Norm[Subscript[r, 2][t] - Subscript[r, 3][t]]^3, 
   Subscript[m, 
     3] (Subscript[r, 3]^\[Prime]\[Prime])[t] == -((
      Subscript[m, 1] Subscript[m, 
       3] (Subscript[r, 3][t] - Subscript[r, 1][t]))/
      Norm[Subscript[r, 3][t] - Subscript[r, 1][t]]^3) - (
     Subscript[m, 2] Subscript[m, 
      3] (Subscript[r, 3][t] - Subscript[r, 2][t]))/
     Norm[Subscript[r, 3][t] - Subscript[r, 2][t]]^3};

Now as an example let’s set the masses to random values:

{Subscript[m, 1], Subscript[m, 2], Subscript[m, 3]} = RandomReal[{0, 1}, 3]


And let’s define the initial position and velocity for each body to be random as well:

inits = Table[{Subscript[r, i][0] == RandomReal[{-1, 1}, 3], Derivative[1][Subscript[r, i]][0] == RandomReal[{-1, 1}, 3]}, {i, 3}]


Now we can just use NDSolve to get the solutions (it gives them as implicit approximate numerical functions of t):

sols = NDSolve[{eqns, inits}, {Subscript[r, 1], Subscript[r, 2], Subscript[r, 3]}, {t, 0, 100}]


And now we can plot them. We’ve got a solution to a three-body problem, just like that!

ParametricPlot3D[Evaluate[{Subscript[r, 1][t], Subscript[r, 2][t], Subscript[r, 3][t]} /. First[sols]], {t, 0, 100}]


Well, obviously this is using the Wolfram Language and a huge tower of modern technology. But would it have been possible even right from the beginning for people to generate direct numerical solutions to the three-body problem, rather than doing all that algebra? Back in the 1700s, Euler already knew what’s now called Euler’s method for finding approximate numerical solutions to differential equations. So what if he’d just used that method to calculate the motion of the Moon?

The method relies on taking a sequence of discrete steps in time. And if he’d used, say, a step size of a minute, then he’d have had to take 40,000 steps to get results for a month, but he should have been able to successfully reproduce the position of the Moon to about a percent. If he’d tried to extend to 3 months, however, then he would already have had at least a 10% error.
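Here's a rough sketch of the method applied to a simple two-body orbit (in units with G = M = 1 and a deliberately coarse step size, rather than Euler's actual lunar setup); the slow outward spiral of what should be a closed orbit is the accumulating error:

(* one Euler step: r -> r + dt v, v -> v + dt a, with acceleration a = -r/|r|^3 *)
eulerOrbit[{r0_, v0_}, dt_, n_] :=
  NestList[{#[[1]] + dt #[[2]], #[[2]] - dt #[[1]]/Norm[#[[1]]]^3} &, {r0, v0}, n];
ListLinePlot[eulerOrbit[{{1., 0.}, {0., 1.}}, 0.05, 3000][[All, 1]],
 AspectRatio -> Automatic]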

Any numerical scheme for solving differential equations in practice eventually builds up some kind of error—but the more one knows about the equations one’s solving, and their expected solutions, the more one’s able to preprocess and adapt things to minimize the error. NDSolve has enough automatic adaptivity built into it that it’ll do pretty well for a surprisingly long time on a typical three-body problem. (It helps that the Wolfram Language and NDSolve can handle numbers with arbitrary precision, not just machine precision.)

But if one looks, say, at the total energy of the three-body system—which one can prove from the equations should stay constant—then one will typically see an error slowly build up in it. One can avoid this if one effectively does a change of variables in the equations to “factor out” energy. And one can imagine doing a whole hierarchy of algebraic transformations that in a sense give the numerical scheme as much help as possible.
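As a sketch (not part of the original computation above, but reusing its sols, masses and G = 1 units), one can watch this happen directly: in exact arithmetic the total energy would be constant, so its drift measures the accumulated numerical error:

ms = {Subscript[m, 1], Subscript[m, 2], Subscript[m, 3]};
rs = {Subscript[r, 1], Subscript[r, 2], Subscript[r, 3]} /. First[sols];
(* kinetic plus potential energy of the three bodies *)
energy[t_?NumericQ] :=
  Sum[(1/2) ms[[i]] Norm[rs[[i]]'[t]]^2, {i, 3}] -
   Sum[ms[[i]] ms[[j]]/Norm[rs[[i]][t] - rs[[j]][t]], {i, 3}, {j, i + 1, 3}];
Plot[energy[t] - energy[0], {t, 0, 100}]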

And indeed since at least the 1980s that’s exactly what’s been done in practical work on the three-body problem, and the Earth-Moon-Sun system. So in effect it’s a mixture of the traditional algebraic approach from the 1700s and 1800s, together with modern numerical computation.

The Real Earth-Moon-Sun Problem

OK, so what’s involved in solving the real problem of the Earth-Moon-Sun system? The standard three-body problem gives a remarkably good approximation to the physics of what’s happening. But it’s obviously not the whole story.

For a start, the Earth isn’t the only planet in the solar system. And if one’s trying to get sufficiently accurate answers, one’s going to have to take into account the gravitational effect of other planets. The most important is Jupiter, and its typical effect on the orbit of the Moon is at about the 10⁻⁵ level—sufficiently large that for example Brown had to take it into account in his tables.

The next effect is that the Earth isn’t just a point mass, or even a precise sphere. Its rotation makes it bulge at the equator, and that affects the orbit of the Moon at the 10⁻⁶ level.

Orbits around the Earth ultimately depend on the full mass distribution and gravitational field of the Earth (which is what Sputnik-1 was nominally launched to map)—and both this, and the reverse effect from the Moon, come in at the 10⁻⁸ level. At the 10⁻⁹ level there are then effects from tidal deformations (“solid tides”) on the Earth and Moon, as well as from gravitational redshift and other general relativistic phenomena.

To predict the position of the Moon as accurately as possible one ultimately has to have at least some model for these various effects.

But there’s a much more immediate issue to deal with: one has to know the initial conditions for the Earth, Sun and Moon, or in other words, one has to know as accurately as possible what their positions and velocities were at some particular time.

And conveniently enough, there’s now a really good way to do that, because Apollo 11, 14 and 15 all left laser retroreflectors on the Moon. And by precisely timing how long it takes a laser pulse from the Earth to round-trip to these retroreflectors, it’s now possible in effect to measure the position of the Moon to millimeter accuracy.

OK, so how do modern analogs of the Babylonian ephemerides actually work? Internally they’re dealing with the equations for all the significant bodies in the solar system. They do symbolic preprocessing to make their numerical work as easy as possible. And then they directly solve the differential equations for the system, appropriately inserting models for things like the mass distribution in the Earth.

They start from particular measured initial conditions, but then they repeatedly insert new measurements, trying to correct the parameters of the model so as to optimally reproduce all the measurements they have. It’s very much like a typical machine learning task—with the training data here being observations of the solar system (and typically fitting just being least squares).
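Here's a toy sketch of that fitting step, with made-up “observations” and a made-up two-parameter model rather than a real solar-system model:

(* least-squares estimates of a and ω from noisy samples of a Sin[ω t] *)
obs = Table[{t, 2.03 Sin[0.51 t] + RandomReal[{-0.02, 0.02}]}, {t, 0., 20., 0.5}];
FindFit[obs, a Sin[ω t], {{a, 1}, {ω, 0.6}}, t]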

But, OK, so there’s a model one can run to figure out something like the position of the Moon. But one doesn’t want to have to explicitly do that every time one needs to get a result; instead one wants in effect just to store a big table of pre-computed results, and then to do something like interpolation to get any particular result one needs. And indeed that’s how it’s done today.

How It’s Really Done

Back in the 1960s NASA started directly solving differential equations for the motion of planets. The Moon was more difficult to deal with, but by the 1980s that too was being handled in a similar way. Ongoing data from things like the lunar retroreflectors was added, and all available historical data was inserted as well.

The result of all this was the JPL Development Ephemeris (JPL DE). In addition to new observations being used, the underlying system gets updated every few years, typically to get what’s needed for some spacecraft going to some new place in the solar system. (The latest is DE432, built for going to Pluto.)

But how is the actual ephemeris delivered? Well, for every thousand years covered, the ephemeris has about 100 megabytes of results, given as coefficients for Chebyshev polynomials, which are convenient for interpolation. And for any given quantity in any given coordinate system over a particular period of time, one accesses the appropriate parts of these results.
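Here's a toy sketch of the idea (with an illustrative function, not actual JPL DE coefficients): fit a short Chebyshev series to a smooth quantity over a block of time, then evaluate that series wherever a value is needed:

data = Table[{t, Sin[3 t] + 0.2 Cos[7 t]}, {t, -1, 1, 0.05}];
approx = Fit[data, Table[ChebyshevT[k, x], {k, 0, 10}], x];  (* Chebyshev basis through order 10 *)
Plot[{Sin[3 x] + 0.2 Cos[7 x], approx}, {x, -1, 1}]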

OK, but so how does one find an eclipse? Well, it’s an iterative process. Start with an approximation, perhaps from the saros cycle. Then interpolate the ephemeris and look at the result. Then keep iterating until one finds out just when the Moon will be in the appropriate position.

But actually there’s some more to do. Because what’s originally computed are the positions of the barycenters (centers of mass) of the various bodies. But now one has to figure out how the bodies are oriented.

The Earth rotates, and we know its rate quite precisely. But the Moon is basically locked with the same face pointing to the Earth, except that in practice there are small “librations” where the Moon wobbles a little back and forth—and these turn out to be particularly troublesome to predict.

Computing the Eclipse

OK, so let’s say one knows where the Earth, Moon and Sun are. How does one then figure out where on the Earth the eclipse will actually hit? Well, there’s some further geometry to do. Basically, the Moon generates a cone of shadow in a direction defined by the location of the Sun, and what’s then needed is to figure out how the surface of the Earth intersects that cone.

In 1824 Friedrich Bessel suggested in effect inverting the problem by using the shadow cone to define a coordinate system in which to specify the positions of the Sun and Moon. The resulting so-called Besselian elements provide a convenient summary of the local geometry of an eclipse—with respect to which its path can be defined.

OK, but so how does one figure out at what time an eclipse will actually reach a given point on Earth? Well, first one has to be clear on one’s definition of time. And there’s an immediate issue with the speed of light and special relativity. What does it mean to say that the positions of the Earth and Sun are such-and-such at such-and-such a time? Because it takes light about 8 minutes to get to the Earth from the Sun, we only get to see where the Sun was 8 minutes ago, not where it is now.
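That 8-minute figure is easy to check with built-in units:

UnitConvert[Quantity[1, "AstronomicalUnit"]/Quantity[1, "SpeedOfLight"], "Minutes"]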

And what we need is really a classic special relativity setup. We essentially imagine that the solar system is filled with a grid of clocks that have been synchronized by light pulses. And what a modern ephemeris does is to quote the results for positions of bodies in the solar system relative to the times on those clocks. (General relativity implies that in different gravitational fields the clocks will run at different rates, but for our purposes this is a tiny effect. But what isn’t a tiny effect is including retardation in the equations for the n-body problem—making them become delay differential equations.)

But now there’s another issue. If one’s observing the eclipse, one’s going to be using some timepiece (phone?) to figure out what time it is. And if it’s working properly that timepiece should show official “civil time” that’s based on UTC—which is what NTP internet time is synchronized to. But the issue is that UTC has a complicated relationship to the time used in the astronomical ephemeris.

The starting point is what’s called UT1: a definition of time in which one day is the average time it takes the Earth to rotate once relative to the Sun. But the point is that this average time isn’t constant, because the rotation of the Earth is gradually slowing down, primarily as a result of interactions with the Moon. But meanwhile, UTC is defined by an atomic clock whose timekeeping is independent of any issues about the rotation of the Earth.

There’s a convention for keeping UT1 aligned with UTC: if UT1 is going to get more than 0.9 seconds away from UTC, then a leap second is added to UTC. One might think this would be a tiny effect, but actually, since 1972, a total of 27 leap seconds have been added. Exactly when a new leap second will be needed is unpredictable; it depends on things like what earthquakes have occurred. But we need to account for leap seconds if we’re going to get the time of the eclipse correct to the second relative to UTC or internet time.

There are a few other effects that are also important in the precise observed timing of the eclipse. The most obvious is geo elevation. In doing astronomical computations, the Earth is assumed to be an ellipsoid. (There are many different definitions, corresponding to different geodetic “datums”—and that’s an issue in defining things like “sea level”, but it’s not relevant here.) But if you’re at a different height above the ellipsoid, the cone of shadow from the eclipse will reach you at a different time. And the size of this effect can be as much as 0.3 seconds for every 1000 feet of height.

All of the effects we’ve talked about we’re readily able to account for. But there is one remaining effect that’s a bit more difficult. Right at the beginning or end of totality one typically sees points of light on the rim of the Moon. Known as Baily’s beads, these are the result of rays of light that make it to us between mountains on the Moon. Figuring out exactly when all these rays are extinguished requires taking geo elevation data for the Moon, and effectively doing full 3D ray tracing. The effect can last as long as a second, and can cause the precise edge of totality to move by as much as a mile. (One can also imagine effects having to do with the corona of the Sun, which is constantly changing.)

But in the end, even though the shadow of the Moon on the Earth moves at more than 1000 mph, modern science successfully makes it possible to compute when the shadow will reach a particular point on Earth to an accuracy of perhaps a second. And that’s what our precisioneclipse.com website is set up to do.

Eclipse Experiences

I saw my first partial solar eclipse more than 50 years ago. And I’ve seen one total solar eclipse before in my life—in 1991. It was the longest eclipse (6 minutes 53 seconds) that’ll happen for more than a century.

There was a certain irony to my experience, though, especially in view of our efforts now to predict the exact arrival time of next week’s eclipse. I’d chartered a plane and flown to a small airport in Mexico (yes, that’s me on the left with the silly hat)—and my friends and I had walked to a beautiful deserted beach, and were waiting under a cloudless sky for the total eclipse to begin.

1991 eclipse trip photos

 

I felt proud of how prepared I was—with maps marking to the minute when the eclipse should arrive. But then I realized: there we were, out on a beach with no obvious signs of modern civilization—and nobody had brought any properly set timekeeping device (and in those days my cellphone was just a phone, and didn’t even have signal there).

And so it was that I missed seeing a demonstration of an impressive achievement of science. And instead I got to experience the eclipse pretty much the way people throughout history have experienced eclipses—even if I did know that the Moon would continue gradually eating into the Sun and eventually cover it, and that it wouldn’t make the world end.

There’s always something sobering about astronomical events, and about realizing just how tiny human scales are compared to them. Billions of eclipses have happened over the course of the Earth’s history. Recorded history has covered only a few thousand of them. On average, there’s an eclipse at any given place on Earth roughly every 400 years; in Jackson, WY, where I’m planning to see next week’s eclipse, it turns out the next total eclipse will be 727 years from now—in 2744.

In earlier times, civilizations built giant monuments to celebrate the motions of the Sun and moon. For the eclipse next week what we’re making is a website. But that website builds on one of the great epics of human intellectual history—stretching back to the earliest times of systematic science, and encompassing contributions from a remarkable cross-section of the most celebrated scientists and mathematicians from past centuries.

It’ll be about 9538 days since the eclipse I saw in 1991. The Moon will have traveled some 500 million miles around the Earth, and the Earth some 15 billion miles around the Sun. But now—in a remarkable triumph of science—we’re computing to the second when they’ll be lined up again.
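(One can check that day count directly; the 1991 eclipse was on July 11, and the 2017 one falls on August 21:)

DateDifference[DateObject[{1991, 7, 11}], DateObject[{2017, 8, 21}], "Day"]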

It’s Another Impressive Release! Launching Version 11.2 Today


Our Latest R&D Output

I’m excited today to announce the latest output from our R&D pipeline: Version 11.2 of the Wolfram Language and Mathematica—available immediately on desktop (Mac, Windows, Linux) and cloud.

It was only this spring that we released Version 11.1. But after the summer we’re now ready for another impressive release—with all kinds of additions and enhancements, including 100+ entirely new functions:

New functions word cloud

We have a very deliberate strategy for our releases. Integer releases (like 11) concentrate on major complete new frameworks that we’ll be building on far into the future. “.1” releases (like 11.2) are intended as snapshots of the latest output from our R&D pipeline–delivering new capabilities large and small as soon as they’re ready.

Version 11.2 has a mixture of things in it—ranging from ones that provide finishing touches to existing major frameworks, to ones that are first hints of major frameworks under construction. One of my personal responsibilities is to make sure that everything we add is coherently designed, and fits into the long-term vision of the system in a unified way.

And by the time we’re getting ready for a release, I’ve been involved enough with most of the new functions we’re adding that they begin to feel like personal friends. So when we’re doing a .1 release and seeing what new functions are going to be ready for it, it’s a bit like making a party invitation list: who’s going to be able to come to the big celebration?

Years back there’d be a nice list, but it would be of modest length. Today, however, I’m just amazed at how fast our R&D pipeline is running, and how much comes out of it every month. Yes, we’ve been consistently building our Wolfram Language technology stack for more than 30 years—and we’ve got a great team. But it’s still a thrill for me to see just how much we’re actually able to deliver to all our users in a .1 release like 11.2.

Advances in Machine Learning

It’s hard to know where to begin. But let’s pick a current hot area: machine learning.

We’ve had functionality that would now be considered machine learning in the Wolfram Language for decades, and back in 2014 we introduced the “machine-learning superfunctions” Classify and Predict—to give broad access to modern machine learning. By early 2015, we had state-of-the-art deep-learning image identification in ImageIdentify, and then, last year, in Version 11, we began rolling out our full symbolic neural net computation system.

Our goal is to push the envelope of what’s possible in machine learning, but also to deliver everything in a nice, integrated way that makes it easy for a wide range of people to use, even if they’re not machine-learning experts. And in Version 11.2 we’ve actually used machine learning to add automation to our machine-learning capabilities.

So, in particular, Classify and Predict are significantly more powerful in Version 11.2. Their basic scheme is that you give them training data, and they’ll learn from it to automatically produce a machine-learning classifier or predictor. But a critical thing in doing this well is to know what features to extract from the data—whether it’s images, sounds, text, or whatever. And in Version 11.2 Classify and Predict have a variety of new kinds of built-in feature extractors that have been pre-trained on a wide range of kinds of data.

But the most obviously new aspect of Classify and Predict is how they select the core machine-learning method to use (as well as hyperparameters for it). (By the way, 11.2 also introduces things like optimized gradient-boosted trees.) And if you run Classify and Predict now in a notebook you’ll actually see them dynamically figuring out and optimizing what they’re doing (needless to say, using machine learning):

Classify and Predict animation

By the way, you can always press Stop to stop the training process. And with the new option TimeGoal you can explicitly say how long the training should be planned to be—from seconds to years.
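Here's a minimal sketch of the workflow, with made-up training data and a small TimeGoal (in seconds):

c = Classify[{1.2 -> "small", 0.8 -> "small", 5.1 -> "large", 4.7 -> "large"},
   TimeGoal -> 5];
c[4.5]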

As a field, machine learning is advancing very rapidly right now (in the course of my career, I’ve seen perhaps a dozen fields in this kind of hypergrowth—and it’s always exciting). And one of the things about our general symbolic neural net framework is that we’re able to take new advances and immediately integrate them into our long-term system—and build on them in all sorts of ways.

At the front lines of this is the function NetModel—to which new trained and untrained models are being added all the time. (The models are hosted in the cloud—but downloaded and cached for desktop or embedded use.) And so, for example, a few weeks ago NetModel got a new model for inferring geolocations of photographs—that’s based on basic research from just a few months ago:

NetModel["ResNet-101 Trained on YFCC100M Geotagged Data"]


Now if we give it a picture with sand dunes in it, its top inferences for possible locations seem to center around certain deserts:


GeoBubbleChart[
 NetModel["ResNet-101 Trained on YFCC100M Geotagged Data"][
  CloudGet["https://wolfr.am/dunes"], {"TopProbabilities", 50}]]

NetModel handles networks that can be used for all sorts of purposes—not only as classifiers, but also, for example, as feature extractors.

Building on NetModel and our symbolic neural network framework, we’ve also been able to add new built-in classifiers to use directly from Classify. So now, in addition to things like sentiment, we have NSFW, face age and facial expression (yes, an actual tiger isn’t safe, but in a different sense):


Classify["NSFWImage",CloudGet["https://wolfr.am/tiger"]]

Our built-in ImageIdentify function (whose underlying network you can access with NetModel) has been tuned and retrained for Version 11.2—but fundamentally it’s still a classifier. One of the important things that’s happening with machine learning is the development of new types of functions, supporting new kinds of workflows. We’ve got a lot of development going on in this direction, but for 11.2 one new (and fun) example is ImageRestyle—that takes a picture and applies the style of another picture to it:


ImageRestyle[\[Placeholder],\[Placeholder]]

And in honor of this new functionality, maybe it’s time to get the image on my personal home page replaced with something more “styled”—though it’s a bit hard to know what to choose:


ImageRestyle gallery

ImageRestyle[#, , PerformanceGoal -> "Quality", 
   TargetDevice -> 
    "GPU"] & /@ {\[Placeholder], \[Placeholder], \[Placeholder], \
\[Placeholder], \[Placeholder], \[Placeholder]}

By the way, another new feature of 11.2 is the ability to directly export trained networks and other machine-learning functionality. If you’re only interested in an actual network, you can get in MXNet format—suitable for immediate execution wherever MXNet is supported. In typical real situations, there’s some pre- and post-processing that’s needed as well—and the complete functionality can be exported in WMLF (Wolfram Machine Learning Format).
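As a sketch (the particular model and file name here are just examples), exporting a network from NetModel in MXNet format looks like this:

net = NetModel["LeNet Trained on MNIST Data"];
Export["lenet.json", net, "MXNet"]  (* writes the network in a form MXNet can execute *)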

Cloud (and iOS) Notebooks

We invented the idea of notebooks back in 1988, for Mathematica 1.0—and over the past 29 years we’ve been steadily refining and extending how they work on desktop systems. About nine years ago we also began the very complex process of bringing our notebook interface to web browsers—to be able to run notebooks directly in the cloud, without any need for local installation.

It’s been a long, hard journey. But between new features of the Wolfram Language and new web technologies (like isomorphic React, Flow, MobX)—and heroic efforts of software engineering—we’re finally reaching the point where our cloud notebooks are ready for robust prime-time use. Like, try this one:

Notebook

We actually do continuous releases of the Wolfram Cloud—but with Version 11.2 of the Wolfram Language we’re able to add a final layer of polish and tuning to cloud notebooks.

You can create and compute directly on the web, and you can immediately “peel off” a notebook to run on the desktop. Or you can start on the desktop, and immediately push your notebook to the cloud, so it can be shared, embedded—and further edited or computed with—in the cloud.

By the way, when you’re using the Wolfram Cloud, you’re not limited to desktop systems. With the Wolfram Cloud App, you can work with notebooks on mobile devices too. And now that Version 11.2 is released, we’re able to roll out a new version of the Wolfram Cloud App, that makes it surprisingly realistic (thanks to some neat UX ideas) to write Wolfram Language code even on your phone.

Talking of mobile devices, there’s another big thing that’s coming: interactive Wolfram Notebooks running completely locally and natively on iOS devices—both tablets and phones. This has been another heroic software engineering project—which actually started nearly as long ago as the cloud notebook project.

The goal here is to be able to read and interact with—but not author—notebooks directly on an iOS device. And so now with the Wolfram Player App that will be released next week, you can have a notebook on your iOS device, and use Manipulate and other dynamic content, as well as read and navigate notebooks—with the whole interface natively adapted to the touch environment.

For years it’s been frustrating when people send me notebook attachments in email, and I’ve had to do things like upload them to the cloud to be able to read them on my phone. But now with native notebooks on iOS, I can immediately just read notebook attachments directly from email.

Mathematical Limits

Math was the first big application of the Wolfram Language (that’s why it was called Mathematica!)… and for more than 30 years we’ve been committed to aggressively pursuing R&D to expand the domain of math that can be made computational. And in Version 11.2 the biggest math advance we’ve made is in the area of limits.

Mathematica 1.0 back in 1988 already had a basic Limit function. And over the years Limit has gradually been enhanced. But in 11.2—as a result of algorithms we’ve developed over the past several years—it’s reached a completely new level.

The simple-minded way to compute a limit is to work out the first terms in a power series. But that doesn’t work when functions increase too rapidly, or have wild and woolly singularities. But in 11.2 the new algorithms we’ve developed have no problem handling things like this:

Limit[E^(E^x + x^2) (-Erf[E^-E^x - x] - Erf[x]), x -> \[Infinity]]


Limit[(3 x + Sqrt[9 x^2 + 4 x - Sin[x]]), x -> -\[Infinity]]

It’s very convenient that we have a test set of millions of complicated limit problems that people have asked Wolfram|Alpha about over the past few years—and I’m pleased to say that with our new algorithms we can now immediately handle more than 96% of them.

Limits are in a sense at the very core of calculus and continuous mathematics—and to do them correctly requires a huge tower of knowledge about a whole variety of areas of mathematics. Multivariate limits are particularly tricky—with the main takeaway from many textbooks basically being “it’s hard to get them right”. Well, in 11.2, thanks to our new algorithms (and with a lot of support from our algebra, functional analysis and geometry capabilities), we’re finally able to correctly do a very wide range of multivariate limits—saying whether there’s a definite answer, or whether the limit is provably indeterminate.
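Here's a sketch of the kind of case meant: a classic example where the limit differs along different paths, so the multivariate limit is reported as indeterminate:

Limit[(x^2 y)/(x^4 + y^2), {x, y} -> {0, 0}]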

Version 11.2 also introduces two other convenient mathematical constructs: MaxLimit and MinLimit (sometimes known as lim sup and lim inf). Ordinary limits have a habit of being indeterminate whenever things get funky, but MaxLimit and MinLimit have definite values, and are what come up most often in applications.

So, for example, there isn’t a definite ordinary limit here:

Limit[Sin[x] + Cos[x/4], x -> Infinity]


But there’s a MaxLimit, that turns out to be a complicated algebraic number:

MaxLimit[Sin[x] + Cos[x/4], x -> \[Infinity]] // FullSimplify

N[%]


Another new construct in 11.2 is DiscreteLimit, which gives limits of sequences. Here it’s illustrating the Prime Number Theorem:

DiscreteLimit[Prime[n]/(n*Log[n]), n -> Infinity]


And here it’s giving the limiting value of the solution to a recurrence relation:

DiscreteLimit[
 RSolveValue[{x[n + 1] == Sqrt[1 + x[n] + 1/x[n]], x[1] == 3}, x[n], 
  n], n -> \[Infinity]]

All Sorts of New Data

There’s always new data in the Wolfram Knowledgebase—flowing every second from all sorts of data feeds, and systematically being added by our curators and curation systems. The architecture of our cloud and desktop system allows both new data and new types of data (as well as natural language input for it) to be immediately available in the Wolfram Language as soon as it’s in the Wolfram Knowledgebase.

And between Version 11.1 and Version 11.2, there’ve been millions of updates to the Knowledgebase. There’ve also been some new types of data added. For example—after several years of development—we’ve now got well-curated data on all notable military conflicts, battles, etc. in history:


Entity["MilitaryConflict", "SecondPunicWar"][
 EntityProperty["MilitaryConflict", "Battles"]]
GeoListPlot[%]


Another thing that’s new in 11.2 is greatly enhanced predictive caching of data in the Wolfram Language—making it much more efficient to compute with large volumes of curated data from the Wolfram Knowledgebase.

By the way, Version 11.2 is the first new version to be released since the Wolfram Data Repository was launched. And through the Data Repository, 11.2 has access to nearly 600 curated datasets across a very wide range of areas. 11.2 also now supports functions like ResourceSubmit, for programmatically submitting data for publication in the Wolfram Data Repository. (You can also publish data yourself just using CloudDeploy.)
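Accessing a repository dataset is a one-liner (the dataset name here is just one example of a repository entry):

ResourceData["Fireballs and Bolides"]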

There’s a huge amount of data and types of computations available in Wolfram|Alpha—that with great effort have been brought to the level where they can be relied on, at least for the kind of one-shot usage that’s typical in Wolfram|Alpha. But one of our long-term goals is to take as many areas as possible and raise the level even higher—to the point where they can be built into the core Wolfram Language, and relied on for systematic programmatic usage.

In Version 11.2 an area where this has happened is ocean tides. So now there’s a function TideData that can give tide predictions for any of the tide stations around the world. I actually found myself using this function in a recent livecoding session I did—where it so happened that I needed to know daily water levels in Aberdeen Harbor in 1913. (Watch the Twitch recording to find out why!)


TideData[Entity[
  "City", {"Aberdeen", "Maryland", "UnitedStates"}], "WaterLevel", 
 DateRange[DateObject[{1913, 1, 1}], DateObject[{1913, 12, 31}], 
  "Day"]]
DateListPlot[%]


GeoImage

GeoGraphics and related functions have built-in access to detailed maps of the world. They’ve also had access to low-resolution satellite imagery. But in Version 11.2 there’s a new function GeoImage that uses an integrated external service to provide full-resolution satellite imagery:


GeoImage[GeoDisk[Entity["Building", "ThePentagon::qzh8d"], 
  Quantity[0.4, "Miles"]]]
GeoImage[GeoDisk[Entity["Building", "Stonehenge::46k59"], 
  Quantity[250, "Feet"]]]

I’ve ended up using GeoImage in each of the two livecoding sessions I did just recently. Yes, in principle one could go to the web and find a satellite image of someplace, but it’s amazing what a different level of utility one reaches when one can programmatically get the satellite image right inside the Wolfram Language—and then maybe feed it to image processing, or visualization, or machine-learning functions. Like here’s a feature space plot of satellite images of volcanos in California:


FeatureSpacePlot[
 GeoImage /@ 
  GeoEntities[
   Entity["AdministrativeDivision", {"California", "UnitedStates"}], 
   "Volcano"]]

We’re always updating and adding all sorts of geo data in the Wolfram Knowledgebase. And for example, as of Version 11.2, we’ve now got high-resolution geo elevation data for the Moon—which came in very handy for our recent precision eclipse computation project.


ListPlot3D[
 GeoElevationData[
  GeoDisk[Entity["MannedSpaceMission", "Apollo15"][
    EntityProperty["MannedSpaceMission", "LandingPosition"]], 
   Quantity[10, "Miles"]]], Mesh -> None]

Visualization

One of the obvious strengths of the Wolfram Language is its wide range of integrated and highly automated visualization capabilities. Version 11.2 adds some convenient new functions and options. An example is StackedListPlot, which, as its name suggests, makes stacked (cumulative) list plots:

StackedListPlot[RandomInteger[10, {3, 30}]]


There’s also StackedDateListPlot, here working with historical time series from the Wolfram Knowledgebase:


StackedDateListPlot[
 EntityClass[
  "Country", {
   EntityProperty["Country", "Population"] -> TakeLargest[10]}][
  Dated["Population", All],
    "Association"], PlotLabels -> Automatic]

StackedDateListPlot[
 EntityClass[
  "Country", {
   EntityProperty["Country", "Population"] -> TakeLargest[10]}][
  Dated["Population", All],
  "Association"], PlotLabels -> Automatic, PlotLayout -> "Percentile"]

One of our goals in the Wolfram Language is to make good stylistic choices as automatic as possible. And in Version 11.2 we’ve, for example, added a whole collection of plot themes for AnatomyPlot3D. You can always explicitly give whatever styling you want. But we provide many default themes. You can pick a classic anatomy book look (by the way, all these 3D objects are fully manipulable and computable):

AnatomyPlot3D[Entity["AnatomicalStructure", "LeftHand"], 
 PlotTheme -> "Classic"]

Or you can go for more of a Gray’s Anatomy look:

AnatomyPlot3D[Entity["AnatomicalStructure", "LeftHand"], 
 PlotTheme -> "Vintage"]

Or you can have a “scientific” theme that tries to make different structures as distinct as possible:

AnatomyPlot3D[Entity["AnatomicalStructure", "LeftHand"], PlotTheme -> "Scientific"]

3D Computational Geometry

The Wolfram Language has very strong computational geometry capabilities—that work on both exact surfaces and approximate meshes. It’s a tremendous algorithmic challenge to smoothly handle constructive geometry in 3D—but after many years of work, Version 11.2 can do it:

RegionIntersection[MengerMesh[2, 3], 
 BoundaryDiscretizeRegion[Ball[{1, 1, 1}]]]

And of course, everything fits immediately into the rest of the system:

Volume[%]

More Audio

Version 11 introduced a major new framework for large-scale audio processing in the Wolfram Language. We’re still developing all sorts of capabilities based on this framework, especially using machine learning. And in Version 11.2 there are a number of immediate enhancements. There are very practical things, like built-in support for AudioCapture under Linux. There’s also now the notion of a dynamic AudioStream, whose playback can be programmatically controlled.

Another new function is SpeechSynthesize, which creates audio from text:

SpeechSynthesize["hello"]

Spectrogram[%]

Capture the Screen

The Wolfram Language tries to let you get data wherever you can. One capability added for Version 11.2 is being able to capture images of your computer screen. (Rasterize has been able to rasterize complete notebooks for a long time; CurrentNotebookImage now captures an image of what’s visible from a notebook on your screen.) Here’s an image of my main (first) screen, captured as I’m writing this post:

CurrentScreenImage[1]

CurrentScreen output

Of course, I can now do computation on this image, just like I would on any other image. Here’s a map of the inferred “saliency” of different parts of my screen:

ImageSaliencyFilter[CurrentScreenImage[1]]//Colorize

Language Features

Part of developing the Wolfram Language is adding major new frameworks. But another part is polishing the system, and implementing new functions that make doing things in the system ever easier, smoother and clearer.

Here are a few functions we’ve added in 11.2. The first is simple, but useful: TakeList—a function that successively takes blocks of elements from a list:

TakeList[Alphabet[], {2, 5, 3, 4}]

Then there’s FindRepeat (a “colleague” of FindTransientRepeat), that finds exact repeats in sequences—here for a Fibonacci sequence mod 10:

FindRepeat[Mod[Array[Fibonacci, 500], 10]]

Here’s a very different kind of new feature: an addition to Capitalize that applies the heuristics for capitalizing “important words” to make something “title case”. (Yes, for an individual string this doesn’t look so useful; but it’s really useful when you’ve got 100 strings from different sources to make consistent.)

Capitalize["a new kind of science", "TitleCase"]

Talking of presentation, here’s a simple but very useful new output format: DecimalForm. Numbers are normally displayed in scientific notation when they get big, but DecimalForm forces “grade school” number format, without scientific notation:

Table[16.5^n, {n, 10}]

DecimalForm[Table[16.5^n, {n, 10}]]

Another language enhancement added in 11.2—though it’s really more of a seed for the future—is TwoWayRule, input as <->. Ever since Version 1.0 we’ve had Rule (->), and over the years we’ve found Rule increasingly useful as an inert structure that can symbolically represent diverse kinds of transformations and connections. Rule is fundamentally one-way: “left-hand side goes to right-hand side”. But one also sometimes needs a two-way version—and that’s what TwoWayRule provides.

Right now TwoWayRule can be used, for example, to enter undirected edges in a graph, or pairs of levels to exchange in Transpose. But in the future, it’ll be used more and more widely.

Graph[{1 <-> 2, 2 <-> 3, 3 <-> 1}]
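
TwoWayRule also covers the Transpose case mentioned above. As a small sketch (the array here is just an illustrative example), exchanging levels 2 and 3 of a 2×3×4 array gives a 2×4×3 array:

Dimensions[Transpose[Array[a, {2, 3, 4}], 2 <-> 3]]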

11.2 has all sorts of other language enhancements. Here's an example of a different kind: the functions StringToByteArray and ByteArrayToString, which handle the somewhat tricky issue of converting between raw byte arrays and strings with various encodings (like UTF-8).
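
As a quick sketch (the string is just an example), a round trip through UTF-8 bytes looks like this:

bytes = StringToByteArray["Größe", "UTF-8"]

ByteArrayToString[bytes, "UTF-8"]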

Initialization & System Operations

How do you get the Wolfram Language to automatically initialize itself in some particular way? All the way from Version 1.0, you've been able to set up an init.m file to run at initialization time. But now, in Version 11.2, there's finally a much more general and programmatic way of doing this—using InitializationValue and related constructs.

It’s made possible by the PersistentValue framework introduced in 11.1. And what’s particularly nice about it is that it allows a whole range of “persistence locations”—so you can store your initialization information on a per-session, per-computer, per-user, or also (new in 11.2) per-notebook way.
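
Here's a minimal sketch of how this can look (the symbol, the name and the values are just made up for illustration, and which persistence location to use depends on what you want):

(* register a value to be assigned when initialization happens *)
InitializationValue[myDefaultPath] = "/data/projects";

(* the underlying PersistentValue mechanism, with an explicit persistence location *)
PersistentValue["myCounter", "Local"] = 42;
PersistentValue["myCounter", "Local"]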

Talking about things that go all the way back to Version 1.0, here's a little story. Back in Version 1.0, Mathematica (as it then was) pretty much always used to display how much memory was still available on your computer (and, yes, you had to be very careful back then because there usually wasn't much). Well, somewhere along the way, as virtual memory became widespread, people started thinking that "available memory" didn't mean much, and we stopped displaying it. But now, after it's been gone for 25+ years, modern operating systems have let us bring it back—and there's a new function MemoryAvailable in Version 11.2. And, yes, for my computer the result has gained about 5 digits relative to what it had in 1988:

MemoryAvailable[]

Unified Asynchronous Tasks

There’ve been ways to do some kinds of asynchronous or “background” tasks in the Wolfram Language for a while, but in 11.2 there’s a complete systematic framework for it. There’s a thing called TaskObject that symbolically represents an asynchronous task. And there are basically now three ways such a task can be executed. First, there’s CloudSubmit, which submits the task for execution in the cloud. Then there’s LocalSubmit, which submits the task to be executed on your local computer, but in a separate subkernel. And finally, there’s SessionSubmit, which executes the task in idle time in your current Wolfram Language session.

When you submit a task, it’s off getting executed (you can schedule it to happen at particular times using ScheduledTask). The way you “hear back” from the task is through “handler functions”: functions that are set up when you submit the task to “handle” certain events that can occur during the execution of the task (completion, errors, etc.).

There are also functions like TaskSuspend, TaskAbort, TaskWait and so on, that let you interact with tasks “from the outside”. And, yes, when you’re doing big machine-learning trainings, for example, this comes in pretty handy.
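
Here's a small sketch of the session case (the computation is just an example, and the handler event key "TaskFinished" is my reading of the documented defaults):

(* run a computation in idle time in the current session, printing a message when it completes *)
task = SessionSubmit[Total[Range[10^7]], 
   HandlerFunctions -> <|"TaskFinished" -> (Print["task done"] &)|>];

(* block until the task has finished *)
TaskWait[task]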

Connectivity

We're always keen to make the Wolfram Language as connected as it can be. And in Version 11.2 we've added a variety of features to achieve that. In Version 11 we introduced the Authentication option, which lets you give credentials in functions like URLExecute. Version 11 already allowed for PermissionsKey (a.k.a. an "app id"). In 11.2 you can now give an explicit username and password, and you can also use SecuredAuthenticationKey to provide OAuth credentials. It's tricky stuff, but I'm pleased with how cleanly we're able to represent it using the symbolic character of the Wolfram Language—and it's really useful when you're, for example, actually working with a bunch of internal websites or APIs.
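
Here's a sketch of what that looks like (the URL and credentials are, of course, made up):

URLExecute["https://intranet.example.com/api/report", 
 Authentication -> <|"Username" -> "jsmith", "Password" -> "xyz123"|>]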

Back in Version 10 (2014) we introduced the very powerful idea of using APIFunction to provide a symbolic specification for a web API—that could be deployed to the cloud using CloudDeploy. Then in Version 10.2 we introduced MailReceiverFunction, which responds not to web requests, but instead to receiving mail messages. (By the way, in 11.2 we’ve considerably strengthened SendMail, notably adding various authentication and address validation capabilities.)
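
For context, the basic pattern looks like this (the deployment path name is just an example):

CloudDeploy[APIFunction[{"x" -> "Number"}, #x^2 &], "api/square"]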

In Version 11, we introduced the channel framework, which allows for publish-subscribe interactions between Wolfram Language instances (and external programs)—enabling things like chat, as well as a host of useful internal services. Well, continuing our path of automation and unification, in 11.2 we're introducing ChannelReceiverFunction—which can be deployed to the cloud to respond to whatever messages are sent on a particular channel.

In the low-level software engineering of the Wolfram Language we've used sockets for a long time. A few years ago we started exposing some socket functionality within the language. And now in 11.2 we have a full socket framework, supporting both traditional TCP sockets and modern ZeroMQ sockets.
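
Here's a small sketch of the TCP side (the host is just an example, and reading the full response this way assumes the server closes the connection):

socket = SocketConnect["example.com:80"];
WriteString[socket, "HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"];
ReadString[socket]
Close[socket]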

External Programs

Ever since the beginning, the Wolfram Language has been able to communicate with external C programs—using its native WSTP (Wolfram Symbolic Transfer Protocol) for symbolic expression transfer. Years ago, J/Link and .NET/Link enabled seamless connection to Java and .NET programs. RLink did the same for R. Then there are things like LibraryLink, which allow direct connection to DLLs—or RunProcess for running programs from the shell.

But 11.2 introduces a new form of external program communication: ExternalEvaluate. ExternalEvaluate is for doing computation in languages which—like the Wolfram Language—support REPL-style input/output. The first two examples available in 11.2 are Python and NodeJS.

Here’s a computation done with NodeJS—though this would definitely be better done directly in the Wolfram Language:

ExternalEvaluate["NodeJS", "Math.sqrt(50)"]

Here’s a Python computation (yes, it’s pretty funky to use & for BitAnd):

ExternalEvaluate["Python", "[ i & 10 for i in range(10)]"]

Of course, the place where things start to get useful is when one’s accessing large external code bases or libraries. And what’s nice is that one can use the Wolfram Language to control everything, and to analyze the results. ExternalEvaluate is in a sense a very lightweight construct—and one can routinely use it even deep inside some piece of Wolfram Language code.

There’s an infrastructure around ExternalEvaluate, aimed at connecting to the correct executable, appropriately converting types, and so on. There’s also StartExternalSession, which allows you to start a single external session, and then perform multiple evaluations in it.
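
Here's a sketch of a persistent session (the Python code is just an example):

session = StartExternalSession["Python"];
ExternalEvaluate[session, "import math"];
ExternalEvaluate[session, "math.factorial(20)"]

DeleteObject[session]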

The Whole List

So is there still more to say about 11.2? Yes! There are lots of new functions and features that I haven’t mentioned at all. Here’s a more extensive list:

New features

But if you want to find out about 11.2, the best thing to do is to actually run it. I've been running pre-release versions of 11.2 on my personal machines for a couple of months. So by now I'm taking the new features and functions quite for granted—even though, earlier on, I kept on saying "this is really useful; how could we have not had this for 30 years?". Well, realistically, it's taken building everything we have so far—not only to provide the technical foundations for 11.2, but also to seed its ideas. But now our work on 11.2 is done, and 11.2 is ready to go out into the world—and deliver the latest results from our decades of research and development.

To comment, please visit the copy of this post at the Wolfram Blog »

Are All Fish the Same Shape If You Stretch Them? The Victorian Tale of On Growth and Form

Is there a global theory for the shapes of fishes? It’s the kind of thing I might feel encouraged to ask by my explorations of simple programs and the forms they produce. But for most of the history of biology, it’s not the kind of thing anyone would ever have asked. With one notable exception: D’Arcy Wentworth Thompson.

And it’s now 100 years since D’Arcy Thompson published the first edition of his magnum opus On Growth and Form—and tried to use ideas from mathematics and physics to discuss global questions of biological growth and form. Probably the most famous pages of his book are the ones about fish shapes:

D'Arcy Thompson and fish sketches

Stretch one kind of fish, and it looks like another. Yes, without constraints on how you stretch, it’s not quite clear what this is telling one, and I don’t think it’s much. But just to ask the question is interesting, and On Growth and Form is full of interesting questions—together with all manner of curious and interesting answers.

D’Arcy Thompson was in many ways a quintessential British Victorian academic, steeped in the classics, and writing books with titles like A Glossary of Greek Fishes (i.e. how were fish described in classical Greek texts). But he was also a diligent natural scientist, and he became a serious enthusiast of mathematics and physics. And where Aristotle (whom D’Arcy Thompson had translated) used plain language, with perhaps a dash of logic, to try to describe the natural world, D’Arcy Thompson tried to use the language of mathematics and physics.

At Christmas time, according to his daughter, he used to entertain children by drawing pictures of dogs on rubber sheets and stretching them from poodles to dachshunds. But it was not until the age of 57 that he turned such pursuits into the piece of scholarship that is On Growth and Form.

The first edition of the book was published in 1917. In many ways it’s like a catalog of biological forms—a kind of geometrical analog of Aristotle’s books on natural history. It’s particularly big on aquatic life—from plankton to fish. Land animals do make a showing, though mostly as skeletons. And ordinary plants make only specific appearances. But throughout the book the emphasis is on “why does such-and-such a thing have the form or shape it does?”. And over and over again the answer that’s given is: “because it’s following such-and-such a physical phenomenon, or mathematical structure”.

Much of the story of the book is told in its pictures. There are growth curves—of haddock, trees, regenerated tadpole tails, etc. There’s a long discussion of the shapes of cells—and especially their connection with phenomena (like splashes, bubbles and foams) where surface tension is important. There are spirals—described mathematically, and appearing in shells and horns and leaf arrangements. And finally there’s a long discussion of the “theory of transformations”—about how different forms (like the shapes of fishes or primate skulls) might be related by various (mathematically rather undefined) “transformations”.

On Growth and Form pages

In D’Arcy Thompson’s time—as still to a large extent today—the dominant form of explanation in biology is Darwinism: essentially the idea that things are the way they are because they’ve somehow evolved to be that way, in order to maximize some kind of fitness. D’Arcy Thompson didn’t think that was the whole story, or even necessarily the most important part of the story. He thought instead that many natural forms are the way they are because it’s an inevitable feature of the physics of biological tissue, or the mathematics of geometrical forms.

Sometimes his explanations fall a little flat. Leaves aren’t really shaped much like polar plots of trigonometric functions. Jellyfish aren’t convincingly shaped like drops of ink in water. But what he says often rings true. Hexagonal arrangements of cells are like closest geometrical packings of disks. Sheep horns and nautilus shells form logarithmic (equiangular) spirals.

He uses basic geometry and algebra quite a bit—and even sometimes a little combinatorics or topology. But he never goes as far as calculus (and, as it happens, he never learned it), and he never considers ideas like recursive rules or nested structures. But for me—as for quite a few others over the years—D’Arcy Thompson’s book is an important inspiration for the concept that even though biological forms may at first look complicated, there can still be theories and explanations for them.

In modern times, though, there’s a crucial new idea, that D’Arcy Thompson did not have: the idea of using not traditional mathematics and physics, but instead computation and simple programs as a way to describe the rules by which things grow. And—as I discovered in writing my book A New Kind of Science—it’s remarkable to what extent that idea lets us understand the mechanisms by which complex biological forms are produced, and lets us finish the bold initiative that D’Arcy Thompson began a century ago in On Growth and Form.

Who Was D’Arcy Thompson?

D’Arcy Wentworth Thompson was born in Edinburgh on May 5, 1860. His father, who was also named D’Arcy Wentworth Thompson, had been born in 1829, aboard a ship captained by his father, that was transporting convicts to Tasmania. D’Arcy Senior was soon sent to boarding school in England, and eventually studied classics at Cambridge. Though academically distinguished, he was apparently passed over for a fellowship because of perceived eccentricity—and wound up as a (modernizing, if opinionated) schoolteacher in Edinburgh. Once there, he soon met the lively young Fanny Gamgee, daughter of Joseph Gamgee, an early and distinguished veterinary surgeon—and in 1859 they were married.

D’Arcy (junior) was born the next year—but unfortunately his mother contracted an infection during childbirth, and died within the week. The result was that D’Arcy (junior) ended up living with his mother’s parents, taken care of by one of his mother’s sisters. When D’Arcy (junior) was three years old, his father then got a university professorship (of ancient Greek) in Ireland, and moved there. Still, D’Arcy (junior) stayed in close touch with his father through letters, and, later, visits. And indeed his father seems to have doted on him, for example publishing two children’s books dedicated to him:

Fun and Earnest first pages

In a foreshadowing of his later interests, D’Arcy (junior) learned Latin from his father almost as soon as he was able to speak, and was continually exposed to animals of all sorts in the Gamgee household. There was also a certain math/physics theme. D’Arcy Thompson (senior)’s best friend in Edinburgh was Peter Guthrie Tait—a distinguished mathematical physicist (mechanics, thermodynamics, knot theory, …) and friend of Maxwell, Hamilton and Kelvin—and D’Arcy (junior) often hung out at his house. Joseph Gamgee was also engaged in various scientific pursuits, for example publishing the book On Horseshoeing and Lameness based in part on a statistical study he’d done with the then 10-year-old D’Arcy (junior). Meanwhile, D’Arcy Thompson (senior) began to travel, as D’Arcy (junior) would later do, for example visiting Harvard in 1867 to give the Lowell Lectures—which D’Arcy (junior) would also give, in 1936, 69 years later.

At the age of 11, D'Arcy went to the school where his father had previously taught. He did well in academic studies, but also organized a natural history ("Eureka") club, where he and his friends collected all sorts of specimens. And by the end of his time at school, he published his first paper: the 11-page (with photographs) "Note on Ulodendron and Halonia", describing the regular pattern of growth scars on two kinds of fossil plants.

At 18, D’Arcy started at Edinburgh University as a medical student. His grandfather—while distinguished—was not wealthy, with the result that D’Arcy had to support himself by tutoring Greek and writing articles for the Edinburgh-published Encyclopedia Britannica (the 9th edition, from 1889, contains an extensive article by D’Arcy on John Ray, a British naturalist of the 1600s). But D’Arcy’s real passion at the time was the then-hot field of paleontology, and after two years he abandoned his medical studies—and left to instead study Natural Science at the place his father had been years earlier: Trinity College, Cambridge.

D'Arcy did well at Cambridge, had an interesting circle of friends (including the future co-author of Principia Mathematica, Alfred North Whitehead), and quickly became something of a fixture in the local natural history scene. This led Macmillan & Co. to commission D'Arcy (still an undergraduate) to produce his first book: a translation from German of Hermann Müller's The Fertilisation of Flowers. The publisher thought that the book—which was a fairly traditional work of descriptive natural history, based in part on observing about 14,000 visits of insects to flowers—would be of popular interest, and (in one of his last published appearances) got no less than Charles Darwin to write a preface for it:

Fertilisation of Flowers 1
Fertilisation of Flowers 2
Fertilisation of Flowers 3
Fertilisation of Flowers 4

At Cambridge, D’Arcy hung out a lot at the new Museum of Zoology, and was particularly influenced by a young professor named Frank Balfour who studied comparative embryology, and for whom a new Department of Animal Morphology was being created—but who died trying to climb Mont Blanc right when D’Arcy was finishing Cambridge.

D’Arcy began to pursue all sorts of projects, giving lectures on topics such as “Aristotle on Cephalopods”, and making detailed studies of “hydroid zoophyte” specimens (aquatic animals like sea anemones that look like plants) brought back from expeditions to the Arctic and Antarctic. He applied for a fellowship in Cambridge, but—like his father before him—didn’t get it.

In 1884, though, the newly created and new-thinking (non-religious, co-ed, young professors, …) University College in Dundee, Scotland, advertised for a professor of biology (yes, combining zoology and botany!). D’Arcy applied, and got the job—with the result that at age 24 he became a professor, a role in which he would remain for nearly 64 years.

D’Arcy the Professor

D’Arcy was immediately popular as a teacher, and continued to do a certain amount of rather dry academic work (in 1885 he published A Bibliography of Protozoa, Sponges, Coelenterata, and Worms, which was, as advertised, a list of about 6000 publications on those subjects between 1861 and 1883). But his real passion was the creation of his own Museum of Zoology, and the accumulation of specimens for it.

He was soon excitedly writing that “within the last week, I have had a porpoise, two mongooses, a small shark, an eel 8ft long… a young ostrich and two bagfuls of monkeys: all dead of course.” His archive (among its 30,000 items) contains extensive evidence of all sorts of trading of specimens from around the world:

Trading notes 1
Trading notes 2
Trading notes 3
Trading notes 4
Trading notes 5
Trading notes 6
Trading notes 7
Trading notes 8

But in Dundee he found a particularly good local source of specimens. Dundee had long been a center of international textile trade, and had also developed a small whaling industry. And when it was discovered that by mixing jute with whale oil it could be turned into fabric, whaling in Dundee grew dramatically.

Some of the hunting they did was local. But whaling ships from Dundee went as far as Canada and Greenland (and once even to Antarctica). And befriending their captains, D’Arcy persuaded them to bring him back specimens (as skeletons, in jars, etc.) from their expeditions—with the result, for example, that his museum rapidly accumulated the best arctic collection around.

The museum always operated on a shoestring budget, and it was typical in 1886 when D’Arcy wrote that he’d personally been “working all day on a baby Ornithorhynchus” (platypus). In his early years as a professor, D’Arcy published only a few papers, mostly on very detailed matters—like the strangely shaped stomach of a type of possum, or the structure of the porpoise larynx, or the correct taxonomic placement of a duck-like dinosaur. And he always followed the prevailing Darwinian paradigm of trying to explain things either by their evolutionary connections, or by their fitness for a particular function.

Museum specimen

The Matter of the Alaskan Seals

In Dundee, D’Arcy joined various local clubs, like the Dundee Naturalists’ Society, the Dundee Working Men’s Field Club, the Homeric Club, and, later, also the Freemasons. He became quite active in university and community affairs, notably campaigning for a medical school (and giving all sorts of statistical evidence for its utility), as well as for education for the local poor. But mostly D’Arcy lived the life of an academic, centered around his teaching and his museum.

Still, as a responsible member of the community, he was called on in various ways, and in 1892, he joined his first government commission—formed to investigate a plague of voles in Scotland (conclusions included: “don’t shoot hawks and owls that eat voles”, and “it’s probably not a good idea to set loose a ‘virus’ to infect the voles”). Then in 1896—at the age of 36—D’Arcy was tapped for a piece of international scientific diplomacy.

It all had to do with seals, and the fur trade based on them. When Russia sold Alaska to the US in 1867 it also sold the rights to the seals which bred on some of the islands in the Bering Sea. But by the 1890s Canadian ships (under British protection) were claiming the right to catch seals in the open ocean—and too many seals were being killed for the population to be maintained. In 1893 a treaty was made to clarify the situation. But in 1896 there was a need to analyze more carefully what was going on (and, yes, to claim what ended up being $10M in damages for Canadian/British sealers).

Lord Salisbury, the British Prime Minister at the time, who happened to be an amateur botanist, knew of D’Arcy and asked him to travel to the Bering Sea to investigate. D’Arcy had by that point traveled a bit around Europe, but this was a complex trip. At first he went to Washington, DC, dropping in at the White House. Then across Canada, and then by Coast Guard ship (and dog sled) to the seals.

North American trip

D’Arcy did well at making friends with his American counterparts (who included the president of the then-a-decade-old Stanford University), and found that at least on the American-controlled islands (the Russian-controlled ones were a different story) seals were being herded a bit like sheep in Scotland, and that though there was “abundant need for care and prudent measures of conservation”, things were basically OK. In Washington, DC, D’Arcy gave a long speech, and helped broker a kind of “seal peace treaty”—that the British government was pleased enough with to give D’Arcy a (medieval-inspired) “Companion of the Bath” honor.

Statesman of Science

Being a professor in Dundee wasn’t a particularly high position in the pecking order of the time. And after his Bering Sea experience, D’Arcy started investigating moving up. He applied for various jobs (for example at the Natural History Museum in London), but perhaps in part because he didn’t have fancier academic credentials (like a PhD)—and also had spent so much of his time organizing things rather than doing research—he never got any of them.

He was nevertheless increasingly sought after as a kind of statesman of science. And in 1898 he was appointed to the Fishery Board for Scotland (a role in which he continued for 43 years), and the next year he was the British delegate to the first International Conference on Oceanography.

D’Arcy was a serious collector of data. He maintained a team of people at the fish market, keeping track of the catches brought in from boats:

Tide log

And then he took this data and created graphics and statistical analyses:

Thompson fish catch graph

And over the years he became well known as a negotiator of fishing rights, both locally and internationally. He was also a collector of oceanographic data. He saw to it that there were detailed tide measurements made:

Oceanographic data

And had the data analyzed and decomposed into harmonic components—much as it is today:

Data analysis

The Scottish government even provided a research ship for him (a steam trawler named the SS Goldseeker), in which he and his students would go around the Scottish coast, measuring ocean properties and collecting specimens.

D’Arcy the Classical Scholar

D’Arcy always had many interests. First and foremost was natural history. But after that came classics. And indeed, back in his undergraduate days, D’Arcy had already started working with his classicist father on translating Aristotle’s works on natural history into English.

One of the complexities of that task, however, was to know what species Aristotle meant by words he used in Greek. And this led D’Arcy into what became a lifelong project—the first output of which was his 1894 book Glossary of Greek Birds:

Greek Birds 1
Greek Birds 2
Greek Birds 3
Greek Birds 4

It’s an interesting exercise—trying to fit together clues to deduce just what modern bird some passage in classical Greek literature was talking about. Often D’Arcy succeeds. Sometimes by using natural history; sometimes by thinking about mythology or about configurations of things like constellations named for birds. But sometimes D’Arcy just has to describe something as “a remarkable bird, of three varieties, of which one croaks like a frog, one bleats like a goat, and the third barks like a dog”—and he doesn’t know the modern equivalent.

Over the years, D’Arcy continued his efforts to translate Aristotle, and finally in 1910 (8 years after his father’s death) he was able to publish what remains to this day the standard translation of Aristotle’s main work on zoology, his History of Animals.

This project established D’Arcy as a classical scholar—and in 1912 he even got an honorary PhD (D.Litt.) at Cambridge on the basis of it. He also began a long association with what’s known as Liddell & Scott, the still-standard dictionary of ancient Greek. (Liddell had been notable for being the father of Alice, of Wonderland fame.)

But D’Arcy’s interests in Greek science extended beyond natural history, and into astronomy and mathematics. D’Arcy explored such things as ancient methods for computing square roots—and also studied Greek geometry.

So in 1889, when D’Arcy was investigating Foraminifera (protozoa that live in sediment or in the ocean and often form spiral shells) he was able to bring his knowledge of Greek mathematics to bear, declaring that “I have taken to Mathematics… and discovered some unsuspected wonders in regard to the Spirals of the Foraminifera!”.

Towards Something Bigger

When he was 41, in 1901, D’Arcy married his stepmother’s niece, the then 29-year-old Ada Maureen Drury (yes, small world that it is, she was named after “Byron’s” Ada, because an ancestor had reputedly been a romantic interest of Byron’s). They bought a small house somewhat outside of town—and between 1902 and 1910 they had three children, all daughters.

By 1910, D’Arcy was 50 years old, and an elder statesman of science. He kept himself busy teaching, managing his museum, doing administrative and government work, and giving public lectures. A typical lecture—given at Oxford in 1913—was entitled “On Aristotle as a Biologist”. It was charming, eloquent, ponderous and Victorian:

Thompson lecture excerpt

In many ways, D’Arcy was first and foremost a collector. He collected natural history specimens. He collected Greek words. He collected academic references—and antiquarian books. And he collected facts and statements—many of which he typed onto index cards, now to be found in his archive:

Index cards 1
Index cards 2

Still, in his role as elder statesman, D’Arcy was called upon to make broad pronouncements. And in many ways the great achievement of the later part of his life was to connect the disparate things he collected and identify common themes that could connect them.

In 1908 he had published (in Nature) a 2-page paper entitled “On the Shapes of Eggs and the Causes Which Determine Them”. In a sense the paper was about the physics of egg formation. And what was significant was that instead of accounting for different egg shapes in terms of their evolutionary fitness, it talked about the physical mechanisms that could produce them.

Three years later D’Arcy gave a speech entitled “Magnalia Naturae: or the Greater Problems of Biology” in which he took this much further, and started discussing “the possibility of… supporting the observed facts of organic form on mathematical principles [so as to make] morphology… a true natural science… justified by its relation to mathematics”.

Magnalia Naturae

In 1910, Cambridge University Press had asked D’Arcy if he’d like to write a book about whales. He said that instead perhaps he should write a “little book” about “The Forms of Organisms” or “Growth and Form”—and he began the process of assembling what would become On Growth and Form. The book had elements that drew on D’Arcy’s whole range of interests. His archives contain some of what went into the assembly of the book, like the original drawings of fish-shape transformations (D’Arcy wasn’t a great sketch artist):

Fish sketches

There were also other, more impressionistic images—like the one illustrating transformations between zebra-related animals (quagga, etc.) or one showing tortoise (?) shell structure:

Sketches 1
Sketches 2

D’Arcy didn’t contact his publisher again for several years, but in 1915—in the middle of World War I—he wrote them again, saying that he had finally finished the book “on a larger scale”, and soon signed a publishing contract (that’s shockingly similar to the way modern ones look):

D'Arcy Thompson publishing contract

It took a couple more years, between D’Arcy’s last-minute changes, and paper shortages associated with the war—but finally in 1917 the book (which had by then swelled to 800 pages) was published.

The Book

On Growth and Form opens with a classic D’Arcy “Prefatory Note”: “This book of mine has little need of preface, for indeed it is ‘all preface’ from beginning to end.” He goes on to apologize for his lack of mathematical skill—and then launches, beginning with a discussion of the relation of the philosophies of Kant and Aristotle on the nature of science.

D'Arcy Thompson prefatory note

The reviews were positive, and surprisingly sensible, with the Times Literary Supplement for example writing:

On Growth and Form review

Further into Mathematics

D’Arcy was 57 years old by the time On Growth and Form was published—and he could have used it as a closing act in his career. But instead it seemed to make him more energetic—and seemed to encourage him to take mathematical methods as a kind of personal theme.

In his study of the shapes of biological cells, D’Arcy had gotten very interested in polyhedra and packings, and particularly in Archimedean solids (such as the tetrakaidecahedron). His archives contain all sorts of investigations of possible packings and their properties, together with actual cardboard polyhedra, still ready to assemble:

Shape sketch 1
Shape sketch 2
Shape sketch 3
Shape sketch 4
Shape sketch 5
Shape sketch 6
Shape sketch 7
Shape sketch 8

D’Arcy extended his interest in number theory, collecting properties of numbers a little like he’d collected so many other things:

Number theory notes 1
Number theory notes 2
Number theory notes 3
Number theory notes 4

He dipped into chemistry, thinking about it in terms of graphs, like those derived from polyhedra:

Chemistry notes 1
Chemistry notes 2

And even when he worked on history, D’Arcy used mathematical thinking, here studying the distribution of when famous people lived, in connection with writing about the Golden Ages:

Golden Age notes 1
Golden Age notes 2

As an administrator he brought in math as well, here analyzing what in today’s world would be called a grading curve—and comparing exam results between different years:

Learning curve 1
Learning curve 2
Learning curve 3

He worked extensively on tides and tide computations. He collected data from harbors. And came up with theories about the various components of tides, some of which turned out to be correct:

Tide notes 1
Tide notes 2
Tide notes 3

The mathematics he used was always a bit circumscribed—and for example he never learned calculus, even to the point of apparently getting confused about growth rates versus finite differences in plots in On Growth and Form. (There seems to be just a single sheet of calculus-like work by him in his archives, and it’s simply an exercise copied without solution from the famous Whittaker & Watson textbook.)

But what about systems based on pure computational rules—of the kind that, for example, I have spent so much time studying? Well, in the archive there are things like this—perhaps a version of a space-filling curve:

Computational rules 1
Computational rules 2

And back from 1897 there’s a curious cardboard object that D’Arcy described as a “reasoning machine”:

Reasoning machine 1
Reasoning machine 2

It’s not completely clear what this was (though its wheel still turns nicely!). It seemed to involve a diagrammatic way of determining the truth value of a logical expression, perhaps following the work of Jevons from a couple of decades earlier. But so far as I can tell it was D’Arcy’s sole excursion into the world of logic and rule-based processes—and he never connected anything like this to biology.

The Later D’Arcy

Before On Growth and Form, D’Arcy had published only quite sporadically. But after it, as he entered his sixties, he began to write prodigiously, publishing all over the place on a wide range of topics. He gave lectures, in person and on the radio. And he also began to receive all sorts of honors (he became Sir D’Arcy in 1937)—and was invited to events all over the world (he did a grand tour of the US in the 1930s, and was also received as a celebrity in places like the Soviet Union).

On Growth and Form was considered a commercial success. Its original print run was 500 copies (of which at least 113 are now in academic libraries around the world), and by 1923 it had sold out. The publisher (Cambridge University Press) wanted to reprint it. But D’Arcy insisted that it needed to be revised—and in the end it took until 1942 before he got the revisions done. The second edition added 300 pages to the book—including photographs of splashes (obtained directly from Harold Edgerton at MIT), analysis of teeth, and patterns on animal coats. But the main elements of the book remained exactly the same.

D’Arcy had published a second edition of his Glossary of Greek Birds in 1936 (more birds, more interpretations), and in 1947, based on notes he started collecting in 1879, he released a kind of sequel: his Glossary of Greek Fishes. (Oxford University Press, in the flap copy for the book, says charmingly that “… it is highly improbable that there is any other scholar who has studied Greek fishes over so long a period as Sir D’Arcy Thompson…”.)

Even into his eighties, D’Arcy continued to travel all over the place—with his archives containing some typical travel documents of the time:

Travel document 1
Travel document 2
Travel document 3
Travel document 4
Travel document 5
Travel document 6

His travel was interrupted by World War II (which is perhaps why the second edition of On Growth and Form finally got finished in 1942). But in 1947, with the war over, at the age of 87, D’Arcy went to India for several months, notably lecturing on the skeletal structure of birds while holding a somewhat impatient live hen in a box. But in India D’Arcy’s health began to fail, and after returning to Scotland, he died in June 1948—to the last corresponding about specimens for his museum.

Thompson biography

Aftermath

D'Arcy's wife (who seemed in frail health through much of her 47-year marriage to D'Arcy) lived on for only 7 months after his death. None of D'Arcy's daughters ever married. His oldest daughter Ruth became a music teacher and administrator at a girls' boarding school, and in 1958 (when she was 56) published a biography of D'Arcy.

His middle daughter Molly moved to South Africa, wrote children’s and travel books, and lived to the age of 101, dying in 2010—while his youngest daughter Barbara wrote a book on healing and herbalism and died in a freak river accident in 1990.

On Growth and Form was D’Arcy’s most notable output, and it has been reprinted many times over the course of a hundred years. The museum D’Arcy created in Dundee was largely dismantled in the 1950s, but has now been to some extent reconstituted, complete with some of the very specimens D’Arcy collected, with labels duly signed “DWT” (yup, that’s me next to the same orangutan as in the old picture of the museum):

Crocodile skull
Specimen jars
Stephen Wolfram with primate skeleton

In 1917 D’Arcy moved from Dundee to the nearby but more distinguished and ancient university in St Andrews, where he took over another museum. It too fell upon hard times, but still exists in a reduced form.

Crab
Birds

And now some of D'Arcy's specimens are being 3D-scanned (yes, that's the same crocodile):

Crocodile skull 3D scan

And on a main street in St. Andrews there’s still a plaque where D’Arcy lived:

Thompson home plaque

What Was D’Arcy Like?

Thompson portrait

D’Arcy had an imposing physical presence. He stood 6’3” and had a large head, on which he often wore a black fedora. He had piercing blue eyes, and in his youth, he had red hair—which he grew into a large beard when he was a young professor. He often wore a long coat, which could sometimes seem moth eaten. Later in his life, he would sometimes walk around town with a parrot on his shoulder.

He was renowned as an engaging speaker and lecturer—known both for his colorful and eloquent content (he could regale the audience with the tale of a walrus he had known, or equally well discuss Aristotle’s year by the seaside), and for the various physical (and biological) demonstrations he would use. Many stories are told of his eccentricities, especially by his former students. It is said, for example, that he once came to give a lecture to his students which began with him pulling a dead frog out of one pocket of his coat—and then a live one out of the other pocket. Despite having spent most of his life in Scotland, he didn’t have a Scottish accent.

He was charming and jolly, and even in his eighties he was given to dancing when he could. He was tactful and diplomatic, if not particularly good at sensing other people’s opinions. He presented himself with a certain modesty (for example always expressing his weakness in mathematics), and—perhaps to his detriment—did little to advocate for himself.

He led a fairly simple life, centered around his work and family. He worked hard, typically until midnight each day. He always liked to learn. He enjoyed children and the young, and would happily play with them. When he walked around town, he was universally recognized (the shoulder parrot helped!). He was happy to chat with anyone, and in later years, he carried candy in his pocket, which he gave out to children he encountered.

D’Arcy was a product of his age, but also of an unusual combination of influences. Like many of the members of his adopted family, D’Arcy aspired to be a scientist. But like his father, he aspired to be a classical scholar. He did diligent and detailed academic work for many years, in natural history, in classics, and in ancient science. But he also enjoyed presentation and lecturing. And it was in large part through his efforts to explain his academic work that he came to make the connections that would lead to On Growth and Form.

What Happened After

If you search the scientific literature today, you'll find about 4000 publications citing On Growth and Form. Their number relative to the total scientific literature has remained remarkably even over the years (with a peak around the publication of the second edition in 1942, and perhaps a dip in the 1960s when genetics began to dominate biology):

On Growth and Form citations graph

There’s quite a diversity in the topics, as this random sample of titles indicates:

Titles referencing On Growth and Form

Most concern specific biological systems; some are more general. Making word clouds from titles by decade, one sees that “growth” is the dominant theme—though centered in the 1990s there are signs of the discussion that was going on about the “philosophy of evolution”, and the interplay between natural selection and “developmental constraints”:

Word clouds - works referencing On Growth and Form

On Growth and Form has never really become mainstream in biology—or any other field. (It didn’t help that by the 1930s, biology was firmly going off in the direction of biochemistry and later molecular biology.) So how have people found out about On Growth and Form?

Indeed, as I write this, I’m wondering: how did I myself find out about On Growth and Form? I can tell I knew about it by 1983, because I referenced it (somewhat casually) in my first long paper about cellular automata and the patterns they generate. I also know that in 1982 I bought a copy of the (heavily abridged) version of On Growth and Form that was available then. (I was thrilled in 1992 when I chanced upon a complete second edition of On Growth and Form in a used bookstore; I’d never seen the whole book before.)

But how did I first become aware of D’Arcy, and On Growth and Form? My first hypothesis today was that it was in 1977, from the historical notes of Benoit Mandelbrot’s Fractals book (yes, D’Arcy had actually used the term “self-similar”, though only in connection with spirals). Then I thought perhaps it might have been around 1980, from the references to Alan Turing’s 1952 paper on the chemical basis of morphogenesis. I wondered if perhaps it was from hearing about catastrophe theory, and the work of René Thom, in the mid-1970s. But my best guess as of now is that it was actually around 1978, from a little book called Patterns in Nature, by a certain Peter S. Stevens, that heavily references On Growth and Form, and that I happened across in a bookstore.

I’ve almost never seen mentions of Patterns in Nature, but in some ways it’s a simplified and modernized On Growth and Form, full of photographs comparing biological and non-biological systems, together with diagrams about how various structures can be built. But what was the path from D’Arcy to Patterns in Nature? It’s a typical kind of history question that comes up.

The first thing I noticed is that Peter Stevens (born 1936) was trained as an architect, and spent most of his career around Harvard. In his book, he thanks his father, Stanley Stevens (1906–1973), who was a psychoacoustics expert, who was at Harvard from 1932 on, and who organized a “Science of Science” interdisciplinary discussion group there. But recall that D’Arcy visited Harvard to give the Lowell Lectures in 1936. So that’s no doubt how Stevens, Sr. knew about him.

But in any case, from his Harvard connections came, I believe, the references to D'Arcy by evolutionary biologist Stephen Jay Gould, and by John Tyler Bonner, who was the person who created the abridged version of On Growth and Form (sadly, omitting for example the chapter on phyllotaxis). I suspect D'Arcy's influence on Buckminster Fuller also came through Harvard connections. And maybe Benoit Mandelbrot heard about D'Arcy there too. (One would think that with On Growth and Form being out there as a published book, there wouldn't be need for word-of-mouth communication, but particularly outside of mainstream areas of science, word of mouth remains surprisingly important.)

But what about Turing? How did he know about D’Arcy? Well, I have at least a guess here. D’Arcy had been good friends in high school with a certain John Scott Haldane, who would go on to be a well-known physiology researcher, and who had a son named J. B. S. Haldane, who became a major figure in evolutionary biology and in the presentation of science to the public. Haldane often referenced D’Arcy, and notably introduced him to Peter Medawar (who would win a Nobel Prize for immunology), of whom D’Arcy (in 1944) would say “I do believe that more than any man you have understood what I have tried to say!”.

Both Medawar and evolutionary biologist (and originator of the term “transhumanism”) Julian Huxley encouraged D’Arcy to think about continuity and gradients in connection with his shape transformations (e.g. of fish). I don’t know the whole story, but I suspect these two connected with C. H. Waddington, a developmental biologist (and inventor of the term “epigenetics”) who interacted with Turing in Cambridge. (Small world that it is, Waddington’s mathematician daughter is married to a distinguished mathematician named John Milnor, with whom I discussed D’Arcy in the early 1980s.) And when Turing came to write about morphogenesis in 1952, he referenced D’Arcy (and Waddington), then proceeded to base his theory on (morphogen) gradients.

In another direction, D’Arcy interacted with early mathematical biologists like Alfred Lotka and Vito Volterra and Nicolas Rashevsky. And though their work was heavily based on differential equations (which D’Arcy didn’t really believe in), he took pains to support them when he could.

On Growth and Form also seems to have been popular in the art and architecture community, with people as diverse as the architects Mies van der Rohe and Le Corbusier, the painter Jackson Pollock and the sculptor Henry Moore mentioning its influence.

Modern Times

So now that it’s been 100 years since On Growth and Form was published, do we finally understand how biological organisms grow? Lots of work has certainly been done at the genetic and molecular scale, and great progress has been made. But when it comes to macroscopic growth, much less has been done. And a large part of the reason, I suspect, is that it’s needed a new paradigm in order to make progress.

D’Arcy’s work was, more than anything, concerned with analogy and (essentially Aristotelian-style) mechanism. He didn’t really pursue traditional “theory” in the sense of the exact sciences. In his day, though, such theory would normally have meant writing down mathematical equations to represent growth, and then solving them to see what would happen.

And the problem is that when one looks at biological forms, they often seem far too complex to be the results of traditional mathematical equations. But starting in the 1950s a new possibility emerged: perhaps one could model biological growth as following not mathematical equations but instead rules like a program for a computer.

And when I started my systematic investigation of the computational universe of possible programs in the early 1980s, I was immediately struck by how “biological” a lot of the forms created, say, by simple cellular automata seemed:

Cellular automata
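
(As an illustrative sketch, not the specific rules shown in the picture above: forms like these can be generated with a single line of Wolfram Language, here using rule 30.)

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]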

And this is how I came to study On Growth and Form. I viewed it almost as a catalog of biological forms—that I wondered if one could explain with computational rules. I even started collecting specimens—in a very pale shadow of D’Arcy’s efforts (and with no animal skeletons!):

Seashells 1
Seashells 2

Occasionally I would find one that just seemed to cry out as being from something like a program:

Shell pattern

But more than that, I kept on exploring spaces of possible programs—and discovering that the range of forms they produced seems to align remarkably well with the actual range of forms one sees across biological organisms. (I looked particularly at shell shapes and patterns, as well as other pigmentation patterns, and various forms of plants.)

And in a sense what I found strongly supports a core idea of D’Arcy’s: that the forms of organisms are not so much determined by evolution, as by what it’s possible for processes to produce. D’Arcy thought about physical processes and mathematical forms; 60+ years later I was in a position to explore the more general space of computational processes.

And it so happened that, like D’Arcy, I ended up presenting my main results in a (big) book, that I called A New Kind of Science. My main purpose in the book was to describe what I’d learned from exploring the computational universe. And I devoted two sections (out of 114) respectively to “Growth of Plants and Animals” and “Biological Pigmentation Patterns”—producing something that looks a bit similar to On Growth and Form:

A New Kind of Science pages

So, in the end, what about the fish? Well, I think I’ve managed to understand something about the “morphospace” of possible mollusc shells. And I’ve made a start on leaves—though I’m hoping one of these years to be able to get a lot more data. I’ve also looked at animal skeletons a bit. But, yes, I at least still don’t know about the space of possible fish shapes. Though maybe somewhere inside our image identification neural net (which saw plenty of fish in its training) it already knows. And maybe it agrees with what D’Arcy thought—a hundred years ago.

(For help with facts and materials I’d like to thank Matthew Jarron, Maia Sheridan, Isabella Scott, Special Collections at the University of St Andrews Library and the On Growth and Form 100 conference in Dundee/St Andrews.)

Watch a livestream of some of the research for this post being done here.

What Is a Computational Essay?


A Powerful Way to Express Ideas

People are used to producing prose—and sometimes pictures—to express themselves. But in the modern age of computation, something new has become possible that I’d like to call the computational essay.

I’ve been working on building the technology to support computational essays for several decades, but it’s only very recently that I’ve realized just how central computational essays can be to both the way people learn, and the way they communicate facts and ideas. Professionals of the future will routinely deliver results and reports as computational essays. Educators will routinely explain concepts using computational essays. Students will routinely produce computational essays as homework for their classes.

Here’s a very simple example of a computational essay:

Simple computational essay example

There are basically three kinds of things here. First, ordinary text (here in English). Second, computer input. And third, computer output. And the crucial point is that these all work together to express what’s being communicated.

The ordinary text gives context and motivation. The computer input gives a precise specification of what’s being talked about. And then the computer output delivers facts and results, often in graphical form. It’s a powerful form of exposition that combines computational thinking on the part of the human author with computational knowledge and computational processing from the computer.

But what really makes this work is the Wolfram Language—and the succinct representation of high-level ideas that it provides, defining a unique bridge between human computational thinking and actual computation and knowledge delivered by a computer.

In a typical computational essay, each piece of Wolfram Language input will usually be quite short (often not more than a line or two). But the point is that such input can communicate a high-level computational thought, in a form that can readily be understood both by the computer and by a human reading the essay.

It’s essential to all this that the Wolfram Language has so much built-in knowledge—both about the world and about how to compute things in it. Because that’s what allows it to immediately talk not just about abstract computations, but also about real things that exist and happen in the world—and ultimately to provide a true computational communication language that bridges the capabilities of humans and computers.

An Example

Let’s use a computational essay to explain computational essays.

Let’s say we want to talk about the structure of a human language, like English. English is basically made up of words. Let’s get a list of the common ones.

Generate a list of common words in English:

WordList[]

How long is a typical word? Well, we can take the list of common words, and make a histogram that shows their distribution of lengths.

Make a histogram of word lengths:

Histogram[StringLength[WordList[]]]

Do the same for French:

Histogram[StringLength[WordList[Language -> "French"]]]

Notice that words tend to be longer in French. We could investigate whether this is why documents tend to be longer in French than in English, or how this relates to quantities like entropy for text. (Of course, because this is a computational essay, the reader can rerun the computations in it themselves, say by trying Russian instead of French.)
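
For instance (just reusing the pattern above; a small sketch a reader might try), do the same for Russian:

Histogram[StringLength[WordList[Language -> "Russian"]]]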

But as something different, let’s compare languages by comparing their translations for, say, the word “computer”.

Find the translations for “computer” in the 10 most common languages:

Take[WordTranslation["computer", All], 10]
Take[WordTranslation["computer", All], 10]

Find the first translation in each case:

First /@ Take[WordTranslation["computer", All], 10]

Arrange common languages in “feature space” based on their translations for “computer”:

FeatureSpacePlot[First /@ Take[WordTranslation["computer", All], 40]]

From this plot, we can start to investigate all sorts of structural and historical relationships between languages. But from the point of view of a computational essay, what’s important here is that we’re sharing the exposition between ordinary text, computer input, and output.

The text is saying what the basic point is. Then the input is giving a precise definition of what we want. And the output is showing what’s true about it. But take a look at the input. Even just by looking at the names of the Wolfram Language functions in it, one can get a pretty good idea what it’s talking about. And while the function names are based on English, one can use “code captions” to understand it in another language, say Japanese:

FeatureSpacePlot[First /@ Take[WordTranslation["computer", All], 40]]

But let’s say one doesn’t know about FeatureSpacePlot. What is it? If it were just a word or phrase in English, we might be able to look in a dictionary, but there wouldn’t be a precise answer. But a function in the Wolfram Language is always precisely defined. And to know what it does we can start by just looking at its documentation. But much more than that, we can just run it ourselves to explicitly see what it does.

FeatureSpacePlot page

And that’s a crucial part of what’s great about computational essays. If you read an ordinary essay, and you don’t understand something, then in the end you really just have to ask the author to find out what they meant. In a computational essay, though, there’s Wolfram Language input that precisely and unambiguously specifies everything—and if you want to know what it means, you can just run it and explore any detail of it on your computer, automatically and without recourse to anything like a discussion with the author.

Practicalities

How does one actually create a computational essay? With the technology stack we have, it’s very easy—mainly thanks to the concept of notebooks that we introduced with the first version of Mathematica all the way back in 1988. A notebook is a structured document that mixes cells of text together with cells of Wolfram Language input and output, including graphics, images, sounds, and interactive content:

A typical notebook
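
To get a feel for that cell structure, here’s a hypothetical little sketch (not from the original post) that assembles a tiny notebook programmatically, from one text cell and one input cell:

CreateDocument[{TextCell["A tiny computational essay", "Section"],
  ExpressionCell[Defer[Histogram[StringLength[WordList[]]]], "Input"]}]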

In modern times one great (and very hard to achieve!) thing is that full Wolfram Notebooks run seamlessly across desktop, cloud and mobile. You can author a notebook in the native Wolfram Desktop application (Mac, Windows, Linux)—or on the web through any web browser, or on mobile through the Wolfram Cloud app. Then you can share or publish it through the Wolfram Cloud, and get access to it on the web or on mobile, or download it to desktop or, now, iOS devices.

Notebook environments

Sometimes you want the reader of a notebook just to look at it, perhaps opening and closing groups of cells. Sometimes you also want them to be able to operate the interactive elements. And sometimes you want them to be able to edit and run the code, or maybe modify the whole notebook. And the crucial point is that all these things are easy to do with the cloud-desktop-mobile system we’ve built.

A New Form of Student Work

Computational essays are great for students to read, but they’re also great for students to write. Most of the current modalities for student work are remarkably old. Write an essay. Give a math derivation. These have been around for millennia. Not that there’s anything wrong with them. But now there’s something new: write a computational essay. And it’s wonderfully educational.

A computational essay is in effect an intellectual story told through a collaboration between a human author and a computer. The computer acts like a kind of intellectual exoskeleton, letting you immediately marshal vast computational power and knowledge. But it’s also an enforcer of understanding. Because to guide the computer through the story you’re trying to tell, you have to understand it yourself.

When students write ordinary essays, they’re typically writing about content that in some sense “already exists” (“discuss this passage”; “explain this piece of history”; …). But in doing computation (at least with the Wolfram Language) it’s so easy to discover new things that computational essays will end up with an essentially inexhaustible supply of new content, that’s never been seen before. Students will be exploring and discovering as well as understanding and explaining.

When you write a computational essay, the code in your computational essay has to produce results that fit with the story you’re telling. It’s not like you’re doing a mathematical derivation, and then some teacher tells you you’ve got the wrong answer. You can immediately see what your code does, and whether it fits with the story you’re telling. If it doesn’t, well then maybe your code is wrong—or maybe your story is wrong.

What should the actual procedure be for students producing computational essays? At this year’s Wolfram Summer School we did the experiment of asking all our students to write a computational essay about anything they knew about. We ended up with 72 interesting essays—exploring a very wide range of topics.

In a more typical educational setting, the “prompt” for a computational essay could be something like “What is the typical length of a word in English?” or “Explore word lengths in English”.

There’s also another workflow I’ve tried. As the “classroom” component of a class, do livecoding (or a live experiment). Create or discover something, with each student following along by doing their own computations. At the end of the class, each student will have a notebook they made. Then have their “homework” be to turn that notebook into a computational essay that explains what was done.

And in my experience, this ends up being a very good exercise—that really tests and cements the understanding students have. But there’s also something else: when students have created a computational essay, they have something they can keep—and directly use—forever.

And this is one of the great general features of computational essays. When students write them, they’re in effect creating a custom library of computational tools for themselves—that they’ll be in a position to immediately use at any time in the future. It’s far too common for students to write notes in a class, then never refer to them again. Yes, they might run across some situation where the notes would be helpful. But it’s often hard to motivate going back and reading the notes—not least because that’s only the beginning; there’s still the matter of implementing whatever’s in the notes.

But the point is that with a computational essay, once you’ve found what you want, the code to implement it is right there—immediately ready to be applied to whatever has come up.

Any Subject You Want

What can computational essays be about? Almost anything! I’ve often said that for any field of study X (from archaeology to zoology), there either is now, or soon will be, a “computational X”. And any “computational X” can immediately be explored and explained using computational essays.

But even when there isn’t a clear “computational X” yet, computational essays can still be a powerful way to organize and present material. In some sense, the very fact that a sequence of computations is typically needed to “tell the story” in an essay helps define a clear backbone for the whole essay. In effect, the structured nature of the computational presentation helps suggest structure for the narrative—making it easier for students (and others) to write essays that are easy to read and understand.

But what about actual subject matter? Well, imagine you’re studying history—say the history of the English Civil War. Conveniently, the Wolfram Language has a lot of knowledge about history (as about so many other things) built in. So you can present the English Civil War through a kind of dialog with it. For example, you can ask it for the geography of battles:

GeoListPlot[Entity["MilitaryConflict", "EnglishCivilWar"]["Battles"]]

You could ask for a timeline of the beginning of the war (you don’t need to say “first 15 battles”, because if you care, you can just read that from the Wolfram Language code):

TimelinePlot[Take[Entity["MilitaryConflict", "EnglishCivilWar"]["Battles"], 15]]

You could start looking at how armies moved, or who won and who lost at different points. At first, you can write a computational essay in which the computations are basically just generating custom infographics to illustrate your narrative. But then you can go further—and start really doing “computational history”. You can start to compute various statistical measures of the progress of the war. You can find ways to quantitatively compare it to other wars, and so on.
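
As a first small step in that direction (a hedged sketch, not from the original post), one could simply count how many battles the Wolfram Knowledgebase records for the conflict:

Length[Entity["MilitaryConflict", "EnglishCivilWar"]["Battles"]]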

Can you make a “computational essay” about art? Absolutely. Maybe about art history. Pick 10 random paintings by van Gogh:



van Gogh paintings output
EntityValue[RandomSample[Entity["Person", "VincentVanGogh::9vq62"]["NotableArtworks"], 10], "Image"]

Then look at what colors they use (a surprisingly narrow selection):

ChromaticityPlot[%]

Or maybe one could write a computational essay about actually creating art, or music.

What about science? You could rediscover Kepler’s laws by looking at properties of planets:

EntityClass["Planet", All][{"DistanceFromSun", "OrbitPeriod"}]
ListLogLogPlot[%]
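
As a quick check of Kepler’s third law (a hedged sketch, not in the original post), one can fit a line to the log-log data; whatever the underlying units, the slope should come out close to 3/2:

data = EntityClass["Planet", All][{"DistanceFromSun", "OrbitPeriod"}];
Fit[Log[QuantityMagnitude[data]], {1, x}, x]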

Maybe you could go on and check it for exoplanets. Or you could start solving the equations of motion for planets.

You could look at biology. Here’s the beginning of the reference sequence for the human mitochondrion:

GenomeData[{"Mitochondrion", {1, 150}}]
GenomeData[{"Mitochondrion", {1, 150}}]

You can start off breaking it into possible codons:

StringPartition[%, 3]
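
And one could push a little further (a hypothetical follow-on, not in the original post), tallying which codons occur most often in this stretch of sequence:

ReverseSort[Counts[StringPartition[GenomeData[{"Mitochondrion", {1, 150}}], 3]]]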

There’s an immense amount of data about all kinds of things built into the Wolfram Language. But there’s also the Wolfram Data Repository, which contains all sorts of specific datasets. Like here’s a map of state fairgrounds in the US:

GeoListPlot[ResourceData["U.S. State Fairgrounds"][All, "GeoPosition"]]

And here’s a word cloud made from the constitutions that countries have enacted since 2010:

WordCloud[
 StringJoin[
  Normal[ResourceData["World Constitutions"][
    Select[#YearEnacted > DateObject[{2010}] &], "Text"]]]]

Quite often one’s interested in dealing not with public data, but with some kind of local data. One convenient source of this is the Wolfram Data Drop. In an educational setting, particular databins (or cloud objects in general) can be set so that they can be read (and/or added to) by some particular group. Here’s a databin that I accumulate for myself, showing my heart rate through the day. Here it is for today:

DateListPlot[TimeSeries[Databin[YourDatabinHere]]]

Of course, it’s easy to make a histogram too:

Histogram[TimeSeries[Databin[YourDatabinHere]]]

What about math? A key issue in math is to understand why things are true. The traditional approach to this is to give proofs. But computational essays provide an alternative. The nature of the steps in them is different—but the objective is the same: to show what’s true and why.

As a very simple example, let’s look at primes. Here are the first 50:

Table[Prime[n], {n, 50}]

Let’s find the remainder mod 6 for all these primes:

Mod[Table[Prime[n], {n, 50}], 6]

But why do only 1 and 5 occur (well, after the trivial cases of the primes 2 and 3)? We can see this by computation. Any number can be written as 6n+k for some n, with k between 0 and 5:

Table[6 n + k, {k, 0, 5}]

But if we factor numbers written in this form, we’ll see that 6n+1 and 6n+5 are the only ones without a built-in factor, and so the only forms that a prime larger than 3 can take:

Factor[%]
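
To see how this plays out in practice (an illustrative extension, not in the original post), one can tally the remainders mod 6 over, say, the first 1000 primes; beyond 2 and 3, only 1 and 5 ever appear, in roughly comparable numbers:

Counts[Mod[Table[Prime[n], {n, 3, 1000}], 6]]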

What about computer science? One could for example write a computational essay about implementing Euclid’s algorithm, studying its running time, and so on.

Define a function to give all steps in Euclid’s algorithm:

gcdlist[a_, b_] := NestWhileList[{Last[#], Apply[Mod, #]} &, {a, b}, Last[#] != 0 &, 1]
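
As a quick illustration (a worked example added here, not from the original post), running the definition on a small pair of numbers lists the successive remainder pairs, stopping when the remainder reaches 0:

gcdlist[1071, 462]
(* {{1071, 462}, {462, 147}, {147, 21}, {21, 0}} *)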

Find the distribution of running lengths for the algorithm for numbers up to 200:

Histogram[Flatten[Table[Length[gcdlist[i, j]], {i, 200}, {j, 200}]]]

Or in modern times, one could explore machine learning, starting, say, by making a feature space plot of part of the MNIST handwritten digits dataset:

FeatureSpacePlot[RandomSample[Keys[ResourceData["MNIST"]], 50]]

If you wanted to get deeper into software engineering, you could write a computational essay about the HTTP protocol. This gets an HTTP response from a site:

URLRead["https://www.wolframalpha.com"]
URLRead["https://www.wolfram.com"]

And this shows the tree structure of the elements on the webpage at that URL:

TreeForm[Import["http://www.wolframalpha.com", {"HTML", "XMLObject"}],   VertexLabeling -> False, AspectRatio -> 1/2]
TreeForm[Import["http://www.wolframalpha.com", {"HTML", "XMLObject"}],
  VertexLabeling -> False, AspectRatio -> 1/2]

Or—in a completely different direction—you could talk about anatomy:

AnatomyPlot3D[Entity["AnatomicalStructure", "LeftFoot"]]

What Makes a Good Computational Essay?

As far as I’m concerned, for a computational essay to be good, it has to be as easy to understand as possible. The format helps quite a lot, of course. Because a computational essay is full of outputs (often graphical) that are easy to skim, and that immediately give some impression of what the essay is trying to say. It also helps that computational essays are structured documents, that deliver information in well-encapsulated pieces.

But ultimately it’s up to the author of a computational essay to make it clear. Another thing that helps is that a computational essay must, by its nature, have a “computational narrative”—a sequence of pieces of code that the computer can execute to do what’s being discussed in the essay. And while one might be able to write an ordinary essay that doesn’t make much sense but still sounds good, one can’t ultimately do something like that in a computational essay. Because in the end the code is the code, and actually has to run and do things.

So what can go wrong? Well, like English prose, Wolfram Language code can be unnecessarily complicated, and hard to understand. In a good computational essay, both the ordinary text, and the code, should be as simple and clean as possible. I try to enforce this for myself by saying that each piece of input should be at most one or perhaps two lines long—and that the caption for the input should always be just one line long. If I’m trying to do something where the core of it (perhaps excluding things like display options) takes more than a line of code, then I break it up, explaining each line separately.

Another important principle as far as I’m concerned is: be explicit. Don’t have some variable that, say, implicitly stores a list of words. Actually show at least part of the list, so people can explicitly see what it’s like. And when the output is complicated, find some tabulation or visualization that makes the features you’re interested in obvious. Don’t let the “key result” be hidden in something that’s tucked away in the corner; make sure the way you set things up makes it front and center.

Use the structured nature of notebooks. Break up computational essays with section headings, again helping to make them easy to skim. I follow the style of having a “caption line” before each input. Don’t worry if this somewhat repeats what a paragraph of text has said; consider the caption something that someone who’s just “looking at the pictures” might read to understand what a picture is of, before they actually dive into the full textual narrative.

The technology of Wolfram Notebooks makes it straightforward to put interactive elements, like Manipulate, into computational essays. And sometimes this is very helpful, and perhaps even essential. But interactive elements shouldn’t be overused. Because whenever there’s an element that requires interaction, this reduces the ability to skim the essay.

Sometimes there’s a fair amount of data—or code—that’s needed to set up a particular computational essay. The cloud is very useful for handling this. Just deploy the data (or code) to the Wolfram Cloud, and set appropriate permissions so it can automatically be read whenever the code in your essay is executed.
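
As a minimal sketch of that workflow (the cloud object name here is hypothetical, not from the original post), one could deploy some data with public read permissions and then read it back from within an essay:

obj = CloudPut[RandomReal[1, 100], "computational-essay/sample-data", Permissions -> "Public"];
CloudGet[obj]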

Notebooks also allow “reverse closing” of cells—allowing an output cell to be immediately visible, even though the input cell that generated it is initially closed. This kind of hiding of code should generally be avoided in the body of a computational essay, but it’s sometimes useful at the beginning or end of an essay, either to give an indication of what’s coming, or to include something more advanced where you don’t want to go through in detail how it’s made.

OK, so if a computational essay is done, say, as homework, how can it be assessed? A first, straightforward question is: does the code run? And this can be determined pretty much automatically. Then after that, the assessment process is very much like it would be for an ordinary essay. Of course, it’s nice and easy to add cells into a notebook to give comments on what’s there. And those cells can contain runnable code—that for example can take results in the essay and process or check them.

Are there principles of good computational essays? Here are a few candidates:

0. Understand what you’re talking about (!)

1. Find the most straightforward and direct way to represent your subject matter

2. Keep the core of each piece of Wolfram Language input to a line or two

3. Use explicit visualization or other information presentation as much as possible

4. Try to make each input+caption independently understandable

5. Break different topics or directions into different subsections

Learning the Language

At the core of computational essays is the idea of expressing computational thoughts using the Wolfram Language. But to do that, one has to know the language. Now, unlike human languages, the Wolfram Language is explicitly designed (and, yes, that’s what I’ve been doing for the past 30+ years) to follow definite principles and to be as easy to learn as possible. But there’s still learning to be done.

One feature of the Wolfram Language is that—like with human languages—it’s typically easier to read than to write. And that means that a good way for people to learn what they need to be able to write computational essays is for them first to read a bunch of essays. Perhaps then they can start to modify those essays. Or they can start creating “notes essays”, based on code generated in livecoding or other classroom sessions.

As people get more fluent in writing the Wolfram Language, something interesting happens: they start actually expressing themselves in the language, and using Wolfram Language input to carry significant parts of the narrative in a computational essay.

When I was writing An Elementary Introduction to the Wolfram Language (which itself is written in large part as a sequence of computational essays) I had an interesting experience. Early in the book, it was decently easy to explain computational exercises in English (“Make a table of the first 10 squares”). But a little later in the book, it became a frustrating process.

It was easy to express what I wanted in the Wolfram Language. But to express it in English was long and awkward (and had a tendency to sound like legalese). And that’s the whole point of using the Wolfram Language, and the reason I’ve spent 30+ years building it: because it provides a better, crisper way to express computational thoughts.

It’s sometimes said of human languages that the language you use determines how you think. It’s not clear how true this is of human languages. But it’s absolutely true of computer languages. And one of the most powerful things about the Wolfram Language is that it helps one formulate clear computational thinking.

Traditional computer languages are about writing code that describes the details of what a computer should do. The point of the Wolfram Language is to provide something much higher level—that can immediately talk about things in the world, and that can allow people as directly as possible to use it as a medium of computational thinking. And in a sense that’s what makes a good computational essay possible.

The Long Path to Computational Essays

Now that we have full-fledged computational essays, I realize I’ve been on a path towards them for nearly 40 years. At first I was taking interactive computer output and Scotch-taping descriptions into it:

Interactive computer output sketch

By 1981, when I built SMP, I was routinely writing documents that interspersed code and explanations:

Code interspersed with explanations

But it was only in 1986, when I started documenting what became Mathematica and the Wolfram Language, that I started seriously developing a style close to what I now favor for computational essays:

Wolfram Language Version 1 documentation

And with the release of Mathematica 1.0 in 1988 came another critical element: the invention of Wolfram Notebooks. Notebooks arrived in a form at least superficially very similar to the way they are today (and already in many ways more sophisticated than the imitations that started appearing 25+ years later!): collections of cells arranged into groups, and capable of containing text, executable code, graphics, etc.

Early Mac notebooks

At first notebooks were only possible on Mac and NeXT computers. A few years later they were extended to Microsoft Windows and X Windows (and later, Linux). But immediately people started using notebooks both to provide reports about what they’d done, and to create rich expository and educational material. Within a couple of years, there started to be courses based on notebooks, and books printed from notebooks, with interactive versions available on CD-ROM at the back:

Notebook publication example

So in a sense the raw material for computational essays already existed by the beginning of the 1990s. But to really make computational essays come into their own required the development of the cloud—as well as the whole broad range of computational knowledge that’s now part of the Wolfram Language.

By 1990 it was perfectly possible to create a notebook with a narrative, and people did it, particularly about topics like mathematics. But if there was real-world data involved, things got messy. One had to make sure that whatever was needed was appropriately available from a distribution CD-ROM or whatever. We created a Player for notebooks very early on, which was sometimes distributed with notebooks.

But in the last few years, particularly with the development of the Wolfram Cloud, things have gotten much more streamlined. Because now you can seamlessly store things in the cloud and use them anywhere. And you can work directly with notebooks in the cloud, just using a web browser. In addition, thanks to lots of user-assistance innovations (including natural language input), it’s become even easier to write in the Wolfram Language—and there’s ever more that can be achieved by doing so.

And the important thing that I think has now definitively happened is that producing a good computational essay has become lightweight enough that it makes sense to do it as something routine—either professionally in writing reports, or as a student doing homework.

Ancient Educational History

The idea of students producing computational essays is something new for modern times, made possible by a whole stack of current technology. But there’s a curious resonance with something from the distant past. You see, if you’d learned a subject like math in the US a couple of hundred years ago, a big thing you’d have done was to create a so-called ciphering book—in which over the course of several years you carefully wrote out the solutions to a range of problems, mixing explanations with calculations. And the idea then was that you kept your ciphering book for the rest of your life, referring to it whenever you needed to solve problems like the ones it included.

Well, now, with computational essays you can do very much the same thing. The problems you can address are vastly more sophisticated and wide-ranging than you could reach with hand calculation. But like with ciphering books, you can write computational essays so they’ll be useful to you in the future—though now you won’t have to redo the calculations by hand; instead you’ll just edit your computational essay notebook and immediately rerun the Wolfram Language inputs in it.

I actually only learned about ciphering books quite recently. For about 20 years I’d had essentially as an artwork a curious handwritten notebook (created in 1818, it says, by a certain George Lehman, apparently of Orwigsburg, Pennsylvania), with pages like this:

Ciphering book

I now know this is a ciphering book—that on this page describes how to find the “height of a perpendicular object… by having the length of the shadow given”. And of course I can’t resist a modern computational essay analog, which, needless to say, can be a bit more elaborate.

Find the current position of the Sun as azimuth, altitude:

SunPosition[]

Find the length of a shadow for an object of unit height:

1/Tan[SunPosition[][[2]]]

Given a 10-ft shadow, find the height of the object that made it:

Tan[SunPosition[][[2]]] Quantity[10, "Feet"]
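
And one could make the computation less tied to “here and now” (a hypothetical variant, not from the ciphering book or the original post), specifying an explicit place and time:

Tan[SunPosition[Entity["City", {"Chicago", "Illinois", "UnitedStates"}], DateObject[{2017, 11, 14, 12, 0}]][[2]]] Quantity[10, "Feet"]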

The Path Ahead

I like writing textual essays (such as blog posts!). But I like writing computational essays more. Because at least for many of the things I want to communicate, I find them a purer and more efficient way to do it. I could spend lots of words trying to express an idea—or I can just give a little piece of Wolfram Language input that expresses the idea very directly and shows how it works by generating (often very visual) output with it.

When I wrote my big book A New Kind of Science (from 1991 to 2002), neither our technology nor the world was quite ready for computational essays in the form in which they’re now possible. My research for the book filled thousands of Wolfram Notebooks. But when it actually came to putting together the book, I just showed the results from those notebooks—including a little of the code from them in notes at the back of the book.

But now the story of the book can be told in computational essays—that I’ve been starting to produce. (Just for fun, I’ve been livestreaming some of the work I’m doing to create these.)  And what’s very satisfying is just how clearly and crisply the ideas in the book can be communicated in computational essays.

There is so much potential in computational essays. And indeed we’re now starting the project of collecting “topic explorations” that use computational essays to explore a vast range of topics in unprecedentedly clear and direct ways. It’ll be something like our Wolfram Demonstrations Project (which now has 11,000+ Wolfram Language–powered Demonstrations). Here’s a typical example I wrote:

The Central Limit Theorem

Computational essays open up all sorts of new types of communication. Research papers that directly present computational experiments and explorations. Reports that describe things that have been found, but allow other cases to be immediately explored. And, of course, computational essays define a way for students (and others) to very directly and usefully showcase what they’ve learned.

There’s something satisfying about both writing—and reading—computational essays. It’s as if in communicating ideas we’re finally able to go beyond pure human effort—and actually leverage the power of computation. And for me, having built the Wolfram Language to be a computational communication language, it’s wonderful to see how it can be used to communicate so effectively in computational essays.

It’s so nice when I get something sent to me as a well-formed computational essay. Because I immediately know that I’m going to get a straight story that I can actually understand. There aren’t going to be all sorts of missing sources and hidden assumptions; there’s just going to be Wolfram Language input that stands alone, and that I can take out and study or run for myself.

The modern world of the web has brought us a few new formats for communication—like blogs, and social media, and things like Wikipedia. But all of these still follow the basic concept of text + pictures that’s existed since the beginning of the age of literacy. With computational essays we finally have something new—and it’s going to be exciting to see all the things it makes possible.

What Do I Do All Day? Livestreamed Technology CEOing


Catch a current livestream, or watch recorded livestreams at twitch.tv/stephen_wolfram »

Thinking in Public

I’ve been CEOing Wolfram Research for more than 30 years now. But what does that actually entail? What do I end up doing on a typical day? I certainly work hard. But I think I’m not particularly typical of CEOs of tech companies our size. Because for me a large part of my time is spent on the front lines of figuring out how our products should be designed and architected, and what they should do.

Thirty years ago I mostly did this by myself. But nowadays I’m almost always working with groups of people from our 800 or so employees. I like to do things very interactively. And in fact, for the past 15 years or so I’ve spent much of my time doing what I often call “thinking in public”: solving problems and making decisions live in meetings with other people.

I’m often asked how this works, and what actually goes on in our meetings. And recently I realized: what better way to show (and perhaps educate) people than just to livestream lots of our actual meetings? So over the past couple of months, I’ve livestreamed over 40 hours of my internal meetings—in effect taking everyone behind the scenes in what I do and how our products are created. (Yes, the livestreams are also archived.)

Livestream CEOing

This essay is also posted in WIRED »

Seeing Decisions Be Made

In the world at large, people often complain that “nothing happens in meetings”. Well, that’s not true of my meetings. In fact, I think it’s fair to say that in every single product-design meeting I do, significant things are figured out, and at least some significant decisions are made. So far this year, for example, we’ve added over 250 completely new functions to the Wolfram Language. Each one of those went through a meeting of mine. And quite often the design, the name, or even the very idea of the function was figured out live in the meeting.

There’s always a certain intellectual intensity to our meetings. We’ll have an hour or whatever, and we’ll have to work through what are often complex issues, that require a deep understanding of some area or another—and in the end come up with ideas and decisions that will often have very long-term consequences.

I’ve worked very hard over the past 30+ years to maintain the unity and coherence of the Wolfram Language. But every day I’m doing meetings where we decide about new things to be added to the language—and it’s always a big challenge and a big responsibility to maintain the standards we’ve set, and to make sure that the decisions we make today will serve us well in the years to come.

It could be about our symbolic framework for neural nets. Or about integrating with databases. Or how to represent complex engineering systems. Or new primitives for functional programming. Or new forms of geo visualization. Or quantum computing. Or programmatic interactions with mail servers. Or the symbolic representation of molecules. Or a zillion other topics that the Wolfram Language covers now, or will cover in the future.

What are the important functions in a particular area? How do they relate to other functions? Do they have the correct names? How can we deal with seemingly incompatible design constraints? Are people going to understand these functions? Oh, and are related graphics or icons as good and clear and elegant as they can be?

By now I basically have four decades of experience in figuring things like this out—and many of the people I work with are also very experienced. Usually a meeting will start with some proposal that’s been developed for how something should work. And sometimes it’ll just be a question of understanding what’s proposed, thinking it through, and then confirming it. But often—in order to maintain the standards we’ve set—there are real problems that still have to be solved. And a meeting will go back and forth, grappling with some issue or another.

Ideas will come up, often to be shot down. Sometimes it’ll feel like we’re completely stuck. But everyone in the meeting knows this isn’t an exercise; we’ve got to come up with an actual answer. Sometimes I’ll be trying to make analogies—to find somewhere else where we’ve solved a similar problem before. Or I’ll be insisting we go back to first principles—to kind of the center of the problem—to understand everything from the beginning. People will bring up lots of detailed academic or technical knowledge—and I’ll usually be trying to extract the essence of what it should be telling us.

It’d certainly be a lot easier if our standards were lower. But we don’t want a committee-compromise result. We want actual, correct answers that will stand the test of time. And these often require actual new ideas. But in the end it’s typically tremendously satisfying. We put in lots of work and thinking—and eventually we get a solution, and it’s a really good solution, that’s a real intellectual achievement.

Usually all of this goes on in private, inside our company. But with the livestream, anyone can see it happening—and can see the moment when some function is named, or some problem is solved.

What Are the Meetings Like?

What will actually be going on if you tune into a livestream? It’s pretty diverse. You might see some new Wolfram Language function being tried out (often based on code that’s only days or even hours old). You might see a discussion about software engineering, or trends in machine learning, or the philosophy of science, or how to handle some issue of popular culture, or what it’s going to take to fix some conceptual bug. You might see some new area get started, you might see some specific piece of Wolfram Language documentation get finished, or you might see a piece of final visual design get done.

There’s quite a range of people in our meetings, with a whole diversity of accents and backgrounds and specialties. And it’s pretty common for us to need to call in some extra person with specific expertise we hadn’t thought was needed. (I find it a little charming that our company culture is such that nobody ever seems surprised to be called into a meeting and asked about a detail of some unusual topic they had no idea was relevant to us before.)

We’re a very geographically distributed company (I’ve been a remote CEO since 1991). So basically all our meetings are through webconferencing. (We use audio and screensharing, but we never find video helpful, except perhaps for looking at a mobile device or a book or a drawing on a piece of paper.)

Most often we’re looking at my screen, but sometimes it’ll be someone else’s screen. (The most common reason to look at someone else’s screen is to see something that’s only working on their machine so far.) Most often I’ll be working in a Wolfram Notebook. Usually there’ll be an initial agenda in a notebook, together with executable Wolfram Language code. We’ll start from that, but then I’ll be modifying the notebook, or creating a new one. Often I’ll be trying out design ideas. Sometimes people will be sending code fragments for me to run, or I’ll be writing them myself. Sometimes I’ll be live-editing our main documentation. Sometimes we’ll be watching graphic design being done in real time.

As much as possible, the goal in our meetings is to finish things. To consult in real time with all the people who have input we need, and to get all the ideas and issues about something resolved. Yes, sometimes, afterwards, someone (sometimes me) will realize that something we thought we figured out isn’t correct, or won’t work. But the good news is that that’s pretty rare, probably because, with the way we run our meetings, things get well aired in real time.

People in our meetings tend to be very direct. If they don’t agree with something, they’ll say so. I’m very keen that everyone in a meeting actually understands anything that’s relevant to them—so we get the benefit of their thinking and judgement about it. (That probably leads to an over-representation from me of phrases like “does that make sense?” or “do you get what I’m saying?”.)

It really helps, of course, that we have very talented people, who are quick at understanding things. And by now everyone knows that even if the main topic of a meeting is one thing, it’s quite likely that we’ll have to dip into something completely different in order to make progress. It requires a certain intellectual agility to keep up with this—but if nothing else, I think that’s on its own a great thing to practice and cultivate.

For me it’s very invigorating to work on so many different topics—often wildly different even between successive hours in a day. It’s hard work, but it’s also fun. And, yes, there is often humor, particularly in the specifics of the examples we’ll end up discussing (lots of elephants and turtles, and strange usage scenarios).

The meetings vary in size from 2 or 3 people to perhaps 20 people. Sometimes people will be added and dropped through the course of the meeting, as the details of what we’re discussing change. Particularly in larger meetings—that tend to be about projects that cut across multiple groups—we’ll typically have one or more project managers (we call them “PMs”) present. The PMs are responsible for the overall flow of the project—and particularly for coordinating between different groups that need to contribute.

If you listen to the livestream, you’ll hear a certain amount of jargon. Some of it is pretty typical in the software industry (UX = user experience, SQA = software quality assurance). Some of it is more specific to our company—like acronyms for departments (DQA = Document Quality Assurance, WPE = Web Product Engineering) or names of internal things (XKernel = prototype Wolfram Language build, pods = elements of Wolfram|Alpha output, pinkboxing = indicating undisplayable output, knitting = crosslinking elements of documentation). And occasionally, of course, there’s a new piece of jargon, or a new name for something, invented right in the meeting.

Usually our meetings are pretty fast-paced. An idea will come up—and immediately people are responding to it. And as soon as something’s been decided, people will start building on the decision, and figuring out more. It’s remarkably productive, and I think it’s a pretty interesting process to watch. Though without the experience base that the people in the meeting have, there may be points where the ideas seem to fly around too fast to keep track of what’s going on.

The Process of Livestreaming

The idea of livestreaming our internal meetings is new. But over the years I’ve done a fair amount of livestreaming for other purposes.

Back in 2009, when we launched Wolfram|Alpha, we actually livestreamed the process of making the site live. (I figured that if things went wrong, we might as well just show everyone what actually went wrong, rather than just putting up a “site unavailable” message.)

I’ve livestreamed demos and explorations of new software we’ve released. I’ve livestreamed work I happen to be doing in writing code or producing “computational essays”. (My son Christopher is arguably a faster Wolfram Language programmer than me, and he’s livestreamed some livecoding he’s done too.) I’ve also livestreamed live experiments, particularly from our Wolfram Summer School and Wolfram Summer Camp.

But until recently, all my livestreaming had basically been solo: it hadn’t involved having other people in the livestream. But I’ve always thought our internal design review meetings are pretty interesting, so I thought “why not let other people listen in on them too?”. I have to admit I was a little nervous about this at first. After all, these meetings are pretty central to what our company does, and we can’t afford to have them be dragged down by anything.

And so I’ve insisted that a meeting has to be just the same whether it’s livestreamed or not. My only immediate concession to livestreaming is that I give a few sentences of introduction to explain roughly what the meeting is going to be about. And the good news has been that as soon as a meeting gets going, the people in it (including myself) seem to rapidly forget that it’s being livestreamed—and just concentrate on the (typically quite intense) things that are going on in the meeting.

But something interesting that happens when we’re livestreaming a meeting is that there’s real-time text chat with viewers. Often it’s questions and general discussion. But sometimes it’s interesting comments or suggestions about what we’re doing or saying. It’s like having instant advisors, or an instant focus group, giving us real-time input or feedback about our decisions.

As a practical matter, the primary people in the meeting are too focused on the meeting itself to be handling text chat. So we have separate people doing that—surfacing a small number of the most relevant comments and suggestions. And this has worked great—and in fact in most meetings at least one or two good ideas come from our viewers, that we’re instantly able to incorporate into our thinking.

One can think of livestreaming as something a bit like reality TV—except that it’s live and real time. We’re planning to have some systematic “broadcast times” for recorded material. But the live component has the constraint that it has to happen when the meetings are actually happening. I tend to have a very full and complex schedule, packing in all the various things I do. And exactly when a particular design review meeting can happen will often depend on when a particular piece of code or design work is ready.

It will also depend on the availability of the various other people in the meetings—who have their own constraints, and often live in a wide range of time zones. I’ve tried other approaches, but the most common thing now is that design review meetings are scheduled soon before they actually happen, and typically not more than a day or two in advance. And even though I personally work at night as well as during the day, most design reviews tend to get scheduled during US (East Coast) working hours, because that’s when it’s easiest to arrange for all the people who have to be in the meeting—as well as people who might be called in if their expertise is needed.

From the point of view of livestreaming, it would be nice to have a more predictable schedule of relevant meetings, but the meetings are being set up to achieve maximum productivity in their own right—and livestreaming is just an add-on.

We’re trying to use Twitter to give some advance notice of livestreaming. But in the end the best indication of when a livestream is starting is just the notification that comes from the Twitch livestreaming platform we’re using. (Yes, Twitch is mainly used for e-sports right now, but we [and they] hope it can be used for other things too—and with their e-sports focus, their technology for screensharing has become very good. Curiously, I’ve been aware of Twitch for a long time. I met its founders at the very first Y Combinator Demo Day in 2005, and we used its precursor, justin.tv, to livestream the Wolfram|Alpha launch.)

Styles of Work

Not all the work I do is suitable for livestreaming. In addition to “thinking in public” in meetings, I also spend time “thinking in private”, doing things like just writing. (I actually spent more than 10 years almost exclusively “thinking in private” when I worked on my book A New Kind of Science.)

If I look at my calendar for a given week, I’ll see a mixture of things. Every day there are typically at least one or two design reviews of the kind I’ve been livestreaming. There are also a fair number of project reviews, where I’m trying to help move all kinds of projects along. And there are some strategy and management discussions too, along with the very occasional external meeting.

Our company is weighted very heavily towards R&D—and trying to build the best possible products. And that’s certainly reflected in the way I spend my time—and in my emphasis on intellectual rather than commercial value. Some people might think that after all these years I couldn’t possibly still be involved in the level of detail that’s in evidence in the design reviews we’ve been livestreaming.

But here’s the thing: I’m trying hard to design the Wolfram Language in the very best possible way for the long term. And after 40 years of doing software design, I’m pretty experienced at it. So I’m both fairly fast at doing it, and fairly good at not making mistakes. By now, of course, there are many other excellent software designers at our company. But I’m still the person who has the most experience with Wolfram Language design—as well as the most global view of the system (which is part of why in design review meetings, I end up spending some fraction of my time just connecting different related design efforts).

And, yes, I get involved in details. What exactly should the name of that option be? What color should that icon be? What should this function do in a particular corner case? And, yes, every one of these things could be solved in some way without me. But in a fairly short time, I can help make sure that what we have is really something that we can build on—and be proud of—in the years to come. And I consider it a good and worthy way for me to spend my time.

And it’s fun to be able to open up this process for people by livestreaming the meetings we have. I’m hoping it’ll be useful for people to understand a bit about what goes into creating the Wolfram Language (and yes, software design often tends to be a bit unsung, and mainly noticed only when it goes wrong—so it’s nice to be able to show what’s actually involved).

In a sense, doing the design of the Wolfram Language is a very concentrated and high-end example of computational thinking. And I hope that by experiencing it in watching our meetings, people will learn more about how they can do computational thinking themselves.

The meetings that we’re livestreaming now are about features of the Wolfram Language etc. that we currently have under development. But with our aggressive schedule of releasing software, it shouldn’t be long before the things we’re talking about are actually released in working products. And when that happens, there’ll be something quite unique about it. Because for the first time ever, people will not only be able to see what got done, but they’ll also be able to go back to a recorded livestream and see how it came to be figured out.

It’s an interesting and unique record of a powerful form of intellectual activity. But for me it’s already nice just to be able to share some of the fascinating conversations I end up being part of every day. And to feel like the time I’m spending as a very hands-on CEO not only advances the Wolfram Language and the other things we’re building, but can also directly help educate—and perhaps entertain—a few more people out in the world.

A New Kind of Science: A 15-Year View


Starting now, in celebration of its 15th anniversary, A New Kind of Science will be freely available in its entirety, with high-resolution images, on the web or for download.

A New Kind of Science

It’s now 15 years since I published my book A New Kind of Science—more than 25 since I started writing it, and more than 35 since I started working towards it. But with every passing year I feel I understand more about what the book is really about—and why it’s important. I wrote the book, as its title suggests, to contribute to the progress of science. But as the years have gone by, I’ve realized that the core of what’s in the book actually goes far beyond science—into many areas that will be increasingly important in defining our whole future.

So, viewed from a distance of 15 years, what is the book really about? At its core, it’s about something profoundly abstract: the theory of all possible theories, or the universe of all possible universes. But for me one of the achievements of the book is the realization that one can explore such fundamental things concretely—by doing actual experiments in the computational universe of possible programs. And in the end the book is full of what might at first seem like quite alien pictures made just by running very simple such programs.

Back in 1980, when I made my living as a theoretical physicist, if you’d asked me what I thought simple programs would do, I expect I would have said “not much”. I had been very interested in the kind of complexity one sees in nature, but I thought—like a typical reductionistic scientist—that the key to understanding it must lie in figuring out detailed features of the underlying component parts.

In retrospect I consider it incredibly lucky that all those years ago I happened to have the right interests and the right skills to actually try what is in a sense the most basic experiment in the computational universe: to systematically take a sequence of the simplest possible programs, and run them.

I could tell as soon as I did this that there were interesting things going on, but it took a couple more years before I began to really appreciate the force of what I’d seen. For me it all started with one picture:

Rule 30

Or, in modern form:

Rule 30, modern form

I call it rule 30. It’s my all-time favorite discovery, and today I carry it around everywhere on my business cards. What is it? It’s one of the simplest programs one can imagine. It operates on rows of black and white cells, starting from a single black cell, and then repeatedly applies the rules at the bottom. And the crucial point is that even though those rules are by any measure extremely simple, the pattern that emerges is not.
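
For anyone who wants to try this directly, here’s a minimal Wolfram Language sketch (using the built-in CellularAutomaton function) that reproduces the basic rule 30 picture, starting from a single black cell:

(* evolve rule 30 for 100 steps from a single black cell on a white background *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]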

It’s a crucial—and utterly unexpected—feature of the computational universe: that even among the very simplest programs, it’s easy to get immensely complex behavior. It took me a solid decade to understand just how broad this phenomenon is. It doesn’t just happen in programs (“cellular automata”) like rule 30. It basically shows up whenever you start enumerating possible rules or possible programs whose behavior isn’t obviously trivial.
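
A tiny Wolfram Language sketch gives the flavor of what such an enumeration looks like (the particular rule numbers here are just illustrative picks from the 256 elementary cellular automata):

(* look at a handful of the 256 elementary cellular automaton rules side by side *)
Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 60], PlotLabel -> r], {r, {22, 30, 90, 110}}]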

Similar phenomena had actually been seen for centuries in things like the digits of pi and the distribution of primes—but they were basically just viewed as curiosities, and not as signs of something profoundly important. It’s been nearly 35 years since I first saw what happens in rule 30, and with every passing year I feel I come to understand more clearly and deeply what its significance is.

Four centuries ago it was the discovery of the moons of Jupiter and their regularities that sowed the seeds for modern exact science, and for the modern scientific approach to thinking. Could my little rule 30 now be the seed for another such intellectual revolution, and a new way of thinking about everything?

In some ways I might personally prefer not to take responsibility for shepherding such ideas (“paradigm shifts” are hard and thankless work). And certainly for years I have just quietly used such ideas to develop technology and my own thinking. But as computation and AI become increasingly central to our world, I think it’s important that the implications of what’s out there in the computational universe be more widely understood.

Implications of the Computational Universe

Here’s the way I see it today. From observing the moons of Jupiter we came away with the idea that—if looked at right—the universe is an ordered and regular place, that we can ultimately understand. But now, in exploring the computational universe, we quickly come upon things like rule 30 where even the simplest rules seem to lead to irreducibly complex behavior.

One of the big ideas of A New Kind of Science is what I call the Principle of Computational Equivalence. The first step is to think of every process—whether it’s happening with black and white squares, or in physics, or inside our brains—as a computation that somehow transforms input to output. What the Principle of Computational Equivalence says is that above an extremely low threshold, all processes correspond to computations of equivalent sophistication.

It might not be true. It might be that something like rule 30 corresponds to a fundamentally simpler computation than the fluid dynamics of a hurricane, or the processes in my brain as I write this. But what the Principle of Computational Equivalence says is that in fact all these things are computationally equivalent.

It’s a very important statement, with many deep implications. For one thing, it implies what I call computational irreducibility. If something like rule 30 is doing a computation just as sophisticated as our brains or our mathematics, then there’s no way we can “outrun” it: to figure out what it will do, we have to do an irreducible amount of computation, effectively tracing each of its steps.

The mathematical tradition in exact science has emphasized the idea of predicting the behavior of systems by doing things like solving mathematical equations. But what computational irreducibility implies is that out in the computational universe that often won’t work, and instead the only way forward is just to explicitly run a computation to simulate the behavior of the system.

A Shift in Looking at the World

One of the things I did in A New Kind of Science was to show how simple programs can serve as models for the essential features of all sorts of physical, biological and other systems. Back when the book appeared, some people were skeptical about this. And indeed at that time there was a 300-year unbroken tradition that serious models in science should be based on mathematical equations.

But in the past 15 years something remarkable has happened: now, when new models are created—whether of animal patterns or web browsing behavior—they are overwhelmingly more often based on programs than on mathematical equations.

Year by year, it’s been a slow, almost silent, process. But by this point, it’s a dramatic shift. Three centuries ago pure philosophical reasoning was supplanted by mathematical equations. Now in these few short years, equations have been largely supplanted by programs. For now, it’s mostly been something practical and pragmatic: the models work better, and are more useful.

But when it comes to understanding the foundations of what’s going on, one’s led not to things like mathematical theorems and calculus, but instead to ideas like the Principle of Computational Equivalence. Traditional mathematics-based ways of thinking have made concepts like force and momentum ubiquitous in the way we talk about the world. But now as we think in fundamentally computational terms we have to start talking in terms of concepts like undecidability and computational irreducibility.

Will some type of tumor always stop growing in some particular model? It might be undecidable. Is there a way to work out how a weather system will develop? It might be computationally irreducible.

These concepts are pretty important when it comes to understanding not only what can and cannot be modeled, but also what can and cannot be controlled in the world. Computational irreducibility in economics is going to limit what can be globally controlled. Computational irreducibility in biology is going to limit how generally effective therapies can be—and make highly personalized medicine a fundamental necessity.

And through ideas like the Principle of Computational Equivalence we can start to discuss just what it is that allows nature—seemingly so effortlessly—to generate so much that seems so complex to us. Or how even deterministic underlying rules can lead to computationally irreducible behavior that for all practical purposes can seem to show “free will”.

Cellular automata

Mining the Computational Universe

A central lesson of A New Kind of Science is that there’s a lot of incredible richness out there in the computational universe. And one reason that’s important is that it means that there’s a lot of incredible stuff out there for us to “mine” and harness for our purposes.

Want to automatically make an interesting custom piece of art? Just start looking at simple programs and automatically pick out one you like—as in our WolframTones music site from a decade ago. Want to find an optimal algorithm for something? Just search enough programs out there, and you’ll find one.

We’ve normally been used to creating things by building them up, step by step, with human effort—progressively creating architectural plans, or engineering drawings, or lines of code. But the discovery that there’s so much richness so easily accessible in the computational universe suggests a different approach: don’t try building anything; just define what you want, and then search for it in the computational universe.

Sometimes it’s really easy to find. Like let’s say you want to generate apparent randomness. Well, then just enumerate cellular automata (as I did in 1984), and very quickly you come upon rule 30—which turns out to be one of the very best known generators of apparent randomness (look down the center column of cell values, for example). In other situations you might have to search 100,000 cases (as I did in finding the simplest axiom system for logic, or the simplest universal Turing machine), or you might have to search millions or even trillions of cases. But in the past 25 years, we’ve had incredible success in just discovering algorithms out there in the computational universe—and we rely on many of them in implementing the Wolfram Language.
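
As a concrete illustration (a small sketch, not part of the original 1984 search), one can extract that center column in the Wolfram Language and check that it looks statistically random:

(* evolve rule 30 and read off the center column as a sequence of pseudorandom bits *)
rows = CellularAutomaton[30, {{1}, 0}, 200];  (* 201 rows, each 401 cells wide *)
bits = rows[[All, 201]];                      (* the center column *)
N[Total[bits]/Length[bits]]                   (* fraction of 1s—close to 1/2 *)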

At some level it’s quite sobering. One finds some tiny program out in the computational universe. One can tell it does what one wants. But when one looks at what it’s doing, one doesn’t have any real idea how it works. Maybe one can analyze some part—and be struck by how “clever” it is. But there just isn’t a way for us to understand the whole thing; it’s not something familiar from our usual patterns of thinking.

Of course, we’ve often had similar experiences before—when we use things from nature. We may notice that some particular substance is a useful drug or a great chemical catalyst, but we may have no idea why. But in doing engineering and in most of our modern efforts to build technology, the great emphasis has instead been on constructing things whose design and operation we can readily understand.

In the past we might have thought that was enough. But what our explorations of the computational universe show is that it’s not: selecting only things whose operation we can readily understand misses most of the immense power and richness that’s out there in the computational universe.

A World of Discovered Technology

What will the world look like when more of what we have is mined from the computational universe? Today the environment we build for ourselves is dominated by things like simple shapes and repetitive processes. But the more we use what’s out there in the computational universe, the less regular things will look. Sometimes they may look a bit “organic”, or like what we see in nature (since after all, nature follows similar kinds of rules). But sometimes they may look quite random, until perhaps suddenly and incomprehensibly they achieve something we recognize.

For several millennia we as a civilization have been on a path to understand more about what happens in our world—whether by using science to decode nature, or by creating our own environment through technology. But to use more of the richness of the computational universe we must at least to some extent forsake this path.

In the past, we somehow counted on the idea that between our brains and the tools we could create we would always have fundamentally greater computational power than the things around us—and as a result we would always be able to “understand” them. But what the Principle of Computational Equivalence says is that this isn’t true: out in the computational universe there are lots of things just as powerful as our brains or the tools we build. And as soon as we start using those things, we lose the “edge” we thought we had.

Today we still imagine we can identify discrete “bugs” in programs. But most of what’s powerful out there in the computational universe is rife with computational irreducibility—so the only real way to see what it does is just to run it and watch what happens.

We ourselves, as biological systems, are a great example of computation happening at a molecular scale—and we are no doubt rife with computational irreducibility (which is, at some fundamental level, why medicine is hard). I suppose it’s a tradeoff: we could limit our technology to consist only of things whose operation we understand. But then we would miss all that richness that’s out there in the computational universe. And we wouldn’t even be able to match the achievements of our own biology in the technology we create.

Machine Learning and the Neural Net Renaissance

There’s a common pattern I’ve noticed with intellectual fields. They go for decades and perhaps centuries with only incremental growth, and then suddenly, usually as a result of a methodological advance, there’s a burst of “hypergrowth” for perhaps 5 years, in which important new results arrive almost every week.

I was fortunate enough that my own very first field—particle physics—was in its period of hypergrowth right when I was involved in the late 1970s. And for myself, the 1990s felt like a kind of personal period of hypergrowth for what became A New Kind of Science—and indeed that’s why I couldn’t pull myself away from it for more than a decade.

But today, the obvious field in hypergrowth is machine learning, or, more specifically, neural nets. It’s funny for me to see this. I actually worked on neural nets back in 1981, before I started on cellular automata, and several years before I found rule 30. But I never managed to get neural nets to do anything very interesting—and actually I found them too messy and complicated for the fundamental questions I was concerned with.

And so I “simplified them”—and wound up with cellular automata. (I was also inspired by things like the Ising model in statistical physics, etc.) At the outset, I thought I might have simplified too far, and that my little cellular automata would never do anything interesting. But then I found things like rule 30. And I’ve been trying to understand its implications ever since.

In building Mathematica and the Wolfram Language, I’d always kept track of neural nets, and occasionally we’d use them in some small way for some algorithm or another. But about 5 years ago I suddenly started hearing amazing things: that somehow the idea of training neural nets to do sophisticated things was actually working. At first I wasn’t sure. But then we started building neural net capabilities in the Wolfram Language, and finally two years ago we released our ImageIdentify.com website—and now we’ve got our whole symbolic neural net system. And, yes, I’m impressed. There are lots of tasks that had traditionally been viewed as the unique domain of humans, but which now we can routinely do by computer.

But what’s actually going on in a neural net? It’s not really to do with the brain; that was just the inspiration (though in reality the brain probably works more or less the same way). A neural net is really a sequence of functions that operate on arrays of numbers, with each function typically taking quite a few inputs from around the array. It’s not so different from a cellular automaton. Except that in a cellular automaton, one’s usually dealing with, say, just 0s and 1s, not arbitrary numbers like 0.735. And instead of taking inputs from all over the place, in a cellular automaton each step takes inputs only from a very well-defined local region.

Now, to be fair, it’s pretty common to study “convolutional neural nets”, in which the patterns of inputs are very regular, just like in a cellular automaton. And it’s becoming clear that having precise (say 32-bit) numbers isn’t critical to the operation of neural nets; one can probably make do with just a few bits.
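
To make the comparison concrete, here’s a minimal Wolfram Language sketch of a neural net as just “a sequence of functions operating on arrays of numbers” (the layer sizes are arbitrary choices, purely for illustration):

(* a tiny network: alternating linear maps and elementwise nonlinearities *)
net = NetInitialize[
   NetChain[{LinearLayer[8], Ramp, LinearLayer[4], Ramp, LinearLayer[1]}, "Input" -> 3]];
net[{0.2, 0.5, 0.1}]  (* apply the randomly initialized net to a length-3 numeric array *)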

But a big feature of neural nets is that we know how to make them “learn”. In particular, they have enough features from traditional mathematics (like involving continuous numbers) that techniques like calculus can be applied to provide strategies to make them incrementally change their parameters to “fit their behavior” to whatever training examples they’re given.

It’s far from obvious how much computational effort, or how many training examples, will be needed. But the breakthrough of about five years ago was the discovery that for many important practical problems, what’s available with modern GPUs and modern web-collected training sets can be enough.

Pretty much nobody ends up explicitly setting or “engineering” the parameters in a neural net. Instead, what happens is that they’re found automatically. But unlike with simple programs like cellular automata, where one’s typically enumerating all possibilities, in current neural nets there’s an incremental process, essentially based on calculus, that manages to progressively improve the net—a little like the way biological evolution progressively improves the “fitness” of an organism.
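
Here’s a sketch of what that incremental training process looks like in practice—a toy example with an arbitrarily chosen target function, not anything from a real application:

(* incrementally adjust the parameters of a small net to fit a simple numeric function *)
data = Table[{x} -> {Sin[x]}, {x, -3., 3., 0.05}];
trained = NetTrain[
   NetChain[{LinearLayer[32], Ramp, LinearLayer[32], Ramp, LinearLayer[1]}, "Input" -> 1],
   data];
Plot[{Sin[x], First[trained[{x}]]}, {x, -3, 3}]  (* compare the target and the learned function *)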

It’s plenty remarkable what comes out from training a neural net in this way, and it’s plenty difficult to understand how the neural net does what it does. But in some sense the neural net isn’t venturing too far across the computational universe: it’s always basically keeping the same basic computational structure, and just changing its behavior by changing parameters.

But to me the success of today’s neural nets is a spectacular endorsement of the power of the computational universe, and another validation of the ideas of A New Kind of Science. Because it shows that out in the computational universe, away from the constraints of explicitly building systems whose detailed behavior one can foresee, there are immediately all sorts of rich and useful things to be found.

NKS Meets Modern Machine Learning

Is there a way to bring the full power of the computational universe—and the ideas of A New Kind of Science—to the kinds of things one does with neural nets? I suspect so. And in fact, as the details become clear, I wouldn’t be surprised if exploration of the computational universe saw its own period of hypergrowth: a “mining boom” of perhaps unprecedented proportions.

In current work on neural nets, there’s a definite tradeoff one sees. The more what’s going on inside the neural net is like a simple mathematical function with essentially arithmetic parameters, the easier it is to use ideas from calculus to train the network. But the more what’s going on is like a discrete program, or like a computation whose whole structure can change, the more difficult it is to train the network.

It’s worth remembering, though, that the networks we’re routinely training now would have looked utterly impractical to train only a few years ago. It’s effectively just all those quadrillions of GPU operations that we can throw at the problem that make training feasible. And I won’t be surprised if even quite pedestrian (say, local exhaustive search) techniques will fairly soon let one do significant training even in cases where no incremental numerical approach is possible. And perhaps it will even be possible to invent some major generalization of things like calculus that will operate in the full computational universe. (I have some suspicions, based on thinking about generalizing basic notions of geometry to cover things like cellular automaton rule spaces.)

What would this let one do? Likely it would let one find considerably simpler systems that could achieve particular computational goals. And maybe that would bring within reach some qualitatively new level of operations, perhaps beyond what we’re used to being possible with things like brains.

There’s a funny thing that’s going on with modeling these days. As neural nets become more successful, one begins to wonder: why bother to simulate what’s going on inside a system when one can just make a black-box model of its output using a neural net? Well, if we manage to get machine learning to reach deeper into the computational universe, we won’t have as much of this tradeoff any more—because we’ll be able to learn models of the mechanism as well as the output.

I’m pretty sure that bringing the full computational universe into the purview of machine learning will have spectacular consequences. But it’s worth realizing that computational universality—and the Principle of Computational Equivalence—make this less a matter of principle than of practice. Because they imply that even neural nets of the kinds we have now are universal, and are capable of emulating anything any other system can do. (In fact, this universality result was essentially what launched the whole modern idea of neural nets, back in 1943.)

And as a practical matter, the fact that current neural net primitives are being built into hardware and so on will make them a desirable foundation for actual technology systems, even if they’re far from optimal. But my guess is that there are tasks where, for the foreseeable future, access to the full computational universe will be necessary to make them even vaguely practical.

Finding AI

What will it take to make artificial intelligence? As a kid, I was very interested in figuring out how to make a computer know things, and be able to answer questions from what it knew. And when I studied neural nets in 1981, it was partly in the context of trying to understand how to build such a system. As it happens, I had just developed SMP, which was a forerunner of Mathematica (and ultimately the Wolfram Language)—and which was very much based on symbolic pattern matching (“if you see this, transform it to that”). At the time, though, I imagined that artificial intelligence was somehow a “higher level of computation”, and I didn’t know how to achieve it.

I returned to the problem every so often, but kept putting it off. Then, when I was working on A New Kind of Science, it struck me: if I’m to take the Principle of Computational Equivalence seriously, then there can’t be any fundamentally “higher level of computation”—so AI must be achievable just with the standard ideas of computation that I already know.

And it was this realization that got me started building Wolfram|Alpha. And, yes, what I found is that lots of those very “AI-oriented things”, like natural language understanding, could be done just with “ordinary computation”, without any magic new AI invention. Now, to be fair, part of what was happening was that we were using ideas and methods from A New Kind of Science: we weren’t just engineering everything; we were often searching the computational universe for rules and algorithms to use.

So what about “general AI”? Well, I think at this point that with the tools and understanding we have, we’re in a good position to automate essentially anything we can define. But definition is a more difficult and central issue than we might imagine.

The way I see things at this point is that there’s a lot of computation even near at hand in the computational universe. And it’s powerful computation. As powerful as anything that happens in our brains. But we don’t recognize it as “intelligence” unless it’s aligned with our human goals and purposes.

Ever since I was writing A New Kind of Science, I’ve been fond of quoting the aphorism “the weather has a mind of its own”. It sounds so animistic and pre-scientific. But what the Principle of Computational Equivalence says is that actually, according to the most modern science, it’s true: the fluid dynamics of the weather is the same in its computational sophistication as the electrical processes that go on in our brains.

But is it “intelligent”? When I talk to people about A New Kind of Science, and about AI, I’ll often get asked when I think we’ll achieve “consciousness” in a machine. Life, intelligence, consciousness: they are all concepts that we have a specific example of, here on Earth. But what are they in general? All life on Earth shares RNA and the structure of cell membranes. But surely that’s just because all life we know is part of one connected thread of history; it’s not that such details are fundamental to the very concept of life.

And so it is with intelligence. We have only one example we’re sure of: us humans. (We’re not even sure about animals.) But human intelligence as we experience it is deeply entangled with human civilization, human culture and ultimately also human physiology—even though none of these details are presumably relevant in the abstract definition of intelligence.

We might think about extraterrestrial intelligence. But what the Principle of Computational Equivalence implies is that actually there’s “alien intelligence” all around us. But somehow it’s just not quite aligned with human intelligence. We might look at rule 30, for example, and be able to see that it’s doing sophisticated computation, just like our brains. But somehow it just doesn’t seem to have any “point” to what it’s doing.

We imagine that in doing the things we humans do, we operate with certain goals or purposes. But rule 30, for example, just seems to be doing what it’s doing—just following some definite rule. In the end, though, one realizes we’re not so very different. After all, there are definite laws of nature that govern our brains. So anything we do is at some level just playing out those laws.

Any process can actually be described either in terms of mechanism (“the stone is moving according to Newton’s laws”), or in terms of goals (“the stone is moving so as to minimize potential energy”). The description in terms of mechanism is usually what’s most useful in connecting with science. But the description in terms of goals is usually what’s most useful in connecting with human intelligence.

And this is crucial in thinking about AI. We know we can have computational systems whose operations are as sophisticated as anything. But can we get them to do things that are aligned with human goals and purposes?

In a sense this is what I now view as the key problem of AI: it’s not about achieving underlying computational sophistication, but instead it’s about communicating what we want from this computation.

The Importance of Language

I’ve spent much of my life as a computer language designer—most importantly creating what is now the Wolfram Language. I’d always seen my role as a language designer as being to imagine the possible computations people might want to do, and then—like a reductionist scientist—to “drill down” to find good primitives from which all these computations could be built up. But somehow from A New Kind of Science, and from thinking about AI, I’ve come to think about it a little differently.

Now what I more see myself as doing is making a bridge between our patterns of human thinking, and what the computational universe is capable of. There are all sorts of amazing things that can in principle be done by computation. But what the language does is to provide a way for us humans to express what we want done, or want to achieve—and then to get this actually executed, as automatically as possible.

Language design has to start from what we know and are familiar with. In the Wolfram Language, we name the built-in primitives with English words, leveraging the meanings that those words have acquired. But the Wolfram Language is not like natural language. It’s something more structured, and more powerful. It’s based on the words and concepts that we’re familiar with through the shared corpus of human knowledge. But it gives us a way to build up arbitrarily sophisticated programs that in effect express arbitrarily complex goals.

Yes, the computational universe is capable of remarkable things. But they’re not necessarily things that we humans can describe or relate to. But in building the Wolfram Language my goal is to do the best I can in capturing everything we humans want—and being able to express it in executable computational terms.

When we look at the computational universe, it’s hard not to be struck by the limitations of what we know how to describe or think about. Modern neural nets provide an interesting example. For the ImageIdentify function of the Wolfram Language we’ve trained a neural net to identify thousands of kinds of things in the world. And to cater to our human purposes, what the network ultimately does is to describe what it sees in terms of concepts that we can name with words—tables, chairs, elephants, etc.
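
ImageIdentify is available directly as a function, so one can try it on any image—for instance on one of the standard built-in test images (chosen here purely for illustration):

(* identify the content of a standard test image; the result is a symbolic concept *)
ImageIdentify[ExampleData[{"TestImage", "Mandrill"}]]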

But internally what the network is doing is to identify a series of features of any object in the world. Is it green? Is it round? And so on. And what happens as the neural network is trained is that it identifies features it finds useful for distinguishing different kinds of things in the world. But the point is that almost none of these features are ones to which we happen to have assigned words in human language.

Out in the computational universe it’s possible to find what may be incredibly useful ways to describe things. But they’re alien to us humans. They’re not something we know how to express, based on the corpus of knowledge our civilization has developed.

Now of course new concepts are being added to the corpus of human knowledge all the time. Back a century ago, if someone saw a nested pattern they wouldn’t have any way to describe it. But now we’d just say “it’s a fractal”. But the problem is that in the computational universe there’s an infinite collection of “potentially useful concepts”—with which we can never hope to ultimately keep up.

The Analogy in Mathematics

When I wrote A New Kind of Science I viewed it in no small part as an effort to break away from the use of mathematics—at least as a foundation for science. But one of the things I realized is that the ideas in the book also have a lot of implications for pure mathematics itself.

What is mathematics? Well, it’s a study of certain abstract kinds of systems, based on things like numbers and geometry. In a sense it’s exploring a small corner of the computational universe of all possible abstract systems. But still, plenty has been done in mathematics: indeed, the 3 million or so published theorems of mathematics represent perhaps the largest single coherent intellectual structure that our species has built.

Ever since Euclid, people have at least notionally imagined that mathematics starts from certain axioms (say, a+b=b+a, a+0=a, etc.), then builds up derivations of theorems. Why is math hard? The answer is fundamentally rooted in the phenomenon of computational irreducibility—which here is manifest in the fact that there’s no general way to shortcut the series of steps needed to derive a theorem. In other words, it can be arbitrarily hard to get a result in mathematics. But worse than that—as Gödel’s Theorem showed—there can be mathematical statements where there just aren’t any finite ways to prove or disprove them from the axioms. And in such cases, the statements just have to be considered “undecidable”.

And in a sense what’s remarkable about math is that one can usefully do it at all. Because it could be that most mathematical results one cares about would be undecidable. So why doesn’t that happen?

Well, if one considers arbitrary abstract systems it happens a lot. Take a typical cellular automaton—or a Turing machine—and ask whether it’s true that the system, say, always settles down to periodic behavior regardless of its initial state. Even something as simple as that will often be undecidable.

So why doesn’t this happen in mathematics? Maybe there’s something special about the particular axioms used in mathematics. And certainly if one thinks they’re the ones that uniquely describe science and the world there might be a reason for that. But one of the whole points of the book is that actually there’s a whole computational universe of possible rules that can be useful for doing science and describing the world.

And in fact I don’t think there’s anything abstractly special about the particular axioms that have traditionally been used in mathematics: I think they’re just accidents of history.

What about the theorems that people investigate in mathematics? Again, I think there’s a strong historical character to them. For all but the most trivial areas of mathematics, there’s a whole sea of undecidability out there. But somehow mathematics picks the islands where theorems can actually be proved—often particularly priding itself on places close to the sea of undecidability where the proof can only be done with great effort.

I’ve been interested in the whole network of published theorems in mathematics (it’s a thing to curate, like wars in history, or properties of chemicals). And one of the things I’m curious about is whether there’s an inexorable sequence to the mathematics that’s done, or whether, in a sense, random parts are being picked.

And here, I think, there’s a considerable analogy to the kind of thing we were discussing before with language. What is a proof? Basically it’s a way of explaining to someone why something is true. I’ve made all sorts of automated proofs in which there are hundreds of steps, each perfectly verifiable by computer. But—like the innards of a neural net—what’s going on looks alien and not understandable by a human.
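
In recent versions of the Wolfram Language this kind of automated equational proof can be generated with FindEquationalProof. Here’s a minimal sketch with a couple of made-up axioms—the symbols plus and zero are placeholders invented purely for illustration, and the ProofGraph property is just one way to look at the resulting proof object:

(* derive a small theorem from explicitly stated equational axioms *)
axioms = {ForAll[{a, b}, plus[a, b] == plus[b, a]], ForAll[a, plus[a, zero] == a]};
proof = FindEquationalProof[plus[zero, x] == x, axioms];
proof["ProofGraph"]  (* the network of steps in the machine-generated proof *)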

For a human to understand, there have to be familiar “conceptual waypoints”. It’s pretty much like with words in languages. If some particular part of a proof has a name (“Smith’s Theorem”), and has a known meaning, then it’s useful to us. But if it’s just a lump of undifferentiated computation, it won’t be meaningful to us.

In pretty much any axiom system, there’s an infinite set of possible theorems. But which ones are “interesting”? That’s really a human question. And basically it’s going to end up being ones with “stories”. In the book I show that for the simple case of basic logic, the theorems that have historically been considered interesting enough to be given names happen to be precisely the ones that are in some sense minimal.

But my guess is that for richer axiom systems pretty much anything that’s going to be considered “interesting” is going to have to be reached from things that are already considered interesting. It’s like building up words or concepts: you don’t get to introduce new ones unless you can directly relate them to existing ones.

In recent years I’ve wondered quite a bit about how inexorable or not progress is in a field like mathematics. Is there just one historical path that can be taken, say from arithmetic to algebra to the higher reaches of modern mathematics? Or is there an infinite diversity of possible paths, with completely different histories for mathematics?

The answer is going to depend on—in a sense—the “structure of metamathematical space”: just what is the network of true theorems that avoid the sea of undecidability? Maybe it’ll be different for different fields of mathematics, and some will be more “inexorable” (so it feels like the math is being “discovered”) than others (where it seems more like the math is arbitrary, and “invented”).

But to me one of the most interesting things is how close—when viewed in these kinds of terms—questions about the nature and character of mathematics end up being to questions about the nature and character of intelligence and AI. And it’s this kind of commonality that makes me realize just how powerful and general the ideas in A New Kind of Science actually are.

When Is There a Science?

There are some areas of science—like physics and astronomy—where the traditional mathematical approach has done quite well. But there are others—like biology, social science and linguistics—where it’s had a lot less to say. And one of the things I’ve long believed is that what’s needed to make progress in these areas is to generalize the kinds of models one’s using, to consider a broader range of what’s out there in the computational universe.

And indeed in the past 15 or so years there’s been increasing success in doing this. And there are lots of biological and social systems, for example, where models have now been constructed using simple programs.

But unlike with mathematical models which can potentially be “solved”, these computational models often show computational irreducibility, and are typically used by doing explicit simulations. This can be perfectly successful for making particular predictions, or for applying the models in technology. But a bit like for the automated proofs of mathematical theorems one might still ask, “is this really science?”.

Yes, one can simulate what a system does, but does one “understand” it? Well, the problem is that computational irreducibility implies that in some fundamental sense one can’t always “understand” things. There might be no useful “story” that can be told; there may be no “conceptual waypoints”—only lots of detailed computation.

Imagine that one’s trying to make a science of how the brain understands language—one of the big goals of linguistics. Well, perhaps we’ll get an adequate model of the precise rules which determine the firing of neurons or some other low-level representation of the brain. And then we look at the patterns generated in understanding some whole collection of sentences.

Well, what if those patterns look like the behavior of rule 30? Or, closer at hand, the innards of some recurrent neural network? Can we “tell a story” about what’s happening? To do so would basically require that we create some kind of higher-level symbolic representation: something where we effectively have words for core elements of what’s going on.

But computational irreducibility implies that there may ultimately be no way to create such a thing. Yes, it will always be possible to find patches of computational reducibility, where some things can be said. But there won’t be a complete story that can be told. And one might say there won’t be a useful reductionistic piece of science to be done. But that’s just one of the things that happens when one’s dealing with (as the title says) a new kind of science.

Controlling the AIs

People have gotten very worried about AI in recent years. They wonder what’s going to happen when AIs “get much smarter” than us humans. Well, the Principle of Computational Equivalence has one piece of good news: at some fundamental level, AIs will never be “smarter”—they’ll just be able to do computations that are ultimately equivalent to what our brains do, or, for that matter, what all sorts of simple programs do.

As a practical matter, of course, AIs will be able to process larger amounts of data more quickly than actual brains. And no doubt we’ll choose to have them run many aspects of the world for us—from medical devices to central banks to transportation systems, and much more.

So then it’s important to figure out how we’ll tell them what to do. As soon as we’re making serious use of what’s out there in the computational universe, we’re not going to be able to give a line-by-line description of what the AIs are going to do. Rather, we’re going to have to define goals for the AIs, then let them figure out how best to achieve those goals.

In a sense we’ve already been doing something like this for years in the Wolfram Language. There’s some high-level function that describes something you want to do (“lay out a graph”, “classify data”, etc.). Then it’s up to the language to automatically figure out the best way to do it.
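
A couple of minimal examples of this kind of goal-level specification (the data here is made up purely for illustration):

(* say you want a classifier—and let the method be chosen automatically *)
c = Classify[{1.0 -> "small", 1.3 -> "small", 9.7 -> "large", 10.2 -> "large"}];
c[1.1]

(* say you want a graph laid out—and let the layout algorithm be chosen automatically *)
Graph[RandomGraph[{20, 40}], GraphLayout -> Automatic]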

And in the end the real challenge is to find a way to describe goals. Yes, you want to search for cellular automata that will make a “nice carpet pattern”, or a “good edge detector”. But what exactly do those things mean? What you need is a language that a human can use to say as precisely as possible what they mean.

It’s really the same problem as I’ve been talking about a lot here. One has to have a way for humans to be able to talk about things they care about. There’s infinite detail out there in the computational universe. But through our civilization and our shared cultural history we’ve come to identify certain concepts that are important to us. And when we describe our goals, it’s in terms of these concepts.

Three hundred years ago Leibniz was interested in finding a precise symbolic way to represent the content of human thoughts and human discourse. He was far too early. But now I think we’re finally in a position to actually make this work. In fact, we’ve already gotten a long way with the Wolfram Language in being able to describe real things in the world. And I’m hoping it’ll be possible to construct a fairly complete “symbolic discourse language” that lets us talk about the things we care about.

Right now we write legal contracts in “legalese” as a way to make them slightly more precise than ordinary natural language. But with a symbolic discourse language we’ll be able to write true “smart contracts” that describe in high-level terms what we want to have happen—and then machines will automatically be able to verify or execute the contract.

But what about the AIs? Well, we need to tell them what we generally want them to do. We need to have a contract with them. Or maybe we need to have a constitution for them. And it’ll be written in some kind of symbolic discourse language, that both allows us humans to express what we want, and is executable by the AIs.

There’s lots to say about what should be in an AI Constitution, and how the construction of such things might map onto the political and cultural landscape of the world. But one of the obvious questions is: can the constitution be simple, like Asimov’s Laws of Robotics?

And here what we know from A New Kind of Science tells us the answer: it can’t be. In a sense the constitution is an attempt to sculpt what can happen in the world and what can’t. But computational irreducibility says that there will be an unbounded collection of cases to consider.

For me it’s interesting to see how theoretical ideas like computational irreducibility end up impinging on these very practical—and central—societal issues. Yes, it all started with questions about things like the theory of all possible theories. But in the end it turns into issues that everyone in society is going to end up being concerned about.

There’s an Endless Frontier

Will we reach the end of science? Will we—or our AIs—eventually invent everything there is to be invented?

For mathematics, it’s easy to see that there’s an infinite number of possible theorems one can construct. For science, there’s an infinite number of possible detailed questions to ask. And there’s also an infinite array of possible inventions one can construct.

But the real question is: will there always be interesting new things out there?

Well, computational irreducibility says there will always be new things that need an irreducible amount of computational work to reach from what’s already there. So in a sense there’ll always be “surprises”, that aren’t immediately evident from what’s come before.

But will it just be like an endless array of different weirdly shaped rocks? Or will there be fundamental new features that appear, that we humans consider interesting?

It’s back to the very same issue we’ve encountered several times before: for us humans to find things “interesting” we have to have a conceptual framework that we can use to think about them. Yes, we can identify a “persistent structure” in a cellular automaton. Then maybe we can start talking about “collisions between structures”. But when we just see a whole mess of stuff going on, it’s not going to be “interesting” to us unless we have some higher-level symbolic way to talk about it.

In a sense, then, the rate of “interesting discovery” isn’t going to be limited by our ability to go out into the computational universe and find things. Instead, it’s going to be limited by our ability as humans to build a conceptual framework for what we’re finding.

It’s a bit like what happened in the whole development of what became A New Kind of Science. People had seen related phenomena for centuries if not millennia (distribution of primes, digits of pi, etc.). But without a conceptual framework they just didn’t seem “interesting”, and nothing was built around them. And indeed as I understand more about what’s out there in the computational universe—and even about things I saw long ago there—I gradually build up a conceptual framework that lets me go further.

By the way, it’s worth realizing that inventions work a little differently from discoveries. One can see something new happen in the computational universe, and that might be a discovery. But an invention is about figuring out how something can be achieved in the computational universe.

And—like in patent law—it isn’t really an invention if you just say “look, this does that”. You have to somehow understand a purpose that it’s achieving.

In the past, the focus of the process of invention has tended to be on actually getting something to work (“find the lightbulb filament that works”, etc.). But in the computational universe, the focus shifts to the question of what you want the invention to do. Because once you’ve described the goal, finding a way to achieve it is something that can be automated.

That’s not to say that it will always be easy. In fact, computational irreducibility implies that it can be arbitrarily difficult. Let’s say you know the precise rules by which some chemicals can interact. Can you find a chemical synthesis pathway that will let you get to some particular chemical structure? There may be a way, but computational irreducibility implies that there may be no way to find out how long the pathway may be. And if you haven’t found a pathway you may never be sure if it’s because there isn’t one, or just because you didn’t reach it yet.

The Fundamental Theory of Physics

If one thinks about reaching the edge of science, one cannot help but wonder about the fundamental theory of physics. Given everything we’ve seen in the computational universe, is it conceivable that our physical universe could just correspond to one of those programs out there in the computational universe?

Of course, we won’t really know until or unless we find it. But in the years since A New Kind of Science appeared, I’ve become ever more optimistic about the possibilities.

Needless to say, it would be a big change for physics. Today there are basically two major frameworks for thinking about fundamental physics: general relativity and quantum field theory. General relativity is a bit more than 100 years old; quantum field theory maybe 90. And both have achieved spectacular things. But neither has succeeded in delivering us a complete fundamental theory of physics. And if nothing else, I think after all this time, it’s worth trying something new.

But there’s another thing: from actually exploring the computational universe, we have a huge amount of new intuition about what’s possible, even in very simple models. We might have thought that the kind of richness we know exists in physics would require some very elaborate underlying model. But what’s become clear is that that kind of richness can perfectly well emerge even from a very simple underlying model.

What might the underlying model be like? I’m not going to discuss this in great detail here, but suffice it to say that I think the most important thing about the model is that it should have as little as possible built in. We shouldn’t have the hubris to think we know how the universe is constructed; we should just take a general type of model that’s as unstructured as possible, and do what we typically do in the computational universe: just search for a program that does what we want.

My favorite formulation for a model that’s as unstructured as possible is a network: just a collection of nodes with connections between them. It’s perfectly possible to formulate such a model as an algebraic-like structure, and probably many other kinds of things. But we can think of it as a network. And in the way I’ve imagined setting it up, it’s a network that’s somehow “underneath” space and time: every aspect of space and time as we know it must emerge from the actual behavior of the network.

Over the past decade or so there’s been increasing interest in things like loop quantum gravity and spin networks. They’re related to what I’ve been doing insofar as they also involve networks. And maybe there’s some deeper relationship. But in their usual formulation, they’re much more mathematically elaborate.

From the point of view of the traditional methods of physics, this might seem like a good idea. But with the intuition we have from studying the computational universe—and using it for science and technology—it seems completely unnecessary. Yes, we don’t yet know the fundamental theory of physics. But it seems sensible to start with the simplest hypothesis. And that’s definitely something like a simple network of the kind I’ve studied.

At the outset, it’ll look pretty alien to people (including myself) trained in traditional theoretical physics. But some of what emerges isn’t so alien. A big result I found nearly 20 years ago (that still hasn’t been widely understood) is that when you look at a large enough network of the kind I studied you can show that its averaged behavior follows Einstein’s equations for gravity. In other words, without putting any fancy physics into the underlying model, it ends up automatically emerging. I think it’s pretty exciting.

People ask a lot about quantum mechanics. Yes, my underlying model doesn’t build in quantum mechanics (just as it doesn’t build in general relativity). Now, it’s a little difficult to pin down exactly what the essence of “being quantum mechanical” actually is. But there are some very suggestive signs that my simple networks actually end up showing what amounts to quantum behavior—just like in the physics we know.

OK, so how should one set about actually finding the fundamental theory of physics if it’s out there in the computational universe of possible programs? Well, the obvious thing is to just start searching for it, starting with the simplest programs.

I’ve been doing this—more sporadically than I would like—for the past 15 years or so. And my main discovery so far is that it’s actually quite easy to find programs that aren’t obviously not our universe. There are plenty of programs where space or time are obviously completely different from the way they are in our universe, or there’s some other pathology. But it turns out it’s not so difficult to find candidate universes that aren’t obviously not our universe.

But we’re immediately bitten by computational irreducibility. We can simulate the candidate universe for billions of steps. But we don’t know what it’s going to do—and whether it’s going to grow up to be like our universe, or completely different.

It’s pretty unlikely that in looking at that tiny fragment of the very beginning of a universe we’re going to ever be able to see anything familiar, like a photon. And it’s not at all obvious that we’ll be able to construct any kind of descriptive theory, or effective physics. But in a sense the problem is bizarrely similar to the one we have even in systems like neural networks: there’s computation going on there, but can we identify “conceptual waypoints” from which we can build up a theory that we might understand?

It’s not at all clear our universe has to be understandable at that level, and it’s quite possible that for a very long time we’ll be left in the strange situation of thinking we might have “found our universe” out in the computational universe, but not being sure.

Of course, we might be lucky, and it might be possible to deduce an effective physics, and see that some little program that we found ends up reproducing our whole universe. It would be a remarkable moment for science. But it would immediately raise a host of new questions—like why this universe, and not another?

Box of a Trillion Souls

Right now we humans exist as biological systems. But in the future it’s certainly going to be technologically possible to reproduce all the processes in our brains in some purely digital—computational—form. So insofar as those processes represent “us”, we’re going to be able to be “virtualized” on pretty much any computational substrate. And in this case we might imagine that the whole future of a civilization could wind up in effect as a “box of a trillion souls”.

Inside that box there would be all kinds of computations going on, representing the thoughts and experiences of all those disembodied souls. Those computations would reflect the rich history of our civilization, and all the things that have happened to us. But at some level they wouldn’t be anything special.

It’s perhaps a bit disappointing, but the Principle of Computational Equivalence tells us that ultimately these computations will be no more sophisticated than the ones that go on in all sorts of other systems—even ones with simple rules, and no elaborate history of civilization. Yes, the details will reflect all that history. But in a sense without knowing what to look for—or what to care about—one won’t be able to tell that there’s anything special about it.

OK, but what about for the “souls” themselves? Will one be able to understand their behavior by seeing that they achieve certain purposes? Well, in our current biological existence, we have all sorts of constraints and features that give us goals and purposes. But in a virtualized “uploaded” form, most of these just go away.

I’ve thought quite a bit about how “human” purposes might evolve in such a situation, recognizing, of course, that in virtualized form there’s little difference between human and AI. The disappointing vision is that perhaps the future of our civilization consists in disembodied souls in effect “playing videogames” for the rest of eternity.

But what I’ve slowly realized is that it’s actually quite unrealistic to project our view of goals and purposes from our experience today into that future situation. Imagine talking to someone from a thousand years ago and trying to explain that people in the future would be walking on treadmills every day, or continually sending photographs to their friends. The point is that such activities don’t make sense until the cultural framework around them has developed.

It’s the same story yet again as with trying to characterize what’s interesting or what’s explainable. It relies on the development of a whole network of conceptual waypoints.

Can we imagine what the mathematics of 100 years from now will be like? It depends on concepts we don’t yet know. So similarly if we try to imagine human motivation in the future, it’s going to rely on concepts we don’t know. Our best description from today’s viewpoint might be that those disembodied souls are just “playing videogames”. But to them there might be a whole subtle motivation structure that they could only explain by rewinding all sorts of steps in history and cultural development.

By the way, if we know the fundamental theory of physics then in a sense we can make the virtualization complete, at least in principle: we can just run a simulation of the universe for those disembodied souls. Of course, if that’s what’s happening, then there’s no particular reason it has to be a simulation of our particular universe. It could as well be any universe from out in the computational universe.

Now, as I’ve mentioned, even in any given universe one will never in a sense run out of things to do, or discover. But I suppose I myself at least find it amusing to imagine that at some point those disembodied souls might get bored with just being in a simulated version of our physical universe—and might decide it’s more fun (whatever that means to them) to go out and explore the broader computational universe. Which would mean that in a sense the future of humanity would be an infinite voyage of discovery in the context of none other than A New Kind of Science!

The Economics of the Computational Universe

Long before we have to think about disembodied human souls, we’ll have to confront the issue of what humans should be doing in a world where more and more can be done automatically by AIs. Now in a sense this issue is nothing new: it’s just an extension of the long-running story of technology and automation. But somehow this time it feels different.

And I think the reason is in a sense just that there’s so much out there in the computational universe, that’s so easy to get to. Yes, we can build a machine that automates some particular task. We can even have a general-purpose computer that can be programmed to do a full range of different tasks. But even though these kinds of automation extend what we can do, it still feels like there’s effort that we have to put into them.

But the picture now is different—because in effect what we’re saying is that if we can just define the goal we want to achieve, then everything else will be automatic. All sorts of computation, and, yes, “thinking”, may have to be done, but the idea is that it’s just going to happen, without human effort.

At first, something seems wrong. How could we get all that benefit, without putting in more effort? It’s a bit like asking how nature could manage to make all the complexity it does—even though when we build artifacts, even with great effort, they end up far less complex. The answer, I think, is that nature is in effect mining the computational universe. And it’s exactly the same thing for us: by mining the computational universe, we can achieve essentially an unbounded level of automation.

If we look at the important resources in today’s world, many of them still depend on actual materials. And often these materials are literally mined from the Earth. Of course, there are accidents of geography and geology that determine by whom and where that mining can be done. And in the end there’s a limit (if often very large) to the amount of material that’ll ever be available.

But when it comes to the computational universe, there’s in a sense an inexhaustible supply of material—and it’s accessible to anyone. Yes, there are technical issues about how to “do the mining”, and there’s a whole stack of technology associated with doing it well. But the ultimate resource of the computational universe is a global and infinite one. There’s no scarcity, and no reason for it to be “expensive”. One just has to understand that it’s there, and take advantage of it.

The Path to Computational Thinking

Probably the greatest intellectual shift of the past century has been the one towards the computational way of thinking about things. I’ve often said that if one picks almost any field “X”, from archaeology to zoology, then by now there either is, or soon will be, a field called “computational X”—and it’s going to be the future of the field.

I myself have been deeply involved in trying to enable such computational fields, in particular through the development of the Wolfram Language. But I’ve also been interested in what is essentially the meta problem: how should one teach abstract computational thinking, for example to kids? The Wolfram Language is certainly important as a practical tool. But what about the conceptual, theoretical foundations?

Well, that’s where A New Kind of Science comes in. Because at its core it’s discussing the pure abstract phenomenon of computation, independent of its applications to particular fields or tasks. It’s a bit like with elementary mathematics: there are things to teach and understand just to introduce the ideas of mathematical thinking, independent of their specific applications. And so it is too with the core of A New Kind of Science. There are things to learn about the computational universe that give intuition and introduce patterns of computational thinking—quite independent of detailed applications.

One can think of it as a kind of “pre computer science”, or “pre computational X”. Before one gets into discussing the specifics of particular computational processes, one can just study the simple but pure things one finds in the computational universe.

And, yes, even before kids learn to do arithmetic, it’s perfectly possible for them to fill out something like a cellular automaton coloring book—or to execute for themselves or on a computer a whole range of different simple programs. What does it teach? Well, it certainly teaches the idea that there can be definite rules or algorithms for things—and that if one follows them one can create useful and interesting results. And, yes, it helps that systems like cellular automata make obvious visual patterns, that for example one can even find in nature (say on mollusc shells).
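
For example, here’s a minimal Wolfram Language version of that kind of exercise, using the nested rule 90 pattern purely as an example (a kid could just as well fill it out by hand on graph paper):

(* Run the rule-90 cellular automaton from a single black cell for 30 steps and show the nested pattern *)
ArrayPlot[CellularAutomaton[90, {{1}, 0}, 30]]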

As the world becomes more computational—and more things are done by AIs and by mining the computational universe—there’s going to be an extremely high value not only in understanding computational thinking, but also in having the kind of intuition that develops from exploring the computational universe and that is, in a sense, the foundation for A New Kind of Science.

What’s Left to Figure Out?

My goal over the decade that I spent writing A New Kind of Science was, as much as possible, to answer all the first round of “obvious questions” about the computational universe. And looking back 15 years later I think that worked out pretty well. Indeed, today, when I wonder about something to do with the computational universe, I find it’s incredibly likely that somewhere in the main text or notes of the book I already said something about it.

But one of the biggest things that’s changed over the past 15 years is that I’ve gradually begun to understand more of the implications of what the book describes. There are lots of specific ideas and discoveries in the book. But in the longer term I think what’s most significant is how they serve as foundations, both practical and conceptual, for a whole range of new things that one can now understand and explore.

But even in terms of the basic science of the computational universe, there are certainly specific results one would still like to get. For example, it would be great to get more evidence for or against the Principle of Computational Equivalence, and its domain of applicability.

Like most general principles in science, the whole epistemological status of the Principle of Computational Equivalence is somewhat complicated. Is it like a mathematical theorem that can be proved? Is it like a law of nature that might (or might not) be true about the universe? Or is it like a definition, say of the very concept of computation? Well, much like, say, the Second Law of Thermodynamics or Evolution by Natural Selection, it’s a combination of these.

But one thing that’s significant is that it’s possible to get concrete evidence for (or against) the Principle of Computational Equivalence. The principle says that even systems with very simple rules should be capable of arbitrarily sophisticated computation—so that in particular they should be able to act as universal computers.

And indeed one of the results of the book is that this is true for one of the simplest possible cellular automata (rule 110). Five years after the book was published I decided to put up a prize for evidence about another case: the simplest conceivably universal Turing machine. And I was very pleased that in just a few months the prize was won, the Turing machine was proved universal, and there was another piece of evidence for the Principle of Computational Equivalence.
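
For reference, here’s a one-line way to see the kind of behavior in question: rule 110 run from random initial conditions (the width and number of steps are arbitrary choices). The universality proof itself, of course, involves much more than just running the rule.

(* Rule 110 from random initial conditions: localized structures whose interactions support universal computation *)
ArrayPlot[CellularAutomaton[110, RandomInteger[1, 250], 150]]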

There’s a lot to do in developing the applications of A New Kind of Science. There are models to be made of all sorts of systems. There’s technology to be found. Art to be created. There’s also a lot to do in understanding the implications.

But it’s important not to forget the pure investigation of the computational universe. In the analogy of mathematics, there are applications to be pursued. But there’s also a “pure mathematics” that’s worth pursuing in its own right. And so it is with the computational universe: there’s a huge amount to explore just at an abstract level. And indeed (as the title of the book implies) there’s enough to define a whole new kind of science: a pure science of the computational universe. And it’s the opening of that new kind of science that I think is the core achievement of A New Kind of Science—and the one of which I am most proud.


For the 10th anniversary of A New Kind of Science, I wrote three posts:

The complete high-resolution A New Kind of Science is now available on the web. There are also a limited number of print copies of the book still available (all individually coded!).


Showing Off to the Universe: Beacons for the Afterlife of Our Civilization


The Nature of the Problem

Confused alien with Spikey
Let’s say we had a way to distribute beacons around our solar system (or beyond) that could survive for billions of years, recording what our civilization has achieved. What should they be like?

It’s easy to come up with what I consider to be sophomoric answers. But in reality I think this is a deep—and in some ways unsolvable—philosophical problem that’s connected to fundamental issues about knowledge, communication and meaning.

Still, a friend of mine recently started a serious effort to build little quartz disks, etc., and have them hitch rides on spacecraft, to be deposited around the solar system. At first I argued that it was all a bit futile, but eventually I agreed to be an advisor to the project, and at least try to figure out what to do to the extent we can.

But, OK, so what’s the problem? Basically it’s about communicating meaning or knowledge outside of our current cultural and intellectual context. We just have to think about archaeology to know this is hard. What exactly was some arrangement of stones from a few thousand years ago for? Sometimes we can pretty much tell, because it’s close to something in our current culture. But a lot of the time it’s really hard to tell.

OK, but what are the potential use cases for our beacons? One might be to back up human knowledge so things could be restarted even if something goes awfully wrong with our current terrestrial civilization. And of course historically it was very fortunate that we had all those texts from antiquity when things in Europe restarted during the Renaissance. But part of what made this possible was that there had been a continuous tradition of languages like Latin and Greek—not to mention that it was humans that were both the creators and consumers of the material.

But what if the consumers of the beacons we plan to spread around the solar system are aliens, with no historical connection to us? Well, then it’s a much harder problem.

In the past, when people have thought about this, there’s been a tendency to say “just show them math: it’s universal, and it’ll impress them!” But actually, I think neither claim about math is really true.

To understand this, we have to dive a little into some basic science that I happen to have spent many years working on. The reason people think math is a candidate for universal communication is that its constructs seem precise, and that at least here on Earth there’s only one (extant) version of it, so it seems definable without cultural references. But if one actually starts trying to work out how to communicate about current math without any assumptions (as, for example, I did as part of consulting on the Arrival movie), one quickly discovers that one really has to go “below math” to get to computational processes with simpler rules.

And (as seems to happen with great regularity, at least to me) one obvious place one lands is with cellular automata. It’s easy to show an elaborate pattern that’s created according to simple well-defined rules:

Complex cellular automata
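
(As a concrete sketch, a pattern of this general kind can be generated with a single line of Wolfram Language; rule 30 is used here just as a representative example.)

(* An elaborate pattern produced by a simple, well-defined rule *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]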

But here’s the problem: there are plenty of physical systems that basically operate according to rules like these, and produce similarly elaborate patterns. So if this is supposed to show the impressive achievement of our civilization, it fails.

OK, but surely there must be something we can show that makes it clear that we’ve got some special spark of intelligence. I certainly always assumed there was. But one of the things that’s come out of the basic science I’ve done is what I call the Principle of Computational Equivalence, which basically says that once one’s gotten beyond a very basic level, every system will show behavior that’s equivalent in the sophistication of the computation it exhibits.

So although we’re very proud of our brains, and our computers, and our mathematics, they’re ultimately not going to be able to produce anything that’s beyond what simple programs like cellular automata—or, for that matter, “naturally occurring” physical systems—can produce. So when we make an offhand comment like “the weather has a mind of its own”, it’s not so silly: the fluid dynamic processes that lead to the weather are computationally equivalent to the processes that, for example, go on in our brains.

It’s a natural human tendency at this point to protest that surely there must be something special about us, and everything we’ve achieved with our civilization. People may say, for example, that there’s no meaning and no purpose to what the weather does. Of course, we can certainly attribute such things to it (“it’s trying to equalize temperatures between here and there”, etc.), and without some larger cultural story there’s no meaningful way to say if they’re “really there” or not.

OK, so if showing a sophisticated computation isn’t going to communicate what’s special about us and our civilization, what is? The answer, in the end, is details. Sophisticated computation is ubiquitous in our universe. But what’s inevitably special about us is the details of our history and what we care about.

We’re learning the same thing as we watch the progress of artificial intelligence. Increasingly, we can automate the things we humans can do—even ones that involve reasoning, or judgement, or creativity. But what we (essentially by definition) can’t automate is defining what we want to do, and what our goals are. For these are intimately connected to the details of our biological existence, and the history of our civilization—which is exactly what’s special about us.

But, OK, how can we communicate these things? Well, it’s hard. Because—needless to say—they’re tied into aspects of us that are special, and that won’t necessarily be shared with whatever we’re trying to communicate with.

At the end of the day, though, we’ve got a project that’s going to launch beacons on spacecraft. So what’s the best thing to put on them? I’ve spent a significant part of my life building what’s now the Wolfram Language, whose core purpose is to provide a precise language for communicating knowledge that our civilization has accumulated in a way that both us humans, and computers, can understand. So perhaps this—and my experience with it—can help. But first, we should talk about history to get an idea of what has and hasn’t worked in the past.

Lessons from the Past

A few years ago I was visiting a museum and looking at little wooden models of life in ancient Egypt that had been buried with some king several millennia ago. “How sad,” I thought. “They imagined this would help them in the afterlife. But it didn’t work; instead it just ended up in a museum.” But then it struck me: “No, it did work! This is their ‘afterlife’!” And they successfully transmitted some essence of their life to a world far beyond their own.

Tomb models

Of course, when we look at these models, it helps that a lot of what’s in them is familiar from modern times. Cows. A boat with oars. Scrolls. But some isn’t that familiar. What are those weird things at the ends of the boat, for example? What were they for? And here begins the challenge—of trying to understand without shared context.

I happened last summer to visit an archaeological site in Peru named Caral, which has all sorts of stone structures built more than 4000 years ago. It was pretty obvious what some of the structures were for. But others I couldn’t figure out. So I kept on asking our guide. And almost always the answer was the same: “it was for ceremonial purposes”.

Caral

Immediately I started thinking about modern structures. Yes, there are monuments and public artworks. But there are also skyscrapers, stadiums, cathedrals, canals, freeway interchanges and much more. And people have certain almost-ritual practices in interacting with these structures. But in the context of modern society, we would hardly call them “ceremonial”: we think of each type of structure as having a definite purpose which we can describe. But that description inevitably involves a considerable depth of cultural context.

When I was growing up in England, I went wandering around in woods near where I lived—and came across all sorts of pits and berms and other earthworks. I asked people what they were. Some said they were ancient fortifications; some said at least the pits were from bombs dropped in World War II. And who knows: maybe instead they were created by some process of erosion having nothing to do with people.

Almost exactly 50 years ago, as a young child vacationing in Sicily, I picked up this object on a beach:

Sicilian object

Being very curious what it was, I took it to my local archaeology museum. “You’ve come to the wrong place, young man,” they said, “it’s obviously a natural object.” So off I went to a natural history museum, only to be greeted with “Sorry, it’s not for us; it’s an artifact”. And from then until now the mystery has remained (though with modern materials analysis techniques it could perhaps be resolved—and I obviously should do it!)

There are so many cases where it’s hard to tell if something is an artifact or not. Consider all the structures we’ve built on Earth. Back when I was writing A New Kind of Science, I asked some astronauts what the most obvious manmade structure they noticed from space was. It wasn’t anything like the Great Wall of China (which is actually hard to see). Instead, they said it was a line across the Great Salt Lake in Utah (actually a 30-mile-long railroad causeway built in 1959, with algae that happen to have varying colors on the two sides):

Great Salt Lake
Right image courtesy of Ravell Call and Deseret News.

Then there was the 12-mile-diameter circle in New Zealand, the 30-mile one in Mauritania, and the 40-mile one in Quebec (with a certain Arrival heptapod calligraphy look):

Row of manmade images

Which were artifacts? This was before the web, so we had to contact people to find out. A New Zealand government researcher told us not to make the mistake of thinking their circle followed the shape of the cone volcano at its center. “The truth is, alas, much more prosaic,” he said: it’s the border of a national park, with trees cut outside only, i.e. an artifact. The other circles, however, had nothing to do with humans.

(It’s fun to look for evidence of humans visible from space. Like the grids of lights at night in Kansas, or lines of lights across Kazakhstan. And in recent years, there’s the 7-mile-long palm tree rendering in Dubai.  And, on the flip side, people have also tried to look for what might be “archaeological structures” in high-resolution satellite images of the Moon.)

But, OK, let’s come back to the question of what things mean. In a cave painting from 7000 years ago, we can recognize shapes of animals, and hand stencils that we can see were made with hands. But what do the configurations of these things mean? Realistically at this point we have no serious idea.

Cave paintings

Maybe it’s easier if we look at things that are more “mathematical”-like. In the 1990s I did a worldwide hunt for early examples of complex but structured patterns. I found all sorts of interesting things (such as mosaics supposedly made by Gilgamesh, from 3000 BC—and the earliest fractals, from 1210 AD). Most of the time I could tell what rules were used to make the patterns—though I could not tell what “meaning” the patterns were supposed to convey, or whether, instead, they were “merely ornamental”.

Ornamental mathlike patterns

The last pattern above, though, had me very puzzled for a while. Is it a cellular automaton being constructed back in the 1300s? Or something from number theory? Well, no, in the end it turns out it’s a rendering of a list of 62 attributes of Allah from the Koran, in a special square form of Arabic calligraphy constructed like this:

Arabic calligraphy

About a decade ago, I learned about a pattern from 11,000 years ago, on a wall in Aleppo, Syria (one hopes it’s still intact there). What is this? Math? Music? Map? Decoration? Digitally encoded data? We pretty much have no idea.

Aleppo, Syria

I could go on giving examples. Lots of times people have said “if one sees such-and-such, then it must have been made for a purpose”. The philosopher Immanuel Kant offered the opinion that if one saw a regular hexagon drawn in the sand, one could only imagine a “rational cause” for it. I used to think of this whenever I saw hexagonal patterns formed in rocks. And a few years ago I heard about hexagons in sand, produced purely by the action of wind. But the biggest hexagon I know is the storm pattern around the north pole of Saturn—that presumably wasn’t in any usual sense “put there for a purpose”:

Saturn

In 1899 Nikola Tesla picked up all sorts of elaborate and strange-sounding radio emissions, often a little reminiscent of Morse code. He knew they weren’t of human origin, so his immediate conclusion was that they must be radio messages from the inhabitants of Mars. Needless to say, they’re not. And instead, they’re just the result of physical processes in the Earth’s ionosphere and magnetosphere.

But here’s the ironic thing: they often sound bizarrely similar to whale songs! And, yes, whale songs have all sorts of elaborate rhyme-like and other features that remind us of languages. But we still don’t really know if they’re actually for “communication”, or just for “decoration” or “play”.

One might imagine that with modern machine learning and with enough data one should be able to train a translator for “talking to animals”. And no doubt that’d be easy enough for “are you happy?” or “are you hungry?”. But what about more sophisticated things? Say the kind of things we want to communicate to aliens?

I think it’d be very challenging. Because even if animals live in the same environment as us, it’s very unclear how they think about things. And it doesn’t help that even their experience of the world may be quite different—emphasizing for example smell rather than sight, and so on.

Animals can of course make “artifacts” too. Like this arrangement of sand produced over the course of a week or so by a little puffer fish:

Pufferfish circle

But what is this? What does it mean? Should we think of this “piscifact” as some great achievement of puffer fish civilization, that should be celebrated throughout the solar system?

Surely not, one might say. Because even though it looks complex—and even “artistic” (a bit like bird songs have features of music)—we can imagine that one day we’d be able to decode the neural pathways in the brain of the puffer fish that lead it to make this. But so what? We’ll also one day be able to know the neural pathways in humans that lead them to build cathedrals—or try to plant beacons around the solar system.

Aliens and the Philosophy of Purpose

There’s a thought experiment I’ve long found useful. Imagine a very advanced civilization, that’s able to move things like stars and planets around at will. What arrangement would they put them in?

Maybe they’d want to make a “beacon of purpose”. And maybe—like Kant—one could think that would be achievable by setting up some “recognizable” geometric pattern. Like how about an equilateral triangle? But no, that won’t do. Because for example the Trojan asteroids actually form an equilateral triangle with Jupiter and the Sun already, just as a result of physics.

And pretty soon one realizes that there’s actually nothing the aliens could do to “prove their purpose”. The configuration of stars in the sky may look kind of random to us (except, of course, that we still see constellations in it). But there’s nothing to say that looked at in the right way it doesn’t actually represent some grand purpose.

And here’s the confusing part: there’s a sense in which it does! Because, after all, just as a matter of physics, the configuration that occurs can be characterized as achieving the purpose of extremizing some quantity defined by the equations for matter and gravity and so on. Of course, one might say “that doesn’t count; it’s just physics”. But our whole universe (including ourselves) operates according to physics. And so now we’re back to discussing whether the extremization is “meaningful” or not.

We humans have definite ways to judge what’s meaningful or not to us. And what it comes down to is whether we can “tell a story” that explains, in culturally meaningful terms, why we’re doing something. Of course, the notion of purpose has evolved over the course of human history. Imagine trying to explain walking on a treadmill, or buying goods in a virtual world, or, for that matter, sending beacons out into the solar system—to the people thousands of years ago who created the structures from Peru that I showed above.

We’re not familiar (except in mythology) with telling “culturally meaningful stories” about the world of stars and planets. And in the past we might have imagined that somehow whatever stories we could tell would inevitably be far less rich than the ones we can tell about our civilization. But this is where basic science I’ve done comes in. The Principle of Computational Equivalence says that this isn’t true—and that in the end what goes on with stars and planets is just as rich as what goes on in our brains or our civilization.

In an effort to “show something interesting” to the universe, we might have thought that the best thing to do would be to present sophisticated abstract computational things. But that won’t be useful. Because those abstract computational things are ubiquitous throughout the universe.

And instead, the “most interesting” thing we have is actually the specific and arbitrary details of our particular history. Of course, one might imagine that there could be some sophisticated thing out there in the universe that could look at how our history starts, and immediately be able to deduce everything about how it will play out. But a consequence of the Principle of Computational Equivalence is what I call computational irreducibility, which implies that there can be no general shortcut to history; to find how it plays out, one effectively just has to live through it—which certainly helps one feel better about the meaningfulness of life.

The Role of Language

OK, so let’s say we want to explain our history. How can we do it? We can’t show every detail of everything that’s happened. Instead, we need to give a higher-level symbolic description, where we capture what’s important while idealizing everything else away. Of course, “what’s important” depends on who’s looking at it.

We might say “let’s show a picture”. But then we have to start talking about how to make the picture out of pixels at a certain resolution, how to represent colors, say with RGB—not to mention discussing how things might be imaged in 2D, compressed, etc. Across human history, we’ve had a decent record in having pictures remain at least somewhat comprehensible. But that’s probably in no small part because our biologically determined visual systems have stayed the same.

(It’s worth mentioning, though, that pictures can have features that are noticed only when they become “culturally absorbed”. For example, the nested patterns from the 1200s that I showed above were reproduced but ignored in art history books for hundreds of years—until fractals became “a thing”, and people had a way to talk about them.)

When it comes to communicating knowledge on a large scale, the only scheme we know (and maybe the only one that’s possible) is to use language—in which essentially there’s a set of symbolic constructs that can be arranged in an almost infinite number of ways to communicate different meanings.

It was presumably the introduction of language that allowed our species to begin accumulating knowledge from one generation to the next, and eventually to develop civilization as we know it. So it makes sense that language should be at the center of how we might communicate the story of what we’ve achieved.

And indeed if we look at human history, the cultures we know the most about are precisely those with records in written language that we’ve been able to read. If the structures in Caral had inscriptions, then (assuming we could read them) we’d have a much better chance of knowing what the structures were for.

There’ve been languages like Latin, Greek, Hebrew, Sanskrit and Chinese that have been continuously used (or at least known) for thousands of years—and that we’re readily able to translate. But in cases like Egyptian hieroglyphs, Babylonian cuneiform, Linear B, or Mayan, the thread of usage was broken, and it took heroic efforts to decipher them (and often the luck of finding something like the Rosetta Stone). And in fact today there are still plenty of languages—like Linear A, Etruscan, Rongorongo, Zapotec and the Indus script—that have simply never been deciphered.

Then there are cases where it’s not even clear whether something represents a language. An example is the quipus of Peru—that presumably recorded “data” of some kind, but that might or might not have recorded something we’d usually call a language:

Quipu

Math to the Rescue?

OK, but with all our abstract knowledge about mathematics, and computation, and so on, surely we can now invent a “universal language” that can be universally understood. Well, we can certainly create a formal system—like a cellular automaton—that just consistently operates according to its own formal rules. But does this communicate anything?

In its actual operation, the system just does what it does. But where there’s a choice is what the actual system is, what rules it uses, and what its initial conditions were. So if we were using cellular automata, we could for example decide that these particular ones are the ones we want to show:

Doubling cellular automata

What are we communicating here? Each rule has all sorts of detailed properties and behavior. But as a human you might say: “Aha, I see that all these rules double the length of their input; that’s the point”. But to be able to make that summary again requires a certain cultural context. Yes, with our human intellectual history, we have an easy way to talk about “doubling the length of their input”. But with a different intellectual history, that might not be a feature we have a way to talk about, just as human art historians for centuries didn’t have a way to talk about nested patterns.
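
(Going back to the “doubling” observation itself, here’s a hedged sketch of how one might state that property operationally. The helper name, the fixed number of steps and the crude cell count are all just illustrative assumptions, not a description of the actual rules pictured above.)

(* Crude check of a "doubling" property: start from a block of n black cells, run the rule for a fixed number of steps, and see whether about 2n nonzero cells remain *)
doublesInputQ[rule_, n_, steps_] :=
 Total[Unitize[Last[CellularAutomaton[rule, {Table[1, n], 0}, steps]]]] == 2 n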

Let’s say we choose to concentrate on traditional math. We have the same situation there. Maybe we could present theorems in some abstract system. But for each theorem it’s just “OK, fine, with those rules, that follows—much like with those shapes of molecules, this is a way they can arrange in a crystal”. And the only way one’s really “communicating something” is in the decision of which theorems to show, or which axiom systems to use. But again, to interpret those choices inevitably requires cultural context.

One place where the formal meets the actual is in the construction of theoretical models for things. We’ve got some actual physical process, and then we’ve got a formal, symbolic model for it—using mathematical equations, programs like cellular automata, or whatever. We might think that that connection would immediately define an interpretation for our formal system. But once again it does not, because our model is just a model, that captures some features of the system, and idealizes others away. And seeing how that works again requires cultural context.

There is one slight exception to this: what if there is a fundamental theory of all of physics, that can perhaps be stated as a simple program? That program is then not just an idealized model, but a full representation of physics. And the point is that that “ground truth” about our universe describes the physics that govern absolutely any entity that exists in our universe.

If there is indeed a simple model for the universe, it’s essentially inevitable that the things it directly describes are not ones familiar from our everyday sensory experience; for example they’re presumably “below” constructs like space and time as we know them. But still, we might imagine that we could show off our achievements by presenting a version of the ultimate theory for our universe (if we’d found it!). But even with this, there’s a problem. Because, well, it’s not difficult to show a correct model for the universe: you just have to look at the actual universe! So the main information in an abstract representation is in what the primitives of the abstract representation end up being (do you set up your universe in terms of networks, or algebraic structures, or what?).

Let’s back off from this level of philosophy for a moment. Let’s say we’re delivering a physical object—like a spacecraft, or a car—to our aliens. You might think the problem would be simpler. But the problem again is that it requires cultural context to decide what’s important, and what’s not. Is the placement of those rivets a message? Or an engineering optimization? Or an engineering tradition? Or just arbitrary?

Pretty much everything on, say, a spacecraft was presumably put there as part of building the spacecraft. Some was decided upon “on purpose” by its human designers. Some was probably a consequence of the physics of its manufacturing. But in the end the spacecraft just is what it is. You could imagine reconstructing the neural processes of its human designers, as you could imagine reconstructing the heat flows in the annealing of some part of it. But what is just the mechanism by which the spacecraft was built, and what is its “purpose”—or what is it trying to “communicate”?

The Molecular Version

It’s one thing to talk about sending messages based on the achievements of our civilization. But what about just sending our DNA? Yes, it doesn’t capture (at least in any direct way) all our intellectual achievements. But it does capture a couple of billion years of biological evolution, and represent a kind of memorial of the 10^40 or so organisms that have ever lived on our planet.

Of course, we might again ask “what does it mean?”. And indeed one of the points of Darwinism is that the forms of organisms (and the DNA that defines them) arise purely as a consequence of the process of biological evolution, without any “intentional design”. Needless to say, when we actually start talking about biological organisms there’s a tremendous tendency to say things like “that mollusc has a pointy shell because it’s useful in wedging itself in rocks”—in other words, to attribute a purpose to what has arisen from evolution.

So what would we be communicating by sending DNA (or, for that matter, complete instances of organisms)? In a sense we’d be providing a frozen representation of history, though now biological history. There’s an issue of context again too. How does one interpret a disembodied piece of DNA? (Or, what environment is needed to get this spore to actually do something?)

Long ago it used to be said that if there were “organic molecules” out in space, it’d be a sign of life. But in fact plenty of even quite complex molecules have now been found, even in interstellar space. And while these molecules no doubt reflect all sorts of complex physical processes, nobody takes them as a sign of anything like life.

So what would happen if aliens found a DNA molecule? Is that elaborate sequence a “meaningful message”, or just something created through random processes? Yes, in the end the sequences that have survived in modern DNA reflect in some way what leads to successful organisms in our specific terrestrial environment, though—just as with technology and language—there is a certain feedback in the way that organisms create the environment for others.

But, so, what does a DNA sequence show? Well, like a library of human knowledge, it’s a representation of a lot of elaborate historical processes—and of a lot of irreducible computation. But the difference is that it doesn’t have any “spark of human intention” in it.

Needless to say, as we’ve been discussing, it’s hard to identify a signature for that. If we look at things we’ve created so far in our civilization, they’re typically recognizable by the presence of things like (what we at least currently consider) simple geometrical forms, such as lines and circles and so on. And in a sense it’s ironic that after all our development as a civilization, what we produce as artifacts look so much simpler than what nature routinely produces.

And we don’t have to look at biology, with all the effort of its billions of years of evolution. We can just as well think of physics, and things like the forms of snowflakes or splashes or turbulent fluids.

As I’ve argued at length, the real point is that out in the computational universe of possible programs, it’s actually easy to find examples where even simple underlying rules lead to highly complex behavior. And that’s what’s happening in nature. And the only reason we don’t see that usually in the things we construct is that we constrain ourselves to use engineering practices that avoid complexity, so that we can foresee their outcome. And the result of this is that we tend to always end up with things that are simple and familiar.

Now that we understand more about the computational universe, we can see, however, that it doesn’t always have to be this way. And in fact I have had great success just “mining the computational universe” for programs (and structures) that turn out to be useful, independent of whether one can “understand” how they operate. And something like the same thing happens when one trains a modern machine learning system. One ends up with a technological system that we can identify as achieving some overall purpose, but where the individual parts we can’t particularly recognize as doing meaningful things.
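
To make that a little more concrete, here’s a minimal sketch of what such “mining” can look like, with the search criterion (elementary rules whose center column is close to 50% black, a crude proxy for apparent randomness) chosen purely for illustration:

(* Search the 256 elementary cellular automaton rules for ones whose center column, grown from a single black cell, is close to 50% black *)
centerFraction[r_] := N[Mean[CellularAutomaton[r, {{1}, 0}, 300][[All, 301]]]]

Select[Range[0, 255], Abs[centerFraction[#] - 0.5] < 0.02 &]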

And indeed my expectation is that in the future, a smaller and smaller fraction of human-created technology will be “recognizable” and “understandable”. Optimized circuitry doesn’t have nice repetitive structure; nor do optimized algorithms. Needless to say, it’s sometimes hard to tell what’s going on. Is that pattern of holes on a speakerphone arranged to optimize some acoustic feature, or is it just “decorative”?

Yet again we’re thrust back into the same philosophical quandary: we can see the mechanism by which things operate, and we can come up with a story that describes why they might work that way. But there is no absolute way to decide whether that story is “correct”—except by referring back to the details of humans and human culture.

Talking about the World

Let’s go back to language. What really is a language? Structurally (at least in all the examples we know so far) it’s a collection of primitives (words, grammatical constructs, etc.) that can be assembled according to certain rules. And yes, we can look at a language formally at this level, just like we can look, say, at how to make tilings according to some set of rules. But what makes a language useful for communication is that its primitives somehow relate to the world—and that they’re tied into knowledge.

In a first approximation, the words or other primitives in a language end up being things that are useful in describing aspects of the world that we want to communicate. We have different words for “table” and “chair” because those are buckets of meaning that we find it useful to distinguish. Yes, we could start describing the details of how the legs of the table are arranged, but for many purposes it’s sufficient to just have that one word, or one symbolic primitive, “table”, that describes what we think of as a table.

Of course, for the word “table” to be useful for communication, the sender and recipient of the word have to have shared understanding of its meaning. As a practical matter, for natural languages, this is usually achieved in an essentially societal way—with people seeing other people describing things as “tables”.

How do we determine what words should exist? It’s a societally driven process, but at some level it’s about having ways to define concepts that are repeatedly useful to us. There’s a certain circularity to the whole thing. The concepts that are useful to us depend on the environment in which we live. If there weren’t any tables around (e.g. during the Stone Age), it wouldn’t be terribly useful to have the word “table”.

But then once we introduce a word for something (like “blog”), it starts to be easier for us to think about the thing—and then there tends to be more of it in the environment that we construct for ourselves, or choose to live in.

Imagine an intelligence that exists as a fluid (say the weather, for example). Or even imagine an aquatic organism, used to a fluid environment. Lots of the words we might take for granted about solid objects or locations won’t be terribly useful. And instead there might be words for aspects of fluid flow (say, lumps of vorticity that change in some particular way) that we’ve never identified as concepts that we need words for.

It might seem as if different entities that exist within our physical universe must necessarily have some commonality in the way they describe the world. But I don’t think this is the case—essentially as a consequence of the phenomenon of computational irreducibility.

The issue is that computational irreducibility implies that there are in effect an infinite number of irreducibly different environments that can be constructed on the basis of our physical universe—just like there are an infinite number of irreducibly different universal computers that can be built up using any given universal computer. In more practical terms, a way to say this is that different entities—or different intelligences—could operate using irreducibly different “technology stacks”, based on different elements of the physical world (e.g. atomic vs. electronic vs. fluidic vs. gravitational, etc.) and different chains of inventions. And the result would be that their way of describing the world would be irreducibly different.

Forming a Language

But OK, given a certain experience of the world, how can one figure out what words or concepts are useful in describing it? In human natural languages, this seems to be something that basically just evolves through a process roughly analogous to natural selection in the course of societal use of the language. And in designing the Wolfram Language as a computational communication language I’ve basically piggybacked on what has evolved in human natural language.

So how can we see the emergence of words and concepts in a context further away from human language? Well, in modern times, there’s an answer, which is basically to use our emerging example of alien intelligence: artificial intelligence.

Just take a neural network and start feeding it, say, images of lots of things in the world. (By picking the medium of 2D images, with a particular encoding of data, we’re essentially defining ourselves to be “experiencing the world” in a specific way.) Now see what kinds of distinctions the neural net makes in clustering or classifying these images.

In practice, different runs will give different answers. But any pattern of answers is in effect providing an example of the primitives for a language.

An easy place to see this is in training an image identification network. We started doing this several years ago with tens of millions of example images, in about 10,000 categories. And what’s notable is that if you look inside the network, what it’s effectively doing is to home in on features of images that let it efficiently distinguish between different categories.

These features then in effect define the emergent symbolic language of the neural net. And, yes, this language is quite alien to us. It doesn’t directly reflect human language or human thinking. It’s in effect an alternate path for “understanding the world”, different from the one that humans and human language have taken.
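
One can get a rough feel for this with a small Wolfram Language experiment: lay out a few images in the feature space that a pretrained image network uses (the particular built-in test-image names here are just example choices):

(* Place some sample images in a neural net's feature space; images the net treats as similar land near each other *)
imgs = ExampleData[{"TestImage", #}] & /@ {"House", "Mandrill", "Peppers", "Airplane"};
FeatureSpacePlot[imgs]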

Can we decipher the language? Doing so would allow us to “explain the story” of what the neural net is “thinking”. But it won’t typically be easy to do. Because the “concepts” that are being identified in the neural network typically won’t have easy translations to things we know about—and we’ll be stuck in effect doing something like natural science to try to identify phenomena from which we can build up a description of what’s going on.

OK, but in the problem of communicating with aliens, perhaps this suggests a way. Don’t try (and it’ll be hard) to specify a formal definition of “chair”. Just show lots of examples of chairs—and use this to define the symbolic “chair” construct. Needless to say, as soon as one’s showing pictures of chairs, not providing actual chairs, there are issues of how one’s describing or encoding things. And while this approach might work decently for common nouns, it’s more challenging for things like verbs, or more complex linguistic constructs.

But if we don’t want our spacecraft full of sample objects (a kind of ontological Noah’s Ark), maybe we could get away with just sending a device that looks at objects, and outputs what they’re called. After all, a human version of this is basically how people learn languages, either as children, or when they’re out doing linguistic fieldwork. And today we could certainly have a little computer with a very respectable, human-grade image identifier on it.
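
In the Wolfram Language a basic version of that “device” already exists. Here’s a minimal sketch, using a built-in test image purely as a stand-in for whatever object is being shown:

(* Ask the built-in image identifier what it thinks a sample image shows *)
ImageIdentify[ExampleData[{"TestImage", "Mandrill"}]]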

But here’s the problem. The aliens will start showing the computer all sorts of things that they’re familiar with. But there’s no guarantee whatsoever that they’ll be aligned with the things we (or the image identifier) has words for. One can already see the problem if one feeds an image identifier human abstract art; it’s likely to be even worse with the products of alien civilization:

ImageIdentify goofs

What the Wolfram Language Does

So can the Wolfram Language help? My goal in building it has been to create a bridge between the things humans want to do, and the things computation abstractly makes possible. And if I were building the language not for humans but for aliens—or even dolphins—I’d expect it to be different.

In the end, it’s all about computation, and representing things computationally. But what one chooses to represent—and how one does it—depends on the whole context one’s dealing with. And in fact, even for us humans, this has steadily changed over time. Over the 30+ years I’ve been working on the Wolfram Language, for example, both technology and the world have measurably evolved—with the result that there are all sorts of new things that make sense to have in the language. (The advance of our whole cultural understanding of computation—with things like hyperlinks and functional programming now becoming commonplace—also changes the concepts that can be used in the language.)

Right now most people think of the Wolfram Language mainly as a way for humans to communicate with computers. But I’ve always seen it as a general computational communication language for humans and computers—that’s relevant among other things in giving us humans a way to think and communicate in computational terms. (And, yes, the kind of computational thinking this makes possible is going to be increasingly critical—even more so than mathematical thinking has been in the past.)

But the key point is that the Wolfram Language is capturing computation in human-compatible terms. And in fact we can view it as in effect giving a definition of which parts of the universe of possible computations we humans—at the current stage in the evolution of our civilization—actually care about.

Another way to put this is that we can think of the Wolfram Language as providing a compressed representation (or, in effect, a model) of the core content of our civilization. Some of that content is algorithmic and structural; some of it is data and knowledge about the details of our world and its history.

There’s more to do to make the Wolfram Language into a full symbolic discourse language that can express a full range of human intentions (for example what’s needed for encoding complete legal contracts, or ethical principles for AIs.) But with the Wolfram Language as it exists today, we’re already capturing a very broad swath of the concerns and achievements of our civilization.

But how would we feed it to aliens? At some level its gigabytes of code and terabytes of data just define rules—like the rules for a cellular automaton or any other computational system. But the point is that these rules are chosen to be ones that do computations that we humans care about.

It’s a bit like those Egyptian tomb models, which show things Egyptians cared about doing. If we give the aliens the Wolfram Language we’re essentially giving them a computational model of things we care about doing. Except, of course, that by providing a whole language—rather than just individual pictures or dioramas—we’re communicating in a vastly broader and deeper way.

The Reality of Time Capsules

What we’re trying to create in a sense amounts to a time capsule. So what can we learn from time capsules of the past? Sadly, the history is not too inspiring.

Particularly following the discovery of King Tutankhamun’s tomb in 1922, there was a burst of enthusiasm for time capsules that lasted a little over 50 years, and led to the creation—and typically burial—of perhaps 10,000 capsules. Realistically, though, the majority of these time capsules are even by now long forgotten—most often because the organizations that created them have changed or disappeared. (The Westinghouse Time Capsule for the 1939 World’s Fair was at one time a proud example; but last year the remains of Westinghouse filed for bankruptcy.)

My own email archive records a variety of requests in earlier years for materials for time capsules, and looking at it today I’m reminded that we seem to have created a time capsule for Mathematica’s 10th anniversary in 1998. But where is it now? I don’t know. And this is a typical problem. Because whereas an ongoing archive (or library, etc.) can keep organized track of things, time capsules tend to be singular, and have a habit of ending up sequestered away in places that quickly get obscured and forgotten. (The reverse can also happen: people think there’s a time capsule somewhere—like one supposedly left by John von Neumann to be opened 50 years after his death—but it turns out just to be a confusion.)

The one area where at least informal versions of time capsules seem to work out with some frequency is in building construction. In England, for example, when thatched roofs are redone after 50 years or so, it’s common for messages from the previous workers to be found. But a particularly old tradition—dating even back to the Babylonians—is to put things in the foundations, and particularly at the cornerstones, of buildings.

Often in Babylonian times, there would just be an inscription cursing whoever had demolished the building to the point of seeing its foundations. But later, there was for example a longstanding tradition among Freemason stonemasons to embed small boxes of memorabilia in public buildings they built.

More successful, however, than cleverly hidden time capsules have been stone inscriptions out in plain sight. And indeed much of our knowledge of ancient human history and culture comes from just such objects. Sometimes they are part of large surviving architectural structures. But one famous example (key to the deciphering of cuneiform) is simply carved into the side of a cliff in what’s now Iran:

Mount Behistun

Behistun inscription

For emphasis, it has a life-size relief of a bunch of warriors at the top. The translated text begins: “I am Darius the great, king of kings, …” and goes on to list 76 paragraphs of Darius’s achievements, many of them being the putting down of attempted rebellions against him, in which he brought their leaders to sticky ends.

Such inscriptions were common in the ancient world (as their tamer successors are common today). But somehow their irony was well captured by my childhood favorite poem, Shelley’s “Ozymandias” (named after Ramses II of Egypt):

“I met a traveller from an antique land,
Who said—Two vast and trunkless legs of stone
Stand in the desert.

And on the pedestal, these words appear:
‘My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.”

If there was a “Risks” section to a prospectus for the beacon project, this might be a good exhibit for it.

Of course, in addition to intentional “showoff” inscriptions, ancient civilizations left plenty of “documentary exhaust” that’s still around in one form or another today. A decade ago, for example, I bought off the web (and, yes, I’m pretty sure it’s genuine) a little cuneiform tablet from about 2100 BC:

Cuneiform tablet

It turns out to be a contract saying that a certain Mr. Lu-Nanna is receiving 1.5 gur (about 16 cubic feet) of barley in the month of Dumuzi (Tammuz/June-July), and that in return he should pay out certain goods in September-November.

Most surviving cuneiform tablets are about things like this. One in a thousand or so are about things like math and astronomy, though. And when we look at these tablets today, it’s certainly interesting to see how far the Babylonians had got in math and astronomy. But (with the possible exception of some astronomical parameters) after a while we don’t really learn anything more from such tablets.

And that’s a lesson for our efforts now. If we put math or science facts in our beacons, then, yes, it shows how far we’ve got (and of course to make the best impression we should try to illustrate the furthest reaches of, for example, today’s math, which will be quite hard to do). But it feels a bit like job applicants writing letters that start by explaining basic facts. Yes, we already know those; now tell us something about yourselves!

But what’s the best way to do that? In the past the channel with the highest bandwidth was the written word. In today’s world, maybe video—or AI simulation—goes further. But there’s more—and we’re starting to see this in modern archaeology. The fact is that pretty much any solid object carries microscopic traces of its history. Maybe it’s a few stray molecules—say from the DNA of something that got onto an eating utensil. Maybe it’s microscopic scratches or cracks in the material itself, indicating some pattern of wear.

Atomic force microscopy gives us the beginning of one way to systematically read such things out. But as molecular-scale computing comes online, such capabilities will grow rapidly. And this will give us access to a huge repository of “historical exhaust”.

We won’t immediately know the name “Lu-Nanna”. But we might well know their DNA, the DNA of their scribe, what time of day their tablet was made, and what smells and maybe even sounds there were while the clay was drying. All of this one can think of as a form of “sensory data”—once again giving us information on “what happened”, though with no interpretation of what was considered important.

Messages in Space

OK, but our objective is to put information about our civilization out into space. So what’s the history of previous efforts to do that? Well, right now there are just four spacecraft outside our solar system (and another one that’s headed there), and there are under 100 spacecraft more-or-less intact on various planetary surfaces (not counting hard landings, melted spacecraft on Venus, etc.). And at some level a spacecraft itself is a great big “message”, illustrating lots of technology and so on.

Satellites

Probably the largest amounts of “design information” will be in the microprocessors. And although radiation hardening forces deep space probes to use chip designs that are typically a decade or more behind the latest models, something like the New Horizons spacecraft launched in 2006 still has MIPS R3000 CPUs (albeit running at 12 MHz) with more than 100,000 transistors:

MIPS R3000

There are also substantial amounts of software, typically stored in some kind of ROM. Of course, it may not be easy to understand, even for humans—and indeed just last month, firing backup thrusters on Voyager 1 that hadn’t been used for 37 years required deciphering the machine code for a long-extinct custom CPU.

The structure of a spacecraft tells a lot about human engineering and its history. Why was the antenna assembly that shape? Well, because it came from a long lineage of other antennas that were conveniently modeled and manufactured in such-and-such a way, and so on.

But what about more direct human information? Well, there are often little labels printed on components by manufacturers. And in recent times there’s been a trend of sending lists of people’s names (more than 400,000 on New Horizons) in engravings, microfilm or CDs/DVDs. (The MAVEN Mars mission also notably carried 1000+ publicly submitted haikus about Mars, together with 300+ drawings by kids, all on a DVD.) But on most spacecraft the single most prominent piece of “human communication” is a flag:

New Horizons

A few times, however, there have been explicit, purposeful plaques and things displayed. For example, on the leg of Apollo 11’s lunar module this was attached (with the Earth rendered in a stereographic projection cut in the middle of the Atlantic around 20°W):

Apollo 11 plaque

Each Apollo mission to the Moon also planted an American flag (most still “flying” according to recent high-res reconnaissance)—strangely reminiscent of shrines to ancient gods found in archaeological remains:

Apollo 11

The very first successful moon probe (Luna 2) carried to the Moon this ball-like object—which was intended to detonate like a grenade and scatter its pentagonal facets just before the probe hit the lunar surface, proclaiming (presumably to stake a claim): “USSR, January 1959”:

USSR sphere
Courtesy of the Cosmosphere, Hutchinson, KS

On Mars, there’s a plaque that seems more like the cover sheet for a document—or that might be summarized as “putting the output of some human cerebellums out in the cosmos” (what kind of personality analysis could the aliens do from those signatures?):

Curiosity plaque

There’s another list of names, this time an explicit memorial for fallen astronauts, left on the Moon by Apollo 15. But this time it comes with a small figurine, strangely reminiscent of the figurines we find in early archaeological remains:

Hadley memorial

Figurines have actually been sent on other spacecraft too. Here are some LEGO ones that went to Jupiter on the Juno spacecraft (from left to right: mythological Jupiter, mythological Juno, and real Galileo, complete with LEGO attachments):

Juno LEGO figurines

Also on that spacecraft was a tribute to Galileo—though all this will be vaporized when the spacecraft deorbits Jupiter later in 2018 to avoid contaminating any moons:

Galileo tribute

A variety of somewhat random personal and other trinkets have been left—usually unofficially—on the Moon. An example is a collection of tiny artworks (which are head scratchers even for me as a human) apparently attached to the leg of the Apollo 12 lunar module:

Moon art

There was also a piece of “artwork” (doubling as a color calibration target) sent on the ill-fated Beagle 2 Mars lander:

Beagle 2 painting

There are “MarsDials” on several Mars landers, serving as sundials and color calibration targets. The earlier ones had the statement “Two worlds, one sun”—along with the word “Mars” in 22 languages; on later ones the statement was the less poetic “On Mars, to explore”:

MarsDial

As another space trinket, the New Horizons spacecraft that recently passed Pluto has a simple Florida state quarter on board—which at least was presumably easy and cheap to obtain near its launch site.

But the most serious—and best-known—attempts to provide messages are the engraved aluminum plaques on the Pioneer 10 and 11 spacecraft that were launched in 1972 and 1973 (though are sadly now out of contact):

Color Pioneer plaque

Mounted Pioneer Plaque

I must say I have never been a big fan of this plaque. It always seemed to me too clever by half. My biggest beef has always been with the element at the top left. The original paper (with lead author Carl Sagan) about the plaque states that this “should be readily recognizable to the physicists of other civilizations”.

But what is it? As a human physicist, I can figure it out: it’s an iconic representation of the hyperfine transition of atomic hydrogen—the so-called 21-centimeter line. And those little arrows are supposed to represent the spin directions of protons and electrons before and after the transition. But wait a minute: electrons and protons are spin-1/2, so they act as spinors. And yes, traditional human quantum mechanics textbooks do often illustrate spinors using vectors. But that’s a really arbitrary convention.

Oh, and why should we represent quantum mechanical wavefunctions in atoms using localized lines? Presumably the electron is supposed to “go all the way around” the circle, indicating that it’s delocalized. And, yes, you can explain that iconography to someone who’s used to human quantum mechanics textbooks. But it’s about as obscure and human-specific as one can imagine. And, by the way, if one wants to represent 21.106-centimeter radiation, why not just draw a line precisely that length, or make the plaque that size (it actually has a width of 22.9 centimeters)!

I could go on and on about what’s wrong with the plaque. The rendering conventions for the (widely mocked) human figures, especially when compared to those for the spacecraft. The use of an arrow to show the spacecraft direction (do all aliens go through a stage of shooting arrowheads?). The trailing (binary) zeros to cover the lack of precision in pulsar periods.

The official key from the original paper doesn’t help the case, and in fact the paper lays out some remarkably elaborate “science IQ test” reasoning needed to decode other things on the plaque:

Science IQ test

After the attention garnered by the Pioneer plaques, a more ambitious effort was made for the Voyager spacecraft launched in 1977. The result was the 12-inch gold-plated Voyager Golden Record, with an “album cover”:

Voyager Golden Record

In 1977, phonograph records seemed like “universally obvious technology”. Today of course even the concept of analog recording is (at least for now) all but gone. And what of the elaborately drawn “needle” on the top left? In modern times the obvious way to read the record would just be to image the whole thing, without any needles tracking grooves.

But, OK, so what’s on the record? There are some spoken greetings in 55 languages (beginning with one in a modern rendering of Akkadian), along with a 90-minute collection of music from around the world. (Somehow I imagine an alien translator—or, for that matter, an AI—trying in vain to align the messages between the words and the music.) There’s an hour of recorded brainwaves of Carl Sagan’s future wife (Ann Druyan), apparently thinking about various things.

Then there are 116 images, encoded in analog scan lines (though I don’t know how color was done). Many were photographs of 1970s life on Earth. Some were “scientific explanations”, which are at least good exercises for human science students of the 2010s to interpret (though the real-number rounding is weird, there are “9 planets”, there’s “S” in place of “C” as a base pair—and it’s charming to see the stencil-and-ink rendering):

Voyager 1-2

Voyager 3-4

Voyager 5-6

Voyager 7-8

Yes, when I proposed the “alien flashcards” for scientists in the movie Arrival, I too started with binary—though in modern times it’s easy and natural to show the whole nested pattern of successive digit sequences:

Digit sequences
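For what it’s worth, a pattern like this is a one-liner in the Wolfram Language. Here’s a minimal sketch (the choice of 8-bit integers from 0 to 255 is just an illustrative assumption about what’s shown):

ArrayPlot[Table[IntegerDigits[n, 2, 8], {n, 0, 255}]]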

Among efforts after Voyager have been the (very 1990s-style) CD of human Mars-related “Visions of Mars” fiction on the failed 1996 Mars 96 spacecraft, as well as the 2012 “time capsule” CD of images and videos on the EchoStar 16 satellite in geostationary orbit around Earth:

Visions of Mars
(The Planetary Society)

A slightly different kind of plaque was launched back in 1976 on the LAGEOS-1 satellite that’s supposed to be in polar orbit around the Earth for 8.4 million years. There are the binary numbers, reminiscent of Leibniz’s original “binary medal”. And then there’s an image of the predicted effect of continental drift (and what about sea level?) from 268 million years ago, to the end of the satellite’s life—that to me gives off a certain “so, did we get it right?” vibe:

LAGEOS

There was almost an engraved diamond plaque sent on the Cassini mission to Saturn and beyond in 1997, but as a result of human disagreements, it was never sent—and instead, in a very Ozymandias kind of way, all that’s left on the spacecraft is an empty mounting pedestal, whose purpose might be difficult to imagine.

Still another class of artifacts sent into the cosmos are radio transmissions. And until we have better-directed radio communications (and 5G will help), we’re radiating a certain amount of (increasingly encrypted) radio energy into the cosmos. The most intense ongoing transmissions remain the 50 Hz or 60 Hz hum of power lines, as well as the perhaps-almost-pulsar-like Ballistic Missile Early Warning System radars. But in the past there’ve been specific attempts to send messages for aliens to pick up.

The most famous was sent by the Arecibo radio telescope in 1974. Its repetition length was a product of two primes, intended to suggest assembly as a rectangular array. It’s an interesting exercise for humans to try to decipher the resulting image. Can you see the sequence of binary numbers? The schematic DNA, and the bitvectors for its components? The telescope icon? And the little 8-bit-video-game-like human?

Arecibo message
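The prime-factor trick, at least, is easy to play with in the Wolfram Language. Here’s a minimal sketch, in which bits stands for the 1679 transmitted bits (not reproduced here), so the second line is hypothetical:

FactorInteger[1679]  (* {{23, 1}, {73, 1}}, suggesting a 23×73 rectangular array *)

ArrayPlot[Partition[bits, 23]]  (* assemble the bit list into the image, row by row *)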

(There’ve been other messages sent, including a Doritos ad, a Beatles song, some Craigslist pages and a plant gene sequence—as well as some arguably downright embarrassing “artworks”.)

Needless to say, we pick up radio transmissions from the cosmos that we don’t understand fairly often. But are they signs of intelligence? Or “merely physics”? As I’ve said, the Principle of Computational Equivalence tells us there isn’t ultimately a distinction. And that, of course, is the challenge of our beacons project.

It’s worth mentioning that in addition to what’s been sent into space, there are a few messages on Earth specifically intended for at least few thousand years in the future. Examples are the 2000-year equinox star charts at the Hoover Dam, and the long-planned-but-not-yet-executed 10,000-year “stay away; it’s radioactive” warnings (or maybe it’s an “atomic priesthood” passing information generation to generation) for facilities like the WIPP nuclear waste repository in southeastern New Mexico.  (Not strictly a “message”, but there’s also the “10,000-year clock” being built in West Texas.)

A discussion of extraterrestrial communication wouldn’t be complete without at least mentioning the 1960 book Lincos: Design of a Language for Cosmic Intercourse—my copy of which wound up on the set of Arrival. The idea of the book was to use the methods and notation of mathematical logic to explain math, science, human behavior and other things “from first principles”. Its author, Hans Freudenthal, had spent decades working on math education—and on finding the best ways to explain math to (human) kids.

Lincos pages

Lincos was created too early to benefit from modern thinking about computer languages. And as it was, it used the often almost comically abstruse approach of Whitehead and Russell’s 1910 Principia Mathematica—in which even simple ideas become notationally complex. When it came to a topic like human behavior Lincos basically just gave examples, like small scenes in a stage play—but written in the notation of mathematical logic.

Yes, it’s interesting to try to have a symbolic representation for such things—and that’s the point of my symbolic discourse language project. But even though Lincos was at best just at the very beginning of trying to formulate something like this, it was still the obvious source for attempts to send “active SETI” messages starting in 1999, and some low-res bitmaps of Lincos were transmitted to nearby stars.

Science Fiction and Beyond

For our beacons project, we want to create human artifacts that will be recognized even by aliens. The related question of how alien artifacts might be recognizable has been tackled many times in science fiction.

Most often there’s something that just “doesn’t look natural”, either because it’s obviously defying gravity, or because it’s just too simple or perfect. For example, in the movie 2001, when the black cuboid monolith with its exact 1:4:9 side ratios shows up on Stone Age Earth or on the Moon, it’s obvious it’s “not natural”.

On the flip side, people in the 1800s argued that because a human-made pocket watch—complex as it is—is still so much simpler than a biological organism, the latter could only be an “artifact of God”. But actually I think the issue is just that our technology isn’t advanced enough yet. We’re still largely relying on engineering traditions and structures where we can readily foresee every aspect of how our systems will behave.

But I don’t think this will go on much longer. As I’ve spent many years studying, out in the computational universe of all possible programs it’s very common that the most efficient programs for a particular purpose don’t look at all simple in their behavior (and in fact this is a somewhat inevitable consequence of making better use of computational resources). And the result is that as soon as we can systematically mine such programs (as Darwinian evolution and neural network training already begin to), we’ll end up with artifacts that no longer look simple.

Ironically—but not surprisingly, given the Principle of Computational Equivalence—this suggests that our future artifacts will often look much more like “natural systems”. And indeed our current artifacts may look as primitive in the future as many of those produced before modern manufacturing look to us today.

Some science fiction stories have explored “natural-looking” alien artifacts, and how one might detect them. Of course it’s mired in the same issues that I’ve been exploring throughout this post—making it very difficult for example to tell for certain even whether the strangely red and strangely elongated interstellar object recently observed crossing our solar system is an alien artifact, or just a “natural rock”.

The Space of All Possible Civilizations

A major theme of this post has been that “communication” requires a certain sharing of “cultural context”. But how much sharing is enough? Different people—even with fairly different backgrounds and experiences—can usually understand each other well enough for society to function, although as the “cultural distance” increases, such understanding becomes more and more difficult.

Over the course of human history, one can imagine a whole net of cultural contexts, defined in large part (at least until recently) by place and time. Neighboring contexts are typically closely connected—but to get a substantial distance, say in time, often requires following a quite long chain of intermediate connections, a bit like one might have to go through a chain of intermediate translations to get from one language to another.

Particularly in modern times, cultural context often evolves quite significantly even over the course of a single human lifetime. But usually the process is gradual enough that an individual can bridge the contexts they encounter—though of course there’s no lack of older people who are at best confused at the preferences and interests of the young (think modern social media, etc.). And indeed were one just suddenly to wake up a century hence, it’s fairly certain that some of the cultural context would be somewhat disorientingly different.

But, OK, can we imagine making some kind of formal theory of cultural contexts? To do so would likely in effect require describing the space of all possible civilizations. And at first this might seem utterly infeasible.

But when we explore the computational universe of possible programs we are looking at a space of all possible rules. And it’s easy to imagine defining at least some feature of a civilization by some appropriate rule—and different rules can lead to dramatically different behavior, as in these cellular automata:

Cellular automata array
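An array like this is easy to reproduce. Here’s a minimal sketch in the Wolfram Language—the particular rule numbers are just illustrative choices:

Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 50]], {r, {22, 30, 90, 110}}]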

But, OK, what would “communication” mean in this context? Well, as soon as these rules are computationally universal (and the Principle of Computational Equivalence implies that except in trivial cases they always will be), there’s got to be some way to translate between them. More specifically, given one universal rule, there must be some program for it—or some class of initial conditions—that makes it emulate any other specified rule. Or, in other words, it must be possible to implement an interpreter for any given rule in the original rule.

We might then think of defining a distance between rules to be determined by the size or complexity of the interpreter necessary to translate between them. But while this sounds good in principle, it’s certainly not an easy thing to deal with in practice. And it doesn’t help that interpretability can be formally undecidable, so there’s no upper bound on the size or complexity of the translator between rules.

But at least conceptually, this gives us a chance to think about how a “communication distance” might be defined. And perhaps one could imagine a first approximation for the simplified case of neural networks, in which one just asks how difficult it is to train one network to act like another.
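Here’s a minimal sketch of that neural-net version in the Wolfram Language. The network shapes and training settings are arbitrary illustrative choices, not a proposed definition of the distance:

teacher = NetInitialize[NetChain[{LinearLayer[20], Tanh, LinearLayer[1]}, "Input" -> 1]];
data = Table[{x} -> teacher[{x}], {x, -3, 3, 0.05}];
student = NetTrain[NetChain[{LinearLayer[5], Tanh, LinearLayer[1]}, "Input" -> 1], data, MaxTrainingRounds -> 100];
ListLinePlot[Transpose[Table[{teacher[{x}][[1]], student[{x}][[1]]}, {x, -3, 3, 0.05}]]]  (* how closely the small net ends up mimicking the larger one *)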

As a more down-to-earth analogy to the space of cultural contexts, we could consider human languages, of which there are about 10,000 known. One can assess similarities between languages by looking at their words, and perhaps by looking at things like their grammatical structures. And even though in first approximation all languages can talk about the same kinds of things, languages can at least superficially have significant differences.

But for the specific case of human languages, there’s a lot determined by history. And indeed there’s a whole evolutionary tree of languages that one can identify, that effectively explains what’s close and what’s not. (Languages are often related to cultures, but aren’t the same. For example, Finnish is very different as a language from Swedish, even though Finnish and Swedish cultures are fairly similar.)

In the case of human civilizations, there are all sorts of indicators of similarity one might use. How similar do their artifacts look, say as recognized by neural networks? How similar are their social, economic or genealogical networks? How similar are quantitative measures of their patterns of laws or government?

Of course, all human civilizations share all sorts of common history—and no doubt occupy only some infinitesimal corner in the space of all possible civilizations. And in the vast majority of potential alien civilizations, it’s completely unrealistic to expect that the kinds of indicators we’re discussing for human civilizations could even be defined.

So how might one characterize a civilization and its cultural context? One way is to ask how it uses the computational universe of possible programs. What parts of that universe does it care about, and what not?

Now perhaps the endpoint of cultural evolution is to make use of the whole space of possible programs. Of course, our actual physical universe is presumably based on specific programs—although within the universe one can perfectly well emulate other programs.

And presumably anything that we could identify as a definite “civilization” with a definite “cultural context” must make use of some particular type of encoding—and in effect some particular type of language—for the programs it wants to specify. So one way to characterize a civilization is to imagine what analog of the Wolfram Language (or in general what symbolic discourse language) it would invent to describe things.

Yes, I’ve spent much of my life building the single example of the Wolfram Language intended for humans. And now what I’m suggesting is to imagine the space of all possible analogous languages, with all possible ways of sampling and encoding the computational universe.

But that’s the kind of thing we need to consider if we’re serious about alien communication. And in a sense just as we might say that we’re only going to consider aliens who live within a certain number of light years of us, so also we may have to say that we’ll only consider aliens where the language defining their cultural context is within a certain “translation distance” of ours.

How can we study this in practice? Well, of course we could think about what analog of the Wolfram Language other creatures with whom we share the Earth might find useful. We could also think about what AIs would find useful—though there is some circularity to this, insofar as we are creating AIs for the purpose of furthering our human goals. But probably the best path forward is just to imagine some kind of abstract enumeration of possible Wolfram-Language analogs, and then to start studying what methods of translation might be possible between them.

What Should We Actually Send?

OK, so there are lots of complicated intellectual and philosophical issues. But if we’re going to send beacons about the achievements of our civilization into space, what’s the best thing to do in practice?

A few points are obvious. First, even though it might seem more “universal”, don’t send lots of content that’s somehow formally derivable. Yes, we could say 2+2=4, or state a bunch of mathematical theorems, or show the evolution of a cellular automaton. But other than demonstrating that we can successfully do computation (which isn’t anything special, given the Principle of Computational Equivalence) we’re not really communicating anything like this. In fact, the only real information about us is our choice of what to send: which arithmetic facts, which theorems, etc.

Egyptian die

Here’s an ancient Egyptian die. And, yes, it’s interesting that they knew about icosahedra, and chose to use them. But the details of the icosahedral shape don’t tell us anything: it’s just the same as any other icosahedron.

OK, so an important principle is: if we want to communicate about ourselves, send things that are special to us—which means all sorts of arbitrary details about our history and interests. We could send an encyclopedia. Or if we have more space, we could send the whole content of the web, or scans of all books, or all available videos.

There’s a point, though, at which we will have sent enough: where basically there’s the raw material to answer any reasonable question one could ask about our civilization and our achievements.

But how does one make this as efficient as possible? Well, at least for general knowledge I’ve spent a long time trying to solve that problem. Because in a sense that’s what Wolfram|Alpha is all about: creating a system that can compute the answers to as broad a range as possible of questions.

So, yes, if we send a Wolfram|Alpha, we’re sending knowledge of our civilization in a concentrated, computational form, ready to be used as broadly as possible.

Of course, at least the public version of Wolfram|Alpha is just about general, public knowledge. So what about more detailed information about humans and the human condition?

Well, there’re always things like email archives, and personal analytics, and recordings, and so on. And, yes, I happen to have three decades of rather extensive data about myself, that I’ve collected mostly because it was easy for me to do.

But what could one get from that? Well, I suspect there’s enough data there that at least in principle one could construct a bot of me from it: in other words, one could create an AI system that would respond to things in pretty much the same way I would.

Of course, one could imagine just “going to the source” and starting to read out the content of a human brain. We don’t know how to do that yet. But if we’re going to assume that the recipients of our beacons have advanced further, then we have to assume that given a brain, they could tell what it would do.

Indeed, perhaps the most obvious thing to send (though it’s a bit macabre) would just be whole cryonically preserved humans (and, yes, they should keep well at the temperature of interstellar space!). Of course, it’s ironic how similar this is to the Egyptian idea of making mummies—though our technology is better (even if we still haven’t yet solved the problem of cryonics).

Is there a way to do even better, though? Perhaps by using AI and digital technology, rather than biology. Well, then we have a different problem. Yes, I expect we’ll be able to make AIs that represent any aspect of our civilization that we want. But then we have to decide what the “best of our civilization” is supposed to be.

It’s very related to questions about the ethics and “constitution” we should define for the AIs—and it’s an issue that comes back directly to the dynamics of our society. If we were sending biological humans then we’d get whatever bundle of traits each human we sent happened to have. But if we’re sending AIs, then somehow we’d have to decide which of the infinite range of possible characteristics we’d assign to best represent our civilization.

Whatever we might send—biological or digital—there’s absolutely no guarantee of any successful communication. Sure, our person or our AI might do their best to understand and respond to the alien that picked them up. But it might be hopeless. Yes, our representative might be able to identify the aliens, and observe the computations they’re doing. But that doesn’t mean that there’s enough alignment to be able to communicate anything we might think of as meaning.

It’s certainly not encouraging that we haven’t yet been able to recognize what we consider to be signs of extraterrestrial intelligence anywhere else in the universe. And it’s also not encouraging that even on our own planet we haven’t succeeded in serious communication with other species.

But just like Darius—or even Ozymandias—we shouldn’t give up. We should think of the beacons we send as monuments. Perhaps they will be useful for some kind of “afterlife”. But for now they serve as a useful rallying point for thinking about what we’re proud of in the achievements of our civilization—and what we want to capture and celebrate in the best way we can. And I’ll certainly be pleased to contribute to this effort the computational knowledge that I’ve been responsible for accumulating.

Roaring into 2018 with Another Big Release: Launching Version 11.3 of the Wolfram Language & Mathematica

Word Cloud

The Release Pipeline

Last September we released Version 11.2 of the Wolfram Language and Mathematica—with all sorts of new functionality, including 100+ completely new functions. Version 11.2 was a big release. But today we’ve got a still bigger release: Version 11.3, which, among other things, includes nearly 120 completely new functions.

This June 23rd it’ll be 30 years since we released Version 1.0, and I’m very proud of the fact that we’ve now been able to maintain an accelerating rate of innovation and development for no less than three decades. Critical to this, of course, has been the fact that we use the Wolfram Language to develop the Wolfram Language—and indeed most of the things that we can now add in Version 11.3 are only possible because we’re making use of the huge stack of technology that we’ve been systematically building for more than 30 years.

11.3We’ve always got a large pipeline of R&D underway, and our strategy for .1 versions is to use them to release everything that’s ready at a particular moment in time. Sometimes what’s in a .1 version may not completely fill out a new area, and some of the functions may be tagged as “experimental”. But our goal with .1 versions is to be able to deliver the latest fruits of our R&D efforts on as timely a basis as possible. Integer (.0) versions aim to be more systematic, and to provide full coverage of new areas, rounding out what has been delivered incrementally in .1 versions.

In addition to all the new functionality in 11.3, there’s a new element to our process. Starting a couple of months ago, we began livestreaming internal design review meetings that I held as we brought Version 11.3 to completion. So for those interested in “how the sausage is made”, there are now almost 122 hours of recorded meetings, from which you can find out exactly how some of the things you can now see released in Version 11.3 were originally invented. And in this post, I’m going to be linking to specific recorded livestreams relevant to features I’m discussing.

What’s New?

OK, so what’s new in Version 11.3? Well, a lot of things. And, by the way, Version 11.3 is available today on both desktop (Mac, Windows, Linux) and the Wolfram Cloud. (And yes, it takes extremely nontrivial software engineering, management and quality assurance to achieve simultaneous releases of this kind.)

In general terms, Version 11.3 not only adds some completely new directions, but also extends and strengthens what’s already there. There’s lots of strengthening of core functionality: still more automated machine learning, more robust data import, knowledgebase predictive prefetching, more visualization options, etc. There are all sorts of new conveniences: easier access to external languages, immediate input iconization, direct currying, etc. And we’ve also continued to aggressively push the envelope in all sorts of areas where we’ve had particularly active development in recent years: machine learning, neural nets, audio, asymptotic calculus, external language computation, etc.

Here’s a word cloud of new functions that got added in Version 11.3:

Word cloud

Blockchain

There are so many things to say about 11.3, it’s hard to know where to start. But let’s start with something topical: blockchain. As I’ll be explaining at much greater length in future posts, the Wolfram Language—with its built-in ability to talk about the real world—turns out to be uniquely suited to defining and executing computational smart contracts. The actual Wolfram Language computation for these contracts will (for now) happen off the blockchain, but it’s important for the language to be able to connect to blockchains—and that’s what’s being added in Version 11.3. [Livestreamed design discussion.]

The first thing we can do is just ask about blockchains that are out there in the world. Like here’s the most recent block added to the main Ethereum blockchain:

Blockchain

BlockchainBlockData[-1, BlockchainBase -> "Ethereum"]

Now we can pick up one of the transactions in that block, and start looking at it:

BlockchainBase

BlockchainTransactionData[\
"735e1643c33c6a632adba18b5f321ce0e13b612c90a3b9372c7c9bef447c947c", 
 BlockchainBase -> "Ethereum"]

And we can then start doing data science—or whatever analysis we want—about the structure and content of the blockchain. For the initial release of Version 11.3, we’re supporting Bitcoin and Ethereum, though other public blockchains will be added soon.
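For example, here’s a minimal sketch of the kind of thing one might try—counting transactions in the ten most recent Ethereum blocks. The "TransactionList" field name is an assumption about what BlockchainBlockData returns:

blocks = Table[BlockchainBlockData[-i, BlockchainBase -> "Ethereum"], {i, 10}];

ListLinePlot[Length[#["TransactionList"]] & /@ blocks]  (* transactions per block, newest first *)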

But already in Version 11.3, we’re supporting a private (Bitcoin-core) Wolfram Blockchain that’s hosted in our Wolfram Cloud infrastructure. We’ll be periodically publishing hashes from this blockchain out in the world (probably in things like physical newspapers). And it’ll also be possible to run versions of it in private Wolfram Clouds.

It’s extremely easy to write something to the Wolfram Blockchain (and, yes, it charges a small number of Cloud Credits):

BlockchainPut

BlockchainPut[Graphics[Circle[]]]

The result is a transaction hash, which one can then look up on the blockchain:

BlockchainTransactionData

BlockchainTransactionData[\
"9db73562fb45a75dd810456d575abbeb313ac19a2ec5813974c108a6935fcfb9"]

Here’s the circle back again from the blockchain:

BlockchainGet

BlockchainGet["9db73562fb45a75dd810456d575abbeb313ac19a2ec5813974c108a6935fcfb9"]

By the way, the Hash function in the Wolfram Language has been extended in 11.3 to immediately support the kinds of hashes (like “RIPEMD160SHA256”) that are used in cryptocurrency blockchains. And by using Encrypt and related functions, it’s possible to start setting up some fairly sophisticated things on the blockchain—with more coming soon.
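As a quick illustration (a minimal sketch; the particular string is arbitrary):

Hash["Hello world", "RIPEMD160SHA256", "HexString"]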

System Modeling

Alright, so now let’s talk about something really big that’s new—at least in experimental form—in Version 11.3. One of our long-term goals in the Wolfram Language is to be able to compute about anything in the world. And in Version 11.3 we’re adding a major new class of things that we can compute about: complex engineering (and other) systems. [Livestreamed design discussions 1 and 2.]

Back in 2012 we introduced Wolfram SystemModeler: an industrial-strength system modeling environment that’s been used to model things like jet engines with tens of thousands of components. SystemModeler lets you both run simulations of models, and actually develop models using a sophisticated graphical interface.

What we’re adding (experimentally) in Version 11.3 is the built-in capability for the Wolfram Language to run models from SystemModeler—or in fact basically any model described in the Modelica language.

Let’s start with a simple example. This retrieves a particular model from our built-in repository of models:

SystemModel
SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]

If you press the [+] you see more detail:

IdealTriacCircuit

But the place where it gets really interesting is that you can actually run this model. SystemModelPlot makes a plot of a “standard simulation” of the model:

Standard manipulation

SystemModelPlot[
 SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]]

What actually is the model underneath? Well, it’s a set of equations that describe the dynamics of how the components of the system behave. And for a very simple system like this, these equations are already pretty complicated:

SystemEquations

SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"][\
"SystemEquations"]

It comes with the territory in modeling real-world systems that there tend to be lots of components, with lots of complicated interactions. SystemModeler is set up to let people design arbitrarily complicated systems graphically, hierarchically connecting together components representing physical or other objects. But the big new thing is that once you have the model, then with Version 11.3 you can immediately work with it in the Wolfram Language.

Every model has lots of properties:

Properties

SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]["Properties"]

One of these properties gives the variables that characterize the system. And, yes, even in a very simple system like this, there are already lots of those:

SystemVariables

SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]["SystemVariables"]

Here’s a plot of how one of those variables behaves in the simulation:

Variable behavior

SystemModelPlot[SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"], "idealTriac.capacitor.p.i"]

A typical thing one wants to do is to investigate how the system behaves when parameters are changed. This simulates the system with one of its parameters changed, then makes a plot:

SystemModelSimulate

SystemModelSimulate[SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"], <|"ParameterValues" -> {"V.freqHz" -> 2.5}|>]
SystemModelPlot

SystemModelPlot[%, "idealTriac.capacitor.p.i"]

We could go on from here to sample lots of different possible inputs or parameter values, and do things like studying the robustness of the system to changes. Version 11.3 provides a very rich environment for doing all these things as an integrated part of the Wolfram Language.

In 11.3 there are already over 1000 ready-to-run models included—of electrical, mechanical, thermal, hydraulic, biological and other systems. Here’s a slightly more complicated example—the core part of a car:

SystemModel

SystemModel["IndustryExamples.AutomotiveTransportation.Driveline.\
DrivelineModel"]

If you expand the icon, you can mouse over the parts to find out what they are:

DrivelineModel

This gives a quick summary of the model, showing that it involves 1110 variables:

Summary

SystemModel["IndustryExamples.AutomotiveTransportation.Driveline.\
DrivelineModel"]["Summary"]

In addition to complete ready-to-run models, there are also over 6000 components included in 11.3, from which models can be constructed. SystemModeler provides a full graphical environment for assembling these components. But one can also do it purely with Wolfram Language code, using functions like ConnectSystemModelComponents (which essentially defines the graph of how the connectors of different components are connected):

Components

components = {"R" \[Element] 
    "Modelica.Electrical.Analog.Basic.Resistor", 
   "L" \[Element] "Modelica.Electrical.Analog.Basic.Inductor", 
   "AC" \[Element] "Modelica.Electrical.Analog.Sources.SineVoltage", 
   "G" \[Element] "Modelica.Electrical.Analog.Basic.Ground"};
Connections

connections = {"G.p" -> "AC.n", "AC.p" -> "L.n", "L.p" -> "R.n", 
   "R.p" -> "AC.n"};
ConnectSystemModelComponents

model = ConnectSystemModelComponents[components, connections]

You can also create models directly from their underlying equations, as well as making “black-box models” purely from data or empirical functions (say from machine learning).
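Here’s a minimal sketch of the equation-based route—a driven, damped oscillator—assuming default (zero) initial conditions:

model = CreateSystemModel[{x''[t] + 0.2 x'[t] + x[t] == Sin[2 t]}, t];

SystemModelPlot[SystemModelSimulate[model, 20]]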

It’s taken a long time to build all the system modeling capabilities that we’re introducing in 11.3. And they rely on a lot of sophisticated features of the Wolfram Language—including large-scale symbolic manipulation, the ability to robustly solve systems of differential-algebraic equations, handling of quantities and units, and much more. But now that system modeling is integrated into the Wolfram Language, it opens all sorts of important new opportunities—not only in engineering, but in all fields that benefit from being able to readily simulate multi-component real-world systems.

New in Notebooks

We first introduced notebooks in Version 1.0 back in 1988—so by now we’ve been polishing how they work for no less than 30 years. Version 11.3 introduces a number of new features. A simple one is that closed cell groups now by default have an “opener button”, as well as being openable using their cell brackets:

Section

I find this helpful, because otherwise I sometimes don’t notice closed groups, with extra cells inside. (And, yes, if you don’t like it, you can always switch it off in the stylesheet.)

Another small but useful change is the introduction of “indefinite In/Out labels”. In a notebook that’s connected to an active kernel, successive cells are labeled In[1], Out[1], etc. But if one’s no longer connected to the same kernel (say, because one saved and reopened the notebook), the In/Out numbering no longer makes sense. So in the past, there were just no In, Out labels shown. But as of Version 11.3, there are still labels, but they’re grayed down, and they don’t have any explicit numbers in them:

In Out

Another new feature in Version 11.3 is Iconize. Here’s the basic problem it solves. Let’s say you’ve got some big piece of data or other input that you want to store in the notebook, but you don’t want it to visually fill up the notebook. Well, one thing you can do is to put it in closed cells. But then to use the data you have to do something like creating a variable and so on. Iconize provides a simple, inline way to save data in a notebook.

Here’s how you make an iconized version of an expression:

Iconize

Iconize[Range[10]]

Now you can use this iconized form in place of giving the whole expression; it just immediately evaluates to the full expression:

Reverse

Reverse[{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}]

Another convenient use of Iconize is to make code easier to read, while still being complete. For example, consider something like this:

Plot

Plot[Sin[Tan[x]], {x, 0, 10}, Filling -> Axis, 
 PlotTheme -> "Scientific"]

You can select the options here, then go to the right-click menu and say to Iconize them:

Iconize menu

The result is an easier-to-read piece of code—that still evaluates just as it did before:

Better plot

Plot[Sin[Tan[x]], {x, 0, 10}, Sequence[
 Filling -> Axis, PlotTheme -> "Scientific"]]

In Version 11.2 we introduced ExternalEvaluate, for evaluating code in external languages (initially Python and JavaScript) directly from the Wolfram Language. (This is supported on the desktop and in private clouds; for security and provisioning reasons, the public Wolfram Cloud only runs pure Wolfram Language code.)

In Version 11.3 we’re now making it even easier to enter external code in notebooks. Just start an input cell with a > and you’ll get an external code cell (you can stickily select the language you want):

Python code

ExternalEvaluate["Python", "import platform; platform.platform()"]

And, yes, what comes back is a Wolfram Language expression that you can compute with:

StringSplit

StringSplit[%, "-"]

Workflow Documentation

We put a lot of emphasis on documenting the Wolfram Language—and traditionally we’ve had basically three kinds of components to our documentation: “reference pages” that cover a single function, “guide pages” that give a summary with links to many functions, and “tutorials” that provide narrative introductions to areas of functionality. Well, as of Version 11.3 there’s a fourth kind of component: workflows—which is what the gray tiles at the bottom of the “root guide page” lead to.

Documentation page

When everything you’re doing is represented by explicit Wolfram Language code, the In/Out paradigm of notebooks is a great way to show what’s going on. But if you’re clicking around, or, worse, using external programs, this isn’t enough. And that’s where workflows come in—because they use all sorts of graphical devices to present sequences of actions that aren’t just entering Wolfram Language input.

Hiding input

So if you’re getting coordinates from a plot, or deploying a complex form to the web, or adding a banner to a notebook, then expect to follow the new workflow documentation that we have. And, by the way, you’ll find links to relevant workflows from reference pages for functions.

Presenter Tools

Another big new interface-related thing in Version 11.3 is Presenter Tools—a complete environment for creating and running presentations that include live interactivity. What makes Presenter Tools possible is the rich notebook system that we’ve built over the past 30 years. But what it does is to add all the features one needs to conveniently create and run really great presentations.

People have been using our previous SlideShow format to give presentations with Wolfram Notebooks for about 20 years. But it was never a complete solution. Yes, it provided nice notebook features like live computation in a slide show environment, but it didn’t do “PowerPoint-like” things such as automatically scaling content to screen resolution. To be fair, we expected that operating systems would just intrinsically solve problems like content scaling. But it’s been 20 years and they still haven’t. So now we’ve built the new Presenter Tools that both solves such problems and adds a whole range of features to make creating great presentations with notebooks as easy as possible.

To start, just choose File > New > Presenter Notebook. Then pick your template and theme, and you’re off and running:

Presenter Notebook

Here’s what it looks like when you’re editing your presentation (and you can change themes whenever you want):

Presenter demonstration

When you’re ready to present, just press Start Presentation. Everything goes full screen and is automatically scaled to the resolution of the screen you’re using. But here’s the big difference from PowerPoint-like systems: everything is live, interactive, editable, and scrollable. For example, you can have a Manipulate right inside a slide, and you can immediately interact with it. (Oh, and everything can be dynamic, say recreating graphics based on data that’s being imported in real time.)  You can also use things like cell groups to organize content in slides. And you can edit what’s on a slide, and for example, do livecoding, running your code as you go.

When you’re ready to go to a new slide, just press a single key (or have your remote do it for you). By default, the key is Page Down (so you can still use arrow keys in editing), but you can set a different key if you want. You can have Presenter Tools show your slides on one display, then display notes and controls on another display. When you make your slides, you can include SideNotes and SideCode. SideNotes are “PowerPoint-like” textual notes. But SideCode is something different. It’s actually based on something I’ve done in my own talks for years. It’s code you’ve prepared, that you can “magically” insert onto a slide in real time during your presentation, immediately evaluating it if you want.

Presenter details

I’ve given a huge number of talks using Wolfram Notebooks over the years. A few times I’ve used the SlideShow format, but mostly I’ve just done everything in an ordinary notebook, often keeping notes on a separate device. But now I’m excited that with Version 11.3 I’ve got basically exactly the tools I need to prepare and present talks. I can pre-define some of the content and structure, but then the actual talk can be very dynamic and spontaneous—with live editing, livecoding and all sorts of interactivity.

Wolfram Chat

While we’re discussing interface capabilities, here’s another new one: Wolfram Chat. When people are interactively working together on something, it’s common to hear someone say “let me just send you a piece of code” or “let me send you a Manipulate”. Well, in Version 11.3 there’s now a very convenient way to do this, built directly into the Wolfram Notebook system—and it’s called Wolfram Chat. [Livestreamed design discussion.]

Just select File > New > Chat; you’ll get asked who you want to “chat with”—and it could be anyone anywhere with a Wolfram ID (though of course they do have to accept your invitation):

Chat invite

Then you can start a chat session, and, for example, put it alongside an ordinary notebook:

Notebook chat session

The neat thing is that you can send anything that can appear in a notebook, including images, code, dynamic objects, etc. (though it’s sandboxed so people can’t send “code bombs” to each other).

There are lots of obvious applications of Wolfram Chat, not only in collaboration, but also in things like classroom settings and technical support. And there are some other applications too. Like for running livecoding competitions. And in fact one of the ways we stress-tested Wolfram Chat during development was to use it for the livecoding competition at the Wolfram Technology Conference last fall.

One might think that chat is something straightforward. But actually it’s surprisingly tricky, with a remarkable number of different situations and cases to cover. Under the hood, Wolfram Chat is using both the Wolfram Cloud and the new pub-sub channel framework that we introduced in Version 11.0. In Version 11.3, Wolfram Chat is only being supported for desktop Wolfram Notebooks, but it’ll be coming soon to notebooks on the web and on mobile.

Language Conveniences

We’re always polishing the Wolfram Language to make it more convenient and productive to use. And one way we do this is by adding new little “convenience functions” in every version of the language. Often what these functions do is pretty straightforward; the challenge (which has often taken years) is to come up with really clean designs for them. (You can see quite a bit of the discussion about the new convenience functions for Version 11.3 in livestreams we’ve done recently.)

Here’s a function that it’s sort of amazing we’ve never explicitly had before—a function that just constructs an expression from its head and arguments:

Construct

Construct[f, x, y]

Why is this useful? Well, it can save explicitly constructing pure functions with Function or &, for example in a case like this:

Fold

Fold[Construct, f, {a, b, c}]

Another function that at some level is very straightforward (but about whose name we agonized for quite a while) is Curry. Curry (named after “currying”, which is in turn named after Haskell Curry) essentially makes operator forms, with Curry[f,n] “currying in” n arguments:

Curry

Curry[f, 3][a][b][c][d][e]

The one-argument form of Curry itself is:

One-argument Curry

Curry[f][x][y]

Why is this useful? Well, some functions (like Select, say) have built-in “operator forms”, in which you give one argument, then you “curry in” others:

Select Curry

Select[# > 5 &][Range[10]]

But what if you wanted to create an operator form yourself? Well, you could always explicitly construct it using Function or &. But with Curry you don’t need to do that. Like here’s an operator form of D, in which the second argument is specified to be x:

Curry operator form

Curry[D][x]

Now we can apply this operator form to actually do differentiation with respect to x:

Differentiation

%[f[x]]

Yes, Curry is at some level rather abstract. But it’s a nice convenience if you understand it—and understanding it is a good exercise in understanding the symbolic structure of the Wolfram Language.

Talking of operator forms, by the way, NearestTo is an operator-form analog of Nearest (the one-argument form of Nearest itself generates a NearestFunction):

Nearest

NearestTo[2.3][{1, 2, 3, 4, 5}]

Here’s an example of why this is useful. This finds the 5 chemical elements whose densities are nearest to 10 g/cc:

Chemical elements

Entity["Element", "Density" -> NearestTo[\!\(\*
NamespaceBox["LinguisticAssistant",
DynamicModuleBox[{Typeset`query$$ = "10 g/cc", Typeset`boxes$$ = 
        TemplateBox[{"10", 
RowBox[{"\"g\"", " ", "\"/\"", " ", 
SuperscriptBox["\"cm\"", "3"]}], "grams per centimeter cubed", 
FractionBox["\"Grams\"", 
SuperscriptBox["\"Centimeters\"", "3"]]}, "Quantity", 
         SyntaxForm -> Mod], Typeset`allassumptions$$ = {}, 
        Typeset`assumptions$$ = {}, Typeset`open$$ = {1, 2}, 
        Typeset`querystate$$ = {
        "Online" -> True, "Allowed" -> True, 
         "mparse.jsp" -> 0.777394`6.342186177878503, 
         "Messages" -> {}}}, 
DynamicBox[ToBoxes[
AlphaIntegration`LinguisticAssistantBoxes["", 4, Automatic, 
Dynamic[Typeset`query$$], 
Dynamic[Typeset`boxes$$], 
Dynamic[Typeset`allassumptions$$], 
Dynamic[Typeset`assumptions$$], 
Dynamic[Typeset`open$$], 
Dynamic[Typeset`querystate$$]], StandardForm],
ImageSizeCache->{94., {8., 19.}},
TrackedSymbols:>{
          Typeset`query$$, Typeset`boxes$$, Typeset`allassumptions$$, 
           Typeset`assumptions$$, Typeset`open$$, 
           Typeset`querystate$$}],
DynamicModuleValues:>{},
UndoTrackedVariables:>{Typeset`open$$}],
BaseStyle->{"Deploy"},
DeleteWithContents->True,
Editable->False,
SelectWithContents->True]\), 5]] // EntityList

In Version 10.1 in 2015 we introduced a bunch of functions that operate on sequences in lists. Version 11.3 adds a couple more such functions. One is SequenceSplit. It’s like StringSplit for lists: it splits lists at the positions of particular sequences:

SequenceSplit

SequenceSplit[{a, b, x, x, c, d, x, e, x, x, a, b}, {x, x}]

Also new in the “Sequence family” is the function SequenceReplace:

SequenceReplace

SequenceReplace[{a, b, x, x, c, d, x, e, x, x, a, 
  b}, {x, n_} -> {n, n, n}]

Visualization Updates

Just as we’re always polishing the core programming functionality of the Wolfram Language, we’re also always polishing things like visualization.

In Version 11.0, we added GeoHistogram, here showing “volcano density” in the US:

GeoHistogram

GeoHistogram[GeoPosition[GeoEntities[Entity["Country", "UnitedStates"], "Volcano"]]]

In Version 11.3, we’ve added GeoSmoothHistogram:

GeoSmoothHistogram

GeoSmoothHistogram[GeoPosition[GeoEntities[Entity["Country", "UnitedStates"], "Volcano"]]]

Also new in Version 11.3 are callouts in 3D plots, here random words labeling random points (but note how the words are positioned to avoid each other):

3D plot callout

ListPointPlot3D[Table[Callout[RandomReal[10, 3], RandomWord[]], 25]]

We can make a slightly more meaningful plot of words in 3D by using the new machine-learning-based FeatureSpacePlot3D (notice for example that “vocalizing” and “crooning” appropriately end up close together):

FeatureSpacePlot3D

FeatureSpacePlot3D[RandomWord[20]]

Text Reading

Talking of machine learning, Version 11.3 continues our aggressive development of automated machine learning, building both general tools, and specific functions that make use of machine learning.

An interesting example of a new function is FindTextualAnswer, which takes a piece of text, and tries to find answers to textual questions. Here we’re using the Wikipedia article on “rhinoceros”, asking how much a rhino weighs:

FindTextualAnswer

FindTextualAnswer[
 WikipediaData["rhinoceros"], "How much does a rhino weigh?"]

It almost seems like magic. Of course it doesn’t always work, and it can do things that we humans would consider pretty stupid. But it’s using very state-of-the-art machine learning methodology, together with a lot of unique training data based on Wolfram|Alpha. We can see a little more of what it does if we ask not just for its top answer about rhino weights, but for its top 5:

FindTextualAnswer top 5

FindTextualAnswer[
 WikipediaData["rhinoceros"], "How much does a rhino weigh?", 5]

Hmmm. So what’s a more definitive answer? Well, for that we can use our actual curated knowledgebase:

Knowledgebase answer

Entity["Species", "Family:Rhinocerotidae"][EntityProperty["Species", "Weight"]]

Or in tons:

UnitConvert

UnitConvert[%, "ShortTons"]

FindTextualAnswer is no substitute for our whole data curation and computable data strategy. But it’s useful as a way to quickly get a first guess of an answer, even from completely unstructured text. And, yes, it should do well at critical reading exercises, and could probably be made to do well at Jeopardy! too.

Face Computation

We humans respond a lot to human faces, and with modern machine learning it’s possible to do all sorts of face-related computations—and in Version 11.3 we’ve added systematic functions for this. Here FindFaces pulls out faces (of famous physicists) from a photograph:

Physicists' faces

FindFaces[CloudGet["https://wolfr.am/sWoDYqbb"], "Image"]

FacialFeatures uses machine learning methods to estimate various attributes of faces (such as the apparent age, apparent gender and emotional state):

FacialFeatures[CloudGet["https://wolfr.am/sWRQARe8"]]//Dataset

These features can for example be used as criteria in FindFaces, here picking out physicists who appear to be under 40:

FindFaces

FindFaces[CloudGet["https://wolfr.am/sWoDYqbb"], #Age < 40 &, "Image"]

Neural Networks

There are now all sorts of functions in the Wolfram Language (like FacialFeatures) that use neural networks inside. But for several years we’ve also been energetically building a whole subsystem in the Wolfram Language to let people work directly with neural networks. We’ve been building on top of low-level libraries (particularly MXNet, to which we’ve been big contributors), so we can make use of all the latest GPU and other optimizations. But our goal is to build a high-level symbolic layer that makes it as easy as possible to actually set up neural net computations. [Livestreamed design discussions 1, 2 and 3.]

There are many parts to this. Setting up automatic encoding and decoding to standard Wolfram Language constructs for text, images, audio and so on. Automatically being able to knit together individual neural net operations, particularly ones that deal with things like sequences. Being able to automate training as much as possible, including automatically doing hyperparameter optimization.

But there’s something perhaps even more important too: having a large library of existing, trained (and untrained) neural nets, that can both be used directly for computations, and can be used for transfer learning, or as feature extractors. And to achieve this, we’ve been building our Neural Net Repository:

Neural Net Repository

There are networks here that do all sorts of remarkable things. And we’re adding new networks every week. Each network has its own page that includes examples and detailed information. The networks are stored in the cloud. But all you have to do to pull them into your computation is to use NetModel:

NetModel trained

NetModel["3D Face Alignment Net Trained on 300W Large Pose Data"]

Here’s the actual network used by FindTextualAnswer:

NetModel

NetModel["Wolfram FindTextualAnswer Net for WL 11.3"]

One thing that’s new in Version 11.3 is the iconic representation we’re using for networks. We’ve optimized it to give you a good overall view of the structure of net graphs, but then to allow interactive drilldown to any level of detail. And when you train a neural network, the interactive panels that come up have some spiffy new features—and with NetTrainResultsObject, we’ve now made the actual training process itself computable.
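 
For example, here’s a minimal sketch (a tiny made-up regression problem) of asking NetTrain for the full results object, and then pulling the trained net out of it:

results = NetTrain[LinearLayer[1, "Input" -> 1], {{1.} -> {2.}, {2.} -> {4.}, {3.} -> {6.}}, All];
results["TrainedNet"]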

Version 11.3 has some new layer types like CTCLossLayer (particularly to support audio), as well as lots of updates and enhancements to existing layer types (10x faster LSTMs on GPUs, automatic variable-length convolutions, extensions of many layers to support arbitrary-dimension inputs, etc.). In Version 11.3 we’ve had a particular focus on recurrent networks and sequence generation. And to support this, we’ve introduced things like NetStateObject—that basically allows a network to have a persistent state that’s updated as a result of input data the network receives.
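 
Here’s a rough sketch of how NetStateObject can be used (the layer sizes and the input here are just arbitrary illustrations): wrap a recurrent net, and then each evaluation updates the persistent recurrent state instead of starting from scratch:

net = NetInitialize[NetChain[{LSTMLayer[4], SequenceLastLayer[]}, "Input" -> {"Varying", 2}]];
state = NetStateObject[net];
state[{{0.1, 0.2}, {0.3, 0.4}}]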

In developing our symbolic neural net framework we’re really going in two directions. The first is to make everything more and more automated, so it’s easier and easier to set up neural net systems. But the second is to be able to readily handle more and more neural net structures. And in Version 11.3 we’re adding a whole collection of “network surgery” functions—like NetTake, NetJoin and NetFlatten—to let you go in and tweak and hack neural nets however you want. Of course, our system is designed so that even if you do this, our whole automated system—with training and so on—still works just fine.
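 
As one small sketch of the kind of thing this allows (using a model name from the repository), here’s taking just the first few layers of a net—say to reuse them as a feature extractor:

lenet = NetModel["LeNet Trained on MNIST Data"];
NetTake[lenet, 4]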

Asymptotic Analysis

For more than 30 years, we’ve been on a mission to make as much mathematics as possible computational. And in Version 11.3 we’ve finally started to crack an important holdout area: asymptotic analysis.

Here’s a simple example: find an approximate solution to a differential equation near x = 0:

AsymptoticDSolveValue

AsymptoticDSolveValue[x^2  y'[x] + (x^2 + 1) y[x] == 0, 
 y[x], {x, 0, 10}]

At first, this might just look like a power series solution. But look more carefully: there’s an e^(1/x) factor that would just give infinity at every order as a power series in x. But with Version 11.3, we’ve now got asymptotic analysis functions that handle all sorts of scales of growth and oscillation, not just powers.

Back when I made my living as a physicist, it always seemed like some of the most powerful dark arts centered around perturbation methods. There were regular perturbations and singular perturbations. There were things like the WKB method, and the boundary layer method. The point was always to compute an expansion in some small parameter, but it seemed to always require different trickery in different cases to achieve it. But now, after a few decades of work, we finally in Version 11.3 have a systematic way to solve these problems. Like here’s a differential equation where we’re looking for the solution for small ε:

AsymptoticDSolveValue

AsymptoticDSolveValue[{\[Epsilon] y''[x] + (x + 1) y[x] == 0, 
  y[0] == 1, y[1] == 0}, y[x], x, {\[Epsilon], 0, 2}]

Back in Version 11.2, we added a lot of capabilities for dealing with more sophisticated limits. But with our asymptotic analysis techniques we’re now also able to do something else, that’s highly relevant for all sorts of problems in areas like number theory and computational complexity theory, which is to compare asymptotic growth rates.

This is asking: is 2^(n^k) asymptotically less than (n^m)! as n->∞? The result: yes, subject to certain conditions:

AsymptoticLess

AsymptoticLess[ 2^n^k, (n^m)!, n -> \[Infinity]]

“Elementary” Algebra

One of the features of Wolfram|Alpha popular among students is its “Show Steps” functionality, in which it synthesizes “on-the-fly tutorials” showing how to derive answers it gives. But what actually are the steps, in, say, a Show Steps result for algebra? Well, they’re “elementary operations” like “add the corresponding sides of two equations”. And in Version 11.3, we’re including functions to just directly do things like this:

AddSides

AddSides[a == b, c == d]
MultiplySides

MultiplySides[a == b, c == d]

And, OK, it seems like these are really trivial functions, that basically just operate on the structure of equations. And that’s actually what I thought when I said we should implement them. But as our Algebra R&D team quickly pointed out, there are all sorts of gotchas (“what if b is negative?”, etc.), that are what students often get wrong—but that with all of the algorithmic infrastructure in the Wolfram Language it’s easy for us to get right:

Negative MultiplySides

MultiplySides[x/b > 7, b]

Proofs

The Wolfram Language is mostly about computing results. But given a result, one can also ask why it’s correct: one can ask for some kind of proof that demonstrates that it’s correct. And for more than 20 years I’ve been wondering how to find and represent general proofs in a useful and computable way in the Wolfram Language. And I’m excited that finally in Version 11.3 the function FindEquationalProof provides an example—which we’ll be generalizing and building on in future versions. [Livestreamed design discussion.]

My all-time favorite success story for automated theorem proving is the tiny (and in fact provably simplest) axiom system for Boolean algebra that I found in 2000. It’s just a single axiom, with a single operator that one can think of as corresponding to the Nand operation. For 11 years, FullSimplify has actually been able to use automated theorem-proving methods inside to compute things. So here it’s starting from my axiom for Boolean algebra, then computing that Nand is commutative:

FullSimplify

FullSimplify[nand[p, q] == nand[q, p], 
 ForAll[{a, b, c}, 
  nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]

But this just tells us the result; it doesn’t give any kind of proof. Well, in Version 11.3, we can now get a proof:

FindEquationalProof

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]

What is the proof object? We can see from the summary that the proof takes 102 steps. Then we can ask for a “proof graph”. The green arrow at the top represents the original axiom; the red square at the bottom represents the thing being proved. All the nodes in the middle are intermediate lemmas, proved from each other according to the connections shown.

ProofGraph

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]];
proof["ProofGraph"]

What’s actually in the proof? Well, it’s complicated. But here’s a dataset that gives all the details:

ProofDataset

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]];
proof["ProofDataset"]

You can get a somewhat more narrative form as a notebook too:

Proof notebook

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]];
proof["ProofNotebook"]

And then you can also get a “proof function”, which is a piece of code that can be executed to verify the result:

Proof

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]];
proof["ProofFunction"]

Unsurprisingly, and unexcitingly, it gives True if you run it:

Proof result

proof = FindEquationalProof[nand[p, q] == nand[q, p], 
  ForAll[{a, b, c}, 
   nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]];
proof["ProofFunction"][]

Now that we can actually generate symbolic proof structures in the Wolfram Language, there’s a lot of empirical metamathematics to do—as I’ll discuss in a future post. But given that FindEquationalProof works on arbitrary “equation-like” symbolic relations, it can actually be applied to lots of things—like verifying protocols and policies, for example in popular areas like blockchain.

The Growing Knowledgebase

The Wolfram Knowledgebase grows every single day—partly through systematic data feeds, and partly through new curated data and domains being explicitly added. If one asks what happens to have been added between Version 11.2 and Version 11.3, it’s a slightly strange grab bag. There are 150+ new properties about public companies. There are 900 new named features on Pluto and Mercury. There are 16,000 new anatomical structures, such as nerve pathways. There are nearly 500 new “notable graphs”. There are thousands of new mountains, islands, notable buildings, and other geo-related features. There are lots of new properties of foods, and new connections to diseases. And much more.

But in terms of typical everyday use of the Wolfram Knowledgebase the most important new feature in Version 11.3 is the entity prefetching system. The knowledgebase is obviously big, and it’s stored in the cloud. But if you’re using a desktop system, the data you need is “magically” downloaded for you.

Well, in Version 11.3, the magic got considerably stronger. Because now when you ask for one particular item, the system will try to figure out what you’re likely to ask for next, and it’ll automatically start asynchronously prefetching it, so when you actually ask for it, it’ll already be there on your computer—and you won’t have to wait for it to download from the cloud. (If you want to do the prefetching “by hand”, there’s the function EntityPrefetch to do it. Note that if you’re using the Wolfram Language in the cloud, the knowledgebase is already “right there”, so there’s no downloading or prefetching to do.)

The whole prefetching mechanism is applied quite generally. So, for example, if you use Interpreter to interpret some input (say, US state abbreviations), information about how to do the interpretations will also get prefetched—so if you’re using the desktop, the interpretations can be done locally without having to communicate with the cloud.
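 
So, for example, once the relevant interpretation data has been prefetched, an interpretation like this one can run locally on the desktop:

Interpreter["USState"]["NH"]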

Messages and Mail

You’ve been able to send email from the Wolfram Language (using SendMail) for a decade. But starting in Version 11.3, it can use full HTML formatting, and you can embed lots of things in it—not just graphics and images, but also cloud objects, datasets, audio and so on. [Livestreamed design discussion.]

Version 11.3 also introduces the ability to send text messages (SMS and MMS) using SendMessage. For security reasons, though, you can only send to your own mobile number, as given by the value of $MobilePhone (and, yes, obviously, the number gets validated).

The Wolfram Language has been able to import mail messages and mailboxes for a long time, and with MailReceiverFunction it’s also able to respond to incoming mail. But in Version 11.3 something new that’s been added is the capability to deal with live mailboxes.

First, connect to an (IMAP, for now) mail server (I’m not showing the authentication dialog that comes up):

MailServerConnect

mail = MailServerConnect[]

Then you can basically use the Wolfram Language as a programmable mail client. This gives you a dataset of current unread messages in your mailbox:

MailSearch

MailSearch[ "fahim"|>]

Now we can pick out one of these messages, and we get a symbolic MailItem object, that for example we can delete:

MailSearch Part

MailSearch[ "fahim"|>][[1]]
MailExecute

MailExecute["Delete", %%["MailItem"]]

Systems-Level Operations

Version 11.3 supports a lot of new systems-level operations. Let’s start with a simple but useful one: remote program execution. The function RemoteRun is basically like Unix rsh: you give it a host name (or IP address) and it runs a command there. The Authentication option lets you specify a username and password. If you want to run a persistent program remotely, you can now do that with RemoteRunProcess, which is the remote analog of the local RunProcess.
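 
Here’s a minimal sketch of what a call might look like (the host and credentials are hypothetical):

RemoteRun["203.0.113.7", "uname -a", Authentication -> <|"Username" -> "jdoe", "Password" -> "********"|>]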

In dealing with remote computer systems, authentication is always an issue—and for several years we’ve been building a progressively more sophisticated symbolic authentication framework in the Wolfram Language. In Version 11.3 there’s a new AuthenticationDialog function, which pops up a whole variety of appropriately configured authentication dialogs. Then there’s GenerateSecuredAuthenticationKey—which generates OAuth SecuredAuthenticationKey objects that people can use to authenticate calls into the Wolfram Cloud from the outside.

Also at a systems level, there are some new import/export formats, like BSON (JSON-like binary serialization format) and WARC (web archive format). There are also HTTPResponse and HTTPRequest formats, that (among many other things) you can use to basically write a web server in the Wolfram Language in a couple of lines.

We introduced ByteArray objects into the Wolfram Language quite a few years ago—and we’ve been steadily growing support for them. In Version 11.3, there are BaseEncode and BaseDecode for converting between byte arrays and Base64 strings. Version 11.3 also extends Hash (which, among other things, works on byte arrays), adding various types of hashing (such as double SHA-256 and RIPEMD) that are used for modern blockchain and cryptocurrency purposes.

We’re always adding more kinds of data that we can make computable in the Wolfram Language, and in Version 11.3 one addition is system process data, of the sort that you might get from a Unix ps command:

SystemProcessData

SystemProcessData[]

Needless to say, you can do very detailed searches for processes with specific properties. You can also use SystemProcesses to get an explicit list of ProcessObject symbolic objects, which you can interrogate and manipulate (for example, by using KillProcess).

SystemProcesses

RandomSample[SystemProcesses[], 3]

Of course, because everything is computable, it’s easy to do things like make plots of the start times of processes running on your computer (and, yes, I last rebooted a few days ago):

TimelinePlot

TimelinePlot[SystemProcessData[][All, "StartTime"]]

If you want to understand what’s going on around your computer, Version 11.3 provides another powerful tool: NetworkPacketRecording. You may have to do some permissions setup, but then this function can record network packets going through any network interface on your computer.

Here’s just 0.1 seconds of packets going in and out of my computer as I quietly sit here writing this post:

NetworkPacketRecording

NetworkPacketRecording[.1]

You can drill down to look at each packet; here’s the first one that was recorded:

NetworkPacketRecording

NetworkPacketRecording[.1][[1]]

Why is this interesting? Well, I expect to use it for debugging quite regularly—and it’s also useful for studying computer security, not least because you can immediately feed everything into standard Wolfram Language visualization, machine learning and other functionality.

What Has Not Been Mentioned

This is already a long post—but there are lots of other things in 11.3 that I haven’t even mentioned. For example, there’ve been all sorts of updates for importing and exporting. Like much more efficient and robust XLS, CSV, and TSV import. Or export of animated PNGs. Or support for metadata in sound formats like MP3 and WAV. Or more sophisticated color quantization in GIF, TIFF, etc. [Livestreamed design discussions 1 and 2.]

We introduced symbolic Audio objects in 11.0, and we’ve been energetically developing audio functionality ever since. Version 11.3 has made audio capture more robust (and supported it for the first time on Linux). It’s also introduced functions like AudioPlay, AudioPause and AudioStop that control open AudioStream objects.

Also new is AudioDistance, which supports various distance measures for audio. Meanwhile, AudioIntervals can now automatically break audio into sections that are separated by silence. And, in a somewhat different area, $VoiceStyles gives the list of possible voices available for SpeechSynthesize.

Here’s a little new math function—that in this case gives a sequence of 0s and 1s in which every length-4 block appears exactly once:

DeBruijnSequence

DeBruijnSequence[{0, 1}, 4]

The Wolfram Language now has sophisticated support for quantities and units—both explicit quantities (like 2.5 kg) and symbolic “quantity variables” (“p which has units of pressure”). But once you’re inside, doing something like solving an equation, you typically want to “factor the units out”. And in 11.3 there’s now a function that systematically does this: NondimensionalizationTransform. There’s also a new mechanism in 11.3 for introducing new kinds of quantities, using IndependentPhysicalQuantity.

Much of the built-in Wolfram Knowledgebase is ultimately represented in terms of entity stores, and in Version 11 we introduced an explicit EntityStore construct for defining new entity stores. Version 11.3 introduces the function EntityRegister, which lets you register an entity store, so that you can refer to the types of entities it contains just like you would refer to built-in types of entities (like cities or chemicals).
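 
Here’s a toy sketch (with a made-up entity type and made-up properties) of defining a little entity store, registering it, and then using its entities just like built-in ones:

store = EntityStore["moon" -> <|"Entities" -> <|"Europa" -> <|"Parent" -> "Jupiter"|>, "Titan" -> <|"Parent" -> "Saturn"|>|>|>];
EntityRegister[store];
Entity["moon", "Europa"]["Parent"]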

Another thing that’s being introduced as an experiment in Version 11.3 is the MongoLink package, which supports connection to external MongoDB databases. We use MongoLink ourselves to manage terabyte-and-beyond datasets for things like machine learning training. And in fact MongoLink is part of our large-scale development effort—whose results will be seen in future versions—to seamlessly support extremely large amounts of externally stored data.

In Version 11.2 we introduced ExternalEvaluate to run code in external languages like Python. In Version 11.3 we’re experimenting with generalizing ExternalEvaluate to control web browsers, by setting up a WebDriver framework. You can give all sorts of commands, both ones that have the same effect as clicking around an actual web browser, and ones that extract things you can see on the page.

Here’s how you can use Chrome (we support both it and Firefox) to open a webpage, then capture it:

WebDriver

ExternalEvaluate["WebDriver-Chrome", {"OpenWebPage" -> 
   "https://www.wolfram.com", "CaptureWebPage"}]//Last

Well, this post is getting long, but there’s certainly more I could say. Here’s a more complete list of functions that are new or updated in Version 11.3:

Summary of New Features in 11.3

But to me it’s remarkable how much there is that’s in a .1 release of the Wolfram Language—and that’s emerged in just the few months since the last .1 release. It’s a satisfying indication of the volume of R&D that we’re managing to complete—by building on the whole Wolfram Language technology stack that we’ve created. And, yes, even in 11.3 there are a great many new corners to explore. And I hope that lots of people will do this, and will use the latest tools we’ve created to discover and invent all sorts of new and important things in the world.


To comment, please visit the copy of this post at the Wolfram Blog »

Newest Features in Mathematica and the Wolfram Language

Buzzword Convergence: Making Sense of Quantum Neural Blockchain AI


Not Entirely Fooling Around

What happens if you take four of today’s most popular buzzwords and string them together? Does the result mean anything? Given that today is April 1 (as well as being Easter Sunday), I thought it’d be fun to explore this. Think of it as an Easter egg… from which something interesting just might hatch. And to make it clear: while I’m fooling around in stringing the buzzwords together, the details of what I’ll say here are perfectly real.

Buzzword convergence

But before we can really launch into talking about the whole string of buzzwords, let’s discuss some of the background to each of the buzzwords on their own.

“Quantum”

Saying something is “quantum” sounds very modern. But actually, quantum mechanics is a century old. And over the course of the past century, it’s been central to understanding and calculating lots of things in the physical sciences. But even after a century, “truly quantum” technology hasn’t arrived. Yes, there are things like lasers and MRIs and atomic force microscopes that rely on quantum phenomena, and needed quantum mechanics in order to be invented. But when it comes to the practice of engineering, what’s done is still basically all firmly classical, with nothing quantum about it.

Today, though, there’s a lot of talk about quantum computing, and how it might change everything. I actually worked on quantum computing back in the early 1980s (so, yes, it’s not that recent an idea). And I have to say, I was always a bit skeptical about whether it could ever really work—or whether any “quantum gains” one might get would be counterbalanced by inefficiencies in measuring what was going on.

But in any case, in the past 20 years or so there’s been all sorts of nice theoretical work on formulating the idea of quantum circuits and quantum computing. Lots of things have been done with the Wolfram Language, including an ongoing project of ours to produce a definitive symbolic way of representing quantum computations. But so far, all we can ever do is calculate about quantum computations, because the Wolfram Language itself just runs on ordinary, classical computers.

There are companies that have built what they say are (small) true quantum computers. And actually, we’ve been hoping to hook the Wolfram Language up to them, so we can implement a QuantumEvaluate function. But so far, this hasn’t happened. So I can’t really vouch for what QuantumEvaluate will (or will not) do.

But the big idea is basically this. In ordinary classical physics, one can pretty much say that definite things happen in the world. A billiard ball goes in this direction, or that. But in any particular case, it’s a definite direction. In quantum mechanics, though, the idea is that an electron, say, doesn’t intrinsically go in a particular, definite direction. Instead, it essentially goes in all possible directions, each with a particular amplitude. And it’s only when you insist on measuring where it went that you’ll get a definite answer. And if you do many measurements, you’ll just see probabilities for it to go in each direction.

Well, what quantum computing is trying to do is somehow to make use of the “all possible directions” idea in order to in effect get lots of computations done in parallel. It’s a tricky business, and there are only a few types of problems where the theory’s been worked out—the most famous being integer factoring. And, yes, according to the theory, a big quantum computer should be able to factor a big integer fast enough to make today’s cryptography infrastructure implode. But the only thing anyone so far even claims to have built along these lines is a tiny quantum computer—that definitely can’t yet do anything terribly interesting.

But, OK, so one critical aspect of quantum mechanics is that there can be interference between different paths that, say, an electron can take. This is mathematically similar to the interference that happens in light, or even in water waves, just in classical physics. In quantum mechanics, though, there’s supposed to be something much more intrinsic about the interference, leading to the phenomenon of entanglement, in which one basically can’t ever “see the wave that’s interfering”—only the effect.

In computing, though, we’re not making use of any kind of interference yet. Because (at least in modern times) we’re always trying to deal with discrete bits—while the typical phenomenon of interference (say in light) basically involves continuous numbers. And my personal guess is that optical computing—which will surely come—will succeed in delivering some spectacular speedups. It won’t be truly “quantum”, though (though it might be marketed like that). (For the technically minded, it’s a complicated question how computation-theoretic results apply to continuous processes like interference-based computing.)

“Neural”

A decade ago computers didn’t have any systematic way to tell whether a picture was of an elephant or a teacup. But in the past five years, thanks to neural networks, this has basically become easy. (Interestingly, the image identifier we made three years ago remains basically state of the art.)

So what’s the big idea? Well, back in the 1940s people started thinking seriously about the brain being like an electrical machine. And this led to mathematical models of “neural networks”—which were proved to be equivalent in computational power to mathematical models of digital computers. Over the years that followed, billions of actual digital electronic computers were built. And along the way, people (including me) experimented with neural networks, but nobody could get them to do anything terribly interesting. (Though for years they were quietly used for things like optical character recognition.)

But then, starting in 2012, a lot of people suddenly got very excited, because it seemed like neural nets were finally able to do some very interesting things, at first especially in connection with images.

So what happened? Well, a neural net basically corresponds to a big mathematical function, formed by connecting together lots of smaller functions, each involving a certain number of parameters (“weights”). At the outset, the big function basically just gives random outputs. But the way the function is set up, it’s possible to “train the neural net” by tuning the parameters inside it so that the function will give the outputs one wants.

It’s not like ordinary programming where one explicitly defines the steps a computer should follow. Instead, the idea is just to give examples of what one wants the neural net to do, and then to expect it to interpolate between them to work out what to do for any particular input. In practice one might show a bunch of images of elephants, and a bunch of images of teacups, and then do millions of little updates to the parameters to get the network to output “elephant” when it’s fed an elephant, and “teacup” when it’s fed a teacup.

But here’s the crucial idea: the neural net is somehow supposed to generalize from the specific examples it’s shown—and it’s supposed to say that anything that’s “like” an elephant example is an elephant, even if its particular pixels are quite different. Or, said another way, there are lots of images that might be fed to the network that are in the “basin of attraction” for “elephant” as opposed to “teacup”. In a mechanical analogy, one might say that there are lots of places water might fall on a landscape, while still ending up flowing to one lake rather than another.

At some level, any sufficiently complicated neural net can in principle be trained to do anything. But what’s become clear is that for lots of practical tasks (that turn out to overlap rather well with some of what our brains seem to do easily) it’s realistic with feasible amounts of GPU time to actually train neural networks with a few million elements to do useful things. And, yes, in the Wolfram Language we’ve now got a rather sophisticated symbolic framework for training and using neural networks—with a lot of automation (that itself uses neural nets) for everything.

“Blockchain”

The word “blockchain” was first used in connection with the invention of Bitcoin in 2008. But of course the idea of a blockchain had precursors. In its simplest form, a blockchain is like a ledger, in which successive entries are coded in a way that depends on all previous entries.

Crucial to making this work is the concept of hashing. Hashing has always been one of my favorite practical computation ideas (and I even independently came up with it when I was about 13 years old, in 1973). What hashing does is to take some piece of data, like a text string, and make a number (say between 1 and a million) out of it. It does this by “grinding up the data” using some complicated function that always gives the same result for the same input, but will almost always give different results for different inputs. There’s a function called Hash in the Wolfram Language, and for example applying it to the previous paragraph of text gives 8643827914633641131.

OK, but so how does this relate to blockchain? Well, back in the 1980s people invented “cryptographic hashes” (and actually they’re very related to things I’ve done on computational irreducibility). A cryptographic hash has the feature that while it’s easy to work out the hash for a particular piece of data, it’s very hard to find a piece of data that will generate a given hash.

So let’s say you want to prove that you created a particular document at a particular time. Well, you could compute a hash of that document, and publish it in a newspaper (and I believe Bell Labs actually used to do this every week back in the 1980s). And then if anyone ever says “no, you didn’t have that document yet” on a certain date, you can just say “but look, its hash was already in every copy of the newspaper!”.

The idea of a blockchain is that one has a series of blocks, with each containing certain content, together with a hash. And then the point is that the data from which that hash is computed is a combination of the content of the block, and the hash of the preceding block. So this means that each block in effect confirms everything that came before it on the blockchain.
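 
Just to illustrate the mechanism (this is only a toy sketch, not the actual format any real blockchain uses), here’s a tiny Wolfram Language version of a chain in which each block’s hash depends on its own content and on the hash of the block before it:

addBlock[chain_, content_] := Append[chain, <|"Content" -> content, "PreviousHash" -> Last[chain]["Hash"], "Hash" -> Hash[{content, Last[chain]["Hash"]}, "SHA256"]|>]

Fold[addBlock, {<|"Content" -> "genesis", "PreviousHash" -> None, "Hash" -> Hash["genesis", "SHA256"]|>}, {"first entry", "second entry"}]

Change the content of any earlier block and every subsequent hash stops matching—which is what makes tampering evident.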

In cryptocurrencies like Bitcoin the big idea is to be able to validate transactions, and, for example, be able to guarantee just by looking at the blockchain that nobody has spent the same bitcoin twice.

How does one know that the blocks are added correctly, with all their hashes computed, etc.? Well, the point is that there’s a whole decentralized network of thousands of computers around the world that store the blockchain, and there are lots of people (well, actually not so many in practice these days) competing to be the one to add each new block (and include transactions people have submitted that they want in it).

The rules are (more or less) that the first person to add a block gets to keep the fees offered on the transactions in it. But each block gets “confirmed” by lots of people including this block in their copy of the blockchain, and then continuing to add to the blockchain with this block in it.

In the latest version of the Wolfram Language, BlockchainBlockData[-1, BlockchainBase -> "Bitcoin"] gives a symbolic representation of the latest block that we’ve seen be added to the Bitcoin blockchain. And by the time maybe 5 more blocks have been added, we can be pretty sure everyone’s satisfied that the block is correct. (Yes, there’s an analogy with measurement in quantum mechanics here, which I’ll be talking about soon.)

Traditionally, when people keep ledgers, say of transactions, they’ll have one central place where a master ledger is maintained. But with a blockchain the whole thing can be distributed, so you don’t have to trust any single entity to keep the ledger correct.

And that’s led to the idea that cryptocurrencies like Bitcoin can flourish without central control, governments or banks involved. And in the last couple of years there’s been lots of excitement generated by people making large amounts of money speculating on cryptocurrencies.

But currencies aren’t the only thing one can use blockchains for, and Ethereum pioneered the idea that in addition to transactions, one can run arbitrary computations at each node. Right now with Ethereum the results of each computation are confirmed by being run on every single computer in the network, which is incredibly inefficient. But the bigger point is just that computations can be running autonomously on the network. And the computations can interact with each other, defining “smart contracts” that run autonomously, and say what should happen in different circumstances.

Pretty much any nontrivial smart contract will eventually need to know about something in the world (“did it rain today?”, “did the package arrive?”, etc.), and that has to come from off the blockchain—from an “oracle”. And it so happens (yes, as a result of a few decades of work) that our Wolfram Knowledgebase, which powers Wolfram|Alpha, etc., provides the only realistic foundation today for making such oracles.

“AI”

Back in the 1950s, people thought that pretty much anything human intelligence could do, it’d soon be possible to make artificial (machine) intelligence do better. Of course, this turned out to be much harder than people expected. And in fact the whole concept of “creating artificial intelligence” pretty much fell into disrepute, with almost nobody wanting to market their systems as “doing AI”.

But about five years ago—particularly with the unexpected successes in neural networks—all that changed, and AI was back, and cooler than ever.

What is AI supposed to be, though? Well, in the big picture I see it as being the continuation of a long trend of automating things that humans previously had to do for themselves—and in particular doing that through computation. But what makes a computation an example of AI, and not just, well, a computation?

I’ve built a whole scientific and philosophical structure around something I call the Principle of Computational Equivalence, that basically says that the universe of possible computations—even done by simple systems—is full of computations that are as sophisticated as one can ever get, and certainly as our brains can do.

In doing engineering, and in building programs, though, there’s been a tremendous tendency to try to prevent anything too sophisticated from happening—and to set things up so that the systems we build just follow exactly steps we can foresee. But there’s much more to computation than that, and in fact I’ve spent much of my life building systems that make use of this.

Wolfram|Alpha is a great example. Its goal is to take as much knowledge about the world as possible, and make it computable, then to be able to answer questions as expertly as possible about it. Experientially, it “feels like AI”, because you get to ask it questions in natural language like a human, then it computes answers often with unexpected sophistication.

Most of what’s inside Wolfram|Alpha doesn’t work anything like brains probably do, not least because it’s leveraging the last few hundred years of formalism that our civilization has developed, that allow us to be much more systematic than brains naturally are.

Some of the things modern neural nets do (and, for example, our machine learning system in the Wolfram Language does) perhaps work a little more like brains. But in practice what really seems to make things “seem like AI” is just that they’re operating on the basis of sophisticated computations whose behavior we can’t readily understand.

These days the way I see it is that out in the computational universe there’s amazing computational power. And the issue is just to be able to harness that for useful human purposes. Yes, “an AI” can go off and do all sorts of computations that are just as sophisticated as our brains. But the issue is: can we align what it does with things we care about doing?

And, yes, I’ve spent a large part of my life building the Wolfram Language, whose purpose is to provide a computational communication language in which humans can express what they want in a form suitable for computation. There’s lots of “AI power” out there in the computational universe; our challenge is to harness it in a way that’s useful to us.

Oh, and we want to have some kind of computational smart contracts that define how we want the AIs to behave (e.g. “be nice to humans”). And, yes, I think the Wolfram Language is going to be the right way to express those things, and build up the “AI constitutions” we want.

Common Themes

At the outset, it might seem as if “quantum”, “neural”, “blockchain” and “AI” are all quite separate concepts, without a lot of commonality. But actually it turns out that there are some amazing common themes.

One of the strongest has to do with complexity generation. And in fact, in their different ways, all the things we’re talking about rely on complexity generation.

Rule 30

What do I mean by complexity generation? One day I won’t have to explain this. But for now I probably still do. And somehow I find myself always showing the same picture—of my all-time favorite science discovery, the rule 30 automaton. Here it is:
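 
In the Wolfram Language you can generate it with, for example:

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]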

And the point here is that even though the rule (or program) is very simple, the behavior of the system just spontaneously generates complexity, and apparent randomness. And what happens is complicated enough that it shows what I call “computational irreducibility”, so that you can’t reduce the computational work needed to see how it will behave: you essentially just have to follow each step to find out what will happen.

There are all sorts of important phenomena that revolve around complexity generation and computational irreducibility. The most obvious is just the fact that sophisticated computation is easy to get—which is in a sense what makes something like AI possible.

But OK, how does this relate to blockchain? Well, complexity generation is what makes cryptographic hashing possible. It’s what allows a simple algorithm to make enough apparent randomness to successfully be used as a cryptographic hash.

In the case of something like Bitcoin, there’s another connection too: the protocol needs people to have to make some investment to be able to add blocks to the blockchain, and the way this is achieved is (bizarrely enough) by forcing them to do irreducible computations that effectively cost computer time.

What about neural nets? Well, the very simplest neural nets don’t involve much complexity at all. If one drew out their “basins of attraction” for different inputs, they’d just be simple polygons. But in useful neural nets the basins of attraction are much more complicated.

It’s most obvious when one gets to recurrent neural nets, but it happens in the training process for any neural net: there’s a computational process that effectively generates complexity as a way to approximate things like the distinctions (“elephant” vs. “teacup”) that get made in the world.

Alright, so what about quantum mechanics? Well, quantum mechanics is at some level full of randomness. It’s essentially an axiom of the traditional mathematical formalism of quantum mechanics that one can only compute probabilities, and that there’s no way to “see under the randomness”.

I personally happen to think it’s pretty likely that that’s just an approximation, and that if one could get “underneath” things like space and time, we’d see how the randomness actually gets generated.

But even in the standard formalism of quantum mechanics, there’s a kind of complementary place where randomness and complexity generation is important, and it’s in the somewhat mysterious process of measurement.

Let’s start off by talking about another phenomenon in physics: the Second Law of Thermodynamics, or Law of Entropy Increase. This law says that if you start, for example, a bunch of gas molecules in a very orderly configuration (say all in one corner of a box), then with overwhelming probability they’ll soon randomize (and e.g. spread out randomly all over the box). And, yes, this kind of trend towards randomness is something we see all the time.

But here’s the strange part: if we look at the laws for, say, the motion of individual gas molecules, they’re completely reversible—so just as they say that the molecules can randomize themselves, so also they say that they should be able to unrandomize themselves.

But why do we never see that happen? It’s always been a bit mysterious, but I think there’s a clear answer, and it’s related to complexity generation and computational irreducibility. The point is that when the gas molecules randomize themselves, they’re effectively encrypting the initial conditions they were given.

It’s not impossible to place the gas molecules so they’ll unrandomize rather than randomize; it’s just that to work out how to do this effectively requires breaking the encryption—or in essence doing something very much like what’s involved in Bitcoin mining.

OK, so how does this relate to quantum mechanics? Well, quantum mechanics itself is fundamentally based on probability amplitudes, and interference between different things that can happen. But our experience of the world is that definite things happen. And the bridge from quantum mechanics to this involves the rather “bolted-on” idea of quantum measurement.

The notion is that some little quantum effect (“the electron ends up with spin up, rather than down”) needs to get amplified to the point where one can really be sure what happened. In other words, one’s measuring device has to make sure that the little quantum effect associated with one electron cascades so that it’s spread across lots and lots of electrons and other things.

And here’s the tricky part: if one wants to avoid interference being possible (so we can really perceive something “definite” as having happened), then one needs to have enough randomness that things can’t somehow equally well go backwards—just like in thermodynamics.

So even though pure quantum circuits as one imagines them for practical quantum computers typically have a sufficiently simple mathematical structure that they (presumably) don’t intrinsically generate complexity, the process of measuring what they do inevitably must generate complexity. (And, yes, it’s a reasonable question whether that’s in some sense where the randomness one sees “really” comes from… but that’s a different story.)

Reversibility, Irreversibility and More

Reversibility and irreversibility are a strangely common theme, at least between “quantum”, “neural” and “blockchain”. If one ignores measurement, a fundamental feature of quantum mechanics is that it’s reversible. What this means is that if one takes a quantum system, and lets it evolve in time, then whatever comes out one will always, at least in principle, be able to take and run backwards, to precisely reproduce where one started from.

Typical computation isn’t reversible like that. Consider an OR gate, which might be a basic component in a computer. In p OR q, the result will be true if either p or q is true. But just knowing that the result is “true”, you can’t figure out which of p and q (or both) is true. In other words, the OR operation is irreversible: it doesn’t preserve enough information for you to invert it.

In quantum circuits, one uses gates that, say, take two inputs (say p and q), and give two outputs (say p' and q'). And from those two outputs one can always uniquely reproduce the two inputs.

OK, but now let’s talk about neural nets. Neural nets as they’re usually conceived are fundamentally irreversible. Here’s why. Imagine (again) that you make a neural network to distinguish elephants and teacups. To make that work, a very large number of different possible input images all have to map, say, to “elephant”. It’s like the OR gate, but more so. Just knowing the result is “elephant” there’s no unique way to invert the computation. And that’s the whole point: one wants anything that’s enough like the elephant pictures one showed to still come out as “elephant”; in other words, irreversibility is central to the whole operation of at least this kind of neural net.

So, OK, then how could one possibly make a quantum neural net? Maybe it’s just not possible. But if so, then what’s going on with brains? Because brains seem to work very much like neural nets. And yet brains are physical systems that presumably follow quantum mechanics. So then how are brains possible?

At some level the answer has to do with the fact that brains dissipate heat. Well, what is heat? Microscopically, heat is the random motion of things like molecules. And one way to state the Second Law of Thermodynamics (or the Law of Entropy Increase) is that under normal circumstances those random motions never spontaneously organize themselves into any kind of systematic motion. In principle all those molecules could start moving in just such a way as to turn a flywheel. But in practice nothing like that ever happens. The heat just stays as heat, and doesn’t spontaneously turn into macroscopic mechanical motion.

OK, but so let’s imagine that microscopic processes involving, say, collisions of molecules, are precisely reversible—as in fact they are according to quantum mechanics. Then the point is that when lots of molecules are involved, their motions can get so “encrypted” that they just seem random. If one could look at all the details, there’d still be enough information to reverse everything. But in practice one can’t do that, and so it seems like whatever was going on in the system has just “turned into heat”.

So then what about producing “neural net behavior”? Well, the point is that while one part of a system is, say, systematically “deciding to say elephant”, the detailed information that would be needed to go back to the initial state is getting randomized, and turning into heat.

To be fair, though, this is glossing over quite a bit. And in fact I don’t think anyone knows how one can actually set up a quantum system (say a quantum circuit) that behaves in this kind of way. It’d be pretty interesting to do so, because it’d potentially tell us a lot about the quantum measurement process.

To explain how one goes from quantum mechanics in which everything is just an amplitude, to our experience of the world in which definite things seem to happen, people sometimes end up trying to appeal to mystical features of consciousness. But the point about a quantum neural net is that it’s quantum mechanical, yet it “comes to definite conclusions” (e.g. elephant vs. teacup).

Is there a good toy model for such a thing? I suspect one could create one from a quantum version of a cellular automaton that shows phase transition behavior—actually not unlike the detailed mechanics of a real quantum magnetic material. And what will be necessary is that the system has enough components (say spins) that the “heat” needed to compensate for its apparent irreversible behavior will stay away from the part where the irreversible behavior is observed.

Let me make a perhaps slightly confusing side remark. When people talk about “quantum computers”, they are usually talking about quantum circuits that operate on qubits (quantum analog of binary bits). But sometimes they actually mean something different: they mean quantum annealing devices.

Imagine you’ve got a bunch of dominoes and you’re trying to arrange them on the plane so that some matching condition associated with the markings on them is always satisfied. It turns out this can be a very hard problem. It’s related to computational irreducibility (and perhaps to problems like integer factoring). But in the end, to find out, say, the configuration that does best in satisfying the matching condition everywhere, one may effectively have to just try out all possible configurations, and see which one works best.

Well, OK, but let’s imagine that the dominoes were actually molecules, and the matching condition corresponds to arranging molecules to minimize energy. Then the problem of finding the best overall configuration is like the problem of finding the minimum energy configuration for the molecules, which physically should correspond to the most stable solid structure that can be formed from the molecules.
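
Here is a minimal sketch of that kind of brute-force search in the Wolfram Language, for a toy ring of 8 "spins" whose energy is lowest when neighbors agree (the energy function is just an illustrative stand-in for a real matching or interaction rule):

energy[config_] := -Total[config RotateLeft[config]]   (* neighboring spins that agree lower the energy *)
allConfigs = Tuples[{-1, 1}, 8];                       (* all 256 possible configurations of 8 spins *)
First[SortBy[allConfigs, energy]]                      (* the configuration with the lowest energy *)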

And, OK, it might be hard to compute that. But what about an actual physical system? What will the molecules in it actually do when one cools it down? If it’s easy for the molecules to get to the lowest energy configuration, they’ll just do it, and one will have a nice crystalline solid.

People sometimes assume that “the physics will always figure it out”, and that even if the problem is computationally hard, the molecules will always find the optimal solution. But I don’t think this is actually true—and I think what instead will happen is that the material will turn mushy, not quite liquid and not quite solid, at least for a long time.

Still, there’s the idea that if one sets up this energy minimization problem quantum mechanically, then the physical system will be successful at finding the lowest energy state. And, yes, in quantum mechanics it might be harder to get stuck in local minima, because there is tunneling, etc.

But here’s the confusing part: when one trains a neural net, one ends up having to effectively solve minimization problems like the one I’ve described (“which values of weights make the network minimize the error in its output relative to what one wants?”). So people end up sometimes talking about “quantum neural nets”, meaning domino-like arrays which are set up to have energy minimization problems that are mathematically equivalent to the ones for neural nets.
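
Stripped to its essentials, that weight-finding problem is just numerical minimization. Here is a minimal sketch in the Wolfram Language, fitting a single weight w in a trivial one-weight "network" w x to some made-up examples:

examples = {{1, 2.1}, {2, 3.9}, {3, 6.2}};                        (* made-up {input, desired output} pairs *)
error[w_?NumberQ] := Total[(w #[[1]] - #[[2]])^2 & /@ examples]   (* total squared error over the examples *)
FindMinimum[error[w], {w, 1}]                                     (* w comes out close to 2 *)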

(Yet another connection is that convolutional neural nets—of the kind used for example in image recognition—are structured very much like cellular automata, or like dynamic spin systems. But in training neural nets to handle multiscale features in images, one seems to end up with scale invariance similar to what one sees at critical points in spin systems, or their quantum analogs, as analyzed by renormalization group methods.)

OK, but let’s return to our whole buzzword string. What about blockchain? Well, one of the big points about a blockchain is in a sense to be as irreversible as possible. Once something has been added to a blockchain, one wants it to be inconceivable that it should ever be reversed out.

How is that achieved? Well, it’s curiously similar to how it works in thermodynamics or in quantum measurement. Imagine someone adds a block to their copy of a blockchain. Well, then the idea is that lots of other people all over the world will make their own copies of that block on their own blockchain nodes, and then go on independently adding more blocks from there.

Bad things would happen if lots of the people maintaining blockchain nodes decided to collude to not add a block, or to modify it, etc. But it’s a bit like with gas molecules (or degrees of freedom in quantum measurement). By the time everything is spread out among enough different components, it’s extremely unlikely that it’ll all concentrate together again to have some systematic effect.

Of course, people might not be quite like gas molecules (though, frankly, their observed aggregate behavior, e.g. jostling around in a crowd, is often strikingly similar). But all sorts of things in the world seem to depend on an assumption of randomness. And indeed, that’s probably necessary to maintain stability and robustness in markets where trading is happening.

OK, so when a blockchain tries to ensure that there’s a “definite history”, it’s doing something very similar to what a quantum measurement has to do. But just to close the loop a little more, let’s ask what a quantum blockchain might be like.

Yes, one could imagine using quantum computing to somehow break the cryptography in a standard blockchain. But the more interesting (and in my view, realistic) possibility is to make the actual operation of the blockchain itself be quantum mechanical.

In a typical blockchain, there’s a certain element of arbitrariness in how blocks get added, and who gets to do it. In a “proof of work” scheme (as used in Bitcoin and currently also Ethereum), to find out how to add a new block one searches for a “nonce”—a number to throw in to make a hash come out in a certain way. There are always many possible nonces (though each one is hard to find), and the typical strategy is to search randomly for them, successively testing each candidate.
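
Here is a minimal sketch of such a nonce search in the Wolfram Language. The difficulty rule used here, demanding a given number of leading zero hex digits in a SHA-256 hash, is a simplification of Bitcoin's actual target rule, and findNonce is just a name for this illustration:

findNonce[block_String, difficulty_Integer] := Module[{nonce = 0},
  While[Take[IntegerDigits[Hash[block <> ToString[nonce], "SHA256"], 16, 64], difficulty] =!= ConstantArray[0, difficulty], nonce++];
  nonce]

findNonce["block data", 3]   (* on average about 16^3 = 4096 candidates have to be tried *)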

But one could imagine a quantum version in which one is in effect searching in parallel for all possible nonces, and as a result producing many possible blockchains, each with a certain quantum amplitude. And to fill out the concept, imagine that—for example in the case of Ethereum—all computations done on the blockchain were reversible quantum ones (achieved, say, with a quantum version of the Ethereum Virtual Machine).

But what would one do with such a blockchain? Yes, it would be an interesting quantum system with all kinds of dynamics. But to actually connect it to the world, one has to get data on and off the blockchain—or, in other words, one has to do a measurement. And the act of that measurement would in effect force the blockchain to pick a definite history.

OK, so what about a “neural blockchain”? At least today, by far the most common strategy with neural nets is first to train them, then to put them to work. (One can train them “passively” by just feeding them a fixed set of examples, or one can train them “actively” by having them in effect “ask” for the examples they want.)  But by analogy with people, neural nets can also have “lifelong learning”, in which they’re continually getting updated based on the “experiences” they’re having.

So how do the neural nets record these experiences? Well, by changing various internal weights. And in some ways what happens is like what happens with blockchains.

Science fiction sometimes talks about direct brain-to-brain transfer of memories. And in a neural net context this might mean just taking a big block of weights from one neural net and putting it into another. And, yes, it can work well to transfer definite layers in one network to another (say to transfer information on what features of images are worth picking out). But if you try to insert a “memory” deep inside a network, it’s a different story. Because the way a memory is represented in a network will depend on the whole history of the network.
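
Transferring whole layers really is routine. Here is a minimal sketch in the Wolfram Language, assuming a small pretrained model from the Wolfram Neural Net Repository (the particular model name and the number of layers kept are just plausible choices for illustration, and the new network would still need retraining):

pretrained = NetModel["LeNet Trained on MNIST Data"];          (* a small pretrained image classifier *)
features = NetTake[pretrained, 7];                             (* keep the early, feature-extracting layers *)
newNet = NetChain[{features, LinearLayer[2], SoftmaxLayer[]}]  (* attach a fresh two-class classifier on top *)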

It’s like in a blockchain: you can’t just replace one block and expect everything else to work. The whole thing has been knitted into the sequence of things that happen through time. And it’s the same thing with memories in neural nets: once a memory has formed in a certain way, subsequent memories will be built on top of this one.

Bringing It Together

At the outset, one might have thought that “quantum”, “neural” and “blockchain” (not to mention “AI”) didn’t have much in common (other than that they’re current buzzwords)—and that in fact they might in some sense be incompatible. But what we’ve seen is that actually there are all sorts of connections between them, and all sorts of fundamental phenomena that are shared between systems based on them.

So what might a “quantum neural blockchain AI” (“QNBAI”) be like?

Let’s look at the pieces again. A single blockchain node is a bit like a single brain, with a definite memory. But in a sense the whole blockchain network becomes robust through all the interactions between different blockchain nodes. It’s a little like how human society and human knowledge develop.

Let’s say we’ve got a “raw AI” that can do all sorts of computation. Well, the big issue is whether we can find a way to align what it can do with things that we humans think we want to do. And to make that alignment, we essentially have to communicate with the AI at a level of abstraction that transcends the details of how it works: in effect, we have to have some symbolic language that we both understand, and that, for example, the AI can translate into the details of how it operates.

Inside the AI it may end up using all kinds of “concepts” (say to distinguish one class of images from another). But the question is whether those concepts are ones that we humans in a sense “culturally understand”. In other words, are those concepts (and, for example, the words for them) ones that there’s a whole widely understood story about?

In a sense, concepts that we humans find useful for communication are ones that have been used in all sorts of interactions between different humans. The concepts become robust by being “knitted into” the thought patterns of many interacting brains, a bit like the data put on a blockchain becomes a robust part of “collective blockchain memory” through the interactions between blockchain nodes.

OK, so there’s something strange here. At first it seemed like QNBAIs would have to be something completely exotic and unfamiliar (and perhaps impossible). But somehow as we go over their features they start to seem awfully familiar—and actually awfully like us.

Yup, according to the physics, we know we are “quantum”. Neural nets capture many core features of how our brains seem to work. Blockchain—at least as a general concept—is somehow related to individual and societal memory. And AI, well, AI in effect tries to capture what’s aligned with human goals and intelligence in the computational universe—which is also what we’re doing.

OK, so what’s the closest thing we know to a QNBAI? Well, it’s probably all of us!

Maybe that sounds crazy. I mean, why should a string of buzzwords from 2018 connect like that? Well, at some level perhaps there’s an obvious answer: we tend to create and study things that are relevant to us, and somehow revolve around us. And, more than that, the buzzwords of today are things that are somehow just within the scope that we can now think about with the concepts we’ve currently developed—and that are somehow connected through them.

I must say that when I chose these buzzwords I had no idea they’d connect at all. But as I’ve tried to work through things in writing this, it’s been remarkable how much connection I’ve found. And, yes, in a fittingly bizarre end to a somewhat bizarre journey,  it does seem to be the case that a string plucked from today’s buzzword universe has landed very close to home. And maybe in the end—at least in some sense—we are our buzzwords!

Learning about the Future from 2001: A Space Odyssey, Fifty Years Later

2001: A Space Odyssey

A Glimpse of the Future

It was 1968. I was 8 years old. The “space race” was in full swing. For the first time, a space probe had recently landed on another planet (Venus). And I was eagerly studying everything I could to do with space.

Then on April 3, 1968 (May 15 in the UK), the movie 2001: A Space Odyssey was released—and I was keen to see it. So in the early summer of 1968 there I was, the first time I’d ever been in an actual cinema (yes, it was called that in the UK). I’d been dropped off for a matinee, and was pretty much the only person in the theater. And to this day, I remember sitting in a plush seat and eagerly waiting for the curtain to go up, and the movie to begin.

It started with an impressive extraterrestrial sunrise. But then what was going on? Those weren’t space scenes. Those were landscapes, and animals. I was confused, and frankly a little bored. But just when I was getting concerned, there was a bone thrown in the air that morphed into a spacecraft, and pretty soon there was a rousing waltz—and a big space station turning majestically on the screen.

Scene one

The next two hours had a big effect on me. It wasn’t really the spacecraft (I’d seen plenty of them in books by then, and in fact made many of my own concept designs). And at the time I didn’t care much about the extraterrestrials. But what was new and exciting for me in the movie was the whole atmosphere of a world full of technology—and the notion of what might be possible there, with all those bright screens doing things, and, yes, computers driving it all.

Control screens

It would be another year before I saw my first actual computer in real life. But those two hours in 1968 watching 2001 defined an image of what the computational future could be like, that I carried around for years.

I think it was during the intermission to the movie that some seller of refreshments—perhaps charmed by a solitary kid so earnestly pondering the movie—gave me a “cinema program” about the movie. Half a century later I still have that program, complete with a food stain, and faded writing from my 8-year-old self, recording (with some misspelling) where and when I saw the movie:

Brochure

What Actually Happened

A lot has happened in the past 50 years, particularly in technology, and it’s an interesting experience for me to watch 2001 again—and compare what it predicted with what’s actually happened. Of course, some of what’s actually been built over the past 50 years has been done by people like me, who were influenced in larger or smaller ways by 2001.

When Wolfram|Alpha was launched in 2009—showing some distinctly HAL-like characteristics—we paid a little homage to 2001 in our failure message (needless to say, one piece of notable feedback we got at the beginning was someone asking: “How did you know my name was Dave?!”):

Sorry Dave error code

One very obvious prediction of 2001 that hasn’t panned out, at least yet, is routine, luxurious space travel. But like many other things in the movie, it doesn’t feel like what was predicted was off track; it’s just that—50 years later—we still haven’t got there yet.

So what about the computers in the movie? Well, they have lots of flat-screen displays, just like real computers today. In the movie, though, one obvious difference is that there’s one physical display per functional area; the notion of windows, or dynamically changeable display areas, hadn’t arisen yet.

Another difference is in how the computers are controlled. Yes, you can talk to HAL. But otherwise, it’s lots and lots of mechanical buttons. To be fair, cockpits today still have plenty of buttons—but the centerpiece is now a display. And, yes, in the movie there weren’t any touchscreens—or mice. (Both had actually been invented a few years before the movie was made, but neither was widely known.)

There also aren’t any keyboards to be seen (and in the high-tech spacecraft full of computers going to Jupiter, the astronauts are writing with pens on clipboards; presciently, no slide rules and no tape are shown—though there is one moment when a printout that looks awfully like a punched card is produced). Of course, there were keyboards for computers back in the 1960s. But in those days, very few people could type, and there probably didn’t seem to be any reason to think that would change. (Being something of a committed tool user, I myself was routinely using a typewriter even in 1968, though I didn’t know any other kids who were—and my hands at the time weren’t big or strong enough to do much other than type fast with one finger, a skill whose utility returned decades later with the advent of smartphones.)

What about the content of the computer displays? That might have been my favorite thing in the whole movie. They were so graphical, and communicating so much information so quickly. I had seen plenty of diagrams in books, and had even painstakingly drawn quite a few myself. But back in 1968 it was amazing to imagine that a computer could generate information, and display it graphically, so quickly.

Of course there was television (though color only arrived in the UK in 1968, and I’d only seen black and white). But television wasn’t generating images; it was just showing what a camera saw. There were oscilloscopes too, but they just had a single dot tracing out a line on the screen. So the computer displays in 2001 were, at least for me, something completely new.

At the time it didn’t seem odd that in the movie there were lots of printed directions (how to use the “Picturephone”, or the zero-gravity toilet, or the hibernation modules). Today, any such instructions (and they’d surely be much shorter, or at least broken up a lot, for today’s less patient readers) would be shown onscreen. But when 2001 was made, the idea of word processing, and of displaying text to read onscreen, was still several years in the future—probably not least because at the time people thought of computers as machines for calculation, and there didn’t seem to be anything calculational about text.

There are lots of different things shown on the displays in 2001.  Even though there isn’t the idea of dynamically movable windows, the individual displays, when they’re not showing anything, go into a kind of “iconic” state, just showing in large letters codes like NAV or ATM or FLX or VEH or GDE.

When the displays are active they sometimes show things like tables of numbers, and sometimes show lightly animated versions of a whole variety of textbook-like diagrams. A few of them show 1980s-style animated 3D line graphics (“what’s the alignment of the spacecraft?”, etc.)—perhaps modeled after analog airplane controls.

But very often there’s also something else—and occasionally it fills a whole display. There’s something that looks like code, or a mixture of code and math.

Docking controls

It’s usually in a fairly “modern-looking” sans serif font (well, actually, a font called Manifold for IBM Selectric electric typewriters). Everything’s uppercase. And with stars and parentheses and names like TRAJ04, it looks a bit like early Fortran code (except that given the profusion of semicolons, it was more likely modeled on IBM’s PL/I language). But then there are also superscripts, and built-up fractions—like math.

Looking at this now, it’s a bit like trying to decode an alien language. What did the makers of the movie intend this to be about? A few pieces make sense to me. But a lot of it looks random and nonsensical—meaningless formulas full of unreasonably high-precision numbers. Considering all the care put into the making of 2001, this seems like a rare lapse—though perhaps 2001 started the long and somewhat unfortunate tradition of showing meaningless code in movies. (A recent counterexample is my son Christopher’s alien-language-analysis code for Arrival, which is actual Wolfram Language code that genuinely makes the visualizations shown.)

But would it actually make sense to show any form of code on real displays like the ones in 2001? After all, the astronauts aren’t supposed to be building the spacecraft; they’re only operating it. But here’s a place where the future is only just now arriving. During most of the history of computing, code has been something that humans write, and computers read. But one of my goals with the Wolfram Language is to create a true computational communication language that is high-level enough that not only computers, but also humans, can usefully read.

Yes, one might be able to describe in words some procedure that a spacecraft is executing. But one of the points of the Wolfram Language is to be able to state the procedure in a form that directly fits in with human computational thinking. So, yes, on the first real manned spacecraft going to Jupiter, it’ll make perfect sense to display code, though it won’t look quite like what’s in 2001.

Accidents of History

I’ve watched 2001 several times over the years, though not specifically in the year 2001 (that year for me was dominated by finishing my magnum opus A New Kind of Science). But there are several very obvious things in the movie 2001 that don’t ring true for the real year 2001—quite beyond the very different state of space travel.

One of the most obvious is that the haircuts and clothing styles and general formality look wrong. Of course these would have been very hard to predict. But perhaps one could at least have anticipated (given the hippie movement etc.) that clothing styles and so on would get less formal. But back in 1968, I certainly remember for example getting dressed up even to go on an airplane.

Another thing that today doesn’t look right in the movie is that nobody has a personal computer. Of course, back in 1968 there were still only a few thousand computers in the whole world—each weighing at least some significant fraction of a ton—and basically nobody imagined that one day individual people would have computers, and be able to carry them around.

As it happens, back in 1968 I’d recently been given a little plastic kit mechanical computer (called Digi-Comp I) that could (very laboriously) do 3-digit binary operations. But I think it’s fair to say that I had absolutely no grasp of how this could scale up to something like the computers in 2001. And indeed when I saw 2001 I imagined that to have access to technology like I saw in the movie, I’d have to be joining something like NASA when I was grown up.

What of course I didn’t foresee—and I’m not sure anyone did—is that consumer electronics would become so small and cheap. And that access to computers and computation would therefore become so ubiquitous.

In the movie, there’s a sequence where the astronauts are trying to troubleshoot a piece of electronics. Lots of nice computer-aided, engineering-style displays come up. But they’re all of printed circuit boards with discrete components. There are no integrated circuits or microprocessors—which isn’t surprising, because in 1968 these basically hadn’t been invented yet. (Correctly, there aren’t vacuum tubes, though. Apparently the actual prop used—at least for exterior views—was a gyroscope.)

Troubleshooting machinery

It’s interesting to see all sorts of little features of technology that weren’t predicted in the movie. For example, when they’re taking commemorative pictures in front of the monolith on the Moon, the photographer keeps tipping the camera after each shot—presumably to advance the film inside. The idea of digital cameras that could electronically take pictures simply hadn’t been imagined then.

In the history of technology, there are certain things that just seem inevitable—even though sometimes they may take decades to finally arrive. An example is videophones. There were early ones even back in the 1930s. And there were attempts to consumerize them in the 1970s and 1980s. But even by the 1990s they were still exotic—though I remember that with some effort I successfully rented a pair of them in 1993—and they worked OK, even over regular phone lines.

On the space station in 2001, there’s a Picturephone shown, complete with an AT&T logo—though it’s the old Bell System logo that looks like an actual bell. And as it happens, when 2001 was being made, there was a real project at AT&T called the Picturephone.

Picturephone

Of course, in 2001 the Picturephone isn’t a cellphone or a mobile device. It’s a built-in object, in a kiosk—a pay Picturephone. In the actual course of history, though, the rise of cellphones occurred before the consumerization of videochat—so payphone and videochat technology basically never overlapped.

Also interesting in 2001 is that the Picturephone is a push-button phone, with exactly the same numeric button layout as today (though without the * and # [“octothorp”]). Push-button phones actually already existed in 1968, although they were not yet widely deployed. And, of course, because of the details of our technology today, when one actually does a videochat, I don’t know of any scenario in which one ends up pushing mechanical buttons.

There’s a long list of instructions printed on the Picturephone—but in actuality, just like today, its operation seems quite straightforward. Back in 1968, though, even direct long-distance dialing (without an operator) was fairly new—and wasn’t yet possible at all between different countries.

To use the Picturephone in 2001, one inserts a credit card. Credit cards had existed for a while even in 1968, though they were not terribly widely used. The idea of automatically reading credit cards (say, using a magnetic stripe) had actually been developed in 1960, but it didn’t become common until the 1980s. (I remember that in the mid-1970s in the UK, when I got my first ATM card, it consisted simply of a piece of plastic with holes like a punched card—not the most secure setup one can imagine.)

At the end of the Picturephone call in 2001, there’s a charge displayed: $1.70. Correcting for inflation, that would be about $12 today. By the standards of modern cellphones—or internet videochatting—that’s very expensive. But for a present-day satellite phone, it’s not so far off, even for an audio call. (Today’s handheld satphones can’t actually support the necessary data rates for videocalls, and networks on planes still struggle to handle videocalls.)
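
(That inflation adjustment is itself a one-liner in the Wolfram Language; the exact figure depends a little on the price index used:)

InflationAdjust[Quantity[1.70, DatedUnit["USDollars", 1968]], DatedUnit["USDollars", 2018]]   (* roughly 12 dollars in 2018 terms *)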

On the space shuttle (or, perhaps better, space plane) the cabin looks very much like a modern airplane—which probably isn’t surprising, because things like Boeing 737s already existed in 1968. But in a correct (at least for now) modern touch, the seat backs have TVs—controlled, of course, by a row of buttons. (And there’s also futuristic-for-the-1960s programming, like a televised women’s judo match.)

A curious film-school-like fact about 2001 is that essentially every major scene in the movie (except the ones centered on HAL) shows the consumption of food. But how would food be delivered in the year 2001? Well, like everything else, it was assumed that it would be more automated, with the result that in the movie a variety of elaborate food dispensers are shown. As it’s turned out, however, at least for now, food delivery is something that’s kept humans firmly in the loop (think McDonald’s, Starbucks, etc.).

In the part of the movie concerned with going to Jupiter, there are “hibernaculum pods” shown—with people inside in hibernation. And above these pods there are vital-sign displays, that look very much like modern ICU displays. In a sense, that was not such a stretch of a prediction, because even in 1968, there had already been oscilloscope-style EKG displays for some time.

Of course, how to put people into hibernation isn’t something that’s yet been figured out in real life. That it—and cryonics—should be possible has been predicted for perhaps a century. And my guess is that—like cloning or gene editing—to do it will take inventing some clever tricks. But in the end I expect it will pretty much seem like a historical accident in which year it’s figured out. It just so happens not to have happened yet.

There’s a scene in 2001 where one of the characters arrives on the space station and goes through some kind of immigration control (called “Documentation”)—perhaps imagined to be set up as some kind of extension to the Outer Space Treaty from 1967. But what’s particularly notable in the movie is that the clearance process is handled automatically, using biometrics, or specifically, voiceprint identification. (The US insignia displayed are identical to the ones on today’s US passports, but in typical pre-1980s form, the system asks for “surname” and “Christian name”.)

There had been primitive voice recognition systems even in the 1950s (“what digit is that?”), and the idea of identifying speakers by voice was certainly known. But what was surely not obvious is that serious voice systems would need the kind of computer processing power that only became available in the late 2000s.

And in just the last few years, automatic biometric immigration control systems have started to become common at airports—though using face and sometimes fingerprint recognition rather than voice. (Yes, it probably wouldn’t work well to have lots of people talking at different kiosks at the same time.)

In the movie, the kiosk has buttons for different languages: English, Dutch, Russian, French, Italian, Japanese. It would have been very hard to predict what a more appropriate list for 2001 might have been.

Even though 1968 was still in the middle of the Cold War, the movie correctly portrays international use of the space station—though, like in Antarctica today, it portrays separate moon bases for different countries. Of course, the movie talks about the Soviet Union. But the fact that the Berlin Wall would fall 21 years after 1968 isn’t the kind of thing that ever seems predictable in human history.

The movie shows logos from quite a few companies as well. The space shuttle is proudly branded Pan Am. And in at least one scene, its instrument panel has “IBM” in the middle. (There’s also an IBM logo on spacesuit controls during an EVA near Jupiter.)  On the space station there are two hotels shown: Hilton and Howard Johnson’s. There’s also a Whirlpool “TV dinner” dispenser in the galley of the spacecraft going to the Moon. And there’s the AT&T (Bell System) Picturephone, as well as an Aeroflot bag, and a BBC newscast. (The channel is “BBC 12”, though in reality the expansion has only been from BBC 2 to BBC 4 in the past 50 years.)

Companies have obviously risen and fallen over the course of 50 years, but it’s interesting how many of the ones featured in the movie still exist, at least in some form. Many of their logos are even almost the same—though AT&T and BBC are two exceptions, and the IBM logo got stripes added in 1972.

It’s also interesting to look at the fonts used in the movie. Some seem quite dated to us today, while others (like the title font) look absolutely modern. But what’s strange is that at times over the past 50 years some of those “modern” fonts would have seemed old and tired. But such, I suppose, is the nature of fashion. And it’s worth remembering that even those “serifed fonts” from stone inscriptions in ancient Rome are perfectly capable of looking sharp and modern.

Something else that’s changed since 1968 is how people talk, and the words they use. The change seems particularly notable in the technospeak. “We are running cross-checking routines to determine reliability of this conclusion” sounds fine for the 1960s, but not so much for today. There’s mention of the risk of “social disorientation” without “adequate preparation and conditioning”, reflecting a kind of behaviorist view of psychology that at least wouldn’t be expressed the same way today.

It’s sort of charming when a character in 2001 says that whenever they “phone” a moon base, they get “a recording which repeats that the phone lines are temporarily out of order”. One might not say something too different about landlines on Earth today, but it feels like with a moon base one should at least be talking about automatically finding out if their network is down, rather than about having a person call on the phone and listen to a recorded message.

Of course, had a character in 2001 talked about “not being able to ping their servers”, or “getting 100% packet loss” it would have been completely incomprehensible to 1960s movie-goers—because those are concepts of a digital world which basically had just not been invented yet (even though the elements for it definitely existed).

What about HAL?

The most notable and enduring character from 2001 is surely the HAL 9000 computer, described (with exactly the same words as might be used today) as “the latest in machine intelligence”. HAL talks, lipreads, plays chess, recognizes faces from sketches, comments on artwork, does psychological evaluations, reads from sensors and cameras all over the spaceship, predicts when electronics will fail, and—notably to the plot—shows a variety of human-like emotional responses.

It might seem remarkable that all these AI-like capabilities would be predicted in the 1960s. But actually, back then, nobody yet thought that AI would be hard to create—and it was widely assumed that before too long computers would be able to do pretty much everything humans can, though probably better and faster and on a larger scale.

But already by the 1970s it was clear that things weren’t going to be so easy, and before long the whole field of AI basically fell into disrepute—with the idea of creating something like HAL beginning to seem as fictional as digging up extraterrestrial artifacts on the Moon.

In the movie, HAL’s birthday is January 12, 1992 (though in the book version of 2001, it was 1997). And in 1997, in Urbana, Illinois, fictional birthplace of HAL (and, also, as it happens, the headquarters location of my company), I went to a celebration of HAL’s fictional birthday. People talked about all sorts of technologies relevant to HAL. But to me the most striking thing was how low the expectations had become. Almost nobody even seemed to want to mention “general AI” (probably for fear of appearing kooky), and instead people were focusing on solving very specific problems, with specific pieces of hardware and software.

Having read plenty of popular science (and some science fiction) in the 1960s, I certainly started from the assumption that one day HAL-like AIs would exist. And in fact I remember that in 1972 I happened to end up delivering a speech to my whole school, and picked as my topic what amounts to AI ethics. I’m afraid I would now consider what I said naive and misguided (and in fact I was perhaps partly misled by 2001). But, heck, I was only 12 at the time. And what I find interesting today is just that I thought AI was an important topic even back then.

For the remainder of the 1970s I was personally mostly very focused on physics (which, unlike AI, was thriving at the time). AI was still in the back of my mind, though, when for example I wanted to understand how brains might or might not relate to statistical physics and to things like the formation of complexity. But what made AI really important again for me was that in 1981 I had launched my first computer language (SMP) and had seen how successful it was at doing mathematical and scientific computations—and I got to wondering what it would take to do computations about (and know about) everything.

My immediate assumption was that it would require full brain-like capabilities, and therefore general AI. But having just lived through so many advances in physics, this didn’t immediately faze me. And in fact, I even had a fairly specific plan. You see, SMP—like the Wolfram Language today—was fundamentally based on the idea of defining transformations to apply when expressions match particular patterns. I always viewed this as a rough idealization of certain forms of human thinking. And what I thought was that general AI might effectively just require adding a way to match not just precise patterns, but also approximate ones (e.g. “that’s a picture of an elephant, even though its pixels aren’t exactly the same as in the sample”).
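
Here is a minimal sketch of that pattern-and-transformation style of computation, written in today's Wolfram Language (log is just a stand-in symbol here, and the matching is exact and structural; the "approximate matching" idea isn't captured by this):

rules = {log[x_ y_] :> log[x] + log[y], log[x_^n_] :> n log[x]};   (* transformations to apply when expressions match patterns *)
log[a^2 b] //. rules                                               (* repeatedly apply the rules: gives 2 log[a] + log[b] *)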

I tried a variety of schemes for doing this, one of them being neural nets. But somehow I could never formulate experiments that were simple enough to even have a clear definition of success. But by making simplifications to neural nets and a couple of other kinds of systems, I ended up coming up with cellular automata—which quickly allowed me to make some discoveries that started me on my long journey of studying the computational universe of simple programs, and made me set aside approximate pattern matching and the problem of AI.

At the time of HAL’s fictional birthday in 1997, I was actually right in the middle of my intense 10-year process of exploring the computational universe and writing A New Kind of Science—and it was only out of my great respect for 2001 that I agreed to break out of being a hermit for a day and talk about HAL.

It so happened that just three weeks before there had been the news of the successful cloning of Dolly the sheep.

And, as I pointed out, just like general AI, people had discussed cloning mammals for ages. But it had been assumed to be impossible, and almost nobody had worked on it—until the success with Dolly. I wasn’t sure what kind of discovery or insight would lead to progress in AI. But I felt certain that eventually it would come.

Meanwhile, from my study of the computational universe, I’d formulated my Principle of Computational Equivalence—which had important things to say about artificial intelligence. And at some level, what it said is that there isn’t some magic “bright line” that separates the “intelligent” from the merely computational.

Emboldened by this—and with the Wolfram Language as a tool—I then started thinking again about my quest to solve the problem of computational knowledge. It certainly wasn’t an easy thing. But after quite a few years of work, in 2009, there it was: Wolfram|Alpha—a general computational knowledge engine with a lot of knowledge about the world. And particularly after Wolfram|Alpha was integrated with voice input and voice output in things like Siri, it started to seem in many ways quite HAL-like.

HAL in the movie had some more tricks, though. Of course he had specific knowledge about the spacecraft he was running—a bit like the custom Enterprise Wolfram|Alpha systems that now exist at various large corporations. But he had other capabilities too—like being able to do visual recognition tasks.

And as computer science developed, such things had hardened into tough nuts that basically “computers just can’t do”. To be fair, there was lots of practical progress in things like OCR for text, and face recognition. But it didn’t feel general. And then in 2012, there was a surprise: a trained neural net was suddenly discovered to perform really well on standard image recognition tasks.

It was a strange situation. Neural nets had first been discussed in the 1940s, and had seen several rounds of waxing and waning enthusiasm over the decades. But suddenly just a few years ago they really started working. And a whole bunch of “HAL-like tasks” that had seemed out of range suddenly began to seem achievable.

In 2001, there’s the idea that HAL wasn’t just “programmed”, but somehow “learned”. And in fact HAL mentions at one point that HAL had a (human) teacher. And perhaps the gap between HAL’s creation in 1992 and deployment in 2001 was intended to correspond to HAL’s human-like period of education. (Arthur C. Clarke probably changed the birth year to 1997 for the book because he thought that a 9-year-old computer would be obsolete.)

But the most important thing that’s made modern machine learning systems actually start to work is precisely that they haven’t been trained at human-type rates. Instead, they’ve immediately been fed millions or billions of example inputs—and then they’ve been expected to burn huge amounts of CPU time systematically finding what amount to progressively better fits to those examples. (It’s conceivable that an “active learning” machine could be set up to basically find the examples it needs within a human-schoolroom-like environment, but this isn’t how the most important successes in current machine learning have been achieved.)

So can machines now do what HAL does in the movie? Unlike a lot of the tasks presumably needed to run an actual spaceship, most of the tasks the movie concentrates on HAL doing are ones that seem quintessentially human. And most of these turn out to be well-suited to modern machine learning—and month by month more and more of them have now been successfully tackled.

But what about knitting all these tasks together, to make a “complete HAL”? One could conceivably imagine having some giant neural net, and “training it for all aspects of life”. But this doesn’t seem like a good way to do things. After all, if we’re doing celestial mechanics to work out the trajectory of a spacecraft, we don’t have to do it by matching examples; we can do it by actual calculation, using the achievements of mathematical science.

We need our HAL to be able to know about a lot of kinds of things, and to be able to compute about a lot of kinds of things, including ones that involve human-like recognition and judgement.

In the book version of 2001, the name HAL was said to stand for “Heuristically programmed ALgorithmic computer”. And the way Arthur C. Clarke explained it is that this was supposed to mean “it can work on a program that’s already set up, or it can look around for better solutions and you get the best of both worlds”.

And at least in some vague sense, this is actually a pretty good description of what I’ve built over the past 30 years as the Wolfram Language. The “programs that are already set up” happen to try to encompass a lot of the systematic knowledge about computation and about the world that our civilization has accumulated.

But there’s also the concept of searching for new programs. And actually the science that I’ve done has led me to do a lot of work searching for programs in the computational universe of all possible programs. We’ve had many successes in finding useful programs that way, although the process is not as systematic as one might like.

In recent years, the Wolfram Language has also incorporated modern machine learning—in which one is effectively also searching for programs, though in a restricted domain defined for example by weights in a neural network, and constructed so that incremental improvement is possible.
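
Here is a minimal sketch of what that looks like in practice in the Wolfram Language (the tiny dataset and the network shape are made up purely for illustration):

data = {1 -> 2.1, 2 -> 3.9, 3 -> 6.2, 4 -> 8.1};                                   (* made-up input -> output examples *)
net = NetChain[{LinearLayer[8], Tanh, LinearLayer[1]}, "Input" -> "Scalar", "Output" -> "Scalar"];
trained = NetTrain[net, data];                                                     (* incrementally adjust the weights to fit the examples *)
trained[2.5]                                                                       (* the trained net's prediction for a new input *)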

Could we now build a HAL with the Wolfram Language? I think we could at least get close. It seems well within range to be able to talk to HAL in natural language about all sorts of relevant things, and to have HAL use knowledge-based computation to control and figure out things about the spaceship (including, for example, simulating components of it).

The “computer as everyday conversation companion” side of things is less well developed, not least because it’s not as clear what the objective might be there. But it’s certainly my hope that in the next few years—in part to support applications like computational smart contracts (and yes, it would have been good to have one of those set up for HAL)—things like my symbolic discourse language project will provide a general framework for doing this.

“Incapable of Error”

Do computers “make mistakes”? When the first electronic computers were made in the 1940s and 1950s, the big issue was whether the hardware in them was reliable. Did the electrical signals do what they were supposed to, or did they get disrupted, say because a moth (“bug”) flew inside the computer?

By the time mainframe computers were developed in the early 1960s, such hardware issues were pretty well under control. And so in some sense one could say (and marketing material did) that computers were “perfectly reliable”.

HAL reflects this sentiment in 2001. “The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”

From a modern point of view, saying this kind of thing seems absurd. After all, everyone knows that computer systems—or, more specifically, software systems—inevitably have bugs. But in 1968, bugs weren’t really understood.

After all, computers were supposed to be perfect, logical machines. And so, the thinking went, they must operate in a perfect way. And if anything went wrong, it must, as HAL says in the movie, “be attributable to human error”. Or, in other words, if the human were smart and careful enough, the computer would always “do the right thing”.

When Alan Turing did his original theoretical work in 1936 to show that universal computers could exist, he did it by writing what amounts to a program for his proposed universal Turing machine. And even in this very first program (which is only a page long), it turns out that there were already bugs.

But, OK, one might say, with enough effort, surely one can get rid of any possible bug. Well, here’s the problem: to do so requires effectively foreseeing every aspect of what one’s program could ever do. But in a sense, if one were able to do that, one almost doesn’t need the program in the first place.

And actually, pretty much any program that’s doing nontrivial things is likely to show what I call computational irreducibility, which implies that there’s no way to systematically shortcut what the program does. To find out what it does, there’s basically no choice but to run it and watch what it does. Sometimes this might be seen as a desirable feature—for example if one’s setting up a cryptocurrency that one wants to take irreducible effort to mine.

And, actually, if there isn’t computational irreducibility in a computation, then it’s a sign that the computation isn’t being done as efficiently as it could be.

What is a bug? One might define it as a program doing something one doesn’t want. So maybe we want the pattern shown below, created by a very simple program, to never die out. But the point is that there may be no way in anything less than an infinite time to answer the “halting problem” of whether it can in fact die out. So, in other words, figuring out if the program “has a bug” and does something one doesn’t want may be infinitely hard.

Cellular automaton
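
A minimal version of the experiment, using the Wolfram Language's built-in CellularAutomaton function (rule 30 is just one illustrative choice, not necessarily the rule pictured): all one can really do is run the system for some number of steps and check, which of course says nothing about what happens after that.

evolution = CellularAutomaton[30, {{1}, 0}, 200];                (* run rule 30 from a single black cell for 200 steps *)
Select[Range[Length[evolution]], Total[evolution[[#]]] == 0 &]   (* steps at which the pattern has completely died out: so far, none *)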

And of course we know that bugs are not just a theoretical problem; they exist in all large-scale practical software. And unless HAL only does things that are so simple that we foresee every aspect of them, it’s basically inevitable that HAL will exhibit bugs.

But maybe, one might think, HAL could at least be given some overall directives—like “be nice to humans”, or other potential principles of AI ethics. But here’s the problem: given any precise specification, it’s inevitable that there will be unintended consequences. One might say these are “bugs in the specification”, but the problem is they’re inevitable. When computational irreducibility is present, there’s basically never any finite specification that can avoid any conceivable “unintended consequence”.

Or, said in terms of 2001, it’s inevitable that HAL will be capable of exhibiting unexpected behavior. It’s just a consequence of being a system that does sophisticated computation. It lets HAL “show creativity” and “take initiative”. But it also means HAL’s behavior can’t ever be completely predicted.

The basic theoretical underpinnings to know this already existed in the 1950s or even earlier. But it took experience with actual complex computer systems in the 1970s and 1980s for intuition about bugs to develop. And it took my explorations of the computational universe in the 1980s and 1990s to make it clear how ubiquitous the phenomenon of computational irreducibility actually is, and how much it affects basically any sufficiently broad specification.

How Did They Get It Right?

It’s interesting to see what the makers of 2001 got wrong about the future, but it’s impressive how much they got right. So how did they do it? Well, between Stanley Kubrick and Arthur C. Clarke (and their “scientific consultant” Fred Ordway III), they solicited input from a fair fraction of the top technology companies of the day—and (though there’s nothing in the movie credits about them) received a surprising amount of detailed information about the plans and aspirations of these companies, along with quite a few designs custom-made for the movie as a kind of product placement.

In the very first space scene in the movie, for example, one sees an assortment of differently shaped spacecraft, that were based on concept designs from the likes of Boeing, Grumman and General Dynamics, as well as NASA. (In the movie, there are no aerospace manufacturer logos—and NASA also doesn’t get a mention; instead the assorted spacecraft carry the flags of various countries.)

But so where did the notion of having an intelligent computer come from? I don’t think it had an external source. I think it was just an idea that was very much “in the air” at the time. My late friend Marvin Minsky, who was one of the pioneers of AI in the 1960s, visited the set of 2001 during its filming. But Kubrick apparently didn’t ask him about AI; instead he asked about things like computer graphics, the naturalness of computer voices, and robotics. (Marvin claims to have suggested the configuration of arms that was used for the pods on the Jupiter spacecraft.)

But what about the details of HAL? Where did those come from? The answer is that they came from IBM.

IBM was at the time by far the world’s largest computer company, and it also conveniently happened to be headquartered in New York City, which is where Kubrick and Clarke were doing their work. IBM—as now—was always working on advanced concepts that they could demo. They worked on voice recognition. They worked on image recognition. They worked on computer chess. In fact, they worked on pretty much all the specific technical features of HAL shown in 2001. Many of these features are even shown in the “Information Machine” movie IBM made for the 1964 World’s Fair in New York City (though, curiously, that movie has a dynamic multi-window form of presentation that wasn’t adopted for HAL).

Marketing brochure

In 1964, IBM had proudly introduced their System/360 mainframe computers.

And the rhetoric about HAL having a flawless operational record could almost be out of IBM’s marketing material for the 360. And of course HAL was physically big—like a mainframe computer (actually even big enough that a person could go inside the computer). But there was one thing about HAL that was very non-IBM. Back then, IBM always strenuously avoided ever saying that computers could themselves be smart; they just emphasized that computers would do what people told them to. (Somewhat ironically, the internal slogan that IBM used for its employees was “Think”. It took until the 1980s for IBM to start talking about computers as smart—and for example in 1980 when my friend Greg Chaitin was advising the then-head of research at IBM he was told it was deliberate policy not to pursue AI, because IBM didn’t want its human customers to fear they might be replaced by AIs.)

An interesting letter from 1966 surfaced recently. In it, Kubrick asks one of his producers (a certain Roger Caras, who later became well known as a wildlife TV personality): “Does I.B.M. know that one of the main themes of the story is a psychotic computer?”. Kubrick is concerned that they will feel “swindled”. The producer writes back, talking about IBM as “the technical advisor for the computer”, and saying that IBM will be OK so long as they are “not associated with the equipment failure by name”.

But was HAL supposed to be an IBM computer? The IBM logo appears a couple of times in the movie, but not on HAL. Instead, HAL has a nameplate that looks like this:

HAL's nameplate

It’s certainly interesting that the blue is quite like IBM’s characteristic “big blue” blue. It’s also very curious that if you go one step forward in the alphabet from the letters H A L, you get I B M. Arthur C. Clarke always claimed this was a coincidence, and it probably was. But my guess is that at some point, that blue part of HAL’s nameplate was going to say “IBM”.
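
(In the Wolfram Language the shift is a one-liner:)

FromCharacterCode[ToCharacterCode["HAL"] + 1]   (* "IBM" *)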

Like some other companies, IBM was fond of naming its products with numbers. And it’s interesting to look at what numbers they used. In the 1960s, there were a lot of 3- and 4-digit numbers starting with 3’s and 7’s, including a whole 7000 series, etc. But, rather curiously, there was not a single one starting with 9: there was no IBM 9000 series. In fact, IBM didn’t have a single product whose name started with 9 until the 1990s. And I suspect that was due to HAL.

By the way, the IBM liaison for the movie was their head of PR, C. C. Hollister, who was interviewed in 1964 by the New York Times about why IBM—unlike its competitors—ran general advertising (think Super Bowl), given that only a thin stratum of corporate executives actually made purchasing decisions about computers. He responded that their ads were “designed to reach… the articulators or the 8 million to 10 million people that influence opinion on all levels of the nation’s life” (today one would say “opinion makers”, not “articulators”).

He then added “It is important that important people understand what a computer is and what it can do.” And in some sense, that’s what HAL did, though not in the way Hollister might have expected.

Predicting the Future

OK, so now we know—at least over the span of 50 years—what happened to the predictions from 2001, and in effect how science fiction did (or did not) turn into science fact. So what does this tell us about predictions we might make today?

In my observation things break into three basic categories. First, there are things people have been talking about for years, that will eventually happen—though it’s not clear when. Second, there are surprises that basically nobody expects, though sometimes in retrospect they may seem somewhat obvious. And third, there are things people talk about, but that potentially just won’t ever be possible in our universe, given how its physics works.

Something people have talked about for ages, that surely will eventually happen, is routine space travel. When 2001 was released, no humans had ever ventured beyond Earth orbit. But even by the very next year, they’d landed on the Moon. And 2001 made what might have seemed like a reasonable prediction that by the year 2001 people would routinely be traveling to the Moon, and would be able to get as far as Jupiter.

Now of course in reality this didn’t happen. But actually it probably could have, if it had been considered a sufficient priority. But there just wasn’t the motivation for it. Yes, space has always been more broadly popular than, say, ocean exploration. But it didn’t seem important enough to put the necessary resources into.

Will it ever happen? I think it’s basically a certainty. But will it take 5 years or 50? It’s very hard to tell—though based on recent developments I would guess about halfway between.

People have been talking about space travel for well over a hundred years. They’ve been talking about what’s now called AI for even longer. And, yes, at times there’ve been arguments about how some feature of human intelligence is so fundamentally special that AI will never capture it. But I think it’s pretty clear at this point that AI is on an inexorable path to reproduce any and all features of whatever we would call intelligence.

A more mundane example of what one might call “inexorable technology development” is videophones. Once one had phones and one had television, it was sort of inevitable that eventually one would have videophones. And, yes, there were prototypes in the 1960s. But for detailed reasons of computer and telecom capacity and cost, videophone technology didn’t really become broadly available for a few more decades. But it was basically inevitable that it eventually would.

In science fiction, basically ever since radio was invented, it was common to imagine that in the future everyone would be able to communicate through radio instantly. And, yes, it took the better part of a century. But eventually we got cellphones. And in time we got smartphones that could serve as magic maps, and magic mirrors, and much more.

An example that’s today still at an earlier stage in its development is virtual reality. I remember trying out early VR systems back in the 1980s. But they never really caught on then. I think it’s basically inevitable that they eventually will, though. Perhaps it will require having video that’s at the same quality level as human vision (as audio has now been for a couple of decades). And whether it’s exactly VR, or instead augmented reality, that eventually becomes widespread is not clear. But something like that surely will. Though exactly when is not clear.

There are endless examples one can cite. People have been talking about self-driving cars since at least the 1960s. And eventually they will exist. People have talked about flying cars for even longer. Maybe helicopters could have gone in this direction, but for detailed reasons of control and reliability that didn’t work out. Maybe modern drones will solve the problem. But again, eventually there will be flying cars. It’s just not clear exactly when.

Similarly, there will eventually be robotics everywhere. I have to say that this is something I’ve been hearing will “soon happen” for more than 50 years, and progress has been remarkably slow. But my guess is that once it’s finally figured out how to really do “general-purpose robotics”—like we can do general-purpose computation—things will advance very quickly.

And actually there’s a theme that’s very clear over the past 50+ years: what once required the creation of special devices is eventually possible by programming something that is general purpose. In other words, instead of relying on the structure of physical devices, one builds up capabilities using computation.

What is the end point of this? Basically it’s that eventually everything will be programmable right down to atomic scales. In other words, instead of specifically constructing computers, we’ll basically build everything “out of computers”. To me, this seems like an inevitable outcome. Though it happens to be one that hasn’t yet been much discussed, or, say, explored in science fiction.

Returning to more mundane examples, there are other things that will surely be possible one day, like drilling into the Earth’s mantle, or having cities under the ocean (both subjects of science fiction in the past—and there’s even an ad for a “Pan Am Underwater Hotel” visible on the space station in 2001). But whether these kinds of things will be considered worth doing is not so clear. Bringing back dinosaurs? It’ll surely be possible to get a good approximation to their DNA. How long all the necessary bioscience developments will take I don’t know, but one day one will surely be able to have a live stegosaurus again.

Perhaps one of the oldest “science fiction” ideas ever is immortality. And, yes, human lifespans have been increasing. But will there come a point where humans can for practical purposes be immortal? I am quite certain that there will. Quite whether the path will be primarily biological, or primarily digital, or some combination involving molecular-scale technology, I do not know. And quite what it will all mean, given the inevitable presence of an infinite number of possible bugs (today’s “medical conditions”), I am not sure. But I consider it a certainty that eventually the old idea of human immortality will become a reality. (Curiously, Kubrick—who was something of an enthusiast for things like cryonics—said in an interview in 1968 that one of the things he thought might have happened by the year 2001 is the “elimination of old age”.)

So what’s an example of something that won’t happen? There’s a lot we can’t be sure about without knowing the fundamental theory of physics. (And even given such a theory, computational irreducibility means it can be arbitrarily hard to work out the consequences for some particular issue.) But two decent candidates for things that won’t ever happen are Honey-I-Shrunk-the-Kids miniaturization and faster-than-light travel.

Well, at least these things don’t seem likely to happen the way they are typically portrayed in science fiction. But it’s still possible that things that are somehow functionally equivalent will happen. For example, it could perfectly well be possible to “scan an object” at an atomic scale, then “reinterpret it”, and build up, using molecular-scale construction, at least a very good approximation to it that happens to be much smaller.

What about faster-than-light travel? Well, maybe one will be able to deform spacetime enough that it’ll effectively be possible. Or conceivably one will be able to use quantum mechanics to effectively achieve it. But these kinds of solutions assume that what one cares about are things happening directly in our physical universe.

But imagine that in the future everyone has effectively been “uploaded” into some digital system—so that the “physics” one’s experiencing is instead something virtualized. And, yes, at the level of the underlying hardware maybe there will be restrictions based on the speed of light. But for purposes of the virtualized experience, there’ll be no such constraint. And, yes, in a setup like this, one can also imagine another science fiction favorite: time travel (notwithstanding its many philosophical issues).

OK, so what about surprises? If we look at the world today, compared to 50 years ago, it’s easy to identify some surprises. Computers are far more ubiquitous than almost anyone expected. And there are things like the web, and social media, that weren’t really imagined (even though perhaps in retrospect they seem “obvious”).

There’s another surprise, whose consequences are so far much less well understood, but that I’ve personally been very involved with: the fact that there’s so much complexity and richness to be found in the computational universe.

Almost by definition, “surprises” tend to occur when understanding what’s possible, or what makes sense, requires a change of thinking, or some kind of “paradigm shift”. Often in retrospect one imagines that such changes of thinking just occur—say in the mind of one particular person—out of the blue. But in reality what’s almost always going on is that there’s a progressive stack of understanding developed—which, perhaps quite suddenly, allows one to see something new.

And in this regard it’s interesting to reflect on the storyline of 2001. The first part of the movie shows an alien artifact—a black monolith—that appears in the world of our ape ancestors, and starts the process that leads to modern civilization. Maybe the monolith is supposed to communicate critical ideas to the apes by some kind of telepathic transmission.

But I like to have another interpretation. No ape 4 million years ago had ever seen a perfect black monolith, with a precise geometrical shape. But as soon as they saw one, they could tell that something they had never imagined was possible. And the result was that their worldview was forever changed. And—a bit like the emergence of modern science as a result of Galileo seeing the moons of Jupiter—that’s what allowed them to begin constructing what became modern civilization.

The Extraterrestrials

When I first saw 2001 fifty years ago, nobody knew whether there would turn out to be life on Mars. People didn’t expect large animals or anything. But lichens or microorganisms seemed, if anything, more likely than not.

With radio telescopes coming online, and humans just beginning to venture out into space, it also seemed quite likely that before long we’d find evidence of extraterrestrial intelligence. But in general people seemed neither particularly excited nor particularly concerned about this prospect. Yes, there would be mention of the time when a radio broadcast of H. G. Wells’s War of the Worlds story was thought to be a real alien invasion in New Jersey. But 20 or so years after the end of World War II, people were much more concerned about the ongoing Cold War, and what seemed like the real possibility that the world would imminently blow itself up in a giant nuclear conflagration.

The seed for what became 2001 was a rather nice 1951 short story by Arthur C. Clarke called “The Sentinel” about a mysterious pyramid discovered on the Moon, left there before life emerged on Earth, and finally broken open by humans using nuclear weapons, but found to have contents that were incomprehensible. Kubrick and Clarke worried that before 2001 was released, their story might have been overtaken by the actual discovery of extraterrestrial intelligence (and they even explored taking out insurance against this possibility).

But as it is, 2001 became basically the first serious movie exploration of what the discovery of extraterrestrial intelligence might be like. As I’ve recently discussed at length, deciding in the abstract whether or not something was really “produced by intelligence” is a philosophically deeply challenging problem. But at least in the world as it is today, we have a pretty good heuristic: things that look geometrically simpler (with straight edges, circles, etc.) are probably artifacts. Of course, at some level it’s a bit embarrassing that nature seems to quite effortlessly make things that look more complex than what we typically produce, even with all our engineering prowess. And, as I’ve argued elsewhere, as we learn to take advantage of more of the computational universe, this will no doubt change. But at least for now, the “if it’s geometrically simple, it’s probably an artifact” heuristic works quite well.

And in 2001 we see it in action—when the perfectly cuboidal black monolith appears on the 4-million-year-old Earth: it’s visually very obvious that it isn’t something that belongs, and that it’s something that was presumably deliberately constructed.

A little later in the movie, another black monolith is discovered on the Moon. It’s noticed because of what’s called in the movie the “Tycho Magnetic Anomaly” (“TMA-1”)—probably named by Kubrick and Clarke after the South Atlantic Anomaly associated with the Earth’s radiation belts, that was discovered in 1958. The magnetic anomaly could have been natural (“a magnetic rock”, as one of the characters says). But once it’s excavated and found to be a perfect black cuboidal monolith, extraterrestrial intelligence seems the only plausible origin.

As I’ve discussed elsewhere, it’s hard to even recognize intelligence that doesn’t have any historical or cultural connection to our own. And it’s essentially inevitable that this kind of alien intelligence will seem to us in many ways incomprehensible. (It’s a curious question, though, what would happen if the alien intelligence had already inserted itself into the distant past of our own history, as in 2001.)

Kubrick and Clarke at first assumed that they’d have to actually show extraterrestrials somewhere in the movie. And they worried about things like how many legs they might have. But in the end Kubrick decided that the only alien that had the degree of impact and mystery that he wanted was an alien one never actually saw.

And so, for the last 17% of 2001, after Dave Bowman goes through the “star gate” near Jupiter, one sees what was probably supposed to be purposefully incomprehensible—if aesthetically interesting. Are these scenes of the natural world elsewhere in the universe? Or are these artifacts created by some advanced civilization?

Stargate

We see some regular geometric structures, that read to us like artifacts. And we see what appear to be more fluid or organic forms, that do not. For just a few frames there are seven strange flashing octahedra.

Flashing octahedra

I’m pretty sure I never noticed these when I first saw 2001 fifty years ago. But in 1997, when I studied the movie in connection with HAL’s birthday, I’d been thinking for years about the origins of complexity, and about the differences between natural and artificial systems—so the octahedra jumped out at me (and, yes, I spent quite a while wrangling the LaserDisc version of 2001 I had back then to try to look at them more carefully).

I didn’t know what the octahedra were supposed to be. With their regular flashing, I at first assumed they were meant to be some kind of space beacons. But I’m told that actually they were supposed to be the extraterrestrials themselves, appearing in a little cameo. Apparently there’d been an earlier version of the script in which the octahedra wound up riding in a ticker tape parade in New York City—but I think the cameo was a better idea.

When Kubrick was interviewed about 2001, he gave an interesting theory for the extraterrestrials:  “They may have progressed from biological species, which are fragile shells for the mind at best, into immortal machine entities—and then, over innumerable eons, they could emerge from the chrysalis of matter transformed into beings of pure energy and spirit. Their potentialities would be limitless and their intelligence ungraspable by humans.”

It’s interesting to see Kubrick grappling with the idea that minds and intelligence don’t have to have physical form. Of course, in HAL he’d already in a sense imagined a “non-physical mind”. But back in the 1960s, with the idea of software only just emerging, there wasn’t yet a clear notion that computation could be something meaningful in its own right, independent of the particulars of its “hardware” implementation.

That universal computation was possible had arisen as an essentially mathematical idea in the 1930s. But did it have physical implications? In the 1980s I started talking about things like computational irreducibility, and about some of the deep connections between universal computation and physics. But back in the 1950s, people looked for much more direct implications of universal computation. And one of the notable ideas that emerged was of “universal constructors”—that would somehow be able to construct anything, just as universal computers could compute anything.

In 1952—as part of his attempt to “mathematicize” biology—John von Neumann wrote a book about “self-reproducing automata” in which he came up with what amounts to an extremely complicated 2D cellular automaton that can have a configuration that reproduces itself. And of course—as was discovered in 1953—it turns out to be correct that digital information, as encoded in DNA, is what specifies the construction of biological organisms.

But in a sense von Neumann’s efforts were based on the wrong intuition. For he assumed (as I did, before I saw evidence to the contrary) that to make something that has a sophisticated feature like self-reproduction, the thing itself must somehow be correspondingly complicated.

But as I discovered many years later by doing experiments in the computational universe of simple programs, it’s just not true that it takes a complicated system to show complicated behavior: even systems (like cellular automata) with some of the simplest imaginable rules can do it. And indeed, it’s perfectly possible to have systems with very simple rules that show self-reproduction—and in the end self-reproduction doesn’t seem like a terribly special feature at all (think computer code that copies itself, etc.).
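
As a minimal illustration of this point, a single short line of Wolfram Language code is enough to see the kind of complexity that a rule as simple as rule 30 produces (just an illustrative example):

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]   (* one of the simplest possible rules, producing highly complex behavior *)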

But back in the 1950s von Neumann and his followers didn’t know that. And given the enthusiasm for things to do with space, it was inevitable that the idea of “self-reproducing machines” would quickly find its way into notions of self-reproducing space probes (as well as self-reproducing lunar factories, etc.)

I’m not sure if these threads had come together by the time 2001 was made, but certainly by the time of the 2010 sequel, Arthur C. Clarke had decided that the black monoliths were self-reproducing machines. And in a scene reminiscent of the modern idea that AIs, when given the directive to make more paperclips, might turn everything (including humans) into paperclips, the 2010 movie includes black monoliths turning the entire planet of Jupiter into a giant collection of black monoliths.

What are the aliens trying to do in 2001? I think Kubrick recognized that their motivations would be difficult to map onto anything human. Why for example does Dave Bowman wind up in what looks like a Louis-XV-style hotel suite—that’s probably the most timeless human-created backdrop of the movie (except for the fact that in keeping with 1960s practices, there’s a bathtub but no shower in the suite)?

It’s interesting that 2001 contains both artificial and extraterrestrial intelligence. And it’s interesting that 50 years after 2001 was released, we’re getting more and more comfortable with the idea of artificial intelligence, yet we believe we’ve seen no evidence of extraterrestrial intelligence.

As I’ve argued extensively elsewhere, I think the great challenge of thinking about extraterrestrial intelligence is defining what we might mean by intelligence. It’s very easy for us humans to have the analog of a pre-Copernican view in which we assume that our intelligence and capabilities are somehow fundamentally special, just like the Earth used to be assumed to be at the center of the universe.

But what my Principle of Computational Equivalence suggests is that in fact we’ll never be able to define anything fundamentally special about our intelligence; what’s special about it is its particular history and connections. Does the weather “have a mind of its own”? Well, based on the Principle of Computational Equivalence I don’t think there’s anything fundamentally different about the computations it’s doing from the ones that go on in our brains.

And similarly, when we look out into the cosmos, it’s easy to see examples of sophisticated computation going on. Of course, we don’t think of the complex processes in a pulsar magnetosphere as “extraterrestrial intelligence”; we just think of them as something “natural”. In the past we might have argued that however complex such a process looks, it’s really somehow fundamentally simpler than human intelligence. But given the Principle of Computational Equivalence we know this isn’t true.

So why don’t we consider a pulsar magnetosphere to be an example of “intelligence”? Well, because in it we don’t recognize anything like our own history, or our own detailed behavior. And as a result, we don’t have a way to connect what it does with purposes that we humans understand.

The computational universe of all possible programs is full of sophisticated computations that aren’t aligned with any existing human purposes. But as we try to develop AI, what we’re effectively doing is mining that computational universe for programs that do things we want done.

Out there in the computational universe, though, there’s an infinite collection of “possible AIs”. And there’s nothing less capable about the ones that we don’t yet choose to use; we just don’t see how they align with things we want.

Artificial intelligence is in a sense the first example of alien intelligence that we’re seeing (yes, there are animals too, but it’s easier to connect with AI). We’re still at the very early stages of getting widespread intuition about AI. But as we understand more about what AI really can be, and how it relates to everything else in the computational universe, I think we’ll get a clearer perspective on the forms intelligence can take.

Will we find extraterrestrial intelligence? Well, in many respects I think we already have. It’s all around us in the universe—doing all kinds of sophisticated computations.

Will there ever be a dramatic moment, like in 2001, where we find extraterrestrial intelligence that’s aligned enough with our own intelligence that we can recognize the perfect black monoliths it makes—even if we can’t figure out their “purpose”? My current suspicion is that it’ll be more “push” than “pull”: instead of seeing something that we suddenly recognize, we’ll instead gradually generalize our notion of intelligence, until we start to be comfortable attributing it not just to ourselves and our AIs, but also to other things in the universe.

Personal Journey

When I first saw 2001 I don’t think I ever even calculated how old I’d be in the year 2001. I was always thinking about what the future might be like, but I didn’t internalize actually living through it. Back when I was 8 years old, in 1968, space was my greatest interest, and I made lots of little carefully stapled booklets, full of typewritten text and neatly drawn diagrams. I kept detailed notes on every space probe that was launched, and tried to come up with spacecraft (I wrote it “space-craft”) designs of my own.

What made me do this? Well, presaging quite a bit that I’ve done in my life, I did it just because I found it personally interesting. I never showed any of it to anyone, and never cared what anyone might think of it. And for nearly 50 years I’ve just had it all stored away. But looking at it again now, I found one unique example of something related to my interests that I did for school: a booklet charmingly titled “The Future”, written when I was 9 or 10 years old, and containing what’s to me now a cringingly embarrassing page of my predictions for the future of space exploration (complete with a nod to 2001):

The Future by Stephen Wolfram

Fortunately perhaps, I didn’t wait around to find out how wrong these predictions were, and within a couple of years my interest in space had transformed into interests in more foundational fields, first physics and then computation and the study of the computational universe. When I first started using computers around 1972, it was a story of paper tape and teleprinters—far from the flashing screens of 2001.

But I’ve been fortunate enough to live through a time when the computer technology of 2001 went from pure fiction to something close to fact. And I’ve been even more fortunate to have been able to contribute a bit to that.

I’ve often said—in a kind of homage to 2001—that my favorite personal aspiration is to build “alien artifacts”: things that are recognizable once they’re built, but which nobody particularly expected would exist or be possible. I like to think that Wolfram|Alpha is some kind of example—as is what the Wolfram Language has become. And in a sense so have my efforts been in exploring the computational universe.

I never interacted with Stanley Kubrick. But I did interact with Arthur C. Clarke, particularly when my big book A New Kind of Science was being published. (I like to think that the book is big in content, but it is definitely big in size, with 1280 pages, weighing nearly 6 pounds.) Arthur C. Clarke asked for a pre-publication copy, which I duly sent, and on March 1, 2002, I received an email from him saying that “A ruptured postman has just staggered away from my front door… Stay tuned…..”.

Then, three days later, I got another piece of mail: “Well, I have <looked> at (almost) every page and am still in a state of shock. Even with computers, I don’t see how you could have done it.”  Wow! I actually succeeded in making what seemed to Arthur C. Clarke like an alien artifact!

He offered me a back-cover quote for the book: “… Stephen’s magnum opus may be the book of the decade, if not the century. It’s so comprehensive that perhaps he should have called it ‘A New Kind of Universe’, and even those who skip the 1200 pages of (extremely lucid) text will find the computer-generated illustrations fascinating. My friend HAL is very sorry he hadn’t thought of them first…” (In the end Steve Jobs talked me out of having quotes on the book, though, saying “Isaac Newton didn’t have back-cover quotes; why do you want them?”)

It’s hard for me to believe it’s been 50 years since I first saw 2001. Not all of 2001 has come true (yet). But for me what was important was that it presented a vision of what might be possible—and an idea of how different the future might be. It helped me set the course of my life to try to define in whatever ways I can what the future will be. And not just waiting for aliens to deliver monoliths, but trying to build some “alien artifacts” myself.

Launching the Wolfram Challenges Site

Wolfram Challenges

The more one does computational thinking, the better one gets at it. And today we’re launching the Wolfram Challenges site to give everyone a source of bite-sized computational thinking challenges based on the Wolfram Language. Use them to learn. Use them to stay sharp. Use them to prove how great you are.

The Challenges typically have the form: “Write a function to do X”. But because we’re using the Wolfram Language—with all its built-in computational intelligence—it’s easy to make the X be remarkably sophisticated.

The site has a range of levels of Challenges. Some are good for beginners, while others will require serious effort even for experienced programmers and computational thinkers. Typically each Challenge has at least some known solution that’s at most a few lines of Wolfram Language code. But what are those lines of code?

There may be many different approaches to a particular Challenge, leading to very different kinds of code. Sometimes the code will be smaller, sometimes it will run faster, and so on. And for each Challenge, the site maintains a leaderboard that shows who has produced the smallest solution, the fastest solution, and so on, so far.

What does it take to be able to tackle Challenges on the site? If you’ve read my An Elementary Introduction to the Wolfram Language, for example, you should be well prepared—maybe with some additional help on occasion from the main Wolfram Language documentation. But even if you’re more of a beginner, you should still be able to do simpler Challenges, perhaps looking at parts of my book when you need to. (If you’re an experienced programmer, a good way to jump-start yourself is to look at the Fast Introduction for Programmers.)

How It Works

There are lots of different kinds of Challenges on the site. Each Challenge is tagged with topic areas. And on the front page there are a number of “tracks” that you can use as guides to sequences of related Challenges. Here are the current Challenges in the Real-World Data track:

Real-World Data Challenges

Click one you want to try—and you’ll get a webpage that explains the Challenge:

Antipode Above or Below Sea Level Challenge

Now you can choose either to download the Challenge notebook to the desktop, or just open it directly in your web browser in the Wolfram Cloud. (It’s free to use the Wolfram Cloud for this, though you’ll have to have a login—otherwise the system won’t be able to give you credit for the Challenges you’ve solved.)

Here’s the cloud version of this particular notebook:

Challenge cloud notebook

You can build up your solution in the Scratch Area, and try it out there. Then when you’re ready, put your code where it says “Enter your code here”. Then press Submit.

What Submit does is to send your solution to the Wolfram Cloud—where it’ll be tested to see if it’s correct. If it’s not correct, you’ll get something like this:

Error code

But if it’s correct, you’ll get this, and you’ll be able to go to the leaderboard and see how your solution compared to other people’s. You can submit the same Challenge as many times as you want. (By the way, you can pick your name and icon for the leaderboard from the Profile tab.)

Challenges leaderboard

The Range of Challenges

The range of Challenges on the site is broad both in terms of difficulty level and topic. (And, by the way, we’re planning to progressively grow the site, not least through material from outside contributors.)

Here’s an example of a simple Challenge, one that I can personally solve in a few seconds:

Butterflied Strings Challenge

Here’s a significantly more complicated Challenge, one that took me a solid 15 minutes to solve at all well:

Babbage Squares Challenge

Some of the Challenges are in a sense “pure algorithm challenges” that don’t depend on any outside data:

Maximal Contiguous Sum Challenge

Some of the Challenges are “real-world”, and make use of the Wolfram Knowledgebase:

Country Chains Challenge

And some of the Challenges are “math-y”, and make use of the math capabilities of the Wolfram Language:

Factorial Zeros Challenge

Count the Number of Squares Challenge

Pre-launch Experience

We’ve been planning to launch a site like Wolfram Challenges for years, but it’s only now, with the current state of the Wolfram Cloud, that we’ve been able to set it up as we have today—so that anyone can just open a web browser and start solving Challenges.

Still, we’ve had unannounced preliminary versions for about three years now—complete with a steadily growing number of Challenges. And in fact, a total of 270 people have discovered the preliminary version—and produced in all no fewer than 11,400 solutions. Some people have solved the same Challenge many times, coming up with progressively shorter or progressively faster solutions. Others have moved on to different Challenges.

It’s interesting to see how diverse the solutions to even a single Challenge can be. Here are word clouds of the functions used in solutions to three different Challenges:

Functions used in Wolfram Challenges

And when it comes to lengths of solutions (here in characters of code), there can be quite a variation for a particular Challenge:

Length of solutions in Wolfram Challenges

Here’s the distribution of solution lengths for all solutions submitted during the pre-launch period, for all Challenges:

Solution lengths for submitted solutions

It’s not clear what kind of distribution this is (though it seems close to lognormal). But what’s really nice is how concentrated it is on solutions that aren’t much more than a line long. (81% of them would even fit in a 280-character tweet!)

And in fact what we’re seeing can be viewed as a great tribute to the Wolfram Language. In any other programming language most Challenges—if one could do them at all—would take pages of code. But in the Wolfram Language even sophisticated Challenges can often be solved with just tweet-length amounts of code.

Why is this? Well, basically it’s because the Wolfram Language is a different kind of language: it’s a knowledge-based language where lots of knowledge about computation and other things is built right into the language (thanks to 30+ years of hard work on our part).

But then are the Challenges still “real”? Of course! It’s just that the Wolfram Language lets one operate at a higher level. One doesn’t have to worry about writing out the low-level mechanics of how even sophisticated operations get implemented—one can just concentrate on the pure high-level computational thinking of how to get the Challenge done.

Under the Hood

OK, so what have been some of the challenges in setting up the Wolfram Challenges site? Probably the most important is how to check whether a particular solution is correct. After all, we’re not just asking to compute some single result (say, 42) that we can readily compare against. We’re asking to create a function that can take a perhaps infinite set of possible arguments, and in each case give the correct result.

So how can we know if the function is correct? In some simple cases, we can actually see if the code of the function can be transformed in a meaning-preserving way into code that we already know is correct. But most of the time—like in most practical software quality assurance—the best thing to do is just to try test cases. Some will be deterministically chosen—say based on checking simple or corner cases. Others can be probabilistically generated.
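
Here’s a rough sketch of the idea in Wolfram Language code (purely illustrative; the actual system is considerably more elaborate), comparing a submitted function against a reference implementation on a mix of corner cases and random inputs:

testInputs = Join[{0, 1, -1, 2}, RandomInteger[{-10^6, 10^6}, 50]];   (* corner cases plus probabilistically generated cases *)
checkSolution[submitted_, reference_] := AllTrue[testInputs, submitted[#] === reference[#] &]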

But in the end, if we find that the function isn’t correct, we want to give the user a simple case that demonstrates this. Often in practice we may first see failure in some fairly complicated case—but then the system tries to simplify the failure as much as possible.

OK, so another issue is: how does one tell whether a particular value of a function is correct? If the value is just something like an integer (say, 343) or a string (say, “hi”), then it’s easy. But what if it’s an approximate number (say, 3.141592…)? Well, then we have to start worrying about numerical precision. And what if it’s a mathematical expression (say, 1 + 1/x)? What transformations should we allow on the expression?

There are many other cases too. If it’s a network, we’ll probably want to say it’s correct if it’s isomorphic to what we expect (i.e. the same up to relabeling nodes). If it’s a graphic, we’ll probably want to say it’s correct if it visually looks the same as we expected, or at least is close enough. And if we’re dealing with real-world data, then we have to make sure to recompute our expected result, to take account of data in our knowledgebase that’s changed because of changes out there in the real world.
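
In Wolfram Language terms, one can imagine a comparison function that dispatches on the kind of value involved (again just a sketch of the idea, with a hypothetical numerical tolerance):

matchQ[got_?NumericQ, expected_?NumericQ] := Abs[got - expected] < 10^-6        (* allow for numerical precision *)
matchQ[got_Graph, expected_Graph] := IsomorphicGraphQ[got, expected]            (* networks: same up to relabeling of nodes *)
matchQ[got_, expected_] := TrueQ[Simplify[got == expected]] || got === expected  (* symbolic equivalence or exact equality *)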

Alright, so let’s say we’ve concluded that a particular function is correct. Well now, to fill in the leaderboard, we have to make some measurements on it. First, how long is the code?

We can just format the code in InputForm, then count characters. That gives us one measure. One can also apply ByteCount to just count bytes in the definition of the function. Or we can apply LeafCount, to count the number of leaves in the expression tree for the definition. The leaderboard separately tracks the values for all these measures of “code size”.
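
For a concrete (hypothetical) solution, those measures can be computed like this:

f[n_] := Total[Range[n]^2]                           (* a hypothetical submitted solution *)
StringLength[ToString[DownValues[f], InputForm]]     (* characters in the definition *)
ByteCount[DownValues[f]]                             (* bytes in the definition *)
LeafCount[DownValues[f]]                             (* leaves in the expression tree *)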

OK, so how about the speed of the code? Well, that’s a bit tricky. First because speed isn’t something abstract like “total number of operations on a Turing machine”—it’s actual speed running on a computer. And so it has to be normalized for the speed of the computer hardware. Then it has to somehow discard idiosyncrasies (say associated with caching) seen in particular test runs, as achieved by RepeatedTiming. Oh, and even more basically, it has to decide which instances of the function to test, and how to average them. (And it has to make sure that it won’t waste too much time chasing an incredibly slow solution.)
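
At the level of a single candidate function, the raw timings might come from something like this (a simplified sketch; the real “speed score” involves many more normalizations):

speedEstimate[fun_, inputs_List] := Mean[First[RepeatedTiming[fun[#]]] & /@ inputs]   (* average RepeatedTiming over several test inputs *)
speedEstimate[Total[Range[#]^2] &, RandomInteger[{1, 10^5}, 5]]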

Well, to actually do all these things, one has to make a whole sequence of specific decisions. And in the end what we’ve done is to package everything up into a single “speed score” that we report in the leaderboard.

A final metric in the leaderboard is “memory efficiency”. Like “speed score”, this is derived in a somewhat complicated way from actual test runs of the function. But the point is that within narrow margins, the results should be repeatable between identical solutions. (And, yes, the speed and memory leaderboards might change when they’re run in a new version of the Wolfram Language, with different optimizations.)

Backstory

We first started testing what’s now the Wolfram Challenges site at the Wolfram Summer School in 2016—and it was rapidly clear that many people found the kinds of Challenges we’d developed quite engaging. At first we weren’t sure how long—and perhaps whimsical—to make the Challenges. We experimented with having whole “stories” in each Challenge (like some math competitions and things like Project Euler do). But pretty soon we decided to restrict Challenges to be fairly short to state—albeit sometimes giving them slightly whimsical names.

We tested our Challenges again at the 2017 Wolfram Summer School, as well as at the Wolfram High School Summer Camp—and we discovered that the Challenges were addictive enough that some people systematically went through trying to solve all of them.

We were initially not sure what forms of Challenges to allow. But after a while we made the choice to (at least initially) concentrate on “write a function to do X”, rather than, for example, just “compute X”. Our basic reason was that we wanted the solutions to the Challenges to be more open-ended.

If the challenge is “compute X”, then there’s typically just one final answer, and once you have it, you have it. But with “write a function to do X”, there’s always a different function to write—that might be faster, smaller, or just different. At a practical level, with “compute X” it’s easier to “spoil the fun” by having answers posted on the web. With “write a function”, yes, there could be one version of code for a function posted somewhere, but there’ll always be other versions to write—and if you always submit versions that have been seen before, it’ll soon be pretty clear that you must just have copied them from somewhere.

As it turns out, we’ve actually had quite a bit of experience with the “compute X” format. Because in my book An Elementary Introduction to the Wolfram Language all 655 exercises are basically of the form “write code to compute X”. And in the online version of the book, all these exercises are automatically graded.

Automatic grading

Now, if we were just doing “cheap” automatic grading, we’d simply look to see if the code produces the correct result when it runs. But that doesn’t actually check the code. After all, if the answer was supposed to be 42, someone could just give 42 (or maybe 41+1) as the “code”.

Our actual automatic grading system is much more sophisticated. It certainly looks at what comes out when the code runs (being careful not to blindly evaluate Quit in a piece of code—and taking account of things like random numbers or graphics or numerical precision). But the real meat of the system is the analysis of the code itself, and the things that happen when it runs.

Because the Wolfram Language is symbolic, “code” is the same kind of thing as “data”. And the automatic grading system makes extensive use of this—not least in applying sequences of symbolic code transformations to determine whether a particular piece of code that’s been entered is equivalent to one that’s known to represent an appropriate solution. (The system has ways to handle “completely novel” code structures too.)
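
Here’s a toy illustration of the “code is data” point (not the actual grading code): a piece of code can be held unevaluated, measured, and symbolically rewritten:

code = Hold[Total[Table[i^2, {i, 1, n}]]];    (* code held as a symbolic expression *)
LeafCount[code]                               (* structural size of the code itself *)
code /. Total[list_] :> Plus @@ list          (* a simple meaning-preserving rewrite *)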

Code equivalence is a difficult (in fact, in general, undecidable) problem. A slightly easier problem (though still in general undecidable) is equivalence of mathematical expressions. And a place where we’ve used this kind of equivalence extensively is in our Wolfram Problem Generator:

Wolfram Problem Generator

Of course, exactly what equivalence we want to allow may depend on the kind of problem we’re generating. Usually we’ll want 1+x and x+1 to be considered equivalent. But whether (1+x)/x should be considered equivalent to 1+1/x may depend on the context. It’s not easy to get these things right (and many online grading systems do horribly at it). But by using some of the sophisticated math and symbolic transformation capabilities available in the Wolfram Language, we’ve managed to make this work well in Wolfram Problem Generator.
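
In the Wolfram Language, testing that kind of mathematical equivalence can be as simple as this (a basic version of the idea):

Simplify[(1 + x)/x == 1 + 1/x]   (* True: mathematically equivalent (for nonzero x) *)
(1 + x)/x === 1 + 1/x            (* False: structurally different expressions *)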

Contribute New Challenges!

The Wolfram Challenges site as it exists today is only the beginning. We intend it to grow. And the best way for it to grow—like our long-running Wolfram Demonstrations Project—is for people to contribute great new Challenges for us to include.

At the bottom of the Wolfram Challenges home page you can download the Challenges Authoring Notebook:

Challenges Authoring Notebook

Fill this out, press “Submit Challenge”—and off this will go to us for review.

Beyond Challenges

I’m not surprised that Wolfram Challenges seem to appeal to people who like solving math puzzles, crosswords, brain teasers, sudoku and the like. I’m also not surprised that they appeal to people who like gaming and coding competitions. But personally—for better or worse—I don’t happen to fit into any of these categories. And in fact when we were first considering creating Wolfram Challenges I said “yes, lots of people will like it, but I won’t be one of them”.

Well, I have to say I was wrong about myself. Because actually I really like doing these Challenges—and I’m finding I have to avoid getting started on them because I’ll just keep doing them (and, yes, I’m a finisher, so there’s a risk I could just keep going until I’ve done them all, which would be a very serious investment of time).

So what’s different about these Challenges? I think the answer for me is that they feel much more real. Yes, they’ve been made up to be Challenges. But the kind of thinking that’s needed to solve them is essentially just the same as the kind of thinking I end up doing all the time in “real settings”. So when I work on these Challenges, I don’t feel like I’m “just doing something recreational”; I feel like I’m honing my skills for real things.

Now I readily recognize that not everyone’s motivation structure is the same—and many people will like doing these Challenges as true recreations. But I think it’s great that Challenges can also help build real skills. And of course, if one sees that someone has done lots of these Challenges, it shows that they have some real skills. (And, yes, we’re starting to use Challenges as a way to assess applicants, say, for our summer programs.)

It’s worth saying there are some other nice “potentially recreational” uses of the Wolfram Language too.

One example is competitive livecoding. The Wolfram Language is basically unique in being a language in which interesting programs can be written fast enough that it’s fun to watch. Over the years, I’ve done large amounts of (non-competitive) livecoding—both in person and livestreamed. But in the past couple of years we’ve been developing the notion of competitive livecoding as a kind of new sport.

Wolfram Technology Conference

We’ve done some trial runs at our Wolfram Technology Conference—and we’re working towards having robust rules and procedures. In what we’ve done so far, the typical challenges have been of the “compute X” form—and people have taken between a few seconds and perhaps ten minutes to complete them. We’ve used what’s now our Wolfram Chat functionality to distribute Challenges and let contestants submit solutions. And we’ve used automated testing methods—together with human “refereeing”—to judge the competitions.

A different kind of recreational application of the Wolfram Language is our Tweet-a-Program service, released in 2014. The idea here is to write Wolfram Language programs that are short enough to fit in a tweet (and when we launched Tweet-a-Program that meant just 128 characters)—and to make them produce output that is as interesting as possible:

Tweet-a-Program output

We’ve also had a live analog of this at our Wolfram Technology Conference for some time: our annual One-Liner Competition. And I have to say that even though I (presumably) know the Wolfram Language well, I’m always amazed at what people actually manage to do with just a single line of Wolfram Language code.

At our most recent Wolfram Technology Conference, in recognition of our advances in machine learning, we decided to also do a “Machine-Learning Art Competition”—to make the most interesting possible restyled “Wolfie”:

Wolfie submissions

In the future, we’re planning to do machine learning challenges as part of Wolfram Challenges too. In fact, there are several categories of Challenges we expect to add. We’ve already got Challenges that make use of the Wolfram Knowledgebase, and the built-in data it contains. But we’re also planning to add Challenges that use external data from the Wolfram Data Repository. And we want to add Challenges that involve creating things like neural networks.

There’s a new issue that arises here—and that’s actually associated with a large category of possible Challenges. Because with most uses of things like neural networks, one no longer expects to produce a function that definitively “gets the right answer”. Instead, one just wants a function that does the best possible job on a particular task.

There are plenty of examples of Challenges one can imagine that involve finding “the lowest-cost solution”, or the “best fit”. And it’s a similar setup with typical machine learning tasks: find a function (say based on a neural network) that performs best on classifying a certain test set, etc.

And, yes, the basic structure of Wolfram Challenges is well set up to handle a situation like this. It’s just that instead of definitively telling you that you’ve got a correct solution for a particular Challenge, it’ll just tell you how your solution ranks relative to others on the leaderboard.

The Challenges in the Wolfram Challenges site always have very well-defined end goals. But one of the great things about the Wolfram Language is how easy it is to use it to explore and create in an open-ended way. And as a kind of analog of Challenges, one can always give seeds for this. One example is the Go Further sections of the Explorations in Wolfram Programming Lab. Another is the many kinds of project suggestions we make for things like our summer programs.

What is the right output for an open-ended exploration? I think a good answer in many cases is a computational essay, written in a Wolfram Notebook, and “telling a story” with a mixture of ordinary text and Wolfram Language code. Of course, unlike Challenges, where one’s doing something that’s intended to be checked and analyzed by machine, computational essays are fundamentally about communicating with humans—and don’t have right or wrong “answers”.

The Path Forward

One of my overarching goals in creating the Wolfram Language has been to bring computational knowledge and computational thinking to as many people as possible. And the launch of the Wolfram Challenges site is the latest step in the long journey of doing this.

It’s a great way to engage with programming and computational thinking. And it’s set up to always let you know how you’re getting on. Did you solve that Challenge? How did you do relative to other people who’ve also solved the Challenge?

I’m looking forward to seeing just how small and efficient people can make the solutions to these Challenges. (And, yes, large numbers of equivalent solutions provide great raw material for doing machine learning on program transformations and optimization.)

Who will be the leaders on the leaderboards of Wolfram Challenges? I think it’ll be a wide range of people—with different backgrounds and education. Some will be young; some will be old. Some will be from the most tech-rich parts of the world; some, I hope, will be from tech-poor areas. Some will already be energetic contributors to the Wolfram Language community; others, I hope, will come to the Wolfram Language through Challenges—and perhaps even be “discovered” as talented programmers and computational thinkers this way.

But most of all, I hope lots of people get lots of enjoyment and fulfillment out of Wolfram Challenges—and get a chance to experience that thrill that comes with figuring out a particularly clever and powerful solution that you can then see run on your computer.
