On June 23 we celebrate the 30th anniversary of the launch of Mathematica. Most software from 30 years ago is now long gone. But not Mathematica. In fact, it feels in many ways like even after 30 years, we’re really just getting started. Our mission has always been a big one: to make the world as computable as possible, and to add a layer of computational intelligence to everything.
Our first big application area was math (hence the name “Mathematica”). And we’ve kept pushing the frontiers of what’s possible with math. But over the past 30 years, we’ve been able to build on the framework that we defined in Mathematica 1.0 to create the whole edifice of computational capabilities that we now call the Wolfram Language—and that corresponds to Mathematica as it is today.
From when I first began to design Mathematica, my goal was to create a system that would stand the test of time, and would provide the foundation to fill out my vision for the future of computation. It’s exciting to see how well it’s all worked out. My original core concepts of language design continue to infuse everything we do. And over the years we’ve been able to just keep building and building on what’s already there, to create a taller and taller tower of carefully integrated capabilities.
It’s fun today to launch Mathematica 1.0 on an old computer, and compare it with today:
Yes, even in Version 1, there’s a recognizable Wolfram Notebook to be seen. But what about the Mathematica code (or, as we would call it today, Wolfram Language code)? Well, the code that ran in 1988 just runs today, exactly the same! And, actually, I routinely take code I wrote at any time over the past 30 years and just run it.
Of course, it’s taken a lot of long-term discipline in language design to make this work. And without the strength and clarity of the original design it would never have been possible. But it’s nice to see that all that daily effort I’ve put into leadership and consistent language design has paid off so well in long-term stability over the course of 30 years.
There were 551 built-in functions in 1988; there are now more than 5100. And the expectations for each function have vastly increased too. The concept of “superfunctions” that automate a swath of algorithmic capability already existed in 1988—but their capabilities pale in comparison to our modern superfunctions.
Back in 1988 the core ideas of symbolic expressions and symbolic programming were already there, working essentially as they do today. And there were also all sorts of functions related to mathematical computation, as well as to things like basic visualization. But in subsequent years we were able to conquer area after area.
Partly it’s been the growth of raw computer power that’s made new areas possible. And partly it’s been our ability to understand what could conceivably be done. But the most important thing has been that—through the integrated design of our system—we’ve been able to progressively build on what we’ve already done to reach one new area after another, at an accelerating pace. (Here’s a plot of function count by version.)
I recently found a to-do list I wrote in 1991—and I’m happy to say that now, in 2018, essentially everything on it has been successfully completed. But in many cases it took building a whole tower of capabilities—over a large number of years—to be able to achieve what I wanted.
From the very beginning—and even from projects of mine that preceded Mathematica—I had the goal of building as much knowledge as possible into the system. At the beginning the knowledge was mostly algorithmic, and formal. But as soon as we could routinely expect network connectivity to central servers, we started building in earnest what’s now our immense knowledgebase of computable data about the real world.
Back in 1988, I could document pretty much everything about Mathematica in the 750-page book I wrote. Today if we were to print out the online documentation it would take perhaps 36,000 pages. The core concepts of the system remain as simple and clear as they ever were, though—so it’s still perfectly possible to capture them even in a small book.
How the World Has Changed
Thirty years is basically half the complete history of modern digital computing. And it’s remarkable—and very satisfying—that Mathematica and the Wolfram Language have had the strength not only to persist, but to retain their whole form and structure, across all that time.
Thirty years ago Mathematica (all 2.2 megabytes of it) came in boxes available at “neighborhood software stores”, and was distributed on collections of floppy disks (or, for larger computers, on various kinds of magnetic tapes). Today one just downloads it anytime (about 4 gigabytes), accessing its knowledgebase (many terabytes) online—or one just runs the whole system directly in the Wolfram Cloud, through a web browser. (In a curious footnote to history, the web was actually invented back in 1989 on a collection of NeXT computers that had been bought to run Mathematica.)
Thirty years ago there were “workstation class computers” that ran Mathematica, but were pretty much only owned by institutions. In 1988, PCs used MS-DOS, and were limited to 640K of working memory—which wasn’t enough to run Mathematica. The Mac could run Mathematica, but it was always a tight fit (“2.5 megabytes of memory required; 4 megabytes recommended”)—and in the footer of every notebook was a memory gauge that showed you how close you were to running out of memory. Oh, yes, and there were two versions of Mathematica, depending on whether or not your machine had a “numeric coprocessor” (which let it do floating-point arithmetic in hardware rather than in software).
Back in 1988, I had got my first cellphone—which was the size of a shoe. And the idea that something like Mathematica could “run on a phone” would have seemed preposterous. But here we are today with the Wolfram Cloud app on phones, and Wolfram Player running natively on iPads (and, yes, they don’t have virtual memory, so our tradition of tight memory management from back in the old days comes in very handy).
In 1988, computers that ran Mathematica were always things you plugged into a power outlet to use. And the notion of, for example, using Mathematica on a plane was basically inconceivable (well, OK, even in 1981 when I lugged my Osborne 1 computer running CP/M onto a plane, I did find one power outlet for it at the very back of a 747). It wasn’t until 1991 that I first proudly held up at a talk a Compaq laptop that was (creakily) running Mathematica off batteries—and it wasn’t routine to run Mathematica portably for perhaps another decade.
For years I used to use 1989^1989 as my test computation when I tried Mathematica on a new machine. And in 1989 I would usually be counting the seconds waiting for the computation to be finished. (1988^1988 was usually too slow to be useful back in 1988: it could take minutes to return.) Today, of course, the same computation is instantaneous. (Actually, a few years ago, I did the computation again on the first Raspberry Pi computer—and it again took several seconds. But that was a $25 computer. And now even it runs the computation very fast.)
The increase in computer speed over the years has had not only quantitative but also qualitative effects on what we’ve been able to do. Back in 1988 one basically did a computation and then looked at the result. We talked about being able to interact with a Mathematica computation in real time (and there was actually a demo on the NeXT computer that did a simple case of this even in 1989). But it basically took 18 years before computers were routinely fast enough that we could implement Manipulate and Dynamic—with “Mathematica in the loop”.
I considered graphics and visualization an important feature of Mathematica from the very beginning. Back then there were “paint” (bitmap) programs, and there were “draw” (vector) programs. We made the decision to use the then-new PostScript language to represent all our graphics output resolution-independently.
We had all sorts of computational geometry challenges (think of all those little shattered polygons), but even back in 1988 we were able to generate resolution-independent 3D graphics, and in preparing for the original launch of Mathematica we found the “most complicated 3D graphic we could easily generate”, and ended up with the original icosahedral “spikey”—which has evolved today into our rhombic hexecontahedron logo:
In a sign of a bygone software era, the original Spikey also graced the elegant, but whimsical, Mathematica startup screen on the Mac:
Back in 1988, there were command-line interfaces (like the Unix shell), and there were word processors (like WordPerfect). But it was a new idea to have “notebooks” (as we called them) that mixed text, input and output—as well as graphics, which more usually were generated in a separate window or even on a separate screen.
Even in Mathematica 1.0, many of the familiar features of today’s Wolfram Notebooks were already present: cells, cell groups, style mechanisms, and more. There was even the same doubled-cell-bracket evaluation indicator—though in those days longer rendering times meant there needed to be more “entertainment”, which Mathematica provided in the form of a bouncing-string-figure wait cursor that was computed in real time during the vertical retrace interrupt associated with refreshing the CRT display.
In what would now be standard good software architecture, Mathematica from the very beginning was always divided into two parts: a kernel doing computations, and a front end supporting the notebook interface. The two parts communicated through the MathLink protocol (still used today, but now called WSTP) that in a very modern way basically sent symbolic expressions back and forth.
Back in 1988—with computers like Macs straining to run Mathematica—it was common to run the front end on a local desktop machine, and then have a “remote kernel” on a heftier machine. Sometimes that machine would be connected through Ethernet, or rarely through the internet. More often one would use a dialup connection, and, yes, there was a whole mechanism in Version 1.0 to support modems and phone dialing.
When we first built the notebook front end, we thought of it as a fairly thin wrapper around the kernel—that we’d be able to “dash off” for the different user interfaces of different computer systems. We built the front end first for the Mac, then (partly in parallel) for the NeXT. Within a couple of years we’d built separate codebases for the then-new Microsoft Windows, and for X Windows.
But as we polished the notebook front end it became more and more sophisticated. And so it was a great relief in 1996 when we managed to create a merged codebase that ran on all platforms.
And for more than 15 years this was how things worked. But then along came the cloud, and mobile. And now, out of necessity, we again have multiple notebook front end codebases. Maybe in a few years we’ll be able to merge them again. But it’s funny how the same issues keep cycling around as the decades go by.
Unlike the front end, we designed the kernel from the beginning to be as robustly portable as possible. And over the years it’s been ported to an amazing range of computers—very often as the first serious piece of application software that a new kind of computer runs.
From the earliest days of Mathematica development, there was always a raw command-line interface to the kernel. And it’s still there today. And what’s amazing to me is how often—in some new and unfamiliar situation—it’s really nice to have that raw interface available. Back in 1988, it could even make graphics—as ASCII art—but that’s not exactly in so much demand today. But still, the raw kernel interface is what for example wolframscript uses to provide programmatic access to the Wolfram Language.
Software Archaeology
There’s much of the earlier history of computing that’s disappearing. And it’s not so easy in practice to still run Mathematica 1.0. But after going through a few early Macs, I finally found one that still seemed to run well enough. We loaded up Mathematica 1.0 from its distribution floppies, and yes, it launched! (I guess the distribution floppies were made the week before the actual release on June 23, 1988; I vaguely remember a scramble to get the final disks copied.)
Needless to say, when I wanted to livestream this, the Mac stopped working, showing only a strange zebra pattern on its screen. Whacking the side of the computer (a typical 1980s remedy) didn’t do anything. But just as I was about to give up, the machine suddenly came to life, and there I was, about to run Mathematica 1.0 again.
I tried all sorts of things, creating a fairly long notebook. But then I wondered: just how compatible is this? So I saved the notebook on a floppy, and put it in a floppy drive (yes, you can still get those) on a modern computer. At first, the modern operating system didn’t know what to do with the notebook file.
But then I added our old “.ma” file extension, and opened it. And… oh my gosh… it just worked! The latest version of the Wolfram Language successfully read the 1988 notebook file format, and rendered the live notebook (and also created a nice, modern “.nb” version):
There’s a bit of funny spacing around the graphics, reflecting the old way that graphics had to be handled back in 1988. But if one just selects the cells in the notebook, and presses Shift + Enter, up comes a completely modern version, now with color outputs too!
The Path Ahead
Before Mathematica, sophisticated technical computing was at best the purview of a small “priesthood” of technical computing experts. But as soon as Mathematica appeared on the scene, this all changed—and suddenly a typical working scientist or mathematician could realistically expect to do serious computation with their own hands (and then to save or publish the results in notebooks).
Over the past 30 years, we’ve worked very hard to open progressively more areas to immediate computation. Often there’s great technical sophistication inside. But our goal is to be able to let people translate high-level computational thinking as directly and automatically as possible into actual computations.
The result has been incredibly powerful. And it’s a source of great satisfaction to see how much has been invented and discovered with Mathematica over the years—and how many of the world’s most productive innovators use Mathematica and the Wolfram Language.
But amazingly, even after all these years, I think the greatest strengths of Mathematica and the Wolfram Language are only just now beginning to become broadly evident.
Part of it has to do with the emerging realization of how important it is to systematically and coherently build knowledge into a system. And, yes, the Wolfram Language has been unique in all these years in doing this. And what this now means is that we have a huge tower of computational intelligence that can be immediately applied to anything.
To be fair, for many of the past 30 years, Mathematica and the Wolfram Language were primarily deployed as desktop software. But particularly with the increasing sophistication of the general computing ecosystem, we’ve been able in the past 5–10 years to build out extremely strong deployment channels that have now allowed Mathematica and the Wolfram Language to be used in an increasing range of important enterprise settings.
Mathematica and the Wolfram Language have long been standards in research, education and fields like quantitative finance. But now they’re in a position to bring the tower of computational intelligence that they embody to any area where computation is used.
Since the very beginning of Mathematica, we’ve been involved with what’s now called artificial intelligence (and in recent times we’ve been leaders in supporting modern machine learning). We’ve also been very deeply involved with data in all forms, and with what’s now called data science.
But what’s becoming clearer only now is just how critical the breadth of Mathematica and the Wolfram Language is to allowing data science and artificial intelligence to achieve their potential. And of course it’s satisfying to see that all those capabilities that we’ve built over the past 30 years—and all the design coherence that we’ve worked so hard to maintain—are now so important in areas like these.
The concept of computation is surely the single most important intellectual development of the past century. And it’s been my goal with Mathematica and the Wolfram Language to provide the best possible vehicle to infuse high-level computation into every conceivable domain.
For pretty much every field X (from art to zoology) there either is now, or soon will be, a “computational X” that defines the future of the field by using the paradigm of computation. And it’s exciting to see how much the unique features of the Wolfram Language are allowing it to help drive this process, and become the “language of computational X”.
Traditional non-knowledge-based computer languages are fundamentally set up as a way to tell computers what to do—typically at a fairly low level. But one of the aspects of the Wolfram Language that’s only now beginning to be recognized is that it’s not just intended to be for telling computers what to do; it’s intended to be a true computational communication language, that provides a way of expressing computational thinking that’s meaningful both to computers and to humans.
In the past, it was basically just computers that were supposed to “read code”. But like a vast generalization of the idea of mathematical notation, the goal with the Wolfram Language is to have something that humans can readily read, and use to represent and understand computational ideas.
Combining this with the idea of notebooks brings us the notion of computational essays—which I think are destined to become a key communication tool for the future, uniquely made possible by the Wolfram Language, with its 30-year history.
Thirty years ago it was exciting to see so many scientists and mathematicians “discover computers” through Mathematica. Today it’s exciting to see so many new areas of “computational X” being opened up. But it’s also exciting to see that—with the level of automation we’ve achieved in the Wolfram Language—we’ve managed to bring sophisticated computation to the point where it’s accessible to essentially anyone. And it’s been particularly satisfying to see all sorts of kids—at middle-school level or even below—start to get fluent in the Wolfram Language and the high-level computational ideas it provides access to.
If one looks at the history of computing, it’s in many ways a story of successive layers of capability being added, and becoming ubiquitous. First came the early languages. Then operating systems. Later, around the time Mathematica came on the scene, user interfaces began to become ubiquitous. A little later came networking and then large-scale interconnected systems like the web and the cloud.
But now what the Wolfram Language provides is a new layer: a layer of computational intelligence—that makes it possible to take for granted a high level of built-in knowledge about computation and about the world, and an ability to automate its application.
Over the past 30 years many people have used Mathematica and the Wolfram Language, and many more have been exposed to their capabilities, through systems like Wolfram|Alpha built with them. But what’s possible now is to let the Wolfram Language provide a truly ubiquitous layer of computational intelligence across the computing world. It’s taken decades to build a tower of technology and capabilities that I believe are worthy of this—but now we are there, and it’s time to make this happen.
But the story of Mathematica and the Wolfram Language is not just a story of technology. It’s also a story of the remarkable community of individuals who’ve chosen to make Mathematica and the Wolfram Language part of their work and lives. And now, as we go forward to realize the potential for the Wolfram Language in the world of the future, we need this community to help explain and implement the paradigm that the Wolfram Language defines.
Needless to say, injecting new paradigms into the world is never easy. But doing so is ultimately what moves forward our civilization, and defines the trajectory of history. And today we’re at a remarkable moment in the ability to bring ubiquitous computational intelligence to the world.
But for me, as I look back at the 30 years since Mathematica was launched, I am thankful for everything that’s allowed me to single-mindedly pursue the path that’s brought us to the Mathematica and Wolfram Language of today. And I look forward to our collective effort to move forward from this point, and to contribute to what I think will ultimately be seen as a crucial element in the development of technology and our world.
Logic is a foundation for many things. But what are the foundations of logic itself?
In symbolic logic, one introduces symbols like p and q to stand for statements (or “propositions”) like “this is an interesting essay”. Then one has certain “rules of logic”, like that, for any p and any q, NOT (p AND q) is the same as (NOT p) OR (NOT q).
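As a quick sanity check of that rule, here is a minimal Wolfram Language sketch that tests it over every truth assignment. The name deMorgan is only an illustrative label, not something from the original text.

```wolfram
(* De Morgan's rule written as a Boolean equivalence *)
deMorgan = Equivalent[Not[p && q], (! p) || (! q)];

(* TautologyQ checks that it holds for every assignment of p and q *)
TautologyQ[deMorgan, {p, q}]   (* True *)

(* The same check by explicit enumeration of the four cases *)
AllTrue[Tuples[{True, False}, 2], Not[And @@ #] === Or @@ (Not /@ #) &]   (* True *)
```

The enumeration works here only because p and q take just two values. It confirms the rule but says nothing about which axioms it follows from.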
But where do these “rules of logic” come from? Well, logic is a formal system. And, like Euclid’s geometry, it can be built on axioms. But what are the axioms? We might start with things like p AND q = q AND p, or NOT NOT p = p. But how many axioms does one need? And how simple can they be?
It was a nagging question for a long time. But at 8:31pm on Saturday, January 29, 2000, out on my computer screen popped a single axiom. I had already shown there couldn’t be anything simpler, but I soon established that this one little axiom was enough to generate all of logic:
That’s the same kind of question that’s increasingly being asked about all sorts of computational systems, and all sorts of applications of machine learning and AI. Yes, we can see what happens. But can we understand it?
I think this is ultimately a deep question—that’s actually critical to the future of science and technology, and in fact to the future of our whole intellectual development.
But before we talk more about this, let’s talk about logic, and about the axiom I found for it.
The History
Logic as a formal discipline basically originated with Aristotle in the 4th century BC. As part of his lifelong effort to catalog things (animals, causes, etc.), Aristotle cataloged valid forms of arguments, and created symbolic templates for them which basically provided the main content of logic for two thousand years.
By the 1400s, however, algebra had been invented, and with it came cleaner symbolic representations of things. But it was not until 1847 that George Boole finally formulated logic in the same kind of way as algebra, with logical operations like AND and OR being thought of as operating according to algebra-like rules.
Within a few years, people were explicitly writing down axiom systems for logic. A typical example was:
But does logic really need AND and OR and NOT? After the first decade of the 1900s several people had discovered that actually the single operation that we now call NAND is enough, with for example p OR q being computed as (p NAND p) NAND (q NAND q). (The “functional completeness” of NAND could have remained forever a curiosity but for the development of semiconductor technology—which implements all the billions of logic operations in a modern microprocessor with combinations of transistors that perform none other than NAND or the related function NOR.)
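The OR construction quoted above can be checked directly. The sketch below also includes the standard NAND constructions for NOT and AND, which are not spelled out in the text; the helper names nandOr, nandNot and nandAnd are hypothetical.

```wolfram
(* Or, Not and And rebuilt from Nand alone *)
nandOr[p_, q_] := Nand[Nand[p, p], Nand[q, q]];
nandNot[p_] := Nand[p, p];
nandAnd[p_, q_] := Nand[Nand[p, q], Nand[p, q]];

pairs = Tuples[{True, False}, 2];

(* Compare each construction with the built-in operation on all inputs *)
AllTrue[pairs, nandOr @@ # === Or @@ # &]         (* True *)
AllTrue[pairs, nandAnd @@ # === And @@ # &]       (* True *)
AllTrue[{True, False}, nandNot[#] === Not[#] &]   (* True *)
```

Since AND, OR and NOT can all be built this way, any Boolean expression can be rewritten using NAND alone.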
But, OK, so what do the axioms of logic (or “Boolean algebra”) look like in terms of NAND? Here’s the first known version of them, from Henry Sheffer in 1913 (here dot · stands for NAND):
Back in 1910 Whitehead and Russell’s Principia Mathematica had popularized the idea that perhaps all of mathematics could be derived from logic. And particularly with this in mind, there was significant interest in seeing just how simple the axioms for logic could be. Some of the most notable work on this was done in Lviv and Warsaw (then both part of Poland), particularly by Jan Łukasiewicz (who, as a side effect of his work, invented in 1920 parenthesis-free Łukasiewicz or “Polish” notation). In 1944, at the age of 66, Łukasiewicz fled from the approaching Soviets—and in 1947 ended up in Ireland.
Meanwhile, the Irish-born Carew Meredith, who had been educated at Winchester and Cambridge, and had become a mathematics coach in Cambridge, had been forced by his pacifism to go back to Ireland in 1939. And in 1947, Meredith went to lectures by Łukasiewicz in Dublin, which inspired him to begin a search for simple axioms, which would occupy most of the rest of his life.
But could it get any simpler? Meredith had been picking away for years trying to see how a NAND could be removed here or there. But after 1967 he apparently didn’t get any further (he died in 1976), though in 1969 he did find the three-axiom system:
I actually didn’t know about Meredith’s work when I started exploring axiom systems for logic. I’d gotten into the subject as part of trying to understand what kinds of behavior simple rules could produce. Back in the early 1980s I’d made the surprising discovery that even cellular automata with some of the simplest possible rules—like my favorite rule 30—could generate behavior of great complexity.
And having spent the 1990s basically trying to figure out just how general this phenomenon was, I eventually wanted to see how it might apply to mathematics. It’s an immediate observation that in mathematics one’s basically starting from axioms (say for arithmetic, or geometry, or logic), and then trying to prove a whole collection of sophisticated theorems from them.
But just how simple can the axioms be? Well, that was what I wanted to discover in 1999. And as my first example, I decided to look at logic (or, equivalently, Boolean algebra). Contrary to what I would ever have expected beforehand, my experience with cellular automata, Turing machines, and many other kinds of systems—including even partial differential equations—was that one could just start enumerating the simplest possible cases, and after not too long one would start seeing interesting things.
But could one “discover logic” this way? Well, there was only one way to tell. And in late 1999 I set things up to start exploring what amounts to the space of all possible axiom systems—starting with the simplest ones.
In a sense any axiom system provides a set of constraints, say on p · q. It doesn’t say what p · q “is”; it just gives properties that p · q must satisfy (like, for example, it could say that p · q = q · p). Then the question is whether from these properties one can derive all the theorems of logic that hold when p · q is Nand[p, q]: no more and no less.
There’s a direct way to test some of this. Just take the axiom system, and see what explicit forms of p · q satisfy the axioms if p and q can, say, be True or False. If the axiom system were just p · q = q · p then, yes, p · q could be Nand[p, q]—but it doesn’t have to be. It could also be And[p, q] or Equal[p, q]—or lots of other things which won’t satisfy the same theorems as the NAND function in logic. But by the time one gets to the axiom system one’s reached the point where Nand[p, q] (and the basically equivalent Nor[p, q]) are the only “models” of p · q that work—at least assuming p and q have only two possible values.
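This kind of model test can be sketched in a few lines. The example below takes commutativity, p · q = q · p, as its axiom (an assumption chosen for illustration) and counts how many of the 16 two-valued binary operators satisfy it. The helper names toOp and commutativeQ are hypothetical.

```wolfram
(* The four possible (p, q) inputs and all 16 truth tables for a binary operator *)
inputs = Tuples[{True, False}, 2];
tables = Tuples[{True, False}, Length[inputs]];

(* Turn a truth table into an operator that can be applied as dot[p, q] *)
toOp[table_] := With[{rules = Thread[inputs -> table]},
   Function[{x, y}, Replace[{x, y}, rules]]];

(* An operator is a model of the axiom if p · q == q · p for all inputs *)
commutativeQ[dot_] := AllTrue[inputs, dot @@ # === dot @@ Reverse[#] &];

(* How many of the 16 operators are models? *)
Count[toOp /@ tables, _?commutativeQ]   (* 8 *)

(* Nand, And and Equal are all among them *)
commutativeQ /@ {Nand, And, Equal}   (* {True, True, True} *)
```

Half of all possible operators pass, so commutativity alone comes nowhere near pinning down NAND. A stronger axiom is needed to cut the set of models down.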
So is this then an axiom system for logic? Well, no. Because it implies, for example, that there’s a possible form for p · q with 3 values for p and q, whereas there’s no such thing for logic. But, OK, the fact that this axiom system with just one axiom even gets close suggests it might be worth looking for a single axiom that reproduces logic. And that’s what I did back in January 2000 (it’s gotten a bit easier these days, thanks notably to the handy, fairly new Wolfram Language function Groupings).
It was easy to see that no axioms with 3 or fewer “NANDs” (or, really, 3 or fewer “dot operators”) could work. And by 5am on Saturday, January 29 (yes, I was a night owl then), I’d found that none with 4 NANDs could work either. By the time I stopped working on it a little after 6am, I’d gotten 14 possible candidates with 5 NANDs. But when I started work again on Saturday evening and did more tests, every one of these candidates failed.
So, needless to say, the next step was to try cases with 6 NANDs. There were 288,684 of these in all. But my code was efficient, and it didn’t take long before out popped on my screen (yes, from Mathematica Version 4):
At first I didn’t know what I had. All I knew was that these were the 25 inequivalent 6-NAND axioms that got further than any of the 5-NAND ones. But were any of them really an axiom system for logic? I had a (rather computation-intensive) empirical method that could rule axioms out. But the only way to know for sure whether any axiom was actually correct was to prove that it could successfully reproduce, say, the Sheffer axioms for logic.
It took a little software wrangling, but before many days had gone by, I’d discovered that most of the 25 couldn’t work. And in the end, just two survived:
And to my great excitement, I was successfully able to have my computer prove that both are axioms for logic. The procedure I’d used ensured that there could be no simpler axioms for logic. So I knew I’d come to the end of the road: after a century (or maybe even a couple of millennia), we could finally say that the simplest possible axiom for logic was known.
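For readers who want to try a proof of this kind today, here is a hedged sketch using the built-in FindEquationalProof function. It is not necessarily the method used in 2000. It assumes the single axiom in its commonly quoted form, ((p · q) · r) · (p · ((p · r) · p)) = r, and asks only for commutativity rather than the full Sheffer axioms. The "ProofGraph" property name is given as I recall it from the documentation.

```wolfram
(* The single axiom, with CenterDot standing for the dot operator *)
axiom = ForAll[{p, q, r},
   CenterDot[CenterDot[CenterDot[p, q], r],
     CenterDot[p, CenterDot[CenterDot[p, r], p]]] == r];

(* Ask for a proof that the dot operator is commutative *)
proof = FindEquationalProof[
   ForAll[{p, q}, CenterDot[p, q] == CenterDot[q, p]], axiom];

(* If a proof is found, proof is a ProofObject that can be inspected *)
proof["ProofGraph"]
```

A successful proof establishes only the one theorem requested. Showing that an axiom reproduces all of logic means proving every axiom of a known system, such as Sheffer's, from it.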
Not long after, I found two 2-axiom systems, also with 6 NANDs in total, that I proved could reproduce logic:
And if one chooses to take commutativity for granted, then these show that all it takes to get logic is one tiny 4-NAND axiom.
Why It Matters
OK, so it’s neat to be able to say that one’s “finished what Aristotle started” (or at least what Boole started) and found the very simplest possible axiom system for logic. But is it just a curiosity, or is there real significance to it?
Before the whole framework I developed in A New Kind of Science, I think one would have been hard-pressed to view it as much more than a curiosity. But now one can see that it’s actually tied into all sorts of foundational questions, like whether one should consider mathematics to be invented or discovered.
Mathematics as humans practice it is based on a handful of particular axiom systems—each in effect defining a certain field of mathematics (say logic, or group theory, or geometry, or set theory). But in the abstract, there are an infinite number of possible axiom systems out there—in effect each defining a field of mathematics that could in principle be studied, even if we humans haven’t ever done it.
Before A New Kind of Science I think I implicitly assumed that pretty much anything that’s just “out there” in the computational universe must somehow be “less interesting” than things we humans have explicitly built and studied. But my discoveries about simple programs made it clear that at the very least there’s often as much richness in systems that are just “out there” as in ones that we carefully select.
So what about axiom systems for mathematics? Well, to compare what’s just “out there” with what we humans have studied, we have to know where the axiom systems for existing areas of mathematics that we’ve studied—like logic—actually lie. And based on traditional human-constructed axiom systems we’d conclude that they have to be far, far out there—in effect only findable if one already knows where they are.
But my axiom-system discovery basically answered the question, “How far out is logic?” For something like cellular automata, it’s particularly easy to assign a number (as I did in the early 1980s) to each possible cellular automaton. It’s slightly harder to do this with axiom systems, though not much. And in one approach, my axiom can be labeled as 411;3;7;118—constructed in the Wolfram Language as:
Groupings[{p, q, r}[[1 + IntegerDigits[411, 3, 7]]], CenterDot -> 2][[118]] == r
And at least in the space of possible functional forms (not accounting for variable labeling), here’s a visual indication of where the axiom lies:
Given how fundamental logic is to so many formal systems we humans study, we might have thought that in any reasonable representation, logic corresponds to one of the very simplest conceivable axiom systems. But at least with the (NAND-based) representation we’re using, that’s not true. There’s still by most measures a very simple axiom system for it, but it’s perhaps the hundred thousandth possible axiom system one would encounter if one just started enumerating axiom systems starting from the simplest one.
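To make the label 411;3;7;118 a little more concrete, here is a minimal sketch that unpacks it and then checks the resulting axiom against the truth table of NAND. The names leaves, axiom and holdsForNandQ are introduced here purely for illustration. The check only confirms that the axiom is true when · is read as NAND; it says nothing about whether the axiom is enough to derive all of logic.

```wolfram
(* digits of 411 in base 3, padded to 7 places, pick the variable at each leaf *)
leaves = {p, q, r}[[1 + IntegerDigits[411, 3, 7]]]
(* the leaf sequence is p, q, r, p, p, r, p *)

(* the 118th way of grouping those leaves with a binary CenterDot *)
axiom = Groupings[leaves, CenterDot -> 2][[118]] == r

(* illustrative helper: read CenterDot as Nand and try all 8 truth assignments *)
holdsForNandQ[eq_] := AllTrue[Tuples[{True, False}, 3],
  (eq /. Thread[{p, q, r} -> #] /. {CenterDot -> Nand, Equal -> SameQ}) &]

holdsForNandQ[axiom]
```

The same helper can be pointed at any other candidate equation in p, q, r, which is roughly the kind of filter one needs before attempting a proof at all.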
So given this, the obvious next question is, what about all the other axiom systems? What’s the story with those? Well, that’s exactly the kind of investigation that A New Kind of Science is all about. And indeed in the book I argue that things like the systems we see in nature are often best captured precisely by those “other rules” that we can find by enumerating possibilities.
In the case of axiom systems, I made a picture that represents what happens in “fields of mathematics” corresponding to different possible axiom systems. Each row shows the consequences of a particular axiom system, with the boxes across the page indicating whether a particular theorem is true in that axiom system. (Yes, at some point Gödel’s Theorem bites one, and it becomes irreducibly difficult to prove or disprove a given theorem in a given axiom system; in practice, with my methods that happened just a little further to the right than the picture shows…)
Is there something fundamentally special about “human-investigated” fields of mathematics? From this picture, and other things I’ve studied, there doesn’t seem to be anything obvious. And I suspect actually that the only thing that’s really special about these fields of mathematics is the historical fact that they are what have been studied. (One might make claims like that they arise because they “describe the real world”, or because they’re “related to how our brains work”, but the results in A New Kind of Science argue against these.)
Alright, well then what’s the significance of my axiom system for logic? The size of it gives a sense of the ultimate information content of logic as an axiomatic system. And it makes it look like—at least for now—we should view logic as much more having been “invented as a human construct” than having been “discovered” because it was somehow “naturally exposed”.
If history had been different, and we’d routinely looked (in the manner of A New Kind of Science) at lots of possible simple axiom systems, then perhaps we would have “discovered” the axiom system for logic as one with particular properties we happened to find interesting. But given that we have explored so few of the possible simple axiom systems, I think we can only reasonably view logic as something “invented”—by being constructed in an essentially “discretionary” way.
In a sense this is how logic looked, say, back in the Middle Ages—when the possible syllogisms (or valid forms of argument) were represented by (Latin) mnemonics like bArbArA and cElErAnt. And to mirror this, it’s fun to find mnemonics for what we now know is the simplest possible axiom system for logic.
Starting with ((p·q)·r)·(p·((p·r)·p)) = r, we can represent each p·q in prefix or Polish form (the reverse of the “reverse Polish” of an HP calculator) as Dpq, so the whole axiom can be written =DDDpqrDpDDprpr. Then (as Ed Pegg found for me) there’s an English mnemonic for this: FIGure OuT Queue, where p, q, r are u, r, e. Or, looking at first letters of words (with operator B, and p, q, r being a, p, c): “Bit by bit, a program computed Boolean algebra’s best binary axiom covering all cases”.
The Mechanics of Proof
OK, so how does one actually prove that my axiom system is correct? Well, the most immediate thing to do is just to show that from it one can derive a known axiom system for logic—like Sheffer’s axiom system:
There are three axioms here, and we’ve got to derive each of them. Well, with the latest version of the Wolfram Language, here’s what we do to derive the first one:
It’s pretty remarkable that it’s now possible to just do this. The “proof object” records that 54 steps were used in the proof. And from this proof object we can generate a notebook that describes each of those steps:
pf["ProofNotebook"]
In outline, what happens is that a whole sequence of intermediate lemmas are proved, which eventually allow the final result to be derived. There’s a whole network of interdependencies between lemmas, as this visualization shows:
pf["ProofGraph"]
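The input that creates the proof object pf is not reproduced in the text here, so the following is only a sketch of how it might be set up. It assumes Sheffer’s axioms in their usual textbook form, writes · explicitly as CenterDot to avoid any ambiguity in grouping, and uses the illustrative names nandAxiom and sheffer.

```wolfram
(* the single axiom, with CenterDot standing for the NAND-like operator *)
nandAxiom = ForAll[{p, q, r},
   CenterDot[CenterDot[CenterDot[p, q], r],
     CenterDot[p, CenterDot[CenterDot[p, r], p]]] == r];

(* Sheffer's three axioms as usually stated (an assumption; not copied from the text) *)
sheffer = {
   ForAll[p, CenterDot[CenterDot[p, p], CenterDot[p, p]] == p],
   ForAll[{p, q},
    CenterDot[p, CenterDot[q, CenterDot[q, q]]] == CenterDot[p, p]],
   ForAll[{p, q, r},
    CenterDot[CenterDot[p, CenterDot[q, r]], CenterDot[p, CenterDot[q, r]]] ==
     CenterDot[CenterDot[CenterDot[q, q], p], CenterDot[CenterDot[r, r], p]]]};

(* ask for a proof of the first Sheffer axiom from the single axiom *)
pf = FindEquationalProof[sheffer[[1]], nandAxiom]
```

Replacing sheffer[[1]] by sheffer[[2]] or sheffer[[3]] would request the other two derivations, which can be expected to take noticeably longer.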
Here are the networks involved in deriving all three of the axioms in the Sheffer axiom system—with the last one involving a somewhat whopping 504 steps:
And, yes, it’s clear these are pretty complicated. But before we discuss what that complexity means, let’s talk about what actually goes on in the individual steps of these proofs.
The basic idea is straightforward. Let’s imagine we had an axiom that just said p·q = q·p. (Mathematically, this corresponds to the statement that · is commutative.) More precisely, what the axiom says is that for any expressions p and q, p·q is equivalent to q·p.
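OK, so let’s say we wanted to derive from this axiom an equivalence between two larger nested expressions built with ·. We could do this by using the axiom to transform one side step by step, swapping the arguments of one · at a time, until after three such rewrites it matches the other side.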
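A hand derivation of this kind can be sketched in a few lines. The particular start and target expressions below, and the helper swap, are hypothetical choices made for illustration rather than the ones used in the original example.

```wolfram
(* one use of the commutativity axiom: swap the two arguments of a CenterDot *)
swap[CenterDot[x_, y_]] := CenterDot[y, x]

start  = CenterDot[CenterDot[a, b], CenterDot[c, d]];  (* (a·b)·(c·d) *)
target = CenterDot[CenterDot[d, c], CenterDot[b, a]];  (* (d·c)·(b·a) *)

step1 = MapAt[swap, start, 1];   (* rewrite the first argument:  (b·a)·(c·d) *)
step2 = MapAt[swap, step1, 2];   (* rewrite the second argument: (b·a)·(d·c) *)
step3 = swap[step2];             (* rewrite the whole expression: (d·c)·(b·a) *)

step3 === target
```

Each step is a single application of the axiom at one position, which is exactly what makes a finished proof easy to check.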
FindEquationalProof does essentially the same thing, though it chooses to do the steps in a slightly different order, and modifies the left-hand side as well as the right-hand side:
Once one’s got a proof like this, it’s straightforward to just run through each of its steps, and check that they produce the result that’s claimed. But how does one find the proof? There are lots of different possible sequences of substitutions and transformations that one could do. So how does one find a sequence that successfully gets to the final result?
One might think: why not just try all possible sequences, and if there is any sequence that works, one will eventually find it? Well, the problem is that one quickly ends up with an astronomical number of possible sequences to check. And indeed the main art of automated theorem proving consists of finding ways to prune the number of sequences one has to check.
This quickly gets pretty technical, but the most important idea is easy to talk about if one knows basic algebra. Let’s say you’re trying to prove an algebraic result like:
(-1 + x^2) (1 - x + x^2) (1 + x + x^2) == (-1 + x) (1 + x + x^2) (1 + x^3)
Well, there’s a guaranteed way to do this: just apply the rules of algebra to expand out each side—and immediately one can see they’re the same:
{Expand[(-1 + x^2) (1 - x + x^2) (1 + x + x^2)],
Expand[(-1 + x) (1 + x + x^2) (1 + x^3)]}
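The comparison can also be made mechanical. As a small sketch (the names lhs and rhs are just labels introduced here), one can reduce both sides to expanded form and test whether the results are identical:

```wolfram
lhs = (-1 + x^2) (1 - x + x^2) (1 + x + x^2);
rhs = (-1 + x) (1 + x + x^2) (1 + x^3);

(* as written, the two sides are structurally different expressions *)
lhs === rhs

(* after expansion both sides become -1 + x^6, so the comparison succeeds *)
Expand[lhs] === Expand[rhs]
```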
Why does this work? Well, it’s because there’s a way of taking algebraic expressions like this, and always systematically reducing them so that eventually they get to a standard form. OK, but so can one do the same thing for proofs with arbitrary axiom systems?
The answer is: not immediately. It works in algebra because algebra has a special property that guarantees one can always “make progress” in reducing expressions. But what was discovered independently several times in the 1970s (under names like the Knuth–Bendix and the Gröbner Basis algorithms) is that even if an axiom system doesn’t intrinsically have the appropriate property, one can potentially find “completions” of it that do.
And that’s what’s going on in typical proofs produced by FindEquationalProof (which is based on the Waldmeister (“master of trees”) system). There are so-called “critical pair lemmas” that don’t directly “make progress” themselves, but make it possible to set up paths that do. And the reason things get complicated is that even if the final expression one’s trying to get to is fairly short, one may have to go through all sorts of much longer intermediate expressions to get there. And so, for example, for the proof of the first Sheffer axiom above, here are the intermediate steps:
In this case, the largest intermediate form is about 4 times the size of the original axiom. Here it is:
One can represent expressions like this as a tree. Here’s this one, compared to the original axiom:
And here’s how the sizes of intermediate steps evolve through the proofs found for each of the Sheffer axioms:
Why Is It So Hard?
Is it surprising that these proofs are so complicated? In some ways, not really. Because, after all, we know perfectly well that math can be hard. In principle it might have been that anything that’s true in math would be easy to prove. But one of the side effects of Gödel’s Theorem from 1931 was to establish that even things we can eventually prove can have proofs that are arbitrarily long.
And actually this is a symptom of the much more general phenomenon I call computational irreducibility. Consider a system governed, say, by the simple rule of a cellular automaton (and of course, every essay of mine must have a cellular automaton somewhere!). Now just run the system:
One might have thought that given that there’s a simple rule that underlies the system, there’d always be a quick way to figure out what the system will do. But that’s not the case. Because according to my Principle of Computational Equivalence the operation of the system will often correspond to a computation that’s just as sophisticated as any computation that we could set up to figure out the behavior of the system. And this means that the actual behavior of the system in effect corresponds to an irreducible amount of computational work that we can’t in general shortcut in any way.
In the picture above, let’s say we want to know whether the pattern eventually dies out. Well, we could just keep running it, and if we’re lucky it’ll eventually resolve to something whose outcome is obvious. But in general there’s no upper bound to how far we’ll have to go to, in effect, prove what happens.
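The “just keep running it” strategy can be sketched as follows. The helper extinctionStep is a name invented here, and the elementary rules 128 and 30 are stand-ins chosen for illustration, not the rule shown in the picture.

```wolfram
(* run an elementary cellular automaton on a background of 0s for tmax steps
   and report the first step at which no nonzero cells remain, if any *)
extinctionStep[rule_Integer, init_List, tmax_Integer] :=
 Module[{counts = Total /@ CellularAutomaton[rule, {init, 0}, tmax]},
  (* counts[[t + 1]] is the number of nonzero cells at step t *)
  SelectFirst[Range[0, tmax], counts[[# + 1]] == 0 &, Missing["NotFound"]]]

extinctionStep[128, {1, 1, 1}, 10]   (* rule 128 shrinks a block until it vanishes *)
extinctionStep[30, {1}, 100]         (* rule 30 is still alive after 100 steps *)
```

A result of Missing["NotFound"] only says that the pattern survived for tmax steps. It does not say what happens afterwards, which is the whole difficulty.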
When we do things like the logic proofs above, it’s a slightly different setup. Instead of just running something according to definite rules, we’re asking whether there exists a way to get to a particular result by taking some series of steps that each follow a particular rule. And, yes, as a practical computational problem, this is immediately more difficult. But the core of the difficulty is still the same phenomenon of computational irreducibility, and this phenomenon implies that there isn’t any general way to shortcut the process of working out what a system will do.
Needless to say, there are plenty of things in the world—especially in technology and scientific modeling, as well as in areas where there are various forms of regulation—that have traditionally been set up to implicitly avoid computational irreducibility, and to operate in ways whose outcome can readily be foreseen without an irreducible amount of computation.
But one of the implications of my Principle of Computational Equivalence is that this is a rather singular and contrived situation—because it says that computational irreducibility is in fact ubiquitous across systems in the computational universe.
OK, but what about mathematics? Maybe somehow the rules of mathematics are specially chosen to show computational reducibility. And there are indeed some cases where that’s true (and in some sense it even happens in logic). But for the most part it appears that the axiom systems of mathematics are not untypical of the space of all possible axiom systems—where computational irreducibility is inevitably rampant.
What’s the Point of a Proof?
At some level, the point of a proof is to know that something is true. Of course, particularly in modern times, proof has very much taken a back seat to pure computation. Because in practice it’s much more common to want to generate things by explicit computation than it is to want to “go back” and construct a proof that something is true.
In pure mathematics, though, it’s fairly common to deal with things that at least nominally involve an infinite number of cases (“true for all primes”, etc.), for which at least direct computation can’t work. And when it comes to questions of verification (“can this program ever crash?” or “can this cryptocurrency ever get spent twice?”) it’s often more reasonable to attempt a proof than to do something like run all possible cases.
But in the actual practice of mathematics, there’s more to proof than just establishing if things are true. Back when Euclid first wrote his Elements, he just gave results, and proofs were “left to the reader”. But for better or worse, particularly over the past century, proof has become something that doesn’t just happen behind the scenes, but is instead actually the primary medium through which things are supposed to be communicated.
At some level I think it’s a quirk of history that proofs are typically today presented for humans to understand, while programs are usually just thought of as things for computers to run. Why has this happened? Well, at least in the past, proofs could really only be represented in essentially textual form—so if they were going to be used, it would have to be by humans. But programs have essentially always been written in some form of computer language. And for the longest time, that language tended to be set up to map fairly directly onto the low-level operations of the computer—which meant that it was readily “understandable” by the computer, but not necessarily by humans.
But as it happens, one of the main goals of my own efforts over the past several decades has been to change this—and to develop in the Wolfram Language a true “computational communication language” in which computational ideas can be communicated in a way that is readily understandable to both computers and humans.
There are many consequences of having such a language. But one of them is that it changes the role of proof. Let’s say one’s looking at some mathematical result. Well, in the past the only plausible way to communicate how one should understand it was to give a proof that people could read. But now something different is possible: one can give a Wolfram Language program that computes the result. And in many ways this is a much more powerful way to communicate why the result is true. Because every piece of the program is something precise and unambiguous—that if one wants to, one can actually run. There’s no issue of trying to divine what some piece of text means, perhaps filling in some implicit assumptions. Instead, everything is right there, in absolutely explicit form.
OK, so what about proof? Are there in fact unambiguous and precise ways to write proofs? Well, potentially yes, though it’s not particularly easy. And even though the main Wolfram Language has now existed for 30 years, it’s taken until pretty much now to figure out a reasonable way to represent in it even such structurally comparatively straightforward proofs as the one for my axiom system above.
One can imagine authoring proofs in the Wolfram Language much like one authors programs—and indeed we’re working on seeing how to provide high-level versions of this kind of “proof assistant” functionality. But the proof of my axiom system that I showed above is not something anyone authored; it’s something that was found by the computer. And as such, it’s more like the output of running a program than like a program itself. (Like a program, though, the proof can in some sense be “run” to verify the result.)
Generating Understandability
Most of the time when people use the Wolfram Language—or Wolfram|Alpha—they just want to compute things. They’re interested in getting results, not in understanding why they get the results they do. But in Wolfram|Alpha, particularly in areas like math and chemistry, a popular feature for students is “step-by-step solutions”:
When Wolfram|Alpha does something like computing an integral, it’s using all sorts of powerful systematic algorithmic techniques optimized for getting answers. But when it’s asked to show steps it needs to do something different: it needs instead to explain step by step why it gets the result it does.
It wouldn’t be useful for it to explain how it actually got the result; it’s a very non-human process. Instead, it basically has to figure out how the kinds of operations humans learn can be used to get the result. Often it’ll figure out some trick that can be used. Yes, there’ll be a systematic way to do it that’ll always work. But it involves too many “mechanical” steps. The “trick” (“trig substitution”, “integration by parts”, whatever) won’t work in general, but in this particular case it’ll provide a faster way to get to the answer.
OK, but what about getting understandable versions of other things? Like the operation of programs in general. Or like the proof of my axiom system.
Let’s start by talking about programs. Let’s say one’s written a program, and one wants to explain how it works. One traditional approach is just to “include comments” in the code. Well, if one’s writing in a traditional low-level language, that may be the best one can do. But the whole point of the Wolfram Language being a computational communication language is that the language itself is supposed to allow you to communicate ideas, without needing extra pieces of English text.
It takes effort to make a Wolfram Language program be a good piece of exposition, just like it takes effort to make English text a good piece of exposition. But one can end up with a piece of Wolfram Language code that really explains very clearly how it works just through the code itself.
Of course, it’s very common for the actual execution of the code to do things that can’t readily be foreseen just from the program. I’ll talk about extreme cases like cellular automata soon. But for now let’s imagine that one’s constructed a program where there’s some ability to foresee the broad outlines of what it does.
And in such a case, I’ve found that computational essays (presented as Wolfram Notebooks) are a great tool in explaining what’s going on. It’s crucial that the Wolfram Language is symbolic, so it’s possible to run even the tiniest fragments of any program on their own (with appropriate symbolic expressions as input or output). And when one does this, one can present a succession of steps in the program as a succession of elements in the dialog that forms the core of a computational notebook.
In practice, it’s often critical to create visualizations of inputs or outputs. Yes, everything can be represented as an explicit symbolic expression. But we humans often have a much easier time understanding things when they’re presented visually, rather than as some kind of one-dimensional language-like string.
Of course, there’s something of an art to creating good visualizations. But in the Wolfram Language we’ve managed to go a long way towards automating this art—often using pretty sophisticated machine learning and other algorithms to do things like lay out networks or graphics elements.
What about just starting from the raw execution trace for a program? Well, it’s hard. I’ve done experiments on this for decades, and never been very satisfied with the results. Yes, you can zoom in to see lots of details of what’s going on. But when it comes to knowing the “big picture” I’ve never found any particularly good techniques for automatically producing things that are terribly useful.
At some level it’s similar to the general problem of reverse engineering. You are shown some final machine code, chip design, or whatever. But now you want to go backwards to reconstruct the higher-level description that some human started from, that was somehow “compiled” to what you see.
In the traditional approach to engineering, where one builds things up incrementally, always somehow being able to foresee the consequences of what one’s building, this approach can in principle work. But if one does engineering by just searching the computational universe to find an optimal program (much like I searched possible axiom systems to find one for logic), then there’s no guarantee that there’s any “human story” or explanation behind this program.
It’s a similar problem in natural science. You see some elaborate set of things happening in some biological system. Can one “reverse engineer” these to find an “explanation” for them? Sometimes one might be able to say, for example, that evolution by natural selection would be likely to lead to something. Or that it’s just common in the computational universe and so is likely to occur. But there’s no guarantee that the natural world is set up in any way that necessarily allows human explanation.
Needless to say, when one makes models for things, one inevitably considers only the particular aspects that one’s interested in, and idealizes everything else away. And particularly in areas like medicine, it’s not uncommon to end up with some approximate model that’s a fairly shallow decision tree that’s easy to explain, at least as far as it goes.
The Nature of Explainability
What does it mean to say that something is explainable? Basically it’s that humans can understand it.
So what does it take for humans to understand something? Well, somehow we have to be able to “wrap our brains around it”. Let’s take a typical cellular automaton with complex behavior. A computer has no problem following each step in the evolution. And with immense effort a human could laboriously reproduce what a computer does.
But one wouldn’t say that means the human “understands” what the cellular automaton is doing. To get to that point, the human would have to be readily able to reason about how the cellular automaton behaves, at some high level. Or put another way, the human would have to be able to “tell a story” that other humans could readily understand, about how the cellular automaton behaves.
Is there a general way to do this? Well, no, because of computational irreducibility. But it can still be the case that certain features that humans choose to care about can be explained in some reduced, higher-level way.
How does this work? Well, in a sense it requires that some higher-level language be constructed that can describe the features one’s interested in. Looking at a typical cellular automaton pattern, one might try to talk not in terms of colors of huge numbers of individual cells, but instead in terms of the higher-level structures one can pick out. And the key point is that it’s possible to make at least a partial catalog of these structures: even though there are lots of details that don’t quite fit, there are still particular structures that occur often.
And if we were going to start “explaining” the behavior of the cellular automaton, we’d typically begin by giving the structures names, and then we’d start talking about what’s going on in terms of these named things.
The case of a cellular automaton has an interesting simplifying feature: because it operates according to simple deterministic rules, there are structures that just repeat identically. If we’re dealing with things in the natural world, for example, we typically won’t see this kind of identical repetition. Instead, it’ll just be that this tiger, say, is extremely similar to this other one, so we can call them both “tigers”, even though their atoms are not identical in their arrangement.
What’s the bigger picture of what’s going on? Well, it’s basically that we’re using the idea of symbolic representation. We’re saying that we can assign something—often a word—that we can use to symbolically describe a whole class of things, without always having to talk about all the detailed parts of each thing.
In effect it’s a kind of information compression: we’re using symbolic constructs to find a shorter way to describe what we’re interested in.
Let’s imagine we’ve generated a giant structure, say a mathematical one:
Solve[a x^4 + b x^3 + c x^2 + d x + e == 0, x]
Well, a first step is to generate a kind of internal higher-level representation. For example, we might find substructures that appear repeatedly. And we might then assign names to them. And then display a “skeleton” of the whole structure in terms of these names:
And, yes, this kind of “dictionary compression”–like scheme is useful in bringing a first level of explainability.
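A rough sketch of such a scheme, under the assumption that “repeated substructure” simply means a subexpression that occurs more than once, might look like this. The names repeatedParts, skeleton and sub are hypothetical, and this is not meant to reproduce how the display above was actually generated.

```wolfram
(* subexpressions with at least minLeaves leaves that occur more than once,
   sorted so that the largest come first *)
repeatedParts[expr_, minLeaves_Integer] :=
 SortBy[
   Select[
    Tally[Cases[expr, _?(LeafCount[#] >= minLeaves &), {1, Infinity}]],
    Last[#] > 1 &],
   -LeafCount[First[#]] &][[All, 1]]

(* replace each repeated part by a placeholder sub[i];
   return the skeleton together with the dictionary of placeholders *)
skeleton[expr_, minLeaves_Integer] :=
 Module[{parts = repeatedParts[expr, minLeaves], names},
  names = Array[sub, Length[parts]];
  {expr /. Thread[parts -> names], Thread[names -> parts]}]

(* toy example: g[x, y] occurs three times *)
skeleton[f[g[x, y], h[g[x, y]], g[x, y]], 2]
```

Applied to the quartic solution, something like skeleton[Solve[a x^4 + b x^3 + c x^2 + d x + e == 0, x], 10] should give a much shorter outline plus a dictionary, although in this simple version the dictionary entries themselves are not compressed any further.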
But let’s go back to the proof of my axiom system. The lemmas that were generated in this proof are precisely set up to be elements that are used repeatedly (a bit like shared common subexpressions). But even having in effect factored them out, we’re still left with a proof that is not something that we humans can readily understand.
So how can we go further? Well, basically we have to come up with some yet-higher-level description. But what might this be?
The Concept of Concepts
If you’re trying to explain something to somebody, it’s a lot easier when there’s something similar that they’ve already understood. Imagine trying to explain a modern drone to someone from the Stone Age. It’d probably be pretty difficult. But explaining it to someone from 50 years ago, who’d already seen helicopters and model airplanes etc., would be a lot easier.
And ultimately the point is that when we explain something, we do it in some language that both we and whoever we’re explaining it to know. And the richer this language is, the fewer new elements we have to introduce in order to communicate whatever it is that we’re trying to explain.
There’s a pattern that’s been repeated throughout intellectual history. Some particular collection of things gets seen a bunch of times. And gradually it’s understood that these things are all somehow abstractly similar. And they can all be described in terms of some particular new concept, often referred to by some new word or phrase.
Let’s say one had seen things like water and blood and oil. Well, at some point one realizes that there’s a general concept of “liquid”, and all of these can be described as liquids. And once one has this concept, one can start reasoning in terms of it, and identifying more concepts—like, say, viscosity—that build on it.
When does it make sense to group things into a concept? Well, that’s a difficult question, which can’t ultimately be answered without foreseeing everything that might be done with that concept. And in practice, in the evolution of human language and human ideas there’s some kind of process of progressive approximation that goes on.
There’s a much more rapid recapitulation that happens in a modern machine learning system. Imagine taking all sorts of objects that one’s seen in the world, and just feeding them to FeatureSpacePlot and seeing what comes out. Well, if one gets definite clusters in feature space, then one might reasonably think that each of these clusters should be identified as corresponding to a “concept”, that we could for example label with a word.
Now, to be fair, what’s happening with FeatureSpacePlot—like in human intellectual development—is in some ways incremental. Because to lay the objects out in feature space, FeatureSpacePlot is using features that it’s learned how to extract from previous categorizations it knows about.
But, OK, given the world as it is, what are the best categories—or best concepts—one can use to describe things? Well, it’s an evolving story. And in fact breakthroughs—whether in science, technology or elsewhere—are very often precisely associated with the realization that some new category or concept can usefully be identified.
But in the actual evolution of our civilization, there’s a kind of spiral at work. First some particular concept is identified—say the idea of a program. And once some concept has been identified, people start using it, and thinking in terms of it. And pretty soon all sorts of new things have been constructed on the basis of that concept. But then another level of abstraction is identified, and new concepts get constructed, building on top of the previous one.
It’s pretty much the same story for the technology stack of modern civilization, and its “intellectual stack”. Both involve towers of concepts, and successive levels of abstraction.
The Problem of Education
In order for people to be able to communicate using some concept, they have to have learned about it. And, yes, there are some concepts (like object permanence) that humans automatically learn by themselves just by observing the natural world. But looking for example at a list of common words in modern English, it’s pretty clear that most of the concepts that we now use in modern civilization aren’t ones that people can just learn for themselves from the natural world.
Instead—much like a modern machine learning system—at the very least they need some “specially curated” experience of the world, organized to highlight particular concepts. And for more abstract areas (like mathematics) they probably need explicit exposure to the concepts themselves in their raw abstract forms.
But, OK, as the “intellectual stack” of civilization advances, will we always have to learn progressively more? We might worry that at some point our brains just won’t be able to keep up, and we’d have to add some kind of augmentation. But perhaps fortunately, I think it’s one of those cases where the problem can instead most likely be “solved in software”.
The issue is this: At any given point in history, there’s a certain set of concepts that are important in being able to operate in the world as it is at that time. And, yes, as civilization progresses new things are discovered, and new concepts are introduced. But there’s another process at work as well: new concepts bring new levels of abstraction, which typically subsume large numbers of earlier concepts.
We often see this in technology. There was a time when to operate a computer you needed to know all sorts of low-level details. But over time those got abstracted away, so all you need to know is some general concept. You click an icon and things start to happen—and you don’t have to understand operating systems, or interrupt handlers or schedulers, or any of those details.
Needless to say, the Wolfram Language provides a great example of all this. Because it goes to tremendous trouble to “automate out” lots of low-level details (for example about what specific algorithm to use) and let human users just think about things in terms of higher-level concepts.
Yes, there still need to be some people who understand the details “underneath” the abstraction (though I’m not sure how many flint knappers modern society needs). But mostly education can concentrate on teaching at a higher level.
There’s often an implicit assumption in education that to reach higher-level concepts one has to somehow recapitulate the history of how those concepts were historically arrived at. But usually—and perhaps always—this doesn’t seem to be true. In an extreme case, one might imagine that to teach about computers, one would have to recapitulate the history of mathematical logic. But actually we know that people can go straight to modern concepts of computing, without recapitulating any of the history.
But what is ultimately the understandability network of concepts? Are there concepts that can only be understood if one already understands other concepts? Given a particular ambient experience for a human (or particular background training for a neural network) there is presumably some ordering.
But I suspect that something analogous to computation universality probably implies that if one’s just dealing with a “raw brain” then one could start anywhere. So if some alien were exposed to category theory and little else from the very beginning, they’d no doubt build a network of concepts where this is at the root, and maybe what for us is basic arithmetic would be something only reached in their analog of math graduate school.
Of course, such an alien might form their technology stack and their built environment in a quite different way from us—much as the recent history of our own civilization might have been very different if computers had successfully been developed in the 1800s rather than in the mid-1900s.
The Progress of Mathematics
I’ve often wondered to what extent the historical trajectory of human mathematics is an “accident”, and to what extent it’s somehow inexorable. As I mentioned earlier, at the level of formal systems there are many possible axiom systems from which one could construct something that is formally like mathematics.
But the actual history of mathematics did not start with arbitrary axiom systems. It started—in Babylonian times—with efforts to use arithmetic for commerce and geometry for land surveying. And from these very practical origins, successive layers of abstraction have been added that have led eventually to modern mathematics—with for example numbers being generalized from positive integers, to rationals, to roots, to all integers, to decimals, to complex numbers, to algebraic numbers, to quaternions and so on.
Is there an inexorability to this progression of abstraction? I suspect to some extent there is. And probably it’s a similar story as with other kinds of concept formation. Given some stage that’s been reached, there are various things that can readily get studied, and after a while groups of them are seen to be examples of more general and abstract constructs—which then in turn define another stage from which new things can be studied.
Are there ways to break out of this cycle? One possibility would be through doing experiments in mathematics. Yes, one can systematically prove things about particular mathematical systems. But one can also just empirically notice mathematical facts—like Ramanujan’s observation that e^(π√163) is numerically close to an integer. And the question is: are things like this just “random facts of mathematics” or do they somehow fit into the whole “fabric of mathematics”?
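The near-integer usually attributed to Ramanujan, e^(π√163), is the kind of fact one can notice purely by computing. Here is a minimal Python sketch of that experiment. It uses only the standard `decimal` module; the helper `decimal_pi` follows the series recipe given in the `decimal` documentation, and the choice of 50 digits of working precision is an assumption made for illustration.

```python
from decimal import Decimal, getcontext

def decimal_pi():
    """Compute pi to the current context precision (series recipe from the decimal docs)."""
    getcontext().prec += 2          # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                       # unary plus rounds to the restored precision

getcontext().prec = 50              # assumed working precision

# e^(pi * sqrt(163)), evaluated with 50 significant digits
ramanujan = (decimal_pi() * Decimal(163).sqrt()).exp()
nearest = ramanujan.to_integral_value()
gap = abs(ramanujan - nearest)

print(ramanujan)   # the value itself
print(nearest)     # the integer it almost equals
print(gap)         # distance to that integer, a little under 1e-12
```

Ordinary double-precision floats cannot show this: the value has 18 digits before the decimal point, so the interesting digits lie beyond what a 64-bit float can hold.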
One can ask the same kind of thing about questions in mathematics. Is the question of whether odd perfect numbers exist (which has been unanswered since Pythagoras) a core question in mathematics, or is it, in a sense, a random question that doesn’t connect into the fabric of mathematics?
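The perfect-number question is easy to probe empirically, even though no amount of probing settles it. The following is a small Python sketch of such an experiment; the function names and the search limit of 10,000 are choices made here for illustration, and a brute-force search like this says nothing about larger numbers.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1                      # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:        # avoid counting a square root twice
                total += n // d
        d += 1
    return total

def perfect_numbers(limit):
    """All n below `limit` that equal the sum of their proper divisors."""
    return [n for n in range(2, limit) if proper_divisor_sum(n) == n]

found = perfect_numbers(10_000)
print(found)                            # [6, 28, 496, 8128]
print([n for n in found if n % 2])      # odd ones in this range: []
```

Every perfect number the search finds is even, which is exactly the situation the question is about: the experiment is trivial to run, and it gives no hint of whether the pattern ever breaks.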
Just like one can enumerate things like axiom systems, so also one can imagine enumerating possible questions in mathematics. But if one does this, I suspect there’s immediately an issue. Gödel’s Theorem establishes that in axiom systems like the one for arithmetic there are “formally undecidable” propositions, that can’t be proved or disproved from within the axiom system.
But the particular examples that Gödel constructed seemed far from anything that would arise naturally in doing mathematics. And for a long time it was assumed that somehow the phenomenon of undecidability was something that, while in principle present, wasn’t going to be relevant in “real mathematics”.
However, with my Principle of Computational Equivalence and my experience in the computational universe, I’ve come to the strong conclusion that this isn’t correct—and that instead undecidability is actually close at hand even in typical mathematics as it’s been practiced. Indeed, I won’t be surprised if a fair fraction of the current famous unsolved problems of mathematics (Riemann Hypothesis, P=NP, etc.) actually turn out to be in effect undecidable.
But if there’s undecidability all around, how come there’s so much mathematics that’s successfully been done? Well, I think it’s because the things that have been done have implicitly been chosen to avoid undecidability, basically just by virtue of the way mathematics has been built up. Because if what one’s doing is basically to form progressive levels of abstraction based on things one has shown are true, one’s basically setting up a path that’s going to be able to move forward without being forced into undecidability.
Of course, doing experimental mathematics or asking “random questions” may immediately land one in some area that’s full of undecidability. But at least so far in its history, this hasn’t been the way the mainstream discipline of mathematics has evolved.
So what about those “random facts of mathematics”? Well, it’s pretty much like in other areas of intellectual endeavor. “Random facts” don’t really get integrated into a line of intellectual development until some structure—and typically some abstract concepts—are built around them.
Nested patterns are another example. There are isolated examples of these in mosaics from the 1200s, but nobody really paid attention to them until the whole framework around nesting and fractals emerged in the 1980s.
It’s the same story over and over again: until abstract concepts around them have been identified, it’s hard to really think about new things, even when one encounters phenomena that exhibit them.
And so, I suspect, it is with mathematics: there’s a certain inevitable layering of abstract concept on top of abstract concept that defines the trajectory of mathematics. Is it a unique path? Undoubtedly not. In the vast space of possible mathematical facts, there are particular directions that get picked, and built along. But others could have been picked instead.
So does this mean that the subject matter of mathematics is inevitably dominated by historical accidents? Not as much as one might think. Because—as mathematics has discovered over and over again, starting with things like algebra and geometry—there’s a remarkable tendency for different directions and different approaches to wind up having equivalences or correspondences in the end.
And probably at some level this is a consequence of the Principle of Computational Equivalence, and the phenomenon of computational universality: even though the underlying rules (or underlying “language”) used in different areas of mathematics are different, there ends up being some way to translate between them—so that at the next level of abstraction the path that was taken no longer critically matters.
The Logic Proof and the Automation of Abstraction
OK, so let’s go back to the logic proof. How does it connect to typical mathematics? Well, right now, it basically doesn’t. Yes, the proof has the same nominal form as a standard mathematical proof. But it isn’t “human-mathematician friendly”. It’s all just mechanical details. It doesn’t connect to higher-level abstract concepts that a human mathematician can readily understand.
It would help a lot if we discovered that nontrivial lemmas in the proof already appeared in the mathematics literature. (I don’t think any of them do, but our theorem-searching capabilities haven’t gotten to the point where one can be sure.) But if they did appear, then this would likely give us a way to connect these lemmas to other things in mathematics, and in effect to identify a circle of abstract concepts around them.
But without that, how can the proof become explainable?
Well, maybe there’s just a different way to do the proof that’s fundamentally more connected to existing mathematics. But even given the proof as we have it now, one could imagine “building out” new concepts that would define a higher level of abstraction and put the proof in a more general context.
I’m not sure how to do either of these things. I’ve considered sponsoring a prize (analogous to my 2007 Turing machine prize) for “making the proof explainable”. But it’s not at all clear how one could objectively judge “explainability”. (Maybe one could ask for a 1-hour video that would successfully explain the proof to a typical mathematician—but this is definitely rather subjective.)
But just like we can automate things like finding aesthetic layouts for networks, perhaps we can automate the process of making a proof explainable. The proof as it is right now basically just says (without explanation), “Consider these few hundred lemmas”. But let’s say we could identify a modest number of “interesting” lemmas. Maybe we could somehow add these to our canon of known mathematics and then be able to use them to understand the proof.
There’s an analogy here with language design. In building up the Wolfram Language what I’ve basically done is to try to identify “lumps of computational work” that people will often want. Then we make these into built-in functions in the language, with particular names that people can use to refer to them.
A similar process goes on—though in a much less organized way—in the evolution of human natural languages. “Lumps of meaning” that turn out to be useful eventually get represented by words in the language. Sometimes they start as phrases constructed out of a few existing words. But the most impactful ones are typically sufficiently far away from anything that has come before that they just arrive as new words with potentially quite-hard-to-give definitions.
In the design of the Wolfram Language—with functions named with English words—I leverage the “ambient understanding” that comes from the English words (and sometimes from their meanings in common applications of computation).
One would want to do something similar in identifying lemmas to add to our canon of mathematics. Not only would one want to make sure that each lemma was somehow “intrinsically interesting”, but one would also want when possible to select lemmas that are “easy to reach” from existing known mathematical results and concepts.
But what does it mean for a lemma to be “intrinsically interesting”? I have to say that before I worked on A New Kind of Science, I assumed that there was great arbitrariness and historical accident in the choice of lemmas (or theorems) in any particular areas of mathematics that get called out and given names in typical textbooks.
But when I looked in detail at theorems in basic logic, I was surprised to find something different. Let’s say one arranges all the true theorems of basic logic in order of their sizes (e.g. p = p might come first; p AND p = p a bit later, and so on). When one goes through this list there’s lots of redundancy. Indeed, most of the theorems end up being trivial extensions of theorems that have already appeared in the list.
But just sometimes one gets to a theorem that essentially gives new information—and that can’t be proved from the theorems that have already appeared in the list. And here’s the remarkable fact: there are 14 such theorems, and they essentially correspond exactly with the theorems that are typically given names in textbooks of logic. (Here AND is ∧, OR is ∨, and NOT is ¬.)
In other words, at least in this case, the named or “interesting” theorems are the ones that give minimal statements of new information. (Yes, after a while, by this definition there will be no new information, because one will have encountered all the axioms needed to prove anything that can be proved—though one can go a bit further with this approach by starting to discuss limiting the complexity of proofs that are allowed.)
What about with NAND theorems, like the ones in the proof? Once again, one can arrange all true NAND theorems in order—and then find which of them can’t be proved from any earlier in the list:
NAND doesn’t have the same kind of historical tradition as AND, OR and NOT. (And there doesn’t seem to be any human language that, for example, has a single ordinary word for NAND.) But in the list of NAND theorems, the first highlighted one is easy to recognize as commutativity of NAND. After that, one really has to do a bit of translation to name the theorems: is like the law of double negation, is like the absorption law, is like “weakening”, and so on.
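To make the idea of "arranging all true NAND theorems in order" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used for the lists discussed here: expressions are built over just two variables `a` and `b`, "size" is taken to be the number of NAND operators, truth is checked by brute-force truth tables, and the `·` used for printing is only a display choice. The harder step, deciding which theorems cannot be proved from earlier ones, is deliberately left out.

```python
from itertools import product

VARS = ("a", "b")   # assumption: two variables are enough for a small illustration

def expressions(ops):
    """Yield every NAND expression with exactly `ops` operators over VARS."""
    if ops == 0:
        yield from VARS
        return
    for left_ops in range(ops):
        for left in expressions(left_ops):
            for right in expressions(ops - 1 - left_ops):
                yield (left, right)          # a pair stands for left NAND right

def evaluate(expr, env):
    """Evaluate an expression under a variable assignment."""
    if isinstance(expr, str):
        return env[expr]
    left, right = expr
    return not (evaluate(left, env) and evaluate(right, env))

def truth_table(expr):
    """Values of expr over all assignments to VARS, in a fixed order."""
    return tuple(evaluate(expr, dict(zip(VARS, values)))
                 for values in product((False, True), repeat=len(VARS)))

def show(expr):
    """Render an expression with an infix dot for NAND."""
    if isinstance(expr, str):
        return expr
    return "(" + show(expr[0]) + "·" + show(expr[1]) + ")"

def true_equations(max_ops):
    """All pairs lhs = rhs of distinct expressions that agree on every assignment."""
    exprs = [e for k in range(max_ops + 1) for e in expressions(k)]
    tables = {e: truth_table(e) for e in exprs}
    return [(lhs, rhs)
            for i, lhs in enumerate(exprs)
            for rhs in exprs[i + 1:]
            if tables[lhs] == tables[rhs]]

theorems = true_equations(3)
for lhs, rhs in theorems[:5]:
    print(show(lhs), "=", show(rhs))
```

Commutativity, `(a·b) = (b·a)`, shows up among the smallest entries, and `a = ((a·a)·(a·a))`, the NAND form of double negation, appears once three operators are allowed.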
But, OK, so if one’s going to learn just a few “key theorems” of NAND logic, which should they be? Perhaps they should be theorems that appear as “popular lemmas” in proofs.
Of course, there are many possible proofs of any given theorem. But let’s say we just use the particular proofs that FindEquationalProof generates. Then it turns out that in the proofs of the first thousand NAND theorems the single most popular lemma is , followed by lemmas like .
What are these? Well, for the particular methods that FindEquationalProof uses, they’re useful. But for us humans they don’t seem terribly helpful.
But what about popular lemmas that happen to be short? is definitely not the most popular lemma, but it is the shortest. is more popular, but longer. And then there are lemmas like .
But how useful are these lemmas? Here’s a way to test. Look at the first thousand NAND theorems, and see how much adding the lemmas shortens the proofs of these theorems (at least as found by FindEquationalProof):
is very successful, often cutting down the proof by nearly 100 steps. is much less successful; in fact, it actually sometimes seems to “confuse” FindEquationalProof, causing it to take more rather than fewer steps (visible as negative values in the plot). is OK at shortening, but not as good as . Though if one combines it with , the result is more consistent shortening.
One could go on with this analysis, say including a comparison of how much shortening is produced by a given lemma, relative to how long its own proof was. But the problem is that if one adds several “useful lemmas”, like and , there are still plenty of long proofs—and thus a lot left to “understand”:
What Can One Understand?
There are different ways to create models of things. For a few hundred years, exact science was dominated by the idea of finding mathematical equations that could be solved to say how things should behave. But in pretty much the time since A New Kind of Science appeared, there’s been a strong shift to instead set up programs that can be run to say how things should behave.
Sometimes those programs are explicitly constructed for a particular purpose; sometimes they’re exhaustively searched for. And in modern times, at least one class of such programs is deduced using machine learning, essentially by going backwards from examples of how the system is known to behave.
OK, so with these different forms of modeling, how easy is it to “understand what’s going on”? With mathematical equations, it’s a big plus when it’s possible to find an “exact solution”—in which the behavior of the system can be represented by something like an explicit mathematical formula. And even when this doesn’t happen, it’s fairly common to be able to make at least some mathematical statements that are abstract enough to connect to other systems and other behaviors.
As I discussed above, with a program—like a cellular automaton—it can be a different story. Because it’s common to be thrust immediately into computational irreducibility, which ultimately limits how much one can ever hope to shortcut or “explain” what’s going on.
But what about with machine learning, and, say, with neural nets? At some level, the training of a neural net is like recapitulating inductive discovery in natural science. One’s trying to start from examples and deduce a model for how a system behaves. But then can one understand the model?
Again there are issues of computational irreducibility. But let’s talk about a case where we can at least imagine what it would look like to understand what’s going on.
Instead of using a neural net to model how some system behaves, let’s consider making a neural net that classifies some aspect of the world: say, takes images and classifies them according to what they’re images of (“boat”, “giraffe”, etc.). When we train the neural net, it’s learning to give correct final outputs. But potentially one can think of the way it does this as being to internally make a sequence of distinctions (a bit like playing a game of Twenty Questions) that eventually determines the correct output.
But what are those distinctions? Sometimes we can recognize some of them. “Is there a lot of blue in the image?”, for example. But most of the time they’re essentially features of the world that we humans don’t notice. Maybe there’s an alternative history of natural science where some of them would have shown up. But they’re not things that are part of our current canon of perception or analysis.
If we wanted to add them, we’d probably end up inventing words for them. But the situation is very similar to the one with the logic proof. An automated system has created things that it’s effectively using as “waypoints” in generating a result. But they’re not waypoints we recognize or relate to.
Once again, if we found that particular distinctions were very common for neural nets, we might decide that those are distinctions that are worth us humans learning, and adding to our standard canon of ways to describe the world.
Can we expect that a modest number of such distinctions would go a long way? It’s analogous to asking whether a modest number of theorems would go a long way in understanding something like the logic proof.
My guess is that the answer is fuzzy. If one looks, for example, at a large corpus of math papers, one can ask how common different theorems are. It turns out that the frequency of theorems follows an almost perfect Zipf law (with the Central Limit Theorem, the Implicit Function Theorem and Fubini’s Theorem as the top three). And it’s probably the same with distinctions that are “worth knowing”, or new theorems that are “worth knowing”.
Knowing a few will get one a certain distance, but there’ll be an infinite power-law tail, and one will never get to the end.
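A quick calculation shows what "a certain distance" looks like under an idealized Zipf law. The Python sketch below assumes frequencies exactly proportional to 1/rank, which is a simplification of any real corpus, and the function names are chosen here for illustration.

```python
def harmonic(n):
    """Sum of 1/k for k = 1..n: the total weight of the n most common items under Zipf."""
    return sum(1.0 / k for k in range(1, n + 1))

def zipf_coverage(top, total):
    """Fraction of all occurrences accounted for by the `top` most common of `total` items."""
    return harmonic(top) / harmonic(total)

for total in (1_000, 100_000, 1_000_000):
    print(total, round(zipf_coverage(10, total), 3))
```

With a thousand distinct theorems, the ten most common account for roughly 39% of all citations; with a million, that drops to roughly 20%. Because the harmonic sum grows without bound, the share covered by any fixed set keeps shrinking as the tail gets longer, which is the sense in which one never gets to the end.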
The Future of Knowledge
Whether one looks at mathematics, science or technology, one sees the same basic qualitative progression of building a stack of increasing abstraction. It would be nice to be able to quantify this process. Perhaps one could look at how certain terms or descriptions that are common at one time later get subsumed into higher levels of abstraction, which then in turn have new terms or descriptions associated with them.
Maybe one could create an idealized model of this process using some formal model of computation, like Turing machines. Imagine that at the lowest level one has a basic Turing machine, with no abstraction. Now imagine selecting programs for this Turing machine according to some defined random process. Then run these programs and analyze them to see what “higher-level” model of computation can successfully reproduce the aggregate behavior of these programs without having to run each step in each program.
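The first half of that thought experiment is easy to set up. Below is a minimal Python sketch under several assumptions made purely for illustration: machines have 2 states and 2 colors, the "defined random process" is uniform sampling of transition tables from a seeded generator, and the aggregate quantity measured is how many distinct tape cells the head visits. Finding a higher-level model that predicts such aggregates without running the machines is the open part, and nothing here attempts it.

```python
import random

def random_machine(rng, states=2, colors=2):
    """Sample a transition table: (state, color) -> (new state, new color, move)."""
    return {
        (s, c): (rng.randrange(states), rng.randrange(colors), rng.choice((-1, 1)))
        for s in range(states)
        for c in range(colors)
    }

def cells_visited(machine, steps):
    """Run from a blank tape and count the distinct cells the head touches."""
    tape = {}                      # sparse tape; missing cells are color 0
    state, pos = 0, 0
    visited = {0}
    for _ in range(steps):
        new_state, new_color, move = machine[(state, tape.get(pos, 0))]
        tape[pos] = new_color
        state = new_state
        pos += move
        visited.add(pos)
    return len(visited)

def aggregate(num_machines=200, steps=100, seed=0):
    """Mean, minimum and maximum of cells visited over a random sample of machines."""
    rng = random.Random(seed)
    counts = [cells_visited(random_machine(rng), steps) for _ in range(num_machines)]
    return sum(counts) / len(counts), min(counts), max(counts)

mean, low, high = aggregate()
print(mean, low, high)
```

Individual machines in the sample can behave very differently, but the summary statistics are stable from one large sample to the next, and it is only those statistics that a higher-level description would need to reproduce.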
One might have thought that computational irreducibility would imply that this higher-level model of computation would inevitably be more complicated in its construction. But the key point is that we’re only trying to reproduce the aggregate behavior of the programs, not their individual behavior.
OK, but so then what happens if you iterate this process—essentially recapitulating idealized human intellectual history and building a progressive tower of abstraction?
Conceivably there’s some analogy to critical phenomena in physics, and to the renormalization group. And if so, one might imagine being able to identify a definite trajectory in the space of what amount to concept representation frameworks. What will the trajectory do?
Maybe it’ll have some kind of fixed-point behavior, representing the guess that at any point in history there are about the same number of abstract concepts that are worth learning—with new ones slowly being invented, and old ones being subsumed.
What might any of this mean for mathematics? One guess might be that any “random fact of mathematics”, say discovered empirically, would eventually be covered when some level of abstraction is reached. It’s not obvious how this process would work. After all, at any given level of abstraction, there are always new empirical facts to be “jumped to”. And it might very well be that the “rising tide of abstraction” would move only slowly compared to the rate at which such jumps could be made.
The Future of Understanding
OK, so what does all this mean for the future of understanding?
In the past, when humans looked, say, at the natural world, they had few pretensions to understand it. Sometimes they would personify certain aspects of it in terms of spirits or deities. But they saw it as just acting as it did, without any possibility for humans to understand in detail why.
But with the rise of modern science—and especially as more of our everyday existence came to be in built environments dominated by technology (or regulatory structures) that we had designed—these expectations changed. And as we look at computation or AI today, it seems unsettling that we might not be able to understand it.
But ultimately there’s always going to be a competition between what the systems in our world do, and what our brains are capable of computing about them. If we choose to interact only with systems that are computationally much simpler than our brains, then, yes, we can expect to use our brains to systematically understand what the systems are doing.
But if we actually want to make full use of the computational capabilities that our universe makes possible, then it’s inevitable that the systems we’re dealing with will be equivalent in their computational capabilities to our brains. And this means that—as computational irreducibility implies—we’ll never be able to systematically “outthink” or “understand” those systems.
But then how can we use them? Well, pretty much like people have always used systems from the natural world. Yes, we don’t know everything about how they work or what they might do. But at some level of abstraction we know enough to be able to see how to get purposes we care about achieved with them.
What about in an area like mathematics? In mathematics we’re used to building our stack of knowledge so that each step is something we can understand. But experimental mathematics—as well as things like automated theorem proving—make it clear that there are places to go that won’t have this feature.
Will we call this “mathematics”? I think we should. But it’s a different tradition from what we’ve mostly used for the past millennium. It’s one where we can still build abstractions, and we can still construct new levels of understanding.
But somewhere underneath there will be all sorts of computational irreducibility that we’ll never really be able to bring into the realm of human understanding. And that’s basically what’s going on in the proof of my little axiom for logic. It’s an early example of what I think will be the dominant experience in mathematics—and a lot else—in the future.
Is there a global theory for the shapes of fishes? It’s the kind of thing I might feel encouraged to ask by my explorations of simple programs and the forms they produce. But for most of the history of biology, it’s not the kind of thing anyone would ever have asked. With one notable exception: D’Arcy Wentworth Thompson.
And it’s now 100 years since D’Arcy Thompson published the first edition of his magnum opus On Growth and Form—and tried to use ideas from mathematics and physics to discuss global questions of biological growth and form. Probably the most famous pages of his book are the ones about fish shapes:
Stretch one kind of fish, and it looks like another. Yes, without constraints on how you stretch, it’s not quite clear what this is telling one, and I don’t think it’s much. But just to ask the question is interesting, and On Growth and Form is full of interesting questions—together with all manner of curious and interesting answers.
D’Arcy Thompson was in many ways a quintessential British Victorian academic, steeped in the classics, and writing books with titles like A Glossary of Greek Fishes (i.e. how were fish described in classical Greek texts). But he was also a diligent natural scientist, and he became a serious enthusiast of mathematics and physics. And where Aristotle (whom D’Arcy Thompson had translated) used plain language, with perhaps a dash of logic, to try to describe the natural world, D’Arcy Thompson tried to use the language of mathematics and physics.
At Christmas time, according to his daughter, he used to entertain children by drawing pictures of dogs on rubber sheets and stretching them from poodles to dachshunds. But it was not until the age of 57 that he turned such pursuits into the piece of scholarship that is On Growth and Form.
The first edition of the book was published in 1917. In many ways it’s like a catalog of biological forms—a kind of geometrical analog of Aristotle’s books on natural history. It’s particularly big on aquatic life—from plankton to fish. Land animals do make a showing, though mostly as skeletons. And ordinary plants make only specific appearances. But throughout the book the emphasis is on “why does such-and-such a thing have the form or shape it does?”. And over and over again the answer that’s given is: “because it’s following such-and-such a physical phenomenon, or mathematical structure”.
Much of the story of the book is told in its pictures. There are growth curves—of haddock, trees, regenerated tadpole tails, etc. There’s a long discussion of the shapes of cells—and especially their connection with phenomena (like splashes, bubbles and foams) where surface tension is important. There are spirals—described mathematically, and appearing in shells and horns and leaf arrangements. And finally there’s a long discussion of the “theory of transformations”—about how different forms (like the shapes of fishes or primate skulls) might be related by various (mathematically rather undefined) “transformations”.
In D’Arcy Thompson’s time—as still to a large extent today—the dominant form of explanation in biology is Darwinism: essentially the idea that things are the way they are because they’ve somehow evolved to be that way, in order to maximize some kind of fitness. D’Arcy Thompson didn’t think that was the whole story, or even necessarily the most important part of the story. He thought instead that many natural forms are the way they are because it’s an inevitable feature of the physics of biological tissue, or the mathematics of geometrical forms.
Sometimes his explanations fall a little flat. Leaves aren’t really shaped much like polar plots of trigonometric functions. Jellyfish aren’t convincingly shaped like drops of ink in water. But what he says often rings true. Hexagonal arrangements of cells are like closest geometrical packings of disks. Sheep horns and nautilus shells form logarithmic (equiangular) spirals.
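The word "equiangular" has a precise meaning that is easy to check numerically: along a logarithmic spiral r = a·e^(bθ), the angle between the line from the center and the tangent to the curve is the same everywhere, namely arctan(1/b). Here is a minimal Python sketch of that check. The parameter values `a = 1.0` and `b = 0.2` are arbitrary choices for illustration, not measurements of any shell or horn, and the tangent is estimated by a simple finite difference.

```python
import math

def spiral_point(theta, a=1.0, b=0.2):
    """Cartesian point on the logarithmic spiral r = a * exp(b * theta)."""
    r = a * math.exp(b * theta)
    return r * math.cos(theta), r * math.sin(theta)

def radius_tangent_angle(theta, a=1.0, b=0.2, h=1e-6):
    """Angle in degrees between the radius vector and the tangent at `theta`."""
    x, y = spiral_point(theta, a, b)
    x0, y0 = spiral_point(theta - h, a, b)
    x1, y1 = spiral_point(theta + h, a, b)
    tx, ty = x1 - x0, y1 - y0                     # central-difference tangent direction
    cos_angle = (x * tx + y * ty) / (math.hypot(x, y) * math.hypot(tx, ty))
    return math.degrees(math.acos(cos_angle))

for theta in (0.5, 2.0, 7.0, 15.0):
    print(theta, round(radius_tangent_angle(theta), 4))

print(round(math.degrees(math.atan(1 / 0.2)), 4))  # the predicted constant angle
```

The printed angle does not change as the spiral winds outward, which is why a shell that grows by adding material in fixed proportion keeps the same shape at every size.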
He uses basic geometry and algebra quite a bit—and even sometimes a little combinatorics or topology. But he never goes as far as calculus (and, as it happens, he never learned it), and he never considers ideas like recursive rules or nested structures. But for me—as for quite a few others over the years—D’Arcy Thompson’s book is an important inspiration for the concept that even though biological forms may at first look complicated, there can still be theories and explanations for them.
In modern times, though, there’s a crucial new idea, that D’Arcy Thompson did not have: the idea of using not traditional mathematics and physics, but instead computation and simple programs as a way to describe the rules by which things grow. And—as I discovered in writing my book A New Kind of Science—it’s remarkable to what extent that idea lets us understand the mechanisms by which complex biological forms are produced, and lets us finish the bold initiative that D’Arcy Thompson began a century ago in On Growth and Form.
Who Was D’Arcy Thompson?
D’Arcy Wentworth Thompson was born in Edinburgh on May 5, 1860. His father, who was also named D’Arcy Wentworth Thompson, had been born in 1829, aboard a ship captained by his father, that was transporting convicts to Tasmania. D’Arcy Senior was soon sent to boarding school in England, and eventually studied classics at Cambridge. Though academically distinguished, he was apparently passed over for a fellowship because of perceived eccentricity—and wound up as a (modernizing, if opinionated) schoolteacher in Edinburgh. Once there, he soon met the lively young Fanny Gamgee, daughter of Joseph Gamgee, an early and distinguished veterinary surgeon—and in 1859 they were married.
D’Arcy (junior) was born the next year—but unfortunately his mother contracted an infection during childbirth, and died within the week. The result was that D’Arcy (junior) ended up living with his mother’s parents, taken care of by one of his mother’s sisters. When D’Arcy (junior) was three years old, his father then got a university professorship (of ancient Greek) in Ireland, and moved there. Still, D’Arcy (junior) stayed in close touch with his father through letters, and, later, visits. And indeed his father seems to have doted on him, for example publishing two children’s books dedicated to him:
In a foreshadowing of his later interests, D’Arcy (junior) learned Latin from his father almost as soon as he was able to speak, and was continually exposed to animals of all sorts in the Gamgee household. There was also a certain math/physics theme. D’Arcy Thompson (senior)’s best friend in Edinburgh was Peter Guthrie Tait—a distinguished mathematical physicist (mechanics, thermodynamics, knot theory, …) and friend of Maxwell, Hamilton and Kelvin—and D’Arcy (junior) often hung out at his house. Joseph Gamgee was also engaged in various scientific pursuits, for example publishing the book On Horseshoeing and Lameness based in part on a statistical study he’d done with the then 10-year-old D’Arcy (junior). Meanwhile, D’Arcy Thompson (senior) began to travel, as D’Arcy (junior) would later do, for example visiting Harvard in 1867 to give the Lowell Lectures—which D’Arcy (junior) would also give, in 1936, 69 years later.
At the age of 11, D’Arcy went to the school where his father had previously taught. He did well in academic studies, but also organized a natural history (“Eureka”) club, where he and his friends collected all sorts of specimens. And by the end of his time at school, he published his first paper: the 11-page (with photographs) “Note on Ulodendron and Halonia”, describing the regular pattern of growth scars on two kinds of fossil plants.
At 18, D’Arcy started at Edinburgh University as a medical student. His grandfather—while distinguished—was not wealthy, with the result that D’Arcy had to support himself by tutoring Greek and writing articles for the Edinburgh-published Encyclopedia Britannica (the 9th edition, from 1889, contains an extensive article by D’Arcy on John Ray, a British naturalist of the 1600s). But D’Arcy’s real passion at the time was the then-hot field of paleontology, and after two years he abandoned his medical studies—and left to instead study Natural Science at the place his father had been years earlier: Trinity College, Cambridge.
D’Arcy did well at Cambridge, had an interesting circle of friends (including the future co-author of Principia Mathematica, Alfred North Whitehead), and quickly became something of a fixture in the local natural history scene. This led Macmillan & Co. to commission D’Arcy (still an undergraduate) to produce his first book: a translation from German of Hermann Muller’s The Fertilisation of Flowers. The publisher thought that the book—which was a fairly traditional work of descriptive natural history, based in part on observing about 14,000 visits of insects to flowers—would be of popular interest, and (in one of his last published appearances) got no less than Charles Darwin to write a preface for it:
At Cambridge, D’Arcy hung out a lot at the new Museum of Zoology, and was particularly influenced by a young professor named Frank Balfour who studied comparative embryology, and for whom a new Department of Animal Morphology was being created—but who died trying to climb Mont Blanc right when D’Arcy was finishing Cambridge.
D’Arcy began to pursue all sorts of projects, giving lectures on topics such as “Aristotle on Cephalopods”, and making detailed studies of “hydroid zoophyte” specimens (aquatic animals like sea anemones that look like plants) brought back from expeditions to the Arctic and Antarctic. He applied for a fellowship in Cambridge, but—like his father before him—didn’t get it.
In 1884, though, the newly created and new-thinking (non-religious, co-ed, young professors, …) University College in Dundee, Scotland, advertised for a professor of biology (yes, combining zoology and botany!). D’Arcy applied, and got the job—with the result that at age 24 he became a professor, a role in which he would remain for nearly 64 years.
D’Arcy the Professor
D’Arcy was immediately popular as a teacher, and continued to do a certain amount of rather dry academic work (in 1885 he published A Bibliography of Protozoa, Sponges, Coelenterata, and Worms, which was, as advertised, a list of about 6000 publications on those subjects between 1861 and 1883). But his real passion was the creation of his own Museum of Zoology, and the accumulation of specimens for it.
He was soon excitedly writing that “within the last week, I have had a porpoise, two mongooses, a small shark, an eel 8ft long… a young ostrich and two bagfuls of monkeys: all dead of course.” His archive (among its 30,000 items) contains extensive evidence of all sorts of trading of specimens from around the world:
But in Dundee he found a particularly good local source of specimens. Dundee had long been a center of international textile trade, and had also developed a small whaling industry. And when it was discovered that jute could be turned into fabric by mixing it with whale oil, whaling in Dundee grew dramatically.
Some of the hunting they did was local. But whaling ships from Dundee went as far as Canada and Greenland (and once even to Antarctica). And befriending their captains, D’Arcy persuaded them to bring him back specimens (as skeletons, in jars, etc.) from their expeditions—with the result, for example, that his museum rapidly accumulated the best arctic collection around.
The museum always operated on a shoestring budget, and it was typical in 1886 when D’Arcy wrote that he’d personally been “working all day on a baby Ornithorhynchus” (platypus). In his early years as a professor, D’Arcy published only a few papers, mostly on very detailed matters—like the strangely shaped stomach of a type of possum, or the structure of the porpoise larynx, or the correct taxonomic placement of a duck-like dinosaur. And he always followed the prevailing Darwinian paradigm of trying to explain things either by their evolutionary connections, or by their fitness for a particular function.
The Matter of the Alaskan Seals
In Dundee, D’Arcy joined various local clubs, like the Dundee Naturalists’ Society, the Dundee Working Men’s Field Club, the Homeric Club, and, later, also the Freemasons. He became quite active in university and community affairs, notably campaigning for a medical school (and giving all sorts of statistical evidence for its utility), as well as for education for the local poor. But mostly D’Arcy lived the life of an academic, centered around his teaching and his museum.
Still, as a responsible member of the community, he was called on in various ways, and in 1892, he joined his first government commission—formed to investigate a plague of voles in Scotland (conclusions included: “don’t shoot hawks and owls that eat voles”, and “it’s probably not a good idea to set loose a ‘virus’ to infect the voles”). Then in 1896—at the age of 36—D’Arcy was tapped for a piece of international scientific diplomacy.
It all had to do with seals, and the fur trade based on them. When Russia sold Alaska to the US in 1867 it also sold the rights to the seals which bred on some of the islands in the Bering Sea. But by the 1890s Canadian ships (under British protection) were claiming the right to catch seals in the open ocean—and too many seals were being killed for the population to be maintained. In 1893 a treaty was made to clarify the situation. But in 1896 there was a need to analyze more carefully what was going on (and, yes, to claim what ended up being $10M in damages for Canadian/British sealers).
Lord Salisbury, the British Prime Minister at the time, who happened to be an amateur botanist, knew of D’Arcy and asked him to travel to the Bering Sea to investigate. D’Arcy had by that point traveled a bit around Europe, but this was a complex trip. First he went to Washington, DC, dropping in at the White House. Then he went across Canada, and then by Coast Guard ship (and dog sled) to the seals.
D’Arcy did well at making friends with his American counterparts (who included the president of the then-a-decade-old Stanford University), and found that at least on the American-controlled islands (the Russian-controlled ones were a different story) seals were being herded a bit like sheep in Scotland, and that though there was “abundant need for care and prudent measures of conservation”, things were basically OK. In Washington, DC, D’Arcy gave a long speech, and helped broker a kind of “seal peace treaty”—that the British government was pleased enough with to give D’Arcy a (medieval-inspired) “Companion of the Bath” honor.
Statesman of Science
Being a professor in Dundee wasn’t a particularly high position in the pecking order of the time. And after his Bering Sea experience, D’Arcy started investigating moving up. He applied for various jobs (for example at the Natural History Museum in London), but perhaps in part because he didn’t have fancier academic credentials (like a PhD)—and also had spent so much of his time organizing things rather than doing research—he never got any of them.
He was nevertheless increasingly sought after as a kind of statesman of science. And in 1898 he was appointed to the Fishery Board for Scotland (a role in which he continued for 43 years), and the next year he was the British delegate to the first International Conference on Oceanography.
D’Arcy was a serious collector of data. He maintained a team of people at the fish market, keeping track of the catches brought in from boats:
And then he took this data and created graphics and statistical analyses:
And over the years he became well known as a negotiator of fishing rights, both locally and internationally. He was also a collector of oceanographic data. He saw to it that there were detailed tide measurements made:
And had the data analyzed and decomposed into harmonic components—much as it is today:
The Scottish government even provided him with a research ship (a steam trawler named the SS Goldseeker) in which he and his students would go around the Scottish coast, measuring ocean properties and collecting specimens.
D’Arcy the Classical Scholar
D’Arcy always had many interests. First and foremost was natural history. But after that came classics. And indeed, back in his undergraduate days, D’Arcy had already started working with his classicist father on translating Aristotle’s works on natural history into English.
One of the complexities of that task, however, was to know what species Aristotle meant by words he used in Greek. And this led D’Arcy into what became a lifelong project—the first output of which was his 1894 book Glossary of Greek Birds:
It’s an interesting exercise—trying to fit together clues to deduce just what modern bird some passage in classical Greek literature was talking about. Often D’Arcy succeeds. Sometimes by using natural history; sometimes by thinking about mythology or about configurations of things like constellations named for birds. But sometimes D’Arcy just has to describe something as “a remarkable bird, of three varieties, of which one croaks like a frog, one bleats like a goat, and the third barks like a dog”—and he doesn’t know the modern equivalent.
Over the years, D’Arcy continued his efforts to translate Aristotle, and finally in 1910 (8 years after his father’s death) he was able to publish what remains to this day the standard translation of Aristotle’s main work on zoology, his History of Animals.
This project established D’Arcy as a classical scholar—and in 1912 he even got an honorary PhD (D.Litt.) at Cambridge on the basis of it. He also began a long association with what’s known as Liddell & Scott, the still-standard dictionary of ancient Greek. (Liddell had been notable for being the father of Alice, of Wonderland fame.)
But D’Arcy’s interests in Greek science extended beyond natural history, and into astronomy and mathematics. D’Arcy explored such things as ancient methods for computing square roots—and also studied Greek geometry.
So in 1889, when D’Arcy was investigating Foraminifera (protozoa that live in sediment or in the ocean and often form spiral shells) he was able to bring his knowledge of Greek mathematics to bear, declaring that “I have taken to Mathematics… and discovered some unsuspected wonders in regard to the Spirals of the Foraminifera!”.
Towards Something Bigger
When he was 41, in 1901, D’Arcy married his stepmother’s niece, the then 29-year-old Ada Maureen Drury (yes, small world that it is, she was named after “Byron’s” Ada, because an ancestor had reputedly been a romantic interest of Byron’s). They bought a small house somewhat outside of town—and between 1902 and 1910 they had three children, all daughters.
By 1910, D’Arcy was 50 years old, and an elder statesman of science. He kept himself busy teaching, managing his museum, doing administrative and government work, and giving public lectures. A typical lecture—given at Oxford in 1913—was entitled “On Aristotle as a Biologist”. It was charming, eloquent, ponderous and Victorian:
In many ways, D’Arcy was first and foremost a collector. He collected natural history specimens. He collected Greek words. He collected academic references—and antiquarian books. And he collected facts and statements—many of which he typed onto index cards, now to be found in his archive:
Still, in his role as elder statesman, D’Arcy was called upon to make broad pronouncements. And in many ways the great achievement of the later part of his life was to connect the disparate things he collected and identify common themes that could connect them.
In 1908 he had published (in Nature) a 2-page paper entitled “On the Shapes of Eggs and the Causes Which Determine Them”. In a sense the paper was about the physics of egg formation. And what was significant was that instead of accounting for different egg shapes in terms of their evolutionary fitness, it talked about the physical mechanisms that could produce them.
Three years later D’Arcy gave a speech entitled “Magnalia Naturae: or the Greater Problems of Biology” in which he took this much further, and started discussing “the possibility of… supporting the observed facts of organic form on mathematical principles [so as to make] morphology… a true natural science… justified by its relation to mathematics”.
In 1910, Cambridge University Press had asked D’Arcy if he’d like to write a book about whales. He said that instead perhaps he should write a “little book” about “The Forms of Organisms” or “Growth and Form”—and he began the process of assembling what would become On Growth and Form. The book had elements that drew on D’Arcy’s whole range of interests. His archives contain some of what went into the assembly of the book, like the original drawings of fish-shape transformations (D’Arcy wasn’t a great sketch artist):
There were also other, more impressionistic images—like the one illustrating transformations between zebra-related animals (quagga, etc.) or one showing tortoise (?) shell structure:
D’Arcy didn’t contact his publisher again for several years, but in 1915—in the middle of World War I—he wrote them again, saying that he had finally finished the book “on a larger scale”, and soon signed a publishing contract (that’s shockingly similar to the way modern ones look):
It took a couple more years, between D’Arcy’s last-minute changes, and paper shortages associated with the war—but finally in 1917 the book (which had by then swelled to 800 pages) was published.
The Book
On Growth and Form opens with a classic D’Arcy “Prefatory Note”: “This book of mine has little need of preface, for indeed it is ‘all preface’ from beginning to end.” He goes on to apologize for his lack of mathematical skill—and then launches, beginning with a discussion of the relation of the philosophies of Kant and Aristotle on the nature of science.
The reviews were positive, and surprisingly sensible, with the Times Literary Supplement for example writing:
Further into Mathematics
D’Arcy was 57 years old by the time On Growth and Form was published—and he could have used it as a closing act in his career. But instead it seemed to make him more energetic—and seemed to encourage him to take mathematical methods as a kind of personal theme.
In his study of the shapes of biological cells, D’Arcy had gotten very interested in polyhedra and packings, and particularly in Archimedean solids (such as the tetrakaidecahedron). His archives contain all sorts of investigations of possible packings and their properties, together with actual cardboard polyhedra, still ready to assemble:
D’Arcy extended his interest in number theory, collecting properties of numbers a little like he’d collected so many other things:
He dipped into chemistry, thinking about it in terms of graphs, like those derived from polyhedra:
And even when he worked on history, D’Arcy used mathematical thinking, here studying the distribution of when famous people lived, in connection with writing about the Golden Ages:
As an administrator he brought in math as well, here analyzing what in today’s world would be called a grading curve—and comparing exam results between different years:
He worked extensively on tides and tide computations. He collected data from harbors. And came up with theories about the various components of tides, some of which turned out to be correct:
The mathematics he used was always a bit circumscribed—and for example he never learned calculus, even to the point of apparently getting confused about growth rates versus finite differences in plots in On Growth and Form. (There seems to be just a single sheet of calculus-like work by him in his archives, and it’s simply an exercise copied without solution from the famous Whittaker & Watson textbook.)
And back from 1897 there’s a curious cardboard object that D’Arcy described as a “reasoning machine”:
It’s not completely clear what this was (though its wheel still turns nicely!). It seemed to involve a diagrammatic way of determining the truth value of a logical expression, perhaps following the work of Jevons from a couple of decades earlier. But so far as I can tell it was D’Arcy’s sole excursion into the world of logic and rule-based processes—and he never connected anything like this to biology.
The Later D’Arcy
Before On Growth and Form, D’Arcy had published only quite sporadically. But after it, as he entered his sixties, he began to write prodigiously, publishing all over the place on a wide range of topics. He gave lectures, in person and on the radio. And he also began to receive all sorts of honors (he became Sir D’Arcy in 1937)—and was invited to events all over the world (he did a grand tour of the US in the 1930s, and was also received as a celebrity in places like the Soviet Union).
On Growth and Form was considered a commercial success. Its original print run was 500 copies (of which at least 113 are now in academic libraries around the world), and by 1923 it had sold out. The publisher (Cambridge University Press) wanted to reprint it. But D’Arcy insisted that it needed to be revised—and in the end it took until 1942 before he got the revisions done. The second edition added 300 pages to the book—including photographs of splashes (obtained directly from Harold Edgerton at MIT), analysis of teeth, and patterns on animal coats. But the main elements of the book remained exactly the same.
D’Arcy had published a second edition of his Glossary of Greek Birds in 1936 (more birds, more interpretations), and in 1947, based on notes he started collecting in 1879, he released a kind of sequel: his Glossary of Greek Fishes. (Oxford University Press, in the flap copy for the book, says charmingly that “… it is highly improbable that there is any other scholar who has studied Greek fishes over so long a period as Sir D’Arcy Thompson…”.)
Even into his eighties, D’Arcy continued to travel all over the place—with his archives containing some typical travel documents of the time:
His travel was interrupted by World War II (which is perhaps why the second edition of On Growth and Form finally got finished in 1942). But in 1947, with the war over, at the age of 87, D’Arcy went to India for several months, notably lecturing on the skeletal structure of birds while holding a somewhat impatient live hen in a box. But in India D’Arcy’s health began to fail, and after returning to Scotland, he died in June 1948—to the last corresponding about specimens for his museum.
Aftermath
D’Arcy’s wife (who seemed in frail health through much of her 47-year marriage to D’Arcy) lived on for only 7 months after his death. None of D’Arcy’s daughters ever married. His oldest daughter Ruth became a music teacher and administrator at a girls’ boarding school, and in 1958 (when she was 56) published a biography of D’Arcy.
His middle daughter Molly moved to South Africa, wrote children’s and travel books, and lived to the age of 101, dying in 2010—while his youngest daughter Barbara wrote a book on healing and herbalism and died in a freak river accident in 1990.
On Growth and Form was D’Arcy’s most notable output, and it has been reprinted many times over the course of a hundred years. The museum D’Arcy created in Dundee was largely dismantled in the 1950s, but has now been to some extent reconstituted, complete with some of the very specimens D’Arcy collected, with labels duly signed “DWT” (yup, that’s me next to the same orangutan as in the old picture of the museum):
In 1917 D’Arcy moved from Dundee to the nearby but more distinguished and ancient university in St Andrews, where he took over another museum. It too fell upon hard times, but still exists in a reduced form.
And now some of D’Arcy’s specimens are being 3D-scanned (yes, that’s the same crocodile):
And on a main street in St Andrews there’s still a plaque where D’Arcy lived:
What Was D’Arcy Like?
D’Arcy had an imposing physical presence. He stood 6’3” and had a large head, on which he often wore a black fedora. He had piercing blue eyes, and in his youth, he had red hair, which he grew into a large beard when he was a young professor. He often wore a long coat, which could sometimes seem moth-eaten. Later in his life, he would sometimes walk around town with a parrot on his shoulder.
He was renowned as an engaging speaker and lecturer—known both for his colorful and eloquent content (he could regale the audience with the tale of a walrus he had known, or equally well discuss Aristotle’s year by the seaside), and for the various physical (and biological) demonstrations he would use. Many stories are told of his eccentricities, especially by his former students. It is said, for example, that he once came to give a lecture to his students which began with him pulling a dead frog out of one pocket of his coat—and then a live one out of the other pocket. Despite having spent most of his life in Scotland, he didn’t have a Scottish accent.
He was charming and jolly, and even in his eighties he was given to dancing when he could. He was tactful and diplomatic, if not particularly good at sensing other people’s opinions. He presented himself with a certain modesty (for example always expressing his weakness in mathematics), and—perhaps to his detriment—did little to advocate for himself.
He led a fairly simple life, centered around his work and family. He worked hard, typically until midnight each day. He always liked to learn. He enjoyed children and the young, and would happily play with them. When he walked around town, he was universally recognized (the shoulder parrot helped!). He was happy to chat with anyone, and in later years, he carried candy in his pocket, which he gave out to children he encountered.
D’Arcy was a product of his age, but also of an unusual combination of influences. Like many of the members of his adopted family, D’Arcy aspired to be a scientist. But like his father, he aspired to be a classical scholar. He did diligent and detailed academic work for many years, in natural history, in classics, and in ancient science. But he also enjoyed presentation and lecturing. And it was in large part through his efforts to explain his academic work that he came to make the connections that would lead to On Growth and Form.
What Happened After
If you search the scientific literature today, you’ll find about 4000 publications citing On Growth and Form. Their number relative to the total scientific literature has remained remarkably even over the years (with a peak around the publication of the second edition in 1942, and perhaps a dip in the 1960s when genetics began to dominate biology):
There’s quite a diversity in the topics, as this random sample of titles indicates:
Most concern specific biological systems; some are more general. Making word clouds from titles by decade, one sees that “growth” is the dominant theme—though centered in the 1990s there are signs of the discussion that was going on about the “philosophy of evolution”, and the interplay between natural selection and “developmental constraints”:
On Growth and Form has never really become mainstream in biology—or any other field. (It didn’t help that by the 1930s, biology was firmly going off in the direction of biochemistry and later molecular biology.) So how have people found out about On Growth and Form?
Indeed, as I write this, I’m wondering: how did I myself find out about On Growth and Form? I can tell I knew about it by 1983, because I referenced it (somewhat casually) in my first long paper about cellular automata and the patterns they generate. I also know that in 1982 I bought a copy of the (heavily abridged) version of On Growth and Form that was available then. (I was thrilled in 1992 when I chanced upon a complete second edition of On Growth and Form in a used bookstore; I’d never seen the whole book before.)
But how did I first become aware of D’Arcy, and On Growth and Form? My first hypothesis today was that it was in 1977, from the historical notes of Benoit Mandelbrot’s Fractals book (yes, D’Arcy had actually used the term “self-similar”, though only in connection with spirals). Then I thought perhaps it might have been around 1980, from the references to Alan Turing’s 1952 paper on the chemical basis of morphogenesis. I wondered if perhaps it was from hearing about catastrophe theory, and the work of René Thom, in the mid-1970s. But my best guess as of now is that it was actually around 1978, from a little book called Patterns in Nature, by a certain Peter S. Stevens, that heavily references On Growth and Form, and that I happened across in a bookstore.
I’ve almost never seen mentions of Patterns in Nature, but in some ways it’s a simplified and modernized On Growth and Form, full of photographs comparing biological and non-biological systems, together with diagrams about how various structures can be built. But what was the path from D’Arcy to Patterns in Nature? It’s a typical kind of history question that comes up.
The first thing I noticed is that Peter Stevens (born 1936) was trained as an architect, and spent most of his career around Harvard. In his book, he thanks his father, Stanley Stevens (1906–1973), who was a psychoacoustics expert, who was at Harvard from 1932 on, and who organized a “Science of Science” interdisciplinary discussion group there. But recall that D’Arcy visited Harvard to give the Lowell Lectures in 1936. So that’s no doubt how Stevens, Sr. knew about him.
But in any case, from his Harvard connections came, I believe, the references to D’Arcy by evolutionary biologist Stephen J. Gould, and by John Tyler Bonner, who was the person who created the abridged version of On Growth and Form (sadly, omitting for example the chapter on phyllotaxis). I suspect D’Arcy’s influence on Buckminster Fuller also came through Harvard connections. And maybe Benoit Mandelbrot heard about D’Arcy there too. (One would think that with On Growth and Form being out there as a published book, there wouldn’t be a need for word-of-mouth communication, but particularly outside of mainstream areas of science, word of mouth remains surprisingly important.)
But what about Turing? How did he know about D’Arcy? Well, I have at least a guess here. D’Arcy had been good friends in high school with a certain John Scott Haldane, who would go on to be a well-known physiology researcher, and who had a son named J. B. S. Haldane, who became a major figure in evolutionary biology and in the presentation of science to the public. Haldane often referenced D’Arcy, and notably introduced him to Peter Medawar (who would win a Nobel Prize for immunology), of whom D’Arcy (in 1944) would say “I do believe that more than any man you have understood what I have tried to say!”.
Both Medawar and evolutionary biologist (and originator of the term “transhumanism”) Julian Huxley encouraged D’Arcy to think about continuity and gradients in connection with his shape transformations (e.g. of fish). I don’t know the whole story, but I suspect these two connected with C. H. Waddington, a developmental biologist (and inventor of the term “epigenetics”) who interacted with Turing in Cambridge. (Small world that it is, Waddington’s mathematician daughter is married to a distinguished mathematician named John Milnor, with whom I discussed D’Arcy in the early 1980s.) And when Turing came to write about morphogenesis in 1952, he referenced D’Arcy (and Waddington), then proceeded to base his theory on (morphogen) gradients.
In another direction, D’Arcy interacted with early mathematical biologists like Alfred Lotka and Vito Volterra and Nicolas Rashevsky. And though their work was heavily based on differential equations (which D’Arcy didn’t really believe in), he took pains to support them when he could.
On Growth and Form also seems to have been popular in the art and architecture community, with people as diverse as the architects Mies van der Rohe and Le Corbusier, the painter Jackson Pollock and the sculptor Henry Moore mentioning its influence.
Modern Times
So now that it’s been 100 years since On Growth and Form was published, do we finally understand how biological organisms grow? Lots of work has certainly been done at the genetic and molecular scale, and great progress has been made. But when it comes to macroscopic growth, much less has been done. And a large part of the reason, I suspect, is that it’s needed a new paradigm in order to make progress.
D’Arcy’s work was, more than anything, concerned with analogy and (essentially Aristotelian-style) mechanism. He didn’t really pursue traditional “theory” in the sense of the exact sciences. In his day, though, such theory would normally have meant writing down mathematical equations to represent growth, and then solving them to see what would happen.
And the problem is that when one looks at biological forms, they often seem far too complex to be the results of traditional mathematical equations. But starting in the 1950s a new possibility emerged: perhaps one could model biological growth as following not mathematical equations but instead rules like a program for a computer.
And this is how I came to study On Growth and Form. I viewed it almost as a catalog of biological forms—that I wondered if one could explain with computational rules. I even started collecting specimens—in a very pale shadow of D’Arcy’s efforts (and with no animal skeletons!):
Occasionally I would find one that just seemed to cry out as being from something like a program:
But more than that, I kept on exploring spaces of possible programs, and discovering that the range of forms they produced seemed to align remarkably well with the actual range of forms one sees across biological organisms. (I looked particularly at shell shapes and patterns, as well as other pigmentation patterns, and various forms of plants.)
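As a minimal sketch of what exploring a space of possible programs can look like, one can run a few elementary cellular automata from a single black cell and compare the patterns they produce. The particular rule numbers, the step count, and the variable names here are illustrative assumptions, not choices taken from the text:

```wolfram
(* A handful of elementary cellular automaton rules; the selection is arbitrary *)
rules = {30, 90, 110, 150};

(* Evolve each rule for 40 steps, starting from a single black cell on a white background *)
evolutions = Table[CellularAutomaton[r, {{1}, 0}, 40], {r, rules}];

(* Show the resulting patterns side by side, each labeled with its rule number *)
GraphicsRow[MapThread[ArrayPlot[#1, PlotLabel -> #2] &, {evolutions, rules}]]
```

Even this tiny sample already includes nested and apparently random patterns, which is the kind of variety one would then compare against actual biological forms.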
And in a sense what I found strongly supports a core idea of D’Arcy’s: that the forms of organisms are not so much determined by evolution, as by what it’s possible for processes to produce. D’Arcy thought about physical processes and mathematical forms; 60+ years later I was in a position to explore the more general space of computational processes.
And it so happened that, like D’Arcy, I ended up presenting my main results in a (big) book, that I called A New Kind of Science. My main purpose in the book was to describe what I’d learned from exploring the computational universe. And I devoted two sections (out of 114) respectively to “Growth of Plants and Animals” and “Biological Pigmentation Patterns”—producing something that looks a bit similar to On Growth and Form:
So, in the end, what about the fish? Well, I think I’ve managed to understand something about the “morphospace” of possible mollusc shells. And I’ve made a start on leaves—though I’m hoping one of these years to be able to get a lot more data. I’ve also looked at animal skeletons a bit. But, yes, I at least still don’t know about the space of possible fish shapes. Though maybe somewhere inside our image identification neural net (which saw plenty of fish in its training) it already knows. And maybe it agrees with what D’Arcy thought—a hundred years ago.
People are used to producing prose—and sometimes pictures—to express themselves. But in the modern age of computation, something new has become possible that I’d like to call the computational essay.
I’ve been working on building the technology to support computational essays for several decades, but it’s only very recently that I’ve realized just how central computational essays can be to both the way people learn, and the way they communicate facts and ideas. Professionals of the future will routinely deliver results and reports as computational essays. Educators will routinely explain concepts using computational essays. Students will routinely produce computational essays as homework for their classes.
Here’s a very simple example of a computational essay:
There are basically three kinds of things here. First, ordinary text (here in English). Second, computer input. And third, computer output. And the crucial point is that these all work together to express what’s being communicated.
The ordinary text gives context and motivation. The computer input gives a precise specification of what’s being talked about. And then the computer output delivers facts and results, often in graphical form. It’s a powerful form of exposition that combines computational thinking on the part of the human author with computational knowledge and computational processing from the computer.
But what really makes this work is the Wolfram Language—and the succinct representation of high-level ideas that it provides, defining a unique bridge between human computational thinking and actual computation and knowledge delivered by a computer.
In a typical computational essay, each piece of Wolfram Language input will usually be quite short (often not more than a line or two). But the point is that such input can communicate a high-level computational thought, in a form that can readily be understood both by the computer and by a human reading the essay.
It’s essential to all this that the Wolfram Language has so much built-in knowledge—both about the world and about how to compute things in it. Because that’s what allows it to immediately talk not just about abstract computations, but also about real things that exist and happen in the world—and ultimately to provide a true computational communication language that bridges the capabilities of humans and computers.
An Example
Let’s use a computational essay to explain computational essays.
Let’s say we want to talk about the structure of a human language, like English. English is basically made up of words. Let’s get a list of the common ones.
Generate a list of common words in English:
WordList[]
How long is a typical word? Well, we can take the list of common words, and make a histogram that shows their distribution of lengths.
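As a sketch of the computation described here, the two inputs below make the histogram for English and then repeat it for French. The use of French as the comparison language, selected through the Language option of WordList, is an assumption made for illustration:

```wolfram
(* distribution of lengths of common English words *)
Histogram[StringLength[WordList[]]]

(* the same distribution for common French words *)
Histogram[StringLength[WordList[Language -> "French"]]]
```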
Notice that the word lengths tend to be longer in French. We could investigate whether this is why documents tend to be longer in French than in English, or how this relates to quantities like entropy for text. (Of course, because this is a computational essay, the reader can rerun the computations in it themselves, say by trying Russian instead of French.)
But as something different, let’s compare languages by comparing their translations for, say, the word “computer”.
Find the translations for “computer” in the 10 most common languages:
Take[WordTranslation["computer", All], 10]
Find the first translation in each case:
First /@ Take[WordTranslation["computer", All], 10]
Arrange common languages in “feature space” based on their translations for “computer”:
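A minimal sketch of input that could produce such a plot, assuming FeatureSpacePlot is applied directly to the first translations found above. The choice of 40 languages is illustrative and is not stated in the text:

```wolfram
(* place each language according to its first translation of "computer" *)
FeatureSpacePlot[First /@ Take[WordTranslation["computer", All], 40]]
```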
From this plot, we can start to investigate all sorts of structural and historical relationships between languages. But from the point of view of a computational essay, what’s important here is that we’re sharing the exposition between ordinary text, computer input, and output.
The text is saying what the basic point is. Then the input is giving a precise definition of what we want. And the output is showing what’s true about it. But take a look at the input. Even just by looking at the names of the Wolfram Language functions in it, one can get a pretty good idea what it’s talking about. And while the function names are based on English, one can use “code captions” to understand it in another language, say Japanese:
But let’s say one doesn’t know about FeatureSpacePlot. What is it? If it was just a word or phrase in English, we might be able to look in a dictionary, but there wouldn’t be a precise answer. But a function in the Wolfram Language is always precisely defined. And to know what it does we can start by just looking at its documentation. But much more than that, we can just run it ourselves to explicitly see what it does.
And that’s a crucial part of what’s great about computational essays. If you read an ordinary essay, and you don’t understand something, then in the end you really just have to ask the author to find out what they meant. In a computational essay, though, there’s Wolfram Language input that precisely and unambiguously specifies everything—and if you want to know what it means, you can just run it and explore any detail of it on your computer, automatically and without recourse to anything like a discussion with the author.
Practicalities
How does one actually create a computational essay? With the technology stack we have, it’s very easy—mainly thanks to the concept of notebooks that we introduced with the first version of Mathematica all the way back in 1988. A notebook is a structured document that mixes cells of text together with cells of Wolfram Language input and output, including graphics, images, sounds, and interactive content:
In modern times one great (and very hard to achieve!) thing is that full Wolfram Notebooks run seamlessly across desktop, cloud and mobile. You can author a notebook in the native Wolfram Desktop application (Mac, Windows, Linux)—or on the web through any web browser, or on mobile through the Wolfram Cloud app. Then you can share or publish it through the Wolfram Cloud, and get access to it on the web or on mobile, or download it to desktop or, now, iOS devices.
Sometimes you want the reader of a notebook just to look at it, perhaps opening and closing groups of cells. Sometimes you also want them to be able to operate the interactive elements. And sometimes you want them to be able to edit and run the code, or maybe modify the whole notebook. And the crucial point is that all these things are easy to do with the cloud-desktop-mobile system we’ve built.
A New Form of Student Work
Computational essays are great for students to read, but they’re also great for students to write. Most of the current modalities for student work are remarkably old. Write an essay. Give a math derivation. These have been around for millennia. Not that there’s anything wrong with them. But now there’s something new: write a computational essay. And it’s wonderfully educational.
A computational essay is in effect an intellectual story told through a collaboration between a human author and a computer. The computer acts like a kind of intellectual exoskeleton, letting you immediately marshal vast computational power and knowledge. But it’s also an enforcer of understanding. Because to guide the computer through the story you’re trying to tell, you have to understand it yourself.
When students write ordinary essays, they’re typically writing about content that in some sense “already exists” (“discuss this passage”; “explain this piece of history”; …). But in doing computation (at least with the Wolfram Language) it’s so easy to discover new things that computational essays will end up with an essentially inexhaustible supply of new content, that’s never been seen before. Students will be exploring and discovering as well as understanding and explaining.
When you write a computational essay, the code in your computational essay has to produce results that fit with the story you’re telling. It’s not like you’re doing a mathematical derivation, and then some teacher tells you you’ve got the wrong answer. You can immediately see what your code does, and whether it fits with the story you’re telling. If it doesn’t, well then maybe your code is wrong—or maybe your story is wrong.
What should the actual procedure be for students producing computational essays? At this year’s Wolfram Summer School we did the experiment of asking all our students to write a computational essay about anything they knew about. We ended up with 72 interesting essays—exploring a very wide range of topics.
In a more typical educational setting, the “prompt” for a computational essay could be something like “What is the typical length of a word in English” or “Explore word lengths in English”.
There’s also another workflow I’ve tried. As the “classroom” component of a class, do livecoding (or a live experiment). Create or discover something, with each student following along by doing their own computations. At the end of the class, each student will have a notebook they made. Then have their “homework” be to turn that notebook into a computational essay that explains what was done.
And in my experience, this ends up being a very good exercise—that really tests and cements the understanding students have. But there’s also something else: when students have created a computational essay, they have something they can keep—and directly use—forever.
And this is one of the great general features of computational essays. When students write them, they’re in effect creating a custom library of computational tools for themselves—that they’ll be in a position to immediately use at any time in the future. It’s far too common for students to write notes in a class, then never refer to them again. Yes, they might run across some situation where the notes would be helpful. But it’s often hard to motivate going back and reading the notes—not least because that’s only the beginning; there’s still the matter of implementing whatever’s in the notes.
But the point is that with a computational essay, once you’ve found what you want, the code to implement it is right there—immediately ready to be applied to whatever has come up.
Any Subject You Want
What can computational essays be about? Almost anything! I’ve often said that for any field of study X (from archaeology to zoology), there either is now, or soon will be, a “computational X”. And any “computational X” can immediately be explored and explained using computational essays.
But even when there isn’t a clear “computational X” yet, computational essays can still be a powerful way to organize and present material. In some sense, the very fact that a sequence of computations is typically needed to “tell the story” in an essay helps define a clear backbone for the whole essay. In effect, the structured nature of the computational presentation helps suggest structure for the narrative, making it easier for students (and others) to write essays that are easy to read and understand.
But what about actual subject matter? Well, imagine you’re studying history—say the history of the English Civil War. Well, conveniently, the Wolfram Language has a lot of knowledge about history (as about so many other things) built in. So you can present the English Civil War through a kind of dialog with it. For example, you can ask it for the geography of battles:
You could ask for a timeline of the beginning of the war (you don’t need to say “first 15 battles”, because if one cares, one can just read that from the Wolfram Language code):
You could start looking at how armies moved, or who won and who lost at different points. At first, you can write a computational essay in which the computations are basically just generating custom infographics to illustrate your narrative. But then you can go further—and start really doing “computational history”. You can start to compute various statistical measures of the progress of the war. You can find ways to quantitatively compare it to other wars, and so on.
Can you make a “computational essay” about art? Absolutely. Maybe about art history. Pick 10 random paintings by van Gogh:
Maybe you could go on and check it for exoplanets. Or you could start solving the equations of motion for planets.
You could look at biology. Here’s the beginning of the reference sequence for the human mitochondrion:
GenomeData[{"Mitochondrion", {1, 150}}]
You can start off breaking it into possible codons:
StringPartition[%, 3]
There’s an immense amount of data about all kinds of things built into the Wolfram Language. But there’s also the Wolfram Data Repository, which contains all sorts of specific datasets. Like here’s a map of state fairgrounds in the US:
GeoListPlot[
ResourceData["U.S. State Fairgrounds"][All, "GeoPosition"]]
And here’s a word cloud of the constitutions of countries that have been enacted since 2010:
Quite often one’s interested in dealing not with public data, but with some kind of local data. One convenient source of this is the Wolfram Data Drop. In an educational setting, particular databins (or cloud objects in general) can be set so that they can be read (and/or added to) by some particular group. Here’s a databin that I accumulate for myself, showing my heart rate through the day. Here it is for today:
DateListPlot[TimeSeries[YourDatabinHere]]
Of course, it’s easy to make a histogram too:
Histogram[TimeSeries[YourDatabinHere]]
What about math? A key issue in math is to understand why things are true. The traditional approach to this is to give proofs. But computational essays provide an alternative. The nature of the steps in them is different—but the objective is the same: to show what’s true and why.
As a very simple example, let’s look at primes. Here are the first 50:
Table[Prime[n], {n, 50}]
Let’s find the remainder mod 6 for all these primes:
Mod[Table[Prime[n], {n, 50}], 6]
But why do only 1 and 5 occur (well, after the trivial cases of the primes 2 and 3)? We can see this by computation. Any number can be written as 6n+k for some n and k:
Table[6 n + k, {k, 0, 5}]
But if we factor numbers written in this form, we’ll see that 6n+1 and 6n+5 are the only ones that don’t have to be multiples:
Factor[%]
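To check the same conclusion directly against the data, one can collect the distinct remainders for the primes after 2 and 3. This is a small empirical check, not a proof:

```wolfram
(* distinct remainders mod 6 for the 3rd through 50th primes *)
Union[Mod[Table[Prime[n], {n, 3, 50}], 6]]
(* {1, 5} *)
```

Only 1 and 5 appear, which fits the factored forms: 6n, 6n+2 and 6n+4 are always even, and 6n+3 is always a multiple of 3.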
What about computer science? One could for example write a computational essay about implementing Euclid’s algorithm, studying its running time, and so on.
Define a function to give all steps in Euclid’s algorithm:
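One possible definition, given as a sketch: the name euclidSteps is hypothetical, and the formulation with NestWhileList is just one of several ways to write it.

```wolfram
(* each step replaces {a, b} by {b, Mod[a, b]}; stop once the remainder is 0 *)
euclidSteps[a_Integer, b_Integer] := 
 NestWhileList[{Last[#], Mod @@ #} &, {a, b}, Last[#] != 0 &]

euclidSteps[1071, 462]
(* {{1071, 462}, {462, 147}, {147, 21}, {21, 0}} *)
```

The greatest common divisor is the first element of the final pair (here 21), and the length of the list gives a direct handle on the running time.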
If you wanted to get deeper into software engineering, you could write a computational essay about the HTTP protocol. This gets an HTTP response from a site:
URLRead["https://www.wolfram.com"]
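URLRead also accepts a second argument naming the part of the response to return. A brief sketch, assuming the site is reachable from your machine:

```wolfram
(* numeric status code of the HTTP response, such as 200 for success *)
URLRead["https://www.wolfram.com", "StatusCode"]

(* headers sent back with the response *)
URLRead["https://www.wolfram.com", "Headers"]
```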
And this shows the tree structure of the elements on the webpage at that URL:
As far as I’m concerned, for a computational essay to be good, it has to be as easy to understand as possible. The format helps quite a lot, of course. Because a computational essay is full of outputs (often graphical) that are easy to skim, and that immediately give some impression of what the essay is trying to say. It also helps that computational essays are structured documents, that deliver information in well-encapsulated pieces.
But ultimately it’s up to the author of a computational essay to make it clear. Another thing that helps is that a computational essay must have a “computational narrative”: a sequence of pieces of code that the computer can execute to do what’s being discussed in the essay. And while one might be able to write an ordinary essay that doesn’t make much sense but still sounds good, one can’t ultimately do something like that in a computational essay. Because in the end the code is the code, and actually has to run and do things.
So what can go wrong? Well, like English prose, Wolfram Language code can be unnecessarily complicated, and hard to understand. In a good computational essay, both the ordinary text, and the code, should be as simple and clean as possible. I try to enforce this for myself by saying that each piece of input should be at most one or perhaps two lines long—and that the caption for the input should always be just one line long. If I’m trying to do something where the core of it (perhaps excluding things like display options) takes more than a line of code, then I break it up, explaining each line separately.
Another important principle as far as I’m concerned is: be explicit. Don’t have some variable that, say, implicitly stores a list of words. Actually show at least part of the list, so people can explicitly see what it’s like. And when the output is complicated, find some tabulation or visualization that makes the features you’re interested in obvious. Don’t let the “key result” be hidden in something that’s tucked away in the corner; make sure the way you set things up makes it front and center.
Use the structured nature of notebooks. Break up computational essays with section headings, again helping to make them easy to skim. I follow the style of having a “caption line” before each input. Don’t worry if this somewhat repeats what a paragraph of text has said; consider the caption something that someone who’s just “looking at the pictures” might read to understand what a picture is of, before they actually dive into the full textual narrative.
The technology of Wolfram Notebooks makes it straightforward to put in interactive elements, like Manipulate, into computational essays. And sometimes this is very helpful, and perhaps even essential. But interactive elements shouldn’t be overused. Because whenever there’s an element that requires interaction, this reduces the ability to skim the essay.
Sometimes there’s a fair amount of data—or code—that’s needed to set up a particular computational essay. The cloud is very useful for handling this. Just deploy the data (or code) to the Wolfram Cloud, and set appropriate permissions so it can automatically be read whenever the code in your essay is executed.
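A minimal sketch of this workflow, assuming you are signed in to the Wolfram Cloud. The object name "essay-data/word-lengths" and the stored data are placeholders chosen for illustration:

```wolfram
(* store an expression in the cloud and make it publicly readable *)
obj = CloudPut[StringLength[WordList[]], "essay-data/word-lengths", 
   Permissions -> "Public"];

(* code in the essay can then retrieve it whenever it is run *)
CloudGet[obj]
```

Permissions can also be restricted to particular users or groups rather than set to "Public", which is closer to what a classroom setting would need.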
Notebooks also allow “reverse closing” of cells—allowing an output cell to be immediately visible, even though the input cell that generated it is initially closed. This kind of hiding of code should generally be avoided in the body of a computational essay, but it’s sometimes useful at the beginning or end of an essay, either to give an indication of what’s coming, or to include something more advanced where you don’t want to go through in detail how it’s made.
OK, so if a computational essay is done, say, as homework, how can it be assessed? A first, straightforward question is: does the code run? And this can be determined pretty much automatically. Then after that, the assessment process is very much like it would be for an ordinary essay. Of course, it’s nice and easy to add cells into a notebook to give comments on what’s there. And those cells can contain runnable code—that for example can take results in the essay and process or check them.
Are there principles of good computational essays? Here are a few candidates:
0. Understand what you’re talking about (!)
1. Find the most straightforward and direct way to represent your subject matter
2. Keep the core of each piece of Wolfram Language input to a line or two
3. Use explicit visualization or other information presentation as much as possible
4. Try to make each input+caption independently understandable
5. Break different topics or directions into different subsections
Learning the Language
At the core of computational essays is the idea of expressing computational thoughts using the Wolfram Language. But to do that, one has to know the language. Now, unlike human languages, the Wolfram Language is explicitly designed (and, yes, that’s what I’ve been doing for the past 30+ years) to follow definite principles and to be as easy to learn as possible. But there’s still learning to be done.
One feature of the Wolfram Language is that—like with human languages—it’s typically easier to read than to write. And that means that a good way for people to learn what they need to be able to write computational essays is for them first to read a bunch of essays. Perhaps then they can start to modify those essays. Or they can start creating “notes essays”, based on code generated in livecoding or other classroom sessions.
As people get more fluent in writing the Wolfram Language, something interesting happens: they start actually expressing themselves in the language, and using Wolfram Language input to carry significant parts of the narrative in a computational essay.
When I was writing An Elementary Introduction to the Wolfram Language (which itself is written in large part as a sequence of computational essays) I had an interesting experience. Early in the book, it was decently easy to explain computational exercises in English (“Make a table of the first 10 squares”). But a little later in the book, it became a frustrating process.
It was easy to express what I wanted in the Wolfram Language. But to express it in English was long and awkward (and had a tendency to sound like legalese). And that’s the whole point of using the Wolfram Language, and the reason I’ve spent 30+ years building it: because it provides a better, crisper way to express computational thoughts.
It’s sometimes said of human languages that the language you use determines how you think. It’s not clear how true this is of human languages. But it’s absolutely true of computer languages. And one of the most powerful things about the Wolfram Language is that it helps one formulate clear computational thinking.
Traditional computer languages are about writing code that describes the details of what a computer should do. The point of the Wolfram Language is to provide something much higher level—that can immediately talk about things in the world, and that can allow people as directly as possible to use it as a medium of computational thinking. And in a sense that’s what makes a good computational essay possible.
The Long Path to Computational Essays
Now that we have full-fledged computational essays, I realize I’ve been on a path towards them for nearly 40 years. At first I was taking interactive computer output and Scotch-taping descriptions into it:
By 1981, when I built SMP, I was routinely writing documents that interspersed code and explanations:
But it was only in 1986, when I started documenting what became Mathematica and the Wolfram Language, that I started seriously developing a style close to what I now favor for computational essays:
And with the release of Mathematica 1.0 in 1988 came another critical element: the invention of Wolfram Notebooks. Notebooks arrived in a form at least superficially very similar to the way they are today (and already in many ways more sophisticated than the imitations that started appearing 25+ years later!): collections of cells arranged into groups, and capable of containing text, executable code, graphics, etc.
At first notebooks were only possible on Mac and NeXT computers. A few years later they were extended to Microsoft Windows and X Windows (and later, Linux). But immediately people started using notebooks both to provide reports about what they’d done, and to create rich expository and educational material. Within a couple of years, there started to be courses based on notebooks, and books printed from notebooks, with interactive versions available on CD-ROM at the back:
So in a sense the raw material for computational essays already existed by the beginning of the 1990s. But to really make computational essays come into their own required the development of the cloud—as well as the whole broad range of computational knowledge that’s now part of the Wolfram Language.
By 1990 it was perfectly possible to create a notebook with a narrative, and people did it, particularly about topics like mathematics. But if there was real-world data involved, things got messy. One had to make sure that whatever was needed was appropriately available from a distribution CD-ROM or whatever. We created a Player for notebooks very early, that was sometimes distributed with notebooks.
But in the last few years, particularly with the development of the Wolfram Cloud, things have gotten much more streamlined. Because now you can seamlessly store things in the cloud and use them anywhere. And you can work directly with notebooks in the cloud, just using a web browser. In addition, thanks to lots of user-assistance innovations (including natural language input), it’s become even easier to write in the Wolfram Language—and there’s ever more that can be achieved by doing so.
And the important thing that I think has now definitively happened is that it’s become lightweight enough to produce a good computational essay that it makes sense to do it as something routine—either professionally in writing reports, or as a student doing homework.
Ancient Educational History
The idea of students producing computational essays is something new for modern times, made possible by a whole stack of current technology. But there’s a curious resonance with something from the distant past. You see, if you’d learned a subject like math in the US a couple of hundred years ago, a big thing you’d have done is to create a so-called ciphering book—in which over the course of several years you carefully wrote out the solutions to a range of problems, mixing explanations with calculations. And the idea then was that you kept your ciphering book for the rest of your life, referring to it whenever you needed to solve problems like the ones it included.
Well, now, with computational essays you can do very much the same thing. The problems you can address are vastly more sophisticated and wide-ranging than you could reach with hand calculation. But like with ciphering books, you can write computational essays so they’ll be useful to you in the future—though now you won’t have to imitate calculations by hand; instead you’ll just edit your computational essay notebook and immediately rerun the Wolfram Language inputs in it.
I actually only learned about ciphering books quite recently. For about 20 years I’d had essentially as an artwork a curious handwritten notebook (created in 1818, it says, by a certain George Lehman, apparently of Orwigsburg, Pennsylvania), with pages like this:
I now know this is a ciphering book—that on this page describes how to find the “height of a perpendicular object… by having the length of the shadow given”. And of course I can’t resist a modern computational essay analog, which, needless to say, can be a bit more elaborate.
Find the current position of the Sun as azimuth, altitude:
SunPosition[]
Find the length of a shadow for an object of unit height:
1/Tan[SunPosition[][[2]]]
Given a 10-ft shadow, find the height of the object that made it:
Tan[SunPosition[][[2]]] Quantity[10, "Feet"]
The Path Ahead
I like writing textual essays (such as blog posts!). But I like writing computational essays more. Because at least for many of the things I want to communicate, I find them a purer and more efficient way to do it. I could spend lots of words trying to express an idea—or I can just give a little piece of Wolfram Language input that expresses the idea very directly and shows how it works by generating (often very visual) output with it.
When I wrote my big book A New Kind of Science (from 1991 to 2002), neither our technology nor the world was quite ready for computational essays in the form in which they’re now possible. My research for the book filled thousands of Wolfram Notebooks. But when it actually came to putting together the book, I just showed the results from those notebooks—including a little of the code from them in notes at the back of the book.
But now the story of the book can be told in computational essays—that I’ve been starting to produce. (Just for fun, I’ve been livestreaming some of the work I’m doing to create these.) And what’s very satisfying is just how clearly and crisply the ideas in the book can be communicated in computational essays.
There is so much potential in computational essays. And indeed we’re now starting the project of collecting “topic explorations” that use computational essays to explore a vast range of topics in unprecedentedly clear and direct ways. It’ll be something like our Wolfram Demonstrations Project (that now has 11,000+ Wolfram Language–powered Demonstrations). Here’s a typical example I wrote:
Computational essays open up all sorts of new types of communication. Research papers that directly present computational experiments and explorations. Reports that describe things that have been found, but allow other cases to be immediately explored. And, of course, computational essays define a way for students (and others) to very directly and usefully showcase what they’ve learned.
There’s something satisfying about both writing—and reading—computational essays. It’s as if in communicating ideas we’re finally able to go beyond pure human effort—and actually leverage the power of computation. And for me, having built the Wolfram Language to be a computational communication language, it’s wonderful to see how it can be used to communicate so effectively in computational essays.
It’s so nice when I get something sent to me as a well-formed computational essay. Because I immediately know that I’m going to get a straight story that I can actually understand. There aren’t going to be all sorts of missing sources and hidden assumptions; there’s just going to be Wolfram Language input that stands alone, and that I can take out and study or run for myself.
The modern world of the web has brought us a few new formats for communication—like blogs, and social media, and things like Wikipedia. But all of these still follow the basic concept of text + pictures that’s existed since the beginning of the age of literacy. With computational essays we finally have something new—and it’s going to be exciting to see all the things it makes possible.
Thinking in Public
I’ve been CEOing Wolfram Research for more than 30 years now. But what does that actually entail? What do I end up doing on a typical day? I certainly work hard. But I think I’m not particularly typical of CEOs of tech companies our size. Because for me a large part of my time is spent on the front lines of figuring out how our products should be designed and architected, and what they should do.
Thirty years ago I mostly did this by myself. But nowadays I’m almost always working with groups of people from our 800 or so employees. I like to do things very interactively. And in fact, for the past 15 years or so I’ve spent much of my time doing what I often call “thinking in public”: solving problems and making decisions live in meetings with other people.
I’m often asked how this works, and what actually goes on in our meetings. And recently I realized: what better way to show (and perhaps educate) people than just to livestream lots of our actual meetings? So over the past couple of months, I’ve livestreamed over 40 hours of my internal meetings—in effect taking everyone behind the scenes in what I do and how our products are created. (Yes, the livestreams are also archived.)
In the world at large, people often complain that “nothing happens in meetings”. Well, that’s not true of my meetings. In fact, I think it’s fair to say that in every single product-design meeting I do, significant things are figured out, and at least some significant decisions are made. So far this year, for example, we’ve added over 250 completely new functions to the Wolfram Language. Each one of those went through a meeting of mine. And quite often the design, the name, or even the very idea of the function was figured out live in the meeting.
There’s always a certain intellectual intensity to our meetings. We’ll have an hour or whatever, and we’ll have to work through what are often complex issues, that require a deep understanding of some area or another—and in the end come up with ideas and decisions that will often have very long-term consequences.
I’ve worked very hard over the past 30+ years to maintain the unity and coherence of the Wolfram Language. But every day I’m doing meetings where we decide about new things to be added to the language—and it’s always a big challenge and a big responsibility to maintain the standards we’ve set, and to make sure that the decisions we make today will serve us well in the years to come.
What are the important functions in a particular area? How do they relate to other functions? Do they have the correct names? How can we deal with seemingly incompatible design constraints? Are people going to understand these functions? Oh, and are related graphics or icons as good and clear and elegant as they can be?
By now I basically have four decades of experience in figuring things like this out—and many of the people I work with are also very experienced. Usually a meeting will start with some proposal that’s been developed for how something should work. And sometimes it’ll just be a question of understanding what’s proposed, thinking it through, and then confirming it. But often—in order to maintain the standards we’ve set—there are real problems that still have to be solved. And a meeting will go back and forth, grappling with some issue or another.
Ideas will come up, often to be shot down. Sometimes it’ll feel like we’re completely stuck. But everyone in the meeting knows this isn’t an exercise; we’ve got to come up with an actual answer. Sometimes I’ll be trying to make analogies—to find somewhere else where we’ve solved a similar problem before. Or I’ll be insisting we go back to first principles—to kind of the center of the problem—to understand everything from the beginning. People will bring up lots of detailed academic or technical knowledge—and I’ll usually be trying to extract the essence of what it should be telling us.
It’d certainly be a lot easier if our standards were lower. But we don’t want a committee-compromise result. We want actual, correct answers that will stand the test of time. And these often require actual new ideas. But in the end it’s typically tremendously satisfying. We put in lots of work and thinking—and eventually we get a solution, and it’s a really good solution, that’s a real intellectual achievement.
Usually all of this goes on in private, inside our company. But with the livestream, anyone can see it happening—and can see the moment when some function is named, or some problem is solved.
What Are the Meetings Like?
What will actually be going on if you tune into a livestream? It’s pretty diverse. You might see some new Wolfram Language function being tried out (often based on code that’s only days or even hours old). You might see a discussion about software engineering, or trends in machine learning, or the philosophy of science, or how to handle some issue of popular culture, or what it’s going to take to fix some conceptual bug. You might see some new area get started, you might see some specific piece of Wolfram Language documentation get finished, or you might see a piece of final visual design get done.
There’s quite a range of people in our meetings, with a whole diversity of accents and backgrounds and specialties. And it’s pretty common for us to need to call in some extra person with specific expertise we hadn’t thought was needed. (I find it a little charming that our company culture is such that nobody ever seems surprised to be called into a meeting and asked about a detail of some unusual topic they had no idea was relevant to us before.)
We’re a very geographically distributed company (I’ve been a remote CEO since 1991). So basically all our meetings are through webconferencing. (We use audio and screensharing, but we never find video helpful, except perhaps for looking at a mobile device or a book or a drawing on a piece of paper.)
Most often we’re looking at my screen, but sometimes it’ll be someone else’s screen. (The most common reason to look at someone else’s screen is to see something that’s only working on their machine so far.) Most often I’ll be working in a Wolfram Notebook. Usually there’ll be an initial agenda in a notebook, together with executable Wolfram Language code. We’ll start from that, but then I’ll be modifying the notebook, or creating a new one. Often I’ll be trying out design ideas. Sometimes people will be sending code fragments for me to run, or I’ll be writing them myself. Sometimes I’ll be live-editing our main documentation. Sometimes we’ll be watching graphic design being done in real time.
As much as possible, the goal in our meetings is to finish things. To consult in real time with all the people who have input we need, and to get all the ideas and issues about something resolved. Yes, sometimes, afterwards, someone (sometimes me) will realize that something we thought we figured out isn’t correct, or won’t work. But the good news is that that’s pretty rare, probably because, with the way we run our meetings, things get well aired in real time.
People in our meetings tend to be very direct. If they don’t agree with something, they’ll say so. I’m very keen that everyone in a meeting actually understands anything that’s relevant to them—so we get the benefit of their thinking and judgement about it. (That probably leads to an over-representation from me of phrases like “does that make sense?” or “do you get what I’m saying?”.)
It really helps, of course, that we have very talented people, who are quick at understanding things. And by now everyone knows that even if the main topic of a meeting is one thing, it’s quite likely that we’ll have to dip into something completely different in order to make progress. It requires a certain intellectual agility to keep up with this—but if nothing else, I think that’s on its own a great thing to practice and cultivate.
For me it’s very invigorating to work on so many different topics—often wildly different even between successive hours in a day. It’s hard work, but it’s also fun. And, yes, there is often humor, particularly in the specifics of the examples we’ll end up discussing (lots of elephants and turtles, and strange usage scenarios).
The meetings vary in size from 2 or 3 people to perhaps 20 people. Sometimes people will be added and dropped through the course of the meeting, as the details of what we’re discussing change. Particularly in larger meetings—that tend to be about projects that cut across multiple groups—we’ll typically have one or more project managers (we call them “PMs”) present. The PMs are responsible for the overall flow of the project—and particularly for coordinating between different groups that need to contribute.
If you listen to the livestream, you’ll hear a certain amount of jargon. Some of it is pretty typical in the software industry (UX = user experience, SQA = software quality assurance). Some of it is more specific to our company—like acronyms for departments (DQA = Document Quality Assurance, WPE = Web Product Engineering) or names of internal things (XKernel = prototype Wolfram Language build, pods = elements of Wolfram|Alpha output, pinkboxing = indicating undisplayable output, knitting = crosslinking elements of documentation). And occasionally, of course, there’s a new piece of jargon, or a new name for something, invented right in the meeting.
Usually our meetings are pretty fast paced. An idea will come up, and immediately people are responding to it. And as soon as something’s been decided, people will start building on the decision, and figuring out more. It’s remarkably productive, and I think it’s a pretty interesting process to watch, even though, without the experience base that the people in the meeting have, there may be some points at which it seems as if ideas are flying around too fast to keep track of what’s going on.
The Process of Livestreaming
The idea of livestreaming our internal meetings is new. But over the years I’ve done a fair amount of livestreaming for other purposes.
Back in 2009, when we launched Wolfram|Alpha, we actually livestreamed the process of making the site live. (I figured that if things went wrong, we might as well just show everyone what actually went wrong, rather than just putting up a “site unavailable” message.)
I’ve livestreamed demos and explorations of new software we’ve released. I’ve livestreamed work I happen to be doing in writing code or producing “computational essays”. (My son Christopher is arguably a faster Wolfram Language programmer than me, and he’s livestreamed some livecoding he’s done too.) I’ve also livestreamed live experiments, particularly from our Wolfram Summer School and Wolfram Summer Camp.
Until recently, though, all my livestreaming had basically been solo: it hadn’t involved having other people in the livestream. But I’ve always thought our internal design review meetings are pretty interesting, so I thought “why not let other people listen in on them too?” I have to admit I was a little nervous about this at first. After all, these meetings are pretty central to what our company does, and we can’t afford to have them be dragged down by anything.
And so I’ve insisted that a meeting has to be just the same whether it’s livestreamed or not. My only immediate concession to livestreaming is that I give a few sentences of introduction to explain roughly what the meeting is going to be about. And the good news has been that as soon as a meeting gets going, the people in it (including myself) seem to rapidly forget that it’s being livestreamed—and just concentrate on the (typically quite intense) things that are going on in the meeting.
But something interesting that happens when we’re livestreaming a meeting is that there’s real-time text chat with viewers. Often it’s questions and general discussion. But sometimes it’s interesting comments or suggestions about what we’re doing or saying. It’s like having instant advisors, or an instant focus group, giving us real-time input or feedback about our decisions.
As a practical matter, the primary people in the meeting are too focused on the meeting itself to be handling text chat. So we have separate people doing that—surfacing a small number of the most relevant comments and suggestions. And this has worked great—and in fact in most meetings at least one or two good ideas come from our viewers, that we’re instantly able to incorporate into our thinking.
One can think of livestreaming as something a bit like reality TV—except that it’s live and real time. We’re planning to have some systematic “broadcast times” for recorded material. But the live component has the constraint that it has to happen when the meetings are actually happening. I tend to have a very full and complex schedule, packing in all the various things I do. And exactly when a particular design review meeting can happen will often depend on when a particular piece of code or design work is ready.
It will also depend on the availability of the various other people in the meetings, who have their own constraints, and often live in a wide range of time zones. I’ve tried other approaches, but the most common thing now is that design review meetings are scheduled shortly before they actually happen, and typically not more than a day or two in advance. And even though I personally work at night as well as during the day, most design reviews tend to get scheduled during US (East Coast) working hours, because that’s when it’s easiest to arrange for all the people who have to be in the meeting, as well as people who might be called in if their expertise is needed.
From the point of view of livestreaming, it would be nice to have a more predictable schedule of relevant meetings, but the meetings are being set up to achieve maximum productivity in their own right—and livestreaming is just an add-on.
We’re trying to use Twitter to give some advance notice of livestreaming. But in the end the best indication of when a livestream is starting is just the notification that comes from the Twitch livestreaming platform we’re using. (Yes, Twitch is mainly used for e-sports right now, but we [and they] hope it can be used for other things too—and with their e-sports focus, their technology for screensharing has become very good. Curiously, I’ve been aware of Twitch for a long time. I met its founders at the very first Y Combinator Demo Day in 2005, and we used its precursor, justin.tv, to livestream the Wolfram|Alpha launch.)
Styles of Work
Not all the work I do is suitable for livestreaming. In addition to “thinking in public” in meetings, I also spend time “thinking in private”, doing things like just writing. (I actually spent more than 10 years almost exclusively “thinking in private” when I worked on my book A New Kind of Science.)
If I look at my calendar for a given week, I’ll see a mixture of things. Every day there are typically at least one or two design reviews of the kind I’ve been livestreaming. There are also a fair number of project reviews, where I’m trying to help move all kinds of projects along. And there are some strategy and management discussions too, along with the very occasional external meeting.
Our company is weighted very heavily towards R&D—and trying to build the best possible products. And that’s certainly reflected in the way I spend my time—and in my emphasis on intellectual rather than commercial value. Some people might think that after all these years I couldn’t possibly still be involved in the level of detail that’s in evidence in the design reviews we’ve been livestreaming.
But here’s the thing: I’m trying hard to design the Wolfram Language in the very best possible way for the long term. And after 40 years of doing software design, I’m pretty experienced at it. So I’m both fairly fast at doing it, and fairly good at not making mistakes. By now, of course, there are many other excellent software designers at our company. But I’m still the person who has the most experience with Wolfram Language design—as well as the most global view of the system (which is part of why in design review meetings, I end up spending some fraction of my time just connecting different related design efforts).
And, yes, I get involved in details. What exactly should the name of that option be? What color should that icon be? What should this function do in a particular corner case? And, yes, every one of these things could be solved in some way without me. But in a fairly short time, I can help make sure that what we have is really something that we can build on—and be proud of—in the years to come. And I consider it a good and worthy way for me to spend my time.
And it’s fun to be able to open up this process for people, by livestreaming the meetings we have. I’m hoping it’ll be useful for people to understand a bit about what goes into creating the Wolfram Language (and yes, software design often tends to be a bit unsung, and mainly noticed only if it’s done wrong, so it’s nice to be able to show what’s actually involved).
In a sense, doing the design of the Wolfram Language is a very concentrated and high-end example of computational thinking. And I hope that by experiencing it in watching our meetings, people will learn more about how they can do computational thinking themselves.
The meetings that we’re livestreaming now are about features of the Wolfram Language etc. that we currently have under development. But with our aggressive schedule of releasing software, it shouldn’t be long before the things we’re talking about are actually released in working products. And when that happens, there’ll be something quite unique about it. Because for the first time ever, people will not only be able to see what got done, but they’ll also be able to go back to a recorded livestream and see how it came to be figured out.
It’s an interesting and unique record of a powerful form of intellectual activity. But for me it’s already nice just to be able to share some of the fascinating conversations I end up being part of every day. And to feel like the time I’m spending as a very hands-on CEO not only advances the Wolfram Language and the other things we’re building, but can also directly help educate—and perhaps entertain—a few more people out in the world.
Let’s say we had a way to distribute beacons around our solar system (or beyond) that could survive for billions of years, recording what our civilization has achieved. What should they be like?
It’s easy to come up with what I consider to be sophomoric answers. But in reality I think this is a deep—and in some ways unsolvable—philosophical problem, that’s connected to fundamental issues about knowledge, communication and meaning.
Still, a friend of mine recently started a serious effort to build little quartz disks, etc., and have them hitch rides on spacecraft, to be deposited around the solar system. At first I argued that it was all a bit futile, but eventually I agreed to be an advisor to the project, and at least try to figure out what to do to the extent we can.
But, OK, so what’s the problem? Basically it’s about communicating meaning or knowledge outside of our current cultural and intellectual context. We just have to think about archaeology to know this is hard. What exactly was some arrangement of stones from a few thousand years ago for? Sometimes we can pretty much tell, because it’s close to something in our current culture. But a lot of the time it’s really hard to tell.
OK, but what are the potential use cases for our beacons? One might be to back up human knowledge so things could be restarted even if something goes awfully wrong with our current terrestrial civilization. And of course historically it was very fortunate that we had all those texts from antiquity when things in Europe restarted during the Renaissance. But part of what made this possible was that there had been a continuous tradition of languages like Latin and Greek—not to mention that it was humans that were both the creators and consumers of the material.
But what if the consumers of the beacons we plan to spread around the solar system are aliens, with no historical connection to us? Well, then it’s a much harder problem.
In the past, when people have thought about this, there’s been a tendency to say “just show them math: it’s universal, and it’ll impress them!” But actually, I think neither claim about math is really true.
To understand this, we have to dive a little into some basic science that I happen to have spent many years working on. The reason people think math is a candidate for universal communication is that its constructs seem precise, and that at least here on Earth there’s only one (extant) version of it, so it seems definable without cultural references. But if one actually starts trying to work out how to communicate about current math without any assumptions (as, for example, I did as part of consulting on the Arrival movie), one quickly discovers that one really has to go “below math” to get to computational processes with simpler rules.
And (as seems to happen with great regularity, at least to me) one obvious place one lands is with cellular automata. It’s easy to show an elaborate pattern that’s created according to simple well-defined rules:
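To make that concrete, here is a minimal sketch in Python of an elementary cellular automaton. The choice of rule 30, the single black cell as the starting condition, the wraparound at the edges, and the function names `step`, `evolve` and `render` are all my assumptions for illustration; the text above doesn't say which rule produced its picture.

```python
def step(cells, rule=30):
    """Apply one step of an elementary cellular automaton.

    Each new cell depends only on itself and its two neighbors.
    The three cell values form a number from 0 to 7, and the
    corresponding bit of `rule` gives the new value. Edges wrap around.
    """
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(width, steps, rule=30):
    """Start from a single 1 in the middle and return all rows."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


def render(rows):
    """Draw the rows as text, one character per cell."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    # 63 cells wide, 31 steps: the pattern just reaches the edges.
    print(render(evolve(63, 31)))
```

The whole rule fits in the single number 30, yet the triangle of cells it prints has no obvious regularity. Changing `rule` to another value from 0 to 255 gives the other elementary rules.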
But here’s the problem: there are plenty of physical systems that basically operate according to rules like these, and produce similarly elaborate patterns. So if this is supposed to show the impressive achievement of our civilization, it fails.
OK, but surely there must be something we can show that makes it clear that we’ve got some special spark of intelligence. I certainly always assumed there was. But one of the things that’s come out of the basic science I’ve done is what I call the Principle of Computational Equivalence, which basically says that once one’s gotten beyond a very basic level, every system will show behavior that’s equivalent in the sophistication of the computation it exhibits.
So although we’re very proud of our brains, and our computers, and our mathematics, they’re ultimately not going to be able to produce anything that’s beyond what simple programs like cellular automata—or, for that matter, “naturally occurring” physical systems—can produce. So when we make an offhand comment like “the weather has a mind of its own”, it’s not so silly: the fluid dynamic processes that lead to the weather are computationally equivalent to the processes that, for example, go on in our brains.
It’s a natural human tendency at this point to protest that surely there must be something special about us, and everything we’ve achieved with our civilization. People may say, for example, that there’s no meaning and no purpose to what the weather does. Of course, we can certainly attribute such things to it (“it’s trying to equalize temperatures between here and there”, etc.), and without some larger cultural story there’s no meaningful way to say if they’re “really there” or not.
OK, so if showing a sophisticated computation isn’t going to communicate what’s special about us and our civilization, what is? The answer, in the end, is details. Sophisticated computation is ubiquitous in our universe. But what’s inevitably special about us is the details of our history and what we care about.
We’re learning the same thing as we watch the progress of artificial intelligence. Increasingly, we can automate the things we humans can do—even ones that involve reasoning, or judgement, or creativity. But what we (essentially by definition) can’t automate is defining what we want to do, and what our goals are. For these are intimately connected to the details of our biological existence, and the history of our civilization—which is exactly what’s special about us.
But, OK, how can we communicate these things? Well, it’s hard. Because—needless to say—they’re tied into aspects of us that are special, and that won’t necessarily be shared with whatever we’re trying to communicate with.
At the end of the day, though, we’ve got a project that’s going to launch beacons on spacecraft. So what’s the best thing to put on them? I’ve spent a significant part of my life building what’s now the Wolfram Language, whose core purpose is to provide a precise language for communicating knowledge that our civilization has accumulated, in a way that both we humans and computers can understand. So perhaps this, and my experience with it, can help. But first, we should talk about history to get an idea of what has and hasn’t worked in the past.
Lessons from the Past
A few years ago I was visiting a museum and looking at little wooden models of life in ancient Egypt that had been buried with some king several millennia ago. “How sad,” I thought. “They imagined this would help them in the afterlife. But it didn’t work; instead it just ended up in a museum.” But then it struck me: “No, it did work! This is their ‘afterlife’!” And they successfully transmitted some essence of their life to a world far beyond their own.
Of course, when we look at these models, it helps that a lot of what’s in them is familiar from modern times. Cows. A boat with oars. Scrolls. But some isn’t that familiar. What are those weird things at the ends of the boat, for example? What are they for? And here begins the challenge of trying to understand without shared context.
I happened last summer to visit an archaeological site in Peru named Caral, that has all sorts of stone structures built more than 4000 years ago. It was pretty obvious what some of the structures were for. But others I couldn’t figure out. So I kept on asking our guide. And almost always the answer was the same: “it was for ceremonial purposes”.
Immediately I started thinking about modern structures. Yes, there are monuments and public artworks. But there are also skyscrapers, stadiums, cathedrals, canals, freeway interchanges and much more. And people have certain almost-ritual practices in interacting with these structures. But in the context of modern society, we would hardly call them “ceremonial”: we think of each type of structure as having a definite purpose which we can describe. But that description inevitably involves a considerable depth of cultural context.
When I was growing up in England, I went wandering around in woods near where I lived—and came across all sorts of pits and berms and other earthworks. I asked people what they were. Some said they were ancient fortifications; some said at least the pits were from bombs dropped in World War II. And who knows: maybe instead they were created by some process of erosion having nothing to do with people.
Almost exactly 50 years ago, as a young child vacationing in Sicily, I picked up this object on a beach:
Being very curious what it was, I took it to my local archaeology museum. “You’ve come to the wrong place, young man,” they said, “it’s obviously a natural object.” So off I went to a natural history museum, only to be greeted with “Sorry, it’s not for us; it’s an artifact.” And from then until now the mystery has remained (though with modern materials analysis techniques it could perhaps be resolved, and I obviously should do it!).
There are so many cases where it’s hard to tell if something is an artifact or not. Consider all the structures we’ve built on Earth. Back when I was writing A New Kind of Science, I asked some astronauts what the most obvious manmade structure they noticed from space was. It wasn’t anything like the Great Wall of China (which is actually hard to see). Instead, they said it was a line across the Great Salt Lake in Utah (actually a 30-mile-long railroad causeway built in 1959, with algae that happen to have varying colors on the two sides):
Right image courtesy of Ravell Call and Deseret News.
Then there was the 12-mile-diameter circle in New Zealand, the 30-mile one in Mauritania, and the 40-mile one in Quebec (with a certain Arrival heptapod calligraphy look):
Which were artifacts? This was before the web, so we had to contact people to find out. A New Zealand government researcher told us not to make the mistake of thinking their circle followed the shape of the cone volcano at its center. “The truth is, alas, much more prosaic,” he said: it’s the border of a national park, with trees cut outside only, i.e. an artifact. The other circles, however, had nothing to do with humans.
But, OK, let’s come back to the question of what things mean. In a cave painting from 7000 years ago, we can recognize shapes of animals, and hand stencils that we can see were made with hands. But what do the configurations of these things mean? Realistically at this point we have no serious idea.
Maybe it’s easier if we look at things that are more “mathematical”-like. In the 1990s I did a worldwide hunt for early examples of complex but structured patterns. I found all sorts of interesting things (such as mosaics supposedly made by Gilgamesh, from 3000 BC—and the earliest fractals, from 1210 AD). Most of the time I could tell what rules were used to make the patterns—though I could not tell what “meaning” the patterns were supposed to convey, or whether, instead, they were “merely ornamental”.
The last pattern above, though, had me very puzzled for a while. Is it a cellular automaton being constructed back in the 1300s? Or something from number theory? Well, no, in the end it turns out it’s a rendering of a list of 62 attributes of Allah from the Koran, in a special square form of Arabic calligraphy constructed like this:
About a decade ago, I learned about a pattern from 11,000 years ago, on a wall in Aleppo, Syria (one hopes it’s still intact there). What is this? Math? Music? Map? Decoration? Digitally encoded data? We pretty much have no idea.
I could go on giving examples. Lots of times people have said “if one sees such-and-such, then it must have been made for a purpose”. The philosopher Immanuel Kant offered the opinion that if one saw a regular hexagon drawn in the sand, one could only imagine a “rational cause” for it. I used to think of this whenever I saw hexagonal patterns formed in rocks. And a few years ago I heard about hexagons in sand, produced purely by the action of wind. But the biggest hexagon I know is the storm pattern around the north pole of Saturn—that presumably wasn’t in any usual sense “put there for a purpose”:
In 1899 Nikola Tesla picked up all sorts of elaborate and strange-sounding radio emissions, often a little reminiscent of Morse code. He knew they weren’t of human origin, so his immediate conclusion was that they must be radio messages from the inhabitants of Mars. Needless to say, they’re not. And instead, they’re just the result of physical processes in the Earth’s ionosphere and magnetosphere.
But here’s the ironic thing: they often sound bizarrely similar to whale songs! And, yes, whale songs have all sorts of elaborate rhyme-like and other features that remind us of languages. But we still don’t really know if they’re actually for “communication”, or just for “decoration” or “play”.
One might imagine that with modern machine learning and with enough data one should be able to train a translator for “talking to animals”. And no doubt that’d be easy enough for “are you happy?” or “are you hungry?”. But what about more sophisticated things? Say the kind of things we want to communicate to aliens?
I think it’d be very challenging. Because even if animals live in the same environment as us, it’s very unclear how they think about things. And it doesn’t help that even their experience of the world may be quite different—emphasizing for example smell rather than sight, and so on.
Of course, animals can make “artifacts” too, like an elaborate structure made by a puffer fish. But what is this? What does it mean? Should we think of this “piscifact” as some great achievement of puffer fish civilization, that should be celebrated throughout the solar system?
Surely not, one might say. Because even though it looks complex—and even “artistic” (a bit like bird songs have features of music)—we can imagine that one day we’d be able to decode the neural pathways in the brain of the puffer fish that lead it to make this. But so what? We’ll also one day be able to know the neural pathways in humans that lead them to build cathedrals—or try to plant beacons around the solar system.
Aliens and the Philosophy of Purpose
There’s a thought experiment I’ve long found useful. Imagine a very advanced civilization, that’s able to move things like stars and planets around at will. What arrangement would they put them in?
Maybe they’d want to make a “beacon of purpose”. And maybe—like Kant—one could think that would be achievable by setting up some “recognizable” geometric pattern. Like how about an equilateral triangle? But no, that won’t do. Because for example the Trojan asteroids actually form an equilateral triangle with Jupiter and the Sun already, just as a result of physics.
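One can check the “just physics” part with a few lines of Python. What follows is a sketch under stated assumptions: it uses the standard circular restricted three-body idealization (two massive bodies on circular orbits, viewed in the frame that rotates with them), units in which the Sun–Jupiter distance, the total mass times G, and the rotation rate are all 1, and an approximate mass ratio. The names `net_force` and `triangle_point` are mine.

```python
import math

# Approximate value of Jupiter's mass / (Sun's mass + Jupiter's mass).
MU_SUN_JUPITER = 9.54e-4


def net_force(x, y, mu):
    """Net force per unit mass on a body at rest in the rotating frame.

    The Sun sits at (-mu, 0) and Jupiter at (1 - mu, 0). The first two
    terms in each component are gravity; the last is the centrifugal term.
    """
    r1 = math.hypot(x + mu, y)        # distance to the Sun
    r2 = math.hypot(x - (1 - mu), y)  # distance to Jupiter
    fx = -(1 - mu) * (x + mu) / r1**3 - mu * (x - (1 - mu)) / r2**3 + x
    fy = -(1 - mu) * y / r1**3 - mu * y / r2**3 + y
    return fx, fy


def triangle_point(mu):
    """Third corner of the equilateral triangle built on the Sun-Jupiter line."""
    return 0.5 - mu, math.sqrt(3) / 2


if __name__ == "__main__":
    x, y = triangle_point(MU_SUN_JUPITER)
    print("distance to Sun:    ", math.hypot(x + MU_SUN_JUPITER, y))
    print("distance to Jupiter:", math.hypot(x - (1 - MU_SUN_JUPITER), y))
    print("net force:          ", net_force(x, y, MU_SUN_JUPITER))
```

Both distances equal the Sun–Jupiter separation, and the net force is zero up to rounding error, so a body placed at that corner can simply stay there. The code only shows that the corner is an equilibrium; that bodies actually collect around it is an additional fact about stability that this sketch doesn't test.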
And pretty soon one realizes that there’s actually nothing the aliens could do to “prove their purpose”. The configuration of stars in the sky may look kind of random to us (except, of course, that we still see constellations in it). But there’s nothing to say that looked at in the right way it doesn’t actually represent some grand purpose.
And here’s the confusing part: there’s a sense in which it does! Because, after all, just as a matter of physics, the configuration that occurs can be characterized as achieving the purpose of extremizing some quantity defined by the equations for matter and gravity and so on. Of course, one might say “that doesn’t count; it’s just physics”. But our whole universe (including ourselves) operates according to physics. And so now we’re back to discussing whether the extremization is “meaningful” or not.
We humans have definite ways to judge what’s meaningful or not to us. And what it comes down to is whether we can “tell a story” that explains, in culturally meaningful terms, why we’re doing something. Of course, the notion of purpose has evolved over the course of human history. Imagine trying to explain walking on a treadmill, or buying goods in a virtual world, or, for that matter, sending beacons out into the solar system—to the people thousands of years ago who created the structures from Peru that I showed above.
We’re not familiar (except in mythology) with telling “culturally meaningful stories” about the world of stars and planets. And in the past we might have imagined that somehow whatever stories we could tell would inevitably be far less rich than the ones we can tell about our civilization. But this is where basic science I’ve done comes in. The Principle of Computational Equivalence says that this isn’t true—and that in the end what goes on with stars and planets is just as rich as what goes on in our brains or our civilization.
In an effort to “show something interesting” to the universe, we might have thought that the best thing to do would be to present sophisticated abstract computational things. But that won’t be useful. Because those abstract computational things are ubiquitous throughout the universe.
And instead, the “most interesting” thing we have is actually the specific and arbitrary details of our particular history. Of course, one might imagine that there could be some sophisticated thing out there in the universe that could look at how our history starts, and immediately be able to deduce everything about how it will play out. But a consequence of the Principle of Computational Equivalence is what I call computational irreducibility, which implies that there can be no general shortcut to history; to find how it plays out, one effectively just has to live through it—which certainly helps one feel better about the meaningfulness of life.
The Role of Language
OK, so let’s say we want to explain our history. How can we do it? We can’t show every detail of everything that’s happened. Instead, we need to give a higher-level symbolic description, where we capture what’s important while idealizing everything else away. Of course, “what’s important” depends on who’s looking at it.
We might say “let’s show a picture”. But then we have to start talking about how to make the picture out of pixels at a certain resolution, how to represent colors, say with RGB—not to mention discussing how things might be imaged in 2D, compressed, etc. Across human history, we’ve had a decent record in having pictures remain at least somewhat comprehensible. But that’s probably in no small part because our biologically determined visual systems have stayed the same.
(It’s worth mentioning, though, that pictures can have features that are noticed only when they become “culturally absorbed”. For example, the nested patterns from the 1200s that I showed above were reproduced but ignored in art history books for hundreds of years—until fractals became “a thing”, and people had a way to talk about them.)
When it comes to communicating knowledge on a large scale, the only scheme we know (and maybe the only one that’s possible) is to use language—in which essentially there’s a set of symbolic constructs that can be arranged in an almost infinite number of ways to communicate different meanings.
It was presumably the introduction of language that allowed our species to begin accumulating knowledge from one generation to the next, and eventually to develop civilization as we know it. So it makes sense that language should be at the center of how we might communicate the story of what we’ve achieved.
And indeed if we look at human history, the cultures we know the most about are precisely those with records in written language that we’ve been able to read. If the structures in Caral had inscriptions, then (assuming we could read them) we’d have a much better chance of knowing what the structures were for.
There’ve been languages like Latin, Greek, Hebrew, Sanskrit and Chinese that have been continuously used (or at least known) for thousands of years—and that we’re readily able to translate. But in cases like Egyptian hieroglyphs, Babylonian cuneiform, Linear B, or Mayan, the thread of usage was broken, and it took heroic efforts to decipher them (and often the luck of finding something like the Rosetta Stone). And in fact today there are still plenty of languages—like Linear A, Etruscan, Rongorongo, Zapotec and the Indus script—that have simply never been deciphered.
Then there are cases where it’s not even clear whether something represents a language. An example is the quipus of Peru—that presumably recorded “data” of some kind, but that might or might not have recorded something we’d usually call a language:
Math to the Rescue?
OK, but with all our abstract knowledge about mathematics, and computation, and so on, surely we can now invent a “universal language” that can be universally understood. Well, we can certainly create a formal system—like a cellular automaton—that just consistently operates according to its own formal rules. But does this communicate anything?
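As a minimal sketch of what "just consistently operating according to its own formal rules" looks like, here is an elementary cellular automaton in Python. The choice of rule 30, the cyclic boundary, the single-cell starting row, and the function names `step` and `evolve` are all assumptions made for illustration; they are not the particular automata discussed in this piece.

```python
def step(cells, rule):
    """Apply one step of an elementary cellular automaton (cyclic boundary)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # Read the three-cell neighborhood as a number 0..7; that number
        # selects which bit of the rule number gives the new cell value.
        index = (left << 2) | (center << 1) | right
        out.append((rule >> index) & 1)
    return out


def evolve(rule, width=31, steps=15):
    """Start from a single black cell in the middle and return every row."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in evolve(30):
        print("".join("#" if c else "." for c in row))
```

The program contains nothing but the rule number and the starting row. Whatever a viewer takes away from the printed pattern is something the viewer brings to it.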
In its actual operation, the system just does what it does. But where there’s a choice is what the actual system is, what rules it uses, and what its initial conditions were. So if we were using cellular automata, we could for example decide that these particular ones are the ones we want to show:
What are we communicating here? Each rule has all sorts of detailed properties and behavior. But as a human you might say: “Aha, I see that all these rules double the length of their input; that’s the point”. But to be able to make that summary again requires a certain cultural context. Yes, with our human intellectual history, we have an easy way to talk about “doubling the length of their input”. But with a different intellectual history, that might not be a feature we have a way to talk about, just as human art historians for centuries didn’t have a way to talk about nested patterns.
Let’s say we choose to concentrate on traditional math. We have the same situation there. Maybe we could present theorems in some abstract system. But for each theorem it’s just “OK, fine, with those rules, that follows—much like with those shapes of molecules, this is a way they can arrange in a crystal”. And the only way one’s really “communicating something” is in the decision of which theorems to show, or which axiom systems to use. But again, to interpret those choices inevitably requires cultural context.
One place where the formal meets the actual is in the construction of theoretical models for things. We’ve got some actual physical process, and then we’ve got a formal, symbolic model for it—using mathematical equations, programs like cellular automata, or whatever. We might think that that connection would immediately define an interpretation for our formal system. But once again it does not, because our model is just a model, that captures some features of the system, and idealizes others away. And seeing how that works again requires cultural context.
There is one slight exception to this: what if there is a fundamental theory of all of physics, that can perhaps be stated as a simple program? That program is then not just an idealized model, but a full representation of physics. And the point is that that “ground truth” about our universe describes the physics that governs absolutely any entity that exists in our universe.
If there is indeed a simple model for the universe, it’s essentially inevitable that the things it directly describes are not ones familiar from our everyday sensory experience; for example they’re presumably “below” constructs like space and time as we know them. But still, we might imagine that we could show off our achievements by presenting a version of the ultimate theory for our universe (if we’d found it!). But even with this, there’s a problem. Because, well, it’s not difficult to show a correct model for the universe: you just have to look at the actual universe! So the main information in an abstract representation is in what the primitives of the abstract representation end up being (do you set up your universe in terms of networks, or algebraic structures, or what?).
Let’s back off from this level of philosophy for a moment. Let’s say we’re delivering a physical object—like a spacecraft, or a car—to our aliens. You might think the problem would be simpler. But the problem again is that it requires cultural context to decide what’s important, and what’s not. Is the placement of those rivets a message? Or an engineering optimization? Or an engineering tradition? Or just arbitrary?
Pretty much everything on, say, a spacecraft was presumably put there as part of building the spacecraft. Some was decided upon “on purpose” by its human designers. Some was probably a consequence of the physics of its manufacturing. But in the end the spacecraft just is what it is. You could imagine reconstructing the neural processes of its human designers, as you could imagine reconstructing the heat flows in the annealing of some part of it. But what is just the mechanism by which the spacecraft was built, and what is its “purpose”—or what is it trying to “communicate”?
The Molecular Version
It’s one thing to talk about sending messages based on the achievements of our civilization. But what about just sending our DNA? Yes, it doesn’t capture (at least in any direct way) all our intellectual achievements. But it does capture a couple of billion years of biological evolution, and represent a kind of memorial of the 10^40 or so organisms that have ever lived on our planet.
Of course, we might again ask “what does it mean?”. And indeed one of the points of Darwinism is that the forms of organisms (and the DNA that defines them) arise purely as a consequence of the process of biological evolution, without any “intentional design”. Needless to say, when we actually start talking about biological organisms there’s a tremendous tendency to say things like “that mollusc has a pointy shell because it’s useful in wedging itself in rocks”—in other words, to attribute a purpose to what has arisen from evolution.
So what would we be communicating by sending DNA (or, for that matter, complete instances of organisms)? In a sense we’d be providing a frozen representation of history, though now biological history. There’s an issue of context again too. How does one interpret a disembodied piece of DNA? (Or, what environment is needed to get this spore to actually do something?)
Long ago it used to be said that if there were “organic molecules” out in space, it’d be a sign of life. But in fact plenty of even quite complex molecules have now been found, even in interstellar space. And while these molecules no doubt reflect all sorts of complex physical processes, nobody takes them as a sign of anything like life.
So what would happen if aliens found a DNA molecule? Is that elaborate sequence a “meaningful message”, or just something created through random processes? Yes, in the end the sequences that have survived in modern DNA reflect in some way what leads to successful organisms in our specific terrestrial environment, though—just as with technology and language—there is a certain feedback in the way that organisms create the environment for others.
But, so, what does a DNA sequence show? Well, like a library of human knowledge, it’s a representation of a lot of elaborate historical processes—and of a lot of irreducible computation. But the difference is that it doesn’t have any “spark of human intention” in it.
Needless to say, as we’ve been discussing, it’s hard to identify a signature for that. If we look at things we’ve created so far in our civilization, they’re typically recognizable by the presence of things like (what we at least currently consider) simple geometrical forms, such as lines and circles and so on. And in a sense it’s ironic that after all our development as a civilization, the artifacts we produce look so much simpler than what nature routinely produces.
And we don’t have to look at biology, with all its effort of biological evolution. We can just as well think of physics, and things like the forms of snowflakes or splashes or turbulent fluids.
As I’ve argued at length, the real point is that out in the computational universe of possible programs, it’s actually easy to find examples where even simple underlying rules lead to highly complex behavior. And that’s what’s happening in nature. And the only reason we don’t see that usually in the things we construct is that we constrain ourselves to use engineering practices that avoid complexity, so that we can foresee their outcome. And the result of this is that we tend to always end up with things that are simple and familiar.
Now that we understand more about the computational universe, we can see, however, that it doesn’t always have to be this way. And in fact I have had great success just “mining the computational universe” for programs (and structures) that turn out to be useful, independent of whether one can “understand” how they operate. And something like the same thing happens when one trains a modern machine learning system. One ends up with a technological system that we can identify as achieving some overall purpose, but where the individual parts we can’t particularly recognize as doing meaningful things.
And indeed my expectation is that in the future, a smaller and smaller fraction of human-created technology will be “recognizable” and “understandable”. Optimized circuitry doesn’t have nice repetitive structure; nor do optimized algorithms. Needless to say, it’s sometimes hard to tell what’s going on. Is that pattern of holes on a speakerphone arranged to optimize some acoustic feature, or is it just “decorative”?
Yet again we’re thrust back into the same philosophical quandary: we can see the mechanism by which things operate, and we can come up with a story that describes why they might work that way. But there is no absolute way to decide whether that story is “correct”—except by referring back to the details of humans and human culture.
Talking about the World
Let’s go back to language. What really is a language? Structurally (at least in all the examples we know so far) it’s a collection of primitives (words, grammatical constructs, etc.) that can be assembled according to certain rules. And yes, we can look at a language formally at this level, just like we can look, say, at how to make tilings according to some set of rules. But what makes a language useful for communication is that its primitives somehow relate to the world—and that they’re tied into knowledge.
In a first approximation, the words or other primitives in a language end up being things that are useful in describing aspects of the world that we want to communicate. We have different words for “table” and “chair” because those are buckets of meaning that we find it useful to distinguish. Yes, we could start describing the details of how the legs of the table are arranged, but for many purposes it’s sufficient to just have that one word, or one symbolic primitive, “table”, that describes what we think of as a table.
Of course, for the word “table” to be useful for communication, the sender and recipient of the word have to have shared understanding of its meaning. As a practical matter, for natural languages, this is usually achieved in an essentially societal way—with people seeing other people describing things as “tables”.
How do we determine what words should exist? It’s a societally driven process, but at some level it’s about having ways to define concepts that are repeatedly useful to us. There’s a certain circularity to the whole thing. The concepts that are useful to us depend on the environment in which we live. If there weren’t any tables around (e.g. during the Stone Age), it wouldn’t be terribly useful to have the word “table”.
But then once we introduce a word for something (like “blog”), it starts to be easier for us to think about the thing—and then there tends to be more of it in the environment that we construct for ourselves, or choose to live in.
Imagine an intelligence that exists as a fluid (say the weather, for example). Or even imagine an aquatic organism, used to a fluid environment. Lots of the words we might take for granted about solid objects or locations won’t be terribly useful. And instead there might be words for aspects of fluid flow (say, lumps of vorticity that change in some particular way) that we’ve never identified as concepts that we need words for.
It might seem as if different entities that exist within our physical universe must necessarily have some commonality in the way they describe the world. But I don’t think this is the case—essentially as a consequence of the phenomenon of computational irreducibility.
The issue is that computational irreducibility implies that there are in effect an infinite number of irreducibly different environments that can be constructed on the basis of our physical universe—just like there are an infinite number of irreducibly different universal computers that can be built up using any given universal computer. In more practical terms, a way to say this is that different entities—or different intelligences—could operate using irreducibly different “technology stacks”, based on different elements of the physical world (e.g. atomic vs. electronic vs. fluidic vs. gravitational, etc.) and different chains of inventions. And the result would be that their way of describing the world would be irreducibly different.
Forming a Language
But OK, given a certain experience of the world, how can one figure out what words or concepts are useful in describing it? In human natural languages, this seems to be something that basically just evolves through a process roughly analogous to natural selection in the course of societal use of the language. And in designing the Wolfram Language as a computational communication language I’ve basically piggybacked on what has evolved in human natural language.
So how can we see the emergence of words and concepts in a context further away from human language? Well, in modern times, there’s an answer, which is basically to use our emerging example of alien intelligence: artificial intelligence.
Just take a neural network and start feeding it, say, images of lots of things in the world. (By picking the medium of 2D images, with a particular encoding of data, we’re essentially defining ourselves to be “experiencing the world” in a specific way.) Now see what kinds of distinctions the neural net makes in clustering or classifying these images.
In practice, different runs will give different answers. But any pattern of answers is in effect providing an example of the primitives for a language.
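To make the "different runs give different answers" point concrete, here is a toy sketch in Python. It swaps in plain k-means clustering for a neural network, purely because it fits in a few lines; the six 2D points are hypothetical stand-ins for image features, and the names `kmeans` and `partition` are invented for this example.

```python
import random


def kmeans(points, k, seed, iterations=20):
    """Plain k-means on 2D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def partition(labels):
    """Describe a clustering independently of how the clusters are numbered."""
    groups = {}
    for index, lab in enumerate(labels):
        groups.setdefault(lab, []).append(index)
    return sorted(tuple(g) for g in groups.values())


# Hypothetical "feature vectors": a ring of points with no obviously right split.
points = [(1, 0), (0.5, 0.87), (-0.5, 0.87), (-1, 0), (-0.5, -0.87), (0.5, -0.87)]

for seed in range(4):
    print(seed, partition(kmeans(points, k=2, seed=seed)))
```

Depending on the seed, the ring may be cut along different lines. Each cut is a perfectly consistent set of "categories", and none of them is forced by the data alone.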
An easy place to see this is in training an image identification network. We started doing this several years ago with tens of millions of example images, in about 10,000 categories. And what’s notable is that if you look inside the network, what it’s effectively doing is to home in on features of images that let it efficiently distinguish between different categories.
These features then in effect define the emergent symbolic language of the neural net. And, yes, this language is quite alien to us. It doesn’t directly reflect human language or human thinking. It’s in effect an alternate path for “understanding the world”, different from the one that humans and human language have taken.
Can we decipher the language? Doing so would allow us to “explain the story” of what the neural net is “thinking”. But it won’t typically be easy to do. Because the “concepts” that are being identified in the neural network typically won’t have easy translations to things we know about—and we’ll be stuck in effect doing something like natural science to try to identify phenomena from which we can build up a description of what’s going on.
OK, but in the problem of communicating with aliens, perhaps this suggests a way. Don’t try (and it’ll be hard) to specify a formal definition of “chair”. Just show lots of examples of chairs—and use this to define the symbolic “chair” construct. Needless to say, as soon as one’s showing pictures of chairs, not providing actual chairs, there are issues of how one’s describing or encoding things. And while this approach might work decently for common nouns, it’s more challenging for things like verbs, or more complex linguistic constructs.
But if we don’t want our spacecraft full of sample objects (a kind of ontological Noah’s Ark), maybe we could get away with just sending a device that looks at objects, and outputs what they’re called. After all, a human version of this is basically how people learn languages, either as children, or when they’re out doing linguistic fieldwork. And today we could certainly have a little computer with a very respectable, human-grade image identifier on it.
But here’s the problem. The aliens will start showing the computer all sorts of things that they’re familiar with. But there’s no guarantee whatsoever that they’ll be aligned with the things we (or the image identifier) have words for. One can already see the problem if one feeds an image identifier human abstract art; it’s likely to be even worse with the products of alien civilization:
What the Wolfram Language Does
So can the Wolfram Language help? My goal in building it has been to create a bridge between the things humans want to do, and the things computation abstractly makes possible. And if I were building the language not for humans but for aliens—or even dolphins—I’d expect it to be different.
In the end, it’s all about computation, and representing things computationally. But what one chooses to represent—and how one does it—depends on the whole context one’s dealing with. And in fact, even for us humans, this has steadily changed over time. Over the 30+ years I’ve been working on the Wolfram Language, for example, both technology and the world have measurably evolved—with the result that there are all sorts of new things that make sense to have in the language. (The advance of our whole cultural understanding of computation—with things like hyperlinks and functional programming now becoming commonplace—also changes the concepts that can be used in the language.)
Right now most people think of the Wolfram Language mainly as a way for humans to communicate with computers. But I’ve always seen it as a general computational communication language for humans and computers—that’s relevant among other things in giving us humans a way to think and communicate in computational terms. (And, yes, the kind of computational thinking this makes possible is going to be increasingly critical—even more so than mathematical thinking has been in the past.)
But the key point is that the Wolfram Language is capturing computation in human-compatible terms. And in fact we can view it as in effect giving a definition of which parts of the universe of possible computations we humans—at the current stage in the evolution of our civilization—actually care about.
Another way to put this is that we can think of the Wolfram Language as providing a compressed representation (or, in effect, a model) of the core content of our civilization. Some of that content is algorithmic and structural; some of it is data and knowledge about the details of our world and its history.
There’s more to do to make the Wolfram Language into a full symbolic discourse language that can express a full range of human intentions (for example what’s needed for encoding complete legal contracts, or ethical principles for AIs). But with the Wolfram Language as it exists today, we’re already capturing a very broad swath of the concerns and achievements of our civilization.
But how would we feed it to aliens? At some level its gigabytes of code and terabytes of data just define rules—like the rules for a cellular automaton or any other computational system. But the point is that these rules are chosen to be ones that do computations that we humans care about.
It’s a bit like those Egyptian tomb models, which show things Egyptians cared about doing. If we give the aliens the Wolfram Language we’re essentially giving them a computational model of things we care about doing. Except, of course, that by providing a whole language—rather than just individual pictures or dioramas—we’re communicating in a vastly broader and deeper way.
The Reality of Time Capsules
What we’re trying to create in a sense amounts to a time capsule. So what can we learn from time capsules of the past? Sadly, the history is not too inspiring.
Particularly following the discovery of King Tutankhamun’s tomb in 1922, there was a burst of enthusiasm for time capsules that lasted a little over 50 years, and led to the creation—and typically burial—of perhaps 10,000 capsules. Realistically, though, the majority of these time capsules are even by now long forgotten—most often because the organizations that created them have changed or disappeared. (The Westinghouse Time Capsule for the 1939 World’s Fair was at one time a proud example; but last year the remains of Westinghouse filed for bankruptcy.)
My own email archive records a variety of requests in earlier years for materials for time capsules, and looking at it today I’m reminded that we seem to have created a time capsule for Mathematica’s 10th anniversary in 1998. But where is it now? I don’t know. And this is a typical problem. Because whereas an ongoing archive (or library, etc.) can keep organized track of things, time capsules tend to be singular, and have a habit of ending up sequestered away in places that quickly get obscured and forgotten. (The reverse can also happen: people think there’s a time capsule somewhere—like one supposedly left by John von Neumann to be opened 50 years after his death—but it turns out just to be a confusion.)
The one area where at least informal versions of time capsules seem to work out with some frequency is in building construction. In England, for example, when thatched roofs are redone after 50 years or so, it’s common for messages from the previous workers to be found. But a particularly old tradition—dating even back to the Babylonians—is to put things in the foundations, and particularly at the cornerstones, of buildings.
Often in Babylonian times, there would just be an inscription cursing whoever had demolished the building to the point of seeing its foundations. But later, there was for example a longstanding tradition among Freemason stonemasons to embed small boxes of memorabilia in public buildings they built.
More successful, however, than cleverly hidden time capsules have been stone inscriptions out in plain sight. And indeed much of our knowledge of ancient human history and culture comes from just such objects. Sometimes they are part of large surviving architectural structures. But one famous example (key to the deciphering of cuneiform) is simply carved into the side of a cliff in what’s now Iran:
For emphasis, it has a life-size relief of a bunch of warriors at the top. The translated text begins: “I am Darius the great, king of kings, …” and goes on to list 76 paragraphs of Darius’s achievements, many of them being the putting down of attempted rebellions against him, in which he brought their leaders to sticky ends.
Such inscriptions were common in the ancient world (as their tamer successors are common today). But somehow their irony was well captured by my childhood favorite poem, Shelley’s “Ozymandias” (named after Ramses II of Egypt):
“I met a traveller from an antique land,
Who said—Two vast and trunkless legs of stone
Stand in the desert.
…
And on the pedestal, these words appear:
‘My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.”
If there was a “Risks” section to a prospectus for the beacon project, this might be a good exhibit for it.
Of course, in addition to intentional “showoff” inscriptions, ancient civilizations left plenty of “documentary exhaust” that’s still around in one form or another today. A decade ago, for example, I bought off the web (and, yes, I’m pretty sure it’s genuine) a little cuneiform tablet from about 2100 BC:
It turns out to be a contract saying that a certain Mr. Lu-Nanna is receiving 1.5 gur (about 16 cubic feet) of barley in the month of Dumuzi (Tammuz/June-July), and that in return he should pay out certain goods in September-November.
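The volume conversion can be checked with a few lines of arithmetic. This sketch assumes the commonly cited figure of roughly 300 liters for one gur in this period, which is not stated in the text above; the constant and function names are illustrative only.

```python
LITERS_PER_GUR = 300.0                 # assumption: approximate capacity of one gur
LITERS_PER_CUBIC_FOOT = 28.316846592   # follows from 1 foot = 0.3048 m exactly


def gur_to_cubic_feet(gur):
    """Convert a volume in gur to cubic feet under the assumption above."""
    return gur * LITERS_PER_GUR / LITERS_PER_CUBIC_FOOT


print(round(gur_to_cubic_feet(1.5), 1))
```

Under that assumption, 1.5 gur comes to 450 liters, which is consistent with the "about 16 cubic feet" quoted above.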
Most surviving cuneiform tablets are about things like this. One in a thousand or so are about things like math and astronomy, though. And when we look at these tablets today, it’s certainly interesting to see how far the Babylonians had got in math and astronomy. But (with the possible exception of some astronomical parameters) after a while we don’t really learn anything more from such tablets.
And that’s a lesson for our efforts now. If we put math or science facts in our beacons, then, yes, it shows how far we’ve got (and of course to make the best impression we should try to illustrate the furthest reaches of, for example, today’s math, which will be quite hard to do). But it feels a bit like job applicants writing letters that start by explaining basic facts. Yes, we already know those; now tell us something about yourselves!
But what’s the best way to do that? In the past the channel with the highest bandwidth was the written word. In today’s world, maybe video—or AI simulation—goes further. But there’s more—and we’re starting to see this in modern archaeology. The fact is that pretty much any solid object carries microscopic traces of its history. Maybe it’s a few stray molecules—say from the DNA of something that got onto an eating utensil. Maybe it’s microscopic scratches or cracks in the material itself, indicating some pattern of wear.
Atomic force microscopy gives us the beginning of one way to systematically read such things out. But as molecular-scale computing comes online, such capabilities will grow rapidly. And this will give us access to a huge repository of “historical exhaust”.
We won’t immediately know the name “Lu-Nanna”. But we might well know their DNA, the DNA of their scribe, what time of day their tablet was made, and what smells and maybe even sounds there were while the clay was drying. All of this one can think of as a form of “sensory data”—once again giving us information on “what happened”, though with no interpretation of what was considered important.
Messages in Space
OK, but our objective is to put information about our civilization out into space. So what’s the history of previous efforts to do that? Well, right now there are just four spacecraft outside our solar system (and another one that’s headed there), and there are under 100 spacecraft more-or-less intact on various planetary surfaces (not counting hard landings, melted spacecraft on Venus, etc.). And at some level a spacecraft itself is a great big “message”, illustrating lots of technology and so on.
Probably the largest amounts of “design information” will be in the microprocessors. And although radiation hardening forces deep space probes to use chip designs that are typically a decade or more behind the latest models, something like the New Horizons spacecraft launched in 2006 still has MIPS R3000 CPUs (albeit running at 12 MHz) with more than 100,000 transistors:
There are also substantial amounts of software, typically stored in some kind of ROM. Of course, it may not be easy to understand, even for humans—and indeed just last month, firing backup thrusters on Voyager 1 that hadn’t been used for 37 years required deciphering the machine code for a long-extinct custom CPU.
The structure of a spacecraft tells a lot about human engineering and its history. Why was the antenna assembly that shape? Well, because it came from a long lineage of other antennas that were conveniently modeled and manufactured in such-and-such a way, and so on.
But what about more direct human information? Well, there are often little labels printed on components by manufacturers. And in recent times there’s been a trend of sending lists of people’s names (more than 400,000 on New Horizons) in engravings, microfilm or CDs/DVDs. (The MAVEN Mars mission also notably carried 1000+ publicly submitted haikus about Mars, together with 300+ drawings by kids, all on a DVD.) But on most spacecraft the single most prominent piece of “human communication” is a flag:
A few times, however, there have been explicit, purposeful plaques and things displayed. For example, on the leg of Apollo 11’s lunar module this was attached (with the Earth rendered in a stereographic projection cut in the middle of the Atlantic around 20°W):
Each Apollo mission to the Moon also planted an American flag (most still “flying” according to recent high-res reconnaissance)—strangely reminiscent of shrines to ancient gods found in archaeological remains:
The very first successful moon probe (Luna 2) carried to the Moon this ball-like object—which was intended to detonate like a grenade and scatter its pentagonal facets just before the probe hit the lunar surface, proclaiming (presumably to stake a claim): “USSR, January 1959”:
Courtesy of the Cosmosphere, Hutchinson, KS
On Mars, there’s a plaque that seems more like the cover sheet for a document—or that might be summarized as “putting the output of some human cerebellums out in the cosmos” (what kind of personality analysis could the aliens do from those signatures?):
There’s another list of names, this time an explicit memorial for fallen astronauts, left on the Moon by Apollo 15. But this time it comes with a small figurine, strangely reminiscent of the figurines we find in early archaeological remains:
Figurines have actually been sent on other spacecraft too. Here are some LEGO ones that went to Jupiter on the Juno spacecraft (from left to right: mythological Jupiter, mythological Juno, and real Galileo, complete with LEGO attachments):
Also on that spacecraft was a tribute to Galileo—though all this will be vaporized when the spacecraft deorbits Jupiter later in 2018 to avoid contaminating any moons:
A variety of somewhat random personal and other trinkets have been left—usually unofficially—on the Moon. An example is a collection of tiny artworks (which are head scratchers even for me as a human) apparently attached to the leg of the Apollo 12 lunar module:
There was also a piece of “artwork” (doubling as a color calibration target) sent on the ill-fated Beagle 2 Mars lander:
There are “MarsDials” on several Mars landers, serving as sundials and color calibration targets. The earlier ones had the statement “Two worlds, one sun”—along with the word “Mars” in 22 languages; on later ones the statement was the less poetic “On Mars, to explore”:
As another space trinket, the New Horizons spacecraft that recently passed Pluto has a simple Florida state quarter on board—which at least was presumably easy and cheap to obtain near its launch site.
But the most serious—and best-known—attempts to provide messages are the engraved aluminum plaques on the Pioneer 10 and 11 spacecraft that were launched in 1972 and 1973 (though are sadly now out of contact):
I must say I have never been a big fan of this plaque. It always seemed to me too clever by half. My biggest beef has always been with the element at the top left. The original paper (with lead author Carl Sagan) about the plaque states that this “should be readily recognizable to the physicists of other civilizations”.
But what is it? As a human physicist, I can figure it out: it’s an iconic representation of the hyperfine transition of atomic hydrogen—the so-called 21-centimeter line. And those little arrows are supposed to represent the spin directions of protons and electrons before and after the transition. But wait a minute: electrons and protons are spin-1/2, so they act as spinors. And yes, traditional human quantum mechanics textbooks do often illustrate spinors using vectors. But that’s a really arbitrary convention.
Oh, and why should we represent quantum mechanical wavefunctions in atoms using localized lines? Presumably the electron is supposed to “go all the way around” the circle, indicating that it’s delocalized. And, yes, you can explain that iconography to someone who’s used to human quantum mechanics textbooks. But it’s about as obscure and human-specific as one can imagine. And, by the way, if one wants to represent 21.106-centimeter radiation, why not just draw a line precisely that length, or make the plaque that size (it actually has a width of 22.9 centimeters)!
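As a quick check on that number, here is a minimal sketch of the arithmetic in the Wolfram Language. The hydrogen line frequency (about 1420.405751768 MHz) is not given above and is supplied here as a known physical value; the variable names are purely illustrative.

```wolfram
(* Speed of light in m/s (exact by definition) *)
c = 299792458;

(* Hyperfine transition frequency of atomic hydrogen in Hz
   (value supplied for this sketch, not taken from the text) *)
f = 1420.405751768*10^6;

(* Wavelength in centimeters; this comes out at about 21.106 *)
wavelength = 100 c/f

(* The quoted plaque width of 22.9 cm, measured in wavelengths (about 1.085) *)
22.9/wavelength
```

So the plaque is only about 8.5% wider than the wavelength it goes to such lengths to depict symbolically.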
I could go on and on about what’s wrong with the plaque. The rendering conventions for the (widely mocked) human figures, especially when compared to those for the spacecraft. The use of an arrow to show the spacecraft direction (do all aliens go through a stage of shooting arrowheads?). The trailing (binary) zeros to cover the lack of precision in pulsar periods.
The official key from the original paper doesn’t help the case, and in fact the paper lays out some remarkably elaborate “science IQ test” reasoning needed to decode other things on the plaque:
After the attention garnered by the Pioneer plaques, a more ambitious effort was made for the Voyager spacecraft launched in 1977. The result was the 12-inch gold-plated Voyager Golden Record, with an “album cover”:
In 1977, phonograph records seemed like “universally obvious technology”. Today of course even the concept of analog recording is (at least for now) all but gone. And what of the elaborately drawn “needle” on the top left? In modern times the obvious way to read the record would just be to image the whole thing, without any needles tracking grooves.
But, OK, so what’s on the record? There are some spoken greetings in 55 languages (beginning with one in a modern rendering of Akkadian), along with a 90-minute collection of music from around the world. (Somehow I imagine an alien translator—or, for that matter, an AI—trying in vain to align the messages between the words and the music.) There’s an hour of recorded brainwaves of Carl Sagan’s future wife (Ann Druyan), apparently thinking about various things.
Then there are 116 images, encoded in analog scan lines (though I don’t know how color was done). Many were photographs of 1970s life on Earth. Some were “scientific explanations”, which are at least good exercises for human science students of the 2010s to interpret (though the real-number rounding is weird, there are “9 planets”, there’s “S” in place of “C” as a base pair—and it’s charming to see the stencil-and-ink rendering):
Among efforts after Voyager have been the (very 1990s-style) CD of human Mars-related “Visions of Mars” fiction on the failed 1996 Mars 96 spacecraft, as well as the 2012 “time capsule” CD of images and videos on the EchoStar 16 satellite in geostationary orbit around Earth:
A slightly different kind of plaque was launched back in 1976 on the LAGEOS-1 satellite that’s supposed to be in polar orbit around the Earth for 8.4 million years. There are the binary numbers, reminiscent of Leibniz’s original “binary medal”. And then there’s an image of the predicted effect of continental drift (and what about sea level?) from 228 million years ago, to the end of the satellite’s life—that to me gives off a certain “so, did we get it right?” vibe:
There was almost an engraved diamond plaque sent on the Cassini mission to Saturn and beyond in 1997, but as a result of human disagreements, it was never sent—and instead, in a very Ozymandias kind of way, all that’s left on the spacecraft is an empty mounting pedestal, whose purpose might be difficult to imagine.
Still another class of artifacts sent into the cosmos are radio transmissions. And until we have better-directed radio communications (and 5G will help), we’re radiating a certain amount of (increasingly encrypted) radio energy into the cosmos. The most intense ongoing transmissions remain the 50 Hz or 60 Hz hum of power lines, as well as the perhaps-almost-pulsar-like Ballistic Missile Early Warning System radars. But in the past there’ve been specific attempts to send messages for aliens to pick up.
The most famous was sent by the Arecibo radio telescope in 1974. Its repetition length was a product of two primes, intended to suggest assembly as a rectangular array. It’s an interesting exercise for humans to try to decipher the resulting image. Can you see the sequence of binary numbers? The schematic DNA, and the bitvectors for its components? The telescope icon? And the little 8-bit-video-game-like human?
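The "product of two primes" idea can be sketched in a few lines of Wolfram Language. The length 1679 is the commonly cited bit count of the 1974 message (it is not stated above), and the `bits` list here is a random stand-in, not the actual message content.

```wolfram
(* Commonly cited length of the Arecibo message in bits (assumption, not from the text) *)
n = 1679;

(* Factor it: the result is {{23, 1}, {73, 1}}, i.e. 23 * 73 *)
FactorInteger[n]

(* Placeholder data standing in for the real message bits *)
bits = RandomInteger[1, n];

(* Only two non-trivial rectangular layouts are possible *)
Dimensions[Partition[bits, 23]]   (* {73, 23} *)
Dimensions[Partition[bits, 73]]   (* {23, 73} *)

(* Render one candidate layout as an image *)
ArrayPlot[Partition[bits, 23]]
```

With the real bits, one of the two layouts gives a recognizable picture and the other looks like noise, which is the whole hint the recipient is expected to pick up on.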
Needless to say, we pick up radio transmissions from the cosmos that we don’t understand fairly often. But are they signs of intelligence? Or “merely physics”? As I’ve said, the Principle of Computational Equivalence tells us there isn’t ultimately a distinction. And that, of course, is the challenge of our beacons project.
It’s worth mentioning that in addition to what’s been sent into space, there are a few messages on Earth specifically intended for at least a few thousand years in the future. Examples are the 2000-year equinox star charts at the Hoover Dam, and the long-planned-but-not-yet-executed 10,000-year “stay away; it’s radioactive” warnings (or maybe it’s an “atomic priesthood” passing information generation to generation) for facilities like the WIPP nuclear waste repository in southeastern New Mexico. (Not strictly a “message”, but there’s also the “10,000-year clock” being built in West Texas.)
A discussion of extraterrestrial communication wouldn’t be complete without at least mentioning the 1960 book Lincos: Design of a Language for Cosmic Intercourse—my copy of which wound up on the set of Arrival. The idea of the book was to use the methods and notation of mathematical logic to explain math, science, human behavior and other things “from first principles”. Its author, Hans Freudenthal, had spent decades working on math education—and on finding the best ways to explain math to (human) kids.
Lincos was created too early to benefit from modern thinking about computer languages. And as it was, it used the often almost comically abstruse approach of Whitehead and Russell’s 1910 Principia Mathematica—in which even simple ideas become notationally complex. When it came to a topic like human behavior Lincos basically just gave examples, like small scenes in a stage play—but written in the notation of mathematical logic.
Yes, it’s interesting to try to have a symbolic representation for such things—and that’s the point of my symbolic discourse language project. But even though Lincos was at best just at the very beginning of trying to formulate something like this, it was still the obvious source for attempts to send “active SETI” messages starting in 1999, and some low-res bitmaps of Lincos were transmitted to nearby stars.
Science Fiction and Beyond
For our beacons project, we want to create human artifacts that will be recognized even by aliens. The related question of how alien artifacts might be recognizable has been tackled many times in science fiction.
Most often there’s something that just “doesn’t look natural”, either because it’s obviously defying gravity, or because it’s just too simple or perfect. For example, in the movie 2001, when the black cuboid monolith with its exact 1:4:9 side ratios shows up on Stone Age Earth or on the Moon, it’s obvious it’s “not natural”.
On the flip side, people in the 1800s argued that because a human-made pocket watch, while complex, was so much simpler than a biological organism, the latter could only be an “artifact of God”. But actually I think the issue is just that our technology isn’t advanced enough yet. We’re still largely relying on engineering traditions and structures where we readily foresee every aspect of how our system will behave.
But I don’t think this will go on much longer. As I’ve spent many years studying, out in the computational universe of all possible programs it’s very common that the most efficient programs for a particular purpose don’t look at all simple in their behavior (and in fact this is a somewhat inevitable consequence of making better use of computational resources). And the result is that as soon as we can systematically mine such programs (as Darwinian evolution and neural network training already begin to), we’ll end up with artifacts that no longer look simple.
Ironically—but not surprisingly, given the Principle of Computational Equivalence—this suggests that our future artifacts will often look much more like “natural systems”. And indeed our current artifacts may look as primitive in the future as many of those produced before modern manufacturing look to us today.
Some science fiction stories have explored “natural-looking” alien artifacts, and how one might detect them. Of course it’s mired in the same issues that I’ve been exploring throughout this post—making it very difficult for example to tell for certain even whether the strangely red and strangely elongated interstellar object recently observed crossing our solar system is an alien artifact, or just a “natural rock”.
The Space of All Possible Civilizations
A major theme of this post has been that “communication” requires a certain sharing of “cultural context”. But how much sharing is enough? Different people—with at least fairly different backgrounds and experiences—can usually understand each other well enough for society to function, although as the “cultural distance” increases, such understanding becomes more and more difficult.
Over the course of human history, one can imagine a whole net of cultural contexts, defined in large part (at least until recently) by place and time. Neighboring contexts are typically closely connected—but to get a substantial distance, say in time, often requires following a quite long chain of intermediate connections, a bit like one might have to go through a chain of intermediate translations to get from one language to another.
Particularly in modern times, cultural context often evolves quite significantly even over the course of a single human lifetime. But usually the process is gradual enough that an individual can bridge the contexts they encounter—though of course there’s no lack of older people who are at best confused at the preferences and interests of the young (think modern social media, etc.). And indeed were one just suddenly to wake up a century hence, it’s fairly certain that some of the cultural context would be somewhat disorientingly different.
But, OK, can we imagine making some kind of formal theory of cultural contexts? To do so would likely in effect require describing the space of all possible civilizations. And at first this might seem utterly infeasible.
But when we explore the computational universe of possible programs we are looking at a space of all possible rules. And it’s easy to imagine defining at least some feature of a civilization by some appropriate rule—and different rules can lead to dramatically different behavior, as in these cellular automata:
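To make this concrete, here is a small sketch that runs a few elementary cellular automaton rules from the same single-cell initial condition. The particular rule numbers are illustrative picks, not necessarily the ones in the picture that accompanied the text.

```wolfram
(* A handful of elementary cellular automaton rules with very different behavior *)
rules = {250, 90, 30, 110};

(* Evolve each rule for 40 steps from a single black cell on a white background *)
evolutions = Table[CellularAutomaton[rule, {{1}, 0}, 40], {rule, rules}];

(* Plot the evolutions next to each other, labeled by rule number *)
GraphicsRow[
 MapThread[ArrayPlot[#1, PlotLabel -> #2] &, {evolutions, rules}]]
```

Rule 250 gives a simple repetitive pattern, rule 90 a nested one, and rules 30 and 110 produce behavior that looks far more complicated, even though each rule is specified by just a single small integer.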
But, OK, what would “communication” mean in this context? Well, as soon as these rules are computationally universal (and the Principle of Computational Equivalence implies that except in trivial cases they always will be), there’s got to be some way to translate between them. More specifically, given one universal rule, there must be some program for it—or some class of initial conditions—that makes it emulate any other specified rule. Or, in other words, it must be possible to implement an interpreter for any given rule in the original rule.
We might then think of defining a distance between rules to be determined by the size or complexity of the interpreter necessary to translate between them. But while this sounds good in principle, it’s certainly not an easy thing to deal with in practice. And it doesn’t help that interpretability can be formally undecidable, so there’s no upper bound on the size or complexity of the translator between rules.
But at least conceptually, this gives us a chance to think about how a “communication distance” might be defined. And perhaps one could imagine a first approximation for the simplified case of neural networks, in which one just asks how difficult it is to train one network to act like another.
As a more down-to-earth analogy to the space of cultural contexts, we could consider human languages, of which there are about 10,000 known. One can assess similarities between languages by looking at their words, and perhaps by looking at things like their grammatical structures. And even though in first approximation all languages can talk about the same kinds of things, languages can at least superficially have significant differences.
But for the specific case of human languages, there’s a lot determined by history. And indeed there’s a whole evolutionary tree of languages that one can identify, that effectively explains what’s close and what’s not. (Languages are often related to cultures, but aren’t the same. For example, Finnish is very different as a language from Swedish, even though Finnish and Swedish cultures are fairly similar.)
In the case of human civilizations, there are all sorts of indicators of similarity one might use. How similar do their artifacts look, say as recognized by neural networks? How similar are their social, economic or genealogical networks? How similar are quantitative measures of their patterns of laws or government?
Of course, all human civilizations share all sorts of common history—and no doubt occupy only some infinitesimal corner in the space of all possible civilizations. And in the vast majority of potential alien civilizations, it’s completely unrealistic to expect that the kinds of indicators we’re discussing for human civilizations could even be defined.
So how might one characterize a civilization and its cultural context? One way is to ask how it uses the computational universe of possible programs. What parts of that universe does it care about, and what not?
Now perhaps the endpoint of cultural evolution is to make use of the whole space of possible programs. Of course, our actual physical universe is presumably based on specific programs—although within the universe one can perfectly well emulate other programs.
And presumably anything that we could identify as a definite “civilization” with definite “cultural context” must make use of some particular type of encoding—and in effect some particular type of language—for the programs it wants to specify. So one way to characterize a civilization is to imagine what analog of the Wolfram Language (or in general what symbolic discourse language) it would invent to describe things.
Yes, I’ve spent much of my life building the single example of the Wolfram Language intended for humans. And now what I’m suggesting is to imagine the space of all possible analogous languages, with all possible ways of sampling and encoding the computational universe.
But that’s the kind of thing we need to consider if we’re serious about alien communication. And in a sense just as we might say that we’re only going to consider aliens who live within a certain number of light years of us, so also we may have to say that we’ll only consider aliens where the language defining their cultural context is within a certain “translation distance” of ours.
How can we study this in practice? Well, of course we could think about what analog of the Wolfram Language other creatures with whom we share the Earth might find useful. We could also think about what AIs would find useful—though there is some circularity to this, insofar as we are creating AIs for the purpose of furthering our human goals. But probably the best path forward is just to imagine some kind of abstract enumeration of possible Wolfram-Language analogs, and then to start studying what methods of translation might be possible between them.
What Should We Actually Send?
OK, so there are lots of complicated intellectual and philosophical issues. But if we’re going to send beacons about the achievements of our civilization into space, what’s the best thing to do in practice?
A few points are obvious. First, even though it might seem more “universal”, don’t send lots of content that’s somehow formally derivable. Yes, we could say 2+2=4, or state a bunch of mathematical theorems, or show the evolution of a cellular automaton. But other than demonstrating that we can successfully do computation (which isn’t anything special, given the Principle of Computational Equivalence) we’re not really communicating anything this way. In fact, the only real information about us is our choice of what to send: which arithmetic facts, which theorems, etc.
Here’s an ancient Egyptian die. And, yes, it’s interesting that they knew about icosahedra, and chose to use them. But the details of the icosahedral shape don’t tell us anything: it’s just the same as any other icosahedron.
OK, so an important principle is: if we want to communicate about ourselves, send things that are special to us—which means all sorts of arbitrary details about our history and interests. We could send an encyclopedia. Or if we have more space, we could send the whole content of the web, or scans of all books, or all available videos.
There’s a point, though, at which we will have sent enough: where basically there’s the raw material to answer any reasonable question one could ask about our civilization and our achievements.
But how does one make this as efficient as possible? Well, at least for general knowledge I’ve spent a long time trying to solve that problem. Because in a sense that’s what Wolfram|Alpha is all about: creating a system that can compute the answers to as broad a range as possible of questions.
So, yes, if we send a Wolfram|Alpha, we’re sending knowledge of our civilization in a concentrated, computational form, ready to be used as broadly as possible.
Of course, at least the public version of Wolfram|Alpha is just about general, public knowledge. So what about more detailed information about humans and the human condition?
Well, there’re always things like email archives, and personal analytics, and recordings, and so on. And, yes, I happen to have three decades of rather extensive data about myself, that I’ve collected mostly because it was easy for me to do.
But what could one get from that? Well, I suspect there’s enough data there that at least in principle one could construct a bot of me from it: in other words, one could create an AI system that would respond to things in pretty much the same way I would.
Of course, one could imagine just “going to the source” and starting to read out the content of a human brain. We don’t know how to do that yet. But if we’re going to assume that the recipients of our beacons have advanced further, then we have to assume that given a brain, they could tell what it would do.
Indeed, perhaps the most obvious thing to send (though it’s a bit macabre) would just be whole cryonically preserved humans (and, yes, they should keep well at the temperature of interstellar space!). Of course, it’s ironic how similar this is to the Egyptian idea of making mummies—though our technology is better (even if we still haven’t yet solved the problem of cryonics).
Is there a way to do even better, though? Perhaps by using AI and digital technology, rather than biology. Well, then we have a different problem. Yes, I expect we’ll be able to make AIs that represent any aspect of our civilization that we want. But then we have to decide what the “best of our civilization” is supposed to be.
It’s very related to questions about the ethics and “constitution” we should define for the AIs—and it’s an issue that comes back directly to the dynamics of our society. If we were sending biological humans then we’d get whatever bundle of traits each human we sent happened to have. But if we’re sending AIs, then somehow we’d have to decide which of the infinite range of possible characteristics we’d assign to best represent our civilization.
Whatever we might send—biological or digital—there’s absolutely no guarantee of any successful communication. Sure, our person or our AI might do their best to understand and respond to the alien that picked them up. But it might be hopeless. Yes, our representative might be able to identify the aliens, and observe the computations they’re doing. But that doesn’t mean that there’s enough alignment to be able to communicate anything we might think of as meaning.
It’s certainly not encouraging that we haven’t yet been able to recognize what we consider to be signs of extraterrestrial intelligence anywhere else in the universe. And it’s also not encouraging that even on our own planet we haven’t succeeded in serious communication with other species.
But just like Darius—or even Ozymandias—we shouldn’t give up. We should think of the beacons we send as monuments. Perhaps they will be useful for some kind of “afterlife”. But for now they serve as a useful rallying point for thinking about what we’re proud of in the achievements of our civilization—and what we want to capture and celebrate in the best way we can. And I’ll certainly be pleased to contribute to this effort the computational knowledge that I’ve been responsible for accumulating.
This June 23rd it’ll be 30 years since we released Version 1.0, and I’m very proud of the fact that we’ve now been able to maintain an accelerating rate of innovation and development for no less than three decades. Critical to this, of course, has been the fact that we use the Wolfram Language to develop the Wolfram Language—and indeed most of the things that we can now add in Version 11.3 are only possible because we’re making use of the huge stack of technology that we’ve been systematically building for more than 30 years.
We’ve always got a large pipeline of R&D underway, and our strategy for .1 versions is to use them to release everything that’s ready at a particular moment in time. Sometimes what’s in a .1 version may not completely fill out a new area, and some of the functions may be tagged as “experimental”. But our goal with .1 versions is to be able to deliver the latest fruits of our R&D efforts on as timely a basis as possible. Integer (.0) versions aim to be more systematic, and to provide full coverage of new areas, rounding out what has been delivered incrementally in .1 versions.
In addition to all the new functionality in 11.3, there’s a new element to our process. Starting a couple of months ago, we began livestreaming internal design review meetings that I held as we brought Version 11.3 to completion. So for those interested in “how the sausage is made”, there are now almost 122 hours of recorded meetings, from which you can find out exactly how some of the things you can now see released in Version 11.3 were originally invented. And in this post, I’m going to be linking to specific recorded livestreams relevant to features I’m discussing.
What’s New?
OK, so what’s new in Version 11.3? Well, a lot of things. And, by the way, Version 11.3 is available today on both desktop (Mac, Windows, Linux) and the Wolfram Cloud. (And yes, it takes extremely nontrivial software engineering, management and quality assurance to achieve simultaneous releases of this kind.)
In general terms, Version 11.3 not only adds some completely new directions, but also extends and strengthens what’s already there. There’s lots of strengthening of core functionality: still more automated machine learning, more robust data import, knowledgebase predictive prefetching, more visualization options, etc. There are all sorts of new conveniences: easier access to external languages, immediate input iconization, direct currying, etc. And we’ve also continued to aggressively push the envelope in all sorts of areas where we’ve had particularly active development in recent years: machine learning, neural nets, audio, asymptotic calculus, external language computation, etc.
Here’s a word cloud of new functions that got added in Version 11.3:
Blockchain
There are so many things to say about 11.3, it’s hard to know where to start. But let’s start with something topical: blockchain. As I’ll be explaining at much greater length in future posts, the Wolfram Language—with its built-in ability to talk about the real world—turns out to be uniquely suited to defining and executing computational smart contracts. The actual Wolfram Language computation for these contracts will (for now) happen off the blockchain, but it’s important for the language to be able to connect to blockchains—and that’s what’s being added in Version 11.3. [Livestreamed design discussion.]
The first thing we can do is just ask about blockchains that are out there in the world. Like here’s the most recent block added to the main Ethereum blockchain:
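A minimal sketch of such a query might look like the following. It assumes network access, and that `BlockchainBlockData` accepts `-1` for the most recent block together with the `BlockchainBase` option as in the Version 11.3 documentation; the exact keys returned are not assumed here, so the sketch simply lists them.

```wolfram
(* Ask for the most recently added block on the main Ethereum blockchain *)
block = BlockchainBlockData[-1, BlockchainBase -> "Ethereum"];

(* The result is expected to be an association; inspect which fields it carries *)
Keys[block]
```

Because the result is ordinary symbolic data, it can be fed straight into the usual functions for working with associations and datasets.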
And we can then start doing data science—or whatever analysis—we want about the structure and content of the blockchain. For the initial release of Version 11.3, we’re supporting Bitcoin and Ethereum, though other public blockchains will be added soon.
But already in Version 11.3, we’re supporting a private (Bitcoin-core) Wolfram Blockchain that’s hosted in our Wolfram Cloud infrastructure. We’ll be periodically publishing hashes from this blockchain out in the world (probably in things like physical newspapers). And it’ll also be possible to run versions of it in private Wolfram Clouds.
It’s extremely easy to write something to the Wolfram Blockchain (and, yes, it charges a small number of Cloud Credits):
BlockchainPut[Graphics[Circle[]]]
The result is a transaction hash, which one can then look up on the blockchain:
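As a hedged sketch of the round trip: the variable name `txid` is illustrative, and the exact form of what `BlockchainPut` returns may differ between versions, so treat this as an outline rather than a guaranteed recipe. Note that every evaluation of `BlockchainPut` writes a new transaction and uses Cloud Credits.

```wolfram
(* Write an expression to the Wolfram Blockchain and keep the transaction hash *)
txid = BlockchainPut[Graphics[Circle[]]];

(* Look the transaction up again; this should give back the stored expression *)
BlockchainGet[txid]
```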
By the way, the Hash function in the Wolfram Language has been extended in 11.3 to immediately support the kinds of hashes (like “RIPEMD160SHA256”) that are used in cryptocurrency blockchains. And by using Encrypt and related functions, it’s possible to start setting up some fairly sophisticated things on the blockchain—with more coming soon.
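For instance, a minimal sketch of computing such a hash could look like this. The input string is arbitrary, and the choice to display the result as a 40-digit hexadecimal string is just a convenience for this example.

```wolfram
(* RIPEMD-160 applied to the SHA-256 digest, as used in cryptocurrency addresses *)
h = Hash["Hello, blockchain", "RIPEMD160SHA256"];

(* By default Hash gives an integer; show it as 40 hex digits (160 bits) *)
IntegerString[h, 16, 40]
```

Keep in mind that `Hash` applied to a string is not guaranteed to hash exactly the raw bytes a blockchain protocol expects, so for real protocol work one would want to be explicit about the byte-level input.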
System Modeling
Alright, so now let’s talk about something really big that’s new—at least in experimental form—in Version 11.3. One of our long-term goals in the Wolfram Language is to be able to compute about anything in the world. And in Version 11.3 we’re adding a major new class of things that we can compute about: complex engineering (and other) systems. [Livestreamed design discussions 1 and 2.]
Back in 2012 we introduced Wolfram SystemModeler: an industrial-strength system modeling environment that’s been used to model things like jet engines with tens of thousands of components. SystemModeler lets you both run simulations of models, and actually develop models using a sophisticated graphical interface.
What we’re adding (experimentally) in Version 11.3 is the built-in capability for the Wolfram Language to run models from SystemModeler—or in fact basically any model described in the Modelica language.
Let’s start with a simple example. This retrieves a particular model from our built-in repository of models:
But the place where it gets really interesting is that you can actually run this model. SystemModelPlot makes a plot of a “standard simulation” of the model:
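In code, retrieving and plotting a model might look roughly like this. The model name below is one of the examples in the Modelica Standard Library and is chosen purely for illustration; it is an assumption that this particular model ships in the built-in repository.

```wolfram
(* Retrieve a model by its Modelica name (illustrative choice of model) *)
model = SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"];

(* Run a standard simulation of the model and plot the result *)
SystemModelPlot[model]
```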
What actually is the model underneath? Well, it’s a set of equations that describe the dynamics of how the components of the system behave. And for a very simple system like this, these equations are already pretty complicated:
It comes with the territory in modeling real-world systems that there tend to be lots of components, with lots of complicated interactions. SystemModeler is set up to let people design arbitrarily complicated systems graphically, hierarchically connecting together components representing physical or other objects. But the big new thing is that once you have the model, then with Version 11.3 you can immediately work with it in the Wolfram Language.
One of the model’s properties gives the variables that characterize the system. And, yes, even in a very simple system like this, there are already lots of those:
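A sketch of how one might query a model along these lines is shown below. The model name is an illustrative example from the Modelica Standard Library, and the property names "Properties" and "SystemVariables" are assumed to be available on `SystemModel` objects in this form.

```wolfram
(* Illustrative model from the Modelica Standard Library *)
model = SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"];

(* List the properties the model object supports *)
model["Properties"]

(* Ask for the variables that characterize the system *)
model["SystemVariables"]
```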
A typical thing one wants to do is to investigate how the system behaves when parameters are changed. This simulates the system with one of its parameters changed, then makes a plot:
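Here is a hedged sketch of that kind of experiment. The model name and the parameter name "V.freqHz" (the frequency of the voltage source) are assumptions chosen for illustration, and should be checked against the actual parameter list of whatever model is being used.

```wolfram
(* Illustrative model from the Modelica Standard Library *)
model = SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"];

(* Simulate with one parameter overridden; the parameter name is an assumption *)
sim = SystemModelSimulate[model,
   <|"ParameterValues" -> {"V.freqHz" -> 2.5}|>];

(* Plot the simulation result using the model's standard plots *)
SystemModelPlot[sim]
```

Wrapping the simulation call in `Table` over a list of parameter values is one straightforward way to extend this into a parameter sweep.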
We could go on from here to sample lots of different possible inputs or parameter values, and do things like studying the robustness of the system to changes. Version 11.3 provides a very rich environment for doing all these things as an integrated part of the Wolfram Language.
In 11.3 there are already over 1000 ready-to-run models included—of electrical, mechanical, thermal, hydraulic, biological and other systems. Here’s a slightly more complicated example—the core part of a car:
In addition to complete ready-to-run models, there are also over 6000 components included in 11.3, from which models can be constructed. SystemModeler provides a full graphical environment for assembling these components. But one can also do it purely with Wolfram Language code, using functions like ConnectSystemModelComponents (which essentially defines the graph of how the connectors of different components are connected):
model = ConnectSystemModelComponents[components, connections]
You can also create models directly from their underlying equations, as well as making “black-box models” purely from data or empirical functions (say from machine learning).
It’s taken a long time to build all the system modeling capabilities that we’re introducing in 11.3. And they rely on a lot of sophisticated features of the Wolfram Language—including large-scale symbolic manipulation, the ability to robustly solve systems of differential-algebraic equations, handling of quantities and units, and much more. But now that system modeling is integrated into the Wolfram Language, it opens all sorts of important new opportunities—not only in engineering, but in all fields that benefit from being able to readily simulate multi-component real-world systems.
I find this helpful, because otherwise I sometimes don’t notice closed groups, with extra cells inside. (And, yes, if you don’t like it, you can always switch it off in the stylesheet.)
Another small but useful change is the introduction of “indefinite In/Out labels”. In a notebook that’s connected to an active kernel, successive cells are labeled In[1], Out[1], etc. But if one’s no longer connected to the same kernel (say, because one saved and reopened the notebook), the In/Out numbering no longer makes sense. So in the past, there were just no In, Out labels shown. But as of Version 11.3, there are still labels, but they’re grayed down, and they don’t have any explicit numbers in them:
Another new feature in Version 11.3 is Iconize. Here’s the basic problem it solves. Let’s say you’ve got some big piece of data or other input that you want to store in the notebook, but you don’t want it to visually fill up the notebook. Well, one thing you can do is to put it in closed cells. But then to use the data you have to do something like creating a variable and so on. Iconize provides a simple, inline way to save data in a notebook.
In Version 11.2 we introduced ExternalEvaluate, for evaluating code in external languages (initially Python and JavaScript) directly from the Wolfram Language. (This is supported on the desktop and in private clouds; for security and provisioning reasons, the public Wolfram Cloud only runs pure Wolfram Language code.)
In Version 11.3 we’re now making it even easier to enter external code in notebooks. Just start an input cell with a > and you’ll get an external code cell (you can stickily select the language you want):
And, yes, what comes back is a Wolfram Language expression that you can compute with:
StringSplit[%, "-"]
Workflow Documentation
We put a lot of emphasis on documenting the Wolfram Language—and traditionally we’ve had basically three kinds of components to our documentation: “reference pages” that cover a single function, “guide pages” that give a summary with links to many functions, and “tutorials” that provide narrative introductions to areas of functionality. Well, as of Version 11.3 there’s a fourth kind of component: workflows—which is what the gray tiles at the bottom of the “root guide page” lead to.
When everything you’re doing is represented by explicit Wolfram Language code, the In/Out paradigm of notebooks is a great way to show what’s going on. But if you’re clicking around, or, worse, using external programs, this isn’t enough. And that’s where workflows come in—because they use all sorts of graphical devices to present sequences of actions that aren’t just entering Wolfram Language input.
Another big new interface-related thing in Version 11.3 is Presenter Tools—a complete environment for creating and running presentations that include live interactivity. What makes Presenter Tools possible is the rich notebook system that we’ve built over the past 30 years. But what it does is to add all the features one needs to conveniently create and run really great presentations.
People have been using our previous SlideShow format to give presentations with Wolfram Notebooks for about 20 years. But it was never a complete solution. Yes, it provided nice notebook features like live computation in a slide show environment, but it didn’t do “PowerPoint-like” things such as automatically scaling content to screen resolution. To be fair, we expected that operating systems would just intrinsically solve problems like content scaling. But it’s been 20 years and they still haven’t. So now we’ve built the new Presenter Tools, which both solves such problems and adds a whole range of features to make creating great presentations with notebooks as easy as possible.
Here’s what it looks like when you’re editing your presentation (and you can change themes whenever you want):
When you’re ready to present, just press Start Presentation. Everything goes full screen and is automatically scaled to the resolution of the screen you’re using. But here’s the big difference from PowerPoint-like systems: everything is live, interactive, editable, and scrollable. For example, you can have a Manipulate right inside a slide, and you can immediately interact with it. (Oh, and everything can be dynamic, say recreating graphics based on data that’s being imported in real time.) You can also use things like cell groups to organize content in slides. And you can edit what’s on a slide, and for example, do livecoding, running your code as you go.
When you’re ready to go to a new slide, just press a single key (or have your remote do it for you). By default, the key is Page Down (so you can still use arrow keys in editing), but you can set a different key if you want. You can have Presenter Tools show your slides on one display, then display notes and controls on another display. When you make your slides, you can include SideNotes and SideCode. SideNotes are “PowerPoint-like” textual notes. But SideCode is something different. It’s actually based on something I’ve done in my own talks for years. It’s code you’ve prepared, that you can “magically” insert onto a slide in real time during your presentation, immediately evaluating it if you want.
I’ve given a huge number of talks using Wolfram Notebooks over the years. A few times I’ve used the SlideShow format, but mostly I’ve just done everything in an ordinary notebook, often keeping notes on a separate device. But now I’m excited that with Version 11.3 I’ve got basically exactly the tools I need to prepare and present talks. I can pre-define some of the content and structure, but then the actual talk can be very dynamic and spontaneous—with live editing, livecoding and all sorts of interactivity.
Wolfram Chat
While we’re discussing interface capabilities, here’s another new one: Wolfram Chat. When people are interactively working together on something, it’s common to hear someone say “let me just send you a piece of code” or “let me send you a Manipulate”. Well, in Version 11.3 there’s now a very convenient way to do this, built directly into the Wolfram Notebook system—and it’s called Wolfram Chat. [Livestreamed design discussion.]
Just select File > New > Chat; you’ll get asked who you want to “chat with”—and it could be anyone anywhere with a Wolfram ID (though of course they do have to accept your invitation):
Then you can start a chat session, and, for example, put it alongside an ordinary notebook:
The neat thing is that you can send anything that can appear in a notebook, including images, code, dynamic objects, etc. (though it’s sandboxed so people can’t send “code bombs” to each other).
There are lots of obvious applications of Wolfram Chat, not only in collaboration, but also in things like classroom settings and technical support. And there are some other applications too. Like for running livecoding competitions. And in fact one of the ways we stress-tested Wolfram Chat during development was to use it for the livecoding competition at the Wolfram Technology Conference last fall.
One might think that chat is something straightforward. But actually it’s surprisingly tricky, with a remarkable number of different situations and cases to cover. Under the hood, Wolfram Chat is using both the Wolfram Cloud and the new pub-sub channel framework that we introduced in Version 11.0. In Version 11.3, Wolfram Chat is only being supported for desktop Wolfram Notebooks, but it’ll be coming soon to notebooks on the web and on mobile.
Language Conveniences
We’re always polishing the Wolfram Language to make it more convenient and productive to use. And one way we do this is by adding new little “convenience functions” in every version of the language. Often what these functions do is pretty straightforward; the challenge (which has often taken years) is to come up with really clean designs for them. (You can see quite a bit of the discussion about the new convenience functions for Version 11.3 in livestreams we’ve done recently.)
Here’s a function that it’s sort of amazing we’ve never explicitly had before—a function that just constructs an expression from its head and arguments:
Construct[f, x, y]
Why is this useful? Well, it can save explicitly constructing pure functions with Function or &, for example in a case like this:
Fold[Construct, f, {a, b, c}]
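To make the effect of Construct concrete, here is a minimal sketch that builds the same results with and without it. The symbols f, x, y, a, b and c are arbitrary placeholders chosen for the illustration.

```wolfram
(* Construct just builds head[args...] *)
Construct[f, x, y]                      (* f[x, y] *)

(* Folding with Construct applies the result to one more argument at each step *)
Fold[Construct, f, {a, b, c}]           (* f[a][b][c] *)

(* The same thing written with an explicit pure function *)
Fold[#1[#2] &, f, {a, b, c}]            (* f[a][b][c] *)
```

The last two lines give identical results; Construct simply gives a name to the pure function #1[#2] & (and its multi-argument relatives).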
Another function that at some level is very straightforward (but about whose name we agonized for quite a while) is Curry. Curry (named after “currying”, which is in turn named after Haskell Curry) essentially makes operator forms, with Curry[f,n] “currying in” n arguments:
Why is this useful? Well, some functions (like Select, say) have built-in “operator forms”, in which you give one argument, then you “curry in” others:
Select[# > 5 &][Range[10]]
But what if you wanted to create an operator form yourself? Well, you could always explicitly construct it using Function or &. But with Curry you don’t need to do that. Like here’s an operator form of D, in which the second argument is specified to be x:
Curry[D][x]
Now we can apply this operator form to actually do differentiation with respect to x:
%[f[x]]
Yes, Curry is at some level rather abstract. But it’s a nice convenience if you understand it—and understanding it is a good exercise in understanding the symbolic structure of the Wolfram Language.
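As a rough illustration of what the operator form of D stands for, this sketch writes the same derivative operator by hand as a pure function. The name dx is made up for the example, and the Curry line assumes the Version 11.3 behavior described above, in which Curry[f][x][y] is equivalent to f[y, x].

```wolfram
(* Hand-written operator form of D with the second argument fixed to x *)
dx = D[#, x] &;
dx[Sin[x]^2]                 (* 2 Cos[x] Sin[x] *)

(* The same operator via Curry, assuming Curry[D][x][expr] is D[expr, x] *)
Curry[D][x][Sin[x]^2]
```

Under that assumption both lines compute the same derivative; the Curry version just avoids writing the # and & explicitly.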
Talking of operator forms, by the way, NearestTo is an operator-form analog of Nearest (the one-argument form of Nearest itself generates a NearestFunction):
NearestTo[2.3][{1, 2, 3, 4, 5}]
Here’s an example of why this is useful. This finds the 5 chemical elements whose densities are nearest to 10 g/cc:
In Version 10.1 in 2015 we introduced a bunch of functions that operate on sequences in lists. Version 11.3 adds a couple more such functions. One is SequenceSplit. It’s like StringSplit for lists: it splits lists at the positions of particular sequences:
SequenceSplit[{a, b, x, x, c, d, x, e, x, x, a, b}, {x, x}]
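The analogy with StringSplit can be seen side by side in a small sketch. The particular lists and strings here are invented for the illustration.

```wolfram
(* Split a list wherever the sublist {0, 0} occurs *)
SequenceSplit[{1, 2, 0, 0, 3, 4, 0, 5, 0, 0, 6}, {0, 0}]
(* {{1, 2}, {3, 4, 0, 5}, {6}} *)

(* The string analog, splitting at the substring "--" *)
StringSplit["ab--cd-e--f", "--"]
(* {"ab", "cd-e", "f"} *)
```

In both cases a lone separator element (a single 0, or a single "-") is left untouched, because only the full sequence counts as a delimiter.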
Also new in the “Sequence family” is the function SequenceReplace:
SequenceReplace[{a, b, x, x, c, d, x, e, x, x, a, b}, {x, n_} -> {n, n, n}]
Visualization Updates
Just as we’re always polishing the core programming functionality of the Wolfram Language, we’re also always polishing things like visualization.
In Version 11.3, we’ve added GeoHistogram, here showing “volcano density” in the US:
Also new in Version 11.3 are callouts in 3D plots, here random words labeling random points (but note how the words are positioned to avoid each other):
We can make a slightly more meaningful plot of words in 3D by using the new machine-learning-based FeatureSpacePlot3D (notice for example that “vocalizing” and “crooning” appropriately end up close together):
FeatureSpacePlot3D[RandomWord[20]]
Text Reading
Talking of machine learning, Version 11.3 continues our aggressive development of automated machine learning, building both general tools, and specific functions that make use of machine learning.
An interesting example of a new function is FindTextualAnswer, which takes a piece of text, and tries to find answers to textual questions. Here we’re using the Wikipedia article on “rhinoceros”, asking how much a rhino weighs:
FindTextualAnswer[
WikipediaData["rhinoceros"], "How much does a rhino weigh?"]
It almost seems like magic. Of course it doesn’t always work, and it can do things that we humans would consider pretty stupid. But it’s using very state-of-the-art machine learning methodology, together with a lot of unique training data based on Wolfram|Alpha. We can see a little more of what it does if we ask not just for its top answer about rhino weights, but for its top 5:
FindTextualAnswer[
WikipediaData["rhinoceros"], "How much does a rhino weigh?", 5]
FindTextualAnswer is no substitute for our whole data curation and computable data strategy. But it’s useful as a way to quickly get a first guess of an answer, even from completely unstructured text. And, yes, it should do well at critical reading exercises, and could probably be made to do well at Jeopardy! too.
Face Computation
We humans respond a lot to human faces, and with modern machine learning it’s possible to do all sorts of face-related computations—and in Version 11.3 we’ve added systematic functions for this. Here FindFaces pulls out faces (of famous physicists) from a photograph:
There are now all sorts of functions in the Wolfram Language (like FacialFeatures) that use neural networks inside. But for several years we’ve also been energetically building a whole subsystem in the Wolfram Language to let people work directly with neural networks. We’ve been building on top of low-level libraries (particularly MXNet, to which we’ve been big contributors), so we can make use of all the latest GPU and other optimizations. But our goal is to build a high-level symbolic layer that makes it as easy as possible to actually set up neural net computations. [Livestreamed design discussions 1, 2 and 3.]
There are many parts to this. Setting up automatic encoding and decoding to standard Wolfram Language constructs for text, images, audio and so on. Automatically being able to knit together individual neural net operations, particularly ones that deal with things like sequences. Being able to automate training as much as possible, including automatically doing hyperparameter optimization.
But there’s something perhaps even more important too: having a large library of existing, trained (and untrained) neural nets, that can both be used directly for computations, and can be used for transfer learning, or as feature extractors. And to achieve this, we’ve been building our Neural Net Repository:
There are networks here that do all sorts of remarkable things. And we’re adding new networks every week. Each network has its own page, that includes examples and detailed information. The networks are stored in the cloud. But all you have to do to pull them into your computation is to use NetModel:
NetModel["3D Face Alignment Net Trained on 300W Large Pose Data"]
NetModel["Wolfram FindTextualAnswer Net for WL 11.3"]
One thing that’s new in Version 11.3 is the iconic representation we’re using for networks. We’ve optimized it to give you a good overall view of the structure of net graphs, but then to allow interactive drilldown to any level of detail. And when you train a neural network, the interactive panels that come up have some spiffy new features—and with NetTrainResultsObject, we’ve now made the actual training process itself computable.
Version 11.3 has some new layer types like CTCLossLayer (particularly to support audio), as well as lots of updates and enhancements to existing layer types (10x faster LSTMs on GPUs, automatic variable-length convolutions, extensions of many layers to support arbitrary-dimension inputs, etc.). In Version 11.3 we’ve had a particular focus on recurrent networks and sequence generation. And to support this, we’ve introduced things like NetStateObject—that basically allows a network to have a persistent state that’s updated as a result of input data the network receives.
In developing our symbolic neural net framework we’re really going in two directions. The first is to make everything more and more automated, so it’s easier and easier to set up neural net systems. But the second is to be able to readily handle more and more neural net structures. And in Version 11.3 we’re adding a whole collection of “network surgery” functions—like NetTake, NetJoin and NetFlatten—to let you go in and tweak and hack neural nets however you want. Of course, our system is designed so that even if you do this, our whole automated system—with training and so on—still works just fine.
Asymptotic Analysis
For more than 30 years, we’ve been on a mission to make as much mathematics as possible computational. And in Version 11.3 we’ve finally started to crack an important holdout area: asymptotic analysis.
Here’s a simple example: find an approximate solution to a differential equation near x = 0:
At first, this might just look like a power series solution. But look more carefully: there’s an e^(1/x) factor that would just give infinity at every order as a power series in x. But with Version 11.3, we’ve now got asymptotic analysis functions that handle all sorts of scales of growth and oscillation, not just powers.
Back when I made my living as a physicist, it always seemed like some of the most powerful dark arts centered around perturbation methods. There were regular perturbations and singular perturbations. There were things like the WKB method, and the boundary layer method. The point was always to compute an expansion in some small parameter, but it seemed to always require different trickery in different cases to achieve it. But now, after a few decades of work, we finally in Version 11.3 have a systematic way to solve these problems. Like here’s a differential equation where we’re looking for the solution for small ε:
Back in Version 11.2, we added a lot of capabilities for dealing with more sophisticated limits. But with our asymptotic analysis techniques we’re now also able to do something else, that’s highly relevant for all sorts of problems in areas like number theory and computational complexity theory, which is to compare asymptotic growth rates.
This is asking: is 2^(n^k) asymptotically less than (n^m)! as n → ∞? The result: yes, subject to certain conditions:
AsymptoticLess[ 2^n^k, (n^m)!, n -> \[Infinity]]
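For a simpler case with no symbolic parameters, where the answer is an unconditional True or False, a minimal sketch might look like this (the particular functions compared are chosen just for illustration):

```wolfram
(* Polynomial growth is asymptotically smaller than exponential growth *)
AsymptoticLess[n^3, 2^n, n -> Infinity]        (* True *)

(* ...but not the other way around *)
AsymptoticLess[2^n, n^3, n -> Infinity]        (* False *)
```

Here AsymptoticLess[f, g, n -> Infinity] is asking whether f/g tends to zero, which is the usual "little-o" comparison.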
“Elementary” Algebra
One of the features of Wolfram|Alpha popular among students is its “Show Steps” functionality, in which it synthesizes “on-the-fly tutorials” showing how to derive answers it gives. But what actually are the steps, in, say, a Show Steps result for algebra? Well, they’re “elementary operations” like “add the corresponding sides of two equations”. And in Version 11.3, we’re including functions to just directly do things like this:
AddSides[a == b, c == d]
MultiplySides[a == b, c == d]
And, OK, it seems like these are really trivial functions, that basically just operate on the structure of equations. And that’s actually what I thought when I said we should implement them. But as our Algebra R&D team quickly pointed out, there are all sorts of gotchas (“what if b is negative?”, etc.), that are what students often get wrong—but that with all of the algorithmic infrastructure in the Wolfram Language it’s easy for us to get right:
MultiplySides[x/b > 7, b]
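To show how such elementary operations chain together, here is a small sketch that solves a linear equation one step at a time. It uses SubtractSides and DivideSides, companion functions that are not mentioned above, so treat their use here as an assumption about the same family of functions; the names step1 and step2 are made up for the example.

```wolfram
(* Solve 2 x + 3 == 11 one elementary step at a time *)
step1 = SubtractSides[2 x + 3 == 11, 3]    (* 2 x == 8 *)
step2 = DivideSides[step1, 2]              (* x == 4 *)

(* Adding two equations side by side *)
AddSides[a == b, c == d]                   (* a + c == b + d *)
```

Because the divisor here is an explicit nonzero number, no extra conditions are needed; with a symbolic multiplier such as b in the inequality above, the result has to account for the sign.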
Proofs
The Wolfram Language is mostly about computing results. But given a result, one can also ask why it’s correct: one can ask for some kind of proof that demonstrates that it’s correct. And for more than 20 years I’ve been wondering how to find and represent general proofs in a useful and computable way in the Wolfram Language. And I’m excited that finally in Version 11.3 the function FindEquationalProof provides an example—which we’ll be generalizing and building on in future versions. [Livestreamed design discussion.]
My all-time favorite success story for automated theorem proving is the tiny (and in fact provably simplest) axiom system for Boolean algebra that I found in 2000. It’s just a single axiom, with a single operator that one can think of as corresponding to the Nand operation. For 11 years, FullSimplify has actually been able to use automated theorem-proving methods inside, to be able to compute things. So here it’s starting from my axiom for Boolean algebra, then computing that Nand is commutative:
What is the proof object? We can see from the summary that the proof takes 102 steps. Then we can ask for a “proof graph”. The green arrow at the top represents the original axiom; the red square at the bottom represents the thing being proved. All the nodes in the middle are intermediate lemmas, proved from each other according to the connections shown.
Now that we can actually generate symbolic proof structures in the Wolfram Language, there’s a lot of empirical metamathematics to do—as I’ll discuss in a future post. But given that FindEquationalProof works on arbitrary “equation-like” symbolic relations, it can actually be applied to lots of things—like verifying protocols and policies, for example in popular areas like blockchain.
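As a much smaller case than the Boolean algebra axiom, here is a sketch of the basic calling pattern: prove one equation from a list of axioms, then ask the resulting proof object for its graph. The axioms a == b and b == c are toy examples invented for the illustration.

```wolfram
(* Prove a == c from the two axioms a == b and b == c *)
proof = FindEquationalProof[a == c, {a == b, b == c}];

proof["ProofGraph"]      (* graph of axioms, lemmas and the theorem *)
proof["ProofDataset"]    (* step-by-step table of the proof *)
```

Even for a proof this small, the result is a symbolic object that can be queried and processed like any other expression.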
The Growing Knowledgebase
The Wolfram Knowledgebase grows every single day—partly through systematic data feeds, and partly through new curated data and domains being explicitly added. If one asks what happens to have been added between Version 11.2 and Version 11.3, it’s a slightly strange grab bag. There are 150+ new properties about public companies. There are 900 new named features on Pluto and Mercury. There are 16,000 new anatomical structures, such as nerve pathways. There are nearly 500 new “notable graphs”. There are thousands of new mountains, islands, notable buildings, and other geo-related features. There are lots of new properties of foods, and new connections to diseases. And much more.
But in terms of typical everyday use of the Wolfram Knowledgebase the most important new feature in Version 11.3 is the entity prefetching system. The knowledgebase is obviously big, and it’s stored in the cloud. But if you’re using a desktop system, the data you need is “magically” downloaded for you.
Well, in Version 11.3, the magic got considerably stronger. Because now when you ask for one particular item, the system will try to figure out what you’re likely to ask for next, and it’ll automatically start asynchronously prefetching it, so when you actually ask for it, it’ll already be there on your computer—and you won’t have to wait for it to download from the cloud. (If you want to do the prefetching “by hand”, there’s the function EntityPrefetch to do it. Note that if you’re using the Wolfram Language in the cloud, the knowledgebase is already “right there”, so there’s no downloading or prefetching to do.)
The whole prefetching mechanism is applied quite generally. So, for example, if you use Interpreter to interpret some input (say, US state abbreviations), information about how to do the interpretations will also get prefetched—so if you’re using the desktop, the interpretations can be done locally without having to communicate with the cloud.
Messages and Mail
You’ve been able to send email from the Wolfram Language (using SendMail) for a decade. But starting in Version 11.3, it can use full HTML formatting, and you can embed lots of things in it—not just graphics and images, but also cloud objects, datasets, audio and so on. [Livestreamed design discussion.]
Version 11.3 also introduces the ability to send text messages (SMS and MMS) using SendMessage. For security reasons, though, you can only send to your own mobile number, as given by the value of $MobilePhone (and, yes, obviously, the number gets validated).
The Wolfram Language has been able to import mail messages and mailboxes for a long time, and with MailReceiverFunction it’s also able to respond to incoming mail. But in Version 11.3 something new that’s been added is the capability to deal with live mailboxes.
First, connect to an (IMAP, for now) mail server (I’m not showing the authentication dialog that comes up):
mail = MailServerConnect[]
Then you can basically use the Wolfram Language as a programmable mail client. This gives you a dataset of current unread messages in your mailbox:
MailSearch[<|"From" -> "fahim"|>]
Now we can pick out one of these messages, and we get a symbolic MailItem object, that for example we can delete:
MailSearch[<|"From" -> "fahim"|>][[1]]
MailExecute["Delete", %%["MailItem"]]
Systems-Level Operations
Version 11.3 supports a lot of new systems-level operations. Let’s start with a simple but useful one: remote program execution. The function RemoteRun is basically like Unix rsh: you give it a host name (or IP address) and it runs a command there. The Authentication option lets you specify a username and password. If you want to run a persistent program remotely, you can now do that with RemoteRunProcess, which is the remote analog of the local RunProcess.
In dealing with remote computer systems, authentication is always an issue—and for several years we’ve been building a progressively more sophisticated symbolic authentication framework in the Wolfram Language. In Version 11.3 there’s a new AuthenticationDialog function, which pops up a whole variety of appropriately configured authentication dialogs. Then there’s GenerateSecuredAuthenticationKey—which generates OAuth SecuredAuthenticationKey objects that people can use to authenticate calls into the Wolfram Cloud from the outside.
Also at a systems level, there are some new import/export formats, like BSON (JSON-like binary serialization format) and WARC (web archive format). There are also HTTPResponse and HTTPRequest formats, that (among many other things) you can use to basically write a web server in the Wolfram Language in a couple of lines.
We introduced ByteArray objects into the Wolfram Language quite a few years ago—and we’ve been steadily growing support for them. In Version 11.3, there are BaseEncode and BaseDecode for converting between byte arrays and Base64 strings. Version 11.3 also extends Hash (which, among other things, works on byte arrays), adding various types of hashing (such as double SHA-256 and RIPEMD) that are used for modern blockchain and cryptocurrency purposes.
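A minimal round trip through these functions might look like the following sketch. The variable names bytes and encoded are made up for the example, and StringToByteArray and ByteArrayToString are used here only to get a byte array from readable text and back.

```wolfram
(* Encode the UTF-8 bytes of a string as Base64, then decode again *)
bytes = StringToByteArray["hello"];
encoded = BaseEncode[bytes]                    (* "aGVsbG8=" *)
ByteArrayToString[BaseDecode[encoded]]         (* "hello" *)

(* Hash also accepts byte arrays directly *)
Hash[bytes, "SHA256"]
```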
We’re always adding more kinds of data that we can make computable in the Wolfram Language, and in Version 11.3 one addition is system process data, of the sort that you might get from a Unix ps command:
SystemProcessData[]
Needless to say, you can do very detailed searches for processes with specific properties. You can also use SystemProcesses to get an explicit list of ProcessObject symbolic objects, which you can interrogate and manipulate (for example, by using KillProcess).
RandomSample[SystemProcesses[], 3]
Of course, because everything is computable, it’s easy to do things like make plots of the start times of processes running on your computer (and, yes, I last rebooted a few days ago):
If you want to understand what’s going on around your computer, Version 11.3 provides another powerful tool: NetworkPacketRecording. You may have to do some permissions setup, but then this function can record network packets going through any network interface on your computer.
Here’s just 0.1 seconds of packets going in and out of my computer as I quietly sit here writing this post:
NetworkPacketRecording[.1]
You can drill down to look at each packet; here’s the first one that was recorded:
NetworkPacketRecording[.1][[1]]
Why is this interesting? Well, I expect to use it for debugging quite regularly—and it’s also useful for studying computer security, not least because you can immediately feed everything into standard Wolfram Language visualization, machine learning and other functionality.
What Has Not Been Mentioned
This is already a long post—but there are lots of other things in 11.3 that I haven’t even mentioned. For example, there’ve been all sorts of updates for importing and exporting. Like much more efficient and robust XLS, CSV, and TSV import. Or export of animated PNGs. Or support for metadata in sound formats like MP3 and WAV. Or more sophisticated color quantization in GIF, TIFF, etc. [Livestreamed design discussions 1 and 2.]
We introduced symbolic Audio objects in 11.0, and we’ve been energetically developing audio functionality ever since. Version 11.3 has made audio capture more robust (and supported it for the first time on Linux). It’s also introduced functions like AudioPlay, AudioPause and AudioStop that control open AudioStream objects.
Also new is AudioDistance, which supports various distance measures for audio. Meanwhile, AudioIntervals can now automatically break audio into sections that are separated by silence. And, in a somewhat different area, $VoiceStyles gives the list of possible voices available for SpeechSynthesize.
Here’s a little new math function—that in this case gives a sequence of 0s and 1s in which every length-4 block appears exactly once:
DeBruijnSequence[{0, 1}, 4]
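The "every block appears exactly once" property can be checked directly. This sketch treats the sequence as cyclic, so that blocks can wrap around from the end to the beginning; the names seq and blocks are made up for the example.

```wolfram
seq = DeBruijnSequence[{0, 1}, 4];

(* All length-4 windows, wrapping around cyclically *)
blocks = Partition[seq, 4, 1, 1];

Length[seq]                                   (* 16 *)
Sort[blocks] === Sort[Tuples[{0, 1}, 4]]      (* True *)
```

There are 2^4 = 16 possible blocks, and the sequence has exactly 16 elements, so each window position accounts for exactly one block.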
The Wolfram Language now has sophisticated support for quantities and units—both explicit quantities (like 2.5 kg) and symbolic “quantity variables” (“p which has units of pressure”). But once you’re inside, doing something like solving an equation, you typically want to “factor the units out”. And in 11.3 there’s now a function that systematically does this: NondimensionalizationTransform. There’s also a new mechanism in 11.3 for introducing new kinds of quantities, using IndependentPhysicalQuantity.
Much of the built-in Wolfram Knowledgebase is ultimately represented in terms of entity stores, and in Version 11 we introduced an explicit EntityStore construct for defining new entity stores. Version 11.3 introduces the function EntityRegister, which lets you register an entity store, so that you can refer to the types of entities it contains just like you would refer to built-in types of entities (like cities or chemicals).
Another thing that’s being introduced as an experiment in Version 11.3 is the MongoLink package, which supports connection to external MongoDB databases. We use MongoLink ourselves to manage terabyte-and-beyond datasets for things like machine learning training. And in fact MongoLink is part of our large-scale development effort—whose results will be seen in future versions—to seamlessly support extremely large amounts of externally stored data.
In Version 11.2 we introduced ExternalEvaluate to run code in external languages like Python. In Version 11.3 we’re experimenting with generalizing ExternalEvaluate to control web browsers, by setting up a WebDriver framework. You can give all sorts of commands, both ones that have the same effect as clicking around an actual web browser, and ones that extract things you can see on the page.
Here’s how you can use Chrome (we support both it and Firefox) to open a webpage, then capture it:
Well, this post is getting long, but there’s certainly more I could say. Here’s a more complete list of functions that are new or updated in Version 11.3:
But to me it’s remarkable how much there is that’s in a .1 release of the Wolfram Language—and that’s emerged in just the few months since the last .1 release. It’s a satisfying indication of the volume of R&D that we’re managing to complete—by building on the whole Wolfram Language technology stack that we’ve created. And, yes, even in 11.3 there are a great many new corners to explore. And I hope that lots of people will do this, and will use the latest tools we’ve created to discover and invent all sorts of new and important things in the world.
What happens if you take four of today’s most popular buzzwords and string them together? Does the result mean anything? Given that today is April 1 (as well as being Easter Sunday), I thought it’d be fun to explore this. Think of it as an Easter egg… from which something interesting just might hatch. And to make it clear: while I’m fooling around in stringing the buzzwords together, the details of what I’ll say here are perfectly real.
But before we can really launch into talking about the whole string of buzzwords, let’s discuss some of the background to each of the buzzwords on their own.
“Quantum”
Saying something is “quantum” sounds very modern. But actually, quantum mechanics is a century old. And over the course of the past century, it’s been central to understanding and calculating lots of things in the physical sciences. But even after a century, “truly quantum” technology hasn’t arrived. Yes, there are things like lasers and MRIs and atomic force microscopes that rely on quantum phenomena, and needed quantum mechanics in order to be invented. But when it comes to the practice of engineering, what’s done is still basically all firmly classical, with nothing quantum about it.
Today, though, there’s a lot of talk about quantum computing, and how it might change everything. I actually worked on quantum computing back in the early 1980s (so, yes, it’s not that recent an idea). And I have to say, I was always a bit skeptical about whether it could ever really work—or whether any “quantum gains” one might get would be counterbalanced by inefficiencies in measuring what was going on.
But in any case, in the past 20 years or so there’s been all sorts of nice theoretical work on formulating the idea of quantum circuits and quantum computing. Lots of things have been done with the Wolfram Language, including an ongoing project of ours to produce a definitive symbolic way of representing quantum computations. But so far, all we can ever do is calculate about quantum computations, because the Wolfram Language itself just runs on ordinary, classical computers.
There are companies that have built what they say are (small) true quantum computers. And actually, we’ve been hoping to hook the Wolfram Language up to them, so we can implement a QuantumEvaluate function. But so far, this hasn’t happened. So I can’t really vouch for what QuantumEvaluate will (or will not) do.
But the big idea is basically this. In ordinary classical physics, one can pretty much say that definite things happen in the world. A billiard ball goes in this direction, or that. But in any particular case, it’s a definite direction. In quantum mechanics, though, the idea is that an electron, say, doesn’t intrinsically go in a particular, definite direction. Instead, it essentially goes in all possible directions, each with a particular amplitude. And it’s only when you insist on measuring where it went that you’ll get a definite answer. And if you do many measurements, you’ll just see probabilities for it to go in each direction.
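To make the amplitude picture concrete, here is a minimal Python sketch (Python is used purely for illustration; it is not how the Wolfram Language or any quantum device does this). It assumes a made-up electron with three possible directions and made-up amplitudes, turns the amplitudes into probabilities by squaring their magnitudes (the standard Born rule), and then simulates repeated measurements, each of which returns one definite direction.

```python
import random

def measurement_probabilities(amplitudes):
    """Born rule: the probability of each outcome is |amplitude|^2, normalized."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

def measure(amplitudes, shots, seed=0):
    """Simulate repeated measurements; each one yields a single definite outcome."""
    rng = random.Random(seed)
    probs = measurement_probabilities(amplitudes)
    counts = [0] * len(amplitudes)
    for _ in range(shots):
        outcome = rng.choices(range(len(amplitudes)), weights=probs)[0]
        counts[outcome] += 1
    return counts

# Hypothetical amplitudes for three directions (illustrative values only).
amplitudes = [0.6, 0.8j, 0.0]
probs = measurement_probabilities(amplitudes)
counts = measure(amplitudes, shots=10_000)
print(probs)
print(counts)
```

Any single call gives one definite direction; only the tallies over many shots reveal the underlying probabilities.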
Well, what quantum computing is trying to do is somehow to make use of the “all possible directions” idea in order to in effect get lots of computations done in parallel. It’s a tricky business, and there are only a few types of problems where the theory’s been worked out—the most famous being integer factoring. And, yes, according to the theory, a big quantum computer should be able to factor a big integer fast enough to make today’s cryptography infrastructure implode. But the only thing anyone so far even claims to have built along these lines is a tiny quantum computer—that definitely can’t yet do anything terribly interesting.
But, OK, so one critical aspect of quantum mechanics is that there can be interference between different paths that, say, an electron can take. This is mathematically similar to the interference that happens in light, or even in water waves, just in classical physics. In quantum mechanics, though, there’s supposed to be something much more intrinsic about the interference, leading to the phenomenon of entanglement, in which one basically can’t ever “see the wave that’s interfering”—only the effect.
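The arithmetic behind interference can be sketched in a few lines of Python. The two-path setup and the amplitude values here are assumptions chosen for illustration: when two paths lead to the same place, one adds the amplitudes first and squares afterwards, which can give more or less than simply adding the two probabilities.

```python
import cmath
import math

def combined_probability(amp_a, amp_b):
    """Interfering paths: add the amplitudes, then take the squared magnitude."""
    return abs(amp_a + amp_b) ** 2

def sum_of_probabilities(amp_a, amp_b):
    """No interference: square each amplitude separately, then add."""
    return abs(amp_a) ** 2 + abs(amp_b) ** 2

# Two hypothetical paths, each with magnitude 0.5 and an adjustable phase.
path_a = cmath.rect(0.5, 0.0)
same_phase = cmath.rect(0.5, 0.0)
opposite_phase = cmath.rect(0.5, math.pi)

constructive = combined_probability(path_a, same_phase)
destructive = combined_probability(path_a, opposite_phase)
no_interference = sum_of_probabilities(path_a, opposite_phase)
print(constructive, destructive, no_interference)
```

The same calculation applies to classical waves; nothing in this sketch is specifically quantum.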
In computing, though, we’re not making use of any kind of interference yet. Because (at least in modern times) we’re always trying to deal with discrete bits—while the typical phenomenon of interference (say in light) basically involves continuous numbers. And my personal guess is that optical computing—which will surely come—will succeed in delivering some spectacular speedups. It won’t be truly “quantum”, though (though it might be marketed like that). (For the technically minded, it’s a complicated question how computation-theoretic results apply to continuous processes like interference-based computing.)
“Neural”
A decade ago computers didn’t have any systematic way to tell whether a picture was of an elephant or a teacup. But in the past five years, thanks to neural networks, this has basically become easy. (Interestingly, the image identifier we made three years ago remains basically state of the art.)
So what’s the big idea? Well, back in the 1940s people started thinking seriously about the brain being like an electrical machine. And this led to mathematical models of “neural networks”—which were proved to be equivalent in computational power to mathematical models of digital computers. Over the years that followed, billions of actual digital electronic computers were built. And along the way, people (including me) experimented with neural networks, but nobody could get them to do anything terribly interesting. (Though for years they were quietly used for things like optical character recognition.)
But then, starting in 2012, a lot of people suddenly got very excited, because it seemed like neural nets were finally able to do some very interesting things, at first especially in connection with images.
So what happened? Well, a neural net basically corresponds to a big mathematical function, formed by connecting together lots of smaller functions, each involving a certain number of parameters (“weights”). At the outset, the big function basically just gives random outputs. But the way the function is set up, it’s possible to “train the neural net” by tuning the parameters inside it so that the function will give the outputs one wants.
It’s not like ordinary programming where one explicitly defines the steps a computer should follow. Instead, the idea is just to give examples of what one wants the neural net to do, and then to expect it to interpolate between them to work out what to do for any particular input. In practice one might show a bunch of images of elephants, and a bunch of images of teacups, and then do millions of little updates to the parameters to get the network to output “elephant” when it’s fed an elephant, and “teacup” when it’s fed a teacup.
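As a rough illustration of "lots of little updates to the parameters", here is a Python sketch of the smallest possible case: a single artificial neuron rather than a real image network. The two numeric features (relative size, and whether there is a handle), the example values, the learning rate and the number of steps are all made up for this sketch.

```python
import math
import random

def predict(weights, bias, features):
    """Output between 0 and 1; values above 0.5 are read as "elephant"."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, steps=2000, rate=0.5, seed=0):
    """Start from random weights and nudge them a little for each example shown."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(steps):
        features, target = rng.choice(examples)
        error = predict(weights, bias, features) - target
        weights = [w - rate * error * x for w, x in zip(weights, features)]
        bias -= rate * error
    return weights, bias

# Hypothetical training data: (relative size, has a handle) -> 1.0 for elephant, 0.0 for teacup.
examples = [
    ((0.9, 0.0), 1.0), ((0.8, 0.0), 1.0), ((1.0, 0.0), 1.0),
    ((0.1, 1.0), 0.0), ((0.2, 1.0), 0.0), ((0.15, 1.0), 0.0),
]
weights, bias = train(examples)

def label(features):
    return "elephant" if predict(weights, bias, features) > 0.5 else "teacup"

print(label((0.85, 0.0)), label((0.12, 1.0)))
```

The last line feeds in inputs that were never among the training examples, which is the interpolation the text describes.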
But here’s the crucial idea: the neural net is somehow supposed to generalize from the specific examples it’s shown—and it’s supposed to say that anything that’s “like” an elephant example is an elephant, even if its particular pixels are quite different. Or, said another way, there are lots of images that might be fed to the network that are in the “basin of attraction” for “elephant” as opposed to “teacup”. In a mechanical analogy, one might say that there are lots of places water might fall on a landscape, while still ending up flowing to one lake rather than another.
At some level, any sufficiently complicated neural net can in principle be trained to do anything. But what’s become clear is that for lots of practical tasks (that turn out to overlap rather well with some of what our brains seem to do easily) it’s realistic with feasible amounts of GPU time to actually train neural networks with a few million elements to do useful things. And, yes, in the Wolfram Language we’ve now got a rather sophisticated symbolic framework for training and using neural networks—with a lot of automation (that itself uses neural nets) for everything.
“Blockchain”
The word “blockchain” was first used in connection with the invention of Bitcoin in 2008. But of course the idea of a blockchain had precursors. In its simplest form, a blockchain is like a ledger, in which successive entries are coded in a way that depends on all previous entries.
Crucial to making this work is the concept of hashing. Hashing has always been one of my favorite practical computation ideas (and I even independently came up with it when I was about 13 years old, in 1973). What hashing does is to take some piece of data, like a text string, and make a number (say between 1 and a million) out of it. It does this by “grinding up the data” using some complicated function that always gives the same result for the same input, but will almost always give different results for different inputs. There’s a function called Hash in the Wolfram Language, and for example applying it to the previous paragraph of text gives 8643827914633641131.
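A minimal Python sketch of the idea follows. It is not the algorithm behind the Wolfram Language Hash function; it assumes SHA-256 from Python's standard library as the "grinding" function, and the name toy_hash and the reduction to a number between 1 and a million are choices made only for this illustration.

```python
import hashlib

def toy_hash(text, buckets=1_000_000):
    """Map a string to a number between 1 and `buckets` (inclusive)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a big integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % buckets + 1

print(toy_hash("elephant"))
print(toy_hash("elephant"))  # same input, same number
print(toy_hash("teacup"))    # different input, almost always a different number
```

With only a million possible values, two different inputs will occasionally collide, which is why the text says "almost always".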
OK, but so how does this relate to blockchain? Well, back in the 1980s people invented “cryptographic hashes” (and actually they’re closely related to things I’ve done on computational irreducibility). A cryptographic hash has the feature that while it’s easy to work out the hash for a particular piece of data, it’s very hard to find a piece of data that will generate a given hash.
So let’s say you want to prove that you created a particular document at a particular time. Well, you could compute a hash of that document, and publish it in a newspaper (and I believe Bell Labs actually used to do this every week back in the 1980s). And then if anyone ever says “no, you didn’t have that document yet” on a certain date, you can just say “but look, its hash was already in every copy of the newspaper!”.
The idea of a blockchain is that one has a series of blocks, with each containing certain content, together with a hash. And then the point is that the data from which that hash is computed is a combination of the content of the block, and the hash of the preceding block. So this means that each block in effect confirms everything that came before it on the blockchain.
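The chaining described here can be sketched in Python as follows. This is a toy under stated assumptions, not the Bitcoin block format: the field names, the all-zeros starting hash and the sample transactions are invented for the illustration, and SHA-256 stands in for whatever hash a real chain uses.

```python
import hashlib

GENESIS_HASH = "0" * 64  # assumed placeholder for "no preceding block"

def block_hash(content, previous_hash):
    """Hash the block's content together with the hash of the preceding block."""
    return hashlib.sha256((previous_hash + content).encode("utf-8")).hexdigest()

def build_chain(contents):
    chain, previous_hash = [], GENESIS_HASH
    for content in contents:
        current = block_hash(content, previous_hash)
        chain.append({"content": content, "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain

def is_valid(chain):
    """Recompute every hash from the start; any altered block breaks the check."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["content"], previous_hash):
            return False
        previous_hash = block["hash"]
    return True

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
tampered = [dict(block) for block in chain]
tampered[0]["content"] = "Alice pays Bob 500"
print(is_valid(chain), is_valid(tampered))
```

Changing the first block invalidates its stored hash, and recomputing that hash would in turn invalidate every later block.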
In cryptocurrencies like Bitcoin the big idea is to be able to validate transactions, and, for example, be able to guarantee just by looking at the blockchain that nobody has spent the same bitcoin twice.
How does one know that the blocks are added correctly, with all their hashes computed, etc.? Well, the point is that there’s a whole decentralized network of thousands of computers around the world that store the blockchain, and there are lots of people (well, actually not so many in practice these days) competing to be the one to add each new block (and include transactions people have submitted that they want in it).
The rules are (more or less) that the first person to add a block gets to keep the fees offered on the transactions in it. But each block gets “confirmed” by lots of people including this block in their copy of the blockchain, and then continuing to add to the blockchain with this block in it.
In the latest version of the Wolfram Language, BlockchainBlockData[-1, BlockchainBase -> "Bitcoin"] gives a symbolic representation of the latest block that we’ve seen be added to the Bitcoin blockchain. And by the time maybe 5 more blocks have been added, we can be pretty sure everyone’s satisfied that the block is correct. (Yes, there’s an analogy with measurement in quantum mechanics here, which I’ll be talking about soon.)
Traditionally, when people keep ledgers, say of transactions, they’ll have one central place where a master ledger is maintained. But with a blockchain the whole thing can be distributed, so you don’t have to trust any single entity to keep the ledger correct.
And that’s led to the idea that cryptocurrencies like Bitcoin can flourish without central control, governments or banks involved. And in the last couple of years there’s been lots of excitement generated by people making large amounts of money speculating on cryptocurrencies.
But currencies aren’t the only thing one can use blockchains for, and Ethereum pioneered the idea that in addition to transactions, one can run arbitrary computations at each node. Right now with Ethereum the results of each computation are confirmed by being run on every single computer in the network, which is incredibly inefficient. But the bigger point is just that computations can be running autonomously on the network. And the computations can interact with each other, defining “smart contracts” that run autonomously, and say what should happen in different circumstances.
Pretty much any nontrivial smart contract will eventually need to know about something in the world (“did it rain today?”, “did the package arrive?”, etc.), and that has to come from off the blockchain—from an “oracle”. And it so happens (yes, as a result of a few decades of work) that our Wolfram Knowledgebase, which powers Wolfram|Alpha, etc., provides the only realistic foundation today for making such oracles.
“AI”
Back in the 1950s, people thought that pretty much anything human intelligence could do, it’d soon be possible to make artificial (machine) intelligence do better. Of course, this turned out to be much harder than people expected. And in fact the whole concept of “creating artificial intelligence” pretty much fell into disrepute, with almost nobody wanting to market their systems as “doing AI”.
But about five years ago—particularly with the unexpected successes in neural networks—all that changed, and AI was back, and cooler than ever.
What is AI supposed to be, though? Well, in the big picture I see it as being the continuation of a long trend of automating things that humans previously had to do for themselves—and in particular doing that through computation. But what makes a computation an example of AI, and not just, well, a computation?
I’ve built a whole scientific and philosophical structure around something I call the Principle of Computational Equivalence, that basically says that the universe of possible computations—even done by simple systems—is full of computations that are as sophisticated as one can ever get, and certainly as our brains can do.
In doing engineering, and in building programs, though, there’s been a tremendous tendency to try to prevent anything too sophisticated from happening, and to set things up so that the systems we build just follow exactly the steps we can foresee. But there’s much more to computation than that, and in fact I’ve spent much of my life building systems that make use of this.
Wolfram|Alpha is a great example. Its goal is to take as much knowledge about the world as possible, and make it computable, then to be able to answer questions as expertly as possible about it. Experientially, it “feels like AI”, because you get to ask it questions in natural language like a human, then it computes answers often with unexpected sophistication.
Most of what’s inside Wolfram|Alpha doesn’t work anything like brains probably do, not least because it’s leveraging the last few hundred years of formalism that our civilization has developed, that allow us to be much more systematic than brains naturally are.
Some of the things modern neural nets do (and, for example, our machine learning system in the Wolfram Language does) perhaps work a little more like brains. But in practice what really seems to make things “seem like AI” is just that they’re operating on the basis of sophisticated computations whose behavior we can’t readily understand.
These days the way I see it is that out in the computational universe there’s amazing computational power. And the issue is just to be able to harness that for useful human purposes. Yes, “an AI” can go off and do all sorts of computations that are just as sophisticated as our brains. But the issue is: can we align what it does with things we care about doing?
And, yes, I’ve spent a large part of my life building the Wolfram Language, whose purpose is to provide a computational communication language in which humans can express what they want in a form suitable for computation. There’s lots of “AI power” out there in the computational universe; our challenge is to harness it in a way that’s useful to us.
Oh, and we want to have some kind of computational smart contracts that define how we want the AIs to behave (e.g. “be nice to humans”). And, yes, I think the Wolfram Language is going to be the right way to express those things, and build up the “AI constitutions” we want.
Common Themes
At the outset, it might seem as if “quantum”, “neural”, “blockchain” and “AI” are all quite separate concepts, without a lot of commonality. But actually it turns out that there are some amazing common themes.
One of the strongest has to do with complexity generation. And in fact, in their different ways, all the things we’re talking about rely on complexity generation.
What do I mean by complexity generation? One day I won’t have to explain this. But for now I probably still do. And somehow I find myself always showing the same picture—of my all-time favorite science discovery, the rule 30 automaton. Here it is:
And the point here is that even though the rule (or program) is very simple, the behavior of the system just spontaneously generates complexity, and apparent randomness. And what happens is complicated enough that it shows what I call “computational irreducibility”, so that you can’t reduce the computational work needed to see how it will behave: you essentially just have to follow each step to find out what will happen.
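For readers who want to regenerate the pattern themselves, here is a short Python sketch of rule 30. The rule itself is standard (each new cell is the left neighbor XOR the OR of the center and right neighbors); the width, the number of steps and the wraparound at the edges are choices made for this illustration.

```python
def rule30_step(cells):
    """Apply rule 30 once: new cell = left XOR (center OR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

def run(width, steps):
    """Start from a single black cell in the middle and record every row."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = rule30_step(cells)
        rows.append(cells)
    return rows

for row in run(width=63, steps=30):
    print("".join("#" if cell else "." for cell in row))
```

The width is chosen so that the pattern does not reach the edges within 30 steps, so the wraparound never comes into play here.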
There are all sorts of important phenomena that revolve around complexity generation and computational irreducibility. The most obvious is just the fact that sophisticated computation is easy to get—which is in a sense what makes something like AI possible.
But OK, how does this relate to blockchain? Well, complexity generation is what makes cryptographic hashing possible. It’s what allows a simple algorithm to make enough apparent randomness to successfully be used as a cryptographic hash.
In the case of something like Bitcoin, there’s another connection too: the protocol needs people to have to make some investment to be able to add blocks to the blockchain, and the way this is achieved is (bizarrely enough) by forcing them to do irreducible computations that effectively cost computer time.
What about neural nets? Well, the very simplest neural nets don’t involve much complexity at all. If one drew out their “basins of attraction” for different inputs, they’d just be simple polygons. But in useful neural nets the basins of attraction are much more complicated.
It’s most obvious when one gets to recurrent neural nets, but it happens in the training process for any neural net: there’s a computational process that effectively generates complexity as a way to approximate things like the distinctions (“elephant” vs. “teacup”) that get made in the world.
Alright, so what about quantum mechanics? Well, quantum mechanics is at some level full of randomness. It’s essentially an axiom of the traditional mathematical formalism of quantum mechanics that one can only compute probabilities, and that there’s no way to “see under the randomness”.
But even in the standard formalism of quantum mechanics, there’s a kind of complementary place where randomness and complexity generation are important, and it’s in the somewhat mysterious process of measurement.
Let’s start off by talking about another phenomenon in physics: the Second Law of Thermodynamics, or Law of Entropy Increase. This law says that if you start, for example, a bunch of gas molecules in a very orderly configuration (say all in one corner of a box), then with overwhelming probability they’ll soon randomize (and e.g. spread out randomly all over the box). And, yes, this kind of trend towards randomness is something we see all the time.
But here’s the strange part: if we look at the laws for, say, the motion of individual gas molecules, they’re completely reversible—so just as they say that the molecules can randomize themselves, so also they say that they should be able to unrandomize themselves.
But why do we never see that happen? It’s always been a bit mysterious, but I think there’s a clear answer, and it’s related to complexity generation and computational irreducibility. The point is that when the gas molecules randomize themselves, they’re effectively encrypting the initial conditions they were given.
It’s not impossible to place the gas molecules so they’ll unrandomize rather than randomize; it’s just that to work out how to do this effectively requires breaking the encryption—or in essence doing something very much like what’s involved in Bitcoin mining.
OK, so how does this relate to quantum mechanics? Well, quantum mechanics itself is fundamentally based on probability amplitudes, and interference between different things that can happen. But our experience of the world is that definite things happen. And the bridge from quantum mechanics to this involves the rather “bolted-on” idea of quantum measurement.
The notion is that some little quantum effect (“the electron ends up with spin up, rather than down”) needs to get amplified to the point where one can really be sure what happened. In other words, one’s measuring device has to make sure that the little quantum effect associated with one electron cascades so that it’s spread across lots and lots of electrons and other things.
And here’s the tricky part: if one wants to avoid interference being possible (so we can really perceive something “definite” as having happened), then one needs to have enough randomness that things can’t somehow equally well go backwards—just like in thermodynamics.
So even though pure quantum circuits as one imagines them for practical quantum computers typically have a sufficiently simple mathematical structure that they (presumably) don’t intrinsically generate complexity, the process of measuring what they do inevitably must generate complexity. (And, yes, it’s a reasonable question whether that’s in some sense where the randomness one sees “really” comes from… but that’s a different story.)
Reversibility, Irreversibility and More
Reversibility and irreversibility are a strangely common theme, at least between “quantum”, “neural” and “blockchain”. If one ignores measurement, a fundamental feature of quantum mechanics is that it’s reversible. What this means is that if one takes a quantum system, and lets it evolve in time, then whatever comes out one will always, at least in principle, be able to take and run backwards, to precisely reproduce where one started from.
Typical computation isn’t reversible like that. Consider an OR gate, which might be a basic component in a computer. In p OR q, the result will be true if either p or q is true. But just knowing that the result is “true”, you can’t figure out which of p and q (or both) is true. In other words, the OR operation is irreversible: it doesn’t preserve enough information for you to invert it.
In quantum circuits, one uses gates that, say, take two inputs (say p and q), and give two outputs (say p' and q'). And from those two outputs one can always uniquely reproduce the two inputs.
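The contrast can be checked by brute force in Python. The text does not name a particular two-input, two-output gate; the controlled-NOT used below is one standard example and is an assumption of this sketch, which only looks at the gate's action on ordinary bits, not on quantum superpositions.

```python
from itertools import product

def or_gate(p, q):
    """Irreversible: two input bits collapse to one output bit."""
    return p | q

def cnot(p, q):
    """Reversible: p passes through unchanged, q is flipped when p is 1."""
    return p, p ^ q

inputs = list(product([0, 1], repeat=2))

# Group the inputs of OR by the output they produce.
or_preimages = {}
for p, q in inputs:
    or_preimages.setdefault(or_gate(p, q), []).append((p, q))

# Every input of CNOT maps to a distinct output, and applying it twice undoes it.
cnot_outputs = {cnot(p, q) for p, q in inputs}
round_trips = [cnot(*cnot(p, q)) == (p, q) for p, q in inputs]

print(or_preimages)
print(sorted(cnot_outputs), all(round_trips))
```

The output "true" of OR has three possible inputs behind it, while each of the four CNOT outputs points back to exactly one input.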
OK, but now let’s talk about neural nets. Neural nets as they’re usually conceived are fundamentally irreversible. Here’s why. Imagine (again) that you make a neural network to distinguish elephants and teacups. To make that work, a very large number of different possible input images all have to map, say, to “elephant”. It’s like the OR gate, but more so. Just knowing the result is “elephant” there’s no unique way to invert the computation. And that’s the whole point: one wants anything that’s enough like the elephant pictures one showed to still come out as “elephant”; in other words, irreversibility is central to the whole operation of at least this kind of neural net.
So, OK, then how could one possibly make a quantum neural net? Maybe it’s just not possible. But if so, then what’s going on with brains? Because brains seem to work very much like neural nets. And yet brains are physical systems that presumably follow quantum mechanics. So then how are brains possible?
At some level the answer has to do with the fact that brains dissipate heat. Well, what is heat? Microscopically, heat is the random motion of things like molecules. And one way to state the Second Law of Thermodynamics (or the Law of Entropy Increase) is that under normal circumstances those random motions never spontaneously organize themselves into any kind of systematic motion. In principle all those molecules could start moving in just such a way as to turn a flywheel. But in practice nothing like that ever happens. The heat just stays as heat, and doesn’t spontaneously turn into macroscopic mechanical motion.
OK, but so let’s imagine that microscopic processes involving, say, collisions of molecules, are precisely reversible—as in fact they are according to quantum mechanics. Then the point is that when lots of molecules are involved, their motions can get so “encrypted” that they just seem random. If one could look at all the details, there’d still be enough information to reverse everything. But in practice one can’t do that, and so it seems like whatever was going on in the system has just “turned into heat”.
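A toy version of "precisely reversible, yet apparently randomizing" can be sketched in Python. The model below is an assumption of this sketch and not a model of real molecules: it is a second-order cellular automaton, in which each new row is a rule-30-style function of the current row XORed with the row before it. Any rule built this way can be run backwards exactly, provided one keeps all the details.

```python
def local_rule(left, center, right):
    """Rule 30 update, used here only as a convenient mixing function."""
    return left ^ (center | right)

def step(previous, current):
    """Second-order update: new row = rule(current row) XOR previous row."""
    n = len(current)
    new = [
        local_rule(current[(i - 1) % n], current[i], current[(i + 1) % n]) ^ previous[i]
        for i in range(n)
    ]
    return current, new

def evolve(previous, current, steps):
    for _ in range(steps):
        previous, current = step(previous, current)
    return previous, current

# An orderly start: everything empty except one cell.
width = 41
ordered_prev = [0] * width
ordered_cur = [0] * width
ordered_cur[width // 2] = 1

# Run forwards, then swap the two rows and run the very same rule again.
prev_t, cur_t = evolve(ordered_prev, ordered_cur, steps=60)
back_a, back_b = evolve(cur_t, prev_t, steps=60)
recovered_prev, recovered_cur = back_b, back_a

print("".join("#" if c else "." for c in cur_t))
print(recovered_cur == ordered_cur and recovered_prev == ordered_prev)
```

Reversal works only because both rows are known exactly; flipping a single cell before running backwards would generally not bring back the orderly start.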
So then what about producing “neural net behavior”? Well, the point is that while one part of a system is, say, systematically “deciding to say elephant”, the detailed information that would be needed to go back to the initial state is getting randomized, and turning into heat.
To be fair, though, this is glossing over quite a bit. And in fact I don’t think anyone knows how one can actually set up a quantum system (say a quantum circuit) that behaves in this kind of way. It’d be pretty interesting to do so, because it’d potentially tell us a lot about the quantum measurement process.
To explain how one goes from quantum mechanics in which everything is just an amplitude, to our experience of the world in which definite things seem to happen, people sometimes end up trying to appeal to mystical features of consciousness. But the point about a quantum neural net is that it’s quantum mechanical, yet it “comes to definite conclusions” (e.g. elephant vs. teacup).
Is there a good toy model for such a thing? I suspect one could create one from a quantum version of a cellular automaton that shows phase transition behavior—actually not unlike the detailed mechanics of a real quantum magnetic material. And what will be necessary is that the system has enough components (say spins) that the “heat” needed to compensate for its apparent irreversible behavior will stay away from the part where the irreversible behavior is observed.
Let me make a perhaps slightly confusing side remark. When people talk about “quantum computers”, they are usually talking about quantum circuits that operate on qubits (the quantum analogs of binary bits). But sometimes they actually mean something different: they mean quantum annealing devices.
Imagine you’ve got a bunch of dominoes and you’re trying to arrange them on the plane so that some matching condition associated with the markings on them is always satisfied. It turns out this can be a very hard problem. It’s related to computational irreducibility (and perhaps to problems like integer factoring). But in the end, to find, say, the configuration that does best in satisfying the matching condition everywhere, one may effectively have to just try out all possible configurations, and see which one works best.
Well, OK, but let’s imagine that the dominoes were actually molecules, and the matching condition corresponds to arranging molecules to minimize energy. Then the problem of finding the best overall configuration is like the problem of finding the minimum energy configuration for the molecules, which physically should correspond to the most stable solid structure that can be formed from the molecules.
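The "try out all possible configurations" strategy can be sketched in Python for a very small case. The setup is an assumption made for illustration: three two-state "molecules" (spins of +1 or -1) in a triangle, with a matching condition that every pair of neighbors would like to differ. The coupling values and function names are hypothetical.

```python
from itertools import product

def energy(spins, couplings):
    """Energy = -sum of J * s_a * s_b over coupled pairs (lower is better)."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def ground_states(n, couplings):
    """Exhaustively test all 2**n configurations and keep the lowest-energy ones."""
    best_energy, best = None, []
    for spins in product([-1, 1], repeat=n):
        e = energy(spins, couplings)
        if best_energy is None or e < best_energy:
            best_energy, best = e, [spins]
        elif e == best_energy:
            best.append(spins)
    return best_energy, best

# Hypothetical triangle in which each pair prefers to be opposite (J = -1).
couplings = {(0, 1): -1, (1, 2): -1, (0, 2): -1}
best_energy, best = ground_states(3, couplings)
print(best_energy, best)
```

No configuration satisfies all three pairs at once, so the best one can do is leave one pair mismatched. The number of configurations to test doubles with every added molecule, which is what makes the exhaustive approach infeasible at scale.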
And, OK, it might be hard to compute that. But what about an actual physical system? What will the molecules in it actually do when one cools it down? If it’s easy for the molecules to get to the lowest energy configuration, they’ll just do it, and one will have a nice crystalline solid.
People sometimes assume that “the physics will always figure it out”, and that even if the problem is computationally hard, the molecules will always find the optimal solution. But I don’t think this is actually true—and I think what instead will happen is that the material will turn mushy, not quite liquid and not quite solid, at least for a long time.
Still, there’s the idea that if one sets up this energy minimization problem quantum mechanically, then the physical system will be successful at finding the lowest energy state. And, yes, in quantum mechanics it might be harder to get stuck in local minima, because there is tunneling, etc.
But here’s the confusing part: when one trains a neural net, one ends up having to effectively solve minimization problems like the one I’ve described (“which values of weights make the network minimize the error in its output relative to what one wants?”). So people end up sometimes talking about “quantum neural nets”, meaning domino-like arrays which are set up to have energy minimization problems that are mathematically equivalent to the ones for neural nets.
(Yet another connection is that convolutional neural nets—of the kind used for example in image recognition—are structured very much like cellular automata, or like dynamic spin systems. But in training neural nets to handle multiscale features in images, one seems to end up with scale invariance similar to what one sees at critical points in spin systems, or their quantum analogs, as analyzed by renormalization group methods.)
OK, but let’s return to our whole buzzword string. What about blockchain? Well, one of the big points about a blockchain is in a sense to be as irreversible as possible. Once something has been added to a blockchain, one wants it to be inconceivable that it should ever be reversed out.
How is that achieved? Well, it’s curiously similar to how it works in thermodynamics or in quantum measurement. Imagine someone adds a block to their copy of a blockchain. Well, then the idea is that lots of other people all over the world will make their own copies of that block on their own blockchain nodes, and then go on independently adding more blocks from there.
Bad things would happen if lots of the people maintaining blockchain nodes decided to collude to not add a block, or to modify it, etc. But it’s a bit like with gas molecules (or degrees of freedom in quantum measurement). By the time everything is spread out among enough different components, it’s extremely unlikely that it’ll all concentrate together again to have some systematic effect.
Of course, people might not be quite like gas molecules (though, frankly, their observed aggregate behavior, e.g. jostling around in a crowd, is often strikingly similar). But all sorts of things in the world seem to depend on an assumption of randomness. And indeed, that’s probably necessary to maintain stability and robustness in markets where trading is happening.
OK, so when a blockchain tries to ensure that there’s a “definite history”, it’s doing something very similar to what a quantum measurement has to do. But just to close the loop a little more, let’s ask what a quantum blockchain might be like.
Yes, one could imagine using quantum computing to somehow break the cryptography in a standard blockchain. But the more interesting (and in my view, realistic) possibility is to make the actual operation of the blockchain itself be quantum mechanical.
In a typical blockchain, there’s a certain element of arbitrariness in how blocks get added, and who gets to do it. In a “proof of work” scheme (as used in Bitcoin and currently also Ethereum), to find out how to add a new block one searches for a “nonce”—a number to throw in to make a hash come out in a certain way. There are always many possible nonces (though each one is hard to find), and the typical strategy is to search randomly for them, successively testing each candidate.
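A stripped-down version of the nonce search can be written in Python. This is a sketch under simplifying assumptions, not the actual Bitcoin procedure: the block is just a string, the condition on the hash is "starts with a given number of zero hex digits", and candidates are tried in sequence rather than at random.

```python
import hashlib

def find_nonce(block_data, difficulty):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

# Hypothetical block contents; difficulty 3 needs a few thousand tries on average.
block_data = "previous-hash|Alice pays Bob 5|"
nonce, digest = find_nonce(block_data, difficulty=3)
print(nonce, digest)
```

Finding the nonce takes many hash evaluations, but anyone can check it with a single one. Each extra required zero digit multiplies the expected search effort by 16.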
But one could imagine a quantum version in which one is in effect searching in parallel for all possible nonces, and as a result producing many possible blockchains, each with a certain quantum amplitude. And to fill out the concept, imagine that—for example in the case of Ethereum—all computations done on the blockchain were reversible quantum ones (achieved, say, with a quantum version of the Ethereum Virtual Machine).
But what would one do with such a blockchain? Yes, it would be an interesting quantum system with all kinds of dynamics. But to actually connect it to the world, one has to get data on and off the blockchain, or, in other words, one has to do a measurement. And the act of that measurement would in effect force the blockchain to pick a definite history.
OK, so what about a “neural blockchain”? At least today, by far the most common strategy with neural nets is first to train them, then to put them to work. (One can train them “passively” by just feeding them a fixed set of examples, or one can train them “actively” by having them in effect “ask” for the examples they want.) But by analogy with people, neural nets can also have “lifelong learning”, in which they’re continually getting updated based on the “experiences” they’re having.
So how do the neural nets record these experiences? Well, by changing various internal weights. And in some ways what happens is like what happens with blockchains.
Science fiction sometimes talks about direct brain-to-brain transfer of memories. And in a neural net context this might mean just taking a big block of weights from one neural net and putting it into another. And, yes, it can work well to transfer definite layers in one network to another (say to transfer information on what features of images are worth picking out). But if you try to insert a “memory” deep inside a network, it’s a different story. Because the way a memory is represented in a network will depend on the whole history of the network.
It’s like in a blockchain: you can’t just replace one block and expect everything else to work. The whole thing has been knitted into the sequence of things that happen through time. And it’s the same thing with memories in neural nets: once a memory has formed in a certain way, subsequent memories will be built on top of this one.
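The point about not being able to swap out one block can be sketched with a toy hash chain. This is an illustration under simplifying assumptions (each block stores only a string, the previous hash, and its own hash); the names `link_hash`, `build_chain` and `is_valid` are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first block

def link_hash(prev_hash: str, data: str) -> str:
    """Hash a block's data together with the hash of the block before it."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(items):
    """Build a list of blocks, each committing to its predecessor's hash."""
    chain, prev = [], GENESIS
    for data in items:
        h = link_hash(prev, data)
        chain.append({"data": data, "prev": prev, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Check that every block matches its own hash and points at its predecessor."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != link_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["first", "second", "third"])

# Replace the middle block with a freshly, correctly hashed one.
tampered = [dict(b) for b in chain]
tampered[1]["data"] = "rewritten"
tampered[1]["hash"] = link_hash(tampered[1]["prev"], "rewritten")

print(is_valid(chain), is_valid(tampered))
```

Even though the replacement block is internally consistent, the block after it still refers to the old hash, so everything downstream would have to be rebuilt as well.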
Bringing It Together
At the outset, one might have thought that “quantum”, “neural” and “blockchain” (not to mention “AI”) didn’t have much in common (other than that they’re current buzzwords)—and that in fact they might in some sense be incompatible. But what we’ve seen is that actually there are all sorts of connections between them, and all sorts of fundamental phenomena that are shared between systems based on them.
So what might a “quantum neural blockchain AI” (“QNBAI”) be like?
Let’s look at the pieces again. A single blockchain node is a bit like a single brain, with a definite memory. But in a sense the whole blockchain network becomes robust through all the interactions between different blockchain nodes. It’s a little like how human society and human knowledge develop.
Let’s say we’ve got a “raw AI” that can do all sorts of computation. Well, the big issue is whether we can find a way to align what it can do with things that we humans think we want to do. And to make that alignment, we essentially have to communicate with the AI at a level of abstraction that transcends the details of how it works: in effect, we have to have some symbolic language that we both understand, and that, for example, the AI can translate into the details of how it operates.
Inside the AI it may end up using all kinds of “concepts” (say to distinguish one class of images from another). But the question is whether those concepts are ones that we humans in a sense “culturally understand”. In other words, are those concepts (and, for example, the words for them) ones that there’s a whole widely understood story about?
In a sense, concepts that we humans find useful for communication are ones that have been used in all sorts of interactions between different humans. The concepts become robust by being “knitted into” the thought patterns of many interacting brains, a bit like the data put on a blockchain becomes a robust part of “collective blockchain memory” through the interactions between blockchain nodes.
OK, so there’s something strange here. At first it seemed like QNBAIs would have to be something completely exotic and unfamiliar (and perhaps impossible). But somehow as we go over their features they start to seem awfully familiar—and actually awfully like us.
Yup, according to the physics, we know we are “quantum”. Neural nets capture many core features of how our brains seem to work. Blockchain—at least as a general concept—is somehow related to individual and societal memory. And AI, well, AI in effect tries to capture what’s aligned with human goals and intelligence in the computational universe—which is also what we’re doing.
OK, so what’s the closest thing we know to a QNBAI? Well, it’s probably all of us!
Maybe that sounds crazy. I mean, why should a string of buzzwords from 2018 connect like that? Well, at some level perhaps there’s an obvious answer: we tend to create and study things that are relevant to us, and somehow revolve around us. And, more than that, the buzzwords of today are things that are somehow just within the scope that we can now think about with the concepts we’ve currently developed, and that are somehow connected through them.
I must say that when I chose these buzzwords I had no idea they’d connect at all. But as I’ve tried to work through things in writing this, it’s been remarkable how much connection I’ve found. And, yes, in a fittingly bizarre end to a somewhat bizarre journey, it does seem to be the case that a string plucked from today’s buzzword universe has landed very close to home. And maybe in the end—at least in some sense—we are our buzzwords!
It was 1968. I was 8 years old. The “space race” was in full swing. For the first time, a space probe had recently landed on another planet (Venus). And I was eagerly studying everything I could to do with space.
Then on April 3, 1968 (May 15 in the UK), the movie 2001: A Space Odyssey was released—and I was keen to see it. So in the early summer of 1968 there I was, the first time I’d ever been in an actual cinema (yes, it was called that in the UK). I’d been dropped off for a matinee, and was pretty much the only person in the theater. And to this day, I remember sitting in a plush seat and eagerly waiting for the curtain to go up, and the movie to begin.
It started with an impressive extraterrestrial sunrise. But then what was going on? Those weren’t space scenes. Those were landscapes, and animals. I was confused, and frankly a little bored. But just when I was getting concerned, there was a bone thrown in the air that morphed into a spacecraft, and pretty soon there was a rousing waltz—and a big space station turning majestically on the screen.
The next two hours had a big effect on me. It wasn’t really the spacecraft (I’d seen plenty of them in books by then, and in fact made many of my own concept designs). And at the time I didn’t care much about the extraterrestrials. But what was new and exciting for me in the movie was the whole atmosphere of a world full of technology—and the notion of what might be possible there, with all those bright screens doing things, and, yes, computers driving it all.
It would be another year before I saw my first actual computer in real life. But those two hours in 1968 watching 2001 defined an image of what the computational future could be like, that I carried around for years.
I think it was during the intermission to the movie that some seller of refreshments—perhaps charmed by a solitary kid so earnestly pondering the movie—gave me a “cinema program” about the movie. Half a century later I still have that program, complete with a food stain, and faded writing from my 8-year-old self, recording (with some misspelling) where and when I saw the movie:
What Actually Happened
A lot has happened in the past 50 years, particularly in technology, and it’s an interesting experience for me to watch 2001 again—and compare what it predicted with what’s actually happened. Of course, some of what’s actually been built over the past 50 years has been done by people like me, who were influenced in larger or smaller ways by 2001.
One very obvious prediction of 2001 that hasn’t panned out, at least yet, is routine, luxurious space travel. But like many other things in the movie, it doesn’t feel like what was predicted was off track; it’s just that—50 years later—we still haven’t got there yet.
So what about the computers in the movie? Well, they have lots of flat-screen displays, just like real computers today. In the movie, though, one obvious difference is that there’s one physical display per functional area; the notion of windows, or dynamically changeable display areas, hadn’t arisen yet.
Another difference is in how the computers are controlled. Yes, you can talk to HAL. But otherwise, it’s lots and lots of mechanical buttons. To be fair, cockpits today still have plenty of buttons—but the centerpiece is now a display. And, yes, in the movie there weren’t any touchscreens—or mice. (Both had actually been invented a few years before the movie was made, but neither was widely known.)
There also aren’t any keyboards to be seen (and in the high-tech spacecraft full of computers going to Jupiter, the astronauts are writing with pens on clipboards; presciently, no slide rules and no tape are shown—though there is one moment when a printout that looks awfully like a punched card is produced). Of course, there were keyboards for computers back in the 1960s. But in those days, very few people could type, and there probably didn’t seem to be any reason to think that would change. (Being something of a committed tool user, I myself was routinely using a typewriter even in 1968, though I didn’t know any other kids who were—and my hands at the time weren’t big or strong enough to do much other than type fast with one finger, a skill whose utility returned decades later with the advent of smartphones.)
What about the content of the computer displays? That might have been my favorite thing in the whole movie. They were so graphical, and communicating so much information so quickly. I had seen plenty of diagrams in books, and had even painstakingly drawn quite a few myself. But back in 1968 it was amazing to imagine that a computer could generate information, and display it graphically, so quickly.
Of course there was television (though color only arrived in the UK in 1968, and I’d only seen black and white). But television wasn’t generating images; it was just showing what a camera saw. There were oscilloscopes too, but they just had a single dot tracing out a line on the screen. So the computer displays in 2001 were, at least for me, something completely new.
At the time it didn’t seem odd that in the movie there were lots of printed directions (how to use the “Picturephone”, or the zero-gravity toilet, or the hibernation modules). Today, any such instructions (and they’d surely be much shorter, or at least broken up a lot, for today’s less patient readers) would be shown onscreen. But when 2001 was made, the idea of word processing, and of displaying text to read onscreen, was still several years in the future—probably not least because at the time people thought of computers as machines for calculation, and there didn’t seem to be anything calculational about text.
There are lots of different things shown on the displays in 2001. Even though there isn’t the idea of dynamically movable windows, the individual displays, when they’re not showing anything, go into a kind of “iconic” state, just showing in large letters codes like NAV or ATM or FLX or VEH or GDE.
When the displays are active they sometimes show things like tables of numbers, and sometimes show lightly animated versions of a whole variety of textbook-like diagrams. A few of them show 1980s-style animated 3D line graphics (“what’s the alignment of the spacecraft?”, etc.)—perhaps modeled after analog airplane controls.
But very often there’s also something else—and occasionally it fills a whole display. There’s something that looks like code, or a mixture of code and math.
It’s usually in a fairly “modern-looking” sans serif font (well, actually, a font called Manifold for IBM Selectric electric typewriters). Everything’s uppercase. And with stars and parentheses and names like TRAJ04, it looks a bit like early Fortran code (except that given the profusion of semicolons, it was more likely modeled on IBM’s PL/I language). But then there are also superscripts, and built-up fractions—like math.
Looking at this now, it’s a bit like trying to decode an alien language. What did the makers of the movie intend this to be about? A few pieces make sense to me. But a lot of it looks random and nonsensical—meaningless formulas full of unreasonably high-precision numbers. Considering all the care put into the making of 2001, this seems like a rare lapse—though perhaps 2001 started the long and somewhat unfortunate tradition of showing meaningless code in movies. (A recent counterexample is my son Christopher’s alien-language-analysis code for Arrival, which is actual Wolfram Language code that genuinely makes the visualizations shown.)
But would it actually make sense to show any form of code on real displays like the ones in 2001? After all, the astronauts aren’t supposed to be building the spacecraft; they’re only operating it. But here’s a place where the future is only just now arriving. During most of the history of computing, code has been something that humans write, and computers read. But one of my goals with the Wolfram Language is to create a true computational communication language that is high-level enough that not only computers, but also humans, can usefully read.
Yes, one might be able to describe in words some procedure that a spacecraft is executing. But one of the points of the Wolfram Language is to be able to state the procedure in a form that directly fits in with human computational thinking. So, yes, on the first real manned spacecraft going to Jupiter, it’ll make perfect sense to display code, though it won’t look quite like what’s in 2001.
Accidents of History
I’ve watched 2001 several times over the years, though not specifically in the year 2001 (that year for me was dominated by finishing my magnum opus A New Kind of Science). But there are several very obvious things in the movie 2001 that don’t ring true for the real year 2001—quite beyond the very different state of space travel.
One of the most obvious is that the haircuts and clothing styles and general formality look wrong. Of course these would have been very hard to predict. But perhaps one could at least have anticipated (given the hippie movement etc.) that clothing styles and so on would get less formal. But back in 1968, I certainly remember for example getting dressed up even to go on an airplane.
Another thing that today doesn’t look right in the movie is that nobody has a personal computer. Of course, back in 1968 there were still only a few thousand computers in the whole world—each weighing at least some significant fraction of a ton—and basically nobody imagined that one day individual people would have computers, and be able to carry them around.
As it happens, back in 1968 I’d recently been given a little plastic kit mechanical computer (called Digi-Comp I) that could (very laboriously) do 3-digit binary operations. But I think it’s fair to say that I had absolutely no grasp of how this could scale up to something like the computers in 2001. And indeed when I saw 2001 I imagined that to have access to technology like I saw in the movie, I’d have to be joining something like NASA when I was grown up.
What of course I didn’t foresee—and I’m not sure anyone did—is that consumer electronics would become so small and cheap. And that access to computers and computation would therefore become so ubiquitous.
In the movie, there’s a sequence where the astronauts are trying to troubleshoot a piece of electronics. Lots of nice computer-aided, engineering-style displays come up. But they’re all of printed circuit boards with discrete components. There are no integrated circuits or microprocessors—which isn’t surprising, because in 1968 these basically hadn’t been invented yet. (Correctly, there aren’t vacuum tubes, though. Apparently the actual prop used—at least for exterior views—was a gyroscope.)
It’s interesting to see all sorts of little features of technology that weren’t predicted in the movie. For example, when they’re taking commemorative pictures in front of the monolith on the Moon, the photographer keeps tipping the camera after each shot—presumably to advance the film inside. The idea of digital cameras that could electronically take pictures simply hadn’t been imagined then.
In the history of technology, there are certain things that just seem inevitable, even though sometimes they may take decades to finally arrive. One example is videophones. There were early ones even back in the 1930s. And there were attempts to consumerize them in the 1970s and 1980s. But even by the 1990s they were still exotic, though I remember that with some effort I successfully rented a pair of them in 1993, and they worked OK, even over regular phone lines.
On the space station in 2001, there’s a Picturephone shown, complete with an AT&T logo—though it’s the old Bell System logo that looks like an actual bell. And as it happens, when 2001 was being made, there was a real project at AT&T called the Picturephone.
Of course, in 2001 the Picturephone isn’t a cellphone or a mobile device. It’s a built-in object, in a kiosk—a pay Picturephone. In the actual course of history, though, the rise of cellphones occurred before the consumerization of videochat—so payphone and videochat technology basically never overlapped.
Also interesting in 2001 is that the Picturephone is a push-button phone, with exactly the same numeric button layout as today (though without the * and # [“octothorp”]). Push-button phones actually already existed in 1968, although they were not yet widely deployed. And, of course, because of the details of our technology today, when one actually does a videochat, I don’t know of any scenario in which one ends up pushing mechanical buttons.
There’s a long list of instructions printed on the Picturephone—but in actuality, just like today, its operation seems quite straightforward. Back in 1968, though, even direct long-distance dialing (without an operator) was fairly new—and wasn’t yet possible at all between different countries.
To use the Picturephone in 2001, one inserts a credit card. Credit cards had existed for a while even in 1968, though they were not terribly widely used. The idea of automatically reading credit cards (say, using a magnetic stripe) had actually been developed in 1960, but it didn’t become common until the 1980s. (I remember that in the mid-1970s in the UK, when I got my first ATM card, it consisted simply of a piece of plastic with holes like a punched card—not the most secure setup one can imagine.)
At the end of the Picturephone call in 2001, there’s a charge displayed: $1.70. Correcting for inflation, that would be about $12 today. By the standards of modern cellphones—or internet videochatting—that’s very expensive. But for a present-day satellite phone, it’s not so far off, even for an audio call. (Today’s handheld satphones can’t actually support the necessary data rates for videocalls, and networks on planes still struggle to handle videocalls.)
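The inflation correction is just a ratio of price indexes. A rough sketch, assuming "today" means 2018 and using approximate annual-average US CPI-U values (these two index numbers are assumptions, not figures from the movie or the text, and are worth checking against official tables):

```python
# Assumed annual-average US CPI-U index values (1982-84 = 100); approximate.
CPI_1968 = 34.8
CPI_2018 = 251.1

def adjust_for_inflation(amount, cpi_then, cpi_now):
    """Scale a historical dollar amount by the ratio of price indexes."""
    return amount * cpi_now / cpi_then

charge_2018 = adjust_for_inflation(1.70, CPI_1968, CPI_2018)
print(f"${charge_2018:.2f}")
```

Under these assumed index values the result comes out a little over $12, consistent with the figure quoted above.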
On the space shuttle (or, perhaps better, space plane) the cabin looks very much like a modern airplane—which probably isn’t surprising, because things like Boeing 737s already existed in 1968. But in a correct (at least for now) modern touch, the seat backs have TVs—controlled, of course, by a row of buttons. (And there’s also futuristic-for-the-1960s programming, like a televised women’s judo match.)
A curious film-school-like fact about 2001 is that essentially every major scene in the movie (except the ones centered on HAL) shows the consumption of food. But how would food be delivered in the year 2001? Well, like everything else, it was assumed that it would be more automated, with the result that in the movie a variety of elaborate food dispensers are shown. As it’s turned out, however, at least for now, food delivery is something that’s kept humans firmly in the loop (think McDonald’s, Starbucks, etc.).
In the part of the movie concerned with going to Jupiter, there are “hibernaculum pods” shown—with people inside in hibernation. And above these pods there are vital-sign displays, that look very much like modern ICU displays. In a sense, that was not such a stretch of a prediction, because even in 1968, there had already been oscilloscope-style EKG displays for some time.
Of course, how to put people into hibernation isn’t something that’s yet been figured out in real life. That it (and cryonics) should be possible has been predicted for perhaps a century. And my guess is that, like cloning or gene editing, doing it will take inventing some clever tricks. But in the end I expect that the year in which it’s figured out will pretty much seem like a historical accident. It just so happens not to have happened yet.
There’s a scene in 2001 where one of the characters arrives on the space station and goes through some kind of immigration control (called “Documentation”)—perhaps imagined to be set up as some kind of extension to the Outer Space Treaty from 1967. But what’s particularly notable in the movie is that the clearance process is handled automatically, using biometrics, or specifically, voiceprint identification. (The US insignia displayed are identical to the ones on today’s US passports, but in typical pre-1980s form, the system asks for “surname” and “Christian name”.)
There had been primitive voice recognition systems even in the 1950s (“what digit is that?”), and the idea of identifying speakers by voice was certainly known. But what was surely not obvious is that serious voice systems would need the kind of computer processing power that only became available in the late 2000s.
And in just the last few years, automatic biometric immigration control systems have started to become common at airports—though using face and sometimes fingerprint recognition rather than voice. (Yes, it probably wouldn’t work well to have lots of people talking at different kiosks at the same time.)
In the movie, the kiosk has buttons for different languages: English, Dutch, Russian, French, Italian, Japanese. It would have been very hard to predict what a more appropriate list for 2001 might have been.
Even though 1968 was still in the middle of the Cold War, the movie correctly portrays international use of the space station, though, like in Antarctica today, it portrays separate moon bases for different countries. Of course, the movie talks about the Soviet Union. But the fact that the Berlin Wall would fall 21 years after 1968 isn’t the kind of thing that ever seems predictable in human history.
The movie shows logos from quite a few companies as well. The space shuttle is proudly branded Pan Am. And in at least one scene, its instrument panel has “IBM” in the middle. (There’s also an IBM logo on spacesuit controls during an EVA near Jupiter.) On the space station there are two hotels shown: Hilton and Howard Johnson’s. There’s also a Whirlpool “TV dinner” dispenser in the galley of the spacecraft going to the Moon. And there’s the AT&T (Bell System) Picturephone, as well as an Aeroflot bag, and a BBC newscast. (The channel is “BBC 12”, though in reality the expansion has only been from BBC 2 to BBC 4 in the past 50 years.)
Companies have obviously risen and fallen over the course of 50 years, but it’s interesting how many of the ones featured in the movie still exist, at least in some form. Many of their logos are even almost the same—though AT&T and BBC are two exceptions, and the IBM logo got stripes added in 1972.
It’s also interesting to look at the fonts used in the movie. Some seem quite dated to us today, while others (like the title font) look absolutely modern. But what’s strange is that at times over the past 50 years some of those “modern” fonts would have seemed old and tired. But such, I suppose, is the nature of fashion. And it’s worth remembering that even those “serifed fonts” from stone inscriptions in ancient Rome are perfectly capable of looking sharp and modern.
Something else that’s changed since 1968 is how people talk, and the words they use. The change seems particularly notable in the technospeak. “We are running cross-checking routines to determine reliability of this conclusion” sounds fine for the 1960s, but not so much for today. There’s mention of the risk of “social disorientation” without “adequate preparation and conditioning”, reflecting a kind of behaviorist view of psychology that at least wouldn’t be expressed the same way today.
It’s sort of charming when a character in 2001 says that whenever they “phone” a moon base, they get “a recording which repeats that the phone lines are temporarily out of order”. One might not say something too different about landlines on Earth today, but it feels like with a moon base one should at least be talking about automatically finding out if their network is down, rather than about having a person call on the phone and listen to a recorded message.
Of course, had a character in 2001 talked about “not being able to ping their servers”, or “getting 100% packet loss” it would have been completely incomprehensible to 1960s movie-goers—because those are concepts of a digital world which basically had just not been invented yet (even though the elements for it definitely existed).
What about HAL?
The most notable and enduring character from 2001 is surely the HAL 9000 computer, described (with exactly the same words as might be used today) as “the latest in machine intelligence”. HAL talks, lipreads, plays chess, recognizes faces from sketches, comments on artwork, does psychological evaluations, reads from sensors and cameras all over the spaceship, predicts when electronics will fail, and—notably to the plot—shows a variety of human-like emotional responses.
It might seem remarkable that all these AI-like capabilities would be predicted in the 1960s. But actually, back then, nobody yet thought that AI would be hard to create—and it was widely assumed that before too long computers would be able to do pretty much everything humans can, though probably better and faster and on a larger scale.
But already by the 1970s it was clear that things weren’t going to be so easy, and before long the whole field of AI basically fell into disrepute—with the idea of creating something like HAL beginning to seem as fictional as digging up extraterrestrial artifacts on the Moon.
In the movie, HAL’s birthday is January 12, 1992 (though in the book version of 2001, it was 1997). And in 1997, in Urbana, Illinois, fictional birthplace of HAL (and, also, as it happens, the headquarters location of my company), I went to a celebration of HAL’s fictional birthday. People talked about all sorts of technologies relevant to HAL. But to me the most striking thing was how low the expectations had become. Almost nobody even seemed to want to mention “general AI” (probably for fear of appearing kooky), and instead people were focusing on solving very specific problems, with specific pieces of hardware and software.
Having read plenty of popular science (and some science fiction) in the 1960s, I certainly started from the assumption that one day HAL-like AIs would exist. And in fact I remember that in 1972 I happened to end up delivering a speech to my whole school, and picking the topic of what amounts to AI ethics. I’m afraid that what I said I would now consider naive and misguided (and in fact I was perhaps partly misled by 2001). But, heck, I was only 12 at the time. And what I find interesting today is just that I thought AI was an important topic even back then.
For the remainder of the 1970s I was personally mostly very focused on physics (which, unlike AI, was thriving at the time). AI was still in the back of my mind, though, when for example I wanted to understand how brains might or might not relate to statistical physics and to things like the formation of complexity. But what made AI really important again for me was that in 1981 I had launched my first computer language (SMP) and had seen how successful it was at doing mathematical and scientific computations—and I got to wondering what it would take to do computations about (and know about) everything.
My immediate assumption was that it would require full brain-like capabilities, and therefore general AI. But having just lived through so many advances in physics, this didn’t immediately faze me. And in fact, I even had a fairly specific plan. You see, SMP—like the Wolfram Language today—was fundamentally based on the idea of defining transformations to apply when expressions match particular patterns. I always viewed this as a rough idealization of certain forms of human thinking. And what I thought was that general AI might effectively just require adding a way to match not just precise patterns, but also approximate ones (e.g. “that’s a picture of an elephant, even though its pixels aren’t exactly the same as in the sample”).
I tried a variety of schemes for doing this, one of them being neural nets. But somehow I could never formulate experiments that were simple enough to even have a clear definition of success. But by making simplifications to neural nets and a couple of other kinds of systems, I ended up coming up with cellular automata—which quickly allowed me to make some discoveries that started me on my long journey of studying the computational universe of simple programs, and made me set aside approximate pattern matching and the problem of AI.
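For readers who haven’t seen one, a cellular automaton of the simplest kind can be sketched in a few lines of Python. This is only an illustration: the choice of rule 30, the wraparound edges, and the function names `step` and `run` are assumptions made here, not details from the account above.

```python
def step(cells, rule):
    """Apply an elementary cellular automaton rule (0-255) once, with wraparound edges."""
    n = len(cells)
    return [
        # The three-cell neighborhood, read as a binary number 0-7,
        # selects one bit of the rule number as the new cell value.
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, width=31, steps=15):
    """Evolve from a single black cell in the middle and return all rows."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows

for row in run(30):
    print("".join("#" if c else "." for c in row))
```

The whole "program" is the single number passed as `rule`, which is what makes it feasible to look at every possible rule of this type.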
At the time of HAL’s fictional birthday in 1997, I was actually right in the middle of my intense 10-year process of exploring the computational universe and writing A New Kind of Science—and it was only out of my great respect for 2001 that I agreed to break out of being a hermit for a day and talk about HAL.
And, as I pointed out, cloning mammals was just like general AI: people had discussed it for ages. But it had been assumed to be impossible, and almost nobody had worked on it, until the success with Dolly the sheep. I wasn’t sure what kind of discovery or insight would lead to progress in AI. But I felt certain that eventually it would come.
Meanwhile, from my study of the computational universe, I’d formulated my Principle of Computational Equivalence—which had important things to say about artificial intelligence. And at some level, what it said is that there isn’t some magic “bright line” that separates the “intelligent” from the merely computational.
Emboldened by this—and with the Wolfram Language as a tool—I then started thinking again about my quest to solve the problem of computational knowledge. It certainly wasn’t an easy thing. But after quite a few years of work, in 2009, there it was: Wolfram|Alpha—a general computational knowledge engine with a lot of knowledge about the world. And particularly after Wolfram|Alpha was integrated with voice input and voice output in things like Siri, it started to seem in many ways quite HAL-like.
HAL in the movie had some more tricks, though. Of course he had specific knowledge about the spacecraft he was running—a bit like the custom Enterprise Wolfram|Alpha systems that now exist at various large corporations. But he had other capabilities too—like being able to do visual recognition tasks.
And as computer science developed, such things had hardened into tough nuts that basically “computers just can’t do”. To be fair, there was lots of practical progress in things like OCR for text, and face recognition. But it didn’t feel general. And then in 2012, there was a surprise: a trained neural net was suddenly discovered to perform really well on standard image recognition tasks.
It was a strange situation. Neural nets had first been discussed in the 1940s, and had seen several rounds of waxing and waning enthusiasm over the decades. But suddenly just a few years ago they really started working. And a whole bunch of “HAL-like tasks” that had seemed out of range suddenly began to seem achievable.
In 2001, there’s the idea that HAL wasn’t just “programmed”, but somehow “learned”. And in fact HAL mentions at one point that HAL had a (human) teacher. And perhaps the gap between HAL’s creation in 1992 and deployment in 2001 was intended to correspond to HAL’s human-like period of education. (Arthur C. Clarke probably changed the birth year to 1997 for the book because he thought that a 9-year-old computer would be obsolete.)
But the most important thing that’s made modern machine learning systems actually start to work is precisely that they haven’t been trained at human-type rates. Instead, they’ve immediately been fed millions or billions of example inputs—and then they’ve been expected to burn huge amounts of CPU time systematically finding what amount to progressively better fits to those examples. (It’s conceivable that an “active learning” machine could be set up to basically find the examples it needs within a human-schoolroom-like environment, but this isn’t how the most important successes in current machine learning have been achieved.)
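The idea of "progressively better fits" can be sketched at toy scale. The example below fits a straight line to ten example points by gradient descent in plain Python; the data, the learning rate, and the name `fit_line` are all illustrative assumptions, and real systems do the same kind of thing with vastly more parameters and examples.

```python
def fit_line(examples, lr=0.01, epochs=5000):
    """Fit y = w*x + b to (x, y) examples by repeatedly nudging w and b downhill."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Example inputs generated from a known line, y = 3x + 1.
examples = [(x, 3 * x + 1) for x in range(10)]
w, b = fit_line(examples)
print(round(w, 3), round(b, 3))
```

Each pass makes only a small correction, which is why this style of learning needs many passes over many examples rather than a handful of lessons.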
So can machines now do what HAL does in the movie? Unlike a lot of the tasks presumably needed to run an actual spaceship, most of the tasks the movie concentrates on HAL doing are ones that seem quintessentially human. And most of these turn out to be well-suited to modern machine learning—and month by month more and more of them have now been successfully tackled.
But what about knitting all these tasks together, to make a “complete HAL”? One could conceivably imagine having some giant neural net, and “training it for all aspects of life”. But this doesn’t seem like a good way to do things. After all, if we’re doing celestial mechanics to work out the trajectory of a spacecraft, we don’t have to do it by matching examples; we can do it by actual calculation, using the achievements of mathematical science.
We need our HAL to be able to know about a lot of kinds of things, and to be able to compute about a lot of kinds of things, including ones that involve human-like recognition and judgement.
In the book version of 2001, the name HAL was said to stand for “Heuristically programmed ALgorithmic computer”. And the way Arthur C. Clarke explained it is that this was supposed to mean “it can work on a program that’s already set up, or it can look around for better solutions and you get the best of both worlds”.
And at least in some vague sense, this is actually a pretty good description of what I’ve built over the past 30 years as the Wolfram Language. The “programs that are already set up” happen to try to encompass a lot of the systematic knowledge about computation and about the world that our civilization has accumulated.
But there’s also the concept of searching for new programs. And actually the science that I’ve done has led me to do a lot of work searching for programs in the computational universe of all possible programs. We’ve had many successes in finding useful programs that way, although the process is not as systematic as one might like.
In recent years, the Wolfram Language has also incorporated modern machine learning—in which one is effectively also searching for programs, though in a restricted domain defined for example by weights in a neural network, and constructed so that incremental improvement is possible.
Could we now build a HAL with the Wolfram Language? I think we could at least get close. It seems well within range to be able to talk to HAL in natural language about all sorts of relevant things, and to have HAL use knowledge-based computation to control and figure out things about the spaceship (including, for example, simulating components of it).
The “computer as everyday conversation companion” side of things is less well developed, not least because it’s not as clear what the objective might be there. But it’s certainly my hope that in the next few years—in part to support applications like computational smart contracts (and yes, it would have been good to have one of those set up for HAL)—things like my symbolic discourse language project will provide a general framework for doing this.
“Incapable of Error”
Do computers “make mistakes”? When the first electronic computers were made in the 1940s and 1950s, the big issue was whether the hardware in them was reliable. Did the electrical signals do what they were supposed to, or did they get disrupted, say because a moth (“bug”) flew inside the computer?
By the time mainframe computers were developed in the early 1960s, such hardware issues were pretty well under control. And so in some sense one could say (and marketing material did) that computers were “perfectly reliable”.
HAL reflects this sentiment in 2001. “The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”
From a modern point of view, saying this kind of thing seems absurd. After all, everyone knows that computer systems—or, more specifically, software systems—inevitably have bugs. But in 1968, bugs weren’t really understood.
After all, computers were supposed to be perfect, logical machines. And so, the thinking went, they must operate in a perfect way. And if anything went wrong, it must, as HAL says in the movie, “be attributable to human error”. Or, in other words, if the human were smart and careful enough, the computer would always “do the right thing”.
When Alan Turing did his original theoretical work in 1936 to show that universal computers could exist, he did it by writing what amounts to a program for his proposed universal Turing machine. And even in this very first program (which is only a page long), it turns out that there were already bugs.
But, OK, one might say, with enough effort, surely one can get rid of any possible bug. Well, here’s the problem: to do so requires effectively foreseeing every aspect of what one’s program could ever do. But in a sense, if one were able to do that, one almost doesn’t need the program in the first place.
And actually, pretty much any program that’s doing nontrivial things is likely to show what I call computational irreducibility, which implies that there’s no way to systematically shortcut what the program does. To find out what it does, there’s basically no choice but just to run it and watch what it does. Sometimes this might be seen as a desirable feature—for example if one’s setting up a cryptocurrency that one wants to take irreducible effort to mine.
And, actually, if there isn’t computational irreducibility in a computation, then it’s a sign that the computation isn’t being done as efficiently as it could be.
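A standard small example of a program that one seemingly has to just run is the rule 30 cellular automaton. The Python sketch below is an illustration under assumptions: the essay names no particular program at this point, and the helper names `step` and `run` are made up here. It starts from a single black cell, applies the rule row by row, and reads off the center column, for which no general shortcut formula is known.

```python
def step(cells, rule=30):
    """One update of an elementary cellular automaton on a cyclic row of 0/1 cells."""
    n = len(cells)
    return [
        # The three-cell neighborhood, read as a binary number, picks one bit of the rule.
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, width, steps):
    """Evolve from a single 1 in the middle of the row; return every row produced."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

# Width 63 is enough that the pattern never reaches the edges in 30 steps.
history = run(30, 63, 30)
center_column = [row[31] for row in history]

for row in history[:16]:
    print("".join("#" if c else "." for c in row))
print("center column:", "".join(str(c) for c in center_column))
```

The rule itself fits in one line, yet the only way this sketch obtains the value of the center cell at step 30 is to compute all 30 rows before it.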
What is a bug? One might define it as a program doing something one doesn’t want. So maybe we want the pattern on the left created by a very simple program to never die out. But the point is that there may be no way in anything less than an infinite time to answer the “halting problem” of whether it can in fact die out. So, in other words, figuring out if the program “has a bug” and does something one doesn’t want may be infinitely hard.
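The practical face of this can be sketched in a few lines of Python. This is an illustration under assumptions: the function names and the choice of elementary cellular automata on an unbounded line are not from the essay. A checker can report when a pattern has died out, but if it has not died out within the step budget, all the checker can honestly say is "not yet".

```python
def step(live, rule):
    """One update on an unbounded line; `live` is the set of positions holding a 1.

    Only valid for rules that map the all-0 neighborhood to 0 (even rule numbers).
    """
    candidates = {p + d for p in live for d in (-1, 0, 1)}
    new_live = set()
    for p in candidates:
        index = 4 * ((p - 1) in live) + 2 * (p in live) + ((p + 1) in live)
        if (rule >> index) & 1:
            new_live.add(p)
    return new_live

def dies_out_within(live, rule, max_steps):
    """Return the step at which the pattern is empty, or None if it has not
    died out after max_steps. None is NOT a proof that it never dies out."""
    if rule & 1:
        raise ValueError("this sketch only handles rules that map 000 to 0")
    live = set(live)
    for t in range(max_steps + 1):
        if not live:
            return t
        live = step(live, rule)
    return None

print(dies_out_within({0, 1, 2}, 128, 10))   # rule 128 erodes the block away
print(dies_out_within({0}, 30, 200))         # rule 30: no verdict within the budget
```

For some particular rules one can prove the answer by separate reasoning, but raising `max_steps` never turns a `None` into a guarantee, which is the sense in which deciding whether the "bug" can ever occur may take unbounded effort.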
And of course we know that bugs are not just a theoretical problem; they exist in all large-scale practical software. And unless HAL only does things that are so simple that we foresee every aspect of them, it’s basically inevitable that HAL will exhibit bugs.
But maybe, one might think, HAL could at least be given some overall directives—like “be nice to humans”, or other potential principles of AI ethics. But here’s the problem: given any precise specification, it’s inevitable that there will be unintended consequences. One might say these are “bugs in the specification”, but the problem is they’re inevitable. When computational irreducibility is present, there’s basically never any finite specification that can avoid any conceivable “unintended consequence”.
Or, said in terms of 2001, it’s inevitable that HAL will be capable of exhibiting unexpected behavior. It’s just a consequence of being a system that does sophisticated computation. It lets HAL “show creativity” and “take initiative”. But it also means HAL’s behavior can’t ever be completely predicted.
The basic theoretical underpinnings to know this already existed in the 1950s or even earlier. But it took experience with actual complex computer systems in the 1970s and 1980s for intuition about bugs to develop. And it took my explorations of the computational universe in the 1980s and 1990s to make it clear how ubiquitous the phenomenon of computational irreducibility actually is, and how much it affects basically any sufficiently broad specification.
How Did They Get It Right?
It’s interesting to see what the makers of 2001 got wrong about the future, but it’s impressive how much they got right. So how did they do it? Well, between Stanley Kubrick and Arthur C. Clarke (and their “scientific consultant” Fred Ordway III), they solicited input from a fair fraction of the top technology companies of the day—and (though there’s nothing in the movie credits about them) received a surprising amount of detailed information about the plans and aspirations of these companies, along with quite a few designs custom-made for the movie as a kind of product placement.
In the very first space scene in the movie, for example, one sees an assortment of differently shaped spacecraft, that were based on concept designs from the likes of Boeing, Grumman and General Dynamics, as well as NASA. (In the movie, there are no aerospace manufacturer logos—and NASA also doesn’t get a mention; instead the assorted spacecraft carry the flags of various countries.)
But so where did the notion of having an intelligent computer come from? I don’t think it had an external source. I think it was just an idea that was very much “in the air” at the time. My late friend Marvin Minsky, who was one of the pioneers of AI in the 1960s, visited the set of 2001 during its filming. But Kubrick apparently didn’t ask him about AI; instead he asked about things like computer graphics, the naturalness of computer voices, and robotics. (Marvin claims to have suggested the configuration of arms that was used for the pods on the Jupiter spacecraft.)
But what about the details of HAL? Where did those come from? The answer is that they came from IBM.
IBM was at the time by far the world’s largest computer company, and it also conveniently happened to be headquartered in New York City, which is where Kubrick and Clarke were doing their work. IBM—as now—was always working on advanced concepts that they could demo. They worked on voice recognition. They worked on image recognition. They worked on computer chess. In fact, they worked on pretty much all the specific technical features of HAL shown in 2001. Many of these features are even shown in the “Information Machine” movie IBM made for the 1964 World’s Fair in New York City (though, curiously, that movie has a dynamic multi-window form of presentation that wasn’t adopted for HAL).
And the rhetoric about HAL having a flawless operational record could almost be out of IBM’s marketing material for the 360. And of course HAL was physically big—like a mainframe computer (actually even big enough that a person could go inside the computer). But there was one thing about HAL that was very non-IBM. Back then, IBM always strenuously avoided ever saying that computers could themselves be smart; they just emphasized that computers would do what people told them to. (Somewhat ironically, the internal slogan that IBM used for its employees was “Think”. It took until the 1980s for IBM to start talking about computers as smart—and for example in 1980 when my friend Greg Chaitin was advising the then-head of research at IBM he was told it was deliberate policy not to pursue AI, because IBM didn’t want its human customers to fear they might be replaced by AIs.)
An interesting letter from 1966 surfaced recently. In it, Kubrick asks one of his producers (a certain Roger Caras, who later became well known as a wildlife TV personality): “Does I.B.M. know that one of the main themes of the story is a psychotic computer?”. Kubrick is concerned that they will feel “swindled”. The producer writes back, talking about IBM as “the technical advisor for the computer”, and saying that IBM will be OK so long as they are “not associated with the equipment failure by name”.
But was HAL supposed to be an IBM computer? The IBM logo appears a couple of times in the movie, but not on HAL. Instead, HAL has a nameplate that looks like this:
It’s certainly interesting that the blue is quite like IBM’s characteristic “big blue” blue. It’s also very curious that if you go one step forward in the alphabet from the letters H A L, you get I B M. Arthur C. Clarke always claimed this was a coincidence, and it probably was. But my guess is that at some point, that blue part of HAL’s nameplate was going to say “IBM”.
Like some other companies, IBM was fond of naming its products with numbers. And it’s interesting to look at what numbers they used. In the 1960s, there were a lot of 3- and 4-digit numbers starting with 3’s and 7’s, including a whole 7000 series, etc. But, rather curiously, there was not a single one starting with 9: there was no IBM 9000 series. In fact, IBM didn’t have a single product whose name started with 9 until the 1990s. And I suspect that was due to HAL.
By the way, the IBM liaison for the movie was their head of PR, C. C. Hollister, who was interviewed in 1964 by the New York Times about why IBM—unlike its competitors—ran general advertising (think Super Bowl), given that only a thin stratum of corporate executives actually made purchasing decisions about computers. He responded that their ads were “designed to reach… the articulators or the 8 million to 10 million people that influence opinion on all levels of the nation’s life” (today one would say “opinion makers”, not “articulators”).
He then added “It is important that important people understand what a computer is and what it can do.” And in some sense, that’s what HAL did, though not in the way Hollister might have expected.
Predicting the Future
OK, so now we know—at least over the span of 50 years—what happened to the predictions from 2001, and in effect how science fiction did (or did not) turn into science fact. So what does this tell us about predictions we might make today?
In my observation things break into three basic categories. First, there are things people have been talking about for years, that will eventually happen—though it’s not clear when. Second, there are surprises that basically nobody expects, though sometimes in retrospect they may seem somewhat obvious. And third, there are things people talk about, but that potentially just won’t ever be possible in our universe, given how its physics works.
Something people have talked about for ages, that surely will eventually happen, is routine space travel. When 2001 was released, no humans had ever ventured beyond Earth orbit. But even by the very next year, they’d landed on the Moon. And 2001 made what might have seemed like a reasonable prediction that by the year 2001 people would routinely be traveling to the Moon, and would be able to get as far as Jupiter.
Now of course in reality this didn’t happen. But actually it probably could have, if it had been considered a sufficient priority. But there just wasn’t the motivation for it. Yes, space has always been more broadly popular than, say, ocean exploration. But it didn’t seem important enough to put the necessary resources into.
Will it ever happen? I think it’s basically a certainty. But will it take 5 years or 50? It’s very hard to tell—though based on recent developments I would guess about halfway between.
People have been talking about space travel for well over a hundred years. They’ve been talking about what’s now called AI for even longer. And, yes, at times there’ve been arguments about how some feature of human intelligence is so fundamentally special that AI will never capture it. But I think it’s pretty clear at this point that AI is on an inexorable path to reproduce any and all features of whatever we would call intelligence.
A more mundane example of what one might call “inexorable technology development” is videophones. Once one had phones and one had television, it was sort of inevitable that eventually one would have videophones. And, yes, there were prototypes in the 1960s. But for detailed reasons of computer and telecom capacity and cost, videophone technology didn’t really become broadly available for a few more decades. But it was basically inevitable that it eventually would.
In science fiction, basically ever since radio was invented, it was common to imagine that in the future everyone would be able to communicate through radio instantly. And, yes, it took the better part of a century. But eventually we got cellphones. And in time we got smartphones that could serve as magic maps, and magic mirrors, and much more.
An example that’s today still at an earlier stage in its development is virtual reality. I remember back in the 1980s trying out early VR systems. But back then, they never really caught on. But I think it’s basically inevitable that they eventually will. Perhaps it will require having video that’s at the same quality level as human vision (as audio has now been for a couple of decades). And whether it’s exactly VR, or instead augmented reality, that eventually becomes widespread is not clear. But something like that surely will. Though exactly when is not clear.
There are endless examples one can cite. People have been talking about self-driving cars since at least the 1960s. And eventually they will exist. People have talked about flying cars for even longer. Maybe helicopters could have gone in this direction, but for detailed reasons of control and reliability that didn’t work out. Maybe modern drones will solve the problem. But again, eventually there will be flying cars. It’s just not clear exactly when.
Similarly, there will eventually be robotics everywhere. I have to say that this is something I’ve been hearing will “soon happen” for more than 50 years, and progress has been remarkably slow. But my guess is that once it’s finally figured out how to really do “general-purpose robotics”—like we can do general-purpose computation—things will advance very quickly.
And actually there’s a theme that’s very clear over the past 50+ years: what once required the creation of special devices is eventually possible by programming something that is general purpose. In other words, instead of relying on the structure of physical devices, one builds up capabilities using computation.
What is the end point of this? Basically it’s that eventually everything will be programmable right down to atomic scales. In other words, instead of specifically constructing computers, we’ll basically build everything “out of computers”. To me, this seems like an inevitable outcome. Though it happens to be one that hasn’t yet been much discussed, or, say, explored in science fiction.
Returning to more mundane examples, there are other things that will surely be possible one day, like drilling into the Earth’s mantle, or having cities under the ocean (both subjects of science fiction in the past—and there’s even an ad for a “Pan Am Underwater Hotel” visible on the space station in 2001). But whether these kinds of things will be considered worth doing is not so clear. Bringing back dinosaurs? It’ll surely be possible to get a good approximation to their DNA. How long all the necessary bioscience developments will take I don’t know, but one day one will surely be able to have a live stegosaurus again.
Perhaps one of the oldest “science fiction” ideas ever is immortality. And, yes, human lifespans have been increasing. But will there come a point where humans can for practical purposes be immortal? I am quite certain that there will. Quite whether the path will be primarily biological, or primarily digital, or some combination involving molecular-scale technology, I do not know. And quite what it will all mean, given the inevitable presence of an infinite number of possible bugs (today’s “medical conditions”), I am not sure. But I consider it a certainty that eventually the old idea of human immortality will become a reality. (Curiously, Kubrick—who was something of an enthusiast for things like cryonics—said in an interview in 1968 that one of the things he thought might have happened by the year 2001 is the “elimination of old age”.)
So what’s an example of something that won’t happen? There’s a lot we can’t be sure about without knowing the fundamental theory of physics. (And even given such a theory, computational irreducibility means it can be arbitrarily hard to work out the consequence for some particular issue.) But two decent candidates for things that won’t ever happen are Honey-I-Shrunk-the-Kids miniaturization and faster-than-light travel.
Well, at least these things don’t seem likely to happen the way they are typically portrayed in science fiction. But it’s still possible that things that are somehow functionally equivalent will happen. For example, it perfectly well could be possible to “scan an object” at an atomic scale, and then “reinterpret it”, and build up using molecular-scale construction at least a very good approximation to it that happens to be much smaller.
What about faster-than-light travel? Well, maybe one will be able to deform spacetime enough that it’ll effectively be possible. Or conceivably one will be able to use quantum mechanics to effectively achieve it. But these kinds of solutions assume that what one cares about are things happening directly in our physical universe.
But imagine that in the future everyone has effectively been “uploaded” into some digital system—so that the “physics” one’s experiencing is instead something virtualized. And, yes, at the level of the underlying hardware maybe there will be restrictions based on the speed of light. But for purposes of the virtualized experience, there’ll be no such constraint. And, yes, in a setup like this, one can also imagine another science fiction favorite: time travel (notwithstanding its many philosophical issues).
OK, so what about surprises? If we look at the world today, compared to 50 years ago, it’s easy to identify some surprises. Computers are far more ubiquitous than almost anyone expected. And there are things like the web, and social media, that weren’t really imagined (even though perhaps in retrospect they seem “obvious”).
There’s another surprise, whose consequences are so far much less well understood, but that I’ve personally been very involved with: the fact that there’s so much complexity and richness to be found in the computational universe.
Almost by definition, “surprises” tend to occur when understanding what’s possible, or what makes sense, requires a change of thinking, or some kind of “paradigm shift”. Often in retrospect one imagines that such changes of thinking just occur—say in the mind of one particular person—out of the blue. But in reality what’s almost always going on is that there’s a progressive stack of understanding developed—which, perhaps quite suddenly, allows one to see something new.
And in this regard it’s interesting to reflect on the storyline of 2001. The first part of the movie shows an alien artifact—a black monolith—that appears in the world of our ape ancestors, and starts the process that leads to modern civilization. Maybe the monolith is supposed to communicate critical ideas to the apes by some kind of telepathic transmission.
But I like to have another interpretation. No ape 4 million years ago had ever seen a perfect black monolith, with a precise geometrical shape. But as soon as they saw one, they could tell that something they had never imagined was possible. And the result was that their worldview was forever changed. And—a bit like the emergence of modern science as a result of Galileo seeing the moons of Jupiter—that’s what allowed them to begin constructing what became modern civilization.
The Extraterrestrials
When I first saw 2001 fifty years ago nobody knew whether there would turn out to be life on Mars. People didn’t expect large animals or anything. But lichens or microorganisms seemed, if anything, more likely than not.
With radio telescopes coming online, and humans just beginning to venture out into space, it also seemed quite likely that before long we’d find evidence of extraterrestrial intelligence. But in general people seemed neither particularly excited, nor particularly concerned, about this prospect. Yes, there would be mention of the time when a radio broadcast of H. G. Wells’s War of the Worlds story was thought to be a real alien invasion in New Jersey. But 20 or so years after the end of World War II, people were much more concerned about the ongoing Cold War, and what seemed like the real possibility that the world would imminently blow itself up in a giant nuclear conflagration.
The seed for what became 2001 was a rather nice 1951 short story by Arthur C. Clarke called “The Sentinel” about a mysterious pyramid discovered on the Moon, left there before life emerged on Earth, and finally broken open by humans using nuclear weapons, but found to have contents that were incomprehensible. Kubrick and Clarke worried that before 2001 was released, their story might have been overtaken by the actual discovery of extraterrestrial intelligence (and they even explored taking out insurance against this possibility).
But as it is, 2001 became basically the first serious movie exploration of what the discovery of extraterrestrial intelligence might be like. As I’ve recently discussed at length, deciding in the abstract whether or not something was really “produced by intelligence” is a philosophically deeply challenging problem. But at least in the world as it is today, we have a pretty good heuristic: things that look geometrically simpler (with straight edges, circles, etc.) are probably artifacts. Of course, at some level it’s a bit embarrassing that nature seems to quite effortlessly make things that look more complex than what we typically produce, even with all our engineering prowess. And, as I’ve argued elsewhere, as we learn to take advantage of more of the computational universe, this will no doubt change. But at least for now, the “if it’s geometrically simple, it’s probably an artifact” heuristic works quite well.
And in 2001 we see it in action—when the perfectly cuboidal black monolith appears on the 4-million-year-old Earth: it’s visually very obvious that it isn’t something that belongs, and that it’s something that was presumably deliberately constructed.
A little later in the movie, another black monolith is discovered on the Moon. It’s noticed because of what’s called in the movie the “Tycho Magnetic Anomaly” (“TMA-1”)—probably named by Kubrick and Clarke after the South Atlantic Anomaly associated with the Earth’s radiation belts, that was discovered in 1958. The magnetic anomaly could have been natural (“a magnetic rock”, as one of the characters says). But once it’s excavated and found to be a perfect black cuboidal monolith, extraterrestrial intelligence seems the only plausible origin.
As I’ve discussed elsewhere, it’s hard to even recognize intelligence that doesn’t have any historical or cultural connection to our own. And it’s essentially inevitable that this kind of alien intelligence will seem to us in many ways incomprehensible. (It’s a curious question, though, what would happen if the alien intelligence had already inserted itself into the distant past of our own history, as in 2001.)
Kubrick and Clarke at first assumed that they’d have to actually show extraterrestrials somewhere in the movie. And they worried about things like how many legs they might have. But in the end Kubrick decided that the only alien that had the degree of impact and mystery that he wanted was an alien one never actually saw.
And so, for the last 17% of 2001, after Dave Bowman goes through the “star gate” near Jupiter, one sees what was probably supposed to be purposefully incomprehensible—if aesthetically interesting. Are these scenes of the natural world elsewhere in the universe? Or are these artifacts created by some advanced civilization?
We see some regular geometric structures, that read to us like artifacts. And we see what appear to be more fluid or organic forms, that do not. For just a few frames there are seven strange flashing octahedra.
I’m pretty sure I never noticed these when I first saw 2001 fifty years ago. But in 1997, when I studied the movie in connection with HAL’s birthday, I’d been thinking for years about the origins of complexity, and about the differences between natural and artificial systems—so the octahedra jumped out at me (and, yes, I spent quite a while wrangling the LaserDisc version of 2001 I had back then to try to look at them more carefully).
I didn’t know what the octahedra were supposed to be. With their regular flashing, I at first assumed they were meant to be some kind of space beacons. But I’m told that actually they were supposed to be the extraterrestrials themselves, appearing in a little cameo. Apparently there’d been an earlier version of the script in which the octahedra wound up riding in a ticker tape parade in New York City—but I think the cameo was a better idea.
When Kubrick was interviewed about 2001, he gave an interesting theory for the extraterrestrials: “They may have progressed from biological species, which are fragile shells for the mind at best, into immortal machine entities—and then, over innumerable eons, they could emerge from the chrysalis of matter transformed into beings of pure energy and spirit. Their potentialities would be limitless and their intelligence ungraspable by humans.”
It’s interesting to see Kubrick grappling with the idea that minds and intelligence don’t have to have physical form. Of course, in HAL he’d already in a sense imagined a “non-physical mind”. But back in the 1960s, with the idea of software only just emerging, there wasn’t yet a clear notion that computation could be something meaningful in its own right, independent of the particulars of its “hardware” implementation.
That universal computation was possible had arisen as an essentially mathematical idea in the 1930s. But did it have physical implications? In the 1980s I started talking about things like computational irreducibility, and about some of the deep connections between universal computation and physics. But back in the 1950s, people looked for much more direct implications of universal computation. And one of the notable ideas that emerged was of “universal constructors”—that would somehow be able to construct anything, just as universal computers could compute anything.
In 1952—as part of his attempt to “mathematicize” biology—John von Neumann wrote a book about “self-reproducing automata” in which he came up with what amounts to an extremely complicated 2D cellular automaton that can have a configuration that reproduces itself. And of course—as was discovered in 1953—it turns out to be correct that digital information, as encoded in DNA, is what specifies the construction of biological organisms.
But in a sense von Neumann’s efforts were based on the wrong intuition. For he assumed (as I did, before I saw evidence to the contrary) that to make something that has a sophisticated feature like self-reproduction, the thing itself must somehow be correspondingly complicated.
But as I discovered many years later by doing experiments in the computational universe of simple programs, it’s just not true that it takes a complicated system to show complicated behavior: even systems (like cellular automata) with some of the simplest imaginable rules can do it. And indeed, it’s perfectly possible to have systems with very simple rules that show self-reproduction—and in the end self-reproduction doesn’t seem like a terribly special feature at all (think computer code that copies itself, etc.).
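One well-known instance of a very simple rule that copies patterns is the rule 90 cellular automaton, in which each cell becomes the XOR of its two neighbors. The Python sketch below is an illustration under assumptions: rule 90 is chosen here as an example and is not named in the text above, and the helper names are made up. Because the rule is additive, after 2^k steps any starting pattern narrower than 2^(k+1) cells has turned into two separated, exact copies of itself.

```python
def step_rule90(live):
    """One step of rule 90 on an unbounded line.

    `live` is the set of positions holding a 1. A cell is 1 at the next step
    exactly when one, but not both, of its two neighbors is 1 now.
    """
    has_live_right_neighbor = {p - 1 for p in live}
    has_live_left_neighbor = {p + 1 for p in live}
    return has_live_right_neighbor ^ has_live_left_neighbor

def evolve(live, steps):
    """Apply rule 90 the given number of times."""
    for _ in range(steps):
        live = step_rule90(live)
    return live

def render(live, lo, hi):
    """Draw the cells from position lo to hi as a string."""
    return "".join("#" if p in live else "." for p in range(lo, hi + 1))

pattern = {0, 1, 3, 4, 6}          # an arbitrary 7-cell-wide pattern
result = evolve(pattern, 8)        # 8 = 2**3 steps

print(render(pattern, -10, 16))
print(render(result, -10, 16))
```

The second printed row shows the original pattern twice, shifted 8 cells to the left and 8 cells to the right. Nothing in the one-line rule mentions copying; the replication falls out of the rule's arithmetic.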
But back in the 1950s von Neumann and his followers didn’t know that. And given the enthusiasm for things to do with space, it was inevitable that the idea of “self-reproducing machines” would quickly find its way into notions of self-reproducing space probes (as well as self-reproducing lunar factories, etc.).
I’m not sure if these threads had come together by the time 2001 was made, but certainly by the time of the 2010 sequel, Arthur C. Clarke had decided that the black monoliths were self-reproducing machines. And in a scene reminiscent of the modern idea that AIs, when given the directive to make more paperclips, might turn everything (including humans) into paperclips, the 2010 movie includes black monoliths turning the entire planet of Jupiter into a giant collection of black monoliths.
What are the aliens trying to do in 2001? I think Kubrick recognized that their motivations would be difficult to map onto anything human. Why for example does Dave Bowman wind up in what looks like a Louis-XV-style hotel suite—that’s probably the most timeless human-created backdrop of the movie (except for the fact that in keeping with 1960s practices, there’s a bathtub but no shower in the suite)?
It’s interesting that 2001 contains both artificial and extraterrestrial intelligence. And it’s interesting that 50 years after 2001 was released, we’re getting more and more comfortable with the idea of artificial intelligence, yet we believe we’ve seen no evidence of extraterrestrial intelligence.
As I’ve argued extensively elsewhere, I think the great challenge of thinking about extraterrestrial intelligence is defining what we might mean by intelligence. It’s very easy for us humans to have the analog of a pre-Copernican view in which we assume that our intelligence and capabilities are somehow fundamentally special, just like the Earth used to be assumed to be at the center of the universe.
But what my Principle of Computational Equivalence suggests is that in fact we’ll never be able to define anything fundamentally special about our intelligence; what’s special about it is its particular history and connections. Does the weather “have a mind of its own”? Well, based on the Principle of Computational Equivalence I don’t think there’s anything fundamentally different about the computations it’s doing from the ones that go on in our brains.
And similarly, when we look out into the cosmos, it’s easy to see examples of sophisticated computation going on. Of course, we don’t think of the complex processes in a pulsar magnetosphere as “extraterrestrial intelligence”; we just think of them as something “natural”. In the past we might have argued that however complex such a process looks, it’s really somehow fundamentally simpler than human intelligence. But given the Principle of Computational Equivalence we know this isn’t true.
So why don’t we consider a pulsar magnetosphere to be an example of “intelligence”? Well, because in it we don’t recognize anything like our own history, or our own detailed behavior. And as a result, we don’t have a way to connect what it does with purposes that we humans understand.
The computational universe of all possible programs is full of sophisticated computations that aren’t aligned with any existing human purposes. But as we try to develop AI, what we are effectively doing is to mine that computational universe for programs that do things we want done.
Out there in the computational universe, though, there’s an infinite collection of “possible AIs”. And there’s nothing less capable about the ones that we don’t yet choose to use; we just don’t see how they align with things we want.
Artificial intelligence is in a sense the first example of alien intelligence that we’re seeing (yes, there are animals too, but it’s easier to connect with AI). We’re still at the very early stages of getting widespread intuition about AI. But as we understand more about what AI really can be, and how it relates to everything else in the computational universe, I think we’ll get a clearer perspective on the forms intelligence can take.
Will we find extraterrestrial intelligence? Well, in many respects I think we already have. It’s all around us in the universe—doing all kinds of sophisticated computations.
Will there ever be a dramatic moment, like in 2001, where we find extraterrestrial intelligence that’s aligned enough with our own intelligence that we can recognize the perfect black monoliths it makes—even if we can’t figure out their “purpose”? My current suspicion is that it’ll be more “push” than “pull”: instead of seeing something that we suddenly recognize, we’ll instead gradually generalize our notion of intelligence, until we start to be comfortable attributing it not just to ourselves and our AIs, but also to other things in the universe.
Personal Journey
When I first saw 2001 I don’t think I ever even calculated how old I’d be in the year 2001. I was always thinking about what the future might be like, but I didn’t internalize actually living through it. Back when I was 8 years old, in 1968, space was my greatest interest, and I made lots of little carefully stapled booklets, full of typewritten text and neatly drawn diagrams. I kept detailed notes on every space probe that was launched, and tried to come up with spacecraft (I wrote it “space-craft”) designs of my own.
What made me do this? Well, presaging quite a bit that I’ve done in my life, I did it just because I found it personally interesting. I never showed any of it to anyone, and never cared what anyone might think of it. And for nearly 50 years I’ve just had it all stored away. But looking at it again now, I found one unique example of something related to my interests that I did for school: a booklet charmingly titled “The Future”, written when I was 9 or 10 years old, and containing what’s to me now a cringingly embarrassing page of my predictions for the future of space exploration (complete with a nod to 2001):
Fortunately perhaps, I didn’t wait around to find out how wrong these predictions were, and within a couple of years my interest in space had transformed into interests in more foundational fields, first physics and then computation and the study of the computational universe. When I first started using computers around 1972, it was a story of paper tape and teleprinters—far from the flashing screens of 2001.
But I’ve been fortunate enough to live through a time when the computer technology of 2001 went from pure fiction to something close to fact. And I’ve been even more fortunate to have been able to contribute a bit to that.
I’ve often said—in a kind of homage to 2001—that my favorite personal aspiration is to build “alien artifacts”: things that are recognizable once they’re built, but which nobody particularly expected would exist or be possible. I like to think that Wolfram|Alpha is some kind of example—as is what the Wolfram Language has become. And in a sense so have my efforts been in exploring the computational universe.
I never interacted with Stanley Kubrick. But I did interact with Arthur C. Clarke, particularly when my big book A New Kind of Science was being published. (I like to think that the book is big in content, but it is definitely big in size, with 1280 pages, weighing nearly 6 pounds.) Arthur C. Clarke asked for a pre-publication copy, which I duly sent, and on March 1, 2002, I received an email from him saying that “A ruptured postman has just staggered away from my front door… Stay tuned…..”.
Then, three days later, I got another piece of mail: “Well, I have <looked> at (almost) every page and am still in a state of shock. Even with computers, I don’t see how you could have done it.” Wow! I actually succeeded in making what seemed to Arthur C. Clarke like an alien artifact!
He offered me a back-cover quote for the book: “… Stephen’s magnum opus may be the book of the decade, if not the century. It’s so comprehensive that perhaps he should have called it ‘A New Kind of Universe’, and even those who skip the 1200 pages of (extremely lucid) text will find the computer-generated illustrations fascinating. My friend HAL is very sorry he hadn’t thought of them first…” (In the end Steve Jobs talked me out of having quotes on the book, though, saying “Isaac Newton didn’t have back-cover quotes; why do you want them?”)
It’s hard for me to believe it’s been 50 years since I first saw 2001. Not all of 2001 has come true (yet). But for me what was important was that it presented a vision of what might be possible—and an idea of how different the future might be. It helped me set the course of my life to try to define in whatever ways I can what the future will be. And not just waiting for aliens to deliver monoliths, but trying to build some “alien artifacts” myself.
The more one does computational thinking, the better one gets at it. And today we’re launching the Wolfram Challenges site to give everyone a source of bite-sized computational thinking challenges based on the Wolfram Language. Use them to learn. Use them to stay sharp. Use them to prove how great you are.
The Challenges typically have the form: “Write a function to do X”. But because we’re using the Wolfram Language—with all its built-in computational intelligence—it’s easy to make the X be remarkably sophisticated.
The site has a range of levels of Challenges. Some are good for beginners, while others will require serious effort even for experienced programmers and computational thinkers. Typically each Challenge has at least some known solution that’s at most a few lines of Wolfram Language code. But what are those lines of code?
There may be many different approaches to a particular Challenge, leading to very different kinds of code. Sometimes the code will be smaller, sometimes it will run faster, and so on. And for each Challenge, the site maintains a leaderboard that shows who’s got the smallest, the fastest, etc. solution so far.
What does it take to be able to tackle Challenges on the site? If you’ve read my An Elementary Introduction to the Wolfram Language, for example, you should be well prepared—maybe with some additional help on occasion from the main Wolfram Language documentation. But even if you’re more of a beginner, you should still be able to do simpler Challenges, perhaps looking at parts of my book when you need to. (If you’re an experienced programmer, a good way to jump-start yourself is to look at the Fast Introduction for Programmers.)
How It Works
There are lots of different kinds of Challenges on the site. Each Challenge is tagged with topic areas. And on the front page there are a number of “tracks” that you can use as guides to sequences of related Challenges. Here are the current Challenges in the Real-World Data track:
Click one you want to try—and you’ll get a webpage that explains the Challenge:
Now you can choose either to download the Challenge notebook to the desktop, or just open it directly in your web browser in the Wolfram Cloud. (It’s free to use the Wolfram Cloud for this, though you’ll have to have a login—otherwise the system won’t be able to give you credit for the Challenges you’ve solved.)
Here’s the cloud version of this particular notebook:
You can build up your solution in the Scratch Area, and try it out there. Then when you’re ready, put your code where it says “Enter your code here”. Then press Submit.
What Submit does is to send your solution to the Wolfram Cloud—where it’ll be tested to see if it’s correct. If it’s not correct, you’ll get something like this:
But if it’s correct, you’ll get this, and you’ll be able to go to the leaderboard and see how your solution compared to other people’s. You can submit the same Challenge as many times as you want. (By the way, you can pick your name and icon for the leaderboard from the Profile tab.)
The Range of Challenges
The range of Challenges on the site is broad both in terms of difficulty level and topic. (And, by the way, we’re planning to progressively grow the site, not least through material from outside contributors.)
Here’s an example of a simple Challenge, one that I can personally solve in a few seconds:
Here’s a significantly more complicated Challenge, that took me a solid 15 minutes to solve at all well:
Some of the Challenges are in a sense “pure algorithm challenges” that don’t depend on any outside data:
And some of the Challenges are “math-y”, and make use of the math capabilities of the Wolfram Language:
Pre-launch Experience
We’ve been planning to launch a site like Wolfram Challenges for years, but it’s only now, with the current state of the Wolfram Cloud, that we’ve been able to set it up as we have today—so that anyone can just open a web browser and start solving Challenges.
Still, we’ve had unannounced preliminary versions for about three years now—complete with a steadily growing number of Challenges. And in fact, a total of 270 people have discovered the preliminary version—and produced in all no fewer than 11,400 solutions. Some people have solved the same Challenge many times, coming up with progressively shorter or progressively faster solutions. Others have moved on to different Challenges.
It’s interesting to see how diverse the solutions to even a single Challenge can be. Here are word clouds of the functions used in solutions to three different Challenges:
And when it comes to lengths of solutions (here in characters of code), there can be quite a variation for a particular Challenge:
Here’s the distribution of solution lengths for all solutions submitted during the pre-launch period, for all Challenges:
It’s not clear what kind of distribution this is (though it seems close to lognormal). But what’s really nice is how concentrated it is on solutions that aren’t much more than a line long. (81% of them would even fit in a 280-character tweet!)
And in fact what we’re seeing can be viewed as a great tribute to the Wolfram Language. In any other programming language most Challenges—if one could do them at all—would take pages of code. But in the Wolfram Language even sophisticated Challenges can often be solved with just tweet-length amounts of code.
Why is this? Well, basically it’s because the Wolfram Language is a different kind of language: it’s a knowledge-based language where lots of knowledge about computation and other things is built right into the language (thanks to 30+ years of hard work on our part).
But then are the Challenges still “real”? Of course! It’s just that the Wolfram Language lets one operate at a higher level. One doesn’t have to worry about writing out the low-level mechanics of how even sophisticated operations get implemented—one can just concentrate on the pure high-level computational thinking of how to get the Challenge done.
Under the Hood
OK, so what have been some of the challenges in setting up the Wolfram Challenges site? Probably the most important is how to check whether a particular solution is correct. After all, we’re not just asking to compute some single result (say, 42) that we can readily compare with. We’re asking to create a function that can take a perhaps infinite set of possible arguments, and in each case give the correct result.
So how can we know if the function is correct? In some simple cases, we can actually see if the code of the function can be transformed in a meaning-preserving way into code that we already know is correct. But most of the time—like in most practical software quality assurance—the best thing to do is just to try test cases. Some will be deterministically chosen—say based on checking simple or corner cases. Others can be probabilistically generated.
But in the end, if we find that the function isn’t correct, we want to give the user a simple case that demonstrates this. Often in practice we may first see failure in some fairly complicated case—but then the system tries to simplify the failure as much as possible.
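As a rough illustration of this kind of testing (in Python, and not the site’s actual implementation), here’s a minimal sketch that checks a submission against a trusted solution on corner cases and random cases, then greedily simplifies the first failure it finds. The Challenge itself (“sum the distinct values in a list”), the function names, and the shrinking strategy are all hypothetical.

```python
import random


def reference(xs):
    """Trusted solution for a hypothetical Challenge: sum the distinct values."""
    return sum(set(xs))


def candidate(xs):
    """A deliberately buggy submission: it forgets to remove duplicates."""
    return sum(xs)


def find_failure(candidate, reference, trials=200, seed=0):
    """Return an input on which the two functions disagree, or None."""
    rng = random.Random(seed)
    corner_cases = [[], [0], [1], [-1, 1]]  # deterministically chosen
    random_cases = [  # probabilistically generated
        [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        for _ in range(trials)
    ]
    for case in corner_cases + random_cases:
        if candidate(case) != reference(case):
            return case
    return None


def shrink(case, candidate, reference):
    """Greedily drop elements while the shorter input still shows the failure."""
    current = list(case)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            smaller = current[:i] + current[i + 1:]
            if candidate(smaller) != reference(smaller):
                current = smaller
                changed = True
                break
    return current


if __name__ == "__main__":
    failing = find_failure(candidate, reference)
    if failing is not None:
        print("first failure:", failing)
        print("simplified:   ", shrink(failing, candidate, reference))
```

For this particular bug the simplified case always ends up as a two-element list containing the same nonzero value twice, which is much easier to reason about than the random list that first exposed the problem.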
OK, so another issue is: how does one tell whether a particular value of a function is correct? If the value is just something like an integer (say, 343) or a string (say, “hi”), then it’s easy. But what if it’s an approximate number (say, 3.141592…)? Well, then we have to start worrying about numerical precision. And what if it’s a mathematical expression (say, 1 + 1/x)? What transformations should we allow on the expression?
There are many other cases too. If it’s a network, we’ll probably want to say it’s correct if it’s isomorphic to what we expect (i.e. the same up to relabeling nodes). If it’s a graphic, we’ll probably want to say it’s correct if it visually looks the same as we expected, or at least is close enough. And if we’re dealing with real-world data, then we have to make sure to recompute our expected result, to take account of data in our knowledgebase that’s changed because of changes out there in the real world.
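Two of these comparisons can be sketched quite compactly. The following Python is only an illustration under stated assumptions: the tolerance value is arbitrary, graphs are given as lists of undirected edges with no isolated nodes, and the isomorphism test is a brute-force search that is only practical for very small graphs. None of this is taken from the actual checking system.

```python
import math
from itertools import permutations


def same_number(got, expected, rel_tol=1e-9):
    """Compare approximate numbers up to a relative tolerance."""
    return math.isclose(got, expected, rel_tol=rel_tol)


def same_graph(edges_a, edges_b):
    """Check whether two small undirected graphs agree up to relabeling nodes."""
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    nodes_a = list({v for edge in set_a for v in edge})
    nodes_b = list({v for edge in set_b for v in edge})
    if len(nodes_a) != len(nodes_b) or len(set_a) != len(set_b):
        return False
    # Try every possible relabeling of the first graph's nodes.
    for perm in permutations(nodes_b):
        relabel = dict(zip(nodes_a, perm))
        mapped = {frozenset(relabel[v] for v in edge) for edge in set_a}
        if mapped == set_b:
            return True
    return False
```

For example, `same_graph([(1, 2), (2, 3)], [("a", "b"), ("c", "a")])` treats the two three-node paths as the same graph, while a star and a path on four nodes are rejected even though they have the same number of nodes and edges.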
Alright, so let’s say we’ve concluded that a particular function is correct. Well now, to fill in the leaderboard, we have to make some measurements on it. First, how long is the code?
We can just format the code in InputForm, then count characters. That gives us one measure. One can also apply ByteCount to just count bytes in the definition of the function. Or we can apply LeafCount, to count the number of leaves in the expression tree for the definition. The leaderboard separately tracks the values for all these measures of “code size”.
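InputForm, ByteCount and LeafCount are Wolfram Language functions. For readers who want to see the general idea of these size measures in another setting, here’s a loose Python analog: a character count of the source text, plus a count of nodes in the parsed syntax tree. The function name and the choice of measures are assumptions for this sketch, and the numbers are not comparable to what the leaderboard reports.

```python
import ast


def code_size(source):
    """Return rough size measures for a piece of Python source code."""
    tree = ast.parse(source)
    return {
        # Length of the source text, similar in spirit to counting characters.
        "characters": len(source),
        # Number of nodes in the syntax tree, a structural measure of size.
        "ast_nodes": sum(1 for _ in ast.walk(tree)),
    }
```

The two measures can disagree: renaming a variable from `x` to `total_count` changes the character count but leaves the tree size untouched.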
OK, so how about the speed of the code? Well, that’s a bit tricky. First because speed isn’t something abstract like “total number of operations on a Turing machine”—it’s actual speed running on a computer. And so it has to be normalized for the speed of the computer hardware. Then it has to somehow discard idiosyncrasies (say associated with caching) seen in particular test runs, as achieved by RepeatedTiming. Oh, and even more basically, it has to decide which instances of the function to test, and how to average them. (And it has to make sure that it won’t waste too much time chasing an incredibly slow solution.)
Well, to actually do all these things, one has to make a whole sequence of specific decisions. And in the end what we’ve done is to package everything up into a single “speed score” that we report in the leaderboard.
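The actual formula behind the speed score isn’t spelled out here, so the following is only a sketch of the ingredients just described, written in Python: repeated timing to damp one-off effects, a fixed baseline workload to calibrate for the machine, and an average over test inputs. The baseline, the repeat counts and the function names are all assumptions.

```python
import timeit


def best_time(func, args, repeats=5, number=200):
    """Smallest per-call time over several repeated runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeats, number=number)
    return min(timings) / number


def baseline(n=1000):
    """A fixed reference workload used to calibrate for this machine's speed."""
    return sum(i * i for i in range(n))


def speed_score(func, test_inputs):
    """Average time over the test inputs, in units of the baseline time."""
    unit = best_time(baseline, ())
    per_input = [best_time(func, args) for args in test_inputs]
    return sum(per_input) / len(per_input) / unit
```

A call such as `speed_score(sorted, [([3, 1, 2],), (list(range(100)),)])` gives a dimensionless number where lower means faster. A real system would also need a time limit, so that a very slow submission can’t tie up the tests.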
A final metric in the leaderboard is “memory efficiency”. Like “speed score”, this is derived in a somewhat complicated way from actual test runs of the function. But the point is that within narrow margins, the results should be repeatable between identical solutions. (And, yes, the speed and memory leaderboards might change when they’re run in a new version of the Wolfram Language, with different optimizations.)
Backstory
We first started testing what’s now the Wolfram Challenges site at the Wolfram Summer School in 2016—and it was rapidly clear that many people found the kinds of Challenges we’d developed quite engaging. At first we weren’t sure how long—and perhaps whimsical—to make the Challenges. We experimented with having whole “stories” in each Challenge (like some math competitions and things like Project Euler do). But pretty soon we decided to restrict Challenges to be fairly short to state—albeit sometimes giving them slightly whimsical names.
We tested our Challenges again at the 2017 Wolfram Summer School, as well as at the Wolfram High School Summer Camp—and we discovered that the Challenges were addictive enough that some people systematically went through trying to solve all of them.
We were initially not sure what forms of Challenges to allow. But after a while we made the choice to (at least initially) concentrate on “write a function to do X”, rather than, for example, just “compute X”. Our basic reason was that we wanted the solutions to the Challenges to be more open-ended.
If the challenge is “compute X”, then there’s typically just one final answer, and once you have it, you have it. But with “write a function to do X”, there’s always a different function to write—that might be faster, smaller, or just different. At a practical level, with “compute X” it’s easier to “spoil the fun” by having answers posted on the web. With “write a function”, yes, there could be one version of code for a function posted somewhere, but there’ll always be other versions to write—and if you always submit versions that have been seen before it’ll soon be pretty clear you have to have just copied them from somewhere.
As it turns out, we’ve actually had quite a bit of experience with the “compute X” format. Because in my book An Elementary Introduction to the Wolfram Language all 655 exercises are basically of the form “write code to compute X”. And in the online version of the book, all these exercises are automatically graded.
Now, if we were just doing “cheap” automatic grading, we’d simply look to see if the code produces the correct result when it runs. But that doesn’t actually check the code. After all, if the answer was supposed to be 42, someone could just give 42 (or maybe 41 + 1) as the “code”.
Our actual automatic grading system is much more sophisticated. It certainly looks at what comes out when the code runs (being careful not to blindly evaluate Quit in a piece of code—and taking account of things like random numbers or graphics or numerical precision). But the real meat of the system is the analysis of the code itself, and the things that happen when it runs.
Because the Wolfram Language is symbolic, “code” is the same kind of thing as “data”. And the automatic grading system makes extensive use of this—not least in applying sequences of symbolic code transformations to determine whether a particular piece of code that’s been entered is equivalent to one that’s known to represent an appropriate solution. (The system has ways to handle “completely novel” code structures too.)
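Here’s a toy version of both ideas in Python, which can also treat code as data through its `ast` module. It’s a minimal sketch, not the grading system described above: it flags “answers” that contain no names or function calls (like 42 or 41 + 1), and it compares two expressions after one simple transformation, sorting the operands of + and *. Treating + and * as commutative assumes numeric operands, and all the function names are hypothetical.

```python
import ast


def is_constant_answer(source):
    """True if the expression has no names or calls, e.g. '42' or '41 + 1'."""
    tree = ast.parse(source, mode="eval")
    return not any(isinstance(n, (ast.Name, ast.Call)) for n in ast.walk(tree))


def canonical(node):
    """Canonical string for an expression tree, sorting operands of + and *."""
    if isinstance(node, ast.Expression):
        return canonical(node.body)
    if isinstance(node, ast.BinOp):
        parts = [canonical(node.left), canonical(node.right)]
        if isinstance(node.op, (ast.Add, ast.Mult)):
            parts.sort()  # operand order doesn't matter for + and *
        return f"{type(node.op).__name__}({parts[0]}, {parts[1]})"
    return ast.dump(node)


def same_code(a, b):
    """Compare two expressions up to reordering of commutative operands."""
    tree_a = ast.parse(a, mode="eval")
    tree_b = ast.parse(b, mode="eval")
    return canonical(tree_a) == canonical(tree_b)
```

With these definitions `same_code("x + 1", "1 + x")` is true and `same_code("x - 1", "1 - x")` is false. A single rewrite rule like this is far from enough in practice; for instance it doesn’t recognize that `(a + b) + c` and `a + (b + c)` are the same.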
Code equivalence is a difficult (in fact, in general, undecidable) problem. A slightly easier problem (though still in general undecidable) is equivalence of mathematical expressions. And a place where we’ve used this kind of equivalence extensively is in our Wolfram Problem Generator:
Of course, exactly what equivalence we want to allow may depend on the kind of problem we’re generating. Usually we’ll want 1 + x and x + 1 to be considered equivalent. But (1 + x)/x might or might not want to be considered equivalent to 1 + 1/x. It’s not easy to get these things right (and many online grading systems do horribly at it). But by using some of the sophisticated math and symbolic transformation capabilities available in the Wolfram Language, we’ve managed to make this work well in Wolfram Problem Generator.
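One simple way to see the difference between “same form” and “same value” is to evaluate two expressions exactly at random rational points. This Python sketch does that with the standard `fractions` module. It’s a probabilistic check under assumed sampling ranges, not a proof of equivalence, and not the method used in Wolfram Problem Generator.

```python
import random
from fractions import Fraction


def probably_equal(f, g, trials=20, seed=0):
    """Compare two one-variable functions exactly at random rational points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(-50, 50), rng.randint(1, 50))
        try:
            fx, gx = f(x), g(x)
        except ZeroDivisionError:
            continue  # skip points where either expression is undefined
        if fx != gx:
            return False
    return True
```

Here `probably_equal(lambda x: (1 + x) / x, lambda x: 1 + 1 / x)` comes out true, because the two expressions agree wherever both are defined. Whether a student who writes one should get credit when the other was expected is a separate decision that depends on the problem, which is exactly the difficulty described above.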
Contribute New Challenges!
The Wolfram Challenges site as it exists today is only the beginning. We intend it to grow. And the best way for it to grow—like our long-running Wolfram Demonstrations Project—is for people to contribute great new Challenges for us to include.
At the bottom of the Wolfram Challenges home page you can download the Challenges Authoring Notebook:
Fill this out, press “Submit Challenge”—and off this will go to us for review.
Beyond Challenges
I’m not surprised that Wolfram Challenges seem to appeal to people who like solving math puzzles, crosswords, brain teasers, sudoku and the like. I’m also not surprised that they appeal to people who like gaming and coding competitions. But personally—for better or worse—I don’t happen to fit into any of these categories. And in fact when we were first considering creating Wolfram Challenges I said “yes, lots of people will like it, but I won’t be one of them”.
Well, I have to say I was wrong about myself. Because actually I really like doing these Challenges—and I’m finding I have to avoid getting started on them because I’ll just keep doing them (and, yes, I’m a finisher, so there’s a risk I could just keep going until I’ve done them all, which would be a very serious investment of time).
So what’s different about these Challenges? I think the answer for me is that they feel much more real. Yes, they’ve been made up to be Challenges. But the kind of thinking that’s needed to solve them is essentially just the same as the kind of thinking I end up doing all the time in “real settings”. So when I work on these Challenges, I don’t feel like I’m “just doing something recreational”; I feel like I’m honing my skills for real things.
Now I readily recognize that not everyone’s motivation structure is the same—and many people will like doing these Challenges as true recreations. But I think it’s great that Challenges can also help build real skills. And of course, if one sees that someone has done lots of these Challenges, it shows that they have some real skills. (And, yes, we’re starting to use Challenges as a way to assess applicants, say, for our summer programs.)
It’s worth saying there are some other nice “potentially recreational” uses of the Wolfram Language too.
One example is competitive livecoding. The Wolfram Language is basically unique in being a language in which interesting programs can be written fast enough that it’s fun to watch. Over the years, I’ve done large amounts of (non-competitive) livecoding—both in person and livestreamed. But in the past couple of years we’ve been developing the notion of competitive livecoding as a kind of new sport.
We’ve done some trial runs at our Wolfram Technology Conference—and we’re working towards having robust rules and procedures. In what we’ve done so far, the typical challenges have been of the “compute X” form—and people have taken between a few seconds and perhaps ten minutes to complete them. We’ve used what’s now our Wolfram Chat functionality to distribute Challenges and let contestants submit solutions. And we’ve used automated testing methods—together with human “refereeing”—to judge the competitions.
A different kind of recreational application of the Wolfram Language is our Tweet-a-Program service, released in 2014. The idea here is to write Wolfram Language programs that are short enough to fit in a tweet (and when we launched Tweet-a-Program that meant just 140 characters)—and to make them produce output that is as interesting as possible:
We’ve also had a live analog of this at our Wolfram Technology Conference for some time: our annual One-Liner Competition. And I have to say that even though I (presumably) know the Wolfram Language well, I’m always amazed at what people actually manage to do with just a single line of Wolfram Language code.
At our most recent Wolfram Technology Conference, in recognition of our advances in machine learning, we decided to also do a “Machine-Learning Art Competition”—to make the most interesting possible restyled “Wolfie”:
In the future, we’re planning to do machine learning challenges as part of Wolfram Challenges too. In fact, there are several categories of Challenges we expect to add. We’ve already got Challenges that make use of the Wolfram Knowledgebase, and the built-in data it contains. But we’re also planning to add Challenges that use external data from the Wolfram Data Repository. And we want to add Challenges that involve creating things like neural networks.
There’s a new issue that arises here—and that’s actually associated with a large category of possible Challenges. Because with most uses of things like neural networks, one no longer expects to produce a function that definitively “gets the right answer”. Instead, one just wants a function that does the best possible job on a particular task.
There are plenty of examples of Challenges one can imagine that involve finding “the lowest-cost solution”, or the “best fit”. And it’s a similar setup with typical machine learning tasks: find a function (say based on a neural network) that performs best on classifying a certain test set, etc.
And, yes, the basic structure of Wolfram Challenges is well set up to handle a situation like this. It’s just that instead of it definitively telling you that you’ve got a correct solution for a particular Challenge, it’ll just tell you how your solution ranks relative to others on the leaderboard.
The Challenges in the Wolfram Challenges site always have very well-defined end goals. But one of the great things about the Wolfram Language is how easy it is to use it to explore and create in an open-ended way. And as a kind of analog of Challenges, one can always give seeds for this. One example is the Go Further sections of the Explorations in Wolfram Programming Lab. And other examples are the many kinds of project suggestions we make for things like our summer programs.
What is the right output for an open-ended exploration? I think a good answer in many cases is a computational essay, written in a Wolfram Notebook, and “telling a story” with a mixture of ordinary text and Wolfram Language code. Of course, unlike Challenges, where one’s doing something that’s intended to be checked and analyzed by machine, computational essays are fundamentally about communicating with humans—and don’t have right or wrong “answers”.
The Path Forward
One of my overarching goals in creating the Wolfram Language has been to bring computational knowledge and computational thinking to as many people as possible. And the launch of the Wolfram Challenges site is the latest step in the long journey of doing this.
It’s a great way to engage with programming and computational thinking. And it’s set up to always let you know how you’re getting on. Did you solve that Challenge? How did you do relative to other people who’ve also solved the Challenge?
I’m looking forward to seeing just how small and efficient people can make the solutions to these Challenges. (And, yes, large numbers of equivalent solutions provide great raw material for doing machine learning on program transformations and optimization.)
Who will be the leaders on the leaderboards of Wolfram Challenges? I think it’ll be a wide range of people—with different backgrounds and education. Some will be young; some will be old. Some will be from the most tech-rich parts of the world; some, I hope, will be from tech-poor areas. Some will already be energetic contributors to the Wolfram Language community; others, I hope, will come to the Wolfram Language through Challenges—and perhaps even be “discovered” as talented programmers and computational thinkers this way.
But most of all, I hope lots of people get lots of enjoyment and fulfillment out of Wolfram Challenges—and get a chance to experience that thrill that comes with figuring out a particularly clever and powerful solution that you can then see run on your computer.
On June 23 we celebrate the 30th anniversary of the launch of Mathematica. Most software from 30 years ago is now long gone. But not Mathematica. In fact, it feels in many ways like even after 30 years, we’re really just getting started. Our mission has always been a big one: to make the world as computable as possible, and to add a layer of computational intelligence to everything.
Our first big application area was math (hence the name “Mathematica”). And we’ve kept pushing the frontiers of what’s possible with math. But over the past 30 years, we’ve been able to build on the framework that we defined in Mathematica 1.0 to create the whole edifice of computational capabilities that we now call the Wolfram Language—and that corresponds to Mathematica as it is today.
From when I first began to design Mathematica, my goal was to create a system that would stand the test of time, and would provide the foundation to fill out my vision for the future of computation. It’s exciting to see how well it’s all worked out. My original core concepts of language design continue to infuse everything we do. And over the years we’ve been able to just keep building and building on what’s already there, to create a taller and taller tower of carefully integrated capabilities.
It’s fun today to launch Mathematica 1.0 on an old computer, and compare it with today:
Yes, even in Version 1, there’s a recognizable Wolfram Notebook to be seen. But what about the Mathematica code (or, as we would call it today, Wolfram Language code)? Well, the code that ran in 1988 just runs today, exactly the same! And, actually, I routinely take code I wrote at any time over the past 30 years and just run it.
Of course, it’s taken a lot of long-term discipline in language design to make this work. And without the strength and clarity of the original design it would never have been possible. But it’s nice to see that all that daily effort I’ve put into leadership and consistent language design has paid off so well in long-term stability over the course of 30 years.
There were 551 built-in functions in 1988; there are now more than 5100. And the expectations for each function have vastly increased too. The concept of “superfunctions” that automate a swath of algorithmic capability already existed in 1988—but their capabilities pale in comparison to our modern superfunctions.
Back in 1988 the core ideas of symbolic expressions and symbolic programming were already there, working essentially as they do today. And there were also all sorts of functions related to mathematical computation, as well as to things like basic visualization. But in subsequent years we were able to conquer area after area.
Partly it’s been the growth of raw computer power that’s made new areas possible. And partly it’s been our ability to understand what could conceivably be done. But the most important thing has been that—through the integrated design of our system—we’ve been able to progressively build on what we’ve already done to reach one new area after another, at an accelerating pace. (Here’s a plot of function count by version.)
I recently found a to-do list I wrote in 1991—and I’m happy to say that now, in 2018, essentially everything on it has been successfully completed. But in many cases it took building a whole tower of capabilities—over a large number of years—to be able to achieve what I wanted.
From the very beginning, and even from projects of mine that preceded Mathematica, I had the goal of building as much knowledge as possible into the system. At the beginning the knowledge was mostly algorithmic, and formal. But as soon as we could routinely expect network connectivity to central servers, we started building in earnest what’s now our immense knowledgebase of computable data about the real world.
Back in 1988, I could document pretty much everything about Mathematica in the 750-page book I wrote. Today if we were to print out the online documentation it would take perhaps 36,000 pages. The core concepts of the system remain as simple and clear as they ever were, though—so it’s still perfectly possible to capture them even in a small book.
How the World Has Changed
Thirty years is basically half the complete history of modern digital computing. And it’s remarkable—and very satisfying—that Mathematica and the Wolfram Language have had the strength not only to persist, but to retain their whole form and structure, across all that time.
Thirty years ago Mathematica (all 2.2 megabytes of it) came in boxes available at “neighborhood software stores”, and was distributed on collections of floppy disks (or, for larger computers, on various kinds of magnetic tapes). Today one just downloads it anytime (about 4 gigabytes), accessing its knowledgebase (many terabytes) online—or one just runs the whole system directly in the Wolfram Cloud, through a web browser. (In a curious footnote to history, the web was actually invented back in 1989 on a collection of NeXT computers that had been bought to run Mathematica.)
Thirty years ago there were “workstation class computers” that ran Mathematica, but were pretty much only owned by institutions. In 1988, PCs used MS-DOS, and were limited to 640K of working memory—which wasn’t enough to run Mathematica. The Mac could run Mathematica, but it was always a tight fit (“2.5 megabytes of memory required; 4 megabytes recommended”)—and in the footer of every notebook was a memory gauge that showed you how close you were to running out of memory. Oh, yes, and there were two versions of Mathematica, depending on whether or not your machine had a “numeric coprocessor” (which let it do floating-point arithmetic in hardware rather than in software).
Back in 1988, I had got my first cellphone—which was the size of a shoe. And the idea that something like Mathematica could “run on a phone” would have seemed preposterous. But here we are today with the Wolfram Cloud app on phones, and Wolfram Player running natively on iPads (and, yes, they don’t have virtual memory, so our tradition of tight memory management from back in the old days comes in very handy).
In 1988, computers that ran Mathematica were always things you plugged into a power outlet to use. And the notion of, for example, using Mathematica on a plane was basically inconceivable (well, OK, even in 1981 when I lugged my Osborne 1 computer running CP/M onto a plane, I did find one power outlet for it at the very back of a 747). It wasn’t until 1991 that I first proudly held up at a talk a Compaq laptop that was (creakily) running Mathematica off batteries—and it wasn’t routine to run Mathematica portably for perhaps another decade.
For years I used to use 1989^1989 as my test computation when I tried Mathematica on a new machine. And in 1989 I would usually be counting the seconds waiting for the computation to be finished. (1988^1988 was usually too slow to be useful back in 1988: it could take minutes to return.) Today, of course, the same computation is instantaneous. (Actually, a few years ago, I did the computation again on the first Raspberry Pi computer—and it again took several seconds. But that was a $25 computer. And now even it runs the computation very fast.)
The increase in computer speed over the years has had not only quantitative but also qualitative effects on what we’ve been able to do. Back in 1988 one basically did a computation and then looked at the result. We talked about being able to interact with a Mathematica computation in real time (and there was actually a demo on the NeXT computer that did a simple case of this even in 1989). But it basically took 18 years before computers were routinely fast enough that we could implement Manipulate and Dynamic—with “Mathematica in the loop”.
I considered graphics and visualization an important feature of Mathematica from the very beginning. Back then there were “paint” (bitmap) programs, and there were “draw” (vector) programs. We made the decision to use the then-new PostScript language to represent all our graphics output resolution-independently.
We had all sorts of computational geometry challenges (think of all those little shattered polygons), but even back in 1988 we were able to generate resolution-independent 3D graphics, and in preparing for the original launch of Mathematica we found the “most complicated 3D graphic we could easily generate”, and ended up with the original icosahedral “spikey”—which has evolved today into our rhombic hexecontahedron logo:
In a sign of a bygone software era, the original Spikey also graced the elegant, but whimsical, Mathematica startup screen on the Mac:
Back in 1988, there were command-line interfaces (like the Unix shell), and there were word processors (like WordPerfect). But it was a new idea to have “notebooks” (as we called them) that mixed text, input and output—as well as graphics, which more usually were generated in a separate window or even on a separate screen.
Even in Mathematica 1.0, many of the familiar features of today’s Wolfram Notebooks were already present: cells, cell groups, style mechanisms, and more. There was even the same doubled-cell-bracket evaluation indicator—though in those days longer rendering times meant there needed to be more “entertainment”, which Mathematica provided in the form of a bouncing-string-figure wait cursor that was computed in real time during the vertical retrace interrupt associated with refreshing the CRT display.
In what would now be standard good software architecture, Mathematica from the very beginning was always divided into two parts: a kernel doing computations, and a front end supporting the notebook interface. The two parts communicated through the MathLink protocol (still used today, but now called WSTP) that in a very modern way basically sent symbolic expressions back and forth.
Back in 1988—with computers like Macs straining to run Mathematica—it was common to run the front end on a local desktop machine, and then have a “remote kernel” on a heftier machine. Sometimes that machine would be connected through Ethernet, or rarely through the internet. More often one would use a dialup connection, and, yes, there was a whole mechanism in Version 1.0 to support modems and phone dialing.
When we first built the notebook front end, we thought of it as a fairly thin wrapper around the kernel—that we’d be able to “dash off” for the different user interfaces of different computer systems. We built the front end first for the Mac, then (partly in parallel) for the NeXT. Within a couple of years we’d built separate codebases for the then-new Microsoft Windows, and for X Windows.
But as we polished the notebook front end it became more and more sophisticated. And so it was a great relief in 1996 when we managed to create a merged codebase that ran on all platforms.
And for more than 15 years this was how things worked. But then along came the cloud, and mobile. And now, out of necessity, we again have multiple notebook front end codebases. Maybe in a few years we’ll be able to merge them again. But it’s funny how the same issues keep cycling around as the decades go by.
Unlike the front end, we designed the kernel from the beginning to be as robustly portable as possible. And over the years it’s been ported to an amazing range of computers—very often as the first serious piece of application software that a new kind of computer runs.
From the earliest days of Mathematica development, there was always a raw command-line interface to the kernel. And it’s still there today. And what’s amazing to me is how often—in some new and unfamiliar situation—it’s really nice to have that raw interface available. Back in 1988, it could even make graphics—as ASCII art—but that’s not exactly in so much demand today. But still, the raw kernel interface is what for example wolframscript uses to provide programmatic access to the Wolfram Language.
Software Archaeology
There’s much of the earlier history of computing that’s disappearing. And it’s not so easy in practice to still run Mathematica 1.0. But after going through a few early Macs, I finally found one that still seemed to run well enough. We loaded up Mathematica 1.0 from its distribution floppies, and yes, it launched! (I guess the distribution floppies were made the week before the actual release on June 23, 1988; I vaguely remember a scramble to get the final disks copied.)
Needless to say, when I wanted to livestream this, the Mac stopped working, showing only a strange zebra pattern on its screen. Whacking the side of the computer (a typical 1980s remedy) didn’t do anything. But just as I was about to give up, the machine suddenly came to life, and there I was, about to run Mathematica 1.0 again.
I tried all sorts of things, creating a fairly long notebook. But then I wondered: just how compatible is this? So I saved the notebook on a floppy, and put it in a floppy drive (yes, you can still get those) on a modern computer. At first, the modern operating system didn’t know what to do with the notebook file.
But then I added our old “.ma” file extension, and opened it. And… oh my gosh… it just worked! The latest version of the Wolfram Language successfully read the 1988 notebook file format, and rendered the live notebook (and also created a nice, modern “.nb” version):
There’s a bit of funny spacing around the graphics, reflecting the old way that graphics had to be handled back in 1988. But if one just selects the cells in the notebook, and presses Shift + Enter, up comes a completely modern version, now with color outputs too!
The Path Ahead
Before Mathematica, sophisticated technical computing was at best the purview of a small “priesthood” of technical computing experts. But as soon as Mathematica appeared on the scene, this all changed—and suddenly a typical working scientist or mathematician could realistically expect to do serious computation with their own hands (and then to save or publish the results in notebooks).
Over the past 30 years, we’ve worked very hard to open progressively more areas to immediate computation. Often there’s great technical sophistication inside. But our goal is to be able to let people translate high-level computational thinking as directly and automatically as possible into actual computations.
The result has been incredibly powerful. And it’s a source of great satisfaction to see how much has been invented and discovered with Mathematica over the years—and how many of the world’s most productive innovators use Mathematica and the Wolfram Language.
But amazingly, even after all these years, I think the greatest strengths of Mathematica and the Wolfram Language are only just now beginning to become broadly evident.
Part of it has to do with the emerging realization of how important it is to systematically and coherently build knowledge into a system. And, yes, the Wolfram Language has been unique in all these years in doing this. And what this now means is that we have a huge tower of computational intelligence that can be immediately applied to anything.
To be fair, for many of the past 30 years, Mathematica and the Wolfram Language were primarily deployed as desktop software. But particularly with the increasing sophistication of the general computing ecosystem, we’ve been able in the past 5–10 years to build out extremely strong deployment channels that have now allowed Mathematica and the Wolfram Language to be used in an increasing range of important enterprise settings.
Mathematica and the Wolfram Language have long been standards in research, education and fields like quantitative finance. But now they’re in a position to bring the tower of computational intelligence that they embody to any area where computation is used.
Since the very beginning of Mathematica, we’ve been involved with what’s now called artificial intelligence (and in recent times we’ve been leaders in supporting modern machine learning). We’ve also been very deeply involved with data in all forms, and with what’s now called data science.
But what’s becoming clearer only now is just how critical the breadth of Mathematica and the Wolfram Language is to allowing data science and artificial intelligence to achieve their potential. And of course it’s satisfying to see that all those capabilities that we’ve built over the past 30 years—and all the design coherence that we’ve worked so hard to maintain—are now so important in areas like these.
The concept of computation is surely the single most important intellectual development of the past century. And it’s been my goal with Mathematica and the Wolfram Language to provide the best possible vehicle to infuse high-level computation into every conceivable domain.
For pretty much every field X (from art to zoology) there either is now, or soon will be, a “computational X” that defines the future of the field by using the paradigm of computation. And it’s exciting to see how much the unique features of the Wolfram Language are allowing it to help drive this process, and become the “language of computational X”.
Traditional non-knowledge-based computer languages are fundamentally set up as a way to tell computers what to do—typically at a fairly low level. But one of the aspects of the Wolfram Language that’s only now beginning to be recognized is that it’s not just intended to be for telling computers what to do; it’s intended to be a true computational communication language, that provides a way of expressing computational thinking that’s meaningful both to computers and to humans.
In the past, it was basically just computers that were supposed to “read code”. But like a vast generalization of the idea of mathematical notation, the goal with the Wolfram Language is to have something that humans can readily read, and use to represent and understand computational ideas.
Combining this with the idea of notebooks brings us the notion of computational essays—which I think are destined to become a key communication tool for the future, uniquely made possible by the Wolfram Language, with its 30-year history.
Thirty years ago it was exciting to see so many scientists and mathematicians “discover computers” through Mathematica. Today it’s exciting to see so many new areas of “computational X” being opened up. But it’s also exciting to see that—with the level of automation we’ve achieved in the Wolfram Language—we’ve managed to bring sophisticated computation to the point where it’s accessible to essentially anyone. And it’s been particularly satisfying to see all sorts of kids—at middle-school level or even below—start to get fluent in the Wolfram Language and the high-level computational ideas it provides access to.
If one looks at the history of computing, it’s in many ways a story of successive layers of capability being added, and becoming ubiquitous. First came the early languages. Then operating systems. Later, around the time Mathematica came on the scene, user interfaces began to become ubiquitous. A little later came networking and then large-scale interconnected systems like the web and the cloud.
But now what the Wolfram Language provides is a new layer: a layer of computational intelligence—that makes it possible to take for granted a high level of built-in knowledge about computation and about the world, and an ability to automate its application.
Over the past 30 years many people have used Mathematica and the Wolfram Language, and many more have been exposed to their capabilities, through systems like Wolfram|Alpha built with them. But what’s possible now is to let the Wolfram Language provide a truly ubiquitous layer of computational intelligence across the computing world. It’s taken decades to build a tower of technology and capabilities that I believe are worthy of this—but now we are there, and it’s time to make this happen.
But the story of Mathematica and the Wolfram Language is not just a story of technology. It’s also a story of the remarkable community of individuals who’ve chosen to make Mathematica and the Wolfram Language part of their work and lives. And now, as we go forward to realize the potential for the Wolfram Language in the world of the future, we need this community to help explain and implement the paradigm that the Wolfram Language defines.
Needless to say, injecting new paradigms into the world is never easy. But doing so is ultimately what moves forward our civilization, and defines the trajectory of history. And today we’re at a remarkable moment in the ability to bring ubiquitous computational intelligence to the world.
But for me, as I look back at the 30 years since Mathematica was launched, I am thankful for everything that’s allowed me to single-mindedly pursue the path that’s brought us to the Mathematica and Wolfram Language of today. And I look forward to our collective effort to move forward from this point, and to contribute to what I think will ultimately be seen as a crucial element in the development of technology and our world.
Logic is a foundation for many things. But what are the foundations of logic itself?
In symbolic logic, one introduces symbols like p and q to stand for statements (or “propositions”) like “this is an interesting essay”. Then one has certain “rules of logic”, like that, for any p and any q, NOT (p AND q) is the same as (NOT p) OR (NOT q).
But where do these “rules of logic” come from? Well, logic is a formal system. And, like Euclid’s geometry, it can be built on axioms. But what are the axioms? We might start with things like p AND q = q AND p, or NOT NOT p = p. But how many axioms does one need? And how simple can they be?
It was a nagging question for a long time. But at 8:31pm on Saturday, January 29, 2000, out on my computer screen popped a single axiom. I had already shown there couldn’t be anything simpler, but I soon established that this one little axiom was enough to generate all of logic:
That’s the same kind of question that’s increasingly being asked about all sorts of computational systems, and all sorts of applications of machine learning and AI. Yes, we can see what happens. But can we understand it?
I think this is ultimately a deep question—that’s actually critical to the future of science and technology, and in fact to the future of our whole intellectual development.
But before we talk more about this, let’s talk about logic, and about the axiom I found for it.
The History
Logic as a formal discipline basically originated with Aristotle in the 4th century BC. As part of his lifelong effort to catalog things (animals, causes, etc.), Aristotle cataloged valid forms of arguments, and created symbolic templates for them which basically provided the main content of logic for two thousand years.
By the 1400s, however, algebra had been invented, and with it came cleaner symbolic representations of things. But it was not until 1847 that George Boole finally formulated logic in the same kind of way as algebra, with logical operations like AND and OR being thought of as operating according to algebra-like rules.
Within a few years, people were explicitly writing down axiom systems for logic. A typical example was:
But does logic really need AND and OR and NOT? After the first decade of the 1900s several people had discovered that actually the single operation that we now call NAND is enough, with for example p OR q being computed as (p NAND p) NAND (q NAND q). (The “functional completeness” of NAND could have remained forever a curiosity but for the development of semiconductor technology, which implements all the billions of logic operations in a modern microprocessor with combinations of transistors that perform none other than NAND or the related function NOR.)
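As a quick check of the NAND construction above, here is a minimal sketch in the Wolfram Language. It assumes only the built-in functions TautologyQ, Equivalent and Nand; the NOT and AND constructions are standard identities added for illustration, not something taken from the historical sources:

```wolfram
(* p OR q written with NAND alone, as in the text *)
TautologyQ[Equivalent[Or[p, q], Nand[Nand[p, p], Nand[q, q]]], {p, q}]

(* NOT and AND obtained the same way (standard identities, added for illustration) *)
TautologyQ[Equivalent[Not[p], Nand[p, p]], {p}]
TautologyQ[Equivalent[And[p, q], Nand[Nand[p, q], Nand[p, q]]], {p, q}]
```

Each line should return True, since TautologyQ tests every combination of truth values for the listed variables.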
But, OK, so what do the axioms of logic (or “Boolean algebra”) look like in terms of NAND? Here’s the first known version of them, from Henry Sheffer in 1913 (here dot · stands for NAND):
Back in 1910 Whitehead and Russell’s Principia Mathematica had popularized the idea that perhaps all of mathematics could be derived from logic. And particularly with this in mind, there was significant interest in seeing just how simple the axioms for logic could be. Some of the most notable work on this was done in Lviv and Warsaw (then both part of Poland), particularly by Jan Łukasiewicz (who, as a side effect of his work, invented in 1920 parenthesis-free Łukasiewicz or “Polish” notation). In 1944, at the age of 66, Łukasiewicz fled from the approaching Soviets, and in 1947 ended up in Ireland.
Meanwhile, the Irish-born Carew Meredith, who had been educated at Winchester and Cambridge, and had become a mathematics coach in Cambridge, had been forced by his pacifism to go back to Ireland in 1939. And in 1947, Meredith went to lectures by Łukasiewicz in Dublin, which inspired him to begin a search for simple axioms, which would occupy most of the rest of his life.
But could it get any simpler? Meredith had been picking away for years trying to see how a NAND could be removed here or there. But after 1967 he apparently didn’t get any further (he died in 1976), though in 1969 he did find the three-axiom system:
I actually didn’t know about Meredith’s work when I started exploring axiom systems for logic. I’d gotten into the subject as part of trying to understand what kinds of behavior simple rules could produce. Back in the early 1980s I’d made the surprising discovery that even cellular automata with some of the simplest possible rules—like my favorite rule 30—could generate behavior of great complexity.
And having spent the 1990s basically trying to figure out just how general this phenomenon was, I eventually wanted to see how it might apply to mathematics. It’s an immediate observation that in mathematics one’s basically starting from axioms (say for arithmetic, or geometry, or logic), and then trying to prove a whole collection of sophisticated theorems from them.
But just how simple can the axioms be? Well, that was what I wanted to discover in 1999. And as my first example, I decided to look at logic (or, equivalently, Boolean algebra). Contrary to what I would ever have expected beforehand, my experience with cellular automata, Turing machines, and many other kinds of systems—including even partial differential equations—was that one could just start enumerating the simplest possible cases, and after not too long one would start seeing interesting things.
But could one “discover logic” this way? Well, there was only one way to tell. And in late 1999 I set things up to start exploring what amounts to the space of all possible axiom systems—starting with the simplest ones.
In a sense any axiom system provides a set of constraints, say on p · q. It doesn’t say what p · q “is”; it just gives properties that p · q must satisfy (like, for example, it could say that p · q = q · p). Then the question is whether from these properties one can derive all the theorems of logic that hold when p · q is Nand[p, q]: no more and no less.
There’s a direct way to test some of this. Just take the axiom system, and see what explicit forms of p · q satisfy the axioms if p and q can, say, be True or False. If the axiom system were just p · q = q · p then, yes, p · q could be Nand[p, q], but it doesn’t have to be. It could also be And[p, q] or Equal[p, q], or lots of other things which won’t satisfy the same theorems as the NAND function in logic. But by the time one gets to the axiom system one’s reached the point where Nand[p, q] (and the basically equivalent Nor[p, q]) are the only “models” of p · q that work, at least assuming p and q have only two possible values.
So is this then an axiom system for logic? Well, no. Because it implies, for example, that there’s a possible form for p · q with 3 values for p and q, whereas there’s no such thing for logic. But, OK, the fact that this axiom system with just one axiom even gets close suggests it might be worth looking for a single axiom that reproduces logic. And that’s what I did back in January 2000 (it’s gotten a bit easier these days, thanks notably to the handy, fairly new Wolfram Language function Groupings).
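The two-valued model check described above is easy to sketch. The following is an illustration only, not the code used in 2000: it represents each candidate operator as a list of rules, and the names ops, dot, modelsOf, comm and sixNandAxiom are hypothetical. The six-NAND axiom used as a test case is the one spelled out in Polish form later in this piece (=DDDpqrDpDDprpr).

```wolfram
vals = {True, False};
args = Tuples[vals, 2];

(* each of the 2^4 = 16 possible operators is a list of rules {p, q} -> value *)
ops = Map[Thread[args -> #] &, Tuples[vals, 4]];

(* dot[f] turns such a rule list into a function of two truth values *)
dot[f_][p_, q_] := Replace[{p, q}, f];

(* keep the operators for which the axiom holds for every assignment of its n variables *)
modelsOf[axiom_, n_] := Select[ops,
   Function[f, AllTrue[Tuples[vals, n], Function[v, axiom[dot[f], v]]]]];

(* commutativity: p.q == q.p *)
comm[d_, {p_, q_}] := d[p, q] === d[q, p];

(* ((p.q).r).(p.((p.r).p)) == r *)
sixNandAxiom[d_, {p_, q_, r_}] := d[d[d[p, q], r], d[p, d[d[p, r], p]]] === r;

nand = Thread[args -> Nand @@@ args];
nor = Thread[args -> Nor @@@ args];

Length[modelsOf[comm, 2]]                               (* 8 *)
Sort[modelsOf[sixNandAxiom, 3]] === Sort[{nand, nor}]   (* True *)
```

Commutativity alone leaves 8 of the 16 operators standing (And, Equal and Nand among them), while the six-NAND axiom leaves only Nand and Nor. Passing this two-valued test is necessary but not sufficient: as the 3-value example shows, an axiom can have only Nand and Nor as two-valued models and still fail to be an axiom for logic.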
It was easy to see that no axioms with 3 or fewer “NANDs” (or, really, 3 or fewer “dot operators”) could work. And by 5am on Saturday, January 29 (yes, I was a night owl then), I’d found that none with 4 NANDs could work either. By the time I stopped working on it a little after 6am, I’d gotten 14 possible candidates with 5 NANDs. But when I started work again on Saturday evening and did more tests, every one of these candidates failed.
So, needless to say, the next step was to try cases with 6 NANDs. There were 288,684 of these in all. But my code was efficient, and it didn’t take long before out popped on my screen (yes, from Mathematica Version 4):
At first I didn’t know what I had. All I knew was that these were the 25 inequivalent 6-NAND axioms that got further than any of the 5-NAND ones. But were any of them really an axiom system for logic? I had a (rather computation-intensive) empirical method that could rule axioms out. But the only way to know for sure whether any axiom was actually correct was to prove that it could successfully reproduce, say, the Sheffer axioms for logic.
It took a little software wrangling, but before many days had gone by, I’d discovered that most of the 25 couldn’t work. And in the end, just two survived:
And to my great excitement, I was successfully able to have my computer prove that both are axioms for logic. The procedure I’d used ensured that there could be no simpler axioms for logic. So I knew I’d come to the end of the road: after a century (or maybe even a couple of millennia), we could finally say that the simplest possible axiom for logic was known.
Not long after, I found two 2-axiom systems, also with 6 NANDs in total, that I proved could reproduce logic:
And if one chooses to take commutativity for granted, then these show that all it takes to get logic is one tiny 4-NAND axiom.
Why It Matters
OK, so it’s neat to be able to say that one’s “finished what Aristotle started” (or at least what Boole started) and found the very simplest possible axiom system for logic. But is it just a curiosity, or is there real significance to it?
Before the whole framework I developed in A New Kind of Science, I think one would have been hard-pressed to view it as much more than a curiosity. But now one can see that it’s actually tied into all sorts of foundational questions, like whether one should consider mathematics to be invented or discovered.
Mathematics as humans practice it is based on a handful of particular axiom systems—each in effect defining a certain field of mathematics (say logic, or group theory, or geometry, or set theory). But in the abstract, there are an infinite number of possible axiom systems out there—in effect each defining a field of mathematics that could in principle be studied, even if we humans haven’t ever done it.
Before A New Kind of Science I think I implicitly assumed that pretty much anything that’s just “out there” in the computational universe must somehow be “less interesting” than things we humans have explicitly built and studied. But my discoveries about simple programs made it clear that at the very least there’s often as much richness in systems that are just “out there” as in ones that we carefully select.
So what about axiom systems for mathematics? Well, to compare what’s just “out there” with what we humans have studied, we have to know where the axiom systems for existing areas of mathematics that we’ve studied—like logic—actually lie. And based on traditional human-constructed axiom systems we’d conclude that they have to be far, far out there—in effect only findable if one already knows where they are.
But my axiom-system discovery basically answered the question, “How far out is logic?” For something like cellular automata, it’s particularly easy to assign a number (as I did in the early 1980s) to each possible cellular automaton. It’s slightly harder to do this with axiom systems, though not much. And in one approach, my axiom can be labeled as 411;3;7;118—constructed in the Wolfram Language as:
Groupings[{p, q, r}[[1 + IntegerDigits[411, 3, 7]]], CenterDot -> 2][[118]] == r
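To unpack the label 411;3;7;118, it may help to evaluate the pieces of this expression separately. The following is a small sketch using the same built-in functions; the variable names leaves and forms are introduced here only for illustration:

```wolfram
(* the 7 base-3 digits of 411 pick the sequence of variables at the leaves *)
IntegerDigits[411, 3, 7]                              (* {0, 1, 2, 0, 0, 2, 0} *)
leaves = {p, q, r}[[1 + IntegerDigits[411, 3, 7]]]    (* {p, q, r, p, p, r, p} *)

(* all ways of bracketing those 7 leaves with the binary operator CenterDot *)
forms = Groupings[leaves, CenterDot -> 2];
Length[forms]                                         (* 132 *)

(* per the text, the 118th bracketing is the left-hand side of the axiom *)
forms[[118]]
```

So the four numbers in the label are the index of the leaf sequence (411), the number of variables (3), the number of leaves (7), and the position among the 132 possible bracketings (118).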
And at least in the space of possible functional forms (not accounting for variable labeling), here’s a visual indication of where the axiom lies:
Given how fundamental logic is to so many formal systems we humans study, we might have thought that in any reasonable representation, logic corresponds to one of the very simplest conceivable axiom systems. But at least with the (NAND-based) representation we’re using, that’s not true. There’s still by most measures a very simple axiom system for it, but it’s perhaps the hundred thousandth possible axiom system one would encounter if one just started enumerating axiom systems starting from the simplest one.
So given this, the obvious next question is, what about all the other axiom systems? What’s the story with those? Well, that’s exactly the kind of investigation that A New Kind of Science is all about. And indeed in the book I argue that things like the systems we see in nature are often best captured precisely by those “other rules” that we can find by enumerating possibilities.
In the case of axiom systems, I made a picture that represents what happens in “fields of mathematics” corresponding to different possible axiom systems. Each row shows the consequences of a particular axiom system, with the boxes across the page indicating whether a particular theorem is true in that axiom system. (Yes, at some point Gödel’s Theorem bites one, and it becomes irreducibly difficult to prove or disprove a given theorem in a given axiom system; in practice, with my methods that happened just a little further to the right than the picture shows…)
Is there something fundamentally special about “human-investigated” fields of mathematics? From this picture, and other things I’ve studied, there doesn’t seem to be anything obvious. And I suspect actually that the only thing that’s really special about these fields of mathematics is the historical fact that they are what have been studied. (One might make claims like that they arise because they “describe the real world”, or because they’re “related to how our brains work”, but the results in A New Kind of Science argue against these.)
Alright, well then what’s the significance of my axiom system for logic? The size of it gives a sense of the ultimate information content of logic as an axiomatic system. And it makes it look like—at least for now—we should view logic as much more having been “invented as a human construct” than having been “discovered” because it was somehow “naturally exposed”.
If history had been different, and we’d routinely looked (in the manner of A New Kind of Science) at lots of possible simple axiom systems, then perhaps we would have “discovered” the axiom system for logic as one with particular properties we happened to find interesting. But given that we have explored so few of the possible simple axiom systems, I think we can only reasonably view logic as something “invented”—by being constructed in an essentially “discretionary” way.
In a sense this is how logic looked, say, back in the Middle Ages—when the possible syllogisms (or valid forms of argument) were represented by (Latin) mnemonics like bArbArA and cElErAnt. And to mirror this, it’s fun to find mnemonics for what we now know is the simplest possible axiom system for logic.
Starting with ((p·q)·r)·(p·((p·r)·p)) = r, we can represent each p·q in prefix or Polish form (the reverse of the “reverse Polish” of an HP calculator) as Dpq—so the whole axiom can be written =DDDpqrDpDDprpr. Then (as Ed Pegg found for me) there’s an English mnemonic for this: FIGure OuT Queue, where p, q, r are u, r, e. Or, looking at first letters of words (with operator B, and p, q, r being a, p, c): “Bit by bit, a program computed Boolean algebra’s best binary axiom covering all cases”.
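Both the axiom and its Polish form can be checked mechanically. What follows is a minimal sketch, assuming the axiom is ((p·q)·r)·(p·((p·r)·p)) = r (as read back from the Polish string) and that · is interpreted as Nand. The names `axiomPQR` and `toPolish` are hypothetical, introduced only for this illustration.

```wolfram
(* The axiom with \[CenterDot] written out explicitly; p, q, r are plain symbols *)
axiomPQR = ((p \[CenterDot] q) \[CenterDot] r) \[CenterDot]
    (p \[CenterDot] ((p \[CenterDot] r) \[CenterDot] p)) == r;

(* Interpret \[CenterDot] as Nand and check all 8 truth assignments *)
TautologyQ[Equivalent @@ (axiomPQR /. CenterDot -> Nand), {p, q, r}]
(* True *)

(* toPolish: hypothetical helper that writes an expression in prefix form,
   using "D" for \[CenterDot] and "=" for the equation itself *)
toPolish[Equal[lhs_, rhs_]] := "=" <> toPolish[lhs] <> toPolish[rhs]
toPolish[CenterDot[a_, b_]] := "D" <> toPolish[a] <> toPolish[b]
toPolish[s_Symbol] := SymbolName[s]

toPolish[axiomPQR]
(* "=DDDpqrDpDDprpr" *)
```

The truth-table check only confirms that the axiom holds for Nand. It says nothing about whether the axiom is sufficient to derive all of logic, which is the harder question taken up next.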
The Mechanics of Proof
OK, so how does one actually prove that my axiom system is correct? Well, the most immediate thing to do is just to show that from it one can derive a known axiom system for logic—like Sheffer’s axiom system:
There are three axioms here, and we’ve got to derive each of them. Well, with the latest version of the Wolfram Language, here’s what we do to derive the first one:
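A minimal sketch of this kind of call, under stated assumptions: Sheffer's three axioms are written in their standard form for Nand, and the single axiom is taken as ((a·b)·c)·(a·((a·c)·a)) = c, consistent with the Polish form given earlier. The variable names and the exact shape of the input are illustrative choices, so the step counts obtained this way need not match the ones quoted in the text.

```wolfram
(* Sheffer's axiom system for Nand, in its standard form (assumed here) *)
sheffer = {
   (p \[CenterDot] p) \[CenterDot] (p \[CenterDot] p) == p,
   p \[CenterDot] (q \[CenterDot] (q \[CenterDot] q)) == p \[CenterDot] p,
   (p \[CenterDot] (q \[CenterDot] r)) \[CenterDot] (p \[CenterDot] (q \[CenterDot] r)) ==
    ((q \[CenterDot] q) \[CenterDot] p) \[CenterDot] ((r \[CenterDot] r) \[CenterDot] p)};

(* The single axiom, universally quantified over a, b, c *)
axiom = ForAll[{a, b, c},
   ((a \[CenterDot] b) \[CenterDot] c) \[CenterDot]
     (a \[CenterDot] ((a \[CenterDot] c) \[CenterDot] a)) == c];

(* Try to derive the first Sheffer axiom; the result is a proof object *)
pf = FindEquationalProof[First[sheffer], axiom]

(* Number of steps recorded in the proof object *)
pf["ProofLength"]
```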
It’s pretty remarkable that it’s now possible to just do this. The “proof object” records that 54 steps were used in the proof. And from this proof object we can generate a notebook that describes each of those steps:
pf["ProofNotebook"]
In outline, what happens is that a whole sequence of intermediate lemmas is proved, and these eventually allow the final result to be derived. There’s a whole network of interdependencies between lemmas, as this visualization shows:
pf["ProofGraph"]
Here are the networks involved in deriving all three of the axioms in the Sheffer axiom system—with the last one involving a somewhat whopping 504 steps:
And, yes, it’s clear these are pretty complicated. But before we discuss what that complexity means, let’s talk about what actually goes on in the individual steps of these proofs.
The basic idea is straightforward. Let’s imagine we had an axiom that just said p·q = q·p. (Mathematically, this corresponds to the statement that · is commutative.) More precisely, what the axiom says is that for any expressions p and q, p·q is equivalent to q·p.
OK, so let’s say we wanted to derive from this axiom that . We could do this by using the axiom to transform to , to , and then finally to .
FindEquationalProof does essentially the same thing, though it chooses to do the steps in a slightly different order, and modifies the left-hand side as well as the right-hand side:
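As a concrete sketch of such a derivation, assume the commutativity axiom p·q = q·p and, purely for illustration, the target (a·b)·(c·d) = (d·c)·(b·a). The target equation and the name `cpf` are hypothetical choices made here; the target follows from three applications of the axiom.

```wolfram
(* Commutativity of \[CenterDot], for all expressions p and q *)
comm = ForAll[{p, q}, p \[CenterDot] q == q \[CenterDot] p];

(* Hypothetical target: the arguments are swapped at every level *)
cpf = FindEquationalProof[
   (a \[CenterDot] b) \[CenterDot] (c \[CenterDot] d) ==
    (d \[CenterDot] c) \[CenterDot] (b \[CenterDot] a), comm];

(* Tabulate the individual steps of the proof that was found *)
cpf["ProofDataset"]
```

The order of the steps is chosen by the prover, so it will not necessarily match the order a person would pick by hand.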
Once one’s got a proof like this, it’s straightforward to just run through each of its steps, and check that they produce the result that’s claimed. But how does one find the proof? There are lots of different possible sequences of substitutions and transformations that one could do. So how does one find a sequence that successfully gets to the final result?
One might think: why not just try all possible sequences, and if there is any sequence that works, one will eventually find it? Well, the problem is that one quickly ends up with an astronomical number of possible sequences to check. And indeed the main art of automated theorem proving consists of finding ways to prune the number of sequences one has to check.
This quickly gets pretty technical, but the most important idea is easy to talk about if one knows basic algebra. Let’s say you’re trying to prove an algebraic result like:
(-1 + x^2) (1 - x + x^2) (1 + x + x^2) == (-1 + x) (1 + x + x^2) (1 + x^3)
Well, there’s a guaranteed way to do this: just apply the rules of algebra to expand out each side—and immediately one can see they’re the same:
{Expand[(-1 + x^2) (1 - x + x^2) (1 + x + x^2)],
Expand[(-1 + x) (1 + x + x^2) (1 + x^3)]}
Why does this work? Well, it’s because there’s a way of taking algebraic expressions like this, and always systematically reducing them so that eventually they get to a standard form. OK, but so can one do the same thing for proofs with arbitrary axiom systems?
The answer is: not immediately. It works in algebra because algebra has a special property that guarantees one can always “make progress” in reducing expressions. But what was discovered independently several times in the 1970s (under names like the Knuth–Bendix algorithm and the Gröbner basis algorithm) is that even if an axiom system doesn’t intrinsically have the appropriate property, one can potentially find “completions” of it that do.
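The polynomial case gives a small sketch of what a completion buys. Assuming two illustrative generating polynomials (chosen here, not taken from the text), GroebnerBasis computes a completed set. Reducing with respect to that set gives a remainder of zero exactly when a polynomial is a consequence of the generators.

```wolfram
(* Two illustrative generating polynomials, each understood as "== 0" *)
gens = {x y - 1, y^2 - x};

(* A polynomial that follows from them:
   y (x y - 1) - x (y^2 - x) expands to x^2 - y *)
target = Expand[y (x y - 1) - x (y^2 - x)];

(* Complete the generators into a Groebner basis *)
gb = GroebnerBasis[gens, {x, y}];

(* PolynomialReduce returns {quotients, remainder};
   with a Groebner basis the remainder is a canonical form *)
Last[PolynomialReduce[target, gb, {x, y}]]
(* 0, because target is a combination of the generators *)
```

This is the same "reduce both sides to a standard form" strategy as Expand, except that the reduction rules had to be computed first.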
And that’s what’s going on in typical proofs produced by FindEquationalProof (which is based on the Waldmeister (“master of trees”) system). There are so-called “critical pair lemmas” that don’t directly “make progress” themselves, but make it possible to set up paths that do. And the reason things get complicated is that even if the final expression one’s trying to get to is fairly short, one may have to go through all sorts of much longer intermediate expressions to get there. And so, for example, for the proof of the first Sheffer axiom above, here are the intermediate steps:
In this case, the largest intermediate form is about 4 times the size of the original axiom. Here it is:
One can represent expressions like this as a tree. Here’s this one, compared to the original axiom:
And here’s how the sizes of intermediate steps evolve through the proofs found for each of the Sheffer axioms:
Why Is It So Hard?
Is it surprising that these proofs are so complicated? In some ways, not really. Because, after all, we know perfectly well that math can be hard. In principle it might have been that anything that’s true in math would be easy to prove. But one of the side effects of Gödel’s Theorem from 1931 was to establish that even things we can eventually prove can have proofs that are arbitrarily long.
And actually this is a symptom of the much more general phenomenon I call computational irreducibility. Consider a system governed, say, by the simple rule of a cellular automaton (and of course, every essay of mine must have a cellular automaton somewhere!). Now just run the system:
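A minimal sketch of running such a system, assuming rule 30 started from a single black cell as a stand-in (the specific rule and the number of steps are illustrative choices):

```wolfram
(* Rule 30 from a single black cell, run for 200 steps *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]
```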
One might have thought that given that there’s a simple rule that underlies the system, there’d always be a quick way to figure out what the system will do. But that’s not the case. Because according to my Principle of Computational Equivalence the operation of the system will often correspond to a computation that’s just as sophisticated as any computation that we could set up to figure out the behavior of the system. And this means that the actual behavior of the system in effect corresponds to an irreducible amount of computational work that we can’t in general shortcut in any way.
In the picture above, let’s say we want to know whether the pattern eventually dies out. Well, we could just keep running it, and if we’re lucky it’ll eventually resolve to something whose outcome is obvious. But in general there’s no upper bound to how far we’ll have to go to, in effect, prove what happens.
When we do things like the logic proofs above, it’s a slightly different setup. Instead of just running something according to definite rules, we’re asking whether there exists a way to get to a particular result by taking some series of steps that each follow a particular rule. And, yes, as a practical computational problem, this is immediately more difficult. But the core of the difficulty is still the same phenomenon of computational irreducibility—and that this phenomenon implies that there isn’t any general way to shortcut the process of working out what a system will do.
Needless to say, there are plenty of things in the world—especially in technology and scientific modeling, as well as in areas where there are various forms of regulation—that have traditionally been set up to implicitly avoid computational irreducibility, and to operate in ways whose outcome can readily be foreseen without an irreducible amount of computation.
But one of the implications of my Principle of Computational Equivalence is that this is a rather singular and contrived situation—because it says that computational irreducibility is in fact ubiquitous across systems in the computational universe.
OK, but what about mathematics? Maybe somehow the rules of mathematics are specially chosen to show computational reducibility. And there are indeed some cases where that’s true (and in some sense it even happens in logic). But for the most part it appears that the axiom systems of mathematics are not untypical of the space of all possible axiom systems—where computational irreducibility is inevitably rampant.
What’s the Point of a Proof?
At some level, the point of a proof is to know that something is true. Of course, particularly in modern times, proof has very much taken a back seat to pure computation. Because in practice it’s much more common to want to generate things by explicit computation than it is to want to “go back” and construct a proof that something is true.
In pure mathematics, though, it’s fairly common to deal with things that at least nominally involve an infinite number of cases (“true for all primes”, etc.), for which at least direct computation can’t work. And when it comes to questions of verification (“can this program ever crash?” or “can this cryptocurrency ever get spent twice?”) it’s often more reasonable to attempt a proof than to do something like run all possible cases.
But in the actual practice of mathematics, there’s more to proof than just establishing if things are true. Back when Euclid first wrote his Elements, he just gave results, and proofs were “left to the reader”. But for better or worse, particularly over the past century, proof has become something that doesn’t just happen behind the scenes, but is instead actually the primary medium through which things are supposed to be communicated.
At some level I think it’s a quirk of history that proofs are typically today presented for humans to understand, while programs are usually just thought of as things for computers to run. Why has this happened? Well, at least in the past, proofs could really only be represented in essentially textual form—so if they were going to be used, it would have to be by humans. But programs have essentially always been written in some form of computer language. And for the longest time, that language tended to be set up to map fairly directly onto the low-level operations of the computer—which meant that it was readily “understandable” by the computer, but not necessarily by humans.
But as it happens, one of the main goals of my own efforts over the past several decades has been to change this—and to develop in the Wolfram Language a true “computational communication language” in which computational ideas can be communicated in a way that is readily understandable to both computers and humans.
There are many consequences of having such a language. But one of them is that it changes the role of proof. Let’s say one’s looking at some mathematical result. Well, in the past the only plausible way to communicate how one should understand it was to give a proof that people could read. But now something different is possible: one can give a Wolfram Language program that computes the result. And in many ways this is a much more powerful way to communicate why the result is true. Because every piece of the program is something precise and unambiguous—that if one wants to, one can actually run. There’s no issue of trying to divine what some piece of text means, perhaps filling in some implicit assumptions. Instead, everything is right there, in absolutely explicit form.
OK, so what about proof? Are there in fact unambiguous and precise ways to write proofs? Well, potentially yes, though it’s not particularly easy. And even though the main Wolfram Language has now existed for 30 years, it’s taken until pretty much now to figure out a reasonable way to represent in it even such structurally comparatively straightforward proofs as the one for my axiom system above.
One can imagine authoring proofs in the Wolfram Language much like one authors programs—and indeed we’re working on seeing how to provide high-level versions of this kind of “proof assistant” functionality. But the proof of my axiom system that I showed above is not something anyone authored; it’s something that was found by the computer. And as such, it’s more like the output of running a program than like a program itself. (Like a program, though, the proof can in some sense be “run” to verify the result.)
Generating Understandability
Most of the time when people use the Wolfram Language—or Wolfram|Alpha—they just want to compute things. They’re interested in getting results, not in understanding why they get the results they do. But in Wolfram|Alpha, particularly in areas like math and chemistry, a popular feature for students is “step-by-step solutions”:
When Wolfram|Alpha does something like computing an integral, it’s using all sorts of powerful systematic algorithmic techniques optimized for getting answers. But when it’s asked to show steps it needs to do something different: it needs instead to explain step by step why it gets the result it does.
It wouldn’t be useful for it to explain how it actually got the result; it’s a very non-human process. Instead, it basically has to figure out how the kinds of operations humans learn can be used to get the result. Often it’ll figure out some trick that can be used. Yes, there’ll be a systematic way to do it that’ll always work. But it involves too many “mechanical” steps. The “trick” (“trig substitution”, “integration by parts”, whatever) won’t work in general, but in this particular case it’ll provide a faster way to get to the answer.
OK, but what about getting understandable versions of other things? Like the operation of programs in general. Or like the proof of my axiom system.
Let’s start by talking about programs. Let’s say one’s written a program, and one wants to explain how it works. One traditional approach is just to “include comments” in the code. Well, if one’s writing in a traditional low-level language, that may be the best one can do. But the whole point of the Wolfram Language being a computational communication language is that the language itself is supposed to allow you to communicate ideas, without needing extra pieces of English text.
It takes effort to make a Wolfram Language program be a good piece of exposition, just like it takes effort to make English text a good piece of exposition. But one can end up with a piece of Wolfram Language code that really explains very clearly how it works just through the code itself.
Of course, it’s very common for the actual execution of the code to do things that can’t readily be foreseen just from the program. I’ll talk about extreme cases like cellular automata soon. But for now let’s imagine that one’s constructed a program where there’s some ability to foresee the broad outlines of what it does.
And in such a case, I’ve found that computational essays (presented as Wolfram Notebooks) are a great tool in explaining what’s going on. It’s crucial that the Wolfram Language is symbolic, so it’s possible to run even the tiniest fragments of any program on their own (with appropriate symbolic expressions as input or output). And when one does this, one can present a succession of steps in the program as a succession of elements in the dialog that forms the core of a computational notebook.
In practice, it’s often critical to create visualizations of inputs or outputs. Yes, everything can be represented as an explicit symbolic expression. But we humans often have a much easier time understanding things when they’re presented visually, rather than as some kind of one-dimensional language-like string.
Of course, there’s something of an art to creating good visualizations. But in the Wolfram Language we’ve managed to go a long way towards automating this art—often using pretty sophisticated machine learning and other algorithms to do things like lay out networks or graphics elements.
What about just starting from the raw execution trace for a program? Well, it’s hard. I’ve done experiments on this for decades, and never been very satisfied with the results. Yes, you can zoom in to see lots of details of what’s going on. But when it comes to knowing the “big picture” I’ve never found any particularly good techniques for automatically producing things that are terribly useful.
At some level it’s similar to the general problem of reverse engineering. You are shown some final machine code, chip design, or whatever. But now you want to go backwards to reconstruct the higher-level description that some human started from, that was somehow “compiled” to what you see.
In the traditional approach to engineering, where one builds things up incrementally, always somehow being able to foresee the consequences of what one’s building, this approach can in principle work. But if one does engineering by just searching the computational universe to find an optimal program (much like I searched possible axiom systems to find one for logic), then there’s no guarantee that there’s any “human story” or explanation behind this program.
It’s a similar problem in natural science. You see some elaborate set of things happening in some biological system. Can one “reverse engineer” these to find an “explanation” for them? Sometimes one might be able to say, for example, that evolution by natural selection would be likely to lead to something. Or that it’s just common in the computational universe and so is likely to occur. But there’s no guarantee that the natural world is set up in any way that necessarily allows human explanation.
Needless to say, when one makes models for things, one inevitably considers only the particular aspects that one’s interested in, and idealizes everything else away. And particularly in areas like medicine, it’s not uncommon to end up with some approximate model that’s a fairly shallow decision tree that’s easy to explain, at least as far as it goes.
The Nature of Explainability
What does it mean to say that something is explainable? Basically it’s that humans can understand it.
So what does it take for humans to understand something? Well, somehow we have to be able to “wrap our brains around it”. Let’s take a typical cellular automaton with complex behavior. A computer has no problem following each step in the evolution. And with immense effort a human could laboriously reproduce what a computer does.
But one wouldn’t say that means the human “understands” what the cellular automaton is doing. To get to that point, the human would have to be readily able to reason about how the cellular automaton behaves, at some high level. Or put another way, the human would have to be able to “tell a story” that other humans could readily understand, about how the cellular automaton behaves.
Is there a general way to do this? Well, no, because of computational irreducibility. But it can still be the case that certain features that humans choose to care about can be explained in some reduced, higher-level way.
How does this work? Well, in a sense it requires that some higher-level language be constructed that can describe the features one’s interested in. Looking at a typical cellular automaton pattern, one might try to talk not in terms of colors of huge numbers of individual cells, but instead in terms of the higher-level structures one can pick out. And the key point is that it’s possible to make at least a partial catalog of these structures: even though there are lots of details that don’t quite fit, there are still particular structures that occur often.
And if we were going to start “explaining” the behavior of the cellular automaton, we’d typically begin by giving the structures names, and then we’d start talking about what’s going on in terms of these named things.
The case of a cellular automaton has an interesting simplifying feature: because it operates according to simple deterministic rules, there are structures that just repeat identically. If we’re dealing with things in the natural world, for example, we typically won’t see this kind of identical repetition. Instead, it’ll just be that this tiger, say, is extremely similar to this other one, so we can call them both “tigers”, even though their atoms are not identical in their arrangement.
What’s the bigger picture of what’s going on? Well, it’s basically that we’re using the idea of symbolic representation. We’re saying that we can assign something—often a word—that we can use to symbolically describe a whole class of things, without always having to talk about all the detailed parts of each thing.
In effect it’s a kind of information compression: we’re using symbolic constructs to find a shorter way to describe what we’re interested in.
Let’s imagine we’ve generated a giant structure, say a mathematical one:
Solve[a x^4 + b x^3 + c x^2 + d x + e == 0, x]
Well, a first step is to generate a kind of internal higher-level representation. For example, we might find substructures that appear repeatedly. And we might then assign names to them. And then display a “skeleton” of the whole structure in terms of these names:
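One simple way to build such a skeleton can be sketched as follows. This is an illustrative approach under stated assumptions, not necessarily the method behind the picture: it tallies repeated subexpressions, names the three largest, and substitutes the names back. The names u1, u2, ... are hypothetical.

```wolfram
sol = Solve[a x^4 + b x^3 + c x^2 + d x + e == 0, x];
expr = x /. First[sol];  (* the first of the four roots *)

(* Tally all non-atomic subexpressions and keep those that occur more than once *)
repeated = Select[
   Tally[Cases[expr, Except[_?AtomQ], {1, Infinity}]],
   Last[#] > 1 &];

(* Take the three largest repeated pieces (fewer if there are not three) *)
top = First /@ Take[SortBy[repeated, -LeafCount[First[#]] &], UpTo[3]];

(* u1, u2, ... are hypothetical names introduced for the illustration *)
names = Table[Symbol["u" <> ToString[i]], {i, Length[top]}];

(* The "skeleton": the original expression with the named pieces substituted *)
skeleton = expr /. Thread[top -> names]

(* The dictionary that goes with the skeleton *)
Thread[names -> top]
```

Comparing LeafCount[expr] with LeafCount[skeleton] gives a rough measure of how much the naming has compressed the expression.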
And, yes, this kind of “dictionary compression”–like scheme is useful in bringing a first level of explainability.
But let’s go back to the proof of my axiom system. The lemmas that were generated in this proof are precisely set up to be elements that are used repeatedly (a bit like shared common subexpressions). But even having in effect factored them out, we’re still left with a proof that is not something that we humans can readily understand.
So how can we go further? Well, basically we have to come up with some yet-higher-level description. But what might this be?
The Concept of Concepts
If you’re trying to explain something to somebody, it’s a lot easier when there’s something similar that they’ve already understood. Imagine trying to explain a modern drone to someone from the Stone Age. It’d probably be pretty difficult. But explaining it to someone from 50 years ago, who’d already seen helicopters and model airplanes etc., would be a lot easier.
And ultimately the point is that when we explain something, we do it in some language that both we and whoever we’re explaining it to know. And the richer this language is, the fewer new elements we have to introduce in order to communicate whatever it is that we’re trying to explain.
There’s a pattern that’s been repeated throughout intellectual history. Some particular collection of things gets seen a bunch of times. And gradually it’s understood that these things are all somehow abstractly similar. And they can all be described in terms of some particular new concept, often referred to by some new word or phrase.
Let’s say one had seen things like water and blood and oil. Well, at some point one realizes that there’s a general concept of “liquid”, and all of these can be described as liquids. And once one has this concept, one can start reasoning in terms of it, and identifying more concepts—like, say, viscosity—that build on it.
When does it make sense to group things into a concept? Well, that’s a difficult question, which can’t ultimately be answered without foreseeing everything that might be done with that concept. And in practice, in the evolution of human language and human ideas there’s some kind of process of progressive approximation that goes on.
There’s a much more rapid recapitulation that happens in a modern machine learning system. Imagine taking all sorts of objects that one’s seen in the world, and just feeding them to FeatureSpacePlot and seeing what comes out. Well, if one gets definite clusters in feature space, then one might reasonably think that each of these clusters should be identified as corresponding to a “concept”, that we could for example label with a word.
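A minimal sketch of this kind of experiment, assuming a small set of rendered letters as the "objects" (the letters, fonts and sizes are illustrative choices, and fonts that are missing on a given system will fall back to a default):

```wolfram
(* Letters rendered in several fonts, as a small stand-in for "objects seen in the world" *)
glyphs = Flatten[Table[
    Rasterize[Style[ch, 40, FontFamily -> font], "Image"],
    {ch, {"A", "B", "C"}},
    {font, {"Times", "Helvetica", "Courier"}}]];

(* Lay the images out by learned features; tight groups are candidate "concepts" *)
FeatureSpacePlot[glyphs]

(* Ask for explicit clusters in the same spirit *)
FindClusters[glyphs]
```

Whether the clusters line up with the letters or with the fonts depends on which features dominate, which is a small instance of the question of what should count as a concept.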
Now, to be fair, what’s happening with FeatureSpacePlot—like in human intellectual development—is in some ways incremental. Because to lay the objects out in feature space, FeatureSpacePlot is using features that it’s learned how to extract from previous categorizations it knows about.
But, OK, given the world as it is, what are the best categories—or best concepts—one can use to describe things? Well, it’s an evolving story. And in fact breakthroughs—whether in science, technology or elsewhere—are very often precisely associated with the realization that some new category or concept can usefully be identified.
But in the actual evolution of our civilization, there’s a kind of spiral at work. First some particular concept is identified—say the idea of a program. And once some concept has been identified, people start using it, and thinking in terms of it. And pretty soon all sorts of new things have been constructed on the basis of that concept. But then another level of abstraction is identified, and new concepts get constructed, building on top of the previous one.
It’s pretty much the same story for the technology stack of modern civilization, and its “intellectual stack”. Both involve towers of concepts, and successive levels of abstraction.
The Problem of Education
In order for people to be able to communicate using some concept, they have to have learned about it. And, yes, there are some concepts (like object permanence) that humans automatically learn by themselves just by observing the natural world. But looking for example at a list of common words in modern English, it’s pretty clear that most of the concepts that we now use in modern civilization aren’t ones that people can just learn for themselves from the natural world.
Instead—much like a modern machine learning system—at the very least they need some “specially curated” experience of the world, organized to highlight particular concepts. And for more abstract areas (like mathematics) they probably need explicit exposure to the concepts themselves in their raw abstract forms.
But, OK, as the “intellectual stack” of civilization advances, will we always have to learn progressively more? We might worry that at some point our brains just won’t be able to keep up, and we’d have to add some kind of augmentation. But perhaps fortunately, I think it’s one of those cases where the problem can instead most likely be “solved in software”.
The issue is this: At any given point in history, there’s a certain set of concepts that are important in being able to operate in the world as it is at that time. And, yes, as civilization progresses new things are discovered, and new concepts are introduced. But there’s another process at work as well: new concepts bring new levels of abstraction, which typically subsume large numbers of earlier concepts.
We often see this in technology. There was a time when to operate a computer you needed to know all sorts of low-level details. But over time those got abstracted away, so all you need to know is some general concept. You click an icon and things start to happen—and you don’t have to understand operating systems, or interrupt handlers or schedulers, or any of those details.
Needless to say, the Wolfram Language provides a great example of all this. Because it goes to tremendous trouble to “automate out” lots of low-level details (for example about what specific algorithm to use) and let human users just think about things in terms of higher-level concepts.
Yes, there still need to be some people who understand the details “underneath” the abstraction (though I’m not sure how many flint knappers modern society needs). But mostly education can concentrate on teaching at a higher level.
There’s often an implicit assumption in education that to reach higher-level concepts one has to somehow recapitulate the history of how those concepts were arrived at. But usually—and perhaps always—this doesn’t seem to be true. In an extreme case, one might imagine that to teach about computers, one would have to recapitulate the history of mathematical logic. But actually we know that people can go straight to modern concepts of computing, without recapitulating any of the history.
But what is ultimately the understandability network of concepts? Are there concepts that can only be understood if one already understands other concepts? Given a particular ambient experience for a human (or particular background training for a neural network) there is presumably some ordering.
But I suspect that something analogous to computation universality probably implies that if one’s just dealing with a “raw brain” then one could start anywhere. So if some alien were exposed to category theory and little else from the very beginning, they’d no doubt build a network of concepts where this is at the root, and maybe what for us is basic arithmetic would be something only reached in their analog of math graduate school.
Of course, such an alien might form their technology stack and their built environment in a quite different way from us—much as the recent history of our own civilization might have been very different if computers had successfully been developed in the 1800s rather than in the mid-1900s.
The Progress of Mathematics
I’ve often wondered to what extent the historical trajectory of human mathematics is an “accident”, and to what extent it’s somehow inexorable. As I mentioned earlier, at the level of formal systems there are many possible axiom systems from which one could construct something that is formally like mathematics.
But the actual history of mathematics did not start with arbitrary axiom systems. It started—in Babylonian times—with efforts to use arithmetic for commerce and geometry for land surveying. And from these very practical origins, successive layers of abstraction have been added that have led eventually to modern mathematics—with for example numbers being generalized from positive integers, to rationals, to roots, to all integers, to decimals, to complex numbers, to algebraic numbers, to quaternions and so on.
Is there an inexorability to this progression of abstraction? I suspect to some extent there is. And probably it’s a similar story as with other kinds of concept formation. Given some stage that’s been reached, there are various things that can readily get studied, and after a while groups of them are seen to be examples of more general and abstract constructs—which then in turn define another stage from which new things can be studied.
Are there ways to break out of this cycle? One possibility would be through doing experiments in mathematics. Yes, one can systematically prove things about particular mathematical systems. But one can also just empirically notice mathematical facts—like Ramanujan’s observation that e^(π√163) is numerically close to an integer. And the question is: are things like this just “random facts of mathematics” or do they somehow fit into the whole “fabric of mathematics”?
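This kind of empirical fact is easy to check numerically. A minimal sketch, assuming the quantity in question is the well-known e^(π√163):

```wolfram
(* Evaluate to 35 significant digits: 18 before the decimal point, 17 after *)
N[Exp[Pi Sqrt[163]], 35]

(* Distance to the nearest integer, computed with enough precision to survive the subtraction *)
262537412640768744 - N[Exp[Pi Sqrt[163]], 40]
(* about 7.5*10^-13 *)
```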
One can ask the same kind of thing about questions in mathematics. Is the question of whether odd perfect numbers exist (which has been unanswered since Pythagoras) a core question in mathematics, or is it, in a sense, a random question that doesn’t connect into the fabric of mathematics?
Just like one can enumerate things like axiom systems, so also one can imagine enumerating possible questions in mathematics. But if one does this, I suspect there’s immediately an issue. Gödel’s Theorem establishes that in axiom systems like the one for arithmetic there are “formally undecidable” propositions, that can’t be proved or disproved from within the axiom system.
But the particular examples that Gödel constructed seemed far from anything that would arise naturally in doing mathematics. And for a long time it was assumed that somehow the phenomenon of undecidability was something that, while in principle present, wasn’t going to be relevant in “real mathematics”.
However, with my Principle of Computational Equivalence and my experience in the computational universe, I’ve come to the strong conclusion that this isn’t correct—and that instead undecidability is actually close at hand even in typical mathematics as it’s been practiced. Indeed, I won’t be surprised if a fair fraction of the current famous unsolved problems of mathematics (Riemann Hypothesis, P=NP, etc.) actually turn out to be in effect undecidable.
But if there’s undecidability all around, how come there’s so much mathematics that’s successfully been done? Well, I think it’s because the things that have been done have implicitly been chosen to avoid undecidability, basically just by virtue of the way mathematics has been built up. Because if what one’s doing is basically to form progressive levels of abstraction based on things one has shown are true, one’s basically setting up a path that’s going to be able to move forward without being forced into undecidability.
Of course, doing experimental mathematics or asking “random questions” may immediately land one in some area that’s full of undecidability. But at least so far in its history, this hasn’t been the way the mainstream discipline of mathematics has evolved.
So what about those “random facts of mathematics”? Well, it’s pretty much like in other areas of intellectual endeavor. “Random facts” don’t really get integrated into a line of intellectual development until some structure—and typically some abstract concepts—are built around them.
Nested patterns are another example. There are isolated examples of these in mosaics from the 1200s, but nobody really paid attention to them until the whole framework around nesting and fractals emerged in the 1980s.
It’s the same story over and over again: until abstract concepts around them have been identified, it’s hard to really think about new things, even when one encounters phenomena that exhibit them.
And so, I suspect, it is with mathematics: there’s a certain inevitable layering of abstract concept on top of abstract concept that defines the trajectory of mathematics. Is it a unique path? Undoubtedly not. In the vast space of possible mathematical facts, there are particular directions that get picked, and built along. But others could have been picked instead.
So does this mean that the subject matter of mathematics is inevitably dominated by historical accidents? Not as much as one might think. Because—as mathematics has discovered over and over again, starting with things like algebra and geometry—there’s a remarkable tendency for different directions and different approaches to wind up having equivalences or correspondences in the end.
And probably at some level this is a consequence of the Principle of Computational Equivalence, and the phenomenon of computational universality: even though the underlying rules (or underlying “language”) used in different areas of mathematics are different, there ends up being some way to translate between them—so that at the next level of abstraction the path that was taken no longer critically matters.
The Logic Proof and the Automation of Abstraction
OK, so let’s go back to the logic proof. How does it connect to typical mathematics? Well, right now, it basically doesn’t. Yes, the proof has the same nominal form as a standard mathematical proof. But it isn’t “human-mathematician friendly”. It’s all just mechanical details. It doesn’t connect to higher-level abstract concepts that a human mathematician can readily understand.
It would help a lot if we discovered that nontrivial lemmas in the proof already appeared in the mathematics literature. (I don’t think any of them do, but our theorem-searching capabilities haven’t gotten to the point where one can be sure.) But if they did appear, then this would likely give us a way to connect these lemmas to other things in mathematics, and in effect to identify a circle of abstract concepts around them.
But without that, how can the proof become explainable?
Well, maybe there’s just a different way to do the proof that’s fundamentally more connected to existing mathematics. But even given the proof as we have it now, one could imagine “building out” new concepts that would define a higher level of abstraction and put the proof in a more general context.
I’m not sure how to do either of these things. I’ve considered sponsoring a prize (analogous to my 2007 Turing machine prize) for “making the proof explainable”. But it’s not at all clear how one could objectively judge “explainability”. (Maybe one could ask for a 1-hour video that would successfully explain the proof to a typical mathematician—but this is definitely rather subjective.)
But just like we can automate things like finding aesthetic layouts for networks, perhaps we can automate the process of making a proof explainable. The proof as it is right now basically just says (without explanation), “Consider these few hundred lemmas”. But let’s say we could identify a modest number of “interesting” lemmas. Maybe we could somehow add these to our canon of known mathematics and then be able to use them to understand the proof.
There’s an analogy here with language design. In building up the Wolfram Language what I’ve basically done is to try to identify “lumps of computational work” that people will often want. Then we make these into built-in functions in the language, with particular names that people can use to refer to them.
A similar process goes on—though in a much less organized way—in the evolution of human natural languages. “Lumps of meaning” that turn out to be useful eventually get represented by words in the language. Sometimes they start as phrases constructed out of a few existing words. But the most impactful ones are typically sufficiently far away from anything that has come before that they just arrive as new words with potentially quite-hard-to-give definitions.
In the design of the Wolfram Language—with functions named with English words—I leverage the “ambient understanding” that comes from the English words (and sometimes from their meanings in common applications of computation).
One would want to do something similar in identifying lemmas to add to our canon of mathematics. Not only would one want to make sure that each lemma was somehow “intrinsically interesting”, but one would also want when possible to select lemmas that are “easy to reach” from existing known mathematical results and concepts.
But what does it mean for a lemma to be “intrinsically interesting”? I have to say that before I worked on A New Kind of Science, I assumed that there was great arbitrariness and historical accident in the choice of lemmas (or theorems) in any particular area of mathematics that get called out and given names in typical textbooks.
But when I looked in detail at theorems in basic logic, I was surprised to find something different. Let’s say one arranges all the true theorems of basic logic in order of their sizes (e.g. might come first; AND a bit later, and so on). When one goes through this list there’s lots of redundancy. Indeed, most of the theorems end up being trivial extensions of theorems that have already appeared in the list.
But just sometimes one gets to a theorem that essentially gives new information—and that can’t be proved from the theorems that have already appeared in the list. And here’s the remarkable fact: there are 14 such theorems, and they essentially correspond exactly with the theorems that are typically given names in textbooks of logic. (Here AND is ∧, OR is ∨, and NOT is ¬.)
In other words, at least in this case, the named or “interesting” theorems are the ones that give minimal statements of new information. (Yes, after a while, by this definition there will be no new information, because one will have encountered all the axioms needed to prove anything that can be proved—though one can go a bit further with this approach by starting to discuss limiting the complexity of proofs that are allowed.)
What about with NAND theorems, like the ones in the proof? Once again, one can arrange all true NAND theorems in order, and then find which of them can’t be proved from any earlier in the list:
NAND doesn’t have the same kind of historical tradition as AND, OR and NOT. (And there doesn’t seem to be any human language that, for example, has a single ordinary word for NAND.) But in the list of NAND theorems, the first highlighted one is easy to recognize as commutativity of NAND. After that, one really has to do a bit of translation to name the theorems: is like the law of double negation, is like the absorption law, is like “weakening”, and so on.
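To make the notion of a “true NAND theorem” concrete, here is a minimal sketch that tests candidate equations by brute force over all truth values. The helper name nandTheoremQ and the three candidates are illustrative assumptions, not the entries of the enumerated list; the second candidate is one way of writing double negation using only NAND.

```wolfram
(* True if lhs and rhs agree for every assignment of truth values to vars *)
nandTheoremQ[lhs_, rhs_, vars_List] := TautologyQ[Equivalent[lhs, rhs], vars]

(* commutativity of NAND: holds *)
nandTheoremQ[Nand[p, q], Nand[q, p], {p, q}]

(* a NAND form of double negation, since Nand[p, p] is Not[p]: holds *)
nandTheoremQ[Nand[Nand[p, p], Nand[p, p]], p, {p}]

(* associativity: fails, for example with p = q = True and r = False *)
nandTheoremQ[Nand[Nand[p, q], r], Nand[p, Nand[q, r]], {p, q, r}]
```

A check like this only says whether an equation is true. Deciding whether it can be proved from the theorems earlier in the list is a separate, harder step.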
But, OK, so if one’s going to learn just a few “key theorems” of NAND logic, which should they be? Perhaps they should be theorems that appear as “popular lemmas” in proofs.
Of course, there are many possible proofs of any given theorem. But let’s say we just use the particular proofs that FindEquationalProof generates. Then it turns out that in the proofs of the first thousand NAND theorems the single most popular lemma is , followed by lemmas like .
What are these? Well, for the particular methods that FindEquationalProof uses, they’re useful. But for us humans they don’t seem terribly helpful.
But what about popular lemmas that happen to be short? is definitely not the most popular lemma, but it is the shortest. is more popular, but longer. And then there are lemmas like .
But how useful are these lemmas? Here’s a way to test. Look at the first thousand NAND theorems, and see how much adding the lemmas shortens the proofs of these theorems (at least as found by FindEquationalProof):
is very successful, often cutting down the proof by nearly 100 steps. is much less successful; in fact, it actually sometimes seems to “confuse” FindEquationalProof, causing it to take more rather than fewer steps (visible as negative values in the plot). is OK at shortening, but not as good as . Though if one combines it with , the result is more consistent shortening.
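The kind of comparison described here can be sketched for a single theorem as follows. This is only an illustration under stated assumptions: the lemma (commutativity) and the target theorem (double negation) are chosen for the example, the lemma is supplied simply as an extra axiom, and the proof object is assumed to support the "ProofLength" property. It is not necessarily how the plot was produced.

```wolfram
(* the single axiom for NAND logic, with CenterDot standing for NAND *)
axiom = ForAll[{p, q, r},
   CenterDot[CenterDot[CenterDot[p, q], r],
     CenterDot[p, CenterDot[CenterDot[p, r], p]]] == r];

(* illustrative lemma (commutativity) and target theorem (double negation) *)
lemma = ForAll[{p, q}, CenterDot[p, q] == CenterDot[q, p]];
theorem = ForAll[{p}, CenterDot[CenterDot[p, p], CenterDot[p, p]] == p];

(* proof lengths without and with the lemma available *)
lengthWithout = FindEquationalProof[theorem, {axiom}]["ProofLength"];
lengthWith = FindEquationalProof[theorem, {axiom, lemma}]["ProofLength"];

(* positive if the lemma shortened the proof, negative if it lengthened it *)
lengthWithout - lengthWith
```

Repeating this over many theorems and plotting the differences would give a picture of the kind discussed above, including the cases where an added lemma makes the proof found longer.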
One could go on with this analysis, say including a comparison of how much shortening is produced by a given lemma, relative to how long its own proof was. But the problem is that if one adds several “useful lemmas”, like and , there are still plenty of long proofs—and thus a lot left to “understand”:
What Can One Understand?
There are different ways to create models of things. For a few hundred years, exact science was dominated by the idea of finding mathematical equations that could be solved to say how things should behave. But in pretty much the time since A New Kind of Science appeared, there’s been a strong shift to instead set up programs that can be run to say how things should behave.
Sometimes those programs are explicitly constructed for a particular purpose; sometimes they’re exhaustively searched for. And in modern times, at least one class of such programs is deduced using machine learning, essentially by going backwards from examples of how the system is known to behave.
OK, so with these different forms of modeling, how easy is it to “understand what’s going on”? With mathematical equations, it’s a big plus when it’s possible to find an “exact solution”—in which the behavior of the system can be represented by something like an explicit mathematical formula. And even when this doesn’t happen, it’s fairly common to be able to make at least some mathematical statements that are abstract enough to connect to other systems and other behaviors.
As I discussed above, with a program—like a cellular automaton—it can be a different story. Because it’s common to be thrust immediately into computational irreducibility, which ultimately limits how much one can ever hope to shortcut or “explain” what’s going on.
But what about with machine learning, and, say, with neural nets? At some level, the training of a neural net is like recapitulating inductive discovery in natural science. One’s trying to start from examples and deduce a model for how a system behaves. But then can one understand the model?
Again there are issues of computational irreducibility. But let’s talk about a case where we can at least imagine what it would look like to understand what’s going on.
Instead of using a neural net to model how some system behaves, let’s consider making a neural net that classifies some aspect of the world: say, takes images and classifies them according to what they’re images of (“boat”, “giraffe”, etc.). When we train the neural net, it’s learning to give correct final outputs. But potentially one can think of the way it does this as being to internally make a sequence of distinctions (a bit like playing a game of Twenty Questions) that eventually determines the correct output.
But what are those distinctions? Sometimes we can recognize some of them. “Is there a lot of blue in the image?”, for example. But most of the time they’re essentially features of the world that we humans don’t notice. Maybe there’s an alternative history of natural science where some of them would have shown up. But they’re not things that are part of our current canon of perception or analysis.
If we wanted to add them, we’d probably end up inventing words for them. But the situation is very similar to the one with the logic proof. An automated system has created things that it’s effectively using as “waypoints” in generating a result. But they’re not waypoints we recognize or relate to.
Once again, if we found that particular distinctions were very common for neural nets, we might decide that those are distinctions that are worth us humans learning, and adding to our standard canon of ways to describe the world.
Can we expect that a modest number of such distinctions would go a long way? It’s analogous to asking whether a modest number of theorems would go a long way in understanding something like the logic proof.
My guess is that the answer is fuzzy. If one looks, for example, at a large corpus of math papers, one can ask how common different theorems are. It turns out that the frequency of theorems follows an almost perfect Zipf law (with the Central Limit Theorem, the Implicit Function Theorem and Fubini’s Theorem as the top three). And it’s probably the same with distinctions that are “worth knowing”, or new theorems that are “worth knowing”.
Knowing a few will get one a certain distance, but there’ll be an infinite power-law tail, and one will never get to the end.
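A rough way to see how far “knowing a few” gets one is the following minimal sketch. It assumes an idealized Zipf law in which frequency is exactly proportional to 1/rank, and a hypothetical corpus containing a million distinct theorems; both numbers are illustrative, not measured.

```wolfram
(* share of all theorem mentions covered by the k most common theorems,
   assuming frequency proportional to 1/rank over n distinct theorems *)
coverage[k_, n_] := HarmonicNumber[k]/HarmonicNumber[n]

(* coverage from knowing the top 10, 100 and 1000 theorems *)
N[coverage[#, 10^6]] & /@ {10, 100, 1000}
```

Under these assumptions the three values come out at roughly 20%, 36% and 52%: each tenfold increase in what one knows adds about the same modest amount of coverage, which is the logarithmic growth characteristic of a power-law tail.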
The Future of Knowledge
Whether one looks at mathematics, science or technology, one sees the same basic qualitative progression of building a stack of increasing abstraction. It would be nice to be able to quantify this process. Perhaps one could look at how certain terms or descriptions that are common at one time later get subsumed into higher levels of abstraction, which then in turn have new terms or descriptions associated with them.
Maybe one could create an idealized model of this process using some formal model of computation, like Turing machines. Imagine that at the lowest level one has a basic Turing machine, with no abstraction. Now imagine selecting programs for this Turing machine according to some defined random process. Then run these programs and analyze them to see what “higher-level” model of computation can successfully reproduce the aggregate behavior of these programs without having to run each step in each program.
One might have thought that computational irreducibility would imply that this higher-level model of computation would inevitably be more complicated in its construction. But the key point is that we’re only trying to reproduce the aggregate behavior of the programs, not their individual behavior.
OK, but so then what happens if you iterate this process—essentially recapitulating idealized human intellectual history and building a progressive tower of abstraction?
Conceivably there’s some analogy to critical phenomena in physics, and to the renormalization group. And if so, one might imagine being able to identify a definite trajectory in the space of what amount to concept representation frameworks. What will the trajectory do?
Maybe it’ll have some kind of fixed-point behavior, representing the guess that at any point in history there are about the same number of abstract concepts that are worth learning—with new ones slowly being invented, and old ones being subsumed.
What might any of this mean for mathematics? One guess might be that any “random fact of mathematics”, say discovered empirically, would eventually be covered when some level of abstraction is reached. It’s not obvious how this process would work. After all, at any given level of abstraction, there are always new empirical facts to be “jumped to”. And it might very well be that the “rising tide of abstraction” would move only slowly compared to the rate at which such jumps could be made.
The Future of Understanding
OK, so what does all this mean for the future of understanding?
In the past, when humans looked, say, at the natural world, they had few pretensions to understand it. Sometimes they would personify certain aspects of it in terms of spirits or deities. But they saw it as just acting as it did, without any possibility for humans to understand in detail why.
But with the rise of modern science—and especially as more of our everyday existence came to be in built environments dominated by technology (or regulatory structures) that we had designed—these expectations changed. And as we look at computation or AI today, it seems unsettling that we might not be able to understand it.
But ultimately there’s always going to be a competition between what the systems in our world do, and what our brains are capable of computing about them. If we choose to interact only with systems that are computationally much simpler than our brains, then, yes, we can expect to use our brains to systematically understand what the systems are doing.
But if we actually want to make full use of the computational capabilities that our universe makes possible, then it’s inevitable that the systems we’re dealing with will be equivalent in their computational capabilities to our brains. And this means that—as computational irreducibility implies—we’ll never be able to systematically “outthink” or “understand” those systems.
But then how can we use them? Well, pretty much like people have always used systems from the natural world. Yes, we don’t know everything about how they work or what they might do. But at some level of abstraction we know enough to be able to see how to get purposes we care about achieved with them.
What about in an area like mathematics? In mathematics we’re used to building our stack of knowledge so that each step is something we can understand. But experimental mathematics (as well as things like automated theorem proving) makes it clear that there are places to go that won’t have this feature.
Will we call this “mathematics”? I think we should. But it’s a different tradition from what we’ve mostly used for the past millennium. It’s one where we can still build abstractions, and we can still construct new levels of understanding.
But somewhere underneath there will be all sorts of computational irreducibility that we’ll never really be able to bring into the realm of human understanding. And that’s basically what’s going on in the proof of my little axiom for logic. It’s an early example of what I think will be the dominant experience in mathematics—and a lot else—in the future.
But what is its story, and how did we come to adopt it as our symbol?
The Origins of Spikey
Back in 1987, when we were developing the first version of Mathematica, one of our innovations was being able to generate resolution-independent 3D graphics from symbolic descriptions. In our early demos, this let us create wonderfully crisp images of Platonic solids. But as we approached the release of Mathematica 1.0, we wanted a more impressive example. So we decided to take the last of the Platonic solids—the icosahedron—and then make something more complex by a certain amount of stellation (or, more correctly, cumulation). (Yes, that’s what the original notebook interface looked like, 30 years ago…)
At first this was just a nice demo that happened to run fast enough on the computers we were using back then. But quite soon the 3D object it generated began to emerge as the de facto logo for Mathematica. And by the time Mathematica 1.0 was released in 1988, the stellated icosahedron was everywhere:
In time, tributes to our particular stellation started appearing—in various materials and sizes:
But just a year after we released Mathematica 1.0, we were getting ready to release Mathematica 1.2, and to communicate its greater sophistication, we wanted a more sophisticated logo. One of our developers, Igor Rivin, had done his PhD on polyhedra in hyperbolic space—and through his efforts a hyperbolic icosahedron adorned our Version 1.2 materials:
My staff gave me an up-to-date-Spikey T-shirt for my 30th birthday in 1989, with a quote that I guess even after all these years I’d still say:
After Mathematica 1.2, our marketing materials had a whole collection of hyperbolic Platonic solids, but by the time Version 2.0 arrived in 1991 we’d decided our favorite was the hyperbolic dodecahedron:
Looking through my 1991 archives today, I find some “explanatory” code (by Ilan Vardi)—and it’s nice to see that it all just runs in our latest Wolfram Language (though now it can be written a bit more elegantly):
Over the years, it became a strange ritual that when we were getting ready to launch a new integer version of Mathematica, we’d have very earnest meetings to “pick our new Spikey”. Sometimes there would be hundreds to choose from, generated (most often by Michael Trott) using all kinds of different algorithms:
But though the color palettes evolved, and the Spikeys often reflected (though perhaps in some subtle way) new features in the system, we’ve now had a 30-year tradition of variations on the hyperbolic dodecahedron:
In more recent times, it’s become a bit more streamlined to explore the parameter space—though by now we’ve accumulated hundreds of parameters:
A hyperbolic dodecahedron has 20 points—ideal for celebrating the 20th anniversary of Mathematica in 2008. But when we wanted something similar for the 25th anniversary in 2013 we ran into the problem that there’s no regular polyhedron with 25 vertices. But (essentially using SpherePoints[25]) we managed to create an approximate one—and made a 3D printout of it for everyone in our company, sized according to how long they’d been with us:
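A minimal sketch of the idea, assuming one simply takes the convex hull of the points, might look like this. It is an illustration of using SpherePoints for the purpose, not the actual construction of the anniversary object.

```wolfram
(* 25 points spread approximately uniformly over the unit sphere *)
pts = SpherePoints[25];

(* the convex hull is a polyhedron with these points as its vertices *)
hull = ConvexHullMesh[pts];

(* number of vertices of the hull *)
MeshCellCount[hull, 0]
```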
Enter Wolfram|Alpha
In 2009, we were getting ready to launch Wolfram|Alpha—and it needed a logo. There were all sorts of concepts:
We really wanted to emphasize that Wolfram|Alpha works by doing computation (rather than just, say, searching). And for a while we were keen on indicating this with some kind of gear-like motif. But we also wanted the logo to be reminiscent of our longtime Mathematica logo. So this led to one of those classic “the-CEO-must-be-crazy” projects: make a gear mechanism out of spikey-like forms.
Longtime Mathematica and Wolfram Language user (and Hungarian mechanical engineer) Sándor Kabai helped out, suggesting a “Spikey Gear”:
And then, in a throwback to the Version 2 intersecting tetrahedra, he came up with this:
In 2009, 3D printing was becoming very popular, and we thought it would be nice for Wolfram|Alpha to have a logo that was readily 3D printable. Hyperbolic polyhedra were out: their spikes would break off, and could be dangerous. (And something like the Mathematica Version 4 Spikey, with “safety spikes”, lacked elegance.)
For a while we fixated on the gears idea. But eventually we decided it’d be worth taking another look at ordinary polyhedra. But if we were going to adopt a polyhedron, which one should it be?
There are of course an infinite number of possible polyhedra. But to make a nice logo, we wanted a symmetrical and somehow “regular” one. The five Platonic solids—all of whose faces are identical regular polygons—are in effect the “most regular” of all polyhedra:
Then there are the 13 Archimedean solids, all of whose vertices are identical, and whose faces are regular polygons but of more than one kind:
Over the years that Eric Weisstein was assembling what in 1999 became MathWorld, he made an effort to include articles on as many notable polyhedra as possible. And in 2006, as part of putting every kind of systematic data into Mathematica and the Wolfram Language, we started including polyhedron data from MathWorld. The result was that when Version 6.0 was released in 2007, it included the function PolyhedronData that contained extensive data on 187 notable polyhedra:
It had always been possible to generate regular polyhedra in Mathematica and the Wolfram Language, but now it became easy. With the release of Version 6.0 we also started the Wolfram Demonstrations Project, which quickly began accumulating all sorts of polyhedron-related Demonstrations.
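For example, here is a minimal sketch of how the classes mentioned above can be listed from the built-in data. The total count in a current version may differ from the 187 in Version 6.0.

```wolfram
(* names of the polyhedra in two of the classes mentioned above *)
PolyhedronData["Platonic"]
PolyhedronData["Archimedean"]

(* their counts: 5 and 13 *)
Length /@ {PolyhedronData["Platonic"], PolyhedronData["Archimedean"]}

(* total number of polyhedra in the data of the version being run *)
Length[PolyhedronData[]]
```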
So this was the background when in early 2009 we wanted to “pick a polyhedron” for Wolfram|Alpha. It all came to a head on the evening of Friday, February 6, when I decided to just take a look at things myself.
I still have the notebook I used, and it shows that at first I tried out the rather dubious idea of putting spheres at the vertices of polyhedra:
But (as the Notebook History system recorded) just under two minutes later I’d generated pure polyhedron images—all in the orange we thought we were going to use for the logo:
The polyhedra were arranged in alphabetical order by name, and on line 28, there it was—the rhombic hexecontahedron:
A couple of minutes later, I had homed in on the rhombic hexecontahedron, and at exactly 12:24:24 on February 9, 2009, I rotated it into essentially the symmetrical orientation we now use:
I wondered what it would look like in gray scale or in silhouette, and four minutes later I used ColorSeparate to find out:
I immediately started writing an email—which I fired off at 12:32am:
“I [...] rather like the RhombicHexecontahedron ….
It’s an interesting shape … very symmetrical … I think it might have
about the right complexity … and its silhouette is quite reasonable.”
I’d obviously just copied “RhombicHexecontahedron” from the label in the notebook (and I doubt I could have spelled “hexecontahedron” correctly yet). And indeed from my archives I know that this was the very first time I’d ever written the name of what was destined to become my all-time-favorite polyhedron.
It was dead easy in the Wolfram Language to get a picture of a rhombic hexecontahedron to play with:
PolyhedronData["RhombicHexecontahedron"]
And by Monday it was clear that the rhombic hexecontahedron was a winner—and our art department set about rendering it as the Wolfram|Alpha logo. We tried some different orientations, but soon settled on the symmetrical “head-on” one that I’d picked. (We also had to figure out the best “focal length”, giving the best foreshortening.)
Like our Version 1.0 stellated icosahedron, the rhombic hexecontahedron has 60 faces. But somehow, with its flower-like five-fold “petal” arrangements, it felt much more elegant. It took a fair amount of effort to find the best facet shading in a 2D rendering to reflect the 3D form. But soon we had the first official version of our logo:
It quickly started to show up everywhere, and in a nod to our earlier ideas, it often appeared on a “geared background”:
A few years later, we tweaked the facet shading slightly, giving what is still today the logo of Wolfram|Alpha:
The Rhombic Hexecontahedron
What is a rhombic hexecontahedron? It’s called a “hexecontahedron” because it has 60 faces, and ἑξηκοντα (hexeconta) is the Greek word for 60. (Yes, the correct spelling is with an “e”, not an “a”.) It’s called “rhombic” because each of its faces is a rhombus. Actually, its faces are golden rhombuses, so named because their diagonals are in the golden ratio ϕ = (1+√5)/2 ≃ 1.618:
The rhombic hexecontahedron is a curious interpolation between an icosahedron and a dodecahedron (with an icosidodecahedron in the middle). The 12 innermost points of a rhombic hexecontahedron form a regular icosahedron, while the 20 outermost points form a regular dodecahedron. The 30 “middle points” form an icosidodecahedron, which has 32 faces (20 “icosahedron-like” triangular faces, and 12 “dodecahedron-like” pentagonal faces):
Altogether, the rhombic hexecontahedron has 62 vertices and 120 edges (as well as 120−62+2=60 faces). There are 3 kinds of vertices (“inner”, “middle” and “outer”), corresponding to the 12+30+20 vertices of the icosahedron, icosidodecahedron and dodecahedron. These types of vertices have respectively 5, 4 and 3 edges meeting at them. Each golden rhombus face of the rhombic hexecontahedron has one “inner” vertex where 5 edges meet, one “outer” vertex where 3 edges meet and two “middle” vertices where 4 edges meet. The inner and outer vertices are the acute vertices of the golden rhombuses; the middle ones are the obtuse vertices.
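These counts can be checked against the built-in data and against each other. The following is a minimal sketch; the variable names are just for illustration.

```wolfram
name = "RhombicHexecontahedron";

(* vertex, edge and face counts from the built-in data *)
{v, e, f} = Map[PolyhedronData[name, #] &, {"VertexCount", "EdgeCount", "FaceCount"}]

(* Euler's formula for a polyhedron with the topology of a sphere: v - e + f = 2 *)
v - e + f

(* 12 inner vertices with 5 edges each, 30 middle with 4 and 20 outer with 3;
   every edge is counted at both of its ends, so the total must equal 2 e *)
12*5 + 30*4 + 20*3 == 2 e
```

The last line is a useful consistency check: only the assignment of 5 edges to the inner vertices and 3 to the outer ones gives the required total of 240.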
The acute vertices of the golden rhombuses have angle 2 tan⁻¹(ϕ⁻¹) ≈ 63.43°, and the obtuse ones 2 tan⁻¹(ϕ) ≈ 116.57°. The angles allow the rhombic hexecontahedron to be assembled from Zometool using only red struts (the same as for a dodecahedron):
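The two angles follow directly from the golden ratio of the diagonals, since half of each angle is the arctangent of the ratio of the half-diagonals. A minimal numerical check:

```wolfram
(* acute and obtuse angles of a golden rhombus, in degrees *)
angles = N[{2 ArcTan[1/GoldenRatio], 2 ArcTan[GoldenRatio]}/Degree]

(* adjacent angles of any rhombus sum to 180 degrees *)
Total[angles]
```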
Across the 120 edges of the rhombic hexecontahedron, the 60 “inward-facing hinges” have dihedral angle 4𝜋/5=144°, and the 60 “outward-facing” ones have dihedral angle 2𝜋/5=72°. The solid angles subtended by the inner and outer vertices are 𝜋/5 and 3𝜋/5.
To actually draw a rhombic hexecontahedron, one needs to know 3D coordinates for its vertices. A convenient way to get these is to use the fact that the rhombic hexecontahedron is invariant under the icosahedral group, so that one can start with a single golden rhombus and just apply the 60 matrices that form a 3D representation of the icosahedral group. This gives for example final vertex coordinates {±ϕ,±1,0}, {±1,±ϕ,±(1+ϕ)}, {±2ϕ,0,0}, {±ϕ,±(1+2ϕ),0}, {±(1+ϕ),±(1+ϕ),±(1+ϕ)}, and cyclic permutations of these, with each possible sign being taken.
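As a check on the coordinate families just listed, here is a minimal sketch that expands them with all cyclic permutations and sign choices and counts the distinct points. It builds the vertices directly from the listed coordinates rather than from the group matrices, and the names base, cyclic and signs are illustrative.

```wolfram
phi = GoldenRatio;

(* one representative of each coordinate family listed above *)
base = {{phi, 1, 0}, {1, phi, 1 + phi}, {2 phi, 0, 0},
   {phi, 1 + 2 phi, 0}, {1 + phi, 1 + phi, 1 + phi}};

(* the three cyclic permutations of a coordinate triple *)
cyclic[v_] := NestList[RotateLeft, v, 2]

(* all eight sign choices *)
signs = Tuples[{1, -1}, 3];

(* apply every permutation and sign choice, then drop duplicates *)
vertices = Union[Flatten[Table[s c, {v, base}, {c, cyclic[v]}, {s, signs}], 2]];

(* number of distinct vertices *)
Length[vertices]

(* distances from the origin, with the number of vertices at each *)
KeySort[Counts[Round[N[Norm /@ vertices], 0.001]]]
```

The count should be 62, and the distances should fall into three groups of 12, 30 and 20 points, matching the inner, middle and outer vertices described above.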
In addition to having faces that are golden rhombuses, the rhombic hexecontahedron can be constructed out of 20 golden rhombohedra (whose 6 faces are all golden rhombuses):
There are other ways to build rhombic hexecontahedra out of other polyhedra. Five intersecting cubes can do it, as can 182 dodecahedra touching at corners:
Rhombic hexecontahedra don’t tessellate space. But they do interlock in a satisfying way (and, yes, I’ve seen tens of paper ones stacked up this way):
There are also all sorts of ring and other configurations that can be made with them:
Closely related to the rhombic hexecontahedron (“RH”) is the rhombic triacontahedron (“RT”). Both the RH and the RT have faces that are golden rhombuses. But the RH has 60, while the RT has 30. Here’s what a single RT looks like:
RTs fit beautifully into the “pockets” in RHs, leading to forms like this:
As soon as we’d settled on a rhombic hexecontahedron Spikey, we started making 3D printouts of it. (It’s now very straightforward to do this with Printout3D[PolyhedronData[...]], and there are also precomputed models available at outside services.)
But as we prepared for the first post-Wolfram|Alpha holiday season, we wanted to give everyone a way to make their own 3D Spikey. At first we explored using sets of 20 plastic-covered golden rhombohedral magnets. But they were expensive, and had a habit of not sticking together well enough at “Spikey scale”.
So that led us to the idea of making a Spikey out of paper, or thin cardboard. Our first thought was then to create a net that could be folded up to make a Spikey:
My daughter Catherine was our test folder (and still has the object that was created), but it was clear that there were a lot of awkward hard-to-get-there-from-here situations during the folding process. There are a huge number of possible nets (there are already 43,380 even for the dodecahedron and icosahedron)—and we thought that perhaps one could be found that would work better:
But after failing to find any such net, we then had a new (if obvious) idea: since the final structure would be held together by tabs anyway, why not just make it out of multiple pieces? We quickly realized that the pieces could be 12 identical copies of this:
Making the instructions easy to understand was an interesting challenge, but after a few iterations they’re now well debugged, and easy for anyone to follow:
And with paper Spikeys in circulation, our users started sending us all sorts of pictures of Spikeys “on location”:
The Path to the Rhombic Hexecontahedron
It’s not clear who first identified the Platonic solids. Perhaps it was the Pythagoreans (particularly living near so many polyhedrally shaped pyrite crystals). Perhaps it was someone long before them. Or perhaps it was a contemporary of Plato’s named Theaetetus. But in any case, by the time of Plato (≈400 BC), it was known that there are five Platonic solids. And when Euclid wrote his Elements (around 300 BC) perhaps the pinnacle of it was the proof that these five are all there can be. (This proof is notably the one that takes the most steps—32—from the original axioms of the Elements.)
Platonic solids were used for dice and ornaments. But they were also given a central role in thinking about nature, with Plato for example suggesting that perhaps everything could in some sense be made of them: earth of cubes, air of octahedra, water of icosahedra, fire of tetrahedra, and the heavens (“ether”) of dodecahedra.
But what about other polyhedra? In the 4th century AD, Pappus wrote that several centuries earlier, Archimedes had discovered 13 other “regular polyhedra”—presumably what are now called the Archimedean solids—though the details were lost. And for a thousand years little more seems to have been done with polyhedra. But in the 1400s, with the Renaissance starting up, polyhedra were suddenly in vogue again. People like Leonardo da Vinci and Albrecht Dürer routinely used them in art and design, rediscovering some of the Archimedean solids—as well as finding some entirely new polyhedra, like the icosidodecahedron.
But the biggest step forward for polyhedra came with Johannes Kepler at the beginning of the 1600s. It all started with an elegant, if utterly wrong, theory. Theologically convinced that the universe must be constructed with mathematical perfection, Kepler suggested that the six planets known at the time might move on nested spheres geometrically arranged so as to just fit the suitably ordered five Platonic solids between them:
In his 1619 book Harmonices mundi (“Harmony of the World”) Kepler argued that many features of music, planets and souls operate according to similar geometric ratios and principles. And to provide raw material for his arguments, Kepler studied polygons and polyhedra, being particularly interested in finding objects that somehow formed complete sets, like the Platonic solids.
He studied possible “sociable polygons”, that together could tile the plane—finding, for example, his “monster tiling” (with pentagons, pentagrams and decagons). He studied “star polyhedra” and found various stellations of the Platonic solids (and in effect the Kepler–Poinsot polyhedra). In 1611 he had published a small book about the hexagonal structure of snowflakes, written as a New Year’s gift for a sometime patron of his. And in this book he discussed the 3D packing of spheres (and spherical atoms), suggesting that what’s now called the Kepler packing (and routinely seen in the packing of fruit in grocery stores) was the densest possible packing (a fact that wasn’t formally proved until into the 2000s—as it happens, with the help of Mathematica).
There’s a polyhedron lurking in the Kepler packing. Touching every sphere in the packing are exactly 12 others. And joining the centers of adjacent ones of these gives a polyhedron called the rhombic dodecahedron, with 14 vertices and 12 faces:
Having discovered this, Kepler started looking for other “rhombic polyhedra”. The rhombic dodecahedron he found has rhombuses whose diagonals are in the ratio √2:1. But by 1619 Kepler had also looked at golden rhombuses—and had found the rhombic triacontahedron, and drew a nice picture of it in his book, right next to the rhombic dodecahedron:
Kepler actually had an immediate application for these rhombic polyhedra: he wanted to use them, along with the cube, to make a nested-spheres model that would fit the orbital periods of the four moons of Jupiter that Galileo had discovered in 1610.
Why didn’t Kepler discover the rhombic hexecontahedron? I think he was quite close. He looked at non-convex “star” polyhedra. He looked at rhombic polyhedra. But I guess for his astronomical theories he was satisfied with the rhombic triacontahedron, and looked no further.
In the end, of course, it was Kepler’s laws—which have nothing to do with polyhedra—that were Kepler’s main surviving contribution to astronomy. But Kepler’s work on polyhedra—albeit done in the service of a misguided physical theory—stands as a timeless contribution to mathematics.
Over the next three centuries, more polyhedra, with various forms of regularity, were gradually found—and by the early 1900s there were many known to mathematicians:
But, so far as I can tell, the rhombic hexecontahedron was not among them. And instead its discovery had to await the work of a certain Helmut Unkelbach. Born in 1910, he got a PhD in math at the University of Munich in 1937 (after initially studying physics). He wrote several papers about conformal mapping, and—perhaps through studying mappings of polyhedral domains—was led in 1940 to publish a paper (in German) about “The Edge-Symmetric Polyhedra”.
His goal, he explains, is to exhaustively study all possible polyhedra that satisfy a specific, though new, definition of regularity: that their edges are all the same length, and these edges all lie in some symmetry plane of the polyhedron. The main result of his paper is a table containing 20 distinct polyhedra with that property:
Most of these polyhedra Unkelbach knew to already be known. But Unkelbach singles out three types that he thinks are new: two hexakisoctahedra (or disdyakis dodecahedra), two hexakisicosahedra (or disdyakis triacontahedra), and what he calls the Rhombenhexekontaeder, or in English, the rhombic hexecontahedron. He clearly considers the rhombic hexecontahedron his prize specimen, including a photograph of a model he made of it:
How did he actually “derive” the rhombic hexecontahedron? Basically, he started from a dodecahedron, and identified its two types of symmetry planes:
Then he subdivided each face of the dodecahedron:
Then he essentially considered pushing the centers of each face in or out to a specified multiple α of their usual distance from the center of the dodecahedron:
For α < 1, the resulting faces don’t intersect. But for most values of α, they don’t have equal-length sides. That only happens for one specific value of α—and in that case the resulting polyhedron is exactly the rhombic hexecontahedron.
Unkelbach actually viewed his 1940 paper as a kind of warmup for a study of more general “k-symmetric polyhedra” with looser symmetry requirements. But it was already remarkable enough that a mathematics journal was being published at all in Germany after the beginning of World War II, and soon after the paper, Unkelbach was pulled into the war effort, spending the next few years designing acoustic-homing torpedoes for the German navy.
Unkelbach never published on polyhedra again, and died in 1968. After the war he returned to conformal mapping, but also started publishing on the idea that mathematical voting theory was the key to setting up a well-functioning democracy, and that mathematicians had a responsibility to make sure it was used.
But even though the rhombic hexecontahedron appeared in Unkelbach’s 1940 paper, it might well have languished there forever, were it not for the fact that in 1946 a certain H. S. M. (“Donald”) Coxeter wrote a short review of the paper for the (fairly new) American Mathematical Reviews. His review catalogs the polyhedra mentioned in the paper, much as a naturalist might catalog new species seen on an expedition. The high point is what he describes as “a remarkable rhombic hexecontahedron”, for which he reports that “its faces have the same shape as those of the triacontahedron, of which it is actually a stellation”.
Polyhedra were not exactly a hot topic in the mathematics of the mid-1900s, but Coxeter was their leading proponent—and was connected in one way or another to pretty much everyone who was working on them. In 1948 he published his book Regular Polytopes. It describes in a systematic way a variety of families of regular polyhedra, in particular showing the great stellated triacontahedron (or great rhombic triacontahedron)—which effectively contains a rhombic hexecontahedron:
But Coxeter didn’t explicitly mention the rhombic hexecontahedron in his book, and while it picked up a few mentions from polyhedron aficionados, the rhombic hexecontahedron remained a basically obscure (and sometimes misspelled) polyhedron.
Quasicrystals
Crystals had always provided important examples of polyhedra. But by the 1800s, with atomic theory increasingly established, there began to be serious investigation of crystallography, and of how atoms are arranged in crystals. Polyhedra made a frequent appearance, in particular in representing the geometries of repeating blocks of atoms (“unit cells”) in crystals.
By 1850 it was known that there were basically only 14 possible such geometries; among them is one based on the rhombic dodecahedron. A notable feature of these geometries is that they all have specific two-, three-, four- or six-fold symmetries—essentially a consequence of the fact that only certain polyhedra can tessellate space, much as in 2D the only regular polygons that can tile the plane are squares, triangles and hexagons.
But what about for non-crystalline materials, like liquids or glasses? People had wondered since before the 1930s whether at least approximate five-fold symmetries could exist there. You can’t tessellate space with regular icosahedra (which have five-fold symmetry), but maybe you could at least have icosahedral regions with little gaps in between.
None of this was settled when in 1982 electron diffraction crystallography on a rapidly cooled aluminum-manganese material effectively showed five-fold symmetry. Within a year or so there were electron microscope pictures of grains that were shaped like rhombic triacontahedra:
And as people imagined how these triacontahedra could pack together, the rhombic hexecontahedron soon made its appearance—as a “hole” in a cluster of 12 rhombic triacontahedra:
At first it was referred to as a “20-branched star”. But soon the connection with the polyhedron literature was made, and it was identified as a rhombic hexecontahedron.
Meanwhile, the whole idea of making things out of rhombic elements was gaining attention. Michael Longuet-Higgins, longtime oceanographer and expert on how wind makes water waves, jumped on the bandwagon, in 1987 filing a patent for a toy based on magnetic rhombohedral blocks, that could make a “Kepler Star” (rhombic hexecontahedron) or a “Kepler Ball” (rhombic triacontahedron):
And—although I only just found this out—the rhombohedral blocks that we considered in 2009 for widespread “Spikey making” were actually produced by Dextro Mathematical Toys (aka Rhombo.com), operating out of Longuet-Higgins’s house in San Diego.
The whole question of what can successfully tessellate space—or even tile the plane—is a complicated one. In fact, the general problem of whether a particular set of shapes can be arranged to tile the plane has been known since the early 1960s to be formally undecidable. (One might verify that 1000 of these shapes can fit together, but it can take arbitrarily more computational effort to figure out the answer for more and more of the shapes.)
People like Kepler presumably assumed if a set of shapes was going to tile the plane, they must be able to do so in a purely repetitive pattern. But following the realization that the general tiling problem is undecidable, Roger Penrose in 1974 came up with two shapes that could successfully tile the plane, but not in a repetitive way. By 1976 Penrose (as well as Robert Ammann) had come up with a slightly simpler version:
And, yes, the shapes here are rhombuses, though not golden rhombuses. But with angles 36°,144° and 72°,108°, they arrange with 5- and 10-fold symmetry.
By construction, these rhombuses (or, more strictly, shapes made from them) can’t form a repetitive pattern. But it turns out they can form a pattern that can be built up in a systematic, nested way:
And, yes, the middle of step 3 in this sequence looks rather like our flattened Spikey. But it’s not exactly right; the aspect ratios of the outer rhombuses are off.
Looking at it from above, it looks exactly like the beginning of the nested construction of the Penrose tiling. If one keeps going, one gets the Penrose tiling:
Looked at “from the side” in 3D, one can tell it’s still just identical golden rhombuses:
Putting four of these “Wieringa roofs” together one can form exactly the rhombic hexecontahedron:
But what’s the relation between these nested constructions and the actual way physical quasicrystals form? It’s not yet clear. But it’s still neat to see even hints of rhombic hexecontahedra showing up in nature.
And historically it was through their discussion in quasicrystals that Sándor Kabai came to start studying rhombic hexecontahedra with Mathematica, which in turn led Eric Weisstein to find out about them, which in turn led them to be in Mathematica and the Wolfram Language, which in turn led me to pick one for our logo. And in recognition of this, we print the nestedly constructed Penrose tiling on the inside of our paper Spikey:
Flattening Spikey
Our Wolfram|Alpha Spikey burst onto the scene in 2009 with the release of Wolfram|Alpha. But we still had our long-running and progressively evolving Mathematica Spikey too. So when we built a new European headquarters in 2011 we had not just one, but two Spikeys vying to be on it.
Our longtime art director Jeremy Davis came up with a solution: take one Spikey, but “idealize” it, using just its “skeleton”. It wasn’t hard to decide to start from the rhombic hexecontahedron. But then we flattened it (with the best ratios, of course)—and finally ended up with the first implementation of our now-familiar logo:
The Brazilian Surprise
When I started writing this piece, I thought the story would basically end here. After all, I’ve now described how we picked the rhombic hexecontahedron, and how mathematicians came up with it in the first place. But before finishing the piece, I thought, “I’d better look through all the correspondence I’ve received about Spikey over the years, just to make sure I’m not missing anything.”
And that’s when I noticed an email from June 2009, from an artist in Brazil named Yolanda Cipriano. She said she’d seen an article about Wolfram|Alpha in a Brazilian news magazine—and had noticed the Spikey—and wanted to point me to her website. It was now more than nine years later, but I followed the link anyway, and was amazed to find this:
I read more of her email: “Here in Brazil this object is called ‘Giramundo’ or ‘Flor Mandacarú’ (Mandacaru Flower) and it is an artistic ornament made with [tissue paper]”.
What?! There was a Spikey tradition in Brazil, and all these years we’d never heard about it? I soon found other pictures on the web. Only a few of the Spikeys were made with paper; most were fabric—but there were lots of them:
I emailed a Brazilian friend who’d worked on the original development of Wolfram|Alpha. He quickly responded “These are indeed familiar objects… and to my shame I was never inquisitive enough to connect the dots”—then sent me pictures from a local arts and crafts catalog:
But now the hunt was on: what were these things, and where had they come from? Someone at our company volunteered that actually her great-grandmother in Chile had made such things out of crochet—and always with a tail. We started contacting people who had put up pictures of “folk Spikeys” on the web. Quite often all they knew was that they got theirs from a thrift shop. But sometimes people would say that they knew how to make them. And the story always seemed to be the same: they’d learned how to do it from their grandmothers.
The typical way to build a folk Spikey—at least in modern times—seems to be to start off by cutting out 60 cardboard rhombuses. The next step is to wrap each rhombus in fabric—and finally to stitch them all together:
OK, but there’s an immediate math issue here. Are these people really correctly measuring out 63° golden rhombuses? The answer is typically no. Instead, they’re making 60° rhombuses out of pairs of equilateral triangles—just like the standard diamond shapes used in quilts. So how then does the Spikey fit together? Well, 60° is not far from 63°, and if you’re sewing the faces together, there’s enough wiggle room that it’s easy to make the polyhedron close even without the angles being precisely right. (There are also “quasi-Spikeys” that—as in Unkelbach’s construction—don’t have rhombuses for faces, but instead have pointier “outside triangles”.)
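For reference, the “63°” comes from the fact that the diagonals of a golden rhombus are in the ratio ϕ:1, so that half of the acute angle has tangent 1/ϕ. A minimal check, using nothing beyond built-in constants:

```wolfram
(* acute angle of a golden rhombus, in degrees *)
N[2 ArcTan[1/GoldenRatio]/Degree]
(* 63.4349 *)
```

So a 60° quilt diamond is off by a little under 3.5° at each acute corner, which is the slack that the sewing has to absorb.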
Folk Spikeys on the web are labeled in all sorts of ways. The most common is as “Giramundos”. But quite often they are called “Estrelas da Felicidade” (“stars of happiness”). Confusingly, some of them are also labeled “Moravian stars”—but actually, Moravian stars are different and much pointier polyhedra (most often heavily augmented rhombicuboctahedra) that happen to have recently become popular, particularly for light fixtures.
Despite quite a bit of investigation, I still don’t know what the full history of the “folk Spikey” is. But here’s what I’ve found out so far. First, at least what survives of the folk Spikey tradition is centered around Brazil (even though we have a few stories of other appearances). Second, the tradition seems to be fairly old, definitely dating from well before 1900 and quite possibly several centuries earlier. So far as I can tell—as is common with folk art—it’s a purely oral tradition, and so far I haven’t found any real historical documentation about it.
My best information has come from a certain Paula Guerra, who sold folk Spikeys at a tourist-oriented cafe she operated a decade ago in the historic town of São Luiz do Paraitinga. She said people would come into her cafe from all over Brazil, see the folk Spikeys and say, “I haven’t seen one of those in 50 years…”
Paula herself learned about folk Spikeys (she calls them “stars”) from an older woman living on a multigenerational local family farm, who’d been making them since she was a little girl, and had been taught how to do it by her mother. Her procedure—which seems to have been typical—was to get cardboard from anywhere (originally, things like hat boxes), then to cover it with fabric scraps, usually from clothes, then to sew the whole perhaps-6″-across object together.
How old is the folk Spikey? Well, we only have oral tradition to go by. But we’ve tracked down several people who saw folk Spikeys being made by relatives who were born around 1900. Paula said that a decade ago she’d met an 80-year-old woman who told her that when she was growing up on a 200-year-old coffee farm there was a shelf of folk Spikeys from four generations of women.
At least part of the folk Spikey story seems to center around a mother-daughter tradition. Mothers, it is said, often made folk Spikeys as wedding presents when their daughters went off to get married. Typically the Spikeys were made from scraps of clothes and other things that would remind the daughters of their childhood—a bit like how quilts are sometimes made for modern kids going to college.
But for folk Spikeys there was apparently another twist: it was common that before a Spikey was sewn up, a mother would put money inside it, for her daughter’s use in an emergency. The daughter would then keep her Spikey with her sewing supplies, where her husband would be unlikely to pick it up. (Some Spikeys seem to have been used as pincushions—perhaps providing an additional disincentive for them to be picked up.)
What kinds of families had the folk Spikey tradition? Starting around 1750 there were many coffee and sugar plantations in rural Brazil, far from towns. And until perhaps 1900 it was common for farmers from these plantations to get brides—often as young as 13—from distant towns. And perhaps these brides—who were typically from well-off families of Portuguese descent, and were often comparatively well educated—came with folk Spikeys.
In time the tradition seems to have spread to poorer families, and to have been preserved mainly there. But around the 1950s—presumably with the advent of roads and urbanization and the move away from living on remote farms—the tradition seems to have all but died out. (In rural schools in southern Brazil there were however apparently girls in the 1950s being taught in art classes how to make folk Spikeys with openings in them—to serve as piggy banks.)
Folk Spikeys seem to have shown up with different stories in different places around Brazil. In the southern border region (near Argentina and Uruguay) there’s apparently a tradition that the “Star of St. Miguel” (aka folk Spikey) was made in villages by healer women (aka “witches”), who were supposed to think about the health of the person being healed while they were sewing their Spikeys.
In other parts of Brazil, folk Spikeys sometimes seem to be referred to by the names of flowers and fruits that look vaguely similar. In the northeast, “Flor Mandacarú” (after flowers on a cactus). In tropical wetland areas, “Carambola” (after star fruit). And in central forest areas “Pindaíva” (after a spiky red fruit).
But the most common current name for a folk Spikey seems to be “Giramundo”—an apparently not-very-recent Portuguese constructed word meaning essentially “whirling world”. The folk Spikey, it seems, was used like a charm, and was supposed to bring good luck as it twirled in the wind. The addition of tails seems to be recent, but apparently it was common to hang up folk Spikeys in houses, perhaps particularly on festive occasions.
It’s often not clear what’s original, and what’s a more recent tradition that happens to have “entrained” folk Spikeys. In the Three Kings’ Day parade (as in the three kings from the Bible) in São Luiz do Paraitinga, folk Spikeys are apparently used to signify the Star of Bethlehem—but this seems to just be a recent thing, definitely not indicative of some ancient religious connection.
We’ve found a couple of examples of folk Spikeys showing up in art exhibitions. One was in a 1963 exhibition about folk art from northeastern Brazil organized by architect Lina Bo Bardi. The other, which happens to be the largest 3D Spikey I’ve ever seen, was in a 1997 exhibition of work by architect and set designer Flávio Império:
So… where did the folk Spikey come from? I still don’t know. It may have originated in Brazil; it may have come from Portugal or elsewhere in Europe. The central use of fabrics and sewing needed to make a “60° Spikey” work might argue against an Amerindian or African origin.
One modern Spikey artisan did say that her great-grandmother—who made folk Spikeys and was born in the late 1800s—came from the Romagna region of Italy. (One also said she learned about folk Spikeys from her French-Canadian grandmother.) And I suppose it’s conceivable that at one time there were folk Spikeys all over Europe, but they died out enough generations ago that no oral tradition about them survives. Still, while a decent number of polyhedra appear, for example, in European paintings from earlier centuries, I don’t know of a single Spikey among them. (I also don’t know of any Spikeys in historical Islamic art.)
But ultimately I’m pretty sure that somewhere there’s a single origin for the folk Spikey. It’s not something that I suspect was invented more than once.
So far the origin of the folk Spikey has proved elusive—and it certainly doesn’t help that the primary medium in which it appears to have been explored involved fabric, which doesn’t keep the way stone does.
Spikeys Come to Life
Whatever its ultimate origins, Spikey serves us very well as a strong and dignified icon. But sometimes it’s fun to have Spikey “come to life”—and over the years we’ve made various “personified Spikeys” for various purposes:
When you use Wolfram|Alpha, it’ll usually show its normal, geometrical Spikey. But just sometimes your query will make the Spikey “come to life”—as it does for pi queries on Pi Day:
Spikeys Forever
Polyhedra are timeless. You see a polyhedron in a picture from 500 years ago and it’ll look just as clean and modern as a polyhedron from my computer today.
I’ve spent a fair fraction of my life finding abstract, computational things (think cellular automaton patterns). And they too have a timelessness to them. But—try as I might—I have not found much of a thread of history for them. As abstract objects they could have been created at any time. But in fact they are modern, created because of the conceptual framework we now have, and with the tools we have today—and never seen before.
Polyhedra have both timelessness and a rich history that goes back thousands of years. In their appearance, polyhedra remind us of gems. And finding a certain kind of regular polyhedron is a bit like finding a gem out in the geometrical universe of all possible shapes.
The rhombic hexecontahedron is a wonderful such gem, and as I have explored its properties, I have come to have even more appreciation for it. But it is also a gem with a human story—and it is so interesting to see how something as abstract as a polyhedron can connect people across the world with such diverse backgrounds and objectives.
Who first came up with the rhombic hexecontahedron? We don’t know, and perhaps we never will. But now that it is here, it’s forever. My favorite polyhedron.
But what is its story, and how did we come to adopt it as our symbol?
The Origins of Spikey
Back in 1987, when we were developing the first version of Mathematica, one of our innovations was being able to generate resolution-independent 3D graphics from symbolic descriptions. In our early demos, this let us create wonderfully crisp images of Platonic solids. But as we approached the release of Mathematica 1.0, we wanted a more impressive example. So we decided to take the last of the Platonic solids—the icosahedron—and then make something more complex by a certain amount of stellation (or, more correctly, cumulation). (Yes, that’s what the original notebook interface looked like, 30 years ago…)
At first this was just a nice demo that happened to run fast enough on the computers we were using back then. But quite soon the 3D object it generated began to emerge as the de facto logo for Mathematica. And by the time Mathematica 1.0 was released in 1988, the stellated icosahedron was everywhere:
In time, tributes to our particular stellation started appearing—in various materials and sizes:
But just a year after we released Mathematica 1.0, we were getting ready to release Mathematica 1.2, and to communicate its greater sophistication, we wanted a more sophisticated logo. One of our developers, Igor Rivin, had done his PhD on polyhedra in hyperbolic space—and through his efforts a hyperbolic icosahedron adorned our Version 1.2 materials:
My staff gave me an up-to-date-Spikey T-shirt for my 30th birthday in 1989, with a quote that I guess even after all these years I’d still say:
After Mathematica 1.2, our marketing materials had a whole collection of hyperbolic Platonic solids, but by the time Version 2.0 arrived in 1991 we’d decided our favorite was the hyperbolic dodecahedron:
Looking through my 1991 archives today, I find some “explanatory” code (by Ilan Vardi)—and it’s nice to see that it all just runs in our latest Wolfram Language (though now it can be written a bit more elegantly):
Over the years, it became a strange ritual that when we were getting ready to launch a new integer version of Mathematica, we’d have very earnest meetings to “pick our new Spikey”. Sometimes there would be hundreds to choose from, generated (most often by Michael Trott) using all kinds of different algorithms:
But though the color palettes evolved, and the Spikeys often reflected (though perhaps in some subtle way) new features in the system, we’ve now had a 30-year tradition of variations on the hyperbolic dodecahedron:
In more recent times, it’s become a bit more streamlined to explore the parameter space—though by now we’ve accumulated hundreds of parameters:
A hyperbolic dodecahedron has 20 points—ideal for celebrating the 20th anniversary of Mathematica in 2008. But when we wanted something similar for the 25th anniversary in 2013 we ran into the problem that there’s no regular polyhedron with 25 vertices. But (essentially using SpherePoints[25]) we managed to create an approximate one—and made a 3D printout of it for everyone in our company, sized according to how long they’d been with us:
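A minimal sketch of the idea, assuming one simply takes the convex hull of the points that SpherePoints returns (the object that was actually printed may well have been constructed differently):

```wolfram
(* 25 roughly evenly spaced points on the unit sphere *)
pts = SpherePoints[25];

(* their convex hull, as a polyhedral region *)
hull = ConvexHullMesh[pts];

(* number of vertices and faces of the hull *)
{MeshCellCount[hull, 0], MeshCellCount[hull, 2]}
```

Since every point lies on the sphere, each one should survive as a vertex of the hull, though the faces will not all be congruent, which is why the result is only approximately regular.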
Enter Wolfram|Alpha
In 2009, we were getting ready to launch Wolfram|Alpha—and it needed a logo. There were all sorts of concepts:
We really wanted to emphasize that Wolfram|Alpha works by doing computation (rather than just, say, searching). And for a while we were keen on indicating this with some kind of gear-like motif. But we also wanted the logo to be reminiscent of our longtime Mathematica logo. So this led to one of those classic “the-CEO-must-be-crazy” projects: make a gear mechanism out of spikey-like forms.
Longtime Mathematica and Wolfram Language user (and Hungarian mechanical engineer) Sándor Kabai helped out, suggesting a “Spikey Gear”:
And then, in a throwback to the Version 2 intersecting tetrahedra, he came up with this:
In 2009, 3D printing was becoming very popular, and we thought it would be nice for Wolfram|Alpha to have a logo that was readily 3D printable. Hyperbolic polyhedra were out: their spikes would break off, and could be dangerous. (And something like the Mathematica Version 4 Spikey, with “safety spikes”, lacked elegance.)
For a while we fixated on the gears idea. But eventually we decided it’d be worth taking another look at ordinary polyhedra. But if we were going to adopt a polyhedron, which one should it be?
There are of course an infinite number of possible polyhedra. But to make a nice logo, we wanted a symmetrical and somehow “regular” one. The five Platonic solids—all of whose faces are identical regular polygons—are in effect the “most regular” of all polyhedra:
Then there are the 13 Archimedean solids, all of whose vertices are identical, and whose faces are regular polygons but of more than one kind:
Over the years that Eric Weisstein was assembling what in 1999 became MathWorld, he made an effort to include articles on as many notable polyhedra as possible. And in 2006, as part of putting every kind of systematic data into Mathematica and the Wolfram Language, we started including polyhedron data from MathWorld. The result was that when Version 6.0 was released in 2007, it included the function PolyhedronData that contained extensive data on 187 notable polyhedra:
It had always been possible to generate regular polyhedra in Mathematica and the Wolfram Language, but now it became easy. With the release of Version 6.0 we also started the Wolfram Demonstrations Project, which quickly began accumulating all sorts of polyhedron-related Demonstrations.
So this was the background when in early 2009 we wanted to “pick a polyhedron” for Wolfram|Alpha. It all came to a head on the evening of Friday, February 6, when I decided to just take a look at things myself.
I still have the notebook I used, and it shows that at first I tried out the rather dubious idea of putting spheres at the vertices of polyhedra:
But (as the Notebook History system recorded) just under two minutes later I’d generated pure polyhedron images—all in the orange we thought we were going to use for the logo:
The polyhedra were arranged in alphabetical order by name, and on line 28, there it was—the rhombic hexecontahedron:
A couple of minutes later, I had homed in on the rhombic hexecontahedron, and at exactly 12:24:24am on February 7, 2009, I rotated it into essentially the symmetrical orientation we now use:
I wondered what it would look like in gray scale or in silhouette, and four minutes later I used ColorSeparate to find out:
I immediately started writing an email—which I fired off at 12:32am:
“I [...] rather like the RhombicHexecontahedron ….
It’s an interesting shape … very symmetrical … I think it might have
about the right complexity … and its silhouette is quite reasonable.”
I’d obviously just copied “RhombicHexecontahedron” from the label in the notebook (and I doubt I could have spelled “hexecontahedron” correctly yet). And indeed from my archives I know that this was the very first time I’d ever written the name of what was destined to become my all-time-favorite polyhedron.
It was dead easy in the Wolfram Language to get a picture of a rhombic hexecontahedron to play with:
PolyhedronData["RhombicHexecontahedron"]
And by Monday it was clear that the rhombic hexecontahedron was a winner—and our art department set about rendering it as the Wolfram|Alpha logo. We tried some different orientations, but soon settled on the symmetrical “head-on” one that I’d picked. (We also had to figure out the best “focal length”, giving the best foreshortening.)
Like our Version 1.0 stellated icosahedron, the rhombic hexecontahedron has 60 faces. But somehow, with its flower-like five-fold “petal” arrangements, it felt much more elegant. It took a fair amount of effort to find the best facet shading in a 2D rendering to reflect the 3D form. But soon we had the first official version of our logo:
It quickly started to show up everywhere, and in a nod to our earlier ideas, it often appeared on a “geared background”:
A few years later, we tweaked the facet shading slightly, giving what is still today the logo of Wolfram|Alpha:
The Rhombic Hexecontahedron
What is a rhombic hexecontahedron? It’s called a “hexecontahedron” because it has 60 faces, and ἑξηκοντα (hexeconta) is the Greek word for 60. (Yes, the correct spelling is with an “e”, not an “a”.) It’s called “rhombic” because each of its faces is a rhombus. Actually, its faces are golden rhombuses, so named because their diagonals are in the golden ratio ϕ ≃ 1.618:
The rhombic hexecontahedron is a curious interpolation between an icosahedron and a dodecahedron (with an icosidodecahedron in the middle). The 12 innermost points of a rhombic hexecontahedron form a regular icosahedron, while the 20 outermost points form a regular dodecahedron. The 30 “middle points” form an icosidodecahedron, which has 32 faces (20 “icosahedron-like” triangular faces, and 12 “dodecahedron-like” pentagonal faces):
Altogether, the rhombic hexecontahedron has 62 vertices and 120 edges (as well as 120−62+2=60 faces). There are 3 kinds of vertices (“inner”, “middle” and “outer”), corresponding to the 12+30+20 vertices of the icosahedron, icosidodecahedron and dodecahedron. These types of vertices have respectively 5, 4 and 3 edges meeting at them. Each golden rhombus face of the rhombic hexecontahedron has one “inner” vertex where 5 edges meet, one “outer” vertex where 3 edges meet and two “middle” vertices where 4 edges meet. The inner and outer vertices are the acute vertices of the golden rhombuses; the middle ones are the obtuse vertices.
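These counts can be cross-checked in a few lines. This is just a sketch, and it assumes that PolyhedronData exposes the "VertexCount", "EdgeCount" and "FaceCount" properties for this polyhedron; the variable names are only for illustration:

```wolfram
rh = "RhombicHexecontahedron";

(* Look up the three counts from the built-in polyhedron data *)
v = PolyhedronData[rh, "VertexCount"];
e = PolyhedronData[rh, "EdgeCount"];
f = PolyhedronData[rh, "FaceCount"];
{v, e, f}

(* Euler's formula for a sphere-like polyhedron: this should be 2 *)
v - e + f

(* Every edge has two ends, so adding up the edges meeting at each of the
   12 inner, 30 middle and 20 outer vertices counts each edge twice *)
12*5 + 30*4 + 20*3 == 2 e
```

The last line is a handy consistency check on the 5, 4 and 3 edges meeting at the inner, middle and outer vertices: the sum is 240, which is twice the 120 edges.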
The acute vertices of the golden rhombuses have angle 2 tan⁻¹(ϕ⁻¹) ≈ 63.43°, and the obtuse ones 2 tan⁻¹(ϕ) ≈ 116.57°. The angles allow the rhombic hexecontahedron to be assembled from Zometool using only red struts (the same as for a dodecahedron):
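The two angles follow directly from the golden-ratio diagonals. Here is a small sketch of that calculation; the names phi, long, short, acute and obtuse are introduced just for this example:

```wolfram
phi = GoldenRatio;

(* Diagonals of a golden rhombus, up to overall scale *)
long = phi; short = 1;

(* The diagonals cut the rhombus into four right triangles, so half of each
   corner angle is the arctangent of a ratio of the two diagonals *)
acute = 2 ArcTan[short/long];
obtuse = 2 ArcTan[long/short];

(* Convert to degrees; this should be close to 63.43 and 116.57 *)
N[{acute, obtuse}/Degree]

(* Adjacent angles of any rhombus should add up to 180 degrees *)
PossibleZeroQ[acute + obtuse - Pi]
```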
Across the 120 edges of the rhombic hexecontahedron, the 60 “inward-facing hinges” have dihedral angle 4π/5=144°, and the 60 “outward-facing” ones have dihedral angle 2π/5=72°. The solid angles subtended by the inner and outer vertices are π/5 and 3π/5.
To actually draw a rhombic hexecontahedron, one needs to know 3D coordinates for its vertices. A convenient way to get these is to use the fact that the rhombic hexecontahedron is invariant under the icosahedral group, so that one can start with a single golden rhombus and just apply the 60 matrices that form a 3D representation of the icosahedral group. This gives for example final vertex coordinates {±ϕ,±1,0}, {±1,±ϕ,±(1+ϕ)}, {±2ϕ,0,0}, {±ϕ,±(1+2ϕ),0}, {±(1+ϕ),±(1+ϕ),±(1+ϕ)}, and cyclic permutations of these, with each possible sign being taken.
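As a sanity check on that coordinate list, one can expand the cyclic permutations and sign choices explicitly and count what comes out. This is a sketch rather than the way the coordinates were originally derived; cyclic and signed are helper names made up for the example:

```wolfram
phi = GoldenRatio;

(* The five coordinate triples quoted above *)
seeds = {{phi, 1, 0}, {1, phi, 1 + phi}, {2 phi, 0, 0},
   {phi, 1 + 2 phi, 0}, {1 + phi, 1 + phi, 1 + phi}};

(* The three cyclic permutations of a triple *)
cyclic[p_] := NestList[RotateLeft, p, 2];

(* Every choice of sign for the nonzero entries of a triple *)
signed[p_] := Tuples[If[# === 0, {0}, {#, -#}] & /@ p];

(* Generate all the points, then drop duplicates *)
verts = Union[Flatten[signed /@ Flatten[cyclic /@ seeds, 1], 1]];
Length[verts]

(* Group the points by (rounded) distance from the origin *)
Sort[Tally[Round[N[Norm /@ verts], 0.001]]]
```

The length should come out as 62, and the tally should show three distinct radii with 12, 30 and 20 points, matching the inner, middle and outer vertices.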
In addition to having faces that are golden rhombuses, the rhombic hexecontahedron can be constructed out of 20 golden rhombohedra (whose 6 faces are all golden rhombuses):
There are other ways to build rhombic hexecontahedra out of other polyhedra. Five intersecting cubes can do it, as can 182 dodecahedra with touching faces:
Rhombic hexecontahedra don’t tessellate space. But they do interlock in a satisfying way (and, yes, I’ve seen tens of paper ones stacked up this way):
There are also all sorts of ring and other configurations that can be made with them:
Closely related to the rhombic hexecontahedron (“RH”) is the rhombic triacontahedron (“RT”). Both the RH and the RT have faces that are golden rhombuses. But the RH has 60, while the RT has 30. Here’s what a single RT looks like:
RTs fit beautifully into the “pockets” in RHs, leading to forms like this:
As soon as we’d settled on a rhombic hexecontahedron Spikey, we started making 3D printouts of it. (It’s now very straightforward to do this with Printout3D[PolyhedronData[...]], and there are also precomputed models available at outside services.)
But as we prepared for the first post-Wolfram|Alpha holiday season, we wanted to give everyone a way to make their own 3D Spikey. At first we explored using sets of 20 plastic-covered golden rhombohedral magnets. But they were expensive, and had a habit of not sticking together well enough at “Spikey scale”.
So that led us to the idea of making a Spikey out of paper, or thin cardboard. Our first thought was then to create a net that could be folded up to make a Spikey:
My daughter Catherine was our test folder (and still has the object that was created), but it was clear that there were a lot of awkward hard-to-get-there-from-here situations during the folding process. There are a huge number of possible nets (there are already 43,380 even for the dodecahedron and icosahedron)—and we thought that perhaps one could be found that would work better:
But after failing to find any such net, we then had a new (if obvious) idea: since the final structure would be held together by tabs anyway, why not just make it out of multiple pieces? We quickly realized that the pieces could be 12 identical copies of this:
Making the instructions easy to understand was an interesting challenge, but after a few iterations they’re now well debugged, and easy for anyone to follow:
And with paper Spikeys in circulation, our users started sending us all sorts of pictures of Spikeys “on location”:
The Path to the Rhombic Hexecontahedron
It’s not clear who first identified the Platonic solids. Perhaps it was the Pythagoreans (particularly living near so many polyhedrally shaped pyrite crystals). Perhaps it was someone long before them. Or perhaps it was a contemporary of Plato’s named Theaetetus. But in any case, by the time of Plato (≈400 BC), it was known that there are five Platonic solids. And when Euclid wrote his Elements (around 300 BC) perhaps the pinnacle of it was the proof that these five are all there can be. (This proof is notably the one that takes the most steps—32—from the original axioms of the Elements.)
Platonic solids were used for dice and ornaments. But they were also given a central role in thinking about nature, with Plato for example suggesting that perhaps everything could in some sense be made of them: earth of cubes, air of octahedra, water of icosahedra, fire of tetrahedra, and the heavens (“ether”) of dodecahedra.
But what about other polyhedra? In the 4th century AD, Pappus wrote that several centuries earlier, Archimedes had discovered 13 other “regular polyhedra”—presumably what are now called the Archimedean solids—though the details were lost. And for a thousand years little more seems to have been done with polyhedra. But in the 1400s, with the Renaissance starting up, polyhedra were suddenly in vogue again. People like Leonardo da Vinci and Albrecht Dürer routinely used them in art and design, rediscovering some of the Archimedean solids—as well as finding some entirely new polyhedra, like the icosidodecahedron.
But the biggest step forward for polyhedra came with Johannes Kepler at the beginning of the 1600s. It all started with an elegant, if utterly wrong, theory. Theologically convinced that the universe must be constructed with mathematical perfection, Kepler suggested that the six planets known at the time might move on nested spheres geometrically arranged so as to just fit the suitably ordered five Platonic solids between them:
In his 1619 book Harmonices mundi (“Harmony of the World”) Kepler argued that many features of music, planets and souls operate according to similar geometric ratios and principles. And to provide raw material for his arguments, Kepler studied polygons and polyhedra, being particularly interested in finding objects that somehow formed complete sets, like the Platonic solids.
He studied possible “sociable polygons”, that together could tile the plane—finding, for example, his “monster tiling” (with pentagons, pentagrams and decagons). He studied “star polyhedra” and found various stellations of the Platonic solids (and in effect the Kepler–Poinsot polyhedra). In 1611 he had published a small book about the hexagonal structure of snowflakes, written as a New Year’s gift for a sometime patron of his. And in this book he discussed the 3D packing of spheres (and spherical atoms), suggesting that what’s now called the Kepler packing (and routinely seen in the packing of fruit in grocery stores) was the densest possible packing (a fact that wasn’t formally proved until into the 2000s—as it happens, with the help of Mathematica).
There’s a polyhedron lurking in the Kepler packing. Touching every sphere in the packing are exactly 12 others. And the planes in which the sphere touches these neighbors enclose a polyhedron called the rhombic dodecahedron, with 14 vertices and 12 faces:
Having discovered this, Kepler started looking for other “rhombic polyhedra”. The rhombic dodecahedron he found has rhombuses whose diagonals are in the ratio √2. But by 1619 Kepler had also looked at golden rhombuses—and had found the rhombic triacontahedron, drawing a nice picture of it in his book, right next to the rhombic dodecahedron:
Kepler actually had an immediate application for these rhombic polyhedra: he wanted to use them, along with the cube, to make a nested-spheres model that would fit the orbital periods of the four moons of Jupiter that Galileo had discovered in 1610.
Why didn’t Kepler discover the rhombic hexecontahedron? I think he was quite close. He looked at non-convex “star” polyhedra. He looked at rhombic polyhedra. But I guess for his astronomical theories he was satisfied with the rhombic triacontahedron, and looked no further.
In the end, of course, it was Kepler’s laws—which have nothing to do with polyhedra—that were Kepler’s main surviving contribution to astronomy. But Kepler’s work on polyhedra—albeit done in the service of a misguided physical theory—stands as a timeless contribution to mathematics.
Over the next three centuries, more polyhedra, with various forms of regularity, were gradually found—and by the early 1900s there were many known to mathematicians:
But, so far as I can tell, the rhombic hexecontahedron was not among them. And instead its discovery had to await the work of a certain Helmut Unkelbach. Born in 1910, he got a PhD in math at the University of Munich in 1937 (after initially studying physics). He wrote several papers about conformal mapping, and—perhaps through studying mappings of polyhedral domains—was led in 1940 to publish a paper (in German) about “The Edge-Symmetric Polyhedra”.
His goal, he explains, is to exhaustively study all possible polyhedra that satisfy a specific, though new, definition of regularity: that their edges are all the same length, and these edges all lie in some symmetry plane of the polyhedron. The main result of his paper is a table containing 20 distinct polyhedra with that property:
Unkelbach knew that most of these polyhedra were already known. But he singles out three types that he thinks are new: two hexakisoctahedra (or disdyakis dodecahedra), two hexakisicosahedra (or disdyakis triacontahedra), and what he calls the Rhombenhexekontaeder, or in English, the rhombic hexecontahedron. He clearly considers the rhombic hexecontahedron his prize specimen, including a photograph of a model he made of it:
How did he actually “derive” the rhombic hexecontahedron? Basically, he started from a dodecahedron, and identified its two types of symmetry planes:
Then he subdivided each face of the dodecahedron:
Then he essentially considered pushing the centers of each face in or out to a specified multiple α of their usual distance from the center of the dodecahedron:
For α < 1, the resulting faces don’t intersect. But for most values of α, they don’t have equal-length sides. That only happens for the specific case —and in that case the resulting polyhedron is exactly the rhombic hexecontahedron.
Unkelbach actually viewed his 1940 paper as a kind of warmup for a study of more general “k-symmetric polyhedra” with looser symmetry requirements. But it was already remarkable enough that a mathematics journal was being published at all in Germany after the beginning of World War II, and soon after the paper, Unkelbach was pulled into the war effort, spending the next few years designing acoustic-homing torpedoes for the German navy.
Unkelbach never published on polyhedra again. After the war he returned to conformal mapping, but also started publishing on the idea that mathematical voting theory was the key to setting up a well-functioning democracy, and that mathematicians had a responsibility to make sure it was used. He died in 1968.
But even though the rhombic hexecontahedron appeared in Unkelbach’s 1940 paper, it might well have languished there forever, were it not for the fact that in 1946 a certain H. S. M. (“Donald”) Coxeter wrote a short review of the paper for the (fairly new) American Mathematical Reviews. His review catalogs the polyhedra mentioned in the paper, much as a naturalist might catalog new species seen on an expedition. The high point is what he describes as “a remarkable rhombic hexecontahedron”, for which he reports that “its faces have the same shape as those of the triacontahedron, of which it is actually a stellation”.
Polyhedra were not exactly a hot topic in the mathematics of the mid-1900s, but Coxeter was their leading proponent—and was connected in one way or another to pretty much everyone who was working on them. In 1948 he published his book Regular Polytopes. It describes in a systematic way a variety of families of regular polyhedra, in particular showing the great stellated triacontahedron (or great rhombic triacontahedron)—which effectively contains a rhombic hexecontahedron:
But Coxeter didn’t explicitly mention the rhombic hexecontahedron in his book, and while it picked up a few mentions from polyhedron aficionados, the rhombic hexecontahedron remained a basically obscure (and sometimes misspelled) polyhedron.
Quasicrystals
Crystals had always provided important examples of polyhedra. But by the 1800s, with atomic theory increasingly established, there began to be serious investigation of crystallography, and of how atoms are arranged in crystals. Polyhedra made a frequent appearance, in particular in representing the geometries of repeating blocks of atoms (“unit cells”) in crystals.
By 1850 it was known that there were basically only 14 possible such geometries; among them is one based on the rhombic dodecahedron. A notable feature of these geometries is that they all have specific two-, three-, four- or six-fold symmetries—essentially a consequence of the fact that only certain polyhedra can tessellate space, much as in 2D the only regular polygons that can tile the plane are squares, triangles and hexagons.
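The 2D statement about regular polygons is easy to check. Here is a minimal sketch, assuming tilings in which the corners of the polygons meet at a point; interiorAngle is a helper name introduced for the example, and the cutoff at 20 sides is arbitrary:

```wolfram
(* Interior angle of a regular n-gon, in degrees *)
interiorAngle[n_] := 180 (n - 2)/n;

(* Copies of a regular n-gon can close up around a point only if a whole
   number of its corners add up to exactly 360 degrees *)
Select[Range[3, 20], IntegerQ[360/interiorAngle[#]] &]
```

Only 3, 4 and 6 survive. Beyond the hexagon, the number of corners that would have to meet at a point lies strictly between 2 and 3, so nothing larger can work.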
But what about for non-crystalline materials, like liquids or glasses? People had wondered since before the 1930s whether at least approximate five-fold symmetries could exist there. You can’t tessellate space with regular icosahedra (which have five-fold symmetry), but maybe you could at least have icosahedral regions with little gaps in between.
None of this was settled when in 1982 electron diffraction on a rapidly cooled aluminum-manganese material effectively showed five-fold symmetry. Within a year or so there were electron microscope pictures of grains that were shaped like rhombic triacontahedra:
And as people imagined how these triacontahedra could pack together, the rhombic hexecontahedron soon made its appearance—as a “hole” in a cluster of 12 rhombic triacontahedra:
At first it was referred to as a “20-branched star”. But soon the connection with the polyhedron literature was made, and it was identified as a rhombic hexecontahedron.
Meanwhile, the whole idea of making things out of rhombic elements was gaining attention. Michael Longuet-Higgins, longtime oceanographer and expert on how wind makes water waves, jumped on the bandwagon, in 1987 filing a patent for a toy based on magnetic rhombohedral blocks, that could make a “Kepler Star” (rhombic hexecontahedron) or a “Kepler Ball” (rhombic triacontahedron):
And—although I only just found this out—the rhombohedral blocks that we considered in 2009 for widespread “Spikey making” were actually produced by Dextro Mathematical Toys (aka Rhombo.com), operating out of Longuet-Higgins’s house in San Diego.
The whole question of what can successfully tessellate space—or even tile the plane—is a complicated one. In fact, the general problem of whether a particular set of shapes can be arranged to tile the plane has been known since the early 1960s to be formally undecidable. (One might verify that 1000 of these shapes can fit together, but it can take arbitrarily more computational effort to figure out the answer for more and more of the shapes.)
People like Kepler presumably assumed that if a set of shapes was going to tile the plane, they must be able to do so in a purely repetitive pattern. But following the realization that the general tiling problem is undecidable, Roger Penrose in 1974 came up with two shapes that could successfully tile the plane, but not in a repetitive way. By 1976 Penrose (as well as Robert Ammann) had come up with a slightly simpler version:
And, yes, the shapes here are rhombuses, though not golden rhombuses. But with angles 36°,144° and 72°,108°, they arrange with 5- and 10-fold symmetry.
By construction, these rhombuses (or, more strictly, shapes made from them) can’t form a repetitive pattern. But it turns out they can form a pattern that can be built up in a systematic, nested way:
And, yes, the middle of step 3 in this sequence looks rather like our flattened Spikey. But it’s not exactly right; the aspect ratios of the outer rhombuses are off.
Looking at it from above, it looks exactly like the beginning of the nested construction of the Penrose tiling. If one keeps going, one gets the Penrose tiling:
Looked at “from the side” in 3D, one can tell it’s still just identical golden rhombuses:
Putting four of these “Wieringa roofs” together one can form exactly the rhombic hexecontahedron:
But what’s the relation between these nested constructions and the actual way physical quasicrystals form? It’s not yet clear. But it’s still neat to see even hints of rhombic hexecontahedra showing up in nature.
And historically it was through their discussion in quasicrystals that Sándor Kabai came to start studying rhombic hexecontahedra with Mathematica, which in turn led Eric Weisstein to find out about them, which in turn led them to be in Mathematica and the Wolfram Language, which in turn led me to pick one for our logo. And in recognition of this, we print the nestedly constructed Penrose tiling on the inside of our paper Spikey:
Flattening Spikey
Our Wolfram|Alpha Spikey burst onto the scene in 2009 with the release of Wolfram|Alpha. But we still had our long-running and progressively evolving Mathematica Spikey too. So when we built a new European headquarters in 2011 we had not just one, but two Spikeys vying to be on it.
Our longtime art director Jeremy Davis came up with a solution: take one Spikey, but “idealize” it, using just its “skeleton”. It wasn’t hard to decide to start from the rhombic hexecontahedron. But then we flattened it (with the best ratios, of course)—and finally ended up with the first implementation of our now-familiar logo:
The Brazilian Surprise
When I started writing this piece, I thought the story would basically end here. After all, I’ve now described how we picked the rhombic hexecontahedron, and how mathematicians came up with it in the first place. But before finishing the piece, I thought, “I’d better look through all the correspondence I’ve received about Spikey over the years, just to make sure I’m not missing anything.”
And that’s when I noticed an email from June 2009, from an artist in Brazil named Yolanda Cipriano. She said she’d seen an article about Wolfram|Alpha in a Brazilian news magazine—and had noticed the Spikey—and wanted to point me to her website. It was now more than nine years later, but I followed the link anyway, and was amazed to find this:
I read more of her email: “Here in Brazil this object is called ‘Giramundo’ or ‘Flor Mandacarú’ (Mandacaru Flower) and it is an artistic ornament made with [tissue paper]”.
What?! There was a Spikey tradition in Brazil, and all these years we’d never heard about it? I soon found other pictures on the web. Only a few of the Spikeys were made with paper; most were fabric—but there were lots of them:
I emailed a Brazilian friend who’d worked on the original development of Wolfram|Alpha. He quickly responded “These are indeed familiar objects… and to my shame I was never inquisitive enough to connect the dots”—then sent me pictures from a local arts and crafts catalog:
But now the hunt was on: what were these things, and where had they come from? Someone at our company volunteered that actually her great-grandmother in Chile had made such things out of crochet—and always with a tail. We started contacting people who had put up pictures of “folk Spikeys” on the web. Quite often all they knew was that they got theirs from a thrift shop. But sometimes people would say that they knew how to make them. And the story always seemed to be the same: they’d learned how to do it from their grandmothers.
The typical way to build a folk Spikey—at least in modern times—seems to be to start off by cutting out 60 cardboard rhombuses. The next step is to wrap each rhombus in fabric—and finally to stitch them all together:
OK, but there’s an immediate math issue here. Are these people really correctly measuring out 63° golden rhombuses? The answer is typically no. Instead, they’re making 60° rhombuses out of pairs of equilateral triangles—just like the standard diamond shapes used in quilts. So how then does the Spikey fit together? Well, 60° is not far from 63°, and if you’re sewing the faces together, there’s enough wiggle room that it’s easy to make the polyhedron close even without the angles being precisely right. (There are also “quasi-Spikeys” that—as in Unkelbach’s construction—don’t have rhombuses for faces, but instead have pointier “outside triangles”.)
Folk Spikeys on the web are labeled in all sorts of ways. The most common is as “Giramundos”. But quite often they are called “Estrelas da Felicidade” (“stars of happiness”). Confusingly, some of them are also labeled “Moravian stars”—but actually, Moravian stars are different and much pointier polyhedra (most often heavily augmented rhombicuboctahedra) that happen to have recently become popular, particularly for light fixtures.
Despite quite a bit of investigation, I still don’t know what the full history of the “folk Spikey” is. But here’s what I’ve found out so far. First, at least what survives of the folk Spikey tradition is centered around Brazil (even though we have a few stories of other appearances). Second, the tradition seems to be fairly old, definitely dating from well before 1900 and quite possibly several centuries earlier. So far as I can tell—as is common with folk art—it’s a purely oral tradition, and so far I haven’t found any real historical documentation about it.
My best information has come from a certain Paula Guerra, who sold folk Spikeys at a tourist-oriented cafe she operated a decade ago in the historic town of São Luiz do Paraitinga. She said people would come into her cafe from all over Brazil, see the folk Spikeys and say, “I haven’t seen one of those in 50 years…”
Paula herself learned about folk Spikeys (she calls them “stars”) from an older woman living on a multigenerational local family farm, who’d been making them since she was a little girl, and had been taught how to do it by her mother. Her procedure—which seems to have been typical—was to get cardboard from anywhere (originally, things like hat boxes), then to cover it with fabric scraps, usually from clothes, then to sew the whole perhaps-6″-across object together.
How old is the folk Spikey? Well, we only have oral tradition to go by. But we’ve tracked down several people who saw folk Spikeys being made by relatives who were born around 1900. Paula said that a decade ago she’d met an 80-year-old woman who told her that when she was growing up on a 200-year-old coffee farm there was a shelf of folk Spikeys from four generations of women.
At least part of the folk Spikey story seems to center around a mother-daughter tradition. Mothers, it is said, often made folk Spikeys as wedding presents when their daughters went off to get married. Typically the Spikeys were made from scraps of clothes and other things that would remind the daughters of their childhood—a bit like how quilts are sometimes made for modern kids going to college.
But for folk Spikeys there was apparently another twist: it was common that before a Spikey was sewn up, a mother would put money inside it, for her daughter’s use in an emergency. The daughter would then keep her Spikey with her sewing supplies, where her husband would be unlikely to pick it up. (Some Spikeys seem to have been used as pincushions—perhaps providing an additional disincentive for them to be picked up.)
What kinds of families had the folk Spikey tradition? Starting around 1750 there were many coffee and sugar plantations in rural Brazil, far from towns. And until perhaps 1900 it was common for farmers from these plantations to get brides—often as young as 13—from distant towns. And perhaps these brides—who were typically from well-off families of Portuguese descent, and were often comparatively well educated—came with folk Spikeys.
In time the tradition seems to have spread to poorer families, and to have been preserved mainly there. But around the 1950s—presumably with the advent of roads and urbanization and the move away from living on remote farms—the tradition seems to have all but died out. (In rural schools in southern Brazil there were however apparently girls in the 1950s being taught in art classes how to make folk Spikeys with openings in them—to serve as piggy banks.)
Folk Spikeys seem to have shown up with different stories in different places around Brazil. In the southern border region (near Argentina and Uruguay) there’s apparently a tradition that the “Star of St. Miguel” (aka folk Spikey) was made in villages by healer women (aka “witches”), who were supposed to think about the health of the person being healed while they were sewing their Spikeys.
In other parts of Brazil, folk Spikeys sometimes seem to be referred to by the names of flowers and fruits that look vaguely similar. In the northeast, “Flor Mandacarú” (after flowers on a cactus). In tropical wetland areas, “Carambola” (after star fruit). And in central forest areas “Pindaíva” (after a spiky red fruit).
But the most common current name for a folk Spikey seems to be “Giramundo”—an apparently not-very-recent Portuguese constructed word meaning essentially “whirling world”. The folk Spikey, it seems, was used like a charm, and was supposed to bring good luck as it twirled in the wind. The addition of tails seems to be recent, but apparently it was common to hang up folk Spikeys in houses, perhaps particularly on festive occasions.
It’s often not clear what’s original, and what’s a more recent tradition that happens to have “entrained” folk Spikeys. In the Three Kings’ Day parade (as in the three kings from the Bible) in São Luiz do Paraitinga, folk Spikeys are apparently used to signify the Star of Bethlehem—but this seems to just be a recent thing, definitely not indicative of some ancient religious connection.
We’ve found a couple of examples of folk Spikeys showing up in art exhibitions. One was in a 1963 exhibition about folk art from northeastern Brazil organized by architect Lina Bo Bardi. The other, which happens to be the largest 3D Spikey I’ve ever seen, was in a 1997 exhibition of work by architect and set designer Flávio Império:
So… where did the folk Spikey come from? I still don’t know. It may have originated in Brazil; it may have come from Portugal or elsewhere in Europe. The central use of fabrics and sewing needed to make a “60° Spikey” work might argue against an Amerindian or African origin.
One modern Spikey artisan did say that her great-grandmother—who made folk Spikeys and was born in the late 1800s—came from the Romagna region of Italy. (One also said she learned about folk Spikeys from her French-Canadian grandmother.) And I suppose it’s conceivable that at one time there were folk Spikeys all over Europe, but they died out enough generations ago that no oral tradition about them survives. Still, while a decent number of polyhedra appear, for example, in European paintings from earlier centuries, I don’t know of a single Spikey among them. (I also don’t know of any Spikeys in historical Islamic art.)
But ultimately I’m pretty sure that somewhere there’s a single origin for the folk Spikey. It’s not something that I suspect was invented more than once.
So far the Spikey has proved more elusive—and it certainly doesn’t help that the primary medium in which it appears to have been explored involved fabric, which doesn’t keep the way stone does.
Spikeys Come to Life
Whatever its ultimate origins, Spikey serves us very well as a strong and dignified icon. But sometimes it’s fun to have Spikey “come to life”—and over the years we’ve made various “personified Spikeys” for various purposes:
When you use Wolfram|Alpha, it’ll usually show its normal, geometrical Spikey. But just sometimes your query will make the Spikey “come to life”—as it does for pi queries on Pi Day:
Spikeys Forever
Polyhedra are timeless. You see a polyhedron in a picture from 500 years ago and it’ll look just as clean and modern as a polyhedron from my computer today.
I’ve spent a fair fraction of my life finding abstract, computational things (think cellular automaton patterns). And they too have a timelessness to them. But—try as I might—I have not found much of a thread of history for them. As abstract objects they could have been created at any time. But in fact they are modern, created because of the conceptual framework we now have, and with the tools we have today—and never seen before.
Polyhedra have both timelessness and a rich history that goes back thousands of years. In their appearance, polyhedra remind us of gems. And finding a certain kind of regular polyhedron is a bit like finding a gem out in the geometrical universe of all possible shapes.
The rhombic hexecontahedron is a wonderful such gem, and as I have explored its properties, I have come to have even more appreciation for it. But it is also a gem with a human story—and it is so interesting to see how something as abstract as a polyhedron can connect people across the world with such diverse backgrounds and objectives.
Who first came up with the rhombic hexecontahedron? We don’t know, and perhaps we never will. But now that it is here, it’s forever. My favorite polyhedron.
I’m a person who’s only satisfied if I feel I’m being productive. I like figuring things out. I like making things. And I want to do as much of that as I can. And part of being able to do that is to have the best personal infrastructure I can. Over the years I’ve been steadily accumulating and implementing “personal infrastructure hacks” for myself. Some of them are, yes, quite nerdy. But they certainly help me be productive. And maybe in time more and more of them will become mainstream, as a few already have.
Now, of course, one giant “productivity hack” that I’ve been building for the world for a very long time is the whole technology stack around the Wolfram Language. And for me personally, another huge “productivity hack” is my company, which I started more than 32 years ago. Yes, it could (and should) be larger, and have more commercial reach. But as a nicely organized private company with about 800 people it’s an awfully efficient machine for turning ideas into real things, and for leveraging what skills I have to greatly amplify my personal productivity.
I could talk about how I lead my life, and how I like to balance doing leadership, doing creative work, interacting with people, and doing things that let me learn. I could talk about how I try to set things up so that what I’ve already built doesn’t keep me so busy I can’t start anything new. But instead what I’m going to focus on here is my more practical personal infrastructure: the technology and other things that help me live and work better, feel less busy, and be more productive every day.
At an intellectual level, the key to building this infrastructure is to structure, streamline and automate everything as much as possible—while recognizing both what’s realistic with current technology, and what fits with me personally. In many ways, it’s a good, practical exercise in computational thinking, and, yes, it’s a good application of some of the tools and ideas that I’ve spent so long building. Much of it can probably be helpful to lots of other people too; some of it is pretty specific to my personality, my situation and my patterns of activity.
My Daily Life
To explain my personal infrastructure, I first have to say a bit about my daily life. Something that often surprises people is that for 28 years I’ve been a remote CEO. I’m about as hands-on a CEO as they come. But I’m only physically “in the office” a few times a year. Mostly I’m just at home, interacting with the company with great intensity—but purely through modern virtual means:
I’m one of those CEOs who actually does a lot of stuff myself, as well as managing other people to do things. Being a remote CEO helps me achieve that, and stay focused. And partly following my example, our company has evolved a very distributed culture, with people working scattered all over the world (it’s all about being productive, rather than about “showing up”):
At my desk, though, my basic view of all this is just:
It’s always set up the same way. On the right is my main “public display” monitor that I’ll be screensharing most of the day with people I’m talking to. On the left is my secondary “private display” monitor that’s got my email and messages and other things that aren’t directly relevant to the meetings I’m doing.
Particularly since I’m at my desk much of each day, I’ve tried to optimize its ergonomics. The keyboard is at the right height for optimal typing. The monitors are at a height that—especially given my “computer distance” multifocal glasses—forces my head to be in a good position when I look at them, and not hunched over. I still use a “roll-around” mouse (on the left, since I’m left-handed)—because at least according to my latest measurements I’m still faster with that than with any other pointing technology.
At the touch of a button, my desk goes to standing height:
But while standing may be better than sitting, I like to at least start my day with something more active, and for more than a decade I’ve been making sure to walk for a couple of hours every morning. But how can I be productive while I’m walking? Well, nearly 15 years ago (i.e. long before it was popular!) I set up a treadmill with a computer in the room next to my office:
The biomechanics weren’t too hard to work out. I found out that by putting a gel strip at the correct pivot point under my wrists (and putting the mouse on a platform) I can comfortably type while I’m walking. I typically use a 5% incline and go at 2 mph—and I’m at least fit enough that I don’t think anyone can tell I’m walking while I’m talking in a meeting. (And, yes, I try to get potentially frustrating meetings scheduled during my walking time, so if I do in fact get frustrated I can just “walk it off” by making the treadmill go a little faster.)
For many years I’ve kept all kinds of personal analytics data on myself, and for the past couple of years this has included continuous heart-rate data. Early last summer I noticed that for a couple of weeks my resting heart rate had noticeably gone down. At first I thought it was just because I happened to be systematically doing something I liked then. But later in the summer, it happened again. And then I realized: those were times when I wasn’t walking inside on a treadmill; instead (for different reasons) I was walking outside.
For many years my wife had been extolling the virtues of spending time outside. But it had never really seemed practical for me. Yes, I could talk on the phone (or, in rare cases, actually talk to someone I was walking with). Or I could be walking with a tablet, perhaps watching someone else screensharing—as I did, rather unstylishly, for a week late last summer during my version of a vacation:
I’d actually been thinking about walking and working for a long time. Twenty years ago I imagined doing it with an augmented reality display and a one-handed (chorded) keyboard. But the technology didn’t arrive, and I wasn’t even sure the ergonomics would work out (would it make me motion sick, for example?).
But then, last spring, I was at a fancy tech event, and I happened to be just out of the frame of a photo op that involved Jeff Bezos walking with a robotic dog. I wasn’t personally so excited about the robotic dog. But what really interested me was the person walking out of the frame on the other side, intently controlling the dog—using a laptop that he had strapped on in front of him as if he were selling popcorn.
Could one actually work like this, typing and everything? After my “heart-rate discovery” I decided I had to try it. I thought I’d have to build something myself, but actually one can just buy “walking desks”, and so I did. And after minor modifications, I discovered that I could walk and type perfectly well with it, even for a couple of hours. I was embarrassed I hadn’t figured out such a simple solution 20 years ago. But starting last fall—whenever the weather’s been good—I’ve tried to spend a couple of hours of each day walking outside like this:
And even when I’m intently concentrating on my computer, it’s somehow nice to be outside, and, yes, it seems to have made my resting heart rate go down. And I seem to have enough peripheral vision (or perhaps I’ve just been walking in “simple enough” environments) that I haven’t tripped even when I’m not consciously paying attention. No doubt it helps that I mostly haven’t been walking in public places, so there aren’t other people around. Of course, that also means that I haven’t had the opportunity to get the kind of curious stares I did in 1987 when I first walked down a city street talking on a shoe-sized cellphone….
My Desk Environment
I’ve had the same big wooden desk for 25 years. And needless to say, I had it constructed with some special features. One of my theories of personal organization is that any flat surface represents a potential “stagnation point” that will tend to accumulate piles of stuff—and the best way to avoid such piles is just to avoid having permanent flat surfaces. But one inevitably needs some flat surface, if only just to sign things (it’s not all digital yet), or to eat a snack. So my solution is to have pullouts. If one needs them, pull them out. But one can’t leave them pulled out, so nothing can accumulate on them:
These days I don’t deal with paper much. But whenever something does come across my desk, I like to file it. So behind my desk I have an array of drawers—with the little hack that there’s a slot at the top of each drawer that allows me to immediately slide things into the drawer, without opening it:
I used to fill up a banker’s box with filed papers every couple of months; now it seems to take a couple of years. And perhaps as a sign of how paperless I’ve become, I have a printer under my desk that I use so rarely that I now seem to go through a ream of paper only every year or so.
There are also other things that have changed over the years. I always want my main computer to be as powerful as possible. And for years that meant that it had to have a big fan to dissipate heat. But since I really like my office to be perfectly quiet (it adds a certain calmness that helps my concentration), I had to put the CPU part of my computer in a different room. And to achieve this, I had a conduit in the floor, through which I had to run often-finicky long-distance video cables. Well, now, finally, I have a powerful computer that doesn’t need a big fan—and so I just keep it behind my desk. (I actually also have three other not-so-quiet computers that I keep in the same room as the treadmill, so that when I’m on the treadmill I can experience all three main modern computing environments, choosing between them with a KVM switch.)
When I mention to people that I’m a remote CEO, they often say, “You must do lots of videoconferencing”. Well, actually, I do basically no videoconferencing. Screensharing is great, and critical. But typically I find video distracting. Often I’ll do a meeting where I have lots of people in case we need to get their input. But for most of the meeting I don’t need all of them to be paying attention (and I’m happy if they’re getting other work done). But if video is on, seeing people who are not paying attention just seems to viscerally kill the mood of almost any meeting.
Given that I don’t have video, audio is very important, and I’m quite a stickler for audio quality in meetings. No speakerphones. No bad cellphone connections. I myself remain quite old school. I wear a headset (with padding added to compensate for my lack of top-of-head hair) with a standard boom microphone. And—partly out of caution about having a radio transmitter next to my head all day—my headset is wired, albeit with a long wire that lets me roam around my office.
Even though I don’t use “talking head” video for meetings, I do have a document camera next to my computer. One time I’ll use this is when we’re talking about phones or tablets. Yes, I could connect their video directly into my computer. But if we’re discussing user experience on a phone it’s often helpful to be able to actually see my finger physically touching the phone.
The document camera also comes in handy when I want to show pages from a physical book, or artifacts of various kinds. When I want to draw something simple I’ll use the annotation capabilities of our screensharing system. But when I’m trying to draw something more elaborate I’ll usually do the retro thing of putting a piece of paper under the document camera, then just using a pen. I like the fact that the image from the document camera comes up in a window on my screen, that I can resize however I want. (I periodically try using drawing tablets but I don’t like the way they treat my whole screen as a canvas, rather than operating in a window that I can move around.)
On the Move
In some ways I lead a simple life, mostly at my desk. But there are plenty of times when I’m away from my desk—like when I’m someplace else in my house, or walking outside. And in those cases I’ll normally take a 13″ laptop to use. When I go further afield, it gets a bit more complicated.
If I’m going to do serious work, or give a talk, I’ll take the 13″ laptop. But I never like to be computerless, and the 13″ laptop is a heavy thing to lug around. So instead I also have a tiny 2-lb laptop, which I put in a little bag (needless to say, both the bag and the computer are adorned with our Spikey logo):
And for at least the past couple of years—unless I’m bringing the bigger computer, usually in a backpack—I have taken to “wearing” my little computer wherever I go. I originally wanted a bag where the computer would fit completely inside, but the nicest bag I could find had the computer sticking out a bit. To my surprise, though, this has worked well. And it’s certainly amusing when I’m talking to someone and quickly “draw” my computer, and they look confused, and ask, “Where did that come from?”
I always have my phone in my pocket, and if I have just a few moments that’s what I’ll pull out. It works fine if I’m checking mail, and deleting or forwarding a few messages. If I actually want to write anything serious, though, out will come my little computer, with its full keyboard. Of course, if I’m standing up it’s pretty impractical to try to balance the computer on one hand and type with the other. And sometimes if I know I’m going to be standing for a while, I’ll bring a tablet with me. But other times, I’ll just be stuck with my phone. And if I run out of current things I can usefully do (or I don’t have an internet connection) I’ll typically start looking at the “things to read” folder that I maintain synched on all my devices.
Back in 2007 I invented WolframTones because I wanted to have a unique ringtone for my phone. But while WolframTones has been successful as an example of algorithmic music composition, the only trace of it on my phone is the image of WolframTones compositions that I use as my home screen:
How do I take notes when I’m “out and about”? I’ve tried various technological solutions, but in the end none have proved both practical and universally socially acceptable. So I’ve kept doing the same thing for 40 years: in my pocket I have a pen, together with a piece of paper folded three times (so it’s about the size of a credit card). It’s very low-tech, but it works. And when I come back from being out I always take a few moments to transcribe what I wrote down, send out emails, or whatever.
I have little “tech survival kits” that I bring with me. Here are the current contents from my backpack:
The centerpiece is a tiny charger, that charges both my computer (through USB-C) and my phone. I bring various connectors, notably so I can connect to things like projectors. I also bring a very light 2- to 3-prong power adaptor, so I don’t find my charger falling out of overused power outlets.
When I’m going on “more serious expeditions” I’ll add some things to the kit:
There’s a “charging brick” (unfortunately now in short supply) that’ll keep my computer going for many hours. For events like trade shows, I’ll bring a tiny camera that takes pictures every 30 seconds, so I can remember what I saw. And if I’m really going out into the wilds, I’ll bring a satphone as well. (Of course, I always have other stuff too, like a very thin and floppy hat, a light neoprene bag-within-a-bag, glasses wipes, hand sanitizer, mosquito wipes, business cards, pieces of chocolate, etc.)
In my efforts to keep organized on trips, I’ll typically pack several plastic envelopes:
In “Presentation” there’ll be the adaptors (VGA, HDMI, …) I need to connect to projectors. Sometimes there’ll be a wired Ethernet adaptor. (For very low-key presentations, I’ll sometimes bring a tiny projector too.) In “Car” there’ll be a second cellphone that can be used as a GPS, with a magnetic back and a tiny thing for attaching to the air vent in a car. There’ll be a monaural headset, a phone charger, and sometimes a tiny inverter for my computer. If I’m bringing the satphone, there’ll also be a car kit for it, with an antenna that attaches magnetically to the roof of the car, so it can “see” the satellites. In “Hotel” there’ll be a binaural headset, a second computer charger, and a disk with an encrypted backup of my computer, in case I lose my computer and have to buy and configure a new machine. The fourth plastic envelope is used to store things I get on the trip, and it contains little envelopes (approximately one for each day of my trip) in which I put business cards.
Years ago, I always used to bring a little white-noise fan with me, to mask background noise, particularly at night. But at some point I realized that I didn’t need a physical fan, and instead I just have an app that simulates it (I used to use pink noise, but now I just use “air conditioner sound”). It’s often something of a challenge to predict just how loud the outside noise one’s going to encounter (say, the next morning) will be, and so how loud one should set the masking sound. And, actually, as I write this, I realize I should use modern audio processing in the Wolfram Language to just listen to external sounds, and adjust the masking sound to cover them.
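As a rough illustration of that idea (sketched in Python rather than the Wolfram Language, and not code I actually run), one could measure the level of a short ambient recording and set the masking sound a fixed margin above it. The function names, the 6 dB margin and the clamping range here are all assumptions chosen for the example:

```python
import math

def rms_dbfs(samples):
    """RMS level of samples (floats in [-1, 1]) in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

def masking_level_dbfs(ambient_dbfs, margin_db=6.0,
                       floor_dbfs=-50.0, ceiling_dbfs=-10.0):
    """Choose a masking level a fixed margin above the ambient level.

    The result is clamped so the masking sound is never silent and
    never uncomfortably loud (both limits are illustrative guesses).
    """
    target = ambient_dbfs + margin_db
    return max(floor_dbfs, min(ceiling_dbfs, target))
```

In practice one would re-measure every few seconds and change the level gradually, so the masking sound itself doesn't become a disturbance.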
Another thing I need when I travel is a clock. And nowadays it’s just a piece of Wolfram Language code running on my computer. But because it’s software, it can have a few extra features. I always leave my computer on my home timezone, so the “clock” has a slider to specify local time (yes, if I’m ever in a half-hour timezone again I’ll have to tweak the code). It also has a button Start sleep timer. When I press it, it starts a count-up timer, which lets me see how long I’ve been asleep, whatever my biological clock may say. (Start sleep timer also sends an email which gives my assistant an idea of whether or not I’ll make it to that early-next-morning meeting. The top right-hand “mouse corner” is a hack for preventing the computer from going to sleep.)
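The logic of such a clock is simple enough to sketch. The following is a minimal Python illustration, not the actual Wolfram Language code; the names `local_time`, `SleepTimer` and `format_elapsed` are made up for the example, and the email and "mouse corner" features are left out:

```python
from datetime import datetime, timedelta

def local_time(home_now, offset_hours):
    """Shift the home-timezone time by the slider's offset in hours."""
    return home_now + timedelta(hours=offset_hours)

class SleepTimer:
    """Count-up timer started by a 'Start sleep timer' button."""

    def __init__(self):
        self.started_at = None

    def start(self, now):
        self.started_at = now

    def elapsed(self, now):
        # Before the button is pressed there is nothing to count.
        if self.started_at is None:
            return timedelta(0)
        return now - self.started_at

def format_elapsed(delta):
    """Render a timedelta as hours and minutes, e.g. '7h 15m'."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"
```

Because `timedelta(hours=...)` accepts fractional values, an offset like 5.5 would cover a half-hour timezone in this sketch; only the slider's step size would need to change.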
Whenever it’s practical, I like to drive myself places. It was a different story before cellphones. But nowadays if I’m driving I’m productively making a phone call. For my “drive times” I’ll schedule meetings that don’t require me to look at anything (and, yes, it’s nice to have standard conference call numbers programmed in my phone, so I can voice-dial them). And I maintain a “call-while-driving” list of calls that I can do while driving, particularly if I’m in an unusual-for-me timezone.
I’ve always had the problem that if I try to work on a computer while I’m being driven by someone else, I get car sick. I thought I had tried everything. Big cars. Little cars. Hard suspension. Soft suspension. Front seat. Back seat. Nothing worked. But a couple of years ago, quite by chance, I tried listening to music with big noise-canceling headphones—and I didn’t get car sick. But what if when I’m being driven I want to be on the phone while I’m using my computer? Well, at the 2018 Consumer Electronics Show, despite my son’s admonition that “just because you can’t tell what they’re selling at a booth doesn’t mean it’s interesting”, I stopped at a booth and got these strange objects, which, despite looking a bit odd, do seem to prevent car sickness for me, at least much of the time:
Giving Talks
I give quite a lot of talks—to a very wide range of audiences. I particularly like giving talks about subjects I haven’t talked about before. I give talks to the fanciest business, tech and science groups. I give talks to schoolkids. I enjoy interacting with audiences (Q&A is always my favorite part), and I enjoy being spontaneous. And I essentially always end up doing livecoding.
When I was young I traveled quite a bit. I did have portable computers even back in the 1980s (my first was an Osborne 1 in 1981), though mostly in those days my only way to stay computer-productive was to have workstation computers shipped to my destinations. Then in the early 1990s, I decided I wasn’t going to travel anymore (not least because I was working so intensely on A New Kind of Science). So for a while I basically didn’t give any talks. But then technology advanced. And it started being realistic to give talks through videoconferencing.
I went through several generations of technology, but a number of years ago I built out a videoconferencing setup in my basement. The “set” can be reconfigured in various ways (podium, desk, etc.). But basically I have a back-projection screen on which I can see the remote audience. The camera is in front of the screen, positioned so I’m looking straight at it. If I’m using notes or a script (which, realistically, is rare) I have a homemade teleprompter, consisting of a half-silvered mirror and a laptop, through which I can look at the camera.
While it’s technically feasible for me to be looking straight at the camera when I’m livecoding, this makes it look to the audience as if I’m staring off into space, which seems weird. It’s better to look slightly down when I’m obviously looking at a screen. And in fact with some setups it’s good for the audience to see the top of a computer right at the bottom of the screen, to “explain” what I’m looking at.
Videoconferenced talks work quite well in many settings (and, for some extra fun, I’ve sometimes used a telepresence robot). But in recent years (partly as a result of my children wanting to do it with me) I’ve decided that traveling is OK—and I’ve been all over the place:
I’ll usually be giving talks—often several per day. And I’ve gradually developed an elaborate checklist of what’s needed to have them work. A podium that’s at the right height and horizontal enough to let me type easily on my computer (and preferably not so massive that I’m hidden from the audience). An attachable microphone that leaves my hands free to type. A network connection that lets me reach our servers. And, of course, to let the audience actually see things, a computer projector.
I remember the very first computer projector I used, in 1980. It was a Hughes “liquid crystal light valve”, and once I got it connected to a CRT terminal, it worked beautifully. In the years since then I’ve used computer projectors all over the world, both in the fanciest audiovisual situations, and in outlying places with ancient equipment and poor infrastructure. And it’s amazing how random it is. In places where one can’t imagine the projector is going to work, it’ll be just fine. And in places where one can’t imagine it won’t work, it’ll fail horribly.
Some years ago I was giving a talk at TED—with some of the fanciest audiovisual equipment I’d ever seen. And that was one of the places where things failed horribly. Fortunately we did a test the day before. But it took a solid three hours to get the top-of-the-line computer projector to successfully project my computer’s screen.
And as a result of that very experience I decided I’d better actually understand how computers talk to projectors. It’s a complicated business, that involves having the computer and the projector negotiate to find a resolution, aspect ratio, frame rate, etc. that will work for both of them. Underneath, there are things called EDID strings that are exchanged, and these are what typically get tangled up. Computer operating systems have gotten much better about handling this in recent years, but for high-profile, high-production-value events, I have a little box that spoofs EDID strings to force my computer to send a specific signal, regardless of what the projector seems to be asking it for.
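To give a sense of what is in one of these EDID strings, here is a small Python sketch that checks a 128-byte EDID base block and reads the display's preferred resolution from it. It relies on the standard EDID layout (a fixed 8-byte header, a final checksum byte that makes all 128 bytes sum to 0 modulo 256, and the first detailed timing descriptor at byte 54). The function names are made up for the example, and this has nothing to do with how the spoofing box itself works:

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])

def is_valid_edid(block):
    """Check the fixed header and the checksum of an EDID base block."""
    return (
        len(block) >= 128
        and block[:8] == EDID_HEADER
        and sum(block[:128]) % 256 == 0
    )

def preferred_resolution(block):
    """Return (width, height) from the first detailed timing descriptor.

    Returns None if the descriptor's pixel clock is zero, which means
    the slot is being used for something other than a timing.
    """
    d = block[54:72]
    if d[0] == 0 and d[1] == 0:
        return None
    # Active pixel counts are 12 bits: 8 low bits in one byte, and the
    # 4 high bits in the upper nibble of a shared byte.
    width = d[2] | ((d[4] & 0xF0) << 4)
    height = d[5] | ((d[7] & 0xF0) << 4)
    return width, height
```

A block that fails `is_valid_edid` is one plausible way for the negotiation to get "tangled up", since the computer then has to guess what the projector can display.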
Some of the talks I give are completely spontaneous. But often I’ll have notes—and occasionally even a script. And I’ll always write these in a Wolfram Notebook. I then have code that “paginates” them, basically replicating “paragraphs” at the end of each page, so I have freedom in when I “turn the page”. In past years I used to transfer these notes to an iPad that I’d set up to “turn the page” whenever I touched its screen. But in recent years I’ve actually just synched files, and used my little computer for my notes—which has the advantage that I can edit them right up to the moment I start giving the talk.
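The pagination idea can be sketched as follows. This is a Python illustration under one reading of the description above, namely that the last paragraph or so of each page is repeated at the top of the next; the function name and the `per_page` and `overlap` parameters are assumptions, not the actual notebook code:

```python
def paginate(paragraphs, per_page=5, overlap=1):
    """Split paragraphs into pages whose ends overlap.

    The last `overlap` paragraphs of each page are repeated at the
    start of the next one, so the page can be turned anywhere inside
    the overlapping region without losing one's place.
    """
    if overlap < 0 or per_page <= overlap:
        raise ValueError("need 0 <= overlap < per_page")
    if not paragraphs:
        return []
    step = per_page - overlap
    pages = []
    start = 0
    while True:
        pages.append(paragraphs[start:start + per_page])
        if start + per_page >= len(paragraphs):
            break
        start += step
    return pages
```

For example, seven paragraphs with `per_page=3` and `overlap=1` give three pages, where the third and fifth paragraphs each appear on two consecutive pages.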
In addition to notes, I’ll sometimes also have material that I want to immediately bring into the talk. Now that we have our new Presenter Tools system, I may start creating more slide-show-like material. But that’s not how I’ve traditionally worked. Instead, I’ll typically just have a specific piece of Wolfram Language code I want to input, without having to take the time to explicitly type it. Or perhaps I’ll want to pick an image from a “slide farm” that I want to immediately put on the screen, say in response to a question. (There’s a lot of trickiness about projector resolutions in, for example, slides of cellular automata, because unless they’re “pixel perfect” they’ll alias—and it’s not good enough just to scale them like typical slide software would.)
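One common way to keep such images “pixel perfect” is to enlarge them only by whole-number factors, so that every cell becomes a square block of identical screen pixels. Here is a minimal Python sketch of that approach. It assumes the projector really is showing the computer's output at its native resolution, and the function names are invented for the example:

```python
def integer_scale(cells_wide, cells_high, screen_wide, screen_high):
    """Largest whole-number magnification that still fits the screen.

    A result of 0 means the pattern doesn't fit even at 1 pixel per cell.
    """
    return min(screen_wide // cells_wide, screen_high // cells_high)

def upscale(grid, scale):
    """Replicate each cell into a scale-by-scale block of pixels."""
    out = []
    for row in grid:
        wide_row = [cell for cell in row for _ in range(scale)]
        out.extend(list(wide_row) for _ in range(scale))
    return out
```

With a 400×200 cell pattern on a 1920×1080 display, `integer_scale` gives 4, so the image occupies 1600×800 pixels with a border around it, rather than being stretched by a fractional factor that would blur or alias the cells.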
So how do I deal with bringing in this material? Well, I have a second display connected to my computer—whose image isn’t projected. (And, yes, this can contribute to horrible tangling of EDID strings.) Then on that second display I can have things to click or copy. (I have a Wolfram Language function that will take a notebook of inputs and URLs, and make me a palette that I can click to type inputs, open webpages, etc.)
In the past we used to have a little second monitor to attach to my laptop, essentially a disembodied laptop screen. But it took all sorts of kludges to get both it and the projector connected to my laptop (sometimes one would be USB, one would be HDMI, etc.). But now we can just use an iPad, and it’s all pure software (though the interaction with projectors can still be finicky):
For a while, just to be stylish, I was using a computer with a Spikey carved out of its case, and backlit. But the little rhombuses in it were a bit fragile, so nowadays I mostly just use “Spikey skins” on my computers:
My Filesystem
The three main applications I use all day are Wolfram Desktop, a web browser, and email. My main way of working is to create (or edit) Wolfram Notebooks. Here are a few notebooks I worked on today:
On a good day I’ll type at least 25,000 characters into Wolfram Notebooks (and, yes, I record all my keystrokes). I always organize my notebooks into sections and subsections and so on (which, very conveniently, automatically exist in hierarchical cells). Sometimes I’ll write mostly text in a notebook. Sometimes I’ll screen capture something from elsewhere and paste it in, as a way to keep notes. Depending on what I’m doing, I’ll also actually do computations in a notebook, entering Wolfram Language input, getting results, etc.
Over the years, I’ve accumulated over a hundred thousand notebooks, representing product designs, plans, research, writings, and, basically, everything I do. All these notebooks are ultimately stored in my filesystem (yes, I sync with the cloud, use cloud files, and file servers, etc.). And I take pains to keep my filesystem organized, with the result that I can typically find any notebook I’m looking for just by navigating my filesystem, faster than I could formulate a search for it.
I believe I first thought seriously about how to organize my files back in 1978 (which was also when I started using the Unix operating system). And over the past 40 years I’ve gone through five generations of filesystem organization, with each generation basically being a reflection of how I’m organizing my work at that stage in my life.
For example, during the period from 1991 to 2002 when I was writing my big book A New Kind of Science, a substantial part of my filesystem was organized simply according to sections of the book:
And it’s very satisfying that today I can go immediately from, say, an image in the online version of the book, to the notebook that created it (and the stability of the Wolfram Language means that I can immediately run the code in the notebook again—though sometimes it can now be written in a more streamlined way).
The sections of the book are basically laid out in the NewScience/Book/Layout/ folder of my “third-generation” filesystem. Another part of that filesystem is NewScience/BookResearch/Topics. And in this folder are about 60 subfolders named for broad topics that I studied while working on the book. Within each of these folders are then further subfolders for particular projects I did while studying those topics—which often then turned into particular sections or notes in the book.
Some of my thinking about computer filesystems derives from my experience in the 1970s and 1980s with physical filesystems. Back when I was a teenager doing physics I voraciously made photocopies of papers. And at first I thought the best way to file these papers would be in lots of different categories, with each category stored in a different physical file folder. I thought hard about the categories, often feeling quite pleased with the cleverness of associating a particular paper with a particular category. And I had the principle that if too many papers accumulated in one category, I should break it up into new categories.
All this at first seemed like a good idea. But fairly quickly I realized it wasn’t. Because too often when I wanted to find a particular paper I couldn’t figure out just what cleverness had caused me to associate it with what category. And the result was that I completely changed my approach. Instead of insisting on narrow categories, I allowed broad, general categories—with the result that I could easily have 50 or more papers filed in a single category (often ending up with multiple well-stuffed physical file folders for a given category):
And, yes, that meant that I would sometimes have to leaf through 50 papers or more to find one I wanted. But realistically this wouldn’t take more than a few minutes. And even if it happened several times a day it was still a huge win, because it meant that I could actually successfully find the things I wanted.
I have pretty much the same principle about some parts of my computer filesystem today. For example, when I’m collecting research about some topic, I’ll just toss all of it into a folder named for that topic. Sometimes I’ll even do this for years. Then when I’m ready to work on that topic, I’ll go through the folder and pick out what I want.
These days my filesystem is broken into an active part (that I continuously sync onto all my computers), and a more archival part, that I keep on a central fileserver (and that, for example, contains my older-generation filesystems).
There are only a few top-level folders in my active filesystem. One is called Events. Its subfolders are years. And within each year I’ll have a folder for each of the outside events I go to in that year. In that folder I’ll store material about the event, notebooks I used for talks there, notes I made at the event, etc. Since in a given year I won’t go to more than, maybe, 50 events, it’s easy to scan through the Events folder for a given year, and find the folder for a particular event.
Another top-level folder is called Designs. It contains all my notes about my design work on the Wolfram Language and other things we’re building. Right now there are about 150 folders about different active areas of design. But there’s also a folder called ARCHIVES, which contains folders about earlier areas that are no longer active.
And in fact this is a general principle in the project-oriented parts of my filesystem. Every folder has a subfolder called ARCHIVES. I try to make sure that the files (or subfolders) in the main folder are always somehow active or pending; anything that’s finished with I put in ARCHIVES. (I put the name in capitals so it stands out in directory listings.)
For most projects I’ll never look at anything in ARCHIVES again. But of course it’s easy to do so if I want to. And the fact that it’s easy is important, because it means I don’t have nagging concerns about saying “this is finished with; let’s put it in ARCHIVES”, even if I think there’s some chance it might become active again.
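As a rough illustration of the ARCHIVES convention, here is a minimal Python sketch. It is not the tooling described in this piece; the function names `archive_item` and `active_items` are hypothetical, and the only assumption taken from the text is that each project folder has a subfolder named ARCHIVES holding everything that is finished with.

```python
import shutil
from pathlib import Path

ARCHIVE_NAME = "ARCHIVES"  # capitals so it stands out in directory listings


def archive_item(item: Path) -> Path:
    """Move a finished file or subfolder into an ARCHIVES folder beside it."""
    archive_dir = item.parent / ARCHIVE_NAME
    archive_dir.mkdir(exist_ok=True)
    destination = archive_dir / item.name
    if destination.exists():
        # Refuse to overwrite something that was archived earlier.
        raise FileExistsError(f"{destination} is already archived")
    shutil.move(str(item), str(destination))
    return destination


def active_items(folder: Path):
    """List everything in a folder except the ARCHIVES subfolder itself."""
    return sorted(p.name for p in folder.iterdir() if p.name != ARCHIVE_NAME)
```

Because nothing is deleted, archiving stays a low-stakes decision: an item that becomes active again can simply be moved back.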
As it happens, this approach is somewhat inspired by something I saw done with physical documents. When I was consulting at Bell Labs in the early 1980s I saw that a friend of mine had two garbage cans in his office. When I asked him why, he explained that one was for genuine garbage and the other was a buffer into which he would throw documents that he thought he’d probably never want again. He’d let the buffer garbage can fill up, and once it was full, he’d throw away the lower documents in it, since from the fact that he hadn’t fished them out, he figured he’d probably never miss them if they were thrown away permanently.
Needless to say, I don’t follow exactly this approach, and in fact I keep everything, digital or paper. But the point is that the ARCHIVES mechanism gives me a way to easily keep material while still making it easy to see everything that’s active.
I have a bunch of other conventions too. When I’m doing designs, I’ll typically keep my notes in files with names like Notes-01.nb or SWNotes-01.nb. It’s like my principle of not having too many file categories: I don’t tend to try to categorize different parts of the design. I just sequentially number my files, because typically it’ll be the most recent—or most recent few—that are the most relevant when I continue with a particular design. And if the files are just numbered sequentially, it’s easy to find them; one’s not trying to remember what name one happened to give to some particular direction or idea.
A long time ago I started always naming my sequential files file-01, file-02, etc. That way pretty much any sorting scheme will sort the files in sequence. And, yes, I do often get to file-10, etc. But in all these years I have yet to get even close to file-99.
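The reason zero-padding matters can be seen in a short Python sketch. The helper `next_sequential_name` is hypothetical (it is not something described in the text); it just assumes names of the form stem-NN plus an extension, as in Notes-01.nb.

```python
import re


def next_sequential_name(existing, stem="Notes", ext=".nb", width=2):
    """Return the next zero-padded name in a stem-NN sequence."""
    pattern = re.compile(rf"^{re.escape(stem)}-(\d+){re.escape(ext)}$")
    numbers = []
    for name in existing:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return f"{stem}-{next_number:0{width}d}{ext}"


# Plain string sorting keeps zero-padded names in numeric order...
padded = sorted(["file-10", "file-02", "file-01"])
# ...but puts unpadded names out of order ("file-10" lands before "file-2").
unpadded = sorted(["file-10", "file-2", "file-1"])
```

With a width of 2 the scheme holds up through file-99, which matches the observation that the sequence has never come close to that limit.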
Knowing Where to Put Everything
When I’m specifically working on a particular project, I’ll usually just be using files in the folder associated with that project. But on a good day, I’ll have lots of ideas about lots of different projects. And I also get hundreds of emails every day, relevant to all sorts of different projects. But often it’ll be months or years before I’m finally ready to seriously concentrate on one of these other projects. So what I want to do is to store the material I accumulate in such a way that even long in the future I can readily find it.
For me, there are typically two dimensions to where something should be stored. The first is (not surprisingly) the content of what it’s about. But the second is the type of project in which I might use it. Is it going to be relevant to some feature of some product? Is it going to be raw material for some piece I write? Is it a seed for a student project, say at our annual Summer School? And so on.
For some types of projects, the material I’m storing typically consists of a whole file, or several files. For others, I just need to store an idea which can be summarized in a few words or paragraphs. So, for example, the seed for a student project is typically just an idea, that I can describe with a title, and perhaps a few lines of explanation. And in any given year I just keep adding such project ideas to a single notebook, which I’ll look at (and summarize) right before our annual summer programs.
For pieces like this that I’m potentially going to write, it’s a little different. At any given time, there are perhaps 50 pieces that I’m considering at some point writing. And what I do is to create a folder for each of them. Each will typically have files with names like Notes-01.nb, into which I accumulate specific ideas. But then the folder will also contain complete files, or groups of files, that I accumulate about the topic of the piece. (Sometimes I’ll organize these into subfolders, with names like Explorations and Materials.)
In my filesystem, I have folders for different types of projects: Writings, Designs, StudentProjects, etc. I find it important to have only a modest number of such folders (even with my fairly complex life, not much more than a dozen). When something comes in—say from a piece of email, or from a conversation, or from something I see on the web, or just from an idea I have—I need to be able to quickly figure out what type of project (if any) it might be relevant to.
At some level it’s as simple as “what file should I put it into”? But the key point is to have a pre-existing structure that makes it quick to decide that—and then to have this structure be one in which I can readily find things even far into the future.
There are plenty of tricky issues. Particularly if years go by, the way one names or thinks about a topic may change. And sometimes that means at some point I’ll just rename a folder or some such. But the crucial thing as far as I’m concerned is that at any given time the total number of folders into which I’m actively putting things is small enough that I can basically remember all of them. I might have a dozen folders for different types of projects. Then some of these will need subfolders for specific projects about specific topics. But I try to limit the total number of “active accumulation folders” to at most a few hundred.
Some of those “accumulation folders” I’ve had for a decade or more. A few will come into existence and be gone within a few months. But most will last at most a few years—basically the time between when I conceptualize a project, and when the project is, for practical purposes, finished.
It’s not perfect, but I end up maintaining two hierarchies of folders. The first, and most important, is in my filesystem. But the second is in my email. There are two basic reasons I maintain material in email folders. The first is immediate convenience. Some piece of mail comes in and I think “that’s relevant to such-and-such a project that I’m planning to do”—and I want to store it in an appropriate place. Well, if that place is a mail folder, all I have to do is move the mail with one mouse motion (or maybe with one press of a Touch Bar button). I don’t have to, for example, find a file or filesystem folder to put it into.
There’s also another reason it’s good to leave mail as mail: threading. In the Wolfram Language we’ve now got capabilities both for importing mailboxes, and for connecting to live mail servers. And one of the things one quickly sees is how complicated the graphs (actually, hypergraphs) of email conversations can be. Mail clients certainly aren’t perfect as a way to view these conversations, but it’s a lot better to use one than, say, to have a collection of separate files.
When projects are fairly well defined, but aren’t yet very active, I tend to use filesystem folders rather than email folders. Typically what will be coming in about these projects are fairly isolated (and non-threaded) pieces of mail. And I find it best either just to drag those pieces of mail into appropriate project folders, or to copy out their contents and add them to notebooks.
When a project is very active, there may be lots of mail coming in about it, and it’s important to preserve the threading structure. And when a project isn’t yet so well defined, I just want to throw everything about it into a single “bucket”, and not have to think about organizing it into subfolders, notebooks, etc.
If I look at my mail folders, I see many that parallel folders in my filesystem. But I see some that do not, particularly related to longer-term project concepts. And I have many such folders that have been there for well over a decade (my current overall mail folder organization is about 15 years old). Sometimes their names aren’t perfect. But there are few enough folders, and I’ve seen them for long enough, that I have a sense of what I’m filing in them, even though their names don’t quite capture it.
It’s always very satisfying when I’m ready to work on a project, and I open the mail folder for it, and start going through messages, often from long ago. Just in the past few weeks, as we wrap up a major new version of the Wolfram Language, I’m starting to look ahead, and I’ve been going through folders with messages from 2005, and so on. When I saved those messages, I didn’t yet have a definite framework for the project they’re about. But now I do. So when I go through the messages I can quickly put them into the appropriate active notebooks and so on. Then I delete the messages from the mail folder, and eventually, once it is empty, delete the whole mail folder. (Unlike with files, I don’t find it useful to have an ARCHIVES folder for mail; the mail is just too voluminous and not organized enough, so to find any particular item I’ll probably end up having to search for it anyway, and of course I certainly have all of my mail stored.)
OK, so I have my filesystem, and I have mail. At our company we also have an extensive project management system, as well as all sorts of databases, request trackers, source control systems, etc. Mostly the nature of my current work does not cause me to interact directly with these, and I don’t explicitly store my own personal output in them. At different times, and with different projects, I have done so. But right now my interaction with these systems is basically only as a viewer, not an author.
Beyond these systems, there are lots of things that I interact with basically through webpages. These might be public sites like wolframalpha.com or wolfram.com. They might be internal sites at our company. And they might be preliminary (say, “test” or “devel”) versions of what will in the future be public websites or web-based services. I have a personal homepage that gives me convenient access to all these things:
The source for the homepage is (needless to say) a Wolfram Notebook. I can edit this notebook in my filesystem, then press a button to deploy a version to the Wolfram Cloud. I’ve got an extension in my web browser so that every time I create a new browser window or tab, the initial content will be my personal homepage.
And when I’m going to start doing something, there are just a few places I go. One is this web homepage, which I access many hundreds of times every day. Another is my email and its folders. Another is my desktop filesystem. And basically the only other one of any significance is my calendar system.
From time to time, I’ll see other people’s computers, and their desktops will be full of files. My desktop is completely empty, and plain white (convenient for whole-screen screensharing and livestreaming). I’d be mortified if there were any files to be seen on my desktop. I’d consider it a sign of defeat in my effort to keep what I’m doing organized. The same can be said of generic folders like Documents and Downloads. Yes, in some situations applications etc. will put files there. But I consider these directories to be throwaways. Nothing in them do I intend to be part of my long-term organizational structure. And they’re not synced to the cloud, or across my different computers.
Whatever the organization of my files may be, one feature of them is that I keep them a long time. In fact, my oldest file dates are from 1980. Back then, there was something a bit like the cloud, except it was called timesharing. I’ve actually lost some of the files that I had on timesharing systems. But the ones I had on on-premise computers are still with me (though, to be fair, some had to be retrieved from 9-track backup tapes).
And today, I make a point of having all my files (and all my email) actively stored on-premise. And, yes, that means I have this in my basement:
The initial storage is on a standard RAID disk array. This is backed up to computers at my company headquarters (about 1000 miles away), where standard tape backups are done. (In all these years, I’ve only ever had to retrieve from a backup tape once.) I also sync my more active files to the cloud, and to all my various computers.
All the Little Conveniences
My two major personal forms of output are mail messages and Wolfram Notebooks. And over the 30 years since we first introduced notebooks we’ve optimized our notebook system to the point where I can just press a key to create a default new notebook, and then I’m immediately off and running, writing what automatically becomes a good-looking structured document. (And, by the way, it’s very nice to see that we’ve successfully maintained compatibility for 30 years: notebooks I created back in 1988 still just work.)
Sometimes, however, I’m making a notebook that’s not so much for human consumption as for input to some automated process. And for this, I use a whole variety of specially set up notebooks. For example, if I want to create an entry in our new Wolfram Function Repository, I just go to the menu item (available in any Version 12 system) File > New > Repository Item > Function Repository Item:
This effectively “prompts” me for items and sections to add. When I’m done, I can press Submit to Repository to send the notebook off to our central queue for repository item reviews (and, just because I’m the CEO doesn’t mean I get out of the review process—or want to).
I actually create a fair amount of content that’s structured for further processing. A big category is Wolfram Language documentation. And for authoring this we have an internal system we call DocuTools, that’s all based on a giant palette developed over many years, that I often say reminds one of an airplane cockpit in its complexity:
The idea of DocuTools is to make it as ergonomic as possible to author documentation. It has more than 50 subpalettes (a few shown above), and altogether no fewer than 1016 buttons. If I want to start a new page for a Wolfram Language function I just press New Function Page, and up pops:
A very important part of this page is the stripe at the top that says “Future”. This means that even though the page will be stored in our source control system, it’s not ready yet: it’s just something we’re considering for the future. And the system that builds our official documentation will ignore the page.
Usually we (which quite often actually means me) will write documentation for a function before the function is implemented. And we’ll include all sorts of details about features the function should have. But when the function is actually first implemented, some of those features may not be ready yet. And to deal with this we (as we call it) “futurize” parts of the documentation. It’s still there in the source control system, and we see it every time we look at the source for the documentation page. But it’s not included when the page for documentation that people will see is built.
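The idea of "futurizing" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real system works on Wolfram Notebooks, whereas here a page is modeled as a list of dictionaries with a hypothetical `future` flag, and `build_public_page` is an invented name.

```python
def build_public_page(cells):
    """Return only the cells that are not marked as future."""
    return [cell for cell in cells if not cell.get("future", False)]


# The source keeps every cell, including ones for features not yet implemented.
page_source = [
    {"text": "Basic usage", "future": False},
    {"text": "Option not yet implemented", "future": True},
    {"text": "Examples", "future": False},
]

# The built page simply omits the futurized cells; the source is left untouched.
public = build_public_page(page_source)
```

The key property is that filtering happens at build time, so the futurized material remains visible to anyone reading the source.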
DocuTools is of course implemented in the Wolfram Language, making extensive use of the symbolic structure of Wolfram Notebooks. And over the years it’s grown to handle many things that aren’t strictly documentation; in fact, for me it’s become the main hub for the creation of almost all notebook-based content.
There’s a button, for example, for Stephen Wolfram Blog. Press it and one gets a standard notebook ready to write into. But in DocuTools there’s a whole array of buttons that allow one to insert suggestions and edits. And when I’ve written a blog what will come back is typically something like this:
The pink boxes are “you really need to fix this”; the tan are “here’s a comment”. Click one and up comes a little form:
Of course, there are plenty of change-tracking and redlining systems out there in the world. But with the Wolfram Language it becomes easy to create a custom one that’s optimized for my needs, so that’s what I’ve had done. Before I had this, it used to take many hours to go through edit suggestions (I remember a horrifying 17-hour plane ride where I spent almost the whole time going through suggestions for a single post). But now—because it’s all optimized for me—I can zip through perhaps 10 times faster.
Very often tools that are custom built for me end up being adapted so everyone else can use them too. An example is a system for authoring courses and creating videos. I wanted to be able to do this as a “one-man band”—a bit like how I do livestreaming. My idea was to create a script that contains both words to say and code to input, then to make the video by screen recording in real time while I went through the script. But how would the inputs work? I couldn’t type them by hand because it would interrupt the real-time flow of what I was saying. But the obvious thing is just to “autotype” them directly into a notebook.
But how should all this be orchestrated? I start from a script:
Then I press Generate Recording Configuration. Immediately a title screen comes up in one area of my screen, and I set up my screen-recording system to record from this area. Elsewhere on my screen is the script. But what about the controls? Well, they’re just another Wolfram Notebook, that happens to act as a palette containing buttons:
But how can I actually operate this palette? I can’t use my mouse, because then I’d take focus away from the notebook that’s being screen recorded. So the idea I had was to put the palette on an extended desktop that happens to be displayed on an iPad. So then to “perform” the script, I just press buttons on the palette.
There’s a big Advance Script button. And let’s say I’ve read to a point in the script where I need to type something into the notebook. If I want to simulate actual typing I press Slow Type. This will enter the input character-at-a-time into the notebook (yes, we measured the inter-key delay distribution for human typing, and simulate it). After a while it gets annoying to see all that slow typing. So then I just use the Type button, which copies the whole input immediately into the notebook. If I press the button again, it’ll perform its second action: Evaluate. And that’s the equivalent of pressing Shift+Enter in the notebook (with some optional extra explanatory popups suitable for the video).
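For a sense of how simulated typing of this kind might work, here is a minimal Python sketch. The measured delay distribution is not given in the text, so the log-normal shape and the `median_s` and `sigma` values below are assumptions chosen purely for illustration; `keystroke_delays` and `slow_type` are hypothetical names.

```python
import math
import random
import time


def keystroke_delays(text, rng, median_s=0.12, sigma=0.4):
    """One positive delay per character, drawn from a log-normal distribution.

    The distribution and its parameters are illustrative assumptions,
    not measured values.
    """
    mu = math.log(median_s)  # the median of a log-normal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in text]


def slow_type(text, emit, rng=None, sleep=time.sleep):
    """Send text to emit() one character at a time, pausing between keys."""
    rng = rng or random.Random()
    for char, delay in zip(text, keystroke_delays(text, rng)):
        sleep(delay)
        emit(char)


# Demonstration with the pause disabled, collecting characters in a list.
typed = []
slow_type("Plot[Sin[x], {x, 0, 10}]", typed.append,
          rng=random.Random(0), sleep=lambda seconds: None)
```

An immediate "Type" action would be the degenerate case: emit the whole string at once with no delays.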
I could go on about other tools I’ve had built using the Wolfram Language, but this gives a flavor. But what do I use that isn’t the Wolfram Language? Well, I use a web browser, and things that can be reached through it. Still, quite often, I’m just going to the Wolfram Cloud, and for example viewing or using cloud notebooks there.
Sometimes I’ll use our public Wolfram Cloud. But more often I’ll use a private Wolfram Cloud. The agendas for most of our internal meetings are notebooks that are hosted on our internal Wolfram Cloud. I also personally have a local private Wolfram Cloud running, that I host an increasing number of applications on.
Here’s the dock on my computer as of right now:
It’s got a filesystem browser; it’s got an email client; it’s got three web browsers (yes, I like to test our stuff on multiple browsers). Then I’ve got a calendar client. Next is the client for our VoIP phone system (right now I’m alternating between using this, and using audio along with our screensharing system). Then, yes, at least right now I have a music app. I have to say it’s rather rare that my day gives me a chance to listen to music. Probably the main time when I end up doing it is when I’m very behind on email, and need something to cheer me up as I grind through thousands of messages. As soon as I’m actually writing anything nontrivial, though, I have to pause the music, or I can’t concentrate. (And I have to find music without vocals—because I’ve noticed I can’t read at full speed if I’m hearing vocals.)
Sometimes I’ll end up launching a standard word processor, spreadsheet, etc. app because I’m opening a document associated with one of these apps. But I have to admit that in all these years I’ve essentially never authored a document from scratch with any of these apps; I end up just using technology of ours instead.
Occasionally I’ll open a terminal window, and directly use operating system commands. But this is becoming less and less common—because more and more I’m just using the Wolfram Language as my “super shell”. (And, yes, it’s incredibly convenient to store and edit commands in a notebook, and to instantly be able to produce graphical and structured output.)
As I write this, I realize a little optimization I haven’t yet made. On my personal homepage there are some links that do fairly complex things. One, for example, initiates the process for me doing an unscheduled livestream: it messages our 24/7 system monitoring team so they can take my feed, broadcast it, and monitor responses. But I realize that I still have quite a few custom operating system commands, that do things like update from the source code repository, that I type into a terminal window. I need to set these up in my private cloud, so I can just have links on my personal homepage that run Wolfram Language code for these commands. (To be fair, some of these commands are very old; for example, my fmail command that sends a mail message in the future, was written nearly 30 years ago.)
But, OK, if I look at my dock of apps, there’s a definite preponderance of Spikey ones. But why, for example, do I need three identical standard Spikeys? They’re all the Wolfram Desktop app. But there are three versions of it. The first one is our latest distributed version. The second one is our latest internal version, normally updated every day. And the third one (which is in white) is our “prototype build”, also updated every day, but with lots of “bleeding edge” features that aren’t ready to go into serious testing.
It requires surprisingly fancy operating system footwork to get these different versions installed every night, and to correctly register document types with them. But it’s very important to my personal workflow. Typically I’ll use the latest internal version (and, yes, I have a directory with many previous versions too), but occasionally, say for some particular meeting, I’ll try out the prototype build, or I’ll revert to the released build, because things are broken. (Dealing with multiple versions is one of those things that’s easier in the cloud—and we have a whole array of different configurations running in internal private clouds, with all sorts of combinations of kernel, front end, and other versions.)
When I give talks and so on, I almost always use the latest internal version. I find that livecoding in front of an audience is a great way to find bugs—even if it sometimes makes me have to explain, as I put it, the “disease of the software company CEO”: to always want to be running the latest version, even if it hasn’t been seriously tested and was built the night before.
Archiving & Searching
A critical part of my personal infrastructure is something that in effect dramatically extends my personal memory: my “metasearcher”. At the top of my personal homepage is a search box. Type in something like “rhinoceros elephant” and I’ll immediately find every email I’ve sent or received in the past 30 years in which that’s appeared, as well as every file on my machine, and every paper document in my archives:
To me it’s extremely convenient to have a count of the messages by year; it often helps me remember the history or story behind whatever I’m asking. (In this case, I can see a peak in 2008, which is when we were getting ready to launch Wolfram|Alpha—and I was working on data about lots of kinds of things, including species.)
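A per-year count like this is straightforward to compute once messages are available as records. The Python sketch below assumes a hypothetical record format (a dictionary with `date` and `text` keys) and simple case-insensitive substring matching; it is not how the metasearcher itself is implemented.

```python
from collections import Counter
from datetime import date


def hits_by_year(messages, terms):
    """Count messages containing every search term, grouped by year."""
    wanted = [term.lower() for term in terms]
    counts = Counter(
        message["date"].year
        for message in messages
        if all(term in message["text"].lower() for term in wanted)
    )
    return dict(sorted(counts.items()))


# Invented sample records, for illustration only.
sample = [
    {"date": date(2008, 3, 1), "text": "Species data: rhinoceros, elephant"},
    {"date": date(2008, 9, 12), "text": "Elephant and rhinoceros ranges"},
    {"date": date(2015, 1, 5), "text": "An elephant in the room"},
]
by_year = hits_by_year(sample, ["rhinoceros", "elephant"])
```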
Of course, a critical piece of making my metasearcher work is that I’ve stored so much stuff. For example, I actually have all the 815,000 or so emails that I’ve written in the past 30 years, and all the 2.3 million (mostly non-spam) ones I’ve received. And, yes, it helps tremendously that I’ve had a company with organized IT infrastructure etc. for the past 32 years.
But email, of course, has the nice feature that it’s “born digital”. What about things that were, for example, originally on paper? Well, I have been something of an “informational packrat” for most of my life. And in fact I’ve been pretty consistently keeping documents back to when I started elementary school in 1968. They’ve been re-boxed three times since then, and now the main ones are stored like this:
(I also have file folder storage for documents on people, organizations, events, projects and topics.) My rate of producing paper documents increased through about 1984, then decayed quite rapidly, as I went more digital. Altogether I have about a quarter million pages of primary non-bulk-printed documents—mostly from the earlier parts of my life.
About 15 years ago I decided I needed to make these searchable, so I initiated the project of scanning all of them. Most of the documents are one or a few pages in length, so they can’t be handled by an automatic feeder—and so we set up a rig with a high-resolution camera (and in those days it needed flash). It took several person-years of work, but eventually all the documents were scanned.
We automatically cropped and white-balanced them (using Wolfram Language image processing), then OCR’ed them, and put the OCR’ed text as a transparent layer into the scanned image. If I now search for “rhinoceros” I find 8 documents in my archive. Perhaps not surprisingly given that search term, they’re a bit random, including for example the issue of my elementary school magazine from Easter 1971.
OCR works on printed text. But what about handwritten text? Correspondence, even if it’s handwritten, usually at least comes on printed letterhead. But I have many pages of handwritten notes with basically nothing printed on them. Recognizing handwriting purely from images (without the time series of strokes) is still beyond current technology, but I’m hoping that our neural-net-based machine learning systems will soon be able to tackle it. (Conveniently, I’ve got quite a few documents where I have both my handwritten draft, and a typed version, so I’m hoping to have a training set for at least my personal handwriting.)
But even though I can’t search for handwritten material, I can often find it just by “looking in the right box”. My primary scanned documents are organized into 140 or so boxes, each covering a major period or project in my life. And for each box, I can pull up thumbnails of pages, grouped into documents. So, for example, here are school geography notes from when I was 11 years old, together with the text of a speech I gave:
I have to say that pretty much whenever I start looking through my scanned documents from decades ago I end up finding something unexpected and interesting, that very often teaches me something about myself, and about how I ended up developing in some particular direction.
It may be something fairly specific to my life, and the fact that I’ve worked on building long-term things, as well as that I’ve kept in touch with a large number of people over a long period of time, but I’m amazed by the amount of even quite ancient personal history that I seem to encounter practically every day. Some person or some organization will contact me, and I’ll look back at information about interactions I had with them 35 years ago. Or I’ll be thinking about something, and I’ll vaguely remember that I worked on something similar 25 years ago, and look back at what I did. I happen to have a pretty good memory, but when I actually look at material from the past I’m always amazed at how many details I’ve personally forgotten.
I first got my metasearcher set up nearly 30 years ago. The current version is based on Wolfram Language CreateSearchIndex/TextSearch functionality, running on my personal private cloud. It’s using UpdateSearchIndex to update every few minutes. The metasearcher also “federates in” results from APIs for searching our corporate websites and databases.
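Conceptually, an index that can be created, updated and searched is an inverted index from words to documents. The Python sketch below is a toy stand-in for that idea, not a description of how CreateSearchIndex, UpdateSearchIndex or TextSearch work internally; the class `TinyIndex` is hypothetical, and treating a multi-word query as "all words must appear" is an assumption.

```python
import re
from collections import defaultdict


def _words(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.words_of = {}  # document id -> words currently indexed for it

    def update(self, doc_id, text):
        """Add a document, or re-index it if its content has changed."""
        for word in self.words_of.get(doc_id, ()):
            self.postings[word].discard(doc_id)
        words = set(_words(text))
        self.words_of[doc_id] = words
        for word in words:
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word in the query."""
        terms = _words(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Running `update` periodically on new or changed documents plays the role of the every-few-minutes refresh: only the affected entries are touched, so the rest of the index stays as it was.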
But not everything I want can readily be found by search. And another mechanism I have for finding things is my “personal timeline”. I’ve been meaning for ages to extend this, but right now it basically just contains information on my external events, about 40 of them per year. And the most important part is typically my “personal trip report”, which I meticulously write, if at all possible within 24 hours.
Usually the trip report is just text (or at least, text structured in a notebook). But when I go to events like trade shows I typically bring a tiny camera with me, that takes a picture every minute. If I’m wearing one of those lanyard name tags I’ll typically clip the camera on the top of the name tag, among other things putting it at an ideal height to capture name tags of people I meet. When I write my personal trip report I’ll typically review the pictures, and sometimes copy a few into my trip notebook.
But even with all my various current sources of archival material (which now include chat messages, livestreams, etc.), email still remains the most important. Years ago I decided to make it easy for people to find an email address for me. My calculation was that if someone wants to reach me, then in modern times they’ll eventually find a way to do it, but if it’s easy for them just to send email, that’s how they’ll contact me. And, yes, having my email address out there means I get lots of email from people I don’t know around the world. Some of it is admittedly strange, but a lot is interesting. I try to look at all of it, but it’s also sent to a request tracker system, so my staff can make sure important things get handled. (It is sometimes a little odd for people to see request tracker ticket metadata like SWCOR #669140 in email subject lines, but I figure it’s a small price to pay for making sure the email is actually responded to.)
I might mention that for decades email has been the primary means of communication inside our (geographically distributed) company. Yes, we have project management, source control, CRM and other systems, as well as chat. But at least for the parts of the company that I interact with, email is overwhelmingly dominant. Sometimes it’s individual emails being sent between people. Sometimes it’s email groups.
It’s been a running joke for a long time that we have more email groups than employees. But we’ve been careful to organize the groups, for example identifying different types by prefixes to their names (t- is a mailing list for a project team, d- a mailing list for a department, l- a more open mailing list, r- a mailing list for automated reports, q- a request list, etc.) And for me at least this makes it plausible to remember what the right list is for some mail I want to send out.
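The prefix convention amounts to a small lookup table. In the Python sketch below the five prefixes come from the list above, while the function name `list_type` and the example list names in the usage note are invented for illustration.

```python
# Prefix meanings as described in the text.
LIST_TYPES = {
    "t": "project team",
    "d": "department",
    "l": "open list",
    "r": "automated reports",
    "q": "request list",
}


def list_type(list_name):
    """Classify a mailing list by the prefix before its first hyphen."""
    prefix, separator, rest = list_name.partition("-")
    if separator and rest and prefix in LIST_TYPES:
        return LIST_TYPES[prefix]
    return "unknown"
```

For instance, a hypothetical list called t-notebooks would be classified as a project team list, and a name with no recognized prefix falls through to "unknown".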
Databases of People & Things
I know a lot of people, from many different parts of my life. Back in the 1980s I used to just keep a list of them in a text file (before then it was a handwritten address book). But by the 1990s I decided I needed to have a more systematic database for myself—and created what I started calling pBase. In recent years the original technology of pBase began to seem quite paleolithic, but I now have a modern implementation using the Wolfram Language running in my personal private cloud.
It’s all quite nice. I can search for people by name or attributes, or—if I’m for example going to be visiting somewhere—I can just have pBase show me a map of our latest information about who’s nearby:
How does pBase relate to social networks? I’ve had a Facebook account for a long time, but it’s poorly curated, and always seems to ride at the maximum number of possible friends. LinkedIn I take much more seriously, and make a point of adding people only if I’ve actually talked to them (I currently have 3005 connections, so, yes, I’ve talked to quite a few people).
It’s very convenient that every so often I can download data from my LinkedIn account via ServiceExecute to update what’s in pBase. But LinkedIn captures only a fraction of people I know. It doesn’t include many of my more prominent friends and acquaintances, as well as most academics, many students, etc.
Eventually I’ll probably get pBase developed more, and perhaps make the technology generally available. But within our company, there’s already a system that illustrates some potential aspirations: our internal company directory—which is running in our internal private cloud, and basically uses Wolfram|Alpha-style natural language understanding to let one ask natural language questions.
I might mention that in addition to our company directory, we also maintain another database that I, at least, find very useful, particularly when I’m trying to figure out who might know the answer to some unusual question, or who we might tap for some new project. We call it our Who Knows What database. And for each person it gives a profile of experience and interests. Here’s the entry for me (and here’s the source with the question details):
In terms of personal databases, another useful one for me is the database of books I own. I haven’t been buying too many books in the past decade or so, but before then I accumulated a library of about 6000 volumes, and it’s not uncommon—particularly when I’m doing more historically oriented research—that I’ll want to consult quite a few of them. But how should they be organized? “Big” classification schemes like Dewey Decimal or Library of Congress are overkill, and don’t do a great job of matching my personal “cognitive map” of topics.
Like my filesystem folders, or my physical folders of papers, I’ve found the best scheme is to put the books into fairly broad categories—small enough in number that I can spatially remember where they are in my library. But how should books be arranged within a category?
Well, here I get to tell a cautionary tale (that my wife regularly uses as an example) of what can go wrong in my kind of approach. Always liking to understand the historical progression of ideas, I thought it would be nice to be able to browse a category of books on a shelf in historical order (say, by first publication date). But this makes it difficult to find a specific book, or, for example, to reshelve it. (It would be easier if books had their publication dates printed on their spines. But they don’t.)
About 20 years ago I was preparing to move all my books to a new location, with different lengths of shelves. And I had the issue of trying to map out how to arrange book categories on the new shelves (“how many linear feet is quantum field theory and where can it fit in?”) So I thought: “Why not just measure the width of each book, and while I’m at it also measure its height and its color?” Because my idea was that then I could make a graphic of each shelf, with books shown with realistic widths and color, then put an arrow in the graphic to indicate the location (easily identified visually from “landmarks” of other books) of a particular book.
I got a colorimeter (it was before ubiquitous digital cameras) and started having the measurements made. But it turned out to be vastly more labor-intensive than expected, and, needless to say, didn’t get finished before the books had to be moved. Meanwhile, the day the books were moved, it was noticed that the packing boxes fit more books if one didn’t just take a single slab of books off a shelf, but instead put other books around the edges.
The result was that 5100 books arrived, basically scrambled into random order. It took three days to sort them. And at this point, I decided just to keep things simpler, and alphabetize by author in each category. And this certainly works fine in finding books. But one result of my big book inventory project is that I do now have a nice, computable version of at least all the books connected to writing A New Kind of Science, and it’s actually in the Wolfram Data Repository:
ResourceData["Books in Stephen Wolfram's Library"]
Personal Analytics
In 2012 I wrote a piece about personal analytics and the data I’ve collected on myself. Back then I had about a third of a million emails in my archive; now it’s half a million more, and I can extend my diurnal email plot:
(The big empty spaces are when I’m asleep, and, yes, as I’ve changed projects—e.g. finishing A New Kind of Science in 2002—my sleep habits have changed; I’m also now trying an experiment of going to sleep earlier.)
I have systems that keep all sorts of data, including every keystroke I type, every step I take and what my computer screen looks like every minute (sadly, the movie of this is very dull). I also have a whole variety of medical and environmental sensors, as well as data from devices and systems that I interact with.
It’s interesting every so often to pick up those Wolfram Data Drop databins and use them to do some data science on my life. And, yes, in broad terms I find that I am extremely consistent and habitual—yet every day there are different things that happen, that make my “productivity” (as measured in a variety of ways) bounce around, often seemingly randomly.
But one thing about collecting all this data is that I can use it to create dashboards, and these I find useful every single day. For example, running in my private cloud is a monitoring system for my email:
The yellow curve is my total number of pending email messages; the red is the number I haven’t even opened yet. These curves are pretty sensitive to all kinds of features of my life, and for example when I’m intensely working on some project, I’ll often see my email “go to seed” for a little while. But somehow in trying to pace myself and decide when I can do what, I find this email dashboard very helpful.
It’s also helpful that every day I get emails reporting on the previous day. How many keystrokes did I type, and in what applications? What files did I create? How many steps did I take? And so on.
I keep all kinds of health and medical data on myself too, and have done so for a long time. It’s always great to have started measuring something a long time ago, so one can plot a several-decade time series and see if anything’s changed. And, actually, the thing I’ve noticed is that often my value (say blood level) for something has remained numerically essentially the same for years—but many of the “normal ranges” quoted by labs have bounced all over the place. (Realistically this isn’t helped by labs inferring normal ranges from their particular observed populations, etc.)
I got my whole genome sequenced in 2010. And although I haven’t learned anything dramatic from it, it certainly helps me feel connected to genomic research when I can see some SNP variant mentioned in a paper, and I can immediately go look to see if I have it. (With all the various vicissitudes of strands, orientations and build numbers, I tend to stick to first principles, and just look for flanking sequences with StringPosition.)
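As a minimal sketch of that flanking-sequence lookup (the sequence and flank strings below are made up for illustration, not real genomic data), one can locate the left flank with StringPosition and read off the base that follows it:

```wolfram
(* Hypothetical stretch of sequence and SNP flanks, for illustration only *)
genome = "TTGACCGTAGCTAGGCTAACGTTAGC";
leftFlank = "GTAGCTAGG";
rightFlank = "TAACG";

(* StringPosition gives {start, end} pairs for each match of the left flank *)
pos = StringPosition[genome, leftFlank];

(* The variant base sits immediately after the left flank *)
snpIndex = pos[[1, 2]] + 1;
base = StringTake[genome, {snpIndex}];

(* Check that the right flank follows, so the match is the intended site *)
flankOK = StringTake[genome, {snpIndex + 1, snpIndex + StringLength[rightFlank]}] === rightFlank;

{base, flankOK}
```

With real data one would presumably also search for the reverse complement of the flanks, since the strand orientation is not known in advance.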
Like so many of the things I’ve described in this piece, what has worked for me in doing personal analytics is to do what’s easy to do. I’ve never yet quite solved the problem, for example, of recording what I eat (our image identification isn’t yet quite good enough, and even made-just-for-me apps to enter food have always seemed a bit too onerous). But whenever I have a system that just operates automatically, that’s when I successfully collect good personal analytics data. And having dashboards and daily emails helps both in providing ongoing feedback, and in being able to check if something’s gone wrong with the system.
The Path Ahead
I’ve described—in arguably quite nerdy detail—how some of my personal technology infrastructure is set up. It’s always changing, and I’m always trying to update it—and for example I seem to end up with lots of bins of things I’m not using anymore (yes, I get almost every “interesting” new device or gadget that I find out about):
But although things like devices change, I’ve found that the organizational principles for my infrastructure have remained surprisingly constant, just gradually getting more and more polished. And—at least when they’re based on our very stable Wolfram Language system—I’ve found that the same is true for the software systems I’ve had built to implement them.
What of the future? Some things will certainly get upticked. I realized while writing this post that I can now upgrade to 4k monitors (or higher) without affecting screensharing (the feed is automatically downsampled). Before too long maybe I’ll be using AR to annotate my environment in real time. Maybe eventually I’ll have some way to do XR-based as-if-in-person videoconferencing. Maybe—as I’ve been assuming will be possible for 40+ years—I’ll finally be able to type faster using something like EEG. And so on.
But the more important changes will be in having better-developed, and more automated, workflows. In time I expect it’ll be possible to use our machine learning tools to do automatic “computational history”, for example assembling a useful and appropriately clustered timeline of things I’ve done, say in a particular area.
In my efforts at historical research, I’ve had occasion to use lots of archives of people and organizations. There’s usually a certain amount of indexing and tagging that’s been done. (Who is that letter to and from? When was it written? What are its keywords? Where was it filed? And so on.) But things tend to be very granular, and it’s usually hard work to determine the overall arc of what happened.
My first goal is to make all the material I personally have useful for myself. But I’m thinking of soon starting to open up some of the older material for other people to see. And I’m studying how—in modern times, with all the cloud infrastructure, machine learning, visualization, computational documents, etc. that we have—I can build the best possible system for presenting and exploring archives.
As I think about my day, I ask myself what aspects of it aren’t well optimized. A lot of it actually comes down to things like email processing, and time spent for example actually responding to questions. Now, of course, I’ve spent lots of effort to try to structure things so as many questions as possible become self-answering, or can be addressed with technology and automation that we’ve built. And, in my role as CEO, I also try hard to delegate to other people whenever I can.
But there’s still plenty left. And I certainly wonder whether with all the technology we now have, more could be automated, or delegated to machines. Perhaps all that data I’ve collected on myself will one day let one basically just build a “bot of me”. Having seen so many of my emails—and being able to look at all my files and personal analytics—maybe it’s actually possible to predict how I’d respond to any particular question.
We’re not there yet. But it will be an interesting moment when a machine can, for example, have three ideas about how to respond to something, and then show me drafts that I can just pick from and approve. The overall question of what direction I want to go in will almost by definition have to stay with me, but the details of how to get there I’m hoping can increasingly be automated.
Product Details
In the course of this piece, I’ve mentioned all sorts of devices and systems. Here’s a list of the specific products I’m currently using. Note that I’m not “endorsing” things; I’m just explaining what I happen to use, based on my research, and my particular constraints and history.
I’m listing items in the order they appear in this piece, usually not repeating if they’re mentioned multiple times. Note that some of the items here aren’t directly available anymore.
My Daily Life
Main desk computer: Apple Mac Pro (12-core; D700 GPUs; 64 GB RAM; 1 TB SSD)
Main desk displays: Apple 27″ Cinema Displays (1440p) [Having just discovered it’ll work with screensharing, I’m now going to upgrade to 4k displays]
Multifocal glasses: Varilux lenses [I looked at reading channel widths and Zernike polynomials etc. for custom lenses, but my correction is barely over a diopter, so I didn’t bother]
Heart-rate data: FitBit Charge 2 + Wolfram Language ServiceConnect [I’d prefer to use WHOOP but given that I’ll only wear one thing on my wrist, I need it to also give me text alerts]
Tablet: Apple iPad Pro 10.5″ [I was using an iPad Mini for a while, but I wasn’t finding myself in situations where the reduced weight was a good tradeoff for reduced screen size]
In an “integer release” like 12, our goal is to provide fully-filled-out new areas of functionality. But in every release we also want to deliver the latest results of our R&D efforts. In 12.0, perhaps half of our new functions can be thought of as finishing areas that were started in previous “.1” releases—while half begin new areas. I’ll discuss both types of functions in this piece, but I’ll be particularly emphasizing the specifics of what’s new in going from 11.3 to 12.0.
I must say that now that 12.0 is finished, I’m amazed at how much is in it, and how much we’ve added since 11.3. In my keynote at our Wolfram Technology Conference last October I summarized what we had up to that point—and even that took nearly 4 hours. Now there’s even more.
What we’ve been able to do is a testament both to the strength of our R&D effort, and to the effectiveness of the Wolfram Language as a development environment. Both these things have of course been building for three decades. But one thing that’s new with 12.0 is that we’ve been letting people watch our behind-the-scenes design process—livestreaming more than 300 hours of my internal design meetings. So in addition to everything else, I suspect this makes Version 12.0 the very first major software release in history that’s been open in this way.
Although nowadays the vast majority of what the Wolfram Language (and Mathematica) does isn’t what’s usually considered math, we still put immense R&D effort into pushing the frontiers of what can be done in math. And as a first example of what we’ve added in 12.0, here’s the rather colorful ComplexPlot3D:
ComplexPlot3D[Gamma[z],{z,-4-4I,4+4I}]
It’s always been possible to write Wolfram Language code to make plots in the complex plane. But only now have we solved the math and algorithm problems that are needed to automate the process of robustly plotting even quite pathological functions in the complex plane.
The visualization of complex functions is (pun aside) a complex story, with details making a big difference in what one notices about a function. And so one of the things we’ve done in 12.0 is to introduce carefully selected standardized ways (such as named color functions) to highlight different features:
Measurements in the real world often have uncertainty that gets represented as values with ± errors. We’ve had add-on packages for handling “numbers with errors” for ages. But in Version 12.0 we’re building in computation with uncertainty, and we’re doing it right.
The key is the symbolic object Around[x, δ], which represents a value “around x”, with uncertainty δ:
Around[7.1,.25]
You can do arithmetic with Around, and there’s a whole calculus for how the uncertainties combine:
Sqrt[Around[7.1,.25]]+Around[1,.1]
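The arithmetic behind that result can be worked out by hand, assuming the standard first-order rule for combining independent uncertainties (a sketch of the general idea, not necessarily the exact internal procedure):

```latex
\begin{aligned}
\sqrt{7.1} &\approx 2.6646, &
\delta_{\sqrt{x}} &\approx \frac{\delta_x}{2\sqrt{x}} = \frac{0.25}{2 \times 2.6646} \approx 0.0469,\\
\text{value} &\approx 2.6646 + 1 = 3.6646, &
\delta_{\text{sum}} &\approx \sqrt{0.0469^2 + 0.1^2} \approx 0.110.
\end{aligned}
```

So the sum comes out at roughly 3.66 ± 0.11.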
If you plot Around numbers, they’ll be shown with error bars:
But what really is an Around object? It’s something where there are certain rules for combining uncertainties, that are based on uncorrelated normal distributions. But there’s no statement being made that Around[x, δ] represents anything that actually in detail follows a normal distribution—any more than that Around[x, δ] represents a number specifically in the interval defined by Interval[{x - δ, x + δ}]. It’s just that Around objects propagate their errors or uncertainties according to consistent general rules that successfully capture what’s typically done in experimental science.
OK, so let’s say you make a bunch of measurements of some value. You can get an estimate of the value—together with its uncertainty—using MeanAround (and, yes, if the measurements themselves have uncertainties, these will be taken into account in weighting their contributions):
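As a small sketch (the measurement values here are invented purely for illustration), MeanAround can be given either plain numbers or values that already carry uncertainties:

```wolfram
(* Illustrative measurements; the numbers are made up *)
measurements = {1.4, 2.3, 1.9, 2.1, 1.7};
estimate = MeanAround[measurements]

(* Measurements that carry their own uncertainties *)
weighted = MeanAround[{Around[1.4, 0.1], Around[2.3, 0.2], Around[1.9, 0.1]}]
```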
Functions all over the system—notably in machine learning—are starting to have the option ComputeUncertainty -> True, which makes them give Around objects rather than pure numbers.
Around might seem like a simple concept, but it’s full of subtleties—which is the main reason it’s taken until now for it to get into the system. Many of the subtleties revolve around correlations between uncertainties. The basic idea is that the uncertainty of every Around object is assumed to be independent. But sometimes one has values with correlated uncertainties—and so in addition to Around, there’s also VectorAround, which represents a vector of potentially correlated values with a specified covariance matrix.
There’s even more subtlety when one’s dealing with things like algebraic formulas. If one replaces x here with an Around, then, following the rules of Around, each instance is assumed to be uncorrelated:
(Exp[x]+Exp[x/2])/.x->Around[0,.3]
But probably one wants to assume here that even though the value of x may be uncertain, it’s going to be the same for each instance, and one can do this using the function AroundReplace (notice the result is different):
AroundReplace[Exp[x]+Exp[x/2],x->Around[0,.3]]
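The difference between the two results can be seen from a first-order estimate (assuming linear error propagation around x = 0, which is a simplification):

```latex
\begin{aligned}
\text{independent instances:}\quad
\delta &\approx \sqrt{\left(e^{0}\cdot 0.3\right)^2 + \left(\tfrac{1}{2}e^{0}\cdot 0.3\right)^2}
        = \sqrt{0.09 + 0.0225} \approx 0.34,\\
\text{same } x \text{ everywhere:}\quad
\delta &\approx \left|\frac{d}{dx}\left(e^{x} + e^{x/2}\right)\right|_{x=0} \cdot 0.3
        = 1.5 \times 0.3 = 0.45.
\end{aligned}
```

In both cases the central value is 2; only the uncertainty changes, because correlated contributions add linearly rather than in quadrature.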
There’s lots of subtlety in how to display uncertain numbers. Like how many trailing 0s should you put in:
Around[1,.0006]
Or how much precision of the uncertainty should you include (there’s a conventional breakpoint when the trailing digits are 35):
{Around[1.2345,.000312],Around[1.2345,.00037]}
In rare cases where lots of digits are known (think, for example, some physical constants), one wants to go to a different way to specify uncertainty:
Around[1.23456789,.000000001]
And it goes on and on. But gradually Around is going to start showing up all over the system. By the way, there are lots of other ways to specify Around numbers. This is a number with 10% relative error:
Around[2,Scaled[.1]]
This is the best Around can do in representing an interval:
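As a sketch, an interval can be handed to Around directly; the endpoints 2 and 3 are arbitrary illustrative values:

```wolfram
(* The endpoints 2 and 3 are arbitrary illustrative values *)
fromInterval = Around[Interval[{2, 3}]]
```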
It can also take into account asymmetry by giving asymmetric uncertainties:
Around[LogNormalDistribution[2,1]]
Classic Math, Elementary and Advanced
In making math computational, it’s always a challenge to both be able to “get everything right”, and not to confuse or intimidate elementary users. Version 12.0 introduces several things to help. First, try solving an irreducible quintic equation:
Solve[x^5 + 6 x + 1 == 0, x]
In the past, this would have shown a bunch of explicit Root objects. But now the Root objects are formatted as boxes showing their approximate numerical values. Computations work exactly the same, but the display doesn’t immediately confront people with having to know about algebraic numbers.
When we say Integrate, we mean “find an integral”, in the sense of an antiderivative. But in elementary calculus, people want to see explicit constants of integration (as they always have in Wolfram|Alpha), so we added an option for that (and C[n] also has a nice, new output form):
Integrate[x^3,x,GeneratedParameters->C]
When we benchmark our symbolic integration capabilities we do really well. But there’s always more that can be done, particularly in terms of finding the simplest forms of integrals (and at a theoretical level this is an inevitable consequence of the undecidability of symbolic expression equivalence). In Version 12.0 we’ve continued to pick away at the frontier, adding cases like:
In Version 11.3 we introduced asymptotic analysis, being able to find asymptotic values of integrals and so on. Version 12.0 adds asymptotic sums, asymptotic recurrences and asymptotic solutions to equations:
AsymptoticSolve[x y^4 - (x + 1) y^2 + x == 1, y, {x, 0, 3}, Reals]
One of the great things about making math computational is that it gives us new ways to explain math itself. And something we’ve been doing is to enhance our documentation so that it explains the math as well as the functions. For example, here’s the beginning of the documentation about Limit—with diagrams and examples of the core mathematical ideas:
Polygons have been part of the Wolfram Language since Version 1. But in Version 12.0 they’re getting generalized: now there’s a systematic way to specify holes in them. A classic geographic use case is the polygon for South Africa—with its hole for the country of Lesotho.
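A minimal sketch of the hole syntax, using made-up coordinates rather than real geographic data: the outer boundary is given first, followed by a list of hole boundaries.

```wolfram
(* A 3×3 square with a 1×1 square hole; the coordinates are illustrative *)
poly = Polygon[{{0, 0}, {3, 0}, {3, 3}, {0, 3}} -> {{{1, 1}, {2, 1}, {2, 2}, {1, 2}}}];

(* Draw it; the hole shows up as a gap in the filled region *)
Graphics[poly]

(* The area excludes the hole *)
area = Area[poly]
```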
In Version 12.0, much like Root, Polygon gets a convenient new display form:
RandomPolygon[20]
You can compute with it just as before:
Area[%]
RandomPolygon is new too. You can ask, say, for 5 random convex polygons, each with 10 vertices, in 3D:
Graphics3D[RandomPolygon[3->{"Convex",10},5]]
There are lots of new operations on polygons. Like PolygonDecomposition, which can, for example, decompose a polygon into convex parts:
Polygons are pretty straightforward to specify: you just give their vertices in order (and if they have holes, you also give the vertices for the holes). Polyhedra are a bit more complicated: in addition to giving the vertices, you have to say how these vertices form faces. But in Version 12.0, Polyhedron lets you do this in considerable generality, including voids (the 3D analog of holes), etc.
Beyond the Platonic solids, Version 12 also builds in all the “uniform polyhedra” (n edges and m faces meet at each vertex)—and you can also get symbolic Polyhedron versions of named polyhedra from PolyhedronData:
Well, in Version 12, with the whole tower of technology we’ve built, we’re finally able to deliver a new style of mathematical computation—that in effect automates what Euclid was doing 2000+ years ago. A key idea is to introduce symbolic “geometric scenes” that have symbols representing constructs such as points, and then to define geometric objects and relations in terms of them.
For example, here’s a geometric scene representing a triangle a, b, c, and a circle through a, b and c, with center o, with the constraint that o is at the midpoint of the line from a to c:
On its own, this is just a symbolic thing. But we can do operations on it. For example, we can ask for a random instance of it, in which a, b, c and o are made specific:
You can generate as many random instances as you want. We try to make the instances as generic as possible, with no coincidences that aren’t forced by the constraints:
For a given geometric scene, there may be many possible conjectures. We try to pick out the interesting ones. In this case we come up with two—and what’s illustrated is the first one: that the line ba is perpendicular to the line cb. As it happens, this result actually appears in Euclid (it’s in Book 3, as part of Proposition 31)—though it’s usually called Thales’s theorem.
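The scene described above (a triangle, a circle through its vertices, and the circle's center at the midpoint of one side) might be set up along these lines. This is a sketch, and the details of the geometry functions may vary between versions:

```wolfram
(* Symbolic scene: points a, b, c, o with three constraints *)
scene = GeometricScene[{a, b, c, o},
   {Triangle[{a, b, c}],
    CircleThrough[{a, b, c}, o],
    o == Midpoint[{a, c}]}];

(* One concrete instance satisfying the constraints *)
RandomInstance[scene]

(* Conjectures the system finds for this scene *)
FindGeometricConjectures[scene]["Conclusions"]
```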
In 12.0, we now have a whole symbolic language for representing typical things that appear in Euclid-style geometry. Here’s a more complex situation—corresponding to what’s called Napoleon’s theorem:
And to support setting up geometric statements we also need “geometric assertions”. In 12.0 there are 29 different kinds—such as "Parallel", "Congruent", "Tangent", "Convex", etc. Here are three circles asserted to be pairwise tangent:
Version 11.3 introduced FindEquationalProof for generating symbolic representations of proofs. But what axioms should be used for these proofs? Version 12.0 introduces AxiomaticTheory, which gives axioms for various common axiomatic theories.
What does this mean? In a sense it’s a more symbolic symbolic expression than we’re used to. In something like 1 + x we don’t say what the value of x is, but we imagine that it can have a value. In the expression above, a, b and c are pure “formal symbols” that serve an essentially structural role, and can’t ever be thought of as having concrete values.
What about the · (center dot)? In 1 + x we know what + means. But the · is intended to be a purely abstract operator. The point of the axiom is in effect to define a constraint on what · can represent. In this particular case, it turns out that the axiom is an axiom for Boolean algebra, so that · can represent Nand and Nor. But we can derive consequences of the axiom completely formally, for example with FindEquationalProof:
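As a sketch of how this might look in code, one can fetch the axiom and ask for a proof of commutativity of the · operator. The symbols p and q are arbitrary names chosen for illustration:

```wolfram
(* The single axiom, stated with formal symbols and the CenterDot operator *)
axioms = AxiomaticTheory["WolframAxioms"]

(* Try to derive commutativity purely formally from the axiom *)
proof = FindEquationalProof[CenterDot[p, q] == CenterDot[q, p], axioms]
```

The result is a symbolic proof object that can be queried for its individual steps.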
There’s quite a bit of subtlety in all of this. In the example above, it’s useful to have · as the operator, not least because it displays nicely. But there’s no built-in meaning to it, and AxiomaticTheory lets you give something else (here f) as the operator:
AxiomaticTheory[{"WolframAxioms",<|"Nand"->f|>}]
What’s the “Nand” doing there? It’s a name for the operator (but it shouldn’t be interpreted as anything to do with the value of the operator). In the axioms for group theory, for example, several operators appear:
AxiomaticTheory["GroupAxioms"]
This gives the default representations of the various operators here:
AxiomaticTheory["GroupAxioms","Operators"]
AxiomaticTheory knows about notable theorems for particular axiomatic systems:
AxiomaticTheory["GroupAxioms","NotableTheorems"]
The basic idea of formal symbols was introduced in Version 7, for doing things like representing dummy variables in generated constructs like these:
You can enter a formal symbol using \[FormalA] or Esc .a Esc, etc. But back in Version 7, \[FormalA] was rendered as a. And that meant the expression above looked like:
I always thought this looked incredibly complicated. And for Version 12 we wanted to simplify it. We tried many possibilities, but eventually settled on single gray underdots—which I think look much better.
In AxiomaticTheory, both the variables and the operators are “purely symbolic”. But one thing that’s definite is the arity of each operator, which one can ask AxiomaticTheory:
Conveniently, the representation of operators and arities can immediately be fed into Groupings, to get possible expressions involving particular variables:
Axiomatic theories represent a classic historical area for mathematics. Another classical historical area—much more on the applied side—is the n-body problem. Version 12.0 introduces NBodySimulation, which gives simulations of the n-body problem. Here’s a three-body problem (think Earth-Moon-Sun) with certain initial conditions (and inverse-square force law):
Underneath, this is just solving differential equations, but—a bit like SystemModel—NBodySimulation provides a convenient way to set up the equations and handle their solutions. And, yes, standard force laws are built in, but you can define your own.
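A three-body setup of this kind might be sketched as follows. The masses, positions and velocities are arbitrary illustrative values, not a model of any real system:

```wolfram
(* Three unit-mass bodies in 2D under an inverse-square force, run to t = 4 *)
data = NBodySimulation["InverseSquare",
   {<|"Mass" -> 1, "Position" -> {0, 0}, "Velocity" -> {0, .5}|>,
    <|"Mass" -> 1, "Position" -> {1, 1}, "Velocity" -> {0, -.5}|>,
    <|"Mass" -> 1, "Position" -> {0, 1}, "Velocity" -> {0, 0}|>}, 4];

(* Plot the trajectory of every body over the simulated time span *)
ParametricPlot[Evaluate[data[All, "Position", t]], {t, 0, 4}]
```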
Language Extensions & Conveniences
We’ve been polishing the core of the Wolfram Language for more than 30 years now, and in each successive version we end up introducing some new extensions and conveniences.
We’ve had the function Information ever since Version 1.0, but in 12.0 we’ve greatly extended it. It used to just give information about symbols (although that’s been modernized as well):
Information[Sin]
But now it also gives information about lots of kinds of objects. Here’s information on a classifier:
Information[Classify["NotablePerson"]]
Here’s information about a cloud object:
Information[CloudPut[100!]]
Hover over the labels in the “information box” and you can find out the names of the corresponding properties:
Information[CloudPut[100!],"FileHashMD5"]
For entities, Information gives a summary of known property values:
Information[Entity["Element", "Tungsten"]]
Over the past few versions, we’ve introduced a lot of new summary display forms. In Version 11.3 we introduced Iconize, which is essentially a way of creating a summary display form for anything. Iconize has proved to be even more useful than we originally anticipated. It’s great for hiding unnecessary complexity both in notebooks and in pieces of Wolfram Language code. In 12.0 we’ve redesigned how Iconize displays, particularly to make it “read nicely” inside expressions and code.
You can explicitly iconize something:
{a,b,Iconize[Range[10]]}
Press the + and you’ll see some details:
Press and you’ll get the original expression again:
If you have lots of data you want to reference in a computation, you can always store it in a file, or in the cloud (or even in a data repository). It’s usually more convenient, though, to just put it in your notebook, so you have everything in the same place. One way to avoid the data “taking over your notebook” is to put in closed cells. But Iconize provides a much more flexible and elegant way to do this.
When you’re writing code, it’s often convenient to “iconize in place”. The right-click menu now lets you do that:
Talking of display, here’s something small but convenient that we added in 12.0:
PercentForm[0.3]
And here are a couple of other “number conveniences” that we added:
NumeratorDenominator[11/4]
MixedFractionParts[11/4]
Functional programming has always been a central part of the Wolfram Language. But we’re continually looking to extend it, and to introduce new, generally useful primitives. An example in Version 12.0 is SubsetMap:
Functions are normally things that can take several inputs, but always give a single piece of output. In areas like quantum computing, however, one’s interested instead in having n inputs and n outputs. SubsetMap effectively implements n → n functions, picking up inputs from n specified positions in a list, applying some operation to them, then putting back the results at the same n positions.
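Two small illustrations of the idea, with lists and positions chosen arbitrarily: the function receives the elements at the given positions as a list, and whatever list it returns is written back to those same positions.

```wolfram
(* Reverse the elements at positions 2 and 4, leaving the rest alone *)
swapped = SubsetMap[Reverse, {a, b, c, d, e}, {2, 4}]
(* {a, d, c, b, e} *)

(* Accumulate over positions 1, 3 and 5 only *)
accumulated = SubsetMap[Accumulate, {1, 2, 3, 4, 5}, {1, 3, 5}]
(* {1, 2, 4, 4, 9} *)
```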
I started formulating what’s now SubsetMap about a year ago. And I quickly realized that actually I could really have used this function in all sorts of places over the years. But what should this particular “lump of computational work” be called? My initial working name was ArrayReplaceFunction (which I shortened to ARF in my notes). In a sequence of (livestreamed) meetings we went back and forth. There were ideas like ApplyAt (but it’s not really Apply) and MutateAt (but it’s not doing mutation in the lvalue sense), as well as RewriteAt, ReplaceAt, MultipartApply and ConstructInPlace. There were ideas about curried “function decorator” forms, like PartAppliedFunction, PartwiseFunction, AppliedOnto, AppliedAcross and MultipartCurry.
But somehow when we explained the function we kept on coming back to talking about how it was operating on a subset of a list, and how it was really like Map, except that it was operating on multiple elements at a time. So finally we settled on the name SubsetMap. And—in yet another reinforcement of the importance of language design—it’s remarkable how, once one has a name for something like this, one immediately finds oneself able to reason about it, and see where it can be used.
For many years we’ve worked hard to make the Wolfram Language the highest-level and most automated system for doing state-of-the-art machine learning. Early on, we introduced the “superfunctions” Classify and Predict that do classification and prediction tasks in a completely automated way, automatically picking the best approach for the particular input given. Along the way, we’ve introduced other superfunctions—like SequencePredict, ActiveClassification and FeatureExtract.
In Version 12.0 we’ve got several important new machine learning superfunctions. There’s FindAnomalies, which finds “anomalous elements” in data:
How do these functions work? They’re all based on a new function called LearnDistribution, which tries to learn the underlying distribution of data, given a certain set of examples. If the examples were just numbers, this would essentially be a standard statistics problem, for which we could use something like EstimatedDistribution. But the point about LearnDistribution is that it works with data of any kind, not just numbers. Here it is learning an underlying distribution for a collection of colors:
Once we have this “learned distribution”, we can do all sorts of things with it. For example, this generates 20 random samples from it:
RandomVariate[dist,20]
But now think about FindAnomalies. What it has to do is to find out which data points are anomalous relative to what’s expected. Or, in other words, given the underlying distribution of the data, it finds what data points are outliers, in the sense that they should occur only with very low probability according to the distribution.
And just like for an ordinary numerical distribution, we can compute the PDF for a particular piece of data. Purple is pretty unlikely given the distribution of colors we’ve learned from our examples:
For ordinary numerical distributions, there are concepts like CDF that tell us cumulative probabilities, say that we’ll get results that are “further out” than a particular value. For spaces of arbitrary things, there isn’t really a notion of “further out”. But we’ve come up with a function we call RarerProbability, that tells us what the total probability is of generating an example with a smaller PDF than something we give:
Now we’ve got a way to describe anomalies: they’re just data points that have a very small rarer probability. And in fact FindAnomalies has an option AcceptanceThreshold (with default value 0.001) that specifies what should count as “very small”.
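The same ideas can be tried on plain numbers. The following is only a sketch: the data is synthetic (normally distributed values plus one planted outlier at 25, both chosen here for illustration), and the exact numbers returned will depend on the method that LearnDistribution picks.

```wolfram
SeedRandom[1234];

(* 500 draws from a standard normal, plus one hand-planted outlier at 25 *)
data = Join[RandomVariate[NormalDistribution[0, 1], 500], {25.}];

(* Learn a distribution from the examples *)
dist = LearnDistribution[data];

(* Rarer probability of a typical point and of the planted outlier *)
RarerProbability[dist, 0.]
RarerProbability[dist, 25.]

(* Data points whose rarer probability falls below the threshold *)
FindAnomalies[data, AcceptanceThreshold -> 0.001]
```

One would expect the rarer probability at 25 to be far smaller than at 0, which is the property FindAnomalies relies on.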
OK, but let’s see this work on something more complicated than colors. Let’s train an anomaly detector by looking at 1000 examples of handwritten digits:
We first introduced our symbolic framework for constructing, exploring and using neural networks back in 2016, as part of Version 11. And in every version since then we’ve added all sorts of state-of-the-art features. In June 2018 we introduced our Neural Net Repository to make it easy to access the latest neural net models from the Wolfram Language—and already there are nearly 100 curated models of many different types in the repository, with new ones being added all the time.
NetModel["BERT Trained on BookCorpus and English Wikipedia Data"]
You can open this up and see the network that’s involved (and, yes, we’ve updated the display of net graphs for Version 12.0):
And you can immediately use the network, here to produce some kind of “meaning features” array:
NetModel["BERT Trained on BookCorpus and English Wikipedia Data"][
"What a wonderful network!"] // MatrixPlot
In Version 12.0 we’ve introduced several new layer types—notably AttentionLayer, which lets one set up the latest “transformer” architectures—and we’ve enhanced our “neural net functional programming” capabilities, with things like NetMapThreadOperator, and multiple-sequence NetFoldOperator. In addition to these “inside-the-net” enhancements, Version 12.0 adds all sorts of new NetEncoder and NetDecoder cases, such as BPE tokenization for text in hundreds of languages, and the ability to include custom functions for getting data into and out of neural nets.
But some of the most important enhancements in Version 12.0 are more infrastructural. NetTrain now supports multi-GPU training, as well as dealing with mixed-precision arithmetic, and flexible early-stopping criteria. We’re continuing to use the popular MXNet low-level neural net framework (to which we’ve been major contributors)—so we can take advantage of the latest hardware optimizations. There are new options for seeing what’s happening during training, and there’s also NetMeasurements that allows you to make 33 different types of measurements on the performance of a network:
Neural nets aren’t the only—or even always the best—way to do machine learning. But one thing that’s new in Version 12.0 is that we’re now able to use self-normalizing networks automatically in Classify and Predict, so they can easily take advantage of neural nets when it makes sense.
We introduced ImageIdentify, for identifying what an image is of, back in Version 10.1. In Version 12.0 we’ve managed to generalize this, to figure out not only what an image is of, but also what’s in an image. So, for example, ImageCases will show us cases of known kinds of objects in an image:
ImageCases[CloudGet["https://wolfr.am/CMoUVVTH"]]
For more details, ImageContents gives a dataset about what’s in an image:
In a sense, ImageCases is like a generalized version of FindFaces, for finding human faces in an image. Something new in Version 12.0 is that FindFaces and FacialFeatures have become more efficient and robust—with FindFaces now based on neural networks rather than classical image processing, and the network for FacialFeatures now being 10 MB rather than 500 MB:
Functions like ImageCases represent “new-style” image processing, of a type that didn’t seem conceivable only a few years ago. But while such functions let one do all sorts of new things, there’s still lots of value in more classical techniques. We’ve had fairly complete classical image processing in the Wolfram Language for a long time, but we continue to make incremental enhancements.
An example in Version 12.0 is the ImagePyramid framework, for doing multiscale image processing:
There are several new functions in Version 12.0 concerned with color computation. A key idea is ColorsNear, which represents a neighborhood in perceptual color space, here around the color Pink:
ChromaticityPlot3D[ColorsNear[Pink,.2]]
The notion of color neighborhoods can be used, for example, in the new ImageRecolor function:
As I sit at my computer writing this, I’ll say something to my computer, and capture it:
Here’s a spectrogram of the audio I captured:
Spectrogram[%]
So far we could do this in Version 11.3 (though Spectrogram got 10 times faster in 12.0). But now here’s something new:
SpeechRecognize[%%]
We’re doing speech-to-text! We’re using state-of-the-art neural net technology, but I’m amazed at how well it works. It’s pretty streamlined, and we’re perfectly well able to handle even very long pieces of audio, say stored in files. And on a typical computer the transcription will run at about actual real-time speed, so that an hour of speech will take about an hour to transcribe.
Right now we consider SpeechRecognize experimental, and we’ll be continuing to enhance it. But it’s interesting to see another major computational task just become a single function in the Wolfram Language.
In Version 12.0, there are other enhancements too. SpeechSynthesize supports new languages and new voices (as listed by VoiceStyleData[]).
Then you can make spectrograms or other measurements:
Spectrogram /@ %
And then—new in Version 12.0—you can use AudioIdentify to try to identify the category of sound (is that a talking rooster?):
AudioIdentify /@ %%
We still consider AudioIdentify experimental. It’s an interesting start, but it definitely doesn’t, for example, work as well as ImageIdentify.
A more successful audio function is PitchRecognize, which tries to recognize the dominant frequency in an audio signal (it uses both “classical” and neural net methods). It can’t yet deal with “chords”, but it works pretty much perfectly for “single notes”.
When one deals with audio, one often wants not just to identify what’s in the audio, but to annotate it. Version 12.0 introduces the beginning of a large-scale audio framework. Right now AudioAnnotate can mark where there’s silence, or where there’s something loud. In the future, we’ll be adding speaker identification and word boundaries, and lots else. And to go along with these, we also have functions like AudioAnnotationLookup, for picking out parts of an audio object that have been annotated in particular ways.
Underneath all this high-level audio functionality there’s a whole infrastructure of low-level audio processing. Version 12.0 greatly enhances AudioBlockMap (for applying filters to audio signals), as well as introduces functions like ShortTimeFourier.
A spectrogram can be viewed a bit like a continuous analog of a musical score, in which pitches are plotted as a function of time. In Version 12.0 there’s now InverseSpectrogram—that goes from an array of spectrogram data to audio. Ever since Version 2 in 1991, we’ve had Play to generate sound from a function (like Sin[100 t]). Now with InverseSpectrogram we have a way to go from a “frequency-time bitmap” to a sound. (And, yes, there are tricky issues about best guesses for phases when one only has magnitude information.)
Starting with Wolfram|Alpha, we’ve had exceptionally strong natural language understanding (NLU) capabilities for a long time. And this means that given a piece of natural language, we’re good at understanding it as Wolfram Language—that we can then go and compute from:
But what about natural language processing (NLP)—where we’re taking potentially long passages of natural language, and not trying to completely understand them, but instead just find or process particular features of them? Functions like TextSentences, TextStructure, TextCases and WordCounts have given us basic capabilities in this area for a while. But in Version 12.0—by making use of the latest machine learning, as well as our longstanding NLU and knowledgebase capabilities—we’ve now jumped to having very strong NLP capabilities.
The centerpiece is the dramatically enhanced version of TextCases. The basic goal of TextCases is to find cases of different types of content in a piece of text. An example of this is the classic NLP task of “entity recognition”—with TextCases here finding what country names appear in the Wikipedia article about ocelots:
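As a small self-contained sketch of the same kind of entity recognition, one can run TextCases on a short string. The sentence here is made up for illustration, and the first call may need to download model resources.

```wolfram
(* A made-up sentence, used instead of a full Wikipedia article *)
text = "Ocelots are found in Brazil, Mexico and Costa Rica.";

(* Ask for the pieces of text that are recognized as countries *)
TextCases[text, "Country"]
```

The result is a list of the matching substrings; other content types such as "City" or "Person" can be requested in the same way.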
And, yes, one can in principle use these capabilities through FindTextualAnswer to try to answer questions from text—but in a case like this, the results can be pretty wacky:
FindTextualAnswer[WikipediaData["ocelots"],"weight of an ocelot",5]
Of course, you can get a real answer from our actual built-in curated knowledgebase:
One of the “surprise” new areas in Version 12.0 is computational chemistry. We’ve had data on explicit known chemicals in our knowledgebase for a long time. But in Version 12.0 we can compute with molecules that are specified simply as pure symbolic objects. Here’s how we can specify what turns out to be a water molecule:
The computational chemistry capabilities we’ve added in Version 12.0 are pretty general and pretty powerful (with the caveat that so far they only deal with organic molecules). At the lowest level they view molecules as labeled graphs with edges corresponding to bonds. But they also know about physics, and correctly account for atomic valences and bond configurations. Needless to say, there are lots of details (about stereochemistry, symmetry, aromaticity, isotopes, etc.). But the end result is that molecular structure and molecular computation have now successfully been added to the list of areas that are integrated into the Wolfram Language.
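The graph view of a molecule can be seen directly. This is a sketch under the assumption that a SMILES string is used as input; ethanol ("CCO") is just an illustrative choice.

```wolfram
(* Ethanol, specified by its SMILES string *)
mol = Molecule["CCO"];

(* The "vertices" and "edges" of the labeled graph *)
AtomList[mol]
BondList[mol]

(* A 2D structure diagram of the same object *)
MoleculePlot[mol]
```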
The Wolfram Language already has strong capabilities for geographic computing, but Version 12.0 adds more functions, and enhances some of those that were already there.
For example, there’s now RandomGeoPosition, which generates a random lat-long location. One might think this would be trivial, but of course one has to worry about coordinate transformations—and what makes it much more nontrivial is that one can tell it to pick points only inside a certain region, here the country of France:
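A minimal sketch of region-restricted sampling looks like this. The count of 5 points and the plotting style are arbitrary choices, and the positions returned will differ from run to run.

```wolfram
(* Five random positions constrained to lie inside France *)
pts = RandomGeoPosition[Entity["Country", "France"], 5]

(* Show them on a map *)
GeoGraphics[{Red, PointSize[Large], Point[pts]}]
```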
A theme of new geographic capabilities in Version 12.0 is handling not just geographic points and regions, but also geographic vectors. Here’s the current wind vector, for example, at the position of the Eiffel Tower, represented as a GeoVector, with speed and direction (there’s also GeoVectorENU, which gives east, north and up components, as well as GeoGridVector and GeoVectorXYZ):
Geodesy is a mathematically sophisticated area, and we pride ourselves on doing it well in the Wolfram Language. In Version 12.0, we’ve added a few new functions to fill in some details. For example, we now have functions like GeoGridUnitDistance and GeoGridUnitArea which give the distortion (basically, eigenvalues of the Jacobian) associated with different geo projections at every position on Earth (or Moon, Mars, etc.).
One direction of visualization that we’ve been steadily developing is what one might call “meta-graphics”: the labeling and annotation of graphical things. We introduced Callout in Version 11.0; in Version 12.0 it’s been extended to things like 3D graphics:
There are lots of details that matter in making graphics really look good. Something that’s been enhanced in Version 12.0 is ensuring that columns of graphics line up on their frames, regardless of the length of their tick labels. We’ve also added LabelVisibility, which allows you to specify the relative priorities with which different labels should be made visible.
Another new feature of Version 12.0 is multipanel plot layout, where different datasets are shown in different panels, but the panels share axes whenever they can:
Our curated knowledgebase—that for example powers Wolfram|Alpha—is vast and continually growing. And with every version of the Wolfram Language we’re progressively tightening its integration into the core of the language.
Before Version 12.0, the Wolfram|Alpha Example pages served as a proxy for documenting many types of entities. But now there’s Wolfram Language documentation for all of them:
There are still functions like SatelliteData, WeatherData and FinancialData that handle entity types that routinely need complex selection or computation. But in Version 12.0, every entity type can be accessed in the same way, with natural language (“control + =”) input, and “yellow-boxed” entities and properties:
We’ve made it really convenient to work with data that’s built into the Wolfram Knowledgebase. You have entities, and it’s very easy to ask about properties and so on:
ResourceData["Entity Store of Books in Stephen Wolfram's Library"]
It describes a single entity type: an "SWLibraryBook". To be able to use entities of this type just like built-in entities, we “register” the entity store:
EntityRegister[ResourceData["Entity Store of Books in Stephen Wolfram's Library"]]
Now we can do things like ask for 10 random entities of type "SWLibraryBook":
RandomEntity["SWLibraryBook",10]
Each entity in the entity store has a variety of properties. Here’s a dataset of the values of properties for one particular entity:
OK, but with this setup we’re basically reading the whole contents of an entity store into memory. This makes it very efficient to do whatever Wolfram Language operations one wants on it. But it’s not a good scalable solution for large amounts of data—for example, data that is too big to fit in memory.
But what’s a typical source of large data? Very often it’s a database, and usually a relational one that can be accessed using SQL. We’ve had our DatabaseLink package for low-level read-write access to SQL databases for well over a decade. But in Version 12.0 we’re adding some major built-in features that allow external relational databases to be handled in the Wolfram Language just like entity stores, or built-in parts of the Wolfram Knowledgebase.
Let’s start off with a toy example. Here’s a symbolic representation of a small relational database that happens to be stored in a file:
Immediately we get a box that summarizes what’s in the database, and tells us that this database has 8 tables. If we open up the box, we can start inspecting the structure of those tables:
We can then set this relational database up as an entity store in the Wolfram Language. It looks very much the same as the library book entity store above, but now the actual data isn’t pulled into memory; instead it’s still in the external relational database, and we’re just defining a (“ORM-like”) mapping to entities in the Wolfram Language:
EntityStore[%]
Now we can register this entity store, which sets up a bunch of entity types that (at least by default) are named after the names of the tables in the database:
EntityRegister[%]
And now we can do “entity computations” on these, just like we would on built-in entities in the Wolfram Knowledgebase. Each entity here corresponds to a row in the “employees” table in the database:
EntityList["employees"]
For a given entity type, we can ask what properties it has. These “properties” correspond to columns in the table in the underlying database:
EntityProperties["employees"]
Now we can ask for the value of a particular property of a particular entity:
OK, but here’s where it gets more interesting: so far we’ve been looking at a little file-backed database. But we can do exactly the same thing with a giant database hosted on an external server.
As an example, let’s connect to the terabyte-sized OpenStreetMap PostgreSQL database that contains what is basically the street map of the world:
As before, let’s register the tables in this database as entity types. Like most in-the-wild databases there are little glitches in the structure, which are worked around, but generate warnings:
EntityRegister[EntityStore[%]]
But now we can ask questions about the database—like how many geo points or “nodes” there are in all the streets of the world (and, yes, it’s a big number, which is why the database is big):
EntityValue["planet_osm_nodes", "EntityCount"]
Here we’re asking for the names of the objects with the 10 largest (projected) areas in the (101 GB) planet_osm_polygon table (and, yes, it takes under a second):
So how does all this work? Basically what’s happening is that our Wolfram Language representation is getting compiled into low-level SQL queries that are then sent to be executed directly on the database server.
Sometimes you’ll ask for results that are just final values (like, say, the “amounts” above). But in other cases you’ll want something intermediate—like a collection of entities that have been selected in a particular way. And of course this collection could have a billion entries. So a very important feature of what we’re introducing in Version 12.0 is that we can represent and manipulate such things purely symbolically, resolving them to something specific only at the end.
Going back to our toy database, here’s an example of how we’d specify a class of entities obtained by aggregating the total creditLimit for all customers with a given value of country:
At first, this is just something symbolic. But if we ask for specific values, then actual database queries get done, and we get specific results:
EntityValue[%, {"country", "creditLimit"}]
There’s a family of new functions for setting up different kinds of queries. And the functions actually work not only for relational databases, but also for entity stores, and for the built-in Wolfram Knowledgebase. So, for example, we can ask for the average atomic mass for a given period in the periodic table of elements:
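One way this query might be written is sketched below. The variable name byPeriod is an assumption for illustration, and the call needs access to the Wolfram Knowledgebase.

```wolfram
(* Group the chemical elements by period and average the atomic mass in each group *)
byPeriod = AggregatedEntityClass["Element", "AtomicMass" -> Mean, "Period"];

(* Nothing is computed until specific values are requested *)
EntityValue[byPeriod, {"Period", "AtomicMass"}]
```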
An important new construct is EntityFunction. EntityFunction is like Function, except that its variables represent entities (or classes of entities) and it describes operations that can be performed directly on external databases. Here’s an example with built-in data, in which we’re defining a “filtered” entity class in which the filtering criterion is a function which tests population values. The FilteredEntityClass itself is just represented symbolically, but EntityList actually performs the query, and resolves an explicit list of (here, unsorted) entities:
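A filtered class of this kind can be sketched as follows. The population threshold of 100 million and the variable name big are illustrative assumptions, and the result depends on current knowledgebase data.

```wolfram
(* Symbolic description: countries whose population exceeds 100 million *)
big = FilteredEntityClass["Country",
   EntityFunction[c, c["Population"] > Quantity[10^8, "People"]]];

(* Resolving the class to an explicit list of entities performs the query *)
EntityList[big]
```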
In standard database programming, one typically ends up with a whole jungle of “joins” and “foreign keys” and so on. Our Wolfram Language representation lets you operate at a higher level—where basically joins become function composition and foreign keys are just different entity types. (If you want to do explicit joins, though, you can—for example using CombinedEntityClass.)
What’s going on under the hood is that all those Wolfram Language constructs are getting compiled into SQL, or, more accurately, the specific dialect of SQL that’s suitable for the particular database you’re using (we currently support SQLite, MySQL, PostgreSQL and MS-SQL, with support for OracleSQL coming soon). When we do the compilation, we’re automatically checking types, to make sure you get a meaningful query. Even fairly simple Wolfram Language specifications can end up turning into many lines of SQL. For example,
EntityFunction[c,
  c["employees"]["firstName"] <> " " <> c["employees"]["lastName"] <>
   " is the rep for " <> c["customerName"] <> ". Their manager is " <>
   c["employees"]["employees-reportsTo"]["firstName"] <> " " <>
   c["employees"]["employees-reportsTo"]["lastName"] <> "."][
 Entity["customers", 103]]
would produce the following intermediate SQL (here for querying the SQLite database):
The database integration system we have in Version 12.0 is pretty sophisticated—and we’ve been working on it for quite a few years. It’s an important step forward in allowing the Wolfram Language to directly handle a new level of “bigness” in big data—and to let the Wolfram Language directly do data science on terabyte-sized datasets and beyond. Like finding which street-like entities in the world have “Wolfram” in their name:
What is the best way to represent knowledge about the world? It’s an issue that’s been debated by philosophers (and others) since antiquity. Sometimes people said logic was the key. Sometimes mathematics. Sometimes relational databases. But now we at least know one solid foundation (or at least, I’m pretty sure we do): everything can be represented by computation. This is a powerful idea—and in a sense that’s what makes everything we do with Wolfram Language possible.
But are there subsets of general computation that are useful for representing at least certain kinds of knowledge? One that we use extensively in the Wolfram Knowledgebase is the notion of entities (“New York City”), properties (“population”) and their values (“8.6 million people”). Of course such triples don’t represent all knowledge in the world (“what will the position of Mars be tomorrow?”). But they’re a decent start when it comes to certain kinds of “static” knowledge about distinct things.
So how can one formalize this kind of knowledge representation? One answer is through graph databases. And in Version 12.0—in alignment with many “semantic web” projects—we’re supporting graph databases using RDF, and queries against them using SPARQL. In RDF the central object is an IRI (“Internationalized Resource Identifier”), that can represent an entity or a property. A “triplestore” then consists of a collection of triples (“subject”, “predicate”, “object”), with each element in each triple being an IRI (or a literal, such as a number). The whole object can then be thought of as a graph database or graph store, or, mathematically, a hypergraph. (It’s a hypergraph because the predicate “edges” can also be vertices elsewhere.)
You can build your own RDFStore much like you build an EntityStore, and in fact you can query any Wolfram Language EntityStore using SPARQL just like you query an RDFStore. And since the entity-property part of the Wolfram Knowledgebase can be treated as an entity store, you can also query this. So here, finally, is an example. The country-city list {Entity["Country"], Entity["City"]} in effect represents an RDF store. Then SPARQLSelect is an operator acting on this store. What it does is to try to find a triple that matches what you’re asking for, with a particular value for the “SPARQL variable” x:
In principle you can just write a SPARQL query as a string (a bit like you can write an SQL string). But what we’ve done in Version 12.0 is introduce a symbolic representation of SPARQL that allows computation on the representation itself, making it easy, for example, to automatically generate complex SPARQL queries. (And it’s particularly important to do this because, on their own, practical SPARQL queries have a habit of getting extremely long and ponderous.)
OK, but are there RDF stores out in the wild? It’s been a long-running hope that a large part of the web will somehow eventually be tagged enough to “become semantic” and in effect be a giant RDF store. It’d be great if this happened, but so far it definitely hasn’t. Still, there are a few public RDF stores out there, and also some RDF stores within organizations, and with our new capabilities in Version 12.0 we’re in a unique position to do interesting things with them.
An incredibly common form of problem in industrial applications of mathematics is: “What configuration minimizes cost (or maximizes payoff) if certain constraints have to be satisfied?” More than half a century ago, the so-called simplex algorithm was invented for solving linear versions of this kind of problem, in which both the objective function (cost, payoff) and the constraints are linear functions of the variables in the problem. By the 1980s much more efficient (“interior point”) methods had been invented—and we’ve had these for doing “linear programming” in the Wolfram Language for a long time.
But what about nonlinear problems? Well, in the general case, one can use functions like NMinimize. And they do a state-of-the-art job. But it’s a hard problem. However, some years ago, it became clear that even among nonlinear optimization problems, there’s a class of so-called convex optimization problems that can actually be solved almost as efficiently as linear ones. (“Convex” means that both the objective and the constraints involve only convex functions—so that nothing can “wiggle” as one approaches an extremum, and there can’t be any local minima that aren’t global minima.)
In Version 12.0, we’ve now got strong implementations for all the various standard classes of convex optimization. Here’s a simple case, involving minimizing a quadratic form with a couple of linear constraints:
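A small problem of this shape can be sketched as follows. The particular objective and constraints are chosen here for illustration and are not necessarily the ones in the original example.

```wolfram
(* Minimize a positive definite quadratic form subject to two linear constraints *)
sol = QuadraticOptimization[
  2 x^2 + 20 y^2 + 6 x y + 5 x,
  {-x + y >= 2, y >= 0},
  {x, y}]
(* approximately {x -> -1.73214, y -> 0.267857} *)
```

The unconstrained minimum violates -x + y >= 2, so the optimum sits on that constraint. Substituting y = x + 2 reduces the objective to 28 x^2 + 97 x + 80, whose minimum is at x = -97/56, in agreement with the numerical result.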
But if one had more variables, the old NMinimize would quickly bog down. In Version 12.0, however, QuadraticOptimization will continue to work just fine, up to more than 100,000 variables with more than 100,000 constraints (so long as they’re fairly sparse).
In Version 12.0 we’ve got “raw convex optimization” functions like SemidefiniteOptimization (that handles linear matrix inequalities) and ConicOptimization (that handles linear vector inequalities). But functions like NMinimize and FindMinimum will also automatically recognize when a problem can be solved efficiently by being transformed to a convex optimization form.
How does one set up convex optimization problems? Larger ones involve constraints on whole vectors or matrices of variables. And in Version 12.0 we now have functions like VectorGreaterEqual (input as ≥) that can immediately represent these.
Partial differential equations are hard, and we’ve been working on more and more sophisticated and general ways to handle them for 30 years. We first introduced NDSolve (for ODEs) in Version 2, back in 1991. We had our first (1+1-dimensional) numerical PDEs by the mid-1990s. In 2003 we introduced our powerful, modular framework for handling numerical differential equations. But in terms of PDEs we were still basically only dealing with simple, rectangular regions. To go beyond that required building our whole computational geometry system, which we introduced in Version 10. And with this, we released our first finite element PDE solvers. In Version 11, we then generalized to eigen problems.
Now, in Version 12, we’re introducing another major generalization: nonlinear finite element analysis. Finite element analysis involves decomposing regions into little discrete triangles, tetrahedra, etc.—on which the original PDE can be approximated by a large number of coupled equations. When the original PDE is linear, these equations will also be linear—and that’s the typical case people consider when they talk about “finite element analysis”.
But there are many PDEs of practical importance that aren’t linear—and to tackle these one needs nonlinear finite element analysis, which is what we now have in Version 12.0.
As an example, here’s what it takes to solve the nastily nonlinear PDE that describes the height of a 2D minimal surface (say, an idealized soap film), here over an annulus, with (Dirichlet) boundary conditions that make it wiggle sinusoidally at the edges (as if the soap film were suspended from wires):
On my computer it takes just a quarter of a second to solve this equation, and get an interpolating function. Here’s a plot of the interpolating function representing the solution:
We’ve put a lot of engineering into optimizing the execution of Wolfram Language programs over the years. Already in 1989 we started automatically compiling simple machine-precision numerical computations to instructions for an efficient virtual machine (and, as it happens, I wrote the original code for this). Over the years, we’ve extended the capabilities of this compiler, but it’s always been limited to fairly simple programs.
In Version 12.0 we’re taking a major step forward, and we’re releasing the first version of a new, much more powerful compiler that we’ve been working on for several years. This compiler is both able to handle a much broader range of programs (including complex functional constructs and elaborate control flows), and it’s also compiling not to a virtual machine but instead directly to optimized native machine code.
In Version 12.0 we still consider the new compiler experimental. But it’s advancing rapidly, and it’s going to have a dramatic effect on the efficiency of lots of things in the Wolfram Language. In Version 12.0, we’re just exposing a “kit form” of the new compiler, with specific compilation functions. But we’ll progressively be making the compiler operate more and more automatically—figuring out with machine learning and other methods when it’s worth taking the time to do what level of compilation.
At a technical level, the new Version 12.0 compiler is based on LLVM, and works by generating LLVM code—linking in the same low-level runtime library that the Wolfram Language kernel itself uses, and calling back to the full Wolfram Language kernel for functionality that isn’t in the runtime library.
Here’s the basic way one compiles a pure function in the current version of the new compiler:
The resulting compiled code function works just like the original function, though faster:
%[12]
A big part of what lets FunctionCompile produce a faster function is that you’re telling it to make assumptions about the type of argument it’s going to get. We’re supporting lots of basic types (like "Integer32" and "Real64"). But when you use FunctionCompile, you’re committing to particular argument types, so much more streamlined code can be produced.
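Here is a minimal sketch of committing to an argument type. The function body x*x + 1 and the name cf are illustrative assumptions.

```wolfram
(* Declare that the argument is a 64-bit machine integer, then compile *)
cf = FunctionCompile[Function[Typed[x, "Integer64"], x*x + 1]];

(* The compiled function is called like any other function *)
cf[12]
(* 145 *)
```

Because the type is fixed in advance, the compiled code can use plain machine arithmetic instead of handling arbitrary symbolic input.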
A lot of the sophistication of the new compiler is associated with inferring what types of data will be generated in the execution of a program. (There are lots of graph theoretic and other algorithms involved, and needless to say, all the metaprogramming for the compiler is done with the Wolfram Language.)
Here’s an example that involves a bit of type inference (the type of fib is deduced to be "Integer64" → "Integer64": an integer function returning an integer):
On my computer cf[25] runs about 300 times faster than the uncompiled function. (Of course, the compiled version fails when its output is no longer of type "Integer64", but the standard Wolfram Language version continues to work just fine.)
Already the compiler can handle hundreds of Wolfram Language programming primitives, appropriately tracking what types are produced—and generating code that directly implements these primitives. Sometimes, however, one will want to use sophisticated functions in the Wolfram Language for which it doesn’t make sense to generate one’s own compiled code—and where what one really wants to do is just to call into the Wolfram Language kernel for these functions. In Version 12.0 KernelFunction lets one do this:
OK, but let’s say one’s got a compiled code function. What can one do with it? Well, first of all one can just run it inside the Wolfram Language. One can store it too, and run it later. Any particular compilation is done for a specific processor architecture (e.g. 64-bit x86). But CompiledCodeFunction automatically keeps enough information to do additional compilation for a different architecture if it’s needed.
But given a CompiledCodeFunction, one of the interesting new possibilities is that one can directly generate code that can be run even outside the Wolfram Language environment. (Our old compiler had the CCodeGenerate package, which provided slightly similar capabilities in simple cases, though even then it relied on an elaborate toolchain of C compilers etc.)
Here’s how one can export raw LLVM code (notice that things like tail recursion optimization automatically get done—and notice also the symbolic function and compiler options at the end):
If one uses FunctionCompileExportLibrary, then one gets a library file—.dylib on Mac, .dll on Windows and .so on Linux. One can use this in the Wolfram Language by doing LibraryFunctionLoad. But one can also use it in an external program.
One of the main things that determines the generality of the new compiler is the richness of its type system. Right now the compiler supports 14 atomic types (such as "Boolean", "Integer8", "Complex64", etc.). It also supports type constructors like "PackedArray"—so that, for example, TypeSpecifier["PackedArray"]["Real64", 2] corresponds to a rank-2 packed array of 64-bit reals.
In the internal implementation of the Wolfram Language (which, by the way, is itself mostly in Wolfram Language) we’ve had an optimized way to store arrays for a long time. In Version 12.0 we’re exposing it as NumericArray. Unlike ordinary Wolfram Language constructs, you have to tell NumericArray in detail how it should store data. But then it works in a nice, optimized way:
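As a loose analogy only (this is Python's standard `array` module, not NumericArray), the sketch below shows what it means to declare the storage type up front: each element takes a fixed number of bytes, and values that do not fit the declared type are rejected instead of being silently widened.

```python
from array import array

# Typecode "B" requests unsigned 8-bit storage; "d" requests C doubles
# (64-bit on common platforms).
small_ints = array("B", [1, 2, 3, 255])
reals = array("d", [1.0, 2.5])

print(small_ints.itemsize, reals.itemsize)  # bytes used per element

# A value outside the declared element type raises an error.
try:
    small_ints.append(256)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```

The trade-off is the same one described above: less flexibility in what can be stored, in exchange for compact, predictable storage.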
In Version 11.2 we introduced ExternalEvaluate, which lets you do computations in languages like Python and JavaScript from within the Wolfram Language (in Python, “^” means BitXor):
ExternalEvaluate["Python", "23424^2542"]
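For reference, this is what the Python side of that call computes when run on its own. It is a small sketch rather than anything specific to ExternalEvaluate, and the variable names are illustrative.

```python
base, other = 23424, 2542

xor_result = base ^ other   # "^" is bitwise exclusive-or in Python (BitXor)
power = base ** other       # "**" is exponentiation (Power)

print(xor_result)           # → 21102
print(power.bit_length())   # size of the power in bits, to show how different it is
```

So a Wolfram Language user expecting 23424 raised to the power 2542 gets a five-digit number instead.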
In Version 11.3, we introduced external language cells, to make it easy to enter external-language programs or other input directly in a notebook:
ExternalEvaluate["Python", "23424^2542"]
In Version 12.0, we’re tightening the integration. For example, inside an external language string, you can use <* ... *> to give Wolfram Language code to evaluate:
You can also directly use external functions (the slightly bizarrely named ord is basically the Python analog of ToCharacterCode):
ExternalFunction["Python", "ord"]["a"]
And here’s a Python pure function, represented symbolically in the Wolfram Language:
ExternalFunction["Python", "lambda x:x+1"]
ExternalFunction["Python", "lambda x:x+1"][100]
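Evaluated directly in Python, the two functions wrapped above behave as follows. This is a plain-Python sketch of the underlying calls, and `increment` is just an illustrative name for the lambda.

```python
increment = lambda x: x + 1   # same body as the Python pure function above

print(ord("a"))        # → 97, the code point of the character "a"
print(increment(100))  # → 101
```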
Calling the Wolfram Language from Python & Other Places
How should one access the Wolfram Language? There are many ways. One can use it directly in a notebook. One can call APIs that execute it in the cloud. Or one can use WolframScript in a command-line shell. WolframScript can run either against a local Wolfram Engine, or against a Wolfram Engine in the cloud. It lets you directly give code to execute:
And it lets you do things like define functions, for example with code in a file:
Along with the release of Version 12.0, we’re also releasing our first new Wolfram Language Client Library—for Python. The basic idea of this library is to make it easy for Python programs to call the Wolfram Language. (It’s worth pointing out that we’ve effectively had a C Language Client Library for no less than 30 years—through what’s now called WSTP.)
The way a Language Client Library works is different for different languages. For Python—as an interpreted language (that was actually historically informed by early Wolfram Language)—it’s particularly simple. After you set up the library, and start a session (locally or in the cloud), you can then just evaluate Wolfram Language code and get the results back in Python:
You can also directly access Wolfram Language functions (as a kind of inverse of ExternalFunction):
And you can directly interact with things like pandas structures, NumPy arrays, etc. In fact, you can in effect just treat the whole of the Wolfram Language like a giant library that can be accessed from Python. Or, of course, you can just use the nice, integrated Wolfram Language directly, perhaps creating external APIs if you need them.
More for the Wolfram “Super Shell”
One feature of using the Wolfram Language is that it lets you get away from having to think about the details of your computer system, and about things like files and processes. But sometimes one wants to work at a systems level. And for fairly simple operations, one can just use an operating system GUI. But what about for more complicated things? In the past I usually found myself using the Unix shell. But for a long time now, I’ve instead used Wolfram Language.
It’s certainly very convenient to have everything in a notebook, and it’s been great to be able to programmatically use functions like FileNames (ls), FindList (grep), SystemProcessData (ps), RemoteRunProcess (ssh) and FileSystemScan. But in Version 12.0 we’re adding a bunch of additional functions to support using the Wolfram Language as a “super shell”.
There’s RemoteFile for symbolically representing a remote file (with authentication if needed), which you can immediately use in functions like CopyFile. There’s FileConvert for directly converting files between different formats.
And if you really want to dive deep, here’s how you’d trace all the packets on ports 80 and 443 used in reading from wolfram.com:
Within the Wolfram Language, it’s been easy for a long time to interact with web servers, using functions like URLExecute and HTTPRequest, as well as $Cookies, etc. But in Version 12.0 we’re adding something new: the ability of the Wolfram Language to control a web browser, and programmatically make it do what we want. The most immediate thing we can do is just to get an image of what a website looks like to a web browser:
WebImage["https://www.wolfram.com"]
The result is an image that we can compute with:
EdgeDetect[%]
To do something more detailed, we have to start a browser session (we currently support Firefox and Chrome):
session = StartWebSession["Chrome"]
Immediately a blank browser window appears on our screen. Now we can use WebExecute to open a webpage:
You can type into fields, run JavaScript, and basically do programmatically anything you could do by hand with a web browser. Needless to say, we’ve been using a version of this technology for years inside our company to test all our various websites and web services. But now, in Version 12.0, we’re making a streamlined version generally available.
For every general-purpose computer in the world today, there are probably 10 times as many microcontrollers—running specific computations without any general operating system. A microcontroller might cost a few cents to a few dollars, and in something like a mid-range car, there might be 30 of them.
In Version 12.0 we’re introducing a Microcontroller Kit for the Wolfram Language, which lets you give symbolic specifications from which it automatically generates and deploys code to run autonomously in microcontrollers. In the typical setup, a microcontroller is continuously doing computations on data coming in from sensors, and in real time putting out signals to actuators. The most common types of computations are effectively ones in control theory and signal processing.
We’ve had extensive support for doing control theory and signal processing directly in the Wolfram Language for a long time. But now what’s possible with the Microcontroller Kit is to take what’s specified in the language and download it as embedded code in a standalone microcontroller that can be deployed anywhere (in devices, IoT, appliances, etc.).
As an example, here’s how one can generate a symbolic representation of an analog signal-processing filter:
ButterworthFilterModel[{3,2}]
We can use this filter directly in the Wolfram Language—say using RecurrenceFilter to apply it to an audio signal. We can also do things like plot its frequency response:
To deploy the filter in a microcontroller, we first have to derive from this continuous-time representation a discrete-time approximation that can be run in a tight loop (here, every 0.1 seconds) in the microcontroller:
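To make the idea of a discrete-time approximation concrete, here is a minimal Python sketch. It uses a first-order low-pass filter as a simplified stand-in for the third-order Butterworth model above, and it assumes the bilinear (Tustin) transform as the discretization method; neither choice is taken from the Microcontroller Kit itself. The cutoff of 2 and period of 0.1 are chosen to echo the values in the text, and the function names are hypothetical.

```python
def tustin_first_order_lowpass(cutoff, period):
    """Discretize H(s) = cutoff / (s + cutoff) with the bilinear (Tustin)
    substitution s = (2 / period) * (1 - z^-1) / (1 + z^-1).
    Returns (b, a1) for y[n] = b * (x[n] + x[n-1]) - a1 * y[n-1]."""
    k = cutoff * period
    b = k / (2.0 + k)
    a1 = (k - 2.0) / (2.0 + k)
    return b, a1

def run_filter(samples, b, a1):
    """Apply the difference equation one sample at a time, the way a
    tight loop on a microcontroller would."""
    x_prev = y_prev = 0.0
    out = []
    for x in samples:
        y = b * (x + x_prev) - a1 * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out

b, a1 = tustin_first_order_lowpass(cutoff=2.0, period=0.1)
step_response = run_filter([1.0] * 200, b, a1)
print(step_response[-1])  # settles near 1.0, the DC gain of the filter
```

The point of the exercise is that the continuous-time transfer function becomes a short recurrence with fixed coefficients, which is exactly the kind of computation that fits in a loop executed every sampling period.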
Now we’re ready to use the Microcontroller Kit to actually deploy this to a microcontroller. The kit supports more than a hundred different types of microcontrollers. Here’s how we could deploy the filter to an Arduino Uno that we have connected to a serial port on our computer:
MicrocontrollerEmbedCode works by generating appropriate C-like source code, compiling it for the microcontroller architecture you want, then actually deploying it to the microcontroller through its so-called programmer. Here’s the actual source code that was generated in this particular case:
%["SourceCode"]
So now we have a thing like this that runs our Butterworth filter, that we can use anywhere:
If we want to check what it’s doing, we can always connect it back into the Wolfram Language using DeviceOpen to open its serial port, and read and write from it.
What’s the relation between the Wolfram Language and video games? Over the years, the Wolfram Language has been used behind the scenes in many aspects of game development (simulating strategies, creating geometries, analyzing outcomes, etc.). But for some time now we’ve been working on a closer link between Wolfram Language and the Unity game environment, and in Version 12.0 we’re releasing a first version of this link.
The basic scheme is to have Unity running alongside the Wolfram Language, then to set up two-way communication, allowing both objects and commands to be exchanged. The under-the-hood plumbing is quite complex, but the result is a nice merger of the strengths of Wolfram Language and Unity.
This sets up the link, then starts a new project in Unity:
Within the Wolfram Language there’s a symbolic representation of the object, and UnityLink now provides hundreds of functions for manipulating such objects, always maintaining versions both in Unity and in the Wolfram Language.
It’s very powerful that one can take things from the Wolfram Language and immediately put them into Unity—whether they’re geometry, images, audio, geo terrain, molecular structures, 3D anatomy, or whatever. It’s also very powerful that such things can then be manipulated within the Unity game, either through things like game physics, or by user action. (Eventually, one can expect to have Manipulate-like functionality, in which the controls aren’t just sliders and things, but complex pieces of gameplay.)
We’ve done experiments with putting Wolfram Language–generated content into virtual reality since the early 1990s. But in modern times Unity has become something of a de facto standard for setting up VR/AR environments—and with UnityLink it’s now straightforward to routinely put things from Wolfram Language into any modern XR environment.
One can use the Wolfram Language to prepare material for Unity games, but within a Unity game UnityLink also basically lets one just insert Wolfram Language code that can be executed during a game either on a local machine or through an API in the Wolfram Cloud. And, among other things, this makes it straightforward to put hooks into a game so the game can send “telemetry” (say to the Wolfram Data Drop) for analysis in the Wolfram Language. (It’s also possible to script the playing of the game—which is, for example, very useful for game testing.)
Writing games is a complex matter. But UnityLink provides an interesting new approach that should make it easier to prototype all sorts of games, and to learn the ideas of game development. One reason for this is that it effectively lets one script a game at a higher level by using symbolic constructs in the Wolfram Language. But another reason is that it lets the development process be done incrementally in a notebook, and explained and documented every step of the way. For example, here’s what amounts to a computational essay describing the development of a “piano game”:
UnityLink isn’t a simple thing: it contains more than 600 functions. But with those functions it’s possible to access pretty much all the capabilities of Unity, and to set up pretty much any imaginable game.
For something like reinforcement learning it’s essential to have a manipulable external environment in the loop when one’s doing machine learning. Well, ServiceExecute lets you call APIs (what’s the effect of posting that tweet, or making that trade?), and DeviceExecute lets you actuate actual devices (turn the robot left) and get data from sensors (did the robot fall over?).
But for many purposes what one instead wants is to have a simulated external environment. And in a way, just the pure Wolfram Language already to some extent does that, for example providing access to a rich “computational universe” full of modifiable programs and equations (cellular automata, differential equations, …). And, yes, the things in that computational universe can be informed by the real world—say with the realistic properties of oceans, or chemicals or mountains.
But what about environments that are more like the ones we modern humans typically learn in—full of built engineering structures and so on? Conveniently enough, SystemModel gives access to lots of realistic engineering systems. And through UnityLink we can expect to have access to rich game-based simulations of the world.
But as a first step, in Version 12.0 we’re setting up connections to some simple games—in particular from the OpenAI “gym”. The interface is much as it would be for interacting with the real world, with the game accessed like a “device” (after appropriate sometimes-“open-source-painful” installation):
With a bit more effort, we can take 100 random actions in the game (always checking that we didn’t “die”), then show a feature space plot of the observed states of the game:
In Version 11.3 we began our first connection to the blockchain. Version 12.0 adds a lot of new features and capabilities, perhaps most notably the ability to write to public blockchains, as well as read from them. (We also have our own Wolfram Blockchain for Wolfram Cloud users.) We’re currently supporting Bitcoin, Ethereum and ARK blockchains, both their mainnets and testnets (and, yes, we have our own nodes connecting directly to these blockchains).
In Version 11.3 we allowed raw reading of transactions from blockchains. In Version 12.0 we’ve added a layer of analysis, so that, for example, you can ask for a summary of “CK” tokens (AKA CryptoKitties) on the Ethereum blockchain:
BlockchainTokenData["CK"]
It’s quick to look at all token transactions in history, and make a word cloud of how active different tokens have been:
But what about doing our own transaction? Let’s say we want to use a Bitcoin ATM (like the one that, bizarrely, exists at a bagel store near me) to transfer cash to a Bitcoin address. Well, first we create our crypto keys (and we need to make sure we remember our private key!):
keys=GenerateAsymmetricKeyPair["Bitcoin"]
Next, we have to take our public key and generate a Bitcoin address from it:
Make a QR code from that and you’re ready to go to the ATM:
BarcodeImage[%,"QR"]
But what if we want to write to the blockchain ourselves? Here we’ll use the Bitcoin testnet (so we’re not spending real money). This shows an output from a transaction we did before—that includes 0.0002 bitcoin (i.e. 20,000 satoshi):
Now we can set up a transaction which takes this output, and, for example, sends 8000 satoshi to each of two addresses (that we defined just like for the ATM transaction):
OK, so now we’ve got a blockchain transaction object—that would offer a fee (shown in red because it’s “actual money” you’ll spend) of all the leftover cryptocurrency (here 4000 satoshi) to a miner willing to put the transaction in the blockchain. But before we can submit this transaction (and “spend the money”) we have to sign it with our private key:
BlockchainTransactionSign[%, keys["PrivateKey"]]
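The fee in this transaction is not set explicitly; it is whatever remains of the input after the outputs are paid. A short Python sketch of the arithmetic, using the amounts quoted above (the variable names are illustrative):

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000  # 1 bitcoin = 10^8 satoshi

# Decimal avoids binary floating-point error in the BTC amount.
input_satoshi = int(Decimal("0.0002") * SATOSHI_PER_BTC)
outputs_satoshi = [8000, 8000]  # one payment to each of the two addresses

fee_satoshi = input_satoshi - sum(outputs_satoshi)
print(input_satoshi, fee_satoshi)  # → 20000 4000
```

This is why forgetting to add a change output back to yourself can be expensive: anything not assigned to an output goes to the miner.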
Finally, we just apply BlockchainTransactionSubmit and we’ve submitted our transaction to be put on the blockchain:
BlockchainTransactionSubmit[%]
Here’s its transaction ID:
txid=%["TransactionID"]
If we immediately ask about this transaction, we’ll get a message saying it isn’t in the blockchain:
BlockchainTransactionData[txid]
But after we wait a few minutes, there it is—and it’ll soon spread to every copy of the Bitcoin testnet blockchain:
BlockchainTransactionData[txid]
If you’re prepared to spend real money, you can use exactly the same functions to do a transaction on a mainnet. You can also do things like buy CryptoKitties. Functions like BlockchainContractValue can be used for any (for now, only Ethereum) smart contract, and are set up to immediately understand things like ERC-20 and ERC-721 tokens.
Dealing with blockchains involves lots of cryptography, some of which is new in Version 12.0 (notably, handling elliptic curves). But in Version 12.0 we’re also extending our non-blockchain cryptographic functions. For example, we’ve now got functions for directly dealing with digital signatures. This creates a digital signature using the private key from above:
In Version 12.0, we added several new types of hashes for the Hash function, particularly to support various cryptocurrencies. We also added ways to generate and verify derived keys. Start from any password, and GenerateDerivedKey will “puff it out” to something longer (to be more secure you should add “salt”):
GenerateDerivedKey["meow"]
Here’s a version of the derived key, suitable for use in various authentication schemes:
GenerateDerivedKey["meow"]["PHCString"]
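To illustrate why salt matters, here is a minimal Python sketch using PBKDF2-HMAC-SHA256 from the standard library. This is a swapped-in algorithm chosen for illustration; it is not claimed to be what GenerateDerivedKey uses internally, and the `derive_key` helper and its parameter values are assumptions.

```python
import hashlib
import os

def derive_key(password, salt, iterations=100_000, length=32):
    """Stretch a password into a fixed-length key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )

salt_a = os.urandom(16)  # random salt, stored alongside the derived key
salt_b = os.urandom(16)

key_a = derive_key("meow", salt_a)
key_b = derive_key("meow", salt_b)

print(len(key_a))                            # key length in bytes
print(key_a == derive_key("meow", salt_a))   # same password and salt: same key
print(key_a == key_b)                        # same password, different salt
```

With distinct salts, two users who happen to pick the same password end up with unrelated keys, so a precomputed table of common passwords is of no use to an attacker.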
Connecting to Financial Data Feeds
The Wolfram Knowledgebase contains all sorts of financial data. Typically there’s a financial entity (like a stock), then there’s a property (like price). Here’s the complete daily history of Apple’s stock price (it’s very impressive that it looks best on a log scale):
But while the financial data in the Wolfram Knowledgebase, and standardly available in the Wolfram Language, is continuously updated, it’s not real time (mostly it’s 15-minute delayed), and it doesn’t have all the detail that many financial traders use. For serious finance use, however, we’ve developed Wolfram Finance Platform. And now, in Version 12.0, it’s got direct access to Bloomberg and Reuters financial data feeds.
The way we architect the Wolfram Language, the framework for the connections to Bloomberg and Reuters is always available in the language—but it’s only activated if you have Wolfram Finance Platform, as well as the appropriate Bloomberg or Reuters subscriptions. But assuming you have these, here’s what it looks like to connect to the Bloomberg Terminal service:
ServiceConnect["BloombergTerminal"]
All the financial instruments handled by the Bloomberg Terminal now become available as entities in the Wolfram Language:
Entity["BloombergTerminal","AAPL US Equity"]
Now we can ask for properties of this entity:
Entity["BloombergTerminal","AAPL US Equity"]["PX_LAST"]
Altogether there are more than 60,000 properties accessible from the Bloomberg Terminal:
Length[EntityProperties["BloombergTerminal"]]
Here are 5 random examples (yes, they’re pretty detailed; those are Bloomberg names, not ours):
We support the Bloomberg Terminal service, the Bloomberg Data License service, and the Reuters Elektron service. One sophisticated thing one can now do is to set up a continuous task to asynchronously receive data, and call a “handler function” every time a new piece of data comes in:
I’ve talked about lots of new functions and new functionality in the Wolfram Language. But what about the underlying infrastructure of the Wolfram Language? Well, we’ve been working hard on that too. For example, between Version 11.3 and Version 12.0 we’ve managed to fix nearly 8000 reported bugs. We’ve also made lots of things faster and more robust. And in general we’ve been tightening the software engineering of the system, for example reducing the initial download size by nearly 10% (despite all the functionality that’s been added). (We’ve also done things like improve the predictive prefetching of knowledgebase elements from the cloud—so when you need similar data it’s more likely to be already cached on your computer.)
It’s a longstanding feature of the computing landscape that operating systems are continually getting updated—and to take advantage of their latest features, applications have to get updated too. We’ve been working for several years on a major update to our Mac notebook interface—which is finally ready in Version 12.0. As part of the update, we’ve rewritten and restructured large amounts of code that have been developed and polished over more than 20 years, but the result is that in Version 12.0, everything about our system on the Mac is fully 64-bit, and makes use of the latest Cocoa APIs. This means that the notebook front end is significantly faster—and can also go beyond the previous 2 GB memory limit.
There’s also a platform update on Linux, where the notebook interface now fully supports Qt 5, which allows all rendering operations to take place “headlessly”, without any X server, greatly streamlining deployment of the Wolfram Engine in the cloud. (Version 12.0 doesn’t yet have high-DPI support for Windows, but that’s coming very soon.)
The development of the Wolfram Cloud is in some ways separate from the development of the Wolfram Language, and Wolfram Desktop applications (though for internal compatibility we’re releasing Version 12.0 at the same time in both environments). But in the past year since Version 11.3 was released, there’s been dramatic progress in the Wolfram Cloud.
Especially notable are the advances in cloud notebooks—supporting more interface elements (including some, like embedded websites and videos, that aren’t even yet available in desktop notebooks), as well as greatly increased robustness and speed. (Making our whole notebook interface work in a web browser is no small feat of software engineering, and in Version 12.0 there are some pretty sophisticated strategies for things like maintaining consistent fast-to-load caches, along with full symbolic DOM representations.)
In Version 12.0 there’s now just a simple menu item (File > Publish to Cloud …) to publish any notebook to the cloud. And once the notebook is published, anyone in the world can interact with it—as well as make their own copy so they can edit it.
It’s interesting to see how broadly the cloud has entered what can be done in the Wolfram Language. In addition to all the seamless integration of the cloud knowledgebase, and the ability to reach out to things like blockchains, there are also conveniences like Send To… sending any notebook through email, using the cloud if there’s no direct email server connection available.
And a Lot Else…
Even though this has been a long piece, it’s not even close to telling the whole story of what’s new in Version 12.0. Along with the rest of our team, I’ve been working very hard on Version 12.0 for a long time now—but it’s still exciting to see just how much is actually in it.
But what’s critical (and a lot of work to achieve!) is that everything we’ve added is carefully designed to fit coherently with what’s already there. From the very first version more than 30 years ago of what’s now the Wolfram Language, we’ve been following the same core principles—and this is part of what’s allowed us to so dramatically grow the system while maintaining long-term compatibility.
It’s always difficult to decide exactly what to prioritize developing for each new version, but I’m very pleased with the choices we made for Version 12.0. I’ve given many talks over the past year, and I’ve been very struck with how often I’ve been able to say about things that come up: “Well, it so happens that that’s going to be part of Version 12.0!”
I’ve personally been using internal preliminary builds of Version 12.0 for nearly a year, and I’ve come to take for granted many of its new capabilities—and to use and enjoy them a lot. So it’s a great pleasure that today we have the final Version 12.0—with all these new capabilities officially in it, ready to be used by anyone and everyone…
This is an edited transcript of a recent talk I gave at a blockchain conference, where I said I’d talk about “What will the world be like when computational intelligence and computational contracts are ubiquitous?”
We live in an interesting time today—a time when we’re just beginning to see the implications of what we might call “the force of computation”. In the end, it’s something that’s going to affect almost everything. And what’s going to happen is really a deep story about the interplay between the human condition, the achievements of human civilization—and the fundamental nature of this thing we call computation.
So what is computation? Well, it’s what happens when you follow rules, or what we call programs. Now of course there are plenty of programs that we humans have written to do particular things. But what about programs in general—programs in the abstract? Well, there’s an infinite universe of possible programs out there. And many years ago I turned my analog of a telescope towards that computational universe. And this is what I saw:
Each box represents a different simple program. And often they just do something simple. But look more carefully. There’s a big surprise. This is the first example I saw—rule 30:
You start from one cell, and you just follow that simple program—but here’s what you get: all that complexity. At first it’s hard to believe that you can get so much from so little. But seeing this changed my whole worldview, and made me realize just how powerful the force of computation is.
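The rule itself fits in a line. As a minimal sketch in Python (the function name and the zero-padded boundary are choices made here for illustration): each new cell is the left neighbor XOR (the cell itself OR its right neighbor).

```python
def rule30(steps):
    """Evolve rule 30 from a single black cell for the given number of steps.
    Returns a list of rows, each a list of 0s and 1s of width 2 * steps + 1."""
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1  # one black cell in the middle
    rows = [row]
    for _ in range(steps):
        padded = [0] + row + [0]  # cells beyond the edge are treated as white
        row = [
            padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, width + 1)
        ]
        rows.append(row)
    return rows

for r in rule30(15):
    print("".join("#" if cell else "." for cell in r))
```

Running it for a few hundred steps instead of 15 shows the characteristic mix of regular stripes on the left and seemingly random structure on the right.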
Because that’s what’s making all that complexity. And that’s what lets nature—seemingly so effortlessly—make the complexity it does. It’s also what allows something like mathematics to have the richness it does. And it provides the raw material for everything it’s possible for us humans to do.
Now the fact is that we’re only just starting to tap the full force of computation. And actually, most of the things we do today—as well as the technology we build—are specifically set up to avoid it. Because we think we have to make sure that everything stays simple enough that we can always foresee what’s going to happen.
But to take advantage of all that power out there in the computational universe, we’ve got to go beyond that. So here’s the issue: there are things we humans want to do—and then there’s all that capability out there in the computational universe. So how do we bring them together?
Well, actually, I’ve spent a good part of my life trying to solve that—and I think the key is what I call computational language. And, yes, there’s only basically one full computational language that exists in the world today—and it’s the one I’ve spent the past three decades building—the Wolfram Language.
Traditional computer languages—“programming languages”—are designed to tell computers what to do, in essentially the native terms that computers use. But the idea of a computational language is instead to take the kind of things we humans think about, and then have a way to express them computationally. We need a computational language to be able to talk not just about data types and data structures in a computer, but also about real things that exist in our world, as well as the intellectual frameworks we use to discuss them.
And with a computational language, we have not only a way to help us formulate our computational thinking, but also a way to communicate to a computer on our terms.
I think the arrival of computational language is something really important. There’s some analog of it in the arrival of mathematical notation 400 or so years ago—that’s what allowed math to take off, and in many ways launched our modern technical world. There’s also some analog in the whole idea of written language—which launched so many things about the way our world is set up.
But, you know, if we look at history, probably the single strongest systematic trend is the advance of technology. That over time there’s more and more that we’ve been able to automate. And with computation that’s dramatically accelerating. And in the end, in some sense, we’ll be able to automate almost everything. But there’s still something that can’t be automated: the question of what we want to do.
It’s the pattern of technology today, and it’s going to increasingly be the pattern of technology in the future: we humans define what we want to do—we set up goals—and then technology, as efficiently as possible, tries to do what we want. Of course, a critical part of this is explaining what we want. And that’s where computational language is crucial: because it’s what allows us to translate our thinking to something that can be executed automatically by computation. In effect, it’s a bridge between our patterns of thinking, and the force of computation.
Let me say something practical about computational language for a moment. Back at the dawn of the computer industry, we were just dealing with raw computers programmed in machine code. But soon there started to be low-level programming languages, then we started to be able to take it for granted that our computers would have operating systems, then user interfaces, and so on.
Well, one of my goals is to make computational intelligence also something that’s ubiquitous. So that when you walk up to your computer you can take for granted that it will have the knowledge—the intelligence—of our civilization built into it. That it will immediately know facts about the world, and be able to use the achievements of science and other areas of human knowledge to work things out.
Obviously with Wolfram Language and Wolfram|Alpha and so on we’ve built a lot of this. And you can even often use human natural language to do things like ask questions. But if you really want to build up anything at all sophisticated, you need a more systematic way to express yourself, and that’s where computational language—and the Wolfram Language—is critical.
OK, well, here’s an important use case: computational contracts. In today’s world, we’re typically writing contracts in natural language, or actually in something a little more precise: legalese. But what if we could write our contracts in computational language? Then they could always be as precise as we want them to be. But there’s something else: they can be executed automatically, and autonomously. Oh, as well as being verifiable, and simulatable, and so on.
Computational contracts are something more general than typical blockchain smart contracts. Because by their nature they can talk about the real world. They don’t just involve the motion of cryptocurrency; they involve data and sensors and actuators. They involve turning questions of human judgement into machine learning classifiers. And in the end, I think they’ll basically be what run our world.
Right now, most of what the computers in the world do is to execute tasks we basically initiate. But increasingly our world is going to involve computers autonomously interacting with each other, according to computational contracts. Once something happens in the world—some computational fact is established—we’ll quickly see cascades of computational contracts executing. And there’ll be all sorts of complicated intrinsic randomness in the interactions of different computational acts.
In a sense, what we’ll have is a whole AI civilization. With its own activities, and history, and memories. And the computational contracts are in effect the laws of the AI civilization. We’ll probably want to have a kind of AI constitution, that defines how generally we want the AIs to act.
Not everyone or every country will want the same one. But we’ll often want to say things like “be nice to humans”. But how do we say that? Well, we’ll have to use a computational language. Will we end up with some tiny statement—some golden rule—that will just achieve everything we want? The complexity of human systems of laws doesn’t make that seem likely. And actually, with what we know about computation, we can see that it’s theoretically impossible.
Because, basically, it’s inevitable that there will be unintended consequences—corner cases, or bugs, or whatever. And there’ll be an infinite hierarchy of patches one needs to apply—a bit like what we see in human laws.
You know, I keep on talking about computers and AIs doing computation. But actually, computation is a more general thing. It’s what you get by following any set of rules. They could be rules for a computer program. But they could also be rules, say, for some technological system, or some system in nature.
Think about all those programs out in the computational universe. In detail, they’re all doing different things. But how do they compare? Is there some whole hierarchy of who’s more powerful than whom? Well, it turns out that the computational universe is a very egalitarian place—because of something I discovered called the Principle of Computational Equivalence.
Because what this principle says is that all programs whose behavior is not obviously simple are actually equivalent in the sophistication of the computations they do. It doesn’t matter if your rules are very simple or very complicated: there’s no difference in the sophistication of the computations that get done.
It’s been more than 80 years since the idea of universal computation was established: that it’s possible to have a fixed machine that can be programmed to do any possible computation. And obviously that’s been an important idea—because it’s what launched the software industry, and much of current technology.
But the Principle of Computational Equivalence says something more: it says that not only is something like universal computation possible, it’s ubiquitous. Out in the computational universe of possible programs many achieve it, even very simple ones, like rule 30. And, yes, in practice that means we can expect to make computers out of much simpler—say molecular—components than we might ever have imagined. And it means that all sorts of even rather simple software systems can be universal—and can’t be guaranteed secure.
But there’s a more fundamental consequence: the phenomenon of computational irreducibility. Being able to predict stuff is a big thing, for example in traditional science-oriented thinking. But if you’re going to predict what a computational system—say rule 30—is going to do, what it means is that somehow you have to be smarter than it is. But the Principle of Computational Equivalence says that’s not possible. Whether it’s a computer or a brain or anything else, it’s doing computations that have exactly the same sophistication.
So it can’t outrun the actual system itself. The behavior of the system is computationally irreducible: there’s no way to find out what it will be except in effect by explicitly running or watching it. You know, I came up with the idea of computational irreducibility in the early 1980s, and I’ve thought a lot about its applications in science, in understanding phenomena like free will, and so on. But I never would have guessed that it would find an application in proof-of-work for blockchains, and that measurable fractions of the world’s computers would be spending their time purposefully grinding computational irreducibility.
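To make "explicitly running it" concrete, here is a minimal sketch that evolves rule 30 from a single black cell and reads off the center column. The variable names and the choice of 100 steps are illustrative assumptions, not anything taken from the talk.

```wolfram
(* Evolve rule 30 from a single 1 on a background of 0s *)
steps = 100;
evolution = CellularAutomaton[30, {{1}, 0}, steps];

(* Each row has 2*steps + 1 cells, so the center cell sits at position steps + 1 *)
centerColumn = evolution[[All, steps + 1]];

(* The only general way to get these values is to run the evolution itself *)
ArrayPlot[evolution]
```

The center column begins 1, 1, 0, 1, and no known shortcut gives step n without in effect computing the steps before it.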
By the way, it’s computational irreducibility that means you’ll always have unintended consequences, and you won’t be able to have things like a simple and complete AI constitution. But it’s also computational irreducibility that in a sense means that history is significant: that there’s something irreducible achieved by the course of history.
You know, so far in history we’ve only really had one example of what we’re comfortable calling “intelligence”—and that’s human intelligence. But something the Principle of Computational Equivalence implies is that actually there are lots of things that are computationally just as sophisticated. There’s AI that we purposefully build. But then there are also things like the weather. Yes, we might say in some animistic way “the weather has a mind of its own”. But what the Principle of Computational Equivalence implies is that in some real sense it does: that the hydrodynamic processes in the atmosphere are just as sophisticated as anything going on in our brains.
And when we look out into the cosmos, there are endless examples of sophisticated computation—that we really can’t distinguish from “extraterrestrial intelligence”. The only difference is that—like with the weather—it’s just computation going on. There’s no alignment with human purposes. Of course, that’s a slippery business. Is that graffiti on the blockchain put there on purpose? Or is it just the result of some computational process?
That’s why computational language is important: it provides a bridge between raw computation and human thinking. If we look inside a typical modern neural net, it’s very hard to understand what it does. Same with the intermediate steps of an automated proof of a theorem. The issue is that there’s no “human story” that can be told about what’s going on there. It’s computation, alright. But—a bit like the weather—it’s not computation that’s connected to human experience.
It’s a bit of a complicated thing, though. Because when things get familiar, they do end up seeming human. We invent words for common phenomena in the weather, and then we can effectively use them to tell stories about what’s going on. I’ve spent much of my life as a computational language designer. And in a sense the essence of language design is to identify what common lumps of computational work there are, that one can make into primitives in the language.
And it’s sort of a circular thing. Once one’s developed a particular primitive—a particular abstraction—one then finds that one can start thinking in terms of it. And then the things one builds end up being based on it. It’s the same with human natural language. There was a time when the word “table” wasn’t there. So people had to start describing things with flat surfaces, and legs, and so on. But eventually this abstraction of a “table” appeared. And once it did, it started to get incorporated into the environment people built for themselves.
It’s a common story. In mathematics there are an infinite number of possible theorems. But the ones people study are ones that are reached by creating some general abstraction and then progressively building on it. When it comes to computation, there’s a lot that happens in the computational universe—just like there’s a lot that happens in the physical universe—that we don’t have a way to connect to.
It’s like the AIs are going off and leading their own existence, and we don’t know what’s going on. But that’s the importance of computational language, and computational contracts. They’re what let us connect the AIs with what we humans understand and care about.
Let’s talk a little about the more distant future. Given the Principle of Computational Equivalence I have to believe that our minds—our consciousness—can perfectly well be represented in purely digital form. So, OK, at some point the future of our civilization might be basically a trillion souls in a box. There’ll be a complicated mixing of the alien intelligence of AI with the future of human intelligence.
But here’s the terrible thing: looked at from the outside, those trillion souls that are our future will just be doing computations—and from the Principle of Computational Equivalence, those computations won’t be any more sophisticated than the computations that happen, say, with all these electrons running around inside a rock. The difference, though, is that the computations in the box are in a sense our computations; they’re computations that are connected to our characteristics and our purposes.
At some level, it seems like a bad outcome if the future of our civilization is a trillion disembodied souls basically playing videogames for the rest of eternity. But human purposes evolve. I mean, if you tried to explain to someone from a thousand years ago why today we might walk on a treadmill, you’d find it pretty difficult. And I think the good news is that at any time in history, what’s happening then can seem completely meaningful at that time.
The Principle of Computational Equivalence tells us that in a sense computation is ubiquitous. Right now the computation we define exists mostly in the computers we’ve built. But in time, I expect we won’t just have computers: everything will basically be made of computers. A bit like a generalization of how it works with biological life, every object and every material will be made of components that do computations we’ve somehow defined.
But the pressure again is on how we do that definition. Physics gives some basic rules. But we get to say more than that. And it’s computational language that makes what we say meaningful to us humans.
In the much nearer term, there’s a very important transition: the point at which literacy in computational language becomes truly commonplace. It’s been great with the Wolfram Language that we can now give kids a way to actually do computational thinking for real. It’s great that we can now have computational essays where people get to express themselves in a mixture of natural language and computational language.
But what will be possible with this? In a sense, human language was what launched civilization. What will computational language do? We can rethink almost everything: democracy that works by having everyone write a computational essay about what they want, that’s then fed to a big central AI—which inevitably has all the standard problems of political philosophy. New ways to think about what it means to do science, or to know things. Ways to organize and understand the civilization of the AIs.
A big part of this is going to start with computational contracts and the idea of autonomous computation—a kind of strange merger of the world of natural law, human law, and computational law. Something anticipated three centuries ago by people like Leibniz—but finally becoming real today. Finally a world run with code.
I’ve sometimes found it a bit of a struggle to explain what the Wolfram Language really is. Yes, it’s a computer language—a programming language. And it does—in a uniquely productive way, I might add—what standard programming languages do. But that’s only a very small part of the story. And what I’ve finally come to realize is that one should actually think of the Wolfram Language as an entirely different—and new—kind of thing: what one can call a computational language.
So what is a computational language? It’s a language for expressing things in a computational way—and for capturing computational ways of thinking about things. It’s not just a language for telling computers what to do. It’s a language that both computers and humans can use to represent computational ways of thinking about things. It’s a language that puts into concrete form a computational view of everything. It’s a language that lets one use the computational paradigm as a framework for formulating and organizing one’s thoughts.
It’s only recently that I’ve begun to properly internalize just how broad the implications of having a computational language really are—even though, ironically, I’ve spent much of my life engaged precisely in the consuming task of building the world’s only large-scale computational language.
It helps me to think about a historical analog. Five hundred years ago, if people wanted to talk about mathematical ideas and operations, they basically had to use human natural language, essentially writing out everything in terms of words. But the invention of mathematical notation about 400 years ago (starting with +, ×, =, etc.) changed all that—and began to provide a systematic structure and framework for representing mathematical ideas.
The consequences were surprisingly dramatic. Because basically it was this development that made modern forms of mathematical thinking (like algebra and calculus) feasible—and that launched the mathematical way of thinking about the world as we know it, with all the science and technology that’s come from it.
Well, I think it’s a similar story with computational language. But now what’s happening is that we’re getting a systematic way to represent—and talk about—computational ideas, and the computational way of thinking about the world. With standard programming languages, we’ve had a way to talk about the low-level operation of computers. But with computational language, we now have a way to apply the computational paradigm directly to almost anything: we have a language and a notation for doing computational X, for basically any field “X” (from archaeology to zoology, and beyond).
There’ve been some “mathematical X” fields for a while, where typically the point is to formulate things in terms of traditional mathematical constructs (like equations), that can then “mechanically” be solved (at least, say, with Mathematica!). But a great realization of the past few decades has been that the computational paradigm is much broader: much more can be represented computationally than just mathematically.
Sometimes one’s dealing with very simple abstract programs (and, indeed, I’ve spent years exploring the science of the computational universe of such programs). But often one’s interested in operations and entities that relate to our direct experience of the world. But the crucial point here is that—as we’ve learned in building the Wolfram Language—it’s possible to represent such things in a computational way. In other words, it’s possible to have a computational language that can talk about the world—in computational terms.
And that’s what’s needed to really launch all those possible “computational X” fields.
What Is Computational Language Like?
Let’s say we want to talk about planets. In the Wolfram Language, planets are just symbolic entities:
EntityList[EntityClass["Planet", All]]
We can compute things about them (here, the mass of Jupiter divided by the mass of Earth):
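The original input is not reproduced here. As a minimal sketch, one way to write that computation is shown below; it assumes the `"Mass"` property of the built-in planet entities and needs access to the Wolfram Knowledgebase. The variable names are illustrative.

```wolfram
(* Both masses come back as Quantity objects in the same units,
   so dividing them gives a pure number *)
jupiterMass = Entity["Planet", "Jupiter"]["Mass"];
earthMass = Entity["Planet", "Earth"]["Mass"];
jupiterMass/earthMass
```

The result should be a dimensionless number of roughly 318.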
To talk about the real world in computational terms, you have to be able to compute things about it. Like here, the Wolfram Language is computing the current position (as I write this) of the planet Mars:
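The original input is not reproduced here. A sketch of one way to ask for it follows, on the assumption that planet entities expose a `"HelioCoordinates"` property; treat the property name as an assumption to check against the documentation.

```wolfram
(* Heliocentric position of Mars at the moment of evaluation;
   the "HelioCoordinates" property name is assumed *)
marsPosition = Entity["Planet", "Mars"][EntityProperty["Planet", "HelioCoordinates"]]
```

Because the value depends on when it is evaluated, the output will differ from run to run.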
It’s amazing what ends up being computable. Here are rasterized images of each letter of the Greek alphabet distributed in “visual feature space”:
FeatureSpacePlot[Rasterize /@ Alphabet["Greek"]]
Yes, it is (I think) impressive what the Wolfram Language can do. But what’s more important here is to see how it lets one specify what to do. Because this is where computational language is at work—giving us a way to talk computationally about planets and human faces and visual feature spaces.
Of course, once we’ve formulated something in computational language, we’re in a position (thanks to the whole knowledgebase and algorithmbase of the Wolfram Language) to actually do a computation about it. And, needless to say, this is extremely powerful. But what’s also extremely powerful is that the computational language itself gives us a way to formulate things in computational terms.
Let’s say we want to know how efficient the Roman numeral system was. How do we formulate that question computationally? We might think about knowing the string lengths of Roman numerals, and comparing them to the lengths of modern integers. It’s easy to express that in Wolfram Language. Here’s a Roman numeral:
RomanNumeral[188]
And here’s its string length:
StringLength[RomanNumeral[188]]
Now here’s a plot of all Roman numeral lengths up to 200, divided by the corresponding integer lengths—with callouts automatically showing notable values:
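The plot itself is not reproduced here. A minimal sketch of the underlying computation could look like the following; it leaves out the automatic callouts, and the variable name and plot options are illustrative choices.

```wolfram
(* Length of each Roman numeral divided by the number of decimal digits *)
ratios = Table[StringLength[RomanNumeral[n]]/IntegerLength[n], {n, 200}];

(* Plot the ratio against n *)
ratioPlot = ListLinePlot[ratios, PlotRange -> All, AxesLabel -> {"n", "length ratio"}]
```

In this range the ratio peaks at 4, for 8 (VIII) and 88 (LXXXVIII).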
But of course in actual usage, some numbers are more common than others. So how can we capture that? Well, here’s one (rather naive) computational approach. Let’s just analyze the Wikipedia article about arithmetic, and see what integers it mentions. Again, that computational concept is easy to express in the Wolfram Language: finding cases of numbers in the article, then selecting those that are interpreted as integers:
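The original input is not reproduced here. As a sketch, the two steps just described (find the numbers, keep the integers) might be written like this; it needs network access to fetch the article, and the article title string is an assumption.

```wolfram
(* Find everything in the article text that reads as a number,
   returning the interpreted value rather than the raw string *)
numbers = TextCases[WikipediaData["Arithmetic"], "Number" -> "Interpretation"];

(* Keep only the values that are integers *)
integers = Select[numbers, IntegerQ]
```

The exact list depends on the current state of the Wikipedia article.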
There are some big numbers, with Roman-numeral representations for which the notion of “string length” doesn’t make much sense:
RomanNumeral[7485696]
And then there’s 0, for which the Romans didn’t have an explicit representation. But restricting to “Roman-stringable” numbers, we can make our histogram again:
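The original input is not reproduced here. One way to sketch the restriction and the histogram is below. The cutoff of 5000 is an assumption about where plain-letter Roman numerals stop being practical, and the article title string is likewise assumed.

```wolfram
(* Integers mentioned in the article, as interpreted values *)
integers = Select[
   TextCases[WikipediaData["Arithmetic"], "Number" -> "Interpretation"],
   IntegerQ];

(* Drop 0 and the very large numbers; the 5000 cutoff is an assumption *)
stringable = Select[integers, 0 < # < 5000 &];

(* Distribution of Roman numeral length relative to decimal digit count *)
Histogram[StringLength[RomanNumeral[#]]/IntegerLength[#] & /@ stringable]
```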
And what’s crucial here is that—with Wolfram Language—we’re in a position to formulate our thinking in terms of computational concepts, like StringLength and TextCases and Select and Histogram. And we’re able to use the computational language to express our computational thinking—in a way that humans can read, and the computer can compute from.
The Difference from Programming Languages
As a practical matter, the examples of computational language we’ve just seen look pretty different from anything one would normally do with a standard programming language. But what is the fundamental difference between a computational language and a programming language?
First and foremost, it’s that a computational language tries to intrinsically be able to talk about whatever one might think about in a computational way—while a programming language is set up to intrinsically talk only about things one can directly program a computer to do. So for example, a computational language can intrinsically talk about things in the real world—like the planet Mars or New York City or a chocolate chip cookie. A programming language can intrinsically talk only about abstract data structures in a computer.
Inevitably, a computational language has to be vastly bigger and richer than a programming language. Because while a programming language just has to know about the operation of a computer, a computational language tries to know about everything—with as much knowledge and computational intelligence as possible about the world and about computation built into it.
To be fair, the Wolfram Language is the sole example that exists of a full-scale computational language. But one gets a sense of magnitude from it. While the core of a standard programming language typically has perhaps a few tens of primitive functions built in, the Wolfram Language has more than 5600—with many of those individually representing major pieces of computational intelligence. And in its effort to be able to talk about the real world, the Wolfram Language also has millions of entities of all sorts built into it. And, yes, the Wolfram Language has had more than three decades of energetic, continuous development put into it.
Given a programming language, one can of course start programming things. And indeed many standard programming languages have all sorts of libraries of functions that have been created for them. But the objective of these libraries is not really the same as the objective of a true computational language. Yes, they’re providing specific “functions to call”. But they’re not trying to create a way to represent or talk about a broad range of computational ideas. To do that requires a coherent computational language—of the kind I’ve been building in the Wolfram Language all these years.
A programming language is (needless to say) intended as something in which to write programs. And while it’s usually considered desirable for humans to be able—at least at some level—to read the programs, the ultimate point is to provide a way to tell a computer what to do. But computational language can also achieve something else. Because it can serve as an expressive medium for communicating computational ideas to humans as well as to computers.
Even when one’s dealing with abstract algorithms, it’s common with standard programming languages to want to talk in terms of some kind of “pseudocode” that lets one describe the algorithms without becoming enmeshed in the (often fiddly) details of actual implementation. But part of the idea of computational language is always to have a way to express computational ideas directly in the language: to have the high-level expressiveness and readability of pseudocode, while still having everything be precise, complete and immediately executable on a computer.
Looking at the examples above, one thing that’s immediately obvious is that having the computational language be symbolic is critical. In most standard programming languages, x on its own without a value doesn’t mean anything; it has to stand for some structure in the memory of the computer. But in a computational language, one’s got to be able to have things that are purely symbolic, and that represent, for example, entities in the real world—that one can operate on just like any other kind of data.
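A small sketch of what "purely symbolic" means in practice, assuming `x` and `y` have not been assigned values in the current session:

```wolfram
(* x has no value, but it is still a perfectly good expression *)
Head[x]                (* Symbol *)

(* Symbols can be computed with directly *)
Expand[(x + y)^2]      (* x^2 + 2 x y + y^2 *)

(* A real-world entity is symbolic data too, and can sit in a list
   next to a symbol and a number *)
mixed = {x, Entity["Planet", "Mars"], 42}
```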
There’s a whole cascade of wonderful unifications that flow from representing everything as a symbolic expression, crucial in being able to coherently build up a full-scale computational language. And to make a computational language as readily absorbable by humans as possible, there are also all sorts of detailed issues of interface—like having hierarchically structured notebooks, allowing details of computational language to be iconized for display, and so on.
Why Not Just Use Natural Language?
Particularly in this age of machine learning one might wonder why one would need a precisely defined computational language at all. Why not just use natural language for everything?
Wolfram|Alpha provides a good example (indeed, probably the most sophisticated one that exists today) of what can be done purely with natural language. And indeed for the kinds of short questions that Wolfram|Alpha normally handles, it proves that natural language can work quite well.
But what if one wants to build up something more complicated? Just like in the case of doing mathematics without notation, it quickly becomes impractical. And I could see this particularly clearly when I was writing an introductory book on the Wolfram Language—and trying to create exercises for it. The typical form of an exercise is: “Take this thing described in natural language, and implement it in Wolfram Language”. Early in the book, this worked OK. But as soon as things got more complicated, it became quite frustrating. Because I’d immediately know what I wanted to say in Wolfram Language, but it took a lot of effort to express it in natural language for the exercise, and often what I came up with was hard to read and reminiscent of legalese.
One could imagine that with enough back-and-forth, one might be able to explain things to a computer purely in natural language. But to get any kind of clear idea of what the computer has understood, one needs some more structured representation—which is precisely what computational language provides.
And it’s certainly no coincidence that the way Wolfram|Alpha works is first to translate whatever natural language input it’s given to precise Wolfram Language—and only then to compute answers from it.
In a sense, using computational language is what lets us leverage the last few centuries of exact science and systematic knowledge. Earlier in history, one imagined that one could reason about everything just using words and natural language. But three or four centuries ago—particularly with mathematical notation and other mathematical ideas—it became clear that one could go much further if one had a structured, formal way of talking about the world. And computational language now extends that—bringing a much wider range of things into the domain of formal computational thinking, and going still further beyond natural language.
Of course, one argument for trying to use natural language is that “everybody already knows it”. But the whole point is to be able to apply computational thinking—and to do that systematically, one needs a new way of expressing oneself, which is exactly what computational language provides.
How Computational Language Leverages Natural Language
Computational language is something quite different from natural language, but in its construction it still uses natural language and people’s understanding of it. Because in a sense the “words” in the computational language are based on words in natural language. So, for example, in the Wolfram Language, we have functions like StringLength, TextCases and FeatureSpacePlot.
Each of these functions has a precise computational definition. But to help people understand and remember what the functions do, we use (very carefully chosen) natural language words in their names. In a sense, we’re leveraging people’s understanding of natural language to be able to create a higher level of language. (By the way, with our “code captions” mechanism, we’re able to at least annotate everything in lots of natural languages beyond English.)
It’s a slightly different story when it comes to the zillions of real-world entities that a computational language has to deal with. For a function like TextCases, you both have to know what it’s called, and how to use it. But for an entity like New York City, you just have to somehow get hold of it—and then it’s going to work the same as any other entity. And a convenient way to get hold of it is just to ask for it, by whatever (natural language) name you know for it.
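In a notebook this is usually done with free-form input. As a sketch of the same idea in plain code, `Interpreter` can be used instead; the variable name and the choice of the `"Population"` property are illustrative.

```wolfram
(* Turn a natural language name into a precise symbolic entity *)
nyc = Interpreter["City"]["New York City"]

(* From here on it works like any other entity *)
nyc["Population"]
```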
For convenience, the inputs here are natural language. But the outputs—sometimes after a bit of disambiguation—are precise computational language, ready to be used wherever one wants.
And in general, it’s very powerful to be able to use natural language to specify small chunks of computational language. To express large-scale computational thinking, one needs the formality and structure of computational language. But “small utterances” can be given in natural language—like in Wolfram|Alpha—then translated to precise computational language:
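The original example is not reproduced here. A rough sketch of the same idea uses `SemanticInterpretation`; the three-argument form with `Hold` is used on the assumption that it wraps the interpreted code before evaluation, so the translation itself can be inspected. The query string is only an illustration.

```wolfram
(* Translate a short natural language utterance into Wolfram Language code,
   held unevaluated so the translation can be read *)
held = SemanticInterpretation["population of France", _, Hold]

(* Release the hold to actually run the computation *)
ReleaseHold[held]
```

The precise form of the translated code depends on how the utterance is interpreted, so it may vary between versions.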
I think by now there’s little doubt the introduction of the computational paradigm is the single most important intellectual development of the past century. And going forward, I think computational language is going to be crucial in being able to broadly make use of that paradigm—much as many centuries ago, mathematical notation was crucial to launching the widespread use of the mathematical paradigm.
How should one express and communicate the ideas of a “computational X” field? Blobs of low-level programming language code won’t do it. Instead, one needs something that can talk directly about things in the field—whether they are genes, animals, words, battles or whatever. And one also needs something that humans can readily read and understand. And this is precisely what computational language can provide.
Of course, computational language also has the giant bonus that computers can understand it, and that it can be used to specify actual computations to do. In other words, by being able to express something in computational language, you’re not only finding a good way to communicate it to humans, you’re also setting up something that can leverage the power of actual computation to automatically produce things.
And I suspect that in time it will become clear that the existence of computational language as a communication medium is what ultimately succeeded in launching a huge range of computational X fields. Because it’s what will allow the ideas in these fields to be put in a concrete form that people can think in terms of.
How will the computational language be presented? Often, I suspect, it will be part of what I call computational essays. A computational essay mixes natural language text with computational language—and with the outputs of actual computations described by the computational language. It’s a little like how for the past couple of centuries, technical papers have typically relied on mixing text and formulas.
But a computational essay is something much more powerful. For one thing, people can not only read the computational language in a computational essay, but they can also immediately reuse it elsewhere. In addition, when one writes a computational essay, it’s a computer-assisted activity, in which one shares the load with the computer. The human has to write the text and the computational language, but then the computer can automatically generate all kinds of results, infographics, etc. as described by the computational language.
In practice it’s important that computational essays can be presented in Wolfram Notebooks, in the cloud and on the desktop, and that these notebooks can contain all sorts of dynamic and computational elements.
One can expect to use computational essays for a wide range of things—whether papers, reports, exercises or whatever. And I suspect that computational essays, written with computational language, will become the primary form of communication for computational X fields.
I doubt we can yet foresee even a fraction of the places where computational language will be crucial. But one place that’s already clear is in defining computational contracts. In the past, contracts have basically always been written in natural language—or at least in the variant that is legalese. But computational language provides an alternative.
With the Wolfram Language as it is today, we can’t cover everything in every contract. But it’s already clear how we can use computational language to represent many kinds of things in the world that are the subject of contracts. And the point is that with computational language we can write a precise contract that both humans and machines can understand.
In time there’ll be computational contracts everywhere: for commerce, for defining goals, for AI ethics, and so on. And computational language is what will make them all possible.
When literacy in natural language began to become widespread perhaps 500 years ago, it led to sweeping changes in how the world could be organized, and in the development of civilization. In time I think it’s inevitable that there’ll also be widespread literacy in computational language. Certainly that will lead to much broader application of computational thinking (and, for example, the development of many “computational X” fields). And just as our world today is full of written natural language, so in the future we can expect that there will be computational language everywhere—that both defines a way for us humans to think in computational terms, and provides a bridge between human thinking and the computation that machines and AIs can do.
How Come It’s So Unique?
I’ve talked a lot about the general concept of computational language. But in the world today, there’s actually only one example that exists of a full-scale computational language: the Wolfram Language. At first, it might seem strange that one could say this so categorically. With all the technology out there in the world, how could something be that unique?
But it is. And I suppose this becomes a little less surprising when one realizes that we’ve been working on the Wolfram Language for well over thirty years—or more than half of the whole history of modern computing. And indeed, the span of time over which we’ve been able to consistently pursue the development of the Wolfram Language is now longer than for almost any other software system in history.
Did I foresee the emergence of the Wolfram Language as a full computational language? Not entirely. When I first started developing what’s now the Wolfram Language I wanted to make it as general as possible—and as flexible as possible in representing computational ideas and processes.
At first, its most concrete applications were to mathematics, and to various kinds of modeling. But as time went on, I realized that more and more types of things could fit into the computational framework that we’d defined. And gradually this started to include things in the real world. Then, about a decade and a half ago, I realized that, yes, with the whole symbolic language we’d defined, we could just start systematically representing all those things like cities and chemicals in pretty much the same way as we’d represented abstract things before.
I’d always had the goal of putting as much knowledge as possible into the language, and of automating as much as possible. But from the beginning I made sure that the language was based on a small set of principles—and that as it grew it maintained a coherent and unified design.
Needless to say, this wasn’t easy. And indeed it’s been my daily activity now for more than 30 years (with, for example, 300+ hours of it livestreamed over the past year). It’s a difficult process, involving deep understanding of every area the language covers, as well as a string of complicated judgement calls. But it’s the coherence of design that this achieves that has allowed the language to maintain its unity even as it has grown to encompass all sorts of knowledge about the real world, as well as all those other things that make it a full computational language.
Part of what’s made the Wolfram Language possible is the success of its principles and basic framework. But to actually develop it has also involved the creation of a huge tower of technology and content—and the invention of countless algorithms and meta-algorithms, as well as the acquisition and curation of immense amounts of data.
It’s been a strange mixture of intellectual scholarship and large-scale engineering—that we’ve been fortunate enough to be able to consistently pursue for decades. In many ways, this has been a personal mission of mine. And along the way, people have often asked me how to pigeonhole what we’re building. Is it a calculation system? Is it an encyclopedia-like collection of data? Is it a programming language?
Well, it’s all of those things. But they’re only part of the story. And as the Wolfram Language has developed, it’s become increasingly clear how far away it is from existing categories. And it’s only quite recently that I’ve finally come to understand what it is we’ve managed to build: the world’s only full computational language. Having understood this, it starts to be easier to see just how what we’ve been doing all these years fits into the arc of intellectual history, and what some of its implications might be going forward.
From a practical point of view, it’s great to be able to respond to that obvious basic question: “What is the Wolfram Language?” Because now we have a clear answer: “It’s a computational language!” And, yes, that’s very important!
Today it’s 10 years since we launched Wolfram|Alpha. At some level, Wolfram|Alpha is a never-ending project. But it’s had a great first 10 years. It was a unique and surprising achievement when it first arrived, and over its first decade it’s become ever stronger and more unique. It’s found its way into more and more of the fabric of the computational world, both realizing some of the long-term aspirations of artificial intelligence, and defining new directions for what one can expect to be possible. Oh, and by now, a significant fraction of a billion people have used it. And we’ve been able to keep it private and independent, and its main website has stayed free and without external advertising.
For me personally, the vision that became Wolfram|Alpha has a very long history. I first imagined creating something like it more than 47 years ago, when I was about 12 years old. Over the years, I built some powerful tools—most importantly the core of what’s now Wolfram Language. But it was only after some discoveries I made in basic science in the 1990s that I felt emboldened to actually try building what’s now Wolfram|Alpha.
It was—and still is—a daunting project. To take all areas of systematic knowledge and make them computable. To make it so that any question that can in principle be answered from knowledge accumulated by our civilization can actually be answered, immediately and automatically.
Leibniz had talked about something like this 350 years ago; Turing 70 years ago. But while science fiction (think the Star Trek computer) had imagined it, and AI research had set it as a key goal, 50 years of actual work on question-answering had failed to deliver. And I didn’t know for sure if we were in the right decade—or even the right century—to be able to build what I wanted.
But I decided to try. And it took lots of ideas, lots of engineering, lots of diverse scholarship, and lots of input from experts in a zillion fields. But by late 2008 we’d managed to get Wolfram|Alpha to the point where it was beginning to work. Day by day we were making it stronger. But eventually there was no sense in going further until we could see how people would actually use it.
And so it was that on May 18, 2009, we officially opened Wolfram|Alpha up to the world. And within hours we knew it: Wolfram|Alpha really worked! People asked all kinds of questions, and got successful answers. And it became clear that the paradigm we’d invented of generating synthesized reports from natural language input by using built-in computational knowledge was very powerful, and was just what people needed.
Perhaps because the web interface to Wolfram|Alpha was just a simple input field, some people assumed it was like a search engine, finding content on the web. But Wolfram|Alpha isn’t searching anything; it’s computing custom answers to each particular question it’s asked, using its own built-in computational knowledge—that we’ve spent decades amassing. And indeed, quite soon, it became clear that the vast majority of questions people were asking were ones that simply didn’t have answers already written down anywhere on the web; they were questions whose answers had to be computed, using all those methods and models and algorithms—and all that curated data—that we’d so carefully put into Wolfram|Alpha.
As the years have gone by, Wolfram|Alpha has found its way into intelligent assistants like Siri, and now also Alexa. It’s become part of chatbots, tutoring systems, smart TVs, NASA websites, smart OCR apps, talking (toy) dinosaurs, smart contract oracles, and more. It’s been used by an immense range of people, for all sorts of purposes. Inventors have used it to figure out what might be possible. Leaders and policymakers have used it to make decisions. Professionals have used it to do their jobs every day. People around the world have used it to satisfy their curiosity about all sorts of peculiar things. And countless students have used it to solve problems, and learn.
And in addition to the main, public Wolfram|Alpha, there are now all sorts of custom “enterprise” Wolfram|Alphas operating inside large organizations, answering questions using not only public data and knowledge, but also the internal data and knowledge of those organizations.
It’s fun when I run into high-school and college kids who notice my name and ask “Are you related to Wolfram|Alpha?” “Well”, I say, “actually, I am”. And usually there’s a look of surprise, and a slow dawning of the concept that, yes, Wolfram|Alpha hasn’t always existed: it had to be created, and there was an actual human behind it. And then I often explain that actually I first started thinking about building it a long time ago, when I was even younger than them…
How Come It Actually Worked?
When I started building Wolfram|Alpha I certainly couldn’t prove it would work. But looking back, I realize there was a collection of key things—mostly quite unique to us and our company—that ultimately made it possible. Some were technical, some were conceptual, and some were organizational.
On the technical side, the most important was that we had what was then Mathematica, but is now the Wolfram Language. And by the time we started building Wolfram|Alpha, it was clear that the unique symbolic programming paradigm that we’d invented to be the core of the Wolfram Language was incredibly general and powerful—and could plausibly succeed at the daunting task of providing a way to represent all the computational knowledge in the world.
It also helped a lot that there was so much algorithmic knowledge already built into the system. Need to solve a differential equation to compute a trajectory? Just use the built-in NDSolve function! Need to solve a difficult recurrence relation? Just use RSolve. Need to simplify a piece of logic? Use BooleanMinimize. Need to do the combinatorial optimization of finding the smallest number of coins to give change? Use FrobeniusSolve. Need to find out how long to cook a turkey of a certain weight? Use DSolve. Need to find the implied volatility of a financial derivative? Use FinancialDerivative. And so on.
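To make the coin example concrete, here is a minimal sketch in Python of the underlying combinatorial problem: finding the smallest number of coins that add up to a given amount. It is only an illustration of the problem, not a description of how FrobeniusSolve works internally; the function name `min_coins` and the use of dynamic programming are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def min_coins(amount: int, denominations: Sequence[int]) -> Optional[int]:
    """Return the fewest coins summing to `amount`, or None if impossible."""
    if amount < 0:
        return None
    INF = float("inf")
    # best[a] holds the fewest coins needed to make amount a.
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for coin in denominations:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
    return None if best[amount] == INF else int(best[amount])
```

For example, `min_coins(63, [1, 5, 10, 25])` gives 6 (two 25s, one 10 and three 1s), while `min_coins(7, [2, 4])` gives `None`, because no combination of even coins makes an odd amount.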
But what about all that actual data about the world? All the information about cities and movies and food and so on? People might have thought we’d just be able to forage the web for it. But I knew very quickly this wouldn’t work: the data—if it even existed on the web—wouldn’t be systematic and structured enough for us to be able to correctly do actual computations from it, rather than just, for example, displaying it.
So this meant there wouldn’t be any choice but to actually dive in and carefully deal with each different kind of data. And though I didn’t realize it with so much clarity at the time, this is where our company had another extremely rare and absolutely crucial advantage. We’ve always been a very intellectual company (no doubt to our commercial detriment)—and among our staff we, for example, have PhDs in a wide range of subjects, from chemistry to history to neuroscience to architecture to astrophysics. But more than that, among the enthusiastic users of our products we count many of the world’s top researchers across a remarkable diversity of fields.
So when we needed to know about proteins or earthquakes or art history or whatever, it was easy for us to find an expert. At first, I thought the main issue would just be “Where is the best source of the relevant data?” Sometimes that source would be very obvious; sometimes it would be very obscure. (And, yes, it was always fun to run across people who’d excitedly say things like: “Wow, we’ve been collecting this data for decades and nobody’s ever asked for it before!”)
But I soon realized that having raw data was only the beginning; after that came the whole process of understanding it. What units are those quantities in? Does -99 mean that data point is missing? How exactly is that average defined? What is the common name for that? Are those bins mutually exclusive or combined? And so on. It wasn’t just enough to have the data; one also had to have an expert-level dialog with whoever had collected the data.
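The "-99" question is a good example of why this matters for computation. The following Python sketch assumes a hypothetical dataset in which -99 has been confirmed (by whoever collected the data) to mean "no measurement"; the names `MISSING_SENTINEL`, `clean_readings` and `mean_of_present` are made up for illustration.

```python
from typing import Iterable, List, Optional

MISSING_SENTINEL = -99  # assumption: this dataset uses -99 for "no measurement"


def clean_readings(raw: Iterable[float],
                   sentinel: float = MISSING_SENTINEL) -> List[Optional[float]]:
    """Replace sentinel codes with None so they cannot be mistaken for data."""
    return [None if value == sentinel else value for value in raw]


def mean_of_present(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average only the values that are actually present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

With readings of 12.0, -99 and 14.0, a naive average comes out negative, while `mean_of_present(clean_readings([12.0, -99, 14.0]))` gives 13.0. Displaying the raw numbers would hide the problem; computing from them exposes it.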
But then there was another issue: people want answers to questions, not raw data. It’s all well and good to know the orbital parameters for a television satellite, but what most people will actually want to know is where the satellite is in the sky at their location. And to work out something like that requires some method or model or algorithm. And this is where experts were again crucial.
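As a rough illustration of the kind of model involved, here is a Python sketch for the simplest version of the satellite case. It assumes the television satellite is geostationary (the usual case, though not something stated above) and treats the Earth as a sphere; the function name and constants are chosen for this sketch only. It gives the elevation of the satellite above the observer's horizon.

```python
import math

EARTH_RADIUS_KM = 6378.137      # equatorial radius
GEO_ORBIT_RADIUS_KM = 42164.0   # geostationary orbit radius from Earth's center


def geostationary_elevation_deg(observer_lat_deg: float,
                                observer_lon_deg: float,
                                satellite_lon_deg: float) -> float:
    """Elevation angle above the horizon; negative means below the horizon."""
    lat = math.radians(observer_lat_deg)
    dlon = math.radians(satellite_lon_deg - observer_lon_deg)
    # Central angle between the observer and the point directly under the satellite.
    cos_gamma = math.cos(lat) * math.cos(dlon)
    sin_gamma = math.sqrt(max(0.0, 1.0 - cos_gamma ** 2))
    ratio = EARTH_RADIUS_KM / GEO_ORBIT_RADIUS_KM
    return math.degrees(math.atan2(cos_gamma - ratio, sin_gamma))
```

An observer on the equator directly beneath the satellite sees it straight overhead (90 degrees), and at high enough latitudes the result goes negative, meaning the satellite never rises above the horizon. A research-level answer would also account for the Earth's actual shape, the observer's altitude and atmospheric refraction.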
My goal from the beginning was always to get the best research-level results for everything. I didn’t consider it good enough to use the simple formula or the rule of thumb. I wanted to get the best answers that current knowledge could give—whether it was for time to sunburn, pressure in the ocean, mortality curves, tree growth, redshifts in the early universe, or whatever. Of course, the good news was that the Wolfram Language almost always had the built-in algorithmic power to do whatever computations were needed. And it was remarkably common to find that the original research we were using had actually been done with the Wolfram Language.
As we began to develop Wolfram|Alpha we dealt with more and more domains of data, and more and more cross-connections between them. We started building streamlined frameworks for doing this. But one of the continuing features of the Wolfram|Alpha project has been that however good the frameworks are, every new area always seems to involve new and different twists—that can be successfully handled only because we’re ultimately using the Wolfram Language, with all its generality.
Over the years, we’ve developed an elaborate art of data curation. It’s a mixture of automation (these days, often using modern machine learning), management processes, and pure human tender loving care applied to data. I have a principle that there always has to be an expert involved—or you’ll never get the right answer. But it’s always complicated to allocate resources and to communicate correctly across the phases of data curation—and to inject the right level of judgement at the right points. (And, yes, in an effort to make the complexities of the world conveniently amenable to computation, there are inevitably judgement calls involved: “Should the Great Pyramid be considered a building?”, “Should Lassie be considered a notable organism or a fictional character?”, “What was the occupation of Joan of Arc?”, and so on.)
When we started building Wolfram|Alpha, there’d already been all sorts of thinking about how large-scale knowledge should best be represented computationally. And there was a sense that—much like logic was seen as somehow universally applicable—so also there should be a universal and systematically structured way to represent knowledge. People had thought about ideas based on set theory, graph theory, predicate logic, and more—and each had had some success.
Meanwhile, I was no stranger to global approaches to things—having just finished a decade of work on my book A New Kind of Science, which at some level can be seen as being about the theory of all possible theories. But partly because of the actual science I discovered (particularly the idea of computational irreducibility), and partly because of the general intuition I had developed, I had what I now realize was a crucial insight: there’s not going to be a useful general theory of how to represent knowledge; the best you can ever ultimately do is to think of everything in terms of arbitrary computation.
And the result of this was that when we started developing Wolfram|Alpha, we began by just building up each domain “from its computational roots”. Gradually, we did find and exploit all sorts of powerful commonalities. But it’s been crucial that we’ve never been stuck having to fit all knowledge into a “data ontology graph” or indeed any fixed structure. And that’s a large part of why we’ve successfully been able to make use of all the rich algorithmic knowledge about the world that, for example, the exact sciences have delivered.
The Challenge of Natural Language
Perhaps the most obviously AI-like part of my vision for Wolfram|Alpha was that you should be able to ask it questions purely in natural language. When we started building Wolfram|Alpha there was already a long tradition of text retrieval (from which search engines had emerged), as well as of natural language processing and computational linguistics. But although these all dealt with natural language, they weren’t trying to solve the same problem as Wolfram|Alpha. Because basically they were all taking existing text, and trying to extract from it things one wanted. In Wolfram|Alpha, what we needed was to be able to take questions given in natural language, and somehow really understand them, so we could compute answers to them.
In the past, exactly what it meant for a computer to “understand” something had always been a bit muddled. But what was crucial for the Wolfram|Alpha project was that we were finally in a position to give a useful, practical definition: “understanding” for us meant translating the natural language into precise Wolfram Language. So, for example, if a user entered “What was the gdp of france in 1975?” we wanted to interpret this as the Wolfram Language symbolic expression Entity["Country", "France"][Dated["GDP", 1975]].
And while it was certainly nice to have a precise representation of a question like that, the real kicker was that this representation was immediately computable: we could immediately use it to actually compute an answer.
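To give a feel for what "translating into a precise symbolic form" means, here is a deliberately tiny Python sketch. It handles just one question shape and is in no way how Wolfram|Alpha's linguistics actually works; the lookup tables, the `DatedPropertyQuery` class and the `interpret` function are all hypothetical. The only thing taken from the example above is the target expression.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables standing in for a real linguistic layer.
COUNTRIES = {"france": "France"}
PROPERTIES = {"gdp": "GDP"}


@dataclass(frozen=True)
class DatedPropertyQuery:
    entity_type: str
    entity_name: str
    prop: str
    year: int

    def to_wolfram_string(self) -> str:
        """Render the query in the style of the symbolic expression above."""
        return (f'Entity["{self.entity_type}", "{self.entity_name}"]'
                f'[Dated["{self.prop}", {self.year}]]')


def interpret(text: str) -> Optional[DatedPropertyQuery]:
    """Map one narrow question shape to a symbolic form, or None on a miss."""
    words = re.findall(r"[a-z]+|\d{4}", text.lower())
    country = next((COUNTRIES[w] for w in words if w in COUNTRIES), None)
    prop = next((PROPERTIES[w] for w in words if w in PROPERTIES), None)
    year = next((int(w) for w in words if w.isdigit()), None)
    if country is None or prop is None or year is None:
        return None
    return DatedPropertyQuery("Country", country, prop, year)
```

Calling `interpret("What was the gdp of france in 1975?").to_wolfram_string()` produces `Entity["Country", "France"][Dated["GDP", 1975]]`. The point of the structured form is that every piece (the entity, the property, the date) is something a program can act on, rather than a string to be matched against text.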
In the past, a bane of natural language understanding had always been the ambiguity of things like words in natural language. When you say “apple”, do you mean the fruit or the company? When you say “3 o’clock”, do you mean morning or afternoon? On which day? When you say “springfield”, do you mean “Springfield, MA” or one of the 28 other possible Springfield cities?
But somehow, in Wolfram|Alpha this wasn’t such a problem. And it quickly became clear that the reason was that we had something that no previous attempt at natural language understanding had ever had: we had a huge and computable knowledgebase about the world. So “apple” wasn’t just a word for us: we had extensive data about the properties of apples as fruit and Apple as a company. And we could immediately tell that “apple vitamin C” was talking about the fruit, “apple net income” about the company, and so on. And for “Springfield” we had data about the location and population and notoriety of every Springfield. And so on.
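The "apple" case can be sketched in a few lines of Python. This is a toy under stated assumptions: the `SENSES` table is a made-up miniature stand-in for a real knowledgebase, and matching the rest of the query exactly against a property name is far cruder than anything a real system would do.

```python
from typing import Dict, List, Optional, Set

# Hypothetical miniature knowledgebase: each sense of a word lists the
# properties we have data for.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "fruit": {"vitamin c", "calories", "sugar"},
        "company": {"net income", "revenue", "employees"},
    },
}


def disambiguate(query: str) -> Optional[str]:
    """Pick the sense of the leading word whose known properties fit the rest."""
    word, _, rest = query.lower().strip().partition(" ")
    senses = SENSES.get(word)
    if not senses:
        return None
    matches: List[str] = [name for name, props in senses.items() if rest in props]
    # Only answer when exactly one sense fits; otherwise stay undecided.
    return matches[0] if len(matches) == 1 else None
```

Here `disambiguate("apple vitamin C")` returns `"fruit"` and `disambiguate("apple net income")` returns `"company"`. The ambiguity is resolved by what is known about each candidate, with no linguistic cleverness involved.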
It’s an interesting case where things were made easier by solving a much larger problem: we could be successful at natural language understanding because we were also solving the huge problem of having broad and computable knowledge about the world. And also because we had built the whole symbolic language structure of the Wolfram Language.
There were still many issues, however. At first, I’d wondered if traditional grammar and computational linguistics would be useful. But they didn’t apply well to the often-not-very-grammatical inputs people actually gave. And we soon realized that instead, the basic science I’d done in A New Kind of Science could be helpful—because it gave a conceptual framework for thinking about the interaction of many different simple rules operating on a piece of natural language.
And so we added the strange new job title of “linguistic curator”, and set about effectively curating the semantic structure of natural language, and creating a practical way to turn natural language into precise Wolfram Language. (And, yes, what we did might shed light on how humans understand language—but we’ve been so busy building technology that we’ve never had a chance to explore this.)
How to Answer the Question
OK, so we can solve the difficult problem of taking natural language and turning it into Wolfram Language. And with great effort we’ve got all sorts of knowledge about the world, and we can compute all kinds of things from it. But given a particular input, what output should we actually generate? Yes, there may be a direct answer to a question (“42”, “yes”, whatever). And in certain circumstances (like voice output) that may be the main thing you want. But particularly when visual display is possible, we quickly discovered that people find richer outputs dramatically more valuable.
And so, in Wolfram|Alpha we use the computational knowledge we have to automatically generate a whole report about the question you asked.
We’ve worked hard on both the structure and content of the information presentation. There’d never been anything quite like it before, so everything had to be invented. At the top, there are sometimes “Assumings” (“Which Springfield did you mean?”, etc.)—though the vast majority of the time, our first choice is correct. We found it worked very well to organize the main output into a series of “pods”, often with graphical or tabular contents. Many of the pods have buttons that allow for drilldown, or alternatives.
Everything is generated programmatically. And which pods are there, with what content, and in what sequence, is the result of lots of algorithms and heuristics—including many that I personally devised. (Along the way, we basically had to invent a whole area of “computational aesthetics”: automatically determining what humans will find aesthetic and easy to interpret.)
In most large software projects, one’s building things to precise specifications. But one of the complexities of Wolfram|Alpha is that so much of what it does is heuristic. There’s no “right answer” to exactly what to plot in a particular pod, over what range. It’s a judgement call. And the overall quality of Wolfram|Alpha directly depends on doing a good job at making a vast number of such judgement calls.
But who should make these judgement calls? It’s not something pure programmers are used to doing. It takes real computational thinking skills, and it also usually takes serious knowledge of each content area. Sometimes similar judgement calls get repeated, and one can just say “do it like that other case”. But given how broad Wolfram|Alpha is, it’s perhaps not surprising that there are an incredible number of different things that come up.
And as we approached the launch of Wolfram|Alpha I found myself making literally hundreds of judgement calls every day. “How many different outputs should we generate here?” “Should we add a footnote here?” “What kind of graphic should we produce in that case?”
In my long-running work on designing Wolfram Language, the goal is to make everything precise and perfect. But for Wolfram|Alpha, the goal is instead just to have it behave as people want—regardless of whether that’s logically perfect. And at first, I worried that with all the somewhat arbitrary judgement calls we were making to achieve that, we’d end up with a system that felt very incoherent and unpredictable. But gradually I came to understand a sort of logic of heuristics, and we developed a good rhythm for inventing heuristics that fit together. And in the end—with a giant network of heuristic algorithms—I think we’ve been very successful at creating a system that broadly just automatically does what people want and expect.
Getting the Project Done
Looking back now, more than a decade after the original development of Wolfram|Alpha, it begins to seem even more surprising—and fortuitous—that the project ended up being possible at all. For it is clear now that it critically relied on a whole collection of technical, conceptual and organizational capabilities that we (and I) happened to have developed by just that time. And had even one of them been missing, it would probably have made the whole project impossible.
But even given the necessary capabilities, there was the matter of actually doing the project. And it certainly took a lot of leadership and tenacity from me—as well as all sorts of specific problem solving—to pursue a project that most people (including many of those working on it) thought, at least at first, was impossible.
How did the project actually get started? Well, basically I just decided one day to do it. And, fortunately, my situation was such that I didn’t really have to ask anyone else about it—and as a launchpad I already had a successful, private company without outside investors that had been running well for more than a decade.
From a standard commercial point of view, most people would have seen the Wolfram|Alpha project as a crazy thing to pursue. It wasn’t even clear it was possible, and it was certainly going to be very difficult and very long term. But I had worked hard to put myself in a position where I could do projects just because I thought they were intellectually valuable and important—and this was one I had wanted to do for decades.
One awkward feature of Wolfram|Alpha as a project is that it didn’t work, until it did. When I tried to give early demos, too little worked, and it was hard to see the point of the whole thing. And this led to lots of skepticism, even by my own management team. So I decided it was best to do the project quietly, without saying much about it. And though it wasn’t my intention, things ramped up to the point where a couple hundred people were working completely under the radar (in our very geographically distributed organization) on the project.
But finally, Wolfram|Alpha really started to work. I gave a demo to my formerly skeptical management team, and by the end of an hour there was uniform enthusiasm, and lots of ideas and suggestions.
And so it was that in the spring of 2009, we prepared to launch Wolfram|Alpha.
On March 5, I posted a short (and, in the light of the past decade, satisfyingly prophetic) blog post.
We were adding features and fixing bugs at a furious pace. And rack by rack we were building infrastructure to actually support the system (yes, below all those layers of computational intelligence there are ultimately computers with power cables and network connectors and everything else).
At the beginning, we had about 10,000 cores set up to run Wolfram|Alpha (back then, virtualization wasn’t an option for the kind of performance we wanted). But we had no real idea if this would be enough—or what strange things missed by our simulations might happen when real people started using the system.
We could just have planned to put up a message on the site if something went wrong. But I thought it would be more interesting—and helpful—to actually show people what was going on behind the scenes. And so we decided to do something very unusual—and livestream to the internet the process of launching Wolfram|Alpha.
We planned our initial go-live to occur on the evening of Friday, May 15, 2009 (figuring that traffic would be lower on a Friday evening). And we built our version of a “Mission Control” to coordinate everything.
There were plenty of last-minute issues, many of them captured on the livestream. But in classic Mission Control style, each of our teams finally confirmed that we were “go for launch”—and at 9:33:50 pm CT, I pressed the big “Activate” button, and soon all network connections were open, and Wolfram|Alpha was live to the world.
Queries immediately started flowing in from around the world—and within a couple of hours it was clear that the concept of Wolfram|Alpha was a success—and that people found it very useful. It wasn’t long before bugs and suggestions started coming in too. And for a decade we’ve been being told we should give answers about the strangest things (“How many teeth does a snail have?” “How many spiders does the average American eat?” “Which superheroes can hold Thor’s hammer?” “What is the volume of a dog’s eyeball?”).
So what’s happened over the past decade? Every second, there’s been new data flowing into Wolfram|Alpha. Weather. Stock prices. Aircraft positions. Earthquakes. Lots and lots more. Some things update only every month or every year (think: government statistics). Other things update when something happens (think: deaths, elections, etc.). Every week, there are administrative divisions that change in some country around the world. And, yes, occasionally there’s even a new official country (actually, only South Sudan in the past decade).
What a decade ago was a small or fragmentary area of data, we’ve now systematically filled out—often with great effort. 140,000+ new kinds of food. 350,000 new notable people. 170+ new properties about 58,000 public companies. 100+ new properties about species (tail lengths, eye counts, etc.). 1.6 billion new data points from the US Census. Sometimes we’ve found existing data providers to work with, but quite often we’ve had to painstakingly curate the data ourselves.
It’s amazing how much in the world can be made computable if one puts in the effort. Like military conflicts, for example, which required both lots of historical work, and lots of judgement. And with each domain we add, we’ve put more and more effort into ensuring that it connects with other domains (What was the geolocation of the battle? What historical countries were involved? Etc.).
From even before Wolfram|Alpha launched, we had a wish list of domains to add. Some were comparatively easy. Others—like military conflicts or anatomical structures—took many years. Often, we at first thought a domain would be easy, only to discover all sorts of complicated issues (I had no idea how many different categories of model, make, trim, etc. are important for cars, for example).
For years, we’ve been the world’s most prolific reporter of bugs in data sources. But with so much computable data about so many things, as well as so many models about how things work, we’re now in an absolutely unique position to validate and cross-check data—and use the latest machine learning to discover patterns and detect anomalies.
Of course, data is just one part of the Wolfram|Alpha story. Because Wolfram|Alpha is also full of algorithms—both precise and heuristic—for computing all kinds of things. And over the past decade, we’ve added all sorts of new algorithms, based on recent advances in science. We’ve also been able to steadily polish what we have, covering all those awkward corner cases (“Are angle units really dimensionless or not?”, “What is the country code of a satphone?”, and so on).
One of the big unknowns when we first launched Wolfram|Alpha was how people would interact with it, and what forms of linguistic input they would give. Many billions of queries later, we know a lot about that. We know a thousand ways to ask how much wood a woodchuck can chuck, etc. We know all the bizarre variants people use to specify even simple arithmetic with units. Every day we collect the “fallthroughs”—inputs we didn’t understand. And for a decade now we’ve been steadily extending our knowledgebase and our natural language understanding system to address them.
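The idea of collecting fallthroughs can be sketched in Python as follows. This is only a schematic under assumptions: the `normalize` step and the `understands` callback are hypothetical stand-ins, and a real pipeline would group variants far more intelligently than by case and whitespace.

```python
from collections import Counter
from typing import Callable, Iterable


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def collect_fallthroughs(queries: Iterable[str],
                         understands: Callable[[str], bool]) -> Counter:
    """Count the normalized queries that the interpreter failed to handle."""
    return Counter(normalize(q) for q in queries if not understands(normalize(q)))
```

Given a day's query log and a predicate saying which inputs were understood, `collect_fallthroughs(log, understands).most_common(10)` would list the most frequent misses, which is a natural way to decide what to work on next.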
Ever since we first launched what’s now the Wolfram Language 30+ years ago, we’ve supported things that would now be called machine learning. But over the past decade, we’ve also become leaders in modern neural nets and deep learning. And in some specific situations, we’ve now been able to make good use of this technology in Wolfram|Alpha.
But there’s been no magic bullet, and I don’t expect one. If one wants to get data that’s systematically computable, one can’t forage it from the web, even with the finest modern machine learning. One can use machine learning to make suggestions in the data curation pipeline, but in the end, if you want to get the right answer, you need a human expert who can exercise judgement based on the accumulated knowledge of a field. (And, yes, the same is true of good training sets for many machine learning tasks.)
In the natural language understanding we need to do for Wolfram|Alpha, machine learning can sometimes help, especially in speeding things up. But if one wants to be certain about the symbolic interpretation of natural language, then—a bit like for doing arithmetic—to get good reliability and efficiency there’s basically no choice but to use the systematic algorithmic approach that we’ve been developing for many years.
Something else that’s advanced a lot since Wolfram|Alpha was launched is our ability to handle complex questions that combine many kinds of knowledge and computation. To do this has required several things. It’s needed more systematically computable data, with consistent structure across domains. It’s needed an underlying data infrastructure that can handle more complex queries. And it’s needed the ability to handle more sophisticated linguistics. None of these have been easy—but they’ve all steadily advanced.
By this point, Wolfram|Alpha is one of the more complex pieces of software and data engineering that exists in the world. It helps that it’s basically all written in Wolfram Language. But over time, different parts have outgrown the frameworks we originally built for them. And an important thing we’ve done over the past decade is to take what we’ve learned from all our experience, and use it to systematically build a sequence of more efficient and more general frameworks. (And, yes, it’s never easy refactoring a large software system, but the high-level symbolic character of the Wolfram Language helps a lot.)
There’s always new development going on in the Wolfram|Alpha codebase—and in fact we normally redeploy a new version every two weeks. Wolfram|Alpha is a very complex system to test. Partly that’s because what it does is so diverse. Partly that’s because the world it’s trying to represent is a complex place. And partly it’s because human language usage is so profoundly non-modular. (“3 chains” is probably—at least for now—a length measurement, “2 chains” is probably a misspelling of a rapper, and so on.)
The Long Tail of Knowledge
What should Wolfram|Alpha know about? My goal has always been to have it eventually know about everything. But obviously one’s got to start somewhere. And when we were first building Wolfram|Alpha we started with what we thought were the “most obvious” areas. Of course, once Wolfram|Alpha was launched, the huge stream of actual questions that people ask has defined a giant to-do list, which we’ve steadily been working through, now for a decade.
When Wolfram|Alpha gets used in a new environment, new kinds of questions come up. Sometimes they don’t make sense (like “Where did I put my keys?” asked of Wolfram|Alpha on a phone). But often they do. Like asking Wolfram|Alpha on a device in a kitchen “Can dogs eat X?”. (And, yes, we’ll be trying to give the best answer current science can provide.)
But I have to admit that, particularly before we launched Wolfram|Alpha, I was personally one of our main sources of “we should know about this” input. I collected reference books, seeing what kinds of things they covered. Wherever I went, I looked for informational posters to see what was on them. And whenever I wondered about pretty much anything, I’d try to see how we could compute about it.
Often we’d be contacting world experts on different, obscure topics—always trying to get definitive computational knowledge about everything. Sometimes it’d seem as if we’d gone quite overboard, working out details nobody would ever possibly care about. But then we’d see people using those details, and sometimes we’d hear “Oh, yes, I use it every day; I don’t know anyplace else to get this right”. (I’ve sometimes thought that if Wolfram|Alpha had been out before 2008, and people could have seen our simulations, they wouldn’t have been caught with so many adjustable-rate mortgages.)
And, yes, it’s a little disappointing when one realizes that some fascinating piece of computational knowledge that took considerable effort to get right in Wolfram|Alpha will—with current usage patterns—probably only be used a few times in a century. But I view the Wolfram|Alpha project in no small part as a long-term effort to encapsulate the knowledge of our civilization, regardless of whether any of it happens to be popular right now.
So even if few people make queries about caves or cemeteries or ocean zones right now, or want to know about different types of paper, or custom screw threads, or acoustic absorption in different materials, I’m glad we’ve got all these things in Wolfram|Alpha. Because now it’s computational knowledge that can be used by anyone, anytime in the future.
The Business of Wolfram|Alpha
We’ve put—and continue to put—an immense amount of effort into developing and running Wolfram|Alpha. So how do we manage to support doing that? What’s the business model?
The main Wolfram|Alpha website is simply free for everyone. Why? Because we want it to be that way. We want to democratize computational knowledge, and let anyone anywhere use what we’ve built.
Of course, we hope that people who use the Wolfram|Alpha website will want to buy other things we make. But on the website itself there’s simply no “catch”: we’re not monetizing anything. We’re not running external ads; we’re not selling user data; we’re just keeping everything completely private, and always have.
But obviously there are ways in which we are monetizing Wolfram|Alpha—otherwise we wouldn’t be able to do everything we’re doing. At the simplest level, there are subscription-based Pro versions on the website that have extra features of particular interest to students and professionals. There’s a Wolfram|Alpha app that has extra features optimized for mobile devices. There are also about 50 specialized apps (most for both mobile and web) that support more structured access to Wolfram|Alpha, convenient for students taking courses, hobbyists with particular interests, and professionals with standard workflows they repeatedly follow.
Then there are Wolfram|Alpha APIs—which are widely licensed by companies large and small (there’s a free tier for hobbyists and developers). There are multiple different APIs. Some are optimized for spoken results, some for back-and-forth conversation, some for visual display, and so on. Sometimes the API is used for some very specific purpose (calculus, particular socioeconomic data, tide computations, whatever). But more often it’s just set up to take any natural language query that arrives. (These days, specialized APIs are actually usually better built directly with Wolfram Language, as I’ll discuss a bit later.) Most of the time, the Wolfram|Alpha API runs on our servers, but some of our largest customers have private versions running inside their infrastructure.
When people access Wolfram|Alpha from different parts of the world, we automatically use local conventions for things like units, currency and so on. But when we first built Wolfram|Alpha we fundamentally did it for English language only. I always believed, though, that the methods for natural language understanding that we invented would work for other languages too, despite all their differences in structure. And it turns out that they do.
Each language is a lot of work, though. Even the best automated translation helps only a little; to get reliable results one has to actually build up a new algorithmic structure for each language. But that’s only the beginning. There’s also the issue of automatic natural language generation for output. And then there’s localized data relevant for the countries that use a particular language.
But we’re gradually working on building versions of Wolfram|Alpha for other languages. Nearly five years ago we actually built a full Wolfram|Alpha for Chinese—but, sadly, regulatory issues in China have so far prevented us from deploying it there. Recently we released a version for Japanese (right now set up to handle mainly student-oriented queries). And we’ve got versions for five other languages in various stages of completion (though we’ll typically need local partners to deploy them properly).
Beyond Wolfram|Alpha on the public web, there are also private versions of Wolfram|Alpha. In the simplest case, a private Wolfram|Alpha is just a copy of the public Wolfram|Alpha, but running inside a particular organization’s infrastructure. Data updates flow into the private Wolfram|Alpha from the outside, but no queries for the private Wolfram|Alpha ever need to leave the organization.
Ordinary Wolfram|Alpha deals with public computational knowledge. But the technology of Wolfram|Alpha can also be applied to private data in an organization. And in recent years an important part of the business story of Wolfram|Alpha has been what we call Enterprise Wolfram|Alpha: custom versions of Wolfram|Alpha that answer questions using both public computational knowledge and private knowledge inside an organization.
For years I’ve run into CEOs who look at Wolfram|Alpha and say, “I wish I could do that kind of thing with my corporate data; it’d be so much easier for my company to make decisions…” Well, that’s what Enterprise Wolfram|Alpha is for. And over the past several years we’ve been installing Enterprise Wolfram|Alpha in some of the world’s largest companies in all sorts of industries, from healthcare to financial services, retail, and so on.
For a few years now, there’s been a lot of talk (and advertising) about the potential for “applying AI in the enterprise”. But I think it’s fair to say that with Enterprise Wolfram|Alpha we’ve got a serious, enterprise use of AI up and running right now—delivering very successful results.
The typical pattern is that you ask a question in natural language, and Enterprise Wolfram|Alpha then generates a report about the answer, using a mixture of public and private knowledge. “What were our sales of foo-pluses in Europe between Christmas and New Year?” Enterprise Wolfram|Alpha has public knowledge about what dates we’re talking about, and what Europe is. But then it’s got to figure out the internal linguistics of what foo-pluses are, and then go query an internal sales database about how many were sold. Finally, it’s got to generate a report that gives the answer (perhaps both the number of units and dollar amount), as well as, probably, a breakdown by country (perhaps normalized by GDP), comparisons to previous years, maybe a time series of sales by day, and so on.
Needless to say, there’s plenty of subtlety in getting a useful result. Like what the definition of Europe is. Or the fact that Christmas (and New Year’s) can be on different dates in different cultures (and, of course, Wolfram|Alpha has all the necessary data and algorithms). Oh, and then one has to start worrying about currency conversion rates (which of course Wolfram|Alpha has)—as well as about conventions about conversion dates that some particular company may use.
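To make the pattern concrete, here is a minimal Python sketch of the steps described above: resolve the dates and the region from "public" knowledge, resolve the product name from "private" vocabulary, query the sales rows, and assemble a small report. Everything in it is hypothetical and invented for illustration (the three-country definition of Europe, the fixed Gregorian holiday dates, the `PRODUCT_ALIASES` table, the `SALES` rows, and the `sales_report` function); it is not how Enterprise Wolfram|Alpha is implemented.

```python
from datetime import date

# Hypothetical "public knowledge": one possible (deliberately tiny) definition
# of Europe, and holiday dates fixed in the Gregorian convention.
EUROPE = {"Germany", "France", "Spain"}

def christmas_to_new_year(year):
    """Inclusive range from Dec 25 of `year` to Jan 1 of the following year."""
    return date(year, 12, 25), date(year + 1, 1, 1)

# Hypothetical "private knowledge": internal product vocabulary and sales rows.
PRODUCT_ALIASES = {"foo-pluses": "FOO+", "foo-plus": "FOO+"}

SALES = [
    # (product, country, day, units, revenue in USD)
    ("FOO+", "Germany", date(2018, 12, 27), 10, 500.0),
    ("FOO+", "France", date(2018, 12, 31), 4, 200.0),
    ("FOO+", "Japan", date(2018, 12, 28), 7, 350.0),   # outside Europe
    ("FOO+", "Spain", date(2018, 12, 20), 3, 150.0),   # before Christmas
    ("BAR", "Germany", date(2018, 12, 26), 9, 90.0),   # different product
]

def sales_report(product_phrase, region, year):
    """Resolve the pieces of the query, filter the rows, and build a report."""
    product = PRODUCT_ALIASES[product_phrase]
    start, end = christmas_to_new_year(year)
    rows = [r for r in SALES
            if r[0] == product and r[1] in region and start <= r[2] <= end]
    by_country = {}
    for _, country, _, units, _ in rows:
        by_country[country] = by_country.get(country, 0) + units
    return {
        "units": sum(r[3] for r in rows),
        "revenue_usd": sum(r[4] for r in rows),
        "by_country": by_country,
    }

report = sales_report("foo-pluses", EUROPE, 2018)
print(report)
```

Even in this toy, the subtleties mentioned above show up as explicit choices: swapping in a different `EUROPE` set or a different holiday convention changes the answer without touching the sales data.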
Like any sophisticated piece of enterprise software, Enterprise Wolfram|Alpha has to be configured for each particular customer, and we have a business unit called Wolfram Solutions that does that. The goal is always to map the knowledge in an organization to a clear symbolic Wolfram Language form, so it becomes computable in the Wolfram|Alpha system. Realistically, for a large organization, it’s a lot of work. But the good news is that it’s possible—because Wolfram Solutions gets to use the whole curation and algorithm pipeline that we’ve developed for Wolfram|Alpha.
Of course, we can use all the algorithmic capabilities of the Wolfram Language too. So if we have to handle textual data we’re ready with the latest NLP tools, or if we want to be able to make predictions we’re ready with the latest statistics and machine learning, and so on.
Businesses started routinely putting their data onto computers more than half a century ago. But now across pretty much every industry, more acutely than ever, the challenge is to actually use that data in meaningful ways. Eventually everyone will take for granted that they can just ask about their data, like on Star Trek. But the point is that with Enterprise Wolfram|Alpha we have the technology to finally make this possible.
It’s a very successful application of Wolfram|Alpha technology, and the business potential for it is amazing. But for us the main limiting factor is that as a business it’s so different from the rest of what we do. Our company is very much focused on R&D—but Enterprise Wolfram|Alpha requires a large-scale customer-facing organization, like a typical enterprise software company. (And, yes, we’re exploring working with partners for this, but setting up such things has proved to be a slow process!)
By the way, people sometimes seem to think that the big opportunity for AI in the enterprise is in dealing with unstructured corporate data (such as free-form text), and finding “needles in haystacks” there. But what we’ve consistently seen is that in typical enterprises most of their data is actually stored in very structured databases. And the challenge, instead, is to answer unstructured queries.
In the past, it’s been basically impossible to do this in anything other than very simple ways. But now we can see why: because you basically need the whole Wolfram|Alpha technology stack to be able to do it. You need natural language understanding, you need computational knowledge, you need automated report generation, and so on. But that’s what Enterprise Wolfram|Alpha has. And so it’s finally able to solve this problem.
But what does it mean? It’s a little bit like when we first introduced Mathematica 30+ years ago. Before then, a typical scientist wouldn’t expect to use a computer themselves for a computation: they’d delegate it to an expert. But one of the great achievements of Mathematica is that it made things easy enough that scientists could actually compute for themselves. And so, similarly, typical executives in companies don’t directly compute answers themselves; instead, they ask their IT department to do it—then hope the results they get back a week later make sense. But the point is that with Enterprise Wolfram|Alpha, executives can actually get questions answered themselves, immediately. And the consequences of that for making decisions are pretty spectacular.
Wolfram|Alpha Meets Wolfram Language
The Wolfram Language is what made Wolfram|Alpha possible. But over the past decade Wolfram|Alpha has also given back big time to Wolfram Language, delivering both knowledgebase and natural language understanding.
It’s interesting to compare Wolfram|Alpha and Wolfram Language. Wolfram|Alpha is for quick computations, specified in a completely unstructured way using natural language, and generating as output reports intended for human consumption. Wolfram Language, on the other hand, is a precise symbolic language intended for building up arbitrarily complex computations—in a way that can be systematically understood by computers and humans.
One of the central features of the Wolfram Language is that it can deal not only with abstract computational constructs, but also with things in the real world, like cities and chemicals. But how should one specify these real-world things? Documentation listing the appropriate way to specify every city wouldn’t be practical or useful. But what Wolfram|Alpha provided was a way to specify real-world things, using natural language.
Inside Wolfram|Alpha, natural language input is translated to Wolfram Language. And that’s what’s now exposed in the Wolfram Language, and in Wolfram Notebooks. Type Ctrl+= and a piece of natural language (like “LA”). The output—courtesy of Wolfram|Alpha natural language understanding technology—is a symbolic entity representing Los Angeles. And that symbolic entity is then a precise object that the Wolfram Language can use in computations.
I didn’t particularly anticipate it, but this interplay between the do-it-however-you-want approach of Wolfram|Alpha and the precise symbolic approach of the Wolfram Language is exceptionally powerful. It gets the best of both worlds—and it’s an important element in allowing the Wolfram Language to assume its unique position as a full-scale computational language.
What about the knowledgebase of Wolfram|Alpha, and all the data it contains? Over the past decade we’ve spent immense effort fully integrating more and more of this into the Wolfram Language. It’s always difficult to get data to the point where it’s computable enough to use in Wolfram|Alpha—but it’s even more difficult to make it fully and systematically computable in the way that’s needed for the Wolfram Language.
Imagine you’re dealing with data about oceans. To make it useful for Wolfram|Alpha you have to get it to the point where if someone asks about a specific named ocean, you can systematically retrieve or compute properties of that ocean. But to make it useful for Wolfram Language, you have to get it to the point where someone can do computations about all oceans, with none missing.
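The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ocean names are real, but the numbers are made-up placeholders rather than curated measurements, and `lookup` and `total` are hypothetical helpers, not Wolfram Language functions.

```python
# Placeholder records: the names are real oceans, but the numbers are
# made-up illustrative values, not curated measurements.
OCEANS = {
    "Pacific": {"area": 160},
    "Atlantic": {"area": 100},
    "Indian": {"area": 70},
    "Southern": {},            # property not curated yet
    "Arctic": {"area": 15},
}

def lookup(name, prop):
    """Per-entity query: works even if other entities are incomplete."""
    return OCEANS[name].get(prop)

def total(prop):
    """Whole-class computation: refuses to run unless every entity has the property."""
    missing = sorted(name for name, props in OCEANS.items() if prop not in props)
    if missing:
        raise ValueError(f"{prop!r} missing for: {', '.join(missing)}")
    return sum(props[prop] for props in OCEANS.values())

print(lookup("Pacific", "area"))   # a question about one named ocean already works

try:
    total("area")
except ValueError as err:
    print("cannot aggregate:", err)

OCEANS["Southern"]["area"] = 20    # finish curating the last entity
print(total("area"))
```

A single gap is invisible to someone asking about the Pacific, but it blocks (or silently corrupts) any computation over all oceans, which is why the last stretch of curation matters so much for use in a language.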
A while ago I invented a 10-step hierarchy of data curation. For data to work in Wolfram|Alpha, you have to get it to level 9 in the hierarchy. But to get it to work in Wolfram Language, you have to get it all the way to level 10. And if it takes a few months to get some data to level 9, it can easily take another year to get it to level 10.
So it’s been a big achievement that over the past decade we’ve managed to get the vast majority of the Wolfram|Alpha knowledgebase up to the level where it can be directly used in the Wolfram Language. So all that data is now not only good enough for human consumption, but also good enough that one can systematically build up computations using it.
All the integration with the Wolfram Language means it’s in some sense now possible to “implement Wolfram|Alpha” in a single line of Wolfram Language code. But it also means that it’s easy to make Wolfram Language instant APIs that do more specific Wolfram|Alpha-like things.
There’s an increasing amount of interconnection between Wolfram|Alpha and Wolfram Language. For example, on the Wolfram|Alpha website most output pods have an “Open Code” button, which opens a Wolfram Notebook in the Wolfram Cloud, with Wolfram Language input that corresponds to what was computed in that pod.
In other words, you can use results from Wolfram|Alpha to “seed” a Wolfram Notebook, in which you can then edit or add inputs to do a complete, multi-step Wolfram Language computation. (By the way, you can always generate full Wolfram|Alpha output inside a Wolfram Notebook too.)
Where to Now? The Future of Wolfram|Alpha
When Wolfram|Alpha first launched nobody had seen anything like it. A decade later, people have learned to take some aspects of it for granted, and have gotten used to having it available in things like intelligent assistants. But what will the future of Wolfram|Alpha now be?
Over the past decade we’ve progressively strengthened essentially everything about Wolfram|Alpha—to the point where it’s now excellently positioned for steady long-term growth in future decades. But with Wolfram|Alpha as it exists today, we’re now also in a position to start attacking all sorts of major new directions. And—important as what Wolfram|Alpha has achieved in its first decade has been—I suspect that in time it will be dwarfed by what comes next.
A decade ago, nobody had heard of “fake news”. Today, it’s ubiquitous. But I’m proud that Wolfram|Alpha stands as a beacon of accurate knowledge. And it’s not just knowledge that humans can use; it’s knowledge that’s computable, and suitable for computers too.
More and more is being done these days with computational contracts—both on blockchains and elsewhere. And one of the central things such contracts require is a way to know what’s actually happened in the world—or, in other words, a systematic source of computational facts.
But that’s exactly what Wolfram|Alpha uniquely provides. And already the Wolfram|Alpha API has become the de facto standard for computational facts. But one’s going to see a lot more of Wolfram|Alpha here in the future.
It’s going to put increasing pressure on the reliability of the computational knowledge in Wolfram|Alpha. Because it won’t be long before there will routinely be whole chains of computational contracts—that do important things in the world—and that trigger as soon as Wolfram|Alpha has delivered some particular fact on which they depend.
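A toy Python sketch may help show why chains raise the stakes. It assumes a hypothetical in-memory `FactStore` (not any real Wolfram|Alpha API or blockchain interface) in which each contract waits for one named fact and may itself produce a new fact that triggers the next contract; the rainfall threshold and payout amount are invented for the example.

```python
class FactStore:
    """Hypothetical registry: contracts fire as soon as the fact they need arrives."""

    def __init__(self):
        self.facts = {}
        self.contracts = []
        self.log = []

    def register(self, name, needs, action):
        """Register a contract that fires once the fact named `needs` is known."""
        self.contracts.append(
            {"name": name, "needs": needs, "action": action, "done": False}
        )

    def deliver(self, key, value):
        """Record a fact, then fire every pending contract that depends on it."""
        self.facts[key] = value
        for contract in self.contracts:
            if not contract["done"] and contract["needs"] == key:
                contract["done"] = True
                self.log.append(contract["name"])
                produced = contract["action"](value)
                if produced is not None:
                    # The contract's outcome becomes a fact for downstream contracts.
                    self.deliver(*produced)

store = FactStore()
# Invented example: crop insurance pays out if rainfall was below 50 mm,
# and a loan repayment is triggered by the payout.
store.register("insurance", "rainfall_mm",
               lambda mm: ("payout", 1000 if mm < 50 else 0))
store.register("loan_repayment", "payout", lambda amount: None)

store.deliver("rainfall_mm", 32)
print(store.log)
print(store.facts)
```

In this sketch a single delivered value sets off both contracts with no human in between, so an incorrect fact would propagate through the whole chain just as readily as a correct one.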
We’ve developed all sorts of procedures to validate facts. Some are automated—and depend on “theorems” that must be true about data, or cross-correlations or statistical regularities that should exist. Others ultimately rely on human judgement. (A macabre example is our obituary feed: we automatically detect news reports about deaths of people in our knowledgebase. These are then passed to our 24/7 site monitors, who confirm, or escalate the judgement call if needed. Somehow I’m on the distribution list for confirmation requests—and over the past decade there’ve been far too many times when this is how I’ve learned that someone I know has died.)
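As a rough illustration of the automated side, here is a minimal Python sketch of two such checks: a "theorem" that parts can never exceed the whole, and a statistical regularity that a new value should stay near the previous one. The function names, the 1.5 ratio threshold, and the sample records are all assumptions made for the example, not the actual validation procedures.

```python
def parts_fit_in_whole(whole, parts):
    """A 'theorem' about the data: components can never add up to more than the total."""
    return sum(parts) <= whole

def no_sudden_jump(history, new_value, max_ratio=1.5):
    """A statistical regularity: a new value should stay near the last known one."""
    last = history[-1]
    return last / max_ratio <= new_value <= last * max_ratio

def validate(update):
    """Return the names of failed checks; an empty list means auto-accept."""
    failures = []
    if not parts_fit_in_whole(update["country_population"], update["city_populations"]):
        failures.append("parts_fit_in_whole")
    if not no_sudden_jump(update["history"], update["country_population"]):
        failures.append("no_sudden_jump")
    return failures

# Made-up records: the second one is internally inconsistent.
good = {"country_population": 1000, "city_populations": [300, 200], "history": [950, 980]}
bad = {"country_population": 400, "city_populations": [300, 200], "history": [950, 980]}

for name, update in [("good", good), ("bad", bad)]:
    failures = validate(update)
    print(name, "accept" if not failures else f"escalate to a human: {failures}")
```

Updates that pass every check can flow through automatically; anything that fails is exactly the kind of case that gets handed to human judgement.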
We take our responsibility as the world’s source of computational facts very seriously, and we’re planning more and more ways to add checks and balances—needless to say, defining what we’re doing using computational contracts.
When we first started developing Wolfram|Alpha, nobody was talking about computational contracts (though, to be fair, I had already thought about them as a potential application of my computational ideas). But now it turns out that Wolfram|Alpha is central to what can be done with them. And as a core component in the long history of the development of systematic knowledge, I think it’s inevitable that over time there will be all sorts of important uses of Wolfram|Alpha that we can’t yet foresee.
In the early days of artificial intelligence, much of what people imagined AI would be like is basically what Wolfram|Alpha has now delivered. So what can now be done with this?
We can certainly put “general knowledge AI” everywhere. Not just in phones and cars and televisions and smart speakers, but also in augmented reality and head- and ear-mounted devices and many other places too.
One of the Wolfram|Alpha APIs we provide is a “conversational” one, that can go back and forth clarifying and extending questions. But what about a full Wolfram|Alpha Turing test–like bot? Even after all these years, general-purpose bots have tended to be disappointing. And if one just connects Wolfram|Alpha to them, there tends to be quite a mismatch between general bot responses and “smart facts” from Wolfram|Alpha. (And, yes, in a Turing test competition, the presence of Wolfram|Alpha is a dead giveaway—because it knows much more than any human would.) But with progress in my symbolic discourse language, and probably some modern machine learning, I suspect it’ll be possible to make a more successful general-purpose bot that’s more integrated with Wolfram|Alpha.
But what I think is critical in many future applications of Wolfram|Alpha is to have additional sources of data and input. If one’s making a personal intelligent assistant, for example, then one wants to give it access to as much personal history data (messages, sensor data, video, etc.) as possible. (We already did early experiments on this back in 2011 with Facebook data.)
Then one can use Wolfram|Alpha to ask questions not only about the world in general, but also about one’s own interaction with it, and one’s own history. One can ask those questions explicitly with natural language—or one can imagine, for example, preemptively delivering answers based on video or some other aspect of one’s current environment.
Beyond personal uses, there are also organizational and enterprise ones. And indeed we already have Enterprise Wolfram|Alpha—making use of data inside organizations. So far, we’ve been building Enterprise Wolfram|Alpha systems mainly for some of the world’s largest companies—and every system has been unique and extensively customized. But in time—especially as we deal with smaller organizations that have more commonality within a particular industry—I expect that we’ll be able to make Enterprise Wolfram|Alpha systems that are much more turnkey, effectively by curating the possible structures of businesses and their IT systems.
And, to be clear, the potential here is huge. Because basically every organization in the world is today collecting data. And Enterprise Wolfram|Alpha will provide a realistic way for anyone in an organization to ask questions about their data, and make decisions based on it.
There are so many sources of data for Wolfram|Alpha that one can imagine. It could be photographs from drones or satellites. It could be video feeds. It could be sensor data from industrial equipment or robots. It could be telemetry from inside a game or a virtual world (like from our new UnityLink). It could be the results of a simulation of some system (say in Wolfram SystemModeler). But in all cases, one can expect to use the technology of Wolfram|Alpha to provide answers to free-form questions.
One can think of Wolfram|Alpha as enabling a kind of AI-powered human interface. And one can imagine using it not only to ask questions about existing data, but also as a way to control things, and to get actions taken. We’ve done experiments with Wolfram|Alpha-based interfaces to complex software systems. But one could as well do this with consumer devices, industrial systems, or basically anything that can be controlled through a connection to a computer.
Not everything is best done with pure Wolfram|Alpha—or with something like natural language. Many things are better done with the full computational language that we have in the Wolfram Language. But when we’re using this language, we’re of course still using the Wolfram|Alpha technology stack.
Wolfram|Alpha is already well on its way to being a ubiquitous presence in the computational infrastructure of the world. And between its direct use, and its use in Wolfram Language, I think we can expect that in the future we’ll all end up encountering Wolfram|Alpha all the time.
For many decades our company—and I—have been single-mindedly pursuing the goal of realizing the potential of computation and the computational paradigm. And in doing this, I think we’ve built a unique organization, with unique capabilities.
And looking back a decade after the launch of Wolfram|Alpha, I think it’s no surprise that Wolfram|Alpha has such a unique place in the world. It is, in a sense, the kind of thing that our company is uniquely built to create and develop.
I’ve wanted Wolfram|Alpha for nearly 50 years. And it’s tremendously satisfying to have been able to create what I think will be a defining intellectual edifice in the long history of systematic knowledge. It’s been a good first decade for Wolfram|Alpha. And I begin its second decade with great enthusiasm for the future and for everything that can be done with Wolfram|Alpha.
It happens far too often. I’ll be talking to a software developer, and they’ll be saying how great they think our technology is, and how it helped them so much in school, or in doing R&D. But then I’ll ask them, “So, are you using Wolfram Language and its computational intelligence in your production software system?” Sometimes the answer is yes. But too often, there’s an awkward silence, and then they’ll say, “Well, no. Could I?”
I want to make sure the answer to this can always be: “Yes, it’s easy!” And to help achieve that, we’re releasing today the Free Wolfram Engine for Developers. It’s a full engine for the Wolfram Language, that can be deployed on any system—and called from programs, languages, web servers, or anything.
Many people know the Wolfram Language (often in the form of Mathematica) as a powerful system for interactive computing—and for doing R&D, education, data science and “computational X” for many X. But increasingly it’s also being used “behind the scenes” as a key component in building production software systems. And what the Free Wolfram Engine for Developers now does is to package it so it’s convenient to insert into a whole range of software engineering environments and projects.
It’s worth explaining a bit about how I see the Wolfram Language these days. (By the way, you can run it immediately on the web in the Wolfram Language Sandbox.) The most important thing is to realize that the Wolfram Language as it now exists is really a new kind of thing: a full-scale computational language. Yes, it’s an extremely powerful and productive (symbolic, functional, …) programming language. But it’s much more than that. Because it’s got the unique feature of having a huge amount of computational knowledge built right into it: knowledge about algorithms, knowledge about the real world, knowledge about how to automate things.
Back on the earliest computers, all one had was machine code. But then came simple programming languages. And soon one could also take it for granted that one’s computer would have an operating system. Later also networking, then a user interface, then web connectivity. My goal with the Wolfram Language is to provide a layer of computational intelligence that in effect encapsulates the computational knowledge of our civilization, and lets people take it for granted that their computer will know how to identify objects in an image, or how to solve equations, or what the populations of cities are, or countless other things.
And now, today, what we want to do with the Free Wolfram Engine for Developers is to make this something ubiquitous, and immediately available to any software developer.
The Free Wolfram Engine for Developers has access to the whole Wolfram Knowledgebase, through a free Basic subscription to the Wolfram Cloud. (Unless you want real-time data, everything can be cached, so you can run the Wolfram Engine without network connectivity.) The Basic subscription to the Wolfram Cloud also lets you deploy limited APIs in the cloud.
(Of course, if you want to use our whole hyperarchitecture spanning desktop, server, cloud, parallel, embedded, mobile—and interactive, development and production computing—then a good entry point is Wolfram|One, and, yes, there are trial versions available.)
The Free Wolfram Engine for Developers is intended for use in pre-production software development. You can use it to develop a product for yourself or your company. You can use it to conduct personal projects at home, at school or at work. And you can use it to explore the Wolfram Language for future production projects. (Here’s the actual license, if you’re curious.)
When you have a system ready to go into production, then you get a Production License for the Wolfram Engine. Exactly how that works will depend on what kind of system you’ve built. There are options for local individual or enterprise deployment, for distributing the Wolfram Engine with software or hardware, for deploying in cloud computing platforms—and for deploying in the Wolfram Cloud or Wolfram Enterprise Private Cloud.
If you’re making a free, open-source system, you can apply for a Free Production License. Also, if you’re part of a Wolfram Site License (of the type that, for example, most universities have), then you can freely use Free Wolfram Engine for Developers for anything that license permits.
We haven’t worked out all the corners and details of every possible use of the Wolfram Engine. But we are committed to providing predictable and straightforward licensing for the long term (and we’re working to ensure the availability and vitality of the Wolfram Language in perpetuity, independent of our company). We’ve now had consistent pricing for our products for 30+ years, and we want to stay as far away as possible from the many variants of bait-and-switch which have become all too prevalent in modern software licensing.
So Use It!
I’m very proud of what we’ve created with Wolfram Language, and it’s been wonderful to see all the inventions, discoveries and education that have happened with it over decades. But in recent years there’s been a new frontier: the increasingly widespread use of the Wolfram Language inside large-scale software projects. Sometimes the whole project is built in Wolfram Language. Sometimes Wolfram Language is inserted to add some critical computational intelligence, perhaps even just in a corner of the project.
The goal of the Free Wolfram Engine for Developers is to make it easy for anyone to use the Wolfram Language in any software development project—and to build systems that take advantage of its computational intelligence.
We’ve worked hard to make the Free Wolfram Engine for Developers as easy to use and deploy as possible. But if there’s something that doesn’t work for you or your project, please send me mail! Otherwise, please use what we’ve built—and do something great with it!