This is the first in a series of posts about new LLM-related technology associated with the Wolfram technology stack.
Build a New Plugin in under a Minute…
A few weeks ago, in collaboration with OpenAI, we released the Wolfram plugin for ChatGPT, which lets ChatGPT use Wolfram Language and Wolfram|Alpha as tools, automatically called from within ChatGPT. One can think of this as adding broad “computational superpowers” to ChatGPT, giving access to all the general computational capabilities and computational knowledge in Wolfram Language and Wolfram|Alpha.
But what if you want to make your own special plugin that does specific computations, or has access to data or services that are, for example, available only on your own computer or computer system? Well, today we’re releasing a first version of a kit for doing that. And building on our whole Wolfram Language tech stack, we’ve managed to make the whole process extremely easy, to the point where it’s now realistic to deploy at least a basic custom ChatGPT plugin in under a minute.
There’s some (very straightforward) one-time setup you need—authenticating with OpenAI, and installing the Plugin Kit. But then you’re off and running, and ready to create your first plugin.
To run the examples here for yourself you’ll need:
- Developer access to the OpenAI plugin system for ChatGPT
- Access to a Wolfram Language system (including Free Wolfram Engine for Developers, Wolfram Cloud Basic, etc.)
- For now, you’ll also need to install the ChatGPT Plugin Kit with:
PacletInstall["Wolfram/ChatGPTPluginKit"]
Here’s a very simple example. Let’s say you make up the idea of a “strength” for a word, defining it to be the sum of the “letter numbers” (“a” is 1, “b” is 2, etc.). In Wolfram Language you can compute this as:
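As a minimal sketch of that definition (the word used here is just an arbitrary choice), the built-in LetterNumber gives each letter's position in the alphabet, and Total adds the positions up:

```wolfram
(* letter numbers of "gnu": g = 7, n = 14, u = 21 *)
LetterNumber["gnu"]

(* the "strength" is their sum *)
Total[LetterNumber["gnu"]]
(* 42 *)
```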
And for over a decade it’s been standard that you can instantly deploy such a computation as a web API in the Wolfram Cloud—immediately accessible through a web browser, external program, etc.:
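A sketch of such a deployment might look like the following. The parameter name word is an illustrative choice, not something fixed by the text, and the call assumes the session is connected to the Wolfram Cloud:

```wolfram
(* wrap the computation in an APIFunction with one string parameter,
   then deploy it; the result is a CloudObject whose URL is the web API *)
CloudDeploy[
 APIFunction[{"word" -> "String"}, Total[LetterNumber[#word]] &]]
```

The resulting URL can then be called with a query string such as ?word=gnu appended.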
But today there’s a new form of deployment: as a plugin for ChatGPT. First, you say you need the Plugin Kit:
Then you can immediately deploy your plugin. All it takes is:
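Put together, these two steps might look roughly like this. This is a sketch under assumptions: the context name is assumed to mirror the paclet name, the deployment is assumed to take an association from endpoint names to APIFunction objects (consistent with the discussion of named endpoints later in this post), and the endpoint name WordStrength is hypothetical:

```wolfram
(* load the Plugin Kit; context name assumed to match the paclet name *)
Needs["Wolfram`ChatGPTPluginKit`"]

(* assumed form: endpoint name -> APIFunction;
   "WordStrength" and "word" are illustrative names *)
ChatGPTPluginDeploy[<|
  "WordStrength" ->
   APIFunction[{"word" -> "String"}, Total[LetterNumber[#word]] &]
  |>]
```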
The final step is that you have to tell ChatGPT about your plugin. Within the web interface (as it’s currently configured), select Plugins > Plugin store > Develop your own plugin and insert the URL from the ChatGPTPluginDeployment (which you get by pressing the click-to-copy button) into the dialog you get:
Now everything’s ready. And you can start talking to ChatGPT about “word strengths”, and it’ll call your plugin (which by default is called “WolframCustom”) to compute them:
Looking “inside the box” shows the communication ChatGPT had with the plugin:
Without the plugin it won’t know what “word strength” is. But with the plugin, it’ll be able to do all kinds of (rather remarkable) things with it, like this:
The embellishment about properties of gnus is charming, but if one opens the boxes one can see how it got its answer—it just started trying different animals (“lion”, “zebra”, “gnu”):
Software engineers will immediately notice that the plugin we’ve set up is running against localhost, i.e. it’s executing on your own computer. As we’ll discuss, this is often an incredibly useful thing to be able to do. But you can also use the Plugin Kit to create plugins that execute purely in the Wolfram Cloud (so that, for example, you don’t have to have a Wolfram Engine available on your computer).
All you do is use ChatGPTPluginCloudDeploy—then you get a URL in the Wolfram Cloud that you can tell ChatGPT as the location of your plugin:
And in fact you can do the whole setup directly in your web browser, without any local Wolfram installation. You just open a notebook in the Wolfram Cloud, and deploy your plugin from there:
Let’s do some other examples. For our next example, let’s invent the concept of a “geo influence disk” and then deploy a plugin that renders such a thing (we’ll talk later about some details of what’s being done here):
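One way such a plugin could be written is sketched below. The parameter names x and radius are the ones discussed in this post; the endpoint name GeoInfluenceDisk, the choice of the "Location" and "Quantity" interpreter types, and the association form of the deployment call are assumptions made for illustration:

```wolfram
Needs["Wolfram`ChatGPTPluginKit`"]

ChatGPTPluginDeploy[<|
  (* hypothetical endpoint name *)
  "GeoInfluenceDisk" -> APIFunction[
    (* interpret x as a location and radius as a quantity such as "50 miles" *)
    {"x" -> "Location", "radius" -> "Quantity"},
    (* draw the disk on a map, export it to a public cloud object as a PNG,
       and return that cloud object so the image can be shown by URL *)
    CloudExport[GeoGraphics[GeoDisk[#x, #radius]], "PNG",
      Permissions -> "Public"] &]
  |>]
```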
Now we can install this new plugin—and then start asking ChatGPT about “geo influence disks”:
ChatGPT successfully calls the plugin, and brings back an image. Somewhat amusingly, it guesses (correctly, as it happens) what a “geo influence disk” is supposed to be. And remember, it can’t see the picture or read the code, so its guess has to be based only on the name of the API function and the question one asks. Of course, it has to effectively understand at least a bit in order to work out how to call the API function—and that x is supposed to be a location, and radius a distance.
As another example, let’s make a plugin that sends the user (i.e. the person who deploys the plugin) a text message:
Now just say “send me a message”
and a text message will arrive—in this case with a little embellishment from ChatGPT:
Here’s a plugin that also sends an “alert picture” of an animal that’s mentioned:
And, yes, there’s a lot of technology that needs to work to get this to happen:
As another example, let’s make a plugin that retrieves personal data of mine—here heart rate data that I’ve been accumulating for several years in a Wolfram databin:
Now we can use ChatGPT to ask questions about this data:
And with the main Wolfram plugin also installed, we can immediately do actual computations on this data, all through ChatGPT’s “linguistic user interface”:
This example uses the Wolfram Data Drop system. But one can do very much the same kind of thing with something like an SQL database. And if one has a plugin set up to access a private database there are truly remarkable things that can be done through ChatGPT with the Wolfram plugin.
Plugins That Control Your Own Computer
When you use ChatGPT through its standard web interface, ChatGPT is running “in the cloud”—on OpenAI’s servers. But with plugins you can “reach back”—through your web browser—to make things happen on your own, local computer. We’ll talk later about how this works “under the hood”, but suffice it to say now that when you deploy a plugin using ChatGPTPluginDeploy (as opposed to ChatGPTPluginCloudDeploy) the actual Wolfram Language code in the plugin will be run on your local computer. So that means it can get access to local resources on your computer, like your camera, speakers, files, etc.
For example, here I’m setting up a plugin to take a picture with my computer’s camera (using the Wolfram Language CurrentImage[ ])—and then blend the picture with whatever color I specify (we’ll talk about the use of CloudExport later):
Installing the plugin, I then say to ChatGPT “Just picture me in green!”, and, right then and there, ChatGPT will call the plugin, which gets my computer to take a picture of me—and then blends it with green (complete with my “I wonder if this is going to work” look):
OK let’s try a slightly more sophisticated example. Here we’re going to make a plugin to get ChatGPT to put up a notebook on my computer, and start writing content into it. To achieve this, we’re going to define several API endpoints (and we’ll name the whole plugin "NotebookOperations"):
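A reduced sketch of a multi-endpoint plugin of this kind is shown here. The endpoint names createNotebook and addText are hypothetical, the association form of the deployment call is an assumption, and the way the plugin as a whole gets its name is left out because its syntax is not shown in the text:

```wolfram
Needs["Wolfram`ChatGPTPluginKit`"]

ChatGPTPluginDeploy[<|
  (* no parameters: open an empty notebook and remember it in the symbol nb *)
  "createNotebook" ->
   APIFunction[{}, (nb = CreateDocument[]; "Notebook created") &],
  (* one string parameter: write it into the notebook as a text cell *)
  "addText" ->
   APIFunction[{"text" -> "String"},
    (NotebookWrite[nb, Cell[#text, "Text"]]; "Text added") &]
  |>]
```

Each endpoint returns a short string so that ChatGPT gets some textual confirmation of what happened.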
First, let’s tell ChatGPT to create a new notebook
and up pops a new notebook on my screen:
If we look at the symbol nb in the Wolfram Language session from which we deployed the plugin, we’ll find out that it was set by the API:
Now let’s use some of our other API endpoints to add content to the notebook:
Here’s what we get:
The text was made up by ChatGPT; the pictures came from doing a web image search. (We could also have used the new ImageSynthesize[ ] function in the Wolfram Language to make de novo cats.)
And as a final “bow”, let’s ask ChatGPT to show us an image of the notebook captured from our computer screen with CurrentNotebookImage:
We could also add another endpoint to publish the notebook to the cloud using CloudPublish, and maybe to send the URL in an email.
We could think of the previous example as accumulating results in a notebook. But we can also just accumulate results in the value of a Wolfram Language symbol. Here we initialize the symbol result to be an empty list. Then we define an API that appends to this list, but we give a prompt that says to only do this appending when we have a single-word result:
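The core of this setup can be sketched as follows. The endpoint name append and the parameter name word are hypothetical, and the prompt mentioned above is omitted because the way to attach it is not shown in the text:

```wolfram
Needs["Wolfram`ChatGPTPluginKit`"]

(* the symbol that will accumulate results in the local session *)
result = {};

ChatGPTPluginDeploy[<|
  "append" ->
   APIFunction[{"word" -> "String"},
    (* AppendTo modifies result in the deploying session;
       the string is what ChatGPT sees as the API's response *)
    (AppendTo[result, #word]; "Appended") &]
  |>]
```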
Let’s set up an “exercise” for ChatGPT:
At this point, result is still empty:
Now let’s ask our first question:
ChatGPT doesn’t happen to directly show us the answer. But it calls our API and appends it to result:
Let’s ask another question:
Now result contains both answers:
And if we put Dynamic[result] in our notebook, we’d see this dynamically change whenever ChatGPT calls the API.
In the last example, we modified the value of a symbol from within ChatGPT. And if we felt brave, we could just let ChatGPT evaluate arbitrary code on our computer, for example using an API that calls ToExpression. But, yes, giving ChatGPT the ability to execute arbitrary code of its own making does seem to open us up to a certain “Skynet risk” (and makes us wonder all the more about “AI constitutions” and the like).
But much more safely than executing arbitrary code, we can imagine letting ChatGPT effectively “root around” in our filesystem. Let’s set up the following plugin:
First we set a directory that we want to operate in:
Now let’s ask ChatGPT about the files there:
With the Wolfram plugin we can get it to make a pie chart of those file types:
Now we ask it to do something very “LLM-ey”, and to summarize the contents of each file (in the API we used Import to import plaintext versions of files):
There are all sorts of things one can do. Here’s a plugin to compute ping times from your computer:
Or, as another example, you can set up a plugin that will create scheduled tasks to provide email (or text, etc.) reminders at specified times:
ChatGPT dutifully queues up the tasks:
Then every 10 seconds or so, into my mailbox pops a (perhaps questionable) animal joke:
As a final example, let’s consider the local-to-my-computer task of audibly playing a tune. First we’ll need a plugin that can decode notes and play them (the string the API returns at the end is there to tell ChatGPT the plugin did its job, because ChatGPT has no way to know that by itself):
Here we give ChatGPT the notes we want—and, yes, this immediately plays the tune on my computer:
And now—as homage to a famous fictional AI—let’s try to play another tune:
And, yes, ChatGPT has come up with some notes, and packaged them up for the plugin; then the plugin played them:
And this works too:
But… wait a minute! What’s that tune? It seems ChatGPT can’t yet quite make the same (dubious) claim HAL does:
“No [HAL] 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”
How It All Works
We’ve now seen lots of examples of using the ChatGPT Plugin Kit. But how do they work? What’s under the hood? When you run ChatGPTPluginDeploy you’re basically setting up a Wolfram Language function that can be called from inside ChatGPT when ChatGPT decides it’s needed. And to make this work smoothly turns out to be something that uses a remarkable spectrum of unique capabilities of Wolfram Language—dovetailed with certain “cleverness” in ChatGPT.
From a software engineering point of view, a ChatGPT plugin is fundamentally one or more web APIs—together with a “manifest” that tells ChatGPT how to call these APIs. So how does one set up a web API in Wolfram Language? Well, a decade ago we invented a way to make it extremely easy.
Like everything in Wolfram Language, a web API is represented by a symbolic expression, in this case of the form APIFunction[…]. What’s inside the APIFunction? There are two pieces: a piece of Wolfram Language code that implements the function one wants, and a specification for how the strings that will actually be passed to the APIFunction (say from a web API) should be interpreted before feeding them to the Wolfram Language code.
Here’s a little piece of Wolfram Language code, in this case for negating a color, then making it lighter:
If we wanted to, we could refactor this as a “pure function” applied to two arguments:
On its own the pure function is just a symbolic expression that evaluates to itself:
If we want to, we can name the arguments of the pure function, then supply them in an association (<|…|>) with their names as keys:
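The three forms just described can be sketched side by side. The particular color (Green) and lightening level (0.5) are arbitrary illustrative inputs; the names color and level match the parameter names used later in this post:

```wolfram
(* direct form: negate a color, then lighten it *)
Lighter[ColorNegate[Green], 0.5]

(* the same computation as a pure function applied to two arguments *)
(Lighter[ColorNegate[#1], #2] &)[Green, 0.5]

(* named arguments, supplied in an association with the names as keys *)
(Lighter[ColorNegate[#color], #level] &)[<|"color" -> Green, "level" -> 0.5|>]
```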
But let’s say we want to call our function from a web API. The parameters in the web API are always strings. So how can we convert from a string (like "lime green") to a symbolic expression that Wolfram Language can understand? Well, we have to use the natural language understanding capabilities of Wolfram Language.
Here’s an example, where we’re saying we want to interpret a string as a color:
What really is that color swatch? Like everything else in Wolfram Language, it’s just a symbolic expression:
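A short sketch of both steps, interpreting the string and then looking at the underlying expression:

```wolfram
(* interpret a natural-language string as a color *)
c = Interpreter["Color"]["lime green"]

(* show the symbolic expression behind the swatch *)
InputForm[c]
```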
OK, now we’re ready to package this all up into an APIFunction. The first argument says the API we’re representing has two parameters, and describes how we want to interpret these. The second argument gives the actual Wolfram Language function that the API computes. On its own, the APIFunction is just a symbolic expression that evaluates to itself:
But if we supply values for the parameters (here using an association) it’ll evaluate:
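A sketch of this packaging is given below. The parameter names color and level are the ones used later in this post; interpreting level as a "Number" is an assumption:

```wolfram
(* first argument: parameter names and how to interpret their string values;
   second argument: the function to apply to the interpreted values *)
api = APIFunction[{"color" -> "Color", "level" -> "Number"},
   Lighter[ColorNegate[#color], #level] &];

(* supplying string values in an association triggers interpretation
   and then evaluation of the function *)
api[<|"color" -> "lime green", "level" -> "0.5"|>]
```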
So far all this is just happening inside our Wolfram Language session. But to get an actual web API we just have to “cloud deploy” our APIFunction:
Now we can call this web API, say from a web browser:
And, yes, that’s the symbolic expression result. If we’d wanted something visual, we could tell the APIFunction to give its results, say as a PNG:
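As a sketch, the output format goes in as a third argument to APIFunction (the parameter types here are the same assumptions as before):

```wolfram
(* the third argument asks for the result to be returned as a PNG image *)
CloudDeploy[
 APIFunction[{"color" -> "Color", "level" -> "Number"},
  Lighter[ColorNegate[#color], #level] &, "PNG"]]
```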
And now it’ll show up as an image in a web browser:
(Note that CloudDeploy deploys a web API that by default has permissions set so that only I can run it. If you use CloudPublish instead, anyone will be able to run it.)
OK, so how do we set up our web API so it can be called as a ChatGPT plugin? One immediate issue is that at the simplest level ChatGPT just deals with text, so we’ve somehow got to convert our result to text. So let’s do a little Wolfram Language programming to achieve that. Here’s a list of values and names of common colors from the Wolfram Knowledgebase:
Of course, we know about many other collections of named colors too, but let’s not worry about that here:
Now we can use Nearest to find which common color is nearest to the color we’ve got:
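The idea can be sketched with a small hand-written list standing in for the Knowledgebase data; the list below is an assumption made for illustration and is far shorter than the real one:

```wolfram
(* stand-in list of color -> name rules; not the Knowledgebase list *)
colors = {Red -> "red", Green -> "green", Blue -> "blue",
   Yellow -> "yellow", Magenta -> "magenta", Cyan -> "cyan",
   Orange -> "orange", Pink -> "pink", Purple -> "purple",
   Brown -> "brown"};

(* an example color to look up *)
mangled = Lighter[ColorNegate[Green], 0.5];

(* with a list of rules, Nearest returns the name paired with the closest color *)
First[Nearest[colors, mangled]]
```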
Now let’s put this into an APIFunction (we’ve “iconized” the list of colors here; we could also have defined a separate function for finding nearest colors, which would automatically be brought along by CloudDeploy):
Now we’re ready to use ChatGPTPluginDeploy. The way ChatGPT plugins work, we’ve got to give a name to the “endpoint” corresponding to our API. And this name—along with the names we used for the parameters in our API—will be used by ChatGPT to figure out when and how to call our plugin. But in this example, we just want to use some kind of unique name for the endpoint, so we’ll be able to refer to it in our chat without ChatGPT confusing it with something else. So let’s call it ColorMangle. So now let’s do the deployment:
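A sketch of what this deployment could look like follows. The endpoint name ColorMangle and the parameter names are from this post; the association form of the call and the short stand-in color list are assumptions:

```wolfram
Needs["Wolfram`ChatGPTPluginKit`"]

(* stand-in list of color -> name rules; not the Knowledgebase list *)
colors = {Red -> "red", Green -> "green", Blue -> "blue",
   Yellow -> "yellow", Magenta -> "magenta", Cyan -> "cyan",
   Orange -> "orange", Pink -> "pink", Purple -> "purple",
   Brown -> "brown"};

ChatGPTPluginDeploy[<|
  "ColorMangle" ->
   APIFunction[{"color" -> "Color", "level" -> "Number"},
    (* mangle the color, then return the name of the nearest listed color as text *)
    First[Nearest[colors, Lighter[ColorNegate[#color], #level]]] &]
  |>]
```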
Everything we’ve said so far about APIFunction and how it’s called works the same in ChatGPTPluginDeploy and ChatGPTPluginCloudDeploy. But what we’ll say next is different. Because ChatGPTPluginDeploy sets up the API function to execute on your local computer, while ChatGPTPluginCloudDeploy sets it up to run in the Wolfram Cloud (or it could be a Wolfram Enterprise Private Cloud, etc.).
There are advantages and disadvantages to both local and cloud deployment. Running locally allows you to get access to local features of your computer, like camera, filesystem, etc. Running in the cloud allows you to let other people also run your plugin (though, currently, unless you register your plugin with OpenAI, only a limited number of people will be able to install your plugin at any one time).
But, OK, let’s talk about local plugin deployment. ChatGPTPluginDeploy effectively sets up a minimal web server on your computer (implemented with 10 lines of Wolfram Language code), running on a port that ChatGPTPluginDeploy chooses, and calling the Wolfram Engine with your API function whenever it receives a request to the API’s URL.
Here’s the operating system socket that ChatGPTPluginDeploy is using (and, yes, the Wolfram Language represents sockets—like everything else—as symbolic expressions):
OK, but how does ChatGPT know about your API? First, you have to tell it the port you’re using, which you do through the ChatGPT UI (Plugins > Plugin store > Develop your own plugin). You can find the port by clicking the icon in the ChatGPTPluginDeployment object, or programmatically with:
You enter this URL, then tell ChatGPT to “Find manifest file”:
Let’s look at what it found:
It’s a “manifest” that tells it about the plugin you’re installing. We didn’t specify much, so most things here are just defaults. But an important piece of the manifest is the part that gives the URL for the API spec: http://localhost:59353/.well-known/openapi.json
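For orientation, the general shape of a ChatGPT plugin manifest can be sketched as a Wolfram Language association and rendered as JSON. The field names follow the plugin manifest format; the description strings are illustrative placeholders, not the defaults the Plugin Kit actually generates, and only the name WolframCustom and the API spec URL are taken from this post:

```wolfram
(* illustrative manifest; descriptive values are placeholders *)
manifest = <|
   "schema_version" -> "v1",
   "name_for_human" -> "WolframCustom",
   "name_for_model" -> "WolframCustom",
   "description_for_human" -> "Placeholder description shown to people",
   "description_for_model" -> "Placeholder description shown to the LLM",
   "auth" -> <|"type" -> "none"|>,
   (* the piece that points ChatGPT at the OpenAPI spec *)
   "api" -> <|"type" -> "openapi",
     "url" -> "http://localhost:59353/.well-known/openapi.json"|>
   |>;

(* render the association as a JSON string *)
ExportString[manifest, "RawJSON"]
```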
And going there we find this “OpenAPI spec”:
Finally, click Install localhost plugin, and the plugin will show up in the list of installed plugins in your ChatGPT session:
And when ChatGPT starts with the plugin installed, it includes an extra piece in its “system prompt”, that lets it “learn” how to call the plugin:
So now we’re ready to use the plugin:
And, yes, it works. But there’s a bit of magic here. Somehow ChatGPT had to “take apart” what we’d asked, realize that the API endpoint called ColorMangle was relevant, then figure out that its color parameter should be “lime green”, and its level should be “0.5”. Opening the box, we can see what it did:
And now we can start using “color mangling” in other places—though ChatGPT hastens to tell us that “color mangling” is a “fictional operation”, perhaps lest it’s accused of disrespecting a country’s flag colors:
In the case we’re dealing with here, ChatGPT manages to correctly “wire up” fragments of text to appropriate parameters in our API. And it does that (rather remarkably) just from the scrap of information it gleans from the names we used for the parameters (and the name we gave the endpoint).
But sometimes we have to tell it a bit more, and we can do that by specifying a prompt for the plugin inside ChatGPTPluginDeploy:
Now we don’t have to just talk about colors:
At first, it didn’t successfully “untangle” the “colors of Iceland”, but then it corrected itself, and got the answers. (And, yes, we might have been able to avoid this by writing a better prompt.)
And actually, there are multiple levels of prompts you can give. You can include a fairly long prompt for the whole plugin. Then you can give shorter prompts for each individual API endpoint. And finally, you can give prompts to help ChatGPT interpret individual parameters in the API, for example by replacing "color" → "Color" with something like:
When you set up a plugin, it can contain many endpoints that do different things. And, in addition to sharing prompts, one reason this is particularly convenient is that (at least right now, for security reasons) any given subdomain can have only one associated plugin. So if one wants to have a range of functionality, this has to be implemented by having different endpoints.
For ChatGPTPluginCloudDeploy the one-plugin-per-subdomain limit currently means that any given user can only deploy one cloud plugin at a time. But for local plugins the rules are a bit different, and ChatGPTPluginDeploy can deploy multiple plugins by just having them run on different ports—and indeed by default ChatGPTPluginDeploy just picks a random unused port every time you call it.
But how does a local plugin really work? And how does it “reach back” to your computer? The magic is basically happening in the ChatGPT web front end. The way all plugins work is that when the plugin is going to be called, the token-at-a-time generation process of the LLM stops, and the next action of the “outer loop” is to call the plugin—then add whatever result it gives to the string that will be fed to the LLM at the next step. Well, in the case of a local plugin, the outer loop uses JavaScript in the ChatGPT front end to send a request locally on your computer to the localhost port you specified. (By the way, once ChatGPTPluginDeploy opens a port, it’ll stay open until you explicitly call Close on its socket object.)
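The local request made in that step can be imitated from a Wolfram Language session, which is a handy way to check that a deployed plugin responds. This is only a sketch: the real request is made by JavaScript in the ChatGPT front end, the port 59353 is the one from the example above, and the assumption that the endpoint lives at a path matching its name is not confirmed by the text:

```wolfram
(* send the same kind of request the front end would send to the local server;
   the "/ColorMangle" path is an assumed URL layout *)
URLExecute["http://localhost:59353/ColorMangle",
 {"color" -> "lime green", "level" -> "0.5"}]
```

Whatever text comes back is what would be added to the string fed to the LLM at its next step.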
When one’s using local plugins, they’re running their Wolfram Language code right in the Wolfram Language session from which the plugin was deployed. And this means, for example, that (as we saw in some cases above) values that get set in one plugin call are still there when another call is made.
In the cloud it doesn’t immediately work this way, because each API call is effectively independent. But it’s straightforward to save state in cloud objects (say using CloudPut, or with CloudExpression, etc.) so that one can have “persistent memory” across many API calls.
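A minimal sketch of this kind of persistence, using CloudPut and CloudGet, is given here. The cloud object name plugin-results and the parameter name word are hypothetical:

```wolfram
(* hypothetical cloud object that holds the accumulated list *)
store = CloudObject["plugin-results"];
CloudPut[{}, store];

(* each call reads the stored list, appends to it, and writes it back,
   so the state survives across otherwise independent API calls *)
api = APIFunction[{"word" -> "String"},
   Module[{r = CloudGet[store]},
     CloudPut[Append[r, #word], store];
     "Stored"] &];
```

An APIFunction like this could then be deployed as a plugin endpoint in the same way as the earlier examples.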
The LLM inside ChatGPT is (currently) set up to deal only with text. So what happens with images? Well, plugins can put them into the Wolfram Cloud, then pass their URLs to ChatGPT. And ChatGPT is set up to be able to render directly certain special kinds of things—like images.
So—as we saw above—to “output” an image (or several) from a plugin, we can use CloudExport to put each image in a cloud object, say in PNG format. And then ChatGPT, perhaps with some prompting, can show the image inline in its output.
There’s some slightly tricky “plumbing” in deploying Wolfram Language plugins in ChatGPT, most of which is handled automatically in ChatGPTPluginDeploy and ChatGPTPluginCloudDeploy. But by building on the fundamental symbolic structure of the Wolfram Language (and its integrated deployment capabilities) it’s remarkably straightforward to create elaborate custom Wolfram Language plugins for ChatGPT, and to contribute to the emerging ecosystem around LLMs and Wolfram Language.