You might remember summer 2020, because most likely some things were happening in your life. Some things were happening in mine, at least. Here’s what my summer was like: the sky turned orange, I went on a blind date with someone in another city through a computer screen, and I probed the psyche of an artificial intelligence that cost tens of millions of dollars to build.
My job was building software for an international shipping company. I had been lucky enough to be working on a project to route containers shipped on boats more efficiently from China to the US. I found this incredibly interesting and challenging, and I thought it had the potential to make a frustrating process better for customers who wanted to ship their goods.
Our team’s goal was to optimize how we distribute shipments on different ships, and then use that information to buy and sell space on ships at the right prices. We intended to use, among other tools, simple machine learning (ML), a term that’s often used interchangeably with artificial intelligence (AI), although I think it’d be a stretch to call what we were doing AI. I’d say, for our purposes, most AI is ML, but not all ML is advanced enough to be AI.
But the big challenge with ML is always data. Data, data, data. Most ML you build starts out knowing nothing about the world, pure tabula rasa.¹ Everything it learns is from the data you give it, so it’s critical to have the right data. A classic ML task is identifying pictures. Computers have gotten really good, better than people in some cases, at identifying what’s in a picture. But to get there, the computer had to look at millions or billions of pictures, and thousands of software engineers and researchers had to fine-tune the ML model. Or, if you want, as one person with a laptop you can train one ML model to identify eggs, another to identify ham, and another to identify green eggs and ham, but even that will take a lot of work and many samples. Compare this to people. If you show a child a handful of pictures of eggs and of ham, they can immediately not only find green eggs and ham but also tell you whether or not they would like them in the rain or on a train.
And that’s just within one particular kind of task. AIs definitely can’t switch from one task to another. Take, for example, Google search and Facebook news feed. Both are built on very advanced AI. From another perspective, though, these AIs are still very simple and require a lot of guidance. Hundreds or thousands of people have worked hard on that AI just so it can do that one task reasonably well. Ask Facebook to search the Internet or Google to rank your news feed and neither will know where to begin.
The problem we had was that we couldn’t get accurate data to feed to our optimization model. Shipping is a relationship business. Traditionally, someone spends all day on the phone with carriers and shippers to learn the prices, which ships had extra space, which ships were delayed. It’s hard for a computer to predict what the market will look like if all the data for it is only in someone’s head. And that person on the phone has no interest in writing the prices down and sending them to us so we can give them to the AI to learn from. We ended up missing deadlines and we significantly cut our team’s ambitions. Instead of optimizing shipping space, we’d spend the next several months building tools to encourage people to accurately input the market data they knew about.
I was disappointed in ML and our project’s failure bummed me out. As the pandemic dragged on, I spent more time reading things on the Internet. One day in July, when the skies were just a bit tan rather than orange, someone posted a demo of a tool that would make websites for you based on your instructions. They videotaped themselves testing it with the following instruction: “Make a button that looks like a watermelon.” The tool spat out an ugly webpage displaying a circular pink button with a green border, and the corresponding code to make that site. I looked at the site, and looked at the code, and my goldfish attention span assumed someone had just made a fun toy for building simple websites. People posted pet projects on Twitter all the time. Cool, but not a big deal. Just a fun way to waste a few minutes waiting for another pandemic day to end.
But, if you look at it for a minute longer, that’s not what’s happening at all. This person didn’t write down a big list of different fruits and vegetables the button could look like, each with a shape and a set of colors. They didn’t write software to turn shapes and colors into buttons on websites. In fact, no one even wrote software to make webpages. Someone wrote no software at all and ended up with a tool to build webpages. You may not know a lot about software, but this is an incredible feat. Just go look at the responses. A significant fraction of them are: “This can’t be real.”
But I didn’t look at the responses, and I didn’t think about it for a few more days, until someone at work posted a message on Slack saying they’d gotten something called GPT-3 to automate one of their tasks. On its first try, with just one example and no special training, they told it to make a program to convert data from one format to another, and GPT-3 did. And I thought: “Wow! Finally — a breakthrough that will shake up the ossifying tech industry.” So I googled GPT-3.
And I found dozens of blog posts and tweets of people using it for dozens of tasks. And after every one, there was a flurry of responses: This looks fake. You must be picking and choosing the best results. There must be extra work going on here. And they were right — it did seem fake. It challenged everything I knew about AI. It required very little data to train, and it could be used for many different tasks. I had to try it.
What is GPT-3? GPT-3 is the Generative Pre-trained Transformer 3. It’s a big artificial intelligence model built by OpenAI. It does just one thing. It’s a fancy autocomplete. That’s it. You give it some text, and it’ll predict which text comes next.
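A toy sketch of what “fancy autocomplete” means mechanically. The hard-coded probability table below is my stand-in for GPT-3’s billions of learned parameters, not anything resembling the real model; the loop itself (score candidate next words, append the most likely one, repeat) is the actual shape of the process.

```python
# Toy illustration of a language model as autocomplete: given the text so
# far, look up a probability for each possible next word, append one, and
# repeat. GPT-3 does this with a huge neural network instead of a table.
NEXT_WORD_PROBS = {
    "green": {"eggs": 0.9, "ham": 0.1},
    "eggs": {"and": 1.0},
    "and": {"ham": 1.0},
}

def autocomplete(prompt_words, max_new_words=3):
    words = list(prompt_words)
    for _ in range(max_new_words):
        probs = NEXT_WORD_PROBS.get(words[-1])
        if probs is None:  # the model has nothing to say after this word
            break
        # Greedy decoding: always take the highest-probability next word.
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(autocomplete(["green"]))  # green eggs and ham
```

Everything interesting about GPT-3 lives in how good its version of that probability table is, not in the loop around it.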
But GPT-3 was trained on more data, and with more “brain cells” (to use an analogy I’m sure anyone who really understands the topic would hate) than any other AI ever. As you might guess, it’s the third GPT — the first one was small, the second one was big, and the third one was 10 times bigger than any other language model ever built. It has read everything publicly accessible on the Internet, most books ever written, in many languages. Anything people have bothered to put into words it can imitate — and it can anticipate things that have never been written, too.
OpenAI, the pseudo-nonprofit organization that created GPT-3, believes that the model is so powerful that it could be dangerous, so they restricted access to a handful of insiders. Somehow, my coworker had gotten past that. How? Did he have connections deep in the AI research field?
I sent him a Slack message: “How did you get access?” He said he had sent a proposal to Greg Brockman, the CTO of OpenAI, who had approved it. So I sent a pleading email asking for a GPT-3 invitation, proposing to use it for parsing emails and digitizing shipping data.
“Thanks, Greg, and congrats on building such an exciting tool.” I figured a little brown nosing couldn’t hurt. I also threw in a request for a job interview.
I couldn’t believe my luck — Greg sent me a GPT-3 invitation. I logged on.
GPT-3 is accessible to invitees through the OpenAI website — they have a playground where you can input text and it’ll suggest some text that comes next. A New York Times article on the subject lists some of its most famous achievements: “It generates tweets, pens poetry, summarizes emails, answers trivia questions, translates languages and even writes its own computer programs, all with very little prompting.”
Another common use case is building a chatbot. An example prompt from OpenAI calls for it to be “helpful, creative, clever, and very friendly,” but if you change it, its behavior will change too. For example, if I give it the prompt:
The following is a conversation with an AI assistant. The assistant is rude but honest. Human: What's your favorite color?
AI: Honestly, I do not have a favorite color. Orange is probably my least favorite color.
(I’ve shortened most of these interactions in the interest of space — often, GPT-3 requires a few examples or detailed instructions to give interesting results — but the example output is real, if cherry-picked.)
I tried to stretch its capabilities beyond what I’d seen on the Internet. I tried making rap lyrics, and it turns out it can vary the lyrics based on the artist. If I prompt it with:
The below are the lyrics to the song "Sandwich" by Jay-Z:
It yields lines like:
I make sandwiches, they call me H.O.V.A.
If I prompt it with:
The below are the lyrics to the song "Sandwich" by Lil Wayne:
It will autocomplete lyrics about Weezy. Change the author to Katy Perry, and the tone will change again. But all versions of the “Sandwich” song will be about the virtues of sliced bread.
I asked it to write a college syllabus and final exam questions for an Operating Systems course, and for a course on the politics of development, and it was able to do both quite convincingly.
A favorite trick (that I won’t play on you this time — although, if I did, how would you know?) is to ask GPT-3 to write an article about GPT-3 with a prompt something like:
The following is a blog post about my experience using GPT-3. Title: GPT-3 - a breakthrough in AI?
Then, for maximum shock value, the author will reveal that it’s all written by a machine at the end of the article. Inevitably, readers will question the results. Was it edited? Did they generate multiple articles and pick the best one, or mash them together? How much GPT-3 is enough for it to be considered GPT-3?
I wasted multiple evenings on this hobby. During a social Zoom call, I couldn’t resist demoing my creations for a coworker. I asked it to write about our company’s founder and they laughed with awe when it responded appropriately. I showed them the rap lyrics — my most original GPT-3 invention. I showed them the data parsing functionality. “You could use this to parse shipping prices from an email!” I said. Another night, I scheduled a call with a friend to play board games online, but I got sidetracked and ended up showing off GPT-3 for them instead. I demoed it to another friend and tried to convince him of all the cool things we could build together with it.
I was tempted to guess it could be fake too — could there be someone on the other side who’s just really fast at typing? After all, this is a classic startup tactic — pretend something is automated to hook customers, and actually automate the work later. Maybe the reason they limit access is because Greg sits at home bent over his laptop frantically typing responses to every request that comes in. “Oh, no, not another watermelon button. Ugh, now they’re asking me to write a rap song,” he mutters to himself. “I really need to get started on actually building that AI…”
Of course it was a computer responding to me — but it was hard not to think of it as a person with a personality. GPT-3 has a tendency to repeat itself, but it also has a tendency to digress sometimes. GPT-3 is a fast learner, but it usually does better with a few examples. GPT-3 has a lot of sides to its personality. When GPT-3 writes about the future of AI, it waxes philosophical (and, I think, it sounds a bit defensive). When it writes advice, it sounds like an inspirational speaker. It can be snarky or mean or polite. GPT-3 has some gaps, though. It reflects the same biases and power structures that determine what writing gets published and whose points of view are valid. And it’s got plenty of book smarts, but it’s lacking even the most basic street smarts. It’s never seen or touched or felt or tasted or heard anything it talks about. It can talk about pencils and ducks, but it doesn’t know whether a pencil weighs more than a duck or vice versa. (Its point of view, when asked:
A duck is lighter since it is closer to the earth on which it walks.) When it writes, it draws on everything we’ve ever written without having ever experienced any of those things.
Sometimes I think about my own thoughts and writing as an AI. To what extent am I just a fancy autocomplete, just copying what I think and say from what I find on the Internet or in books? When I’m writing this essay, am I just generating words and ideas based on my past experience? I open a document and write at the top: “Essay ideas.” I stare at it. I Google, “How to come up with good essay ideas?” I remember a prompt a writing teacher gave me — “write about an obsession.” So I start listing obsessions I’ve had. “Running. Weight lifting. Yoga. Rowing. Surfing. Hiking. Mountain biking.” Oh no, have I gone off on a tangent? Let’s try again: “Politics. Economics. International development. Global poverty. Climate change.” Are these really obsessions, or just interests? Let’s revise the prompt. “Phobias: Flying. Social anxiety. Loneliness. Failure.” Can I generate a few thousand words on each of these topics to see if it’s any good? Would that even generate anything original? What would a real essayist — say, David Foster Wallace — write about? Is my training set so small that I have no other literary references? Am I any deeper than GPT-3? Am I just a thin layer of interpretation on top of the culture I experience, never truly understanding the world around me?
When I write, I’m likely to write cliches or random stream-of-consciousness prose, and so is GPT-3. The only way to get through that is to generate so many copies that I find some random garbage that happens to be unique and interesting. And that happens to be how people approach GPT-3’s output too. It might not give you a coherent rap song on the first try, but by the fifth you’ll get something you want to use.
My thoughts even have some of the same bugs that GPT-3 has. Sometimes, I need a few examples to learn from. “GPT-3 isn’t that cool,” I think. “AIs all have to be designed for a specific purpose, so GPT-3 can’t be that good.” But after a few examples, I think something different: “GPT-3 is very cool!” Sometimes I get too creative and lose track of my thoughts. Today’s prompt at work is to fix a bug, but soon my mind finds itself thinking “I wonder what’s going on with GPT-3 today…”
Other times, my thoughts get stuck in a loop. “I’m a failure,” my internal generative pre-trained transformer says. “I’ve wasted my life. Everything is getting worse and I can’t do anything about it.” My brain seems to think that these are the only thoughts worth thinking. Why think any other thoughts, my mind reflects, when these are so deeply true? This seems particularly likely when the prompt I give myself is: “You set a goal at work. You didn’t achieve it. What do you think about yourself?”
Unlike real people, you can fine-tune GPT-3’s personality. There’s a “temperature” knob, which basically means “creativity.” Turn temperature down to zero and GPT-3 becomes a truth-teller, but it might get so confident that a given response is correct that it repeats it over and over. “
I’m on the road, I’m on the road,” it said endlessly when I asked it to write a Big Sean song about sandwiches. It seems stuck in its own head. “This is the perfect rap lyric,” it’s thinking. “I don’t understand why anyone would write a rap song about anything else.” Eventually, it hits a text output limit and you can intervene and press the reset button on its brain. You can turn down the temperature, or turn up the anti-repeat settings. “Let’s try again,” you tell it.
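Those two knobs can be sketched in a few lines. The function and table below are my own minimal illustration, not OpenAI’s implementation; the math (dividing scores by temperature before sampling, and penalizing tokens that have already appeared) is the standard recipe these settings are built on.

```python
import math
import random

def sample_next(logits, temperature=1.0, frequency_penalty=0.0, counts=None):
    """Pick the next token from raw model scores (logits).

    Low temperature sharpens the distribution (confident, repetitive);
    high temperature flattens it (creative, erratic). frequency_penalty
    subtracts from the score of tokens already generated, which is the
    anti-repeat setting that breaks "I'm on the road" loops.
    """
    counts = counts or {}
    # Anti-repeat: dock each token's score by how often it's been used.
    adjusted = {
        tok: score - frequency_penalty * counts.get(tok, 0)
        for tok, score in logits.items()
    }
    if temperature == 0:  # degenerate case: always pick the top token
        return max(adjusted, key=adjusted.get)
    # Temperature-scaled softmax sampling.
    exps = {tok: math.exp(s / temperature) for tok, s in adjusted.items()}
    r = random.random() * sum(exps.values())
    for tok, e in exps.items():
        r -= e
        if r <= 0:
            return tok
    return tok
```

At temperature zero, “I’m on the road” wins every single draw; raise the frequency penalty and each repetition makes that line less likely than the alternatives.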
Or sometimes GPT-3 is too creative. It might digress like it’s trying to fill time on a blind date over Zoom. “Yeah, speaking of sandwiches, I have this great recipe for an egg salad sandwich. I got it from my friend Eliza. You ever met Eliza? She’s great.” (Not a real example, by the way.) Or maybe the rap song will have only one verse, none of the lines rhyme, and the meter is all off. To some degree, you are forcing GPT-3 to talk. Maybe GPT-3 just doesn’t feel like talking about sandwiches today. Maybe GPT-3 just wasn’t cut out to write rap songs. Or it just doesn’t want to talk at all. Maybe you should cut it some slack. “Too bad!” you say. You adjust the settings and press “Submit” so that GPT-3 is forced to blather on.
Or maybe it’s not that GPT-3 doesn’t want to write a rap song — it just doesn’t know exactly what you want. Maybe you need more detailed instructions. Tell it to make a rap song with a traditional verse-hook-verse-hook-bridge-hook structure. Or end your prompt with “[Intro]” to cue it to write like a lyrics website. Or you might even write a full example rap song for it to imitate. “Now you try,” you tell GPT-3.
When you find yourself stuck in a loop of getting too invested in work and burning out, if only you could press pause on your thoughts and fiddle with the settings. You could turn up the temperature a bit and generate some new ideas. “Maybe that goal wasn’t such a big deal,” you’ll think then. “Maybe work isn’t so important at all. Maybe I knew all along that this work was an ambitious goal and that failure was possible.” Or you could turn up the anti-repeat setting to generate new life decisions. Next time you won’t get so invested. You’ll find other things to provide fulfillment. Or maybe you just need a few examples of other ways to behave. “Here’s what it’s like to get out of bed and shower. Here’s what it’s like to call a friend and talk about how you’re feeling. Now you try!”
In fact, this isn’t so different from the process I follow to engage with depression. Cognitive behavioral therapy, CBT, is a common technique to treat depression, and for me it works well enough. The core approach is: (1) identify negative thoughts, (2) figure out the root causes or assumptions behind those thoughts, and (3) construct an alternative, positive thought that you can convince yourself of. Identifying negative thoughts is a way to turn up the anti-repeat setting. Figuring out alternative thoughts feels like increasing temperature. Another CBT technique is to simply do something, since a common depression-thought is “I can’t do anything.” To me, this seems a lot like giving my brain more examples so it knows what to do. “First, I’ll make breakfast,” you’re telling yourself, in hopes that this leads to another thing, and another — “I’ll clean my room! I’ll exercise!”
This isn’t to say I’m just like an AI. I’m thinking about my thought process as I write this essay, and it’s not clear GPT-3 can do anything like that at all. Ask it what it’s thinking and you have no way to know if its response is a lie. Besides, you couldn’t even ask it directly — you’d have to ask it to act like a chatbot, and then ask the chatbot to talk about its internal state. Here’s what happens if you do that:
Human: What are you thinking?
AI: I am thinking about how people can help each other more. I might have something for you.
Human: What process happens inside your brain to allow you to help people?
AI: My brain uses three methods: randomness, feedback from people, and collaboration with other AI assistants.
That’s obviously not true. Randomness is a factor, yes — but it doesn’t use feedback from people, and there are no other AI assistants to collaborate with. It’s fabricated a world it thinks we might want to hear about. Hopefully the story I’m telling about my own thoughts is believable enough that you can’t so easily call bullshit on them.
GPT-3 can also make elementary factual or reasoning mistakes. It seems to be able to “think” only two or so levels away from things it knows for certain. For example:
Q: When did the Meiji Restoration start?
A: The Meiji Restoration started in 1868.
Q: Who was president of the United States in 1868?
A: Ulysses S. Grant was president of the United States in 1868.
Q: Who was president of the United States when the Meiji Restoration started?
A: Abraham Lincoln was president of the United States when the Meiji Restoration started.
Note that the first answer is correct, but the other two aren’t. Grant was elected in 1868 and inaugurated in 1869. That seems forgivable, but the last mistake isn’t. Lincoln died in 1865. Much as we’d all like to forget it, Andrew Johnson was still president in 1868.
I can only speculate how the internals of GPT-3’s neural network work. In fact, understanding the internals of AI is an area of active research. Neural networks, the technology underlying GPT-3, are generally designed by machines, for machines — so their structures aren’t obvious to humans. But it appears that the “Meiji Restoration” part of its mind isn’t strongly connected to the “US presidents” part, which isn’t strongly connected to the “comparing dates” part. When I fact checked its answers, I knew exactly what information I needed and what to do with it. And I can look at my own thought process and change it if it’s not working.
Introspection is a mixed blessing. When CBT fails to deliver results, I think, “Why can’t I just stop thinking like this?” A new loop starts. Other times, I think, “if my mind is plastic, why can’t I simply become whoever I want to be?” But sometimes introspection helps. Meditation is one example. If I observe these loops, I can stop clinging to the AI’s output and observe my biological functions — breathing, hearing, feeling. Starved of attention, the AI stops on its own. And introspection is foundational to personal growth of all kinds. I can’t learn how to find which president was in office during historical events or to be more resilient without some awareness of my own thoughts.
Most significantly, though, GPT-3 doesn’t learn from its mistakes. It can’t even try. It’s been taught the whole Internet, and you can give it more examples to learn from, but it won’t remember them when you give it another prompt later. If I try to tell it that ducks are actually quite a bit heavier than pencils, it’ll only remember that fact for that one autocomplete. A moment later, if you reset it, that information will be gone from its short-term memory for good. GPT-3 will continue to make the same mistakes over and over again, like Guy Pearce in Memento. In contrast, for better or for worse, I remember, and I try to change.
GPT-3 is still figuring out its direction in life. OpenAI has partnered with Microsoft’s GitHub to create Copilot, a coding assistant powered by a GPT-3-like model trained on code, and it looks very promising — like a much smarter version of the watermelon button program. Ironically, software engineers may be the first ones to start losing their jobs to machines. Or Copilot may make their jobs less monotonous and their skills will become even more desirable. OpenAI has also built models like GPT-3 for recognizing and even generating images.
For other purposes, OpenAI still limits access to GPT-3. Any public use must follow very strict guidelines. They’re worried about people using it to generate spam, or hate speech, or to manipulate people, and rightly so. But as a result, most of the allowed uses are limited and kind of boring — things like brainstorming ideas or summarizing text are okay, but writing essays or talking to people often are not. For now, we’ll be stuck doing that by ourselves.
¹ Like many things in this essay, parts of this are oversimplifications — some AI is pretrained on general tasks so it can more easily learn specific tasks later. But, if you’re an AI expert, I’m going to ask you to accept some simplification. I’m not an expert in this in any way, so my knowledge of the breadth and depth of AI research is limited.