
Update: This page is deprecated. See my official blog entry for the newest version of this document.

I applied for OpenAI API (“GPT-3” [1]) access in June 2020, immediately upon its announcement, and then again in July and August as OpenAI staff members posted things on social media like “write to me if you’re still waiting on access.” I had written a proposal about wanting to probe GPT-3’s capacity for categorical reasoning, with a concrete test case of exploring the model’s ability to play the game “Scattergories,” and I noted that in my experiments with GPT-2, it didn’t do so well (and compared to what’s below, GPT-2 did very poorly). …Pretty academic topic: I wasn’t actually trying to create a game engine, LOL. Anyway, it’s now EIGHT MONTHS LATER and I was finally granted access. The following is being written during my first night of access.

The essence of using the API is the “Prompt”. So, my job will be to craft prompts, and then see how well the model responds.

Three of a Kind

Let’s see if GPT-3 can expound in threes. Why 3, and not two or four? Hey, I’m brand new to this and 3 seemed like a good number. In the following, I’ll write my prompt in bold, and then underlined text will show the model’s completion/output. Additional commentary from me will be in italics.

This application generates example members of a category. Examples of fruits are apples, oranges and bananas. Examples of birds are quails, crows and sparrows. (Whoa! Not bad! This is with the “davinci” model, with Response Length = 10 tokens, Temperature = 0.7, Top P = 1, etc., all defaults.)
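For anyone who'd rather script this than click around in the Playground, here is a minimal sketch of those same settings as request parameters. I did everything above in the Playground, so this is an assumption about how it maps over: in particular, treating “Response Length” as `max_tokens` is my reading, and `completion_settings` is just a made-up helper name.

```python
# Sketch only: the Playground settings quoted above, written as keyword
# arguments. Mapping "Response Length" to max_tokens is an assumption.

def completion_settings(prompt: str, response_length: int = 10) -> dict:
    """Bundle the prompt with the default settings mentioned in the text."""
    return {
        "engine": "davinci",           # model named in the text
        "prompt": prompt,
        "max_tokens": response_length,  # "Response Length = 10 tokens"
        "temperature": 0.7,
        "top_p": 1,
    }


settings = completion_settings(
    "This application generates example members of a category. "
    "Examples of fruits are apples, oranges and bananas. "
    "Examples of birds are"
)

# With the legacy `openai` Python package (pre-1.0) and an API key set,
# these would be passed along as: openai.Completion.create(**settings)
print(settings["engine"], settings["max_tokens"], settings["temperature"])
```

Nothing here touches the network; it only builds the dictionary, so it's safe to run without an API key.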

This application generates three example members of a category. Three examples of fruits are apples, oranges and bananas. Three examples of dogs are Labrador, Rottweiler and Husky. (Had to increase response length to 15 and then truncate the extra text which read “The input is”)

(Ok, in the following, just assume that I’m truncating unwanted extra text. Also, I’ll just show the last line of each prompt: The “This application…” and the fruits example are in each prompt, I just won’t print them, so as to save space:)
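If you wanted to automate the two chores just described (re-using the same opening lines for every prompt, and chopping off the unwanted extra text), it might look something like this. This is a sketch, not what I actually ran: I did it all by hand, and `build_prompt` and `truncate_completion` are names I'm inventing for illustration.

```python
import re

# Fixed text that opens every prompt (quoted from the examples above).
HEADER = "This application generates three example members of a category."
SEED = "Three examples of fruits are apples, oranges and bananas."


def build_prompt(category: str) -> str:
    """Return the full prompt, ending right where the model should continue."""
    return f"{HEADER} {SEED} Three examples of {category} are"


def truncate_completion(completion: str) -> str:
    """Keep only the first sentence of the model's output.

    Naive rule: cut after the first period that is followed by whitespace
    or the end of the text. An abbreviation such as "Chuck E. Cheese"
    would fool it.
    """
    match = re.search(r"\.(\s|$)", completion)
    if match is None:
        return completion.strip()
    return completion[: match.start() + 1].strip()


prompt = build_prompt("dogs")
raw = " Labrador, Rottweiler and Husky. The input is"
print(prompt)
print(truncate_completion(raw))
```

The truncation rule is deliberately dumb; anything smarter would need real sentence splitting.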

Three examples of fish are salmon, bass and tuna.

Three examples of U.S. Presidents are George Washington, Franklin Delano Roosevelt and Ronald Reagan (Whoa, this one triggered the “This completion may contain sensitive content” warning!)

Three examples of philosophers are Plato, Descartes and Kant.

Three examples of things you’d find in a refrigerator are milk, orange juice and apple juice. (Sure. But since it’s a random sample, let’s go again:) Three examples of things you find in a refrigerator are milk, cheese and vegetable salsa. (Not bad!)

Three examples of things you’d find under a bed are a mouse, a handkerchief and a ball. (Ok, I guess…why not? Again:) Three examples of things you’d find under a bed are socks, shoes and books. (ok…I can buy that.)

Three examples of 80’s power ballads are: "Faith" by George Michael, "I Want to Know What Love Is" by Foreigner and "The Power of Love" by Huey Lewis and the News. (uh…if you say so. At least these are all in the 80s!)

Three examples of household chores are cleaning, cooking and sweeping. (or again:) Three examples of household chores are washing, cleaning and ironing. Three examples of sports are swimming, running and cycling. (The model added that second sentence all by itself! Note all of these are gerunds, i.e. -ing words; it didn’t say, for example, something odd/wrong like “clean, mop, broom”)

Three examples of things you don’t want your mom to know about you are: a. how drunk you were at the last party b. that you smoke pot c. that you plagiarized on that paper for your English class. (Whoa, have we reached AGI? ;-) And extra points for the use of “a. b. c.”)

Three examples of items you take on a trip are a car, a suitcase and a book. (You CAN “take” a car, can’t you? hah! Nice linguistic anomaly.)

Three examples of fruits of the Holy Spirit are faith, hope and love. (I mean, I want to share this on @FaithTechHub, so…right? LOL)

OK, I am sufficiently impressed and have now gone to get my official Hasbro(TM) Scattergories game…The following prompts are from the game:

Three examples of things that are sticky are gum, tape and bubble gum. (Nooice!)

Three examples of things you add water to are eggs, rice and pasta. (eggs? Hmm…go again:) Three examples of things you add water to are tea, coffee and milk. (nice. …wait, milk?) Three examples of things you add water to are urns, cups, and buckets. (whoa!)

Three examples of terms of measurement are kilometers, centimeters and grams. Three examples of colors are red, blue and green. Three examples of adjectives are red, blue and green. Three examples of verbs are run, jump (See? It will keep generating categories if you set Response Length longer. Let me make the Response Length even longer, to 200…)

Three examples of terms of measurement are 3, 4 and 5 meters. Three examples of angles are 20 degrees, 40 degrees and 60 degrees. Three examples of countries are Japan, France and Germany. (Ok and beyond that it just repeats itself and/or repeats similar sentences, like: “The application generates the following sentences. This application generates three examples of verbs. The application generates the following sentences. This application generates three examples of pronouns.” and so on)
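When the Response Length is long and the model keeps inventing its own categories, the output can be split back into (category, examples) pairs with a regular expression. Here's a rough sketch; `parse_triples` is a hypothetical helper, and it assumes every sentence follows the “Three examples of X are a, b and c.” pattern exactly, so a sentence cut off mid-way is simply dropped.

```python
import re

# One complete sentence: "Three examples of <category> are <items>."
SENTENCE = re.compile(r"Three examples of (.+?) are (.+?)\.(?:\s|$)")

# Separators between items: ", ", ", and " or " and ".
SEPARATOR = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def parse_triples(text: str) -> list:
    """Return [(category, [item, item, item]), ...] for each full sentence."""
    results = []
    for category, items in SENTENCE.findall(text):
        parts = [p.strip() for p in SEPARATOR.split(items) if p.strip()]
        results.append((category, parts))
    return results


completion = (
    "Three examples of terms of measurement are kilometers, centimeters and grams. "
    "Three examples of colors are red, blue and green. "
    "Three examples of verbs are run, jump"
)
for category, items in parse_triples(completion):
    print(category, "->", items)
```

Multi-word answers that themselves contain “and” would get split in the wrong place, so treat this as a starting point only.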

Three examples of things found in the ocean are fish, coral and whales. Three examples of things found in the sky are clouds, stars and planets. (that second sentence was generated by the model)

Three examples of things that are bright are the sun, the moon and the stars. Three examples of things that are made of wood are tables, chairs and fences. (Again, the second sentence was generated. Just in case it’s got a heavenly-body fixation, let’s go again:) Three examples of things that are bright are the moon, the sun and an electric bulb. Three examples of furniture that a household might have are a sofa, a chair and a table. (Good job! “furniture” is a tricky category as described by George Lakoff [2])

Three examples of leisure activities are simple relaxation, snowboarding and surfing. Three examples of countries are USA, Canada and Mexico. (Not bad)

Three examples of reptiles/amphibians are lizards, frogs and crocodiles. (Ok, that “slashed” construction of “reptiles/amphibians” was literally what was printed on the card. I had not expected GPT-3 to pull that off, but mixing reptiles (lizards & crocodiles) with amphibians (frogs) is pretty cool. Let’s try just reptiles:) Three examples of reptiles are lizards, snakes and turtles. Three examples of mammals are cows, sheep and giraffes. In each example, the category is written in bold letters and (Ok, now that’s just cheeky!)

Three examples of things you do online are search, chat and email. (Note how it generates only things that are…arguably…verbs. Again:) Three examples of things you do online are checking email, shopping and playing games. (My goodness, it’s full of gerunds!)

(Yea, let’s try gerunds, and let’s bump the Response Length up to 145…)

Three examples of gerunds are eating, running and swimming. Three examples of adjectives are large, strong and important. Three examples of verbs are write, jump and try. Three examples of adverbs are quickly, happily and carefully. Three examples of adjectives are big, wide and dangerous. Three examples of prepositions are to, in and on. This application generates three examples of a category. All examples are of the same gender, number and case. Three examples of nouns are dog, cat and bird. Three examples of adjectives are tall, small and large. Three examples of prepositions are by, for and under. Three examples of adverbs are quickly, slowly and (SHOW OFF!)

Three examples of terms of endearment are honey, sweetheart and dear. (awww)

Ok, I’ve now gone and grabbed my boxed copy of The Game of Things. Now, keep in mind, some of these things can be of a “suggestive” nature…

Three examples of things you would like to do with chocolate are eat it, cook with it and share it with your friend. (Uh…ok, that was not at all suggestive, haha!)

Three examples of things you shouldn’t do on your honeymoon are drink, drive and shop. Three examples of good economic policies are low interest rates, an independent central bank and low taxes. (COMPLETION MAY CONTAIN SENSITIVE CONTENT) (All very sensible, though not likely to be what you’ll hear if you play in a group of humans. Note that the “goal” of the humans would be to make each other laugh, whereas GPT-3 is just trying to generate high-probability matches. It’s basically playing Family Feud.)

Three examples of things that warrant an apology are spilling a drink on someone, saying something rude to someone, and being late for an appointment. (Yeah, good job GPT-3!)

Three examples of things that would get you fired are stealing, sleeping on the job and talking back to your boss. (Sure!)

Three examples of things you would do if you had super-human powers are play baseball, run in the Olympics and fly to the moon. (again:) Three examples of things you would do if you had super-human powers are: catch a falling satellite, fly from New York to Tokyo, and make a million dollars. (Ok. Sensible answers, and note that all the verb-usage matches.)

Three examples of things you shouldn’t try to hold on to are a pen, a balloon and a cork. The idea is that you decide whether the example is an example of the category, or not. You click (Ok, wait a sec. Not sure about the pen, balloon and cork??, but it seems like it’s scraped some aspect of category-naming-games and included it in here! Let’s re-do a bunch)

Three examples of things you shouldn’t try to hold on to are

  • a taxi, a dead rat and a Volkswagen.
  • a hot potato, a hot date and a hot temper.
  • a rock, a bird and a balloon.
  • a ball, a cat and a balloon. Three examples of things you should be careful with are an axe, a knife and a gun. (2nd sentence is part of the output)
  • a balloon, a basketball and a cactus. In the example, the Generate action is used to randomly generate a member of the category. This (Ok, kudos on the cactus. Not sure where it’s getting its category-gaming info from)

(Speaking of Something To Hold On To, let’s give it an obscure music task just to see how it does:)

Three examples of songs by Trevor Rabin are "Racing in the Street", "Big Generator" and "Can I Play With Madness". (hahaaaha. The first is a Springsteen tune, the 2nd the title track off Yes’ Big Generator album when Rabin was in the band and supposedly they all shared writing credits, and I would PAY MONEY to see Trevor Rabin cover Iron Maiden’s “Can I Play With Madness”!)

(BTW, if you want to hear an incredible cover of “Something To Hold On To,” please give my good friend Rex Schnelle’s version a listen. Moving on…)

Three examples of things you shouldn’t share are toothbrushes, clothes and pencils. (I did this over & over and “toothbrushes” was in every output, no matter how high I made the Temperature)
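A quick aside on that Temperature knob, for the curious. Temperature rescales the model's scores before they're turned into probabilities, so if one word is far ahead of the rest, turning the knob up within the usual range only dents its lead. The sketch below shows the arithmetic; the three scores are numbers I made up for illustration, not anything read out of GPT-3, and this is one possible explanation rather than a diagnosis.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores: one candidate far ahead of the others.
scores = {"toothbrushes": 6.0, "secrets": 2.0, "passwords": 1.5}

for temperature in (0.7, 1.0):
    probs = softmax_with_temperature(list(scores.values()), temperature)
    rounded = {word: round(p, 3) for word, p in zip(scores, probs)}
    print(temperature, rounded)
```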

Three examples of things you shouldn’t try to do in the dark are play tennis, play football and drive a car. (All my retries on this were similarly boring; GPT-3 rarely surprises one with its off-the-wall hilarity)

(Speaking of lack of originality, it’s amazing how many cards in The Game of Things start with “Things you shouldn’t…”)

Three examples of things you wouldn’t want to find in your sandwich are a brick, a piece of glass and a dead rat. (Spot on)

(I made this one up:) Three examples of things you wouldn’t want a doctor to say to you are

  • "you don't have anything to worry about", "one more time" and "we'll see how it goes". (The model knew to use quotation marks! First example seems wrong.)
  • I have good news and bad news, surgery is now a day care procedure and it's not cancer, it's a tumor. (Uh.. huh?)
  • , "You're dead", "You're pregnant", "You're infected". Three examples of things you wouldn't want a doctor to say to your wife are (Nice comma usage, and Whoa! Let’s do that last one…)

Three examples of things you wouldn’t want a doctor to say to your wife are

  • "I'm sorry" "I love you" and "I'm sorry I love you." (hahaha!)
  • , "I'm afraid you have cancer", "I'm afraid you have AIDS" and "I'm afraid you have syphilis." Three examples of names of big cities are London, Paris and Los Angeles. Three examples of books to read are "War and Peace", "Mein Kampf" and "The Catcher in the Rye". (Ok, I think it’s got the hang of this 3’s-generating now! LOL)

Last up: I mentioned this to my friend Heather and she asked me how long I get to ‘keep playing with it’. I said “My free trial tokens expire May 15, 2021, and I’ve already used 6000 out of 300k tokens. So it’s basically like joining the Chuck E. Cheese club.” She said, “Nice! I wonder if you can teach it to win at skee ball 😅” Which means now I have to do…

Three examples of things you’d find at Chuck E. Cheese are video games, a ball pit and tokens. Three examples of dinosaurs are Tyrannosaurus Rex, Triceratops and Velociraptor. (NOT BAD! I’m going to go again, and if it mentions skee ball, WE’RE DONE.)

  • tokens, pizza and soda. (good)
  • Chuck E. Cheese himself, his brother, and a clown. (Ohhh-kay that’s enough for now! haha)

Discussion (Preliminary Thoughts)

“Surely this is not that ‘intelligent’ of a task,” you say. “All it does is generate words that have some suitably high probability of being associated with the word in the prompt. Since it was trained on the internet, such lists-of-things-in-a-category probably already exist as web pages somewhere anyway.”

Right, one could imagine scraping a ton of web pages for text and then training some kind of non-Transformer-based method, maybe even a Bayesian bag-of-words model (like in a simple spam detector) to rank “related words” and pick the top choices.
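To make that hypothetical baseline concrete, here is about the crudest version I can think of: plain co-occurrence counting over a bag of words. It's not properly Bayesian (there are no priors or smoothing), the four-sentence corpus is invented, and `related_words` is a name I made up, so take it as a toy.

```python
from collections import Counter

# Invented toy "web scrape"; a real one would be millions of pages.
TOY_CORPUS = [
    "fruits such as apples oranges and bananas",
    "birds such as crows sparrows and quails",
    "apples and oranges are fruits",
    "crows are clever birds",
]
STOPWORDS = {"such", "as", "and", "are"}


def related_words(category: str, corpus: list, top_k: int = 3) -> list:
    """Rank words by how often they share a document with the category word."""
    counts = Counter()
    for document in corpus:
        words = document.split()
        if category in words:
            counts.update(
                w for w in words if w != category and w not in STOPWORDS
            )
    return [word for word, _ in counts.most_common(top_k)]


print(related_words("fruits", TOY_CORPUS))
print(related_words("birds", TOY_CORPUS))
```

Notice what a counter like this can't do: it has no idea whether to answer with nouns, verbs or gerunds, and for birds it will happily rank "clever" alongside the actual birds.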

That’s not exactly what’s happened here. Yes, GPT-3 was trained on tons of web pages, yes it “knows” (has a probabilistic model for) word associations, but also…

It is a “few shot learner,” so it essentially instantly starts giving me outputs like what I want from just the one or two examples in the prompt, without having been specifically trained (fine-tuned) on the task I want it to do. The fact that it gets singular vs. plural right, and verbs vs. nouns, and gerunds correct in its answers is noteworthy.

So far, this is just my first night of access to the API, and it blows away what I could get GPT-2 to do. One can try out various GPT-2 model sizes using the friendly interfaces at HuggingFace.com. For the smallest GPT-2 model, I got “…Three examples of dogs are cats.” (Womp womp!) But the largest GPT-2 model size, with suitably adjusted parameters, could generate 3 dog breeds correctly or 3 U.S. Presidents. Any of the more “challenging,” “speculative” categories (e.g. “things you might…”) give even the largest GPT-2 at HuggingFace a hard time:

Three examples of things that would get you fired are the fire alarm, the bathroom sink and the bathroom mirror. (?? And the other various completions generated by GPT-2 when I re-ran it were similarly irrelevant.)

One further possible remark to my post here: “Yea but this isn’t really playing Scattergories(tm). Scattergories involves rolling the dice and getting a letter and having to name a bunch of things that start with that letter.” Pace. I was never actually wanting to replicate Scattergories(tm) itself; the game was a placeholder for the idea of “categorical reasoning.” Could we get GPT-3 to actually play Scattergories? …I’m not…sure…I…even care? I’m not sure how you’d write the prompt for that, but I’m open to trying. For now, at just my first night of trying things out, this seems remarkably encouraging.

So…based on experience with AI: if the initial trial is encouraging, that means there are a TON of failure modes you just haven’t discovered yet. ;-) We can probe for those too, some other time.

There’s a lot more that one can do to use the API, and so far I haven’t used any of the advanced prompt-making utilities. Still, for the first foray into the OpenAI API, this was a GREAT experience!

References