What can Machine Learning do for Literary Critics?

First in a series of posts about artificial intelligence sparked by “The Great AI Awakening,” an article from December 2016 by Gideon Lewis-Kraus in the New York Times Magazine. Cross-posted to Michael Ullyot’s blog.

Can you trust machines to make decisions on your behalf? You’re doing it already, when you trust the results of a search engine or follow directions on your phone or read news on social media that confirms your worldview. It’s so natural that you forget it’s artificial; someone programmed a machine to make it happen. If Arthur C. Clarke is right (“any sufficiently advanced technology is indistinguishable from magic”), we’re living in the age of magical thinking.

We’re delegating ever more decisions to algorithms, from a matchmaker identifying your soulmate to a car killing either you or those pedestrians. Our thinking will only get more magical when our machines learn to make better decisions, in more parts of our lives.

Gideon Lewis-Kraus makes these invisible processes more visible. He tells the story of major improvements of the Google Translate algorithm, which can easily make distracting errors if it’s overly literal. His example is “minister of agriculture” rendered as “priest of farming” — a phrase that native speakers find strange, but that a highly literal translating machine might not. The Google Brain team built something called a “deep learning software system” to imitate the way that neural networks make reliable decisions. (The third section of his article, “A Deep Explanation of Deep Learning,” is an accessible introduction about recognizing images of cats.)

So what’s all that got to do with being a literary critic? First, my method is to use machines in just the first stage of criticism, the gathering of examples to make into arguments. If I want to find all the lines in which Shakespeare discusses the weather, I start by defining ‘weather’ in terms a machine can understand, and then point that machine at the Complete Works of Shakespeare and say ‘go find me lines about this.’ Once it’s finished I take over, reading each line to see if it’s (1) actually about the weather — and if not, tweaking my definition and repeating the process — and (2) interesting enough to work into an argument about Shakespeare and — I don’t know, melancholy clouds or something.
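To make that first stage concrete, here is a minimal sketch of the example-gathering step. The seed vocabulary, function name, and sample lines are my own illustration, not the actual tool:

```python
import re

# Hypothetical seed vocabulary: a machine-readable stand-in for 'weather'.
WEATHER_TERMS = {"rain", "storm", "tempest", "thunder", "cloud", "wind", "snow"}

def gather_candidate_lines(text, terms):
    """Return (line_number, line) pairs containing any search term,
    for a human reader to vet afterward."""
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\w*\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line))
    return hits

sample = ("Blow, winds, and crack your cheeks! rage! blow!\n"
          "You cataracts and hurricanoes, spout")
print(gather_candidate_lines(sample, WEATHER_TERMS))
```

Notice that the second line of Lear’s storm speech slips through: ‘hurricanoes’ isn’t in my seed list. That is exactly the kind of miss that sends me back to tweak the definition and run the process again.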

What I do is called augmented criticism, and you can read all about it here, and also here, if you like. My collaborator Adam J. Bradley has designed algorithms to give us vast numbers of rhetorical figures, which are sometimes called figures of speech. (This Wikipedia article has an impressive list, but the definitive resource is Gideon O. Burton’s Silva Rhetoricæ.) Bradley and I have been able to make comprehensive arguments about figures like gradatio in early modern drama — far more comprehensive, that is, than we could make without the machine gathering examples of this figure for us. (And you can read all about those arguments in our two papers, cited at the end of this article.)

Herding Tigers

Lewis-Kraus’s description of Google’s neural network got me thinking about expanding our approach. What struck me was his distinction between rule-based and data-based machine learning, which (I take it) is the difference between shallow and deep learning. Evidently there’s been a historical debate between creationism and evolutionary theory, and it’s not the one you’re thinking of.

When you give a machine a problem, whether it’s finding cats in YouTube videos or finding rhetorical figures in early modern drama, you have two options. The first is the creationist, rule-based approach: you give your machine all the rules of cat-finding (pointy ears, four legs, whiskers, … ) and set it loose on the data. The advantage is that you’ll find every cat that conforms to your definition of a cat, and you’ll find them quickly and reliably; few dogs or kangaroos will meet that definition.
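A toy version of that rule-based approach, with invented features purely for illustration:

```python
# A toy rule-based 'cat-finder': the feature names are my invention.
CAT_RULES = {"pointy_ears": True, "legs": 4, "whiskers": True}

def is_cat(animal):
    """An animal counts as a cat only if it satisfies every rule."""
    return all(animal.get(feature) == value for feature, value in CAT_RULES.items())

cat = {"pointy_ears": True, "legs": 4, "whiskers": True}
kangaroo = {"pointy_ears": True, "legs": 2, "whiskers": False}
print(is_cat(cat), is_cat(kangaroo))  # fast and reliable, but brittle
```

The brittleness is the catch: a Manx cat photographed at an odd angle fails a rule and is missed entirely, because the rules never change.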

The second option is the evolutionary, data-based approach: you start with the data, and see what the machine makes of it. It’ll find plenty of dogs and kangaroos, but here’s the important part: each time it does, you manually correct it and run the process again. Next time, it remembers that some of its components are more reliable at identifying cats, so the ‘votes’ they cast for cat over kangaroo are weighted more heavily. “[I]t functions, in a way, like a sort of giant machine democracy” in which “each individual unit can contribute differently to different desired outcomes.” (Read the article for more details, like how this mimics the brain’s neural network.)
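The weighted ‘votes’ can be sketched in a few lines. This toy is my illustration of the idea, not Google’s system: several weak detectors vote, and each supervised correction shrinks the weight of every detector that voted wrongly.

```python
def weighted_vote(detectors, weights, example):
    """Predict 'cat' if the weighted votes exceed half the total weight."""
    score = sum(w for d, w in zip(detectors, weights) if d(example))
    return score > sum(weights) / 2

def correct(detectors, weights, example, label, penalty=0.5):
    """Supervision step: halve the weight of each detector that erred."""
    return [w * penalty if d(example) != label else w
            for d, w in zip(detectors, weights)]

detectors = [
    lambda a: a["whiskers"],      # a fairly reliable cue
    lambda a: a["legs"] == 4,     # also fires on dogs
    lambda a: a["hops"],          # a bad cue: it favours kangaroos
]
weights = [1.0, 1.0, 1.0]
kangaroo = {"whiskers": False, "legs": 2, "hops": True}

weights = correct(detectors, weights, kangaroo, label=False)
print(weights)  # the hopping detector's vote now counts for less
```

After enough corrections, the unreliable cues are effectively outvoted, which is the ‘machine democracy’ in miniature.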

There are advantages to the second approach, the first being that it just makes intuitive sense. “Humans don’t learn to understand language by memorizing dictionaries and grammar books,” Lewis-Kraus asks, “so why should we possibly expect our computers to do so?” The disadvantage is that it’s really slow, especially at first when you constantly have to supervise its learning (or correct its errors). But it’s far more flexible, less brittle, than rule-based systems.

The Zoo and the Wild

How so? Consider the problem of identifying bigger cats, out in the wild. You could start by looking at every tiger in the zoo and then going into the wild to look for every four-legged whiskered carnivore who looks and acts like the ones we know, whom we can label as ‘tiger.’ That works well enough — but it assumes that every tiger in the wild will resemble the ones that are back in the zoo. Usually that’s a safe assumption, but how do we know we’re not missing new tiger subspecies?

Rhetoric is a problem of identifying all the cats in the wild, not just admiring the ones in captivity. It’s about natural habitats, not dioramas — because rhetoric is persuasive and dynamic and fluid. Sure, it has a few rules and conventions (a lot, actually) — but its real purpose is to impress people with its beauty and overpower them with its arguments. To understand the breadth of those arguments, and the variety of that beauty, we ought to look at all the four-legged whiskered carnivores, whether or not they resemble the ones in the zoo.

At the Augmented Criticism Lab, we have designed our algorithms on the rule-based model rather than the data-based model. We developed a formula (or rule) for figures like gradatio (… A, A … B, B …), based on every example we could find, and then we asked the computer to identify patterns that exactly fit that formula. And we get great results when the rhetorical figures in the wild look just like the ones we’ve seen before, in captivity.
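The shape of such a formula can be sketched as a clause-chaining check: the last word of one clause reappears at the head of the next, for at least two links. Everything here (the splitting on punctuation, the stopword list, the link threshold) is my simplification for illustration, not the Lab’s actual formula:

```python
import re

STOPWORDS = {"and", "but", "then", "so", "the", "a"}

def find_gradatio(sentence, min_links=2):
    """Detect a gradatio-like chain (… A, A … B, B …): the last word of
    one clause opens the next, for at least `min_links` consecutive clauses."""
    clauses = [c.strip() for c in re.split(r"[,;:.]", sentence) if c.strip()]
    links = []
    for left, right in zip(clauses, clauses[1:]):
        last = left.split()[-1].lower()
        # skip leading connectives like 'and' when checking the next clause
        head = [w.lower() for w in right.split() if w.lower() not in STOPWORDS]
        if head and head[0] == last:
            links.append(last)
    return links if len(links) >= min_links else []

verse = ("tribulation worketh patience; and patience, experience; "
         "and experience, hope")
print(find_gradatio(verse))  # ['patience', 'experience']
```

A strict formula like this finds every chain that fits it exactly, and nothing that doesn’t — which is precisely the strength, and the limit, described above.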

But our next step is to add a more data-based, evolutionary approach to our toolbox. It’ll start slowly, with lots of labelled examples of data, and training sets for our neural network to examine — like showing a child picture-books of tigers, before she graduates to more grown-up zoological textbooks of tiger subspecies.

And where will it go from there? We’ll gather evidence that more rhetorical figures exist than we knew, and that they interact with each other in ways, and across genres and texts, that we never expected. Not only will we find the lions and lynxes and panthers and all the other catlike creatures; we’ll also track down every last tiger and every subspecies: the Sumatran, the Malayan, and the Amur. The outcome will be a better understanding of where figures live and how they interact in the textual wilderness.

Further Reading

Bradley, Adam J. and Michael Ullyot, “Past Texts, Present Tools, and Future Critics: Toward Rhetorical Schematics.” In Shakespeare’s Language in Digital Media: Old Words, New Tools. Ed. Jennifer Roberts-Smith, Janelle Jenstad, and Mark Kaethler. London: Routledge, 2017.

Bradley, Adam J. and Michael Ullyot, “Human Creativity vs Machine Requirements: Computational Rhetoric and the Search for Gradatio.” Argument and Computation. (In submission 2017.)