How can text analysis tools enable researchers to study the relationships between texts? In an earlier post, I speculated about the relevance of such tools for understanding “literary DNA”–how ideas are transmitted and remixed–but as one reader observed, intertextuality is probably a more appropriate way of thinking about the topic. In my dissertation, I argue that Melville’s Pierre represents a dark parody of Mitchell’s Reveries of a Bachelor. Melville takes the conventions of sentimental bachelor literature, mixes in elements of the Gothic and philosophic/theological tracts, and produces a grim travesty of bachelor literature that makes the dreaming bachelor a trapped quasi-husband, replaces the rural domestic manor with a crowded urban apartment building, and ends in a real, Hamlet-intense death scene rather than the bachelor coming out of reverie or finding a wife. Would text analysis tools support this analysis, or turn up patterns that I had previously ignored?
I wanted to get a quick visual sense of the two texts, so I plugged them into Wordle, a nifty word cloud generator that enables you to control variables such as layout, font and color. (Interestingly, Wordle came up with the perfect visualizations for each text at random: Pierre white type on a black background shaped into, oh, a chess piece or a tombstone, Reveries a brighter, more casual handwritten style, with a shape like a fish or egg.)
Using these visual representations of the most frequent words in each book enabled me to get a sense of the totality, but then I also drilled down and began comparing the significance of particular words. I noted, for instance, the importance of “heart” in Reveries, which is, after all, subtitled “A Book of the Heart.” I also observed that “mother” and “father” were given greater weight in Pierre, which is obsessed with twisted parental legacies. To compare the books in even more detail, I decided to make my own mashed up word cloud, placing terms that appeared in both texts next to each other and evaluating their relative weight. I tried to group similar terms, creating a section for words about the body, words about feeling, etc. (I used crop, copy and paste tools in PhotoShop to create this mashup, but I’m sure–or I sure hope–there’s a better way.
(About three words into the project, I wished for a more powerful tool to automatically recognize, extract and group similar words from multiple files, since my eyes ached and I had a tough time cropping out words without also grabbing parts of nearby words. Perhaps each word would be a tile that you drag over to a new frame and move around; ideally, you could click on the word and open up a concordance) My mashup revealed that in many ways Pierre and Reveries have similar linguistic profiles. For instance, both contain frequently-occurring words focused on the body (face, hand, eye), time (morning, night), thinking, feeling, and family. Perhaps such terms are common in all literary works (one would need to compare these works to a larger literary corpus), but they also seem to reflect the conventions of sentimental literature, with its focus on the family and embodied feeling (see, for instance, Howard).
The word clouds enabled me to get an initial impression of key words in the two books and the overlap between them, but I wanted to develop a more detailed understanding. I used TAPOR’s Comparator to compare the two texts, generating a complete list of how often words appeared in each text and their relative weighting. When I first looked at the the word list, I was befuddled:
|Words||Reveries counts||Reveries relative counts||Pierre relative||Pierre counts||Relative ratio Reveries:Pierre
What does the relative ratio mean? I was starting to regret my avoidance of all math and stats courses in college. But after I worked with the word clouds, the statistics began to make more sense. Oh, relative ratio means how often a word appears in the first text versus the second–”blaze” is much more prominent in Reveries. Ultimately I trusted the concreteness and specificity of numbers more than the more impressionistic imagery provided by the word cloud, but the word cloud opened up my eyes so that I could see the stats more meaningfully. For instance, I found that mother indeed was more significant in Pierre, occurring 237 times vs. 58 times in Reveries. Heart was more important in Reveries (a much shorter work), appearing 199 times vs. 186 times in Pierre. I was surprised that “think” was more significant in Reveries than in Pierre, given the philosophical orientation of the latter. With the details provided by the text comparison results, I could construct an argument about how Melville appropriates the language of sentimentality.
But the differences between the two texts are perhaps even more interesting than their similarities, since they show how Melville departed from the conventions of male sentimentalism, embraced irony, and infused Pierre with a sort of gothic spirtualism. These differences are revealed more fully in the statistics than the word clouds. A number of terms are unique to each work. For instance, sentimental terms such as “sympathies,” “griefs,” “sensibility” appear frequently in Reveries but never in Pierre, as do romantic words such as “flirt,” “sparkle,” and “prettier.” As is fitting for Melville, Pierre‘s unique language is typically darker, more archaic, abstract, and spiritual/philosophical, and obsessed with the making of art: “portrait,” “writing,” “original,” “ere,” “miserable,” “visible,” “invisible,” “profound(est),” “final,” “vile,” “villain,” “minds,” “mystical,” “marvelous,” “inexplicable,” “ambiguous.” (Whereas Reveries is subtitled “A Book of the Heart,” Pierre is subtitled “The Ambiguities.”) There is a strand of darkness in Mitchell–he uses “sorrow” more than Melville–but then Mitchell uses “pleasure” 14 times to Melville’s 2 times and “pleasant” 43 times. Reveries is more self-consciously focused on bachelorhood; Mitchell uses “bachelor” 28 times to Melville’s 5. Both authors refer to dreaming; Mitchell uses “reveries” 10 times, Melville 7. Interestingly, only Melville uses “America” (14 times).
Looking over the word lists raises all sorts of questions about the themes and imagery of each work and their relationship to each other, but the data can also be overwhelming. If comparing two works yields over 10,000 lines in a spreadsheet, what criteria should you use in deciding what to select (to use Unsworth’s scholarly primitive)? What happens when you throw more works into the mix? I’m assuming that text mining techniques will provide more sophisticated ways of evaluating textual data, allowing you to filter data and set preferences for how much data you get. (I should note that you can exclude terms and set preferences in TAPOR).
Text analysis brings attention to significant features of a text by abstracting those features–for instance, by generating a word frequency list that contains individual words and the number of times they appear. But I kept wondering how the words were used, in what context they appeared. So Melville uses “mother” a lot–is it in a sweetly sentimental way, or does he treat the idea of mother more complexly? By employing TAPOR’s concordance tool, you can view words in context and see that Mitchell often uses mother in association with words like “heart,” “kiss,” “lap,” while in Melville “mother” does appear with “Dear” and “loving,” but also with “conceal,” “torture,” “mockingly,” “repelling,” “pride,” “cruel.” Hmmm. In Mitchell, “hand” most often occurs with “your” and “my,” signifying connection, while “hand” in Pierre is more often associated with action (hand-to-hand combat, “lift my hand in fury,” etc) or with putting hand to brow in anguish. Same word, different resonance. It’s as if Melville took some of the ingredients of sentimental literature and made something entirely different with them, enchiladas mole rather than a chocolate cake.
Word clouds, text comparisons, and concordances open up all sorts of insights, but how does one use this evidence in literary criticism? If I submitted an article full of word count tables to a traditional journal, I bet the editors wouldn’t know what to do with it. But that may change, and in any case text analysis can inform the kind of arguments critics make. My experience playing with text analysis tools verifies, for me, Steve Ramsay’s recommendation that we “reconceive computer-assisted text analysis as an activity best employed not in the service of a heightened critical objectivity, but as one that embraces the possibilities of that deepened subjectivity upon which critical insight depends.”
Howard, June. “What Is Sentimentality?.” American Literary History 11.1 (1999): 63-81. 22 Jun 2008 <http://alh.oxfordjournals.org/cgi/content/citation/11/1/63>.
Ramsay, Stephen. “Reconceiving Text Analysis: Toward an Algorithmic Criticism.” Lit Linguist Computing 18.2 (2003): 167-174. 27 Nov 2007 <http://llc.oxfordjournals.org/cgi/content/abstract/18/2/167>.