I’ve had a longstanding, friendly debate with a colleague about whether it is sufficient to provide page images of books, or whether the text should be converted to a machine- and human-readable format such as XML. She argues that converting scanned books to text is expensive and that the primary goal should be to provide access to more material. True, but converting books into a textual format makes them much more accessible, allowing users to search, manipulate, organize, and analyze them. Here’s my summary of what you can do with an electronic text. Most of these advantages are pretty obvious, but they are worth articulating.
• Read it: on paper (once you print it out or pay for on-demand printing), on your computer, or, increasingly, on a portable device. From a single XML file, you can generate many forms of output, including HTML, PDF, and formats suited to mobile devices.
• Copy and paste it, and avoid the hassle of retyping passages.
• Search it. Several years ago, I wrote a series of learning modules on stereographs, 3D photographs popular in the late 19th and early 20th centuries. I searched for books and articles on stereographs in the library catalog and in journal collections such as JSTOR, but was disappointed by the lack of relevant information. Last year I returned to the topic and used Google Books for my research. I found dozens more relevant sources, such as key theoretical and historical works on stereography (most of which had already been published when I first studied the topic) as well as some fascinating nineteenth- and early twentieth-century manuals. Sure, I had to wade through many more results to find what I needed, but being able to search the contents of books and essays as well as the metadata let me uncover far more useful material.
• Build a personal collection. Forget file cabinets crammed with photocopies. Using tools such as Zotero and EndNote, you can easily download articles and the accompanying bibliographic information onto your laptop, then take your entire collection with you on a plane, to an archive, to a boring meeting, etc. You can search your collection, sort it, create bibliographies, etc.
• Share it. Much to the chagrin of movie studios and record companies, digital files are easy to share, so you can give colleagues access to articles, notes, bibliographies, etc. without having to deal with physical delivery (copyright permitting, of course). With the forthcoming Zotero 2.0, sharing will get even easier.
• Analyze it. Once you have a book in a text-based format, you can do all sorts of nifty things with it–generate word counts, find out what terms appear most frequently next to a particular word, extract dates, find capitalized terms, compare texts, and much more. See TAPOR’s tutorial.
• Visualize it. Not only are text visualization tools, well, cool, they also can open up interpretive insights. For instance, using the US Presidential Speeches Tag Cloud, you can get a quick, dynamic view of the history of presidential priorities.
• Mine it. Look for patterns in large textbases. As Loretta Auvil of NCSA & SEASR explains, text mining tools such as those being developed by MONK and SEASR enable researchers to automatically classify texts according to characteristics such as genre, identify patterns such as repetition (as in the case of Stein’s The Making of Americans), analyze literary inheritance, and study themes across thousands of texts.
• Remix & play with it. By taking the elements of a text or collection of texts and remixing them, you not only produce a new creative work, but also see the text in a new way: your attention is drawn to particular linguistic elements, like the fragments of a broken vase used to make a mosaic. For instance, when I used the Open Wound “language mixing tool” with Melville’s 1855 sketch “The Paradise of Bachelors and the Tartarus of Maids,” I gained new insights into the violence and anxiety expressed by words such as “agony,” “cut,” and “defective.” Running the tool on the sketch also produced some stunning phrases that could serve as mottoes for this kind of activity: “Exposed are the cutters,” “in the meditation onward,” and “protecting through the scholarship.” I also plan to play with tools that would allow me to mash up several bachelor texts (take the beginning from Irving, the middle from Melville and Hawthorne, the end from Mitchell), replace key words with pictures, etc.
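To make the first item in the list a little more concrete, here is a rough sketch of generating two outputs (an HTML fragment and plain text) from one XML source. The element names (`book`, `title`, `chapter`, `head`, `p`) and the sample content are invented for illustration; a real project would more likely use an established schema and a fuller transformation pipeline.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample markup; the element names are assumptions, not a standard.
SAMPLE_XML = """<book>
  <title>Sample Book</title>
  <chapter>
    <head>Chapter 1</head>
    <p>Texts can be searched &amp; analyzed.</p>
    <p>Page images cannot.</p>
  </chapter>
</book>"""


def to_html(root):
    """Render the parsed book as a small HTML fragment."""
    parts = [f"<h1>{html.escape(root.findtext('title', default=''))}</h1>"]
    for chapter in root.iter("chapter"):
        head = chapter.findtext("head", default="")
        parts.append(f"<h2>{html.escape(head)}</h2>")
        for para in chapter.iter("p"):
            # Escape text so characters like & stay valid in HTML.
            parts.append(f"<p>{html.escape(para.text or '')}</p>")
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same parsed book as plain text."""
    lines = [root.findtext("title", default="").upper(), ""]
    for chapter in root.iter("chapter"):
        lines.append(chapter.findtext("head", default=""))
        for para in chapter.iter("p"):
            lines.append(para.text or "")
        lines.append("")
    return "\n".join(lines).strip()


root = ET.fromstring(SAMPLE_XML)
html_output = to_html(root)
text_output = to_plain_text(root)
print(html_output)
print(text_output)
```

The point is only that both outputs come from the same parsed tree, so a correction made once in the XML shows up in every format.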
Some really interesting research is underway on the possibilities of text mining for humanities scholarship, including the aforementioned MONK and SEASR projects, as well as CHNM’s “Scholarship in the Age of Abundance: Enhancing Historical Research With Text-Mining and Analysis Tools.”
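For a sense of how simple the basic operations in the “Analyze it” item can be, here is a minimal sketch in Python using only the standard library. It is not how TAPoR, MONK, or SEASR work; the sample passage is invented, the function names are my own, and the tokenizer is deliberately naive (it lowercases everything, drops numbers and punctuation, and ignores sentence boundaries).

```python
import re
from collections import Counter

# Invented sample passage, used only to exercise the functions below.
SAMPLE = (
    "The old bachelor read by the fire in 1850. "
    "The bachelor dreamed, and the fire burned low. "
    "By 1855 the old house was quiet."
)


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word token appears."""
    return Counter(tokenize(text))


def neighbors(text, target, window=1):
    """Count the tokens appearing within `window` positions of `target`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


def find_years(text):
    """Extract four-digit years from 1500 to 2099, a crude stand-in for date extraction."""
    return re.findall(r"\b(?:1[5-9]|20)\d{2}\b", text)


print(word_counts(SAMPLE).most_common(3))
print(neighbors(SAMPLE, "fire"))
print(find_years(SAMPLE))
```

None of this is possible with page images alone, which is really the argument of this post: once the words are available as text, even a few lines of code can start asking questions of them.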