In this final installment of my summary of Digital Humanities in 2008, I’ll discuss developments in digital humanities research. (I should note that if I attempted to give a true synthesis of the year in digital humanities, this would be coming out 4 years late rather than 4 months, so this discussion reflects my own idiosyncratic interests.)
1) Defining research challenges & opportunities
What are some of the key research challenges in digital humanities? Leading scholars tackled this question when CLIR and the NEH convened a workshop on Promoting Digital Scholarship: Formulating Research Challenges In the Humanities, Social Sciences and Computation. Prior to the workshop, six scholars in classics, architectural history, physics/information sciences, literature, visualization, and information retrieval wrote brief overviews of their fields and of the ways that information technology could help to advance them. By articulating the central concerns of their fields so concisely, these essays promote interdisciplinary conversation and collaboration; they’re also fun to read. As Doug Oard writes in describing the natural language processing “tribe,” “Learning a bit about the other folks is a good way to start any process of communication… The situation is really quite simple: they are organized as tribes, they work their magic using models (rather like voodoo), they worship the word ‘maybe,’ and they never do anything right.” Sounds like my kind of tribe. Indeed, I’d love to see a wiki where experts in fields ranging from computational biology to postcolonial studies write brief essays about their fields, provide a bibliography of foundational works, and articulate both key challenges and opportunities for collaboration. (Perhaps such information could be aggregated automatically using semantic technologies; see, for instance, Concept Web or Kosmix. Still, I admire the often witty, personal voices of these essays.)
Here are some key ideas that emerge from the essays:
- Global Humanistic Studies: Caroline Levander and the team of Greg Crane, Alison Babeu, David Bamman, Lisa Cerrato, and Rashmi Singhal both call for a kind of global humanistic studies, whether reconceiving American studies from a hemispheric perspective or reconsidering the Persian Wars from the Persian point of view. Scholars working in global humanistic studies face significant challenges, such as the need to read texts in many languages and understand multiple cultural contexts. Emerging technologies promise to help scholars address these problems. For instance, named entity extraction, machine translation, and reading support tools can help scholars make sense of works that would otherwise be inaccessible to them; visualization tools can enable researchers “to explore spatial and temporal dynamism”; and collaborative workspaces allow scholars to divide up work, share ideas, and approach a complex research problem from multiple perspectives. Moreover, a shift toward openly accessible data will enable scholars to identify and build on relevant work more easily. Describing how reading support tools enable researchers to work more productively, Crane et al. write, “By automatically linking inflected words in a text to linguistic analyses and dictionary entries we have already allowed readers to spend more time thinking about the text than was possible as they flipped through print dictionaries. Reading support tools allow readers to understand linguistic sources at an earlier stage of their training and to ask questions, no matter how advanced their knowledge, that were not feasible in print.” We can see a similar intersection between digital humanities and global humanities in projects like the Global Middle Ages.
- What skills do humanities scholars need? Doug Oard suggests that humanities scholars should collaborate with computer scientists to define and tackle “challenge problems” so that the development of new technologies is grounded in real scholarly needs. Ultimately, “humanities scholars are going to need to learn a bit of probability theory” so that they can understand the accuracy of automatic methods for processing data, the “science of maybe.” How does probability theory jibe with humanistic traditions of ambiguity and interpretation? And how are humanities scholars going to learn these skills?
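The reading support tools that Crane et al. describe work by linking each inflected word in a text to a linguistic analysis and a dictionary entry. Here is a minimal sketch of that linking step, assuming a hypothetical toy lexicon covering only the opening words of the Aeneid. The names `MORPH`, `DICTIONARY`, and `annotate` are illustrative, not Perseus’s actual implementation. A real morphological analyzer would often return several candidate analyses for a single form, which is exactly where Oard’s “maybe” enters the picture.

```python
import re

# Hypothetical toy morphological lexicon: inflected form -> (lemma, analysis).
# A real reading-support tool would rely on a full morphological analyzer.
MORPH = {
    "arma": ("arma", "noun, neuter plural nominative/accusative"),
    "virumque": ("vir", "noun, masculine accusative singular + enclitic -que"),
    "cano": ("cano", "verb, 1st person singular present active indicative"),
}

# Hypothetical toy dictionary keyed by lemma.
DICTIONARY = {
    "arma": "arms, weapons",
    "vir": "man, hero",
    "cano": "to sing, to celebrate in song",
}

def annotate(text):
    """Link each word to its lemma, analysis, and gloss (None when unknown)."""
    links = []
    for word in re.findall(r"\w+", text.lower()):
        lemma, analysis = MORPH.get(word, (None, None))
        gloss = DICTIONARY.get(lemma) if lemma else None
        links.append((word, lemma, analysis, gloss))
    return links

for word, lemma, analysis, gloss in annotate("Arma virumque cano"):
    print(f"{word:10} -> {lemma} ({analysis}): {gloss}")
```

The reader never has to work out that *virumque* comes from *vir*. The lookup does that, which is the time savings the quotation describes.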
The workshop identified the following major research challenges for the digital humanities:
- “Scale and the poverty of abundance”: developing tools and methods to deal with the plenitude of data, including text mining and analysis, visualization, data management and archiving, and sustainability.
- Representing place and time: figuring out how to support geo-temporal analysis and enable that analysis to be documented, preserved, and replicated.
- Social networking and the economy of attention: understanding research behaviors online and analyzing text corpora based on these behaviors (e.g., citation networks).
- Establishing a research infrastructure that facilitates access, interdisciplinary collaboration, and sustainability. “As one participant asked, ‘What is the Protein Data Bank for the humanities?’”
2) High performance computing: visualization, modeling, text mining
What are some of the most promising research areas in digital humanities? In a sense, the three recent winners of the NEH/DOE’s High Performance Computing Initiative define three of the main areas of digital humanities and demonstrate how advanced computing can open up new approaches to humanistic research.
- Text mining and text analysis: For its project on “Large-Scale Learning and the Automatic Analysis of Historical Texts,” the Perseus Digital Library at Tufts University is examining how words in Latin and Greek have changed over time by comparing the linguistic structure of classical texts with works written over the last 2,000 years. In the press release announcing the winners, David Bamman, a senior researcher in computational linguistics with the Perseus Project, said that “[h]igh performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin.”
- 3D modeling: For its “High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage” project, the Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia will reprocess existing data to create 3D models of culturally significant artifacts and architecture. For example, IATH hopes to reassemble fragments that have broken off ancient Greek and Roman artifacts.
- Visualization and cultural analysis: The University of California, San Diego’s Visualizing Patterns in Databases of Cultural Images and Video project will study contemporary culture, analyzing datastreams such as “millions of images, paintings, professional photography, graphic design, user-generated photos; as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos.” Ultimately, the project will produce detailed visualizations of cultural phenomena.
Winners received compute time on a supercomputer and technical training.
Of course, there’s more to digital humanities than text mining, 3D modeling, and visualization. For instance, the category listing for the Digital Humanities and Computer Science conference at Chicago reveals the diversity of participants’ fields of interest. Top areas include text analysis; libraries/digital archives; imaging/visualization; data mining/machine learning; information retrieval; semantic search; collaborative technologies; electronic literature; and GIS mapping. A simple analysis of the most frequently appearing terms in the Digital Humanities 2008 Book of Abstracts suggests that much research continues to focus on text, which makes sense given the importance of written language to humanities research. Here’s TAPOR’s list of the 10 most frequently used terms in the DH 2008 abstracts:
- text: 769
- digital: 763
- data: 559
- information: 546
- humanities: 517
- research: 501
- university: 462
- new: 437
- texts: 413
- project: 396
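A term-frequency list like TAPOR’s comes from tokenizing the text, dropping common function words, and counting what remains. Here is a minimal sketch of that idea, not TAPOR’s actual code. The stopword list and sample sentence are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; real text-analysis tools ship much longer ones.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "as"}

def top_terms(text, n=10):
    """Return the n most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Ties are ordered by first occurrence, per Counter.most_common.
    return counts.most_common(n)

sample = ("Digital humanities research on text: the text of a digital archive "
          "and the texts of a new project.")
print(top_terms(sample, 3))
```

Because the sketch does no stemming, “text” and “texts” are counted as separate terms. The DH 2008 list shows the same split.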
And here’s the word cloud. As someone who got started in digital humanities by marking up texts in TEI, I’m always interested in learning about developments in encoding, analyzing and visualizing texts, but some of the coolest sessions I attended at DH 2008 tackled other questions: How do we reconstruct damaged ancient manuscripts? How do we archive dance performances? Why does the digital humanities community emphasize tools instead of services?
3) Focus on method
As digital humanities emerges, much attention is being devoted to developing research methodologies. In “Sunset for Ideology, Sunrise for Methodology?,” Tom Scheinfeldt suggests that humanities scholarship is beginning to tilt toward methodology, that we are entering a “new phase of scholarship that will be dominated not by ideas, but once again by organizing activities, both in terms of organizing knowledge and organizing ourselves and our work.”
So what are some examples of methods developed and/or applied by digital humanities researchers? In “Meaning and mining: the impact of implicit assumptions in data mining for the humanities,” Bradley Pasanek and D. Sculley tackle methodological challenges posed by mining humanities data, arguing that literary critics must devise standards for making arguments based upon data mining. Through a case study testing Lakoff’s theory that political ideology is defined by metaphor, Pasanek and Sculley demonstrate that the selection of algorithms and representation of data influence the results of data mining experiments. Insisting that interpretation is central to working with humanities data, they concur with Steve Ramsay and others in contending that data mining may be most significant in “highlighting ambiguities and conflicts that lie latent within the text itself.” They offer some sensible recommendations for best practices, including making assumptions about the data and texts explicit; using multiple methods and representations; reporting all trials; making data available and experiments reproducible; and engaging in peer review of methodology.
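Pasanek and Sculley’s point that the representation of data shapes the result is easy to demonstrate on a toy scale. The sketch below is a hypothetical illustration, not their experiment: it compares one passage against two candidates using cosine similarity. First it uses raw word counts, then binary presence/absence. The passages and function names are invented for the example, with vocabulary that loosely echoes Lakoff’s family metaphors.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def counts(text):
    """Raw term-frequency representation."""
    return dict(Counter(text.split()))

def binary(text):
    """Presence/absence representation: each distinct term gets weight 1."""
    return {t: 1 for t in set(text.split())}

# Hypothetical toy passages (not Pasanek and Sculley's data).
query = "father father father discipline"
candidates = {
    "A": "father father father father father father",
    "B": "father discipline nurture",
}

for name, rep in [("raw counts", counts), ("binary", binary)]:
    q = rep(query)
    best = max(candidates, key=lambda c: cosine(q, rep(candidates[c])))
    print(f"{name}: closest passage is {best}")
```

Under raw counts, passage A wins because it repeats the dominant word. Under the binary representation, B wins because it shares more distinct terms. Neither answer is “correct,” which is why reporting every representation and trial matters.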
4) Digital literary studies
Different methodological approaches to literary study are discussed in the Companion to Digital Literary Studies (DLS), which was edited by Susan Schreibman and Ray Siemens and was released for free online in the fall of 2008. Kudos to its publisher, Blackwell, for making the hefty volume available, along with A Companion to Digital Humanities. The book includes essays such as “Reading digital literature: surface, data, interaction, and expressive processing” by Noah Wardrip-Fruin, “The Virtual Codex from page space to e-space” by Johanna Drucker, “Algorithmic criticism” by Steve Ramsay, and “Knowing true things by what their mockeries be: modelling in the humanities” by Willard McCarty. DLS also provides a handy annotated bibliography by Tanya Clement and Gretchen Gueguen that highlights some of the key scholarly resources in literature, including Digital Transcriptions and Images, Born-Digital Texts and New Media Objects, and Criticism, Reviews, and Tools. I expect that the book will be used frequently in digital humanities courses and will be a foundational work.
5) Crafting history: History Appliances
For me, the coolest work of the year (most innovative, most unexpected, most wow!) came from the ever-inventive Bill Turkel, who is exploring humanistic fabrication (not in the Mills Kelly sense of making up stuff ;), but in the DIY sense of making stuff). Turkel is working on “materialization,” giving a digital representation physical form by using, for example, a rapid prototyping machine, a sort of 3D printer. Turkel points to several reasons why humanities scholars should experiment with fabrication: they can be like da Vinci, connecting mind and hand by realizing an idea in physical form; study the past by recreating historical objects (fossils, historical artifacts, etc.) that can be touched, rotated, and scrutinized; explore “haptic history,” a sensual experience of the past; and engage in “critical technical practice,” where scholars both create and critique.
Turkel envisions making digital information “available in interactive, ambient and tangible forms.” As Turkel argues, “As academic researchers we have tended to emphasize opportunities for dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. We need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment.” Turkel and his team are working on four devices: a dashboard, which shows both public and customized information streams on a large display; imagescapes and soundscapes that present streams of complex data as artificial landscapes or sound, aiding awareness; a GeoDJ, an iPod-like device that uses GPS and GIS to detect your location and deliver audio associated with it (e.g., percussion for an historic industrial site); and ice cores and tree rings, “tangible browsers that allow the user to explore digital models of climate history by manipulating physical interfaces that are based on this evidence.” This work on ambient computing and tangible interfaces promises to foster awareness and open up understanding of scholarly data by tapping people’s natural way of comprehending the world through touch and other forms of sensory perception. (I guess the senses of smell and taste are difficult to include in sensual history, although I’m not sure I want to smell or taste many historical artifacts or experiences anyway. I would like to re-create the invention of the Toll House cookie, which for me qualifies as an historic occasion.) This approach to humanistic inquiry and representation requires the resources of a science lab or art studio: a large, well-ventilated space as well as equipment like a laser scanner, lathes, mills, saws, calipers, etc.
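The GeoDJ’s core behavior is to detect your location and deliver audio associated with it. That can be sketched as a nearest-neighbor lookup over geotagged clips. This is a minimal sketch under stated assumptions, not Turkel’s implementation. It assumes great-circle (haversine) distance and a fixed trigger radius. The clip names, coordinates, file names, and the `clip_for_location` function are all hypothetical.

```python
import math

# Hypothetical geotagged audio clips: (name, latitude, longitude, audio file).
# Coordinates and file names are illustrative only.
CLIPS = [
    ("Old mill", 43.0096, -81.2737, "percussion_mill.mp3"),
    ("Riverside park", 42.9849, -81.2453, "birdsong.mp3"),
]

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def clip_for_location(lat, lon, radius_m=200):
    """Return the audio file of the nearest clip within radius_m, else None."""
    best = min(CLIPS, key=lambda c: haversine_m(lat, lon, c[1], c[2]))
    if haversine_m(lat, lon, best[1], best[2]) <= radius_m:
        return best[3]
    return None

# A GPS fix roughly 50 m from the hypothetical mill.
print(clip_for_location(43.0100, -81.2740))
```

A fixed radius keeps the device silent between sites. A real GIS layer would more likely use site polygons than points.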
Unfortunately, Turkel has stopped writing his terrific blog “Digital History Hacks” to focus on his new interests, but this work is so fascinating that I’m eager to see what comes next, which describes my attitude toward digital humanities in general.