Exploring the Significance of Digital Humanities for Philosophy

On February 23, I was honored to speak at an Invited Symposium on Digital Humanities at the American Philosophical Association’s Central Division Meeting in New Orleans. Organized by Cameron Buckner, who is a Founding Project Member of InPhO and one of the leaders of the University of Houston’s Digital Humanities Initiative, the session also featured great talks by Tony Beavers on computational philosophy and David Bourget on PhilPapers.


One of the central questions that we explored was why philosophy seems to be less visibly engaged in digital humanities; as Peter Bradley once wondered, “Where Are the Philosophers?” As I noted in my talk, the NEH’s Office of Digital Humanities has only awarded 5 grants in philosophy (4 out of 5 to Colin Allen and colleagues on the InPhO project). Although the APA conference was much smaller than MLA or AHA, I was still surprised that there seemed to be only two sessions on DH, compared to 66 at MLA 2013 and 43 at AHA 2013.

Yet there are some important intersections between DH and philosophy. Beavers pointed to a rich history of scholarship in computational philosophy. With PhilPapers, philosophy is ahead of most other humanities disciplines in having an excellent online index to, and growing repository of, research. Most of the challenges faced by philosophers with an interest in DH, such as figuring out how to acquire appropriate training (particularly for graduate students) and getting collaborative work recognized and rewarded, apply to other domains as well.

My talk was a remix and updating of my presentation “Why Digital Humanities?” In exploring the rationale for DH, I tried to cite examples relevant to philosophy. For example, the Stanford Encyclopedia of Philosophy, a dynamic online encyclopedia that predates Wikipedia, has had a significant impact, with an average of nearly a million weekly accesses during the academic year. With CT2.0, Peter Bradley aims to create a dynamic, modular, multimedia, interactive, community-driven textbook on critical thinking. Openness and collaboration also inform the design of Chris Long and Mark Fisher’s planned Public Philosophy Journal, which seeks to put public philosophy into practice by curating conversations, facilitating open review, encouraging collaborative writing, and fostering open dialogue. Likewise, I described how Transcribe Bentham is enabling the public to help create a core scholarly resource.  I also discussed recent critiques of DH, including Stephen Marche’s “literature is not data,” the 2013 MLA session on the “dark side” of DH, and concerns that DH risks being elitist. I closed by pointing to some useful resources in DH and calling for open conversation among the DH and philosophy communities. With that call in mind, I wonder: Is it the case that philosophy is less actively engaged in digital humanities?  If so, why, and what might be done to address that gap?

20/30 Vision: Scenarios for the Humanities in 2030

[Here is the extended dance remix version of the talk I gave at the 2010 American Studies Association panel on "Facing New Technologies, Exploring New Challenges."]

We seem to be anxious about the future—heck, the present—of the humanities. Consider budget cuts such as those at SUNY-Albany and in the UK, the horrible job market, the declining number of majors, and the frequent appearance of articles with titles like “Can the Humanities Survive the 21st Century?”

Instead of focusing on the present in this panel on “Facing New Technologies, Exploring New Challenges,” I’d like to zoom forward twenty years using a process called scenario planning. Essentially, a scenario is a brief story about the future. By working through such stories, organizations can look at the proverbial big picture and devise strategies for facing critical uncertainties in future environments, such as the nature of technological change, the state of higher education, and globalization.  (Given its emphasis on storytelling and interpretation, scenario planning seems like an approach at home in the humanities.)

Recently both the Association of Research Libraries and the Association of College and Research Libraries issued reports about the future of libraries based on scenario planning. (You might have noticed that libraries are also anxious as they face the transition to digital information.) My favorite of the genre is the State Library of New South Wales’ The Bookends Scenarios, both because it confronts larger challenges such as climate change and because it leavens gloominess with imagination and humor, such as: “Book by James Lovelock Jnr claims that 98% of human race will be extinct by 2100; 78% of people say they wish James Lovelock Jnr would become extinct by 2029.”

Although scenario planning has its skeptics, I can testify to the ways that it can help people break out of their typical ways of seeing and stimulate their imaginations. Just this week, my library held a retreat based on the ARL 2030 Scenarios. Despite some grumbling about the unlikelihood of any of the scenarios coming to pass, participants did think deeply and creatively about risks and opportunities facing academic libraries as research becomes more global, entrepreneurial, and data-driven. The scenarios sparked conversation.

Today I’d like to put forward three scenarios for the future of the humanities. I’m mashing together the aforementioned library scenarios with the Rockefeller Foundation’s Scenarios for the Future of Technology and International Development and Bryan Alexander’s “Stories of the Future: Telling Scenarios,” as well as a dash of David Mitchell’s Cloud Atlas. A few caveats: 1) I’m notoriously bad at predicting the future. (I really thought I would enjoy treats whipped up by a robot chef by now). 2) The scenarios are compressed and partial.   3) The future will most likely not be any one of these scenarios, although it may contain elements of some of them. 4) A diverse community rather than a quirky individual should develop and think through future scenarios.

I aim to open up a conversation, not have the final word. (It might be useful for an organization such as CenterNet, the Association for Computers and the Humanities, or the NEH to take on this exercise in earnest.) The core question that I want to explore: how can we transform the humanities so that they continue to be relevant in twenty years, so that they “survive the 21st century”?

Critical Uncertainties

In defining these scenarios, I am considering several “critical uncertainties”:

  • Teaching and learning: As distance education becomes more dominant, what will humanities education look like?
  • Funding sources: Where will money for humanities research come from, especially as public funding is under stress?
  • Research methods: How will the availability of huge amounts of data (for instance, the 12+ million volumes in Google Books) affect the way humanities research is conducted?
  • Knowledge production and dissemination: How will research be communicated? Will there be free and open access to information, or will it be available only to the highest bidder?
  • Environmental, social, political, technological and cultural changes: What will be the impact of climate change, peak oil, population growth, resource depletion, economic challenges, developments in technology, and globalization on the world?

Based on these uncertainties, I’ve whipped up three scenarios. (To conform to the genre, I should offer four, but I can only cram so much into a 12-minute presentation.)

I.     A New Renaissance


i.     Summary: Through broad, sustained investment in education, the world enjoys greater equity and opportunity. Interdisciplinary research and international cooperation have led to progress on resolving many challenges, including climate change, political conflict, and resource depletion.

ii.     Research: Humanities scholars are valued for bringing critical understanding to large amounts of data. In collaboration with computer scientists and librarians, humanities scholars devise methods to mine large humanities databases, coming up with new questions and insights that cross disciplinary and linguistic divides. Humanities (and digital humanities) centers help to coordinate much of this activity. Through efforts by leading scholars and scholarly organizations, tenure and promotion guidelines have been broadened to recognize a wide range of work, including scholarly multimedia, online dialogues, and curated content.

iii.     Teaching: Blended learning has become common, with lectures and exercises delivered online and face-to-face time reserved for discussion and collaborative research. Faculty act as guides and mentors for networked research projects that engage students around the world in producing new knowledge. The humanities provide crucial training in curating, contextualizing and interpreting large amounts of data, as well as in critically examining individual objects.

iv.     Scholarly communication: Research is openly available, speeding the pace of discovery and spreading ideas widely. To capture the complexities of their research, scholars produce multimodal scholarship that incorporates video, audio, visualizations, maps, etc.

II.     Humanities, Inc.


i.     Summary: As the United States faces economic crises, public funding for education and research erodes.  People feel both overwhelmed by information and hungry for whatever supports their own perspective. Political conflict erupts around the world as a result of resource depletion and climate change, prompting the US to go into a defensive crouch.

ii.     Research: To the extent that research is funded, the money mostly comes from corporations, often with strings attached. Researchers no longer have tenured positions at universities, but move from contract to contract. By necessity, researchers focus on “what pays?”  However, some scholars work with the public to produce crowdsourced humanities research.

iii.     Teaching: Most undergraduate education is offered through distance education; students choose courses from a menu rather than attending a particular institution. Instruction mostly focuses on vocational skills. A few elite institutions remain and offer face-to-face instruction for the very wealthy. Teachers, most of whom are employed by private companies, teach classes with several hundred people, leaving no time for research. Except for a few “rock stars,” the academic labor force is contingent.

iv.     Scholarly communication: Except for crowdsourced information, most research is available only to those individuals and communities who pay for it.

III.     After the Fall


i.     Summary: The devastating effects of climate change, energy shortages, and economic recession prompt a return to localism, so that local communities provide for most of people’s needs. Some areas have descended into chaos or totalitarianism, run by bandits or warlords. But others have developed democratic local solutions—microindustries, local power grids, community gardens, co-ops. Despite the scarcity of energy and frequent power outages, people can occasionally access and share information on the Internet, but travel is rare. The humanities provide a respite from day-to-day drudgery and a source of perspective and wisdom.

ii.     Research: Scholars become research hackers, devising solutions to problems both by studying past folkways and by surveying what other communities are doing now. They are resourceful in retrieving information however they can, taking full advantage of the time when they can access the Internet. There is a renewed appreciation for aesthetics, for well-made or meaningful objects. Humanities centers focus on bridging different interest groups working in the humanities, including secondary education and local cultural organizations.

iii.     Teaching: Although much education focuses on core skills such as literacy, craftsmanship, and agriculture, humanists are valued as wisdom keepers and curators of knowledge, distilling what is important and passing on cultural appreciation.

iv.     Scholarly communication: Given the unreliability of the electrical grid, print becomes valued for its stability.  Scholars frequently participate in public conversations in their communities.

What Now?



So how can the humanities prepare for these possible futures?

1.     Adapt! Engage with and understand technology’s role in the humanities. Like it or not, technology is shaping our future—both how we do our research and, increasingly, how learning is delivered. Thus we should experiment with new models for teaching, peer review, research, and scholarly communication. For example, the Center for History and New Media has been doing some fascinating experiments to challenge the slow pace of academia and, perhaps even more importantly, create community, whether by crowdsourcing a book or creating a piece of software in a week. Likewise, the Looking for Whitman project is linking together college classrooms in the study of Walt Whitman and engaging students in producing public scholarship. (Whitman would approve, I think.) We need to make visible the value of this kind of work.

2.     Cooperate! Support collaborative, interdisciplinary research. Such collaboration should occur on many levels: across professional roles, departments, universities, and community organizations. Greg Crane recently made a compelling case that “We need better ways to understand the cultures that drive economic and political systems upon which our biological lives depend.” To do that, as Crane argues, we need to ask good questions about the connections among cultures, foster dialogue, collaborate with scholars from a range of cultural backgrounds, and make scholarship widely available. We also need to devise ways of dealing with masses of data, both through developing computational approaches and by opening up research opportunities to students and volunteers.

Humanities centers (working in collaboration with libraries and with scholarly organizations) should play a lead role in supporting cross-disciplinary research and in communicating that research to the public. As I found in a recent research project on collaboration in the digital humanities, many humanities departments still do not know how to evaluate collaborative work for tenure and promotion; this should change. Likewise, recognition and support should be given to those in “alternative academic careers”—librarians, technologists, administrators, researchers, and others who are key players in digital humanities initiatives.

3.     Open! Reform scholarly communication so that it is open, multimodal, participatory, and high quality.  If we want to convince the public of the value of the humanities, then we shouldn’t make it prohibitively expensive for them to access scholarship.  Rather, we should come up with sustainable models for scholars to share their research and participate in visible scholarly conversations.

4.     Evangelize! Advocate for the value of the humanities—and indeed of research and education generally. In particular, I encourage you to support 4humanities, a new web site and initiative to advocate for the humanities. Launched by a collective that is coordinated by Alan Liu (I’m proud to be a member), 4humanities leverages the expertise of the digital humanities community to provide tools, media and resources for promoting the humanities.

The key point that I want to emphasize is the importance of community in facing challenges and opportunities, as well as in advocating for the humanities. (This idea was developed collectively by our ASA panel—Haven Hawley, Charles Reagan Wilson, Elena Razlogova, and myself—during a breakfast gathering to plan our session.) I think digital humanities scholars and practitioners have been pretty successful in building community, using both networked technologies such as blogs and Twitter and face-to-face gatherings such as THATCamp to connect people, ideas and action. But we can do more. Let’s get moving!


Digital Humanities in 2008, III: Research

In this final installment of my summary of Digital Humanities in 2008, I’ll discuss developments in digital humanities research. (I should note that if I attempted to give a true synthesis of the year in digital humanities, this would be coming out 4 years late rather than 4 months, so this discussion reflects my own idiosyncratic interests.)

1) Defining research challenges & opportunities

What are some of the key research challenges in digital humanities? Leading scholars tackled this question when CLIR and the NEH convened a workshop on Promoting Digital Scholarship: Formulating Research Challenges In the Humanities, Social Sciences and Computation. Prior to the workshop, six scholars in classics, architectural history, physics/information sciences, literature, visualization, and information retrieval wrote brief overviews of their field and of the ways that information technology could help to advance it. By articulating the central concerns of their fields so concisely, these essays promote interdisciplinary conversation and collaboration; they’re also fun to read. As Doug Oard writes in describing the natural language processing “tribe,” “Learning a bit about the other folks is a good way to start any process of communication… The situation is really quite simple: they are organized as tribes, they work their magic using models (rather like voodoo), they worship the word “maybe,” and they never do anything right.” Sounds like my kind of tribe. Indeed, I’d love to see a wiki where experts in fields ranging from computational biology to postcolonial studies write brief essays about their fields, provide a bibliography of foundational works, and articulate both key challenges and opportunities for collaboration. (Perhaps such information could be automatically aggregated using semantic technologies—see, for instance, Concept Web or Kosmix–but I admire the often witty, personal voices of these essays.)

Here are some key ideas that emerge from the essays:

  1. Global Humanistic Studies: Both Caroline Levander and Greg Crane, Alison Babeu, David Bamman, Lisa Cerrato, and Rashmi Singhal call for a sort of global humanistic studies, whether re-conceiving American studies from a hemispheric perspective or re-considering the Persian Wars from the Persian point of view. Scholars working in global humanistic studies face significant challenges, such as the need to read texts in many languages and understand multiple cultural contexts. Emerging technologies promise to help scholars address these problems. For instance, named entity extraction, machine translation and reading support tools can help scholars make sense of works that would otherwise be inaccessible to them; visualization tools can enable researchers “to explore spatial and temporal dynamism;” and collaborative workspaces allow scholars to divide up work, share ideas, and approach a complex research problem from multiple perspectives. Moreover, a shift toward openly accessible data will enable scholars to more easily identify and build on relevant work. Describing how reading support tools enable researchers to work more productively, Crane et al. write, “By automatically linking inflected words in a text to linguistic analyses and dictionary entries we have already allowed readers to spend more time thinking about the text than was possible as they flipped through print dictionaries. Reading support tools allow readers to understand linguistic sources at an earlier stage of their training and to ask questions, no matter how advanced their knowledge, that were not feasible in print.” We can see a similar intersection between digital humanities and global humanities in projects like the Global Middle Ages.
  2. What skills do humanities scholars need? Doug Oard suggests that humanities scholars should collaborate with computer scientists to define and tackle “challenge problems” so that the development of new technologies is grounded in real scholarly needs. Ultimately, “humanities scholars are going to need to learn a bit of probability theory” so that they can understand the accuracy of automatic methods for processing data, the “science of maybe.” How does probability theory jibe with humanistic traditions of ambiguity and interpretation? And how are humanities scholars going to learn these skills?
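The kind of reading support tool Crane and colleagues describe can be sketched with a toy lookup table. This is only an illustration: the forms and glosses below are placeholders I invented, not output from a real morphological analyzer like the one behind the Perseus Digital Library.

```python
# Toy lexicon mapping inflected Latin forms to (lemma, gloss) pairs.
# These entries are illustrative placeholders, not real morphological data.
LEXICON = {
    "arma": ("arma", "arms, weapons"),
    "virumque": ("vir", "man (plus enclitic -que, 'and')"),
    "cano": ("cano", "I sing"),
}

def annotate(text):
    """Link each word in the text to a lemma and dictionary gloss, if known."""
    annotations = []
    for word in text.lower().split():
        lemma, gloss = LEXICON.get(word, (word, "[unknown]"))
        annotations.append((word, lemma, gloss))
    return annotations

for word, lemma, gloss in annotate("Arma virumque cano"):
    print(f"{word} -> {lemma}: {gloss}")
```

A real system would of course handle the full inflectional paradigm and disambiguate between competing analyses; the point here is only the linking step, attaching each surface form to a lemma and gloss so a reader can keep thinking about the text instead of flipping through a print dictionary.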

According to the symposium, major research challenges for the digital humanities include:

  1. Scale and the “poverty of abundance”: developing tools and methods to deal with the plenitude of data, including text mining and analysis, visualization, data management and archiving, and sustainability.
  2. Representing place and time: figuring out how to support geo-temporal analysis and enable that analysis to be documented, preserved, and replicated.
  3. Social networking and the economy of attention: understanding research behaviors online; analyzing text corpora based on these behaviors (e.g. citation networks).
  4. Establishing a research infrastructure that facilitates access, interdisciplinary collaboration, and sustainability. As one participant asked, “What is the Protein Data Bank for the humanities?”

2) High performance computing: visualization, modeling, text mining

What are some of the most promising research areas in digital humanities? In a sense, the three recent winners of the NEH/DOE’s High Performance Computing Initiative define three of the main areas of digital humanities and demonstrate how advanced computing can open up new approaches to humanistic research.

  • text mining and text analysis: For its project on “Large-Scale Learning and the Automatic Analysis of Historical Texts,” the Perseus Digital Library at Tufts University is examining how words in Latin and Greek have changed over time by comparing the linguistic structure of classical texts with works written in the last 2000 years. In the press release announcing the winners, David Bamman, a senior researcher in computational linguistics with the Perseus Project, said that “[h]igh performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin.”
  • 3D modeling: For its “High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage” project, the Institute for Advanced Technology in the Humanities at the University of Virginia will reprocess existing data to create 3D models of culturally-significant artifacts and architecture. For example, IATH hopes to re-assemble fragments that chipped off  ancient Greek and Roman artifacts.
  • Visualization and cultural analysis: The University of California, San Diego’s Visualizing Patterns in Databases of Cultural Images and Video project will study contemporary culture, analyzing datastreams such as “millions of images, paintings, professional photography, graphic design, user-generated photos; as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos.” Ultimately the project will produce detailed visualizations of cultural phenomena.

Winners received compute time on a supercomputer and technical training.

Of course, there’s more to digital humanities than text mining, 3D modeling, and visualization. For instance, the category listing for the Digital Humanities and Computer Science conference at Chicago reveals the diversity of participants’ fields of interest. Top areas include text analysis; libraries/digital archives; imaging/visualization; data mining/machine learning; information retrieval; semantic search; collaborative technologies; electronic literature; and GIS mapping. A simple analysis of the most frequently appearing terms in the Digital Humanities 2008 Book of Abstracts suggests that much research continues to focus on text—which makes sense, given the importance of written language to humanities research. Here’s the list that TAPoR generated of the 10 most frequently used terms in the DH 2008 abstracts:

  1. text: 769
  2. digital: 763
  3. data: 559
  4. information: 546
  5. humanities: 517
  6. research: 501
  7. university: 462
  8. new: 437
  9. texts: 413
  10. project: 396

“Images” is used 161 times, “visualization” 46.
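A frequency list like the one above takes only a few lines of code to produce. Here is a minimal sketch; the tokenization rule (lowercased alphabetic runs) is my own simplification, and tools like TAPoR apply more sophisticated tokenizers and stopword handling.

```python
import re
from collections import Counter

def term_frequencies(text, top_n=10):
    """Lowercase the text, extract alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(top_n)

# A made-up sample sentence, just to show the shape of the output.
sample = "Text mining treats digital text as data; text analysis serves the humanities."
print(term_frequencies(sample, 3))  # [('text', 3), ('mining', 1), ('treats', 1)]
```

Run over the full book of abstracts, this kind of count surfaces exactly the sort of list shown above, though deciding whether to fold “text” and “texts” together (stemming) is itself an interpretive choice.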

Wordle: Digital Humanities 2008 Book of Abstracts

And here’s the word cloud. As someone who got started in digital humanities by marking up texts in TEI, I’m always interested in learning about developments in encoding, analyzing and visualizing texts, but some of the coolest sessions I attended at DH 2008 tackled other questions: How do we reconstruct damaged ancient manuscripts? How do we archive dance performances? Why does the digital humanities community emphasize tools instead of services?

3) Focus on method

As digital humanities emerges, much attention is being devoted to developing research methodologies. In “Sunset for Ideology, Sunrise for Methodology?,” Tom Scheinfeldt suggests that humanities scholarship is beginning to tilt toward methodology, that we are entering a “new phase of scholarship that will be dominated not by ideas, but once again by organizing activities, both in terms of organizing knowledge and organizing ourselves and our work.”

So what are some examples of methods developed and/or applied by digital humanities researchers? In “Meaning and mining: the impact of implicit assumptions in data mining for the humanities,” Bradley Pasanek and D. Sculley tackle methodological challenges posed by mining humanities data, arguing that literary critics must devise standards for making arguments based upon data mining. Through a case study testing Lakoff’s theory that political ideology is defined by metaphor, Pasanek and Sculley demonstrate that the selection of algorithms and representation of data influence the results of data mining experiments. Insisting that interpretation is central to working with humanities data, they concur with Steve Ramsay and others in contending that data mining may be most significant in “highlighting ambiguities and conflicts that lie latent within the text itself.” They offer some sensible recommendations for best practices, including making assumptions about the data and texts explicit; using multiple methods and representations; reporting all trials; making data available and experiments reproducible; and engaging in peer review of methodology.

4) Digital literary studies

Different methodological approaches to literary study are discussed in the Companion to Digital Literary Studies (DLS), which was edited by Susan Schreibman and Ray Siemens and was released for free online in the fall of 2008. Kudos to its publisher, Blackwell, for making the hefty volume available, along with A Companion to Digital Humanities. The book includes essays such as “Reading digital literature: surface, data, interaction, and expressive processing” by Noah Wardrip-Fruin, “The Virtual Codex from page space to e-space” by Johanna Drucker, “Algorithmic criticism” by Steve Ramsay, and “Knowing true things by what their mockeries be: modelling in the humanities” by Willard McCarty. DLS also provides a handy annotated bibliography by Tanya Clement and Gretchen Gueguen that highlights some of the key scholarly resources in literature, including Digital Transcriptions and Images, Born-Digital Texts and New Media Objects, and Criticism, Reviews, and Tools. I expect that the book will be used frequently in digital humanities courses and will be a foundational work.

5) Crafting history: History Appliances

For me, the coolest—most innovative, most unexpected, most wow!—work of the year came from the ever-inventive Bill Turkel, who is exploring humanistic fabrication (not in the Mills Kelly sense of making up stuff ;), but in the DIY sense of making stuff). Turkel is working on “materialization,” giving a digital representation physical form by using, for example, a rapid prototyping machine, a sort of 3D printer. Turkel points to several reasons why humanities scholars should experiment with fabrication: they can be like da Vinci, making the connection between the mind and hand by realizing an idea in physical form; study the past by recreating historical objects (fossils, historical artifacts, etc.) that can be touched, rotated, scrutinized; explore “haptic history,” a sensual experience of the past; and engage in “critical technical practice,” where scholars both create and critique.

Turkel envisions making digital information “available in interactive, ambient and tangible forms.”  As Turkel argues, “As academic researchers we have tended to emphasize opportunities for dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. We need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment.” Turkel and his team are working on 4 devices: a dashboard, which shows both public and customized information streams on a large display; imagescapes and soundscapes that present streams of complex data as artificial landscapes or sound, aiding awareness; a GeoDJ, which is an iPod-like device that uses GPS and GIS to detect your location and deliver audio associated with it ( e.g. percussion for an historic industrial site); and ice cores and tree rings, “tangible browsers that allow the user to explore digital models of climate history by manipulating physical interfaces that are based on this evidence.” This work on ambient computing and tangible interfaces promises to foster awareness and open up understanding of scholarly data by tapping people’s natural way of comprehending the world through touch and other forms of sensory perception. (I guess the senses of smell and taste are difficult to include in sensual history, although I’m not sure I want to smell or taste many historical artifacts or experiences anyway. I would like to re-create the invention of the Toll House cookie, which for me qualifies as an historic occasion.) This approach to humanistic inquiry and representation requires the resources of a science lab or art studio—a large, well-ventilated space as well as equipment like a laser scanner, lathes, mills, saws, calipers, etc. 
Unfortunately, Turkel has stopped writing his terrific blog “Digital History Hacks” to focus on his new interests, but this work is so fascinating that I’m anxious to see what comes next–which describes my attitude toward digital humanities in general.

Digital Humanities Sessions at MLA 2008

A couple of days after returning from the MLA (Modern Language Association) conference, I ran into a biologist friend who had read about the “conference sex” panel at MLA.  She said,  “Wow, sometimes I doubt the relevance of my research, but that conference sounds ridiculous.” I’ve certainly had my moments of skepticism toward the larger purposes of literary research while sitting through dull conference sessions, but my MLA experience actually made me feel jazzed and hopeful about the humanities.  That’s because the sessions that I attended–mostly panels on the digital humanities–explored topics that seemed both intellectually rich and relevant to the contemporary moment.  For instance, panelists discussed the significance of networked reading, dealing with information abundance, new methods for conducting research such as macroanalysis and visualization, participatory learning, copyright challenges, the shift (?) to digital publishing, digital preservation, and collaborative editing.  Here are my somewhat sketchy notes about the MLA sessions I was able to attend; see great blog posts by Cathy Davidson, Matt Gold, Laura Mandell, Alex Reid, and John Jones for more reflections on MLA 2008.

1)    Seeing patterns in literary texts
At the session “Defoe, James, and Beerbohm: Computer-Assisted Criticism of Three Authors,” David Hoover noted that James scholars typically distinguish between his late and early work. But what does that difference look like? What evidence can we find of such a distinction? Hoover used computational/statistical methods such as Principal Components Analysis and the t-test to examine word choice across James’ work and found some striking patterns illustrating that James’ diction during his early period was indeed quite different from that of his late period. Hoover introduced the metaphor of computational approaches to literature serving either as a telescope (macroanalysis, discerning patterns across a large body of texts) or a microscope (looking closely at individual works or authors).
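To make the statistical half of this approach concrete, here is a minimal, stdlib-only sketch of comparing a word's relative frequency across two groups of texts with a t statistic. The "early" and "late" sentences are invented toys, not James's prose, and a real stylometric study would use full novels, hundreds of word variables, and Principal Components Analysis alongside the t-test:

```python
from statistics import mean, stdev

# Invented toy sentences standing in for early- and late-period James;
# a real analysis would use full texts and many word variables.
early = ["the small house was near the old town",
         "she was a small girl in the old house"]
late = ["consciousness of the situation had somehow presented itself",
        "the situation had, somehow, within his consciousness, altered"]

def rel_freq(texts, word):
    """Relative frequency of `word` in each text, per 100 tokens."""
    return [100 * t.split().count(word) / len(t.split()) for t in texts]

def welch_t(a, b):
    """Welch's two-sample t statistic (illustration only; no p-value)."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

t = welch_t(rel_freq(early, "the"), rel_freq(late, "the"))
print(f"t statistic for 'the': {t:.2f}")  # prints: t statistic for 'the': 0.80
```

With real corpora, a large absolute t value for a common function word like "the" is exactly the kind of evidence Hoover uses to show that early and late diction differ.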

2)    New approaches to electronic editing

The ACH Guide to Digital-Humanities Talks at the 2008 MLA Convention lists at least 9 or 10 sessions concerned with editing or digital archives, and the Chronicle of Higher Ed dubbed digital editing a “hot topic” for MLA 2008. At the session on Scholarly Editing in the Twenty-First Century: Digital Media and Editing, Peter Robinson (whose paper was delivered by Jerome McGann and included passages referencing McGann himself) presented the idea of “editing without walls,” shifting from a centralized model where a scholar acts as the “guide and guardian” who oversees work on an edition to a distributed, collaborative model. With “community made editions,” a library would produce high quality images, researchers would transcribe those images, other researchers would collate the transcriptions, others would analyze the collations and add commentaries, etc. Work would be distributed and layered. This approach opens up a number of questions: what incentives will researchers have to work on the project? How will the work be coordinated? Who will maintain the distributed edition for the long term? But Robinson claimed that the approach would have significant advantages, including reduced cost and greater community investment in the editions. Several European initiatives are already building tools and platforms similar to what Peter Shillingsburg calls “electronic knowledge sites,” including the Discovery Project, which aims to “explore how Semantic Web technology can help to create a state-of-the-art research and publishing environment for philosophy,” and the Virtual Manuscript Room, which “will bring together digital resources related to manuscript materials (digital images, descriptions and other metadata, transcripts) in an environment which will permit libraries to add images, scholars to add and edit metadata and transcripts online, and users to access material.”

Matt Kirschenbaum then posed a provocative question: if Shakespeare had a hard drive, what would scholars want to examine? When he began work on King Lear, how long he worked on it, what changes he made, what web sites he consulted while writing? Of course, Shakespeare didn’t have a hard drive, but almost every writer working now uses a computer, so it’s possible to analyze a wide range of information about the writing process. Invoking Tom Tanselle, Matt asked, “What are the dust jackets of the information age?” That is, what data do we want to preserve? Discussing his exciting work with Alan Liu and Doug Reside to make available William Gibson’s Agrippa in emulation and as recorded on video in the early 1990s, Matt demonstrated how emulation can be used to simulate the original experience of this electronic poem. He emphasized the importance of collaborating with non-academics–hackers, collectors, and even Agrippa’s original publisher–to learn about Agrippa’s history and make the poem available. Matt then addressed digital preservation. Even data designed to self-destruct is recoverable, but Matt expressed concern about cloud computing, where data exists on networked servers. How will scholars get access to a writer’s email, Facebook updates, Google Docs, and other information stored online? Matt pointed to several projects working on the problem of archiving electronic art and performances by interviewing artists about what’s essential and providing detailed descriptions of how they should be re-created: Forging the Future and Archiving the Avant Garde.

3)    Literary Studies in the Digital Age: A Methodological Primer

At the panel on Methodologies: Literary Studies in the Digital Age, Ken Price discussed a forthcoming book that he is co-editing with Ray Siemens called Literary Studies in a Digital Age: A Methodological Primer. The book, which is under consideration by MLA Press, will feature essays by John Unsworth on electronic scholarly publishing, Tanya Clement on critical trends, David Hoover on textual analysis, Susan Schreibman on electronic editing, Bill Kretzschmar on GIS, and others. Several authors to be included in the volume—David Hoover, Alan Liu, and Susan Schreibman—spoke.

Hoover began with a provocative question: do we really want to get to 2.0, collaborative scholarship? He then described different models of textual analysis:
i.    the portal (e.g. MONK, TAPOR): typically a suite of simple tools; platform independent; not very customizable
ii.     desktop tools (e.g. TACT)
iii.    standardized software used for text analysis (e.g. Excel)

Next, Alan Liu  discussed his Transliteracies project, which examines the cultural practices of online reading and the ways in which reading changes in a digital environment (e.g. distant reading, sampling, and “networked public discourse,” with links, comments, trackback, etc).  The transformations in reading raise important questions, such as the relationship between expertise and networked public knowledge.  Liu pointed to a number of crucial research and development goals (both my notes and memory get pretty sketchy here):
1)    development of a standardized metadata scheme for annotating social networks
2)    data mining and annotating social computing
3)    reconciling metadata with writing systems
4)    information visualization for the contact zone between macro-analysis and close reading
5)    historical analysis of past paradigms for reading and writing
6)    authority-adjudicating systems to filter content
7)    institutional structures to encourage scholars to share and participate in new public knowledge

Finally, Susan Schreibman discussed electronic editions.  Among the first humanities folks drawn to the digital environment were editors, who recognized that electronic editions would allow them to overcome editorial challenges and present a history of the text over time, pushing beyond the limitations of the textual apparatus and representing each edition.  Initially the scholarly community focused on building single author editions such as the Blake and Whitman Archives.  Now the community is trying to get beyond siloed projects by building grid technologies to edit, search and display texts.  (See, for example, TextGrid, http://www.textgrid.de/en/ueber-textgrid.html).   Schreibman asked how we can use text encoding to “unleash the meanings of text that are not transparent” and encode themes or theories of text, then use tools such as TextArc or ManyEyes to engage in different spatial/temporal views.

A lively discussion of crowdsourcing and expert knowledge followed, hinging on the question of what the humanities have to offer in the digital age.  Some answers: historical perspective on past modes of reading, writing and research; methods for dealing with multiplicity, ambiguity and incomplete knowledge; providing expert knowledge about which text is the best to work with.  Panelists and participants envisioned new technologies and methods to support new literacies, such as the infrastructure that would enable scholars and readers to build their own editions; a “close-reading machine” based upon annotations that would enable someone to study, for example, dialogue in the novel; the ability to zoom out to see larger trends and zoom in to examine the details; the ability to examine “humanities in the age of total recall,” analyzing the text in a network of quotation and remixing; developing methods for dealing with what is unknowable.

4) Publishing and Cyberinfrastructure

At the panel on publishing and cyberinfrastructure moderated by Laura Mandell, Penny Kaiserlian from the University of Virginia Press, Linda Bree from Cambridge UP, and Michael Lonegro from Johns Hopkins Press discussed the challenges that university presses are facing as they attempt to shift into the digital. At Cambridge, print sales are currently subsidizing ebooks. Change is slower than was envisioned ten years ago, more evolutionary than revolutionary. All three publishers emphasized that publishers are unlikely to transform their publishing model unless academic institutions embrace electronic publication, accepting e-publishing for tenure and promotion and purchasing electronic works. Ultimately, they said, it is up to the scholarly community to define what is valued. Although the shift into electronic publishing of journals is significant, academic publishers’ experience lags in publishing monographs. One challenge is that journals are typically bundled, but there isn’t such a model for bundling books. Getting third-party rights to illustrations and other copyrighted materials included in a book is another challenge. Ultimately scholars will need to rethink the monograph, determining what is valuable (e.g. the coherence of an extended argument) and how it exists electronically, along with the benefits offered by social networking and analysis. Although some in the audience challenged the publishers to take risks in initiating change themselves, the publishers emphasized that change is ultimately up to the scholarly community. The publishers also asked why the evaluation of scholarship depended on a university press constrained by economics rather than scholars themselves–that is, why professional review has been outsourced to the university press.

5) Copyright

The panel on Promoting the Useful Arts: Copyright, Fair Use, and the Digital Scholar, which was moderated by Steve Ramsay, featured Aileen Berg explaining the publishing industry’s view of copyright, Robin G. Schulze describing the nightmare of trying to get rights to publish an electronic edition of Marianne Moore’s notebooks, and Kari Kraus detailing how copyright and contract law make digital preservation difficult. Schulze asked where the MLA was when copyright was extended through the Sonny Bono Act, limiting what scholars can do, and said she is working on pre-1923 works to avoid the copyright nightmare. Berg, who was a good sport to go before an audience not necessarily sympathetic to the publishing industry’s perspective, advised authors to exercise their own rights and negotiate their agreements rather than simply signing what is put before them; often they can retain some rights. Kraus discussed how licenses (such as click-through agreements) are further limiting how scholars can use intellectual works but noted some encouraging signs, such as the James Joyce estate’s settlement with a scholar allowing her to use copyrighted materials in her scholarship. Attendees discussed ways that literature professors could become more active in challenging unfair copyright limitations, particularly through advocacy work and supporting groups such as the Electronic Frontier Foundation.

6) Humanities 2.0: Participatory Learning in an Age of Technology

The Humanities 2.0 panel featured three very interesting presentations about the projects funded through the MacArthur Digital Learning competition, as well as Cathy Davidson’s overview of the competition and of HASTAC. (For a fuller discussion of the session, see Cathy Davidson’s summary.) Davidson drew a distinction between “digital humanities,” which uses digital technologies to enhance the mission of the humanities, and humanities 2.0, which “wants us to combine critical thinking about the use of technology in all aspects of social life and learning with creative design of future technologies” (Davidson). Next Howard Rheingold discussed the “social media classroom,” which is “a free and open-source (Drupal-based) web service that provides teachers and learners with an integrated set of social media that each course can use for its own purposes—integrated forum, blog, comment, wiki, chat, social bookmarking, RSS, microblogging, widgets, and video commenting are the first set of tools.” Todd Presner showcased the HyperCities project, a geotemporal interface for exploring and augmenting spaces. Leveraging the Google Maps API and KML, HyperCities enables people to navigate and narrate their own past through space and time, adding their own markers to the map and experiencing different layers of time and space. The project is working with citizens and students to add their own layers of information—images, narratives—to the maps, making available an otherwise hidden history. Currently there are maps for Rome, LA, New York, and Berlin. A key principle behind HyperCities is aggregating and integrating archives, moving away from silos of information. Finally, Greg Niemeyer and Antero Garcia presented BlackCloud.org, which is engaging students and citizens in tracking pollution using whimsically designed sensors.
Students tracked levels of pollution at different sites—including in their own classroom—and began taking action, investigating the causes of pollution and advocating for solutions.  What unified these projects was the belief that students and citizens have much to contribute in understanding and transforming their environments.

7) The Library of Google: Researching Scanned Books

What does Google Books mean for literary research? Is Google Books more like a library or a research tool? What kind of research is made possible by Google Books (GB)? What are GB’s limitations? Such questions were discussed in a panel on Google Books that was moderated by Michael Hancher and included Amanda French, Eleanor Shevlin, and me. Amanda described how Google Books enabled her to find earlier sources on the history of the villanelle than she was able to locate pre-GB, Eleanor provided a book history perspective on GB, and I discussed the advantages and limitations of GB for digital scholarship (my slides are available here). A lively discussion among the 35 or so attendees followed; all but one person said that GB was, on balance, good for scholarship, although some people expressed concern that GB would replace interlibrary loan, indicated that they use GB mainly as a reference tool to find information in physical volumes, and emphasized the need to continue to consult physical books for bibliographic details such as illustrations and bindings.

8) Posters/Demonstrations: A Demonstration of Digital Poetry Archives and E-Criticism: New Critical Methods and Modalities

I was pleased to see the MLA feature two poster sessions, one on digital archives, one on digital research methods. Instead of just watching a presentation, attendees could engage in discussion with project developers and see how different archives and tools worked.  That kind of informal exchange allows people to form collaborations and have a more hands-on understanding of the digital humanities. (I didn’t take notes and the sessions occurred in the evening, when my brain begins to shut down, so my summary is pretty unsophisticated: wow, cool.)

Reflections on MLA

This was my first MLA and, despite having to leave home smack in the middle of the holidays, I enjoyed it. Although many sessions that I attended shifted away from the “read your paper aloud when people are perfectly capable of reading it themselves” model, I noted the MLA’s requirement that authors bring three copies of their paper to provide upon request. What if you don’t have a paper (just PowerPoint slides or notes)? And why can’t you share it electronically? And why doesn’t the MLA provide fuller descriptions of the sessions besides just title and speakers? (Or am I just not looking in the right place?) Sure, in the paper era that would mean the conference issue of PMLA would be several volumes thick, but if the information were online there would be a much richer record of each session. (Or you could enlist bloggers or twitterers [tweeters?] to summarize each session…) After attending THATCamp, I’m a fan of the unconference model, which fosters the kind of engagement that conferences should be all about—conversation, brainstorming, and problem-solving rather than passive listening. But lively discussions often do take place during the Q & A period and in the hallways after the sessions (and who knows what takes place elsewhere…)

Studying the History of Reading Using Google Books (and Other Sources)

To what extent can digital collections such as Google Books help us to reconstruct the history of readers’ responses to literary works–in my case, readers’ responses to Reveries of a Bachelor (1850), which I’m using as a case study of doing research in the Library of Google? (For an account of my post-marital fascination with bachelors, see my last post.) Readers’ enthusiasm for this sentimental work stirred up my own interest in it. At Yale’s Beinecke Library, I examined a cache of fan letters in which readers rhapsodized over the bachelor’s

Patrick Henry's annotations to Reveries

reveries and connected them to their own experiences.  As one correspondent, a doctor, wrote,  “I have found it really a book of the heart—of my heart—an echo of my own reveries.”  At Yale I also examined Emily Dickinson’s copy of Reveries, where she (or perhaps someone to whom she loaned the volume) made marks next to significant passages. At the University of Virginia Library, I stumbled across an 1886 edition of Reveries heavily annotated by a young man named Patrick Henry.  In a passage where Mitchell described “a Bachelor of seven and twenty,” Patrick crossed out the seven and wrote in “four,” signaling his own intense identification with the bachelor narrator.  Drawing on these and other examples, I wrote a dissertation chapter on readers’ responses to Reveries (later to morph into a 2003 article in Book History) that challenged the notion that sentimental readers were passive.  But I was examining a fairly limited set of reader responses–about 25 letters from the 1850s to the late 19th century, plus a couple of annotated copies of Reveries.  I could offer an even richer analysis of readers’ reactions to Reveries by examining journal entries, memoirs, and letters, as well as even more annotated copies.  I’m especially interested in whether readers’ views of the book changed over time, given that the book was popular from 1850 into the twentieth century. Could I find such evidence in Google Books?

What I Found

Here’s what I found doing a keyword search in Google Books for “Reveries of a Bachelor”; I still need to process the hundreds of results I got searching for “Ik Marvel” and “Ike Marvel” (the pen name of the author of Reveries), as well as searching for those terms in the Open Content Alliance.

  • Recent secondary sources on reading that include short passages on Reveries:
    • Ronald and Mary Saracino Zboray’s 2006 account of a would-be suitor attempting to woo an already-engaged woman by giving her a copy of Reveries; she noted in her diary that she would rather read the book than spend time with him
    • Claire White Putala’s Reading and Writing Ourselves Into Being, which discusses how Joe Lord recommended Reveries to Eliza Wright Osborne immediately before she married another man
    • Alan Boye’s account in Tales from the Journey of the Dead of a soldier suffering from a broken heart who read Reveries in a Confederate camp
    • So, hmm, Reveries seems to have been read by heartbroken men, who seemed to use the book to express how they felt to the women they were pursuing.  All three of the above books are based on archival research, which leads me to suspect that I would find a number of references to Reveries in archival collections (if I had the time and money to visit them).
  • Memoirs that include brief mentions of Reveries:
    • Mountaineer Belmore Browne’s association of Reveries with melancholia in The Conquest of Mt. McKinley (first published 1913): “I know of nothing in this world that will produce a stronger attack of melancholia than reading The Reveries of a Bachelor on a fog-draped glacier!”
    • Philosopher Morris R. Cohen’s sense that Reveries stimulated feeling and brought relief: “Today I felt very relieved by reading Marvel’s Reveries of a Bachelor. It aroused new strains of feeling I don’t know whether I should be ashamed of wishing…”  [snippet view]
    • Richard St. Clair Steel’s description of the beauty of Reveries
    • My questions: Did women memoirists likewise praise Reveries? Why did the book have such emotional resonance?
  • Evidence that Reveries was embraced by educational, religious, and cultural authorities
    • the University of the State of New York Regents High School Exam, American Literature section, included questions about Reveries in 1894, 1897, 1899, 1903, 1906, and 1908 (for whatever reason, I discovered this information not in my original search for “Reveries of a Bachelor” but in a later search for “Reveries of a Bachelor” enrica, Enrica being the name of one of the women for whom the bachelor longs)
    • Reveries was excerpted in several literary anthologies, including Harper’s First [ -sixth] Reader (1889),  The Ridpath Library of Universal Literature (1898), and American Literature Through Illustrative Readings (1915)
    • Reveries was recommended  for the high school reading list (essays) by the National Council of Teachers of English (1913).  It also appeared in quotation books.
    • The author of the satiric “Reflections of a War Camp Librarian” (1918) notes that American citizens sent Reveries and other gift books to soldiers on the battlefield in WWI, not exactly the kind of reading material soldiers craved
    • A “Country Parson” noted in 1862 how Reveries brought about “revelations of personal feeling” among the unmarried
  • Reveries appeared in many printed library catalogs from the 1850s to the 1920s, including catalogs for the Boston Public Library, Detroit Public Library, New Zealand Parliament Library, Princeton University, Library Company of Philadelphia, and the British Museum Dept. of Printed Books
  • Reveries was not only read in private, but re-imagined as tableaux and read aloud at home and in public
  • Reviews of Reveries

Google Books as a Research Source

  • Except for the reviews (many of which I had already consulted) and the secondary sources on reading (which I probably would have consulted), searching Google Books enabled me to find many resources that I probably never would have discovered, including memoirs, high school curricula, and guides to performing (reciting/acting out) Reveries.  Although these sources (which I haven’t fully analyzed) haven’t radically changed my view of Reveries, they do give me a better sense of the cultural impact that the book had, as well as its personal significance to readers, who read it while climbing mountains, dealing with emotional turmoil, etc.
  • I had hoped to find annotations in scanned versions of Reveries collected in Google Books and Open Content Alliance.  However, in the copies I examined (and I should say that I glanced over them rather than scrutinized every page), I only found minor annotations–people would typically write their names in their books or inscribe a message to the recipient of the gift book, and a few readers made marks next to passages, but I found nothing like Patrick Henry’s ecstatic annotations.
  • Since many texts are only available as fragments around a search term, Google Books functions as a ramped-up research index, pointing me to materials that I often need to consult in print to put the search results in context, at least until the Google Book Search settlement goes through and the out-of-print materials become available as full text. (For some of the limited preview books, such as reference books, however, I’m able to pull out enough information from the pages that are available without having to see the whole book.)

Using Google Books to Research Publishing History

At the upcoming Modern Language Association conference, I will join Amanda French and Eleanor Shevlin on a panel called “The Library of Google: Researching Scanned Books,” which is sponsored by SHARP and will be moderated by Michael Hancher.  Google Books has already scanned over 7 million volumes (more than many research libraries hold) and, according to Planet Google, aims to scan every volume in the WorldCat catalog, around 32 million. Our panel will focus on the significance of Google Books for literary research, looking at questions such as whether scholars can trust it and how they should deal with such plenitude.  I plan to discuss my study examining how many of the works in my dissertation bibliography are now available electronically, as well as more recent work using Google Books and other online sources to explore the history of a nineteenth-century bestseller, Donald Grant Mitchell’s Reveries of a Bachelor (1850).  Reveries fascinates me—not so much because I identify with the bachelor narrator’s fantasies and fears of what it’s like to be married (actually, I find the book kind of cloying), but because I’m intrigued by Reveries‘ cultural impact from the 1850s into the early twentieth century.  It sold at least a million copies and appeared in dozens of editions,  from a cheap edition selling for 8 cents to a $6 gift volume in an exquisite morocco binding.  Emily Dickinson loved it, as did readers who evinced their admiration by sending fan letters to Mitchell or making marks in the margins of their book.  In this blog post, I’ll focus on how I’ve employed Google Books to illuminate Reveries‘ publishing history; future posts will look at reader responses, textual history, and authorship.

For a graduate seminar on textual editing way back in the 90s,  I developed an online critical edition of the book’s first reverie.  I also wrote an article analyzing a series of letters that Reveries’ publisher, Charles Scribner II, sent to Mitchell to negotiate the pricing and physical form of new editions between 1883 and 1907, as the publisher and author worked to sustain the popularity of the book and maintain their hold on the market after their copyright expired.  But my publishing history is incomplete; I want to know more about the different forms Reveries took, how it was advertised, what the prices were at different times, how well the book sold, what marketing strategies Scribner and other publishers pursued, and whether Reveries is a unique case or fairly typical, at least for a nineteenth century bestseller.

By using Google Books, I’ve been able to fill in some details about the book’s publishing history, particularly about pricing and advertising. As amazed as I am by the ability to search across millions of books for references to Reveries, I’m also somewhat frustrated by the strange ways that Google Book search works (or doesn’t work) and disappointed that some materials don’t seem to be available.

Title page of 1850 Reveries of a Bachelor

What I already knew:

  • The authorized publisher of Reveries, Scribner’s, issued many editions.
  • Copyright on Reveries expired in 1892, which meant that other publishers could legally come out with their own editions of the book.  Charles Scribner II wrote to Donald Grant Mitchell to discuss how to respond to this challenge, particularly the threat from Altemus, which he characterized as a “piratical publisher.” Scribner proposed offering a cheap (30 cent) edition “to make it so unprofitable that the publisher [Altemus] will not be encouraged to take up the other books [by Mitchell],” along with a moderately-priced (75 cent) edition.  At the suggestion of Mitchell, Scribner also advertised that the company remained the only authorized publisher of Reveries.
  • Undeterred, many publishers issued unauthorized editions, including Henry Altemus Company, Optimus Printing Company, The Rodgers Company, Donohue, Henneberry, & Co, Porter, W. L. Allison Company, F. T. Neely, Thomas Y. Crowell Company Publishers, The Mershon Company Publishers, G. Munro’s Sons, H. M. Caldwell Company, The Henneberry Company, M. A. Donohue & Company, Homewood Company, A. L. Burt Company, The F. M. Lupton Company, H. M. Caldwell Co., Strawbridge & Clothier, The Edward Publishing Company, W. B. Conkey Company, Acme Printing Company, The Bobbs-Merrill Company Publishers, and R. F. Fenno & Company (BAL, 240-1; NUC, 664-667).   While I was researching Reveries at Yale, I came across several of these volumes, one of which had annotations such as “The illustrations are [most of them] execrable, & there is an occasional ‘mending’ of the text…”  In the preface to the 1907 Author’s Complete Edition of Reveries, Mitchell fixated on the problem of piracy, noting that he had amassed a collection of over 40 imprints of Reveries, only one of which brought him any money.  Apparently Mitchell’s collection–and annotations–ended up at Yale.

Method

To determine how many Reveries-related works were available in Google Books, I did a keyword search for “Reveries of a Bachelor.” The total number of results fluctuated; one day it was 641, another 916, another 809. But forget about getting to result #641. One result screen says “151 – 200 of 809,” but then the next one says “Books 201 – 220 of 220.” Huh? So what happened to everything else? Perhaps duplicates are eliminated as you make your way through the results (although there were plenty of duplicates in the results I looked at), perhaps the algorithm used to calculate the number of results is, er, inexact and shifting, or perhaps Google figures you don’t really want to look at that many results anyway. Whatever the explanation, I can’t help wondering about what I’m not getting to see, so my trust in Google Books is diminished a bit, even as I feast on the plenty that is available.

In any case, I looked at each result available to me, discarding those that weren’t really focused on Reveries and grabbing the bibliographic info for the rest through Zotero. (I love Zotero, but I was a little frustrated that it didn’t capture the URL and publisher info for Google Books, which may have to do with the way that Google makes that information available.) When I wasn’t impeded by texts that offered only snippet views or no preview at all, I copied out a chunk of text that contained the Reveries reference and dumped it into a note in Zotero. Categorizing as I waded through the results, I added a tag or two for each work, such as “reveries_ad” or “reveries_review.”
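The sorting step in this workflow, deduplicating the repeated hits and grouping records by tag, can be sketched in a few lines of Python. The records below are invented stand-ins for bibliographic info exported from a reference manager; the field names and values are my assumptions, not actual Zotero output:

```python
from collections import defaultdict

# Hypothetical records standing in for exported bibliographic info;
# titles, years, and tags are invented for illustration.
records = [
    {"title": "Publishers Weekly", "year": 1893, "tag": "reveries_ad"},
    {"title": "The Conquest of Mt. McKinley", "year": 1913, "tag": "reveries_memoir"},
    {"title": "Publishers Weekly", "year": 1893, "tag": "reveries_ad"},  # duplicate hit
]

# Deduplicate by (title, year), then group the surviving records by tag.
unique = {(r["title"], r["year"]): r for r in records}.values()
by_tag = defaultdict(list)
for r in unique:
    by_tag[r["tag"]].append(r["title"])

for tag, titles in sorted(by_tag.items()):
    print(tag, titles)
```

Grouping by tag this way makes it easy to see at a glance how many ads, reviews, or memoir mentions the search actually surfaced once duplicates are collapsed.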

Since Mitchell used the pen name “Ik Marvel,” I also searched for “Ik Marvel” (1285 results, today) and “Ike Marvel” (606 results); I’m still working through those results.   I used TAPOR to generate a list of word pairs in Reveries that I hoped to use in searching for works connected to Reveries, but there were only a few pairs that seemed at all unique, such as “Aunt Tabithy,” the name of a character in the book.
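The word-pair step that TAPoR handled can be approximated with a few lines of stdlib Python: count adjacent word pairs and surface the ones that recur. The sentence below is an invented stand-in for the text of Reveries, not a quotation:

```python
from collections import Counter

# Toy stand-in for the text of Reveries (invented, not a quotation).
text = ("aunt tabithy smoothed her apron and aunt tabithy shook her head "
        "while the blaze of the fire threw reveries upon the hearth")

tokens = text.split()
pairs = Counter(zip(tokens, tokens[1:]))

# Recurring pairs are candidates for distinctive search phrases, provided
# they are rare in English generally—a judgment call the counts alone
# cannot settle.
for (w1, w2), n in pairs.most_common(3):
    print(f'"{w1} {w2}": {n}')
```

On a full text, a pair like “Aunt Tabithy” would rise to the top of this list, which is exactly what makes it a promising search phrase for finding quotations and allusions in other books.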

Bobbs-Merrill Ad for Reveries

What I discovered about publishing history using Google Books

  • Pricing: By searching book catalogs, advertisements, and old issues of Publishers Weekly, I was able to track the price for different versions of Reveries between 1851 and 1906. The pricing data reveals the many choices enjoyed by consumers who wanted to buy a copy of Reveries, particularly at the end of the nineteenth century, when competing publishers entered the market. Say a consumer in the late nineteenth century wanted a cheap copy of Reveries. How about paying 8 cents for the “Ideal Library” version, or 18 cents for the “Handy Volume” edition? How about a moderately priced edition? The price of Scribner’s standard duodecimo edition remained fairly steady between 1854 and 1903: $1.25. If people craved a fine edition, they would have many choices, such as the 1903 Dainty Small Gift Books, Agate Morocco Series with gilt edges for $2.25, the 1906 Bobbs-Merrill Ashe Illustrated Gift Edition for $2, the 1903 Limp Walrus Edition for $2, or the 1903 Limp Lizard Series for $1.50. (If I start a band, I’m going to call it Limp Lizard.) Big gaps in my knowledge remain–I wasn’t able to find pricing information for the 1850 first edition or the 1907 Edgewood Edition, or for many of the unauthorized editions. However, without the ability to search across a vast collection of texts, I doubt I would have been able to find much of the pricing information at all, particularly in the book advertisements that appeared in magazines and at the end of books, as publishers promoted other books in their catalog. I probably should have known to look for information about Reveries in book catalogs and late nineteenth-century issues of Publishers Weekly, but Google Book Search sure made it easy for me to find relevant information.
  • Response to the copyright expiration: In one of Scribner’s letters to Mitchell, I found a copy of an ad Scribner’s planned to run promoting its cheap edition and asserting that some portions of Reveries (the new prefaces) remained in copyright.  In Publishers Weekly from 1893, I found what I think is that very ad.  I wondered whether Scribner’s was unique in handling copyright expiration by releasing a cheap edition and asserting continued copyright over some sections. Apparently not. Right after a Scribner’s ad warning that “An action will be promptly brought against any one infringing upon the author rights,” I saw a similar ad from J. B. Lippincott Company for Susan Warner’s The Wide, Wide World, reminding “the trade” that the illustrations remained in copyright and promoting a new 75-cent cheap edition.
  • Marketing: By examining over 25 ads for Reveries available through Google Books, I’ve noticed some (fairly unsurprising) patterns.  Although the book was in Scribner’s catalog throughout the late 19th century, promotion of the book was ramped up when new editions were issued; the publisher often took out full-page ads or put Reveries at the top of ads announcing several books.  By the 1890s, Scribner’s was describing Reveries as “an American classic” and predicting that the book would win over “fresh fields” of new readers.  Although I’ve found few ads from competing publishers, Bobbs-Merrill came out with an eye-catching ad for its illustrated gift edition in 1906.  So that I have a visual record of what I’ve looked at, I’ve set up a Google Notebook with clippings of ads for and reviews of Reveries that I found in Google Books.  Creating the notebook was easy; if the book is in the public domain, you can clip out sections of text and post them to your Google Notebook or Blogger blog. (If only you could post to a WordPress blog, or Flickr…)
  • Versions of Reveries: I expected to find more editions of Reveries in Google Books.  When I did a title search for “Reveries of a Bachelor,” only 21 results were returned, and only 4 of those are available as full view, even though 20 were published before 1921 and are in the public domain. (The other is a large-print reprint edition from 2008.)  By contrast, the Open Content Alliance provides full access to 18 versions of Reveries, including an 1889 edition marked “Book digitized by Google from the library of the New York Public Library and uploaded to the Internet Archive by user tpb.” (By the way, tpb has apparently uploaded a number of Google Books into the Open Content Alliance, prompting some folks to complain about the “pollution” of the OCA by “marginal” Google content.) So why are so many public domain texts in Google Books not fully available?  I’m not really sure, although Planet Google says that Google Books contains metadata (catalog) records for works that it did not digitize and thus are not in its collection.  In any case, if you’re interested in the physical form of books, the Open Content Alliance seems to be a better source than Google Books, since every page is scanned in full color (except, of course, what’s been uploaded from Google Books) and is presented in a book-like interface, with flippable pages.  You can download PDF, plain text, and DjVu versions, which promotes (re-)use and analysis of the books. I should note that the Open Content Alliance has its own quirks.  OCA content appears to be available through two online collections: the Internet Archive and Open Library.  It’s not immediately obvious how to do a full-text search in OCA; it seems that you can only search bibliographic metadata in the Internet Archive, but you can do full-text search at the Open Library.  To do so, go to the advanced search (http://openlibrary.org/advanced) and enter your query into the search box at the bottom.
Another quirk: you can’t see front covers in OCA in the flip-view interface, though you can if you look at the DjVu files. On the other hand, it’s even easier to put page images from OCA content into a Google Notebook; whereas in Google Books you have to crop out a section of a page and select where to send it, with OCA you just right-click and send the entire page image to your notebook. (For instance, I created a notebook for different editions of Reveries, documenting illustrations, title pages, etc.)

Limitations of Google Books

  • As noted above, not all public domain materials are available
  • Weirdness in the retrieval of search results: an initial count of 800 results can suddenly shrink to 220 as you work your way through them
  • OCR errors: Among the different variations of “Ik Marvel” and “Reveries of a Bachelor; A Book of the Heart” that I found:
    o    IK MABVEL
    o    Heveries of a Bachelor (a search for this term yields 10 results in Google Books)
    o    REVERIES OF A BACHELOR; or, a Rook of the Heart
    o    REVERIES OF A BACHELOR; or, a Bonk of the Heart.
    o    Reveries of a Bad elor.
    o    REVERIES OF A BACHELOR, a Boob of the Heart. By IK. MAETEL
    You have to be resourceful, then, in how you construct a search, taking into account OCR problems.  That said, “Reveries of a Bachelor” returned hundreds of results.
  • Google Books does not contain archival materials. (Google has moved into digitizing newspapers and magazines, so who knows–maybe archives are coming? But it would be very tricky and expensive for Google to undertake such a project.)  Although searching Google Books is certainly more convenient than visiting an archive, I love being in archives, looking at stuff that few others have seen.  Even though I found a lot of useful resources in Google Books, I learned the most about the publishing history of Reveries by examining the letters from Charles Scribner II to Mitchell held by the Beinecke Library at Yale and by examining the volumes referenced in the letters.
  • If you’re interested in bibliography, as I am, looking at even a high quality scan can’t substitute for examining the physical volume, studying details such as the size of the book, the quality of the paper, the bindings, etc. But scans can give you an idea of what the volume looks like and help you to identify it.
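Being resourceful about OCR-garbled titles can itself be partly mechanized. The sketch below generates single-character-swap variants of a search term; the confusion table is my own illustrative guess, drawn from the garbled titles listed above, not an exhaustive or empirically derived mapping:

```python
# Illustrative table of OCR character confusions (not exhaustive),
# suggested by variants like "Rook", "Bonk", and "Boob" for "Book".
CONFUSIONS = {"B": ["R", "H"], "o": ["n"], "k": ["b"], "c": ["e"]}

def ocr_variants(term):
    """Return the term plus variants with one character swapped
    for a plausible OCR misreading."""
    variants = {term}
    for i, ch in enumerate(term):
        for sub in CONFUSIONS.get(ch, []):
            variants.add(term[:i] + sub + term[i + 1:])
    return sorted(variants)

print(ocr_variants("Book"))
```

Feeding each variant back into the search box is tedious but can surface editions that a single "correct" query misses.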

In my next post, I’ll look at how using Google Books is helping me reconstruct the history of readers’ responses to Reveries.

Work Product Blog

Matt Wilkens, post-doctoral fellow at Rice’s Humanities Research Center, recently launched Work Product, a blog that chronicles his research in digital humanities, contemporary fiction, and literary theory.  Matt details how he is working through the challenges he faces as he tries to analyze the relationship between allegory and revolution by using text mining, such as:
•    Where and how to get large literary corpora: Matt looks at how much content is available through Project Gutenberg, the Open Content Alliance, Google Books, and HathiTrust, and how difficult it is to access
•    Evaluating part-of-speech taggers, with information about speed and accuracy

I think that other researchers working on text mining projects will benefit from Matt’s careful documentation of his process.

By the way, Matt’s blog can be thought of as part of the movement called “open notebook science,” which Jean-Claude Bradley defines as “a laboratory notebook… that is freely available and indexed on common search engines.”  Other humanities and social sciences blogs that are likewise ongoing explorations of particular research projects include Wesley Raabe’s blog, Another Anthro Blog, and Erkan’s Field Diary.  (Please alert me to others!)

Is Wikipedia Becoming a Respectable Academic Source?

Last year a colleague in the English department described a conversation in which a friend revealed a dirty little secret: “I use Wikipedia all the time for my research—but I certainly wouldn’t cite it.”  This got me wondering: How many humanities and social sciences researchers are discussing, using, and citing Wikipedia?  To find out, I searched Project Muse and JSTOR, leading electronic journal collections for the humanities and social sciences, for the term “wikipedia,” which picked up both references to Wikipedia and citations of the wikipedia URL.  I retrieved 167 results from between 2002 and 2008, all but 8 of which came from Project Muse.  (JSTOR covers more journals and a wider range of disciplines but does not provide access to issues published in the last 3-5 years.)  In contrast, Project Muse lists 149 results in a search for “Encyclopedia Britannica” between 2002 and 2008, and JSTOR lists 3.  I found that citations of Wikipedia have been increasing steadily: from 1 in 2002 (not surprisingly, by Yochai Benkler) to 17 in 2005 to 56 in 2007. So far Wikipedia has been cited 52 times in 2008, and it’s only August.
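Once search results like these are exported (e.g., as a Zotero collection), per-year tallies take only a few lines of code. Here is a minimal sketch; the list of years is a stand-in reconstructed from the counts reported above, not the actual bibliographic export:

```python
from collections import Counter

# Stand-in data: one entry per citing article, rebuilt from the
# per-year counts reported in the text (2002: 1, 2005: 17,
# 2007: 56, 2008: 52); intermediate years are omitted here.
years = [2002] * 1 + [2005] * 17 + [2007] * 56 + [2008] * 52

by_year = Counter(years)
for year in sorted(by_year):
    print(year, by_year[year])
```

The same Counter approach works for tallying citations by journal or by discipline once those fields are in the export.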

Along with the increasing number of citations, another indicator that Wikipedia may be gaining respectability is its citation by well-known scholars.  Indeed, several scholars both cite Wikipedia and are themselves subjects of Wikipedia entries, including Gayatri Spivak, Yochai Benkler, Hal Varian, Henry Jenkins, Jerome McGann, Lawrence Buell, and Donna Haraway.

111 of the sources (66.5%) are what I call “straight citations” (citations of Wikipedia without commentary about it), while 56 (33.5%) comment on Wikipedia as a source, either positively or negatively.  14.5% of the total citations come from literary studies, 14% from cultural studies, 11.4% from history, and 6.6% from law. Researchers cite Wikipedia on a diversity of topics, ranging from the military-industrial complex to horror films to Bush’s second state of the union speech.  8 use Wikipedia simply as a source for images (such as an advertisement for Yummy Mummy cereal or a diagram of the architecture of the Internet).  Many employ Wikipedia either as a source for information about contemporary culture or as a reflection of contemporary cultural opinion.  For instance, to illustrate how novels such as The Scarlet Letter and Uncle Tom’s Cabin have been sanctified as “Great American Novels,” Lawrence Buell cites the Wikipedia entry on “Great American Novel” (Buell).

About a third of the articles I looked at discuss the significance of Wikipedia itself.  14 (8%) criticize using it in research.  For instance, a reviewer of a biography of Robert E. Lee tsk-tsks:

The only curiosities are several references to Wikipedia for information that could (and should) have been easily obtained elsewhere (battle casualties, for example). Hopefully this does not portend a trend toward normalizing this unreliable source, the very thing Pryor decries in others’ work. (Margolies).

In contrast, 11 (6.6%) cite Wikipedia as a model for participatory culture.  For example:

The rise of the net offers a solution to the major impediment in the growth and complexification of the gift economy, that network of relationships where people come together to pursue public values. Wikipedia is one example. (DiZerega)

A few (1.8%) cite Wikipedia self-consciously, aware of its limitations but asserting its relevance for their particular project:

Citing Wikipedia is always dicey, but it is possible to cite a specific version of an entry. Start with the link here, because cybervandals have deleted the list on at least one occasion. For a reputable “permanent version” of “Alternative press (U.S. political right)” see: http://en.wikipedia.org/w/index.php?title=Alternative_press_%28U.S._political_right%29&oldid=107090129 (Berlet).

Of course, just because more researchers—including some prominent ones—are citing Wikipedia does not mean it’s necessarily a valid source for academic papers.  However, you can begin to see academic norms shifting as more scholars find useful information in Wikipedia and begin to cite it.  As Christine Borgman notes, “Scholarly documents achieve trustworthiness through a social process to assure readers that the document satisfies the quality norms of the field” (Borgman 84).  As a possible sign of academic norms changing in some disciplines, several journals, particularly those focused on contemporary culture, include 3 or more articles that reference Wikipedia: Advertising and Society Review (7 citations), American Quarterly (3 citations), College Literature (3 citations), Computer Music Journal (5 citations), Indiana Journal of Global Legal Studies (3 citations), Leonardo (8 citations), Library Trends (5 citations), Mediterranean Quarterly (3 citations), and Technology and Culture (3 citations).

So can Wikipedia be a reputable scholarly resource?  I typically see four main criticisms of Wikipedia:

1) Research projects shouldn’t rely upon encyclopedias. Even Jimmy Wales, (co?-)founder of Wikipedia, acknowledges “I still would say that an encyclopedia is just not the kind of thing you would reference as a source in an academic paper. Particularly not an encyclopedia that could change instantly and not have a final vetting process” (Young).  But an encyclopedia can be a valid starting point for research.  Indeed, The Craft of Research, a classic guide to research, advises that researchers consult reference works such as encyclopedias to gain general knowledge about a topic and discover related works (80).  Wikipedia covers topics often left out of traditional reference works, such as contemporary culture and technology.  Most if not all of the works I looked at used Wikipedia to offer a particular piece of background information, not as a foundation for their argument.

2) Since Wikipedia is constantly undergoing revisions, it is too unstable to cite; what you read and verified today might be gone tomorrow–or even in an hour.  True, but Wikipedia is developing the ability for a particular version of an entry to be vetted by experts and then frozen, so researchers could cite an authoritative, unchanging version (Young).  As the above citation from Berlet indicates, you can already provide a link to a specific version of an article.
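Permanent-version links of the kind Berlet cites follow a predictable pattern, so they can be assembled mechanically. Here is a minimal sketch; the revision id in the example is made up for illustration, and you would copy the real one from an article’s history page:

```python
from urllib.parse import urlencode

def permalink(title, oldid, lang="en"):
    """Build a permanent link to one specific revision of a
    Wikipedia article, in the oldid style Berlet uses."""
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return "https://{}.wikipedia.org/w/index.php?{}".format(lang, query)

# 123456789 is a hypothetical revision id, for illustration only.
print(permalink("Great American Novel", 123456789))
```

Because the oldid pins one frozen revision, a reader following such a link sees exactly the text the author verified, regardless of later edits or vandalism.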

3) You can’t trust Wikipedia because anyone—including folks with no expertise, strong biases, or malicious (or silly) intent—can contribute to it anonymously.  Yes, but through the back and forth between “passionate amateurs,” experts, and Wikipedia guardians protecting against vandals, good stuff often emerges. As Nicholson Baker, who has himself edited Wikipedia articles on topics such as Brooklyn Heights and the painter Emma Fordyce MacRae, notes in a delightful essay about Wikipedia, “Wikipedia was the point of convergence for the self-taught and the expensively educated. The cranks had to consort with the mainstreamers and hash it all out” (Baker).  As Roy Rosenzweig found in a detailed analysis of Wikipedia’s appropriateness for historical research, the quality of the collaboratively produced Wikipedia entries can be uneven: certain topics are covered in greater detail than others, and the writing can have the choppy, flat quality of something composed by committee.  But Rosenzweig also concluded that Wikipedia compares favorably with Encarta and Encyclopedia Britannica for accuracy and coverage.

4) Wikipedia entries lack authority because there’s no peer review. Well, depends on how you define “peer review.”  Granted, Wikipedia articles aren’t reviewed by two or three (typically anonymous) experts in the field, so they may lack the scholarly authority of an article published in an academic journal.  However, articles in Wikipedia can be reviewed and corrected by the entire community, including experts, knowledgeable amateurs, and others devoted to Wikipedia’s mission to develop, collect and disseminate educational content (as well as by vandals and fools, I’ll acknowledge).  Wikipedia entries aim to achieve what Wikipedians call “verifiability”; the article about Barack Obama, for instance, has as many footnotes as a law review article–171 at last count (August 31), including several from this week.

Now I’m certainly not saying that Wikipedia is always a good source for an academic work–there is some dreck in it, as in other sources.  Ultimately, I think Wikipedia’s appropriateness as an academic source depends on what is being cited and for what purpose.   Alan Liu offers students a sensible set of guidelines for the appropriate use of Wikipedia, noting that it, like other encyclopedias, can be a good starting point, but that it is “currently an uneven resource” and always in flux.  Instead of condemning Wikipedia outright, professors should help students develop what Henry Jenkins calls “new media literacies.”  By examining the history and discussion pages associated with each article, for instance, students can gain insight into how knowledge is created and how to evaluate a source.  As John Seely Brown and Richard Adler write:

The openness of Wikipedia is instructive in another way: by clicking on tabs that appear on every page, a user can easily review the history of any article as well as contributors’ ongoing discussion of and sometimes fierce debates around its content, which offer useful insights into the practices and standards of the community that is responsible for creating that entry in Wikipedia. (In some cases, Wikipedia articles start with initial contributions by passionate amateurs, followed by contributions from professional scholars/researchers who weigh in on the “final” versions. Here is where the contested part of the material becomes most usefully evident.) In this open environment, both the content and the process by which it is created are equally visible, thereby enabling a new kind of critical reading—almost a new form of literacy—that invites the reader to join in the consideration of what information is reliable and/or important.(Brown & Adler)

OK, maybe Wikipedia can be a legitimate source for student research papers–and furnish a way to teach research skills.  But should it be cited in scholarly publications?  In “A Note on Wikipedia as a Scholarly Source of Record,” part of the preface to Mechanisms, Matt Kirschenbaum offers a compelling explanation of why he cited Wikipedia, particularly when discussing technical documentation:

Information technology is among the most reliable content domains on Wikipedia, given the high interest of such topics to Wikipedia’s readership and the consequent scrutiny they tend to attract.  Moreover, the ability to examine page histories on Wikipedia allows a user to recover the editorial record of a particular entry… Attention to these editorial histories can help users exercise sound judgment as to whether or not the information before them at any given moment is controversial, and I have availed myself of that functionality when deciding whether or not to rely on Wikipedia. (Kirschenbaum xvii)

With Wikipedia, as with other sources, scholars should use critical judgment in analyzing its reliability and appropriateness for citation.  If scholars carefully evaluate a Wikipedia article’s accuracy, I don’t think there should be any shame in citing it.

For more information, review the Zotero report detailing all of the works citing Wikipedia, or take a look at a spreadsheet of basic bibliographic information. I’d be happy to share my bibliographic data with anyone who is interested.

Works Cited

Baker, Nicholson. “The Charms of Wikipedia.” The New York Review of Books 55.4 (2008). 30 Aug 2008 <http://www.nybooks.com/articles/21131>.

Berlet, Chip. “The Write Stuff: U.S. Serial Print Culture from Conservatives out to Neonazis.” Library Trends 56.3 (2008): 570-600. 24 Aug 2008 <http://muse.jhu.edu/journals/library_trends/v056/56.3berlet.html>.

Booth, Wayne C., Gregory G. Colomb, and Joseph M. Williams. The Craft of Research. Chicago: U of Chicago P, 2003.

Borgman, Christine L. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass.: MIT Press, 2007.

Brown, John Seely, and Richard P. Adler. “Minds on Fire: Open Education, the Long Tail, and Learning 2.0.” EDUCAUSE Review 43.1 (2008): 16-32. 29 Aug 2008 <http://connect.educause.edu/Library/EDUCAUSE+Review/MindsonFireOpenEducationt/45823?time=1220007552>.

Buell, Lawrence. “The Unkillable Dream of the Great American Novel: Moby-Dick as Test Case.” American Literary History 20.1 (2008): 132-155. 24 Aug 2008 <http://muse.jhu.edu/journals/american_literary_history/v020/20.1buell.pdf>.

Dee, Jonathan. “All the News That’s Fit to Print Out.” The New York Times 1 Jul 2007. 30 Aug 2008 <http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html>.

DiZerega, Gus. “Civil Society, Philanthropy, and Institutions of Care.” The Good Society 15.1 (2006): 43-50. 24 Aug 2008 <http://muse.jhu.edu/journals/good_society/v015/15.1diZerega.html>.

Jenkins, Henry. “What Wikipedia Can Teach Us About the New Media Literacies (Part One).” Confessions of an Aca/Fan 26 Jun 2007. 30 Aug 2008 <http://www.henryjenkins.org/2007/06/what_wikipedia_can_teach_us_ab.html>.

Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass.: MIT Press, 2008.

Liu, Alan. “Student Wikipedia Use Policy.” 1 Apr 2007. 30 Aug 2008 <http://www.english.ucsb.edu/faculty/ayliu/courses/wikipedia-policy.html>.

Margolies, Daniel S. “Robert E. Lee: Heroic, But Not the Polio Vaccine.” Reviews in American History 35.3 (2007): 385-392. 25 Aug 2008 <http://muse.jhu.edu/journals/reviews_in_american_history/v035/35.3margolies.html>.

Rosenzweig, Roy. “Can History Be Open Source? Wikipedia and the Future of the Past.” The Journal of American History 93.1 (2006): 117-146. <http://chnm.gmu.edu/resources/essays/d/42>.

Young, Jeffrey. “Wikipedia’s Co-Founder Wants to Make It More Useful to Academe.” Chronicle of Higher Education 13 Jun 2008. 28 Aug 2008 <http://chronicle.com/free/v54/i40/40a01801.htm?utm_source=at&utm_medium=en>.

Doing Digital Scholarship: Presentation at Digital Humanities 2008

Note: Here is roughly what I said during my presentation at Digital Humanities 2008 in Oulu, Finland (or at least meant to say—I was so sleep deprived thanks to the unceasing sunshine that I’m not sure what I actually did say).  My session, which explored the meaning and significance of “digital humanities,” also featured rich, engaging presentations by Edward Vanhoutte on the history of humanities computing and John Walsh on comparing alchemy and digital humanities.  My presentation reports on my project to remix my dissertation as a work of digital scholarship and synthesizes many of my earlier blog posts to offer a sort of Reader’s Digest condensed version of my blog for the past 7 months. By the way, sorry that I’ve been away from the blog for so long.  I’ve spent the last month and a half researching and writing a 100-page report on archival management software, reviewing essays, performing various other professional duties, and going on both a family vacation to San Antonio and a grown-up vacation to Portland, OR (vegan meals followed up by Cap’n Crunch donuts; it took me a week to recover from the donut hangover).  In the meantime, lots of ideas have been brewing, so expect many new blog entries soon.

***

When I began working on my dissertation in the mid 1990s, I used a computer primarily to do word processing—and goof off with Tetris.  Although I used digital collections such as Early American Fiction and Making of America for my dissertation project on bachelorhood in 19th-century American literature, I did much of my research the old-fashioned way: flipping through the yellowing pages of 19th-century periodicals on the hunt for references to bachelors, taking notes using my lucky leaky fountain pen.  I relied on books for my research and, in the end, produced a book.

At the same time that I was dissertating, I was also becoming enthralled by the potential of digital scholarship through my work at the University of Virginia’s (late lamented) Electronic Text Center.  I produced an electronic edition of the first section from Donald Grant Mitchell’s bestseller Reveries of a Bachelor that allowed readers to toggle between variants.   I even convinced my department to count Perl as a second language, citing the Matt Kirschenbaum precedent (“come on, you let Matt do it, and look how well that turned out”) and the value of computer languages to my profession as a budding digital humanist.  However, I decided not to create an electronic version of my dissertation (beyond a carefully backed-up Word file) or to use computational methods in doing my research, since I wanted to finish the darn thing before I reached retirement age.

Last year, five years after I received my PhD and seven years after I had become the director of Rice University’s Digital Media Center, I was pondering the potential of digital humanities, especially given mass digitization projects and the emergence of tools such as TAPOR and Zotero.  I wondered: What is digital scholarship, anyway?  What does it take to produce digital scholarship? What kind of digital resources and tools are available to support it? To what extent do these resources and tools enable us to do research more productively and creatively? What new questions do these tools and resources enable us to ask? What’s challenging about producing digital scholarship? What happens when scholars share research openly through blogs, institutional repositories, & other means?

I decided to investigate these questions by remixing my 2002 dissertation as a work of digital scholarship.  Now I’ll acknowledge that my study is not exactly scientific—there is a rather subjective sample of one.  However, I figured, somewhat pragmatically, that the best way for me to understand what digital scholars face was to do the work myself.  I set some loose guidelines: I would rely on digital collections as much as possible and would experiment with tools for analyzing, annotating, organizing, comparing and visualizing digital information.  I would also explore different ways of representing my ideas, such as hypertextual essays and videos.  Embracing social scholarship, I would do my best to share my work openly and make my research process transparent.  So that the project would be fun and evolve organically, I decided to follow my curiosity wherever it led me, imagining that I would end up with a series of essays on bachelorhood in 19th century American culture and, as sort of an exoskeleton, meta-reflections on the process of producing digital scholarship.

My first challenge was defining digital scholarship.  The ACLS Commission on Cyberinfrastructure’s report points to five manifestations of digital scholarship: collection building, tools to support collection building, tools to support analysis, using tools and collections to produce “new intellectual products,” and authoring tools.   Some might argue we shouldn’t really count tool and collection building as scholarship.  I’ll engage with this question in more detail in a future post, but for now let me say that most consider critical editions, bibliographies, dictionaries and collations, arguably the collections and tools of the pre-digital era, to be scholarship.  In many cases, building academic tools and collections requires significant research and expertise and results in the creation of knowledge—so, scholarship.   Still, my primary focus is on the fourth aspect, leveraging digital resources and tools to produce new arguments.  I’m realizing along the way, though, that I may need to build my own personal collections and develop my own analytical tools to do the kind of scholarship I want to do.

In a recent presentation at CNI, Tara McPherson, the editor of Vectors, offered her own “Typology of Digital Humanities”:
•    The Computing Humanities: focused on building tools, infrastructure, standards and collections, e.g. The Blake Archive
•    The Blogging Humanities: networked, peer-to-peer, e.g. Crooked Timber
•    The Multimodal Humanities: “bring together databases, scholarly tools, networked writing, and peer-to-peer commentary while also leveraging the potential of the visual and aural media that so dominate contemporary life,” e.g. Vectors

Mashing up these two frameworks, my own typology would look something like this:

•    Tools, e.g. TAPOR, Zotero
•    Collections, e.g. The Blake Archive
•    Theories, e.g. McGann’s Radiant Textuality
•    Interpretations and arguments that leverage digital collections and tools, e.g. Ayers and Thomas’ The Difference Slavery Made
•    Networked Scholarship: a term that I borrow from the Institute for the Future of the Book’s Bob Stein and that I prefer to “blogging humanities,” since it encompasses many modes of communication, such as wikis, social bookmarking, institutional repositories, etc. Examples include Savage Minds (a group blog in anthropology), etc.
•    Multimodal scholarship: e.g. scholarly hypertexts and videos, e.g. what you might find in Vectors
•    Digital cultural studies, e.g. game studies, Lev Manovich’s work, etc (this category overlaps with theories)

Initially I assumed that tools, theories and collections would feed into arguments that would be expressed as networked and/or multimodal scholarship and be informed by digital cultural studies.  But I think that describing digital scholarship as a sort of assembly line in which scholars use tools, collections and theories to produce arguments oversimplifies the process.  My initial diagram of digital scholarship pictured single-headed arrows linking different approaches to digital scholarship; my revised diagram looks more like spaghetti, with arrows going all over the place.  Theories inform collection building; the process of blogging helps to shape an argument; how a scholar wants to communicate an idea influences what tools are selected and how they are used.

After coming up with a preliminary definition of what I wanted to do, I needed to figure out how to structure my work.  I thought of John Unsworth’s notion of scholarly primitives, a compelling description of core research practices.  Depending on how you count them, Unsworth identifies 7 scholarly primitives:
•    Discovering
•    Annotating
•    Comparing
•    Referring
•    Sampling
•    Illustrating
•    Representing

As useful as this list is in crystallizing what scholars do, I think the list is missing at least one more crucial scholarly primitive, perhaps the fundamental one: collaboration. Although humanists are stereotyped as solitary scholars isolated in the library, they often work together, whether through co-editing journals or books, sharing citations, or reviewing one another’s work.  In the digital humanities, of course, developing tools, standards, and collections demands collaboration among scholars, librarians, programmers, etc.  I would also define networked scholarship—blogging, contributing to wikis, etc—as collaborative, since it requires openly sharing ideas and supports conversation. It’s only appropriate for me to note that this idea was worked out collaboratively, with colleagues at THAT Camp.

I want to make my research process as visible as possible, not only for idealistic reasons, but also because my work only gets better the more feedback I receive.  So I started up a blog—actually, several of them. At the somewhat grandly-named Digital Scholarship in the Humanities, I reflect on trends in the digital humanities and on broader lessons learned in the process of doing my research project.  In “Lisa Spiro’s Research Notes,”  I typically address stuff that seems too specialized, half-baked, or even raw for me to put on my main blog, such as my navel gazing on where to take my project next, or my experiments with Open Wound, a language re-mixing tool.   At my PageFlakes research portal, I provide a single portal to the various parts of my research project, offering RSS feeds for both of my blogs as well as for a Google News search of the term “digital humanities,” my delicious bookmarks for “digital scholarship,” links to my various digital humanities projects, and more.

I’ll admit that when I started my experiments with social scholarship I worried that no one would care, or that I would embarrass myself by writing something really stupid, but so far I’ve loved the experience.  Through comments and emails from readers, I’m able to see other perspectives and improve my own thinking.  I’ve heard from biologists and anthropologists as well as literary scholars and historians, and I’ve communicated with researchers from several countries.  As a result, I feel more engaged in the research community and more motivated to keep working.   Although I know blogging hasn’t caught on in every corner of academia, I think it has been good for my career as a digital humanist.  I am more visible and thus have more opportunities to participate in the community, such as by reviewing book proposals, articles, and grant applications.

I don’t have space to discuss the relevance of each scholarly primitive to my project, but I did want to mention a few of them: discovering, comparing, and representing.

Discovering

In order to use text analysis and other tools, I needed my research materials to be in an electronic format.  In the age of mass digitization projects such as Google Books and the Open Content Alliance, I wondered how many of my 296 original research sources are digitized & available in full text.  So I diligently searched Google Books and several other sources to find out.  I looked at 5 categories: archival resources as well as primary and secondary books and journals.   I found that with the exception of archival materials, over 90% of the materials I cited in my bibliography are in a digital format.  However, only about 83% of primary resources and 37% of the secondary materials are available as full text.  If you want to use text analysis tools on 19th century American novels or 20th century articles from major humanities journals, you’re in luck, but the other stuff is trickier because of copyright constraints.  (I’ll throw in another scholarly primitive, annotation, and say that I use Zotero to manage and annotate my research collections, which has made me much more efficient and allowed me to see patterns in my research collections.)

Of course, scholars need to be able to trust the authority of electronic resources.  To evaluate quality, I focused on four collections that have a lot of content in my field, 19th century American literature: Google Books, Open Content Alliance, Early American Fiction (a commercial database developed by UVA’s Electronic Text Center), and Making of America.  I found that there were some scanning errors with Google Books, but not as many as I expected. I wished that Google Books provided full text rather than PDF files of its public domain content, as do Open Content Alliance and Making of America (and EAF, if you just download the HTML).  I had to convert Google’s PDF files to Adobe Tagged Text XML and got disappointing results.  The OCR quality for Open Content Alliance was better, but words were not joined across line breaks, reducing accuracy.  With multi-volume works, neither Open Content Alliance nor Google Books provided very good metadata.  Still, I’m enough of a pragmatist to think that having access to this kind of data will enable us to conduct research across a much wider range of materials and use sophisticated tools to discern patterns – we just need to be aware of the limitations.
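Some of these OCR artifacts are scriptable to clean up. As a minimal sketch (my own illustration, not how any of these collections actually process their scans), a regular expression can rejoin words that were hyphenated across line breaks:

```python
import re

def join_linebreak_hyphens(ocr_text: str) -> str:
    """Rejoin words that OCR split across a line break with a hyphen,
    e.g. 'senti-\\nmental' -> 'sentimental'."""
    return re.sub(r"(\w+)-\n(\w+)", r"\1\2", ocr_text)

sample = "The bachelor's senti-\nmental reveries"
print(join_linebreak_hyphens(sample))  # The bachelor's sentimental reveries
```

A real pipeline would need to be more careful, since some hyphens at line ends belong to genuinely hyphenated compounds, but even a crude pass like this improves word-frequency accuracy.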

Comparing
To evaluate the power of text analysis tools for my project, I did some experiments using TAPOR tools, including a comparison of two of my key bachelor texts: Mitchell’s Reveries of a Bachelor, a series of a bachelor’s sentimental dreams (sometimes nightmares) about what it would be like to be married, and Melville’s Pierre, which mixes together elements of sentimental fiction, Gothic literature, and spiritualist tracts to produce a bitter satire.   I wondered if there was a family resemblance between these texts.  First I used the Wordle word cloud generator to reveal the most frequently appearing words.  I noted some significant overlap, including words associated with family such as mother and father, those linked with the body such as hand and eye, and those associated with temporality, such as morning, night, and time.  To develop a more precise understanding of how frequently terms appeared in the two texts and their relation to each other, I used TAPOR’s Comparator tool.  This tool also revealed words unique to each work, such as “flirt” and “sensibility” in the case of Reveries, and “ambiguities” and “miserable” in the case of Pierre.  Finally, I used TAPOR’s concordance tool to view key terms in context.  I found, for instance, that in Mitchell “mother” is often associated with hands or heart, while in Melville it appears with terms indicating anxiety or deceit.  By abstracting out frequently occurring and unique words, I can see how Melville, in a sense, remixes elements of sentimental fiction, putting terms in a darker context.  The text analysis tools provide a powerful stimulus to interpretation.
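At a toy scale, the kind of comparison the Comparator performs can be sketched in a few lines of Python. This is my own illustration, not TAPOR’s implementation; the tokenization is deliberately crude (lowercase, letters and apostrophes only), where real tools also handle stop words, stemming, and so on:

```python
from collections import Counter
import re

def word_counts(text: str) -> Counter:
    # Crude tokenizer: lowercase, keep runs of letters and apostrophes
    return Counter(re.findall(r"[a-z']+", text.lower()))

def shared_and_unique(text_a: str, text_b: str):
    """Words common to both texts (with per-text counts), plus words
    unique to each -- the core of a two-text comparison."""
    ca, cb = word_counts(text_a), word_counts(text_b)
    shared = {w: (ca[w], cb[w]) for w in ca.keys() & cb.keys()}
    return shared, set(ca) - set(cb), set(cb) - set(ca)

# Tiny stand-ins for the two novels
reveries = "the bachelor dreams by the fire of a flirt"
pierre = "the mother conceals a miserable ambiguity by the fire"
shared, only_reveries, only_pierre = shared_and_unique(reveries, pierre)
print(shared)         # words in both texts, with counts from each
print(only_reveries)  # words unique to the first text
```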

Representing
Not only am I using the computer to analyze information, but also to represent my ideas in a more media-rich, interactive way than the typical print article.  I plan to experiment with Sophie as a tool for authoring multimodal scholarship, and I’m also experimenting with video as a means for representing visual information. Right now I’m reworking an article on the publication history of Reveries of a Bachelor as a video so that I can show significant visual information such as bindings, illustrations, and advertisements.    I’ve condensed a 20+ page article into a 7 minute narrative, which for a prolix person like me is rough.  I also have been challenged to think visually and cinematically, considering how the movement of the camera and the style of transitions shape the argument.  Getting the right imagery—high quality, copyright free—has been tricky as well.  I’m not sure how to bring scholarly practices such as citation into videos.  Even though my draft video is, frankly, a little amateurish, putting it together has been lots of fun, and I see real potential for video to allow us to go beyond text and bring the human voice, music, movement and rich imagery into scholarly communication.

On Tools
In the course of my experiments in digital scholarship, I often found myself searching for the right tool to perform a certain task.  Likewise, in my conversations with researchers who aren’t necessarily interested in doing digital scholarship, just in doing their research better, I learned that they weren’t aware of digital tools and didn’t know where to find out about them.  To make it easier for researchers to discover relevant tools, I teamed up with 5 other librarians to launch the Digital Research Tools, or DiRT, wiki at the end of May.   DiRT provides a directory of digital research tools, primarily free but also commercial, categorized by their functions, such as “manage citations.”  We are also writing reviews of tools geared toward researchers and trying to provide examples of how these tools are used by the research community.  Indeed, DiRT focuses on the needs of the community; the wiki evolves thanks to its contributors.   Currently 14 people in fields such as anthropology, communications, and educational technology have signed on to be contributors.  Everything is under a Creative Commons attribution license.  We would love to see spin-offs, such as DiRT in languages besides English; DiRT for developers; and Old DiRT (dust?), the hall of obsolete but still compelling tools.  My experiences with DiRT have demonstrated again the beauty of collaboration and sharing.  Both Dan Cohen of CHNM & Alan Liu of UC Santa Barbara generously offered to let us grab content from their own tools directories.  Busy folks have freely given their time to add tools to DiRT.  Through my work on DiRT, I’ve learned about tools outside of my field, such as qualitative data analysis software.

So I’ll end with an invitation: Please contribute to DiRT.  You can sign up to be an editor or reviewer, recommend tools to be added, or provide feedback via our survey.  Through efforts like DiRT, we hope to enable new digital scholarship, raise the profile of inventive digital tools, and build community.

Using Text Analysis Tools for Comparison: Mole & Chocolate Cake

How can text analysis tools enable researchers to study the relationships between texts? In an earlier post, I speculated about the relevance of such tools for understanding “literary DNA”–how ideas are transmitted and remixed–but as one reader observed, intertextuality is probably a more appropriate way of thinking about the topic. In my dissertation, I argue that Melville’s Pierre represents a dark parody of Mitchell’s Reveries of a Bachelor. Melville takes the conventions of sentimental bachelor literature, mixes in elements of the Gothic and philosophic/theological tracts, and produces a grim travesty of bachelor literature that makes the dreaming bachelor a trapped quasi-husband, replaces the rural domestic manor with a crowded urban apartment building, and ends in a real, Hamlet-intense death scene rather than the bachelor coming out of reverie or finding a wife. Would text analysis tools support this analysis, or turn up patterns that I had previously ignored?

I wanted to get a quick visual sense of the two texts, so I plugged them into Wordle, a nifty word cloud generator that enables you to control variables such as layout, font and color. (Interestingly, Wordle came up with the perfect visualizations for each text at random: Pierre white type on a black background shaped into, oh, a chess piece or a tombstone, Reveries a brighter, more casual handwritten style, with a shape like a fish or egg.)

Wordle Word Cloud for Pierre

Wordle Reveries Word Cloud

Using these visual representations of the most frequent words in each book enabled me to get a sense of the totality, but then I also drilled down and began comparing the significance of particular words. I noted, for instance, the importance of “heart” in Reveries, which is, after all, subtitled “A Book of the Heart.” I also observed that “mother” and “father” were given greater weight in Pierre, which is obsessed with twisted parental legacies. To compare the books in even more detail, I decided to make my own mashed up word cloud, placing terms that appeared in both texts next to each other and evaluating their relative weight. I tried to group similar terms, creating a section for words about the body, words about feeling, etc. (I used crop, copy and paste tools in Photoshop to create this mashup, but I’m sure–or I sure hope–there’s a better way.)

Comparison of Reveries and Pierre

(About three words into the project, I wished for a more powerful tool to automatically recognize, extract and group similar words from multiple files, since my eyes ached and I had a tough time cropping out words without also grabbing parts of nearby words. Perhaps each word would be a tile that you drag over to a new frame and move around; ideally, you could click on the word and open up a concordance.) My mashup revealed that in many ways Pierre and Reveries have similar linguistic profiles. For instance, both contain frequently-occurring words focused on the body (face, hand, eye), time (morning, night), thinking, feeling, and family. Perhaps such terms are common in all literary works (one would need to compare these works to a larger literary corpus), but they also seem to reflect the conventions of sentimental literature, with its focus on the family and embodied feeling (see, for instance, Howard).

The word clouds enabled me to get an initial impression of key words in the two books and the overlap between them, but I wanted to develop a more detailed understanding. I used TAPOR’s Comparator to compare the two texts, generating a complete list of how often words appeared in each text and their relative weighting. When I first looked at the word list, I was befuddled:

Word    Reveries counts    Reveries relative counts    Pierre relative    Pierre counts    Relative ratio Reveries:Pierre
blaze   45                 0.0007                      0                  1                109.4667

What does the relative ratio mean? I was starting to regret my avoidance of all math and stats courses in college. But after I worked with the word clouds, the statistics began to make more sense. Oh, relative ratio means how often a word appears in the first text versus the second–“blaze” is much more prominent in Reveries. Ultimately I trusted the concreteness and specificity of numbers more than the more impressionistic imagery provided by the word cloud, but the word cloud opened up my eyes so that I could see the stats more meaningfully. For instance, I found that mother indeed was more significant in Pierre, occurring 237 times vs. 58 times in Reveries. Heart was more important in Reveries (a much shorter work), appearing 199 times vs. 186 times in Pierre. I was surprised that “think” was more significant in Reveries than in Pierre, given the philosophical orientation of the latter. With the details provided by the text comparison results, I could construct an argument about how Melville appropriates the language of sentimentality.
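For readers as stats-averse as I was, the relative ratio is easy to make concrete in code. This is my own reconstruction of the idea rather than TAPOR’s actual formula, and the tiny smoothing constant is an addition of mine to avoid dividing by zero when a word never appears in the second text:

```python
from collections import Counter
import re

def relative_freq(text: str) -> dict:
    """Each word's count divided by the total number of words,
    so texts of different lengths can be compared fairly."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

def relative_ratio(word: str, text_a: str, text_b: str, smoothing: float = 1e-9) -> float:
    """How much more prominent `word` is in text A than in text B."""
    rel_a = relative_freq(text_a)
    rel_b = relative_freq(text_b)
    return rel_a.get(word, 0.0) / (rel_b.get(word, 0.0) + smoothing)

# "fire" makes up 1/3 of text A and 1/2 of text B, so the ratio is about 0.67
print(relative_ratio("fire", "blaze blaze fire", "fire water"))
```

A ratio well above 1 means the word is characteristic of the first text; a huge ratio usually means the word barely appears in the second text at all, as with “blaze” above.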

But the differences between the two texts are perhaps even more interesting than their similarities, since they show how Melville departed from the conventions of male sentimentalism, embraced irony, and infused Pierre with a sort of gothic spiritualism. These differences are revealed more fully in the statistics than the word clouds. A number of terms are unique to each work. For instance, sentimental terms such as “sympathies,” “griefs,” “sensibility” appear frequently in Reveries but never in Pierre, as do romantic words such as “flirt,” “sparkle,” and “prettier.” As is fitting for Melville, Pierre‘s unique language is typically darker, more archaic, abstract, and spiritual/philosophical, and obsessed with the making of art: “portrait,” “writing,” “original,” “ere,” “miserable,” “visible,” “invisible,” “profound(est),” “final,” “vile,” “villain,” “minds,” “mystical,” “marvelous,” “inexplicable,” “ambiguous.” (Whereas Reveries is subtitled “A Book of the Heart,” Pierre is subtitled “The Ambiguities.”) There is a strand of darkness in Mitchell–he uses “sorrow” more than Melville–but then Mitchell uses “pleasure” 14 times to Melville’s 2 times and “pleasant” 43 times. Reveries is more self-consciously focused on bachelorhood; Mitchell uses “bachelor” 28 times to Melville’s 5. Both authors refer to dreaming; Mitchell uses “reveries” 10 times, Melville 7. Interestingly, only Melville uses “America” (14 times).

Looking over the word lists raises all sorts of questions about the themes and imagery of each work and their relationship to each other, but the data can also be overwhelming. If comparing two works yields over 10,000 lines in a spreadsheet, what criteria should you use in deciding what to select (to use Unsworth’s scholarly primitive)? What happens when you throw more works into the mix? I’m assuming that text mining techniques will provide more sophisticated ways of evaluating textual data, allowing you to filter data and set preferences for how much data you get. (I should note that you can exclude terms and set preferences in TAPOR).

Text analysis brings attention to significant features of a text by abstracting those features–for instance, by generating a word frequency list that contains individual words and the number of times they appear. But I kept wondering how the words were used, in what context they appeared. So Melville uses “mother” a lot–is it in a sweetly sentimental way, or does he treat the idea of mother more complexly? By employing TAPOR’s concordance tool, you can view words in context and see that Mitchell often uses mother in association with words like “heart,” “kiss,” “lap,” while in Melville “mother” does appear with “Dear” and “loving,” but also with “conceal,” “torture,” “mockingly,” “repelling,” “pride,” “cruel.” Hmmm. In Mitchell, “hand” most often occurs with “your” and “my,” signifying connection, while “hand” in Pierre is more often associated with action (hand-to-hand combat, “lift my hand in fury,” etc) or with putting hand to brow in anguish. Same word, different resonance. It’s as if Melville took some of the ingredients of sentimental literature and made something entirely different with them, enchiladas mole rather than a chocolate cake.

Word clouds, text comparisons, and concordances open up all sorts of insights, but how does one use this evidence in literary criticism? If I submitted an article full of word count tables to a traditional journal, I bet the editors wouldn’t know what to do with it. But that may change, and in any case text analysis can inform the kind of arguments critics make. My experience playing with text analysis tools verifies, for me, Stephen Ramsay’s recommendation that we “reconceive computer-assisted text analysis as an activity best employed not in the service of a heightened critical objectivity, but as one that embraces the possibilities of that deepened subjectivity upon which critical insight depends.”

Works Cited

Howard, June. “What Is Sentimentality?.” American Literary History 11.1 (1999): 63-81. 22 Jun 2008 <http://alh.oxfordjournals.org/cgi/content/citation/11/1/63>.

Ramsay, Stephen. “Reconceiving Text Analysis: Toward an Algorithmic Criticism.” Lit Linguist Computing 18.2 (2003): 167-174. 27 Nov 2007 <http://llc.oxfordjournals.org/cgi/content/abstract/18/2/167>.