Category Archives: research practices

Presentation on How Digital Humanists Use GitHub

At Digital Humanities 2016, Sean Morey Smith and I presented on our ongoing work examining GitHub as a platform of knowledge for digital humanities. Our results are still preliminary, but we want to share our presentation (PDF). We’re especially grateful to those who agreed to be interviewed for the study and who took our survey. We expect to produce an article (or two) based on our research.

We welcome any questions or feedback.

Studying How Digital Humanists Use GitHub

Over the past academic year, I’ve been fortunate to participate in Rice’s Mellon-sponsored Sawyer Seminar on Platforms of Knowledge, where we’ve examined platforms for authoring, annotation, mapping, and social networking. We’ve discussed both the possibilities that platforms may open up for inquiry, public engagement and scholarly communications and the risks that they may pose for privacy and nuanced humanistic analysis. Inspired by the questions raised by the Seminar, my colleague Sean Smith and I are studying a platform used by a number of digital humanists: GitHub. Digital humanists employ GitHub not only for code, but also for writing projects, syllabi, websites, and other scholarly resources. We’ll present our initial findings at Digital Humanities 2016, but I wanted to offer some background to the study, especially since some of you will soon be receiving emails from me inviting you to participate in it.

Initially I was interested in using GitHub for a case study of how we assess and select digital platforms. Even as many researchers (myself included) rely on digital platforms, I haven’t been able to find many clear rubrics for evaluating them. Building on Quinn Dombrowski’s recommendations for choosing a platform for a web project, we are looking at criteria such as functionality and ease of use. In previous work examining archival management systems, I learned how important it is to talk with users about their experience with tools, so we will be conducting a survey and interviews about GitHub. Sean and I also realized that GitHub itself provides valuable data about how people use it, such as information about collaboration, code re-use, and connections to others. Our study will thus include analysis of publicly available data about selected GitHub users and repositories. (Of course, there is significant prior work on this topic in fields such as social computing that we will draw upon.)

With this project, we are:

  1. Identifying digital humanists who have GitHub accounts. For the purposes of this study, we are looking at presenters at the last three Digital Humanities conferences and people affiliated with organizations that belong to centerNet (assuming that the information is publicly available). Of course, this method is imperfect: it misses digital humanists who didn’t attend the DH conferences or who aren’t affiliated with DH centers, and it may include some people who don’t really consider themselves digital humanists. But it’s a start.
  2. Contacting those whose email addresses are easily retrievable (e.g. available via GitHub) and:
    1. Giving them the opportunity to opt out of having their publicly available GitHub data included in our analysis and in the dataset that we plan to share at the end of the study. (Added 5/18/16: To be extra careful, we plan to anonymize this dataset.)
    2. Inviting them to take a brief survey about their usage and opinions of GitHub.
    3. Inviting them to participate in an interview.

    We may also contact people whose emails aren’t in the GitHub data but are otherwise available.

  3. Analyzing GitHub data from our dataset to gain insight into how digital humanists use GitHub.
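For illustration, here is one way the public repository data described above could be gathered: the GitHub REST API exposes each user’s public repositories, and a few fields per repository speak to collaboration and re-use. This is a minimal sketch, not the study’s actual pipeline; the `fetch_public_repos` helper and the choice of fields are my own illustration.

```python
import json
from urllib.request import urlopen

def fetch_public_repos(user):
    """Fetch a user's public repositories from the GitHub REST API.
    (Unauthenticated calls to this endpoint are rate-limited.)"""
    with urlopen(f"https://api.github.com/users/{user}/repos?per_page=100") as resp:
        return json.load(resp)

def summarize_repos(repos):
    """Reduce raw repository records to fields relevant to a usage study:
    primary language, fork status, and community signals (stars)."""
    return [
        {
            "name": r["name"],
            "language": r["language"],     # None if GitHub can't classify it
            "is_fork": r["fork"],          # forks suggest re-use of others' work
            "stars": r["stargazers_count"],
        }
        for r in repos
    ]

# Offline example using the shape of an API response:
sample = [{"name": "syllabus", "language": None, "fork": False, "stargazers_count": 3}]
print(summarize_repos(sample))
```

A study-scale harvest would authenticate and page through results rather than rely on a single unauthenticated request.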

We want to conduct this study openly while at the same time respecting privacy. In conducting interviews for past studies, I’ve been frustrated that I can’t publicly identify and credit people who have made brilliant comments because of the promise of confidentiality. So we’re giving interviewees the option to make all or some of their interview notes public, but of course they can instead keep the notes private and remain anonymous. Survey data will be anonymized but ultimately shared.

Here are important documents related to our study:

I welcome feedback and questions about this study. I hope that it will contribute to developing criteria for evaluating platforms like GitHub and offer insights into how digital humanities researchers and developers work.

Exploring Digital Humanities and Media History

Some of my favorite conferences are those that are outside my field, since they expose me to new perspectives and enable me to meet new people. Such was the case with the ArcLight Symposium hosted in May by Concordia University. Sponsored by Project ArcLight, a Concordia/University of Wisconsin project to develop web-based tools for analyzing the rise of American media (movies, radio, TV, newspapers) in the 20th century, the symposium brought together film and media historians with digital humanists to explore the possibilities and pitfalls of digital methods for media history. Funded through a Digging into Data grant, the project builds upon the Media History Digital Library (MHDL), which contains 1.5 million pages of digitized trade journals, fan magazines, and other public domain media periodicals.

While some media historians apologized for not being particularly savvy about digital methods, some digital humanists (OK, this one) confessed to having a limited knowledge of film history. But Mark Williams rightly suggested that doing work in digital humanities requires moving outside your comfort zone, and the conference was the richer for people making such leaps. I was reminded of Elijah Meeks’ suggestion that “interloping, more than computational approaches or the digital broadly construed as the object of study, defines digital humanities.” Figuring out new methods or delving into unfamiliar subject areas necessitates interloping.

Rather than summarizing each paper presented at the symposium, I will highlight what emerged as important themes. See also Charlotte Fillmore-Handlon’s conference summary.

Core Principles

  • The importance of archives—physical and digital—to media history research. Such research often requires returning to the original record and paying attention to the material object. For example, as Haidee Wasson studies the history of portable film projectors, she needs to examine the original objects so that she can get a sense of their heft and design. At the same time, digital archives open up rich possibilities for discovering relevant resources and patterns, analyzing them, and sharing them. For example, the Media Ecology Project aims to enable researchers to access resources from media archives, create their own annotations, collections, and publications, and contribute metadata back, using tools such as Mediathread, Scalar and onomy.org.
  • The importance of thinking historically about primary source materials—to understand, for example, the structure, design and cultural history of newspapers, as Paul Moore and Sandra Gabriele pointed out. Likewise, we need to pay attention to the features of digital objects. Ryan Cordell emphasized that we should view digital resources as something entirely different from print, so that a digitized newspaper is not a duplicate of a print edition, but its own typeset edition.
  • The need to consider scope: I referred to Josh Sternfeld’s recommendation that historians deal with digital abundance by extending the principles of scope and provenance into the digital environment. In conducting analysis, it’s important to keep in mind what’s included–and what’s not–in the corpus. As Greg Waller pointed out, studying representation means looking at multiple forms of media—not just film, but sheet music, postcards, literature, etc. Where do you draw the boundaries?

Challenges

  •  Intellectual property: Despite digital plenty, researchers are often limited by paywalls and proprietary databases. MHDL tried to work with a commercial database, but was rebuffed; the global film data that underlies Deb Verhoeven’s Kinomatics research is expensive and cannot be shared. These challenges point to the need to support open projects such as MHDL whenever possible. An audience member urged greater risk-taking, as paranoia about intellectual property can lead scholars, librarians and institutions to hesitate in sharing cultural materials that are public goods.
  •  Ambiguity of humanities data. As many researchers have suggested, humanities data is often messy, dynamic and ambiguous. For example, Deb Verhoeven noted that the locations, size and status of theaters change over time. Likewise, Derek Long acknowledged the challenges of disambiguating film titles that include common words such as “war.”
  •  Reductiveness of ontologies. David Berry asked what happens when information is translated into an ontology, suggesting that DH imports instrumental methods into interpretation. Deb Verhoeven argued that imposing ontologies on diverse humanities data raises both practical and ethical issues. For example, with a typical ontology you can’t say that things aren’t related, and you can’t see who made what assertions. Hence HuNI (Humanities Networked Infrastructure), which brings together 31 Australian cultural datasets, eschews traditional ontologies. Instead, users generate connections through a vernacular linking process, using either free text or already established categories. They can also explore linkages created by others.

Approaches

  •  Searching as frustrating but also fundamental. Searching enables scholars to frame an inquiry, get a sense of what an archive might contain, and discover relevant materials. Eric Hoyt pointed to problems with search, including the risks of confirmation bias and of being overwhelmed by results, but also suggested that search is easily understood and widely used by most scholars. Scaled entity search brings greater power to search, allowing researchers to compare hundreds of entities (such as names, film titles, or radio station call letters) across the corpus and then generate visualizations to explore patterns. For example, you can compile a list of the most frequently mentioned people in silent film (which often turn out to be those who also headed production companies). ArcLight will be released later this summer.
    To create the entity list for film history, ArcLight uses its Early Film Credits (EFC) dataset. This dataset grew out of a 1996 database of American Film Personnel and Company Credits, 1908-1920, which is itself based on a 1976 book. As Derek Long showed, EFC enables you to, for example, generate a list of the number of films produced by a director in a particular year or the number of films made by different production companies (revealing the dominance of a few companies and the “long tail,” as most companies made only a few films).
  •  Pattern matching. As Ryan Cordell noted, searching doesn’t work for everything—for example, you can’t run a search for “show me all instances of reprinting in 19th century newspapers.” Much humanities work involves identifying and interpreting patterns; digital tools can support pattern matching on a much larger scale. For example, the Viral Text Project uses sophisticated algorithms to detect matches in thousands of pages of 19th century newspapers. For a human to take on this work would be nearly impossible. But the computer can reveal significant repeating patterns across a corpus like Chronicling America, enabling scholars to detect the circulation of ideas, debates over the authorship of a sentimental poem, and much more.
  •  Working at multiple scales. As Bobby Allen pointed out, the level of zoom “makes some relations visible but obscures others.”  Paul Moore and Sandra Gabriele spoke to the importance of middle-range reading, or “scalable reading.” Scalable reading involves moving from the macro to the middle to the micro—for example, looking at patterns of circulation at a distance, “following material links across media, form and genre” at the mid-range, and examining the “symbolic order” of a newspaper page close up.
  • Experimentation and iteration: Several speakers used variations of the term “experiment” to describe digital methods. For instance, Deb Verhoeven emphasized that working with the Kinomatics Project’s “big data” documenting the flow of international film across space and time requires experimentation and iteration. Haidee Wasson noted the “experimentalism” of trying out different search strategies, posing different questions of databases. Charles Acland offered an important caution as he suggested that experimentation in DH needs to be influenced by literary theory.
  • The need to engage public audiences. Bobby Allen and Deb Verhoeven spoke to the importance of thinking of humanities work as public goods. Bobby Allen suggested that researchers should “dig where you stand,” but connect local archives to the network. For example, the UNC Digital Innovation Lab’s Digital Loray project enables the public to explore the history of a textile mill in Gastonia, NC, using digital tools to share stories, maps and timelines and engage public audiences.

    Crowdsourcing represents one form of public engagement, but some raised concerns. What distinguishes public expertise from academic? Is crowdsourcing exploitative?

  •   Using visualization to represent complexity: How do you represent the magnitude of global cinema? Laura Horak sketched out a project to create flow maps to study the global circulation of film, describing different ways of visualizing such data and how such visualizations can shape understanding.
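The Viral Texts Project uses far more sophisticated text-alignment algorithms, but the underlying idea of pattern matching at scale can be sketched with simple word n-gram “shingling”: two pages that share enough five-word sequences are candidates for reprinting. A toy illustration with made-up snippets, not the project’s actual method:

```python
from itertools import combinations

def shingles(text, n=5):
    """Return the set of overlapping n-word 'shingles' in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def candidate_reprints(docs, n=5, threshold=2):
    """Flag document pairs sharing at least `threshold` n-grams --
    cheap evidence of possible reprinting, to be verified by a reader."""
    sigs = {name: shingles(text, n) for name, text in docs.items()}
    return [
        (a, b, len(sigs[a] & sigs[b]))
        for a, b in combinations(sigs, 2)
        if len(sigs[a] & sigs[b]) >= threshold
    ]

papers = {
    "gazette": "the beautiful snow fell softly on the silent town last night and all was still",
    "courier": "readers will enjoy this poem the beautiful snow fell softly on the silent town",
    "herald":  "markets rallied today on news of the railroad expansion to the west",
}
print(candidate_reprints(papers, n=5, threshold=2))
```

Here the gazette and courier share the reprinted poem fragment and are flagged, while the herald is not; at the scale of Chronicling America the same idea requires hashing and more forgiving matching to survive OCR errors.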

While participants used examples specific to media history, many of the concerns and approaches explored in the symposium have broader relevance. I was reminded of the importance of cultivating a critical awareness of methods and engaging in the kind of interdisciplinary dialogue that the ArcLight symposium fostered.

Disclosure: Most of my travel costs were covered by the symposium.

Rev, 6/13/15: Corrected spelling error.

Exploring the Significance of Digital Humanities for Philosophy

On February 23, I was honored to speak at an Invited Symposium on Digital Humanities at the American Philosophical Association’s Central Division Meeting in New Orleans. Organized by Cameron Buckner, who is a Founding Project Member of InPhO and one of the leaders of the University of Houston’s Digital Humanities Initiative, the session also featured great talks by Tony Beavers on computational philosophy and David Bourget on PhilPapers.

“Join in,” by G A R N E T

One of the central questions that we explored was why philosophy seems to be less visibly engaged in digital humanities; as Peter Bradley once wondered, “Where Are the Philosophers?” As I noted in my talk, the NEH’s Office of Digital Humanities has only awarded 5 grants in philosophy (4 out of 5 to Colin Allen and colleagues on the InPhO project). Although the APA conference was much smaller than MLA or AHA, I was still surprised that there seemed to be only two sessions on DH, compared to 66 at MLA 2013 and 43 at AHA 2013.

Yet there are some important intersections between DH and philosophy. Beavers pointed to a rich history of scholarship in computational philosophy. With PhilPapers, philosophy is ahead of most other humanities disciplines in having an excellent online index to and growing repository of research. Most of the same challenges faced by philosophers with an interest in DH apply to other domains, such as figuring out how to acquire appropriate training (particularly for graduate students), recognizing and rewarding collaborative work, etc.

My talk was a remix and updating of my presentation “Why Digital Humanities?” In exploring the rationale for DH, I tried to cite examples relevant to philosophy. For example, the Stanford Encyclopedia of Philosophy, a dynamic online encyclopedia that predates Wikipedia, has had a significant impact, with an average of nearly a million weekly accesses during the academic year. With CT2.0, Peter Bradley aims to create a dynamic, modular, multimedia, interactive, community-driven textbook on critical thinking. Openness and collaboration also inform the design of Chris Long and Mark Fisher’s planned Public Philosophy Journal, which seeks to put public philosophy into practice by curating conversations, facilitating open review, encouraging collaborative writing, and fostering open dialogue. Likewise, I described how Transcribe Bentham is enabling the public to help create a core scholarly resource.  I also discussed recent critiques of DH, including Stephen Marche’s “literature is not data,” the 2013 MLA session on the “dark side” of DH, and concerns that DH risks being elitist. I closed by pointing to some useful resources in DH and calling for open conversation among the DH and philosophy communities. With that call in mind, I wonder: Is it the case that philosophy is less actively engaged in digital humanities?  If so, why, and what might be done to address that gap?

20/30 Vision: Scenarios for the Humanities in 2030

[Here is the extended dance remix version of the talk I gave at the 2010 American Studies Association panel on “Facing New Technologies, Exploring New Challenges.”]

We seem to be anxious about the future—heck, the present—of the humanities. Consider budget cuts such as those at SUNY-Albany and in the UK, the horrible job market, the declining number of majors, and the frequent appearance of articles with titles like “Can the Humanities Survive the 21st Century?”

Instead of focusing on the present in this panel on “Facing New Technologies, Exploring New Challenges,” I’d like to zoom forward twenty years using a process called scenario planning. Essentially, a scenario is a brief story about the future. By working through such stories, organizations can look at the proverbial big picture and devise strategies for facing critical uncertainties in future environments, such as the nature of technological change, the state of higher education, and globalization.  (Given its emphasis on storytelling and interpretation, scenario planning seems like an approach at home in the humanities.)

Recently both the Association of Research Libraries and the Association of College and Research Libraries issued reports about the future of libraries based on scenario planning. (You might have noticed that libraries are also anxious as they face the transition to digital information.) My favorite of the genre is the State Library of New South Wales’ The Bookends Scenarios, both because it confronts larger challenges such as climate change and because it leavens gloominess with imagination and humor, such as: “Book by James Lovelock Jnr claims that 98% of human race will be extinct by 2100; 78% of people say they wish James Lovelock Jnr would become extinct by 2029.”

Although scenario planning has its skeptics, I can testify to the ways that it can help people break out of their typical ways of seeing and stimulate their imaginations. Just this week, my library held a retreat based on the ARL 2030 Scenarios.  Despite some grumbling about the unlikelihood of any of the scenarios coming to pass, participants did think deeply and creatively about risks and opportunities facing academic libraries as research becomes more global, entrepreneurial, and data driven. The scenarios sparked conversation.

Today I’d like to put forward three scenarios for the future of the humanities. I’m mashing together the aforementioned library scenarios with the Rockefeller Foundation’s Scenarios for the Future of Technology and International Development and Bryan Alexander’s “Stories of the Future: Telling Scenarios,” as well as a dash of David Mitchell’s Cloud Atlas. A few caveats: 1) I’m notoriously bad at predicting the future. (I really thought I would enjoy treats whipped up by a robot chef by now.) 2) The scenarios are compressed and partial. 3) The future will most likely not be any one of these scenarios, although it may contain elements of some of them. 4) A diverse community rather than a quirky individual should develop and think through future scenarios.

I aim to open up a conversation, not have the final word. (It might be useful for an organization such as centerNet, the Association for Computers and the Humanities or the NEH to take on this exercise in earnest.) The core question that I want to explore: how can we transform the humanities so that they continue to be relevant in twenty years—so that they “survive the 21st century”?

Critical Uncertainties

In defining these scenarios, I am considering several “critical uncertainties”:

  • Teaching and learning: As distance education becomes more dominant, what will humanities education look like?
  • Funding sources: Where will money for humanities research come from, especially as public funding is under stress?
  • Research methods: How will the availability of huge amounts of data (for instance, the 12+ million volumes in Google Books) affect the way humanities research is conducted?
  • Knowledge production and dissemination: How will research be communicated? Will there be free and open access to information, or will it be available only to the highest bidder?
  • Environmental, social, political, technological and cultural changes: What will be the impact of climate change, peak oil, population growth, resource depletion, economic challenges, developments in technology, and globalization on the world?

Based on these uncertainties, I’ve whipped up three scenarios. (To conform to the genre, I should offer four, but I can only cram so much into a 12-minute presentation.)

I.     A New Renaissance

the green ascent (by vsz)

i.     Summary: Through broad, sustained investment in education, the world enjoys greater equity and opportunity. Interdisciplinary research and international cooperation have led to progress on resolving many challenges, including climate change, political conflict, and resource depletion.

ii.     Research: Humanities scholars are valued for bringing critical understanding to large amounts of data. In collaboration with computer scientists and librarians, humanities scholars devise methods to mine large humanities databases, coming up with new questions and insights that cross disciplinary and linguistic divides. Humanities (and digital humanities) centers help to coordinate much of this activity. Through efforts by leading scholars and scholarly organizations, tenure and promotion guidelines have been broadened to recognize a wide range of work, including scholarly multimedia, online dialogues, and curated content.

iii.     Teaching: Blended learning has become common, with lectures and exercises delivered online and face-to-face time reserved for discussion and collaborative research. Faculty act as guides and mentors for networked research projects that engage students around the world in producing new knowledge. The humanities provide crucial training in curating, contextualizing and interpreting large amounts of data, as well as in critically examining individual objects.

iv.     Scholarly communication: Research is openly available, speeding the pace of discovery and spreading ideas widely. To capture the complexities of their research, scholars produce multimodal scholarship that incorporates video, audio, visualizations, maps, etc.

II.     Humanities, Inc.

Banksy-Cashpoint (by TT)

i.     Summary: As the United States faces economic crises, public funding for education and research erodes.  People feel both overwhelmed by information and hungry for whatever supports their own perspective. Political conflict erupts around the world as a result of resource depletion and climate change, prompting the US to go into a defensive crouch.

ii.     Research: To the extent that research is funded, the money mostly comes from corporations, often with strings attached. Researchers no longer have tenured positions at universities, but move from contract to contract. By necessity, researchers focus on “what pays?”  However, some scholars work with the public to produce crowdsourced humanities research.

iii.     Teaching: Most undergraduate education is offered through distance education; students choose from a menu of choices rather than attending a particular institution.  Instruction mostly focuses on vocational skills. A few elite institutions remain and offer face-to-face instruction for the very wealthy.  Teachers, most of whom are employed by private companies, teach classes with several hundred people, leaving no time for research. Except for a few “rock stars,” the academic labor force is contingent.

iv.     Scholarly communication: Except for crowdsourced information, most research is available only to those individuals and communities who pay for it.

III.     After the Fall

petrol head (Leo Reynolds)

i.     Summary: The devastating effects of climate change, energy shortages, and economic recession prompt a return to localism, so that local communities provide for most of people’s needs. Some areas have descended into chaos or totalitarianism, run by bandits or warlords.  But others have developed democratic local solutions—microindustries, local power grids, community gardens, co-ops. Despite the scarcity of energy and frequent power outages, people occasionally are able to access and share information on the Internet, but travel becomes rare. The humanities provide a respite from day-to-day drudgery and a source of perspective and wisdom.

ii.     Research: Scholars become research hackers, devising solutions to problems both by studying past folkways and by surveying what other communities are doing now. They are resourceful in retrieving information however they can, taking full advantage of the time when they can access the Internet. There is a renewed appreciation for aesthetics, for well-made or meaningful objects. Humanities centers focus on bridging different interests groups working in the humanities, including secondary education and local cultural organizations.

iii.     Teaching: Although much education focuses on core skills such as literacy, craftsmanship, and agriculture, humanists are valued as wisdom keepers and curators of knowledge, distilling what is important and passing on cultural appreciation.

iv.     Scholarly communication: Given the unreliability of the electrical grid, print becomes valued for its stability.  Scholars frequently participate in public conversations in their communities.

What Now?

Reflections (Kevin Dolley)


So how can the humanities prepare for these possible futures?

1.     Adapt! Engage with and understand technology’s role in the humanities. Like it or not, technology is shaping our future—both how we do our research and, increasingly, how learning is delivered. Thus we should experiment with new models for teaching, peer review, research, and scholarly communication. For example, the Center for History and New Media has been doing some fascinating experiments to challenge the slow pace of academia and, perhaps even more importantly, create community, whether by crowdsourcing a book or creating a piece of software in a week. Likewise, the Looking for Whitman project is linking together college classrooms in the study of Walt Whitman and engaging students in producing public scholarship. (Whitman would approve, I think.) We need to make visible the value of this kind of work.

2.     Cooperate! Support collaborative, interdisciplinary research. Such collaboration should occur on many levels: across professional roles, departments, universities, and community organizations. Greg Crane recently made a compelling case that “We need better ways to understand the cultures that drive economic and political systems upon which our biological lives depend.” To do that, as Crane argues, we need to ask good questions about the connections among cultures, foster dialogue, collaborate with scholars from a range of cultural backgrounds, and make scholarship widely available. We also need to devise ways of dealing with masses of data, both through developing computational approaches and by opening up research opportunities to students and volunteers.

Humanities centers (working in collaboration with libraries and with scholarly organizations) should play a lead role in supporting cross-disciplinary research and in communicating that research to the public. As I found in a recent research project on collaboration in the digital humanities, many humanities departments still do not know how to evaluate collaborative work for tenure and promotion; this should change. Likewise, recognition and support should be given to those in “alternative academic careers”—librarians, technologists, administrators, researchers, and others who are key players in digital humanities initiatives.

3.     Open! Reform scholarly communication so that it is open, multimodal, participatory, and high quality.  If we want to convince the public of the value of the humanities, then we shouldn’t make it prohibitively expensive for them to access scholarship.  Rather, we should come up with sustainable models for scholars to share their research and participate in visible scholarly conversations.

4.     Evangelize! Advocate for the value of the humanities—and indeed of research and education generally. In particular, I encourage you to support 4humanities, a new web site and initiative to advocate for the humanities. Launched by a collective that is coordinated by Alan Liu (I’m proud to be a member), 4humanities leverages the expertise of the digital humanities community to provide tools, media and resources for promoting the humanities.

The key point that I want to emphasize is the importance of community in facing challenges/opportunities, as well as in advocating for the humanities. (This idea was developed collectively by our ASA panel—Haven Hawley, Charles Reagan Wilson, Elena Razlogova, and myself—during a breakfast gathering to plan our session.) I think digital humanities scholars/practitioners have been pretty successful in building community, using both networked technologies such as blogs and Twitter and face-to-face gatherings such as THATCamp to connect people, ideas and action. But we can do more. Let’s get moving!


Digital Humanities in 2008, III: Research

In this final installment of my summary of Digital Humanities in 2008, I’ll discuss developments in digital humanities research. (I should note that if I attempted to give a true synthesis of the year in digital humanities, this would be coming out 4 years late rather than 4 months, so this discussion reflects my own idiosyncratic interests.)

1) Defining research challenges & opportunities

What are some of the key research challenges in digital humanities? Leading scholars tackled this question when CLIR and the NEH convened a workshop on Promoting Digital Scholarship: Formulating Research Challenges In the Humanities, Social Sciences and Computation. Prior to the workshop, six scholars in classics, architectural history, physics/information sciences, literature, visualization, and information retrieval wrote brief overviews of their field and of the ways that information technology could help to advance it. By articulating the central concerns of their fields so concisely, these essays promote interdisciplinary conversation and collaboration; they’re also fun to read. As Doug Oard writes in describing the natural language processing “tribe,” “Learning a bit about the other folks is a good way to start any process of communication… The situation is really quite simple: they are organized as tribes, they work their magic using models (rather like voodoo), they worship the word “maybe,” and they never do anything right.” Sounds like my kind of tribe. Indeed, I’d love to see a wiki where experts in fields ranging from computational biology to postcolonial studies write brief essays about their fields, provide a bibliography of foundational works, and articulate both key challenges and opportunities for collaboration. (Perhaps such information could be automatically aggregated using semantic technologies—see, for instance, Concept Web or Kosmix–but I admire the often witty, personal voices of these essays.)

Here are some key ideas that emerge from the essays:

  1. Global Humanistic Studies: Caroline Levander and the team of Greg Crane, Alison Babeu, David Bamman, Lisa Cerrato, and Rashmi Singhal both call for a sort of global humanistic studies, whether re-conceiving American studies from a hemispheric perspective or re-considering the Persian Wars from the Persian point of view. Scholars working in global humanistic studies face significant challenges, such as the need to read texts in many languages and understand multiple cultural contexts. Emerging technologies promise to help scholars address these problems. For instance, named entity extraction, machine translation and reading support tools can help scholars make sense of works that would otherwise be inaccessible to them; visualization tools can enable researchers “to explore spatial and temporal dynamism;” and collaborative workspaces allow scholars to divide up work, share ideas, and approach a complex research problem from multiple perspectives. Moreover, a shift toward openly accessible data will enable scholars to more easily identify and build on relevant work. Describing how reading support tools enable researchers to work more productively, Crane et al. write, “By automatically linking inflected words in a text to linguistic analyses and dictionary entries we have already allowed readers to spend more time thinking about the text than was possible as they flipped through print dictionaries. Reading support tools allow readers to understand linguistic sources at an earlier stage of their training and to ask questions, no matter how advanced their knowledge, that were not feasible in print.” We can see a similar intersection between digital humanities and global humanities in projects like the Global Middle Ages.
  2. What skills do humanities scholars need? Doug Oard suggests that humanities scholars should collaborate with computer scientists to define and tackle “challenge problems” so that the development of new technologies is grounded in real scholarly needs. Ultimately, “humanities scholars are going to need to learn a bit of probability theory” so that they can understand the accuracy of automatic methods for processing data, the “science of maybe.” How does probability theory jibe with humanistic traditions of ambiguity and interpretation? And how are humanities scholars going to learn these skills?

According to the symposium, major research challenges for the digital humanities include:

  1. “Scale and the poverty of abundance”: developing tools and methods to deal with the plenitude of data, including text mining and analysis, visualization, data management and archiving, and sustainability.
  2. Representing place and time: figuring out how to support geo-temporal analysis and enable that analysis to be documented, preserved, and replicated.
  3. Social networking and the economy of attention: understanding research behaviors online; analyzing text corpora based on these behaviors (e.g. citation networks).
  4. Establishing a research infrastructure that facilitates access, interdisciplinary collaboration, and sustainability. As one participant asked, “What is the Protein Data Bank for the humanities?”

2) High performance computing: visualization, modeling, text mining

What are some of the most promising research areas in digital humanities? In a sense, the three recent winners of the NEH/DOE’s High Performance Computing Initiative define three of the main areas of digital humanities and demonstrate how advanced computing can open up new approaches to humanistic research.

  • text mining and text analysis: For its project on “Large-Scale Learning and the Automatic Analysis of Historical Texts,” the Perseus Digital Library at Tufts University is examining how words in Latin and Greek have changed over time by comparing the linguistic structure of classical texts with works written in the last 2000 years. In the press release announcing the winners, David Bamman, a senior researcher in computational linguistics with the Perseus Project, said that “[h]igh performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin.”
  • 3D modeling: For its “High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage” project, the Institute for Advanced Technology in the Humanities at the University of Virginia will reprocess existing data to create 3D models of culturally significant artifacts and architecture. For example, IATH hopes to re-assemble fragments that chipped off ancient Greek and Roman artifacts.
  • Visualization and cultural analysis: The University of California, San Diego’s Visualizing Patterns in Databases of Cultural Images and Video project will study contemporary culture, analyzing datastreams such as “millions of images, paintings, professional photography, graphic design, user-generated photos; as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos.” Ultimately the project will produce detailed visualizations of cultural phenomena.

Winners received compute time on a supercomputer and technical training.

Of course, there’s more to digital humanities than text mining, 3D modeling, and visualization. For instance, the category listing for the Digital Humanities and Computer Science conference at Chicago reveals the diversity of participants’ fields of interest. Top areas include text analysis; libraries/digital archives; imaging/visualization; data mining/machine learning; information retrieval; semantic search; collaborative technologies; electronic literature; and GIS mapping. A simple analysis of the most frequently appearing terms in the Digital Humanities 2008 Book of Abstracts suggests that much research continues to focus on text—which makes sense, given the importance of written language to humanities research.  Here’s the list that TAPOR generated of the 10 most frequently used terms in the DH 2008 abstracts:

  1. text: 769
  2. digital: 763
  3. data: 559
  4. information: 546
  5. humanities: 517
  6. research: 501
  7. university: 462
  8. new: 437
  9. texts: 413
  10. project: 396

“Images” is used 161 times and “visualization” 46 times.
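A raw frequency list like this one is simple to reproduce. Here is a minimal sketch in Python (the sample sentence is my own invention, and TAPOR itself offers far more, such as stopword filtering and concordances):

```python
from collections import Counter
import re

def top_terms(text, n=10):
    """Return the n most frequent words, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)

sample = "Text mining treats text as data; data about texts is still data."
print(top_terms(sample, n=2))  # [('data', 3), ('text', 2)]
```

Note that a list built this way, like the one above, counts “text” and “texts” separately and keeps common words like “new”; a real study would decide whether to stem words and which stopwords to drop.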

Wordle: Digital Humanities 2008 Book of Abstracts

And here’s the word cloud. As someone who got started in digital humanities by marking up texts in TEI, I’m always interested in learning about developments in encoding, analyzing and visualizing texts, but some of the coolest sessions I attended at DH 2008 tackled other questions: How do we reconstruct damaged ancient manuscripts? How do we archive dance performances? Why does the digital humanities community emphasize tools instead of services?

3) Focus on method

As digital humanities emerges, much attention is being devoted to developing research methodologies. In “Sunset for Ideology, Sunrise for Methodology?,” Tom Scheinfeldt suggests that humanities scholarship is beginning to tilt toward methodology, that we are entering a “new phase of scholarship that will be dominated not by ideas, but once again by organizing activities, both in terms of organizing knowledge and organizing ourselves and our work.”

So what are some examples of methods developed and/or applied by digital humanities researchers? In “Meaning and mining: the impact of implicit assumptions in data mining for the humanities,” Bradley Pasanek and D. Sculley tackle methodological challenges posed by mining humanities data, arguing that literary critics must devise standards for making arguments based upon data mining. Through a case study testing Lakoff’s theory that political ideology is defined by metaphor, Pasanek and Sculley demonstrate that the selection of algorithms and representation of data influence the results of data mining experiments. Insisting that interpretation is central to working with humanities data, they concur with Steve Ramsay and others in contending that data mining may be most significant in “highlighting ambiguities and conflicts that lie latent within the text itself.” They offer some sensible recommendations for best practices, including making assumptions about the data and texts explicit; using multiple methods and representations; reporting all trials; making data available and experiments reproducible; and engaging in peer review of methodology.

4) Digital literary studies

Different methodological approaches to literary study are discussed in the Companion to Digital Literary Studies (DLS), which was edited by Susan Schreibman and Ray Siemens and was released for free online in the fall of 2008. Kudos to its publisher, Blackwell, for making the hefty volume available, along with A Companion to Digital Humanities. The book includes essays such as “Reading digital literature: surface, data, interaction, and expressive processing” by Noah Wardrip-Fruin, “The Virtual Codex from page space to e-space” by Johanna Drucker, “Algorithmic criticism” by Steve Ramsay, and “Knowing true things by what their mockeries be: modelling in the humanities” by Willard McCarty. DLS also provides a handy annotated bibliography by Tanya Clement and Gretchen Gueguen that highlights some of the key scholarly resources in literature, including Digital Transcriptions and Images, Born-Digital Texts and New Media Objects, and Criticism, Reviews, and Tools. I expect that the book will be used frequently in digital humanities courses and will be a foundational work.

5) Crafting history: History Appliances

For me, the coolest—most innovative, most unexpected, most wow!—work of the year came from the ever-inventive Bill Turkel, who is exploring humanistic fabrication (not in the Mills Kelly sense of making up stuff ;), but in the DIY sense of making stuff). Turkel is working on “materialization,” giving a digital representation physical form by using, for example, a rapid prototyping machine, a sort of 3D printer. Turkel points to several reasons why humanities scholars should experiment with fabrication: they can be like da Vinci, making the connection between the mind and hand by realizing an idea in physical form; they can study the past by recreating historical objects (fossils, historical artifacts, etc.) that can be touched, rotated, and scrutinized; they can explore “haptic history,” a sensual experience of the past; and they can engage in “critical technical practice,” where scholars both create and critique.

Turkel envisions making digital information “available in interactive, ambient and tangible forms.” As he argues, “As academic researchers we have tended to emphasize opportunities for dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. We need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment.” Turkel and his team are working on four devices: a dashboard, which shows both public and customized information streams on a large display; imagescapes and soundscapes, which present streams of complex data as artificial landscapes or sound, aiding awareness; a GeoDJ, an iPod-like device that uses GPS and GIS to detect your location and deliver audio associated with it (e.g. percussion for a historic industrial site); and ice cores and tree rings, “tangible browsers that allow the user to explore digital models of climate history by manipulating physical interfaces that are based on this evidence.” This work on ambient computing and tangible interfaces promises to foster awareness and open up understanding of scholarly data by tapping people’s natural way of comprehending the world through touch and other forms of sensory perception. (I guess the senses of smell and taste are difficult to include in sensual history, although I’m not sure I want to smell or taste many historical artifacts or experiences anyway. I would like to re-create the invention of the Toll House cookie, which for me qualifies as an historic occasion.) This approach to humanistic inquiry and representation requires the resources of a science lab or art studio—a large, well-ventilated space as well as equipment like a laser scanner, lathes, mills, saws, calipers, etc.
Unfortunately, Turkel has stopped writing his terrific blog “Digital History Hacks” to focus on his new interests, but this work is so fascinating that I’m anxious to see what comes next–which describes my attitude toward digital humanities in general.

Digital Humanities Sessions at MLA 2008

A couple of days after returning from the MLA (Modern Language Association) conference, I ran into a biologist friend who had read about the “conference sex” panel at MLA.  She said,  “Wow, sometimes I doubt the relevance of my research, but that conference sounds ridiculous.” I’ve certainly had my moments of skepticism toward the larger purposes of literary research while sitting through dull conference sessions, but my MLA experience actually made me feel jazzed and hopeful about the humanities.  That’s because the sessions that I attended–mostly panels on the digital humanities–explored topics that seemed both intellectually rich and relevant to the contemporary moment.  For instance, panelists discussed the significance of networked reading, dealing with information abundance, new methods for conducting research such as macroanalysis and visualization, participatory learning, copyright challenges, the shift (?) to digital publishing, digital preservation, and collaborative editing.  Here are my somewhat sketchy notes about the MLA sessions I was able to attend; see great blog posts by Cathy Davidson, Matt Gold, Laura Mandell, Alex Reid, and John Jones for more reflections on MLA 2008.

1)    Seeing patterns in literary texts
At the session “Defoe, James, and Beerbohm: Computer-Assisted Criticism of Three Authors,” David Hoover noted that James scholars typically distinguish between his late and early work.  But what does that difference look like?  What evidence can we find of such a distinction? Hoover used computational/statistical methods such as Principal Components Analysis and the t-test to examine word choice across James’ work and found striking patterns illustrating that James’ diction during his early period was indeed quite different from that of his late period.  Hoover introduced the metaphor of computational approaches to literature serving either as a telescope (macroanalysis, discerning patterns across a large body of texts) or a microscope (looking closely at individual works or authors).
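Hoover’s approach can be sketched in miniature: represent each text as a vector of word frequencies, then project those vectors onto their principal components to see whether early and late works separate. Here is a toy version using NumPy’s SVD (the frequency numbers are invented for illustration; real studies of this kind use hundreds of word variables and dedicated statistical tools):

```python
import numpy as np

# Each row is a text; each column is the relative frequency of one
# "marker" word. These numbers are invented for illustration.
freqs = np.array([
    [0.021, 0.008, 0.015],  # early text 1
    [0.019, 0.009, 0.014],  # early text 2
    [0.011, 0.017, 0.006],  # late text 1
    [0.010, 0.018, 0.007],  # late text 2
])

# PCA via singular value decomposition of the mean-centered matrix.
centered = freqs - freqs.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt.T  # each text's coordinates on the components

# The first component separates the two groups: the early texts fall on
# one side of zero and the late texts on the other.
print(scores[:, 0])
```

With real data, the interesting follow-up question is which words load most heavily on the separating component, since those are the words driving the stylistic difference.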

2)    New approaches to electronic editing

The ACH Guide to Digital-Humanities Talks at the 2008 MLA Convention lists at least 9 or 10 sessions concerned with editing or digital archives, and the Chronicle of Higher Ed dubbed digital editing as a “hot topic” for MLA 2008.   At the session on Scholarly Editing in the Twenty-First Century: Digital Media and Editing, Peter Robinson (whose paper was delivered by Jerome McGann and included passages referencing Jerome McGann) presented the idea of “Editing without walls,” shifting from a centralized model where a scholar acts as the “guide and guardian” who oversees work on an edition to a distributed, collaborative model.  With “community made editions,” a library would produce high quality images, researchers would transcribe those images, other researchers would collate the transcriptions, others would analyze the collations and add commentaries, etc. Work would be distributed and layered.  This approach opens up a number of questions: what incentives will researchers have to work on the project? How will the work be coordinated? Who will maintain the distributed edition for the long term?  But Robinson claimed that the approach would have significant advantages, including reduced cost and greater community investment in the editions.  Several European initiatives are already working on building tools and platforms similar to what Peter Shillingsburg calls “electronic knowledge sites,” including the Discovery Project, which aims to “explore how Semantic Web technology can help to create a state-of-the-art research and publishing environment for philosophy” and the Virtual Manuscript Room, which “will bring together digital resources related to manuscript materials (digital images, descriptions and other metadata, transcripts) in an environment which will permit libraries to add images, scholars to add and edit metadata and transcripts online, and users to access material.”

Matt Kirschenbaum then posed a provocative question. If Shakespeare had a hard drive, what would scholars want to examine: when he began work on King Lear, how long he worked on it, what changes he made, what web sites he consulted while writing?  Of course, Shakespeare didn’t have a hard drive, but almost every writer working now uses a computer, so it’s possible to analyze a wide range of information about the writing process.  Invoking Tom Tanselle, Matt asked, “What are the dust jackets of the information age?” That is, what data do we want to preserve?  Discussing his exciting work with Alan Liu and Doug Reside to make available William Gibson’s Agrippa in emulation and as recorded on video in the early 1990s, Matt demonstrated how emulation can be used to simulate the original experience of this electronic poem.  He emphasized the importance of collaborating with non-academics–hackers, collectors, and even Agrippa’s original publisher–to learn about Agrippa’s history and make the poem available.  Matt then addressed digital preservation.  Even data designed to self-destruct is recoverable, but Matt expressed concern about cloud computing, where data exists on networked servers.  How will scholars get access to a writer’s email, Facebook updates, Google Docs, and other information stored online?  Matt pointed to several projects working on the problem of archiving electronic art and performances by interviewing artists about what’s essential and providing detailed descriptions of how they should be re-created: Forging the Future and Archiving the Avant Garde.

3)    Literary Studies in the Digital Age: A Methodological Primer

At the panel on Methodologies: Literary Studies in the Digital Age, Ken Price discussed a forthcoming book that he is co-editing with Ray Siemens called Literary Studies in the Digital Age: A Methodological Primer.  The book, which is under consideration by MLA Press, will feature essays such as John Unsworth on electronic scholarly publishing, Tanya Clement on critical trends, David Hoover on textual analysis, Susan Schreibman on electronic editing, and Bill Kretzschmar on GIS.   Several authors to be included in the volume—David Hoover, Alan Liu, and Susan Schreibman—spoke.

Hoover began with a provocative question: do we really want to get to 2.0, collaborative scholarship? He then described different models of textual analysis:
i. the portal (e.g. MONK, TAPOR): typically a suite of simple tools; platform independent; not very customizable
ii. desktop tools (e.g. TACT)
iii. standardized software used for text analysis (e.g. Excel)

Next, Alan Liu  discussed his Transliteracies project, which examines the cultural practices of online reading and the ways in which reading changes in a digital environment (e.g. distant reading, sampling, and “networked public discourse,” with links, comments, trackback, etc).  The transformations in reading raise important questions, such as the relationship between expertise and networked public knowledge.  Liu pointed to a number of crucial research and development goals (both my notes and memory get pretty sketchy here):
1)    development of a standardized metadata scheme for annotating social networks
2)    data mining and annotating social computing
3)    reconciling metadata with writing systems
4)    information visualization for the contact zone between macro-analysis and close reading
5)    historical analysis of past paradigms for reading and writing
6)    authority-adjudicating systems to filter content
7)    institutional structures to encourage scholars to share and participate in new public knowledge

Finally, Susan Schreibman discussed electronic editions.  Among the first humanities folks drawn to the digital environment were editors, who recognized that electronic editions would allow them to overcome editorial challenges and present a history of the text over time, pushing beyond the limitations of the textual apparatus and representing each edition.  Initially the scholarly community focused on building single author editions such as the Blake and Whitman Archives.  Now the community is trying to get beyond siloed projects by building grid technologies to edit, search and display texts.  (See, for example, TextGrid, http://www.textgrid.de/en/ueber-textgrid.html).   Schreibman asked how we can use text encoding to “unleash the meanings of text that are not transparent” and encode themes or theories of text, then use tools such as TextArc or ManyEyes to engage in different spatial/temporal views.

A lively discussion of crowdsourcing and expert knowledge followed, hinging on the question of what the humanities have to offer in the digital age.  Some answers: historical perspective on past modes of reading, writing and research; methods for dealing with multiplicity, ambiguity and incomplete knowledge; providing expert knowledge about which text is the best to work with.  Panelists and participants envisioned new technologies and methods to support new literacies, such as the infrastructure that would enable scholars and readers to build their own editions; a “close-reading machine” based upon annotations that would enable someone to study, for example, dialogue in the novel; the ability to zoom out to see larger trends and zoom in to examine the details; the ability to examine “humanities in the age of total recall,” analyzing the text in a network of quotation and remixing; developing methods for dealing with what is unknowable.

4) Publishing and Cyberinfrastructure

At the panel on publishing and cyberinfrastructure moderated by Laura Mandell, Penny Kaiserling from the University of Virginia Press, Linda Bree from Cambridge UP, and Michael Lonegro from Johns Hopkins Press discussed the challenges that university presses are facing as they attempt to shift into the digital.  At Cambridge, print sales are currently subsidizing ebooks.  Change is slower than was envisioned ten years ago, more evolutionary than revolutionary.  All three publishers emphasized that publishers are unlikely to transform their publishing model unless academic institutions embrace electronic publication, accepting e-publishing for tenure and promotion and purchasing electronic works.  Ultimately, they said, it is up to the scholarly community to define what is valued.  Although the shift into electronic publishing of journals is significant, academic publishers’ experience lags in publishing monographs.  One challenge is that journals are typically bundled, but there isn’t such a model for bundling books.  Getting third-party rights to illustrations and other copyrighted materials included in a book is another challenge.  Ultimately scholars will need to rethink the monograph, determining what is valuable (e.g. the coherence of an extended argument) and how it exists electronically, along with the benefits offered by social networking and analysis.  Although some in the audience challenged the publishers to take risks in initiating change themselves, the publishers emphasized that it is ultimately up to the scholarly community.  The publishers also asked why the evaluation of scholarship depended on a university press constrained by economics rather than scholars themselves–that is, why professional review has been outsourced to the university press.

5) Copyright

The panel on Promoting the Useful Arts: Copyright, Fair Use, and the Digital Scholar, which was moderated by Steve Ramsay, featured Aileen Berg explaining the publishing industry’s view of copyright, Robin G. Schulze describing the nightmare of trying to get rights to publish an electronic edition of Marianne Moore’s notebooks, and Kari Kraus detailing how copyright and contract law make digital preservation difficult.  Schulze asked where the MLA was when copyright was extended through the Sonny Bono Act, limiting what scholars can do, and said she is working on pre-1923 works to avoid the problem entirely.  Berg, who was a good sport to go before an audience not necessarily sympathetic to the publishing industry’s perspective, advised authors to exercise their own rights and negotiate their agreements rather than simply signing what is put before them; often they can retain some rights.  Kraus discussed how licenses (such as click-through agreements) are further limiting how scholars can use intellectual works but noted some encouraging signs, such as the James Joyce estate’s settlement with a scholar allowing her to use copyrighted materials in her scholarship.  Attendees discussed ways that literature professors could become more active in challenging unfair copyright limitations, particularly through advocacy work and supporting groups such as the Electronic Frontier Foundation.

6) Humanities 2.0: Participatory Learning in an Age of Technology

The Humanities 2.0 panel featured three very interesting presentations about the projects funded through the MacArthur Digital Learning competition, as well as Cathy Davidson’s overview of the competition and of HASTAC.  (For a fuller discussion of the session, see Cathy Davidson’s summary.) Davidson drew a distinction between “digital humanities,” which uses digital technologies to enhance the mission of the humanities, and humanities 2.0, which “wants us to combine critical thinking about the use of technology in all aspects of social life and learning with creative design of future technologies” (Davidson).  Next Howard Rheingold discussed the “social media classroom,” which is “a free and open-source (Drupal-based) web service that provides teachers and learners with an integrated set of social media that each course can use for its own purposes—integrated forum, blog, comment, wiki, chat, social bookmarking, RSS, microblogging, widgets, and video commenting are the first set of tools.”  Todd Presner showcased the HyperCities project, a geotemporal interface for exploring and augmenting spaces.  Leveraging the Google Maps API and KML, HyperCities enables people to navigate and narrate their own past through space and time, adding their own markers to the map and experiencing different layers of time and space.  The project is working with citizens and students to add their own layers of information—images, narratives—to the maps, making available an otherwise hidden history.  Currently there are maps for Rome, LA, New York, and Berlin.  A key principle behind HyperCities is aggregating and integrating archives, moving away from silos of information. Finally, Greg Niemeyer and Antero Garcia presented BlackCloud.org, which is engaging students and citizens in tracking pollution using whimsically designed sensors.  Students tracked levels of pollution at different sites—including in their own classroom—and began taking action, investigating the causes of pollution and advocating for solutions.  What unified these projects was the belief that students and citizens have much to contribute in understanding and transforming their environments.

7. The Library of Google: Researching Scanned Books

What does Google Books mean for literary research?  Is Google Books more like a library or a research tool?  What kind of research is made possible by Google Books (GB)? What are GB’s limitations?  Such questions were discussed in a panel on Google Books that was moderated by Michael Hancher and included Amanda French, Eleanor Shevlin, and me.  Amanda described how Google Books enabled her to find earlier sources on the history of the villanelle than she was able to locate pre-GB, Eleanor provided a book history perspective on GB, and I discussed the advantages and limitations of GB for digital scholarship (my slides are available here).  A lively discussion among the 35 or so attendees followed; all but one person said that GB was, on balance, good for scholarship, although some people expressed concern that GB would replace interlibrary loan, indicated that they use GB mainly as a reference tool to find information in physical volumes, and emphasized the need to continue to consult physical books for bibliographic details such as illustrations and bindings.

8. Posters/Demonstrations: A Demonstration of Digital Poetry Archives and E-Criticism: New Critical Methods and Modalities

I was pleased to see the MLA feature two poster sessions, one on digital archives, one on digital research methods. Instead of just watching a presentation, attendees could engage in discussion with project developers and see how different archives and tools worked.  That kind of informal exchange allows people to form collaborations and have a more hands-on understanding of the digital humanities. (I didn’t take notes and the sessions occurred in the evening, when my brain begins to shut down, so my summary is pretty unsophisticated: wow, cool.)

Reflections on MLA

This was my first MLA and, despite having to leave home smack in the middle of the holidays, I enjoyed it.   Although many sessions that I attended shifted away from the “read your paper aloud when people are perfectly capable of reading it themselves” model, I noted the MLA’s requirement that authors bring three copies of their paper to provide upon request, which raises two questions: what if you don’t have a paper (just PowerPoint slides or notes), and why can’t you share it electronically? And why doesn’t the MLA provide fuller descriptions of the sessions besides just title and speakers?  (Or am I just not looking in the right place?)  Sure, in the paper era that would mean the conference issue of PMLA would be several volumes thick, but if the information were online there would be a much richer record of each session.  (Or you could enlist bloggers or twitterers [tweeters?] to summarize each session…) After attending THATCamp, I’m a fan of the unconference model, which fosters the kind of engagement that conferences should be all about—conversation, brainstorming, and problem-solving rather than passive listening.  But lively discussions often do take place during the Q & A period and in the hallways after the sessions (and who knows what takes place elsewhere…)

Studying the History of Reading Using Google Books (and Other Sources)

To what extent can digital collections such as Google Books help us reconstruct the history of readers’ responses to literary works–in my case, readers’ responses to Reveries of a Bachelor (1850), which I’m using as a case study of doing research in the Library of Google?  (For an account of my post-marital fascination with bachelors, see my last post.)  Readers’ enthusiasm for this sentimental work stirred up my own interest in it.  At Yale’s Beinecke Library, I examined a cache of fan letters in which readers rhapsodized over the bachelor’s reveries and connected them to their own experiences.  As one correspondent, a doctor, wrote,  “I have found it really a book of the heart—of my heart—an echo of my own reveries.”  At Yale I also examined Emily Dickinson’s copy of Reveries, where she (or perhaps someone to whom she loaned the volume) made marks next to significant passages. At the University of Virginia Library, I stumbled across an 1886 edition of Reveries heavily annotated by a young man named Patrick Henry.  In a passage where Mitchell described “a Bachelor of seven and twenty,” Patrick crossed out the seven and wrote in “four,” signaling his own intense identification with the bachelor narrator.  Drawing on these and other examples, I wrote a dissertation chapter on readers’ responses to Reveries (later to morph into a 2003 article in Book History) that challenged the notion that sentimental readers were passive.  But I was examining a fairly limited set of reader responses–about 25 letters from the 1850s to the late 19th century, plus a couple of annotated copies of Reveries.  I could offer an even richer analysis of readers’ reactions to Reveries by examining journal entries, memoirs, and letters, as well as even more annotated copies.  I’m especially interested in whether readers’ views of the book changed over time, given that the book was popular from 1850 into the twentieth century. Could I find such evidence in Google Books?

Patrick Henry's annotations to Reveries

What I Found

Here’s what I found doing a keyword search in Google Books for “Reveries of a Bachelor”; I still need to process the hundreds of results I got from searching for “Ik Marvel” and “Ike Marvel” (the pen name of the author of Reveries), as well as from searching for those terms in the Open Content Alliance.

  • Recent secondary sources on reading that include short passages on Reveries:
    • Ronald and Mary Saracino Zboray’s 2006 account of a would-be suitor attempting to woo an already-engaged woman by giving her a copy of Reveries; she noted in her diary that she would prefer to read the book than spend time with him
    • Claire White Putala’s Reading and Writing Ourselves Into Being, which discusses how Joe Lord recommended Reveries to Eliza Wright Osborne immediately before she married another man
    • Alan Boye’s account in Tales from the Journey of the Dead of a soldier suffering from a broken heart who read Reveries in a Confederate camp
    • So, hmm, Reveries seems to have been read by heartbroken men, who seemed to use the book to express how they felt to the women they were pursuing.  All three of the above books are based on archival research, which leads me to suspect that I would find a number of references to Reveries in archival collections (if I had the time and money to visit them).
  • Memoirs that include brief mentions of Reveries:
    • Mountaineer Belmore Browne’s association of Reveries with melancholia in The Conquest of Mt. McKinley (first published 1913): “I know of nothing in this world that will produce a stronger attack of melancholia than reading The Reveries of a Bachelor on a fog-draped glacier!”
    • Philosopher Morris R. Cohen’s sense that Reveries stimulated feeling and brought relief: “Today I felt very relieved by reading Marvel’s Reveries of a Bachelor. It aroused new strains of feeling I don’t know whether I should be ashamed of wishing…”  [snippet view]
    • Richard St. Clair Steel’s description of the beauty of Reveries
    • My questions: Did women memoirists likewise praise Reveries? Why did the book have such emotional resonance?
  • Evidence that Reveries was embraced by educational, religious, and cultural authorities
    • the American Literature section of the University of the State of New York Regents High School Exam included questions about Reveries in 1894, 1897, 1899, 1903, 1906, and 1908 (for whatever reason, I discovered this information not in my original search for “Reveries of a Bachelor,” but in a later search for “"Reveries of a Bachelor" enrica”, Enrica being the name of one of the women for whom the bachelor longs)
    • Reveries was excerpted in several literary anthologies, including Harper’s First [ -sixth] Reader (1889),  The Ridpath Library of Universal Literature (1898), and American Literature Through Illustrative Readings (1915)
    • Reveries was recommended  for the high school reading list (essays) by the National Council of Teachers of English (1913).  It also appeared in quotation books.
    • The author of the satiric “Reflections of a War Camp Librarian” (1918) notes that American citizens sent Reveries and other gift books to soldiers on the battlefield in WWI, not exactly the kind of reading material soldiers craved
    • A “Country Parson” noted in 1862 how Reveries brought about “revelations of personal feeling” among the unmarried
  • Reveries appeared in many printed library catalogs from the 1850s to the 1920s, including catalogs for the Boston Public Library, the Detroit Public Library, the New Zealand Parliament Library, Princeton University, the Library Company of Philadelphia, and the British Museum Dept. of Printed Books
  • Reveries was not only read in private, but re-imagined as tableaux and read aloud at home and in public
  • Reviews of Reveries

Google Books as a Research Source

  • Except for the reviews (many of which I had already consulted) and the secondary sources on reading (which I probably would have consulted), searching Google Books enabled me to find many resources that I probably never would have discovered, including memoirs, high school curricula, and guides to performing (reciting/acting out) Reveries.  Although these sources (which I haven’t fully analyzed) haven’t radically changed my view of Reveries, they do give me a better sense of the cultural impact that the book had, as well as its personal significance to readers, who read it while climbing mountains, dealing with emotional turmoil, etc.
  • I had hoped to find annotations in scanned versions of Reveries collected in Google Books and Open Content Alliance.  However, in the copies I examined (and I should say that I glanced over them rather than scrutinized every page), I only found minor annotations–people would typically write their names in their books or inscribe a message to the recipient of the gift book, and a few readers made marks next to passages, but I found nothing like Patrick Henry’s ecstatic annotations.
  • Because the texts are often available only as fragments around a search term, Google Books functions as a ramped-up research index, pointing me to materials that I often need to consult in print to put the search results in context, at least until the Google Book Search settlement goes through and the out-of-print materials become available as full text. (For some of the limited preview books, such as reference books, however, I’m able to pull out enough information from the pages that are available without having to see the whole book.)

Using Google Books to Research Publishing History

At the upcoming Modern Language Association conference, I will join Amanda French and Eleanor Shevlin on a panel called “The Library of Google: Researching Scanned Books,” which is sponsored by SHARP and will be moderated by Michael Hancher. Google Books has already scanned over 7 million volumes (more than many research libraries hold) and, according to Planet Google, aims to scan every volume in the WorldCat catalog, around 32 million. Our panel will focus on the significance of Google Books for literary research, looking at questions such as whether scholars can trust it and how they should deal with such plenitude. I plan to discuss my study examining how many of the works in my dissertation bibliography are now available electronically, as well as more recent work using Google Books and other online sources to explore the history of a nineteenth-century bestseller, Donald Grant Mitchell’s Reveries of a Bachelor (1850). Reveries fascinates me—not so much because I identify with the bachelor narrator’s fantasies and fears of what it’s like to be married (actually, I find the book kind of cloying), but because I’m intrigued by Reveries’ cultural impact from the 1850s into the early twentieth century. It sold at least a million copies and appeared in dozens of editions, from a cheap edition selling for 8 cents to a $6 gift volume in an exquisite morocco binding. Emily Dickinson loved it, as did readers who evinced their admiration by sending fan letters to Mitchell or making marks in the margins of their books. In this blog post, I’ll focus on how I’ve employed Google Books to illuminate Reveries’ publishing history; future posts will look at reader responses, textual history, and authorship.

For a graduate seminar on textual editing way back in the 90s, I developed an online critical edition of the book’s first reverie. I also wrote an article analyzing a series of letters that Reveries’ publisher, Charles Scribner II, sent to Mitchell to negotiate the pricing and physical form of new editions between 1883 and 1907, as the publisher and author worked to sustain the popularity of the book and maintain their hold on the market after the copyright expired. But my publishing history is incomplete; I want to know more about the different forms Reveries took, how it was advertised, what the prices were at different times, how well the book sold, what marketing strategies Scribner and other publishers pursued, and whether Reveries is a unique case or fairly typical, at least for a nineteenth-century bestseller.

By using Google Books, I’ve been able to fill in some details about the book’s publishing history, particularly about pricing and advertising. As amazed as I am by the ability to search across millions of books for references to Reveries, I’m also somewhat frustrated by the strange ways that Google Book search works (or doesn’t work) and disappointed that some materials don’t seem to be available.

Title page of 1850 Reveries of a Bachelor

What I already knew:

  • The authorized publisher of Reveries, Scribner’s, issued many editions.
  • Copyright on Reveries expired in 1892, which meant that other publishers could legally come out with their own editions of the book.  Charles Scribner II wrote to Donald Grant Mitchell to discuss how to respond to this challenge, particularly the threat from Altemus, which he characterized as a “piratical publisher.” Scribner proposed offering a cheap (30 cent) edition “to make it so unprofitable that the publisher [Altemus] will not be encouraged to take up the other books [by Mitchell],” along with a moderately-priced (75 cent) edition.  At the suggestion of Mitchell, Scribner also advertised that the company remained the only authorized publisher of Reveries.
  • Undeterred, many publishers issued unauthorized editions, including Henry Altemus Company, Optimus Printing Company, The Rodgers Company, Donohue, Henneberry, & Co, Porter, W. L. Allison Company, F. T. Neely, Thomas Y. Crowell Company Publishers, The Mershon Company Publishers, G. Munro’s Sons, H. M. Caldwell Company, The Henneberry Company, M. A. Donohue & Company, Homewood Company, A. L. Burt Company, The F. M. Lupton Company, H. M. Caldwell Co., Strawbridge & Clothier, The Edward Publishing Company, W. B. Conkey Company, Acme Printing Company, The Bobbs-Merrill Company Publishers, and R. F. Fenno & Company (BAL, 240-1; NUC, 664-667).   While I was researching Reveries at Yale, I came across several of these volumes, one of which had annotations such as “The illustrations are [most of them] execrable, & there is an occasional ‘mending’ of the text…”  In the preface to the 1907 Author’s Complete Edition of Reveries, Mitchell fixated on the problem of piracy, noting that he had amassed a collection of over 40 imprints of Reveries, only one of which brought him any money.  Apparently Mitchell’s collection–and annotations–ended up at Yale.

Method

To determine how many Reveries-related works were available in Google Books, I did a keyword search for “Reveries of a Bachelor.” The total number of results fluctuated; one day it was 641, another 916, another 809. But forget about getting to result #641. One result screen says “151 – 200 of 809,” but the next one says “Books 201 – 220 of 220.” Huh? What happened to everything else? Perhaps duplicates are eliminated as you make your way through the results (although there were plenty of duplicates in the results I looked at), perhaps the algorithm used to calculate the number of results is, er, inexact and shifting, or perhaps Google figures you don’t really want to look at that many results anyway. Whatever the explanation, I can’t help wondering about what I’m not getting to see, so my trust in Google Books is diminished a bit, even as I feast on the plenty that is available.

In any case, I looked at each result available to me, discarding those that weren’t really focused on Reveries and grabbing the bibliographic info for the rest through Zotero. (I love Zotero, but I was a little frustrated that it didn’t capture the URL and publisher info for Google Books, which may have to do with the way that Google makes that information available.) When I wasn’t impeded by texts that offered only snippet views or no preview at all, I copied out a chunk of text that contained the Reveries reference and dumped it into a note in Zotero. Categorizing as I waded through the results, I added a tag or two for each work, such as “reveries_ad” or “reveries_review.”

Since Mitchell used the pen name “Ik Marvel,” I also searched for “Ik Marvel” (1285 results, today) and “Ike Marvel” (606 results); I’m still working through those results.   I used TAPOR to generate a list of word pairs in Reveries that I hoped to use in searching for works connected to Reveries, but there were only a few pairs that seemed at all unique, such as “Aunt Tabithy,” the name of a character in the book.
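Even without TAPoR, a rough version of this word-pair approach is easy to sketch. The snippet below is a minimal illustration of the technique, not the tool I used; the sample text and the repetition threshold are my own assumptions. It counts adjacent word pairs and keeps capitalized, repeated pairs — the name-like phrases, such as “Aunt Tabithy,” that make the most distinctive search terms.

```python
import re
from collections import Counter

def distinctive_bigrams(text, min_count=2):
    """Count adjacent word pairs; rare, name-like pairs make good search phrases."""
    words = re.findall(r"[A-Za-z']+", text)
    pairs = Counter(zip(words, words[1:]))
    # Keep pairs where both words are capitalized (likely proper names)
    # and that recur often enough to be characteristic of the text.
    return [(a, b) for (a, b), n in pairs.most_common()
            if a[0].isupper() and b[0].isupper() and n >= min_count]

sample = ("Aunt Tabithy shook her head. Aunt Tabithy had never "
          "approved of the bachelor's reveries.")
print(distinctive_bigrams(sample))  # [('Aunt', 'Tabithy')]
```

On a full text of Reveries, most surviving pairs would be common phrases, which matches my experience that only a few pairs, like “Aunt Tabithy,” were unique enough to search on.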

Bobbs-Merrill Ad for Reveries

What I discovered about publishing history using Google Books

  • Pricing: By searching book catalogs, advertisements, and old issues of Publishers Weekly, I was able to track the price of different versions of Reveries between 1851 and 1906. The pricing data reveals the many choices enjoyed by consumers who wanted to buy a copy of Reveries, particularly at the end of the nineteenth century, when competing publishers entered the market. Say a consumer in the late nineteenth century wanted a cheap copy of Reveries. How about paying 8 cents for the “Ideal Library” version, or 18 cents for the “Handy Volume” edition? How about a moderately priced edition? The price of Scribner’s standard duodecimo edition remained fairly steady between 1854 and 1903: $1.25. If people craved a fine edition, they had many choices, such as the 1903 Dainty Small Gift Books, Agate Morocco Series with gilt edges for $2.25, the 1906 Bobbs-Merrill Ashe Illustrated Gift Edition for $2, the 1903 Limp Walrus Edition for $2, and the 1903 Limp Lizard Series for $1.50. (If I start a band, I’m going to call it Limp Lizard.) Big gaps in my knowledge remain–I wasn’t able to find pricing information for the 1850 first edition or the 1907 Edgewood Edition, or for many of the unauthorized editions. However, without the ability to search across a vast collection of texts, I doubt I would have been able to find much of the pricing information at all, particularly in the book advertisements that appeared in magazines and at the end of books, as publishers promoted other books in their catalogs. I probably should have known to look for information about Reveries in book catalogs and late nineteenth-century issues of Publishers Weekly, but Google Book Search sure made it easy for me to find relevant information.
  • Response to the copyright expiration: In one of Scribner’s letters to Mitchell, I found a copy of an ad Scribner’s planned to run advertising its cheap edition and asserting that some portions of Reveries (the new prefaces) remained in copyright. In Publishers Weekly from 1893, I found what I think is that very ad. I wondered if Scribner’s was unique in handling copyright expiration by releasing a cheap edition and asserting continued copyright over some sections. Apparently not. Right after a Scribner’s ad warning that “An action will be promptly brought against any one infringing upon the author rights,” I saw a similar ad from J. B. Lippincott Company for Susan Warner’s The Wide, Wide World, reminding “the trade” that the illustrations remained in copyright and promoting a new 75 cent cheap edition.
  • Marketing: By examining over 25 ads for Reveries available through Google Books, I’ve noticed some (fairly unsurprising) patterns: Although the book was in Scribner’s catalog throughout the late 19th century, promotion of the book was ramped up when new editions were issued; the publisher often took out full-page ads or put Reveries at the top of ads announcing several books. By the 1890s, Scribner’s was describing Reveries as “an American classic” and predicting that the book would win over “fresh fields” of new readers. Although I’ve found few ads from competing publishers, Bobbs-Merrill came out with an eye-catching ad for its illustrated gift edition in 1906. So that I have a visual record of the stuff I’ve looked at, I’ve set up a Google Notebook with clippings of ads for and reviews of Reveries that I found in Google Books. Creating the notebook was easy; if the book is in the public domain, you can clip out sections of text and post them to your Google Notebook or Blogger blog. (If only you could post to a WordPress blog, or Flickr…)
  • Versions of Reveries: I expected to find more editions of Reveries in Google Books. When I did a title search for “Reveries of a Bachelor,” only 21 results were returned, and only 4 of those are available as full view, even though 20 were published before 1921 and are in the public domain. (Another is a large-print reprint edition from 2008.) By contrast, the Open Content Alliance provides full access to 18 versions of Reveries, including an 1889 edition marked “Book digitized by Google from the library of the New York Public Library and uploaded to the Internet Archive by user tpb.” (By the way, tpb has apparently uploaded a number of Google Books into the Open Content Archive, prompting some folks to complain about the “pollution” of the OCA by “marginal” Google content.) So why are so many public domain texts in Google Books not fully available? I’m not really sure, although Planet Google says that Google Books contains metadata (catalog) records for works that it did not digitize and thus are not in its collection. In any case, if you’re interested in the physical form of books, the Open Content Alliance seems to be a better source than Google Books, since every page is scanned in full color (except, of course, what’s been uploaded from Google Books) and is presented in a book-like interface, with flippable pages. You can download PDF, plain text, and DjVu versions, which promotes (re-)use and analysis of the books. I should note that the Open Content Alliance has its own quirks. OCA content appears to be available through two online collections: the Internet Archive and Open Library. It’s not immediately obvious how to do a full-text search in OCA. It seems that you can only search bibliographic metadata in the Internet Archive, but you can do full-text search at the Open Library. To do so, go to the advanced search (http://openlibrary.org/advanced) and enter your query into the search box at the bottom.
Another quirk: you can’t see front covers in OCA in the flip-view interface, but you can if you look at the DjVu files. On the other hand, it’s even easier to put page images from OCA content into a Google Notebook: whereas in Google Books you have to crop out a section of a page and select where to send it, with OCA you just right-click and send the entire page image to your notebook. (For instance, I created one for different editions of Reveries, documenting illustrations, title pages, etc.)

Limitations of Google Books

  • As noted above, not all public domain materials are available
  • Weirdness in retrieval of search results; 800 results suddenly become 220 when you work your way through the results
  • OCR errors: Among the different variations of “Ik Marvel” and “Reveries of a Bachelor; A Book of the Heart” that I found:
    • IK MABVEL
    • Heveries of a Bachelor (a search for this term yields 10 results in Google Books)
    • REVERIES OF A BACHELOR; or, a Rook of the Heart
    • REVERIES OF A BACHELOR; or, a Bonk of the Heart.
    • Reveries of a Bad elor.
    • REVERIES OF A BACHELOR, a Boob of the Heart. By IK. MAETEL
    You have to be resourceful, then, in how you construct a search, taking into account OCR problems.  That said, “Reveries of a Bachelor” returned hundreds of results.
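One way to be resourceful is to generate the likely corruptions in advance and search for each of them. The sketch below is my own illustration, not something I actually ran against Google Books; the confusion table is a small assumed subset drawn from the garbled titles listed above. It substitutes commonly confused characters one at a time to produce candidate search phrases:

```python
def ocr_variants(phrase):
    """Return variants of a search phrase with one OCR-style character confusion."""
    # Illustrative subset of confusions, based on the garbled titles above
    # (e.g., R misread as H, B as R, c as e, o as n, k as b).
    confusions = {"R": "H", "B": "R", "c": "e", "o": "n", "k": "b"}
    variants = set()
    for i, ch in enumerate(phrase):
        if ch in confusions:
            # Swap a single character, leaving the rest of the phrase intact.
            variants.add(phrase[:i] + confusions[ch] + phrase[i + 1:])
    return sorted(variants)

for v in ocr_variants("Reveries of a Bachelor"):
    print(v)  # includes "Heveries of a Bachelor", among others
```

Each variant could then be fed back into the search box, which is essentially what I did by hand when I stumbled on “Heveries of a Bachelor” and its 10 results.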
  • Google Books does not contain archival materials. (Google has moved into digitizing newspapers and magazines, so who knows–maybe archives are coming? But it would be very tricky and expensive for Google to undertake such a project.)  Although searching Google Books is certainly more convenient than visiting an archive, I love being in archives, looking at stuff that few others have seen.  Even though I found a lot of useful resources in Google Books, I learned the most about the publishing history of Reveries by examining the letters from Charles Scribner II to Mitchell held by the Beinecke Library at Yale and by examining the volumes referenced in the letters.
  • If you’re interested in bibliography, as I am, looking at even a high quality scan can’t substitute for examining the physical volume, studying details such as the size of the book, the quality of the paper, the bindings, etc. But scans can give you an idea of what the volume looks like and help you to identify it.

In my next post, I’ll look at how using Google Books is helping me reconstruct the history of readers’ responses to Reveries.

Work Product Blog

Matt Wilkens, post-doctoral fellow at Rice’s Humanities Research Center, recently launched Work Product, a blog that chronicles his research in digital humanities, contemporary fiction, and literary theory.  Matt details how he is working through the challenges he faces as he tries to analyze the relationship between allegory and revolution by using text mining, such as:
  • Where and how to get large literary corpora: Matt looks at how much content is available through Project Gutenberg, the Open Content Alliance, Google Books, and HathiTrust, and how difficult it is to access
  • Evaluating part-of-speech taggers, with information about speed and accuracy

I think that other researchers working on text mining projects will benefit from Matt’s careful documentation of his process.

By the way, Matt’s blog can be thought of as part of the movement called “open notebook science,” which Jean-Claude Bradley defines as “a laboratory notebook… that is freely available and indexed on common search engines.” Other humanities and social sciences blogs that are likewise ongoing explorations of particular research projects include Wesley Raabe’s blog, Another Anthro Blog, and Erkan’s Field Diary. (Please alert me to others!)