Category Archives: digital scholarship

Group and Method: Collaboration in the Digital Humanities

Yesterday I gave a talk called “Group and Method: Collaboration in the Digital Humanities” at Case Western Reserve University’s Freedman Center Colloquium on “Exploring Collaboration in Digital Scholarship.” Drawing on my research for “Computing and Communicating Knowledge” and for a series of blog posts, I discussed why collaboration is so common in digital humanities (although of course not all DH work is collaborative); explored the significance of collaboration in projects to build digital resources, devise new research methods, and promote participatory humanities; and examined the challenges that collaboration poses. I also described how my experiences as a grad student in English convinced me of the value of collaboration, particularly my membership in a dissertation group (I was thrilled that my fellow diss group member Amanda French also gave a talk at the colloquium) and my work at Virginia’s Etext Center.

Here is the PDF of the slides.

Opening the Humanities Part 2: Contexts

In 1813, Thomas Jefferson declared in a letter to Isaac McPherson:

“He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature….”

“Sharing,” by Josh Harper

Unlike, say, a diamond bracelet, an idea can be freely given to others without diminishing its value for the person who “owns” it; indeed, its value only increases as it spreads. While Jefferson believed that the creators of inventions could not claim permanent, natural rights over them, he acknowledged that society could grant the right to profit from them in order to foster innovation (which, as Chris Kelty notes, Jefferson termed “the embarrassment of an exclusive patent,” suggesting his discomfort). He cautioned that intellectual property rights may actually endanger innovation by granting monopolies, and argued that they should exist only long enough to spawn innovation, should be governed by rules limiting their application, and should be differentiated according to the benefit they convey to the public (Boyle, The Public Domain).

Jefferson’s letter raises fundamental questions: what social functions do intellectual property rights serve? How can we best encourage the sharing of ideas and the progress of knowledge? In this post, the second in my series on the open humanities, I will explore legal and cultural contexts, focusing on the US.

The view that intellectual property rights are granted to encourage innovation is reflected in Article 1, Section 8 of the US Constitution: “To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” Note that the Constitution both describes the purpose of copyright (“To promote the Progress of Science and useful Arts”) and places limits upon it. Copyright aims to provide an incentive (a limited monopoly) for creators to share their work so that others may make use of it and build upon it. This incentive is balanced by limits, so that after a period of time the work falls into the public domain. The 1790 Copyright Act set the copyright term at 14 years, with the right to renew for another 14 years. Now, after the passage of the Sonny Bono Copyright Term Extension Act, the copyright term has exploded to 70 years after the death of the author. The original intention to encourage the progress of public knowledge seems to have fallen aside in the interest of protecting commercial interests such as Disney’s monopoly over Mickey Mouse.

Expansion of U.S. copyright law (assuming authors create their works 35 years prior to their death) (Wikipedia)


With most academic work, the ability to secure a monopoly over one’s ideas is not the primary incentive for sharing. Rather, most academics publish scholarly works in order to make a visible contribution to the scholarly conversation, build their scholarly reputation, and ultimately secure tenure or promotion. Typically researchers do not receive monetary compensation for publishing journal articles; the reward comes in disseminating their research. As Peter Suber suggests, one factor that makes open access more complicated in the humanities is that authors of monographs often expect to receive royalties. However, as Paul Courant points out, the monetary rewards tend to be small; the author of a moderately successful monograph selling 1000 copies might expect to make less than $4000, and “for many monographs, lifetime royalties are zero or close to it.” As Courant suggests, “The big financial payoff to the author of the great majority of scholarly books is not the royalties but the visibility (and hence the salary and working conditions) of the author in the academic labor market.” If authors aim to contribute to the scholarly conversation and heighten their visibility, it makes sense for them to remove barriers to their work (although they also have an incentive to publish with the top journals or publishers).

Open access facilitates the sharing of scholarly knowledge. Peter Suber, a philosopher and respected advocate for open access, offers a simple definition: “Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions.” Because such literature is digital and available online, distributing it costs almost nothing, and it can be accessed by anyone with an Internet connection. The lack of most restrictions means that the literature can be freely accessed and mined, which could open up new insights. But creators can place some restrictions on open works. For example, they can adopt a Creative Commons license and specify whether the work can be modified and/or used commercially, as well as whether the work must be attributed (CC-BY) and/or whether new versions of the work must be licensed under the same terms (ShareAlike). CC-BY upholds the scholarly practice of acknowledging sources (see Bethany Nowviskie’s “why, oh why, CC-BY?” for a smart discussion of the rationale for adopting this license). There are two principal means of disseminating open access scholarly work: green, through depositing works in disciplinary repositories (like arXiv) or institutional repositories (like DSpace@MIT), and gold, through publishing open journals and monographs. Note that many publishers allow scholars to self-archive work in repositories; visit SHERPA RoMEO to access publisher policies.

Unfortunately, the humanities seem to be behind the sciences in practicing openness. As Wikipedia explains, the open science movement aims to enlarge access to research, data, and publications, speed up scholarly communication, facilitate collaboration, and improve the sharing and building of knowledge, whether through open lab notebooks, open data, or open access to scholarly literature. There isn’t even a Wikipedia page for open humanities (let’s get to work!). The Directory of Open Access and Hybrid Journals lists nearly 3000 journals in the sciences as opposed to a little over 1300 in the arts & humanities. Much of the rhetoric around openness focuses on science; as a rough measure, there are approximately 973,000 Google results for “open science” versus around 38,000 for “open humanities”.

In a 2004 essay, Peter Suber pointed to a number of reasons why the humanities have been more reluctant to embrace openness than the sciences, including the greater availability of public funding for scientific research (and publishing fees), a deeper sense of a cost crisis with science journals, the significance of pre-print repositories in the sciences, the importance of monographs in the humanities, and the greater public pressure for open access to science. Updating Suber’s analysis eight years later, Gary Daught suggests that the time may be ripe for efforts to promote openness in the humanities. He notes that the price inflation of humanities journals has become a greater concern and that open source tools such as Open Journal Systems have brought down publishing costs. Perhaps most importantly, as scholars become more accustomed to the speed, convenience, and openness of online communication, they may increasingly expect research to be easily accessible.

Indeed, I’ve identified a number of open humanities projects, mainly in the digital humanities. Openness in the humanities can take many forms, including:

While these different ways of categorizing openness are helpful, I agree with Clint Lalonde (riffing on Gardner Campbell) that “open is an attitude”: not only a willingness to share resources, but also a commitment to working in such a way that others can observe, learn, and offer to help. In my next post, I’ll provide a number of examples of open humanities projects and initiatives.

Of course, open humanities projects aren’t necessarily focused on digital humanities; note, for instance, publishing initiatives such as Open Humanities Press. With digital humanities, we often see the intersection of humanistic values and what I’ll call Web values. Driven by a desire to make it easier for scientists to share their data and collaborate, Tim Berners-Lee created the foundations of the Web. Rather than being a proprietary system, the Web is built upon open protocols, standards and design principles. The success of the Web comes from the way that it connects people to each other, information, and experiences, enabling them to share ideas, converse with each other, and explore and interact with information. Hence Berners-Lee’s message (appropriately delivered via Twitter) at the 2012 Summer Olympics: “this is for everyone.” What would it take to say the same about humanities scholarship and educational resources?

[Note: This post expands on a presentation I gave at WPI’s Digital Humanities Symposium in November.]

Opening the Humanities Part 1: Overview

Today marks the fifth anniversary of my blog. Over the course of those five years, I’ve learned a simple, vital lesson: sharing is good. When I began my blog, I planned to document the process of remixing my dissertation (completed five years earlier, in 2002) as a work of digital scholarship. I got distracted by other topics, such as making the case for social scholarship, summarizing the year in digital humanities (a task that seems far too daunting today), examining collaboration in DH, and providing resources for getting started in DH. Since I didn’t really expect that the blog would find much of an audience, I was jazzed when people commented on my posts and talked with me about my blog at conferences. Blogging opened up new opportunities for me, such as invitations to speak or to contribute to essay collections, and made me feel like I was part of a lively community of scholars. Sharing made my work more visible and gave me a greater sense of purpose.

An interest in sharing also led me to team up with several other librarians to start the Digital Research Tools (DiRT) wiki. As I tried to keep up with all of the tools that help researchers find, manage, analyze and present information, I figured it would be better to take on the task collectively and produce a community resource.

Program Building @ THATCamp Vanderbilt by derekbruff


With DiRT, I was struck by the willingness of the community to share; as I recall, both Alan Liu and Dan Cohen invited me to grab resources from their own tool collections and include them in DiRT, and people volunteered their time to add new information to the wiki. But I also learned that it requires continuous effort to maintain an active community of contributors; no matter how good our intentions, we only have so much time (and I myself had only limited time to commit to DiRT). Now DiRT has achieved what many start-ups aim for: it’s been acquired by a larger organization. Reborn as Bamboo DiRT, it is nurtured by a steering/curatorial committee (led by Quinn Dombrowski, who did much of the work creating Bamboo DiRT) that shares its time and expertise to maintain a resource of value to the community.

In retrospect, I see that my attraction to digital humanities comes not so much from a love of technology or method as from the community and its values. It’s difficult (and perhaps presumptuous) to define the values of such a diverse community, but I would point to openness, collaboration, collegiality and connectedness, diversity, and experimentation (as I did in my chapter in Debates in the Digital Humanities). Underlying all of these is openness, broadly defined: openness to new ideas and new participants, openness as a commitment to sharing.

We see openness throughout the digital humanities. As the Manifesto for the Digital Humanities declares, digital humanists are “building a community of practice that is solidary, open, welcoming and freely accessible” as well as “multilingual and multidisciplinary.” This community calls for “open access to data and metadata,” open source software, the development of “collective expertise” and the sharing of best practices. I would point to THATCamp, with its openness to all, spirit of sharing and discovery, and emphasis on collaboration, as the embodiment of this community (appropriately enough, the Manifesto was produced collectively at THATCamp Paris). Openness defines how much of the DH community operates and animates its larger goal to promote the growth of knowledge. Indeed, Mark Sample proposes that “the digital humanities is not about building, it’s about sharing,” arguing that the “promise of the digital” comes in the circulation, sharing and discussion of knowledge. Instead of tolerating the slow dissemination of knowledge through antiquated print processes and allowing knowledge to be restricted to those with access to well-funded libraries, Sample suggests, we can develop open solutions that promote conversation, sharing, reuse, and the growth of knowledge.

Noting how frequently terms like “open” and “collaboration” are used in definitions of digital humanities, Eric Johnson suggests that the digital humanities have much in common with the public humanities. Like museum professionals and librarians, digital humanists embrace values such as collaboration, open access, and “[i]nvolvement of the public and/or public ‘communities of passion.’” (I love that term “communities of passion,” which captures the generosity, sense of common purpose and enthusiasm I see in DH).  Many digital humanities projects aim to share knowledge with the public and even engage the public in the construction of that knowledge. Eric advances a useful definition of the open humanities: “those aspects of the humanities aimed at democratizing production and consumption of humanities research.” (I would add teaching and learning).

With this post, I am beginning a series on the open humanities, elaborating on ideas I discussed in my November 2 talk at WPI’s Digital Humanities Symposium. I’ll look at the contexts around open humanities, explore the rationale for open humanities (drawing many examples from digital humanities), and examine challenges facing open humanities, particularly cultural and economic ones. Along the way, I’ll discuss the ongoing development of Anvil Academic, an open publisher for the digital humanities (I’m the program manager).  I hope this series shines a light on some of the great work being done in the DH community and stimulates further conversation about the open humanities.

Thanks to everyone who has commented on a post, spread the word about my blog, encouraged me, shared ideas with me, and helped make the DH community (as contentious as it sometimes can be) one of passion.

Scholarly Communication, Open Education, and Digital Humanities Support Models

Over the past few months, I’ve co-written a working paper on open education and made several presentations, including ones on new models for scholarly communication and digital humanities support models at liberal arts colleges. This post is a catch-all catch-up, an attempt to share this stuff (albeit rather late).

  • “New Models and Modes for Scholarly Publishing in the Digital Age” (PDF of PowerPoint) is a presentation that I gave in March at the Association of Research Libraries Leadership and Career Development Program‘s Institute on Transforming Research Library Roles and Scholarly Communication.  It attempts to synthesize innovative approaches to scholarly communication, with a particular focus on peer review (e.g. peer-to-peer, post-publication), publishing models, and business models. [Update, 11/17/12: I corrected a typo on the slide about the number of eprints in arXiv.]
  • Open Education in the Liberal Arts: A NITLE Working Paper, which I co-wrote with my colleague Bryan Alexander, explores the significance of open education (broadly defined) in the liberal arts context. The paper is made available using the CommentPress platform, which reflects our hope to foster discussion. A PDF version is also available.
  • Models for Supporting Digital Humanities at Liberal Arts Colleges (PDF of PPT) looks at the challenges for small colleges in supporting digital humanities initiatives, as well as strategies such as establishing a center (with brief case studies of Hamilton, University of Richmond, and Occidental), inter-institutional collaboration, and integrating with the co-curriculum. I gave this presentation as part of a Five Colleges of Ohio Next Generation Library workshop hosted by the College of Wooster.

Examples of Collaborative Digital Humanities Projects

As I noted in my last post, humanities scholars rarely co-author articles, and that comes as no surprise. As Blaise Cronin writes, “Collaboration—for which co-authorship is the most visible and compelling indicator—is established practice in both the life and physical sciences, reflecting the industrial scale, capital-intensiveness and complexity of much contemporary scientific research. But the ‘standard model of scholarly publishing,’ one that ‘assumes a work written by an author,’ continues to hold sway in the humanities” (24). Just as I found that only about 2% of the articles published in American Literary History between 2004 and 2008 were co-authored, so Cronin et al. discovered that just 2% of the articles that appeared in the philosophy journal Mind between 1900 and 2000 were written by more than one person, although between 1990 and 2000 that number increased slightly to 4% (Cronin, Shaw, & La Barre). Whereas the scale of scientific research often requires scientists to collaborate with each other, humanities scholars typically need only something to write with and about. But as William Brockman et al. suggest, humanities scholars do have their own traditions of collaboration, or at least of cooperation: “Circulation of drafts, presentation of papers at conferences, and sharing of citations and ideas, however, are collaborative enterprises that give a social and collegial dimension to the solitary activity of writing. At times, the dependence of humanities scholars upon their colleagues can approach joint authorship of a publication” (11).

Information technology can speed and extend the exchange of ideas, as researchers place their drafts online and solicit comments through technologies such as CommentPress, make available conference papers via institutional repositories, and share citations and notes using tools such as Zotero. Over ten years ago, John Unsworth described an ongoing shift from cooperation to collaboration, indicating perhaps both his prescience and the slow pace of change in academia.

In the cooperative model, the individual produces scholarship that refers to and draws on the work of other individuals. In the collaborative model, one works in conjunction with others, jointly producing scholarship that cannot be attributed to a single author. This will happen, and is already happening, because of computers and computer networks. Many of us already cooperate, on networked discussion groups and in private email, in the research of others: we answer questions, provide references for citations, engage in discussion. From here, it’s a small step to collaboration, using those same channels as a way to overcome geographical dispersion, the difference in time zones, and the limitations of our own knowledge.

The limitations of our own knowledge.  As Unsworth also observes, collaboration, despite the challenges it poses, can open up new approaches to inquiry: “instead of establishing a single text, editors can present the whole layered history of composition and dissemination; instead of opening for the reader a single path through a thicket of text, the critic can provide her with a map and a machete. This is not an abdication of the responsibility to educate or illuminate: on the contrary, it engages the reader, the user, as a third kind of collaborator, a collaborator in the construction of meaning.”  With the interactivity of networked digital environments, Unsworth imagines the reader becoming an active co-creator of knowledge.  Through online collaboration, scholars can divide labor (whether in making a translation, developing software, or building a digital collection), exchange and refine ideas (via blogs, wikis, listservs, virtual worlds, etc.), engage multiple perspectives, and work together to solve complex problems.  Indeed, “[e]mpowering enhanced collaboration over distance and across disciplines” is central to the vision of cyberinfrastructure or e-research (Atkins).  Likewise, Web 2.0 focuses on sharing, community and collaboration.

Work in many areas of the digital humanities seems to both depend upon collaboration and aim to support it.  Out of the 116 abstracts for posters, presentations, and panels given at the Digital Humanities 2008 (DH2008) conference, 41 (35%) include a form of the word “collaboration,” whether they are describing collaborative technologies (“Online Collaborative Research with REKn and PReE”) or collaborative teams (“a collaborative group of librarians, scholars and technologists”).  Likewise, 67 out of 104 (64%) papers and posters presented at DH 2008 have more than one author.  (Both the Digital Humanities conference and LLC tend to focus on the computational side of the digital humanities, so I’d also like to see if the pattern of collaboration holds in what Tara McPherson calls the “multimodal humanities,” e.g. journals such as Vectors.  Given that works in Vectors typically are produced through collaborations between scholars and designers, I’d expect to see a somewhat similar pattern.)
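The percentages above are simple proportions of the reported counts; as a quick sanity check, they can be reproduced in a few lines of Python (the counts come from this paragraph, not from a fresh recount of the DH2008 program):

```python
# Counts reported above for the Digital Humanities 2008 (DH2008) conference
abstracts_total = 116              # posters, presentations, and panels
abstracts_with_collaboration = 41  # abstracts using a form of "collaboration"
papers_total = 104                 # papers and posters
papers_multi_author = 67           # works with more than one author

def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

print(percent(abstracts_with_collaboration, abstracts_total))  # 35
print(percent(papers_multi_author, papers_total))  # 64
```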

I was having trouble articulating precisely how collaboration plays a role in humanities research until I began looking for concrete examples—and I found plenty.   As computer networks connect researchers to content, tools and each other, we are seeing humanities projects that facilitate people working together to produce, explore and disseminate knowledge.  I interpret the word “collaboration” broadly; it’s a squishy term with synonyms such as teamwork, cooperation, partnership, and working together, and it also calls to mind co-authorship, communication, community, citizen humanities, and social networks.  In Here Comes Everybody, Clay Shirky puts forward a handy hierarchy of collaboration: 1) sharing; 2) cooperation; 3) collaboration; 4) collectivism (Kelly).  In this post, I’ll list different types of computer-supported collaboration in the humanities, note antecedents in “traditional” scholarship, briefly describe example projects, and point to some supporting technologies.  This is an initial attempt to classify a wide range of activity; some of these categories overlap.

–FACILITATING COMMUNICATION AND KNOWLEDGE BUILDING–

ONLINE COMMUNITIES/ VIRTUAL ORGANIZATIONS

  • Historical antecedents: conferences, colloquia, letters
  • Supporting technologies: listservs, online forums, blogs, social networking platforms, virtual worlds, microblogging (e.g. Twitter), video conferencing
  • Key functions: fostering communication and collaboration across a distance
  • Examples:
    • Listservs: Perhaps the best-known online community in the humanities is H-NET, which was founded in 1992 and thus predates Web 2.0 or even Web 1.0. According to Mark Kornbluh, H-Net provides an “electronic version of an academic conference, a way for people to come together and to talk about their research and their teaching, to announce what was going on in the field, and to review and critique things that are going on in the field.” Currently H-Net supports over 100 humanities email lists and serves over 100,000 subscribers in more than 90 countries. Although H-Net has been criticized for relying on an old technology, the listserv, and is facing economic difficulties, it remains valued for supporting information sharing and discussion. For digital humanities folks, the Humanist list, launched in 1987, serves as “an international online seminar on humanities computing and the digital humanities” and has played a vital part in the intellectual life of the community.
    • Online forums: HASTAC, “a virtual network, a network of networks” that supports collaboration across disciplines and institutions, sponsors lively forums about technology and the humanities, often moderated by graduate students.  HASTAC also organizes conferences, administers a grant competition, and advocates for “new forms of collaboration across communities and disciplines fostered by creative uses of technology.” In my experience, online communities often break down the hierarchies separating graduate students from senior scholars and bring recognition to good ideas, no matter what the source.
    • Online communities: Since 1996, Romantic Circles (RC) has built an online community focused on Romanticism, not only fostering communication among researchers but also collaboratively developing content. Romantic Circles includes a blog for sharing information about news and events of interest to the community; a searchable archive of electronic editions; collections of critical essays; chronologies, indices, bibliographies and other scholarly tools; reviews; pedagogical resources; and a MOO (gaming environment). Over 30 people have served as editors, while over 300 people have contributed reviews and essays. Alan Liu aptly summarizes RC’s significance: “Romantic Circles, which helped pioneer collaborative scholarship on the Web, has become the leading paradigm for what such scholarship could be. One can point variously to the excellence of its refereed editions of primary texts, its panoply of critical and pedagogical resources, its inventive Praxis series, its state-of-the-art use of technology or its stirring commitment (nearly unprecedented on the Web) to spanning the gap between high-school and research-level tiers of education. But ultimately, no one excellence is as important as the overall, holistic impact of the site. We witness here a broad community of scholars using the new media vigorously, inventively, and rigorously to inhabit a period of historical literature together.” In building a community that supports digital scholarship, NINES focuses on three main goals: providing peer review for digital scholarship in 19th century American and British studies (thus helping to legitimize and recognize emerging scholarly forms), helping scholars create digital scholarship by providing training and content, and developing software such as Collex and Juxta to support inquiry and collaboration.
    • Advanced videoconferencing: With budgets tight, time scarce, and concern about the environmental costs  of travel increasing, collaborators often need to meet without having to travel.  AccessGrid supports communication among multiple groups by providing high quality video and audio and enabling researchers to share data and scientific instruments seamlessly.  AccessGrid, which was developed by Argonne National Laboratory and uses open source software, employs large displays and multiple projectors to create an immersive environment.   In the arts and humanities, AccessGrid has been used to support “telematic” performances, the study of high resolution images, seminars, and classes.
CollabRoom by Modbob


COLLABORATORIES

  • Historical antecedents: laboratories, research centers
  • Supporting technologies: grid technologies/ advanced networking, large displays, remote instrumentation, simulation software, collaboration platforms such as HubZero, databases, digital libraries
  • Key functions: fostering communication, collaboration, resource sharing, and research regardless of physical distance
  • Examples:

William Wulf coined the term collaboratory in 1989 to describe a “center without walls, in which the nation’s researchers can perform their research without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries.” Most of the collaboratories listed on the (now somewhat-out-of-date) Science of Collaboratories web site focus on the sciences.  For example, scientific collaboratories such as NanoHub, Space Physics and Astronomy Research Collaboratory (SPARC) and Biomedical Informatics Research Network (BIRN) have supported online data sharing, analysis, and communication.

What would a collaboratory in the humanities do? The term has been used in the humanities to refer to:

“Collaboratory” has thus taken on additional meanings, referring to “a new networked organizational form that also includes social processes; collaboration techniques; formal and informal communication; and agreement on norms, principles, values, and rules” (Cogburn, 2003, via Wikipedia).

“Virtual research environment” seems to be replacing “collaboratory” to refer to online collaborative spaces that provide access to tools and content (e.g. Early Modern Texts VRE, powered by Sakai). Through its funding program focused on Virtual Research Environments, JISC has sponsored the Virtual Research Environment for Archaeology, a VRE for the Study of Documents and Manuscripts, Collaborative Research Events on the Web, and myExperiments for sharing scientific workflows.

–SHARING AND AGGREGATING CONTENT–

DIGITAL MEMORY BANKS/ USER-CONTRIBUTED CONTENT

  • Historical antecedents: museums, archives, personal collections
  • Supporting technologies: Web publishing platforms (e.g. Omeka, Drupal), databases
  • Key functions: “collecting & exhibiting” content (to borrow from CHNM)
  • Examples:
    When the Valley of the Shadow project was launched in the 1990s, project team members went into communities in Pennsylvania and Virginia to digitize 19th century documents held by families in personal collections, thus building a virtual archive.  As scanners and digital cameras have become ubiquitous and user-contributed content sites such as Flickr and YouTube have taken off, people can contribute their own digital artifacts to online collections.  For example, The Hurricane Digital Memory Bank collects over 25,000 stories, images, and other multimedia files about Hurricanes Katrina and Rita.  Using a simple interface, people can upload items and describe the title, keywords, geographic location, and contributor.  The archive thus becomes a dynamic, living repository of current history, a space where researchers and citizens come together—or, in the terminology of the Center for History and New Media (CHNM), a memory bank that “promote[s] popular participation in presenting and preserving the past.”  As the editors of Vectors write in their introduction to “Hurricane Digital Memory Bank: Preserving the Stories of Katrina, Rita, and Wilma,” “Their work troubles a number of binaries long reified by history scholars (and humanities scholars more generally), including one/many, closed/open, expert/amateur, scholarship/journalism, and research/pedagogy.”  CHNM also sponsors digital memory banks focused on Mozilla, September 11, and the Virginia Tech tragedy.  Likewise, the Great War Archive, sponsored by the University of Oxford, contains over 6,500 items about World War I contributed by the public.

CONTENT AGGREGATION AND INTEGRATION

  • Historical antecedents: museums, archives
  • Supporting technologies: databases, open standards
  • Key functions: making it easier to discover, share, and use information
  • Examples:
    Too often digital resources reside in silos, as each library or archive puts up its own digital collection.  As a result, researchers must spend more time identifying, searching, and figuring out how to use relevant digital collections.  However, some projects are shifting away from a siloed approach and bringing together collaborators to build digital collections focused on a particular topic or to develop interoperable, federated digital collections.  For instance, the Alliance for American Quilts, MATRIX: Center for Humane Arts, Letters and Social Sciences Online, and Michigan State University Museum have created the Quilt Index, which makes available images and descriptions of quilts provided by 14 contributors, including The Library of Congress American Folklife Center and the Illinois State Museum.  As Mark Kornbluh argues, interoperable content enables new kinds of inquiry: “In the natural sciences, large new datasets, powerful computers, and a rich array of computational tools are rapidly transforming knowledge generation. For the same to occur in the humanities, we need to understand the principle that ‘more is better.’ Part of what the computer revolution is doing is that it is letting us bring huge volumes of material under control. Cultural artifacts have always been held by separate institutions and separated by distance. Large–scale interoperable digital repositories, like the Quilt Index, open dramatically new possibilities to look at the totality of cultural content in ways never before possible.” Other examples of content aggregation and integration projects include the Walt Whitman Archive’s Finding Aids for Poetry Manuscripts and NINES.

DATA SHARING

  • Historical antecedents: informal exchange of data
  • Supporting technologies: databases (MySQL, etc.), web services tools
  • Key functions: support research by enabling discovery and reuse of data sets
  • Example projects:
    By sharing data, researchers can enable others to build on their work and provide transparency.  As Christine Borgman writes, “If related data and documents can be linked together in a scholarly information infrastructure, creative new forms of data- and information-intensive, distributed, collaborative, multidisciplinary research and learning become possible.  Data are outputs of research, inputs to scholarly publications, and inputs to subsequent research and learning.  Thus they are the foundation of scholarship” (Borgman 115).  Of course, a number of problems are bound up in data sharing—how to ensure participation, make data discoverable through reliable metadata, balance flexibility in accepting a range of formats against the need for standardization, preserve data for the long term, etc.  Several projects focused on humanities and social science data are beginning to confront at least some of these challenges:

    • Open Context “hopes to make archaeological and related datasets far more accessible and usable through common web-based tools.”  Embracing open access and collaboration, Open Context makes it easy for researchers to upload, search, tag and analyze archaeological datasets.
    • Through Open Street Map, people freely and openly share and use geographic data in a wiki-like fashion.  Contributors employ GPS devices to record details about places such as the names of roads, then upload this information to a collaborative database.  The data is used to create detailed maps that have no copyright restrictions (unlike most geographical data).
    • Through the Reading Experience Database researchers can contribute records of British readers engaging with texts.

–COLLABORATIVE ANNOTATION, TRANSCRIPTION, AND KNOWLEDGE PRODUCTION–

CROWDSOURCING TRANSCRIPTION

  • Historical antecedents: genealogical research(?)
  • Supporting technologies: wikis
  • Key functions: share the labor required for transcribing manuscripts
  • Examples:
    Much of the historical record is not yet accessible online because it exists as handwritten documents—letters, diaries, account books, legal documents, etc.  Although work is underway on Optical Character Recognition software for handwritten materials, making these variable documents searchable and easy to read usually still requires a person to manually transcribe the document.  Why not enable people to collaborate to make family documents and other manuscripts available through commons-based peer production? At THATCamp last year, I learned about Ben Brumfield’s FromThePage software, which enables volunteers to transcribe handwritten documents through a web-based interface.  The right side of the interface shows a zoomable image of the page, while on the left volunteers enter the transcription through a wiki-like interface.  Likewise, the FamilySearch Indexing Project, sponsored by the LDS, recruits volunteers to transcribe family information from historical documents.   (See Jeanne Kramer-Smyth’s great account of the THATCamp session on crowdsourcing transcription and annotation.)  Not only can collaborative transcription be more efficient, but it can also reduce error.
Martha Nell Smith recounts how she, working solo at the Houghton, transcribed a line of Susan Dickinson’s poetry as “I’m waiting but the cow’s not back.”  When her collaborators at the Dickinson Electronic Archives, Lara Vetter and Laura Lauth, later compared the transcriptions to digital images of Dickinson’s manuscripts, they discovered that the line actually says “I’m waiting but she comes not back.”  As Smith suggests, “Had we not been working in concert with one another, and had we not had the high quality reproductions of Susan Dickinson’s manuscripts to revisit and thereby perpetually reevaluate our keys to her alphabet, my misreading might have been congealed in the technology of a critical print translation and what is very probably a poetic homage to Emily Dickinson would have lain lost in the annals of literary history” (Smith 849).

    Efforts to crowdsource transcription seem similar to the distributed proofreading that powers Project Gutenberg, which has enlisted volunteers to proofread over 15,000 books since 2000.  Likewise, Project Madurai is using distributed proofreading to build a digital library of Tamil texts.

COLLABORATIVE TRANSLATION

  • Historical antecedents: translation teams, e.g. Pevear and Volokhonsky
  • Supporting technologies: wikis, blogs, machine translation supplemented by human intervention
  • Examples:
    Rather than requiring an individual to undertake the time-intensive work of translating a complex classical text solo, the Suda Online (SOL) brings together classicists to collaborate in translating into English the Suda, a tenth-century encyclopedia of ancient learning written by a committee of Byzantine scholars (and thus itself a collaboration).  In addition to providing translations, SOL also offers commentaries and references, so it serves as a sort of encyclopedic predecessor to Wikipedia.  As Anne Mahoney reports in a recent article from Digital Humanities Quarterly, an email exchange in 1998 sparked the Suda Online; one scholar wondered whether there was an English translation of the Suda (there wasn’t) and others recognized that a translation could be produced through web-based collaboration.  Student programmers at the University of Kentucky quickly developed the technological infrastructure for SOL (a wiki might have been used today, but the custom application has apparently served its purpose well).  Now a self-organizing team of 61 editors and 95 translators from 12 countries has already translated over 21,000 entries, about 2/3 of the total.  Translators make the initial translations, which are then reviewed and augmented by editors (typically classics faculty) and given a quality rating of “draft,” “low,” or “high.”   All who worked on the translation are credited through a sort of open peer review process.  Whereas collaborative projects such as Wikipedia are open to anyone, SOL translators must register with the project.  Mahoney suggests that the collaboration has succeeded in part because it was focused and bounded, so that collaborators could feel the satisfaction of working toward a common goal and meeting milestones, such as 100 entries translated.  According to Mahoney, SOL has made this important text more accessible by offering an English version, making it searchable, and providing commentaries and references.
Moreover, “[a]s a collaboration SOL demonstrates the feasibility of open peer review and the value of incremental progress.” Other collaborative translation projects include The Encyclopédie of Diderot and d’Alembert; Traduwiki, which aims to “eliminate the last barrier of the Internet, the language”; the Worldwide Lexicon project; and Babels.

COLLABORATIVE EDITING

  • Historical antecedents: creating critical editions
  • Supporting technologies: grid computing, XML editors, text analysis tools, annotation tools
  • Example Projects:

As Peter Robinson observed at this year’s MLA, the traditional model for creating a critical edition centralizes authority in an editor, who oversees work by graduate assistants and others.  However, the Internet enables distributed, de-centralized editing.  To create “community-made editions,” a library would digitize texts and produce high quality images, researchers would transcribe those images, others would collate the transcriptions, others would analyze the collations and add commentaries, and so forth.

Explaining the need for collaborative approaches to textual editing, Marc Wilhelm Küster, Christoph Ludwig and Andreas Aschenbrenner of TextGrid describe how three successive editors attempted to create a critical edition of the massive “so-called pseudo-capitulars supposedly written by a Benedictus Levita,” each dying before the work could be completed.  Now a team of scholars is collaborating to create the edition, increasing their chances of completion by sharing the labor.  The TextGrid project is building a virtual workbench for collaborative editing, annotation, analysis and publication of texts.  Leveraging the grid infrastructure, TextGrid provides a platform for “software agents with well-defined interfaces that can be harnessed together through a user defined workflow to mine or analyze existing textual data or to structure new data both manually and automatically.” TextGrid recently released a beta version of its client application that includes an XML editor, search tool, dictionary search tool, metadata annotator, and workflow modules. As Küster, Ludwig and Aschenbrenner point out, enabling collaboration requires not only developing a technical platform that supports real-time collaboration and automation of routine tasks, but also facilitating a cultural shift toward collaboration among philologists, linguists, historians, librarians, and technical experts.

SOCIAL BIBLIOGRAPHIES, COLLABORATIVE FILTERING, AND ANNOTATION

  • Historical antecedents: shared references, bibliographies
  • Key functions: share citations, notes, and scholarly resources; build collective knowledge
  • Supporting technologies: social bookmarking, bibliographic tools
  • Projects:
    With the release of Zotero 2.0, Zotero is taking a huge step toward the vision articulated by Dan Cohen of providing access to “the combined wisdom of hundreds of thousands of scholars” (Cohen).  Researchers can set up groups to share collections with a class and/or collaborators on a research project.   I’ve already used Zotero groups to support my research and to collaborate with others; I discovered several useful citations in the collaboration folder for the digital history group, and with Sterling Fluharty I’ve set up a group to study collaboration in the digital humanities (feel free to join).  Ultimately Zotero will provide Amazon-like recommendation services to help scholars identify relevant resources.  As Stan Katz wrote in hailing Zotero’s collaboration with the Internet Archive to create a “Zotero commons” for sharing research documents, “For secretive individualists, which is to say old-fashioned humanists, this will sound like an invasion of privacy and an invitation to plagiarism. But to scholars who value accessibility, collaboration, and the early exchange of information and insight – the future is available. And free on the Internet.”

    Similarly, the eComma project suggests that collaborative annotation can facilitate collaborative interpretation, as readers catalog poetic devices (personification, enjambment, etc.) and offer their own interpretations of literary works.  You can see eComma at work in the Collaborative Rubáiyát, which enables users to compare different versions of the text, annotate the text, tag it, and access sections through a tag cloud.   Likewise, Philospace will allow scholars to describe philosophical resources, filter them, find resources tagged by others, and submit resulting research for peer review. Other projects and technologies supporting collaborative annotation include Flickr Commons; Aus-e-Lit: Collaborative Integration and Annotation Services for Australian Literature Communities; NINES’ Collex; and STEVE.

COLLABORATIVE WRITING

  • Historical antecedents: Encyclopedias
  • Supporting technologies: Wikis
  • Key functions: sharing knowledge, synthesizing multiple perspectives
  • Examples:
    With the rise of Wikipedia, academics have been debating whether collaborative writing spaces such as wikis undermine authority, expertise, and trustworthiness.  In “Literary Sleuths Online,” Ralph Schroeder and Matthijs Den Besten examine the Pynchon Wiki, a collaborative space where Pynchon enthusiasts annotate and discuss his works.  Schroeder and Den Besten compare the wiki’s section on Pynchon’s Against the Day with a print equivalent, Weisenburger’s “A Gravity’s Rainbow Companion.”  While the annotations in Weisenburger’s book are more concise and consistent, the wiki is more comprehensive, more accurate (because many people are checking the information), and more speedily produced (it only took 3 months for the wiki to cover every page of Pynchon’s novel).   Moreover, the book is fixed, while the wiki is open-ended and expansive. Schroeder and Den Besten suggest that competition, community and curiosity drive participation, since contributors raced to add annotations as they made their way through the novel and “sleuthed” together.

GAMING: “Collaborative Play”/ Games as Research

  • Historical antecedents: role playing games, board games, etc.
  • Key functions: problem solving, teamwork, knowledge sharing
  • Supporting technologies: gaming engines, wikis, networks
  • Example Projects:
    Perhaps some of the most intense collaboration comes in massively multiplayer online games, as teams of players consult each other for assistance navigating virtual worlds, team up to defeat monsters, join guilds to collaborate on quests, and share their knowledge through wikis such as WoWWiki, which has almost 74,000 articles.  Focusing on World of Warcraft, Nardi and Harris explore collaborative play as a form of learning.  They also point to potential applications of gaming in research communities: “Mixed collaboration spaces, whether MMOGs or another format, may be useful in domains such as interdisciplinary scientific work where a key challenge is finding the right collaborators.”

    Sometimes those collaborators can be people without specialized training.  Recently Wired featured a fascinating article about FoldIt, a protein-folding game that is attracting devoted teams of participants (Bohannon).  The game was devised by the University of Washington Departments of Computer Science & Engineering and Biochemistry to crowdsource solutions to the Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP), a scientific contest to predict protein structures.   Previously biochemist David Baker had used Rosetta@home to harness the spare computing cycles of 86,000 PCs that had been volunteered to help determine the shapes of proteins, but he was convinced that human intelligence as well as computing power needed to be tapped to solve spatial puzzles.  Thus he and his colleagues developed a game in which players fold proteins into their optimal shapes, a sort of “global online molecular speed origami.” Over 100,000 people have downloaded the game, and a 13-year-old is one of its best players. Using the game’s chat function, players formed teams, “and collective efforts proved far more successful than any solo folder.”  At the CASP competition, 7 of the 15 solutions contributed through FoldIt worked, and one finished in first place, so “[a] band of gamer nonscientists had beaten the best biochemists.”

    How might gaming be used to motivate and support humanities research?  As we see in the example of FoldIt, games provide motivation and a structure for collaboration; teamwork enables puzzles to be solved more rapidly.  I could imagine, for example, a game in which players would transcribe pieces of a diary to unravel the mystery it recounts, describe the features of a series of images (similar to Google’s Image Labeler game), or offer up their own interpretations of abstruse philosophical or literary passages.  In “Games of Inquiry for Collaborative Concept Structuring,” Mary A. Keeler and Heather D. Pfeiffer envision a “Manuscript Reconstruction Game (MRG)” where Peirce scholars would collaborate to figure out where a manuscript page belongs. “The scholars rely on the mechanism of the game, as a logical editor or ‘logical lens,’ to help them focus on and clarify the complexities of inference and conceptual content in their collaborative view of the manuscript evidence” (407).  There are already some compelling models for humanities game play.  Dan Cohen recently used Twitter to crowdsource solving an historical puzzle. Ian Bogost and collaborators are investigating the intersections between journalism and gaming.  Jerome McGann describes Ivanhoe as an  “online playspace… for organizing collaborative interpretive investigations of traditional humanities materials of any kind,” as two or more players come together to re-imagine and transform a literary work (McGann).

PUBLISHING

  • Historical antecedents: exchange of drafts, letters, critical dialogs in journals
  • Supporting technologies and protocols: CommentPress, blogs, wikis, Creative Commons licenses, etc.
  • Projects:
    Bob Stein defines the book as “a place where readers (and sometimes authors) congregate.” Recent projects enable readers to participate in all phases of the publishing process, from peer-to-peer review to remixing a work to produce something new.  For instance, LiquidPub aims to transform the dissemination and evaluation of scientific knowledge by enabling “Liquid Publication that can take multiple forms, that evolves continuously, and is enriched by multiple sources.”  Using CommentPress, Noah Wardrip-Fruin experimented with peer-to-peer review of his new book Expressive Processing alongside traditional peer review, posting a section of the book each weekday to the Grand Text Auto blog.  Although it was difficult for many reviewers to get a sense of the book’s overall arguments when they were reading only fragments, Wardrip-Fruin found many benefits to this open approach to peer review: he could engage in conversation with his reviewers and determine how to act on their comments, and he received detailed comments from both academics and non-academics with expertise in the topics being discussed, such as game designers.  Similarly, O’Reilly recently developed the Open Publishing Feedback System to gather comments from the community.  Its first experiment, Programming Scala, yielded over 7,000 comments from nearly 750 people. New publishing companies such as WeBook and Vook are exploring collaborative authorship and multimedia.

SOCIAL LEARNING

  • Historical antecedents: Students as research assistants?
  • Supporting technologies: blogs, wikis, social bookmarking, social bibliographies
  • Motto: “We participate, therefore we are.” (via John Seely Brown)
  • Example:
    As John Seely Brown explains, “social learning is based on the premise that our understanding of content is socially constructed through conversations about that content and through grounded interactions, especially with others, around problems or actions.”  Social learning involves “learning to be” an expert through apprenticeship, as well as learning the content and language of a domain.  Brown points to open source communities as exemplifying social learning.  I would guess that many, if not most, collaborative digital humanities projects have depended on contributions from undergraduate and graduate students, whether they digitized materials, did programming, authored metadata, contributed to the project wiki, designed the web site, or even managed the project.

    Why not create a network of research projects, so that students studying a similar topic could jointly contribute to a common resource?  Such is the vision of “Looking for Whitman: The Poetry of Place in the Life and Work of Walt Whitman,” led by Matthew Gold.   Working together to build a common web site on Whitman, students will document their research using Web 2.0 technologies such as CommentPress, BuddyPress (WordPress + social networking), blogs, wikis, YouTube, Flickr, Google Maps, etc.  Students at City Tech (CUNY’s New York City College of Technology) and New York University will focus on Whitman in New York;  those at Rutgers University at Camden will look at Whitman as “sage of Camden”; and those at the University of Mary Washington will examine Whitman and the Civil War.   Similarly, Michael Wesch, the 2008 CASE/Carnegie U.S. Professor of the Year for Doctoral and Research Universities, asks his students to become “co-creators” of knowledge, whether in simulating world history and cultures, creating an ethnography of YouTube, or examining anonymity and new media.

While collaboration in the humanities is certainly not new, these projects suggest how researchers (both professional and amateur) can work together regardless of physical location to share ideas and citations, produce translations or transcriptions, and create common scholarly resources.  Long as this list is, I know I’m omitting many other relevant projects (some of which I’ve bookmarked) and overlooking (for now) the challenges that collaborative scholarship faces.  I’ll be working with several collaborators to explore these issues, but I of course welcome comments….

Works Cited

Atkins, Dan. Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. NSF. January 2003. <http://www.nsf.gov/od/oci/reports/toc.jsp>.
Bohannon, John. “Gamers Unravel the Secret Life of Protein.” Wired 20 Apr 2009. 26 May 2009 <http://www.wired.com/medtech/genetics/magazine/17-05/ff_protein?currentPage=all>.
Borgman, Christine L. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass.: The MIT Press, 2007.
Brockman, William et al. Scholarly Work in the Humanities and the Evolving Information Environment. CLIR/DLF, 2001. 24 Jul 2007 <http://www.clir.org/PUBS/reports/pub104/pub104.pdf>.
Cohen, Daniel J. “Zotero: Social and Semantic Computing for Historical Scholarship.” Perspectives (2007). 27 May 2009 <http://www.historians.org/perspectives/issues/2007/0705/0705tec2.cfm>.
Cronin, Blaise, Debora Shaw, and Kathryn La Barre. “A cast of thousands: Coauthorship and subauthorship collaboration in the 20th century as manifested in the scholarly journal literature of psychology and philosophy.” Journal of the American Society for Information Science and Technology 54.9 (2003): 855-871.
Cronin, Blaise. The Hand of Science. Scarecrow Press, 2005.
Kelly, Kevin. “The New Socialism: Global Collectivist Society Is Coming Online.” Wired 22 May 2009. 26 May 2009 <http://www.wired.com/culture/culturereviews/magazine/17-06/nep_newsocialism?currentPage=all>.
Kornbluh, Mark. “From Digital Repositories to Information Habitats: H-Net, the Quilt Index, Cyberinfrastructure, and Digital Humanities.” First Monday 13.8 (4 August 2008).
Küster, M.W., C. Ludwig, and A. Aschenbrenner. “TextGrid as a Digital Ecosystem.” Digital EcoSystems and Technologies Conference, 2007. DEST ’07. Inaugural IEEE-IES. 2007. 506-511.
Mahoney, Anne. “Tachypaedia Byzantina: The Suda On Line as Collaborative Encyclopedia.”  Digital Humanities Quarterly. 3.1 (2009). 22 Mar 2009 <http://www.digitalhumanities.org/dhq/vol/003/1/000025.html>.
McGann, Jerome J. “Culture and Technology: The Way We Live Now, What Is to Be Done?.” New Literary History 36.1 (2005): 71-82.
Nardi, Bonnie, and Justin Harris. “Strangers and friends: collaborative play in world of warcraft.” Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. Banff, Alberta, Canada: ACM, 2006. 149-158. 18 May 2009 <http://portal.acm.org/citation.cfm?id=1180875.1180898>.
O’Donnell, Daniel Paul. “Disciplinary Impact and Technological Obsolescence in Digital Medieval Studies.” A Companion To Digital Humanities. 2 May 2009 <http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405148641/9781405148641.xml&chunk.id=ss1-4-2&toc.id=0&brand=9781405148641_brand>.
Schroeder, Ralph, and Matthijs Den Besten. “Literary Sleuths On-line: e-Research collaboration on the Pynchon Wiki.” Information, Communication & Society 11.2 (2008): 167-187.
Smith, Martha Nell. “Computing: What Has American Literary Study To Do with It.” American Literature 74.4 (2002): 833-857.
Unsworth, John M. “Creating Digital Resources: the Work of Many Hands.” 14 Sep 1997. 10 Mar 2009 <http://www3.isrl.uiuc.edu/%7Eunsworth/drh97.html>.

Revisions: Fixed FromThePage link, 6/1/09; changed Tanya to Tara, 6/2/09; fixed typos (6/14/09)

Digital Humanities in 2008, III: Research

In this final installment of my summary of Digital Humanities in 2008, I’ll discuss developments in digital humanities research. (I should note that if I attempted to give a true synthesis of the year in digital humanities, this would be coming out 4 years late rather than 4 months, so this discussion reflects my own idiosyncratic interests.)

1) Defining research challenges & opportunities

What are some of the key research challenges in digital humanities? Leading scholars tackled this question when CLIR and the NEH convened a workshop on Promoting Digital Scholarship: Formulating Research Challenges In the Humanities, Social Sciences and Computation. Prior to the workshop, six scholars in classics, architectural history, physics/information sciences, literature, visualization, and information retrieval wrote brief overviews of their field and of the ways that information technology could help to advance it. By articulating the central concerns of their fields so concisely, these essays promote interdisciplinary conversation and collaboration; they’re also fun to read. As Doug Oard writes in describing the natural language processing “tribe,” “Learning a bit about the other folks is a good way to start any process of communication… The situation is really quite simple: they are organized as tribes, they work their magic using models (rather like voodoo), they worship the word “maybe,” and they never do anything right.” Sounds like my kind of tribe. Indeed, I’d love to see a wiki where experts in fields ranging from computational biology to postcolonial studies write brief essays about their fields, provide a bibliography of foundational works, and articulate both key challenges and opportunities for collaboration. (Perhaps such information could be automatically aggregated using semantic technologies—see, for instance, Concept Web or Kosmix–but I admire the often witty, personal voices of these essays.)

Here are some key ideas that emerge from the essays:

  1. Global Humanistic Studies: Both Caroline Levander and Greg Crane, Alison Babeu, David Bamman, Lisa Cerrato, and Rashmi Singhal call for a sort of global humanistic studies, whether re-conceiving American studies from a hemispheric perspective or re-considering the Persian Wars from the Persian point of view. Scholars working in global humanistic studies face significant challenges, such as the need to read texts in many languages and understand multiple cultural contexts. Emerging technologies promise to help scholars address these problems. For instance, named entity extraction, machine translation and reading support tools can help scholars make sense of works that would otherwise be inaccessible to them; visualization tools can enable researchers “to explore spatial and temporal dynamism;” and collaborative workspaces allow scholars to divide up work, share ideas, and approach a complex research problem from multiple perspectives. Moreover, a shift toward openly accessible data will enable scholars to more easily identify and build on relevant work. Describing how reading support tools enable researchers to work more productively, Crane et al. write, “By automatically linking inflected words in a text to linguistic analyses and dictionary entries we have already allowed readers to spend more time thinking about the text than was possible as they flipped through print dictionaries. Reading support tools allow readers to understand linguistic sources at an earlier stage of their training and to ask questions, no matter how advanced their knowledge, that were not feasible in print.” We can see a similar intersection between digital humanities and global humanities in projects like the Global Middle Ages.
  2. What skills do humanities scholars need? Doug Oard suggests that humanities scholars should collaborate with computer scientists to define and tackle “challenge problems” so that the development of new technologies is grounded in real scholarly needs. Ultimately, “humanities scholars are going to need to learn a bit of probability theory” so that they can understand the accuracy of automatic methods for processing data, the “science of maybe.” How does probability theory jibe with humanistic traditions of ambiguity and interpretation? And how are humanities scholars going to learn these skills?

According to the symposium, major research challenges for the digital humanities include:

  1. “Scale and the poverty of abundance”: developing tools and methods to deal with the plenitude of data, including text mining and analysis, visualization, data management and archiving, and sustainability.
  2. Representing place and time: figuring out how to support geo-temporal analysis and enable that analysis to be documented, preserved, and replicated
  3. Social networking and the economy of attention: understanding research behaviors online; analyzing text corpora based on these behaviors (e.g. citation networks)
  4. Establishing a research infrastructure that facilitates access, interdisciplinary collaboration, and sustainability. As one participant asked, “What is the Protein Data Bank for the humanities?”

2) High performance computing: visualization, modeling, text mining

What are some of the most promising research areas in digital humanities? In a sense, the three recent winners of the NEH/DOE’s High Performance Computing Initiative define three of the main areas of digital humanities and demonstrate how advanced computing can open up new approaches to humanistic research.

  • text mining and text analysis: For its project on “Large-Scale Learning and the Automatic Analysis of Historical Texts,” the Perseus Digital Library at Tufts University is examining how words in Latin and Greek have changed over time by comparing the linguistic structure of classical texts with works written in the last 2000 years. In the press release announcing the winners, David Bamman, a senior researcher in computational linguistics with the Perseus Project, said that “[h]igh performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin.”
  • 3D modeling: For its "High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage" project, the Institute for Advanced Technology in the Humanities at the University of Virginia will reprocess existing data to create 3D models of culturally significant artifacts and architecture. For example, IATH hopes to re-assemble fragments that chipped off ancient Greek and Roman artifacts.
  • Visualization and cultural analysis: The University of California, San Diego's "Visualizing Patterns in Databases of Cultural Images and Video" project will study contemporary culture, analyzing datastreams such as "millions of images, paintings, professional photography, graphic design, user-generated photos; as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos." Ultimately the project will produce detailed visualizations of cultural phenomena.

Winners received compute time on a supercomputer and technical training.

Of course, there’s more to digital humanities than text mining, 3D modeling, and visualization. For instance, the category listing for the Digital Humanities and Computer Science conference at Chicago reveals the diversity of participants’ fields of interest. Top areas include text analysis; libraries/digital archives; imaging/visualization; data mining/machine learning; information retrieval; semantic search; collaborative technologies; electronic literature; and GIS mapping. A simple analysis of the most frequently appearing terms in the Digital Humanities 2008 Book of Abstracts suggests that much research continues to focus on text—which makes sense, given the importance of written language to humanities research.  Here’s the list TAPOR generated of the 10 most frequently used terms in the DH 2008 abstracts:

  1. text: 769
  2. digital: 763
  3. data: 559
  4. information: 546
  5. humanities: 517
  6. research: 501
  7. university: 462
  8. new: 437
  9. texts: 413
  10. project: 396

“Images” is used 161 times and “visualization” 46 times.

Wordle: Digital Humanities 2008 Book of Abstracts

And here’s the word cloud. As someone who got started in digital humanities by marking up texts in TEI, I’m always interested in learning about developments in encoding, analyzing and visualizing texts, but some of the coolest sessions I attended at DH 2008 tackled other questions: How do we reconstruct damaged ancient manuscripts? How do we archive dance performances? Why does the digital humanities community emphasize tools instead of services?
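Out of curiosity about how such frequency lists are built: the core of what a tool like TAPOR does here can be sketched in a few lines of Python. This is my own illustrative sketch, with a made-up sample text and stopword list, not TAPOR’s actual code:

```python
import re
from collections import Counter

def top_terms(text, n=10, stopwords=None):
    """Return the n most frequent word tokens in text, ignoring stopwords."""
    stopwords = stopwords or {"the", "of", "and", "a", "in", "to", "is"}
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenization
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Hypothetical miniature "abstract" standing in for the DH 2008 corpus
sample = "digital text data text humanities research text digital data"
print(top_terms(sample, n=3))
# [('text', 3), ('digital', 2), ('data', 2)]
```

A real run over the Book of Abstracts would of course need smarter tokenization and a fuller stopword list, but the counting step is just this.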

3) Focus on method

As digital humanities emerges, much attention is being devoted to developing research methodologies. In “Sunset for Ideology, Sunrise for Methodology?,” Tom Scheinfeldt suggests that humanities scholarship is beginning to tilt toward methodology, that we are entering a “new phase of scholarship that will be dominated not by ideas, but once again by organizing activities, both in terms of organizing knowledge and organizing ourselves and our work.”

So what are some examples of methods developed and/or applied by digital humanities researchers? In “Meaning and mining: the impact of implicit assumptions in data mining for the humanities,” Bradley Pasanek and D. Sculley tackle methodological challenges posed by mining humanities data, arguing that literary critics must devise standards for making arguments based upon data mining. Through a case study testing Lakoff’s theory that political ideology is defined by metaphor, Pasanek and Sculley demonstrate that the selection of algorithms and representation of data influence the results of data mining experiments. Insisting that interpretation is central to working with humanities data, they concur with Steve Ramsay and others in contending that data mining may be most significant in “highlighting ambiguities and conflicts that lie latent within the text itself.” They offer some sensible recommendations for best practices, including making assumptions about the data and texts explicit; using multiple methods and representations; reporting all trials; making data available and experiments reproducible; and engaging in peer review of methodology.
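Pasanek and Sculley’s point about representation shaping results can be made concrete with a toy example (my own sketch, not from their paper): the same three “texts,” reduced to term-frequency vectors, yield a different nearest neighbor depending on whether words are counted or merely marked present or absent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))

def binarize(v):
    """Presence/absence representation: every observed term counts once."""
    return {t: 1 for t in v}

# Invented toy "texts" as term-frequency vectors
a = {"war": 10, "peace": 1, "love": 1}
b = {"war": 9}
c = {"war": 1, "peace": 1, "love": 1}

# Under raw counts, b is most similar to a; under binary presence, c is.
print(cosine(a, b) > cosine(a, c))                                            # True
print(cosine(binarize(a), binarize(b)) > cosine(binarize(a), binarize(c)))    # False
```

The data never changed; only the representation did, which is exactly why they insist that such choices be made explicit and that multiple representations be tried.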

4) Digital literary studies

Different methodological approaches to literary study are discussed in the Companion to Digital Literary Studies (DLS), which was edited by Susan Schreibman and Ray Siemens and was released for free online in the fall of 2008. Kudos to its publisher, Blackwell, for making the hefty volume available, along with A Companion to Digital Humanities. The book includes essays such as “Reading digital literature: surface, data, interaction, and expressive processing” by Noah Wardrip-Fruin, “The Virtual Codex from page space to e-space” by Johanna Drucker, “Algorithmic criticism” by Steve Ramsay, and “Knowing true things by what their mockeries be: modelling in the humanities” by Willard McCarty. DLS also provides a handy annotated bibliography by Tanya Clement and Gretchen Gueguen that highlights some of the key scholarly resources in literature, including Digital Transcriptions and Images, Born-Digital Texts and New Media Objects, and Criticism, Reviews, and Tools. I expect that the book will be used frequently in digital humanities courses and will be a foundational work.

5) Crafting history: History Appliances

For me, the coolest—most innovative, most unexpected, most wow!—work of the year came from the ever-inventive Bill Turkel, who is exploring humanistic fabrication (not in the Mills Kelly sense of making up stuff ;), but in the DIY sense of making stuff). Turkel is working on “materialization,” giving a digital representation physical form by using, for example, a rapid prototyping machine, a sort of 3D printer. Turkel points to several reasons why humanities scholars should experiment with fabrication: they can be like da Vinci, making the connection between the mind and hand by realizing an idea in physical form; study the past by recreating historical objects (fossils, historical artifacts, etc.) that can be touched, rotated, scrutinized; explore “haptic history,” a sensual experience of the past; and engage in “critical technical practice,” where scholars both create and critique.

Turkel envisions making digital information “available in interactive, ambient and tangible forms.”  As Turkel argues, “As academic researchers we have tended to emphasize opportunities for dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. We need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment.”

Turkel and his team are working on four devices: a dashboard, which shows both public and customized information streams on a large display; imagescapes and soundscapes that present streams of complex data as artificial landscapes or sound, aiding awareness; a GeoDJ, an iPod-like device that uses GPS and GIS to detect your location and deliver audio associated with it (e.g., percussion for an historic industrial site); and ice cores and tree rings, “tangible browsers that allow the user to explore digital models of climate history by manipulating physical interfaces that are based on this evidence.” This work on ambient computing and tangible interfaces promises to foster awareness and open up understanding of scholarly data by tapping people’s natural way of comprehending the world through touch and other forms of sensory perception. (I guess the senses of smell and taste are difficult to include in sensual history, although I’m not sure I want to smell or taste many historical artifacts or experiences anyway. I would like to re-create the invention of the Toll House cookie, which for me qualifies as an historic occasion.) This approach to humanistic inquiry and representation requires the resources of a science lab or art studio—a large, well-ventilated space as well as equipment like a laser scanner, lathes, mills, saws, and calipers.
Unfortunately, Turkel has stopped writing his terrific blog “Digital History Hacks” to focus on his new interests, but this work is so fascinating that I’m anxious to see what comes next–which describes my attitude toward digital humanities in general.

Digital Humanities in 2008, II: Scholarly Communication & Open Access

Open access, just like dark chocolate and blueberries, is good and good for you, enabling information to be mined and reused, fostering the exchange of ideas, and ensuring public access to research that taxpayers often helped to fund.  Moreover, as Dan Cohen contends, scholars benefit from open access to their work, since their own visibility increases: “In a world where we have instantaneous access to billions of documents online, why would you want the precious article or book you spent so much time on to exist only on paper, or behind a pay wall? This is a sure path to invisibility in the digital age.”  Thus some scholars are embracing social scholarship, which promotes openness, collaboration, and sharing research.  This year saw some positive developments in open access and scholarly communications, such as the implementation of the NIH mandate, Harvard’s Faculty of Arts & Science’s decision to go open access (followed by Harvard Law), and the launch of the Open Humanities Press.  But there were also some worrisome developments (the Conyers Bill’s attempt to rescind the NIH mandate, EndNote’s lawsuit against Zotero) and some confusing ones (the Google Books settlement).  In the second part of my summary on the year in digital humanities, I’ll look broadly at the scholarly communication landscape, discussing open access to educational materials, new publication models, the Google Books settlement, and cultural obstacles to digital publication.

Open Access Grows–and Faces Resistance

Ask Me About Open Access by mollyali

In December of 2007, the NIH Public Access Policy was signed into law, mandating that any research funded by the NIH would be deposited in PubMed Central within a year of its publication.  Since the mandate was implemented, almost 3000 new biomedical manuscripts have been deposited into PubMed Central each month.  Now John Conyers has put forward a bill that would rescind the NIH mandate and prohibit other federal agencies from implementing similar policies.  This bill would deny the public access to research that it funded and choke innovation and scientific discovery.  According to Elias Zerhouni, former director of the NIH, there is no evidence that the mandate harms publishers; rather, it maximizes the public’s “return on its investment” in funding scientific research.  If you support public access to research, contact your representative and express your opposition to this bill before February 28.  The Alliance for Taxpayer Access offers a useful summary of key issues as well as a letter template at http://www.taxpayeraccess.org/action/HR801-09-0211.html.

Open Humanities?

Why has the humanities been lagging behind the sciences in adopting open access?  Gary Hall points to several ways in which the sciences differ from the humanities, including science’s greater funding for “author pays” open access and emphasis on disseminating information rapidly, as well as humanities’ “negative perception of the digital medium.”  But Hall is challenging that perception by helping to launch the Open Humanities Press (OHP) and publishing “Digitize This Book.”  Billing itself as “an international open access publishing collective in critical and cultural theory,” OHP selects journals for inclusion in the collective based upon their adherence to publication standards, open access standards, design standards, technical standards, and editorial best practices. Prominent scholars such as Jonathan Culler, Stephen Greenblatt, and Jerome McGann have signed on as board members of the Open Humanities Press, giving it more prestige and academic credibility.  In a talk at UC Irvine last spring, OHP co-founder Sigi Jöttkandt refuted the assumption that open access means “a sort of open free-for-all of publishing” rather than high-quality, peer-reviewed scholarship.  Jöttkandt argued that open access should be fundamental to the digital humanities: “as long as the primary and secondary materials that these tools operate on remain locked away in walled gardens, the Digital Humanities will fail to fulfill the real promise of innovation contained in the digital medium.”  It’s worth noting that many digital humanities resources are available as open access, including Digital Humanities Quarterly, the Rossetti Archive, and projects developed by CHNM; many others may not be explicitly open access, but they make information available for free.

In “ANTHROPOLOGY OF/IN CIRCULATION: The Future of Open Access and Scholarly Societies,” Christopher Kelty, Michael M. J. Fischer, Alex “Rex” Golub, Jason Baird Jackson, Kimberly Christen, and Michael F. Brown engage in a wide-ranging discussion of open access in anthropology, prompted in part by the American Anthropological Association’s decision to move its publishing activities to Wiley Blackwell.  This rich conversation explores different models for open access, the role of scholarly societies in publishing, building community around research problems, reusing and remixing scholarly content, the economics of publishing, the connection between scholarly reputation and readers’ access to publications, how to make content accessible to source communities, and much more.   As Kelty argues, “The future of innovative scholarship is not only in the AAA (American Anthropological Association) and its journals, but in the structures we build that allow our research to circulate and interact in ways it never could before.”  Kelty (who, alas, was lured away from Rice by UCLA) is exploring how to make scholarship more open and interactive.  You can buy a print copy of Two Bits, his new book on the free software movement published by Duke UP; read (for free) a PDF version of the book; comment on the CommentPress version; or download and remix the HTML.  Reporting on Two Bits at Six Months, Kelty observed, “Duke is making as little or as much money on the book as they do on others of its ilk, and yet I am getting much more from it being open access than I might otherwise.”  The project has made Kelty more visible as a scholar, leading to more media attention, invitations to give lectures and submit papers, etc.

New Models of Scholarly Communication, and Continued Resistance

To what extent are new publishing models emerging as the Internet enables the rapid, inexpensive distribution of information, the incorporation of multimedia into publications, and networked collaboration? To find out, the ARL/Ithaka New Model Publications Study conducted an “organized scan” of emerging scholarly publications such as blogs, ejournals, and research hubs.  ARL recruited 301 volunteer librarians from 46 colleges and universities to interview faculty about new model publications that they used.  (I participated in a small way, interviewing one faculty member at Rice.)  According to the report, examples of new model publications exist in all disciplines, although scientists are more likely to use pre-print repositories, while humanities scholars participate more frequently in discussion forums.  The study identifies eight principal types of scholarly resources:

  • E-only journals
  • Reviews
  • Preprints and working papers
  • Encyclopedias, dictionaries, and annotated  content
  • Data
  • Blogs
  • Discussion forums
  • Professional and scholarly hubs

These categories provide a sort of abbreviated field manual for identifying different types of new model publications.  I might add a few more categories, such as collaborative commentary or peer-to-peer review (exemplified by projects that use CommentPress); scholarly wikis like OpenWetWare that enable open sharing of scholarly information; and research portals like NINES (which perhaps would be considered a “hub”).  The report offers fascinating examples of innovative publications, such as ejournals that publish articles as they are ready rather than on a set schedule and a video journal that documents experimental methods in biology.  Since only a few examples of new model publications could fit into this brief report, ARL is making available brief descriptions of 206 resources that it considered to be “original and scholarly works” via a publicly accessible database.

My favorite example of a new model publication: eBird, a project initiated by the Cornell Lab of Ornithology and the Audubon Society that enlists amateur and professional bird watchers to collect bird observation data.  Scientists then use this data to understand the “distribution and abundance” of birds.  Initially eBird ran into difficulty getting birders to participate, so they developed tools that allowed birders to get credit and feel part of a community, to “manage and maintain their lists online, to compare their observations with others’ observations.” I love the motto and mission of eBird—“Notice nature.”  I wonder if a similar collaborative research site could be set up for, say, the performing arts (ePerformances.org?), where audience members would document arts and humanities in the wild–plays, ballets, performance art, poetry readings, etc.

The ARL/Ithaka report also highlights some of the challenges faced by these new model publications, such as the conservatism of academic culture, the difficulty of getting scholars to participate in online forums, and finding ways to fund and sustain publications.  In  Interim Report: Assessing the Future Landscape of Scholarly Communication, Diane Harley and her colleagues at the University of California Berkeley delve into some of these challenges.  Harley finds that although some scholars are interested in publishing their research as interactive multimedia, “(1) new forms must be perceived as having undergone rigorous peer review, (2) few untenured scholars are presenting such publications as part of their tenure cases, and (3) the mechanisms for evaluating new genres (e.g., nonlinear narratives and multimedia publications) may be prohibitive for reviewers in terms of time and inclination.” Humanities researchers are typically less concerned with the speed of publication than scientists and social scientists, but they do complain about journals’ unwillingness to include many high quality images and would like to link from their arguments to supporting primary source material. However, faculty are not aware of any easy-to-use tools or support that would enable them to author multimedia works and are therefore less likely to experiment with new forms.  Scholars in all fields included in the study do share their research with other scholars, typically through emails and other forms of personal communication, but many regard blogs as “a waste of time because they are not peer reviewed.”  Similarly, Ithaka’s 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education (published in 2008) found that “faculty decisions about where and how to publish the results of their research are principally based on the visibility within their field of a particular option,” not open access.

But academic conservatism shouldn’t keep us from imagining and experimenting with alternative approaches to scholarly publishing.  Kathleen Fitzpatrick’s “book-like-object” (blob) proposal, Planned Obsolescence: Publishing, Technology, and the Future of the Academy, offers a bold and compelling vision of the future of academic publishing.  Fitzpatrick calls for academia to break out of its zombie-like adherence to (un)dead forms and proposes “peer-to-peer” review (as in Wikipedia), focusing on process rather than product (as in blogs), and engaging in networked conversation (as in CommentPress). (If references to zombies and blobs make you think Fitzpatrick’s stuff is fun to read as well as insightful, you’d be right.)

EndNote Sues Zotero

Normally I have trouble attracting faculty and grad students to workshops exploring research tools and scholarly communication issues, but they’ve been flocking to my workshops on Zotero, which they recognize as a tool that will help them work more productively.  Apparently Thomson Reuters, the maker of EndNote, has noticed the competitive threat posed by Zotero, since they have sued George Mason University, which produces Zotero, alleging that programmers reverse engineered EndNote so that they could convert proprietary EndNote .ens files into open Zotero .csl files.  Commentators more knowledgeable about the technical and legal details than I have found Thomson’s claims to be bogus.  My cynical read on this lawsuit is that EndNote saw a threat from a popular, powerful open source application and pursued legal action rather than competing by producing a better product.  As Hugh Cayless suggests, “This is an act of sheer desperation on the part of Thomson Reuters” and shows that Zotero has “scared your competitors enough to make them go running to Daddy, thus unequivocally validating your business model.”

The lawsuit seems to realize Yochai Benkler’s description of proprietary attempts to control information:

“In law, we see a continual tightening of the control that the owners of exclusive rights are given.  Copyrights are longer, apply to more uses, and are interpreted as reaching into every corner of valuable use. Trademarks are stronger and more aggressive. Patents have expanded to new domains and are given greater leeway. All these changes are skewing the institutional ecology in favor of business models and production practices that are based on exclusive proprietary claims; they are lobbied for by firms that collect large rents if these laws are expanded, followed, and enforced. Social trends in the past few years, however, are pushing in the opposite direction.”

Unfortunately, the lawsuit seems to be having a chilling effect that ultimately will, I think, hurt EndNote.  For instance, the developers of BibApp, “a publication-list manager and repository-populator,” decided not to import citation lists produced by EndNote, since “doing anything with their homegrown formats has been proven hazardous.” This lawsuit raises the crucial issue of whether researchers can move their data from one system to another.  Why would I want to choose a product that locks me in?  As Nature wrote in an editorial quoted by CHNM in its response to the lawsuit, “The virtues of interoperability and easy data-sharing among researchers are worth restating.”

Google Books Settlement

Google Books by Jon Wiley

In the fall, Google settled with the Authors Guild and the Association of American Publishers over Google Book Search, allowing academic libraries to subscribe to a full-text collection of millions of out-of-print but (possibly) in-copyright books.  (Google estimates that about 70% of published books fall into this category).  Individuals can also purchase access to books, and libraries will be given a single terminal that will provide free access to the collection.  On a pragmatic (and gluttonous) level, I think, Oh boy, this settlement will give me access to so much stuff.   But, like others, I am concerned about one company owning all of this information, see the Book Rights Registry as potentially anti-competitive, and wish that a Google victory in court had verified fair use principles (even if such a decision probably would have kept us in snippet view or limited preview for in-copyright materials).  Libraries have some legitimate concerns about access, privacy, intellectual freedom, equitable treatment, and terms of use.  Indeed, Harvard pulled out of the project over concerns about cost and accessibility.  As Robert Darnton, director of the Harvard Library and a prominent scholar of book history, wrote in the NY Review of Books, “To digitize collections and sell the product in ways that fail to guarantee wide access… would turn the Internet into an instrument for privatizing knowledge that belongs in the public sphere.” Although the settlement makes a provision for “non-consumptive research” (using the books without reading them) that seems to allow for text mining and other computational research, I worry that digital humanists and other scholars won’t have access to the data they need.  What if Google goes under, or goes evil? 
But the establishment of the Hathi Trust by several of Google Book’s academic library partners (and others) makes me feel a little better about access and preservation issues, and I noted that Hathi Trust will provide a corpus of 50,000 documents for the NEH’s Digging into the Data Challenge.  And as I argued in an earlier series of blog posts, I certainly do see how Google Books can transform research by providing access to so much information.

Around the same time (same day?) that the Google Books settlement was released, the Open Content Alliance (OCA) reached an important milestone, providing access to over a million books.  As its name suggests, the OCA makes scanned books openly available for reading, download, and analysis, and from my observations the quality of the digitization is better.  Although the OCA’s collection is smaller and it focuses on public domain materials, it offers a vital alternative to GB.  (Rice is a member of the Open Content Alliance.)

Next up in the series on digital humanities in 2008: my attempt to summarize recent developments in research.

Work Product Blog

Matt Wilkens, post-doctoral fellow at Rice’s Humanities Research Center, recently launched Work Product, a blog that chronicles his research in digital humanities, contemporary fiction, and literary theory.  Matt details how he is working through the challenges he faces as he tries to analyze the relationship between allegory and revolution by using text mining, such as:
  • Where and how to get large literary corpora: Matt looks at how much content is available through Project Gutenberg, the Open Content Alliance, Google Books, and Hathi Trust, and how difficult it is to access.
  • Evaluating part-of-speech taggers, with information about speed and accuracy.

I think that other researchers working on text mining projects will benefit from Matt’s careful documentation of his process.

By the way, Matt’s blog can be thought of as part of the movement called “open notebook science,” which Jean-Claude Bradley defines as “a laboratory notebook… that is freely available and indexed on common search engines.”  Other humanities and social sciences blogs that are likewise ongoing explorations of particular research projects include Wesley Raabe’s blog, Another Anthro Blog, and Erkan’s Field Diary.  (Please alert me to others!)

Is Wikipedia Becoming a Respectable Academic Source?

Last year a colleague in the English department described a conversation in which a friend revealed a dirty little secret: “I use Wikipedia all the time for my research—but I certainly wouldn’t cite it.”  This got me wondering: How many humanities and social sciences researchers are discussing, using, and citing Wikipedia?  To find out, I searched Project Muse and JSTOR, leading electronic journal collections for the humanities and social sciences, for the term “wikipedia,” which picked up both references to Wikipedia and citations of the Wikipedia URL.  I retrieved 167 results from between 2002 and 2008, all but 8 of which came from Project Muse.  (JSTOR covers more journals and a wider range of disciplines but does not provide access to issues published in the last 3-5 years.)  In contrast, Project Muse lists 149 results in a search for “Encyclopedia Britannica” between 2002 and 2008, and JSTOR lists 3.  I found that citations of Wikipedia have been increasing steadily: from 1 in 2002 (not surprisingly, by Yochai Benkler) to 17 in 2005 to 56 in 2007. So far Wikipedia has been cited 52 times in 2008, and it’s only August.

Along with the increasing number of citations, another indicator that Wikipedia may be gaining respectability is its citation by well-known scholars.  Indeed, several scholars both cite Wikipedia and are themselves subjects of Wikipedia entries, including Gayatri Spivak, Yochai Benkler, Hal Varian, Henry Jenkins, Jerome McGann, Lawrence Buell, and Donna Haraway.

111 of the sources (66.5%) are what I call “straight citations”—citations of Wikipedia without commentary about it–while 56 (33.5%) comment on Wikipedia as a source, either positively or negatively.  14.5% of the total citations come from literary studies, 14% from cultural studies, 11.4% from history, and 6.6% from law. Researchers cite Wikipedia on a diversity of topics, ranging from the military-industrial complex to horror films to Bush’s second state of the union speech.  Eight use Wikipedia simply as a source for images (such as an advertisement for Yummy Mummy cereal or a diagram of the architecture of the Internet).  Many employ Wikipedia either as a source for information about contemporary culture or as a reflection of contemporary cultural opinion.  For instance, to illustrate how novels such as The Scarlet Letter and Uncle Tom’s Cabin have been sanctified as “Great American Novels,” Lawrence Buell cites the Wikipedia entry on “Great American Novel” (Buell).

About a third of the articles I looked at discuss the significance of Wikipedia itself.  14 (8.4%) criticize using it in research.  For instance, a reviewer of a biography about Robert E. Lee tsk-tsks:

The only curiosities are several references to Wikipedia for information that could (and should) have been easily obtained elsewhere (battle casualties, for example). Hopefully this does not portend a trend toward normalizing this unreliable source, the very thing Pryor decries in others’ work. (Margolies).

In contrast, 11 (6.6%) cite Wikipedia as a model for participatory culture.  For example:

The rise of the net offers a solution to the major impediment in the growth and complexification of the gift economy, that network of relationships where people come together to pursue public values. Wikipedia is one example. (DiZerega)

A few (1.8%) cite Wikipedia self-consciously, aware of its limitations but asserting its relevance for their particular project:

Citing Wikipedia is always dicey, but it is possible to cite a specific version of an entry. Start with the link here, because cybervandals have deleted the list on at least one occasion. For a reputable “permanent version” of “Alternative press (U.S. political right)” see: http://en.wikipedia.org/w/index.php?title=Alternative_press_%28U.S._political_right%29&oldid=107090129 (Berlet).

Of course, just because more researchers—including some prominent ones—are citing Wikipedia does not mean it’s necessarily a valid source for academic papers.  However, you can begin to see academic norms shifting as more scholars find useful information in Wikipedia and begin to cite it.  As Christine Borgman notes, “Scholarly documents achieve trustworthiness through a social process to assure readers that the document satisfies the quality norms of the field” (Borgman 84).  As a possible sign of academic norms changing in some disciplines, several journals, particularly those focused on contemporary culture, include 3 or more articles that reference Wikipedia: Advertising and Society Review (7 citations), American Quarterly (3 citations), College Literature (3 citations), Computer Music Journal (5 citations), Indiana Journal of Global Legal Studies (3 citations), Leonardo (8 citations), Library Trends (5 citations), Mediterranean Quarterly (3 citations), and Technology and Culture (3 citations).

So can Wikipedia be a reputable scholarly resource?  I typically see four main criticisms of Wikipedia:

1) Research projects shouldn’t rely upon encyclopedias. Even Jimmy Wales, (co?-)founder of Wikipedia, acknowledges “I still would say that an encyclopedia is just not the kind of thing you would reference as a source in an academic paper. Particularly not an encyclopedia that could change instantly and not have a final vetting process” (Young).  But an encyclopedia can be a valid starting point for research.  Indeed, The Craft of Research, a classic guide to research, advises that researchers consult reference works such as encyclopedias to gain general knowledge about a topic and discover related works (80).  Wikipedia covers topics often left out of traditional reference works, such as contemporary culture and technology.  Most if not all of the works I looked at used Wikipedia to offer a particular piece of background information, not as a foundation for their argument.

2) Since Wikipedia is constantly undergoing revisions, it is too unstable to cite; what you read and verified today might be gone tomorrow–or even in an hour.  True, but Wikipedia is developing the ability for a particular version of an entry to be vetted by experts and then frozen, so researchers could cite an authoritative, unchanging version (Young).  As the above citation from Berlet indicates, you can already provide a link to a specific version of an article.
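Such a permanent link is simply the entry's title plus the revision's `oldid` number appended to Wikipedia's index URL. A minimal sketch in Python (the helper function is my own; the revision number is the one Berlet cites):

```python
from urllib.parse import quote

def permanent_wikipedia_url(title: str, oldid: int) -> str:
    """Build a link to one specific, unchanging revision of a Wikipedia entry.

    Wikipedia's index.php accepts an `oldid` parameter identifying a single
    revision, so the cited text cannot change under the reader.
    """
    base = "https://en.wikipedia.org/w/index.php"
    # Wikipedia titles use underscores for spaces; percent-encode the rest.
    return f"{base}?title={quote(title.replace(' ', '_'))}&oldid={oldid}"

print(permanent_wikipedia_url("Alternative press (U.S. political right)", 107090129))
```

Because the link points at one frozen revision, later edits (or vandalism) to the live article cannot alter what the citation refers to.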

3) You can’t trust Wikipedia because anyone—including folks with no expertise, strong biases, or malicious (or silly) intent—can contribute to it anonymously. Yes, but through the back and forth between “passionate amateurs,” experts, and Wikipedia guardians protecting against vandals, good stuff often emerges. As Nicholson Baker, who has himself edited Wikipedia articles on topics such as Brooklyn Heights and the painter Emma Fordyce MacRae, notes in a delightful essay about Wikipedia, “Wikipedia was the point of convergence for the self-taught and the expensively educated. The cranks had to consort with the mainstreamers and hash it all out” (Baker). As Roy Rosenzweig found in a detailed analysis of Wikipedia’s appropriateness for historical research, the quality of the collaboratively produced Wikipedia entries can be uneven: certain topics are covered in greater detail than others, and the writing can have the choppy, flat quality of something composed by committee. But Rosenzweig also concluded that Wikipedia compares favorably with Encarta and Encyclopedia Britannica for accuracy and coverage.

4) Wikipedia entries lack authority because there’s no peer review. Well, depends on how you define “peer review.”  Granted, Wikipedia articles aren’t reviewed by two or three (typically anonymous) experts in the field, so they may lack the scholarly authority of an article published in an academic journal.  However, articles in Wikipedia can be reviewed and corrected by the entire community, including experts, knowledgeable amateurs, and others devoted to Wikipedia’s mission to develop, collect and disseminate educational content (as well as by vandals and fools, I’ll acknowledge).  Wikipedia entries aim to achieve what Wikipedians call “verifiability”; the article about Barack Obama, for instance, has as many footnotes as a law review article–171 at last count (August 31), including several from this week.

Now I’m certainly not saying that Wikipedia is always a good source for an academic work–there is some dreck in it, as in other sources.  Ultimately, I think Wikipedia’s appropriateness as an academic source depends on what is being cited and for what purpose.   Alan Liu offers students a sensible set of guidelines for the appropriate use of Wikipedia, noting that it, like other encyclopedias, can be a good starting point, but that it is “currently an uneven resource” and always in flux.  Instead of condemning Wikipedia outright, professors should help students develop what Henry Jenkins calls “new media literacies.”  By examining the history and discussion pages associated with each article, for instance, students can gain insight into how knowledge is created and how to evaluate a source.  As John Seely Brown and Richard Adler write:

The openness of Wikipedia is instructive in another way: by clicking on tabs that appear on every page, a user can easily review the history of any article as well as contributors’ ongoing discussion of and sometimes fierce debates around its content, which offer useful insights into the practices and standards of the community that is responsible for creating that entry in Wikipedia. (In some cases, Wikipedia articles start with initial contributions by passionate amateurs, followed by contributions from professional scholars/researchers who weigh in on the “final” versions. Here is where the contested part of the material becomes most usefully evident.) In this open environment, both the content and the process by which it is created are equally visible, thereby enabling a new kind of critical reading—almost a new form of literacy—that invites the reader to join in the consideration of what information is reliable and/or important. (Brown & Adler)

OK, maybe Wikipedia can be a legitimate source for student research papers–and furnish a way to teach research skills.  But should it be cited in scholarly publications?  In “A Note on Wikipedia as a Scholarly Source of Record,” part of the preface to Mechanisms, Matt Kirschenbaum offers a compelling explanation of why he cited Wikipedia, particularly when discussing technical documentation:

Information technology is among the most reliable content domains on Wikipedia, given the high interest of such topics to Wikipedia’s readership and the consequent scrutiny they tend to attract. Moreover, the ability to examine page histories on Wikipedia allows a user to recover the editorial record of a particular entry… Attention to these editorial histories can help users exercise sound judgment as to whether or not the information before them at any given moment is controversial, and I have availed myself of that functionality when deciding whether or not to rely on Wikipedia. (Kirschenbaum xvii)

With Wikipedia, as with other sources, scholars should use critical judgment in analyzing its reliability and appropriateness for citation.  If scholars carefully evaluate a Wikipedia article’s accuracy, I don’t think there should be any shame in citing it.

For more information, review the Zotero report detailing all of the works citing Wikipedia, or take a look at a spreadsheet of basic bibliographic information. I’d be happy to share my bibliographic data with anyone who is interested.

Works Cited

Baker, Nicholson. “The Charms of Wikipedia.” The New York Review of Books 55.4 (2008). 30 Aug 2008 <http://www.nybooks.com/articles/21131>.

Berlet, Chip. “The Write Stuff: U.S. Serial Print Culture from Conservatives out to Neonazis.” Library Trends 56.3 (2008): 570-600. 24 Aug 2008 <http://muse.jhu.edu/journals/library_trends/v056/56.3berlet.html>.

Booth, Wayne C., Gregory G. Colomb, and Joseph M. Williams. The Craft of Research. Chicago: U of Chicago P, 2003.

Borgman, Christine L. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass.: MIT Press, 2007.

Brown, John Seely, and Richard P. Adler. “Minds on Fire: Open Education, the Long Tail, and Learning 2.0.” EDUCAUSE Review 43.1 (2008): 16-32. 29 Aug 2008 <http://connect.educause.edu/Library/EDUCAUSE+Review/MindsonFireOpenEducationt/45823?time=1220007552>.

Buell, Lawrence. “The Unkillable Dream of the Great American Novel: Moby-Dick as Test Case.” American Literary History 20.1 (2008): 132-155. 24 Aug 2008 <http://muse.jhu.edu/journals/american_literary_history/v020/20.1buell.pdf>.

Dee, Jonathan. “All the News That’s Fit to Print Out.” The New York Times 1 Jul 2007. 30 Aug 2008 <http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html>.

DiZerega, Gus. “Civil Society, Philanthropy, and Institutions of Care.” The Good Society 15.1 (2006): 43-50. 24 Aug 2008 <http://muse.jhu.edu/journals/good_society/v015/15.1diZerega.html>.

Jenkins, Henry. “What Wikipedia Can Teach Us About the New Media Literacies (Part One).” Confessions of an Aca/Fan 26 Jun 2007. 30 Aug 2008 <http://www.henryjenkins.org/2007/06/what_wikipedia_can_teach_us_ab.html>.

Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass.: MIT Press, 2008.

Liu, Alan. “Student Wikipedia Use Policy.” 1 Apr 2007. 30 Aug 2008 <http://www.english.ucsb.edu/faculty/ayliu/courses/wikipedia-policy.html>.

Margolies, Daniel S. “Robert E. Lee: Heroic, But Not the Polio Vaccine.” Reviews in American History 35.3 (2007): 385-392. 25 Aug 2008 <http://muse.jhu.edu/journals/reviews_in_american_history/v035/35.3margolies.html>.

Rosenzweig, Roy. “Can History Be Open Source? Wikipedia and the Future of the Past.” The Journal of American History 93.1 (2006): 117-146. <http://chnm.gmu.edu/resources/essays/d/42>.

Young, Jeffrey. “Wikipedia’s Co-Founder Wants to Make It More Useful to Academe.” Chronicle of Higher Education 13 Jun 2008. 28 Aug 2008 <http://chronicle.com/free/v54/i40/40a01801.htm?utm_source=at&utm_medium=en>.

Doing Digital Scholarship: Presentation at Digital Humanities 2008

Note: Here is roughly what I said during my presentation at Digital Humanities 2008 in Oulu, Finland (or at least meant to say—I was so sleep-deprived thanks to the unceasing sunshine that I’m not sure what I actually did say). My session, which explored the meaning and significance of “digital humanities,” also featured rich, engaging presentations by Edward Vanhoutte on the history of humanities computing and John Walsh on comparing alchemy and digital humanities. My presentation reports on my project to remix my dissertation as a work of digital scholarship and synthesizes many of my earlier blog posts to offer a sort of Reader’s Digest condensed version of my blog for the past 7 months. By the way, sorry that I’ve been away from the blog for so long. I’ve spent the last month and a half researching and writing a 100-page report on archival management software, reviewing essays, performing various other professional duties, and going on both a family vacation to San Antonio and a grown-up vacation to Portland, OR (vegan meals followed up by Cap’n Crunch donuts; it took me a week to recover from the donut hangover). In the meantime, lots of ideas have been brewing, so expect many new blog entries soon.

***

When I began working on my dissertation in the mid 1990s, I used a computer primarily to do word processing—and goof off with Tetris. Although I used digital collections such as Early American Fiction and Making of America for my dissertation project on bachelorhood in 19th century American literature, I did much of my research the old-fashioned way: flipping through the yellowing pages of 19th century periodicals on the hunt for references to bachelors, taking notes using my lucky leaky fountain pen. I relied on books for my research and, in the end, produced a book.

At the same time that I was dissertating, I was also becoming enthralled by the potential of digital scholarship through my work at the University of Virginia’s (late lamented) Electronic Text Center.  I produced an electronic edition of the first section from Donald Grant Mitchell’s bestseller Reveries of a Bachelor that allowed readers to toggle between variants.   I even convinced my department to count Perl as a second language, citing the Matt Kirschenbaum precedent (“come on, you let Matt do it, and look how well that turned out”) and the value of computer languages to my profession as a budding digital humanist.  However, I decided not to create an electronic version of my dissertation (beyond a carefully backed-up Word file) or to use computational methods in doing my research, since I wanted to finish the darn thing before I reached retirement age.

Last year, five years after I received my PhD and seven years after I had become the director of Rice University’s Digital Media Center, I was pondering the potential of digital humanities, especially given mass digitization projects and the emergence of tools such as TAPOR and Zotero.  I wondered: What is digital scholarship, anyway?  What does it take to produce digital scholarship? What kind of digital resources and tools are available to support it? To what extent do these resources and tools enable us to do research more productively and creatively? What new questions do these tools and resources enable us to ask? What’s challenging about producing digital scholarship? What happens when scholars share research openly through blogs, institutional repositories, & other means?

I decided to investigate these questions by remixing my 2002 dissertation as a work of digital scholarship.  Now I’ll acknowledge that my study is not exactly scientific—there is a rather subjective sample of one.  However, I figured, somewhat pragmatically, that the best way for me to understand what digital scholars face was to do the work myself.  I set some loose guidelines: I would rely on digital collections as much as possible and would experiment with tools for analyzing, annotating, organizing, comparing and visualizing digital information.  I would also explore different ways of representing my ideas, such as hypertextual essays and videos.  Embracing social scholarship, I would do my best to share my work openly and make my research process transparent.  So that the project would be fun and evolve organically, I decided to follow my curiosity wherever it led me, imagining that I would end up with a series of essays on bachelorhood in 19th century American culture and, as sort of an exoskeleton, meta-reflections on the process of producing digital scholarship.

My first challenge was defining digital scholarship.  The ACLS Commission on Cyberinfrastructure’s report points to five manifestations of digital scholarship: collection building, tools to support collection building, tools to support analysis, using tools and collections to produce “new intellectual products,” and authoring tools.   Some might argue we shouldn’t really count tool and collection building as scholarship.  I’ll engage with this question in more detail in a future post, but for now let me say that most consider critical editions, bibliographies, dictionaries and collations, arguably the collections and tools of the pre-digital era, to be scholarship.  In many cases, building academic tools and collections requires significant research and expertise and results in the creation of knowledge—so, scholarship.   Still, my primary focus is on the fourth aspect, leveraging digital resources and tools to produce new arguments.  I’m realizing along the way, though, that I may need to build my own personal collections and develop my own analytical tools to do the kind of scholarship I want to do.

In a recent presentation at CNI, Tara McPherson, the editor of Vectors, offered her own “Typology of Digital Humanities”:
•    The Computing Humanities: focused on building tools, infrastructure, standards and collections, e.g. The Blake Archive
•    The Blogging Humanities: networked, peer-to-peer, e.g. Crooked Timber
•    The Multimodal Humanities: “bring together databases, scholarly tools, networked writing, and peer-to-peer commentary while also leveraging the potential of the visual and aural media that so dominate contemporary life,” e.g. Vectors

Mashing up these two frameworks, my own typology would look something like this:

•    Tools, e.g. TAPOR, Zotero
•    Collections, e.g. The Blake Archive
•    Theories, e.g. McGann’s Radiant Textuality
•    Interpretations and arguments that leverage digital collections and tools, e.g. Ayers and Thomas’ The Difference Slavery Made
•    Networked Scholarship: a term that I borrow from the Institute for the Future of the Book’s Bob Stein and that I prefer to “blogging humanities,” since it encompasses many modes of communication, such as wikis, social bookmarking, institutional repositories, etc. Examples include Savage Minds (a group blog in anthropology), etc.
•    Multimodal scholarship: e.g. scholarly hypertexts and videos, e.g. what you might find in Vectors
•    Digital cultural studies, e.g. game studies, Lev Manovich’s work, etc (this category overlaps with theories)

Initially I assumed that tools, theories and collections would feed into arguments that would be expressed as networked and/or multimodal scholarship and be informed by digital cultural studies.  But I think that describing digital scholarship as a sort of assembly line in which scholars use tools, collections and theories to produce arguments oversimplifies the process.  My initial diagram of digital scholarship pictured single-headed arrows linking different approaches to digital scholarship; my revised diagram looks more like spaghetti, with arrows going all over the place.  Theories inform collection building; the process of blogging helps to shape an argument; how a scholar wants to communicate an idea influences what tools are selected and how they are used.

After coming up with a preliminary definition of what I wanted to do, I needed to figure out how to structure my work.  I thought of John Unsworth’s notion of scholarly primitives, a compelling description of core research practices.  Depending on how you count them, Unsworth identifies 7 scholarly primitives:
•    Discovering
•    Annotating
•    Comparing
•    Referring
•    Sampling
•    Illustrating
•    Representing

As useful as this list is in crystallizing what scholars do, I think the list is missing at least one more crucial scholarly primitive, perhaps the fundamental one: collaboration. Although humanists are stereotyped as solitary scholars isolated in the library, they often work together, whether through co-editing journals or books, sharing citations, or reviewing one another’s work.  In the digital humanities, of course, developing tools, standards, and collections demands collaboration among scholars, librarians, programmers, etc.  I would also define networked scholarship—blogging, contributing to wikis, etc—as collaborative, since it requires openly sharing ideas and supports conversation. It’s only appropriate for me to note that this idea was worked out collaboratively, with colleagues at THAT Camp.

I want to make my research process as visible as possible, not only for idealistic reasons, but also because my work only gets better the more feedback I receive.  So I started up a blog—actually, several of them. At the somewhat grandly-named Digital Scholarship in the Humanities, I reflect on trends in the digital humanities and on broader lessons learned in the process of doing my research project.  In “Lisa Spiro’s Research Notes,”  I typically address stuff that seems too specialized, half-baked, or even raw for me to put on my main blog, such as my navel gazing on where to take my project next, or my experiments with Open Wound, a language re-mixing tool.   At my PageFlakes research portal, I provide a single portal to the various parts of my research project, offering RSS feeds for both of my blogs as well as for a Google News search of the term “digital humanities,” my delicious bookmarks for “digital scholarship,” links to my various digital humanities projects, and more.

I’ll admit that when I started my experiments with social scholarship I worried that no one would care, or that I would embarrass myself by writing something really stupid, but so far I’ve loved the experience.  Through comments and emails from readers, I’m able to see other perspectives and improve my own thinking.  I’ve heard from biologists and anthropologists as well as literary scholars and historians, and I’ve communicated with researchers from several countries.  As a result, I feel more engaged in the research community and more motivated to keep working.   Although I know blogging hasn’t caught on in every corner of academia, I think it has been good for my career as a digital humanist.  I am more visible and thus have more opportunities to participate in the community, such as by reviewing book proposals, articles, and grant applications.

I don’t have space to discuss the relevance of each scholarly primitive to my project, but I did want to mention a few of them: discovering, comparing, and representing.

Discovering

In order to use text analysis and other tools, I needed my research materials to be in an electronic format. In the age of mass digitization projects such as Google Books and the Open Content Alliance, I wondered how many of my 296 original research sources are digitized and available in full text. So I diligently searched Google Books and several other sources to find out. I looked at five categories: archival resources, primary and secondary books, and primary and secondary journals. I found that with the exception of archival materials, over 90% of the materials I cited in my bibliography are in a digital format. However, only about 83% of primary resources and 37% of the secondary materials are available as full text. If you want to use text analysis tools on 19th century American novels or 20th century articles from major humanities journals, you’re in luck, but the other stuff is trickier because of copyright constraints. (I’ll throw in another scholarly primitive, annotation, and say that I use Zotero to manage and annotate my research collections, which has made me much more efficient and allowed me to see patterns in my research collections.)

Of course, scholars need to be able to trust the authority of electronic resources.  To evaluate quality, I focused on four collections that have a lot of content in my field, 19th century American literature: Google Books, Open Content Alliance, Early American Fiction (a commercial database developed by UVA’s Electronic Text Center), and Making of America.  I found that there were some scanning errors with Google Books, but not as many as I expected. I wished that Google Books provided full text rather than PDF files of its public domain content, as do Open Content Alliance and Making of America (and EAF, if you just download the HTML).  I had to convert Google’s PDF files to Adobe Tagged Text XML and got disappointing results.  The OCR quality for Open Content Alliance was better, but words were not joined across line breaks, reducing accuracy.  With multi-volume works, neither Open Content Alliance nor Google Books provided very good metadata.  Still, I’m enough of a pragmatist to think that having access to this kind of data will enable us to conduct research across a much wider range of materials and use sophisticated tools to discern patterns – we just need to be aware of the limitations.

Comparing
To evaluate the power of text analysis tools for my project, I did some experiments using TAPOR tools, including a comparison of two of my key bachelor texts: Mitchell’s Reveries of a Bachelor, a series of a bachelor’s sentimental dreams (sometimes nightmares) about what it would be like to be married, and Melville’s Pierre, which mixes together elements of sentimental fiction, Gothic literature, and spiritualist tracts to produce a bitter satire. I wondered if there was a family resemblance between these texts. First I used the Wordle word cloud generator to reveal the most frequently appearing words. I noted some significant overlap, including words associated with family such as mother and father, those linked with the body such as hand and eye, and those associated with temporality, such as morning, night, and time. To develop a more precise understanding of how frequently terms appeared in the two texts and their relation to each other, I used TAPOR’s Comparator tool. This tool also revealed words unique to each work, such as “flirt” and “sensibility” in the case of Reveries, “ambiguities” and “miserable” in the case of Pierre. Finally, I used TAPOR’s concordance tool to view key terms in context. I found, for instance, that in Mitchell “mother” is often associated with hands or heart, while in Melville it appears with terms indicating anxiety or deceit. By abstracting out frequently occurring and unique words, I can see how Melville, in a sense, remixes elements of sentimental fiction, putting terms in a darker context. The text analysis tools provide a powerful stimulus to interpretation.
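The kind of frequency comparison described above can be approximated in a few lines of code. A rough sketch (the two sample strings are invented stand-ins for the full texts of Reveries and Pierre):

```python
from collections import Counter
import re

def word_freqs(text: str) -> Counter:
    """Count lowercase word tokens, the raw material for a word cloud."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Tiny invented stand-ins for the two novels.
reveries = "her mother took my hand and my heart warmed at the hearth"
pierre = "his mother spoke with ambiguities and a miserable deceitful hand"

f1, f2 = word_freqs(reveries), word_freqs(pierre)

shared = set(f1) & set(f2)            # words the two texts have in common
unique_to_pierre = set(f2) - set(f1)  # words appearing only in the second text

print(sorted(shared))
print(sorted(unique_to_pierre))
```

Real comparator tools normalize for text length and flag statistically distinctive words, but set operations over frequency counts capture the basic idea of finding shared and unique vocabulary.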

Representing
Not only am I using the computer to analyze information, but also to represent my ideas in a more media-rich, interactive way than the typical print article. I plan to experiment with Sophie as a tool for authoring multimodal scholarship, and I’m also experimenting with video as a means for representing visual information. Right now I’m reworking an article on the publication history of Reveries of a Bachelor as a video so that I can show significant visual information such as bindings, illustrations, and advertisements. I’ve condensed a 20+ page article into a 7-minute narrative, which for a prolix person like me is rough. I also have been challenged to think visually and cinematically, considering how the movement of the camera and the style of transitions shape the argument. Getting the right imagery—high quality, copyright free—has been tricky as well. I’m not sure how to bring scholarly practices such as citation into videos. Even though my draft video is, frankly, a little amateurish, putting it together has been lots of fun, and I see real potential for video to allow us to go beyond text and bring the human voice, music, movement and rich imagery into scholarly communication.

On Tools
In the course of my experiments in digital scholarship, I often found myself searching for the right tool to perform a certain task.  Likewise, in my conversations with researchers who aren’t necessarily interested in doing digital scholarship, just in doing their research better, I learned that they weren’t aware of digital tools and didn’t know where to find out about them.  To make it easier for researchers to discover relevant tools, I teamed up with 5 other librarians to launch the Digital Research Tools, or DiRT, wiki at the end of May.   DiRT provides a directory of digital research tools, primarily free but also commercial, categorized by their functions, such as “manage citations.”  We are also writing reviews of tools geared toward researchers and trying to provide examples of how these tools are used by the research community.  Indeed, DiRT focuses on the needs of the community; the wiki evolves thanks to its contributors.   Currently 14 people in fields such as anthropology, communications, and educational technology have signed on to be contributors.  Everything is under a Creative Commons attribution license.  We would love to see spin-offs, such as DiRT in languages besides English; DiRT for developers; and Old DiRT (dust?), the hall of obsolete but still compelling tools.  My experiences with DiRT have demonstrated again the beauty of collaboration and sharing.  Both Dan Cohen of CHNM & Alan Liu of UC Santa Barbara generously offered to let us grab content from their own tools directories.  Busy folks have freely given their time to add tools to DiRT.  Through my work on DiRT, I’ve learned about tools outside of my field, such as qualitative data analysis software.

So I’ll end with an invitation: Please contribute to DiRT.  You can sign up to be an editor or reviewer, recommend tools to be added, or provide feedback via our survey.  Through efforts like DiRT, we hope to enable new digital scholarship, raise the profile of inventive digital tools, and build community.