Category Archives: digital scholarship

Digital Humanities in 2008, II: Scholarly Communication & Open Access

Open access, just like dark chocolate and blueberries, is good and good for you, enabling information to be mined and reused, fostering the exchange of ideas, and ensuring public access to research that taxpayers often helped to fund. Moreover, as Dan Cohen contends, scholars benefit from open access to their work, since their own visibility increases: “In a world where we have instantaneous access to billions of documents online, why would you want the precious article or book you spent so much time on to exist only on paper, or behind a pay wall? This is a sure path to invisibility in the digital age.” Thus some scholars are embracing social scholarship, which promotes openness, collaboration, and the sharing of research. This year saw some positive developments in open access and scholarly communication, such as the implementation of the NIH mandate, the decision by Harvard’s Faculty of Arts and Sciences to go open access (followed by Harvard Law), and the launch of the Open Humanities Press. But there were also some worrisome developments (the Conyers bill’s attempt to rescind the NIH mandate, EndNote’s lawsuit against Zotero) and some confusing ones (the Google Books settlement). In the second part of my summary of the year in digital humanities, I’ll look broadly at the scholarly communication landscape, discussing open access to educational materials, new publication models, the Google Books settlement, and cultural obstacles to digital publication.

Open Access Grows–and Faces Resistance

[Image: “Ask Me About Open Access” by mollyali]

In December of 2007, the NIH Public Access Policy was signed into law, mandating that any research funded by the NIH would be deposited in PubMed Central within a year of its publication. Since the mandate was implemented, almost 3,000 new biomedical manuscripts have been deposited into PubMed Central each month. Now John Conyers has put forward a bill that would rescind the NIH mandate and prohibit other federal agencies from implementing similar policies. This bill would deny the public access to research that it funded and choke innovation and scientific discovery. According to Elias Zerhouni, former director of the NIH, there is no evidence that the mandate harms publishers; rather, it maximizes the public’s “return on its investment” in funding scientific research. If you support public access to research, contact your representative and express your opposition to this bill before February 28. The Alliance for Taxpayer Access offers a useful summary of key issues as well as a letter template at http://www.taxpayeraccess.org/action/HR801-09-0211.html.

Open Humanities?

Why have the humanities been lagging behind the sciences in adopting open access? Gary Hall points to several ways in which the sciences differ from the humanities, including science’s greater funding for “author pays” open access and emphasis on disseminating information rapidly, as well as the humanities’ “negative perception of the digital medium.” But Hall is challenging that perception by helping to launch the Open Humanities Press (OHP) and publishing “Digitize This Book.” Billing itself as “an international open access publishing collective in critical and cultural theory,” OHP selects journals for inclusion in the collective based upon their adherence to publication standards, open access standards, design standards, technical standards, and editorial best practices. Prominent scholars such as Jonathan Culler, Stephen Greenblatt, and Jerome McGann have signed on as board members of the Open Humanities Press, giving it more prestige and academic credibility. In a talk at UC Irvine last spring, OHP co-founder Sigi Jöttkandt refuted the assumption that open access means “a sort of open free-for-all of publishing” rather than high-quality, peer-reviewed scholarship. Jöttkandt argued that open access should be fundamental to the digital humanities: “as long as the primary and secondary materials that these tools operate on remain locked away in walled gardens, the Digital Humanities will fail to fulfill the real promise of innovation contained in the digital medium.” It’s worth noting that many digital humanities resources are available as open access, including Digital Humanities Quarterly, the Rossetti Archive, and projects developed by CHNM; many others may not be explicitly open access, but they make information available for free.

In “ANTHROPOLOGY OF/IN CIRCULATION: The Future of Open Access and Scholarly Societies,” Christopher Kelty, Michael M. J. Fischer, Alex “Rex” Golub, Jason Baird Jackson, Kimberly Christen, and Michael F. Brown engage in a wide-ranging discussion of open access in anthropology, prompted in part by the American Anthropological Association’s decision to move its publishing activities to Wiley Blackwell.  This rich conversation explores different models for open access, the role of scholarly societies in publishing, building community around research problems, reusing and remixing scholarly content, the economics of publishing, the connection between scholarly reputation and readers’ access to publications, how to make content accessible to source communities, and much more.   As Kelty argues, “The future of innovative scholarship is not only in the AAA (American Anthropological Association) and its journals, but in the structures we build that allow our research to circulate and interact in ways it never could before.”  Kelty (who, alas, was lured away from Rice by UCLA) is exploring how to make scholarship more open and interactive.  You can buy a print copy of Two Bits, his new book on the free software movement published by Duke UP; read (for free) a PDF version of the book; comment on the CommentPress version; or download and remix the HTML.  Reporting on Two Bits at Six Months, Kelty observed, “Duke is making as little or as much money on the book as they do on others of its ilk, and yet I am getting much more from it being open access than I might otherwise.”  The project has made Kelty more visible as a scholar, leading to more media attention, invitations to give lectures and submit papers, etc.

New Models of Scholarly Communication, and Continued Resistance

To what extent are new publishing models emerging as the Internet enables the rapid, inexpensive distribution of information, the incorporation of multimedia into publications, and networked collaboration? To find out, the ARL/Ithaka New Model Publications Study conducted an “organized scan” of emerging scholarly publications such as blogs, ejournals, and research hubs. ARL recruited 301 volunteer librarians from 46 colleges and universities to interview faculty about new model publications that they used. (I participated in a small way, interviewing one faculty member at Rice.) According to the report, examples of new model publications exist in all disciplines, although scientists are more likely to use pre-print repositories, while humanities scholars participate more frequently in discussion forums. The study identifies eight principal types of scholarly resources:

  • E-only journals
  • Reviews
  • Preprints and working papers
  • Encyclopedias, dictionaries, and annotated content
  • Data
  • Blogs
  • Discussion forums
  • Professional and scholarly hubs

These categories provide a sort of abbreviated field manual for identifying different types of new model publications. I might add a few more categories, such as collaborative commentary or peer-to-peer review (exemplified by projects that use CommentPress); scholarly wikis like OpenWetWare that enable open sharing of scholarly information; and research portals like NINES (which perhaps would be considered a “hub”). The report offers fascinating examples of innovative publications, such as ejournals that publish articles as they are ready rather than on a set schedule and a video journal that documents experimental methods in biology. Since only a few examples of new model publications could fit into this brief report, ARL is making available brief descriptions of 206 resources that it considered to be “original and scholarly works” via a publicly accessible database.

My favorite example of a new model publication: eBird, a project initiated by the Cornell Lab of Ornithology and the Audubon Society that enlists amateur and professional bird watchers to collect bird observation data. Scientists then use this data to understand the “distribution and abundance” of birds. Initially eBird ran into difficulty getting birders to participate, so its developers built tools that allowed birders to get credit and feel part of a community, to “manage and maintain their lists online, to compare their observations with others’ observations.” I love the motto and mission of eBird: “Notice nature.” I wonder if a similar collaborative research site could be set up for, say, the performing arts (ePerformances.org?), where audience members would document arts and humanities in the wild: plays, ballets, performance art, poetry readings, etc.

The ARL/Ithaka report also highlights some of the challenges faced by these new model publications, such as the conservatism of academic culture, the difficulty of getting scholars to participate in online forums, and the need to find ways to fund and sustain publications. In Interim Report: Assessing the Future Landscape of Scholarly Communication, Diane Harley and her colleagues at the University of California, Berkeley delve into some of these challenges. Harley finds that although some scholars are interested in publishing their research as interactive multimedia, “(1) new forms must be perceived as having undergone rigorous peer review, (2) few untenured scholars are presenting such publications as part of their tenure cases, and (3) the mechanisms for evaluating new genres (e.g., nonlinear narratives and multimedia publications) may be prohibitive for reviewers in terms of time and inclination.” Humanities researchers are typically less concerned with the speed of publication than scientists and social scientists, but they do complain about journals’ unwillingness to include many high-quality images and would like to link from their arguments to supporting primary source material. However, faculty are not aware of any easy-to-use tools or support that would enable them to author multimedia works and are therefore less likely to experiment with new forms. Scholars in all fields included in the study do share their research with other scholars, typically through emails and other forms of personal communication, but many regard blogs as “a waste of time because they are not peer reviewed.” Similarly, Ithaka’s 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education (published in 2008) found that “faculty decisions about where and how to publish the results of their research are principally based on the visibility within their field of a particular option,” not open access.

But academic conservatism shouldn’t keep us from imagining and experimenting with alternative approaches to scholarly publishing.  Kathleen Fitzpatrick’s “book-like-object” (blob) proposal, Planned Obsolescence: Publishing, Technology, and the Future of the Academy, offers a bold and compelling vision of the future of academic publishing.  Fitzpatrick calls for academia to break out of its zombie-like adherence to (un)dead forms and proposes “peer-to-peer” review (as in Wikipedia), focusing on process rather than product (as in blogs), and engaging in networked conversation (as in CommentPress). (If references to zombies and blobs make you think Fitzpatrick’s stuff is fun to read as well as insightful, you’d be right.)

EndNote Sues Zotero

Normally I have trouble attracting faculty and grad students to workshops exploring research tools and scholarly communication issues, but they’ve been flocking to my workshops on Zotero, which they recognize as a tool that will help them work more productively. Apparently Thomson Reuters, the maker of EndNote, has noticed the competitive threat posed by Zotero: it has sued George Mason University, which produces Zotero, alleging that programmers reverse engineered EndNote so that they could convert proprietary EndNote .ens files into open Zotero .csl files. Commentators more knowledgeable about the technical and legal details than I am have found Thomson’s claims to be bogus. My cynical read on this lawsuit is that EndNote saw a threat from a popular, powerful open source application and pursued legal action rather than competing by producing a better product. As Hugh Cayless suggests, “This is an act of sheer desperation on the part of Thomson Reuters” and shows that Zotero has “scared your competitors enough to make them go running to Daddy, thus unequivocally validating your business model.”

The lawsuit seems to bear out Yochai Benkler’s description of proprietary attempts to control information:

“In law, we see a continual tightening of the control that the owners of exclusive rights are given.  Copyrights are longer, apply to more uses, and are interpreted as reaching into every corner of valuable use. Trademarks are stronger and more aggressive. Patents have expanded to new domains and are given greater leeway. All these changes are skewing the institutional ecology in favor of business models and production practices that are based on exclusive proprietary claims; they are lobbied for by firms that collect large rents if these laws are expanded, followed, and enforced. Social trends in the past few years, however, are pushing in the opposite direction.”

Unfortunately, the lawsuit seems to be having a chilling effect that ultimately will, I think, hurt EndNote.  For instance, the developers of BibApp, “a publication-list manager and repository-populator,” decided not to import citation lists produced by EndNote, since “doing anything with their homegrown formats has been proven hazardous.” This lawsuit raises the crucial issue of whether researchers can move their data from one system to another.  Why would I want to choose a product that locks me in?  As Nature wrote in an editorial quoted by CHNM in its response to the lawsuit, “The virtues of interoperability and easy data-sharing among researchers are worth restating.”

Google Books Settlement

Google Books by Jon Wiley

In the fall, Google settled with the Authors Guild and the Association of American Publishers over Google Book Search, allowing academic libraries to subscribe to a full-text collection of millions of out-of-print but (possibly) in-copyright books.  (Google estimates that about 70% of published books fall into this category).  Individuals can also purchase access to books, and libraries will be given a single terminal that will provide free access to the collection.  On a pragmatic (and gluttonous) level, I think, Oh boy, this settlement will give me access to so much stuff.   But, like others, I am concerned about one company owning all of this information, see the Book Rights Registry as potentially anti-competitive, and wish that a Google victory in court had verified fair use principles (even if such a decision probably would have kept us in snippet view or limited preview for in-copyright materials).  Libraries have some legitimate concerns about access, privacy, intellectual freedom, equitable treatment, and terms of use.  Indeed, Harvard pulled out of the project over concerns about cost and accessibility.  As Robert Darnton, director of the Harvard Library and a prominent scholar of book history, wrote in the NY Review of Books, “To digitize collections and sell the product in ways that fail to guarantee wide access… would turn the Internet into an instrument for privatizing knowledge that belongs in the public sphere.” Although the settlement makes a provision for “non-consumptive research” (using the books without reading them) that seems to allow for text mining and other computational research, I worry that digital humanists and other scholars won’t have access to the data they need.  What if Google goes under, or goes evil? 
But the establishment of the Hathi Trust by several of Google Books’ academic library partners (and others) makes me feel a little better about access and preservation issues, and I noted that Hathi Trust will provide a corpus of 50,000 documents for the NEH’s Digging into the Data Challenge. And as I argued in an earlier series of blog posts, I certainly do see how Google Books can transform research by providing access to so much information.

Around the same time (same day?) that the Google Books settlement was released, the Open Content Alliance (OCA) reached an important milestone, providing access to over a million books. As its name suggests, the OCA makes scanned books openly available for reading, download, and analysis, and from my observations the quality of its digitization is better than Google’s. Although the OCA’s collection is smaller and focuses on public domain materials, it offers a vital alternative to Google Books. (Rice is a member of the Open Content Alliance.)

Next up in the series on digital humanities in 2008: my attempt to summarize recent developments in research.

Work Product Blog

Matt Wilkens, post-doctoral fellow at Rice’s Humanities Research Center, recently launched Work Product, a blog that chronicles his research in digital humanities, contemporary fiction, and literary theory.  Matt details how he is working through the challenges he faces as he tries to analyze the relationship between allegory and revolution by using text mining, such as:
  • Where and how to get large literary corpora. Matt looks at how much content is available through Project Gutenberg, the Open Content Alliance, Google Books, and Hathi Trust, and how difficult it is to access.
  • Evaluating part-of-speech taggers, with information about speed and accuracy.

I think that other researchers working on text mining projects will benefit from Matt’s careful documentation of his process.

By the way, Matt’s blog can be thought of as part of the movement called “open notebook science,” which Jean-Claude Bradley defines as “a laboratory notebook… that is freely available and indexed on common search engines.” Other humanities and social sciences blogs that are likewise ongoing explorations of particular research projects include Wesley Raabe’s blog, Another Anthro Blog, and Erkan’s Field Diary. (Please alert me to others!)

Is Wikipedia Becoming a Respectable Academic Source?

Last year a colleague in the English department described a conversation in which a friend revealed a dirty little secret: “I use Wikipedia all the time for my research—but I certainly wouldn’t cite it.”  This got me wondering: How many humanities and social sciences researchers are discussing, using, and citing Wikipedia?  To find out, I searched Project Muse and JSTOR, leading electronic journal collections for the humanities and social sciences, for the term “wikipedia,” which picked up both references to Wikipedia and citations of the wikipedia URL.  I retrieved 167 results from between 2002 and 2008, all but 8 of which came from Project Muse.  (JSTOR covers more journals and a wider range of disciplines but does not provide access to issues published in the last 3-5 years.)  In contrast, Project Muse lists 149 results in a search for “Encyclopedia Britannica” between 2002 and 2008, and JSTOR lists 3.  I found that citations of Wikipedia have been increasing steadily: from 1 in 2002 (not surprisingly, by Yochai Benkler) to 17 in 2005 to 56 in 2007. So far Wikipedia has been cited 52 times in 2008, and it’s only August.

Along with the increasing number of citations, another indicator that Wikipedia may be gaining respectability is its citation by well-known scholars.  Indeed, several scholars both cite Wikipedia and are themselves subjects of Wikipedia entries, including Gayatri Spivak, Yochai Benkler, Hal Varian, Henry Jenkins, Jerome McGann, Lawrence Buell, and Donna Haraway.

111 of the sources (66.5%) are what I call “straight citations” (citations of Wikipedia without commentary about it), while 56 (33.5%) comment on Wikipedia as a source, either positively or negatively. 14.5% of the total citations come from literary studies, 14% from cultural studies, 11.4% from history, and 6.6% from law. Researchers cite Wikipedia on a diversity of topics, ranging from the military-industrial complex to horror films to Bush’s second State of the Union speech. 8 use Wikipedia simply as a source for images (such as an advertisement for Yummy Mummy cereal or a diagram of the architecture of the Internet). Many employ Wikipedia either as a source for information about contemporary culture or as a reflection of contemporary cultural opinion. For instance, to illustrate how novels such as The Scarlet Letter and Uncle Tom’s Cabin have been sanctified as “Great American Novels,” Lawrence Buell cites the Wikipedia entry on “Great American Novel” (Buell).
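
For anyone who wants to check the shares against the raw counts, here is a minimal Python sketch of the arithmetic. It assumes only the totals reported in this post (167 results, of which 111 are straight citations and 56 comment on Wikipedia); the names `TOTAL_RESULTS` and `share` are mine and purely illustrative.

```python
# Counts as reported in the post; the names below are illustrative only.
TOTAL_RESULTS = 167


def share(count: int, total: int = TOTAL_RESULTS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


counts = {"straight citations": 111, "commentary on Wikipedia": 56}

for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_RESULTS} = {share(count)}%")
```

The same helper can be applied to any of the smaller tallies in this post, since they all use the 167 results as the denominator.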

About a third of the articles I looked at discuss the significance of Wikipedia itself. 14 (8.4%) criticize using it in research. For instance, a reviewer of a biography about Robert E. Lee tsk-tsks:

The only curiosities are several references to Wikipedia for information that could (and should) have been easily obtained elsewhere (battle casualties, for example). Hopefully this does not portend a trend toward normalizing this unreliable source, the very thing Pryor decries in others’ work. (Margolies)

In contrast, 11 (6.6%) cite Wikipedia as a model for participatory culture.  For example:

The rise of the net offers a solution to the major impediment in the growth and complexification of the gift economy, that network of relationships where people come together to pursue public values. Wikipedia is one example. (DiZerega)

A few (1.8%) cite Wikipedia self-consciously, aware of its limitations but asserting its relevance for their particular project:

Citing Wikipedia is always dicey, but it is possible to cite a specific version of an entry. Start with the link here, because cybervandals have deleted the list on at least one occasion. For a reputable “permanent version” of “Alternative press (U.S. political right)” see: http://en.wikipedia.org/w/index.php?title=Alternative_press_%28U.S._political_right%29&oldid=107090129 (Berlet).
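
Berlet’s permanent link follows a simple pattern: the article title plus an `oldid` revision number, both passed to Wikipedia’s `index.php`. Here is a minimal Python sketch of building such a link. The function name `wikipedia_permalink` is hypothetical, and I have assumed `https` where the link above uses `http`; the revision number itself has to be looked up on the article’s history page.

```python
from urllib.parse import urlencode


def wikipedia_permalink(title: str, oldid: int, lang: str = "en") -> str:
    """Build a link to one fixed revision of a Wikipedia article.

    `oldid` is the revision number shown on the article's history page.
    The function name and the https scheme are assumptions for illustration.
    """
    # Wikipedia titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"https://{lang}.wikipedia.org/w/index.php?{query}"


# The entry and revision number cited by Berlet.
print(wikipedia_permalink("Alternative press (U.S. political right)", 107090129))
```

Because the revision number pins the link to one version of the entry, later edits (or vandalism) do not change what a reader of the citation sees.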

Of course, just because more researchers—including some prominent ones—are citing Wikipedia does not mean it’s necessarily a valid source for academic papers.  However, you can begin to see academic norms shifting as more scholars find useful information in Wikipedia and begin to cite it.  As Christine Borgman notes, “Scholarly documents achieve trustworthiness through a social process to assure readers that the document satisfies the quality norms of the field” (Borgman 84).  As a possible sign of academic norms changing in some disciplines, several journals, particularly those focused on contemporary culture, include 3 or more articles that reference Wikipedia: Advertising and Society Review (7 citations), American Quarterly (3 citations), College Literature (3 citations), Computer Music Journal (5 citations), Indiana Journal of Global Legal Studies (3 citations), Leonardo (8 citations), Library Trends (5 citations), Mediterranean Quarterly (3 citations), and Technology and Culture (3 citations).

So can Wikipedia be a reputable scholarly resource?  I typically see four main criticisms of Wikipedia:

1) Research projects shouldn’t rely upon encyclopedias. Even Jimmy Wales, (co?-)founder of Wikipedia, acknowledges “I still would say that an encyclopedia is just not the kind of thing you would reference as a source in an academic paper. Particularly not an encyclopedia that could change instantly and not have a final vetting process” (Young).  But an encyclopedia can be a valid starting point for research.  Indeed, The Craft of Research, a classic guide to research, advises that researchers consult reference works such as encyclopedias to gain general knowledge about a topic and discover related works (80).  Wikipedia covers topics often left out of traditional reference works, such as contemporary culture and technology.  Most if not all of the works I looked at used Wikipedia to offer a particular piece of background information, not as a foundation for their argument.

2) Since Wikipedia is constantly undergoing revisions, it is too unstable to cite; what you read and verified today might be gone tomorrow–or even in an hour.  True, but Wikipedia is developing the ability for a particular version of an entry to be vetted by experts and then frozen, so researchers could cite an authoritative, unchanging version (Young).  As the above citation from Berlet indicates, you can already provide a link to a specific version of an article.

3) You can’t trust Wikipedia because anyone (including folks with no expertise, strong biases, or malicious or silly intent) can contribute to it anonymously. Yes, but through the back and forth between “passionate amateurs,” experts, and Wikipedia guardians protecting against vandals, good stuff often emerges. As Nicholson Baker, who has himself edited Wikipedia articles on topics such as Brooklyn Heights and the painter Emma Fordyce MacRae, notes in a delightful essay about Wikipedia, “Wikipedia was the point of convergence for the self-taught and the expensively educated. The cranks had to consort with the mainstreamers and hash it all out” (Baker). As Roy Rosenzweig found in a detailed analysis of Wikipedia’s appropriateness for historical research, the quality of the collaboratively produced Wikipedia entries can be uneven: certain topics are covered in greater detail than others, and the writing can have the choppy, flat quality of something composed by committee. But Rosenzweig also concluded that Wikipedia compares favorably with Encarta and Encyclopedia Britannica for accuracy and coverage.

4) Wikipedia entries lack authority because there’s no peer review. Well, depends on how you define “peer review.”  Granted, Wikipedia articles aren’t reviewed by two or three (typically anonymous) experts in the field, so they may lack the scholarly authority of an article published in an academic journal.  However, articles in Wikipedia can be reviewed and corrected by the entire community, including experts, knowledgeable amateurs, and others devoted to Wikipedia’s mission to develop, collect and disseminate educational content (as well as by vandals and fools, I’ll acknowledge).  Wikipedia entries aim to achieve what Wikipedians call “verifiability”; the article about Barack Obama, for instance, has as many footnotes as a law review article–171 at last count (August 31), including several from this week.

Now I’m certainly not saying that Wikipedia is always a good source for an academic work–there is some dreck in it, as in other sources.  Ultimately, I think Wikipedia’s appropriateness as an academic source depends on what is being cited and for what purpose.   Alan Liu offers students a sensible set of guidelines for the appropriate use of Wikipedia, noting that it, like other encyclopedias, can be a good starting point, but that it is “currently an uneven resource” and always in flux.  Instead of condemning Wikipedia outright, professors should help students develop what Henry Jenkins calls “new media literacies.”  By examining the history and discussion pages associated with each article, for instance, students can gain insight into how knowledge is created and how to evaluate a source.  As John Seely Brown and Richard Adler write:

The openness of Wikipedia is instructive in another way: by clicking on tabs that appear on every page, a user can easily review the history of any article as well as contributors’ ongoing discussion of and sometimes fierce debates around its content, which offer useful insights into the practices and standards of the community that is responsible for creating that entry in Wikipedia. (In some cases, Wikipedia articles start with initial contributions by passionate amateurs, followed by contributions from professional scholars/researchers who weigh in on the “final” versions. Here is where the contested part of the material becomes most usefully evident.) In this open environment, both the content and the process by which it is created are equally visible, thereby enabling a new kind of critical reading—almost a new form of literacy—that invites the reader to join in the consideration of what information is reliable and/or important. (Brown & Adler)

OK, maybe Wikipedia can be a legitimate source for student research papers–and furnish a way to teach research skills.  But should it be cited in scholarly publications?  In “A Note on Wikipedia as a Scholarly Source of Record,” part of the preface to Mechanisms, Matt Kirschenbaum offers a compelling explanation of why he cited Wikipedia, particularly when discussing technical documentation:

Information technology is among the most reliable content domains on Wikipedia, given the high interest of such topics [to] Wikipedia’s readership and the consequent scrutiny they tend to attract. Moreover, the ability to examine page histories on Wikipedia allows a user to recover the editorial record of a particular entry… Attention to these editorial histories can help users exercise sound judgment as to whether or not the information before them at any given moment is controversial, and I have availed myself of that functionality when deciding whether or not to rely on Wikipedia. (Kirschenbaum xvii)

With Wikipedia, as with other sources, scholars should use critical judgment in analyzing its reliability and appropriateness for citation.  If scholars carefully evaluate a Wikipedia article’s accuracy, I don’t think there should be any shame in citing it.

For more information, review the Zotero report detailing all of the works citing Wikipedia, or take a look at a spreadsheet of basic bibliographic information. I’d be happy to share my bibliographic data with anyone who is interested.

Works Cited

Baker, Nicholson. “The Charms of Wikipedia.” The New York Review of Books 55.4 (2008). 30 Aug 2008 <http://www.nybooks.com/articles/21131>.

Berlet, Chip. “The Write Stuff: U. S. Serial Print Culture from Conservatives out to Neonazis.” Library Trends 56.3 (2008): 570-600. 24 Aug 2008 <http://muse.jhu.edu/journals/library_trends/v056/56.3berlet.html>.

Booth, Wayne C., and Gregory G. Colomb. The Craft of Research. Chicago: U of Chicago P, 2003.

Borgman, Christine L. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass.: MIT Press, 2007.

Brown, John Seely, and Richard P. Adler. “Minds on Fire: Open Education, the Long Tail, and Learning 2.0.” EDUCAUSE Review 43.1 (2008): 16-32. 29 Aug 2008 <http://connect.educause.edu/Library/EDUCAUSE+Review/MindsonFireOpenEducationt/45823?time=1220007552>.

Buell, Lawrence. “The Unkillable Dream of the Great American Novel: Moby-Dick as Test Case.” American Literary History 20.1 (2008): 132-155. 24 Aug 2008 <http://muse.jhu.edu/journals/american_literary_history/v020/20.1buell.pdf>.

Dee, Jonathan. “All the News That’s Fit to Print Out.” The New York Times 1 Jul 2007. 30 Aug 2008 <http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html>.

DiZerega, Gus. “Civil Society, Philanthropy, and Institutions of Care.” The Good Society 15.1 (2006): 43-50. 24 Aug 2008 <http://muse.jhu.edu/journals/good_society/v015/15.1diZerega.html>.

Jenkins, Henry. “What Wikipedia Can Teach Us About the New Media Literacies (Part One).” Confessions of an Aca/Fan 26 Jun 2007. 30 Aug 2008 <http://www.henryjenkins.org/2007/06/what_wikipedia_can_teach_us_ab.html>.

Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass.: MIT Press, 2008.

Liu, Alan. “Student Wikipedia Use Policy.” 1 Apr 2007. 30 Aug 2008 <http://www.english.ucsb.edu/faculty/ayliu/courses/wikipedia-policy.html>.

Margolies, Daniel S. “Robert E. Lee: Heroic, But Not the Polio Vaccine.” Reviews in American History 35.3 (2007): 385-392. 25 Aug 2008 <http://muse.jhu.edu/journals/reviews_in_american_history/v035/35.3margolies.html>.

Rosenzweig, Roy. “Can History be Open Source? Wikipedia and the Future of the Past.” The Journal of American History 93.1 (2006): 117-46. <http://chnm.gmu.edu/resources/essays/d/42>.

Young, Jeffrey. “Wikipedia’s Co-Founder Wants to Make It More Useful to Academe.” Chronicle of Higher Education 13 Jun 2008. 28 Aug 2008 <http://chronicle.com/free/v54/i40/40a01801.htm?utm_source=at&utm_medium=en>.

Doing Digital Scholarship: Presentation at Digital Humanities 2008

Note:  Here is roughly what I said during my presentation at Digital Humanities 2008 in Oulu, Finland (or at least meant to say—I was so sleep deprived thanks to the unceasing sunshine that I’m not sure what I actually did say).  My session, which explored the meaning and significance of “digital humanities,” also featured rich, engaging presentations by Edward Vanhoutte on the history of humanities computing and John Walsh on comparing alchemy and digital humanities.  My presentation reports on my project to remix my dissertation as a work of digital scholarship and synthesizes many of my earlier blog posts to offer a sort of Reader’s Digest condensed version of my blog for the past 7 months. By the way, sorry that I’ve been away from the blog for so long.  I’ve spent the last month and a half researching and writing a 100 page report on archival management software,  reviewing essays, performing various other professional duties, and going on both a family vacation to San Antonio and a grown-up vacation to Portland, OR (vegan meals followed up by Cap’n Crunch donuts.  It took me a week to recover from the donut hangover).  In the meantime, lots of ideas have been brewing, so expect many new blog entries soon.

***

When I began working on my dissertation in the mid 1990s, I used a computer primarily to do word processing—and goof off with Tetris.  Although I used digital collections such as Early American Fiction and Making of America for my dissertation project on bachelorhood in 19th C American literature, I did much of my research the old fashioned way: flipping through the yellowing pages of 19th century periodicals on the hunt for references to bachelors, taking notes using my lucky leaky fountain pen.  I relied on books for my research and, in the end, produced a book.

At the same time that I was dissertating, I was also becoming enthralled by the potential of digital scholarship through my work at the University of Virginia’s (late lamented) Electronic Text Center.  I produced an electronic edition of the first section from Donald Grant Mitchell’s bestseller Reveries of a Bachelor that allowed readers to toggle between variants.   I even convinced my department to count Perl as a second language, citing the Matt Kirschenbaum precedent (“come on, you let Matt do it, and look how well that turned out”) and the value of computer languages to my profession as a budding digital humanist.  However, I decided not to create an electronic version of my dissertation (beyond a carefully backed-up Word file) or to use computational methods in doing my research, since I wanted to finish the darn thing before I reached retirement age.

Last year, five years after I received my PhD and seven years after I had become the director of Rice University’s Digital Media Center, I was pondering the potential of digital humanities, especially given mass digitization projects and the emergence of tools such as TAPOR and Zotero.  I wondered: What is digital scholarship, anyway?  What does it take to produce digital scholarship? What kind of digital resources and tools are available to support it? To what extent do these resources and tools enable us to do research more productively and creatively? What new questions do these tools and resources enable us to ask? What’s challenging about producing digital scholarship? What happens when scholars share research openly through blogs, institutional repositories, & other means?

I decided to investigate these questions by remixing my 2002 dissertation as a work of digital scholarship.  Now I’ll acknowledge that my study is not exactly scientific—there is a rather subjective sample of one.  However, I figured, somewhat pragmatically, that the best way for me to understand what digital scholars face was to do the work myself.  I set some loose guidelines: I would rely on digital collections as much as possible and would experiment with tools for analyzing, annotating, organizing, comparing and visualizing digital information.  I would also explore different ways of representing my ideas, such as hypertextual essays and videos.  Embracing social scholarship, I would do my best to share my work openly and make my research process transparent.  So that the project would be fun and evolve organically, I decided to follow my curiosity wherever it led me, imagining that I would end up with a series of essays on bachelorhood in 19th century American culture and, as sort of an exoskeleton, meta-reflections on the process of producing digital scholarship.

My first challenge was defining digital scholarship.  The ACLS Commission on Cyberinfrastructure’s report points to five manifestations of digital scholarship: collection building, tools to support collection building, tools to support analysis, using tools and collections to produce “new intellectual products,” and authoring tools.   Some might argue we shouldn’t really count tool and collection building as scholarship.  I’ll engage with this question in more detail in a future post, but for now let me say that most consider critical editions, bibliographies, dictionaries and collations, arguably the collections and tools of the pre-digital era, to be scholarship.  In many cases, building academic tools and collections requires significant research and expertise and results in the creation of knowledge—so, scholarship.   Still, my primary focus is on the fourth aspect, leveraging digital resources and tools to produce new arguments.  I’m realizing along the way, though, that I may need to build my own personal collections and develop my own analytical tools to do the kind of scholarship I want to do.

In a recent presentation at CNI, Tara McPherson, the editor of Vectors, offered her own “Typology of Digital Humanities”:
•    The Computing Humanities: focused on building tools, infrastructure, standards and collections, e.g. The Blake Archive
•    The Blogging Humanities: networked, peer-to-peer, e.g. crooked timber
•    The Multimodal Humanities: “bring together databases, scholarly tools, networked writing, and peer-to-peer commentary while also leveraging the potential of the visual and aural media that so dominate contemporary life,” e.g. Vectors

Mashing up these two frameworks, my own typology would look something like this:

•    Tools, e.g. TAPOR, Zotero
•    Collections, e.g. The Blake Archive
•    Theories, e.g. McGann’s Radiant Textuality
•    Interpretations and arguments that leverage digital collections and tools, e.g. Ayers and Thomas’ The Difference Slavery Made
•    Networked Scholarship: a term that I borrow from the Institute for the Future of the Book’s Bob Stein and that I prefer to “blogging humanities,” since it encompasses many modes of communication, such as wikis, social bookmarking, institutional repositories, etc. Examples include Savage Minds (a group blog in anthropology), etc.
•    Multimodal scholarship: e.g. scholarly hypertexts and videos, such as what you might find in Vectors
•    Digital cultural studies, e.g. game studies, Lev Manovich’s work, etc (this category overlaps with theories)

Initially I assumed that tools, theories and collections would feed into arguments that would be expressed as networked and/or multimodal scholarship and be informed by digital cultural studies.  But I think that describing digital scholarship as a sort of assembly line in which scholars use tools, collections and theories to produce arguments oversimplifies the process.  My initial diagram of digital scholarship pictured single-headed arrows linking different approaches to digital scholarship; my revised diagram looks more like spaghetti, with arrows going all over the place.  Theories inform collection building; the process of blogging helps to shape an argument; how a scholar wants to communicate an idea influences what tools are selected and how they are used.

After coming up with a preliminary definition of what I wanted to do, I needed to figure out how to structure my work.  I thought of John Unsworth’s notion of scholarly primitives, a compelling description of core research practices.  Depending on how you count them, Unsworth identifies 7 scholarly primitives:
•    Discovering
•    Annotating
•    Comparing
•    Referring
•    Sampling
•    Illustrating
•    Representing

As useful as this list is in crystallizing what scholars do, I think the list is missing at least one more crucial scholarly primitive, perhaps the fundamental one: collaboration. Although humanists are stereotyped as solitary scholars isolated in the library, they often work together, whether through co-editing journals or books, sharing citations, or reviewing one another’s work.  In the digital humanities, of course, developing tools, standards, and collections demands collaboration among scholars, librarians, programmers, etc.  I would also define networked scholarship—blogging, contributing to wikis, etc—as collaborative, since it requires openly sharing ideas and supports conversation. It’s only appropriate for me to note that this idea was worked out collaboratively, with colleagues at THAT Camp.

I want to make my research process as visible as possible, not only for idealistic reasons, but also because my work only gets better the more feedback I receive.  So I started up a blog—actually, several of them. At the somewhat grandly-named Digital Scholarship in the Humanities, I reflect on trends in the digital humanities and on broader lessons learned in the process of doing my research project.  In “Lisa Spiro’s Research Notes,”  I typically address stuff that seems too specialized, half-baked, or even raw for me to put on my main blog, such as my navel gazing on where to take my project next, or my experiments with Open Wound, a language re-mixing tool.   At my PageFlakes research portal, I provide a single portal to the various parts of my research project, offering RSS feeds for both of my blogs as well as for a Google News search of the term “digital humanities,” my delicious bookmarks for “digital scholarship,” links to my various digital humanities projects, and more.

I’ll admit that when I started my experiments with social scholarship I worried that no one would care, or that I would embarrass myself by writing something really stupid, but so far I’ve loved the experience.  Through comments and emails from readers, I’m able to see other perspectives and improve my own thinking.  I’ve heard from biologists and anthropologists as well as literary scholars and historians, and I’ve communicated with researchers from several countries.  As a result, I feel more engaged in the research community and more motivated to keep working.   Although I know blogging hasn’t caught on in every corner of academia, I think it has been good for my career as a digital humanist.  I am more visible and thus have more opportunities to participate in the community, such as by reviewing book proposals, articles, and grant applications.

I don’t have space to discuss the relevance of each scholarly primitive to my project, but I did want to mention a few of them: discovering, comparing, and representing.

Discovering

In order to use text analysis and other tools, I needed my research materials to be in an electronic format. In the age of mass digitization projects such as Google Books and the Open Content Alliance, I wondered how many of my 296 original research sources are digitized & available in full text. So I diligently searched Google Books and several other sources to find out. I looked at 5 categories: archival resources as well as primary and secondary books and journals. I found that with the exception of archival materials, over 90% of the materials I cited in my bibliography are in a digital format. However, only about 83% of primary resources and 37% of the secondary materials are available as full text. If you want to use text analysis tools on 19th century American novels or 20th century articles from major humanities journals, you’re in luck, but the other stuff is trickier because of copyright constraints. (I’ll throw in another scholarly primitive, annotation, and say that I use Zotero to manage and annotate my research collections, which has made me much more efficient and allowed me to see patterns in my research collections.)

Of course, scholars need to be able to trust the authority of electronic resources.  To evaluate quality, I focused on four collections that have a lot of content in my field, 19th century American literature: Google Books, Open Content Alliance, Early American Fiction (a commercial database developed by UVA’s Electronic Text Center), and Making of America.  I found that there were some scanning errors with Google Books, but not as many as I expected. I wished that Google Books provided full text rather than PDF files of its public domain content, as do Open Content Alliance and Making of America (and EAF, if you just download the HTML).  I had to convert Google’s PDF files to Adobe Tagged Text XML and got disappointing results.  The OCR quality for Open Content Alliance was better, but words were not joined across line breaks, reducing accuracy.  With multi-volume works, neither Open Content Alliance nor Google Books provided very good metadata.  Still, I’m enough of a pragmatist to think that having access to this kind of data will enable us to conduct research across a much wider range of materials and use sophisticated tools to discern patterns – we just need to be aware of the limitations.
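The line-break problem mentioned above can sometimes be patched before running any analysis. Here is a minimal sketch in Python, assuming the OCR output is plain text in which a word split across two lines ends the first line with a hyphen. The function name `rejoin_hyphenated` is hypothetical, and the sample string is invented for illustration rather than taken from any of the collections.

```python
import re

def rejoin_hyphenated(text: str) -> str:
    """Join words that were split across a line break with a trailing hyphen."""
    # Match: a word character, a hyphen, a newline, optional indentation,
    # then another word character. Drop the hyphen and the line break.
    return re.sub(r"(\w)-\n\s*(\w)", r"\1\2", text)

# Invented sample, standing in for a page of OCR output.
sample = "a bachelor's senti-\nmental dreams"
print(rejoin_hyphenated(sample))
```

One caveat: this rule would also merge a genuine compound such as "hand-to-hand" if the line happened to break right after one of its hyphens, so the cleaned text still deserves a spot check.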

Comparing
To evaluate the power of text analysis tools for my project, I did some experiments using TAPOR tools, including a comparison of two of my key bachelor texts: Mitchell’s Reveries of a Bachelor, a series of a bachelor’s sentimental dreams (sometimes nightmares) about what it would be like to be married, and Melville’s Pierre, which mixes together elements of sentimental fiction, Gothic literature, and spiritualist tracts to produce a bitter satire. I wondered if there was a family resemblance between these texts. First I used the Wordle word cloud generator to reveal the most frequently appearing words. I noted some significant overlap, including words associated with family such as mother and father, those linked with the body such as hand and eye, and those associated with temporality, such as morning, night, and time. To develop a more precise understanding of how frequently terms appeared in the two texts and their relation to each other, I used TAPOR’s Comparator tool. This tool also revealed words unique to each work, such as “flirt” and “sensibility” in the case of Reveries, “ambiguities” and “miserable” in the case of Pierre. Finally, I used TAPOR’s concordance tool to view key terms in context. I found, for instance, that in Mitchell “mother” is often associated with hands or heart, while in Melville it appears with terms indicating anxiety or deceit. By abstracting out frequently occurring and unique words, I can see how Melville, in a sense, remixes elements of sentimental fiction, putting terms in a darker context. The text analysis tools provide a powerful stimulus to interpretation.

Representing
Not only am I using the computer to analyze information, but also to represent my ideas in a more media-rich, interactive way than the typical print article.  I plan to experiment with Sophie as a tool for authoring multimodal scholarship, and I’m also experimenting with video as a means for representing visual information. Right now I’m reworking an article on the publication history of Reveries of a Bachelor as a video so that I show significant visual information such as bindings, illustrations, and advertisements.    I’ve condensed a 20+ page article into a 7 minute narrative, which for a prolix person like me is rough.  I also have been challenged to think visually and cinematically, considering how the movement of the camera and the style of transitions shape the argument.  Getting the right imagery—high quality, copyright free—has been tricky as well.  I’m not sure how to bring scholarly practices such as citation into videos.  Even though my draft video is, frankly, a little amateurish, putting it together has been lots of fun, and I see real potential for video to allow us to go beyond text and bring the human voice, music, movement and rich imagery into scholarly communication.

On Tools
In the course of my experiments in digital scholarship, I often found myself searching for the right tool to perform a certain task.  Likewise, in my conversations with researchers who aren’t necessarily interested in doing digital scholarship, just in doing their research better, I learned that they weren’t aware of digital tools and didn’t know where to find out about them.  To make it easier for researchers to discover relevant tools, I teamed up with 5 other librarians to launch the Digital Research Tools, or DiRT, wiki at the end of May.   DiRT provides a directory of digital research tools, primarily free but also commercial, categorized by their functions, such as “manage citations.”  We are also writing reviews of tools geared toward researchers and trying to provide examples of how these tools are used by the research community.  Indeed, DiRT focuses on the needs of the community; the wiki evolves thanks to its contributors.   Currently 14 people in fields such as anthropology, communications, and educational technology have signed on to be contributors.  Everything is under a Creative Commons attribution license.  We would love to see spin-offs, such as DiRT in languages besides English; DiRT for developers; and Old DiRT (dust?), the hall of obsolete but still compelling tools.  My experiences with DiRT have demonstrated again the beauty of collaboration and sharing.  Both Dan Cohen of CHNM & Alan Liu of UC Santa Barbara generously offered to let us grab content from their own tools directories.  Busy folks have freely given their time to add tools to DiRT.  Through my work on DiRT, I’ve learned about tools outside of my field, such as qualitative data analysis software.

So I’ll end with an invitation: Please contribute to DiRT.  You can sign up to be an editor or reviewer, recommend tools to be added, or provide feedback via our survey.  Through efforts like DiRT, we hope to enable new digital scholarship, raise the profile of inventive digital tools, and build community.

Using Text Analysis Tools for Comparison: Mole & Chocolate Cake

How can text analysis tools enable researchers to study the relationships between texts? In an earlier post, I speculated about the relevance of such tools for understanding “literary DNA”–how ideas are transmitted and remixed–but as one reader observed, intertextuality is probably a more appropriate way of thinking about the topic. In my dissertation, I argue that Melville’s Pierre represents a dark parody of Mitchell’s Reveries of a Bachelor. Melville takes the conventions of sentimental bachelor literature, mixes in elements of the Gothic and philosophic/theological tracts, and produces a grim travesty of bachelor literature that makes the dreaming bachelor a trapped quasi-husband, replaces the rural domestic manor with a crowded urban apartment building, and ends in a real, Hamlet-intense death scene rather than the bachelor coming out of reverie or finding a wife. Would text analysis tools support this analysis, or turn up patterns that I had previously ignored?

I wanted to get a quick visual sense of the two texts, so I plugged them into Wordle, a nifty word cloud generator that enables you to control variables such as layout, font and color. (Interestingly, Wordle came up with the perfect visualizations for each text at random: Pierre white type on a black background shaped into, oh, a chess piece or a tombstone, Reveries a brighter, more casual handwritten style, with a shape like a fish or egg.)

Wordle Word Cloud for Pierre

Wordle Reveries Word Cloud

Using these visual representations of the most frequent words in each book enabled me to get a sense of the totality, but then I also drilled down and began comparing the significance of particular words. I noted, for instance, the importance of “heart” in Reveries, which is, after all, subtitled “A Book of the Heart.” I also observed that “mother” and “father” were given greater weight in Pierre, which is obsessed with twisted parental legacies. To compare the books in even more detail, I decided to make my own mashed up word cloud, placing terms that appeared in both texts next to each other and evaluating their relative weight. I tried to group similar terms, creating a section for words about the body, words about feeling, etc. (I used crop, copy and paste tools in PhotoShop to create this mashup, but I’m sure–or I sure hope–there’s a better way.)

Comparison of Reveries and Pierre

(About three words into the project, I wished for a more powerful tool to automatically recognize, extract and group similar words from multiple files, since my eyes ached and I had a tough time cropping out words without also grabbing parts of nearby words. Perhaps each word would be a tile that you drag over to a new frame and move around; ideally, you could click on the word and open up a concordance.) My mashup revealed that in many ways Pierre and Reveries have similar linguistic profiles. For instance, both contain frequently-occurring words focused on the body (face, hand, eye), time (morning, night), thinking, feeling, and family. Perhaps such terms are common in all literary works (one would need to compare these works to a larger literary corpus), but they also seem to reflect the conventions of sentimental literature, with its focus on the family and embodied feeling (see, for instance, Howard).

The word clouds enabled me to get an initial impression of key words in the two books and the overlap between them, but I wanted to develop a more detailed understanding. I used TAPOR’s Comparator to compare the two texts, generating a complete list of how often words appeared in each text and their relative weighting. When I first looked at the word list, I was befuddled:

Words | Reveries counts | Reveries relative counts | Pierre relative counts | Pierre counts | Relative ratio (Reveries:Pierre)
blaze | 45 | 0.0007 | 0 | 1 | 109.4667

What does the relative ratio mean? I was starting to regret my avoidance of all math and stats courses in college. But after I worked with the word clouds, the statistics began to make more sense. Oh, relative ratio means how often a word appears in the first text versus the second–“blaze” is much more prominent in Reveries. Ultimately I trusted the concreteness and specificity of numbers more than the more impressionistic imagery provided by the word cloud, but the word cloud opened up my eyes so that I could see the stats more meaningfully. For instance, I found that mother indeed was more significant in Pierre, occurring 237 times vs. 58 times in Reveries. Heart was more important in Reveries (a much shorter work), appearing 199 times vs. 186 times in Pierre. I was surprised that “think” was more significant in Reveries than in Pierre, given the philosophical orientation of the latter. With the details provided by the text comparison results, I could construct an argument about how Melville appropriates the language of sentimentality.
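For anyone else who dodged stats in college, the arithmetic behind a relative ratio can be sketched in a few lines of Python. This is a rough illustration, not TAPOR's actual code: it assumes that a "relative count" is a word's count divided by the total number of words in that text, and that the ratio divides the first text's relative count by the second's. The function names and the two toy strings are made up for the example.

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Lowercase the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def relative_ratio(word: str, counts_a: Counter, counts_b: Counter) -> float:
    """Relative frequency of `word` in text A divided by its relative frequency in text B."""
    rel_a = counts_a[word] / sum(counts_a.values())
    rel_b = counts_b[word] / sum(counts_b.values())
    if rel_b == 0:
        return float("inf")  # the word never appears in text B
    return rel_a / rel_b

# Toy stand-ins for the two books, not the real texts.
text_a = "the blaze of the fire and the blaze of the heart"
text_b = "the mother and the heart and the dark blaze of night and time"

a, b = word_counts(text_a), word_counts(text_b)
print(relative_ratio("blaze", a, b))  # greater than 1: "blaze" is more prominent in text A
```

Dividing by each text's length is what lets a short book be compared fairly with a long one: a raw count of 199 in a short work can signal more emphasis than 186 in a much longer one.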

But the differences between the two texts are perhaps even more interesting than their similarities, since they show how Melville departed from the conventions of male sentimentalism, embraced irony, and infused Pierre with a sort of gothic spiritualism. These differences are revealed more fully in the statistics than the word clouds. A number of terms are unique to each work. For instance, sentimental terms such as “sympathies,” “griefs,” “sensibility” appear frequently in Reveries but never in Pierre, as do romantic words such as “flirt,” “sparkle,” and “prettier.” As is fitting for Melville, Pierre’s unique language is typically darker, more archaic, abstract, and spiritual/philosophical, and obsessed with the making of art: “portrait,” “writing,” “original,” “ere,” “miserable,” “visible,” “invisible,” “profound(est),” “final,” “vile,” “villain,” “minds,” “mystical,” “marvelous,” “inexplicable,” “ambiguous.” (Whereas Reveries is subtitled “A Book of the Heart,” Pierre is subtitled “The Ambiguities.”) There is a strand of darkness in Mitchell–he uses “sorrow” more than Melville–but then Mitchell uses “pleasure” 14 times to Melville’s 2 times and “pleasant” 43 times. Reveries is more self-consciously focused on bachelorhood; Mitchell uses “bachelor” 28 times to Melville’s 5. Both authors refer to dreaming; Mitchell uses “reveries” 10 times, Melville 7. Interestingly, only Melville uses “America” (14 times).
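Finding the words unique to one text is simple to approximate by hand. The following Python sketch lists words that occur in a first text and never in a second, most frequent first. It is an assumption-laden illustration rather than a description of how the Comparator works: `unique_to_first` and `min_count` are hypothetical names, and the two short strings are invented.

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Lowercase the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def unique_to_first(counts_a: Counter, counts_b: Counter, min_count: int = 1):
    """Words seen at least `min_count` times in A and never in B, most frequent first."""
    return [(word, n) for word, n in counts_a.most_common()
            if n >= min_count and word not in counts_b]

# Invented snippets, standing in for the full texts.
text_a = "flirt and sparkle and flirt"
text_b = "vile and mystical and vile villain"

a, b = word_counts(text_a), word_counts(text_b)
print(unique_to_first(a, b))
print(unique_to_first(b, a, min_count=2))
```

Raising `min_count` is one crude way to skip words that appear only once, which are often OCR errors or proper names rather than meaningful vocabulary.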

Looking over the word lists raises all sorts of questions about the themes and imagery of each work and their relationship to each other, but the data can also be overwhelming. If comparing two works yields over 10,000 lines in a spreadsheet, what criteria should you use in deciding what to select (to use Unsworth’s scholarly primitive)? What happens when you throw more works into the mix? I’m assuming that text mining techniques will provide more sophisticated ways of evaluating textual data, allowing you to filter data and set preferences for how much data you get. (I should note that you can exclude terms and set preferences in TAPOR).

Text analysis brings attention to significant features of a text by abstracting those features–for instance, by generating a word frequency list that contains individual words and the number of times they appear. But I kept wondering how the words were used, in what context they appeared. So Melville uses “mother” a lot–is it in a sweetly sentimental way, or does he treat the idea of mother more complexly? By employing TAPOR’s concordance tool, you can view words in context and see that Mitchell often uses mother in association with words like “heart,” “kiss,” “lap,” while in Melville “mother” does appear with “Dear” and “loving,” but also with “conceal,” “torture,” “mockingly,” “repelling,” “pride,” “cruel.” Hmmm. In Mitchell, “hand” most often occurs with “your” and “my,” signifying connection, while “hand” in Pierre is more often associated with action (hand-to-hand combat, “lift my hand in fury,” etc) or with putting hand to brow in anguish. Same word, different resonance. It’s as if Melville took some of the ingredients of sentimental literature and made something entirely different with them, enchiladas mole rather than a chocolate cake.
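A bare-bones concordance is also easy to sketch, which helps demystify what the tool is doing. The Python below shows each occurrence of a keyword with a few words of context on either side. It is a simplified stand-in, not TAPOR's implementation; the function name `concordance`, the `width` parameter, and the sample sentences are all invented for the example.

```python
import re

def concordance(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return each occurrence of `keyword` with `width` words of context per side."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Bracket the keyword so it stands out in each line.
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# Invented sample sentences, not quotations from either book.
sample = "I lift my hand in fury. She took my hand and your hand in hers."
for line in concordance(sample, "hand", width=2):
    print(line)
```

Counting the neighboring words across all such lines would be one way to move from eyeballing contexts to a list of a word's most frequent companions.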

Word clouds, text comparisons, and concordances open up all sorts of insights, but how does one use this evidence in literary criticism? If I submitted an article full of word count tables to a traditional journal, I bet the editors wouldn’t know what to do with it. But that may change, and in any case text analysis can inform the kind of arguments critics make. My experience playing with text analysis tools verifies, for me, Steve Ramsay’s recommendation that we “reconceive computer-assisted text analysis as an activity best employed not in the service of a heightened critical objectivity, but as one that embraces the possibilities of that deepened subjectivity upon which critical insight depends.”

Works Cited

Howard, June. “What Is Sentimentality?” American Literary History 11.1 (1999): 63-81. 22 Jun 2008 <http://alh.oxfordjournals.org/cgi/content/citation/11/1/63>.

Ramsay, Stephen. “Reconceiving Text Analysis: Toward an Algorithmic Criticism.” Lit Linguist Computing 18.2 (2003): 167-174. 27 Nov 2007 <http://llc.oxfordjournals.org/cgi/content/abstract/18/2/167>.

Research Methods Session at THAT Camp

This weekend I’m at THAT Camp, which is bringing together programmers, librarians, funding officers, project managers, mathematicians, historians, philosophers, literary scholars, linguists, etc. to discuss the digital humanities. Sponsored by the Center for History and New Media at George Mason University, THAT Camp is an un-conference, which means that ideas for sessions emerged organically out of blog posts preceding the gathering and out of a discussion held when the Camp began. As a result of all of the sharing of ideas via blogging and social networking via Twitter, the meeting seems much more intimate, open, and lively than your average conference. People who are passionate and curious about the digital humanities are coming together to talk about teaching, gaming, visualization, project sustainability, etc., and to learn how to hack Zotero and Omeka, build a simple history appliance, and more. As many folks have commented, the toughest part of THAT Camp is deciding which of the four sessions to attend–I want to go to them all. Kudos to CHNM for organizing and hosting the event–I bet some exciting initiatives and collaborations will come out of THAT Camp.

Yesterday afternoon I facilitated a session on research methods. At the request of some of the participants, I’m posting the rough notes I took during this rich discussion.

Touchstones/ pump priming quotations for the session:

  • “Research in the humanities, then, is and has been an activity characterized by the four Rs: reading, writing, reflection, and rustication. If these are the traditional research methods in the humanities, what will “new research methods” look like–and more importantly, why do we need them?”—John Unsworth, New Methods for Humanities Research
  • “The day will come, not that far off, when modifying humanities with ‘digital’ will make no more sense than modifying humanities with ‘print.’” –Steve Wheatley, ACLS
  • Unsworth, Scholarly Primitives: “some basic functions common to scholarly activity across disciplines, over time, and independent of theoretical orientation.” Unsworth lists the following scholarly primitives:
    • Annotating
    • Comparing
    • Discovering
    • Illustrating
    • Referring
    • Representing
    • Sampling
  • “What is a literary-critical ‘problem?’ How is it different from a scientific ‘problem?’”—Steve Ramsay

Discussion

EXPERTISE AND INFORMATION FLUENCY

  • Old method: Scholars would find things in the archive, bring them back, provide people w/ information.
  • New: scholars face a deluge of information.
  • Old assumption: info is hard to get to, need expertise to find stuff.
  • New: expertise shifts from finding to filtering and sorting
  • The point of a research method is figuring out how to filter, sort. A bibliography is not a list of Google links; you need to be familiar w/ major sources in field.
  • Experts know how to discern bias; Filtering requires expertise.
  • Expertise=familiarity with conceptual/ theoretical approaches in field. Scholars get a sense of theoretical approach by looking at the bibliography—it’s metadata about the book
  • Scholars need to inform students about problems with resources they find. New problems arise with digital—important to know weaknesses of Google Books. Need to teach students to question how resource/ tool created—what it does and doesn’t do.
  • The student world is digital—they need to learn how to operate responsibly in it
  • Two webs: open access, proprietary/ walled off. Students need to be aware of it—not everything is in Google.
  • But it’s also important to meet students where they start—even faculty start with Google; make metadata open so it’s discoverable. Implications of stuff not being accessible—it’s ignored.
  • Old model: one expert; you had to read the one book on the subject. Now there are huge amounts of data, and we need multiple interfaces to all of it. Need to provide multiple pathways to data. RDF key.
  • If you’re used to doing something a particular way, it’s hard to change that.
  • Origins of print: first people to adopt print were different groups using it for their own agenda. Later library science came along to collect and curate content. Print media enabled new ways of doing existing scholarship. New disciplines developed, such as finding and keeping print materials (librarianship) and the study of books as physical objects. Same thing in shift to digital: there are specialists who focus on the technical side, like building tools. There are scholars, who want to use this stuff and don’t need to know the technical details.

INTEGRATED RESEARCH PROCESSES/ TOOLS

  • At the recent New Horizons conference, Geoffrey Rockwell spoke on mass digitization and the process of research. Search is not that simple—there are multiple places to look. The problem of selection→ how do you decide what makes sense. Then there’s serendipity. How do scholars negotiate mass of stuff? How do they make sense of it, select it? Tools like Zotero help you to share & select info; then you leave Zotero and write paper separately. With textual analysis tools, there’s no way to take textual data and link to publication → you need a relationship to textual analysis work. Can integrated tools be developed so that discovery, search, data collection, analysis, etc. can be carried right through publication in journal, Omeka, etc?

COLLABORATION

  • Sharing should be one of the scholarly primitives. We’re sharing in new ways. The speed & scale of what you share is changing.
  • How do you cut across disciplines? People from different fields have different takes: literature vs history vs art; different methods, not much cross-fertilization
  • Pronetos: scholars throughout the world get a single place to go to network and engage with other scholars. Organic—if you’re an American historian, you can create an American history group if it doesn’t already exist. Takes on the problem of how to help people network.
  • Zotero Commons will facilitate sharing of expertise, as you can find an expert sharing a particular bibliography.
  • Opening up projects, creating communities around them helps with sustainability
  • Most transformative aspect of new research methods is establishing scholarly networks, collaborative aspect
  • How do you track your efforts in collaboration so that you can document what deserves to be rewarded?
  • Teach collaboration by modeling it for students
  • Sharing depends on discipline—people working on patents don’t necessarily share.
  • Humanists have trouble with sharing—for instance, some NINES users wanted to make tags private
  • Not sharing will become a problem in the long term, since it leads to duplication of effort and unnecessary competition. You can collaborate to come up with a better project.
  • Information gets out quickly, danger is in not sharing–that’s when you get scooped.
  • It’s not the technology that enables the sharing—it’s the people. There’s concern about retaining rights, getting credit, getting ripped off. People are building projects (e.g. institutional repositories) and users are not coming. How will people be encouraged to share?
  • People tend to share within discipline rather than institution.
  • What’s the relationship btwn repositories, blogs, Omeka installations, etc.? Importance of data aggregation, globalization.
  • Cyberinfrastructure is people
  • They’ve been pushing knowledge management practices in the business world for decades, and they still haven’t cracked it.
  • Mashups—pieces in place to make scholars see potential, but haven’t been realized yet.
  • With openly shared research, you facilitate interdisciplinarity and get research out to more people. Institutional repositories (IR) are key for this.
  • IRs are siloed—but w/ Zotero Commons, institution is everyone.
  • If you put your research out there, you’re staking your claim, not getting scooped.
  • If scholars blog their work at the early stage, they may wonder: are they putting it out too early?
  • What is it about sharing that’s changing over time?
  • Do humanities departments who want to do digital need a marketing department to help people discover their work?
  • Role of libraries as marketing depts., making resources accessible.
  • Professional societies need to step up b/c it’s not realistic for individual schools to do the marketing of digital scholarship.
  • Should professional societies launch their own version of Facebook?
  • We need to get away from the silos.
  • Peer review is a kind of social network.
  • Media Commons: social network for peer review of online texts using CommentPress, etc. Slashdot: reputation ranking, etc. (morphed into peer review)
  • Offer interfaces inflected for different disciplines: NINES, 18th C Connect
  • NINES an example of peer review for digital scholarship. 22 sites peer-reviewed by NINES—22 of first 105 to be put in MLA Bibliography.
  • Journal gives seal of approval; we haven’t come up with that kind of stamp for the digital world. Part of the fear about blogging is that it’s not peer reviewed
  • Blogrolls are a form of peer review–to find good stuff, you look at Matt Kirschenbaum’s blog to see who he reads.
  • Rotunda/ digital publisher as stamp of approval.
  • There are different standards for digital and print. A Nature study of online peer review found it doesn’t work. But there were something like 40 comments in 6 months—isn’t that success, when in normal peer review it would take 1-2 years to get 3 comments? Why is there such a high bar for digital scholarship?
  • Noah Wardrip-Fruin’s peer review processes: different feedback overall from online/blog-based and traditional peer review.
  • Scholarship over time: digital projects, when do they end?

SHIFT FROM PRINT TO DIGITAL

  • How are traditional research methods tied to the printed book?
  • Interpretation: job of historian is to make sense of what things mean. We’re in the land grab stage right now—dump stuff online, then begin to wall it off. It’s still early—at the ground floor of something that could be big.
  • Historians typically narrativize events. At Miami U. they developed a tool to transform a short story into a different genre, for instance from horror to epic. Students learned elements of genre, wrote XSLT stylesheets to do the transformations.
  • Researchers could try out different narratives on data sets—picking out certain aspects. Historical narrativizing tied to print; digital enables historical multi-narrative. With digital, you can see what breaks when you change parameters.
  • Print to digital: transition from narrative to simulation, counter-factuals
  • How do you read? How many books do you have open?
    • Former practices: contraptions to hold multiple books open. Some ways of laying out books made them a database.
    • How does that work now? Ray Siemens: exploring idea of reading. Tools for document triage

USEFULNESS OF THE TERM “DIGITAL HUMANITIES”

  • The problem of naming a new digital humanities research center: Faculty advisers focused on the word “humanities”—what about social sciences, arts, etc.
  • When does the digital label drop out—or is it useful in defining what you do?
  • NEH Digital Humanities Office: NEH has been doing digital humanities for a long time: it funded TEI 20 years ago. But establishing the office helps to validate digital scholarship.
  • Specialists focus on certain areas of theory–we have the deconstruction scholars who specialize in the field, but their ideas permeate throughout the humanities. Similarly, digital humanists will be the lead group of folks who do digital work, but it will filter down into common research practice.
  • Digital humanities researchers need to make the case for a new methodology.
  • “Digital” is useful b/c we are at an early stage; people still wonder what it means to be digital.
  • “Digital humanities” brings together technical skills and humanistic knowledge. Creating a DTD is a fascinating part of digital humanities; sounds like computer stuff, but it’s fundamentally humanities.
  • A tension: bibliography used to be core work, but that kind of work doesn’t necessarily get you tenure now. There’s real suspicion about whether this is truly humanities work.
  • Digital humanities includes tool developers, text encoders, people who use digital methods, as well as those who study digital culture, e.g. video games, underlying structures of social environment. Object they study is digital.
  • Divide between game/ film studies and textual digital humanities.
  • Jerry McGann: “humanists have always worked to preserve and interpret human record. Digital humanities is doing it in digital form.”
  • ADHO used to focus on the textual digital humanities, but is reaching out to digital theorists/ art, etc.
  • There’s a significant skill set to doing digital humanities work. Many scholars don’t really appreciate what it takes to produce digital resources—it’s not just scanning documents.
  • Theoreticians: need a little more dirt under their fingernails—they need to get experience doing these projects to inform their theorizing.

Digging in the DiRT: Sneak Preview of the Digital Research Tools (DiRT) wiki

When I talk with researchers about a cool tool such as Zotero, they often ask, “Hey, how did you find out about that?” Not everyone has the time or inclination to read blogs, software reviews, and listserv announcements obsessively, but now researchers can quickly identify relevant tools by checking out the newly launched Digital Research Tools (DiRT) wiki: http://digitalresearchtools.pbwiki.com/. DiRT lists dozens of useful tools for discovering, organizing, analyzing, visualizing, sharing and disseminating information, such as tools for compiling bibliographies, taking notes, analyzing texts, and visualizing data. We also offer software reviews that not only describe each tool’s features, strengths, and weaknesses, but also provide usage tips, links to training resources, and suggestions for how it might be implemented by researchers. So that DiRT is accessible to non-techies and techies alike, we try to avoid jargon and categorize tools by their functions. Although the acronym DiRT might suggest that it’s a gossip site for academic software, dishing on bugs and dirty secrets about the software development process, we prefer a gardening metaphor, as we hope to help cultivate research projects by providing clear, concise information about tools that can help researchers do their work more effectively or creatively.

DiRT is brand new, so we’re still in the process of creating content and figuring out how best to present it; consider it to be in alpha release and expect to see it evolve. (We plan to announce DiRT more broadly in a few months, but we’re giving sneak previews right now in the hope that comments from members of the digital humanities community can help us to improve it.) Currently the DiRT editorial team includes me, my ever-innovative and enthusiastic colleague Debra Kolah, and three whip-smart librarians from Sam Houston State University with expertise in Web 2.0 technologies (as well as English, history, business, and ranching!): Tyler Manolovitz, Erin Dorris Cassidy, and Abe Korah. We’ve committed to providing at least 5 new tool reviews per month, but we can do even more if more people join us (hint, hint). We invite folks to recommend research tools or software categories, write reviews, sign on to be co-editors, and/or offer feedback on the wiki. Please contact me at lspiro@rice.edu. [Update: You can also provide feedback via this form.]

By the way, playing with DiRT has convinced me yet again of the value of collaboration. Everyone on the team has contributed great ideas about what tools to cover, what form the reviews should take, and how to promote and sustain the wiki. Five people can sure do a heck of a lot more than one–and have fun in the process.