Monthly Archives: November 2007

Defining digital scholarship in the humanities: Ten-fingered humanists?

“Digital scholarship” seems to have become a new buzzword in academia. The term is invoked by those advocating for open access to scholarly knowledge (e.g. Charles Bailey’s Digital Scholarship) as well as those promoting innovative research methodologies. Universities, libraries, and funding organizations are beginning to recognize the need to support digital scholarship. Witness:

But what exactly do we mean by “digital scholarship,” particularly in the humanities? Let’s look at three definitions:

  1. The ACLS report offers more of a description of what could be considered digital scholarship than a precise definition. To make this description more concrete, I’ve added my own examples in brackets:

a) Building a digital collection of information for further study and analysis [e.g. the Blake Archive]
b) Creating appropriate tools for collection-building [e.g. Collex, Zotero]
c) Creating appropriate tools for the analysis and study of collections [e.g. TAPOR Tools, MONK, Token-X]
d) Using digital collections and analytical tools to generate new intellectual products [e.g. Ayers and Thomas, “The Difference Slavery Made,” a hypertext historical essay that draws its evidence from The Valley of the Shadow]
e) Creating authoring tools for these new intellectual products, either in traditional forms or in digital form [this one is important, but harder to think about–would CommentPress, which allows readers to add comments and annotations to WordPress posts, qualify?]

Although this list offers a helpful way to think about the different forms that digital scholarship can take, I think it misses at least two other types of digital scholarship:

  1. theorizing the digital (digital textuality, media studies, etc), such as Jerome McGann’s Radiant Textuality
  2. publishing scholarship as hypermedia, e.g. works in the journal Vectors and, I would argue, the video Web 2.0 … The Machine is Us/ing Us

And then there are blogs, scholarly portals, podcasts, online colloquia, etc.

2. The University of Washington likewise offers a broad definition:

Digital scholarship has many dimensions and may be defined as:

  • any element of knowledge or art that is created, produced, analyzed, distributed and/or displayed in a digital medium for the purpose of research or teaching;
  • the creation of digital technology, tools and services to solve problems in scholarship; or
  • the study and analysis of digital information, resources and culture.

I like the way that this definition encompasses both tools/services (collections?) and the study of digital culture. I would agree that digital scholarship is knowledge produced, distributed, etc. in a digital medium, but I think that this definition is too general–these days, almost every article is produced in a word processing program and uses at least some sources from e-journals. For me, what is crucial is the method more than the medium. Digital scholarship, I think, involves some critical reflection on what it means to be using computers as part of the work, whether in building a tool or writing an article.

3. My favorite definition–which is, admittedly, a definition of “e-humanities” rather than “digital scholarship”–comes from the Australian e-Humanities Research Network and is itself derived from Willard McCarty’s work:

a complex and dynamic application of the prodigious memory and processing power of the modern computer, actualised through computer science, to a collection of disciplines with a very long and proud tradition in the preservation, transmission, and examination of human culture (McCarty).

This report distinguishes between two types of digital scholarship: building collections and tools to make research more efficient, and “exploring the new conceptual terrain opened up as a result of bringing established humanities disciplines into contact with the power of the digital.” I like the way that this definition invokes humanities traditions even as it imagines how those practices might change with the analytical power offered through technology; it links the practice of “scholarship” with the possibilities of the “digital.”

So how would I define digital scholarship in the humanities? (I won’t pretend to be able to define it for the sciences, although I bet the word “data” would show up somewhere in the definition). I kind of like the description offered by one of the organizers of the 2007 American Literature Association conference. Before I presented at the conference as part of the Digital Americanist panel, he hustled the previous presenters away from the podium so we strange techie folks would have time to set up our computer and projector, saying, “Look out, here come the ten-fingered Americanists.” It took me a while to get the pun (oh yeah, digital does mean “pertaining to the fingers or digits”), but it seems apt–often digital humanists are makers (of tools and collections) as well as thinkers; we’re homo faber as well as homo sapiens.

But, OK, “ten-fingered humanists” is more of a slogan than a definition. Typing “define: digital scholarship” into Google got me no results. So I turned to a favorite old-school tool of humanities scholars, the OED, to mash up “digital” and “scholarship”:

  • digital (from the 2002 draft additions): “involving or relating to the use of computer technology or digital communications, esp. digital multimedia and the Internet”
  • scholarship: learning, erudition

Hmmm. Learning that uses computer technology–pretty squishy. But (here I get evasive) maybe it’s best to avoid tight definitions of such an emerging term; maybe we should let digital scholarship define itself, since it’s still being imagined. In coming posts, I plan to explore different examples of digital scholarship, such as recent dissertations on algorithmic criticism (fascinating stuff!), hypermedia essays, videos, and more. Maybe I’ll come up with a better definition as I work through this stuff…


Woman vs. machine? Analyzing texts…

Since it took me five years before I could steel myself to look at my dissertation again, I had forgotten some of the main points that I made in it. To uncover key terms in Chapter 1, which explores the popular literature of bachelorhood in 19th century America, I decided to use text analysis tools. By generating a list of frequently occurring terms, I figured that I could get a snapshot of my argument and, I hoped, have a handy list of search terms to use as I looked for other instances of bachelor literature. I also wanted to play with the tools so that I could better understand their capabilities and limitations. What patterns would the tool reveal? What terms did I use over and over, despite my best efforts to vary my vocabulary? And is word weight a useful measure of the significance of a concept? Wouldn’t the position of a word (for instance, in a heading or thesis paragraph) also matter, and shouldn’t synonyms be considered in the algorithm?

Before using any tools to automatically generate a list of commonly used terms, I decided to go through the chapter and construct my own list of key words. Then I used TAPOR’s Word Frequency tool to automatically generate a list of key terms. In comparing my list and TAPOR’s, I am struck by how I read the chapter through my own interpretive filter. Most of the terms that I included on my list are different descriptors for the bachelor figure in American literature, such as “detached,” “narcissist,” “luxury,” “metamorphosis,” etc. Not surprisingly, TAPOR’s list is much broader. Sure, it overlaps with my list by including terms commonly associated with the bachelor figure, such as “single,” “man,” “unmarried,” “pleasure,” and “sentiment.” But it also includes terms such as “author,” “narrator,” “literature,” “literary,” “writing,” “American,” and “identity,” terms that reflect my argument that anxieties over American authorship were reflected in discourse about bachelorhood. Likewise, the TAPOR list gives high ranking to words associated with domesticity such as “family,” “home,” and “love,” reflecting my argument that the bachelor stood outside family-centered domesticity but remade it on his own terms. Before running TAPOR, I did write a quick summary of my argument that includes terms such as “identity” and “authorship,” so I was certainly aware of how these ideas played into my argument–they just weren’t included in the list I made. But the TAPOR list also includes some words that reflect not so much my argument as my rhetorical style–for instance, “instance” (I apparently use that word a lot to provide examples), “according” (attributing sources), “suggests” (summarizing someone else’s argument), “typically” (avoiding the absolute statement), and “likewise” (comparing).

Noticing the language I use to make arguments reminds me of when I was recorded making a speech and became aware of the way I hung my head to the side and “ummed” as I spoke–I became more self-conscious of my style. I suppose what I’ve gotten out of this exercise, besides a handy list of keywords that I hope to use in conducting searches, is an initial confirmation of the claim that text analysis tools can help you to look beyond your own interpretive filter and see other patterns.
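
For readers who want to try this kind of comparison themselves, here is a minimal sketch in Python. It is not how TAPOR works internally (I don't know its algorithm); it simply counts words after dropping a tiny, made-up stop list, then sorts terms into those on both lists, those only on the hand-made list, and those only on the machine list. The function names, the stop list, and the sample sentence are all assumptions for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption); real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "his", "on", "but", "it", "own", "that"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

def compare_lists(hand_list, counts, top_n=10):
    """Split terms into shared, hand-only, and tool-only sets."""
    tool_terms = {term for term, _ in counts.most_common(top_n)}
    hand_terms = set(hand_list)
    return {
        "shared": hand_terms & tool_terms,
        "hand_only": hand_terms - tool_terms,
        "tool_only": tool_terms - hand_terms,
    }

# Invented sample sentence standing in for a chapter.
sample = ("The bachelor stood outside the home, but the bachelor "
          "remade the home on his own terms.")
counts = word_frequencies(sample)
result = compare_lists(["bachelor", "luxury"], counts)
```

The "tool_only" set is where the surprises live: it is the rough equivalent of discovering that I lean on “instance” and “typically.” A raw count like this ignores word position and synonyms, which is exactly the limitation I was wondering about above.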

As much as I like the TAPOR tools, I should note one frustration. Ideally you would be able to export word frequencies in some sort of a spreadsheet-friendly format so that you can play with the data and come back to it at a later point, but I didn’t see an easy way to do this. I tried to copy and paste the list of 4286 unique words into Google spreadsheets (which I’m using to share my findings) and ended up crashing my browser. I then pasted the 286 terms that appear at least 5 times into Excel and then into Google spreadsheets, but that process seemed to introduce unnecessary steps. Anyhow, I’ll keep experimenting with TAPOR, HyperPo, Token X, WordHoard, NORA, and the other text analysis, mining and visualization tools out there. Suggestions welcomed!
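
If you already have word counts in hand (from your own script, say, rather than from TAPOR, which I couldn't find a way to export from), writing them out in a spreadsheet-friendly format is straightforward. The sketch below uses Python's standard csv module and keeps only words that appear at least five times, the same cutoff I used above. The function name and the sample counts are hypothetical.

```python
import csv
import io
from collections import Counter

def write_frequency_csv(counts, out_file, min_count=5):
    """Write word,count rows for words seen at least min_count times.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out_file)
    writer.writerow(["word", "count"])
    rows = 0
    for word, count in counts.most_common():
        if count >= min_count:
            writer.writerow([word, count])
            rows += 1
    return rows

# Hypothetical counts standing in for a real word list.
counts = Counter({"bachelor": 42, "home": 9, "reverie": 5, "luxury": 2})
buffer = io.StringIO()
rows_written = write_frequency_csv(counts, buffer)
csv_text = buffer.getvalue()
```

To save a real file instead of an in-memory buffer, pass a file opened with `open("frequencies.csv", "w", newline="", encoding="utf-8")`; the resulting file should open in Excel or upload to a spreadsheet service without the copy-and-paste detour.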

Why blog?

Now it’s time to blog about… blogging. It’s a popular topic in blogs, mentioned between 1000 and 6000ish times per day in blogs indexed by Technorati.

[Technorati chart: English posts that contain “Blogging” per day for the last 30 days]

But how many blogs are actually read? According to Derek Gordon, Technorati’s vice president for marketing, over 99% of all blogs get no hits at all. No hits. Of course, some blogs–technology-focused blogs such as Engadget, political blogs such as the Huffington Post, quirky blogs such as Boing Boing, celeb gossip blogs such as TMZ, makin’-money blogs such as ProBlogger–claim thousands of readers and exercise a powerful cultural influence, at least as measured by Technorati’s authority index. I don’t think that any academic blogs currently rank in the Technorati top 100, but the CASCADES project at Carnegie Mellon does include several blogs by academics (faculty, librarians, researchers) in its Top 100 blogs for unit cost case and population affected objective function. The study’s title indicates how abstruse its formula is for generating the ranking (it seems to be based on number of links to and from the blog), and, as Bora Zivkovic notes, the analysis is based on old data. Still, I was heartened to see that several blogs by academics were included in the Top 100, including ahistoricality, The Volokh Conspiracy, Science Blogs and See Also. Examples of lively group academic blogs include PEA Soup (Philosophy, Ethics and Academia) and Cliopatria; MIT’s Comparative Media Studies program even highlights blog entries by its members on its home page.

As an avid reader of blogs (mainly focused on librarianship, digital humanities, and new technologies to make academics productive, like by reading more blog entries), I believe that blogging is an appropriate medium for academics. Granted, the only peer review comes through the comments that readers leave, and entries typically lack the polish and formality of an academic essay. Still, to rehearse by-now familiar arguments, blogging allows scholars to share cutting-edge research and to engage the community in discussion. I find that I typically have more “a-ha” moments when I read blogs than when I read other academic publications–perhaps because the shorter form of the blog is easier to digest (especially for someone like me, whose attention span has, I fear, been reduced by hopscotching from link to link), but also because blogs push out new information so quickly and enable commenting and linking.

So why am I blogging? In a sense, I regard blogging as a sort of virtual dissertation group. I finished my dissertation in large part because of the support of my diss group. Thanks to the group, I had deadlines, regular feedback on my work, and a community that was invested in me finishing. I hope that blogging will be a sort of virtual dissertation group–that is, I hope that blogging will force me to write often and express myself coherently, and that I will exchange ideas with and learn from other folks interested in digital humanities, book history, American literature and culture, etc. Since I am studying digital scholarship by attempting to do digital scholarship, I figure I should experiment with blogging, an important mode of digital discourse–and, in some cases, a form of digital scholarship. I want to make my research process transparent and reflect on the different tools, collections, and methods I’m using, and blogging seems like the best medium for frequent, open reflection. Of course, I also hope to contribute to the conversation about digital scholarship. I plan to re-work some of the ideas put forward in this blog in longer publications.

Doing Digital Scholarship

Here is a description of my research project, which I submitted as a paper proposal for Digital Humanities 2008:

Doing Digital Scholarship


When I completed my dissertation Bachelors of Arts: Bachelorhood and the Construction of Literary Identity in Antebellum America in 2002, I figured that I was ahead of most of my peers in my use of digital resources, but I made no pretense of doing digital scholarship. I plumbed electronic text collections such as The Making of America and Early American Fiction for references to bachelorhood, and I used simple text analysis tools to count the number of times words such as “bachelor” appeared in key texts. I even built an online critical edition of a section from Reveries of a Bachelor (http://etext.virginia.edu/users/spiro/Contents2.html), one of the central texts of sentimental bachelorhood. But in my determination to finish my PhD before gathering too many more gray hairs, I resisted the impulse to use more sophisticated analytical tools or to publish my dissertation online.

Five years later, the possibilities for digital scholarship in the humanities have grown. Projects such as TAPOR, Token-X, and MONK are constructing sophisticated tools for text analysis and visualization. Massive text digitization projects such as Google Books and the Open Content Alliance are making it possible to search thousands of books. NINES and other initiatives are building communities of digital humanities scholars, portals to content, and mechanisms for conducting peer review of digital scholarship. To encourage digital scholarship, the NEH recently launched a digital humanities program. Meanwhile, scholars are blogging, putting up videos on YouTube, and using Web 2.0 tools to collaborate.

Despite this growth, there are still too few examples of innovative digital scholarship that employ “digital collections and analytical tools to generate new intellectual products” (ACLS 7). As reports such as A Kaleidoscope of Digital American Literature and Our Cultural Commonwealth suggest, the paucity of digital scholarship results from the lack of appropriate tools, technical skills, funding, and recognition. In a study of Dickinson, Whitman and Uncle Tom’s Cabin scholars, my colleague Jane Segal and I found that although scholars are increasingly using digital resources in their research, they are essentially employing them to make traditional research practices more efficient and explore questions opened up by access to resources such as all of the editions of Whitman’s poetry or film versions of Uncle Tom’s Cabin, not (yet) to transform their research methodology by employing new tools and processes. Although we recognize the potential of digital scholarship, we have an incomplete understanding of what forms it can take and what is required to produce it. What does it mean to do humanities research in a Web 2.0 world? To what extent do existing tools, resources, and research methods support digital scholarship, and what else do scholars need?

To investigate these questions, I am revisiting my dissertation to re-imagine and re-mix it as digital scholarship. I aim not only to open up new insights into my primary research area–the significance of bachelorhood in nineteenth-century American culture–but also to document and analyze emerging methods for conducting research in a digital environment. I am structuring my research based on what John Unsworth calls the “scholarly primitives,” or core research practices in the humanities:

1. Discovering: I am conducting detailed searches of a wide range of both open access and subscription-based databases and web sites, documenting in a freely-available Google Spreadsheet (http://spreadsheets.google.com/pub?key=pAlYM7vZmTy_U6JhpvPAXFA) what searches and databases yield the best results and what gaps in digital resources persist. I am also noting how easy it is to retrieve, organize and analyze information provided in these databases.

2. Annotating: In the past, I kept research notes in long, unwieldy Word documents, which made it hard to find information that I needed. New software such as Zotero enables researchers to store copies of the digital resources and to make annotations as part of the metadata record or even on the web page itself. What effect does the ability to share and annotate resources have on research practices? How useful is tagging as a mechanism for organizing information?

3. Comparing: Through text analysis and collation software such as Juxta and TAPOR, scholars can compare different versions of texts and detect patterns. Likewise, the Virtual Lightbox allows researchers to compare and manipulate digital images. What kind of new insights can be generated by using these tools? In the course of doing my research, I am testing freely available tools and evaluating their usefulness for my project.

4. Referring: With hypertext, we can not only refer to prior work, but link to it, even embed it. What is the best means for constructing a scholarly apparatus in digital scholarship, particularly in a work focused not only on making an argument, but also on examining the process that shaped that argument?

5. Sampling: With so much information available, what criteria should we use to determine what to focus on? Since not everything is digitized and search engines can be blunt instruments, what do we ignore by relying mainly on digital resources? How can text mining tools help us to locate and understand relevant resources? I am documenting the selection criteria used to produce the arguments in my revamped dissertation.

6. Illustrating: What kind of evidence do we use to build an argument in a work of digital scholarship, and how is that evidence presented? In my dissertation, I generalized about the significance of bachelorhood in American literature by performing close readings of a few key texts, but such a method was admittedly unsystematic. By using text analysis tools across a much larger sample of primary texts, I can cite statistics and present visualizations in making my argument–but does this make my argument any more convincing?

7. Representing: How should a work of digital scholarship be presented? Ideally readers would be able to examine the evidence for themselves and even perform their own queries. At the same time, information must be offered in a way that is clear and consistent with familiar academic discourse. How should I make available not only research conclusions, but also the detailed research process that undergirds these conclusions–the successful and unsuccessful searches, the queries run in text analysis software, the insights offered by collaborators? How will the digital work compare to the more traditional original dissertation? What kind of tools will be used to author the work, and who will publish it?

In addition to Unsworth’s list, I offer two more:

8. Collaborating: Although humanities scholars are thought to be solitary, they collaborate frequently by exchanging bibliographic references and drafts of their essays. How do I engage the community in my research? I am encouraging others to comment on my (re-) work in progress (http://digitalhumanities.edublogs.org/) using the Comment Press software. Moreover, I am bookmarking all web-based sources for my study through delicious (http://del.icio.us/lms4w/SpiroDigitalScholarship). I have also launched a blog where I explore issues and ideas raised by my research (https://digitalscholarship.wordpress.com/), and I am examining what it takes to build an audience and how visibility and collaboration affect my research practices.

9. Remixing: What would it mean to take an earlier work–my own dissertation, for example–use new sources and approaches, and present it in a new form? What constitutes a scholarly remix, and what are the implications for intellectual property and academic ethics? I also plan to experiment with mashups as a means of generating and presenting new insights, such as a Google Map plotting census statistics about antebellum bachelors or a visual mashup of images of bachelors.

This project examines the process of doing research digitally, the capabilities and limits of existing tools and resources, and the best means of authoring, representing and disseminating digital scholarship. I aim to make this process as open, visible, and collaborative as possible. My approach is informed by the Visible Knowledge Project, which calls for teachers to be reflective in examining the impact of technology on their own teaching (http://crossroads.georgetown.edu/vkp/index.htm), as well as by studies of scholars’ research practices such as Scholarly Work in the Humanities and the Evolving Information Environment. I hope my research will yield insights about how to best serve scholars as well as a potential model for digital scholarship in the humanities.

Works Cited

American Council of Learned Societies. Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences. New York: American Council of Learned Societies, 2006.

Brockman, William, Laura Neumann, Carole Palmer, and Tanya Tidline. Scholarly Work in the Humanities and the Evolving Information Environment. Washington D.C.: Digital Library Federation. 2001. 15 May 2007 <http://www.clir.org/PUBS/reports/pub104/pub104.pdf>.

Brogan, Martha. A Kaleidoscope of Digital American Literature. Digital Library Federation. 2005. 22 May 2007 <http://www.diglib.org/pubs/dlf104/>.

Unsworth, John. “Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this?” 13 May 2000. 20 November 2007 <http://jefferson.village.virginia.edu/~jmu2m/Kings.5-00/primitives.html>.

Yikes! Hooray! Information Overload and Wright American Fiction

Following the publication of Ik Marvel’s Reveries of a Bachelor in 1850, Americans became entranced by the sentimental bachelor. When I investigated the bachelor in 19th C American fiction in the late 1990s, I relied on Lyle Wright’s extensive bibliography American Fiction, 1851-1875. Since it would have been way too labor intensive for me to skim each of the 3000 books in the bibliography for instances of bachelors, I instead looked for works that had “bachelor” in the title, and came up with a reasonable sample of bachelor fiction. In 2003 (?), the Committee on Institutional Cooperation (CIC), led by Indiana University, digitized almost 3000 works in the bibliography, providing an immense service to the study of 19th C American literature. When I searched for “bachelor” in Wright today, I came up with 4267 matches in 1161 records, which represents over 1/3 of the works in the bibliography. Suddenly American fiction in the years immediately after Reveries of a Bachelor is opened up. Maybe I could write an article examining the impact of Reveries on American literature, looking for mentions of “Ik Marvel” and for narrators mimicking his sentimental voice.

Now comes the problem of how to deal with this plenty. Should I just look at works that have the term bachelor at least, say, 3 times? Of course, I could do a more focused search, but then I might miss some important stuff. I could look at all 1161 works to ask larger questions about the bachelor in American lit. What if I categorized the different valences associated with “bachelor” or the types of works (sentimental, sensational, etc) in which bachelors play an important role? What if I used text mining/ visualization tools to look for patterns? Such tasks become possible when everything is available digitally.
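
To make the “at least 3 times” idea concrete, here is a rough sketch of that kind of filter in Python. It assumes you already have the plain text of each work in hand, which Wright's search interface does not give you directly; the function names, the work titles, and the sample texts are all invented for illustration. The pattern counts “bachelor” and “bachelors” as whole words, so “bachelorhood” is deliberately left out.

```python
import re

def count_term(text, term):
    """Count case-insensitive whole-word matches of term or its plain plural."""
    pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
    return len(pattern.findall(text))

def works_mentioning(works, term, min_hits=3):
    """Return {title: hit count} for works that use term at least min_hits times."""
    hits = {title: count_term(text, term) for title, text in works.items()}
    return {title: n for title, n in hits.items() if n >= min_hits}

# Invented mini-corpus; real input would be the full text of each novel.
works = {
    "Work A": "A bachelor, two bachelors, and a Bachelor uncle.",
    "Work B": "One bachelor only; the rest is bachelorhood.",
}
selected = works_mentioning(works, "bachelor")
```

A cutoff like this is blunt: a novel might mention the word once in a pivotal scene and get dropped, which is the “might miss some important stuff” worry in a different guise. Lowering `min_hits` trades a shorter reading list for fewer omissions.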

Yet handling so many search results also requires lots of time and patience, as well as a clear purpose. Unfortunately, Zotero does not yet have a translator for Wright, and I wasn’t able to import the XML-based bibliographic records generated by Wright into Zotero, so I had to spend about 3-5 minutes to capture bibliographic info for each of the 6 records that had “bachelor” in the title in Zotero. Some of the works in Wright only offer page images (although I think you can also see the dirty OCR behind them), which would seem to make them much more difficult to analyze and manipulate.

So while I’m thrilled that Wright will make it possible for me to analyze so many more works of bachelor fiction, I’m also feeling overwhelmed, unsure how to manage all of this information. Time for a reverie…

Collaboration Tools: Scary or Revelatory?

Words like community bring out the warm fuzzies in me, until I start to think about the complexities of actually working in a community and negotiating among different perspectives. I’m wondering what scholarship might mean in an intensely collaborative and social Web 2.0 environment, so I began by trying to wikify my research project (morphing a dissertation on bachelors into digital scholarship). As I opened up my writing to others, I had two big fears:

  1. No one will participate, so there won’t be much of an experiment in community at all. Still, I will be able to view my own revision history and gain some understanding about how going digital is changing my research. (This seems like the more likely prospect.)
  2. People will participate, and I’ll lose control over my work. Do I really want people rewriting my research project? Will the ideas still be mine? What if vandals rampage my writing?

I decided that a system that allows comments rather than complete re-writing was more appropriate, so I decided to use the Institute for the Future of the Book’s Comment Press application, an add-on to the Word Press blogging platform. Several other collaborative authoring projects are using Comment Press, including The Future of Learning Institutions in a Digital Age with HASTAC (Humanities, Arts, Science, and Technology Advanced Collaboratory), GAM3R 7H30RY 1.1 by McKenzie Wark, and The Iraq Study Group Report with Lapham’s Quarterly.

Welcome to Digital Scholarship in the Humanities

As someone with a strong interest in the digital humanities, I’ve been excited by recent reports calling for more support for digital scholarship, such as the ACLS Cyberinfrastructure report. At the same time, I’m aware that digital tools such as text analysis software have not yet been widely adopted by humanities scholars. I wonder: 1) what impact are digital tools and resources having on mainstream humanities scholarship? And 2) what would it take to produce solid digital scholarship, which I define (rather fuzzily, I admit) as scholarship that uses digital tools and resources in innovative ways, or experiments with new modes of presenting scholarly arguments? (I’ll devote a future blog post to coming up with a better definition.)

With my colleague Jane Segal, I’ve been investigating the first question by seeing how many Dickinson, Whitman and Uncle Tom’s Cabin scholars cite the leading thematic digital research collections the Dickinson Electronic Archives, Walt Whitman Archive, and Uncle Tom’s Cabin and American Culture. We’ve found that although few scholars cite digital collections, many use them, primarily to access unique resources (such as images of Whitman’s manuscript pages or film versions of Uncle Tom’s Cabin) and use search tools.

Now I’m turning my attention to question #2, examining what it takes to produce digital scholarship by trying my own hand at it. I finished my dissertation five years ago, and for five years I’ve been avoiding looking at the &*%^! thing. But now I’d like to remix my dissertation, which examines bachelorhood in nineteenth-century American literature and culture, as a work of digital scholarship. I plan to dive into all of the new electronic resources that have come online since 2002, experiment with text analysis and visualization tools, use bibliographic software to organize my research, and make my whole research process as well-documented and transparent as possible by blogging it, sharing bookmarks, tracking my research in a freely available spreadsheet, etc. I may even make some bachelor mashups–Google Maps showing where all the nineteenth century bachelors lived (sure to get picked up by dating sites), Flickr mashups of classic bachelor images (sure to get picked up by hot or not?). I also hope to engage the community–digital humanists, Americanists, digital librarians, banjo players, and whoever else is interested–in this work. At this point, I have many, many questions, a couple of hunches, and no answers. My project may appear to be a bit of navel-gazing (researching how a researcher researches by researching myself), but my hope is that I’ll have a much better understanding of how research can be conducted in a digital environment by doing it myself. Along the way, I plan to learn about new tools and methods. I expect to stumble, too, but that’ll be interesting and worth reporting on.