Work Product Blog

Matt Wilkens, a post-doctoral fellow at Rice’s Humanities Research Center, recently launched Work Product, a blog that chronicles his research in digital humanities, contemporary fiction, and literary theory.  Matt details the challenges he faces as he uses text mining to analyze the relationship between allegory and revolution, such as:
•    Where and how to get large literary corpora: Matt looks at how much content is available through Project Gutenberg, the Open Content Alliance, Google Books, and HathiTrust, and how difficult it is to access.
•    Evaluating part-of-speech taggers, with information about speed and accuracy (a rough sketch of this kind of test follows this list)
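
For readers curious what such an evaluation involves, here is a minimal sketch in Python using NLTK; it illustrates the general approach rather than Matt’s actual setup, and it assumes NLTK is installed with its “treebank” sample corpus and default tagger model downloaded.

    # Time a part-of-speech tagger and score it against gold-standard tags.
    # Assumes nltk.download("treebank") and nltk.download("averaged_perceptron_tagger").
    import time
    import nltk
    from nltk.corpus import treebank

    gold_sents = treebank.tagged_sents()[:200]                 # small gold-standard sample
    plain_sents = [[word for word, _ in sent] for sent in gold_sents]

    start = time.perf_counter()
    predicted = [nltk.pos_tag(sent) for sent in plain_sents]   # NLTK's default tagger
    elapsed = time.perf_counter() - start

    pairs = [(p_tag, g_tag)
             for pred, gold in zip(predicted, gold_sents)
             for (_, p_tag), (_, g_tag) in zip(pred, gold)]
    accuracy = sum(p == g for p, g in pairs) / len(pairs)

    print(f"Tagged {len(pairs)} tokens in {elapsed:.2f}s; accuracy: {accuracy:.3f}")

The same loop, pointed at different taggers, yields the kind of speed and accuracy comparison Matt describes.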

I think that other researchers working on text mining projects will benefit from Matt’s careful documentation of his process.

By the way, Matt’s blog can be thought of as part of the movement called “open notebook science,” which Jean-Claude Bradley defines as “a laboratory notebook… that is freely available and indexed on common search engines.”  Other humanities and social sciences blogs that are likewise ongoing explorations of particular research projects include Wesley Raabe’s blog, Another Anthro Blog, and Erkan’s Field Diary.  (Please alert me to others!)

More Digital Humanities Jobs

To follow up on my last post about digital humanities jobs, I wanted to mention one more that has come to my attention recently:

  • Research Associate, John Nicholas Brown Center for Public Humanities and Cultural Heritage at Brown University: “The John Nicholas Brown Center for Public Humanities and Cultural Heritage at Brown University has an opening for a research associate in the technologies of the public humanities. We’re looking for a person with experience in the use of technologies in museums, documentation, and the web, and with new ideas about how humanities scholars and the public might communicate using technologies in innovative ways. Help us answer questions including: What are the most successful digital humanities projects, and why? What does the field need? What should students in the public humanities know about technology? What technologies should be available for faculty and students interested in outreach to a public and in opening conversations with communities?

    The research associate position is available for one or two semesters, starting in July 2009, and may be extended. The research associate will be expected to organize at least one public program during their tenure, as well as to conduct research and participate in the activities of the John Nicholas Brown Center. Among the outcomes of the position might be:
    a conference on the topic; a series of workshops for universities, museums and cultural organizations; a course for Brown students; an exhibition space designed to allow experimentation with new technologies; new web tools for the public humanities; or a grant proposal outlining a new digital humanities initiative.

    We welcome proposals for this position. The ideal applicant will have expertise and prior experience working in the technology of public humanities, and an interest in working within an interdisciplinary and public context. A Ph.D. is preferred but not required. Please respond by January 31, 2009, with a c.v. and a 3-5 page proposal outlining your ideas for your tenure in the position.

    More information on the John Nicholas Brown Center is available at www.brown.edu/JNBC

By the way, Digital Arts & Humanities is a great place to look for job postings.

Digital Humanities Jobs

A professor who has been gently mocking my interest in digital humanities now thinks there may be something to it, since a number of job postings that mention digital humanities appear on this year’s MLA job list.  Yes indeed, it does seem that some exciting DH job postings have been popping up over the past few months, including:

Some job ads mention digital humanities as a desired area of specialty or suggest that the successful applicant could participate in the digital humanities program, e.g.

So is digital humanities emerging as a hot new field?  Well, maybe–but whereas a search for “digital humanities” at the Chronicle of Higher Ed’s careers site brings up 5 results (clearly not every open job in DH), a search for “transnational” yields 40 results, and “cultural studies” 28.  Still, it seems that there is a wider range of opportunities in the digital humanities.  Most of the jobs in, say, transnational studies are faculty posts, whereas we see digital humanities jobs in libraries, humanities centers, and academic computing departments as well as in academic departments. (Tom Scheinfeldt recently wrote a great post about the need to establish employment models for non-tenure-track researchers and developers working on digital humanities projects at universities.)

Perhaps one indicator of DH’s increasing visibility is the push-back against it.  In his jeremiad about the “trendism” of the MLA job list as a sign of the decline of literary studies, William Deresiewicz declares, “There are postings here for positions in science fiction, in fantasy literature, in children’s literature, even in something called ‘digital humanities.'”  In a recent online forum hosted by the Chronicle of Higher Education, a contributor noted that several colleagues work on digital humanities and that “I think it’s safe to say that the vast majority of us in the department have absolutely no idea what they do or what they’re talking about when they try to explain it. In fact, we are not sure they understand what they do. However, it seems to be very sexy and attracts a lot of grant funding. I can’t help wondering if it’s just a fad and will die out soon.” But maybe having colleagues who don’t understand what you do is a sign of an emerging field.  Anyway, when I asked a group of “traditional” humanities professors recently if they thought digital humanities was just a fad, they responded emphatically that it was not, arguing that information increasingly is in a digital format and that scholars need to understand how to work in the digital environment. I agree.

So what skills should an aspiring digital humanist cultivate?  When I started working at Virginia’s Electronic Text Center way back in the 1990s, David Seaman, the director, told me that he viewed an understanding of the humanities as being most important, since most people can pick up the technical skills much more easily than they can the disciplinary knowledge.  That makes sense to me, although technical skills are also important.  Of course, the requirements for each position differ, particularly when you’re comparing library or IT positions to faculty positions.  However, many employers seem to emphasize a similar set of skills:

  • strong humanities background
  • understanding of the research process and emerging technologies for humanities research (data mining, visualization, mashups, social networking, etc)
  • strong written and oral communication skills
  • knowledge of XML (e.g. TEI), XSLT, and related technologies
  • ability to work well on a team
  • database design and development skills
  • web development skills (PHP, CSS, etc)
  • programming & scripting skills
  • project management experience
  • experience with user-centered design

Link of the Day: Visualization Periodic Table

Note: Although I’ve become enamored of the idea of slow blogging, where meditation trumps speed and frequency, I’ve also been feeling guilty about my own absence from blogging.  (I’ve got plenty of excuses–a hurricane, a pile-up of presentations and papers, etc–but I won’t bore you with them.)  In the hopes of becoming a more active blogger, I’ve decided to launch a new feature: link of the day (which may turn out to be more like link of the week or fortnight), a quick discussion of something that has caught my interest.

Today’s link of the day (LOD): A Periodic Table of Visualization Methods. Interest in visualization seems to be growing in digital humanities as scholars look for ways to make sense of large data sets.  This interactive chart lists dozens of different approaches to visualization, including histogram, scatterplot, timeline, square of opposition, infomural, heaven ‘n hell chart, and strategic game board.  The periodic table’s creators, Ralph Lengler & Martin J. Eppler of the Institute of Corporate Communication, are part of a group creating an e-learning course on visual literacy for business, communications and engineering.  They group the visualizations into 6 main categories: data visualization, information visualization, concept visualization, strategy visualization, metaphor visualization, and compound visualization; each column is arranged from least to most complex.  To see an example of each visualization method, click on a cell to open a pop-up window.  As Lengler and Eppler explain in a recent paper, “Towards A Periodic Table of Visualization Methods for Management,” the table serves as a “structured toolbox” from which users can select visualizations suited to different tasks.  Although the table is missing textual visualizations such as tag clouds, I found this to be a useful learning tool.  (And, well, just cool.)  Pair this web page with an exploration of Many Eyes and you have some great interactive resources for humanities students to learn about visualization.

Tips on Writing a Successful Grant Proposal

The NEH recently announced deadlines for several digital humanities programs, including the NEH Fellowships at Digital Humanities Centers (Sep. 15), Digital Humanities Start-Up Grants (Oct. 8), and the DFG/NEH Joint Digitization Program (Oct. 15).  So how do you win one of these grants?  I’ve had the honor and privilege (really, I mean it) of serving on several review panels, which has given me insight into what sets excellent proposals apart.  (Nope, I’m not going to say which panels I served on—let’s say if you won a grant, I was on the panel, and if you didn’t, I wasn’t.)

Before serving on a grant review panel, I sort of pictured it as a smoke-filled room where fat-cats chomping on big cigars exercised all their political might to get pet projects funded.  (OK, not really–but it was mysterious.) But the process is nothing like that—no smoke, no posturing, no arm-twisting.  Instead, the NEH brings together 5 or so experts in the field—often directors of digital humanities centers, faculty who have led digital projects, and others who have both subject knowledge in the humanities and expertise in technology—to evaluate the proposals.  Prior to coming to DC to serve on the panel, the panelists review each proposal and make detailed comments, using the grant guidelines as a rubric.  Panelists rank each proposal as “excellent,” “very good,” “some merit,” or “not recommended for funding.”  Typically I read each proposal three times: first I give all of the proposals a quick read to get a sense of the whole, then I read more slowly to develop a more detailed understanding of each one, and finally I skim as I write up my comments.  The panel itself typically begins with an NEH official explaining the review process, including the conflict of interest rules.  Then panelists discuss each proposal, beginning with the ones rated most highly.  Each panelist provides his or her initial perspective on the proposal, which is followed by an open, respectful debate about its strengths and weaknesses.  Once the discussion is complete, each panelist offers his or her final ranking of the proposal.  I’m fascinated to hear the different perspectives offered by the other panelists; often I am persuaded to change my rankings based on the discussion.  At the end of an exhilarating and exhausting day, the NEH asks panelists for feedback on the proposal guidelines and the review process, demonstrating its commitment to improvement.

Based on my experience as a reviewer, I think I have insight into what makes a strong proposal.  I should say that I’ve never actually received an NEH grant, so take these suggestions with a grain of salt.

  • If you don’t receive a grant, don’t despair. On the panels on which I’ve served, only about 20% of the proposals get funded, which means that some very strong ones just don’t make it.  But you can always reapply, using the reviewers’ comments to strengthen your proposal.
  • Read the Guidelines: Make sure that your proposed project meets the criteria of the grant program.  Would it be better suited for another grant program?  In your narrative, address explicitly how you meet the review criteria–don’t make the reviewers guess.
  • Make an argument for funding your proposal: Don’t just say what you will do, but why it’s important to do it. What impact will your project have on the field, institution, or community?  How?  How is your proposal innovative?  Strong, relevant letters of support can help you make your argument about the proposal’s significance; it’s impressive when leading scholars testify to a project’s importance, but a stack of weak, generic letters can make a proposal seem, well, desperate.
  • Talk to the Program Officers: They’re there to help.  Often they will review a draft proposal prior to submission, provided that you get it to them at least 6 weeks in advance of the grant deadline.  I’m quite impressed by the staff of the Digital Humanities Office: they’re smart, knowledgeable, energetic, all-around good folks, the kind you would trust to lead one of the most visible funding programs in digital humanities.  In the review panels, they focus not on how weak a proposal is, but how they can help the applicant to make it better.
  • Show that you have technical knowledge: Digital humanities projects demand both sophisticated technical and subject knowledge.  Cite the appropriate standards and best practices and explain how you will apply them.
  • Focus. If you attempt to do too much, reviewers will wonder if you can pull it all off, and question what exactly it is you’re trying to do, anyway.
  • Be realistic. It’s always hard to figure out how long a project will take and how much everything will cost.  Talk to others who have done similar work to get a sense of what it will take to pull off your project.  In the work plan, offer a detailed description of what will be accomplished by what deadline and by whom.  Don’t over-promise; remember, if you win the grant, you’ll actually have to do what you said you would do.
  • Sweat the small stuff: Although reviewers focus on the substance of the proposal, a sloppy application can detract from the overall quality.  Proofread carefully to catch grammatical errors.  Think about the design of the document.  If I see huge margins and jumbo fonts, I wonder if the applicant is just trying to fill up space.
  • Ask to see the reviewers’ comments. Whether you’re successful or not, read the reviewers’ comments, which will likely be full of helpful suggestions about how to improve the project and application.  You’re getting free consulting from 5 or more experts in the field—take advantage of it.
  • Consider serving on a grant review panel. Sure, it’s a lot of work, but worth it. You do get a small stipend, but given that it takes about 3-4 hours to review and comment on each proposal and additional time to travel to DC and serve on the panel, the hourly pay probably works out to about $5 or $6.  But you get to serve the community, spend the day with smart colleagues talking about stuff that matters, and learn about what new ideas and projects are bubbling up.  Perhaps most importantly, I think I now have a better sense of what it takes to write a strong application.  As a bonus, sometimes you get your very own plate of chocolate—including Special Dark!–for an afternoon boost.

For a detailed, inside-the-NEH perspective on writing successful applications, see Meredith Hindley’s How to Get a Grant from NEH:  A public service message.
Good luck!

Is Wikipedia Becoming a Respectable Academic Source?

Last year a colleague in the English department described a conversation in which a friend revealed a dirty little secret: “I use Wikipedia all the time for my research—but I certainly wouldn’t cite it.”  This got me wondering: How many humanities and social sciences researchers are discussing, using, and citing Wikipedia?  To find out, I searched Project Muse and JSTOR, leading electronic journal collections for the humanities and social sciences, for the term “wikipedia,” which picked up both references to Wikipedia and citations of the Wikipedia URL.  I retrieved 167 results from between 2002 and 2008, all but 8 of which came from Project Muse.  (JSTOR covers more journals and a wider range of disciplines but does not provide access to issues published in the last 3-5 years.)  In contrast, Project Muse lists 149 results in a search for “Encyclopedia Britannica” between 2002 and 2008, and JSTOR lists 3.  I found that citations of Wikipedia have been increasing steadily: from 1 in 2002 (not surprisingly, by Yochai Benkler) to 17 in 2005 to 56 in 2007. So far Wikipedia has been cited 52 times in 2008, and it’s only August.

Along with the increasing number of citations, another indicator that Wikipedia may be gaining respectability is its citation by well-known scholars.  Indeed, several scholars both cite Wikipedia and are themselves subjects of Wikipedia entries, including Gayatri Spivak, Yochai Benkler, Hal Varian, Henry Jenkins, Jerome McGann, Lawrence Buell, and Donna Haraway.

111 of the sources (66.5%) are what I call “straight citations”—citations of Wikipedia without commentary about it—while 56 (33.5%) comment on Wikipedia as a source, either positively or negatively.  14.5% of the total citations come from literary studies, 14% from cultural studies, 11.4% from history, and 6.6% from law. Researchers cite Wikipedia on a diversity of topics, ranging from the military-industrial complex to horror films to Bush’s second state of the union speech.  8 use Wikipedia simply as a source for images (such as an advertisement for Yummy Mummy cereal or a diagram of the architecture of the Internet).  Many employ Wikipedia either as a source for information about contemporary culture or as a reflection of contemporary cultural opinion.  For instance, to illustrate how novels such as The Scarlet Letter and Uncle Tom’s Cabin have been sanctified as “Great American Novels,” Lawrence Buell cites the Wikipedia entry on “Great American Novel” (Buell).

About a third of the articles I looked at discuss the significance of Wikipedia itself.  14 (8%) criticize using it in research.  For instance, a reviewer of a biography about Robert E. Lee tsk-tsks:

The only curiosities are several references to Wikipedia for information that could (and should) have been easily obtained elsewhere (battle casualties, for example). Hopefully this does not portend a trend toward normalizing this unreliable source, the very thing Pryor decries in others’ work. (Margolies).

In contrast, 11 (6.6%) cite Wikipedia as a model for participatory culture.  For example:

The rise of the net offers a solution to the major impediment in the growth and complexification of the gift economy, that network of relationships where people come together to pursue public values. Wikipedia is one example. (DiZerega)

A few (1.8%) cite Wikipedia self-consciously, aware of its limitations but asserting its relevance for their particular project:

Citing Wikipedia is always dicey, but it is possible to cite a specific version of an entry. Start with the link here, because cybervandals have deleted the list on at least one occasion. For a reputable “permanent version” of “Alternative press (U.S. political right)” see: http://en.wikipedia.org/w/index.php?title=Alternative_press_%28U.S._political_right%29&oldid=107090129 (Berlet).

Of course, just because more researchers—including some prominent ones—are citing Wikipedia does not mean it’s necessarily a valid source for academic papers.  However, you can begin to see academic norms shifting as more scholars find useful information in Wikipedia and begin to cite it.  As Christine Borgman notes, “Scholarly documents achieve trustworthiness through a social process to assure readers that the document satisfies the quality norms of the field” (Borgman 84).  As a possible sign of academic norms changing in some disciplines, several journals, particularly those focused on contemporary culture, include 3 or more articles that reference Wikipedia: Advertising and Society Review (7 citations), American Quarterly (3 citations), College Literature (3 citations), Computer Music Journal (5 citations), Indiana Journal of Global Legal Studies (3 citations), Leonardo (8 citations), Library Trends (5 citations), Mediterranean Quarterly (3 citations), and Technology and Culture (3 citations).

So can Wikipedia be a reputable scholarly resource?  I typically see four main criticisms of Wikipedia:

1) Research projects shouldn’t rely upon encyclopedias. Even Jimmy Wales, (co?-)founder of Wikipedia, acknowledges “I still would say that an encyclopedia is just not the kind of thing you would reference as a source in an academic paper. Particularly not an encyclopedia that could change instantly and not have a final vetting process” (Young).  But an encyclopedia can be a valid starting point for research.  Indeed, The Craft of Research, a classic guide to research, advises that researchers consult reference works such as encyclopedias to gain general knowledge about a topic and discover related works (80).  Wikipedia covers topics often left out of traditional reference works, such as contemporary culture and technology.  Most if not all of the works I looked at used Wikipedia to offer a particular piece of background information, not as a foundation for their argument.

2) Since Wikipedia is constantly undergoing revisions, it is too unstable to cite; what you read and verified today might be gone tomorrow–or even in an hour.  True, but Wikipedia is developing the ability for a particular version of an entry to be vetted by experts and then frozen, so researchers could cite an authoritative, unchanging version (Young).  As the above citation from Berlet indicates, you can already provide a link to a specific version of an article.

3) You can’t trust Wikipedia because anyone—including folks with no expertise, strong biases, or malicious (or silly) intent—can contribute to it anonymously.  Yes, but through the back and forth between “passionate amateurs,” experts, and Wikipedia guardians protecting against vandals, good stuff often emerges. As Nicholson Baker, who has himself edited Wikipedia articles on topics such as Brooklyn Heights and the painter Emma Fordyce MacRae, notes in a delightful essay about Wikipedia, “Wikipedia was the point of convergence for the self-taught and the expensively educated. The cranks had to consort with the mainstreamers and hash it all out” (Baker).  As Roy Rosenzweig found in a detailed analysis of Wikipedia’s appropriateness for historical research, the quality of the collaboratively-produced Wikipedia entries can be uneven: certain topics are covered in greater detail than others, and the writing can have the choppy, flat quality of something composed by committee.  But Rosenzweig also concluded that Wikipedia compares favorably with Encarta and Encyclopedia Britannica for accuracy and coverage.

4) Wikipedia entries lack authority because there’s no peer review. Well, it depends on how you define “peer review.”  Granted, Wikipedia articles aren’t reviewed by two or three (typically anonymous) experts in the field, so they may lack the scholarly authority of an article published in an academic journal.  However, articles in Wikipedia can be reviewed and corrected by the entire community, including experts, knowledgeable amateurs, and others devoted to Wikipedia’s mission to develop, collect and disseminate educational content (as well as by vandals and fools, I’ll acknowledge).  Wikipedia entries aim to achieve what Wikipedians call “verifiability”; the article about Barack Obama, for instance, has as many footnotes as a law review article–171 at last count (August 31), including several from this week.

Now I’m certainly not saying that Wikipedia is always a good source for an academic work–there is some dreck in it, as in other sources.  Ultimately, I think Wikipedia’s appropriateness as an academic source depends on what is being cited and for what purpose.   Alan Liu offers students a sensible set of guidelines for the appropriate use of Wikipedia, noting that it, like other encyclopedias, can be a good starting point, but that it is “currently an uneven resource” and always in flux.  Instead of condemning Wikipedia outright, professors should help students develop what Henry Jenkins calls “new media literacies.”  By examining the history and discussion pages associated with each article, for instance, students can gain insight into how knowledge is created and how to evaluate a source.  As John Seely Brown and Richard Adler write:

The openness of Wikipedia is instructive in another way: by clicking on tabs that appear on every page, a user can easily review the history of any article as well as contributors’ ongoing discussion of and sometimes fierce debates around its content, which offer useful insights into the practices and standards of the community that is responsible for creating that entry in Wikipedia. (In some cases, Wikipedia articles start with initial contributions by passionate amateurs, followed by contributions from professional scholars/researchers who weigh in on the “final” versions. Here is where the contested part of the material becomes most usefully evident.) In this open environment, both the content and the process by which it is created are equally visible, thereby enabling a new kind of critical reading—almost a new form of literacy—that invites the reader to join in the consideration of what information is reliable and/or important. (Brown & Adler)

OK, maybe Wikipedia can be a legitimate source for student research papers–and furnish a way to teach research skills.  But should it be cited in scholarly publications?  In “A Note on Wikipedia as a Scholarly Source of Record,” part of the preface to Mechanisms, Matt Kirschenbaum offers a compelling explanation of why he cited Wikipedia, particularly when discussing technical documentation:

Information technology is among the most reliable content domains on Wikipedia, given the high interest of such topics to Wikipedia’s readership and the consequent scrutiny they tend to attract.  Moreover, the ability to examine page histories on Wikipedia allows a user to recover the editorial record of a particular entry… Attention to these editorial histories can help users exercise sound judgment as to whether or not the information before them at any given moment is controversial, and I have availed myself of that functionality when deciding whether or not to rely on Wikipedia. (Kirschenbaum xvii)

With Wikipedia, as with other sources, scholars should use critical judgment in analyzing its reliability and appropriateness for citation.  If scholars carefully evaluate a Wikipedia article’s accuracy, I don’t think there should be any shame in citing it.

For more information, review the Zotero report detailing all of the works citing Wikipedia, or take a look at a spreadsheet of basic bibliographic information. I’d be happy to share my bibliographic data with anyone who is interested.

Works Cited

Baker, Nicholson. “The Charms of Wikipedia.” The New York Review of Books 55.4 (2008). 30 Aug 2008 <http://www.nybooks.com/articles/21131>.

Berlet, Chip. “The Write Stuff: U.S. Serial Print Culture from Conservatives out to Neonazis.” Library Trends 56.3 (2008): 570-600. 24 Aug 2008 <http://muse.jhu.edu/journals/library_trends/v056/56.3berlet.html>.

Booth, Wayne C., and Gregory G. Colomb. The Craft of Research. Chicago: U of Chicago P, 2003.

Borgman, Christine L. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass.: MIT Press, 2007.

Brown, John Seely, and Richard P. Adler. “Minds on Fire: Open Education, the Long Tail, and Learning 2.0.” EDUCAUSE Review 43.1 (2008): 16-32. 29 Aug 2008 <http://connect.educause.edu/Library/EDUCAUSE+Review/MindsonFireOpenEducationt/45823?time=1220007552>.

Buell, Lawrence. “The Unkillable Dream of the Great American Novel: Moby-Dick as Test Case.” American Literary History 20.1 (2008): 132-155. 24 Aug 2008 <http://muse.jhu.edu/journals/american_literary_history/v020/20.1buell.pdf>.

Dee, Jonathan. “All the News That’s Fit to Print Out.” The New York Times 1 Jul 2007. 30 Aug 2008 <http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html>.

DiZerega, Gus. “Civil Society, Philanthropy, and Institutions of Care.” The Good Society 15.1 (2006): 43-50. 24 Aug 2008 <http://muse.jhu.edu/journals/good_society/v015/15.1diZerega.html>.

Jenkins, Henry. “What Wikipedia Can Teach Us About the New Media Literacies (Part One).” Confessions of an Aca/Fan 26 Jun 2007. 30 Aug 2008 <http://www.henryjenkins.org/2007/06/what_wikipedia_can_teach_us_ab.html>.

Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass.: MIT Press, 2008.

Liu, Alan. “Student Wikipedia Use Policy.” 1 Apr 2007. 30 Aug 2008 <http://www.english.ucsb.edu/faculty/ayliu/courses/wikipedia-policy.html>.

Margolies, Daniel S. “Robert E. Lee: Heroic, But Not the Polio Vaccine.” Reviews in American History 35.3 (2007): 385-392. 25 Aug 2008 <http://muse.jhu.edu/journals/reviews_in_american_history/v035/35.3margolies.html>.

Rosenzweig, Roy. “Can History Be Open Source? Wikipedia and the Future of the Past.” The Journal of American History 93.1 (2006): 117-46. <http://chnm.gmu.edu/resources/essays/d/42>.

Young, Jeffrey. “Wikipedia’s Co-Founder Wants to Make It More Useful to Academe.” Chronicle of Higher Education 13 Jun 2008. 28 Aug 2008 <http://chronicle.com/free/v54/i40/40a01801.htm?utm_source=at&utm_medium=en>.

Doing Digital Scholarship: Presentation at Digital Humanities 2008

Note:  Here is roughly what I said during my presentation at Digital Humanities 2008 in Oulu, Finland (or at least meant to say—I was so sleep-deprived thanks to the unceasing sunshine that I’m not sure what I actually did say).  My session, which explored the meaning and significance of “digital humanities,” also featured rich, engaging presentations by Edward Vanhoutte on the history of humanities computing and John Walsh on comparing alchemy and digital humanities.  My presentation reports on my project to remix my dissertation as a work of digital scholarship and synthesizes many of my earlier blog posts to offer a sort of Reader’s Digest condensed version of my blog for the past 7 months. By the way, sorry that I’ve been away from the blog for so long.  I’ve spent the last month and a half researching and writing a 100-page report on archival management software, reviewing essays, performing various other professional duties, and going on both a family vacation to San Antonio and a grown-up vacation to Portland, OR (vegan meals followed up by Cap’n Crunch donuts; it took me a week to recover from the donut hangover).  In the meantime, lots of ideas have been brewing, so expect many new blog entries soon.

***

When I began working on my dissertation in the mid-1990s, I used a computer primarily to do word processing—and goof off with Tetris.  Although I used digital collections such as Early American Fiction and Making of America for my dissertation project on bachelorhood in 19th century American literature, I did much of my research the old-fashioned way: flipping through the yellowing pages of 19th century periodicals on the hunt for references to bachelors, taking notes using my lucky leaky fountain pen.  I relied on books for my research and, in the end, produced a book.

At the same time that I was dissertating, I was also becoming enthralled by the potential of digital scholarship through my work at the University of Virginia’s (late lamented) Electronic Text Center.  I produced an electronic edition of the first section from Donald Grant Mitchell’s bestseller Reveries of a Bachelor that allowed readers to toggle between variants.   I even convinced my department to count Perl as a second language, citing the Matt Kirschenbaum precedent (“come on, you let Matt do it, and look how well that turned out”) and the value of computer languages to my profession as a budding digital humanist.  However, I decided not to create an electronic version of my dissertation (beyond a carefully backed-up Word file) or to use computational methods in doing my research, since I wanted to finish the darn thing before I reached retirement age.

Last year, five years after I received my PhD and seven years after I had become the director of Rice University’s Digital Media Center, I was pondering the potential of digital humanities, especially given mass digitization projects and the emergence of tools such as TAPOR and Zotero.  I wondered: What is digital scholarship, anyway?  What does it take to produce digital scholarship? What kind of digital resources and tools are available to support it? To what extent do these resources and tools enable us to do research more productively and creatively? What new questions do these tools and resources enable us to ask? What’s challenging about producing digital scholarship? What happens when scholars share research openly through blogs, institutional repositories, & other means?

I decided to investigate these questions by remixing my 2002 dissertation as a work of digital scholarship.  Now I’ll acknowledge that my study is not exactly scientific—it rests on a rather subjective sample of one.  However, I figured, somewhat pragmatically, that the best way for me to understand what digital scholars face was to do the work myself.  I set some loose guidelines: I would rely on digital collections as much as possible and would experiment with tools for analyzing, annotating, organizing, comparing and visualizing digital information.  I would also explore different ways of representing my ideas, such as hypertextual essays and videos.  Embracing social scholarship, I would do my best to share my work openly and make my research process transparent.  So that the project would be fun and evolve organically, I decided to follow my curiosity wherever it led me, imagining that I would end up with a series of essays on bachelorhood in 19th century American culture and, as sort of an exoskeleton, meta-reflections on the process of producing digital scholarship.

My first challenge was defining digital scholarship.  The ACLS Commission on Cyberinfrastructure’s report points to five manifestations of digital scholarship: collection building, tools to support collection building, tools to support analysis, using tools and collections to produce “new intellectual products,” and authoring tools.   Some might argue we shouldn’t really count tool and collection building as scholarship.  I’ll engage with this question in more detail in a future post, but for now let me say that most consider critical editions, bibliographies, dictionaries and collations, arguably the collections and tools of the pre-digital era, to be scholarship.  In many cases, building academic tools and collections requires significant research and expertise and results in the creation of knowledge—so, scholarship.   Still, my primary focus is on the fourth aspect, leveraging digital resources and tools to produce new arguments.  I’m realizing along the way, though, that I may need to build my own personal collections and develop my own analytical tools to do the kind of scholarship I want to do.

In a recent presentation at CNI, Tara McPherson, the editor of Vectors, offered her own “Typology of Digital Humanities”:
•    The Computing Humanities: focused on building tools, infrastructure, standards and collections, e.g. The Blake Archive
•    The Blogging Humanities: networked, peer-to-peer, e.g. Crooked Timber
•    The Multimodal Humanities: “bring together databases, scholarly tools, networked writing, and peer-to-peer commentary while also leveraging the potential of the visual and aural media that so dominate contemporary life,” e.g. Vectors

Mashing up these two frameworks, my own typology would look something like this:

•    Tools, e.g. TAPOR, Zotero
•    Collections, e.g. The Blake Archive
•    Theories, e.g. McGann’s Radiant Textuality
•    Interpretations and arguments that leverage digital collections and tools, e.g. Ayers and Thomas’ The Difference Slavery Made
•    Networked Scholarship: a term that I borrow from the Institute for the Future of the Book’s Bob Stein and that I prefer to “blogging humanities,” since it encompasses many modes of communication, such as wikis, social bookmarking, institutional repositories, etc. Examples include Savage Minds (a group blog in anthropology), etc.
•    Multimodal scholarship: e.g. scholarly hypertexts and videos, e.g. what you might find in Vectors
•    Digital cultural studies, e.g. game studies, Lev Manovich’s work, etc (this category overlaps with theories)

Initially I assumed that tools, theories and collections would feed into arguments that would be expressed as networked and/or multimodal scholarship and be informed by digital cultural studies.  But I think that describing digital scholarship as a sort of assembly line in which scholars use tools, collections and theories to produce arguments oversimplifies the process.  My initial diagram of digital scholarship pictured single-headed arrows linking different approaches to digital scholarship; my revised diagram looks more like spaghetti, with arrows going all over the place.  Theories inform collection building; the process of blogging helps to shape an argument; how a scholar wants to communicate an idea influences what tools are selected and how they are used.

After coming up with a preliminary definition of what I wanted to do, I needed to figure out how to structure my work.  I thought of John Unsworth’s notion of scholarly primitives, a compelling description of core research practices.  Depending on how you count them, Unsworth identifies 7 scholarly primitives:
•    Discovering
•    Annotating
•    Comparing
•    Referring
•    Sampling
•    Illustrating
•    Representing

As useful as this list is in crystallizing what scholars do, I think the list is missing at least one more crucial scholarly primitive, perhaps the fundamental one: collaboration. Although humanists are stereotyped as solitary scholars isolated in the library, they often work together, whether through co-editing journals or books, sharing citations, or reviewing one another’s work.  In the digital humanities, of course, developing tools, standards, and collections demands collaboration among scholars, librarians, programmers, etc.  I would also define networked scholarship—blogging, contributing to wikis, etc—as collaborative, since it requires openly sharing ideas and supports conversation. It’s only appropriate for me to note that this idea was worked out collaboratively, with colleagues at THAT Camp.

I want to make my research process as visible as possible, not only for idealistic reasons, but also because my work only gets better the more feedback I receive.  So I started up a blog—actually, several of them. At the somewhat grandly-named Digital Scholarship in the Humanities, I reflect on trends in the digital humanities and on broader lessons learned in the process of doing my research project.  In “Lisa Spiro’s Research Notes,”  I typically address stuff that seems too specialized, half-baked, or even raw for me to put on my main blog, such as my navel gazing on where to take my project next, or my experiments with Open Wound, a language re-mixing tool.   At my PageFlakes research portal, I provide a single portal to the various parts of my research project, offering RSS feeds for both of my blogs as well as for a Google News search of the term “digital humanities,” my delicious bookmarks for “digital scholarship,” links to my various digital humanities projects, and more.

I’ll admit that when I started my experiments with social scholarship I worried that no one would care, or that I would embarrass myself by writing something really stupid, but so far I’ve loved the experience.  Through comments and emails from readers, I’m able to see other perspectives and improve my own thinking.  I’ve heard from biologists and anthropologists as well as literary scholars and historians, and I’ve communicated with researchers from several countries.  As a result, I feel more engaged in the research community and more motivated to keep working.   Although I know blogging hasn’t caught on in every corner of academia, I think it has been good for my career as a digital humanist.  I am more visible and thus have more opportunities to participate in the community, such as by reviewing book proposals, articles, and grant applications.

I don’t have space to discuss the relevance of each scholarly primitive to my project, but I did want to mention a few of them: discovering, comparing, and representing.

Discovering

In order to use text analysis and other tools, I needed my research materials to be in an electronic format.  In the age of mass digitization projects such as Google Books and the Open Content Alliance, I wondered how many of my 296 original research sources are digitized & available in full text.  So I diligently searched Google Books and several other sources to find out.  I looked at 5 categories: archival resources as well as primary and secondary books and journals.  I found that with the exception of archival materials, over 90% of the materials I cited in my bibliography are in a digital format.  However, only about 83% of primary resources and 37% of the secondary materials are available as full text.  If you want to use text analysis tools on 19th century American novels or 20th century articles from major humanities journals, you’re in luck, but the other stuff is trickier because of copyright constraints.  (I’ll throw in another scholarly primitive, annotation, and say that I use Zotero to manage and annotate my research collections, which has made me much more efficient and allowed me to see patterns in my research collections.)

Of course, scholars need to be able to trust the authority of electronic resources.  To evaluate quality, I focused on four collections that have a lot of content in my field, 19th century American literature: Google Books, Open Content Alliance, Early American Fiction (a commercial database developed by UVA’s Electronic Text Center), and Making of America.  I found that there were some scanning errors with Google Books, but not as many as I expected. I wished that Google Books provided full text rather than PDF files of its public domain content, as do Open Content Alliance and Making of America (and EAF, if you just download the HTML).  I had to convert Google’s PDF files to Adobe Tagged Text XML and got disappointing results.  The OCR quality for Open Content Alliance was better, but words were not joined across line breaks, reducing accuracy.  With multi-volume works, neither Open Content Alliance nor Google Books provided very good metadata.  Still, I’m enough of a pragmatist to think that having access to this kind of data will enable us to conduct research across a much wider range of materials and use sophisticated tools to discern patterns – we just need to be aware of the limitations.
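
As an illustration of the kind of cleanup such data can require, here is a minimal sketch (my own assumed approach, not anything these collections provide) for rejoining words that OCR has left split across line breaks before running word counts:

    import re

    def rejoin_hyphenated_linebreaks(text: str) -> str:
        """Rejoin words that OCR left split across line breaks, e.g. 'senti-\\nmental'."""
        return re.sub(r"(\w+)-\s*\r?\n\s*(\w+)", r"\1\2\n", text)

    print(rejoin_hyphenated_linebreaks("a senti-\nmental reverie"))
    # prints "a sentimental" with " reverie" on the following line

A real pass would also need to guard against genuinely hyphenated compounds, for instance by checking the joined form against a word list before merging.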

Comparing
To evaluate the power of text analysis tools for my project, I did some experiments using TAPOR tools, including a comparison of two of my key bachelor texts: Mitchell’s Reveries of a Bachelor, a series of a bachelor’s sentimental dreams (sometimes nightmares) about what it would be like to be married, and Melville’s Pierre, which mixes together elements of sentimental fiction, Gothic literature, and spiritualist tracts to produce a bitter satire.  I wondered if there was a family resemblance between these texts.  First I used the Wordle word cloud generator to reveal the most frequently appearing words.  I noted some significant overlap, including words associated with family such as mother and father, those linked with the body such as hand and eye, and those associated with temporality, such as morning, night, and time.  To develop a more precise understanding of how frequently terms appeared in the two texts and their relation to each other, I used TAPOR’s Comparator tool.  This tool also revealed words unique to each work, such as “flirt” and “sensibility” in the case of Reveries, “ambiguities” and “miserable” in the case of Pierre.  Finally, I used TAPOR’s concordance tool to view key terms in context.  I found, for instance, that in Mitchell “mother” is often associated with hands or heart, while in Melville it appears with terms indicating anxiety or deceit.  By abstracting out frequently occurring and unique words, I can see how Melville, in a sense, remixes elements of sentimental fiction, putting terms in a darker context.  The text analysis tools provide a powerful stimulus to interpretation.
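
As a rough illustration of the word-frequency counting that underlies these experiments (the file names are placeholders, and this is not how Wordle or TAPOR is implemented), something like the following produces the raw counts and the overlap the word clouds made visible:

    import re
    from collections import Counter

    def word_counts(path: str) -> Counter:
        """Lowercased word frequencies for one plain-text file."""
        with open(path, encoding="utf-8") as f:
            return Counter(re.findall(r"[a-z']+", f.read().lower()))

    reveries = word_counts("reveries_of_a_bachelor.txt")   # hypothetical file names
    pierre = word_counts("pierre.txt")

    # Words prominent in both texts (the overlap the word clouds made visible)
    shared = {w for w, _ in reveries.most_common(200)} & \
             {w for w, _ in pierre.most_common(200)}
    print(sorted(shared))

In practice one would strip common stop words such as “the” and “and” before counting, as word-cloud tools typically do.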

Representing
Not only am I using the computer to analyze information, but also to represent my ideas in a more media-rich, interactive way than the typical print article.  I plan to experiment with Sophie as a tool for authoring multimodal scholarship, and I’m also experimenting with video as a means for representing visual information. Right now I’m reworking an article on the publication history of Reveries of a Bachelor as a video so that I can show significant visual information such as bindings, illustrations, and advertisements.  I’ve condensed a 20+ page article into a 7-minute narrative, which for a prolix person like me is rough.  I also have been challenged to think visually and cinematically, considering how the movement of the camera and the style of transitions shape the argument.  Getting the right imagery—high quality, copyright free—has been tricky as well.  I’m not sure how to bring scholarly practices such as citation into videos.  Even though my draft video is, frankly, a little amateurish, putting it together has been lots of fun, and I see real potential for video to allow us to go beyond text and bring the human voice, music, movement and rich imagery into scholarly communication.

On Tools
In the course of my experiments in digital scholarship, I often found myself searching for the right tool to perform a certain task.  Likewise, in my conversations with researchers who aren’t necessarily interested in doing digital scholarship, just in doing their research better, I learned that they weren’t aware of digital tools and didn’t know where to find out about them.  To make it easier for researchers to discover relevant tools, I teamed up with 5 other librarians to launch the Digital Research Tools, or DiRT, wiki at the end of May.   DiRT provides a directory of digital research tools, primarily free but also commercial, categorized by their functions, such as “manage citations.”  We are also writing reviews of tools geared toward researchers and trying to provide examples of how these tools are used by the research community.  Indeed, DiRT focuses on the needs of the community; the wiki evolves thanks to its contributors.   Currently 14 people in fields such as anthropology, communications, and educational technology have signed on to be contributors.  Everything is under a Creative Commons attribution license.  We would love to see spin-offs, such as DiRT in languages besides English; DiRT for developers; and Old DiRT (dust?), the hall of obsolete but still compelling tools.  My experiences with DiRT have demonstrated again the beauty of collaboration and sharing.  Both Dan Cohen of CHNM & Alan Liu of UC Santa Barbara generously offered to let us grab content from their own tools directories.  Busy folks have freely given their time to add tools to DiRT.  Through my work on DiRT, I’ve learned about tools outside of my field, such as qualitative data analysis software.

So I’ll end with an invitation: Please contribute to DiRT.  You can sign up to be an editor or reviewer, recommend tools to be added, or provide feedback via our survey.  Through efforts like DiRT, we hope to enable new digital scholarship, raise the profile of inventive digital tools, and build community.

Using Text Analysis Tools for Comparison: Mole & Chocolate Cake

How can text analysis tools enable researchers to study the relationships between texts? In an earlier post, I speculated about the relevance of such tools for understanding “literary DNA”–how ideas are transmitted and remixed–but as one reader observed, intertextuality is probably a more appropriate way of thinking about the topic. In my dissertation, I argue that Melville’s Pierre represents a dark parody of Mitchell’s Reveries of a Bachelor. Melville takes the conventions of sentimental bachelor literature, mixes in elements of the Gothic and philosophic/theological tracts, and produces a grim travesty of bachelor literature that makes the dreaming bachelor a trapped quasi-husband, replaces the rural domestic manor with a crowded urban apartment building, and ends in a real, Hamlet-intense death scene rather than the bachelor coming out of reverie or finding a wife. Would text analysis tools support this analysis, or turn up patterns that I had previously ignored?

I wanted to get a quick visual sense of the two texts, so I plugged them into Wordle, a nifty word cloud generator that enables you to control variables such as layout, font and color. (Interestingly, Wordle came up with the perfect visualizations for each text at random: Pierre white type on a black background shaped into, oh, a chess piece or a tombstone, Reveries a brighter, more casual handwritten style, with a shape like a fish or egg.)

Wordle Word Cloud for Pierre

Wordle Reveries Word Cloud

Using these visual representations of the most frequent words in each book enabled me to get a sense of the totality, but then I also drilled down and began comparing the significance of particular words. I noted, for instance, the importance of “heart” in Reveries, which is, after all, subtitled “A Book of the Heart.” I also observed that “mother” and “father” were given greater weight in Pierre, which is obsessed with twisted parental legacies. To compare the books in even more detail, I decided to make my own mashed up word cloud, placing terms that appeared in both texts next to each other and evaluating their relative weight. I tried to group similar terms, creating a section for words about the body, words about feeling, etc. (I used crop, copy and paste tools in Photoshop to create this mashup, but I’m sure–or I sure hope–there’s a better way.)

Comparison of Reveries and Pierre

(About three words into the project, I wished for a more powerful tool to automatically recognize, extract and group similar words from multiple files, since my eyes ached and I had a tough time cropping out words without also grabbing parts of nearby words. Perhaps each word would be a tile that you drag over to a new frame and move around; ideally, you could click on the word and open up a concordance.) My mashup revealed that in many ways Pierre and Reveries have similar linguistic profiles. For instance, both contain frequently-occurring words focused on the body (face, hand, eye), time (morning, night), thinking, feeling, and family. Perhaps such terms are common in all literary works (one would need to compare these works to a larger literary corpus), but they also seem to reflect the conventions of sentimental literature, with its focus on the family and embodied feeling (see, for instance, Howard).
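
The more powerful grouping tool I wished for might be sketched, crudely, with a stemmer that collects related word forms from each book’s frequency list under a shared stem; the counts below are toy values, and a real run would start from the full frequency lists:

    from collections import defaultdict
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def group_by_stem(counts: dict) -> dict:
        """Group related word forms (dream, dreams, dreaming) under a shared stem."""
        groups = defaultdict(dict)
        for word, n in counts.items():
            groups[stemmer.stem(word)][word] = n
        return dict(groups)

    # Toy counts for illustration only.
    reveries_counts = {"dream": 30, "dreams": 12, "dreaming": 9, "heart": 199}
    pierre_counts = {"dream": 18, "dreaming": 5, "mother": 237, "ambiguities": 20}

    rev, pie = group_by_stem(reveries_counts), group_by_stem(pierre_counts)
    for stem in sorted(set(rev) & set(pie)):
        print(stem, rev[stem], pie[stem])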

The word clouds enabled me to get an initial impression of key words in the two books and the overlap between them, but I wanted to develop a more detailed understanding. I used TAPOR’s Comparator to compare the two texts, generating a complete list of how often words appeared in each text and their relative weighting. When I first looked at the word list, I was befuddled:

Word     Reveries count   Reveries relative count   Pierre relative count   Pierre count   Relative ratio (Reveries:Pierre)
blaze    45               0.0007                    0                       1              109.4667

What does the relative ratio mean? I was starting to regret my avoidance of all math and stats courses in college. But after I worked with the word clouds, the statistics began to make more sense. Oh, relative ratio means how often a word appears in the first text versus the second–“blaze” is much more prominent in Reveries. Ultimately I trusted the concreteness and specificity of numbers more than the more impressionistic imagery provided by the word cloud, but the word cloud opened up my eyes so that I could see the stats more meaningfully. For instance, I found that mother indeed was more significant in Pierre, occurring 237 times vs. 58 times in Reveries. Heart was more important in Reveries (a much shorter work), appearing 199 times vs. 186 times in Pierre. I was surprised that “think” was more significant in Reveries than in Pierre, given the philosophical orientation of the latter. With the details provided by the text comparison results, I could construct an argument about how Melville appropriates the language of sentimentality.
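
For what it’s worth, the relative-ratio arithmetic can be sketched in a few lines (the total word counts below are rough placeholders, since the tool computes them from the full texts):

    def relative_ratio(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
        """How much more prominent a word is in text A than in text B,
        after normalizing each raw count by its text's total word count."""
        return (count_a / total_a) / (count_b / total_b)

    # "blaze": 45 occurrences in Reveries vs. 1 in Pierre; the totals are guesses.
    print(relative_ratio(45, 64_000, 1, 155_000))   # on the order of 100, as in the table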

But the differences between the two texts are perhaps even more interesting than their similarities, since they show how Melville departed from the conventions of male sentimentalism, embraced irony, and infused Pierre with a sort of gothic spiritualism. These differences are revealed more fully in the statistics than the word clouds. A number of terms are unique to each work. For instance, sentimental terms such as “sympathies,” “griefs,” “sensibility” appear frequently in Reveries but never in Pierre, as do romantic words such as “flirt,” “sparkle,” and “prettier.” As is fitting for Melville, Pierre’s unique language is typically darker, more archaic, abstract, and spiritual/philosophical, and obsessed with the making of art: “portrait,” “writing,” “original,” “ere,” “miserable,” “visible,” “invisible,” “profound(est),” “final,” “vile,” “villain,” “minds,” “mystical,” “marvelous,” “inexplicable,” “ambiguous.” (Whereas Reveries is subtitled “A Book of the Heart,” Pierre is subtitled “The Ambiguities.”) There is a strand of darkness in Mitchell–he uses “sorrow” more than Melville–but then Mitchell uses “pleasure” 14 times to Melville’s 2 times and “pleasant” 43 times. Reveries is more self-consciously focused on bachelorhood; Mitchell uses “bachelor” 28 times to Melville’s 5. Both authors refer to dreaming; Mitchell uses “reveries” 10 times, Melville 7. Interestingly, only Melville uses “America” (14 times).

Looking over the word lists raises all sorts of questions about the themes and imagery of each work and their relationship to each other, but the data can also be overwhelming. If comparing two works yields over 10,000 lines in a spreadsheet, what criteria should you use in deciding what to select (to use Unsworth’s scholarly primitive)? What happens when you throw more works into the mix? I’m assuming that text mining techniques will provide more sophisticated ways of evaluating textual data, allowing you to filter data and set preferences for how much data you get. (I should note that you can exclude terms and set preferences in TAPOR).

Text analysis brings attention to significant features of a text by abstracting those features–for instance, by generating a word frequency list that contains individual words and the number of times they appear. But I kept wondering how the words were used, in what context they appeared. So Melville uses “mother” a lot–is it in a sweetly sentimental way, or does he treat the idea of mother more complexly? By employing TAPOR’s concordance tool, you can view words in context and see that Mitchell often uses mother in association with words like “heart,” “kiss,” “lap,” while in Melville “mother” does appear with “Dear” and “loving,” but also with “conceal,” “torture,” “mockingly,” “repelling,” “pride,” “cruel.” Hmmm. In Mitchell, “hand” most often occurs with “your” and “my,” signifying connection, while “hand” in Pierre is more often associated with action (hand-to-hand combat, “lift my hand in fury,” etc) or with putting hand to brow in anguish. Same word, different resonance. It’s as if Melville took some of the ingredients of sentimental literature and made something entirely different with them, enchiladas mole rather than a chocolate cake.
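
Such a keyword-in-context view is easy to approximate; here is a minimal sketch (not TAPOR’s concordance tool, just the same idea, with a hypothetical file name):

    import re

    def concordance(text: str, keyword: str, width: int = 40) -> None:
        """Print each occurrence of a keyword with a window of surrounding context."""
        for match in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
            left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
            right = text[match.end():match.end() + width].replace("\n", " ")
            print(f"...{left}[{keyword}]{right}...")

    with open("pierre.txt", encoding="utf-8") as f:   # hypothetical file name
        concordance(f.read(), "mother")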

Word clouds, text comparisons, and concordances open up all sorts of insights, but how does one use this evidence in literary criticism? If I submitted an article full of word count tables to a traditional journal, I bet the editors wouldn’t know what to do with it. But that may change, and in any case text analysis can inform the kind of arguments critics make. My experience playing with text analysis tools verifies, for me, Steve Ramsay’s recommendation that we “reconceive computer-assisted text analysis as an activity best employed not in the service of a heightened critical objectivity, but as one that embraces the possibilities of that deepened subjectivity upon which critical insight depends.”

Works Cited

Howard, June. “What Is Sentimentality?” American Literary History 11.1 (1999): 63-81. 22 Jun 2008 <http://alh.oxfordjournals.org/cgi/content/citation/11/1/63>.

Ramsay, Stephen. “Reconceiving Text Analysis: Toward an Algorithmic Criticism.” Literary and Linguistic Computing 18.2 (2003): 167-174. 27 Nov 2007 <http://llc.oxfordjournals.org/cgi/content/abstract/18/2/167>.

THAT Camp Takeaways

My work has been so all-consuming lately that it feels like THAT Camp was months rather than a couple of weeks ago, but I wanted to offer a few observations about THAT Camp before they go completely stale. Like many others, I found THAT Camp much more satisfying than the typical academic conference, since it promoted a strong sense of community (in part by using technologies such as pre-conference blogging and Twitter), was organized around the interests of participants, and encouraged the open exchange of ideas. Academic conferences typically have three functions: 1) to disseminate new ideas; 2) to bring people together to explore those ideas (and share a few beers in the process); and 3) to provide a line on the CV certifying that a scholar is actually making contributions to the research community. THAT Camp excelled at fulfilling the first two functions, and I’m hopeful that search committees and tenure committees (at least in certain communities) will see THAT Camp on a CV and think, “Wow, this person is an innovator!” Besides, the ideas generated and collaborations formed at THAT Camp will likely lead to more lines (academic merit badges?) on CVs.

I don’t have the time—and the reader probably doesn’t have the patience—to describe everything I learned at THAT Camp, but I wanted to highlight a few of the most intriguing projects or compelling ideas.

1) It’s the people, stupid.

I helped to organize a session on emerging research methods and expected that we would focus on how technologies such as visualization and text mining are opening up new approaches to scholarly inquiry. Instead, we spent most of our time engaged in a fruitful discussion about the importance—and difficulty—of collaboration, positing it as the “scholarly primitive” missing from John Unsworth’s list of core research activities. Perhaps the defining statement of the session was one person’s observation that “the cyberinfrastructure is people.” As THAT Camp itself demonstrated, collaboration enables people to develop better ideas, share the workload, sustain projects, and ultimately have a greater impact in the field, but encouraging people to share requires changes in culture and incentive systems.

2) New tools are enabling people to share annotations, resources, and work.

If collaboration is a key research process, there are some really cool tools under development that will support it. For instance, Ben Brumfield demonstrated FromThePage, a tool that allows people (historians, genealogists, history buffs) to transcribe documents, zoom in on manuscript pages, collaborate with others to identify tasks and check their work, view subjects, and more. Travis Brown is working on eComma, which “will enable groups of students, scholars, or general readers to build collaborative commentaries on a text and to search, display, and share those commentaries online.” And then there’s Zotero 2.0, which will let researchers share their collections with others.

3) Through visualization tools, researchers can make sense of a vast amount of information.

For instance, Jeanne Kramer-Smyth demonstrated ArchivesZ, which enables users of archives to visualize how much material (e.g., how many linear feet) is available in an archive related to a particular topic.

4) GIS technologies offer real analytical power, showing changes across time and space, land ownership patterns, and much more.

In a rich session on GIS tools, Josh Greenberg demonstrated how an historical map of New York could be overlaid on a contemporary Google Map, enabling one to view the development of the city. Mikel Maron discussed Open Street Map, a free and open map of the world to which people regularly contribute data. And I was delighted to learn from Shekhar Krishnan that Zotero will be releasing a mapping plug-in that will allow you to view the publication location of works in a collection on a Google Map. I had planned to create my own Google Map showing where bachelor literature was published by extracting the necessary data from Zotero, but, hooray, now I don’t have to go through the extra work. (See http://www.diigo.com/user/lspiro/GIS for more cool GIS projects).

Research Methods Session at THAT Camp

This weekend I’m at THAT Camp, which is bringing together programmers, librarians, funding officers, project managers, mathematicians, historians, philosophers, literary scholars, linguists, etc. to discuss the digital humanities. Sponsored by the Center for History and New Media at George Mason University, THAT Camp is an un-conference, which means that ideas for sessions emerged organically out of blog posts preceding the gathering and out of a discussion held when the Camp began. As a result of all the sharing of ideas via blogging and social networking via Twitter, the meeting seems much more intimate, open, and lively than your average conference. People who are passionate and curious about the digital humanities are coming together to talk about teaching, gaming, visualization, project sustainability, etc., and to learn how to hack Zotero and Omeka, build a simple history appliance, and more. As many folks have commented, the toughest part of THAT Camp is deciding which of the four sessions to attend–I want to go to them all. Kudos to CHNM for organizing and hosting the event–I bet some exciting initiatives and collaborations will come out of THAT Camp.

Yesterday afternoon I facilitated a session on research methods. At the request of some of the participants, I’m posting the rough notes I took during this rich discussion.

Touchstones/ pump-priming quotations for the session:

  • “Research in the humanities, then, is and has been an activity characterized by the four Rs: reading, writing, reflection, and rustication. If these are the traditional research methods in the humanities, what will “new research methods” look like–and more importantly, why do we need them?”—John Unsworth, New Methods for Humanities Research
  • “The day will come, not that far off, when modifying humanities with ‘digital’ will make no more sense than modifying humanities with ‘print.’” –Steve Wheatley, ACLS
  • Unsworth, Scholarly Primitives: “some basic functions common to scholarly activity across disciplines, over time, and independent of theoretical orientation.” Unsworth lists the following scholarly primitives:
    • Annotating
    • Comparing
    • Discovering
    • Illustrating
    • Referring
    • Representing
    • Sampling
  • “What is a literary-critical ‘problem’? How is it different from a scientific ‘problem’?”—Steve Ramsay

Discussion

EXPERTISE AND INFORMATION FLUENCY

  • Old method: Scholars would find things in the archive, bring them back, provide people w/ information.
  • New: scholars face a deluge of information.
  • Old assumption: info is hard to get to; you need expertise to find stuff.
  • New: expertise shifts from finding to filtering and sorting
  • The point of a research method is figuring out how to filter, sort. A bibliography is not a list of Google links; you need to be familiar w/ major sources in field.
  • Experts know how to discern bias; filtering requires expertise.
  • Expertise=familiarity with conceptual/ theoretical approaches in field. Scholars get a sense of theoretical approach by looking at the bibliography—it’s metadata about the book
  • Scholars need to inform students about problems with resources they find. New problems arise with digital—important to know weaknesses of Google Books. Need to teach students to question how resource/ tool created—what it does and doesn’t do.
  • The student world is digital—they need to learn how to operate responsibly in it
  • Two webs: open access, proprietary/ walled off. Students need to be aware of it—not everything is in Google.
  • But it’s also important to meet students where they start—even faculty start with Google; make metadata open so it’s discoverable. Implications of stuff not being accessible—it’s ignored.
  • Old model: one expert—you had to read the one book on the subject. Now there are huge amounts of data; need multiple interfaces to all of it. Need to provide multiple pathways to data. RDF key.
  • If you’re used to doing something a particular way, it’s hard to change that.
  • Origins of print: first people to adopt print were different groups using it for their own agenda. Later library science came along to collect and curate content. Print media enabled new ways of doing existing scholarship. New disciplines developed, such as finding and keeping print materials (librarianship) and the study of books as physical objects. Same thing in shift to digital: there are specialists who focus on the technical side, like building tools. There are scholars, who want to use this stuff and don’t need to know the technical details.

INTEGRATED RESEARCH PROCESSES/ TOOLS

  • At the recent New Horizons conference, Geoffrey Rockwell spoke on mass digitization and the process of research. Search is not that simple—there are multiple places to look. The problem of selection → how do you decide what makes sense. Then there’s serendipity. How do scholars negotiate the mass of stuff? How do they make sense of it, select from it? Tools like Zotero help you to share & select info; then you leave Zotero and write the paper separately. With textual analysis tools, there’s no way to take textual data and link it to publication → you need a relationship to textual analysis work. Can integrated tools be developed so that discovery, search, data collection, analysis, etc. can be carried right through publication in a journal, Omeka, etc.?

COLLABORATION

  • Sharing should be one of the scholarly primitives. We’re sharing in new ways. The speed & scale of what you share is changing.
  • How do you cut across disciplines? People from different fields have different takes: literature vs. history vs. art; different methods, not much cross-fertilization.
  • Pronetos: scholars throughout the world get a single place to go to network and engage with other scholars. Organic—if you’re an American historian, you can create an American history group if it doesn’t already exist. Takes on the problem of how to help people network.
  • Zotero Commons will facilitate sharing of expertise, as you can find an expert sharing a particular bibliography.
  • Opening up projects, creating communities around them helps with sustainability
  • Most transformative aspect of new research methods is establishing scholarly networks, collaborative aspect
  • How do you track your efforts in collaboration so that you can document what deserves to be rewarded?
  • Teach collaboration by modeling it for students
  • Sharing depends on discipline—people working on patents don’t necessarily share.
  • Humanists have trouble with sharing—for instance, some NINES users wanted to make tags private
  • Not sharing will become a problem in the long term, since it leads to duplication of effort and unnecessary competition. You can collaborate to come up with a better project.
  • Information gets out quickly, danger is in not sharing–that’s when you get scooped.
  • It’s not the technology that enables the sharing—it’s the people. There’s concern about retaining rights, getting credit, getting ripped off. People are building projects (e.g. institutional repositories) and users are not coming. How will people be encouraged to share?
  • People tend to share within discipline rather than institution.
  • What’s the relationship between repositories, blogs, Omeka installations, etc.? Importance of data aggregation, globalization.
  • Cyberinfrastructure is people
  • They’ve been pushing knowledge management practices in the business world for decades, and they still haven’t cracked it.
  • Mashups—pieces in place to make scholars see potential, but haven’t been realized yet.
  • With openly shared research, you facilitate interdisciplinarity and get research out to more people. Institutional repositories (IR) are key for this.
  • IRs are siloed—but w/ Zotero Commons, institution is everyone.
  • If you put your research out there, you’re staking a claim—not getting scooped.
  • If scholars blog their work at the early stage, they may wonder: are they putting it out too early?
  • What is it about sharing that’s changing over time?
  • Do humanities departments that want to do digital work need a marketing department to help people discover their work?
  • Role of libraries as marketing depts., making resources accessible.
  • Professional societies need to step up b/c it’s not realistic for individual schools to do the marketing of digital scholarship.
  • Should professional societies launch their own version of Facebook?
  • We need to get away from the silos.
  • Peer review is a kind of social network.
  • Media Commons: social network for peer review of online texts using CommentPress, etc. Slashdot: reputation ranking, etc. (morphed into peer review)
  • Offer interfaces inflected for different disciplines: NINES, 18th C Connect
  • NINES an example of peer review for digital scholarship. 22 sites peer-reviewed by NINES—22 of first 105 to be put in MLA Bibliography.
  • Journal gives seal of approval—haven’t come up with that kind of stamp for the digital world. Part of the fear about blogging is that it’s not peer reviewed.
  • Blogrolls are a form of peer review–to find good stuff, you look at Matt Kirschenbaum’s blog to see who he reads.
  • Rotunda/ digital publisher as stamp of approval.
  • There are different standards for digital and print. A Nature study of online peer review found it doesn’t work. But there were something like 40 comments in 6 months—isn’t that success, when in normal peer review it would take 1-2 years to get 3 comments? Why is there such a high bar for digital scholarship?
  • Noah Wardrip-Fruin’s peer review processes: different feedback overall from both online/blog-based and traditional peer review.
  • Scholarship over time: digital projects, when do they end?

SHIFT FROM PRINT TO DIGITAL

  • How are traditional research methods tied to the printed book?
  • Interpretation: job of historian is to make sense of what things mean. We’re in the land grab stage right now—dump stuff online, then begin to wall it off. It’s still early—at the ground floor of something that could be big.
  • Historians typically narrativize events. At Miami U. they developed a tool to transform a short story into a different genre—for instance, from horror to epic. Students learned elements of genre and wrote XSLT stylesheets to do the transformations.
  • Researchers could try out different narratives on data sets—picking out certain aspects. Historical narrativizing tied to print; digital enables historical multi-narrative. With digital, you can see what breaks when you change parameters.
  • Print to digital: transition from narrative to simulation, counter-factuals
  • How do you read? How many books do you have open?
    o Former practices: contraptions to hold multiple books open. Some ways of laying out books made them a database.
    o How does that work now? Ray Siemens: exploring idea of reading. Tools for document triage

USEFULNESS OF THE TERM “DIGITAL HUMANITIES”

  • The problem of naming a new digital humanities research center: Faculty advisers focused on the word “humanities”—what about social sciences, arts, etc.
  • When does the digital label drop out—or is it useful in defining what you do?
  • NEH Digital Humanities Office: NEH has been doing digital humanities for a long time: it funded TEI 20 years ago. But establishing the office helps to validate digital scholarship.
  • Specialists focus on certain areas of theory–we have the deconstruction scholars who specialize in the field, but their ideas permeate throughout the humanities. Similarly, digital humanists will be the lead group of folks who do digital work, but it will filter down into common research practice.
  • Digital humanities researchers need to make the case for a new methodology.
  • “Digital” useful b/c we are at an early stage—people still wonder what it means to be digital.
  • “Digital humanities” brings together technical skills and humanistic knowledge. Creating a DTD is a fascinating part of digital humanities; sounds like computer stuff, but it’s fundamentally humanities.
  • A tension: bibliography used to be core work, but that kind of work doesn’t necessarily get you tenure now. There’s real suspicion about whether this is truly humanities work.
  • Digital humanities includes tool developers, text encoders, people who use digital methods, as well as those who study digital culture, e.g. video games, underlying structures of social environment. Object they study is digital.
  • Divide between game/ film studies and textual digital humanities.
  • Jerry McGann: “humanists have always worked to preserve and interpret human record. Digital humanities is doing it in digital form.”
  • ADHO used to focus on the textual digital humanities, but is reaching out to digital theorists/ art, etc.
  • There’s a significant skill set to doing digital humanities work. Many scholars don’t really appreciate what it takes to produce digital resources—it’s not just scanning documents.
  • Theoreticians: need a little more dirt under their fingernails—they need to get experience doing these projects to inform their theorizing.