
Presentation on How Digital Humanists Use GitHub

At Digital Humanities 2016, Sean Morey Smith and I presented on our ongoing work examining GitHub as a platform of knowledge for digital humanities. Our results are still preliminary, but we want to share our presentation (PDF). We’re especially grateful to those who agreed to be interviewed for the study and who took our survey. We expect to produce an article (or two) based on our research.

We welcome any questions or feedback.

Studying How Digital Humanists Use GitHub

Over the past academic year, I’ve been fortunate to participate in Rice’s Mellon-sponsored Sawyer Seminar on Platforms of Knowledge, where we’ve examined platforms for authoring, annotation, mapping, and social networking. We’ve discussed both the possibilities that platforms may open up for inquiry, public engagement and scholarly communications and the risks that they may pose for privacy and nuanced humanistic analysis. Inspired by the questions raised by the Seminar, my colleague Sean Smith and I are studying a platform used by a number of digital humanists: GitHub. Digital humanists employ GitHub not only for code, but also for writing projects, syllabi, websites, and other scholarly resources. We’ll present our initial findings at Digital Humanities 2016, but I wanted to offer some background to the study, especially since some of you will soon be receiving emails from me inviting you to participate in it.

Initially I was interested in using GitHub for a case study of how we assess and select digital platforms. Even as many researchers (myself included) rely on digital platforms, I haven’t been able to find many clear rubrics for evaluating them. Building on Quinn Dombrowski’s recommendations for choosing a platform for a web project, we are looking at criteria such as functionality and ease of use. In previous work examining archival management systems, I learned how important it is to talk with users about their experience with tools, so we will be conducting a survey and interviews about GitHub. Sean and I also realized that GitHub itself provides valuable data about how people use it, such as information about collaboration, code re-use, and connections to others. Our study will thus include analysis of publicly available data about selected GitHub users and repositories. (Of course, there is significant prior work on this topic in fields such as social computing that we will draw upon.)

With this project, we are:

  1. Identifying digital humanists who have GitHub accounts. For the purposes of this study, we are looking at presenters at the last three Digital Humanities conferences and people affiliated with organizations that belong to centerNet (assuming that the information is publicly available). Of course, this method is imperfect: it misses digital humanists who didn’t attend the DH conferences or who aren’t affiliated with DH centers, and it may include some people who don’t really consider themselves digital humanists. But it’s a start.
  2. Contacting those whose email addresses are easily retrievable (e.g. available via GitHub) and:
    1. Giving them the opportunity to opt out of having their publicly available GitHub data included in our analysis and in the dataset that we plan to share at the end of the study. (Added 5/18/16: To be extra careful, we plan to anonymize this dataset.)
    2. Inviting them to take a brief survey about their usage and opinions of GitHub
    3. Inviting them to participate in an interview

    We may also contact people whose emails aren’t in the GitHub data but are otherwise available.

  3. Analyzing GitHub data from our dataset to gain insight into how digital humanists use GitHub.

We want to conduct this study openly while at the same time respecting privacy. In conducting interviews for past studies, I’ve been frustrated that I can’t publicly identify and credit people who have made brilliant comments because of the promise of confidentiality. So we’re giving interviewees the option to make all or some of their interview notes public, but of course they can instead keep the notes private and remain anonymous. Survey data will be anonymized but ultimately shared.

Here are important documents related to our study:

I welcome feedback and questions about this study. I hope that it will contribute to developing criteria for evaluating platforms like GitHub and offer insights into how digital humanities researchers and developers work.

Digital Pedagogy in Practice: Workshop Materials

On Saturday, March 2, I gave a workshop on digital (humanities) pedagogy for a group of about 20 faculty and staff at Gettysburg College.  I was impressed by the participants’ energy, openness, smarts, and playfulness.  We had fun!

I designed the workshop so that it moved through four phases, with the goal of participants ultimately walking away with concrete ideas about how they might integrate digital approaches into their own teaching:

1)  We explored the rationale for digital pedagogy (pdf of slides), discussing what students need to know in the 21st century, different frameworks for digital pedagogy (e.g. learning science, liberal education,  social learning, and studio learning), and definitions of digital pedagogy and the “digital liberal arts.” I started the session with Cathy Davidson’s exercise in which audience members first jot down on an index card three things they think students need to know in order to thrive in the digital age, then share their ideas with someone they didn’t walk in with, and finally work together to select the one key idea. (The exercise got people thinking and talking.)

2)   In the second session, I gave a brief presentation (pdf) offering specific case studies of digital pedagogy in action (repurposing some slides I’d used for previous workshops). Participants then broke up into groups to analyze an assignment used in a digital humanities class.

3)   Next participants worked in small groups to explore one of the following:

I structured the exercise so that participants first looked at the particular applications of the tool in teaching and scholarship (e.g. Mapping the Republic of Letters and Visualizing Emancipation in the session on information visualization), then played with a couple of tools in order to understand how they work, and finally reflected on the advantages and disadvantages of each tool and their potential pedagogical applications. I deliberately kept the exercises short and simple, and I tried to make them relevant to Gettysburg, drawing data from Wikipedia and other open sources.

4)   Finally participants worked in small teams (set up according to discipline) to develop an assignment incorporating digital approaches.  We concluded the session with a modified gallery walk, in which people circulated through the room and chatted with a representative of each team to learn more about their proposed assignment.

By the end of the day, workshop participants seemed excited by the possibilities and more aware of specific approaches that they could take (as well as a bit exhausted). I got several questions about copyright, so in future workshops I plan to incorporate a more formal discussion of fair use, Creative Commons and the public domain.

Our workshop drew heavily on materials shared by generous digital humanities instructors. (In that spirit, feel free to use or adapt any of my workshop materials. And I’m happy to give a version of this workshop elsewhere.) My thinking about digital humanities pedagogy has been informed by a number of people, particularly my terrific colleague Rebecca Davis.

Slides and Exercises from “Doing Things with Text” Workshop

Last week I was delighted to be back at my old stomping grounds at Rice University’s Digital Media Commons to lead a workshop on “Doing Things with Text.” The workshop was part of Rice’s Digital Humanities Bootcamp Series, led by my former colleagues Geneva Henry and Melissa Bailar. I hoped to expose participants to a range of approaches and tools, provide opportunities for hands-on exploration and play, and foster discussion about the advantages and limitations of text analysis, topic modeling, text encoding, and metadata. Although we ran out of time before getting through my ambitious agenda, I hope my slides and exercises provide useful starting points for exploring text analysis and text encoding.

Archival Management Systems Report, Wiki & Webinar

[Note: Typically my blog focuses on digital humanities research, but this post discusses some of my related work examining software that helps archives streamline their workflows.]

As archives acquire collections, arrange them, describe them, manage them, and make them publicly available, they produce data in multiple formats, such as notecards, Word documents, Excel files, Access databases, XML (EAD) finding aids, and web pages. Chris Prom suggests that some archives use so many tools in creating this data that their workflows “would make a good subject for a Rube Goldberg cartoon.” As a result, archives replicate data and effort, struggle with version control, face challenges finding and analyzing archival information, and have difficulty making that information publicly available. By using archival management systems such as Archon and Archivists’ Toolkit, however, archives can streamline the production of archival information; make it simpler to find information and generate reports; enable non-professionals to more easily create archival description; conform to archival standards; and share information such as finding aids with the public. To help guide the archival community in selecting the appropriate archival management system, I recently wrote a report for the Council on Library and Information Resources (CLIR).

Working on the report led me to several (admittedly non-revolutionary) insights:

  1. If you want to know what features software users need, ask them.   In the course of interviewing over 30 archivists and developers, I gained a greater understanding of key criteria for archival management software including flexibility, conformity to standards, support for an integrated workflow, ease of use, remote access (since archivists may do initial work processing collections off site), customization capabilities, ability to import and export data, etc.
  2. There is no one-size-fits-all tool.  Some archives prefer to use open source software; others are leery of open source, need a hosted solution, or require lots of support in importing and exporting data, customizing the user interface, etc.  Some archives need a way to publish archival information on the web; others want to export finding aids and pull them into existing publishing tools.
  3. Reports go out-of-date as soon as they are published.  Why not release the report as a wiki so that the community can keep it current and relevant?  With the support of CLIR, I’ve created a wiki called Archival Software.  Right now it more or less replicates the structure and content of my original report, but I hope that it evolves according to the needs of the community.   I invite members of the archival community to update the information, add new sections, restructure the wiki, and do whatever else makes it most useful.
  4. If archival management systems integrate and streamline the archival workflow from accessioning the collection to describing it to managing it to making it publicly available, what would an integrated research tool for the humanities look like–or would such a tool even be desirable or possible, given the variation in research practices? My first thought: Zotero with add-ons for analyzing information (perhaps similar to the tools under development by SEASR), authoring and sharing research  (like the Word plug-in or plug-ins for multimedia authoring or mashup creation, sharing via Internet Archive collaboration), etc.

On March 31, the Society of American Archivists (SAA) will offer a web seminar, Archival Content Management Systems, that is based upon my report.  The webinar will examine the case for archival management systems, explore selection criteria, and provide brief demonstrations of 3 systems.  I think there’s still time to register.  (Apologies for the self-promotion, but I wanted to get the word out…)

Using Text Analysis Tools for Comparison: Mole & Chocolate Cake

How can text analysis tools enable researchers to study the relationships between texts? In an earlier post, I speculated about the relevance of such tools for understanding “literary DNA”–how ideas are transmitted and remixed–but as one reader observed, intertextuality is probably a more appropriate way of thinking about the topic. In my dissertation, I argue that Melville’s Pierre represents a dark parody of Mitchell’s Reveries of a Bachelor. Melville takes the conventions of sentimental bachelor literature, mixes in elements of the Gothic and philosophic/theological tracts, and produces a grim travesty of bachelor literature that makes the dreaming bachelor a trapped quasi-husband, replaces the rural domestic manor with a crowded urban apartment building, and ends in a real, Hamlet-intense death scene rather than the bachelor coming out of reverie or finding a wife. Would text analysis tools support this analysis, or turn up patterns that I had previously ignored?

I wanted to get a quick visual sense of the two texts, so I plugged them into Wordle, a nifty word cloud generator that enables you to control variables such as layout, font and color. (Interestingly, Wordle came up with the perfect visualizations for each text at random: Pierre white type on a black background shaped into, oh, a chess piece or a tombstone, Reveries a brighter, more casual handwritten style, with a shape like a fish or egg.)

Wordle Word Cloud for Pierre

Wordle Reveries Word Cloud

Using these visual representations of the most frequent words in each book enabled me to get a sense of the totality, but then I also drilled down and began comparing the significance of particular words. I noted, for instance, the importance of “heart” in Reveries, which is, after all, subtitled “A Book of the Heart.” I also observed that “mother” and “father” were given greater weight in Pierre, which is obsessed with twisted parental legacies. To compare the books in even more detail, I decided to make my own mashed up word cloud, placing terms that appeared in both texts next to each other and evaluating their relative weight. I tried to group similar terms, creating a section for words about the body, words about feeling, etc. (I used crop, copy and paste tools in PhotoShop to create this mashup, but I’m sure, or at least I sure hope, there’s a better way.)

Comparison of Reveries and Pierre

(About three words into the project, I wished for a more powerful tool to automatically recognize, extract and group similar words from multiple files, since my eyes ached and I had a tough time cropping out words without also grabbing parts of nearby words. Perhaps each word would be a tile that you drag over to a new frame and move around; ideally, you could click on the word and open up a concordance.) My mashup revealed that in many ways Pierre and Reveries have similar linguistic profiles. For instance, both contain frequently occurring words focused on the body (face, hand, eye), time (morning, night), thinking, feeling, and family. Perhaps such terms are common in all literary works (one would need to compare these works to a larger literary corpus), but they also seem to reflect the conventions of sentimental literature, with its focus on the family and embodied feeling (see, for instance, Howard).

The word clouds enabled me to get an initial impression of key words in the two books and the overlap between them, but I wanted to develop a more detailed understanding. I used TAPOR’s Comparator to compare the two texts, generating a complete list of how often words appeared in each text and their relative weighting. When I first looked at the word list, I was befuddled:

| Words | Reveries counts | Reveries relative counts | Pierre relative | Pierre counts | Relative ratio Reveries:Pierre |
| --- | --- | --- | --- | --- | --- |
| blaze | 45 | 0.0007 | 0 | 1 | 109.4667 |

What does the relative ratio mean? I was starting to regret my avoidance of all math and stats courses in college. But after I worked with the word clouds, the statistics began to make more sense. Oh, relative ratio compares a word’s share of the first text with its share of the second: “blaze” is roughly 109 times more prominent in Reveries. Ultimately I trusted the concreteness and specificity of numbers more than the more impressionistic imagery provided by the word cloud, but the word cloud opened up my eyes so that I could see the stats more meaningfully. For instance, I found that “mother” indeed was more significant in Pierre, occurring 237 times vs. 58 times in Reveries. “Heart” was more important in Reveries (a much shorter work), appearing 199 times vs. 186 times in Pierre. I was surprised that “think” was more significant in Reveries than in Pierre, given the philosophical orientation of the latter. With the details provided by the text comparison results, I could construct an argument about how Melville appropriates the language of sentimentality.
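For readers who, like me, think better with something concrete in hand, here is a minimal Python sketch of how a relative ratio might be computed. It is not TAPOR’s actual code: the tokenizer, the function names (`tokenize`, `relative_ratio`), and the two toy sentences are all my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This crude tokenizer is an assumption for illustration only; a real
    tool may split words differently.
    """
    return re.findall(r"[a-z']+", text.lower())


def relative_ratio(word, tokens_a, tokens_b):
    """Divide the word's share of text A by its share of text B."""
    rel_a = Counter(tokens_a)[word] / len(tokens_a)
    rel_b = Counter(tokens_b)[word] / len(tokens_b)
    if rel_b == 0:
        # The word never appears in text B, so the ratio is unbounded.
        return float("inf")
    return rel_a / rel_b


# Toy stand-ins for the two books (hypothetical sentences, not quotations).
text_a = tokenize("the blaze warms the heart")
text_b = tokenize("the mother and the father and the heart at home")

print(relative_ratio("heart", text_a, text_b))
print(relative_ratio("blaze", text_a, text_b))
```

The point of dividing by text length is that raw counts can mislead when one book is much shorter than the other: a word that appears once in a five-word text takes up a bigger share than a word that appears once in a ten-word text.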

But the differences between the two texts are perhaps even more interesting than their similarities, since they show how Melville departed from the conventions of male sentimentalism, embraced irony, and infused Pierre with a sort of gothic spiritualism. These differences are revealed more fully in the statistics than the word clouds. A number of terms are unique to each work. For instance, sentimental terms such as “sympathies,” “griefs,” “sensibility” appear frequently in Reveries but never in Pierre, as do romantic words such as “flirt,” “sparkle,” and “prettier.” As is fitting for Melville, Pierre’s unique language is typically darker, more archaic, abstract, and spiritual/philosophical, and obsessed with the making of art: “portrait,” “writing,” “original,” “ere,” “miserable,” “visible,” “invisible,” “profound(est),” “final,” “vile,” “villain,” “minds,” “mystical,” “marvelous,” “inexplicable,” “ambiguous.” (Whereas Reveries is subtitled “A Book of the Heart,” Pierre is subtitled “The Ambiguities.”) There is a strand of darkness in Mitchell, who uses “sorrow” more than Melville, but then Mitchell uses “pleasure” 14 times to Melville’s 2 times and “pleasant” 43 times. Reveries is more self-consciously focused on bachelorhood; Mitchell uses “bachelor” 28 times to Melville’s 5. Both authors refer to dreaming; Mitchell uses “reveries” 10 times, Melville 7. Interestingly, only Melville uses “America” (14 times).
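Finding the terms that are unique to one work is, at bottom, a set comparison. The sketch below shows one way it might be done in Python; the function name `unique_terms`, the `min_count` parameter, and the sample strings are hypothetical, and the tokenizing rule is a simplifying assumption rather than what TAPOR does.

```python
import re
from collections import Counter


def unique_terms(text_a, text_b, min_count=1):
    """Return words (with counts) found in text A but never in text B.

    `min_count` is a hypothetical knob for ignoring very rare words.
    """
    counts_a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    vocab_b = set(re.findall(r"[a-z']+", text_b.lower()))
    return {
        word: count
        for word, count in counts_a.items()
        if word not in vocab_b and count >= min_count
    }


# Made-up snippets standing in for the full texts.
sample_a = "sparkle and flirt and sparkle"
sample_b = "dark and vile"

print(unique_terms(sample_a, sample_b))
print(unique_terms(sample_a, sample_b, min_count=2))
```

Raising `min_count` is one simple way to keep the list of unique words from being swamped by terms that appear only once.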

Looking over the word lists raises all sorts of questions about the themes and imagery of each work and their relationship to each other, but the data can also be overwhelming. If comparing two works yields over 10,000 lines in a spreadsheet, what criteria should you use in deciding what to select (to use Unsworth’s scholarly primitive)? What happens when you throw more works into the mix? I’m assuming that text mining techniques will provide more sophisticated ways of evaluating textual data, allowing you to filter data and set preferences for how much data you get. (I should note that you can exclude terms and set preferences in TAPOR).

Text analysis brings attention to significant features of a text by abstracting those features–for instance, by generating a word frequency list that contains individual words and the number of times they appear. But I kept wondering how the words were used, in what context they appeared. So Melville uses “mother” a lot–is it in a sweetly sentimental way, or does he treat the idea of mother more complexly? By employing TAPOR’s concordance tool, you can view words in context and see that Mitchell often uses mother in association with words like “heart,” “kiss,” “lap,” while in Melville “mother” does appear with “Dear” and “loving,” but also with “conceal,” “torture,” “mockingly,” “repelling,” “pride,” “cruel.” Hmmm. In Mitchell, “hand” most often occurs with “your” and “my,” signifying connection, while “hand” in Pierre is more often associated with action (hand-to-hand combat, “lift my hand in fury,” etc) or with putting hand to brow in anguish. Same word, different resonance. It’s as if Melville took some of the ingredients of sentimental literature and made something entirely different with them, enchiladas mole rather than a chocolate cake.
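A concordance of this kind is often called a keyword-in-context (KWIC) listing, and the basic idea fits in a few lines of Python. This is only a sketch under my own assumptions: the `concordance` function, its `width` parameter, and the sample passage are invented for illustration (only the phrase “lift my hand in fury” comes from the discussion above), and TAPOR’s tool surely handles punctuation and display more gracefully.

```python
import re


def concordance(text, keyword, width=3):
    """List each occurrence of `keyword` with `width` words on either side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# A made-up passage; the first sentence is invented for the example.
passage = "She took my hand in hers. I lift my hand in fury."

for left, word, right in concordance(passage, "hand", width=2):
    print(f"{left:>12} [{word}] {right}")
```

Lining up the keyword in a column makes it easier to scan the neighboring words and notice patterns, such as whether “hand” keeps company with “your” and “my” or with “fury.”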

Word clouds, text comparisons, and concordances open up all sorts of insights, but how does one use this evidence in literary criticism? If I submitted an article full of word count tables to a traditional journal, I bet the editors wouldn’t know what to do with it. But that may change, and in any case text analysis can inform the kind of arguments critics make. My experience playing with text analysis tools verifies, for me, Steve Ramsay’s recommendation that we “reconceive computer-assisted text analysis as an activity best employed not in the service of a heightened critical objectivity, but as one that embraces the possibilities of that deepened subjectivity upon which critical insight depends.”

Works Cited

Howard, June. “What Is Sentimentality?” American Literary History 11.1 (1999): 63-81. 22 Jun 2008.

Ramsay, Stephen. “Reconceiving Text Analysis: Toward an Algorithmic Criticism.” Literary and Linguistic Computing 18.2 (2003): 167-174. 27 Nov 2007.

Digging in the DiRT: Sneak Preview of the Digital Research Tools (DiRT) wiki

When I talk with researchers about a cool tool such as Zotero, they often ask, “Hey, how did you find out about that?” Not everyone has the time or inclination to read blogs, software reviews, and listserv announcements obsessively, but now researchers can quickly identify relevant tools by checking out the newly launched Digital Research Tools (DiRT) wiki. DiRT lists dozens of useful tools for discovering, organizing, analyzing, visualizing, sharing and disseminating information, such as tools for compiling bibliographies, taking notes, analyzing texts, and visualizing data. We also offer software reviews that not only describe the tool’s features, strengths, and weaknesses, but also provide usage tips, links to training resources, and suggestions for how it might be implemented by researchers. So that DiRT is accessible to non-techies and techies alike, we try to avoid jargon and categorize tools by their functions. Although the acronym DiRT might suggest that it’s a gossip site for academic software, dishing on bugs and dirty secrets about the software development process, we prefer a gardening metaphor, as we hope to help cultivate research projects by providing clear, concise information about tools that can help researchers do their work more effectively or creatively.

DiRT is brand new, so we’re still in the process of creating content and figuring out how best to present it; consider it to be in alpha release and expect to see it evolve. (We plan to announce DiRT more broadly in a few months, but we’re giving sneak previews right now in the hope that comments from members of the digital humanities community can help us to improve it.) Currently the DiRT editorial team includes me, my ever-innovative and enthusiastic colleague Debra Kolah, and three whip-smart librarians from Sam Houston State University with expertise in Web 2.0 technologies (as well as English, history, business, and ranching!): Tyler Manolovitz, Erin Dorris Cassidy, and Abe Korah. We’ve committed to provide at least 5 new tool reviews per month, but we can do even more if more people join us (hint, hint). We invite folks to recommend research tools or software categories, write reviews, sign on to be co-editors, and/or offer feedback on the wiki. Please contact me at [Update: You can also provide feedback via this form.]

By the way, playing with DiRT has convinced me yet again of the value of collaboration. Everyone on the team has contributed great ideas about what tools to cover, what form the reviews should take, and how to promote and sustain the wiki. Five people can sure do a heck of a lot more than one–and have fun in the process.