Presentation on How Digital Humanists Use GitHub

At Digital Humanities 2016, Sean Morey Smith and I presented on our ongoing work examining GitHub as a platform of knowledge for digital humanities. Our results are still preliminary, but we want to share our presentation (PDF). We’re especially grateful to those who agreed to be interviewed for the study and who took our survey. We expect to produce an article (or two) based on our research.

We welcome any questions or feedback.

Studying How Digital Humanists Use GitHub

Over the past academic year, I’ve been fortunate to participate in Rice’s Mellon-sponsored Sawyer Seminar on Platforms of Knowledge, where we’ve examined platforms for authoring, annotation, mapping, and social networking. We’ve discussed both the possibilities that platforms may open up for inquiry, public engagement and scholarly communications and the risks that they may pose for privacy and nuanced humanistic analysis. Inspired by the questions raised by the Seminar, my colleague Sean Smith and I are studying a platform used by a number of digital humanists: GitHub. Digital humanists employ GitHub not only for code, but also for writing projects, syllabi, websites, and other scholarly resources. We’ll present our initial findings at Digital Humanities 2016, but I wanted to offer some background to the study, especially since some of you will soon be receiving emails from me inviting you to participate in it.

Initially I was interested in using GitHub for a case study of how we assess and select digital platforms. Even as many researchers (myself included) rely on digital platforms, I haven’t been able to find many clear rubrics for evaluating them. Building on Quinn Dombrowski’s recommendations for choosing a platform for a web project, we are looking at criteria such as functionality and ease of use. In previous work examining archival management systems, I learned how important it is to talk with users about their experience with tools, so we will be conducting a survey and interviews about GitHub. Sean and I also realized that GitHub itself provides valuable data about how people use it, such as information about collaboration, code re-use, and connections to others. Our study will thus include analysis of publicly available data about selected GitHub users and repositories. (Of course, there is significant prior work on this topic in fields such as social computing that we will draw upon.)

With this project, we are:

  1. Identifying digital humanists who have GitHub accounts. For the purposes of this study, we are looking at presenters at the last three Digital Humanities conferences and people affiliated with organizations that belong to centerNet (assuming that the information is publicly available). Of course, this method is imperfect: it misses digital humanists who didn’t attend the DH conferences or who aren’t affiliated with DH centers, and it may include some people who don’t really consider themselves digital humanists. But it’s a start.
  2. Contacting those whose email addresses are easily retrievable (e.g. available via GitHub) and:
    1. Giving them the opportunity to opt out of having their publicly available GitHub data included in our analysis and in the dataset that we plan to share at the end of the study. (Added 5/18/16: To be extra careful, we plan to anonymize this dataset.)
    2. Inviting them to take a brief survey about their usage and opinions of GitHub
    3. Inviting them to participate in an interview

    We may also contact people whose emails aren’t in the GitHub data but are otherwise available.

  3. Analyzing GitHub data from our dataset to gain insight into how digital humanists use GitHub.
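
As a rough illustration of the last step, here is a minimal Python sketch of how records about public repositories might be pseudonymized and summarized before sharing. It is not the study's actual pipeline: the field names (`owner`, `language`, `forks`), the sample records, and the salt are all invented for the example. Replacing logins with salted hashes is also only pseudonymization, so a dataset that is truly meant to be anonymous would need further review.

```python
import hashlib
from collections import Counter

def pseudonymize(login, salt):
    """Replace a login with a short, stable pseudonym derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + login).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]

def summarize(repos, salt):
    """Count repositories per language and total forks per pseudonymized owner."""
    languages = Counter()
    forks_by_owner = Counter()
    for repo in repos:
        owner = pseudonymize(repo["owner"], salt)
        # Repositories holding syllabi or essays may have no detected language.
        languages[repo["language"] or "none"] += 1
        forks_by_owner[owner] += repo["forks"]
    return languages, forks_by_owner

# Invented sample records (hypothetical fields, not real GitHub data).
sample_repos = [
    {"owner": "example-user-a", "language": "Python", "forks": 3},
    {"owner": "example-user-a", "language": None, "forks": 1},
    {"owner": "example-user-b", "language": "Python", "forks": 0},
]

languages, forks_by_owner = summarize(sample_repos, salt="demo-salt")
print(languages.most_common())
print(forks_by_owner.most_common())
```

Keeping the salt private means the same person maps to the same pseudonym across the dataset, which preserves patterns of collaboration and re-use without exposing account names.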

We want to conduct this study openly while at the same time respecting privacy. In conducting interviews for past studies, I’ve been frustrated that I can’t publicly identify and credit people who have made brilliant comments because of the promise of confidentiality. So we’re giving interviewees the option to make all or some of their interview notes public, but of course they can instead keep the notes private and remain anonymous. Survey data will be anonymized but ultimately shared.

Here are important documents related to our study:

I welcome feedback and questions about this study. I hope that it will contribute to developing criteria for evaluating platforms like GitHub and offer insights into how digital humanities researchers and developers work.

Submit Your Proposal for the Texas Digital Humanities Consortium One-Day Conference

I love the open, freewheeling conversations commonly found at THATCamps, but I sometimes wish that some sessions were more grounded in specificity, and that participants could get CV-worthy credit for leading them. At the Texas Digital Humanities Consortium’s May 27 mini-conference, we aim to mash up the best of THATCamp and traditional conferences: to provide a forum where a researcher or group of researchers will present their work for 15 minutes and then lead the participants in discussion or experimentation inspired by the presentation for the rest of the hour. We hope that this hybrid approach will give presenters the opportunity to share their work, get credit for it, and receive feedback on it, and give participants the chance to explore issues raised by the session and generate new insights. This approach resembles one of my favorite class formats: begin with a brief lecture to establish the context, then launch into a dynamic discussion to allow for deeper exploration. For example, presenters might discuss a project to create a digital audio archive, then facilitate a discussion about challenges such as annotation and digital preservation. Or a session might focus on a GIS project to map patterns of oppression in a particular region, opening up into a conversation about how to deal with uncertainty in data and include the perspectives of oppressed communities. We’re open to a variety of approaches. All proposals will undergo peer review, which will ensure the quality of the conference. Please see the CFP at https://conferences.tdl.org/tcdl/index.php/TCDL/index/pages/view/txdhc

The Texas Digital Humanities Consortium is organizing this mini-conference in collaboration with the fine folks at the Texas Conference on Digital Libraries (TCDL); it will be held immediately after TCDL at the Commons Learning Center on the J.J. Pickle Research Campus in Austin, Texas. We intend to keep the mini-conference to about 50 registrants, which should allow for rich conversation and networking. Through the event, we hope to deepen connections among scholars, librarians, cultural heritage professionals, technologists and graduate students.

The deadline for proposals is coming up soon on February 12, 2016 (note the new deadline). Feel free to send any questions to lisamspiro@gmail.com, and please help spread the word about the event. We look forward to some terrific proposals.

[cross-posted to TXDHC]

Exploring Digital Humanities and Media History

Some of my favorite conferences are those that are outside my field, since they expose me to new perspectives and enable me to meet new people. Such was the case with the ArcLight Symposium hosted in May by Concordia University. Sponsored by Project ArcLight, a Concordia/University of Wisconsin project to develop web-based tools for analyzing the rise of American media (movies, radio, TV, newspapers) in the 20th century, the symposium brought together film and media historians with digital humanists to explore the possibilities and pitfalls of digital methods for media history. Funded through a Digging into Data grant, the project builds upon the Media History Digital Library (MHDL), which contains 1.5 million pages of digitized trade journals, fan magazines, and other public domain media periodicals.

While some media historians apologized for not being particularly savvy about digital methods, some digital humanists (OK, this one) confessed to having a limited knowledge of film history. But Mark Williams rightly suggested that doing work in digital humanities requires moving outside your comfort zone, and the conference was the richer for people making such leaps. I was reminded of Elijah Meeks’ suggestion that “interloping, more than computational approaches or the digital broadly construed as the object of study, defines digital humanities.” Figuring out new methods or delving into unfamiliar subject areas necessitates interloping.

Rather than summarizing each paper presented at the symposium, I will highlight what emerged as important themes. See also Charlotte Fillmore-Handlon’s conference summary.

Core Principles

  • The importance of archives, both physical and digital, to media history research. Such research often requires returning to the original record and paying attention to the material object. For example, as Haidee Wasson studies the history of portable film projectors, she needs to examine the original objects so that she can get a sense of their heft and design. At the same time, digital archives open up rich possibilities for discovering relevant resources and patterns, analyzing them, and sharing them. For example, the Media Ecology Project aims to enable researchers to access resources from media archives, create their own annotations, collections, and publications, and contribute metadata back, using tools such as Mediathread, Scalar and onomy.org.
  • The importance of thinking historically about primary source materials—to understand, for example, the structure, design and cultural history of newspapers, as Paul Moore and Sandra Gabriele pointed out. Likewise, we need to pay attention to the features of digital objects. Ryan Cordell emphasized that we should view digital resources as something entirely different than print, so that a digitized newspaper is not a duplicate of a print edition, but its own typeset edition.
  • The need to consider scope: I referred to Josh Sternfeld’s recommendation that historians deal with digital abundance by extending the principles of scope and provenance into the digital environment. In conducting analysis, it’s important to keep in mind what’s included–and what’s not–in the corpus. As Greg Waller pointed out, studying representation means looking at multiple forms of media—not just film, but sheet music, postcards, literature, etc. Where do you draw the boundaries?

Challenges

  •  Intellectual property: Despite digital plenty, researchers are often limited by paywalls and proprietary databases. MHDL tried to work with a commercial database, but was rebuffed; the global film data that underlies Deb Verhoeven’s Kinomatics research is expensive and cannot be shared. These challenges point to the need to support open projects such as MHDL whenever possible. An audience member urged greater risk-taking, as paranoia about intellectual property can lead scholars, librarians and institutions to hesitate in sharing cultural materials that are public goods.
  •  Ambiguity of humanities data. As many researchers have suggested, humanities data is often messy, dynamic and ambiguous. For example, Deb Verhoeven noted that the locations, size and status of theaters change over time. Likewise, Derek Long acknowledged the challenges of disambiguating film titles that include common words such as “war.”
  •  Reductiveness of ontologies. David Berry asked what happens when information is translated into an ontology, suggesting that DH imports instrumental methods into interpretation. Deb Verhoeven argued that imposing ontologies on diverse humanities data raises both practical and ethical issues. For example, with a typical ontology you can’t say that things aren’t related, and you can’t see who made what assertions. Hence HuNI (Humanities Networked Infrastructure), which brings together 31 Australian cultural datasets, eschews traditional ontologies. Instead, users generate connections through a vernacular linking process, using either free text or already established categories. They can also explore linkages created by others.

Approaches

  •  Searching as frustrating but also fundamental. Searching enables scholars to frame an inquiry, get a sense of what an archive might contain, and discover relevant materials. Eric Hoyt pointed to problems with search, including the risks of confirmation bias and of being overwhelmed by results, but also suggested that search is easily understood and widely used by most scholars. Scaled entity search brings greater power to search, allowing researchers to compare hundreds of entities (such as names, film titles, or radio station call letters) across the corpus and then generate visualizations to explore patterns. For example, you can compile a list of the most frequently mentioned people in silent film (which often turn out to be those who also headed production companies). ArcLight will be released later this summer.
    To create the entity list for film history, ArcLight uses its Early Film Credits (EFC) dataset. This dataset grew out of a 1996 database of American Film Personnel and Company Credits, 1908-1920, which is itself based on a 1976 book. As Derek Long showed, EFC enables you to, for example, generate a list of the number of films produced by a director in a particular year or the number of films made by different production companies (revealing the dominance of a few companies and the “long tail,” as most companies made only a few films).
  •  Pattern matching. As Ryan Cordell noted, searching doesn’t work for everything—for example, you can’t run a search for “show me all instances of reprinting in 19th century newspapers.” Much humanities work involves identifying and interpreting patterns; digital tools can support pattern matching on a much larger scale. For example, the Viral Text Project uses sophisticated algorithms to detect matches in thousands of pages of 19th century newspapers. For a human to take on this work would be nearly impossible. But the computer can reveal significant repeating patterns across a corpus like Chronicling America, enabling scholars to detect the circulation of ideas, debates over the authorship of a sentimental poem, and much more.
  •  Working at multiple scales. As Bobby Allen pointed out, the level of zoom “makes some relations visible but obscures others.”  Paul Moore and Sandra Gabriele spoke to the importance of middle-range reading, or “scalable reading.” Scalable reading involves moving from the macro to the middle to the micro—for example, looking at patterns of circulation at a distance, “following material links across media, form and genre” at the mid-range, and examining the “symbolic order” of a newspaper page close up.
  • Experimentation and iteration: Several speakers used variations of the term “experiment” to describe digital methods. For instance, Deb Verhoeven emphasized that working with the Kinomatics Project’s “big data” documenting the flow of international film across space and time requires experimentation and iteration. Haidee Wasson noted the “experimentalism” of trying out different search strategies and posing different questions of databases. Charles Acland offered an important caution as he suggested that experimentation in DH needs to be influenced by literary theory.
  • The need to engage public audiences. Bobby Allen and Deb Verhoeven spoke to the importance of thinking of humanities work as public goods. Bobby Allen suggested that researchers should “dig where you stand,” but connect local archives to the network. For example, the UNC Digital Innovation Lab’s Digital Loray project enables the public to explore the history of a textile mill in Gastonia, NC, using digital tools to share stories, maps and timelines and engage public audiences.

    Crowdsourcing represents one form of public engagement, but some raised concerns. What distinguishes public expertise from academic? Is crowdsourcing exploitative?

  •   Using visualization to represent complexity: How do you represent the magnitude of global cinema? Laura Horak sketched out a project to create flow maps to study the global circulation of film, describing different ways of visualizing such data and how such visualizations can shape understanding.
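
To make two of the approaches above more concrete, here is a minimal Python sketch of scaled entity search (counting how many pages mention each entity on a list) and of simple pattern matching for reprints (finding word sequences shared by two texts). It is an illustration only, not the method used by ArcLight or the Viral Text Project: the function names, the sample pages, and the choice of four-word sequences are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def entity_counts(corpus, entities):
    """Count how many documents mention each entity as a whole word or phrase."""
    counts = Counter()
    for text in corpus.values():
        # Pad with spaces so that "war" does not match inside "warfare".
        padded = " " + " ".join(tokenize(text)) + " "
        for entity in entities:
            phrase = " " + " ".join(tokenize(entity)) + " "
            if phrase in padded:
                counts[entity] += 1
    return counts

def shared_ngrams(text_a, text_b, n=4):
    """Return the set of n-word sequences that appear in both texts."""
    def ngrams(text):
        tokens = tokenize(text)
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return ngrams(text_a) & ngrams(text_b)

# Invented sample pages with placeholder names (not drawn from any real archive).
corpus = {
    "page_1": "Jane Doe signs a new contract. Jane Doe will direct.",
    "page_2": "Acme Pictures announces a new war drama.",
    "page_3": "A new war drama opens this week, and Jane Doe attends the premiere.",
}
entities = ["Jane Doe", "Acme Pictures", "war"]

print(entity_counts(corpus, entities).most_common())
print(shared_ngrams(corpus["page_2"], corpus["page_3"]))
```

Even this toy version shows why disambiguation is hard: a common word such as “war” matches every page that uses it, whether or not the page refers to a film title.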

While participants used examples specific to media history, many of the concerns and approaches explored in the symposium have broader relevance. I was reminded of the importance of cultivating a critical awareness of methods and engaging in the kind of interdisciplinary dialogue that the ArcLight symposium fostered.

Disclosure: Most of my travel costs were covered by the symposium.

Rev, 6/13/15: Corrected spelling error.

Update on the Texas Digital Humanities Consortium

Organizations in the Boston area, Southern California, and New York City help area digital humanists connect with each other, and now Texas has its own DH group. The Texas Digital Humanities Consortium (TXDHC) aims to enable Texas digital humanists to share knowledge, learn new skills and methods, and collaborate on research and educational projects. After a terrific first conference hosted by the University of Houston in April of 2014, the second Texas Digital Humanities conference will take place at the University of Texas-Arlington on April 9-11, 2015, with keynotes from Alan Liu, Adeline Koh and George Siemens. (Submit your paper proposal by January 10.) Thanks to the work of Matt Christy at Texas A&M, the TXDHC website (built on Commons in a Box) allows members to create profiles, set up groups, participate in forums, and more. The TXDHC Steering Committee (which includes me, Jennifer Hecker, Laura Mandell, Rafia Mirza, Charlotte Nunes and Andrew Torget) is shaping the organization and planning upcoming events, including a virtual workshop. The TXDHC’s next online general meeting will take place on Thursday, December 4 from 3-4 p.m. and will include lightning talks by Tanya Clement and Charlotte Nunes, updates on the consortium’s activities, and an opportunity to share announcements and questions.

Interested in participating in the TXDHC? Sign up for the listserv, create an account on the website, and come to a meeting.  TXDHC is an informal, collaborative group; there are no membership fees or bureaucratic structures. Please get in touch with me (lisamspiro[at]gmail[dot]com) if you have questions or suggestions. As a scrappy new organization, TXDHC depends on the energy and ideas of its members.

Shaping (Digital) Scholars: Design Principles for Digital Pedagogy

I’m pleased to be offering a workshop at “Digital Pedagogy and the Undergraduate Experience: An Institute,” hosted by the University of Toronto Scarborough. My presentation, “Shaping (Digital) Scholars: Design Principles for Digital Pedagogy” [pdf], offers a framework for designing assignments and other learning activities that help students develop digital fluencies and cultivate expertise in digital scholarship. I sketch out three principles for digital pedagogy: hands-on/minds-on learning; networked, collaborative learning; and play. To make the principles concrete and furnish inspiration, I offer a couple of examples under each category (including some of my favorites from previous talks). I also look at some of the challenges facing this approach to teaching, such as evaluating student work and helping students develop technology skills. The workshop concludes with a hands-on, collaborative activity to design an assignment that realizes at least one of the principles of digital pedagogy.

Creating the Texas Digital Humanities Consortium

At the Inaugural Texas Digital Humanities Consortium Conference (TXDHC) on April 12, Elijah Meeks suggested that “interloping, more than computational approaches or the digital broadly construed as the object of study, defines digital humanities.” Indeed, as researchers pursue their curiosity and explore new methods, they often venture into unfamiliar territory. But there they may find others eager to experiment with new approaches and share what they know (or, as Elijah puts it, “a vibrant community of practice,” such as what we see in neogeography). This open, collaborative ethos characterized the TXDHC conference. Ably organized and hosted by Cameron Buckner from the University of Houston (with co-sponsorship from Rice and Texas A&M), the conference attracted participants from across Texas as well as from California, Alabama, Louisiana, and Switzerland. (See Geoffrey Rockwell’s great conference notes.) I think the conference met its fundamental goal of building community among (and beyond) Texas digital humanists by providing a forum where people could present their work, make connections with fellow interlopers, and learn new skills, such as at the hackfest facilitated by Elijah. By bringing in knowledgeable and engaged keynote speakers, the conference exposed participants to cutting-edge work and enabled them to interact with experts happy to offer advice about projects and pose stimulating questions. Already a colleague from Rice who attended the conference reports that she has made progress on her project thanks to help from Elijah, and I bet others can share similar stories.

The conference functioned as the first event hosted by the Texas Digital Humanities Consortium, a new organization that aims to support collaboration among digital humanists in Texas. The consortium (and conference) emerged from a conversation that Cameron Buckner, Laura Mandell (Texas A&M) and I had in October 2013 in which we discussed the growth of digital humanities across the state and the opportunity to band together in promoting DH research and education. We roped in a few more universities, including the University of Texas, the University of North Texas, St. Edward’s, and the University of Texas at Arlington. But we want to extend the consortium further, to create an open, participatory organization that includes liberal arts colleges, universities, community colleges, libraries, museums, and archives. At the conference, I facilitated a business meeting devoted to organizing the new consortium. While I worried that few people would show up to an 8:30 a.m. meeting on a Saturday, I was impressed by how many came and how engaged they were. We had participants from Southwestern, Prairie View A&M, and the University of Texas at Dallas as well as from Rice, UH, UT Austin, St. Edward’s, and UT Arlington. Since Texas is such a big state, we don’t necessarily have the advantage of close geographical proximity, but we do have a diverse and lively community, exciting research and educational projects, and a desire to do as much as we can together.

In the course of a very productive hour, we developed a framework for the consortium.  We plan to do the following:

  • Establish a Commons in a Box web site where members of the consortium can share information about researchers, projects, events, and opportunities (such as internships). Laura Mandell and her colleagues at Texas A&M’s Initiative for Digital Humanities, Media, and Culture (IDHMC) generously offered to set up the site. Contact Laura if you would like to be put on the mailing list for the group.
  • Organize a monthly virtual meeting to plan activities, share ongoing research, and build community.
  • Explore creating internship opportunities for graduate students (and potentially undergraduate students as well). Those looking for students to assist with DH projects can write short descriptions of these projects and share them on the TXDHC web site.
  • Host an annual conference. We would like to hold the next TXDHC conference in the spring of 2015, perhaps in the Dallas/Fort Worth area.
  • Provide informal opportunities to interact, such as by hosting local reading groups and letting each other know about lectures and other events. Note that Texas A&M will host THATCamp DHCollaborate on May 16-17, 2014.
  • Explore potential advocacy activities.

We encourage others interested in digital humanities from across Texas to join us. Currently the consortium operates as a “coalition of the willing,” with decision making by consensus. There are no membership fees or formal structures; to participate, you just need to indicate interest and be willing to contribute your ideas and time. If you are a Texas digital humanist, please fill out a brief survey to indicate your interest in the consortium and offer input into its activities. Interlopers welcomed!