Some of my favorite conferences are those that are outside my field, since they expose me to new perspectives and enable me to meet new people. Such was the case with the ArcLight Symposium hosted in May by Concordia University. Sponsored by Project ArcLight, a Concordia/University of Wisconsin project to develop web-based tools for analyzing the rise of American media (movies, radio, TV, newspapers) in the 20th century, the symposium brought together film and media historians with digital humanists to explore the possibilities and pitfalls of digital methods for media history. Funded through a Digging into Data grant, the project builds upon the Media History Digital Library (MHDL), which contains 1.5 million pages of digitized trade journals, fan magazines, and other public domain media periodicals.
While some media historians apologized for not being particularly savvy about digital methods, some digital humanists (OK, this one) confessed to having a limited knowledge of film history. But Mark Williams rightly suggested that doing work in digital humanities requires moving outside your comfort zone, and the conference was the richer for people making such leaps. I was reminded of Elijah Meeks’ suggestion that “interloping, more than computational approaches or the digital broadly construed as the object of study, defines digital humanities.” Figuring out new methods or delving into unfamiliar subject areas necessitates interloping.
Several concerns and approaches recurred across the presentations:
- The importance of archives, both physical and digital, to media history research. Such research often requires returning to the original record and paying attention to the material object. For example, as Haidee Wasson studies the history of portable film projectors, she needs to examine the original objects so that she can get a sense of their heft and design. At the same time, digital archives open up rich possibilities for discovering relevant resources and patterns, analyzing them, and sharing them. For example, the Media Ecology Project aims to enable researchers to access resources from media archives, create their own annotations, collections, and publications, and contribute metadata back, using tools such as Mediathread, Scalar and onomy.org.
- The importance of thinking historically about primary source materials in order to understand, for example, the structure, design and cultural history of newspapers, as Paul Moore and Sandra Gabriele pointed out. Likewise, we need to pay attention to the features of digital objects. Ryan Cordell emphasized that we should view digital resources as something entirely different from print, so that a digitized newspaper is not a duplicate of a print edition, but its own typeset edition.
- The need to consider scope: I referred to Josh Sternfeld’s recommendation that historians deal with digital abundance by extending the principles of scope and provenance into the digital environment. In conducting analysis, it’s important to keep in mind what’s included, and what’s not, in the corpus. As Greg Waller pointed out, studying representation means looking at multiple forms of media: not just film, but also sheet music, postcards, literature, etc. Where do you draw the boundaries?
- Intellectual property: Despite digital plenty, researchers are often limited by paywalls and proprietary databases. MHDL tried to work with a commercial database, but was rebuffed; the global film data that underlies Deb Verhoeven’s Kinomatics research is expensive and cannot be shared. These challenges point to the need to support open projects such as MHDL whenever possible. An audience member urged greater risk-taking, as paranoia about intellectual property can lead scholars, librarians and institutions to hesitate in sharing cultural materials that are public goods.
- Ambiguity of humanities data. As many researchers have suggested, humanities data is often messy, dynamic and ambiguous. For example, Deb Verhoeven noted that the locations, size and status of theaters change over time. Likewise, Derek Long acknowledged the challenges of disambiguating film titles that include common words such as “war.”
- Reductiveness of ontologies. David Berry asked what happens when information is translated into an ontology, suggesting that DH imports instrumental methods into interpretation. Deb Verhoeven argued that imposing ontologies on diverse humanities data raises both practical and ethical issues. For example, with a typical ontology you can’t say that things aren’t related, and you can’t see who made what assertions. Hence HuNI (Humanities Networked Infrastructure), which brings together 31 Australian cultural datasets, eschews traditional ontologies. Instead, users generate connections through a vernacular linking process, using either free text or already established categories. They can also explore linkages created by others.
- Searching as frustrating but also fundamental. Searching enables scholars to frame an inquiry, get a sense of what an archive might contain, and discover relevant materials. Eric Hoyt pointed to problems with search, including the risks of confirmation bias and of being overwhelmed by results, but also suggested that search is easily understood and widely used by most scholars. Scaled entity search brings greater power to search, allowing researchers to compare hundreds of entities (such as names, film titles, or radio station call letters) across the corpus and then generate visualizations to explore patterns. For example, you can compile a list of the most frequently mentioned people in silent film (which often turn out to be those who also headed production companies). ArcLight will be released later this summer.
To create the entity list for film history, ArcLight uses its Early Film Credits (EFC) dataset. This dataset grew out of a 1996 database of American Film Personnel and Company Credits, 1908-1920, which is itself based on a 1976 book. As Derek Long showed, EFC enables you to, for example, generate a list of the number of films produced by a director in a particular year or the number of films made by different production companies (revealing the dominance of a few companies and the “long tail,” as most companies made only a few films).
- Pattern matching. As Ryan Cordell noted, searching doesn’t work for everything—for example, you can’t run a search for “show me all instances of reprinting in 19th century newspapers.” Much humanities work involves identifying and interpreting patterns; digital tools can support pattern matching on a much larger scale. For example, the Viral Text Project uses sophisticated algorithms to detect matches in thousands of pages of 19th century newspapers. For a human to take on this work would be nearly impossible. But the computer can reveal significant repeating patterns across a corpus like Chronicling America, enabling scholars to detect the circulation of ideas, debates over the authorship of a sentimental poem, and much more.
- Working at multiple scales. As Bobby Allen pointed out, the level of zoom “makes some relations visible but obscures others.” Paul Moore and Sandra Gabriele spoke to the importance of middle-range reading, or “scalable reading.” Scalable reading involves moving from the macro to the middle to the micro—for example, looking at patterns of circulation at a distance, “following material links across media, form and genre” at the mid-range, and examining the “symbolic order” of a newspaper page close up.
- Experimentation and iteration: Several speakers used variations of the term “experiment” to describe digital methods. For instance, Deb Verhoeven emphasized that working with the Kinomatics Project’s “big data” documenting the flow of international film across space and time requires experimentation and iteration. Haidee Wasson noted the “experimentalism” of trying out different search strategies and posing different questions of databases. Charles Acland offered an important caution as he suggested that experimentation in DH needs to be influenced by literary theory.
- The need to engage public audiences. Bobby Allen and Deb Verhoeven spoke to the importance of thinking of humanities work as public goods. Bobby Allen suggested that researchers should “dig where you stand,” but connect local archives to the network. For example, the UNC Digital Innovation Lab’s Digital Loray project enables the public to explore the history of a textile mill in Gastonia, NC, using digital tools to share stories, maps and timelines and engage public audiences.
Crowdsourcing represents one form of public engagement, but some raised concerns. What distinguishes public expertise from academic? Is crowdsourcing exploitative?
- Using visualization to represent complexity: How do you represent the magnitude of global cinema? Laura Horak sketched out a project to create flow maps to study the global circulation of film, describing different ways of visualizing such data and how such visualizations can shape understanding.
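To make the idea of scaled entity search from the list above a bit more concrete, here is a minimal Python sketch. It is not ArcLight's implementation, which I haven't seen; it only illustrates the general approach of taking a list of entities and counting how many pages in a corpus mention each one. The function name `scaled_entity_search`, the page identifiers, the sample text and the entity names are all invented for illustration.

```python
import re
from collections import Counter


def scaled_entity_search(pages, entities):
    """Count, for each entity, how many pages mention it at least once.

    `pages` maps a page identifier to its text; `entities` is a list of
    names or titles. Both are hypothetical stand-ins for a real corpus
    and a real entity list.
    """
    counts = Counter()
    for entity in entities:
        # Case-insensitive whole-word match; re.escape protects any
        # punctuation that appears inside a name or title.
        pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
        counts[entity] = sum(1 for text in pages.values() if pattern.search(text))
    return counts


# Invented sample data, not drawn from the MHDL.
pages = {
    "p1": "Jane Example signs with Acme Pictures. Jane Example will also direct.",
    "p2": "Acme Pictures announces a war drama starring Jane Example.",
    "p3": "Exhibitors report strong business for the latest Acme Pictures release.",
}
entities = ["Jane Example", "Acme Pictures", "War"]

counts = scaled_entity_search(pages, entities)
for name, page_count in counts.most_common():
    print(name, page_count)
```

Even this toy version runs into the ambiguity that Derek Long described: the entity "War" matches the phrase "war drama," whether or not a film by that title is meant. A real tool would need some way to disambiguate such titles, and the resulting counts could then feed the kind of visualizations discussed above.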
While participants used examples specific to media history, many of the concerns and approaches explored in the symposium have broader relevance. I was reminded of the importance of cultivating a critical awareness of methods and engaging in the kind of interdisciplinary dialogue that the ArcLight symposium fostered.
Disclosure: Most of my travel costs were covered by the symposium.
Rev, 6/13/15: Corrected spelling error.