Bode, Katherine. A World of Fiction: Digital Collections and the Future of Literary History. Ann Arbor: University of Michigan Press, 2018.

“Data-rich literary history”

“By investigating the cultural and material contexts in which literature was produced, circulated, and read, data-rich literary history seeks to challenge and move beyond the literary canons that organize perceptions of past literature in the present. This, then, was my intention: to move from question, to opportunity, to answers and, in so doing, to advance a noncanonical, data-rich, and transnational history of the literary, publishing, and reading cultures of nineteenth-century Australia. But working with Trove interrupted that neat sequence. Instead of simply answering questions, that engagement produced its own, pressing questions about the nature and implications of literary history conducted with mass-digitized collections and the literary data derived from them.” (3)

“When I looked to other data-rich literary history projects to see how they were meeting this challenge I found that the complex relationships between documentary record, digitization, data curation, and historical analysis were not fully articulated.” (3)

First argument: “whatever computational methods allow us to do with ever-growing collections of literary data, the results cannot advance knowledge if the literary data analyzed do not effectively represent the historical context we seek to understand.” (3-4). Hence the need for a “scholarly edition of a literary system: that is, a model of literary works that were published, circulated, and read—and thereby accrued meaning—in a specific historical context, constructed with reference to the history of transmission by which documentary evidence of those works is constituted” (4)

Second argument: “demonstrates how analysis of a scholarly edition of a literary system can revolutionize knowledge of literary history as well as the frameworks and concepts through which we perceive past literature in the present” (4)

Chapter 1 argues that “distant reading and macroanalysis offer an inadequate foundation for data-rich literary history because they neglect the activities and insights of textual scholarship: the bibliographical and editorial practices that literary scholars have long relied on to interpret and represent the historical record” (5)

“Such inattention to the historical and material nature of the documentary record is inherited from, not in opposition to, the New Criticism and its core method of close reading.” (5)

“distant reading and macroanalysis construct and seek to extract meaning from models of literary systems that are essentially deficient: inadequate for representing the ways in which literary works existed and generated meaning in the past” (5)

Chapter 2 makes the case for, and describes, a “scholarly edition of a literary system” (6)

Simply turning to “modeling” cannot account for the disjunction between the historical record as mediated and the historical claims made on its basis

Bode created a scholarly edition for this book, available through the University of Michigan Press website

Chapter 3 describes the history of transmission of this “scholarly edition” / dataset

“Although it is often understood as such, literary history is not solely an analytical and critical enterprise; it has always been bound up in—enabled and produced by—the knowledge infrastructure that it creates and employs. Equally, although digital humanities is frequently presented as a methodological and infrastructural endeavor, it is just as much a historical and analytical one. The approaches and infrastructure developed and employed in that field have histories, just as the conceptual entities examined—including literary data and computational models—are critical and interpretive constructs. Confronting the challenges and possibilities that new digital technologies and resources bring to literary history and to the humanities broadly requires a mutually informative relationship of traditional and digital scholarship. Only such a relationship can enable the emergence and consolidation of the new forms of evidence, analysis, and argumentation required by the contemporary conditions of cultural research.” (13)

Chapter 1

“distant reading and close reading are not opposites. These approaches are united by common neglect of textual scholarship: the bibliographical and editorial approaches that literary scholars have long depended on to negotiate the documentary record. Because of this neglect, like the New Critics before them, Moretti and Jockers cannot benefit from the critical and historical insights presented by editorial and bibliographical productions. As a consequence, both authors conceive and model literary systems in reductive ways and offer ahistorical arguments about the existence and interconnections of literary works in the past.” (19)

Critique of Moretti and Jockers, showing that they are not fundamentally interested in their databases or the mediated infrastructure underlying them

Moretti and Jockers have not been open with their datasets

“But literary works are not defined by a single time and place, and collecting them together in those abstract terms does not represent the interconnections that constitute literary systems.” (27)

William St Clair: “parade of authors” vs. “parliament of texts” approaches

“While textual scholars such as Johanna Drucker (“Entity”), Paul Eggert (Securing), and Jerome McGann (New) thereby conceptualize literary works as events—unfolding over time and space and gaining different meanings in the relationships thereby formed—Moretti and Jockers construct literary systems as composed of singular and stable entities and imagine that this captures the complexity of such systems. In fact, because their datasets miss most historical connections between literary works, their analyses rely on basic features of new literary production to constitute both the literary phenomenon requiring explanation and the explanation for it.” (28)

Close reading of the New Critical variety shares much with distant reading: both extract texts from their publishing contexts, philology, textual history, and archives (32)

“In projecting textual singularity onto a historical period characterized by documentary multiplicity, the close readings these critics produce obscure the historical production and reception of this literary work even as they propose to emphasize that context.” (33)

“When a gap exists between the contemporary object assessed and the historical object it supposedly represents—and when the critic is unaware or dismissive of that gap—no degree of nuance or care in the reading can supply that historical meaning. Herein lies the fundamental problem with proposing to integrate close and distant reading as the obvious way forward for research in literary history.” (34)

“What data-rich literary history needs is an object capable of representing literary systems—as manifestations of literary works that existed and generated meaning in relation to each other in the past—while managing the documentary record’s complexity, especially as it is manifested in new digital knowledge infrastructure. The lack of such an object, not the fundamental opposition of data and literature, is the real reason it has proven so difficult, in practice if not in theory, to integrate “traditional and computational methods” for the purposes of historical investigation (Gibbs and Cohen 70).” (34-5)

Chapter 2

Turning to McGann and Eggert 2009

“Modeling recognizes data as constructed, but only by the individual scholar; it does not provide a mechanism to interrogate the history of transmission preceding and perpetuated by the scholar’s engagement with the documentary record, including in its mass-digitized forms. The framework of the scholarly edition meets that challenge, presenting a structure to negotiate the incomplete and transactional nature of the documentary record and to represent the outcomes of that process.” (38)

“One contribution this book aims to make is to expand and enrich the application of modeling for data-rich literary history by connecting it explicitly to descriptive bibliography. Moving beyond basic enumerative information, this approach investigates relationships of production and reception by describing, manipulating, and refining—that is, by modeling—details of the documentary forms and historical relationships of literary works in the past.” (41)

“Integrating modeling and descriptive bibliography—or more specifically, using descriptive bibliography as a framework for modeling and modeling as a method for extending bibliographical knowledge—can support detailed and nuanced representations of literary systems that explore the existence of literary works in the past and support future investigations of those works and systems.” (42)

Four features “that I believe should also underpin the modeling of literary systems in data-rich literary history”: “First is a critical assessment of the relationship between the historical context analyzed and the digital collection(s) used for analysis; second is detailed attention to the relationship between the documents included in the digital collection(s) and the terms in which they are represented; third is explicit discussion of the means by which data are extracted and modeled; and fourth is a published record of data arising from that extensive history of transmission.” (46)

“In not assessing whether and how well the data analyzed represents the historical context explored, these projects ultimately interpret the characteristics not of literary-historical systems but of particular components of our disciplinary infrastructure.” (48)

“Herein lies the fundamental importance of data publication for data-rich literary history: in expressing a materiality that no longer exists in any other form, it offers the only possible basis for conversation on shared premises. It is not enough to point to the mass-digitized collection or bibliographical database from which data were derived. The constitutive features of that entity have almost certainly altered. And even if the digital collection has not expanded (or contracted), data-rich literary history does not analyze the collection itself. It explores the effects of scholarly engagement with and interpretation of it.” (50)

“Applied to the literary system rather than the literary work, the scholarly edition provides a framework for investigating the history of transmission constitutive of the literary system modeled, justifying the selections and decisions made in that analysis, and publishing the outcomes.” (52)

“For a scholarly edition of a literary system, the critical apparatus details the history of transmission by which the existence and interconnections of literary works in the past are known. Much more than simply describing the construction of a dataset—something already offered by many data-rich literary history projects—this critical apparatus elaborates the complex relationships between the historical context explored, the disciplinary infrastructure employed in investigating that context, the decisions and selections implicated in creating and remediating the collection or collections, and the transformations wrought by the editor’s extraction, construction, and analysis of that data.” (53)

“The approach to data-rich literary history I am advocating does not take the path increasingly recommended for the field: of integrating scientific and social scientific measures of statistical uncertainty into historical analysis (Goldstone, “Distant”). Given that constructing literary data is a historical argument made in the context of a history of transmission—the effects of which are difficult to qualify, let alone to quantify—I do not see that any assessment of error is made more useful or concise by its numerical expression. Instead, in the intersection of a critical apparatus and curated dataset, the framework of the scholarly edition offers a theoretical and practical basis to model the relationships of production and reception that constitute historical literary systems, while assessing and managing the inevitable contingency of those relationships and of the documentary infrastructure through which we perceive them.” (57)

Chapter 3

Discussion of Trove’s history and criteria used for building the dataset

“Creating a reliable model of fiction in nineteenth-century Australian newspapers depends not only on querying a representative sample of newspapers but on identifying the extended fiction they published and translating it into data that signifies those publication events effectively.” (73)

Using a “paratextual method” to identify fiction, based on words that frequently appear in titles
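The keyword idea noted above can be sketched as a simple title filter. This is a minimal illustration under stated assumptions, not Bode's procedure: the indicator terms are hypothetical placeholders (these notes do not record her actual search terms), and each record is assumed to be a dict with a `title` key. A filter like this only yields candidates, which would presumably still need checking against the digitized pages.

```python
import re

# Hypothetical indicator terms: placeholders, not Bode's actual search terms.
INDICATOR_TERMS = {"chapter", "tale", "story", "serial", "novelist"}


def looks_like_fiction(title: str, terms: set[str] = INDICATOR_TERMS) -> bool:
    """Return True if any indicator term appears as a whole word in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return not words.isdisjoint(terms)


def flag_fiction(records: list[dict]) -> list[dict]:
    """Keep only records (assumed to have a 'title' key) whose title matches."""
    return [r for r in records if looks_like_fiction(r["title"])]


if __name__ == "__main__":
    sample = [
        {"title": "The Mystery of the Moor. Chapter XII."},
        {"title": "Shipping Intelligence"},
        {"title": "A Tale of the Bush"},
    ]
    for record in flag_fiction(sample):
        print(record["title"])
```

Whole-word matching is a deliberate choice in this sketch: it avoids false hits inside longer words, at the cost of missing variants such as plurals.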

Chapter 4

“Rather than a problem that prevents analysis, the thousands of authorless works identified in this project indicate—and demand new critical approaches to understanding—the fundamentally different conceptions of literary meaning and value operating in the past.” (86)

Problem of how to “count” anonymous and pseudonymous authorship

Table with different categories of authorship (89)

Attribution of authorship increases over the nineteenth century

Distinguishes gender and national identity as depicted in the newspapers (“inscribed genders and nationalities”) from authors’ actual genders and nationalities (95-6)

Chapter 5

“As mass-digitized collections become core disciplinary infrastructure for literary history, network analysis is increasingly used to explore the extensive datasets derived from them in order to investigate historical connections in literary culture. Network analysis is employed by many of the projects surveyed in chapters 1 and 2 (by Moretti and Jockers, and by scholars who depart from these authors’ approach to modeling literary systems). The reason for the method’s popularity is clear: its depiction of edges (relationships) between nodes (entities) resonates with a system-based understanding of print and literary culture, common in book history and periodical studies and foregrounded by data-rich literary history.” (125)

Network analysis is only as good as the data it is built on

“For projects based on mining mass-digitized collections, the considerable gaps in what is available to be modeled mean that network visualizations invariably present fictitious systems: arrangements that are a function of what has been digitized as much as, if not vastly more so than, how a literary system actually cohered and operated.” (126)
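The claim that network structure follows what has been digitized can be illustrated with toy data. In this sketch (an illustration, not an example from the book), two newspapers are linked when they published the same work; all work and newspaper names are hypothetical. Removing a single undigitized newspaper deletes that node and two of the three links.

```python
from itertools import combinations


def copublication_edges(publications):
    """Link two newspapers if they published at least one work in common.

    publications: iterable of (work, newspaper) pairs.
    Returns a set of alphabetically ordered (newspaper, newspaper) tuples.
    """
    papers_by_work = {}
    for work, paper in publications:
        papers_by_work.setdefault(work, set()).add(paper)
    edges = set()
    for papers in papers_by_work.values():
        for a, b in combinations(sorted(papers), 2):
            edges.add((a, b))
    return edges


# Hypothetical publication events in the historical literary system.
historical = [
    ("Work 1", "Paper A"), ("Work 1", "Paper B"), ("Work 1", "Paper C"),
    ("Work 2", "Paper B"), ("Work 2", "Paper C"),
]
# The same events seen through a collection in which Paper B was never digitized.
digitized = [(w, p) for w, p in historical if p != "Paper B"]

print(sorted(copublication_edges(historical)))
print(sorted(copublication_edges(digitized)))
```

The historical list yields three edges, while the digitized subset yields only the link between Paper A and Paper C: the visible network is a product of the collection as much as of the past.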

Critique of using statistical measures to “improve” the “accuracy” of such networks: “The probability measures needed to model systems based on highly incomplete datasets are at odds with the centrality of documentary evidence to historical argument.” (130)

Chapter 6

“This chapter therefore approaches the question of whether Australian writing demonstrates features distinct from imported fiction in the first instance by using an integrated application of two machine-learning methods: topic modeling and decision trees.” (158)

“We might expect gendered tendencies in fiction to be more apparent than national (or protonational) ones. But word patterns emerge as more strongly indicative of whether an author is American, Australian, or British than male or female.” (159)


“Underlying these arguments is an intention to extend a transnational consciousness to data-rich literary history. With notable exceptions, there is a tendency in that field to treat large corpuses of American and British literature as a universal literary record. In exploring—and offering for exploration by others—a digitized body of works from around the world, published in the Australian colonies, I hope to disrupt the implicit national biases and globalizing impulses present in data-rich literary history.” (200)

“Far from an esoteric preoccupation, textual scholarship has always been a response to real-world conditions and constraints: to the need to identify, understand, and manage gaps in the documentary record so as to provide an effective and explicit foundation for current and future interpretations and insights. Notwithstanding the influence of researchers such as Moretti and Jockers on academic and public perceptions of digital humanities, this space of mediation, collection, translation, and curation—of understanding and managing the constraints presented by the real world—is where much of the field actually sits.” (208)