Jockers 2013

From Whiki

Jockers, Matthew. Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013.


"The questions we may now ask were previously inconceivable, and to answer these questions requires a new methodology, a new way of thinking about our object of study." (4)


"Science has welcomed big data and scaled its methods accordingly. With a huge amount of digital-textual data, we must do the same. Close reading is not only impractical as a means of evidence gathering in the digital library, but big data render it totally inappropriate as a method of studying literary history." (7)
"The literary scholar of the twenty-first century can no longer be content with anecdotal evidence, with random 'things' gathered from a few, even 'representative,' texts. We must strive to understand these things we find interesting in the context of everything else, including a mass of possibly 'uninteresting' texts." (8)
"Like it or not, today's literary-historical scholar can no longer risk being just a close reader: the sheer quantity of available data makes the traditional practice of close reading untenable as an exhaustive or definitive method of evidence gathering. Something important will inevitably be missed." (9)
"More interesting, more exciting, than panning for nuggets in digital archives is the ability to go beyond the pan and exploit the trommel of computation to process, condense, deform, and analyze the deeper strata from which these nuggets were born, to unearth, for the first time, what these corpora really contain." (9-10)


computational text analysis as "by all accounts the foundation of digital humanities and its deepest root" (15)

digital humanities tends to prefer web-based applications; but "the web is not yet a great platform upon which to build or deliver tools for doing text analysis 'at scale'" (18)


distant reading --> macroanalysis; "places the emphasis on the systematic examination of data" (25)

"This is no longer reading that we are talking about -- even if programmers have come to use the term read as a way of naming functions that load a text file into computer memory. Broad attempts to generalize about a period or about a genre by reading and synthesizing a series of texts are just another sort of microanalysis. This is simply close reading, selective sampling, of multiple 'cases'; individual texts are digested, and then generalizations drawn. It remains a largely qualitative approach." (25)
"I am suggesting a blended approach." (26)
"the macroscale perspective should inform our close readings of the individual texts by providing, if nothing else, a fuller sense of the literary-historical milieu in which a given book exists." (28)
"The larger argument I wish to make is that the study of literature should be approached not simply as an examination of seminal works but as an examination of an aggregated ecosystem or 'economy' of texts." (32)


using metadata in bibliographies to generate new knowledge about publication histories
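A minimal sketch of the kind of tally that turns bibliographic metadata into a publication history, written in Python. The records, titles, and genre labels are hypothetical placeholders, and `titles_per_decade` is an illustrative helper, not anything taken from the book:

```python
from collections import Counter

# Hypothetical bibliographic records: (title, year, genre). In practice these
# would come from a catalogue or bibliography export, not be typed by hand.
records = [
    ("Title A", 1803, "gothic"),
    ("Title B", 1815, "historical"),
    ("Title C", 1818, "historical"),
    ("Title D", 1822, "gothic"),
    ("Title E", 1829, "historical"),
]

def titles_per_decade(records, genre=None):
    """Count records per decade, optionally restricted to a single genre."""
    counts = Counter(
        (year // 10) * 10
        for _, year, g in records
        if genre is None or g == genre
    )
    return dict(sorted(counts.items()))

print(titles_per_decade(records))                    # {1800: 1, 1810: 2, 1820: 2}
print(titles_per_decade(records, genre="historical"))  # {1810: 2, 1820: 1}
```

Run over a full bibliography rather than five toy rows, the same counts by decade and genre give the rise and fall of a form over time without reading any of the books.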


using statistical text analysis to identify genre

"The results suggested that there were grounds for believing that genres have a distinct linguistic signal; at the same time, however, a close analysis of the data tended to confirm the general consensus among authorship-attribution researchers, namely, that the individual usage of high-frequency words and punctuation serves as an excellent discriminator between authors. In other words, though genre signals were observed, there was also the presence of author signals and no obvious way of determining which feature-usage patterns were most clearly 'authorial' and which 'generic.'" (70)
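To make the "high-frequency words and punctuation" idea concrete, here is a minimal Python sketch of Burrows's Delta, a standard authorship-attribution measure swapped in for illustration; it is not necessarily the classifier used in the book. Assumptions: punctuation marks count as tokens alongside words, the features are the `n` most frequent tokens in a small reference set, and the reference labels could name authors or genres. Real studies use many texts and hundreds of features; the toy texts here are invented.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Words and the marks . , ; : ! ? are kept as separate tokens,
# since punctuation can carry stylistic signal too.
TOKEN_RE = re.compile(r"[a-z]+|[.,;:!?]")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def most_frequent_features(texts, n):
    """The n most frequent tokens (words or punctuation) across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [tok for tok, _ in counts.most_common(n)]

def relative_freqs(text, features):
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[f] / total for f in features]

def burrows_delta(reference, disputed, n=6):
    """Return (closest_label, {label: delta}) for a disputed text.

    reference maps a label (author or genre) to a sample text. Each feature
    is z-scored against the mean and population standard deviation of the
    reference texts; Delta is the mean absolute difference of z-scores.
    """
    labels = list(reference)
    features = most_frequent_features(reference.values(), n)
    rows = {lab: relative_freqs(reference[lab], features) for lab in labels}
    columns = list(zip(*rows.values()))
    mus = [mean(col) for col in columns]
    # Guard against zero spread (a feature identical in every reference text).
    sds = [pstdev(col) or 1.0 for col in columns]

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, mus, sds)]

    z_disputed = z(relative_freqs(disputed, features))
    deltas = {
        lab: mean(abs(a - b) for a, b in zip(z_disputed, z(rows[lab])))
        for lab in labels
    }
    return min(deltas, key=deltas.get), deltas

reference = {
    "A": "the cat sat on the mat, the dog sat, the bird sang.",
    "B": "a cat and a dog; and a bird; and a song!",
}
disputed = "the fox ran to the den, the owl slept."
label, deltas = burrows_delta(reference, disputed, n=6)
print(label)  # A on this toy input
```

Because the same measure runs unchanged whether the labels are authors or genres, it also illustrates the problem in the quotation: nothing in the features themselves marks a pattern as "authorial" or "generic."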

successive generational waves of style, every 30 years, in the 19th century

strength of author signals trumps signals of individual texts in one experiment (93)

gender shows strong influence on style

marks of punctuation sometimes more indicative than words (99)

"If form ever follows function, then style ever follows form." (104)


"stylistic habits of word and punctuation usage are an imperfect measure of national style" (117)


critical of the Google Books NGram Viewer

"the NGram Viewer offers little in terms of interpretive power: it cannot tell us why a particular word was popular or not; it cannot address the historical meaning of a word (something at which the OED is particularly good), and it cannot offer very much at all in terms of how readers might have perceived the use of the word. When we talk about the NGram Viewer as a window into culture, or 'culturomics,' we speak only of written culture; even less so, we speak of written culture as it is curated by librarians at major research universities who have partnered with Google in scanning the world's books." (122)
"The meanings of words are found in their contexts, and the NGram Viewer provides only a small peephole into context." (122)
"Cultural memes and literary themes are not expressed in single words or even in single bigrams or trigrams. Themes are formed of bigger units and operate on a higher plane." (122)

need probabilistic latent semantic indexing / probabilistic topic modelling (122)

"Topic models are, to use a familiar idiom, the mother of all collocation tools. This algorithm, LDA, derives word clusters using a generative statistical process that begins by assuming that each document in a collection of documents is constructed from a mix of some set of possible topics. The model then assigns high probabilities to words and sets of words that tend to co-occur in multiple contexts across the corpus." (123)

MALLET (MAchine Learning for LanguagE Toolkit), developed at UMass Amherst

"though the machine does a very good job at identifying the topics latent in the corpus, the machine does a comparatively poor job when it comes to auto-identifying which of the harvested topics are the most interpretable by human beings." (128)
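MALLET is a Java toolkit whose LDA implementation is based on Gibbs sampling. The sketch below is not MALLET's code but a small pure-Python collapsed Gibbs sampler for LDA, meant to show the generative idea quoted above: every token carries a topic assignment, which is resampled in proportion to how much its document uses each topic and how much each topic uses that word. The hyperparameters (`alpha`, `beta`, `iterations`, `seed`), the helper names, and the toy documents are illustrative assumptions.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to a list of token lists by collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][j] estimates P(topic j | doc d),
    topic_word[j] maps each word to an estimate of P(word | topic j).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * k for _ in docs]                # topic counts per document
    nkw = [defaultdict(int) for _ in range(k)]   # word counts per topic
    nk = [0] * k                                 # total tokens per topic
    assignments = []
    for d, doc in enumerate(docs):               # random initial assignments
        zs = []
        for w in doc:
            t = rng.randrange(k)
            zs.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1

    doc_topic = [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[j][w] + beta) / (nk[j] + V * beta) for w in vocab}
        for j in range(k)
    ]
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """Highest-probability words per topic, for a human to inspect."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

docs = [
    "whale ship sea captain harpoon sea whale".split(),
    "sea ship whale voyage captain".split(),
    "marriage estate ball sister fortune".split(),
    "ball sister marriage fortune estate marriage".split(),
]
doc_topic, topic_word = lda_gibbs(docs, k=2)
for j, words in enumerate(top_words(topic_word)):
    print(j, words)
```

`top_words` only lists the most probable words per topic; as the quotation above notes, judging whether a topic is interpretable is still left to a human reader. On a toy corpus this small, the outcome depends on the seed and is not guaranteed to separate the two vocabularies.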


evolution vs influence

"My interest is in finding the context in which change occurs, for it is only by understanding the larger context that we might then move to address the deeper questions of creation, of how and why such forms come into being in the first place." (156)
"To chart influence empirically, we need to go beyond the individual cases and look to the aggregate." (156)

"information cascades" as a way of charting influence

distance matrices

network visualization

mapping distance matrices in a network shows influence over time
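One simple way to go from a distance matrix to a network, sketched in Python under stated assumptions: each book is a feature vector (for example word or topic proportions), distance is Euclidean, and each book is linked only to its k nearest neighbors. These notes do not record which metric or pruning rule the book uses, so treat these choices, the helper names, and the toy vectors as hypothetical.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances between feature vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def nearest_neighbor_edges(labels, dist, k=1):
    """Link each item to its k closest other items.

    Returns undirected edges as {(a, b): distance} with a < b, so a pair
    found from both ends is stored only once.
    """
    edges = {}
    for i, row in enumerate(dist):
        others = sorted((d, j) for j, d in enumerate(row) if j != i)
        for d, j in others[:k]:
            a, b = sorted((labels[i], labels[j]))
            edges[(a, b)] = d
    return edges

# Hypothetical books with invented two-feature vectors.
books = {
    "Book 1 (1820)": [0.10, 0.90],
    "Book 2 (1825)": [0.15, 0.85],
    "Book 3 (1850)": [0.60, 0.40],
    "Book 4 (1855)": [0.65, 0.35],
}
labels = list(books)
D = distance_matrix(list(books.values()))
edges = nearest_neighbor_edges(labels, D, k=1)
print(sorted(edges))
```

The resulting edge list could be loaded into a network-visualization tool. Reading an edge between an earlier and a later book as possible influence is an interpretive step; the distances alone only show similarity.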

"What is clear is that the books we have traditionally studied are not isolated books. The canonical greats are not even outliers; they are books that are similar to other books, similar to the many orphans of literary history that have been long forgotten in a continuum of stylistic and thematic change." (168)


"The generation of life is algorithmic. What if the generation of literature were also so? Given a certain set of environmental factors -- gender, nationality, genre -- a certain literary result may be reliably predicted; many results may even be inevitable. This is another dangerous idea, perhaps even a crazy one." (172)

non-consumptive, or non-expressive, use of copyrighted material