Since this session coincides with your essay hand-in deadline, we’re going to ease ourselves gently into the next section of the module, ‘Distant Reading’, through some in-class experimentation.
“Distant reading is the idea of processing content in (subjects, themes, persons, places etc.) or information about (publication date, place, author, title) a large number of textual items without engaging in the reading of the actual text. The “reading” is a form of data mining that allows information in the text or about the text to be processed and analyzed. Debates about distant reading range from the suggestion that it is a misnomer to call it reading, since it is really statistical processing and/or data mining, to arguments that the reading of the corpus of literary or historical (or other) works has a role to play in the humanities. Proponents of the method argue for the ability of text processing to expose aspects of texts at a scale that is not possible for human readers and which provide new points of departure for research. Patterns in changes in vocabulary, nomenclature, terminology, moods, themes, and a nearly inexhaustible number of other topics can be detected using distant reading techniques, and larger social and cultural questions can be asked about what has been included in and left out of traditional studies of literary and historical materials.”
We’ll begin to explore these techniques and tools by experimenting with four different n-gram viewers. The term ‘n-gram’ comes from computational linguistics and means a sequence of n consecutive tokens; for example, if the tokens we are counting are words, the phrase ‘What do I mean?’ counts as a 4-gram. All of these viewers are available via the ‘Tools’ section of the module website.
- Google N-gram viewer
- Early Modern Print’s EEBO N-Gram browser (NB: this and the Google viewer above do not display well on Firefox)
- Hathi Trust Bookworm
- ECCO’s Artemis ‘Term Frequency’ visualiser
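If you would like to see what ‘counting n-grams’ means in practice before trying the viewers, here is a minimal sketch in Python. It is only an illustration, not how any of the tools above is actually implemented: the function names (`tokenize`, `ngrams`, `ngram_counts`) are made up for this example, and the tokeniser is deliberately crude (it lowercases everything and keeps only letters and apostrophes).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption for this sketch: a 'word' is any run of letters or
    apostrophes. Real tools use far more careful tokenisers.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


phrase = "What do I mean?"
print(ngrams(tokenize(phrase), 4))  # the whole phrase is a single 4-gram
print(ngrams(tokenize(phrase), 2))  # the same phrase contains three 2-grams

# Repeated n-grams are what a viewer ends up charting.
print(ngram_counts("to be or not to be", 2).most_common(1))
```

The viewers do this kind of counting across a very large number of digitised texts and plot how the counts change over time, which is where the ‘distant’ in distant reading comes in.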
Culturomics (background on Google’s N-Gram viewer)
Patricia Cohen, ‘In 500 Billion Words: New Windows on Culture’, New York Times, Dec 16, 2010 (part of the series ‘Humanities 2.0’)