Monthly Archives: March 2016

Wrong Search, Wrong Answer

Distance reading is an undeniably useful tool for searching across a large pool of texts. On its own, however, it lacks context: we need to couple the knowledge we gain through close reading with the data we gain through distance reading in order to create accurate search terms and, in turn, accurate answers. We can see this in our graph (Figure 1), which shows the works of fiction published between 1660 and 1799 (according to the ESTC) which contain the term ‘adventure’.

Figure 1.

This data raises several problems. The first is the potential to be misled: the data does not take into account synonyms or translations of the term ‘adventure’, so it may not give us an accurate count of the relevant texts. Additionally, the count does not take into account multiple editions published in the same year, and would therefore require further distance reading. Although this is a rudimentary analysis, and could be extended by taking other popular terms into account (for example, ‘romance’ alongside ‘adventure’), it does provide us with a rough point at which to start.
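To illustrate the two problems just described, here is a minimal sketch in Python. It is not how our graph was produced: the records, the list of related terms, and the function name titles_per_year are all invented for the example. It counts titles per year that contain any of a set of terms, and skips a title that appears more than once in the same year.

```python
from collections import Counter

# Hypothetical catalogue records as (year, title) pairs. These are invented
# for illustration and are not taken from the ESTC.
records = [
    (1719, "The Life and Strange Surprizing Adventures of a Mariner"),
    (1719, "The Life and Strange Surprizing Adventures of a Mariner"),  # repeat edition, same year
    (1722, "The Fortunes and Misfortunes of a Lady"),
    (1751, "The Adventure of a Guinea"),
    (1751, "The Exploits of a Young Gentleman"),
]


def titles_per_year(records, terms):
    """Count distinct titles per year that contain any of the given terms."""
    terms = [term.lower() for term in terms]
    seen = set()
    counts = Counter()
    for year, title in records:
        key = (year, title.lower())
        if key in seen:
            continue  # skip repeat editions of the same title in the same year
        seen.add(key)
        # Simple substring match, so 'adventure' also matches 'adventures'.
        if any(term in title.lower() for term in terms):
            counts[year] += 1
    return dict(counts)


# Searching for one term only, then for a wider (assumed) set of related terms.
narrow = titles_per_year(records, ["adventure"])
broad = titles_per_year(records, ["adventure", "exploits", "fortunes"])
print(narrow)
print(broad)
```

Which terms count as related is exactly the kind of decision that needs close reading; the code can only count what we tell it to look for.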

However, this would require further close reading of the texts before the data could carry real weight as evidence. Franco Moretti, a scholar in the field of digital text analysis, is a strong advocate for distance reading; however, even he cannot deny that close reading is often still necessary. Moretti concedes that things didn’t unfold as planned: somewhere along the line, he writes, he “drifted from quantification to the qualitative analysis of plot”.

The answers we are looking for can only be found through specific search terms. If we search for the wrong terms, we receive the wrong answers. The right terms can only be identified through close reading, which shows that both close reading and distance reading are needed in order to gain reliable and accurate information.


Data Visualisation: The Hidden Considerations

By Lorien and Tino


Using the English Short Title Catalogue, we searched for all the fiction titles published between 1660 and 1799 by the publisher Noble, because we wanted to see what we could learn about literary history from the most popular words in those titles. The titles should give us an indication of the types of fictional narrative this publisher was issuing in this period, for example Adventure, History, or Romance. Although that is what we set out to discover, this post focuses on the considerations we need to make when choosing how to visualise data.

In order to do this, we searched the database using the criteria we wanted. This produced 36 titles, which we then edited down to the titles themselves, so that details such as authors were not included.

Using Voyant, we can input the text and discover the most popular words used in these titles. However, it is not as simple as this: some of the titles contain words such as ‘volumes’ which, though part of the title, tell us nothing about the type of book it is. Not all stop words are undesirable, and so we have produced two word clouds: one with stop words included, and one without.
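For readers curious about what a tool like this is doing, here is a rough sketch in Python of counting words with and without a stop list. It is an illustration under our own assumptions, not Voyant's actual method: the titles are invented, the stop list is hand-picked, and the function name word_counts is ours.

```python
import re
from collections import Counter

# Invented titles for illustration; these are not the 36 titles from our search.
titles = [
    "The History of Miss Sally, in two volumes",
    "The Adventures of Mr Thomas, in two volumes",
    "The History of a Young Lady",
]

# A small hand-picked stop list (an assumption; Voyant has its own lists).
stop_words = {"the", "of", "in", "a", "volumes"}


def word_counts(titles, stop_words=frozenset()):
    """Count lower-cased words across all titles, skipping any stop words."""
    words = []
    for title in titles:
        words.extend(re.findall(r"[a-z]+", title.lower()))
    return Counter(word for word in words if word not in stop_words)


with_stops = word_counts(titles)
without_stops = word_counts(titles, stop_words)

print(with_stops.most_common(5))
print(without_stops.most_common(5))
```

Deciding what belongs in the stop list is a judgment call: here ‘volumes’ is removed but ‘two’ is left in, so the second count still contains a word that may say nothing about the type of book.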

The first word cloud reveals something of the syntactical structure of sentences; though this is really interesting, it does not answer our questions, hence the second word cloud, from which the stop words have been removed. By way of comparison, here are the top ten words from both word clouds presented as pie charts:

[Figures: two pie charts of the top ten words, captioned “with stop words included” and “with stop words”]

We can see from the two charts that ‘History’ and ‘Adventures’ are two popular types of novel being published. What is also interesting is the presence of ‘Mr’ and ‘Miss’, but not ‘Mrs’, suggesting that the female protagonists in these novels may be unmarried women; from what we already know about eighteenth-century novels, this may suggest a moral standpoint on the part of the novel.

Distance reading, however, cannot give us conclusive details. For example, one word present in the top ten is ‘two’, and close reading would be required to determine whether it belongs to the title of the novel or indicates the number of volumes in which the novel was originally published. If the latter, then we have seemingly “stumbled” across a result we had not been expecting.

There is a flaw with our pie charts, or rather, with the way we have chosen to represent this data. A pie chart implies the whole of something by splitting a complete circle into smaller segments, but the data we are visualising is not the whole corpus, only the ten most popular words and their frequencies. A better way to visualise this data would be a bar chart, because we could see the number of times each word is mentioned, and it would not imply a totality that the data does not have.
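The point can be made concrete with a short Python sketch. The word counts below are invented, not our Voyant results, and the function name text_bar_chart is ours. The sketch works out what share of all word occurrences the top words actually cover, then draws them as a plain-text bar chart that shows raw counts rather than slices of a whole.

```python
from collections import Counter

# Invented word counts for illustration; these are not our Voyant results.
counts = Counter({
    "history": 14, "adventures": 10, "miss": 6,
    "two": 4, "memoirs": 3, "letters": 2, "life": 1,
})

top_n = 3
top = counts.most_common(top_n)
total = sum(counts.values())
top_total = sum(n for _, n in top)

# Share of all word occurrences covered by the top words. A pie chart of only
# the top words would present this fraction as if it were the whole.
coverage = top_total / total


def text_bar_chart(pairs):
    """Render (word, count) pairs as one line per word, one '#' per occurrence."""
    width = max(len(word) for word, _ in pairs)
    return "\n".join(f"{word.ljust(width)} {'#' * n} {n}" for word, n in pairs)


chart = text_bar_chart(top)
print(chart)
print(f"Top {top_n} words cover {coverage:.0%} of all word occurrences")
```

Each bar stands on its own, so a reader can compare the counts without being led to assume that the bars add up to the entire corpus.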

The most important thing we have learned about data visualisations is exactly that: the visual. What we instantly see can tell us so many things, right or wrong. It is therefore important that we think carefully about the way we choose to present our data.