Data Visualisation: The Hidden Considerations

By Lorien and Tino


Using the English Short Title Catalogue (ESTC), we searched for all fiction titles published between 1660 and 1799 by the publisher Noble, because we wanted to see what the most frequent words in those titles could tell us about literary history. The titles should indicate the kinds of fictional narrative this publisher issued in the period, for example Adventure, History or Romance. Although that is the question we set out to answer, this post focuses on the decisions we had to make about how to visualise the data.

To do this, we searched the database using these criteria. The search returned 36 records, which we then edited down to the titles alone, so that details such as authors' names would not be counted alongside the words of the titles.
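If the catalogue records are exported as a spreadsheet, this editing step can be scripted rather than done by hand. The Python sketch below is only an illustration: it assumes a CSV export with a column named `title`, and both that column name and the sample rows are placeholders, not the ESTC's actual export format or records from our search.

```python
import csv
import io

# Placeholder export: the column names and rows are assumptions for illustration,
# not the ESTC's real export format or our actual search results.
SAMPLE_EXPORT = """title,author,date
Placeholder title one. In two volumes,Anonymous,1771
Placeholder title two,Anonymous,1775
"""


def titles_only(csv_text: str, title_column: str = "title") -> list[str]:
    """Keep only the title field of each record, dropping authors and dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[title_column].strip() for row in reader if row.get(title_column)]


# Print one cleaned title per line, ready to paste into a text-analysis tool.
for title in titles_only(SAMPLE_EXPORT):
    print(title)
```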

We then entered the titles into Voyant to find the most frequently used words. This is not quite as simple as it sounds: some titles contain words such as ‘volumes’ which, though part of the title, tell us nothing about the type of book. At the same time, not all stop words are undesirable, so we produced two word clouds: one with stop words included and one without.
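Outside Voyant, the counting itself is straightforward. This minimal Python sketch counts the words in a list of titles twice, once with nothing removed and once with a stop list applied. The stop list is a tiny illustrative one (Voyant's default English list is far longer), `CUSTOM_STOP_WORDS` shows how a corpus-specific word such as ‘volumes’ could be added, and the two titles are placeholders rather than records from our search.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one (such as Voyant's default) is much longer.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "his", "her"}
# Hypothetical corpus-specific additions: part of a title, but silent about genre.
CUSTOM_STOP_WORDS = {"volumes", "vol"}


def word_counts(titles, stop_words=frozenset()):
    """Lower-case each title, split it into words, and count words not in stop_words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# Placeholder titles, not our data.
titles = [
    "The history of a placeholder. In two volumes",
    "The adventures of a placeholder",
]

with_stop_words = word_counts(titles)
without_stop_words = word_counts(titles, STOP_WORDS | CUSTOM_STOP_WORDS)
print(with_stop_words.most_common(10))
print(without_stop_words.most_common(10))
```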

The first word cloud reveals something about the syntactic structure of the titles, and although this is interesting, it does not answer our question; hence the second word cloud, from which the stop words have been removed. For comparison, here are the top ten words from each word cloud presented as pie charts:

[Pie charts: top ten words with stop words included; top ten words with stop words removed]

The two charts show that ‘History’ and ‘Adventures’ were popular types of novel. Also interesting is the presence of ‘Mr’ and ‘Miss’ but not ‘Mrs’, which suggests that the female protagonists of these novels may tend to be unmarried women; given what we already know about 18th-century novels, this may point to a moral stance taken by the novels.

Distant reading, however, cannot give us conclusive answers. For example, one of the top ten words is ‘two’, and only close reading can determine whether it belongs to the title proper or indicates the number of volumes in which the novel was originally published. If the latter, we have seemingly “stumbled” across a result we had not been expecting.
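A keyword-in-context (KWIC) listing is a quick way to prepare for that close reading, because it places every occurrence of ‘two’ beside its neighbouring words. Voyant offers a similar view; the sketch below is a plain-Python stand-in, and the titles in it are placeholders. A following word such as ‘volumes’ would hint at the second reading, though each case would still need checking by hand.

```python
import re


def keyword_in_context(titles, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword,
    with up to `window` words on either side."""
    hits = []
    for title in titles:
        words = re.findall(r"[A-Za-z]+", title)
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                hits.append((left, word, right))
    return hits


# Placeholder titles showing the two possible uses of 'two'.
titles = [
    "The history of two placeholder sisters",
    "A placeholder novel. In two volumes",
]

# Right-align the left context so the keyword lines up in a column.
for left, word, right in keyword_in_context(titles, "two"):
    print(f"{left:>30} [{word}] {right}")
```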

There is a flaw in our pie charts, or rather in the way we have chosen to represent this data. A pie chart implies a whole by splitting a complete circle into segments, but the data we are visualising is not the whole corpus: it is only the ten most frequent words and their frequencies. A bar chart would be a better choice, because it shows how many times each word occurs without implying a totality that the data does not have.
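The arithmetic behind the problem can be shown directly. A pie chart of the top ten divides the circle by the top-ten total, so each slice reports a word's share of those ten words, not of all the words in the titles. The Python sketch below uses made-up counts for the placeholder words ‘alpha’ to ‘delta’ (not our results), computes both shares, and then draws a simple text bar chart of the raw counts. In practice a plotting library would draw the bars; plain text just keeps the example self-contained. Note that "all words" here means all words that were counted, so it depends on which stop words were removed.

```python
from collections import Counter


def top_n_shares(counts, n=10):
    """For each top-n word, return (word, count, share of the top n, share of all counted words).
    The share of the top n is what a pie chart of the top n would display."""
    top = counts.most_common(n)
    top_total = sum(c for _, c in top)
    corpus_total = sum(counts.values())
    return [(word, c, c / top_total, c / corpus_total) for word, c in top]


def text_bar_chart(counts, n=10, width=40):
    """Render raw frequencies as horizontal bars scaled to the largest count."""
    top = counts.most_common(n)
    largest = top[0][1]
    lines = []
    for word, c in top:
        bar = "#" * max(1, round(width * c / largest))
        lines.append(f"{word:<12} {bar} {c}")
    return "\n".join(lines)


# Made-up counts for placeholder words, purely to illustrate the arithmetic.
counts = Counter({"alpha": 3, "beta": 1, "gamma": 4, "delta": 2})

for word, c, share_top, share_all in top_n_shares(counts, n=2):
    print(f"{word}: {share_top:.0%} of the top 2, {share_all:.0%} of all counted words")
print(text_bar_chart(counts, n=2, width=8))
```

With these counts, ‘gamma’ would fill 57% of a two-word pie chart even though it accounts for only 40% of the words counted; the bar chart simply shows its count of 4.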

The most important lesson we have learned about data visualisation is exactly that: the visual. What we see at a glance can tell us a great deal, rightly or wrongly, so we need to think carefully about how we choose to present our data.

