A useful and user-friendly tool for basic digital analysis of texts is Voyant. I used it to analyze five works of Thomas Carlyle, taken from Project Gutenberg. The works chosen were:
Sartor Resartus (1834)
The French Revolution (1837)
On Heroes, Hero-Worship, and the Heroic in History (1841)
Past and Present (1843)
Latter-day Pamphlets (1850)
These were chosen partly because they are perhaps Carlyle’s most important works, but also because Gutenberg doesn’t have all of Carlyle’s works. For example, I would have considered Chartism (1840) had it been there, but it wasn’t (though it can be accessed online via Google Books). Similarly, the massively influential Critical and Miscellaneous Essays (1838) were not there.
There are a couple of other minor caveats:
1) The version of Latter-day Pamphlets used was not the complete version. Like many versions, it consists of only five essays, omitting the final three.
2) The Gutenberg pages analyzed contained not only the texts of the works, but also various paratexts: title and publication details, Gutenberg’s copyright statement, and so on. This is most important regarding Past and Present, which contained an introduction by Ralph Waldo Emerson from the first US edition of the work. For a proper academic analysis, one would have to work on finding or creating a webpage or file with no such paratexts, but for the purposes of this blog, the superfluous material wasn’t enough to seriously upset the findings.
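For anyone who does want to remove the Gutenberg paratexts before analysis, here is a minimal Python sketch. It assumes the plain-text version of a Gutenberg file, which typically marks the body with lines beginning "*** START OF THE PROJECT GUTENBERG EBOOK" and "*** END OF THE PROJECT GUTENBERG EBOOK"; older files may word these markers differently. The function name and the sample text are made up for illustration.

```python
import re

# Marker lines that Project Gutenberg plain-text files typically use.
# Assumption: the exact wording can vary between older and newer files.
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END markers, if both are found."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    if not start or not end or end.start() < start.end():
        return raw  # markers missing or out of order: leave the text untouched
    return raw[start.end():end.start()].strip()


# Invented miniature file, standing in for a real download.
sample = (
    "The Project Gutenberg eBook of Past and Present\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK PAST AND PRESENT ***\n"
    "Book I. Proem.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK PAST AND PRESENT ***\n"
    "License text follows.\n"
)
print(strip_gutenberg_boilerplate(sample))
```

This only removes Gutenberg's own header and licence. Material that belongs to the edition itself, such as Emerson's introduction to Past and Present, would still have to be cut by hand.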
So, I simply copied and pasted the five links to the relevant pages on Gutenberg, then Voyant did the rest, returning a page filled with analysis of Carlyle’s works. First is a word cloud:
This can be adjusted to include from 25 words up. The adjustment bar, however, is very fiddly (at least on my iPad), and it’s hard to adjust the number of words accurately or to tell how many words are being shown. The cloud above has about 100 words, the 100 most common words across the texts. The larger a word appears in the cloud, the greater its frequency. A quick look tells you that the most frequent word across all the texts is man. Still more pointedly, the second most frequent word is men. By clicking on the words in the cloud, we find that man gets 2293 mentions, men 1815. This already tells us a lot about Carlyle’s writing: he was interested in the male experience, and he was troubled and obsessed by ideas of manhood, constantly working through them. The words women and woman get only 182 and 56 mentions respectively. Already we see how Carlyle’s thought is out of kilter with our own times.
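The counts behind a word cloud are simple to reproduce. The following is a rough Python sketch of a word-frequency count, not Voyant's actual implementation: the tokenizing rule and the tiny stop list are my own assumptions, and Voyant's stop list is much longer, so real figures would differ.

```python
import re
from collections import Counter

# A tiny illustrative stop list (assumption); real stop lists are far longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and pull out runs of letters, keeping inner ' or -."""
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words with their counts."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return counts.most_common(n)


# Invented sample sentence, not a quotation from Carlyle.
sample = "The man and the men of the world: a man is a man."
print(top_words(sample, 3))
```

Note that man and men are counted as separate words here, just as they are in the cloud; merging them would require an extra lemmatizing step.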
We can toggle between cloud view and list view of most popular words, and while the former is perhaps more immediately striking and certainly more redolent of digital humanities, the latter view is better for a more exact picture. It allows us to establish that the third most popular word is world. This presence illustrates the grandeur of Carlyle’s ambitions. He was a wide-gazing sage, not the narrowly focused expert that is valued in the 21st century. The frequency with which the word world occurs defines perhaps the most important difference between the Victorian intellectual and the contemporary scholar: the former is not an expert in any particular thing, but rather strives to comprehend the world as a totality.
Shall is also in the top five. By clicking on the word, we can also see which work it is most popular in. In this case, it’s The French Revolution by quite a distance. So Carlyle is using shall to slip back and forth in time, to predict the future of the past, such as in the word’s very first appearance. This comes in a passage which is very typical of Carlyle, an address to the poverty-stricken masses of pre-revolutionary France on the occasion of a police crackdown on public protests/riots:
O ye poor naked wretches! and this, then, is your inarticulate cry to Heaven, as of a dumb tortured animal, crying from uttermost depths of pain and debasement? Do these azure skies, like a dead crystalline vault, only reverberate the echo of it on you? Respond to it only by ‘hanging on the following days?’—Not so: not forever! Ye are heard in Heaven. And the answer too will come,—in a horror of great darkness, and shakings of the world, and a cup of trembling which all the nations shall drink. [My italics and underlining]
The cup of trembling was of course the French Revolution itself, which struck fear into the rich and privileged of all countries, and Carlyle is here tapping into the fear among his British readers that the Revolution could spread. So the use of shall here and in other parts of this work is a function of Carlyle’s particular mode, which might be called retroactive prophecy. It harnesses the power of the prophetical voice, with little of the epistemological risk (that is, it can hardly be wrong, because the things prophesied have for the most part already happened).
Voyant also supplies a word count for each text. The French Revolution is the longest; Latter-day Pamphlets the shortest – though it is, as noted above, missing part of the originally published material. Not much to analyze there. Potentially more interestingly, there is considerable variation in vocabulary density across the works. Vocabulary density refers to the ratio of different words used to total word count. Carlyle’s highest vocabulary density occurs in Sartor, indicating that it is a more linguistically varied text, perhaps a more demanding and difficult text. As a particular admirer of Sartor, I think it also indicates that this work is the product of a more supple and questioning mind than the other works. The lowest vocabulary density is found in On Heroes. When one remembers that this work began as a series of lectures, this seems a deliberate choice by Carlyle, streamlining his vocabulary to make his ideas more accessible to a listening audience without the possibility of going back and reading over difficult parts.
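To make the definition concrete, vocabulary density as described above (different words divided by total words) can be sketched in a few lines of Python. The tokenizing rule is my own assumption and may not match how Voyant splits words, so the numbers it produces would not necessarily agree with Voyant's.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and pull out runs of letters, keeping inner ' or -."""
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())


def vocabulary_density(text: str) -> float:
    """Ratio of distinct words (types) to total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty text
    return len(set(tokens)) / len(tokens)


# Invented sample: 7 words in total, 4 of them distinct.
sample = "man is man and world is world"
print(round(vocabulary_density(sample), 3))  # 4 / 7, i.e. 0.571
```

One thing worth bearing in mind with this measure is that it tends to fall as a text gets longer, since common words keep recurring, so comparisons between works of very different lengths need some care.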
Average words per sentence (wps) is another indicator of complexity. Here On Heroes has the lowest wps, showing it again as the least complex text. The highest wps, though, is found in Pamphlets. This is an interesting development, as Carlyle’s wps had previously fallen from the heights of Sartor, but here hit a new peak. This anomalous situation warrants more developed study than I can give it here.
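Average words per sentence can be approximated along the following lines. This is a naive Python sketch resting on my own assumption that a sentence ends at a full stop, exclamation mark or question mark followed by white space; abbreviations and Carlyle's habit of continuing a sentence after a dash or an internal exclamation would trip it up, and Voyant may well segment sentences differently.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def average_words_per_sentence(text: str) -> float:
    """Mean number of words per sentence, using a naive sentence splitter."""
    # Split after ., ! or ? when followed by white space (assumed rule).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    word_counts = [len(WORD_RE.findall(s)) for s in sentences]
    return sum(word_counts) / len(sentences)


# Two short sentences from the passage quoted earlier: 4 words and 5 words.
sample = "Not so: not forever! Ye are heard in Heaven."
print(average_words_per_sentence(sample))  # (4 + 5) / 2 = 4.5
```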
In the screenshot above, the final category is Distinctive Words. This means the words which characterize individual works but rarely or never appear in the other texts analyzed. Most of the words involved are proper nouns, generally the names of the works’ main characters: so Teufelsdrockh is the most distinctive word in Sartor, because Diogenes Teufelsdrockh is the book’s protagonist; abbot is the most distinctive word in Past and Present, because Abbot Samson is that book’s focus. Thus, this category seems too predictable to be really insightful, at least in the examples here.
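One common way of finding words that characterize a single text within a collection is TF-IDF weighting, which rewards words that are frequent in one text but absent from the others. The Python sketch below illustrates that general idea only; whether it matches Voyant's exact calculation is an assumption on my part, and the two miniature "texts" are invented stand-ins, not extracts from Carlyle.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and pull out runs of letters, keeping inner ' or -."""
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())


def distinctive_words(docs: dict[str, str], n: int = 3) -> dict[str, list[str]]:
    """Rank each text's words by a simple TF-IDF score and keep the top n."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    # Document frequency: in how many texts does each word occur?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    total_docs = len(docs)
    result = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (k / total) * math.log(total_docs / doc_freq[word])
            for word, k in word_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = ranked[:n]
    return result


# Invented miniature texts for illustration only.
docs = {
    "sartor": "teufelsdrockh the professor teufelsdrockh writes of the world",
    "past": "the abbot samson the abbot rules the world",
}
print(distinctive_words(docs, 1))
```

Words shared by every text, such as the and world in this toy example, score zero, which is why proper names rise to the top of such lists.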
I have only scratched the surface of the many possibilities of Voyant, not only for studies of a single author, but also, and perhaps especially, for comparison between authors. Thus I will undoubtedly return to this tool sooner rather than later, perhaps to compare Carlyle’s texts to those of some of his contemporaries. The most impressive things about the tool, in my opinion, are its astonishing ease of use (fiddly bar accompanying word cloud aside) and user-friendliness, and the fact that it is, as of now, totally free.