
A comparative study of the vocabulary of the greatest Western authors – and the winner is: THOMAS MANN

Wednesday 13 March 2019, by Ray

With the help of modern technology and a clever engineering student, we have analysed in depth the vocabulary in 25 of the most famous one-volume works in the history of Western literature [1] – celebrated texts by Dante, Cervantes, Shakespeare, Balzac, Flaubert, Victor Hugo, Charles Dickens, William Thackeray, Herman Melville, Marcel Proust, Jack London, Thomas Mann, Robert Musil, James Joyce, Ernest Hemingway, F. Scott Fitzgerald, and John Steinbeck.

We have simply measured the number of different words in each text, where punctuation marks (other than embedded dashes and apostrophes), numbers, special characters, initials and Roman numerals have been ignored.
Uppercase/lower-case variations of the same basic word have also been ignored for comparison purposes.

However, grammatical or spelling variations of the same base word have not been eliminated, and have been included in the count for each work.
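The counting rules described above can be sketched in a few lines of Python. This is only an illustrative approximation, not the program actually used for the study: the function name `distinct_words`, the way initials are recognised (a single capital letter followed by a full stop) and the way Roman numerals are recognised (all-uppercase tokens of two or more letters in standard form) are assumptions made for the example.

```python
import re

# Words may contain embedded hyphens or apostrophes (well-known, don't).
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*")
# Standard-form Roman numerals, checked only against all-uppercase tokens.
ROMAN_RE = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def distinct_words(text):
    """Return the set of distinct case-folded words in text."""
    words = set()
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        # Skip numbers and anything containing digits or underscores.
        if any(ch.isdigit() or ch == "_" for ch in token):
            continue
        # Assumption: an initial is a single capital followed by a full stop.
        next_char = text[match.end():match.end() + 1]
        if len(token) == 1 and token.isupper() and next_char == ".":
            continue
        # Assumption: Roman numerals are uppercase and at least two letters long.
        if len(token) >= 2 and ROMAN_RE.fullmatch(token):
            continue
        # Upper/lower-case variants count as the same word.
        words.add(token.lower())
    return words


sample = "The whale, the WHALE! Chapter XII: Mr. F. Scott's well-known story of 1851."
print(sorted(distinct_words(sample)))
```

Note that, as in the study, "whale" and "whales" would still be counted as two different words here, since no grammatical variations are merged. A genuine word written entirely in capitals that happens to form a valid Roman numeral (for example "MIX") would be wrongly discarded by this sketch.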


There is a case to be made that grammatical variations of the same word (noun plurals, conjugated verbs, declined adjectives, etc.) should not be included in comparative vocabulary studies.

Although doing that for all of the masterworks listed above would be an undertaking well beyond the scope of this essay, we have done a partial study of the rate of elimination of grammatical variations across a sampling of the works in question, shown below:


This shows a “grammatical adjustment” rate of some 22-23% for most of the works in English, with a somewhat higher rate for Charles Dickens (26%) and James Joyce (26%), and higher rates still for Thomas Mann (German) and Gustave Flaubert (French): 28% and 38% respectively. We have reason to believe that the grammatical adjustment rate for the works in the other Romance languages is broadly similar to that for French.


To put the above figures in perspective, the authoritative study of Shakespeare’s vocabulary by Marvin Spevack [2] established that there were 29,066 different words in his collected theatrical works.

If multiple spelling and grammatical variations of the same word are eliminated, this number is reduced to about 20,000 [3] – an “adjustment ratio” of 31.2%.
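The arithmetic behind that figure can be checked with a short calculation. The sketch below assumes that the adjustment ratio is the share of the raw count that disappears once variants are merged, which is the definition consistent with the numbers quoted; the function name `adjustment_ratio` is introduced here for illustration only.

```python
def adjustment_ratio(raw_count, adjusted_count):
    """Fraction of the raw word count removed by merging variants."""
    return (raw_count - adjusted_count) / raw_count


# Spevack's raw count and the approximate adjusted count cited above.
ratio = adjustment_ratio(29_066, 20_000)
print(f"{ratio:.1%}")  # 31.2%
```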


The conclusion is clear: whether one considers the straightforward (and error-free) computer count or the necessarily semi-manual “adjusted” count, the relative ranking of the various works by vocabulary size is largely unchanged.

So in purely quantitative terms (quality is exceptionally high throughout this selection of great works, obviously) THERE IS ONE CLEAR WINNER: THOMAS MANN


Thomas Mann
(1875-1955)

[1] In their original language.

[2] “A Complete and Systematic Concordance to the Works of Shakespeare”, Marvin Spevack, 1968.

[3] “Shakespeare: The World As Stage”, B. Bryson, 2007, pp. 108-109.