In 1887, physicist T.C. Mendenhall set out to study writers’ word-length frequencies (how often a writer uses words of each length) in search of a mathematical way to prove or disprove authorship. He counted the one-letter, two-letter, three-letter words, and so on, in a text, calculated the percentage of the total that each word length accounted for (its frequency), and plotted the results on a graph. He found that a writer’s word-length frequency curve remained consistent and, more importantly, that each writer’s characteristic curve differed from those of other writers.1
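The counting procedure described above can be sketched in a few lines of Python. This is only an illustration, not Mendenhall's or anyone else's actual tooling: the function name `word_length_frequencies`, the rule for what counts as a word (runs of letters, with apostrophes ignored when measuring length), and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

def word_length_frequencies(text, per=1000):
    """Return {word_length: occurrences per `per` words} for a text.

    Assumption: a "word" is a run of ASCII letters, optionally joined by
    apostrophes (e.g. "don't"); apostrophes do not count toward length.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    lengths = Counter(len(w.replace("'", "")) for w in words)
    total = sum(lengths.values())
    if total == 0:
        return {}
    # Scale each count to a rate per `per` words, ordered by word length.
    return {n: per * count / total for n, count in sorted(lengths.items())}

# Hypothetical sample, far too short for a real study.
sample = "To be or not to be that is the question"
freqs = word_length_frequencies(sample)
for length, rate in freqs.items():
    print(f"{length}-letter words: {rate:.0f} per 1000")
```

Plotting word length against these rates gives a curve of the kind described here. A meaningful curve needs a very large sample, which is why the original counts ran to many thousands of words.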
In 1901, a wealthy Bostonian named Augustus Hemingway, having heard of Mendenhall’s work, commissioned him to carry out a comparative study of Shakespeare and his contemporaries, including Christopher Marlowe. Hemingway believed Francis Bacon wrote the works of Shakespeare, and he was convinced that Mendenhall had found a way to prove it.
Using Hemingway’s money, Mendenhall hired two women to count the words of various lengths in each of the works. They had to manually count millions of words. Unfortunately for Hemingway, the overall word-length frequencies of Bacon and Shakespeare were very different (although, to be fair, Mendenhall compared Bacon’s prose with Shakespeare’s blank verse, two genres since shown to differ markedly in word-length frequency).
Mendenhall discovered that Shakespeare used significantly more four-letter words than three-letter words. Every other English writer Mendenhall studied, including Shakespeare’s playwright contemporaries, used more three-letter words than any other length. After a while, Mendenhall and his assistants could recognize unidentified blocks of Shakespeare from the four-letter-word spike alone. But when his assistants began to count the words in the Marlowe plays, Mendenhall realized that the Shakespeare curve was not unique after all. To his surprise, the Marlowe and Shakespeare curves were nearly identical. Here is Mendenhall’s reaction to the discovery:
“It was in the counting and plotting of the plays of Christopher Marlowe, however, that something akin to a sensation was produced among those actually engaged in the work. Here was a man to whom it has always been acknowledged, Shakespeare was deeply indebted; one of whom able critics have declared that he ‘might have written the plays of Shakespeare.’ … Even this did not lessen the interest with which it was discovered that in the characteristic curve of his plays Christopher Marlowe agrees with Shakespeare about as well as Shakespeare agrees with himself, as is shown in Fig. 9.”2
(In Fig. 9, word length is shown on the horizontal axis, and frequency, in number of words per thousand, on the vertical axis. For example, Mendenhall found that Marlowe and Shakespeare both used two-letter words about 175 times out of 1,000, or 17.5% of the time.)
Mendenhall’s study made no impact on Shakespearean scholarship, but it did energize a small number of proponents of the Marlowe theory and persuaded others that perhaps the Marlowe theory was worth further investigation.
The study sat idle for decades until Peter Farey decided to extend Mendenhall’s work.3 Farey chose a group of authors and, using electronic texts and word-counting software, compared two large samples of text by each writer, calculating how closely each writer agreed with himself or herself. All of the writers agreed with themselves quite closely.4
Next, Farey moved on to a comparison of Shakespeare and Marlowe. Mendenhall had not noticed that an author's word-length frequencies could vary with time and genre, but Farey found that the word-length frequency curves for Marlowe's earlier works differed from those for his later ones, and that Shakespeare's comedies differed from his non-comedies. The differences were subtle but significant. Accordingly, Farey chose to compare Marlowe's later plays with Shakespeare's histories and tragedies (since none of the Marlowe works are comedies) to eliminate the effects of time and genre. In his Marlowe-Shakespeare comparison, Farey found that the agreement was statistically closer than that of any other writer with himself.5
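To make "how closely two curves agree" concrete, here is a minimal sketch. The article does not say which statistic Farey used, so the measure below (the total absolute difference between two per-thousand curves) is a stand-in chosen for simplicity. The function name `curve_distance` and all the numbers are hypothetical and are not taken from any study.

```python
def curve_distance(curve_a, curve_b):
    """Total absolute difference between two word-length curves.

    Each curve is a dict {word_length: rate per 1000 words}. A length
    missing from one curve is treated as a rate of 0. Smaller values
    mean closer agreement; identical curves give 0.
    NOTE: this metric is an assumption, not Farey's documented statistic.
    """
    lengths = set(curve_a) | set(curve_b)
    return sum(abs(curve_a.get(n, 0.0) - curve_b.get(n, 0.0)) for n in lengths)

# Made-up toy curves, for illustration only.
sample_one = {2: 600.0, 3: 200.0, 4: 200.0}
sample_two = {2: 500.0, 3: 300.0, 4: 100.0, 5: 100.0}

self_distance = curve_distance(sample_one, sample_one)
cross_distance = curve_distance(sample_one, sample_two)
print(self_distance, cross_distance)
```

Under this kind of measure, the comparison amounts to checking whether the distance between two different writers' curves is smaller than the distance between two samples by the same writer.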
The match between the two curves is astonishing. Farey’s method is easily reproducible, and it is accompanied by rigorous statistical analysis. As such it merits serious attention from mainstream scholarship. Like Mendenhall’s before it, however, Farey’s work has not penetrated the mainstream to any large extent. The standard scholarly explanation of the origins of Shakespeare’s style, that he began his career by imitating Marlowe, is invoked to explain why the word counts of the two writers are so similar. But even allowing for this, it would still require a remarkable coincidence for Shakespeare to achieve this degree of match with Marlowe, especially since it would have to have happened unconsciously.
My contribution to the debate was to look at the word counts of individual Shakespeare plays. Did they all look more or less alike? Using a counting method shared with me by Peter Farey, I counted the words of twenty-one Shakespeare plays, randomly chosen across the whole canon. I plotted all twenty-one curves, along with the overall average, on a single graph.
What is immediately striking is the amount of variation between individual plays. This variation tells us that the average curve for Shakespeare could have assumed many different shapes; no underlying principle forced it to average out in this manner.6
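The play-by-play variation and the average curve can be summarized numerically as well as graphically. The sketch below assumes each play's curve is a dict of per-thousand rates and computes an unweighted mean and a standard deviation at each word length. The name `summarize_curves` and the two toy curves are hypothetical. Treating the "overall average" as an unweighted mean across plays is also an assumption, since it could equally be computed by pooling all the words.

```python
from statistics import mean, pstdev

def summarize_curves(curves):
    """Return {word_length: (mean_rate, std_dev)} across per-play curves.

    `curves` is a list of dicts {word_length: rate per 1000 words}.
    A length absent from a play's curve counts as a rate of 0.
    """
    lengths = sorted(set().union(*curves))
    summary = {}
    for n in lengths:
        values = [curve.get(n, 0.0) for curve in curves]
        # pstdev: spread of the plays actually supplied (population form).
        summary[n] = (mean(values), pstdev(values))
    return summary

# Two made-up "plays", for illustration only.
plays = [
    {3: 220.0, 4: 240.0},
    {3: 240.0, 4: 220.0},
]
summary = summarize_curves(plays)
for length, (avg, spread) in summary.items():
    print(f"{length}-letter words: mean {avg:.1f}, std dev {spread:.1f}")
```

A large standard deviation relative to the mean at a given word length indicates that individual plays scatter widely around the average curve at that point.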
The graph emphasizes how remarkable Mendenhall's and Farey’s results actually are. The probability that two writers, both showing variability in individual plays, could arrive at the same average curve by chance is exceedingly small. Mendenhall's and Farey’s studies provide compelling evidence for a Marlowe authorship of the Shakespeare plays.
© Daryl Pinksen, February 2009
Daryl Pinksen, a regular contributor to MSC and author of Marlowe's Ghost, is a Fellow of the School of Graduate Studies at Memorial University of Newfoundland.
1. Mendenhall, T. C. 1887. The Characteristic Curves of Composition. Science Vol 9: 237–49.
2. Mendenhall, T. C. 1901. A Mechanical Solution of a Literary Problem. The Popular Science Monthly Vol LX: 97–105.
3. Peter Farey’s Marlowe Page http://www2.prestel.co.uk/rey/chap8.htm#note1
4. Peter Farey’s Marlowe Page http://www2.prestel.co.uk/rey/appx3a.htm
5. Peter Farey’s Marlowe Page http://www2.prestel.co.uk/rey/appx4a.htm
6. Pinksen, Daryl. 2008. Marlowe’s Ghost: The Blacklisting of the Man Who Was Shakespeare. Bloomington, IN: iUniverse. (p. 55)