What Makes To the Lighthouse Difficult?
A Quantitative Approach

As discussed at the beginning of our “Project Goals” page, most readers encountering To the Lighthouse for the first time find it a “difficult” novel, generally agreeing that this is because it’s often hard to know who is speaking.

One of the goals of “The Digital Text” — the undergraduate English course out of which this website has grown — is to learn how to test our qualitative literary impressions with quantitative data. And difficulty, as it happens, is one of the easiest things to measure quantitatively in literature — at least, difficulty of a certain kind.

The Type/Token Ratio (TTR) measures a text’s “vocabulary diversity.” It is obtained by dividing the number of unique words (types) in a text by the total number of words (tokens), and multiplying the quotient by one hundred to express it as a percentage. The higher a text’s TTR over a given number of words — the more unique words it employs in that span — the “wordier” it is. Since wordy texts can be confusing, the TTR provides a quantitative method of measuring a certain kind of difficulty. (A useful tool for getting TTR data is TAPoR’s List Words tool.)
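The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by TAPoR's List Words tool: the `tokenize` helper and its rule (lowercase the text, keep runs of unaccented letters, allow internal apostrophes) are assumptions made for this example, and a different tokenization rule will give somewhat different counts.

```python
import re

def tokenize(text):
    # Assumed tokenization rule: lowercase everything, then keep runs of
    # letters, allowing internal apostrophes (straight or curly) so that
    # "it's" counts as one word. Accented letters are not handled.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def type_token_ratio(tokens):
    # TTR as a percentage: unique words (types) / total words (tokens) * 100.
    if not tokens:
        raise ValueError("cannot compute a TTR for an empty text")
    return 100 * len(set(tokens)) / len(tokens)

sample = "The cat sat on the mat, and the dog sat on the rug."
tokens = tokenize(sample)
print(len(set(tokens)), len(tokens), round(type_token_ratio(tokens), 1))
# prints: 8 13 61.5
```

The sample sentence has 13 tokens but only 8 types, because "the," "sat," and "on" repeat. A sample this short scores far higher than any full-length novel would, since repetition accumulates as a text grows.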

As one might expect, a lot of texts from the notoriously “difficult” modernist period have exceptionally high TTRs. I challenge you, for example, to find a long novel with a higher TTR than Joyce’s Ulysses, or a long poem with a higher TTR than Eliot’s The Waste Land.

One might expect that To the Lighthouse — another modernist classic with a reputation for difficulty — would also have a high TTR, and that high vocabulary diversity might help to explain the challenge it poses to many readers. In fact, a few quick calculations show that this is not the case at all.

Among a selection of Woolf’s novels, for example, To the Lighthouse has one of the lowest TTRs.

| Novel | Types | Tokens | TTR |
| --- | --- | --- | --- |
| Between the Acts | 6,996 | 44,980 | 15.6% |
| Jacob’s Room | 6,709 | 44,980 | 14.9% |
| The Waves | 6,490 | 44,980 | 14.4% |
| The Voyage Out | 6,166 | 44,980 | 13.7% |
| To the Lighthouse | 5,280 | 44,980 | 11.7% |
| The Years | 5,066 | 44,980 | 11.2% |

(Sample size here has been standardized to the length of the shortest novel in the comparison set, Between the Acts, which is 44,980 words long; for the other novels, we count only the first 44,980 words. We need to do this in order to compare “apples to apples”: longer texts tend to have lower TTRs, given the repetitions that become increasingly unavoidable the longer a text goes on. So it’s not fair, or revealing, to compare texts of different lengths.)
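The standardization step can be sketched as follows. This is a hedged illustration under stated assumptions: the function name `standardized_ttrs` and the toy texts are invented for this example, and each text is assumed to be already split into a list of word tokens.

```python
def standardized_ttrs(token_lists):
    # token_lists maps a title to that text's full list of word tokens.
    # The sample size is the length of the shortest text in the set.
    sample_size = min(len(tokens) for tokens in token_lists.values())
    if sample_size == 0:
        raise ValueError("every text needs at least one token")
    results = {}
    for title, tokens in token_lists.items():
        window = tokens[:sample_size]  # only the first sample_size words
        types = len(set(window))
        results[title] = (types, sample_size, 100 * types / sample_size)
    return results

# Toy data, purely illustrative.
toy = {
    "short text": "a b c a".split(),
    "long text": "a b c d e a b c d e".split(),
}
for title, (types, size, ttr) in standardized_ttrs(toy).items():
    print(title, types, size, ttr)
# prints:
# short text 3 4 75.0
# long text 4 4 100.0
```

Measured over all ten of its tokens, the toy "long text" would score only 50%, below the short text's 75%, even though it is the more varied of the two over an equal span. Truncating every text to the same length removes that distortion.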

As you can see, while To the Lighthouse is subjectively one of the more challenging of Woolf’s novels, it has one of the lowest TTRs.

When compared to a few important novels by other modernists, the differences are more striking still. (This time we’re standardizing the sample size to the shortest text in the comparison set, To the Lighthouse, which is 70,090 words long. We are counting the first 70,090 words in the other texts.)

| Author, Title | Types | Tokens | TTR |
| --- | --- | --- | --- |
| James Joyce, Ulysses | 11,441 | 70,090 | 16.3% |
| Joseph Conrad, Nostromo | 8,990 | 70,090 | 12.8% |
| William Faulkner, Absalom, Absalom! | 6,928 | 70,090 | 9.9% |
| Virginia Woolf, To the Lighthouse | 6,925 | 70,090 | 9.9% |

To the Lighthouse comes last in this comparison set: while its TTR is only fractionally lower than that of Absalom, Absalom! (the two samples differ by just three types), it is far lower than that of Ulysses (16.3%) and markedly lower than that of Nostromo (12.8%). This table suggests that while vocabulary diversity might contribute to the difficulty of Ulysses and Nostromo, other factors must be at work in Absalom, Absalom! and To the Lighthouse.

Among these four reputedly “difficult” modernist works, To the Lighthouse has the lowest TTR. But how does it compare to “easy” novels from its own period — novels that sold well and were read widely? A TTR set comparing it to best-sellers from 1927 — the year that To the Lighthouse was published — reveals that it had a low TTR even by their standard.

(Each of these novels made Publishers Weekly’s top-ten year-end best-sellers list. They were chosen for this comparison because electronic editions were readily available.)

| Author, Title | Types | Tokens | TTR |
| --- | --- | --- | --- |
| Sinclair Lewis, Elmer Gantry | 8,966 | 70,090 | 12.6% |
| Edith Wharton, Twilight Sleep | 7,955 | 70,090 | 11.2% |
| Warwick Deeping, Doomsday | 7,501 | 70,090 | 10.6% |
| Virginia Woolf, To the Lighthouse | 6,925 | 70,090 | 9.9% |

Once again, To the Lighthouse comes in last in TTR. (This chart might suggest that books with high TTRs are not necessarily perceived as “difficult”: that is, high vocabulary diversity doesn’t necessarily turn off readers.)

All of these numbers combine to show that while many readers perceive To the Lighthouse to be difficult, it cannot be high vocabulary diversity that accounts for this perceived difficulty. The quantitative data is thus consistent with our qualitative impression: that it is not “wordiness,” but rather vocal complexity, that poses a challenge to readers of the novel.

In order to verify this last hypothesis quantitatively, we would need a numbers-based method for assessing a text’s vocal complexity. This is one of the aims of our algorithmic method for detecting free indirect discourse (FID). Since FID inescapably introduces confusions of voice, a high proportion of FID in a novel might serve as a good marker of its vocal complexity — and perhaps of its “difficulty” as well.

While TTR does a poor job of differentiating quantitatively between readers’ responses to Absalom, Absalom! and Doomsday, a quantitative index of FID might much more accurately reflect the effort that a reader must exert in navigating their respective texts.

We’re working on it!