A linguistics expert breaks down the five-letter game
By Rohit Bhaskar | February 4, 2022
It started as a token of love from Welsh, Brooklyn-based software engineer Josh Wardle to his wife in October last year; now, Wordle has quickly turned into a daily obsession for millions of users. It has also spawned countless imitators in many languages, from French to Yiddish. There’s the NSFW Lewdle and Sweardle. The plain silly Letterdle, where you have to guess a single letter. The math-based Nerdle. The Lord of the Rings-themed Lordle of the Rings. A Taylor Swift-inspired Taylordle and even the hockey-based Gordle.
Yet the original, with its simple premise – to guess a five-letter word in six tries – remains the most popular and has been sold to the New York Times for a reported seven figures.
We reached out to Dr. Julia Peters, linguistics instructor at MacEwan University, to help you guess the word with academic rigour. Peters isolated five-letter words from an online source and generated a list of over 4,000 words (for perspective, the Wordle code has 2,315 possible answers). She removed proper nouns and excluded plural forms of four-letter nouns (e.g., walls) as well as four-letter verbs inflected for non-past tense (e.g., walk – walks).
From the raw data, she finds that four of the five letters that occur most frequently in the selected word bank are vowels (e, a, o, i, in that order), with only one consonant in the top five (r, which was third; s was sixth). One interesting observation she notes is that while j and q are 25th and 26th in frequency, when they do appear in a word there is a close to 70 per cent chance that they are the opening letter. She also notes that y surfaces at a high rate in the fifth position, so the majority of y occurrences are as a vowel.
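For readers who want to try this kind of count themselves, here is a minimal Python sketch. It is not Peters' own method or data: the short WORDS list and the function names letter_frequency and opening_share are assumptions made up for illustration, standing in for her bank of over 4,000 words.

```python
from collections import Counter

# Hypothetical mini word bank for illustration only;
# Peters' actual list had over 4,000 words.
WORDS = ["audio", "pause", "rouse", "argue", "outer",
         "slate", "untie", "jolly", "quiet", "major"]


def letter_frequency(words):
    """Count how often each letter appears across all words."""
    counts = Counter()
    for word in words:
        counts.update(word)  # adds one count per character
    return counts


def opening_share(words, letter):
    """Fraction of a letter's occurrences that fall in the first position."""
    total = sum(word.count(letter) for word in words)
    first = sum(1 for word in words if word.startswith(letter))
    return first / total if total else 0.0


freq = letter_frequency(WORDS)
print(freq.most_common(5))        # the five most frequent letters
print(opening_share(WORDS, "j"))  # share of j's that open a word
```

Run against a full word list, opening_share is the kind of figure behind the "close to 70 per cent" observation for j and q, though the exact numbers will depend on which list is used.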
She then analyzed the frequency count of letters by their position in the word. While vowels occur more frequently in words, it’s rare for them to be the first letter, with only a appearing in the top 10. Instead, she finds that b, p and m, which don’t occur with high frequency overall (none is in the top 15), are all in the top seven for the opening letter. The three letters form what are called “bilabial” sounds, because they involve two (bi) lips (labial) coming together.
Peters explains, “It seems that English likes starting words with bilabial sounds. This makes sense, since we start speaking with two lips together.”
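A position-by-position count like the one described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Peters' actual analysis: the sample WORDS list and the function name positional_frequency are hypothetical.

```python
from collections import Counter

# Hypothetical sample words for illustration only.
WORDS = ["baker", "boxer", "maple", "piano", "metal", "pilot", "brand"]


def positional_frequency(words, length=5):
    """Return one Counter per position, counting the letters seen there."""
    tables = [Counter() for _ in range(length)]
    for word in words:
        if len(word) != length:
            continue  # skip anything that is not a five-letter word
        for i, ch in enumerate(word):
            tables[i][ch] += 1
    return tables


tables = positional_frequency(WORDS)
print(tables[0].most_common(3))  # most common opening letters
print(tables[4].most_common(3))  # most common fifth letters
```

Each entry in tables answers the question "which letters turn up most in this slot?", which is how a letter that is rare overall can still rank high as an opener.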
The fifth letter
While bilabials are common as the first letter, the fifth letter is usually what is called an “alveolar” sound (so called because the tongue moves up to the alveolar ridge, the ridge just behind your teeth at the top of your mouth), a group that includes the letters t, r, d, l, n, s. She says the reason for this is, “maybe because it takes very little energy to create an alveolar sound compared to bilabials.”
The second letter
The second position is dominated by vowels, which make up close to 60 per cent of second letters. She also notices the high frequency of c as the opening letter, because it is commonly used in digraphs (ch) and in consonant clusters that start words (cl, cr), as well as in those that end words (ck, ch). This is also why h is more likely to appear as the second or final letter: many words begin or end with the English digraphs ch, gh, ph, sh, th.
The fourth letter
She also notes that e is the most common fourth letter because plenty of English words end with e followed by an alveolar: er, en, el, ed.
As for syllable structure? Peters says, “Many five-letter words are a single syllable long. Syllables are structured with a vowel component at the heart and then consonants that then attach to that vowel for the purposes of articulation; you can’t speak without vowels.”
Consonants that surface in front of the vowel nucleus are called “onset” consonants, while those that surface after the vowel nucleus are called “coda” consonants.
She adds, “English has one of the most liberal syllable structure systems in the world that allows for large groups of consonants to cluster together in onset and/or in coda position. In English, we can have up to three consonants in onset position and up to five consonants in coda position (at least in the spelling system) – the word str-e-ngths demonstrates the extent of this type of clustering.”
There are rules governing the types of consonants that surface in particular positions within the onset and coda – the general rule is that the more vowel-like a consonant is, the closer it can occur to the vowel.
“The main difference between a vowel and a consonant is that vowels are articulated with a wide-open vocal tract (no significant narrowing or blockage) and the vocal cords are vibrating to create the voicing that allows the sound of the vowels to project easily through space. Consonants, on the other hand, involve some kind of narrowing or complete closure of the vocal tract,” says Peters.
Here is a summary of different consonant types:
Stops (such as p, b, t, d, k/c, hard-g) involve a complete, momentary closure of the vocal tract.
Fricatives (such as s, z, sh, f, v, h, etc.) involve a significant narrowing such that friction is created as the air passes through, which creates noise.
Nasals (such as m, n) involve a complete closure in the mouth, but the passage of air is directed through the nose.
Liquids (such as l, r) involve a very fluid passage of air around the tongue.
Glides (such as y, w) involve the movement of the tongue from the glide to the adjacent vowel.
She adds, “Consonants are ranked based on how vowel-like they are with stops/fricatives being the least vowel-like — since they involve a really significant closure of the vocal tract and sometimes involve the absence of vibration of the vocal cords. Put your fingers on your Adam’s apple/larynx and say the sounds s and then z and you can really feel the vibration on the z but not on the s. Nasal being a little more vowel-like in that the vocal cords are vibrating and the air passes through the nose quite easily — although the mouth is obstructed. Liquids are even more vowel-like in that the air flows easily through the mouth, and then glides are the most vowel-like consonants because the tongue does not obstruct the air in a significant way at all, but your mouth is still a bit more closed-in than it is for vowels.”
Following this rule (i.e., the consonant will be more vowel-like the closer it is to the vowel) predicts the existence of words such as twerk where the t and k occur at the far edges because they are stops and the w (as a glide) and the r (as a liquid) can surface closer to the vowel. This rule also predicts why we can’t have words in English like wtekr (in which the order of the first two and final pair of consonants is switched).
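The ordering rule can be tested mechanically with a rough Python sketch. It makes several simplifying assumptions that are not from Peters: it treats letters as if they were sounds, it files j, q and x with the stops and fricatives, it always counts y as a glide (so a word like slyly has no vowel here), and it ignores known exceptions such as s-initial clusters. The names SONORITY and follows_sonority are hypothetical.

```python
# Ranks follow the article's ordering, least to most vowel-like:
# stops/fricatives (0), nasals (1), liquids (2), glides (3), vowels (4).
# Assumption: j, q and x are grouped with stops/fricatives.
SONORITY = {}
for letters, rank in [("pbtdkcgfvszhjqx", 0), ("mn", 1),
                      ("lr", 2), ("wy", 3), ("aeiou", 4)]:
    for ch in letters:
        SONORITY[ch] = rank


def follows_sonority(word):
    """True if ranks never fall before the first vowel
    and never rise after the last vowel."""
    ranks = [SONORITY[ch] for ch in word.lower()]
    vowel_positions = [i for i, r in enumerate(ranks) if r == 4]
    if not vowel_positions:
        return False  # no a, e, i, o, u to act as the nucleus
    first, last = vowel_positions[0], vowel_positions[-1]
    onset = ranks[:first + 1]  # onset consonants plus first vowel
    coda = ranks[last:]        # last vowel plus coda consonants
    rising = all(a <= b for a, b in zip(onset, onset[1:]))
    falling = all(a >= b for a, b in zip(coda, coda[1:]))
    return rising and falling


print(follows_sonority("twerk"))  # True
print(follows_sonority("wtekr"))  # False
```

The check only looks at the consonants before the first vowel and after the last one, so it says nothing about what happens in the middle of longer words.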
A small set of words can have three consonants in the onset position, but these words must always have s or t as the first letter (e.g., scr, shr, spl, spr, thr), while it is more common for words to have three consonants in the coda position (e.g., ght, mbs, nks, rps, tch).
The winning strategy
So, now that she has run us through the rules governing the formation of words, what’s her advice to players who want to complete Wordle with the fewest attempts possible? She says it’s best to start with a three-pronged approach. “Start with words that have lots of vowels – all words need vowels so you can establish the underlying structure of the word by determining which vowels the word has. Secondly, don’t repeat letters (initially). And, thirdly, use some high frequency consonants like s, t, r, l,” says Peters.
Her go-to starting words based on this method include: audio, pause, rouse, argue, outer, slate, untie, etc. She says it’s best to avoid starting with words that have no traditional vowels, like slyly, or words with repeated letters, like nanny, mommy, daddy, sassy, etc.
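The three-pronged approach can be turned into a simple scoring function, sketched here in Python. The scoring scheme is an assumption invented for illustration, not something Peters proposed: one point per distinct vowel, one point per distinct high-frequency consonant (s, t, r, l), and a score of zero for any word that repeats a letter. The name starter_score is hypothetical.

```python
VOWELS = set("aeiou")
COMMON_CONSONANTS = set("strl")  # the consonants Peters names


def starter_score(word):
    """Hypothetical score: distinct vowels plus distinct common
    consonants, or zero if any letter repeats."""
    letters = set(word)
    if len(letters) < len(word):
        return 0  # repeated letter: ruled out as an opener
    return len(letters & VOWELS) + len(letters & COMMON_CONSONANTS)


candidates = ["audio", "pause", "slate", "slyly", "nanny"]
for word in sorted(candidates, key=starter_score, reverse=True):
    print(word, starter_score(word))
```

Under this scheme slate scores 5, audio and pause score 4, and slyly and nanny score 0, which lines up with the words she recommends and the ones she warns against.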