Wordle is a source of healthy competition in our family. So, I downloaded the most common five-letter words from the internet and analysed them with the following code. This leads me to suggest — only in the context of Wordle — that you should STARE at the CHILD that is FUNKY.
library(tidyverse)

# Read the word list, then split each word into one column per letter
words <- read_csv(file = "five-letters.csv", col_names = FALSE) |>
  rename("word" = X1) |>
  mutate(word = str_to_lower(word)) |>
  mutate(
    l1 = str_sub(string = word, start = 1, end = 1),
    l2 = str_sub(string = word, start = 2, end = 2),
    l3 = str_sub(string = word, start = 3, end = 3),
    l4 = str_sub(string = word, start = 4, end = 4),
    l5 = str_sub(string = word, start = 5, end = 5)
  )

words
word   l1     l2     l3     l4     l5
<chr>  <chr>  <chr>  <chr>  <chr>  <chr>
about  a      b      o      u      t
above  a      b      o      v      e
abuse  a      b      u      s      e
actor  a      c      t      o      r
acute  a      c      u      t      e
admit  a      d      m      i      t
adopt  a      d      o      p      t
adult  a      d      u      l      t
after  a      f      t      e      r
again  a      g      a      i      n
For reference, I’ll also chart the frequency of each letter by its position in these five-letter words.
words_long <- words |>
  pivot_longer(cols = -word, names_to = "measure", values_to = "values") |>
  mutate(position = as.integer(str_sub(string = measure, start = 2))) |>
  select(values, position)

words_long |>
  count(values, position) |>
  mutate(
    position = case_match(
      position,
      1 ~ "1st",
      2 ~ "2nd",
      3 ~ "3rd",
      4 ~ "4th",
      5 ~ "5th"
    )
  ) |>
  ggplot(aes(x = values, y = n, fill = values %in% c("a", "e", "r", "s", "t"))) +
  geom_col() +
  scale_y_continuous(limits = c(0, NA), minor_breaks = NULL, expand = expansion(mult = 0, add = 1)) +
  scale_fill_manual(values = c("#edd9c0", "#63431c")) +
  facet_wrap(~position, nrow = 1) +
  theme_minimal() +
  labs(
    title = "Frequency of letter by word order",
    subtitle = "Emphasis on the letters a, e, r, s and t\n",
    x = NULL,
    y = NULL
  ) +
  theme(legend.position = "none", plot.title.position = "plot")
As the emphasis shows, some of these letters appear far more often than others, especially at the start and end of the word.
Finding the best first guess
Wordle tells you both whether a letter is in the word and whether it is in the correct position, so I’ll treat the latter as more important than the former. My train journey wasn’t long enough for me to delve deeper, so I assumed that guessing a letter in the right position is twice as good as guessing a correct letter in any position. Given the way that my brain works, I find it easier to guess a word if I know the first and last letter, so I’ll give correct guesses in these positions 50% more kudos than those in other positions.
If I incorporate all these preferences — and remove words where a letter occurs more than once — the following words become the best options for a first guess at Wordle.
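The sketch below shows one way these preferences could be turned into a score. It is my reading of the description above rather than the exact code behind the results: the sample words, the object names and the precise weights (2 for a letter in the right position, 1 for a right letter elsewhere, and a 1.5 multiplier on right-position points in the first and last positions) are all assumptions.

```r
library(dplyr)
library(stringr)

# Hypothetical sample; the post scores the full list from five-letters.csv
sample_words <- c("stare", "child", "funky", "about", "abuse", "again", "sleep")

# One row per word and letter position
letters_long <- tibble(
  word = rep(sample_words, each = 5),
  position = rep(1:5, times = length(sample_words))
) |>
  mutate(letter = str_sub(word, position, position))

# Number of words with each letter in each position
position_counts <- letters_long |>
  count(letter, position, name = "n_position")

# Number of words containing each letter anywhere, counted once per word
letter_counts <- letters_long |>
  distinct(word, letter) |>
  count(letter, name = "n_any")

first_guess_scores <- letters_long |>
  left_join(position_counts, by = c("letter", "position")) |>
  left_join(letter_counts, by = "letter") |>
  mutate(
    # Assumed weights: right position is worth 2, right letter elsewhere 1,
    # and a right position at either end of the word earns 50% more
    end_bonus = if_else(position %in% c(1, 5), 1.5, 1),
    points = 2 * end_bonus * n_position + (n_any - n_position)
  ) |>
  group_by(word) |>
  # Drop words where a letter occurs more than once
  filter(n_distinct(letter) == 5) |>
  summarise(score = sum(points)) |>
  arrange(desc(score))

first_guess_scores
```

With only seven sample words the ranking means little. The point is the shape of the calculation: count letters by position, weight them, sum per word, and sort.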
In other words, the best first guess at Wordle is stare. That shouldn’t be a surprise, as the chart above shows many occasions where s, t, a, r and e occur in five-letter words. Even better, s is the most frequent start to a word and e is the most common end letter in these words, giving the word a lot of ‘bonus’ points.
Finding the best subsequent guesses
Given that I’ve already chosen stare, what is the best second guess? [1]
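One plausible approach is to discard every word that shares a letter with stare and then score what remains in the same way as before. The post does not spell this out, so treat it as an assumption, though it is consistent with the result: child and funky share no letters with stare or with each other. A minimal sketch of the filtering step, using a hypothetical sample of candidate words:

```r
library(stringr)

# Hypothetical candidates; in practice this would be the full word list
candidates <- c("stare", "child", "funky", "about", "cloud", "pound")

# Letters already used by earlier guesses
used_letters <- "stare"

# Character class that matches any letter already used
pattern <- str_c("[", used_letters, "]")

# Keep only the words that reuse none of those letters
remaining <- candidates[!str_detect(candidates, pattern)]
remaining
```

For a third guess, the same step could be repeated with `used_letters` set to the letters of both earlier guesses.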
Whilst I could have continued with the analysis, my train journey didn’t permit it. I was therefore left with three best guesses, which I remember with this mnemonic:
STARE at the CHILD that is FUNKY
👀 🧒🏻 😎
UPDATE in July 2023: It turns out that the mnemonic above is a successful strategy, as it has given me a winning streak of 201 days (and counting).
Footnotes
[1] These ‘best’ guesses might not be perfect, as my assumptions above and the approach in general could probably be improved, perhaps with Operational Research techniques. That said, I suspect that ‘stare’ and the next guesses are decent approximations to the ideal solution.