
Maximising the information gained using Shannon's entropy is a very good strategy (assuming the goal is to minimise the expected number of guesses), but it is not necessarily optimal!

I have a counterexample for a simplified version of the game with the following rule changes:

1. The player is only told which letters in the guess are correct (i.e. they are not told about letters that are present but in a different location).

2. If the player knows there is only one possible solution, the player wins immediately (without having to explicitly guess that word).

3. The set of words that the player is allowed to guess may be disjoint from the set of possible solutions.

Here is the list of possible solutions:

    aaaa
    aaab
    aaba
    babb
    abaa
    bbab
    bbba
    bbbb
(There are 8 words. The 2nd, 3rd and 4th letters are the binary patterns of length 3, and the 1st letter is a carefully chosen "red herring".)

Here is the dictionary of words the player is allowed to guess:

    axxx
    xaxx
    xxax
    xxxa
(Each guess effectively lets the player query a single letter of the solution.)

The information gain for each possible initial guess is identical (all guesses result in a 4-4 split), so a strategy based on information gain would have to make an arbitrary choice.

If the initial guess is axxx (the "red herring"), the expected number of guesses is 3.25.

But a better strategy is to guess xaxx (then guess xxax and xxxa). The expected number of guesses is then 3.

(In this example information gain was tied, but I have a larger example where the information gain for the "red herring" is greater than the information gain for the optimal first guess.)
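For anyone who wants to check those numbers, here is a rough Python sketch of the simplified game, assuming feedback is just the set of exactly-matching positions and that a known solution costs no further guess (rule 2). The word lists are the ones above; the two fixed orderings are the two strategies described:

    from functools import lru_cache

    # Simplified game: the solution is drawn uniformly from SOLUTIONS, a guess
    # only reveals which of its positions match the solution exactly, and the
    # game ends at no extra cost once a single candidate remains (rule 2).
    SOLUTIONS = ("aaaa", "aaab", "aaba", "babb", "abaa", "bbab", "bbba", "bbbb")
    GUESSES = ("axxx", "xaxx", "xxax", "xxxa")

    def feedback(guess, solution):
        # Which positions of the guess are exactly correct.
        return tuple(g == s for g, s in zip(guess, solution))

    def split(candidates, guess):
        # Partition the remaining candidates by the feedback the guess would give.
        parts = {}
        for sol in candidates:
            parts.setdefault(feedback(guess, sol), []).append(sol)
        return parts

    def expected_fixed(candidates, order):
        # Expected number of guesses when the guess order is fixed in advance.
        if len(candidates) == 1:
            return 0.0
        parts = split(candidates, order[0])
        return 1 + sum(len(p) / len(candidates) * expected_fixed(p, order[1:])
                       for p in parts.values())

    @lru_cache(maxsize=None)
    def expected_optimal(candidates):
        # Minimum expected number of guesses over all adaptive strategies.
        if len(candidates) == 1:
            return 0.0
        best = float("inf")
        for guess in GUESSES:
            parts = split(candidates, guess)
            if len(parts) == 1:
                continue  # this guess reveals nothing about these candidates
            cost = 1 + sum(len(p) / len(candidates) * expected_optimal(tuple(p))
                           for p in parts.values())
            best = min(best, cost)
        return best

    print(expected_fixed(SOLUTIONS, ("axxx", "xaxx", "xxax", "xxxa")))  # 3.25
    print(expected_fixed(SOLUTIONS, ("xaxx", "xxax", "xxxa")))          # 3.0
    print(expected_optimal(SOLUTIONS))                                  # 3.0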



Interesting. There can't be a proof of general optimality for Shannon entropy, because the words are irregularly distributed. However, unlike your lists, they're not distributed by an adversary trying to foil Wordle/Jotto strategies.

I suspect a law of large numbers / central limit theorem type result that Shannon entropy is asymptotically optimal for randomly chosen lists, even those generated by state machines like gibberish generators that nearly output English words. In other words, I conjecture that your configurations are rare for long lists.

Early in my career, I was naive enough to code up Gröbner bases with a friend, to tackle problems in algebraic geometry. I didn't yet know that computer scientists at MIT had tried random equations with horrid running times, and other computer scientists at MIT had established special cases with exponential-space-complete complexity. Our first theorem explained why algebraic geometers were lucky here. This is a trichotomy one often sees: "Good reason for asking" / "Monkeys at a keyboard" / "Troublemakers at a demo".

Languages evolve like coding theory, maintaining Hamming distance between words to enhance intelligibility. It could well be that the Wordle dictionary behaves quasirandomly, more uniformly spaced than a truly random dictionary, so Shannon entropy behaves better than expected.
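One crude way to probe that conjecture (purely a sketch; the handful of words below is illustrative, not the actual Wordle dictionary) is to compare the pairwise Hamming-distance histogram of a word sample against uniformly random strings of the same length:

    import random
    from collections import Counter

    def hamming(u, v):
        # Number of positions at which two equal-length words differ.
        return sum(a != b for a, b in zip(u, v))

    def distance_histogram(words):
        # Histogram of pairwise Hamming distances within a word list.
        counts = Counter()
        for i, u in enumerate(words):
            for v in words[i + 1:]:
                counts[hamming(u, v)] += 1
        return dict(sorted(counts.items()))

    # Illustrative sample only -- swap in the real Wordle dictionary to test
    # whether it is spaced more uniformly than random strings.
    sample = ["crane", "slate", "trace", "crate", "stale", "least", "react"]
    random.seed(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    random_sample = ["".join(random.choice(alphabet) for _ in range(5))
                     for _ in range(len(sample))]

    print("word sample:  ", distance_histogram(sample))
    print("random sample:", distance_histogram(random_sample))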


For the entropy strategy, the expected number of guesses on the full dictionary (allowing all 12972 words as possible solutions) is 4.233 (54910/12972)* according to my tests.

The best score on https://botfights.io/game/wordle is currently 4.044 (4044/1000).

*Spoiler: The score of the entropy strategy can be improved to 4.086 (53007/12972) by tweaking the entropy formula as follows: Let n be the number of possible solutions and let k_i be the size of the i-th partition after the current guess. The usual entropy formula is sum_i { p_i * -log(p_i) } where p_i = k_i / n. The improved formula is sum_i { p_i * -log((k_i + 1) / n) }. This formula aligns more closely with the expected number of guesses required to solve small partitions.
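To make the tweak concrete, here is a minimal sketch of the two scoring functions. The partition sizes would come from whatever solver you already have; the example splits at the bottom are made up, and the log base is irrelevant for ranking guesses:

    import math

    def entropy_score(partition_sizes):
        # Standard expected information gain: sum of p_i * -log2(p_i), p_i = k_i / n.
        n = sum(partition_sizes)
        return sum((k / n) * -math.log2(k / n) for k in partition_sizes)

    def tweaked_score(partition_sizes):
        # Tweak from above: use (k_i + 1) / n inside the log, which slightly
        # discounts the credit given to very small partitions.
        n = sum(partition_sizes)
        return sum((k / n) * -math.log2((k + 1) / n) for k in partition_sizes)

    # Made-up example: two ways a guess might split 10 remaining candidates.
    for sizes in ([5, 5], [1, 1, 8]):
        print(sizes, round(entropy_score(sizes), 3), round(tweaked_score(sizes), 3))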



