Could you describe a practical attack against this password generator based on how it actually works?
Specifically what information you would need to be in possession of in order to exploit it, and then a description of how you would - again, in practice - exploit that, inside a practical timeframe. Thanks.
> there probably isn't a realistic attack against Python's default MT random based on the normal use case of a random word selector run "offline".
What does Python use as an entropy source for its PRNG? Even though MTs work pretty well if you can't look at any of their outputs (i.e. you're "offline"), if the initial state space is small enough you can just crawl the entire thing.
A PRNG is only as random as its seed / starting state. Let's say it gets this state from the current time, in milliseconds.
Now, let's assume the attacker knows, based on your "Time joined" statistic for some account, a roughly 2 minute period of when you generated your password (it's very common for a website to show time joined in the granularity of a second through an api; passwords are rarely generated more than a minute beforehand if you use a generator + keepass). 2 minutes is 3,600,000 milliseconds... or in other words, they can seed the RNG with all 3.6 million "time-in-milliseconds" and run the program and see the output. They now only have to guess 3.6 million possible passwords, not, say, 1e20 (assuming a dictionary of 10k words). That's a massive speedup on the order of 1e14.
I hope that all made sense... the basic idea is just if the attacker can make any sort of estimate about the state of the PRNG, he can just test passwords generated from states like that, not all possible passwords.
You'll notice, of course, that this vulnerability relies on the other party being able to discern some information about the state. This doesn't have to be from guessing the seed directly (as was the case in the example above). It can also happen by observing multiple outputs and guessing at what seed could produce them both, or noticing a bias.
The last one, the chance for a bias, is quite possibly what you're asking about. "How could a weakness in the randomness be a problem at all in this case?". A weakness in randomness means that some combinations of words will show up more often than others, by definition. It makes the password weaker because the attacker can discover this bias and then guess more probable passwords first.
That he has it or can get it, yes. Even if he doesn't have it, there are a few dictionaries (/usr/share/dict/words on various popular distros would be a good start) he could try and only guess the probable options in both of them.
Furthermore, it's possible he could figure out which dictionary you were using by another means. If a site that stored your password in plaintext was compromised, that would give him 4,5,whatever words in your dictionary, and using a similar attack on this flawed prng as discussed above, he could narrow down the possible positions of those words, thereby figuring what dictionary is most probable.
The assumption that the attacker can get your word list in this sort of password scheme is actually implicit in how the entropy is calculated; it's calculated assuming the attacker is stringing together words from the same known source in the same general format.
I'm not familiar with python's non-secure PRNG implementation. Usually it involves the fact that A)these PRNGs have small entropy pools, so you can check the entire state space quickly and B)they have non-uniform outputs, so you can do probabilistic attacks.
Here is what I happen to have sitting around. (I'm not recommending it; there are much more secure ones. But it's short, and uses a crypto-random generator.)