Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

  import random, re

  lines=[re.sub('[-!.,;:]',' ',x).split() for x in open('in').readlines()]

  w=set()
  for l in lines:
      w=w | set(l)

  random.sample(w,4)
Hrm, for romeo and juliet I get some interesting combinations...

  ['unseen', 'iron', 'Catling', 'baked']
  ['grove', 'press', 'Aurora', 'garish']
  ['agate', 'She', 'drybeat', 'rather']
  ['flowed', 'sails', 'wed', 'masks']
  ['spilt', 'cage', 'Remembering', 'stiff']
  ['heartsick', 'shame', 'enjoin', 'weeping']


> import random, re

You'll want to replace random with a cryptographically secure RNG.


Could you describe a practical attack against this password generator based on how it actually works?

Specifically what information you would need to be in possession of in order to exploit it, and then a description of how you would - again, in practice - exploit that, inside a practical timeframe. Thanks.


No, there probably isn't a realistic attack against Python's default MT random based on the normal use case of a random word selector run "offline".


> there probably isn't a realistic attack against Python's default MT random based on the normal use case of a random word selector run "offline".

What does Python use as an entropy source for its PRNG? Even though MTs work pretty well if you can't look at any of their outputs (i.e. you're "offline"), if the initial state space is small enough you can just crawl the entire thing.


It's offline, so, I have my own dictionary which pretty much destroys any chance of guessing what words are sampled.


urandom.


I'm also interested in knowing a practical attack against this.

How could a weakness in the randomness be a problem at all in this case? (generating words to use as password)

Ok suppose I've used a not-at-all-random PRNG and got the following numbers: 1, 16, 29, 30, 18

So I grab the 1st, 16th, 29th, 30th and the 18th words from my list and those words are my generated password.

How is this vulnerable to any attacks?


To expound on the other response.

A PRNG is only as random as its seed / starting state. Let's say it gets this state from the current time, in milliseconds.

Now, let's assume the attacker knows, based on your "Time joined" statistic for some account, a roughly 2 minute period of when you generated your password (it's very common for a website to show time joined in the granularity of a second through an api; passwords are rarely generated more than a minute beforehand if you use a generator + keepass). 2 minutes is 3,600,000 milliseconds... or in other words, they can seed the RNG with all 3.6 million "time-in-milliseconds" and run the program and see the output. They now only have to guess 3.6 million possible passwords, not, say, 1e20 (assuming a dictionary of 10k words). That's a massive speedup on the order of 1e14.

I hope that all made sense... the basic idea is just if the attacker can make any sort of estimate about the state of the PRNG, he can just test passwords generated from states like that, not all possible passwords.

You'll notice, of course, that this vulnerability relies on the other party being able to discern some information about the state. This doesn't have to be from guessing the seed directly (as was the case in the example above). It can also happen by observing multiple outputs and guessing at what seed could produce them both, or noticing a bias.

The last one, the chance for a bias, is quite possibly what you're asking about. "How could a weakness in the randomness be a problem at all in this case?". A weakness in randomness means that some combinations of words will show up more often than others, by definition. It makes the password weaker because the attacker can discover this bias and then guess more probable passwords first.


Thanks, but all of that is based o the assumption that the attacker has the list of words that I used to generate the password right?


That he has it or can get it, yes. Even if he doesn't have it, there are a few dictionaries (/usr/share/dict/words on various popular distros would be a good start) he could try and only guess the probable options in both of them.

Furthermore, it's possible he could figure out which dictionary you were using by another means. If a site that stored your password in plaintext was compromised, that would give him 4,5,whatever words in your dictionary, and using a similar attack on this flawed prng as discussed above, he could narrow down the possible positions of those words, thereby figuring what dictionary is most probable.

The assumption that the attacker can get your word list in this sort of password scheme is actually implicit in how the entropy is calculated; it's calculated assuming the attacker is stringing together words from the same known source in the same general format.


>How is this vulnerable to any attacks?

Because if the PRNG isn't sufficiently random, an attacker can determine that you got the numbers 1, 16, 29, 30, 18.


But what if they don't have the list? (which is most likely)


I'm not familiar with python's non-secure PRNG implementation. Usually it involves the fact that A)these PRNGs have small entropy pools, so you can check the entire state space quickly and B)they have non-uniform outputs, so you can do probabilistic attacks.


python and ruby use a mersene twister prng: http://en.wikipedia.org/wiki/Mersenne_twister it is a good prng... but it is not crypto secure. as an easy patch just use urandom: http://cryptography.readthedocs.org/en/latest/random-numbers...


Here is what I happen to have sitting around. (I'm not recommending it; there are much more secure ones. But it's short, and uses a crypto-random generator.)

    import math
    from random import SystemRandom

    cryptogen = SystemRandom()

    alphabet = open('/path/to/a/custom/wordlist').read().splitlines()
    assert len(set(alphabet)) == len(alphabet)

    def gen_token(length):
        return ','.join(cryptogen.sample(alphabet, length))

    nbits = 80
    length = int(math.ceil(nbits * math.log(2, len(alphabet))))
    print gen_token(length)


    from random import SystemRandom
    random = SystemRandom()




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: