Asp Forum - Is this secure? - comp.lang.python

mk

2/23/2010 2:36:00 PM

Hello,

I need to generate passwords and I think that pseudo-random generator is
not good enough, frankly. So I wrote this function:

import struct

def gen_rand_string():
fileobj = open('/dev/urandom','rb')
rstr = fileobj.read(4)
rnum = struct.unpack('L',rstr)[0]
rstr = '%i' % rnum
rnuml = []
while len(rstr) >= 2:
c = rstr[:2]
try:
num = int(c)
rnuml.append(num)
except ValueError:
pass
rstr = rstr[2:]
rnuml = map(lambda x: 97+x/4, rnuml)
rnumc = map(chr, rnuml)
return ''.join(rnumc)

if __name__ == "__main__":
print gen_rand_string()

(yes I know that this way generated string will not contain 'z' because
99/4 + 97 = 121 which is 'y')

The question is: is this secure? That is, can the string generated this
way be considered truly random? (I abstract from not-quite-perfect
nature of /dev/urandom at the moment; I can always switch to /dev/random
which is better)

Regards,
mk

50 Answers

Paul Rubin

2/23/2010 7:20:00 PM

mk <mrkafk@gmail.com> writes:
> I need to generate passwords and I think that pseudo-random generator
> is not good enough, frankly. So I wrote this function:...
> The question is: is this secure? That is, can the string generated
> this way be considered truly random? (I abstract from
> not-quite-perfect nature of /dev/urandom at the moment; I can always
> switch to /dev/random which is better)

urandom is fine and the entropy loss from the numeric conversions and
eliminating 'z' in that code before you get letters out is not too bad.
The code is pretty ugly. The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer. Passwords that
short are vulnerable to dictionary attacks. Longer passwords made from
random letters are difficult to remember.

I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters. Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program. The docs there are pretty
thoughtful and may help you understand the relevant issues.

mk

2/23/2010 7:59:00 PM

On Feb 23, 7:19 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

> The code is pretty ugly. The main problem is you end up with a password
> that's usually 5 letters but sometimes just 4 or fewer.

Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?

> You might look at the sitewww.diceware.comfor an approach to this,
> which you can implement with a program. The docs there are pretty
> thoughtful and may help you understand the relevant issues.

Thanks. But I would also be grateful for indicating what is wrong/ugly
in my code.

Regards,
mk

Robert Kern

2/23/2010 8:04:00 PM

On 2010-02-23 13:19 PM, Paul Rubin wrote:

> I find it's most practical to use a few random words (chosen from a word
> list like /usr/dict/words) rather than random letters. Words are easier
> to remember and type.
>
> You might look at the site www.diceware.com for an approach to this,
> which you can implement with a program. The docs there are pretty
> thoughtful and may help you understand the relevant issues.

I like RFC 1751 for this:

http://gitweb.pycrypto.org/?p=crypto/pycrypto-2.x.git;a=blob;f=lib/Crypto/Util/RFC1751.py;h=1c98a212c22066adabfee521b495eeb4f9d723...

Shortened URL:

http://...

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Robert Kern

2/23/2010 8:49:00 PM

On 2010-02-23 13:59 PM, mk wrote:
> On Feb 23, 7:19 pm, Paul Rubin<no.em...@nospam.invalid> wrote:
>
>> The code is pretty ugly. The main problem is you end up with a password
>> that's usually 5 letters but sometimes just 4 or fewer.
>
> Well I didn't write the whole thing here, in actual use I'd write a
> loop repeating the function until I have enough characters and then
> I'd select a substring of specified length.
>
> Anything else in the code that is ugly and I should correct?

I would recommend using random.SystemRandom.choice() on a sequence of acceptable
characters. E.g. (untested)

import random
import string

characters = string.letters + string.digits + '~!@#$%^&*()-+=,;./\?><|'
# ... or whatever.

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice(characters))
return ''.join(chars)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Lawrence D'Oliveiro

2/23/2010 11:18:00 PM

In message <mailman.110.1266935711.4577.python-list@python.org>, mk wrote:

> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:

Much simpler:

import subprocess

data, _ = subprocess.Popen \
(
args = ("pwgen", "-nc"),
stdout = subprocess.PIPE
).communicate()
print data

Steven D'Aprano

2/24/2010 2:08:00 AM

On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> Hello,
>
> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:
[snip]
> (yes I know that this way generated string will not contain 'z' because
> 99/4 + 97 = 121 which is 'y')

You're worried about the security of the PRNG but then generate a TWO to
FIVE character lowercase password with no digits, punctuation or the
letter 'z'? That's priceless!

Python's PRNG is not suitable for producing cryptographically strong
streams of random bytes, but it is perfectly strong enough for generating
good passwords.

> The question is: is this secure?

No.

You are wasting your time trying to fix something which isn't a problem,
and introducing a much bigger problem instead. You are MUCH MUCH MUCH
better off with a six or ten character password taken from upper and
lowercase letters, plus digits, plus punctuation, than a four digit
password taken from lowercase letters only. Even if the first case has
some subtle statistical deviation from uniformity, and the second is
"truly random" (whatever that means), it doesn't matter.

Nobody is going to crack your password because the password generator is
0.01% more likely to generate a "G" than a "q". But they *will* brute-
force your password if you have a four digit password taken from a-y only.

> That is, can the string generated this
> way be considered truly random?

Define truly random.

--
Steven

Steven D'Aprano

2/24/2010 2:20:00 AM

On Tue, 23 Feb 2010 11:19:59 -0800, Paul Rubin wrote:

> mk <mrkafk@gmail.com> writes:
>> I need to generate passwords and I think that pseudo-random generator
>> is not good enough, frankly. So I wrote this function:... The question
>> is: is this secure? That is, can the string generated this way be
>> considered truly random? (I abstract from not-quite-perfect nature of
>> /dev/urandom at the moment; I can always switch to /dev/random which is
>> better)
>
> urandom is fine and the entropy loss from the numeric conversions and
> eliminating 'z' in that code before you get letters out is not too bad.

What?

You're going from a possible alphabet of 62 (excluding punctuation) or 94
(inc punctuation available on an American keyboard) distinct letters down
to 25, and you say that's "not too bad"?

Paul, if you were anyone else, I'd be sneering uncontrollably about now,
but you're not clueless about cryptography, so what have I missed? Why is
reducing the number of distinct letters by more than 50% anything but a
disaster? This makes the task of brute-forcing the password exponentially
easier.

Add the fact that the passwords are so short (as little as two characters
in my tests) and this is about as far from secure as it is possible to be.

--
Steven

Paul Rubin

2/24/2010 2:40:00 AM

Steven D'Aprano <steven@REMOVE.THIS.cybersource.com.au> writes:
> Paul, if you were anyone else, I'd be sneering uncontrollably about now,
> but you're not clueless about cryptography, so what have I missed? Why is
> reducing the number of distinct letters by more than 50% anything but a
> disaster? This makes the task of brute-forcing the password exponentially
> easier.

Reducing the number of distinct letters by 50% decreases the entropy per
character by 1 bit. That stuff about mixing letters and digits and
funny symbols just makes the password a worse nuisance to remember and
type, for a small gain in entropy that you can compute and make up for.
The main thing you have to make sure is that the min-entropy is
sufficient for your purposes, and it's generally more convenient to do
that by making the password a little bit longer than by imposing
contortions on the person typing it. Ross Anderson's "Security
Engineering" chapter about passwords is worth reading too:

http://www.cl.cam.ac.uk/~rja14/Papers...

When I mentioned entropy loss to the OP though, I mostly meant loss from
getting rid of the letter z. The (binary) Shannon entropy of the
uniform probability distribution on 26 letters is 4.7004397 bits; on 25
letters, it's 4.6438561 bits. The difference isn't enough to give an
attacker that much advantage.

I like the diceware approach to passphrase generation and I've been
using it for years. www.diceware.com explains it in detail and the docs
there are quite well-thought-out and informative. Keep in mind that the
entropy needed for an online password (attacker has to make a server
query for every guess, and hopefully gets locked out after n wrong
tries) and an offline one (attacker has something like a hash of the
password and can run a completely offline search) are different.

Here is a program that I use sometimes:

from math import log
dictfile = '/usr/share/dict/words'

def genrandom(nbytes):
with open('/dev/urandom') as f:
return int(f.read(nbytes).encode('hex'), 16)

def main():
wordlist = list(x.strip() for x in open(dictfile) if len(x) < 7)
nwords = len(wordlist)
print "%d words, entropy=%.3f bits/word"% (
nwords, log(nwords, 2))
print '-'.join(wordlist[genrandom(10)%nwords] for i in xrange(6))

main()

Steven D'Aprano

2/24/2010 2:40:00 AM

On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> The question is: is this secure? That is, can the string generated this
> way be considered truly random?

Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.

This isn't a sophisticated statistical test, it's the equivalent of a
back-of-the-envelope calculation: I generated 100,000 random strings with
your code, and counted how often each letter appears:

If the letters are uniformly distributed, you would expect all the
numbers to be quite close, but instead they range from 15063 to 25679:

{'a': 15063, 'c': 20105, 'b': 15100, 'e': 25465, 'd': 25458, 'g': 25597,
'f': 25589, 'i': 25045, 'h': 25679, 'k': 22945, 'j': 25531, 'm': 16187,
'l': 16252, 'o': 16076, 'n': 16012, 'q': 16069, 'p': 16119, 's': 16088,
'r': 16087, 'u': 15951, 't': 16081, 'w': 16236, 'v': 15893, 'y': 15834,
'x': 15956}

Eye-balling it, it looks vaguely two-humped, one hump around 15-16K, the
second around 22-25K. Sure enough, here's a quick-and-dirty graph:

a | ***********************************
b | ***********************************
c | ***********************************************
d | ***********************************************************
e | ***********************************************************
f | ************************************************************
g | ************************************************************
h | ************************************************************
i | ***********************************************************
j | ************************************************************
k | ******************************************************
l | **************************************
m | **************************************
n | *************************************
o | **************************************
p | **************************************
q | **************************************
r | **************************************
s | **************************************
t | **************************************
u | *************************************
v | *************************************
w | **************************************
x | *************************************
y | *************************************

The mean of the counts is 19056.72, and the mean deviation is 3992.28.
While none of this is statistically sophisticated, it does indicate to me
that your function is nowhere even close to uniform. It has a very strong
bias.

--
Steven

Paul Rubin

2/24/2010 2:42:00 AM

Steven D'Aprano <steven@REMOVE.THIS.cybersource.com.au> writes:
> Putting aside the philosophical question of what "truly random" means, I
> presume you mean that the letters are uniformly distributed. The answer
> to that is, they don't like uniformly distributed.

That is a good point, the way those letters are generated (through the
decimal conversion stuff), they won't be all that uniform.

comp.lang.python

Is this secure?

mk

Paul Rubin

mk

Robert Kern

Robert Kern

Lawrence D'Oliveiro

Steven D'Aprano

Steven D'Aprano

Paul Rubin

Steven D'Aprano

Paul Rubin

x Login to ForumsZone