Re: [RFC] Username/Password NSE library
From: "Andrew J. Bennieston" <harriergr7 () gmail com>
Date: Wed, 18 Jun 2008 15:34:09 +0100
The following are some thoughts on the statistics of password guessing.
I haven't researched this in any detail whatsoever, but I'm a physicist
during the day, and one of my interests lies in statistical modelling.
Following on from the comment that a list of the 1000 most common
passwords performs barely any better than one of the 500 most common, I
was immediately reminded of the Central Limit Theorem. This arises from
anything which has a Gaussian (or near-Gaussian) distribution, and
states two things which are of potential interest here.
The first is obvious: the most efficient outcome is to get the password
right on the first guess; this provides the "zero centred" peak for our
Gaussian distribution of efficiency vs. list size.
The second is the statement that the width of the distribution scales as
the inverse square root of N, the number of items in the list.
Interpreting this for password lists, we can say that as the length N of
a list increases, the increase in effectiveness we get by virtue of it
including more passwords, and thus being more likely to guess the
correct one, should scale at most as the square root of N (i.e. the
reciprocal of the "distribution width").
For example, if we normalise our measure of effectiveness so that the
constant of proportionality is 1, then the effectiveness of a list of
500 passwords is given the (dimensionless) value Sqrt[500] = 22.4
The effectiveness of 1000 passwords, a list twice as long, is
Sqrt[1000] = 31.6
This clearly demonstrates that, in this model, the effectiveness is not
doubled by doubling the list length. If, as was hinted in earlier posts,
the fall-off of effectiveness is much steeper than this, i.e. 1000
passwords are barely any more effective than 500, then this could be
accommodated using a prefactor proportional to N to some power.
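
As a rough numerical check of the square-root model described above, the
following Python sketch evaluates the effectiveness for lists of 500 and
1000 entries. The function name and the `exponent` parameter are
hypothetical, introduced only for illustration; an exponent below 0.5 is
one assumed way to represent a steeper fall-off.

```python
import math


def effectiveness(n, exponent=0.5):
    """Model effectiveness of a list of n passwords as n**exponent.

    Assumes the constant of proportionality is normalised to 1.
    exponent=0.5 is the square-root model; smaller values are an
    assumed stand-in for a steeper fall-off.
    """
    return n ** exponent


e500 = effectiveness(500)
e1000 = effectiveness(1000)
ratio = e1000 / e500  # doubling the list multiplies effectiveness by sqrt(2)

print(round(e500, 1), round(e1000, 1), round(ratio, 3))  # 22.4 31.6 1.414

# With a much smaller (assumed) exponent, doubling the list barely helps.
steep_ratio = effectiveness(1000, 0.1) / effectiveness(500, 0.1)
print(round(steep_ratio, 3))  # 1.072
```

In this model, doubling the list length buys a factor of about 1.41, not 2.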
Of course, most people on this list aren't interested in the slightest
in the statistical models of such things, but in how to choose the
optimal list length. Based on the model postulated above, we can say
that the expected difference in effectiveness E between two lists of
length N1 and N2, given by E2 = 2*E1, and the true effectiveness, given
by E2' = Sqrt[N2] can be written:
delta E(N1, N2) = 2*Sqrt[N1] - Sqrt[N2] where N2 = 2 * N1
=> dE(N1) = 2*Sqrt[N1] - Sqrt[2*N1]
Setting an arbitrary threshold dE <= 10, we can find the value of N1
above which the discrepancy between the effectiveness expected from
doubling the list size and the true effectiveness from the Central Limit
Theorem exceeds 10:
dE(N1) = 2Sqrt[N1] - Sqrt[2N1] <= 10
Sqrt[N1](2 - Sqrt[2]) <= 10
Sqrt[N1] <= 10/(2 - Sqrt[2])
N1 <= 291.4
In other words, a password list of size ~ 250 to 300 keeps the
discrepancy between true effectiveness of the list size, and perceived
effectiveness due to the presence of more words below the value of 10
(which I chose arbitrarily, but it seems to have provided a reasonable
value for N!)
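
The threshold calculation above can be reproduced with a short Python
sketch. The function names are hypothetical; the formula is the one
given in the text, dE(N1) = 2*Sqrt[N1] - Sqrt[2*N1], solved for N1.

```python
import math


def discrepancy(n1):
    """dE(N1): expected (doubled) effectiveness minus the sqrt-model value."""
    return 2 * math.sqrt(n1) - math.sqrt(2 * n1)


def max_list_size(threshold):
    """Largest N1 for which discrepancy(N1) <= threshold."""
    return (threshold / (2 - math.sqrt(2))) ** 2


n1_max = max_list_size(10)
print(round(n1_max, 1))               # 291.4
print(round(discrepancy(n1_max), 6))  # 10.0
```

Because the threshold of 10 is arbitrary, the resulting list size scales
with the square of whatever threshold is chosen.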
Anyway, I hope some of that helped to get you guys thinking about the
effect of list size on brute-force password guessing; there is
definitely a diminishing return, and while the distribution may not be
exactly Gaussian, most independent random variables fit some kind of
Gaussian, and thus obey the central limit theorem.
Andrew J. Bennieston
Kris Katterjohn wrote:
Brandon Enright wrote:
On Tue, 17 Jun 2008 15:46:09 -0500
Kris Katterjohn <katterjohn () gmail com> wrote:
I've started working on a username and password NSE library. This
library will separately hand out usernames and/or passwords to
scripts for use with brute forcing or whathaveyou.
I'll probably have one set of functions return a closure to return the
usernames or passwords one-at-a-time, and possibly another set of
functions to return the whole username or password table.
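
The two access styles described above (a closure that hands out entries
one at a time, and a function returning the whole table) might look
something like the following. This is only an illustrative sketch in
Python, not the NSE (Lua) library itself; every name here is
hypothetical.

```python
def make_iterator(entries):
    """Return a closure that yields one entry per call, then None."""
    index = 0

    def next_entry():
        nonlocal index
        if index >= len(entries):
            return None  # list exhausted
        value = entries[index]
        index += 1
        return value

    return next_entry


def whole_table(entries):
    """Return a copy of the full list for scripts that want it all at once."""
    return list(entries)


# Usage: a script decides for itself how many guesses to make.
next_password = make_iterator(["123456", "password", "letmein"])
first = next_password()
second = next_password()
```

The closure keeps its own position, so each script can stop after as few
or as many guesses as it likes without the library imposing a limit.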
Username specific passwords would be _really_ nice.
I'm thinking for root the password list would be a few hundred long.
For other users the list would probably be something like:
These are interesting ideas, especially the overall user-specific passwords.
Now I need opinions on good username and password lists to ship and
use by default. There is an ordered password list shipped with John
the Ripper which has 3107 entries. The license pretty much says
we can distribute it if we give credit and also ship the license.
Are there any ideas on a better list?
It has been my experience, both from UCSD being on the victim side of
constant password guessing, and from me being on the "audit our
passwords" side, that more passwords is _not_ better. If you don't guess
the password in the first hundred tries or so it is very unlikely that
continued guessing will help much.
Guessing passwords over the network is expensive and there is a
diminishing return. The value of trying an additional password is
roughly inversely proportional to the number you have already tried.
We've found that a list of the 1000 most commonly guessed passwords
performs almost no better than 500 but takes twice as long.
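
One way to read the "inversely proportional" statement above is that the
k-th guess is worth about 1/k, so the total value of a list of N guesses
is the harmonic sum 1 + 1/2 + ... + 1/N. That reading is an assumption
made for illustration, and the function name below is hypothetical.

```python
import math


def cumulative_value(n):
    """Total value of the first n guesses if guess k is worth 1/k (assumed)."""
    return sum(1.0 / k for k in range(1, n + 1))


v500 = cumulative_value(500)
v1000 = cumulative_value(1000)
gain = v1000 / v500 - 1  # relative improvement from doubling the list

# Doubling the list adds roughly ln(2) to the total, about a 10% gain here.
print(round(v1000 - v500, 2), round(math.log(2), 2))
```

Under this assumed model, the second 500 guesses cost as much time as
the first 500 but add only about a tenth of their value.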
Interesting! I've never been a brute-forcer, so I had no idea what a good
number of guesses would be.
Of course, it is up to each script how many attempts it makes: the library
will only provide them with the data. This library is specifically for giving
scripts usernames and passwords, so that's a good reason to have a whole bunch.
On one hand, I don't want Nmap's list to end up being too small because
somebody's script does want to do a lot of guessing; but, on the other hand,
if a user wants a massive list to use, they can always select their own.
What about a good username list?
Besides the obvious root, webadmin, guest, admin, test, mysql, web,
oracle, student, staff, etc., we should only use first names.
Nearly 100% of the SSH brute force compromises we fall to are just
first-name usernames like:
you get the idea
Good idea. Maybe there can be an option given to the username function to
return only "administrator" usernames like root, admin, etc. But thinking
about it for a second, it wouldn't be easy to do just reading from a list.
Of course we could just have the administrator names at the top of the list,
which is probably best anyway.
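
Putting the administrator names at the top of the list, as suggested
above, could be sketched like this in Python. The `ADMIN_NAMES` tuple and
both function names are hypothetical; which names count as
"administrator" names is an assumption.

```python
ADMIN_NAMES = ("root", "admin")  # assumed set of administrator usernames


def order_usernames(names, admin_names=ADMIN_NAMES):
    """Move administrator usernames to the front, keeping the rest in order."""
    admins = [n for n in admin_names if n in names]
    others = [n for n in names if n not in admin_names]
    return admins + others


def admin_only(names, admin_names=ADMIN_NAMES):
    """Return just the administrator usernames present in the list."""
    return [n for n in admin_names if n in names]


ordered = order_usernames(["john", "root", "mary", "admin"])
print(ordered)  # ['root', 'admin', 'john', 'mary']
```

With this ordering, a script that only wants administrator names can
simply stop reading early.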
Any other comments are appreciated.
I think the best way to gather the root list is to collect real-world
honeypot data. I have data I can provide and I'm sure hundreds of
others on this list also have data. We should probably cat * | sort |
uniq -c | sort -nr | head -n 500 to make our list.
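
For anyone who prefers not to use the shell, the same counting idea
(tally every observed password, sort by frequency, keep the top 500) can
be sketched in Python. The function name and sample data are
hypothetical, and unlike the shell pipeline this version skips blank
lines.

```python
from collections import Counter


def top_passwords(lines, limit=500):
    """Count each password and return the `limit` most frequent ones.

    Blank lines are skipped (an assumption; sort | uniq -c would count them).
    """
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(limit)


# Made-up sample standing in for concatenated honeypot logs.
sample = ["123456\n", "password\n", "123456\n", "root\n", "123456\n", "password\n"]
top_two = top_passwords(sample, limit=2)
print(top_two)  # [('123456', 3), ('password', 2)]
```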
That would be awesome, though the actual number of entries in the list is
Overall I think this is a very good idea, Kris. I look forward to the
Ah, if only I could take credit for good ideas. This, like many things I work
on, was handed to me from people in the Thinking Stuff Up Dept. ;)
It really should be cool when it's complete because bruteTelnet will be ported
to it and it should make the creation of other brute-force scripts a bit easier.
Sent through the nmap-dev mailing list
Archived at http://SecLists.Org