David Wagner writes:
> Andreas Beck wrote:
> >DONT do this [use /dev/urandom to generate VNC challenges]. This will
> >deplete the random pool for no good reason.
>
> I believe your comment is incorrect or misleading.  I believe
> /dev/urandom will not deplete the randomness pool, and I suspect using
> /dev/urandom is a good idea.  Try it.
>
> Don't confuse /dev/random with /dev/urandom.  /dev/random depletes
> the randomness pool, but should rarely be used in applications, IMHO.
> /dev/urandom is typically the right choice, and continues to give
> output without blocking to wait for more entropy input.
>
> (It is true that if you call /dev/urandom, other users of /dev/random
> may find that there is no entropy left for them.  But this should only
> be an issue for users of /dev/random, and the latter should be rare.)

So, for what it's worth, on my Linux 2.4.13 system, /dev/random seems
to have about 2100 bytes of randomness to start with, and accumulates
new randomness at about 13 bytes per second on a relatively idle
system.

I know you know the following, but I think your explanation of it above
is confusing, so I have undertaken to restate it: /dev/urandom *does*
deplete the entropy pool; it's just that depletion of the entropy pool
doesn't keep /dev/urandom from working, but it does keep /dev/random
from working.

My understanding is that /dev/random is only more secure than
/dev/urandom if it turns out that MD5 has undiscovered weaknesses; is
that correct?

> >A challenge only needs to be _different_ each time.
>
> Not so.  Challenges should be unpredictable.  A counter would not
> make a very good challenge.  It can't hurt, and is probably a good
> idea, to simply use crypto-quality random numbers in your challenge.

Well, a challenge that is different each time will effectively foil
replay attacks --- even a simple counter will do for that.  What kind
of attack do you have in mind that would be easier if the challenge
were a simple counter?  (Assuming, of course, that the counter never
restarted from 0.)

I was actually working on a Zope project last night in which I wanted
to avoid the authorized-user-taking-unintended-action problem --- you
know, the one where the attacker tricks a web site administrator into
clicking a button on the attacker's web site, which makes the
administrator's browser POST a form, with the administrator's
credentials, to the administrator's own web site.  I embedded a random
string in the (legitimate) form, and when the form was posted, checked
whether the random string in the request was one the server had sent
out.  Thus, for an attacker to produce a fake form, they must obtain
the value of such a random string, either by sniffing the wire running
to the administrator's browser or by guessing the random string.
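In outline, the check looks something like this --- a minimal sketch
with made-up names, not the actual Zope code, and it assumes
/dev/urandom is available:

    import binascii

    outstanding_tokens = {}  # tokens we have issued for forms

    def new_form_token():
        # 16 bytes from /dev/urandom --- far too much to guess
        token = binascii.hexlify(open('/dev/urandom', 'rb').read(16))
        outstanding_tokens[token] = 1
        return token  # embed this in a hidden field in the form

    def check_form_token(token):
        # accept only random strings we actually sent out
        return outstanding_tokens.has_key(token)

A real version would also want to expire old tokens (and probably tie
each token to the administrator's session) so the table doesn't grow
without bound.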
On Linux, I just use /dev/urandom, as you recommend.  On
/dev/urandom-less systems, I settled on the following.  I don't know
how safe it is, and I'd be interested in comments.

    import sys, os, sha, time

    counter = 0

    def random_hex_string():
        # try to capture some of the memory allocation state:
        thousandlists = [[] for ii in range(1000)]
        # and other interpreter state:
        counts = [sys.getrefcount(ii) for ii in range(-2, 100)]
        charcounts = [sys.getrefcount(chr(ii)) for ii in range(256)]
        # and something that's guaranteed to be different each time:
        global counter
        counter += 1
        randomdata = ','.join(map(str, [os.getpid(),
                                        time.time(),
                                        time.clock(),
                                        map(id, thousandlists),
                                        counts,
                                        charcounts,
                                        counter,
                                        # some large chunk of
                                        # application data here
                                        ]))
        return sha.sha(randomdata).hexdigest()

'counter' is used to prevent any repetition, no matter how frequently
"random" strings are generated.

os.getpid() provides no entropy against someone who can log into the
machine, since they can find out the PID of the process with little
difficulty, and someone with intimate knowledge of the inner workings
of the machine may be able to narrow down the PID possibilities
considerably.  (For example, if the program starts when the machine
boots, it may always have the same PID.)  But it will often provide a
few bits of entropy, at least four or five.

time.time() provides a floating-point version of the current time,
generally with subsecond resolution, but usually with much worse than
millisecond resolution.  This probably provides a few bits of entropy
(at least four or so, possibly as many as a dozen) against someone who
can't log into the machine this code is running on, because it's
relatively hard for them to determine what time the machine thinks it
is --- its clock may not be set very accurately.  Note that this
program takes several seconds to restart, so time.time() plus the
counter prevents repetition, if nothing else.

time.clock() provides a floating-point number of CPU seconds this
process has consumed; this is included because it may be harder to
guess than time.time() --- it depends on how many requests this thread
has handled and how much work they took --- and because it usually has
higher resolution than time.time(), although it advances much more
slowly.  It will provide very little entropy when the attacker knows
the process has just started, but much more when the process has been
running for a long time, or may have been.  So, to some extent, this
counterbalances os.getpid(): time.clock() will tend to provide more
entropy when os.getpid() provides little, and os.getpid() will tend to
provide more entropy when time.clock() provides little.  time.clock()
can usually also be estimated by an attacker who is logged into the
same machine.  In my application, I guess that time.clock() provides
around ten bits of entropy.

'counts' and 'charcounts' are the numbers of references to the
integers -2 through 99 and to the various single-character strings;
these may depend on the history of the program.  Python optimizes a
little bit by keeping only one copy of each number in the range
[-1, 99] and of each single-character string, but it counts how many
references each has.  In this program, I think these counts depend
sensitively on the exact data stored in the program, which changes
each day, as well as on the history of requests the program has
handled.  I think this probably provides at least ten bits of entropy,
possibly much more.
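To get a feel for how much state those counts capture, you can poke at
them at an interactive prompt (a toy demonstration; the exact numbers
vary with interpreter version and with whatever else the program has
done, which is the point):

    import sys

    # small integers and single-character strings are shared objects,
    # so every live binding to one of them shows up in its refcount:
    print sys.getrefcount(0), sys.getrefcount(1), sys.getrefcount('a')

    x = [1] * 1000            # a thousand new references to the int 1
    print sys.getrefcount(1)  # now roughly a thousand higher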
'thousandlists' is a list of a thousand newly-created empty lists; we
take their memory addresses, which depend on the state of the Python
memory allocator.  In this program, I suspect that this state changes
considerably after every request, but I don't really have a good way
of measuring that; I am guessing that at least the first couple of
hundred of these lists are allocated at somewhat unpredictable places
and therefore provide a couple of bits of entropy each.  But I'm not
very sure of this, and obviously it depends on the program; that's why
I included all the other stuff above.

Then we take all of these things, plus several megabytes of actual
application data, join their string representations with commas, and
hash the result with SHA-1.

Does anyone have a better solution that doesn't involve calling
entropy-gathering routines from all over the program or running a
continuous entropy-gathering thread?  Are there any big problems with
this solution, other than that it has only about 28 bits of entropy
(by my pessimistic estimates) if my 'thousandlists' trick isn't really
very effective?  28 bits is probably sufficient for my purposes.  Is
there some much simpler solution I could have more confidence in?

-- 
<kragen@pobox.com>      Kragen Sitaker     <http://www.pobox.com/~kragen/>
What we *need* is for some advanced off-world sentience to carpet nuke
planet Earth from high orbit.  Call it Equal Opportunity Ethnic
Cleansing.  I mean, racism is so petty.  Why play favorites?  -- RageBoy