David Wagner writes:
> Andreas Beck wrote:
> >DONT do this [use /dev/urandom to generate VNC challenges]. This will
> >deplete the random pool for no good reason.
>
> I believe your comment is incorrect or misleading.  I believe
> /dev/urandom will not deplete the randomness pool, and I suspect using
> /dev/urandom is a good idea.  Try it.
>
> Don't confuse /dev/random with /dev/urandom.  /dev/random depletes
> the randomness pool, but should rarely be used in applications, IMHO.
> /dev/urandom is typically the right choice, and continues to give
> output without blocking to wait for more entropy input.
>
> (It is true that if you call /dev/urandom, other users of /dev/random
> may find that there is no entropy left for them.  But this should only
> be an issue for users of /dev/random, and the latter should be rare.)

So, for what it's worth, on my Linux 2.4.13 system, /dev/random seems
to have about 2100 bytes of randomness to start with, and accumulates
new randomness at about 13 bytes per second on a relatively idle
system.

I know you know the following, but I think your explanation of it above
is confusing, so I have undertaken to restate it: /dev/urandom *does*
deplete the entropy pool; it's just that depletion of the entropy pool
doesn't keep /dev/urandom from working, but it does keep /dev/random
from working.

My understanding is that /dev/random is only more secure than
/dev/urandom if it turns out that MD5 has undiscovered weaknesses; is
that correct?

> >A challenge only needs to be _different_ each time.
>
> Not so.  Challenges should be unpredictable.  A counter would not
> make a very good challenge.  It can't hurt, and is probably a good
> idea, to simply use crypto-quality random numbers in your challenge.

Well, a challenge that is different each time will effectively foil
replay attacks --- even a simple counter will do for that.  What kind
of attack do you have in mind that would be easier if the challenge
were a simple counter?  (Assuming, of course, that the counter never
restarted from 0.)

I was actually working on a Zope project last night in which I wanted
to avoid the authorized-user-taking-unintended-action problem --- you
know, the one where the attacker tricks a web site administrator into
clicking a button on the attacker's web site, which makes the
administrator's browser POST a form, with the administrator's
credentials, to the administrator's own web site.  I embedded a random
string in the (legitimate) form, and when the form was posted, checked
whether the random string in the request was one the server had sent
out.  Thus, for an attacker to produce a fake form, they must obtain
the value of such a random string, either by sniffing the wire running
to the administrator's browser or by guessing the random string.
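In outline, the check looks something like this --- a minimal sketch
with made-up names, not the actual Zope code, and it assumes
/dev/urandom is available:

    import binascii

    outstanding_tokens = {}  # tokens we have issued for forms

    def new_form_token():
        # 16 bytes from /dev/urandom --- far too much to guess
        token = binascii.hexlify(open('/dev/urandom', 'rb').read(16))
        outstanding_tokens[token] = 1
        return token  # embed this in a hidden field in the form

    def check_form_token(token):
        # accept only random strings we actually sent out
        return outstanding_tokens.has_key(token)

A real version would also want to expire old tokens (and probably tie
each token to the administrator's session) so the table doesn't grow
without bound.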
On Linux, I just use /dev/urandom, as you recommend.  On
/dev/urandom-less systems, I settled on the following.  I don't know
how safe it is, and I'd be interested in comments.

    import sys, os, sha, time

    counter = 0

    def random_hex_string():
        # try to capture some of the memory allocation state:
        thousandlists = [[] for ii in range(1000)]
        # and other interpreter state:
        counts = [sys.getrefcount(ii) for ii in range(-2, 100)]
        charcounts = [sys.getrefcount(chr(ii)) for ii in range(256)]
        # and something that's guaranteed to be different each time:
        global counter
        counter += 1
        randomdata = ','.join(map(str, [os.getpid(),
                                        time.time(),
                                        time.clock(),
                                        map(id, thousandlists),
                                        counts,
                                        charcounts,
                                        counter,
                                        # some large chunk of
                                        # application data here
                                        ]))
        return sha.sha(randomdata).hexdigest()

'counter' is used to prevent any repetition, no matter how frequently
"random" strings are generated.

os.getpid() provides no entropy against someone who can log into the
machine, since they can find out the PID of the process with little
difficulty, and someone with intimate knowledge of the inner workings
of the machine may be able to narrow down the PID possibilities
considerably.  (For example, if the program starts when the machine
boots, it may always have the same PID.)  But it will often provide a
few bits of entropy, at least four or five.

time.time() provides a floating-point version of the current time,
generally with subsecond resolution, but usually with much worse than
millisecond resolution.  This probably provides a few bits of entropy
(at least four or so, possibly as many as a dozen) against someone who
can't log into the machine this code is running on, because it's
relatively hard for them to determine what time the machine thinks it
is --- its clock may not be set very accurately.  Note that this
program takes several seconds to restart, so time.time() plus the
counter prevents repetition, if nothing else.

time.clock() provides a floating-point number of CPU seconds this
process has consumed; this is included because it may be harder to
guess than time.time() --- it depends on how many requests this thread
has handled and how much work they took --- and because it usually has
higher resolution than time.time(), although it advances much more
slowly.  It will provide very little entropy when the attacker knows
the process has just started, but much more when the process has been
running for a long time, or may have been.  So, to some extent, this
counterbalances os.getpid(): time.clock() will tend to provide more
entropy when os.getpid() provides little, and os.getpid() will tend to
provide more entropy when time.clock() provides little.  time.clock()
can usually also be estimated by an attacker who is logged into the
same machine.  In my application, I guess that time.clock() provides
around ten bits of entropy.

'counts' and 'charcounts' are the numbers of references to the
integers -2 through 99 and to the various single-character strings;
these may depend on the history of the program.  Python optimizes a
little bit by keeping only one copy of each number in the range
[-1, 99] and of each single-character string, but it counts how many
references each has.  In this program, I think these counts depend
sensitively on the exact data stored in the program, which changes
each day, as well as on the history of requests the program has
handled.  I think this probably provides at least ten bits of entropy,
possibly much more.
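To get a feel for how much state those counts capture, you can poke at
them at an interactive prompt (a toy demonstration; the exact numbers
vary with interpreter version and with whatever else the program has
done, which is the point):

    import sys

    # small integers and single-character strings are shared objects,
    # so every live binding to one of them shows up in its refcount:
    print sys.getrefcount(0), sys.getrefcount(1), sys.getrefcount('a')

    x = [1] * 1000            # a thousand new references to the int 1
    print sys.getrefcount(1)  # now roughly a thousand higher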
'thousandlists' is a list of a thousand newly-created empty lists; we
take their memory addresses, which depend on the state of the Python
memory allocator.  In this program, I suspect that this state changes
considerably after every request, but I don't really have a good way
of measuring that; I am guessing that at least the first couple of
hundred of these lists are allocated at somewhat unpredictable places
and therefore provide a couple of bits of entropy each.  But I'm not
very sure of this, and obviously it depends on the program; that's why
I included all the other stuff above.

Then we take all of these things, plus several megabytes of actual
application data, join their string representations with commas, and
hash the result with SHA-1.

Does anyone have a better solution that doesn't involve calling
entropy-gathering routines from all over the program or running a
continuous entropy-gathering thread?  Are there any big problems with
this solution, other than that it has only about 28 bits of entropy
(by my pessimistic estimates) if my 'thousandlists' trick isn't really
very effective?  28 bits is probably sufficient for my purposes.  Is
there some much simpler solution I could have more confidence in?

-- 
<kragen@pobox.com>      Kragen Sitaker     <http://www.pobox.com/~kragen/>
What we *need* is for some advanced off-world sentience to carpet nuke
planet Earth from high orbit.  Call it Equal Opportunity Ethnic
Cleansing.  I mean, racism is so petty.  Why play favorites?  -- RageBoy