On Thu, Sep 19, 2019 at 8:20 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Yes, it hashes it using a good hash, but it does so in a way that > makes it largely possible to follow the hashing and repeat it and > analyze it. > > That breaks if we have hw randomness, because it does the > > if (arch_get_random_long(&v)) > crng->state[14] ^= v; > > so it always mixes in hardware randomness as part of the extraction, > but we don't mix anything else unpredictable - or even > process-specific - state in. So this is the other actual _serious_ patch I'd suggest: replace the if (arch_get_random_long(&v)) crng->state[14] ^= v; with if (!arch_get_random_long(&v)) v = random_get_entropy(); crng->state[14] += v; instead. Yeah, it still doesn't help on machines that don't even have a cycle counter, but it at least means that you don't have to have a CPU rdrand (or equivalent) but you do have a cycle counter, now the extraction of randomness from the pool doesn't just do the (predictable) mutation for the backtracking, but actually means that you have some very hard to predict timing effects. Again, in this case a cycle counter really does add a small amount of entropy (everybody agrees that modern CPU's are simply too complex to be predictable at a cycle level), but that's not really the point. The point is that now doing the extraction really fundamentally changes the state in unpredictable ways, so that you don't have that "if I recognize a value, I know what the next value will be" kind of attack. Which, as mentioned, is actually not a purely theoretical concern. Note small detail above: I changed the ^= to a +=. Addition tends to be better (due to carry between bits) when there might be bit commonalities. Particularly with something like a cycle count where two xors can mostly cancel out previous bits rather than move bits around in the word. With an actual random input from rdrand, the xor-vs-add is immaterial and doesn't matter, of course, so the old code made sense in that context. In the attached patch I also moved the arch_get_random_long() and random_get_entropy() to outside the crng spinlock. We're not talking blocking operations, but it can easily be hundreds of cycles with rdrand retries, or the random_get_entropy() reading an external clock on some architectures. Linus
diff --git a/drivers/char/random.c b/drivers/char/random.c --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1057,9 +1057,10 @@ static void _extract_crng(struct crng_state *crng, (time_after(crng_global_init_time, crng->init_time) || time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))) crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL); + if (!arch_get_random_long(&v)) + v = random_get_entropy(); spin_lock_irqsave(&crng->lock, flags); - if (arch_get_random_long(&v)) - crng->state[14] ^= v; + crng->state[14] += v; chacha20_block(&crng->state[0], out); if (crng->state[12] == 0) crng->state[13]++;