On Thu, Sep 19, 2019 at 7:34 AM Theodore Y. Ts'o <tytso@xxxxxxx> wrote: > > > It's basically a "we used up entropy" thing, which is very > > questionable to begin with as the whole discussion has shown, but > > since it stops doing it after 10 cases, it's not even good security > > assuming the "use up entropy" case makes sense in the first place. > > It was a bug that it stopped doing it after 10 tries, and there's a > really good reason for it. I really doubt that. > The reason for zeroing it after we expose state is because otherwise > if the pool starts in a known state (the attacker knows the starting > configuration, knows the DMI table that we're mixing into the pool > since that's a constant, etc.), That's at least partly because our pool hashing has what looks a fairly sad property. Yes, it hashes it using a good hash, but it does so in a way that makes it largely possible to follow the hashing and repeat it and analyze it. That breaks if we have hw randomness, because it does the if (arch_get_random_long(&v)) crng->state[14] ^= v; so it always mixes in hardware randomness as part of the extraction, but we don't mix anything else unpredictable - or even process-specific - state in. So without hw randomness, you can try to get a lot of data over a lot of boots - and for long times during boots - and maybe find the pattern. But honestly, this isn't realistic. I can point to emails where *you* are arguing against other hashing algorithms because the whole state extension attack simply isn't realistic. And I think it's also pretty questionable how we don't try to mix in anything timing/process-specific when extracting, which is what makes that "do lots of boots" possible. The silly "reset crng_init_cnt" does absolutely nothing to help that, but in fact what it does is to basically give the attacker a way to get an infinite stream of data without any reseeding (because that only happens after crng_read()), and able to extend that "block at boot" time indefinitely while doing so. Also honestly, if the attacker already has access to the system at boot, you have some fairly big problems to begin with. So a much bigger issue than the state extension attack (pretty much purely theoretical, given any entropy at all, which we _will_ have even without the crng_init_cnt clearing) is the fact that right now we really are predictable if there are no hardware interrupts, and people have used /dev/urandom because other sources weren't useful. And the fact is, we *know* people use /dev/urandom exactly because other sources haven't been useful. And unlike your theoretical state extension attack, I can point you to black hat presentations that literally talk about using the fact that we delay m,ixing in the input pull hash to know what's going on: https://www.blackhat.com/docs/eu-14/materials/eu-14-Kedmi-Attacking-The-Linux-PRNG-On-Android-Weaknesses-In-Seeding-Of-Entropic-Pools-And-Low-Boot-Time-Entropy.pdf That's a real attack. Based on the REAL fact that we currently have to use the urandom logic because the entropy-waiting one is useless, and in fact depends on the re-seeding happening too late. Yes, yes, our urandom has changed since that attack, and we use chacha instead of sha1 these days. We have other changes too. But I don't see anything fundamentally different. And all your arguments seem to make that _real_ security issue just worse, exactly because we also avoid reseeding while crng_init is zero. > I'm happy this proposed is not changing the behavior of getrandom(0). > Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though? It > will have the same effect, and it's make it clear what we're doing. Have you you not followed the whole discussion? Didn't you read the comment? People use "getrandom(0)" not because they want secure randomness, but because that's the default. And we *will* do something about it. This patch didn't, because I want to be able to backport it to stable, so that everybody is happier with saying "ok, I'll use the new getrandom(GRND_INSECURE)". Because getrandom(0) will NOT be the the same as GRND_EXPLICIT | GRND_WAIT_ENTROPY. getrandom(0) is the "I don't know what I am doing" thing. It could be somebody that wants real secure random numbers. Or it could *not* be one of those, and need the timeout. > Later on, when we rip out /dev/random pool code (and make reading from > /dev/random the equivalent of getrandom(GRND_SECURE)), we'll need to > similarly map the legacy combination of flags for GRND_RANDOM and > GRND_RANDOM | GRND_NONBLOCK. And that is completely immaterial, because the "I'm confused" case isn't about GRND_RANDOM. Nobody uses that anyway, and more importantly it's not the case that has caused bugs. That one blocks even during normal execution, so that one - despite being completely useless - actually has the one good thing going for it that it's testable. People will see the "oh, that took a long time" during testing. And then they'll stop using it. Ted - you really don't seem to be making any distinction between "these are real problems that should be fixed" vs "this is theory that isn't relevant". The "getrandom(0)" is a real problem that needs to be fixed. The warnings from /dev/urandom are real problems that people apparently have worked around by (incorrectly) using getrandom(0). The "hashing the random pool still leaves identities in place" is a real problem that had a real attack. The state extension attack? Complete theory (again, I can point to you saying the same thing in other threads), and the "fix" of resetting the counter and not reseeding seems to be anything but. Linus