On Tue, Mar 17, 2020 at 02:44:33PM -0700, Kees Cook wrote: > On Tue, Mar 17, 2020 at 01:50:35PM +0000, George Spelvin wrote: > > First, use long rather than u64 for the bit buffer type, which > > is significantly more efficient on 32-bit processors. > > Why? This is twice as many get_random_*() calls for 32-bit: it'll save > whatever bits it doesn't use. (Speaking to word-sized stores, yes, that > makes sense, but see below...) Yes, but the per-call overhead is fairly low until you run out of the 64-byte buffer; then there's an expensive crypto operation to refill the buffer. So two get_random_32 calls is not a lot more than one get_random_64. (The buffers are per-CPU, so they are pretty cheap to access. (I have other patches I'm working on to speed up the locking they have, but this one is independent so I'm sending it separately.) And if you want faster, prandom_u32() is probably good enough for this application. I did it to speed up and shrink the fast path. It saves at least 3 instructions on 32-bit machines (one load, one rotate through carry, and one store), at the cost of one extra get_random_*() call per 64 bits. If get_random_u32()'s fast path is less than 192 instructions, it's a win. It's also relevant to the race condition reasoning mentioned below; a one-word value can be atomically updated. On a 32-bit machine, 64-bit operations get tearing and it's a PITA to reason about. >> Second, avoid the need for a separate rand_bits counter. >> rand_bits is never more than 63, so there's always room in rand >> for a bit to mark the end of the available bits. This makes the >> shared state atomically updatable, which makes it a lot easier >> to reason about race conditions. > > What are the race conditions? I had to go look at I see that > add_to_free_area_random() is called under __free_one_page() which is > under &zone->lock. Would it make more sense to store the randomness > per-zone and entirely side-step the issues? The most serious is that if two threads simultaneously observe rand_bits == 1, but do their decrements separately, you end up with rand_bits = 255 and you generate 255 consecutive 0 bits before refilling the buffer. Since we're only generating random bits, a screwed-up answer occasionally doesn't really matter (like the comment says "lack of locking is deliberate"), but 255 screwed up bits is a bit much. I avoided changing the underlying locking model because I didn't feel up to such an invasive change; I managed to fix the problems I saw without going there. And shrink the code; tht seemed like enough of a win to justify it to me. If you want to change the locking, the other alternative to per-zone is per-CPU, which also avoids locking. We could even add a generic get_random_bit() function. >> Third, use READ_ONCE and WRITE_ONCE. Without them, the compiler >> may spill to the shared static in arbitrarily perverse ways, >> and combined with the fact that the code eschews locking, that >> is a recipe for hard-to-find bugs. Now, a race might cause a bit >> to be used twice, or get_random_long() to be called redundantly, >> but it can't summon nasal daemons. > > All these things might make the results less predictable, too. :) But, > yes, if there was deterministic sharing of random values, that'd be > bad. But this patch doesn't really solve the "use the same random bits" > problem, which is the very thing that might get us into a predictable > state. If two zones both call READ_ONCE(), use the same value (eek), > and then both call WRITE_ONCE(), one of them wins the write (no big deal). Absolutely, but in case of a tight race, only *one* bit is used twice, and the bit used is uniformly distributed, so it's a really small correlation. Unlike the 255x zero race the existing code can have. The main reason I did this is to, as I already wrote, keep the nasal demons at bay. The existing code does things (having another thread change a non-volatile variable while this code is executing) that are explicitly undefined by the ANSI C standard. The compiler is allowed to (in John Woods' memorable explanation) produce code that makes demons fly out of your nose. (More plausibly, it may simply crash.) It probably won't, but I prefer to stay on the safe side of the language specification contact. Linus has ranted on more than one occasion about the perverse ways that GCC has used its permission to assume that people won't violate the contract.