Re: [PATCH] mm/shuffle.c: optimize add_to_free_area_random()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 17, 2020 at 02:44:33PM -0700, Kees Cook wrote:
> On Tue, Mar 17, 2020 at 01:50:35PM +0000, George Spelvin wrote:
> > First, use long rather than u64 for the bit buffer type, which
> > is significantly more efficient on 32-bit processors.
> 
> Why? This is twice as many get_random_*() calls for 32-bit: it'll save
> whatever bits it doesn't use. (Speaking to word-sized stores, yes, that
> makes sense, but see below...)

Yes, but the per-call overhead is fairly low until you run out of the
64-byte buffer; then there's an expensive crypto operation to
refill the buffer.

So two get_random_32 calls is not a lot more than one get_random_64. (The 
buffers are per-CPU, so they are pretty cheap to access.  (I have other 
patches I'm working on to speed up the locking they have, but this one is 
independent so I'm sending it separately.)

And if you want faster, prandom_u32() is probably good enough for this
application.

I did it to speed up and shrink the fast path.  It saves at least
3 instructions on 32-bit machines (one load, one rotate through carry, 
and one store), at the cost of one extra get_random_*() call per 64
bits.  If get_random_u32()'s fast path is less than 192 instructions,
it's a win.

It's also relevant to the race condition reasoning mentioned below;
a one-word value can be atomically updated.  On a 32-bit machine,
64-bit operations get tearing and it's a PITA to reason about.

>> Second, avoid the need for a separate rand_bits counter.
>> rand_bits is never more than 63, so there's always room in rand
>> for a bit to mark the end of the available bits.  This makes the
>> shared state atomically updatable, which makes it a lot easier
>> to reason about race conditions.
> 
> What are the race conditions? I had to go look at I see that
> add_to_free_area_random() is called under __free_one_page() which is
> under &zone->lock. Would it make more sense to store the randomness
> per-zone and entirely side-step the issues?

The most serious is that if two threads simultaneously observe
rand_bits == 1, but do their decrements separately, you end up
with rand_bits = 255 and you generate 255 consecutive 0 bits
before refilling the buffer.

Since we're only generating random bits, a screwed-up answer occasionally 
doesn't really matter (like the comment says "lack of locking is 
deliberate"), but 255 screwed up bits is a bit much.

I avoided changing the underlying locking model because I didn't
feel up to such an invasive change; I managed to fix the problems
I saw without going there.  And shrink the code; tht seemed like
enough of a win to justify it to me.

If you want to change the locking, the other alternative to per-zone is 
per-CPU, which also avoids locking.  We could even add a generic 
get_random_bit() function.

>> Third, use READ_ONCE and WRITE_ONCE.  Without them, the compiler
>> may spill to the shared static in arbitrarily perverse ways,
>> and combined with the fact that the code eschews locking, that
>> is a recipe for hard-to-find bugs.  Now, a race might cause a bit
>> to be used twice, or get_random_long() to be called redundantly,
>> but it can't summon nasal daemons.
> 
> All these things might make the results less predictable, too. :) But,
> yes, if there was deterministic sharing of random values, that'd be
> bad. But this patch doesn't really solve the "use the same random bits"
> problem, which is the very thing that might get us into a predictable
> state. If two zones both call READ_ONCE(), use the same value (eek),
> and then both call WRITE_ONCE(), one of them wins the write (no big deal).

Absolutely, but in case of a tight race, only *one* bit is used twice,
and the bit used is uniformly distributed, so it's a really small
correlation.  Unlike the 255x zero race the existing code can have.

The main reason I did this is to, as I already wrote, keep the nasal 
demons at bay.  The existing code does things (having another thread 
change a non-volatile variable while this code is executing) that are 
explicitly undefined by the ANSI C standard.

The compiler is allowed to (in John Woods' memorable explanation)
produce code that makes demons fly out of your nose.  (More plausibly,
it may simply crash.)

It probably won't, but I prefer to stay on the safe side of the
language specification contact.  Linus has ranted on more than one 
occasion about the perverse ways that GCC has used its permission
to assume that people won't violate the contract.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux