On Wed, Mar 09, 2016 at 06:52:45PM +0530, Vineet Gupta wrote:
> On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
> >> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
> >> with a __clear_bit(). Latter can obviously clobber former if they are not a single
> >> instruction each unlike x86 or they use llock/scond kind of instructions where the
> >> interim store from other core is detected and causes a retry of whole llock/scond
> >> sequence.
> >
> > Yes, test_and_set_bit() + __clear_bit() is broken.
>
> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
> (ignoring the performance thing for discussion sake, which is a side effect of
> this implementation).

The short answer is: Per definition. They are defined to work together,
which is what makes __clear_bit_unlock() such a special function.

> So despite the comment below in bit_spinlock.h I don't quite comprehend how this
> is allowable. And if say, by deduction, this is fine for LLSC or lock prefixed
> cases, then isn't this true in general for lot more cases in kernel, i.e. pairing
> atomic lock with non-atomic unlock ? I'm missing something !

x86 (and others) do in fact use non-atomic instructions for
spin_unlock(). But as this is all arch specific, we can make these
assumptions. It's just that generic code cannot rely on it.

So let me try and explain.

The problem as identified is:

CPU0                                            CPU1

bit_spin_lock()                                 __bit_spin_unlock()
1:
        /* fetch_or, r1 holds the old value */
        spin_lock
        load    r1, addr
                                                load    r1, addr
                                                bclr    r2, r1, 1
                                                store   r2, addr
        or      r2, r1, 1
        store   r2, addr        /* lost the store from CPU1 */
        spin_unlock

        and     r1, 1
        bnz     2       /* it was set, go wait */
        ret

2:
        load    r1, addr
        and     r1, 1
        bnz     2       /* wait until it's not set */

        b       1       /* try again */


For LL/SC we replace:

        spin_lock
        load    r1, addr

        ...

        store   r2, addr
        spin_unlock

With the (obvious):

1:
        load-locked     r1, addr

        ...

        store-cond      r2, addr
        bnz             1 /* or whatever branch instruction is required to retry */


In this case the failure cannot happen, because the store from CPU1
would have invalidated the lock from CPU0 and caused the
store-cond to fail and retry the loop, observing the new value.
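
As a rough illustration only (user-space C11, not kernel code), the two
cases above can be replayed deterministically in a single thread by
placing CPU1's unlock by hand between CPU0's load and store. All names
here (LOCK_BIT, struct replay_result, replay_plain(), replay_llsc()) are
made up for this sketch, and a compare-and-swap stands in for store-cond.
That is an approximation: a real store-cond also fails if the word was
written and then restored to its old value, which a CAS would not notice.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_BIT 1UL    /* hypothetical lock bit mask, for illustration only */

struct replay_result {
    unsigned long word;     /* final value of the lock word */
    bool cpu0_acquired;     /* did CPU0 see the bit clear in its old value? */
    int attempts;           /* passes CPU0 made through its fetch_or */
};

/* CPU0's fetch_or is a plain load/store pair, as in the first diagram. */
struct replay_result replay_plain(void)
{
    unsigned long word = LOCK_BIT;  /* CPU1 holds the lock */
    unsigned long r1, r2;

    r1 = word;                      /* CPU0: load  r1, addr */
    word = word & ~LOCK_BIT;        /* CPU1: load, bclr, store (unlock) */
    r2 = r1 | LOCK_BIT;             /* CPU0: or    r2, r1, 1 */
    word = r2;                      /* CPU0: store r2, addr; CPU1's store is lost */

    /* The bit is set again, but CPU0 saw it set in r1 and goes to wait. */
    return (struct replay_result){ word, !(r1 & LOCK_BIT), 1 };
}

/* CPU0's fetch_or is a retry loop; CAS approximates store-cond. */
struct replay_result replay_llsc(void)
{
    atomic_ulong word;
    unsigned long r1, r2, expected;
    bool cpu1_ran = false;
    int attempts = 0;

    atomic_init(&word, LOCK_BIT);   /* CPU1 holds the lock */

    for (;;) {
        attempts++;
        r1 = atomic_load(&word);    /* CPU0: load-locked r1, addr */

        if (!cpu1_ran) {            /* CPU1's unlock lands here, once */
            atomic_fetch_and(&word, ~LOCK_BIT);
            cpu1_ran = true;
        }

        r2 = r1 | LOCK_BIT;
        expected = r1;              /* CAS overwrites 'expected' on failure */
        if (atomic_compare_exchange_strong(&word, &expected, r2))
            break;                  /* CPU0: store-cond succeeded */
        /* store-cond failed: retry and observe the new value */
    }

    return (struct replay_result){ atomic_load(&word), !(r1 & LOCK_BIT), attempts };
}
```

With the plain load/store pair the word ends up with the bit set while
CPU0 believes it failed to take the lock, so nobody owns it. With the
retry loop the first store attempt fails, the second pass sees the bit
clear, and CPU0 takes the lock.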