On Thu, Sep 29, 2016 at 11:31:32AM +1000, Nicholas Piggin wrote:
> On Wed, 28 Sep 2016 09:05:46 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Wed, Sep 28, 2016 at 03:06:21AM +1000, Nicholas Piggin wrote:
> > > On Tue, 27 Sep 2016 18:52:21 +0200
> > > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > > On Wed, Sep 28, 2016 at 12:53:18AM +1000, Nicholas Piggin wrote:
> > > > > The more interesting is the ability to avoid the barrier between fastpath
> > > > > clearing a bit and testing for waiters.
> > > > >
> > > > >     unlock():               lock() (slowpath):
> > > > >     clear_bit(PG_locked)    set_bit(PG_waiter)
> > > > >     test_bit(PG_waiter)     test_bit(PG_locked)
> > > > >
> > > > > If this was memory ops to different words, it would require smp_mb each
> > > > > side.. Being the same word, can we avoid them?
> > > >
> > > > Ah, that is the reason I put that smp_mb__after_atomic() there. You have
> > > > a cute point on them being to the same word though. Need to think about
> > > > that.
> > >
> > > This is all assuming the store accesses are ordered, which you should get
> > > if the stores to the different bits operate on the same address and size.
> > > That might not be the case for some architectures, but they might not
> > > require barriers for other reasons. That would call for an smp_mb variant
> > > that is used for bitops on different bits but same aligned long.
> >
> > Since the {set,clear}_bit operations are atomic, they must be ordered
> > against one another. The subsequent test_bit is a load, which, since its
> > to the same variable, and a CPU must appear to preserve Program-Order,
> > must come after the RmW.
> >
> > So I think you're right and that we can forgo the memory barriers here.
> > I even think this must be true on all architectures.
>
> In generic code, I don't think so. We'd need an
> smp_mb__between_bitops_to_the_same_aligned_long, wouldn't we?
>
> x86 implements set_bit as 'orb (addr),bit_nr', and compiler could
> implement test_bit as a byte load as well. If those bits are in
> different bytes, then they could be reordered, no?
>
> ia64 does 32-bit ops. If you make PG_waiter 64-bit only and put it
> in the different side of the long, then this could be a problem too.

Fair point, that would defeat the same-location ordering...

							Thanx, Paul
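
For readers following the thread, the same-word argument can be sketched
in portable C11 atomics. This is only an illustration, not kernel code:
page_flags, page_unlock_fast() and page_lock_slow_check() are hypothetical
names, the bit positions are made up, and the sketch assumes that every
access is a full-word operation on one atomic object. That assumption is
exactly what the x86 'orb' and ia64 32-bit cases above would break.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; the thread only names PG_locked and PG_waiter. */
#define PG_LOCKED (1UL << 0)
#define PG_WAITER (1UL << 1)

/* Both bits live in one atomic object, so C11 coherence covers them. */
static _Atomic unsigned long page_flags;

/*
 * unlock() fastpath: clear PG_locked, then test PG_waiter with no
 * full barrier in between. Returns true when a waiter must be woken.
 */
static bool page_unlock_fast(void)
{
	atomic_fetch_and_explicit(&page_flags, ~PG_LOCKED,
				  memory_order_release);
	return atomic_load_explicit(&page_flags,
				    memory_order_relaxed) & PG_WAITER;
}

/*
 * lock() slowpath: set PG_waiter, then test PG_locked with no full
 * barrier in between. Returns true when the page is still locked and
 * the caller should go to sleep.
 */
static bool page_lock_slow_check(void)
{
	atomic_fetch_or_explicit(&page_flags, PG_WAITER,
				 memory_order_relaxed);
	return atomic_load_explicit(&page_flags,
				    memory_order_relaxed) & PG_LOCKED;
}
```

Because both read-modify-writes target the same atomic object, they fall
into a single modification order, and each side's later load cannot
return a value older than its own RMW. Whichever RMW comes second, that
side's load therefore observes the other side's bit, so either the
unlocker sees PG_waiter or the waiter sees PG_locked cleared. Once the
set, clear and test are done with different access sizes or on different
bytes, they no longer count as the same location and this reasoning does
not carry over.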