Re: [tip:locking/core] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

Michael Ellerman <mpe@xxxxxxxxxxxxxx> · Sat, 18 Sep 2021 00:36:20 +1000

Will Deacon <will@xxxxxxxxxx> writes:
> On Thu, Sep 09, 2021 at 10:46:35AM -0700, Paul E. McKenney wrote:
>> On Thu, Sep 09, 2021 at 02:35:36PM +0100, Will Deacon wrote:
>> > On Thu, Sep 09, 2021 at 09:25:30AM +0200, Peter Zijlstra wrote:
>> > > On Wed, Sep 08, 2021 at 09:08:33AM -0700, Linus Torvalds wrote:
>> > > > then I think it's entirely reasonable to
>> > > > 
>> > > >         spin_unlock(&r);
>> > > >         spin_lock(&s);
>> > > > 
>> > > > cannot be reordered.
>> > > 
>> > > I'm obviously completely in favour of that :-)
>> > 
>> > I don't think we should require the accesses to the actual lockwords to
>> > be ordered here, as it becomes pretty onerous for relaxed LL/SC
>> > architectures where you'd end up with an extra barrier either after the
>> > unlock() or before the lock() operation. However, I remain absolutely in
>> > favour of strengthening the ordering of the _critical sections_ guarded by
>> > the locks to be RCsc.
>> 
>> If by this you mean the critical sections when observed only by other
>> critical sections for a given lock, then everyone is already there.
>
> No, I mean the case where somebody without the lock (but using memory
> barriers) can observe the critical sections out of order (i.e. W -> R
> order is not maintained).
>
>> However...
>> 
>> > Last time this came up, I think the RISC-V folks were generally happy to
>> > implement whatever was necessary for Linux [1]. The thing that was stopping
>> > us was Power (see CONFIG_ARCH_WEAK_RELEASE_ACQUIRE), wasn't it? I think
>> > Michael saw quite a bit of variety in the impact on benchmarks [2] across
>> > different machines. So the question is whether newer Power machines are less
>> > affected to the degree that we could consider making this change again.
>> 
>> Last I knew, on Power a pair of critical sections for a given lock could
>> be observed out of order (writes from the earlier critical section vs.
>> reads from the later critical section), but only by CPUs not holding
>> that lock.  Also last I knew, tightening this would require upgrading
>> some of the locking primitives' lwsync instructions to sync instructions.
>> But I know very little about Power 10.
>
> Yup, that's the one. This is the primary reason why we have the confusing
> "RCtso" model today so this is my periodic "Do we still need this?" poking
> for the Power folks :)
>
> If the SYNC is a disaster for Power, then I'll ask again in another ~3 years
> time in the hope that newer micro-architectures can swallow the instruction
> more easily, but the results last time weren't hugely compelling and so _if_
> there's an opportunity to make locking more "obvious" then I'm all for it.

I haven't had time to do the full set of numbers like I did last time,
but a quick test shows it's still about a 20-25% drop when switching to
sync.

So on that basis we'd definitely rather not :)

I'll try to get some more numbers next week.

cheers
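
For reference, the W -> R case discussed above (an observer that does not
hold the lock, but uses memory barriers) can be sketched as a
store-buffering litmus test for herd7. This is only an illustrative
sketch: the test name, the variables x and y, the lock s and the register
names are made up for the example and do not come from the thread or from
the kernel tree. The "exists" clause describes the outcome in which P1
sees the two critical sections out of order. Whether a given version of
the memory model allows it is best checked by running the test with
herd7 rather than taken from this sketch.

```c
C SB+unlocklock+mb-sketch

(*
 * Hypothetical sketch, not an in-tree litmus test.
 *
 * P0 writes x in one critical section and reads y in the next
 * critical section for the same lock. P1 never takes the lock;
 * it relies on smp_mb() only.
 *)

{}

P0(spinlock_t *s, int *x, int *y)
{
	int r0;

	spin_lock(s);
	WRITE_ONCE(*x, 1);	/* write in the earlier critical section */
	spin_unlock(s);
	spin_lock(s);
	r0 = READ_ONCE(*y);	/* read in the later critical section */
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();		/* full barrier on the lockless observer */
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
```

If unlock+lock were a full barrier (for example with sync instead of
lwsync on Power, which is the change benchmarked above), the "exists"
outcome would be expected to be forbidden.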