Will Deacon <will@xxxxxxxxxx> writes:
> On Thu, Sep 09, 2021 at 10:46:35AM -0700, Paul E. McKenney wrote:
>> On Thu, Sep 09, 2021 at 02:35:36PM +0100, Will Deacon wrote:
>> > On Thu, Sep 09, 2021 at 09:25:30AM +0200, Peter Zijlstra wrote:
>> > > On Wed, Sep 08, 2021 at 09:08:33AM -0700, Linus Torvalds wrote:
>> > > > then I think it's entirely reasonable to
>> > > >
>> > > >         spin_unlock(&r);
>> > > >         spin_lock(&s);
>> > > >
>> > > > cannot be reordered.
>> > >
>> > > I'm obviously completely in favour of that :-)
>> >
>> > I don't think we should require the accesses to the actual lockwords to
>> > be ordered here, as it becomes pretty onerous for relaxed LL/SC
>> > architectures where you'd end up with an extra barrier either after the
>> > unlock() or before the lock() operation. However, I remain absolutely in
>> > favour of strengthening the ordering of the _critical sections_ guarded by
>> > the locks to be RCsc.
>>
>> If by this you mean the critical sections when observed only by other
>> critical sections for a given lock, then everyone is already there.
>
> No, I mean the case where somebody without the lock (but using memory
> barriers) can observe the critical sections out of order (i.e. W -> R
> order is not maintained).
>
>> However...
>>
>> > Last time this came up, I think the RISC-V folks were generally happy to
>> > implement whatever was necessary for Linux [1]. The thing that was stopping
>> > us was Power (see CONFIG_ARCH_WEAK_RELEASE_ACQUIRE), wasn't it? I think
>> > Michael saw quite a bit of variety in the impact on benchmarks [2] across
>> > different machines. So the question is whether newer Power machines are less
>> > affected to the degree that we could consider making this change again.
>>
>> Last I knew, on Power a pair of critical sections for a given lock could
>> be observed out of order (writes from the earlier critical section vs.
>> reads from the later critical section), but only by CPUs not holding
>> that lock.
>> Also last I knew, tightening this would require upgrading
>> some of the locking primitives' lwsync instructions to sync instructions.
>> But I know very little about Power 10.
>
> Yup, that's the one. This is the primary reason why we have the confusing
> "RCtso" model today so this is my periodic "Do we still need this?" poking
> for the Power folks :)
>
> If the SYNC is a disaster for Power, then I'll ask again in another ~3 years
> time in the hope that newer micro-architectures can swallow the instruction
> more easily, but the results last time weren't hugely compelling and so _if_
> there's an opportunity to make locking more "obvious" then I'm all for it.

I haven't had time to do the full set of numbers like I did last time,
but a quick test shows it's still about a 20-25% drop switching to sync.

So on that basis we'd definitely rather not :)

I'll try and get some more numbers next week.

cheers
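
For reference, the W -> R case discussed above can be written down as a
litmus test. This is only a sketch in the herd7 format used under
tools/memory-model; the test name and the variables x and y are
illustrative and do not come from the thread, and it has not been run
through herd7 here. P0 writes x inside the critical section for r and
reads y inside the critical section for s. P1 holds neither lock and
uses smp_mb(). Going by the description above, the "exists" outcome is
the one the current RCtso rules are understood to permit and an RCsc
strengthening would forbid.

```
C unlock-lock-write-read-sketch

(*
 * Illustrative only: can an observer that holds neither lock see the
 * write in the first critical section reordered after the read in the
 * second one?
 *)

{}

P0(spinlock_t *r, spinlock_t *s, int *x, int *y)
{
	int r0;

	spin_lock(r);
	WRITE_ONCE(*x, 1);	/* write in the earlier critical section */
	spin_unlock(r);
	spin_lock(s);
	r0 = READ_ONCE(*y);	/* read in the later critical section */
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();		/* observer uses a full barrier, no lock */
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
```

On Power, forbidding that outcome is what would need the lwsync -> sync
upgrade in the locking primitives mentioned above.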