On Thu, 6 Jul 2017, Paul E. McKenney wrote:

> On Thu, Jul 06, 2017 at 06:10:47PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 06, 2017 at 08:21:10AM -0700, Paul E. McKenney wrote:
> > > And yes, there are architecture-specific optimizations for an
> > > empty spin_lock()/spin_unlock() critical section, and the current
> > > arch_spin_unlock_wait() implementations show some of these optimizations.
> > > But I expect that performance benefits would need to be demonstrated at
> > > the system level.
> >
> > I do in fact contended there are any optimizations for the exact
> > lock+unlock semantics.
>
> You lost me on this one.
>
> > The current spin_unlock_wait() is weaker. Most notably it will not (with
> > exception of ARM64/PPC for other reasons) cause waits on other CPUs.
>
> Agreed, weaker semantics allow more optimizations. So use cases needing
> only the weaker semantics should more readily show performance benefits.
> But either way, we need compelling use cases, and I do not believe that
> any of the existing spin_unlock_wait() calls are compelling. Perhaps I
> am confused, but I am not seeing it for any of them.

If somebody really wants the full spin_unlock_wait semantics and doesn't
want to interfere with other CPUs, wouldn't synchronize_sched() or
something similar do the job? It wouldn't be as efficient as
lock+unlock, but it also wouldn't affect other CPUs.

Alan Stern