On Mon, Feb 11, 2019 at 10:40:44AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 10:36:01AM +0100, Peter Zijlstra wrote:
> > On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> > > +static inline int __down_read_trylock(struct rw_semaphore *sem)
> > > +{
> > > +	long tmp;
> > > +
> > > +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> > > +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> > > +				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
> > > +			return 1;
> >
> > That really wants to be:
> >
> > 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> > 					    tmp + RWSEM_ACTIVE_READ_BIAS))
> >
> > > +		}
> > > +	}
> > > +	return 0;
> > > +}
>
> Also, this is the one case where LL/SC can actually do 'better'. Do you
> have benchmarks for say PowerPC or ARM64 ?

Ah, I see they already used asm-generic/rwsem.h which has similar code
to the above.
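
For anyone following along, the loop shape suggested above can be sketched
in userspace with C11 <stdatomic.h> instead of the kernel's atomic_long_*
API. This is only an illustration, not the patch itself: demo_sem,
demo_read_trylock() and DEMO_READ_BIAS are made-up names, and treating a
negative count as "readers must not enter" is an assumption taken from the
">= 0" test in the quoted code. The point it shows is that a try_cmpxchg
style primitive writes the observed value back into tmp on failure, so the
loop does not need to re-read the counter explicitly.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for RWSEM_ACTIVE_READ_BIAS. */
#define DEMO_READ_BIAS 1L

/* Hypothetical, minimal stand-in for struct rw_semaphore. */
struct demo_sem {
	atomic_long count;
};

/*
 * Try to take the lock for reading without blocking.
 * Returns 1 on success, 0 if the count is negative
 * (assumed here to mean readers must not enter).
 */
static inline int demo_read_trylock(struct demo_sem *sem)
{
	long tmp = atomic_load_explicit(&sem->count, memory_order_relaxed);

	while (tmp >= 0) {
		/*
		 * On failure, the compare-exchange stores the value it
		 * actually observed into tmp, so the loop condition is
		 * re-checked against fresh data without another load.
		 */
		if (atomic_compare_exchange_strong_explicit(&sem->count, &tmp,
							    tmp + DEMO_READ_BIAS,
							    memory_order_acquire,
							    memory_order_relaxed))
			return 1;
	}
	return 0;
}
```

With the count at 0, demo_read_trylock() returns 1 and leaves the count at
DEMO_READ_BIAS; with a negative count it returns 0 and does not modify it.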