On Mon, Feb 11, 2019 at 10:36:01AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> > +static inline int __down_read_trylock(struct rw_semaphore *sem)
> > +{
> > +	long tmp;
> > +
> > +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> > +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> > +				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
> > +			return 1;
>
> That really wants to be:
>
> 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> 						    tmp + RWSEM_ACTIVE_READ_BIAS))
>
> > +		}
> > +	}
> > +	return 0;
> > +}

Also, this is the one case where LL/SC can actually do 'better'. Do you
have benchmarks for, say, PowerPC or ARM64?
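
For illustration, here is a minimal userspace sketch of the try_cmpxchg-style
loop suggested above, using C11 atomics rather than the kernel's
atomic_long_*() API. The names read_trylock_sketch() and READ_BIAS are
hypothetical stand-ins (READ_BIAS for RWSEM_ACTIVE_READ_BIAS, with an assumed
value of 1), and a negative count is assumed to mean a writer holds the lock.
Like atomic_long_try_cmpxchg_acquire(), the C11 compare-exchange writes the
observed value back into tmp on failure, so the loop needs no separate re-read:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for RWSEM_ACTIVE_READ_BIAS; value assumed for the sketch. */
#define READ_BIAS 1L

/*
 * Userspace analogue of the suggested loop, not the kernel implementation.
 * Assumption: a negative count means a writer is present.
 */
static bool read_trylock_sketch(atomic_long *count)
{
	long tmp = atomic_load_explicit(count, memory_order_relaxed);

	while (tmp >= 0) {
		/*
		 * On failure, tmp is updated to the value currently in *count,
		 * so the next iteration tests fresh data without another load.
		 */
		if (atomic_compare_exchange_strong_explicit(count, &tmp,
							    tmp + READ_BIAS,
							    memory_order_acquire,
							    memory_order_relaxed))
			return true;
	}
	return false;
}
```

The load is issued once before the loop; every retry reuses the value the
failed compare-exchange already returned.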