On 02/14/2019 05:37 AM, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote: >> v4: >> - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c. >> >> v3: >> - Optimize __down_read_trylock() for the uncontended case as suggested >> by Linus. >> >> v2: >> - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ. >> - Update performance test data in patch 1. >> >> The goal of this patchset is to remove the architecture specific files >> for rwsem-xadd to make it easer to add enhancements in the later rwsem >> patches. It also removes the legacy rwsem-spinlock.c file and make all >> the architectures use one single implementation of rwsem - rwsem-xadd.c. >> >> Waiman Long (3): >> locking/rwsem: Remove arch specific rwsem files >> locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all >> archs >> locking/rwsem: Optimize down_read_trylock() > Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> > > with the caveat that I'm happy to exchange patch 3 back to my earlier > suggestion in case Will expesses concerns wrt the ARM64 performance of > Linus' suggestion. I inserted a few lock event counters into the rwsem trylock code: static inline int __down_read_trylock(struct rw_semaphore *sem) { /* * Optimize for the case when the rwsem is not locked at all. */ long tmp = RWSEM_UNLOCKED_VALUE; lockevent_inc(rwsem_rtrylock); do { if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, tmp + RWSEM_ACTIVE_READ_BIAS)) { rwsem_set_reader_owned(sem); return 1; } lockevent_inc(rwsem_rtrylock_retry); } while (tmp >= 0); lockevent_inc(rwsem_rtrylock_fail); return 0; } static inline int __down_write_trylock(struct rw_semaphore *sem) { long tmp; lockevent_inc(rwsem_wtrylock); tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS); if (tmp == RWSEM_UNLOCKED_VALUE) { rwsem_set_owner(sem); return true; } lockevent_inc(rwsem_wtrylock_fail); return false; } I booted the new kernel on a 4-socket 56-core 112-thread Broadwell system. The counter values 1) After bootup: rwsem_rtrylock=784029 rwsem_rtrylock_fail=59 rwsem_rtrylock_retry=394 rwsem_wtrylock=18284 rwsem_wtrylock_fail=230 2) After parallel kernel build (-j112): rwsem_rtrylock=338667559 rwsem_rtrylock_fail=18 rwsem_rtrylock_retry=51 rwsem_wtrylock=17016332 rwsem_wtrylock_fail=98058 At least for these two use cases, try-for-ownership as suggested by Linus is the right choice. Cheers, Longman