On Fri, Feb 15, 2019 at 01:58:34PM -0500, Waiman Long wrote:
> On 02/15/2019 01:40 PM, Will Deacon wrote:
> > On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote:
> >> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
> >>> v4:
> >>>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
> >>>
> >>> v3:
> >>>  - Optimize __down_read_trylock() for the uncontended case as suggested
> >>>    by Linus.
> >>>
> >>> v2:
> >>>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
> >>>  - Update performance test data in patch 1.
> >>>
> >>> The goal of this patchset is to remove the architecture specific files
> >>> for rwsem-xadd to make it easier to add enhancements in the later rwsem
> >>> patches. It also removes the legacy rwsem-spinlock.c file and makes all
> >>> the architectures use one single implementation of rwsem - rwsem-xadd.c.
> >>>
> >>> Waiman Long (3):
> >>>   locking/rwsem: Remove arch specific rwsem files
> >>>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
> >>>     archs
> >>>   locking/rwsem: Optimize down_read_trylock()
> >> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> >>
> >> with the caveat that I'm happy to exchange patch 3 back to my earlier
> >> suggestion in case Will expresses concerns wrt the ARM64 performance of
> >> Linus' suggestion.
> > Right, the current proposal doesn't work well for us, unfortunately. Which
> > was your earlier suggestion?
> >
> > Will
>
> In my posting yesterday, I showed that most of the trylocks done were
> actually uncontended. Assuming that pattern holds for most of the
> workloads, it will not be that bad after all.

That's fair enough; if you're going to sit in a tight trylock() loop like the
benchmark does, then you're much better off just calling lock() if you care
at all about scalability.

Will