On Fri, 2014-01-31 at 16:09 -0500, Waiman Long wrote:
> On 01/31/2014 03:14 PM, Peter Zijlstra wrote:
> > On Fri, Jan 31, 2014 at 01:59:02PM -0500, Waiman Long wrote:
> >> On 01/31/2014 04:26 AM, Peter Zijlstra wrote:
> >>> On Thu, Jan 30, 2014 at 04:17:15PM +0100, Peter Zijlstra wrote:
> >>>> The below is still small and actually works.
> >>> OK, so having actually worked through the thing; I realized we can
> >>> actually do a version without MCS lock and instead use a ticket lock for
> >>> the waitqueue.
> >>>
> >>> This is both smaller (back to 8 bytes for the rwlock_t), and should be
> >>> faster under moderate contention for not having to touch extra
> >>> cachelines.
> >>>
> >>> Completely untested and with a rather crude generic ticket lock
> >>> implementation to illustrate the concept:
> >>>
> >> Using a ticket lock instead will have the same scalability problem as the
> >> ticket spinlock as all the waiting threads will spin on the lock cacheline
> >> causing a lot of cache bouncing traffic.
> > A much more important point for me is that a fair rwlock has a _much_
> > better worst case behaviour than the current mess. That's the reason I
> > was interested in the qrwlock thing. Not because it can run contended on
> > a 128 CPU system and be faster at being contended.
> >
> > If you contend a lock with 128 CPUs you need to go fix that code that
> > causes this abysmal behaviour in the first place.
> >

But the kernel should also be prepared for such situations, whenever
possible.

> >
>
> I am not against the use of ticket spinlock as the queuing mechanism on
> small systems. I do have concern about the contended performance on
> large NUMA systems which is my primary job responsibility. Depending on
> the workload, contention can happen anywhere. So it is easier said than
> done to fix whatever lock contention that may happen.
>
> How about making the selection of MCS or ticket queuing either user
> configurable or depending on the setting of NR_CPUS, NUMA, etc?

Users have no business making these decisions and being exposed to these
kinds of internals. CONFIG_NUMA sounds reasonable to me.

Thanks,
Davidlohr
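
For readers following the thread, here is a minimal user-space sketch of
the ticket lock idea under discussion, written with C11 atomics. It is
not the code from Peter's patch (which is not quoted above) and not the
kernel's implementation; the names ticket_lock_t, ticket_lock_init,
ticket_lock and ticket_unlock are made up for illustration. It shows both
sides of the argument: tickets are served in FIFO order, which gives the
fairness Peter wants, while every waiter polls the same `owner` word,
which is the cacheline traffic Waiman is worried about.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space ticket lock, for illustration only. */
typedef struct {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket; waiters are served strictly in arrival order. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* All waiters spin on the same word, so each unlock invalidates
     * the cacheline on every spinning CPU. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_unlock(ticket_lock_t *l)
{
    /* The lock is held only if at least one ticket is outstanding. */
    assert(atomic_load(&l->owner) != atomic_load(&l->next));

    /* Pass the lock to the holder of the next ticket. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

An MCS-style queue differs in that each waiter spins on its own node, at
the cost of an extra cacheline per waiter, which is the trade-off the
CONFIG_NUMA suggestion is meant to settle.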