Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

On Tue, Mar 04, 2014 at 11:40:43PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote:
> > Peter,
> > 
> > I was trying to implement the generic queue code exchange code using
> > cmpxchg as suggested by you. However, when I gathered the performance
> > data, the code performed worse than I expected at a higher contention
> > level. Below are the execution times of the benchmark tool that I sent
> > you:
> 
> I'm just not seeing that; with test-4 modified to take the AMD compute
> units into account:

OK; I tried on a few larger machines and I can indeed see it there.

That said; our code doesn't differ that much. I see why you're not doing
too well on the 2 CPU contention. You've got one atomic op too many in
that path. But given you see a benefit even with 2 atomic ops (I had mixed
results on that) we can do the pending/waiter thing unconditionally for
NR_CPUS>16k.
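
To make the pending/waiter idea concrete, here is a rough userspace sketch
in C11 atomics. It is not the code from either patch series: the names
(sketch_lock, Q_LOCKED, Q_PENDING) and the bit layout are made up for
illustration, and the MCS queue for a third contender is left out, so that
case just reports failure. The second contender spends one atomic op to
claim the pending bit and a second one to turn PENDING into LOCKED.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define Q_LOCKED  0x001u   /* lock is held */
#define Q_PENDING 0x100u   /* exactly one waiter is spinning on the lock word */

struct sketch_lock {
	_Atomic uint32_t val;
};

/*
 * Returns true once the lock is held. Returns false when another waiter
 * already owns the pending bit; a real implementation would fall back to
 * the queue at that point (assumption: not shown here).
 */
static bool sketch_lock_pending(struct sketch_lock *l)
{
	uint32_t old = atomic_load_explicit(&l->val, memory_order_relaxed);
	uint32_t new;

	do {
		if (old & Q_PENDING)
			return false;	/* third contender: would have to queue */
		/* Free lock: take it. Held lock: claim the pending bit. */
		new = old ? (old | Q_PENDING) : Q_LOCKED;
	} while (!atomic_compare_exchange_weak_explicit(&l->val, &old, new,
							memory_order_acquire,
							memory_order_relaxed));

	if (new == Q_LOCKED)
		return true;		/* uncontended: one atomic op total */

	/* We own the pending bit; spin until the current holder releases. */
	while (atomic_load_explicit(&l->val, memory_order_acquire) & Q_LOCKED)
		;

	/*
	 * Second atomic op: PENDING -> LOCKED. Nobody can steal the lock in
	 * between, because every other path bails out while PENDING is set.
	 */
	atomic_fetch_sub_explicit(&l->val, Q_PENDING - Q_LOCKED,
				  memory_order_acquire);
	return true;
}

static void sketch_unlock(struct sketch_lock *l)
{
	uint32_t prev = atomic_fetch_sub_explicit(&l->val, Q_LOCKED,
						  memory_order_release);
	assert(prev & Q_LOCKED);	/* unlocking a lock nobody holds is a bug */
	(void)prev;			/* keep NDEBUG builds warning-free */
}
```

With only two CPUs contending, the waiter never touches a queue node; it
only ever writes the lock word itself. No lock steals are possible in this
sketch since a free-looking lock with PENDING set is never taken by a
newcomer.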

I also think I can do your full xchg thing without allowing lock steals.

I'll try and do a full series tomorrow that starts with simple code and
builds on that, doing each optimization one by one.
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



