On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote:
> Peter,
>
> I was trying to implement the generic queue code exchange code using
> cmpxchg as suggested by you. However, when I gathered the performance
> data, the code performed worse than I expected at a higher contention
> level. Below were the execution time of the benchmark tool that I sent
> you:
>
>                            [xchg]     [cmpxchg]
> # of tasks  Ticket lock  Queue lock   Queue Lock
> ----------  -----------  -----------  ----------
>     1            135          135         135
>     2            732         1315        1102
>     3           1827         2372        2681
>     4           2689         2934        5392
>     5           3736         3658        7696
>     6           4942         4434        9876
>     7           6304         5176       11901
>     8           7736         5955       14551
>

I'm just not seeing that; with test-4 modified to take the AMD compute
units into account:

  root@interlagos:~/spinlocks# LOCK=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh
  4: 50783.509653
  8: 146295.875715
  16: 332942.964709
  4: 51033.341441
  8: 146320.656285
  16: 332586.355194

And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg
loops with unconditional ops (xchg8 and xchg16).

And I'd think that 4 CPUs x 4 Nodes would be heavy contention.

I'll have another poke tomorrow; including verifying asm tomorrow, need
to go sleep now.
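
For readers following along, here is a rough userspace sketch of the two styles being compared: a cmpxchg retry loop versus a single unconditional exchange. It is not the qspinlock patch code. The 16-bit `tail_t` field and the `tail_swap_*` names are made up for illustration, and C11 atomics stand in for the kernel's own primitives. On x86 the first variant would typically compile to a `lock cmpxchg` loop and the second to one `xchg`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-bit "tail" field; NOT the real qspinlock layout. */
typedef _Atomic uint16_t tail_t;

/* Variant A: publish new_tail with a compare-and-swap retry loop. */
static uint16_t tail_swap_cmpxchg(tail_t *tail, uint16_t new_tail)
{
    uint16_t old = atomic_load_explicit(tail, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(tail, &old, new_tail,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        ;
    return old;
}

/* Variant B: one unconditional exchange, which cannot fail or retry. */
static uint16_t tail_swap_xchg(tail_t *tail, uint16_t new_tail)
{
    return atomic_exchange_explicit(tail, new_tail, memory_order_acq_rel);
}

/* Single-threaded sanity check: both variants return the previous value. */
static void tail_swap_selfcheck(void)
{
    tail_t t;

    atomic_init(&t, 1);
    assert(tail_swap_cmpxchg(&t, 2) == 1);
    assert(tail_swap_xchg(&t, 3) == 2);
    assert(atomic_load(&t) == 3);
}
```

Both functions have the same result when uncontended. The difference only shows up under contention, where variant A may spin on failed compare-and-swap attempts while variant B always completes in one atomic operation.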