On (03/30/15 09:01), Sowmini Varadhan wrote:
>
> So I tried looking at the code, and perhaps there is some arch-specific
> subtlety here that I am missing, but where does spin_lock itself
> do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this.

To answer my own question: I had missed CONFIG_LOCK_STAT (which David Ahern pointed out to me). The above is only true for the LOCK_STAT case.

In any case, I ran some experiments today. I was running iperf [http://en.wikipedia.org/wiki/Iperf] over ixgbe, which is where I had noticed the original perf issues for sparc. I ran iperf2 (which is more aggressively threaded than iperf3) with 8, 10, 16, and 20 threads, with TSO turned off. In each case, I made sure that I was able to reach 9.X Gbps (this is a 10 Gbps link).

I don't see any significant difference in the perf profile between the spin_trylock and the spin_lock versions (other than, of course, the change in lock contention for the trylock version). I also looked at the cache-misses profiled by perf: they work out to about 1400M for 10 threads, with or without the trylock.

I'm still waiting for some of the IB folks to try out the spin_lock version. They had also seen significant perf improvements from breaking the monolithic lock down into multiple pools, so their workload is also sensitive to this.

So as it stands, it looks like it doesn't matter much whether you use the trylock to find the first available pool or block on the spin_lock. I'll let folks on this list vote on this one (assuming the IB tests also come out without significant variation between the two locking choices).

--Sowmini
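To make the two locking choices being compared concrete, here is a minimal user-space sketch. It is not the actual kernel patch: it assumes a hypothetical fixed array of pools, each guarded by its own pthread mutex standing in for a kernel spinlock, and the names (NPOOLS, struct pool, lock_pool_blocking, lock_pool_trylock, alloc_entry) are all hypothetical. The fallback in the trylock variant (block on the starting pool when every pool is busy) is also an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NPOOLS 4 /* hypothetical number of pools */

/* Hypothetical pool: one lock plus a per-pool allocation cursor. */
struct pool {
    pthread_mutex_t lock;
    unsigned long next;
};

static struct pool pools[NPOOLS];

static void pools_init(void)
{
    for (int i = 0; i < NPOOLS; i++) {
        int rc = pthread_mutex_init(&pools[i].lock, NULL);
        assert(rc == 0);
        (void)rc;
        pools[i].next = 0;
    }
}

/* Choice 1: pick the pool by hash and block until its lock is free. */
static struct pool *lock_pool_blocking(unsigned int hash)
{
    struct pool *p = &pools[hash % NPOOLS];
    pthread_mutex_lock(&p->lock);
    return p;
}

/*
 * Choice 2: starting at the hashed pool, take the first pool whose lock
 * is currently free. If every pool is busy, fall back to blocking on the
 * starting pool (an assumed fallback, for illustration only).
 */
static struct pool *lock_pool_trylock(unsigned int hash)
{
    unsigned int start = hash % NPOOLS;

    for (unsigned int i = 0; i < NPOOLS; i++) {
        struct pool *p = &pools[(start + i) % NPOOLS];
        if (pthread_mutex_trylock(&p->lock) == 0)
            return p;
    }
    return lock_pool_blocking(start);
}

/* Hand out the next entry from a locked pool, then release its lock. */
static unsigned long alloc_entry(struct pool *p)
{
    unsigned long e = p->next++;
    pthread_mutex_unlock(&p->lock);
    return e;
}
```

The trade-off the measurements above probe is visible here: the trylock variant spreads contending threads across pools instead of making them wait on one lock, while the blocking variant keeps each caller on its hashed pool. Whether that difference matters in practice depends on the workload, which is what the iperf and IB runs are meant to show.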