On Wed, Jun 15, 2011 at 2:37 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>
> http://programming.kicks-ass.net/sekrit/39-2.txt.bz2
> http://programming.kicks-ass.net/sekrit/tip-2.txt.bz2
>
> tip+sirq+linus is still slightly faster than .39 here,

Hmm. Your profile doesn't show the mutex slowpath at all, so there's a big difference to the one Tim quoted parts of.

In fact, your profile looks fine. The load clearly spends tons of time in page faulting and in timing things (that read_hpet thing is disgusting), but with that in mind, the profile doesn't look scary. Yes, the 2% spinlock time is bad, but you've clearly not hit the real lock contention case. The mutex lock shows up, but _way_ below the spinlock, and the slowpath never shows at all. You end up having mutex_spin_on_owner at 0.09%, it's not really visible.

Clearly going from your two-socket 12-core thing to Tim's four-socket 40-core case is a big jump. But maybe it really was about RCU, and even the limited softirq patch that moves the grace period stuff etc back to softirqs ends up helping.

Tim, have you tried running your bigger load with that patch? You could try my patch on top too just to match Peter's tree, but I doubt that's the big first-order issue.

Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .