Ingo Molnar wrote:
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> * WOW *
>
> WOW indeed - and i can see a similar _brutal_ speedup on two separate
> 16-way boxes as well:
>
>   16 CPUs, running 128 parallel test-tasks.
>
>   NO_OWNER_SPIN:
>   avg ops/sec:               281595
>
>   OWNER_SPIN:
>   avg ops/sec:               524791
>
> Da Killer!

This jibes with our findings back when we first looked at this
(200%-300% speedups in most benchmarks), so it is excellent that it is
yielding boosts here as well.

> Look at the performance counter stats:
>
>>    12098.324578  task clock ticks     (msecs)
>>
>>            1081  CPU migrations       (events)
>>            7102  context switches     (events)
>>            2763  pagefaults           (events)
>
>>    22280.283224  task clock ticks     (msecs)
>>
>>             117  CPU migrations       (events)
>>            5711  context switches     (events)
>>            2781  pagefaults           (events)
>
> We were able to spend twice as much CPU time and efficiently so - and we
> did about 10% of the cross-CPU migrations as before (!).
>
> My (wild) guess is that the biggest speedup factor was perhaps this little
> trick:
>
> +             if (need_resched())
> +                     break;
>
> this allows the spin-mutex to only waste CPU time if there's no work
> around on that CPU. (i.e. if there's no other task that wants to run) The
> moment there's some other task, we context-switch to it.

Well, IIUC that's only true if the other task happens to preempt
current, which may not always be the case, right? For instance, if
current still has timeslice left, etc. I think the primary difference
is actually the reduction in the ctx switch rate, but it's hard to say
without looking at detailed traces and more stats. Either way, woohoo!

-Greg
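
For anyone reading this in the archive, here is a rough user-space sketch of the spin-then-bail idea discussed above. It is not the patch under discussion. The struct spin_mutex type, the owner_running flag, resched_requested and need_resched_stub() are all made-up stand-ins (the last one for the kernel's need_resched()), and C11 atomics replace the kernel primitives. It only shows the control flow: keep spinning while the owner is running and nothing else wants this CPU, otherwise give up so the caller can sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not the kernel's struct mutex. */
struct spin_mutex {
    atomic_int  locked;         /* 0 = free, 1 = held */
    atomic_bool owner_running;  /* assumed hint: is the owner on a CPU? */
};

/* Hypothetical stand-in for need_resched(): set when some other task
 * should get this CPU. */
static atomic_bool resched_requested;

static bool need_resched_stub(void)
{
    return atomic_load(&resched_requested);
}

/* Returns true if the lock was taken while spinning.
 * Returns false if the caller should fall back to sleeping. */
static bool try_spin_acquire(struct spin_mutex *m)
{
    for (;;) {
        int expected = 0;

        /* Try to grab the lock if it is free. */
        if (atomic_compare_exchange_strong(&m->locked, &expected, 1))
            return true;

        /* Owner is not running: it cannot release soon, stop spinning. */
        if (!atomic_load(&m->owner_running))
            break;

        /* Something else wants this CPU: stop burning cycles. */
        if (need_resched_stub())
            break;
    }
    return false;
}
```

Note that in this sketch the spinner only gives up the CPU once the resched flag is actually set, which is the point raised in the reply above: a runnable task that does not trigger that flag would not stop the spin.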