On Thu, Feb 04, 2010 at 11:31:49AM -0800, David Daney wrote:

> The current locking mechanism uses a ll/sc sequence to release a
> spinlock.  This is slower than a wmb() followed by a store to unlock.
>
> The branching forward to .subsection 2 on sc failure slows down the
> contended case.  So we get rid of that part too.
>
> Since we are now working on naturally aligned u16 values, we can get
> rid of a masking operation as the LHU already does the right thing.
> The ANDI are reversed for better scheduling on multi-issue CPUs.
>
> On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
> forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

And in your benchmarking patch you wrote:

>                  spin_single    spin_multi
> base                  106885        247941
> spinlock_patch         75194        219465

I did some benchmarking on an IP27 (180MHz, 2 CPU, needs LL/SC
workaround):

                 spin_single    spin_multi
base                  229341       3505690
spinlock_patch        177847       3615326

So about 22% speedup for spin_single but a 3% slowdown for spin_multi.

Disabling the R10k LL/SC workaround btw. gives another 23% speedup for
spin_single and a marginal 0.3% for spin_multi; the latter may well be
statistical noise.

  Ralf