On Fri, 2 Jun 2017, Eric Biggers wrote:

> I tried --- actually, 'evictee = __this_cpu_xchg(bh_lrus.bhs[i], evictee)'. But
> it's much slower, nearly as slow as the original --- which perhaps is not
> surprising since __this_cpu_xchg() is a cmpxchg rather than a simple load and
> store. It may be even worse on non-x86 architectures. Also note that we still

It's a local cmpxchg, which should only take a few cycles.

> have to disable IRQs because we need to stay on the same CPU throughout so that
> only a single queue is operated on.

Ah, OK, that would kill it.
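
For reference, the "simple load and store" variant being compared against the exchange can be sketched in userspace roughly as below. This is only a model under stated assumptions: `struct lru`, `LRU_SIZE`, and `lru_install()` are hypothetical names, not the kernel's, and there is no per-CPU data or IRQ handling here. In the kernel the whole loop would still have to run with IRQs disabled so that it stays on one CPU's queue, which is the point made in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 4   /* hypothetical size, stands in for the real queue length */

/* Userspace stand-in for a single CPU's LRU queue (hypothetical type). */
struct lru {
    const void *slot[LRU_SIZE];
};

/*
 * Put 'item' in the front slot and shift the older entries down by one.
 * Returns the entry pushed off the end, or NULL if nothing was evicted
 * (the queue had room, or 'item' was already queued and was only moved
 * to the front).
 */
static const void *lru_install(struct lru *q, const void *item)
{
    const void *evictee = item;

    assert(q != NULL && item != NULL);

    for (size_t i = 0; i < LRU_SIZE; i++) {
        /* Plain load and store: swap the carried pointer with slot i. */
        const void *tmp = q->slot[i];
        q->slot[i] = evictee;
        evictee = tmp;

        /* We reached the old copy of 'item': entries below stay put. */
        if (evictee == item)
            return NULL;
    }
    return evictee;
}
```

Each iteration is one load and one store per slot, with no atomic instruction involved. That is only safe because nothing else can touch the queue during the loop, which is what disabling IRQs provides in the per-CPU case.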