+Cc linux-fsdevel

On Thu, Jun 01, 2017 at 11:07:19AM -0500, Christoph Lameter wrote:
> On Wed, 31 May 2017, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
>
> > +	struct buffer_head *evictee = bh;
> > +	struct bh_lru *b;
> > +	int i;
>
> > +	b = this_cpu_ptr(&bh_lrus);
> > +	for (i = 0; i < BH_LRU_SIZE; i++) {
> > +		swap(evictee, b->bhs[i]);
>
> Could you try to use this_cpu_xchg here to see if it reduces latency
> further?
>
> for (i = 0; i < BH_LRU_SIZE; i++) {
> 	__this_cpu_xchg(bh_lrus->bhs[i], evictee)
>
> ...
>

I tried --- actually, 'evictee = __this_cpu_xchg(bh_lrus.bhs[i], evictee)'.
But it's much slower, nearly as slow as the original --- which perhaps is not
surprising since __this_cpu_xchg() is a cmpxchg rather than a simple load and
store. It may be even worse on non-x86 architectures.

Also note that we still have to disable IRQs because we need to stay on the
same CPU throughout so that only a single queue is operated on.

Eric
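
For readers following along, here is a minimal userspace sketch of the
swap-based insertion discussed above. It is not the kernel code: struct lru,
lru_install(), swap_ptr() and LRU_SIZE are hypothetical stand-ins for the
per-CPU bh_lru, its install path, the kernel's swap() macro and BH_LRU_SIZE.
The early exit when the item is already cached is an assumption, since the
quoted hunk does not show that part. There is no per-CPU access or IRQ
handling here, so it only models the plain load-and-store shuffle that
__this_cpu_xchg() was compared against.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 4	/* hypothetical; stands in for BH_LRU_SIZE */

struct lru {
	const void *slots[LRU_SIZE];	/* slots[0] is the most recently used */
};

/* Pointer-only stand-in for the kernel's swap() macro. */
static void swap_ptr(const void **a, const void **b)
{
	const void *tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Put item at the front of the LRU, pushing every other entry down one slot.
 * Returns the entry that fell off the tail, or NULL if nothing was evicted
 * (a free slot absorbed the shift, or item was already present).
 */
static const void *lru_install(struct lru *l, const void *item)
{
	const void *evictee = item;
	size_t i;

	assert(item != NULL);	/* NULL marks an empty slot */

	for (i = 0; i < LRU_SIZE; i++) {
		/* One load and one store per slot; no atomic exchange. */
		swap_ptr(&evictee, &l->slots[i]);
		if (evictee == item)	/* assumed early exit: item was cached */
			return NULL;
	}
	return evictee;
}
```

In this sketch, each pass through the loop carries the displaced entry down
one slot, so a single walk over the array both inserts the new entry and
finds the one to drop.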