On Wed, 26 Dec 2012 18:00:04 -0800 Kent Overstreet <koverstreet@xxxxxxxxxx> wrote: > So, for sticking kiocb completions on the kioctx ringbuffer, we need a > lock - it unfortunately can't be lockless. > > When the kioctx is shared between threads on different cpus and the rate > of completions is high, this lock sees quite a bit of contention - in > terms of cacheline contention it's the hottest thing in the aio > subsystem. > > That means, with a regular spinlock, we're going to take a cache miss > to grab the lock, then another cache miss when we touch the data the > lock protects - if it's on the same cacheline as the lock, other cpus > spinning on the lock are going to be pulling it out from under us as > we're using it. > > So, we use an old trick to get rid of this second forced cache miss - > make the data the lock protects be the lock itself, so we grab them both > at once. Boy I hope you got that right. Did you consider using bit_spin_lock() on the upper bit of `tail'? We've done that in other places and we at least know that it works. And it has the optimisations for CONFIG_SMP=n, understands CONFIG_DEBUG_SPINLOCK, has arch-specific optimisations, etc. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html