I just spent some time looking at the functions that show up high in the list. The trouble is that I have to speculate and have nothing to verify my thoughts against. If you could give me the hitlist for each of the 3 runs then that would help me check my thinking. I could be totally off here.

It seems that we frequently miss the per cpu slab in slab_free(), which leads to a call to __slab_free(), which in turn needs to take a lock on the page (in the page struct). Typically the page lock is uncontended. That does not seem to be the case here, otherwise it would not be that high up.

The per cpu patch in mm should reduce the contention on the page struct by not touching the page struct on alloc and on free. It does not seem to work all the way though: slab_free() still has to touch the page struct if the free is not to the currently active cpu slab. So there could still be page struct contention left if multiple processors frequently and simultaneously free to the same slab and that slab is not the per cpu slab of a cpu. That could be addressed by optimizing the object free handling further so that it does not touch the page struct even if we miss the per cpu slab.

That get_partial* is far up indicates contention on the list lock. That should be addressable either by increasing the slab size or by changing the object free handling to batch in some form.

This is an SMP system, right? 2 cores with 4 cpus each? Is the main loop always hitting the same slabs? Which slabs would these be? Am I right in thinking that one process allocates objects, then lets multiple other processors do work, and then the allocated object is freed from a cpu that did not allocate it?

If neighboring objects in one slab are allocated on one cpu and then freed almost simultaneously from a set of different cpus, then this may explain the situation.
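To make the suspected pattern concrete, here is a toy userspace model of it. This is only a sketch and not kernel code: toy_slab, toy_cpu, toy_free() and toy_remote_free_scenario() are made-up names, NR_CPUS = 8 is an assumption about the machine, and a plain counter stands in for taking the page lock in __slab_free().

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8	/* assumption: 8 logical cpus */

/* Hypothetical stand-in for the page struct backing one slab. */
struct toy_slab {
	int id;
	unsigned long slow_frees;	/* frees that would have taken the page lock */
};

/* Hypothetical per cpu state: the slab this cpu currently allocates from. */
struct toy_cpu {
	struct toy_slab *active;
	unsigned long fast_frees;	/* frees that never touched the page struct */
};

static struct toy_cpu cpus[NR_CPUS];

/*
 * Models the decision described above: a free is cheap only if the object
 * belongs to the freeing cpu's active slab. Returns 1 if the slow path
 * was taken, 0 otherwise.
 */
static int toy_free(int cpu, struct toy_slab *slab)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (cpus[cpu].active == slab) {
		cpus[cpu].fast_frees++;
		return 0;
	}
	/* Slow path: the real code would lock the page here. */
	slab->slow_frees++;
	return 1;
}

/*
 * cpu 0 owns the slab; nobjs neighboring objects from it are freed
 * round robin from cpus 1..NR_CPUS-1. Returns the number of slow frees.
 */
static unsigned long toy_remote_free_scenario(struct toy_slab *slab, int nobjs)
{
	unsigned long slow = 0;
	int i;

	cpus[0].active = slab;
	for (i = 0; i < nobjs; i++)
		slow += toy_free(1 + i % (NR_CPUS - 1), slab);
	return slow;
}
```

In this model every remote free takes the slow path on the same slab, so all of them would serialize on one page lock. Only a free issued from cpu 0 stays on the fast path.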