On Sat, 2012-10-13 at 02:51 -0700, David Rientjes wrote:
> On Thu, 11 Oct 2012, Andi Kleen wrote:
>
> > When did you last test? Our regressions had disappeared a few
> > kernels ago.
>
> This was in August when preparing for LinuxCon. I tested netperf TCP_RR
> on two 64GB machines (one client, one server), four nodes each, with
> thread counts in multiples of the number of cores. SLUB does a
> comparable job, but once the number of threads reaches three times the
> number of cores, it degrades almost linearly. I'll run it again next
> week and get some numbers on 3.6.

In the latest kernels, skb->head no longer uses kmalloc()/kfree(), so
SLAB vs SLUB is less of a concern for network loads. In 3.7
(commit 69b08f62e17) we use fragments of order-3 pages to populate
skb->head (the idea is sketched at the end of this mail).

SLUB was really bad for the common workload you describe (allocations
done by one cpu, freeing done by other cpus), because every kfree() hits
the slow path and cpus contend in __slab_free(), in the loop guarded by
cmpxchg_double_slab(). SLAB has a cache for this case, while SLUB goes
straight to the main "struct page" to add the freed object to the
freelist.

Some months ago I played with adding a percpu associative cache to SLUB,
then just moved on to another strategy. (The idea for this per-cpu cache
was to build a temporary free list of objects to batch accesses to
struct page; both the contention and the batching are sketched below.)
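Here is a rough userspace sketch of the page-fragment idea behind commit
69b08f62e17. This is not the kernel code; the names (frag_pool,
frag_alloc, frag_put) and the details are just illustrative. Small
buffers ("skb heads") are carved sequentially out of a 32KB block (the
size of an order-3 page group with 4KB pages), and the block is freed
only once it has been retired and its last fragment was released:

/*
 * Userspace illustration only; not the kernel code from 69b08f62e17.
 * Heads are bump-allocated from a 32KB block; the block is freed when
 * it is retired and no carved fragment is still alive.
 */
#include <stdlib.h>

#define POOL_SIZE (32 * 1024)

struct frag_pool {
    char  *base;     /* the 32KB block */
    size_t offset;   /* bump pointer for the next fragment */
    int    users;    /* fragments still alive */
    int    retired;  /* no further allocations from this block */
};

static struct frag_pool *pool_new(void)
{
    struct frag_pool *p = calloc(1, sizeof(*p));

    p->base = malloc(POOL_SIZE);
    return p;
}

/* Free the block once it is retired and no fragment references it. */
static void pool_put(struct frag_pool *p)
{
    if (p->retired && p->users == 0) {
        free(p->base);
        free(p);
    }
}

/* Carve 'len' bytes from the current block, retiring it when full. */
static void *frag_alloc(struct frag_pool **current, size_t len)
{
    struct frag_pool *p = *current;
    void *frag;

    if (p->offset + len > POOL_SIZE) {
        p->retired = 1;
        pool_put(p);               /* freed now only if nothing is live */
        p = *current = pool_new();
    }
    frag = p->base + p->offset;
    p->offset += len;
    p->users++;
    return frag;
}

/* Release one fragment; the last release of a retired block frees it. */
static void frag_put(struct frag_pool *p)
{
    p->users--;
    pool_put(p);
}

int main(void)
{
    struct frag_pool *pool = pool_new();

    for (int i = 0; i < 20; i++) {
        void *head = frag_alloc(&pool, 2048);  /* a ~2KB "skb head" */
        struct frag_pool *owner = pool;        /* block it was carved from */

        (void)head;
        frag_put(owner);                       /* released right away here */
    }
    pool->retired = 1;
    pool_put(pool);
    return 0;
}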
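And a small pthread/C11 sketch of why the remote frees hurt SLUB. The
shared 'freelist' below only plays the role of page->freelist, and the
compare-exchange retry loop mirrors the loop guarded by
cmpxchg_double_slab() in __slab_free(); nothing here is SLUB code. One
thread allocates everything and the other "cpus" only free, so they all
serialize on the same cacheline. Build with roughly: cc -std=c11 -pthread.

/*
 * Userspace illustration of the remote-free contention: every free
 * chains the object onto one shared list head with a cmpxchg retry loop.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define NCPUS 4
#define NOBJS 100000

struct obj {
    struct obj *next;
    char payload[56];
};

static _Atomic(struct obj *) freelist;    /* stand-in for page->freelist */

/* "Remote" free: push onto the shared list, retrying on contention. */
static void remote_free(struct obj *o)
{
    struct obj *head = atomic_load(&freelist);

    do {
        o->next = head;    /* link in front of the current head */
    } while (!atomic_compare_exchange_weak(&freelist, &head, o));
}

static void *freeing_cpu(void *arg)
{
    struct obj **objs = arg;

    /* this "cpu" only frees objects allocated by somebody else */
    for (int i = 0; i < NOBJS; i++)
        remote_free(objs[i]);
    return NULL;
}

int main(void)
{
    pthread_t tid[NCPUS];
    struct obj **per_cpu[NCPUS];

    /* the "allocating cpu" (main) allocates everything up front */
    for (int c = 0; c < NCPUS; c++) {
        per_cpu[c] = malloc(NOBJS * sizeof(*per_cpu[c]));
        for (int i = 0; i < NOBJS; i++)
            per_cpu[c][i] = malloc(sizeof(struct obj));
    }

    /* all freeing cpus now hammer the single shared freelist head */
    for (int c = 0; c < NCPUS; c++)
        pthread_create(&tid[c], NULL, freeing_cpu, per_cpu[c]);
    for (int c = 0; c < NCPUS; c++)
        pthread_join(tid[c], NULL);

    puts("all objects pushed to the shared freelist");
    return 0;   /* leaked memory is reclaimed at exit; it is only a sketch */
}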
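Finally, a sketch of the batching idea, not the actual percpu cache I
experimented with: each "cpu" parks freed objects in a small private
cache and touches the shared freelist only once per BATCH objects, so
the contended compare-exchange happens roughly BATCH times less often.
It reuses struct obj and 'freelist' from the sketch above; free_cache,
flush_batch and batched_free are made-up names. Dropping batched_free()
in place of remote_free() in the freeing threads shows the difference.

#define BATCH 16

struct free_cache {
    struct obj *slots[BATCH];
    int count;
};

static _Thread_local struct free_cache local_cache;   /* one per "cpu" */

/* Push the whole private batch onto the shared list with one cmpxchg loop. */
static void flush_batch(struct free_cache *fc)
{
    struct obj *first, *last, *head;

    if (fc->count == 0)
        return;

    /* chain the batch together first, away from the shared cacheline */
    for (int i = 0; i < fc->count - 1; i++)
        fc->slots[i]->next = fc->slots[i + 1];

    first = fc->slots[0];
    last = fc->slots[fc->count - 1];
    head = atomic_load(&freelist);
    do {
        last->next = head;
    } while (!atomic_compare_exchange_weak(&freelist, &head, first));

    fc->count = 0;
}

/* Batched remote free: defer the shared-list update until the cache fills. */
static void batched_free(struct obj *o)
{
    struct free_cache *fc = &local_cache;

    fc->slots[fc->count++] = o;
    if (fc->count == BATCH)
        flush_batch(fc);
    /* a real version would also flush when the thread ("cpu") goes idle */
}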