On Sun, 31 Jul 2011, David Rientjes wrote:

> Well, the counters variable is added although it doesn't increase the size
> of the unaligned struct page because of how it is restructured. The end
> result of the alignment for CONFIG_CMPXCHG_LOCAL is that struct page will
> increase from 56 bytes to 64 bytes on my config. That's a cost of 128MB
> on each of my client and server 64GB machines for the netperf benchmark
> for the ~2.3% speedup.

And although slub is definitely heading in the right direction regarding
the netperf benchmark, it's still a non-starter for anybody using large
NUMA machines for networking performance. On my 16-core, 4-node, 64GB
client/server machines running netperf TCP_RR with various thread counts
for 60 seconds each on 3.0:

	threads		SLUB		SLAB		diff
	 16		 76345		 74973		- 1.8%
	 32		116380		116272		- 0.1%
	 48		150509		153703		+ 2.1%
	 64		187984		189750		+ 0.9%
	 80		216853		224471		+ 3.5%
	 96		236640		249184		+ 5.3%
	112		256540		275464		+ 7.4%
	128		273027		296014		+ 8.4%
	144		281441		314791		+11.8%
	160		287225		326941		+13.8%

I'm much more inclined to use slab because its performance is so much
better for heavy networking loads, and to keep the extra 128MB on each of
these machines.

Now, if I think about this from a Google perspective: we have scheduled
jobs on shared machines with memory containment allocated in 128MB chunks
for several years. So if these numbers are representative of the
networking performance I can get on our production machines, I'm not only
far better off selecting slab for its performance, but I can also schedule
one more small job on every machine in our fleet!

Ignoring the netperf results, if you take just the alignment change to
struct page as a result of cmpxchg16b, I've lost 0.2% of memory from every
machine in our fleet by selecting slub. So if we're bound by memory, I've
just effectively removed 0.2% of the machines from our fleet. That happens
to be a large number of machines, and at a substantial cost every year.
So although I recommended the lockless changes at the memory cost of
struct page alignment to improve performance by ~2.3%, that was done with
the premise that I'm not actually going to be using them; it's more of a
recommendation for desktops and small systems, where others have shown
slub is better on benchmarks like kernbench, sysbench, aim9, and
hackbench.

[ I'd love it if we had sufficient predicates in the x86 kconfigs to
  determine what the appropriate allocator to choose would be, because
  it's obvious that slab is light years ahead of the default slub for
  us. ]

And although I've developed a mutable slab allocator, SLAM, that makes all
of this irrelevant since it's a drop-in replacement for both slab and
slub, I can't legitimately propose it for inclusion because it lacks the
debugging capabilities that slub excels in, and there's an understanding
that Linus won't merge another stand-alone allocator until one is removed.