On Mon, 1 Aug 2011, Pekka Enberg wrote:

> Looking at the data (in slightly reorganized form):
>
> alloc
> =====
>
> 16 threads:
>
> cache         alloc_fastpath     alloc_slowpath
> kmalloc-256   4263275  (91.1%)   417445    (8.9%)
> kmalloc-1024  4636360  (99.1%)   42091     (0.9%)
> kmalloc-4096  2570312  (54.4%)   2155946  (45.6%)
>
> 160 threads:
>
> cache         alloc_fastpath     alloc_slowpath
> kmalloc-256   10937512 (62.8%)   6490753  (37.2%)
> kmalloc-1024  17121172 (98.3%)   303547    (1.7%)
> kmalloc-4096  5526281  (31.7%)   11910454 (68.3%)
>
> free
> ====
>
> 16 threads:
>
> cache         free_fastpath      free_slowpath
> kmalloc-256   210115    (4.5%)   4470604  (95.5%)
> kmalloc-1024  3579699  (76.5%)   1098764  (23.5%)
> kmalloc-4096  67616     (1.4%)   4658678  (98.6%)
>
> 160 threads:
>
> cache         free_fastpath      free_slowpath
> kmalloc-256   15469     (0.1%)   17412798 (99.9%)
> kmalloc-1024  11604742 (66.6%)   5819973  (33.4%)
> kmalloc-4096  14848     (0.1%)   17421902 (99.9%)
>
> it's pretty sad to see how SLUB alloc fastpath utilization drops so
> dramatically. Free fastpath utilization isn't all that great with 160
> threads either but it seems to me that most of the performance
> regression compared to SLAB still comes from the alloc paths.

It's the opposite: the cumulative effect of the free slowpath is more
costly in terms of latency than the alloc slowpath because it occurs at
a greater frequency.  The pattern that I described as "slab thrashing"
before causes a single free to a full slab, the list manipulation needed
to get that slab back on the partial list, and then the alloc slowpath
grabbing it for only a single allocation, so the next alloc requires yet
another partial slab.
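
To make that sequence concrete, below is a minimal userspace sketch of
the thrashing pattern.  It is not SLUB code; the names (toy_cache,
toy_slab, toy_alloc, toy_free) are made up, and it assumes a single cpu
slab, a single partial list, and slabs that start out full so the effect
is easy to trigger.  Every free to a full slab that is not the cpu slab
takes the slowpath and pushes that slab onto the partial list, and every
subsequent alloc takes the slowpath again because the slab it pulls off
the partial list holds only the one object that was just freed:

#include <stdio.h>

struct toy_slab {
        int free_objects;               /* objects still available in this slab */
        struct toy_slab *next;          /* link on the partial list */
};

struct toy_cache {
        struct toy_slab *cpu_slab;      /* fastpath allocates from here */
        struct toy_slab *partial;       /* slowpath refills from here */
        unsigned long alloc_slowpath;
        unsigned long free_slowpath;
};

/*
 * Allocate one object.  In this toy an "object" is just a count in its
 * slab, so the function returns the slab it allocated from.
 */
static struct toy_slab *toy_alloc(struct toy_cache *c)
{
        if (!c->cpu_slab || !c->cpu_slab->free_objects) {
                /* slowpath: refill the cpu slab from the partial list */
                c->alloc_slowpath++;
                c->cpu_slab = c->partial;
                if (!c->cpu_slab)
                        return NULL;
                c->partial = c->cpu_slab->next;
        }
        c->cpu_slab->free_objects--;
        return c->cpu_slab;
}

/* Free one object back to the slab it came from. */
static void toy_free(struct toy_cache *c, struct toy_slab *slab)
{
        int was_full = !slab->free_objects;

        slab->free_objects++;
        if (slab == c->cpu_slab)
                return;                 /* fastpath: free to the cpu slab */

        /* slowpath: a previously full slab goes back on the partial list */
        c->free_slowpath++;
        if (was_full) {
                slab->next = c->partial;
                c->partial = slab;
        }
}

int main(void)
{
        struct toy_cache cache = { 0 };
        struct toy_slab slabs[4] = { 0 };       /* four slabs, all full */
        int i;

        /* One free per full slab: each one takes the free slowpath. */
        for (i = 0; i < 4; i++)
                toy_free(&cache, &slabs[i]);

        /*
         * Each alloc takes the slowpath too: the slab pulled off the
         * partial list has only the single object that was just freed,
         * so the next alloc needs yet another partial slab.
         */
        for (i = 0; i < 4; i++)
                toy_alloc(&cache);

        printf("alloc_slowpath=%lu free_slowpath=%lu\n",
               cache.alloc_slowpath, cache.free_slowpath);
        return 0;
}

Under those toy assumptions this prints alloc_slowpath=4 free_slowpath=4,
i.e. every operation on both sides misses the fastpath, which is the
shape the 160-thread kmalloc-256 and kmalloc-4096 numbers above take.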