On Tue, Jan 17, 2023 at 03:54:34PM +0100, Christoph Lameter wrote:
> On Tue, 17 Jan 2023, Jesper Dangaard Brouer wrote:
>
> > When running different network performance microbenchmarks, I started
> > to notice that performance was reduced (slightly) when machines had
> > longer uptimes. I believe the cause was 'skbuff_head_cache' got
> > aliased/merged into the general slub for 256 bytes sized objects (with
> > my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> Well that is a common effect that we see in multiple subsystems. This is
> due to general memory fragmentation. Depending on the prior load the
> performance could actually be better after some runtime if the caches are
> populated avoiding the page allocator etc.

The page allocator isn't _that_ expensive.  I could see updating several
slabs being more expensive than allocating a new page.

> The merging could actually be beneficial since there may be more partial
> slabs to allocate from and thus avoiding expensive calls to the page
> allocator.

What might be more effective is allocating larger order slabs.  I see
that kmalloc-256 allocates a pair of pages and manages 32 objects within
that pair.  It should perform better in Jesper's scenario if it allocated
4 pages and managed 64 objects per slab.

Simplest way to test that should be booting a kernel with
'slub_min_order=2'.  Does that help matters at all, Jesper?  You could
also try slub_min_order=3.  Going above that starts to get a bit sketchy.
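
For anyone checking the numbers above, here is a rough sketch of the
objects-per-slab arithmetic.  It assumes 4 KiB pages and that a 256-byte
object carries no extra per-object metadata (no red zones or other debug
padding); the helper name objects_per_slab is made up for illustration
and is not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def objects_per_slab(order, object_size, page_size=PAGE_SIZE):
    """Objects that fit in a slab of 2**order pages.

    Illustrative only: ignores per-object metadata and debug padding.
    """
    slab_bytes = page_size << order  # a slab of this order spans 2**order pages
    return slab_bytes // object_size


# Order 1 is the pair of pages seen today; orders 2 and 3 correspond to
# booting with slub_min_order=2 and slub_min_order=3.
for order in range(1, 4):
    pages = 1 << order
    count = objects_per_slab(order, 256)
    print(f"order={order}: {pages} pages, {count} objects per slab")
```

Under those assumptions order 1 gives the 32 objects mentioned above and
order 2 gives 64.  The values the running kernel actually picked should
be visible under /sys/kernel/slab/kmalloc-256/ (the 'order' and
'objs_per_slab' files, if I remember the names right).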