On Wed, May 31, 2023 at 02:03:05PM +0200, Vlastimil Babka wrote: > On 1/17/23 14:40, Jesper Dangaard Brouer wrote: > > Allow API users of kmem_cache_create to specify that they don't want > > any slab merge or aliasing (with similar sized objects). Use this in > > network stack and kfence_test. > > > > The SKB (sk_buff) kmem_cache slab is critical for network performance. > > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain > > performance by amortising the alloc/free cost. > > > > For the bulk API to perform efficiently the slub fragmentation need to > > be low. Especially for the SLUB allocator, the efficiency of bulk free > > API depend on objects belonging to the same slab (page). > > > > When running different network performance microbenchmarks, I started > > to notice that performance was reduced (slightly) when machines had > > longer uptimes. I believe the cause was 'skbuff_head_cache' got > > aliased/merged into the general slub for 256 bytes sized objects (with > > my kernel config, without CONFIG_HARDENED_USERCOPY). > > > > For SKB kmem_cache network stack have reasons for not merging, but it > > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY). > > We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache. > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > > Since this idea was revived by David [1], and neither patch worked as is, > but yours was more complete and first, I have fixed it up as below. The > skbuff part itself will be best submitted separately afterwards so we don't > get conflicts between trees etc. Comments? > > ----8<---- > From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001 > From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > Date: Tue, 17 Jan 2023 14:40:00 +0100 > Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE > > Allow API users of kmem_cache_create to specify that they don't want > any slab merge or aliasing (with similar sized objects). Use this in > kfence_test. > > The SKB (sk_buff) kmem_cache slab is critical for network performance. > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain > performance by amortising the alloc/free cost. > > For the bulk API to perform efficiently the slub fragmentation need to > be low. Especially for the SLUB allocator, the efficiency of bulk free > API depend on objects belonging to the same slab (page). > > When running different network performance microbenchmarks, I started > to notice that performance was reduced (slightly) when machines had > longer uptimes. I believe the cause was 'skbuff_head_cache' got > aliased/merged into the general slub for 256 bytes sized objects (with > my kernel config, without CONFIG_HARDENED_USERCOPY). > > For SKB kmem_cache network stack have reasons for not merging, but it > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY). > We want to explicitly set SLAB_NO_MERGE for this kmem_cache. I believe it's also good for the visibility: having a separate entity in /proc/slabinfo for skbuff_head_cache can be really useful. > > Another use case for the flag has been described by David Sterba [1]: > > > This can be used for more fine grained control over the caches or for > > debugging builds where separate slabs can verify that no objects leak. > > > The slab_nomerge boot option is too coarse and would need to be > > enabled on all testing hosts. There are some other ways how to disable > > merging, e.g. a slab constructor but this disables poisoning besides > > that it adds additional overhead. Other flags are internal and may > > have other semantics. > > > A concrete example what motivates the flag. During 'btrfs balance' > > slab top reported huge increase in caches like > > > 1330095 1330095 100% 0.10K 34105 39 136420K Acpi-ParseExt > > 1734684 1734684 100% 0.14K 61953 28 247812K pid_namespace > > 8244036 6873075 83% 0.11K 229001 36 916004K khugepaged_mm_slot > > > which was confusing and that it's because of slab merging was not the > > first idea. After rebooting with slab_nomerge all the caches were > > from btrfs_ namespace as expected. > > [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@xxxxxxxx/ > > [ vbabka@xxxxxxx: rename to SLAB_NO_MERGE, change the flag value to the > one proposed by David so it does not collide with internal SLAB/SLUB > flags, write a comment for the flag, expand changelog, drop the skbuff > part to be handled spearately ] > > Reported-by: David Sterba <dsterba@xxxxxxxx> > Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx> Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx> both for this patch and the corresponding change on the networking side. Thanks!