Re: Deprecating and removing SLOB

Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> · Mon, 14 Nov 2022 23:47:01 +0900

On Mon, Nov 14, 2022 at 08:35:31PM +0900, Damien Le Moal wrote:
> On 11/14/22 18:36, Vlastimil Babka wrote:
> > On 11/14/22 06:48, Damien Le Moal wrote:
> >> On 11/14/22 10:55, Damien Le Moal wrote:
> >>> On 11/12/22 05:46, Conor Dooley wrote:
> >>>> On Fri, Nov 11, 2022 at 11:33:30AM +0100, Vlastimil Babka wrote:
> >>>>> On 11/8/22 22:44, Pasha Tatashin wrote:
> >>>>>> On Tue, Nov 8, 2022 at 10:55 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> as we all know, we currently have three slab allocators. As we discussed
> >>>>>>> at LPC [1], it is my hope that one of these allocators has a future, and
> >>>>>>> two of them do not.
> >>>>>>>
> >>>>>>> The unsurprising reasons include code maintenance burden, other features
> >>>>>>> compatible with only a subset of allocators (or more effort spent on the
> >>>>>>> features), blocking API improvements (more on that below), and my
> >>>>>>> inability to pronounce SLAB and SLUB in a properly distinguishable way,
> >>>>>>> without resorting to spelling out the letters.
> >>>>>>>
> >>>>>>> I think (but may be proven wrong) that SLOB is the easier target of the
> >>>>>>> two to be removed, so I'd like to focus on it first.
> >>>>>>>
> >>>>>>> I believe SLOB can be removed because:
> >>>>>>>
> >>>>>>> - AFAIK nobody really uses it? It strives for minimal memory footprint
> >>>>>>> by putting all objects together, which has its CPU performance costs
> >>>>>>> (locking, lack of percpu caching, searching for free space...). I'm not
> >>>>>>> aware of any "tiny linux" deployment that opts for this. For example,
> >>>>>>> OpenWRT seems to use SLUB and the devices these days have e.g. 128MB
> >>>>>>> RAM, not up to 16 MB anymore. I've heard anecdotes that the performance
> >>>>>>> impact of SLOB is too much for those who tried. Googling for
> >>>>>>> "CONFIG_SLOB=y" yielded nothing useful.
> >>>>>>
> >>>>>> I am all for removing SLOB.
> >>>>>>
> >>>>>> There are some devices with configs where SLOB is enabled by default.
> >>>>>> Perhaps, the owners/maintainers of those devices/configs should be
> >>>>>> included into this thread:
> >>>>>>
> >>>>>> tatashin@soleen:~/x/linux$ git grep SLOB=y
> >>>>
> >>>>>> arch/riscv/configs/nommu_k210_defconfig:CONFIG_SLOB=y
> >>>>>> arch/riscv/configs/nommu_k210_sdcard_defconfig:CONFIG_SLOB=y
> >>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_SLOB=y
> >>>>
> >>>>>
> >>>>> Turns out that since SLOB depends on EXPERT, many of those lack it so
> >>>>> running make defconfig ends up with SLUB anyway, unless I miss something.
> >>>>> Only a subset has both SLOB and EXPERT:
> >>>>>
> >>>>>> git grep CONFIG_EXPERT `git grep -l "CONFIG_SLOB=y"`
> >>>>
> >>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_EXPERT=y
> >>>>
> >>>> I suppose there's not really a concern with the virt defconfig, but I
> >>>> did check the output of `make nommu_k210_defconfig` and despite not
> >>>> having expert it seems to end up CONFIG_SLOB=y in the generated .config.
> >>>>
> >>>> I do have a board with a k210 so I checked with s/SLOB/SLUB and it still
> >>>> boots etc, but I have no workloads or w/e to run on it.
> >>>
> >>> I sent a patch to change the k210 defconfig to using SLUB. However...
> > 
> > Thanks!
> > 
> >>> The current default config using SLOB gives about 630 free memory pages
> >>> after boot (cat /proc/vmstat). Switching to SLUB, this is down to about
> >>> 400 free memory pages (CONFIG_SLUB_CPU_PARTIAL is off).
> > 
> > Thanks for the testing! How much RAM does the system have btw? I found 8MB
> > somewhere, is that correct?
> 
> Yep, 8MB, that's it.
> 
> > So 230 pages, that's a ~920 kB difference. Last time we saw a less dramatic
> > difference [1]. But that was looking at Slab pages, not free pages. The
> > extra overhead could be also in percpu allocations, code etc.
> > 
> >>> This is with a buildroot kernel 5.19 build including a shell and sd-card
> >>> boot. With SLUB, I get clean boots and a shell prompt as expected. But I
> >>> definitely see more errors with shell commands failing due to allocation
> >>> failures for the shell process fork. So as far as the K210 is concerned,
> >>> switching to SLUB is not ideal.
> >>>
> >>> I would not want to hold up kernel mm improvements because of this toy
> >>> k210 though, so I am not going to prevent SLOB deprecation. I just wish
> >>> SLUB itself used less memory :)
> >>
> >> Did further tests with kernel 6.0.1:
> >> * SLOB: 630 free pages after boot, shell working (occasional shell fork
> >> failures happen though)
> >> * SLAB: getting memory allocation for order 7 failures on boot already
> >> (init process). Shell barely working (high frequency of shell command fork
> >> failures)
> 
> I forgot to add here that the system was down to about 500 free pages
> after boot (again from the shell with "cat /proc/vmstat").
> 
> >> * SLUB: getting memory allocation for order 7 failures on boot. I do get a
> >> shell prompt but cannot run any shell command that involves forking a new
> >> process.
> 
> For both slab and slub, I had cpu partial off, debug off and slab merge
> on, as I suspected that would lead to less memory overhead.
> I suspected memory fragmentation may be an issue but doing
> 
> echo 3 > /proc/sys/vm/drop_caches
> 
> before trying a shell command did not help much at all (it usually does on
> that board with SLOB). Note that this is all with buildroot, so this echo
> & redirect always works as it does not cause a shell fork.
> 
> >>
> >> So if we want to keep the k210 support functional with a shell, we need
> >> slob. If we reduce that board support to only one application started as
> >> the init process, then I guess anything is OK.
> > 
> > In [1] it was possible to save some more memory with more tuning. Some of
> > that required boot parameters and other code changes. In another reply [2] I
> > considered adding something like SLUB_TINY to take care of all that, so
> > looks like it would make sense to proceed with that.
> 
> If you want me to test something, let me know.

Would you try this please?

diff --git a/mm/slub.c b/mm/slub.c
index a24b71041b26..1c36c4b9aaa0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4367,9 +4367,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	 * The larger the object size is, the more slabs we want on the partial
 	 * list to avoid pounding the page allocator excessively.
 	 */
-	s->min_partial = min_t(unsigned long, MAX_PARTIAL, ilog2(s->size) / 2);
-	s->min_partial = max_t(unsigned long, MIN_PARTIAL, s->min_partial);
-
+	s->min_partial = 0;
 	set_cpu_partial(s);
 
 #ifdef CONFIG_NUMA


Please boot it both with and without the boot parameter slub_max_order=0.

Thanks,
Hyeonggon

> 
> > 
> > [1]
> > https://lore.kernel.org/all/Yg9xSWEaTZLA+hYt@xxxxxxxxxxxxxxxxxxx-northeast-1.compute.internal/
> > [2] https://lore.kernel.org/all/eebc9dc8-6a45-c099-61da-230d06cb3157@xxxxxxx/
> 
> -- 
> Damien Le Moal
> Western Digital Research