On 11/15/22 13:24, Damien Le Moal wrote:
> On 11/14/22 23:47, Hyeonggon Yoo wrote:
>> On Mon, Nov 14, 2022 at 08:35:31PM +0900, Damien Le Moal wrote:
>>> On 11/14/22 18:36, Vlastimil Babka wrote:
>>>> On 11/14/22 06:48, Damien Le Moal wrote:
>>>>> On 11/14/22 10:55, Damien Le Moal wrote:
>>>>>> On 11/12/22 05:46, Conor Dooley wrote:
>>>>>>> On Fri, Nov 11, 2022 at 11:33:30AM +0100, Vlastimil Babka wrote:
>>>>>>>> On 11/8/22 22:44, Pasha Tatashin wrote:
>>>>>>>>> On Tue, Nov 8, 2022 at 10:55 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> as we all know, we currently have three slab allocators. As we
>>>>>>>>>> discussed at LPC [1], it is my hope that one of these allocators has
>>>>>>>>>> a future, and two of them do not.
>>>>>>>>>>
>>>>>>>>>> The unsurprising reasons include code maintenance burden, other
>>>>>>>>>> features compatible with only a subset of allocators (or more effort
>>>>>>>>>> spent on the features), blocking API improvements (more on that
>>>>>>>>>> below), and my inability to pronounce SLAB and SLUB in a properly
>>>>>>>>>> distinguishable way, without resorting to spelling out the letters.
>>>>>>>>>>
>>>>>>>>>> I think (but may be proven wrong) that SLOB is the easier target of
>>>>>>>>>> the two to be removed, so I'd like to focus on it first.
>>>>>>>>>>
>>>>>>>>>> I believe SLOB can be removed because:
>>>>>>>>>>
>>>>>>>>>> - AFAIK nobody really uses it? It strives for minimal memory
>>>>>>>>>> footprint by putting all objects together, which has its CPU
>>>>>>>>>> performance costs (locking, lack of percpu caching, searching for
>>>>>>>>>> free space...). I'm not aware of any "tiny linux" deployment that
>>>>>>>>>> opts for this. For example, OpenWRT seems to use SLUB and the
>>>>>>>>>> devices these days have e.g. 128MB RAM, not up to 16 MB anymore.
>>>>>>>>>> I've heard anecdotes that the SLOB performance impact is too much
>>>>>>>>>> for those who tried. Googling for "CONFIG_SLOB=y" yielded nothing
>>>>>>>>>> useful.
>>>>>>>>>
>>>>>>>>> I am all for removing SLOB.
>>>>>>>>>
>>>>>>>>> There are some devices with configs where SLOB is enabled by default.
>>>>>>>>> Perhaps the owners/maintainers of those devices/configs should be
>>>>>>>>> included in this thread:
>>>>>>>>>
>>>>>>>>> tatashin@soleen:~/x/linux$ git grep SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_k210_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_k210_sdcard_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_SLOB=y
>>>>>>>>
>>>>>>>> Turns out that since SLOB depends on EXPERT, many of those lack it, so
>>>>>>>> running make defconfig ends up with SLUB anyway, unless I missed
>>>>>>>> something. Only a subset has both SLOB and EXPERT:
>>>>>>>>
>>>>>>>>> git grep CONFIG_EXPERT `git grep -l "CONFIG_SLOB=y"`
>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_EXPERT=y
>>>>>>>
>>>>>>> I suppose there's not really a concern with the virt defconfig, but I
>>>>>>> did check the output of `make nommu_k210_defconfig` and, despite not
>>>>>>> having EXPERT, it seems to end up with CONFIG_SLOB=y in the generated
>>>>>>> .config.
>>>>>>>
>>>>>>> I do have a board with a k210 so I checked with s/SLOB/SLUB and it
>>>>>>> still boots etc, but I have no workloads or w/e to run on it.
>>>>>>
>>>>>> I sent a patch to change the k210 defconfig to using SLUB. However...
>>>>
>>>> Thanks!
>>>>
>>>>>> The current default config using SLOB gives about 630 free memory pages
>>>>>> after boot (cat /proc/vmstat). Switching to SLUB, this is down to about
>>>>>> 400 free memory pages (CONFIG_SLUB_CPU_PARTIAL is off).
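
[ Aside for anyone reproducing these numbers: nr_free_pages in
/proc/vmstat counts pages, 4 kB each on this board, so a one-liner along
these lines converts the counter to kB directly; for the SLOB boot above
it would print something like "630 pages (2520 kB)". ]

$ awk '/^nr_free_pages/ { print $2 " pages (" $2 * 4 " kB)" }' /proc/vmstat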
>>>>
>>>> Thanks for the testing! How much RAM does the system have btw? I found
>>>> 8MB somewhere, is that correct?
>>>
>>> Yep, 8MB, that's it.
>>>
>>>> So 230 pages, that's a ~920 kB difference. Last time we saw a less
>>>> dramatic difference [1]. But that was looking at Slab pages, not free
>>>> pages. The extra overhead could also be in percpu allocations, code, etc.
>>>>
>>>>>> This is with a buildroot kernel 5.19 build including a shell and
>>>>>> sd-card boot. With SLUB, I get clean boots and a shell prompt as
>>>>>> expected. But I definitely see more errors with shell commands failing
>>>>>> due to allocation failures for the shell process fork. So as far as the
>>>>>> K210 is concerned, switching to SLUB is not ideal.
>>>>>>
>>>>>> I would not want to hold up kernel mm improvements because of this toy
>>>>>> k210 though, so I am not going to prevent SLOB deprecation. I just wish
>>>>>> SLUB itself used less memory :)
>>>>>
>>>>> Did further tests with kernel 6.0.1:
>>>>> * SLOB: 630 free pages after boot, shell working (occasional shell fork
>>>>> failures happen though)
>>>
>>> I forgot to add here that the system was down to about 500 free pages
>>> after boot (again from the shell with "cat /proc/vmstat").
>>>
>>>>> * SLAB: getting order-7 memory allocation failures on boot already
>>>>> (init process). Shell barely working (high frequency of shell command
>>>>> fork failures)
>>>>> * SLUB: getting order-7 memory allocation failures on boot. I do get a
>>>>> shell prompt but cannot run any shell command that involves forking a
>>>>> new process.
>>>
>>> For both SLAB and SLUB, I had cpu partial off, debug off and slab merge
>>> on, as I suspected that would lead to less memory overhead.
>>> I suspected memory fragmentation may be an issue, but doing
>>>
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> before trying a shell command did not help much at all (it usually does
>>> on that board with SLOB). Note that this is all with buildroot, so this
>>> echo & redirect always works, as it does not cause a shell fork.
>>>
>>>>> So if we want to keep the k210 support functional with a shell, we need
>>>>> SLOB. If we reduce that board support to only one application started
>>>>> as the init process, then I guess anything is OK.
>>>>
>>>> In [1] it was possible to save some more memory with more tuning. Some
>>>> of that required boot parameters and other code changes. In another
>>>> reply [2] I considered adding something like SLUB_TINY to take care of
>>>> all that, so it looks like it would make sense to proceed with that.
>>>
>>> If you want me to test something, let me know.
>>
>> Would you try this please?
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index a24b71041b26..1c36c4b9aaa0 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4367,9 +4367,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>>  	 * The larger the object size is, the more slabs we want on the partial
>>  	 * list to avoid pounding the page allocator excessively.
>>  	 */
>> -	s->min_partial = min_t(unsigned long, MAX_PARTIAL, ilog2(s->size) / 2);
>> -	s->min_partial = max_t(unsigned long, MIN_PARTIAL, s->min_partial);
>> -
>> +	s->min_partial = 0;
>>  	set_cpu_partial(s);
>>
>>  #ifdef CONFIG_NUMA
>>
>> and booting with and without boot parameter slub_max_order=0?
>
> Test notes: I used Linus' 6.1-rc5 as the base. That is the only thing I
> changed in the buildroot default config for the Sipeed MAIX Bit card,
> booting with an SD card. The test is: boot, run "cat /proc/vmstat" and
> record the nr_free_pages value. I repeated the boot + cat 3 to 4 times
> for each case.
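
[ Side note for anyone repeating this: on the kernels carrying the patch
above, the effective value can be double-checked at runtime through SLUB's
sysfs interface, assuming sysfs is mounted (cache names may differ
depending on slab merging); with the patch applied this should read 0. ]

$ cat /sys/kernel/slab/kmalloc-64/min_partial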
> Here are the results:
>
> 6.1-rc5, SLOB:
> - 623 free pages
> - 629 free pages
> - 629 free pages
> 6.1-rc5, SLUB:
> - 448 free pages
> - 448 free pages
> - 429 free pages
> 6.1-rc5, SLUB + slub_max_order=0:
> - Init error, shell prompt but no shell command working
> - Init error, no shell prompt
> - 508 free pages
> - Init error, shell prompt but no shell command working
> 6.1-rc5, SLUB + patch:
> - Init error, shell prompt but no shell command working
> - 433 free pages
> - 448 free pages
> - 423 free pages
> 6.1-rc5, SLUB + slub_max_order=0 + patch:
> - Init error, no shell prompt
> - Init error, shell prompt, 499 free pages
> - Init error, shell prompt but no shell command working
> - Init error, no shell prompt
>
> No change in the SLOB results, as expected.
>
> For default SLUB, I did get all clean boots this time and could run the
> cat command. But I do see shell fork failures if I keep running commands.
>
> For SLUB + slub_max_order=0, I only got one clean boot with 508 free
> pages. The remaining runs failed to give a shell prompt or to allow
> running the cat command. For the clean boot, I do see a higher number of
> free pages.
>
> SLUB with the patch was nearly identical to SLUB without the patch.
>
> And SLUB + patch + slub_max_order=0 again gave a lot of errors/bad boots.
> I could run the cat command only once, giving 499 free pages, so better
> than regular SLUB. But it seems that the memory is more fragmented, as
> allocations fail more often.

A note about the last case (SLUB + patch + slub_max_order=0): here are the
messages I got when the init shell process fork failed:

[    1.217998] nommu: Allocation of length 491520 from process 1 (sh) failed
[    1.224098] active_anon:0 inactive_anon:0 isolated_anon:0
[    1.224098]  active_file:5 inactive_file:12 isolated_file:0
[    1.224098]  unevictable:0 dirty:0 writeback:0
[    1.224098]  slab_reclaimable:38 slab_unreclaimable:459
[    1.224098]  mapped:0 shmem:0 pagetables:0
[    1.224098]  sec_pagetables:0 bounce:0
[    1.224098]  kernel_misc_reclaimable:0
[    1.224098]  free:859 free_pcp:0 free_cma:0
[    1.260419] Node 0 active_anon:0kB inactive_anon:0kB active_file:20kB inactive_file:48kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:576kB pagetables:0kB sec_pagetables:0kB all_unreclaimable? no
[    1.285147] DMA32 free:3436kB boost:0kB min:312kB low:388kB high:464kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:28kB unevictable:0kB writepending:0kB present:8192kB managed:6240kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    1.310654] lowmem_reserve[]: 0 0 0
[    1.314089] DMA32: 17*4kB (U) 10*8kB (U) 7*16kB (U) 6*32kB (U) 11*64kB (U) 6*128kB (U) 6*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3460kB
[    1.326883] 33 total pagecache pages
[    1.330420] binfmt_flat: Unable to allocate RAM for process text/data, errno -12
[    1.337858] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

-- 
Damien Le Moal
Western Digital Research