Re: [PATCH v1] mm/slab: Allow cache creation to proceed even if sysfs registration fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 1, 2024 at 10:58 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 11/1/24 14:15, Hyeonggon Yoo wrote:
> > On Fri, Nov 1, 2024 at 10:08 PM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> >>
> >> When kobject_init_and_add() fails during cache creation,
> >> kobj->name can be leaked because SLUB does not call kobject_put(),
> >> which should be invoked per the kobject API documentation.
> >> This has a bit of historical context, though; SLUB does not call
> >> kobject_put() to avoid double-free for struct kmem_cache because
> >> 1) simply calling it would free all resources related to the cache, and
> >> 2) struct kmem_cache descriptor is always freed by cache_cache()'s
> >> error handling path, causing struct kmem_cache to be freed twice.
> >>
> >> This issue can be reproduced by creating new slab caches while applying
> >> failslab for kernfs_node_cache. This makes kobject_add_varg() succeed,
> >> but causes kobject_add_internal() to fail in kobject_init_and_add()
> >> during cache creation.
> >>
> >> Historically, this issue has attracted developers' attention several times.
> >> Each time a fix addressed either the leak or the double-free,
> >> it caused the other issue. Let's summarize a bit of history here:
> >>
> >>   The leak has existed since the early days of SLUB.
> >>
> >>   Commit 54b6a731025f ("slub: fix leak of 'name' in sysfs_slab_add")
> >>   introduced a double-free bug while fixing the leak.
> >>
> >>   Commit 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate
> >>   sysfs filename") re-introduced the leak while fixing the double-free
> >>   error.
> >>
> >>   Commit dde3c6b72a16 ("mm/slub: fix a memory leak in sysfs_slab_add()")
> >>   fixed the memory leak, but it was later reverted by commit 757fed1d0898
> >>   ("Revert "mm/slub: fix a memory leak in sysfs_slab_add()"") to avoid
> >>   the double-free error.
> >>
> >>   This is where we are now: we've chosen a memory leak over a double-free.
> >>
> >> To resolve this memory leak, skip creating sysfs files if it fails
> >> and continue with cache creation regardless (as suggested by Christoph).
> >> This resolves the memory leak because both the cache and the kobject
> >> remain alive on kobject_init_and_add() failure.
> >>
> >> If SLUB tries to create an alias for a cache without sysfs files,
> >> its symbolic link will not be generated.
> >>
> >> Since a slab cache might not have associated sysfs files, call kobject_del()
> >> only if such files exist.
> >>
> >> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>
> Thanks, added to slab/for-next
>
> >
> > [+Cc Jinjie, Liu]
> >
> >> ---
> >>
> >> RFC -> v1: Make sysfs optional instead of destroying the cache on sysfs
> >> errors. (Suggested by Christoph)
> >
> > RFC version for reference:
> > https://lore.kernel.org/linux-mm/20241021091413.154775-1-42.hyeyoo@xxxxxxxxx/
> >
> >>  mm/slub.c | 21 +++++++++++++++++----
> >>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 151a987dc3a0..b4b211468f77 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -6116,7 +6116,8 @@ __kmem_cache_alias(const char *name, unsigned int size, unsigned int align,
> >>         s = find_mergeable(size, align, flags, name, ctor);
> >>         if (s) {
> >>                 if (sysfs_slab_alias(s, name))
> >> -                       return NULL;
> >> +                       pr_err("SLUB: Unable to add slab alias %s to sysfs\n",
>
> I've changed "slab" to "cache".
>
> >> +                              name);
> >>
> >>                 s->refcount++;
> >>
> >> @@ -6204,9 +6205,15 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
> >>                 goto out;
> >>         }
> >>
> >> +       /*
> >> +        * Failing to create sysfs files is not critical to SLUB functionality.
> >> +        * If it fails, proceed with cache creation without these files.
> >> +        */
> >>         err = sysfs_slab_add(s);
> >> -       if (err)
> >> -               goto out;
> >> +       if (err) {
> >> +               err = 0;
> >> +               pr_err("SLUB: Unable to add slab %s to sysfs\n", s->name);
>
> Also here, and simplified to "if (sysfs_slab_add(s)) ... " to avoid dealing
> with err.

Oh no. err is initialized to -EINVAL, so that will not work as intended.
It is causing the following list corruption.

[    0.607833] __kmem_cache_create_args(fscrypt_inode_info) failed
with error -22
[    0.608518] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc2+ #63
[    0.609181] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.16.2-1.fc38 04/01/2014
[    0.610000] Call Trace:
[    0.610233]  <TASK>
[    0.610433]  dump_stack_lvl+0x64/0x80
[    0.610806]  __kmem_cache_create_args+0x1eb/0x280
[    0.611253]  ? __pfx_fscrypt_init+0x10/0x10
[    0.611647]  fscrypt_init+0x88/0xf0
[    0.611980]  ? __pfx_fscrypt_init+0x10/0x10
[    0.612373]  do_one_initcall+0x5b/0x320
[    0.612736]  kernel_init_freeable+0x351/0x510
[    0.613150]  ? __pfx_kernel_init+0x10/0x10
[    0.613536]  kernel_init+0x1a/0x1d0
[    0.613865]  ret_from_fork+0x34/0x50
[    0.614207]  ? __pfx_kernel_init+0x10/0x10
[    0.614591]  ret_from_fork_asm+0x1a/0x30
[    0.614968]  </TASK>
[    0.615203] list_add corruption. prev->next should be next
(ffff986bc2bb9aa0), but was ffff986bc2b6.
[    0.616308] ------------[ cut here ]------------
[    0.616746] kernel BUG at lib/list_debug.c:32!
[    0.617173] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[    0.617709] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc2+ #63
[    0.618372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.16.2-1.fc38 04/01/2014
[    0.619192] RIP: 0010:__list_add_valid_or_report+0x78/0xa0
[    0.619714] Code: 8b ff 0f 0b 48 89 c1 48 c7 c7 50 ca c0 9a e8 2f
f4 8b ff 0f 0b 48 89 d1 48 89 c6 b
[    0.621473] RSP: 0018:ffffa47380013c98 EFLAGS: 00010246
[    0.621969] RAX: 0000000000000075 RBX: ffff986bc2b63238 RCX: ffffffff9b5646a8
[    0.622638] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[    0.623313] RBP: ffff986bc2bb9ab8 R08: 0000000000000000 R09: 205d333032353136
[    0.623996] R10: 74707572726f6320 R11: 6464615f7473696c R12: ffff986bc2bb9aa0
[    0.624673] R13: ffff986bc2b63240 R14: ffff986bc2b63240 R15: ffff986bc2b631c0
[    0.625355] FS:  0000000000000000(0000) GS:ffff986bdf400000(0000)
knlGS:0000000000000000
[    0.626111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.626653] CR2: ffff986bd3801000 CR3: 000000001242a000 CR4: 00000000000006f0
[    0.627325] Call Trace:
[    0.627559]  <TASK>
[    0.627760]  ? die+0x36/0x90
[    0.628039]  ? do_trap+0xdd/0x100
[    0.628354]  ? __list_add_valid_or_report+0x78/0xa0
[    0.628814]  ? do_error_trap+0x6a/0x90
[    0.629172]  ? __list_add_valid_or_report+0x78/0xa0
[    0.629633]  ? exc_invalid_op+0x50/0x70
[    0.630001]  ? __list_add_valid_or_report+0x78/0xa0
[    0.630463]  ? asm_exc_invalid_op+0x1a/0x20
[    0.630861]  ? __list_add_valid_or_report+0x78/0xa0
[    0.631324]  ? __list_add_valid_or_report+0x78/0xa0
[    0.631784]  kobject_add_internal+0x78/0x2a0
[    0.632192]  kobject_init_and_add+0x8c/0xd0
[    0.632589]  ? kernfs_find_ns+0x35/0xc0
[    0.632957]  sysfs_slab_add+0x193/0x1e0
[    0.633318]  do_kmem_cache_create+0x455/0x630
[    0.633727]  __kmem_cache_create_args+0x157/0x280
[    0.634176]  ? __pfx_fsverity_init+0x10/0x10
[    0.634586]  fsverity_init_info_cache+0x66/0x90
[    0.635022]  fsverity_init+0x13/0x40
[    0.635365]  do_one_initcall+0x5b/0x320
[    0.635734]  kernel_init_freeable+0x351/0x510
[    0.636154]  ? __pfx_kernel_init+0x10/0x10
[    0.636547]  kernel_init+0x1a/0x1d0
[    0.636881]  ret_from_fork+0x34/0x50
[    0.637227]  ? __pfx_kernel_init+0x10/0x10
[    0.637619]  ret_from_fork_asm+0x1a/0x30
[    0.637998]  </TASK>
[    0.638211] Modules linked in:
[    0.638512] ---[ end trace 0000000000000000 ]---
[    0.638962] RIP: 0010:__list_add_valid_or_report+0x78/0xa0
[    0.639483] Code: 8b ff 0f 0b 48 89 c1 48 c7 c7 50 ca c0 9a e8 2f
f4 8b ff 0f 0b 48 89 d1 48 89 c6 b
[    0.641253] RSP: 0018:ffffa47380013c98 EFLAGS: 00010246
[    0.641753] RAX: 0000000000000075 RBX: ffff986bc2b63238 RCX: ffffffff9b5646a8
[    0.642431] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[    0.643118] RBP: ffff986bc2bb9ab8 R08: 0000000000000000 R09: 205d333032353136
[    0.643795] R10: 74707572726f6320 R11: 6464615f7473696c R12: ffff986bc2bb9aa0
[    0.644471] R13: ffff986bc2b63240 R14: ffff986bc2b63240 R15: ffff986bc2b631c0
[    0.645152] FS:  0000000000000000(0000) GS:ffff986bdf400000(0000)
knlGS:0000000000000000
[    0.645923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.646470] CR2: ffff986bd3801000 CR3: 000000001242a000 CR4: 00000000000006f0
[    0.647156] note: swapper/0[1] exited with preempt_count 1
[    0.647686] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
[    0.648446] Kernel Offset: 0x18000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000)
[    0.649466] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b ]---


> >> +       }
> >>
> >>         if (s->flags & SLAB_STORE_USER)
> >>                 debugfs_slab_add(s);
> >> @@ -7276,7 +7283,8 @@ static int sysfs_slab_add(struct kmem_cache *s)
> >>
> >>  void sysfs_slab_unlink(struct kmem_cache *s)
> >>  {
> >> -       kobject_del(&s->kobj);
> >> +       if (s->kobj.state_in_sysfs)
> >> +               kobject_del(&s->kobj);
> >>  }
> >>
> >>  void sysfs_slab_release(struct kmem_cache *s)
> >> @@ -7305,6 +7313,11 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
> >>                  * If we have a leftover link then remove it.
> >>                  */
> >>                 sysfs_remove_link(&slab_kset->kobj, name);
> >> +               /*
> >> +                * The original cache may have failed to generate sysfs file.
> >> +                * In that case, sysfs_create_link() returns -ENOENT and
> >> +                * symbolic link creation is skipped.
> >> +                */
> >>                 return sysfs_create_link(&slab_kset->kobj, &s->kobj, name);
> >>         }
> >>
> >> --
> >> 2.45.0
> >>
>





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux