Hi, On 10/21/24 11:14, Hyeonggon Yoo wrote: > When kobject_init_and_add() fails during cache creation, > kobj->name can be leaked because SLUB does not call kobject_put(), > which should have been invoked per the kobject API documentation. > It has a bit of historical context, though; SLUB does not call > kobject_put() to avoid double-free for struct kmem_cache because > 1) simply calling it would free all resources related to the cache, and > 2) struct kmem_cache descriptor is always freed by cache_cache()'s > error handling path, causing struct kmem_cache to be freed twice. > > This issue can be reproduced by creating new slab caches while applying > failslab for kernfs_node_cache. This makes kobject_add_varg() succeed, > but causes kobject_add_internal() to fail in kobject_init_and_add() > during cache creation. > > Historically, this issue has attracted developers' attention several times. > Each time a fix addressed either the leak or the double-free, > it caused the other issue. Let's summarize a bit of history here: > > The leak has existed since the early days of SLUB. > > Commit 54b6a731025f ("slub: fix leak of 'name' in sysfs_slab_add") > introduced a double-free bug while fixing the leak. > > Commit 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate > sysfs filename") re-introduced the leak while fixing the double-free > error. > > Commit dde3c6b72a16 ("mm/slub: fix a memory leak in sysfs_slab_add()") > fixed the memory leak, but it was later reverted by commit 757fed1d0898 > ("Revert "mm/slub: fix a memory leak in sysfs_slab_add()"") to avoid > the double-free error. Thanks a lot for digging that history out. > This is where we are now: we've chosen a memory leak over a double-free. > So, how can this be fixed? Let's change the logic this way: > > 1. Move cache name allocation from __kmem_cache_create_args() to > do_kmem_cache_create() and free all resources associated with the cache > on error in do_kmem_cache_create() to simplify error handling. > > 2. In do_kmem_cache_create(), assume all resources are already freed upon > sysfs_slab_add() failure and do not take the error handling path. Sounds good. > One more thing to address when calling kobject_put() in sysfs_slab_add() > on kobject_init_and_add() failure is that it shouldn't destroy caches > that were already created during the boot process just because it failed to > create sysfs files. Yeah that would likely lead to numerous errors. > To avoid that, use a no-op function as the release function if SLUB is > trying to create sysfs files for those early caches. > This ensures that resources allocated by the kobject infrastructure are > properly freed on error while keeping the early caches active. > > Note that using a no-op function as the release function is not recommended > by the kobject API document, but I can't come up with a better approach > here. So what if we just don't call kobject_put() in that case? We should not be leaking anything in that case, we just accept that the cache wasn't added to sysfs (should we pr_warn() btw?) but it will work otherwise? Since we want the boot caches remain around anyway, does it matter if they are also an initialized but not linked kobject? I think the comment "If this function returns an error, kobject_put() must be called" means that *if* you want to destroy it due to the failure, you must use kobject_put() and not e.g. kfree(). But IMHO it doesn't mean you must destroy it because of the kobject_add() failure. Seems it would be less hacky to me than the dummy release function approach. Thanks, Vlastimil > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> > --- > mm/slab_common.c | 17 +++------------- > mm/slub.c | 50 ++++++++++++++++++++++++++++++++++++++---------- > 2 files changed, 43 insertions(+), 24 deletions(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 62878edb0a81..d4ccd10f3098 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -236,14 +236,12 @@ static struct kmem_cache *create_cache(const char *name, > goto out; > err = do_kmem_cache_create(s, name, object_size, args, flags); > if (err) > - goto out_free_cache; > + goto out; > > s->refcount = 1; > list_add(&s->list, &slab_caches); > return s; > > -out_free_cache: > - kmem_cache_free(kmem_cache, s); > out: > return ERR_PTR(err); > } > @@ -281,7 +279,6 @@ struct kmem_cache *__kmem_cache_create_args(const char *name, > slab_flags_t flags) > { > struct kmem_cache *s = NULL; > - const char *cache_name; > int err; > > #ifdef CONFIG_SLUB_DEBUG > @@ -332,18 +329,10 @@ struct kmem_cache *__kmem_cache_create_args(const char *name, > if (s) > goto out_unlock; > > - cache_name = kstrdup_const(name, GFP_KERNEL); > - if (!cache_name) { > - err = -ENOMEM; > - goto out_unlock; > - } > - > args->align = calculate_alignment(flags, args->align, object_size); > - s = create_cache(cache_name, object_size, args, flags); > - if (IS_ERR(s)) { > + s = create_cache(name, object_size, args, flags); > + if (IS_ERR(s)) > err = PTR_ERR(s); > - kfree_const(cache_name); > - } > > out_unlock: > mutex_unlock(&slab_mutex); > diff --git a/mm/slub.c b/mm/slub.c > index c95ac26dad07..1daf4de8dc33 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -330,10 +330,11 @@ struct track { > enum track_item { TRACK_ALLOC, TRACK_FREE }; > > #ifdef SLAB_SUPPORTS_SYSFS > -static int sysfs_slab_add(struct kmem_cache *); > +static int sysfs_slab_add(struct kmem_cache *, bool); > static int sysfs_slab_alias(struct kmem_cache *, const char *); > #else > -static inline int sysfs_slab_add(struct kmem_cache *s) { return 0; } > +static inline int sysfs_slab_add(struct kmem_cache *s, bool sysfs_lazy) > + { return 0; } > static inline int sysfs_slab_alias(struct kmem_cache *s, const char *p) > { return 0; } > #endif > @@ -6137,7 +6138,12 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name, > { > int err = -EINVAL; > > - s->name = name; > + s->name = kstrdup_const(name, GFP_KERNEL); > + if (!s->name) { > + err = -ENOMEM; > + goto out; > + } > + > s->size = s->object_size = size; > > s->flags = kmem_cache_flags(flags, s->name); > @@ -6204,16 +6210,20 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name, > goto out; > } > > - err = sysfs_slab_add(s); > + err = sysfs_slab_add(s, false); > if (err) > - goto out; > + /* Resources are already freed on sysfs_slab_add() failure */ > + return err; > > if (s->flags & SLAB_STORE_USER) > debugfs_slab_add(s); > > out: > - if (err) > + if (err) { > __kmem_cache_release(s); > + kfree_const(s->name); > + kmem_cache_free(kmem_cache, s); > + } > return err; > } > > @@ -7162,11 +7172,22 @@ static void kmem_cache_release(struct kobject *k) > slab_kmem_cache_release(to_slab(k)); > } > > +static void kmem_cache_sysfs_lazy_release(struct kobject *k) { } > + > static const struct sysfs_ops slab_sysfs_ops = { > .show = slab_attr_show, > .store = slab_attr_store, > }; > > +/* > + * Early slab caches shouldn't be destroyed just because SLUB failed to create > + * sysfs files. use a no-op function as .release function. > + */ > +static const struct kobj_type slab_sysfs_lazy_ktype = { > + .sysfs_ops = &slab_sysfs_ops, > + .release = kmem_cache_sysfs_lazy_release, > +}; > + > static const struct kobj_type slab_ktype = { > .sysfs_ops = &slab_sysfs_ops, > .release = kmem_cache_release, > @@ -7223,12 +7244,13 @@ static char *create_unique_id(struct kmem_cache *s) > return name; > } > > -static int sysfs_slab_add(struct kmem_cache *s) > +static int sysfs_slab_add(struct kmem_cache *s, bool sysfs_lazy) > { > int err; > const char *name; > struct kset *kset = cache_kset(s); > int unmergeable = slab_unmergeable(s); > + const struct kobj_type *ktype; > > if (!unmergeable && disable_higher_order_debug && > (slub_debug & DEBUG_METADATA_FLAGS)) > @@ -7253,9 +7275,14 @@ static int sysfs_slab_add(struct kmem_cache *s) > } > > s->kobj.kset = kset; > - err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name); > + if (sysfs_lazy) > + ktype = &slab_sysfs_lazy_ktype; > + else > + ktype = &slab_ktype; > + err = kobject_init_and_add(&s->kobj, ktype, NULL, "%s", name); > + > if (err) > - goto out; > + goto out_put_kobj; > > err = sysfs_create_group(&s->kobj, &slab_attr_group); > if (err) > @@ -7269,8 +7296,11 @@ static int sysfs_slab_add(struct kmem_cache *s) > if (!unmergeable) > kfree(name); > return err; > + > out_del_kobj: > kobject_del(&s->kobj); > +out_put_kobj: > + kobject_put(&s->kobj); > goto out; > } > > @@ -7337,7 +7367,7 @@ static int __init slab_sysfs_init(void) > slab_state = FULL; > > list_for_each_entry(s, &slab_caches, list) { > - err = sysfs_slab_add(s); > + err = sysfs_slab_add(s, true); > if (err) > pr_err("SLUB: Unable to add boot slab %s to sysfs\n", > s->name);