On 9/19/22 13:56, Hyeonggon Yoo wrote:
> On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
>> On 9/19/22 05:12, Feng Tang wrote:
>> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
>> > kunit test case cause a use-after-free error:
>> >
>
> If I'm not mistaken, I think the subject should be:
> s/double free/use after free/g

Well, it's both AFAICS. By the initial use-after-free we can read a wrong
s->flags that was modified since we freed for the first time, and it can
lead to another kmem_cache_release() which is basically a double free.

>> >   BUG: KASAN: use-after-free in kobject_del+0x14/0x30
>> >   Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
>> >
>> >   CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
>> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>> >   Call Trace:
>> >    <TASK>
>> >    dump_stack_lvl+0x34/0x48
>> >    print_address_description.constprop.0+0x87/0x2a5
>> >    print_report+0x103/0x1ed
>> >    kasan_report+0xb7/0x140
>> >    kobject_del+0x14/0x30
>> >    kmem_cache_destroy+0x130/0x170
>> >    test_exit+0x1a/0x30
>> >    kunit_try_run_case+0xad/0xc0
>> >    kunit_generic_run_threadfn_adapter+0x26/0x50
>> >    kthread+0x17b/0x1b0
>> >    </TASK>
>> >
>> > The cause is inside kmem_cache_destroy():
>> >
>> > kmem_cache_destroy
>> >     acquire lock/mutex
>> >     shutdown_cache
>> >         schedule_work(kmem_cache_release) (if RCU flag set)
>> >     release lock/mutex
>> >     kmem_cache_release (if RCU flag set)
>>
>>                          ^ not set
>>
>> I've fixed that up.
>>
>> >
>> > in some certain timing, the scheduled work could be run before
>> > the next RCU flag checking which will get a wrong state.
>> >
>> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
>
> Very nice catch, thanks!
>
> Otherwise (and with Vlastimil's fix):
>
> Looks good to me.
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>
>> >
>> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
>>
>> Thanks!
>>
>> > ---
>> >
>> > note:
>> >
>> > The error only happens on linux-next tree, and not in Linus' tree,
>> > which already has Waiman's commit:
>> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
>> > without holding slab_mutex/cpu_hotplug_lock")
>>
>> Actually that commit is already in Linus' rc5 too, so I will send your fix
>> this week too. Added a Fixes: 0495e337b703 (...) too.
>>
>> >  mm/slab_common.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slab_common.c b/mm/slab_common.c
>> > index 07b948288f84..ccc02573588f 100644
>> > --- a/mm/slab_common.c
>> > +++ b/mm/slab_common.c
>> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
>> >  void kmem_cache_destroy(struct kmem_cache *s)
>> >  {
>> >  	int refcnt;
>> > +	bool rcu_set;
>> >
>> >  	if (unlikely(!s) || !kasan_check_byte(s))
>> >  		return;
>> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> >  	cpus_read_lock();
>> >  	mutex_lock(&slab_mutex);
>> >
>> > +	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
>> > +
>> >  	refcnt = --s->refcount;
>> >  	if (refcnt)
>> >  		goto out_unlock;
>> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> >  out_unlock:
>> >  	mutex_unlock(&slab_mutex);
>> >  	cpus_read_unlock();
>> > -	if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
>> > +	if (!refcnt && !rcu_set)
>> >  		kmem_cache_release(s);
>> >  }
>> >  EXPORT_SYMBOL(kmem_cache_destroy);
>> >
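For anyone following along, here is a minimal userspace sketch of the
ordering problem and of the "cache the flag under the lock" fix. It is not
kernel code and makes several assumptions: struct fake_cache,
FAKE_TYPESAFE_BY_RCU, fake_release(), destroy_buggy() and destroy_fixed()
are hypothetical names invented for illustration, nothing is really freed,
the locking is only marked by comments, and the deferred work is assumed to
run immediately after the unlock (the worst-case timing described above).
The "freed" object has its flags overwritten to model s->flags being
modified after the first release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SLAB_TYPESAFE_BY_RCU. */
#define FAKE_TYPESAFE_BY_RCU 0x1u

/* Hypothetical stand-in for struct kmem_cache. */
struct fake_cache {
	unsigned int flags;
	int refcount;
	int release_calls;	/* how many times "release" ran on this object */
};

/*
 * Stand-in for kmem_cache_release(). It only counts the call and
 * overwrites flags, the way freed memory may be reused or poisoned.
 */
void fake_release(struct fake_cache *s)
{
	s->release_calls++;
	s->flags = 0;
}

/* Models the pre-fix ordering: flags are re-read after the unlock. */
void destroy_buggy(struct fake_cache *s)
{
	int refcnt;
	bool work_scheduled;

	/* --- lock held --- */
	refcnt = --s->refcount;
	/* RCU caches get their release deferred to a work item. */
	work_scheduled = !refcnt && (s->flags & FAKE_TYPESAFE_BY_RCU);
	/* --- lock dropped --- */

	/* Assumed worst case: the deferred work runs right away. */
	if (work_scheduled)
		fake_release(s);

	/* Stale read: flags no longer say "RCU", so we release again. */
	if (!refcnt && !(s->flags & FAKE_TYPESAFE_BY_RCU))
		fake_release(s);
}

/* Models the fix: the flag is sampled while the lock is held. */
void destroy_fixed(struct fake_cache *s)
{
	int refcnt;
	bool rcu_set;
	bool work_scheduled;

	/* --- lock held --- */
	rcu_set = (s->flags & FAKE_TYPESAFE_BY_RCU) != 0;
	refcnt = --s->refcount;
	work_scheduled = !refcnt && rcu_set;
	/* --- lock dropped --- */

	if (work_scheduled)
		fake_release(s);

	/* Uses the cached value, never touches s->flags again. */
	if (!refcnt && !rcu_set)
		fake_release(s);
}
```

With an RCU-flagged object whose refcount starts at 1, the buggy variant
ends up calling the release path twice in this model, while the fixed
variant calls it once; a non-RCU object is released once by either variant.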