On Fri, Sep 25, 2020 at 1:56 PM Roman Gushchin <guro@xxxxxx> wrote: > > On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote: > > On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt <shakeelb@xxxxxxxxxx> > > wrote: > > > > > [snip] > > > > > > I don't think you can ignore the flushing. The __free_once() in > > > ___cache_free() assumes there is a space available. > > > > > > BTW do_drain() also have the same issue. > > > > > > Why not move slabs_destroy() after we update ac->avail and memmove()? > > > > Ming, can you please try the following patch? > > > > > > From: Shakeel Butt <shakeelb@xxxxxxxxxx> > > > > [PATCH] mm: slab: fix potential infinite recursion in ___cache_free > > > > With the commit 10befea91b61 ("mm: memcg/slab: use a single set of > > kmem_caches for all allocations"), it becomes possible to call kfree() > > from the slabs_destroy(). However if slabs_destroy() is being called for > > the array_cache of the local CPU then this opens the potential scenario > > of infinite recursion because kfree() called from slabs_destroy() can > > call slabs_destroy() with the same array_cache of the local CPU. Since > > the array_cache of the local CPU is not updated before calling > > slabs_destroy(), it will try to free the same pages. > > > > To fix the issue, simply update the cache before calling > > slabs_destroy(). > > > > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx> > > I like the patch and I think it should fix the problem. > > However the description above should be likely asjusted a bit. > It seems that the problem is not necessary caused by an infinite recursion, > it can be even simpler. > > In cache_flusharray() we rely on the state of ac, which is described > by ac->avail. In particular we rely on batchcount < ac->avail, > as we shift the batchcount number of pointers by memmove. > But if slabs_destroy() is called before and leaded to a change of the > ac state, it can lead to a memory corruption. > > Also, unconditionally resetting ac->avail to 0 in do_drain() after calling > to slab_destroy() seems to be wrong. > It explains double free BUGs we've seen in stacktraces. > Yes, you are right. Let's first get this patch tested and after confirmation we can update the commit message.