On Sat, Nov 09, 2019 at 08:52:29PM +0000, Christopher Lameter wrote:
> On Fri, 8 Nov 2019, Yu Zhao wrote:
>
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
>
> How did this happen? The kmalloc needs to be always done before the
> list_lock is taken.
>
> > Fixing the problem by using a static bitmap instead.
> >
> > WARNING: possible recursive locking detected
> > --------------------------------------------
> > mount-encrypted/4921 is trying to acquire lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >
> > but task is already holding lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(&(&n->list_lock)->rlock);
> >   lock(&(&n->list_lock)->rlock);
> >
> >  *** DEADLOCK ***
> >
>
> Ahh. list_slab_objects() in shutdown?
>
> There is a much easier fix for this:
>
>
>
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects()
>
> list_slab_objects() is called when a slab is destroyed and there are objects still left
> to list the objects in the syslog. This is a pretty rare event.
>
> And there it seems we take the list_lock and call kmalloc while holding that lock.
>
> Perform the allocation in free_partial() before the list_lock is taken.
>
> Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
> Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c	2019-11-09 20:43:52.374187381 +0000
> @@ -3690,14 +3690,11 @@ error:
>  }
>
>  static void list_slab_objects(struct kmem_cache *s, struct page *page,
> -							const char *text)
> +					const char *text, unsigned long *map)
>  {
>  #ifdef CONFIG_SLUB_DEBUG
>  	void *addr = page_address(page);
>  	void *p;
> -	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> -	if (!map)
> -		return;
>  	slab_err(s, page, text, s->name);
>  	slab_lock(page);
>
> @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
>  {
>  	LIST_HEAD(discard);
>  	struct page *page, *h;
> +	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +
> +	if (!map)
> +		return;

What would happen if we are trying to allocate from the slab that is being
shut down? And shouldn't the allocation be conditional (i.e., only when
CONFIG_SLUB_DEBUG=y)?