Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625!

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 11 Nov 2014 20:38:59 -0800

On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > Poor ttm guys - this is a bit of a trap we set for them.
> 
> Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid deadlock.\")
> changed to use sc->gfp_mask rather than GFP_KERNEL.
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> -                       GFP_KERNEL);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> 
> But this bug is caused by sc->gfp_mask containing some flags which are not
> in GFP_KERNEL, right? Then, I think
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
> 
> would hide this bug.
> 
> But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)

Well no - ttm_page_pool_free() should stop calling kmalloc altogether. 
Just do

	struct page *pages_to_free[16];

and rework the code to free 16 pages at a time.  Easy.

Apart from all the other things we're discussing here, it should do
this because kmalloc() isn't very reliable within a shrinker.

> for
> two reasons when __alloc_pages_nodemask() is called from shrinker functions.
> 
> (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
>     recursive __alloc_pages_nodemask() calls, kernel stack could overflow
>     under extreme memory pressure.
> 
> (2) Some shrinker functions are using sleepable locks which could make kswapd
>     sleep for unpredictable duration. If kswapd is unexpectedly blocked inside
>     shrinker functions and somebody is expecting that kswapd is running for
>     reclaiming memory, it is a memory allocation deadlock.
> 
> Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock() to
> avoid deadlock inside shrinker functions.\") prevents unlimited recursive
> __alloc_pages_nodemask() calls.

Yes, there are such problems.

Shrinkers do all sorts of surprising things - some of the filesystem
ones do disk writes!  And these involve all sorts of locking and memory
allocations.  But they won't be directly using scan_control.gfp_mask. 
They may be using open-coded __GFP_NOFS for the allocations.  The
complicated ones pass the IO over to kernel threads and wait for them
to complete, which addresses the stack consumption concerns (at least).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>