On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > Poor ttm guys - this is a bit of a trap we set for them.
>
> Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid deadlock.")
> changed to use sc->gfp_mask rather than GFP_KERNEL.
>
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> -                               GFP_KERNEL);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
>
> But this bug is caused by sc->gfp_mask containing some flags which are not
> in GFP_KERNEL, right? Then, I think
>
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
>
> would hide this bug.
>
> But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)

Well no - ttm_page_pool_free() should stop calling kmalloc altogether.
Just do

	struct page *pages_to_free[16];

and rework the code to free 16 pages at a time.  Easy.

Apart from all the other things we're discussing here, it should do
this because kmalloc() isn't very reliable within a shrinker.

> for
> two reasons when __alloc_pages_nodemask() is called from shrinker functions.
>
>  (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
>      recursive __alloc_pages_nodemask() calls, kernel stack could overflow
>      under extreme memory pressure.
>
>  (2) Some shrinker functions are using sleepable locks which could make kswapd
>      sleep for unpredictable duration. If kswapd is unexpectedly blocked inside
>      shrinker functions and somebody is expecting that kswapd is running for
>      reclaiming memory, it is a memory allocation deadlock.
>
> Speak of ttm module, commit 22e71691fd54c637 ("drm/ttm: Use mutex_trylock() to
> avoid deadlock inside shrinker functions.") prevents unlimited recursive
> __alloc_pages_nodemask() calls.

Yes, there are such problems.

Shrinkers do all sorts of surprising things - some of the filesystem
ones do disk writes!  And these involve all sorts of locking and memory
allocations.  But they won't be directly using scan_control.gfp_mask.
They may be using open-coded __GFP_NOFS for the allocations.

The complicated ones pass the IO over to kernel threads and wait for
them to complete, which addresses the stack consumption concerns (at
least).