On Tue, Jun 16, 2020 at 12:48:06PM +0200, Michal Hocko wrote:
> On Tue 16-06-20 17:39:33, Yafang Shao wrote:
> > The history is complicated, but it doesn't matter.
> > Let's turn back to the upstream kernel now. As I explained in the commit log,
> > xfs_vm_writepages
> >   -> iomap_writepages
> >      -> write_cache_pages
> >         -> lock_page   <<<< This page is locked.
> >         -> writepages (which is iomap_do_writepage)
> >            -> xfs_map_blocks
> >               -> xfs_convert_blocks
> >                  -> xfs_bmapi_convert_delalloc
> >                     -> xfs_trans_alloc
> >                        -> kmem_zone_zalloc //It should alloc page with GFP_NOFS
> >
> > If GFP_NOFS isn't set in xfs_trans_alloc(), the kmem_zone_zalloc() may
> > trigger the memory reclaim then it may wait on the page locked in
> > write_cache_pages() ...
>
> This cannot happen because the memory reclaim backs off on locked pages.

->writepages can hold a bio with multiple PageWriteback pages already
attached to it. Direct GFP_KERNEL page reclaim can wait on them. If
that happens, the bio will never be issued, and so reclaim will
deadlock waiting for the writeback state to clear...

> > That means the ->writepages should be set with GFP_NOFS to avoid this
> > recursive filesystem reclaim.

Indeed. We already have parts of the IO submission path under
PF_MEMALLOC_NOFS so we can do transaction allocation, etc. See
xfs_prepare_ioend(), which is called from iomap via:

iomap_submit_ioend()
  ->prepare_ioend()
    xfs_prepare_ioend()

We can get there from:

iomap_writepage()
  iomap_do_writepage()
    iomap_writepage_map()
      iomap_submit_ioend()
  iomap_submit_ioend()

and:

iomap_writepages()
  write_cache_pages()
    iomap_do_writepage()
      iomap_writepage_map()
        iomap_submit_ioend()
  iomap_submit_ioend()

This says that we really should be putting both iomap_writepage() and
iomap_writepages() under PF_MEMALLOC_NOFS context, so that filesystem
callouts don't have to repeatedly enter and exit PF_MEMALLOC_NOFS
context to avoid memory reclaim recursion...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx