On Thu, Jun 18, 2020 at 8:34 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Tue, Jun 16, 2020 at 12:48:06PM +0200, Michal Hocko wrote: > > On Tue 16-06-20 17:39:33, Yafang Shao wrote: > > > The history is complicated, but it doesn't matter. > > > Let's turn back to the upstream kernel now. As I explained in the commit log, > > > xfs_vm_writepages > > > -> iomap_writepages. > > > -> write_cache_pages > > > -> lock_page <<<< This page is locked. > > > -> writepages ( which is iomap_do_writepage) > > > -> xfs_map_blocks > > > -> xfs_convert_blocks > > > -> xfs_bmapi_convert_delalloc > > > -> xfs_trans_alloc > > > -> kmem_zone_zalloc //It should alloc page > > > with GFP_NOFS > > > > > > If GFP_NOFS isn't set in xfs_trans_alloc(), the kmem_zone_zalloc() may > > > trigger the memory reclaim then it may wait on the page locked in > > > write_cache_pages() ... > > > > This cannot happen because the memory reclaim backs off on locked pages. > > ->writepages can hold a bio with multiple PageWriteback pages > already attached to it. Direct GFP_KERNEL page reclaim can wait on > them - if that happens the the bio will never be issued and so > reclaim will deadlock waiting for the writeback state to clear... > Thanks for the explanation. > > > That means the ->writepages should be set with GFP_NOFS to avoid this > > > recursive filesystem reclaim. > > Indeed. We already have parts of the IO submission path under > PF_MEMALLOC_NOFS so we can do transaction allocation, etc. See > xfs_prepare_ioend(), which is called from iomap via: > > iomap_submit_ioend() > ->prepare_ioend() > xfs_prepare_ioend() > > we can get there from: > > iomap_writepage() > iomap_do_writepage() > iomap_writepage_map() > iomap_submit_ioend() > iomap_submit_ioend() > > and: > > iomap_writepages() > write_cache_pages() > iomap_do_writepage() > iomap_writepage_map() > iomap_submit_ioend() > iomap_submit_ioend() > > Which says that we really should be putting both iomap_writepage() > and iomap_writepages() under PF_MEMALLOC_NOFS context so that > filesystem callouts don't have to repeatedly enter and exit > PF_MEMALLOC_NOFS context to avoid memory reclaim recursion... > Looks reasonable. I will update this patch to put both iomap_writepage() and iomap_writepages() under PF_MEMALLOC_NOFS context and try to remove the usage of PF_MEMALLOC_NOFS from the filesystem callouts. -- Thanks Yafang