On Mon, Jun 15, 2020 at 08:07:36AM -0700, Matthew Wilcox wrote: > On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote: > > On Mon 15-06-20 16:25:52, Holger Hoffstätte wrote: > > > On 2020-06-15 13:56, Yafang Shao wrote: > > [...] > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > > > index b356118..1ccfbf2 100644 > > > > --- a/fs/xfs/xfs_aops.c > > > > +++ b/fs/xfs/xfs_aops.c > > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend) > > > > struct writeback_control *wbc) > > > > { > > > > struct xfs_writepage_ctx wpc = { }; > > > > + unsigned int nofs_flag; > > > > + int ret; > > > > xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); > > > > - return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); > > > > + > > > > + /* > > > > + * We can allocate memory here while doing writeback on behalf of > > > > + * memory reclaim. To avoid memory allocation deadlocks set the > > > > + * task-wide nofs context for the following operations. > > > > + */ > > > > + nofs_flag = memalloc_nofs_save(); > > > > + ret = iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); > > > > + memalloc_nofs_restore(nofs_flag); > > > > + > > > > + return ret; > > > > } > > > > STATIC int > > > > > > > > > > Not sure if I did something wrong, but while the previous version of this patch > > > worked fine, this one gave me (with v2 removed obviously): > > > > > > [ +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1544 iomap_do_writepage+0x6b4/0x780 > > > > This corresponds to > > /* > > * Given that we do not allow direct reclaim to call us, we should > > * never be called in a recursive filesystem reclaim context. > > */ > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > > goto redirty; > > > > which effectivelly says that memalloc_nofs_save/restore cannot be used > > for that code path. Your stack trace doesn't point to a reclaim path > > which shows that this path is shared and also underlines that this is > > not really an intended use of the api. Please refer to > > Documentation/core-api/gfp_mask-from-fs-io.rst for more details but > > shortly the API should be used at the layer which defines a context > > which shouldn't allow to recurse. E.g. a lock which would be problematic > > in the reclaim recursion path. > > I'm concerned that this might not be the correct approach. Generally we > have the rule that freeing memory should not require allocating memory > to succeed. Since XFS obviously does need to allocate memory to start > a transaction, this allocation should normally be protected by a mempool > or similar. <sigh> We've been down this path before about exactly how much memory XFS uses to do extent allocation and why mempools nor any existing mm infrastructure can actually guarantee filesystem transactions forwards progress. e.g: https://lwn.net/Articles/636797/ It's even more problematic now that extent allocation is not jsut one transaction now, but is a series of overlapping linked transactions that update things like refcount and reverse mapping btrees, and not just the freespace and inode block map btrees... > XFS is subtle and has its own rules around memory allocation and > writeback, so I don't want to say this is definitely wrong. I'm just > saying that I have concerns it might not be right. Almost all XFS memory allocations within the writeback path are within transaction contexts - the transaction allocation itself is an exception, but then xfs_trans_alloc() sets PF_MEMALLOC_NOFS itself (used to be PF_FSTRANS, as per my previous comments in this thread). Hence putting the entire ->writepages path under NOFS context won't change anything materially for XFS. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx