On Tue 16-06-20 09:06:05, Dave Chinner wrote: > On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote: > > On Mon 15-06-20 16:25:52, Holger Hoffstätte wrote: > > > On 2020-06-15 13:56, Yafang Shao wrote: > > [...] > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > > > index b356118..1ccfbf2 100644 > > > > --- a/fs/xfs/xfs_aops.c > > > > +++ b/fs/xfs/xfs_aops.c > > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend) > > > > struct writeback_control *wbc) > > > > { > > > > struct xfs_writepage_ctx wpc = { }; > > > > + unsigned int nofs_flag; > > > > + int ret; > > > > xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); > > > > - return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); > > > > + > > > > + /* > > > > + * We can allocate memory here while doing writeback on behalf of > > > > + * memory reclaim. To avoid memory allocation deadlocks set the > > > > + * task-wide nofs context for the following operations. > > > > + */ > > > > + nofs_flag = memalloc_nofs_save(); > > > > + ret = iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); > > > > + memalloc_nofs_restore(nofs_flag); > > > > + > > > > + return ret; > > > > } > > > > STATIC int > > > > > > > > > > Not sure if I did something wrong, but while the previous version of this patch > > > worked fine, this one gave me (with v2 removed obviously): > > > > > > [ +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1544 iomap_do_writepage+0x6b4/0x780 > > > > This corresponds to > > /* > > * Given that we do not allow direct reclaim to call us, we should > > * never be called in a recursive filesystem reclaim context. > > */ > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > > goto redirty; > > > > which effectivelly says that memalloc_nofs_save/restore cannot be used > > for that code path. > > No it doesn't. Everyone is ignoring the -context- of this code, most > especially the previous 3 lines of code and it's comment: > > /* > * Refuse to write the page out if we are called from reclaim context. > * > * This avoids stack overflows when called from deeply used stacks in > * random callers for direct reclaim or memcg reclaim. We explicitly > * allow reclaim from kswapd as the stack usage there is relatively low. > * > * This should never happen except in the case of a VM regression so > * warn about it. > */ > if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == > PF_MEMALLOC)) > goto redirty; > > You will see that we specifically avoid this path from reclaim > context except for kswapd. And kswapd always runs with GFP_KERNEL > context so we allow writeback from it, so it will pass both this > check and the NOFS check above. > > IOws, we can't trigger to the WARN_ON_ONCE(current->flags & > PF_MEMALLOC_NOFS)) check from a memory reclaim context: this > PF_MEMALLOC_NOFS check here is not doing what people think it is. You are right. > History lesson time, eh? > > The recursion protection here -used- to use PF_FSTRANS, not > PF_MEMALLOC_NOFS. See commit 9070733b4efac ("xfs: abstract > PF_FSTRANS to PF_MEMALLOC_NOFS"). This hunk is most instructive > when you look at the comment: > > * Given that we do not allow direct reclaim to call us, we should > * never be called while in a filesystem transaction. > */ > - if (WARN_ON_ONCE(current->flags & PF_FSTRANS)) > + if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > goto redirty; > > It wasn't for memory allocation recursion protection in XFS - it was > for transaction reservation recursion protection by something trying > to flush data pages while holding a transaction reservation. Doing > this could deadlock the journal because the existing reservation > could prevent the nested reservation for being able to reserve space > in the journal and that is a self-deadlock vector. > > IOWs, this check is not protecting against memory reclaim recursion > bugs at all (that's the previous check I quoted). This check is > protecting against the filesystem calling writepages directly from a > context where it can self-deadlock. Thanks for the clarification. > So what we are seeing here is that the PF_FSTRANS -> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information > about what type of error this check was protecting against. I have to admit that I am not familiar with the xfs code and the PF_TRANS abstraction to PF_MEMALLOC_NOFS was mostly automatic and I relied on xfs maintainers to tell me I was doing something stupid. Now after your explanation it makes more sense that the warning is indeed protecting from a different kind of issue. -- Michal Hocko SUSE Labs