Re: [PATCH] xfs: trim writepage mapping to within eof

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 13 Oct 2017 07:42:31 -0400

On Fri, Oct 13, 2017 at 08:22:44AM +1100, Dave Chinner wrote:
> On Thu, Oct 12, 2017 at 07:47:54AM -0400, Brian Foster wrote:
> > The writeback rework in commit fbcc02561359 ("xfs: Introduce
> > writeback context for writepages") introduced a subtle change in
> > behavior with regard to the block mapping used across the
> > ->writepages() sequence. The previous xfs_cluster_write() code would
> > only flush pages up to EOF at the time of the writepage, thus
> > ensuring that any pages due to file-extending writes would be
> > handled on a separate cycle and with a new, updated block mapping.
> > 
> > The updated code establishes a block mapping in xfs_writepage_map()
> > that could extend beyond EOF if the file has post-eof preallocation.
> > Because we now use the generic writeback infrastructure and pass the
> > cached mapping to each writepage call, there is no implicit EOF
> > limit in place. If eofblocks trimming occurs during ->writepages(),
> > any post-eof portion of the cached mapping becomes invalid. The
> > eofblocks code has no means to serialize against writeback because
> > there are no pages associated with post-eof blocks. Therefore if an
> > eofblocks trim occurs and is followed by a file-extending buffered
> > write, not only has the mapping become invalid, but we could end up
> > writing a page to disk based on the invalid mapping.
> > 
> > Consider the following sequence of events:
> > 
> > - A buffered write creates a delalloc extent and post-eof
> >   speculative preallocation.
> > - Writeback starts and on the first writepage cycle, the delalloc
> >   extent is converted to real blocks (including the post-eof blocks)
> >   and the mapping is cached.
> > - The file is closed and xfs_release() trims post-eof blocks. The
> >   cached writeback mapping is now invalid.
> > - Another buffered write appends the file with a delalloc extent.
> > - The concurrent writeback cycle picks up the just written page
> >   because the writeback range end is LLONG_MAX. xfs_writepage_map()
> >   attributes it to the (now invalid) cached mapping and writes the
> >   data to an incorrect location on disk (and where the file offset is
> >   still backed by a delalloc extent).
> > 
> > This problem is reproduced by xfstests test generic/463, which
> > triggers racing writes, appends, open/closes and writeback requests.
> > 
> > To address this problem, trim the mapping used during writeback to
> > within EOF when the mapping is created. This ensures the mapping is
> > revalidated for any pages encountered beyond EOF as of the time the
> > current mapping was cached.
> > 
> > Reported-by: Eryu Guan <eguan@xxxxxxxxxx>
> > Diagnosed-by: Eryu Guan <eguan@xxxxxxxxxx>
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> > 
> > Hi all,
> > 
> > This is a followup to the issue Eryu tracked down, described here[1].
> > 
> > Note that this patch will not deal with any writeback mapping validity
> > issues not associated with eofblocks management. Dave is working on a
> > more generic approach to deal with such problems. This patch is intended
> > to be a targeted and backportable fix for the regression in the
> > writeback code.
> > 
> > Brian
> > 
> > [1] https://marc.info/?l=linux-xfs&m=150406724427829&w=2
> > 
> >  fs/xfs/libxfs/xfs_bmap.c | 11 +++++++++++
> >  fs/xfs/libxfs/xfs_bmap.h |  1 +
> >  fs/xfs/xfs_aops.c        |  6 ++++--
> >  3 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 044a363..dd3fb7b 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3852,6 +3852,17 @@ xfs_trim_extent(
> >  	}
> >  }
> >  
> > +/* trim extent to within eof */
> > +void
> > +xfs_trim_extent_eof(
> > +	struct xfs_bmbt_irec	*irec,
> > +	struct xfs_inode	*ip)
> > +
> > +{
> > +	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
> > +					      i_size_read(VFS_I(ip))));
> > +}
> 
> Ok, so it's an unlocked, instantaneous sample of the inode size.
> Truncate can race with this and still occur any time after we've
> trimmed the extent but still have it cached.
> 

Yeah, but note that this is really only intended to deal with writeback
racing with eofblocks trimming. I'm not sure we can fully close any
other truncate/writeback issues without your broader, more generic
mapping invalidation work.

With regard to eofblocks trimming, I don't think it can hurt us after
we've trimmed the cached mapping to EOF, because the mapping no longer
covers any post-eof blocks. Any new eofblocks that come in due to new
buffered writes aren't discovered until we acquire a new mapping.
Truncates up or down that occur before we actually trim the cached
mapping are reflected in the size we sample, so they also rule out an
eofblocks trim on file release causing a problem for the writeback cycle.
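
To make the trimming concrete, here is a rough userspace model of what
xfs_trim_extent_eof() does to the cached mapping. This is only a sketch:
struct mapping, bytes_to_blocks(), trim_mapping() and trim_mapping_eof()
are hypothetical stand-ins for struct xfs_bmbt_irec, XFS_B_TO_FSB(),
xfs_trim_extent() and the new helper, and it ignores delalloc/hole start
block handling and locking entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_bmbt_irec; units are fs blocks. */
struct mapping {
	uint64_t startoff;	/* first file block covered */
	uint64_t startblock;	/* first disk block backing startoff */
	uint64_t blockcount;	/* length of the mapping in blocks */
};

/* Round a byte count up to whole blocks (assumed XFS_B_TO_FSB() analogue). */
static uint64_t bytes_to_blocks(uint64_t bytes, uint32_t blocksize)
{
	assert(blocksize != 0);
	return (bytes + blocksize - 1) / blocksize;
}

/* Clamp the mapping to [bno, bno + len); zero its length if no overlap. */
static void trim_mapping(struct mapping *map, uint64_t bno, uint64_t len)
{
	uint64_t end = bno + len;
	uint64_t map_end = map->startoff + map->blockcount;

	if (map_end <= bno || map->startoff >= end) {
		map->blockcount = 0;
		return;
	}

	/* Trim the front and keep the disk block in step with the offset. */
	if (map->startoff < bno) {
		uint64_t distance = bno - map->startoff;

		map->startblock += distance;
		map->startoff += distance;
		map->blockcount -= distance;
	}

	/* Trim the tail; map_end is unchanged by the front trim above. */
	if (end < map_end)
		map->blockcount -= map_end - end;
}

/* Trim the mapping to within EOF, using a sampled file size in bytes. */
static void trim_mapping_eof(struct mapping *map, uint64_t isize,
			     uint32_t blocksize)
{
	trim_mapping(map, 0, bytes_to_blocks(isize, blocksize));
}
```

For example, a 16 block mapping at file offset 0 on a 5000 byte file
with 4k blocks ends up 2 blocks long, so the post-eof portion that an
eofblocks trim could free is no longer part of the cached mapping.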

> As such, I'm thinking this EOF trimming should be put in
> xfs_imap_valid() - not xfs_map_blocks() - so it gets revalidated
> every time we check to see if the map covers the current file
> extent...
> 

I'm not sure it's necessary, but it seems Ok to me to be slightly more
aggressive in the validation as long as it's clear (which I think it is
;P) that it isn't intended to technically close any issues unrelated to
eofblocks. I'll give it a shot.
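
Roughly what I have in mind, again as a userspace sketch rather than the
real thing: struct cached_map and cached_map_valid() are hypothetical
names modelling the cached mapping and an xfs_imap_valid() style check,
and the size is passed in as a plain argument rather than sampled from
the inode. The point is only that the trim is repeated on every validity
check instead of once when the mapping is created.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the cached writeback mapping, in fs blocks. */
struct cached_map {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* length of the mapping in blocks */
};

/*
 * Return true if the cached mapping still covers the byte offset.
 * The mapping is re-trimmed to the current EOF first, so it can only
 * shrink over its lifetime and never grows back past a previous trim.
 */
static bool cached_map_valid(struct cached_map *map, uint64_t offset,
			     uint64_t isize, uint32_t blocksize)
{
	uint64_t eof_fsb = (isize + blocksize - 1) / blocksize;
	uint64_t end = map->startoff + map->blockcount;
	uint64_t fsb = offset / blocksize;

	/* Drop whatever part of the mapping now sits beyond EOF. */
	if (map->startoff >= eof_fsb)
		map->blockcount = 0;
	else if (end > eof_fsb)
		map->blockcount = eof_fsb - map->startoff;

	return fsb >= map->startoff &&
	       fsb < map->startoff + map->blockcount;
}
```

With this shape, a page that lands beyond the EOF seen at any earlier
check fails validation and forces a fresh mapping lookup, even if the
file has since been extended again.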

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html