On Thu, Jan 17, 2019 at 02:20:02PM -0500, Brian Foster wrote:
> The writeback code caches the current extent mapping across multiple
> xfs_do_writepage() calls to avoid repeated lookups for sequential
> pages backed by the same extent. This is known to be slightly racy
> with extent fork changes in certain difficult to reproduce
> scenarios. The cached extent is trimmed to within EOF to help avoid
> the most common vector for this problem via speculative
> preallocation management, but this is a band-aid that does not
> address the fundamental problem.
> 
> Now that we have an xfs_ifork sequence counter mechanism used to
> facilitate COW writeback, we can use the same mechanism to validate
> consistency between the data fork and cached writeback mappings. On
> its face, this is somewhat of a big hammer approach because any
> change to the data fork invalidates any mapping currently cached by
> a writeback in progress regardless of whether the data fork change
> overlaps with the range under writeback. In practice, however, the
> impact of this approach is minimal in most cases.
> 
> First, data fork changes (delayed allocations) caused by sustained
> sequential buffered writes are amortized across speculative
> preallocations. This means that a cached mapping won't be
> invalidated by each buffered write of a common file copy workload,
> but rather only on less frequent allocation events. Second, the
> extent tree is always entirely in-core so an additional lookup of a
> usable extent mostly costs a shared ilock cycle and in-memory tree
> lookup. This means that a cached mapping reval is relatively cheap
> compared to the I/O itself. Third, spurious invalidations don't
> impact ioend construction. This means that even if the same extent
> is revalidated multiple times across multiple writepage instances,
> we still construct and submit the same size ioend (and bio) if the
> blocks are physically contiguous.
> 
> Update struct xfs_writepage_ctx with a new field to hold the
> sequence number of the data fork associated with the currently
> cached mapping. Check the wpc seqno against the data fork when the
> mapping is validated and reestablish the mapping whenever the fork
> has changed since the mapping was cached. This ensures that
> writeback always uses a valid extent mapping and thus prevents lost
> writebacks and stale delalloc block problems.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_aops.c  | 60 ++++++++++++++++++++++++++++++++++++----------
>  fs/xfs/xfs_iomap.c |  5 ++--
>  2 files changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index d9048bcea49c..649e4ad76add 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -29,6 +29,7 @@
>  struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec	imap;
>  	unsigned int		io_type;
> +	unsigned int		data_seq;
>  	unsigned int		cow_seq;
>  	struct xfs_ioend	*ioend;
>  };
> @@ -301,6 +302,42 @@ xfs_end_bio(
>  	xfs_destroy_ioend(ioend, blk_status_to_errno(bio->bi_status));
>  }
> 
> +/*
> + * Fast revalidation of the cached writeback mapping. Return true if the current
> + * mapping is valid, false otherwise.
> + */
> +static bool
> +xfs_imap_valid(
> +	struct xfs_writepage_ctx	*wpc,
> +	struct xfs_inode		*ip,
> +	xfs_fileoff_t			offset_fsb)
> +{
> +	/*
> +	 * If this is a COW mapping, it is sufficient to check that the mapping
> +	 * covers the offset. Be careful to check this first because the caller
> +	 * can revalidate a COW mapping without updating the data seqno.
> +	 */
> +	if (offset_fsb < wpc->imap.br_startoff ||
> +	    offset_fsb >= wpc->imap.br_startoff + wpc->imap.br_blockcount)
> +		return false;
> +	if (wpc->io_type == XFS_IO_COW)
> +		return true;

Maybe the comment should move below the first range check?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
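For illustration only (not part of the patch), here is a standalone sketch of the revalidation ordering under discussion, with the COW comment placed below the range check as suggested. The types and the `imap_valid` name are simplified stand-ins for the real XFS structures, with only the fields the check touches:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the XFS types; not the real definitions. */
typedef uint64_t xfs_fileoff_t;

enum { XFS_IO_DELALLOC, XFS_IO_UNWRITTEN, XFS_IO_OVERWRITE, XFS_IO_COW };

struct xfs_bmbt_irec {
	xfs_fileoff_t	br_startoff;	/* starting file offset (fsb) */
	xfs_fileoff_t	br_blockcount;	/* extent length (fsb) */
};

struct xfs_ifork {
	unsigned int	if_seq;		/* fork sequence counter */
};

struct xfs_inode {
	struct xfs_ifork i_df;		/* data fork */
};

struct xfs_writepage_ctx {
	struct xfs_bmbt_irec	imap;
	unsigned int		io_type;
	unsigned int		data_seq;	/* data fork seqno at cache time */
	unsigned int		cow_seq;
};

/*
 * Revalidate the cached mapping: range check first, then the COW
 * short-circuit (comment moved below the range check, per the review),
 * then the data fork seqno comparison.
 */
static bool
imap_valid(struct xfs_writepage_ctx *wpc, struct xfs_inode *ip,
	   xfs_fileoff_t offset_fsb)
{
	if (offset_fsb < wpc->imap.br_startoff ||
	    offset_fsb >= wpc->imap.br_startoff + wpc->imap.br_blockcount)
		return false;
	/*
	 * A COW mapping only needs to cover the offset; the caller may
	 * revalidate it without updating the data seqno.
	 */
	if (wpc->io_type == XFS_IO_COW)
		return true;
	/* Any data fork change since caching forces a fresh lookup. */
	return wpc->data_seq == ip->i_df.if_seq;
}
```

The ordering matters because only non-COW mappings depend on the data fork seqno, so the COW test must follow the range check but precede the seqno comparison.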