[PATCH RFC 2/2] xfs: optimize eof page flush for iomap zeroing on truncate

The flush that occurs just before xfs_truncate_page() during a
non-extending truncate exists to avoid potential stale data exposure
problems when iomap zeroing might be racing with buffered writes
over unwritten extents. However, we've had reports of this causing
significant performance regressions on overwrite workloads where the
flush serves no correctness purpose. For example, the uuidd
mechanism stores time metadata to a file on every generation
sequence. This involves a buffered (over)write followed by a
truncate of the file to its current size. If these uuids are used as
transaction IDs for a database application, then overall performance
can suffer tremendously from the repeated flushing on every truncate.
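
For illustration only, the access pattern described above can be
approximated from userspace with the sketch below. The helper name
overwrite_then_truncate() and the use of pwrite()/ftruncate() are
assumptions made for this example; this is not taken from the uuidd
sources and only mimics the overwrite-then-truncate sequence.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite the first len bytes of a state file
 * in place, then truncate the file to the size it already has.
 * Returns 0 on success, -1 on error.
 */
static int overwrite_then_truncate(const char *path, const void *buf,
				   size_t len)
{
	struct stat st;
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	/* Buffered (over)write at offset 0; this dirties pagecache. */
	if (pwrite(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	if (fstat(fd, &st) < 0)
		goto out;

	/* Non-extending truncate to the current size: no data is removed. */
	if (ftruncate(fd, st.st_size) < 0)
		goto out;

	ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling this in a loop on an XFS file is one way to observe the cost
of the flush on each truncate; actual numbers depend on the storage
and are not claimed here.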

To avoid this problem, update the truncate path to only flush in
scenarios that are known to conflict with iomap zeroing. iomap skips
zeroing when it sees a hole or unwritten extent, so this essentially
means the filesystem should flush if the new EOF block is a hole or
unwritten extent with outstanding dirty pagecache, and can skip the
flush otherwise.
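
The rule above can be modelled as a small predicate. This is a
userspace sketch only: enum eof_mapping and eof_flush_needed() are
hypothetical names invented for this example and are not kernel
interfaces. The patch itself derives the same two inputs from
filemap_range_needs_writeback() and a data fork extent lookup.

```c
#include <stdbool.h>

/* Hypothetical model of what backs the block containing the new EOF. */
enum eof_mapping {
	EOF_MAP_HOLE,		/* no data fork extent covers the block */
	EOF_MAP_UNWRITTEN,	/* allocated but unwritten extent */
	EOF_MAP_WRITTEN,	/* normal written extent */
};

/*
 * Return true if the EOF page should be flushed before zeroing.
 * iomap skips zeroing over holes and unwritten extents, so dirty
 * pagecache over either one must be written back first.
 */
static bool eof_flush_needed(bool eof_page_dirty, enum eof_mapping map)
{
	/* Without dirty pagecache there is nothing to write back. */
	if (!eof_page_dirty)
		return false;

	/* Over a written extent iomap zeroes the range, so skip the flush. */
	return map == EOF_MAP_HOLE || map == EOF_MAP_UNWRITTEN;
}
```

In this model the overwrite workload described earlier maps to
(dirty, EOF_MAP_WRITTEN), which is the case where the flush is
skipped.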

The ideal longer term solution here is to avoid the need to flush
entirely and allow the zeroing to detect a dirty page and zero it
accordingly, but this is a bit more involved in that it may involve
the iomap interface. The purpose of this change is therefore to
prioritize addressing the performance regression in a straightforward
enough manner that it can be separated from further improvements.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---
 fs/xfs/xfs_iops.c | 44 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d31e64db243f..37f78117557e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -782,7 +782,15 @@ xfs_truncate_zeroing(
 	xfs_off_t		newsize,
 	bool			*did_zeroing)
 {
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode		*inode = VFS_I(ip);
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	xfs_off_t		end;
+	xfs_fileoff_t		end_fsb = XFS_B_TO_FSBT(mp, newsize);
 	int			error;
+	bool			found;
 
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
@@ -790,16 +798,40 @@ xfs_truncate_zeroing(
 				did_zeroing);
 	}
 
+	/*
+	 * No zeroing occurs if newsize is block aligned (or zero). The eof page
+	 * is partially zeroed by the pagecache truncate, if necessary, and
+	 * post-eof blocks are removed.
+	 */
+	if ((newsize & (i_blocksize(inode) - 1)) == 0)
+		return 0;
+
 	/*
 	 * iomap won't detect a dirty page over an unwritten block (or a cow
 	 * block over a hole) and subsequently skips zeroing the newly post-EOF
-	 * portion of the page. Flush the new EOF to convert the block before
-	 * the pagecache truncate.
+	 * portion of the page. To ensure proper zeroing occurs, flush the eof
+	 * page if it is dirty and backed by a hole or unwritten extent in the
+	 * data fork. This ensures that iomap sees the eof block in a state that
+	 * warrants zeroing.
+	 *
+	 * This should eventually be handled in iomap processing so we don't
+	 * have to flush at all. We do it here for now to avoid the additional
+	 * latency in cases where it's not absolutely required.
 	 */
-	error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, newsize - 1,
-					     newsize - 1);
-	if (error)
-		return error;
+	end = newsize - 1;
+	if (filemap_range_needs_writeback(inode->i_mapping, end, end)) {
+		xfs_ilock(ip, XFS_ILOCK_SHARED);
+		found = xfs_iext_lookup_extent(ip, ifp, end_fsb, &icur, &got);
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+		if (!found || got.br_startoff > end_fsb ||
+		    got.br_state == XFS_EXT_UNWRITTEN) {
+			error = filemap_write_and_wait_range(inode->i_mapping,
+					end, end);
+			if (error)
+				return error;
+		}
+	}
 	return xfs_truncate_page(ip, newsize, did_zeroing);
 }
 
-- 
2.37.3



