[PATCH 2/2] xfs: don't preempt writeback sequence on single page wb error

xfs_do_writepage() currently returns errors directly regardless of
whether it is called via ->writepages() or ->writepage(). In the
case of ->writepages(), an xfs_do_writepage() error return breaks
the current writeback sequence in write_cache_pages(). This means
that an integrity writeback (i.e., sync), for example, returns
before all associated pages have been processed.

This can be problematic in cases like unmount. If the writeback
doesn't process all delalloc pages before unmounting, we end up
reclaiming inodes with non-zero delalloc block counts. In turn, this
breaks block accounting and leaves the fs inconsistent.

XFS explicitly discards delalloc blocks on such writepage failures
to avoid this problem. That discard is of little use, however, if an
integrity writeback can complete (and thus a filesystem can unmount)
without addressing the entire set of dirty pages on an inode.
Therefore, change ->writepage[s]() to track high level error state
in the xfs_writepage_ctx structure and return it from the higher
level operation callout rather than xfs_do_writepage(). This ensures
that write_cache_pages() does not exit prematurely when called via
->writepages(), but both ->writepage() and ->writepages() still
ultimately return an error for the higher level operation.
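
As a rough userspace illustration of this pattern (not the kernel code),
the sketch below models a page walk that stops on the first nonzero
callback return, and compares a callback that returns the error directly
with one that records it in a context structure. The names wb_ctx,
walk_pages(), do_page_old(), do_page_new() and writepages_model() are
hypothetical, and the loop is only a simplified stand-in for
write_cache_pages().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_writepage_ctx: only the new field. */
struct wb_ctx {
	int error;	/* first error seen during this writeback pass */
};

typedef int (*writepage_fn)(int page_status, struct wb_ctx *ctx);

/* Old behavior: hand the per-page error straight back to the page walk. */
static int do_page_old(int page_status, struct wb_ctx *ctx)
{
	(void)ctx;
	return page_status;
}

/* New behavior: remember the first error, report success to the walk. */
static int do_page_new(int page_status, struct wb_ctx *ctx)
{
	if (!ctx->error)
		ctx->error = page_status;
	return 0;
}

/*
 * Simplified model of the page walk: stop at the first nonzero callback
 * return. The number of pages visited is stored in *visited.
 */
static int walk_pages(const int *pages, size_t n, writepage_fn fn,
		      struct wb_ctx *ctx, size_t *visited)
{
	int ret = 0;
	size_t i;

	assert(visited != NULL);
	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		ret = fn(pages[i], ctx);
		if (ret)
			break;
	}
	return ret;
}

/* Model of the ->writepages() tail: walk error first, else tracked error. */
static int writepages_model(const int *pages, size_t n, writepage_fn fn,
			    size_t *visited)
{
	struct wb_ctx ctx = { 0 };
	int ret = walk_pages(pages, n, fn, &ctx, visited);

	return ret ? ret : ctx.error;
}
```

With a page array such as { 0, -EIO, 0, -ENOSPC }, the old-style
callback stops the walk at the second page, while the new-style callback
visits all four. Both variants still report -EIO to the caller, since
only the first error is recorded.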

This patch introduces a subtle change in the behavior of background
writeback in the event of persistent errors. The current behavior of
returning an error preempts the background writeback. Writeback
eventually comes around again and repeats the process for a few more
pages (in practice) before it once again fails. This repeats over
and over until the entire set of dirty pages is cleaned. The result
is a somewhat slower stream of "page discard" errors in the system
log, and many repeated fsync calls may be required before the entire
data set is processed and the mapping error is consumed. With this
change in place, background writeback executes on as many pages as
necessary, as if each page writeback were successful. The pages are
cleaned immediately and significantly more page discard errors can
be observed at once.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---
 fs/xfs/xfs_aops.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3feae3691467..438cfc66a40e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -32,6 +32,7 @@ struct xfs_writepage_ctx {
 	unsigned int		io_type;
 	unsigned int		cow_seq;
 	struct xfs_ioend	*ioend;
+	int			error;
 };
 
 struct block_device *
@@ -798,7 +799,9 @@ xfs_writepage_map(
 		end_page_writeback(page);
 done:
 	mapping_set_error(page->mapping, error);
-	return error;
+	if (!wpc->error)
+		wpc->error = error;
+	return 0;
 }
 
 /*
@@ -929,8 +932,8 @@ xfs_vm_writepage(
 
 	ret = xfs_do_writepage(page, wbc, &wpc);
 	if (wpc.ioend)
-		ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
-	return ret;
+		ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error);
+	return ret ? ret : wpc.error;
 }
 
 STATIC int
@@ -946,8 +949,8 @@ xfs_vm_writepages(
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
 	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
 	if (wpc.ioend)
-		ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
-	return ret;
+		ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error);
+	return ret ? ret : wpc.error;
 }
 
 STATIC int
-- 
2.17.2



