xfs_do_writepage() currently returns errors directly regardless of whether it is called via ->writepages() or ->writepage(). In the case of ->writepages(), an xfs_do_writepage() error return breaks the current writeback sequence in write_cache_pages(). This means that an integrity writeback (i.e., sync), for example, returns before all associated pages have been processed. This can be problematic in cases like unmount. If the writeback doesn't process all delalloc pages before unmounting, we end up reclaiming inodes with non-zero delalloc block counts. In turn, this breaks block accounting and leaves the fs inconsistent. XFS explicitly discards delalloc blocks on such writepage failures to avoid this problem. This isn't terribly useful if we allow an integrity writeback to complete (and thus a filesystem to unmount) without addressing the entire set of dirty pages on an inode. Therefore, change ->writepage[s]() to track high level error state in the xfs_writepage_ctx structure and return it from the higher level operation callout rather than xfs_do_writepage(). This ensures that write_cache_pages() does not exit prematurely when called via ->writepages(), but both ->writepage() and ->writepages() still ultimately return an error for the higher level operation. This patch introduces a subtle change in the behavior of background writeback in the event of persistent errors. The current behavior of returning an error preempts the background writeback. Writeback eventually comes around again and repeats the process for a few more pages (in practice) before it once again fails. This repeats over and over until the entire set of dirty pages is cleaned. This behavior results in a somewhat slower stream of "page discard" errors in the system log and dictates that many repeated fsync calls may be required before the entire data set is processed and mapping error consumed. With this change in place, background writeback executes on as many pages as necessary as if each page writeback were successful. The pages are cleaned immediately and significantly more page discard errors can be observed at once. Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> --- fs/xfs/xfs_aops.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 3feae3691467..438cfc66a40e 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -32,6 +32,7 @@ struct xfs_writepage_ctx { unsigned int io_type; unsigned int cow_seq; struct xfs_ioend *ioend; + int error; }; struct block_device * @@ -798,7 +799,9 @@ xfs_writepage_map( end_page_writeback(page); done: mapping_set_error(page->mapping, error); - return error; + if (!wpc->error) + wpc->error = error; + return 0; } /* @@ -929,8 +932,8 @@ xfs_vm_writepage( ret = xfs_do_writepage(page, wbc, &wpc); if (wpc.ioend) - ret = xfs_submit_ioend(wbc, wpc.ioend, ret); - return ret; + ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error); + return ret ? ret : wpc.error; } STATIC int @@ -946,8 +949,8 @@ xfs_vm_writepages( xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc); if (wpc.ioend) - ret = xfs_submit_ioend(wbc, wpc.ioend, ret); - return ret; + ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error); + return ret ? ret : wpc.error; } STATIC int -- 2.17.2