Patch "xfs: fix incorrect log_flushed on fsync" has been added to the 4.13-stable tree

This is a note to let you know that I've just added the patch titled

    xfs: fix incorrect log_flushed on fsync

to the 4.13-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-fix-incorrect-log_flushed-on-fsync.patch
and it can be found in the queue-4.13 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From foo@baz Mon Sep 18 10:25:08 CEST 2017
From: Christoph Hellwig <hch@xxxxxx>
Date: Sun, 17 Sep 2017 14:06:28 -0700
Subject: xfs: fix incorrect log_flushed on fsync
To: stable@xxxxxxxxxxxxxxx
Cc: linux-xfs@xxxxxxxxxxxxxxx, Amir Goldstein <amir73il@xxxxxxxxx>, Josef Bacik <jbacik@xxxxxx>, "Darrick J . Wong" <darrick.wong@xxxxxxxxxx>
Message-ID: <20170917210631.10725-23-hch@xxxxxx>

From: Amir Goldstein <amir73il@xxxxxxxxx>

commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to a log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffer

xfs_file_fsync() checks the value of log_flushed after the
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

Write X is not guaranteed to be on persistent storage
when the PREFLUSH request is completed, because write X was submitted
after the PREFLUSH request, but xfs_file_fsync() of task B will
be notified of log_flushed=1 and will skip the explicit flush.

If the system crashes after the fsync of task B, write X may not be
present on disk after reboot.
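
The interleaving above can be replayed in a small userspace model. This
is only a sketch under stated assumptions, not kernel code: struct
toy_dev, the toy_*() helpers and toy_race() are hypothetical names, the
device is reduced to a volatile cache plus a flush that covers only the
writes completed before the flush was submitted, and the other checks
the real xfs_file_fsync() makes before flushing are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WRITES 8

/* Hypothetical toy block device with a volatile write cache. */
struct toy_dev {
	bool in_cache[MAX_WRITES];     /* write completed (endio) */
	bool persistent[MAX_WRITES];   /* write is on stable storage */
	bool flush_covers[MAX_WRITES]; /* snapshot taken at flush submit */
};

/* A completed write sits in the volatile cache until it is flushed. */
static void toy_write_complete(struct toy_dev *dev, int id)
{
	assert(id >= 0 && id < MAX_WRITES);
	dev->in_cache[id] = true;
}

/* A flush only covers writes that completed before it was submitted. */
static void toy_flush_submit(struct toy_dev *dev)
{
	for (int i = 0; i < MAX_WRITES; i++)
		dev->flush_covers[i] = dev->in_cache[i];
}

/* On completion, the writes covered at submit time become persistent. */
static void toy_flush_complete(struct toy_dev *dev)
{
	for (int i = 0; i < MAX_WRITES; i++)
		if (dev->flush_covers[i])
			dev->persistent[i] = true;
}

/*
 * Replay the Task A / Task B sequence from the commit message.
 * waiter_sets_flushed == true models the old behaviour, where merely
 * waiting on a syncing log buffer reports log_flushed = 1.
 * Returns whether write X is persistent when task B's fsync returns.
 */
static bool toy_race(bool waiter_sets_flushed)
{
	struct toy_dev dev = {0};
	const int X = 0;
	int log_flushed_b = 0;

	toy_flush_submit(&dev);        /* A: xlog_sync() submits PREFLUSH */
	toy_write_complete(&dev, X);   /* B: WRITE X submitted and completed */

	if (waiter_sets_flushed)       /* B: xlog_wait() on A's log buffer */
		log_flushed_b = 1;

	toy_flush_complete(&dev);      /* A's PREFLUSH completes, B wakes up */

	if (!log_flushed_b) {          /* B: explicit flush unless skipped */
		toy_flush_submit(&dev);
		toy_flush_complete(&dev);
	}
	return dev.persistent[X];
}
```

In this model toy_race(true) leaves write X only in the cache, while
toy_race(false), which corresponds to the waiter no longer reporting
log_flushed, makes task B issue its own flush and persist write X.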

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block I/O operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops on a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value
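
The replay-and-compare step of that test can be sketched as below. This
is a toy model under loud assumptions, not the dm-log-writes format or
tooling: struct toy_op, struct toy_mark and the toy_*() helpers are
hypothetical, each recorded op is a single-byte write, and a 32-bit
FNV-1a hash stands in for md5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILE_SIZE 16

/* Hypothetical recorded op: one single-byte write at an offset. */
struct toy_op {
	size_t off;
	uint8_t val;
};

/* Hypothetical mark: log position and file checksum at fsync time. */
struct toy_mark {
	size_t nr_ops;
	uint32_t sum;
};

/* 32-bit FNV-1a hash, used here only as a stand-in for md5. */
static uint32_t toy_sum(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Replay the first nr_ops recorded writes onto a zeroed image. */
static void toy_replay(const struct toy_op *log, size_t nr_ops, uint8_t *img)
{
	memset(img, 0, FILE_SIZE);
	for (size_t i = 0; i < nr_ops; i++) {
		assert(log[i].off < FILE_SIZE);
		img[log[i].off] = log[i].val;
	}
}

/* True if replaying the log up to the mark reproduces the stored sum. */
static bool toy_check_mark(const struct toy_op *log,
			   const struct toy_mark *mark)
{
	uint8_t img[FILE_SIZE];

	toy_replay(log, mark->nr_ops, img);
	return toy_sum(img, FILE_SIZE) == mark->sum;
}
```

A mark whose checksum was taken after a write that the replayed prefix
does not contain fails the comparison, which is the kind of mismatch
the test described above looks for.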

Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Josef Bacik <jbacik@xxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 fs/xfs/xfs_log.c |    7 -------
 1 file changed, 7 deletions(-)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3375,8 +3375,6 @@ maybe_sleep:
 		 */
 		if (iclog->ic_state & XLOG_STATE_IOERROR)
 			return -EIO;
-		if (log_flushed)
-			*log_flushed = 1;
 	} else {
 
 no_sleep:
@@ -3480,8 +3478,6 @@ try_again:
 
 				xlog_wait(&iclog->ic_prev->ic_write_wait,
 							&log->l_icloglock);
-				if (log_flushed)
-					*log_flushed = 1;
 				already_slept = 1;
 				goto try_again;
 			}
@@ -3515,9 +3511,6 @@ try_again:
 			 */
 			if (iclog->ic_state & XLOG_STATE_IOERROR)
 				return -EIO;
-
-			if (log_flushed)
-				*log_flushed = 1;
 		} else {		/* just return */
 			spin_unlock(&log->l_icloglock);
 		}


Patches currently in stable-queue which might be from hch@xxxxxx are

queue-4.13/xfs-open-code-xfs_buf_item_dirty.patch
queue-4.13/xfs-properly-retry-failed-inode-items-in-case-of-error-during-buffer-writeback.patch
queue-4.13/xfs-use-kmem_free-to-free-return-value-of-kmem_zalloc.patch
queue-4.13/xfs-add-infrastructure-needed-for-error-propagation-during-buffer-io-failure.patch
queue-4.13/xfs-don-t-set-v3-xflags-for-v2-inodes.patch
queue-4.13/xfs-toggle-readonly-state-around-xfs_log_mount_finish.patch
queue-4.13/xfs-fix-log-recovery-corruption-error-due-to-tail-overwrite.patch
queue-4.13/xfs-move-bmbt-owner-change-to-last-step-of-extent-swap.patch
queue-4.13/xfs-check-for-race-with-xfs_reclaim_inode-in-xfs_ifree_cluster.patch
queue-4.13/xfs-always-verify-the-log-tail-during-recovery.patch
queue-4.13/xfs-open-code-end_buffer_async_write-in-xfs_finish_page_writeback.patch
queue-4.13/xfs-relog-dirty-buffers-during-swapext-bmbt-owner-change.patch
queue-4.13/xfs-disable-per-inode-dax-flag.patch
queue-4.13/xfs-refactor-buffer-logging-into-buffer-dirtying-helper.patch
queue-4.13/xfs-fix-recovery-failure-when-log-record-header-wraps-log-end.patch
queue-4.13/xfs-skip-bmbt-block-ino-validation-during-owner-change.patch
queue-4.13/xfs-don-t-log-dirty-ranges-for-ordered-buffers.patch
queue-4.13/xfs-stop-searching-for-free-slots-in-an-inode-chunk-when-there-are-none.patch
queue-4.13/xfs-fix-incorrect-log_flushed-on-fsync.patch
queue-4.13/xfs-evict-all-inodes-involved-with-log-redo-item.patch
queue-4.13/xfs-write-unmount-record-for-ro-mounts.patch
queue-4.13/xfs-remove-unnecessary-dirty-bli-format-check-for-ordered-bufs.patch
queue-4.13/xfs-disallow-marking-previously-dirty-buffers-as-ordered.patch
queue-4.13/xfs-handle-efscorrupted-during-head-tail-verification.patch
queue-4.13/xfs-ordered-buffer-log-items-are-never-formatted.patch


