This is a note to let you know that I've just added the patch titled xfs: fix COW writeback race to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: xfs-fix-cow-writeback-race.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From hch@xxxxxx Thu Feb 2 11:15:22 2017 From: Christoph Hellwig <hch@xxxxxx> Date: Thu, 2 Feb 2017 08:56:06 +0100 Subject: xfs: fix COW writeback race To: stable@xxxxxxxxxxxxxxx Cc: linux-xfs@xxxxxxxxxxxxxxx, "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> Message-ID: <1486022171-8076-15-git-send-email-hch@xxxxxx> From: Christoph Hellwig <hch@xxxxxx> commit d2b3964a0780d2d2994eba57f950d6c9fe489ed8 upstream. Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: Christoph Hellwig <hch@xxxxxx> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- fs/xfs/libxfs/xfs_bmap.c | 44 ++++++++++++++++++++++++++++++++------------ fs/xfs/libxfs/xfs_bmap.h | 6 +++++- fs/xfs/xfs_iomap.c | 2 +- 3 files changed, 38 insertions(+), 14 deletions(-) --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4607,8 +4607,6 @@ xfs_bmapi_write( int n; /* current extent index */ xfs_fileoff_t obno; /* old block number (offset) */ int whichfork; /* data or attr fork */ - char inhole; /* current location is hole in file */ - char wasdelay; /* old extent was delayed */ #ifdef DEBUG xfs_fileoff_t orig_bno; /* original block number value */ @@ -4694,22 +4692,44 @@ xfs_bmapi_write( bma.firstblock = firstblock; while (bno < end && n < *nmap) { - inhole = eof || bma.got.br_startoff > bno; - wasdelay = !inhole && isnullstartblock(bma.got.br_startblock); + bool need_alloc = false, wasdelay = false; - /* - * Make sure we only reflink into a hole. - */ - if (flags & XFS_BMAPI_REMAP) - ASSERT(inhole); - if (flags & XFS_BMAPI_COWFORK) - ASSERT(!inhole); + /* in hole or beyoned EOF? */ + if (eof || bma.got.br_startoff > bno) { + if (flags & XFS_BMAPI_DELALLOC) { + /* + * For the COW fork we can reasonably get a + * request for converting an extent that races + * with other threads already having converted + * part of it, as there converting COW to + * regular blocks is not protected using the + * IOLOCK. + */ + ASSERT(flags & XFS_BMAPI_COWFORK); + if (!(flags & XFS_BMAPI_COWFORK)) { + error = -EIO; + goto error0; + } + + if (eof || bno >= end) + break; + } else { + need_alloc = true; + } + } else { + /* + * Make sure we only reflink into a hole. + */ + ASSERT(!(flags & XFS_BMAPI_REMAP)); + if (isnullstartblock(bma.got.br_startblock)) + wasdelay = true; + } /* * First, deal with the hole before the allocated space * that we found, if any. */ - if (inhole || wasdelay) { + if (need_alloc || wasdelay) { bma.eof = eof; bma.conv = !!(flags & XFS_BMAPI_CONVERT); bma.wasdel = wasdelay; --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -110,6 +110,9 @@ struct xfs_extent_free_item /* Map something in the CoW fork. */ #define XFS_BMAPI_COWFORK 0x200 +/* Only convert delalloc space, don't allocate entirely new extents */ +#define XFS_BMAPI_DELALLOC 0x400 + #define XFS_BMAPI_FLAGS \ { XFS_BMAPI_ENTIRE, "ENTIRE" }, \ { XFS_BMAPI_METADATA, "METADATA" }, \ @@ -120,7 +123,8 @@ struct xfs_extent_free_item { XFS_BMAPI_CONVERT, "CONVERT" }, \ { XFS_BMAPI_ZERO, "ZERO" }, \ { XFS_BMAPI_REMAP, "REMAP" }, \ - { XFS_BMAPI_COWFORK, "COWFORK" } + { XFS_BMAPI_COWFORK, "COWFORK" }, \ + { XFS_BMAPI_DELALLOC, "DELALLOC" } static inline int xfs_bmapi_aflag(int w) --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -681,7 +681,7 @@ xfs_iomap_write_allocate( xfs_trans_t *tp; int nimaps; int error = 0; - int flags = 0; + int flags = XFS_BMAPI_DELALLOC; int nres; if (whichfork == XFS_COW_FORK) Patches currently in stable-queue which might be from hch@xxxxxx are queue-4.9/xfs-don-t-rely-on-total-in-xfs_alloc_space_available.patch queue-4.9/xfs-replace-xfs_mode_to_ftype-table-with-switch-statement.patch queue-4.9/xfs-fix-bogus-minleft-manipulations.patch queue-4.9/xfs-fix-cow-writeback-race.patch queue-4.9/xfs-sanity-check-inode-mode-when-creating-new-dentry.patch queue-4.9/xfs-extsize-hints-are-not-unlikely-in-xfs_bmap_btalloc.patch queue-4.9/xfs-bump-up-reserved-blocks-in-xfs_alloc_set_aside.patch queue-4.9/xfs-add-missing-include-dependencies-to-xfs_dir2.h.patch queue-4.9/xfs-fix-bmv_count-confusion-w-shared-extents.patch queue-4.9/xfs-adjust-allocation-length-in-xfs_alloc_space_available.patch queue-4.9/xfs-verify-dirblocklog-correctly.patch queue-4.9/xfs-fix-xfs_mode_to_ftype-prototype.patch queue-4.9/xfs-clear-_xbf_pages-from-buffers-when-readahead-page.patch queue-4.9/xfs-remove-racy-hasattr-check-from-attr-ops.patch queue-4.9/xfs-make-the-assert-condition-likely.patch queue-4.9/xfs-sanity-check-directory-inode-di_size.patch queue-4.9/xfs-don-t-print-warnings-when-xfs_log_force-fails.patch queue-4.9/xfs-don-t-wrap-id-in-xfs_dq_get_next_id.patch queue-4.9/xfs-sanity-check-inode-di_mode.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html