From: Zhang Yi <yi.zhang@xxxxxxxxxx>

The current clone operation can be non-atomic if the destination range
of the file is beyond EOF: after a crash, the user could get a file with
corrupted (zeroed) data.

The problem is caused by preallocation. Suppose you write some data into
a file at [A, B) (the position letters increase in alphabetical order),
and xfs preallocates some blocks, so we get a delayed extent [A, D).
The writeback path then allocates blocks and converts only [A, C) of
this delayed extent because there are not enough contiguous physical
blocks, so the extent [C, D) is still delayed. After that, both the
in-memory and the on-disk file size are B. If we then clone a file range
into [E, F) from another file, xfs_reflink_zero_posteof() calls
iomap_zero_range() to zero out the range [B, E) beyond EOF and flushes
that range. Since [C, D) is still a delayed extent, it is zeroed and the
file's in-memory and on-disk sizes are updated to D after flushing and
before doing the clone operation. This is wrong, because the user can
see the size change and read zeros in the middle of the clone operation.

We need to keep the in-memory and on-disk sizes unchanged until the
clone operation starts, so instead of writing zeroes through the page
cache for delayed ranges beyond EOF, we convert these ranges to
unwritten and invalidate any cached data over that range beyond EOF.

Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
---
 fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ccf83e72d8ca..1a6d05830433 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1035,6 +1035,24 @@ xfs_buffered_write_iomap_begin(
 	}
 
 	if (imap.br_startoff <= offset_fsb) {
+		/*
+		 * When zeroing a delayed allocation extent, trim it if it
+		 * lies partially beyond the EOF block, or convert it to an
+		 * unwritten extent if it lies entirely beyond the EOF block.
+		 */
+		if ((flags & IOMAP_ZERO) &&
+		    isnullstartblock(imap.br_startblock)) {
+			xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+
+			if (offset_fsb >= eof_fsb)
+				goto convert_delay;
+			if (end_fsb > eof_fsb) {
+				end_fsb = eof_fsb;
+				xfs_trim_extent(&imap, offset_fsb,
+						end_fsb - offset_fsb);
+			}
+		}
+
 		/*
 		 * For reflink files we may need a delalloc reservation when
 		 * overwriting shared extents. This includes zeroing of
@@ -1158,6 +1176,17 @@ xfs_buffered_write_iomap_begin(
 	xfs_iunlock(ip, lockmode);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
 
+convert_delay:
+	xfs_iunlock(ip, lockmode);
+	truncate_pagecache(inode, offset);
+	error = xfs_bmapi_convert_delalloc(ip, XFS_DATA_FORK, offset,
+					   iomap, NULL);
+	if (error)
+		return error;
+
+	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &imap);
+	return 0;
+
 found_cow:
 	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-- 
2.39.2
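
For readers following the trim-or-convert decision in the first hunk,
here is a rough userspace sketch of that logic only. It is not XFS code:
plan_delalloc_zero(), struct zero_plan, enum zero_action and
bytes_to_fsb_roundup() are hypothetical names made up for illustration,
and the sketch assumes that the EOF block is computed by rounding the
file size up to a whole block, as XFS_B_TO_FSB() is understood to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two outcomes for a delalloc extent. */
enum zero_action {
	ZERO_THROUGH_PAGECACHE,	/* zero via the page cache (possibly trimmed) */
	CONVERT_TO_UNWRITTEN	/* extent starts at or beyond the EOF block */
};

struct zero_plan {
	enum zero_action action;
	uint64_t start_fsb;	/* first block of the range */
	uint64_t end_fsb;	/* one past the last block (exclusive) */
};

/* Assumption: round a byte count up to whole filesystem blocks. */
static uint64_t bytes_to_fsb_roundup(uint64_t bytes, uint32_t blocksize)
{
	return (bytes + blocksize - 1) / blocksize;
}

/*
 * Decide what to do with a delayed extent covering the block range
 * [offset_fsb, end_fsb) when zeroing, given the current file size.
 */
static struct zero_plan plan_delalloc_zero(uint64_t offset_fsb,
					   uint64_t end_fsb,
					   uint64_t isize,
					   uint32_t blocksize)
{
	uint64_t eof_fsb = bytes_to_fsb_roundup(isize, blocksize);
	struct zero_plan plan = {
		ZERO_THROUGH_PAGECACHE, offset_fsb, end_fsb
	};

	assert(offset_fsb < end_fsb);

	/* Entirely beyond EOF: do not dirty the page cache at all. */
	if (offset_fsb >= eof_fsb) {
		plan.action = CONVERT_TO_UNWRITTEN;
		return plan;
	}

	/* Straddles EOF: only the part up to the EOF block is zeroed. */
	if (end_fsb > eof_fsb)
		plan.end_fsb = eof_fsb;

	return plan;
}
```

With 4096-byte blocks and a file size of 8292 bytes, the EOF block in
this model is 3. A delayed extent covering blocks [1, 8) would be
trimmed to [1, 3) and zeroed through the page cache, while one covering
[5, 8) would take the convert path. In the patch itself the extent
actually converted is whatever xfs_bmapi_convert_delalloc() maps at
offset, which this sketch does not model.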