Re: [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 06, 2024 at 07:40:51AM -0400, Brian Foster wrote:
> fallocate unshare mode explicitly breaks extent sharing. When a
> command completes, it checks the data fork for any remaining shared
> extents to determine whether the reflink inode flag and COW fork
> preallocation can be removed. This logic doesn't consider in-core
> pagecache and I/O state, however, which means we can unsafely remove
> COW fork blocks that are still needed under certain conditions.
> 
> For example, consider the following command sequence:
> 
> xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
> 	-c "pwrite 0 32k" -c "funshare 0 1k" <file>
> 
> This allocates a data block at offset 0, shares it, and then
> overwrites it with a larger buffered write. The overwrite triggers
> COW fork preallocation, 32 blocks by default, which maps the entire
> 32k write to delalloc in the COW fork. All but the shared block at
> offset 0 remains hole mapped in the data fork. The unshare command
> redirties and flushes the folio at offset 0, removing the only
> shared extent from the inode. Since the inode no longer maps shared
> extents, unshare purges the COW fork before the remaining 28k may
> have written back.
> 
> This leaves dirty pagecache backed by holes, which writeback quietly
> skips, thus leaving clean, non-zeroed pagecache over holes in the
> file. To verify, fiemap shows holes in the first 32k of the file and
> reads return different data across a remount:
> 
> $ xfs_io -c "fiemap -v" <file>
> <file>:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    ...
>    1: [8..511]:        hole               504
>    ...
> $ xfs_io -c "pread -v 4k 8" <file>
> 00001000:  cd cd cd cd cd cd cd cd  ........
> $ umount <mnt>; mount <dev> <mnt>
> $ xfs_io -c "pread -v 4k 8" <file>
> 00001000:  00 00 00 00 00 00 00 00  ........
> 
> To avoid this problem, make unshare follow the same rules used for
> background cowblock scanning and never purge the COW fork for inodes
> with dirty pagecache or in-flight I/O.
> 
> Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>

Question: Does xfs_repair report orphaned cow staging blocks after this?
There's a longstanding bug that I've seen in the long soak xfs/286 VM
where we slowly leak cow fork blocks (~80 per ~1 billion fsxops over 7
days).

Anyhow this looks correct on its own so
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> ---
> 
> Here's another COW issue I came across via some unshare testing. A quick
> hack to enable unshare in fsx uncovered it. I'll follow up with a proper
> patch for that.
> 
> I'm sending this as a 2/1 here just to reflect patch order in my local
> tree. Also note that I haven't explicitly tested the fixes commit, but a
> quick test to switch back to the old full flush behavior on latest
> master also makes the problem go away, so I suspect that's where the
> regression was introduced.
> 
> Brian
> 
>  fs/xfs/xfs_icache.c  |  8 +-------
>  fs/xfs/xfs_reflink.c |  3 +++
>  fs/xfs/xfs_reflink.h | 19 +++++++++++++++++++
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 900a6277d931..a1b34e6ccfe2 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1278,13 +1278,7 @@ xfs_prep_free_cowblocks(
>  	 */
>  	if (!sync && inode_is_open_for_write(VFS_I(ip)))
>  		return false;
> -	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
> -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
> -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> -	    atomic_read(&VFS_I(ip)->i_dio_count))
> -		return false;
> -
> -	return true;
> +	return xfs_can_free_cowblocks(ip);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 6fde6ec8092f..5bf6682e701b 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1595,6 +1595,9 @@ xfs_reflink_clear_inode_flag(
>  
>  	ASSERT(xfs_is_reflink_inode(ip));
>  
> +	if (!xfs_can_free_cowblocks(ip))
> +		return 0;
> +
>  	error = xfs_reflink_inode_has_shared_extents(*tpp, ip, &needs_flag);
>  	if (error || needs_flag)
>  		return error;
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index fb55e4ce49fa..4a58e4533671 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -6,6 +6,25 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> +/*
> + * Check whether it is safe to free COW fork blocks from an inode. It is unsafe
> + * to do so when an inode has dirty cache or I/O in-flight, even if no shared
> + * extents exist in the data fork, because outstanding I/O may target blocks
> + * that were speculatively allocated to the COW fork.
> + */
> +static inline bool
> +xfs_can_free_cowblocks(struct xfs_inode *ip)
> +{
> +	struct inode *inode = VFS_I(ip);
> +
> +	if ((inode->i_state & I_DIRTY_PAGES) ||
> +	    mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY) ||
> +	    mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> +	    atomic_read(&inode->i_dio_count))
> +		return false;
> +	return true;
> +}
> +
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
>  int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> -- 
> 2.45.0
> 
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux