Re: [PATCH] xfs: ensure st_blocks never goes to zero during COW writes

On Tue, Aug 20, 2024 at 06:29:59PM +0200, Christoph Hellwig wrote:
> COW writes remove the amount overwritten either directly for delalloc
> reservations, or in earlier deferred transactions than adding the new
> amount back in the bmap map transaction.  This means st_blocks on an
> inode where all data is overwritten using the COW path can temporarily
> show a 0 st_blocks.  This can easily be reproduced with the pending
> zoned device support where all writes use this path and trips the
> check in generic/615, but could also happen on a reflink file without
> that.
> 
> Fix this by temporarily add the pending blocks to be mapped to
> i_delayed_blks while the item is queued.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

How hard is it to hit this race?  I guess all you have to do is statx
in a loop while doing a bunch of cow writeback?
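
Something like the following untested sketch is what I have in mind for
the polling side.  It uses plain stat() rather than statx() to keep it
portable, the function name and the POLL_MAIN guard are made up for
illustration, and it assumes the file is fully written (a sparse file
legitimately reports zero blocks, so that would be a false positive).
Run it against a reflinked file while something else keeps overwriting
that file and pushing COW writeback:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: stat 'path' 'iterations' times and count the
 * samples where a non-empty file reports st_blocks == 0.
 * Returns the number of such samples, or -1 if stat() fails.
 *
 * Assumption: the file is not sparse, so size > 0 with zero blocks
 * can only be the transient accounting window.
 */
long count_zero_block_samples(const char *path, long iterations)
{
	struct stat st;
	long hits = 0;

	assert(path != NULL);

	for (long i = 0; i < iterations; i++) {
		if (stat(path, &st) < 0)
			return -1;
		if (st.st_size > 0 && st.st_blocks == 0)
			hits++;
	}
	return hits;
}

#ifdef POLL_MAIN
/* Build with -DPOLL_MAIN to get a standalone tool. */
int main(int argc, char **argv)
{
	long hits;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 2;
	}

	hits = count_zero_block_samples(argv[1], 1000000);
	if (hits < 0) {
		perror("stat");
		return 2;
	}

	printf("%ld samples with zero st_blocks\n", hits);
	return hits ? 1 : 0;
}
#endif
```

A nonzero exit status means at least one sample saw a non-empty file
with zero blocks; the iteration count is arbitrary.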

> ---
>  fs/xfs/libxfs/xfs_bmap.c |  1 +
>  fs/xfs/xfs_bmap_item.c   | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7df74c35d9f900..a63be14a9873e8 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4847,6 +4847,7 @@ xfs_bmapi_remap(
>  	}
>  
>  	ip->i_nblocks += len;
> +	ip->i_delayed_blks -= len;

This probably ought to have a comment referencing xfs_bmap_defer_add.

>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
>  	if (ifp->if_format == XFS_DINODE_FMT_BTREE)
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index e224b49b7cff6d..fc5da2dc7c1c66 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -346,6 +346,18 @@ xfs_bmap_defer_add(
>  	trace_xfs_bmap_defer(bi);
>  
>  	xfs_bmap_update_get_group(tp->t_mountp, bi);
> +
> +	/*
> +	 * Ensure the deferred mapping is pre-recorded in i_delayed_blks.
> +	 *
> +	 * Otherwise stat can report zero blocks for an inode that actually has
> +	 * data when the entire mapping is in the process of being overwritten
> +	 * using the out of place write path. This is undone in after
> +	 * xfs_bmapi_remap has incremented di_nblocks for a successful
> +	 * operation.
> +	 */
> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
>  	xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
>  }
>  
> @@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item(
>  {
>  	struct xfs_bmap_intent		*bi = bi_entry(item);
>  
> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
> +
>  	xfs_bmap_update_put_group(bi);
>  	kmem_cache_free(xfs_bmap_intent_cache, bi);
>  }
> @@ -464,6 +479,9 @@ xfs_bui_recover_work(
>  	bi->bi_owner = *ipp;
>  	xfs_bmap_update_get_group(mp, bi);
>  
> +	/* see __xfs_bmap_add for details */

Should this reference xfs_bmap_defer_add instead?

--D

> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
>  	xfs_defer_add_item(dfp, &bi->bi_list);
>  	return bi;
>  }
> -- 
> 2.43.0
> 
> 



