Re: [PATCH 08/43] xfs: report the correct dio alignment for COW inodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 11, 2024 at 09:54:33AM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Unfortunately the reported dio alignment can only report a single value
> for reads and writes, so unless we want to trigger these read-modify
> write cycles all the time, we need to increase both limits.
> 
> Without this zoned xfs triggers the warnings about failed page cache
> invalidation in kiocb_invalidate_post_direct_write all the time when
> running generic/551 when running on a 512 byte sector device, and
> eventually fails the test due to miscompares.
> 
> Hopefully we can add a separate read alignment to statx eventually.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  fs/xfs/xfs_ioctl.c |  6 +++++-
>  fs/xfs/xfs_iops.c  | 15 ++++++++++++++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 726282e74d54..de8ba5345e17 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1213,7 +1213,11 @@ xfs_file_ioctl(
>  		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  		struct dioattr		da;
>  
> -		da.d_mem =  da.d_miniosz = target->bt_logical_sectorsize;
> +		da.d_mem = target->bt_logical_sectorsize;
> +		if (xfs_is_cow_inode(ip))
> +			da.d_miniosz = mp->m_sb.sb_blocksize;
> +		else
> +			da.d_miniosz = target->bt_logical_sectorsize;
>  		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>  
>  		if (copy_to_user(arg, &da, sizeof(da)))
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..990df072ba35 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -582,7 +582,20 @@ xfs_report_dioalign(
>  
>  	stat->result_mask |= STATX_DIOALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block.

That's not quite accurate -- we're always forced to write an entire file
allocation unit so that the rest of the bmap code doesn't have to deal
with a file range that's mapped to multiple different space extents.
For all the existing reflink scenarios the allocation unit is always an
fsblock so this is a trifling difference.

However, once we start adding reflink to the rt device then there comes
the question of needing to handle allocation unit > fsblock, and all
these bits would have to change.

IOWs, I'm saying that this should be:

	if (xfs_is_cow_inode(ip))
		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
	else
		...

Though ATM this is a distinction that doesn't make a difference.

--D

> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we provide an emulation by doing a read-modify-write cycle
> +	 * through the cache, but that is highly inefficient.  Thus report the
> +	 * natively supported size here.
> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
> +	else
> +		stat->dio_offset_align = bdev_logical_block_size(bdev);
>  }
>  
>  static void
> -- 
> 2.45.2
> 
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux