Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 08, 2025 at 09:55:32AM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_iops.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..40289fe6f5b2 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,24 @@ xfs_report_dioalign(
>  	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  	struct block_device	*bdev = target->bt_bdev;
>  
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * For COW inodes, we can only perform out of place writes of entire
> +	 * allocation units (blocks or RT extents).
> +	 * For writes smaller than the allocation unit, we must fall back to
> +	 * buffered I/O to perform read-modify-write cycles.  At best this is
> +	 * highly inefficient; at worst it leads to page cache invalidation
> +	 * races.  Tell applications to avoid this by reporting the larger write
> +	 * alignment in dio_offset_align, and the smaller read alignment in
> +	 * dio_read_offset_align.
> +	 */
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;

This contradicts the proposed man page, which says the following about
stx_dio_read_offset_align offset:

          If non-zero this value must be smaller than  stx_dio_offset_align
          which  must be provided by the file system.

The proposed code sets stx_dio_read_offset_align and stx_dio_offset_align to the
same value in some cases.

Perhaps the documentation should say "less than or equal to"?

Also, the claim that stx_dio_offset_align "must be provided by the file system"
if stx_dio_read_offset_align is nonzero should probably be conditional on
STATX_DIOALIGN being provided too.

- Eric




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux