On Wed, Jan 08, 2025 at 09:55:32AM +0100, Christoph Hellwig wrote: > For I/O to reflinked blocks we always need to write an entire new > file system block, and the code enforces the file system block alignment > for the entire file if it has any reflinked blocks. > > Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read > vs write alignments for reflinked files. > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > --- > fs/xfs/xfs_iops.c | 21 ++++++++++++++++++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > index 6b0228a21617..40289fe6f5b2 100644 > --- a/fs/xfs/xfs_iops.c > +++ b/fs/xfs/xfs_iops.c > @@ -580,9 +580,24 @@ xfs_report_dioalign( > struct xfs_buftarg *target = xfs_inode_buftarg(ip); > struct block_device *bdev = target->bt_bdev; > > - stat->result_mask |= STATX_DIOALIGN; > + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN; > stat->dio_mem_align = bdev_dma_alignment(bdev) + 1; > - stat->dio_offset_align = bdev_logical_block_size(bdev); > + > + /* > + * For COW inodes, we can only perform out of place writes of entire > + * allocation units (blocks or RT extents). > + * For writes smaller than the allocation unit, we must fall back to > + * buffered I/O to perform read-modify-write cycles. At best this is > + * highly inefficient; at worst it leads to page cache invalidation > + * races. Tell applications to avoid this by reporting the larger write > + * alignment in dio_offset_align, and the smaller read alignment in > + * dio_read_offset_align. > + */ > + stat->dio_read_offset_align = bdev_logical_block_size(bdev); > + if (xfs_is_cow_inode(ip)) > + stat->dio_offset_align = xfs_inode_alloc_unitsize(ip); > + else > + stat->dio_offset_align = stat->dio_read_offset_align; This contradicts the proposed man page, which says the following about stx_dio_read_offset_align offset: If non-zero this value must be smaller than stx_dio_offset_align which must be provided by the file system. The proposed code sets stx_dio_read_offset_align and stx_dio_offset_align to the same value in some cases. Perhaps the documentation should say "less than or equal to"? Also, the claim that stx_dio_offset_align "must be provided by the file system" if stx_dio_read_offset_align is nonzero should probably be conditional on STATX_DIOALIGN being provided too. - Eric