On Wed, Dec 11, 2024 at 09:54:33AM +0100, Christoph Hellwig wrote: > For I/O to reflinked blocks we always need to write an entire new > file system block, and the code enforces the file system block alignment > for the entire file if it has any reflinked blocks. > > Unfortunately the reported dio alignment can only report a single value > for reads and writes, so unless we want to trigger these read-modify > write cycles all the time, we need to increase both limits. > > Without this zoned xfs triggers the warnings about failed page cache > invalidation in kiocb_invalidate_post_direct_write all the time when > running generic/551 when running on a 512 byte sector device, and > eventually fails the test due to miscompares. > > Hopefully we can add a separate read alignment to statx eventually. > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > --- > fs/xfs/xfs_ioctl.c | 6 +++++- > fs/xfs/xfs_iops.c | 15 ++++++++++++++- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c > index 726282e74d54..de8ba5345e17 100644 > --- a/fs/xfs/xfs_ioctl.c > +++ b/fs/xfs/xfs_ioctl.c > @@ -1213,7 +1213,11 @@ xfs_file_ioctl( > struct xfs_buftarg *target = xfs_inode_buftarg(ip); > struct dioattr da; > > - da.d_mem = da.d_miniosz = target->bt_logical_sectorsize; > + da.d_mem = target->bt_logical_sectorsize; > + if (xfs_is_cow_inode(ip)) > + da.d_miniosz = mp->m_sb.sb_blocksize; > + else > + da.d_miniosz = target->bt_logical_sectorsize; > da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1); > > if (copy_to_user(arg, &da, sizeof(da))) > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > index 6b0228a21617..990df072ba35 100644 > --- a/fs/xfs/xfs_iops.c > +++ b/fs/xfs/xfs_iops.c > @@ -582,7 +582,20 @@ xfs_report_dioalign( > > stat->result_mask |= STATX_DIOALIGN; > stat->dio_mem_align = bdev_dma_alignment(bdev) + 1; > - stat->dio_offset_align = bdev_logical_block_size(bdev); > + > + /* > + * On COW inodes we are forced to always rewrite an entire file system > + * block. That's not quite accurate -- we're always forced to write an entire file allocation unit so that the rest of the bmap code doesn't have to deal with a file range that's mapped to multiple different space extents. For all the existing reflink scenarios the allocation unit is always an fsblock so this is a trifling difference. However, once we start adding reflink to the rt device then there comes the question of needing to handle allocation unit > fsblock, and all these bits would have to change. IOWs, I'm saying that this should be: if (xfs_is_cow_inode(ip)) stat->dio_offset_align = xfs_inode_alloc_unitsize(ip); else ... Though ATM this is a distinction that doesn't make a difference. --D > + * > + * Because applications assume they can do sector sized direct writes > + * on XFS we provide an emulation by doing a read-modify-write cycle > + * through the cache, but that is highly inefficient. Thus report the > + * natively supported size here. > + */ > + if (xfs_is_cow_inode(ip)) > + stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize; > + else > + stat->dio_offset_align = bdev_logical_block_size(bdev); > } > > static void > -- > 2.45.2 > >