Re: [PATCH 4/6] fs: xfs: Support atomic write for statx

Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> · Fri, 9 Feb 2024 12:30:56 +0530

Hi John,

Thanks for the patch, I've added some review comments and questions
below.

On Wed, Jan 24, 2024 at 02:26:43PM +0000, John Garry wrote:
> Support providing info on atomic write unit min and max for an inode.
> 
> For simplicity, currently we limit the min at the FS block size, but a
> lower limit could be supported in future.
> 
> The atomic write unit min and max is limited by the guaranteed extent
> alignment for the inode.
> 
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_iops.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iops.h |  4 ++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index a0d77f5f512e..0890d2f70f4d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -546,6 +546,44 @@ xfs_stat_blksize(
>  	return PAGE_SIZE;
>  }
>  
> +void xfs_get_atomic_write_attr(
> +	struct xfs_inode *ip,
> +	unsigned int *unit_min,
> +	unsigned int *unit_max)
> +{
> +	xfs_extlen_t		extsz = xfs_get_extsz(ip);
> +	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> +	struct block_device	*bdev = target->bt_bdev;
> +	unsigned int		awu_min, awu_max, align;
> +	struct request_queue	*q = bdev->bd_queue;
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	/*
> +	 * Convert to multiples of the BLOCKSIZE (as we support a minimum
> +	 * atomic write unit of BLOCKSIZE).
> +	 */
> +	awu_min = queue_atomic_write_unit_min_bytes(q);
> +	awu_max = queue_atomic_write_unit_max_bytes(q);
> +
> +	awu_min &= ~mp->m_blockmask;
> +	awu_max &= ~mp->m_blockmask;

I don't understand why we try to round down the awu_max to blocks size
here and not just have an explicit check of (awu_max < blocksize).

I think the issue with changing the awu_max is that we are using awu_max
to also indirectly reflect the alignment so as to ensure we don't cross
atomic boundaries set by the hw (eg we check uint_max % atomic alignment
== 0 in scsi). So once we change the awu_max, there's a chance that even
if an atomic write aligns to the new awu_max it still doesn't have the
right alignment and fails. 

It works right now since eveything is power of 2 but it should cause
issues incase we decide to remove that limitation.  Anyways, I think
this implicit behavior of things working since eveything is a power of 2
should atleast be documented in a comment, so these things are
immediately clear. 

> +
> +	align = XFS_FSB_TO_B(mp, extsz);
> +
> +	if (!awu_max || !xfs_inode_atomicwrites(ip) || !align ||
> +	    !is_power_of_2(align)) {

Correct me if I'm wrong but here as well, the is_power_of_2(align) is
esentially checking if the align % uinit_max == 0 (or vice versa if
unit_max is greater) so that an allocation of extsize will always align
nicely as needed by the device. 

So maybe we should use the % expression explicitly so that the intention
is immediately clear.

> +		*unit_min = 0;
> +		*unit_max = 0;
> +	} else {
> +		if (awu_min)
> +			*unit_min = min(awu_min, align);

How will the min() here work? If awu_min is the minumum set by the
device, how can statx be allowed to advertise something smaller than
that?

If I understand correctly, right now the way we set awu_min in scsi and
nvme, the follwoing should usually be true for a sane device:

 awu_min <= blocks size of fs <= align

 so the min() anyways becomes redundant, but if we do assume that there
 might be some weird devices with awu_min absurdly large (SCSI with
 high atomic granularity) we still can't actually advertise a min
 smaller than that of the device, or am I missing something here?

> +		else
> +			*unit_min = mp->m_sb.sb_blocksize;
> +
> +		*unit_max = min(awu_max, align);
> +	}
> +}
> +

Regards,
ojaswin