Re: [PATCH v3 08/21] xfs: Introduce FORCEALIGN inode flag

John Garry <john.g.garry@xxxxxxxxxx> · Wed, 1 May 2024 11:03:06 +0100

+/* Validate the forcealign inode flag */
+xfs_failaddr_t
+xfs_inode_validate_forcealign(
+	struct xfs_mount	*mp,
+	uint16_t		mode,

	umode_t			mode,

ok. BTW, other functions like xfs_inode_validate_extsize() use uint16_t


+	uint16_t		flags,
+	uint32_t		extsize,
+	uint32_t		cowextsize)

extent sizes are xfs_extlen_t types.

ok


+{
+	/* superblock rocompat feature flag */
+	if (!xfs_has_forcealign(mp))
+		return __this_address;
+
+	/* Only regular files and directories */
+	if (!S_ISDIR(mode) && !S_ISREG(mode))
+		return __this_address;
+
+	/* Doesn't apply to realtime files */
+	if (flags & XFS_DIFLAG_REALTIME)
+		return __this_address;

Why not? A rt device with an extsize of 1 fsb could make use of
forced alignment just like the data device to allow larger atomic
writes to be done. I mean, just because we haven't written the code
to do this yet doesn't mean it is an illegal on-disk format state.

ok, so where is a better place to disallow forcealign for RT now (since 
we have not written the code to support it nor verified it)?


+	/* Requires a non-zero power-of-2 extent size hint */
+	if (extsize == 0 || !is_power_of_2(extsize) ||
+	    (mp->m_sb.sb_agblocks % extsize))
+		return __this_address;

Please do these as indiviual checks with their own fail address.

ok

That way we can tell which check failed from the console output.
Also, the agblocks check is already split out below, so it's being
checked twice...

Also, why does force-align require a power-of-2 extent size? Why
does it require the extent size to be an exact divisor of the AG
size? Aren't these atomic write alignment restrictions? i.e.
shouldn't these only be enforced when the atomic writes inode flag
is set?

With regards the power-of-2 restriction, I think that the code changes 
are going to become a lot more complex if we don't enforce this for 
forcealign.

For example, consider xfs_file_dio_write(), where we check for an 
unaligned write based on forcealign extent mask. It's much simpler to 
rely on a power-of-2 size. And same for iomap extent zeroing.

So then it can be asked, for what reason do we want to support 
unorthodox, non-power-of-2 sizes? Who would want this?

As for AG size, again I think that it is required to be aligned to the 
forcealign extsize. As I remember, when converting from an FSB to a DB, 
if the AG itself is not aligned to the forcealign extsize, then the DB 
will not be aligned to the forcealign extsize. More below...


+	/* Requires agsize be a multiple of extsize */
+	if (mp->m_sb.sb_agblocks % extsize)
+		return __this_address;
+
+	/* Requires stripe unit+width (if set) be a multiple of extsize */
+	if ((mp->m_dalign && (mp->m_dalign % extsize)) ||
+	    (mp->m_swidth && (mp->m_swidth % extsize)))
+		return __this_address;

Again, this is an atomic write constraint, isn't it?

So why do we want forcealign? It is to only align extent FSBs? Or to 
align extents to DBs? I would have thought the latter. If so, it seems 
sensible to do this check also.


+	/* Requires no cow extent size hint */
+	if (cowextsize != 0)
+		return __this_address;

What if it's a reflinked file?

Yeah, I think that we want to disallow that.


.....

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d0e2cec6210d..d1126509ceb9 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1110,6 +1110,8 @@ xfs_flags2diflags2(
  		di_flags2 |= XFS_DIFLAG2_DAX;
  	if (xflags & FS_XFLAG_COWEXTSIZE)
  		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+	if (xflags & FS_XFLAG_FORCEALIGN)
+		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
  
  	return di_flags2;
  }
@@ -1146,6 +1148,22 @@ xfs_ioctl_setattr_xflags(
  	if (i_flags2 && !xfs_has_v3inodes(mp))
  		return -EINVAL;
  
+	/*
+	 * Force-align requires a nonzero extent size hint and a zero cow
+	 * extent size hint.  It doesn't apply to realtime files.
+	 */
+	if (fa->fsx_xflags & FS_XFLAG_FORCEALIGN) {
+		if (!xfs_has_forcealign(mp))
+			return -EINVAL;
+		if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
+			return -EINVAL;
+		if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
+					FS_XFLAG_EXTSZINHERIT)))
+			return -EINVAL;
+		if (fa->fsx_xflags & FS_XFLAG_REALTIME)
+			return -EINVAL;
+	}

What about if the file already has shared extents on it (i.e.
reflinked or deduped?)

At the top of the function we have this check for RT:

	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
		/* Can't change realtime flag if any extents are allocated. */
		if (ip->i_df.if_nextents || ip->i_delayed_blks)
			return -EINVAL;
	}

Would expanding that check for forcealign also suffice? Indeed, later in 
this series I expanded this check to cover atomicwrites (when I really 
intended it for forcealign).


Also, why is this getting checked here instead of in
xfs_ioctl_setattr_check_extsize()?


@@ -1263,7 +1283,19 @@ xfs_ioctl_setattr_check_extsize(
  	failaddr = xfs_inode_validate_extsize(ip->i_mount,
  			XFS_B_TO_FSB(mp, fa->fsx_extsize),
  			VFS_I(ip)->i_mode, new_diflags);
-	return failaddr != NULL ? -EINVAL : 0;
+	if (failaddr)
+		return -EINVAL;
+
+	if (new_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
+		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
+				VFS_I(ip)->i_mode, new_diflags,
+				XFS_B_TO_FSB(mp, fa->fsx_extsize),
+				XFS_B_TO_FSB(mp, fa->fsx_cowextsize));
+		if (failaddr)
+			return -EINVAL;
+	}

Oh, it's because you're trying to use on-disk format validation
routines for user API validation. That, IMO, is a bad idea because
the on-disk format and kernel/user APIs should not be tied
together as they have different constraints and error conditions.

That also explains why xfs_inode_validate_forcealign() doesn't just
get passed the inode to validate - it's because you want to pass
information from the user API to it. This results in sub-optimal
code for both on-disk format validation and user API validation.

Can you please separate these and put all the force align user API
validation checks in the one function?


ok, fine. But it would be good to have clarification on function of 
forcealign, above, i.e. does it always align extents to disk blocks?

Thanks,
John