Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 24 Jul 2024 09:38:18 +1000

On Thu, Jul 18, 2024 at 09:53:14AM +0100, John Garry wrote:
> On 12/07/2024 00:20, Dave Chinner wrote:
> > > > /* Reflink'ed disallowed */
> > > > +	if (flags2 & XFS_DIFLAG2_REFLINK)
> > > > +		return __this_address;
> > > Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> > > superblock verifier or xfs_fs_fill_super fail the mount so that old
> > > kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> > > support for forcealign'd cow and starts writing out files with both
> > > iflags set?
> > I don't think we should error out the mount because reflink and
> > forcealign are enabled - that's going to be the common configuration
> > for every user of forcealign, right? I also don't think we should
> > throw a corruption error if both flags are set, either.
> > 
> > We're making an initial*implementation choice*  not to implement the
> > two features on the same inode at the same time. We are not making a
> > an on-disk format design decision that says "these two on-disk flags
> > are incompatible".
> > 
> > IOWs, if both are set on a current kernel, it's not corruption but a
> > more recent kernel that supports both flags has modified this inode.
> > Put simply, we have detected a ro-compat situation for this specific
> > inode.
> > 
> > Looking at it as a ro-compat situation rather then corruption,
> > what I would suggest we do is this:
> > 
> > 1. Warn at mount that reflink+force align inodes will be treated
> > as ro-compat inodes. i.e. read-only.
> > 
> > 2. prevent forcealign from being set if the shared extent flag is
> > set on the inode.
> > 
> > 3. prevent shared extents from being created if the force align flag
> > is set (i.e. ->remap_file_range() and anything else that relies on
> > shared extents will fail on forcealign inodes).
> > 
> > 4. if we read an inode with both set, we emit a warning and force
> > the inode to be read only so we don't screw up the force alignment
> > of the file (i.e. that inode operates in ro-compat mode.)
> > 
> > #1 is the mount time warning of potential ro-compat behaviour.
> > 
> > #2 and #3 prevent both from getting set on existing kernels.
> > 
> > #4 is the ro-compat behaviour that would occur from taking a
> > filesystem that ran on a newer kernel that supports force-align+COW.
> > This avoids corruption shutdowns and modifications that would screw
> > up the alignment of the shared and COW'd extents.
> > 
> 
> This seems fine for dealing with forcealign and reflink.
> 
> So what about forcealign and RT?
> 
> We want to support this config in future, but the current implementation
> will not support it.

What's the problem with supporting it right from the start? We
already support forcealign for RT, just it's a global config 
under the "big rt alloc" moniker rather than a per-inode flag.

Like all on-disk format change based features,
forcealign should add the EXPERIMENTAL flag to the filesystem for a
couple of releases after merge, so there will be plenty of time to
test both data and rt dev functionality before removing the
EXPERIMENTAL flag from it.

So why not just enable the per-inode flag with RT right from the
start given that this functionality is supposed to work and be
globally supported by the rtdev right now? It seems like a whole lot
less work to just enable it for RT now than it is to disable it...

> In this v2 series, I just disallow a mount for forcealign and RT, similar to
> reflink and RT together.
> 
> Furthermore, I am also saying here that still forcealign and RT bits set is
> a valid inode on-disk format and we just have to enforce a sb_rextsize to
> extsize relationship:
> 
> xfs_inode_validate_forcealign(
> 	struct xfs_mount	*mp,
> 	uint32_t		extsize,
> 	uint32_t		cowextsize,
> 	uint16_t		mode,
> 	uint16_t		flags,
> 	uint64_t		flags2)
> {
> 	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> ...
> 
> 
> 	/* extsize must be a multiple of sb_rextsize for RT */
> 	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
> 		return __this_address;
> 
> 	return NULL;
> }

I suspect the logic needs tweaking, but why not just do this right
from the start?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx