Re: [PATCH V5 13/16] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 7 Feb 2022 09:11:06 -0800

On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote:
> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote:
> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote:
> >> This commit upgrades inodes to use 64-bit extent counters when they are read
> >> from disk. Inodes are upgraded only when the filesystem instance has
> >> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
> >> 
> >> Signed-off-by: Chandan Babu R <chandan.babu@xxxxxxxxxx>
> >> ---
> >>  fs/xfs/libxfs/xfs_inode_buf.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >> 
> >> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> >> index 2200526bcee0..767189c7c887 100644
> >> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> >> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> >> @@ -253,6 +253,12 @@ xfs_inode_from_disk(
> >>  	}
> >>  	if (xfs_is_reflink_inode(ip))
> >>  		xfs_ifork_init_cow(ip);
> >> +
> >> +	if ((from->di_version == 3) &&
> >> +	     xfs_has_nrext64(ip->i_mount) &&
> >> +	     !xfs_dinode_has_nrext64(from))
> >> +		ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> >
> > Hmm.  Last time around I asked about the oddness of updating the inode
> > feature flags outside of a transaction, and then never responded. :(
> > So to quote you from last time:
> >
> >> The following is the thought process behind upgrading an inode to
> >> XFS_DIFLAG2_NREXT64 when it is read from the disk,
> >>
> >> 1. With support for dynamic upgrade, the extent count limits of an
> >> inode need to be determined by checking flags present within the
> >> inode, i.e. we need to satisfy the self-describing metadata property. This
> >> helps tools like xfs_repair and scrub to verify inode's extent count
> >> limits without having to refer to other metadata objects (e.g.
> >> superblock feature flags).
> >
> > I think this makes an even /stronger/ argument for why this update
> > needs to be transactional.
> >
> >> 2. Upgrade when performed inside xfs_trans_log_inode() may cause
> >> xfs_iext_count_may_overflow() to return -EFBIG when the inode's
> >> data/attr extent count is already close to 2^31/2^15 respectively.
> >> Hence none of the file operations will be able to add new extents to a
> >> file.
> >
> > Aha, there's the reason why!  You're right, xfs_iext_count_may_overflow
> > will abort the operation due to !NREXT64 before we even get a chance to
> > log the inode.
> >
> > I observe, however, that any time we call that function, we also have a
> > transaction allocated and we hold the ILOCK on the inode being tested.
> > *Most* of those call sites have also joined the inode to the transaction
> > already.  I wonder, is that a more appropriate place to be upgrading the
> > inodes?  Something like:
> >
> > /*
> >  * Ensure that the inode has the ability to add the specified number of
> >  * extents.  Caller must hold ILOCK_EXCL and have joined the inode to
> >  * the transaction.  Upon return, the inode will still be in this state
> >  * and the transaction will be clean.
> >  */
> > int
> > xfs_trans_inode_ensure_nextents(
> > 	struct xfs_trans	**tpp,
> > 	struct xfs_inode	*ip,
> > 	int			whichfork,
> > 	int			nr_to_add)
> > {
> > 	int			error;
> >
> > 	error = xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
> > 	if (!error)
> > 		return 0;
> >
> > 	/*
> > 	 * Try to upgrade if the extent count fields aren't large
> > 	 * enough.
> > 	 */
> > 	if (!xfs_has_nrext64(ip->i_mount) ||
> > 	    (ip->i_diflags2 & XFS_DIFLAG2_NREXT64))
> > 		return error;
> >
> > 	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> > 	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
> >
> > 	error = xfs_trans_roll(tpp);
> > 	if (error)
> > 		return error;
> >
> > 	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
> > }
> >
> > and then the current call sites become:
> >
> > 	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write,
> > 			dblocks, rblocks, false, &tp);
> > 	if (error)
> > 		return error;
> >
> > 	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
> > 			XFS_IEXT_ADD_NOSPLIT_CNT);
> > 	if (error)
> > 		goto out_cancel;
> >
> > What do you think about that?
> >
> 
> I went through all the call sites of xfs_iext_count_may_overflow() and I think
> that your suggestion can be implemented.
> 
> However, wouldn't the current approach suffice in terms of being functionally
> and logically correct? XFS_DIFLAG2_NREXT64 is set when the inode is read from
> disk, and the first operation to log changes made to the inode will make
> sure to include the new value of ip->i_diflags2. Hence we never end up in a
> situation where an on-disk inode has more than 2^31 data fork extents without
> having the XFS_DIFLAG2_NREXT64 flag set.
> 
> But the approach described above does go against the convention of changing
> metadata within a transaction. Hence I will try to implement your suggestion
> and include it in the next version of the patchset.

Ok, that sounds good. :)
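
For anyone following along, here is a rough userspace sketch of the
check / upgrade / re-check flow discussed above.  It is not kernel code:
every toy_* name is hypothetical, the "logged" counter merely stands in
for xfs_trans_log_inode() plus xfs_trans_roll(), and the large-counter
limits (2^48 - 1 and 2^32 - 1) are an assumption that this thread does not
state.  The small limits follow the 2^31 / 2^15 figures quoted earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Small-counter limits, per the 2^31 / 2^15 figures in the thread. */
#define TOY_MAX_DATA_SMALL	((uint64_t)INT32_MAX)
#define TOY_MAX_ATTR_SMALL	((uint64_t)INT16_MAX)
/* Large-counter limits: assumed values, not taken from this thread. */
#define TOY_MAX_DATA_LARGE	((1ULL << 48) - 1)
#define TOY_MAX_ATTR_LARGE	((1ULL << 32) - 1)

enum toy_fork { TOY_DATA_FORK = 0, TOY_ATTR_FORK = 1 };

/* Hypothetical stand-in for the mount-wide feature bit. */
struct toy_mount {
	bool		has_nrext64;
};

/* Hypothetical stand-in for the in-core inode. */
struct toy_inode {
	const struct toy_mount	*mount;
	bool			nrext64;	/* models XFS_DIFLAG2_NREXT64 */
	uint64_t		nextents[2];	/* indexed by enum toy_fork */
	unsigned int		times_logged;	/* models log + roll */
};

/* The limit depends only on the inode's own flag (self-describing). */
static uint64_t
toy_max_extents(const struct toy_inode *ip, enum toy_fork fork)
{
	if (ip->nrext64)
		return fork == TOY_DATA_FORK ? TOY_MAX_DATA_LARGE :
					       TOY_MAX_ATTR_LARGE;
	return fork == TOY_DATA_FORK ? TOY_MAX_DATA_SMALL :
				       TOY_MAX_ATTR_SMALL;
}

/* Return -EFBIG if adding nr_to_add extents would exceed the limit. */
static int
toy_count_may_overflow(const struct toy_inode *ip, enum toy_fork fork,
		       uint64_t nr_to_add)
{
	uint64_t	max = toy_max_extents(ip, fork);
	uint64_t	cur = ip->nextents[fork];

	/* Compare against the remaining headroom so the sum cannot wrap. */
	if (cur > max || nr_to_add > max - cur)
		return -EFBIG;
	return 0;
}

/* Check first; upgrade only if the check fails and the mount allows it. */
static int
toy_ensure_nextents(struct toy_inode *ip, enum toy_fork fork,
		    uint64_t nr_to_add)
{
	int		error;

	assert(fork == TOY_DATA_FORK || fork == TOY_ATTR_FORK);

	error = toy_count_may_overflow(ip, fork, nr_to_add);
	if (!error)
		return 0;

	/* Nothing to upgrade to, or the inode was already upgraded. */
	if (!ip->mount->has_nrext64 || ip->nrext64)
		return error;

	ip->nrext64 = true;
	ip->times_logged++;	/* the real helper would log and roll here */

	return toy_count_may_overflow(ip, fork, nr_to_add);
}
```

With a toy inode sitting at INT32_MAX data fork extents, calling
toy_ensure_nextents(&ino, TOY_DATA_FORK, 1) upgrades and succeeds when the
mount has the feature, and returns -EFBIG otherwise.  An inode that is
nowhere near the limit is left untouched, so the flag is only ever set at
a point where the change can be logged.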

--D

> -- 
> chandan