Re: [PATCH 3/3] xfs: rework collapse range into an atomic operation

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 19 Dec 2019 06:56:07 -0500

On Wed, Dec 18, 2019 at 01:19:05PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 18, 2019 at 07:11:20AM -0500, Brian Foster wrote:
> > On Tue, Dec 17, 2019 at 06:39:58PM -0800, Darrick J. Wong wrote:
> > > On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote:
> > > > The collapse range operation uses a unique transaction and ilock
> > > > cycle for the hole punch and each extent shift iteration of the
> > > > overall operation. While the hole punch is safe as a separate
> > > > operation due to the iolock, cycling the ilock after each extent
> > > > shift is risky similar to insert range.
> > > 
> > > It is?  I thought collapse range was safe because we started by punching
> > > out the doomed range and shifting downwards, which eliminates the
> > > problems that come with two adjacent mappings that could be combined?
> > > 
> > > <confused?>
> > > 
> > 
> > This is somewhat vague wording. I don't mean to say the same bug is
> > possible on collapse. Indeed, I don't think that it is. What I mean to
> > say is just that cycling the ilock generally opens the operation up to
> > concurrent extent changes, similar to the behavior that resulted in the
> > insert range bug and against the general design principle of the
> > operation (as implied by the iolock hold -> flush -> unmap preparation
> > sequence).
> 
> Oh, ok, you're merely trying to prevent anyone from seeing the inode
> metadata while we're in the middle of a collapse-range operation.  I
> wonder then if we need to take a look at the remap range operations, but
> oh wow is that a gnarly mess of inode locking. :)
> 

Yeah, that's actually a much better way to put it. ;) Feel free to tweak
the commit log along those lines if my current description is too
vague/confusing. Thanks for the reviews..

Brian

> > IOW, it seems to me that a similar behavior is possible on collapse, it
> > just might occur after an extent has been shifted into its new target
> > range rather than before. That wouldn't be a corruption/bug because it
> > doesn't change the semantics of the shift operation or the content of
> > the file, but it's subtle and arguably a misbehavior and/or landmine.
> 
> <nod> Ok, just making sure. :)
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > To avoid this problem, make collapse range atomic with respect to
> > > > ilock. Hold the ilock across the entire operation, replace the
> > > > individual transactions with a single rolling transaction sequence
> > > > and finish dfops on each iteration to perform pending frees and roll
> > > > the transaction. Remove the unnecessary quota reservation as
> > > > collapse range can only ever merge extents (and thus remove extent
> > > > records and potentially free bmap blocks). The dfops call
> > > > automatically relogs the inode to keep it moving in the log. This
> > > > guarantees that nothing else can change the extent mapping of an
> > > > inode while a collapse range operation is in progress.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
> > > >  1 file changed, 15 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > index 555c8b49a223..1c34a34997ca 100644
> > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > @@ -1050,7 +1050,6 @@ xfs_collapse_file_space(
> > > >  	int			error;
> > > >  	xfs_fileoff_t		next_fsb = XFS_B_TO_FSB(mp, offset + len);
> > > >  	xfs_fileoff_t		shift_fsb = XFS_B_TO_FSB(mp, len);
> > > > -	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
> > > >  	bool			done = false;
> > > >  
> > > >  	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > > > @@ -1066,32 +1065,34 @@ xfs_collapse_file_space(
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	while (!error && !done) {
> > > > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
> > > > -					&tp);
> > > > -		if (error)
> > > > -			break;
> > > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
> > > > +	if (error)
> > > > +		return error;
> > > >  
> > > > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > -		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
> > > > -				ip->i_gdquot, ip->i_pdquot, resblks, 0,
> > > > -				XFS_QMOPT_RES_REGBLKS);
> > > > -		if (error)
> > > > -			goto out_trans_cancel;
> > > > -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > +	xfs_trans_ijoin(tp, ip, 0);
> > > >  
> > > > +	while (!done) {
> > > >  		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb,
> > > >  				&done);
> > > >  		if (error)
> > > >  			goto out_trans_cancel;
> > > > +		if (done)
> > > > +			break;
> > > >  
> > > > -		error = xfs_trans_commit(tp);
> > > > +		/* finish any deferred frees and roll the transaction */
> > > > +		error = xfs_defer_finish(&tp);
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > >  	}
> > > >  
> > > > +	error = xfs_trans_commit(tp);
> > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > >  	return error;
> > > >  
> > > >  out_trans_cancel:
> > > >  	xfs_trans_cancel(tp);
> > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > >  	return error;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.20.1
> > > > 
> > > 
> > 
>