On Wed, Dec 18, 2019 at 01:19:05PM -0800, Darrick J. Wong wrote: > On Wed, Dec 18, 2019 at 07:11:20AM -0500, Brian Foster wrote: > > On Tue, Dec 17, 2019 at 06:39:58PM -0800, Darrick J. Wong wrote: > > > On Fri, Dec 13, 2019 at 12:12:58PM -0500, Brian Foster wrote: > > > > The collapse range operation uses a unique transaction and ilock > > > > cycle for the hole punch and each extent shift iteration of the > > > > overall operation. While the hole punch is safe as a separate > > > > operation due to the iolock, cycling the ilock after each extent > > > > shift is risky similar to insert range. > > > > > > It is? I thought collapse range was safe because we started by punching > > > out the doomed range and shifting downwards, which eliminates the > > > problems that come with two adjacent mappings that could be combined? > > > > > > <confused?> > > > > > > > This is somewhat vague wording. I don't mean to say the same bug is > > possible on collapse. Indeed, I don't think that it is. What I mean to > > say is just that cycling the ilock generally opens the operation up to > > concurrent extent changes, similar to the behavior that resulted in the > > insert range bug and against the general design principle of the > > operation (as implied by the iolock hold -> flush -> unmap preparation > > sequence). > > Oh, ok, you're merely trying to prevent anyone from seeing the inode > metadata while we're in the middle of a collapse-range operation. I > wonder then if we need to take a look at the remap range operations, but > oh wow is that a gnarly mess of inode locking. :) > Yeah, that's actually a much better way to put it. ;) Feel free to tweak the commit log along those lines if my current description is too vague/confusing. Thanks for the reviews.. Brian > > IOW, it seems to me that a similar behavior is possible on collapse, it > > just might occur after an extent has been shifted into its new target > > range rather than before. That wouldn't be a corruption/bug because it > > doesn't change the semantics of the shift operation or the content of > > the file, but it's subtle and arguably a misbehavior and/or landmine. > > <nod> Ok, just making sure. :) > > Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --D > > > Brian > > > > > --D > > > > > > > To avoid this problem, make collapse range atomic with respect to > > > > ilock. Hold the ilock across the entire operation, replace the > > > > individual transactions with a single rolling transaction sequence > > > > and finish dfops on each iteration to perform pending frees and roll > > > > the transaction. Remove the unnecessary quota reservation as > > > > collapse range can only ever merge extents (and thus remove extent > > > > records and potentially free bmap blocks). The dfops call > > > > automatically relogs the inode to keep it moving in the log. This > > > > guarantees that nothing else can change the extent mapping of an > > > > inode while a collapse range operation is in progress. > > > > > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > > > --- > > > > fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++-------------- > > > > 1 file changed, 15 insertions(+), 14 deletions(-) > > > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > > > index 555c8b49a223..1c34a34997ca 100644 > > > > --- a/fs/xfs/xfs_bmap_util.c > > > > +++ b/fs/xfs/xfs_bmap_util.c > > > > @@ -1050,7 +1050,6 @@ xfs_collapse_file_space( > > > > int error; > > > > xfs_fileoff_t next_fsb = XFS_B_TO_FSB(mp, offset + len); > > > > xfs_fileoff_t shift_fsb = XFS_B_TO_FSB(mp, len); > > > > - uint resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0); > > > > bool done = false; > > > > > > > > ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); > > > > @@ -1066,32 +1065,34 @@ xfs_collapse_file_space( > > > > if (error) > > > > return error; > > > > > > > > - while (!error && !done) { > > > > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, > > > > - &tp); > > > > - if (error) > > > > - break; > > > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp); > > > > + if (error) > > > > + return error; > > > > > > > > - xfs_ilock(ip, XFS_ILOCK_EXCL); > > > > - error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot, > > > > - ip->i_gdquot, ip->i_pdquot, resblks, 0, > > > > - XFS_QMOPT_RES_REGBLKS); > > > > - if (error) > > > > - goto out_trans_cancel; > > > > - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); > > > > + xfs_ilock(ip, XFS_ILOCK_EXCL); > > > > + xfs_trans_ijoin(tp, ip, 0); > > > > > > > > + while (!done) { > > > > error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb, > > > > &done); > > > > if (error) > > > > goto out_trans_cancel; > > > > + if (done) > > > > + break; > > > > > > > > - error = xfs_trans_commit(tp); > > > > + /* finish any deferred frees and roll the transaction */ > > > > + error = xfs_defer_finish(&tp); > > > > + if (error) > > > > + goto out_trans_cancel; > > > > } > > > > > > > > + error = xfs_trans_commit(tp); > > > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > > > return error; > > > > > > > > out_trans_cancel: > > > > xfs_trans_cancel(tp); > > > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > > > return error; > > > > } > > > > > > > > -- > > > > 2.20.1 > > > > > > > > > >