On Thu, Sep 10, 2020 at 07:44:55AM +1000, Dave Chinner wrote: > On Wed, Sep 09, 2020 at 09:31:11AM -0400, Brian Foster wrote: > > On Wed, Sep 09, 2020 at 06:19:10PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Recovering an EFI currently uses a itruncate reservation, which is > > > designed for a rolling transaction that modifies the BMBT and > > > logs the EFI in one commit, then frees the space and logs the EFD in > > > the second commit. > > > > > > Recovering the EFI only requires the second transaction in this > > > pair, and hence has a smaller log space requirement than a truncate > > > operation. Hence when the extent free is being processed at runtime, > > > the log reservation that is held by the filesystem is only enough to > > > complete the extent free, not the entire truncate operation. > > > > > > Hence if the EFI pins the tail of the log and the log fills up while > > > the extent is being freed, the amount of reserved free space in the > > > log is not enough to start another entire truncate operation. Hence > > > if we crash at this point, log recovery will deadlock with the EFI > > > pinning the tail of the log and the log not having enough free space > > > to reserve an itruncate transaction. > > > > > > As such, EFI recovery needs it's own log space reservation separate > > > to the itruncate reservation. We only need what is required free the > > > extent, and this matches the space we have reserved at runtime for > > > this operation and hence should prevent the recovery deadlock from > > > occurring. > > > > > > This patch adds the new reservation in a way that minimises the > > > change such that it should be back-portable to older kernels easily. > > > Follow up patches will factor and rework the reservations to be more > > > correct and more tightly defined. > > > > > > Note: this would appear to be a generic problem with intent > > > recovery; we use the entire operation reservation for recovery, > > > not the reservation that was held at runtime after the intent was > > > logged. I suspect all intents are going to require their own > > > reservation as a result. > > > > > > > It might be worth explicitly pointing out that support for EFI/EFD > > intents goes farther back than the various intents associated with newer > > features, hence the value of a targeted fix. > > Ok. > > > > @@ -916,6 +916,16 @@ xfs_trans_resv_calc( > > > resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT; > > > resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES; > > > > > > + /* > > > + * Log recovery reservations for intent replay > > > + * > > > + * EFI recovery is itruncate minus the initial transaction that logs > > > + * logs the EFI. > > > + */ > > > + resp->tr_efi.tr_logres = resp->tr_itruncate.tr_logres; > > > + resp->tr_efi.tr_logcount = resp->tr_itruncate.tr_logcount - 1; > > > > tr_itruncate.tr_logcount looks like it's either 2 or 8 depending on > > whether reflink is enabled. On one hand this seems conservative enough, > > but do we know exactly what those extra unit counts are accounted for in > > the reflink case? > > Right, in the reflink case we may have to roll the transaction many more > times to do the refcount btree and reverse map btree modifications. > Those are done under separate intents and so we have to roll and > commit the defered ops more times on a reflink/rmap based > filesystem. Hence the logcount is higher so that the initial > reservation can roll more times before regrant during a roll has to > go and physically reserve more write space in the log to continue > rolling the transaction. > > > It looks like extents are only freed when the last > > reference is dropped (otherwise we log a refcount intent), which makes > > me wonder whether we really need 7 log count units if recovery > > encounters an EFI. > > I don't know if the numbers are correct, and it really is out of > scope for this patch to audit/fix that. I really think we need to > map this whole thing out in a diagram at this point because I now > suspect that the allocfree log count calculation is not correct, > either... > I agree up to the point where it relates to this specific EFI recovery issue. reflink is enabled by default, which means the default EFI recovery reservation is going to have 7 logcount units. Is that actually enough of a reduction to prevent this same recovery problem on newer fs'? I'm wondering if the tr_efi logcount should just be set to 1, for example, at least for the short term fix. Brian > > Also, while looking through the code I noticed that truncate does the > > following: > > > > ... > > error = xfs_defer_finish(&tp); > > if (error) > > goto out; > > > > error = xfs_trans_roll_inode(&tp, ip); > > if (error) > > goto out; > > ... > > > > ... which looks like it rolls the transaction an extra time per-extent. > > I don't think that contributes to this problem vs just being a runtime > > inefficiency, so maybe I'll fling a patch up for that separately. > > Yeah, I'm not sure if this is correct/needed or not. > > > > * The following transactions are logged in logical format with > > > * a default log count. > > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h > > > index 7241ab28cf84..13173b3eaac9 100644 > > > --- a/fs/xfs/libxfs/xfs_trans_resv.h > > > +++ b/fs/xfs/libxfs/xfs_trans_resv.h > > > @@ -50,6 +50,8 @@ struct xfs_trans_resv { > > > struct xfs_trans_res tr_qm_equotaoff;/* end of turn quota off */ > > > struct xfs_trans_res tr_sb; /* modify superblock */ > > > struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */ > > > + struct xfs_trans_res tr_efi; /* EFI log item recovery */ > > > + > > > > Extra whitespace line. > > Will fix. > > Thanks! > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >