On Mon, Sep 28, 2020 at 03:52:30PM +1000, Dave Chinner wrote: > On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > When xfs_defer_capture extracts the deferred ops and transaction state > > from a transaction, it should record the transaction reservation type > > from the old transaction so that when we continue the dfops chain, we > > still use the same reservation parameters. > > > > This avoids a potential failure vector by ensuring that we never ask for > > more log reservation space than we would have asked for had the system > > not gone down. > > Nope, it does not do that. > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --- > > fs/xfs/libxfs/xfs_defer.c | 5 +++++ > > fs/xfs/libxfs/xfs_defer.h | 1 + > > fs/xfs/xfs_log_recover.c | 4 ++-- > > 3 files changed, 8 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > > index 85d70f1edc1c..c53443252389 100644 > > --- a/fs/xfs/libxfs/xfs_defer.c > > +++ b/fs/xfs/libxfs/xfs_defer.c > > @@ -577,6 +577,11 @@ xfs_defer_capture( > > dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used; > > tp->t_blk_res = tp->t_blk_res_used; > > > > + /* Preserve the transaction reservation type. */ > > + dfc->dfc_tres.tr_logres = tp->t_log_res; > > + dfc->dfc_tres.tr_logcount = tp->t_log_count; > > + dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; > > This means every child deferop takes a full tp->t_log_count > reservation, whilst in memory the child reservation would ahve been > handled by the parent via the log ticket unit count being > decremented by one. Hence child deferops -never- run with the same > maximal reservation that their parents held. > > The difference is that at runtime we are rolling transaction which > regrant space from the initial reservation of (tp->t_log_count * > tp->t_log_res) made a run time. i.e. the first child deferop that > runs has a total log space grant of ((tp->t_log_count - 1) > * tp->t_log_res), the second it is "- 2", and so on right down to > when the log ticket runs out of initial reservation and so it goes > to reserving a single unit (tp->t_log_res) at a time. > > Hence both the intents being recovered and all their children are > over-reserving log space by using the default log count for the > &M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial > reservation being incorrect, the child reservations of the same size > as the parent are definitely incorrect. They really should be > allowed only a single unit reservation, and if the transaction rolls > to process defer ops, it needs to regrant new log space during the > commit process. > > Hence I think this can only be correct as: > > dfc->dfc_tres.tr_log_count = 1; > > Regardless of how many units the parent recovery reservation > obtained. (Which I also think can only be correct as 1 because we > don't know how many units of reservation space the parent had > consumed when it was logged.) Can we reach down into the transaction's xlog_ticket.t_cnt to find out how many were left? Hm. Maybe that doesn't matter, seeing as recovery basically single-steps through whatever it salvaged from the log, I think it's better for recovery to regrant every time it rolls the transaction because the amount of space required for a single step will (once we get the logres part sorted out) never be larger than the sum of the logres of all transactions that were running when we crashed. It also occurs to me (esp. given the earlier conversation about the transaction reservations used to recover intents) that in general we might need to finish one item to push the tail forward to free up space, so therefore log recovery should try to use as few resources as possible. Well, I'll try hardcoding logcount=1 and see what happens. :) --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx