On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > When xfs_defer_capture extracts the deferred ops and transaction state > from a transaction, it should record the transaction reservation type > from the old transaction so that when we continue the dfops chain, we > still use the same reservation parameters. > > This avoids a potential failure vector by ensuring that we never ask for > more log reservation space than we would have asked for had the system > not gone down. Nope, it does not do that. > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_defer.c | 5 +++++ > fs/xfs/libxfs/xfs_defer.h | 1 + > fs/xfs/xfs_log_recover.c | 4 ++-- > 3 files changed, 8 insertions(+), 2 deletions(-) > > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > index 85d70f1edc1c..c53443252389 100644 > --- a/fs/xfs/libxfs/xfs_defer.c > +++ b/fs/xfs/libxfs/xfs_defer.c > @@ -577,6 +577,11 @@ xfs_defer_capture( > dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used; > tp->t_blk_res = tp->t_blk_res_used; > > + /* Preserve the transaction reservation type. */ > + dfc->dfc_tres.tr_logres = tp->t_log_res; > + dfc->dfc_tres.tr_logcount = tp->t_log_count; > + dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; This means every child deferop takes a full tp->t_log_count reservation, whilst in memory the child reservation would ahve been handled by the parent via the log ticket unit count being decremented by one. Hence child deferops -never- run with the same maximal reservation that their parents held. The difference is that at runtime we are rolling transaction which regrant space from the initial reservation of (tp->t_log_count * tp->t_log_res) made a run time. i.e. the first child deferop that runs has a total log space grant of ((tp->t_log_count - 1) * tp->t_log_res), the second it is "- 2", and so on right down to when the log ticket runs out of initial reservation and so it goes to reserving a single unit (tp->t_log_res) at a time. Hence both the intents being recovered and all their children are over-reserving log space by using the default log count for the &M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial reservation being incorrect, the child reservations of the same size as the parent are definitely incorrect. They really should be allowed only a single unit reservation, and if the transaction rolls to process defer ops, it needs to regrant new log space during the commit process. Hence I think this can only be correct as: dfc->dfc_tres.tr_log_count = 1; Regardless of how many units the parent recovery reservation obtained. (Which I also think can only be correct as 1 because we don't know how many units of reservation space the parent had consumed when it was logged.) Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx