On Tue, Sep 29, 2020 at 10:43:31AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > When we replay unfinished intent items that have been recovered from the > log, it's possible that the replay will cause the creation of more > deferred work items. As outlined in commit 509955823cc9c ("xfs: log > recovery should replay deferred ops in order"), later work items have an > implicit ordering dependency on earlier work items. Therefore, recovery > must replay the items (both recovered and created) in the same order > that they would have been during normal operation. > > For log recovery, we enforce this ordering by using an empty transaction > to collect deferred ops that get created in the process of recovering a > log intent item to prevent them from being committed before the rest of > the recovered intent items. After we finish committing all the > recovered log items, we allocate a transaction with an enormous block > reservation, splice our huge list of created deferred ops into that > transaction, and commit it, thereby finishing all those ops. > > This is /really/ hokey -- it's the one place in XFS where we allow > nested transactions; the splicing of the defer ops list is is inelegant > and has to be done twice per recovery function; and the broken way we > handle inode pointers and block reservations cause subtle use-after-free > and allocator problems that will be fixed by this patch and the two > patches after it. > > Therefore, replace the hokey empty transaction with a structure designed > to capture each chain of deferred ops that are created as part of > recovering a single unfinished log intent. Finally, refactor the loop > that replays those chains to do so using one transaction per chain. > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_defer.c | 89 +++++++++++++++++++++++++++++++-- > fs/xfs/libxfs/xfs_defer.h | 19 +++++++ > fs/xfs/xfs_bmap_item.c | 16 +----- > fs/xfs/xfs_extfree_item.c | 7 +-- > fs/xfs/xfs_log_recover.c | 118 +++++++++++++++++++++++++------------------- > fs/xfs/xfs_refcount_item.c | 16 +----- > fs/xfs/xfs_rmap_item.c | 7 +-- > fs/xfs/xfs_trans.h | 3 + > 8 files changed, 184 insertions(+), 91 deletions(-) > > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > index 36c103c14bc9..85c371d29e8d 100644 > --- a/fs/xfs/libxfs/xfs_defer.c > +++ b/fs/xfs/libxfs/xfs_defer.c > @@ -549,14 +549,89 @@ xfs_defer_move( > * > * Create and log intent items for all the work that we're capturing so that we > * can be assured that the items will get replayed if the system goes down > - * before log recovery gets a chance to finish the work it put off. Then we > - * move the chain from stp to dtp. > + * before log recovery gets a chance to finish the work it put off. The entire > + * deferred ops state is transferred to the capture structure and the > + * transaction is then ready for the caller to commit it. If there are no > + * intent items to capture, this function returns NULL. > + */ > +static struct xfs_defer_capture * > +xfs_defer_ops_capture( > + struct xfs_trans *tp) > +{ > + struct xfs_defer_capture *dfc; > + > + if (list_empty(&tp->t_dfops)) > + return NULL; Nit: keeping the list_empty check in the caller would seems more obvious to me. Otherwise looks good: Reviewed-by: Christoph Hellwig <hch@xxxxxx>