Re: [PATCH] xfs: Fix agi&agf ABBA deadlock when performing rename with RENAME_WHITEOUT flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 13, 2019 at 07:20:46AM -0700, Darrick J. Wong wrote:
> On Tue, Aug 13, 2019 at 09:36:14AM -0400, Brian Foster wrote:
> > On Tue, Aug 13, 2019 at 07:17:33PM +0800, kaixuxia wrote:
> > > When performing rename operation with RENAME_WHITEOUT flag, we will
> > > hold AGF lock to allocate or free extents in manipulating the dirents
> > > firstly, and then doing the xfs_iunlink_remove() call last to hold
> > > AGI lock to modify the tmpfile info, so we the lock order AGI->AGF.
> > > 
> > 
> > IIUC, the whiteout use case is that we're renaming a file, but the
> > source dentry must be replaced with a magic whiteout inode rather than
> > be removed. Therefore, xfs_rename() allocates the whiteout inode as a
> > tmpfile first in a separate transaction, updates the target dentry with
> > the source inode, replaces the source dentry to point to the whiteout
> > inode and finally removes the whiteout inode from the unlinked list
> > (since it is a tmpfile). This leads to the problem described below
> > because the rename transaction ends up doing directory block allocs
> > (locking the AGF) followed by the unlinked list remove (locking the
> > AGI).
> > 
> > My understanding from reading the code is that this is primarly to
> > cleanly handle error scenarios. If anything fails after we've allocated
> > the whiteout tmpfile, it's simply left on the unlinked list and so the
> > filesystem remains in a consistent/recoverable state. Given that, the
> > solution here seems like overkill to me. For one, I thought background
> > unlinked list removal was already on our roadmap (Darrick might have
> > been looking at that and may already have a prototype as well). Also,
> > unlinked list removal occurs at log recovery time already. That's
> > somewhat of an existing purpose of the list, which makes a deferred
> > unlinked list removal operation superfluous in more traditional cases
> > where unlinked list removal doesn't require consistency with a directory
> > operation.
> 
> Not to mention this doesn't fix the problem for existing filesystems,
> because adding new log item types changes the on-disk log format and
> therefore requires an log incompat feature bit to prevent old kernels
> from trying to recover the log.
> 

Yeah..

> > Functional discussion aside.. from a complexity standpoint I'm wondering
> > if we could do something much more simple like acquire the AGI lock for
> > a whiteout inode earlier in xfs_rename(). For example, suppose we did
> > something like:
> > 
> > 	/*
> > 	 * Acquire the whiteout agi to preserve locking order in anticipation of
> > 	 * unlinked list removal.
> > 	 */
> > 	if (wip)
> > 		xfs_read_agi(mp, tp, XFS_INO_TO_AGNO(mp, wip->i_ino), &agibp);
> > 
> > ... after we allocate the transaction but before we do any directory ops
> > that can result in block allocations. Would that prevent the problem
> > you've observed?
> 
> I had the same thought, but fun question: if @wip is allocated in AG 1
> but the dirent blocks come from AG 0, is that a problem?
> 

Not sure.. I was thinking that locking order is only a problem here as
such if we are locking the AGI/AGF of the same AG. Otherwise AG locking
order comes into play only when you're locking the same type AG buffer
on multiple AGs. IOW, the above hack wouldn't have been necessary if
they end up in different AGs (but we know we'll be locking the AGI
anyways), but it's not clear to me that it's a problem if they aren't.

> Would it make more sense to expand the directory in one transaction,
> roll it, and add the actual directory entry after that?
> 

That might be a cleaner approach from a design standpoint so long as we
can hold whatever locks, etc. to guarantee the recently added space can
only be used by the pending dir operation. I suppose we'd have to handle
potential format conversions, too..

Brian

> --D
> 
> > Brian
> > 
> > > The big problem here is that we have an ordering constraint on AGF
> > > and AGI locking - inode allocation locks the AGI, then can allocate
> > > a new extent for new inodes, locking the AGF after the AGI. Hence
> > > the ordering that is imposed by other parts of the code is AGI before
> > > AGF. So we get the ABBA agi&agf deadlock here.
> > > 
> > > Process A:
> > > Call trace:
> > >  ? __schedule+0x2bd/0x620
> > >  schedule+0x33/0x90
> > >  schedule_timeout+0x17d/0x290
> > >  __down_common+0xef/0x125
> > >  ? xfs_buf_find+0x215/0x6c0 [xfs]
> > >  down+0x3b/0x50
> > >  xfs_buf_lock+0x34/0xf0 [xfs]
> > >  xfs_buf_find+0x215/0x6c0 [xfs]
> > >  xfs_buf_get_map+0x37/0x230 [xfs]
> > >  xfs_buf_read_map+0x29/0x190 [xfs]
> > >  xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> > >  xfs_read_agf+0xa6/0x180 [xfs]
> > >  ? schedule_timeout+0x17d/0x290
> > >  xfs_alloc_read_agf+0x52/0x1f0 [xfs]
> > >  xfs_alloc_fix_freelist+0x432/0x590 [xfs]
> > >  ? down+0x3b/0x50
> > >  ? xfs_buf_lock+0x34/0xf0 [xfs]
> > >  ? xfs_buf_find+0x215/0x6c0 [xfs]
> > >  xfs_alloc_vextent+0x301/0x6c0 [xfs]
> > >  xfs_ialloc_ag_alloc+0x182/0x700 [xfs]
> > >  ? _xfs_trans_bjoin+0x72/0xf0 [xfs]
> > >  xfs_dialloc+0x116/0x290 [xfs]
> > >  xfs_ialloc+0x6d/0x5e0 [xfs]
> > >  ? xfs_log_reserve+0x165/0x280 [xfs]
> > >  xfs_dir_ialloc+0x8c/0x240 [xfs]
> > >  xfs_create+0x35a/0x610 [xfs]
> > >  xfs_generic_create+0x1f1/0x2f0 [xfs]
> > >  ...
> > > 
> > > Process B:
> > > Call trace:
> > >  ? __schedule+0x2bd/0x620
> > >  ? xfs_bmapi_allocate+0x245/0x380 [xfs]
> > >  schedule+0x33/0x90
> > >  schedule_timeout+0x17d/0x290
> > >  ? xfs_buf_find+0x1fd/0x6c0 [xfs]
> > >  __down_common+0xef/0x125
> > >  ? xfs_buf_get_map+0x37/0x230 [xfs]
> > >  ? xfs_buf_find+0x215/0x6c0 [xfs]
> > >  down+0x3b/0x50
> > >  xfs_buf_lock+0x34/0xf0 [xfs]
> > >  xfs_buf_find+0x215/0x6c0 [xfs]
> > >  xfs_buf_get_map+0x37/0x230 [xfs]
> > >  xfs_buf_read_map+0x29/0x190 [xfs]
> > >  xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> > >  xfs_read_agi+0xa8/0x160 [xfs]
> > >  xfs_iunlink_remove+0x6f/0x2a0 [xfs]
> > >  ? current_time+0x46/0x80
> > >  ? xfs_trans_ichgtime+0x39/0xb0 [xfs]
> > >  xfs_rename+0x57a/0xae0 [xfs]
> > >  xfs_vn_rename+0xe4/0x150 [xfs]
> > >  ...
> > > 
> > > In this patch we make the unlinked list removal a deferred operation,
> > > i.e. log an iunlink remove intent and then do it after the RENAME_WHITEOUT
> > > transaction has committed, and the iunlink remove intention and done
> > > log items are provided.
> > > 
> > > Change the ordering of the operations in the xfs_rename() function
> > > to hold the AGF lock in the RENAME_WHITEOUT transaction and hold the
> > > AGI lock in it's own transaction to match that of the rest of the code.
> > > 
> > > Signed-off-by: kaixuxia <kaixuxia@xxxxxxxxxxx>
> > > ---
> > >  fs/xfs/Makefile                |   1 +
> > >  fs/xfs/libxfs/xfs_defer.c      |   1 +
> > >  fs/xfs/libxfs/xfs_defer.h      |   2 +
> > >  fs/xfs/libxfs/xfs_log_format.h |  27 ++-
> > >  fs/xfs/xfs_inode.c             |  36 +---
> > >  fs/xfs/xfs_inode.h             |   3 +
> > >  fs/xfs/xfs_iunlinkrm_item.c    | 458 +++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_iunlinkrm_item.h    |  67 ++++++
> > >  fs/xfs/xfs_log.c               |   2 +
> > >  fs/xfs/xfs_log_recover.c       | 148 +++++++++++++
> > >  fs/xfs/xfs_super.c             |  17 ++
> > >  fs/xfs/xfs_trans.h             |   2 +
> > >  12 files changed, 733 insertions(+), 31 deletions(-)
> > >  create mode 100644 fs/xfs/xfs_iunlinkrm_item.c
> > >  create mode 100644 fs/xfs/xfs_iunlinkrm_item.h
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 06b68b6..9d5012e 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -106,6 +106,7 @@ xfs-y				+= xfs_log.o \
> > >  				   xfs_inode_item.o \
> > >  				   xfs_refcount_item.o \
> > >  				   xfs_rmap_item.o \
> > > +				   xfs_iunlinkrm_item.o \
> > >  				   xfs_log_recover.o \
> > >  				   xfs_trans_ail.o \
> > >  				   xfs_trans_buf.o
> > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > > index eb2be2a..a0f0a3d 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.c
> > > +++ b/fs/xfs/libxfs/xfs_defer.c
> > > @@ -176,6 +176,7 @@
> > >  	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
> > >  	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
> > >  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> > > +	[XFS_DEFER_OPS_TYPE_IUNRE]	= &xfs_iunlink_remove_defer_type,
> > >  };
> > > 
> > >  /*
> > > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > > index 7c28d76..9e91a36 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.h
> > > +++ b/fs/xfs/libxfs/xfs_defer.h
> > > @@ -17,6 +17,7 @@ enum xfs_defer_ops_type {
> > >  	XFS_DEFER_OPS_TYPE_RMAP,
> > >  	XFS_DEFER_OPS_TYPE_FREE,
> > >  	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> > > +	XFS_DEFER_OPS_TYPE_IUNRE,
> > >  	XFS_DEFER_OPS_TYPE_MAX,
> > >  };
> > > 
> > > @@ -60,5 +61,6 @@ struct xfs_defer_op_type {
> > >  extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
> > >  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
> > >  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> > > +extern const struct xfs_defer_op_type xfs_iunlink_remove_defer_type;
> > > 
> > >  #endif /* __XFS_DEFER_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> > > index e5f97c6..dc85b28 100644
> > > --- a/fs/xfs/libxfs/xfs_log_format.h
> > > +++ b/fs/xfs/libxfs/xfs_log_format.h
> > > @@ -117,7 +117,9 @@ struct xfs_unmount_log_format {
> > >  #define XLOG_REG_TYPE_CUD_FORMAT	24
> > >  #define XLOG_REG_TYPE_BUI_FORMAT	25
> > >  #define XLOG_REG_TYPE_BUD_FORMAT	26
> > > -#define XLOG_REG_TYPE_MAX		26
> > > +#define XLOG_REG_TYPE_IRI_FORMAT	27
> > > +#define XLOG_REG_TYPE_IRD_FORMAT	28
> > > +#define XLOG_REG_TYPE_MAX		28
> > > 
> > >  /*
> > >   * Flags to log operation header
> > > @@ -240,6 +242,8 @@ struct xfs_unmount_log_format {
> > >  #define	XFS_LI_CUD		0x1243
> > >  #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
> > >  #define	XFS_LI_BUD		0x1245
> > > +#define	XFS_LI_IRI		0x1246	/* iunlink remove intent */
> > > +#define	XFS_LI_IRD		0x1247
> > > 
> > >  #define XFS_LI_TYPE_DESC \
> > >  	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> > > @@ -255,7 +259,9 @@ struct xfs_unmount_log_format {
> > >  	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
> > >  	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
> > >  	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> > > -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> > > +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> > > +	{ XFS_LI_IRI,		"XFS_LI_IRI" }, \
> > > +	{ XFS_LI_IRD,		"XFS_LI_IRD" }
> > > 
> > >  /*
> > >   * Inode Log Item Format definitions.
> > > @@ -773,6 +779,23 @@ struct xfs_bud_log_format {
> > >  };
> > > 
> > >  /*
> > > + * This is the structure used to lay out iri&ird log item in the log.
> > > + */
> > > +typedef struct xfs_iri_log_format {
> > > +	uint16_t		iri_type;	/* iri log item type */
> > > +	uint16_t		iri_size;	/* size of this item */
> > > +	uint64_t		iri_id;		/* id of corresponding iri */
> > > +	uint64_t		wip_ino;	/* inode number */
> > > +} xfs_iri_log_format_t;
> > > +
> > > +typedef struct xfs_ird_log_format {
> > > +	uint16_t		ird_type;	/* ird log item type */
> > > +	uint16_t		ird_size;	/* size of this item */
> > > +	uint64_t		ird_iri_id;	/* id of corresponding iri */
> > > +	uint64_t		wip_ino;	/* inode number */
> > > +} xfs_ird_log_format_t;
> > > +
> > > +/*
> > >   * Dquot Log format definitions.
> > >   *
> > >   * The first two fields must be the type and size fitting into
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 6467d5e..7bb3102 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -35,6 +35,7 @@
> > >  #include "xfs_log.h"
> > >  #include "xfs_bmap_btree.h"
> > >  #include "xfs_reflink.h"
> > > +#include "xfs_iunlinkrm_item.h"
> > > 
> > >  kmem_zone_t *xfs_inode_zone;
> > > 
> > > @@ -46,7 +47,6 @@
> > > 
> > >  STATIC int xfs_iflush_int(struct xfs_inode *, struct xfs_buf *);
> > >  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
> > > -STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> > > 
> > >  /*
> > >   * helper function to extract extent size hint from inode
> > > @@ -1110,7 +1110,7 @@
> > >  /*
> > >   * Increment the link count on an inode & log the change.
> > >   */
> > > -static void
> > > +void
> > >  xfs_bumplink(
> > >  	xfs_trans_t *tp,
> > >  	xfs_inode_t *ip)
> > > @@ -2406,7 +2406,7 @@ struct xfs_iunlink {
> > >  /*
> > >   * Pull the on-disk inode from the AGI unlinked list.
> > >   */
> > > -STATIC int
> > > +int
> > >  xfs_iunlink_remove(
> > >  	struct xfs_trans	*tp,
> > >  	struct xfs_inode	*ip)
> > > @@ -3261,8 +3261,6 @@ struct xfs_iunlink {
> > >  	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
> > >  	if (target_ip)
> > >  		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
> > > -	if (wip)
> > > -		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > > 
> > >  	/*
> > >  	 * If we are using project inheritance, we only allow renames
> > > @@ -3417,35 +3415,15 @@ struct xfs_iunlink {
> > >  	if (error)
> > >  		goto out_trans_cancel;
> > > 
> > > -	/*
> > > -	 * For whiteouts, we need to bump the link count on the whiteout inode.
> > > -	 * This means that failures all the way up to this point leave the inode
> > > -	 * on the unlinked list and so cleanup is a simple matter of dropping
> > > -	 * the remaining reference to it. If we fail here after bumping the link
> > > -	 * count, we're shutting down the filesystem so we'll never see the
> > > -	 * intermediate state on disk.
> > > -	 */
> > > -	if (wip) {
> > > -		ASSERT(VFS_I(wip)->i_nlink == 0);
> > > -		xfs_bumplink(tp, wip);
> > > -		error = xfs_iunlink_remove(tp, wip);
> > > -		if (error)
> > > -			goto out_trans_cancel;
> > > -		xfs_trans_log_inode(tp, wip, XFS_ILOG_CORE);
> > > -
> > > -		/*
> > > -		 * Now we have a real link, clear the "I'm a tmpfile" state
> > > -		 * flag from the inode so it doesn't accidentally get misused in
> > > -		 * future.
> > > -		 */
> > > -		VFS_I(wip)->i_state &= ~I_LINKABLE;
> > > -	}
> > > -
> > >  	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> > >  	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
> > >  	if (new_parent)
> > >  		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
> > > 
> > > +	/* add the iunlink remove intent to the tp */
> > > +	if (wip)
> > > +		xfs_iunlink_remove_add(tp, wip);
> > > +
> > >  	error = xfs_finish_rename(tp);
> > >  	if (wip)
> > >  		xfs_irele(wip);
> > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > index 558173f..f8c30ca 100644
> > > --- a/fs/xfs/xfs_inode.h
> > > +++ b/fs/xfs/xfs_inode.h
> > > @@ -20,6 +20,7 @@
> > >  struct xfs_mount;
> > >  struct xfs_trans;
> > >  struct xfs_dquot;
> > > +struct xfs_trans;
> > > 
> > >  typedef struct xfs_inode {
> > >  	/* Inode linking and identification information. */
> > > @@ -414,6 +415,7 @@ enum layout_break_reason {
> > >  void		xfs_inactive(struct xfs_inode *ip);
> > >  int		xfs_lookup(struct xfs_inode *dp, struct xfs_name *name,
> > >  			   struct xfs_inode **ipp, struct xfs_name *ci_name);
> > > +void		xfs_bumplink(struct xfs_trans *, struct xfs_inode *);
> > >  int		xfs_create(struct xfs_inode *dp, struct xfs_name *name,
> > >  			   umode_t mode, dev_t rdev, struct xfs_inode **ipp);
> > >  int		xfs_create_tmpfile(struct xfs_inode *dp, umode_t mode,
> > > @@ -436,6 +438,7 @@ int		xfs_rename(struct xfs_inode *src_dp, struct xfs_name *src_name,
> > >  uint		xfs_ilock_attr_map_shared(struct xfs_inode *);
> > > 
> > >  uint		xfs_ip2xflags(struct xfs_inode *);
> > > +int		xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> > >  int		xfs_ifree(struct xfs_trans *, struct xfs_inode *);
> > >  int		xfs_itruncate_extents_flags(struct xfs_trans **,
> > >  				struct xfs_inode *, int, xfs_fsize_t, int);
> > > diff --git a/fs/xfs/xfs_iunlinkrm_item.c b/fs/xfs/xfs_iunlinkrm_item.c
> > > new file mode 100644
> > > index 0000000..4e38329
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_iunlinkrm_item.c
> > > @@ -0,0 +1,458 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +/*
> > > + * Copyright (C) 2019 Tencent.  All Rights Reserved.
> > > + * Author: Kaixuxia <kaixuxia@xxxxxxxxxxx>
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_bit.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_trans_priv.h"
> > > +#include "xfs_log.h"
> > > +#include "xfs_alloc.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_icache.h"
> > > +#include "xfs_iunlinkrm_item.h"
> > > +
> > > +kmem_zone_t	*xfs_iri_zone;
> > > +kmem_zone_t	*xfs_ird_zone;
> > > +
> > > +static inline struct xfs_iri_log_item *IRI_ITEM(struct xfs_log_item *lip)
> > > +{
> > > +	return container_of(lip, struct xfs_iri_log_item, iri_item);
> > > +}
> > > +
> > > +void
> > > +xfs_iri_item_free(
> > > +	struct xfs_iri_log_item *irip)
> > > +{
> > > +	kmem_zone_free(xfs_iri_zone, irip);
> > > +}
> > > +
> > > +/*
> > > + * Freeing the iri requires that we remove it from the AIL if it has already
> > > + * been placed there. However, the IRI may not yet have been placed in the AIL
> > > + * when called by xfs_iri_release() from IRD processing due to the ordering of
> > > + * committed vs unpin operations in bulk insert operations. Hence the reference
> > > + * count to ensure only the last caller frees the IRI.
> > > + */
> > > +void
> > > +xfs_iri_release(
> > > +	struct xfs_iri_log_item *irip)
> > > +{
> > > +	ASSERT(atomic_read(&irip->iri_refcount) > 0);
> > > +	if (atomic_dec_and_test(&irip->iri_refcount)) {
> > > +		xfs_trans_ail_remove(&irip->iri_item, SHUTDOWN_LOG_IO_ERROR);
> > > +		xfs_iri_item_free(irip);
> > > +	}
> > > +}
> > > +
> > > +static inline int
> > > +xfs_iri_item_sizeof(
> > > +	struct xfs_iri_log_item *irip)
> > > +{
> > > +	return sizeof(struct xfs_iri_log_format);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_iri_item_size(
> > > +	struct xfs_log_item	*lip,
> > > +	int			*nvecs,
> > > +	int			*nbytes)
> > > +{
> > > +	*nvecs += 1;
> > > +	*nbytes += xfs_iri_item_sizeof(IRI_ITEM(lip));
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_iri_item_format(
> > > +	struct xfs_log_item	*lip,
> > > +	struct xfs_log_vec	*lv)
> > > +{
> > > +	struct xfs_iri_log_item	*irip = IRI_ITEM(lip);
> > > +	struct xfs_log_iovec	*vecp = NULL;
> > > +
> > > +	irip->iri_format.iri_type = XFS_LI_IRI;
> > > +	irip->iri_format.iri_size = 1;
> > > +
> > > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_IRI_FORMAT,
> > > +			&irip->iri_format,
> > > +			xfs_iri_item_sizeof(irip));
> > > +}
> > > +
> > > +/*
> > > + * The unpin operation is the last place an IRI is manipulated in the log. It is
> > > + * either inserted in the AIL or aborted in the event of a log I/O error. In
> > > + * either case, the IRI transaction has been successfully committed to make it
> > > + * this far. Therefore, we expect whoever committed the IRI to either construct
> > > + * and commit the IRD or drop the IRD's reference in the event of error. Simply
> > > + * drop the log's IRI reference now that the log is done with it.
> > > + */
> > > +STATIC void
> > > +xfs_iri_item_unpin(
> > > +	struct xfs_log_item	*lip,
> > > +	int			remove)
> > > +{
> > > +	struct xfs_iri_log_item *irip = IRI_ITEM(lip);
> > > +	xfs_iri_release(irip);
> > > +}
> > > +
> > > +/*
> > > + * The IRI has been either committed or aborted if the transaction has been
> > > + * cancelled. If the transaction was cancelled, an IRD isn't going to be
> > > + * constructed and thus we free the IRI here directly.
> > > + */
> > > +STATIC void
> > > +xfs_iri_item_release(
> > > +	struct xfs_log_item     *lip)
> > > +{
> > > +	xfs_iri_release(IRI_ITEM(lip));
> > > +}
> > > +
> > > +/*
> > > + * This is the ops vector shared by all iri log items.
> > > + */
> > > +static const struct xfs_item_ops xfs_iri_item_ops = {
> > > +	.iop_size	= xfs_iri_item_size,
> > > +	.iop_format	= xfs_iri_item_format,
> > > +	.iop_unpin	= xfs_iri_item_unpin,
> > > +	.iop_release	= xfs_iri_item_release,
> > > +};
> > > +
> > > +/*
> > > + * Allocate and initialize an iri item with the given wip ino.
> > > + */
> > > +struct xfs_iri_log_item *
> > > +xfs_iri_init(struct xfs_mount  *mp,
> > > +	     uint		count)
> > > +{
> > > +	struct xfs_iri_log_item *irip;
> > > +
> > > +	irip = kmem_zone_zalloc(xfs_iri_zone, KM_SLEEP);
> > > +
> > > +	xfs_log_item_init(mp, &irip->iri_item, XFS_LI_IRI, &xfs_iri_item_ops);
> > > +	irip->iri_format.iri_id = (uintptr_t)(void *)irip;
> > > +	atomic_set(&irip->iri_refcount, 2);
> > > +
> > > +	return irip;
> > > +}
> > > +
> > > +static inline struct xfs_ird_log_item *IRD_ITEM(struct xfs_log_item *lip)
> > > +{
> > > +	return container_of(lip, struct xfs_ird_log_item, ird_item);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_ird_item_free(struct xfs_ird_log_item *irdp)
> > > +{
> > > +	kmem_zone_free(xfs_ird_zone, irdp);
> > > +}
> > > +
> > > +/*
> > > + * This returns the number of iovecs needed to log the given ird item.
> > > + * We only need 1 iovec for an ird item.  It just logs the ird_log_format
> > > + * structure.
> > > + */
> > > +STATIC void
> > > +xfs_ird_item_size(
> > > +	struct xfs_log_item	*lip,
> > > +	int			*nvecs,
> > > +	int			*nbytes)
> > > +{
> > > +	*nvecs += 1;
> > > +	*nbytes += sizeof(struct xfs_ird_log_format);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_ird_item_format(
> > > +	struct xfs_log_item	*lip,
> > > +	struct xfs_log_vec	*lv)
> > > +{
> > > +	struct xfs_ird_log_item *irdp = IRD_ITEM(lip);
> > > +	struct xfs_log_iovec	*vecp = NULL;
> > > +
> > > +	irdp->ird_format.ird_type = XFS_LI_IRD;
> > > +	irdp->ird_format.ird_size = 1;
> > > +
> > > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_IRD_FORMAT, &irdp->ird_format,
> > > +			sizeof(struct xfs_ird_log_format));
> > > +}
> > > +
> > > +/*
> > > + * The IRD is either committed or aborted if the transaction is cancelled. If
> > > + * the transaction is cancelled, drop our reference to the IRI and free the
> > > + * IRD.
> > > + */
> > > +STATIC void
> > > +xfs_ird_item_release(
> > > +	struct xfs_log_item	*lip)
> > > +{
> > > +	struct xfs_ird_log_item	*irdp = IRD_ITEM(lip);
> > > +
> > > +	xfs_iri_release(irdp->ird_irip);
> > > +	xfs_ird_item_free(irdp);
> > > +}
> > > +
> > > +static const struct xfs_item_ops xfs_ird_item_ops = {
> > > +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> > > +	.iop_size	= xfs_ird_item_size,
> > > +	.iop_format	= xfs_ird_item_format,
> > > +	.iop_release	= xfs_ird_item_release,
> > > +};
> > > +
> > > +static struct xfs_ird_log_item *
> > > +xfs_trans_get_ird(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_iri_log_item		*irip)
> > > +{
> > > +	xfs_ird_log_item_t	*irdp;
> > > +
> > > +	ASSERT(tp != NULL);
> > > +
> > > +	irdp = kmem_zone_zalloc(xfs_ird_zone, KM_SLEEP);
> > > +	xfs_log_item_init(tp->t_mountp, &irdp->ird_item, XFS_LI_IRD,
> > > +			  &xfs_ird_item_ops);
> > > +	irdp->ird_irip = irip;
> > > +	irdp->ird_format.wip_ino = irip->iri_format.wip_ino;
> > > +	irdp->ird_format.ird_iri_id = irip->iri_format.iri_id;
> > > +
> > > +	xfs_trans_add_item(tp, &irdp->ird_item);
> > > +	return irdp;
> > > +}
> > > +
> > > +/* record a iunlink remove intent */
> > > +int
> > > +xfs_iunlink_remove_add(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_inode	*wip)
> > > +{
> > > +	struct xfs_iunlink_remove_intent	*ii;
> > > +
> > > +	ii = kmem_alloc(sizeof(struct xfs_iunlink_remove_intent),
> > > +			KM_SLEEP | KM_NOFS);
> > > +	INIT_LIST_HEAD(&ii->ri_list);
> > > +	ii->wip = wip;
> > > +
> > > +	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_IUNRE, &ii->ri_list);
> > > +	return 0;
> > > +}
> > > +
> > > +/* Sort iunlink remove intents by AG. */
> > > +static int
> > > +xfs_iunlink_remove_diff_items(
> > > +	void				*priv,
> > > +	struct list_head		*a,
> > > +	struct list_head		*b)
> > > +{
> > > +	struct xfs_mount			*mp = priv;
> > > +	struct xfs_iunlink_remove_intent	*ra;
> > > +	struct xfs_iunlink_remove_intent	*rb;
> > > +
> > > +	ra = container_of(a, struct xfs_iunlink_remove_intent, ri_list);
> > > +	rb = container_of(b, struct xfs_iunlink_remove_intent, ri_list);
> > > +	return	XFS_INO_TO_AGNO(mp, ra->wip->i_ino) -
> > > +		XFS_INO_TO_AGNO(mp, rb->wip->i_ino);
> > > +}
> > > +
> > > +/* Get an IRI */
> > > +STATIC void *
> > > +xfs_iunlink_remove_create_intent(
> > > +	struct xfs_trans		*tp,
> > > +	unsigned int			count)
> > > +{
> > > +	struct xfs_iri_log_item		*irip;
> > > +
> > > +	ASSERT(tp != NULL);
> > > +	ASSERT(count == 1);
> > > +
> > > +	irip = xfs_iri_init(tp->t_mountp, count);
> > > +	ASSERT(irip != NULL);
> > > +
> > > +	/*
> > > +	 * Get a log_item_desc to point at the new item.
> > > +	 */
> > > +	xfs_trans_add_item(tp, &irip->iri_item);
> > > +	return irip;
> > > +}
> > > +
> > > +/* Log a iunlink remove to the intent item. */
> > > +STATIC void
> > > +xfs_iunlink_remove_log_item(
> > > +	struct xfs_trans		*tp,
> > > +	void				*intent,
> > > +	struct list_head		*item)
> > > +{
> > > +	struct xfs_iri_log_item			*irip = intent;
> > > +	struct xfs_iunlink_remove_intent	*iunre;
> > > +
> > > +	iunre = container_of(item, struct xfs_iunlink_remove_intent, ri_list);
> > > +
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	set_bit(XFS_LI_DIRTY, &irip->iri_item.li_flags);
> > > +
> > > +	irip->iri_format.wip_ino = (uint64_t)(iunre->wip->i_ino);
> > > +}
> > > +
> > > +/* Get an IRD so we can process all the deferred iunlink remove. */
> > > +STATIC void *
> > > +xfs_iunlink_remove_create_done(
> > > +	struct xfs_trans		*tp,
> > > +	void				*intent,
> > > +	unsigned int			count)
> > > +{
> > > +	return xfs_trans_get_ird(tp, intent);
> > > +}
> > > +
> > > +/*
> > > + * For whiteouts, we need to bump the link count on the whiteout inode.
> > > + * This means that failures all the way up to this point leave the inode
> > > + * on the unlinked list and so cleanup is a simple matter of dropping
> > > + * the remaining reference to it. If we fail here after bumping the link
> > > + * count, we're shutting down the filesystem so we'll never see the
> > > + * intermediate state on disk.
> > > + */
> > > +static int
> > > +xfs_trans_log_finish_iunlink_remove(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_ird_log_item		*irdp,
> > > +	struct xfs_inode		*wip)
> > > +{
> > > +	int 	error;
> > > +
> > > +	ASSERT(xfs_isilocked(wip, XFS_ILOCK_EXCL));
> > > +
> > > +	xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > > +
> > > +	ASSERT(VFS_I(wip)->i_nlink == 0);
> > > +	xfs_bumplink(tp, wip);
> > > +	error = xfs_iunlink_remove(tp, wip);
> > > +	xfs_trans_log_inode(tp, wip, XFS_ILOG_CORE);
> > > +	/*
> > > +	 * Now we have a real link, clear the "I'm a tmpfile" state
> > > +	 * flag from the inode so it doesn't accidentally get misused in
> > > +	 * future.
> > > +	 */
> > > +	VFS_I(wip)->i_state &= ~I_LINKABLE;
> > > +
> > > +	/*
> > > +	 * Mark the transaction dirty, even on error. This ensures the
> > > +	 * transaction is aborted, which:
> > > +	 *
> > > +	 * 1.) releases the IRI and frees the IRD
> > > +	 * 2.) shuts down the filesystem
> > > +	 */
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	set_bit(XFS_LI_DIRTY, &irdp->ird_item.li_flags);
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/* Process a deferred iunlink remove. */
> > > +STATIC int
> > > +xfs_iunlink_remove_finish_item(
> > > +	struct xfs_trans		*tp,
> > > +	struct list_head		*item,
> > > +	void				*done_item,
> > > +	void				**state)
> > > +{
> > > +	struct xfs_iunlink_remove_intent	*iunre;
> > > +	int					error;
> > > +
> > > +	iunre = container_of(item, struct xfs_iunlink_remove_intent, ri_list);
> > > +	error = xfs_trans_log_finish_iunlink_remove(tp, done_item,
> > > +			iunre->wip);
> > > +	kmem_free(iunre);
> > > +	return error;
> > > +}
> > > +
> > > +/* Abort all pending IRIs. */
> > > +STATIC void
> > > +xfs_iunlink_remove_abort_intent(
> > > +	void		*intent)
> > > +{
> > > +	xfs_iri_release(intent);
> > > +}
> > > +
> > > +/* Cancel a deferred iunlink remove. */
> > > +STATIC void
> > > +xfs_iunlink_remove_cancel_item(
> > > +	struct list_head		*item)
> > > +{
> > > +	struct xfs_iunlink_remove_intent	*iunre;
> > > +
> > > +	iunre = container_of(item, struct xfs_iunlink_remove_intent, ri_list);
> > > +	kmem_free(iunre);
> > > +}
> > > +
> > > +const struct xfs_defer_op_type xfs_iunlink_remove_defer_type = {
> > > +	.diff_items	= xfs_iunlink_remove_diff_items,
> > > +	.create_intent	= xfs_iunlink_remove_create_intent,
> > > +	.abort_intent	= xfs_iunlink_remove_abort_intent,
> > > +	.log_item	= xfs_iunlink_remove_log_item,
> > > +	.create_done	= xfs_iunlink_remove_create_done,
> > > +	.finish_item	= xfs_iunlink_remove_finish_item,
> > > +	.cancel_item	= xfs_iunlink_remove_cancel_item,
> > > +};
> > > +
> > > +/*
> > > + * Process a iunlink remove intent item that was recovered from the log.
> > > + */
> > > +int
> > > +xfs_iri_recover(
> > > +	struct xfs_trans		*parent_tp,
> > > +	struct xfs_iri_log_item		*irip)
> > > +{
> > > +	int				error = 0;
> > > +	struct xfs_trans		*tp;
> > > +	xfs_ino_t			ino;
> > > +	struct xfs_inode		*ip;
> > > +	struct xfs_mount		*mp = parent_tp->t_mountp;
> > > +	struct xfs_ird_log_item		*irdp;
> > > +
> > > +	ASSERT(!test_bit(XFS_IRI_RECOVERED, &irip->iri_flags));
> > > +
> > > +	ino = irip->iri_format.wip_ino;
> > > +	if (ino == NULLFSINO || !xfs_verify_dir_ino(mp, ino)) {
> > > +		xfs_alert(mp, "IRI recover used bad inode ino 0x%llx!", ino);
> > > +		set_bit(XFS_IRI_RECOVERED, &irip->iri_flags);
> > > +		xfs_iri_release(irip);
> > > +		return -EIO;
> > > +	}
> > > +	error = xfs_iget(mp, NULL, ino, 0, 0, &ip);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ifree, 0, 0, 0, &tp);
> > > +	if (error)
> > > +		return error;
> > > +	irdp = xfs_trans_get_ird(tp, irip);
> > > +
> > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > +
> > > +	ASSERT(VFS_I(ip)->i_nlink == 0);
> > > +	VFS_I(ip)->i_state |= I_LINKABLE;
> > > +	xfs_bumplink(tp, ip);
> > > +	error = xfs_iunlink_remove(tp, ip);
> > > +	if (error)
> > > +		goto abort_error;
> > > +	VFS_I(ip)->i_state &= ~I_LINKABLE;
> > > +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > > +
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	set_bit(XFS_LI_DIRTY, &irdp->ird_item.li_flags);
> > > +
> > > +	set_bit(XFS_IRI_RECOVERED, &irip->iri_flags);
> > > +	error = xfs_trans_commit(tp);
> > > +	return error;
> > > +
> > > +abort_error:
> > > +	xfs_trans_cancel(tp);
> > > +	return error;
> > > +}
> > > diff --git a/fs/xfs/xfs_iunlinkrm_item.h b/fs/xfs/xfs_iunlinkrm_item.h
> > > new file mode 100644
> > > index 0000000..54c4ca3
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_iunlinkrm_item.h
> > > @@ -0,0 +1,67 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +/*
> > > + * Copyright (C) 2019 Tencent.  All Rights Reserved.
> > > + * Author: Kaixuxia <kaixuxia@xxxxxxxxxxx>
> > > + */
> > > +#ifndef	__XFS_IUNLINKRM_ITEM_H__
> > > +#define	__XFS_IUNLINKRM_ITEM_H__
> > > +
> > > +/*
> > > + * When performing rename operation with RENAME_WHITEOUT flag, we will hold AGF lock to
> > > + * allocate or free extents in manipulating the dirents firstly, and then doing the
> > > + * xfs_iunlink_remove() call last to hold AGI lock to modify the tmpfile info, so we the
> > > + * lock order AGI->AGF.
> > > + *
> > > + * The big problem here is that we have an ordering constraint on AGF and AGI locking -
> > > + * inode allocation locks the AGI, then can allocate a new extent for new inodes, locking
> > > + * the AGF after the AGI. Hence the ordering that is imposed by other parts of the code
> > > + * is AGI before AGF. So we get the ABBA agi&agf deadlock here.
> > > + *
> > > + * So make the unlinked list removal a deferred operation, i.e. log an iunlink remove intent
> > > + * and then do it after the RENAME_WHITEOUT transaction has committed, and the iunlink remove
> > > + * intention(IRI) and done log items(IRD) are provided.
> > > + */
> > > +
> > > +/* kernel only IRI/IRD definitions */
> > > +
> > > +struct xfs_mount;
> > > +struct kmem_zone;
> > > +struct xfs_inode;
> > > +
> > > +/*
> > > + * Define IRI flag bits. Manipulated by set/clear/test_bit operators.
> > > + */
> > > +#define	XFS_IRI_RECOVERED		1
> > > +
> > > +/* This is the "iunlink remove intention" log item. It is used in conjunction
> > > + * with the "iunlink remove done" log item described below.
> > > + */
> > > +typedef struct xfs_iri_log_item {
> > > +	struct xfs_log_item	iri_item;
> > > +	atomic_t		iri_refcount;
> > > +	unsigned long		iri_flags;
> > > +	xfs_iri_log_format_t	iri_format;
> > > +} xfs_iri_log_item_t;
> > > +
> > > +/* This is the "iunlink remove done" log item. */
> > > +typedef struct xfs_ird_log_item {
> > > +	struct xfs_log_item	ird_item;
> > > +	xfs_iri_log_item_t	*ird_irip;
> > > +	xfs_ird_log_format_t	ird_format;
> > > +} xfs_ird_log_item_t;
> > > +
> > > +struct xfs_iunlink_remove_intent {
> > > +	struct list_head		ri_list;
> > > +	struct xfs_inode		*wip;
> > > +};
> > > +
> > > +extern struct kmem_zone	*xfs_iri_zone;
> > > +extern struct kmem_zone	*xfs_ird_zone;
> > > +
> > > +struct xfs_iri_log_item	*xfs_iri_init(struct xfs_mount *, uint);
> > > +void xfs_iri_item_free(struct xfs_iri_log_item *);
> > > +void xfs_iri_release(struct xfs_iri_log_item *);
> > > +int xfs_iri_recover(struct xfs_trans *, struct xfs_iri_log_item *);
> > > +int xfs_iunlink_remove_add(struct xfs_trans *, struct xfs_inode *);
> > > +
> > > +#endif	/* __XFS_IUNLINKRM_ITEM_H__ */
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index 00e9f5c..f87f510 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -2005,6 +2005,8 @@ STATIC void xlog_state_done_syncing(
> > >  	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
> > >  	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
> > >  	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> > > +	    REG_TYPE_STR(IRI_FORMAT, "iri_format"),
> > > +	    REG_TYPE_STR(IRD_FORMAT, "ird_format"),
> > >  	};
> > >  	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
> > >  #undef REG_TYPE_STR
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index 13d1d3e..a916f40 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -33,6 +33,7 @@
> > >  #include "xfs_buf_item.h"
> > >  #include "xfs_refcount_item.h"
> > >  #include "xfs_bmap_item.h"
> > > +#include "xfs_iunlinkrm_item.h"
> > > 
> > >  #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
> > > 
> > > @@ -1885,6 +1886,8 @@ struct xfs_buf_cancel {
> > >  		case XFS_LI_CUD:
> > >  		case XFS_LI_BUI:
> > >  		case XFS_LI_BUD:
> > > +		case XFS_LI_IRI:
> > > +		case XFS_LI_IRD:
> > >  			trace_xfs_log_recover_item_reorder_tail(log,
> > >  							trans, item, pass);
> > >  			list_move_tail(&item->ri_list, &inode_list);
> > > @@ -3752,6 +3755,96 @@ struct xfs_buf_cancel {
> > >  }
> > > 
> > >  /*
> > > + * This routine is called to create an in-core iunlink remove intent
> > > + * item from the iri format structure which was logged on disk.
> > > + * It allocates an in-core iri, copies the inode from the format
> > > + * structure into it, and adds the iri to the AIL with the given
> > > + * LSN.
> > > + */
> > > +STATIC int
> > > +xlog_recover_iri_pass2(
> > > +	struct xlog			*log,
> > > +	struct xlog_recover_item	*item,
> > > +	xfs_lsn_t			lsn)
> > > +{
> > > +	xfs_mount_t		*mp = log->l_mp;
> > > +	xfs_iri_log_item_t	*irip;
> > > +	xfs_iri_log_format_t	*iri_formatp;
> > > +
> > > +	iri_formatp = item->ri_buf[0].i_addr;
> > > +
> > > +	irip = xfs_iri_init(mp, 1);
> > > +	irip->iri_format = *iri_formatp;
> > > +	if (item->ri_buf[0].i_len != sizeof(xfs_iri_log_format_t)) {
> > > +		xfs_iri_item_free(irip);
> > > +		return EFSCORRUPTED;
> > > +	}
> > > +
> > > +	spin_lock(&log->l_ailp->ail_lock);
> > > +	/*
> > > +	 * The IRI has two references. One for the IRD and one for IRI to ensure
> > > +	 * it makes it into the AIL. Insert the IRI into the AIL directly and
> > > +	 * drop the IRI reference. Note that xfs_trans_ail_update() drops the
> > > +	 * AIL lock.
> > > +	 */
> > > +	xfs_trans_ail_update(log->l_ailp, &irip->iri_item, lsn);
> > > +	xfs_iri_release(irip);
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * This routine is called when an IRD format structure is found in a committed
> > > + * transaction in the log. Its purpose is to cancel the corresponding IRI if it
> > > + * was still in the log. To do this it searches the AIL for the IRI with an id
> > > + * equal to that in the IRD format structure. If we find it we drop the IRD
> > > + * reference, which removes the IRI from the AIL and frees it.
> > > + */
> > > +STATIC int
> > > +xlog_recover_ird_pass2(
> > > +	struct xlog			*log,
> > > +	struct xlog_recover_item	*item)
> > > +{
> > > +	xfs_ird_log_format_t	*ird_formatp;
> > > +	xfs_iri_log_item_t	*irip = NULL;
> > > +	struct xfs_log_item	*lip;
> > > +	uint64_t		iri_id;
> > > +	struct xfs_ail_cursor	cur;
> > > +	struct xfs_ail		*ailp = log->l_ailp;
> > > +
> > > +	ird_formatp = item->ri_buf[0].i_addr;
> > > +	if (item->ri_buf[0].i_len == sizeof(xfs_ird_log_format_t))
> > > +		return -EFSCORRUPTED;
> > > +	iri_id = ird_formatp->ird_iri_id;
> > > +
> > > +	/*
> > > +	 * Search for the iri with the id in the ird format structure
> > > +	 * in the AIL.
> > > +	 */
> > > +	spin_lock(&ailp->ail_lock);
> > > +	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> > > +	while (lip != NULL) {
> > > +		if (lip->li_type == XFS_LI_IRI) {
> > > +			irip = (xfs_iri_log_item_t *)lip;
> > > +			if (irip->iri_format.iri_id == iri_id) {
> > > +				/*
> > > +				 * Drop the IRD reference to the IRI. This
> > > +				 * removes the IRI from the AIL and frees it.
> > > +				 */
> > > +				spin_unlock(&ailp->ail_lock);
> > > +				xfs_iri_release(irip);
> > > +				spin_lock(&ailp->ail_lock);
> > > +				break;
> > > +			}
> > > +		}
> > > +		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> > > +	}
> > > +	xfs_trans_ail_cursor_done(&cur);
> > > +	spin_unlock(&ailp->ail_lock);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > >   * This routine is called when an inode create format structure is found in a
> > >   * committed transaction in the log.  It's purpose is to initialise the inodes
> > >   * being allocated on disk. This requires us to get inode cluster buffers that
> > > @@ -3981,6 +4074,8 @@ struct xfs_buf_cancel {
> > >  	case XFS_LI_CUD:
> > >  	case XFS_LI_BUI:
> > >  	case XFS_LI_BUD:
> > > +	case XFS_LI_IRI:
> > > +	case XFS_LI_IRD:
> > >  	default:
> > >  		break;
> > >  	}
> > > @@ -4010,6 +4105,8 @@ struct xfs_buf_cancel {
> > >  	case XFS_LI_CUD:
> > >  	case XFS_LI_BUI:
> > >  	case XFS_LI_BUD:
> > > +	case XFS_LI_IRI:
> > > +	case XFS_LI_IRD:
> > >  		/* nothing to do in pass 1 */
> > >  		return 0;
> > >  	default:
> > > @@ -4052,6 +4149,10 @@ struct xfs_buf_cancel {
> > >  		return xlog_recover_bui_pass2(log, item, trans->r_lsn);
> > >  	case XFS_LI_BUD:
> > >  		return xlog_recover_bud_pass2(log, item);
> > > +	case XFS_LI_IRI:
> > > +		return xlog_recover_iri_pass2(log, item, trans->r_lsn);
> > > +	case XFS_LI_IRD:
> > > +		return xlog_recover_ird_pass2(log, item);
> > >  	case XFS_LI_DQUOT:
> > >  		return xlog_recover_dquot_pass2(log, buffer_list, item,
> > >  						trans->r_lsn);
> > > @@ -4721,6 +4822,46 @@ struct xfs_buf_cancel {
> > >  	spin_lock(&ailp->ail_lock);
> > >  }
> > > 
> > > +/* Recover the IRI if necessary. */
> > > +STATIC int
> > > +xlog_recover_process_iri(
> > > +	struct xfs_trans		*parent_tp,
> > > +	struct xfs_ail			*ailp,
> > > +	struct xfs_log_item		*lip)
> > > +{
> > > +	struct xfs_iri_log_item		*irip;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * Skip IRIs that we've already processed.
> > > +	 */
> > > +	irip = container_of(lip, struct xfs_iri_log_item, iri_item);
> > > +	if (test_bit(XFS_IRI_RECOVERED, &irip->iri_flags))
> > > +		return 0;
> > > +
> > > +	spin_unlock(&ailp->ail_lock);
> > > +	error = xfs_iri_recover(parent_tp, irip);
> > > +	spin_lock(&ailp->ail_lock);
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/* Release the IRI since we're cancelling everything. */
> > > +STATIC void
> > > +xlog_recover_cancel_iri(
> > > +	struct xfs_mount		*mp,
> > > +	struct xfs_ail			*ailp,
> > > +	struct xfs_log_item		*lip)
> > > +{
> > > +	struct xfs_iri_log_item         *irip;
> > > +
> > > +	irip = container_of(lip, struct xfs_iri_log_item, iri_item);
> > > +
> > > +	spin_unlock(&ailp->ail_lock);
> > > +	xfs_iri_release(irip);
> > > +	spin_lock(&ailp->ail_lock);
> > > +}
> > > +
> > >  /* Is this log item a deferred action intent? */
> > >  static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
> > >  {
> > > @@ -4729,6 +4870,7 @@ static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
> > >  	case XFS_LI_RUI:
> > >  	case XFS_LI_CUI:
> > >  	case XFS_LI_BUI:
> > > +	case XFS_LI_IRI:
> > >  		return true;
> > >  	default:
> > >  		return false;
> > > @@ -4856,6 +4998,9 @@ static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
> > >  		case XFS_LI_BUI:
> > >  			error = xlog_recover_process_bui(parent_tp, ailp, lip);
> > >  			break;
> > > +		case XFS_LI_IRI:
> > > +			error = xlog_recover_process_iri(parent_tp, ailp, lip);
> > > +			break;
> > >  		}
> > >  		if (error)
> > >  			goto out;
> > > @@ -4912,6 +5057,9 @@ static inline bool xlog_item_is_intent(struct xfs_log_item *lip)
> > >  		case XFS_LI_BUI:
> > >  			xlog_recover_cancel_bui(log->l_mp, ailp, lip);
> > >  			break;
> > > +		case XFS_LI_IRI:
> > > +			xlog_recover_cancel_iri(log->l_mp, ailp, lip);
> > > +			break;
> > >  		}
> > > 
> > >  		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index f945023..66742b7 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -34,6 +34,7 @@
> > >  #include "xfs_rmap_item.h"
> > >  #include "xfs_refcount_item.h"
> > >  #include "xfs_bmap_item.h"
> > > +#include "xfs_iunlinkrm_item.h"
> > >  #include "xfs_reflink.h"
> > > 
> > >  #include <linux/magic.h>
> > > @@ -1957,8 +1958,22 @@ struct proc_xfs_info {
> > >  	if (!xfs_bui_zone)
> > >  		goto out_destroy_bud_zone;
> > > 
> > > +	xfs_ird_zone = kmem_zone_init(sizeof(xfs_ird_log_item_t),
> > > +			"xfs_ird_item");
> > > +	if (!xfs_ird_zone)
> > > +		goto out_destroy_bui_zone;
> > > +
> > > +	xfs_iri_zone = kmem_zone_init(sizeof(xfs_iri_log_item_t),
> > > +			"xfs_iri_item");
> > > +	if (!xfs_iri_zone)
> > > +		goto out_destroy_ird_zone;
> > > +
> > >  	return 0;
> > > 
> > > + out_destroy_ird_zone:
> > > +	kmem_zone_destroy(xfs_ird_zone);
> > > + out_destroy_bui_zone:
> > > +	kmem_zone_destroy(xfs_bui_zone);
> > >   out_destroy_bud_zone:
> > >  	kmem_zone_destroy(xfs_bud_zone);
> > >   out_destroy_cui_zone:
> > > @@ -2007,6 +2022,8 @@ struct proc_xfs_info {
> > >  	 * destroy caches.
> > >  	 */
> > >  	rcu_barrier();
> > > +	kmem_zone_destroy(xfs_iri_zone);
> > > +	kmem_zone_destroy(xfs_ird_zone);
> > >  	kmem_zone_destroy(xfs_bui_zone);
> > >  	kmem_zone_destroy(xfs_bud_zone);
> > >  	kmem_zone_destroy(xfs_cui_zone);
> > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > > index 64d7f17..dd63eaa 100644
> > > --- a/fs/xfs/xfs_trans.h
> > > +++ b/fs/xfs/xfs_trans.h
> > > @@ -26,6 +26,8 @@
> > >  struct xfs_cud_log_item;
> > >  struct xfs_bui_log_item;
> > >  struct xfs_bud_log_item;
> > > +struct xfs_iri_log_item;
> > > +struct xfs_ird_log_item;
> > > 
> > >  struct xfs_log_item {
> > >  	struct list_head		li_ail;		/* AIL pointers */
> > > -- 
> > > 1.8.3.1
> > > 
> > > -- 
> > > kaixuxia



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux