Re: [PATCH 026/119] xfs: add owner field to extent allocation and freeing

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 8 Jul 2016 07:37:20 -0400

On Thu, Jul 07, 2016 at 12:09:56PM -0700, Darrick J. Wong wrote:
> On Thu, Jul 07, 2016 at 11:12:27AM -0400, Brian Foster wrote:
> > On Thu, Jun 16, 2016 at 06:20:39PM -0700, Darrick J. Wong wrote:
> > > For the rmap btree to work, we have to feed the extent owner
> > > information to the allocation and freeing functions. This
> > > information is what will end up in the rmap btree that tracks
> > > allocated extents. While we technically don't need the owner
> > > information when freeing extents, passing it allows us to validate
> > > that the extent we are removing from the rmap btree actually
> > > belonged to the owner we expected it to belong to.
> > > 
> > > We also define a special set of owner values for internal metadata
> > > that would otherwise have no owner. This allows us to tell the
> > > difference between metadata owned by different per-ag btrees, as
> > > well as static fs metadata (e.g. AG headers) and internal journal
> > > blocks.
> > > 
> > > There are also a couple of special cases we need to take care of -
> > > during EFI recovery, we don't actually know who the original owner
> > > was, so we need to pass a wildcard to indicate that we aren't
> > > checking the owner for validity. We also need special handling in
> > > growfs, as we "free" the space in the last AG when extending it, but
> > > because it's new space it has no actual owner...
> > > 
> > > While touching the xfs_bmap_add_free() function, re-order the
> > > parameters to put the struct xfs_mount first.
> > > 
> > > Extend the owner field to include both the owner type and some sort
> > > of index within the owner.  The index field will be used to support
> > > reverse mappings when reflink is enabled.
> > > 
> > > This is based upon a patch originally from Dave Chinner. It has been
> > > extended to add more owner information with the intent of helping
> > > recovery operations when things go wrong (e.g. offset of user data
> > > block in a file).
> > > 
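A minimal userspace sketch of why the offset matters. It assumes a reverse mapping is identified by (physical start block, owner, offset). With reflink, one file can map the same physical block at two logical offsets, and only the offset tells the two rmap records apart. `struct demo_rmap_rec`, `demo_rmap_rec_same()` and `demo_rmap_find()` are hypothetical names. This is not the on-disk rmapbt record format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory reverse-mapping record (not the on-disk format). */
struct demo_rmap_rec {
	uint64_t	rm_startblock;	/* physical block */
	uint64_t	rm_owner;	/* inode number or special owner */
	uint64_t	rm_offset;	/* logical offset within the owner */
};

/* Two records describe the same mapping only if all three fields match. */
static bool
demo_rmap_rec_same(const struct demo_rmap_rec *a,
		   const struct demo_rmap_rec *b)
{
	return a->rm_startblock == b->rm_startblock &&
	       a->rm_owner == b->rm_owner &&
	       a->rm_offset == b->rm_offset;
}

/* Return the index of the record matching @key, or -1 if none does. */
static int
demo_rmap_find(const struct demo_rmap_rec *recs, int nr,
	       const struct demo_rmap_rec *key)
{
	int	i;

	for (i = 0; i < nr; i++)
		if (demo_rmap_rec_same(&recs[i], key))
			return i;
	return -1;
}
```

Without the offset in the key, a lookup for the mapping at offset 8 could not distinguish it from the mapping at offset 0 in the same inode.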
> > > v2: When we're freeing extents from an EFI, we don't have the owner
> > > information available (rmap updates have their own redo items).
> > > xfs_free_extent therefore doesn't need to do an rmap update, but the
> > > log replay code doesn't signal this correctly.  Fix it so that it
> > > does.
> > > 
> > > [dchinner: de-shout the xfs_rmap_*_owner helpers]
> > > [darrick: minor style fixes suggested by Christoph Hellwig]
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c        |   11 +++++-
> > >  fs/xfs/libxfs/xfs_alloc.h        |    4 ++
> > >  fs/xfs/libxfs/xfs_bmap.c         |   17 ++++++++--
> > >  fs/xfs/libxfs/xfs_bmap.h         |    4 ++
> > >  fs/xfs/libxfs/xfs_bmap_btree.c   |    6 +++-
> > >  fs/xfs/libxfs/xfs_format.h       |   65 ++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_ialloc.c       |    7 +++-
> > >  fs/xfs/libxfs/xfs_ialloc_btree.c |    7 ++++
> > >  fs/xfs/xfs_defer_item.c          |    3 +-
> > >  fs/xfs/xfs_fsops.c               |   16 +++++++--
> > >  fs/xfs/xfs_log_recover.c         |    5 ++-
> > >  fs/xfs/xfs_trans.h               |    2 +
> > >  fs/xfs/xfs_trans_extfree.c       |    5 ++-
> > >  13 files changed, 131 insertions(+), 21 deletions(-)
> > > 
> > > 
...
> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > index 3a6d3e3..2c28f2a 100644
> > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > @@ -574,7 +574,8 @@ xfs_bmap_add_free(
> > >  	struct xfs_mount	*mp,		/* mount point structure */
> > >  	struct xfs_defer_ops	*dfops,		/* list of extents */
> > >  	xfs_fsblock_t		bno,		/* fs block number of extent */
> > > -	xfs_filblks_t		len)		/* length of extent */
> > > +	xfs_filblks_t		len,		/* length of extent */
> > > +	struct xfs_owner_info	*oinfo)		/* extent owner */
> > >  {
> > >  	struct xfs_bmap_free_item	*new;		/* new element */
> > >  #ifdef DEBUG
> > > @@ -593,9 +594,14 @@ xfs_bmap_add_free(
> > >  	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
> > >  #endif
> > >  	ASSERT(xfs_bmap_free_item_zone != NULL);
> > > +
> > >  	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
> > >  	new->xbfi_startblock = bno;
> > >  	new->xbfi_blockcount = (xfs_extlen_t)len;
> > > +	if (oinfo)
> > > +		memcpy(&new->xbfi_oinfo, oinfo, sizeof(struct xfs_owner_info));
> > > +	else
> > > +		memset(&new->xbfi_oinfo, 0, sizeof(struct xfs_owner_info));
> > 
> > How about just using KM_ZERO on the allocation and doing something like
> > 'if (oinfo) new->xbfi_oinfo = *oinfo'?
> > 
> > BTW, what's the use case for a zeroed out oinfo if we explicitly define
> > null/unknown owner types?
> 
> The two main ways we end up altering the rmapbt are as follows:
> 
> 1) Alloc/free of AG metadata blocks.  For this use case, the caller (generally
> a btree ->alloc_block function) bundles the bnobt and rmapbt updates in the
> same transaction by passing ownership info (via this oinfo pointer) to the
> alloc/free function.  Passing the "special" owner value XFS_RMAP_OWN_NULL just
> checks that there are no rmaps for the given range, which is a spot check
> performed by growfs.
> 
> 2) Map/unmap of file blocks.  For this use case, I must treat map/unmap
> separately from alloc/free in order to handle reflink.  Therefore, the map &
> unmap functions schedule rmap updates directly (via the deferred ops mechanism)
> and the alloc/free functions, if they're called, should not update the rmapbt.
> Zeroing out the oinfo indicates this.  However, XFS_RMAP_OWN_UNKNOWN is now
> unused, so I think I can overload that, especially since we should never be
> writing XFS_RMAP_OWN_UNKNOWN to disk.
> 
> I think I can simply create an "xfs_rmap_skip_owner_update()" helper (like the
> other xfs_rmap_*_owner functions) to encapsulate this.
> 
> if (oinfo)
> 	new->xbfi_oinfo = *oinfo;
> else
> 	xfs_rmap_skip_owner_update(&new->xbfi_oinfo);
> 
> Seems clearer, I hope?
> 

Ok, yup. Thanks for the explanation.
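Here is a standalone sketch that combines both suggestions: a zeroed allocation plus struct assignment, and Darrick's proposed skip helper. Three things are assumptions. calloc() stands in for kmem_zone_alloc(..., KM_ZERO). The body of xfs_rmap_skip_owner_update() assumes it simply stores XFS_RMAP_OWN_UNKNOWN, as proposed above. struct demo_free_item and demo_add_free() are hypothetical userspace stand-ins for struct xfs_bmap_free_item and xfs_bmap_add_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)

struct xfs_owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;	/* xfs_fileoff_t in the kernel */
	unsigned int	oi_flags;
};

/* Hypothetical body: overload OWN_UNKNOWN to mean "no rmap update". */
static inline void
xfs_rmap_skip_owner_update(struct xfs_owner_info *oi)
{
	oi->oi_owner = XFS_RMAP_OWN_UNKNOWN;
	oi->oi_offset = 0;
	oi->oi_flags = 0;
}

/* Hypothetical stand-in for struct xfs_bmap_free_item. */
struct demo_free_item {
	uint64_t		xbfi_startblock;
	uint32_t		xbfi_blockcount;
	struct xfs_owner_info	xbfi_oinfo;
};

/*
 * Stand-in for xfs_bmap_add_free(): calloc() plays the role of a KM_ZERO
 * allocation, and struct assignment replaces the memcpy()/memset() pair.
 */
static struct demo_free_item *
demo_add_free(uint64_t bno, uint32_t len, const struct xfs_owner_info *oinfo)
{
	struct demo_free_item	*new = calloc(1, sizeof(*new));

	if (!new)
		return NULL;
	new->xbfi_startblock = bno;
	new->xbfi_blockcount = len;
	if (oinfo)
		new->xbfi_oinfo = *oinfo;
	else
		xfs_rmap_skip_owner_update(&new->xbfi_oinfo);
	return new;
}
```

Because the allocation is already zeroed, the helper only strictly needs to set oi_owner. Clearing the other fields keeps it correct for callers that pass in a stack variable.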

> Also, the comment beginning "Special Case #2: EFIs do not record the owner of
> the extent, so when..." is now wrong and needs to be changed to:
> 
> "Special Case #2: An owner of XFS_RMAP_OWN_UNKNOWN means 'no rmap update'".
> 
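Taken together, the two rmapbt paths described above and the revised Special Case #2 suggest that a free path would branch on the owner roughly as follows. This is a hypothetical userspace sketch. demo_free_rmap_action() and enum demo_rmap_action are illustrative names, not functions in this patch. The constants are copied from the xfs_format.h hunk.

```c
#include <assert.h>
#include <stdint.h>

#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* overloaded: skip the rmap update */
#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */

struct xfs_owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;
	unsigned int	oi_flags;
};

/* Hypothetical classification of what a free path does to the rmapbt. */
enum demo_rmap_action {
	DEMO_RMAP_SKIP,		/* map/unmap already deferred its own rmap update */
	DEMO_RMAP_CHECK_EMPTY,	/* growfs: verify no rmaps cover the range */
	DEMO_RMAP_REMOVE,	/* AG metadata: remove this owner's rmap */
};

static enum demo_rmap_action
demo_free_rmap_action(const struct xfs_owner_info *oinfo)
{
	if (oinfo->oi_owner == XFS_RMAP_OWN_UNKNOWN)
		return DEMO_RMAP_SKIP;
	if (oinfo->oi_owner == XFS_RMAP_OWN_NULL)
		return DEMO_RMAP_CHECK_EMPTY;
	return DEMO_RMAP_REMOVE;
}
```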
> > >  	trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0,
> > >  			XFS_FSB_TO_AGBNO(mp, bno), len);
> > >  	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list);
...
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index b5b0901..97f354f 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -1318,6 +1318,71 @@ typedef __be32 xfs_inobt_ptr_t;
> > >   */
> > >  #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
> > >  
> > > +/*
> > > + * Ownership info for an extent.  This is used to create reverse-mapping
> > > + * entries.
> > > + */
> > > +#define XFS_OWNER_INFO_ATTR_FORK	(1 << 0)
> > > +#define XFS_OWNER_INFO_BMBT_BLOCK	(1 << 1)
> > > +struct xfs_owner_info {
> > > +	uint64_t		oi_owner;
> > > +	xfs_fileoff_t		oi_offset;
> > > +	unsigned int		oi_flags;
> > > +};
> > > +
> > > +static inline void
> > > +xfs_rmap_ag_owner(
> > > +	struct xfs_owner_info	*oi,
> > > +	uint64_t		owner)
> > > +{
> > > +	oi->oi_owner = owner;
> > > +	oi->oi_offset = 0;
> > > +	oi->oi_flags = 0;
> > > +}
> > > +
> > > +static inline void
> > > +xfs_rmap_ino_bmbt_owner(
> > > +	struct xfs_owner_info	*oi,
> > > +	xfs_ino_t		ino,
> > > +	int			whichfork)
> > > +{
> > > +	oi->oi_owner = ino;
> > > +	oi->oi_offset = 0;
> > > +	oi->oi_flags = XFS_OWNER_INFO_BMBT_BLOCK;
> > > +	if (whichfork == XFS_ATTR_FORK)
> > > +		oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
> > > +}
> > > +
> > > +static inline void
> > > +xfs_rmap_ino_owner(
> > > +	struct xfs_owner_info	*oi,
> > > +	xfs_ino_t		ino,
> > > +	int			whichfork,
> > > +	xfs_fileoff_t		offset)
> > > +{
> > > +	oi->oi_owner = ino;
> > > +	oi->oi_offset = offset;
> > > +	oi->oi_flags = 0;
> > > +	if (whichfork == XFS_ATTR_FORK)
> > > +		oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
> > > +}
> > > +
> > > +/*
> > > + * Special owner types.
> > > + *
> > > + * Seeing as we only support up to 8EB, we have the upper bit of the owner field
> > > + * to tell us we have a special owner value. We use these for static metadata
> > > + * allocated at mkfs/growfs time, as well as for freespace management metadata.
> > > + */
> > > +#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
> > > +#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
> > > +#define XFS_RMAP_OWN_FS		(-3ULL)	/* static fs metadata */
> > > +#define XFS_RMAP_OWN_LOG	(-4ULL)	/* static fs metadata */
> > > +#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
> > 
> > How about XFS_RMAP_OWN_AGFL? OWN_AG confuses me into thinking it's for
> > AG headers, but IIUC that is covered by OWN_FS.
> 
> or _SPACEBT for AG {free,rmap} space btrees?
> 

I was thinking that this type only represented free list blocks and that
the mapping would be updated when the block was actually allocated to a
btree. As Dave points out in his follow-up response, that is not the
case. OWN_AG actually makes more sense to me in that light, so feel free
to disregard this comment.
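The "upper bit" rule in the special-owner comment can be checked as shown below. This is a hypothetical sketch. demo_owner_is_special() and demo_owner_is_valid_special() are illustrative names. The range check assumes that XFS_RMAP_OWN_MIN is meant as an exclusive lower bound for the special values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_RMAP_OWN_NULL	(-1ULL)
#define XFS_RMAP_OWN_AG		(-5ULL)
#define XFS_RMAP_OWN_INODES	(-7ULL)
#define XFS_RMAP_OWN_MIN	(-8ULL)	/* guard */

/* Per the patch comment, the 8EB limit leaves the top bit free for special owners. */
static inline bool
demo_owner_is_special(uint64_t owner)
{
	return (owner & (1ULL << 63)) != 0;
}

/* Assumption: valid special owners lie strictly above the OWN_MIN guard. */
static inline bool
demo_owner_is_valid_special(uint64_t owner)
{
	return demo_owner_is_special(owner) && owner > XFS_RMAP_OWN_MIN;
}
```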

Brian

> > > +#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
> > > +#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
> > > +#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
> > > +
> > >  #define	XFS_RMAP_BLOCK(mp) \
> > >  	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
> > >  	 XFS_FIBT_BLOCK(mp) + 1 : \
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > index dbc3e35..1982561 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > @@ -615,6 +615,7 @@ xfs_ialloc_ag_alloc(
> > >  	args.tp = tp;
> > >  	args.mp = tp->t_mountp;
> > >  	args.fsbno = NULLFSBLOCK;
> > > +	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INODES);
> > >  
> > >  #ifdef DEBUG
> > >  	/* randomly do sparse inode allocations */
> > > @@ -1825,12 +1826,14 @@ xfs_difree_inode_chunk(
> > >  	int		nextbit;
> > >  	xfs_agblock_t	agbno;
> > >  	int		contigblk;
> > > +	struct xfs_owner_info	oinfo;
> > >  	DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
> > > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES);
> > >  
> > >  	if (!xfs_inobt_issparse(rec->ir_holemask)) {
> > >  		/* not sparse, calculate extent info directly */
> > >  		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, sagbno),
> > > -				  mp->m_ialloc_blks);
> > > +				  mp->m_ialloc_blks, &oinfo);
> > >  		return;
> > >  	}
> > >  
> > > @@ -1874,7 +1877,7 @@ xfs_difree_inode_chunk(
> > >  		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
> > >  		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
> > >  		xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, agbno),
> > > -				  contigblk);
> > > +				  contigblk, &oinfo);
> > >  
> > >  		/* reset range to current bit and carry on... */
> > >  		startidx = endidx = nextbit;
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > index 88da2ad..f9ea86b 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > @@ -96,6 +96,7 @@ xfs_inobt_alloc_block(
> > >  	memset(&args, 0, sizeof(args));
> > >  	args.tp = cur->bc_tp;
> > >  	args.mp = cur->bc_mp;
> > > +	xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INOBT);
> > >  	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
> > >  	args.minlen = 1;
> > >  	args.maxlen = 1;
> > > @@ -125,8 +126,12 @@ xfs_inobt_free_block(
> > >  	struct xfs_btree_cur	*cur,
> > >  	struct xfs_buf		*bp)
> > >  {
> > > +	struct xfs_owner_info	oinfo;
> > > +
> > > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
> > >  	return xfs_free_extent(cur->bc_tp,
> > > -			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1);
> > > +			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
> > > +			&oinfo);
> > >  }
> > >  
> > >  STATIC int
> > > diff --git a/fs/xfs/xfs_defer_item.c b/fs/xfs/xfs_defer_item.c
> > > index 127a54e..1c2d556 100644
> > > --- a/fs/xfs/xfs_defer_item.c
> > > +++ b/fs/xfs/xfs_defer_item.c
> > > @@ -99,7 +99,8 @@ xfs_bmap_free_finish_item(
> > >  	free = container_of(item, struct xfs_bmap_free_item, xbfi_list);
> > >  	error = xfs_trans_free_extent(tp, done_item,
> > >  			free->xbfi_startblock,
> > > -			free->xbfi_blockcount);
> > > +			free->xbfi_blockcount,
> > > +			&free->xbfi_oinfo);
> > >  	kmem_free(free);
> > >  	return error;
> > >  }
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index 62162d4..d60bb97 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > > @@ -436,6 +436,8 @@ xfs_growfs_data_private(
> > >  	 * There are new blocks in the old last a.g.
> > >  	 */
> > >  	if (new) {
> > > +		struct xfs_owner_info	oinfo;
> > > +
> > >  		/*
> > >  		 * Change the agi length.
> > >  		 */
> > > @@ -463,14 +465,20 @@ xfs_growfs_data_private(
> > >  		       be32_to_cpu(agi->agi_length));
> > >  
> > >  		xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH);
> > > +
> > >  		/*
> > >  		 * Free the new space.
> > > +		 *
> > > +		 * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
> > > +		 * this doesn't actually exist in the rmap btree.
> > >  		 */
> > > -		error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
> > > -			be32_to_cpu(agf->agf_length) - new), new);
> > > -		if (error) {
> > > +		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > > +		error = xfs_free_extent(tp,
> > > +				XFS_AGB_TO_FSB(mp, agno,
> > > +					be32_to_cpu(agf->agf_length) - new),
> > > +				new, &oinfo);
> > > +		if (error)
> > >  			goto error0;
> > > -		}
> > >  	}
> > >  
> > >  	/*
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index 080b54b..0c41bd2 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -4180,6 +4180,7 @@ xlog_recover_process_efi(
> > >  	int			error = 0;
> > >  	xfs_extent_t		*extp;
> > >  	xfs_fsblock_t		startblock_fsb;
> > > +	struct xfs_owner_info	oinfo;
> > >  
> > >  	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
> > >  
> > > @@ -4211,10 +4212,12 @@ xlog_recover_process_efi(
> > >  		return error;
> > >  	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
> > >  
> > > +	oinfo.oi_owner = 0;
> > 
> > Should this be XFS_RMAP_OWN_UNKNOWN?
> 
> xfs_rmap_skip_owner_update(), but yes.
> 
> --D
> 
> > 
> > Brian
> > 
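For EFI recovery, the agreed fix initializes the whole owner once with the skip helper. It does not set only oi_owner to 0, which leaves oi_offset and oi_flags uninitialized, and 0 is not the "no rmap update" marker. The loop then reuses that owner for every extent. This is a hypothetical userspace sketch. The helper body assumes it stores XFS_RMAP_OWN_UNKNOWN. demo_free_extent() is a stand-in for xfs_trans_free_extent() that only counts the rmap updates it would perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)

struct xfs_owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;
	unsigned int	oi_flags;
};

struct demo_extent {
	uint64_t	ext_start;
	uint32_t	ext_len;
};

/* Hypothetical body for the proposed helper. */
static inline void
xfs_rmap_skip_owner_update(struct xfs_owner_info *oi)
{
	oi->oi_owner = XFS_RMAP_OWN_UNKNOWN;
	oi->oi_offset = 0;
	oi->oi_flags = 0;
}

/* Stand-in for xfs_trans_free_extent(): counts rmap updates it would do. */
static int
demo_free_extent(const struct demo_extent *ext,
		 const struct xfs_owner_info *oinfo, unsigned int *rmap_updates)
{
	if (ext->ext_len == 0)
		return -EINVAL;		/* bogus extent in the log item */
	if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN)
		(*rmap_updates)++;
	return 0;
}

/* Replay every extent in an EFI with a single "skip rmap" owner. */
static int
demo_recover_efi(const struct demo_extent *exts, unsigned int nextents,
		 unsigned int *rmap_updates)
{
	struct xfs_owner_info	oinfo;
	unsigned int		i;
	int			error;

	xfs_rmap_skip_owner_update(&oinfo);
	for (i = 0; i < nextents; i++) {
		error = demo_free_extent(&exts[i], &oinfo, rmap_updates);
		if (error)
			return error;
	}
	return 0;
}
```

If the loop had passed an owner of 0, as the patch currently does, this stand-in would count one rmap update per extent.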
> > >  	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
> > >  		extp = &(efip->efi_format.efi_extents[i]);
> > >  		error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
> > > -					      extp->ext_len);
> > > +					      extp->ext_len,
> > > +					      &oinfo);
> > >  		if (error)
> > >  			goto abort_error;
> > >  
> > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > > index 9a462e8..f8d363f 100644
> > > --- a/fs/xfs/xfs_trans.h
> > > +++ b/fs/xfs/xfs_trans.h
> > > @@ -219,7 +219,7 @@ struct xfs_efd_log_item	*xfs_trans_get_efd(xfs_trans_t *,
> > >  				  uint);
> > >  int		xfs_trans_free_extent(struct xfs_trans *,
> > >  				      struct xfs_efd_log_item *, xfs_fsblock_t,
> > > -				      xfs_extlen_t);
> > > +				      xfs_extlen_t, struct xfs_owner_info *);
> > >  int		xfs_trans_commit(struct xfs_trans *);
> > >  int		__xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
> > >  int		xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
> > > diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
> > > index a96ae54..d1b8833 100644
> > > --- a/fs/xfs/xfs_trans_extfree.c
> > > +++ b/fs/xfs/xfs_trans_extfree.c
> > > @@ -118,13 +118,14 @@ xfs_trans_free_extent(
> > >  	struct xfs_trans	*tp,
> > >  	struct xfs_efd_log_item	*efdp,
> > >  	xfs_fsblock_t		start_block,
> > > -	xfs_extlen_t		ext_len)
> > > +	xfs_extlen_t		ext_len,
> > > +	struct xfs_owner_info	*oinfo)
> > >  {
> > >  	uint			next_extent;
> > >  	struct xfs_extent	*extp;
> > >  	int			error;
> > >  
> > > -	error = xfs_free_extent(tp, start_block, ext_len);
> > > +	error = xfs_free_extent(tp, start_block, ext_len, oinfo);
> > >  
> > >  	/*
> > >  	 * Mark the transaction dirty, even on error. This ensures the
> > > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@xxxxxxxxxxx
> > > http://oss.sgi.com/mailman/listinfo/xfs