On Thu, Jul 07, 2016 at 12:09:56PM -0700, Darrick J. Wong wrote: > On Thu, Jul 07, 2016 at 11:12:27AM -0400, Brian Foster wrote: > > On Thu, Jun 16, 2016 at 06:20:39PM -0700, Darrick J. Wong wrote: > > > For the rmap btree to work, we have to feed the extent owner > > > information to the allocation and freeing functions. This > > > information is what will end up in the rmap btree that tracks > > > allocated extents. While we technically don't need the owner > > > information when freeing extents, passing it allows us to validate > > > that the extent we are removing from the rmap btree actually > > > belonged to the owner we expected it to belong to. > > > > > > We also define a special set of owner values for internal metadata > > > that would otherwise have no owner. This allows us to tell the > > > difference between metadata owned by different per-ag btrees, as > > > well as static fs metadata (e.g. AG headers) and internal journal > > > blocks. > > > > > > There are also a couple of special cases we need to take care of - > > > during EFI recovery, we don't actually know who the original owner > > > was, so we need to pass a wildcard to indicate that we aren't > > > checking the owner for validity. We also need special handling in > > > growfs, as we "free" the space in the last AG when extending it, but > > > because it's new space it has no actual owner... > > > > > > While touching the xfs_bmap_add_free() function, re-order the > > > parameters to put the struct xfs_mount first. > > > > > > Extend the owner field to include both the owner type and some sort > > > of index within the owner. The index field will be used to support > > > reverse mappings when reflink is enabled. > > > > > > This is based upon a patch originally from Dave Chinner. It has been > > > extended to add more owner information with the intent of helping > > > recovery operations when things go wrong (e.g. offset of user data > > > block in a file).
> > > > > > v2: When we're freeing extents from an EFI, we don't have the owner > > > information available (rmap updates have their own redo items). > > > xfs_free_extent therefore doesn't need to do an rmap update, but the > > > log replay code doesn't signal this correctly. Fix it so that it > > > does. > > > > > > [dchinner: de-shout the xfs_rmap_*_owner helpers] > > > [darrick: minor style fixes suggested by Christoph Hellwig] > > > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_alloc.c | 11 +++++- > > > fs/xfs/libxfs/xfs_alloc.h | 4 ++ > > > fs/xfs/libxfs/xfs_bmap.c | 17 ++++++++-- > > > fs/xfs/libxfs/xfs_bmap.h | 4 ++ > > > fs/xfs/libxfs/xfs_bmap_btree.c | 6 +++- > > > fs/xfs/libxfs/xfs_format.h | 65 ++++++++++++++++++++++++++++++++++++++ > > > fs/xfs/libxfs/xfs_ialloc.c | 7 +++- > > > fs/xfs/libxfs/xfs_ialloc_btree.c | 7 ++++ > > > fs/xfs/xfs_defer_item.c | 3 +- > > > fs/xfs/xfs_fsops.c | 16 +++++++-- > > > fs/xfs/xfs_log_recover.c | 5 ++- > > > fs/xfs/xfs_trans.h | 2 + > > > fs/xfs/xfs_trans_extfree.c | 5 ++- > > > 13 files changed, 131 insertions(+), 21 deletions(-) > > > > > > ... 
> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > > > index 3a6d3e3..2c28f2a 100644 > > > --- a/fs/xfs/libxfs/xfs_bmap.c > > > +++ b/fs/xfs/libxfs/xfs_bmap.c > > > @@ -574,7 +574,8 @@ xfs_bmap_add_free( > > > struct xfs_mount *mp, /* mount point structure */ > > > struct xfs_defer_ops *dfops, /* list of extents */ > > > xfs_fsblock_t bno, /* fs block number of extent */ > > > - xfs_filblks_t len) /* length of extent */ > > > + xfs_filblks_t len, /* length of extent */ > > > + struct xfs_owner_info *oinfo) /* extent owner */ > > > { > > > struct xfs_bmap_free_item *new; /* new element */ > > > #ifdef DEBUG > > > @@ -593,9 +594,14 @@ xfs_bmap_add_free( > > > ASSERT(agbno + len <= mp->m_sb.sb_agblocks); > > > #endif > > > ASSERT(xfs_bmap_free_item_zone != NULL); > > > + > > > new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP); > > > new->xbfi_startblock = bno; > > > new->xbfi_blockcount = (xfs_extlen_t)len; > > > + if (oinfo) > > > + memcpy(&new->xbfi_oinfo, oinfo, sizeof(struct xfs_owner_info)); > > > + else > > > + memset(&new->xbfi_oinfo, 0, sizeof(struct xfs_owner_info)); > > > > How about just using KM_ZERO on the allocation and doing something like > > 'if (oinfo) new->xbfi_oinfo = *oinfo'? > > > > BTW, what's the use case for a zeroed out oinfo if we explicitly define > > null/unknown owner types? > > The two main ways we end up altering the rmapbt are as follows: > > 1) Alloc/free of AG metadata blocks. For this use case, the caller (generally > a btree ->alloc_block function) bundles the bnobt and rmapbt updates in the > same transaction by passing ownership info (via this oinfo pointer) to the > alloc/free function. Passing the "special" owner value XFS_RMAP_OWN_NULL just > checks that there are no rmaps for the given range, which is a spot check > performed by growfs. > > 2) Map/unmap of file blocks. For this use case, I must treat map/unmap > separately from alloc/free in order to handle reflink. 
Therefore, the map & > unmap functions schedule rmap updates directly (via the deferred ops mechanism) > and the alloc/free functions, if they're called, should not update the rmapbt. > Zeroing out the oinfo indicates this. However, XFS_RMAP_OWN_UNKNOWN is now > unused, so I think I can overload that, especially since we should never be > writing XFS_RMAP_OWN_UNKNOWN to disk. > > I think I can simply create an "xfs_rmap_skip_owner_update()" helper (like the > other xfs_rmap_*_owner functions) to encapsulate this. > > if (oinfo) > new->xbfi_oinfo = *oinfo; > else > xfs_rmap_skip_owner_update(&new->xbfi_oinfo); > > Seems clearer, I hope? > Ok, yup. Thanks for the explanation. > Also, the "Special Case #2: EFIs do not record the owner of the extent, so > when" comment is now wrong and needs to be changed. > > "Special Case #2: An owner of XFS_RMAP_OWN_UNKNOWN means 'no rmap update'". > > > > trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0, > > > XFS_FSB_TO_AGBNO(mp, bno), len); > > > xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xbfi_list); ... > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > > index b5b0901..97f354f 100644 > > > --- a/fs/xfs/libxfs/xfs_format.h > > > +++ b/fs/xfs/libxfs/xfs_format.h > > > @@ -1318,6 +1318,71 @@ typedef __be32 xfs_inobt_ptr_t; > > > */ > > > #define XFS_RMAP_CRC_MAGIC 0x524d4233 /* 'RMB3' */ > > > > > > +/* > > > + * Ownership info for an extent. This is used to create reverse-mapping > > > + * entries. 
> > > + */ > > > +#define XFS_OWNER_INFO_ATTR_FORK (1 << 0) > > > +#define XFS_OWNER_INFO_BMBT_BLOCK (1 << 1) > > > +struct xfs_owner_info { > > > + uint64_t oi_owner; > > > + xfs_fileoff_t oi_offset; > > > + unsigned int oi_flags; > > > +}; > > > + > > > +static inline void > > > +xfs_rmap_ag_owner( > > > + struct xfs_owner_info *oi, > > > + uint64_t owner) > > > +{ > > > + oi->oi_owner = owner; > > > + oi->oi_offset = 0; > > > + oi->oi_flags = 0; > > > +} > > > + > > > +static inline void > > > +xfs_rmap_ino_bmbt_owner( > > > + struct xfs_owner_info *oi, > > > + xfs_ino_t ino, > > > + int whichfork) > > > +{ > > > + oi->oi_owner = ino; > > > + oi->oi_offset = 0; > > > + oi->oi_flags = XFS_OWNER_INFO_BMBT_BLOCK; > > > + if (whichfork == XFS_ATTR_FORK) > > > + oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK; > > > +} > > > + > > > +static inline void > > > +xfs_rmap_ino_owner( > > > + struct xfs_owner_info *oi, > > > + xfs_ino_t ino, > > > + int whichfork, > > > + xfs_fileoff_t offset) > > > +{ > > > + oi->oi_owner = ino; > > > + oi->oi_offset = offset; > > > + oi->oi_flags = 0; > > > + if (whichfork == XFS_ATTR_FORK) > > > + oi->oi_flags |= XFS_OWNER_INFO_ATTR_FORK; > > > +} > > > + > > > +/* > > > + * Special owner types. > > > + * > > > + * Seeing as we only support up to 8EB, we have the upper bit of the owner field > > > + * to tell us we have a special owner value. We use these for static metadata > > > + * allocated at mkfs/growfs time, as well as for freespace management metadata. > > > + */ > > > +#define XFS_RMAP_OWN_NULL (-1ULL) /* No owner, for growfs */ > > > +#define XFS_RMAP_OWN_UNKNOWN (-2ULL) /* Unknown owner, for EFI recovery */ > > > +#define XFS_RMAP_OWN_FS (-3ULL) /* static fs metadata */ > > > +#define XFS_RMAP_OWN_LOG (-4ULL) /* static fs metadata */ > > > +#define XFS_RMAP_OWN_AG (-5ULL) /* AG freespace btree blocks */ > > > > How about XFS_RMAP_OWN_AGFL? OWN_AG confuses me into thinking it's for > > AG headers, but IIUC that is covered by OWN_FS. 
> > or _SPACEBT for AG {free,rmap} space btrees? > I was thinking that this type only represented free list blocks and that the mapping would be updated when the block was actually allocated to a btree. As Dave points out in his followup response, that is not the case. OWN_AG actually makes more sense to me in that light, so feel free to disregard this comment. Brian > > > +#define XFS_RMAP_OWN_INOBT (-6ULL) /* Inode btree blocks */ > > > +#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */ > > > +#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */ > > > + > > > #define XFS_RMAP_BLOCK(mp) \ > > > (xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \ > > > XFS_FIBT_BLOCK(mp) + 1 : \ > > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c > > > index dbc3e35..1982561 100644 > > > --- a/fs/xfs/libxfs/xfs_ialloc.c > > > +++ b/fs/xfs/libxfs/xfs_ialloc.c > > > @@ -615,6 +615,7 @@ xfs_ialloc_ag_alloc( > > > args.tp = tp; > > > args.mp = tp->t_mountp; > > > args.fsbno = NULLFSBLOCK; > > > + xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INODES); > > > > > > #ifdef DEBUG > > > /* randomly do sparse inode allocations */ > > > @@ -1825,12 +1826,14 @@ xfs_difree_inode_chunk( > > > int nextbit; > > > xfs_agblock_t agbno; > > > int contigblk; > > > + struct xfs_owner_info oinfo; > > > DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS); > > > + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES); > > > > > > if (!xfs_inobt_issparse(rec->ir_holemask)) { > > > /* not sparse, calculate extent info directly */ > > > xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, sagbno), > > > - mp->m_ialloc_blks); > > > + mp->m_ialloc_blks, &oinfo); > > > return; > > > } > > > > > > @@ -1874,7 +1877,7 @@ xfs_difree_inode_chunk( > > > ASSERT(agbno % mp->m_sb.sb_spino_align == 0); > > > ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); > > > xfs_bmap_add_free(mp, dfops, XFS_AGB_TO_FSB(mp, agno, agbno), > > > - contigblk); > > > + contigblk, &oinfo); > > > > > > /* reset range to current bit and 
carry on... */ > > > startidx = endidx = nextbit; > > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c > > > index 88da2ad..f9ea86b 100644 > > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c > > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c > > > @@ -96,6 +96,7 @@ xfs_inobt_alloc_block( > > > memset(&args, 0, sizeof(args)); > > > args.tp = cur->bc_tp; > > > args.mp = cur->bc_mp; > > > + xfs_rmap_ag_owner(&args.oinfo, XFS_RMAP_OWN_INOBT); > > > args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno); > > > args.minlen = 1; > > > args.maxlen = 1; > > > @@ -125,8 +126,12 @@ xfs_inobt_free_block( > > > struct xfs_btree_cur *cur, > > > struct xfs_buf *bp) > > > { > > > + struct xfs_owner_info oinfo; > > > + > > > + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT); > > > return xfs_free_extent(cur->bc_tp, > > > - XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1); > > > + XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1, > > > + &oinfo); > > > } > > > > > > STATIC int > > > diff --git a/fs/xfs/xfs_defer_item.c b/fs/xfs/xfs_defer_item.c > > > index 127a54e..1c2d556 100644 > > > --- a/fs/xfs/xfs_defer_item.c > > > +++ b/fs/xfs/xfs_defer_item.c > > > @@ -99,7 +99,8 @@ xfs_bmap_free_finish_item( > > > free = container_of(item, struct xfs_bmap_free_item, xbfi_list); > > > error = xfs_trans_free_extent(tp, done_item, > > > free->xbfi_startblock, > > > - free->xbfi_blockcount); > > > + free->xbfi_blockcount, > > > + &free->xbfi_oinfo); > > > kmem_free(free); > > > return error; > > > } > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > index 62162d4..d60bb97 100644 > > > --- a/fs/xfs/xfs_fsops.c > > > +++ b/fs/xfs/xfs_fsops.c > > > @@ -436,6 +436,8 @@ xfs_growfs_data_private( > > > * There are new blocks in the old last a.g. > > > */ > > > if (new) { > > > + struct xfs_owner_info oinfo; > > > + > > > /* > > > * Change the agi length. 
> > > */ > > > @@ -463,14 +465,20 @@ xfs_growfs_data_private( > > > be32_to_cpu(agi->agi_length)); > > > > > > xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH); > > > + > > > /* > > > * Free the new space. > > > + * > > > + * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that > > > + * this doesn't actually exist in the rmap btree. > > > */ > > > - error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno, > > > - be32_to_cpu(agf->agf_length) - new), new); > > > - if (error) { > > > + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > > > + error = xfs_free_extent(tp, > > > + XFS_AGB_TO_FSB(mp, agno, > > > + be32_to_cpu(agf->agf_length) - new), > > > + new, &oinfo); > > > + if (error) > > > goto error0; > > > - } > > > } > > > > > > /* > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > > > index 080b54b..0c41bd2 100644 > > > --- a/fs/xfs/xfs_log_recover.c > > > +++ b/fs/xfs/xfs_log_recover.c > > > @@ -4180,6 +4180,7 @@ xlog_recover_process_efi( > > > int error = 0; > > > xfs_extent_t *extp; > > > xfs_fsblock_t startblock_fsb; > > > + struct xfs_owner_info oinfo; > > > > > > ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)); > > > > > > @@ -4211,10 +4212,12 @@ xlog_recover_process_efi( > > > return error; > > > efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents); > > > > > > + oinfo.oi_owner = 0; > > > > Should this be XFS_RMAP_OWN_UNKNOWN? > > xfs_rmap_skip_owner_update(), but yes. 
> > --D > > > > > Brian > > > > > for (i = 0; i < efip->efi_format.efi_nextents; i++) { > > > extp = &(efip->efi_format.efi_extents[i]); > > > error = xfs_trans_free_extent(tp, efdp, extp->ext_start, > > > - extp->ext_len); > > > + extp->ext_len, > > > + &oinfo); > > > if (error) > > > goto abort_error; > > > > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > > index 9a462e8..f8d363f 100644 > > > --- a/fs/xfs/xfs_trans.h > > > +++ b/fs/xfs/xfs_trans.h > > > @@ -219,7 +219,7 @@ struct xfs_efd_log_item *xfs_trans_get_efd(xfs_trans_t *, > > > uint); > > > int xfs_trans_free_extent(struct xfs_trans *, > > > struct xfs_efd_log_item *, xfs_fsblock_t, > > > - xfs_extlen_t); > > > + xfs_extlen_t, struct xfs_owner_info *); > > > int xfs_trans_commit(struct xfs_trans *); > > > int __xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *); > > > int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *); > > > diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c > > > index a96ae54..d1b8833 100644 > > > --- a/fs/xfs/xfs_trans_extfree.c > > > +++ b/fs/xfs/xfs_trans_extfree.c > > > @@ -118,13 +118,14 @@ xfs_trans_free_extent( > > > struct xfs_trans *tp, > > > struct xfs_efd_log_item *efdp, > > > xfs_fsblock_t start_block, > > > - xfs_extlen_t ext_len) > > > + xfs_extlen_t ext_len, > > > + struct xfs_owner_info *oinfo) > > > { > > > uint next_extent; > > > struct xfs_extent *extp; > > > int error; > > > > > > - error = xfs_free_extent(tp, start_block, ext_len); > > > + error = xfs_free_extent(tp, start_block, ext_len, oinfo); > > > > > > /* > > > * Mark the transaction dirty, even on error. 
This ensures the > > > > > > _______________________________________________ > > > xfs mailing list > > > xfs@xxxxxxxxxxx > > > http://oss.sgi.com/mailman/listinfo/xfs -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
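
For reference, here is a rough userspace sketch of the owner-info handling discussed in this thread. The struct layout, the xfs_rmap_ag_owner() helper and the XFS_RMAP_OWN_* values are copied from the quoted patch. Everything else is an assumption made for illustration: the body of xfs_rmap_skip_owner_update() (the thread only proposes the name and the idea of overloading XFS_RMAP_OWN_UNKNOWN), the owner_is_special() check (a hypothetical reading of the "upper bit" comment in xfs_format.h), set_free_item_owner() (a hypothetical stand-in for the assignment inside xfs_bmap_add_free()), and the xfs_fileoff_t typedef. It is not the code that was eventually merged.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel type; an assumption for this sketch. */
typedef uint64_t xfs_fileoff_t;

/* Ownership info for an extent, as defined in the quoted patch. */
struct xfs_owner_info {
	uint64_t	oi_owner;
	xfs_fileoff_t	oi_offset;
	unsigned int	oi_flags;
};

/* Special owner values, as defined in the quoted patch. */
#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Proposed: "no rmap update" */
#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
#define XFS_RMAP_OWN_MIN	(-8ULL)	/* guard */

/* Helper from the quoted patch: owner is a special AG metadata type. */
static inline void
xfs_rmap_ag_owner(
	struct xfs_owner_info	*oi,
	uint64_t		owner)
{
	oi->oi_owner = owner;
	oi->oi_offset = 0;
	oi->oi_flags = 0;
}

/*
 * Hypothetical body for the helper proposed in the thread: mark the
 * owner info so that alloc/free skips the rmapbt update.
 */
static inline void
xfs_rmap_skip_owner_update(
	struct xfs_owner_info	*oi)
{
	oi->oi_owner = XFS_RMAP_OWN_UNKNOWN;
	oi->oi_offset = 0;
	oi->oi_flags = 0;
}

/*
 * Hypothetical check: special owners have the upper bit of the 64-bit
 * owner field set, which a real inode number never does.
 */
static inline int
owner_is_special(
	uint64_t		owner)
{
	return (owner >> 63) != 0;
}

/*
 * Hypothetical stand-in for the assignment in xfs_bmap_add_free():
 * copy the caller's owner info, or mark "skip" when none is given.
 */
static inline void
set_free_item_owner(
	struct xfs_owner_info		*dst,
	const struct xfs_owner_info	*oinfo)
{
	if (oinfo)
		*dst = *oinfo;
	else
		xfs_rmap_skip_owner_update(dst);
}
```

With this shape, a caller that frees an inode chunk would fill the owner with xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INODES) and pass &oinfo, while the file map/unmap paths would pass NULL and schedule their rmap updates separately through the deferred ops mechanism.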