Re: [PATCH 042/119] xfs: log rmap intent items

On Sat, Jul 16, 2016 at 12:34:09AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 15, 2016 at 02:33:46PM -0400, Brian Foster wrote:
> > On Thu, Jun 16, 2016 at 06:22:21PM -0700, Darrick J. Wong wrote:
> > > Provide a mechanism for higher levels to create RUI/RUD items, submit
> > > them to the log, and a stub function to deal with recovered RUI items.
> > > These parts will be connected to the rmapbt in a later patch.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > ---
> > 
> > The commit log makes no mention of log recovery.. perhaps this should be
> > split in two?
> > 
> > >  fs/xfs/Makefile          |    1 
> > >  fs/xfs/xfs_log_recover.c |  344 +++++++++++++++++++++++++++++++++++++++++++++-
> > >  fs/xfs/xfs_trans.h       |   17 ++
> > >  fs/xfs/xfs_trans_rmap.c  |  235 +++++++++++++++++++++++++++++++
> > >  4 files changed, 589 insertions(+), 8 deletions(-)
> > >  create mode 100644 fs/xfs/xfs_trans_rmap.c
> > > 
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 8ae0a10..1980110 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -110,6 +110,7 @@ xfs-y				+= xfs_log.o \
> > >  				   xfs_trans_buf.o \
> > >  				   xfs_trans_extfree.o \
> > >  				   xfs_trans_inode.o \
> > > +				   xfs_trans_rmap.o \
> > >  
> > >  # optional features
> > >  xfs-$(CONFIG_XFS_QUOTA)		+= xfs_dquot.o \
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index b33187b..c9fe0c4 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
...
> > > @@ -4265,17 +4383,23 @@ xlog_recover_process_efis(
> > >  	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> > >  	while (lip != NULL) {
> > >  		/*
> > > -		 * We're done when we see something other than an EFI.
> > > -		 * There should be no EFIs left in the AIL now.
> > > +		 * We're done when we see something other than an intent.
> > > +		 * There should be no intents left in the AIL now.
> > >  		 */
> > > -		if (lip->li_type != XFS_LI_EFI) {
> > > +		if (!xlog_item_is_intent(lip)) {
> > >  #ifdef DEBUG
> > >  			for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
> > > -				ASSERT(lip->li_type != XFS_LI_EFI);
> > > +				ASSERT(!xlog_item_is_intent(lip));
> > >  #endif
> > >  			break;
> > >  		}
> > >  
> > > +		/* Skip anything that isn't an EFI */
> > > +		if (lip->li_type != XFS_LI_EFI) {
> > > +			lip = xfs_trans_ail_cursor_next(ailp, &cur);
> > > +			continue;
> > > +		}
> > > +
> > 
> > Hmm, so previously this function used the existence of any non-EFI item
> > as an end of traversal marker, since the freeing operations add more
> > items to the AIL. It's not immediately clear to me whether this is just
> > an efficiency thing or a potential problem, but I wonder if we should
> > grab the last item and use that or its lsn as an end of list marker.
> 
> FWIW I designed all this under the impression that it was safe to stop looking
> for intent items once we found something that wasn't an intent item because all
> the new items generated during log recovery came after, and therefore there was
> no problem.
> 

Ok. To be clear, are you saying that any new intents should follow
non-intent items? If so, that sounds... reasonable (perhaps a little
landmine-ish :P).
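
To make the end-of-list marker idea concrete, here is a rough userspace
sketch. It is only an illustration under stated assumptions: the toy_*
names and the flat array are hypothetical stand-ins for the AIL and its
cursor, not the kernel structures. The walk skips non-matching items
instead of stopping at them, and stops at the first item past the LSN
sampled before recovery began.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for AIL items; hypothetical names, not kernel structures. */
enum toy_type { TOY_OTHER, TOY_EFI, TOY_RUI };

struct toy_item {
	uint64_t	lsn;		/* position in the log */
	enum toy_type	type;
	bool		recovered;
};

/* LSN of the last item on the list when recovery starts (0 if empty). */
static uint64_t
toy_last_lsn(const struct toy_item *items, size_t nr)
{
	return nr ? items[nr - 1].lsn : 0;
}

/*
 * Recover every item of type @want whose LSN is at or below @end_lsn.
 * Items of other types are skipped rather than ending the walk; the walk
 * ends at the first item past @end_lsn, which is assumed to have been
 * added by recovery itself. Returns the number of items recovered.
 */
static size_t
toy_recover(struct toy_item *items, size_t nr, enum toy_type want,
	    uint64_t end_lsn)
{
	size_t	i, done = 0;

	for (i = 0; i < nr; i++) {
		/* The list is assumed to be sorted by LSN. */
		assert(i == 0 || items[i - 1].lsn <= items[i].lsn);
		if (items[i].lsn > end_lsn)
			break;
		if (items[i].type != want || items[i].recovered)
			continue;
		items[i].recovered = true;
		done++;
	}
	return done;
}
```

The caller would sample toy_last_lsn() once, before replaying anything,
and pass that value to each per-type walk.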

> > At the very least we need to update the comment at the top of the
> > function wrt the current behavior.
> 
> Oops, missed that, yeah.
> 
> > >  		/*
> > >  		 * Skip EFIs that we've already processed.
> > >  		 */
...
> > > @@ -5144,11 +5458,19 @@ xlog_recover_finish(
> > >  	 */
> > >  	if (log->l_flags & XLOG_RECOVERY_NEEDED) {
> > >  		int	error;
> > > +
> > > +		error = xlog_recover_process_ruis(log);
> > > +		if (error) {
> > > +			xfs_alert(log->l_mp, "Failed to recover RUIs");
> > > +			return error;
> > > +		}
> > > +
> > >  		error = xlog_recover_process_efis(log);
> > >  		if (error) {
> > >  			xfs_alert(log->l_mp, "Failed to recover EFIs");
> > >  			return error;
> > >  		}
> > > +
> > 
> > Is the order important here in any way (e.g., RUIs before EFIs)? If so,
> > it might be a good idea to call it out.
> 
> AFAIK the intent items within a particular type have to be replayed in
> order, but between types, there isn't a problem with the current code.
> 
> That said, I'd also been wondering if it made more sense to iterate the
> list of items /once/ and actually replay items in order.  Less iteration
> and the order of replayed items matches the log order much more closely.
> 

That sounds like a nice idea to me. There might actually be some room
for consolidation between the RUI/EFI recovered bits and whatnot, but
only if it makes things more clean and simple.
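
Something along these lines is what I have in mind for the single-pass
variant. This is a userspace toy, not a proposal for the actual code:
the demo_* names, the array standing in for the AIL, and the trace
buffer are all assumptions made for illustration. One walk in log order
dispatches each intent to a per-type routine and skips everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are not the kernel's log item definitions. */
enum demo_type { DEMO_OTHER, DEMO_EFI, DEMO_RUI };

struct demo_item {
	unsigned long	lsn;
	enum demo_type	type;
};

/* Records the order in which intents were replayed. */
struct demo_trace {
	unsigned long	lsns[16];
	size_t		nr;
};

/* Note that an item was replayed; fails if the trace buffer is full. */
static int
demo_note(const struct demo_item *item, struct demo_trace *trace)
{
	assert(trace != NULL);
	if (trace->nr >= sizeof(trace->lsns) / sizeof(trace->lsns[0]))
		return -1;
	trace->lsns[trace->nr++] = item->lsn;
	return 0;
}

/* Stand-ins for the per-type recovery routines. */
static int
demo_recover_rui(const struct demo_item *item, struct demo_trace *trace)
{
	return demo_note(item, trace);
}

static int
demo_recover_efi(const struct demo_item *item, struct demo_trace *trace)
{
	return demo_note(item, trace);
}

/*
 * Walk the list once, in log order, and hand each intent to the routine
 * for its type. Non-intent items are skipped. Stops at the first error.
 */
static int
demo_recover_intents(const struct demo_item *items, size_t nr,
		     struct demo_trace *trace)
{
	size_t	i;
	int	error;

	for (i = 0; i < nr; i++) {
		switch (items[i].type) {
		case DEMO_RUI:
			error = demo_recover_rui(&items[i], trace);
			break;
		case DEMO_EFI:
			error = demo_recover_efi(&items[i], trace);
			break;
		default:
			continue;	/* not an intent, move on */
		}
		if (error)
			return error;
	}
	return 0;
}
```

With this shape the replay order follows the list order regardless of
item type, so there is no RUI-before-EFI question to document.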

Brian

> > >  		/*
> > >  		 * Sync the log to get all the EFIs out of the AIL.
> > >  		 * This isn't absolutely necessary, but it helps in
> > > @@ -5176,9 +5498,15 @@ xlog_recover_cancel(
> > >  	struct xlog	*log)
> > >  {
> > >  	int		error = 0;
> > > +	int		err2;
> > >  
> > > -	if (log->l_flags & XLOG_RECOVERY_NEEDED)
> > > -		error = xlog_recover_cancel_efis(log);
> > > +	if (log->l_flags & XLOG_RECOVERY_NEEDED) {
> > > +		error = xlog_recover_cancel_ruis(log);
> > > +
> > > +		err2 = xlog_recover_cancel_efis(log);
> > > +		if (err2 && !error)
> > > +			error = err2;
> > > +	}
> > >  
> > >  	return error;
> > >  }
> > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > > index f8d363f..c48be63 100644
> > > --- a/fs/xfs/xfs_trans.h
> > > +++ b/fs/xfs/xfs_trans.h
> > > @@ -235,4 +235,21 @@ void		xfs_trans_buf_copy_type(struct xfs_buf *dst_bp,
> > >  extern kmem_zone_t	*xfs_trans_zone;
> > >  extern kmem_zone_t	*xfs_log_item_desc_zone;
> > >  
> > > +enum xfs_rmap_intent_type;
> > > +
> > > +struct xfs_rui_log_item *xfs_trans_get_rui(struct xfs_trans *tp, uint nextents);
> > > +void xfs_trans_log_start_rmap_update(struct xfs_trans *tp,
> > > +		struct xfs_rui_log_item *ruip, enum xfs_rmap_intent_type type,
> > > +		__uint64_t owner, int whichfork, xfs_fileoff_t startoff,
> > > +		xfs_fsblock_t startblock, xfs_filblks_t blockcount,
> > > +		xfs_exntst_t state);
> > > +
> > > +struct xfs_rud_log_item *xfs_trans_get_rud(struct xfs_trans *tp,
> > > +		struct xfs_rui_log_item *ruip, uint nextents);
> > > +int xfs_trans_log_finish_rmap_update(struct xfs_trans *tp,
> > > +		struct xfs_rud_log_item *rudp, enum xfs_rmap_intent_type type,
> > > +		__uint64_t owner, int whichfork, xfs_fileoff_t startoff,
> > > +		xfs_fsblock_t startblock, xfs_filblks_t blockcount,
> > > +		xfs_exntst_t state);
> > > +
> > >  #endif	/* __XFS_TRANS_H__ */
> > > diff --git a/fs/xfs/xfs_trans_rmap.c b/fs/xfs/xfs_trans_rmap.c
> > > new file mode 100644
> > > index 0000000..b55a725
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_trans_rmap.c
> > > @@ -0,0 +1,235 @@
> > > +/*
> > > + * Copyright (C) 2016 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_trans_priv.h"
> > > +#include "xfs_rmap_item.h"
> > > +#include "xfs_alloc.h"
> > > +#include "xfs_rmap_btree.h"
> > > +
> > > +/*
> > > + * This routine is called to allocate an "rmap update intent"
> > > + * log item that will hold nextents worth of extents.  The
> > > + * caller must use all nextents extents, because we are not
> > > + * flexible about this at all.
> > > + */
> > > +struct xfs_rui_log_item *
> > > +xfs_trans_get_rui(
> > > +	struct xfs_trans		*tp,
> > > +	uint				nextents)
> > > +{
> > > +	struct xfs_rui_log_item		*ruip;
> > > +
> > > +	ASSERT(tp != NULL);
> > > +	ASSERT(nextents > 0);
> > > +
> > > +	ruip = xfs_rui_init(tp->t_mountp, nextents);
> > > +	ASSERT(ruip != NULL);
> > > +
> > > +	/*
> > > +	 * Get a log_item_desc to point at the new item.
> > > +	 */
> > > +	xfs_trans_add_item(tp, &ruip->rui_item);
> > > +	return ruip;
> > > +}
> > > +
> > > +/*
> > > + * This routine is called to indicate that the described
> > > + * extent is to be logged as needing to be freed.  It should
> > > + * be called once for each extent to be freed.
> > > + */
> > 
> > Stale comment.
> 
> <nod>
> 
> > > +void
> > > +xfs_trans_log_start_rmap_update(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_rui_log_item		*ruip,
> > > +	enum xfs_rmap_intent_type	type,
> > > +	__uint64_t			owner,
> > > +	int				whichfork,
> > > +	xfs_fileoff_t			startoff,
> > > +	xfs_fsblock_t			startblock,
> > > +	xfs_filblks_t			blockcount,
> > > +	xfs_exntst_t			state)
> > > +{
> > > +	uint				next_extent;
> > > +	struct xfs_map_extent		*rmap;
> > > +
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	ruip->rui_item.li_desc->lid_flags |= XFS_LID_DIRTY;
> > > +
> > > +	/*
> > > +	 * atomic_inc_return gives us the value after the increment;
> > > +	 * we want to use it as an array index so we need to subtract 1 from
> > > +	 * it.
> > > +	 */
> > > +	next_extent = atomic_inc_return(&ruip->rui_next_extent) - 1;
> > > +	ASSERT(next_extent < ruip->rui_format.rui_nextents);
> > > +	rmap = &(ruip->rui_format.rui_extents[next_extent]);
> > > +	rmap->me_owner = owner;
> > > +	rmap->me_startblock = startblock;
> > > +	rmap->me_startoff = startoff;
> > > +	rmap->me_len = blockcount;
> > > +	rmap->me_flags = 0;
> > > +	if (state == XFS_EXT_UNWRITTEN)
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNWRITTEN;
> > > +	if (whichfork == XFS_ATTR_FORK)
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_ATTR_FORK;
> > > +	switch (type) {
> > > +	case XFS_RMAP_MAP:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_MAP;
> > > +		break;
> > > +	case XFS_RMAP_MAP_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_MAP_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_UNMAP:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNMAP;
> > > +		break;
> > > +	case XFS_RMAP_UNMAP_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNMAP_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_CONVERT:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_CONVERT;
> > > +		break;
> > > +	case XFS_RMAP_CONVERT_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_CONVERT_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_ALLOC:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_ALLOC;
> > > +		break;
> > > +	case XFS_RMAP_FREE:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_FREE;
> > > +		break;
> > > +	default:
> > > +		ASSERT(0);
> > > +	}
> > 
> > Between here and the finish function, it looks like we could use a
> > helper to convert the state and whatnot to extent flags.
> 
> Ok.
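
For reference, the helper could look roughly like the sketch below. It
is a standalone userspace illustration, not the patch's code: the sk_*
and SK_* names are hypothetical, and the constant values are
placeholders rather than the on-disk log format values. Both the start
and finish functions would then reduce to a single call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration only. */
#define SK_EXTENT_MAP			1U
#define SK_EXTENT_MAP_SHARED		2U
#define SK_EXTENT_UNMAP			3U
#define SK_EXTENT_UNMAP_SHARED		4U
#define SK_EXTENT_CONVERT		5U
#define SK_EXTENT_CONVERT_SHARED	6U
#define SK_EXTENT_ALLOC			7U
#define SK_EXTENT_FREE			8U
#define SK_EXTENT_ATTR_FORK		(1U << 31)
#define SK_EXTENT_UNWRITTEN		(1U << 29)

/* Hypothetical mirror of the intent type enum used by the patch. */
enum sk_intent_type {
	SK_RMAP_MAP,
	SK_RMAP_MAP_SHARED,
	SK_RMAP_UNMAP,
	SK_RMAP_UNMAP_SHARED,
	SK_RMAP_CONVERT,
	SK_RMAP_CONVERT_SHARED,
	SK_RMAP_ALLOC,
	SK_RMAP_FREE,
};

/* Fold the intent type, fork, and extent state into one flags word. */
static uint32_t
sk_rmap_extent_flags(enum sk_intent_type type, bool attr_fork, bool unwritten)
{
	uint32_t	flags = 0;

	if (unwritten)
		flags |= SK_EXTENT_UNWRITTEN;
	if (attr_fork)
		flags |= SK_EXTENT_ATTR_FORK;

	switch (type) {
	case SK_RMAP_MAP:
		flags |= SK_EXTENT_MAP;
		break;
	case SK_RMAP_MAP_SHARED:
		flags |= SK_EXTENT_MAP_SHARED;
		break;
	case SK_RMAP_UNMAP:
		flags |= SK_EXTENT_UNMAP;
		break;
	case SK_RMAP_UNMAP_SHARED:
		flags |= SK_EXTENT_UNMAP_SHARED;
		break;
	case SK_RMAP_CONVERT:
		flags |= SK_EXTENT_CONVERT;
		break;
	case SK_RMAP_CONVERT_SHARED:
		flags |= SK_EXTENT_CONVERT_SHARED;
		break;
	case SK_RMAP_ALLOC:
		flags |= SK_EXTENT_ALLOC;
		break;
	case SK_RMAP_FREE:
		flags |= SK_EXTENT_FREE;
		break;
	default:
		assert(0);	/* unknown intent type */
	}
	return flags;
}
```
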
> 
> > > +}
> > > +
> > > +
> > > +/*
> > > + * This routine is called to allocate an "extent free done"
> > > + * log item that will hold nextents worth of extents.  The
> > > + * caller must use all nextents extents, because we are not
> > > + * flexible about this at all.
> > > + */
> > 
> > Comment needs updating.
> 
> Ok.
> 
> > Brian
> > 
> > > +struct xfs_rud_log_item *
> > > +xfs_trans_get_rud(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_rui_log_item		*ruip,
> > > +	uint				nextents)
> > > +{
> > > +	struct xfs_rud_log_item		*rudp;
> > > +
> > > +	ASSERT(tp != NULL);
> > > +	ASSERT(nextents > 0);
> > > +
> > > +	rudp = xfs_rud_init(tp->t_mountp, ruip, nextents);
> > > +	ASSERT(rudp != NULL);
> > > +
> > > +	/*
> > > +	 * Get a log_item_desc to point at the new item.
> > > +	 */
> > > +	xfs_trans_add_item(tp, &rudp->rud_item);
> > > +	return rudp;
> > > +}
> > > +
> > > +/*
> > > + * Finish an rmap update and log it to the RUD. Note that the transaction is
> > > + * marked dirty regardless of whether the rmap update succeeds or fails to
> > > + * support the RUI/RUD lifecycle rules.
> > > + */
> > > +int
> > > +xfs_trans_log_finish_rmap_update(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_rud_log_item		*rudp,
> > > +	enum xfs_rmap_intent_type	type,
> > > +	__uint64_t			owner,
> > > +	int				whichfork,
> > > +	xfs_fileoff_t			startoff,
> > > +	xfs_fsblock_t			startblock,
> > > +	xfs_filblks_t			blockcount,
> > > +	xfs_exntst_t			state)
> > > +{
> > > +	uint				next_extent;
> > > +	struct xfs_map_extent		*rmap;
> > > +	int				error;
> > > +
> > > +	/* XXX: actually finish the rmap update here */
> > > +	error = -EFSCORRUPTED;
> > > +
> > > +	/*
> > > +	 * Mark the transaction dirty, even on error. This ensures the
> > > +	 * transaction is aborted, which:
> > > +	 *
> > > +	 * 1.) releases the RUI and frees the RUD
> > > +	 * 2.) shuts down the filesystem
> > > +	 */
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	rudp->rud_item.li_desc->lid_flags |= XFS_LID_DIRTY;
> > > +
> > > +	next_extent = rudp->rud_next_extent;
> > > +	ASSERT(next_extent < rudp->rud_format.rud_nextents);
> > > +	rmap = &(rudp->rud_format.rud_extents[next_extent]);
> > > +	rmap->me_owner = owner;
> > > +	rmap->me_startblock = startblock;
> > > +	rmap->me_startoff = startoff;
> > > +	rmap->me_len = blockcount;
> > > +	rmap->me_flags = 0;
> > > +	if (state == XFS_EXT_UNWRITTEN)
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNWRITTEN;
> > > +	if (whichfork == XFS_ATTR_FORK)
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_ATTR_FORK;
> > > +	switch (type) {
> > > +	case XFS_RMAP_MAP:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_MAP;
> > > +		break;
> > > +	case XFS_RMAP_MAP_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_MAP_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_UNMAP:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNMAP;
> > > +		break;
> > > +	case XFS_RMAP_UNMAP_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_UNMAP_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_CONVERT:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_CONVERT;
> > > +		break;
> > > +	case XFS_RMAP_CONVERT_SHARED:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_CONVERT_SHARED;
> > > +		break;
> > > +	case XFS_RMAP_ALLOC:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_ALLOC;
> > > +		break;
> > > +	case XFS_RMAP_FREE:
> > > +		rmap->me_flags |= XFS_RMAP_EXTENT_FREE;
> > > +		break;
> > > +	default:
> > > +		ASSERT(0);
> > > +	}
> > > +	rudp->rud_next_extent++;
> > > +
> > > +	return error;
> > > +}
> > > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@xxxxxxxxxxx
> > > http://oss.sgi.com/mailman/listinfo/xfs


