Re: [PATCH v2 2/6] xfs: defer agfl block frees when dfops is available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 18, 2018 at 09:31:15AM -0400, Brian Foster wrote:
> The AGFL fixup code executes before every block allocation/free and
> rectifies the AGFL based on the current, dynamic allocation
> requirements of the fs. The AGFL must hold a minimum number of
> blocks to satisfy a worst case split of the free space btrees caused
> by the impending allocation operation. The AGFL is also updated to
> maintain the implicit requirement for a minimum number of free slots
> to satisfy a worst case join of the free space btrees.
> 
> Since the AGFL caches individual blocks, AGFL reduction typically
> involves multiple, single block frees. We've had reports of
> transaction overrun problems during certain workloads that boil down
> to AGFL reduction freeing multiple blocks and consuming more space
> in the log than was reserved for the transaction.
> 
> Since the objective of freeing AGFL blocks is to ensure free AGFL
> free slots are available for the upcoming allocation, one way to
> address this problem is to release surplus blocks from the AGFL
> immediately but defer the free of those blocks (similar to how
> file-mapped blocks are unmapped from the file in one transaction and
> freed via a deferred operation) until the transaction is rolled.
> This turns AGFL reduction into an operation with predictable log
> reservation consumption.
> 
> Add the capability to defer AGFL block frees when a deferred ops
> list is available to the AGFL fixup code. Add a dfops pointer to the
> transaction to carry dfops through various contexts to the allocator
> context. Deferring AGFL frees is  conditional behavior based on
> whether the transaction pointer is populated. The long term
> objective is to reuse the transaction pointer to clean up all
> unrelated callchains that pass dfops on the stack along with a
> transaction and in doing so, consistently defer AGFL blocks from the
> allocator.
> 
> A bit of customization is required to handle deferred completion
> processing because AGFL blocks are accounted against a per-ag
> reservation pool and AGFL blocks are not inserted into the extent
> busy list when freed (they are inserted when used and released back
> to the AGFL). Reuse the majority of the existing deferred extent
> free infrastructure and customize it appropriately to handle AGFL
> blocks.
> 
> Note that this patch only adds infrastructure. It does not change
> behavior because no callers have been updated to pass ->t_agfl_dfops
> into the allocation code.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_alloc.c  | 51 ++++++++++++++++++++++++++++++---
>  fs/xfs/libxfs/xfs_defer.h  |  1 +
>  fs/xfs/xfs_trace.h         |  2 ++
>  fs/xfs/xfs_trans.c         | 11 ++++++--
>  fs/xfs/xfs_trans.h         |  1 +
>  fs/xfs/xfs_trans_extfree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 129 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 193a5b4909c5..350ad203b082 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -39,6 +39,9 @@
>  #include "xfs_buf_item.h"
>  #include "xfs_log.h"
>  #include "xfs_ag_resv.h"
> +#include "xfs_bmap.h"
> +
> +extern kmem_zone_t	*xfs_bmap_free_item_zone;
>  
>  struct workqueue_struct *xfs_alloc_wq;
>  
> @@ -2172,6 +2175,40 @@ xfs_agfl_reset(
>  }
>  
>  /*
> + * Defer an AGFL block free. This is effectively equivalent to
> + * xfs_bmap_add_free() with some special handling particular to AGFL blocks.
> + *
> + * Deferring AGFL frees helps prevent log reservation overruns due to too many
> + * allocation operations in a transaction. AGFL frees are prone to this problem
> + * because for one they are always freed one at a time. Further, an immediate
> + * AGFL block free can cause a btree join and require another block free before
> + * the real allocation can proceed. Deferring the free disconnects freeing up
> + * the AGFL slot from freeing the block.
> + */
> +STATIC void
> +xfs_defer_agfl_block(
> +	struct xfs_mount		*mp,
> +	struct xfs_defer_ops		*dfops,
> +	xfs_agnumber_t			agno,
> +	xfs_fsblock_t			agbno,
> +	struct xfs_owner_info		*oinfo)
> +{
> +	struct xfs_extent_free_item	*new;		/* new element */
> +
> +	ASSERT(xfs_bmap_free_item_zone != NULL);
> +	ASSERT(oinfo != NULL);
> +
> +	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
> +	new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
> +	new->xefi_blockcount = 1;
> +	new->xefi_oinfo = *oinfo;
> +
> +	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
> +
> +	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list);
> +}
> +
> +/*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
>   */
> @@ -2275,10 +2312,16 @@ xfs_alloc_fix_freelist(
>  		if (error)
>  			goto out_agbp_relse;
>  
> -		error = xfs_free_agfl_block(tp, args->agno, bno, agbp,
> -					    &targs.oinfo);
> -		if (error)
> -			goto out_agbp_relse;
> +		/* defer agfl frees if dfops is provided */
> +		if (tp->t_agfl_dfops) {
> +			xfs_defer_agfl_block(mp, tp->t_agfl_dfops, args->agno,
> +					     bno, &targs.oinfo);
> +		} else {
> +			error = xfs_free_agfl_block(tp, args->agno, bno, agbp,
> +						    &targs.oinfo);
> +			if (error)
> +				goto out_agbp_relse;
> +		}
>  	}
>  
>  	targs.tp = tp;
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 045beacdd37d..e70725ba1f5f 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -55,6 +55,7 @@ enum xfs_defer_ops_type {
>  	XFS_DEFER_OPS_TYPE_REFCOUNT,
>  	XFS_DEFER_OPS_TYPE_RMAP,
>  	XFS_DEFER_OPS_TYPE_FREE,
> +	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>  	XFS_DEFER_OPS_TYPE_MAX,
>  };
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 8955254b900e..00e6aaea6565 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -2433,6 +2433,8 @@ DEFINE_DEFER_PENDING_EVENT(xfs_defer_pending_abort);
>  #define DEFINE_BMAP_FREE_DEFERRED_EVENT DEFINE_PHYS_EXTENT_DEFERRED_EVENT
>  DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_bmap_free_defer);
>  DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_bmap_free_deferred);
> +DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_agfl_free_defer);
> +DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_agfl_free_deferred);
>  
>  /* rmap tracepoints */
>  DECLARE_EVENT_CLASS(xfs_rmap_class,
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index d6d8f9d129a7..06adb1a3e31f 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -31,6 +31,7 @@
>  #include "xfs_log.h"
>  #include "xfs_trace.h"
>  #include "xfs_error.h"
> +#include "xfs_defer.h"
>  
>  kmem_zone_t	*xfs_trans_zone;
>  kmem_zone_t	*xfs_log_item_desc_zone;
> @@ -94,11 +95,11 @@ xfs_trans_free(
>   * blocks.  Locks and log items, however, are no inherited.  They must
>   * be added to the new transaction explicitly.
>   */
> -STATIC xfs_trans_t *
> +STATIC struct xfs_trans *
>  xfs_trans_dup(
> -	xfs_trans_t	*tp)
> +	struct xfs_trans	*tp)
>  {
> -	xfs_trans_t	*ntp;
> +	struct xfs_trans	*ntp;
>  
>  	ntp = kmem_zone_zalloc(xfs_trans_zone, KM_SLEEP);
>  
> @@ -127,6 +128,7 @@ xfs_trans_dup(
>  	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
>  	tp->t_rtx_res = tp->t_rtx_res_used;
>  	ntp->t_pflags = tp->t_pflags;
> +	ntp->t_agfl_dfops = tp->t_agfl_dfops;
>  
>  	xfs_trans_dup_dqinfo(tp, ntp);
>  
> @@ -936,6 +938,9 @@ __xfs_trans_commit(
>  	int			error = 0;
>  	int			sync = tp->t_flags & XFS_TRANS_SYNC;
>  
> +	ASSERT(!tp->t_agfl_dfops ||
> +	       !xfs_defer_has_unfinished_work(tp->t_agfl_dfops) || regrant);
> +
>  	/*
>  	 * If there is nothing to be logged by the transaction,
>  	 * then unlock all of the items associated with the
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 9d542dfe0052..834388c2c9de 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -111,6 +111,7 @@ typedef struct xfs_trans {
>  	struct xlog_ticket	*t_ticket;	/* log mgr ticket */
>  	struct xfs_mount	*t_mountp;	/* ptr to fs mount struct */
>  	struct xfs_dquot_acct   *t_dqinfo;	/* acctg info for dquots */
> +	struct xfs_defer_ops	*t_agfl_dfops;	/* optional agfl fixup dfops */
>  	unsigned int		t_flags;	/* misc flags */
>  	int64_t			t_icount_delta;	/* superblock icount change */
>  	int64_t			t_ifree_delta;	/* superblock ifree change */
> diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
> index ab438647592a..f5620796ae25 100644
> --- a/fs/xfs/xfs_trans_extfree.c
> +++ b/fs/xfs/xfs_trans_extfree.c
> @@ -231,9 +231,79 @@ static const struct xfs_defer_op_type xfs_extent_free_defer_type = {
>  	.cancel_item	= xfs_extent_free_cancel_item,
>  };
>  
> +/*
> + * AGFL blocks are accounted differently in the reserve pools and are not
> + * inserted into the busy extent list.
> + */
> +STATIC int
> +xfs_agfl_free_finish_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_defer_ops		*dop,
> +	struct list_head		*item,
> +	void				*done_item,
> +	void				**state)
> +{
> +	struct xfs_mount		*mp = tp->t_mountp;
> +	struct xfs_efd_log_item		*efdp = done_item;
> +	struct xfs_extent_free_item	*free;
> +	struct xfs_extent		*extp;
> +	struct xfs_buf			*agbp;
> +	int				error;
> +	xfs_agnumber_t			agno;
> +	xfs_agblock_t			agbno;
> +	uint				next_extent;
> +
> +	free = container_of(item, struct xfs_extent_free_item, xefi_list);
> +	ASSERT(free->xefi_blockcount == 1);
> +	agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock);
> +	agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock);
> +
> +	trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, free->xefi_blockcount);
> +
> +	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
> +	if (!error)
> +		error = xfs_free_agfl_block(tp, agno, agbno, agbp,
> +					    &free->xefi_oinfo);
> +
> +	/*
> +	 * Mark the transaction dirty, even on error. This ensures the
> +	 * transaction is aborted, which:
> +	 *
> +	 * 1.) releases the EFI and frees the EFD
> +	 * 2.) shuts down the filesystem
> +	 */
> +	tp->t_flags |= XFS_TRANS_DIRTY;
> +	efdp->efd_item.li_desc->lid_flags |= XFS_LID_DIRTY;

FWIW I think this becomes:

	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);

after Dave's series to remove log item descriptors, afaict.
Will fix it up when I pull that into the branch.

--D

> +
> +	next_extent = efdp->efd_next_extent;
> +	ASSERT(next_extent < efdp->efd_format.efd_nextents);
> +	extp = &(efdp->efd_format.efd_extents[next_extent]);
> +	extp->ext_start = free->xefi_startblock;
> +	extp->ext_len = free->xefi_blockcount;
> +	efdp->efd_next_extent++;
> +
> +	kmem_free(free);
> +	return error;
> +}
> +
> +
> +/* sub-type with special handling for AGFL deferred frees */
> +static const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
> +	.type		= XFS_DEFER_OPS_TYPE_AGFL_FREE,
> +	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
> +	.diff_items	= xfs_extent_free_diff_items,
> +	.create_intent	= xfs_extent_free_create_intent,
> +	.abort_intent	= xfs_extent_free_abort_intent,
> +	.log_item	= xfs_extent_free_log_item,
> +	.create_done	= xfs_extent_free_create_done,
> +	.finish_item	= xfs_agfl_free_finish_item,
> +	.cancel_item	= xfs_extent_free_cancel_item,
> +};
> +
>  /* Register the deferred op type. */
>  void
>  xfs_extent_free_init_defer_op(void)
>  {
>  	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
> +	xfs_defer_init_op_type(&xfs_agfl_free_defer_type);
>  }
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux