Re: [PATCH 031/119] xfs: rmap btree requires more reserved free space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 16, 2016 at 06:21:11PM -0700, Darrick J. Wong wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> The rmap btree is allocated from the AGFL, which means we have to
> ensure ENOSPC is reported to userspace before we run out of free
> space in each AG. The last allocation in an AG can cause a full
> height rmap btree split, and that means we have to reserve at least
> this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> Update the various space calculation functiosn to handle this.

				       functions

> 
> Also, because the macros are now executing conditional code and are called quite
> frequently, convert them to functions that initialise varaibles in the struct
> xfs_mount, use the new variables everywhere and document the calculations
> better.
> 
> v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
> for the rmapbt (due to the + 1); the entire clause needs to be gated
> on the feature flag.
> 
> v3: Use m_rmap_maxlevels to determine min_free.
> 
> [darrick.wong@xxxxxxxxxx: don't reserve blocks if !rmap]
> [dchinner@xxxxxxxxxx: update m_ag_max_usable after growfs]
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |   71 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_alloc.h |   41 +++-----------------------
>  fs/xfs/libxfs/xfs_bmap.c  |    2 +
>  fs/xfs/libxfs/xfs_sb.c    |    2 +
>  fs/xfs/xfs_discard.c      |    2 +
>  fs/xfs/xfs_fsops.c        |    5 ++-
>  fs/xfs/xfs_log_recover.c  |    1 +
>  fs/xfs/xfs_mount.c        |    2 +
>  fs/xfs/xfs_mount.h        |    2 +
>  fs/xfs/xfs_super.c        |    2 +
>  10 files changed, 88 insertions(+), 42 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 570ca17..4c8ffd4 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -63,6 +63,72 @@ xfs_prealloc_blocks(
>  }
>  
>  /*
> + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> + * AGF buffer (PV 947395), we place constraints on the relationship among
> + * actual allocations for data blocks, freelist blocks, and potential file data
> + * bmap btree blocks. However, these restrictions may result in no actual space
> + * allocated for a delayed extent, for example, a data block in a certain AG is
> + * allocated but there is no additional block for the additional bmap btree
> + * block due to a split of the bmap btree of the file. The result of this may
> + * lead to an infinite loop when the file gets flushed to disk and all delayed
> + * extents need to be actually allocated. To get around this, we explicitly set
> + * aside a few blocks which will not be reserved in delayed allocation.
> + *
> + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> + * rmap btrees.
> + *

That's a bit wordy.

> + * When rmap btrees are active, we have to consider that using the last block
> + * in the AG can cause a full height rmap btree split and we need enough blocks
> + * on the AGFL to be able to handle this. That means we have, in addition to
> + * the above consideration, another (2 * mp->m_rmap_levels) - 1 blocks required
> + * to be available to the free list.

I'm probably missing something, but why does a full tree split require 2
blocks per-level (minus 1)? Wouldn't that involve an allocated block per
level (and possibly a new root block)?

Otherwise, the rest looks good to me.

Brian

> + */
> +unsigned int
> +xfs_alloc_set_aside(
> +	struct xfs_mount *mp)
> +{
> +	unsigned int	blocks;
> +
> +	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return blocks;
> +	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_rmap_maxlevels) - 1);
> +}
> +
> +/*
> + * When deciding how much space to allocate out of an AG, we limit the
> + * allocation maximum size to the size the AG. However, we cannot use all the
> + * blocks in the AG - some are permanently used by metadata. These
> + * blocks are generally:
> + *	- the AG superblock, AGF, AGI and AGFL
> + *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
> + *	  the AGI free inode and rmap btree root blocks.
> + *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
> + *
> + * The AG headers are sector sized, so the amount of space they take up is
> + * dependent on filesystem geometry. The others are all single blocks.
> + */
> +unsigned int
> +xfs_alloc_ag_max_usable(struct xfs_mount *mp)
> +{
> +	unsigned int	blocks;
> +
> +	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
> +	blocks += XFS_ALLOC_AGFL_RESERVE;
> +	blocks += 3;			/* AGF, AGI btree root blocks */
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		blocks++;		/* finobt root block */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		/* rmap root block + full tree split on full AG */
> +		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
> +	}
> +
> +	return mp->m_sb.sb_agblocks - blocks;
> +}
> +
> +/*
>   * Lookup the record equal to [bno, len] in the btree given by cur.
>   */
>  STATIC int				/* error */
> @@ -1904,6 +1970,11 @@ xfs_alloc_min_freelist(
>  	/* space needed by-size freespace btree */
>  	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
>  				       mp->m_ag_maxlevels);
> +	/* space needed reverse mapping used space btree */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		min_free += min_t(unsigned int,
> +				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
> +				  mp->m_rmap_maxlevels);
>  
>  	return min_free;
>  }
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 0721a48..7b6c66b 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
>  #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
>  
>  /*
> - * In order to avoid ENOSPC-related deadlock caused by
> - * out-of-order locking of AGF buffer (PV 947395), we place
> - * constraints on the relationship among actual allocations for
> - * data blocks, freelist blocks, and potential file data bmap
> - * btree blocks. However, these restrictions may result in no
> - * actual space allocated for a delayed extent, for example, a data
> - * block in a certain AG is allocated but there is no additional
> - * block for the additional bmap btree block due to a split of the
> - * bmap btree of the file. The result of this may lead to an
> - * infinite loop in xfssyncd when the file gets flushed to disk and
> - * all delayed extents need to be actually allocated. To get around
> - * this, we explicitly set aside a few blocks which will not be
> - * reserved in delayed allocation. Considering the minimum number of
> - * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
> - * btree requires 1 fsb, so we set the number of set-aside blocks
> - * to 4 + 4*agcount.
> - */
> -#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
> -
> -/*
> - * When deciding how much space to allocate out of an AG, we limit the
> - * allocation maximum size to the size the AG. However, we cannot use all the
> - * blocks in the AG - some are permanently used by metadata. These
> - * blocks are generally:
> - *	- the AG superblock, AGF, AGI and AGFL
> - *	- the AGF (bno and cnt) and AGI btree root blocks
> - *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
> - *
> - * The AG headers are sector sized, so the amount of space they take up is
> - * dependent on filesystem geometry. The others are all single blocks.
> - */
> -#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
> -	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
> -
> -
> -/*
>   * Argument structure for xfs_alloc routines.
>   * This is turned into a structure to avoid having 20 arguments passed
>   * down several levels of the stack.
> @@ -133,6 +97,11 @@ typedef struct xfs_alloc_arg {
>  #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
>  #define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
>  
> +/* freespace limit calculations */
> +#define XFS_ALLOC_AGFL_RESERVE	4
> +unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
> +unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
> +
>  xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
>  		struct xfs_perag *pag, xfs_extlen_t need);
>  unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 2c28f2a..61c0231 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3672,7 +3672,7 @@ xfs_bmap_btalloc(
>  	args.fsbno = ap->blkno;
>  
>  	/* Trim the allocation back to the maximum an AG can fit. */
> -	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
> +	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
>  	args.firstblock = *ap->firstblock;
>  	blen = 0;
>  	if (nullfb) {
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index f86226b..59c9f59 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -749,6 +749,8 @@ xfs_sb_mount_common(
>  		mp->m_ialloc_min_blks = sbp->sb_spino_align;
>  	else
>  		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
> +	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
> +	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index 272c3f8..4ff499a 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -179,7 +179,7 @@ xfs_ioc_trim(
>  	 * matter as trimming blocks is an advisory interface.
>  	 */
>  	if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
> -	    range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
> +	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
>  	    range.len < mp->m_sb.sb_blocksize)
>  		return -EINVAL;
>  
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8a85e49..3772f6c 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -583,6 +583,7 @@ xfs_growfs_data_private(
>  	} else
>  		mp->m_maxicount = 0;
>  	xfs_set_low_space_thresholds(mp);
> +	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
>  
>  	/* update secondary superblocks. */
>  	for (agno = 1; agno < nagcount; agno++) {
> @@ -720,7 +721,7 @@ xfs_fs_counts(
>  	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
>  	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
>  	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
> -							XFS_ALLOC_SET_ASIDE(mp);
> +						mp->m_alloc_set_aside;
>  
>  	spin_lock(&mp->m_sb_lock);
>  	cnt->freertx = mp->m_sb.sb_frextents;
> @@ -793,7 +794,7 @@ retry:
>  		__int64_t	free;
>  
>  		free = percpu_counter_sum(&mp->m_fdblocks) -
> -							XFS_ALLOC_SET_ASIDE(mp);
> +						mp->m_alloc_set_aside;
>  		if (!free)
>  			goto out; /* ENOSPC and fdblks_delta = 0 */
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 0c41bd2..b33187b 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -5027,6 +5027,7 @@ xlog_do_recover(
>  		xfs_warn(mp, "Failed post-recovery per-ag init: %d", error);
>  		return error;
>  	}
> +	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
>  
>  	xlog_recover_check_summary(log);
>  
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 8af1c88..879f3ef 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1219,7 +1219,7 @@ xfs_mod_fdblocks(
>  		batch = XFS_FDBLOCKS_BATCH;
>  
>  	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
> -	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
> +	if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
>  				     XFS_FDBLOCKS_BATCH) >= 0) {
>  		/* we had space! */
>  		return 0;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 0ed0f29..b36676c 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -123,6 +123,8 @@ typedef struct xfs_mount {
>  	uint			m_in_maxlevels;	/* max inobt btree levels. */
>  	uint			m_rmap_maxlevels; /* max rmap btree levels */
>  	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
> +	uint			m_alloc_set_aside; /* space we can't use */
> +	uint			m_ag_max_usable; /* max space per AG */
>  	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
>  	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
>  	struct mutex		m_growlock;	/* growfs mutex */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index bf63f6d..1575849 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1076,7 +1076,7 @@ xfs_fs_statfs(
>  	statp->f_blocks = sbp->sb_dblocks - lsize;
>  	spin_unlock(&mp->m_sb_lock);
>  
> -	statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
> +	statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
>  	statp->f_bavail = statp->f_bfree;
>  
>  	fakeinos = statp->f_bfree << sbp->sb_inopblog;
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux