Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> > Hi Brian,
> > 
> > On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
> > > On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> > > > As the first step of shrinking, this attempts to enable shrinking
> > > > unused space in the last allocation group by fixing up freespace
> > > > btree, agi, agf and adjusting super block and use a helper
> > > > xfs_ag_shrink_space() to fixup the last AG.
> > > > 
> > > > This can be all done in one transaction for now, so I think no
> > > > additional protection is needed.
> > > > 
> > > > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
> > > >  fs/xfs/xfs_trans.c |  1 -
> > > >  2 files changed, 42 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > > index 6c4ab5e31054..4bcea22f7b3f 100644
> > > > --- a/fs/xfs/xfs_fsops.c
> > > > +++ b/fs/xfs/xfs_fsops.c
> > > > @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
> > > >  	struct aghdr_init_data	*id,
> > > >  	xfs_agnumber_t		oagcount,
> > > >  	xfs_agnumber_t		nagcount,
> > > > -	xfs_rfsblock_t		*delta)
> > > > +	int64_t			*delta)
> > > >  {
> > > >  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> > > >  	int			error;
> > > > @@ -76,33 +76,41 @@ xfs_growfs_data_private(
> > > >  	xfs_agnumber_t		nagcount;
> > > >  	xfs_agnumber_t		nagimax = 0;
> > > >  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> > > > -	xfs_rfsblock_t		delta;
> > > > +	int64_t			delta;
> > > >  	xfs_agnumber_t		oagcount;
> > > >  	struct xfs_trans	*tp;
> > > > +	bool			extend;
> > > >  	struct aghdr_init_data	id = {};
> > > >  
> > > >  	nb = in->newblocks;
> > > > -	if (nb < mp->m_sb.sb_dblocks)
> > > > -		return -EINVAL;
> > > > -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> > > > +	if (nb == mp->m_sb.sb_dblocks)
> > > > +		return 0;
> > > > +
> > > > +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> > > > +	if (error)
> > > >  		return error;
> > > > -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > > +
> > > > +	if (nb > mp->m_sb.sb_dblocks) {
> > > > +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > >  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
> > > >  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> > > > -	if (error)
> > > > -		return error;
> > > > -	xfs_buf_relse(bp);
> > > > +		if (error)
> > > > +			return error;
> > > > +		xfs_buf_relse(bp);
> > > > +	}
> > > >  
> > > >  	nb_div = nb;
> > > >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> > > >  	nagcount = nb_div + (nb_mod != 0);
> > > >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> > > >  		nagcount--;
> > > > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > > -		if (nb < mp->m_sb.sb_dblocks)
> > > > +		if (nagcount < 2)
> > > >  			return -EINVAL;
> > > 
> > > What's the reason for the nagcount < 2 check? IIRC we warn about this
> > > configuration at mkfs time, but allow it to proceed. Is it just that we
> > > don't want to accidentally put the fs into an agcount == 1 state that
> > > was originally formatted with >1 AGs?
> > 
> > Darrick once asked for avoiding shrinking the filesystem which has
> > only 1 AG.
> 
> It's worth mentioning why in a comment though:
> 
> 	/*
> 	 * XFS doesn't really support single-AG filesystems, so do not
> 	 * permit callers to remove the filesystem's second and last AG.
> 	 */
> 	if (shrink && new_agcount < 2)
> 		return -EHAHANOYOUDONT;
> 
> But as Brian points out, we /do/ allow adding a second AG to a single-AG
> fs.
> 
> > > 
> > > What about the case where we attempt to grow an agcount == 1 fs but
> > > don't enlarge enough to add the second AG? Does this change error
> > > behavior in that case?
> > 
> > Yeah, thanks for catching this! If growfs allows 1 AG case before,
> > I think it needs to be refined. Let me update this in the next version!
> > 
> > > 
> > > > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > >  	}
> > > > +
> > > >  	delta = nb - mp->m_sb.sb_dblocks;
> > > > +	extend = (delta > 0);
> > > >  	oagcount = mp->m_sb.sb_agcount;
> > > >  
> > > >  	/* allocate the new per-ag structures */
> > > > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> > > >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> > > >  		if (error)
> > > >  			return error;
> > > > +	} else if (nagcount < oagcount) {
> > > > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > > > +		return -EINVAL;
> > > >  	}
> > > >  
> > > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > > > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > > > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > > > +			XFS_TRANS_RESERVE, &tp);
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > > > -	if (error)
> > > > -		goto out_trans_cancel;
> > > > -
> > > > +	if (extend) {
> > > > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > > > +						  nagcount, &delta);
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > > +	}
> > > >  	xfs_trans_agblocks_delta(tp, id.nfree);
> > > 
> > > It looks like id isn't used until the resize call above. Is this call
> > > relevant for the shrink case?
> > 
> > I think it has nothing to do for the shrink the last AG case as well
> > (id.nfree == 0 here) but maybe use for the later shrinking the whole
> > AGs patchset. I can move into if (extend) in the next version.
> > 
> > > 
> > > >  
> > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > +	/* If there are some blocks in the last AG, resize it. */
> > > >  	if (delta) {
> > > 
> > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > of the function. Should we ever get to this point with delta == 0? (If
> > > not, maybe convert it to an assert just to be safe.)
> > 
> > delta would be changed after xfs_resizefs_init_new_ags() (the original
> > growfs design is that, I don't want to touch the original logic). that
> > is why `delta' reflects the last AG delta now...
> 
> I've never liked how the meaning of "delta" changes through the
> function, and it clearly trips up reviewers.  This variable isn't the
> delta between the old dblocks and the new dblocks, it's really a
> resizefs cursor that tells us how much work we still have to do.
> 
> > > 
> > > > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		if (extend) {
> > > > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		} else {
> > > > +			id.agno = nagcount - 1;
> > > > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> > > 
> > > xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> > > pass in agno for now..?
> > 
> > Both way are ok, yet in my incomplete shrink whole empty AGs patchset,
> > it seems more natural to pass in &id rather than agno (since
> > id.agno = nagcount - 1 will be stayed in some new helper
> > e.g. xfs_shrink_ags())
> 
> @id is struct aghdr_init_data, but shrinking shouldn't initialize any AG
> headers.  Are you planning to make use of it in shrink, either now or
> later on?
> 
> > 
> > > 
> > > > +		}
> > > > +
> > > >  		if (error)
> > > >  			goto out_trans_cancel;
> > > >  	}
> > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > >  	 */
> > > >  	if (nagcount > oagcount)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > >  				 nb - mp->m_sb.sb_dblocks);
> > > 
> > > Maybe use delta here?
> > 
> > The reason is the same as above, `delta' here was changed due to 
> > xfs_resizefs_init_new_ags(), which is not nb - mp->m_sb.sb_dblocks
> > anymore. so `extend` boolean is used (rather than just use delta > 0)
> 
> Long question:
> 
> The reason why we use (nb - dblocks) is because growfs is an all or
> nothing operation -- either we succeed in writing new empty AGs and
> inflating the (former) last AG of the fs, or we don't do anything at
> all.  We don't allow partial growing; if we did, then delta would be
> relevant here.  I think we get away with not needing to run transactions
> for each AG because those new AGs are inaccessible until we commit the
> new agcount/dblocks, right?
> 
> In your design for the fs shrinker, do you anticipate being able to
> eliminate all the eligible AGs in a single transaction?  Or do you
> envision only tackling one AG at a time?  And can we be partially
> successful with a shrink?  e.g. we succeed at eliminating the last AG,
> but then the one before that isn't empty and so we bail out, but by that
> point we did actually make the fs a little bit smaller.
> 
> There's this comment at the bottom of xfs_growfs_data() that says that
> we can return error codes if the secondary sb update fails, even if the
> new size is already live.  This convinces me that it's always been the
> case that callers of the growfs ioctl are supposed to re-query the fs
> geometry afterwards to find out if the fs size changed, even if the
> ioctl itself returns an error... which implies that partial grow/shrink
> are a possibility.

And of course I got so buried in building up to my long question that I
forgot to ask it:

If the design of the shrinker requires incremental shrinking, should we
support incremental growfs too?

--D

> > 
> > > 
> > > >  	if (id.nfree)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > > >  
> > > 
> > > id.nfree tracks newly added free space in the growfs space. Is it not
> > > used in the shrink case because the allocation handles this for us?
> > 
> > Yeah, I'm afraid so. This is some common code, and also used in my
> > shrinking the whole AGs patchset.
> > 
> > > 
> > > >  	/*
> > > > -	 * update in-core counters now to reflect the real numbers
> > > > -	 * (especially sb_fdblocks)
> > > > +	 * update in-core counters now to reflect the real numbers (especially
> > > > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> > > >  	 */
> > > >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> > > >  		xfs_log_sb(tp);
> > > > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> > > >  	 * If we expanded the last AG, free the per-AG reservation
> > > >  	 * so we can reinitialize it with the new size.
> > > >  	 */
> > > > -	if (delta) {
> > > > +	if (extend && delta) {
> > > >  		struct xfs_perag	*pag;
> > > >  
> > > >  		pag = xfs_perag_get(mp, id.agno);
> > > 
> > > We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> > > this function. xfs_ag_shrink_space() from the previous patch is intended
> > > to deal with perag reservation changes for shrink, but it looks like the
> > > reserve call further down could potentially reset mp->m_finobt_nores to
> > > false if it previously might have been set to true.
> > 
> > Yeah, if my understanding is correct, I might need to call
> > xfs_fs_reserve_ag_blocks() only for growfs case as well for
> > mp->m_finobt_nores = true case.
> 
> I suppose it's worth trying in the finobt_nores==true case. :)
> 
> --D
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > Brian
> > > 
> > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > > index e72730f85af1..fd2cbf414b80 100644
> > > > --- a/fs/xfs/xfs_trans.c
> > > > +++ b/fs/xfs/xfs_trans.c
> > > > @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
> > > >  		tp->t_res_frextents_delta += delta;
> > > >  		break;
> > > >  	case XFS_TRANS_SB_DBLOCKS:
> > > > -		ASSERT(delta > 0);
> > > >  		tp->t_dblocks_delta += delta;
> > > >  		break;
> > > >  	case XFS_TRANS_SB_AGCOUNT:
> > > > -- 
> > > > 2.27.0
> > > > 
> > > 
> > 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux