Re: [PATCH 04/11] xfs: CoW fork operations should only update quota reservations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 25, 2018 at 08:01:24AM -0500, Brian Foster wrote:
> On Wed, Jan 24, 2018 at 11:14:25AM -0800, Darrick J. Wong wrote:
> > On Wed, Jan 24, 2018 at 09:22:16AM -0500, Brian Foster wrote:
> > > On Tue, Jan 23, 2018 at 06:18:23PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > 
> > > > Since the CoW fork only exists in memory, it is incorrect to update the
> > > > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > > > fork, even real extents in the CoW fork are only reservations (on-disk
> > > > they're owned by the refcountbt) so they must not be tracked in the on
> > > > disk quota info.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_bmap.c |  203 ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  fs/xfs/xfs_reflink.c     |    8 +-
> > > >  2 files changed, 196 insertions(+), 15 deletions(-)
> > > > 
> > > > 
> ...
> > > > + * - sb_fdblocks: Number of free blocks recorded in the superblock on disk.
> > > > + * - fdblocks: Number of free blocks recorded in the superblock minus any
> > > > + *           in-core reservations made in anticipation of future writes.
> > > > + *
> > > > + * - t_blk_res: Number of blocks reserved out of fdblocks for a transaction.
> > > > + *           When the transaction commits, t_blk_res - t_blk_res_used is given
> > > > + *           back to fdblocks.
> > > > + * - t_blk_res_used: Number of blocks used by this transaction that were
> > > > + *           reserved for this transaction.
> > > > + * - t_fdblocks_del: Number of blocks by which fdblocks and sb_fdblocks will
> > > > + *           have to decrease at commit.
> > > > + * - t_res_fdblocks_delta: Number of blocks by which sb_fdblocks will have to
> > > > + *           decrease at commit.  We assume that fdblocks was decreased
> > > > + *           prior to the transaction.
> > > > + *
> > > > + * Data fork block mappings have four logical states:
> > > > + *
> > > > + *    +--------> UNWRITTEN <------+
> > > > + *    |              ^            |
> > > > + *    |              v            v
> > > > + * DELALLOC <----> HOLE <------> REAL
> > > > + *    |                           ^
> > > > + *    |                           |
> > > > + *    +---------------------------+
> > > > + *
> > > 
> > > I'm not sure we need a graphic for the extent states. Non-hole
> > > conversions to delalloc is the only transition that doesn't make any
> > > sense.
> > 
> > First of all, Dave keeps asking for ASCII art 'when appropriate'. :)
> > 
> 
> I'm not against ASCII art in general...
> 
> > But on a more serious note, I thought the state diagram would be useful
> > for anyone who isn't so familiar with how blocks get mapped into files
> > in xfs, particularly to compare the data fork diagram against the
> > corresponding picture for the cow fork.
> > 
> 
> ... but TBH I didn't really notice they were different until you pointed
> it out. :/ It still seems like a rather verbose means to point out that
> COW fork apparently doesn't have the DELALLOC -> REAL transition.

It also doesn't have REAL -> UNWRITTEN transitions either.  Clearly the
diagrams aren't helping, point made. ;)

Heh, it also has HOLE -> REAL transitions even though the diagram does
not so indicate, so ... yeah.

> ...
> > 
> > > I'm also wondering how much of the whole picture really needs to be
> > > described here to cover quota accounting with respect to block mapping
> > > (sufficiently to differentiate data fork from cow fork, which I take is
> > > the purpose of this section). For example, do we really need the
> > > internal transaction details for how quota deltas are carried?
> > > 
> > > Instead, I think it might be sufficient to explain that the quota system
> > > works in two "levels" (for lack of a better term :/), one for
> > > reservation and another for real block usage. The reservation is
> > > associated with transaction reservation and/or delayed block allocation
> > > (no tx). In either case, quota reservation is converted to real quota
> > > block usage when a transaction commits that maps real/physical blocks.
> > > If the transaction held extra reservation that went unused, that quota
> > > reservation is released. The primary difference is that transactions to
> > > convert delalloc -> real do not reserve quota blocks in the first place,
> > > since that has already occurred, so they just need to make sure to
> > > convert/persist the quota res for the blocks that were converted to
> > > real.
> > 
> > I thought about cutting this whole comment down to a simple sentence
> > about how quota accounting is different between the data & cow forks (as
> > you figured out, we use delalloc quota reservations for everything in
> > the cow fork and only turn them into real ones when we go to remap) but
> > then worried that doing so would presuppose the reader knew anything
> > about how the extent lifecycles work... and that's how I end up with a
> > gigantic manual.
> > 
> 
> Which is probably fine for xfs-docs or something..
> 
> The more I think about it, the more I think this whole thing is better
> off focused on explaining the unique quota management of COW fork
> blocks. We can add additional comments about extent lifecycles, how
> quota is tracked in general, etc., but perhaps it's best to make those a
> separate patch rather than attempt to document a ground-up "COW fork
> quotas for dummies" in a single comment. :P Anyways, I'll reserve
> further judgement for the new patch..

Yeah.  I think I'll go update ch3 ("Sharing Data Blocks") in the design
document instead of dumping everything into a code comment that seems
somewhat misplaced.

Onto the next reply! :)

--D

> Brian
> 
> > > > + * Copy on Write Fork Mapping Lifecycle
> > > > + *
> > > > + * The CoW fork handles things differently from the data fork because its
> > > > + * mappings only exist in memory-- the refcount btree is the on-disk owner of
> > > > + * the extents until they're remapped into the data fork.  Therefore,
> > > > + * unwritten and real extents in the CoW fork are treated the same way as
> > > > + * delayed allocation extents.  Quota and fdblock changes only exist in
> > > > + * memory, which requires some twists in the bmap functions.
> > > > + *
> > > 
> > > Ok, but perhaps this should point out what happens when cow blocks are
> > > reserved with respect to quotas..? IIUC, a delalloc quota reservation
> > > occurs just the same as above, the blocks simply reside in another fork.
> > 
> > Yes.
> > 
> > > > + * The CoW fork extent state diagram looks like this:
> > > > + *
> > > > + *    +--------> UNWRITTEN -------+
> > > > + *    |              ^            |
> > > > + *    |              v            v
> > > > + * DELALLOC <----> HOLE <------- REAL
> > > > + *
> > > > + * Holes are still holes.  Delayed allocation extents reserve blocks for
> > > > + * landing future writes, just like they do in the data fork.  However, unlike
> > > > + * the data fork, unwritten extents signal an extent that has been allocated
> > > > + * but is not currently undergoing writeback.  Real extents are undergoing
> > > > + * writeback, and when that writeback finishes the corresponding data fork
> > > > + * extent will be punched out and the CoW fork counterpart moved to the new
> > > > + * hole in the data fork.
> > > > + *
> > > 
> > > Ok, so the difference is that for the COW fork, the _extent state_
> > > conversion is not the appropriate trigger event to convert quota
> > > reservation to real quota usage. Instead, the blocks being remapped from
> > > the COW fork to the data fork is when that should occur.
> > 
> > Yes.
> > 
> > > > + * The state transitions and required metadata updates are as follows:
> > > > + *
> > > > + * - HOLE to DELALLOC: Increase i_cow_blocks and q_res_bcount, and decrease
> > > > + *           fdblocks.
> > > > + * - HOLE to UNWRITTEN: Same as above, but since we reserved quota via
> > > > + *           qt_blk_res (which increased q_res_bcount) when we allocate the
> > > > + *           extent we have to decrease qt_blk_res so that the commit doesn't
> > > > + *           give the allocated CoW blocks back.
> > > > + *
> > > 
> > > Hmm, this is a little confusing. Looking at the code change and comment
> > > below, I think I get what this is trying to do, which is essentially
> > > make a real block cow fork alloc behave like a delalloc reservation
> > > (with respect to quota). FWIW, I think what confuses me is the assertion
> > > that the blocks would be "given back" otherwise. The only reference I
> > > have to compare is data fork alloc behavior, which implies that used
> > > reservation would not be given back, but rather converted to real quota
> > > usage on block allocation (and excess reservation would still be given
> > > back, which afaict we still want to happen). So the trickery is required
> > > to prevent conversion of quota reservation for the allocated cow blocks,
> > > let that res sit around until the cow blocks are remapped, and release
> > > unused reservation from the tx as normal. Am I following that correctly?
> > 
> > Yes.
> > 
> > > > + * - DELALLOC to UNWRITTEN: No change.
> > > > + * - DELALLOC to HOLE: Decrease i_cow_blocks and q_res_bcount, and increase
> > > > + *           fdblocks.
> > > > + *
> > > > + * - UNWRITTEN to HOLE: Same as DELALLOC to HOLE.
> > > > + * - UNWRITTEN to REAL: No change.
> > > > + *
> > > > + * - REAL to HOLE: This transition happens when we've finished a write
> > > > + *           operation and need to move the mapping to the data fork.  We
> > > > + *           punch the correspond data fork mappings, which decreases
> > > > + *           qt_bcount.  Then we map the CoW fork mapping into the hole we
> > > > + *           just cleared out of the data fork, which increases qt_bcount.
> > > > + *           There's a subtlety here -- if we promoted a write over a hole to
> > > > + *           CoW, there will be a net increase in qt_bcount, which is fine
> > > > + *           because we already reserved the quota when we filled the CoW
> > > > + *           fork.  Finally, we punch the CoW fork mapping, which decreases
> > > > + *           q_res_bcount.
> > > > + *
> > > > + * Notice how all CoW fork extents use transactionless quota reservations and
> > > > + * the in-core fdblocks to maintain state, and we avoid updating any on-disk
> > > > + * metadata.  This is essential to maintain metadata correctness if the system
> > > > + * goes down.
> > > > + */
> > > >  
> > > >  kmem_zone_t		*xfs_bmap_free_item_zone;
> > > >  
> > > > @@ -3337,6 +3476,39 @@ xfs_bmap_btalloc_filestreams(
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +/* Deal with CoW fork accounting when we allocate a block. */
> > > > +static void
> > > > +xfs_bmap_btalloc_cow(
> > > > +	struct xfs_bmalloca	*ap,
> > > > +	struct xfs_alloc_arg	*args)
> > > > +{
> > > > +	/* Filling a previously reserved extent; nothing to do here. */
> > > > +	if (ap->wasdel)
> > > > +		return;
> > > > +
> > > > +	/*
> > > > +	 * The CoW fork only exists in memory, so the on-disk quota accounting
> > > > +	 * must not incude any CoW fork extents.  Therefore, CoW blocks are
> > > > +	 * only tracked in the in-core dquot block count (q_res_bcount).
> > > > +	 *
> > > > +	 * If we get here, we're filling a CoW hole with a real (non-delalloc)
> > > > +	 * CoW extent having reserved enough blocks from both q_res_bcount and
> > > > +	 * qt_blk_res to guarantee that we won't run out of space.  The unused
> > > > +	 * qt_blk_res is given back to q_res_bcount when the transaction
> > > > +	 * commits.
> > > > +	 *
> > > > +	 * We don't want the quota accounting for our newly allocated blocks
> > > > +	 * to be given back, so we must decrease qt_blk_res without decreasing
> > > > +	 * q_res_bcount.
> > > > +	 *
> > > > +	 * Note: If we're allocating a delalloc extent, we already reserved
> > > > +	 * the q_res_bcount blocks, so no quota accounting update is needed
> > > > +	 * here.
> > > > +	 */
> > > > +	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> > > > +			-(long)args->len);
> > > > +}
> > > 
> > > Factoring nit.. if we're going to refactor bits of xfs_bmap_btalloc()
> > > out, it might be cleaner to factor out all of the quota logic rather
> > > than just the cow bits (which is basically just a simple check and
> > > function call). E.g., refactor into an xfs_bmap_btalloc_quota() helper
> > > that does the right thing based on the fork, with comments as to why,
> > > etc. (and perhaps just leave the unrelated di_nblocks change behind).
> > 
> > I thought about factoring the data/attr fork stuff into its own
> > xfs_bmap_btalloc_quota() function too, since this function is already
> > eyewateringly long.  I think I'll do that as a separate refactor at the
> > end of the series, though...
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > +
> > > >  STATIC int
> > > >  xfs_bmap_btalloc(
> > > >  	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
> > > > @@ -3571,19 +3743,22 @@ xfs_bmap_btalloc(
> > > >  			*ap->firstblock = args.fsbno;
> > > >  		ASSERT(nullfb || fb_agno <= args.agno);
> > > >  		ap->length = args.len;
> > > > -		if (!(ap->flags & XFS_BMAPI_COWFORK))
> > > > -			ap->ip->i_d.di_nblocks += args.len;
> > > > -		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> > > >  		if (ap->wasdel)
> > > >  			ap->ip->i_delayed_blks -= args.len;
> > > > -		/*
> > > > -		 * Adjust the disk quota also. This was reserved
> > > > -		 * earlier.
> > > > -		 */
> > > > -		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> > > > -			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> > > > -					XFS_TRANS_DQ_BCOUNT,
> > > > -			(long) args.len);
> > > > +		if (ap->flags & XFS_BMAPI_COWFORK) {
> > > > +			xfs_bmap_btalloc_cow(ap, &args);
> > > > +		} else {
> > > > +			ap->ip->i_d.di_nblocks += args.len;
> > > > +			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> > > > +			/*
> > > > +			 * Adjust the disk quota also. This was reserved
> > > > +			 * earlier.
> > > > +			 */
> > > > +			xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> > > > +				ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> > > > +						XFS_TRANS_DQ_BCOUNT,
> > > > +				(long) args.len);
> > > > +		}
> > > >  	} else {
> > > >  		ap->blkno = NULLFSBLOCK;
> > > >  		ap->length = 0;
> > > > @@ -4776,6 +4951,7 @@ xfs_bmap_del_extent_cow(
> > > >  	struct xfs_bmbt_irec	new;
> > > >  	xfs_fileoff_t		del_endoff, got_endoff;
> > > >  	int			state = BMAP_COWFORK;
> > > > +	int			error;
> > > >  
> > > >  	XFS_STATS_INC(mp, xs_del_exlist);
> > > >  
> > > > @@ -4832,6 +5008,11 @@ xfs_bmap_del_extent_cow(
> > > >  		xfs_iext_insert(ip, icur, &new, state);
> > > >  		break;
> > > >  	}
> > > > +
> > > > +	/* Remove the quota reservation */
> > > > +	error = xfs_trans_reserve_quota_nblks(NULL, ip,
> > > > +			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
> > > > +	ASSERT(error == 0);
> > > >  }
> > > >  
> > > >  /*
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index 82abff6..e367351 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> > > >  					del.br_startblock, del.br_blockcount,
> > > >  					NULL);
> > > >  
> > > > -			/* Update quota accounting */
> > > > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > -					-(long)del.br_blockcount);
> > > > -
> > > >  			/* Roll the transaction */
> > > >  			xfs_defer_ijoin(&dfops, ip);
> > > >  			error = xfs_defer_finish(tpp, &dfops);
> > > > @@ -795,6 +791,10 @@ xfs_reflink_end_cow(
> > > >  		if (error)
> > > >  			goto out_defer;
> > > >  
> > > > +		/* Charge this new data fork mapping to the on-disk quota. */
> > > > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > +				(long)del.br_blockcount);
> > > > +
> > > >  		/* Remove the mapping from the CoW fork. */
> > > >  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > > >  
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux