Re: [PATCH v2 04/11] xfs: CoW fork operations should only update quota reservations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 24, 2018 at 05:20:35PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> Since the CoW fork only exists in memory, it is incorrect to update the
> on-disk quota block counts when we modify the CoW fork.  Unlike the data
> fork, even real extents in the CoW fork are only reservations (on-disk
> they're owned by the refcountbt) so they must not be tracked in the on
> disk quota info.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
> v2: make documentation more crisp and to the point
> ---
>  fs/xfs/libxfs/xfs_bmap.c |  118 ++++++++++++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_quota.h       |   14 ++++-
>  fs/xfs/xfs_reflink.c     |    8 ++-
>  3 files changed, 122 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0c9c9cd..7f0ac40 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -52,6 +52,71 @@
>  #include "xfs_refcount.h"
>  #include "xfs_icache.h"
>  
> +/*
> + * Data/Attribute Fork Mapping Lifecycle
> + *
> + * The data fork contains the block mappings between logical blocks in a file
> + * and physical blocks on the disk.  The XFS notions of delayed allocation
> + * reservations, unwritten extents, and real extents follow well known
> + * conventions in the filesystem world.
> + *
> + * Data fork extent states follow these transitions:
> + *
> + *    +--------> UNWRITTEN <------+
> + *    |              ^            |
> + *    |              v            v
> + * DELALLOC <----> HOLE <------> REAL
> + *    |                           ^
> + *    |                           |
> + *    +---------------------------+
> + *
> + * Every delayed allocation reserves in-memory quota blocks (q_res_bcount) and
> + * in-memory fs free blocks (fdblocks), and increases the in-memory per-inode
> + * i_delayed_blks.  The reservation includes potentially required bmbt blocks.
> + *

So we have some bits about extent states..

> + * Every transaction reserves quota blocks (qt_blk_res) from the in-memory
> + * quota blocks and free blocks (t_blk_res) from the in-memory fs free blocks.
> + * The transaction tracks both the number of blocks used from its own
> + * reservation as well as the number of blocks used that came from a delayed
> + * allocation.  When the transaction commits, it gives back the unused parts
> + * of its own block reservations.  Next, it adds any block usage that came
> + * from a delayed allocation to the on-disk counters without changing the
> + * in-memory reservations (q_res_bcount and fdblocks).
> + *

Some bits about quota, transactions..

> + * To convert a delayed allocation to a real or unwritten extent, we use a
> + * transaction to allocate the blocks.  At commit time, the block reservations
> + * are given back or added to the on-disk counters as noted above.
> + * i_delayed_blks is decreased while the on-disk per-inode di_nblocks is
> + * increased.
> + *
> + * The attribute fork works in the same way as the data fork except that the
> + * only valid states are REAL and HOLE.
> + *
> + * Copy on Write Fork Mapping Lifecycle
> + *
> + * The CoW fork exists only in memory and is used to stage copy writes for
> + * file data and has fewer transitions:
> + *
> + *    +--------> UNWRITTEN -------+
> + *    |              ^            |
> + *    |              v            v
> + * DELALLOC <----> HOLE <------- REAL
> + *
> + * Delayed allocation extents here are treated the same as in the data fork
> + * except that they are counted by the per-inode i_cow_blocks instead of
> + * i_delayed_blks.
> + *

COW fork extent state..

> + * Unwritten and real extents are counted by the quota code as block
> + * reservations (q_res_bcount) and not on-disk quota blocks (d_bcount), and
> + * are counted by the free block counters as in-memory reservations (fdblocks)
> + * and not on-disk free blocks (sb_fdblocks).  These blocks are also counted
> + * by i_cow_blocks and not the on-disk di_nblocks.
> + *
> + * When a CoW fork extent is remapped to the data fork, the reservations are
> + * converted into on-disk counts in the same manner as a delayed allocation
> + * conversion in the data fork.  The number of blocks being remapped is
> + * subtracted from i_cow_blocks and added to di_nblocks.
> + */
>  

COW fork quota quirkiness..

In all, I'd probably suggest to split this whole thing up into maybe 3
independent comments. If we don't have anything that explains extent
states properly, perhaps do that in the headers where we define the
extent state definitions (and then explain how they are used differently
between forks). If we don't have sufficient explanation of how quota
reservation lifecycle works, do the same in the headers that define the
data structures that map transaction deltas to actual quota accounting.

Finally, this patch already does some refactoring below to deal with the
COW fork quota accounting quirkiness, so we can use the comment in that
function to explain exactly what/why is going on here. That's just my
.02 on the whole comment.

>  kmem_zone_t		*xfs_bmap_free_item_zone;
>  
> @@ -3337,6 +3402,28 @@ xfs_bmap_btalloc_filestreams(
>  	return 0;
>  }
>  
> +/* Deal with CoW fork accounting when we allocate a block. */
> +static void
> +xfs_bmap_btalloc_quota_cow(
> +	struct xfs_bmalloca	*ap,
> +	struct xfs_alloc_arg	*args)
> +{
> +	/* Filling a previously reserved extent; nothing to do here. */
> +	if (ap->wasdel)
> +		return;
> +
> +	/*
> +	 * If we get here, we're filling a CoW hole with a real (non-delalloc)
> +	 * CoW extent having reserved enough blocks from both q_res_bcount and
> +	 * qt_blk_res to guarantee that we won't run out of space.  The unused
> +	 * qt_blk_res is given back to q_res_bcount when the transaction
> +	 * commits, so we must decrease qt_blk_res without decreasing
> +	 * q_res_bcount.
> +	 */
> +	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> +			-(long)args->len);
> +}
> +

Not sure we were on the same page wrt to my previous comment here.. I
think this should look something like:

/*
 * Update quota accounting for physical allocation. Depends on extent
 * state, target fork, etc.
 */
static void
xfs_bmap_btalloc_dquot(...)
{
	/*
	 * COW fork requires quota accounting magic words words words ..
	 */
	if (COWFORK) {
		/* do not account wasdel extents as used because ... */
		if (ap->wasdel)
			return;
		/*
		 * quota reservation exists in transaction, do magic to
		 * cause tx to leave a delalloc-like reservation after
		 * the transaction commits because cow fork words words
		 * words ...
		 */
		xfs_trans_mod_dquot_byino(..)
	} else {
		/* adjust disk quota ... */
		xfs_trans_mod_dquot_byino(...)
	}
}

So we don't have to scroll up and down the source file to understand
how/why we process quotas differently between the cow and other forks.

>  STATIC int
>  xfs_bmap_btalloc(
>  	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
> @@ -3571,19 +3658,22 @@ xfs_bmap_btalloc(
>  			*ap->firstblock = args.fsbno;
>  		ASSERT(nullfb || fb_agno <= args.agno);
>  		ap->length = args.len;
> -		if (!(ap->flags & XFS_BMAPI_COWFORK))
> -			ap->ip->i_d.di_nblocks += args.len;
> -		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);

... and I'd just leave this here.

(Or alternately rename the helper function to something like
xfs_bmap_btalloc_fork() and put the entire if/else logic in there.)

>  		if (ap->wasdel)
>  			ap->ip->i_delayed_blks -= args.len;
> -		/*
> -		 * Adjust the disk quota also. This was reserved
> -		 * earlier.
> -		 */
> -		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> -			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> -					XFS_TRANS_DQ_BCOUNT,
> -			(long) args.len);
> +		if (ap->flags & XFS_BMAPI_COWFORK) {
> +			xfs_bmap_btalloc_quota_cow(ap, &args);
> +		} else {
> +			ap->ip->i_d.di_nblocks += args.len;
> +			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> +			/*
> +			 * Adjust the disk quota also. This was reserved
> +			 * earlier.
> +			 */
> +			xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> +				ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> +						XFS_TRANS_DQ_BCOUNT,
> +				(long) args.len);
> +		}

And this just calls the helper above.

>  	} else {
>  		ap->blkno = NULLFSBLOCK;
>  		ap->length = 0;
> @@ -4760,6 +4850,7 @@ xfs_bmap_del_extent_cow(
>  	struct xfs_bmbt_irec	new;
>  	xfs_fileoff_t		del_endoff, got_endoff;
>  	int			state = BMAP_COWFORK;
> +	int			error;
>  
>  	XFS_STATS_INC(mp, xs_del_exlist);
>  
> @@ -4816,6 +4907,11 @@ xfs_bmap_del_extent_cow(
>  		xfs_iext_insert(ip, icur, &new, state);
>  		break;
>  	}
> +
> +	/* Remove the quota reservation */
> +	error = xfs_trans_reserve_quota_nblks(NULL, ip,
> +			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
> +	ASSERT(error == 0);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
> index ce6506a..34b4ec2 100644
> --- a/fs/xfs/xfs_quota.h
> +++ b/fs/xfs/xfs_quota.h
> @@ -54,11 +54,19 @@ struct xfs_trans;
>   */
>  typedef struct xfs_dqtrx {
>  	struct xfs_dquot *qt_dquot;	  /* the dquot this refers to */
> -	ulong		qt_blk_res;	  /* blks reserved on a dquot */
> +
> +	/* dquot bcount blks reserved for this transaction */
> +	ulong		qt_blk_res;
> +
>  	ulong		qt_ino_res;	  /* inode reserved on a dquot */
>  	ulong		qt_ino_res_used;  /* inodes used from the reservation */
> -	long		qt_bcount_delta;  /* dquot blk count changes */
> -	long		qt_delbcnt_delta; /* delayed dquot blk count changes */
> +
> +	/* dquot block count changes taken from qt_blk_res */
> +	long		qt_bcount_delta;
> +
> +	/* dquot block count changes taken from delalloc reservation */
> +	long		qt_delbcnt_delta;
> +

Separate patch.

>  	long		qt_icount_delta;  /* dquot inode count changes */
>  	ulong		qt_rtblk_res;	  /* # blks reserved on a dquot */
>  	ulong		qt_rtblk_res_used;/* # blks used from reservation */
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 82abff6..e367351 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
>  					del.br_startblock, del.br_blockcount,
>  					NULL);
>  
> -			/* Update quota accounting */
> -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> -					-(long)del.br_blockcount);
> -
>  			/* Roll the transaction */
>  			xfs_defer_ijoin(&dfops, ip);
>  			error = xfs_defer_finish(tpp, &dfops);
> @@ -795,6 +791,10 @@ xfs_reflink_end_cow(
>  		if (error)
>  			goto out_defer;
>  
> +		/* Charge this new data fork mapping to the on-disk quota. */
> +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
> +				(long)del.br_blockcount);
> +

Should this technically be XFS_TRANS_DQ_DELBCOUNT? The blocks obviously
aren't delalloc and this transaction doesn't make a quota reservation so
I don't think it screws up accounting. But if the transaction did make a
quota reservation, it seems like this would account the extent against
the tx reservation where it instead should recognize that cow blocks
have already been reserved (which is essentially what DELBCOUNT means,
IIUC).

Other than that the code seems Ok to me.

Brian

>  		/* Remove the mapping from the CoW fork. */
>  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux