Re: [PATCH 3/4] xfs: add quota-driven speculative preallocation throttling

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 14 Dec 2012 11:13:00 -0500

On 12/12/2012 09:25 PM, Dave Chinner wrote:
> On Wed, Dec 05, 2012 at 11:47:57AM -0500, Brian Foster wrote:
...
>>
>> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> 
> I'm having trouble determining what the algorithm is supposed to be
> and what might be bugs in the algorithm....
> 
> These are notes, somewhat incoherent, but I'll post them anyway
> because i think they convey my concerns and solutions well enough.
> 

Ok...

> Cheers,
> 
> Dave.
> 
> ----
> 
> Describe the algorithm to ensure I have it right:
> 
> 	calculate preallocation size (alloc_blocks)
> 		uses file size to determine size
> 
> 	determine ENOSPC throttling reduction (shift)
> 
> 	calculate maximum quota prealloc amount allowed
> 		walk each dquot on inode
> 			check hard limit
> 				=> over = no prealloc
> 			check soft limit
> 				=> none = prealloc unmodified
> 			check over soft limit
> 				=> no = prealloc unmodified
> 			calculate "throttle percentage"
> 			calculate max prealloc
> 		use minimum prealloc value returned
> 
> 	alloc_blocks = MIN(alloc_blocks, quota_alloc_blocks)
> 
> 	apply ENOSPC throttle (shift)
> 
> 
> -  prealloc size being overridden by quota throttling, and then the
>    ENOSPC throttle is applied to that.
> 

That sounds accurate to me.

> - the overall algorithm looks good to me, that means my problems are
>   with the implementation....
> 
> 	- it calculates stuff dynamically that could be set once in
> 	  the struct xfs_dquot on initialisation and whenever limits
> 	  are changed. i.e. the "percentage to throttle". This
> 	  should be carried by xfs_dquot as it simplifies
> 	  the logic here.
> 

Indeed.

> 	- if there is a hard limit but no soft limit, it *always*
> 	  throttles preallocation to the default percentage even
> 	  where there is lots of space available for the full
> 	  prealloc.
> 

I'm not following you here. xfs_prealloc_dquot_max() should return the
maximum allowed preallocation for a quota by calculating some percentage
of the free space in the quota (regardless of the existence of a soft
limit; that serves as a throttling watermark and percentage modifier).
xfs_prealloc_quota_max() just extends that across all quota types and
returns the most limiting value. For an inode with plenty of space in
its quota set, we should get a large value back for max_quota_prealloc,
which is then ignored because alloc_blocks doesn't violate that limit.

(Regardless, I don't think this changes your design comments that follow.)

> 	- it returns a block count based on limits, or -1 for no
> 	  throttling. I'd prefer a pair of functions - one to check
> 	  whether throttling is needed, and one to calculate the
> 	  throttling parameters
> 
> 		- should use xfs_this_quota_on() to drive quota
> 		  checks completely inside throttle check. will make
> 		  the code much cleaner
> 
> 		- check function can be boolean
> 
> 		- calc function should return both raw space
> 		  available and shift values so the global prealloc
> 		  values can be overridden independently. i.e.
> 		  allows quota throttling to work even when overall
> 		  prealloc is less than the maximum quota would
> 		  allow.
> 
> 
> Rough code:
> 
> need_throttle()
> {
> 	if (!xfs_this_quota_on(mp, type))
> 		return false;
> 
> 	dq = xfs_inode_dquot(ip, type);
> 
> 	/* no hard limit, no throttle */
> 	if (!dq->q_hard_limit)
> 		return false;
> 
> 	/* over hard limit, always throttle */
> 	if (dq->q_res_bcount > dq->q_hard_limit)
> 		return true;
> 
> 	/*
> 	 * Under soft limit, no throttle.
> 	 *
> 	 * Note: we always have a soft limit for prealloc,
> 	 * calculated at dquot instantiation or limit change
> 	 */
> 	if (dq->q_res_bcount + alloc_blocks < dq->q_soft_limit)
> 		return false;
> 
> 	/* between soft limit and hard limit, need to throttle */
> 	return true;
> }
> 
> - needs struct xfs-dquot to be initialised appropriately and quota
>   limit changes to handle changes correctly.
> 
> - allows soft limit defaults to be set in memory if they aren't on
>   disk. i.e. default throttling values will be no different in
>   implementation to on-disk limits.
> 

This comment is not quite clear to me. I suspect it relates to the
pseudocode comment above with regard to a default soft limit. To this
point, I'm interpreting your design proposal as something like the
following, at a high-level:

- xfs_dquot grows a (in memory) table of low space limits. This table is
populated in when the quota is read from disk, created and/or the input
parameters (hard/soft limit) change.
- The soft limit as a percentage modifier goes away by nature of using
the 3-5% logarithmic throttle (shift).
- The soft limit as a prealloc throttling trigger remains, just
implemented as a separate function as depicted above.

There are no on-disk changes proposed, correct? Are you suggesting we
set a default soft limit value on all quotas with a hard limit?

> calc_throttle()
> {
> 	dq = xfs_inode_dquot(ip, type);
> 
> 	freesp = dq->q_hard_limit - dq->q_res_bcount;
> 
> 	if (freesp < dq->q_low_space[XFS_LOWSP_5_PCNT]) {
> 		shift = 2;
> 		if (freesp < dq->q_low_space[XFS_LOWSP_4_PCNT])
> 			shift++;
> 		if (freesp < dq->q_low_space[XFS_LOWSP_3_PCNT])
> 			shift++;
> 		if (freesp < dq->q_low_space[XFS_LOWSP_2_PCNT])
> 			shift++;
> 		if (freesp < dq->q_low_space[XFS_LOWSP_1_PCNT])
> 			shift++;
> 	}
> 
> 	/* only overwrite current values if the result is a smaller prealloc */
> 	if ((freesp >> shift) >= (*qblocks >> *qshift))
> 		return;
> 
> 	*qblocks = freesp;
> 	*qshift = shift;
> }
> 
> - similar shift table to the ENOSPC code for a logarithmic mapping
>   rather than a linear mapping.
> - probably doesn't need 5 steps, 3 steps that do shift += 2 is
>   probably sufficient and would reduce per-dquot memory overhead.
> 
> xfs_iomap_prealloc_size()
> {
> .....
> 
> 	qblocks = alloc_blocks;
> 	qshift = 0;
> 
> 	if (need_throttle(ip, XFS_DQ_USER, alloc_blocks)
> 		calc_throttle(ip, XFS_DQ_USER, &qblocks, &qshift);
> 
> 	if (need_throttle(ip, XFS_DQ_GROUP, alloc_blocks)
> 		calc_throttle(ip, XFS_DQ_GROUP, &qblocks, &qshift);
> 
> 	if (need_throttle(ip, XFS_DQ_PROJ, alloc_blocks)
> 		calc_throttle(ip, XFS_DQ_PROJ, &qblocks, &qshift);
> 
> 	/*
> 	 * The final size of the preallocation is the smaller of the
> 	 * whole filesystem prealloc size and the quota prealloc
> 	 * size. i.e. whichever entity has the least space available
> 	 * for allocation determines the maximum preallocation size.
> 	 *
> 	 * The final throttling level is the larger of the ENOSPC
> 	 * and quota throttles. i.e. which ever is closer to their
> 	 * respective space limit determines how much we throttle
> 	 * by.
> 	 */
> 	alloc_blocks = MIN(qblocks, alloc_blocks)
> 	shift = MAX(qshift, shift)
> ....
> 

... and this all makes sense.

I'm a little unsure about applying a shift selected for one limit
against the preallocation size of another, but I suppose it can't hurt
to be aggressive. This design should bubble up all the relevant
parameters to a single point anyways, so that should be easier to reason
about and measure when I have some code.

Thanks for the review Dave!

Brian

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs