Re: [PATCH 36/45] xfs: implement percpu cil space used calculation

On Wed, Mar 10, 2021 at 04:20:54PM -0800, Darrick J. Wong wrote:
> On Fri, Mar 05, 2021 at 04:11:34PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Now that we have the CIL percpu structures in place, implement the
> > space used counter with a fast sum check similar to the
> > percpu_counter infrastructure.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_log_cil.c  | 42 ++++++++++++++++++++++++++++++++++++------
> >  fs/xfs/xfs_log_priv.h |  2 +-
> >  2 files changed, 37 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 1bcf0d423d30..5519d112c1fd 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -433,6 +433,8 @@ xlog_cil_insert_items(
> >  	struct xfs_log_item	*lip;
> >  	int			len = 0;
> >  	int			iovhdr_res = 0, split_res = 0, ctx_res = 0;
> > +	int			space_used;
> > +	struct xlog_cil_pcp	*cilpcp;
> >  
> >  	ASSERT(tp);
> >  
> > @@ -469,8 +471,9 @@ xlog_cil_insert_items(
> >  	 *
> >  	 * This can steal more than we need, but that's OK.
> >  	 */
> > +	space_used = atomic_read(&ctx->space_used);
> >  	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||
> > -	    ctx->space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> > +	    space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> >  		int	split_res = log->l_iclog_hsize +
> >  					sizeof(struct xlog_op_header);
> >  		if (ctx_res)
> > @@ -480,16 +483,34 @@ xlog_cil_insert_items(
> >  		atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs);
> >  	}
> >  
> > +	/*
> > +	 * Update the CIL percpu pointer. This updates the global counter when
> > +	 * over the percpu batch size or when the CIL is over the space limit.
> > +	 * This means low lock overhead for normal updates, and when over the
> > +	 * limit the space used is immediately accounted. This makes enforcing
> > +	 * the hard limit much more accurate. The per cpu fold threshold is
> > +	 * based on how close we are to the hard limit.
> > +	 */
> > +	cilpcp = get_cpu_ptr(cil->xc_pcp);
> > +	cilpcp->space_used += len;
> > +	if (space_used >= XLOG_CIL_SPACE_LIMIT(log) ||
> > +	    cilpcp->space_used >
> > +			((XLOG_CIL_BLOCKING_SPACE_LIMIT(log) - space_used) /
> > +					num_online_cpus())) {
> 
> What happens if the log is very small and there are hundreds of CPUs?
> Can we end up on this slow path on a regular basis even if the amount of
> space used is not that large?

AFAICT, no big deal - the transaction reservations limit the
amount of space and concurrency that we can have here. A small log
will not allow many more than a handful of transactions through at a
time.  IOWs, we'll already be on the transaction reservation slow
path that limits concurrency via the grant head spin lock in
xlog_grant_head_check() and sleeping in xlog_grant_head_wait()...

> Granted I can't think of a good way out of that, since I suspect that if
> you do that you're already going to be hurting in 5 other places anyway.
> That said ... I /do/ keep getting bugs from people with tiny logs on big
> iron.  Some day I'll (ha!) stomp out all the bugs that are "NO do not
> let your deployment system growfs 10000x, this is not ext4"...

Yeah, that hurts long before we get to this transaction commit
path...

> 
> > +		atomic_add(cilpcp->space_used, &ctx->space_used);
> > +		cilpcp->space_used = 0;
> > +	}
> > +	put_cpu_ptr(cilpcp);
> > +
> >  	spin_lock(&cil->xc_cil_lock);
> > -	tp->t_ticket->t_curr_res -= ctx_res + len;
> >  	ctx->ticket->t_unit_res += ctx_res;
> >  	ctx->ticket->t_curr_res += ctx_res;
> > -	ctx->space_used += len;
> >  
> >  	/*
> >  	 * If we've overrun the reservation, dump the tx details before we move
> >  	 * the log items. Shutdown is imminent...
> >  	 */
> > +	tp->t_ticket->t_curr_res -= ctx_res + len;
> 
> Is moving this really necessary?

Not really, it just gets it out of the way. I moved it because it
doesn't need to be inside the spinlock, and in the end it needs to
be associated with the underrun check. Moving it here first meant it
didn't have to keep moving every time I moved the spinlock or
changed the order of the code from this point onwards....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


