Re: [PATCH 40/45] xfs: convert CIL to unordered per cpu lists

On Wed, Mar 10, 2021 at 05:15:05PM -0800, Darrick J. Wong wrote:
> On Fri, Mar 05, 2021 at 04:11:38PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > So that we can remove the cil_lock which is a global serialisation
> > point. We've already got ordering sorted, so all we need to do is
> > treat the CIL list like the busy extent list and reconstruct it
> > before the push starts.
....
> > @@ -530,7 +511,6 @@ xlog_cil_insert_items(
> >  	 * the transaction commit.
> >  	 */
> >  	order = atomic_inc_return(&ctx->order_id);
> > -	spin_lock(&cil->xc_cil_lock);
> >  	list_for_each_entry(lip, &tp->t_items, li_trans) {
> >  
> >  		/* Skip items which aren't dirty in this transaction. */
> > @@ -540,10 +520,26 @@ xlog_cil_insert_items(
> >  		lip->li_order_id = order;
> >  		if (!list_empty(&lip->li_cil))
> >  			continue;
> > -		list_add(&lip->li_cil, &cil->xc_cil);
> > +		list_add(&lip->li_cil, &cilpcp->log_items);
> 
> Ok, so if I understand this correctly -- every time a transaction
> commits, it marks every dirty log item with a monotonically increasing
> counter.  If the log item isn't already on another CPU's CIL list, it
> gets added to the current CPU's CIL list...

Correct.
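
For anyone following along, here is a rough userspace sketch of the
commit-side behaviour described above. It is not the XFS code: the
names (struct item, struct cil_ctx, commit_items, on_list, NR_CPUS)
are made up for illustration, plain arrays and singly linked lists
stand in for the per-cpu machinery and list_head, and it assumes the
caller already holds the items exclusively, so only the counter is
atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

struct item {
	unsigned int	order_id;	/* order of the last commit that dirtied it */
	bool		on_list;	/* stands in for !list_empty(&lip->li_cil) */
	struct item	*next;
};

struct cil_ctx {
	atomic_uint	order_id;		/* shared commit counter */
	struct item	*pcp_items[NR_CPUS];	/* one unordered list per CPU */
};

/*
 * Tag every item with this commit's order ID and queue the items that
 * are not yet on any list onto the committing CPU's list.
 */
static unsigned int
commit_items(struct cil_ctx *ctx, int cpu, struct item **items, size_t nr)
{
	/* fetch_add returns the old value, so +1 mimics atomic_inc_return() */
	unsigned int	order = atomic_fetch_add(&ctx->order_id, 1) + 1;
	size_t		i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (i = 0; i < nr; i++) {
		struct item	*it = items[i];

		it->order_id = order;	/* relogging moves the item forward */
		if (it->on_list)
			continue;	/* stays on whichever CPU list holds it */
		it->on_list = true;
		it->next = ctx->pcp_items[cpu];
		ctx->pcp_items[cpu] = it;
	}
	return order;
}
```

Note that a relogged item only gets its order ID updated; it is never
moved between lists, which is why no cross-CPU locking is needed here.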

> > +	}
> > +	put_cpu_ptr(cilpcp);
> > +
> > +	/*
> > +	 * If we've overrun the reservation, dump the tx details before we move
> > +	 * the log items. Shutdown is imminent...
> > +	 */
> > +	tp->t_ticket->t_curr_res -= ctx_res + len;
> > +	if (WARN_ON(tp->t_ticket->t_curr_res < 0)) {
> > +		xfs_warn(log->l_mp, "Transaction log reservation overrun:");
> > +		xfs_warn(log->l_mp,
> > +			 "  log items: %d bytes (iov hdrs: %d bytes)",
> > +			 len, iovhdr_res);
> > +		xfs_warn(log->l_mp, "  split region headers: %d bytes",
> > +			 split_res);
> > +		xfs_warn(log->l_mp, "  ctx ticket: %d bytes", ctx_res);
> > +		xlog_print_trans(tp);
> >  	}
> >  
> > -	spin_unlock(&cil->xc_cil_lock);
> >  
> >  	if (tp->t_ticket->t_curr_res < 0)
> >  		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> > @@ -806,6 +802,7 @@ xlog_cil_push_work(
> >  	bool			commit_iclog_sync = false;
> >  	int			cpu;
> >  	struct xlog_cil_pcp	*cilpcp;
> > +	LIST_HEAD		(log_items);
> >  
> >  	new_ctx = xlog_cil_ctx_alloc();
> >  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> > @@ -822,6 +819,9 @@ xlog_cil_push_work(
> >  			list_splice_init(&cilpcp->busy_extents,
> >  					&ctx->busy_extents);
> >  		}
> > +		if (!list_empty(&cilpcp->log_items)) {
> > +			list_splice_init(&cilpcp->log_items, &log_items);
> 
> ...and then at CIL push time, we splice each per-CPU list into a big
> list, sort the dirty log items by counter number, and process them.

Yup, that's pretty much it. I'm replacing insert-time ordering with
push-time ordering to get rid of the serialisation overhead of
insert-time ordering.
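
To make the push-time ordering concrete, a minimal userspace sketch
follows. Again, this is only an illustration under assumptions: the
names (struct item, push_gather, sort_items, NR_CPUS) are
hypothetical, singly linked lists stand in for list_head, and a small
recursive merge sort stands in for whatever sort the push code
actually calls.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

struct item {
	unsigned int	order_id;	/* set at commit time */
	struct item	*next;
};

/* Merge two lists already sorted by ascending order_id (stable). */
static struct item *
merge(struct item *a, struct item *b)
{
	struct item	head = { 0, NULL };
	struct item	*tail = &head;

	while (a && b) {
		if (a->order_id <= b->order_id) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

/* Top-down merge sort: split the list in half, sort each half, merge. */
static struct item *
sort_items(struct item *list)
{
	struct item	*slow = list, *fast, *second;

	if (!list || !list->next)
		return list;
	for (fast = list->next; fast && fast->next; fast = fast->next->next)
		slow = slow->next;
	second = slow->next;
	slow->next = NULL;
	return merge(sort_items(list), sort_items(second));
}

/* Empty every per-CPU list into one list, then order it for the push. */
static struct item *
push_gather(struct item *pcp_items[NR_CPUS])
{
	struct item	*all = NULL;
	int		cpu;

	assert(pcp_items != NULL);
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		while (pcp_items[cpu]) {
			struct item	*it = pcp_items[cpu];

			pcp_items[cpu] = it->next;
			it->next = all;
			all = it;
		}
	}
	return sort_items(all);
}
```

The per-CPU lists can be in any order on entry; the order ID alone
determines where each item ends up in the final list.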

> The first thought I had was that it's a darn shame that _insert_items
> can't steal a log item from another CPU's CIL list, because you could
> then mergesort the per-CPU CIL lists into @log_items.  Unfortunately, I
> don't think there's a safe way to steal items from a per-CPU list
> without involving locks.

Yeah, it needs locks because we then have to serialise local inserts
with remote removals. It can be done fairly easily - I just need to
replace the "order ID" field with the CPU ID of the list it is on.

The problem is that relogging happens a lot, so in some workloads we
might be bouncing a set of commonly accessed log items around CPUs
frequently. That said, I'm not sure this would end up being a huge
problem, but it still needs a mergesort to be performed in the push
code...

> The second thought I had was that we have the xfs_pwork mechanism for
> launching a bunch of worker threads.  A pwork workqueue is (probably)
> too costly when the item list is short or there aren't that many CPUs,
> but once list_sort starts getting painful, would it be faster to launch
> a bunch of threads in push_work to sort each per-CPU list and then merge
> sort them into the final list?

Not sure, because now you have N work threads competing with the
userspace workload for CPU to do maybe 10ms of work. The scheduling
latency when the system is CPU bound is likely to introduce more
latency than you save by spreading the work out....

I've largely put these sorts of questions aside because optimising
this code further can be done later. The code as it stands doubles
the throughput of the commit path and I don't think that further
optimisation is immediately necessary. Ensuring that the splitting
and recombining of the lists still results in correctly ordered log
items is more important right now, and I think it does that.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


