Re: [PATCH 1/3] xfs: always do log forces via the workqueue

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 21 Feb 2014 09:07:47 +1100

On Thu, Feb 20, 2014 at 08:51:55AM -0600, Mark Tinguely wrote:
> On 02/19/14 18:23, Dave Chinner wrote:
> >On Wed, Feb 19, 2014 at 01:24:54PM -0500, Brian Foster wrote:
> >>On 02/18/2014 11:16 PM, Dave Chinner wrote:
> >>>From: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>
> >>>Log forces can occur deep in the call chain when we have relatively
> >>>little stack free. Log forces can also happen close to the call
> >>>chain leaves (e.g. xfs_buf_lock()) and hence we can trigger IO from
> >>>places where we really don't want to add more stack overhead.
> >>>
> >>>This stack overhead occurs because log forces do foreground CIL
> >>>pushes (xlog_cil_push_foreground()) rather than waking the
> >>>background push wq and waiting for the push to complete.
> >>>This foreground push was done to avoid confusing the CFQ IO
> >>>scheduler when fsync()s were issued, as it has trouble dealing with
> >>>dependent IOs being issued from different process contexts.
> >>>
> >>>Avoiding blowing the stack is much more critical than performance
> >>>optimisations for CFQ, especially as we've been recommending against
> >>>the use of CFQ for XFS since 3.2 kernels were released because of
> >>>its problems with multi-threaded IO workloads.
> >>>
> >>>Hence convert xlog_cil_push_foreground() to move the push work
> >>>to the CIL workqueue. We already do the waiting for the push to
> >>>complete in xlog_cil_force_lsn(), so there's nothing else we need to
> >>>modify to make this work.
> >>>
> >>>Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
.....
> >>>@@ -803,7 +808,6 @@ xlog_cil_force_lsn(
> >>>  	 * before allowing the force of push_seq to go ahead. Hence block
> >>>  	 * on commits for those as well.
> >>>  	 */
> >>>-restart:
> >>>  	spin_lock(&cil->xc_push_lock);
> >>>  	list_for_each_entry(ctx,&cil->xc_committing, committing) {
> >>>  		if (ctx->sequence>  sequence)
> >>>@@ -821,6 +825,28 @@ restart:
> >>>  		/* found it! */
> >>>  		commit_lsn = ctx->commit_lsn;
> >>>  	}
> >>>+
> >>>+	/*
> >>>+	 * The call to xlog_cil_push_now() executes the push in the background.
> >>>+	 * Hence by the time we have got here our sequence may not have been
> >>>+	 * pushed yet. This is true if the current sequence still matches the
> >>>+	 * push sequence after the above wait loop and the CIL still contains
> >>>+	 * dirty objects.
> >>>+	 *
> >>>+	 * When the push occurs, it will empty the CIL and
> >>>+	 * atomically increment the current sequence past the push sequence and
> >>>+	 * move it into the committing list. Of course, if the CIL is clean at
> >>>+	 * the time of the push, it won't have pushed the CIL at all, so in that
> >>>+	 * case we should try the push for this sequence again from the start
> >>>+	 * just in case.
> >>>+	 */
> >>>+
> >>>+	if (sequence == cil->xc_current_sequence&&
                                             ^^^^^
FYI, your mailer is still mangling whitespace when quoting code....

> >>>+	    !list_empty(&cil->xc_cil)) {
> >>>+		spin_unlock(&cil->xc_push_lock);
> >>>+		goto restart;
> >>>+	}
> >>>+
> >>
> >>IIUC, the objective here is to make sure we don't leave this code path
> >>before the push even starts and the ctx makes it onto the committing
> >>list, due to xlog_cil_push_now() moving things to a workqueue.
> >
> >Right.
> >
> >>Given that, what's the purpose of re-executing the background push as
> >>opposed to restarting the wait sequence (as done previously)? It looks
> >>like push_now() won't queue the work again due to cil->xc_push_seq, but
> >>it will flush the queue and I suppose make it more likely the push
> >>starts. Is that the intent?
> >
> >Effectively. But the other thing that it is protecting against is
> >that foreground push is done without holding the cil->xc_ctx_lock,
> >and so we can get the situation where we try a foreground push
> >of the current sequence, see that the CIL is empty and return
> >without pushing, wait for previous sequences to commit, then find
> >that the CIL has items on the CIL in the sequence we are supposed to
> >be committing.
> >
> >In this case, we don't know if this occurred because the workqueue
> >has not started working on our push, or whether we raced on an empty
> >CIL, and hence we need to make sure that everything in the sequence
> >we are supposed to commit is pushed to the log.
> >
> >Hence if the current sequence is dirty after we've ensured that all
> >prior sequences are fully checkpointed, we need to go back and
> >push the CIL again to ensure that when we return to the caller the
> >CIL is checkpointed up to the point in time of the log force
> >occurring.
> 
> The desired push sequence was taken from an item on the CIL (either
> when added or from a pinned item). How could the CIL now be empty
> other than someone else pushed to at least the desired sequence?

The push sequence is only taken from an object on the CIL through
xfs_log_force_lsn(). For xfs_log_force(), the sequence is taken
directly from the current CIL context:

static inline void
xlog_cil_force(struct xlog *log)
{
        xlog_cil_force_lsn(log, log->l_cilp->xc_current_sequence);
}

And that's how you get an empty CIL when entering
xlog_cil_force_lsn(), and hence how you can get the race condition
that the code is protecting against.
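
A rough userspace sketch of that race, for illustration only. This is not
the kernel code: struct toy_cil, toy_push() and toy_needs_restart() are
made-up names, there is no locking or workqueue, and the push is modelled
as a simple "empty the CIL and bump the sequence" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the CIL; not the kernel's struct xfs_cil. */
struct toy_cil {
	uint64_t current_sequence;	/* sequence of the open context */
	unsigned int nr_items;		/* 0 means the CIL is clean */
};

/*
 * Model of a push of @seq: a clean CIL or an already-pushed sequence is a
 * no-op; otherwise the CIL is emptied and the current sequence moves on.
 */
static inline bool toy_push(struct toy_cil *cil, uint64_t seq)
{
	if (cil->nr_items == 0 || seq < cil->current_sequence)
		return false;
	cil->nr_items = 0;
	cil->current_sequence++;
	return true;
}

/* The restart test from the hunk above, minus the locking. */
static inline bool toy_needs_restart(const struct toy_cil *cil, uint64_t seq)
{
	return seq == cil->current_sequence && cil->nr_items > 0;
}

/* Walk through the race once; the asserts document each state. */
static inline void toy_race_demo(void)
{
	struct toy_cil cil = { .current_sequence = 5, .nr_items = 0 };
	uint64_t seq = cil.current_sequence;	/* sampled as xlog_cil_force() does */

	assert(!toy_push(&cil, seq));		/* clean CIL: nothing is pushed */
	cil.nr_items = 3;			/* commits race into the same sequence */
	assert(toy_needs_restart(&cil, seq));	/* so we must go around again */
	assert(toy_push(&cil, seq));		/* second attempt does the push */
	assert(!toy_needs_restart(&cil, seq));
}
```

The point of the model is only that a sequence sampled from a clean CIL
can become dirty before the force returns, which is the case the restart
check catches.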

> A flush_work() should be enough in the case where the ctx of the
> desired sequence is not on the xc_committing list. The flush_work
> will wait for the worker to start and place the ctx of the desired
> sequence into the xc_committing list. This prevents a tight loop
> waiting for the cil push worker to start.

Yes, that's exactly what the code does.

> Starting the cil push worker for every wakeup of smaller sequence in
> the list_for_each_entry loop seems wasteful.

As Brian pointed out, it won't restart on every wakeup - the
cil->xc_push_seq checks prevent that from happening, so a specific
sequence will only ever be queued for a push once.
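
As a minimal sketch of that queue-once behaviour, assuming the check
rejects a clean CIL and any sequence at or below the recorded push
sequence. That condition is a paraphrase, not a quote of the kernel
source, and struct model_cil, model_push_now() and the nr_queued counter
are hypothetical names used only for this model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state; names are illustrative, not kernel fields. */
struct model_cil {
	uint64_t push_seq;	/* highest sequence already queued for a push */
	unsigned int nr_items;	/* 0 means the CIL is clean */
	unsigned int nr_queued;	/* how many times push work was queued */
};

/* Returns true only if this call queued new push work. */
static inline bool model_push_now(struct model_cil *cil, uint64_t seq)
{
	if (cil->nr_items == 0 || seq <= cil->push_seq)
		return false;		/* nothing to do, or already queued */
	cil->push_seq = seq;		/* remember what has been queued */
	cil->nr_queued++;
	return true;
}

/* Repeated requests for the same (or an older) sequence queue nothing. */
static inline void model_requeue_demo(void)
{
	struct model_cil cil = { .push_seq = 0, .nr_items = 2, .nr_queued = 0 };

	assert(model_push_now(&cil, 7));	/* first request queues the push */
	assert(!model_push_now(&cil, 7));	/* a restart for 7 is a no-op */
	assert(!model_push_now(&cil, 6));	/* so is an older sequence */
	assert(cil.nr_queued == 1);
}
```

So going back around the loop costs a check under the push lock, not
another queued work item.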

> We know the later error paths in xfs_cil_push() will not do a wake,
> now is a good time to fix that.

I'm not sure what you are talking about here. If there's a problem,
please send patches.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
