On Thu, Feb 20, 2014 at 08:51:55AM -0600, Mark Tinguely wrote:
> On 02/19/14 18:23, Dave Chinner wrote:
> >On Wed, Feb 19, 2014 at 01:24:54PM -0500, Brian Foster wrote:
> >>On 02/18/2014 11:16 PM, Dave Chinner wrote:
> >>>From: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>
> >>>Log forces can occur deep in the call chain when we have relatively
> >>>little stack free. Log forces can also happen close to the call
> >>>chain leaves (e.g. xfs_buf_lock()) and hence we can trigger IO from
> >>>places where we really don't want to add more stack overhead.
> >>>
> >>>This stack overhead occurs because log forces do foreground CIL
> >>>pushes (xlog_cil_push_foreground()) rather than waking the
> >>>background push wq and waiting for the push to complete.
> >>>This foreground push was done to avoid confusing the CFQ IO
> >>>scheduler when fsync()s were issued, as it has trouble dealing with
> >>>dependent IOs being issued from different process contexts.
> >>>
> >>>Avoiding blowing the stack is much more critical than performance
> >>>optimisations for CFQ, especially as we've been recommending against
> >>>the use of CFQ for XFS since 3.2 kernels were released because of
> >>>its problems with multi-threaded IO workloads.
> >>>
> >>>Hence convert xlog_cil_push_foreground() to move the push work
> >>>to the CIL workqueue. We already do the waiting for the push to
> >>>complete in xlog_cil_force_lsn(), so there's nothing else we need to
> >>>modify to make this work.
> >>>
> >>>Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>

.....

> >>>@@ -803,7 +808,6 @@ xlog_cil_force_lsn(
> >>>          * before allowing the force of push_seq to go ahead. Hence block
> >>>          * on commits for those as well.
> >>>          */
> >>>-restart:
> >>>         spin_lock(&cil->xc_push_lock);
> >>>         list_for_each_entry(ctx,&cil->xc_committing, committing) {
> >>>                 if (ctx->sequence> sequence)
> >>>@@ -821,6 +825,28 @@ restart:
> >>>                 /* found it! */
> >>>                 commit_lsn = ctx->commit_lsn;
> >>>         }
> >>>+
> >>>+        /*
> >>>+         * The call to xlog_cil_push_now() executes the push in the background.
> >>>+         * Hence by the time we have got here our sequence may not have been
> >>>+         * pushed yet. This is true if the current sequence still matches the
> >>>+         * push sequence after the above wait loop and the CIL still contains
> >>>+         * dirty objects.
> >>>+         *
> >>>+         * When the push occurs, it will empty the CIL and
> >>>+         * atomically increment the current sequence past the push sequence and
> >>>+         * move it into the committing list. Of course, if the CIL is clean at
> >>>+         * the time of the push, it won't have pushed the CIL at all, so in that
> >>>+         * case we should try the push for this sequence again from the start
> >>>+         * just in case.
> >>>+         */
> >>>+
> >>>+        if (sequence == cil->xc_current_sequence&&
                                                    ^^^^^
FYI, your mailer is still mangling whitespace when quoting code....

> >>>+            !list_empty(&cil->xc_cil)) {
> >>>+                spin_unlock(&cil->xc_push_lock);
> >>>+                goto restart;
> >>>+        }
> >>>+
> >>
> >>IIUC, the objective here is to make sure we don't leave this code path
> >>before the push even starts and the ctx makes it onto the committing
> >>list, due to xlog_cil_push_now() moving things to a workqueue.
> >
> >Right.
> >
> >>Given that, what's the purpose of re-executing the background push as
> >>opposed to restarting the wait sequence (as done previously)? It looks
> >>like push_now() won't queue the work again due to cil->xc_push_seq, but
> >>it will flush the queue and I suppose make it more likely the push
> >>starts. Is that the intent?
> >
> >Effectively. But the other thing that it is protecting against is
> >that foreground push is done without holding the cil->xc_ctx_lock,
> >and so we can get the situation where we try a foreground push
> >of the current sequence, see that the CIL is empty and return
> >without pushing, wait for previous sequences to commit, then find
> >that the CIL has items in the sequence we are supposed to
> >be committing.
> >
> >In this case, we don't know if this occurred because the workqueue
> >has not started working on our push, or whether we raced on an empty
> >CIL, and hence we need to make sure that everything in the sequence
> >we are supposed to commit is pushed to the log.
> >
> >Hence if the current sequence is dirty after we've ensured that all
> >prior sequences are fully checkpointed, we need to go back and
> >push the CIL again to ensure that when we return to the caller the
> >CIL is checkpointed up to the point in time of the log force
> >occurring.
>
> The desired push sequence was taken from an item on the CIL (either
> when added or from a pinned item). How could the CIL now be empty
> other than someone else pushed to at least the desired sequence?

The push sequence is only taken from an object on the CIL through
xfs_log_force_lsn(). For xfs_log_force(), the sequence is taken
directly from the current CIL context:

static inline void
xlog_cil_force(struct xlog *log)
{
        xlog_cil_force_lsn(log, log->l_cilp->xc_current_sequence);
}

And that's how you get an empty CIL when entering
xlog_cil_force_lsn(), and hence how you can get the race condition
that the code is protecting against.

> A flush_work() should be enough in the case where the ctx of the
> desired sequence is not on the xc_committing list. The flush_work
> will wait for the worker to start and place the ctx of the desired
> sequence into the xc_committing list. This prevents a tight loop
> waiting for the cil push worker to start.

Yes, that's exactly what the code does.
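
To make the race and the restart concrete, here is a minimal
single-threaded sketch in userspace C. It is not the kernel code:
struct toy_cil, toy_push_worker(), toy_flush_work(), toy_push_now()
and toy_force_seq() are all hypothetical names, the "workqueue" is
modelled by running the pending work synchronously when it is
flushed, the wait on earlier sequences is omitted, and the late_items
parameter is an assumption used to simulate another thread dirtying
an empty CIL after the first push attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CIL; every name here is hypothetical. */
struct toy_cil {
    long current_sequence; /* sequence that new items join */
    long push_seq;         /* highest sequence ever queued for a push */
    long committed_seq;    /* highest sequence checkpointed so far */
    int  dirty_items;      /* items sitting in the current sequence */
    int  pushes_queued;    /* how many times push work was queued */
    bool work_pending;     /* queued push work that has not run yet */
};

/* The "workqueue" worker: checkpoint the current sequence if dirty. */
static void toy_push_worker(struct toy_cil *cil)
{
    cil->work_pending = false;
    if (cil->dirty_items == 0)
        return;                  /* clean CIL: nothing gets pushed */
    cil->committed_seq = cil->current_sequence;
    cil->dirty_items = 0;
    cil->current_sequence++;     /* new items now join the next sequence */
}

/* Stand-in for flush_work(): run any pending work to completion. */
static void toy_flush_work(struct toy_cil *cil)
{
    if (cil->work_pending)
        toy_push_worker(cil);
}

/* Queue a push of seq, but only once per sequence and only if dirty. */
static void toy_push_now(struct toy_cil *cil, long seq)
{
    toy_flush_work(cil);
    if (cil->dirty_items == 0 || seq <= cil->push_seq)
        return;
    cil->push_seq = seq;
    cil->work_pending = true;
    cil->pushes_queued++;
}

/*
 * Force seq to be checkpointed. Returns how many times the restart
 * path was taken. late_items models items that land in the CIL after
 * the first push attempt has already looked at it.
 */
static int toy_force_seq(struct toy_cil *cil, long seq, int late_items)
{
    int restarts = 0;

restart:
    toy_push_now(cil, seq);

    /* Simulated race: the CIL is dirtied behind our back, once. */
    cil->dirty_items += late_items;
    late_items = 0;

    /* Our sequence is still current and dirty: go around again. */
    if (seq == cil->current_sequence && cil->dirty_items > 0) {
        restarts++;
        goto restart;
    }

    /* On exit, nothing dirty is left in the sequence we forced. */
    assert(cil->dirty_items == 0 || cil->current_sequence > seq);
    return restarts;
}
```

Calling toy_force_seq(&cil, 5, 0) on a CIL at sequence 5 with dirty
items takes one restart, and that restart only flushes the work
already queued rather than queueing it a second time. Calling it on
an empty CIL with late_items set shows the other case: the first
attempt queues nothing, and only the restart gets the late items
pushed.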

> Starting the cil push worker for every wakeup of smaller sequence in
> the list_for_each_entry loop seems wasteful.

As Brian pointed out, it won't restart on every wakeup - the
cil->xc_push_seq checks prevent that from happening, so a specific
sequence will only ever be queued for a push once.

> We know the later error paths in xfs_cil_push() will not do a wake,
> now is a good time to fix that.

I'm not sure what you are talking about here. If there's a problem,
please send patches.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx