On Tue, Mar 08, 2022 at 05:12:08PM +1100, Dave Chinner wrote: > The fix is two-fold - first we should always set the > cil->xc_push_commit_stable when xlog_cil_flush() is called, > regardless of whether there is already a pending push or not. > > Second, if the CIL is empty, we should trigger an iclog flush to > ensure that all the iclogs of the last checkpoint have actually been > submitted to disk as that checkpoint may not have been run under > stable completion constraints. > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing") > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > Version 2: > - if the CIL is empty we need to push on the iclogs to ensure that > the previous CIL flush did actually make it to stable storage. > This closes a race condition where the AIL doesn't actually > trigger the last CIL push and so the CIL is already empty by the > time it tries to flush it to unpin pinned items. > > Note: this is against the XFS CIL scalability patchset. The > XLOG_CIL_EMPTY check on a mainline kernel will be: > > if (list_empty(&log->l_cilp->xc_cil)) > xfs_log_force(log->l_mp, 0); This is it. 100 runs of g/530 all completed in 2 or 3 seconds. Thanks! Tested-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> Just to be clear, my testing was done on b08968f196d4 (in Linus' tree), plus my fs-folio patches, plus patches 1-3 from this series, plus this (whitespace damaged): +++ b/fs/xfs/xfs_log_cil.c @@ -1243,18 +1243,27 @@ xlog_cil_push_now( if (!async) flush_workqueue(cil->xc_push_wq); + spin_lock(&cil->xc_push_lock); + + /* + * If this is an async flush request, we always need to set the + * xc_push_commit_stable flag even if something else has already queued + * a push. The flush caller is asking for the CIL to be on stable + * storage when the next push completes, so regardless of who has queued + * the push, the flush requires stable semantics from it. + */ + cil->xc_push_commit_stable = async; + /* * If the CIL is empty or we've already pushed the sequence then - * there's no work we need to do. + * there's no more work that we need to do. */ - spin_lock(&cil->xc_push_lock); if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) { spin_unlock(&cil->xc_push_lock); return; } cil->xc_push_seq = push_seq; - cil->xc_push_commit_stable = async; queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work); spin_unlock(&cil->xc_push_lock); } @@ -1352,6 +1361,8 @@ xlog_cil_flush( trace_xfs_log_force(log->l_mp, seq, _RET_IP_); xlog_cil_push_now(log, seq, true); + if (list_empty(&log->l_cilp->xc_cil)) + xfs_log_force(log->l_mp, 0); } /*