On 15 Mar 2022 at 12:12, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > When the AIL tries to flush the CIL, it relies on the CIL push > ending up on stable storage without having to wait for and > manipulate iclog state directly. However, if there is already a > pending CIL push when the AIL tries to flush the CIL, it won't set > the cil->xc_push_commit_stable flag and so the CIL push will not > actively flush the commit record iclog. I think the above sentence maps to the following snippet from xlog_cil_push_now(), if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) { spin_unlock(&cil->xc_push_lock); return; } i.e. if the CIL sequence that we are trying to push is already being pushed then xlog_cil_push_now() returns without queuing work on cil->xc_push_wq. However, the push_seq could have been previously pushed by, 1. xfsaild_push() In this case, cil->xc_push_commit_stable is set to true. Hence, xlog_cil_push_work() will definitely make sure to submit the commit record iclog for write I/O. 2. xfs_log_force_seq() => xlog_cil_force_seq() xfs_log_force_seq() invokes xlog_force_lsn() after executing xlog_cil_force_seq(). Here, A partially filled iclog will be in XLOG_STATE_ACTIVE state. This will cause xlog_force_and_check_iclog() to be invoked and hence the iclog is submitted for write I/O. In both the cases listed above, iclog is guaranteed to be submitted for I/O without any help from the log worker task. Looks like I am missing something obvious here. > > generic/530 when run on a single CPU test VM can trigger this fairly > reliably. This test exercises unlinked inode recovery, and can > result in inodes being pinned in memory by ongoing modifications to > the inode cluster buffer to record unlinked list modifications. As a > result, the first inode unlinked in a buffer can pin the tail of the > log whilst the inode cluster buffer is pinned by the current > checkpoint that has been pushed but isn't on stable storage because > because the cil->xc_push_commit_stable was not set. This results in > the log/AIL effectively deadlocking until something triggers the > commit record iclog to be pushed to stable storage (i.e. the > periodic log worker calling xfs_log_force()). > > The fix is two-fold - first we should always set the > cil->xc_push_commit_stable when xlog_cil_flush() is called, > regardless of whether there is already a pending push or not. > > Second, if the CIL is empty, we should trigger an iclog flush to > ensure that the iclogs of the last checkpoint have actually been > submitted to disk as that checkpoint may not have been run under > stable completion constraints. > > Reported-and-tested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing") > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_log_cil.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > index 3d8ebf2a1e55..48b16a5feb27 100644 > --- a/fs/xfs/xfs_log_cil.c > +++ b/fs/xfs/xfs_log_cil.c > @@ -1369,18 +1369,27 @@ xlog_cil_push_now( > if (!async) > flush_workqueue(cil->xc_push_wq); > > + spin_lock(&cil->xc_push_lock); > + > + /* > + * If this is an async flush request, we always need to set the > + * xc_push_commit_stable flag even if something else has already queued > + * a push. The flush caller is asking for the CIL to be on stable > + * storage when the next push completes, so regardless of who has queued > + * the push, the flush requires stable semantics from it. > + */ > + cil->xc_push_commit_stable = async; > + > /* > * If the CIL is empty or we've already pushed the sequence then > - * there's no work we need to do. > + * there's no more work that we need to do. > */ > - spin_lock(&cil->xc_push_lock); > if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) { > spin_unlock(&cil->xc_push_lock); > return; > } > > cil->xc_push_seq = push_seq; > - cil->xc_push_commit_stable = async; > queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work); > spin_unlock(&cil->xc_push_lock); > } > @@ -1520,6 +1529,13 @@ xlog_cil_flush( > > trace_xfs_log_force(log->l_mp, seq, _RET_IP_); > xlog_cil_push_now(log, seq, true); > + > + /* > + * If the CIL is empty, make sure that any previous checkpoint that may > + * still be in an active iclog is pushed to stable storage. > + */ > + if (list_empty(&log->l_cilp->xc_cil)) > + xfs_log_force(log->l_mp, 0); > } > > /* -- chandan