On Tue, Mar 15, 2022 at 12:36:24PM -0700, Darrick J. Wong wrote: > On Tue, Mar 15, 2022 at 05:42:38PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > When the AIL tries to flush the CIL, it relies on the CIL push > > ending up on stable storage without having to wait for and > > manipulate iclog state directly. However, if there is already a > > pending CIL push when the AIL tries to flush the CIL, it won't set > > the cil->xc_push_commit_stable flag and so the CIL push will not > > actively flush the commit record iclog. > > > > generic/530 when run on a single CPU test VM can trigger this fairly > > reliably. This test exercises unlinked inode recovery, and can > > result in inodes being pinned in memory by ongoing modifications to > > the inode cluster buffer to record unlinked list modifications. As a > > result, the first inode unlinked in a buffer can pin the tail of the > > log whilst the inode cluster buffer is pinned by the current > > checkpoint that has been pushed but isn't on stable storage because > > because the cil->xc_push_commit_stable was not set. This results in > > the log/AIL effectively deadlocking until something triggers the > > commit record iclog to be pushed to stable storage (i.e. the > > periodic log worker calling xfs_log_force()). > > > > The fix is two-fold - first we should always set the > > cil->xc_push_commit_stable when xlog_cil_flush() is called, > > regardless of whether there is already a pending push or not. > > > > Second, if the CIL is empty, we should trigger an iclog flush to > > ensure that the iclogs of the last checkpoint have actually been > > submitted to disk as that checkpoint may not have been run under > > stable completion constraints. > > Can it ever be the case that the CIL is not empty but the last > checkpoint wasn't committed to disk? Yes. But xlog_cil_push_now() will capture that, queue it and mark it as xc_push_commit_stable. Remember that the push_now() code updates the push seq/stable state under down_read(ctx lock) + spin_lock(push lock) context. The push seq/stable state is cleared by the push worker under down_write(ctx lock) + spin_lock(push lock) conditions when it atomically swaps in the new empty CIL context. Hence the push worker will always see the stable flag if it has been set for that push sequence. > Let's say someone else > commits a transaction after the worker samples xc_push_commit_stable? If we race with commits between the xlog_cil_push_now(log, seq, true) and the CIL list_empty check in xlog_cil_flush(), there are two posibilities: 1. the CIL push worker hasn't run and atomically switched in a new CIL context. 2. the CIL push worker has run and switched contexts In the first case, the commit will end up in the same context that xlog_cil_flush() pushed, and it will be stable. That will result in an empty CIL after the CIL push worker runs, but the racing commit will be stable as per the xc_push_commit_stable flag. This can also lead to the CIL being empty by the time the list_empty check is done (because pre-empt), in which case the log force will be a no-op because none of the iclogs need flushing. > IOWs, why does a not-empty CIL mean that the last checkpoint is on disk? In the second case, the CIL push triggered by xlog_cil_push_now() will be stable because xc_push_commit_stable says it must be, and the racing commit will end up in the new CIL context and the CIL won't be empty. We don't need a log force in this case because the previous sequence that was flushed with stable semantics as required. In the case of AIL pushing, we don't actually care about racing CIL commits because we are trying to get pinned AIL items unpinned so we can move the tail of the log forwards. If those pinned items are relogged by racing transactions, then the next call to xlog_cil_flush() from the AIL will get them unpinned and that will move them forward in the log, anyway. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx