On Wed, Mar 16, 2022 at 08:47:45AM +1100, Dave Chinner wrote: > On Tue, Mar 15, 2022 at 12:36:24PM -0700, Darrick J. Wong wrote: > > On Tue, Mar 15, 2022 at 05:42:38PM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > When the AIL tries to flush the CIL, it relies on the CIL push > > > ending up on stable storage without having to wait for and > > > manipulate iclog state directly. However, if there is already a > > > pending CIL push when the AIL tries to flush the CIL, it won't set > > > the cil->xc_push_commit_stable flag and so the CIL push will not > > > actively flush the commit record iclog. > > > > > > generic/530 when run on a single CPU test VM can trigger this fairly > > > reliably. This test exercises unlinked inode recovery, and can > > > result in inodes being pinned in memory by ongoing modifications to > > > the inode cluster buffer to record unlinked list modifications. As a > > > result, the first inode unlinked in a buffer can pin the tail of the > > > log whilst the inode cluster buffer is pinned by the current > > > checkpoint that has been pushed but isn't on stable storage because > > > because the cil->xc_push_commit_stable was not set. This results in > > > the log/AIL effectively deadlocking until something triggers the > > > commit record iclog to be pushed to stable storage (i.e. the > > > periodic log worker calling xfs_log_force()). > > > > > > The fix is two-fold - first we should always set the > > > cil->xc_push_commit_stable when xlog_cil_flush() is called, > > > regardless of whether there is already a pending push or not. > > > > > > Second, if the CIL is empty, we should trigger an iclog flush to > > > ensure that the iclogs of the last checkpoint have actually been > > > submitted to disk as that checkpoint may not have been run under > > > stable completion constraints. > > > > Can it ever be the case that the CIL is not empty but the last > > checkpoint wasn't committed to disk? > > Yes. But xlog_cil_push_now() will capture that, queue it and mark > it as xc_push_commit_stable. > > Remember that the push_now() code updates the push seq/stable > state under down_read(ctx lock) + spin_lock(push lock) context. The > push seq/stable state is cleared by the push worker under > down_write(ctx lock) + spin_lock(push lock) conditions when it > atomically swaps in the new empty CIL context. Hence the push worker > will always see the stable flag if it has been set for that push > sequence. > > > Let's say someone else > > commits a transaction after the worker samples xc_push_commit_stable? > > If we race with commits between the xlog_cil_push_now(log, seq, > true) and the CIL list_empty check in xlog_cil_flush(), there are > two posibilities: > > 1. the CIL push worker hasn't run and atomically switched in a new > CIL context. > 2. the CIL push worker has run and switched contexts > > In the first case, the commit will end up in the same context that > xlog_cil_flush() pushed, and it will be stable. That will result in > an empty CIL after the CIL push worker runs, but the racing commit > will be stable as per the xc_push_commit_stable flag. This can also > lead to the CIL being empty by the time the list_empty check is done > (because pre-empt), in which case the log force will be a no-op > because none of the iclogs need flushing. > > > IOWs, why does a not-empty CIL mean that the last checkpoint is on disk? > > In the second case, the CIL push triggered by xlog_cil_push_now() > will be stable because xc_push_commit_stable says it must be, and > the racing commit will end up in the new CIL context and the CIL > won't be empty. We don't need a log force in this case because the > previous sequence that was flushed with stable semantics as required. > > In the case of AIL pushing, we don't actually care about racing CIL > commits because we are trying to get pinned AIL items unpinned so we > can move the tail of the log forwards. If those pinned items are > relogged by racing transactions, then the next call to > xlog_cil_flush() from the AIL will get them unpinned and that will > move them forward in the log, anyway. Ok, that's kinda what I was thinking, but wasn't sure there wasn't some other weird subtlety that I hadn't figured out. Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D --D > > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx