On Wed, Jun 08, 2022 at 10:43:57AM +0300, Amir Goldstein wrote:
> On Mon, Jun 6, 2022 at 8:12 AM Leah Rumancik <leah.rumancik@xxxxxxxxx> wrote:
> >
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > [ Upstream commit 70447e0ad9781f84e60e0990888bd8c84987f44e ]
> >
> > When the AIL tries to flush the CIL, it relies on the CIL push
> > ending up on stable storage without having to wait for and
> > manipulate iclog state directly. However, if there is already a
> > pending CIL push when the AIL tries to flush the CIL, it won't set
> > the cil->xc_push_commit_stable flag and so the CIL push will not
> > actively flush the commit record iclog.
> >
> > generic/530, when run on a single CPU test VM, can trigger this fairly
> > reliably. This test exercises unlinked inode recovery, and can
> > result in inodes being pinned in memory by ongoing modifications to
> > the inode cluster buffer to record unlinked list modifications. As a
> > result, the first inode unlinked in a buffer can pin the tail of the
> > log whilst the inode cluster buffer is pinned by the current
> > checkpoint that has been pushed but isn't on stable storage
> > because cil->xc_push_commit_stable was not set. This results in
> > the log/AIL effectively deadlocking until something triggers the
> > commit record iclog to be pushed to stable storage (i.e. the
> > periodic log worker calling xfs_log_force()).
> >
> > The fix is two-fold - first, we should always set
> > cil->xc_push_commit_stable when xlog_cil_flush() is called,
> > regardless of whether there is already a pending push or not.
> >
> > Second, if the CIL is empty, we should trigger an iclog flush to
> > ensure that the iclogs of the last checkpoint have actually been
> > submitted to disk, as that checkpoint may not have been run under
> > stable completion constraints.
> >
> > Reported-and-tested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing")
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx>
> > ---
>
> Two questions/suggestions regarding backporting this patch.
>
> DISCLAIMER: I am raising questions/suggestions.
> There is no presumption that I know the answers.
> The author of the patch is the best authority when it comes to answering
> those questions and w.r.t. adopting or discarding my suggestions.
>
> 1. I think the backport should also be tested with a single CPU VM as
>    described above.
> 2. I wonder if it would make sense to backport the 3 "defensive fixes" that
>    Dave mentioned in the cover letter [1] along with this fix?
>
> The rationale is that it is not enough to backport the fix itself:
> anything that is required to test the fix reliably should be backported
> with it, and since this issue involves subtle timing and races (maybe not
> as much on a single CPU VM?), the "defensive fixes" that change the timing
> and number of wakeups/pushes may affect the ability to test the fix.
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/all/20220317053907.164160-1-david@xxxxxxxxxxxxx/

This patch has been postponed until the second set, but I can certainly run
the test described above when preparing the second set, and I'll look into
the other fixes from the cover letter. Thanks for pointing that out.

Leah
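
For reference, a rough sketch of what the two-fold change described in the
quoted commit message amounts to in xlog_cil_flush(). This is illustrative
only, not the verbatim upstream diff: the helper and field names
(xlog_cil_push_now(), xc_cil, xc_push_commit_stable) are taken from the XFS
CIL code of that era, and the exact empty-CIL test and tracepoint should be
checked against commit 70447e0ad978 itself.

    void
    xlog_cil_flush(
            struct xlog     *log)
    {
            xfs_csn_t       seq = log->l_cilp->xc_current_sequence;

            trace_xfs_log_force(log->l_mp, seq, _RET_IP_);

            /*
             * First fold: request a stable push unconditionally. In the fixed
             * code, an async push via xlog_cil_push_now() sets
             * cil->xc_push_commit_stable even when a push for this sequence
             * is already pending, so the commit record iclog is actively
             * written out rather than left sitting in memory.
             */
            xlog_cil_push_now(log, seq, true);

            /*
             * Second fold: if the CIL is empty there is nothing new to push,
             * but the previous checkpoint's iclogs may not have been issued
             * under stable completion constraints. Force the log so they are
             * submitted to stable storage and the AIL can make progress.
             */
            if (list_empty(&log->l_cilp->xc_cil))
                    xfs_log_force(log->l_mp, 0);
    }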