On Mon, Jul 26, 2021 at 04:07:12PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > After fixing the tail_lsn vs cache flush race, generic/482 continued > to fail in a similar way where cache flushes were missing before > iclog FUA writes. Tracing of iclog state changes during the fsstress > workload portion of the test (via xlog_iclog* events) indicated that > iclog writes were coming from two sources - CIL pushes and log > forces (due to fsync/O_SYNC operations). All of the cases where a > recovery problem was triggered indicated that the log force was the > source of the iclog write that was not preceeded by a cache flush. > > This was an oversight in the modifications made in commit > eef983ffeae7 ("xfs: journal IO cache flush reductions"). Log forces > for fsync imply a data device cache flush has been issued if an > iclog was flushed to disk and is indicated to the caller via the > log_flushed parameter so they can elide the device cache flush if > the journal issued one. > > The change in eef983ffeae7 results in iclogs only issuing a cache > flush if XLOG_ICL_NEED_FLUSH is set on the iclog, but this was not > added to the iclogs that the log force code flushes to disk. Hence > log forces are no longer guaranteeing that a cache flush is issued, > hence opening up a potential on-disk ordering failure. > > Log forces should also set XLOG_ICL_NEED_FUA as well to ensure that > the actual iclogs it forces to the journal are also on stable > storage before it returns to the caller. > > This patch introduces the xlog_force_iclog() helper function to > encapsulate the process of taking a reference to an iclog, switching > it's state if WANT_SYNC and flushing it to stable storage correctly. s/it's/its/ With that fixed, Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > > Both xfs_log_force() and xfs_log_force_lsn() are converted to use > it, as is xlog_unmount_write() which has an elaborate method of > doing exactly the same "write this iclog to stable storage" > operation. > > Further, if the log force code needs to wait on a iclog in the > WANT_SYNC state, it needs to ensure that iclog also results in a > cache flush being issued. This covers the case where the iclog > contains the commit record of the CIL flush that the log force > triggered, but it hasn't been written yet because there is still an > active reference to the iclog. > > Note: this whole cache flush whack-a-mole patch is a result of log > forces still being iclog state centric rather than being CIL > sequence centric. Most of this nasty code will go away in future > when log forces are converted to wait on CIL sequence push > completion rather than iclog completion. With the CIL push algorithm > guaranteeing that the CIL checkpoint is fully on stable storage when > it completes, we no longer need to iterate iclogs and push them to > ensure a CIL sequence push has completed and so all this nasty iclog > iteration and flushing code will go away. > > Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_log.c | 47 ++++++++++++++++++++++++++++++++++------------- > 1 file changed, 34 insertions(+), 13 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index 7107cf542eda..a1243dfd66ee 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -790,6 +790,7 @@ xlog_force_iclog( > struct xlog_in_core *iclog) > { > atomic_inc(&iclog->ic_refcnt); > + iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA; > if (iclog->ic_state == XLOG_STATE_ACTIVE) > xlog_state_switch_iclogs(iclog->ic_log, iclog, 0); > return xlog_state_release_iclog(iclog->ic_log, iclog, 0); > @@ -880,7 +881,6 @@ xlog_unmount_write( > > spin_lock(&log->l_icloglock); > iclog = log->l_iclog; > - iclog->ic_flags |= (XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > error = xlog_force_iclog(iclog); > xlog_wait_on_iclog(iclog); > > @@ -3217,22 +3217,23 @@ xfs_log_force( > goto out_unlock; > } else { > /* > - * Someone else is writing to this iclog. > - * > - * Use its call to flush out the data. However, the > - * other thread may not force out this LR, so we mark > - * it WANT_SYNC. > + * Someone else is still writing to this iclog, so we > + * need to ensure that when they release the iclog it > + * gets synced immediately as we may be waiting on it. > */ > xlog_state_switch_iclogs(log, iclog, 0); > } > - } else { > - /* > - * If the head iclog is not active nor dirty, we just attach > - * ourselves to the head and go to sleep if necessary. > - */ > - ; > } > > + /* > + * The iclog we are about to wait on may contain the checkpoint pushed > + * by the above xlog_cil_force() call, but it may not have been pushed > + * to disk yet. Like the ACTIVE case above, we need to make sure caches > + * are flushed when this iclog is written. > + */ > + if (iclog->ic_state == XLOG_STATE_WANT_SYNC) > + iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA; > + > if (flags & XFS_LOG_SYNC) > return xlog_wait_on_iclog(iclog); > out_unlock: > @@ -3265,7 +3266,8 @@ xlog_force_lsn( > goto out_unlock; > } > > - if (iclog->ic_state == XLOG_STATE_ACTIVE) { > + switch (iclog->ic_state) { > + case XLOG_STATE_ACTIVE: > /* > * We sleep here if we haven't already slept (e.g. this is the > * first time we've looked at the correct iclog buf) and the > @@ -3292,6 +3294,25 @@ xlog_force_lsn( > goto out_error; > if (log_flushed) > *log_flushed = 1; > + break; > + case XLOG_STATE_WANT_SYNC: > + /* > + * This iclog may contain the checkpoint pushed by the > + * xlog_cil_force_seq() call, but there are other writers still > + * accessing it so it hasn't been pushed to disk yet. Like the > + * ACTIVE case above, we need to make sure caches are flushed > + * when this iclog is written. > + */ > + iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA; > + break; > + default: > + /* > + * The entire checkpoint was written by the CIL force and is on > + * its way to disk already. It will be stable when it > + * completes, so we don't need to manipulate caches here at all. > + * We just need to wait for completion if necessary. > + */ > + break; > } > > if (flags & XFS_LOG_SYNC) > -- > 2.31.1 >