On Mon, Mar 01, 2021 at 10:19:36AM -0500, Brian Foster wrote: > On Tue, Feb 23, 2021 at 02:34:36PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > To allow for iclog IO device cache flush behaviour to be optimised, > > we first need to separate out the commit record iclog IO from the > > rest of the checkpoint so we can wait for the checkpoint IO to > > complete before we issue the commit record. > > > > This separation is only necessary if the commit record is being > > written into a different iclog to the start of the checkpoint as the > > upcoming cache flushing changes requires completion ordering against > > the other iclogs submitted by the checkpoint. > > > > If the entire checkpoint and commit is in the one iclog, then they > > are both covered by the one set of cache flush primitives on the > > iclog and hence there is no need to separate them for ordering. > > > > Otherwise, we need to wait for all the previous iclogs to complete > > so they are ordered correctly and made stable by the REQ_PREFLUSH > > that the commit record iclog IO issues. This guarantees that if a > > reader sees the commit record in the journal, they will also see the > > entire checkpoint that commit record closes off. > > > > This also provides the guarantee that when the commit record IO > > completes, we can safely unpin all the log items in the checkpoint > > so they can be written back because the entire checkpoint is stable > > in the journal. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_log.c | 55 +++++++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_log_cil.c | 7 ++++++ > > fs/xfs/xfs_log_priv.h | 2 ++ > > 3 files changed, 64 insertions(+) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index fa284f26d10e..ff26fb46d70f 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -808,6 +808,61 @@ xlog_wait_on_iclog( > > return 0; > > } > > > > +/* > > + * Wait on any iclogs that are still flushing in the range of start_lsn to the > > + * current iclog's lsn. The caller holds a reference to the iclog, but otherwise > > + * holds no log locks. > > + * > > + * We walk backwards through the iclogs to find the iclog with the highest lsn > > + * in the range that we need to wait for and then wait for it to complete. > > + * Completion ordering of iclog IOs ensures that all prior iclogs to the > > + * candidate iclog we need to sleep on have been complete by the time our > > + * candidate has completed it's IO. > > + * > > + * Therefore we only need to find the first iclog that isn't clean within the > > + * span of our flush range. If we come across a clean, newly activated iclog > > + * with a lsn of 0, it means IO has completed on this iclog and all previous > > + * iclogs will be have been completed prior to this one. Hence finding a newly > > + * activated iclog indicates that there are no iclogs in the range we need to > > + * wait on and we are done searching. > > + */ > > +int > > +xlog_wait_on_iclog_lsn( > > + struct xlog_in_core *iclog, > > + xfs_lsn_t start_lsn) > > +{ > > + struct xlog *log = iclog->ic_log; > > + struct xlog_in_core *prev; > > + int error = -EIO; > > + > > + spin_lock(&log->l_icloglock); > > + if (XLOG_FORCED_SHUTDOWN(log)) > > + goto out_unlock; > > + > > + error = 0; > > + for (prev = iclog->ic_prev; prev != iclog; prev = prev->ic_prev) { > > + > > + /* Done if the lsn is before our start lsn */ > > + if (XFS_LSN_CMP(be64_to_cpu(prev->ic_header.h_lsn), > > + start_lsn) < 0) > > + break; > > + > > + /* Don't need to wait on completed, clean iclogs */ > > + if (prev->ic_state == XLOG_STATE_DIRTY || > > + prev->ic_state == XLOG_STATE_ACTIVE) { > > + continue; > > + } > > + > > + /* wait for completion on this iclog */ > > + xlog_wait(&prev->ic_force_wait, &log->l_icloglock); > > You haven't addressed my feedback from the previous version. In > particular the bit about whether it is safe to block on ->ic_force_wait > from here considering some of our more quirky buffer locking behavior. Sorry, first I've heard about this. I don't have any such email in my inbox. I don't know what waiting on an iclog in the middle of a checkpoint has to do with buffer locking behaviour, because iclogs don't use buffers and we block waiting on iclog IO completion all the time in xlog_state_get_iclog_space(). If it's not safe to block on iclog IO completion here, then it's not safe to block on an iclog in xlog_state_get_iclog_space(). That's obviously not true, so I'm really not sure what the concern here is... > That aside, this iteration logic all seems a bit overengineered to me. > We have the commit record iclog of the current checkpoint and thus the > immediately previous iclog in the ring. We know that previous record > isn't earlier than start_lsn because the caller confirmed that start_lsn > != commit_lsn. We also know that iclog can't become dirty -> active > until it and all previous iclog writes have completed because the > callback ordering implemented by xlog_state_do_callback() won't clean > the iclog until that point. Given that, can't this whole thing be > replaced with a check of iclog->prev to either see if it's been cleaned > or to otherwise xlog_wait() for that condition and return? Maybe. I was more concerned about ensuring that it did the right thing so I checked all the things that came to mind. There was more than enough compexity in other parts of this patchset to fill my brain that minimal implementation were not a concern. I'll go take another look at it. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx