On Thu, Feb 25, 2021 at 08:44:17AM +1100, Dave Chinner wrote:
> On Wed, Feb 24, 2021 at 12:34:29PM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 23, 2021 at 02:34:36PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > To allow for iclog IO device cache flush behaviour to be optimised,
> > > we first need to separate out the commit record iclog IO from the
> > > rest of the checkpoint so we can wait for the checkpoint IO to
> > > complete before we issue the commit record.
> > > 
> > > This separation is only necessary if the commit record is being
> > > written into a different iclog to the start of the checkpoint, as
> > > the upcoming cache flushing changes require completion ordering
> > > against the other iclogs submitted by the checkpoint.
> > > 
> > > If the entire checkpoint and commit is in the one iclog, then they
> > > are both covered by the one set of cache flush primitives on the
> > > iclog and hence there is no need to separate them for ordering.
> > > 
> > > Otherwise, we need to wait for all the previous iclogs to complete
> > > so they are ordered correctly and made stable by the REQ_PREFLUSH
> > > that the commit record iclog IO issues. This guarantees that if a
> > > reader sees the commit record in the journal, they will also see
> > > the entire checkpoint that the commit record closes off.
> > > 
> > > This also provides the guarantee that when the commit record IO
> > > completes, we can safely unpin all the log items in the checkpoint
> > > so they can be written back because the entire checkpoint is stable
> > > in the journal.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_log.c      | 55 +++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_log_cil.c  |  7 ++++++
> > >  fs/xfs/xfs_log_priv.h |  2 ++
> > >  3 files changed, 64 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index fa284f26d10e..ff26fb46d70f 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -808,6 +808,61 @@ xlog_wait_on_iclog(
> > >  	return 0;
> > >  }
> > >  
> > > +/*
> > > + * Wait on any iclogs that are still flushing in the range of start_lsn to the
> > > + * current iclog's lsn. The caller holds a reference to the iclog, but otherwise
> > > + * holds no log locks.
> > > + *
> > > + * We walk backwards through the iclogs to find the iclog with the highest lsn
> > > + * in the range that we need to wait for and then wait for it to complete.
> > > + * Completion ordering of iclog IOs ensures that all iclogs prior to the
> > > + * candidate iclog we need to sleep on have been completed by the time our
> > > + * candidate has completed its IO.
> > 
> > Hmm, I guess this means that iclog header lsns are supposed to increase
> > as one walks forwards through the list?
> 
> Yes, the iclogs are written sequentially to the log - we don't
> switch the log->l_iclog pointer to the current active iclog until we
> switch it out, and then the next iclog in the loop is physically
> located at a higher lsn than the one we just switched out.
> 
> > > + *
> > > + * Therefore we only need to find the first iclog that isn't clean within the
> > > + * span of our flush range. If we come across a clean, newly activated iclog
> > > + * with an lsn of 0, it means IO has completed on this iclog and all previous
> > > + * iclogs will have been completed prior to this one. Hence finding a newly
> > > + * activated iclog indicates that there are no iclogs in the range we need to
> > > + * wait on and we are done searching.
> > 
> > I don't see an explicit check for an iclog with a zero lsn?  Is that
> > implied by XLOG_STATE_ACTIVE?
> 
> It's handled by the XFS_LSN_CMP(prev_lsn, start_lsn) < 0 check. If
> the prev_lsn is zero because the iclog is clean, then this check
> will always be true.
> 
> > Also, do you have any idea what Christoph was talking about wrt devices
> > with no-op flushes the last time this patch was posted?  This change
> > seems straightforward to me (assuming the answers to my two questions
> > are 'yes') but I didn't grok what subtlety he was alluding to...?
> 
> He was wondering what devices benefited from this. It has no impact
> on high speed devices that do not require flushes/FUA (e.g. high end
> Intel Optane SSDs), but those are not the devices this change is
> aimed at. There are no regressions on these high end devices,
> either, so they are largely irrelevant to the patch and what it
> targets...

Ok, that's what I thought.  It seemed fairly self-evident to me that
high speed devices wouldn't care.

Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx