Re: [PATCH 3/5] xfs: journal IO cache flush reductions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 28, 2021 at 10:12:05AM -0500, Brian Foster wrote:
> On Thu, Jan 28, 2021 at 03:41:52PM +1100, Dave Chinner wrote:
> ...
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_log.c      | 34 ++++++++++++++++++++++------------
> >  fs/xfs/xfs_log_priv.h |  3 +++
> >  2 files changed, 25 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index c5e3da23961c..8de93893e0e6 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> ...
> > @@ -2464,9 +2465,18 @@ xlog_write(
> >  		ASSERT(log_offset <= iclog->ic_size - 1);
> >  		ptr = iclog->ic_datap + log_offset;
> >  
> > -		/* start_lsn is the first lsn written to. That's all we need. */
> > -		if (!*start_lsn)
> > +		/*
> > +		 * Start_lsn is the first lsn written to. That's all the caller
> > +		 * needs to have returned. Setting it indicates the first iclog
> > +		 * of a new checkpoint or the commit record for a checkpoint, so
> > +		 * also mark the iclog as requiring a pre-flush to ensure all
> > +		 * metadata writeback or journal IO in the checkpoint is
> > +		 * correctly ordered against this new log write.
> > +		 */
> > +		if (!*start_lsn) {
> >  			*start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > +			iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
> > +		}
> 
> My understanding is that one of the reasons for the preflush per iclog
> approach is that we don't have any submission -> completion ordering
> guarantees across iclogs. This is why we explicitly order commit record
> completions and whatnot, to ensure the important bits are ordered
> correctly. The fact we implement that ordering ourselves suggests that
> PREFLUSH|FUA itself do not provide such ordering, though that's not
> something I've investigated.

PREFLUSH provides ordering between completed IOs and the IO to be
submitted. It does not provide any ordering guarantees against IO
currently in flight, so the application needs to wait for the IOs it
needs to order against to complete before issuing an IO with
PREFLUSH.

i.e. PREFLUSH provides a "many" completion to "single" submission
ordering guarantee on stable storage.

REQ_FUA only guarantees that when the write IO completes, it is on
stable storage. It does not provide ordering guarantees against any
IO in flight, nor IOs submitted while it is in flight. Once it
completes, however, it is guaranteed taht any latter IO submission
will hit stable storage after that IO.

i.e. REQ_FUA provides a "single" completion to "many" submission
ordering guarantee on stable storage.

> In any event, if the purpose fo the PREFLUSH is to ensure that metadata
> in the targeted LSN range is committed to stable storage, and we have no
> submission ordering guarantees across non-commit record iclogs, what
> prevents a subsequent iclog from the same checkpoint from completing
> before the first iclog with a PREFLUSH?

Fair point. I suspect that we should just do an explicit cache flush
before we start the checkpoint, and then we don't have to worry
about REQ_PREFLUSH for the first iclog in the checkpoint at all.

Actually, I wonder if we can pipeline that - submit an async cache
flush bio as soon as we enter the push work, then once we're ready
to call xlog_write() having pulled the hundreds of thousands of log
vectors off the CIL, we wait on the cache flush bio to complete.
THis gets around the first iclog in a long checkpoint requiring 
cache flushing or FUA. It also means that if there is a single
iclog for the checkpoint, we only need a FUA write as the cache
flush has already been done...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux