Re: [PATCH 7/8 v2] xfs: journal IO cache flush reductions


 



On Wed, Feb 24, 2021 at 05:57:20PM +0530, Chandan Babu R wrote:
> On 23 Feb 2021 at 13:35, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to
> > guarantee the ordering requirements the journal has w.r.t. metadata
> > writeback. The two ordering constraints are:
> >
> > 1. we cannot overwrite metadata in the journal until we guarantee
> > that the dirty metadata has been written back in place and is
> > stable.
> >
> > 2. we cannot write back dirty metadata until it has been written to
> > the journal and guaranteed to be stable (and hence recoverable) in
> > the journal.
> >
> > The ordering guarantees of #1 are provided by REQ_PREFLUSH. This
> > causes the journal IO to issue a cache flush and wait for it to
> > complete before issuing the write IO to the journal. Hence all
> > completed metadata IO is guaranteed to be stable before the journal
> > overwrites the old metadata.
> >
> > The ordering guarantees of #2 are provided by the REQ_FUA, which
> > ensures the journal writes do not complete until they are on stable
> > storage. Hence by the time the last journal IO in a checkpoint
> > completes, we know that the entire checkpoint is on stable storage
> > and we can unpin the dirty metadata and allow it to be written back.
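
A minimal C sketch of the pre-patch behaviour described above, in which every journal IO carries both flags. SKETCH_REQ_PREFLUSH and SKETCH_REQ_FUA are hypothetical userspace stand-ins for the kernel's REQ_PREFLUSH and REQ_FUA, and old_journal_io_flags() and old_checkpoint_flushes() are illustrative helpers, not XFS functions:

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-ins for the kernel's REQ_PREFLUSH and
 * REQ_FUA request flags. The numeric values are arbitrary.
 */
#define SKETCH_REQ_PREFLUSH (1u << 0) /* flush device cache before the write */
#define SKETCH_REQ_FUA      (1u << 1) /* write completes only once stable */

/*
 * Old scheme: every journal IO carries both flags, no matter where it
 * falls within a checkpoint. PREFLUSH gives guarantee #1, FUA gives #2.
 */
static unsigned int old_journal_io_flags(void)
{
	return SKETCH_REQ_PREFLUSH | SKETCH_REQ_FUA;
}

/* Count the cache flushes a checkpoint of nr_iclogs journal writes costs. */
static unsigned int old_checkpoint_flushes(unsigned int nr_iclogs)
{
	unsigned int flushes = 0;

	assert(nr_iclogs > 0);
	for (unsigned int i = 0; i < nr_iclogs; i++)
		if (old_journal_io_flags() & SKETCH_REQ_PREFLUSH)
			flushes++;
	return flushes;
}
```

In this model a checkpoint spanning 1000 iclog writes costs 1000 cache flushes, one per journal IO.
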
> >
> > This is the mechanism by which ordering was first implemented in XFS
> > way back in 2002 by this commit:
> >
> > commit 95d97c36e5155075ba2eb22b17562cfcc53fcf96
> > Author: Steve Lord <lord@xxxxxxx>
> > Date:   Fri May 24 14:30:21 2002 +0000
> >
> >     Add support for drive write cache flushing - should the kernel
> >     have the infrastructure
> >
> > A lot has changed since then, most notably we now use delayed
> > logging to checkpoint the filesystem to the journal rather than
> > write each individual transaction to the journal. Cache flushes on
> > journal IO are necessary when individual transactions are wholly
> > contained within a single iclog. However, CIL checkpoints are single
> > transactions that typically span hundreds to thousands of individual
> > journal writes, and so the requirements for device cache flushing
> > have changed.
> >
> > That is, the ordering rules I state above apply to ordering of
> > atomic transactions recorded in the journal, not to the journal IO
> > itself. Hence we need to ensure metadata is stable before we start
> > writing a new transaction to the journal (guarantee #1), and we need
> > to ensure the entire transaction is stable in the journal before we
> > start metadata writeback (guarantee #2).
> >
> > Hence we only need a REQ_PREFLUSH on the journal IO that starts a
> > new journal transaction to provide #1, and it is not needed on any
> > other journal IO done within the context of that journal transaction.
> >
> > The CIL checkpoint already issues a cache flush before it starts
> > writing to the log, so we no longer need the iclog IO to issue a
> > REQ_PREFLUSH for us. Hence if XLOG_START_TRANS is passed
> > to xlog_write(), we no longer need to mark the first iclog in
> > the log write with REQ_PREFLUSH for this case.
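
A sketch of the reduced flag selection, assuming a checkpoint written as nr_iclogs iclogs where index 0 carries XLOG_START_TRANS and the last index carries the commit record. Setting FUA on the commit iclog is an assumption here, inferred from guarantee #2; the flag constants and the helpers new_iclog_flags() and new_checkpoint_flushes() are hypothetical, not the XFS implementation:

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's REQ_PREFLUSH and REQ_FUA. */
#define SKETCH_REQ_PREFLUSH (1u << 0)
#define SKETCH_REQ_FUA      (1u << 1)

/*
 * Flags for iclog idx of a checkpoint written as nr_iclogs iclogs.
 * idx 0 carries XLOG_START_TRANS; idx nr_iclogs - 1 carries the commit
 * record (both may be the same iclog for a small checkpoint).
 */
static unsigned int new_iclog_flags(unsigned int idx, unsigned int nr_iclogs)
{
	assert(nr_iclogs > 0 && idx < nr_iclogs);

	/*
	 * The start iclog needs no PREFLUSH: the CIL has already flushed the
	 * device cache before writing to the log (guarantee #1). Middle
	 * iclogs need no flags at all.
	 */
	if (idx != nr_iclogs - 1)
		return 0;

	/*
	 * Commit iclog: PREFLUSH makes the earlier iclogs of this checkpoint
	 * stable first; FUA (assumed here) makes the commit record itself
	 * stable before completion, giving guarantee #2.
	 */
	return SKETCH_REQ_PREFLUSH | SKETCH_REQ_FUA;
}

/* Cache flushes per checkpoint: the CIL's own flush plus any preflushes. */
static unsigned int new_checkpoint_flushes(unsigned int nr_iclogs)
{
	unsigned int flushes = 1; /* CIL flush before the first iclog write */

	for (unsigned int i = 0; i < nr_iclogs; i++)
		if (new_iclog_flags(i, nr_iclogs) & SKETCH_REQ_PREFLUSH)
			flushes++;
	return flushes;
}
```

Under these assumptions new_checkpoint_flushes(1000) returns 2 (the CIL's pre-write flush plus the commit iclog's preflush), independent of how many iclogs the checkpoint spans.
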
> >
> > Given the new ordering semantics of commit records for the CIL, we
> > need iclogs containing commit records to issue a REQ_PREFLUSH. We also
> 
> We flush the data device before writing the first iclog (containing
> XLOG_START_TRANS) to the disk. This satisfies the first ordering constraint
> listed above. Why is it required to have another REQ_PREFLUSH when writing the
> iclog containing XLOG_COMMIT_TRANS? I am guessing that it is required to
> make sure that the previous iclogs (belonging to the same checkpoint
> transaction) have indeed been written to the disk.

Yes, that is correct.
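
A toy model of why the commit iclog needs its own REQ_PREFLUSH even though the checkpoint's first iclog does not. It assumes a device with a volatile write cache that loses unflushed writes on power failure, and that the earlier iclog writes have already completed into that cache when the commit iclog is issued. struct sketch_dev and its helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical toy device with a volatile write cache. */
struct sketch_dev {
	bool cached[MAX_BLOCKS]; /* write completed, but only in volatile cache */
	bool stable[MAX_BLOCKS]; /* write is on stable media */
};

/* Cache flush: everything in the volatile cache becomes stable. */
static void dev_flush(struct sketch_dev *d)
{
	for (int i = 0; i < MAX_BLOCKS; i++) {
		if (d->cached[i]) {
			d->stable[i] = true;
			d->cached[i] = false;
		}
	}
}

/*
 * preflush: flush everything already in the cache before this write.
 * fua: this write goes straight to stable media.
 */
static void dev_write(struct sketch_dev *d, int blk, bool preflush, bool fua)
{
	assert(blk >= 0 && blk < MAX_BLOCKS);
	if (preflush)
		dev_flush(d);
	if (fua)
		d->stable[blk] = true;
	else
		d->cached[blk] = true;
}

/* Power loss: anything still in the volatile cache is gone. */
static void dev_power_fail(struct sketch_dev *d)
{
	for (int i = 0; i < MAX_BLOCKS; i++)
		d->cached[i] = false;
}

/*
 * Write nr_body body iclogs (blocks 0..nr_body-1) and then a FUA commit
 * iclog (block nr_body), crash, and report whether recovery would find a
 * commit record for a checkpoint whose body did not survive.
 */
static bool torn_checkpoint_after_crash(int nr_body, bool commit_preflush)
{
	struct sketch_dev d = {0};
	bool body_stable = true;

	assert(nr_body > 0 && nr_body < MAX_BLOCKS);
	for (int i = 0; i < nr_body; i++)
		dev_write(&d, i, false, false);           /* body: no flags */
	dev_write(&d, nr_body, commit_preflush, true); /* commit record */
	dev_power_fail(&d);

	for (int i = 0; i < nr_body; i++)
		body_stable = body_stable && d.stable[i];
	return d.stable[nr_body] && !body_stable;
}
```

Here torn_checkpoint_after_crash(3, false) returns true: the FUA commit record survives the crash while the cached body iclogs do not. With commit_preflush set, the preflush makes the body stable first and the function returns false.
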

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




