Re: [PATCH 2/5] xfs: separate CIL commit record IO

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 1 Feb 2021 12:59:18 +0000

On Thu, Jan 28, 2021 at 03:41:51PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> To allow for iclog IO device cache flush behaviour to be optimised,
> we first need to separate out the commit record iclog IO from the
> rest of the checkpoint so we can wait for the checkpoint IO to
> complete before we issue the commit record.
> 
> This separate is only necessary if the commit record is being

s/separate/separation/g

> written into a different iclog to the start of the checkpoint. If
> the entire checkpoint and commit is in the one iclog, then they are
> both covered by the one set of cache flush primitives on the iclog
> and hence there is no need to separate them.
> 
> Otherwise, we need to wait for all the previous iclogs to complete
> so they are ordered correctly and made stable by the REQ_PREFLUSH
> that the commit record iclog IO issues. This guarantees that if a
> reader sees the commit record in the journal, they will also see the
> entire checkpoint that commit record closes off.
> 
> This also provides the guarantee that when the commit record IO
> completes, we can safely unpin all the log items in the checkpoint
> so they can be written back because the entire checkpoint is stable
> in the journal.

I'm a little worried about the direction for devices without a volatile
write cache, like all high-end enterprise SSDs, arrays and hard drives,
where we now introduce another synchronization point without any gains
from the reduction in FUA/flush traffic, which is a no-op there.
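
To make the trade-off concrete, here is a rough sketch in plain C of the
rule the commit message describes. It is a toy model, not the XFS code:
the struct, its fields and both function names are made up for
illustration, and the "pays off" condition is only my reading of the
concern above.

```c
#include <stdbool.h>

/* Hypothetical model of a checkpoint; not an XFS structure. */
struct checkpoint {
	int start_iclog;	/* iclog holding the start of the checkpoint */
	int commit_iclog;	/* iclog holding the commit record */
};

/*
 * Rule from the commit message: the commit record IO only has to wait
 * for earlier iclogs when it lands in a different iclog than the start
 * of the checkpoint.
 */
static bool must_wait_for_prior_iclogs(const struct checkpoint *cp)
{
	return cp->commit_iclog != cp->start_iclog;
}

/*
 * Assumption for illustration: the extra wait can only pay for itself
 * on a device with a volatile write cache, because cache flushes are
 * no-ops everywhere else.
 */
static bool wait_can_pay_off(const struct checkpoint *cp,
			     bool volatile_write_cache)
{
	return must_wait_for_prior_iclogs(cp) && volatile_write_cache;
}
```

In this model a checkpoint split across iclogs on a device without a
volatile write cache waits (must_wait_for_prior_iclogs() is true) while
wait_can_pay_off() is false, which is the case I am worried about.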