On 05 Mar 2021 at 10:41, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> To allow for iclog IO device cache flush behaviour to be optimised,
> we first need to separate out the commit record iclog IO from the
> rest of the checkpoint so we can wait for the checkpoint IO to
> complete before we issue the commit record.
>
> This separation is only necessary if the commit record is being
> written into a different iclog to the start of the checkpoint as the
> upcoming cache flushing changes requires completion ordering against
> the other iclogs submitted by the checkpoint.
>
> If the entire checkpoint and commit is in the one iclog, then they
> are both covered by the one set of cache flush primitives on the
> iclog and hence there is no need to separate them for ordering.
>
> Otherwise, we need to wait for all the previous iclogs to complete
> so they are ordered correctly and made stable by the REQ_PREFLUSH
> that the commit record iclog IO issues. This guarantees that if a
> reader sees the commit record in the journal, they will also see the
> entire checkpoint that commit record closes off.
>
> This also provides the guarantee that when the commit record IO
> completes, we can safely unpin all the log items in the checkpoint
> so they can be written back because the entire checkpoint is stable
> in the journal.
>

I see that xlog_state_clean_iclog() wakes up tasks waiting on
iclog->ic_force_wait, and that xlog_state_clean_iclog() itself is
invoked after the corresponding iclog has been written to disk and its
log vectors have been moved to the AIL. Hence, using
iclog->ic_force_wait to wait for the previous iclogs to complete their
I/O ensures that the commit record iclog is written to disk only after
the previous iclogs have already been written.

Reviewed-by: Chandan Babu R <chandanrlinux@xxxxxxxxx>

-- 
chandan