On Fri, Jul 23, 2021 at 07:45:39AM +1000, Dave Chinner wrote: > On Thu, Jul 22, 2021 at 11:14:45AM -0700, Darrick J. Wong wrote: > > On Thu, Jul 22, 2021 at 11:53:32AM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > The recent journal flush/FUA changes replaced the flushing of the > > > data device on every iclog write with an up-front async data device > > > cache flush. Unfortunately, the assumption of which this was based > > > on has been proven incorrect by the flush vs log tail update > > > ordering issue. As the fix for that issue uses the > > > XLOG_ICL_NEED_FLUSH flag to indicate that data device needs a cache > > > flush, we now need to (once again) ensure that an iclog write to > > > external logs that need a cache flush to be issued actually issue a > > > cache flush to the data device as well as the log device. > > > > > > Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_log.c | 19 +++++++++++-------- > > > 1 file changed, 11 insertions(+), 8 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > > index 96434cc4df6e..a3c4d48195d9 100644 > > > --- a/fs/xfs/xfs_log.c > > > +++ b/fs/xfs/xfs_log.c > > > @@ -827,13 +827,6 @@ xlog_write_unmount_record( > > > /* account for space used by record data */ > > > ticket->t_curr_res -= sizeof(ulf); > > > > > > - /* > > > - * For external log devices, we need to flush the data device cache > > > - * first to ensure all metadata writeback is on stable storage before we > > > - * stamp the tail LSN into the unmount record. > > > - */ > > > - if (log->l_targ != log->l_mp->m_ddev_targp) > > > - blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev); > > > return xlog_write(log, &vec, ticket, NULL, NULL, XLOG_UNMOUNT_TRANS); > > > } > > > > > > @@ -1796,10 +1789,20 @@ xlog_write_iclog( > > > * metadata writeback and causing priority inversions. > > > */ > > > iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_IDLE; > > > - if (iclog->ic_flags & XLOG_ICL_NEED_FLUSH) > > > + if (iclog->ic_flags & XLOG_ICL_NEED_FLUSH) { > > > iclog->ic_bio.bi_opf |= REQ_PREFLUSH; > > > + /* > > > + * For external log devices, we also need to flush the data > > > + * device cache first to ensure all metadata writeback covered > > > + * by the LSN in this iclog is on stable storage. This is slow, > > > + * but it *must* complete before we issue the external log IO. > > > > I'm a little confused about what's going on here. We're about to write > > a log record to disk, with h_tail_lsn reflecting the tail of the log and > > h_lsn reflecting the current head of the log (i.e. this record). > > > > If the log tail has moved forward since the last log record was written > > and this fs has an external log, we need to flush the data device > > because the AIL could have written logged items back into the filesystem > > and we need to ensure those items have been persisted before we write to > > the log the fact that the tail moved forward. The AIL itself doesn't > > issue cache flushes (nor does it need to), so that's why we do that > > here. > > > > Why don't we need a flush like this if only FUA is set? Is it not > > possible to write a checkpoint that fits within a single iclog after the > > log tail has moved forward? > > Yes, it is, and that is the race condition is exactly what the next > patch in the series addresses. If the log tail moves after the data > device cache flush was issued before we started writing the > checkpoint to the iclogs, then we detect that when releasing the > commit iclog and set the XLOG_ICL_NEED_FLUSH flag on it. That will > then trigger this code to issue a data device cache flush.... Aha, yeah, I noticed that after scanning the next few patches. > IOWs, for external logs, the XLOG_ICL_NEED_FLUSH flag indicates that > both the data device and the log device need a cache flush, rather > than just the log device. I think it could be split into two flags, > but then my head explodes thinking about log forces and trying to > determine what type of flush is implied (and what flags we'd need to > set) when we return log_flushed = true.... Maybe later when we're not focussed on recovery failures. In the meantime, I'm satisfied enough to Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx