On Mon, Mar 08, 2021 at 07:25:26AM -0500, Brian Foster wrote: > On Fri, Mar 05, 2021 at 04:11:06PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to > > guarantee the ordering requirements the journal has w.r.t. metadata > > writeback. THe two ordering constraints are: .... > > The rm -rf times are included because I ran them, but the > > differences are largely noise. This workload is largely metadata > > read IO latency bound and the changes to the journal cache flushing > > doesn't really make any noticable difference to behaviour apart from > > a reduction in noiclog events from background CIL pushing. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > Thoughts on my previous feedback to this patch, particularly the locking > bits..? I thought I saw a subsequent patch somewhere that increased the > parallelism of this code.. I seem to have missed that email, too. I guess you are refering to these two hunks: > > @@ -2416,10 +2408,21 @@ xlog_write( > > ASSERT(log_offset <= iclog->ic_size - 1); > > ptr = iclog->ic_datap + log_offset; > > > > - /* start_lsn is the first lsn written to. That's all we need. */ > > + /* Start_lsn is the first lsn written to. */ > > if (start_lsn && !*start_lsn) > > *start_lsn = be64_to_cpu(iclog->ic_header.h_lsn); > > > > + /* > > + * iclogs containing commit records or unmount records need > > + * to issue ordering cache flushes and commit immediately > > + * to stable storage to guarantee journal vs metadata ordering > > + * is correctly maintained in the storage media. > > + */ > > + if (optype & (XLOG_COMMIT_TRANS | XLOG_UNMOUNT_TRANS)) { > > + iclog->ic_flags |= (XLOG_ICL_NEED_FLUSH | > > + XLOG_ICL_NEED_FUA); > > + } > > + > > /* > > * This loop writes out as many regions as can fit in the amount > > * of space which was allocated by xlog_state_get_iclog_space(). > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > > index c04d5d37a3a2..263c8d907221 100644 > > --- a/fs/xfs/xfs_log_cil.c > > +++ b/fs/xfs/xfs_log_cil.c > > @@ -896,11 +896,16 @@ xlog_cil_push_work( > > > > /* > > * If the checkpoint spans multiple iclogs, wait for all previous > > - * iclogs to complete before we submit the commit_iclog. > > + * iclogs to complete before we submit the commit_iclog. If it is in the > > + * same iclog as the start of the checkpoint, then we can skip the iclog > > + * cache flush because there are no other iclogs we need to order > > + * against. > > */ > > if (ctx->start_lsn != commit_lsn) { > > spin_lock(&log->l_icloglock); > > xlog_wait_on_iclog(commit_iclog->ic_prev); > > + } else { > > + commit_iclog->ic_flags &= ~XLOG_ICL_NEED_FLUSH; > > } .... that set/clear the flags on the iclog? Yes, they probably should be atomic. On second thoughts, we can't just clear XLOG_ICL_NEED_FLUSH here because there may be multiple commit records on this iclog and a previous one might require the flush. I'll just remove this optimisation from the patch right now, because it's more complex than it initially seemed. And looking at the aggregated code that I have now (including the stuff I haven't sent out), the need for xlog_write() to set the flush flags on the iclog is gone. THis is because the unmount record flushes the iclog directly itself so it can add flags there, and the iclog that the commit record is written to is returned to the caller. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx