> As a result:
>
> 	logbsize	fsmark create rate	rm -rf
> before	32kb		152851+/-5.3e+04	5m28s
> patched	32kb		221533+/-1.1e+04	5m24s
>
> before	256kb		220239+/-6.2e+03	4m58s
> patched	256kb		228286+/-9.2e+03	5m06s
>
> The rm -rf times are included because I ran them, but the
> differences are largely noise. This workload is largely metadata
> read IO latency bound and the changes to the journal cache flushing
> doesn't really make any noticable difference to behaviour apart from
> a reduction in noiclog events from background CIL pushing.

The 256kb rm -rf case actually seems like a regression not in the noise
here.  Does this reproduce over multiple runs?

> @@ -2009,13 +2010,14 @@ xlog_sync(
>  	 * synchronously here; for an internal log we can simply use the block
>  	 * layer state machine for preflushes.
>  	 */
> -	if (log->l_targ != log->l_mp->m_ddev_targp || split) {
> +	if (log->l_targ != log->l_mp->m_ddev_targp ||
> +	    (split && (iclog->ic_flags & XLOG_ICL_NEED_FLUSH))) {
>  		xfs_flush_bdev(log->l_mp->m_ddev_targp->bt_bdev);
> -		need_flush = false;
> +		iclog->ic_flags &= ~XLOG_ICL_NEED_FLUSH;

Once you touch all the buffer flags anyway we should optimize the
log wraparound case here - instead of the synchronous flush we just
need to set REQ_PREFLUSH on the first log bio, which should be nicely
doable with your infrastructure.

> +	/*
> +	 * iclogs containing commit records or unmount records need
> +	 * to issue ordering cache flushes and commit immediately
> +	 * to stable storage to guarantee journal vs metadata ordering
> +	 * is correctly maintained in the storage media.
> +	 */
> +	if (optype & (XLOG_COMMIT_TRANS | XLOG_UNMOUNT_TRANS)) {
> +		iclog->ic_flags |= (XLOG_ICL_NEED_FLUSH |
> +						XLOG_ICL_NEED_FUA);
> +	}
> +
>  	/*
>  	 * This loop writes out as many regions as can fit in the amount
>  	 * of space which was allocated by xlog_state_get_iclog_space().
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 4093d2d0db7c..370da7c2bfc8 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -894,10 +894,15 @@ xlog_cil_push_work(
>  
>  	/*
>  	 * If the checkpoint spans multiple iclogs, wait for all previous
> -	 * iclogs to complete before we submit the commit_iclog.
> +	 * iclogs to complete before we submit the commit_iclog. If it is in the
> +	 * same iclog as the start of the checkpoint, then we can skip the iclog
> +	 * cache flush because there are no other iclogs we need to order
> +	 * against.

Nit: the iclogs in the first changed line would easily fit onto the
previous line.
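
To make the log wraparound suggestion further up a bit more concrete,
here is a rough userspace sketch of the flag handling. It is not kernel
code and not a proposed patch: struct model_iclog, the M_* constants and
both helpers are hypothetical stand-ins with made-up values, and putting
FUA on both halves of a split write is my assumption rather than
something stated in the patch or the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel flags; the values are made up. */
#define M_COMMIT_TRANS		(1u << 0)	/* models XLOG_COMMIT_TRANS */
#define M_UNMOUNT_TRANS		(1u << 1)	/* models XLOG_UNMOUNT_TRANS */
#define M_ICL_NEED_FLUSH	(1u << 0)	/* models XLOG_ICL_NEED_FLUSH */
#define M_ICL_NEED_FUA		(1u << 1)	/* models XLOG_ICL_NEED_FUA */
#define M_REQ_PREFLUSH		(1u << 0)	/* models REQ_PREFLUSH */
#define M_REQ_FUA		(1u << 1)	/* models REQ_FUA */

/* Hypothetical, much reduced model of an iclog: only its flags. */
struct model_iclog {
	uint32_t	flags;
};

/*
 * Mirrors the quoted hunk: commit and unmount records mark the iclog as
 * needing both a cache flush and a FUA write.
 */
static void model_mark_iclog(struct model_iclog *ic, uint32_t optype)
{
	assert(ic);
	if (optype & (M_COMMIT_TRANS | M_UNMOUNT_TRANS))
		ic->flags |= M_ICL_NEED_FLUSH | M_ICL_NEED_FUA;
}

/*
 * Sketch of the suggested wraparound handling for an internal log: a
 * split iclog is written as two bios, and only the first one carries the
 * preflush. Returns the number of bios and fills in their flags.
 * Assumption: FUA, when needed, is applied to every bio of the write.
 */
static int model_bio_flags(const struct model_iclog *ic, bool split,
			   uint32_t bio_flags[2])
{
	int	nbios = split ? 2 : 1;
	int	i;

	assert(ic && bio_flags);
	bio_flags[0] = 0;
	bio_flags[1] = 0;

	for (i = 0; i < nbios; i++) {
		if (ic->flags & M_ICL_NEED_FUA)
			bio_flags[i] |= M_REQ_FUA;
	}
	if (ic->flags & M_ICL_NEED_FLUSH)
		bio_flags[0] |= M_REQ_PREFLUSH;

	return nbios;
}
```

The model leaves out the external log case on purpose: there the data
device is a different block device, so a preflush on the log bio would
not cover it and the explicit flush in the quoted hunk stays.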