On Thu, Jun 17, 2021 at 10:49:10AM -0700, Darrick J. Wong wrote: > On Thu, Jun 17, 2021 at 06:26:11PM +1000, Dave Chinner wrote: > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > > index 705619e9dab4..2fb0ab02dda3 100644 > > --- a/fs/xfs/xfs_log_cil.c > > +++ b/fs/xfs/xfs_log_cil.c > > @@ -1075,15 +1075,54 @@ xlog_cil_push_work( > > ticket = ctx->ticket; > > > > /* > > - * If the checkpoint spans multiple iclogs, wait for all previous > > - * iclogs to complete before we submit the commit_iclog. In this case, > > - * the commit_iclog write needs to issue a pre-flush so that the > > - * ordering is correctly preserved down to stable storage. > > + * If the checkpoint spans multiple iclogs, wait for all previous iclogs > > + * to complete before we submit the commit_iclog. We can't use state > > + * checks for this - ACTIVE can be either a past completed iclog or a > > + * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a > > + * past or future iclog awaiting IO or ordered IO completion to be run. > > + * In the latter case, if it's a future iclog and we wait on it, the we > > + * will hang because it won't get processed through to ic_force_wait > > + * wakeup until this commit_iclog is written to disk. Hence we use the > > + * iclog header lsn and compare it to the commit lsn to determine if we > > + * need to wait on iclogs or not. > > */ > > spin_lock(&log->l_icloglock); > > if (ctx->start_lsn != commit_lsn) { > > - xlog_wait_on_iclog(commit_iclog->ic_prev); > > - spin_lock(&log->l_icloglock); > > + struct xlog_in_core *iclog; > > + > > + for (iclog = commit_iclog->ic_prev; > > + iclog != commit_iclog; > > + iclog = iclog->ic_prev) { > > + xfs_lsn_t hlsn; > > + > > + /* > > + * If the LSN of the iclog is zero or in the future it > > + * means it has passed through IO completion and > > + * activation and hence all previous iclogs have also > > + * done so. We do not need to wait at all in this case. > > + */ > > + hlsn = be64_to_cpu(iclog->ic_header.h_lsn); > > + if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0) > > + break; > > + > > + /* > > + * If the LSN of the iclog is older than the commit lsn, > > + * we have to wait on it. Waiting on this via the > > + * ic_force_wait should also order the completion of all > > + * older iclogs, too, but we leave checking that to the > > + * next loop iteration. > > + */ > > + ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0); > > + xlog_wait_on_iclog(iclog); > > + spin_lock(&log->l_icloglock); > > The presence of a loop here confuses me a bit -- we really only need to > check and wait on commit->ic_prev since xlog_wait_on_iclog waits for > both the iclog that it is given as well as all previous iclogs, right? I originally wrote this thinking about using the ic_write_wait queue which would require checking all iclogs in the ring because the completion signalled at the DONE_SYNC state is not ordered against other iclogs. Hence I had planned to walk all the iclogs. THen I realised that checking the LSN could tell us past/future and so we only needed to wait on the first iclog with a LSN less than the commit iclog. ANd so I left the loop in place to ensure that, even if my assertion about the ring aging order was incorrect, this code would Do The Right Thing. > we've waited on commit->ic_prev, the next iclog iterated (i.e. > commit->ic_prev->ic_prev) should break out of the loop? Yes, that is what it does. I can strip this all out - it was really just being defensive because I wanted to make sure things were working as I expected them to be working... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx