On Thu, Sep 05, 2019 at 08:18:28AM -0700, Darrick J. Wong wrote: > On Thu, Sep 05, 2019 at 06:47:10PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > In the situation where the log is full and the CIL has not recently > > flushed, the AIL push threshold is throttled back to the where the > > last write of the head of the log was completed. This is stored in > > log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space > > pinned by flushes and/or aggregation in progress, we can get the > > situation where the head of the log lags a long way behind the > > reservation grant head. > > > > When this happens, the AIL push target is trimmed back from where > > the reservation grant head wants to push the log tail to, back to > > where the head of the log currently is. This means the push target > > doesn't reach far enough into the log to actually move the tail > > before the transaction reservation goes to sleep. > > > > When the CIL push completes, it moves the log head forward such that > > the AIL push target can now be moved, but that has no mechanism for > > puhsing the log tail. Further, if the next tail movement of the log > > is not large enough wake the waiter (i.e. still not enough space for > > it to have a reservation granted), we don't wake anything up, and > > hence we do not update the AIL push target to take into account the > > head of the log moving and allowing the push target to be moved > > forwards. > > > > To avoid this particular condition, if we fail to wake the first > > waiter on the grant head because we don't have enough space, > > push on the AIL again. This will pick up any movement of the log > > head and allow the push target to move forward due to completion of > > CIL pushing. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_log.c | 23 ++++++++++++++++++++++- > > 1 file changed, 22 insertions(+), 1 deletion(-) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index b159a9e9fef0..5e21450fb8f5 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -214,15 +214,36 @@ xlog_grant_head_wake( > > { > > struct xlog_ticket *tic; > > int need_bytes; > > + bool woken_task = false; > > > > list_for_each_entry(tic, &head->waiters, t_queue) { > > + /* > > + * The is a chance that the size of the CIL checkpoints in > > + * progress result at the last AIL push resulted in the log head > > + * (l_last_sync_lsn) not reflecting where the log head now is. > > That's a bit difficult to understand... Yup I failed to edit it properly and left an extra "result" in the sentence... > "There is a chance that the size of the CIL checkpoints in progress at > the last AIL push results in the last committed log head (l_last_sync_lsn) > not reflecting where the log head is now." ? > > (Did I get that right?) *nod*. > > + * Hence when we are woken here, it may be the head of the log > > + * that has moved rather than the tail. In that case, the tail > > + * didn't move and there won't be space available until the AIL > > + * push target is updated and the tail pushed again. If the AIL > > + * is already pushed to it's target, we will hang here until > > Nit: "its", not "it is". > > Other than that I think I can tell what this is doing now. :) In reading this again, it is all a bit clunky. I've revised it a bit more to more concisely describe the situation: /* * The is a chance that the size of the CIL checkpoints in * progress at the last AIL push target calculation resulted in * limiting the target to the log head (l_last_sync_lsn) at the * time. This may not reflect where the log head is now as the * CIL checkpoints may have completed. * * Hence when we are woken here, it may be that the head of the * log that has moved rather than the tail. As the tail didn't * move, there still won't be space available for the * reservation we require. However, if the AIL has already * pushed to the target defined by the old log head location, we * will hang here waiting for something else to update the AIL * push target. * * Therefore, if there isn't space to wake the first waiter on * the grant head, we need to push the AIL again to ensure the * target reflects both the current log tail and log head * position before we wait for the tail to move again. */ Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx