On Thu, Sep 05, 2019 at 06:47:10PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > In the situation where the log is full and the CIL has not recently > flushed, the AIL push threshold is throttled back to the where the > last write of the head of the log was completed. This is stored in > log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space > pinned by flushes and/or aggregation in progress, we can get the > situation where the head of the log lags a long way behind the > reservation grant head. > > When this happens, the AIL push target is trimmed back from where > the reservation grant head wants to push the log tail to, back to > where the head of the log currently is. This means the push target > doesn't reach far enough into the log to actually move the tail > before the transaction reservation goes to sleep. > > When the CIL push completes, it moves the log head forward such that > the AIL push target can now be moved, but that has no mechanism for > puhsing the log tail. Further, if the next tail movement of the log > is not large enough wake the waiter (i.e. still not enough space for > it to have a reservation granted), we don't wake anything up, and > hence we do not update the AIL push target to take into account the > head of the log moving and allowing the push target to be moved > forwards. > > To avoid this particular condition, if we fail to wake the first > waiter on the grant head because we don't have enough space, > push on the AIL again. This will pick up any movement of the log > head and allow the push target to move forward due to completion of > CIL pushing. > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_log.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index b159a9e9fef0..5e21450fb8f5 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -214,15 +214,36 @@ xlog_grant_head_wake( > { > struct xlog_ticket *tic; > int need_bytes; > + bool woken_task = false; > > list_for_each_entry(tic, &head->waiters, t_queue) { > + /* > + * The is a chance that the size of the CIL checkpoints in > + * progress result at the last AIL push resulted in the log head > + * (l_last_sync_lsn) not reflecting where the log head now is. That's a bit difficult to understand... "There is a chance that the size of the CIL checkpoints in progress at the last AIL push results in the last committed log head (l_last_sync_lsn) not reflecting where the log head is now." ? (Did I get that right?) > + * Hence when we are woken here, it may be the head of the log > + * that has moved rather than the tail. In that case, the tail > + * didn't move and there won't be space available until the AIL > + * push target is updated and the tail pushed again. If the AIL > + * is already pushed to it's target, we will hang here until Nit: "its", not "it is". Other than that I think I can tell what this is doing now. :) --D > + * something else updates the AIL push target. > + * > + * Hence if there isn't space to wake the first waiter on the > + * grant head, we need to push the AIL again to ensure the > + * target reflects the current log tail and log head position > + * before we wait for the tail to move again. > + */ > need_bytes = xlog_ticket_reservation(log, head, tic); > - if (*free_bytes < need_bytes) > + if (*free_bytes < need_bytes) { > + if (!woken_task) > + xlog_grant_push_ail(log, need_bytes); > return false; > + } > > *free_bytes -= need_bytes; > trace_xfs_log_grant_wake_up(log, tic); > wake_up_process(tic->t_task); > + woken_task = true; > } > > return true; > -- > 2.23.0.rc1 >