On Wed, Sep 04, 2019 at 03:34:42PM -0400, Brian Foster wrote: > On Wed, Sep 04, 2019 at 02:24:51PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > +/* > > + * Completion of a iclog IO does not imply that a transaction has completed, as > > + * transactions can be large enough to span many iclogs. We cannot change the > > + * tail of the log half way through a transaction as this may be the only > > + * transaction in the log and moving the tail to point to the middle of it > > + * will prevent recovery from finding the start of the transaction. Hence we > > + * should only update the last_sync_lsn if this iclog contains transaction > > + * completion callbacks on it. > > + * > > + * We have to do this before we drop the icloglock to ensure we are the only one > > + * that can update it. > > + * > > + * If we are moving the last_sync_lsn forwards, we also need to ensure we kick > > + * the reservation grant head pushing. This is due to the fact that the push > > + * target is bound by the current last_sync_lsn value. Hence if we have a large > > + * amount of log space bound up in this committing transaction then the > > + * last_sync_lsn value may be the limiting factor preventing tail pushing from > > + * freeing space in the log. Hence once we've updated the last_sync_lsn we > > + * should push the AIL to ensure the push target (and hence the grant head) is > > + * no longer bound by the old log head location and can move forwards and make > > + * progress again. > > + */ > > +static void > > +xlog_state_set_callback( > > + struct xlog *log, > > + struct xlog_in_core *iclog, > > + xfs_lsn_t header_lsn) > > +{ > > + iclog->ic_state = XLOG_STATE_CALLBACK; > > + > > + ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), header_lsn) <= 0); > > + > > + if (list_empty_careful(&iclog->ic_callbacks)) > > + return; > > + > > + atomic64_set(&log->l_last_sync_lsn, header_lsn); > > + xlog_grant_push_ail(log, 0); > > + > > Nit: extra whitespace line above. Fixed. > This still seems racy to me, FWIW. What if the AIL is empty (i.e. the > push is skipped)? If the AIL is empty, then it's a no-op because pushing on the AIL is not going to make more log space become free. Besides, that's not the problem being solved here - reservation wakeups on first insert into the AIL are already handled by xfs_trans_ail_update_bulk() and hence the first patch in the series. This patch is addressing the situation where the bulk insert that occurs from the callbacks that are about to run -does not modify the tail of the log-. i.e. the commit moved the head but not the tail, so we have to update the AIL push target to take into account the new log head.... i.e. the AIL is for moving the tail of the log - this code moves the head of the log. But both impact on the AIL push target (it is based on the distance between the head and tail), so we need to update the push target just in case this commit does not move the tail... > What if xfsaild completes this push before the > associated log items land in the AIL or we race with xfsaild emptying > the AIL? Why not just reuse/update the existing grant head wake up logic > in the iclog callback itself? E.g., something like the following > (untested): > > @@ -740,12 +740,10 @@ xfs_trans_ail_update_bulk( > if (mlip_changed) { > if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount)) > xlog_assign_tail_lsn_locked(ailp->ail_mount); > - spin_unlock(&ailp->ail_lock); > - > - xfs_log_space_wake(ailp->ail_mount); > - } else { > - spin_unlock(&ailp->ail_lock); > } > + > + spin_unlock(&ailp->ail_lock); > + xfs_log_space_wake(ailp->ail_mount); Two things that I see straight away: 1. if the AIL is empty, the first insert does not set mlip_changed = true and and so there will be no wakeup in the scenario you are posing. This would be easy to fix - if (!mlip || changed) - so that a wakeup is triggered in this case. 2. if we have not moved the tail, then calling xfs_log_space_wake() will, at best, just burn CPU. At worst, it wll cause hundreds of thousands of spurious wakeups a seconds because the waiting transaction reservation will be woken continuously when there isn't space available and there hasn't been any space made available. So, from #1 we see that unconditional wakeups are not necessary in the scenario you pose, and from #2 it's not a viable solution even if it was required. However, #1 indicates other problems if a xfs_log_space_wake() call is necessary in this case. No reservations space and an empty AIL implies that the CIL pins the entire log - a pending commit that hasn't finished flushing and the current context that is aggregating. This implies we've violated a much more important rule of the on-disk log format: finding the head and tail of the log requires no individual commit be larger than 50% of the log. So if we are actually stalling on trasnaction reservations with an empty AIL and an uncommitted CIL, screwing around with tail pushing wakeups does not address the bigger problem being seen... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx