On Tue, Sep 10, 2019 at 05:56:28AM -0400, Brian Foster wrote: > On Mon, Sep 09, 2019 at 09:26:32AM +1000, Dave Chinner wrote: > > On Sat, Sep 07, 2019 at 11:10:50AM -0400, Brian Foster wrote: > > > This is an instance of xfsaild going idle between the time this > > > new AIL push sets the target based on the iclog about to be > > > committed and AIL insertion of the associated log items, > > > reproduced via a bit of timing instrumentation. Don't be > > > distracted by the timestamps or the fact that the LSNs do not > > > match because the log items in the AIL end up indexed by the start > > > lsn of the CIL checkpoint (whereas last_sync_lsn refers to the > > > commit record). The point is simply that xfsaild has completed a > > > push of a target that hasn't been inserted yet. > > > > AFAICT, what you are showing requires delaying of the CIL push to the > > point it violates a fundamental assumption about commit sizes, which > > is why I largely think it's irrelevant. > > > > The CIL checkpoint size is an unrelated side effect of the test I > happened to use, not a fundamental cause of the problem it demonstrates. > Fixing CIL checkpoint size issues won't change anything. Here's a > different variant of the same problem with a small enough number of log > items such that background CIL pushing is not a factor: > > <...>-79670 [000] ...1 56126.015522: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] > kworker/0:1H-220 [000] ...1 56126.030587: __xlog_grant_push_ail: 1596: threshold_lsn 0x1000032e4 > ... > <...>-81293 [000] ...2 56126.032647: xfs_ail_delete: dev 253:4 lip 00000000cbe82125 old lsn 1/13026 new lsn 1/13026 type XFS_LI_INODE flags IN_AIL > <...>-81633 [000] .... 56126.053544: xfsaild: 588: idle ->ail_target 0x1000032e4 > kworker/0:1H-220 [000] ...2 56127.038835: xfs_ail_insert: dev 253:4 lip 00000000a44ab1ef old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL > kworker/0:1H-220 [000] ...2 56127.038911: xfs_ail_insert: dev 253:4 lip 0000000028d2061f old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL > .... > > This sequence starts with one log item in the AIL and some number of > items in the CIL such that a checkpoint executes from the background log > worker. The worker forces the CIL and log I/O completion issues an AIL > push that is truncated by the recently updated ->l_last_sync_lsn due to > outstanding transaction reservation and small AIL size. This push races > with completion of a previous push that empties the AIL and iclog > callbacks insert log items for the current checkpoint at the LSN target > xfsaild just idled at. I'm just not seeing what the problem here is. The behaviour you are describing has been around since day zero and doesn't require the addition of an ail push from iclog completion to trigger. Prior to this series, it would be: process 1 reservation log completion xfsaild <completes metadata IO> xfs_ail_delete() mlip_changed xlog_assign_tail_lsn_locked() ail empty, sets l_last_sync = 0x1000032e2 xfs_log_space_wake() xlog_state_do_callback sets CALLBACK sets last_sync_lsn to iclog head -> 0x1000032e4 <drops icloglock, gets preempted> <wakes> xlog_grant_head_wait free_bytes < need_bytes xlog_grant_push_ail() xlog_push_ail() ->ail_target 0x1000032e4 <sleeps> <wakes> sets prev target to 0x1000032e4 sees empty AIL <sleeps> <runs again> runs callbacks xfs_ail_insert(lsn = 0x1000032e4) and now we have the AIL push thread asleep with items in it at the push threshold. IOWs, what you describe has always been possible, and before the CIL was introduced this sort of thing happened quite a bit because iclog completions freed up much less space in the log than a CIL commit completion. It's not a problem, however, because if we are out of transaction reservation space we must have transactions in progress, and as long as they make progress then the commit of each transaction will end up calling xlog_ungrant_log_space() to return the unused portion of the transaction reservation. That calls xfs_log_space_wake() to allow reservation waiters to try to make progress. If there's still not enough space reservation after the transaction in progress has released it's reservation, then it goes back to sleep. As long as we have active transactions in progress while there are transaction reservations waiting on reservation space, there will be a wakeup vector for the reservation independent of the CIL, iclogs and AIL behaviour. [ Yes, there was a bug here, in the case xfs_log_space_wake() did not issue a wakeup because of not enough space being availble and the push target was limited by the old log head location. i.e. nothing ever updated the push target to reflect the new log head and so the tail might never get moved now. That particular bug was fixed by a an earlier patch in the series, so we can ignore it here. ] IOWs, if the AIL is empty, the CIL cannot consume more than 25% of the log space, and we have transactions waiting on log reservation space, then we must have enough transactions in progress to cover at least 75% of the log space. Completion of those transactions will wake waiters and, if necessary, push the AIL again to keep the log tail moving appropriately. This handles the AIL empty and "insert before target" situations you are concerned about just fine, as long as we have a guarantee of forwards progress. Bounding the CIL size provides that forwards progress guarantee for the CIL... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx