On Thu, Sep 12, 2019 at 09:46:06AM -0400, Brian Foster wrote: > On Wed, Sep 11, 2019 at 09:38:58AM +1000, Dave Chinner wrote: > > On Tue, Sep 10, 2019 at 05:56:28AM -0400, Brian Foster wrote: > > > On Mon, Sep 09, 2019 at 09:26:32AM +1000, Dave Chinner wrote: > > > > On Sat, Sep 07, 2019 at 11:10:50AM -0400, Brian Foster wrote: > > > > > This is an instance of xfsaild going idle between the time this > > > > > new AIL push sets the target based on the iclog about to be > > > > > committed and AIL insertion of the associated log items, > > > > > reproduced via a bit of timing instrumentation. Don't be > > > > > distracted by the timestamps or the fact that the LSNs do not > > > > > match because the log items in the AIL end up indexed by the start > > > > > lsn of the CIL checkpoint (whereas last_sync_lsn refers to the > > > > > commit record). The point is simply that xfsaild has completed a > > > > > push of a target that hasn't been inserted yet. > > > > > > > > AFAICT, what you are showing requires delaying of the CIL push to the > > > > point it violates a fundamental assumption about commit sizes, which > > > > is why I largely think it's irrelevant. > > > > > > > > > > The CIL checkpoint size is an unrelated side effect of the test I > > > happened to use, not a fundamental cause of the problem it demonstrates. > > > Fixing CIL checkpoint size issues won't change anything. Here's a > > > different variant of the same problem with a small enough number of log > > > items such that background CIL pushing is not a factor: > > > > > > <...>-79670 [000] ...1 56126.015522: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] > > > kworker/0:1H-220 [000] ...1 56126.030587: __xlog_grant_push_ail: 1596: threshold_lsn 0x1000032e4 > > > ... > > > <...>-81293 [000] ...2 56126.032647: xfs_ail_delete: dev 253:4 lip 00000000cbe82125 old lsn 1/13026 new lsn 1/13026 type XFS_LI_INODE flags IN_AIL > > > <...>-81633 [000] .... 56126.053544: xfsaild: 588: idle ->ail_target 0x1000032e4 > > > kworker/0:1H-220 [000] ...2 56127.038835: xfs_ail_insert: dev 253:4 lip 00000000a44ab1ef old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL > > > kworker/0:1H-220 [000] ...2 56127.038911: xfs_ail_insert: dev 253:4 lip 0000000028d2061f old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL > > > .... > > > > > > This sequence starts with one log item in the AIL and some number of > > > items in the CIL such that a checkpoint executes from the background log > > > worker. The worker forces the CIL and log I/O completion issues an AIL > > > push that is truncated by the recently updated ->l_last_sync_lsn due to > > > outstanding transaction reservation and small AIL size. This push races > > > with completion of a previous push that empties the AIL and iclog > > > callbacks insert log items for the current checkpoint at the LSN target > > > xfsaild just idled at. > > > > I'm just not seeing what the problem here is. The behaviour you are > > describing has been around since day zero and doesn't require the > > addition of an ail push from iclog completion to trigger. Prior to > > this series, it would be: > > > > A few days ago you said that if we're inserting log items before the > push target, "something is very wrong." Since this was what I was > concerned about, I attempted to manufacture the issue to demonstrate. > You suggested the first reproducer I came up with was a separate problem > (related to CIL size issues), so I came up with the one above to avoid > that distraction. Now you're telling me this has always happened and is > fine.. > > While I don't think this is quite accurate (more below), I do find this > reasoning somewhat amusing in that it essentially implies that this > patch itself is dubious. If this new AIL push is required to fix a real > issue, and this race is essentially manifest as implied, then this patch > can't possibly reliably fix the original problem. Anyways, that is > neither here nor there.. > > All of the details of this particular issue aside, I do think there's a > development process problem here. It shouldn't require an extended game > of whack-a-mole with this kind of inconsistent reasoning just to request > a trivial change to a patch (you also implied in a previous response it > was me wasting your time on this topic) that closes an obvious race and > otherwise has no negative effect. Someone is being unreasonable here and > I don't think it's me. More importantly, discussion of open issues > shouldn't be a race against the associated patch being merged. :/ > > > process 1 reservation log completion xfsaild > > <completes metadata IO> > > xfs_ail_delete() > > mlip_changed > > xlog_assign_tail_lsn_locked() > > ail empty, sets l_last_sync = 0x1000032e2 > > xfs_log_space_wake() > > xlog_state_do_callback > > sets CALLBACK > > sets last_sync_lsn to iclog head > > -> 0x1000032e4 > > <drops icloglock, gets preempted> > > <wakes> > > xlog_grant_head_wait > > free_bytes < need_bytes > > xlog_grant_push_ail() > > xlog_push_ail() > > ->ail_target 0x1000032e4 > > <sleeps> > > <wakes> > > sets prev target to 0x1000032e4 > > sees empty AIL > > <sleeps> > > <runs again> > > runs callbacks > > xfs_ail_insert(lsn = 0x1000032e4) > > > > and now we have the AIL push thread asleep with items in it at the > > push threshold. IOWs, what you describe has always been possible, > > and before the CIL was introduced this sort of thing happened quite > > a bit because iclog completions freed up much less space in the log > > than a CIL commit completion. > > > > I was suspicious that this could occur prior to this change but I hadn't > confirmed. The scenario documented above cannot occur because a push on > an empty AIL has no effect. The target doesn't move and the task isn't > woken. That said, I still suspect the race can occur with the current > code via between a grant head waiter, AIL emptying and iclog completion. > > This just speaks to the frequency of the problem, though. I'm not > convinced it's something that happens "quite a bit" given the nature of > the 3-way race. I also don't agree that existence of a historical > problem somehow excuses introduction a new variant of the same problem. > Instead, if this patch exposes a historical problem that simply had no > noticeable impact to this point, we should probably look into whether it > needs fixing too. > > > It's not a problem, however, because if we are out of transaction > > reservation space we must have transactions in progress, and as long > > as they make progress then the commit of each transaction will end > > up calling xlog_ungrant_log_space() to return the unused portion of > > the transaction reservation. That calls xfs_log_space_wake() to > > allow reservation waiters to try to make progress. > > > > Yes, this is why I don't see immediate side effects in the tests I've > run so far. The assumptions you're basing this off are not always true, > however. Particularly on smaller (<= 1GB) filesystems, it's relatively > easy to produce conditions where the entire reservation space is > consumed by open transactions that don't ultimately commit anything to > the log subsystem and thus generate no forward progress. > > > If there's still not enough space reservation after the transaction > > in progress has released it's reservation, then it goes back to > > sleep. As long as we have active transactions in progress while > > there are transaction reservations waiting on reservation space, > > there will be a wakeup vector for the reservation independent of > > the CIL, iclogs and AIL behaviour. > > > > We do have clean transaction cancel and error scenarios, existing log > deadlock vectors, increasing reliance on long running transactions via > deferred ops, scrub, etc. Also consider the fact that open transactions > consume considerably more reservation than committed transactions on > average. > > I'm not saying it's likely for a real world workload to consume the > entirety of log reservation space via open transactions and then release > it without filesystem modification (and then race with log I/O and AIL > emptying), but from the perspective of proving the existence of a bug > it's really not that difficult to produce. I've not seen a real world > workload that reproduces the problems fixed by any of these patches > either, but we still fix them. > > > [ Yes, there was a bug here, in the case xfs_log_space_wake() did > > not issue a wakeup because of not enough space being availble and > > the push target was limited by the old log head location. i.e. > > nothing ever updated the push target to reflect the new log head and > > so the tail might never get moved now. That particular bug was fixed > > by a an earlier patch in the series, so we can ignore it here. ] > > > > IOWs, if the AIL is empty, the CIL cannot consume more than 25% of > > the log space, and we have transactions waiting on log reservation > > space, then we must have enough transactions in progress to cover at > > least 75% of the log space. Completion of those transactions will > > wake waiters and, if necessary, push the AIL again to keep the log > > tail moving appropriately. This handles the AIL empty and "insert > > before target" situations you are concerned about just fine, as long > > as we have a guarantee of forwards progress. Bounding the CIL size > > provides that forwards progress guarantee for the CIL... > > > > I think you have some tunnel vision or something going on here with > regard to the higher level architectural view of how things are supposed > to operate in a normal running/steady state vs simply what can and > cannot happen in the code. I can't really tell why/how, but the only > suggestion I can make is to perhaps separate from this high level view > of things and take a closer look at the code. This is a simple code bug, > not some grand architectural flaw. The context here is way out of whack. > The repeated unrelated and overblown architectural assertions come off > as indication of lack of any real argument to allow this race to live. > There is simply no such guarantee of forward progress in all scenarios > that produce the conditions that can cause this race. > > Yet another example: > > <...>-369 [002] ...2 220.055746: xfs_ail_insert: dev 253:4 lip 00000000ddb123f2 old lsn 0/0 new lsn 1/248 type XFS_LI_INODE flags IN_AIL > <...>-27 [003] ...1 224.753110: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] > <...>-404 [003] ...1 224.775551: __xlog_grant_push_ail: 1596: threshold_lsn 0x1000000fa > kworker/3:1-39 [003] ...2 224.777953: xfs_ail_delete: dev 253:4 lip 00000000ddb123f2 old lsn 1/248 new lsn 1/248 type XFS_LI_INODE flags IN_AIL > xfsaild/dm-4-1034 [000] .... 224.797919: xfsaild: 588: idle ->ail_target 0x1000000fa > kworker/3:1H-404 [003] ...2 225.841198: xfs_ail_insert: dev 253:4 lip 000000006845aeed old lsn 0/0 new lsn 1/250 type XFS_LI_INODE flags IN_AIL > kworker/3:1-39 [003] ...1 254.962822: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] > ... > kworker/3:2-1920 [003] ...1 3759.291275: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] > > > # cat /sys/fs/xfs/dm-4/log/log_*lsn > 1:252 > 1:250 > > This instance of the race uses the same serialization instrumentation to > control execution timing and whatnot as before (i.e. no functional > changes). First, an item is inserted into the AIL. Immediately after AIL > insertion, another transaction commits to the CIL (not shown in the > trace). The background log worker comes around a few seconds later and > forces the log/CIL. The checkpoint for this log force races with an AIL > delete and idle (same as before). AIL insertion occurs at the push > target xfsaild just idled at, but this time reservation pressure > relieves and the filesystem goes idle. > > At this point, nothing occurs on the fs except for continuous background > log worker jobs. Note the timestamp difference between the first > post-race log force and the last in the trace. The log worker runs at > the default 30s interval and has run repeatedly for almost an hour while > failing to push the AIL and subsequently cover the log. To confirm the > AIL is populated, see the log head/tail LSNs reported via sysfs. This > state persists indefinitely so long as the fs is idle. This is a bug. /me stumbles back in after ~2wks, and has a few questions: 1) Are these concerns a reason to hold up this series, or are they a separate bug lurking in the code being touched by the series? AFAICT I think it's the second, but <shrug> my brain is still mush. 2) Er... how do you get the log stuck like this? I see things earlier in the thread like "open transactions that don't ultimately commit anything to the log subsystem" and think "OH, you mean xfs_scrub!" --D > Brian > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@xxxxxxxxxxxxx