On Mon, Sep 09, 2019 at 09:26:32AM +1000, Dave Chinner wrote: > On Sat, Sep 07, 2019 at 11:10:50AM -0400, Brian Foster wrote: > > On Fri, Sep 06, 2019 at 09:10:14AM -0400, Brian Foster wrote: > > > On Fri, Sep 06, 2019 at 10:02:05AM +1000, Dave Chinner wrote: > > > > > And the raciness concerns..? AFAICT this still opens a race window where > > > > > the AIL can idle on the push target before AIL insertion. > > > > > > > > I don't know what race you see - if the AIL completes a push before > > > > we insert new objects at the head from the current commit, then it > > > > does not matter one bit because the items are being inserted at the > > > > log head, not the log tail where the pushing occurs at. If we are > > > > inserting objects into the AIL within the push target window, then > > > > there is something else very wrong going on, because when the log is > > > > out of space the push target should be nowhere near the LSN we are > > > > inserting inew objects into the AIL at. (i.e. they should be 3/4s of > > > > the log apart...) > > > > > > > > > > I'm not following your reasoning. It sounds to me that you're arguing it > > > doesn't matter that the AIL is not populated from the current commit > > > because the push target should be much farther behind the head. If > > > that's the case, why does this patch order the AIL push after a > > > ->l_last_sync_lsn update? That's the LSN of the most recent commit > > > record to hit the log and hence the new physical log head. > > > > > > Side note: I think the LSN of the commit record iclog is different and > > > actually ahead of the LSN associated with AIL insertion. I don't > > > necessarily think that's a problem given how the log subsystem behaves > > > today, but it's another subtle/undocumented (and easily avoidable) quirk > > > that may not always be so benign. > > > > > > > Just to put a finer point on this (and since this seems to be the only > > way I can get you to consider nontrivial feedback to your patches): > > If I can't make head or tail of the problem you are describing, > exactly how am I supposed to respond? If I'm unable to get my point > across, I'd much prefer to spend my time on patches than on going > around in circles. I'm not interested in winning arguments. I'm not > interested in spending lots of time discussing theoretical problems > with the current set of fixes that don't exist once the root cause > we've already identified is fixed. My time is much better spent > fixing that root cause... > > Keep in mind that I also have a lot of different, complex things > going on at once that all require total focus while I'm looking at > them, so it can take days for me to cycle through everything and get > back to past topics. Delay doesn't mean I haven't read your response > or taken it on board, it just means I don't have time to write a > -meaingful response- straight away. > > > kworker/0:1H-220 [000] ...1 3869.403829: xlog_state_do_callback: 2691: l_last_sync_lsn 0x15000021f6 > > kworker/0:1H-220 [000] ...1 3869.403864: xfs_ail_push: 639: ail_target 0x15000021f6 > > Which implies that the log has less than 25% of space free because > we've issued a push, and that the distance we push is bound by the > log head. > > > <...>-215246 [002] ...1 3875.568561: xfsaild: 403: empty (target 0x15000021f6) > > <...>-215246 [002] .... 3875.568649: xfsaild: 589: idle > > has an empty AIL. IOWs, you are creating the situation where the CIL > has not been allowed to run and hence has violated the >50% log size > limit on transactions. This goes away once we prevent the CIL from > doing this. > > > kworker/0:1H-220 [000] ...1 3889.843872: xfs_trans_ail_update_bulk: 746: inserted lsn 0x1500001bf6 > > Ok, so what you see here is somewhat intentional, based on the fact > that the LSN used for items is different to the LSN used for the > commit record (start of commit vs end of commit). We don't want to > push the currently commiting items instantly to disk as that defeats > the "delayed write" behaviour the AIL uses to allow efficient > relogging to occur. > > The next commit will do a similar push during with the new > l_last_sync_lsn, which causes the target to point at the new > last_sync_lsn and so all the items in the AIL from the previous > commit that haven't been relogged (pinned) in the current commit > will get pushed. i.e. commit N will cause commit (N - 1) to get > pushed. > > This will continue while we are in a situation where the current log > head location is limiting the push target and we are completely out > of log reservation space. Once we get to the point where the > physical head of the log is more than 25% of the log away from the > tail, the push target will stop being limited by the l_last_sync_lsn > and we'll go back to triggering push target updates via the tail of > the log moving forwards as we currently do. IOWs, this "log head > pushing" behaviour is likely only necessary for the first 2-3 CIL > commits of a workload, then we fall back into the normal tail > pushing scenario. > > > This is an instance of xfsaild going idle between the time this > > new AIL push sets the target based on the iclog about to be > > committed and AIL insertion of the associated log items, > > reproduced via a bit of timing instrumentation. Don't be > > distracted by the timestamps or the fact that the LSNs do not > > match because the log items in the AIL end up indexed by the start > > lsn of the CIL checkpoint (whereas last_sync_lsn refers to the > > commit record). The point is simply that xfsaild has completed a > > push of a target that hasn't been inserted yet. > > AFAICT, what you are showing requires delaying of the CIL push to the > point it violates a fundamental assumption about commit sizes, which > is why I largely think it's irrelevant. > The CIL checkpoint size is an unrelated side effect of the test I happened to use, not a fundamental cause of the problem it demonstrates. Fixing CIL checkpoint size issues won't change anything. Here's a different variant of the same problem with a small enough number of log items such that background CIL pushing is not a factor: <...>-79670 [000] ...1 56126.015522: xfs_log_force: dev 253:4 lsn 0x0 caller xfs_log_worker+0x2f/0xf0 [xfs] kworker/0:1H-220 [000] ...1 56126.030587: __xlog_grant_push_ail: 1596: threshold_lsn 0x1000032e4 ... <...>-81293 [000] ...2 56126.032647: xfs_ail_delete: dev 253:4 lip 00000000cbe82125 old lsn 1/13026 new lsn 1/13026 type XFS_LI_INODE flags IN_AIL <...>-81633 [000] .... 56126.053544: xfsaild: 588: idle ->ail_target 0x1000032e4 kworker/0:1H-220 [000] ...2 56127.038835: xfs_ail_insert: dev 253:4 lip 00000000a44ab1ef old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL kworker/0:1H-220 [000] ...2 56127.038911: xfs_ail_insert: dev 253:4 lip 0000000028d2061f old lsn 0/0 new lsn 1/13028 type XFS_LI_INODE flags IN_AIL .... This sequence starts with one log item in the AIL and some number of items in the CIL such that a checkpoint executes from the background log worker. The worker forces the CIL and log I/O completion issues an AIL push that is truncated by the recently updated ->l_last_sync_lsn due to outstanding transaction reservation and small AIL size. This push races with completion of a previous push that empties the AIL and iclog callbacks insert log items for the current checkpoint at the LSN target xfsaild just idled at. Brian > > A couple additional notes.. I don't see further side effects in the > > variant I reproduced, I suspect because we have other wakeups that > > squash this transient state created by the race, > > Right, if we do run out of log space, the log reservation tail > pushing mechanisms takes over and does the right thing. > > > > > i.e. we changed the unlinked inode processing in a way that > > > > the kernel can now runs tens of thousands of unlink transactions > > > > without yeilding the CPU. That violated the "CIL push work will run > > > > within a few transactions of the background push occurring" > > > > mechanism the workqueue provided us with and that, fundamentally, is > > > > the underlying issue here. It's not a CIL vs empty AIL vs log > > > > reservation exhaustion race condition - that's just an observable > > > > symptom. > > > > > > > > > > Yes, but the point is that's not the only thing that can delay CIL push > > > work. Since the AIL is not populated until the commit record iclog is > > > written out, and background CIL pushing doesn't flush the commit record > > > for the associated checkpoint before it completes, and CIL pushing > > > itself is serialized, a stalled commit record iclog I/O is enough to > > > create "log full, empty AIL" conditions. > > CIL pushing is not actually serialised. Ordered, yes, serialised, > no. ANd stalling an iclog with a commit record should not cause the > log to fill completely - the next CIL push when it overflows should > get it moving long before the log runs out of reservation space. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx