Hi folks,

This is a followup to the first set of log fixes for for-next that
were posted here:

https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b

The first two patches of this series are updates for those patches,
change log below. The rest is the fix for the bigger issue we
uncovered in investigating the generic/019 failures, being that
we're triggering a zero-day bug in the way log recovery assigns LSNs
to checkpoints.

The "simple" fix of using the same ordering code as the commit
record for the start records in the CIL push turned into a lot of
patches once I started cleaning it up, separating out all the
different bits and finally realising all the things I needed to
change to avoid unintentional logic/behavioural changes. Hence
there's some code movement, some factoring, API changes to
xlog_write(), changing where we attach callbacks to commit iclogs so
they remain correctly ordered if there are multiple commit records
in the one iclog and then, finally, strictly ordering the start
records....

The original "simple fix" I tested last night ran almost a thousand
cycles of generic/019 without a log hang or recovery failure of any
kind. The refactored patchset has run a couple hundred cycles of
g/019 and g/475 over the last few hours without a failure, so I'm
posting this so we can get a review iteration done while I sleep so
we can - hopefully - get this sorted out before the end of the week.

Cheers,

Dave.

Version 2:
- tested on 5.13-rc6 + linux-xfs/for-next
- added strings for XLOG_STATE* variables to tracepoint output.
- rewrote the past/future iclog detection to use iclog header LSNs
  rather than iclog states as the state values do not tell us
  anything useful about the temporal relativity of the iclog in
  relation to the current commit iclog.
- added patches to strictly order checkpoint start records the same
  way we strictly order checkpoint commit records.
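
To illustrate the past/future iclog detection change listed above, here
is a rough userspace sketch of classifying an iclog by comparing its
header LSN against the header LSN of the current commit iclog. It is not
code from the patchset: lsn_t, make_lsn(), lsn_cmp(), classify_iclog()
and the ICLOG_* values are hypothetical names made up for this example,
and packing the cycle number into the upper 32 bits with the block
number in the lower 32 bits is an assumption about the LSN layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN: cycle in the high 32 bits,
 * block in the low 32 bits (assumed layout, see the note above). */
typedef int64_t lsn_t;

static inline lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

static inline uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static inline uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Order by cycle first, then by block within the same cycle.
 * Returns <0, 0 or >0 like strcmp(). */
static inline int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

enum iclog_rel { ICLOG_PAST, ICLOG_SAME, ICLOG_FUTURE };

/* Decide where another iclog sits in time relative to the commit
 * iclog, using only the two header LSNs and no state values. */
static inline enum iclog_rel classify_iclog(lsn_t other_hdr_lsn,
					     lsn_t commit_hdr_lsn)
{
	int c = lsn_cmp(other_hdr_lsn, commit_hdr_lsn);

	if (c < 0)
		return ICLOG_PAST;
	if (c > 0)
		return ICLOG_FUTURE;
	return ICLOG_SAME;
}

/* Small self-check showing the intended use of the helpers. */
static inline void lsn_selfcheck(void)
{
	assert(classify_iclog(make_lsn(3, 100), make_lsn(3, 200)) == ICLOG_PAST);
	assert(classify_iclog(make_lsn(4, 0), make_lsn(3, 200)) == ICLOG_FUTURE);
}
```

The point of the sketch is only that the comparison depends on the LSNs
alone, so the answer does not change as an iclog moves through its
states.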