On 2023/6/15 4:37, Theodore Ts'o wrote: > On Wed, Jun 14, 2023 at 09:25:28PM +0800, Zhang Yi wrote: >> >> Sorry about the regression, I found that this issue is not introduced >> by the first patch in this patch series ("jbd2: recheck chechpointing >> non-dirty buffer"), is d9eafe0afafa ("jbd2: factor out journal >> initialization from journal_get_superblock()") [1] on your dev branch. >> >> The problem is the journal super block had been failed to write out >> due to IO fault, it's uptodate bit was cleared by >> end_buffer_write_syn() and didn't reset yet in jbd2_write_superblock(). >> And it raced by jbd2_journal_revoke()->jbd2_journal_set_features()-> >> jbd2_journal_check_used_features()->journal_get_superblock()->bh_read(), >> unfortunately, the read IO is also fail, so the error handling in >> journal_fail_superblock() clear the journal->j_sb_buffer, finally lead >> to above NULL pointer dereference issue. > > Thanks for looking into this. What I believe you are saying is that > the root cause is that earlier patch, but it is still something about > the patch "jbd2: recheck chechpointing non-dirty buffer" which is > changing the timing enough that we're hitting this buffer (because > without the "recheck checkpointing" patch, I'm not seeing the NULL > pointer dereference. > > As far as the e2fsck bug that was causing it to hang in the ext4/adv > test scenario, the patch was a simple one, and I have also checked in > a test case which was a reliable reproducer of the problem. (See > attached for the patches and more detail.) > > It is really interesting that "recheck checkpointing" patch is making > enough of a difference that it is unmasking these bugs. If you could > take a look at these changes and perhaps think about how this patch > series could be changing the nature of the corruption (specifically, > how symlink inodes referenced from inline directories could be > corupted with "rechecking checkpointing", thus unmasking the > e2fsprogs, I'd really appreciate it. > Hello Ted. I have found the root cause of which trigger the e2fsck bug, it's a real kernel bug which was introduced in 5b849b5f96b4 ("jbd2: fast commit recovery path") when merging fast commit feature. The problem is that when fast commit is enabled, kernel doesn't replay the journal completely at mount time (there a bug in do_one_pass(), see below for details) if the unrecovered transactions loop back to the head of the normal journal area. If the missing transactions contain a symlink block revoke record and a reuse record of the same block, and the reuse record have been write back to the disk before it's last umount, it could trigger the "Symlink xxx is invalid" after the incomplete journal replay. Meanwhile it the symlink belongs to a inline directory, it will trigger the e2fsck bug. For example, we have a not cleaned image below. | normal journal area | fast commit area | +-------------------------------------------------+------------------+ | tX+1(rere half)|tX+2|...| tX | tX+1(front half) | .... | +-------------------------------------------------+------------------+ / / / / (real head) s_start journal->j_last journal->j_fc_last Transaction X(checkpointed): - Create symlink 'foo' (use block A, contain 500 length of link name) in inline directory 'dir'; Transaction X+1(uncheckpoint): - Remove symlink 'foo' (block A is also freed); Transaction X+2(uncheckpoint): - Create symlink 'bar' (reuse block A, contain 400 length of link name). The above transactions are recorded to the journal, block A is also successfully written back by the background write-back process. If fast_commit feature is enabled, the journal->j_last point to the first unused block behind the normal journal area instead of the whole log area, and the journal->j_fc_last point to the first unused block behind the fast_commit journal area. While doing journal recovery, do_one_pass(PASS_SCAN) should first scan tX+1 in the normal journal area and turn around to the first block once it meet journal->j_last, but the wrap() macro misuse the journal->j_fc_last, so it could not read tX+1's commit block, the recovery quit early mistakenly and missing tX+1 and tX+2. So, after we mount the filesystem, we could leftover an invalid symlink 'foo' in the inline directory and trigger issue. I can reproduce this issue either with or without this patch series ("jbd2: recheck chechpointing non-dirty buffer"), sometimes it takes longer, sometimes it takes less. It's easy to reproduce the "Symlink xxx is invalid" issue, but it's a little hard to trigger the e2fsck bug (which means the invalid symlink should in a inline dir). Especially I could 100% reproduce the fast_commit recovery bug when running generic/475. So I couldn't find any relations between this issue and this patch series. I've send a patch to fix the fast commit bug separately[1]. I cannot reproduce this issue again with this patch, please take a look at that. [1] https://lore.kernel.org/linux-ext4/20230626073322.3956567-1-yi.zhang@xxxxxxxxxxxxxxx/T/#u Thanks, Yi.