On Thu, Jun 15, 2023 at 11:49:41AM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> We got a NULL pointer dereference issue below while running generic/475
> I/O failure pressure test.

Have you been able to reproduce this failure without the "recheck
checkpoint" series applied?  I have not, so like with the e2fsck bug
fix, I can understand how the bug fix worked, but I still don't
understand why I wasn't seeing it until I tried to apply the "recheck
checkpoint" and the following patches in that patch series.

> If the journal super block had been read and verified, there is no need
> to call bh_read() read it again even if it has been failed to written
> out. So the fix could be simply move buffer_verified(bh) in front of
> bh_read().
>
> Fixes: d9eafe0afafa ("jbd2: factor out journal initialization from journal_get_superblock()")

That works, but it's worth noting that commit d9eafe0afafa caused the
failure by removing the check on j_format_version to determine whether
the superblock was read or not.  If the journal superblock had been
previously read, j_format_version would be either 1 or 2.  If it is
zero, then the superblock has not been read.

So from commit d9eafe0afafa:

 	/* Load journal superblock if it is not loaded yet. */
-	if (journal->j_format_version == 0 &&
-	    journal_get_superblock(journal) != 0)
+	if (journal_get_superblock(journal))
 		return 0;
 	if (!jbd2_format_support_feature(journal))
 		return 0;

The comment "Load journal superblock if it is not loaded yet." should
be removed, since it no longer makes sense once the
"journal->j_format_version == 0" check was removed.

I'll also note that another problem with d9eafe0afafa is that, with the
j_format_version check removed, every time we add a revoke header and
call jbd2_journal_set_features(), we were doing an unconditional read
of the journal superblock, and that unnecessary I/O could slow down
certain workloads.

					- Ted
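
To illustrate the difference the j_format_version guard makes, here is a
minimal user-space sketch. It is not the jbd2 code: struct toy_journal,
toy_get_superblock() and the two toy_set_features_*() helpers are
hypothetical stand-ins, and the "read" is only a counter. It assumes, as
described above, that a format version of 0 means "superblock not loaded
yet" and that a successful load sets it to 1 or 2.

```c
/* Hypothetical stand-in for journal_t; only what the sketch needs. */
struct toy_journal {
	int format_version;	/* 0 = superblock not loaded, else 1 or 2 */
	int sb_reads;		/* number of simulated superblock reads */
};

/* Simulated superblock load: count the read and record version 2. */
static int toy_get_superblock(struct toy_journal *j)
{
	j->sb_reads++;
	j->format_version = 2;
	return 0;		/* 0 = success, as in the kernel helper */
}

/* Guarded variant: load only while format_version is still 0. */
static int toy_set_features_guarded(struct toy_journal *j)
{
	if (j->format_version == 0 && toy_get_superblock(j) != 0)
		return 0;
	return 1;
}

/* Unguarded variant: attempt the load on every call. */
static int toy_set_features_unguarded(struct toy_journal *j)
{
	if (toy_get_superblock(j))
		return 0;
	return 1;
}
```

Calling the guarded helper repeatedly on a fresh toy_journal performs
one simulated read in total; the unguarded helper performs one per
call, which is the extra I/O pattern mentioned above.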