On Wed 15-01-25 13:00:23, Heming Zhao wrote: > Hello Jan, > > On 1/15/25 09:32, Liebes Wang wrote: > > The bisection log shows the first cause commit is a09decff5c32060639a685581c380f51b14e1fc2: > > a09decff5c32 jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal > > > > The full bisection log is attached. Hope this helps. > > This bisearch commit a09decff5c32 appears to be the root cause > of this issue. It fixed one issue but introduced another. > > Syzbot tested the patch with calling jbd2_journal_wipe() with 'write=1'. > The Syzbot test result [1] shows that the same WARN_ON() is triggered > in a subsequent routine – the classic whack-a-mole! > > Back to commit a09decff5c32, it opened a door to allow jbd2 to update > sb regardless of whether the value of sb items are correct. > > To fix a09decff5c32, it seems that jbd2 needs to add more sanity check > codes in a sub-routine of jbd2_journal_load(). > > btw, in my view, this is a jbd2 issue not ocfs2/ext4 issue. > > [1]: https://lore.kernel.org/ocfs2-devel/04a9ad29-51de-4b50-a5bb-56f91817639d@xxxxxxxx/T/#m86d01f83d808868bb5e6548d30f79b4f9f889b13 Thanks for debugging this! So I'm not 100% convinced this is only jbd2 bug because jbd2_journal_recover() was never intended to be called after jbd2_journal_skip_recovery() (called from jbd2_journal_wipe()). You're supposed to call either jbd2_journal_wipe() or jbd2_journal_recover() but not both. So IMO this needs fixing in OCFS2 code. That being said you've also pointed at one bug in jbd2 code - the WARN_ON(!sb->s_sequence) in jbd2_journal_update_sb_log_tail() is indeed wrong. We were inconsistent inside jbd2 whether TID 0 is considered valid or not and relatively recently we've decided to accept TID 0 as valid but this place was left out. I'll send a fix for that. Honza > > Heming Zhao <heming.zhao@xxxxxxxx <mailto:heming.zhao@xxxxxxxx>> 于2025年1月14日周二 22:51写道: > > > > Hi Ted, > > > > On 1/14/25 21:38, Theodore Ts'o wrote: > > > On Tue, Jan 14, 2025 at 02:25:21PM +0800, Heming Zhao wrote: > > >> > > >> The root cause appears to be that the jbd2 bypass recovery logic > > >> is incorrect. > > > > > > Heming, thanks for taking a look. > > > > > > I'm not convinced the root cause is what you've stated. When > > > jbd2_journal_wipe() calls jbd2_mark_journal_empty(), s_start gets set > > > to zero: > > > > Actually, ocfs2 calls jbd2_journal_wipe() with 'write=0' (hard coded), > > so jbd2_mark_journal_empty() isn't called during the ocfs2 mount > > phase. This means the following deduction won't apply in this case. > > > > -- Heming > > > > > > > > sb->s_start = cpu_to_be32(0); > > > > > > This then gets checked in jbd2_journal_recovery: > > > > > > if (!sb->s_start) { > > > jbd2_debug(1, "No recovery required, last transaction %d, head block %u\n", > > > be32_to_cpu(sb->s_sequence), be32_to_cpu(sb->s_head)); > > > journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1; > > > journal->j_head = be32_to_cpu(sb->s_head); > > > return 0; > > > } > > > > > > I suspect that there is something else wrong with jbd2's superblock, > > > since this normally works in the absence of malicious fs image > > > fuzzing, such that when jbd2_journal_load() calls reset_journal() > > > after jbd2_journal_recover() correctly bypasses recovery, the WARN_ON > > > gets triggered. > > > > > > I'd suggest that you enable jbd2 debugging so we can see all of the > > > jbd2_debug() message to understand what might be going on. > > > > > > By the way, given that this is only a WARN_ON, and it involves > > > malicious image fuzzing, this is probably a valid jbd2 bug, but it's > > > not actually a security bug. Sure, someone silly enough to pick up a > > > maliciously corrupted USB thumb drive dropped in a parking lot and > > > insert it into their desktop, and the distribution is silly enoough to > > > allow automount, the worse that can happen is that the system to > > > reboot if the system is configured to panic on a WARNING. So feel > > > free to prioritize your investigation appropriately. :-) > > > > > > Cheers, > > > > > > - Ted > > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR