Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail

Vinicius Peixoto <vpeixoto@xxxxxxxxxx> · Mon, 26 Aug 2024 01:22:54 -0300

Hi all,

I noticed this syzbot report while going through the preliminary
tasks for the Linux Kernel Mentorship Program, and thought I'd take a
stab at solving it. I apologize in advance for any mistakes, as I'm still
very new to kernel development. Anyway, here's my analysis:

From what I can tell from the syzbot reproducer, it mounts a file
filled with bogus data as an ocfs2 volume. This trips an assertion
(J_ASSERT) in jbd2_cleanup_journal_tail, which in turn causes a kernel
BUG.

The problematic call stack goes roughly like this:

mount_bdev
  -> ocfs2_mount_volume
    -> ocfs2_check_volume
      -> ocfs2_journal_load
        -> jbd2_journal_load
          -> journal_reset (fails)

Since the disk data is bogus, journal_reset fails with -EINVAL ("JBD2:
Journal too short (blocks 2-1024)") before it initializes the journal
head, so journal->j_head is left at 0 (it is a block number, not a
pointer). However, jbd2_journal_load clears the JBD2_ABORT flag right
before calling journal_reset, so the failed journal is not marked as
aborted. This becomes a problem later, when ocfs2_mount_volume tries to
flush the journal as part of its cleanup while aborting the mount:

  -> ocfs2_mount_volume (error; goto out_system_inodes)
    -> ocfs2_journal_shutdown
      -> jbd2_journal_flush
        -> jbd2_cleanup_journal_tail (J_ASSERT fails)

The failure comes from this check in jbd2_cleanup_journal_tail:

        if (is_journal_aborted(journal))
                return -EIO;

        if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
                return 1;
        J_ASSERT(blocknr != 0);

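Since JBD2_ABORT was cleared earlier in jbd2_journal_load,
is_journal_aborted() returns false and we call jbd2_journal_get_log_tail,
which takes the branch that sets *blocknr = journal->j_head (still 0).
It must also return nonzero here, since otherwise we would return 1
before reaching the assertion; so J_ASSERT(blocknr != 0) fails and the
kernel BUGs.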

I confirmed that setting the JBD2_ABORT flag in journal_reset before it
returns -EINVAL avoids the crash, since jbd2_cleanup_journal_tail then
bails out with -EIO before reaching the assertion:

        static int journal_reset(journal_t *journal)
                        journal_fail_superblock(journal);
        +               journal->j_flags |= JBD2_ABORT;
                        return -EINVAL;

You can find a full patch file and the syzbot re-test result at [1].
However, I'm not entirely sure this is the right fix, so I wanted to
confirm that it is an appropriate solution before sending a proper patch
to the mailing list.

Thanks in advance,
Vinicius

[1] https://syzkaller.appspot.com/bug?extid=8512f3dbd96253ffbe27