On Wed, 2024-08-28 at 12:54 +0200, Jan Kara wrote:
> On Mon 26-08-24 09:32:08, Theodore Ts'o wrote:
> > On Mon, Aug 26, 2024 at 01:22:54AM -0300, Vinicius Peixoto wrote:
> > > Since the disk data is bogus, journal_reset fails with -EINVAL
> > > ("JBD2: Journal too short (blocks 2-1024)"); this leaves
> > > journal->j_head == NULL. However, jbd2_journal_load clears the
> > > JBD2_ABORT flag right before calling journal_reset. This leads
> > > to a problem later when ocfs2_mount_volume tries to flush the
> > > journal as part of the cleanup when aborting the mount operation:
> > >
> > >   -> ocfs2_mount_volume (error; goto out_system_inodes)
> > >     -> ocfs2_journal_shutdown
> > >       -> jbd2_journal_flush
> > >         -> jbd2_cleanup_journal_tail (J_ASSERT fails)
> > >           ...
> > >
> > > I confirmed that setting the JBD2_ABORT flag in journal_reset
> > > before returning -EINVAL fixes the problem:
> > >
> > >   static int journal_reset(journal_t *journal)
> > >   	journal_fail_superblock(journal);
> > >   +	journal->j_flags |= JBD2_ABORT;
> > >   	return -EINVAL;
> > >
> > > You can find a proper patch file + the syzbot re-test result in
> > > [1]. However, I'm not entirely sure whether this is the correct
> > > decision, and I wanted to confirm that this is an appropriate
> > > solution before sending a proper patch to the mailing list.
> >
> > The reason why this isn't an issue with ext4 is that the normal
> > "right" way to do this, if jbd2_journal_load() returns an error, is
> > to call jbd2_journal_destroy() to release the data structures
> > associated with the journal --- and then not try to use the journal
> > afterwards.
>
> Yep. OCFS2 guys are actually looking into fixing this in OCFS2
> (https://lore.kernel.org/ocfs2-devel/20240819131120.746077-1-sunjunchao2870@xxxxxxxxx/)
> Not sure what's the status though. Julian, did you send v2 of your
> fix?
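(For reference, the ext4-style pattern Ted describes looks roughly like the
untested sketch below; the error path and variable names are illustrative,
not taken from the actual ext4 or ocfs2 source:)

	err = jbd2_journal_load(journal);
	if (err) {
		/*
		 * Release jbd2's data structures. jbd2_journal_destroy()
		 * also iput()s the journal inode, so the journal must not
		 * be flushed or otherwise touched after this point.
		 */
		jbd2_journal_destroy(journal);
		journal = NULL;
		return err;
	}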
Yeah, I have already submitted v2 of the patch[1] and it is awaiting
review.

[1]: https://lore.kernel.org/ocfs2-devel/20240823083150.17590-1-sunjunchao2870@xxxxxxxxx/

> 								Honza
> >
> > The weird thing is that there are two code paths in ocfs2 that call
> > jbd2_journal_load(). One is in ocfs2_replay_journal(), which does
> > what ext4 does. The other is ocfs2_load_journal(), which does *not*
> > do this, and this is the one which you saw in the syzkaller
> > reproducer. It looks like one code path is used to replay the
> > journal for some other node, and the other is to load (and
> > presumably later, replay) the journal for the mounting node.
> >
> > There are also some other things which look very confusing, such as
> > the fact that ocfs2_journal_shutdown calls igrab:
> >
> > 	/* need to inc inode use count - jbd2_journal_destroy will iput. */
> > 	if (!igrab(inode))
> > 		BUG();
> >
> > ... which is *weird*. Normally the journal inode refcount is
> > expected to be bumped before calling jbd2_journal_init_inode() ---
> > which normally is done by an iget() function, and is needed to make
> > sure the journal inode isn't released out from under the jbd2
> > layer. It looks like ocfs2 uses the journal inode for other stuff,
> > so it gets the journal inode without calling something like iget().
> > Which is OK, I guess, but it means that there are a whole bunch of
> > places where it has to call BUG_ON(!igrab(journal->j_inode)) before
> > calling jbd2_journal_destroy(). It would seem like the *right*
> > thing to do is to bump the refcount in ocfs2_journal_init(), and if
> > for some reason the igrab fails, it can just return an error early,
> > instead of having to resort to BUG_ON() later; and if you don't
> > realize that you have to do this weird igrab() before calling
> > jbd2_journal_destroy(), you'll end up leaking the journal inode.
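(A rough, untested sketch of the restructuring Ted suggests --- taking the
reference up front in ocfs2_journal_init() where failure can be reported
cleanly. The error codes and placement here are hypothetical, not a claim
about what the actual ocfs2 code does:)

	journal_t *j;

	/* Take the journal inode reference early; on failure we can
	 * just return an error instead of BUG_ON()ing later. */
	if (!igrab(inode))
		return -ESTALE;		/* illustrative error code */

	j = jbd2_journal_init_inode(inode);
	if (!j) {
		iput(inode);		/* drop the reference we took */
		return -EINVAL;
	}

With that in place, jbd2_journal_destroy() can always be called on the
error and shutdown paths without the extra igrab() dance, since the jbd2
layer owns one reference and will iput() it itself.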
> >
> > Anyway, I think the right thing to do is to restructure how ocfs2
> > calls the jbd2 journal, but that's going to require a more careful
> > examination of the code paths. Your patch is a bit of a blunt
> > force hack that should be harmless, but it's probably not the best
> > way to fix the problem, although doing it "right" would be more
> > complicated....
> >
> > 					- Ted

Thanks,
--
Julian Sun <sunjunchao2870@xxxxxxxxx>