On Wed, 2024-08-28 at 12:54 +0200, Jan Kara wrote:
> On Mon 26-08-24 09:32:08, Theodore Ts'o wrote:
> > On Mon, Aug 26, 2024 at 01:22:54AM -0300, Vinicius Peixoto wrote:
> > > Since the disk data is bogus, journal_reset fails with -EINVAL
> > > ("JBD2: Journal too short (blocks 2-1024)"); this leaves
> > > journal->j_head == NULL. However, jbd2_journal_load clears the
> > > JBD2_ABORT flag right before calling journal_reset. This leads
> > > to a problem later when ocfs2_mount_volume tries to flush the
> > > journal as part of the cleanup when aborting the mount operation:
> > >
> > >   -> ocfs2_mount_volume (error; goto out_system_inodes)
> > >     -> ocfs2_journal_shutdown
> > >       -> jbd2_journal_flush
> > >         -> jbd2_cleanup_journal_tail (J_ASSERT fails)
> > >           ...
> > >
> > > I confirmed that setting the JBD2_ABORT flag in journal_reset
> > > before returning -EINVAL fixes the problem:
> > >
> > >   static int journal_reset(journal_t *journal)
> > >   	journal_fail_superblock(journal);
> > >   +	journal->j_flags |= JBD2_ABORT;
> > >   	return -EINVAL;
> > >
> > > You can find a proper patch file + the syzbot re-test result in
> > > [1]. However, I'm not entirely sure whether this is the correct
> > > decision, and I wanted to confirm that this is an appropriate
> > > solution before sending a proper patch to the mailing list.
> >
> > The reason why this isn't an issue with ext4 is that the normal
> > "right" way to do this, if jbd2_journal_load() returns an error, is
> > to call jbd2_journal_destroy() to release the data structures
> > associated with the journal --- and then not try to use the journal
> > afterwards.
>
> Yep. OCFS2 guys are actually looking into fixing this in OCFS2
> (https://lore.kernel.org/ocfs2-devel/20240819131120.746077-1-sunjunchao2870@xxxxxxxxx/)
> Not sure what's the status though. Julian, did you send v2 of your
> fix?
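(For reference, the ext4-style pattern Ted describes looks roughly like the
untested sketch below; the error path and variable names are illustrative,
not taken from the actual ext4 or ocfs2 source:)

	err = jbd2_journal_load(journal);
	if (err) {
		/*
		 * Release jbd2's data structures. jbd2_journal_destroy()
		 * also iput()s the journal inode, so the journal must not
		 * be flushed or otherwise touched after this point.
		 */
		jbd2_journal_destroy(journal);
		journal = NULL;
		return err;
	}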
Yeah, I have already submitted v2 of the patch[1] and it is awaiting
review.

[1]: https://lore.kernel.org/ocfs2-devel/20240823083150.17590-1-sunjunchao2870@xxxxxxxxx/

> 								Honza
> >
> > The weird thing is that there are two code paths in ocfs2 that call
> > jbd2_journal_load(). One is in ocfs2_replay_journal(), which does
> > what ext4 does. The other is ocfs2_load_journal(), which does *not*
> > do this, and this is the one which you saw in the syzkaller
> > reproducer. It looks like one code path is used to replay the
> > journal for some other node, and the other is to load (and
> > presumably later, replay) the journal for the mounting node.
> >
> > There are also some other things which look very confusing, such as
> > the fact that ocfs2_journal_shutdown calls igrab:
> >
> > 	/* need to inc inode use count - jbd2_journal_destroy will iput. */
> > 	if (!igrab(inode))
> > 		BUG();
> >
> > ... which is *weird*. Normally the journal inode refcount is
> > expected to be bumped before calling jbd2_journal_init_inode() ---
> > which normally is done by an iget() function, and is needed to make
> > sure the journal inode isn't released out from under the jbd2
> > layer. It looks like ocfs2 uses the journal inode for other stuff,
> > so it gets the journal inode without calling something like iget().
> > Which is OK, I guess, but it means that there are a whole bunch of
> > places where it has to call BUG_ON(!igrab(journal->j_inode)) before
> > calling jbd2_journal_destroy(). It would seem like the *right*
> > thing to do is to bump the refcount in ocfs2_journal_init(), and if
> > for some reason the igrab fails, it can just return an error early,
> > instead of having to resort to BUG_ON() later; and if you don't
> > realize that you have to do this weird igrab() before calling
> > jbd2_journal_destroy(), you'll end up leaking the journal inode.
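(A rough, untested sketch of the restructuring Ted suggests --- taking the
reference up front in ocfs2_journal_init() where failure can be reported
cleanly. The error codes and placement here are hypothetical, not a claim
about what the actual ocfs2 code does:)

	journal_t *j;

	/* Take the journal inode reference early; on failure we can
	 * just return an error instead of BUG_ON()ing later. */
	if (!igrab(inode))
		return -ESTALE;		/* illustrative error code */

	j = jbd2_journal_init_inode(inode);
	if (!j) {
		iput(inode);		/* drop the reference we took */
		return -EINVAL;
	}

With that in place, jbd2_journal_destroy() can always be called on the
error and shutdown paths without the extra igrab() dance, since the jbd2
layer owns one reference and will iput() it itself.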
> >
> > Anyway, I think the right thing to do is to restructure how ocfs2
> > calls the jbd2 journal, but that's going to require a more careful
> > examination of the code paths. Your patch is a bit of a blunt
> > force hack that should be harmless, but it's probably not the best
> > way to fix the problem, although doing it "right" would be more
> > complicated....
> >
> > 					- Ted

Thanks,
--
Julian Sun <sunjunchao2870@xxxxxxxxx>