On Sat 22-02-25 14:10:22, Ojaswin Mujoo wrote:
> Presently we always BUG_ON if trying to start a transaction on a journal
> marked with JBD2_UNMOUNT, since this should never happen. However, while
> running stress tests it was observed that in some error handling paths
> it is possible for update_super_work to start a transaction after the
> journal is destroyed, e.g.:
> 
> (umount)
> ext4_kill_sb
>   kill_block_super
>     generic_shutdown_super
>       sync_filesystem /* commits all txns */
>       evict_inodes
>       /* might start a new txn */
>       ext4_put_super
>         flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>         jbd2_journal_destroy
>           journal_kill_thread
>             journal->j_flags |= JBD2_UNMOUNT;
>           jbd2_journal_commit_transaction
>             jbd2_journal_get_descriptor_buffer
>               jbd2_journal_bmap
>                 ext4_journal_bmap
>                   ext4_map_blocks
>                     ...
>                     ext4_inode_error
>                       ext4_handle_error
>                         schedule_work(&sbi->s_sb_upd_work)
> 
> /* work queue kicks in */
> update_super_work
>   jbd2_journal_start
>     start_this_handle
>       BUG_ON(journal->j_flags &
>              JBD2_UNMOUNT)
> 
> Hence, make sure we only defer the update of the ext4 sb if the sb is
> still active. Otherwise, just fall back to an un-journaled commit.
> 
> The important thing to note here is that we must only defer the sb
> update if we have not yet flushed the s_sb_upd_work queue in the umount
> path, else this race can be hit (point 1 below). Since we don't have a
> direct way to check for that, we use SB_ACTIVE instead. The SB_ACTIVE
> check is a bit subtle, so adding some notes below for future reference:
> 
> 1. Ideally we would want something like (flags & JBD2_UNMOUNT) == 0,
> however this is not correct since we could end up scheduling work after
> it has been flushed:
> 
> ext4_put_super
>   flush_work(&sbi->s_sb_upd_work)
> 
>                                 **kjournald2**
>                                 jbd2_journal_commit_transaction
>                                 ...
>                                 ext4_inode_error
>                                 /* JBD2_UNMOUNT not set */
>                                 schedule_work(s_sb_upd_work)
> 
>   jbd2_journal_destroy
>     journal->j_flags |= JBD2_UNMOUNT;
> 
>                                 **workqueue**
>                                 update_super_work
>                                   jbd2_journal_start
>                                     start_this_handle
>                                       BUG_ON(JBD2_UNMOUNT)
> 
> Something like the above doesn't happen with the SB_ACTIVE check because
> we are sure that the workqueue would be flushed at a later point if we
> are in the umount path.
> 
> 2. We don't need a similar check in ext4_grp_locked_error since it is
> only called from mballoc and AFAICT it would always be valid to schedule
> work there.
> 
> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> Reported-by: Mahesh Kumar <maheshkumar657g@xxxxxxxxx>
> Suggested-by: Ritesh Harjani <ritesh.list@xxxxxxxxx>
> Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>

Good catch! But I think the solution will have to be slightly different.
Basing the check on SB_ACTIVE has the problem that you can have racing
updates of the sb from the still-running transaction and from your
direct update, leading to inconsistencies after a crash (that was the
reason why we created s_sb_upd_work in the first place). I would solve
this by implementing something like ext4_update_sb_destroy_journal(),
which will set a flag in the sbi, flush the workqueue, and then destroy
the journal. ext4_handle_error() will then check for that sbi flag.

								Honza

> ---
>  fs/ext4/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a963ffda692a..b7341e9acf62 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>  	 * constraints, it may not be safe to do it right here so we
>  	 * defer superblock flushing to a workqueue.
>  	 */
> -	if (continue_fs && journal)
> +	if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
>  		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
>  	else
>  		ext4_commit_super(sb);
> -- 
> 2.48.1
> 

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
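PS: A rough, uncompiled sketch of what I mean. Only the name
ext4_update_sb_destroy_journal() comes from the suggestion above; the
flag name EXT4_FLAGS_SB_UPD_DISABLED, the use of s_ext4_flags, and the
exact call site are my assumptions and the real patch may differ:

	/* Hypothetical: called from ext4_put_super() in place of the
	 * current flush_work() + jbd2_journal_destroy() sequence. */
	static void ext4_update_sb_destroy_journal(struct super_block *sb)
	{
		struct ext4_sb_info *sbi = EXT4_SB(sb);

		/* Hypothetical flag: from now on ext4_handle_error() must
		 * update the sb directly instead of deferring to the
		 * workqueue, since the journal is about to go away. */
		set_bit(EXT4_FLAGS_SB_UPD_DISABLED, &sbi->s_ext4_flags);

		/* Push out any update that was scheduled before the flag
		 * was set, while the journal is still alive... */
		flush_work(&sbi->s_sb_upd_work);

		/* ...and only then tear the journal down. */
		jbd2_journal_destroy(sbi->s_journal);
		sbi->s_journal = NULL;
	}

and in ext4_handle_error(), instead of the SB_ACTIVE test:

	if (continue_fs && journal &&
	    !test_bit(EXT4_FLAGS_SB_UPD_DISABLED, &EXT4_SB(sb)->s_ext4_flags))
		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
	else
		ext4_commit_super(sb);

This closes the race because the flag is set before the flush: no new
work can be scheduled after flush_work() starts draining, and any error
hit later falls back to ext4_commit_super() rather than racing a
journaled update.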