On Tue, Feb 25, 2025 at 09:53:10AM +0800, Baokun Li wrote:
> On 2025/2/22 16:40, Ojaswin Mujoo wrote:
> > Presently we always BUG_ON if trying to start a transaction on a journal
> > marked with JBD2_UNMOUNT, since this should never happen. However while
> > running stress tests it was observed that in case of some error handling
> > paths, it is possible for update_super_work to start a transaction after
> > the journal is destroyed eg:
> >
> > (umount)
> > ext4_kill_sb
> >   kill_block_super
> >     generic_shutdown_super
> >       sync_filesystem /* commits all txns */
> >       evict_inodes
> >         /* might start a new txn */
> >       ext4_put_super
> >         flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
> >         jbd2_journal_destroy
> >           journal_kill_thread
> >             journal->j_flags |= JBD2_UNMOUNT;
> >           jbd2_journal_commit_transaction
> >             jbd2_journal_get_descriptor_buffer
> >               jbd2_journal_bmap
> >                 ext4_journal_bmap
> >                   ext4_map_blocks
> >                     ...
> >                     ext4_inode_error
> Just curious, since jbd2_journal_bmap() only queries the map and does not
> create it, how does it fail here? Is there more information in dmesg?
> Is s_journal_inum normal after file system corruption?

Hey Baokun,

So I dug a bit more into the vmcore.
The error information in sbi looks like this:

  s_add_error_count = 1,
  s_first_error_code = 117,
  s_first_error_line = 475,
  s_first_error_ino = 0,
  s_first_error_block = 0,
  s_first_error_func = 0xc0080000055300d0 <__func__.6> "ext4_read_block_bitmap_nowait",
  s_first_error_time = 1737023235,

  s_last_error_code = 117,
  s_last_error_line = 609,
  s_last_error_ino = 8,
  s_last_error_block = 783,
  s_last_error_func = 0xc008000005531b10 <__func__.41> "ext4_map_blocks",
  s_last_error_time = 1737023236,

The first error is here:

          if ((bitmap_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
  474         (bitmap_blk >= ext4_blocks_count(sbi->s_es))) {
* 475             ext4_error(sb, "Invalid block bitmap block %llu in "
  476                        "block_group %u", bitmap_blk, block_group);
  477             ext4_mark_group_bitmap_corrupted(sb, block_group,
  478                             EXT4_GROUP_INFO_BBITMAP_CORRUPT);
  479             return ERR_PTR(-EFSCORRUPTED);
  480     }

and the last error is here:

  608     if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
* 609             ret = check_block_validity(inode, map);
  610             if (ret != 0)
  611                     return ret;
  612     }

And indeed we have the traces of the first error in dmesg:

[75284.713463] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
[75284.713470] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
[75284.713476] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0

However, the last error seems strange. It seems like check_block_validity
should ideally never fail for a journal inode. Unfortunately, sbi->s_es
page is not recorded in the crash dump for some reason so idk the exact
value at the time of the check, but looking in journal->j_inode->i_ino,
the inode num is 8, which seems fine to me. So yeah, I'm a bit unsure
what caused the corruption.
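
For anyone decoding these fields by hand, here is a minimal userspace sketch.
It assumes that error code 117 is EUCLEAN on Linux, which ext4 aliases as
EFSCORRUPTED (consistent with the ERR_PTR(-EFSCORRUPTED) return above), and
that the s_*_error_time values are seconds since the Unix epoch.
MODEL_EFSCORRUPTED and the helper names are made up for illustration; they
are not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumption: 117 == EUCLEAN on Linux, reported by ext4 as EFSCORRUPTED. */
#define MODEL_EFSCORRUPTED 117

/* True if a recorded s_*_error_code denotes on-disk corruption. */
static bool is_corruption_code(int code)
{
	return code == MODEL_EFSCORRUPTED;
}

/* Seconds elapsed between the first and the last recorded error. */
static int64_t error_window_seconds(int64_t first_time, int64_t last_time)
{
	return last_time - first_time;
}

/*
 * Format an s_*_error_time value as a UTC timestamp.
 * Returns the number of characters written, or 0 on failure.
 * Uses gmtime(), so this sketch is not thread-safe.
 */
static size_t format_error_time(int64_t secs, char *buf, size_t len)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);

	if (tm == NULL)
		return 0;
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm);
}
```

With the values from the dump, error_window_seconds(1737023235, 1737023236)
is 1, so both errors were recorded within about a second of each other, and
both carry the same corruption code.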
I'll look a bit more into the proc01 ltp to see if we can recreate the
failure to get more info.

>
> Thanks,
> Baokun
> >                       ext4_handle_error
> >                         schedule_work(&sbi->s_sb_upd_work)
> >
> >                                                /* work queue kicks in */
> >                                                update_super_work
> >                                                  jbd2_journal_start
> >                                                    start_this_handle
> >                                                      BUG_ON(journal->j_flags &
> >                                                             JBD2_UNMOUNT)
> >
> > Hence, make sure we only defer the update of ext4 sb if the sb is still
> > active. Otherwise, just fallback to an un-journaled commit.
> >
> > The important thing to note here is that we must only defer sb update if
> > we have not yet flushed the s_sb_update_work queue in umount path else
> > this race can be hit (point 1 below). Since we don't have a direct way
> > to check for that we use SB_ACTIVE instead. The SB_ACTIVE check is a bit
> > subtle so adding some notes below for future reference:
> >
> > 1. Ideally we would want to have a something like (flags & JBD2_UNMOUNT
> > == 0) however this is not correct since we could end up scheduling work
> > after it has been flushed:
> >
> > ext4_put_super
> >   flush_work(&sbi->s_sb_upd_work)
> >
> >                            **kjournald2**
> >                            jbd2_journal_commit_transaction
> >                            ...
> >                            ext4_inode_error
> >                              /* JBD2_UNMOUNT not set */
> >                              schedule_work(s_sb_upd_work)
> >
> > jbd2_journal_destroy
> >   journal->j_flags |= JBD2_UNMOUNT;
> >
> >                                       **workqueue**
> >                                       update_super_work
> >                                         jbd2_journal_start
> >                                           start_this_handle
> >                                             BUG_ON(JBD2_UNMOUNT)
> >
> > Something like the above doesn't happen with SB_ACTIVE check because we
> > are sure that the workqueue would be flushed at a later point if we are
> > in the umount path.
> >
> > 2. We don't need a similar check in ext4_grp_locked_error since it is
> > only called from mballoc and AFAICT it would be always valid to schedule
> > work here.
> >
> > Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> > Reported-by: Mahesh Kumar <maheshkumar657g@xxxxxxxxx>
> > Suggested-by: Ritesh Harjani <ritesh.list@xxxxxxxxx>
> > Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> > ---
> >  fs/ext4/super.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index a963ffda692a..b7341e9acf62 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
> >                  * constraints, it may not be safe to do it right here so we
> >                  * defer superblock flushing to a workqueue.
> >                  */
> > -               if (continue_fs && journal)
> > +               if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
> >                         schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
> >                 else
> >                         ext4_commit_super(sb);
> >
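
To make the reasoning in note 1 of the quoted patch easier to follow, the
interleaving can be replayed in a small userspace model. This is only a
sketch under stated assumptions, not kernel code: struct umount_model, enum
defer_policy and both functions are hypothetical names, and the model assumes
that SB_ACTIVE has already been cleared by the time ext4_put_super flushes
the workqueue, which is what the patch description relies on.

```c
#include <stdbool.h>

/* Userspace model only; none of these types exist in the kernel. */
enum defer_policy {
	POLICY_CHECK_JBD2_UNMOUNT,	/* the rejected idea from note 1 */
	POLICY_CHECK_SB_ACTIVE,		/* the check the patch adds */
};

struct umount_model {
	bool sb_active;		/* models sb->s_flags & SB_ACTIVE */
	bool jbd2_unmount;	/* models journal->j_flags & JBD2_UNMOUNT */
	bool work_pending;	/* models a queued s_sb_upd_work item */
	int direct_commits;	/* times the un-journaled path was taken */
};

/* Models the branch in ext4_handle_error() when continue_fs && journal. */
static void model_handle_error(struct umount_model *m, enum defer_policy p)
{
	bool may_defer = (p == POLICY_CHECK_SB_ACTIVE) ? m->sb_active
						       : !m->jbd2_unmount;

	if (may_defer)
		m->work_pending = true;	/* schedule_work(...) */
	else
		m->direct_commits++;	/* ext4_commit_super(sb) */
}

/*
 * Replays the ordering from note 1 and returns true if the BUG_ON in
 * start_this_handle would fire under the given policy.
 */
static bool replay_note1_race(enum defer_policy p)
{
	struct umount_model m = { .sb_active = true };

	m.sb_active = false;		/* umount: assumed cleared before put_super */
	m.work_pending = false;		/* flush_work(&sbi->s_sb_upd_work) */
	model_handle_error(&m, p);	/* kjournald2 hits ext4_inode_error */
	m.jbd2_unmount = true;		/* jbd2_journal_destroy */

	/* workqueue: update_super_work -> jbd2_journal_start */
	return m.work_pending && m.jbd2_unmount;
}
```

In this model, replay_note1_race(POLICY_CHECK_JBD2_UNMOUNT) reports the
BUG_ON, because the work item is queued after the flush and runs after the
flag is set, while replay_note1_race(POLICY_CHECK_SB_ACTIVE) takes the direct
commit path and leaves nothing queued.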