On Sat 22-02-25 14:10:22, Ojaswin Mujoo wrote:
> Presently we always BUG_ON if trying to start a transaction on a journal
> marked with JBD2_UNMOUNT, since this should never happen. However, while
> running stress tests it was observed that in some error handling paths
> it is possible for update_super_work to start a transaction after the
> journal is destroyed, e.g.:
> 
> (umount)
> ext4_kill_sb
>   kill_block_super
>     generic_shutdown_super
>       sync_filesystem /* commits all txns */
>       evict_inodes
>       /* might start a new txn */
>       ext4_put_super
>         flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>         jbd2_journal_destroy
>           journal_kill_thread
>             journal->j_flags |= JBD2_UNMOUNT;
>           jbd2_journal_commit_transaction
>             jbd2_journal_get_descriptor_buffer
>               jbd2_journal_bmap
>                 ext4_journal_bmap
>                   ext4_map_blocks
>                     ...
>                     ext4_inode_error
>                       ext4_handle_error
>                         schedule_work(&sbi->s_sb_upd_work)
> 
> /* work queue kicks in */
> update_super_work
>   jbd2_journal_start
>     start_this_handle
>       BUG_ON(journal->j_flags &
>              JBD2_UNMOUNT)
> 
> Hence, make sure we only defer the update of the ext4 sb if the sb is
> still active. Otherwise, just fall back to an un-journaled commit.
> 
> The important thing to note here is that we must only defer the sb
> update if we have not yet flushed the s_sb_upd_work queue in the umount
> path, else this race can be hit (point 1 below). Since we don't have a
> direct way to check for that, we use SB_ACTIVE instead. The SB_ACTIVE
> check is a bit subtle, so adding some notes below for future reference:
> 
> 1. Ideally we would want something like (flags & JBD2_UNMOUNT) == 0,
> however this is not correct since we could end up scheduling work after
> it has been flushed:
> 
> ext4_put_super
>   flush_work(&sbi->s_sb_upd_work)
> 
>                                 **kjournald2**
>                                 jbd2_journal_commit_transaction
>                                 ...
>                                 ext4_inode_error
>                                 /* JBD2_UNMOUNT not set */
>                                 schedule_work(s_sb_upd_work)
> 
>   jbd2_journal_destroy
>     journal->j_flags |= JBD2_UNMOUNT;
> 
>                                 **workqueue**
>                                 update_super_work
>                                   jbd2_journal_start
>                                     start_this_handle
>                                       BUG_ON(JBD2_UNMOUNT)
> 
> Something like the above doesn't happen with the SB_ACTIVE check because
> we are sure that the workqueue would be flushed at a later point if we
> are in the umount path.
> 
> 2. We don't need a similar check in ext4_grp_locked_error since it is
> only called from mballoc and AFAICT it would always be valid to schedule
> work there.
> 
> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> Reported-by: Mahesh Kumar <maheshkumar657g@xxxxxxxxx>
> Suggested-by: Ritesh Harjani <ritesh.list@xxxxxxxxx>
> Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>

Good catch! But I think the solution will have to be slightly different.
Basing the check on SB_ACTIVE has the problem that you can have racing
updates of the sb from the still-running transaction and from your
direct update, leading to inconsistencies after a crash (that was the
reason why we created s_sb_upd_work in the first place). I would solve
this by implementing something like ext4_update_sb_destroy_journal(),
which will set a flag in the sbi, flush the workqueue, and then destroy
the journal. ext4_handle_error() will then check for that sbi flag.

								Honza

> ---
>  fs/ext4/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a963ffda692a..b7341e9acf62 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>  	 * constraints, it may not be safe to do it right here so we
>  	 * defer superblock flushing to a workqueue.
>  	 */
> -	if (continue_fs && journal)
> +	if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
>  		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
>  	else
>  		ext4_commit_super(sb);
> -- 
> 2.48.1
> 

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
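PS: A rough, uncompiled sketch of what I mean. Only the name
ext4_update_sb_destroy_journal() comes from the suggestion above; the
flag name EXT4_FLAGS_SB_UPD_DISABLED, the use of s_ext4_flags, and the
exact call site are my assumptions and the real patch may differ:

	/* Hypothetical: called from ext4_put_super() in place of the
	 * current flush_work() + jbd2_journal_destroy() sequence. */
	static void ext4_update_sb_destroy_journal(struct super_block *sb)
	{
		struct ext4_sb_info *sbi = EXT4_SB(sb);

		/* Hypothetical flag: from now on ext4_handle_error() must
		 * update the sb directly instead of deferring to the
		 * workqueue, since the journal is about to go away. */
		set_bit(EXT4_FLAGS_SB_UPD_DISABLED, &sbi->s_ext4_flags);

		/* Push out any update that was scheduled before the flag
		 * was set, while the journal is still alive... */
		flush_work(&sbi->s_sb_upd_work);

		/* ...and only then tear the journal down. */
		jbd2_journal_destroy(sbi->s_journal);
		sbi->s_journal = NULL;
	}

and in ext4_handle_error(), instead of the SB_ACTIVE test:

	if (continue_fs && journal &&
	    !test_bit(EXT4_FLAGS_SB_UPD_DISABLED, &EXT4_SB(sb)->s_ext4_flags))
		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
	else
		ext4_commit_super(sb);

This closes the race because the flag is set before the flush: no new
work can be scheduled after flush_work() starts draining, and any error
hit later falls back to ext4_commit_super() rather than racing a
journaled update.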