Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only

David Sterba <dsterba@xxxxxxx> · Mon, 3 Jan 2022 19:52:37 +0100

On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:
> [BUG]
> The following super simple script would crash btrfs at unmount time, if
> CONFIG_BTRFS_ASSERT() is set.
> 
>  mkfs.btrfs -f $dev
>  mount $dev $mnt
>  xfs_io -f -c "pwrite 0 4k" $mnt/file
>  umount $mnt
>  mount -r ro $dev $mnt
>  btrfs scrub start -Br $mnt
>  umount $mnt
> 
> This will trigger the following ASSERT() introduced by commit
> 0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
> late stage of umount").
> 
> That patch is deifnitely not the cause, it just makes enough noise for
> us developer.
> 
> [CAUSE]
> We will start transaction for the following call chain during scrub:
> 
>   scrub_enumerate_chunks()
>   |- btrfs_inc_block_group_ro()
>      |- btrfs_join_transaction()
> 
> However for RO mount, there is no running transaction at all, thus
> btrfs_join_transaction() will start a new transaction.
> 
> Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
> btrfs_commit_super() to commit the new but empty transaction.
> 
> And lead to the ASSERT() being triggered.
> 
> The bug should be there for a long time. Only the new ASSERT() makes it
> noisy enough to be noticed.
> 
> [FIX]
> For read-only scrub on read-only mount, there is no need to start a
> transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
> 
> Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
> if it's read-only, skip all chunk allocation and go inc_block_group_ro()
> directly.
> 
> Since we're here, also add extra debug message at unmount for
> btrfs_fs_info::trans_list.
> Sometimes just knowing that there is no dirty metadata bytes for a
> uncommitted transaction can tell us a lot of things.
> 
> Cc: stable@xxxxxxxxxxxxxxx # 5.4+
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  fs/btrfs/block-group.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 1db24e6d6d90..702219361b12 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
>  	int ret;
>  	bool dirty_bg_running;
>  
> +	/*
> +	 * This can only happen when we are doing read-only scrub on read-only
> +	 * mount.
> +	 * In that case we should not start a new transaction on read-only fs.
> +	 * Thus here we skip all chunk allocation.
> +	 */
> +	if (sb_rdonly(fs_info->sb)) {

Should this also verify or at least assert that do_chunk_alloc is not
set? The scrub code is used for replace that can set the parameter to
true.

> +		mutex_lock(&fs_info->ro_block_group_mutex);
> +		ret = inc_block_group_ro(cache, 0);
> +		mutex_unlock(&fs_info->ro_block_group_mutex);
> +		return ret;

So this is taking a shortcut and skips a few things done in the function
that use the transaction. I'm not sure how safe this is, it depends on
the read-only status of superblock, that can chage any time, so what are
further calls to btrfs_inc_block_group_ro going to do regaring the
transaction?

> +	}
> +
>  	do {
>  		trans = btrfs_join_transaction(root);
>  		if (IS_ERR(trans))
> -- 
> 2.34.1