Re: [PATCH] btrfs: scrub: fix false alerts on zoned device scrubing

Naohiro Aota <Naohiro.Aota@xxxxxxx> · Tue, 5 Mar 2024 08:48:32 +0000

On Tue, Mar 05, 2024 at 01:18:22PM +1030, Qu Wenruo wrote:
> [BUG]
> When using zoned devices (zbc), scrub would always report super block
> errors like the following:
> 
>   # btrfs scrub start -fB /mnt/btrfs/
>   Starting scrub on devid 1
>   scrub done for b7b5c759-1baa-4561-a0ca-b8d0babcde56
>   Scrub started:    Tue Mar  5 12:49:14 2024
>   Status:           finished
>   Duration:         0:00:00
>   Total to scrub:   288.00KiB
>   Rate:             288.00KiB/s
>   Error summary:    super=2
>     Corrected:      0
>     Uncorrectable:  0
>     Unverified:     0
> 
> [CAUSE]
> Since the very beginning of scrub, we always go with btrfs_sb_offset()
> to grab the super blocks.
> This is fine for regular btrfs filesystems, but for zoned btrfs, super
> blocks are stored in dedicated zones with a ring buffer like structure.
> 
> This means the old btrfs_sb_offset() is not able to give the correct
> bytenr for us to grabbing the super blocks, thus except the primary
> super block, the rest would be garbage and cause the above false alerts.
> 
> [FIX]
> Instead of btrfs_sb_offset(), go with btrfs_sb_log_location() which is
> zoned friendly, to grab the correct super block location.
> 
> This would introduce new error patterns, as btrfs_sb_log_location() can
> fail with extra errors.
> 
> Here for -ENOENT we just end the scrub as there are no more super
> blocks.
> For other errors, we record it as a super block error and exit.
> 
> Reported-by: WA AM <waautomata@xxxxxxxxx>
> Link: https://lore.kernel.org/all/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@xxxxxxxxxxxxxx/
> CC: stable@xxxxxxxxxxxxxxx # 5.15+
> Signed-off-by: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  fs/btrfs/scrub.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index c4bd0e60db59..e1b67baa4072 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2788,7 +2788,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>  					   struct btrfs_device *scrub_dev)
>  {
>  	int	i;
> -	u64	bytenr;
>  	u64	gen;
>  	int ret = 0;
>  	struct page *page;
> @@ -2812,7 +2811,17 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>  		gen = btrfs_get_last_trans_committed(fs_info);
>  
>  	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> -		bytenr = btrfs_sb_offset(i);
> +		u64 bytenr;
> +
> +		ret = btrfs_sb_log_location(scrub_dev, i, 0, &bytenr);
> +		if (ret == -ENOENT)
> +			break;
> +		if (ret < 0) {
> +			spin_lock(&sctx->stat_lock);
> +			sctx->stat.super_errors++;
> +			spin_unlock(&sctx->stat_lock);
> +			break;

Since an error from scrub_one_super can continue, this can be "continue"
also? E.g, if btrfs_sb_log_location() returns -EUCLEAN on the 2nd SB, it
fails to detect the 3rd SB's corruption.

Other than that, looks good.

Reviewed-by: Naohiro Aota <naohiro.aota@xxxxxxx>

> +		}
>  		if (bytenr + BTRFS_SUPER_INFO_SIZE >
>  		    scrub_dev->commit_total_bytes)
>  			break;
> -- 
> 2.44.0
>