Re: [PATCH] btrfs: scrub: avoid crash if scrub is trying to do recovery for a removed block group

David Sterba <dsterba@xxxxxxx> · Fri, 28 Apr 2023 19:15:35 +0200

On Wed, Apr 26, 2023 at 10:45:59AM +0800, Qu Wenruo wrote:
> [BUG]
> Syzbot reported an ASSERT() got triggered during a scrub repair along
> with balance:
> 
>  BTRFS info (device loop5): balance: start -d -m
>  BTRFS info (device loop5): relocating block group 6881280 flags data|metadata
>  BTRFS info (device loop5): found 3 extents, stage: move data extents
>  BTRFS info (device loop5): scrub: started on devid 1
>  BTRFS info (device loop5): relocating block group 5242880 flags data|metadata
>  BTRFS info (device loop5): found 6 extents, stage: move data extents
>  BTRFS info (device loop5): found 1 extents, stage: update data pointers
>  BTRFS warning (device loop5): tree block 5500928 mirror 1 has bad bytenr, has 0 want 5500928
>  BTRFS info (device loop5): balance: ended with status: 0
>  BTRFS warning (device loop5): tree block 5435392 mirror 1 has bad bytenr, has 0 want 5435392
>  BTRFS warning (device loop5): tree block 5423104 mirror 1 has bad bytenr, has 0 want 5423104
>  assertion failed: 0, in fs/btrfs/scrub.c:614
>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/messages.c:259!
>  invalid opcode: 0000 [#2] PREEMPT SMP KASAN
>  Call Trace:
>    <TASK>
>    lock_full_stripe fs/btrfs/scrub.c:614 [inline]
>    scrub_handle_errored_block+0x1ee1/0x4730 fs/btrfs/scrub.c:1067
>    scrub_bio_end_io_worker+0x9bb/0x1370 fs/btrfs/scrub.c:2559
>    process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2390
>    worker_thread+0xa63/0x1210 kernel/workqueue.c:2537
>    kthread+0x270/0x300 kernel/kthread.c:376
>    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>    </TASK>
> 
> [CAUSE]
> Btrfs can delete empty block groups either through auto-cleanup or
> relcation.
> 
> Scrub normally is able to handle this situation well by doing extra
> checking, and holding the block group cache pointer during the whole
> scrub lifespan.
> 
> But unfortunately for lock_full_stripe() and unlock_full_stripe()
> functions, due to the context restriction, they have to do an extra
> search on the block group cache.
> (While the main scrub threads holds a proper btrfs_block_group, but we
> have no way to directly use that in repair context).
> 
> Thus it can happen that the target block group is already deleted by
> relocation.
> 
> In that case, we trigger the above ASSERT().
> 
> [FIX]
> Instead of triggering the ASSERT(), let's just return 0 and continue,
> this would leave @locked_ret to be false, and we won't try to unlock
> later.
> 
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> There would be no upstream commit, as upstream has completely rewritten
> the scrub code in v6.4 merge window, and gets rid of the
> lock_full_stripe()/unlock_full_stripe() functions.

I think we can explain and justify such change:

- there's a report and reproducer, so this is an existing problem

- it's a short diff

- we should do extra testing of the stable trees + this patch before
  submitting

> I hope we don't have more scrub fixes which would only apply to older
> kernels.

We'll handle that case by case, fixing problems in old stable tree is
desirable. Patches without an exact corresponding upstream commit should
be rare but with rewrites this could happen.

> ---
>  fs/btrfs/scrub.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 69c93ae333f6..43d0613c0dd3 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -610,10 +610,9 @@ static int lock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr,
>  
>  	*locked_ret = false;
>  	bg_cache = btrfs_lookup_block_group(fs_info, bytenr);
> -	if (!bg_cache) {
> -		ASSERT(0);

I like the using ASSERT(0) less and less each time I see it. We now have
about 33 of them and it looks like a misuse. It's a BUG() in disguise.
We should handle the error properly, either return a code and let
callers handle it or return -EUCLEAN if it's an unrepairable condition.
Assertions should be for code invariants, API misuse prevention, not for
error handling.