Patch "btrfs: handle case when repair happens with dev-replace" has been added to the 6.0-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Tue, 10 Jan 2023 10:20:51 -0500

This is a note to let you know that I've just added the patch titled

    btrfs: handle case when repair happens with dev-replace

to the 6.0-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     btrfs-handle-case-when-repair-happens-with-dev-repla.patch
and it can be found in the queue-6.0 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 7d511ebaf5a6617ee2f385a24047d01dfc3116fd
Author: Qu Wenruo <wqu@xxxxxxxx>
Date:   Sun Jan 1 09:02:21 2023 +0800

    btrfs: handle case when repair happens with dev-replace
    
    [ Upstream commit d73a27b86fc722c28a26ec64002e3a7dc86d1c07 ]
    
    [BUG]
    There is a bug report that a BUG_ON() in btrfs_repair_io_failure()
    (originally repair_io_failure() in v6.0 kernel) got triggered when
    replacing an unreliable disk:
    
      BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3
      kernel BUG at fs/btrfs/extent_io.c:2380!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G           OE      6.0.0-5-amd64 #1  Debian 6.0.10-2
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
      Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
      RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
      Call Trace:
       <TASK>
       clean_io_failure+0x14d/0x180 [btrfs]
       end_bio_extent_readpage+0x412/0x6e0 [btrfs]
       ? __switch_to+0x106/0x420
       process_one_work+0x1c7/0x380
       worker_thread+0x4d/0x380
       ? rescuer_thread+0x3a0/0x3a0
       kthread+0xe9/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
    
    [CAUSE]
    
    Before the BUG_ON(), we first got some read errors from the replace
    target. Note the mirror number: 3 is beyond what RAID1 duplication
    provides, so the read came from the replace target device.
    
    Then at the BUG_ON() location, we are trying to write the repaired
    sectors back to the failed device.
    
    The check looks like this:
    
                    ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
                                          &map_length, &bioc, mirror_num);
                    if (ret)
                            goto out_counter_dec;
                    BUG_ON(mirror_num != bioc->mirror_num);
    
    But btrfs_map_block() can modify bioc->mirror_num, specifically for
    dev-replace:
    
            if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
                !need_full_stripe(op) && dev_replace->tgtdev != NULL) {
                    ret = get_extra_mirror_from_replace(fs_info, logical, *length,
                                                        dev_replace->srcdev->devid,
                                                        &mirror_num,
                                                &physical_to_patch_in_first_stripe);
                    patch_the_first_stripe_for_dev_replace = 1;
            }
    
    Thus if we're repairing the replace target device, we're going to
    trigger that BUG_ON().
    
    But in reality, the read failure from the replace target device may
    simply mean that the replace hasn't reached the range we're reading
    yet, so we're reading garbage. With the replace still running, the
    range will be properly filled later.
    
    Thus in that case, we don't need to do anything but let the replace
    routine handle it.
    
    [FIX]
    Instead of a BUG_ON(), just skip the repair if we're repairing the
    dev-replace target device.
    
    Reported-by: 小太 <nospam@xxxxxxxx>
    Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@xxxxxxxxxxxxxx/
    CC: stable@xxxxxxxxxxxxxxx # 6.0+
    Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
    Signed-off-by: David Sterba <dsterba@xxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cf4f19e80e2f..0982995177a6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2377,7 +2377,16 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 				      &map_length, &bioc, mirror_num);
 		if (ret)
 			goto out_counter_dec;
-		BUG_ON(mirror_num != bioc->mirror_num);
+		/*
+		 * This happens when dev-replace is also running, and the
+		 * mirror_num indicates the dev-replace target.
+		 *
+		 * In this case, we don't need to do anything, as the read
+		 * error just means the replace progress hasn't reached our
+		 * read range, and later replace routine would handle it well.
+		 */
+		if (mirror_num != bioc->mirror_num)
+			goto out_counter_dec;
 	}
 
 	sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9;
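
The behaviour change in the hunk above can be modelled in user space. The
following is only a minimal sketch, not kernel code: every toy_* name is
hypothetical, the struct is a stand-in for the real mapping context, and the
src_mirror parameter is an assumed simplification of what
get_extra_mirror_from_replace() computes. It shows why a request for mirror
num_stripes + 1 comes back with a different mirror number while a replace is
running, and how the new check skips the repair write instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mapping result; not struct btrfs_io_context. */
struct toy_bioc {
	int mirror_num;
};

/*
 * Toy model of the mapping step. When a replace is running and the caller
 * asks for the extra mirror (num_stripes + 1), the request is redirected to
 * the mirror that holds the replace source, passed in here as src_mirror
 * (an assumption made for this sketch).
 */
void toy_map_block(int num_stripes, bool replace_running, int src_mirror,
		   int mirror_num, struct toy_bioc *bioc)
{
	assert(mirror_num >= 1);
	if (replace_running && mirror_num == num_stripes + 1)
		mirror_num = src_mirror;
	bioc->mirror_num = mirror_num;
}

/*
 * Toy model of the patched check: returns true when the repair write would
 * be issued, false when it is skipped because the mapping was redirected.
 */
bool toy_should_write_repair(int num_stripes, bool replace_running,
			     int src_mirror, int mirror_num)
{
	struct toy_bioc bioc;

	toy_map_block(num_stripes, replace_running, src_mirror, mirror_num,
		      &bioc);
	/* Previously a BUG_ON(); now the repair is simply skipped. */
	if (mirror_num != bioc.mirror_num)
		return false;
	return true;
}
```

With two RAID1 stripes and a replace running, a repair aimed at mirror 3 is
skipped in this model, while repairs aimed at mirror 1 or 2 still proceed.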