Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 12/09/2020 12:17 PM, Song Liu wrote:
Hi Matthew,

On Dec 8, 2020, at 7:46 PM, Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:

Hello,

I recently backported the following patches into the Ubuntu stable kernels:

md: add md_submit_discard_bio() for submitting discard bio
md/raid10: extend r10bio devs to raid disks
md/raid10: pull codes that wait for blocked dev into one function
md/raid10: improve raid10 discard request
md/raid10: improve discard request for far layout
dm raid: fix discard limits for raid1 and raid10
dm raid: remove unnecessary discard limits for raid10
Thanks for the report!

Hi Xiao,

Could you please take a look at this and let me know soon? We need to fix
this before 5.10 official release.

Thanks,
Song

Hi all

The root cause is found. Now we use a similar way with raid0 to handle discard request for raid10. Because the discard region is very big, we can calculate the start/end address for each disk. Then we can submit the discard request to each disk. But for raid10, it has copies. For near layout, if the discard request doesn't align with chunk size, we calculate a start_disk_offset. Now we only use start_disk_offset for the first disk, but it should be
used for the near copies disks too.

[  789.709501] discard bio start : 70968, size : 191176
[ 789.709507] first stripe index 69, start disk index 0, start disk offset 70968 [ 789.709509] last stripe index 256, end disk index 0, end disk offset 262144
[  789.709511] disk 0, dev start : 70968, dev end : 262144
[  789.709515] disk 1, dev start : 70656, dev end : 262144

For example, in this test case, it has 2 near copies. The start_disk_offset for the first disk is 70968. It should use the same offset address for second disk. But it uses the start address of this chunk. It discard more region. The patch in the attachment can fix this problem. It split the region that
doesn't align with chunk size.

There is another problem. The stripe size should be calculated differently for near layout and far layout.

@Song, do you want me to use a separate patch for this fix, or fix this in the original patch?

Merry Christmas
Xiao

commit 0d74ac66ed0ec5af70296545e26044723a14657c
Author: Xiao Ni <xni@xxxxxxxxxx>
Date:   Thu Dec 24 17:58:43 2020 +0800

    fix

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3153183b7772..92182cf40d22 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1604,6 +1604,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	sector_t chunk;
 	unsigned int stripe_size;
 	sector_t split_size;
+	sector_t chunk_size = 1 << geo->chunk_shift;
 
 	sector_t bio_start, bio_end;
 	sector_t first_stripe_index, last_stripe_index;
@@ -1624,7 +1625,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
 		goto out;
 
-	stripe_size = geo->raid_disks << geo->chunk_shift;
+	stripe_size = geo->near_copies ? geo->near_copies << geo->chunk_shift:
+				geo->raid_disks << geo->chunk_shift;
 	bio_start = bio->bi_iter.bi_sector;
 	bio_end = bio_end_sector(bio);
 
@@ -1637,6 +1639,18 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	if (bio_sectors(bio) < stripe_size*2)
 		goto out;
 
+	/* Keep the discard start/end address aligned with chunk size */
+	if (bio_start & geo->chunk_mask) {
+		split_size = (chunk_size - (bio_start & geo->chunk_mask));
+		bio = raid10_split_bio(conf, bio, split_size, false);
+	}
+	if (bio_end & geo->chunk_mask) {
+		split_size = bio_end & geo->chunk_mask;
+		bio = raid10_split_bio(conf, bio, split_size, true);
+	}
+	bio_start = bio->bi_iter.bi_sector;
+	bio_end = bio_end_sector(bio);
+
 	/* For far and far offset layout, if bio is not aligned with stripe size,
 	 * it splits the part that is not aligned with strip size.
 	 */
@@ -1664,8 +1678,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	start_disk_index = sector_div(first_stripe_index, geo->raid_disks);
 	if (geo->far_offset)
 		first_stripe_index *= geo->far_copies;
-	start_disk_offset = (bio_start & geo->chunk_mask) +
-				(first_stripe_index << geo->chunk_shift);
+	/* Now the bio is aligned with chunk size */
+	start_disk_offset = first_stripe_index << geo->chunk_shift;
 
 	chunk = bio_end >> geo->chunk_shift;
 	chunk *= geo->near_copies;
@@ -1673,8 +1687,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	end_disk_index = sector_div(last_stripe_index, geo->raid_disks);
 	if (geo->far_offset)
 		last_stripe_index *= geo->far_copies;
-	end_disk_offset = (bio_end & geo->chunk_mask) +
-				(last_stripe_index << geo->chunk_shift);
+	end_disk_offset = last_stripe_index << geo->chunk_shift;
 
 retry_discard:
 	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux