Fwd: The read data is wrong from raid5 when recovery happens

Xiao Ni <xni@xxxxxxxxxx> · Fri, 26 May 2023 10:08:09 +0800




I received a notice that this email could not be delivered to someone,
so I am resending it to linux-raid.

---------- Forwarded message ---------
From: Xiao Ni <xni@xxxxxxxxxx>
Date: Fri, May 26, 2023 at 9:49 AM
Subject: The read data is wrong from raid5 when recovery happens
To: Song Liu <song@xxxxxxxxxx>, Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>, Heinz Mauelshagen
<heinzm@xxxxxxxxxx>, Nigel Croxon <ncroxon@xxxxxxxxxx>


Hi all

We found a problem recently: the data read from raid5 is wrong while
recovery is in progress. We have traced it to patch 10764815f (md: add
io accounting for raid0 and raid5). I can reproduce this 100% of the
time, and the problem exists in upstream. The test steps are:

1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
2. mkfs.ext4 -F $devname
3. mount $devname $mount_point
4. mdadm --incremental --fail sdd
5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
6. mdadm /dev/md126 --add /dev/sdd
7. create 31 processes that write and read files, comparing the content
with md5sum. The test goes on until the recovery stops
8. wait for about 10 minutes; some processes then report that the
checksum is wrong. But if the data is read again, the checksum is
good.
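
For reference, step 7 can be sketched roughly as below. This is not the
actual test script, only a minimal illustration of the write/read/md5
compare loop. The function names, the 4 MiB file size and the fixed
round count are assumptions (the real test runs until recovery stops),
and the posix_fadvise call is only a hint to drop cached pages so that
the read goes back to the array.

```python
import hashlib
import os
import sys
from multiprocessing import Pool


def md5_of_file(path, bufsize=1 << 20):
    """Return the md5 hex digest of the file content, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def write_and_verify(path, size=4 << 20):
    """Write random data, read it back, and compare md5 checksums.

    Returns True when the first read matches. On a mismatch the file is
    read a second time to show whether the error is transient.
    """
    data = os.urandom(size)
    expected = hashlib.md5(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
        # Assumption: ask the kernel to drop cached pages so the read
        # below is served by the md device. This is only a hint.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass
    first = md5_of_file(path)
    if first == expected:
        return True
    second = md5_of_file(path)
    print("%s: checksum mismatch, re-read %s" %
          (path, "good" if second == expected else "still bad"))
    return False


def worker(args):
    """Run several write/verify rounds; return the number of mismatches."""
    directory, index, rounds = args
    path = os.path.join(directory, "worker%d.dat" % index)
    mismatches = 0
    for _ in range(rounds):
        if not write_and_verify(path):
            mismatches += 1
    return mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    directory = sys.argv[1]  # a directory on the mounted raid5 array
    with Pool(31) as pool:
        results = pool.map(worker, [(directory, i, 100) for i in range(31)])
    print("total checksum mismatches:", sum(results))
```

It would be run with the mount point as the only argument while the
recovery from step 6 is in progress.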

I tried to narrow the problem down like this:

-       md_account_bio(mddev, &bi);
+       if (rw == WRITE)
+               md_account_bio(mddev, &bi);
If accounting is done only for write requests, the problem disappears.

-       if (rw == READ && mddev->degraded == 0 &&
-           mddev->reshape_position == MaxSector) {
-               bi = chunk_aligned_read(mddev, bi);
-               if (!bi)
-                       return true;
-       }
+       //if (rw == READ && mddev->degraded == 0 &&
+       //    mddev->reshape_position == MaxSector) {
+       //      bi = chunk_aligned_read(mddev, bi);
+       //      if (!bi)
+       //              return true;
+       //}

        if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
                make_discard_request(mddev, bi);
@@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
                        md_write_end(mddev);
                return true;
        }
-       md_account_bio(mddev, &bi);
+       if (rw == READ)
+               md_account_bio(mddev, &bi);

With chunk_aligned_read commented out and accounting done only for read
requests, the problem can still be reproduced.

-- 
Best Regards
Xiao Ni

