I received an email that this email can't delivered to someone. Resent it to linux-raid again. ---------- Forwarded message --------- From: Xiao Ni <xni@xxxxxxxxxx> Date: Fri, May 26, 2023 at 9:49 AM Subject: The read data is wrong from raid5 when recovery happens To: Song Liu <song@xxxxxxxxxx>, Guoqing Jiang <guoqing.jiang@xxxxxxxxx> Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>, Heinz Mauelshagen <heinzm@xxxxxxxxxx>, Nigel Croxon <ncroxon@xxxxxxxxxx> Hi all We found a problem recently. The read data is wrong when recovery happens. Now we've found it's introduced by patch 10764815f (md: add io accounting for raid0 and raid5). I can reproduce this 100%. This problem exists in upstream. The test steps are like this: 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean 2. mkfs.ext4 -F $devname 3. mount $devname $mount_point 4. mdadm --incremental --fail sdd 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress 6. mdadm /dev/md126 --add /dev/sdd 7. create 31 processes that writes and reads. It compares the content with md5sum. The test will go on until the recovery stops 8. wait for about 10 minutes, we can see some processes report checksum is wrong. But if it re-read the data again, the checksum will be good. I tried to narrow this problem like this: - md_account_bio(mddev, &bi); + if (rw == WRITE) + md_account_bio(mddev, &bi); If it only do account for write requests, the problem can disappear. - if (rw == READ && mddev->degraded == 0 && - mddev->reshape_position == MaxSector) { - bi = chunk_aligned_read(mddev, bi); - if (!bi) - return true; - } + //if (rw == READ && mddev->degraded == 0 && + // mddev->reshape_position == MaxSector) { + // bi = chunk_aligned_read(mddev, bi); + // if (!bi) + // return true; + //} if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) { make_discard_request(mddev, bi); @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) md_write_end(mddev); return true; } - md_account_bio(mddev, &bi); + if (rw == READ) + md_account_bio(mddev, &bi); I comment the chunk_aligned_read out and only account for read requests, this problem can be reproduced. -- Best Regards Xiao Ni -- Best Regards Xiao Ni