On Fri, May 26, 2023 at 10:48 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/26 10:40, Xiao Ni wrote:
> > On Fri, May 26, 2023 at 10:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/26 10:08, Xiao Ni wrote:
> >>> I received an email saying that this email couldn't be delivered to
> >>> someone. Resending it to linux-raid again.
> >>>
> >>> ---------- Forwarded message ---------
> >>> From: Xiao Ni <xni@xxxxxxxxxx>
> >>> Date: Fri, May 26, 2023 at 9:49 AM
> >>> Subject: The read data is wrong from raid5 when recovery happens
> >>> To: Song Liu <song@xxxxxxxxxx>, Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> >>> Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>, Heinz Mauelshagen
> >>> <heinzm@xxxxxxxxxx>, Nigel Croxon <ncroxon@xxxxxxxxxx>
> >>>
> >>>
> >>> Hi all
> >>>
> >>> We found a problem recently: the data read back is wrong when recovery
> >>> happens. We've found it was introduced by commit 10764815f ("md: add
> >>> io accounting for raid0 and raid5"). I can reproduce this 100% of the
> >>> time, and the problem exists upstream. The test steps are:
> >>>
> >>> 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
> >>> 2. mkfs.ext4 -F $devname
> >>> 3. mount $devname $mount_point
> >>> 4. mdadm --incremental --fail sdd
> >>> 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
> >>> 6. mdadm /dev/md126 --add /dev/sdd
> >> Can you try to zero the superblock before adding sdd? Just to bypass re-add.
> >
> > Hi Kuai
> >
> > I tried that. It can still be reproduced.
>
> OK, I asked because we found re-add has some problems while testing
> raid10, and it's easy to reproduce...
>
> Then, there is a related fix that was just merged:
>
> md/raid5: fix miscalculation of 'end_sector' in raid5_read_one_chunk()
>
> Does the upstream kernel that you tried contain this one?

Hi

My test version doesn't have this patch. But as mentioned in the
original email, I commented out the code that calls raid5_read_one_chunk
and the bug can still be reproduced, so this patch should not fix it.

Regards
Xiao

>
> Thanks,
> Kuai
>
> >>
> >> Thanks,
> >> Kuai
> >>> 7. Create 31 processes that write and read; each compares the content
> >>> with md5sum. The test keeps running until the recovery stops.
> >>> 8. Wait for about 10 minutes; some processes report that the checksum
> >>> is wrong. But if the data is read again, the checksum is good.
> >>>
> >>> I tried to narrow this problem down like this:
> >>>
> >>> -       md_account_bio(mddev, &bi);
> >>> +       if (rw == WRITE)
> >>> +               md_account_bio(mddev, &bi);
> >>>
> >>> If accounting is done only for write requests, the problem disappears.
> >>>
> >>> -       if (rw == READ && mddev->degraded == 0 &&
> >>> -           mddev->reshape_position == MaxSector) {
> >>> -               bi = chunk_aligned_read(mddev, bi);
> >>> -               if (!bi)
> >>> -                       return true;
> >>> -       }
> >>> +       //if (rw == READ && mddev->degraded == 0 &&
> >>> +       //    mddev->reshape_position == MaxSector) {
> >>> +       //      bi = chunk_aligned_read(mddev, bi);
> >>> +       //      if (!bi)
> >>> +       //              return true;
> >>> +       //}
> >>>
> >>>         if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
> >>>                 make_discard_request(mddev, bi);
> >>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev
> >>> *mddev, struct bio * bi)
> >>>                 md_write_end(mddev);
> >>>                 return true;
> >>>         }
> >>> -       md_account_bio(mddev, &bi);
> >>> +       if (rw == READ)
> >>> +               md_account_bio(mddev, &bi);
> >>>
> >>> If I comment out chunk_aligned_read and only account for read
> >>> requests, the problem can still be reproduced.
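
For reference, here is a minimal sketch of what step 7 above does (this is
not the actual test script; the mount point /mnt/md126, the per-file size
and the simple rolling checksum used in place of md5sum are assumptions).
It forks 31 workers that each write a pattern file on the mounted array,
read it back and compare checksums:

/*
 * Hypothetical stand-in for step 7: 31 workers write a pattern file on the
 * mounted raid5 array, read it back and compare checksums.  The real test
 * used md5sum and looped until the recovery finished; this is a single
 * pass only.  /mnt/md126 is an assumed mount point.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC   31
#define FILE_MB 64
#define CHUNK   (1 << 20)

static uint64_t chksum(uint64_t sum, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		sum = sum * 131 + buf[i];
	return sum;
}

static void worker(int id)
{
	unsigned char *buf = malloc(CHUNK);
	uint64_t wsum = 0, rsum = 0;
	char path[64];
	int fd, i;

	snprintf(path, sizeof(path), "/mnt/md126/stress.%d", id);

	/* Write a deterministic pattern and checksum it on the way out. */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	for (i = 0; i < FILE_MB; i++) {
		memset(buf, (id + i) & 0xff, CHUNK);
		wsum = chksum(wsum, buf, CHUNK);
		if (write(fd, buf, CHUNK) != CHUNK) {
			perror("write");
			exit(1);
		}
	}
	fsync(fd);
	close(fd);

	/*
	 * Read the file back and checksum it again.  To make these reads
	 * actually hit the array (and not the page cache) while recovery is
	 * running, drop caches between the two phases or use O_DIRECT here.
	 */
	fd = open(path, O_RDONLY);
	for (i = 0; i < FILE_MB; i++) {
		if (read(fd, buf, CHUNK) != CHUNK) {
			perror("read");
			exit(1);
		}
		rsum = chksum(rsum, buf, CHUNK);
	}
	close(fd);

	if (rsum != wsum)
		fprintf(stderr, "worker %d: checksum mismatch\n", id);
	free(buf);
	exit(rsum != wsum);
}

int main(void)
{
	for (int i = 0; i < NPROC; i++)
		if (fork() == 0)
			worker(i);
	for (int i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}

Run it while the recovery triggered by step 6 is in progress; a mismatch
that goes away when the same file is read again matches the symptom
described in step 8.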
--
Best Regards
Xiao Ni