On Fri, May 26, 2023 at 10:48 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/26 10:40, Xiao Ni wrote:
> > On Fri, May 26, 2023 at 10:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/26 10:08, Xiao Ni wrote:
> >>> I received an email saying that this email couldn't be delivered to
> >>> someone. Resending it to linux-raid again.
> >>>
> >>> ---------- Forwarded message ---------
> >>> From: Xiao Ni <xni@xxxxxxxxxx>
> >>> Date: Fri, May 26, 2023 at 9:49 AM
> >>> Subject: The read data is wrong from raid5 when recovery happens
> >>> To: Song Liu <song@xxxxxxxxxx>, Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> >>> Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>, Heinz Mauelshagen
> >>> <heinzm@xxxxxxxxxx>, Nigel Croxon <ncroxon@xxxxxxxxxx>
> >>>
> >>>
> >>> Hi all
> >>>
> >>> We found a problem recently: the data read back is wrong when recovery
> >>> happens. We've found it was introduced by commit 10764815f ("md: add
> >>> io accounting for raid0 and raid5"). I can reproduce this 100% of the
> >>> time, and the problem exists upstream. The test steps are:
> >>>
> >>> 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
> >>> 2. mkfs.ext4 -F $devname
> >>> 3. mount $devname $mount_point
> >>> 4. mdadm --incremental --fail sdd
> >>> 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
> >>> 6. mdadm /dev/md126 --add /dev/sdd
> >> Can you try to zero the superblock before adding sdd? Just to bypass re-add.
> >
> > Hi Kuai
> >
> > I tried that. It can still be reproduced.
>
> OK, I asked because we found re-add has some problems while testing
> raid10, and it's easy to reproduce...
>
> Then, there is a related fix that was just merged:
>
> md/raid5: fix miscalculation of 'end_sector' in raid5_read_one_chunk()
>
> Does the upstream kernel that you tried contain this one?

Hi

My test version doesn't have this patch. But as mentioned in the
original email, I commented out the code that calls raid5_read_one_chunk
and the bug can still be reproduced, so this patch should not fix it.

Regards
Xiao

>
> Thanks,
> Kuai
>
> >>
> >> Thanks,
> >> Kuai
> >>> 7. Create 31 processes that write and read; each compares the content
> >>> with md5sum. The test keeps running until the recovery stops.
> >>> 8. Wait for about 10 minutes; some processes report that the checksum
> >>> is wrong. But if the data is read again, the checksum is good.
> >>>
> >>> I tried to narrow this problem down like this:
> >>>
> >>> -       md_account_bio(mddev, &bi);
> >>> +       if (rw == WRITE)
> >>> +               md_account_bio(mddev, &bi);
> >>>
> >>> If accounting is done only for write requests, the problem disappears.
> >>>
> >>> -       if (rw == READ && mddev->degraded == 0 &&
> >>> -           mddev->reshape_position == MaxSector) {
> >>> -               bi = chunk_aligned_read(mddev, bi);
> >>> -               if (!bi)
> >>> -                       return true;
> >>> -       }
> >>> +       //if (rw == READ && mddev->degraded == 0 &&
> >>> +       //    mddev->reshape_position == MaxSector) {
> >>> +       //      bi = chunk_aligned_read(mddev, bi);
> >>> +       //      if (!bi)
> >>> +       //              return true;
> >>> +       //}
> >>>
> >>>         if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
> >>>                 make_discard_request(mddev, bi);
> >>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev
> >>> *mddev, struct bio * bi)
> >>>                 md_write_end(mddev);
> >>>                 return true;
> >>>         }
> >>> -       md_account_bio(mddev, &bi);
> >>> +       if (rw == READ)
> >>> +               md_account_bio(mddev, &bi);
> >>>
> >>> If I comment out chunk_aligned_read and only account for read
> >>> requests, the problem can still be reproduced.
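
For reference, here is a minimal sketch of what step 7 above does (this is
not the actual test script; the mount point /mnt/md126, the per-file size
and the simple rolling checksum used in place of md5sum are assumptions).
It forks 31 workers that each write a pattern file on the mounted array,
read it back and compare checksums:

/*
 * Hypothetical stand-in for step 7: 31 workers write a pattern file on the
 * mounted raid5 array, read it back and compare checksums.  The real test
 * used md5sum and looped until the recovery finished; this is a single
 * pass only.  /mnt/md126 is an assumed mount point.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC   31
#define FILE_MB 64
#define CHUNK   (1 << 20)

static uint64_t chksum(uint64_t sum, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		sum = sum * 131 + buf[i];
	return sum;
}

static void worker(int id)
{
	unsigned char *buf = malloc(CHUNK);
	uint64_t wsum = 0, rsum = 0;
	char path[64];
	int fd, i;

	snprintf(path, sizeof(path), "/mnt/md126/stress.%d", id);

	/* Write a deterministic pattern and checksum it on the way out. */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	for (i = 0; i < FILE_MB; i++) {
		memset(buf, (id + i) & 0xff, CHUNK);
		wsum = chksum(wsum, buf, CHUNK);
		if (write(fd, buf, CHUNK) != CHUNK) {
			perror("write");
			exit(1);
		}
	}
	fsync(fd);
	close(fd);

	/*
	 * Read the file back and checksum it again.  To make these reads
	 * actually hit the array (and not the page cache) while recovery is
	 * running, drop caches between the two phases or use O_DIRECT here.
	 */
	fd = open(path, O_RDONLY);
	for (i = 0; i < FILE_MB; i++) {
		if (read(fd, buf, CHUNK) != CHUNK) {
			perror("read");
			exit(1);
		}
		rsum = chksum(rsum, buf, CHUNK);
	}
	close(fd);

	if (rsum != wsum)
		fprintf(stderr, "worker %d: checksum mismatch\n", id);
	free(buf);
	exit(rsum != wsum);
}

int main(void)
{
	for (int i = 0; i < NPROC; i++)
		if (fork() == 0)
			worker(i);
	for (int i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}

Run it while the recovery triggered by step 6 is in progress; a mismatch
that goes away when the same file is read again matches the symptom
described in step 8.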
--
Best Regards
Xiao Ni