Re: Fwd: The read data is wrong from raid5 when recovery happens

Xiao Ni <xni@xxxxxxxxxx> · Fri, 26 May 2023 10:40:32 +0800

On Fri, May 26, 2023 at 10:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/26 10:08, Xiao Ni wrote:
> > I received a notice that this email could not be delivered to some
> > recipients, so I am resending it to linux-raid.
> >
> > ---------- Forwarded message ---------
> > From: Xiao Ni <xni@xxxxxxxxxx>
> > Date: Fri, May 26, 2023 at 9:49 AM
> > Subject: The read data is wrong from raid5 when recovery happens
> > To: Song Liu <song@xxxxxxxxxx>, Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> > Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>, Heinz Mauelshagen
> > <heinzm@xxxxxxxxxx>, Nigel Croxon <ncroxon@xxxxxxxxxx>
> >
> >
> > Hi all
> >
> > We found a problem recently: reads from raid5 return wrong data while
> > recovery is running. We've traced it to patch 10764815f (md: add
> > io accounting for raid0 and raid5). I can reproduce this 100% of the
> > time, and the problem exists upstream. The test steps are:
> >
> > 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
> > 2. mkfs.ext4 -F $devname
> > 3. mount $devname $mount_point
> > 4. mdadm --incremental --fail sdd
> > 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
> > 6. mdadm /dev/md126 --add /dev/sdd
> Can you try zeroing the superblock before adding sdd? Just to bypass re-add.

Hi Kuai

I tried this. The problem can still be reproduced.

>
> Thanks,
> Kuai
> > 7. Create 31 processes that write and read, comparing the content
> > with md5sum. The test keeps running until the recovery stops.
> > 8. Wait for about 10 minutes; some processes then report a wrong
> > checksum. But if the data is read again, the checksum is
> > good.
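
For anyone who wants to try step 7, here is a rough sketch of a
write/read/md5 checker. It is not the actual test script, which is not
included in this thread. The function names, the 1 MiB block size and
the per-file size are all assumptions made for illustration. Note also
that a read issued right after a write may be served from the page
cache; how the real test deals with that is not stated, so this sketch
only does an fsync.

```python
import hashlib
import os
import tempfile
from multiprocessing import Pool

BLOCK = 1 << 20  # 1 MiB per write (arbitrary choice, not from the report)


def md5_of_file(path):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(BLOCK), b""):
            h.update(chunk)
    return h.hexdigest()


def write_and_verify(args):
    """Write random data, read it back, and compare MD5 digests.

    Returns (path, first_read_ok, reread_ok).
    """
    directory, worker_id, size_mib = args
    path = os.path.join(directory, f"worker{worker_id}.dat")
    expected = hashlib.md5()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            data = os.urandom(BLOCK)
            expected.update(data)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    want = expected.hexdigest()
    first_ok = md5_of_file(path) == want
    # The report says a second read returns good data, so re-read on mismatch.
    reread_ok = first_ok or md5_of_file(path) == want
    return path, first_ok, reread_ok


def run_round(directory, workers=31, size_mib=4):
    """Run one write/verify round with `workers` parallel processes."""
    jobs = [(directory, i, size_mib) for i in range(workers)]
    with Pool(workers) as pool:
        return pool.map(write_and_verify, jobs)


if __name__ == "__main__":
    # For a real run, pass $mount_point instead of a temporary directory
    # and loop until the recovery has finished.
    with tempfile.TemporaryDirectory() as d:
        for path, first_ok, reread_ok in run_round(d, workers=4, size_mib=1):
            if not first_ok:
                print(f"{path}: checksum mismatch, re-read ok: {reread_ok}")
```

A mismatch on the first read followed by a good re-read would match the
symptom described in step 8.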
> >
> > I tried to narrow the problem down like this:
> >
> > -       md_account_bio(mddev, &bi);
> > +       if (rw == WRITE)
> > +               md_account_bio(mddev, &bi);
> > If it only accounts for write requests, the problem disappears.
> >
> > -       if (rw == READ && mddev->degraded == 0 &&
> > -           mddev->reshape_position == MaxSector) {
> > -               bi = chunk_aligned_read(mddev, bi);
> > -               if (!bi)
> > -                       return true;
> > -       }
> > +       //if (rw == READ && mddev->degraded == 0 &&
> > +       //    mddev->reshape_position == MaxSector) {
> > +       //      bi = chunk_aligned_read(mddev, bi);
> > +       //      if (!bi)
> > +       //              return true;
> > +       //}
> >
> >          if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
> >                  make_discard_request(mddev, bi);
> > @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> >                          md_write_end(mddev);
> >                  return true;
> >          }
> > -       md_account_bio(mddev, &bi);
> > +       if (rw == READ)
> > +               md_account_bio(mddev, &bi);
> >
> > If I comment out chunk_aligned_read and only account for read
> > requests, the problem can be reproduced.
> >
>

-- 
Best Regards
Xiao Ni