On Fri, May 26, 2023 at 2:13 AM Mariusz Tkaczyk
<mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, 26 May 2023 15:23:58 +0800
> Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> > On Fri, May 26, 2023 at 3:12 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 5/26/23 14:45, Xiao Ni wrote:
> > > > On Fri, May 26, 2023 at 11:09 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> > > > wrote:
> > > >>
> > > >>
> > > >> On 5/26/23 09:49, Xiao Ni wrote:
> > > >>> Hi all
> > > >>>
> > > >>> We found a problem recently. The read data is wrong when recovery
> > > >>> happens. Now we've found it's introduced by patch 10764815f (md: add io
> > > >>> accounting for raid0 and raid5). I can reproduce this 100%. This
> > > >>> problem exists in upstream. The test steps are like this:
> > > >>>
> > > >>> 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
> > > >>> 2. mkfs.ext4 -F $devname
> > > >>> 3. mount $devname $mount_point
> > > >>> 4. mdadm --incremental --fail sdd
> > > >>> 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000
> > > >>> status=progress
> > > I suppose /tmp is the mount point.
> >
> > /tmp/pythontest is the mount point
> >
> > > >>> 6. mdadm /dev/md126 --add /dev/sdd
> > > >>> 7. create 31 processes that writes and reads. It compares the content
> > > >>> with md5sum. The test will go on until the recovery stops
> > > Could you share the test code/script for step 7? Will try it from my side.
> >
> > The test scripts are written by people from intel.
> > Hi, Mariusz. Can I share the test scripts here?
>
> Yes. Let us know if there is something else we can do to help here.

I tried to reproduce this with fio, but didn't get much luck. Please
share the test scripts.

Thanks,
Song
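
[Editor's note: the Intel test scripts referenced in step 7 were not posted in this thread. For readers who want to attempt a reproduction before they are shared, the following is a rough, hypothetical stand-in for that workload — N processes that each write a file, fsync it, read it back, and compare md5 checksums, which is the behavior step 7 describes. Every name and parameter here is illustrative; this is not the actual Intel test code.]

```python
import hashlib
import multiprocessing
import os
import tempfile


def worker(worker_id, mount_point, file_mb, rounds):
    """Write pseudo-random data, read it back, and compare md5 sums.

    Returns False on the first checksum mismatch, which is the kind of
    corrupted read the thread reports during RAID5 recovery.
    """
    path = os.path.join(mount_point, f"stress_{worker_id}.dat")
    for _ in range(rounds):
        data = os.urandom(file_mb * 1024 * 1024)
        expected = hashlib.md5(data).hexdigest()
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        with open(path, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        if actual != expected:
            return False
    return True


def run_stress(mount_point, nproc=31, file_mb=4, rounds=2):
    """Run nproc writer/reader processes; True if no mismatch was seen."""
    with multiprocessing.Pool(nproc) as pool:
        results = [
            pool.apply_async(worker, (i, mount_point, file_mb, rounds))
            for i in range(nproc)
        ]
        return all(r.get() for r in results)


if __name__ == "__main__":
    # In the real reproducer this would be the RAID5 array's mount point
    # (e.g. /tmp/pythontest), run while the recovery from step 6 is in
    # progress; a temporary directory is used here only so the sketch is
    # self-contained.
    with tempfile.TemporaryDirectory() as d:
        print("PASS" if run_stress(d, nproc=4, file_mb=1, rounds=1) else "FAIL")
```

On a healthy filesystem this should always report PASS; the thread's claim is that running such a workload on the array while the step-6 recovery is in flight produces mismatches.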