On Fri, 26 May 2023 15:23:58 +0800 Xiao Ni <xni@xxxxxxxxxx> wrote:

> On Fri, May 26, 2023 at 3:12 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >
> > On 5/26/23 14:45, Xiao Ni wrote:
> > > On Fri, May 26, 2023 at 11:09 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> > >>
> > >> On 5/26/23 09:49, Xiao Ni wrote:
> > >>> Hi all
> > >>>
> > >>> We found a problem recently. The read data is wrong when recovery
> > >>> happens. We've now found it was introduced by patch 10764815f ("md: add io
> > >>> accounting for raid0 and raid5"). I can reproduce this 100%. The
> > >>> problem exists upstream. The test steps are:
> > >>>
> > >>> 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean
> > >>> 2. mkfs.ext4 -F $devname
> > >>> 3. mount $devname $mount_point
> > >>> 4. mdadm --incremental --fail sdd
> > >>> 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress
> >
> > I suppose /tmp is the mount point.
>
> /tmp/pythontest is the mount point.
>
> > >>> 6. mdadm /dev/md126 --add /dev/sdd
> > >>> 7. Create 31 processes that write and read. They compare the content
> > >>> with md5sum. The test runs until the recovery stops.
> >
> > Could you share the test code/script for step 7? I will try it from my side.
>
> The test scripts were written by people from Intel.
> Hi, Mariusz. Can I share the test scripts here?

Yes. Let us know if there is something else we can do to help here.

Thanks,
Mariusz
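[Editor's note: the actual Intel test scripts discussed above were not posted to the list. The following is a minimal, hypothetical sketch of the step-7 workload as described in the thread: concurrent workers that write a file to the filesystem on the array, record its md5sum, re-read it, and compare. The names TESTDIR, NPROC, worker, and run_stress are all assumptions introduced for illustration.]

```shell
#!/bin/sh
# Hypothetical sketch only -- not the Intel test scripts from the thread.
# TESTDIR: filesystem mounted on the md array (e.g. /tmp/pythontest).
# NPROC: number of concurrent worker processes (the thread used 31).

worker() {
    f="$TESTDIR/stress.$1"
    # Write a file of random data and record its checksum.
    dd if=/dev/urandom of="$f" bs=64k count=16 2>/dev/null
    want=$(md5sum "$f" | cut -d' ' -f1)
    # Re-read and compare; a mismatch here is the corruption symptom
    # the thread reports during raid5 recovery.
    got=$(md5sum "$f" | cut -d' ' -f1)
    [ "$want" = "$got" ] || { echo "MISMATCH: $f" >&2; return 1; }
}

run_stress() {
    rc=0
    pids=""
    i=1
    while [ "$i" -le "${NPROC:-31}" ]; do
        worker "$i" &
        pids="$pids $!"
        i=$((i + 1))
    done
    for p in $pids; do
        wait "$p" || rc=1
    done
    return "$rc"
}
```

The real test additionally keeps iterating until the rebuild finishes (e.g. while `grep -q recovery /proc/mdstat` still matches); that outer loop, and the mdadm fail/add steps 4 and 6, are omitted here.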