Hi pg,

We are not including the MD superblock in the checksum comparison; it is
skipped when the checksum is calculated.
As I understand it, the data on both legs should always be the same in the
good case, and that is what we are trying to compare.

Thanks
-swapnil

On Fri, 10 Aug 2018, 18:53 Piergiorgio Sartor, <piergiorgio.sartor@xxxxxxxx> wrote:
>
> On Fri, Aug 10, 2018 at 01:31:18PM +0200, Jack Wang wrote:
> > +cc Shaohua and NeilBrown
> > Swapnil Ingle <swapnil.ingle@xxxxxxxxxxxxxxxx> wrote on Thu, 9 Aug 2018 at 4:20 PM:
> > >
> > > Hi,
> > >
> > > For RAID1 with 2 slave devices underneath, we see different data on the slaves.
> > > The IO is generated by using FIO with ioengine=libaio.
> > >
> > > The following script sometimes leads to different md5sums on the slave devices.
> > >
> > > ==================
> > > modprobe -r brd
> > > modprobe brd rd_nr=2 rd_size=2097152
> > > dd if=/dev/zero of=/dev/ram1 oflag=direct || true
> > > dd if=/dev/zero of=/dev/ram0 oflag=direct || true
> > > dd if=/dev/ram1 | md5sum
> > > dd if=/dev/ram0 | md5sum
> > > mdadm --create md127 -f -n 2 -l raid1 /dev/ram0 /dev/ram1 <<EOF
> > > yes
> > > EOF
> > > sleep 10
> > > cat /proc/mdstat
> > > mkfs.ext4 /dev/md127
> > > mount /dev/md127 /tmp
> > > touch /tmp/bla.txt
> > >
> > > echo -n "
> > > [global]
> > > rw=write
> > > direct=1
> > > size=1G
> > > ioengine=libaio
> > > iodepth=128
> > > iodepth_batch_submit=128
> > > iodepth_batch_complete=128
> > > numjobs=4
> > > group_reporting
> > > [job0]
> > > filename=/tmp/bla.txt
> > > [job1]
> > > filename=/tmp/bla.txt
> > > [job2]
> > > filename=/tmp/bla.txt
> > > [job3]
> > > filename=/tmp/bla.txt
> > > " > fio.ini
> > >
> > > cat fio.ini
> > >
> > > fio fio.ini
> > >
> > > ls -lah /tmp/
> > >
> > > umount /tmp
> > >
> > > mdadm --stop md127
> > >
> > > dd if=/dev/ram1 obs=1M ibs=1M skip=4 | md5sum
> > > dd if=/dev/ram0 obs=1M ibs=1M skip=4 | md5sum
> > > ==========================
> > >
> > > Is this expected behavior with async IO?
> > >
> > > Thanks,
> > > -Swapnil
> > > --
> > It's pretty easy to reproduce the problem.
> > It looks quite clear: the IO to the two legs gets reordered, which leads to
> > both legs having different data.
> >
> > We extended the script a bit to better demonstrate the problem.
> > We've seen it on kernel versions 4.4 and 4.14. I suspect the latest
> > upstream behaves the same, but we haven't tested it.
> >
> > Could you give us a hint on how we could avoid or fix the problem?
>
> Sorry, it's not clear to me, but are you using "md5sum"
> on the _whole_ RAID-1 components?
> If this is the case, I think these will always be different.
>
> Try to check "md5sum" on the newly created array, before
> anything else.
> I guess they'll be different.
>
> The point is that the "md" headers will already be different,
> since the UUID is different between the two components.
>
> Furthermore, depending on the filesystem, inconsistencies
> could be created, but these should be only
> in unused array areas.
>
> In any case, maybe I got it wrong and it is something
> completely different.
>
> bye,
>
> pg
> >
> > Thanks,
> > Jack
> >
>
> --
>
> piergiorgio
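
To narrow down where the two legs diverge, rather than only seeing that the
two md5sums differ, a per-chunk comparison may help. The following is only a
minimal sketch and not something posted in the thread: diff_chunks is a
hypothetical helper, and the 1 MiB chunk size simply mirrors the
"ibs=1M skip=4" used in the script.

```shell
# Hypothetical helper (not from the thread): report every 1 MiB chunk
# that differs between two legs.
#   $1, $2 : the two legs (block devices or plain files)
#   $3     : first chunk to compare (4 would mirror "skip=4" in the script)
#   $4     : total number of 1 MiB chunks on each leg
# Returns 1 if at least one chunk differs, 0 otherwise.
diff_chunks() {
    leg_a=$1
    leg_b=$2
    i=$3
    total=$4
    rc=0
    while [ "$i" -lt "$total" ]; do
        # Read exactly one chunk from the same offset on each leg and hash it.
        sum_a=$(dd if="$leg_a" bs=1M skip="$i" count=1 2>/dev/null | md5sum)
        sum_b=$(dd if="$leg_b" bs=1M skip="$i" count=1 2>/dev/null | md5sum)
        if [ "$sum_a" != "$sum_b" ]; then
            echo "chunk $i differs"
            rc=1
        fi
        i=$((i + 1))
    done
    return $rc
}
```

Assuming rd_size=2097152 is in KiB (2 GiB per ramdisk, so 2048 chunks), this
could be called after "mdadm --stop md127" as:
diff_chunks /dev/ram0 /dev/ram1 4 2048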