Re: RAID1 sometimes have different data on the slave devices

In the attachment you can find a script that compares the md5 sum of a
_file_ on top of ext4 on top of raid1, on each leg, after running fio.
You will see that after a couple of minutes the file that fio is
writing to differs between the two legs, because ext4 enforces
consistency of its metadata, but not of the data inside files.
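
The attachment is not reproduced inline. As a rough sketch of the
comparison step only, assuming GNU coreutils and two hypothetical helper
names (leg_sum, legs_match) that are not taken from the attached script,
the checksums of two paths can be compared while skipping a number of MiB
at the start, as the original script does with skip=4:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the attached script.

# Print the md5 of a path, skipping the first $2 MiB (default 0).
leg_sum() {
    dd if="$1" bs=1M skip="${2:-0}" 2>/dev/null | md5sum | cut -d' ' -f1
}

# Succeed (exit 0) only if both paths have the same checksum
# after skipping the first $3 MiB of each.
legs_match() {
    [ "$(leg_sum "$1" "$3")" = "$(leg_sum "$2" "$3")" ]
}
```

For example, after stopping the array:
legs_match /dev/ram0 /dev/ram1 4 || echo "legs differ"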

What we see is that the consistency of raid1 is lower than that of a
single disk, as per the definition of raid1. Raid1 itself can't prevent
writes from being reordered inside its legs/disks. The only way for an
application to enforce ordering is to wait for each write to return, or
to use journaling, barriers, etc. Am I right?
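
As a minimal sketch of the first option (waiting for each write to
return before issuing the next write to the same block), assuming GNU dd
and a hypothetical helper name write_block_sync that does not come from
the thread:

```shell
#!/bin/sh
# Hypothetical helper: write one 4 KiB block from file $3 to block
# index $2 of target $1, and return only after dd has fsync'ed the
# target (conv=fsync). notrunc keeps the rest of the target intact.
write_block_sync() {
    dd if="$3" of="$1" bs=4096 count=1 seek="$2" conv=notrunc,fsync 2>/dev/null
}
```

Two calls for the same block index are then strictly ordered: the second
write is not submitted until the first has completed, so no two writes to
that block are in flight at once. This is a sketch of the idea, not a
statement about what fio or md do internally.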

On Fri, Aug 10, 2018 at 7:22 PM Swapnil Ingle
<swapnil.ingle@xxxxxxxxxxxxxxxx> wrote:
>
> Hi pg,
>
> We are not considering the MD superblock while comparing the cksum; it
> is skipped while calculating the cksum.
>
> I would expect the data on both legs to always be the same in good
> cases; that's what we are trying to compare.
>
> Thanks
> -swapnil
>
>
> On Fri, 10 Aug 2018, 18:53 Piergiorgio Sartor,
> <piergiorgio.sartor@xxxxxxxx> wrote:
> >
> > On Fri, Aug 10, 2018 at 01:31:18PM +0200, Jack Wang wrote:
> > > +cc  Shaohua and NeilBrown
> > > Swapnil Ingle <swapnil.ingle@xxxxxxxxxxxxxxxx> wrote on Thu, Aug 9, 2018 at 4:20 PM:
> > > >
> > > > Hi,
> > > >
> > > > For RAID1 with 2 slave devices underneath, we see different data on the slaves.
> > > > The IO is generated using FIO with ioengine=libaio.
> > > >
> > > > The following script sometimes leads to different md5sums on the slave devices.
> > > >
> > > > ==================
> > > > modprobe -r brd
> > > > modprobe brd rd_nr=2 rd_size=2097152
> > > > dd if=/dev/zero of=/dev/ram1 oflag=direct || true
> > > > dd if=/dev/zero of=/dev/ram0 oflag=direct || true
> > > > dd if=/dev/ram1 | md5sum
> > > > dd if=/dev/ram0 | md5sum
> > > > mdadm --create md127 -f -n 2 -l raid1 /dev/ram0 /dev/ram1 <<EOF
> > > > yes
> > > > EOF
> > > > sleep 10
> > > > cat /proc/mdstat
> > > > mkfs.ext4 /dev/md127
> > > > mount /dev/md127 /tmp
> > > > touch /tmp/bla.txt
> > > >
> > > > echo -n "
> > > > [global]
> > > > rw=write
> > > > direct=1
> > > > size=1G
> > > > ioengine=libaio
> > > > iodepth=128
> > > > iodepth_batch_submit=128
> > > > iodepth_batch_complete=128
> > > > numjobs=4
> > > > group_reporting
> > > > [job0]
> > > > filename=/tmp/bla.txt
> > > > [job1]
> > > > filename=/tmp/bla.txt
> > > > [job2]
> > > > filename=/tmp/bla.txt
> > > > [job3]
> > > > filename=/tmp/bla.txt
> > > > " > fio.ini
> > > >
> > > > cat fio.ini
> > > >
> > > > fio fio.ini
> > > >
> > > > ls -lah /tmp/
> > > >
> > > > umount /tmp
> > > >
> > > > mdadm --stop md127
> > > >
> > > > dd if=/dev/ram1 obs=1M ibs=1M skip=4 | md5sum
> > > > dd if=/dev/ram0 obs=1M ibs=1M skip=4 | md5sum
> > > > ==========================
> > > >
> > > > Is this expected behavior with async IO's?
> > > >
> > > > Thanks,
> > > > -Swapnil
> > > > --
> > > It's pretty easy to reproduce the problem.
> > > It looks quite clear: the IO to both legs gets reordered, which leads to
> > > the legs having different data on them.
> > >
> > > We extended the script a bit to better demonstrate the problem.
> > > We've seen it on different kernel versions, 4.4 and 4.14. I feel the
> > > latest upstream is the same, but haven't tested it.
> > >
> > > Could you guys give us a hint on how we could avoid/fix the problem?
> >
> > Sorry, it's not clear to me, but are you using "md5sum"
> > on the _whole_ RAID-1 components?
> > If this is the case, I think these will always be different.
> >
> > Try to check "md5sum" on the newly created array, before
> > anything else.
> > I guess they'll be different.
> >
> > The point is the "md" headers will be already different,
> > since the UUID is different between the two components.
> >
> > Furthermore, depending on the filesystem, it could be
> > that inconsistencies are created, but these should only
> > be in unused array areas.
> >
> > In any case, maybe I got it wrong and it is something
> > completely different.
> >
> > bye,
> >
> > pg
> >
> > >
> > > Thanks,
> > > Jack
> >
> >
> >
> > --
> >
> > piergiorgio



-- 
Danil Kipnis
Linux Kernel Developer

Attachment: inconsistent_raid_ext4.sh
Description: application/shellscript

