Re: RAID1 sometimes have different data on the slave devices

On 12.08.2018 at 14:14, Danil Kipnis wrote:
> Fio (or some other application, such as a key-value or object database)
> submits two writes that go to the same offset in a file (or block
> device). Since fio is using libaio, _both_ of those writes reach the md
> layer. Md forwards the writes to each of its legs and waits for the
> confirmations to return. On one leg/disk the writes are executed in
> one order, and on the other leg the other way round. The order in which
> the writes are executed is decided by, e.g., the firmware inside each
> of the two HDDs; md has no way to enforce the same order on
> each leg. And now you have one value on one leg and another value on
> the other. Md receives the confirmations for both writes and tells the
> user that everything is fine. The user will then read only one of those
> values all the time, at least with md-raid, where the read order is static,
> until of course you remove the leg that contained this value, and
> suddenly the user reads the other one.
> To quote Wikipedia on the CAP theorem, this property, "Consistency: Every
> read receives the most recent write or an error", cannot be guaranteed by
> RAID1.
> So the application must enforce it, as ext4 or any journaling file
> system does for its metadata. In the most primitive form this
> means: do not submit two writes at the same time; wait for the first one
> to return, then submit the next one.
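
For reference, the scenario in the quoted text can be sketched as a toy
simulation. This is only an illustration under stated assumptions, not md's
actual code: each leg is modelled as a dict, the names apply_writes,
concurrent_outcomes and serialized_outcome are hypothetical, and "reordering"
simply means that each leg may apply the two in-flight writes in any order.

```python
import itertools

BLOCK = 0  # the single offset that both writes target (illustrative)


def apply_writes(order):
    """Apply the writes to one simulated leg in the given order.

    Returns the value that ends up stored at BLOCK on that leg.
    """
    leg = {}
    for value in order:
        leg[BLOCK] = value
    return leg[BLOCK]


def concurrent_outcomes(writes):
    """All (leg0, leg1) end states when both writes are in flight at once.

    Assumption: each leg may independently pick any order for the writes.
    """
    orders = list(itertools.permutations(writes))
    return {(apply_writes(a), apply_writes(b)) for a in orders for b in orders}


def serialized_outcome(writes):
    """End state when each write completes on both legs before the next starts."""
    legs = [{}, {}]
    for value in writes:
        for leg in legs:
            leg[BLOCK] = value
    return legs[0][BLOCK], legs[1][BLOCK]


outcomes = concurrent_outcomes(["A", "B"])
diverged = sorted(o for o in outcomes if o[0] != o[1])
print("possible end states:", sorted(outcomes))
print("legs disagree in:", diverged)
print("serialized:", serialized_outcome(["A", "B"]))
```

In this model two of the four possible end states leave the legs with
different values, while waiting for the first write before submitting the
second always leaves both legs with the last value written.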

I see no logic here, because from a mirror such as RAID1/RAID10 I expect
identical data on both mirrors, without any ifs, buts, or maybes.

"Two threads writing with O_DIRECT io to the same address could result
in different data on the two devices" makes no sense. Everything talks
to the RAID1 layer, which is a block device and is expected to always have
the same data on both mirrors. O_DIRECT doesn't bypass the RAID layer,
because it doesn't even know about the physical disks underneath.

If any workload (except a hard crash) leads to different data, it's
a bug that should be fixed sooner rather than later.


