Re: RAID1 sometimes have different data on the slave devices

On Tue, Aug 14 2018, Danil Kipnis wrote:

>>> On 08/11/2018 02:06 AM, NeilBrown wrote:
>>>> It might be expected behaviour with async direct IO.
>>>> Two threads writing with O_DIRECT io to the same address could result in
>>>> different data on the two devices.  This doesn't seem to me to be a
>>>> credible use-case though.  Why would you ever want to do that in
>>>> practice?
>>>> 
>>>> NeilBrown
>>>
>>>   My only thought is that while the credible case may be weak, if it is something
>>> that can be protected against with a few conditionals that keep the data on
>>> the slaves from diverging, then it's worth a couple of conditions to
>>> prevent the nut that knows just enough about dd from confusing things...
>>
>>Yes, it can be protected against - the code is already written.
>>If you have a 2-drive raid1 and want it to be safe against this attack,
>>simply:
>>
>>  mdadm /dev/md127 --grow --level=raid5
>>
>>This will add the required synchronization between writes so that
>>multiple writes to the one block are linearized.  There will be a
>>performance impact.
>
> Hi Neil,
>
> if I stored all the inflight writes in, say, an rb-tree keyed by their offsets,
> looked up the offset of each incoming write in the tree and, if it was found,
> postponed the write until the one to the same offset returned: would that solve
> the problem? Apart from the performance penalty of the search, do you think
> it would, in theory, prevent writes to the same sector from being reordered?

You would need to either:
1/ divide each request up into 1-block units or
2/ use an interval tree
as requests can overlap even though they start at different offsets.
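To make the overlap point concrete, here is a minimal Python sketch (not md code; `InflightWrites` and its methods are hypothetical names). It records each in-flight write as a half-open sector range and checks a new write against all of them, so it catches a write that starts at a different offset but still overlaps. A linear scan stands in for a real interval tree here; only the overlap test is the point.

```python
class InflightWrites:
    """Hypothetical tracker of in-flight writes as half-open sector ranges [start, end)."""

    def __init__(self):
        self.ranges = []

    @staticmethod
    def overlaps(a_start, a_end, b_start, b_end):
        # Two half-open ranges overlap iff each one starts before the other ends.
        return a_start < b_end and b_start < a_end

    def conflicts(self, start, length):
        """In-flight ranges that a new write of `length` sectors at `start` must wait for."""
        end = start + length
        return [r for r in self.ranges if self.overlaps(start, end, *r)]

    def try_start(self, start, length):
        """Admit the write if nothing overlaps; otherwise the caller must postpone it."""
        if self.conflicts(start, length):
            return False
        self.ranges.append((start, start + length))
        return True

    def complete(self, start, length):
        # Called once both mirror legs have acknowledged the write.
        self.ranges.remove((start, start + length))


t = InflightWrites()
print(t.try_start(0, 8))   # True: sectors 0..7 are now in flight
print(t.try_start(4, 8))   # False: overlaps 0..7 even though it starts at sector 4
starts = {0}               # what a lookup keyed only on start offset would hold
print(4 in starts)         # False: such a lookup misses that overlap
t.complete(0, 8)
print(t.try_start(4, 8))   # True: admitted once the first write has returned
```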

RAID5 splits requests up and uses a hash table.
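Option 1, in the spirit of what RAID5 does, can be sketched by splitting each request into fixed-size units and keeping the busy units in a hash table (a Python set here). The unit size of 8 sectors and the names `units` and `UnitLocks` are assumptions for illustration; this is not the md raid5 implementation.

```python
UNIT_SECTORS = 8  # hypothetical unit size (e.g. one 4 KiB page of 512-byte sectors)


def units(start, length):
    """Unit numbers covered by the half-open sector range [start, start + length)."""
    first = start // UNIT_SECTORS
    last = (start + length - 1) // UNIT_SECTORS
    return range(first, last + 1)


class UnitLocks:
    """Hypothetical hash table of busy units; a write proceeds only if all its units are free."""

    def __init__(self):
        self.busy = set()  # a Python set is a hash table

    def try_start(self, start, length):
        needed = set(units(start, length))
        if needed & self.busy:
            return False  # some unit is still being written; postpone this request
        self.busy |= needed
        return True

    def complete(self, start, length):
        self.busy -= set(units(start, length))


locks = UnitLocks()
print(locks.try_start(0, 8))  # True: takes unit 0
print(locks.try_start(4, 8))  # False: sectors 4..11 need units 0 and 1, and unit 0 is busy
print(locks.try_start(8, 8))  # True: needs unit 1 only
```

Compared with the range check, each request costs a few hash probes instead of a search over everything in flight, but in this sketch two writes that touch different sectors of the same unit are serialized needlessly.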

>
> Thank you,
>
> Danil.
>
> P.S.
> When I try to do mdadm /dev/md127 --grow --level=raid5 on my raid1, I get this:
> mdadm: Sorry, no reshape for RAID-1!

You must have a broken version of mdadm.
The code in
   git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
does not contain the string "Sorry".


> unfreeze
> Do I need some specific version?

Only one that isn't broken.

> What would a raid5 on top of only two drives
> actually do?

I don't understand why that is a difficult question.
What does a RAID5 on top of 3 drives do?
What does a RAID5 on top of 4 drives do?
Now generalize to N drives.
Now set N=2.

You cannot set N=1, because then each stripe has N-1 == 0 data drives, so
there is no data stored, and nothing to use to compute the parity.
N=2 doesn't have this (or any) problem.
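The generalization can be checked with a few lines of Python, assuming the usual RAID5 parity of a bytewise XOR over the N-1 data blocks in a stripe (the function names are illustrative). With N=2 there is one data block, and the XOR of a single block is the block itself, so both drives end up holding the same bytes, just as a mirror would.

```python
from functools import reduce


def xor_parity(data_blocks):
    """RAID5-style parity: bytewise XOR of the N-1 data blocks in a stripe."""
    if not data_blocks:
        # N = 1: no data drives, so nothing is stored and there is nothing to XOR.
        raise ValueError("a stripe needs at least one data block")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))


def stripe(data_blocks):
    """Contents of one stripe across N = len(data_blocks) + 1 drives."""
    return list(data_blocks) + [xor_parity(data_blocks)]


# N = 3: two data blocks plus their XOR.
print(stripe([b"\x0f\xf0", b"\xff\x00"]))  # [b'\x0f\xf0', b'\xff\x00', b'\xf0\xf0']
# N = 2: one data block; its parity is the block itself.
print(stripe([b"md"]))                      # [b'md', b'md']
```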

NeilBrown
