On Tue, Aug 14 2018, Danil Kipnis wrote:

>>> On 08/11/2018 02:06 AM, NeilBrown wrote:
>>>> It might be expected behaviour with async direct IO.
>>>> Two threads writing with O_DIRECT io to the same address could result in
>>>> different data on the two devices.  This doesn't seem to me to be a
>>>> credible use-case though.  Why would you ever want to do that in
>>>> practice?
>>>>
>>>> NeilBrown
>>>
>>> My only thought is that while the credible case may be weak, if it is
>>> something that can be protected against with a few conditionals to prevent
>>> the data on the slaves diverging -- then it's worth a couple of conditions
>>> to prevent the nut who knows just enough about dd from confusing things....
>>
>> Yes, it can be protected against - the code is already written.
>> If you have a 2-drive raid1 and want it to be safe against this attack,
>> simply:
>>
>>    mdadm /dev/md127 --grow --level=raid5
>>
>> This will add the required synchronization between writes so that
>> multiple writes to the one block are linearized.  There will be a
>> performance impact.
>
> Hi Neil,
>
> if I were to store all the in-flight writes in, say, an rb-tree keyed by
> their offsets, look up the offset of each incoming write in the tree and,
> if it is found, postpone the write until the one to the same offset
> returns: would that solve the problem?  I mean, apart from the performance
> penalty due to the search, do you think it would, in theory, cover the
> reordering of writes going to the same sector?

You would need to either:
 1/ divide each request up into 1-block units, or
 2/ use an interval tree, as requests can overlap even though they
    start at different offsets.

RAID5 splits requests up and uses a hash table.

>
> Thank you,
>
> Danil.
>
> P.S.
> When I try to do mdadm /dev/md127 --grow --level=raid5 on my raid1, I get this:
> mdadm: Sorry, no reshape for RAID-1!

You must have a broken version of mdadm.  The code in
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git does not contain
the string "Sorry".

> unfreeze
> Do I need some specific version?

Only one that isn't broken.

> What would a raid5 on top of only two drives
> actually do?

I don't understand why that is a difficult question.
What does a RAID5 on top of 3 drives do?
What does a RAID5 on top of 4 drives do?
Now generalize to N drives.  Now set N=2.

You cannot set N=1, because then each stripe has N-1 == 0 data drives,
so there is no data stored, and nothing to use to compute the parity.
N=2 doesn't have this (or any) problem.

NeilBrown
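
To make point 2/ above concrete, here is a minimal user-space sketch (not
the actual md code; the struct and function names are illustrative only)
of the conflict question that in-flight-write tracking has to answer.  A
linear list stands in for the interval tree just to keep the example
short; the second lookup in main() is the case that an rb-tree keyed only
on start offsets would miss:

    /*
     * Sketch: an incoming write must wait if its [start, start+len)
     * range intersects any in-flight write, even when the two start
     * offsets differ.  A real implementation would use an interval
     * tree or split requests into single blocks.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct inflight {
            unsigned long start;        /* first sector of the write */
            unsigned long len;          /* length in sectors */
            struct inflight *next;
    };

    static struct inflight *inflight_head;

    /* Two ranges overlap unless one ends before the other begins. */
    static bool ranges_overlap(unsigned long s1, unsigned long l1,
                               unsigned long s2, unsigned long l2)
    {
            return s1 < s2 + l2 && s2 < s1 + l1;
    }

    /* Return true if the new write must be postponed. */
    static bool write_conflicts(unsigned long start, unsigned long len)
    {
            struct inflight *w;

            for (w = inflight_head; w; w = w->next)
                    if (ranges_overlap(start, len, w->start, w->len))
                            return true;
            return false;
    }

    static void write_start(unsigned long start, unsigned long len)
    {
            struct inflight *w = malloc(sizeof(*w));

            w->start = start;
            w->len = len;
            w->next = inflight_head;
            inflight_head = w;          /* leaked on exit; demo only */
    }

    int main(void)
    {
            write_start(100, 8);        /* in-flight: sectors 100-107 */

            /* Same offset: conflicts; the rb-tree scheme catches this. */
            printf("100+8 conflicts: %d\n", write_conflicts(100, 8));

            /* Different offset but overlapping: a tree keyed only on
             * start offsets would miss this one. */
            printf("104+8 conflicts: %d\n", write_conflicts(104, 8));

            /* Disjoint range: no conflict, may proceed immediately. */
            printf("110+4 conflicts: %d\n", write_conflicts(110, 4));
            return 0;
    }

The md RAID5 code answers the same question differently, as noted above:
it splits each request into stripe-sized units and looks them up in a
stripe hash table, so a whole-range overlap test is never needed.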
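
And to spell out the N=2 generalization: under the usual RAID5 definition,
each stripe holds N-1 data blocks plus one parity block, and the parity is
the XOR of the data blocks.  With N=2, the XOR over a single data block is
the block itself, so a 2-drive RAID5 stores the same bytes as a mirror
while keeping RAID5's per-stripe write serialization.  A small
self-contained demo, again with illustrative names only:

    /*
     * Sketch: RAID5 parity is the XOR of the N-1 data blocks in a
     * stripe.  For N=2 there is exactly one data block, so the
     * parity block is a byte-for-byte copy of it.
     */
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 8        /* tiny blocks, just for the demo */

    /* parity = data[0] ^ data[1] ^ ... ^ data[ndata-1] */
    static void compute_parity(const unsigned char data[][BLOCK_SIZE],
                               int ndata, unsigned char *parity)
    {
            memset(parity, 0, BLOCK_SIZE);
            for (int d = 0; d < ndata; d++)
                    for (int i = 0; i < BLOCK_SIZE; i++)
                            parity[i] ^= data[d][i];
    }

    int main(void)
    {
            /* N = 2 drives: one data block per stripe (N-1 == 1). */
            unsigned char data[1][BLOCK_SIZE] = { "stripe0" };
            unsigned char parity[BLOCK_SIZE];

            compute_parity(data, 1, parity);

            /* XOR over one block is the identity, so the parity
             * block mirrors the data block exactly. */
            printf("parity == data: %d\n",
                   memcmp(parity, data[0], BLOCK_SIZE) == 0);
            return 0;
    }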