On Wed, Aug 15, 2018 at 1:59 AM NeilBrown <neilb@xxxxxxxx> wrote:
>
> On Tue, Aug 14 2018, Danil Kipnis wrote:
>
> >>> On 08/11/2018 02:06 AM, NeilBrown wrote:
> >>>> It might be expected behaviour with async direct IO.
> >>>> Two threads writing with O_DIRECT I/O to the same address could
> >>>> result in different data on the two devices. This doesn't seem to
> >>>> me to be a credible use-case though. Why would you ever want to do
> >>>> that in practice?
> >>>>
> >>>> NeilBrown
> >>>
> >>> My only thought is that while the credible case may be weak, if it
> >>> is something that can be protected against with a few conditionals
> >>> to prevent the data on the slaves diverging -- then it's worth a
> >>> couple of conditions to prevent the nut who knows just enough about
> >>> dd from confusing things....
> >>
> >> Yes, it can be protected against - the code is already written.
> >> If you have a 2-drive raid1 and want it to be safe against this
> >> attack, simply:
> >>
> >>   mdadm /dev/md127 --grow --level=raid5
> >>
> >> This will add the required synchronization between writes so that
> >> multiple writes to the one block are linearized. There will be a
> >> performance impact.
> >
> > Hi Neil,
> >
> > if I were to store all the in-flight writes in, say, an rb-tree keyed
> > by their offsets, look up the offset of each incoming write in the
> > tree and, if it is found, postpone the write until the one to the
> > same offset returns: would that solve the problem? I mean, apart from
> > the performance penalty due to the search, do you think it would, in
> > theory, cover the reordering of writes going to the same sector?
>
> You would need to either:
> 1/ divide each request up into 1-block units, or
> 2/ use an interval tree,
> as requests can overlap even though they start at different offsets.
>
> RAID5 splits requests up and uses a hash table.

Right. Thanks for the explanation.

> >
> > Thank you,
> >
> > Danil.
> >
> > P.S.
> > When I try to do mdadm /dev/md127 --grow --level=raid5 on my raid1,
> > I get this:
> > mdadm: Sorry, no reshape for RAID-1!
>
> You must have a broken version of mdadm.
> The code in
>   git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> does not contain the string "Sorry".

I was using a patched version - my bad, sorry for the noise.

>
> > What would a raid5 on top of only two drives
> > actually do?
>
> I don't understand why that is a difficult question.
> What does a RAID5 on top of 3 drives do?
> What does a RAID5 on top of 4 drives do?
> Now generalize to N drives.
> Now set N=2.

I had the naive understanding that with raid5 one chunk goes to the
first drive, another to the second, and the XOR of the two to the
third. Does that mean that with only two drives a chunk and its XOR
have to go to the same drive? Never mind, I should read the code, I
know.

> You cannot set N=1, because then each stripe has N-1 == 0 data drives,
> so there is no data stored, and nothing to use to compute the parity.
> N=2 doesn't have this (or any) problem.
>
> NeilBrown

Best,
Danil
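
P.S. For concreteness, here is a minimal userspace sketch of the first
option Neil describes: splitting each request into 1-block units and
tracking in-flight blocks in a hash table, similar in spirit to
raid5's stripe hash. All names here are hypothetical - this is not
md's actual code - and a real driver would queue and re-drive a
blocked request rather than just report it.

  /*
   * Sketch: serialize overlapping writes by splitting each request
   * into 1-block units and hashing in-flight blocks.
   */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define HASH_SIZE 256

  struct busy_block {
      unsigned long long block;
      struct busy_block *next;
  };

  static struct busy_block *hash_tbl[HASH_SIZE];

  /* Mark 'block' in flight; fail if a write to it is already pending. */
  static bool try_lock_block(unsigned long long block)
  {
      struct busy_block *b;
      unsigned int h = block % HASH_SIZE;

      for (b = hash_tbl[h]; b; b = b->next)
          if (b->block == block)
              return false;

      b = malloc(sizeof(*b));
      if (!b)
          return false;
      b->block = block;
      b->next = hash_tbl[h];
      hash_tbl[h] = b;
      return true;
  }

  /* Drop 'block' from the table once its write completes. */
  static void unlock_block(unsigned long long block)
  {
      struct busy_block **p = &hash_tbl[block % HASH_SIZE];

      for (; *p; p = &(*p)->next) {
          if ((*p)->block == block) {
              struct busy_block *b = *p;
              *p = b->next;
              free(b);
              return;
          }
      }
  }

  /*
   * A request covering [start, start+len) may only be issued once
   * every block it touches is free, so overlapping requests
   * serialize even when their starting offsets differ.
   */
  static bool try_issue_write(unsigned long long start,
                              unsigned long long len)
  {
      unsigned long long i;

      for (i = 0; i < len; i++) {
          if (!try_lock_block(start + i)) {
              while (i--)           /* roll back; queue for later */
                  unlock_block(start + i);
              return false;
          }
      }
      return true;  /* submit the I/O, then unlock_block() each block */
  }

  int main(void)
  {
      printf("write [0,4): %s\n",
             try_issue_write(0, 4) ? "issued" : "queued");
      printf("write [2,6): %s\n",
             try_issue_write(2, 6) ? "issued" : "queued");
      return 0;
  }

The second write starts at a different offset but overlaps blocks 2
and 3 of the first, so it must wait - exactly the case an rb-tree
keyed only on starting offset would miss.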
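P.P.S. To make the N=2 generalization concrete: the raid5 parity
block is the XOR of the N-1 data blocks in a stripe. With N=2 there
is a single data block per stripe, and the XOR of one block is the
block itself, so the "parity" drive holds an identical copy of the
data - the on-disk contents end up raid1-like, with raid5's write
serialization on top. A toy illustration (again hypothetical code,
not md's):

  #include <stdio.h>

  /* Parity of a raid5 stripe: XOR of its ndata data blocks. */
  static unsigned char parity(const unsigned char *data, int ndata)
  {
      unsigned char p = 0;
      int i;

      for (i = 0; i < ndata; i++)
          p ^= data[i];
      return p;
  }

  int main(void)
  {
      unsigned char n3[] = { 0xaa, 0x55 }; /* N=3: 2 data blocks */
      unsigned char n2[] = { 0xaa };       /* N=2: 1 data block  */

      printf("N=3 parity: 0x%02x\n", parity(n3, 2)); /* 0xff */
      printf("N=2 parity: 0x%02x\n", parity(n2, 1)); /* 0xaa == the data */
      return 0;
  }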