Re: Split-Brain Protection for MD arrays

On Thu, 15 Dec 2011 16:29:12 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Neil,
> thanks for the review, and for detailed answers to my questions.
> 
> > When we mark a device 'failed' it should stay marked as 'failed'.  When the
> > array is optimal again it is safe to convert all 'failed' slots to
> > 'spare/missing' but not before.
> I did not understand all that reasoning. When you say "slot", you mean
> index in the dev_roles[] array, correct? If yes, I don't see what
> importance the index has, compared to the value of the entry itself
> (which is "role" in your terminology).
> Currently, 0xFFFE means both "failed" and "missing", and that makes
> perfect sense to me. Basically this means that this entry of
> dev_roles[] is unused. When a device fails, it is kicked out of the
> array, so its entry in dev_roles[] becomes available.
> (You once mentioned that for older arrays, their dev_roles[] index was
> also their role, perhaps you are concerned about those too).
> In any case, I will be watching for changes in this area, if you
> decide to make them (although I think this might break backwards
> compatibility, unless a new version of superblock will be used).

Maybe...  as I said, "confusing" is a relevant word in this area.
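
To make the dev_roles[] encoding under discussion concrete, here is a minimal
Python sketch. It is not md or mdadm code: the names describe_role and
free_slots are made up for illustration, and the 0xFFFF "spare" value is an
assumption based on the v1.x superblock format as I understand it, not
something stated in this thread. Only the 0xFFFE meaning comes from the
discussion above.

```python
# Sketch only: names are illustrative, not taken from md or mdadm source.
ROLE_SPARE = 0xFFFF    # assumption: v1.x superblock value for a spare
ROLE_FAILED = 0xFFFE   # per the thread: covers both "failed" and "missing"


def describe_role(role: int) -> str:
    """Return a human-readable label for one dev_roles[] entry."""
    if role == ROLE_FAILED:
        return "failed/missing"
    if role == ROLE_SPARE:
        return "spare"
    return f"active, role {role}"


def free_slots(dev_roles: list[int]) -> list[int]:
    """Indexes whose entry is 0xFFFE, i.e. reusable under the current scheme."""
    return [i for i, role in enumerate(dev_roles) if role == ROLE_FAILED]
```

Under the current scheme a slot that once held a failed device is
indistinguishable from one that was never used, which is the ambiguity being
debated here.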

> 
> > If you have a working array and you initiate a write of a data block and the
> > parity block, and if one of those writes fails, then you no longer have a
> > working array.  Some data blocks in that stripe cannot be recovered.
> > So we need to make sure that admin knows the array is dead and doesn't just
> > re-assemble and think everything is OK.
> I see your point. I don't know what's better: to know the "last known
> good" configuration, or to know that the array has failed. I guess, I
> am just used to the former.

Possibly an 'array-has-failed' flag in the metadata would allow us to keep
the last known-good config.  But as it isn't any good any more I don't really
see the point.


> 
> > I think to resolve this issue we need 2 things.
> >
> > 1/ when assembling an array if any device thinks that the 'chosen' device has
> >   failed, then don't trust that device.
> I think that if any device thinks that "chosen" has failed, then either
> that device has a more recent superblock, in which case it should be
> "chosen" instead, or the "chosen" device's superblock is the one that
> counts, in which case it doesn't matter what the other device thinks,
> because the array will be assembled according to the "chosen"
> superblock.

This is exactly what the current code does and it allows you to assemble an
array after a split-brain experience.  This is bad.  Checking what other
devices think of the chosen device lets you detect the effect of a
split-brain.
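
A minimal Python sketch of that cross-check, under stated assumptions: the
Superblock class and the functions pick_chosen and dissenters are
hypothetical, devices are keyed by name for readability (the real dev_roles[]
is indexed by device number), and "chosen" is taken to be the superblock with
the highest event count. It only reports which devices record the chosen
device as failed; what to do with them is left to the caller.

```python
from dataclasses import dataclass

ROLE_FAILED = 0xFFFE  # per the thread: "failed" or "missing"


@dataclass
class Superblock:
    name: str        # device name (illustrative)
    events: int      # event counter; higher means more recent
    dev_roles: dict  # this device's view: device name -> role value


def pick_chosen(sbs):
    """Pick the superblock with the highest event count."""
    return max(sbs, key=lambda sb: sb.events)


def dissenters(sbs):
    """Names of devices whose own superblock records 'chosen' as failed."""
    chosen = pick_chosen(sbs)
    return [sb.name for sb in sbs
            if sb is not chosen
            and sb.dev_roles.get(chosen.name) == ROLE_FAILED]


# Split-brain: A and B each ran alone and each marked the other as failed.
a = Superblock("A", events=12, dev_roles={"A": 0, "B": ROLE_FAILED})
b = Superblock("B", events=10, dev_roles={"A": ROLE_FAILED, "B": 1})
print(dissenters([a, b]))
```

In the ordinary case, where one device simply dropped out and went stale, the
stale device still believes the chosen one is working, so the list is empty.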


> 
> > 2/ Don't erase 'failed' status from dev_roles[] until the array is
> > optimal.
> 
> Neil, I think both these points don't resolve the following simple
> scenario: RAID1 with drives A and B. Drive A fails, and the array continues
> to operate on drive B. After reboot, only drive A is accessible. If we go
> ahead with assembly, we will see stale data. If, however, after reboot we
> see only drive B, then (since A is "faulty" in B's superblock) we can go
> ahead and assemble. The change I suggested will abort in the first case,
> but will assemble in the second case.

Using --no-degraded will do what you want in both cases.  So no code change
is needed!

> 
> But obviously, you know better what MD users expect and want.

Don't bet on it.
So far I have one vote - from you - that --no-degraded should be the default
(I think that is what you are saying).  If others agree I'll certainly
consider it more.

Note that "--no-degraded" doesn't exactly mean "don't assemble a degraded
array".  It means "don't assemble an array more degraded than it was the last
time it was working".  i.e. require that all devices that are working
according to the metadata are actually available.
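
As a rough illustration of that rule (a Python sketch, not mdadm's actual
logic; no_degraded_ok is a made-up name and devices are represented by plain
strings), the check is a subset test, applied here to the RAID1 scenario
where A failed and the array carried on with B alone:

```python
def no_degraded_ok(working_per_metadata, available):
    """True if every device the metadata lists as working is available."""
    return set(working_per_metadata) <= set(available)


# A's superblock is stale: it still lists both A and B as working.
# B's superblock is current: it lists only B as working.
only_a_visible = no_degraded_ok({"A", "B"}, {"A"})
only_b_visible = no_degraded_ok({"B"}, {"B"})
print(only_a_visible, only_b_visible)
```

With only A visible the check refuses, because A's metadata expects B as
well. With only B visible it passes, because B's metadata no longer counts A
as working.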

NeilBrown



> Thanks again for taking time and reviewing the proposal! And yes, next
> time, I will put everything in the email.
> 
> Alex.
