Re: [PATCH 1/2] md bitmap bug fixes

Lars Marowsky-Bree <lmb@xxxxxxx> · Sat, 19 Mar 2005 15:07:54 +0100

On 2005-03-19T14:27:45, "Peter T. Breuer" <ptb@xxxxxxxxxxxxxx> wrote:

> > Which one of the datasets you choose you could either arbitrate via some
> > automatic mechanisms (drbd-0.8 has a couple) or let a human decide.
> But how on earth can you get into this situation? It still is not clear
> to me, and it seems to me that there is a horrible flaw in the managing
> algorithm for the failover if it can happen, and one should fix it.

You mean, like an admin screwup which should never happen? ;-)

Remember what RAID is about: errors which _should not_ occur (if the
world were perfect and software and hardware never failed), but which
_do_ occur anyway with a given probability, because the real world
doesn't always do the right thing.

It's futile to argue that it should never occur; moral arguments
don't change reality.

Split-brain is a well studied subject, and while many prevention
strategies exist, errors occur even in these algorithms; and there's
always a trade-off: For some scenarios, one might accept a very low
probability of split-brain occurring in exchange for a higher guarantee
that service will 'always' be provided. It all depends on the kind of
data and service, the requirements and the cost associated with it.

> > The default with drbd-0.7 is that they will detect this situation has
> > occurred and refuse to start replication unless the admin intervenes and
> > decides which side wins.
> Hmm. I don't believe it can detect it reliably. It is always possible
> for both sides to have written different data in the same places, etc.

drbd can detect this reliably by its generation counters; the one
element which matters here is the one which tracks whether the device
has been promoted to primary while being disconnected.

(Each side keeps its own generation counters and its own bitmap &
journal, and during regular operation, they are all sync'ed. So they can
be used to figure out what diverged 'easily' enough.)
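
To make the idea concrete, here is a rough sketch of how such a counter
can flag divergence on reconnect. This is not drbd's actual metadata
layout or code; the names (NodeMeta, agreed, current, compare) are
made up for illustration, and it assumes a single counter that is bumped
whenever a node is promoted to primary while disconnected and that is
synced between the peers while they are connected.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    IN_SYNC = "neither side was primary while disconnected"
    A_IS_NEWER = "only A diverged; resync B from A"
    B_IS_NEWER = "only B diverged; resync A from B"
    SPLIT_BRAIN = "both diverged; refuse to replicate until the admin decides"


@dataclass
class NodeMeta:
    """Hypothetical per-node metadata (an assumption, not drbd's format)."""
    agreed: int   # counter value both peers shared when last connected
    current: int  # this node's counter right now


def promote_while_disconnected(meta: NodeMeta) -> None:
    # The node became primary without seeing its peer: record that fact.
    meta.current += 1


def compare(a: NodeMeta, b: NodeMeta) -> Verdict:
    """Decide on reconnect which side, if any, holds newer data."""
    if a.agreed != b.agreed:
        # The peers never shared this state; treat it like split-brain.
        return Verdict.SPLIT_BRAIN
    a_diverged = a.current > a.agreed
    b_diverged = b.current > b.agreed
    if a_diverged and b_diverged:
        return Verdict.SPLIT_BRAIN
    if a_diverged:
        return Verdict.A_IS_NEWER
    if b_diverged:
        return Verdict.B_IS_NEWER
    return Verdict.IN_SYNC


if __name__ == "__main__":
    a, b = NodeMeta(agreed=3, current=3), NodeMeta(agreed=3, current=3)
    promote_while_disconnected(a)
    print(compare(a, b).name)
    promote_while_disconnected(b)
    print(compare(a, b).name)
```

The point of the sketch is that detection does not need to compare the
data itself: it only needs each side to record that it was primary
while alone. Which blocks actually differ is then a matter for the
bitmap, which the sketch leaves out.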

If you don't believe something, why don't you go read up ;-)

This also is a reasonably well studied subject; there are bits in "Fault
Tolerance in Distributed Systems" by Jalote, and Philipp Reisner also
has a paper on it online; I think parts of it are also covered by his
thesis.

Sincerely,
    Lars Marowsky-Brée <lmb@xxxxxxx>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
