Luca Berra <bluca@xxxxxxxxxx> wrote:
> we can have a series of failures which must be accounted for and dealt
> with according to a policy that might be site-specific.
>
> A) Failure of the standby node
>    A.1) the active is allowed to continue in the absence of a data replica
>    A.2) disk writes from the active should return an error.
> we can configure this setting in advance.

OK. One normally wants RAID to provide continuity of service in real
time, however. Your choice A2 is not aimed at that, but at guaranteeing
the existence of an exact copy of whatever is written. That seems to me
to have applications only in accountancy :-).

> B) Failure of the active node
>    B.1) the standby node immediately takes ownership of the data and
>         resumes processing
>    B.2) the standby node remains idle

Well, morally that's the same set of choices as for A. You might as
well pair them with A1 and A2.

> C) communication failure between the two nodes (and we don't have an
>    external mechanism to arbitrate the split-brain condition)
>    C.1) both systems panic and halt
>    C.2) A1 + B2

I don't see the point of anything except A1+B1 or A2+B2 as policies.
But A1+B1 will normally cause divergence, unless the failure is due to
actual isolation of, say, system A from the whole external net.
Provided the route between the two systems passes through the router
that chooses whether to use A or B for external contacts, I don't see
how a loss of contact can be anything but a breakdown of that router
(though you could argue for a very wacky router). In which case it
doesn't matter what you choose, because nothing will write to either.

>    C.3) A2 + B2
>    C.4) A1 + B1
>    C.5) A2 + B1 (which hopefully will go to A2 itself)
>
> D) communication failure between the two nodes (admitting we have an
>    external mechanism to arbitrate the split-brain condition)
>    D.1) A1 + B2
>    D.2) A2 + B2
>    D.3) B1 then A1
>    D.4) B1 then A2

I would hope that we could at least guarantee that if comms fail
between them, it is because ONE (or more) of them is out of contact
with the world. We can achieve that condition via routing. In that case
either A1+B1 or A2+B2 would do, depending on your aims (continuity of
service or data replication).

> E) rolling failure (C, then B)
>
> F) rolling failure (D, then B)

Not sure what these mean.

> G) a failed node is restored
>
> H) a node (re)starts while the other is failed
>
> I) a node (re)starts during C
>
> J) a node (re)starts during D
>
> K) a node (re)starts during E
>
> L) a node (re)starts during F

Ecch. Well, you are very thorough. This is well thought through.

> scenarios without sub-scenarios are left as an exercise to the reader,
> or I might find myself losing a job :)
>
> now evaluate all scenarios under the following drivers:
> 1) data availability above all others
> 2) replica of data above all others

Exactly. I see only those as sensible aims.

> 3) data availability above replica, but data consistency above
>    availability

Heck! Well, that is very, very thorough.

> (*) if you got this far, add asynchronous replicas to the picture.

I don't know what to say. In many of those situations we do not know
what to do, but your analysis is excellent, and allows us to at least
think about it.

Peter
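
P.S. Just to pin down the two sane pairings in something executable,
here is a rough sketch, in plain Python, of A1+B1 versus A2+B2 as a
decision table. Every name in it is invented by me for illustration;
it has nothing to do with the actual md code. The point it makes is
the one above: under the availability driver, a link loss (C) with no
arbiter is the only case where a node can choose wrongly.

# policy.py -- illustrative only; not real md/raid code.
from enum import Enum, auto

class Driver(Enum):
    AVAILABILITY = auto()   # driver 1: data availability above all
    REPLICATION  = auto()   # driver 2: replica of data above all

class Event(Enum):
    STANDBY_FAILED = auto() # scenario A
    ACTIVE_FAILED  = auto() # scenario B
    LINK_LOST      = auto() # scenario C (no external arbiter)

def decide(driver, event, i_am_active):
    """What should *this* node do? Returns a human-readable action."""
    if driver is Driver.AVAILABILITY:
        # A1 + B1: keep serving; the standby takes over on active failure.
        if event is Event.STANDBY_FAILED:
            return "A1: continue, degraded (no replica)"
        if event is Event.ACTIVE_FAILED:
            return "B1: take ownership, resume processing"
        # C under A1+B1 is the dangerous case: if both nodes can still
        # reach clients, they diverge (split brain).
        return ("A1: continue" if i_am_active
                else "B1: take over -- RISK of divergence")
    else:
        # A2 + B2: refuse writes without a replica; the standby stays idle.
        if event is Event.STANDBY_FAILED:
            return "A2: fail writes (no replica to mirror to)"
        if event is Event.ACTIVE_FAILED:
            return "B2: remain idle"
        return "A2: fail writes" if i_am_active else "B2: remain idle"

if __name__ == "__main__":
    for d in Driver:
        for e in Event:
            print(d.name, e.name, "->", decide(d, e, i_am_active=True))

Your D scenarios would just replace the guesswork in the LINK_LOST
branch with a query to the arbiter before picking a side.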