On Monday 12 December 2011 19:51:23, you wrote:
> split-brain

I'm participating in the NEO[1] project (an object database server with redundancy - that last bit is the one relevant to this discussion), which faces the same kind of problem (storage nodes dying whether or not the cluster is functional, dead nodes coming back to life later, etc). So we had to design some countermeasures to handle split-brain.

I'm happy to recognise some equivalents of the decisions we took on NEO, and I'll be following this thread with attention (we haven't sought much review of our design so far).

I would suggest one thing: use a fixed increment for the "metadata version" number. Time representation is not reliable IMHO, especially at times when you need to set up an array: faulty BIOS battery, old RTC drifting either way, no NTP to correct this (either no server available or no client to access one). If the timestamp is affected by timezone (and especially DST), that makes matters worse.

Admittedly, a fixed increment exposes the user to problems if he decides to run the two halves of a split brain independently, lets their data diverge, reaches a (controllable) point where the version number is at some convenient value, and then lets the array assemble itself and burst into flames. Though the user has to jump through hoops to get there. With timestamps, all it takes is a non-monotonic RTC.

Side note: if anyone knows a time source available to userland which is not affected by date/ntpd/ntpdate nor timezones nor DST (but can drift when the computer is powered down - but if possible not when suspended), please tell me.

[1] http://pypi.python.org/pypi/neoppod

Regards,
-- 
Vincent Pelletier
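
To illustrate the fixed-increment suggestion, here is a minimal sketch in Python. It is not md's or NEO's actual code: `Member`, `record_update` and `freshest` are hypothetical names made up for this example, and the wall-clock values are arbitrary. It shows a clock reset making a stale member look freshest by timestamp while the counter still picks the right one, and then the caveat that two independently run halves can reach the same counter value.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical array member; field names are assumptions for this sketch."""
    name: str
    version: int = 0    # fixed-increment metadata version
    stamp: float = 0.0  # wall-clock time of last update, kept only for comparison


def record_update(members, wall_clock):
    """Apply one metadata update to every member that is still reachable."""
    new_version = max(m.version for m in members) + 1
    for m in members:
        m.version = new_version
        m.stamp = wall_clock


def freshest(members, key):
    """Return the sorted names of the members holding the highest value of `key`."""
    best = max(getattr(m, key) for m in members)
    return sorted(m.name for m in members if getattr(m, key) == best)


a, b = Member("a"), Member("b")

# First update: both members are present and the clock is sane.
record_update([a, b], wall_clock=1_323_715_883.0)

# "b" drops out, then the RTC is reset to an earlier date (dead battery, say)
# before the next update reaches "a" only.
record_update([a], wall_clock=946_684_800.0)

by_version = freshest([a, b], "version")  # counter-based choice
by_stamp = freshest([a, b], "stamp")      # timestamp-based choice
print("by version:", by_version)
print("by stamp:  ", by_stamp)

# Caveat: two halves run independently can reach the same version number
# with diverged data, and the counter alone cannot tell them apart.
left, right = Member("left", version=5), Member("right", version=5)
ambiguous = freshest([left, right], "version")
print("split-brain tie:", ambiguous)
```

In this run the counter selects "a", the member that really saw the last update, while the timestamp selects the stale "b". The tie in the last step is the failure mode that the counter alone does not cover.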