--- Anand Avati <avati@xxxxxxxxxxxxx> wrote:

> Cold boot is not a problem. Let me explain with an
> example
>
> 1. Node A and Node B are UP
> 2. Node B goes down
> 3. Node A gets changes
> 4. Node A goes down
>
> now,
>
> 5a. Node A and B comes back together - no problem
> 5b. Node A alone comes back - no problem
> 5c. Node B alone comes back - potential problem if
> same files or directories changed in step 3 are
> accessed.
> 5d. Node B alone comes back and before data is
> accessed Node A comes back too - no problem.
>
>
> supporting 5c requires quite a bit of new framework
> code which is currently not in our highest
> priority. are the above restrictions unacceptable
> in your case?

Well, your scenario assessments make sense to me, but I would still consider this a cold boot problem! The scenarios you describe are all the possible cold boot scenarios. The cold boot "problem" is figuring out which one of these scenarios you have so that you can avoid 5c!

5c is indeed unacceptable in my case (and, I would think, in most people's case). 5a is a theoretical ideal scenario, but it is a race condition and will probably never actually happen; 5b, 5c or 5d will happen instead. But without any special logic it seems hard to know whether you are in 5b or 5c.

As a simple, but less than ideal solution, I was suggesting to always force cold boot scenarios to be 5a by never bringing a cluster online (exporting the FS to clients) until 5a is achieved and a recursive find is run. I guess that this could be done with some scripts; the only tricky part would be swapping configurations so that only the node performing the find has access to the cluster until it is in sync.

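
To make the rule concrete, here is a rough sketch in Python of the gating check such scripts would have to implement. Everything in it is hypothetical: `Node`, `healed` and `may_export` are names made up for illustration and are not part of any existing framework, and "healed" simply means the recursive find has finished on that node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    up: bool      # node is currently reachable
    healed: bool  # assumption: True once the recursive find has completed on it


def may_export(nodes, after_cold_boot):
    """Decide whether the FS may be exported to clients (illustrative only)."""
    up_nodes = [n for n in nodes if n.up]
    if after_cold_boot:
        # Force scenario 5a: every node is back and every node is in sync.
        return len(up_nodes) == len(nodes) and all(n.healed for n in nodes)
    # During normal operation, at least one fully healed node must be up.
    return any(n.healed for n in up_nodes)
```

With this rule, 5b and 5c both keep the cluster offline after a cold boot, and a node that is still syncing is never allowed to serve the cluster on its own.
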
But even if I solve this with some clever scripting, it still leaves one potential gap. How do I prevent the following?

1) Node A goes down
2) Node B has a large write
3) Node A comes up and starts syncing
4) Node B goes down before sync is finished

-> A is not "healed" enough to be able to serve the cluster properly; the cluster should be offline!

Maybe I will break out my scripting gloves and experiment a little. Does this seem feasible? Will I be able to achieve what I want? It seems like a special dynamic config file framework/layout would be needed to make this possible.

Of course, the drawback to always forcing 5a is that you can never bring the cluster up after a cold boot until all the nodes are online. This could be a serious problem if you have many nodes. Perhaps there is a way to "layer" a cluster so that you never need more than two nodes to force 5a, even when there are more than two nodes in a cluster?

Still slightly puzzled that 5c is acceptable to most people... scratch head.

Thanks,

-Martin