On Wed, 9 Apr 2008 18:50:31 +0530 "Krishna Srinivas" <krishna@xxxxxxxxxxxxx> wrote:

> OK, we are trying to reproduce the setup and fix the problem...
> btw, can you try without unify and see if failover happens cleanly.
> Correct the wiki needs to be changed. I am CC'ing its author Paul
> England -> <pengland (at) wxc.co.nz>

I removed the Unify block from the server config on both server nodes, then attempted the same failover test again - and this time, it worked!

The remaining functional node (dfsC) continued to serve the data, and the client (dfsA) continued to access the mountpoint as if nothing had changed. While dfsD was still down, I edited a text file in the mountpoint. The changes were available immediately on dfsC, as well as on the other client (dfsB).

I then plugged dfsD back into the storage network and cat'd the text file from dfsA, which triggered the self-heal. The file changes were replicated to dfsD, which appeared to rejoin the cluster with no ill effects. This is, clearly, good news. :)

It would appear that the culprit all along was the errant "Unify" block in the config file - this should /really/ be fixed on the wiki, as I am surely not the only person to have followed that example. (I've pasted a trimmed sketch of the server spec at the bottom of this mail.)

I will now proceed to re-run all of my benchmark and I/O stress tests against the cluster - but this time, I will randomly disconnect and re-connect dfsD. Of course, I'll let the list know about the results of this round of testing.

Thank you all, as always, for your attention in this matter. Your help and comments have been highly informative and greatly appreciated.

--
Daniel Maher <dma AT witbe.net>
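
For reference, this is roughly what the trimmed server spec looks like with the Unify block gone - just the posix brick and the protocol/server volume. The file path, export directory, and volume names below are placeholders rather than a verbatim copy of my config, so please treat it as a sketch:

  # /etc/glusterfs/glusterfs-server.vol (same on dfsC and dfsD)

  # the backend directory exported by this node
  volume brick
    type storage/posix
    option directory /data/export       # placeholder path
  end-volume

  # serve the brick to the clients over TCP
  volume server
    type protocol/server
    option transport-type tcp/server
    subvolumes brick
    option auth.ip.brick.allow *        # wide open - fine for my test lab only
  end-volume

The clients still see the same AFR'd mountpoint as before, and the self-heal I mentioned is kicked off simply by reading the file through that mount, e.g. "cat /mnt/glusterfs/testfile".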