On Wed, 9 Apr 2008 18:50:31 +0530 "Krishna Srinivas" <krishna@xxxxxxxxxxxxx> wrote:

> OK, we are trying to reproduce the setup and fix the problem...
> btw, can you try without unify and see if failover happens cleanly.
> Correct the wiki needs to be changed. I am CC'ing its author Paul
> England -> <pengland (at) wxc.co.nz>

I removed the Unify block from the server config on both server nodes, then attempted the same failover test again - and this time, it worked!

The remaining functional node (dfsC) continued to serve the data, and the client (dfsA) continued to access the mountpoint as if nothing had changed. While dfsD was still down, I edited a text file in the mountpoint. The changes were available immediately on dfsC, as well as on the other client (dfsB).

I then plugged dfsD back into the storage network and cat'd the text file from dfsA, which triggered the self-heal. The file changes were replicated to dfsD, which appeared to rejoin the cluster with no ill effects. This is, clearly, good news. :)

It would appear that the culprit all along was the errant "Unify" block in the config file - this should /really/ be fixed on the wiki, as I am surely not the only person to have followed that example. (I've pasted a trimmed sketch of the server spec at the bottom of this mail.)

I will now proceed to re-run all of my benchmark and I/O stress tests against the cluster - but this time, I will randomly disconnect and re-connect dfsD. Of course, I'll let the list know about the results of this round of testing.

Thank you all, as always, for your attention in this matter. Your help and comments have been highly informative and greatly appreciated.

--
Daniel Maher <dma AT witbe.net>
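
For reference, this is roughly what the trimmed server spec looks like with the Unify block gone - just the posix brick and the protocol/server volume. The file path, export directory, and volume names below are placeholders rather than a verbatim copy of my config, so please treat it as a sketch:

  # /etc/glusterfs/glusterfs-server.vol (same on dfsC and dfsD)

  # the backend directory exported by this node
  volume brick
    type storage/posix
    option directory /data/export       # placeholder path
  end-volume

  # serve the brick to the clients over TCP
  volume server
    type protocol/server
    option transport-type tcp/server
    subvolumes brick
    option auth.ip.brick.allow *        # wide open - fine for my test lab only
  end-volume

The clients still see the same AFR'd mountpoint as before, and the self-heal I mentioned is kicked off simply by reading the file through that mount, e.g. "cat /mnt/glusterfs/testfile".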