Excerpts from Daniel Maher's message of Wed Apr 09 16:40:20 +0530 2008:

> Hello all,
>
> After upgrading to 1.3.8pre5, i performed a simple failover test of my
> two-node HA Gluster cluster (wherein one of the nodes is unplugged from
> the network). Unfortunately, the results were - once again - absolutely
> disastrous.
>
> After unplugging one of the two nodes, the cluster became incredibly
> unstable, and the mountpoint on the client bounced between
> non-existent and simply bizarre. This condition remained even after
> plugging the node back onto the network. Restarting glusterfsd on both
> storage nodes did not help at all.
>
> At this point i would be very interested to know if anybody has set up
> a functioning two-node HA cluster using AFR, which can withstand one of
> the nodes temporarily disappearing. Is this something Gluster is
> designed to do, or am i expecting too much?

This is definitely something GlusterFS is designed to handle. I've set
up this configuration in our lab and am looking into it.

> For those following along, a discussion of the first failover test is
> available from the gluster-devel archives:
> http://lists.gnu.org/archive/html/gluster-devel/2008-04/msg00010.html
>
> The environment is identical to that described by the email linked
> above, so i won't describe it again here. This time, however, i had
> full DEBUG logging enabled. I have made these logs (all 3000+ lines)
> available on pastebin:
>
> dfsC (node that stayed up)    : http://pastebin.ca/978162
> dfsD (node that was unplugged): http://pastebin.ca/978166

Is the order of subvolumes for AFR in your server specfiles the same?
Specifically, on dfsC you should have

  subvolumes gfs-dfsD-ds gfs-ds

and on dfsD you should have

  subvolumes gfs-ds gfs-dfsC-ds

Is this the case? If not, failover will not work.
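For reference, the relevant parts of the two server specfiles would
look something like the sketch below. The backend directory, transport
options and the AFR volume name are assumptions based on your volume
names, not your actual values; the only point here is the subvolume
ordering.

On dfsC:

  # dfsC server specfile (sketch -- paths/options are placeholders)
  volume gfs-ds
    type storage/posix
    option directory /data/export      # local backend (path assumed)
  end-volume

  volume gfs-dfsD-ds
    type protocol/client
    option transport-type tcp/client
    option remote-host dfsD            # the other node
    option remote-subvolume gfs-ds
  end-volume

  volume gfs-ds-afr
    type cluster/afr
    # first subvolume = dfsD's backend, second = the local (dfsC) backend
    subvolumes gfs-dfsD-ds gfs-ds
  end-volume

On dfsD:

  # dfsD server specfile (sketch)
  volume gfs-ds
    type storage/posix
    option directory /data/export      # local backend (path assumed)
  end-volume

  volume gfs-dfsC-ds
    type protocol/client
    option transport-type tcp/client
    option remote-host dfsC            # the other node
    option remote-subvolume gfs-ds
  end-volume

  volume gfs-ds-afr
    type cluster/afr
    # first subvolume = dfsD's backend, second = dfsC's backend --
    # the same logical order as on dfsC
    subvolumes gfs-ds gfs-dfsC-ds
  end-volume

With this ordering, the first AFR subvolume resolves to dfsD's backend
on both nodes, so AFR sees the two replicas in the same order
everywhere, which is what failover depends on.

Vikas
--
http://vikas.80x25.org/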