Hi,
But are you telling me that in a 3-node cluster, quorum is lost when one of the nodes' IPs is down?
Yes. It's a limitation of Pacemaker/Corosync: if the nodes participating in the cluster cannot communicate with a majority of the members (i.e. quorum is lost), the cluster is shut down.
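For what it's worth, you can inspect the quorum state and how Pacemaker is set to react when quorum is lost with the standard tooling; nothing below is specific to the Ganesha setup, and the property value shown is only illustrative:

    # show current vote/quorum state on any cluster node
    corosync-quorumtool -s

    # what Pacemaker does on quorum loss is governed by no-quorum-policy
    # (stop / freeze / ignore); "stop" stops all resources, which can look
    # like the whole cluster going down even though the nodes stay up
    pcs property list --all | grep no-quorum-policy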
However, I am setting up an additional node to test a 4-node setup. Even then, if I take down one node and nfs-grace_start (/usr/lib/ocf/resource.d/heartbeat/ganesha_grace) does not run properly on the other nodes, could the whole cluster go down because quorum is lost again?
That's strange. We have tested such configurations quite a few times but haven't hit this issue. (CC'ing Saurabh, who has been testing many such configurations.)
Recently we have observed the resource agents (nfs-grace_*) timing out sometimes, especially when a node is taken down, but that shouldn't cause the entire cluster to shut down. Could you check the logs (/var/log/messages, /var/log/pacemaker.log) for any errors/warnings reported when one node is taken down in the 4-node setup? A few commands that may help narrow it down are sketched below.
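Something along these lines should show where it fails; the resource name nfs-grace and the timeout value are assumptions on my side, so substitute whatever names pcs status reports on your cluster:

    # overall cluster state and any failed resource actions
    pcs status

    # fail counts for the grace resource on each node (resource name assumed)
    pcs resource failcount show nfs-grace

    # errors/warnings around the time the node was taken down
    grep -iE 'error|warn' /var/log/pacemaker.log
    grep -i ganesha /var/log/messages

    # if the agent is merely timing out, raising the start timeout may help
    # (60s is just an example value)
    pcs resource update nfs-grace op start timeout=60s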
Thanks,
Soumya