Dear Soumya,

First of all thank you for your answer.

On Fre, 2015-11-27 at 14:27 +0530, Soumya Koduri wrote:
> Hi,
>
> On 11/27/2015 01:58 PM, ml wrote:
> > Dear All,
> >
> > I am trying to get a nfs-ganesha ha cluster running, with 3, CentOS
> > Linux release 7.1.1503 nodes. I use the package
> > glusterfs-ganesha-3.7.6-1.el7.x86_64 to get the HA scripts. So far it
> > works fine when i stop the nfs-ganesha service on one of the node it
> > moves the virtual ip to one of the other node, altai-dead_ip-1
> > resource is created properly:
> >
> >
> > root@rnas2 ~# pcs status
> > Cluster name: ganesha-cluster-dmath
> > Last updated: Thu Nov 26 10:41:07 2015          Last change: Thu Nov 26 10:40:06 2015 by root via cibadmin on altai
> > Stack: corosync
> > Current DC: rnas2 (version 1.1.13-a14efad) - partition with quorum
> > 3 nodes and 13 resources configured
> >
> > Online: [ altai kaukasus rnas2 ]
> >
> > Full list of resources:
> >
> >  Clone Set: nfs-mon-clone [nfs-mon]
> >      Started: [ altai kaukasus rnas2 ]
> >  Clone Set: nfs-grace-clone [nfs-grace]
> >      Started: [ altai kaukasus rnas2 ]
> >  kaukasus-cluster_ip-1 (ocf::heartbeat:IPaddr): Started kaukasus
> >  kaukasus-trigger_ip-1 (ocf::heartbeat:Dummy): Started kaukasus
> >  altai-cluster_ip-1 (ocf::heartbeat:IPaddr): Started kaukasus
> >  altai-trigger_ip-1 (ocf::heartbeat:Dummy): Started kaukasus
> >  rnas2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started rnas2
> >  rnas2-trigger_ip-1 (ocf::heartbeat:Dummy): Started rnas2
> >  altai-dead_ip-1 (ocf::heartbeat:Dummy): Started altai
> >
> > PCSD Status:
> >   kaukasus: Online
> >   altai: Online
> >   rnas2: Online
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> >
> > But when i just disconnect the network on one of the node, in this
> > case altai (or poweroff),
> >
> >
> > root@altai ~# ifdown bond0
> >
> >
> > it takes down the whole cluster.
> > I found the following message in the logs:
> >
> >
> > Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)
> >
> >
> > I wonder if i just misconfigured something or if this is not
> > supported yet?
> >
>
> Since its a 3-node cluster, quorum shall be enabled. When any of those
> machine/its IP is down, quorum shall be lost resulting in pacemaker
> shutting down entire cluster. If possible could you check the same
> scenario with 4-node setup?

Sorry in advance, I might not understand correctly, as I am not a native
English speaker. Are you telling me that in a 3-node cluster, quorum is
lost when one node's IP is down?

In any case, I am setting up an additional node to test a 4-node setup.
But even then, if I take down one node and nfs-grace_start
(/usr/lib/ocf/resource.d/heartbeat/ganesha_grace) does not run properly
on the other nodes, could the whole cluster go down because quorum is
lost again?

Yours,
Rigi

>
> Thanks,
> Soumya
>
> >
> > below the log during the take down:
> >
> > Nov 26 10:44:24 rnas2 corosync[8848]: [TOTEM ] A new membership (129.132.145.5:1048) was formed. Members left: 2
> > Nov 26 10:44:24 rnas2 attrd[17253]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
> > Nov 26 10:44:24 rnas2 attrd[17253]: notice: Removing all altai attributes for attrd_peer_change_cb
> > Nov 26 10:44:25 rnas2 corosync[8848]: [QUORUM] Members[2]: 1 3
> > Nov 26 10:44:25 rnas2 corosync[8848]: [MAIN ] Completed service synchronization, ready to provide service.
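
A minimal sketch of the majority arithmetic behind the quorum question
above; it is not taken from this thread. It assumes corosync's default
votequorum behaviour of one vote per node and a strict-majority
threshold. The actual votequorum settings on these machines have not
been checked, and the function names are hypothetical. Under that
assumption a 3-node cluster that loses one node still holds 2 of 3
votes, which would be consistent with the "[QUORUM] Members[2]: 1 3"
line quoted just above.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of total_votes (assumed default rule)."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_present: int) -> bool:
    """True if the surviving partition still holds a strict majority."""
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Assumption: every node carries exactly one vote.
    for total in (3, 4):
        for lost in (1, 2):
            present = total - lost
            state = "quorate" if has_quorum(total, present) else "NOT quorate"
            print(f"{total} nodes, {lost} lost, {present} present: {state}")
```

Under the same assumption a 4-node cluster also survives one lost node
but not two, since 2 of 4 votes is not a strict majority.
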
> > Nov 26 10:44:25 rnas2 cib[17250]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
> > Nov 26 10:44:25 rnas2 cib[17250]: notice: Removing altai/2 from the membership list
> > Nov 26 10:44:25 rnas2 cib[17250]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
> > Nov 26 10:44:25 rnas2 pacemakerd[17249]: notice: Node altai[2] - state is now lost (was member)
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Node altai[2] - state is now lost (was member)
> > Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
> > Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for shutdown action on 2
> > Nov 26 10:44:25 rnas2 attrd[17253]: notice: Removing altai/2 from the membership list
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of altai not matched
> > Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Removing altai/2 from the membership list
> > Nov 26 10:44:25 rnas2 attrd[17253]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
> > Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for shutdown action on 2
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of altai not matched
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs-grace:0 (Started kaukasus)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs-grace:1 (Started rnas2)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart kaukasus-cluster_ip-1 (Started kaukasus)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start altai-cluster_ip-1 (kaukasus)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start altai-trigger_ip-1 (kaukasus)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart rnas2-cluster_ip-1 (Started rnas2)
> > Nov 26 10:44:25 rnas2 pengine[17254]: notice: Calculated Transition 85: /var/lib/pacemaker/pengine/pe-input-86.bz2
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 29: stop kaukasus-cluster_ip-1_stop_0 on kaukasus
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 35: start altai-trigger_ip-1_start_0 on kaukasus
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 37: stop rnas2-cluster_ip-1_stop_0 on rnas2 (local)
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 36: monitor altai-trigger_ip-1_monitor_10000 on kaukasus
> > Nov 26 10:44:25 rnas2 IPaddr(rnas2-cluster_ip-1)[30797]: INFO: IP status = ok, IP_CIP=
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation rnas2-cluster_ip-1_stop_0: ok (node=rnas2, call=82, rc=0, cib-update=210, confirmed=true)
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 21: stop nfs-grace_stop_0 on kaukasus
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 23: stop nfs-grace_stop_0 on rnas2 (local)
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=84, rc=0, cib-update=211, confirmed=true)
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 22: start nfs-grace_start_0 on kaukasus
> > Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 24: start nfs-grace_start_0 on rnas2 (local)
> > Nov 26 10:44:26 rnas2 ntpd[1700]: Deleting interface #27 bond0, 129.132.145.23#123, interface stats: received=0, sent=0, dropped=0, active_time=69258 secs
> > Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs-grace_start_0 process (PID 30810) timed out
> > Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs-grace_start_0:30810 - timed out after 40000ms
> > Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)
> > Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs-grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition aborted by nfs-grace_start_0 'modify' on rnas2: Event failed (magic=2:1;24:85:0:836713e1-c9d3-43f8-bffd-756e023eee8a,...event:381, 0)
> > Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs-grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
> > Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs-grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
> > Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs-grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 85 (Complete=13, Pending=0, Fired=0, Skipped=3, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-86.bz2): Stopped
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Recover nfs-grace:0 (Started kaukasus)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop nfs-grace:1 (rnas2)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start kaukasus-cluster_ip-1 (kaukasus)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start altai-cluster_ip-1 (kaukasus)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start rnas2-cluster_ip-1 (rnas2)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated Transition 86: /var/lib/pacemaker/pengine/pe-input-87.bz2
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop nfs-grace:0 (kaukasus)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop nfs-grace:1 (rnas2)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start kaukasus-cluster_ip-1 (kaukasus - blocked)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start altai-cluster_ip-1 (kaukasus - blocked)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start rnas2-cluster_ip-1 (rnas2 - blocked)
> > Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated Transition 87: /var/lib/pacemaker/pengine/pe-input-88.bz2
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 2: stop nfs-grace_stop_0 on kaukasus
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 6: stop nfs-grace_stop_0 on rnas2 (local)
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=86, rc=0, cib-update=218, confirmed=true)
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 87 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-88.bz2): Complete
> > Nov 26 10:45:05 rnas2 crmd[17255]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >
> > Yours,
> > Rigi
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >