Hello,

I tried with a 4-node setup, but the effect is the same: the cluster goes down when one of the nodes is offline. I thought that even in a 3-node setup, with 2 nodes online and only one gone, the majority of 2 nodes up vs. 1 node down should not result in a lost quorum?

I created the gluster volume with the following command:

#> gluster volume create scratch replica 4 transport tcp kaukasus:/tank/brick1 altai:/tank/brick1 rnas2:/tank/brick1 bbk1:/scratch/brick1 force

The following is the log during the takedown of one node (altai):

Nov 30 11:23:43 rnas2 corosync[16869]: [TOTEM ] A new membership (129.132.145.5:1120) was formed. Members left: 2
Nov 30 11:23:43 rnas2 cib[16088]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 cib[16088]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 cib[16088]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 crmd[16093]: notice: Our peer on the DC (altai) is dead
Nov 30 11:23:43 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Removing all altai attributes for attrd_peer_change_cb
Nov 30 11:23:43 rnas2 crmd[16093]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK...llback ]
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Lost attribute writer altai
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 corosync[16869]: [QUORUM] Members[3]: 1 3 4
Nov 30 11:23:43 rnas2 crmd[16093]: notice: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 corosync[16869]: [MAIN ] Completed service synchronization, ready to provide service.
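The [QUORUM] line above still reports three members (1 3 4), so corosync itself does not seem to lose quorum here. In case it is useful, these are the standard checks I plan to run next (corosync-quorumtool and crm_attribute ship with corosync/pacemaker; "scratch" is the volume from above, and I have not explicitly set any of the gluster quorum options mentioned in the comment):

#> corosync-quorumtool -s                 # expected vs. total votes and whether this partition is quorate
#> crm_attribute -G -n no-quorum-policy   # what pacemaker is configured to do when it thinks quorum is lost
#> gluster volume info scratch            # "Options Reconfigured" shows whether cluster.server-quorum-type / cluster.quorum-type are set

The rest of the log from the takedown continues below: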
Nov 30 11:23:43 rnas2 crmd[16093]: notice: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=...t_vote ]
Nov 30 11:23:44 rnas2 crmd[16093]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl...espond ]
Nov 30 11:23:44 rnas2 IPaddr(rnas2-cluster_ip-1)[10934]: INFO: IP status = ok, IP_CIP=
Nov 30 11:23:44 rnas2 crmd[16093]: notice: Operation rnas2-cluster_ip-1_stop_0: ok (node=rnas2, call=53, rc=0, cib-update=36, confirmed=true)
Nov 30 11:23:44 rnas2 crmd[16093]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=55, rc=0, cib-update=37, confirmed=true)
Nov 30 11:23:44 rnas2 attrd[16091]: notice: Processing sync-response from bbk1
Nov 30 11:23:45 rnas2 ntpd[1700]: Deleting interface #47 bond0, 129.132.145.23#123, interface stats: received=0, sent=0, dropped=0...783 secs
Nov 30 11:24:24 rnas2 lrmd[16090]: warning: nfs-grace_start_0 process (PID 10947) timed out
Nov 30 11:24:24 rnas2 lrmd[16090]: warning: nfs-grace_start_0:10947 - timed out after 40000ms
Nov 30 11:24:24 rnas2 crmd[16093]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=56, timeout=40000ms)
Nov 30 11:24:24 rnas2 crmd[16093]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=57, rc=0, cib-update=39, confirmed=true)
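The nfs-grace start above times out after 40000 ms. Purely as an illustration, and assuming the cluster is managed with pcs (the resource name is taken from the log, and 90s is just an example value, not a recommendation), the configured start timeout could be inspected and raised like this:

#> pcs resource show nfs-grace                         # show the resource configuration, including operation timeouts
#> pcs resource update nfs-grace op start timeout=90s  # example only: raise the start timeout

Of course, if the underlying problem is elsewhere, a longer timeout would only mask it.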
I discovered that when I restart the pacemaker service on one of the running nodes, it can successfully bring the cluster online again:

root@kaukasus ~# systemctl restart pacemaker

Nov 30 11:45:36 rnas2 crmd[16093]: notice: Our peer on the DC (kaukasus) is dead
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK...llback ]
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=...t_vote ]
Nov 30 11:45:36 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Removing all kaukasus attributes for attrd_peer_change_cb
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 cib[16088]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 cib[16088]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 cib[16088]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl...espond ]
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 cib[16088]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16591]: INFO: Adding inet address 129.132.145.23/32 with broadcast address 129.132.... bond0
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16600]: INFO: Bringing device bond0 up
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16609]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agen...t_used
Nov 30 11:45:46 rnas2 crmd[16093]: notice: Operation rnas2-cluster_ip-1_start_0: ok (node=rnas2, call=58, rc=0, cib-update=44, con...ed=true)
Nov 30 11:45:48 rnas2 ntpd[1700]: Listen normally on 48 bond0 129.132.145.23 UDP 123
Nov 30 11:45:48 rnas2 ntpd[1700]: new interface(s) found: waking up resolver

Yours,
Rigi

On Mon, 2015-11-30 at 15:26 +0530, Soumya Koduri wrote: