Re: 3 node NFS-Ganesha Cluster

Hello,

I tried with a 4-node setup, but the effect is the same: the cluster goes down when one of the nodes is offline. I thought that even in a 3-node setup, when 2 nodes are online and only one is gone, the majority of 2 nodes up vs. 1 node down should not result in lost quorum?
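
For reference, a quick way to check whether corosync itself still considers the partition quorate after the node goes down, and what Pacemaker is configured to do on quorum loss (standard corosync/pacemaker tools; output wording varies by version):

#> corosync-quorumtool -s                   # shows Quorate: Yes/No plus vote counts
#> crm_mon -1 | head                        # the summary line should say "partition with quorum"
#> pcs property show no-quorum-policy       # stop/ignore/freeze behaviour on quorum loss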

I have created the gluster volume with the following command:

#> gluster volume create scratch replica 4 transport tcp kaukasus:/tank/brick1 altai:/tank/brick1 rnas2:/tank/brick1 bbk1:/scratch/brick1 force
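
To rule out the gluster side, the standard gluster commands below show whether the remaining bricks stay online while one node is down (the volume name "scratch" is taken from the create command above):

#> gluster peer status                 # surviving peers should remain "Connected"
#> gluster volume info scratch         # Type: Replicate, Number of Bricks: 1 x 4 = 4
#> gluster volume status scratch       # the three remaining bricks should still show Online: Y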

The following is the log during the takedown of one node (altai):
Nov 30 11:23:43 rnas2 corosync[16869]: [TOTEM ] A new membership (129.132.145.5:1120) was formed. Members left: 2
Nov 30 11:23:43 rnas2 cib[16088]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 cib[16088]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 cib[16088]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 crmd[16093]: notice: Our peer on the DC (altai) is dead
Nov 30 11:23:43 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Removing all altai attributes for attrd_peer_change_cb
Nov 30 11:23:43 rnas2 crmd[16093]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK...llback ]
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Lost attribute writer altai
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 attrd[16091]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 pacemakerd[16085]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: Removing altai/2 from the membership list
Nov 30 11:23:43 rnas2 corosync[16869]: [QUORUM] Members[3]: 1 3 4
Nov 30 11:23:43 rnas2 crmd[16093]: notice: Node altai[2] - state is now lost (was member)
Nov 30 11:23:43 rnas2 stonith-ng[16089]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
Nov 30 11:23:43 rnas2 corosync[16869]: [MAIN  ] Completed service synchronization, ready to provide service.
Nov 30 11:23:43 rnas2 crmd[16093]: notice: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=...t_vote ]
Nov 30 11:23:44 rnas2 crmd[16093]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl...espond ]
Nov 30 11:23:44 rnas2 IPaddr(rnas2-cluster_ip-1)[10934]: INFO: IP status = ok, IP_CIP=
Nov 30 11:23:44 rnas2 crmd[16093]: notice: Operation rnas2-cluster_ip-1_stop_0: ok (node=rnas2, call=53, rc=0, cib-update=36, confirmed=true)
Nov 30 11:23:44 rnas2 crmd[16093]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=55, rc=0, cib-update=37, confirmed=true)
Nov 30 11:23:44 rnas2 attrd[16091]: notice: Processing sync-response from bbk1
Nov 30 11:23:45 rnas2 ntpd[1700]: Deleting interface #47 bond0, 129.132.145.23#123, interface stats: received=0, sent=0, dropped=0...783 secs
Nov 30 11:24:24 rnas2 lrmd[16090]: warning: nfs-grace_start_0 process (PID 10947) timed out
Nov 30 11:24:24 rnas2 lrmd[16090]: warning: nfs-grace_start_0:10947 - timed out after 40000ms
Nov 30 11:24:24 rnas2 crmd[16093]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=56, timeout=40000ms)
Nov 30 11:24:24 rnas2 crmd[16093]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=57, rc=0, cib-update=39, confirmed=true)
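
The only hard error above is the nfs-grace_start timeout after 40000ms. As an experiment, the operation timeout can be raised and the failed action cleared with standard pcs commands (the resource name nfs-grace is taken from the log; 90s is just an example value):

#> pcs resource show nfs-grace                          # current operation timeouts
#> pcs resource update nfs-grace op start timeout=90s   # raise the start timeout above the 40s seen here
#> pcs resource cleanup nfs-grace                       # clear the failure so Pacemaker retries the start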

I discovered that when I restart the pacemaker service on one of the running nodes, it successfully brings the cluster online again:

root@kaukasus ~# systemctl restart pacemaker


Nov 30 11:45:36 rnas2 crmd[16093]: notice: Our peer on the DC (kaukasus) is dead
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK...llback ]
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=...t_vote ]
Nov 30 11:45:36 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Removing all kaukasus attributes for attrd_peer_change_cb
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 attrd[16091]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 cib[16088]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 cib[16088]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 cib[16088]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now lost (was member)
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: Removing kaukasus/1 from the membership list
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: Purged 1 peers with id=1 and/or uname=kaukasus from the membership cache
Nov 30 11:45:36 rnas2 pacemakerd[16085]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 crmd[16093]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl...espond ]
Nov 30 11:45:36 rnas2 stonith-ng[16089]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 attrd[16091]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:36 rnas2 cib[16088]: notice: crm_update_peer_proc: Node kaukasus[1] - state is now member (was (null))
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16591]: INFO: Adding inet address 129.132.145.23/32 with broadcast address 129.132.... bond0
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16600]: INFO: Bringing device bond0 up
Nov 30 11:45:46 rnas2 IPaddr(rnas2-cluster_ip-1)[16609]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agen...t_used
Nov 30 11:45:46 rnas2 crmd[16093]: notice: Operation rnas2-cluster_ip-1_start_0: ok (node=rnas2, call=58, rc=0, cib-update=44, con...ed=true)
Nov 30 11:45:48 rnas2 ntpd[1700]: Listen normally on 48 bond0 129.132.145.23 UDP 123
Nov 30 11:45:48 rnas2 ntpd[1700]: new interface(s) found: waking up resolver
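
After the restart, something like the following should confirm that the cluster is healthy again (same standard tools as above):

#> pcs status                  # remaining nodes online, cluster_ip-* and nfs-grace resources Started
#> corosync-quorumtool -s      # Quorate: Yes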


Yours,
Rigi

On Mon, 2015-11-30 at 15:26 +0530, Soumya Koduri wrote:
Hi,
> But are you telling me that in a 3-node cluster,
> quorum is lost when one of the node's IPs is down?

Yes. It's a limitation of Pacemaker/Corosync. If the nodes
participating in the cluster cannot communicate with the majority of them
(quorum is lost), then the cluster is shut down.
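
For reference, votequorum computes quorum as floor(expected_votes / 2) + 1, with one vote per node by default, so a minimal corosync.conf quorum stanza (just a sketch, assuming the default one vote per node) behaves like this:

quorum {
    provider: corosync_votequorum
    # 3 nodes: quorum = floor(3/2) + 1 = 2, so losing one node keeps the partition quorate
    # 4 nodes: quorum = floor(4/2) + 1 = 3, so losing one node keeps the partition quorate
}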


> However, I am setting up an additional node to test a 4-node setup, but
> even then, if I take down one node and nfs-grace_start
> (/usr/lib/ocf/resource.d/heartbeat/ganesha_grace) does not run properly
> on the other nodes, could it be that the whole cluster goes down because
> quorum is lost again?

That's strange. We have tested such configurations quite a few times but
haven't hit this issue. (CC'ing Saurabh, who has been testing many such
configurations.)

Recently we have observed the resource agents (nfs-grace_*) sometimes
timing out, especially when a node is taken down. But that shouldn't cause
the entire cluster to shut down.
Could you check the logs (/var/log/messages, /var/log/pacemaker.log) for
any errors/warnings reported when one node is taken down in the 4-node
setup?
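
For example, something along these lines (standard tools; the timestamp is just the one from the log above):

#> grep -iE 'error|warn' /var/log/pacemaker.log
#> grep -iE 'nfs-grace|ganesha' /var/log/messages
#> journalctl -u corosync -u pacemaker --since "2015-11-30 11:23"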

Thanks,
Soumya
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
