> On 30 Oct 2014, at 9:32 am, Lax Kota (lkota) <lkota@xxxxxxxxx> wrote:
>
>
>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with.
>>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue.
>
>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in.
> Ok.
>
>>>
>>> Also one more issue I am seeing in one other setup a repeated flood of 'A processor joined or left the membership and a new membership was formed' messages for every 4secs. I am running with default TOTEM settings with token time out as 10 secs. Even after I increase the token, consensus values to be higher. It goes on flooding the same message after newer consensus defined time (eg: if I increase it to be 10secs, then I see new membership formed messages for every 10secs)
>>>
>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>>>
>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>
>> It does not sound like your network is particularly healthy.
>> Are you using multicast or udpu? If multicast, it might be worth trying udpu
>
> I am using udpu and I also have firewall opened for ports 5404 & 5405. Tcpdump looks fine too, it does not complain of any issues.
> This is a VM environment and even if I switch to other node within same VM I keep getting same failure.

Depending on what the host and VMs are doing, that might be your problem.
In any case, I will defer to the corosync guys at this point.

>
> Thanks
> Lax
>
>
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Andrew Beekhof
> Sent: Wednesday, October 29, 2014 3:17 PM
> To: linux clustering
> Subject: Re: daemon cpg_join error retrying
>
>
>> On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) <lkota@xxxxxxxxx> wrote:
>>
>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with.
>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue.
>
> I don't really recall. Hopefully someone more familiar with GFS2 can chime in.
>
>>
>> Also one more issue I am seeing in one other setup a repeated flood of 'A processor joined or left the membership and a new membership was formed' messages for every 4secs. I am running with default TOTEM settings with token time out as 10 secs. Even after I increase the token, consensus values to be higher. It goes on flooding the same message after newer consensus defined time (eg: if I increase it to be 10secs, then I see new membership formed messages for every 10secs)
>>
>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>>
>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>
> It does not sound like your network is particularly healthy.
> Are you using multicast or udpu? If multicast, it might be worth trying udpu
>
>>
>> Thanks
>> Lax
>>
>>
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Andrew Beekhof
>> Sent: Wednesday, October 29, 2014 2:42 PM
>> To: linux clustering
>> Subject: Re: daemon cpg_join error retrying
>>
>>
>>> On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) <lkota@xxxxxxxxx> wrote:
>>>
>>> Hi All,
>>>
>>> In one of my setup, I keep getting 'gfs_controld[10744]: daemon cpg_join error retrying'. I have a 2 Node setup with pacemaker and corosync.
>>
>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with.
>>
>>>
>>> Even after I force kill the pacemaker processes and reboot the server and bring the pacemaker back up, it keeps giving cpg_join error. Is there any way to fix this issue?
>>>
>>>
>>> Thanks
>>> Lax
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster@xxxxxxxxxx
>>> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
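
The "every 4secs" observation in this thread can be checked against a longer log by measuring the gap between successive membership messages. What follows is only a minimal sketch, assuming syslog-style lines like the ones quoted above. The function name `membership_intervals` and the `year` default are illustrative assumptions (syslog timestamps carry no year) and are not part of corosync.

```python
import re
from datetime import datetime

# Text of the TOTEM message quoted in the thread.
MARKER = "A processor joined or left the membership"
# Leading syslog timestamp, e.g. "Oct 29 14:58:10".
STAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})")


def membership_intervals(lines, year=2014):
    """Return the seconds between consecutive membership-change messages."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue  # skip CPG, MAIN and any unrelated lines
        match = STAMP.match(line.strip())
        if match:
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Sample lines copied from the log excerpt in the thread.
sample = [
    "Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.",
    "Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)",
    "Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed.",
]

print(membership_intervals(sample))  # [4.0]
```

If the intervals stay locked to the configured timeout after the token and consensus values are changed, as described above, that would be consistent with the membership being re-formed on every timeout rather than on sporadic packet loss.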