Re: node fails to join cluster after it was fenced

ROBERTO.RAMIREZ@xxxxxxxxxxxxxx · Tue, 6 Mar 2007 15:17:52 -0800

Luis, have you checked whether iptables is running? If it is, try disabling it
for a test and try again:

service iptables stop

chkconfig iptables off

Then fence the node and see if it comes back.
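If stopping iptables does turn out to be the fix, a less drastic option is to keep the firewall on and accept only the cman traffic. The sketch below assumes cman is using UDP port 6809, which is the port seen in the tcpdump capture later in this thread. `CMAN_PORT`, `DRY_RUN` and the `run` helper are hypothetical names for illustration. By default it only prints the commands; run it as root with `DRY_RUN=0` on each node to apply them. The other cluster daemons (ccsd, dlm, rgmanager) use their own ports, which this sketch does not cover.

```shell
#!/bin/sh
# Sketch only: open cman's UDP port instead of leaving iptables off.
# Assumption: cman uses UDP 6809, as in the tcpdump capture in this thread.
CMAN_PORT=6809
# Hypothetical switch: 1 = print the commands only, 0 = execute them (as root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Accept cman packets (unicast and the subnet broadcasts) on this node.
run iptables -I INPUT -p udp --dport "$CMAN_PORT" -j ACCEPT
# Save the running rules so they survive a reboot (RHEL init script).
run service iptables save
# Undo the earlier "chkconfig iptables off" once the test is done.
run chkconfig iptables on
```

Inserting the rule with `-I` puts it at the top of the INPUT chain, so it is evaluated before any existing REJECT rule.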

Luis Godoy Gonzalez <lgodoy@xxxxxxxxxxxx>
Sent by: linux-cluster-bounces@xxxxxxxxxx
03/06/2007 02:48 PM

Please respond to: linux clustering <linux-cluster@xxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: node fails to join cluster after it was fenced

Hi,

We have the same problem... :| We are running RHEL4 U2 with Cluster Suite 4
U2. In our case one node fenced the other node, and we have not succeeded in
getting the fenced node to rejoin the cluster.

The logs show that node 2 cannot communicate with node 1, but network
connectivity is working fine.

As a test we deleted cluster.conf from node 2 and rebooted it. After the
reboot the node got the latest version of cluster.conf from node 1, but it
still cannot rejoin the cluster.

Below is a short tcpdump capture from node 1, where the cluster service is
running.

Thanks in advance for any help.

Best Regards,

Luis.G.

=========================================================================

[root@lvs-gt1 ~]# tcpdump -s0 -x port 6809
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:45:30.043719 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
        0x0000:  4500 0038 d064 4000 4011 bbea c0a8 9615  E..8.d@.@.......
        0x0010:  c0a8 96ff 1a99 1a99 0024 f69c 0101 4ed0  .........$....N.
        0x0020:  0000 b49d 0000 1900 0100 0000 0000 0000  ................
        0x0030:  0402 0100 0200 0000                      ........
16:45:30.043758 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
        0x0000:  4500 0038 d064 4000 4011 bbea c0a8 9615  E..8.d@.@.......
        0x0010:  c0a8 96ff 1a99 1a99 0024 f69c 0101 4ed0  .........$....N.
        0x0020:  0000 b49d 0000 1900 0100 0000 0000 0000  ................
        0x0030:  0402 0100 0200 0000                      ........
16:45:30.043829 IP lvs-gt2.6809 > lvs-gt1.6809: UDP, length 92
        0x0000:  4500 0078 0226 4000 4011 8ad2 c0a8 9616  E..x.&@.@.......
        0x0010:  c0a8 9615 1a99 1a99 0064 1b42 0101 2902  .........d.B..).
        0x0020:  0000 b49d 0000 0100 0000 0000 0000 0000  ................
        0x0030:  0201 0100 0100 0000 0000 0000 0500 0000  ................
        0x0040:  0000 0000 0100 0000 0a00 0000 1000 0000  ................
        0x0050:  6c62 5f63 6c75 7374 6572 0000 0000 0000  lb_cluster......
        0x0060:  0200 1a99 c0a8 9616 0000 0000 0000 0000  ................
        0x0070:  6c76 732d 6774 3200                      lvs-gt2.
16:45:35.042945 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
        0x0000:  4500 0038 d065 4000 4011 bbe9 c0a8 9615  E..8.e@.@.......
        0x0010:  c0a8 96ff 1a99 1a99 0024 f59c 0101 4fd0  .........$....O.
        0x0020:  0000 b49d 0000 1900 0100 0000 0000 0000  ................
        0x0030:  0402 0100 0200 0000                      ........
16:45:35.042998 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
        0x0000:  4500 0038 d065 4000 4011 bbe9 c0a8 9615  E..8.e@.@.......
        0x0010:  c0a8 96ff 1a99 1a99 0024 f59c 0101 4fd0  .........$....O.
        0x0020:  0000 b49d 0000 1900 0100 0000 0000 0000  ................
        0x0030:  0402 0100 0200 0000                      ........
16:45:35.043075 IP lvs-gt2.6809 > lvs-gt1.6809: UDP, length 92
        0x0000:  4500 0078 0227 4000 4011 8ad1 c0a8 9616  E..x.'@.@.......
        0x0010:  c0a8 9615 1a99 1a99 0064 1a42 0101 2a02  .........d.B..*.
        0x0020:  0000 b49d 0000 0100 0000 0000 0000 0000  ................
        0x0030:  0201 0100 0100 0000 0000 0000 0500 0000  ................
        0x0040:  0000 0000 0100 0000 0a00 0000 1000 0000  ................
        0x0050:  6c62 5f63 6c75 7374 6572 0000 0000 0000  lb_cluster......
        0x0060:  0200 1a99 c0a8 9616 0000 0000 0000 0000  ................
        0x0070:  6c76 732d 6774 3200                      lvs-gt2.
6 packets captured
6 packets received by filter
0 packets dropped by kernel
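The addresses and lengths in the capture can be read straight from the hex. Every frame starts with `45`, i.e. IPv4 with a 20-byte header and no options, so the source address sits in bytes 12-15, the destination in bytes 16-19, and the UDP ports and length follow. The sketch below decodes the first 14 16-bit words of a frame; `decode` and `hex2ip` are hypothetical helper names. It shows node 1 (192.168.150.21) broadcasting to 192.168.150.255 about every 5 seconds and node 2 (192.168.150.22) answering node 1 directly with 92-byte payloads that contain `lb_cluster` and `lvs-gt2`. Keep in mind that tcpdump sees inbound packets before the local iptables rules are applied, so a packet appearing in the capture does not by itself prove the firewall lets it through to cman.

```shell
#!/bin/sh
# Sketch: decode IPv4 + UDP header fields from tcpdump hex words.
# Assumes a plain 20-byte IPv4 header (first byte 0x45), true for this capture.

hex2ip() {  # 8 hex digits -> dotted quad, e.g. c0a89615 -> 192.168.150.21
    printf '%d.%d.%d.%d' \
        "$((0x$(echo "$1" | cut -c1-2)))" "$((0x$(echo "$1" | cut -c3-4)))" \
        "$((0x$(echo "$1" | cut -c5-6)))" "$((0x$(echo "$1" | cut -c7-8)))"
}

decode() {
    # Split the first 14 words (28 bytes: IPv4 header + UDP header).
    set -- $1
    ip_total=$((0x$2))          # IPv4 total length (bytes 2-3)
    src=$(hex2ip "$7$8")        # source address (bytes 12-15)
    dst=$(hex2ip "$9${10}")     # destination address (bytes 16-19)
    sport=$((0x${11}))          # UDP source port
    dport=$((0x${12}))          # UDP destination port
    payload=$((0x${13} - 8))    # UDP length minus the 8-byte UDP header
    echo "$src:$sport -> $dst:$dport ip_len=$ip_total udp_payload=$payload"
}

# First broadcast from lvs-gt1, then the first reply from lvs-gt2.
decode "4500 0038 d064 4000 4011 bbea c0a8 9615 c0a8 96ff 1a99 1a99 0024 f69c"
decode "4500 0078 0226 4000 4011 8ad2 c0a8 9616 c0a8 9615 1a99 1a99 0064 1b42"
```

The two calls print `192.168.150.21:6809 -> 192.168.150.255:6809 ip_len=56 udp_payload=28` and `192.168.150.22:6809 -> 192.168.150.21:6809 ip_len=120 udp_payload=92`, matching the "length 28" and "length 92" that tcpdump reports.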

=============================================================================================

Patrick Caulfield wrote:
> Frederik Ferner wrote:
>> On Wed, 2007-02-21 at 11:26 +0000, Patrick Caulfield wrote:
>>> Frederik Ferner wrote:
>>>> Hi Patrick, All,
>>>>
>>>> let me give you an update on that problem.
>>>>
>>>> On Thu, 2007-02-15 at 11:36 +0000, Frederik Ferner wrote:
>>>>> On Thu, 2007-02-15 at 09:07 +0000, Patrick Caulfield wrote:
>>>> [node not joining cluster]
>>>>>> It would be interesting to know - though you may not want to do it - if the
>>>>>> problem persists when the still-running node is rebooted.
>>>>>
>>>>> Obviously not at the moment, but I have a maintenance window coming up
>>>>> soon where I might be able to do that. I'll keep you informed about the
>>>>> result.
>>>>
>>>> Today I had the opportunity to reboot the node that was still quorate
>>>> (i04-storage1) while the other node (i04-storage2) was still trying to
>>>> join.
>>>> When i04-storage1 reached the stage where the cluster services are
>>>> started, both nodes joined the cluster at the same time.
>>>>
>>>> With this running cluster, I tried to reproduce the problem by fencing
>>>> one node, but after rebooting it immediately rejoined the cluster.
>>>
>>> Interesting. It sounds similar to a cman bug that was introduced in U3, but it
>>> was fixed in U4 - which you said you were running.
>>
>> Let's verify that then. I have the following RHCS-related packages
>> installed:
>> ccs-1.0.7-0
>> rgmanager-1.9.54-1
>> cman-1.0.11-0
>> fence-1.32.25-1
>> cman-kernel-smp-2.6.9-45.8
>> dlm-kernel-smp-2.6.9-44.3
>> dlm-1.0.1-1
>
> Yes, those look fine.
>
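To collect the same package list on your own nodes (for example to check which update level each node is really running, or to compare the two nodes), a query along these lines should work on an RPM-based system. The package names are the ones listed in the quoted exchange; `PKGS` is a hypothetical variable name, and the script only reports that rpm is missing when run elsewhere.

```shell
#!/bin/sh
# Sketch: print the installed versions of the RHCS packages listed above.
# Run it on each node and compare the output between nodes.
PKGS="ccs rgmanager cman fence cman-kernel-smp dlm-kernel-smp dlm"

if command -v rpm >/dev/null 2>&1; then
    # rpm -q prints name-version-release, or "package X is not installed".
    rpm -q $PKGS
else
    echo "rpm not found; run this on the cluster nodes" >&2
fi
```

The variable is deliberately left unquoted in `rpm -q $PKGS` so the shell splits it into one argument per package.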

--

Linux-cluster mailing list

Linux-cluster@xxxxxxxxxx

https://www.redhat.com/mailman/listinfo/linux-cluster
