Luis, have you checked whether iptables is off? If it is on, try disabling it for a test and try again:
service iptables stop
chkconfig iptables off
Then fence the node and see if it comes back.
Luis Godoy Gonzalez <lgodoy@xxxxxxxxxxxx>
Sent by: linux-cluster-bounces@xxxxxxxxxx 03/06/2007 02:48 PM
Hi
We have the same problem... :| We have RHEL4 U2 with Cluster Suite 4
U2. In our case one node fenced the other node, and we have
not succeeded in rejoining that node to the cluster.
The logs show that node 2 cannot communicate with node 1, but the
network connectivity is working fine.
In a test we deleted cluster.conf from node 2 and rebooted it. After
the reboot the node got the latest version of cluster.conf from node 1,
but it still cannot join the cluster again.
Below this mail we have attached a small dump from node 1, where the
cluster service is running.
Thanks in advance for any help.
Best Regards,
Luis.G.
=========================================================================
[root@lvs-gt1 ~]# tcpdump -s0 -x port 6809
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:45:30.043719 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
0x0000: 4500 0038 d064 4000 4011 bbea c0a8 9615 E..8.d@.@.......
0x0010: c0a8 96ff 1a99 1a99 0024 f69c 0101 4ed0 .........$....N.
0x0020: 0000 b49d 0000 1900 0100 0000 0000 0000 ................
0x0030: 0402 0100 0200 0000 ........
16:45:30.043758 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
0x0000: 4500 0038 d064 4000 4011 bbea c0a8 9615 E..8.d@.@.......
0x0010: c0a8 96ff 1a99 1a99 0024 f69c 0101 4ed0 .........$....N.
0x0020: 0000 b49d 0000 1900 0100 0000 0000 0000 ................
0x0030: 0402 0100 0200 0000 ........
16:45:30.043829 IP lvs-gt2.6809 > lvs-gt1.6809: UDP, length 92
0x0000: 4500 0078 0226 4000 4011 8ad2 c0a8 9616 E..x.&@.@.......
0x0010: c0a8 9615 1a99 1a99 0064 1b42 0101 2902 .........d.B..).
0x0020: 0000 b49d 0000 0100 0000 0000 0000 0000 ................
0x0030: 0201 0100 0100 0000 0000 0000 0500 0000 ................
0x0040: 0000 0000 0100 0000 0a00 0000 1000 0000 ................
0x0050: 6c62 5f63 6c75 7374 6572 0000 0000 0000 lb_cluster......
0x0060: 0200 1a99 c0a8 9616 0000 0000 0000 0000 ................
0x0070: 6c76 732d 6774 3200 lvs-gt2.
16:45:35.042945 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
0x0000: 4500 0038 d065 4000 4011 bbe9 c0a8 9615 E..8.e@.@.......
0x0010: c0a8 96ff 1a99 1a99 0024 f59c 0101 4fd0 .........$....O.
0x0020: 0000 b49d 0000 1900 0100 0000 0000 0000 ................
0x0030: 0402 0100 0200 0000 ........
16:45:35.042998 IP lvs-gt1.6809 > 192.168.150.255.6809: UDP, length 28
0x0000: 4500 0038 d065 4000 4011 bbe9 c0a8 9615 E..8.e@.@.......
0x0010: c0a8 96ff 1a99 1a99 0024 f59c 0101 4fd0 .........$....O.
0x0020: 0000 b49d 0000 1900 0100 0000 0000 0000 ................
0x0030: 0402 0100 0200 0000 ........
16:45:35.043075 IP lvs-gt2.6809 > lvs-gt1.6809: UDP, length 92
0x0000: 4500 0078 0227 4000 4011 8ad1 c0a8 9616 E..x.'@.@.......
0x0010: c0a8 9615 1a99 1a99 0064 1a42 0101 2a02 .........d.B..*.
0x0020: 0000 b49d 0000 0100 0000 0000 0000 0000 ................
0x0030: 0201 0100 0100 0000 0000 0000 0500 0000 ................
0x0040: 0000 0000 0100 0000 0a00 0000 1000 0000 ................
0x0050: 6c62 5f63 6c75 7374 6572 0000 0000 0000 lb_cluster......
0x0060: 0200 1a99 c0a8 9616 0000 0000 0000 0000 ................
0x0070: 6c76 732d 6774 3200 lvs-gt2.
6 packets captured
6 packets received by filter
0 packets dropped by kernel
=============================================================================================
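As a cross-check on the capture above, here is a minimal Python sketch (not part of the original post) that decodes the IPv4 and UDP headers of the first packet from each node. The function name `decode_ipv4_udp` and the two constants are made up for illustration; the hex bytes are copied from the dump. It assumes plain IPv4 with a UDP payload, as tcpdump reports.

```python
import socket
import struct

# IPv4 + UDP header bytes of two packets, copied from the tcpdump -x output.
# BROADCAST_HEX: first packet sent by lvs-gt1; REPLY_HEX: first packet from lvs-gt2.
BROADCAST_HEX = (
    "4500 0038 d064 4000 4011 bbea c0a8 9615 "
    "c0a8 96ff 1a99 1a99 0024 f69c"
)
REPLY_HEX = (
    "4500 0078 0226 4000 4011 8ad2 c0a8 9616 "
    "c0a8 9615 1a99 1a99 0064 1b42"
)


def decode_ipv4_udp(hex_dump):
    """Decode addresses, ports and payload length from a hex header dump."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    ihl = (raw[0] & 0x0F) * 4          # IPv4 header length in bytes
    protocol = raw[9]                  # 17 means UDP
    src_ip = socket.inet_ntoa(raw[12:16])
    dst_ip = socket.inet_ntoa(raw[16:20])
    # UDP header: source port, destination port, length, checksum
    src_port, dst_port, udp_len, _checksum = struct.unpack(
        "!HHHH", raw[ihl:ihl + 8]
    )
    return {
        "protocol": protocol,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_port": src_port,
        "dst_port": dst_port,
        "payload_len": udp_len - 8,    # UDP length includes its 8-byte header
    }


if __name__ == "__main__":
    for name, dump in (("broadcast", BROADCAST_HEX), ("reply", REPLY_HEX)):
        print(name, decode_ipv4_udp(dump))
```

The decoded values agree with the tcpdump summary lines: 192.168.150.21 broadcasts to 192.168.150.255 on UDP port 6809 with a 28-byte payload, and 192.168.150.22 answers 192.168.150.21 on the same port with 92 bytes. Note that tcpdump sees inbound packets before iptables filters them, so packets showing up in the capture do not by themselves rule out a firewall drop on node 1.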
Patrick Caulfield wrote:
> Frederik Ferner wrote:
>
>> On Wed, 2007-02-21 at 11:26 +0000, Patrick Caulfield wrote:
>>
>>> Frederik Ferner wrote:
>>>
>>>> Hi Patrick, All,
>>>>
>>>> let me give you an update on that problem.
>>>>
>>>> On Thu, 2007-02-15 at 11:36 +0000, Frederik Ferner wrote:
>>>>
>>>>> On Thu, 2007-02-15 at 09:07 +0000, Patrick Caulfield wrote:
>>>>>
>>>> [node not joining cluster]
>>>>
>>>>>> It would be interesting to know - though you may not want to do it - if the
>>>>>> problem persists when the still-running node is rebooted.
>>>>>>
>>>>> Obviously not at the moment, but I have a maintenance window upcoming
>>>>> soon where I might be able to do that. I'll keep you informed about the
>>>>> result.
>>>>>
>>>> Today I had the possibility to reboot the node that was still quorate
>>>> (i04-storage1) while the other node (i04-storage2) was still trying to
>>>> join.
>>>> When i04-storage1 came to the stage where the cluster services are
>>>> started, both nodes joined the cluster at the same time.
>>>>
>>>> With this running cluster, I tried to reproduce the problem by fencing
>>>> one node, but after rebooting it immediately rejoined the cluster.
>>>>
>>> Interesting. It sounds similar to a cman bug that was introduced in U3, but it
>>> was fixed in U4 - which you said you were running.
>>>
>> Let's verify that then. I have the following RHCS related packages
>> installed:
>> ccs-1.0.7-0
>> rgmanager-1.9.54-1
>> cman-1.0.11-0
>> fence-1.32.25-1
>> cman-kernel-smp-2.6.9-45.8
>> dlm-kernel-smp-2.6.9-44.3
>> dlm-1.0.1-1
>>
>
> Yes, those look fine.
>
>
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster