RHEL5.3 / cman-2.0.98-1.el5 / Problem loop on "Node x is undead"


 



Alain.Moulle wrote:

> Hi,
> 
> I'm facing this "Node evicted" / "Node is undead" problem again,
> and I really don't know what to do ... below are the traces from syslog.
> My version is: RHEL5.3 / cman-2.0.98-1.el5
> 
> Feb 25 14:33:33 s_sys@xn3 qdiskd[27582]: <notice> Writing eviction
> notice for node 2
> Feb 25 14:33:34 s_sys@xn3 qdiskd[27582]: <notice> Node 2 evicted
> Feb 25 14:33:35 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ... etc.
> Feb 25 14:33:45 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:45 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:46 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:46 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:47 s_kernel@xn3 kernel: dlm: closing connection to node 2
> Feb 25 14:33:47 s_sys@xn3 fenced[27785]: xn4 not a cluster member after
> 0 sec post_fail_delay
> Feb 25 14:33:47 s_sys@xn3 fenced[27785]: fencing node "xn4"
> Feb 25 14:33:47 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ...etc.
> Feb 25 14:33:52 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:52 s_sys@xn3 fenced[27785]: fence "xn4" success
> Feb 25 14:33:53 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:53 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:54 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys@xn3 clurgmgrd[27990]: <notice> Taking over service
> service:lustre_xn4 from down member xn4
> Feb 25 14:33:55 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
> .. etc.
> 
> And then, after the reboot of xn4, when we try to start CS on xn4, it
> can't join the cluster, and we must stop CS on both nodes and start it
> again on both sides.
> 
> Where could this problem come from? How can I avoid this eviction of
> the node?
> 
> Any help would be much appreciated.
    

You haven't posted any cman/openais messages, but it's quite possible
you've hit this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=485026

There's a patch included and some links to fixed RPMs.


Chrissie
Thanks Chrissie, but I have checked this bugzilla and, unless I'm
misunderstanding, it seems to be more about the problem of starting a second
node too late with regard to the start of the first node, so that the second
node can no longer join the cluster. But there are no "Node is undead"
messages in the syslog in that case (I've checked the syslog attached to the
bugzilla).
My problem occurs after a poweroff -f on one node of an HA pair with a quorum
disk, while both nodes are up and running their services: in this case,
powering off the second node makes the first one generate the "Node 2
evicted" / "Node 2 is undead" loop in syslog, and this starts right after the
poweroff, not when the second node tries to start CS again.
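
For completeness, here is roughly the part of cluster.conf involved in the
qdiskd timing (the values below are only illustrative, not our exact
settings); as far as I understand, the cman token timeout (in ms) should stay
well above the qdisk timeout, i.e. interval * tko (in seconds):

    <cman expected_votes="3"/>
    <!-- token timeout in ms; kept well above the 20 s qdisk timeout below -->
    <totem token="70000"/>
    <!-- qdisk timeout = interval * tko = 2 s * 10 = 20 s -->
    <quorumd interval="2" tko="10" votes="1" label="qdisk">
        <heuristic program="ping -c1 192.168.0.254" score="1" interval="2"/>
    </quorumd>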

Regards,
Alain
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
