Chrissie wrote:
> Alain.Moulle wrote:
>> Hi,
>> I'm facing this problem of "Node evicted" and "Node is undead" again,
>> and I really don't know what to do. The syslog traces are below.
>> My version is: RHEL5.3 / cman-2.0.98-1.el5
>>
>> Feb 25 14:33:33 s_sys@xn3 qdiskd[27582]: <notice> Writing eviction notice for node 2
>> Feb 25 14:33:34 s_sys@xn3 qdiskd[27582]: <notice> Node 2 evicted
>> Feb 25 14:33:35 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> ... etc.
>> Feb 25 14:33:45 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> Feb 25 14:33:45 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
>> Feb 25 14:33:46 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> Feb 25 14:33:46 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
>> Feb 25 14:33:47 s_kernel@xn3 kernel: dlm: closing connection to node 2
>> Feb 25 14:33:47 s_sys@xn3 fenced[27785]: xn4 not a cluster member after 0 sec post_fail_delay
>> Feb 25 14:33:47 s_sys@xn3 fenced[27785]: fencing node "xn4"
>> Feb 25 14:33:47 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> ...etc.
>> Feb 25 14:33:52 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
>> Feb 25 14:33:52 s_sys@xn3 fenced[27785]: fence "xn4" success
>> Feb 25 14:33:53 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> Feb 25 14:33:53 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
>> Feb 25 14:33:54 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> Feb 25 14:33:54 s_sys@xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
>> Feb 25 14:33:54 s_sys@xn3 clurgmgrd[27990]: <notice> Taking over service service:lustre_xn4 from down member xn4
>> Feb 25 14:33:55 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead.
>> .. etc.
>>
>> And then, after xn4 reboots, when we try to start the CS on xn4, it
>> can't join the cluster, and we must stop the CS on both nodes and
>> start it again on both sides.
>>
>> Where could this problem come from? How can I avoid this node eviction?
>>
>> Any help would be much appreciated.
>
> You haven't posted any cman/openais messages but it's quite possible
> you've hit this bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=485026
>
> There's a patch included and some links to fixed RPMs.
>
> Chrissie

Thanks Chrissie, but I have checked this bugzilla, and it seems, unless I'm misunderstanding, to be more about the problem of starting a second node too late relative to the start of the first node, so that the second node can no longer join the cluster. But there are no "Node is undead" messages in the syslog in that case (I've checked the syslog attached to the bugzilla).

My problem occurs after a poweroff -f on one node of an HA pair with a quorum disk, while both nodes are up and running their services. In this case, powering off the second node makes the first one loop on "Node 2 evicted" and "Node 2 is undead" in syslog, and this starts right after the poweroff, not when the second node tries to start the CS again.

Regards,
Alain
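To quantify the eviction/undead loop described above, a small script can count the qdiskd messages in a syslog excerpt. This is only a minimal sketch: the function name summarize_qdiskd and the regular expression are illustrative assumptions based on the line format of the traces quoted in this thread, not part of cman or qdiskd.

```python
import re
from collections import Counter

# Assumed line shape, taken from the traces quoted in this thread, e.g.
# "Feb 25 14:33:35 s_sys@xn3 qdiskd[27582]: <crit> Node 2 is undead."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"qdiskd\[\d+\]:\s+<(?P<level>\w+)>\s+(?P<msg>.*)$"
)


def summarize_qdiskd(lines):
    """Count each distinct qdiskd message and record when it first appeared.

    Lines from other daemons (fenced, clurgmgrd, kernel) are ignored.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # Drop the trailing period so "undead." and "undead" count together.
        msg = match.group("msg").rstrip(".").strip()
        counts[msg] += 1
        first_seen.setdefault(msg, match.group("ts"))
    return counts, first_seen


if __name__ == "__main__":
    import sys

    # Usage: python summarize_qdiskd.py < /var/log/messages
    counts, first_seen = summarize_qdiskd(sys.stdin)
    for msg, n in counts.most_common():
        print(f"{n:6d}  first at {first_seen[msg]}  {msg}")
```

Comparing the first timestamp of "Node 2 is undead" with the time of the poweroff shows whether the loop really begins before the second node attempts to rejoin.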
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster