Re: cluster instability

On 6/16/08, Shawn Hood <shawnlhood@xxxxxxxxx> wrote:
All,

This message was sent out to my office, so the voice may seem a bit
odd.  We have a 4-node cluster running RHEL4U6 on Dell PowerEdge
1950s.  Fencing is done via DRAC.

Using packages (from RHN):

cman-kernel-smp-2.6.9-53.13
cman-1.0.17-0.el4_6.5
ccs-1.0.11-1.el4_6.1
fence-1.32.50-2.el4_6.1
lvm2-cluster-2.02.27-2.el4_6.2
dlm-kernel-smp-2.6.9-52.9
dlm-kernheaders-2.6.9-52.9

Our cluster became unstable on Saturday morning.  Apparently
hugin stopped sending out heartbeats, which caused it to be fenced.  hugin
was under heavy load (load average ~10) at the time:

03:30:02 AM         6       453      9.35     10.29     10.51
03:40:01 AM        12       465     11.02     11.00     10.75
03:50:02 AM         3       446      9.75     10.80     10.86
04:00:01 AM         5       430      9.23      9.47     10.07
Average:            7       455     10.19     10.32     10.28

04:09:35 AM       LINUX RESTART

As you can see, hugin was fenced at 4:09.  The other nodes then began
logging the following:

Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent cluster view
 
I guess this has to do with a network issue, even though utilization was low when this was logged.
The node is not able to receive messages.

After so many 'initiating transition' messages, the cluster died.  Our
network utilization was very low at the time.
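
For anyone reading the log excerpt above, a minimal sketch of how the spacing between the 'Initiating transition' messages could be measured.  The helper name transition_intervals and the year argument are hypothetical (syslog lines carry no year); the log lines are copied from the excerpt, and nothing here is part of the cluster tooling:

```python
import re
from datetime import datetime

# Log lines copied from the excerpt in this message.
LOG = """\
Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
"""

# Captures the syslog timestamp and the generation number.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"CMAN: Initiating transition, generation (\d+)"
)


def transition_intervals(log_text, year=2008):
    """Return (generation, seconds since previous transition) pairs.

    The year is an assumption supplied by the caller, because syslog
    timestamps do not include one.
    """
    events = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, int(match.group(2))))
    # Pair each event with its predecessor to get the gap in seconds.
    return [
        (gen, int((stamp - prev_stamp).total_seconds()))
        for (prev_stamp, _), (stamp, gen) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for generation, gap in transition_intervals(LOG):
        print(f"generation {generation}: {gap}s after previous transition")
```

On the lines above this reports a steady 15-second gap between generations 58 through 61, i.e. the transition was restarted at a fixed interval until CMAN gave up.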

Any ideas?

Shawn

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
 
Thanks
Gowrishankar Rajaiyan | Senior Quality Analyst
 
