cluster instability

All,

This message was originally sent to my office, so the voice may seem a
bit odd.  We have a four-node cluster running RHEL4U6 on Dell PowerEdge
1950s.  Fencing is done via DRAC.

Using packages (from RHN):

cman-kernel-smp-2.6.9-53.13
cman-1.0.17-0.el4_6.5
ccs-1.0.11-1.el4_6.1
fence-1.32.50-2.el4_6.1
lvm2-cluster-2.02.27-2.el4_6.2
dlm-kernel-smp-2.6.9-52.9
dlm-kernheaders-2.6.9-52.9

Our cluster became unstable on Saturday morning.  Apparently hugin
stopped sending heartbeats, which caused the other nodes to fence it.
hugin was under heavy load (load average ~10) at the time, as shown by
sar -q (columns: runq-sz, plist-sz, ldavg-1, ldavg-5, ldavg-15):

03:30:02 AM         6       453      9.35     10.29     10.51
03:40:01 AM        12       465     11.02     11.00     10.75
03:50:02 AM         3       446      9.75     10.80     10.86
04:00:01 AM         5       430      9.23      9.47     10.07
Average:            7       455     10.19     10.32     10.28

04:09:35 AM       LINUX RESTART

As you can see, hugin restarted at 04:09 after being fenced.  Around
that time, the other nodes logged the following:

Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent cluster view

After four 'Initiating transition' messages, 15 seconds apart, munin
gave up and left the cluster, and the cluster died.  Our network
utilization was very low at the time.
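
For anyone wanting to check their own logs for the same pattern, here is
a minimal Python sketch that pulls the CMAN transition messages out of
syslog text and reports the generation numbers and the spacing between
restarts.  It assumes syslog lines in the format shown above; the
function names and the year (syslog timestamps carry none) are
hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Sample lines copied from the munin log in this message.
LOG = """\
Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent cluster view
"""

# Matches only the "Initiating transition" lines and captures
# the timestamp, the host name, and the generation number.
TRANSITION = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) kernel: "
    r"CMAN: Initiating transition, generation (\d+)"
)


def parse_transitions(text, year=2008):
    """Return (timestamp, host, generation) for each transition line.

    The year is an assumption: syslog timestamps do not include one,
    and it only matters if the log spans a year boundary.
    """
    events = []
    for line in text.splitlines():
        match = TRANSITION.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group(2), int(match.group(3))))
    return events


def restart_intervals(events):
    """Seconds elapsed between consecutive transition messages."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_transitions(LOG)
    print("generations:", [gen for _, _, gen in events])
    print("intervals (s):", restart_intervals(events))
```

Run against the excerpt above, this finds four transitions (generations
58 through 61), each 15 seconds after the previous one.  The same check
on the other nodes' logs would show whether they restarted the
transition in lockstep with munin.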

Any ideas?

Shawn
