All,

This message was sent out to my office, so the voice may seem a bit odd.

We have a four-node cluster running RHEL4U6 on Dell PowerEdge 1950s. Fencing is done via DRAC.

Using packages (from RHN):

cman-kernel-smp-2.6.9-53.13
cman-1.0.17-0.el4_6.5
ccs-1.0.11-1.el4_6.1
fence-1.32.50-2.el4_6.1
lvm2-cluster-2.02.27-2.el4_6.2
dlm-kernel-smp-2.6.9-52.9
dlm-kernheaders-2.6.9-52.9

Our cluster became unstable on Saturday morning. Apparently hugin stopped sending out heartbeats, which caused it to be fenced. hugin was under heavy load (load average ~10) at the time:

03:30:02 AM         6       453      9.35     10.29     10.51
03:40:01 AM        12       465     11.02     11.00     10.75
03:50:02 AM         3       446      9.75     10.80     10.86
04:00:01 AM         5       430      9.23      9.47     10.07
Average:            7       455     10.19     10.32     10.28

04:09:35 AM       LINUX RESTART

As you can see, hugin was fenced at 4:09. The other nodes then began logging the following:

Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
Jun 14 04:09:06 munin kernel: CMAN: we are leaving the cluster. Inconsistent cluster view

After that run of 'Initiating transition' messages, the cluster died. Our network utilization was very low at the time.

Any ideas?

Shawn

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
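
The 'Initiating transition' messages quoted from munin are spaced 15 seconds apart. For anyone who wants to check the same spacing in their own logs, here is a minimal Python sketch, not a definitive tool. The function name transition_gaps and the placeholder year are illustrative assumptions (syslog timestamps carry no year), and the sample lines are the ones quoted in this message.

```python
import re
from datetime import datetime

# Sample lines copied from the munin log quoted in this message.
SAMPLE_LOG = """\
Jun 14 04:08:06 munin kernel: CMAN: Initiating transition, generation 58
Jun 14 04:08:21 munin kernel: CMAN: Initiating transition, generation 59
Jun 14 04:08:36 munin kernel: CMAN: Initiating transition, generation 60
Jun 14 04:08:51 munin kernel: CMAN: Initiating transition, generation 61
Jun 14 04:09:06 munin kernel: CMAN: too many transition restarts - will die
"""

# Matches only the "Initiating transition" lines; captures timestamp and generation.
TRANSITION = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: "
    r"CMAN: Initiating transition, generation (\d+)"
)


def transition_gaps(log_text, year=2000):
    """Return (generation, seconds since previous transition) pairs.

    `year` is a placeholder assumption: syslog omits the year, and any fixed
    value works when all lines fall within the same night.
    """
    events = []
    for line in log_text.splitlines():
        match = TRANSITION.match(line)
        if not match:
            continue
        # Collapse the padding syslog uses for single-digit days ("Jun  4").
        stamp = " ".join(match.group(1).split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((int(match.group(2)), when))
    # Pair each transition with the one before it and measure the gap.
    return [
        (gen, (when - prev_when).total_seconds())
        for (_, prev_when), (gen, when) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for generation, gap in transition_gaps(SAMPLE_LOG):
        print(f"generation {generation}: {gap:.0f}s after the previous transition")
```

To run it against a real log, pass the file contents to transition_gaps() instead of SAMPLE_LOG. Lines that do not match the transition pattern are skipped.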