Fencing through iLO and functioning of kdump

"Jankowski, Chris" <Chris.Jankowski@xxxxxx> · Fri, 27 Aug 2010 02:29:00 +0000

Hi,

How can I reconcile the need to have Kdump configured and operational on cluster nodes with the need for node fencing, which is most commonly and conveniently implemented through iLO on HP servers?

Customers require Kdump configured and operational so that kernel crashes can be analysed by Red Hat support. The crash dump starts immediately after the crash, but it can take a very long time (more than an hour) on a machine with 512 GB of memory if done at dump level 0 over a 1 GbE network. However, if I use iLO fencing, the crashed node will be powered off through iLO, which will irrecoverably kill the kernel dump in progress and erase the memory contents containing the crashed kernel image.
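
As a rough check on the "more than an hour" figure, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: the full 512 GiB is transferred (dump level 0, no page filtering, no compression), the 1 GbE link is the only bottleneck, and the function name and the `efficiency` parameter are purely illustrative.

```python
def dump_transfer_seconds(mem_bytes, link_bits_per_sec, efficiency=1.0):
    """Estimate the time to push a full memory image over a network link.

    efficiency is an assumed fraction of line rate actually achieved
    (1.0 = ideal line rate; real transfers will be lower).
    """
    bytes_per_sec = link_bits_per_sec / 8 * efficiency
    return mem_bytes / bytes_per_sec


mem = 512 * 2**30   # 512 GiB of RAM, dumped in full (assumption)
link = 1e9          # 1 Gbit/s Ethernet

ideal = dump_transfer_seconds(mem, link)
print(f"ideal line rate: {ideal / 60:.0f} minutes")
```

Even at ideal line rate this comes to roughly 73 minutes, so the node would have to stay powered on and unfenced by iLO for well over an hour.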

Ideally, I would love to have the functionality that is present in several UNIX clusters, where a crashed node completes its kernel crash dump in peace. In UNIX clusters the crashed node can be configured to reboot automatically after a kernel crash and rejoin the cluster. It typically does the kernel dump as part of the boot.

UNIX clusters typically use SCSI reservations to protect the integrity of storage. This enables them to keep the failed node isolated whilst it is still able to complete the kernel crash dump before rejoining the cluster. I believe this option is not available in Linux Cluster.

So, how can I have a functioning Linux cluster that is able to take a kernel crash dump of crashed nodes without blocking access to the shared GFS2 filesystem for the hour or so that a crash dump may take on a very large system?

Thanks and regards,

Chris Jankowski

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster