Here is a kbase on fence_scsi:

https://access.redhat.com/kb/docs/DOC-17809

It should answer any questions you have. I usually run fence_scsi_test
first to be sure my devices are capable. Note:

"To assist with finding and detecting devices which are (or are not)
suitable for use with fence_scsi, a tool has been provided. The
fence_scsi_test script will find devices visible to the node and report
whether or not they are compatible with SCSI persistent reservations."

-Ben

----- "Chris Jankowski" <Chris.Jankowski@xxxxxx> wrote:

> Ben,
>
> Thank you for pointing me at fence_scsi.
> It looks like fence_scsi will fit the bill elegantly. And it should be
> much more reliable than iLO fencing if the cluster uses a properly
> configured, dual-fabric FC SAN for shared storage.
>
> I read the fence_scsi manual page and have one more question.
>
> What do I need to do for my cluster to start using SCSI reservations?
> Is this done by default?
>
> Thanks and regards,
>
> Chris Jankowski
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Ben Turner
> Sent: Saturday, 28 August 2010 03:29
> To: linux clustering
> Subject: Re: Fencing through iLO and functioning of kdump
>
> You have a couple of options here:
>
> 1. Switch to fence_scsi (uses SCSI reservations, as you described) or
> another I/O fencing method that does not reboot the system. This will
> enable your core dump to complete without power fencing interrupting
> it.
>
> 2. Put in a post fail delay long enough for fencing to complete.
> This is suboptimal, as your cluster services/resources will be hung
> for the duration of the post fail delay. I usually only do this when
> I know I have a node that is crashing and no I/O fencing
> capabilities.
>
> 3.
> If you don't have access to an I/O fence agent and a post fail
> delay won't work for some reason, you can try:
>
> Best practice I can think of right now would be the following:
> 1. Disable the power fence device on the host you're seeing panics on
>    (I have changed the IP for it in cluster.conf in the past).
> 2. When that node fails, the other nodes will attempt to fence the host
>    and it will fail since the fence device was disabled.
>    (NOTE: between steps 2 and 3, cluster operation is suspended.)
> 3. The administrator can now do things like:
>    - disconnect the FC and network cables from the affected host,
>      ensuring that it is 'manually I/O fenced'
>    - run fence_ack_manual on the other host to override the failed
>      fencing operation and continue cluster operation on the other
>      nodes
> 4. Now the failed host is free to continue kdumping for as long
>    as need be.
>
> Hope this helps.
>
> -b
>
>
> ----- "Chris Jankowski" <Chris.Jankowski@xxxxxx> wrote:
>
> > Hi,
> >
> > How can I reconcile the need to have Kdump configured and operational
> > on cluster nodes with the need for fencing of a node, most commonly and
> > conveniently implemented through iLO on HP servers?
> >
> > Customers require Kdump configured and operational to be able to have
> > kernel crashes analysed by Red Hat support. The taking of a crash dump
> > starts immediately after the crash, but it may take a very considerable
> > time on a machine with 512 GB of memory (more than an hour) if done in
> > dumplevel 0 and over a 1 GbE network. However, if I use iLO fencing then
> > the crashed node will be powered off through iLO, which will
> > irrecoverably kill the kernel dump in progress and erase the memory
> > content containing the crashed kernel image.
> >
> > Ideally, I would love to have the functionality that is present in
> > several UNIX clusters, where a crashed node completes its kernel crash
> > dump in peace.
> > In UNIX clusters the crashed node can be configured to
> > reboot automatically after a kernel crash and rejoin the cluster. It
> > typically does the kernel dump as a part of the boot.
> >
> > The UNIX clusters typically use SCSI reservation to protect the integrity
> > of storage. This enables them to keep the failed node isolated whilst
> > it is still able to do the kernel crash dump before rejoining the
> > cluster. I believe this option is not available in Linux Cluster.
> >
> > So, how can I have a functioning Linux cluster with the ability to take a
> > kernel crash dump of crashed nodes, without blocking access to the
> > shared GFS2 filesystem for the hour or so that a crash dump may take
> > on a very large system?
> >
> > Thanks and regards,
> >
> > Chris Jankowski
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster
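
For anyone sizing the post fail delay mentioned in option 2 of the thread, a back-of-the-envelope estimate may help. The sketch below is only a rough lower bound under stated assumptions: it takes the 512 GB and 1 GbE figures from the original question, assumes dumplevel 0 writes all of memory, treats "512 GB" as 512 GiB, and ignores protocol overhead and the write speed of the dump target. The function name `dump_seconds` and its `efficiency` parameter are hypothetical helpers for illustration, not part of any cluster or kdump tool.

```python
def dump_seconds(memory_gib: float, link_gbit_per_s: float,
                 efficiency: float = 1.0) -> float:
    """Estimate seconds needed to send memory_gib GiB over a network link.

    efficiency is the assumed fraction of line rate actually achieved
    (1.0 means the theoretical maximum, which real transfers do not reach).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_bytes = memory_gib * 1024 ** 3
    # 1 Gbit/s is 10^9 bits per second; divide by 8 for bytes per second.
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return total_bytes / bytes_per_second


if __name__ == "__main__":
    # Figures from the thread: 512 GB of memory, 1 GbE network.
    ideal = dump_seconds(512, 1.0)
    # Assumed 70% of line rate, purely as an illustration.
    derated = dump_seconds(512, 1.0, efficiency=0.7)
    print(f"ideal line rate: {ideal:.0f} s ({ideal / 60:.1f} min)")
    print(f"at 70% of line rate: {derated:.0f} s ({derated / 60:.1f} min)")
```

Even at full line rate the estimate comes out above an hour, which matches the figure quoted in the original question and shows why a post fail delay of that size would leave cluster services hung for a long time.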