On Wed, Sep 01, 2010 at 10:48:23AM -0400, Ben Turner wrote:
> Here is a kbase on fence_scsi; it should answer any questions you have:
>
> https://access.redhat.com/kb/docs/DOC-17809
>
> Usually I run fence_scsi_test to be sure my devices are capable. Note:
>
> "To assist with finding and detecting devices which are (or are not)
> suitable for use with fence_scsi, a tool has been provided. The
> fence_scsi_test script will find devices visible to the node and report
> whether or not they are compatible with SCSI persistent reservations."

I just have to comment that fence_scsi_test is rather limited. I'm
currently working on making it more robust, so that it more accurately
tests device(s) for SCSI-PR support. Basically there are two issues:

1. The current script does not verify that registrations exist on a
   device -- it relies on the error code returned from sg_persist. This
   usually works, but we have seen some arrays that will report false
   positives.

2. The script *only* puts a registration on the device(s) and then
   removes the registration from each device. This doesn't tell the
   whole story, since the array must also support the preempt-and-abort
   operation, which is what fence_scsi actually uses to fence a node.

A new fence_scsi_test script should be available in the very near
future. Here is the relevant BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=603838
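In the meantime, you can exercise both operations by hand with
sg_persist from sg3_utils. A rough sketch only -- the device path and
keys below are made up, and the preempt step needs a second node
registered with its own key to be a meaningful test:

    # Register a key, then verify it is really there instead of
    # trusting the sg_persist exit code (issue 1 above):
    sg_persist --out --register --param-sark=0x1 /dev/sdX
    sg_persist --in --read-keys /dev/sdX          # expect 0x1 listed

    # From a second node registered as 0x2, try preempt-and-abort
    # (issue 2 above); type 5 is write-exclusive, registrants-only,
    # which is what fence_scsi uses:
    sg_persist --out --preempt-abort --param-rk=0x2 --param-sark=0x1 \
               --prout-type=5 /dev/sdX
    sg_persist --in --read-keys /dev/sdX          # 0x1 should be gone

    # Clean up by registering a zero key (i.e. unregister):
    sg_persist --out --register --param-rk=0x2 --param-sark=0 /dev/sdX

If the victim key still shows up after the preempt, the array does not
really support what fence_scsi needs, whatever the exit codes say.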
Ryan

> ----- "Chris Jankowski" <Chris.Jankowski@xxxxxx> wrote:
>
> > Ben,
> >
> > Thank you for pointing me at fence_scsi.
> > It looks like fence_scsi will fit the bill elegantly, and it should
> > be much more reliable than iLO fencing if the cluster uses a
> > properly configured, dual-fabric FC SAN for shared storage.
> >
> > I read the fence_scsi manual page and have one more question.
> >
> > What do I need to do for my cluster to start using SCSI
> > reservations? Is this done by default?
> >
> > Thanks and regards,
> >
> > Chris Jankowski
> >
> > -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx
> > [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Ben Turner
> > Sent: Saturday, 28 August 2010 03:29
> > To: linux clustering
> > Subject: Re: Fencing through iLO and functioning of kdump
> >
> > You have a couple of options here:
> >
> > 1. Switch to fence_scsi (which uses SCSI reservations, as you
> > described) or another I/O fencing method that does not reboot the
> > system. This will let your core dump complete without power fencing
> > interrupting it.
> >
> > 2. Put in a post fail delay long enough for the crash dump to
> > complete before fencing kicks in. This is suboptimal, as your
> > cluster services/resources will be hung for the duration of the post
> > fail delay. I usually only do this when I know I have a node that is
> > crashing and no I/O fencing capabilities.
> >
> > 3. If you don't have access to an I/O fence agent and a post fail
> > delay won't work for some reason, the best practice I can think of
> > right now would be the following:
> >
> > 1. Disable the power fence device on the host you're seeing panics
> > on; I have changed the IP for it in cluster.conf in the past.
> > 2. When that node fails, the other nodes will attempt to fence the
> > host, and the fence will fail since the fence device was disabled.
> > (NOTE: between steps 2 and 3, cluster operation is suspended.)
> > 3. The administrator can now do things like:
> >    - disconnect the FC and network cables from the affected host,
> >      ensuring that it is 'manually I/O fenced'
> >    - run fence_ack_manual on another host to override the failed
> >      fencing operation so that cluster operation can continue on
> >      the other nodes
> > 4. Now the failed host is free to continue kdumping for as long as
> > need be.
> >
> > Hope this helps.
> >
> > -b
> >
> > ----- "Chris Jankowski" <Chris.Jankowski@xxxxxx> wrote:
> >
> > > Hi,
> > >
> > > How can I reconcile the need to have kdump configured and
> > > operational on cluster nodes with the need for fencing of a node,
> > > most commonly and conveniently implemented through iLO on HP
> > > servers?
> > >
> > > Customers require kdump configured and operational so that kernel
> > > crashes can be analysed by Red Hat support. The taking of a crash
> > > dump starts immediately after the crash, but it may take very
> > > considerable time on a machine with 512 GB of memory (more than an
> > > hour) if done at dump level 0 over a 1 GbE network. However, if I
> > > use iLO fencing, the crashed node will be powered off through iLO,
> > > which will irrecoverably kill the kernel dump in progress and
> > > erase the memory content containing the crashed kernel image.
> > >
> > > Ideally, I would love to have the functionality that is present in
> > > several UNIX clusters, where a crashed node completes its kernel
> > > crash dump in peace. In UNIX clusters the crashed node can be
> > > configured to reboot automatically after a kernel crash and rejoin
> > > the cluster. It typically does the kernel dump as a part of the
> > > boot.
> > >
> > > The UNIX clusters typically use SCSI reservations to protect the
> > > integrity of storage. This enables them to keep the failed node
> > > isolated whilst it is still able to take the kernel crash dump
> > > before rejoining the cluster. I believe this option is not
> > > available in Linux Cluster.
> > >
> > > So, how can I have a functioning Linux cluster with the ability to
> > > take a kernel crash dump of crashed nodes, without blocking access
> > > to the shared GFS2 filesystem for the hour or so that a crash dump
> > > may take on a very large system?
> > >
> > > Thanks and regards,
> > >
> > > Chris Jankowski
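To Chris's question above about what is needed to start using SCSI
reservations: it is not on by default. A rough, untested sketch of what
a RHEL 5 setup looks like -- all names below are examples, so check the
fence_scsi man page and the kbase article above before relying on it:

    # fence_scsi is declared in /etc/cluster/cluster.conf like any
    # other fence agent, e.g.:
    #
    #   <fencedevice agent="fence_scsi" name="scsi-fence"/>
    #   ...
    #   <clusternode name="node1" nodeid="1">
    #     <fence>
    #       <method name="1">
    #         <device name="scsi-fence" node="node1"/>
    #       </method>
    #     </fence>
    #   </clusternode>
    #
    # On RHEL 5 the registrations themselves are placed at boot by the
    # scsi_reserve init script, which has to be enabled on every node:
    chkconfig scsi_reserve on
    service scsi_reserve start

Keys only get placed on devices the script can discover -- on RHEL 5
that means LVM volumes in a clustered volume group -- so it is worth
re-running fence_scsi_test afterwards to confirm the registrations
actually landed.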