Re: SCSI reservation conflicts after update

Sajesh Singh <ssingh@xxxxxxxx> · Wed, 02 Apr 2008 12:36:11 -0400

Ryan and everyone else who answered,

Thank you for the info on scsi_reserve. I have disabled the script
and all seems okay. What is a little confusing is that the
script/service was also enabled before the upgrade, but did not cause
any SCSI reservation conflicts then.

-Sajesh-

Ryan O'Hara wrote:

I went back and investigated why this might happen. It seems I had
seen it before but could not recall how this sort of thing happens.

For 4.6, the scsi_reserve script should only be run if you intend to 
use SCSI reservations as a fence mechanism, as you correctly pointed 
out at the end of your message. I believe in 4.6 scsi_reserve was 
incorrectly enabled by default.

The real problem is that the keys used for SCSI reservations are based 
on node ID. For this reason, nodeid must be defined in the 
cluster.conf file for every node. Without this, a node's ID can 
change between cluster restarts, etc. The scsi_reserve and fence_scsi 
scripts require consistent node IDs (i.e. they must not change).
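
As a quick way to check for this, here is a minimal Python sketch that 
reports which clusternode entries in a cluster.conf lack a nodeid 
attribute. The function name and the sample configuration (cluster name, 
node names) are made up for illustration; only the clusternode element 
and its nodeid attribute come from the real file format.

```python
import xml.etree.ElementTree as ET


def nodes_missing_nodeid(cluster_conf_xml):
    """Return the names of <clusternode> entries with no nodeid attribute."""
    root = ET.fromstring(cluster_conf_xml)
    missing = []
    for node in root.iter("clusternode"):
        # A missing or empty nodeid means the ID is not pinned in the config.
        if not node.get("nodeid"):
            missing.append(node.get("name", "<unnamed>"))
    return missing


# Hypothetical two-node configuration, for illustration only.
SAMPLE = """<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" votes="1"/>
  </clusternodes>
</cluster>"""

if __name__ == "__main__":
    print(nodes_missing_nodeid(SAMPLE))
```

To check a real cluster, read the contents of /etc/cluster/cluster.conf 
and pass them to the function instead of SAMPLE.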

So I think the problem we are seeing is that running 'scsi_reserve 
stop' cannot work since that will attempt to remove that node's key 
from the devices. If that key has changed (the node ID changed), it 
will not find a matching registration key on the device and thus fail.

The best bet is to disable scsi_reserve and to clear all scsi 
reservations. As you mentioned, the sg_persist command with the -C 
option should do the trick. I am guessing that the reason that failed 
for you is that you must supply the device name AND the key being used 
for that I_T nexus. You can use sg_persist to list the keys registered 
with a particular device, but since node IDs may have changed you 
might have to guess the key for a particular node (i.e. the node you 
run the sg_persist -C command on). The good news is that once you 
supply the correct key, the clear removes all the keys.
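
The "list the keys, then guess" procedure could be sketched in Python 
roughly as follows. This is only a sketch under assumptions: the helper 
names are made up, /dev/sdX is a placeholder, and the parser assumes 
that sg_persist prints each registered key on its own line as a 
0x-prefixed hex value. The sg_persist long options used (--in, 
--read-keys, --out, --clear, --param-rk, --device) are the long forms of 
-i, -k, -o, -C, -K and -d. Note that a successful clear removes every 
registration on the device, so only point it at devices you really mean 
to clear.

```python
import re
import subprocess


def read_keys_cmd(device):
    """Command that lists the keys registered with a device."""
    return ["sg_persist", "--in", "--read-keys", "--device=" + device]


def clear_cmd(device, key):
    """Command that clears all registrations, using `key` (hex digits)."""
    return ["sg_persist", "--out", "--clear",
            "--param-rk=" + key, "--device=" + device]


def parse_keys(output):
    """Extract keys from read-keys output (assumed: one 0x... per line)."""
    keys = []
    for line in output.splitlines():
        match = re.fullmatch(r"\s*0x([0-9a-fA-F]+)\s*", line)
        if match:
            keys.append(match.group(1))
    return keys


def clear_with_first_working_key(device, run=subprocess.run):
    """Try each registered key until a clear succeeds; return that key."""
    listing = run(read_keys_cmd(device), capture_output=True, text=True)
    for key in parse_keys(listing.stdout):
        result = run(clear_cmd(device, key), capture_output=True, text=True)
        if result.returncode == 0:
            return key
    return None  # no listed key matched this node's registration
```

The run parameter is there so the logic can be exercised with a stand-in 
for subprocess.run before it is pointed at real hardware.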

Ryan

Sajesh Singh wrote:
After updating my GFS cluster to the latest packages (as of 3/28/08) 
on an Enterprise Linux 4.6 cluster (kernel version 
2.6.9-67.0.7.ELsmp), I am receiving SCSI reservation errors whenever 
the nodes are rebooted. The node is then rebooted again at varying 
intervals without any intervention. I have tried to disable the 
scsi_reserve script at startup, but it does not seem to have any 
effect. I have also tried to use the sg_persist command with the -C 
option to clear all reservations, to no avail. I first noticed 
something was wrong while the second node of the two-node cluster was 
being updated. That was when the SCSI reservation errors first 
appeared on the console.

From my understanding, persistent SCSI reservations are only needed 
if I am using the fence_scsi module.

I would appreciate any guidance.

Regards,

Sajesh Singh

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
