Hi All,
I'm new to this mailing list, but I've run into a locking problem involving Linux clustering and iSCSI that keeps plaguing me. It's a pretty serious issue: every time I reboot my server, the logical volume on my primary iSCSI device fails to come up (and therefore mount) out of the box, and I have to go through a fairly manual procedure to get it working again.
Here is some configuration information:
Linux flax.xxx.com 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:27:41 EDT
2007 i686 i686 i386 GNU/Linux
[root@flax ~]# clvmd -V
Cluster LVM daemon version: 2.02.21-RHEL4 (2007-04-17)
Protocol version: 0.2.1
dmesg (excerpted):
iscsi-sfnet: Loading iscsi_sfnet version 4:0.1.11-3
iscsi-sfnet: Control device major number 254
iscsi-sfnet:host3: Session established
scsi3 : SFNet iSCSI driver
Vendor: Promise Model: VTrak M500i Rev: 2211
Type: Direct-Access ANSI SCSI revision: 04
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
sdh: unknown partition table
[root@flax ~]# clustat
Member Status: Quorate
Member Name                              Status
------ ----                              ------
flax                                     Online, Local, rgmanager
YES, THIS IS A ONE-NODE CLUSTER (which, I suspect, might be the problem).
SYMPTOM:
When the server comes up, the clustered logical volume that lives on the
iSCSI device shows as "inactive" when I run lvscan:
[root@flax ~]# lvscan
inactive '/dev/nasvg_00/lvol0' [5.46 TB] inherit
ACTIVE '/dev/lgevg_00/lvol0' [3.55 TB] inherit
ACTIVE '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
The interesting thing is that the lgevg_00 and noraidvg_01 volumes are
also clustered, but they are on direct-attached SCSI (i.e., not iSCSI).
The volume group that the logical volume belongs to shows up clean in vgscan:
[root@flax ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "nasvg_00" using metadata type lvm2
Found volume group "lgevg_00" using metadata type lvm2
Found volume group "noraidvg_01" using metadata type lvm2
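For what it's worth, you can also confirm which volume groups are actually
flagged as clustered: vgs will print the attribute bits and UUIDs, and the
clustered flag shows up as a 'c' at the end of the attr string (if I'm
remembering the column layout right):
[root@flax ~]# vgs -o vg_name,vg_attr,vg_uuid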
So, my first attempt at fixing this is to activate the volume by hand, which fails:
[root@flax ~]# lvchange -a y /dev/nasvg_00/lvol0
Error locking on node flax: Volume group for uuid not found:
oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
The same error shows up in my syslog:
Oct 13 11:27:40 flax vgchange: Error locking on node flax: Volume
group for uuid not found:
oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
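I'm not sure exactly how that uuid string is supposed to map onto the VG
and LV UUIDs, but if you want to compare it against what LVM itself
reports, something like this will print the UUIDs it knows about:
[root@flax ~]# vgs -o vg_name,vg_uuid
[root@flax ~]# lvs -o lv_name,vg_name,lv_uuid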
RESOLUTION:
It took me a very long time to figure this out, but since it happens to
me every time I reboot my server, somebody's bound to run into this
again sometime soon (and it will probably be me).
Here's how I resolved it:
I edited the /etc/lvm/lvm.conf file as follows:
was:
# Type of locking to use. Defaults to local file-based locking (1).
# Turn locking off by setting to 0 (dangerous: risks metadata corruption
# if LVM2 commands get run concurrently).
# Type 2 uses the external shared library locking_library.
# Type 3 uses built-in clustered locking.
#locking_type = 1
locking_type = 3
changed to:
(snip)
# Type 3 uses built-in clustered locking.
#locking_type = 1
locking_type = 2
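(If you'd rather script that edit than open an editor every time, a sed
one-liner roughly like the following should do it; I'm writing it from
memory, so back up lvm.conf first:)
[root@flax ~]# cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.bak
[root@flax ~]# sed -i 's/^\([[:space:]]*locking_type[[:space:]]*=\).*/\1 2/' /etc/lvm/lvm.conf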
Then restart clvmd:
[root@flax ~]# service clvmd restart
Then:
[root@flax ~]# lvchange -a y /dev/nasvg_00/lvol0
[root@flax ~]#
(see, no error!)
[root@flax ~]# lvscan
ACTIVE '/dev/nasvg_00/lvol0' [5.46 TB] inherit
ACTIVE '/dev/lgevg_00/lvol0' [3.55 TB] inherit
ACTIVE '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
(it's active!)
Then go back and edit /etc/lvm/lvm.conf to restore the original
locking_type = 3, and restart clvmd one more time.
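Since this bites me on every reboot, here is the whole dance wrapped up as
a rough script sketch. The sed expressions and the sequence are just my
shorthand for the steps above, not something battle-tested, so sanity-check
it before trusting it with your volumes:

#!/bin/sh
# Rough sketch of the workaround described above -- verify before using.
# Step 1: switch lvm.conf to library-based locking (type 2).
# Step 2: restart clvmd and activate the stuck logical volume.
# Step 3: switch back to built-in clustered locking (type 3) and restart clvmd.
set -e
sed -i 's/^\([[:space:]]*locking_type[[:space:]]*=\).*/\1 2/' /etc/lvm/lvm.conf
service clvmd restart
lvchange -a y /dev/nasvg_00/lvol0
sed -i 's/^\([[:space:]]*locking_type[[:space:]]*=\).*/\1 3/' /etc/lvm/lvm.conf
service clvmd restart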
THOUGHTS:
I admit I don't know much about clustering, but from the evidence I see,
the problem appears to be isolated to clvmd and iSCSI, if only because my
direct-attached clustered volumes don't exhibit the symptom.
I'll make another leap here and guess that it's probably specific to
single-node clusters, since I'd imagine most people who use clustering are
using it as it was intended to be used (i.e., with multiple machines).
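One thing I haven't dug into is whether the start order of the iscsi and
clvmd init scripts matters here, i.e., whether the iSCSI device is simply
not there yet when clvmd comes up. If anyone wants to compare notes, the
ordering is easy enough to check:
[root@flax ~]# chkconfig --list | egrep 'iscsi|clvmd'
[root@flax ~]# ls /etc/rc3.d/ | egrep 'iscsi|clvmd'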