I think it's because clvmd is trying to acquire the iSCSI LUNs and the iSCSI driver has not come up fully yet. The network layer has to come up, then iSCSI; after that there is a separate mount step, with a filesystem option (_netdev) that tells mount to wait for these. I'm not sure whether the same capability exists in LVM to accommodate an iSCSI device that comes up late. That may be the reason the volumes are missed by LVM.
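For example, a GFS filesystem sitting on an iSCSI LUN would normally get an /etc/fstab entry something like the one below, so that it is mounted only after the network and the iSCSI service are up (the mount point here is just a placeholder, not something from your setup):

/dev/nasvg_00/lvol0   /mnt/nas   gfs   _netdev,defaults   0 0

I don't know of an equivalent hook for clvmd's volume activation at boot time.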
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Paul Risenhoover
Sent: Tuesday, October 16, 2007 12:52 AM
To: Linux-cluster@xxxxxxxxxx
Subject: Linux clustering (one-node), GFS, iSCSI, clvmd (lock problem)

Hi All,

I'm a noob to this mailing list, but I've got some kind of locking problem with Linux clustering and iSCSI that plagues me. It's a pretty serious issue: every time I reboot my server, it fails to mount my primary iSCSI device out of the box, and I have to perform some fairly manual operations to get it working again.

Here is some configuration information:

Linux flax.xxx.com 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:27:41 EDT 2007 i686 i686 i386 GNU/Linux

[root@flax ~]# clvmd -V
Cluster LVM daemon version: 2.02.21-RHEL4 (2007-04-17)
Protocol version:           0.2.1

dmesg (excerpted):

iscsi-sfnet: Loading iscsi_sfnet version 4:0.1.11-3
iscsi-sfnet: Control device major number 254
iscsi-sfnet:host3: Session established
scsi3 : SFNet iSCSI driver
  Vendor: Promise   Model: VTrak M500i   Rev: 2211
  Type:   Direct-Access                  ANSI SCSI revision: 04
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
 sdh: unknown partition table

[root@flax ~]# clustat
Member Status: Quorate

  Member Name                  Status
  ------ ----                  ------
  flax                         Online, Local, rgmanager

YES, THIS IS A ONE-NODE CLUSTER (which, I suspect, might be the problem).

SYMPTOM:

When the server comes up, the clustered logical volume that is on the iSCSI device is labeled "inactive" when I do an "lvscan":

[root@flax ~]# lvscan
  inactive          '/dev/nasvg_00/lvol0' [5.46 TB] inherit
  ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
  ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
  ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
  ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit

The interesting thing is that the lgevg_00 and noraidvg_01 volumes are also clustered, but they are direct-attached SCSI (i.e., not iSCSI).

The volume group that the logical volume is a member of shows clean:

[root@flax ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "nasvg_00" using metadata type lvm2
  Found volume group "lgevg_00" using metadata type lvm2
  Found volume group "noraidvg_01" using metadata type lvm2

So, in order to fix this, I execute the following:

[root@flax ~]# lvchange -a y /dev/nasvg_00/lvol0
  Error locking on node flax: Volume group for uuid not found: oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS

This also shows up in my syslog:

Oct 13 11:27:40 flax vgchange: Error locking on node flax: Volume group for uuid not found: oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS

RESOLUTION:

It took me a very long time to figure this out, but since it happens to me every time I reboot my server, somebody's bound to run into this again sometime soon (and it will probably be me). Here's how I resolved it.

I edited /etc/lvm/lvm.conf as follows.

was:

# Type of locking to use. Defaults to local file-based locking (1).
# Turn locking off by setting to 0 (dangerous: risks metadata corruption
# if LVM2 commands get run concurrently).
# Type 2 uses the external shared library locking_library.
# Type 3 uses built-in clustered locking.
#locking_type = 1
locking_type = 3

changed to:

(snip)
# Type 3 uses built-in clustered locking.
#locking_type = 1
locking_type = 2

Then restart clvmd:

[root@flax ~]# service clvmd restart

Then:

[root@flax ~]# lvchange -a y /dev/nasvg_00/lvol0
[root@flax ~]#

(see, no error!)

[root@flax ~]# lvscan
  ACTIVE            '/dev/nasvg_00/lvol0' [5.46 TB] inherit
  ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
  ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
  ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
  ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit

(it's active!)

Then go back and edit /etc/lvm/lvm.conf to restore the original locking_type = 3, and restart clvmd again.

THOUGHTS:

I admit I don't know much about clustering, but from the evidence I see, the problem appears to be isolated to clvmd and iSCSI, if only because my direct-attached clustered volumes don't exhibit the symptoms. I'll make another leap here and guess that it's probably isolated to single-node clusters, since I'd imagine that most people who are using clustering are using it as it was intended to be used (i.e., with multiple machines).
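For reference, the manual workaround described in the RESOLUTION section above can be scripted so it doesn't have to be repeated by hand after every reboot. This is only a rough, untested sketch, assuming /etc/lvm/lvm.conf contains the literal line "locking_type = 3" and that clvmd is managed by the stock init script; the LV path is the one from this thread:

#!/bin/sh
# Rough sketch of the manual workaround from this thread (untested).
# Temporarily switch LVM away from built-in cluster locking (type 3),
# activate the iSCSI-backed LV, then restore the original setting.

LV=/dev/nasvg_00/lvol0
CONF=/etc/lvm/lvm.conf

cp "$CONF" "$CONF.bak"                                 # keep a backup of lvm.conf
sed -i 's/locking_type = 3/locking_type = 2/' "$CONF"  # assumes this exact string is present
service clvmd restart
lvchange -a y "$LV"                                    # should now activate without the uuid error
cp "$CONF.bak" "$CONF"                                 # put locking_type = 3 back
service clvmd restart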