I think we see this too. The workaround has basically
been to disable clustering (LVM-wise) when doing this kind of change, and to
handle it manually (a concrete command sketch follows the steps below):
I.e.:
vgchange -c n <vg> to disable the cluster flag
lvmconf --disable-cluster on all nodes
rescan/discover the LUN, whatever applies, on all nodes
lvcreate on one node
lvchange --refresh on every node
lvchange -a y on one node
gfs_grow on one host (you can run this on the other to confirm;
it should say it can't grow any more)
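To make that concrete, here is a rough sketch of the same sequence with made-up names (a VG called "vg_data", an LV called "lv_new", and a placeholder mount point; substitute your own, and the lvcreate/gfs_grow steps depend on whether you're adding a new LV or growing an existing one):

    vgchange -c n vg_data                 # one node: clear the clustered flag on the VG
    lvmconf --disable-cluster             # all nodes: switch lvm.conf to local locking
    # rescan/discover the new LUN on all nodes (method depends on your iSCSI/HBA setup)
    lvcreate -L 100G -n lv_new vg_data    # one node: create the new LV
    lvchange --refresh vg_data/lv_new     # every node: refresh the LV metadata
    lvchange -a y vg_data/lv_new          # one node: activate it
    gfs_grow /mnt/your_gfs                # one host, only if growing an existing GFS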
When done, I've been putting things back how they were
with vgchange -c y and lvmconf --enable-cluster, though I think if
you just left it unclustered it'd be fine... What you won't
want to do is leave the VG clustered but not --enable-cluster; if
you do that, the clustered volume groups won't be
activated when you reboot.
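And the put-back is just the reverse, with the same hypothetical VG name (whether clvmd needs a restart to pick up the lvm.conf change is an assumption on my part):

    vgchange -c y vg_data        # one node: set the clustered flag again
    lvmconf --enable-cluster     # all nodes: back to cluster locking
    service clvmd restart        # all nodes: assumption - may be needed to pick up the change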
Hope this helps... if anyone knows of a definitive fix for
this, I'd like to hear about it. We haven't pushed for one since it
isn't too big of a hassle and we aren't constantly adding new
volumes, but it is a pain.
Brian Fair, UNIX Administrator, CitiStreet
904.791.2662
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Randy Brown
Sent: Tuesday, November 27, 2007 12:23 PM
To: linux clustering
Subject: Adding new file system caused problems
I am running a two-node cluster using CentOS 5 that is
basically being used as a NAS head for our iSCSI-based storage. Here are
the related RPMs and the versions I am using:
kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5
This morning I created a 100GB volume on our storage unit and proceeded to make
it available to the cluster so it could be served via NFS to a client on our
network. I used pvcreate and vgcreate as I always do and created a new
volume group. When I went to create the logical volume I saw this
message:
Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid not
found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs
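In case the exact steps matter, this is roughly what I ran; the device path and names below are placeholders, not my real values:

    pvcreate /dev/sdX                            # the new 100GB iSCSI LUN
    vgcreate VolGroupNew /dev/sdX                # new volume group, no complaints here
    lvcreate -L 100G -n LogVol-new VolGroupNew   # this is where the locking error above appeared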
I figured I had done something wrong and tried to remove the LV, but
couldn't. lvdisplay showed that the logical volume had been created, and vgdisplay
looked good except that the volume was not activated. So I
ran vgchange -aly <Volumegroupname>, which didn't return any error but
also did not activate the volume. I then rebooted the node, which made
everything OK: I could now see the VG and the LV, both were active, and I
could create the GFS file system on the LV. The file system
mounted and I thought I was in the clear.
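For what it's worth, here is roughly what I was seeing before the reboot (the volume group name is again a placeholder):

    vgchange -aly VolGroupNew    # returned no error
    lvdisplay VolGroupNew        # LV Status still showed "NOT available"
    vgdisplay VolGroupNew        # otherwise looked normal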
However, node #2 wasn't picking this new filesystem up at all. I stopped
the cluster services on this node, which all stopped cleanly, and then tried to
restart them. cman started fine, but clvmd didn't; it hung on the
vgscan. Even after a reboot of node #2, clvmd would not start and
would hang on the vgscan. It wasn't until I shut down both nodes
completely and restarted the cluster that both nodes could see the new filesystem.
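If it helps, the stop/start sequence I used on node #2 was, as far as I know, the standard CentOS 5 one:

    service rgmanager stop
    service gfs stop
    service clvmd stop
    service cman stop
    # ...and then the reverse to start:
    service cman start           # started fine
    service clvmd start          # hung here, during the vgscan
    service gfs start
    service rgmanager start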
I'm sure it's my own ignorance that's making this more difficult than it needs
to be. Am I missing a step? Is more information required to
help? Any assistance in figuring out what happened here would be greatly
appreciated. I know I'm going to need to do similar tasks in the future and
obviously can't afford to bring everything down in order for the cluster to see
a new filesystem.
Thank you,
Randy
P.S. Here is my cluster.conf:
[root@nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
    <fence_daemon post_fail_delay="0" post_join_delay="60"/>
    <clusternodes>
        <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="nfspower" port="8" switch="1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="nfspower" port="7" switch="1"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <rm>
        <failoverdomains>
            <failoverdomain name="nfs-failover" ordered="0" restricted="1">
                <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
                <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="140.90.91.244" monitor_link="1"/>
            <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
            <nfsexport name="fs-shared-exp"/>
            <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
            <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
            <nfsexport name="rfcdata-exp"/>
            <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
        </resources>
        <service autostart="1" domain="nfs-failover" name="nfs">
            <clusterfs ref="fs-shared">
                <nfsexport ref="fs-shared-exp">
                    <nfsclient ref="fs-shared-client"/>
                </nfsexport>
            </clusterfs>
            <ip ref="140.90.91.244"/>
            <clusterfs ref="rfcdata">
                <nfsexport ref="rfcdata-exp">
                    <nfsclient ref="rfcdata-client"/>
                </nfsexport>
                <ip ref="140.90.91.244"/>
            </clusterfs>
        </service>
    </rm>
    <fencedevices>
        <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
    </fencedevices>
</cluster>