FC6 two-node cluster with GFS2 not working

So I've got two Dell blades set up, with multipath. They even cluster together, but once one node is up and has the GFS2 filesystem mounted, the other can't start the gfs2 service. I'm basing my setup on how I set up GFS with RHEL4; I realize this newer way has some more niceties to it and that I must be doing something wrong, but I'm not seeing much documentation on the differences, so I'm just trying to pull it off this way.


Basic rundown of setup:

Decently minimal non-X install.
Local drives are not LVM'd (which, by the way: if you have two boxes set up differently, one with LVM on the local drives and one without, it makes clvm a pain).


yum update
yum install screen ntp cman lvm2-cluster gfs2-utils

Put on a good firewall config (or turn it off; it behaves the same either way).
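For reference, the holes I punch are roughly these (the port list is what I pulled from the Red Hat cluster docs for the openais-based stack; since the behavior is identical with the firewall off, I don't think this is the problem):

iptables -A INPUT -p udp --dport 5404:5405 -j ACCEPT     # openais/cman
iptables -A INPUT -p tcp --dport 21064 -j ACCEPT         # dlm
iptables -A INPUT -p tcp --dport 50006:50009 -j ACCEPT   # ccsd
iptables -A INPUT -p udp --dport 50007 -j ACCEPT         # ccsd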

SELinux turned down to permissive.
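In other words:

setenforce 0                                                       # running system
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config     # survives the reboot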

See the attached multipath.conf and cluster.conf (pasted at the end of this message).

After updating multipath.conf, I do this:
mkinitrd -f /boot/initrd-`uname -r` `uname -r`
init 6; exit

modprobe dm-multipath
modprobe dm-round-robin
service multipathd start

That part looks just fine.
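By "looks fine" I mean the LUN shows up with every path healthy; I'm eyeballing it with:

multipath -ll                    # each path should show [active][ready]
dmsetup ls --target multipath    # the multipath map should be listed here too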


Then, after updating cluster.conf on each node, I do 'ccs_tool addnodeids' (it said to do this when I tried to start cman the first time).

Then I do 'service cman start'.
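To confirm the cluster actually formed, I check with:

cman_tool status    # cluster name, quorum, votes
cman_tool nodes     # both nodes should be listed as members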

Everything looks fine: pvcreate, vgcreate, lvcreate, mkfs.gfs2, and voila, we have a GFS2-formatted volume visible on both systems.
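For completeness, the storage commands were roughly the following (the multipath device and VG/LV names are just placeholders here; the -t value is cluster_name:fs_name, and -j 2 gives one journal per node):

pvcreate /dev/mapper/mpath0
vgcreate vg_outmail /dev/mapper/mpath0
lvcreate -L 100G -n data vg_outmail            # size is just an example
mkfs.gfs2 -p lock_dlm -t outMail:data -j 2 /dev/vg_outmail/data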

I add the /etc/fstab entry and create the mount point.
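Something like this (the mount point is a placeholder, matching the LV above):

mkdir -p /mnt/data

and in /etc/fstab:

/dev/vg_outmail/data    /mnt/data    gfs2    defaults    0 0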

Next I start clvmd, then gfs2.
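That is:

service clvmd start
service gfs2 start       # the gfs2 init script mounts whatever is of type gfs2 in fstab
chkconfig cman on; chkconfig clvmd on; chkconfig gfs2 on    # so they come up at boot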

The first box starts gfs2 just fine; the second won't, and hangs at this (from /var/log/messages):

Nov 1 22:41:07 box2 kernel: audit(1162442467.427:150): avc: denied { connectto } for pid=3724 comm="mount.gfs2" path=006766735F636F6E74726F6C645F736F6$
Nov 1 22:41:07 box2 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "outMail:data"
Nov 1 22:41:07 box2 kernel: audit(1162442467.451:151): avc: denied { search } for pid=3724 comm="mount.gfs2" name="dlm" dev=debugfs ino=13186 scontext$
Nov 1 22:41:07 box2 kernel: dlm: data: recover 1
Nov 1 22:41:07 box2 kernel: GFS2: fsid=outMail:data.1: Joined cluster. Now mounting FS...
Nov 1 22:41:07 box2 kernel: dlm: data: add member 1
Nov 1 22:41:07 box2 kernel: dlm: data: add member 2
Nov 1 22:49:07 box2 gfs_controld[3639]: mount: failed -17


Remember, SELinux is set to permissive.
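As an aside, the "-17" from gfs_controld looks like a negative errno, and 17 is EEXIST ("File exists"):

python -c 'import errno, os; print errno.errorcode[17], os.strerror(17)'    # -> EEXIST File exists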

So I shut down the box that came up fine on its own, manually started the services on box2 (the box that wasn't coming up), and it works fine. Then I turned box1 back on, and at boot it hangs in the same place box2 did.

I also realize that a two-node cluster is not preferred, but it's what I'm setting up and what I have access to at the moment, and honestly I'm not sure a third box would help (but it might).

Any suggestions?

-greg

--
http://www.gvtc.com
--
“While it is possible to change without improving, it is impossible to improve without changing.” -anonymous

“only he who attempts the absurd can achieve the impossible.” -anonymous

<?xml version="1.0"?>
<cluster name="outMail" config_version="2">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="goumang.sgc" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="172.16.1.180"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rushou.sgc" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="human" ipaddr="172.16.1.185"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_devices>
    <fence_device name="human" agent="fence_manual"/>
  </fence_devices>
</cluster>
## This is the /etc/multipath.conf file recommended for
## EMC storage devices.
##
## OS : RHEL 4 U3
## Arrays : CLARiiON and Symmetrix
##
## The blacklist is the enumeration of all devices that are to be
## excluded from multipath control
devnode_blacklist
{
        ## Replace the wwid with the output of the command
        ## 'scsi_id -g -u -s /block/[internal scsi disk name]'
        ## Enumerate the wwid for all internal scsi disks.
        ## Optionally, the wwid of VCM database may also be listed here.
        ## 3-4 Native Multipath Failover, DM-MPIO for v2.6.x Linux Kernel
        ## and EMC Storage Arrays Configuration Guide
        ## Native Multipath Failover on RHEL 4
        ##wwid 35005076718 d4224d
        ##or just do this:
        devnode "^sda"
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][0-9]*"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
## Use user friendly names, instead of using WWIDs as names.
defaults {
        ## Use user friendly names, instead of using WWIDs as names.
        user_friendly_names yes
}
devices {
        ## Device attributes requirements for EMC Symmetrix
        ## are part of the default definitions and do not require separate
        ## definition.
        ## Device attributes for EMC CLARiiON
        device {
                vendor "DGC "
                product "*"
                path_grouping_policy group_by_prio
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_emc /dev/%n"
                path_checker emc_clariion
                path_selector "round-robin 0"
                features "1 queue_if_no_path"
                no_path_retry 300
                hardware_handler "1 emc"
                failback immediate
        }
}
