How to set up NFS HA service

Debugging a cluster setup with this software would have been a lot easier with better error messages from the components, but I'm getting there...

I thought I'd just mount my gfs file systems outside the resource manager's control, so they are present all the time, and only use the resource manager to move the IP address over and do the NFS magic. That seems impossible: I couldn't get any exports to happen when I defined them in cluster.conf without a surrounding <fs>/<clusterfs> resource. I could define the exports in /etc/exports instead, but then I would have to keep that file in sync between the nodes. So in the end I put all my gfs file systems into cluster.conf.
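For comparison, the /etc/exports route I dropped would have looked roughly like this (paths and the netgroup are the ones from my cluster.conf below; the scp step is exactly the syncing I wanted to avoid):

# on each node: keep a static exports file and push it to the other node
cat > /etc/exports <<'EOF'
/service/pakke       @nis-hosts(ro,sync)
/service/xusers      @nis-hosts(rw,sync)
/service/iftscratch  @nis-hosts(rw,sync)
EOF
scp /etc/exports server2:/etc/exports    # has to be repeated after every change
exportfs -ra                             # reload the export table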

It almost works. The file systems get mounted and they get exported, but I have some error messages in the log file and the exports take a very long time to appear. Only 2 of the 3 exports I defined seem to show up.
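(By "show up" I mean what the standard tools report on the node running the service, e.g.:

exportfs -v                  # the kernel's actual export table
showmount -e localhost       # what rpc.mountd answers to clients
)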

I'm also a bit puzzled about why the file systems don't get unmounted when I disable all services.

As for file locking:
I copied /etc/init.d/nfslock to /etc/init.d/nfslock-svc and made some changes.
First, I added a little code to let nfslock read a variable STATD_STATEDIR from the config file in /etc/sysconfig and pass it to rpc.statd via the -P (state directory) option. It would be nice if someone who knows the process could get this propagated into upcoming Fedora releases... I then changed nfslock-svc to read a different config file (/etc/sysconfig/nfs-svc), to do 'service nfslock stop' at the top of the start section, and to do 'service nfslock start' at the bottom of the stop section.
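Roughly, the relevant pieces of nfslock-svc look like this (just a sketch, not the whole script; STATD_STATEDIR is the variable I mentioned, STATD_HOSTNAME is only a placeholder name for the -n part, and you may want to check that your rpc.statd supports -P and -n):

#!/bin/bash
# nfslock-svc: statd for the clustered NFS service (sketch)
. /etc/init.d/functions

# read the service-specific config instead of /etc/sysconfig/nfslock
[ -f /etc/sysconfig/nfs-svc ] && . /etc/sysconfig/nfs-svc
[ -n "$STATD_STATEDIR" ] && STATDARG="$STATDARG -P $STATD_STATEDIR"
[ -n "$STATD_HOSTNAME" ] && STATDARG="$STATDARG -n $STATD_HOSTNAME"

start() {
        service nfslock stop            # get the node-local statd out of the way
        echo -n $"Starting NFS statd: "
        daemon rpc.statd $STATDARG
        echo
}

stop() {
        echo -n $"Stopping NFS statd: "
        killproc rpc.statd
        echo
        service nfslock start           # bring the node-local statd back
}

case "$1" in
        start)  start ;;
        stop)   stop ;;
        status) status rpc.statd ;;
        *)      echo $"Usage: $0 {start|stop|status}"; exit 1 ;;
esac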
This enables me to have statd running as e.g. 'server1' on the cluster node until it takes over the nfs service. At takeover, statd gets restarted with its statedir on a cluster file system (so it can take over the lock info belonging to the service) and with the name of the NFS service IP address. Does this sound reasonable? I know I'll lose any locks the cluster node may have held (as NFS client) when it takes over the nfs service, but I cannot see any reason why the cluster node should have nfs locks (or nfs mounts, for that matter) except when doing admin work. I think I could fix even that by copying /var/lib/nfs/statd/sm* into the clustered file system right after the 'service nfslock stop' I put in (something like the snippet below).
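Untested, but the idea is this, right after the 'service nfslock stop' in the start section ($STATD_STATEDIR being the shared state directory from /etc/sysconfig/nfs-svc):

service nfslock stop
# carry the node's own client-side lock state over to the shared statedir
# so those hosts still get reboot notification
cp -a /var/lib/nfs/statd/sm/.     "$STATD_STATEDIR/sm/"
cp -a /var/lib/nfs/statd/sm.bak/. "$STATD_STATEDIR/sm.bak/"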


I have appended part of my messages file and my cluster.conf file. Any help with my NFS export issues will be appreciated.

--
birger

<?xml version="1.0"?>
<cluster name="iftc001" config_version="20">
   <clusternodes>
      <clusternode name="server1">
         <fence>
            <!-- If all else fails, make someone do it manually -->
            <method name="human">
               <device name="last_resort" ipaddr="server1"/>
            </method>
         </fence>
      </clusternode>
      <clusternode name="server2">
         <fence>
            <!-- If all else fails, make someone do it manually -->
            <method name="human">
               <device name="last_resort" ipaddr="server2"/>
            </method>
         </fence>
      </clusternode>
   </clusternodes>
                                                                             
   <fencedevices>
      <fencedevice name="last_resort" agent="fence_manual"/>
   </fencedevices>

   <cman two_node="1" expected_votes="1"/>

<rm>
  <failoverdomains>
    <failoverdomain name="nfsdomain" ordered="0" restricted="1">
      <failoverdomainnode name="server1" priority="1"/>
      <failoverdomainnode name="server2" priority="2"/>
    </failoverdomain>
    <failoverdomain name="smbdomain" ordered="0" restricted="1">
      <failoverdomainnode name="server1" priority="2"/>
      <failoverdomainnode name="server2" priority="1"/>
    </failoverdomain>
  </failoverdomains>

  <resources>
    <clusterfs fstype="gfs" name="cluadmfs" mountpoint="/cluadm" device="/dev/raid5/cluadm" options="acl"/>
    <clusterfs fstype="gfs" name="pakkefs" mountpoint="/service/pakke" device="/dev/raid5/pakke" options="acl"/>
    <clusterfs fstype="gfs" name="xusersfs" mountpoint="/service/xusers" device="/dev/raid5/xusers" options="acl"/>
    <clusterfs fstype="gfs" name="iftscratchfs" mountpoint="/service/iftscratch" device="/dev/raid5/iftscratch" options="acl"/>
    <nfsexport name="NFSexports"/>
    <nfsclient name="nis-hosts" target="@nis-hosts" options="rw,sync"/>
    <nfsclient name="nis-hosts-ro" target="@nis-hosts" options="ro,sync"/>
  </resources>

  <service name="nfssvc" domain="nfsdomain">
    <ip address="X.X.X.X" monitor_link="yes"/>
    <script name="NFS script" file="/etc/init.d/nfs"/>
    <script name="NFS script" file="/etc/init.d/nfslock-svc"/>
    <clusterfs ref="cluadmfs"/>
    <clusterfs ref="pakkefs">
      <nfsexport ref="NFSexports">
        <nfsclient ref="nis-hosts-ro"/>
      </nfsexport>
    </clusterfs>
    <clusterfs ref="xusersfs">
      <nfsexport ref="NFSexports">
        <nfsclient ref="nis-hosts"/>
      </nfsexport>
    </clusterfs>
    <clusterfs ref="iftscratchfs">
      <nfsexport ref="NFSexports">
        <nfsclient ref="nis-hosts"/>
      </nfsexport>
    </clusterfs>
  </service>


  <service name="smbsvc" domain="smbdomain">
    <ip address="X.X.X.X" monitor_link="yes"/>
    <clusterfs ref="cluadmfs"/>
    <clusterfs ref="pakkefs"/>
    <clusterfs ref="xusersfs"/>
    <clusterfs ref="iftscratchfs"/>
  </service>
</rm>

</cluster>
Apr 19 14:42:43 server1 clurgmgrd[7498]: <notice> Starting disabled service nfssvc
Apr 19 14:42:43 server1 kernel: GFS: Trying to join cluster "lock_dlm", "iftc001:cluadm"
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: Joined cluster. Now mounting FS...
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=0: Trying to acquire journal lock...
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=0: Looking at journal...
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=0: Done
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=1: Trying to acquire journal lock...
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=1: Looking at journal...
Apr 19 14:42:45 server1 kernel: GFS: fsid=iftc001:cluadm.0: jid=1: Done
Apr 19 14:42:46 server1 kernel: SELinux: initialized (dev dm-0, type gfs), not configured for labeling
Apr 19 14:42:46 server1 kernel: GFS: Trying to join cluster "lock_dlm", "iftc001:gfs01"
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: Joined cluster. Now mounting FS...
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=0: Trying to acquire journal lock...
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=0: Looking at journal...
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=0: Done
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=1: Trying to acquire journal lock...
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=1: Looking at journal...
Apr 19 14:42:48 server1 kernel: GFS: fsid=iftc001:gfs01.0: jid=1: Done
Apr 19 14:42:48 server1 kernel: SELinux: initialized (dev dm-2, type gfs), not configured for labeling
Apr 19 14:42:48 server1 nfs: rpc.mountd shutdown failed
Apr 19 14:42:48 server1 nfs: nfsd shutdown failed
Apr 19 14:42:48 server1 nfs: rpc.rquotad shutdown failed
Apr 19 14:42:48 server1 nfs: Shutting down NFS services:  succeeded
Apr 19 14:42:48 server1 nfs: Starting NFS services:  succeeded
Apr 19 14:42:48 server1 nfs: rpc.rquotad startup succeeded
Apr 19 14:42:48 server1 nfs: rpc.nfsd startup succeeded
Apr 19 14:42:49 server1 nfs: rpc.mountd startup succeeded
Apr 19 14:42:49 server1 rpcidmapd: rpc.idmapd -SIGHUP succeeded
Apr 19 14:42:51 server1 clurmtabd[12327]: <err> #20: Failed set log level
Apr 19 14:42:51 server1 kernel: GFS: Trying to join cluster "lock_dlm", "iftc001:xusers"
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: Joined cluster. Now mounting FS...
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=0: Trying to acquire journal lock...
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=0: Looking at journal...
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=0: Done
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=1: Trying to acquire journal lock...
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=1: Looking at journal...
Apr 19 14:42:53 server1 kernel: GFS: fsid=iftc001:xusers.0: jid=1: Done
Apr 19 14:42:53 server1 kernel: SELinux: initialized (dev dm-1, type gfs), not configured for labeling
Apr 19 14:42:53 server1 clurmtabd[12426]: <err> #20: Failed set log level
Apr 19 14:42:53 server1 kernel: GFS: Trying to join cluster "lock_dlm", "iftc001:scratch"
Apr 19 14:42:55 server1 kernel: GFS: fsid=iftc001:scratch.0: Joined cluster. Now mounting FS...
Apr 19 14:42:55 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=0: Trying to acquire journal lock...
Apr 19 14:42:55 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=0: Looking at journal...
Apr 19 14:42:56 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=0: Done
Apr 19 14:42:56 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=1: Trying to acquire journal lock...
Apr 19 14:42:56 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=1: Looking at journal...
Apr 19 14:42:56 server1 kernel: GFS: fsid=iftc001:scratch.0: jid=1: Done
Apr 19 14:42:56 server1 kernel: SELinux: initialized (dev dm-3, type gfs), not configured for labeling
Apr 19 14:42:56 server1 clurmtabd[12517]: <err> #20: Failed set log level
Apr 19 14:42:57 server1 nfs: Starting NFS services:  succeeded
Apr 19 14:42:57 server1 nfs: rpc.rquotad startup succeeded
Apr 19 14:42:57 server1 nfs: rpc.nfsd startup succeeded
Apr 19 14:42:57 server1 nfs: rpc.mountd startup succeeded
Apr 19 14:42:57 server1 rpcidmapd: rpc.idmapd -SIGHUP succeeded
Apr 19 14:42:57 server1 nfslock: lockd -KILL succeeded
Apr 19 14:42:57 server1 rpc.statd[12004]: Caught signal 15, un-registering and exiting.
Apr 19 14:42:57 server1 nfslock: rpc.statd shutdown succeeded
Apr 19 14:42:58 server1 rpc.statd[12618]: Version 1.0.6 Starting
Apr 19 14:42:58 server1 rpc.statd[12618]: Flags:
Apr 19 14:42:58 server1 nfslock-svc: rpc.statd startup succeeded
Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)

