Re: How to set up NFS HA service

I think my first attempt to answer ended up in the bit bucket because of a WLAN problem while I was saving it to the drafts folder. Sigh...

Lon Hohberger wrote:

> On Tue, 2005-04-19 at 15:08 +0200, birger wrote:
>
> Known bug/feature:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151669
>
> You can change this behavior if you wanted to by adding <child type=...>
> to service.sh's "special" element in the XML meta-data.

I thought about trying just that, but believed it couldn't be that simple... :-D
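
For anyone finding this in the archives: as I understand it, that means adding another <child> line to the <special tag="rgmanager"> block that service.sh prints as its meta-data. From memory the block looks roughly like the sketch below; the resource types and start/stop levels are illustrative, so check the service.sh shipped with your rgmanager before copying anything.

    <special tag="rgmanager">
        <!-- each child line declares a resource type the service may contain,
             plus the relative order it is started and stopped in -->
        <child type="fs" start="2" stop="8"/>
        <child type="clusterfs" start="3" stop="7"/>
        <child type="nfsexport" start="5" stop="5"/>
        <child type="nfsclient" start="6" stop="4"/>
        <child type="ip" start="7" stop="2"/>
        <child type="script" start="9" stop="1"/>
    </special>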


>> I'm also a bit puzzled about why the file systems don't get unmounted when I disable all services.


> They're GFS.  Add force_unmount="1" to the <fs> elements if you want
> them to be umounted.  GFS is nice because you *don't* have to umount
> it.

That was exactly why I wanted to mount the GFS file systems outside the service. I am very happy with this unexpected behaviour. I want the file systems to be there. :-)


I was afraid they didn't unmount because of some problem.
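
For the archives: if someone does want rgmanager to unmount a file system when the service stops, the attribute goes on the element itself in cluster.conf, along these lines (resource names, devices and mount points are made up):

    <service name="nfssvc">
        <!-- ext3 file system: unmounted (forcibly if need be) when the
             service stops, because of force_unmount="1" -->
        <fs name="nfsdata" device="/dev/sdb1" mountpoint="/export/data"
            fstype="ext3" force_unmount="1"/>
        <!-- GFS file system: stays mounted on stop unless force_unmount="1"
             is added here as well -->
        <clusterfs name="gfsdata" device="/dev/vg0/gfs01"
            mountpoint="/export/shared" fstype="gfs"/>
    </service>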

> FYI, NFS services on traditional file systems don't cleanly stop right
> now due to an EBUSY during umount from the kernel.  Someone's looking
> into it on the NFS side (apparently, not all the refs are getting cleared
> if a node has an NFS mount ref and we unexport the FS, or something).

I saw a very similar problem some years ago on Solaris with Veritas FirstWatch. fuser and lsof came up empty, but the file system was still busy when I tried to umount it. I found a workaround: restarting statd and lockd and then umounting. It seems they had their paws in the file system somehow.
Since FirstWatch was mostly a bunch of sh scripts, it was easy to modify the NFS umount code to do this.
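
From memory, the extra step I wedged into the FirstWatch umount script looked something like this (Solaris-style daemon paths, made-up mount point):

    # kill the lock daemons so they drop their references, restart them,
    # then retry the umount
    pkill -x statd
    pkill -x lockd
    /usr/lib/nfs/statd
    /usr/lib/nfs/lockd
    umount /export/data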


Regarding lockd, I think my solution is valid given two constraints:
- The cluster nodes should not be NFS clients (and thanks to GFS I don't need them to be).
- There should only be one NFS service running on any cluster node (and I only have one NFS service).


When I set statd's name to the name of the service IP address and relocate its status directory to a cluster disk, a takeover should behave just like a server reboot, shouldn't it?
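
Concretely, what I have in mind is starting statd myself rather than through the stock init script, along the lines below. The hostname and directory are placeholders, and the option names should be double-checked against the rpc.statd man page of the nfs-utils version in use.

    # run statd under the service's name, with its sm/sm.bak state on the
    # shared disk, so the surviving node notifies clients under the same
    # name after a takeover
    rpc.statd -n nfssvc.example.com -P /export/statd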

>> Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
>> Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
>> Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
>> Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
>> Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)


> Hmm, that's odd, it could be a bug in the status phase which is related
> to NIS exports.  Does this only happen after a failover, or does it
> happen all the time?

My cluster only has one node (even though I have defined two nodes). I have to get the first node production-ready and migrate everything over first; then I can make the old file server the second cluster node.


I'll have a look around and see if I can find a solution.

--
birger

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
