Lon Hohberger wrote:
On Tue, 2005-04-19 at 15:08 +0200, birger wrote:
Known bug/feature:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151669
You can change this behavior if you want to by adding <child type=...> to service.sh's "special" element in the XML meta-data.
I thought about trying just that, but believed it couldn't be that simple... :-D
I'm also a bit puzzled about why the file systems don't get unmounted when I disable all services.
They're GFS. Add force_unmount="1" to the <fs> elements if you want them to be umounted. GFS is nice because you *don't* have to umount it.
That was exactly why I wanted to mount the gfs file systems outside the service. I am very happy with this unexpected behaviour. I want the file systems to be there. :-)
I was afraid they didn't unmount because of some problem.
FYI, NFS services on traditional file systems don't cleanly stop right now due to an EBUSY during umount from the kernel. Someone's looking into it on the NFS side (apparently, not all the refs are getting cleared if a node has an NFS mount ref and we unexport the FS, or something).
I saw a very similar problem some years ago on Solaris with Veritas FirstWatch. fuser and lsof came up empty, but the file system was still busy when I tried to umount. I found a workaround: restart statd and lockd, and then umount. It seems they had their paws in the file system somehow.
Since FirstWatch was mostly a bunch of sh scripts it was easy to modify the nfs umount code to do this.
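For illustration, the workaround described above could look roughly like the sketch below. This is not the FirstWatch code; the function names and the UMOUNT_CMD and RESTART_LOCK_DAEMONS_CMD variables are made up for this example, and the actual restart of statd and lockd is left as a site-specific stub.

```shell
#!/bin/sh
# Hypothetical sketch: retry an unmount once after restarting the NFS lock daemons.
# UMOUNT_CMD and RESTART_LOCK_DAEMONS_CMD are placeholders invented for this example.
: "${UMOUNT_CMD:=umount}"
: "${RESTART_LOCK_DAEMONS_CMD:=restart_lock_daemons}"

restart_lock_daemons() {
    # Site-specific: stop and start statd and lockd here.
    echo "restart_lock_daemons: not implemented for this platform" >&2
    return 1
}

umount_with_lock_restart() {
    mountpoint=$1
    # First attempt: plain unmount.
    if "$UMOUNT_CMD" "$mountpoint"; then
        return 0
    fi
    # The unmount failed (for example with EBUSY): restart statd and lockd,
    # then retry exactly once.
    "$RESTART_LOCK_DAEMONS_CMD" || return 1
    "$UMOUNT_CMD" "$mountpoint"
}
```

The function gives up if the restart hook fails or if the second umount still fails, so the caller's stop script can decide what to do next.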
Regarding lockd, I think my solution is valid given the two constraints:
- The cluster nodes should not be NFS clients (and thanks to GFS I don't need that)
- There should only be one NFS service running on any cluster node. And I only have one NFS service.
When I set the name for statd to the name of the service IP address and relocate the status dir to a cluster disk, a takeover should behave just like a server reboot, shouldn't it?
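To make the statd idea concrete, here is a minimal sketch that only prints the command line I have in mind. It assumes the nfs-utils rpc.statd with its -n (name) and -P (state directory path) options; the host name and directory in the usage line are invented examples, and how the options get wired into the init scripts is left out.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rpc.statd command line for a
# clustered NFS service. statd_command is a name made up for this example.
statd_command() {
    service_name=$1   # name that resolves to the service IP address
    state_dir=$2      # status directory located on the cluster disk
    printf 'rpc.statd -n %s -P %s\n' "$service_name" "$state_dir"
}
```

For example, `statd_command nfssvc.example.com /gfs/nfs/statd` prints `rpc.statd -n nfssvc.example.com -P /gfs/nfs/statd`. With the state directory on shared storage, the node that takes over the service would see the same client list as the node that held it before.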
>>Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
Hmm, that's odd, it could be a bug in the status phase which is related to NIS exports. Does this only happen after a failover, or does it happen all the time?
My cluster only has one node (even if I have defined 2 nodes). I have to get the first node production ready and migrate everything over first. Then make the old file server a second cluster node.
I'll have a look around and see if I can find a solution.
-- birger
-- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster