NFS setup: 2 servers, stock Red Hat 5.4.
The following is on the SAN:
1) /var/lib/nfs (so that lock state is preserved across the 2 servers)
2) /export/home (home area I export to)
3) /export/shared
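(For reference, the layout is roughly the following; the volume and VG names below are placeholders, not my real ones, and the actual mounting is done by the cluster service, not fstab:)

    mount /dev/vg_san/lv_nfsstate  /var/lib/nfs    # shared statd/lockd state
    mount /dev/vg_san/lv_home      /export/home    # exported home area
    mount /dev/vg_san/lv_shared    /export/shared  # exported shared area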
Setup:
1) HA-LVM (so that only one NFS server can see the volume at a time)
2) /export/home 192.168.251.0/255.255.255.0(rw,async,no_root_squash,fsid=4000)
3) Shared IP
4) All NFS dynamic ports are locked down to static ones (see the sketch after this list)
5) rpc.statd is started with "-n <hostnameoffloatingip>"
6) RPCNFSDCOUNT=64
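Roughly, items 2 and 4-6 above translate to something like the following (the port numbers are just example values picked for the firewall, not anything special):

/etc/exports:
    /export/home  192.168.251.0/255.255.255.0(rw,async,no_root_squash,fsid=4000)

/etc/sysconfig/nfs:
    RPCNFSDCOUNT=64
    MOUNTD_PORT=4002
    STATD_PORT=4003
    STATD_OUTGOING_PORT=4004
    LOCKD_TCPPORT=4001
    LOCKD_UDPPORT=4001
    RQUOTAD_PORT=4005
    # rpc.statd is started with "-n <hostnameoffloatingip>"; on RHEL 5 this can
    # typically be passed through the nfslock init script (e.g. STATDARG) --
    # check your init script for the exact variable.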
The Service setup (with the parent-child relationship):
- Floating IP
|- LVM, FileSystem Mounts (to mount /var/lib/nfs, /export/home)
|--- nfslock
|----- nfs
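In cluster.conf terms the service stanza looks roughly like this (resource names, devices, fstype and the address are placeholders, and I'm leaving out the failover domain and fencing bits):

    <service autostart="1" name="nfs-ha">
      <ip address="192.168.251.10" monitor_link="1">
        <lvm name="halvm-home" vg_name="vg_san" lv_name="lv_home"/>
        <fs name="nfs-state" device="/dev/vg_san/lv_nfsstate"
            mountpoint="/var/lib/nfs" fstype="ext3" force_unmount="1"/>
        <fs name="nfs-home" device="/dev/vg_san/lv_home"
            mountpoint="/export/home" fstype="ext3" force_unmount="1">
          <script name="nfslock" file="/etc/init.d/nfslock">
            <script name="nfs" file="/etc/init.d/nfs"/>
          </script>
        </fs>
      </ip>
    </service>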
It seems to work; I have failed it over several hundred times.
The only issue is that after a fail-over some clients can stop writing.
Clients mount with defaults,async,noatime,proto=udp; the defaults give hard mounts and NFSv3.
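Concretely, the client side is just an fstab entry along these lines (the server name here is a placeholder for the hostname of the floating IP):

    nfs-float:/export/home   /home   nfs   defaults,async,noatime,proto=udp   0 0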
My test runs 4 NFS clients with 8 writer processes per client, all writing files while I perform the failover.
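The per-client writer load is nothing fancy, roughly along these lines (paths and sizes are arbitrary, not my actual test script):

    # start 8 writers on each client, each writing a stream of files to the NFS mount
    for i in $(seq 1 8); do
      ( j=0
        while true; do
          dd if=/dev/zero of=/home/test/$(hostname)-$i-$j bs=1M count=10 2>/dev/null
          j=$((j+1))
        done ) &
    done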
Sometimes a few clients stop writing -- which is inconsistent with the mounts being hard (a hard mount should retry indefinitely rather than fail).
I've tried clients with Red Hat 5.4.x and 5.5 kernels with the same results; changing timeo and retrans does not help either.
I tried the TCP option and the clients panicked (bugzilla.redhat.com #585269) during fail-over, hence the UDP option.
I wonder if anyone is seeing the same thing. The annoying part is that the stalled writes only happen some of the time, not every time.
The failover itself completes every time, and afterwards the clients can still see the mounted filesystem.
I noticed that when a client has issues, rpciod/6 shoots up to 100% CPU for several seconds. My writer processes also spike to 100%, then die without finishing their files.
It feels like a bug in the NFS client, but I'm not certain. I would appreciate a second opinion from the community.
Thanks.