NFS setup: 2 servers, stock Red Hat 5.4.
The following is on the SAN:
1) /var/lib/nfs (so that lock state is preserved across the 2 servers)
2) /export/home (home area I export to)
3) /export/shared
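(For reference, the layout is roughly the following; the volume and VG names below are placeholders, not my real ones, and the actual mounting is done by the cluster service, not fstab:)

    mount /dev/vg_san/lv_nfsstate  /var/lib/nfs    # shared statd/lockd state
    mount /dev/vg_san/lv_home      /export/home    # exported home area
    mount /dev/vg_san/lv_shared    /export/shared  # exported shared area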
Setup:
1) HA-LVM (so that only one NFS server can see the volume at a time)
2) /export/home 192.168.251.0/255.255.255.0(rw,async,no_root_squash,fsid=4000)
3) Shared IP
4) All NFS dynamic ports are locked down to static ones (see the sketch after this list)
5) rpc.statd is started with "-n <hostnameoffloatingip>"
6) RPCNFSDCOUNT=64
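Roughly, items 2 and 4-6 above translate to something like the following (the port numbers are just example values picked for the firewall, not anything special):

/etc/exports:
    /export/home  192.168.251.0/255.255.255.0(rw,async,no_root_squash,fsid=4000)

/etc/sysconfig/nfs:
    RPCNFSDCOUNT=64
    MOUNTD_PORT=4002
    STATD_PORT=4003
    STATD_OUTGOING_PORT=4004
    LOCKD_TCPPORT=4001
    LOCKD_UDPPORT=4001
    RQUOTAD_PORT=4005
    # rpc.statd is started with "-n <hostnameoffloatingip>"; on RHEL 5 this can
    # typically be passed through the nfslock init script (e.g. STATDARG) --
    # check your init script for the exact variable.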
The Service setup (with the parent-child relationship):
- Floating IP
|- LVM, FileSystem Mounts (to mount /var/lib/nfs, /export/home)
|--- nfslock
|----- nfs
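In cluster.conf terms the service stanza looks roughly like this (resource names, devices, fstype and the address are placeholders, and I'm leaving out the failover domain and fencing bits):

    <service autostart="1" name="nfs-ha">
      <ip address="192.168.251.10" monitor_link="1">
        <lvm name="halvm-home" vg_name="vg_san" lv_name="lv_home"/>
        <fs name="nfs-state" device="/dev/vg_san/lv_nfsstate"
            mountpoint="/var/lib/nfs" fstype="ext3" force_unmount="1"/>
        <fs name="nfs-home" device="/dev/vg_san/lv_home"
            mountpoint="/export/home" fstype="ext3" force_unmount="1">
          <script name="nfslock" file="/etc/init.d/nfslock">
            <script name="nfs" file="/etc/init.d/nfs"/>
          </script>
        </fs>
      </ip>
    </service>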
It seems to work; I have failed it over several hundred times.
The only issue is that after a fail-over some clients can stop writing.
Clients mount with defaults,async,noatime,proto=udp; the defaults give hard mounts and NFSv3.
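Concretely, the client side is just an fstab entry along these lines (the server name here is a placeholder for the hostname of the floating IP):

    nfs-float:/export/home   /home   nfs   defaults,async,noatime,proto=udp   0 0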
My test runs 4 NFS clients with 8 writer processes per client, all writing files while I perform the failover.
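The per-client writer load is nothing fancy, roughly along these lines (paths and sizes are arbitrary, not my actual test script):

    # start 8 writers on each client, each writing a stream of files to the NFS mount
    for i in $(seq 1 8); do
      ( j=0
        while true; do
          dd if=/dev/zero of=/home/test/$(hostname)-$i-$j bs=1M count=10 2>/dev/null
          j=$((j+1))
        done ) &
    done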
Sometimes a few clients stop writing -- which is inconsistent with the mounts being hard (a hard mount should retry indefinitely rather than fail).
I've tried clients with Red Hat 5.4.x and 5.5 kernels with the same results; changing timeo and retrans does not help either.
I tried the TCP option and the clients panicked (bugzilla.redhat.com #585269) during fail-over, hence the UDP option.
I wonder if anyone is seeing the same thing. The annoying part is that the stalled writes only happen some of the time, not every time.
The failover itself completes every time, and afterwards the clients can still see the mounted filesystem.
I noticed that when a client has issues, rpciod/6 shoots up to 100% CPU for several seconds. My writer processes also spike to 100%, then die without finishing their files.
It feels like a bug in the NFS client, but I'm not certain. I would appreciate a second opinion from the community.
Thanks.