On Wed, Feb 6, 2013 at 9:19 AM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> Oh, OK. Looking at the code in xlators/nfs/server/src/nlm4.c.... Looks
> like it's probably just using the same statd as the kernel server--the
> one installed as a part of nfs-utils, which by default puts its state in
> /var/lib/nfs/statd/.
>
> So if you want failover to work, then the contents of
> /var/lib/nfs/statd/ has to be made available to the server that takes
> over somehow.

This statd data and the implementation of the NLM protocol is not
something I am very familiar with. But Rajesh (on CC) explained a little
about it and informed me that the current NLM implementation indeed does
not support transparent fail-over yet.
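
For concreteness, a minimal sketch of the "make the statd contents
available" step, assuming shared or replicated storage between the two
servers. The paths and helper below are assumptions for illustration,
not something gluster ships:

    import os
    import shutil

    # Hypothetical sketch: rpc.statd keeps one record per client that
    # holds a lock under /var/lib/nfs/statd/sm (plus sm.bak for hosts
    # still to be notified). Mirroring those records to storage the
    # takeover server can read is the minimum it needs in order to
    # know which clients to notify after a failover.
    STATD_DIR = "/var/lib/nfs/statd"
    SHARED_DIR = "/shared/nfs/statd"  # assumed shared/replicated mount

    def mirror_statd_state():
        for sub in ("sm", "sm.bak"):
            src = os.path.join(STATD_DIR, sub)
            dst = os.path.join(SHARED_DIR, sub)
            if os.path.isdir(src):
                # dirs_exist_ok requires Python 3.8+
                shutil.copytree(src, dst, dirs_exist_ok=True)

    if __name__ == "__main__":
        mirror_statd_state()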
The NLM implementation in gluster is stateless for all practical purposes
(all locks are translated to lk() FOPs on the bricks). However, we
currently just depend on the stock RHEL rpc.statd, which is not
clustered. If that rpc.statd were replaced with a "clustered" statd,
Gluster's NLM should "just work" even across failovers, by making a
failover appear to clients as a server reboot and kicking off NLM's lock
recovery. That may not be ideal or efficient, but it should be
functional.
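
A rough sketch of that last step, driving sm-notify (from nfs-utils)
against the shared state directory from the sketch above. The floating
hostname and paths are assumptions, not a tested recipe:

    import subprocess

    # Hypothetical sketch: once the floating IP lands on the takeover
    # node, make the failover look like a server reboot. sm-notify
    # sends SM_NOTIFY to every client recorded in the statd state
    # directory, which prompts the clients to reclaim their NLM locks.
    FLOATING_NAME = "nfs.example.com"   # assumed name clients mounted
    SHARED_DIR = "/shared/nfs/statd"    # same shared state as above

    def notify_clients_after_failover():
        subprocess.run(
            ["sm-notify",
             "-f",                 # resend even if already notified
             "-v", FLOATING_NAME,  # notify as the floating server name
             "-P", SHARED_DIR],    # use the shared state directory
            check=True,
        )

    if __name__ == "__main__":
        notify_clients_after_failover()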
Avati