Re: server does not abort grace period

Ferenc Wagner <wferi@xxxxxxx> · Tue, 22 Feb 2011 18:05:14 +0100

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:

First of all, thank you very much for the detailed and useful reply!

> On Mon, Feb 21, 2011 at 08:54:24PM +0100, Ferenc Wagner wrote:
>
>> Ferenc Wagner <wferi@xxxxxxx> writes:
>> 
>>> We're running 2.6.32 (Debian squeeze) NFS4 server and clients.  The
>>> server boots and runs purely from SAN, so we can start it on different
>>> computers.  In case of such "hardware failovers" I'd expect the clients
>>> to quickly reclaim their locks (if any) and thus the server to abort
>>> its 90-second grace period early.  However, this does not happen,
>>> ruining our HA like, totally.
>>>
>>> So, the questions: is the functionality of aborting the grace period
>>> early missing from version 2.6.32 of the Linux kernel?  If yes, is it
>>> present in any kernel version?  If it should work, could someone offer
>>> some advice on debugging it?  If it isn't supported, what's the
>>> best practice for providing highly available NFSv4 today?
>> 
>> Could somebody please share any related wisdom?  Pretty please?
>> In short, how to fight the grace period in an HA NFS4 setup?
>> Decreasing it (of course after cutting the lock lease time) seems a
>> rather big hammer, I'd like to avoid using it if reasonably possible.
>
> The NFSv4.0 protocol doesn't provide any way for clients to tell the
> server that they have finished recovering; as long as *any* clients
> held state on the previous server instance, the new server is stuck
> waiting out the whole grace period.  Some things we could do:
>
> 	- We could at least recognize the case where *no* clients held
> 	  state before, and end the grace period early in that case.

Would this mean that /var/lib/nfs/v4recovery is empty on the server?
Actually, it contains a hex-named empty directory, sometimes two (we're
running with two clients at the moment).
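
A minimal Python sketch of such a check, listing the subdirectories
under the recovery directory.  The function name is made up for
illustration, and reading each subdirectory as one recorded client is
only an assumption here, not something confirmed in this thread.

```python
import os

def recovery_entries(path="/var/lib/nfs/v4recovery"):
    """Return the sorted names of the subdirectories found under path."""
    with os.scandir(path) as entries:
        # Only directories are of interest; plain files are skipped.
        return sorted(e.name for e in entries if e.is_dir())
```

An empty list from recovery_entries() on the server would correspond to
the "no clients held state" case, if the assumption above holds.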

> 	- In the NFSv4.1 case there is a "reclaim complete" rpc that
> 	  clients are required to send.  Currently we don't take
> 	  advantage of that to end the grace period early, but we
> 	  should.  That's no help for 4.0 clients.

/proc/fs/nfsd/versions shows +4.1 on the server.  Does this mean that
Linux clients mounting with type nfs4 should issue "reclaim complete"?
I see that it won't help at the moment anyway, lacking server support;
I'm asking just out of interest...
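
For illustration, a small Python sketch that parses the versions file
into a mapping of version to enabled/disabled.  The helper names are
hypothetical and the sample string in the usage note is not a capture
from our server.  Note that this file only reports what the server
offers; it says nothing about which minor version a given client mount
actually uses.

```python
def parse_versions(text):
    """Map each version token (e.g. "4.1") to True for "+" or False for "-"."""
    enabled = {}
    for token in text.split():
        sign, version = token[0], token[1:]
        if sign not in ("+", "-"):
            raise ValueError("unexpected token: %r" % token)
        enabled[version] = (sign == "+")
    return enabled

def read_versions(path="/proc/fs/nfsd/versions"):
    """Read and parse the versions file of the running NFS server."""
    with open(path) as f:
        return parse_versions(f.read())
```

With an input such as "-2 +3 +4 +4.1", parse_versions() marks "4.1" as
enabled and "2" as disabled.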

> 	- We could record a count of all locks/opens held in stable
> 	  storage and use that to decide when a client is done
> 	  recovering.  That would be complicated and risk slowing down
> 	  normal opens and locks a lot.

And the "reclaim complete" client RPC seems much better anyway, as the
server and the client may get out of sync in case of an unclean client
shutdown.

> I don't think decreasing the lease time would be so terrible.  Perhaps
> the default should even be a little less.

Fine, then.  Does the Linux NFS server implementation use the lease time
of the previous server instance as the grace period on startup, or does
it simply take whatever it finds in /proc/fs/nfsd/nfsv4leasetime?
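
A minimal Python sketch for reading the currently configured value,
assuming the file holds a single decimal number of seconds.  The helper
names are invented for illustration.  This only shows what the running
instance is configured with; which value the grace period is derived
from is exactly the open question above.

```python
def parse_lease_time(text):
    """Parse the file content (decimal seconds) into a positive int."""
    seconds = int(text.strip())
    if seconds <= 0:
        raise ValueError("lease time must be positive, got %d" % seconds)
    return seconds

def read_lease_time(path="/proc/fs/nfsd/nfsv4leasetime"):
    """Return the NFSv4 lease time of the running server, in seconds."""
    with open(path) as f:
        return parse_lease_time(f.read())
```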
-- 
Thanks for taking time,
Feri.