RE: Adventures in NFS re-exporting

"Frank Filz" <ffilzlnx@xxxxxxxxxxxxxx> · Thu, 3 Dec 2020 13:32:34 -0800

> On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust wrote:
> > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > I've been scratching my head over how to handle reboot of a re-
> > > exporting server.  I think one way to fix it might be just to allow
> > > the re-export server to pass along reclaims to the original server
> > > as it receives them from its own clients.  It might require some
> > > protocol tweaks, I'm not sure.  I'll try to get my thoughts in order
> > > and propose something.
> > >
> >
> > It's more complicated than that. If the re-exporting server reboots,
> > but the original server does not, then unless that re-exporting server
> > persisted its lease and a full set of stateids somewhere, it will not
> > be able to atomically reclaim delegation and lock state on the server
> > on behalf of its clients.
> 
> By sending reclaims to the original server, I mean literally sending new
> open and lock requests with the RECLAIM bit set, which would get brand
> new stateids.
> 
> So, the original server would invalidate the existing client's previous
> clientid and stateids--just as it normally would on reboot--but it would
> optionally remember the underlying locks held by the client and allow
> compatible lock reclaims.
> 
> Rough attempt:
> 
> 	https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_re-export_servers
> 
> Think it would fly?

On a quick read-through, that sounds good. I'm sure there are some bits and bobs we need to fix up.

I'm cc:ing Jeff Layton because what the original server needs to do looks a bit like what he implemented in CephFS to allow HA restarts of nfs-ganesha instances.

Maybe we should take this to the IETF mailing list? I'm certainly interested in discussing what we could do in the protocol to facilitate this from the nfs-ganesha perspective.

Frank