> On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust wrote:
> > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > I've been scratching my head over how to handle reboot of a
> > > re-exporting server. I think one way to fix it might be just to
> > > allow the re-export server to pass along reclaims to the original
> > > server as it receives them from its own clients. It might require
> > > some protocol tweaks, I'm not sure. I'll try to get my thoughts in
> > > order and propose something.
> > >
> >
> > It's more complicated than that. If the re-exporting server reboots,
> > but the original server does not, then unless that re-exporting server
> > persisted its lease and a full set of stateids somewhere, it will not
> > be able to atomically reclaim delegation and lock state on the server
> > on behalf of its clients.
>
> By sending reclaims to the original server, I mean literally sending new
> open and lock requests with the RECLAIM bit set, which would get brand
> new stateids.
>
> So, the original server would invalidate the existing client's previous
> clientid and stateids--just as it normally would on reboot--but it would
> optionally remember the underlying locks held by the client and allow
> compatible lock reclaims.
>
> Rough attempt:
>
> https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_re-export_servers
>
> Think it would fly?

At a quick read through, that sounds good. I'm sure there's some bits and
bobs we need to fix up.

I'm cc:ing Jeff Layton because what the original server needs to do looks
a bit like what he implemented in CephFS to allow HA restarts of
nfs-ganesha instances.

Maybe we should take this to the IETF mailing list? I'm certainly
interested in discussion on what we could do in the protocol to
facilitate this from nfs-ganesha perspective.

Frank