Re: Adventures in NFS re-exporting

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 3 Dec 2020 22:50:46 +0000

On Thu, 2020-12-03 at 14:39 -0800, Frank Filz wrote:
> 
> 
> > -----Original Message-----
> > From: Trond Myklebust [mailto:trondmy@xxxxxxxxxxxxxxx]
> > Sent: Thursday, December 3, 2020 2:14 PM
> > To: bfields@xxxxxxxxxxxx
> > Cc: linux-cachefs@xxxxxxxxxx; ffilzlnx@xxxxxxxxxxxxxx; linux-
> > nfs@xxxxxxxxxxxxxxx; daire@xxxxxxxx
> > Subject: Re: Adventures in NFS re-exporting
> > 
> > On Thu, 2020-12-03 at 17:04 -0500, bfields@xxxxxxxxxxxx wrote:
> > > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > > On Thu, 2020-12-03 at 16:13 -0500,
> > > > > > bfields@xxxxxxxxxxxx wrote:
> > > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust
> > > > > > > wrote:
> > > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > > > I've been scratching my head over how to handle
> > > > > > > > > > reboot of a re-exporting server.  I think one way
> > > > > > > > > > to fix it might be just to allow the re-export
> > > > > > > > > > server to pass along reclaims to the original
> > > > > > > > > > server as it receives them from its own clients.
> > > > > > > > > > It might require some protocol tweaks, I'm not
> > > > > > > > > > sure.  I'll try to get my thoughts in order and
> > > > > > > > > > propose something.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > > It's more complicated than that. If the re-exporting
> > > > > > > > > server reboots, but the original server does not,
> > > > > > > > > then unless that re-exporting server persisted its
> > > > > > > > > lease and a full set of stateids somewhere, it will
> > > > > > > > > not be able to atomically reclaim delegation and lock
> > > > > > > > > state on the server on behalf of its clients.
> > > > > > > 
> > > > > > > > By sending reclaims to the original server, I mean
> > > > > > > > literally sending new open and lock requests with the
> > > > > > > > RECLAIM bit set, which would get brand new stateids.
> > > > > > > > 
> > > > > > > > So, the original server would invalidate the existing
> > > > > > > > client's previous clientid and stateids--just as it
> > > > > > > > normally would on reboot--but it would optionally
> > > > > > > > remember the underlying locks held by the client and
> > > > > > > > allow compatible lock reclaims.
> > > > > > > 
> > > > > > > Rough attempt:
> > > > > > > 
> > > > > > > 
> > > > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_re-export_servers
> > > > > > > 
> > > > > > > Think it would fly?
> > > > > > 
> > > > > > > So this would be a variant of courtesy locks that can be
> > > > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > > > OPEN/LOCK outside the grace period? The purpose being to
> > > > > > > allow reclaim without forcing the client to persist the
> > > > > > > original stateid?
> > > > > > > 
> > > > > > > Hmm... That's doable, but how about the following
> > > > > > > alternative: Add a function that allows the client to
> > > > > > > request the full list of stateids that the server holds on
> > > > > > > its behalf?
> > > > > > > 
> > > > > > > I've been wanting such a function for quite a while anyway
> > > > > > > in order to allow the client to detect state leaks (either
> > > > > > > due to soft timeouts, or due to reordered close/open
> > > > > > > operations).
> > > > > 
> > > > > > Oh, that sounds interesting. So basically the re-export
> > > > > > server would re-populate its state from the original server
> > > > > > rather than relying on its clients doing reclaims? Hmm, but
> > > > > > how does the re-export server rebuild its stateids? I guess
> > > > > > it could make the clients repopulate them with the same
> > > > > > "give me a dump of all my state", using the state details to
> > > > > > match up with the old state and replacing stateids. Or did
> > > > > > you have something different in mind?
> > > > > 
> > > > 
> > > > > I was thinking that the re-export server could just use that
> > > > > list of stateids to figure out which locks can be reclaimed
> > > > > atomically, and which ones have been irredeemably lost. The
> > > > > assumption is that if you have a lock stateid or a delegation,
> > > > > then that means the clients can reclaim all the locks that
> > > > > were represented by that stateid.
> > > 
> > > > I'm confused about how the re-export server uses that list.  Are
> > > > you assuming it persisted its own list across its own
> > > > crash/reboot?  I guess that's what I was trying to avoid having
> > > > to do.
> > > 
> > > No. The server just uses the stateids as part of a check for 'do I
> > > hold state for this file on this server?'. If the answer is 'yes'
> > > and the lock owners are sane, then we should be able to assume the
> > > full set of locks that lock owner held on that file are still
> > > valid.
> > > 
> > > BTW: if the lock owner is also returned by the server, then since
> > > the lock owner is an opaque value, it could, for instance, be used
> > > by the client to cache info on the server about which uid/gid owns
> > > these locks.
> 
> Let me see if I'm understanding your idea right...
> 
> Re-export server reboots within the extended lease period it's been
> given by the original server. I'm assuming it uses the same clientid?

Yes. It would have to use the same clientid.

> But would probably open new sessions. It requests the list of
> stateids. Hmm, how to make the owner information useful, nfs-ganesha
> doesn't pass on the actual client's owner but rather just passes the
> address of its record for that client owner. Maybe it will have to do
> something a bit different for this degree of re-export support...
> 
> Now the re-export server knows which original client lock owners are
> allowed to reclaim state. So it just acquires locks using the
> original stateid as the client reclaims (what happens if the client
> doesn't reclaim a lock? I suppose the re-export server could unlock
> all regions not explicitly locked once reclaim is complete). Since
> the re-export server is acquiring new locks using the original
> stateid it will just overlay the original lock with the new lock and
> write locks don't conflict since they are being acquired by the same
> lock owner. Actually the original server could even balk at a
> "reclaim" in this way that wasn't originally held... And the original
> server could "refresh" the locks, and discard any that aren't
> refreshed at the end of reclaim. That part assumes the original
> server is apprised that what is actually happening is a reclaim.
> 
> The re-export server can destroy any stateids that it doesn't receive
> reclaims for.

Right. That's in essence what I'm suggesting. There are corner cases to
be considered: e.g. "what happens if the re-export server crashes after
unlocking on the server, but before passing the LOCKU reply on to the
client", however I think it should be possible to figure out strategies
for those cases.
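
To make the flow above concrete, here is a rough Python model of the
bookkeeping on the re-export server after it reboots. It is only a
sketch under assumptions: no "list my stateids" operation exists in
the protocol today, so HeldState, ReclaimTracker and the string
stateids are hypothetical names for illustration, not any real NFS
API. The model accepts a reclaim only when the original server still
holds state for that file and lock owner, and reports whatever was
never reclaimed so it can be destroyed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HeldState:
    """One entry of the (hypothetical) stateid list returned by the original server."""
    stateid: str
    filehandle: str
    lock_owner: str


@dataclass
class ReclaimTracker:
    """Tracks which pre-reboot state has been reclaimed by downstream clients."""
    held: dict = field(default_factory=dict)      # (filehandle, lock_owner) -> HeldState
    reclaimed: set = field(default_factory=set)   # stateids that were reclaimed

    @classmethod
    def from_server_list(cls, states):
        # Index the server's list by (file, lock owner) for quick lookup.
        return cls(held={(s.filehandle, s.lock_owner): s for s in states})

    def try_reclaim(self, filehandle, lock_owner):
        """Return the original stateid to reuse, or None if the state is lost."""
        state = self.held.get((filehandle, lock_owner))
        if state is None:
            return None
        self.reclaimed.add(state.stateid)
        return state.stateid

    def finish(self):
        """Once reclaim is complete, list the stateids nobody reclaimed."""
        return sorted(
            s.stateid for s in self.held.values() if s.stateid not in self.reclaimed
        )
```

For example, if the original server reports state for two (file, lock
owner) pairs and only one client reclaims, finish() returns the other
stateid, which the re-export server would then destroy. The crash
window around LOCKU mentioned above is deliberately not modelled.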

> 
> Hmm, I think if the re-export server is implemented as an HA cluster,
> it should establish a clientid on the original server for each
> virtual IP (assuming that's the unit of HA)  that exists. Then when
> virtual IPs are moved, the re-export server just goes through the
> above reclaim process for that clientid.
> 

Yes, we could do something like that.
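
A minimal sketch of that per-virtual-IP bookkeeping, again in Python
and again purely illustrative: VipClientIds and the "reexport/<vip>"
naming scheme are assumptions made up for this example. The point is
only that the clientid stays tied to the virtual IP, so whichever node
takes over the IP knows which clientid to run the reclaim process for.

```python
class VipClientIds:
    """Tracks one clientid per virtual IP, plus the node currently hosting it."""

    def __init__(self):
        self._clientid = {}  # vip -> clientid used towards the original server
        self._node = {}      # vip -> node currently hosting the vip

    def register(self, vip, node):
        # Hypothetical naming scheme; any stable per-VIP identity would do.
        self._clientid.setdefault(vip, f"reexport/{vip}")
        self._node[vip] = node
        return self._clientid[vip]

    def move(self, vip, new_node):
        """Move a VIP to another node; return the clientid to reclaim under."""
        if vip not in self._clientid:
            raise KeyError(f"unknown virtual IP: {vip}")
        self._node[vip] = new_node
        return self._clientid[vip]

    def node_for(self, vip):
        return self._node[vip]
```

The clientid returned by move() is the same one register() handed out,
which is what lets the new node reuse the state the original server
still holds for that virtual IP.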

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx