On Thu, 2020-12-03 at 14:39 -0800, Frank Filz wrote:
>
> > -----Original Message-----
> > From: Trond Myklebust [mailto:trondmy@xxxxxxxxxxxxxxx]
> > Sent: Thursday, December 3, 2020 2:14 PM
> > To: bfields@xxxxxxxxxxxx
> > Cc: linux-cachefs@xxxxxxxxxx; ffilzlnx@xxxxxxxxxxxxxx;
> > linux-nfs@xxxxxxxxxxxxxxx; daire@xxxxxxxx
> > Subject: Re: Adventures in NFS re-exporting
> >
> > On Thu, 2020-12-03 at 17:04 -0500, bfields@xxxxxxxxxxxx wrote:
> > > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > > On Thu, 2020-12-03 at 16:13 -0500, bfields@xxxxxxxxxxxx wrote:
> > > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust wrote:
> > > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > > I've been scratching my head over how to handle
> > > > > > > > > reboot of a re-exporting server. I think one way to
> > > > > > > > > fix it might be just to allow the re-export server to
> > > > > > > > > pass along reclaims to the original server as it
> > > > > > > > > receives them from its own clients. It might require
> > > > > > > > > some protocol tweaks, I'm not sure. I'll try to get my
> > > > > > > > > thoughts in order and propose something.
> > > > > > > > >
> > > > > > > >
> > > > > > > > It's more complicated than that. If the re-exporting
> > > > > > > > server reboots, but the original server does not, then
> > > > > > > > unless that re-exporting server persisted its lease and
> > > > > > > > a full set of stateids somewhere, it will not be able to
> > > > > > > > atomically reclaim delegation and lock state on the
> > > > > > > > server on behalf of its clients.
> > > > > > > >
> > > > > > > By sending reclaims to the original server, I mean
> > > > > > > literally sending new open and lock requests with the
> > > > > > > RECLAIM bit set, which would get brand new stateids.
> > > > > > >
> > > > > > > So, the original server would invalidate the existing
> > > > > > > client's previous clientid and stateids--just as it
> > > > > > > normally would on reboot--but it would optionally remember
> > > > > > > the underlying locks held by the client and allow
> > > > > > > compatible lock reclaims.
> > > > > > >
> > > > > > > Rough attempt:
> > > > > > >
> > > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_re-export_servers
> > > > > > >
> > > > > > > Think it would fly?
> > > > > >
> > > > > > So this would be a variant of courtesy locks that can be
> > > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > > OPEN/LOCK outside the grace period? The purpose being to
> > > > > > allow reclaim without forcing the client to persist the
> > > > > > original stateid?
> > > > > >
> > > > > > Hmm... That's doable, but how about the following
> > > > > > alternative: Add a function that allows the client to
> > > > > > request the full list of stateids that the server holds on
> > > > > > its behalf?
> > > > > >
> > > > > > I've been wanting such a function for quite a while anyway
> > > > > > in order to allow the client to detect state leaks (either
> > > > > > due to soft timeouts, or due to reordered close/open
> > > > > > operations).
> > > > >
> > > > > Oh, that sounds interesting. So basically the re-export server
> > > > > would re-populate its state from the original server rather
> > > > > than relying on its clients doing reclaims? Hmm, but how does
> > > > > the re-export server rebuild its stateids?
> > > > > I guess it could make the clients repopulate them with the
> > > > > same "give me a dump of all my state", using the state details
> > > > > to match up with the old state and replacing stateids. Or did
> > > > > you have something different in mind?
> > > > >
> > > >
> > > > I was thinking that the re-export server could just use that
> > > > list of stateids to figure out which locks can be reclaimed
> > > > atomically, and which ones have been irredeemably lost. The
> > > > assumption is that if you have a lock stateid or a delegation,
> > > > then that means the clients can reclaim all the locks that were
> > > > represented by that stateid.
> > >
> > > I'm confused about how the re-export server uses that list. Are
> > > you assuming it persisted its own list across its own
> > > crash/reboot? I guess that's what I was trying to avoid having to
> > > do.
> > >
> >
> > No. The server just uses the stateids as part of a check for 'do I
> > hold state for this file on this server?'. If the answer is 'yes'
> > and the lock owners are sane, then we should be able to assume the
> > full set of locks that lock owner held on that file are still
> > valid.
> >
> > BTW: if the lock owner is also returned by the server, then since
> > the lock owner is an opaque value, it could, for instance, be used
> > by the client to cache info on the server about which uid/gid owns
> > these locks.
>
> Let me see if I'm understanding your idea right...
>
> Re-export server reboots within the extended lease period it's been
> given by the original server. I'm assuming it uses the same clientid?

Yes. It would have to use the same clientid.

> But would probably open new sessions. It requests the list of
> stateids. Hmm, how to make the owner information useful? nfs-ganesha
> doesn't pass on the actual client's owner but rather just passes the
> address of its record for that client owner.
> Maybe it will have to do something a bit different for this degree
> of re-export support...
>
> Now the re-export server knows which original client lock owners are
> allowed to reclaim state. So it just acquires locks using the
> original stateid as the client reclaims (what happens if the client
> doesn't reclaim a lock? I suppose the re-export server could unlock
> all regions not explicitly locked once reclaim is complete). Since
> the re-export server is acquiring new locks using the original
> stateid, it will just overlay the original lock with the new lock,
> and write locks don't conflict since they are being acquired by the
> same lock owner. Actually the original server could even balk at a
> "reclaim" in this way that wasn't originally held... And the
> original server could "refresh" the locks, and discard any that
> aren't refreshed at the end of reclaim. That part assumes the
> original server is apprised that what is actually happening is a
> reclaim.
>
> The re-export server can destroy any stateids that it doesn't
> receive reclaims for.

Right. That's in essence what I'm suggesting. There are corner cases
to be considered: e.g. "what happens if the re-export server crashes
after unlocking on the server, but before passing the LOCKU reply on
to the client?" However, I think it should be possible to figure out
strategies for those cases.

>
> Hmm, I think if the re-export server is implemented as an HA cluster,
> it should establish a clientid on the original server for each
> virtual IP (assuming that's the unit of HA) that exists. Then when
> virtual IPs are moved, the re-export server just goes through the
> above reclaim process for that clientid.
>

Yes, we could do something like that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
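The reclaim flow agreed on in this thread (reuse the same clientid, fetch the stateid list from the original server, match incoming reclaims against it, destroy whatever is left over) can be sketched as bookkeeping on the re-export server. This is a minimal Python sketch under stated assumptions, not code from nfs-ganesha or the Linux NFS implementation. The "list my stateids" call is the new function Trond proposes, so StateidInfo is only a guess at what it might return. The "client:owner" lock-owner encoding is a hypothetical way of making the opaque lock owner meaningful to the re-export server (Frank notes that nfs-ganesha currently passes a record address instead). ReexportReclaim, handle_reclaim() and end_of_reclaim() are invented names. Corner cases such as a crash between unlocking on the server and forwarding the LOCKU reply are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StateidInfo:
    # Hypothetical entry returned by the proposed "list my stateids" call.
    stateid: str
    filehandle: str
    lock_owner: str  # opaque to the original server; encoded by the re-exporter


class ReexportReclaim:
    """Reclaim bookkeeping on a rebooted re-export server (sketch only)."""

    def __init__(self, upstream_state: list[StateidInfo]) -> None:
        # (filehandle, lock_owner) -> stateid the original server still holds
        # for this clientid, as reported after the re-export server restarts.
        self.held = {(s.filehandle, s.lock_owner): s.stateid
                     for s in upstream_state}
        self.reclaimed: set[str] = set()

    def handle_reclaim(self, filehandle: str, lock_owner: str) -> str | None:
        """Return the original stateid to lock with, or None if state was lost."""
        stateid = self.held.get((filehandle, lock_owner))
        if stateid is not None:
            # Downstream client reclaimed; keep using the original stateid.
            self.reclaimed.add(stateid)
        return stateid

    def end_of_reclaim(self) -> list[str]:
        """Stateids no downstream client reclaimed; candidates for destruction."""
        return sorted(set(self.held.values()) - self.reclaimed)


upstream = [
    StateidInfo("sid-1", "fh-A", "client1:owner7"),
    StateidInfo("sid-2", "fh-B", "client2:owner3"),
]
r = ReexportReclaim(upstream)
print(r.handle_reclaim("fh-A", "client1:owner7"))  # sid-1
print(r.handle_reclaim("fh-C", "client1:owner7"))  # None: irredeemably lost
print(r.end_of_reclaim())                          # ['sid-2']
```

Keying the table on (filehandle, lock owner) is what lets the re-export server answer "do I hold state for this file on this server?" without having persisted anything itself. In the HA arrangement discussed at the end of the thread, there would presumably be one such table per virtual IP's clientid.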