On Fri, Dec 04, 2020 at 01:02:20AM +0000, Trond Myklebust wrote:
> On Thu, 2020-12-03 at 18:16 -0500, bfields@xxxxxxxxxxxx wrote:
> > On Thu, Dec 03, 2020 at 10:53:26PM +0000, Trond Myklebust wrote:
> > > On Thu, 2020-12-03 at 17:45 -0500, bfields@xxxxxxxxxxxx wrote:
> > > > On Thu, Dec 03, 2020 at 09:34:26PM +0000, Trond Myklebust wrote:
> > > > > I've been wanting such a function for quite a while anyway in
> > > > > order to allow the client to detect state leaks (either due to
> > > > > soft timeouts, or due to reordered close/open operations).
> > > >
> > > > One sure way to fix any state leaks is to reboot the server. The
> > > > server throws everything away, the clients reclaim, all that's
> > > > left is stuff they still actually care about.
> > > >
> > > > It's very disruptive.
> > > >
> > > > But you could do a limited version of that: the server throws
> > > > away the state from one client (keeping the underlying locks on
> > > > the exported filesystem), lets the client go through its normal
> > > > reclaim process, at the end of that throws away anything that
> > > > wasn't reclaimed. The only delay is to anyone trying to acquire
> > > > new locks that conflict with that set of locks, and only for as
> > > > long as it takes for the one client to reclaim.
> > >
> > > One could do that, but that requires the existence of a quiescent
> > > period where the client holds no state at all on the server.
> >
> > No, as I said, the client performs reboot recovery for any state that
> > it holds when we do this.
> >
>
> Hmm... So how do the client and server coordinate what can and cannot
> be reclaimed? The issue is that races can work both ways, with the
> client sometimes believing that it holds a layout or a delegation that
> the server thinks it has returned. If the server allows a reclaim of
> such a delegation, then that could be problematic (because it breaks
> lock atomicity on the client and because it may cause conflicts).

The server's not actually forgetting anything, it's just pretending to,
in order to trigger the client's reboot recovery. It can turn down the
client's attempt to reclaim something it doesn't have.

Though isn't it already game over by the time the client thinks it holds
some lock/open/delegation that the server doesn't?

I guess I'd need to see these cases written out in detail to understand.

--b.

> By the way, the other thing that I'd like to add to my wishlist is a
> callback that allows the server to ask the client if it still holds a
> given open or lock stateid. A server can recall a delegation or a
> layout, so it can fix up leaks of those, however it has no remedy if
> the client loses an open or lock stateid other than to possibly
> forcibly revoke state. That could cause application crashes if the
> server makes a mistake and revokes a lock that is actually in use.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
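
For readers following the thread, the per-client "pretend reboot" idea discussed above can be sketched as a toy model. This is only an illustration under stated assumptions, not nfsd code: the class `ToyStateServer`, its method names, and the string state ids are all hypothetical. It shows the two properties the thread mentions: the server keeps its own record of the state while recovery runs, so it can turn down a reclaim of something it does not have, and anything the client does not reclaim is dropped when recovery ends.

```python
class ToyStateServer:
    """Toy model of a server that fakes a reboot for a single client.

    All names here are hypothetical; this is not how nfsd is structured.
    """

    def __init__(self):
        # client -> set of state ids the server believes the client holds
        self.held = {}
        # client -> set of state ids reclaimed so far in the current recovery
        self.recovering = {}

    def grant(self, client, state):
        """Record that the server handed an open/lock/delegation to a client."""
        self.held.setdefault(client, set()).add(state)

    def begin_pretend_reboot(self, client):
        """Start recovery for one client. Nothing is forgotten: `held` is kept."""
        self.recovering[client] = set()

    def reclaim(self, client, state):
        """Accept a reclaim only for state the server still has on record."""
        if client not in self.recovering:
            raise RuntimeError("client is not in recovery")
        if state not in self.held.get(client, set()):
            return False  # the server never had it, or thinks it was returned
        self.recovering[client].add(state)
        return True

    def end_recovery(self, client):
        """Finish recovery: keep what was reclaimed, return what was dropped."""
        reclaimed = self.recovering.pop(client)
        leaked = self.held.get(client, set()) - reclaimed
        self.held[client] = reclaimed
        return leaked
```

In this model a leaked lock is simply one the client never asks to reclaim, so `end_recovery` discards it. The model does not cover the case raised in the thread where the client believes it holds state the server does not; here that reclaim is just refused.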