On Wed, 2018-12-19 at 17:21 -0500, J. Bruce Fields wrote: > On Wed, Dec 19, 2018 at 05:05:45PM -0500, Scott Mayhew wrote: > > On Wed, 19 Dec 2018, J. Bruce Fields wrote: > > > > > On Tue, Dec 18, 2018 at 09:29:26AM -0500, Scott Mayhew wrote: > > > > + if (!nfsd4_find_reclaim_client(clp->cl_name, nn)) > > > > + return; > > > > + if (atomic_inc_return(&nn->nr_reclaim_complete) == > > > > + nn->reclaim_str_hashtbl_size) { > > > > + printk(KERN_INFO "NFSD: all clients done reclaiming, ending NFSv4 grace period (net %x)\n", > > > > + clp->net->ns.inum); > > > > + nfsd4_end_grace(nn); > > > > + } > > > > +} > > > > + > > > > +static void dec_reclaim_complete(struct nfs4_client *clp) > > > > +{ > > > > + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); > > > > + > > > > + if (!nn->track_reclaim_completes) > > > > + return; > > > > + if (!test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) > > > > + return; > > > > + if (nfsd4_find_reclaim_client(clp->cl_name, nn)) > > > > + atomic_dec(&nn->nr_reclaim_complete); > > > > +} > > > > + > > > > static void expire_client(struct nfs4_client *clp) > > > > { > > > > unhash_client(clp); > > > > nfsd4_client_record_remove(clp); > > > > + dec_reclaim_complete(clp); > > > > __destroy_client(clp); > > > > } > > > > > > This doesn't look right to me. If a client reclaims and then > > > immediately calls DESTROY_CLIENTID or something--that should still count > > > as a reclaim, and that shouldn't prevent us from ending the grace period > > > early. > > > > > > I think dec_reclaim_complete is unnecessary. > > > > What if a client sends a RECLAIM_COMPLETE, then reboots and sends an > > EXCHANGE_ID, CREATE_SESSION, and RECLAIM_COMPLETE while the server is > > still in grace? The count would be too high then and the server could > > exit grace before all the clients have reclaimed. I actually added > > that at Jeff's suggestion because he was seeing it with nfs-ganesha. > > Oh boy. > > (Thinks.) > > Once it issues a DESTROY_CLIENTID or an EXCHANGE_ID that removes the > previous client instance's state, it's got no locks to reclaim any more. > (It can't have gotten any *new* ones, since we're still in the grace > period.) > > It's effectively a brand new client. Only reclaiming clients should > bump that counter. > > We certainly shouldn't be waiting for it to RECLAIM_COMPLETE to end the > grace period, that client just doesn't matter any more. > > I think? > That wasn't my thinking here. Suppose we have a client that holds some locks. Server reboots and we do EXCHANGE_ID and start reclaiming, and eventually send a RECLAIM_COMPLETE. Now, there is a network partition and we lose contact with the server for more than a lease period. The client record gets tossed out. Client eventually reestablishes the connection before the grace period ends and attempts to reclaim. That reclaim should succeed, IMO, as there is no reason that it shouldn't. Nothing can have claimed competing state since we're still in the grace period. The thing you don't want to do here is to double count the RECLAIM_COMPLETE for this client. So decrementing the counter when you tear down a client is reasonable. -- Jeff Layton <jlayton@xxxxxxxxxx>