Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

Bruce Fields <bfields@xxxxxxxxxxxx> · Wed, 12 Jan 2022 13:53:27 -0500

On Tue, Jan 11, 2022 at 03:49:19PM +0000, Chuck Lever III wrote:
> > On Jan 10, 2022, at 8:03 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> > I think this is something you and Bruce have been discussing:
> > when we should remove the client record from, or add it to,
> > the database when the client transitions from active to COURTESY
> > and vice versa. With this patch we now expire the courtesy clients
> > asynchronously in the background so the overhead/delay from
> > removing the record from the database does not have any impact
> > on resolving conflicts.
> 
> As I recall, our idea was to record the client as expired when
> it transitions from active to COURTEOUS so that if the server
> happens to reboot, it doesn't allow a courteous client to
> reclaim locks the server may have already given to another
> active client.
> 
> So I think the server needs to do an nfsdcltrack upcall when
> transitioning from active -> COURTEOUS to prevent that edge
> case. That would happen only in the laundromat, right?
> 
> So when a COURTEOUS client comes back to the server, the server
> will need to persistently record the transition from COURTEOUS
> to active.

Yep.  The bad case would be:

	- client A is marked DESTROY_COURTESY, client B is given A's
	  lock.
	- server goes down before laundromat thread removes the
	  DESTROY_COURTESY client.
	- client A's network comes back up.
	- server comes back up and starts grace period.

At this point, both A and B believe they have the lock.  Also both still
have nfsdcltrack records, so the server can't tell which is in the
right.

We can't start granting A's locks to B until we've recorded in stable
storage that A has expired.

What we'd like to do:

	- When a client transitions from active to courteous, the server
	  needs to do an nfsdcltrack upcall to expire it.
	- We mark the client as COURTESY only after that upcall has
	  returned.
	- When the client comes back, we do an nfsdcltrack upcall to
	  mark it as active again.  We don't remove the COURTESY mark
	  until that's returned.
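
A rough sketch of the ordering above, in user-space C rather than nfsd code.
All names here (struct client, upcall_expire, upcall_create,
simulate_upcall_failure, and so on) are hypothetical stand-ins, and the
stable_record flag merely models the nfsdcltrack record. The only point is
that the in-memory state changes after the upcall has returned, never before:

```c
#include <stdbool.h>

/* Hypothetical in-memory states; not the actual nfsd client flags. */
enum client_state { CLIENT_ACTIVE, CLIENT_COURTESY };

struct client {
	enum client_state state;
	bool stable_record;	/* models the nfsdcltrack record on stable storage */
};

/* Test knob (assumption): lets a caller simulate an upcall that fails. */
static bool simulate_upcall_failure;

/* Stand-in for the upcall that records the client as expired. */
static int upcall_expire(struct client *c)
{
	if (simulate_upcall_failure)
		return -1;
	c->stable_record = false;
	return 0;
}

/* Stand-in for the upcall that records the client as active again. */
static int upcall_create(struct client *c)
{
	if (simulate_upcall_failure)
		return -1;
	c->stable_record = true;
	return 0;
}

/* active -> courtesy: set the mark only after the expiry is recorded. */
static int client_to_courtesy(struct client *c)
{
	if (upcall_expire(c) != 0)
		return -1;	/* still ACTIVE, so its locks stay protected */
	c->state = CLIENT_COURTESY;
	return 0;
}

/* courtesy -> active: clear the mark only after the record is restored. */
static int client_to_active(struct client *c)
{
	if (upcall_create(c) != 0)
		return -1;	/* still COURTESY */
	c->state = CLIENT_ACTIVE;
	return 0;
}

/* A conflicting lock may be handed out only for a COURTESY client,
 * which by construction has no stable-storage record any more. */
static bool may_grant_conflicting_lock(const struct client *c)
{
	return c->state == CLIENT_COURTESY;
}

/* After a reboot only clients with a stable-storage record may reclaim. */
static bool may_reclaim_after_reboot(const struct client *c)
{
	return c->stable_record;
}
```

With this ordering there is no window in which A's lock can be given to B
while A still has a record that would let it reclaim after a reboot. If the
upcall fails, the client simply stays in its previous state.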

--b.