On Wed, 25 Jan 2012 16:25:53 -0500
"J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> > I suggest that we only allow the reclaim of locks on the original
> > address against which they were established.
>
> I'm not sure what that means.
>
> If a server stops responding, the v4.0 client has two choices: it can
> either wait for the server to come back, and reclaim when it does. Or
> if it supports failover it can go find another server and perform
> reclaims over there.
>
> I'm a little unclear how it does that, but I suppose it first tests
> somehow to see whether its existing state is supported, and if not, it
> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> its old name, and then attempts to reclaim.
>
> You're now requiring it *not* to do that if it happens that the servers
> all rebooted in the meantime. How does it know that that's what
> happened?
>
> Or maybe that's not what you want to require, I'm not sure.

Sorry I didn't respond sooner. I spent some time yesterday poring over
Dave's whitepaper and the RFCs to see if I could figure out a better way
to do this. Short answer: I don't think we can...

By the above, I meant that we can't reasonably allow a client to acquire
a lock on address A and then reclaim that lock on address B after a
reboot. But now I'm not even certain that's sufficient to prevent all
possible problems after a cold start of the cluster. In particular, I'm
concerned about this one (from my earlier email):

> Don't worry, it gets worse...suppose we end up with clients mounting
> subdirectories of the same export from different hosts (say,
> node1:/exp2/dir1 and node2:/exp2/dir2 -- it's pathological, but
> there's no reason you couldn't do that). Now, it's not even sufficient
> to track this info on a per-node + per-fs basis...

We have no way to prevent someone from doing the above, or even to
reliably detect whether it has been done. The only way I can see to
handle that situation would be to track each individual lock on stable
storage, along with enough information to know which client owns it at
a particular time (something like the record sketched at the end of
this mail). That's something I really don't want to implement at this
point.

I'm going to continue researching this and seeing if I can come up with
a way to handle the clustered configuration sanely. In the interim, I
plan to fix the patchsets that I have so far to at least work properly
in the single-node case. I'll also try to "future-proof" the upcall
format such that a clustered configuration hopefully won't require much
in the way of changes.

--
Jeff Layton <jlayton@xxxxxxxxxx>
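
For illustration, here's roughly what I mean by per-lock tracking on
stable storage. This is only a sketch -- the struct and field names are
made up and this is not a proposal for an actual on-disk format -- but
it shows the minimum you'd need to tie each lock back to the client
that held it and the address it was acquired against:

	#include <stdint.h>

	/*
	 * Hypothetical per-lock stable-storage record. One of these
	 * would have to be written (and synced) on every lock grant
	 * and removed on every release, cluster-wide.
	 */
	#define CLIENT_NAME_MAX	1024	/* cf. NFS4_OPAQUE_LIMIT */
	#define FH_MAX		128	/* cf. NFS4_FHSIZE */

	struct lock_reclaim_rec {
		uint32_t	client_name_len;
		uint8_t		client_name[CLIENT_NAME_MAX];
						/* nfs_client_id4 from SETCLIENTID */
		uint8_t		server_addr[16];
						/* address the lock was acquired
						 * against (v4-mapped for IPv4) */
		uint32_t	fh_len;
		uint8_t		fh[FH_MAX];	/* filehandle of the locked file */
		uint64_t	offset;		/* byte range of the lock */
		uint64_t	length;
		uint32_t	type;		/* READ_LT or WRITE_LT */
		int64_t		acquired;	/* time the lock was granted */
	};

Multiply that by every lock held across the cluster, with a synchronous
write on each grant and release, and it should be clear why I'd rather
not go down that road at this point.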