On Wed, 25 Jan 2012 16:25:53 -0500
"J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> > I suggest that we only allow the reclaim of locks on the original
> > address against which they were established.
>
> I'm not sure what that means.
>
> If a server stops responding, the v4.0 client has two choices: it can
> either wait for the server to come back, and reclaim when it does. Or
> if it supports failover it can go find another server and perform
> reclaims over there.
>
> I'm a little unclear how it does that, but I suppose it first tests
> somehow to see whether its existing state is supported, and if not, it
> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> its old name, and then attempts to reclaim.
>
> You're now requiring it *not* to do that if it happens that the servers
> all rebooted in the meantime. How does it know that that's what
> happened?
>
> Or maybe that's not what you want to require, I'm not sure.

Sorry I didn't respond sooner. I spent some time yesterday poring over
Dave's whitepaper and the RFCs to see if I could figure out a better way
to do this. Short answer: I don't think we can...

By the above, I meant that we can't reasonably allow a client to acquire
a lock on address A and then reclaim that lock on address B after a
reboot. But now I'm not even certain that's sufficient to prevent all
possible problems after a cold start of the cluster. In particular, I'm
concerned about this one (from my earlier email):

> Don't worry, it gets worse...suppose we end up with clients mounting
> subdirectories of the same export from different hosts (say,
> node1:/exp2/dir1 and node2:/exp2/dir2 -- it's pathological, but
> there's no reason you couldn't do that). Now, it's not even sufficient
> to track this info on a per-node + per-fs basis...

We have no way to prevent someone from doing the above, or even to
reliably detect whether it has been done. The only way I can see to
handle that situation would be to track each individual lock on stable
storage, along with enough information to know which client owns it at
a particular time (something like the record sketched at the end of
this mail). That's something I really don't want to implement at this
point.

I'm going to continue researching this and seeing if I can come up with
a way to handle the clustered configuration sanely. In the interim, I
plan to fix the patchsets that I have so far to at least work properly
in the single-node case. I'll also try to "future-proof" the upcall
format such that a clustered configuration hopefully won't require much
in the way of changes.

--
Jeff Layton <jlayton@xxxxxxxxxx>
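
For illustration, here's roughly what I mean by per-lock tracking on
stable storage. This is only a sketch -- the struct and field names are
made up and this is not a proposal for an actual on-disk format -- but
it shows the minimum you'd need to tie each lock back to the client
that held it and the address it was acquired against:

	#include <stdint.h>

	/*
	 * Hypothetical per-lock stable-storage record. One of these
	 * would have to be written (and synced) on every lock grant
	 * and removed on every release, cluster-wide.
	 */
	#define CLIENT_NAME_MAX	1024	/* cf. NFS4_OPAQUE_LIMIT */
	#define FH_MAX		128	/* cf. NFS4_FHSIZE */

	struct lock_reclaim_rec {
		uint32_t	client_name_len;
		uint8_t		client_name[CLIENT_NAME_MAX];
						/* nfs_client_id4 from SETCLIENTID */
		uint8_t		server_addr[16];
						/* address the lock was acquired
						 * against (v4-mapped for IPv4) */
		uint32_t	fh_len;
		uint8_t		fh[FH_MAX];	/* filehandle of the locked file */
		uint64_t	offset;		/* byte range of the lock */
		uint64_t	length;
		uint32_t	type;		/* READ_LT or WRITE_LT */
		int64_t		acquired;	/* time the lock was granted */
	};

Multiply that by every lock held across the cluster, with a synchronous
write on each grant and release, and it should be clear why I'd rather
not go down that road at this point.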