Hi-

On Jan 25, 2012, at 12:14 PM, J. Bruce Fields wrote:

> On Wed, Jan 25, 2012 at 11:47:41AM -0500, Chuck Lever wrote:
>> I'm having a hard time following the discussion, I must be lacking some context.  But the problem is how NFSv4.0 clients detect server identity.  The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about.  If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
>>
>> Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
>
> The part I'm having trouble thinking about:
>
> Suppose your cluster nodes all advertise each other as replicas using
> v4 (fs_locations or fs_locations_info).
>
> Suppose your clients support (v4-based) failover, either transparent or
> not.
>
> Now suppose your cluster reboots.
>
> Must the client necessarily reclaim its locks against the same server
> that it last acquired them from?  And if not, how do we decide whether a
> given reclaim is allowed or not?

This is just my opinion, but...

One might define an NFSv4 server as an entity that passes out and recognizes clientid4's.

Suppose a client is talking to NFSv4 servers at IP addresses A and B.  If the client gets a clientid4 from IP address A and performs a failing SETCLIENTID_CONFIRM with that clientid4 on IP address B, then one can say that A and B are IP addresses for distinct servers.

If the SETCLIENTID_CONFIRM succeeds, however, then the client must try other tests to confirm that A and B represent the same server.  For example, can the client use the same state tokens when sending NFSv4 operations to A and B?

> I think there may be scenarios involving server reboots and the client
> losing contact with some but not all servers, where we could get
> confused about which (if any) state the client is allowed to reclaim.

If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.  Conversely, if all the servers in your cluster recognize a particular clientid4, then there must be some kind of state sharing relationship amongst them.  Reclaiming against one should be sufficient.

Not all clustering works that way, though.  fs_locations can pass out information for entirely separate server instances.  fs_locations does not in any way indicate a relationship of state or data consistency between the servers in the lists it returns.  fs_locations_info, by contrast, returns much richer information that allows clients some visibility of state and data consistency amongst the listed servers.  But the mere presence of a server in one of these lists is not enough for a wayward client to draw any conclusions about the need to reclaim state after a server reboot.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
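
P.S.  As a rough illustration of the identity test discussed in this message, here is a small Python sketch.  Everything in it is hypothetical: ToyServer, group_addresses, and the two callbacks are toy stand-ins for SETCLIENTID and SETCLIENTID_CONFIRM, not a real NFS client API, and the real SETCLIENTID_CONFIRM also takes a verifier that the toy leaves out.  The sketch only groups addresses that recognize the same clientid4; as noted earlier, a successful confirm makes two addresses candidates for being the same server, and further tests are still needed.

```python
from typing import Callable, Iterable, List, Tuple


class ToyServer:
    """Hypothetical stand-in for one NFSv4.0 server instance."""

    def __init__(self, first_id: int) -> None:
        self._next_id = first_id
        self._known = set()

    def set_clientid(self) -> int:
        # Toy SETCLIENTID: hand out a fresh clientid4 and remember it.
        cid = self._next_id
        self._next_id += 1
        self._known.add(cid)
        return cid

    def confirm(self, clientid: int) -> bool:
        # Toy SETCLIENTID_CONFIRM: succeeds only for a clientid4 that
        # this instance handed out (the real op also takes a verifier).
        return clientid in self._known


def group_addresses(
    addresses: Iterable[str],
    set_clientid: Callable[[str], int],
    confirm: Callable[[str, int], bool],
) -> List[List[str]]:
    """Partition addresses into groups that recognize the same clientid4.

    A group is only a *candidate* same-server set; a real client would
    still need additional checks before sharing state across its members.
    """
    groups: List[Tuple[int, List[str]]] = []
    for addr in addresses:
        for clientid, members in groups:
            if confirm(addr, clientid):
                # This address recognizes a clientid4 we already hold.
                members.append(addr)
                break
        else:
            # No known clientid4 was recognized: treat the address as a
            # separate server instance and obtain a new clientid4 from it.
            groups.append((set_clientid(addr), [addr]))
    return [members for _, members in groups]


# Toy topology (an assumption for this example): addresses A and B front
# one server instance, address C fronts another.
shared = ToyServer(first_id=100)
other = ToyServer(first_id=200)
servers = {"A": shared, "B": shared, "C": other}

groups = group_addresses(
    ["A", "B", "C"],
    set_clientid=lambda addr: servers[addr].set_clientid(),
    confirm=lambda addr, cid: servers[addr].confirm(cid),
)
print(groups)  # [['A', 'B'], ['C']]
```

In this model each group would be recovered separately after a reboot, and a reclaim against any one member of a group is assumed to cover the others.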