Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 25 Jan 2012 12:41:27 -0500

Hi-

On Jan 25, 2012, at 12:14 PM, J. Bruce Fields wrote:

> On Wed, Jan 25, 2012 at 11:47:41AM -0500, Chuck Lever wrote:
>> I'm having a hard time following the discussion, I must be lacking some context.  But the problem is how NFSv4.0 clients detect server identity.  The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about.  If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
>> 
>> Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
> 
> The part I'm having trouble thinking about:
> 
> Suppose your cluster nodes all advertise each other as replicas using
> v4 (fs_locations or fs_locations_info).
> 
> Suppose your clients support (v4-based) failover, either transparent or
> not.
> 
> Now suppose your cluster reboots.
> 
> Must the client necessarily reclaim its locks against the same server
> that it last acquired them from?  And if not, how do we decide whether a
> given reclaim is allowed or not?

This is just my opinion, but...

One might define an NFSv4 server as an entity that passes out and recognizes clientid4's.

Suppose a client is talking to NFSv4 servers at IP addresses A and B.  If the client gets a clientid4 from IP address A and performs a failing SETCLIENTID_CONFIRM with that clientid4 on IP address B, then one can say that A and B are IP addresses for distinct servers.

If the SETCLIENTID_CONFIRM succeeds, however, then the client must try other tests to confirm that A and B represent the same server.  For example, can the client use the same state tokens when sending NFSv4 operations to A and B?
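
As a rough illustration of the test described above, here is a minimal Python sketch.  It is not kernel or client code: FakeServer, Verdict, and compare_addresses are hypothetical names, and the model reduces SETCLIENTID_CONFIRM to a bare success/failure check on the clientid4 (the real operation also carries a confirm verifier).

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    DISTINCT = "distinct servers"
    POSSIBLY_SAME = "possibly the same server; further tests needed"


@dataclass
class FakeServer:
    """Hypothetical stand-in for whatever answers at one IP address."""
    # clientid4 values this server instance recognizes (assumed model).
    known_clientids: set = field(default_factory=set)

    def setclientid_confirm(self, clientid4: int) -> bool:
        # Simplification: only models whether the clientid4 is recognized.
        return clientid4 in self.known_clientids


def compare_addresses(clientid4: int, probe_target: FakeServer) -> Verdict:
    """Probe a second address with a clientid4 obtained from a first one."""
    if not probe_target.setclientid_confirm(clientid4):
        # A failing confirm is conclusive: the addresses are distinct servers.
        return Verdict.DISTINCT
    # Success is only a hint; the client still has to run other tests,
    # such as checking whether the same state tokens work on both addresses.
    return Verdict.POSSIBLY_SAME


# Usage: A and B share clientid state, C is a separate instance.
shared_state = {42}
server_b = FakeServer(shared_state)
server_c = FakeServer()
print(compare_addresses(42, server_b).value)
print(compare_addresses(42, server_c).value)
```

Note the asymmetry: only the failing case yields a definite answer.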

> I think there may be scenarios involving server reboots and the client
> losing contact with some but not all servers, where we could get
> confused about which (if any) state the client is allowed to reclaim.

If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.

Conversely, if all the servers in your cluster recognize a particular clientid4, then there must be some kind of state sharing relationship amongst them.  Reclaiming against one should be sufficient.  Not all clustering works that way, though.
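
To make the recovery consequence concrete, the following Python sketch groups server addresses by the clientid4 each one recognizes, so that each group is reclaimed once.  The function name and the address-to-clientid4 mapping are assumptions for illustration only, and the sketch takes for granted that a shared clientid4 really does imply shared state, which, as noted, is not true of all clustering.

```python
from collections import defaultdict


def reclaim_groups(clientid_by_address):
    """Group addresses that recognize the same clientid4.

    clientid_by_address is a hypothetical mapping of server address to the
    clientid4 that address recognizes for this client.
    """
    groups = defaultdict(list)
    for address, clientid4 in clientid_by_address.items():
        groups[clientid4].append(address)
    # One reclaim per group; a clientid4 seen at only one address forms a
    # group of one and must be recovered separately.
    return list(groups.values())


# Usage: A and B recognize the same clientid4, C handed out its own.
for group in reclaim_groups({"A": 7, "B": 7, "C": 9}):
    print("reclaim once against any of:", group)
```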

fs_locations can pass out information for entirely separate server instances.  fs_locations does not in any way indicate a relationship of state or data consistency between the servers in the lists it returns.  fs_locations_info, by contrast, returns much richer information that gives clients some visibility into state and data consistency amongst the listed servers.  But the mere presence of a server in one of these lists is not enough for a wayward client to draw any conclusions about the need to reclaim state after a server reboot.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
