On Jan 25, 2012, at 3:53 PM, J. Bruce Fields wrote:

> On Wed, Jan 25, 2012 at 03:29:34PM -0500, Chuck Lever wrote:
>>
>> On Jan 25, 2012, at 1:55 PM, J. Bruce Fields wrote:
>>
>>> On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
>>>> If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
>>>
>>> Hm, but does it have to do the recovery with that server?
>>
>> If a client has a lease and open state on that server, it should do recovery if the server reboots.
>
> Yes, but does it have to do it against *that* server, or could it
> recover against another?
>
> Again, as long as failover is allowed, I think the latter is too.

Your questions assume a number of implementation details that are not in evidence.  I think we should have a f2f or phone meeting to walk through this.

>
>>> And if so, then how does that fit with failover?
>>
>> We were supposed to discuss that with Bill and Piyush.  Maybe we can bring it up again at Connectathon.  But my assumption is that fail over is supposed to look like a server reboot.
>
> That's what I assume too: but that means, if I'm a client, and I fail
> over from server A to server B, and server B gives me a STALE error: I
> don't know if that's just because I failed over, or if in fact A and/or
> B did just reboot.
>
> And from the point of view of the servers: they don't know if the state
> I'm trying to reclaim is state I previously held from server A, or if
> it's some other state that I previously held on server C (but then lost,
> unbeknownst to me, due to a network partition that lost my RENEWs to C).
>
> So I guess the servers would be stuck trying to track all that state
> across reboots?
>
>> The question is what clients does the server allow to recover, and which does it force to start fresh?  Shouldn't it be enough for a server to remember nfs_client_id4 strings?
>>
>>> I mean, suppose the whole cluster is rebooted.  From the client's point
>>> of view, its server becomes unresponsive.  So it should probably start
>>> pinging the replicas to see if another one's up.  The first server it
>>> gets a response from won't necessarily be the one it was using before.
>>> What happens next?
>>
>> Again, it depends on whether your clustering implementation shares state among all servers in the cluster.
>
> Assume for now it doesn't.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
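
To make the quoted nfs_client_id4 suggestion concrete, here is a minimal sketch of a single server (no state shared across the cluster) that records client ID strings in stable storage and, after a restart, uses that record to decide which clients may recover and which must start fresh. Everything in it is an assumption for illustration: the ClientRecordStore class, the JSON file, and the "allow_reclaim" / "start_fresh" labels are hypothetical and are not taken from any real NFS server or from the protocol's error codes. The sketch also never expires records, which a real server would need to do.

```python
import json
import os
import tempfile


class ClientRecordStore:
    """Hypothetical stable-storage record of nfs_client_id4 strings."""

    def __init__(self, path):
        self.path = path
        # IDs found on disk at startup, i.e. recorded before the last restart.
        self.before_reboot = self._load()
        # IDs recorded since this server instance started.
        self.current = set()

    def _load(self):
        if not os.path.exists(self.path):
            return set()
        with open(self.path) as f:
            return set(json.load(f))

    def record_client(self, client_id):
        """Remember a client's ID string and persist the full set."""
        self.current.add(client_id)
        with open(self.path, "w") as f:
            json.dump(sorted(self.before_reboot | self.current), f)

    def decide(self, client_id):
        """Only clients known before the restart are allowed to reclaim."""
        if client_id in self.before_reboot:
            return "allow_reclaim"
        return "start_fresh"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "clients.json")

    server = ClientRecordStore(path)      # first boot, nothing on disk
    server.record_client("client-A")

    rebooted = ClientRecordStore(path)    # simulated restart of the same server
    print(rebooted.decide("client-A"))    # known before the restart
    print(rebooted.decide("client-C"))    # never recorded on this server
```

Under these assumptions, a client that previously held state only on some other server is told to start fresh, because this server has no record of its ID string. The sketch does not address whether failover between servers should be allowed to recover such state.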