Re: rapid clustered nfs server failover and hung clients -- how best to close the sockets?

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Mon, 9 Jun 2008 13:14:41 -0400

On Mon, Jun 09, 2008 at 12:09:48PM -0400, Talpey, Thomas wrote:
> At 12:01 PM 6/9/2008, Jeff Layton wrote:
> >On Mon, 09 Jun 2008 11:51:51 -0400
> >"Talpey, Thomas" <Thomas.Talpey@xxxxxxxxxx> wrote:
> >
> >> At 11:18 AM 6/9/2008, Jeff Layton wrote:
> >> >No, it's not specific to NFS. It can happen to any "service" that
> >> >floats IP addresses between machines, but does not close the sockets
> >> >that are connected to those addresses. Most services that fail over
> >> >(at least in RH's cluster server) shut down the daemons on failover
> >> >too, which tends to mitigate this problem elsewhere.
> >> 
> >> Why exactly don't you choose to restart the nfsd's (and lockd's) on the
> >> victim server?
> >
> >The victim server might have other nfsd/lockd's running on it. Stopping
> >all the nfsd's could bring down lockd, and then you have to deal with lock
> >recovery on the stuff that isn't moving to the other server.
> 
> But but but... the IP address is the only identification the client can use
> to isolate a server.

Right.

> You're telling me that some locks will migrate and some won't?  Good
> luck with that! The clients are going to be mightily confused.

Locks migrate or not depending on the server IP address.  Where do you
see the confusion?
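
To make the per-address idea concrete, here is a rough sketch of it (a toy
model only, not how lockd or nfsd actually track state). The names
LockTable, grant, release_ip and fail_over are hypothetical, and the real
reclaim step, where clients re-request their locks during the grace period,
is collapsed into a simple move:

```python
from collections import defaultdict


class LockTable:
    """Toy model: locks grouped by the server IP the client mounted from."""

    def __init__(self):
        # server IP -> set of (client, path) pairs holding a lock via that IP
        self.by_ip = defaultdict(set)

    def grant(self, server_ip, client, path):
        self.by_ip[server_ip].add((client, path))

    def release_ip(self, server_ip):
        """Drop and return every lock held via one address; others stay put."""
        return self.by_ip.pop(server_ip, set())


def fail_over(src, dst, floating_ip):
    """Move floating_ip from src to dst.

    Assumption: clients successfully reclaim every lock on the new server,
    so the sketch simply re-grants them there.
    """
    for client, path in src.release_ip(floating_ip):
        dst.grant(floating_ip, client, path)
```

A client only ever sees the address it mounted, so from its side a lock
either follows that address to the new server or stays where it was.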

--b.