On Mon, Jun 09, 2008 at 01:14:56PM -0400, Wendy Cheng wrote:
> Jeff Layton wrote:
> >The problem we've run into is that occasionally they fail over to the
> >alternate machine and then back very rapidly.
>
> It is a well known issue in the NFS-TCP failover arena (or more
> specifically, for floating IP applications) that failover from server A
> to server B, then immediately failing back from server B to A would
> *not* work well. IIRC last round of discussing with Red Hat GPS and
> support folks, we concluded that most of the applications/users *can*
> tolerate this restriction.

I think the big problem here is that this restriction has a window that
can be particularly long-lived. If an application doesn't close its
sockets, the time between a failover event and the time when it is safe
to fail back is bounded by the lifetime of the socket on the 'failed'
server. Given the right configuration, this could be indefinite. Worse,
you could fail at just the wrong time, after the sequence number has
wrapped completely, and pick up where you left off, not knowing you
lost 4GB of data in the process.

>
> Maybe another more basic question: "other than QA efforts, are there
> real NFSv2/v3 applications depending on this "feature" ? Or there may
> need tons of efforts for something that will not have much usages when
> it is finally delivered ?
>
> -- Wendy

--
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@xxxxxxxxxx
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/
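
To make the sequence-number wrap scenario from the reply concrete, here is a minimal sketch in Python. It is a simplified model and not kernel or NFS code: the function names, the starting sequence value, and the exact-match acceptance check are all assumptions made for illustration (a real TCP stack accepts any segment inside its receive window, and options such as timestamps can change the outcome). It only shows why a stale socket on the 'failed' server cannot tell that 4GB went by while it was out of the path.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide, so they wrap every 4 GiB


def advance(seq: int, nbytes: int) -> int:
    """Return the sequence number after sending nbytes, modulo 2**32."""
    return (seq + nbytes) % SEQ_SPACE


def looks_in_sync(stale_expected: int, client_seq: int) -> bool:
    """Hypothetical, simplified acceptance check: the stale socket accepts
    data only if it starts exactly at the sequence number it expects next."""
    return stale_expected == client_seq


# Assumed scenario: server A still holds the old socket and expects this value.
seq_at_failover = 123_456_789  # arbitrary illustrative value

# While server B owned the floating IP, the client sent exactly 4 GiB.
client_seq_at_failback = advance(seq_at_failover, SEQ_SPACE)

# After failing back, server A sees the sequence number it was waiting for
# and resumes, with no indication that 4 GiB of data never reached it.
stale_socket_accepts = looks_in_sync(seq_at_failover, client_seq_at_failback)

# Any other amount of traffic (here, one byte less) would not line up.
near_miss_accepts = looks_in_sync(
    seq_at_failover, advance(seq_at_failover, SEQ_SPACE - 1)
)
```

In this model `stale_socket_accepts` is true and `near_miss_accepts` is false, which is the "just the wrong time" case: the failure mode depends on the old socket still existing on the failed server, which is why its lifetime bounds the unsafe window.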