NFS relocate: old TCP/IP connection resulting in DUP/ACK storms and largish timeouts (was: iptables protection wrapper; nfsexport.sh vs ip.sh racing)

Axel Thimm <Axel.Thimm@xxxxxxxxxx> · Mon, 5 Sep 2005 17:36:49 +0200

On Tue, Aug 30, 2005 at 02:03:32PM -0400, Lon Hohberger wrote:
> On Tue, 2005-08-30 at 01:35 +0200, Axel Thimm wrote:
> 
> > > It's really an attempt at a workaround for a configuration problem -- and
> > > nothing more.
> > 
> > The above is with nfs running on all nodes already. The racing seems
> > to be with the exportfs commands and ip setup/teardown.
> > 
> > It is easy to reproduce (>=50%) if the client is connected over Gigabit
> > and is in the middle of a write while the service is moved. We saw this
> > in two different setups. If you throttle the network bandwidth to <=
> > 20MB/sec, you don't trigger the bug, so it really seems to be a race.
> 
> ewww...  Can you bugzilla this so we can track it?  =)

Will do. We are still trying to pin it down properly so that we can
provide a better bug report (and separate the different bugs).

One bug that has crystallized out is that upon relocation the old server
keeps its TCP connections to the NFS client. When this node later becomes
the NFS server again, it sends a FIN/ACK on that old connection to the
client (which has torn the connection down by then), and this creates a
DUP/ACK storm.
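The storm mechanism can be illustrated with a toy model. RFC 793 says that when a segment arrives that is not acceptable (outside the receive window), the receiver should answer with an ACK carrying its own current sequence numbers. Assuming the client still holds (or has re-created) a connection with the same address/port pair but with different sequence numbers than the old server remembers, each side's corrective ACK is itself unacceptable to the other, so they keep answering each other. The Python sketch below is a deliberately simplified model of that rule (zero-size window, only sequence numbers checked); it is not how the Linux stack is implemented, and `Endpoint` and `count_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    snd_nxt: int  # next sequence number this side will send
    rcv_nxt: int  # next sequence number this side expects to receive

    def accepts(self, seq: int) -> bool:
        # Simplified acceptability test: a zero-length segment is accepted
        # only if its sequence number is exactly what we expect next.
        return seq == self.rcv_nxt


def count_segments(stale: Endpoint, peer: Endpoint, limit: int = 20) -> int:
    """Count segments exchanged after `stale` sends a FIN/ACK, capped at `limit`."""
    seq = stale.snd_nxt  # the FIN/ACK carries the stale side's sequence number
    receiver, other = peer, stale
    segments = 1
    while segments < limit and not receiver.accepts(seq):
        # Unacceptable segment: reply with a bare ACK built from our own
        # state, which the other side may in turn find unacceptable.
        seq = receiver.snd_nxt
        receiver, other = other, receiver
        segments += 1
    return segments
```

With the two sides out of sync, e.g. `count_segments(Endpoint("old", 1000, 3000), Endpoint("client", 7000, 5000))`, nothing but the artificial `limit` ends the exchange in this model; with matching sequence numbers the FIN/ACK is simply accepted.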

A workaround is to shut down nfs instead of simply unexporting, as
nfsexport.sh does, so that the pending TCP connections get torn down, too.
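A hypothetical stop hook implementing this workaround might look like the Python sketch below: it unexports each `client:path` pair with `exportfs -u` (what a plain unexport does) and then additionally stops the NFS service so that nfsd's open TCP connections are closed. The function names, the `(client, path)` input format and the Red Hat style `service nfs stop` command are assumptions for illustration; this is not how nfsexport.sh is written.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def stop_nfs_export(
    exports: List[Tuple[str, str]],
    stop_daemon: bool = True,
    runner: Callable[[Sequence[str]], None] = run_command,
) -> List[List[str]]:
    """Unexport each (client, path) pair and optionally stop the NFS service.

    Returns the commands issued, in order.
    """
    issued: List[List[str]] = []
    for client, path in exports:
        # Plain unexport: withdraw client:path from the kernel export table
        cmd = ["exportfs", "-u", f"{client}:{path}"]
        runner(cmd)
        issued.append(cmd)
    if stop_daemon:
        # Workaround: stopping nfsd also closes its open TCP connections,
        # which a plain unexport leaves in place (assumes a SysV init script)
        cmd = ["service", "nfs", "stop"]
        runner(cmd)
        issued.append(cmd)
    return issued
```

The `runner` parameter makes it easy to dry-run the hook (e.g. pass `print`). The obvious cost is that stopping the daemon also drops every other export on that node, which is why this is only a workaround.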

Is there a way to have ip.sh kill all open TCP/IP connections to a
service IP that is about to be released? I think that would be the better
solution, since it would also apply to non-NFS services.
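As a rough illustration of the first half of that (finding which connections would have to go), here is a minimal Python sketch that parses the text of /proc/net/tcp and lists the non-listening sockets whose local address is the service IP. It only identifies the connections and does not tear them down; how ip.sh would actually do that is left open. The function names are hypothetical, and IPv6 (/proc/net/tcp6) is not handled.

```python
import socket
import sys

# TCP state codes as printed (uppercase hex) in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(field: str) -> tuple:
    """Decode an 'AAAAAAAA:PPPP' field from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = field.split(":")
    raw = bytes.fromhex(ip_hex)
    # The kernel prints the 32-bit address as a host-order integer, so on a
    # little-endian machine the bytes must be reversed to get network order.
    if sys.byteorder == "little":
        raw = raw[::-1]
    return socket.inet_ntoa(raw), int(port_hex, 16)


def connections_to_service_ip(proc_text: str, service_ip: str):
    """Yield (local, remote, state) for non-listening sockets bound to service_ip."""
    for line in proc_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local = decode_addr(fields[1])
        remote = decode_addr(fields[2])
        state = TCP_STATES.get(fields[3], fields[3])
        if local[0] == service_ip and state != "LISTEN":
            yield local, remote, state
```

On a live node one would pass it the contents of /proc/net/tcp together with the service IP being abandoned (a placeholder here) and decide what to do with each connection it reports.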

Of course the true bug is the DUP/ACK storm that is triggered by the
old open TCP connection.
-- 
Axel.Thimm at ATrpms.net
--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster