On Tue, Aug 30, 2005 at 02:03:32PM -0400, Lon Hohberger wrote:
> On Tue, 2005-08-30 at 01:35 +0200, Axel Thimm wrote:
>
> > > It's really an attempt at a workaround a configuration problem -- and
> > > nothing more.
> >
> > The above is with nfs running on all nodes already. The racing seems
> > to be with the exportfs commands and ip setup/teardown.
> >
> > It is easy to reproduce (>=50%) if the client connects over Gigabit
> > and is in write transaction while the service is moved. We saw this in
> > two different setups. If you throttle the network bandwidth to <=
> > 20MB/sec you don't trigger the bug, so it really seems like a racing
> > problem.
>
> ewww... Can you bugzilla this so we can track it? =)

Will do so. We are currently still trying to figure it out properly, so
that we can provide a better bug report (and separate the different bugs).

One bug that has crystallized out is that upon relocation the old server
keeps its TCP connections to the NFS client. When this server later
becomes the NFS server again, it FIN/ACKs that old connection to the
client (which has torn the connection down by now), and this creates a
DUP/ACK storm.

A workaround is to shut down nfs instead of simply unexporting like
nfsexport.sh does, so that the pending TCP connections get fried, too.

Is there a way to have ip.sh fry all open TCP/IP connections to a
service IP that is to be abandoned? I guess that would be the better
solution (it would also apply to non-NFS services).

Of course the true bug is the DUP/ACK storm that is triggered by the
old open TCP connection.
-- 
Axel.Thimm at ATrpms.net
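
As an aid for narrowing this down, here is a minimal sketch of how one might list the established TCP connections that are still bound to a service IP on the old server after relocation. It is not part of ip.sh or nfsexport.sh; the function names are hypothetical, and it assumes the Linux /proc/net/tcp text format (IPv4 only) on a little-endian host such as x86. It only reports connections, it does not tear them down.

```python
import socket
import struct

ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def _decode(hex_endpoint):
    """Turn 'AABBCCDD:PPPP' from /proc/net/tcp into (dotted_ip, port).

    Assumption: little-endian host, so the address bytes appear reversed.
    """
    addr_hex, port_hex = hex_endpoint.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def connections_on_ip(proc_net_tcp_text, service_ip):
    """Return (local_port, remote_ip, remote_port) for every ESTABLISHED
    connection whose local address is service_ip."""
    found = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != ESTABLISHED:
            continue
        local_addr, local_port = _decode(local)
        if local_addr == service_ip:
            remote_addr, remote_port = _decode(remote)
            found.append((local_port, remote_addr, remote_port))
    return found
```

Feeding it the contents of /proc/net/tcp together with the abandoned service IP should show whether a stale connection to the NFS client (local port 2049) survives the unexport.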
-- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster