On Mon, Aug 29, 2005 at 02:41:19PM -0400, Lon Hohberger wrote:
> On Tue, 2005-08-23 at 00:52 +0200, Axel Thimm wrote:
> > The typical NFS cluster setups seem to fail for Gigabit NFS/tcp. Some
> > clients that are busy during the relocation of services either bail
> > out with RPC garbage, or set the filesystem to EACCES, or time out for
> > 17 min.
> >
> > This has to do with some racing/timing in the NFS vs ip setup/teardown
> > procedure. Protecting the service startup/shutdown with an iptables
> > rule is a good workaround to fix this.
> >
> > But what is the proper way to integrate this workaround? I could set up
> > new resource agents, one with start=1 and another with start=6 to
> > start/stop dropping packets. Or I could modify the current resource
> > agents to allow for child entities and wrap one script around the
> > service and one in the inner element.
> >
> > I could probably also hack ip.sh to introduce some delay, to make sure
> > the NFS services are really up/down before proceeding. Or maybe fix
> > the true evil by making nfsexport.sh wait for NFS startup/stop
> > completion (how?)?
>
> Traditionally, we start the NFS daemons as a service to people who
> forget to start them before starting rgmanager.
>
> I.e. Red Hat / Fedora Core users are supposed to do this prior to
> configuring NFS services in rgmanager:
>
>     chkconfig --level 345 nfslock on
>     chkconfig --level 345 nfs on
>
> It's really an attempt to work around a configuration problem -- and
> nothing more.

The above is with nfs already running on all nodes. The race seems
to be between the exportfs commands and the ip setup/teardown.

It is easy to reproduce (>=50%) if the client connects over Gigabit
and is in the middle of a write while the service is moved. We saw
this in two different setups. If you throttle the network bandwidth to
<= 20MB/sec you don't trigger the bug, so it really seems to be a race
condition.
--
Axel.Thimm at ATrpms.net
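
For illustration, the iptables workaround mentioned in the quoted message might look roughly like the sketch below. Nothing in it comes from the thread itself: the address 10.0.0.10, the variable names and the function names are all hypothetical, and port 2049 is simply the standard NFS port. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: drop NFS/tcp traffic to the floating service IP while the
# service is being relocated, then lift the rule again.
# SERVICE_IP, NFS_PORT, RUN and the function names are hypothetical.
SERVICE_IP="${SERVICE_IP:-10.0.0.10}"   # assumed floating IP of the NFS service
NFS_PORT="${NFS_PORT:-2049}"            # standard NFS port
RUN="${RUN-echo}"                       # dry run by default; set RUN= to execute (as root)

# Emit or run one iptables call; $1 is -I (insert rule) or -D (delete rule).
fence_rule() {
    $RUN iptables "$1" INPUT -d "$SERVICE_IP" -p tcp --dport "$NFS_PORT" -j DROP
}

fence_on()  { fence_rule -I; }   # start dropping packets before exportfs/ip changes
fence_off() { fence_rule -D; }   # stop dropping once NFS and the IP have settled

# Example sequence around a service stop:
fence_on
# ... unexport the filesystems and tear down the IP here ...
fence_off
```

Whether this belongs in separate resource agents or inside ip.sh/nfsexport.sh is exactly the open question above; the sketch only shows the rule being inserted and removed around the critical section.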
-- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster