Sorry to bother you with this; am I the only one who has spotted this issue?
I reviewed the code from cluster-2 and cluster-1.04, and the patch is
relevant there as well.

An easy way to run into this problem is to generate CPU load on a node and
then start and stop ccsd and gulm in a loop. Sometimes gulm exits with an
error complaining that it was unable to contact ccsd.

On Fri, 15 Jun 2007 10:52:08 +0200,
Mathieu Avila <mathieu.avila@xxxxxxxxxxxx> wrote:

> Hello all,
>
> I'm sometimes having trouble when starting ccsd and then gulm under
> heavy CPU load. Ccsd's init script reports that it is running, but it
> is not fully initialized.
> The problem comes from the fact that ccsd's main process returns
> before the daemonized process of ccsd has finished initializing its
> sockets. The "cluster_communicator" thread sends SIGTERM to
> the parent process before the main thread has finished its
> initialization work.
>
> With the patch proposed in the attachment, the cluster_communicator is
> started after the main thread has finished initializing. It works
> well under any load. Any daemon that needs to connect to ccsd will
> then succeed.
> It was tested with cluster-1.03, but it should work with older
> versions; the ccsd files didn't seem to have changed much.
>
> --
> Mathieu Avila
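
For anyone who wants to see the ordering problem in isolation, here is a minimal sketch in C. It is not the ccsd code and not the attached patch; it only illustrates the idea of notifying the parent after the sockets are ready. Everything in it is an assumption made for illustration: the names start_and_probe() and run_child(), the /tmp socket path, the 200 ms delay that stands in for slow initialization under load, and the use of a plain function instead of the cluster_communicator thread. The child sends SIGTERM to its parent only after listen() has succeeded, and the parent waits for that signal before trying to connect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_ready; /* child reported readiness */
static volatile sig_atomic_t child_gone;  /* child exited */

static void on_signal(int sig)
{
    if (sig == SIGTERM)
        child_ready = 1;
    else if (sig == SIGCHLD)
        child_gone = 1;
}

/* Child: finish all socket setup first, notify the parent last. */
static void run_child(const struct sockaddr_un *addr)
{
    struct timespec delay = { 0, 200 * 1000 * 1000 }; /* simulated slow init */
    int lfd, cfd;

    nanosleep(&delay, NULL);
    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(lfd, 1) < 0)
        _exit(1);                  /* init failed: never claim readiness */

    kill(getppid(), SIGTERM);      /* the notification comes after listen() */

    cfd = accept(lfd, NULL, NULL); /* serve one client, then quit */
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    _exit(0);
}

/* Hypothetical helper: returns 0 if a client can connect as soon as
 * the parent has been released by the child's signal. */
int start_and_probe(void)
{
    struct sockaddr_un addr;
    struct sigaction sa, old_term, old_chld;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;
    int n, fd, rc = -1;

    child_ready = child_gone = 0;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    n = snprintf(addr.sun_path, sizeof(addr.sun_path),
                 "/tmp/ready-demo-%ld.sock", (long)getpid());
    assert(n > 0 && (size_t)n < sizeof(addr.sun_path));
    unlink(addr.sun_path);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, &old_term);
    sigaction(SIGCHLD, &sa, &old_chld);

    /* Block both signals so neither is lost between the check and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGTERM);
    sigdelset(&wait_mask, SIGCHLD);

    pid = fork();
    if (pid == 0)
        run_child(&addr);           /* never returns */
    if (pid > 0) {
        while (!child_ready && !child_gone)
            sigsuspend(&wait_mask); /* atomically unblock and sleep */
        if (child_ready && (fd = socket(AF_UNIX, SOCK_STREAM, 0)) >= 0) {
            rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
            close(fd);
        }
        if (rc != 0)
            kill(pid, SIGKILL);     /* do not leave a stuck child behind */
        waitpid(pid, NULL, 0);
    }

    sigaction(SIGTERM, &old_term, NULL);
    sigaction(SIGCHLD, &old_chld, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    unlink(addr.sun_path);
    return rc;
}
```

Calling start_and_probe() repeatedly mimics the start/stop loop, and with this ordering the connect should succeed each time. In this sketch, moving the kill() call above bind() and listen() opens the same kind of window as the one described above: the parent can be released, and a client can try to connect, before the socket exists.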