I'm trying to figure out why my cluster services keep stopping for no apparent reason. What the stopped services have in common is the following set of resources: 1 GFS file system, 1 IP address, and 1 or 2 init scripts. The init scripts vary between apache, tomcat, mysql, and squid.

Normally, if a process dies and a status check on the init script returns a non-zero exit status, that event gets logged, but that isn't happening when these services are stopped. The first logged event related to a failed service is shown below; after it, the service is stopped and recovered.

"May 28 19:11:33 tf36 clurgmgrd[4418]: <notice> Stopping service twapp"

These nodes are nearly idle all of the time and have a lot of horsepower.

Some helpful information:

[smccl@tf36 log]$ rpm -q rgmanager cman
rgmanager-1.9.46-0
cman-1.0.4-0
[smccl@tf36 log]$ uname -osrvmpi
Linux 2.6.9-34.ELhugemem #1 SMP Wed Mar 8 00:47:12 CST 2006 i686 i686 i386 GNU/Linux
[smccl@tf36 log]$ cat /etc/redhat-release
CentOS release 4.3 (Final)

Any help is appreciated. I can provide more information if you think it would be helpful. Also, is there some sort of debugging within rgmanager that I can enable to see what is actually failing or timing out and requiring a restart of these services?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
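
For reference, here is a minimal sketch of how the init script status checks mentioned above could be run by hand, outside of rgmanager, to see which exit status each script returns. The function name check_status is hypothetical, and this is only an approximation of a status check, not how rgmanager itself performs or logs one.

```shell
#!/bin/sh
# check_status is a hypothetical helper, not part of rgmanager.
# It runs the "status" action of each init script passed as an argument
# and prints one timestamped line per script with the exit status.
# It returns 0 if every status check returned 0, and 1 otherwise.
check_status() {
    failed=0
    for script in "$@"; do
        rc=0
        # Capture the exit status without aborting if the check fails.
        "$script" status >/dev/null 2>&1 || rc=$?
        if [ "$rc" -ne 0 ]; then
            failed=1
        fi
        echo "$(date '+%b %d %H:%M:%S') $script status rc=$rc"
    done
    return "$failed"
}
```

It could be called as, for example, check_status /etc/init.d/squid /etc/init.d/mysqld, where both paths are assumptions about how the init scripts are named on these nodes. Running it periodically and comparing the timestamps with the "Stopping service" messages might show whether a status check was returning non-zero around the time a service was stopped.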