On Fri, 2006-05-19 at 10:24 +0200, Mels Kooijman wrote:
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver
> is recovering
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing
> RG status
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG
> infoserver: Invalid State 117
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing
> RG status

What caused unxams4082 to die? Was there anything in dmesg, like
dlm_emergency_shutdown?

> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> Magma Event:
> Membership Change
> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> State change:
> unxams4082 DOWN
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <err> #44: Cannot start RG
> infoserver: Invalid State 117

What release of rgmanager are you running? This might be a bug.

> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Taking over
> service clustat from down member (null)
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Service clustat
> started
>
> Where can I find a description of the error numbers (55,44,13,57)?

/usr/share/doc/rgmanager-*/errors.txt

> What can be the cause that we often get the message:
> clurgmgrd[23408]: <notice> status on ip "192.168.50.43" returned 1
> (generic error)

This can be caused by several things: the link died, or the pre-U3 router
ping failed. The router-ping code was removed in U3 because it caused more
problems than it solved. If the message shows up every two minutes, the
router ping is definitely the cause.

-- Lon
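To check which error numbers appear and whether the ip status failures really come at two-minute intervals, something along these lines might help. It is only a rough sketch and not part of rgmanager: the function names (`parse_line`, `error_codes`, `ip_failure_gaps`) are made up for illustration, the line format is assumed from the excerpts quoted above (unwrapped, as they would appear in the syslog file), and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Assumed format, taken from the log excerpts in this thread:
#   May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing RG status
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"clurgmgrd\[\d+\]:\s+<(?P<level>\w+)>\s+(?:#(?P<code>\d+):\s+)?(?P<msg>.*)$"
)


def parse_line(line):
    """Split one clurgmgrd syslog line into fields, or return None."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def error_codes(lines):
    """Return the sorted, de-duplicated '#NN' error numbers found in lines."""
    codes = set()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["code"]:
            codes.add(int(rec["code"]))
    return sorted(codes)


def ip_failure_gaps(lines, year=2006):
    """Seconds between consecutive 'status on ip ... returned 1' messages.

    The year is an assumption (syslog omits it), and %b parsing depends
    on the locale using English month abbreviations.
    """
    stamps = []
    for line in lines:
        rec = parse_line(line)
        if rec and "status on ip" in rec["msg"] and "returned 1" in rec["msg"]:
            stamps.append(
                datetime.strptime(f"{year} {rec['ts']}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Lines from the first node's log as quoted above, with the mail wrapping undone.
SAMPLE = [
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver is recovering",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing RG status",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG infoserver: Invalid State 117",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service infoserver failed to stop cleanly",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing RG status",
]

print(error_codes(SAMPLE))  # [13, 44, 55, 57]
```

Feeding the real log file into `ip_failure_gaps` and seeing gaps that cluster around 120 seconds would be consistent with the router-ping explanation; irregular gaps would point more toward link problems.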