David Teigland wrote:
> On Mon, Nov 19, 2012 at 10:39:20AM +0100, Jacek Konieczny wrote:
>> On Mon, Nov 19, 2012 at 10:16:48AM +0100, Jacek Konieczny wrote:
>>> It goes like that:
>>> - resources using the shared storage are properly stopped by Pacemaker.
>>> - DRBD is cleanly demoted and unconfigured by Pacemaker
>>> - Pacemaker cleanly exits
>>> - CLVMD is stopped.
>>> - dlm_controld is stopped
>>> - corosync is being stopped
>>>
>>> and at this point the node is fenced (rebooted) by the dlm_controld on
>>> the other node. I would expect it to continue with a clean shutdown.
>>>
>>> Any idea how to debug/fix it?
>>> Is this '541 cpg_dispatch error 9' the problem?
>>
>> I found a workaround: I have added a 10-second pause between
>> dlm_controld and corosync shutdown. The node shuts down cleanly now (is
>> not fenced). '541 cpg_dispatch error 9' is still there in the logs,
>> though.
>
> corosync-cfgtool -H is supposed to shut down corosync cleanly using the
> cfg_shutdown_callback. It looks like it may not be doing that.
>

I don't think this is about corosync not shutting down cleanly, as can be
seen in the logs:

...
Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync profile loading service
Nov 19 09:49:43 dev1n2 corosync[1130]: [WD ] magically closing the watchdog.
Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync watchdog service
Nov 19 09:49:43 dev1n2 corosync[1130]: [MAIN ] Corosync Cluster Engine exiting normally

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
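
For anyone who wants to repeat the check made in the reply (looking for the "exiting normally" message from corosync in the syslog), here is a minimal sketch in Python. The function names `parse_corosync_line` and `exited_normally` are made up for illustration, and the regular expression assumes the syslog line layout shown in the excerpt; other logging setups may format lines differently.

```python
import re

# Message text taken from the [MAIN ] line in the log excerpt of this thread.
CLEAN_EXIT_MARKER = "Corosync Cluster Engine exiting normally"

# Assumed layout (as in the excerpt):
#   Nov 19 09:49:43 dev1n2 corosync[1130]: [MAIN ] <message>
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"corosync\[(?P<pid>\d+)\]:\s+\[(?P<subsys>[A-Z]+)\s*\]\s+(?P<msg>.*)$"
)


def parse_corosync_line(line):
    """Split one syslog line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def exited_normally(lines):
    """Return True if any [MAIN] line reports a normal corosync exit."""
    for line in lines:
        entry = parse_corosync_line(line)
        if entry and entry["subsys"] == "MAIN" and CLEAN_EXIT_MARKER in entry["msg"]:
            return True
    return False
```

Note that this only confirms what the local corosync logged about its own exit. It says nothing about how dlm_controld on the other node saw the departure, which is the open question in this thread.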