On 12/10/2011 05:00 PM, Matthew Painter wrote:
> The switch was our first thought, but that has been swapped, and while
> we are not having nodes fenced anymore (we were daily), this anomaly
> remains.
>
> I will ask for those logs and conf on Monday.
>
> I think it might be worth reinstalling corosync on this box anyway?
> Can't be healthy if it is exiting uncleanly. I have had reports of the
> rgmanager dying on this box. (pid file but not running) Could that be
> related?
>
> Thanks :)

It's impossible to say without knowing your configuration. Please share
the cluster.conf (only obfuscate passwords, please) along with the log
files. The more detail, the better: versions, distros, network config,
etc.

Uninstalling corosync is not likely to help. RGManager is fairly high up
in the stack, so it's not likely to be the cause either.

Did you configure the timeouts to be very high, by chance? I'm finding
it difficult to fathom how the node can withdraw without being fenced,
short of cleanly stopping the cluster stack. I suspect there is
something important not being said, which the configuration information,
versions and logs will hopefully expose.

-- 
Digimer
E-Mail: digimer@xxxxxxxxxxx
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron
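
P.S. On obfuscating the passwords before posting cluster.conf: below is a minimal sketch of one way to do it, assuming the secrets live in passwd="..." attributes on the fence device entries. Check your own file, as other attributes may hold secrets too. The function name redact_passwords, the REDACTED placeholder and the sample line are made up for illustration.

```python
import re

# Assumption: passwords appear as passwd="..." or passwd='...' attributes.
# The word boundary plus the required "=" means attributes such as
# passwd_script are left alone.
_PASSWD_ATTR = re.compile(r"""(\bpasswd\s*=\s*)(["'])(.*?)\2""")


def redact_passwords(conf_text: str, placeholder: str = "REDACTED") -> str:
    """Return conf_text with every passwd attribute value replaced."""
    def _swap(match: re.Match) -> str:
        prefix, quote = match.group(1), match.group(2)
        # Keep the attribute name and quoting style, drop only the value.
        return f"{prefix}{quote}{placeholder}{quote}"

    return _PASSWD_ATTR.sub(_swap, conf_text)


# Hypothetical fence device line, for demonstration only.
sample = '<fencedevice agent="fence_ipmilan" name="ipmi1" login="admin" passwd="s3cret"/>'
print(redact_passwords(sample))
```

Everything except the password values is left untouched, so the rest of the configuration stays readable for whoever is helping debug it.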