This all works fine for graceful shutdowns, but when I do something nasty like pulling the power cord on the node currently running the service, the surviving node never takes over. It spends all its time trying to fire off the fence agent, which obviously cannot work because the other server is completely offline. The only way I can get the surviving node to assume the VIP and start Squid is to run fence_ack_manual, which rather defeats the purpose of running a cluster in the first place. The logs are filled with:
Apr 12 00:01:44 <hostname> fenced[3223]: fencing node "<otherhost>"
Apr 12 00:01:44 <hostname> fenced[3223]: agent "fence_iptables" reports: Could not disable xx.xx.xx.xx: ssh: connect to host xx.xx.xx.xx port 22: No route to host
Is this a misconfiguration on my part, or is there an option I can set somewhere to tell the nodes to give up after a certain number of tries?
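
In case it matters, the fence section of cluster.conf looks roughly like the sketch below (the node name, device names, and the fence_iptables attributes are approximations, since it is a homegrown ssh/iptables agent). One thing I have been considering is adding a second <method> using fence_manual as a last resort, so fenced has something to fall through to instead of retrying the ssh agent forever, but I would prefer a retry/give-up knob if one exists:

        <clusternode name="node1" nodeid="1" votes="1">
                <fence>
                        <!-- primary method: ssh to the peer and block the VIP with iptables -->
                        <method name="1">
                                <device name="iptfence" ipaddr="xx.xx.xx.xx"/>
                        </method>
                        <!-- possible fallback: manual fencing, acknowledged with fence_ack_manual -->
                        <method name="2">
                                <device name="last_resort" nodename="node1"/>
                        </method>
                </fence>
        </clusternode>
        ...
        <fencedevices>
                <fencedevice agent="fence_iptables" name="iptfence"/>
                <fencedevice agent="fence_manual" name="last_resort"/>
        </fencedevices>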
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster