CentOS 5.2, 26-node cluster.

Today I restarted one node. It left the cluster, rebooted and joined the
cluster without incident. Everything is fine but… fenced has the CPU pegged.
No useful log messages. strace says it is spinning on poll/recvfrom:

    poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN, revents=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN, revents=POLLNVAL}], 4, -1) = 2
    recvfrom(5, 0x7fffb074ab40, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN, revents=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN, revents=POLLNVAL}], 4, -1) = 2
    recvfrom(5, 0x7fffb074ab40, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

Anything else useful I can do to diagnose? What are the chances I can recover
this node nicely without making things worse?

Any help/ideas appreciated,
Jeff
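
One possible reading of the strace output, offered as an assumption and not a confirmed diagnosis: poll(2) reports POLLNVAL for a descriptor in the set that is not open, and it does so immediately, even with an infinite timeout. A loop that keeps polling such a descriptor would never block, which would look like a pegged CPU. The Python sketch below only reproduces that poll behaviour in isolation; it is not fenced's code, and the helper name poll_closed_fd is hypothetical.

```python
import os
import select
import time


def poll_closed_fd(timeout_ms=5000):
    """Poll a descriptor that was closed after being registered.

    Hypothetical helper for illustration only. Returns the closed fd,
    the list of (fd, revents) pairs from poll(), and the elapsed seconds.
    """
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)

    # Close both ends; read_fd stays in the poll set although it is
    # no longer a valid descriptor.
    os.close(read_fd)
    os.close(write_fd)

    start = time.monotonic()
    events = poller.poll(timeout_ms)  # does not wait for the timeout
    elapsed = time.monotonic() - start
    return read_fd, events, elapsed


if __name__ == "__main__":
    fd, events, elapsed = poll_closed_fd()
    for event_fd, revents in events:
        flag = "POLLNVAL" if revents & select.POLLNVAL else hex(revents)
        print(f"fd={event_fd} revents={flag}")
    print(f"poll returned after {elapsed:.3f}s")
```

On the affected node, listing /proc/<pid of fenced>/fd would show whether fd 8 is still open, which is one way to check this guess without touching the daemon.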
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster