totem token & post_fail_delay question

Hello,

I have a cluster with intermittent network issues on the heartbeat network.
Unfortunately, improving the network is not an option, so I am looking for a way to tolerate longer interruptions.

I previously thought the post_fail_delay option would be suitable, but after some research I am no longer sure it is what I am looking for.

If I understand correctly, when a member leaves (due to a token timeout), the cluster waits for post_fail_delay before fencing it. If the member rejoins before that delay expires, will it still be fenced because it rejoins with its previous state?
A recent fencing on this cluster produced a strange message:

Aug 24 06:20:45 node2 openais[29048]: [MAIN ] Not killing node node1cl despite it rejoining the cluster with existing state, it has a lower node ID

What does this mean?

And lastly, is increasing the totem token timeout the way to go?


Thanks,
Vasil Valchev
-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster