On Sun, 2007-01-07 at 20:29 +0100, Simone Gotti wrote:
> Problem 2)
>
> After fixing Problem 1, if I set in the quorumd tag of cluster.conf an
> interval > quorumdev_poll/1000*2 the quorum is lost then regained over
> and over as the polling frequency of qdiskd is less than the polling one
> of cman.
> Probably the right thing to do is to calculate the value of
> quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval"
> and quorumdev_poll=interval*1000*2 should be ok.

I think the poll rate should be closer to (interval * tko * 1000) [10 seconds by default] - and not a function of just the quorum disk interval.

This is because after (interval*tko*1000), the master node of the cluster will write an eviction message to a hung node - and that's when qdiskd will either reboot the node or tell CMAN that its votes are no longer valid.

I do not think it will cause any problems per se, but dropping qdiskd's votes after ~2 seconds when the qdisk master won't write an eviction notice for another ~8 seconds seems a bit odd.

Normal node failure delay should be >= 2*(i*t*1000). There's a parameter in the <totem> tag (which defaults to 5,000ms) - which should be 2 * interval * tko * 1000, but I don't recall what it is right now.

qdiskd needs to time out before CMAN does. While it doesn't have to be "half or less", it's a good paranoia factor that's easy to remember, and it gives the node plenty of time.

-- Lon
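
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not code from qdiskd or CMAN. The function names are hypothetical, and the example values interval=1 and tko=10 are an assumption chosen to match the "10 seconds by default" figure quoted above.

```python
# Illustrative sketch only; names are hypothetical, not part of qdiskd or CMAN.

def qdisk_timeout_ms(interval_s, tko):
    """Time (ms) after which the qdisk master evicts a hung node."""
    return interval_s * tko * 1000


def min_cman_timeout_ms(interval_s, tko, paranoia_factor=2):
    """Suggested lower bound (ms) for the CMAN node failure delay."""
    return paranoia_factor * qdisk_timeout_ms(interval_s, tko)


def timeouts_are_safe(interval_s, tko, cman_timeout_ms, paranoia_factor=2):
    """True if qdiskd times out comfortably before CMAN does."""
    return cman_timeout_ms >= min_cman_timeout_ms(interval_s, tko, paranoia_factor)


if __name__ == "__main__":
    interval_s, tko = 1, 10  # assumed values giving the 10 second figure
    print("qdisk eviction after (ms):", qdisk_timeout_ms(interval_s, tko))
    print("CMAN timeout should be >= (ms):", min_cman_timeout_ms(interval_s, tko))
    # 5000 ms is the <totem> default mentioned in the message
    print("5000 ms CMAN timeout safe?", timeouts_are_safe(interval_s, tko, 5000))
```

With those assumed values, a 5,000 ms CMAN timeout falls short of the suggested 2 * interval * tko * 1000 bound, which is the mismatch the message describes.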
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster