Lon Hohberger wrote:
> On Sun, 2007-01-07 at 20:29 +0100, Simone Gotti wrote:
>> Problem 2)
>>
>> After fixing Problem 1, if I set in the quorumd tag of cluster.conf an
>> interval > quorumdev_poll/1000*2 the quorum is lost then regained over
>> and over as the polling frequency of qdiskd is less than the polling one
>> of cman.
>> Probably the right thing to do is to calculate the value of
>> quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval"
>> and quorumdev_poll=interval*1000*2 should be ok.
>
> I think the poll rate should be closer to (interval * tko * 1000) [10
> seconds by default] - and not a function of just the quorum disk
> interval.
>
> This is because after (interval*tko*1000), the master node of the
> cluster will write an eviction message to a hung node - and that's when
> qdiskd will either reboot the node or tell CMAN that its votes are no
> longer valid.
>
> I do not think it will cause any problems per se, but dropping qdiskd's
> votes after ~2 seconds when the qdisk master won't write an eviction
> notice for another ~8 seconds seems a bit odd.
>
> Normal node failure delay should be >= 2*(i*t*1000). There's a
> parameter in the <totem> tag (which defaults to 5,000ms) - which should
> be 2 * interval * tko * 1000, but I don't recall what it is right now.
>
> qdiskd needs to time out before CMAN does. While it doesn't have to be
> "half or less", it's a good paranoia factor that's easy to remember, and
> it gives the node plenty of time.

lon: do you reckon we need a blocker bug for "problem 1)" ?

--
patrick

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
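
For reference, the timing arithmetic in Lon's reply can be sketched as below. This is only an illustration of the formulas quoted above, not code from cman or qdiskd: the function names are hypothetical, and interval=1 with tko=10 is an assumption chosen to match the "10 seconds by default" figure.

```python
# Hypothetical helpers illustrating the arithmetic from the thread.
# None of these names exist in cman or qdiskd.

def qdisk_timeout_ms(interval_s: int, tko: int) -> int:
    """Time after which the qdisk master writes an eviction notice (ms)."""
    return interval_s * tko * 1000


def min_cman_timeout_ms(interval_s: int, tko: int, factor: int = 2) -> int:
    """Suggested lower bound for the CMAN node failure delay (ms).

    factor=2 is the "paranoia factor" from the thread; it is a rule of
    thumb, not a hard requirement.
    """
    return factor * qdisk_timeout_ms(interval_s, tko)


def cman_outlasts_qdiskd(interval_s: int, tko: int, cman_timeout_ms: int) -> bool:
    """True if qdiskd times out comfortably before CMAN does."""
    return cman_timeout_ms >= min_cman_timeout_ms(interval_s, tko)


if __name__ == "__main__":
    interval_s, tko = 1, 10  # assumed values giving the 10 second figure
    print(qdisk_timeout_ms(interval_s, tko))
    print(min_cman_timeout_ms(interval_s, tko))
    # The 5,000ms <totem> default mentioned above is below the suggested bound.
    print(cman_outlasts_qdiskd(interval_s, tko, 5000))
```

With these assumed values, the poll rate originally proposed (interval*1000*2) comes to 2000 ms, which is where the "~2 seconds" versus "another ~8 seconds" gap in the reply comes from.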