On Thu, Nov 5, 2009 at 10:38 AM, Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx> wrote:
Probably 1) is due to this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=500450
that found its solution released in RHSA-2009-1341 advisory
with cman-2.0.115-1.el5.x86_64.rpm.
[snip]
Two other things:
1) I see these messages about quorum on the first node, which didn't appear during the previous days in the 5.3 env:
Nov 5 08:00:14 mork clurgmgrd: [2692]: <notice> Getting status
Nov 5 08:27:08 mork qdiskd[2206]: <warning> qdiskd: read (system call) has hung for 40 seconds
Nov 5 08:27:08 mork qdiskd[2206]: <warning> In 40 more seconds, we will be evicted
Nov 5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov 5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov 5 09:48:23 mork qdiskd[2206]: <warning> qdiskd: read (system call) has hung for 40 seconds
Nov 5 09:48:23 mork qdiskd[2206]: <warning> In 40 more seconds, we will be evicted
Nov 5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov 5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Have any timings changed between releases?
My relevant timing lines in cluster.conf were like this in 5.3 and remained so in 5.4:
<cluster alias="clumm" config_version="7" name="clumm">
<totem token="162000"/>
<cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
<quorumd device="/dev/sda" interval="5" label="clummquorum" log_facility="local4" log_level="7" tko="16" votes="1">
<heuristic interval="2" program="ping -c1 -w1 192.168.122.1" score="1" tko="3000"/>
</quorumd>
(tko is very large in the heuristic because I was testing the best and safest way to make on-the-fly changes to the heuristic, due to network maintenance activity causing the gw to disappear for some time, not predictable by the net-guys...)
I don't know whether this message derives from latency problems in my virtual env or not...
On the host side I don't see any messages with the dmesg command or in /var/log/messages...
2) I saw that a new kernel has just been released... ;-(
Any hints about possible interference with the cluster infrastructure?
Gianluca
Probably 1) is due to this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=500450
that found its solution released in RHSA-2009-1341 advisory
And coming from 2.0.98 this is reasonable.
In my case tko=16 and interval=5, so the max time tolerance is about 80 seconds, which matches the 40+40 seconds I see in the messages...
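To make the arithmetic explicit, here is a rough sketch using the values from the cluster.conf above. The halving of the warning interval is just my reading of the log lines ("hung for 40 seconds" / "In 40 more seconds, we will be evicted"), not something I checked in the qdiskd source, and the quorum_dev_poll/token rules of thumb are how I understand the usual recommendations, not an official check:

```python
# Sanity-check qdiskd timing relationships (values from the cluster.conf above).
# These are plain local variables for illustration, not a real cluster API.
interval = 5              # qdiskd poll interval, seconds
tko = 16                  # missed polls tolerated before eviction
quorum_dev_poll = 80000   # cman quorum device poll timeout, milliseconds
token = 162000            # totem token timeout, milliseconds

# qdiskd declares a node dead after interval * tko seconds in total
eviction_s = interval * tko
print(eviction_s)         # 80 seconds total tolerance

# the log warning appears at the halfway point, hence "hung for 40 seconds,
# in 40 more seconds we will be evicted"
print(eviction_s // 2)    # 40 seconds

# rules of thumb (as I understand them): cman should wait at least as long
# as qdiskd's eviction window, and the totem token should be longer still
assert quorum_dev_poll >= eviction_s * 1000
assert token > quorum_dev_poll
```

So the 40+40 second messages are just the halfway warning of the 80-second window, not necessarily a sign that the timings changed between 5.3 and 5.4.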
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster