Re: share experience migrating cluster suite from centos 5.3 to centos 5.4


 



On Thu, Nov 5, 2009 at 10:38 AM, Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx> wrote:
[snip]
Two other things:
1) I see these messages about quorum on the first node, which did not appear during the previous days in the 5.3 environment:
Nov  5 08:00:14 mork clurgmgrd: [2692]: <notice> Getting status
Nov  5 08:27:08 mork qdiskd[2206]: <warning> qdiskd: read (system call) has hung for 40 seconds
Nov  5 08:27:08 mork qdiskd[2206]: <warning> In 40 more seconds, we will be evicted
Nov  5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov  5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov  5 09:48:23 mork qdiskd[2206]: <warning> qdiskd: read (system call) has hung for 40 seconds
Nov  5 09:48:23 mork qdiskd[2206]: <warning> In 40 more seconds, we will be evicted
Nov  5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status
Nov  5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status

Did any timings change between releases?
The relevant timing lines in my cluster.conf were the same in 5.3 and remained so in 5.4:

<cluster alias="clumm" config_version="7" name="clumm">
        <totem token="162000"/>
        <cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>

        <quorumd device="/dev/sda" interval="5" label="clummquorum" log_facility="local4" log_level="7" tko="16" votes="1">
                <heuristic interval="2" program="ping -c1 -w1 192.168.122.1" score="1" tko="3000"/>
        </quorumd>

(tko is very large in the heuristic because I was testing the best and safest way to make on-the-fly changes to the heuristic, due to network maintenance activity making the gateway disappear for periods of time the network team could not predict...)
 
I don't know whether this message stems from a latency problem in my virtual environment or not.
On the host side I don't see any related message in the dmesg output or in /var/log/messages.

2) I saw that a new kernel was just released... ;-(
Any hints about possible interference with the cluster infrastructure?

Gianluca
 


Probably 1) is due to this Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=500450
whose fix was released in the RHSA-2009:1341 advisory
with cman-2.0.115-1.el5.x86_64.rpm.
Since I am coming from 2.0.98, this seems reasonable.
In my case tko=16 and interval=5, so the maximum tolerance is about 16 x 5 = 80 seconds, which matches the 40+40 seconds I see in the messages...
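The timing arithmetic above can be sketched out. This is a minimal illustration (not from the original message) of how the qdiskd values in the posted cluster.conf relate to the log output, assuming the eviction window is interval * tko seconds and the warnings begin at the halfway point, as the 40+40 seconds in the logs suggest:

```python
# quorumd settings from the cluster.conf shown above
interval = 5   # seconds between qdisk cycles (quorumd interval=)
tko = 16       # missed cycles tolerated before eviction (quorumd tko=)

eviction_window = interval * tko        # total tolerance in seconds
halfway_warning = eviction_window // 2  # "read has hung for N seconds" point

print(eviction_window)   # 80
print(halfway_warning)   # 40

# cman's quorum_dev_poll is in milliseconds; keeping it at least as large
# as the eviction window (here 80000 ms, as in the config) avoids cman
# declaring the quorum disk dead before qdiskd itself gives up.
quorum_dev_poll_ms = 80000
assert quorum_dev_poll_ms >= eviction_window * 1000
```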
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

