Hi everybody,

We've configured our qdisk/cman/multipath timeout settings based on the following KB article: http://kbase.redhat.com/faq/docs/DOC-2882.

The cluster runs RHCS 5.4 with PowerPath 5.3.1 (1). I've used the following values, as you can see in cluster.conf (2):

PowerPath failover = X = 45 seconds
qdisk failover = X * 1.3 = 58.5 s (tko = 59 s)
cman failover = X * 2.7 = 121.5 s (token = 122000 ms)

However, when we ran a simple test by removing the heartbeat interface, it took roughly 6 minutes to fence one of the nodes (3).

We'd like to know whether this behavior is expected. I'd really appreciate any help with this.

Thanks!

(1)
[root@mercurio dell]# rpm -qi EMCpower.LINUX
Name        : EMCpower.LINUX                 Relocations: /
Version     : 5.3.1.00.00                         Vendor: EMC, Inc.
Release     : 111                             Build Date: Thu 13 Aug 2009 04:01:31 PM BRT
Install Date: Wed 02 Jun 2010 03:01:44 PM BRT  Build Host: lsca2111.lss.emc.com
Group       : System Environment/Kernel       Source RPM: EMCpower.LINUX-5.3.1.00.00-111.src.rpm
Size        : 22070425                           License: Copyright (c) 2002-2009, EMC Corporation. All Rights Reserved.
Signature   : (none)
Summary     : EMC PowerPath
Description :
Multi-path software providing fail-over and load-sharing for SCSI disks.
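For anyone who wants to re-check the arithmetic behind the timeout values above, here is a minimal Python sketch. The 1.3 and 2.7 multipliers and the 45-second PowerPath failover are the ones quoted in this post. The function name `derive_timeouts`, rounding up to whole seconds, and the 1-second qdisk interval are assumptions made for illustration (they match the values in cluster.conf, but this is not an official formula).

```python
import math
from fractions import Fraction


def derive_timeouts(path_failover_s, qdisk_interval_s=1):
    """Derive qdisk and cman timeouts from the multipath failover time X.

    Uses the multipliers quoted in the post (X * 1.3 and X * 2.7).
    Fractions avoid floating-point rounding surprises in the products.
    """
    x = Fraction(path_failover_s)
    qdisk_s = x * Fraction(13, 10)   # qdisk failover = X * 1.3
    cman_s = x * Fraction(27, 10)    # cman failover  = X * 2.7
    return {
        "qdisk_seconds": float(qdisk_s),
        # Assumption: tko is the number of missed intervals, rounded up.
        "tko": math.ceil(qdisk_s / qdisk_interval_s),
        "cman_seconds": float(cman_s),
        # Assumption: token is rounded up to whole seconds, then given in ms.
        "token_ms": math.ceil(cman_s) * 1000,
    }


timeouts = derive_timeouts(45)
print(timeouts)
```

With X = 45 this reproduces tko = 59 and token = 122000, the values used in cluster.conf.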

(2) Source: /etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster alias="clu-informix" config_version="17" name="clu-informix">
  <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="5"/>
  <clusternodes>
    <clusternode name="clu-urano" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence_urano"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="clu-gemini" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence_gemini"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman quorum_dev_poll="50000" expected_votes="3"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="gemini-ipmi" login="cluster" name="fence_gemini" passwd="clusteraguia" method="cycle"/>
    <fencedevice agent="fence_ipmilan" ipaddr="urano-ipmi" login="cluster" name="fence_urano" passwd="clusteraguia" method="cycle"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="srvkrm" nofailback="0" ordered="0" restricted="0">
        <failoverdomainnode name="clu-urano" priority="1"/>
        <failoverdomainnode name="clu-gemini" priority="1"/>
      </failoverdomain>
      <failoverdomain name="srvvdsa" nofailback="0" ordered="0" restricted="0">
        <failoverdomainnode name="clu-urano" priority="1"/>
        <failoverdomainnode name="clu-gemini" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    ... # Removed service and resource tags
  </rm>
  <totem token="122000"/>
  <quorumd device="/dev/emcpowera1" interval="1" min_score="1" tko="59" votes="1"/>
</cluster>

(3) Heartbeat tests:

[root@gemini ~]# clustat
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 clu-urano                       1 Online, rgmanager
 clu-gemini                      2 Online, Local, rgmanager
 /dev/emcpowera1                 0 Online, Quorum Disk

 Service Name          Owner (Last)          State
 ------- ----          ----- ------          -----
 service:srvkrm        clu-urano             started
 service:srvvdsa       clu-urano             started

(3.1) Removed the heartbeat interface on the gemini server at Jun 7, 13:55:07.

(3.2) After around 60-80 seconds, got 'token lost' on gemini.

Jun 7 13:56:28 gemini openais[5922]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 7 13:56:28 gemini openais[5922]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jun 7 13:56:28 gemini openais[5922]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jun 7 13:56:28 gemini openais[5922]: [TOTEM] entering GATHER state from 2.

(3.3) Then, 121 seconds later, got the second 'token lost', this time on urano.

Jun 7 13:58:29 urano openais[5837]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 7 13:58:29 urano openais[5837]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jun 7 13:58:29 urano openais[5837]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jun 7 13:58:29 urano openais[5837]: [TOTEM] entering GATHER state from 2.

(3.4) About 122 seconds later, node urano left the cluster.

Jun 7 14:00:32 gemini openais[5922]: [TOTEM] entering GATHER state from 0.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] Creating commit token because I am the rep.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] Saving state aru 34 high seq received 34
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] Storing new sequence id for ring 140
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] entering COMMIT state.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] entering RECOVERY state.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] position [0] member 10.1.1.32:
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] previous ring seq 316 rep 10.1.1.32
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] aru 34 high delivered 34 received flag 1
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] Did not need to originate any messages in recovery.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] Sending initial ORF token
Jun 7 14:00:32 gemini openais[5922]: [CLM ] CLM CONFIGURATION CHANGE
Jun 7 14:00:32 gemini openais[5922]: [CLM ] New Configuration:
Jun 7 14:00:32 gemini openais[5922]: [CLM ] r(0) ip(10.1.1.32)
Jun 7 14:00:32 gemini openais[5922]: [CLM ] Members Left:
Jun 7 14:00:32 gemini openais[5922]: [CLM ] r(0) ip(10.1.1.39)
Jun 7 14:00:32 gemini openais[5922]: [CLM ] Members Joined:
Jun 7 14:00:32 gemini openais[5922]: [CLM ] CLM CONFIGURATION CHANGE
Jun 7 14:00:32 gemini openais[5922]: [CLM ] New Configuration:
Jun 7 14:00:32 gemini kernel: dlm: closing connection to node 1
Jun 7 14:00:32 gemini openais[5922]: [CLM ] r(0) ip(10.1.1.32)
Jun 7 14:00:32 gemini openais[5922]: [CLM ] Members Left:
Jun 7 14:00:32 gemini openais[5922]: [CLM ] Members Joined:
Jun 7 14:00:32 gemini openais[5922]: [SYNC ] This node is within the primary component and will provide service.
Jun 7 14:00:32 gemini openais[5922]: [TOTEM] entering OPERATIONAL state.
Jun 7 14:00:32 gemini openais[5922]: [CLM ] got nodejoin message 10.1.1.32
Jun 7 14:00:32 gemini openais[5922]: [CPG ] got joinlist message from node 2

(3.5) After 48 seconds (post_fail_delay), urano was fenced.

Jun 7 14:01:20 gemini fenced[5971]: clu-urano not a cluster member after 48 sec post_fail_delay
Jun 7 14:01:20 gemini fenced[5971]: fencing node "clu-urano"
Jun 7 14:01:20 gemini fenced[5971]: fence "clu-urano" success

Ricardo Masashi Maeda
Oracle Consultant / DBA
ricardo.maeda@xxxxxxxxxxxxxxxx
Webbertek - Professional IT Services
+55 (41) 4063-8448 (landline)
+55 (41) 8834-8354 (mobile)

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
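
P.S. The intervals between the events in the heartbeat test can be re-checked with a short Python sketch. The timestamps are copied from the test description and log excerpts in this post. The event labels and the helper name `elapsed_seconds` are only illustrative, and the sketch assumes all events fall on the same day.

```python
from datetime import datetime

# Timestamps taken from the heartbeat test (Jun 7); labels are illustrative.
events = [
    ("heartbeat interface removed on gemini", "13:55:07"),
    ("token lost on gemini", "13:56:28"),
    ("token lost on urano", "13:58:29"),
    ("urano left the cluster", "14:00:32"),
    ("urano fenced", "14:01:20"),
]


def elapsed_seconds(events):
    """Return the seconds elapsed between each pair of consecutive events."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for _, stamp in events]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


deltas = elapsed_seconds(events)
total = sum(deltas)

for (label, _), delta in zip(events[1:], deltas):
    print(f"{label}: +{delta} s")
print(f"total: {total} s")
```

The total covers the whole span from removing the interface to the successful fence, which is the delay being asked about.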