Hello Lon,

On Thursday, 24 January 2008 00:12:41, Lon Hohberger wrote:
> On Wed, 2008-01-23 at 17:53 -0500, Lon Hohberger wrote:
> > On Tue, 2008-01-15 at 15:45 +0100, Holger L. Ratzel wrote:
> > > Hi,
> > >
> > > On Monday, 14 January 2008 21:03:46, Lon Hohberger wrote:
> > > > So, what was happening was this:
> > >
> > > [...]
> > >
> > > > First, let's ping the router with the cable unplugged to see how long
> > > > it takes for our heuristic to complete when things are "broken". On
> > > > my machine:
> > > >
> > > > [lhh@ayanami ~]$ time ping -c1 -t1 frederick
> > > > PING frederick (12.1.2.99) 56(84) bytes of data.
> > > >
> > > > From ayanami (12.1.2.37) icmp_seq=1 Destination Host Unreachable
> >
> > Holger,
> >
> > Digging deeper -- for some reason, ping occasionally doesn't exit for
> > some reason if you make the dest IP unreachable, but only if started
> > from the init script - e.g. 'service qdiskd start'.
> >
> > I was working with someone today and we reproduced it.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=429927

Very, very strange indeed. I've tried to implement the workaround given in
the bug report:

- I've created a wrapper for ping (copied your attachment)
- Changed cluster.conf to give qdiskd more time to finish its job
  (see attached cluster.conf)

Now qdiskd occasionally reports the heuristic as down (the network isn't
touched, no cable pulled):

Jan 25 15:17:34 testcluster-2 qdiskd[2151]: <info> Heuristic: 'ping-wrap -c3 -t1 10.200.10.1' DOWN (1/1)
Jan 25 15:17:36 testcluster-2 qdiskd[2151]: <notice> Score insufficient for master operation (0/1; required=1); downgrading

The result is that this node gets fenced and reboots. After some time the
same thing happens on the other node, creating an endless loop.

Do you know when your fix will make it into the regular updates for RHEL5?

Regards,

Holger

-- 
----------------- SHE - IT-Sicherheit von Experten ------------------
SHE Informationstechnologie AG
Holger L. Ratzel
Phone: +49 621 5200 - 210
Service Delivery & Support
Fax: +49 621 5200 - 555
Donnersbergweg 3
holger.ratzel@xxxxxxx
D-67059 Ludwigshafen
http://www.she.net/
Registered office and court of registration: Ludwigshafen, HRB 4593
Chairman of the supervisory board: Ulrich Engelhardt
Executive board: Klaus Schulz
-------------------- while( !asleep( ) ) ++sheep; -------------------
PGP-Fingerprint: 9A 73 40 22 72 64 BE D1 D8 1A 54 3C 5B 64 AF C3 CC E3 CA A8
Get my PGP public key at: http://pgp.she.net/
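
The ping wrapper mentioned in the message is not reproduced here, so the following is only a hypothetical sketch of the general idea behind such a workaround: run the heuristic command under a hard deadline and kill it if it fails to exit. The function name `run_with_deadline` and the deadline values are assumptions for illustration, not taken from the bug report or from the attachment.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline_s):
    """Run argv and return 0 on success, 1 on failure or timeout.

    Hypothetical helper: subprocess.run() kills the child and waits for it
    when the timeout expires, so a hung command cannot block the caller.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s,
        )
    except subprocess.TimeoutExpired:
        return 1  # command did not exit in time; treat as a failed check
    except OSError:
        return 1  # command could not be started at all
    return 0 if result.returncode == 0 else 1


# Demo with a stand-in command that sleeps far longer than the deadline.
status = run_with_deadline(
    [sys.executable, "-c", "import time; time.sleep(30)"], 0.5
)
print(status)  # prints 1
```

Used as a heuristic program, a small script would pass the real command instead, for example `sys.exit(run_with_deadline(["ping", "-c3", "-t1", "10.200.10.1"], 4.0))`, where the 4-second deadline is an arbitrary value chosen to stay below the heuristic interval.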
<?xml version="1.0"?>
<cluster alias="Test" config_version="30" name="Test">
  <quorumd interval="5" label="Qdisk1" tko="3" votes="1">
    <heuristic interval="5" program="ping-wrap -c3 -t1 10.200.10.1" score="1" tko="1"/>
  </quorumd>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="testcluster-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="RPS"/>
        </method>
      </fence>
      <multicast addr="224.0.0.10" interface="eth0"/>
    </clusternode>
    <clusternode name="testcluster-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="RPS"/>
        </method>
      </fence>
      <multicast addr="224.0.0.10" interface="eth0"/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="3" two_node="0">
    <multicast addr="224.0.0.10"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_rps10" device="/dev/ttyS0" name="RPS" option="reboot" port="0"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="Apache" ordered="1" restricted="1">
        <failoverdomainnode name="testcluster-1" priority="1"/>
        <failoverdomainnode name="testcluster-2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.200.10.189" monitor_link="1"/>
      <script file="/etc/init.d/httpd" name="Apache"/>
      <fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="26076" fstype="ext3" mountpoint="/data/httpd" name="DISK_Apache" options="" self_fence="0"/>
    </resources>
    <service autostart="1" domain="Apache" name="HTTPD">
      <ip ref="10.200.10.189"/>
      <script ref="Apache"/>
      <fs ref="DISK_Apache"/>
    </service>
  </rm>
  <totem token="40000"/>
</cluster>
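
As a rough aid for reading the timing values in the configuration above, the sketch below multiplies `interval` by `tko` for the `quorumd` element and for each heuristic. It assumes that this product approximates how long a failure must persist before qdiskd acts on it; that reading is an assumption for illustration, and the helper name `qdisk_windows` is made up. The XML is a trimmed copy of the attached cluster.conf.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the quorumd section from the attached cluster.conf.
CONF = """<?xml version="1.0"?>
<cluster alias="Test" config_version="30" name="Test">
  <quorumd interval="5" label="Qdisk1" tko="3" votes="1">
    <heuristic interval="5" program="ping-wrap -c3 -t1 10.200.10.1" score="1" tko="1"/>
  </quorumd>
</cluster>"""


def qdisk_windows(xml_text):
    """Return approximate failure windows in seconds (interval * tko).

    Assumption: interval * tko approximates the time a failure must last
    before the quorum disk or a heuristic is considered down.
    """
    root = ET.fromstring(xml_text)
    quorumd = root.find("quorumd")
    windows = {
        "quorumd": int(quorumd.get("interval")) * int(quorumd.get("tko"))
    }
    for heuristic in quorumd.findall("heuristic"):
        program = heuristic.get("program")
        windows[program] = int(heuristic.get("interval")) * int(heuristic.get("tko"))
    return windows


windows = qdisk_windows(CONF)
for name, seconds in windows.items():
    print(f"{name}: ~{seconds}s")
```

Under that assumption the heuristic has a window of about 5 seconds with `tko="1"`, so a single failed run of `ping-wrap` is enough to mark it DOWN, which matches the `DOWN (1/1)` entry in the log.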
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster