Hi,

I am still on openSUSE 10.2, x86_64, using openais-0.80.1-6 (RPM built from the source RPM) and cluster-2.0.0 (self-compiled). The kernel is:

Linux srv4 2.6.20.15-default #1 SMP Fri Jul 13 12:44:51 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux

When I run /etc/init.d/cman for the first time, fenced segfaults and the cman init script hangs, waiting for fenced. When I Ctrl-C the init script, kill aisexec and /sbin/ccsd, and then restart the init script, fenced also starts after some minutes and the script ends with a "success".

The following are the logs from starting /etc/init.d/cman:

Jul 16 16:19:49 srv4 ccsd[29691]: Starting ccsd 2.00.00:
Jul 16 16:19:49 srv4 ccsd[29691]: Built: Jul 13 2007 13:24:27
Jul 16 16:19:49 srv4 ccsd[29691]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jul 16 16:19:49 srv4 ccsd[29691]: cluster.conf (cluster name = correo, version = 1) found.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Using default multicast address of 239.192.25.250
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cpg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cfg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_msg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_lck loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evt loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_ckpt loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_amf loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_clm loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evs loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cman loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] downcheck (1000000 ms) fail to recv const (50 msgs)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] send threads (0 threads)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token expired timeout (495 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token problem counter (2000 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP threshold (10 problem count)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP mode set to none.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] heartbeat_failures_allowed (0)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] max_network_delay (50 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] The network interface [192.168.8.13] is now up.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Created or loaded sequence id 68.192.168.8.13 for this ring.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering GATHER state from 15.
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] CMAN 2.00.00 (built Jul 13 2007 13:24:30) started
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] Not using a virtual synchrony filter.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Creating commit token because I am the rep.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Saving state aru 0 high seq received 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering COMMIT state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering RECOVERY state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] position [0] member 192.168.8.13:
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] previous ring seq 68 rep 192.168.8.13
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] aru 0 high delivered 0 received flag 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Did not need to originate any messages in recovery.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Storing new sequence id for ring 48
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Sending initial ORF token
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering OPERATIONAL state.
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] quorum regained, resuming activity
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] got nodejoin message 192.168.8.13
Jul 16 16:19:51 srv4 ccsd[29691]: Initial status:: Quorate
Jul 16 16:19:59 srv4 fenced[29709]: srv5 not a cluster member after 6 sec post_join_delay
Jul 16 16:19:59 srv4 kernel: fenced[29709]: segfault at 0000000000000000 rip 0000000000405e97 rsp 00007fff30ee0b80 error 4

Below is my cluster.conf file:

<?xml version="1.0"?>
<cluster name="correo" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="srv4" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="ilo_srv4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="srv5" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="ilo_srv5"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo_srv4" agent="fence_ilo" ipaddr="192.168.8.180" login="ilo" />
    <fencedevice name="ilo_srv5" agent="fence_ilo" ipaddr="192.168.8.181" login="ilo" />
  </fencedevices>
</cluster>

Any hint as to what could cause fenced to segfault? The iLO boards on the two servers are not yet configured; I don't know whether this could cause the problem.

Kind regards,
Sebastian
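
One thing that can be checked mechanically in a cluster.conf like the one above is whether every clusternode has its own nodeid; in the file as posted, srv4 and srv5 both carry nodeid="1". Whether that has anything to do with the fenced segfault is not established here. The following is only a minimal sketch of such a check: the helper name duplicate_nodeids is made up for illustration, and the embedded config is a trimmed copy of the posted file with the fence sections left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the cluster.conf from the post (fence sections omitted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="correo" config_version="1">
  <clusternodes>
    <clusternode name="srv4" nodeid="1" votes="1"/>
    <clusternode name="srv5" nodeid="1" votes="1"/>
  </clusternodes>
</cluster>
"""


def duplicate_nodeids(conf_xml):
    """Return {nodeid: [node names]} for every nodeid shared by several nodes.

    Hypothetical helper for illustration; it is not part of cman or ccs.
    """
    root = ET.fromstring(conf_xml)
    names_by_id = defaultdict(list)
    for node in root.iter("clusternode"):
        names_by_id[node.get("nodeid")].append(node.get("name"))
    # Keep only the ids that are used by more than one clusternode.
    return {nid: names for nid, names in names_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    for nodeid, names in duplicate_nodeids(CLUSTER_CONF).items():
        print("nodeid %s is used by: %s" % (nodeid, ", ".join(names)))
```

To run it against a real file, the string could be replaced by the contents of the cluster.conf in use (assumed here to be readable by the user running the script).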