Hello All again,
I have a two-node cluster with the following configuration:
<?xml version="1.0"?>
<cluster alias="tweety" config_version="132" name="tweety">
  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="tweety-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="human-fence" nodename="tweety-1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tweety-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="human-fence" nodename="tweety-2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="human-fence"/>
  </fencedevices>
  <rm log_level="7">
    <failoverdomains>
      <failoverdomain name="tweety1" ordered="0" restricted="1">
        <failoverdomainnode name="tweety-1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="tweety2" ordered="0" restricted="1">
        <failoverdomainnode name="tweety-2" priority="1"/>
      </failoverdomain>
      <failoverdomain name="tweety-cluster" ordered="1" restricted="1">
        <failoverdomainnode name="tweety-2" priority="1"/>
        <failoverdomainnode name="tweety-1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="tweety-1-2" ordered="1" restricted="1">
        <failoverdomainnode name="tweety-1" priority="1"/>
        <failoverdomainnode name="tweety-2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="tweety-2-1" ordered="1" restricted="1">
        <failoverdomainnode name="tweety-1" priority="2"/>
        <failoverdomainnode name="tweety-2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script file="/etc/init.d/clvmd" name="clvmd"/>
      <script file="/etc/init.d/gfs2" name="GFS2"/>
      <script file="/etc/init.d/boinc" name="BOINC"/>
      <script file="/etc/init.d/gfs2-check" name="GFS2-Control"/>
    </resources>
    <service autostart="1" domain="tweety1" name="LV-tweety1">
      <script ref="clvmd">
        <script ref="GFS2"/>
      </script>
    </service>
    <service autostart="1" domain="tweety2" name="LV-tweety2">
      <script ref="clvmd">
        <script ref="GFS2"/>
      </script>
    </service>
    <service autostart="1" domain="tweety1" name="BOINC-t1">
      <script ref="BOINC"/>
    </service>
    <service autostart="1" domain="tweety2" exclusive="0" name="BOINC-t2" recovery="restart">
      <script ref="BOINC"/>
    </service>
  </rm>
</cluster>
Tweety-1 boots up smoothly and brings up all of its services.
Tweety-2 boots up smoothly but brings up no services unless I manually run "service clvmd start" and "service gfs2 start".
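For reference, the manual workaround on tweety-2 is roughly the following (just the two init scripts, run by hand after boot, nothing else):

    service clvmd start
    service gfs2 start

Once I have done that the services come up fine, but rgmanager never starts them on its own.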
The log on tweety-2 is:
Mar 24 04:30:18 localhost openais[2681]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Mar 24 04:30:18 localhost openais[2681]: [SERV ] Initialising service handler 'openais message service B.01.01'
Mar 24 04:30:18 localhost openais[2681]: [SERV ] Initialising service handler 'openais configuration service'
Mar 24 04:30:18 localhost ccsd[2672]: Cluster is not quorate. Refusing connection.
Mar 24 04:30:18 localhost openais[2681]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Mar 24 04:30:18 localhost ccsd[2672]: Error while processing connect: Connection refused
Mar 24 04:30:18 localhost openais[2681]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Mar 24 04:30:18 localhost openais[2681]: [CMAN ] CMAN 2.0.73 (built Nov 29 2007 18:40:32) started
Mar 24 04:30:18 localhost openais[2681]: [SYNC ] Not using a virtual synchrony filter.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Creating commit token because I am the rep.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Saving state aru 0 high seq received 0
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Storing new sequence id for ring 41c
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering COMMIT state.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering RECOVERY state.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] position [0] member 10.254.254.254:
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] previous ring seq 1048 rep 10.254.254.254
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] aru 0 high delivered 0 received flag 1
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Did not need to originate any messages in recovery.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Sending initial ORF token
Mar 24 04:30:18 localhost openais[2681]: [CLM ] CLM CONFIGURATION CHANGE
Mar 24 04:30:18 localhost openais[2681]: [CLM ] New Configuration:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Left:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Joined:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] CLM CONFIGURATION CHANGE
Mar 24 04:30:18 localhost openais[2681]: [CLM ] New Configuration:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.254)
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Left:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Joined:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.254)
Mar 24 04:30:18 localhost openais[2681]: [SYNC ] This node is within the primary component and will provide service.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering OPERATIONAL state.
Mar 24 04:30:18 localhost openais[2681]: [CMAN ] quorum regained, resuming activity
Mar 24 04:30:18 localhost openais[2681]: [CLM ] got nodejoin message 10.254.254.254
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering GATHER state from 11.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Saving state aru 9 high seq received 9
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Storing new sequence id for ring 420
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering COMMIT state.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering RECOVERY state.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] position [0] member 10.254.254.253:
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] previous ring seq 1052 rep 10.254.254.253
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] aru c high delivered c received flag 1
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] position [1] member 10.254.254.254:
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] previous ring seq 1052 rep 10.254.254.254
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] aru 9 high delivered 9 received flag 1
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] Did not need to originate any messages in recovery.
Mar 24 04:30:18 localhost openais[2681]: [CLM ] CLM CONFIGURATION CHANGE
Mar 24 04:30:18 localhost openais[2681]: [CLM ] New Configuration:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.254)
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Left:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Joined:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] CLM CONFIGURATION CHANGE
Mar 24 04:30:18 localhost openais[2681]: [CLM ] New Configuration:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.253)
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.254)
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Left:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] Members Joined:
Mar 24 04:30:18 localhost openais[2681]: [CLM ] r(0) ip(10.254.254.253)
Mar 24 04:30:18 localhost openais[2681]: [SYNC ] This node is within the primary component and will provide service.
Mar 24 04:30:18 localhost openais[2681]: [TOTEM] entering OPERATIONAL state.
Mar 24 04:30:18 localhost openais[2681]: [MAIN ] Received message has invalid digest... ignoring.
Mar 24 04:30:18 localhost openais[2681]: [MAIN ] Invalid packet data
Mar 24 04:30:18 localhost openais[2681]: [CLM ] got nodejoin message 10.254.254.253
Mar 24 04:30:18 localhost openais[2681]: [CLM ] got nodejoin message 10.254.254.254
Mar 24 04:30:18 localhost openais[2681]: [CPG ] got joinlist message from node 2
Mar 24 04:30:18 localhost openais[2681]: [CPG ] got joinlist message from node 1
Mar 24 04:30:18 localhost ccsd[2672]: Initial status:: Quorate
Mar 24 04:30:44 localhost modclusterd: startup succeeded
Mar 24 04:30:45 localhost kernel: dlm: Using TCP for communications
Mar 24 04:30:45 localhost kernel: dlm: connecting to 1
Mar 24 04:30:45 localhost kernel: dlm: got connection from 1
Mar 24 04:30:46 localhost clurgmgrd[3200]: <notice> Resource Group Manager Starting
Mar 24 04:30:46 localhost clurgmgrd[3200]: <info> Loading Service Data
Mar 24 04:30:55 localhost clurgmgrd[3200]: <info> Initializing Services
Mar 24 04:30:58 localhost clurgmgrd: [3200]: <err> script:clvmd: stop of /etc/init.d/clvmd failed (returned 143)
Mar 24 04:30:58 localhost clurgmgrd[3200]: <notice> stop on script "clvmd" returned 1 (generic error)
And that is it; the log on tweety-2 stops there.
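A side note on that last error: as far as I understand, an exit status of 143 from an init script is 128 + 15, i.e. the script was killed by SIGTERM rather than returning a status of its own. To double-check the script itself (assuming rgmanager expects LSB-style exit codes from script resources), I can run it by hand on tweety-2 and look at the codes:

    /etc/init.d/clvmd status; echo $?
    /etc/init.d/clvmd stop;   echo $?
    /etc/init.d/clvmd start;  echo $?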
On tweety-1, however, the log goes further than where tweety-2 stops:
Mar 24 04:23:39 tweety1 clurgmgrd[3379]: <info> Services Initialized
Mar 24 04:23:39 tweety1 clurgmgrd[3379]: <info> State change: Local UP
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> Starting stopped service service:LV-tweety1
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> Starting stopped service service:BOINC-t1
Mar 24 04:23:45 tweety1 clurgmgrd: [3379]: <err> script:BOINC: start of /etc/init.d/boinc failed (returned 1)
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> start on script "BOINC" returned 1 (generic error)
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <warning> #68: Failed to start service:BOINC-t1; return value: 1
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> Stopping service service:BOINC-t1
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> Service service:BOINC-t1 is recovering
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <warning> #71: Relocating failed service service:BOINC-t1
Mar 24 04:23:45 tweety1 clurgmgrd[3379]: <notice> Stopping service service:BOINC-t1
Mar 24 04:23:46 tweety1 clurgmgrd[3379]: <notice> Service service:BOINC-t1 is stopped
Mar 24 04:23:46 tweety1 clvmd: Cluster LVM daemon started - connected to CMAN
Mar 24 04:23:48 tweety1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "tweety:gfs0"
Mar 24 04:23:48 tweety1 kernel: GFS2: fsid=tweety:gfs0.0: Joined cluster. Now mounting FS...
Mar 24 04:23:49 tweety1 clurgmgrd[3379]: <notice> Service service:LV-tweety1 started
Mar 24 04:24:42 tweety1 kernel: dlm: closing connection to node 2
Mar 24 04:25:21 tweety1 kernel: dlm: closing connection to node 2
Mar 24 04:27:32 tweety1 kernel: dlm: closing connection to node 2
Can someone give me some food for thought as to what the problem might be? Do I need to provide more information?
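If it helps, I can also post the output of the usual status commands from both nodes right after boot, for example:

    clustat
    cman_tool status
    cman_tool nodes
    group_tool ls

(these are just the commands I normally use to check membership and rgmanager service state; let me know if any other output would be useful).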
Thank you all for your time.
Theophanis Kontogiannis