Adding a node to a CLVM cluster

I'm trying to join a new node to an existing 5-node CLVM cluster, but I just can't get it to work.

Whenever I add a new node (I add it to cluster.conf and reload with cman_tool version -r -S), I end up in a situation where the new node tries to gain quorum on its own, starts to fence the existing pool master, and appears to create some sort of split cluster. Corosync and dlm on the existing nodes do not seem to know about the recently added node at all. Does this work for anyone?
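
For reference, this is roughly the sequence I'm using (the service names assume the stock RHEL 6 init scripts and may differ elsewhere):

# on an existing member, after adding the <clusternode> entry and
# bumping config_version in /etc/cluster/cluster.conf:
cman_tool version -r -S   # push the updated config to the running cluster

# then on the new node (hv-b1dkcy1):
service cman start        # start corosync/fenced/dlm_controld and join
service clvmd start       # start clustered LVM on top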

New Node
==========

cman_tool nodes here reports only the new node itself (6) as a member:

Node  Sts   Inc   Joined               Name
   1   X      0                        hv-b1clcy1
   2   X      0                        hv-b1flcy1
   3   X      0                        hv-b1fmcy1
   4   X      0                        hv-b1dmcy1
   5   X      0                        hv-b1fkcy1
   6   M     80   2014-01-07 21:37:42  hv-b1dkcy1 <--- host added
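
To confirm that this one-node partition really considers itself quorate (with expected_votes="1" in cluster.conf a single vote is enough), I check the vote counts with the stock tooling:

cman_tool status | egrep 'Expected|Total|Quorum'   # expected/total votes and quorum state
cman_tool nodes                                    # the membership table above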


Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] The network interface [10.14.18.77] is now up.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Using quorum provider quorum_cman
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CMAN  ] CMAN 3.0.12.1 (built Sep  3 2013 09:17:34) started
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync configuration service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync profile loading service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Using quorum provider quorum_cman
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.65}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.67}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.68}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.70}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.66}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU member {10.14.18.77}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CMAN  ] quorum regained, resuming activity
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] This node is within the primary component and will provide service.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CPG   ] chosen downlist: sender r(0) ip(10.14.18.77) ; members(old:0 left:0)
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan  7 21:37:46 hv-b1dkcy1 fenced[12620]: fenced 3.0.12.1 started
Jan  7 21:37:46 hv-b1dkcy1 dlm_controld[12643]: dlm_controld 3.0.12.1 started
Jan  7 21:37:47 hv-b1dkcy1 gfs_controld[12695]: gfs_controld 3.0.12.1 started
Jan  7 21:37:54 hv-b1dkcy1 fenced[12620]: fencing node hv-b1clcy1
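
So within seconds of starting, the new node declares quorum on its own and fences hv-b1clcy1. To compare what fenced and dlm_controld see on each side, I use the cluster 3.x tools (output omitted here):

fence_tool ls   # fence domain membership as seen by fenced
dlm_tool ls     # dlm lockspaces and their members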

On the new node itself, the totem member list does include all six hosts:

sudo -i corosync-objctl | grep member

totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
totem.interface.member.memberaddr=hv-b1dkcy1
runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
runtime.totem.pg.mrp.srp.members.6.join_count=1
runtime.totem.pg.mrp.srp.members.6.status=joined


Existing Node
=============

Member 6 has not been added to the quorum list:

Jan  7 21:36:28 hv-b1clcy1 corosync[7769]:   [QUORUM] Members[4]: 1 2 3 5
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [CPG   ] chosen downlist: sender r(0) ip(10.14.18.65) ; members(old:4 left:0)
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [MAIN  ] Completed service synchronization, ready to provide service.


Node  Sts   Inc   Joined               Name
   1   M   4468   2013-12-10 14:33:27  hv-b1clcy1
   2   M   4468   2013-12-10 14:33:27  hv-b1flcy1
   3   M   5036   2014-01-07 17:51:26  hv-b1fmcy1
   4   X   4468                        hv-b1dmcy1 (dead at the moment)
   5   M   4468   2013-12-10 14:33:27  hv-b1fkcy1
   6   X      0                        hv-b1dkcy1  <--- added

The same corosync-objctl query on this node shows that hv-b1dkcy1 is missing from both the configured member list and the runtime membership:

totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined
runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
runtime.totem.pg.mrp.srp.members.4.join_count=1
runtime.totem.pg.mrp.srp.members.4.status=left
runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
runtime.totem.pg.mrp.srp.members.5.join_count=1
runtime.totem.pg.mrp.srp.members.5.status=joined
runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
runtime.totem.pg.mrp.srp.members.3.join_count=3
runtime.totem.pg.mrp.srp.members.3.status=joined


cluster.conf:

<?xml version="1.0"?>
<cluster config_version="32" name="hv-1618-110-1">
  <fence_daemon clean_start="0"/>
  <cman transport="udpu" expected_votes="1"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="hv-b1clcy1" votes="1" nodeid="1"><fence><method name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1fmcy1" votes="1" nodeid="3"><fence><method name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1dmcy1" votes="1" nodeid="4"><fence><method name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1fkcy1" votes="1" nodeid="5"><fence><method name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1flcy1" votes="1" nodeid="2"><fence><method name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1dkcy1" votes="1" nodeid="6"><fence><method name="single"><device name="human"/></method></fence></clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="manual"/>
  </fencedevices>
  <rm/>
</cluster>

(manual fencing just for testing)
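
To rule out a stale or invalid configuration, I also validate the XML and check that every node reports the same config number (cman_tool version prints something like "6.2.0 config 32"):

ccs_config_validate   # sanity-check cluster.conf before pushing it
cman_tool version     # run on each node; the config number should match everywhere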


corosync.conf (included for completeness; as far as I understand, corosync started via cman ignores this file and builds its configuration from cluster.conf):

compatibility: whitetank
totem {
  version: 2
  secauth: off
  threads: 0
  # fail_recv_const: 5000
  interface {
    ringnumber: 0
    bindnetaddr: 10.14.18.0
    mcastaddr: 239.0.0.4
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  # the pathname of the log file
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: off
  }
}

amf {
  mode: disabled
}


Many thanks,
Bjoern
