Re: Two-node cluster: Node attempts stateful merge after clean reboot

Pascal Ehlert <pascal@xxxxxxxxxxxx> · Wed, 11 Sep 2013 14:50:11 +0200



      The problem is that, if you enable cman on boot, the fenced node will
      try to join the cluster, fail to reach its peer after
      post_join_delay (default 6 seconds, iirc) and fence its peer.
      That peer reboots, starts cman, tries to connect, fences its
      peer...
      

      The easiest way to avoid this in 2-node clusters is to not let
      cman/rgmanager start automatically. That way, if a node is fenced,
      it will boot back up and you can log into it remotely (assuming it's
      not totally dead). When you know things are fixed, start cman
      manually.
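A toy Python model of the fence loop described in the quote, contrasting cman on boot with a manual start. It assumes the fault that triggered the first fence persists, so a rebooted node never sees its peer within post_join_delay. The `simulate` function and the node names are hypothetical; this is not how cman or fenced are implemented.

```python
def simulate(autostart_cman, max_events=6):
    """Return fencing events in a two-node cluster, starting with node1
    fencing node2. Assumes the original fault persists, so a rebooted
    node never sees its peer within post_join_delay."""
    events = []
    fencer, victim = "node1", "node2"
    while len(events) < max_events:
        events.append(f"{fencer} fences {victim}")
        if not autostart_cman:
            # The victim reboots but cman stays down, so it fences no one;
            # an admin can log in and start cman once things are fixed.
            break
        # cman starts on boot, the victim cannot reach its peer within
        # post_join_delay and fences it in turn: the roles swap.
        fencer, victim = victim, fencer
    return events


print(simulate(autostart_cman=True)[:3])
# ['node1 fences node2', 'node2 fences node1', 'node1 fences node2']
print(simulate(autostart_cman=False))
# ['node1 fences node2']
```

With autostart the events only stop at the artificial `max_events` cap; without it the loop ends after the first fence, which is the point of the advice.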
      

    In my case, however, the node that is trying to join is fully
    operational and has network access. Also, as the configuration in
    my original email shows, my post_join_delay is 360 seconds (for
    testing purposes), so a timeout cannot occur here.
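To make the timeout argument concrete, here is a simplified Python model of the post_join_delay check as it is described in the quoted message: after a node joins, any configured peer that has not become a member within post_join_delay seconds is marked for fencing. The function name and the timings are hypothetical, and the real fence daemon handles more cases than this sketch.

```python
def victims_after_join(configured_peers, first_seen, post_join_delay):
    """Return the peers that would be marked for fencing (simplified model).

    configured_peers: peer node names from the cluster configuration.
    first_seen: peer name -> seconds after our join at which that peer
        became a member; a peer absent from the dict was never seen.
    post_join_delay: grace period in seconds.
    """
    return sorted(
        peer
        for peer in configured_peers
        # A peer is spared only if it showed up within the grace period.
        if first_seen.get(peer, float("inf")) > post_join_delay
    )


# Hypothetical timings: the peer shows up 10 s after we join.
print(victims_after_join(["node2"], {"node2": 10}, 6))    # ['node2']
print(victims_after_join(["node2"], {"node2": 10}, 360))  # []
print(victims_after_join(["node2"], {}, 360))             # ['node2']
```

In this model a 360 s delay spares any peer that joins within six minutes, but a peer that is never seen is still fenced regardless of the setting.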

    
    I might be wrong here, but judging from corosync's log file, the
    other node even joins the cluster successfully before being marked
    for fencing by dlm_controld:

    
    Sep 11 11:14:09 corosync [CLM   ] CLM CONFIGURATION CHANGE
    Sep 11 11:14:09 corosync [CLM   ] New Configuration:
    Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
    Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
    Sep 11 11:14:09 corosync [CLM   ] Members Left:
    Sep 11 11:14:09 corosync [CLM   ] Members Joined:
    Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
    Sep 11 11:14:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
    Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
    Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
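A short Python sketch for pulling the membership lists out of a corosync CLM configuration-change block like this one, which makes it easy to confirm that 10.xx.xx.2 appears under "Members Joined". It assumes the exact line layout of this excerpt; the regular expressions and the `parse_config_change` name are hypothetical and may need adjusting for other corosync versions.

```python
import re

# Matches e.g. "[CLM   ] Members Joined:" and "[CLM   ]     r(0) ip(10.xx.xx.2)".
SECTION = re.compile(r"\[CLM\s*\]\s+(New Configuration|Members Left|Members Joined):")
MEMBER = re.compile(r"\[CLM\s*\]\s+r\(\d+\) ip\(([^)]+)\)")


def parse_config_change(lines):
    """Collect member addresses per CLM section from a corosync log excerpt."""
    result = {"New Configuration": [], "Members Left": [], "Members Joined": []}
    current = None
    for line in lines:
        if m := SECTION.search(line):
            current = m.group(1)
        elif (m := MEMBER.search(line)) and current:
            result[current].append(m.group(1))
        elif "[CLM" not in line:
            current = None  # a TOTEM/QUORUM line ends the CLM block
    return result


LOG = """\
Sep 11 11:14:09 corosync [CLM   ] CLM CONFIGURATION CHANGE
Sep 11 11:14:09 corosync [CLM   ] New Configuration:
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [CLM   ] Members Left:
Sep 11 11:14:09 corosync [CLM   ] Members Joined:
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
"""

print(parse_config_change(LOG.splitlines()))
# {'New Configuration': ['10.xx.xx.1', '10.xx.xx.2'], 'Members Left': [], 'Members Joined': ['10.xx.xx.2']}
```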
    
    
-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster