Re: help migrating corosync from Fedora to RHEL

Dan Frincu <df.cluster@xxxxxxxxx> · Thu, 26 Jul 2012 17:27:15 +0300

Hi,

On Wed, Jul 25, 2012 at 7:57 PM, Legault Mélanie
<melanie.legault.2@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> I have a 3-node cluster. I just migrated one node from Fedora 13 to RHEL 6.2.
>
> I copied the /etc/corosync/corosync.conf file to the upgraded server and started corosync, but it seems that the RHEL server is not able to join the existing cluster.
>
> Status on the other 2 nodes shows:
> Last updated: Wed Jul 25 12:19:58 2012
> Stack: openais
> Current DC: node2 - partition with quorum
> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
> 3 Nodes configured, 3 expected votes
> 4 Resources configured.
> ============
>
> Online: [ node1 node2 ]
> OFFLINE: [ node3 ]
>
> On node3:
>
> ============
> Last updated: Wed Jul 25 12:28:34 2012
> Last change: Wed Jul 25 11:26:06 2012 via crmd on node3
> Stack: openais
> Current DC: NONE
> 1 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Node node3: UNCLEAN (offline)
>
> then after a few minutes it changes to:
>
> Online: [ node3 ]
>
> Here is the /etc/corosync/corosync.conf file used on all 3 servers:
>
> compatibility: whitetank
>
> totem {
>         token: 5000
>         token_retransmits_before_loss_const: 20
>         join: 1000
>         consensus: 7500
>         vfstype: none
>         version: 2
>         secauth: off
>         threads: 0
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.11.12.0
>                 mcastaddr: 239.255.0.0
>                 mcastport: 5555
>         }
>
> }
>
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: yes
>         to_syslog: no
>         syslog_facility: daemon
>         logfile: /var/log/cluster/corosync.log
>         debug: off
>         timestamp: on
>         #logger_subsys {
>         #       subsys: AMF
>         #       debug: off
>         #}
> }
>
> amf {
>         mode: disabled
> }
>
>
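
Two things worth checking in this file: corosync expects consensus to be at least 1.2 * token (its default is exactly 1.2 * token), and bindnetaddr has to describe the network the ring interface is actually on. A minimal Python sketch of both checks, using the values quoted above and assuming a /24 netmask on the 10.11.12.x interfaces (the prefix length is a guess, it is not stated in the thread; the function names are hypothetical):

```python
import ipaddress

# Values copied from the quoted corosync.conf (totem section).
TOTEM = {"token": 5000, "consensus": 7500, "join": 1000}
BINDNETADDR = "10.11.12.0"

# Assumption: the ring interface has a /24 netmask. corosync picks the local
# interface whose network matches bindnetaddr, so set this to the real prefix.
NETMASK_PREFIX = 24


def check_timing(totem):
    """Return a list of problems with the totem timeouts (empty if none)."""
    problems = []
    # corosync wants consensus >= 1.2 * token; its default is 1.2 * token.
    if totem["consensus"] < 1.2 * totem["token"]:
        problems.append(
            f"consensus ({totem['consensus']} ms) should be at least "
            f"1.2 * token ({totem['token']} ms)"
        )
    return problems


def on_ring(node_ip, bindnetaddr=BINDNETADDR, prefix=NETMASK_PREFIX):
    """True if node_ip lies in the network described by bindnetaddr/prefix."""
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefix}", strict=False)
    return ipaddress.ip_address(node_ip) in network


for ip in ("10.11.12.11", "10.11.12.12", "10.11.12.13"):
    print(ip, "on ring 0:", on_ring(ip))
print("timing problems:", check_timing(TOTEM))
```

With the values above, check_timing(TOTEM) returns an empty list (7500 >= 6000) and all three addresses fall inside 10.11.12.0/24, so nothing obvious is wrong in the totem section itself.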
> I tried to import the CIB file saved from a working node into node3, but I got an error:
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations
> ERROR: cannot parse xml: no element found: line 1, column 0
> ERROR: No CIB!
>
>
> If I run corosync-objctl on node3, I get the following:
> ...
> runtime.totem.pg.mrp.srp.members.274761738.ip=r(0) ip(10.11.12.11)
> runtime.totem.pg.mrp.srp.members.274761738.join_count=1
> runtime.totem.pg.mrp.srp.members.274761738.status=joined
> runtime.totem.pg.mrp.srp.members.174098442.ip=r(0) ip(10.11.12.12)
> runtime.totem.pg.mrp.srp.members.174098442.join_count=1
> runtime.totem.pg.mrp.srp.members.174098442.status=joined
> runtime.totem.pg.mrp.srp.members.190875658.ip=r(0) ip(10.11.12.13)
> runtime.totem.pg.mrp.srp.members.190875658.join_count=1
> runtime.totem.pg.mrp.srp.members.190875658.status=joined
> ...
> as if node3 can see the other nodes.
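
The members list above can also be checked mechanically. The sketch below parses corosync-objctl text like the excerpt quoted above (the function names are made up for illustration) and collects every member whose status is "joined". If all three addresses show up, the totem ring itself has formed, which suggests the problem sits in the layer above corosync (Pacemaker) rather than in corosync membership.

```python
import re

# Excerpt of the corosync-objctl output quoted above.
SAMPLE = """\
runtime.totem.pg.mrp.srp.members.274761738.ip=r(0) ip(10.11.12.11)
runtime.totem.pg.mrp.srp.members.274761738.join_count=1
runtime.totem.pg.mrp.srp.members.274761738.status=joined
runtime.totem.pg.mrp.srp.members.174098442.ip=r(0) ip(10.11.12.12)
runtime.totem.pg.mrp.srp.members.174098442.join_count=1
runtime.totem.pg.mrp.srp.members.174098442.status=joined
runtime.totem.pg.mrp.srp.members.190875658.ip=r(0) ip(10.11.12.13)
runtime.totem.pg.mrp.srp.members.190875658.join_count=1
runtime.totem.pg.mrp.srp.members.190875658.status=joined
"""

# Lines look like: runtime.totem.pg.mrp.srp.members.<id>.<key>=<value>
MEMBER_RE = re.compile(r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+)=(.*)$")
# The ip value looks like "r(0) ip(10.11.12.11)".
IP_RE = re.compile(r"ip\(([^)]+)\)")


def ring_members(objctl_output):
    """Group member keys by member id: {id: {"ip": ..., "status": ...}}."""
    members = {}
    for line in objctl_output.splitlines():
        match = MEMBER_RE.match(line.strip())
        if match:
            member_id, key, value = match.groups()
            members.setdefault(int(member_id), {})[key] = value
    return members


def joined_ips(members):
    """Sorted ring-0 addresses of the members whose status is 'joined'."""
    ips = []
    for info in members.values():
        if info.get("status") == "joined":
            found = IP_RE.search(info.get("ip", ""))
            if found:
                ips.append(found.group(1))
    return sorted(ips)


print(joined_ips(ring_members(SAMPLE)))
```

On real output, save the full dump first (for example corosync-objctl > objctl.txt on node3) and feed the file contents to ring_members instead of SAMPLE.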

What versions of corosync and pacemaker do you have on node 3? What
version of corosync is on nodes 1 and 2?

>
> I do have the following in the log files:
> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity

This is a normal message: it means the node has no STONITH resources
defined. This happens because it can't see the rest of the cluster; if
it could, it would also get the cluster config from the other nodes.

>
>
> Could you give me a hint of what to do? The firewall is not the cause (I tested with it completely disabled). Are the Fedora and RHEL RPM packages incompatible?

Between Fedora 13 and RHEL 6 it may or may not work; the answer is that
it depends on the corosync and pacemaker versions on each side. The best
approach for a rolling upgrade would be to put the cluster in
maintenance-mode, upgrade all software, make sure it works, refresh,
reprobe and, if all is OK, take the cluster out of maintenance mode.
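
The procedure above, written out as a rough sketch. It assumes the crm shell is installed and uses its "configure property", "resource refresh" and "resource reprobe" commands; check the exact syntax against the crmsh version you actually have. It defaults to a dry run that only prints the commands, and the upgrade itself (plus verifying that the cluster works) stays a manual step between the phases. The names run_phase, ENTER_MAINTENANCE and so on are hypothetical.

```python
import subprocess

# Phase 1: stop Pacemaker from managing resources while software changes.
ENTER_MAINTENANCE = [
    ["crm", "configure", "property", "maintenance-mode=true"],
]

# Phase 2: after upgrading corosync/pacemaker on all nodes and checking that
# the cluster works, refresh resource state and re-probe the nodes.
AFTER_UPGRADE = [
    ["crm", "resource", "refresh"],
    ["crm", "resource", "reprobe"],
]

# Phase 3: hand resources back to the cluster once everything looks OK.
EXIT_MAINTENANCE = [
    ["crm", "configure", "property", "maintenance-mode=false"],
]


def run_phase(commands, dry_run=True):
    """Run (or only print, when dry_run is True) each command in order.

    Returns the command lines as strings so the plan can be logged or checked.
    """
    done = []
    for cmd in commands:
        line = " ".join(cmd)
        if dry_run:
            print("would run:", line)
        else:
            # check=True stops the procedure at the first failing command.
            subprocess.run(cmd, check=True)
        done.append(line)
    return done
```

Usage: call run_phase(ENTER_MAINTENANCE), upgrade and verify every node by hand, then call run_phase(AFTER_UPGRADE) and finally run_phase(EXIT_MAINTENANCE); pass dry_run=False only once the printed plan looks right.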

HTH,
Dan

>
> Thanks,
> Mélanie
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss

-- 
Dan Frincu
CCNA, RHCE