> Hi,
>
> On Wed, Jul 25, 2012 at 7:57 PM, Legault Mélanie
> <melanie.legault.2@xxxxxxxxxxxx> wrote:
>
>> Hello,
>>
>> I have a 3-node cluster; I just migrated one node from Fedora 13 to RHEL 6.2.
>>
>> I copied the /etc/corosync/corosync.conf file to the upgraded server and
>> started corosync, but it seems the RHEL server is not able to join the
>> existing cluster.
>>
>> Status on the other 2 nodes shows:
>>
>> Last updated: Wed Jul 25 12:19:58 2012
>> Stack: openais
>> Current DC: node2 - partition with quorum
>> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
>> 3 Nodes configured, 3 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ node1 node2 ]
>> OFFLINE: [ node3 ]
>>
>> On node3:
>>
>> ============
>> Last updated: Wed Jul 25 12:28:34 2012
>> Last change: Wed Jul 25 11:26:06 2012 via crmd on node3
>> Stack: openais
>> Current DC: NONE
>> 1 Nodes configured, 2 expected votes
>> 0 Resources configured.
>> ============
>>
>> Node node3: UNCLEAN (offline)
>>
>> and then after a few minutes it changes to
>>
>> Online: [ node3 ]
>>
>> Here is the /etc/corosync/corosync.conf file used on all 3 servers:
>>
>> compatibility: whitetank
>>
>> totem {
>>     token: 5000
>>     token_retransmits_before_loss_const: 20
>>     join: 1000
>>     consensus: 7500
>>     vfstype: none
>>     version: 2
>>     secauth: off
>>     threads: 0
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 10.11.12.0
>>         mcastaddr: 239.255.0.0
>>         mcastport: 5555
>>     }
>> }
>>
>> logging {
>>     fileline: off
>>     to_stderr: no
>>     to_logfile: yes
>>     to_syslog: no
>>     syslog_facility: daemon
>>     logfile: /var/log/cluster/corosync.log
>>     debug: off
>>     timestamp: on
>>     #logger_subsys {
>>     #    subsys: AMF
>>     #    debug: off
>>     #}
>> }
>>
>> amf {
>>     mode: disabled
>> }
>>
>> I tried to import the CIB file saved by a working node into node3, but I got an error:
>>
>> Signon to CIB failed: connection failed
>> Init failed, could not perform requested operations
>> ERROR: cannot parse xml: no element found: line 1, column 0
>> ERROR: No CIB!
>>
>> If I run corosync-objctl on node3, I get the following:
>>
>> ...
>> runtime.totem.pg.mrp.srp.members.274761738.ip=r(0) ip(10.11.12.11)
>> runtime.totem.pg.mrp.srp.members.274761738.join_count=1
>> runtime.totem.pg.mrp.srp.members.274761738.status=joined
>> runtime.totem.pg.mrp.srp.members.174098442.ip=r(0) ip(10.11.12.12)
>> runtime.totem.pg.mrp.srp.members.174098442.join_count=1
>> runtime.totem.pg.mrp.srp.members.174098442.status=joined
>> runtime.totem.pg.mrp.srp.members.190875658.ip=r(0) ip(10.11.12.13)
>> runtime.totem.pg.mrp.srp.members.190875658.join_count=1
>> runtime.totem.pg.mrp.srp.members.190875658.status=joined
>> ...
>>
>> as if node3 can see the other nodes.
>
> What versions of corosync and pacemaker do you have on node 3? What
> version of corosync is on nodes 1 and 2?
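For reference, a minimal sketch of how the details asked for here, and the state shown above, can be cross-checked on an RPM-based node like the ones in this thread. The cib-backup.xml file name is only an illustration, and the cibadmin step only works once the local cib process is reachable, which the "Signon to CIB failed" error above indicates it was not at the time.

  # Installed stack versions on each node
  rpm -q corosync pacemaker

  # Totem membership as corosync currently sees it
  corosync-objctl | grep members

  # One way to carry the CIB over from a healthy node
  cibadmin --query > cib-backup.xml               # run on node1 or node2
  cibadmin --replace --xml-file cib-backup.xml    # run on node3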
Nodes 1 & 2: corosync 1.3.1-1.fc13, pacemaker 1.1.4-5.fc13
Node 3:      corosync 1.4.1-7.el6, pacemaker 1.1.7-6.el6

>> I do have the following in the log files:
>>
>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
>
> This is a normal message: it means the node has no STONITH configured, which
> happens because it can't see the rest of the cluster (if it could, it would
> also have received the cluster configuration from the other nodes).

How come node 3 can't see nodes 1 & 2 if it can see them in the corosync-objctl output shown above?

>> Could you provide me with a hint of what to do? The firewall is not the
>> cause (I tested with it disabled entirely). Are Fedora and RHEL RPM-based
>> packages incompatible?
>
> Between Fedora 13 and RHEL 6 it may or may not work, so the answer is: it
> depends. The best approach for a rolling upgrade would be to put the cluster
> in maintenance-mode, upgrade all the software, make sure it works, refresh,
> reprobe, and if all is OK, take the cluster out of maintenance-mode.
>
> HTH,
> Dan
>
>> Thanks,
>> Mélanie
>
> --
> Dan Frincu
> CCNA, RHCE

Thanks,
Mélanie
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
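For completeness, a command-level sketch of the rolling-upgrade sequence Dan outlines above, assuming the crm shell is available on the nodes and that yum is the package manager; the package list in the update line is illustrative rather than taken from the thread. maintenance-mode leaves the resources running but unmanaged, so the software can be replaced underneath them without triggering failover.

  # Stop managing resources (they keep running where they are)
  crm configure property maintenance-mode=true

  # On each node in turn: upgrade the cluster stack
  yum update corosync pacemaker        # package names are an assumption

  # Once every node is upgraded and back online, re-read resource state
  crm resource refresh
  crm resource reprobe

  # If everything looks healthy, hand control back to the cluster
  crm configure property maintenance-mode=false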