Re: help migrating corosync from Fedora to RHEL

Steven Dake <sdake@xxxxxxxxxx> · Thu, 26 Jul 2012 12:12:56 -0700

On 07/26/2012 07:51 AM, Legault Mélanie wrote:
> 
>>
>> Hi,
>>
>> On Wed, Jul 25, 2012 at 7:57 PM, Legault Mélanie
>> <melanie.legault.2@xxxxxxxxxxxx> wrote:
>>
>>> Hello,
>>>
>>> I have a 3-node cluster, and I just migrated one node from Fedora 13 to RHEL 6.2.
>>>
>>> I copied the /etc/corosync.conf files to the upgraded server and started corosync, but it seems that the RHEL server is not able to join the existing cluster.
>>>
>>> Status on the other 2 nodes shows:
>>> Last updated: Wed Jul 25 12:19:58 2012
>>> Stack: openais
>>> Current DC: node2 - partition with quorum
>>> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
>>> 3 Nodes configured, 3 expected votes
>>> 4 Resources configured.
>>> ============
>>>
>>> Online: [ node1 node2 ]
>>> OFFLINE: [ node3 ]
>>>
>>> on node3:
>>>
>>> ============
>>> Last updated: Wed Jul 25 12:28:34 2012
>>> Last change: Wed Jul 25 11:26:06 2012 via crmd on node3
>>> Stack: openais
>>> Current DC: NONE
>>> 1 Nodes configured, 2 expected votes
>>> 0 Resources configured.
>>> ============
>>>
>>> Node node3: UNCLEAN (offline)
>>>
>>> then after a few minutes it changes to
>>>
>>> Online [ node3 ]
>>>
>>> Here is the /etc/corosync/corosync.conf file used on all 3 servers.
>>>
>>> compatibility: whitetank
>>>
>>> totem {
>>>         token: 5000
>>>         token_retransmits_before_loss_const: 20
>>>         join: 1000
>>>         consensus: 7500
>>>         vfstype: none
>>>         version: 2
>>>         secauth: off
>>>         threads: 0
>>>         interface {
>>>                 ringnumber: 0
>>>                 bindnetaddr: 10.11.12.0
>>>                 mcastaddr: 239.255.0.0
>>>                 mcastport: 5555
>>>         }
>>>
>>> }
>>>
>>> logging {
>>>         fileline: off
>>>         to_stderr: no
>>>         to_logfile: yes
>>>         to_syslog: no
>>>         syslog_facility: daemon
>>>         logfile: /var/log/cluster/corosync.log
>>>         debug: off
>>>         timestamp: on
>>>         #logger_subsys {
>>>         #       subsys: AMF
>>>         #       debug: off
>>>         #}
>>> }
>>>
>>> amf {
>>>         mode: disabled
>>> }
>>>
>>>
>>> I tried to import the CIB files saved by a working node into node3, but I got an error:
>>> Signon to CIB failed: connection failed
>>> Init failed, could not perform requested operations
>>> ERROR: cannot parse xml: no element found: line 1, column 0
>>> ERROR: No CIB!
>>>
>>>
>>> If I run corosync-objctl on node3 I get the following:
>>> ...
>>> runtime.totem.pg.mrp.srp.members.274761738.ip=r(0) ip(10.11.12.11)
>>> runtime.totem.pg.mrp.srp.members.274761738.join_count=1
>>> runtime.totem.pg.mrp.srp.members.274761738.status=joined
>>> runtime.totem.pg.mrp.srp.members.174098442.ip=r(0) ip(10.11.12.12)
>>> runtime.totem.pg.mrp.srp.members.174098442.join_count=1
>>> runtime.totem.pg.mrp.srp.members.174098442.status=joined
>>> runtime.totem.pg.mrp.srp.members.190875658.ip=r(0) ip(10.11.12.13)
>>> runtime.totem.pg.mrp.srp.members.190875658.join_count=1
>>> runtime.totem.pg.mrp.srp.members.190875658.status=joined
>>> ...
>>> as if node3 can see the other nodes
>>
>> What versions of corosync and pacemaker do you have on node 3? What
>> version of corosync is on nodes 1 and 2?
> node 1 & 2 
> corosync 1.3.1-1.fc13
> pacemaker 1.1.4-5.fc13
> 
> node 3
> corosync 1.4.1-7.el6
> pacemaker 1.1.7-6.el6
> 

Note our rolling upgrade support model is that you roll to the latest z
stream for the y you're on, then increase the y stream by one, set z to 1,
roll to the latest z stream again, and repeat until you're at the latest
version.

In your case:

corosync-1.3.1-1 to 1.3.5
corosync 1.3.5 to corosync-1.4.1
corosync 1.4.1 to corosync 1.4.3
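
The stepping rule above can be sketched in a few lines of Python. This is
only an illustration, not a supported tool: the function name
rolling_upgrade_path and the LATEST_Z table are hypothetical, and the
"latest z" values (1.3.5 and 1.4.3) are simply the ones listed in this
mail, so check what is current before relying on them.

```python
def rolling_upgrade_path(current, latest_z, first_z=1):
    """Return the versions to roll through, in order.

    current  -- (x, y, z) tuple of the installed version
    latest_z -- dict mapping each y stream to its latest z release
                (assumed values, supplied by the caller)
    first_z  -- z to land on right after bumping y (1, per the rule above)
    """
    x, y, z = current
    path = []
    while y in latest_z:
        # Step 1: roll to the latest z of the current y stream.
        if z < latest_z[y]:
            z = latest_z[y]
            path.append((x, y, z))
        # Stop when there is no newer y stream to move to.
        if y + 1 not in latest_z:
            break
        # Step 2: increase y by one and start at first_z.
        y, z = y + 1, first_z
        path.append((x, y, z))
    return path


# Latest z per y stream, taken from the list in this mail (assumption).
LATEST_Z = {3: 5, 4: 3}

hops = rolling_upgrade_path((1, 3, 1), LATEST_Z)
print(" -> ".join(".".join(map(str, v)) for v in hops))
# prints: 1.3.5 -> 1.4.1 -> 1.4.3
```

Each hop would be applied to every node in turn before moving on to the
next hop, so the cluster never mixes versions that are more than one
step apart.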

I believe there was an error somewhere in the 1.3 series that broke
on-wire compatibility, which was fixed in a later z stream to be
backward compatible. Try the above rolling upgrade method if you want to
keep things online. A simpler approach would be to upgrade to the latest
software and then deploy that.

Regards
-steve

>>
>>>
>>> I do have the following in the log files:
>>> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
>>> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
>>> Jul 25 12:43:37 [8203] node3    pengine:    error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
>>
>> This is a normal message; it means the node has no STONITH configured.
>> This happens because it can't see the rest of the cluster (if it could,
>> it would get the cluster config from the other nodes).
>>
> 
> How come node 3 can't see nodes 1 & 2 if it can see them, as shown in the corosync-objctl output (see above)?
> 
> 
>>>
>>>
>>> Could you give me a hint about what to do? The firewall is not the cause (I tested with it completely disabled). Are the Fedora and RHEL RPM packages incompatible?
>>
>> Between Fedora 13 and RHEL 6 it may or may not work, so the answer
>> is that it depends. The best approach for a rolling upgrade would be to
>> put the cluster in maintenance-mode, upgrade all software, make sure it
>> works, refresh, reprobe, and if all is OK, take the cluster out of
>> maintenance-mode.
>>
>> HTH,
>> Dan
>>
>>>
>>> Thanks,
>>> Mélanie
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Dan Frincu
>> CCNA, RHCE
> 
> thanks,
> Mélanie
> 
