On 07/26/2012 07:51 AM, Legault Mélanie wrote:
>
>>
>> Hi,
>>
>> On Wed, Jul 25, 2012 at 7:57 PM, Legault Mélanie
>> <melanie.legault.2@xxxxxxxxxxxx> wrote:
>>
>>> Hello,
>>>
>>> I have a 3-node cluster, and I just migrated one node from Fedora 13 to RHEL6.2
>>>
>>> I copied the /etc/corosync.conf file to the upgraded server and started corosync, but it seems that the RHEL server is not able to join the existing cluster.
>>>
>>> Status on the other 2 nodes shows:
>>> Last updated: Wed Jul 25 12:19:58 2012
>>> Stack: openais
>>> Current DC: node2 - partition with quorum
>>> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
>>> 3 Nodes configured, 3 expected votes
>>> 4 Resources configured.
>>> ============
>>>
>>> Online: [ node1 node2 ]
>>> OFFLINE: [ node3 ]
>>>
>>> on node3:
>>>
>>> ============
>>> Last updated: Wed Jul 25 12:28:34 2012
>>> Last change: Wed Jul 25 11:26:06 2012 via crmd on node3
>>> Stack: openais
>>> Current DC: NONE
>>> 1 Nodes configured, 2 expected votes
>>> 0 Resources configured.
>>> ============
>>>
>>> Node node3: UNCLEAN (offline)
>>>
>>> then after a few minutes it changes to
>>>
>>> Online [ node3 ]
>>>
>>> Here is the /etc/corosync/corosync.conf file used on all 3 servers.
>>>
>>> compatibility: whitetank
>>>
>>> totem {
>>>     token: 5000
>>>     token_retransmits_before_loss_const: 20
>>>     join: 1000
>>>     consensus: 7500
>>>     vfstype: none
>>>     version: 2
>>>     secauth: off
>>>     threads: 0
>>>     interface {
>>>         ringnumber: 0
>>>         bindnetaddr: 10.11.12.0
>>>         mcastaddr: 239.255.0.0
>>>         mcastport: 5555
>>>     }
>>>
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_stderr: no
>>>     to_logfile: yes
>>>     to_syslog: no
>>>     syslog_facility: daemon
>>>     logfile: /var/log/cluster/corosync.log
>>>     debug: off
>>>     timestamp: on
>>>     #logger_subsys {
>>>     #    subsys: AMF
>>>     #    debug: off
>>>     #}
>>> }
>>>
>>> amf {
>>>     mode: disabled
>>> }
>>>
>>>
>>> I tried to import the CIB file saved by a working node into node3 but I got an error:
>>> Signon to CIB failed: connection failed
>>> Init failed, could not perform requested operations
>>> ERROR: cannot parse xml: no element found: line 1, column 0
>>> ERROR: No CIB!
>>>
>>>
>>> If I run corosync-objctl on node3 I get the following:
>>> ...
>>> runtime.totem.pg.mrp.srp.members.274761738.ip=r(0) ip(10.11.12.11)
>>> runtime.totem.pg.mrp.srp.members.274761738.join_count=1
>>> runtime.totem.pg.mrp.srp.members.274761738.status=joined
>>> runtime.totem.pg.mrp.srp.members.174098442.ip=r(0) ip(10.11.12.12)
>>> runtime.totem.pg.mrp.srp.members.174098442.join_count=1
>>> runtime.totem.pg.mrp.srp.members.174098442.status=joined
>>> runtime.totem.pg.mrp.srp.members.190875658.ip=r(0) ip(10.11.12.13)
>>> runtime.totem.pg.mrp.srp.members.190875658.join_count=1
>>> runtime.totem.pg.mrp.srp.members.190875658.status=joined
>>> ...
>>> as if node3 can see the other nodes
>>
>> What versions of corosync and pacemaker do you have on node 3? What
>> version of corosync is on nodes 1 and 2?
> node 1 & 2
> corosync 1.3.1-1.fc13
> pacemaker 1.1.4-5.fc13
>
> node 3
> corosync 1.4.1-7.el6
> pacemaker 1.1.7-6.el6
>

Note our rolling upgrade support model is that you roll to the latest z
stream for the y you're on, then increase the y stream by one, set z to 1,
roll to the latest z stream again, and repeat until you're at the latest
version.

In your case:
corosync-1.3.1-1 to 1.3.5
corosync 1.3.5 to corosync-1.4.1
corosync 1.4.1 to corosync 1.4.3

I believe there was an error somewhere in the 3 series that broke
on-wire compatibility, which was fixed to be backward compatible in a
later z stream. Try the above rolling upgrade method if you want to keep
things online. A simpler approach would be to upgrade to the latest
software and then deploy that.

Regards
-steve

>>
>>>
>>> I do have the following in the log files:
>>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
>>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
>>> Jul 25 12:43:37 [8203] node3 pengine: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
>>
>> This is a normal message, it means the node doesn't have STONITH, this
>> happens because it can't see the rest of the cluster (which would also
>> mean it would get the cluster config from the other nodes).
>>
>
> How come node 3 can't see node 1 & 2 if it can see them as shown in the corosync-objctl output (see above)?
>
>
>>>
>>>
>>> Could you give me a hint of what to do? The firewall is not the cause (I tested with it completely disabled). Are the Fedora and RHEL RPM-based packages incompatible?
>>
>> Between Fedora 13 and RHEL 6 it may or may not work, so the answer
>> would be, it depends.
>> Best thing for a rolling upgrade would be to put
>> the cluster in maintenance-mode, upgrade all software, make sure it
>> works, refresh, reprobe, and if all is ok, take the cluster out of
>> maintenance mode.
>>
>> HTH,
>> Dan
>>
>>>
>>> Thanks,
>>> Mélanie
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Dan Frincu
>> CCNA, RHCE
>
> thanks,
> Mélanie
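
The rolling upgrade rule Steve describes (roll to the latest z of the current y stream, step y up by one starting at z = 1, and repeat) can be sketched as a small path calculation. This is only an illustration: `rolling_upgrade_path` and `latest_z` are hypothetical names, and the "latest z" values used in the example (1.3.5 and 1.4.3) are the ones quoted in this thread, not checked against current corosync releases.

```python
def rolling_upgrade_path(current, latest_z):
    """Return the ordered list of (x, y, z) versions to install.

    current  -- (x, y, z) tuple of the version installed now
    latest_z -- dict mapping (x, y) to the latest z release of that stream
                (hypothetical input; fill it in from your own repository)
    """
    x, y, z = current
    path = []
    while (x, y) in latest_z:
        top = latest_z[(x, y)]
        if z < top:
            # Roll to the latest z release of the current y stream.
            path.append((x, y, top))
        if (x, y + 1) not in latest_z:
            break  # no newer y stream known, so we are done
        # Step the y stream by one and start at z = 1.
        y, z = y + 1, 1
        path.append((x, y, z))
    return path


# The case from this thread: corosync 1.3.1 with 1.3.5 and 1.4.3 as the
# latest z releases of their streams (values taken from the reply above).
steps = rolling_upgrade_path((1, 3, 1), {(1, 3): 5, (1, 4): 3})
print(" -> ".join(".".join(map(str, v)) for v in steps))
```

With those inputs the steps come out as 1.3.5, then 1.4.1, then 1.4.3, matching the sequence listed in the reply. Each step would be applied to every node in turn before moving on to the next one.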