One problem with the below workflow.
7. You're going to need to copy this over manually, otherwise it
will fail; I've fallen victim to this before. All cluster nodes need to start on
the current revision of the file before you update it. I think this is a chicken-and-egg
problem (a rough sketch of what I mean is below).
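For what it's worth, a minimal sketch of the manual copy, assuming the standard
/etc/cluster/cluster.conf path and that the peer is reachable as node2
(substitute your own hostnames):

    # push the edited cluster.conf to the other node(s) by hand
    scp /etc/cluster/cluster.conf root@node2:/etc/cluster/cluster.conf
    # then eyeball that every member reports the same config version
    cman_tool status | grep -i "config version"

Nothing clever here, just making sure every node is sitting on the same revision
of the file before you bump it.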
One of
the things I have configured on my clusters is that all clustered services
start on their own runlevel; in my case the cluster services run on
runlevel 3, but the default boot goes to runlevel 2. This allows a node to boot up
and get the network before racing into the cluster (ideal when you want to find out
why it got fenced and fix the problem).
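As a rough sketch of that setup on CentOS/RHEL 5, assuming the usual cman and
rgmanager init scripts (add clvmd, gfs, etc. if you use them):

    # in /etc/inittab the default runlevel stays at 2 (no cluster):
    #   id:2:initdefault:
    # cluster daemons are only enabled in runlevel 3
    chkconfig --level 2345 cman off
    chkconfig --level 2345 rgmanager off
    chkconfig --level 3 cman on
    chkconfig --level 3 rgmanager on
    # once you are happy the node is healthy, join the cluster with:
    telinit 3

That way a fenced node comes back up reachable over the network but outside the
cluster, and you join it to the cluster deliberately.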
Everything else should work; I've just been through this myself
(albeit with 5 nodes). Your downtime should be quite
minimal.
Regards,
Peter Tiggerdine
HPC & eResearch Specialist
High Performance Computing Group
Information Technology Services
University of Queensland
Phone: +61 7 3346 6634
Fax: +61 7 3346 6630
Email: peter.tiggerdine@xxxxxxxxx
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Gianluca Cecchi
Sent: Tuesday, 3 November 2009 7:29 AM
To: David Teigland
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: share experience migrating cluster suite from centos 5.3 to centos 5.4
On Mon, Nov 2, 2009 at 6:25 PM, David Teigland <teigland@xxxxxxxxxx> wrote:
The out-of-memory should be fixed in 5.4:
https://bugzilla.redhat.com/show_bug.cgi?id=508829
The fix for dlm_send spinning is not released yet:
https://bugzilla.redhat.com/show_bug.cgi?id=521093
Dave
Thank you so much for the feedback.
So I have to expect this freeze and possible downtime on my real nodes too... In that case, would the method below be a safer one for my two nodes + quorum disk cluster?
1) shut down the passive node and restart it in single user mode
So now the cluster is composed of only the one node, still on 5.3, without loss of service for the moment
2) start the network and update the passive node (as in the steps of my first mail)
3) reboot the just-updated node into single user mode and test correct functionality (without the cluster)
4) shut down the just-updated node again
5) shut down the active node --- NOW we have (planned) downtime
6) start up the updated node, now on 5.4 (and with bug 508829 corrected)
This node should form the cluster with 2 votes, itself plus the quorum disk, correct? (see the cluster.conf fragment after this list)
7) IDEA: make a dummy update to the config on this newly running node, only incrementing the version number by one, so that later, when the other node comes up, it picks up the new config (see the command sketch after this list)....
Does this make sense, or is there no need/no problem here when the second node joins?
8) power on the node still on 5.3 in single user mode
9) start the network on it and update the system as in step 2)
10) reboot the just updated node and let it start in single user mode to test its functionality (without cluster enabled)
11) reboot again and let it join the cluster normally
Expected result: correct join of the cluster, correct?
12) Test a relocation of the service ----- NOW another little downtime, but to be sure that, in case of need, relocation works without problems
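On the vote question: a minimal cluster.conf fragment for a two-node + qdisk setup that would give that arithmetic (node names and the quorum disk label are placeholders, and the rest of the file is omitted):

    <cman expected_votes="3" two_node="0"/>
    <clusternodes>
      <clusternode name="node1" nodeid="1" votes="1"/>
      <clusternode name="node2" nodeid="2" votes="1"/>
    </clusternodes>
    <quorumd label="myqdisk" votes="1"/>

With expected_votes="3" the quorum is 2 votes, so a single node plus the quorum disk should indeed stay quorate.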
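And for steps 7 and 12, a rough sketch of the commands involved, assuming the stock RHEL/CentOS 5 tools (service and node names are placeholders):

    # step 7: edit /etc/cluster/cluster.conf and raise config_version by one,
    # changing nothing else, then propagate/activate it:
    ccs_tool update /etc/cluster/cluster.conf
    # or tell cman explicitly which config version to load:
    cman_tool version -r <new_version>

    # step 12: relocate the service to the other member and watch it with clustat
    clusvcadm -r <service_name> -m <other_node>
    clustat

Note that ccs_tool update relies on the members being reachable and on the current revision; if a node is not, copying the file over by hand is the fallback.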
I'm going to test this tomorrow (it's half past ten pm here now) after restoring the initial situation with both nodes on 5.3, so if there are any comments, they are welcome.
Thanks
Gianluca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster