Re: How to take down a CS/GFS setup with minimum downtime


Lon, "leave remove" works as advertised, but is there a way (e.g., a parameter) to get the same behaviour automatically when a downed node re-joins the cluster?  If I down more than one node using the default "leave remove", each departure decrements the expected votes properly and quorum is maintained.  But if I later start up just one of those nodes, the expected-votes count (and with it the quorum requirement) jumps all the way back to the value derived from cluster.conf, and in some cases I no longer have quorum!

Example:

11 nodes (2 nodes @ 5 votes each plus 9 nodes @ 1 vote each, expected = 19, quorum = 10)

"leave remove" 1 node @ 5 votes, quorum re-calculates as 8, total = 14
"leave remove" 1 node @ 1 vote, quorum re-calculates as 7, total = 13
"leave remove" 1 node @ 1 vote, quorum re-calculates as 7, total = 12
"leave remove" 1 node @ 1 vote, quorum re-calculates as 6, total = 11
"leave remove" 1 node @ 1 vote, quorum re-calculates as 6, total = 10
"leave remove" 1 node @ 1 vote, quorum re-calculates as 5, total = 9
"leave remove" 1 node @ 1 vote, quorum re-calculates as 5, total = 8

"cman_tool join" 1 node @ 1 vote, quorum re-calculates as 10 (expected votes back to 19), total = 9, inquorate!!
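(For what it's worth, those numbers are consistent with quorum being recalculated
as expected_votes / 2 + 1, using integer division: 19/2 + 1 = 10, 14/2 + 1 = 8,
and so on down to 8/2 + 1 = 5.)

If there is no parameter for this, the only workaround I can think of is to knock
expected votes back down by hand right after the node rejoins, something like:

  # run on any member once the rejoined node has pushed expected votes back up;
  # 9 is just the vote total actually present in the example above
  cman_tool expected -e 9
  cman_tool status        # confirm Expected_votes and Quorum drop again

but I'd much rather have cman do that automatically.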

Please advise, thanks.


On Wed, 2007-11-07 at 11:13 +0000, Sævaldur Arnar Gunnarsson wrote:
Thanks for this, Lon. I'm down to the last two member nodes, and according to
cman_tool status I have two nodes, two votes and a quorum of two:
--
Nodes: 2
Expected_votes: 5
Total_votes: 2
Quorum: 2   
--

One of those nodes has the GFS filesystems mounted.
If I issue cman_tool leave remove on the other node, will I run into any
problems on the GFS-mounted node (for example, due to quorum)?
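
My rough plan is to run the removal from the departing node and watch the vote
counts on the surviving node while it happens, something like:

  # on the node that is leaving
  cman_tool leave remove

  # on the surviving (GFS-mounted) node
  cman_tool status | grep -Ei 'votes|quorum'

but I'd like to be sure beforehand that the remaining node stays quorate.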



On Mon, 2007-10-29 at 10:56 -0400, Lon Hohberger wrote:

> That should do it, yes.  Leave remove is supposed to decrement the
> quorum count, meaning you can go from 5..1 nodes if done correctly.  You
> can verify that the expected votes count decreases with each removal
> using 'cman_tool status'.
> 
> 
> If for some reason the above doesn't work, the alternative looks
> something like this:
>   * unmount the GFS volume + stop cluster on all nodes
>   * use gfs_tool to alter the lock proto to nolock
>   * mount on node 1.  copy out data.  unmount!
>   * mount on node 2.  copy out data.  unmount!
>   * ...
>   * mount on node 5.  copy out data.  unmount!
> 
> -- Lon
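
If it does come down to that fallback route, I assume the lock-protocol change
and the per-node copy would look roughly like this (device path and mount point
below are only placeholders):

  # with the filesystem unmounted on every node and the cluster software stopped
  gfs_tool sb /dev/myvg/gfslv proto lock_nolock

  # then on one node at a time:
  mount -t gfs /dev/myvg/gfslv /mnt/gfs
  # ... copy the data off ...
  umount /mnt/gfs

since lock_nolock only lets a single node mount the filesystem at a time.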


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
