Re: [RFC] quorum module configuration bits

On Fri, Jan 13, 2012 at 3:05 AM, David Teigland <teigland@xxxxxxxxxx> wrote:
> On Thu, Jan 12, 2012 at 11:10:44AM +1100, Andrew Beekhof wrote:
>> On Thu, Jan 12, 2012 at 3:21 AM, David Teigland <teigland@xxxxxxxxxx> wrote:
>> >> I much prefer the expected_votes field to any enumeration of the nodes.
>> >
>> > Expect admins to keep track of what expected_votes should be?
>>
>> No. Have corosync do it automatically.
>> The quorum code knows how many nodes it has seen and how many votes they had.
>> expected_votes is already updated internally if the current number is
>> greater than what was configured, I'm only suggesting that it also be
>> recorded on disk.
>
> It does get close, but has gaps.  Some sort of recording like this may be
> beneficial regardless; I'm not saying it's a bad idea by any means.
> My argument is mainly: a node list is a very good thing on its own because
> it makes administration sane, *and* it happens to completely solve the
> expected_votes problem.  Both of those together make a node list a
> no-brainer to me (and I suspect most users).
>
> Some possible gaps:
>
> - Nodes start up with no value (the first time, or after the saved value
> is lost.)


The "first time" behaviour isn't really a gap, its intentional.
If the saved value is lost, then presumably so is the rest of corosync.conf

> Other options to deal with this are limited, and will have
> other problems.
>
> - I believe the way that expected_votes is updated is that it just doesn't
> automatically decrease.  This means that all nodes need to be members at
> once before EV will reach the correct value.  i.e. if you add a new node,
> but another node is not a member, EV will not be updated.

Right.
It's not intended to be a silver bullet; there are still cases where the
admin would need to get involved.

The value I see is in simplifying (to zero) the effort of some common
cases, without negatively impacting the rest.
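
To make that concrete, this is roughly the shape of configuration being
argued about; a minimal sketch, assuming votequorum-style syntax (the
exact corosync.conf format is of course the open question in this RFC):

    quorum {
        provider: corosync_votequorum

        # bumped automatically if corosync persists the highest value it
        # has seen; otherwise the admin raises it when nodes are added
        expected_votes: 3
    }

David's alternative is to derive the same number from an explicit list of
nodes instead of storing it directly.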

> - I can see these saved EV values becoming inconsistent, with few ways to
> reconcile/fix them.
>
>> > The alternative would be trying to remember what they all are?
>>
>> Why do you need to if expected votes is set correctly?
>>
>> > Defining the set of nodes that compose the cluster seems like
>> > a very good thing just for its own sake.
>
> You really want an authoritative list of nodes that compose the cluster
> just for the sanity of administration.  Say you go on vacation, come back,
> and don't remember all the machines that are supposed to be in the
> cluster.

Companies shut down their clusters because an admin went on vacation?
Sounds more like a developer problem to me.

>  Or say a new admin replaces you and doesn't know all the
> machines you set up to be in the cluster.  How do you go about figuring
> out what they all are?  There's no list of them, they may very well not
> all be online.  Some may be powered down, some may have been plugged into
> the wrong switch... you're helpless.  Say you think you've found them all,
> but one day an old machine is powered back up and suddenly you have this
> old rogue machine disrupting your cluster that you'd forgotten about.

So in this scenario, a cluster node dies and is left failed/offline
for long enough for the original admin to forget about it, or to get
marched out the door by the police after all relevant logs have been
purged.
That's one hell of an edge case.

>
> Or, say you want to write a script to do something on all the cluster
> machines, or just to start the cluster on all the nodes.  Where does your
> script get a list of nodes to iterate through?  (Note, this is the basis
> of the QE tests.)

Well, QE will be using Pacemaker, so this won't be an issue for them.
Although I doubt they'd even use our tools, since their scripts already
need the full list in order to provision the cluster and its config in
the first place.
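
Either way, the script David has in mind reduces to iterating over
whatever list is authoritative. A minimal sketch, assuming a hypothetical
nodelist section with ring0_addr entries in corosync.conf (that format is
exactly what's up for discussion here, so treat the parsing as
illustrative):

    # collect node addresses from a hypothetical nodelist section
    nodes=$(awk '/ring0_addr/ { print $2 }' /etc/corosync/corosync.conf)
    for n in $nodes; do
        ssh "$n" 'service corosync start'   # or whatever per-node action the test needs
    done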

>
>> On the other hand, I'd argue that forcing people to run
>> corosync-quorumtool and to then re-add the same information to the
>> config with an editor, on every existing node, when adding a new
>> member is inherently error prone.
>
> I don't understand this.  First, quorumtool won't give you a list of all
> the nodes, since all nodes is not equal to all members.

I meant all the active members, since they need to be taught about the new list.
I assume you're not suggesting they get shut down.

> Second, adding a
> node should be trivial: add the new node to existing corosync.conf, scp
> corosync.conf to all the nodes.

Again, don't forget updating the running corosync processes.
That's still 2 operations per node added.
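
A minimal sketch of those two operations, assuming corosync-quorumtool -e
can raise the runtime expected_votes (node names and the new count are
illustrative):

    # push the edited config and raise expected_votes on each existing member
    for n in node1 node2 node3; do
        scp /etc/corosync/corosync.conf "$n":/etc/corosync/
        ssh "$n" corosync-quorumtool -e 4   # expected_votes after adding a 4th node
    done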

And if there are any offline nodes at the time, you'll have to make
sure to update them before they come back online, possibly a number of
years later :-)
Otherwise I imagine corosync will boot any nodes that were added since
out of the cluster.
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss


