Re: automatic membership discovery

19.06.2014 16:50, Jan Friesse wrote:
> Patrick,
> so just to recapitulate your idea. Let's say you have a cluster with 2
> nodes. Now you decide to add a third node. Your idea is to properly
> configure the 3rd node (so that if we distributed that config file and
> called reload on every node, everything would work), in other words, to
> add the 3rd node ONLY to the config file on the 3rd node and then start
> corosync. The other nodes would just accept the node and add it to their
> membership (and probably to some kind of automatically generated
> persistent list of nodes). Do I understand it correctly?
> 
> Because if so, I believe it would also mean changing the config file,
> simply to keep them in sync. And honestly, keeping the config files in
> sync is for sure a way I would like to go, but that way is very hard.
> Every single thing must be very well defined (like what is synchronized
> and what is not).
> 
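For concreteness, a minimal nodelist for that scenario might look like the
following in the 3rd node's corosync.conf (addresses and nodeids are made
up for illustration); under this scheme, the third node {} block would
exist only in the new node's file until the other nodes learn of it:

    nodelist {
        node {
            ring0_addr: 192.168.1.11
            nodeid: 1
        }
        node {
            ring0_addr: 192.168.1.12
            nodeid: 2
        }
        node {
            # the new node; configured here only, not on nodes 1 and 2
            ring0_addr: 192.168.1.13
            nodeid: 3
        }
    }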

There was a probably relevant discussion (or part of one) started by a
message from Fabio, "[RFC] quorum module configuration bits" with id
<4F0C0E26.8030300@xxxxxxxxxx>, which is unfortunately not available in the
web archives.
http://lists.corosync.org/pipermail/discuss/2012-February has some
further discussion. I still have the full thread available and can resend
it to those who are interested.


Best,
Vladislav

> Regards,
>   Honza
> 
> Patrick Hemmer napsal(a):
>> *From:* Patrick Hemmer <corosync@xxxxxxxxxxxxxxx>
>> *Sent:* 2014-06-16 11:25:40 EDT
>> *To:* Jan Friesse <jfriesse@xxxxxxxxxx>, discuss@xxxxxxxxxxxx
>> *Subject:* Re: automatic membership discovery
>>
>>
>> On 2014/06/16 11:25, Jan Friesse wrote:
>>> Patrick,
>>>
>>>> I'm interested in having corosync automatically accept members into
>>>> the cluster without manual reconfiguration. Meaning that when I bring
>>>> a new node online, I want to configure it with the existing nodes,
>>>> and those nodes will automatically add the new node to their
>>>> nodelists.
>>>> From a purely technical standpoint, this doesn't seem like it would
>>>> be hard to do. The only 2 things you have to do to add a node are to
>>>> add the nodelist.node.X.nodeid and ring0_addr keys to cmap. When the
>>>> new node comes up, it starts sending out messages to the existing
>>>> nodes. The ring0_addr can be discovered from the source address, and
>>>> the nodeid is in the message.
>>>>
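As a rough illustration of how small that change is, the two keys could be
set at runtime with corosync-cmapctl (corosync 2.x; the index and the
values here are made up for the example):

    # add a third node (nodelist index 2) to the runtime cmap database
    corosync-cmapctl -s nodelist.node.2.ring0_addr str 192.168.1.13
    corosync-cmapctl -s nodelist.node.2.nodeid u32 3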
>>> I need to think about this a little deeper. It sounds like it may
>>> work, but I'm not entirely sure.
>>>
>>>> Going even further, when using the allow_downscale and last_man_standing
>>>> features, we can automatically remove nodes from the cluster when they
>>>> disappear. With last_man_standing, the quorum expected votes is
>>>> automatically adjusted when a node is lost, so it makes no difference
>>>> whether the node is offline or removed. Then with the auto-join
>>>> functionality, it'll automatically be added back in when it
>>>> re-establishes communication.
>>>>
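For reference, allow_downscale and last_man_standing are existing
votequorum options; a quorum section using them might look like this
(a sketch; see votequorum(5) for the exact semantics and caveats):

    quorum {
        provider: corosync_votequorum
        # recalculate expected_votes when nodes leave the cluster
        allow_downscale: 1
        # stay quorate as nodes drop away one at a time
        last_man_standing: 1
        # how long (ms) to wait before recalculating (default 10000)
        last_man_standing_window: 10000
    }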
>>>> It might then even be possible to write the cmap data out to a file
>>>> when a node joins or leaves. This way, if corosync restarts and the
>>>> corosync.conf hasn't been updated, the nodelist can be read from this
>>>> saved copy. If the saved copy is out of date and some nodes are
>>>> unreachable, they would simply be removed, and re-added when they
>>>> join. This wouldn't even have to be a part of corosync: an external
>>>> utility could watch the cmap values and take care of setting them
>>>> when corosync is launched.
>>>>
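A sketch of what such an external utility could look like, assuming the
"key (type) = value" output format of corosync-cmapctl and its -p option
for loading "key type value" lines (both from corosync 2.x; the sed
rewrite and the path are illustrative):

    #!/bin/sh
    # Persist the runtime nodelist so it survives a corosync restart.
    SAVE=/var/lib/corosync/nodelist.cmap

    case "$1" in
    save)
        # rewrite "key (type) = value" into the "key type value"
        # format that corosync-cmapctl -p expects
        corosync-cmapctl nodelist. \
            | sed 's/ (\(.*\)) = / \1 /' > "$SAVE"
        ;;
    restore)
        # feed the saved nodelist back into cmap after corosync starts
        [ -f "$SAVE" ] && corosync-cmapctl -p "$SAVE"
        ;;
    esac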
>>>> Ultimately this allows us to have a large scale dynamically sized
>>>> cluster without having to edit the config of every node each time a node
>>>> joins or leaves.
>>>>
>>> Actually, this is exactly what pcs does.
>> Unfortunately pcs has lots of issues.
>>
>>  1. It assumes you will be using pacemaker as well.
>>     In some of our uses, we are using corosync without pacemaker.
>>
>>  2. It still has *lots* of bugs, even more once you start trying to
>>     use non-Fedora-based distros. Some bugs have been open on the
>>     project for a year and a half.
>>
>>  3. It doesn't know the real address of its own host.
>>     What I mean is the case where a node sits behind NAT. We plan on
>>     running corosync inside a Docker container, and the container goes
>>     through NAT when it needs to talk to another host. So pcs would
>>     need to know the NAT address to advertise to the other hosts. With
>>     the method described here, that address is discovered automatically.
>>
>>  4. Doesn't handle automatic cleanup.
>>     If you remove a node, something has to go and clean that node up.
>>     Basically you would have to write a program that connects to the
>>     quorum service, monitors for nodes going down, and then removes
>>     them (see the sketch after this list). But then what happens if
>>     that node was only temporarily down? Who is responsible for adding
>>     it back into the cluster? If the node that was down is responsible
>>     for adding itself back in, what if another node joined the cluster
>>     while it was down? Its list will be incomplete. You could do a few
>>     things to try and alleviate these headaches, but automatic
>>     membership just feels more like the right solution.
>>
>>  5. It doesn't allow you to adjust the config file.
>>
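The cleanup program item 4 describes could be approximated with a polling
loop; a sketch (assuming "corosync-quorumtool -l" lists member nodeids and
"corosync-cmapctl -D" deletes keys by prefix, both present in corosync
2.x; the parsing below is deliberately naive):

    #!/bin/sh
    # Poll membership and delete the cmap nodelist entries of nodes
    # that have disappeared from the cluster.
    while sleep 10; do
        members=$(corosync-quorumtool -l)   # current membership listing
        corosync-cmapctl nodelist. | grep '\.nodeid ' |
        while read key _ _ nodeid; do
            # key looks like nodelist.node.<idx>.nodeid
            idx=$(echo "$key" | cut -d. -f3)
            if ! echo "$members" | grep -qw "$nodeid"; then
                corosync-cmapctl -D "nodelist.node.$idx."
            fi
        done
    done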
>>
>>
>>
>>>> This really doesn't sound like it would be hard to do. I might even be
>>>> willing to attempt implementing it myself if this sounds like something
>>>> that would be acceptable to merge into the code base.
>>>> Thoughts?
>>>>
>>> Yes, but the question is whether it is really worth it. I mean:
>>> - With multicast you have FULLY dynamic membership (a config sketch
>>> follows below)
>>> - PCS is able to distribute the config file, so adding a new node to
>>> a UDPU cluster is easy
>>>
>>> Do you see any use case where pcs or multicast doesn't work? (To
>>> clarify: I'm not criticizing your idea (actually I find it
>>> interesting), but I'm trying to find the real killer use case for
>>> this feature, whose implementation will almost surely take quite a
>>> lot of time.)
>>
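For comparison, the multicast mode mentioned above needs no nodelist at
all; a minimal totem section might look like this (addresses are
illustrative):

    totem {
        version: 2
        transport: udp
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0
            mcastaddr: 239.255.1.1
            mcastport: 5405
        }
    }

Any node configured with the same multicast address and port joins the
membership automatically.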
>> Aside from the pcs issues mentioned above, having this in corosync just
>> feels like the right solution: no external processes involved, no
>> additional lines of communication, and real-time on-demand updating.
>> The end goal might be achievable by modifying pcs to resolve these
>> issues, but is that the right way? If people want to use crmsh instead
>> of pcs, do they not get this functionality?
>>
>>> Regards,
>>>   Honza
>>>
>>>> -Patrick
>>>>
>>
>>
> 

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss



