Re: automatic membership discovery

Jan Friesse <jfriesse@xxxxxxxxxx> · Mon, 16 Jun 2014 17:52:04 +0200

Patrick,
thanks for pointing out some of the problems with PCS. I'm CC'ing the pcs
developers just to keep them in the loop (and maybe they know a solution).

Regards,
  Honza

> *From: *Patrick Hemmer <corosync@xxxxxxxxxxxxxxx>
> *Sent: * 2014-06-16 11:25:40 EDT
> *To: *Jan Friesse <jfriesse@xxxxxxxxxx>, discuss@xxxxxxxxxxxx
> *Subject: *Re:  automatic membership discovery
> 
> 
> On 2014/06/16 11:25, Patrick Hemmer wrote:
>> Patrick,
>>
>>> I'm interested in having corosync automatically accept members into the
>>> cluster without manual reconfiguration. Meaning that when I bring a new
>>> node online, I want to configure it for the existing nodes, and those
>>> nodes will automatically add the new node into their nodelist.
>>> From a purely technical standpoint, this doesn't seem like it would be
>>> hard to do. The only 2 things you have to do to add a node are add the
>>> nodelist.node.X.nodeid and ring0_addr to cmap. When the new node comes
>>> up, it starts sending out messages to the existing nodes. The ring0_addr
>>> can be discovered from the source address, and the nodeid is in the message.
>>>
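As an illustration of the two cmap entries mentioned above, here is a minimal sketch that only builds the corosync-cmapctl command lines for adding one node. It assumes the `-s key type value` form of corosync-cmapctl; the helper name, the list index, and the example address are hypothetical, and nothing is actually executed.

```python
def cmap_add_node_commands(index, nodeid, ring0_addr):
    """Build (but do not run) the commands that would add one nodelist entry.

    index      -- position X in nodelist.node.X (hypothetical value chosen by the caller)
    nodeid     -- node ID carried in the new node's messages
    ring0_addr -- address the new node was seen sending from
    """
    prefix = f"nodelist.node.{index}"
    return [
        ["corosync-cmapctl", "-s", f"{prefix}.nodeid", "u32", str(nodeid)],
        ["corosync-cmapctl", "-s", f"{prefix}.ring0_addr", "str", ring0_addr],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in cmap_add_node_commands(2, 3, "10.0.0.3"):
        print(" ".join(cmd))
```

Returning the commands rather than running them keeps the sketch safe to try on a machine without corosync installed.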
>> I need to think about this a little more deeply. It sounds like it may
>> work, but I'm not entirely sure.
>>
>>> Going even further, when using the allow_downscale and last_man_standing
>>> features, we can automatically remove nodes from the cluster when they
>>> disappear. With last_man_standing, the quorum expected votes is
>>> automatically adjusted when a node is lost, so it makes no difference
>>> whether the node is offline, or removed. Then with the auto-join
>>> functionality, it'll automatically be added back in when it
>>> re-establishes communication.
>>>
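The effect of last_man_standing on expected votes can be shown with a toy model. This is a simplified sketch, not the votequorum implementation: it assumes one vote per node, a quorum threshold of expected_votes / 2 + 1 (integer division), and that expected votes are lowered immediately after each loss, ignoring the last_man_standing_window delay. The function names are hypothetical.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum, assuming one vote per node."""
    return expected_votes // 2 + 1


def simulate_losses(total_nodes, losses):
    """Lose nodes one at a time and recalculate expected votes after each loss.

    Returns a list of (active_nodes, expected_votes, quorate) tuples.
    """
    expected = total_nodes
    history = []
    for _ in range(losses):
        active = expected - 1
        if active < quorum_threshold(expected):
            # Quorum is lost, so expected votes are no longer adjusted.
            history.append((active, expected, False))
            break
        # Still quorate: the lost node no longer counts toward expected votes.
        expected = active
        history.append((active, expected, True))
    return history


if __name__ == "__main__":
    for active, expected, quorate in simulate_losses(8, 7):
        print(f"active={active} expected_votes={expected} quorate={quorate}")
```

In this model an 8-node cluster stays quorate down to 2 nodes, because each loss shrinks expected votes before the next one happens.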
>>> It might then even be possible to write the cmap data out to a file when
>>> a node joins or leaves. This way if corosync restarts, and the
>>> corosync.conf hasn't been updated, the nodelist can be read from this
>>> save. If the save is out of date, and some nodes are unreachable, they
>>> would simply be removed, and added when they join.
>>> This wouldn't even have to be a part of corosync. Could have some
>>> external utility watch the cmap values, and take care of setting them
>>> when corosync is launched.
>>>
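A minimal sketch of the external save-and-restore utility described above, assuming the nodelist is kept as a simple nodeid-to-address mapping in a JSON file. The file format, function names, and the way reachability is supplied are all assumptions for illustration; watching cmap itself is left out.

```python
import json
import os
import tempfile


def save_nodelist(path, nodes):
    """Write the nodelist atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(nodes, fh)
    os.replace(tmp_path, path)


def load_nodelist(path, reachable_addrs):
    """Load the saved nodelist, dropping nodes that are currently unreachable.

    Dropped nodes are expected to be added again when they rejoin.
    """
    try:
        with open(path) as fh:
            saved = json.load(fh)
    except FileNotFoundError:
        return {}
    return {nid: addr for nid, addr in saved.items() if addr in reachable_addrs}
```

A caller would invoke save_nodelist on every join or leave, and load_nodelist once at startup when corosync.conf has not been updated.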
>>> Ultimately this allows us to have a large scale dynamically sized
>>> cluster without having to edit the config of every node each time a node
>>> joins or leaves.
>>>
>> Actually, this is exactly what pcs does.
> Unfortunately pcs has lots of issues.
> 
>  1. It assumes you will be using pacemaker as well.
>     In some of our uses, we are using corosync without pacemaker.
> 
>  2. It still has *lots* of bugs. Even more once you start trying to use
>     non-fedora based distros.
>     Some bugs have been open on the project for a year and a half.
> 
>  3. It doesn't know the real address of its own host.
>     What I mean is when a node is sitting behind NAT. We plan on running
>     corosync inside a docker container, and the container goes through
>     NAT if it needs to talk to another host. So pcs would need to know
>     the NAT address to advertise it to the other hosts. With the method
>     described here, that address is automatically discovered.
> 
>  4. Doesn't handle automatic cleanup.
>     If you remove a node, something has to go and clean that node up.
>     Basically you would have to write a program to connect to the quorum
>     service and monitor for nodes going down, and then remove them. But
>     then what happens if that node was only temporarily down? Who is
>     responsible for adding it back into the cluster? If the node that
>     was down is responsible for adding itself back in, what if another
>     node joined the cluster while it was down? Its list will be
>     incomplete. You could do a few things to try and alleviate these
>     headaches, but automatic membership just feels more like the right
>     solution.
> 
>  5. It doesn't allow you to adjust the config file.
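The address discovery in point 3 can be sketched with a plain UDP socket: the receiver takes the peer's address from the packet source rather than from anything the peer advertises, so a NAT-translated address is picked up as seen from the receiving side. The 4-byte big-endian nodeid payload is a made-up message format for this example only and is not the corosync wire protocol.

```python
import socket


def discover_peer(sock):
    """Return (nodeid, source_address) for the next datagram on sock.

    The nodeid comes from the payload; the address comes from the packet
    source, which is the translated address when the sender is behind NAT.
    """
    data, (addr, _port) = sock.recvfrom(64)
    nodeid = int.from_bytes(data[:4], "big")
    return nodeid, addr


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto((7).to_bytes(4, "big"), receiver.getsockname())

    print(discover_peer(receiver))
    sender.close()
    receiver.close()
```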
> 
>>> This really doesn't sound like it would be hard to do. I might even be
>>> willing to attempt implementing it myself if this sounds like something
>>> that would be acceptable to merge into the code base.
>>> Thoughts?
>>>
>> Yes, but the question is whether it is really worth it. I mean:
>> - With multicast you have FULLY dynamic membership
>> - PCS is able to distribute the config file, so adding a new node to a
>> UDPU cluster is easy
>>
>> Do you see any use case where pcs or multicast doesn't work? (To
>> clarify: I'm not criticizing your idea (I actually find it interesting),
>> but I'm trying to find a real killer use case for this feature, whose
>> implementation will almost certainly take quite a lot of time.)
> 
> Aside from the pcs issues mentioned above, having this in corosync just
> feels like the right solution. No external processes involved, no
> additional lines of communication, real-time on-demand updating. The end
> goal might be able to be accomplished by modifying pcs to resolve the
> issues, but is that the right way? If people want to use crmsh over pcs,
> do they not get this functionality?
> 
>> Regards,
>>   Honza
>>
>>> -Patrick
>>>
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>
> 
> 
