Re: automatic membership discovery

Mahadevan,

> Hi,
> 
> Just a thought: would it also be possible to make this an optional feature when setting up the cluster? The feature is good, but I would like a way to ensure that the existing nodes do not accept a new node unless it is present in their local config file. That would give system managers the flexibility to choose whichever behavior is appropriate for them.

Sure. If this feature is ever implemented, it would definitely be behind an option,
something like "auto_accept_node", which would have to be explicitly set to on and
would not be the default.
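Purely as a sketch of what that could look like in corosync.conf (the option name is hypothetical; no such option exists today):

```
totem {
    version: 2
    # Hypothetical option discussed in this thread; it would default
    # to off, so existing behaviour is unchanged unless a system
    # manager explicitly enables it.
    auto_accept_node: on
}
```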

Honza

> 
> Regards
> Nilakantan
> 
> 
> -----Original Message-----
> From: discuss-bounces@xxxxxxxxxxxx [mailto:discuss-bounces@xxxxxxxxxxxx] On Behalf Of Jan Friesse
> Sent: Thursday, June 19, 2014 7:20 PM
> To: Patrick Hemmer; discuss@xxxxxxxxxxxx
> Subject: Re:  automatic membership discovery
> 
> Patrick,
> so just to recapitulate your idea. Let's say you have a cluster with 2 nodes, and you decide to add a third. Your idea is to properly configure the 3rd node (so that if we distributed that config file and called reload on every node, everything would work), in other words, to add the 3rd node ONLY to the config file on the 3rd node and then start corosync. The other nodes would simply accept the node and add it to their membership (and probably to some kind of automatically generated persistent list of nodes). Do I understand it correctly?
> 
> Because if so, I believe it would also mean changing the config files on the other nodes, simply to keep them in sync. And honestly, keeping the config files in sync is definitely the way I would like to go, but that way is very hard. Every single thing must be very well defined (like what is synchronized and what is not).
> 
> Regards,
>   Honza
> 
> Patrick Hemmer wrote:
>> *From:* Patrick Hemmer <corosync@xxxxxxxxxxxxxxx>
>> *Sent:* 2014-06-16 11:25:40 EDT
>> *To:* Jan Friesse <jfriesse@xxxxxxxxxx>, discuss@xxxxxxxxxxxx
>> *Subject:* Re: automatic membership discovery
>>
>>
>> On 2014/06/16 11:25, Patrick Hemmer wrote:
>>> Patrick,
>>>
>>>> I'm interested in having corosync automatically accept members into 
>>>> the cluster without manual reconfiguration. Meaning that when I 
>>>> bring a new node online, I want to configure it for the existing 
>>>> nodes, and those nodes will automatically add the new node into their nodelist.
>>>> From a purely technical standpoint, this doesn't seem like it would 
>>>> be hard to do. The only 2 things you have to do to add a node are 
>>>> add the nodelist.node.X.nodeid and ring0_addr to cmap. When the new 
>>>> node comes up, it starts sending out messages to the existing nodes. 
>>>> The ring0_addr can be discovered from the source address, and the nodeid is in the message.
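For reference, the two cmap keys in question look like this in `corosync-cmapctl` output (the node index and values here are only an example):

```
nodelist.node.2.nodeid (u32) = 3
nodelist.node.2.ring0_addr (str) = 192.168.1.13
```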
>>>>
>>> I need to think about this little deeper. It sounds like it may work, 
>>> but I'm not entirely sure.
>>>
>>>> Going even further, when using the allow_downscale and 
>>>> last_man_standing features, we can automatically remove nodes from 
>>>> the cluster when they disappear. With last_man_standing, the quorum 
>>>> expected votes is automatically adjusted when a node is lost, so it 
>>>> makes no difference whether the node is offline, or removed. Then 
>>>> with the auto-join functionality, it'll automatically be added back 
>>>> in when it re-establishes communication.
>>>>
>>>> It might then even be possible to write the cmap data out to a file 
>>>> when a node joins or leaves. This way if corosync restarts, and the 
>>>> corosync.conf hasn't been updated, the nodelist can be read from 
>>>> this save. If the save is out of date, and some nodes are 
>>>> unreachable, they would simply be removed, and added when they join.
>>>> This wouldn't even have to be a part of corosync. Could have some 
>>>> external utility watch the cmap values, and take care of setting 
>>>> them when corosync is launched.
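The external-utility idea above could start as something very simple: snapshot the nodelist keys from `corosync-cmapctl` output to a file. A rough Python sketch (the cmapctl output format is real; the polling design and file path are illustrative assumptions, not an existing tool):

```python
import re

# Matches corosync-cmapctl output lines such as:
#   nodelist.node.0.nodeid (u32) = 1
#   nodelist.node.0.ring0_addr (str) = 192.168.1.11
CMAP_LINE = re.compile(r'^(nodelist\.node\.\d+\.\S+) \((\w+)\) = (.*)$')

def parse_nodelist(cmapctl_output):
    """Extract nodelist.node.* keys from `corosync-cmapctl` output."""
    nodes = {}
    for line in cmapctl_output.splitlines():
        m = CMAP_LINE.match(line)
        if m:
            key, _type, value = m.groups()
            nodes[key] = value
    return nodes

def save_nodelist(nodes, path):
    """Persist the discovered nodelist so it can be restored after a restart."""
    with open(path, 'w') as f:
        for key in sorted(nodes):
            f.write('%s: %s\n' % (key, nodes[key]))

# Typical use (requires a running corosync, so not shown executing here):
#   out = subprocess.check_output(['corosync-cmapctl']).decode()
#   save_nodelist(parse_nodelist(out), '/var/lib/corosync/nodelist.snapshot')
```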
>>>>
>>>> Ultimately this allows us to have a large scale dynamically sized 
>>>> cluster without having to edit the config of every node each time a 
>>>> node joins or leaves.
>>>>
>>> Actually, this is exactly what pcs does.
>> Unfortunately pcs has lots of issues.
>>
>>  1. It assumes you will be using pacemaker as well.
>>     In some of our uses, we are using corosync without pacemaker.
>>
>>  2. It still has *lots* of bugs, even more once you start trying to use
>>     non-Fedora-based distros.
>>     Some bugs have been open on the project for a year and a half.
>>
>>  3. It doesn't know the real address of its own host.
>>     What I mean is when a node is sitting behind NAT. We plan on running
>>     corosync inside a docker container, and the container goes through
>>     NAT if it needs to talk to another host. So pcs would need to know
>>     the NAT address to advertise it to the other hosts. With the method
>>     described here, that address is automatically discovered.
>>
>>  4. Doesn't handle automatic cleanup.
>>     If you remove a node, something has to go and clean that node up.
>>     Basically you would have to write a program to connect to the quorum
>>     service and monitor for nodes going down, and then remove them. But
>>     then what happens if that node was only temporarily down? Who is
>>     responsible for adding it back into the cluster? If the node that
>>     was down is responsible for adding itself back in, what if another
>>     node joined the cluster while it was down? Its list will be
>>     incomplete. You could do a few things to try and alleviate these
>>     headaches, but automatic membership just feels more like the right
>>     solution.
>>
>>  5. It doesn't allow you to adjust the config file.
>>
>>
>>
>>
>>>> This really doesn't sound like it would be hard to do. I might even 
>>>> be willing to attempt implementing it myself if this sounds like 
>>>> something that would be acceptable to merge into the code base.
>>>> Thoughts?
>>>>
>>> Yes, but the question is whether it is really worth it. I mean:
>>> - With multicast you have FULLY dynamic membership
>>> - pcs is able to distribute the config file, so adding a new node to a
>>> UDPU cluster is easy
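For comparison, a multicast setup needs no per-node list at all; membership is fully dynamic with just a totem interface section like this (addresses are illustrative):

```
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
}
```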
>>>
>>> Do you see any use case where pcs or multicast doesn't work? (To
>>> clarify: I'm not criticizing your idea (actually I find it interesting),
>>> but I'm trying to find the real killer use case for this feature,
>>> whose implementation will almost certainly take quite a lot of time.)
>>
>> Aside from the pcs issues mentioned above, having this in corosync
>> just feels like the right solution. No external processes involved, no
>> additional lines of communication, real-time on-demand updating. The
>> end goal might be achievable by modifying pcs to resolve those
>> issues, but is that the right way? If people want to use crmsh
>> instead of pcs, do they not get this functionality?
>>
>>> Regards,
>>>   Honza
>>>
>>>> -Patrick
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> discuss@xxxxxxxxxxxx
>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>
>>
>>
> 
> 




