Re: RFC: Extending corosync to high node counts

I think if you don't care about performance, you can have a daemon (a
second process) connect to corosync as a CPG client and maintain an
overlay network on top of CPG. Many other external endpoints could then
connect to this daemon over TCP.
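
A minimal sketch of that kind of relay daemon, assuming the standard
libcpg API; the TCP fan-out (forward_to_tcp_clients) is a hypothetical
stub and the group name is arbitrary:

#include <stdio.h>
#include <string.h>
#include <corosync/cpg.h>

/* Hypothetical stub: a real daemon would write the message to every
 * connected TCP endpoint here. */
static void forward_to_tcp_clients(const void *msg, size_t len)
{
        printf("relaying %zu bytes\n", len);
}

/* Called by libcpg for every message delivered to the group. */
static void deliver_cb(cpg_handle_t handle,
                       const struct cpg_name *group_name,
                       uint32_t nodeid, uint32_t pid,
                       void *msg, size_t msg_len)
{
        forward_to_tcp_clients(msg, msg_len);
}

static cpg_callbacks_t callbacks = {
        .cpg_deliver_fn = deliver_cb,
        .cpg_confchg_fn = NULL,
};

int main(void)
{
        cpg_handle_t handle;
        struct cpg_name group;

        strcpy(group.value, "relay");
        group.length = strlen(group.value);

        if (cpg_initialize(&handle, &callbacks) != CS_OK ||
            cpg_join(handle, &group) != CS_OK)
                return 1;

        /* Dispatch deliveries to deliver_cb until the process is killed. */
        cpg_dispatch(handle, CS_DISPATCH_BLOCKING);
        cpg_finalize(handle);
        return 0;
}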

The problem with scaling totem, by the way, isn't virtual synchrony; it
is the membership protocol, which creates a fully meshed network.
Membership protocols that maintain a full mesh are expensive to set up
but cheap to maintain (in terms of network protocol activity).

I'm happy to see people thinking about how to make corosync scale past
the ~30-node limit that has historically applied in practice.

Regards
-steve

On Mon, Mar 23, 2015 at 2:09 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
On 23/03/15 03:11, Andrew Beekhof wrote:
>
>> On 19 Mar 2015, at 9:05 pm, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>
>> Extending corosync
>> ------------------
>>
>> This is an idea that came out of several discussions at the cluster
>> summit in February. Please comment!
>>
>> It is not meant to be a generalised solution to extending corosync for
>> most users. For single- and double-digit cluster sizes the current ring
>> protocols should be sufficient. This is intended to make corosync usable
>> at much larger node counts.
>>
>> The problem
>> -----------
>> Corosync doesn't scale well to large numbers of nodes (from 60-100 up
>> to 1000s). This is mainly down to the requirements of virtual synchrony
>> (VS) and the ring protocol.
>>
>> A proposed solution
>> -------------------
>> Have 'satellite' nodes that are not part of the ring (and do not
>> participate in VS). They communicate via a single 'host' node over
>> (possibly) TCP. The host sends messages to them in a 'send and forget'
>> fashion - though TCP guarantees ordering and delivery. Host nodes can
>> support many satellites. If a host goes down, its satellites can
>> reconnect to another node and carry on.
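
(A sketch of the host-to-satellite leg, assuming plain TCP. Corosync
messages are packets while TCP is a byte stream - see the potential
problems below - so each message gets a length prefix; the function
name and framing here are illustrative, not an agreed wire format.)

#include <stdint.h>
#include <sys/uio.h>
#include <arpa/inet.h>

/* Send one message to a satellite: 4-byte big-endian length, then the
 * payload.  TCP guarantees ordering and delivery, so no application
 * ack is needed ('send and forget').  A real implementation would
 * handle partial writes and full send buffers (the suggestion below
 * is simply to cut the satellite off). */
static int send_to_satellite(int sock, const void *msg, uint32_t len)
{
        uint32_t hdr = htonl(len);
        struct iovec iov[2] = {
                { .iov_base = &hdr,        .iov_len = sizeof(hdr) },
                { .iov_base = (void *)msg, .iov_len = len },
        };

        return writev(sock, iov, 2) == (ssize_t)(sizeof(hdr) + len) ? 0 : -1;
}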
>>
>> Satellites have no votes, and do not participate in Virtual Synchrony.
>>
>> Satellites can send/receive CPG messages and get quorum information,
>> but will not appear in the quorum nodes list.
>>
>> There must be a separate nodes list for satellites, probably maintained
>> by a different subsystem. Satellites will have nodeIDs (required for
>> CPG) that do not clash with the ring nodeids.
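
(One illustrative way to keep the two ID spaces disjoint - purely
hypothetical, not something the RFC specifies - is to reserve the top
bit of the 32-bit nodeid for satellites:)

#include <stdint.h>

/* Hypothetical convention: ring nodeids keep the top bit clear,
 * satellite nodeids have it set, so the two can never clash. */
#define SATELLITE_NODEID_FLAG UINT32_C(0x80000000)

static inline int nodeid_is_satellite(uint32_t nodeid)
{
        return (nodeid & SATELLITE_NODEID_FLAG) != 0;
}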
>>
>>
>> Appearance to the user/admin
>> ----------------------------
>> corosync.conf defines which nodes are satellites and which nodes to
>> connect to (initially). We may want some utility to force satellites to
>> migrate off a node if it gets full.
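
(Perhaps something like the fragment below - the 'satellite' and 'host'
options are invented for illustration; the RFC does not define any
syntax:)

nodelist {
    node {
        ring0_addr: ring-node-1
        nodeid: 1
    }
    node {
        ring0_addr: sat-node-200
        nodeid: 200
        # hypothetical option: not part of the ring, no vote
        satellite: yes
        # hypothetical option: node to connect to initially
        host: ring-node-1
    }
}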
>>
>> Future: automatic configuration of who is in the VS cluster and who is
>> a satellite; load balancing. We may need 'preferred nodes' to avoid bad
>> network topologies.
>>
>>
>> Potential problems
>> ------------------
>> corosync uses a packet-based protocol, TCP is a stream (I don't see
>> this as a big problem, TBH)
>> Where to hook the message transmission in the corosync networking stack?
>>  - We don't need a lot of the totem messages
>>  - maybe hook into group 'a' and/or 'sync' (do we need 'sync' in
>>    satellites? [CPG, so probably yes])
>> Which is client/server? (if satellites are clients with the authkey we
>> get easy failover and config, but ... DoS potential??)
>> What if TCP buffers get full? Suggest just cutting off the node.
>> How to stop satellites from running totemsrp?
>> Fencing, do we need it? (a pacemaker problem?)
>
> That has traditionally been the model, and it still seems appropriate.
> However, Darren raises an interesting point... how will satellites know
> which is the "correct" partition to connect to?
>
> What would it look like if we flipped it around and had the full peers
> connecting to the satellites? You could then tie that to having quorum.
> You also know that a fenced full peer won't have any connections.
> Safety on two levels.
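
(A sketch of what the flipped model's connection logic might look like;
everything here except the idea is invented for illustration -
cluster_is_quorate(), connect_to_satellite() and
drop_satellite_connections() are hypothetical stubs.)

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stubs for illustration. */
bool cluster_is_quorate(void);
void connect_to_satellite(size_t i);
void drop_satellite_connections(void);

/* Full peers reach out to satellites only while quorate, so a fenced
 * or inquorate peer holds no satellite connections - the two safety
 * levels mentioned above. */
void satellite_connector_tick(size_t n_satellites)
{
        if (cluster_is_quorate()) {
                for (size_t i = 0; i < n_satellites; i++)
                        connect_to_satellite(i);
        } else {
                drop_satellite_connections();
        }
}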
>

I think this is a better, if slightly more complex, model to implement,
yes. It also avoids the potential DoS of satellites repeatedly trying
to contact the central cluster nodes.

Chrissie
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
