Re: RFC: Extending corosync to high node counts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/03/15 01:24, Steven Dake wrote:
> I think if you dont care about performance, you can have a daemon
> process (second process) connect as a cpg service and maintain an
> overlay network on top of CPG.  Then many other external endpoints could
> connect to this server over TCP.

That's an interesting idea that I quite like. And it might be nice and
easy to get a proof-of-concept up and running.

It would probably require a different API to the normal corosync one
(I'm not sure that emulating libcpg etc for a different daemon would be
sensible).

How does that sound to the Pacemaker team?

Chrissie


> The problem with totem re scaling isn't virtual synchrony btw, it is the
> membership protocol which creates a fully meshed network.  Membership
> protocols that maintain a mesh membership are expensive to setup but
> cheap to maintain (regarding the network protocol activity)





> I'm happy to see people are thinking about how to make corosync scale
> past the historical ~30 node limit that seems to come in practice.
> 
> Regards
> -steve
> .
> 
> On Mon, Mar 23, 2015 at 2:09 AM, Christine Caulfield
> <ccaulfie@xxxxxxxxxx <mailto:ccaulfie@xxxxxxxxxx>> wrote:
> 
>     On 23/03/15 03:11, Andrew Beekhof wrote:
>     >
>     >> On 19 Mar 2015, at 9:05 pm, Christine Caulfield
>     <ccaulfie@xxxxxxxxxx <mailto:ccaulfie@xxxxxxxxxx>> wrote:
>     >>
>     >> Extending corosync
>     >> ------------------
>     >>
>     >> This is an idea that came out of several discussions at the cluster
>     >> summit in February. Please comment !
>     >>
>     >> It is not meant to be a generalised solution to extending
>     corosync for
>     >> most users. For single & double digit cluster sizes the current ring
>     >> protocols should be sufficient. This is intended to make corosync
>     usable
>     >> over much larger node counts.
>     >>
>     >> The problem
>     >> -----------
>     >> Corosync doesn't scale well to large numbers of nodes (60-100 to
>     1000s)
>     >> This is mainly down to the requirements of virtual synchrony(VS)
>     and the
>     >> ring protocol.
>     >>
>     >> A proposed solution
>     >> -------------------
>     >> Have 'satellite' nodes that are not part of the ring (and do not not
>     >> participate in VS).
>     >> They communicate via a single 'host' node over (possibly) TCP.
>     The host
>     >> sends the messages
>     >> to them in a 'send and forget' system - though TCP guaratees ordering
>     >> and delivery.
>     >> Host nodes can support many satellites. If a host goes down the
>     >> satellites can reconnect to
>     >> another node and carry on.
>     >>
>     >> Satellites have no votes, and do not participate in Virtual
>     Synchrony.
>     >>
>     >> Satellites can send/receive CPG messages and get quorum
>     information but
>     >> will not appear in
>     >> the quorum nodes list.
>     >>
>     >> There must be a separate nodes list for satellites, probably
>     maintained
>     >> by a different subsystem.
>     >> Satellites will have nodeIDs (required for CPG) that do not clash
>     with
>     >> the ring nodeids.
>     >>
>     >>
>     >> Appearance to the user/admin
>     >> ----------------------------
>     >> corosync.conf defines which nodes are satellites and which nodes to
>     >> connect to (initially). May
>     >> want some utility to force satellites to migrate from a node if
>     it gets
>     >> full.
>     >>
>     >> Future: Automatic configuration of who is in the VS cluster and
>     who is a
>     >> satellite. Load balancing.
>     >>        Maybe need 'preferred nodes' to avoid bad network topologies
>     >>
>     >>
>     >> Potential problems
>     >> ------------------
>     >> corosync uses a packet-based protocol, TCP is a stream (I don't
>     see this
>     >> as a big problem, TBH)
>     >> Where to hook the message transmission in the corosync networking
>     stack?
>     >>  - We don't need a lot of the totem messages
>     >>  - maybe hook into group 'a' and/or 'sync'(do we need 'sync' in
>     >> satellites [CPG, so probably yes]?)
>     >> Which is client/server? (if satellites are client with authkey we get
>     >> easy failover and config, but ... DOS potential??)
>     >> What if tcp buffers get full? Suggest just cutting off the node.
>     >> How to stop satellites from running totemsrp?
>     >> Fencing, do we need it? (pacemaker problem?)
>     >
>     > That has traditionally been the model and it still seems appropriate.
>     > However Darren raises an interesting point... how will satellites
>     know which is the "correct" partition to connect to?
>     >
>     > What would it look like if we flipped it around and had the full
>     peers connecting to the satellites?
>     > You could then tie that to having quorum. You also know that a
>     fenced full peer wont have any connections.
>     > Safety on two levels.
>     >
> 
>     I think this is a better, if slightly more complex model to implement,
>     yes.  It also avoids the potential DoS of satellites trying to contact
>     central cluster nodes repeatedly.
> 
>     Chrissie
>     _______________________________________________
>     discuss mailing list
>     discuss@xxxxxxxxxxxx <mailto:discuss@xxxxxxxxxxxx>
>     http://lists.corosync.org/mailman/listinfo/discuss
> 
> 

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss




[Index of Archives]     [Linux Clusters]     [Corosync Project]     [Linux USB Devel]     [Linux Audio Users]     [Photo]     [Yosemite News]    [Yosemite Photos]    [Linux Kernel]     [Linux SCSI]     [X.Org]

  Powered by Linux