Re: Where to go with cman ?

On Tue, 2005-08-02 at 11:28 +0100, Patrick Caulfield wrote:
> Steven Dake wrote:
> > On Mon, 2005-07-18 at 09:10 +0100, Patrick Caulfield wrote:
> > 
> > >>As I see it there are two things we can do with userland cman that's currently in
> >>the head of CVS:
> >>
> >>1. Leave it as it is - a port of the kernel one. This has some benefits: it's
> >>easy (plus a few bug fixes that need to go in), it's protocol-compatible with
> >>the kernel one. There are a small number of extra features that could go in
> >>there (that would, annoyingly, break that compatibility) but nothing really
> > >>serious. It doesn't give us anything new, but what new is needed?
> >>
> >>2. Migrate it to something much more sophisticated. I've mentioned Virtual
> >>Synchrony a few times before and I've been looking into this in some detail
> >>since. The benefits are largely internal but they do provide a reliable, robust
> >>and well-performing messaging system that other cluster subsystems can use.
> >>While the application programmers at the cluster summit maintained they had no
> >>use for a cluster messaging system, I still believe that it is a useful thing to
> >>have at a lower level - if only for our own programming needs. I know that Jon
> >>looked into the existing cman messaging system before rejecting it as too slow
> > >>and unreliable for the needs of the cluster mirroring code.
> >>
> >>There are two suboptions here.
> > >>  a) write it ourselves. Quite a big job this. Bigger than I would like. To be
> > >>honest I did make a start at this and now realise just what a huge job it is to
> > >>get something that both performs well and is reliable. REALLY reliable. Even
> > >>worse if the academics want something provably reliable.
> >>   b) adopt something else. The obvious candidate here is the openAIS code[1].
> >>This looks to be quite mature now and has all the features we need of a low
> >>level messaging system. It's very nicely abstracted out so we can pick out just
> >>the bits we need without having the whole (rather heavyweight) system on top of it.
> >>
> >>The one problem with the openAIS code is that it doesn't support IPv6, and much
> >>of the code is tied to IPv4. Having had a look at it and emailed Steven Dake
> >>about this he reckons it's about 2 weeks work to add.[2]
> >>
> >>The advantages of doing this are several.
> >>- It saves time. We get something that is known to work, even though it needs
> >>extra features added for our own use.
> >>- we're not inventing something new that already exists in several other places.
> >>- we get more people who know the code. Currently only I know the internals of
> >>cman as it stands and it's quite scary code that people don't want to get
> > >>involved with (we've had several DLM patches in the past, but no CMAN ones).
> >>This way we get at least 2 (Steven and me) as well as anyone else who is
> >>following openAIS. Of course there will be CMAN-specific stuff on top of their
> >>comms layer to make it quorum-based and capable of supporting GFS and DLM that
> > 
> > 
> > Sorry my response is so late; I missed this mail while at OLS.
> > 
> > The quorum problem is commonly referred to in the literature as a
> > "virtual synchrony filter".  I'd love to have some implementations of
> > virtual synchrony filters that exist within libtotem itself.
> > Definitely an area of interest for openais as we need some services to
> > operate only in one partition (like the amf).
> > 
> > 
> >>will be Red Hat specific but these are not going to be large.
> >>- the APIs are all open (based on SAforum specifications) and already
> >>implemented. Although adding saCLM to CMAN is pretty easy as I proved last week.
> >>
> > 
> > 
> >>The disadvantages are
> >>- Need to learn the internals of someone else's code.
> > 
> > 
> > indeed this part is somewhat painful :(
> > 
> > 
> >>- We don't have full control over the code. Although we can obviously fork it if
> > >>we feel the need, it would obviously be preferable not to.
> > 
> > 
> > My view is that open source influence is dictated by level of
> > contribution just like any kind of community.  ie: the more a person
> > contributes the more influence they can exert over a project or
> > direction.  Even as maintainer I don't have full control over the
> > openais code as the community really decides where we go and what work
> > we do.
> > 
> > My point here is that if you are willing to fork, then you probably have
> > some time to maintain the code, which is better spent influencing the
> > current openais tree :)
> > 
> > 
> > >>- non-compatibility with "old" cman, making rolling upgrades hard or even
> >>impossible. I'm not sure what to do about this yet, but it's worth pointing out
> >>that the DLM has a new line-protocol too.
> > 
> > 
> > Yes, upgrades are a real pain.  We have not fully tackled this problem in
> > the openais project yet, because we haven't released a stable version.
> > Ideally we would like two versions (older, newer) to interoperate, even
> > if that means uglifying the implementation to coexist with two line
> > types.  We have some work in place to address this problem but before
> > our first production release I'm planning to really think through
> > interoperability with new implementations for features of the totem
> > protocol (like redundant ring, multi ring gateway (for local area
> > networks), group key generation, multi-ring-bridged (for wide area
> > networks), etc).
> > 
> > 
> >>- openAIS is BSD licensed, I don't think this is a problem but it probably needs
> >>checking.
> >>
> > 
> > 
> > Originally I had planned to use spread for openais, but the license was
> > not compatible with the lawyers' "approved list".  So we had to implement
> > a protocol completely from scratch because of the license issue which
> > took about 1.5 years of work (sigh).  I wanted to be sure other projects
> > could reuse the totem code so chose the most liberal license I could
> > find.
> > 
> > 
> >>In short, I'm advocating adopting the openAIS core (libtotem basically) as
> >>CMAN's communications/membership protocol. If we're going to do a "CMAN V2" that
> >>has anything significant over V1 then re-inventing it is going to be a huge
> >>amount of work that someone else has already done.
> >>
> >>Comments?
> >>
> > 
> > 
> > Sounds good, Patrick.  If you need any help from us, let us know.
> > 
> 
> Thanks for that Steven. I'm going to make a start on this when I get back from
> UKUUG next week. I've managed to knock up something that looks like cman from
> the outside but uses libtotem for its comms layer so it's looking good. One
> other thing I need to look into (apart from IPv6) is multi-home. cman had a
> (primitive) failover system but it's not currently in use by anyone because DLM
> doesn't support it but I think it's something we need to provide at some stage.
> 
> Don't worry about the mention of a fork - the chances of it happening are almost
> nil!

That's great news, Patrick.  One thing you should be aware of is that I
have changed some of the internal interfaces to make them much cleaner,
in preparation for others using libtotem.  Unfortunately I may have
done this a little too late in your case, but I think you will find
things are a little better.  It really only affects totempg_initialize.
Also libtotem was renamed to libtotem_pg because of requests from Daniel
about a name-space collision with some movie player in fc4.

For multihoming, I want to support the totem redundant ring protocol in
the totem code.  This is an extension of totemsrp to support multiple
nics per processor.  Then data is either actively or passively
replicated over multiple links.  There is essentially no failover and
multiple links can offer better performance and still operate properly
when one entire network fails.  It looks pretty simple to implement.
The paper is at:

http://www.rcsc.de/pdf/icdcs02.pdf
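
To make the active/passive distinction a bit more concrete, here is a
rough toy model in Python.  It is only a sketch and is not the totem
code: the Ring class, its method names and the eth0/eth1 link names are
all made up for illustration, and the round-robin choice in passive
mode is an assumption about how a passive scheme might spread traffic.

```python
class Ring:
    """Toy model of one node with several network links (hypothetical, not libtotem)."""

    def __init__(self, links, mode):
        if mode not in ("active", "passive"):
            raise ValueError("mode must be 'active' or 'passive'")
        # link name -> is the link currently believed to be working?
        self.up = {name: True for name in links}
        self.mode = mode
        self._next = 0  # rotation counter, used only in passive mode

    def mark_failed(self, name):
        """Record that a whole network has failed."""
        self.up[name] = False

    def links_for_next_message(self):
        """Return the links the next message would be sent on."""
        healthy = [name for name, ok in self.up.items() if ok]
        if not healthy:
            raise RuntimeError("no working network left")
        if self.mode == "active":
            # active: every message goes out on every working link
            return healthy
        # passive (assumed round-robin): one link per message, rotating
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return [choice]
```

In this model losing a link needs no separate failover step: an active
ring simply keeps sending on whatever links remain, and a passive ring
rotates over the survivors, which is also where the extra bandwidth
comes from while all links are healthy.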

regards
-steve

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
