As I see it there are two things we can do with the userland cman that's currently in the head of CVS:

1. Leave it as it is - a port of the kernel one.

This has some benefits: it's easy (apart from a few bug fixes that need to go in) and it's protocol-compatible with the kernel one. There are a small number of extra features that could go in there (that would, annoyingly, break that compatibility) but nothing really serious. It doesn't give us anything new, but what new is needed?

2. Migrate it to something much more sophisticated.

I've mentioned Virtual Synchrony a few times before and I've been looking into this in some detail since. The benefits are largely internal but they do provide a reliable, robust and well-performing messaging system that other cluster subsystems can use. While the application programmers at the cluster summit maintained they had no use for a cluster messaging system, I still believe that it is a useful thing to have at a lower level - if only for our own programming needs. I know that Jon looked into the existing cman messaging system before rejecting it as too slow and unreliable for the needs of the cluster mirroring code.

There are two suboptions here.

a) Write it ourselves. Quite a big job this. Bigger than I would like. To be honest I did make a start at this and now realise just what a huge job it is to get something that both performs well and is reliable. REALLY reliable. Even worse if the academics want something provably reliable.

b) Adopt something else. The obvious candidate here is the openAIS code[1]. This looks to be quite mature now and has all the features we need of a low level messaging system. It's very nicely abstracted out so we can pick out just the bits we need without having the whole (rather heavyweight) system on top of it. The one problem with the openAIS code is that it doesn't support IPv6, and much of the code is tied to IPv4.
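
To make "tied to IPv4" concrete: code typically ends up in that state when it assumes every node address is a 4-byte value. The sketch below is not taken from the openAIS source; it is only a minimal Python illustration, using a hypothetical NodeAddress type and hypothetical parse_node_address/format_node_address helpers, of storing the address family next to the raw bytes so that the same membership code could carry either family.

```python
import ipaddress
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAddress:
    """Hypothetical family-agnostic node address (not an openAIS type)."""
    family: int    # socket.AF_INET or socket.AF_INET6
    packed: bytes  # 4 bytes for IPv4, 16 bytes for IPv6


def parse_node_address(text: str) -> NodeAddress:
    """Parse an IPv4 or IPv6 literal without assuming its length."""
    addr = ipaddress.ip_address(text)  # raises ValueError on bad input
    family = socket.AF_INET if addr.version == 4 else socket.AF_INET6
    return NodeAddress(family, addr.packed)


def format_node_address(node: NodeAddress) -> str:
    """Turn the stored bytes back into text; the length selects the family."""
    return str(ipaddress.ip_address(node.packed))


if __name__ == "__main__":
    for text in ("192.168.0.1", "fe80::1"):
        node = parse_node_address(text)
        print(text, len(node.packed), format_node_address(node))
```

Nothing outside these helpers needs to know whether an address is 4 or 16 bytes long, which is roughly the property the IPv6 work would have to establish in the real code.
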
I had a look at the code and emailed Steven Dake about IPv6 support; he reckons it's about 2 weeks' work to add.[2]

The advantages of doing this are several.

- It saves time. We get something that is known to work, even though it needs extra features added for our own use.

- We're not inventing something new that already exists in several other places.

- We get more people who know the code. Currently only I know the internals of cman as it stands and it's quite scary code that people don't want to get involved with (we've had several DLM patches in the past, but no CMAN ones). This way we get at least 2 (Steven and me) as well as anyone else who is following openAIS. Of course there will be CMAN-specific stuff on top of their comms layer to make it quorum-based and capable of supporting GFS and DLM, and that will be Red Hat specific, but it is not going to be large.

- The APIs are all open (based on SAforum specifications) and already implemented. Although adding saCLM to CMAN is pretty easy, as I proved last week.

The disadvantages are

- We need to learn the internals of someone else's code.

- We don't have full control over the code. Although we can obviously fork it if we feel the need, it would obviously be preferable not to.

- Non-compatibility with "old" cman, making rolling upgrades hard or even impossible. I'm not sure what to do about this yet, but it's worth pointing out that the DLM has a new line-protocol too.

- openAIS is BSD licensed. I don't think this is a problem but it probably needs checking.

In short, I'm advocating adopting the openAIS core (libtotem basically) as CMAN's communications/membership protocol. If we're going to do a "CMAN V2" that has anything significant over V1 then re-inventing it is going to be a huge amount of work that someone else has already done.

Comments?

Patrick

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster