On Mon, 2004-09-20 at 05:25, Patrick Caulfield wrote:
> At the cluster summit most people seemed to agree that we needed a generic,
> pluggable kernel API for cluster functions. Well, I've finally got round to
> doing something.
>
> The attached spec allows for plug-in cluster modules with the possibility of
> a node being a member of multiple clusters if the cluster managers allow it.
> I've separated out the functions of "cluster manager" so they can be provided by
> different components if necessary.
>
> Two things that are not complete (or even started) in here are a communications
> API and a locking API.
>
> For the first, I'd like to leave that to those more qualified than me to do and
> for the second I'd like to (less modestly) propose our existing DLM API with the
> argument that it is a full-featured API that others can implement parts of if
> necessary.
>
> Comments please.

Patrick,

I read over your API and have a few comments.

Simple stuff first.

The membership_node looks very similar to the SAF interfaces, so I
assume the fields mean the same. mn_member is 32 bits but it just
specifies if this node is a member (1) or not (0), right?

The mni_viewnumber is 32 bits; in SAF it is 64 bits. Might want it to
be 64 bits. (I think nodeid should be 64 bits, but SAF has it as
32 bits, so I guess it is ok.)

What is mni_context? A bit more description of these fields would be
nice -- don't have to be as verbose as SAF :)

In membership_ops, you have start_notify and notify_stop -- might want
to be consistent with the naming (either notify_start or stop_notify).

Now the more complicated stuff:

I think we need more information on how this API works and a
description of how the calls are used.

cm_attach() is used to attach to a particular cluster provider that
has been registered. Who calls cm_attach()? I assume whoever calls
cm_attach() will then be calling the ops functions.

What is cmprivate in start_notify?
Once start_notify is called, the CM module will call the callback
function whenever there is a change until notify_stop is called?

The membership_callback_routine only has "context" and "reason".
Again, what is context? What is reason? How is the data returned?
I'm guessing a struct membership_notify_info is filled in, using the
buffer passed in to start_notify. Is that right? A bit more
description here would be good.

What is the difference between get_quorate() and get_info(), which
returns a struct quorum_info with qi_quorum?

Should get_quorate() and get_info() take a viewnumber so we can match
up the list of members and whether it had quorum? (It could have
changed after the callback with membership before we call
get_quorate().)

Thanks,

Daniel
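
P.S. To make the viewnumber suggestion concrete, here is a rough
user-space sketch in C of what I have in mind. Every name in it
(sketch_quorum_info, qi_viewnumber, sketch_get_quorate and friends) is
made up for illustration and is not taken from your spec; the 64-bit
view number and the -1 return for a stale view are my assumptions too.
The idea is only that the quorum query takes the view number the caller
got with its member list, and refuses to answer if membership has moved
on since then.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the quorum_info from the attached spec. */
struct sketch_quorum_info {
    uint64_t qi_viewnumber; /* assumed field: the view this answer belongs to */
    int      qi_quorate;    /* 1 if that view had quorum, 0 otherwise */
};

/* Toy cluster-manager state: the current view and whether it has quorum. */
static uint64_t current_view = 7;
static int current_quorate = 1;

/* View number a membership callback would hand to the caller right now. */
static uint64_t sketch_current_view(void)
{
    return current_view;
}

/* Simulate a membership change; the new view may or may not have quorum. */
static void sketch_membership_change(int quorate)
{
    current_view++;
    current_quorate = quorate;
}

/*
 * Hypothetical view-aware quorum query.
 * Returns 0 and fills *qi when 'viewnumber' is still the current view.
 * Returns -1 when membership has changed since that view, so the caller
 * knows its member list is stale and should wait for the next callback.
 */
static int sketch_get_quorate(uint64_t viewnumber, struct sketch_quorum_info *qi)
{
    assert(qi != NULL);
    if (viewnumber != current_view)
        return -1;
    qi->qi_viewnumber = current_view;
    qi->qi_quorate = current_quorate;
    return 0;
}
```

A caller would remember the view number delivered with the member list,
pass it to the quorum query, and treat a -1 as "this list is out of
date" rather than pairing an old member list with a newer quorum state.
Returning the view number inside the quorum structure instead would
work as well; either way the two answers can be matched up.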