On 5/18/05, Daniel Phillips <phillips@xxxxxxxxxx> wrote:
> Linux Cluster Summit 2005
>
> June 20 and 21, Walldorf, Germany (Near Heidelberg)
>
> Sponsors: Red Hat, SAP AG, Oracle, HP, and Fujitsu-Siemens.
>
> The goal of the two-day Linux Cluster Summit workshop is to bring
> together the key individuals who can realize a general purpose
> clustering API for Linux, including, kernel components, userspace
> libraries and internal and external interfaces. Results of this
> workshop will be presented to the Kernel Summit the following month.

target vision for cluster infrastructure
(thoughts on reading interview with Andrew Morton in Ziff-Davis eweek)
April 21, 2005, edited May 20

I was surprised to see that cluster infrastructure is still missing, yet pleased that the need for it is more widely perceived today

http://www.uwsg.iu.edu/hypermail/linux/kernel/0409.0/0238.html

than it was four years ago when the linux-cluster mailing list was formed,

http://mail.nl.linux.org/linux-cluster/2001-02/msg00000.html

although there is nothing but spam in its archive since July 2002.

A quick review of more recent developments indicates that little has changed. There is no need for standardization across cluster infrastructures at any one installation, and the sense that a discussion is over "whose version gets included" rather than "what can we add to make things easier for everyone, even if doing so will actually hurt the prospects for the clustering infrastructure I am promoting" still leads to benchmark wars whenever the subject comes up. So I gather from glancing at discussion on LKML from last September that there has been some progress but not much.

Four years ago, I proposed a target vision for linux cluster kernel development which I believe still makes sense.
(And now I know to call it a "target vision!") At the time, I had no good answer for why we would bother to implement support for something that nobody would immediately use, and sky-pie descriptions of wide area computing grids seemed silly. (They may still.)

The vision was that a Linux box could be in multiple clusters at once, or could, if so configured, be a "cluster router" similar to the file system sharers/retransmitters one can set up to run interference between disparate network file systems.

Supporting this vision -- a box is in N clusters from M separate cluster system vendors, at the same time, and these N clusters know nothing about each other -- is in my opinion a reasonable plan of attack for selecting features to include, or interfaces to which cluster system vendors must conform, rather than getting into detailed fights about whose implementation of feature F belongs in the core distribution.

In the automatic process migration model, it is easy to imagine a wide cluster where each node might represent a cluster rather than a unit, and would want to hide the details of the cluster it is representing. Four years ago, Mosix allowed pretty wide clusters containing nodes not directly reachable from each other, but node id numbers needed to be unique across the whole universe. In the "galaxy cluster" vision, a cluster can represent itself as a node, to other nodes participating in the same cluster, without revealing internal details of the galaxy (because from far enough away, a galaxy looks, at first, like a single star).

When I last reviewed what was available, the closest thing to implementing this vision was using Condor to link separate Mosix clusters.

I remember a few near-consensuses being reached on the linux-cluster mailing list. These included:

  - Defining a standard interface for requesting and obtaining cluster services and enforcing compliance to it makes sense.
  - Arguing about whose implementation of any particular clustering feature is best does not make sense. (Given free exchange of techniques and a standard interface, the in fact better techniques will gradually nudge out the in fact inferior ones with no shoving required.)

  - A standard cluster configuration interface (CCI) defined as a fs of its own makes sense. Rather than squatting within /proc, the CCI can be mounted anywhere (possibly back within /proc), so multiple clusters on the same box will not collide with each other -- each gets its own CCI, and all syscalls to cluster parts include a pointer to a cluster configuration object, of which there can be more than one defined.

  - The first order of business therefore was to take a survey of services provided by clustering infrastructures and work out standardizable interfaces to these services.

That's what I remember. The survey of services may or may not have been performed formally; I know that a survey of cluster services provided can be done easily -- it is done often, whenever anyone tries to select what kind of cluster they want to set up.

The role of the linux kernel maintainer, in the question of supporting multiple disparate cluster platforms, is NOT to choose one, but is to set up ground rules under which they can all play nice together. Just like the file systems all play nice together currently. The thought of having two different spindles each with their own file system is not something that anyone blinks at anymore, but it was once revolutionary. Having one computer participating in N clusters, at the same time, may in the future be a similar non-issue.
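
As a rough illustration of the "one configuration object per cluster" idea, here is a minimal Python sketch. Everything in it is hypothetical: the ClusterConfig class, its fields, the helper functions, and the mount paths are made up for illustration and are not any existing kernel or vendor interface. The only point is that every cluster call takes the configuration object explicitly, so two clusters on the same box never share state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical cluster configuration object: one per mounted CCI."""
    mount_point: str       # where this cluster's CCI is mounted (illustrative path)
    local_node_id: int     # this box's node id within this cluster only
    members: frozenset     # node ids this cluster knows about


def local_node(cfg: ClusterConfig) -> int:
    """Return this box's node id as seen by the given cluster."""
    return cfg.local_node_id


def is_member(cfg: ClusterConfig, node_id: int) -> bool:
    """Ask one specific cluster whether it knows a node id."""
    return node_id in cfg.members


# The same box participates in two unrelated clusters at the same time.
cluster_a = ClusterConfig("/cluster/a", 4, frozenset({1, 2, 4}))
cluster_b = ClusterConfig("/proc/cluster/b", 16, frozenset({16, 17}))

for cfg in (cluster_a, cluster_b):
    print(cfg.mount_point, "local node:", local_node(cfg))
```

Node id 4 means something in cluster_a and nothing in cluster_b, which is the sense in which the N clusters "know nothing about each other".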
Pertaining to the list of cluster services, here's a quick and small list of the ones that spring to my mind as being valid for inclusion into the CCI, without doing too much additional research:

  - services (including statistics) that cluster membership provides to processes on the node should be defined and offered through the CCI

  - node identification in each cluster, abstracted into that cluster

  - information about other nodes

  - extended PID must include cluster-ID as well as node-ID

    When discussing PID extension mechanisms: if I am process 00F3 on my own node, I might have an extended pid of 000400F3 on a cluster in which I am node 4 and an extended pid of 001000F3 on a cluster in which I am node sixteen.

  - the publish/subscribe model (just read that one today) is very good

    Standardize a publish/subscribe RPC interface in terms of operations on filesystem entities within the CCI.

Based on discussion on the cap-talk mailing list, I'd like to suggest that publish/subscribe get implemented in terms of one-off capability tickets, but that's a perfect example of the kind of implementation detail I'm saying we do not need to define.

How a particular clustering system implements remote procedure call is not relevant to mandating a standard for how clustering systems, should their engineers choose to have their product comply with a standard, may present available RPCs in the CCI, and how processes on nodes subscribed to their clusters may call an available RPC through the CCI.
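
The extended-PID arithmetic in the list above (process 00F3 becoming 000400F3 on node 4) can be sketched as follows. This is only a guess at one possible encoding: the 16-bit node field and 16-bit local PID field are assumptions read off the hex digits in the example, and the function names are made up. The cluster-ID is left implicit here (it is whichever cluster is doing the asking); a real scheme would have to carry it somewhere.

```python
NODE_BITS = 16  # assumed width of the node-id field (inferred from the example)
PID_BITS = 16   # assumed width of the local PID field (inferred from the example)


def extended_pid(node_id: int, local_pid: int) -> int:
    """Pack a node id and a node-local PID into one cluster-wide number."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node id does not fit in the node field")
    if not 0 <= local_pid < (1 << PID_BITS):
        raise ValueError("local pid does not fit in the pid field")
    return (node_id << PID_BITS) | local_pid


def split_extended_pid(xpid: int) -> tuple:
    """Recover (node_id, local_pid) from an extended PID."""
    return xpid >> PID_BITS, xpid & ((1 << PID_BITS) - 1)


def show(xpid: int) -> str:
    """Format an extended PID as eight hex digits, as in the example."""
    return f"{xpid:08X}"


# The same local process, seen from two clusters with different node ids.
print(show(extended_pid(4, 0x00F3)))   # 000400F3
print(show(extended_pid(16, 0x00F3)))  # 001000F3
```

The local PID never changes; only the node prefix differs per cluster, which is why the same process can carry a different extended pid in each cluster it is visible in.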
The big insight that I hope you take away from this e-mail, if you haven't had it already (I have been out of touch with the insight level of LKML for a long time), is that clustering integration into the kernel makes sense as a standards establishment and enforcement effort, rather than a technology selection and inclusion effort (at least at first -- once the CCI exists, cluster providers might all rush to provide their "CCI driver modules" and then we're back to selection and inclusion), and that is a different kind of effort. Although not a new kind.

While writing that paragraph I realized that the file system interface and the entire module interface, and any other kind of plug-it-in-and-it-works-the-same interface linux supports -- sound, video, et cetera -- are all standards enforcement problems rather than technology selection problems. Not recognizing that clustering is such a problem is what I believe is holding back cluster infrastructure from appearing in the kernel.

So last September's thread about message passing services, in my vision, is improper. The question is not "how do we pass messages?" but rather: we have nodes that we know by node-id, and we have messages that we wish to pass to them; how do we provide a mechanism so that, knowing only those things, node-id and message, an entity that wishes to pass a message can ask the cluster system to pass the message?

Given modularization, it will then be possible to drop in and replace systems as needed or as appropriate.

-- 
David L Nicol
Director of Research and Development
Mindtrust LLC, Kansas City, Missouri

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster