=========================================================
User Resource Manager Operational Specification, proposed
=========================================================

The User Resource Manager (formerly the Service Manager) is the part of Red
Hat Cluster Suite which manages the resources and resource groups that
implement a user's clustered services.  The User Resource Manager only
allows resource groups to operate when it is running on a quorate member of
the cluster.  This means that all resource groups are stopped immediately
when a member is no longer quorate.  Typically, the member is also fenced.
(Note: it may not manage to stop all resource groups prior to being fenced;
it certainly tries to.)

(Incomplete.)

==========================
Failover Domains, proposed
==========================

See http://people.redhat.com/lhh/fd.html for information on how clumanager
1.2 failover domains operate.  The configuration format will have to change
slightly, but the operational characteristics need not.  Additionally, we
might want to add a "Relocate to most-preferred member" option to prevent
unwanted service transitions in ordered failover domains.  (Since failover
domains handle multiple cluster members, this is not the same as clumanager
1.0's "Relocate on preferred node boot" option.)

=================================
User Resource Structure, proposed
=================================

<resources>
  <script name="Oracle Script"          <-- Unique name across scripts
          file="/path/to/script"/>      <-- Path to script

  <script name="Apache Script"          <-- Unique name across scripts
          file="/etc/init.d/httpd"/>    <-- Path to script

  <mount name="Oracle Data"             <-- Unique name across mounts
         fstype="gfs"                   <-- Can share these; can mount
                                            these multiple times
         options=""                     <-- Defaults ok
         device="/dev/sdc1"             <-- Could be LV, GNBD, etc.
         force_unmount="n"
         mountpoint="/mnt/oracle_data"/>  <-- Mount point

  <mount name="Web Data"                <-- Unique name across mounts
         fstype="nfs"                   <-- Can't share these!
         options=""                     <-- Mount options
         source="server:/webdata"       <-- Server/Path specification
         mountpoint="/mnt/web_data"/>   <-- Mount point

  <mount name="NFS Home"                <-- Unique name across mounts
         fstype="ext3"                  <-- You can share these
         options="ro"                   <-- Mount options
         device="/dev/sdc2"             <-- Could be LV, GNBD, etc.
         force_fsck=""                  <-- Force fsck on journalled fs
         force_unmount="y"
         mountpoint="/mnt/nfs"/>        <-- Mount point

  <client name="Joe's machine"          <-- Name unique across clients
          type="nfs"                    <-- Only NFS support for now
          target="joe.boston.redhat.com"  <-- Wildcards & netgroups too!
          options="ro"/>

  <client name="Admin's machine"        <-- Name unique across clients
          type="nfs"                    <-- Only NFS support for now
          target="bob.boston.redhat.com"  <-- Wildcards & netgroups too!
          options="rw"/>

  <ip address="172.31.31.2"/>           <-- Address is unique
  <ip address="172.31.31.4"/>           <-- Address is unique
  <ip address="::ffff:172.31.31.3"/>    <-- Address is unique (watch for
                                            IPv6/IPv4 collisions)
                                            IPv6 == new feature!

  <!-- Web & Oracle Service -->
  <group name="Oracle/Web">             <-- Unique name across groups
    <script ref="Oracle Script"/>       <-- Note: multiple scripts
    <script ref="Apache Script"/>           (new feature)
    <ip ref="172.31.31.2"/>             <-- Not sure if feasible
    <ip ref="::ffff:172.31.31.3"/>
    <mount ref="Oracle Data">
      <!-- Exports are service specific -->
      <export type="nfs" path="">       <-- If empty, refers to the
                                            parent's mountpoint
        <client ref="Joe's machine"/>   <-- Joe can mount this.
      </export>
      <export type="samba"/>            <-- No change from 1.2 for now
    </mount>
  </group>

  <!-- Home directory service -->
  <group name="Homedirs">
    <ip ref="172.31.31.4"/>
    <mount ref="NFS Home">
      <export type="nfs" path="">
        <client ref="Joe's machine"/>
        <client ref="Admin's machine"/>
      </export>
    </mount>
  </group>
</resources>

===============================================
Rules concerning individual resource behavior
===============================================

<mount> resource:
  - When fstype is 'gfs' or 'nfs', the mount may be defined as part of
    multiple resource groups.  If this is the case, the force_unmount
    option is ignored.
  - When fstype is neither 'gfs' nor 'nfs', the mount may only be defined
    as part of one resource group.
  - When fstype is 'nfs' or 'gfs', force_fsck is ignored.
  - When fstype is 'nfs', force_unmount is ignored.
  - When fstype is 'ext3', 'jfs', 'reiserfs', or 'xfs', the file system is
    only fsck'd if the force_fsck option is turned on.
  - When fstype is 'ext2', force_fsck is ignored and the file system is
    always checked on failover or relocation.

<ip> resource:
  - An IP resource may only be part of one resource group.  If it is
    defined in multiple resource groups, the first resource group to start
    will hold the IP address, and the second resource group will fail to
    start.
  - An IPv6 address corresponding to an IPv4 address specified in another
    <ip> resource is not allowed.

<script> resource:
  - A script may be a member of multiple resource groups, but beware the
    cost of doing so: the cluster makes no assumptions with respect to
    data being available to scripts.  Because of this, scripts depend on
    all other pieces of a resource group being up and running.

<client> resource:
  - A client may be a member of any number of <export>s in any number of
    resource groups.

======================
Dependency Structure
======================

            group__________
           /   \           \
          /     \           \
        ip     mount    ...group...
        |\      /|
        | \    / |
        |  \  /  |
        |   \/   |
        |   /\   |
        |  /  \  |
        | /    \ |
      script    export
                   \
                    \
                   client

Wherever a leaf node exists, you may restart that leaf node without
affecting other nodes in the dependency tree.  For instance, if you change
the export-client options for "Joe's Machine", then the export is removed
and replaced with the new options.  In our example above, the "Oracle/Web"
service and the "Homedirs" service both have exports with clients pointing
to "Joe's Machine"; changing the option on one changes the option on both
exports.

You may detach leaves without affecting the rest of the resource group.
That is, you may detach "Joe's Machine" without stopping the service.  You
may also attach leaves without affecting the rest of the resource group,
so you may add "Bill's Machine" to the export without restarting the
resource group.  Similarly, you may add or restart a script.

Whenever a node of the tree does not have anything depending on it, it is
by definition a leaf node.  Thus, you may add, start, or stop IP addresses
provided no scripts or exports are defined for a given resource group.

Rules:

<group> resource:
  - A group may depend on any number of other groups.  When a group
    depends on another group, the child group is started prior to any
    other resources of the parent group.  Additionally, parent and child
    are managed in a single start phase and are thus started on the same
    cluster member; the dependency is only a logical grouping.  Resource
    groups which are dependent children must start on the same cluster
    member as their parents.
  - A group, if depended upon by another group, may not itself depend on
    another group.  That is, A may depend on B, but if so, then B may not
    depend on anything.  This prevents both circular dependencies and
    arbitrarily complex services.
  - A resource group fails to start if any one of its dependent children
    fails to start.

<ip> resource:
  - An IP resource may not be added, removed, or changed if an export or
    user script is present in the resource group.
    If neither an export nor a user script is present in the group, it may
    be restarted without affecting other parts of the group.
  - An IP resource fails to start if any one of its dependent children
    fails to start.

<script> resource:
  - A script resource may be added, removed, or changed without affecting
    other members of the resource group.
  - A script resource is not started unless all mount and ip resources
    have started.

<mount> resource:
  - A mount resource may not be added, removed, or changed if an export or
    user script is present in the resource group.  If neither an export
    nor a user script is present in the group, it may be restarted without
    affecting other parts of the group.
  - A mount resource fails to start if any one of its dependent children
    fails to start.

<export> resource:
  - Export resources are not defined outside of a resource group; they are
    properties of a given mount resource and are defined only in the
    context of a resource group.
  - An export resource may not be added, removed, or changed if a client
    exists and is depending on it.
  - An export resource does not fail to start unless all of its dependent
    children fail to start.
  - Export resources with type "samba" are not started unless all mount
    and ip resources have been started.
  - Export resources with type "nfs" are not started unless all mount
    resources have been started.

<client> resource:
  - Client resources may be added, removed, or changed at any time without
    affecting the operation of any other part of the resource group.

==================================
How it works - a High Level View
==================================

Note - this is the same way clumanager 1.0 and 1.2 do it; the main
difference is that we now have the ability to start individual export
clients.
The intention is not to illustrate this here; I have no idea how that is
going to work yet. ;)

By the way, NFS exports are intentionally started apart from Samba
exports: Samba exports generally bind to IP addresses in the resource
group (ugly), but NFS exports need no such thing, and in fact, exporting
after the IP address comes up causes problems with failover and/or
service relocation.

group_start()
{
	for (each group) {
		if (group_start(group) != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		if (start_mount() != SUCCESS)
			return FAIL;

		for (each export) {
			if (type != nfs)
				continue;
			for (each client)
				/* Log errors */
				start_client(export_directory);
		}
	}

	for (each ip) {
		if (start_ip() != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		for (each export) {
			if (type != samba)
				continue;
			if (start_samba(export) != SUCCESS)
				return FAIL;
		}
	}

	for (each script) {
		if (start_script() != SUCCESS)
			return FAIL;
	}

	return SUCCESS;
}

group_stop()
{
	for (each script) {
		if (stop_script() != SUCCESS)
			return FAIL;
	}

	for (each ip) {
		if (stop_ip() != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		for (each export) {
			if (type == nfs) {
				for (each client)
					/* Log errors */
					stop_client(export_directory);
			} else {
				stop_samba(export);
			}
		}

		if (stop_mount() != SUCCESS)
			return FAIL;
	}

	for (each group) {
		if (group_stop(group) != SUCCESS)
			return FAIL;
	}

	return SUCCESS;
}
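The <ip> rule above - an IPv6 address corresponding to an IPv4 address
specified in another <ip> resource is not allowed - amounts to detecting
v4-mapped IPv6 addresses (::ffff:a.b.c.d).  A minimal sketch of such a
check; the function name is illustrative, not part of the proposed
implementation:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Return 1 if 'addr6' is a v4-mapped IPv6 address (::ffff:a.b.c.d) whose
 * embedded IPv4 address equals 'addr4'; 0 otherwise (including
 * unparseable input).  Illustrative only - not a proposed interface.
 */
static int
ipv6_collides_with_ipv4(const char *addr6, const char *addr4)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (inet_pton(AF_INET6, addr6, &a6) != 1 ||
	    inet_pton(AF_INET, addr4, &a4) != 1)
		return 0;

	if (!IN6_IS_ADDR_V4MAPPED(&a6))
		return 0;

	/* The low-order 32 bits of a v4-mapped address hold the IPv4 addr */
	return memcmp(&a6.s6_addr[12], &a4.s_addr, 4) == 0;
}
```

A configuration parser could run this against every (IPv6, IPv4) pair of
<ip> resources and reject the configuration on a match.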
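The <group> rule above - a group that is depended upon may not itself
depend on another group - keeps the dependency tree at most one level
deep, which also rules out cycles.  A sketch of a validation pass under
that rule; the struct layout and function name here are invented for
illustration, not part of the proposal:

```c
#include <stddef.h>

/* Hypothetical in-memory form of a <group>; 'deps' lists child groups. */
struct group {
	const char *name;
	struct group **deps;	/* NULL-terminated array, or NULL */
};

/*
 * Return 1 if the dependency rule holds for 'g': every group that 'g'
 * depends on must itself be dependency-free.  Since no chain longer than
 * parent -> child can exist, circular dependencies are impossible.
 */
static int
group_deps_valid(const struct group *g)
{
	size_t i;

	if (!g->deps)
		return 1;

	for (i = 0; g->deps[i]; i++) {
		/* A dependent child may not depend on anything. */
		if (g->deps[i]->deps && g->deps[i]->deps[0])
			return 0;
	}

	return 1;
}
```

So A depending on B passes, but A depending on B while B depends on C
fails validation.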