=========================================================
User Resource Manager Operational Specification, proposed
=========================================================

The User Resource Manager (formerly the Service Manager) is the part of Red
Hat Cluster Suite which manages the resources and resource groups that
implement a user's clustered services.  The User Resource Manager only
allows resource groups to operate when it is running on a quorate member of
the cluster.  This means that all resource groups are stopped immediately
when a member is no longer quorate.  Typically, the member is also fenced.
(Note: it may not manage to stop all resource groups prior to being fenced;
it certainly tries to.)

(Incomplete.)

==========================
Failover Domains, proposed
==========================

See http://people.redhat.com/lhh/fd.html for information on how clumanager
1.2 failover domains operate.  The configuration format will have to change
slightly, but the operational characteristics need not.  Additionally, we
might want to add a "Relocate to most-preferred member" option to prevent
unwanted service transitions in ordered failover domains.  (Since failover
domains handle multiple cluster members, this is not the same as clumanager
1.0's "Relocate on preferred node boot" option.)

=================================
User Resource Structure, proposed
=================================

<resources>
  <script name="Oracle Script"          <-- Unique name across scripts
          file="/path/to/script"/>      <-- Path to script

  <script name="Apache Script"          <-- Unique name across scripts
          file="/etc/init.d/httpd"/>    <-- Path to script

  <mount name="Oracle Data"             <-- Unique name across mounts
         fstype="gfs"                   <-- Can share these; can mount
                                            these multiple times
         options=""                     <-- Defaults ok
         device="/dev/sdc1"             <-- Could be LV, GNBD, etc.
         force_unmount="n"
         mountpoint="/mnt/oracle_data"/>  <-- Mount point

  <mount name="Web Data"                <-- Unique name across mounts
         fstype="nfs"                   <-- Can't share these!
         options=""                     <-- Mount options
         source="server:/webdata"       <-- Server/Path specification
         mountpoint="/mnt/web_data"/>   <-- Mount point

  <mount name="NFS Home"                <-- Unique name across mounts
         fstype="ext3"                  <-- You can share these
         options="ro"                   <-- Mount options
         device="/dev/sdc2"             <-- Could be LV, GNBD, etc.
         force_fsck=""                  <-- Force fsck on journalled fs
         force_unmount="y"
         mountpoint="/mnt/nfs"/>        <-- Mount point

  <client name="Joe's machine"          <-- Name unique across clients
          type="nfs"                    <-- Only NFS support for now
          target="joe.boston.redhat.com"  <-- Wildcards & netgroups too!
          options="ro"/>

  <client name="Admin's machine"        <-- Name unique across clients
          type="nfs"                    <-- Only NFS support for now
          target="bob.boston.redhat.com"  <-- Wildcards & netgroups too!
          options="rw"/>

  <ip address="172.31.31.2"/>           <-- Address is unique
  <ip address="172.31.31.4"/>           <-- Address is unique
  <ip address="::ffff:172.31.31.3"/>    <-- Address is unique (watch for
                                            IPv6/IPv4 collisions)
                                            IPv6 == new feature!

  <!-- Web & Oracle Service -->
  <group name="Oracle/Web">             <-- Unique name across groups
    <script ref="Oracle Script"/>       <-- Note: multiple scripts
    <script ref="Apache Script"/>           (new feature)
    <ip ref="172.31.31.2"/>             <-- Not sure if feasible
    <ip ref="::ffff:172.31.31.3"/>
    <mount ref="Oracle Data">
      <!-- Exports are service specific -->
      <export type="nfs" path="">       <-- If empty, refers to the
                                            parent's mountpoint
        <client ref="Joe's machine"/>   <-- Joe can mount this.
      </export>
      <export type="samba"/>            <-- No change from 1.2 for now
    </mount>
  </group>

  <!-- Home directory service -->
  <group name="Homedirs">
    <ip ref="172.31.31.4"/>
    <mount ref="NFS Home">
      <export type="nfs" path="">
        <client ref="Joe's machine"/>
        <client ref="Admin's machine"/>
      </export>
    </mount>
  </group>
</resources>

===============================================
Rules concerning individual resource behavior
===============================================

<mount> resource:
  - When fstype is 'gfs' or 'nfs', the mount may be defined as part of
    multiple resource groups.  If this is the case, the force_unmount
    option is ignored.
  - When fstype is neither 'gfs' nor 'nfs', the mount may only be defined
    as part of one resource group.
  - When fstype is 'nfs' or 'gfs', force_fsck is ignored.
  - When fstype is 'nfs', force_unmount is ignored.
  - When fstype is 'ext3', 'jfs', 'reiserfs', or 'xfs', the file system is
    only fsck'd if the force_fsck option is turned on.
  - When fstype is 'ext2', force_fsck is ignored and the file system is
    always checked on failover or relocation.

<ip> resource:
  - An IP resource may only be part of one resource group.  If it is
    defined in multiple resource groups, the first resource group to start
    will hold the IP address, and the second resource group will fail to
    start.
  - An IPv6 address corresponding to an IPv4 address specified in another
    <ip> resource is not allowed.

<script> resource:
  - A script may be a member of multiple resource groups, but beware the
    cost of doing so: the cluster makes no assumptions with respect to
    data being available to scripts.  Because of this, scripts depend on
    all other pieces of a resource group being up and running.

<client> resource:
  - A client may be a member of any number of <export>s in any number of
    resource groups.

======================
Dependency Structure
======================

            group__________
           /   \           \
          /     \           \
        ip     mount    ...group...
        |\      /|
        | \    / |
        |  \  /  |
        |   \/   |
        |   /\   |
        |  /  \  |
        | /    \ |
      script    export
                   \
                    \
                   client

Wherever a leaf node exists, you may restart that leaf node without
affecting other nodes in the dependency tree.  For instance, if you change
the export-client options for "Joe's Machine", then the export is removed
and replaced with the new options.  In our example above, the "Oracle/Web"
service and the "Homedirs" service both have exports with clients pointing
to "Joe's Machine"; changing the option on one changes the option on both
exports.

You may detach leaves without affecting the rest of the resource group.
That is, you may detach "Joe's Machine" without stopping the service.  You
may also attach leaves without affecting the rest of the resource group,
so you may add "Bill's Machine" to the export without restarting the
resource group.  Similarly, you may add or restart a script.

Whenever a node of the tree does not have anything depending on it, it is
by definition a leaf node.  Thus, you may add, start, or stop IP addresses
provided no scripts or exports are defined for a given resource group.

Rules:

<group> resource:
  - A group may depend on any number of other groups.  When a group
    depends on another group, the child group is started prior to any
    other resources of the parent group.  Additionally, parent and child
    are managed in a single start phase and are thus started on the same
    cluster member; the dependency is only a logical grouping.  Resource
    groups which are dependent children must start on the same cluster
    member as their parents.
  - A group, if depended upon by another group, may not itself depend on
    another group.  That is, A may depend on B, but if so, then B may not
    depend on anything.  This prevents both circular dependencies and
    arbitrarily complex services.
  - A resource group fails to start if any one of its dependent children
    fails to start.

<ip> resource:
  - An IP resource may not be added, removed, or changed if an export or
    user script is present in the resource group.
    If neither an export nor a user script is present in the group, it may
    be restarted without affecting other parts of the group.
  - An IP resource fails to start if any one of its dependent children
    fails to start.

<script> resource:
  - A script resource may be added, removed, or changed without affecting
    other members of the resource group.
  - A script resource is not started unless all mount and ip resources
    have started.

<mount> resource:
  - A mount resource may not be added, removed, or changed if an export or
    user script is present in the resource group.  If neither an export
    nor a user script is present in the group, it may be restarted without
    affecting other parts of the group.
  - A mount resource fails to start if any one of its dependent children
    fails to start.

<export> resource:
  - Export resources are not defined outside of a resource group; they are
    properties of a given mount resource and are defined only in the
    context of a resource group.
  - An export resource may not be added, removed, or changed if a client
    exists and is depending on it.
  - An export resource does not fail to start unless all of its dependent
    children fail to start.
  - Export resources with type "samba" are not started unless all mount
    and ip resources have been started.
  - Export resources with type "nfs" are not started unless all mount
    resources have been started.

<client> resource:
  - Client resources may be added, removed, or changed at any time without
    affecting the operation of any other part of the resource group.

==================================
How it works - a High Level View
==================================

Note - this is the same way clumanager 1.0 and 1.2 do it; the main
difference is that we now have the ability to start individual export
clients.
The intention is not to illustrate this here; I have no idea how that is
going to work yet. ;)

By the way, NFS exports are intentionally started apart from Samba
exports: Samba exports generally bind to IP addresses in the resource
group (ugly), but NFS exports need no such thing, and in fact, exporting
after the IP address comes up causes problems with failover and/or
service relocation.

group_start()
{
	for (each group) {
		if (group_start(group) != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		if (start_mount() != SUCCESS)
			return FAIL;

		for (each export) {
			if (type != nfs)
				continue;
			for (each client)
				/* Log errors */
				start_client(export_directory);
		}
	}

	for (each ip) {
		if (start_ip() != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		for (each export) {
			if (type != samba)
				continue;
			if (start_samba(export) != SUCCESS)
				return FAIL;
		}
	}

	for (each script) {
		if (start_script() != SUCCESS)
			return FAIL;
	}

	return SUCCESS;
}

group_stop()
{
	for (each script) {
		if (stop_script() != SUCCESS)
			return FAIL;
	}

	for (each ip) {
		if (stop_ip() != SUCCESS)
			return FAIL;
	}

	for (each mount) {
		for (each export) {
			if (type == nfs) {
				for (each client)
					/* Log errors */
					stop_client(export_directory);
			} else {
				stop_samba(export);
			}
		}

		if (stop_mount() != SUCCESS)
			return FAIL;
	}

	for (each group) {
		if (group_stop(group) != SUCCESS)
			return FAIL;
	}

	return SUCCESS;
}
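The <ip> rule above - an IPv6 address corresponding to an IPv4 address
specified in another <ip> resource is not allowed - amounts to detecting
v4-mapped IPv6 addresses (::ffff:a.b.c.d).  A minimal sketch of such a
check; the function name is illustrative, not part of the proposed
implementation:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Return 1 if 'addr6' is a v4-mapped IPv6 address (::ffff:a.b.c.d) whose
 * embedded IPv4 address equals 'addr4'; 0 otherwise (including
 * unparseable input).  Illustrative only - not a proposed interface.
 */
static int
ipv6_collides_with_ipv4(const char *addr6, const char *addr4)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (inet_pton(AF_INET6, addr6, &a6) != 1 ||
	    inet_pton(AF_INET, addr4, &a4) != 1)
		return 0;

	if (!IN6_IS_ADDR_V4MAPPED(&a6))
		return 0;

	/* The low-order 32 bits of a v4-mapped address hold the IPv4 addr */
	return memcmp(&a6.s6_addr[12], &a4.s_addr, 4) == 0;
}
```

A configuration parser could run this against every (IPv6, IPv4) pair of
<ip> resources and reject the configuration on a match.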
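The <group> rule above - a group that is depended upon may not itself
depend on another group - keeps the dependency tree at most one level
deep, which also rules out cycles.  A sketch of a validation pass under
that rule; the struct layout and function name here are invented for
illustration, not part of the proposal:

```c
#include <stddef.h>

/* Hypothetical in-memory form of a <group>; 'deps' lists child groups. */
struct group {
	const char *name;
	struct group **deps;	/* NULL-terminated array, or NULL */
};

/*
 * Return 1 if the dependency rule holds for 'g': every group that 'g'
 * depends on must itself be dependency-free.  Since no chain longer than
 * parent -> child can exist, circular dependencies are impossible.
 */
static int
group_deps_valid(const struct group *g)
{
	size_t i;

	if (!g->deps)
		return 1;

	for (i = 0; g->deps[i]; i++) {
		/* A dependent child may not depend on anything. */
		if (g->deps[i]->deps && g->deps[i]->deps[0])
			return 0;
	}

	return 1;
}
```

So A depending on B passes, but A depending on B while B depends on C
fails validation.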