Hi everybody,

as briefly mentioned in the 3.0.4 release notes, a new system to validate
the configuration has been enabled in the code.

What it does
------------

The general idea is to perform as many sanity checks on the configuration
as possible. This check allows us to spot the most common mistakes in
cluster.conf, such as typos or invalid values.

Configuring the validation
--------------------------

The validation system is integrated in several components. It supports one
config option that can take 3 values.

Via the init script (or /etc/sysconfig/cman, or your distro's equivalent):

CONFIG_VALIDATION=value

where value can be:

1) FAIL - enables a very strict check. Even a simple typo will cause the
   configuration to fail to load.
2) WARN - the check is relaxed. Warnings are printed on the screen, but the
   cluster will continue to load. (default)
3) NONE - disables the config validation system. (discouraged!)

This is equivalent to:

cman_tool join/version -D(FAIL|WARN|NONE)

What a user sees
----------------

The output of the validation process is very cryptic. Yes, we are
absolutely aware of that, and we are working on making it easier to
understand (if anybody has Relax-NG experience, please contact us).

This is the typical output from a normal startup (the configuration
contains no errors or warnings):

[root@fedora-rh-node1 ~]# /etc/init.d/cman start join
Starting cluster:
   Global setup...                                         [ OK ]
   Loading kernel modules...                               [ OK ]
   Mounting configfs...                                    [ OK ]
   Setting network parameters...                           [ OK ]
   Starting cman...                                        [ OK ]
[root@fedora-rh-node1 ~]#

This is the output with a typo in cluster.conf (running in WARN mode):

[root@fedora-rh-node1 ~]# /etc/init.d/cman start join
Starting cluster:
   Global setup...                                         [ OK ]
   Loading kernel modules...                               [ OK ]
   Mounting configfs...                                    [ OK ]
   Setting network parameters...                           [ OK ]
   Starting cman...
tempfile:22: element quorum: Relax-NG validity error : Element cluster has extra content: quorum
Configuration fails to validate
                                                           [ OK ]

The error in this specific case is that the quorum element is wrong and
should be quorumd (for qdisk). As you can see, the output is not easy to
understand without a good knowledge of Relax-NG.

The check also happens before configuration updates via cman_tool
version. Here are 3 examples (I use -S to disable configuration
synchronization on my systems).

With a valid configuration, the reload succeeds silently:

[root@fedora-rh-node1 ~]# cman_tool version -r 2 -S
[root@fedora-rh-node1 ~]#

cman_tool defaults to the strict check, so the same typo as above will
abort the configuration reload:

[root@fedora-rh-node4 ~]# cman_tool version -r 3 -S
tempfile:22: element quorum: Relax-NG validity error : Element cluster has extra content: quorum
Configuration fails to validate
cman_tool: Not reloading, configuration is not valid

Disable the strict check and turn errors into warnings:

[root@fedora-rh-node1 ~]# cman_tool version -r 3 -S -DWARN
tempfile:22: element quorum: Relax-NG validity error : Element cluster has extra content: quorum
Configuration fails to validate
[root@fedora-rh-node1 ~]#

What to do if there are errors
------------------------------

First of all, do NOT panic. This check integration is new, and there
might be several reasons why you see a warning (including bugs in the
validation schema).

Users with XML and Relax-NG experience should be able to sort it out
easily. For everybody else, we strongly recommend filing a bug on
bugzilla.redhat.com, including /etc/cluster/cluster.conf _AND_
/usr/share/cluster/cluster.rng. This will allow us to cross-check bugs in
our validation code/schema and help users fix their configuration files.

Using the ccs_config_validate standalone command
------------------------------------------------

Validation of a configuration is an important step.
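The messages shown earlier follow libxml2's standard Relax-NG error format, so users comfortable with XML can run a similar check by hand, for example before filing a bug. The sketch below is an illustration only. It assumes that libxml2's xmllint is installed and that /usr/share/cluster/cluster.rng is the schema to check against. The function name check_cluster_conf is hypothetical and not part of the cluster packages, and its results may not match the cluster tools exactly.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with cman: check_cluster_conf [CONF] [SCHEMA]
# Assumption: libxml2's xmllint is available and the Relax-NG schema
# at /usr/share/cluster/cluster.rng is the one to validate against.
check_cluster_conf() {
    conf="${1:-/etc/cluster/cluster.conf}"
    schema="${2:-/usr/share/cluster/cluster.rng}"
    # --noout: do not echo the parsed document back.
    # Validity errors go to stderr in the same
    # "file:line: element ...: Relax-NG validity error : ..." form.
    if xmllint --noout --relaxng "$schema" "$conf"; then
        echo "$conf: valid"
    else
        echo "$conf: NOT valid against $schema" >&2
        return 1
    fi
}
```

Source the file and run check_cluster_conf with no arguments to check the default paths. Pass a path as the first argument to check a candidate file before copying it into /etc/cluster.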
ccs_config_validate is a very powerful and flexible tool, but it requires
an understanding of the config subsystem to be used correctly.

The general/average user can simply invoke ccs_config_validate with no
options and will see the same results as when it is invoked via cman_tool.
This works because it loads the same environment variables as the cman
init script, respects those selections and performs the required actions
accordingly.

There are advanced use cases for the tool, for example migrating from one
config subsystem to another (from cluster.conf to LDAP, for example), but
generally anyone who needs to make changes of this magnitude is also
expected to have a good understanding of the configuration subsystem (a
new document for both developers and advanced users will be available
shortly).

Please do not hesitate to ask for clarifications or report bugs.

Cheers
Fabio

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster