Hello Daniel,

the first issue sounds like a network problem. For RHEL5 you have to enable
IGMP and multicast traffic forwarding on some network switches (on the
logging network).

Regards,
Michael

On Thursday 07 August 2008, dake@xxxxxxxxxx wrote:
> Hello folks,
>
> we've been having two nasty problems with a GFS cluster, currently
> running version 2.03.03 of the cluster suite and 0.80.3 of OpenAIS.
>
> The first is that for some time now, logging has been broken. We're
> getting kernel log messages from the DLM and GFS modules, but the
> userland utilities (i.e. OpenAIS) refuse to log at all when used with
> the cluster suite. Logging is fine when started without it (i.e. the
> default OpenAIS config file), so I'm pretty sure it's not the logging
> setup. Somehow, it seems that OpenAIS is not being given correct
> logging parameters by CMAN, and I really don't know why. I've tried
> including extra logging directives in cluster.conf, in various
> different forms, but to no avail. The cluster.conf we're using now is
> as follows:
>
> <?xml version="1.0"?>
> <cluster name="gfscluster" config_version="6">
>
>   <clusternodes>
>     <clusternode name="smb1-cluster" nodeid="1">
>       <fence>
>         <method name="powerswitch">
>           <device name="powerswitch" port="1"/>
>         </method>
>         <method name="last_resort">
>           <device name="manual" nodename="smb1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="smb2-cluster" nodeid="2">
>       <fence>
>         <method name="powerswitch">
>           <device name="powerswitch" port="2"/>
>         </method>
>         <method name="last_resort">
>           <device name="manual" nodename="smb2"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="mail-cluster" nodeid="3">
>       <fence>
>         <method name="powerswitch">
>           <device name="powerswitch" port="3"/>
>         </method>
>         <method name="last_resort">
>           <device name="manual" nodename="mail"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="backup-cluster" nodeid="4">
>       <fence>
>         <method name="powerswitch">
>           <device name="powerswitch" port="4"/>
>         </method>
>         <method name="last_resort">
>           <device name="manual" nodename="backup"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>
>   <fencedevices>
>     <fencedevice name="powerswitch" agent="fence_epc"
>                  host="192.168.10.xx" passwd="xxx" action="4"/>
>     <fencedevice name="manual" agent="fence_manual"/>
>   </fencedevices>
>
>   <fence_daemon post_join_delay="30">
>   </fence_daemon>
>
>   <logging to_syslog="yes" syslog_facility="local3">
>     <logger ident="CPG" to_syslog="yes">
>     </logger>
>     <logger ident="CMAN" to_syslog="yes">
>     </logger>
>     <logger ident="CLM" to_syslog="yes">
>     </logger>
>   </logging>
>
> </cluster>
>
> Any idea why this might not be working?
>
> The second problem is that once quorum is reached, any additional
> nodes joining will make the existing quorate cluster break apart. This
> behaviour has been seen in a three-node config with the third node
> joining, and in a four-node config with the fourth node joining. WHICH
> node is the last to join doesn't seem to make a difference. The
> "breaking apart" means that the newly joined node dies ("joining
> cluster with disallowed nodes, must die"), one of the existing nodes
> dies, and two of the other existing nodes keep running, but desynced -
> both show differing cluster membership and differing disallowed nodes.
> This is after a fresh reboot, so there is NO state in any node before
> joining. The crash occurs at the cman_tool join stage.
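
For the second problem, it would help to see how far the membership views
actually diverge and whether the totem multicast traffic gets between the
nodes at all. A rough sketch of what I would run on each node (the interface
name eth1 and port 5405 are only examples; 5405 is the usual OpenAIS/cman
default, so adjust if you have configured a different port):

  # run on every node and compare the output
  cman_tool status   # quorum state and the multicast address in use
  cman_tool nodes    # cluster membership as seen by this node

  # watch for IGMP and totem traffic on the cluster interface
  tcpdump -n -i eth1 'igmp or udp port 5405'

If the joining node's packets never show up on the existing members (or the
other way round), that points back at IGMP snooping / multicast forwarding
on the switch.
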
> I have a gut feeling it might have something to do with our network
> config, which has a total of four ethernet interfaces in three of the
> nodes, and two in the fourth. The first three have two iSCSI
> interfaces, plus one interface for cluster use and one for regular LAN
> access. The fourth has only one iSCSI interface and no LAN access for
> now. Routing tables etc. should be set up properly; as you can see
> above, cluster.conf uses dedicated hostnames for the cluster
> interfaces, which are resolved to IPs via /etc/hosts files that are
> identical on all four machines. I have yet to do any packet sniffing,
> and I have very little information log-wise due to the first problem,
> so I'm aware this is not much to go on; but I thought I might include
> it anyway, in case someone can immediately point out the problem.
>
> Thanks in advance,
> Daniel

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster