Hi Guido,

On Mon, 2009-06-29 at 20:48 +0200, Guido Günther wrote:
> Hi Fabione,
> Thanks for rolling this rc candidate!
>
> On Sat, Jun 20, 2009 at 01:19:49PM +0200, Fabio M. Di Nitto wrote:
> [..snip..]
> > In order to build the 3.0.0.rc3 release you will need:
> >
> > - corosync 0.98
> > - openais 0.97
> We used these without any patches.
>
> > - linux kernel 2.6.29
> We were running against 2.6.30.

Shouldn't be a problem. You simply won't be able to build or use gfs1.

>
> We observed these issues:
>
> fenced segfaults with:
>
> (gdb) bt
> #0 0x00007f8e293508fe in fence_node (victim=0x114b510 "node1.foo.bar", log=0x61e0a0, log_size=32, log_count=0x7fff2e46a634) at /var/home/schmitz/3/redhat-cluster/fence/libfence/agent.c:156
> #1 0x000000000040c5cd in fence_victims (fd=0x114f270) at /var/home/schmitz/3/redhat-cluster/fence/fenced/recover.c:319
> #2 0x0000000000405f27 in apply_changes (fd=0x114f270) at /var/home/schmitz/3/redhat-cluster/fence/fenced/cpg.c:1056
> #3 0x00007f8e2914bcc1 in cpg_dispatch () from /usr/lib/libcpg.so.4
> #4 0x0000000000404588 in process_fd_cpg (ci=4) at /var/home/schmitz/3/redhat-cluster/fence/fenced/cpg.c:1351
> #5 0x000000000040b0f7 in main (argc=<value optimized out>, argv=<value optimized out>) at /var/home/schmitz/3/redhat-cluster/fence/fenced/main.c:818
>
> this leads to
>
> 1246297857 fenced 3.0.0.rc3 started
> 1246297857 our_nodeid 1 our_name node2.foo.bar
> 1246297857 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/fenced.log
> 1246297857 found uncontrolled entry /sys/kernel/dlm/rgmanager

It looks to me like the node was not shut down properly and an attempt to restart it failed. The fenced segfault shouldn't happen, but I am CC'ing David. Maybe he has a better idea.

>
> when trying to restart fenced. Since this is not possible one has to
> reboot the node.
>
> We're also seeing:
>
> Jun 29 19:29:03 node2 kernel: [ 50.149855] dlm: no local IP address has been set
> Jun 29 19:29:03 node2 kernel: [ 50.150035] dlm: cannot start dlm lowcomms -107

Hmm, this looks to me like a bad configuration or a bad startup. IIRC the kernel dlm is configured via configfs, and configfs was probably not mounted by the init script.

>
> from time to time. Stopping/starting via cman's init script (as from the
> Ubuntu package) several times makes this go away.
>
> Any ideas what causes this?

Could you please try to use our upstream init scripts? They work just fine (unchanged) in Ubuntu/Debian environments, and they are certainly a lot more robust than the ones I originally wrote for Ubuntu many years ago.

Could you also please summarize your setup and config? I assume you did the normal checks such as cman_tool status, cman_tool nodes and so on...

The usual extra things I'd check are:

- make sure the hostname doesn't resolve to localhost but to the real IP address of the cluster interface
- cman_tool status
- cman_tool nodes
- Before starting any kind of service, such as rgmanager or gfs*, make sure that the fencing configuration is correct. Test by using fence_node $nodename.

Cheers
Fabio
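For reference, the hostname and configfs checks mentioned above could be scripted roughly as follows. This is only a minimal sketch in Python, not what the upstream init scripts do: the function names (is_loopback, hostname_resolves_to_loopback, configfs_mounted) are made up for illustration, and it assumes a Linux node where /proc/mounts is readable.

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """Return True if addr (IPv4 or IPv6 text) is a loopback address."""
    # Strip an IPv6 scope id such as "%eth0" before parsing.
    return ipaddress.ip_address(addr.split("%")[0]).is_loopback


def hostname_resolves_to_loopback(hostname: str) -> bool:
    """Return True if any address the hostname resolves to is loopback."""
    infos = socket.getaddrinfo(hostname, None)
    return any(is_loopback(info[4][0]) for info in infos)


def configfs_mounted(mounts_text: str) -> bool:
    """Return True if /proc/mounts-style text lists a configfs filesystem."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field layout: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "configfs":
            return True
    return False


def main() -> None:
    name = socket.gethostname()
    try:
        if hostname_resolves_to_loopback(name):
            print(f"WARNING: {name} resolves to a loopback address")
        else:
            print(f"OK: {name} does not resolve to loopback")
    except socket.gaierror as exc:
        print(f"WARNING: could not resolve {name}: {exc}")

    # Assumption: /proc/mounts exists (Linux only).
    with open("/proc/mounts") as fh:
        if configfs_mounted(fh.read()):
            print("OK: configfs is mounted")
        else:
            print("WARNING: configfs is not mounted")


if __name__ == "__main__":
    main()
```

Running it on each node before starting cman gives a quick hint whether either of the two suspected causes applies; it does not replace cman_tool status, cman_tool nodes or a fence_node test.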