On Thu, Mar 26, 2009 at 11:47:00PM +0100, Kadlecsik Jozsef wrote:
> Hi,
>
> Freshly built cluster-2.03.11 reproducibly freezes as mailman started.
> The versions are:
>
> linux-2.6.27.21
> cluster-2.03.11
> openais from svn, subrev 1152 version 0.80

So, in summary:

- nodes 1-5 are correctly forming a cluster, and appear to be stable
- nodes 1-5 all correctly mount the gfs file system
- node5 runs: init.d/mailman start
- node5 "freezes completely"
- node5 is fenced by another node, e.g. node4
- sometimes, node4 then freezes completely

You're using STABLE2 code, which is equivalent to RHEL5 code *except* for
the gfs-kernel patches that are necessary to make gfs run on recent kernels.
The RHEL5 code is thoroughly tested, but the STABLE2 code is not, so any
differences between them (i.e. the gfs-kernel patches for recent kernels)
are the most likely causes of regression bugs.

It's always possible that a patch like the one in bz 466645 could be
responsible, but that's less likely, since it does go through a QE process,
unlike the patches for kernel updates.

Hopefully, some gfs developers can look at the backtraces (which, as Wendy
points out, do look suspicious) and try to reproduce this problem with
recent kernels.

Aside from gfs, the fact that you're running AoE over the same network as
openais does raise some flags. We've seen problems with openais in the past
when block i/o sent over the same network caused load problems. It seems
unlikely to be your problem, though, since it works fine with the previous
version, and the freezing symptoms aren't what we'd expect to see from
openais trouble.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster