Axel Thimm wrote:
> On Mon, Oct 03, 2005 at 10:31:02AM +0100, Patrick Caulfield wrote:
>
>>Axel Thimm wrote:
>>
>>>On Mon, Oct 03, 2005 at 07:59:22AM +0100, Patrick Caulfield wrote:
>>>
>>>>Axel Thimm wrote:
>>>>
>>>>>On Thu, Jul 14, 2005 at 04:57:51PM -0400, Manuel Bujan wrote:
>>>>>
>>>>>>Is there any issue I should be aware of if SMP is enabled in
>>>>>>my kernel? What if I compile my kernel to be pre-emptible?
>>>>>>Any problem with that and GFS?
>>>>
>>>>Pre-emptible kernels will not work with GFS, that's certain.
>>>
>>>My report was on a RHEL4 kernel.
>>
>>...but you did ask about pre-emptible kernels :)
>
> No, I didn't, that was Manuel Bujan 6 weeks ago. ;)
>
> I replied that I saw the same einval messages on a RHEL4 kernel.
>
>>The important messages here are these:
>>
>>>Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : Missed too many heartbeats (P:kernel)
>>>Sep 30 05:08:39 zs03 kernel: CMAN: removing node zs01 from the cluster : No response to messages (P:kernel)
>>
>>showing that a node has been kicked out of the cluster for not responding
>>quickly enough to messages. You could try increasing the value in
>>
>>/proc/cluster/config/cman/max_retries
>
> I know, but that doesn't explain the einval messages, or does it? Or
> formulated differently: the einval messages show that the dual Xeon
> box had some issues with sockets and its being kicked out could be
> just a symptom of that.

It probably does explain them. If the node is kicked out of the cluster,
the DLM starts returning -EINVAL from lock ops (because the lockspace no
longer exists). This very often causes the GFS lock_dlm module to oops.

The bugzilla entries are confused about this, but it sort-of exists as

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=165160

--
patrick

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
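
To line up the -EINVAL errors with the moment a node was removed, the CMAN removal messages quoted above can be pulled out of syslog with a few lines of Python. This is only a minimal sketch: it assumes the log lines have exactly the format shown in the quoted report, and `find_cman_removals` and the regular expression are hypothetical names made up for this example, not part of CMAN, DLM or GFS.

```python
import re

# Assumption: syslog lines look like the ones quoted in this thread, e.g.
# "Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : <reason> (P:kernel)"
CMAN_REMOVAL = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"CMAN: removing node (?P<node>\S+) from the cluster\s*:\s*"
    r"(?P<reason>.*?)\s*(?:\(P:\w+\))?$"
)


def find_cman_removals(lines):
    """Return one dict per CMAN node-removal line (illustrative helper).

    Each dict has the keys 'stamp', 'host', 'node' and 'reason'.
    Lines that do not match the assumed format are skipped.
    """
    events = []
    for line in lines:
        match = CMAN_REMOVAL.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


if __name__ == "__main__":
    # The two messages from the report, joined back onto single lines.
    sample = [
        "Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : "
        "Missed too many heartbeats (P:kernel)",
        "Sep 30 05:08:39 zs03 kernel: CMAN: removing node zs01 from the cluster : "
        "No response to messages (P:kernel)",
    ]
    for event in find_cman_removals(sample):
        print(event["stamp"], event["node"], "-", event["reason"])
```

If the first -EINVAL in the log appears after one of these timestamps, that would be consistent with the lockspace having gone away when the node was removed.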