Re: Freeze with cluster-2.03.11

Wendy Cheng <s.wendy.cheng@xxxxxxxxx> · Sat, 28 Mar 2009 11:07:18 -0500

Kadlecsik Jozsef wrote:
I don't see strong evidence of a deadlock in the thread backtraces
(though it could be one). However, assuming the cluster worked before,
you could have overloaded the e1000 driver in this case. There are
suspicious page faults, but memory looks very "ok". So one possibility is
that GFS generated too many sync requests and flooded the e1000. As a
result, the cluster heartbeat missed its interval.

It's a possibility. But it also assumes that the node freezes >because<
it was fenced off. So far nothing indicates that.

Re-read your console log. There are many footprints of spin_lock, which
is worrisome. Next time you have a hang, hit "sysrq-w" a couple of times,
in addition to sysrq-t. This should give traces of the threads that are
actively on CPUs at that time. Also check your kernel changelog to see
whether GFS has any new patch touching spin locks that was not in the
previous release.
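
If the console keyboard is unusable during a hang, the same sysrq
requests can be issued from a root shell or a small script. Below is a
rough sketch only, assuming the standard /proc/sysrq-trigger interface is
available; the helper names, the repeat count, and the delay are made up
for illustration. What each key prints depends on the kernel build: in
mainline kernels "w" dumps blocked tasks and "l" shows backtraces for the
active CPUs, so check the sysrq help output on your own kernel first.

```python
import time

# Standard Linux procfs path; writing to it normally requires root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Send one sysrq command character (hypothetical helper)."""
    if len(key) != 1:
        raise ValueError("sysrq key must be a single character")
    with open(path, "w") as trigger:
        trigger.write(key)


def capture_hang_traces(keys=("w", "t"), repeats=2, delay=1.0,
                        path=SYSRQ_TRIGGER):
    """Issue each key `repeats` times so successive traces can be compared.

    The defaults (two rounds, one second apart) are assumptions, not a
    recommendation from this thread. Returns the keys sent, in order.
    """
    sent = []
    for _ in range(repeats):
        for key in keys:
            trigger_sysrq(key, path)
            sent.append(key)
        time.sleep(delay)
    return sent
```

Calling capture_hang_traces() as root during a hang should leave the
traces in the kernel log (console or dmesg), where two rounds can be
compared to see which threads stay stuck on the same spin lock.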

BTW, I do have opinions on other parts of your postings but don't have
time to express them now. Maybe I'll say something when I finish my
current chores :) ... Need to rush out now. Good luck with your debugging!

-- Wendy
