On Tue, Jan 04, 2005 at 02:46:17PM -0800, Daniel McNeil wrote:
>
> One thing I do not understand is that I am leaving the nodes in the
> cluster and just doing mounting and umounting, so the generation number
> should not be changing.
>
> I think you are saying that the lock traffic is so high that the
> heartbeats are lost, so the node being kicked out is seeing the new
> heartbeat from the other nodes and doesn't know they are not receiving
> his heartbeat messages. This node must be seeing the other nodes'
> heartbeat messages or it would have started a membership transition
> without the other nodes. Do I have this right?

Yes, I think. It's all a bit vague. If it wasn't I might have an answer by
now :-(

> Shouldn't the heartbeat messages have higher priority
> over the lock traffic messages?

They do. That's why I am puzzled. I'm currently investigating if the
heartbeat thread is being starved of CPU time by either the DLM or GFS.

> Shouldn't there be a way of throttling back the lock traffic and seeing
> if the heartbeat connection can be re-established before starting a
> membership transition?

DLM & CMAN are not that tightly coupled.

--
patrick