Re: [Linux-cluster] cman bad generation number


 



On Tue, Jan 04, 2005 at 02:46:17PM -0800, Daniel McNeil wrote:
> 
> One thing I do not understand is that I am leaving the nodes in the
> cluster and just doing mounting and umounting, so the generation number
> should not be changing.
> 
> I think you are saying that the lock traffic is so high that heartbeats
> are lost, so the node being kicked out still sees the new heartbeats
> from the other nodes and doesn't know they are not receiving its
> heartbeat messages.  This node must be seeing the other nodes'
> heartbeat messages, or it would have started a membership transition
> without the other nodes.  Do I have this right?

Yes, I think. It's all a bit vague. If it weren't, I might have an answer by now :-(
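To make the asymmetric failure described in the question concrete, here is a toy Python model. It is an illustrative assumption, not CMAN's actual membership code: each node records when it last heard a heartbeat from each peer and declares a peer dead after a hypothetical `DEAD_TIMEOUT`. The names `NodeView`, `simulate` and `DEAD_TIMEOUT` are made up for this sketch. If only one node's outgoing heartbeats are lost (for example because its sending thread is starved), the other nodes drop it while it still sees everyone else as alive.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical timeout (seconds without a heartbeat before a peer is
# considered dead). Not a real CMAN parameter name or default value.
DEAD_TIMEOUT = 5.0


@dataclass
class NodeView:
    """One node's view of the cluster, built only from heartbeats it receives."""
    name: str
    last_seen: dict[str, float] = field(default_factory=dict)

    def receive_heartbeat(self, sender: str, now: float) -> None:
        self.last_seen[sender] = now

    def dead_peers(self, now: float) -> set[str]:
        # A peer is dead in *this* node's view if it has been silent too long.
        return {p for p, t in self.last_seen.items() if now - t > DEAD_TIMEOUT}


def simulate(nodes: list[str], dropped_senders: set[str],
             duration: float, interval: float = 1.0) -> dict[str, set[str]]:
    """Every node broadcasts a heartbeat once per interval. Heartbeats sent BY
    a node in dropped_senders are lost, but that node still receives everyone
    else's. Returns each node's set of peers it considers dead at the end."""
    views = {n: NodeView(n) for n in nodes}
    t = 0.0
    # Start with a healthy cluster: everyone has just heard from everyone.
    for view in views.values():
        for peer in nodes:
            if peer != view.name:
                view.receive_heartbeat(peer, t)
    while t < duration:
        t += interval
        for sender in nodes:
            if sender in dropped_senders:
                continue  # this node's outgoing heartbeats never arrive
            for receiver in nodes:
                if receiver != sender:
                    views[receiver].receive_heartbeat(sender, t)
    return {n: v.dead_peers(t) for n, v in views.items()}


if __name__ == "__main__":
    print(simulate(["a", "b", "c"], {"c"}, duration=10))
```

With node `c`'s outgoing heartbeats dropped for 10 seconds, `a` and `b` both consider `c` dead, while `c` still hears from `a` and `b` and so has no reason of its own to start a membership transition, which matches the situation the question describes.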
 
> Shouldn't the heartbeat messages have higher priority
> over the lock traffic messages? 

They do. That's why I am puzzled. I'm currently investigating whether the
heartbeat thread is being starved of CPU time by either the DLM or GFS.
 
> Shouldn't there be a way of throttling back the lock traffic and seeing
> if heartbeat connection can be re-established before starting a
> membership transition?

DLM & CMAN are not that tightly coupled.

-- 

patrick

