On 12/10/2011 03:32 PM, Matthew Painter wrote:
> Hi all,
>
> We are trying to get to the bottom of some odd intermittent behavior on
> a cluster. We are intermittently seeing nodes leave and rejoin clusters,
> without being fenced. Further, the gap between leaving and re-joining is
> 8 minutes. We are monitoring the latency between boxes, and it is
> acceptable (<5ms).
>
> How can nodes exhibit this behavior? There seems to be no impact on the
> services running on the box, just this leaving and re-joining. The SNMP
> messages are below.
>
> All help decoding this gratefully received! :)
>
> Thanks,
>
> Matt
>
>
> Sat Dec 10 15:22:00 GMT 2011: cluster3.localdomain
> DISMAN-EVENT-MIB::sysUpTimeInstance = 3:2:52:23.35,
> SNMPv2-MIB::snmpTrapOID.0 = COROSYNC-MIB::corosyncNoticesNodeStatus,
> COROSYNC-MIB::corosyncObjectsNodeName.0 = "cluster1.localdomain",
> COROSYNC-MIB::corosyncObjectsNodeID.0 = 1,
> COROSYNC-MIB::corosyncObjectsNodeAddress.0 = "10.79.202.1",
> COROSYNC-MIB::corosyncObjectsNodeStatus.0 = "left"
>
> Sat Dec 10 15:30:25 GMT 2011: cluster3.localdomain
> DISMAN-EVENT-MIB::sysUpTimeInstance = 3:3:00:48.75,
> SNMPv2-MIB::snmpTrapOID.0 = COROSYNC-MIB::corosyncNoticesNodeStatus,
> COROSYNC-MIB::corosyncObjectsNodeName.0 = "cluster1.localdomain",
> COROSYNC-MIB::corosyncObjectsNodeID.0 = 1,
> COROSYNC-MIB::corosyncObjectsNodeAddress.0 = "10.79.202.1",
> COROSYNC-MIB::corosyncObjectsNodeStatus.0 = "joined"

My first instinct is to point to multicast issues in your switch, but
then I'd expect the node to get fenced. Any unexpected disconnect should
trigger a fence, so it looks more like the node is cleanly stopping and
restarting corosync.

Can you share your configuration and, ideally, anything in syslog from
all involved nodes, starting from just before the disconnect and
continuing through to after the node rejoins?
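As a quick cross-check of the reported gap, the two sysUpTimeInstance
values in the traps can be subtracted from each other. This is only a
rough sketch: it assumes the values are rendered as
days:hours:minutes:seconds.hundredths, and parse_sysuptime is a made-up
helper name, not part of any SNMP tooling.

```python
from datetime import timedelta


def parse_sysuptime(value: str) -> timedelta:
    """Parse an uptime string, ASSUMED to be days:hours:minutes:seconds.hundredths."""
    days, hours, minutes, seconds = value.split(":")
    return timedelta(
        days=int(days),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )


# sysUpTimeInstance values copied from the "left" and "joined" traps.
left = parse_sysuptime("3:2:52:23.35")
joined = parse_sysuptime("3:3:00:48.75")

# Time between the "left" and "joined" notices as seen by the trap sender.
gap = joined - left
print(f"gap between left and joined: {gap.total_seconds():.2f} seconds")
```

The result is a little over eight minutes, which lines up with the
wall-clock timestamps on the two traps (15:22:00 and 15:30:25).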
--
Digimer
E-Mail: digimer@xxxxxxxxxxx
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again. stupid hawking radiation." - epitron

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster