Re: Nodes leaving and re-joining intermittently (Matthew Painter)

>> We are trying to get to the bottom of some odd intermittent behavior
>> on a cluster. We are intermittently seeing nodes leave and rejoin
>> clusters, without being fenced. Further, the gap between leaving and
>> re-joining is 8 minutes. We are monitoring the latency between boxes,
>> and it is acceptable (<5ms).

From my recent experience, the first thing I would check is the
multicast configuration and behavior. I've deployed a couple dozen
2-3-node clusters (with GFS2) in three different data centers with three
seriously different network configurations, and multicast is always an
issue. RH Knowledgebase article
https://access.redhat.com/kb/docs/DOC-39175 has a Python script,
multicast.py, which exercises multicast from both the client and server
ends. It has come in very handy. Yours sounds like an intermittent
problem, in which case I might alter the script to reduce the traffic a
little and run it longer-term as a diagnostic. If you're on RHEL 6.1,
there is an "omping" package in the channel/distro which serves the same
purpose; the article has some info on its use too. HTH......Nick G
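
For illustration only, here is a rough sketch of the kind of low-rate,
long-running multicast check described above. It is not the multicast.py
from the Knowledgebase article. The group address, port, and all function
names are assumptions made up for this example, so substitute values that
match your own network. The sender emits one small numbered probe per
interval, and the receiver records which sequence numbers arrive, so gaps
can be lined up against the times the nodes leave and rejoin.

```python
import socket
import struct
import time

GROUP = "239.192.0.1"  # assumed test group, not taken from the thread
PORT = 50000           # assumed port, chosen to stay clear of cluster traffic

# Probe layout: 32-bit sequence number plus send time, network byte order.
_PROBE = struct.Struct("!Id")


def pack_probe(seq, now):
    """Build one probe payload."""
    return _PROBE.pack(seq, now)


def unpack_probe(data):
    """Return (sequence number, send time) from a probe payload."""
    return _PROBE.unpack(data[:_PROBE.size])


def missing_sequences(seen):
    """List sequence numbers absent between the lowest and highest seen."""
    if not seen:
        return []
    got = set(seen)
    return [s for s in range(min(got), max(got) + 1) if s not in got]


def send_probes(count, interval=1.0, group=GROUP, port=PORT, ttl=1):
    """Send `count` probes, one every `interval` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        for seq in range(count):
            sock.sendto(pack_probe(seq, time.time()), (group, port))
            time.sleep(interval)
    finally:
        sock.close()


def receive_probes(duration, group=GROUP, port=PORT):
    """Join the group for `duration` seconds; return sequence numbers seen."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: group address followed by the local interface (any).
    mreq = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(1.0)
    seen = []
    deadline = time.time() + duration
    try:
        while time.time() < deadline:
            try:
                data, _addr = sock.recvfrom(1024)
            except socket.timeout:
                continue
            if len(data) < _PROBE.size:
                continue  # not one of our probes
            seq, _sent = unpack_probe(data)
            seen.append(seq)
    finally:
        sock.close()
    return seen
```

One way to use it: call receive_probes(3600) on one node and
send_probes(3600) on another, then pass the receiver's result to
missing_sequences() to see which probes never arrived. An interval of a
second keeps the traffic low enough to leave running for hours.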


Nick Geovanis
US Cellular/Kforce Inc
e. Nicholas.Geovanis@xxxxxxxxxxxxxx


Message: 1
Date: Sat, 10 Dec 2011 20:32:05 +0000
From: Matthew Painter <matthew.painter@xxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Nodes leaving and re-joining intermittently
Message-ID: <CALj8VcxxvOV_PTT9QZKJYnPuvhjBgoxNETBWxB4uCWCRhkzhSA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Hi all,

We are trying to get to the bottom of some odd intermittent behavior on
a cluster. We are intermittently seeing nodes leave and rejoin clusters,
without being fenced. Further, the gap between leaving and re-joining is
8 minutes. We are monitoring the latency between boxes, and it is
acceptable (<5ms).

How can nodes exhibit this behavior? There seems to be no impact on the
services running on the box, just this leaving and re-joining. The SNMP
messages are below.

All help decoding this gratefully received! :)

Thanks,

Matt



--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

