Re: STONITH

Lon Hohberger <lhh@xxxxxxxxxx> · Mon, 09 Oct 2006 16:53:24 -0400

On Fri, 2006-10-06 at 12:10 +0100, Grant Waters wrote:

> Power cycling both nodes and the array fixes the problem, but I
> want to know what's causing it in the first place.  It doesn't appear
> to be related to load, although I can't rule that out - both outages
> were at approx 04:40 on a Friday. 

The tg3 link mysteriously disappearing/reappearing looks like the
culprit.  clumanager doesn't control those kinds of things...

(a) Raise the failover interval to 30 seconds.  If it's just a flaky
card/driver/cable/etc., this buys more time.

(b) cludb -p clumembd%rtp 10

If you think it's a scheduling problem.

(c) cludb -p cluster%msgsvc_noarp 1 

Gets rid of "SIOCGARP..." errors.

(d) cludb -p clulockd%loglevel 4

Because clulockd @ debug level is a waste of resources.
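
If you'd rather apply (b) through (d) in one go, something like the
following sketch might do.  The cludb invocations are exactly the ones
above; the DRY_RUN variable and the two function names are made up for
this example and are not part of clumanager.  By default it only prints
what it would run; set DRY_RUN=0 to actually call cludb.

```shell
#!/bin/sh
# Sketch only: batches the three cludb settings suggested in (b)-(d).
# DRY_RUN and both function names are hypothetical helpers for this
# example, not clumanager features.
DRY_RUN="${DRY_RUN:-1}"

apply_setting() {
    # $1 = key (e.g. clumembd%rtp), $2 = value
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "cludb -p $1 $2"
    else
        cludb -p "$1" "$2"
    fi
}

apply_suggested_settings() {
    apply_setting 'clumembd%rtp' 10          # (b) in case it's a scheduling problem
    apply_setting 'cluster%msgsvc_noarp' 1   # (c) gets rid of the SIOCGARP errors
    apply_setting 'clulockd%loglevel' 4      # (d) stop clulockd logging at debug level
}

apply_suggested_settings
```

Run it once as-is to check the printed commands, then again with
DRY_RUN=0 on the cluster member.  The failover interval from (a) is not
covered here.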

-- Lon
