On Fri, 2006-10-06 at 12:10 +0100, Grant Waters wrote:
> Power cycling both nodes and the array fixes the problem, but I
> want to know what's causing it in the first place. It doesn't appear
> to be related to load, although I can't rule that out - both outages
> were at approx 04:40 on a Friday.

The tg3 link mysteriously disappearing/reappearing looks like the
culprit. clumanager doesn't control those kinds of things...

(a) Up the failover interval to 30sec.

    If it's just a flaky card/driver/cable/etc., this buys more time.

(b) cludb -p clumembd%rtp 10

    If you think it's a scheduling problem.

(c) cludb -p cluster%msgsvc_noarp 1

    Gets rid of "SIOCGARP..." errors.

(d) cludb -p clulockd%loglevel 4

    Because clulockd @ debug level is a waste of resources.

-- Lon
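
If it helps, the three cludb settings from (b) through (d) could be applied in one pass with something like the sketch below. The function names and the DRY_RUN variable are made up for illustration and are not part of clumanager. Only the "cludb -p key value" form shown above is used. By default the script just prints the commands so they can be reviewed first. The failover interval from (a) is not covered, since no command for it is given here.

```shell
#!/bin/sh
# Sketch only: apply the cludb settings suggested in (b), (c) and (d).
# DRY_RUN is a hypothetical switch for this script, not a cludb feature.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# apply_setting KEY VALUE
# Prints or runs "cludb -p KEY VALUE", depending on DRY_RUN.
apply_setting() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "cludb -p $1 $2"
    else
        cludb -p "$1" "$2"
    fi
}

# apply_settings
# The three key/value pairs are taken verbatim from the suggestions above.
apply_settings() {
    apply_setting 'clumembd%rtp' 10
    apply_setting 'cluster%msgsvc_noarp' 1
    apply_setting 'clulockd%loglevel' 4
}

apply_settings
```

Run it once as-is to check the output, then again with DRY_RUN=0 on a cluster member to make the changes.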