Hi Lon,
Thank you very much for your reply. I'll try your tips.
Now another question: is it really necessary to pass the
"nmi_watchdog=1" parameter to the kernel, or is it enabled by default
under RHEL v3 or v4?
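One way to see what a running box was actually booted with is to look for the parameter in /proc/cmdline, which holds the kernel's boot parameters. Below is a minimal sketch under that assumption. The helper names are hypothetical, and the check only tells you whether "nmi_watchdog=" was passed explicitly. It does not answer whether a given RHEL kernel enables the NMI watchdog by default.

```python
from pathlib import Path
from typing import Optional


def nmi_watchdog_param(cmdline: str) -> Optional[str]:
    """Return the value of the first nmi_watchdog=... token, or None if absent."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # Only accept tokens of the form nmi_watchdog=<value>.
        if key == "nmi_watchdog" and sep:
            return val
    return None


def current_nmi_watchdog_param(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running kernel's boot parameters (Linux only)."""
    return nmi_watchdog_param(Path(path).read_text())


if __name__ == "__main__":
    try:
        print("nmi_watchdog =", current_nmi_watchdog_param())
    except OSError as err:
        print("could not read /proc/cmdline:", err)
```

A result of None just means the parameter was not on the boot line, so the kernel's built-in default applies.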
Regards,
Celso.
Lon Hohberger wrote:
On Wed, 2005-12-21 at 16:25 -0200, Celso K. Webber wrote:
Has anyone had this issue before? Or am I missing a step in
configuring the software watchdog feature?
Another question for the Red Hat people on the list: does this "software
watchdog" work OK? I ask because it's enabled by default when you add a
new member to the cluster, yet the Cluster Suite v3 manual says nothing
about this resource.
Yes, it works fine.
A few things could be happening:
(1) The NMI watchdog will reboot the machine if it detects an NMI hang,
and it waits only a few seconds before doing so.
(2) The cluster is extremely paranoid because you are not using a
STONITH device (power controller), and it's detecting internal hangs.
Try increasing the failover time.
(3) The cluster is not getting scheduled due to system load. See the
man page for cludb(8) about clumembd%rtp - both may help.
-- Lon
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster