How much downtime can we afford for Nagios?

"susmit shannigrahi" <thinklinux.ssh@xxxxxxxxx> · Sun, 27 Apr 2008 11:11:54 +0530

Hi,

For a few days the number of false Nagios notifications went down, but
it has increased again.

Looking at /configs/system/nagios/services/template.cfg reveals
that it is configured with
max_check_attempts = 4 and retry_check_interval = 1 for hosts
and
max_check_attempts = 3 and retry_check_interval = 1 for services.

So if a service or host is unreachable for 3 or 4 minutes, we get a
notification. (However, in most cases it is a false positive, caused by
congestion or some other transient problem.)

How about working out a delay that we can afford when a service or
host is really down? How about 10 minutes (5 attempts x 2 minutes)?
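
To make the arithmetic concrete, here is a rough sketch in Python. It
uses the simple estimate from this mail (attempts x retry interval, in
minutes) and ignores the normal check interval and Nagios'
interval_length setting, so treat the numbers as approximations. The
function name is made up for illustration.

```python
def notification_delay_minutes(max_check_attempts: int, retry_interval_min: int) -> int:
    """Rough delay before a notification goes out.

    Assumption: delay ~= max_check_attempts x retry interval (minutes),
    which is the estimate used in this mail, not Nagios' exact timing.
    """
    if max_check_attempts < 1 or retry_interval_min < 0:
        raise ValueError("attempts must be >= 1 and interval must be >= 0")
    return max_check_attempts * retry_interval_min


# Current settings from template.cfg as quoted above.
current_hosts = notification_delay_minutes(4, 1)
current_services = notification_delay_minutes(3, 1)

# Proposed setting: 5 attempts, 2 minutes apart.
proposed = notification_delay_minutes(5, 2)

print("hosts now:   ", current_hosts, "min")
print("services now:", current_services, "min")
print("proposed:    ", proposed, "min")
```

With the proposed values a short burst of congestion would have to
last roughly 10 minutes before anyone gets paged.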

Also, we could list which services/hosts are critical and which are
not. That would help us define different notification periods for
different hosts/services.
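
One possible way to think about such a list, sketched in Python. The
tier names, the service names and the mapping of values to tiers are
all hypothetical; only the numbers (3 x 1 and 5 x 2) come from this
mail. The delay is again the rough attempts x interval estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPolicy:
    max_check_attempts: int
    retry_interval_min: int

    @property
    def delay_minutes(self) -> int:
        # Rough estimate only: attempts x retry interval.
        return self.max_check_attempts * self.retry_interval_min


# Hypothetical tiers: critical keeps the current service settings,
# everything else gets the more relaxed proposal.
POLICIES = {
    "critical": CheckPolicy(max_check_attempts=3, retry_interval_min=1),
    "non_critical": CheckPolicy(max_check_attempts=5, retry_interval_min=2),
}

# Hypothetical service names, for illustration only.
SERVICE_TIER = {
    "example-critical-service": "critical",
    "example-batch-service": "non_critical",
}


def policy_for(service: str) -> CheckPolicy:
    """Return the policy for a service; unlisted ones count as non-critical."""
    return POLICIES[SERVICE_TIER.get(service, "non_critical")]


for name in SERVICE_TIER:
    print(name, "->", policy_for(name).delay_minutes, "min")
```

Defaulting unlisted services to the relaxed tier is just one choice;
defaulting to critical would be the more cautious option.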

I thought I would do this after the freeze, but it's becoming too annoying.

Thanks

-- 
Regards,
Susmit.

=============================================
ssh
0x86DD170A
http://www.fedoraproject.org/wiki/SusmitShannigrahi
=============================================

_______________________________________________
Fedora-infrastructure-list mailing list
Fedora-infrastructure-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list