Re: How much downtime do we afford for nagios?

"Nigel Jones" <dev@xxxxxxxxxx> · Mon, 28 Apr 2008 00:43:03 +1200 (NZST)

On Sun, April 27, 2008 11:01 pm, Jeroen van Meeuwen wrote:
> Nigel Jones wrote:
>> Looking through my email, from what I can recall there are no false
>> positives.  xen6 had to be power-cycled which caused all the other
>> collateral notifications.
>>
>
> Collateral notifications can be caught using service dependencies and
> parent hosts. Do we currently use any?
I believe we do, but it wouldn't have helped in this case (I've done a bit
more digging).
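
For anyone following along, here is a rough sketch of the parent-host idea
mentioned above. It is only a toy model of the reachability logic, not
Nagios code or our actual config: the guest names (db-guest, app-guest), the
PARENTS table and both function names are made up for illustration, and it
assumes notifications go out for DOWN hosts only, not UNREACHABLE ones.

```python
# Toy model of parent-host suppression. Hypothetical topology: the two
# guest names below are invented; only xen6 appears in this thread.
PARENTS = {
    "xen6": [],
    "db-guest": ["xen6"],
    "app-guest": ["xen6"],
}


def classify(host, failed_checks, parents=PARENTS):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a single host."""
    if host not in failed_checks:
        return "UP"
    host_parents = parents.get(host, [])
    # A failing host counts as UNREACHABLE only when it has parents and
    # none of them is UP; otherwise the host itself is treated as DOWN.
    if host_parents and all(
        classify(p, failed_checks, parents) != "UP" for p in host_parents
    ):
        return "UNREACHABLE"
    return "DOWN"


def hosts_to_notify(failed_checks, parents=PARENTS):
    """Hosts that would still page someone (assumes DOWN-only notifications)."""
    return sorted(
        h for h in failed_checks if classify(h, failed_checks, parents) == "DOWN"
    )
```

With all three checks failing, only xen6 would be reported and the guests
would be collapsed into UNREACHABLE. Note that this models a single
monitoring instance; it does nothing for alerts that arrive from two
independent instances.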

Half the notifications came from the external nagios instance on noc2,
while the xen6/db alerts came from the internal nagios instance. Another
reason why I like the current setup and don't think we should change a
thing :)

Also, the UNKNOWN alerts weren't that bad; they were a precursor to the
box having to be restarted. Only in this case were the up/down alerts a
little useless.  However, I'd sooner keep them as is, because otherwise we
run the risk of not noticing a box down immediately and getting everyone
under the moon asking "why can't I access fedoraproject.org... it's down,
your OS can't be that good".

- Nigel
>
> Kind regards,
>
> Jeroen van Meeuwen
> -kanarip
>
> _______________________________________________
> Fedora-infrastructure-list mailing list
> Fedora-infrastructure-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list
>
