On Thu, Nov 28, 2019 at 03:57:11PM +0100, Miroslav Suchý wrote:
> Dne 28. 11. 19 v 1:29 Kevin Fenzi napsal(a):
> > I am missing what capabilities you are wanting it to
> > have?
>
> * current page shows "Everything is OK" even when some service does not work. I would at least expect some orange text
> "There seems to be some problems, not acknowledged by admins yet".

But that's not its purpose. If people want to know about problems as
soon as monitoring sees them, can't we just point them to the nagios
page? Also, not everything monitoring notes as a problem is actually
something end users would see as a problem.

> * It takes pretty high rights to change something there (e.g. I cannot change status for ABRT and Copr). I understand
> why it is. But I would expect to flip it from green to orange relatively easy.

Sure, we could work out some way to get you access for those...

> * get rid of manual interaction as much as possible. If nagios report that the service is not available. Then take it as
> granted. When nagios report the service as running, then make it green. Manually interaction should be needed only when
> you want to put there more information, like cause of the outage or ETA.

But how do you map those? Some examples:

* If a mirrorlist alerts as down, that does not mean the service is
  down to end users. There's another container on each proxy. If a
  proxy is out of DNS, the alerts for services on it don't impact users.

* Tons of services are set up in HA, so if one part of them is down in
  nagios the service is just fine.

* Sometimes users notice something that isn't monitored by nagios, so
  nagios couldn't report it down.

* A server alerts and is down; how do you know what services are
  affected?

So, it sounds to me like you want a better monitoring page to point
people at? We have been talking about replacing nagios (for years now),
and perhaps some of the other solutions have a better answer for that...

Our intent for status is "This is a list of issues some human knows
about and is working on", not "something alerted, no idea if anyone
knows about it or is working on it".

> * Ability to show that there is going to be planned outage.

We have been using fedocal for that. There is a fedora-infrastructure
outages calendar.

kevin
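
To make the mapping problem above concrete, here is a rough sketch of the least a check-to-status mapping would have to do: ignore checks on hosts that are out of rotation, and treat a service as down only when no in-rotation instance passes. Everything in it (the Check record, the in_rotation flag, the host names, the user_visible_status function) is hypothetical and invented for illustration; it is not how nagios or the status page actually works.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One monitoring result (hypothetical shape, not a nagios structure)."""
    service: str              # user-facing service the check belongs to
    host: str                 # host or container the check ran against
    ok: bool                  # True if the check passed
    in_rotation: bool = True  # False if the host is out of DNS / not serving users


def user_visible_status(checks):
    """Collapse per-host checks into one status per service.

    A service counts as "ok" if at least one in-rotation instance passes,
    "down" if every in-rotation instance fails, and "unknown" if none of
    its instances are in rotation.
    """
    results = {}
    for check in checks:
        bucket = results.setdefault(check.service, [])
        if check.in_rotation:
            bucket.append(check.ok)

    status = {}
    for service, passed in results.items():
        if not passed:
            status[service] = "unknown"
        elif any(passed):
            status[service] = "ok"
        else:
            status[service] = "down"
    return status


# Illustrative data only; the host names are made up.
example_checks = [
    Check("mirrorlist", "proxy-a", ok=False),  # one container alerting
    Check("mirrorlist", "proxy-b", ok=True),   # another still serving
    Check("wiki", "proxy-c", ok=False, in_rotation=False),  # proxy out of DNS
    Check("wiki", "proxy-d", ok=True),
    Check("copr", "copr-frontend", ok=False),  # single instance, failing
]
example_status = user_visible_status(example_checks)
```

Even this sketch needs a maintained table of which checks belong to which service and which hosts are in rotation, and it still cannot report problems that nothing monitors.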