OpenShift: alerting heads up (attention appowners!)

Kevin Fenzi <kevin@xxxxxxxxx> · Thu, 9 Feb 2023 10:10:36 -0800

Greetings, everyone.

Thanks to darknao, we have just enabled monitoring by default for our
OpenShift applications. Note that it will not be active for a given
application until the next run of its playbook pushes it out. I plan to
run the playbooks for most projects over the next few days, but if you are
an appowner and want it sooner, just run your playbook (and let me know so
I don't need to).

Some notes:

By default, it alerts on the conditions defined in
./roles/openshift/project/templates/prometheusRules.yml,
which include failing cronjobs, pods crashing for various reasons, and so
on. We can look at expanding this if there are other things that are
generally good to monitor.

When an alert triggers, it sends email to the appowners by default.
Optionally, you can set an alert_users list in your playbook; alerts will
then go only to those users (not to the appowners).

If for some reason you do not want any of this monitoring on your
application, you can set alerting: False to disable it. If you plan on
doing that, however, I'd really like to know why.
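To make the recipient rules described above concrete, here is a minimal
Python sketch of the selection logic as this mail describes it. It is a
hypothetical model, not the actual code in the openshift/project role: the
function name resolve_alert_recipients and its parameters are assumptions
that simply mirror the appowners, alert_users, and alerting settings. It
also assumes an empty alert_users list behaves the same as leaving it unset.

```python
from typing import Optional, Sequence


def resolve_alert_recipients(
    appowners: Sequence[str],
    alert_users: Optional[Sequence[str]] = None,
    alerting: bool = True,
) -> list:
    """Hypothetical model of who receives alert email for one application.

    Mirrors the rules in the announcement; this is NOT the real
    roles/openshift/project implementation.
    """
    # alerting: False opts the application out of monitoring entirely.
    if not alerting:
        return []
    # Assumption: a non-empty alert_users list replaces appowners entirely;
    # an empty list is treated the same as not setting it.
    if alert_users:
        return list(alert_users)
    # Default: alerts go to the application's appowners.
    return list(appowners)


if __name__ == "__main__":
    owners = ["alice", "bob"]
    print(resolve_alert_recipients(owners))
    print(resolve_alert_recipients(owners, alert_users=["carol"]))
    print(resolve_alert_recipients(owners, alerting=False))
```

The three calls correspond to the three cases in this mail: the default
(appowners), an explicit alert_users list, and opting out with
alerting: False.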

Hopefully this will help us notice when things aren't working right,
before we get user reports about it. :)

Many thanks again to darknao for setting this up. :)

kevin
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx