Re: Service monitoring in CommuniShift/OpenShift?

David Kirwan <dkirwan@xxxxxxxxxx> · Fri, 7 Jun 2024 09:56:41 +0100

The User Workload Monitoring stack is installed and, iirc, available on the Staging and Production Fedora clusters, but not on CommuniShift (it could probably be turned on in CommuniShift too; I will need to investigate). We just haven't started making use of it yet. See [1]. It would allow you to take ownership of the monitoring of your service.

I did a POC a few years back and demoed the user workload monitoring stack, but it didn't really get any interest at the time. It might be a little dated, so it's best to follow the instructions in [1]. I think we will soon also get the OpenShift monitoring stacks hooked into Zabbix with a Prometheus exporter.

[1] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-monitoring-for-user-defined-projects.html
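User workload monitoring works by having Prometheus scrape metrics that your own service exposes, usually selected with a ServiceMonitor resource in your project; alerting rules on those metrics could then notify you, assuming alert routing for user-defined projects is configured. A minimal sketch, assuming the service can serve an HTTP `/metrics` endpoint: the metric names `myapp_token_remaining_seconds` and `myapp_log_error_lines`, the port, and the values are hypothetical placeholders for whatever checks you actually run. It writes the Prometheus text exposition format by hand with only the standard library; the `prometheus_client` library is the more common choice in practice.

```python
"""Minimal Prometheus /metrics endpoint using only the standard library.

Metric names, values, and the port are hypothetical placeholders.
"""
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render {name: (help_text, type, value)} in Prometheus text format."""
    lines = []
    for name, (help_text, metric_type, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_metrics():
    # Hypothetical: replace these placeholder values with your real checks.
    return {
        "myapp_token_remaining_seconds": (
            "Seconds until the auth token expires.", "gauge", 86400),
        "myapp_log_error_lines": (
            "Error lines found in the last log scan.", "gauge", 0),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_metrics()).encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text exposition format 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # The port is an assumption; it must match your Service and ServiceMonitor.
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` from the container's entrypoint would start the endpoint; the scrape target and any alerting rules are then configured on the cluster side as described in [1].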

On Thu, 6 Jun 2024 at 12:58, Jakub Kadlcik <jkadlcik@xxxxxxxxxx> wrote:
Thank you very much for the reply, Kevin,
> Another option there is to make some kind of health check, and have
> OpenShift monitor it and alert/take the app down if something was
> unhealthy.
I have limited experience with OpenShift, so I am not sure what is possible, but I've been reading about health checks, and the documentation always mentions restarting the "unhealthy" container rather than sending an email notification. That wouldn't be helpful for me.

> There's currently no monitoring set up for CommuniShift items.
> Once it's moved to staging / production, Nagios checks can be added.
> Also, in our stg/prod clusters we have some simple monitoring like
> mailing you when a pod crashes or a build or cronjob fails.

It seems like the right course of action would be migrating to the production OpenShift instance. My questions regarding the migration process were answered in https://pagure.io/fedora-infrastructure/issue/11814, so I will just have to prioritize that. Then I will get back to you regarding the Nagios configuration.

Thank you again,
Jakub

On Sun, Jun 2, 2024 at 7:35 PM Kevin Fenzi <kevin@xxxxxxxxx> wrote:
On Sat, Jun 01, 2024 at 08:48:32PM GMT, Jakub Kadlcik wrote:

> I am running a service in Fedora CommuniShift (planning to move it to
> the Fedora production OpenShift instance, in case it is relevant).
>
> Can anybody please help me understand how to configure some monitoring for
> it? Is it possible to configure nagios.fedoraproject.org for it? Or is
> there any other recommended approach?

There's currently no monitoring set up for CommuniShift items.
Once it's moved to staging / production, Nagios checks can be added.

Also, in our stg/prod clusters we have some simple monitoring like
mailing you when a pod crashes or a build or cronjob fails.

> Basically, I would like to have some custom commands (checking if auth
> tokens are up-to-date, parsing a log file for specific errors, etc.) and
> periodically run them on my deployed container, or spawn a separate
> container in my project to run them. If they find any problem, I'd like to
> be notified via email.
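The periodic custom checks with email notification described above could be sketched as a small script run on a schedule, for example from a separate OpenShift CronJob. A minimal sketch under several assumptions: the token-file path, log path, error pattern, 24-hour freshness limit, SMTP host, and addresses are all hypothetical placeholders, and "token is fresh" is approximated by the file's modification time. The script also reports failure through its return code, since failed cronjobs in the stg/prod clusters are mailed out as noted in this thread.

```python
"""Hypothetical periodic checks that email a report when any check fails."""
import os
import re
import smtplib
import time
from email.message import EmailMessage

# All of these values are hypothetical placeholders.
TOKEN_FILE = "/var/run/myapp/token"
LOG_FILE = "/var/log/myapp/app.log"
ERROR_PATTERN = re.compile(r"ERROR|Traceback")
MAX_TOKEN_AGE = 24 * 3600  # seconds
SMTP_HOST = "smtp.example.org"
MAIL_FROM = "myapp-monitor@example.org"
MAIL_TO = "maintainer@example.org"


def check_token_fresh(path, max_age, now=None):
    """Return a problem string if the token file is missing or too old."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return f"cannot stat token file {path}: {exc}"
    if age > max_age:
        return f"token file {path} is {age:.0f}s old (limit {max_age}s)"
    return None


def check_log_errors(lines, pattern):
    """Return a problem string if any log line matches the error pattern."""
    hits = [line.rstrip("\n") for line in lines if pattern.search(line)]
    if hits:
        return f"{len(hits)} matching log line(s), first: {hits[0]}"
    return None


def run_checks():
    """Run every check and return the list of problems found."""
    problems = [check_token_fresh(TOKEN_FILE, MAX_TOKEN_AGE)]
    try:
        with open(LOG_FILE, encoding="utf-8", errors="replace") as fh:
            problems.append(check_log_errors(fh, ERROR_PATTERN))
    except OSError as exc:
        problems.append(f"cannot read log file {LOG_FILE}: {exc}")
    return [p for p in problems if p]


def build_report(problems):
    msg = EmailMessage()
    msg["Subject"] = f"[myapp] {len(problems)} check(s) failed"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(problems) + "\n")
    return msg


def main():
    """Return 0 when all checks pass; otherwise mail a report and return 1."""
    problems = run_checks()
    if not problems:
        return 0
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_report(problems))
    return 1
```

Invoking `sys.exit(main())` as the script's entry point would make a failed check show up both as an email and as a failed CronJob run. Sending mail directly assumes an SMTP relay is reachable from the cluster, which may not hold; relying on the exit status alone avoids that.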

Another option there is to make some kind of health check, and have
OpenShift monitor it and alert/take the app down if something was
unhealthy. I guess that might not be what you want for transient errors,
or where you don't want the app to stop working on some errors.
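On the restart concern raised in this thread: in OpenShift (as in Kubernetes), a failing liveness probe restarts the container, while a failing readiness probe only removes the pod from its Service's endpoints until the probe passes again; the probes themselves do not send email. A minimal sketch, assuming hypothetical `/healthz` (liveness) and `/readyz` (readiness) paths, a hypothetical `check_dependencies()` helper, and an arbitrary port: transient errors are reported only on the readiness path, so the app stops receiving traffic without being restarted.

```python
"""Sketch of separate liveness and readiness endpoints (paths hypothetical)."""
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Hypothetical readiness check; return a list of problem strings."""
    return []


def health_status(path, problems):
    """Map a probe path and current problems to (HTTP status, body)."""
    if path == "/healthz":
        # Liveness: the process is up and answering; do not fail on
        # transient errors, or the container would be restarted.
        return 200, "ok\n"
    if path == "/readyz":
        # Readiness: a failure here takes the pod out of Service endpoints
        # but does not restart the container.
        if problems:
            return 503, "\n".join(problems) + "\n"
        return 200, "ready\n"
    return 404, "not found\n"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        problems = check_dependencies() if self.path == "/readyz" else []
        status, body = health_status(self.path, problems)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8081):
    # The port is an assumption; point each probe's httpGet port at it.
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Keeping liveness trivial and putting the real checks behind readiness is one way to avoid the restart-on-transient-error behaviour; email alerts would still have to come from elsewhere, such as the existing stg/prod pod monitoring or user workload alerting.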

kevin

--
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue

-- 
David Kirwan
Senior Software Engineer
Community Platform Engineering @ Red Hat
T: +(353) 86-8624108
