Re: Service monitoring in CommuniShift/OpenShift?

Jakub Kadlcik <jkadlcik@xxxxxxxxxx> · Thu, 6 Jun 2024 13:58:19 +0200

Thank you very much for the reply, Kevin.
> Another option there is to make some kind of health check, and have
> openshift monitor it and alert/take the app down if something was
> unhealthy.
I have limited experience with OpenShift, so I am not sure what is possible, but I've been reading about health checks, and the documentation always describes restarting the "unhealthy" container rather than sending an email notification. That wouldn't be helpful for me.

> There's currently no monitoring setup for communishift items.
> Once it's moved to staging / production, nagios checks can be added.
> Also, in our stg/prod clusters we have some simple monitoring like
> mailing you when a pod crashes or a build or cronjob fails.

It seems the right course of action is to migrate to the production OpenShift instance. My questions about the migration process were answered in https://pagure.io/fedora-infrastructure/issue/11814 so I will just have to prioritize that. Then I will get back to you regarding the Nagios configuration.

Thank you again,
Jakub

On Sun, Jun 2, 2024 at 7:35 PM Kevin Fenzi <kevin@xxxxxxxxx> wrote:
On Sat, Jun 01, 2024 at 08:48:32PM GMT, Jakub Kadlcik wrote:
> I am running a service in Fedora CommuniShift (planning to move it to
> Fedora production OpenShift instance in case it is relevant).
>
> Can anybody please help me understand how to configure some monitoring for
> it? Is it possible to configure nagios.fedoraproject.org for it? Or is
> there any other recommended approach?

There's currently no monitoring setup for communishift items.
Once it's moved to staging / production, nagios checks can be added.
Also, in our stg/prod clusters we have some simple monitoring like
mailing you when a pod crashes or a build or cronjob fails.

> Basically, I would like to have some custom commands (checking if auth
> tokens are up-to-date, parsing a log file for specific errors, etc) and
> periodically run them on my deployed container. Or spawning a separate
> container in my project to run them. If they find any problem, I'd like to
> be notified via email.
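
For illustration only, the kind of custom checks described above might be sketched in Python roughly as follows. Everything here is an assumption rather than part of any Fedora or OpenShift tooling: the error patterns, the one-day expiry margin, the addresses, and the SMTP relay on localhost are all hypothetical placeholders.

```python
import re
import smtplib
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage

# Hypothetical patterns; replace with the errors that matter for the service.
ERROR_PATTERNS = [re.compile(r"\bERROR\b"), re.compile(r"Traceback")]


def find_log_errors(log_text, patterns=ERROR_PATTERNS):
    """Return the log lines that match any of the given patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in patterns)
    ]


def token_problem(expires_at, now=None, min_remaining=timedelta(days=1)):
    """Return a description if the token expires too soon, else None."""
    now = now or datetime.now(timezone.utc)
    if expires_at - now < min_remaining:
        return f"auth token expires at {expires_at.isoformat()}"
    return None


def build_alert(problems, sender, recipient):
    """Build an email listing the problems, or return None if there are none."""
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[monitoring] {len(problems)} problem(s) found"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Send the alert through an SMTP relay (the host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A script like this could be run periodically, either inside the deployed container or from a separate one, calling send_alert() only when build_alert() returns a message.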

Another option there is to make some kind of health check, and have
openshift monitor it and alert/take the app down if something was
unhealthy. I guess that might not be what you want for transitory errors
or where you don't want the app to stop working on some errors.
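
As a rough illustration of the health check idea, a minimal endpoint could look like the sketch below. It uses only the Python standard library; the /healthz path, port 8080, and the run_checks() placeholder are assumptions for the example, not anything the service or the cluster is known to use.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_checks():
    """Return a list of problem descriptions; an empty list means healthy.

    Placeholder: plug in the service's real checks here.
    """
    return []


def health_response(problems):
    """Map check results to an HTTP status code and a plain-text body."""
    if problems:
        return 503, "unhealthy: " + "; ".join(problems)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # hypothetical path
            self.send_error(404)
            return
        status, body = health_response(run_checks())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve the health endpoint until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

An HTTP probe pointed at that path would see the 503 as a failure, so only checks whose failure really should take the app down belong in run_checks().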

kevin

--
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
