Re: MGR process regularly not responding

Eugen Block <eblock@xxxxxx> · Tue, 25 Oct 2022 14:03:43 +0000

Hi,

I see the same on different Nautilus clusters and was pointed to this
tracker issue: https://tracker.ceph.com/issues/39264
In one cluster, disabling the prometheus module seemed to stop the
MGRs from failing. But the failures are so rare that the cause might be
something different and we just didn't wait long enough. So it seems to
be a recurring issue. If you use the prometheus mgr module, you could
check whether the problem still occurs with it disabled.
Just two days ago we had the same thing in another cluster where the
prometheus module is disabled, so there it might be something else
with similar symptoms.
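
For the prometheus test, a minimal Python sketch could look like the
following. It assumes the ceph CLI is in the PATH with admin access and
that the JSON output of "ceph mgr module ls" contains an
"enabled_modules" list of module names (please verify on your release).
The function names are only illustrative.

```python
import json
import subprocess


def prometheus_enabled(module_ls_json):
    """Return True if 'prometheus' is among the enabled mgr modules.

    Assumption: the JSON from `ceph mgr module ls --format json` has an
    "enabled_modules" key holding a list of module names.
    """
    data = json.loads(module_ls_json)
    return "prometheus" in data.get("enabled_modules", [])


def disable_prometheus_if_enabled():
    """Disable the prometheus mgr module if it is currently enabled.

    Returns True if the module was disabled, False if it was already off.
    Needs a reachable cluster and an admin keyring.
    """
    out = subprocess.run(
        ["ceph", "mgr", "module", "ls", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not prometheus_enabled(out):
        return False
    subprocess.run(
        ["ceph", "mgr", "module", "disable", "prometheus"], check=True
    )
    return True
```

To undo the test later, "ceph mgr module enable prometheus" turns the
module back on.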

Regards,
Eugen

Quoting Gilles Mocellin <gilles.mocellin@xxxxxxxxxxxxxx>:

Hi,

In our Ceph Pacific clusters (16.2.10) (1 for OpenStack and S3, 2
for backup on RBD and S3),
since the upgrade to Pacific, the MGR regularly stops
responding and is no longer shown in ceph status.
The process is still there.
Nothing in the MGR log, it just stops logging.

Restarting the service makes it come back.

When all MGRs are down, we have a warning in ceph status, but not before.
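
Since the warning only appears once every MGR is gone, a small check
could flag a single missing MGR earlier. This is only a sketch: it
assumes the JSON from "ceph mgr dump --format json" has an
"active_name" field and a "standbys" list whose entries carry a "name"
field, and the list of expected MGR names is something you supply
yourself (the names below are hypothetical).

```python
import json


def missing_mgrs(mgr_dump_json, expected):
    """Return the expected MGR names absent from the mgr map.

    Assumptions: the JSON has "active_name" (string) and "standbys"
    (list of objects with a "name" key). `expected` is a set of MGR
    names that should be present as either active or standby.
    """
    data = json.loads(mgr_dump_json)
    seen = {s["name"] for s in data.get("standbys", [])}
    active = data.get("active_name")
    if active:
        seen.add(active)
    return set(expected) - seen


# Example with a hand-written mgr map: mgr "c" has dropped out.
sample = '{"active_name": "a", "standbys": [{"name": "b"}]}'
print(sorted(missing_mgrs(sample, {"a", "b", "c"})))
```

In practice the JSON would come from running "ceph mgr dump --format
json" periodically, and a non-empty result would trigger an alert or a
restart of the affected service.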

I can't find a similar bug in the Tracker.

Does anyone else see this symptom?
Do you have a workaround or a solution?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
