Re: MGRs failing once per day and generally slow response times

Additional information: I just found this in the logs of one failed MGR:

2020-03-11 09:32:55.265 7f59dcb94700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2020-03-11 08:32:55.268325)

It's the same message that used to appear previously when MGRs crashed, so perhaps the overall issue is still the same, just massively accelerated.
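To put a number on that message, here is a minimal Python sketch (not part of Ceph; the function name rotating_key_gap is made up for illustration) that parses the line above and computes how far the reported "before" cutoff lies behind the log timestamp. For this line the gap comes out just under 3600 seconds, so almost exactly one hour. Whether that corresponds to a configured ticket lifetime on this cluster is an assumption I have not verified.

```python
import re
from datetime import datetime

# The log line quoted above, copied verbatim.
LOG_LINE = (
    "2020-03-11 09:32:55.265 7f59dcb94700 -1 monclient: _check_auth_rotating "
    "possible clock skew, rotating keys expired way too early "
    "(before 2020-03-11 08:32:55.268325)"
)

# Timestamp pattern as it appears in the log (assumed format).
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def rotating_key_gap(line):
    """Return seconds between the log timestamp and the 'before' cutoff.

    Hypothetical helper for illustration; returns None if the line is not
    a _check_auth_rotating message in the expected format.
    """
    m = re.match(rf"({TS}) .*_check_auth_rotating.*\(before ({TS})\)", line)
    if m is None:
        return None
    logged = datetime.strptime(m.group(1), TS_FORMAT)
    cutoff = datetime.strptime(m.group(2), TS_FORMAT)
    return (logged - cutoff).total_seconds()


gap = rotating_key_gap(LOG_LINE)
print(f"cutoff is {gap:.3f} s before the log timestamp")
```

Running the same function over all MGR logs would show whether the gap is always this close to one hour or drifts, which might help tell a real clock problem from a stalled daemon.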


On 11/03/2020 09:43, Janek Bevendorff wrote:
Hi,

I've always had some MGR stability issues with daemons crashing at random times, but since the upgrade to 14.2.8 they regularly stop responding after some time until I restart them (which I have to do at least once a day).

I noticed right after the upgrade that the prometheus module was entirely unresponsive and ceph fs status took about half a minute to return. Once all the cluster chatter had settled and the PGs had been rebalanced (auto-scale was messing with PGs after the upgrade), it became usable again, but everything is still slower than before: Prometheus takes several seconds to list metrics, and ceph fs status takes about 1-2 seconds.

However, after some time, MGRs stop responding and are kicked from the list of standbys. With log level 5, all they write to the log files is this:

2020-03-11 09:30:40.539 7f8f88984700  4 mgr[prometheus] ::ffff:xxx.xxx.xxx.xxx - - [11/Mar/2020:09:30:40] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.15.2"
2020-03-11 09:30:41.371 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:43.392 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:45.412 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:47.436 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:49.460 7f8f9ee62700  4 mgr send_beacon standby
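
For what it's worth, the beacon timestamps above can be checked with a small Python sketch (the helper name beacon_intervals is hypothetical, and it assumes the log format shown above). It suggests the standby keeps sending beacons roughly every two seconds up to the end of the excerpt, so the daemon process itself is apparently still alive at that point.

```python
from datetime import datetime

# The send_beacon lines quoted above, copied verbatim.
LOG = """\
2020-03-11 09:30:41.371 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:43.392 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:45.412 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:47.436 7f8f9ee62700  4 mgr send_beacon standby
2020-03-11 09:30:49.460 7f8f9ee62700  4 mgr send_beacon standby
"""


def beacon_intervals(log_text):
    """Return the gaps in seconds between consecutive send_beacon lines."""
    stamps = []
    for line in log_text.splitlines():
        if "send_beacon" not in line:
            continue
        # The first two whitespace-separated fields are date and time.
        date, time = line.split()[:2]
        stamps.append(datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


intervals = beacon_intervals(LOG)
print(intervals)
```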

I have seen another email on this list complaining about slow ceph fs status; I believe this issue is connected.

Besides the standard always-on modules I have enabled the prometheus, dashboard, and telemetry modules.

Best
Janek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx