Re: MGRs failing once per day and generally slow response times

Over the weekend, all five MGRs failed, which means we have no more Prometheus monitoring data. We are obviously monitoring the MGR status as well, so we can detect the failure, but it's still a pretty serious issue. Any ideas as to why this might happen?


On 13/03/2020 16:56, Janek Bevendorff wrote:
Indeed. I just had another MGR go bye-bye. I don't think host clock skew is the problem.


On 13/03/2020 15:29, Anthony D'Atri wrote:
Chrony does converge faster, but I doubt this will solve your problem if you don’t have quality peers. Or if it’s not really a time problem.

On Mar 13, 2020, at 6:44 AM, Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> wrote:

I replaced ntpd with chronyd and will let you know if it changes anything. Thanks.


On 13/03/2020 06:25, Konstantin Shalygin wrote:
On 3/13/20 12:57 AM, Janek Bevendorff wrote:
NTPd is running, all the nodes have the same time to the second. I don't think that is the problem.
As always in such cases, try switching from ntpd to chronyd, the default time daemon on EL7.



k
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 308
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3577