Over the weekend, all five MGRs failed, which means we have no more
Prometheus monitoring data. We are obviously monitoring the MGR status
as well, so we can detect the failure, but it's still a pretty serious
issue. Any ideas as to why this might happen?

On 13/03/2020 16:56, Janek Bevendorff wrote:
Indeed. I just had another MGR go bye-bye. I don't think host clock
skew is the problem.

On 13/03/2020 15:29, Anthony D'Atri wrote:
Chrony does converge faster, but I doubt this will solve your problem
if you don’t have quality peers. Or if it’s not really a time problem.

On Mar 13, 2020, at 6:44 AM, Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
I replaced ntpd with chronyd and will let you know if it changes
anything. Thanks.

On 13/03/2020 06:25, Konstantin Shalygin wrote:
On 3/13/20 12:57 AM, Janek Bevendorff wrote:
NTPd is running, all the nodes have the same time to the second. I
don't think that is the problem.

As always in such cases: try switching from ntpd to the default EL7
daemon, chronyd.
k
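
A side note on the "same time to the second" check above: agreement to
the second does not by itself rule out clock skew, because the Ceph
monitors warn at a much smaller drift (mon_clock_drift_allowed, 0.05 s
by default). Below is a minimal Python sketch of that comparison. It
assumes the per-host offsets (in seconds, relative to a common
reference) have already been collected by some other means; the host
names and values are hypothetical and only illustrate the arithmetic.

```python
# Default of Ceph's mon_clock_drift_allowed option, in seconds.
MON_CLOCK_DRIFT_ALLOWED = 0.05


def max_pairwise_skew(offsets):
    """Return the largest clock difference between any two hosts, in seconds.

    `offsets` maps a host name to its offset from a common reference clock.
    """
    if not offsets:
        raise ValueError("no offsets given")
    values = list(offsets.values())
    return max(values) - min(values)


def skew_exceeds_threshold(offsets, threshold=MON_CLOCK_DRIFT_ALLOWED):
    """True if the worst-case skew between two hosts is above the threshold."""
    return max_pairwise_skew(offsets) > threshold


if __name__ == "__main__":
    # Hypothetical offsets: all three hosts agree "to the second",
    # yet node3 is more than 0.05 s away from the others.
    sample = {"node1": 0.002, "node2": -0.004, "node3": 0.120}
    print("max skew (s):", max_pairwise_skew(sample))
    print("above threshold:", skew_exceeds_threshold(sample))
```

If the worst-case skew stays well below the threshold, as it seems to
in this thread, that supports the view that time sync is not the cause
of the MGR failures.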
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 308
99423 Weimar, Germany
Phone: +49 (0)3643 - 58 3577