Re: Cluster downtime due to unsynchronized clocks

Hi,

On 9/23/21 9:49 AM, Mark Schouten wrote:
Hi,

Last night we had downtime on a simple three-node cluster. Here’s
what happened:
2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN]
message from mon.2 was stamped 8.401927s in the future, clocks not
synchronized
2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1
clock skew 8.40163s > max 0.05s
2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2
clock skew 8.40146s > max 0.05s
2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN]
Health check failed: clock skew detected on mon.node02, mon.node03
(MON_CLOCK_SKEW)

The cause of this time shift is the terrible way that systemd-timesyncd
works: it depends on a single NTP server. If that server goes haywire,
systemd-timesyncd does not check with others, but just sets the clock on
your machine incorrectly. We will fix this with chrony.

However, what I don’t understand is why the cluster does not see the
single monitor as incorrect, but instead sees the two correct machines
as incorrect. Is this because one of the three is master-ish?


I would assume that the time of the mon leader is taken as the reference. If both other mons show a clock skew relative to the leader, the mon quorum will be impacted.
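
To illustrate that assumption, here is a minimal Python sketch. It is not Ceph's actual implementation, only a simplified model in which every peer's clock is compared against the leader's. The function name `skewed_peers` and the timestamps are hypothetical; the 0.05 s threshold is the "max 0.05s" from the log above.

```python
MAX_SKEW_S = 0.05  # threshold taken from the log line "max 0.05s"


def skewed_peers(leader_time, peer_times, max_skew=MAX_SKEW_S):
    """Return {peer: skew} for peers whose clock differs from the
    leader's clock by more than max_skew seconds.

    Assumption: the leader's clock is the only reference, so the
    leader itself can never be reported as skewed.
    """
    return {
        name: t - leader_time
        for name, t in peer_times.items()
        if abs(t - leader_time) > max_skew
    }


# Hypothetical snapshot: the leader (node01) is about 8.4 s behind,
# while node02 and node03 are (almost) exactly right.
true_time = 1000.0
leader = true_time - 8.4
peers = {"node02": true_time, "node03": true_time + 0.001}

flagged = skewed_peers(leader, peers)
print(sorted(flagged))
```

Under this model both correct machines are flagged, because each is measured against the one wrong clock, which would match the warnings for mon.node02 and mon.node03.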


Obviously we will fix the time issues, but I would like to understand
Ceph's reasoning for ceasing to function because one monitor has
incorrect time.

We do not rely on external NTP servers for internal synchronization. NTP is running on one of our central switches, and all hosts use that switch as their time source. The switch itself synchronizes to an external NTP server (we are currently thinking about adding an NTP USB receiver on one machine as an additional reference). Even if the internet connection is lost, so that the switch can no longer sync and its time starts to drift, all machines will drift by the same amount and stay in sync with each other.
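
The same point as a small Python sketch, under the idealized assumption that every host tracks the switch exactly. The helper names and numbers are made up for illustration.

```python
def host_clocks(switch_time, hosts):
    """Idealized model: every host follows the switch's clock exactly."""
    return {host: switch_time for host in hosts}


def max_pairwise_skew(clocks):
    """Largest difference between any two host clocks, in seconds."""
    values = list(clocks.values())
    return max(values) - min(values)


hosts = ["node01", "node02", "node03"]
true_time = 1000.0
switch_drift = 8.4  # hypothetical drift of the switch after losing its upstream

clocks = host_clocks(true_time + switch_drift, hosts)
absolute_error = clocks["node01"] - true_time
skew = max_pairwise_skew(clocks)
print(absolute_error, skew)
```

The absolute error equals the switch's drift, but the skew between hosts stays at zero, and in the model above only the skew between mons matters.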


Regards,

Burkhard

