Hi,

Last night we had downtime on a simple three-node cluster. Here’s what happened:

2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] message from mon.2 was stamped 8.401927s in the future, clocks not synchronized
2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1 clock skew 8.40163s > max 0.05s
2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2 clock skew 8.40146s > max 0.05s
2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN] Health check failed: clock skew detected on mon.node02, mon.node03 (MON_CLOCK_SKEW)

The cause of this time shift is the terrible way that systemd-timesyncd works: it depends on a single NTP server. If that server goes haywire, systemd-timesyncd does not check with others; it just sets the clock on your machine incorrectly. We will fix this with chrony.

However, what I don’t understand is why the cluster does not treat the single monitor as incorrect, but instead flags the two correct machines. Is this because one of the three is master-ish?

Obviously we will fix the time issues, but I would like to understand why Ceph stops functioning when one monitor has the wrong time.

Thanks!

-- 
Mark Schouten
CTO, Tuxis B.V. | https://www.tuxis.nl/
<mark@xxxxxxxx> | +31 318 200208
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx