> On Sep 23, 2021, at 15:50, Mark Schouten <mark@xxxxxxxx> wrote:
>
> Hi,
>
> Last night we had downtime on a simple three-node cluster. Here’s
> what happened:
>
> 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] message from mon.2 was stamped 8.401927s in the future, clocks not synchronized
> 2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1 clock skew 8.40163s > max 0.05s
> 2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2 clock skew 8.40146s > max 0.05s
> 2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN] Health check failed: clock skew detected on mon.node02, mon.node03 (MON_CLOCK_SKEW)
>
> The cause of this time shift is the terrible way that systemd-timesyncd
> works, depending on a single NTP server. If that one goes haywire,
> systemd-timesyncd does not check with others, but just sets the clock on
> your machine incorrectly. We will fix this with chrony.
>
> However, what I don’t understand is why the cluster does not see
> the single monitor as incorrect, but the two correct machines as
> incorrect. Is this because one of the three is master-ish?

I believe so. “ceph mon stat” will tell you which monitor is the leader.

> Obviously we will fix the time issues, but I would like to understand
> the reasoning of Ceph to stop functioning because one monitor has
> incorrect time.
>
> Thanks!
>
> --
> Mark Schouten
> CTO, Tuxis B.V. | https://www.tuxis.nl/
> <mark@xxxxxxxx> | +31 318 200208
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
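
To illustrate why the two correct nodes end up flagged: as far as I understand it, skew is measured against the leader's own clock rather than against any outside notion of true time, so when the leader drifts, every other monitor looks wrong from its point of view. Below is a minimal Python sketch of that idea. It is not Ceph's actual code; the function name and the per-node offsets are made up for illustration, with the numbers picked to match the skews and the 0.05s limit in your log.

```python
MAX_SKEW = 0.05  # seconds; the "max 0.05s" limit shown in the log


def skewed_peons(clock_offsets, leader, max_skew=MAX_SKEW):
    """Return {monitor: skew} for every non-leader monitor whose clock
    differs from the leader's clock by more than max_skew seconds."""
    reference = clock_offsets[leader]
    return {
        name: round(offset - reference, 5)
        for name, offset in clock_offsets.items()
        if name != leader and abs(offset - reference) > max_skew
    }


# Hypothetical offsets from true time, in seconds. node01 (the leader,
# mon.0) has jumped back by about 8.4s; node02 and node03 are fine.
offsets = {"node01": -8.40163, "node02": 0.0, "node03": -0.00017}

report = skewed_peons(offsets, leader="node01")
print(report)
```

With node01 as the reference, both node02 and node03 are reported as skewed even though their clocks are correct. Calling the same function with leader="node02" would flag only node01.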