Re: How do you deal with "clock skew detected"?

Jan Kasprzak <kas@xxxxxxxxxx> · Thu, 16 May 2019 11:13:48 +0200

Konstantin Shalygin wrote:
: >how do you deal with the "clock skew detected" HEALTH_WARN message?
: >
: >I think the internal RTC in most x86 servers does have 1 second resolution
: >only, but Ceph skew limit is much smaller than that. So every time I reboot
: >one of my mons (for kernel upgrade or something), I have to wait for several
: >minutes for the system clock to synchronize over NTP, even though ntpd
: >has been running before reboot and was started during the system boot again.
: 
: Definitely you should use chrony with iburst.

	OK, many responses (thanks for them!) suggest chrony, so I tried it.
With all three mons running chrony and in sync with my NTP server
with offsets under 0.0001 s, I rebooted one of the mons:

	There still was the HEALTH_WARN clock_skew message as soon as
the rebooted mon started responding to ping. The cluster returned to
HEALTH_OK about 95 seconds later.

	According to "ntpdate -q my.ntp.server", the initial offset
after reboot is about 0.6 s (which is the reason for the HEALTH_WARN, I think),
but it gets under 0.0001 s in about 25 seconds. The remaining ~50 seconds
of HEALTH_WARN come from Ceph itself, with the mons already synchronized.
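To put the 0.6 s figure in context: the skew Ceph tolerates between mons is set by the monitor option `mon_clock_drift_allowed`, assumed here to default to 0.05 s (check the documentation for your release). A minimal Python sketch, with that default as an assumption, that finds the first sample at which the measured offset falls within the limit. The only samples used are the two measurements reported above, taking 0.0001 s as the upper bound for the second one; `first_time_within_limit` is a hypothetical helper name.

```python
"""Sketch: when does the measured NTP offset fall inside Ceph's allowed drift?

Assumption: mon_clock_drift_allowed defaults to 0.05 s; verify for your release.
"""
from typing import Optional, Sequence, Tuple

# Assumed default of the mon option mon_clock_drift_allowed, in seconds.
MON_CLOCK_DRIFT_ALLOWED = 0.05


def first_time_within_limit(
    samples: Sequence[Tuple[float, float]],
    limit: float = MON_CLOCK_DRIFT_ALLOWED,
) -> Optional[float]:
    """Return the earliest time (s after reboot) whose |offset| <= limit.

    samples: (seconds_since_reboot, offset_seconds) pairs, e.g. collected
    from repeated `ntpdate -q` runs. Returns None if no sample is in range.
    """
    for t, offset in sorted(samples):
        if abs(offset) <= limit:
            return t
    return None


if __name__ == "__main__":
    # ~0.6 s right after reboot, under 0.0001 s about 25 s later.
    observed = [(0.0, 0.6), (25.0, 0.0001)]
    print(first_time_within_limit(observed))
```

By this measure the clock is back within the (assumed) limit after about 25 s, so most of the ~95 s of HEALTH_WARN elapses after the clocks are already in sync.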

	So chrony does synchronize faster, but I still get about
95 seconds of HEALTH_WARN "clock skew detected".

	I guess the workaround for now is to ignore the warning and wait
two minutes before rebooting another mon.
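If you want to script a rolling mon reboot instead of watching the clock, a minimal Python sketch could poll `ceph health` until it reports HEALTH_OK and then add a fixed grace period. Assumptions: the `ceph` CLI is on PATH with admin credentials; the 10 s poll interval, 600 s timeout and 120 s grace (mirroring the two-minute wait above) are arbitrary choices, not Ceph settings; `ceph_health` and `wait_for_health_ok` are hypothetical helper names.

```python
"""Sketch: wait for HEALTH_OK, plus a grace period, before the next mon reboot."""
import subprocess
import time
from typing import Callable


def ceph_health() -> str:
    """Return the first line of `ceph health` output, e.g. 'HEALTH_OK'."""
    out = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return out.splitlines()[0] if out else ""


def wait_for_health_ok(
    get_health: Callable[[], str] = ceph_health,
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    grace: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until health starts with HEALTH_OK, then sleep `grace` seconds.

    Returns the time spent polling, counted in poll intervals (grace excluded).
    Raises TimeoutError if HEALTH_OK is not seen within `timeout` seconds.
    """
    waited = 0.0
    while not get_health().startswith("HEALTH_OK"):
        if waited >= timeout:
            raise TimeoutError(f"cluster not HEALTH_OK after {waited:.0f} s")
        sleep(poll_interval)
        waited += poll_interval
    sleep(grace)  # extra margin before touching the next mon
    return waited
```

Call `wait_for_health_ok()` after each mon is back up and before rebooting the next one. If the `noout` flag is set for the maintenance window, the cluster stays in HEALTH_WARN because of that flag alone, so the check would have to look only for the clock-skew warning, or you fall back to the fixed two-minute wait.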

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here.   --from rcgroups
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com