Re: clock skew

Hi all,

Thanks for all replies!

@Huang: ceph time-sync-status is exactly what I was looking for, thanks!

@Janne: I will check out and implement the peer config per your suggestion. However, what confuses us is that chrony thinks the clocks match and only Ceph thinks they don't, so we are not sure the peer config will actually help in this situation. But time will tell.

@John: Thanks for the maxsources suggestion

@Bill: thanks for the interesting article, will check it out!
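
A minimal sketch of how the skew could be polled for continuous monitoring, using ceph time-sync-status. It assumes the command emits JSON with a "time_skew_status" object keyed by monitor name, each entry carrying "skew" and "latency" in seconds; that layout should be checked against the actual output on the cluster. The function names, the sample data and the mon names ceph1/ceph2 are hypothetical and only for illustration.

```python
import json
import subprocess


def parse_time_sync_status(raw):
    """Return {mon_name: (skew_seconds, latency_seconds)} from JSON text.

    Assumes a top-level "time_skew_status" object keyed by monitor name.
    """
    status = json.loads(raw)
    result = {}
    for mon, info in status.get("time_skew_status", {}).items():
        result[mon] = (float(info.get("skew", 0.0)),
                       float(info.get("latency", 0.0)))
    return result


def mons_over_threshold(skews, threshold=0.5):
    """Return the sorted names of monitors whose absolute skew exceeds threshold."""
    return sorted(mon for mon, (skew, _) in skews.items()
                  if abs(skew) > threshold)


def read_from_cluster():
    """Run the ceph CLI and parse its output (needs a reachable cluster)."""
    raw = subprocess.check_output(
        ["ceph", "time-sync-status", "--format", "json"])
    return parse_time_sync_status(raw.decode())


# Illustrative sample only, modelled on the skew and latency values
# reported by "ceph health detail" in the original mail.
SAMPLE = """
{"time_skew_status": {
    "ceph1": {"skew": 0.0, "latency": 0.0, "health": "HEALTH_OK"},
    "ceph2": {"skew": 0.506374, "latency": 0.000591877, "health": "HEALTH_WARN"}
}}
"""

if __name__ == "__main__":
    skews = parse_time_sync_status(SAMPLE)
    for mon, (skew, latency) in sorted(skews.items()):
        print(mon, skew, latency)
    print("over 0.5s:", mons_over_threshold(skews, 0.5))
```

On a live cluster, read_from_cluster() would replace the sample, and the result could be logged from cron at a fixed interval to look for a pattern. The threshold argument corresponds to mon_clock_drift_allowed, so 0.5 and 0.7 are the two values relevant to this thread.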

MJ

On 4/25/19 5:47 PM, Bill Sharer wrote:
If you are just syncing to the outside pool, the three hosts may end up latching on to different outside servers as their definitive sources. You might want to make one of the three a higher-priority source for the other two, and possibly have just that host use the outside sources for sync. Also, for hardware newer than about five years old, you might want to look at enabling the NIC clocks using LinuxPTP to keep clock jitter down inside your LAN. I wrote this article on the Gentoo wiki on enabling PTP in chrony.

https://wiki.gentoo.org/wiki/Chrony_with_hardware_timestamping

Bill Sharer


On 4/25/19 6:33 AM, mj wrote:
Hi all,

On our three-node cluster, we have set up chrony for time sync. Even though chrony reports that it is synced to NTP time, Ceph occasionally reports clock skews that can last several hours.

See for example:

root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)
root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
    mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 0.000591877s)
root@ceph2:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum         : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time     : 0.000000133 seconds slow of NTP time
Last offset     : -0.003333524 seconds
RMS offset      : 0.003333524 seconds
Frequency       : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Update interval : 1.4 seconds
Leap status     : Normal
root@ceph2:~#

For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced similarly with NTP on the two other nodes.

We don't understand this...

I have now injected mon_clock_drift_allowed 0.7, so at least we have HEALTH_OK again (to stop upsetting my monitoring system).

But two questions:

- Can anyone explain why this is happening? It looks as if Ceph and NTP/chrony disagree on just how time-synced the servers are.

- How can we determine the current clock skew from Ceph's perspective? "ceph health detail" does not show it when the status is HEALTH_OK. (I want to start monitoring it continuously, to see if I can find some sort of pattern.)

Thanks!

MJ
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





