Returning to this thread, I finally managed to capture the problem I'm facing in a log. The time service to the outside world is blocked by our organisation's firewall and I'm restricted to using internal time servers. Unfortunately, these seem to be periodically unstable. I caught a time excursion in the log extracts shown below.

My problem now is that such a transient causes time havoc on the cluster, because the servers start to adjust in all directions. Our set-up is that a head node syncs to the internal servers and all ceph servers sync against the head node. I was hoping that the ceph servers would follow the head node more or less in unison. Unfortunately, with a short transient excursion of the upstream time sources as observed below, this is not the case.

What I would like to configure is a higher inertia for the head node, to avoid it trying to follow the steep forward-backward jumps seen in the log. I'm not sure peering the mons up will solve that. It might keep the difference between mons low, but the general mess will still occur on the other nodes. Is there a config to tell the head node to take it easy with jumps in the external clock source?

Here is the observation. It is annotated and filtered to contain only lines where the offset changes, and I reduced it to show the incident in a few lines, all as seen from the head node:

Mar 15 00:01:01 ceph :      remote           refid      st t when poll reach   delay   offset  jitter
Mar 15 00:01:01 ceph : ==============================================================================
...
Mar 15 14:40:57 ceph : *time-server1    aaa.bb.cc.dd     2 u   23 1024   377    2.154    0.264   0.066
Mar 15 14:56:01 ceph : +time-server3    aaa.bb.cc.dd     3 u   51 1024   377    1.364    0.176   0.229
Mar 15 14:59:02 ceph : *time-server1    aaa.bb.cc.dd     2 u   52 1024   377    2.107    0.294   0.059
- everything good until now, time-server 2 goes out of sync first
Mar 15 15:08:04 ceph : +time-server2    aaa.bb.cc.dd     3 u   59 1024   377    1.603  -107.04 107.000
- time-server 3 follows suit
Mar 15 15:14:06 ceph : +time-server3    aaa.bb.cc.dd     3 u   58 1024   377    1.287  -156.89 156.993
- time-server 2 gets even worse
Mar 15 15:25:09 ceph : +time-server2    aaa.bb.cc.dd     3 u    9 1024   377    1.458  -250.57 238.232
- time-server 1 (actual clock source) goes out of sync
Mar 15 15:33:11 ceph : *time-server1    aaa.bb.cc.dd     2 u   20 1024   377    2.134  -181.74 171.042
Mar 15 15:48:15 ceph : +time-server3    aaa.bb.cc.dd     3 u   15 1024   377    1.242  -196.71 167.258
Mar 15 16:00:19 ceph : +time-server2    aaa.bb.cc.dd     3 u    2 1024   377    1.417  -169.50 135.325
- attempt of the head node to follow (or another jump of upstream?)
Mar 15 16:08:21 ceph : *time-server1    aaa.bb.cc.dd     2 u   14   64     1    1.451   61.178   0.195
- from now on it's a mess, it took about 18 hours to get back to a fully synchronized state
Mar 15 16:08:21 ceph : +time-server2    aaa.bb.cc.dd     3 u   14   64     1    1.380   22.523   0.974
Mar 15 16:08:21 ceph : +time-server3    aaa.bb.cc.dd     3 u   13   64     1    1.230   42.889   0.239
Mar 15 16:14:22 ceph : +time-server3    aaa.bb.cc.dd     3 u   43   64    77    1.241   43.113   7.179
Mar 15 16:16:23 ceph : *time-server1    aaa.bb.cc.dd     2 u   34   64   377    1.465   61.258  13.420
Mar 15 16:17:23 ceph : *time-server1    aaa.bb.cc.dd     2 u   26   64   377    2.093   45.879   5.524
...

I know that the providers of the time service should get their act together, but I doubt that will happen, and I would like to harden my time sync config to survive such events without chaos. If anyone can point me to a suitable config, please do. I need a way to smooth out steep upstream oscillations, like a low-pass filter would do.
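
To illustrate what "like a low-pass filter" means here, the following is a minimal Python sketch. It is not an ntpd or chronyd setting and not a description of how either daemon disciplines the clock. It applies an exponential moving average to the time-server1 offsets taken from the log above. The function name low_pass and the smoothing factor alpha=0.1 are assumptions chosen for illustration, and the samples are treated as evenly spaced, which they are not in the filtered log.

```python
# Offsets in milliseconds reported for time-server1 in the log above, in order.
offsets_ms = [0.264, 0.294, -181.74, 61.178, 61.258, 45.879]


def low_pass(samples, alpha=0.1):
    """Exponential moving average (hypothetical helper, for illustration only).

    Each new sample moves the running estimate by the fraction `alpha`
    of the difference between the sample and the current estimate.
    """
    smoothed = []
    estimate = samples[0]  # start from the first observed offset
    for sample in samples:
        estimate += alpha * (sample - estimate)
        smoothed.append(estimate)
    return smoothed


smoothed = low_pass(offsets_ms)
for raw, value in zip(offsets_ms, smoothed):
    print(f"raw {raw:9.3f} ms -> smoothed {value:8.3f} ms")
```

With alpha=0.1, the -181.74 ms reading moves the smoothed estimate by only about a tenth of the jump, which is the kind of inertia asked for. The trade-off is that a genuine, lasting change upstream would be followed just as slowly.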

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 01 February 2022 15:32
To: Janne Johansson
Cc: Ceph Users
Subject: Re: Local NTP servers on monitor node's.

Hi Janne,

to ask the obviously stupid question: what does the NTP config file for a local NTP cluster with upstream time source look like? The man page for ntp.conf is too much mumbo jumbo and too fragmented for me. Assume I have MONs at 192.168.0.65 - 67, would this config fragment on 192.168.0.65 with similar ones on the other 2 hosts work:

------------------------------------------
restrict 192.168.0.0 mask 255.255.224.0 nomodify notrap nopeer
restrict 192.168.0.65 nomodify notrap
restrict 192.168.0.66 nomodify notrap
restrict 192.168.0.67 nomodify notrap
server 192.168.0.66 iburst
server 192.168.0.67 iburst
peer 192.168.0.66
peer 192.168.0.67
------------------------------------------

I can't find the man page (centos 7) that describes the peer command. It has options but they are not explained anywhere.

Thanks a lot!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Janne Johansson <icepic.dz@xxxxxxxxx>
Sent: 08 December 2021 09:14
To: mhnx
Cc: Ceph Users
Subject: Re: Local NTP servers on monitor node's.

Den ons 8 dec. 2021 kl 02:35 skrev mhnx <morphinwithyou@xxxxxxxxx>:
> I've been building Ceph clusters since 2014 and the most annoying and
> worst failure is the NTP server faults and having different times on
> Ceph nodes.
>
> I've fixed few clusters because of the ntp failure.
> - Sometimes NTP servers can be unavailable,
> - Sometimes NTP servers can go crazy.
> - Sometimes NTP servers can respond but systemd-timesyncd can not sync
> the time without manual help.
>
> I don't want to deal with another ntp problem and because of that I've
> decided to build internal ntp servers for the cluster.
>
> I'm thinking of creating 3 NTP servers on the 3 monitor nodes to get
> an internal ntp server cluster.
> I will use the internal NTP cluster for the OSD nodes and other services.
> With this way, I believe that I'll always have a stable and fast time server.

We do something like this. The mons gather "calendar time" from outside ntp servers, but also peer against each other, so if/when they drift away, the mons drift away by equal amounts. All OSDs/RGWs and ceph clients then pull time from the mons, which serve internal ntp based on their idea of what time it is.

We are not using systemd, but both chronyd and ntpd allow you to set peers against which you sync "sideways", just to keep the pace between hosts.

--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx