Re: Local NTP servers on monitor node's.

Returning to this thread, I finally managed to capture the problem I'm facing in a log. Outside time services are blocked by our organisation's firewall, so I'm restricted to using internal time servers. Unfortunately, these seem to be periodically unstable. I caught a time excursion in the log extracts shown below. My problem now is that such a transient wreaks havoc with time on the cluster, because the servers start to adjust in all directions.

Our set-up is that a head node syncs to the internal servers and all ceph servers sync against the head node. I was hoping that the ceph servers would follow the head node more or less in unison. Unfortunately, with a short transient excursion of the upstream time sources as observed below, this is not the case.

What I would like to configure is a higher inertia for the head node to avoid it trying to follow the steep forward-backward jumps seen in the log. I'm not sure peering the mons up will solve that. It might keep the difference between mons low, but the general mess will still occur on the other nodes.

Is there a config to tell the head node to take it easy with jumps in the external clock source?

Here is the observation. It is annotated, filtered to contain only lines where the offset changes, and reduced to the few lines that show the incident, all as seen from the head node:

Mar 15 00:01:01 ceph   :      remote           refid      st t when poll reach   delay   offset  jitter
Mar 15 00:01:01 ceph   : ==============================================================================
...
Mar 15 14:40:57 ceph   : *time-server1    aaa.bb.cc.dd     2 u   23 1024  377    2.154    0.264   0.066
Mar 15 14:56:01 ceph   : +time-server3    aaa.bb.cc.dd     3 u   51 1024  377    1.364    0.176   0.229
Mar 15 14:59:02 ceph   : *time-server1    aaa.bb.cc.dd     2 u   52 1024  377    2.107    0.294   0.059
- everything good until now, time-server 2 goes out of sync first
Mar 15 15:08:04 ceph   : +time-server2    aaa.bb.cc.dd     3 u   59 1024  377    1.603  -107.04 107.000
- time-server 3 follows suit
Mar 15 15:14:06 ceph   : +time-server3    aaa.bb.cc.dd     3 u   58 1024  377    1.287  -156.89 156.993
- time-server 2 gets even worse
Mar 15 15:25:09 ceph   : +time-server2    aaa.bb.cc.dd     3 u    9 1024  377    1.458  -250.57 238.232
- time-server 1 (actual clock source) goes out of sync
Mar 15 15:33:11 ceph   : *time-server1    aaa.bb.cc.dd     2 u   20 1024  377    2.134  -181.74 171.042
Mar 15 15:48:15 ceph   : +time-server3    aaa.bb.cc.dd     3 u   15 1024  377    1.242  -196.71 167.258
Mar 15 16:00:19 ceph   : +time-server2    aaa.bb.cc.dd     3 u    2 1024  377    1.417  -169.50 135.325
- attempt of the head node to follow (or another jump upstream?)
Mar 15 16:08:21 ceph   : *time-server1    aaa.bb.cc.dd     2 u   14   64    1    1.451   61.178   0.195
- from now on it's a mess; it took about 18 hours to get back to a fully synchronized state
Mar 15 16:08:21 ceph   : +time-server2    aaa.bb.cc.dd     3 u   14   64    1    1.380   22.523   0.974
Mar 15 16:08:21 ceph   : +time-server3    aaa.bb.cc.dd     3 u   13   64    1    1.230   42.889   0.239
Mar 15 16:14:22 ceph   : +time-server3    aaa.bb.cc.dd     3 u   43   64   77    1.241   43.113   7.179
Mar 15 16:16:23 ceph   : *time-server1    aaa.bb.cc.dd     2 u   34   64  377    1.465   61.258  13.420
Mar 15 16:17:23 ceph   : *time-server1    aaa.bb.cc.dd     2 u   26   64  377    2.093   45.879   5.524
...
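
For anyone who wants to pick such excursions out of a similar log automatically, here is a minimal Python sketch. It assumes the line layout shown above (a syslog-style prefix ending in " : ", followed by the ntpq peer columns, with offset and jitter in milliseconds). The names PeerSample, parse_peer_line and find_excursions, and the 50 ms limit, are made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class PeerSample(NamedTuple):
    stamp: str        # e.g. "Mar 15 15:08:04"
    tally: str        # first character of the peer line, e.g. "*" or "+"
    remote: str
    poll: int         # poll interval in seconds
    reach: int        # reach register (ntpq prints it in octal)
    offset_ms: float
    jitter_ms: float


def parse_peer_line(line: str) -> Optional[PeerSample]:
    """Parse one peer line; return None for headers, rulers and annotations."""
    head, sep, rest = line.partition(" : ")
    if not sep or not rest:
        return None
    # rest[0] is the tally character, the remainder holds the ten columns.
    fields = rest[1:].split()
    if len(fields) != 10:
        return None
    try:
        return PeerSample(
            stamp=" ".join(head.split()[:3]),
            tally=rest[0],
            remote=fields[0],
            poll=int(fields[5]),
            reach=int(fields[6], 8),
            offset_ms=float(fields[8]),
            jitter_ms=float(fields[9]),
        )
    except ValueError:
        # Non-numeric columns, e.g. the "remote refid st ..." header line.
        return None


def find_excursions(lines: Iterable[str], limit_ms: float = 50.0) -> List[PeerSample]:
    """Return the samples whose absolute offset exceeds limit_ms."""
    samples = (parse_peer_line(line) for line in lines)
    return [s for s in samples if s is not None and abs(s.offset_ms) > limit_ms]


log = [
    "Mar 15 14:59:02 ceph   : *time-server1    aaa.bb.cc.dd     2 u   52 1024  377    2.107    0.294   0.059",
    "- everything good until now, time-server 2 goes out of sync first",
    "Mar 15 15:08:04 ceph   : +time-server2    aaa.bb.cc.dd     3 u   59 1024  377    1.603  -107.04 107.000",
]
for sample in find_excursions(log):
    print(sample.stamp, sample.remote, sample.offset_ms)
```

The sketch only reports samples; it does not change anything about how the head node reacts to them.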

I know that the providers of the time service should get their act together, but I doubt that will happen, and I would like to harden my time sync config to survive such events without chaos. If anyone can point me to a suitable config, please do. I need a way to smooth out steep upstream oscillations, like a low-pass filter would do.
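
To illustrate what is meant by a low-pass filter here, the following sketch applies a simple exponential moving average to the time-server1 offsets from the log above. It is only an illustration of the idea, not a description of how ntpd or chrony discipline the clock. The names low_pass and alpha are hypothetical, and the uneven spacing of the samples in time is ignored.

```python
from typing import List, Sequence


def low_pass(offsets_ms: Sequence[float], alpha: float = 0.1) -> List[float]:
    """Exponential moving average: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    A small alpha means high inertia: a single large jump in the input
    moves the estimate by only alpha times the size of the jump.
    """
    smoothed: List[float] = []
    estimate = None
    for x in offsets_ms:
        # The first sample initialises the estimate; later ones nudge it.
        estimate = x if estimate is None else estimate + alpha * (x - estimate)
        smoothed.append(estimate)
    return smoothed


# Offsets (ms) reported for time-server1 in the log extract.
raw = [0.264, 0.294, -181.74, 61.178, 61.258, 45.879]
for before, after in zip(raw, low_pass(raw)):
    print(f"{before:9.3f} -> {after:9.3f}")
```

With alpha = 0.1 the filtered series swings far less than the raw one, at the price of reacting slowly to a genuine, lasting change in the source.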

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 01 February 2022 15:32
To: Janne Johansson
Cc: Ceph Users
Subject:  Re: Local NTP servers on monitor node's.

Hi Janne,

to ask the obviously stupid question: what does the NTP config file for a local NTP cluster with upstream time source look like? The man page for ntp.conf is too much mumbo jumbo and too fragmented for me.

Assume I have MONs at 192.168.0.65 - 67, would this config fragment on 192.168.0.65 with similar ones on the other 2 hosts work:

------------------------------------------
restrict 192.168.0.0 mask 255.255.224.0 nomodify notrap nopeer
restrict 192.168.0.65 nomodify notrap
restrict 192.168.0.66 nomodify notrap
restrict 192.168.0.67 nomodify notrap

server 192.168.0.66 iburst
server 192.168.0.67 iburst
peer 192.168.0.66
peer 192.168.0.67
------------------------------------------
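
As a quick sanity check of the address arithmetic in the fragment above, this Python sketch tests whether the three MON addresses fall inside the network given on the first restrict line. The helper name hosts_outside is made up for illustration, and the check says nothing about how ntpd evaluates the restrict flags themselves.

```python
import ipaddress
from typing import Iterable, List


def hosts_outside(network: str, netmask: str, hosts: Iterable[str]) -> List[str]:
    """Return the hosts that do NOT fall inside network/netmask."""
    net = ipaddress.ip_network(f"{network}/{netmask}")
    return [h for h in hosts if ipaddress.ip_address(h) not in net]


mons = ["192.168.0.65", "192.168.0.66", "192.168.0.67"]
# An empty list means every MON address is covered by the restrict network.
print(hosts_outside("192.168.0.0", "255.255.224.0", mons))
```

The mask 255.255.224.0 corresponds to a /19, so the first restrict line covers 192.168.0.0 through 192.168.31.255.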

I can't find the man page (CentOS 7) that describes the peer command. It has options, but they are not explained anywhere.

Thanks a lot!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Janne Johansson <icepic.dz@xxxxxxxxx>
Sent: 08 December 2021 09:14
To: mhnx
Cc: Ceph Users
Subject:  Re: Local NTP servers on monitor node's.

Den ons 8 dec. 2021 kl 02:35 skrev mhnx <morphinwithyou@xxxxxxxxx>:
> I've been building Ceph clusters since 2014 and the most annoying and
> worst failures are NTP server faults that leave the Ceph nodes with
> different times.
>
> I've fixed a few clusters because of NTP failures.
> - Sometimes NTP servers can be unavailable,
> - Sometimes NTP servers can go crazy.
> - Sometimes NTP servers can respond but systemd-timesyncd can not sync
> the time without manual help.
>
> I don't want to deal with another ntp problem and because of that I've
> decided to build internal ntp servers for the cluster.
>
> I'm thinking of creating 3 NTP servers on the 3 monitor nodes to get
> an internal ntp server cluster.
> I will use the internal NTP cluster for the OSD nodes and other services.
> This way, I believe that I'll always have a stable and fast time server.

We do something like this. The mons gather "calendar time" from outside
ntp servers, but also peer against each other, so if/when they drift
away, the mons drift by equal amounts. All OSDs/RGWs and ceph
clients then pull time from the mons, which serve internal ntp based on
their idea of what time it is.

Not using systemd, but both chronyd and ntpd allow you to set peers
with which you sync "sideways" just to keep pace between hosts.

--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx