Hi Robin,

thanks for your comment. I am indeed thinking about disabling step adjustments if necessary. For now, I looked at the log to estimate how long the upstream mess lasted and configured "tinker stepout 43200" (12 hours) in the hope that I can sit out such an excursion. I would still like to allow step adjustments in case a real one is needed. I'm watching with the new setting and will continue my saga after the next event.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Robin H. Johnson <robbat2@xxxxxxxxxx>
Sent: 19 March 2022 04:59:07
To: Frank Schilder
Cc: Ceph Users
Subject: Re: Re: Local NTP servers on monitor node's.

On Wed, Mar 16, 2022 at 10:49:15AM +0000, Frank Schilder wrote:
> Returning to this thread, I finally managed to capture the problem I'm
> facing in a log. The time service to the outside world is blocked by
> our organisation's firewall and I'm restricted to use internal time
> servers. Unfortunately, these seem to be periodically unstable. I
> caught a time-excursion in the log extracts shown below. My problem
> now is that such a transient causes time-havoc on the cluster, because
> the servers start to adjust in all directions.
...
> Is there a config to tell the head node to take it easy with jumps in
> the external clock source?
These are the "step" config knobs.

> Here the observation. It is annotated and filtered to contain only
> lines where the offset changes and I reduced it to show the incident
> with few lines, all as seen from the head node:
...
> I know that the providers of the time service should get their act
> together, but I doubt that will happen and I would like to harden my
> time sync config to survive such events without chaos. If anyone can
> point me to a suitable config, please do. I need a way to smoothen out
> steep upstream oscillations, like a low-pass filter would do.
If you did filter out the sudden jumps, you'd end up with your mons all (rightly) distrusting the bad time service, and then they could drift on their own.

There are better timenuts than I on the list, but I think the following MIGHT be a reasonable course of action.

1. Disable time stepping:
   "tinker stepfwd 0 stepback 0"
   (the exact syntax might vary depending on NTP version)
2. Set up your mons to all be NTP servers (possibly in addition to the existing head node). They should peer with each other explicitly.
3. Set up the rest of your cluster to consume from the mons ONLY.
4. Optional: if your time service providers are unreliable, investigate building/buying your own, and use it to feed time to the mons.

If all the mons end up distrusting the time service you have, they *should* retain consistent time between themselves, and thus the clients should also keep consistent time.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
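
For illustration only, steps 1 to 3 of the list above might translate into ntp.conf fragments roughly like the following. This is an untested sketch that assumes classic ntpd rather than chrony. All hostnames (ntp1.internal.example, mon1.cluster.example, and so on) are made-up placeholders, and the tinker line is copied from step 1 as written, so its exact syntax may need adjusting for the installed NTP version.

```conf
# --- ntp.conf on each mon (shown for a hypothetical mon1) ---

# Step 1: disable time stepping, as suggested in the list above.
# Syntax may vary by ntpd version; check the local ntp.conf man page.
tinker stepfwd 0 stepback 0

# Upstream: the organisation's internal time servers (placeholder names).
server ntp1.internal.example iburst
server ntp2.internal.example iburst

# Step 2: peer explicitly with the other mons (placeholder names).
peer mon2.cluster.example
peer mon3.cluster.example

# --- ntp.conf on every other cluster node ---

# Step 3: consume time from the mons ONLY (placeholder names).
server mon1.cluster.example iburst
server mon2.cluster.example iburst
server mon3.cluster.example iburst
```

Each mon would list the other mons as peers, so the peer lines differ per host. Whether the mons then hold together well enough during an upstream excursion is exactly what would need to be observed in the logs after the next event.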