Re: Date drift and ntpd

"Jason Pyeron" <jpyeron@xxxxxxxx> · Thu, 12 Aug 2010 09:27:54 -0400

> -----Original Message-----
> From: Todd Denniston
> Sent: Thursday, August 12, 2010 9:07
> Jason Pyeron wrote, On 08/12/2010 08:01 AM:
> >> -----Original Message-----
> >> From: Simon Billis
> >> Sent: Thursday, August 12, 2010 7:36
> >>
> >> Jason Pyeron sent a missive on 2010-08-12:
> >>
> >>> We have a local time server and all of our machines are pointed at it
> >>> for the time.
> >>>
> >>> How can the clock drift by a day and a half?
> >>>
> >>> [root@devserver21 ~]# date
> >>> Fri Aug 13 14:43:29 EDT 2010
> >>> [root@devserver21 ~]# rdate -s 192.168.1.67
> >>> [root@devserver21 ~]# date
> >>> Thu Aug 12 07:02:39 EDT 2010
> >>> [root@devserver21 ~]# cat /etc/ntp.conf | grep -v ^# | grep -v ^$
> >>> restrict default nomodify notrap noquery
> >>> restrict 127.0.0.1
> >>> server 192.168.1.67
> >>> server 192.168.1.66
> >>> server 192.168.1.65
> >>> server  127.127.1.0     # local clock
> >>> fudge   127.127.1.0 stratum 10
> >>> driftfile /var/lib/ntp/drift
> >>> broadcastdelay  0.008
> >>> keys            /etc/ntp/keys
> >>>
> >>>
> >> Hi,
> >>
> >> It is unlikely that the machine in question drifted forward in time
> >> if ntpd was running. Have a look at the logs in /var/log/messages; they
> >> should contain the ntpd log messages.
> > 
> > [root@devserver21 ~]# grep ntpd /var/log/messages
> > </snip>
> > Jul 28 20:34:41 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
> > Jul 28 21:08:00 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 28 21:08:00 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 21:08:11 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
> > Jul 28 21:24:58 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
> > Jul 28 21:41:26 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
> > Jul 28 21:42:16 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 28 21:42:16 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 21:42:34 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 21:43:37 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
>
> > tolerance 500 PPM
> > Jul 28 22:12:07 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 22:13:13 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 22:14:17 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
> > Jul 28 22:15:11 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
> > Jul 28 22:31:41 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 28 22:31:41 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
>
> > Jul 29 15:14:01 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 29 15:26:05 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
> > Jul 29 15:59:17 devserver21 ntpd[3475]: time reset -1.599691 s
> > Jul 29 16:03:31 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 29 16:05:38 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
> > Jul 29 16:08:46 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
> > Jul 29 16:11:55 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
>
> > Jul 29 17:23:57 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
> > Jul 29 17:24:59 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Jul 29 17:30:46 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
> > Jul 29 17:47:24 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
> > Aug 12 22:48:29 devserver21 ntpd[3475]: sendto(192.168.1.66): Operation not permitted
> > [root@devserver21 ~]# uptime
> >  08:10:19 up 164 days,  9:56,  2 users,  load average: 0.20, 0.54, 0.81
> > [root@devserver21 ~]#
> 
> Assumption: this is not from any kind of virtual machine.

Correct.
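
For scale, here is a rough sketch (Python, my own arithmetic, not anything ntpd
reports) comparing how far the clock was off with how much drift a 512 PPM
frequency error could produce. The two timestamps are the `date` outputs quoted
above. The 14-day free-running window is an assumption, taken from the gap
between the last logged sync on Jul 29 and today.

```python
from datetime import datetime

# The two `date` outputs from devserver21, before and after `rdate -s`.
# They were taken a few seconds apart, so the offset is approximate.
before = datetime(2010, 8, 13, 14, 43, 29)
after = datetime(2010, 8, 12, 7, 2, 39)
offset_s = (before - after).total_seconds()

# Frequency error ntpd complained about, in parts per million.
ppm = 512
drift_per_day_s = 86400 * ppm / 1_000_000

# Assumption: the clock free-ran for about 14 days (Jul 29 to Aug 12).
free_run_days = 14
expected_drift_s = free_run_days * drift_per_day_s

# Days of free-running at 512 PPM needed to accumulate the observed offset.
days_needed = offset_s / drift_per_day_s

print(f"observed offset:        {offset_s:.0f} s")
print(f"drift per day @512 PPM: {drift_per_day_s:.1f} s")
print(f"drift over {free_run_days} days:     {expected_drift_s:.0f} s")
print(f"days needed for offset: {days_needed:.0f}")
```

If that arithmetic is right, two weeks at 512 PPM accounts for roughly ten
minutes, while the observed offset of about 114050 s would take several years
at that rate. That makes me think the clock was set or stepped rather than
drifted, but I may be missing something.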

> Assumption: Your local time server is NOT a GPS with an 
> ovenized crystal or even a cell phone time source, i.e. NOT 
> very stable.

Correct.

> Assumption: the time servers that you are following 
> (192.168.1.6[57]) are:
> 	a) each following the same timeserver(s), or at least 
> have one in common.

192.168.1.6[567] are all one machine. The time on it is, and has been, good.
Other machines in the enterprise follow it accurately.

> 	b) peering with one another

n/a

> 	c) following time servers that are reasonably stable.

Appears to be so.

> Assumption: the time farm is on real, non-busy (an old Cisco 
> router serving as the internet connection to 1000+ computers 
> does not qualify as non-busy) hardware and is configured to 
> achieve maxpoll 10 or higher.

Unknown; I am assuming the latency is negligible. The important detail here is
that all the other machines on the LAN have the same time, and there is no
unusual latency there.

> 
> one problem that you have is that your timeserver farm 
> (192.168.1.6[57]) is occasionally losing its servers, i.e. 
> we see "synchronized to LOCAL(0)" occasionally, which should 

That was on an NTP client, not the NTP server. Am I misunderstanding you?

> not happen with a well configured time farm for hours to 
> days, not minutes.

Agreed, see above.
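
To put a number on how often this client fell back to LOCAL(0), something like
the following could be run over the grep output. It is only a sketch: the
sample text is the last four entries from the log quoted above, left run
together the way the mail client wrapped them, and the split-on-timestamp
regex is my own assumption about the syslog format, not anything ntpd provides.

```python
import re
from collections import Counter

# A few entries copied from the grep output, deliberately left run together.
raw = (
    "Jul 29 17:23:57 devserver21 ntpd[3475]: synchronized to 192.168.1.67, "
    "stratum 3 Jul 29 17:24:59 devserver21 ntpd[3475]: synchronized to "
    "LOCAL(0), stratum 10 Jul 29 17:30:46 devserver21 ntpd[3475]: "
    "synchronized to 192.168.1.65, stratum 3 Jul 29 17:47:24 devserver21 "
    "ntpd[3475]: synchronized to LOCAL(0), stratum 10"
)

# Split just before each syslog timestamp ("Mon DD HH:MM:SS ").
STAMP = r"(?=[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d )"
entries = [e.strip() for e in re.split(STAMP, raw) if e.strip()]

# Count which source each "synchronized to" entry names.
sources = Counter()
for entry in entries:
    match = re.search(r"synchronized to ([^,]+), stratum \d+", entry)
    if match:
        sources[match.group(1)] += 1

for source, count in sources.most_common():
    print(f"{source}: {count}")
```

In this four-entry sample, half of the sync messages are fallbacks to the
local clock.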

> 
> the second problem is that a machine which is not intended to 
> be a time server is configured with a local clock with a 
> stratum better than 15.
> 

I don't understand; I will have to read up more.

> suggestion 1: 65 should have local clock at stratum 13, 66 
> and 67 should have local clock at stratum

They are presently one machine.

> 14 or 15, all other machines should not have a local clock or 
> should not have one with a stratum better than 15. Yes I, 
> after reading the ntp documentation, disagree with RedHat's default.

Ok.

> net result should be that you don't get any local clock loops 
> in the setup because you have a defined leader, but if even 
> the defined leader is lost the other machines should do a 
> stable drift.
> 
> suggestion 2: 65, 66 & 67 should ALL peer with one another 
> for added stability in the time farm.
> 
> suggestion 3: client machines should 'prefer' one of your 
> servers over the others.
> 
> suggestion 4: see if someone has been messing with the kernel 
> ticks on the machine...
> run `tickadj` file:///usr/share/doc/ntp-4.2.2p1/tickadj.html

[root@devserver21 ~]# tickadj
tick = 10000
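
As far as I can tell that is the stock value. A small sketch of the arithmetic,
assuming the usual 100 ticks per second that tickadj reports against (the
USER_HZ name and the helper function are mine, for illustration only):

```python
USER_HZ = 100      # assumption: ticks per second as seen by tickadj
tick_us = 10000    # value printed by tickadj on devserver21

# Microseconds of clock time added per real second at this tick value.
nominal_us_per_s = tick_us * USER_HZ


def ppm_for_tick(tick, nominal=10000):
    """Frequency offset in PPM from running with `tick` instead of `nominal`."""
    return (tick - nominal) * 1_000_000 / nominal


print(nominal_us_per_s)        # microseconds per second at tick = 10000
print(ppm_for_tick(10001))     # effect of raising tick by one
print(ppm_for_tick(9995))      # effect of lowering tick by five
```

Under that assumption one step of tick is worth 100 PPM, so the -512 PPM that
ntpd logged would correspond to about five steps, and tick = 10000 suggests
nobody has adjusted it on this machine.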

> I had one computer where I needed to tweak the default value 
> up or down one (I don't remember) to have it be real stable, 
> this should be a last resort.
> 
> 
> --
> Todd Denniston
> Crane Division, Naval Surface Warfare Center (NSWC Crane) 
> Harnessing the Power of Technology for the Warfighter 
> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> http://lists.centos.org/mailman/listinfo/centos
> 

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.
