Re: clock skew

On 03/12/2014 05:04 PM, John Nielsen wrote:
On Mar 12, 2014, at 10:44 AM, Gandalf Corvotempesta <gandalf.corvotempesta@xxxxxxxxx> wrote:

2014-01-30 18:41 GMT+01:00 Eric Eastman <eric0e@xxxxxxx>:
I have this problem on some of my Ceph clusters, and I think it is because
the older hardware that I am using does not have the best clocks.  To fix the
problem, I set up one server in my lab to be my local NTP time server, and
then on each of my Ceph monitors, in the /etc/ntp.conf file, I put in a
single "server" line that reads:

   server XX.XX.XX.XX iburst burst minpoll 4 maxpoll 5
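
For reference, ntpd interprets minpoll and maxpoll as powers of two, in seconds, so the line above asks for polling roughly every 16 to 32 seconds. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def poll_interval_seconds(exponent: int) -> int:
    """Convert an ntp.conf minpoll/maxpoll exponent to seconds (2**exponent)."""
    return 2 ** exponent


# Values taken from the "server" line quoted above.
minpoll, maxpoll = 4, 5

print("shortest poll interval:", poll_interval_seconds(minpoll), "s")
print("longest poll interval:", poll_interval_seconds(maxpoll), "s")
```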

I'm using a local NTP server, and all mons are synced with it, but
Ceph still detects a clock skew.

Machine clocks aren't perfect, even with NTP. By default, Ceph is very sensitive to clock skew. I usually add this to my ceph.conf to prevent the warnings:

[mon]
   mon clock drift allowed = .500

That is, allow the clocks to drift up to 1/2 second before saying anything.


This is a tunable option precisely so that one can find the best value. The current default of .05 was increased from an earlier .01 just because our lab's NTP server wasn't able to keep the clocks that closely synchronized.

However, these warnings are meant to act as an early warning system for the monitor. There are some critical messages that need to be passed, and some timeouts that need to be reset in time. Failure to do so results in weirdness. And unlike the OSDs, the monitors do rely on real time, hence the need for synchronized server clocks. Failing to keep those clocks synchronized for some time may eventually have repercussions: monitors receiving timestamps somewhat in the past and thus ignoring them, or timeouts being triggered too soon or too late because a message wasn't duly received.

Anyway, most timeouts will hold for 5 seconds. Allowing clock drifts of up to 1 second may work, but we don't have hard data to support such a claim. Over a second of drift may be problematic if the monitors are under some workload and message handling is delayed -- in which case other timeouts may have to be adjusted, not only to account for the clock skew but also for the amount of work the monitor has to deal with.
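
To make the thresholds above concrete, here is a rough sketch that sorts a measured skew into the three ranges mentioned in this thread: within "mon clock drift allowed", above it but under 1 second, and over 1 second. The function names, the sample offsets, and the idea of taking the spread between per-monitor offsets are assumptions for illustration only. This is not how the monitors themselves measure skew.

```python
from typing import Dict

# Default "mon clock drift allowed" mentioned above, in seconds.
DEFAULT_DRIFT_ALLOWED = 0.05
# Drift beyond which problems become more likely under load, per the text above.
RISKY_DRIFT = 1.0


def max_pairwise_skew(offsets: Dict[str, float]) -> float:
    """Largest clock difference between any two monitors.

    `offsets` maps a (hypothetical) monitor name to its clock offset in
    seconds relative to a common reference, e.g. as reported by NTP.
    """
    values = list(offsets.values())
    return max(values) - min(values)


def classify_skew(skew: float, allowed: float = DEFAULT_DRIFT_ALLOWED) -> str:
    """Sort a skew (seconds) into the ranges discussed in the thread."""
    if skew <= allowed:
        return "ok"        # no warning expected
    if skew <= RISKY_DRIFT:
        return "warn"      # warning expected; may still work, no hard data
    return "risky"         # other timeouts may need adjusting


# Made-up sample offsets for three monitors.
sample = {"mon.a": 0.010, "mon.b": -0.020, "mon.c": 0.000}
skew = max_pairwise_skew(sample)
print(f"skew = {skew:.3f}s ->", classify_skew(skew))
```

Passing allowed=0.5 to classify_skew mirrors the "mon clock drift allowed = .500" setting quoted earlier.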


  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com