On Wed, Aug 12, 2020 at 4:09 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> My understanding is that the existing mon_clock_drift_allowed value of 50 ms (default) is so that PAXOS among the mon quorum can function. So OSDs (and mgrs, and clients etc.) are out of scope of that existing code.

This is correct — the monitors issue leases to each other, those leases are in absolute clock time, and we need some level of coherency to maintain our consistency guarantees there...

> Things like this are why I like to ensure that the OS does `ntpdate -b` or equivalent at boot time *before* starting ntpd / chrony - and other daemons.
>
> Now, as to why Ceph doesn't have analogous code to complain about other daemons / clients - I've wondered that for some time myself. Perhaps there's the idea that one's monitoring infrastructure should detect that, but that's a guess.

...but the only part of the rest of the Ceph stack that needs clocks to be anywhere near each other is the cephx rotating keys — and those need to be correct on the order of "within an hour", rather than "we do 5-second leases that need to agree on absolute time". Usually that kind of loose agreement isn't an issue, and in the core RADOS code we generally try not to impose requirements on the environment, so there's no clock sync check happening. Perhaps at this point it would be appropriate to add one; tracker tickets and PRs are welcome. ;)
-Greg

> > Yesterday, one of our OSD-only hosts came up with its clock about 8 hours wrong(!), having been out of the cluster for a week or so. Initially, Ceph seemed entirely happy, and then after an hour or so it all went south (OSDs start logging about bad authenticators, I/O pauses, general sadness).
> >
> > I know clock sync is important to Ceph, so "one system is 8 hours out, Ceph becomes sad" is not a surprise. It is perhaps a surprise that the OSDs were allowed in at all...
> >
> > What _is_ a surprise, though, is that at no point in all this did Ceph raise a peep about clock skew. Normally it's pretty sensitive to this - our test cluster has had clock skew complaints when a mon is only slightly out, and here we had a node 8 hours wrong.
> >
> > Is there some oddity like Ceph not warning on clock skew for OSD-only hosts? Or an upper bound on how large a discrepancy it will WARN about?
> >
> > Regards,
> >
> > Matthew
> >
> > Example output from mid-outage:
> >
> > root@sto-3-1:~# ceph -s
> >   cluster:
> >     id:     049fc780-8998-45a8-be12-d3b8b6f30e69
> >     health: HEALTH_ERR
> >             40755436/2702185683 objects misplaced (1.508%)
> >             Reduced data availability: 20 pgs inactive, 20 pgs peering
> >             Degraded data redundancy: 367431/2702185683 objects degraded (0.014%), 4549 pgs degraded
> >             481 slow requests are blocked > 32 sec. Implicated osds 188,284,795,1278,1981,2061,2648,2697
> >             644 stuck requests are blocked > 4096 sec. Implicated osds 22,31,33,35,101,116,120,130,132,140,150,159,201,211,228,263,327,541,561,566,585,589,636,643,649,654,743,785,790,806,865,1037,1040,1090,1100,1104,1115,1134,1135,1166,1193,1275,1277,1292,1494,1523,1598,1638,1746,2055,2069,2191,2210,2358,2399,2486,2487,2562,2589,2613,2627,2656,2713,2720,2837,2839,2863,2888,2908,2920,2928,2929,2947,2948,2963,2969,2972

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
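[Editor's note: the "happy for an hour, then bad authenticators" timeline matches the cephx rotating-key explanation above. The following toy sketch is not Ceph's actual code — the `key_accepted` helper, the timestamps, and the one-hour TTL are illustrative assumptions — but it shows why a badly skewed daemon only starts failing once the keys it booted with rotate out.]

```python
from datetime import datetime, timedelta

# Hypothetical, simplified model of a cephx rotating key: it carries an
# absolute validity window, so a daemon whose local clock is far off will
# judge fresh keys as not-yet-valid or already expired.
def key_accepted(issued: datetime, ttl: timedelta, local_clock: datetime) -> bool:
    return issued <= local_clock < issued + ttl

issued = datetime(2020, 8, 12, 12, 0)   # mon issues a key at 12:00 (true time)
ttl = timedelta(hours=1)                # keys are valid on the order of an hour

# Host with a correct clock: the key sits inside its validity window.
print(key_accepted(issued, ttl, datetime(2020, 8, 12, 12, 30)))   # True

# Host whose clock is 8 hours fast: every fresh key already looks expired,
# so authentication starts failing once the older keys rotate out.
print(key_accepted(issued, ttl, datetime(2020, 8, 12, 20, 30)))   # False
```

Until the first rotation the skewed host still holds a key whose window covers both clocks, which is why the cluster looked healthy for roughly an hour before the "bad authenticator" errors began.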