Re: High OSD apply latency right after new year (the leap second?)

On Wed, Jan 4, 2017 at 11:59 PM, Craig Chi <craigchi@xxxxxxxxxxxx> wrote:
> Hi ,
>
> I'm glad to know it didn't happen only to me.
> Though it is harmless, it looks like some kind of bug...
> Are there any Ceph developers who know exactly how the
> "ceph osd perf" command is implemented?
> Is the leap second really responsible for this behavior?
> Thanks.

Since no one else has shared any ideas, I'll just say I am befuddled
by this scenario. I would assume from the timing it has to do with the
leap second.

When I run this on one of our local nodes, I see one OSD with a sum of
4295929408.753196719 and a count that averages out to ~47 seconds per op. But
a sibling OSD on the same host has a count of 23391518 and a sum of
190575.354100817 (about 8 ms per op, as expected). That first sum is
startlingly close to 2^32 = 4294967296, so I'm thinking an op occurred during
the leap second, either the clock or our math went back in time, and we had an
unsigned wraparound error. ;)
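The wraparound hypothesis above can be sketched numerically. This is only an illustration: it assumes the duration's seconds field behaves like an unsigned 32-bit integer (as a u32 timestamp subtraction would), and the timestamps and the 962112 baseline are made-up values chosen to land near the observed sum, not data from the actual OSD.

```python
# Sketch: how a single negative duration, wrapped to unsigned 32-bit,
# inflates a latency sum counter by almost exactly 2^32.

U32 = 1 << 32  # 4294967296

def u32_duration(start_sec, end_sec):
    """Whole-second duration, wrapped to unsigned 32-bit, mimicking
    the subtraction of two u32 timestamps."""
    return (end_sec - start_sec) % U32

# Normal op: the clock moves forward.
assert u32_duration(1000, 1001) == 1

# Op spanning a backwards clock step (e.g. the leap second): the end
# timestamp is *earlier* than the start, so the difference wraps to
# just under 2^32.
wrapped = u32_duration(1000, 999)
print(wrapped)  # 4294967295, i.e. 2^32 - 1

# A sum counter that held ~962112 s before the bad sample would then read
# ~4.296e9 afterwards, close to the 4295929408 figure quoted above.
observed_sum = 962112 + wrapped
print(observed_sum)
```

One bad sample is enough: because the wrap adds nearly 2^32 to the sum, the average latency (sum/count) jumps to an absurd value even though every other sample was normal.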

You can probably clear it by resetting the perf counters (the admin
socket's "perf reset all" command should do it).
-Greg

>
> Sincerely,
> Craig Chi
>
> On 2017-01-04 19:55, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>
> yes,
> same here on 3 production clusters.
>
> no impact, but a nice happy new year alert ;)
>
>
> It seems that Google provides NTP servers that avoid the abrupt 1-second
> leap by smearing it:
>
> https://developers.google.com/time/smear
>
>
> ----- Original Message -----
> From: "Craig Chi" <craigchi@xxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Wednesday, 4 January 2017 11:26:21
> Subject: High OSD apply latency right after new year (the leap
> second?)
>
> Hi List,
>
> Three of our Ceph OSDs showed unreasonably high latency right after the first
> second of the new year (2017/01/01 00:00:00 UTC; I have attached the metrics,
> and I am in the UTC+8 timezone). There is exactly one PG (size=3) that
> contains these 3 OSDs.
>
> The reported OSD apply latency goes up to 25 minutes, and I also see this
> large number randomly when I execute the "ceph osd perf" command. But the 3
> OSDs do not show any strange behavior and have been performing fine so far.
>
> I have no idea how "ceph osd perf" is implemented, but is it related to this
> year's leap second? Since the cluster is not in production, and the
> developers were all celebrating the new year at that time, I cannot think of
> any other possibility.
>
> Did your clusters also get this interestingly unexpected new year's gift?
> Sincerely,
> Craig Chi
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



