Re: mon down for 3 hours after clocksource glitch

On Thu, Jul 18, 2013 at 6:27 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> Hi Dan,
>
> On Thu, 18 Jul 2013, Dan van der Ster wrote:
>> Hi,
>> Last night our cluster became unhealthy for 3 hours after one of the mons (a
>> qemu-kvm VM) had this glitch:
>>
>> Jul 18 00:12:43 andy03 kernel: Clocksource tsc unstable (delta =
>> -60129537028 ns).  Enable clocksource failover by adding
>> clocksource_failover kernel parameter.
>>
>> shortly afterwards the mon.2 log said:
>>
>> 2013-07-18 00:13:01.147770 7feca7a5d700  1 mon.2@1(probing).data_health(52)
>> service_dispatch not in quorum -- drop message
>> 2013-07-18 00:13:01.148622 7feca7a5d700  1 mon.2@1(synchronizing sync(
>> requester state start )) e3 sync_obtain_latest_monmap
>> 2013-07-18 00:13:01.148786 7feca7a5d700  1 mon.2@1(synchronizing sync(
>> requester state start )) e3 sync_obtain_latest_monmap obtained monmap e3
>> 2013-07-18 00:14:01.208569 7feca845e700  0 mon.2@1(synchronizing sync(
>> requester state chunks )).data_health(52) update_stats avail 59% total
>> 8547036 used 2993036 avail 5119824
>> 2013-07-18 00:15:01.298686 7feca845e700  0 mon.2@1(synchronizing sync(
>> requester state chunks )).data_health(52) update_stats avail 58% total
>> 8547036 used 3074968 avail 5037892
>> 2013-07-18 00:16:08.455941 7feca845e700  0 mon.2@1(synchronizing sync(
>> requester state chunks )).data_health(52) update_stats avail 58% total
>> 8547036 used 3147732 avail 4965128
>> [...]
>>
>> and that continued for over three hours until:
>>
>> 2013-07-18 03:32:33.991232 7f334ecd0700  0 mon.2@1(synchronizing sync(
>> requester state chunks )).data_health(0) update_stats avail 56% total
>> 8547036 used 3260064 avail 4852796
>> 2013-07-18 03:33:34.314538 7f334ecd0700  0 mon.2@1(synchronizing sync(
>> requester state chunks )).data_health(0) update_stats avail 56% total
>> 8547036 used 3294832 avail 4818028
>> 2013-07-18 03:34:05.285568 7f334e2cf700  0 log [INF] : mon.2 calling new
>> monitor election
>> 2013-07-18 03:34:05.285747 7f334e2cf700  1 mon.2@1(electing).elector(52)
>> init, last seen epoch 52
>>
>> In the meantime I tried restarting each ceph-mon daemon, but that didn't
>> speed up the recovery.
>>
>> We are talking with our OpenStack team to understand if they can provide a
>> more stable clocksource, but we wanted to better understand why Ceph was so
>> sensitive to this glitch. It is good that mon.2 managed to recover
>> eventually, but does anyone have an idea why it took 3 hours??!!
>
> A couple of possibilities.  There are some sync improvements in 0.61.5
> (just out) that can prevent sync loops, which might explain things (hard
> to say from the log fragments above).  Were there sync start messages in
> that 3 hour interval, or just the update_stats messages?

Yes, many of those too. The whole set of relevant logs is here:
http://pastebin.com/hCcu8Wm9
The weird bit at 01:13:12 is me telnetting to the daemon to check the
network, so just ignore that. Otherwise, the daemon was restarted
twice (once after a server rebooted), but I left it alone after ~01:33.
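
To get the counts above, I just tallied the sync-related lines in the
mon log; a rough sketch in Python follows (the substrings and the log
path are assumptions based on the fragments quoted above):

    # Rough tally of sync-related lines in the mon log. The marker
    # substrings and the log path are assumptions, not an exhaustive list.
    from collections import Counter

    markers = ("sync_obtain_latest_monmap", "requester state", "update_stats")
    counts = Counter()
    with open("/var/log/ceph/ceph-mon.2.log") as f:   # hypothetical path
        for line in f:
            for m in markers:
                if m in line:
                    counts[m] += 1
    print(counts)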

> If the clock skew persisted, the mons won't be able to hold a quorum, but
> that would happen after the election.  A 60s skew is definitely enough to
> cause problems...

I can't say for certain whether the skew persisted, but I had the
impression the clocks were OK shortly after the glitch. After the
machine rebooted, ntpd corrected the time only slightly -- not by 60s:

Jul 18 01:40:44 andy03 ntpd[4918]: time reset -0.775446 s
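
For scale: the delta in the kernel message above works out to roughly
-60.1 s, vastly more than this ntp step and, if I read the docs right,
Ceph's monitor clock drift tolerance (the 0.05 s default for
mon_clock_drift_allowed is my assumption). A quick back-of-the-envelope:

    # The kernel-reported TSC delta vs the assumed monitor drift tolerance.
    delta_ns = -60129537028          # from the clocksource message above
    delta_s = delta_ns / 1e9         # about -60.13 s
    drift_allowed = 0.05             # seconds; assumed Ceph default
    print("skew %.2f s, %.0fx the allowed drift"
          % (delta_s, abs(delta_s) / drift_allowed))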

It would be good to know whether the ceph-mon log explains the long
outage (and especially whether the underlying issue is fixed in
0.61.5). In any case, we're moving our mons to physical boxes soon, so
the clock problem will hopefully go away.

Thanks, Dan

>
> sage
>
>
>> thx, Dan
>>
>> --
>> Dan van der Ster
>> CERN IT-DSS
>>
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



