Re: Clocksource tsc unstable (delta = -4398046474878 ns)

"Beinicke, Thomas" <thomas.beinicke@xxxxxxxxxx> · Tue, 30 Mar 2010 19:04:21 +0200

On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> > On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> > > this message appeared in the KVM guest kern.log last night:
> > > 
> > > Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
> > > (delta = -4398046474878 ns)
> > > 
> > > The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> > > hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> > > 
> > > If I understand things correctly, in kernel/time/clocksource.c
> > > clocksource_watchdog() checks all the
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > every 0.5sec for a delta of more than 0.0625s. So the tsc must have
> > > changed more than one hour within two subsequent calls of
> > > clocksource_watchdog. No event in the host nor anything in the
> > > guest gives reasonable cause for this step.
> > > 
> > > However, the number 4398046474878 is only 36226 ns away from
> > > 4*1024*1024*1024*1024
> > > 
> >   I didn't see any such messages but I've had a recent experience with
> > the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> > two separate incidents.  Eerily the exact jumps, as best I can tell from
> > logs are of 17592 and 8796 seconds, give or take a second or two.  If
> > you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> > nanoseconds.
> > 
> >   What I've done that seems to have avoided this happening again is drop
> > KVM_CLOCK kernel option from the kvm guests' kernel.
> 
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
> 
> There are a lot of bit-shift operations happening with the clocksources,
> so there may be a real bug hidden somewhere in the code.
> Somehow ntp adjustment is involved, can this cause such huge steps?
> In my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I am having the same problem occasionally.
It only occurs when the VM is under heavy I/O or CPU load, but I can't
reproduce it reliably. It never occurs on VMs that only serve a few web pages.
I also noticed that on a machine which has this problem even an ssh shell is
*very* laggy, so it's not just a cosmetic problem.
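
To compare affected and unaffected guests, it might help to record which
clocksource each one is actually using. The following is only a rough sketch:
it reads the sysfs directory mentioned earlier in the thread, and the helper
name read_clocksources is my own invention, not an existing tool. It assumes
the standard current_clocksource and available_clocksource files are present
and simply returns None / an empty list if they are not.

```python
from pathlib import Path

# sysfs directory quoted earlier in this thread
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksource names.

    Hypothetical helper: a missing or unreadable file yields None for the
    current clocksource and an empty list for the available ones.
    """
    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return None

    current = read("current_clocksource")
    available = (read("available_clocksource") or "").split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:  ", current)
    print("available:", " ".join(available))
```

Running this inside a guest that shows the problem and inside one that does
not would at least tell us whether they differ in clocksource.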

Would removing hrtimer support from the kernel config solve it, or is it
necessary for KVM?

I remember this problem has been posted here before, though there wasn't any
real conclusion or solution for it.
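
For what it's worth, the power-of-two observations quoted above do check out.
Below is a small sketch of the arithmetic. The threshold constant is just the
0.0625s figure Sebastian mentioned, written as NSEC_PER_SEC >> 4, and the
function name nearest_power_of_two is made up for this example.

```python
NSEC_PER_SEC = 10**9

# 0.0625 s, the watchdog limit mentioned in the thread (assumed form)
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def nearest_power_of_two(ns):
    """Return (exponent, offset) so that abs(ns) == 2**exponent + offset,
    choosing the power of two closest to abs(ns). Expects ns != 0."""
    ns = abs(ns)
    hi = ns.bit_length()   # 2**hi > ns >= 2**(hi - 1)
    lo = hi - 1
    if (1 << hi) - ns < ns - (1 << lo):
        return hi, ns - (1 << hi)
    return lo, ns - (1 << lo)


# Delta from the kern.log message
delta_ns = -4398046474878
exponent, offset = nearest_power_of_two(delta_ns)
print(f"|delta| = 2^{exponent} {offset:+d} ns")
print("exceeds watchdog threshold:", abs(delta_ns) > WATCHDOG_THRESHOLD_NS)

# Forward jumps reported on the other host, in whole seconds
for e in (44, 43):
    print(f"2^{e} ns = {(1 << e) // NSEC_PER_SEC} s")
```

So the logged delta is 36226 ns short of 2^42 ns (roughly 73 minutes), and
2^44 and 2^43 ns come out at 17592 and 8796 whole seconds, matching the jumps
Athanasius saw.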