hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
None of that makes sense anymore in a guest. The hang detection, the warnings, and the recalibrations of min_clock_delta are completely wrong in this context. Not only does it warn spuriously, but the minimum timer delta keeps increasing and the guest progressively suffers from higher and higher latencies.
Well, it's not "slowly" -- the huge jump shown above is typical. If my calculations are correct, that's about a 0.5 sec min_delta.
That's really bad.
*nod* :)
Your patch lowers the immediate impact and makes this illness progress more smoothly by scaling down the recalibration of min_clock_delta. That mitigates the bug but doesn't solve it. I fear it could even be worse, because it makes the problem harder to notice.
Well, in the long run it's still no worse. The new code has a chance of eventually hitting the same min_delta values, but that chance is so small, and the time required so long, that it can be forgotten about completely.
Maybe we can instead increase the minimum number of loops in the hrtimer interrupt before considering it a hang? Hmm, but too high a number could make this check useless, depending on the number of pending timers, which is finite. Actually, I'm no longer confident in this check, or rather, we should change it. Maybe we can rebase it on the time spent in the hrtimer interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupt). Would a minimum threshold of 5 seconds spent in hrtimer_interrupt() be a reasonable check to perform? We should probably base the check on that kind of high boundary. What we want is an ultimate rescue against hard hangs anyway, not something that solves the source of the hang itself. After the min_clock_delta recalibration, the system will be unstable (e.g. high latencies). So if this must behave as a hammer, let's make sure we really need the hammer, even if we have to wait a few seconds before it triggers.
By the way, in all the other cases where I've seen this message ("hrtimer: interrupt too slow...") trigger, the problem was elsewhere, and recalibrating the timer was not a good idea anyway, because changing the timer didn't solve anything.

Back to the VM issue at hand. I (almost) understand what's happening in the discussion above, but I don't see how such *huge* delays can be explained by scheduling on a different CPU etc. The delays are measured in *seconds*, not nano- or microseconds. I can imagine, say, swapping on the host causing the whole guest to be swapped out for a while during timer interrupt handling. But that is NOT what's happening here, at least not that I can see. Yes, the host did some swapping:

  pswpin 17535
  pswpout 41602

but it's not massive, and I know exactly when it happened: when I was testing something else. Right now free(1) reports:

               total       used       free     shared    buffers     cached
  Mem:       8155280    8105704      49576          0    1209136      27440
  -/+ buffers/cache:    6869128    1286152
  Swap:      8388856     124112    8264744

(and f*ng vmstat, again, does not show any swapping activity at all)

So I think the problem is somewhere else. By the way, I *think* it only happens with kvm_clock, and does not happen with the acpi_pm clocksource. Is it worth checking?

Thanks!

/mjt