hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
None of that makes sense anymore in a guest. The hang detection, the warnings, and the recalibrations of min_clock_delta are completely wrong in this context. Not only does it warn spuriously, but the minimum timer delta keeps increasing and the guest progressively suffers from higher and higher latencies.
Well, it's not "slowly" -- the huge jump shown above is typical. If my calculations are correct, that's about a 0.5 sec min_delta.
That's really bad.
*nod* :)
Your patch lowers the immediate impact and makes this illness progress more smoothly by scaling down the recalibration of min_clock_delta. That mitigates the bug but doesn't solve it. I fear it could even be worse, because it makes the problem harder to notice.
Well, in the long run it's still no worse. The new code has a chance of eventually hitting the same min_delta values, but that chance is so small, and the time required so long, that it can be forgotten about completely.
Maybe we can instead increase the minimum number of loops in the hrtimer interrupt before considering it a hang? Hmm, but too high a number could make this check useless, depending on the number of pending timers, which is finite. Actually, I'm no longer confident in this check, or rather, we should change it. Maybe we can rebase it on the time spent in the hrtimer interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupt). Would a minimum threshold of 5 seconds spent in hrtimer_interrupt() be a reasonable check to perform? We should probably base the check on that kind of high boundary. What we want is an ultimate rescue against hard hangs anyway, not something that solves the source of the hang itself. After the min_clock_delta recalibration, the system will be unstable (e.g. high latencies). So if this must behave as a hammer, let's make sure we really need the hammer, even if we have to wait a few seconds before it triggers.
By the way, in all the other cases where I've seen this message ("hrtimer: interrupt too slow...") trigger, the problem was elsewhere, and recalibrating the timer was not a good idea anyway, because changing the timer didn't solve anything.

Back to the VM issue at hand. I (almost) understand what's happening in the discussion above, but I don't see how such *huge* delays can be explained by scheduling on a different CPU etc. The delays are measured in *seconds*, not nano- or microseconds. I can imagine, say, swapping on the host causing the whole guest to be swapped out for a while during timer interrupt handling. But that is NOT what's happening here, at least not that I can see. Yes, the host did some swapping:

  pswpin 17535
  pswpout 41602

but it's not massive, and I know exactly when it happened: when I was testing something else. Right now free(1) reports:

               total       used       free     shared    buffers     cached
  Mem:       8155280    8105704      49576          0    1209136      27440
  -/+ buffers/cache:    6869128    1286152
  Swap:      8388856     124112    8264744

(and f*ng vmstat, again, does not show any swapping activity at all)

So I think the problem is somewhere else. By the way, I *think* it only happens with kvm_clock, and does not happen with the acpi_pm clocksource. Is it worth checking?

Thanks!

/mjt