Re: kvm guest: hrtimer: interrupt too slow

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Thu, 8 Oct 2009 16:52:32 -0300

On Thu, Oct 08, 2009 at 06:06:39PM +0400, Michael Tokarev wrote:
> Thomas Gleixner wrote:
>> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>>
>>> Thomas Gleixner wrote:
>>>> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>>>>> Yesterday I was "lucky" enough to actually watch what was
>>>>> going on when the delay happened.
>>>>>
>>>>> I run a desktop environment on a kvm virtual machine here.
>>>>> The X server runs on a diskless terminal, and the rest, incl.
>>>>> the window manager etc., is started from a VM.
>>>>>
>>>>> And yesterday, during normal system load (nothing extra,
>>>>> and not idle either, and all the other guests were running
>>>>> under normal load too), I had a stall of everything in this
>>>>> X session for about 2-3, maybe 5 seconds.
>>>>>
>>>>> It felt like a completely stuck machine. Nothing was moving
>>>>> on the screen, no reaction to the keyboard etc.
>>>>>
>>>>> After several seconds it returned to normal, with the
>>>>> familiar hrtimer message in dmesg about increasing min_delta
>>>>> by another 50%.  (Without Marcelo's patch, it would have
>>>>> increased min_delta to a large number at this point.)
>>>>>
>>>>> To summarize: there's something, well, more interesting
>>>>> going on here.  In addition to the scheduling issues that
>>>>> cause timers to be calculated on the "wrong" CPU etc as
>>>> Care to elaborate?
>>> Such huge delays (in terms of seconds, not ms or ns): I don't
>>> understand how such delays can be explained by scheduling on a
>>> different CPU etc.  That's what I mean.  I know very little about
>>> all this low-level stuff, so I may be completely out of context,
>>> but that explanation does not look right to me, simple as that.
>>> "Scheduling mistakes" can produce errors in the range of
>>> milliseconds, but not seconds.
>>
>> I'm really missing the big picture here. 
>>
>> What does "causes timers to be calculated on the "wrong" CPU etc" mean?
>> And what do you consider a "scheduling mistake" ?
>
> From the initial diagnostics by Marcelo:
>
> > It seems the way hrtimer_interrupt_hanging calculates min_delta is
> > wrong (especially for virtual machines). The guest vcpu can be scheduled
> > out during the execution of the hrtimer callbacks (and the callbacks
> > themselves can do operations that translate to blocking operations in
> > the hypervisor).
> >
> > So high min_delta values can be calculated if, for example, a single
> > hrtimer_interrupt run takes two host time slices to execute, while some
> > other higher priority task runs for N slices in between.
>
> From this I conclude that the huge min_delta is due to some other task(s)
> on the host being run while this guest is in the hrtimer callback.  But I
> fail to see why that process on the host takes SO MUCH time, enough to
> push the resulting min_delta to 0.5s, or to cause 3-5 second delays in
> the guest.  Delays of several extra milliseconds are fine, but
> *seconds* is too much.
>
> Note again that neither host nor guest is under high load when
> this jump happens.  Also note that there are no high-priority processes
> running on the host; all are at the same priority level, including
> all the guests.
>
> Note also that so far I have seen it only on SMP guests, never on UP
> guests.  And only on guests with the kvm_clock clocksource, not with
> acpi_pm.
>
> What I'm trying to say is that it looks like something else is
> wrong here in the guest code.  Huge stalls, huge delays
> while in the hrtimer callback (I think it always happens when such
> a delay occurs; it's just noticed by the hrtimer code): that's
> the root cause of all this.  The (probably) wrong logic in hrtimer
> calibration just shows the results of something that's wrong
> elsewhere.

True.

It would be useful to collect sar output (sar -B -b -u) every second
on both host and guest. You already mentioned that load was low, but
this should give more details.

Was there swapping going on?
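A back-of-the-envelope sketch of the min_delta diagnosis quoted above, under stated assumptions: that the guest times hrtimer_interrupt in host wall-clock time (so any period the vcpu spends scheduled out is counted as handler time), and that the old code sets min_delta to a fixed multiple of that measurement. The function names, the factor of 3, the 50 us handler cost and the 20 ms host slice are all hypothetical, chosen for illustration; they are not taken from the kernel source or from the reports in this thread. The 50% step only models the behaviour described above, not the actual patch code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical multiplier: when an interrupt is judged "too slow",
 * this sketch assumes min_delta is set to this many times the
 * measured elapsed time. Not taken from the kernel source. */
#define HYPOTHETICAL_ELAPSED_FACTOR 3u

/* Elapsed time the guest would observe for one hrtimer_interrupt run,
 * assuming a host wall-clock based clocksource: the handler's own work
 * plus every host slice spent running other tasks while the vcpu was
 * scheduled out. */
static uint64_t observed_elapsed_ns(uint64_t handler_ns,
                                    uint64_t preempted_slices,
                                    uint64_t slice_ns)
{
    assert(slice_ns > 0); /* a zero-length slice is meaningless here */
    return handler_ns + preempted_slices * slice_ns;
}

/* Elapsed-time policy (assumed): min_delta scales with whatever the
 * guest happened to observe, so host preemption feeds straight in. */
static uint64_t min_delta_from_elapsed(uint64_t elapsed_ns)
{
    return elapsed_ns * HYPOTHETICAL_ELAPSED_FACTOR;
}

/* Incremental policy: grow the current min_delta by 50% per detected
 * hang, independent of how long the vcpu was descheduled. */
static uint64_t min_delta_grow_50pct(uint64_t current_ns)
{
    return current_ns + current_ns / 2;
}
```

With these illustrative numbers, observed_elapsed_ns(50000, 8, 20000000) gives 160050000 ns, and min_delta_from_elapsed() turns that into 480150000 ns, roughly the 0.5s reported, even though the handler itself only did 50 us of work. Note that getting there takes eight back-to-back 20 ms host slices, i.e. the kind of long preemption that is hard to explain on a lightly loaded host. The 50% step instead grows min_delta geometrically (1000, 1500, 2250 ns, ...) regardless of the stall length, so it limits the damage from one bad measurement but says nothing about why the vcpu stalled.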
