Re: qemu polling KVM_IRQ_LINE_STATUS when stopped

Hi Paolo et al.,

I recently noticed ~30% CPU usage on a paused Windows 10 VM, as
reported in https://bugs.launchpad.net/bugs/1851062 and
https://bugzilla.redhat.com/1638289 which, with the help of Christian
Ehrhardt, led me to your previous discussion of the issue with Andi Kleen
at https://lore.kernel.org/kvm/87a80pihlz.fsf@xxxxxxxxxxxxxxx/ quoted
below:

On Fri, 2017-10-20 at 18:51 -0400, Paolo Bonzini wrote:
> On Fri, 2017-10-20 at 13:50 -0700, Andi Kleen wrote:
>> On Fri, Oct 20, 2017 at 05:12:40PM +0200, Paolo Bonzini wrote:
>>> On 20/10/2017 16:09, Andi Kleen wrote:
>>>>> Unfortunately that's not possible in general.  Windows uses the periodic
>>>>> timer to track wall time (!), so if you do that your clock is going to
>>>>> be late when you resume the guest.
>>>> 
>>>> But when the guest cannot execute instructions
>>>> it cannot see whatever the handler does.
>>>> 
>>>> So the handler could always catch up after stopping for longer,
>>>> without making any difference.
>>> 
>>> You may be right... you should get the interrupt storm *after
>>> continuing* the guest, but not while it's stopped.
>> 
>> Maybe it would be fine to not have a storm, but only one. I believe real hardware
>> cannot have a storm because only one interrupt can be pending at a time.
> 
> Real hardware also doesn't pause for an extended period of time, with
> exceptions such as JTAG that aren't as prominent as pausing a virtual
> machine.  This is just how Windows works: unless it's S3/S4, it updates
> the time from RTC periodic timer ticks, and the frequency sometimes goes
> up to as much as 1024 or 2048 Hz (default being 64 Hz IIRC).
> 
> In fact, we have a lot of cruft in KVM just to track periodic timer
> ticks that couldn't be delivered and retry again a little later.  Without
> that, the smallest load on the host is enough for time to drift in
> Windows guests.

I'm trying to understand the cause and what options might exist for
addressing it.  Several questions:

1. Do I understand correctly that the CPU usage is due to counting
   RTC periodic timer ticks for replay when the guest is resumed?
2. If so, would it be possible to calculate the number of ticks
   required from the time delta at resume, rather than polling each
   tick while paused?
3. Presumably when restoring from a snapshot, Windows time must jump
   forward from the time the snapshot was taken.  How does this differ
   from resuming from a paused state?
4. How is this handled if the host is suspended (S3) while the VM is
   paused (or running) and no ticks are counted on the host?
5. I have not observed high CPU usage for paused VMs in VirtualBox.
   Would it be worth investigating how they handle this?

From the discussion in https://bugs.launchpad.net/bugs/1851062 it
appears that the issue does not occur for all Windows 10 VMs.  Does
that fit the theory that it is caused by RTC periodic timer ticks?  In my
VM, clockres reports

    Maximum timer interval: 15.625 ms
    Minimum timer interval: 0.500 ms
    Current timer interval: 15.625 ms

immediately before and after pausing, suggesting that a high periodic
tick frequency is not necessary to cause the issue.

Thanks,
Kevin
