Hi Paolo et al.,

I recently noticed ~30% CPU usage on a paused Windows 10 VM, as reported
in https://bugs.launchpad.net/bugs/1851062 and
https://bugzilla.redhat.com/1638289 which, with the help of Christian
Ehrhardt, led to your previous discussion of the issue with Andi Kleen at
https://lore.kernel.org/kvm/87a80pihlz.fsf@xxxxxxxxxxxxxxx/ quoted below:

On Fri, 2017-10-20 at 18:51 -0400, Paolo Bonzini wrote:
> On Fri, 2017-10-20 at 13:50 -0700, Andi Kleen wrote:
>> On Fri, Oct 20, 2017 at 05:12:40PM +0200, Paolo Bonzini wrote:
>>> On 20/10/2017 16:09, Andi Kleen wrote:
>>>>> Unfortunately that's not possible in general. Windows uses the periodic
>>>>> timer to track wall time (!), so if you do that your clock is going to
>>>>> be late when you resume the guest.
>>>>
>>>> But when the guest cannot execute instructions
>>>> it cannot see whatever the handler does.
>>>>
>>>> So the handler could always catch up after stopping for longer,
>>>> without making any difference.
>>>
>>> You may be right... you should get the interrupt storm *after
>>> continuing* the guest, but not while it's stopped.
>>
>> Maybe be find to not have a storm, but only one. I belive real hardware
>> cannot have a storm because only one interrupt can be pending at a time.
>
> Real hardware also doesn't pause for an extended period of time, with
> exceptions such as JTAG that aren't as prominent as pausing a virtual
> machine. This is just how Windows works: unless it's S3/S4, it updates
> the time from RTC periodic timer ticks, and the frequency sometimes goes
> up as much as 1024 or 2048 Hz (default being 64 Hz IIRC).
>
> In fact, we have a lot of cruft in KVM just to track periodic timer
> ticks that couldn't be delivered and retry again a little later. Without
> that, the smallest load on the host is enough for time to drift in
> Windows guests.

I'm trying to understand the cause and what options might exist for
addressing it. Several questions:

1. Do I understand correctly that the CPU usage is due to counting RTC
   periodic timer ticks for replay when the guest is resumed?
2. If so, would it be possible to calculate the number of ticks required
   from the time delta at resume, rather than polling each tick while
   paused?
3. Presumably when restoring from a snapshot, Windows time must jump
   forward from the time the snapshot was taken. How does this differ
   from resuming from a paused state?
4. How is this handled if the host is suspended (S3) when the VM is
   paused (or not paused) and ticks aren't counted on the host?
5. I have not observed high CPU usage for paused VMs in VirtualBox.
   Would it be worth investigating how they handle this?

From the discussion in https://bugs.launchpad.net/bugs/1851062 it appears
that the issue does not occur for all Windows 10 VMs. Does that fit the
theory it is caused by RTC periodic timer ticks? In my VM, clockres
reports

    Maximum timer interval: 15.625 ms
    Minimum timer interval: 0.500 ms
    Current timer interval: 15.625 ms

immediately before and after pausing, suggesting that high periodic tick
frequency is not necessary to cause the issue.

Thanks,
Kevin
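
P.S. To make question 2 concrete, here is a minimal sketch of the arithmetic I have in mind, in Python for brevity. The function names are hypothetical and do not correspond to anything in QEMU or KVM. It assumes the periodic rate stays fixed across the pause and that pause and resume timestamps come from a monotonic host clock in nanoseconds. Whether the guest would accept the resulting backlog of ticks on resume is a separate question.

```python
NS_PER_SEC = 1_000_000_000

def rate_from_interval_ms(interval_ms: float) -> float:
    """Convert a timer interval in ms (as printed by clockres) to a rate in Hz."""
    return 1000.0 / interval_ms

def missed_ticks(pause_ns: int, resume_ns: int, rate_hz: int) -> int:
    """Whole periodic ticks that elapsed between pause and resume.

    Hypothetical helper: computes the count once at resume from the time
    delta, instead of waking up for every tick while the guest is paused.
    """
    if resume_ns < pause_ns:
        raise ValueError("resume time precedes pause time")
    # Integer arithmetic avoids float rounding over long pauses.
    return (resume_ns - pause_ns) * rate_hz // NS_PER_SEC

if __name__ == "__main__":
    rate = int(rate_from_interval_ms(15.625))  # 15.625 ms interval is 64 Hz
    print(rate, missed_ticks(0, 10 * NS_PER_SEC, rate))
```

With the 15.625 ms interval reported above, a 10 second pause corresponds to 640 ticks to account for at resume.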