[PATCH v3 0/9] KVM: lapic: Fix a variety of timer adv issues

The recent change to automatically calculate lapic_timer_advance_ns
introduced a handful of gnarly bugs, and also exposed a latent bug by
virtue of the advancement logic being enabled by default.  Inspection
and debug revealed several other opportunities for minor improvements.

The primary issue is that the auto-tuning of lapic_timer_advance_ns is
completely unbounded, e.g. there's nothing in the logic that prevents the
advancement from creeping up to hundreds of milliseconds.  Adjusting the
timers by large amounts creates major discrepancies in the guest, e.g. a
timer that was configured to fire after multiple milliseconds may arrive
before the guest executes a single instruction.  While technically correct
from a time perspective, it breaks a reasonable assumption from the guest
that it can execute some number of instructions between timer events.
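
As a rough illustration of what bounding the auto-tuning means, here is a
minimal userspace sketch.  It is not the code from this series: the struct,
the function name, the step divisor and the 5000ns cap are all assumptions
chosen for the example.  The idea is simply that each adjustment is limited
to a fraction of the current value, and that tuning gives up and disables
advancement if the value ever exceeds the cap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch, not taken from the patches. */
#define TIMER_ADVANCE_MAX_NS	5000u
#define TIMER_ADVANCE_STEP_DIV	8u

/* Hypothetical container for the tuning state. */
struct timer_advance {
	uint32_t advance_ns;	/* current advancement, in nanoseconds */
	bool tuning_done;	/* set once auto-tuning has stopped */
};

/*
 * late_ns > 0: the timer fired late, so advance further.
 * late_ns < 0: the timer fired early, so advance less.
 * Each step is limited to advance_ns / TIMER_ADVANCE_STEP_DIV, which also
 * means a starting value of zero never grows in this sketch.
 */
static void timer_advance_adjust(struct timer_advance *ta, int64_t late_ns)
{
	uint32_t step;
	uint64_t err;

	if (ta->tuning_done)
		return;

	step = ta->advance_ns / TIMER_ADVANCE_STEP_DIV;

	if (late_ns < 0) {
		/* Unsigned negation avoids overflow on INT64_MIN. */
		err = 0 - (uint64_t)late_ns;
		ta->advance_ns -= (err < step) ? (uint32_t)err : step;
	} else {
		err = (uint64_t)late_ns;
		ta->advance_ns += (err < step) ? (uint32_t)err : step;
	}

	/* Hard cap: give up on tuning rather than creep upward forever. */
	if (ta->advance_ns > TIMER_ADVANCE_MAX_NS) {
		ta->advance_ns = 0;
		ta->tuning_done = true;
	}
}
```

Resetting to zero on overflow, as opposed to clamping at the cap, is one
possible policy; it treats an out-of-range value as a sign that the
measurements cannot be trusted.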

The other major issue is that the advancement is global, while TSC scaling
is done on a per-vCPU basis.  Adjusting the advancement at runtime
exacerbates this as there is no protection against multiple vCPUs and/or
multiple VMs concurrently modifying the advancement value, e.g. it can
effectively become corrupted or never stabilize due to getting bounced all
over tarnation.

As for the latent bug, when timer advancement was applied to the hv_timer,
i.e. the VMX preemption timer, the logic to trigger wait_for_lapic_timer()
was not updated.  As a result, a timer interrupt emulated via the hv_timer
can easily arrive too early from a *time* perspective, as opposed to
simply arriving early from a "number of instructions executed" perspective.

v3:
 - Split the refactoring of start_hv_timer() and ->set_hv_timer
   into three separate patches instead of attempting to do a big
   refactor in a single patch to fix three separate issues.
    - Explicitly cancel the hv timer to avoid
    - Use a param for "expired" instead of overloading the return
      value of ->set_hv_timer().
    - Check for a pending non-periodic timer interrupt in
      restart_apic_timer(). [Liran]
 - Add more Reviewed-by tags.

v2:
 - https://patchwork.kernel.org/cover/10903613/
 - Add explicit param to control automatic tuning. [Liran]
 - Document the effect of per-vCPU tracking on the module params.
 - Use fancy math to convert guest clockcycles to host nanoseconds
   instead of brute forcing the delay with a for loop. [Liran]
 - Refactor start_hv_timer()'s return semantics to move the "expired
   timer" handling up a level. [Liran and Paolo]
 - Add Liran's Reviewed-by tags.
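
For readers unfamiliar with the conversion mentioned in the v2 notes, the
arithmetic can be sketched in plain C as below.  This is an illustration
only: the function names and parameters are hypothetical, and it assumes
the guest TSC frequency is known in kHz.  A count of guest cycles is
converted to nanoseconds with one multiply and one divide, then clamped so
the delay never exceeds the configured advancement.

```c
#include <stdint.h>

/*
 * Convert guest TSC cycles to nanoseconds.  With the frequency in kHz,
 * one cycle lasts 1000000 / khz nanoseconds.  The multiply can overflow
 * for very large cycle counts (above roughly 1.8e13), which is far beyond
 * the short delays this sketch is meant for.
 */
static uint64_t guest_cycles_to_ns(uint64_t guest_cycles,
				   uint32_t guest_tsc_khz)
{
	if (guest_tsc_khz == 0)
		return 0;

	return guest_cycles * 1000000ULL / guest_tsc_khz;
}

/* Never delay for longer than the advancement that was applied. */
static uint64_t bounded_delay_ns(uint64_t guest_cycles,
				 uint32_t guest_tsc_khz,
				 uint32_t timer_advance_ns)
{
	uint64_t ns = guest_cycles_to_ns(guest_cycles, guest_tsc_khz);

	return (ns < timer_advance_ns) ? ns : timer_advance_ns;
}
```

For example, 3000 cycles on a guest whose TSC runs at 3000000 kHz (3 GHz)
works out to 1000ns, which would then be clamped to the advancement if the
advancement is smaller.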

v1: https://patchwork.kernel.org/cover/10899101/

Sean Christopherson (9):
  KVM: lapic: Hard cap the auto-calculated timer advancement
  KVM: lapic: Convert guest TSC to host time domain when delaying
  KVM: lapic: Track lapic timer advance per vCPU
  KVM: lapic: Allow user to disable auto-tuning of timer advancement
  KVM: lapic: Busy wait for timer to expire when using hv_timer
  KVM: lapic: Explicitly cancel the hv timer if it's pre-expired
  KVM: lapic: Refactor ->set_hv_timer to use an explicit expired param
  KVM: lapic: Check for a pending timer intr prior to start_hv_timer()
  KVM: VMX: Skip delta_tsc shift-and-divide if the dividend is zero

 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/lapic.c            | 81 +++++++++++++++++++++------------
 arch/x86/kvm/lapic.h            |  5 +-
 arch/x86/kvm/vmx/vmx.c          | 15 +++---
 arch/x86/kvm/x86.c              | 11 +++--
 arch/x86/kvm/x86.h              |  2 -
 6 files changed, 73 insertions(+), 44 deletions(-)

-- 
2.21.0



