Re: hard lockup in wait_lapic_expire() - bug in TSC deadline timer?

On 13/06/16 09:23, Paolo Bonzini wrote:

On 09/06/2016 11:31, Alan Jenkins wrote:
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1a2da0e..84abb1a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1273,7 +1273,6 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	struct swait_queue_head *q = &vcpu->wq;
-	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	if (atomic_read(&apic->lapic_timer.pending))
 		return;
@@ -1283,9 +1282,6 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 
 	if (swait_active(q))
 		swake_up(q);
-
-	if (apic_lvtt_tscdeadline(apic))
-		ktimer->expired_tscdeadline = ktimer->tscdeadline;
 }
 
 /*
@@ -1922,12 +1918,15 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
 
-	if (atomic_read(&apic->lapic_timer.pending) > 0) {
+	if (atomic_read(&ktimer->pending) > 0) {
 		kvm_apic_local_deliver(apic, APIC_LVTT);
-		if (apic_lvtt_tscdeadline(apic))
-			apic->lapic_timer.tscdeadline = 0;
-		atomic_set(&apic->lapic_timer.pending, 0);
+		if (apic_lvtt_tscdeadline(apic)) {
+			ktimer->expired_tscdeadline = ktimer->tscdeadline;
+			ktimer->tscdeadline = 0;
+		}
+		atomic_set(&ktimer->pending, 0);
 	}
 }
 
@@ -2007,6 +2006,71 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * kvm_apic_tsc_update - called when the guest or host changes the TSC
+ *
+ * Update the emulated TSC deadline timer.
+ *
+ * It is also required if the host changes TSC frequency scaling.  It is
+ * not required when we "catch up" the TSC, because that is maintaining
+ * the relationship between the TSC and real time.
+ */
+void kvm_apic_tsc_update(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+
+	if (!lapic_in_kernel(vcpu))
+		return;
+
+	if (!apic_lvtt_tscdeadline(apic))
+		return;
+
+	hrtimer_cancel(&ktimer->timer);
+
+	/*
+	 * Handle the case where the guest TSC is adjusted backwards, just
+	 * after inject_apic_timer_irqs().  When we hit wait_lapic_expire(),
+	 * we risk an extended busy-wait, because ktimer->expired_tscdeadline
+	 * will have receded further into the future.
+	 *
+	 * The guest does not get a chance to run in this window, but the
+	 * host could modify the TSC at this point.  This race could happen
+	 * when restoring a live snapshot of a VM.
+	 *
+	 * Just clear the busy-wait.  In theory this could cause a wakeup
+	 * before the deadline, if the user configured the busy-wait
+	 * feature (lapic_timer_advance_ns).  We don't expect this to matter:
Why not just delay by lapic_timer_advance_ns here?  It's usually around
10 nanoseconds, so it should be cheap.

Paolo

I've done some more thinking since then. Page-long justifications seem sub-optimal, so I agree with your point and I'm looking into it. (Though I think it's around 10 microseconds; 40 CPU cycles sounds very short :-P).

Thanks
Alan


I thought it would be most natural to call wait_lapic_expire() before returning to userspace. That way we avoid returning with expired_tscdeadline set, and userspace doesn't see the IRQ injected before the deadline.

So I checked the pre-conditions for wait_lapic_expire() and tripped over another issue.

I don't think it copes well with tsc_catchup=1, even where it's called now. vcpu_load() can make the guest TSC appear to stand still for a while, because the "catchup" logic is driven from get_kernel_ns(), i.e. at the resolution of the kernel timer tick. The busy-wait could be extended by up to 1 ms (HZ=1000), so this low-latency feature defeats itself again :-(. As with the longer busy-wait issue, it wouldn't be limited to systems where the busy-wait feature has been configured.

That could explain the short busy-waits of up to 0.5 ms I saw while running a VM: they happened after I'd triggered the long busy-wait, and the TSC had been marked unstable by the clocksource watchdog.

- Don't support it? I.e. disable lapic_timer_advance_ns on systems without perfectly synchronized TSCs? Has no one noticed because no such systems have been configured this way? Or does it escape their validation tests, in which case this change could be perceived as a regression?

- Hack tsc_catchup to take the hrtimer expiry into account? Or disable busy-waits only on the old CPUs lacking X86_FEATURE_CONSTANT_TSC?


