Re: [PATCH v2] KVM: x86: reset lapic_timer.expired_tscdeadline at SET_LAPIC time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20/06/16 14:05, Marcelo Tosatti wrote:
Alan Jenkins reports hang at
https://bugzilla.redhat.com/show_bug.cgi?id=1337667,
due to guest TSC being set far behind than
lapic_timer.expired_tscdeadline, when restoring VM state
on top of currently active VM.

It is not possible to disable LAPIC timer advancement
(by setting lapic_timer.expired_tscdeadline = 0), at
guest TSC write

I like that it acknowledges (though only implicitly) the guest can trigger arbitrary lockups of host CPUs.

because:

* APIC write: expiration = 1000.
* LAPIC tsc deadline code sets timer to 1000-30.
* Timer fires at 970.
* Guest writes TSC=w.

Guest fails to VM-entry to process signal to perform
"vmload" in userspace.

Case 1: w > 970:
Guest entry can be performed.

Case 2: w < 970:
Guest entry should not be performed because "An interrupt is generated
when the logical processor’s time-stamp counter equals or exceeds the
target value in the IA32_TSC_DEADLINE MSR."

In case 2, hardware would not fire an interrupt.

To fix the problem, disable timer advancement when
userspace sets the LAPIC state. Setting of APIC
resets all APIC state, including
any pending interrupt.

Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Reported-by: Alan Jenkins <alan.christopher.jenkins@xxxxxxxxx>

However I feel this doesn't admit (even implicitly) that host software (not necessarily root) can still hard-lockup the CPU. It depends on the sequence of operations, and the message doesn't show that sequence explicitly. I now understand what the sequence that _is_ in the message shows, but it's unfortunately distracting.

I.e. if you restore the LAPIC first (or omit to do so at all), then restore the TSC deadline MSR, then the TSC MSR.

The patch assumes that the LAPIC is restored after the MSRs so it can clear the incorrect value of expired_tscdeadline, right?

I didn't know whether this patch would work until I tested it, because I didn't try to nail down the exact sequence QEMU is using.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea306ad..89be6e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2991,6 +2991,7 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
  {
  	kvm_apic_post_state_restore(vcpu, s);
  	update_cr8_intercept(vcpu);
+	vcpu->arch.apic->lapic_timer.expired_tscdeadline = 0;

  	return 0;
  }


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux