On 20/06/16 14:05, Marcelo Tosatti wrote:
Alan Jenkins reports hang at https://bugzilla.redhat.com/show_bug.cgi?id=1337667, due to guest TSC being set far behind than lapic_timer.expired_tscdeadline, when restoring VM state on top of currently active VM. It is not possible to disable LAPIC timer advancement (by setting lapic_timer.expired_tscdeadline = 0), at guest TSC write
I like that it acknowledges (though only implicitly) the guest can trigger arbitrary lockups of host CPUs.
because: * APIC write: expiration = 1000. * LAPIC tsc deadline code sets timer to 1000-30. * Timer fires at 970. * Guest writes TSC=w. Guest fails to VM-entry to process signal to perform "vmload" in userspace. Case 1: w > 970: Guest entry can be performed. Case 2: w < 970: Guest entry should not be performed because "An interrupt is generated when the logical processor’s time-stamp counter equals or exceeds the target value in the IA32_TSC_DEADLINE MSR." In case 2, hardware would not fire an interrupt. To fix the problem, disable timer advancement when userspace sets the LAPIC state. Setting of APIC resets all APIC state, including any pending interrupt. Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> Reported-by: Alan Jenkins <alan.christopher.jenkins@xxxxxxxxx>
However I feel this doesn't admit (even implicitly) that host software (not necessarily root) can still hard-lockup the CPU. It depends on the sequence of operations, and the message doesn't show that sequence explicitly. I now understand what the sequence that _is_ in the message shows, but it's unfortunately distracting.
I.e. if you restore the LAPIC first (or omit to do so at all), then restore the TSC deadline MSR, then the TSC MSR.
The patch assumes that the LAPIC is restored after the MSRs so it can clear the incorrect value of expired_tscdeadline, right?
I didn't know whether this patch would work until I tested it, because I didn't try to nail down the exact sequence QEMU is using.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ea306ad..89be6e9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2991,6 +2991,7 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, { kvm_apic_post_state_restore(vcpu, s); update_cr8_intercept(vcpu); + vcpu->arch.apic->lapic_timer.expired_tscdeadline = 0; return 0; }
-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html