On Oct 8, 2014, at 1:06 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> And it would get one from the currently pending timer.
>
> What about the following patch?
> (The introduced else branch could use some abstractions.)
>
> --8<---
> KVM: x86: fix deadline tsc interrupt injection
>
> The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a
> situation where we lose a pending deadline timer in a MSR write.
> Losing it is fine, because it effectively occurs before the timer fired,
> so we should be able to cancel or postpone it.
>
> Another problem comes from interaction with QEMU, or other userspace
> that can set deadline MSR without a good reason, when timer is already
> pending: one guest's deadline request results in more than one
> interrupt because one is injected immediately on MSR write from
> userspace and one through hrtimer later.
>
> The solution is to remove the injection when replacing a pending timer
> and to improve the usual QEMU path, we inject without a hrtimer when the
> deadline has already passed.
>
> Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Reported-by: Nadav Amit <namit@xxxxxxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/lapic.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8345dd..51428dd 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  		if (likely(tscdeadline > guest_tsc)) {
>  			ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  			do_div(ns, this_tsc_khz);
> +			hrtimer_start(&apic->lapic_timer.timer,
> +				ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> +		} else {
> +			atomic_inc(&ktimer->pending);
> +			kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>  		}
> -		hrtimer_start(&apic->lapic_timer.timer,
> -			ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>
>  		local_irq_restore(flags);
>  	}
> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
>  		return;
>
>  	hrtimer_cancel(&apic->lapic_timer.timer);
> -	/* Inject here so clearing tscdeadline won't override new value */
> -	if (apic_has_pending_timer(vcpu))
> -		kvm_inject_apic_timer_irqs(vcpu);
>  	apic->lapic_timer.tscdeadline = data;
>  	start_apic_timer(apic);
>  }

Perhaps I am missing something, but I don't see how it solves the problem I encountered. Recall the scenario:

1. A TSC deadline timer interrupt is pending.
2. The TSC deadline has not been cleared yet (clearing happens during vcpu_run).
3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline MSR.

It appears that in such a scenario the guest would still get a spurious interrupt for no reason, since ktimer->pending may already have been increased in apic_timer_fn.

Second, I think that the solution I proposed would perform better. Currently, the timer is cancelled and set up again many times unnecessarily, and this patch does not resolve that.
Last, I think that having fewer interrupts on deadline changes is not entirely consistent with the SDM, which says: "If software disarms the timer or postpones the deadline, race conditions may result in the delivery of a spurious timer interrupt." It never says that interrupts may be lost if you reprogram the deadline before checking whether it expired.

Thanks,
Nadav