2014-10-10 11:45+0200, Paolo Bonzini: > Il 10/10/2014 03:55, Nadav Amit ha scritto: > >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >> index b8345dd..51428dd 100644 > >> --- a/arch/x86/kvm/lapic.c > >> +++ b/arch/x86/kvm/lapic.c > >> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) > >> if (likely(tscdeadline > guest_tsc)) { > >> ns = (tscdeadline - guest_tsc) * 1000000ULL; > >> do_div(ns, this_tsc_khz); > >> + hrtimer_start(&apic->lapic_timer.timer, > >> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> + } else { > >> + atomic_inc(&ktimer->pending); > >> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > >> } > >> - hrtimer_start(&apic->lapic_timer.timer, > >> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> > >> local_irq_restore(flags); > >> } > >> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data) > >> return; > >> > >> hrtimer_cancel(&apic->lapic_timer.timer); > >> - /* Inject here so clearing tscdeadline won't override new value */ > >> - if (apic_has_pending_timer(vcpu)) > >> - kvm_inject_apic_timer_irqs(vcpu); > >> apic->lapic_timer.tscdeadline = data; > >> start_apic_timer(apic); > >> } > > > > Perhaps I am missing something, but I don’t see how it solves the problem I encountered. > > Recall the scenario: > > 1. A TSC deadline timer interrupt is pending. > > 2. TSC deadline was still not cleared (which happens during vcpu_run). > > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. > > > > It appears that in such scenario the guest would still get spurious > > interrupt for no reason, as ktimer->pending may already be increased in > > apic_timer_fn. > > If ktimer->pending == 2 you still get only one interrupt, not two. Am I > misunderstanding your objection? (I don't see it either, the patch removed kvm_inject_apic_timer_irqs(), so we inject as if the MSR write didn't happen.) > > Second, I think that the solution I proposed would perform better. > > Currently, there are many unnecessary cancellations and setups of the > > timer. This solution does not resolve this problem. > > I think it does. You do not get an hrtimer_start if tscdeadline <= > guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueued > before calling hrtimer_cancel, or go straight to the source and avoid > taking the lock in the easy cases: I think the useless cancels were when the timer is set to the same value in the future. Which makes sense to optimize, but I wasn't sure how it would affect the guest if the TSC offset was be changing or TSC itself wasn't stable ... so I chose not to do it. > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index 1c2fe7de2842..6ce725007424 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) > { > struct hrtimer_clock_base *base; > unsigned long flags; > - int ret = -1; > + unsigned long state = timer->state; > + int ret; > + > + if (state & HRTIMER_STATE_ENQUEUED) > + return 0; It doesn't try very hard to cancel it ;) > + if (state & HRTIMER_STATE_CALLBACK) > + return -1; > > base = lock_hrtimer_base(timer, &flags); > > + ret = -1; > if (!hrtimer_callback_running(timer)) > ret = remove_hrtimer(timer, base); > > > Last, I think that having less interrupts on deadline changes is not > > completely according to the SDM which says: "If software disarms the > > timer or postpones the deadline, race conditions may result in the > > delivery of a spurious timer interrupt.” It never says interrupts may > > be lost if you reprogram the deadline before you check it expired. Yeah, too bad it wasn't written in a formally defined language ... They could be just reserving design space for concurrent implementation, not making race conditions mandatory. Our race windows are much bigger -- a disarm that is hundreds of cycles in the future can easily be hit after a msr write vm-exit, but it would be very unlikely to get an interrupt on bare metal. And thinking about the meaning of postpone or disarm -- software doesn't want the old interrupt anymore, so it would just add useless work. (And I liked the code better without it, but it'd be ok if we fixed all the cases.) > But the case when you rewrite the same value to the MSR is neither > disarming nor postponing. You would be getting two interrupts for the > same event. That is why I agree with Radim that checking host_initiated > is wrong. Current implementation also injects two interrupts when the timer is set to a lower nonzero value, which should give us just one under both interpretations. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html