On 09/15/2011 08:25 PM, Jan Kiszka wrote:
>> I think so.  Suppose the vcpu enters just after kvm_make_request(); it
>> sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
>> wasn't set yet.  Then comes a kick, the guest is reentered with
>> nmi_pending set but KVM_REQ_EVENT clear, and it sails through the check
>> and enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.
>
> That makes sense - and the old code looks more strange now.
I think it dates to the time all NMIs were synchronous.
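In other words, the injection side has to make nmi_pending visible before
the vcpu can consume KVM_REQ_EVENT, otherwise the NMI just waits for the
next event request.  Something like this (sketch of the idea only, not the
exact patch - the kick may be done elsewhere on some paths):

    void kvm_inject_nmi(struct kvm_vcpu *vcpu)
    {
            atomic_inc(&vcpu->arch.nmi_pending);    /* publish the NMI first...    */
            kvm_make_request(KVM_REQ_EVENT, vcpu);  /* ...then tell the vcpu to    */
            kvm_vcpu_kick(vcpu);                    /* re-check its pending events */
    }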
>>>>
>>>>            /* try to inject new event if pending */
>>>> -          if (vcpu->arch.nmi_pending) {
>>>> +          if (atomic_read(&vcpu->arch.nmi_pending)) {
>>>>                    if (kvm_x86_ops->nmi_allowed(vcpu)) {
>>>> -                          vcpu->arch.nmi_pending = false;
>>>> +                          atomic_dec(&vcpu->arch.nmi_pending);
>>>
>>> Here we lost NMIs in the past by overwriting nmi_pending while another
>>> one was already queued, right?
>>
>> One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
>> get picked up by the vcpu by the time the second nmi arrives, we lose
>> the second nmi.
>
> Thinking this through again, it's actually not yet clear to me what we
> are modeling here: If two NMI events arrive almost perfectly in
> parallel, does the real hardware guarantee that they will always cause
> two NMI events in the CPU?  Then this is required.
It's not 100% clear from the SDM, but this is what I understood from it.
And it's needed - the NMI handlers are now being reworked to handle just
one NMI source (hopefully the cheapest) in the handler, and if we detect
a back-to-back NMI, to go back and handle all possible NMI sources.

This optimization is needed in turn so we can use Jeremy's paravirt
spinlock framework, which requires a sleep primitive and a
wake-up-even-if-the-sleeper-has-interrupts-disabled primitive.  I thought
of using HLT and NMI respectively, but that means we need a cheap NMI
handler (i.e. one that doesn't go reading PMU MSRs).
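(For reference, the rework I mean is along these lines - just a sketch of
the idea; handle_cheapest_nmi_source(), back_to_back_nmi() and
handle_all_nmi_sources() are made-up names, not the real handlers:)

    static void nmi_handler(struct pt_regs *regs)
    {
            /* keep the common path cheap, e.g. just the paravirt wakeup check */
            if (handle_cheapest_nmi_source(regs))
                    return;

            /*
             * Looks like the second of a back-to-back pair: only now do the
             * expensive poll of every possible source (PMU MSRs and friends).
             */
            if (back_to_back_nmi())
                    handle_all_nmi_sources(regs);
    }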
> Otherwise I just lost understanding again why we were losing NMIs here
> and in kvm_inject_nmi() (maybe elsewhere then?).
Because of that.
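To spell it out (illustration only, not code from the patch): with the old
bool, a second NMI that arrived before the vcpu consumed the first one just
rewrote the flag; the counter remembers it.

            /* old: a bool - the second NMI rewrites the flag */
            vcpu->arch.nmi_pending = true;          /* first NMI          */
            vcpu->arch.nmi_pending = true;          /* second NMI - lost  */

            /* new: a counter - the second NMI is remembered */
            atomic_inc(&vcpu->arch.nmi_pending);    /* first NMI          */
            atomic_inc(&vcpu->arch.nmi_pending);    /* second NMI - kept  */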
>>>>            if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>                    inject_pending_event(vcpu);
>>>>
>>>>                    /* enable NMI/IRQ window open exits if needed */
>>>> -                  if (nmi_pending)
>>>> +                  if (atomic_read(&vcpu->arch.nmi_pending)
>>>> +                      && nmi_in_progress(vcpu))
>>>
>>> Is nmi_pending && !nmi_in_progress possible at all?
>>
>> Yes, due to NMI-blocked-by-STI.  A really touchy area.
>
> And we don't need the window exit notification then?  I don't
> understand what nmi_in_progress is supposed to do here.
We need the window notification in both cases. If we're recovering from STI, then we don't need to collapse NMIs. If we're completing an NMI handler, then we do need to collapse NMIs (since the queue length is two, and we just completed one).
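Something like this is the intent (sketch only; collapse_pending_nmis_to()
is a made-up helper, and nmi_in_progress() stands for "the guest is inside
an NMI handler", however we end up detecting that):

            if (atomic_read(&vcpu->arch.nmi_pending)) {
                    /*
                     * The architectural queue is one NMI running plus at most
                     * one latched, so while a handler is running anything
                     * beyond one pending NMI can be collapsed.
                     */
                    if (nmi_in_progress(vcpu))
                            collapse_pending_nmis_to(vcpu, 1);  /* made-up helper */
                    /* in both cases, ask for the NMI window exit */
                    kvm_x86_ops->enable_nmi_window(vcpu);
            }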
>>> If not, what will happen next?
>>
>> The NMI window will open and we'll inject the NMI.
>
> How will we know this?  We do not request the exit, that's my worry.
I think we do?  Oh, but this patch breaks it.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.