Re: [RFC] KVM: Fix simultaneous NMIs

On 09/15/2011 08:25 PM, Jan Kiszka wrote:
>>
>>  I think so.  Suppose the vcpu enters just after kvm_make_request(); it
>>  sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
>>  wasn't set yet.  Then comes a kick, the guest is reentered with
>>  nmi_pending set but KVM_REQ_EVENT clear and sails through the check and
>>  enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.

> That makes sense - and the old code looks even stranger now.

I think it dates to the time all NMIs were synchronous.
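
To spell out the ordering that avoids the race (a minimal sketch of the
injection side, using the names from the fragments below - not
necessarily what the final patch will look like):

    void kvm_inject_nmi(struct kvm_vcpu *vcpu)
    {
            /*
             * Queue the NMI *before* raising KVM_REQ_EVENT.  If the
             * request went up first, the vcpu could consume
             * KVM_REQ_EVENT, still see nmi_pending == 0, and enter
             * the guest; the NMI would then sit queued until the
             * next unrelated KVM_REQ_EVENT.
             */
            atomic_inc(&vcpu->arch.nmi_pending);
            kvm_make_request(KVM_REQ_EVENT, vcpu);
            kvm_vcpu_kick(vcpu);
    }

Using atomic_inc() instead of a plain store also keeps a second NMI
from overwriting a first one that hasn't been picked up yet - see
below.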

>>>>
>>>>    	/* try to inject new event if pending */
>>>>   -	if (vcpu->arch.nmi_pending) {
>>>>   +	if (atomic_read(&vcpu->arch.nmi_pending)) {
>>>>    		if (kvm_x86_ops->nmi_allowed(vcpu)) {
>>>>   -			vcpu->arch.nmi_pending = false;
>>>>   +			atomic_dec(&vcpu->arch.nmi_pending);
>>>
>>>  Here we lost NMIs in the past by overwriting nmi_pending while another
>>>  one was already queued, right?
>>
>>  One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
>>  get picked up by the vcpu by the time the second nmi arrives, we lose
>>  the second nmi.

> Thinking this through again, it's actually not yet clear to me what we
> are modeling here: If two NMI events arrive almost perfectly in
> parallel, does the real hardware guarantee that they will always cause
> two NMI events in the CPU? If so, then this is required.

It's not 100% clear from the SDM, but this is what I understood from it.
And it's needed - the NMI handlers are now being reworked to handle just
one NMI source (hopefully the cheapest) in the handler, and, if we detect
a back-to-back NMI, to handle all possible NMI sources.

This optimization is needed in turn so we can use Jeremy's paravirt
spinlock framework, which requires a sleep primitive and a
wake-up-even-if-the-sleeper-has-interrupts-disabled primitive.  I thought
of using HLT and NMIs respectively, but that means we need a cheap
handler (i.e. one that doesn't go reading PMU MSRs).
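
So the model being assumed is: one NMI can be executing while at most
one more is latched; a third simultaneous NMI is lost on real hardware
as well.  A hypothetical helper (illustration only, not part of the
patch) would capture it like this:

    /*
     * With one NMI in service and one latched, a third simultaneous
     * NMI is indistinguishable to the guest, so cap the queue at two.
     * (Sketch only - the real code would do this on the vcpu thread,
     * which is the only place nmi_pending is consumed.)
     */
    static void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
    {
            if (atomic_read(&vcpu->arch.nmi_pending) > 2)
                    atomic_set(&vcpu->arch.nmi_pending, 2);
    }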

> Otherwise I've lost my understanding again of why we were losing NMIs
> here and in kvm_inject_nmi (maybe elsewhere, then?).

Because of that.

>>
>>>>    	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>    		inject_pending_event(vcpu);
>>>>
>>>>    		/* enable NMI/IRQ window open exits if needed */
>>>>   -		if (nmi_pending)
>>>>   +		if (atomic_read(&vcpu->arch.nmi_pending)
>>>>   +		&&   nmi_in_progress(vcpu))
>>>
>>>  Is nmi_pending && !nmi_in_progress possible at all?
>>
>>  Yes, due to NMI-blocked-by-STI.  A really touchy area.

> And we don't need the window exit notification then? I don't understand
> what nmi_in_progress is supposed to do here.

We need the window notification in both cases. If we're recovering from STI, then we don't need to collapse NMIs. If we're completing an NMI handler, then we do need to collapse NMIs (since the queue length is two, and we just completed one).
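
In other words, something like this (a sketch of the logic I mean, not
the hunk quoted above - which, as noted below, gets the STI case wrong):

    if (atomic_read(&vcpu->arch.nmi_pending)) {
            if (nmi_in_progress(vcpu))
                    /*
                     * Completing a handler: the two-slot queue had one
                     * slot in service, so at most one more NMI is still
                     * distinguishable - fold the rest into it.
                     */
                    atomic_set(&vcpu->arch.nmi_pending, 1);
            /* in both cases we still want to know when injection
               becomes possible */
            kvm_x86_ops->enable_nmi_window(vcpu);
    }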

>>
>>>  If not, what will happen next?
>>
>>  The NMI window will open and we'll inject the NMI.

> How will we know this? We do not request the exit, that's my worry.

I think we do? Oh, but this patch breaks it.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


