On 2011-09-15 19:02, Avi Kivity wrote:
> On 09/15/2011 07:01 PM, Jan Kiszka wrote:
>> On 2011-09-15 16:45, Avi Kivity wrote:
>>> If simultaneous NMIs happen, we're supposed to queue the second
>>> and next (collapsing them), but currently we sometimes collapse
>>> the second into the first.
>>
>> Can you describe the race in a bit more detail here ("sometimes" sounds
>> like "I don't know when" :) )?
>
> In this case it was "I'm in a hurry".
>
>>>
>>>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>>>  {
>>> +        atomic_inc(&vcpu->arch.nmi_pending);
>>>          kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> -        vcpu->arch.nmi_pending = 1;
>>
>> Does the reordering matter?
>
> I think so.  Suppose the vcpu enters just after kvm_make_request(); it
> sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
> wasn't set yet.  Then comes a kick; the vcpu is reentered with
> nmi_pending set but KVM_REQ_EVENT clear, sails through the check, and
> enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.

That makes sense - and the old code looks more strange now.

>
>> Do we need barriers?
>
> Yes.
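
Something like the following on the injection side, then? Just a sketch on
top of your patch - on x86 the locked atomic already provides the ordering,
so smp_mb__after_atomic_inc() degenerates to a compiler barrier there, but
it would document the contract for other archs:

 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
         atomic_inc(&vcpu->arch.nmi_pending);
         /*
          * Make nmi_pending visible before the request bit so that a
          * vcpu which observes KVM_REQ_EVENT also observes
          * nmi_pending != 0.
          */
         smp_mb__after_atomic_inc();
         kvm_make_request(KVM_REQ_EVENT, vcpu);
 }

plus a matching smp_rmb() between kvm_check_request(KVM_REQ_EVENT, vcpu)
and the atomic_read() of nmi_pending on the vcpu_enter_guest() side.
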
>
>>
>>> @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>>          }
>>>
>>>          /* try to inject new event if pending */
>>> -        if (vcpu->arch.nmi_pending) {
>>> +        if (atomic_read(&vcpu->arch.nmi_pending)) {
>>>                  if (kvm_x86_ops->nmi_allowed(vcpu)) {
>>> -                        vcpu->arch.nmi_pending = false;
>>> +                        atomic_dec(&vcpu->arch.nmi_pending);
>>
>> Here we lost NMIs in the past by overwriting nmi_pending while another
>> one was already queued, right?
>
> One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
> get picked up by the vcpu by the time the second nmi arrives, we lose
> the second nmi.

Thinking this through again, it's actually not yet clear to me what we
are modeling here: If two NMI events arrive almost perfectly in
parallel, does real hardware guarantee that they always cause two NMI
deliveries to the CPU? If so, this queueing is required. Otherwise I've
just lost my understanding again of why we were losing NMIs here and in
kvm_inject_nmi (or maybe elsewhere?).

>
>>>          if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>                  inject_pending_event(vcpu);
>>>
>>>                  /* enable NMI/IRQ window open exits if needed */
>>> -                if (nmi_pending)
>>> +                if (atomic_read(&vcpu->arch.nmi_pending)
>>> +                    && nmi_in_progress(vcpu))
>>
>> Is nmi_pending && !nmi_in_progress possible at all?
>
> Yes, due to NMI-blocked-by-STI.  A really touchy area.

And we don't need the window exit notification then? I don't understand
what nmi_in_progress is supposed to do here.

>
>> Is it rather a BUG condition?
>
> No.
>
>> If not, what will happen next?
>
> The NMI window will open and we'll inject the NMI.

How will we know this? We do not request the exit, that's my worry.

> But I think we have
> a bug here - we should only kvm_collapse_pending_nmis() if an NMI
> handler was indeed running, yet we do it unconditionally.
>
>>>
>>> +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
>>> +{
>>> +        /* Collapse all NMIs queued while an NMI handler was running to one */
>>> +        if (atomic_read(&vcpu->arch.nmi_pending))
>>> +                atomic_set(&vcpu->arch.nmi_pending, 1);
>>
>> Is it OK that NMIs injected after the collapse will increment this to
>> >1 again? Or is that impossible?
>>
>
> It's possible and okay.  We're now completing execution of IRET.  Doing
> atomic_set() after atomic_inc() means the NMI happened before IRET
> completed, and vice versa.  Since these events are asynchronous, we're
> free to choose one or the other (a self-IPI-NMI just before the IRET
> must be swallowed, and a self-IPI-NMI just after the IRET would only be
> executed after the next time around the handler).

Need to think through this separately.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux