On 09/15/2011 07:01 PM, Jan Kiszka wrote:
> On 2011-09-15 16:45, Avi Kivity wrote:
>> If simultaneous NMIs happen, we're supposed to queue the second
>> and next (collapsing them), but currently we sometimes collapse
>> the second into the first.
>
> Can you describe the race in a few more details here ("sometimes"
> sounds like "I don't know when" :) )?

In this case it was "I'm in a hurry".

>>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>>  {
>> +	atomic_inc(&vcpu->arch.nmi_pending);
>>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>> -	vcpu->arch.nmi_pending = 1;
>
> Does the reordering matter?

I think so. Suppose the vcpu enters just after kvm_make_request(): it sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it wasn't set yet. Then comes a kick; the vcpu is reentered with nmi_pending set but KVM_REQ_EVENT clear, sails through the event-injection check, and enters the guest. The NMI is delayed until the next KVM_REQ_EVENT.

> Do we need barriers?

Yes.
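
Roughly something like this on the producer side (a sketch only; whether
it ends up as smp_mb__after_atomic_inc() or an explicit smp_mb() in the
real patch is still open):

	void kvm_inject_nmi(struct kvm_vcpu *vcpu)
	{
		atomic_inc(&vcpu->arch.nmi_pending);
		/* make the incremented counter visible before the request,
		 * so a vcpu that sees KVM_REQ_EVENT also sees nmi_pending */
		smp_mb__after_atomic_inc();
		kvm_make_request(KVM_REQ_EVENT, vcpu);
	}

The consumer side needs the matching barrier between clearing
KVM_REQ_EVENT and reading nmi_pending.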

>> @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>  	}
>>
>>  	/* try to inject new event if pending */
>> -	if (vcpu->arch.nmi_pending) {
>> +	if (atomic_read(&vcpu->arch.nmi_pending)) {
>>  		if (kvm_x86_ops->nmi_allowed(vcpu)) {
>> -			vcpu->arch.nmi_pending = false;
>> +			atomic_dec(&vcpu->arch.nmi_pending);
>
> Here we lost NMIs in the past by overwriting nmi_pending while another
> one was already queued, right?

One place, yes. The other is kvm_inject_nmi(): if the first NMI hasn't been picked up by the vcpu by the time the second NMI arrives, we lose the second one.
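
For reference, the pre-patch kvm_inject_nmi() (reconstructed from the
hunk quoted above) shows that second case:

	void kvm_inject_nmi(struct kvm_vcpu *vcpu)
	{
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		/* a second NMI arriving before the vcpu consumes the first
		 * just rewrites the flag (1 -> 1) and is silently lost */
		vcpu->arch.nmi_pending = 1;
	}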

>>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>  		inject_pending_event(vcpu);
>>
>>  		/* enable NMI/IRQ window open exits if needed */
>> -		if (nmi_pending)
>> +		if (atomic_read(&vcpu->arch.nmi_pending)
>> +		    && nmi_in_progress(vcpu))
>
> Is nmi_pending && !nmi_in_progress possible at all?

Yes, due to NMI-blocked-by-STI. A really touchy area.

> Is it rather a BUG condition?

No.

> If not, what will happen next?

The NMI window will open and we'll inject the NMI. But I think we have a bug here: we should only call kvm_collapse_pending_nmis() if an NMI handler was indeed running, yet we do it unconditionally.
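
Something along these lines, presumably (just a sketch of the fix hinted
at above, assuming the nmi_in_progress() test can be reused at the point
where the collapse is done):

	/* only fold queued NMIs into one if they piled up while the
	 * guest's NMI handler was actually running */
	if (nmi_in_progress(vcpu))
		kvm_collapse_pending_nmis(vcpu);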

>> +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Collapse all NMIs queued while an NMI handler was running to one */
>> +	if (atomic_read(&vcpu->arch.nmi_pending))
>> +		atomic_set(&vcpu->arch.nmi_pending, 1);
>
> Is it OK that NMIs injected after the collapse will increment this
> to > 1 again? Or is that impossible?

It's possible and okay. We're now completing execution of IRET. Doing atomic_set() after atomic_inc() means the NMI happened before IRET completed, and vice versa. Since these events are asynchronous, we're free to choose one or the other (a self-IPI-NMI just before the IRET must be swallowed, and a self-IPI-NMI just after the IRET would only be executed after the next time around the handler).
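
As an illustrative timeline (assuming the collapse runs as part of
completing the IRET):

	/*
	 * A: NMI source                 vcpu completing IRET
	 *    atomic_inc()  -> 2
	 *                               atomic_set(.., 1)
	 *    the new NMI counts as pre-IRET and is folded away (swallowed)
	 *
	 * B: NMI source                 vcpu completing IRET
	 *                               atomic_set(.., 1)
	 *    atomic_inc()  -> 2
	 *    the new NMI counts as post-IRET and is delivered after the
	 *    current handler returns
	 */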

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.