On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote: > Il 28/05/2013 14:56, Gleb Natapov ha scritto: > >> > else > >> > vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >> > } > >> > - if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > >> > + /* > >> > + * Note that we may get another INIT+SIPI sequence right here; process > >> > + * the INIT first. Assumes that there are only KVM_APIC_INIT/SIPI. > >> > + */ > >> > + if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI && > >> > vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > > Because pending_events can be INIT/SIPI at this point and it should be > > interpreted as: do SIPI and ignore INIT (atomically). > > My patch does "do another INIT (which will have no effect) and do SIPI > after that INIT", which is different but has almost the same effect. > If pending_events is INIT/SIPI, it ignores the SIPI for now and lets > the next iteration of kvm_apic_accept_events do both. The difference > would be that in a carefully-timed sequence of interrupts > You assume that the next processing will actually happen, but this is not necessary the case. > INIT-INIT-SIPI-INIT-SIPI > > your version would do many SIPIs, while mine would do just one. > > Hmm... there is a reference to this in 25.2 "Other causes of VM exits": > "If a logical processor is in the wait-for-SIPI state, INIT signals are > blocked. They do not cause VM exits in this case." It is not for the > physical processor, but it makes sense to have the same thing. Is this > the reason why you did the cmpxchg at the end? > No, but if helps us model proper behaviour for nested guest +1 to it :) But until we handle INIT in nested virt it is not something that dictates the solution. > But then, there's another way to mask INITs in the wait-for-SIPI > state. Considering that KVM_MP_STATE_INIT_RECEIVED is really a > wait-for-SIPI, you can do: > Haven't checked it for races (especially races between multiple CPUS sending INIT), but looks more complicated to me. > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index e1adbb4..36bc308 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -720,8 +720,12 @@ out: > break; > > case APIC_DM_INIT: > - if (!trig_mode || level) { > + if ((!trig_mode || level) && > + vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) { > result = 1; > + > + /* check mp_state before writing apic->pending_events */ > + smp_mb(); > /* assumes that there are only KVM_APIC_INIT/SIPI */ > apic->pending_events = (1UL << KVM_APIC_INIT); > /* make sure pending_events is visible before sending > @@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu) > if (!kvm_vcpu_has_lapic(vcpu)) > return; > > - if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) { > + if (test_bit(KVM_APIC_INIT, &apic->pending_events)) { > kvm_lapic_reset(vcpu); > kvm_vcpu_reset(vcpu); > if (kvm_vcpu_is_bsp(apic->vcpu)) > vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; > else > vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > + > + /* write mp_state before toggling KVM_APIC_INIT */ > + smp_mb__before_clear_bit(); > + clear_bit(KVM_APIC_INIT, &apic->pending_events); > } > if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) && > vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > > I don't have a particular preference. > > Paolo -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html