On Tue, Mar 05, 2013 at 11:50:58AM +0100, Paolo Bonzini wrote: > Il 05/03/2013 10:37, Gleb Natapov ha scritto: > > > > > Not at all. I'm keeping the state in a single place, mp_state. I just > > > > > have to make sure that I do not loose asynchronous events - what INIT > > > > > and SIPI are. > > > > > > > > As evident from this code: > > > > + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED || > > > > + test_bit(KVM_REQ_INIT, &vcpu->requests)) { > > > > the state is in two places. > > > That's just to protect the content of sipi_vector during delivery. > > Is that even needed? Can we just do an unconditional: > > vcpu->arch.sipi_vector = vector; > /* make sure sipi_vector is visible for the receiver */ > smp_wmb(); > kvm_make_request(KVM_REQ_SIPI, vcpu); > kvm_vcpu_kick(vcpu); > > and check in the request handler that we did get an INIT? > INIT may come after SIPI and vcpu will start anyway. > > > We could drop the complete if clause if we protected that variable differently. > > > > I understand why the code is here. I am saying that this is the evidence > > that the state is in two places. > > I agree with Gleb here. The state is in two places. I'm not saying that > using requests is wrong, it sounds nice especially for nested INIT. But > there is something nasty in the patches so far. > > BTW checking the requests in kvm_apic_has_interrupt seems nicer than > synchronizing in kvm_arch_cpu_runnable; it is a bit ugly to have side > effects in kvm_arch_cpu_runnable. Then you can actually _process_ I subscribe to the idea of kvm_arch_cpu_runnable() that does not change state that it queries. Otherwise we do quantum programming: state changes by examination. But kvm_apic_has_interrupt() is called by kvm_arch_vcpu_runnable() only if interrupts are enabled. > the requests only in vcpu_enter_guest and kvm_arch_vcpu_ioctl_get_mpstate(). > In kvm_arch_vcpu_ioctl_put_mpstate(), KVM_MP_STATE_SIPI_RECEIVED has to > become KVM_MP_STATE_INIT_RECEIVED with KVM_REQ_SIPI set. > > >>> > > > >>>>> > >>> To overcome this we can either deprecated > >>>>> > >>> KVM_MP_STATE_INIT_RECEIVED/KVM_MP_STATE_SIPI_RECEIVED values for mp_state > >>>>> > >>> (use it only for migration purposes) and use separate state in APIC > >>>>> > >>> to hold those event, like with nmi, or why not go with Paolo's simple > >>>>> > >>> cmpxchg one? > >>>> > >> > >>>> > >> We need to replace most, if not all, manipulations of mp_state with > >>>> > >> cmpxchg, verifying the state transitions there. > > Let's take again the matrix I sent yesterday, fixed to make UNINIT > unreachable (that transition can only be done by userspace): > > from \ to RUNNABLE UNINIT INIT HALTED SIPI > RUNNABLE n/a NO yes yes NO > UNINIT NO n/a yes NO NO > INIT NO NO yes NO yes > HALTED yes NO yes n/a NO > SIPI yes NO yes NO n/a > > We have: > > - anything -> INIT: this is asynchronous, and INIT always wins. No need > for a cmpxchg. > > - RUNNABLE -> HALTED: cmpxchg needed > > - INIT -> SIPI: we can have a race between the last two interrupts in an > INIT-INIT-SIPI (or INIT-SIPI-INIT) sequence, with these two interrupts sent > by two different processors. The same race should also be present in real > hardware, and should never happen (the BSP should send SIPIs to all other > processors). No need for a cmpxchg. > > - HALTED -> RUNNABLE: racy vs. INIT too, cmpxchg needed > > - SIPI -> RUNNABLE: there is the same race with going back to INIT; no need > for a cmpxchg. > > > But it probably will not scale well with nested virt. > I do not think the problem for nested virt with cmpxchg() approach is scalability. The problem is that with nested virt INIT behaves like regular even that may cause vmexit or be delayed. Changing vcpu state during event generation does not work if vcpu it in a guest mode. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html