On Tue, Mar 05, 2013 at 10:12:05AM +0100, Jan Kiszka wrote: > On 2013-03-05 09:46, Gleb Natapov wrote: > > On Tue, Mar 05, 2013 at 09:24:40AM +0100, Jan Kiszka wrote: > >> On 2013-03-05 08:57, Gleb Natapov wrote: > >>> On Tue, Mar 05, 2013 at 12:00:10AM +0100, Jan Kiszka wrote: > >>>> On 2013-03-04 22:41, Jan Kiszka wrote: > >>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>>>> > >>>>> A VCPU sending INIT or SIPI to some other VCPU races for setting the > >>>>> remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED > >>>>> was overwritten by kvm_emulate_halt and, thus, got lost. > >>>>> > >>>>> Fix this by raising requests on the sender side that will then be > >>>>> handled synchronously over the target VCPU context. > >>>>> > >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>>>> --- > >>>>> > >>>>> Changes in v2: > >>>>> - check transition to INIT_RECEIVED in vcpu_enter_guest > >>>>> - removed return value of kvm_check_init_and_sipi - caller has to > >>>>> check for relevant transition afterward > >>>>> - add write barrier after setting sipi_vector > >>>>> > >>>>> arch/x86/kvm/lapic.c | 11 ++++++----- > >>>>> arch/x86/kvm/x86.c | 15 +++++++++++++++ > >>>>> include/linux/kvm_host.h | 2 ++ > >>>>> 3 files changed, 23 insertions(+), 5 deletions(-) > >>>>> > >>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >>>>> index 02b51dd..7986c9f 100644 > >>>>> --- a/arch/x86/kvm/lapic.c > >>>>> +++ b/arch/x86/kvm/lapic.c > >>>>> @@ -731,8 +731,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > >>>>> case APIC_DM_INIT: > >>>>> if (!trig_mode || level) { > >>>>> result = 1; > >>>>> - vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>>> - kvm_make_request(KVM_REQ_EVENT, vcpu); > >>>>> + kvm_make_request(KVM_REQ_INIT, vcpu); > >>>>> kvm_vcpu_kick(vcpu); > >>>>> } else { > >>>>> apic_debug("Ignoring de-assert INIT to vcpu %d\n", > >>>>> @@ -743,11 +742,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > >>>>> case APIC_DM_STARTUP: > >>>>> apic_debug("SIPI to vcpu %d vector 0x%02x\n", > >>>>> vcpu->vcpu_id, vector); > >>>>> - if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>>>> + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED || > >>>>> + test_bit(KVM_REQ_INIT, &vcpu->requests)) { > >>>>> result = 1; > >>>>> vcpu->arch.sipi_vector = vector; > >>>>> - vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED; > >>>>> - kvm_make_request(KVM_REQ_EVENT, vcpu); > >>>>> + /* make sure sipi_vector is visible for the receiver */ > >>>>> + smp_wmb(); > >>>>> + kvm_make_request(KVM_REQ_SIPI, vcpu); > >>>>> kvm_vcpu_kick(vcpu); > >>>>> } > >>>>> break; > >>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >>>>> index d0cf737..0be04b9 100644 > >>>>> --- a/arch/x86/kvm/x86.c > >>>>> +++ b/arch/x86/kvm/x86.c > >>>>> @@ -5641,6 +5641,15 @@ static void update_eoi_exitmap(struct kvm_vcpu *vcpu) > >>>>> kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); > >>>>> } > >>>>> > >>>>> +static void kvm_check_init_and_sipi(struct kvm_vcpu *vcpu) > >>>>> +{ > >>>>> + if (kvm_check_request(KVM_REQ_INIT, vcpu)) > >>>>> + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>>> > >>>> And here is a small race between clearing REQ_INIT and setting > >>>> INIT_RECEIVED. It can make the LAPIC drop the SIPI incorrectly. Need to > >>>> break up test and clear, doing the clear after mp_state update. Yeah... > >>>> > >>> You also need to call kvm_check_init_and_sipi() in > >>> kvm_arch_vcpu_ioctl_get_mpstate(), > >> > >> Indeed. > >> > >>> which means you now have three places > >>> where you transfer INIT/SIPI state from requests to mp_state. All the > >>> problems arise from the fact that now you have two places where you > >>> are storing current state. > >> > >> Not at all. I'm keeping the state in a single place, mp_state. I just > >> have to make sure that I do not loose asynchronous events - what INIT > >> and SIPI are. > >> > > As evident from this code: > > + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED || > > + test_bit(KVM_REQ_INIT, &vcpu->requests)) { > > the state is in two places. > > That's just to protect the content of sipi_vector during delivery. We > could drop the complete if clause if we protected that variable differently. > I understand why the code is here. I am saying that this is the evidence that the state is in two places. > > > >>> To overcome this we can either deprecated > >>> KVM_MP_STATE_INIT_RECEIVED/KVM_MP_STATE_SIPI_RECEIVED values for mp_state > >>> (use it only for migration purposes) and use separate state in APIC > >>> to hold those event, like with nmi, or why not go with Paolo's simple > >>> cmpxchg one? > >> > >> We need to replace most, if not all, manipulations of mp_state with > >> cmpxchg, verifying the state transitions there. And the request-based > >> approach still looks cleaner to me when it comes to implementing INIT > >> handling for nested modes. That will just trivially hook into > >> kvm_check_init_and_sipi. > >> > > The mp_state changes are rare, do not see the problem replacing all > > state changes with cmpxchg. I do not like request-based approach as > > implemented since we keep state in two places and constantly sync it > > back. > > And I like it more as it avoids spurious state changes toward INIT. That > will happen if we misuse mp_state for event signaling, like we do so > far, having to fix it up later again because the INIT event turned out > to become an INIT VM-exit. > Yes, I agree we abuse mp_state for signaling. I do not want to abuse ->request for that too. So what about other idea about treating init/sipi just like any other APIC event (that's what they are) and add state to lapic to track init/sipi signaling? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html