On Tue, Mar 05, 2013 at 09:24:40AM +0100, Jan Kiszka wrote: > On 2013-03-05 08:57, Gleb Natapov wrote: > > On Tue, Mar 05, 2013 at 12:00:10AM +0100, Jan Kiszka wrote: > >> On 2013-03-04 22:41, Jan Kiszka wrote: > >>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>> > >>> A VCPU sending INIT or SIPI to some other VCPU races for setting the > >>> remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED > >>> was overwritten by kvm_emulate_halt and, thus, got lost. > >>> > >>> Fix this by raising requests on the sender side that will then be > >>> handled synchronously over the target VCPU context. > >>> > >>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>> --- > >>> > >>> Changes in v2: > >>> - check transition to INIT_RECEIVED in vcpu_enter_guest > >>> - removed return value of kvm_check_init_and_sipi - caller has to > >>> check for relevant transition afterward > >>> - add write barrier after setting sipi_vector > >>> > >>> arch/x86/kvm/lapic.c | 11 ++++++----- > >>> arch/x86/kvm/x86.c | 15 +++++++++++++++ > >>> include/linux/kvm_host.h | 2 ++ > >>> 3 files changed, 23 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >>> index 02b51dd..7986c9f 100644 > >>> --- a/arch/x86/kvm/lapic.c > >>> +++ b/arch/x86/kvm/lapic.c > >>> @@ -731,8 +731,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > >>> case APIC_DM_INIT: > >>> if (!trig_mode || level) { > >>> result = 1; > >>> - vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >>> - kvm_make_request(KVM_REQ_EVENT, vcpu); > >>> + kvm_make_request(KVM_REQ_INIT, vcpu); > >>> kvm_vcpu_kick(vcpu); > >>> } else { > >>> apic_debug("Ignoring de-assert INIT to vcpu %d\n", > >>> @@ -743,11 +742,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > >>> case APIC_DM_STARTUP: > >>> apic_debug("SIPI to vcpu %d vector 0x%02x\n", > >>> vcpu->vcpu_id, vector); > >>> - if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { > >>> + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED || > >>> + test_bit(KVM_REQ_INIT, &vcpu->requests)) { > >>> result = 1; > >>> vcpu->arch.sipi_vector = vector; > >>> - vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED; > >>> - kvm_make_request(KVM_REQ_EVENT, vcpu); > >>> + /* make sure sipi_vector is visible for the receiver */ > >>> + smp_wmb(); > >>> + kvm_make_request(KVM_REQ_SIPI, vcpu); > >>> kvm_vcpu_kick(vcpu); > >>> } > >>> break; > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >>> index d0cf737..0be04b9 100644 > >>> --- a/arch/x86/kvm/x86.c > >>> +++ b/arch/x86/kvm/x86.c > >>> @@ -5641,6 +5641,15 @@ static void update_eoi_exitmap(struct kvm_vcpu *vcpu) > >>> kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); > >>> } > >>> > >>> +static void kvm_check_init_and_sipi(struct kvm_vcpu *vcpu) > >>> +{ > >>> + if (kvm_check_request(KVM_REQ_INIT, vcpu)) > >>> + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; > >> > >> And here is a small race between clearing REQ_INIT and setting > >> INIT_RECEIVED. It can make the LAPIC drop the SIPI incorrectly. Need to > >> break up test and clear, doing the clear after mp_state update. Yeah... > >> > > You also need to call kvm_check_init_and_sipi() in > > kvm_arch_vcpu_ioctl_get_mpstate(), > > Indeed. > > > which means you now have three places > > where you transfer INIT/SIPI state from requests to mp_state. All the > > problems arise from the fact that now you have two places where you > > are storing current state. > > Not at all. I'm keeping the state in a single place, mp_state. I just > have to make sure that I do not loose asynchronous events - what INIT > and SIPI are. > As evident from this code: + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED || + test_bit(KVM_REQ_INIT, &vcpu->requests)) { the state is in two places. > > To overcome this we can either deprecated > > KVM_MP_STATE_INIT_RECEIVED/KVM_MP_STATE_SIPI_RECEIVED values for mp_state > > (use it only for migration purposes) and use separate state in APIC > > to hold those event, like with nmi, or why not go with Paolo's simple > > cmpxchg one? > > We need to replace most, if not all, manipulations of mp_state with > cmpxchg, verifying the state transitions there. And the request-based > approach still looks cleaner to me when it comes to implementing INIT > handling for nested modes. That will just trivially hook into > kvm_check_init_and_sipi. > The mp_state changes are rare, do not see the problem replacing all state changes with cmpxchg. I do not like request-based approach as implemented since we keep state in two places and constantly sync it back. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html