Re: [PATCH v2] KVM: x86: Rework INIT and SIPI handling

On Thu, Mar 14, 2013 at 11:15:02AM +0100, Jan Kiszka wrote:
> On 2013-03-14 01:08, Marcelo Tosatti wrote:
> > On Wed, Mar 13, 2013 at 12:42:34PM +0100, Jan Kiszka wrote:
> >> A VCPU sending INIT or SIPI to some other VCPU races for setting the
> >> remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
> >> was overwritten by kvm_emulate_halt and, thus, got lost.
> >>
> >> This introduces APIC events for those two signals, keeping them in
> >> kvm_apic until kvm_apic_accept_events is run over the target vcpu
> >> context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there
> >> are pending events, thus if vcpu blocking should end.
> >>
> >> The patch comes with the side effect of effectively obsoleting
> >> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
> >> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
> >> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
> >> state. That also means we no longer exit to user space after receiving a
> >> SIPI event.
> >>
> >> Furthermore, we already reset the VCPU on INIT, only fixing up the code
> >> segment later on when SIPI arrives. Moreover, we fix INIT handling for
> >> the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >> ---
> >>
> >> Changes in v2:
> >>  - drop cmpxchg from INIT signaling
> >>  - rmb on SIPI processing
> >>  - reset sets all CPUs to 0xf000:0xfff0, RIP is fixed up on SIPI reception
> >>
> >>  arch/x86/include/asm/kvm_host.h |    3 +-
> >>  arch/x86/kvm/lapic.c            |   48 +++++++++++++++++++++++++++-----
> >>  arch/x86/kvm/lapic.h            |   11 +++++++
> >>  arch/x86/kvm/svm.c              |    6 ----
> >>  arch/x86/kvm/vmx.c              |   12 +-------
> >>  arch/x86/kvm/x86.c              |   58 +++++++++++++++++++++++++-------------
> >>  6 files changed, 93 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 348d859..ef7f4a5 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -345,7 +345,6 @@ struct kvm_vcpu_arch {
> >>  	unsigned long apic_attention;
> >>  	int32_t apic_arb_prio;
> >>  	int mp_state;
> >> -	int sipi_vector;
> >>  	u64 ia32_misc_enable_msr;
> >>  	bool tpr_access_reporting;
> >>  
> >> @@ -819,6 +818,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
> >>  
> >>  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
> >>  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
> >> +void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector);
> >>  
> >>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
> >>  		    int reason, bool has_error_code, u32 error_code);
> >> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
> >>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
> >>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
> >>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> >> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> >>  
> >>  void kvm_define_shared_msr(unsigned index, u32 msr);
> >>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index 02b51dd..a8e9369 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -731,7 +731,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >>  	case APIC_DM_INIT:
> >>  		if (!trig_mode || level) {
> >>  			result = 1;
> >> -			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >> +			/* assumes that there are only KVM_APIC_INIT/SIPI */
> >> +			apic->pending_events = (1UL << KVM_APIC_INIT);
> >> +			/* make sure pending_events is visible before sending
> >> +			 * the request */
> >> +			smp_wmb();
> >>  			kvm_make_request(KVM_REQ_EVENT, vcpu);
> >>  			kvm_vcpu_kick(vcpu);
> >>  		} else {
> >> @@ -743,13 +747,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >>  	case APIC_DM_STARTUP:
> >>  		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
> >>  			   vcpu->vcpu_id, vector);
> >> -		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >> -			result = 1;
> >> -			vcpu->arch.sipi_vector = vector;
> >> -			vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
> >> -			kvm_make_request(KVM_REQ_EVENT, vcpu);
> >> -			kvm_vcpu_kick(vcpu);
> >> -		}
> >> +		result = 1;
> >> +		apic->sipi_vector = vector;
> >> +		/* make sure sipi_vector is visible for the receiver */
> >> +		smp_wmb();
> >> +		set_bit(KVM_APIC_SIPI, &apic->pending_events);
> >> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> >> +		kvm_vcpu_kick(vcpu);
> > 
> > Why are APIC_DM_STARTUP / APIC_DM_INIT setting KVM_REQ_EVENT again?
> > Can't see any direct connection. See below.
> 
> Changing the mp_state may unblock pending events (IRQs, NMIs), I think
> that is the original reason.
> 
> > 
> >> +static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
> >> +{
> >> +	return vcpu->arch.apic->pending_events;
> >> +}
> > 
> > vcpu->arch.apic = NULL?
> 
> Not possible for the caller of this function.
> 
> > 
> >>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >> +		kvm_apic_accept_events(vcpu);
> >> +		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >> +			r = 1;
> >> +			goto out;
> >> +		}
> >> +
> > 
> > A separate request bit makes sense, because nothing in this
> > (KVM_REQ_EVENT conditional) code sequence handles the work from
> > APIC_DM_STARTUP / APIC_DM_INIT sites (well, before your patch). See
> > below.
> 
> OK. Was told to avoid new request bits.
> 
KVM_REQ_EVENT guards the processing of any external events, and INIT/SIPI are
exactly that kind of event. Using a separate request bit would save one if() and
one function call on each event-processing pass. Not sure it is worth it, but I
do not mind.

> > 
> >>  
> >> -	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
> >> -		pr_debug("vcpu %d received sipi with vector # %x\n",
> >> -			 vcpu->vcpu_id, vcpu->arch.sipi_vector);
> >> -		kvm_lapic_reset(vcpu);
> >> -		kvm_vcpu_reset(vcpu);
> >> -		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> >> -	}
> >> -
> >>  	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> >>  	r = vapic_enter(vcpu);
> > 
> > vmx_vcpu_reset overwrites vcpu->srcu_idx if ->vcpu_reset is called from
> > within srcu section.
> 
> Indeed.
> 
> Do you know what the lock around vmx_set_cr0 actually protects?
Setting up the real-mode TSS.

> 
> > 
> > Also, this is not part of the hotpath and therefore it could be farther
> > from vcpu_enter_guest. What about processing a new request bit here?
> > (KVM_REQ_EVENT <-> apic->pending_events relationship is cloudy).
> 
> I can pull the check from vcpu_enter_guest out at this level again, but
> then we will generate the user space exits again.
> 
> > 
> > Also the fact kvm_apic_accept_events() is called from multiple sites is
> > annoying, why is that necessary again?
> 
> - to avoid having pending events in the APIC when delivering mp_state
>   to user space
> - to keep mp_state correct after waking up from kvm_vcpu_block (we
>   could otherwise walk through code paths with the wrong state)
> - to process pending events when the VCPU was running
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

--
			Gleb.

