On Thu, Dec 27, 2012 at 06:25:25AM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-27: > > On Thu, Dec 27, 2012 at 02:24:04AM +0000, Zhang, Yang Z wrote: > >> Gleb Natapov wrote on 2012-12-21: > >>> On Fri, Dec 21, 2012 at 09:39:20AM -0200, Marcelo Tosatti wrote: > >>>> On Fri, Dec 21, 2012 at 09:51:40AM +0200, Gleb Natapov wrote: > >>>>> On Thu, Dec 20, 2012 at 08:59:11PM -0200, Marcelo Tosatti wrote: > >>>>>> On Mon, Dec 17, 2012 at 01:30:49PM +0800, Yang Zhang wrote: > >>>>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx> > >>>>>>> > >>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > >>>>>>> manually, which is fully taken care of by the hardware. This needs > >>>>>>> some special awareness into existing interrupr injection path: > >>>>>>> > >>>>>>> - for pending interrupt, instead of direct injection, we may need > >>>>>>> update architecture specific indicators before resuming to > >>>>>>> guest. - A pending interrupt, which is masked by ISR, should be > >>>>>>> also considered in above update action, since hardware will > >>>>>>> decide when to inject it at right time. Current has_interrupt > >>>>>>> and get_interrupt only returns a valid vector from injection > >>>>>>> p.o.v. > >>>>>>> Signed-off-by: Kevin Tian <kevin.tian@xxxxxxxxx> > >>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx> > >>>>>>> --- > >>>>>>> arch/ia64/kvm/lapic.h | 6 ++ > >>>>>>> arch/x86/include/asm/kvm_host.h | 6 ++ > >>>>>>> arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c > >>>>>>> | 56 +++++++++++++- arch/x86/kvm/lapic.c | > > 65 > >>>>>>> ++++++++++------- arch/x86/kvm/lapic.h | 28 ++++++- > >>>>>>> arch/x86/kvm/svm.c | 24 ++++++ > > arch/x86/kvm/vmx.c > >>>>>>> | 154 ++++++++++++++++++++++++++++++++++++++- > >>>>>>> arch/x86/kvm/x86.c | 11 ++- > > include/linux/kvm_host.h > >>>>>>> | 2 + virt/kvm/ioapic.c | 36 +++++++++ > >>>>>>> virt/kvm/ioapic.h | 1 + virt/kvm/irq_comm.c > >>>>>>> | 20 +++++ 13 files changed, 379 insertions(+), 41 > >>>>>>> deletions(-) > >>>>>>> diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h > >>>>>>> index c5f92a9..cb59eb4 100644 > >>>>>>> --- a/arch/ia64/kvm/lapic.h > >>>>>>> +++ b/arch/ia64/kvm/lapic.h > >>>>>>> @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, > > struct > >>> kvm_lapic_irq *irq); > >>>>>>> #define kvm_apic_present(x) (true) > >>>>>>> #define kvm_lapic_enabled(x) (true) > >>>>>>> +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, > >>>>>>> + struct kvm_lapic_irq *irq) > >>>>>>> +{ > >>>>>>> + /* IA64 has no apicv supporting, do nothing here */ > >>>>>>> +} > >>>>>>> + > >>>>>>> #endif > >>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h > >>>>>>> b/arch/x86/include/asm/kvm_host.h index c431b33..b63a144 100644 --- > >>>>>>> a/arch/x86/include/asm/kvm_host.h +++ > >>>>>>> b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,11 @@ struct > >>>>>>> kvm_x86_ops { > >>>>>>> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > >>>>>>> void (*enable_irq_window)(struct kvm_vcpu *vcpu); > >>>>>>> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int > > irr); > >>>>>>> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > >>>>>>> + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); > >>>>>>> + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq > >>>>>>> *irq); + void (*reset_eoi_exitmap)(struct kvm_vcpu *vcpu); + void > >>>>>>> (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); > >>>>>>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > >>>>>>> int (*get_tdp_level)(void); > >>>>>>> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool > >>> is_mmio); > >>>>>> > >>>>>> EOI exit bitmap is problematic (its racy). Please do this: > >>>>>> > >>>>>> 1. Make a synchronous (1) KVM_REQ_EOIBITMAP request on IOAPIC > >>>>>> register modifications which require EOI exit bitmap updates. 2. On > >>>>>> VM-entry, during KVM_REQ_EOIBITMAP processing, each checks IOAPIC > >>>>>> map and adjusts its own EOI exit bitmap VMCS registers. > >>>>>> > >>>>>> 1) that waits until remote executing VCPUs have acknowledge the > >>>>>> request, using make_all_cpus_request (see virt/kvm/kvm_main.c), > >>>>>> similarly to remote TLB flushes. > >>>>>> > >>>>>> What is the problem now: there is no control over _when_ a VCPU > >>>>>> updates its EOI exit bitmap VMCS register from the (remotely updated) > >>>>>> master EOI exit bitmap. The VCPU can be processing a > >>>>>> KVM_REQ_EOIBITMAP relative to a precedence IOAPIC register write > >>>>>> while the current IOAPIC register write is updating the EOI exit > >>>>>> bitmap. There is no way to fix that without locking (which can be > >>>>>> avoided if the IOAPIC->EOI exit bitmap synchronization is vcpu local). > >>>>>> > >>>>> The race is benign. We have similar one for interrupt injection and > >>>>> the same race exists on a real HW. The two cases that can happen due > >>>>> to the race are: > >>>>> > >>>>> 1. exitbitmap bit X is changed from 1 to 0 > >>>>> No problem. It is harmless to do an exit, on the next entry > >>>>> exitbitmap will be fixed. 2. exitbitmap bit X is changed from 0 to 1 > >>>>> If vcpu serves X at the time this happens it was delivered as edge, > >>>>> so no need to exit. The exitbitmap will be updated after the next > >>>>> vmexit which will happen due to KVM_REQ_EOIBITMAP processing. > >>>> > >>>> 1. Missed the case where bitmap is being reset (where EOI exit bitmaps > >>>> are zeroed). In this case vcpu enters guest with incorrect EOI exit > >>>> bitmap. > >>>> > >>> Right, the bitmap reset us problematic indeed since it does not > >>> represent real vector configuration change. > >>> > >>>> 2. Missed the case where current code allows vcpu to enter guest > >>>> with EOI exit bitmap unsynchronized relative to IOAPIC registers > >>>> (see one KVM_REQ made at a time, no IPI sent). In that case interrupt > >>>> can be delivered. > >>>> > >>> Ugh, I was sure there is a kick there. Missing kick is just a bug of > >>> course. > >> Do you mean add a kick when making KVM_REQ_EOIBITMAP request? > >> > > Of course, otherwise vcpu is running with stale bitmap. > Right. > > >>>> Thus the suggestions to update bitmap locally, on entry. Do you > >>>> see any disadvantage? > >>>> > >>> Only one. The recalculation logic is such that given a vector it > >>> calculates set of vcpus, so each vcpu will do this calculation for each > >>> vector and see if it is in the set instead of recalculating once. But > >>> this should be rare enough for us to not care. > >>> > >>>> Other than that, there is a window between IOAPIC map update and > >>>> EOI bitmap request, where an interrupt can be delivered without > >>>> EOI bitmap being updated (which i think local updates don't cover, > >>>> either). > >>> Interrupt cannot be delivered through IOAPIC while bitmap is updated > >>> since IOAPIC has a lock. > >>> > >>>> > >>>>> But software really should take care of not changing interrupt vector > >>>>> configuration while there is an interrupt in flight with the same vector. > >>>> > >>>> None of these are guest faults. As soon as interrupts are allowed they > >>>> must be handled properly (including synchronized EOI bitmap etc). > >>>> > >>> All my cases are guest faults and the guest will get in trouble on real > >>> HW too with such behaviour. The case where we clear bitmap before > >>> recalculation is not a guest fault though and have to be dealt with > >>> somehow either with locks or your suggestion. > >> How about to set all bits in eoi bitmap before recalculation. As you > >> said it's harmless to do an vmexit. And eoibitmap will be updated on > >> next vmentry. > >> > > If you set all bits before recalculation they will remain all set after > > recalculation too. I do not get what you propose here. > Yes. I forget it. > > Best regards, > Yang Please just perform the recalculation on KVM_REQ_EOIBITMAP. As noted, IOAPIC updates are not performance critical. This will save countless number of code audits later. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html