On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@xxxxxxxxxx> wrote: > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote: >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@xxxxxxxxxx> wrote: >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@xxxxxxxxxx> wrote: >> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote: >> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@xxxxxxxxxx> wrote: >> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote: >> >>> >> If the guest takes advantage of the directed EOI feature by setting >> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to >> >>> >> the EOI register of the respective IOAPIC. >> >>> >> >> >>> >> From Intel's x2APIC Specification: >> >>> >> "following the EOI to the local x2APIC unit for a level triggered >> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the >> >>> >> interrupt by writing to its EOI register." >> >>> >> >> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation") >> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC >> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code >> >>> >> was later removed with the rest of IA64 support. >> >>> >> >> >>> >> The bug has gone undetected for a long time because Linux writes to >> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't >> >>> >> seem to perform such a check. >> >>> > >> >>> > Hi, Ladi, >> >>> >> >>> Hi Peter, >> >>> >> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a >> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is >> >>> > another feature for APIC. They are not the same feature, so it may not >> >>> > be required to have them all together. IIUC current x86 kvm is just >> >>> > the case - it supports EOI broadcast suppression on APIC, but it does >> >>> > not support direct EOI on kernel IOAPIC. >> >>> >> >>> Thanks, that makes perfect sense and explains why Linux behaves the >> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c). >> >>> >> >>> This document makes it look like suppress EOI-broadcast always implies >> >>> directed EOI, though: >> >>> >> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf >> >>> >> >>> NB "The support for Directed EOI capability can be detected by means >> >>> of bit 24 in the Local APIC Version Register. " >> >>> >> >>> There is no mention of APIC version or any other detection mechanism >> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20 >> >>> but I don't see that in the document either. >> >>> >> >>> I suspect that Microsoft implemented EOI by following this same spec. >> >>> Level-triggered interrupts don't work right on Windows Server 2016 >> >>> with Hyper-V enabled without this patch. >> >> >> >> Yes, the documents for IOAPIC is always hard to find, at least for >> >> me... >> >> >> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here: >> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf >> >> >> >> However I see nothing related to how the IOAPIC version is defined. In >> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :) >> >> >> >>> >> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even >> >>> > if IOAPIC does not support direct EOI (the guest can know it by >> >>> > probing IOAPIC version). Please correct if I'm wrong. >> >>> >> >>> Yes, I think that the guest is to blame here. We might add that to the >> >>> commit message. >> >> >> >> Agreed. >> >> >> >>> >> >>> >> >> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of >> >>> >> __kvm_ioapic_update_eoi. >> >>> >> >> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation") >> >>> >> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx> >> >>> >> --- >> >>> >> arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------ >> >>> >> arch/x86/kvm/ioapic.h | 1 + >> >>> >> 2 files changed, 29 insertions(+), 18 deletions(-) >> >>> >> >> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c >> >>> >> index 289270a..8df1c6c 100644 >> >>> >> --- a/arch/x86/kvm/ioapic.c >> >>> >> +++ b/arch/x86/kvm/ioapic.c >> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >> >>> >> #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000 >> >>> >> >> >>> >> static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, >> >>> >> - struct kvm_ioapic *ioapic, int vector, int trigger_mode) >> >>> >> + struct kvm_ioapic *ioapic, int vector, int trigger_mode, >> >>> >> + bool directed) >> >>> >> { >> >>> >> struct dest_map *dest_map = &ioapic->rtc_status.dest_map; >> >>> >> struct kvm_lapic *apic = vcpu->arch.apic; >> >>> >> int i; >> >>> >> >> >>> >> /* RTC special handling */ >> >>> >> - if (test_bit(vcpu->vcpu_id, dest_map->map) && >> >>> >> + if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) && >> >>> >> vector == dest_map->vectors[vcpu->vcpu_id]) >> >>> >> rtc_irq_eoi(ioapic, vcpu); >> >>> >> >> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, >> >>> >> if (ent->fields.vector != vector) >> >>> >> continue; >> >>> >> >> >>> >> - /* >> >>> >> - * We are dropping lock while calling ack notifiers because ack >> >>> >> - * notifier callbacks for assigned devices call into IOAPIC >> >>> >> - * recursively. Since remote_irr is cleared only after call >> >>> >> - * to notifiers if the same vector will be delivered while lock >> >>> >> - * is dropped it will be put into irr and will be delivered >> >>> >> - * after ack notifier returns. >> >>> >> - */ >> >>> >> - spin_unlock(&ioapic->lock); >> >>> >> - kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); >> >>> >> - spin_lock(&ioapic->lock); >> >>> >> - >> >>> >> - if (trigger_mode != IOAPIC_LEVEL_TRIG || >> >>> >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >> >>> >> - continue; >> >>> >> + if (!directed) { >> >>> > >> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI >> >>> > register of IOAPIC? >> >>> >> >>> Because it's already been done as part of the local EOI. With directed >> >>> EOI we hit this function twice, first time when doing the local EOI >> >>> and then the newly added code path for IOAPIC EOI with directed=true. >> >>> >> >>> I, again, followed the above mentioned document which explicitly >> >>> dictates the sequence. And I mechanically split the function to the >> >>> "local part' - what it had been doing up to the continue statement - >> >>> and the "directed part" - what it had been skipping. I'll admit that >> >>> my familiarity with this code is limited and there may be a better way >> >>> to do this. >> >> >> >> Instead of the "!directed" flag (which is imho duplicated with what >> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix? >> >> >> >> -----8<----- >> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c >> >> index 6e219e5..78d3ec8 100644 >> >> --- a/arch/x86/kvm/ioapic.c >> >> +++ b/arch/x86/kvm/ioapic.c >> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, >> >> kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); >> >> spin_lock(&ioapic->lock); >> >> >> >> - if (trigger_mode != IOAPIC_LEVEL_TRIG || >> >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >> >> + if (trigger_mode != IOAPIC_LEVEL_TRIG) >> >> continue; >> >> >> >> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, >> >> } >> >> } >> >> >> >> +/* This should only be triggered by APIC EOI broadcast */ >> >> void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode) >> >> { >> >> struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; >> >> >> >> + /* If we'll be using direct EOI, skip broadcast */ >> >> + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >> >> + return; >> >> + >> > >> > I've only seen the direct EOI sent for level irqs so I'm afraid that >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the >> > APIC_SPIV_DIRECTED_EOI flag is set. > > Yes, if without your patch, it is problematic. But if with your patch, > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then. No, there's no ioapic mmio write for edge-triggered interrupts, at least Windows don't do any. Edge interrupts must still be handled on the lapic EOI path. >> > >> > Other than that it looks reasonable. >> >> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to >> suppress the broadcast but then does EOI by writing to the IOAPIC >> routing entry? You kind of indicated that this would be a valid use of >> the feature. > > That's exactly how I understand it. :) > >> This is what __eoi_ioapic_pin does for version<0x20 and >> on the host side we reset the remote_irr in ioapic_write_indirect if >> I'm reading the code correctly. Wouldn't we want to deliver the >> notification via kvm_notify_acked_irq in this case also? > > I think in that case (EOI sent via "direct EOI" of ioapic mmio > register) the remote_irr is not cleaned in ioapic_write_indirect(), > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the > line: > > ent->fields.remote_irr = 0; > > Right? Right, but I meant the other case, a guest that sets APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your patch there would be no kvm_notify_acked_irq and that doesn't look correct. >> >> Thanks! >> >> >> spin_lock(&ioapic->lock); >> >> __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode); >> >> spin_unlock(&ioapic->lock); >> >> ---->8---- >> >> >> >> This patch along will break kvm_notify_acked_irq() in some way I >> >> guess, but if with your patch (though will possibly need to boost >> >> IOAPIC version to 0x20 as well), it should work fine as long as guest >> >> remembers to send the direct EOI. >> > >> > Not sure about the version boost, especially since we don't have a >> > good spec to define what the version means. Maybe only if it helps >> > Linux performance. In theory __eoi_ioapic_pin should be causing fewer >> > vmexits with version>=0x20. > > I think at least from the comment it means only ioapic version 0x20 > has that EOI register, and that's how I understand it. > > Btw, I would even optimize my above fix to the following, which I like > it better: > > ---->8---- > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > index 6e219e5..67d0849 100644 > --- a/arch/x86/kvm/ioapic.c > +++ b/arch/x86/kvm/ioapic.c > @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); > spin_lock(&ioapic->lock); > > - if (trigger_mode != IOAPIC_LEVEL_TRIG || > - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > + if (trigger_mode != IOAPIC_LEVEL_TRIG) > continue; > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index bad6a25..11ee9b7 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > if (!kvm_ioapic_handles_vector(apic, vector)) > return; > > + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > + return; > + > /* Request a KVM exit to inform the userspace IOAPIC. */ > if (irqchip_split(apic->vcpu->kvm)) { > apic->vcpu->arch.pending_ioapic_eoi = vector; > ----8<---- > > Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi() > will make sure kvm will not send this useless EOI broadcast even the > ioapic is in userspace. > > Thanks, > > -- > Peter Xu