Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
>> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@xxxxxxxxxx> wrote:
>> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> >>> >> If the guest takes advantage of the directed EOI feature by setting
>> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> >>> >> the EOI register of the respective IOAPIC.
>> >>> >>
>> >>> >> From Intel's x2APIC Specification:
>> >>> >> "following the EOI to the local x2APIC unit for a level triggered
>> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
>> >>> >> interrupt by writing to its EOI register."
>> >>> >>
>> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> >>> >> was later removed with the rest of IA64 support.
>> >>> >>
>> >>> >> The bug has gone undetected for a long time because Linux writes to
>> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> >>> >> seem to perform such a check.
>> >>> >
>> >>> > Hi, Ladi,
>> >>>
>> >>> Hi Peter,
>> >>>
>> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
>> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>> >>> > another feature for APIC. They are not the same feature, so it may not
>> >>> > be required to have them all together. IIUC current x86 kvm is just
>> >>> > the case - it supports EOI broadcast suppression on APIC, but it does
>> >>> > not support direct EOI on kernel IOAPIC.
>> >>>
>> >>> Thanks, that makes perfect sense and explains why Linux behaves the
>> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>> >>>
>> >>> This document makes it look like suppress EOI-broadcast always implies
>> >>> directed EOI, though:
>> >>>
>> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>> >>>
>> >>> NB "The support for Directed EOI capability can be detected by means
>> >>> of bit 24 in the Local APIC Version Register. "
>> >>>
>> >>> There is no mention of APIC version or any other detection mechanism
>> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>> >>> but I don't see that in the document either.
>> >>>
>> >>> I suspect that Microsoft implemented EOI by following this same spec.
>> >>> Level-triggered interrupts don't work right on Windows Server 2016
>> >>> with Hyper-V enabled without this patch.
>> >>
>> >> Yes, the documents for IOAPIC is always hard to find, at least for
>> >> me...
>> >>
>> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>> >>
>> >> However I see nothing related to how the IOAPIC version is defined. In
>> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>> >>
>> >>>
>> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
>> >>> > if IOAPIC does not support direct EOI (the guest can know it by
>> >>> > probing IOAPIC version). Please correct if I'm wrong.
>> >>>
>> >>> Yes, I think that the guest is to blame here. We might add that to the
>> >>> commit message.
>> >>
>> >> Agreed.
>> >>
>> >>>
>> >>> >>
>> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
>> >>> >> __kvm_ioapic_update_eoi.
>> >>> >>
>> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >>> >> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>
>> >>> >> ---
>> >>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>> >>> >>  arch/x86/kvm/ioapic.h |  1 +
>> >>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
>> >>> >>
>> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> >>> >> index 289270a..8df1c6c 100644
>> >>> >> --- a/arch/x86/kvm/ioapic.c
>> >>> >> +++ b/arch/x86/kvm/ioapic.c
>> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> >>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>> >>> >>
>> >>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>> >>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
>> >>> >> +                     bool directed)
>> >>> >>  {
>> >>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>> >>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
>> >>> >>       int i;
>> >>> >>
>> >>> >>       /* RTC special handling */
>> >>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
>> >>> >>               rtc_irq_eoi(ioapic, vcpu);
>> >>> >>
>> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>> >>               if (ent->fields.vector != vector)
>> >>> >>                       continue;
>> >>> >>
>> >>> >> -             /*
>> >>> >> -              * We are dropping lock while calling ack notifiers because ack
>> >>> >> -              * notifier callbacks for assigned devices call into IOAPIC
>> >>> >> -              * recursively. Since remote_irr is cleared only after call
>> >>> >> -              * to notifiers if the same vector will be delivered while lock
>> >>> >> -              * is dropped it will be put into irr and will be delivered
>> >>> >> -              * after ack notifier returns.
>> >>> >> -              */
>> >>> >> -             spin_unlock(&ioapic->lock);
>> >>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> >>> >> -             spin_lock(&ioapic->lock);
>> >>> >> -
>> >>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> >>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >>> >> -                     continue;
>> >>> >> +             if (!directed) {
>> >>> >
>> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
>> >>> > register of IOAPIC?
>> >>>
>> >>> Because it's already been done as part of the local EOI. With directed
>> >>> EOI we hit this function twice, first time when doing the local EOI
>> >>> and then the newly added code path for IOAPIC EOI with directed=true.
>> >>>
>> >>> I, again, followed the above mentioned document which explicitly
>> >>> dictates the sequence. And I mechanically split the function to the
>> >>> "local part' - what it had been doing up to the continue statement -
>> >>> and the "directed part" - what it had been skipping. I'll admit that
>> >>> my familiarity with this code is limited and there may be a better way
>> >>> to do this.
>> >>
>> >> Instead of the "!directed" flag (which is imho duplicated with what
>> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
>> >>
>> >> -----8<-----
>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> >> index 6e219e5..78d3ec8 100644
>> >> --- a/arch/x86/kvm/ioapic.c
>> >> +++ b/arch/x86/kvm/ioapic.c
>> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> >>                 spin_lock(&ioapic->lock);
>> >>
>> >> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> >> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>> >>                         continue;
>> >>
>> >>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>         }
>> >>  }
>> >>
>> >> +/* This should only be triggered by APIC EOI broadcast */
>> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>> >>  {
>> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>> >>
>> >> +       /* If we'll be using direct EOI, skip broadcast */
>> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >> +               return;
>> >> +
>> >
>> > I've only seen the direct EOI sent for level irqs so I'm afraid that
>> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
>> > APIC_SPIV_DIRECTED_EOI flag is set.
>
> Yes, if without your patch, it is problematic. But if with your patch,
> __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.

No, there's no ioapic mmio write for edge-triggered interrupts, at
least Windows don't do any. Edge interrupts must still be handled on
the lapic EOI path.

>> >
>> > Other than that it looks reasonable.
>>
>> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
>> suppress the broadcast but then does EOI by writing to the IOAPIC
>> routing entry? You kind of indicated that this would be a valid use of
>> the feature.
>
> That's exactly how I understand it. :)
>
>> This is what __eoi_ioapic_pin does for version<0x20 and
>> on the host side we reset the remote_irr in ioapic_write_indirect if
>> I'm reading the code correctly. Wouldn't we want to deliver the
>> notification via kvm_notify_acked_irq in this case also?
>
> I think in that case (EOI sent via "direct EOI" of ioapic mmio
> register) the remote_irr is not cleaned in ioapic_write_indirect(),
> but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
> line:
>
>                 ent->fields.remote_irr = 0;
>
> Right?

Right, but I meant the other case, a guest that sets
APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
patch there would be no kvm_notify_acked_irq and that doesn't look
correct.

>>
>> Thanks!
>>
>> >>         spin_lock(&ioapic->lock);
>> >>         __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
>> >>         spin_unlock(&ioapic->lock);
>> >> ---->8----
>> >>
>> >> This patch along will break kvm_notify_acked_irq() in some way I
>> >> guess, but if with your patch (though will possibly need to boost
>> >> IOAPIC version to 0x20 as well), it should work fine as long as guest
>> >> remembers to send the direct EOI.
>> >
>> > Not sure about the version boost, especially since we don't have a
>> > good spec to define what the version means. Maybe only if it helps
>> > Linux performance. In theory __eoi_ioapic_pin should be causing fewer
>> > vmexits with version>=0x20.
>
> I think at least from the comment it means only ioapic version 0x20
> has that EOI register, and that's how I understand it.
>
> Btw, I would even optimize my above fix to the following, which I like
> it better:
>
> ---->8----
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 6e219e5..67d0849 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>                 spin_lock(&ioapic->lock);
>
> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>                         continue;
>
>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index bad6a25..11ee9b7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>         if (!kvm_ioapic_handles_vector(apic, vector))
>                 return;
>
> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               return;
> +
>         /* Request a KVM exit to inform the userspace IOAPIC. */
>         if (irqchip_split(apic->vcpu->kvm)) {
>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
> ----8<----
>
> Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi()
> will make sure kvm will not send this useless EOI broadcast even the
> ioapic is in userspace.
>
> Thanks,
>
> --
> Peter Xu



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux