On Wed, Apr 12, 2017 at 10:52 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
> 2017-04-12 17:06+0800, Peter Xu:
>> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>> > On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>> > > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> > >> If the guest takes advantage of the directed EOI feature by setting
>> > >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> > >> the EOI register of the respective IOAPIC.
>> > >>
>> > >> From Intel's x2APIC Specification:
>> > >> "following the EOI to the local x2APIC unit for a level triggered
>> > >> interrupt, perform a directed EOI to the IOxAPIC generating the
>> > >> interrupt by writing to its EOI register."
>> > >>
>> > >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> > >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> > >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> > >> was later removed with the rest of IA64 support.
>> > >>
>> > >> The bug has gone undetected for a long time because Linux writes to
>> > >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> > >> seem to perform such a check.
>> > >
>> > > Hi, Ladi,
>> >
>> > Hi Peter,
>> >
>> > > Not sure I'm understanding it correctly... I see "direct EOI" is a
>> > > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>> > > another feature for APIC. They are not the same feature, so it may not
>> > > be required to have them all together. IIUC current x86 kvm is just
>> > > the case - it supports EOI broadcast suppression on APIC, but it does
>> > > not support direct EOI on kernel IOAPIC.
>> >
>> > Thanks, that makes perfect sense and explains why Linux behaves the
>> > way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>> >
>> > This document makes it look like suppress EOI-broadcast always implies
>> > directed EOI, though:
>> >
>> > http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>> >
>> > NB "The support for Directed EOI capability can be detected by means
>> > of bit 24 in the Local APIC Version Register. "
>> >
>> > There is no mention of APIC version or any other detection mechanism
>> > for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>> > but I don't see that in the document either.
>> >
>> > I suspect that Microsoft implemented EOI by following this same spec.
>> > Level-triggered interrupts don't work right on Windows Server 2016
>> > with Hyper-V enabled without this patch.
>>
>> Yes, the documents for IOAPIC is always hard to find, at least for
>> me...
>>
>> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>>
>> However I see nothing related to how the IOAPIC version is defined. In
>> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>
> Yeah, it is officially described in ICH9 datasheet as:
>
>   13.5.6 VER—Version Register (LPC I/F—D31:F0)
>   Default Value: 00170020h
>
> The one we emulate in KVM is in 82093AA datasheet:
>
>   3.2.2. IOAPICVER—IOAPIC VERSION REGISTER
>   Default Value: 00170011h
>
> The former has the EOI register, the latter doesn't.

Got it. Do you want me to resubmit with a different commit message and
a comment blaming the guest OS instead of calling it a KVM bug? :) Or
do you think it's worth exploring Peter's suggestion to make a more
invasive fix?

> ---
> I don't like the idea behind the patch, but it is acceptable and
> thinking about good solutions gets us into compatibility nightmares ...
> (We could remove support for directed EOI, because it is a detectable
> feature and makes little sense in KVM, or we could implement the IOAPIC
> version 0x20, but both would be tricky to migrate.)
>
> People should switch to userspace IOAPIC anyway. :)

For the record, QEMU with kernel-irqchip=split works fine as it
emulates version 0x20 with the IOAPIC_EOI register by default.
kernel-irqchip=off does not seem to work, I will look into it.

Thanks!
Ladi
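
P.S. For anyone following along, here is a minimal sketch of how the two
detection mechanisms discussed in this thread differ. It is only an
illustration, not KVM or Linux code: the helper names are hypothetical,
and the field layout (version in bits 7:0, maximum redirection entry in
bits 23:16 of the IOAPIC version register) is the one I understand the
two datasheets to describe. The LAPIC check uses bit 24 of the Local
APIC Version Register, as quoted from the x2APIC spec above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from KVM or Linux. */

/* IOAPIC version register: bits 7:0 hold the version number. */
static inline uint8_t ioapic_version(uint32_t ver_reg)
{
    return (uint8_t)(ver_reg & 0xff);
}

/* IOAPIC version register: bits 23:16 hold the maximum redirection entry. */
static inline uint8_t ioapic_max_redir_entry(uint32_t ver_reg)
{
    return (uint8_t)((ver_reg >> 16) & 0xff);
}

/*
 * Guest-side check in the spirit of what Linux does before writing the
 * IOAPIC EOI register: only versions >= 0x20 are assumed to have it.
 */
static inline bool ioapic_has_eoi_register(uint32_t ver_reg)
{
    return ioapic_version(ver_reg) >= 0x20;
}

/*
 * LAPIC-side check: bit 24 of the Local APIC Version Register advertises
 * the directed EOI (suppress EOI-broadcast) capability.
 */
static inline bool lapic_advertises_directed_eoi(uint32_t lapic_ver_reg)
{
    return ((lapic_ver_reg >> 24) & 1) != 0;
}
```

With the default values quoted above, 00170020h (ICH9) decodes to version
0x20 with an EOI register, while 00170011h (82093AA, the one KVM
emulates) decodes to version 0x11 without one. A guest that looks only
at the LAPIC bit and skips the IOAPIC version check would therefore
write to a register the in-kernel IOAPIC does not implement.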