* Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > Ingo Molnar <mingo@xxxxxxx> writes: > > > * Weidong Han <weidong.han@xxxxxxxxx> wrote: > > > >> To support domain-isolation usages, the platform hardware must be > >> capable of uniquely identifying the requestor (source-id) for each > >> interrupt message. Without source-id checking for interrupt > >> remapping , a rouge guest/VM with assigned devices can launch > >> interrupt attacks to bring down anothe guest/VM or the VMM itself. > >> > >> This patch adds source-id checking for interrupt remapping, and > >> then really isolates interrupts for guests/VMs with assigned > >> devices. > >> > >> Because PCI subsystem is not initialized yet when set up IOAPIC > >> entries, use read_pci_config_byte to access PCI config space > >> directly. > >> > >> Signed-off-by: Weidong Han <weidong.han@xxxxxxxxx> > >> --- > >> arch/x86/kernel/apic/io_apic.c | 6 +++ > >> drivers/pci/intr_remapping.c | 90 ++++++++++++++++++++++++++++++++++++++- > >> drivers/pci/intr_remapping.h | 2 + > >> include/linux/dmar.h | 11 +++++ > >> 4 files changed, 106 insertions(+), 3 deletions(-) > > > > Code structure looks nice now. (and i susect you have tested this on > > real and relevant hardware?) I've Cc:-ed Eric too ... does this > > direction look good to you too Eric? > > Being a major nitpick, I have to point out that the code is not > structured to support other iommus, and I think AMD has one that > can do this as well. (Joerg Cc:-ed) > The early pci reading of the bus is just wrong. What happens if > the pci layer decided to renumber things? It looks like we have a > real dependency on pci there and are avoiding sorting it out with > this. Yes ... but is there much we can do about this bootstrap dependency? We want to enable the IO-APIC very early in its final form. There's quite a bit of IRQ functionality that doesnt go via the PCI layer, and which is being relied on by early bootup. The timer irq must work, etc. > Hmm. But that is what we use in setup_ioapic_sid.... I expect the > right solution is to delay enabling ioapic entries until driver > enable them. That could also reduce screaming irqs during bootup > in the kdump case. Yes, and note that we are moving in that direction in tip:irq/numa (Yinghai Cc:-ed) - we are delaying IRQ setup to the very last moment. We recently got rid of 32-bit IRQ compression in that branch as well and the IRQ vectoring code is now unified between 64-bit and 32-bit so it's nify and you might want to check it and look for holes ... ( The motivation there was different: delay allocation of the irq_desc so that we can make it NUMA-local - but it has the same general effect. ) > set_msi_sid looks wrong. The comment are unhelpful. irte->svt > should get an enum value or a deine (removing the repeated > explanations of the magic value) and then we could have room to > explain why we are doing what we are doing. (yes, it probably wants a helper method - i pointed these smaller details out in my review.) > Not finding an upstream pcie_bridge and then concluding we are a > pcie device seems bogus. > > Why if we do have an upstream pcie bridge do we only want to do a > bus range verification instead of checking just for the bus > :devfn? > > The legacy PCI case seems even stranger. Good points. Would be nice to get an ack from the PCI folks to make sure these bits are sane. > .... > > The table of apic information by apic_id also seems wrong. Don't > we have chip_data or something that should point it that we can > get from the irq? ->chip_data is already used for io-apic configuration bits - if it's reused then the right way to do it is to extend struct irq_cfg in arch/x86/kernel/apic/io_apic.c. Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html