Ingo Molnar wrote: > * Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > >> Ingo Molnar <mingo@xxxxxxx> writes: >> >>> * Weidong Han <weidong.han@xxxxxxxxx> wrote: >>> >>>> To support domain-isolation usages, the platform hardware must be >>>> capable of uniquely identifying the requestor (source-id) for each >>>> interrupt message. Without source-id checking for interrupt >>>> remapping , a rouge guest/VM with assigned devices can launch >>>> interrupt attacks to bring down anothe guest/VM or the VMM itself. >>>> >>>> This patch adds source-id checking for interrupt remapping, and >>>> then really isolates interrupts for guests/VMs with assigned >>>> devices. >>>> >>>> Because PCI subsystem is not initialized yet when set up IOAPIC >>>> entries, use read_pci_config_byte to access PCI config space >>>> directly. >>>> >>>> Signed-off-by: Weidong Han <weidong.han@xxxxxxxxx> >>>> --- >>>> arch/x86/kernel/apic/io_apic.c | 6 +++ >>>> drivers/pci/intr_remapping.c | 90 >>>> ++++++++++++++++++++++++++++++++++++++- >>>> drivers/pci/intr_remapping.h | 2 + include/linux/dmar.h >>>> | 11 +++++ 4 files changed, 106 insertions(+), 3 deletions(-) >>> >>> Code structure looks nice now. (and i susect you have tested this on >>> real and relevant hardware?) I've Cc:-ed Eric too ... does this >>> direction look good to you too Eric? >> >> Being a major nitpick, I have to point out that the code is not >> structured to support other iommus, and I think AMD has one that >> can do this as well. > > (Joerg Cc:-ed) The patch just changes code in interrupt remapping of Intel IOMMU. How does AMD IOMMU remap interrupts in ioapic code? I didn't find relative code. If AMD IOMMU already enabled the same code, it can wrap the same code in IOMMU API. > >> The early pci reading of the bus is just wrong. What happens if >> the pci layer decided to renumber things? It looks like we have a >> real dependency on pci there and are avoiding sorting it out with >> this. > > Yes ... but is there much we can do about this bootstrap dependency? > We want to enable the IO-APIC very early in its final form. There's > quite a bit of IRQ functionality that doesnt go via the PCI layer, > and which is being relied on by early bootup. The timer irq must > work, etc. Currently VT-d code didn't support pci rebalance. There is much work to do. It needs to track devcie identity changes and resultant imapct to Device Scope under each VT-d engine. > >> Hmm. But that is what we use in setup_ioapic_sid.... I expect the >> right solution is to delay enabling ioapic entries until driver >> enable them. That could also reduce screaming irqs during bootup >> in the kdump case. > > Yes, and note that we are moving in that direction in tip:irq/numa > (Yinghai Cc:-ed) - we are delaying IRQ setup to the very last > moment. We recently got rid of 32-bit IRQ compression in that branch > as well and the IRQ vectoring code is now unified between 64-bit and > 32-bit so it's nify and you might want to check it and look for > holes ... > > ( The motivation there was different: delay allocation of the > irq_desc so that we can make it NUMA-local - but it has the same > general effect. ) > >> set_msi_sid looks wrong. The comment are unhelpful. irte->svt >> should get an enum value or a deine (removing the repeated >> explanations of the magic value) and then we could have room to >> explain why we are doing what we are doing. > > (yes, it probably wants a helper method - i pointed these smaller > details out in my review.) will update as Ingo suggested. Regards, Weidong > >> Not finding an upstream pcie_bridge and then concluding we are a >> pcie device seems bogus. >> >> Why if we do have an upstream pcie bridge do we only want to do a >> bus range verification instead of checking just for the bus >>> devfn? >> >> The legacy PCI case seems even stranger. > > Good points. Would be nice to get an ack from the PCI folks to make > sure these bits are sane. > >> .... >> >> The table of apic information by apic_id also seems wrong. Don't >> we have chip_data or something that should point it that we can >> get from the irq? > > ->chip_data is already used for io-apic configuration bits - if it's > reused then the right way to do it is to extend struct irq_cfg in > arch/x86/kernel/apic/io_apic.c. > > Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html