On Sunday 13 June 2010 03:23:39 am Michael S. Tsirkin wrote:
> On Fri, Jun 11, 2010 at 03:15:53PM -0700, Tom Lyon wrote:
> > [ bunch of stuff about MSI-X checking and IOMMUs and config registers...]
> >
> > OK, here's the thing. The IOMMU API today does not do squat about
> > dealing with interrupts. Interrupts are special because the APIC
> > addresses are not each in their own page. Yes, the IOMMU hardware
> > supports it (at least Intel), and there's some Intel intr remapping
> > code (not AMD), but it doesn't look like it is enough.
>
> The iommu book from AMD seems to say that interrupt remapping table
> address is taken from the device table entry. So hardware support seems
> to be there, and to me it looks like it should be enough.
> Need to look at the iommu/msi code some more to figure out
> whether what linux does is handling this correctly -
> if it doesn't we need to fix that.
>
> > Therefore, we must not allow the user level driver to diddle the MSI
> > or MSI-X areas - either in config space or in the device memory space.
>
> It won't help.
> Consider that you want to let a userspace driver control
> the device with DMA capabilities.
>
> So if there is a range of addresses that device
> can write into that can break host, these writes
> can be triggered by userspace. Limiting
> userspace access to MSI registers won't help:
> you need a way to protect host from the device.

OK, after more investigation, I realize you are right.
We definitely need the IOMMU protection for interrupts, and
if we have it, a lot of the code for config space protection is pointless.
It does seem that the Intel intr_remapping code does what we want
(accidentally) but that the AMD iommu code does not yet do any interrupt
remapping. Joerg - can you comment? On the roadmap?

I should have an AMD system w IOMMU in a couple of days, so I can check this out.
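To make the point above concrete: on x86 an MSI is just a device-initiated memory write into the 0xFEExxxxx interrupt window, so a DMA-capable device can raise one no matter which config registers are hidden from the user driver. The following is only an illustrative sketch, not vfio or IOMMU code. The helper name and the two constants are made up for this example, and the window bounds assume x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86 interrupt message window, 0xFEE00000..0xFEEFFFFF.
 * A device write landing here is treated as an interrupt, not as RAM. */
#define X86_MSI_BASE 0xFEE00000ull
#define X86_MSI_SIZE 0x00100000ull

/* Hypothetical helper: does a device write of len bytes at bus address
 * addr overlap the interrupt window? */
static bool dma_write_hits_msi_window(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return addr < X86_MSI_BASE + X86_MSI_SIZE &&
           addr + len > X86_MSI_BASE;
}
```

Nothing in config-space filtering stops the device from being programmed (through its own registers) to DMA to such an address, which is why the protection has to come from the IOMMU side.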
>
> > If the device doesn't have its MSI-X registers in nice page aligned
> > areas, then it is not "well-behaved" and it is S.O.L. The SR-IOV spec
> > recommends that devices be designed the well-behaved way.
> >
> > When the code in vfio_pci_config speaks of "virtualization" it means
> > that there are fake registers which the user driver can read or write,
> > but do not affect the real registers. BARs are one case, MSI regs
> > another. The PCI vendor and device ID are virtual because SR-IOV
> > doesn't supply them but I wanted the user driver to find them in the
> > same old place.
>
> Sorry, I still don't understand why do we bother. All this is already
> implemented in userspace. Why can't we just use this existing userspace
> implementation? It seems that all kernel needs to do is prevent
> userspace from writing BARs.

I assume the userspace of which you speak is qemu? This is not what I'm
doing with vfio - I'm interested in the HPC networking model of direct
user space access to the network.

> Why can't we replace all this complexity with basically:
>
> if (addr <= PCI_BASE_ADDRESS_5 && addr + len >= PCI_BASE_ADDRESS_0)
> 	return -ENOPERM;
>
> And maybe another register or two. Most registers should be fine.
>
> > [ Re: Hotplug and Suspend/Resume]
> > There are *plenty* of real drivers - brand new ones - which don't
> > bother with these today. Yeah, I can see adding them to the framework
> > someday - but if there's no urgent need then it is way down the
> > priority list.
>
> Well, for kernel drivers everything mostly works out of the box, it is
> handled by the PCI subsystem. So some kind of framework will need to be
> added for userspace drivers as well. And I suspect this issue won't be
> fixable later without breaking applications.

Whatever works out of the box for the kernel drivers which don't implement
suspend/resume will work for the user level drivers which don't.

> > Meanwhile, the other uses beckon.
>
> Which other uses?
> I thought the whole point was fixing
> what's broken with current kvm implementation.
> So it seems to me we should not rush it ignoring existing issues such as
> hotplug.

Non-kvm cases. That don't care about suspend/resume.
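For reference, the BAR range check quoted earlier in this message can be written out as a small standalone sketch. This is not the vfio_pci_config code. The function name is hypothetical, the offsets are the standard PCI config-space values (0x10 for BAR 0, 0x24 for BAR 5), and -EPERM is assumed where the mail writes -ENOPERM, which is not a standard errno name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Standard PCI config-space offsets of the first and last BAR. */
#define PCI_BASE_ADDRESS_0 0x10u
#define PCI_BASE_ADDRESS_5 0x24u
/* One past the last byte of BAR 5 (each BAR is 4 bytes wide). */
#define BAR_REGION_END (PCI_BASE_ADDRESS_5 + 4u)

/* Hypothetical helper: return 0 if a config-space write of len bytes at
 * offset addr may go through, -EPERM if it overlaps any BAR byte. */
static int check_config_write(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    if (addr < BAR_REGION_END && addr + len > PCI_BASE_ADDRESS_0)
        return -EPERM;
    return 0;
}
```

Compared with the quoted one-liner, this version uses strict comparisons against the byte just past BAR 5. A 4-byte write at 0x0c, which ends right before BAR 0, is allowed, and a write starting at 0x25 is still refused.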