On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote:
> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
> > > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
> > > >> > C. Explicit mapping using normal DMA map. The last idea is that
> > > >> > we would introduce a new ioctl to give user-space an fd to
> > > >> > the MSI bank, which could be mmapped. The flow would be
> > > >> > something like this:
> > > >> >   -for each group user space calls new ioctl
> > > >> >    VFIO_GROUP_GET_MSI_FD
> > > >> >   -user space mmaps the fd, getting a vaddr
> > > >> >   -user space does a normal DMA map for desired iova
> > > >> > This approach makes everything explicit, but adds a new ioctl
> > > >> > applicable most likely only to the PAMU (type2 iommu).
> > > >>
> > > >> And the DMA_MAP of that mmap then allows userspace to select the
> > > >> window used? This one seems like a lot of overhead, adding a new
> > > >> ioctl, new fd, mmap, special mapping path, etc.
> > > >
> > > > There's going to be special stuff no matter what. This would keep
> > > > it separated from the IOMMU map code.
> > > >
> > > > I'm not sure what you mean by "overhead" here... the runtime
> > > > overhead of setting things up is not particularly relevant as long
> > > > as it's reasonable. If you mean development and maintenance
> > > > effort, keeping things well separated should help.
> > >
> > > We don't need to change DMA_MAP. If we can simply add a new "type 2"
> > > ioctl that allows user space to set which windows are MSIs, it seems
> > > vastly less complex than an ioctl to supply a new fd, mmap of it, etc.
> > >
> > > So maybe 2 ioctls:
> > >     VFIO_IOMMU_GET_MSI_COUNT
>
> Do you mean a count of actual MSIs or a count of MSI banks used by the
> whole VFIO group?

I hope the latter, which would clarify how this is distinct from
DEVICE_GET_IRQ_INFO.
Is hotplug even on the table? Presumably dynamically adding a device
could bring along additional MSI banks?

> > >     VFIO_IOMMU_MAP_MSI(iova, size)
>
> Not sure how you mean "size" to be used -- for MPIC it would be 4K per
> bank, and you can only map one bank at a time (which bank you're
> mapping should be a parameter, if only so that the kernel doesn't have
> to keep iteration state for you).
>
> > How are MSIs related to devices on PAMU?
>
> PAMU doesn't care about MSIs. The relation of individual MSIs to a
> device is standard PCI stuff. Each MSI bank (which is part of the
> MPIC, not PAMU) can hold numerous MSIs. The VFIO user would want to
> map all MSI banks that are in use by any of the devices in the group.
> Ideally we'd let the VFIO grouping influence the allocation of MSIs.

The current VFIO MSI support has the host handling everything about MSI.
The user never programs an MSI vector to the physical device, they set
up everything through ioctl. On interrupt, we simply trigger an eventfd
and leave it to things like KVM irqfd or QEMU to do the right thing in a
virtual machine.

Here the MSI vector has to go through a PAMU window to hit the correct
MSI bank. So that means it has some component of the iova involved,
which we're proposing here is controlled by userspace (whether that
vector uses an offset from 0x10000000 or 0x00000000 depending on which
window slot is used to make the MSI bank). I assume we're still working
in a model where the physical interrupt fires into the host and a
host-based interrupt handler triggers an eventfd, right? So that means
the vector also has host components so we trigger the correct ISR. How
is that coordinated?

Would it be possible for userspace to simply leave room for MSI bank
mapping (how much room could be determined by something like
VFIO_IOMMU_GET_MSI_BANK_COUNT), then document in the API that userspace
can DMA_MAP starting at the 0x0 address of the aperture, growing up, and
VFIO will map banks on demand at the top of the aperture, growing down?
Wouldn't that avoid a lot of issues with userspace needing to know
anything about MSI banks (other than count) and coordinating irq numbers
and enabling handlers?

> > On x86 MSI count is very device specific, which means it would be a
> > VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for
> > us on x86). The trouble with it being a device ioctl is that you
> > need to get the device FD, but the IOMMU protection needs to be
> > established before you can get that... so there's an ordering
> > problem if you need it from the device before configuring the IOMMU.
> > Thanks,
>
> What do you mean by "IOMMU protection needs to be established"?
> Wouldn't we just start with no mappings in place?

If having no mappings in place blocks all DMA, sure, that's fine. Once
the VFIO device FD is accessible by userspace we have to protect the
host against DMA. If any IOMMU_SET_ATTR calls temporarily disable DMA
protection, that could be exploitable. Thanks,

Alex
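
To make the grow-up/grow-down aperture idea above a bit more concrete,
here is a rough sketch of the bookkeeping it implies. This is only an
illustration under assumptions, not proposed kernel code: struct
aperture, user_dma_map() and msi_bank_map() are hypothetical names that
exist nowhere in VFIO, and the 4K bank size is just the MPIC figure
quoted earlier in the thread. User mappings grow up from offset 0, MSI
banks are placed on demand from the top of the aperture downwards, and
either side fails if it would run into the other.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K per MSI bank, the MPIC figure quoted in the thread. */
#define MSI_BANK_SIZE 0x1000ULL

/* Hypothetical bookkeeping for one aperture; not a real VFIO structure. */
struct aperture {
    uint64_t base;     /* iova of the start of the aperture */
    uint64_t size;     /* aperture size in bytes */
    uint64_t dma_top;  /* offset one past the highest user DMA mapping */
    unsigned banks;    /* MSI banks mapped so far, counted from the top */
};

/* Offset of the lowest byte currently occupied by MSI bank mappings. */
static uint64_t msi_floor(const struct aperture *ap)
{
    assert((uint64_t)ap->banks * MSI_BANK_SIZE <= ap->size);
    return ap->size - (uint64_t)ap->banks * MSI_BANK_SIZE;
}

/*
 * User DMA mappings grow up from offset 0.
 * Returns 0 on success, -1 if the range is invalid or would overlap
 * the region already used for MSI banks.
 */
static int user_dma_map(struct aperture *ap, uint64_t offset, uint64_t len)
{
    if (len == 0 || offset > ap->size || len > ap->size - offset)
        return -1;
    if (offset + len > msi_floor(ap))
        return -1;
    if (offset + len > ap->dma_top)
        ap->dma_top = offset + len;
    return 0;
}

/*
 * Map the next MSI bank on demand, just below the previous one.
 * On success stores the bank's iova in *iova and returns 0;
 * returns -1 if there is no room left above the user mappings.
 */
static int msi_bank_map(struct aperture *ap, uint64_t *iova)
{
    uint64_t floor = msi_floor(ap);

    if (floor < MSI_BANK_SIZE || floor - MSI_BANK_SIZE < ap->dma_top)
        return -1;
    ap->banks++;
    *iova = ap->base + msi_floor(ap);
    return 0;
}
```

With a 64K aperture based at 0x10000000, for example, the first bank
would land at 0x1000f000 and the second at 0x1000e000, and a later user
mapping that reaches past offset 0xe000 would be refused. The only thing
userspace needs to know up front is how many banks to leave room for.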