On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 17:32 +0000, Yoder Stuart-B08248 wrote:
> > > 2. MSI window mappings
> > >
> > > The more problematic question is how to deal with MSIs. We need to
> > > create mappings for up to 3 MSI banks that a device may need to target
> > > to generate interrupts. The Linux MSI driver can allocate MSIs from
> > > the 3 banks any way it wants, and currently user space has no way of
> > > knowing which bank may be used for a given device.
> > >
> > > There are 3 options we have discussed and would like your direction:
> > >
> > > A. Implicit mappings -- with this approach user space would not
> > >    explicitly map MSIs. User space would be required to set the
> > >    geometry so that there are 3 unused windows (the last 3 windows)
> > >    for MSIs, and it would be up to the kernel to create the mappings.
> > >    This approach requires some specific semantics (leaving 3 windows)
> > >    and it potentially gets a little weird-- when should the kernel
> > >    actually create the MSI mappings? When should they be unmapped?
> > >    Some convention would need to be established.
> >
> > VFIO would have control of SET/GET_ATTR, right? So we could reduce the
> > number exposed to userspace on GET and transparently add MSI entries on
> > SET.
>
> What do you mean by "reduce the number exposed"? Userspace decides how
> many entries there are, but it must be a power of two between 1 and 256.

I didn't understand the API.

> > On x86 the interrupt remapper handles this transparently when MSI
> > is enabled and userspace never gets direct access to the device MSI
> > address/data registers.
>
> x86 has a totally different mechanism here, as far as I understand --
> even before you get into restrictions on mappings.

So what control will userspace have over programming the actual MSI
vectors on PAMU?
> > What kind of restrictions do you have around
> > adding and removing windows while the aperture is enabled?
>
> Subwindows can be modified while the aperture is enabled, but the
> aperture size and number of subwindows cannot be changed.
>
> > > B. Explicit mapping using DMA map flags. The idea is that a new
> > >    flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
> > >    a mapping is to be created for the supplied iova. No vaddr
> > >    is given though. So in the above example there would be a
> > >    dma map at 0x10000000 for 24KB (and no vaddr). It's
> > >    up to the kernel to determine which bank gets mapped where.
> > >    So, this option puts user space in control of which windows
> > >    are used for MSIs and when MSIs are mapped/unmapped. There
> > >    would need to be some semantics as to how this is used-- it
> > >    only makes sense
> >
> > This could also be done as another "type2" ioctl extension.
>
> Again, what is "type2", specifically? If someone else is adding their
> own IOMMU that is kind of, sort of like PAMU, how would they know if
> it's close enough? What assumptions can a user make when they see that
> they're dealing with "type2"?

Naming always has been and always will be a problem. I assume this is
named type2 rather than PAMU because it's trying to expose a generic
windowed IOMMU fitting the IOMMU API. Like type1, it doesn't really make
sense to name it "IOMMU API" because that's a kernel-internal interface
and we're designing a userspace interface that just happens to use it.
Tagging it to a piece of hardware makes it less reusable. Type1 is
arbitrary. It might as well be named "brown" and this one can be "blue".

> > What's the value to userspace in determining which windows are used
> > by which banks?
>
> That depends on who programs the MSI config space address. What is
> important is userspace controlling which iovas will be dedicated to
> this, in case it wants to put something else there.
So userspace is programming the MSI vectors, targeting a user-programmed
iova? But an iova selects a window, and I thought there were some number
of MSI banks and we don't really know which ones we'll need... still
confused.

> > It sounds like the case that there are X banks and if userspace wants
> > to use MSI it needs to leave X windows available for that. Is this
> > just buying userspace a few more windows to allow them the choice
> > between MSI or RAM?
>
> Well, there could be that. But also, userspace will generally have a
> much better idea of the type of mappings it's creating, so it's easier
> to keep everything explicit at the kernel/user interface than require
> more complicated code in the kernel to figure things out automatically
> (not just for MSIs but in general).
>
> If the kernel automatically creates the MSI mappings, when does it
> assume that userspace is done creating its own? What if userspace
> doesn't need any DMA other than the MSIs? What if userspace wants to
> continue dynamically modifying its other mappings?

Yep, valid arguments.

> > > C. Explicit mapping using normal DMA map. The last idea is that
> > >    we would introduce a new ioctl to give user-space an fd to
> > >    the MSI bank, which could be mmapped. The flow would be
> > >    something like this:
> > >       -for each group user space calls new ioctl
> > >        VFIO_GROUP_GET_MSI_FD
> > >       -user space mmaps the fd, getting a vaddr
> > >       -user space does a normal DMA map for desired iova
> > >    This approach makes everything explicit, but adds a new ioctl
> > >    applicable most likely only to the PAMU (type2 iommu).
> >
> > And the DMA_MAP of that mmap then allows userspace to select the
> > window used? This one seems like a lot of overhead, adding a new
> > ioctl, new fd, mmap, special mapping path, etc.
>
> There's going to be special stuff no matter what. This would keep it
> separated from the IOMMU map code.
>
> I'm not sure what you mean by "overhead" here...
> the runtime overhead of setting things up is not particularly relevant
> as long as it's reasonable. If you mean development and maintenance
> effort, keeping things well separated should help.

Overhead in terms of code required and complexity. More things to
reference count and shut down in the proper order on userspace exit.
Thanks,

Alex