On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote:
> On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> > > On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > > > On x86 the interrupt remapper handles this transparently when
> > > > MSI is enabled and userspace never gets direct access to the
> > > > device MSI address/data registers.
> > >
> > > x86 has a totally different mechanism here, as far as I
> > > understand -- even before you get into restrictions on mappings.
> >
> > So what control will userspace have over programming the actual MSI
> > vectors on PAMU?
>
> Not sure what you mean -- PAMU doesn't get explicitly involved in
> MSIs.  It's just another 4K page mapping (per relevant MSI bank).  If
> you want isolation, you need to make sure that an MSI group is only
> used by one VFIO group, and that you're on a chip that has alias
> pages with just one MSI bank register each (newer chips do, but the
> first chip to have a PAMU didn't).

How does a user figure this out?

> > > > This could also be done as another "type2" ioctl extension.
> > >
> > > Again, what is "type2", specifically?  If someone else is adding
> > > their own IOMMU that is kind of, sort of like PAMU, how would
> > > they know if it's close enough?  What assumptions can a user make
> > > when they see that they're dealing with "type2"?
> >
> > Naming always has and always will be a problem.  I assume this is
> > named type2 rather than PAMU because it's trying to expose a
> > generic windowed IOMMU fitting the IOMMU API.
>
> But how closely is the MSI situation related to a generic windowed
> IOMMU, then?  We could just as well have a highly flexible IOMMU in
> terms of arbitrary 4K page mappings, but still handle MSIs as pages
> to be mapped rather than a translation table.  Or we could have a
> windowed IOMMU that has an MSI translation table.
>
> > Like type1, it doesn't really make sense to name it "IOMMU API"
> > because that's a kernel internal interface and we're designing a
> > userspace interface that just happens to use that.  Tagging it to a
> > piece of hardware makes it less reusable.
>
> Well, that's my point.  Is it reusable at all, anyway?  If not, then
> giving it a more obscure name won't change that.  If it is reusable,
> then where is the line drawn between things that are PAMU-specific
> or MPIC-specific and things that are part of the "generic windowed
> IOMMU" abstraction?
>
> > Type1 is arbitrary.  It might as well be named "brown" and this
> > one can be "blue".
>
> The difference is that "type1" seems to refer to hardware that can
> do arbitrary 4K page mappings, possibly constrained by an aperture
> but nothing else.  More than one IOMMU can reasonably fit that.  The
> odds that another IOMMU would have exactly the same restrictions as
> PAMU seem smaller in comparison.
>
> In any case, if you had to deal with some Intel-only quirk, would it
> make sense to call it a "type1 attribute"?  I'm not advocating one
> way or the other on whether an abstraction is viable here (though
> Stuart seems to think it's "highly unlikely anything but a PAMU will
> comply"), just that if it is to be abstracted rather than a
> hardware-specific interface, we need to document what is and is not
> part of the abstraction.  Otherwise a non-PAMU-specific user won't
> know what they can rely on, and someone adding support for a new
> windowed IOMMU won't know if theirs is close enough, or they need to
> introduce a "type3".

So Alexey named the SPAPR IOMMU something related to spapr...
surprisingly enough.  I'm fine with that.  If you think it's unique
enough, name it something appropriately.  I haven't seen the code and
don't know the architecture sufficiently to have an opinion.

> > > > What's the value to userspace in determining which windows are
> > > > used by which banks?
> > >
> > > That depends on who programs the MSI config space address.  What
> > > is important is userspace controlling which iovas will be
> > > dedicated to this, in case it wants to put something else there.
> >
> > So userspace is programming the MSI vectors, targeting a user
> > programmed iova?  But an iova selects a window and I thought there
> > were some number of MSI banks and we don't really know which ones
> > we'll need...  still confused.
>
> Userspace would also need a way to find out the page offset and data
> value.  That may be an argument in favor of having the two ioctls
> Stuart later suggested (get MSI count, and map MSI).
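Just to check that we're picturing the same thing, here's a rough
sketch of what those two ioctls might look like -- every name, ioctl
number, and field below is invented for discussion, not an actual
proposal:

/* Sketch only -- names, numbers, and layouts are all made up. */
#include <linux/types.h>
#include <linux/vfio.h>		/* VFIO_TYPE, VFIO_BASE */

/* Report how many MSI banks the platform has, so userspace knows how
 * many subwindows it may need to leave free in the aperture. */
#define VFIO_IOMMU_GET_MSI_BANK_COUNT	_IO(VFIO_TYPE, VFIO_BASE + 20)

struct vfio_pamu_msi_bank_map {
	__u32	argsz;
	__u32	flags;		/* no flags defined yet */
	__u32	msi_bank;	/* in: [0, bank count) from above */
	__u32	offset;		/* out: MSI register offset in the page */
	__u64	iova;		/* in: user-chosen iova for the 4K page */
	__u32	data;		/* out: data value for the vector */
	__u32	resv;
};

/* Map the given bank's register page at the user-chosen iova and
 * return the page offset and data value needed to program the
 * device's MSI address/data registers. */
#define VFIO_IOMMU_MAP_MSI_BANK		_IO(VFIO_TYPE, VFIO_BASE + 21)

If the kernel ends up programming the vectors itself, the out fields
obviously go away.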
Connecting the user-set iova to the irq number the host kernel assigns
is where I'm still lost, but I'll follow up with that question in the
other thread.

> Would there be any complication in the VFIO code from tracking a
> mapping that doesn't have a userspace virtual address associated
> with it?

Only the VFIO iommu driver tracks mappings; the QEMU userspace
component doesn't (it relies on the memory API for type1), nor does
any of the kernel framework code.

> > > There's going to be special stuff no matter what.  This would
> > > keep it separated from the IOMMU map code.
> > >
> > > I'm not sure what you mean by "overhead" here...  the runtime
> > > overhead of setting things up is not particularly relevant as
> > > long as it's reasonable.  If you mean development and maintenance
> > > effort, keeping things well separated should help.
> >
> > Overhead in terms of code required and complexity.  More things to
> > reference count and shut down in the proper order on userspace
> > exit.  Thanks,
>
> That didn't stop others from having me convert the KVM device
> control API to use file descriptors instead of something more ad-hoc
> with a better-defined destruction order. :-)
>
> I don't know if it necessarily needs to be a separate fd -- it could
> be just another device resource like BARs, with some way for
> userspace to tell if the page is shared by multiple devices in the
> group (e.g. make the physical address visible).

That was my first thought when I read option C.  The downside is that
resources are attached to a device and these MSI banks are potentially
associated with multiple devices.  Thanks,

Alex
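P.S.  For completeness, the option C variant I was picturing would
ride on the existing region info call, with the MSI bank showing up as
just another region on the device fd.  The SHARED flag, the index
handling, and the physical-address-in-offset trick below are all
invented here, only to make the sharing question concrete:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Invented for this sketch -- not in vfio.h today. */
#define VFIO_REGION_INFO_FLAG_SHARED	(1 << 3)

/* Query a hypothetical MSI bank region and report whether its page is
 * shared beyond this device; "index" is wherever the bank lands in
 * the device's region numbering. */
static int msi_bank_is_shared(int device, __u32 index)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = index,
	};

	if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &info))
		return -1;

	/* In this scheme info.offset could carry the bank's physical
	 * address, letting userspace spot the same page showing up on
	 * two devices. */
	return !!(info.flags & VFIO_REGION_INFO_FLAG_SHARED);
}

It doesn't solve the multiple-devices problem, but it would at least
make the sharing visible to userspace.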