Hi Robin,

On 10/25/18 12:05 AM, Robin Murphy wrote:
> On 2018-10-24 7:44 pm, Auger Eric wrote:
>> Hi Robin,
>>
>> On 10/24/18 8:02 PM, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 2018-09-18 3:24 pm, Eric Auger wrote:
>>>> Up to now, when the type was UNMANAGED, we used to
>>>> allocate IOVA pages within a range provided by the user.
>>>> This does not work in nested mode.
>>>>
>>>> If both the host and the guest are exposed with SMMUs, each
>>>> would allocate an IOVA. The guest allocates an IOVA (gIOVA)
>>>> to map onto the guest MSI doorbell (gDB). The host allocates
>>>> another IOVA (hIOVA) to map onto the physical doorbell (hDB).
>>>>
>>>> So we end up with 2 unrelated mappings, at S1 and S2:
>>>>          S1             S2
>>>> gIOVA    ->     gDB
>>>>                 hIOVA    ->    hDB
>>>>
>>>> The PCI device would be programmed with hIOVA.
>>>>
>>>> iommu_dma_bind_doorbell() allows passing gIOVA/gDB to the host
>>>> so that gIOVA can be used by the host instead of re-allocating
>>>> a new IOVA. That way the host can create the following nested
>>>> mapping:
>>>>
>>>>          S1             S2
>>>> gIOVA    ->     gDB     ->     hDB
>>>>
>>>> This time, the PCI device will be programmed with the gIOVA MSI
>>>> doorbell, which is correctly mapped through the 2 stages.
>>>
>>> If I'm understanding things correctly, this plus a couple of the
>>> preceding patches all add up to a rather involved way of coercing an
>>> automatic allocator to only "allocate" predetermined addresses in an
>>> entirely known-ahead-of-time manner.
>> agreed
>>> Given that the guy calling
>>> iommu_dma_bind_doorbell() could seemingly just as easily call
>>> iommu_map() at that point and not bother with an allocator cookie and
>>> all this machinery at all, what am I missing?
>> Well, iommu_dma_map_msi_msg() gets called and is part of this existing
>> MSI mapping machinery. If we do not do anything, this function
>> allocates an hIOVA that is not involved in any nested setup. So either
>> we coerce the allocator in place (which is what this series does) or
>> we unplug the allocator and replace it with a simple S2 mapping, as
>> you suggest, i.e. iommu_map(gDB, hDB). Assuming we unplug the
>> allocator, the guy who actually calls iommu_dma_bind_doorbell() knows
>> gDB but does not know hDB. So I don't really get how we can simplify
>> things.
>
> OK, there's what I was missing :D
>
> But that then seems to reveal a somewhat bigger problem - if the
> callers are simply registering IPAs, and relying on the ITS driver to
> grab an entry and fill in a PA later, then how does either one know
> *which* PA is supposed to belong to a given IPA in the case where you
> have multiple devices with different ITS targets assigned to the same
> guest?

You're definitely right here. I think this can be resolved by passing
the struct device handle along with the stage 1 mapping and storing the
info together. Then, when the host MSI controller looks for a free
unmapped iova, it must also check whether the device belongs to its MSI
domain.

> (and if it's possible to assume a guest will use per-device stage 1
> mappings and present it with a single vITS backed by multiple pITSes,
> I think things start breaking even harder.)

I don't really get your point here. Assigned devices on the guest side
should be in separate iommu domains because we want them isolated from
each other. There is a single vITS as of now and I don't think we will
change that anytime soon.

The vITS driver allocates a gIOVA for each separate domain and I
currently "trap" the gIOVA/gPA mapping on irqfd routing setup. This
mapping gets associated with a VFIO IOMMU, one per assigned device, so
we have a different vfio container for each of them. If I then
enumerate all the devices attached to the containers and pass this
stage 1 binding along with the device struct, I think we should be OK?
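To make this a bit more concrete, here is a rough sketch of what I have
in mind (all structure and function names below are made up just to
illustrate the idea; only iommu_map() is existing API, and a real check
would probably be against the MSI controller's irq domain rather than a
bare device pointer comparison):

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/list.h>

/* Hypothetical record kept for each registered stage 1 doorbell binding */
struct msi_doorbell_binding {
	struct list_head	next;
	struct device		*dev;	/* device the binding was registered for */
	dma_addr_t		giova;	/* stage 1 IOVA chosen by the guest */
	phys_addr_t		gdb;	/* guest doorbell IPA (stage 2 input) */
	size_t			size;
};

/*
 * Sketch: instead of allocating a fresh hIOVA for @dev, reuse the gIOVA
 * the guest registered for that device and install the missing stage 2
 * mapping gDB -> hDB, so that gIOVA -> gDB -> hDB resolves to the
 * physical doorbell.
 */
static int msi_get_doorbell_iova(struct iommu_domain *s2_domain,
				 struct device *dev, phys_addr_t hdb,
				 struct list_head *bindings,
				 dma_addr_t *iova)
{
	struct msi_doorbell_binding *b;
	int ret;

	list_for_each_entry(b, bindings, next) {
		if (b->dev != dev)	/* only reuse a binding from this device */
			continue;
		ret = iommu_map(s2_domain, b->gdb, hdb, b->size,
				IOMMU_WRITE | IOMMU_MMIO);
		if (!ret)
			*iova = b->giova;
		return ret;
	}
	/* no binding for this device: fall back to plain hIOVA allocation */
	return -ENOENT;
}

That way, with several assigned devices targeting different pITSes, each
device can only end up with the gIOVA/gDB pair that was registered for it.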
Thanks

Eric

>
> Other than allowing arbitrary disjoint IOVA pages, I'm not sure this
> really works any differently from the existing MSI cookie now that I
> look more closely :/
>
> Robin.