A gentle ping...

> From: Tian, Kevin
> Sent: Wednesday, June 30, 2021 5:08 PM
>
> > From: Joerg Roedel <joro@xxxxxxxxxx>
> > Sent: Monday, May 17, 2021 11:35 PM
> >
> > On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote:
> > > Well, I'm sorry, but there is a huge other thread talking about the
> > > IOASID design in great detail and why this is all needed. Jumping into
> > > this thread without context and basically rejecting all the
> > > conclusions that were reached over the last several weeks is really
> > > not helpful - especially since your objection is not technical.
> > >
> > > I think you should wait for Intel to put together the /dev/ioasid uAPI
> > > proposal and the example use cases it should address then you can give
> > > feedback there, with proper context.
> >
> > Yes, I think the next step is that someone who read the whole thread
> > writes up the conclusions and a rough /dev/ioasid API proposal, also
> > mentioning the use-cases it addresses. Based on that we can discuss the
> > implications this needs to have for IOMMU-API and code.
> >
> > From the use-cases I know the mdev concept is just fine. But if there is
> > a more generic one we can talk about it.
>
> Although the /dev/iommu v2 proposal is still in progress, I think enough
> background was gathered in v1 to resume this discussion now.
>
> In a nutshell, /dev/iommu requires two sets of services from the iommu
> layer:
>
> - for a kernel-managed I/O page table, via map/unmap;
> - for a user-managed I/O page table, via bind/invalidate, nested on a
>   kernel-managed parent I/O page table.
>
> Each I/O page table can have multiple devices attached to it. /dev/iommu
> maintains device-specific routing information (RID, or RID+PASID) for
> where to install the I/O page table in the IOMMU for each attached
> device.
>
> A kernel-managed page table is represented by an iommu domain. The
> existing IOMMU-API already allows /dev/iommu to attach a RID device to
> an iommu domain. A new interface is required, e.g.
> iommu_attach_device_pasid(domain, dev, pasid), to cover (RID+PASID)
> attaching. Once attaching succeeds, there is no change to the following
> map/unmap calls, which are domain-wide and thus apply to both RID and
> RID+PASID. If dev_iotlb invalidation is required, the iommu driver is
> responsible for handling it for every attached RID or RID+PASID when
> ATS is enabled.
>
> To take one example, the parent device (RID1) has three work queues. WQ1
> is for the parent's own DMA-API usage, while WQ2 (PASID-x) is assigned to
> VM1 and WQ3 (PASID-y) is assigned to VM2. VM2 is also assigned another
> device (RID2). In this case there are three kernel-managed I/O page
> tables (IOVA in the kernel, GPA for VM1 and GPA for VM2), thus RID1 is
> attached to three domains:
>
>  RID1 --- domain1 (default, IOVA)
>   |          |
>   |          |-- [RID1]
>   |
>   |------ domain2 (vm1, GPA)
>   |          |
>   |          |-- [RID1, PASID-x]
>   |
>   |------ domain3 (vm2, GPA)
>              |
>              |-- [RID1, PASID-y]
>              |
>              |-- [RID2]
>
> The iommu layer should maintain the above attaching status per device and
> per iommu domain. There is no mdev/subdev concept in the iommu layer. It's
> just about RID or PASID.
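To make the attach model above concrete, here is a minimal sketch of how
/dev/iommu could drive it for the example topology. Only iommu_attach_device()
exists today; iommu_attach_device_pasid() is the interface proposed above, and
the helper, device and pasid names below are purely illustrative:

#include <linux/iommu.h>

/* Proposed above, not existing kernel API; the signature is illustrative. */
int iommu_attach_device_pasid(struct iommu_domain *domain,
			      struct device *dev, u32 pasid);

/*
 * Hypothetical helper setting up the three-domain example.  domain1
 * (default, IOVA) is assumed to already be in place for RID1's own
 * DMA-API usage; error unwinding (detach on failure) is omitted.
 */
static int example_attach(struct device *rid1, struct device *rid2,
			  struct iommu_domain *domain2,	/* vm1, GPA */
			  struct iommu_domain *domain3,	/* vm2, GPA */
			  u32 pasid_x, u32 pasid_y)
{
	int ret;

	/* WQ2 of RID1 routes DMA tagged with PASID-x through vm1's GPA table. */
	ret = iommu_attach_device_pasid(domain2, rid1, pasid_x);
	if (ret)
		return ret;

	/* WQ3 of RID1 routes DMA tagged with PASID-y through vm2's GPA table. */
	ret = iommu_attach_device_pasid(domain3, rid1, pasid_y);
	if (ret)
		return ret;

	/* RID2 is assigned to vm2 as a whole device, so a plain RID attach. */
	return iommu_attach_device(domain3, rid2);
}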
> A user-managed I/O page table might be represented by a new object which
> describes:
>
> - the routing information (RID or RID+PASID);
> - a pointer to the iommu_domain of the parent I/O page table (to inherit
>   the domain ID in the iotlb due to nesting);
> - the address of the I/O page table.
>
> There might be a chance to share this structure with native SVA, which
> also has its page table managed outside of the iommu subsystem. But we
> can leave that open and figure it out when coding.
>
> And a new set of IOMMU-API:
>
> - iommu_{un}bind_pgtable(domain, dev, addr);
> - iommu_{un}bind_pgtable_pasid(domain, dev, addr, pasid);
> - iommu_cache_invalidate(domain, dev, invalid_info);
> - and APIs for registering a fault handler and completing faults;
>
> (here 'domain' is the one representing the parent I/O page table)
>
> Because nesting essentially creates a new reference to the parent I/O
> page table, iommu_bind_pgtable_pasid() implicitly calls
> __iommu_attach_device_pasid() to set up the connection between the parent
> domain and the new [RID, PASID]. It's not necessary to do so for
> iommu_bind_pgtable(), since the RID is already attached when the parent
> I/O page table is created.
>
> As a consequence, the example topology is updated as below, with guest
> SVA enabled in both vm1 and vm2:
>
>  RID1 --- domain1 (default, IOVA)
>   |          |
>   |          |-- [RID1]
>   |
>   |------ domain2 (vm1, GPA)
>   |          |
>   |          |-- [RID1, PASID-x]
>   |          |-- [RID1, PASID-a]   // nested for vm1 process1
>   |          |-- [RID1, PASID-b]   // nested for vm1 process2
>   |
>   |------ domain3 (vm2, GPA)
>              |
>              |-- [RID1, PASID-y]
>              |-- [RID1, PASID-c]   // nested for vm2 process1
>              |
>              |-- [RID2]
>              |-- [RID2, PASID-a]   // nested for vm2 process2
>
> Thoughts?
>
> Thanks
> Kevin
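For reference, a minimal sketch of the bind/invalidate/unbind lifecycle for
one nested entry ([RID1, PASID-a] on domain2 above). All three functions are
only the proposed API, the return types and the 'invalid_info' descriptor are
placeholders, and the helper names and guest page-table address parameter are
illustrative:

#include <linux/iommu.h>

/* Proposed above; none of these exist in the kernel yet. */
int iommu_bind_pgtable_pasid(struct iommu_domain *domain, struct device *dev,
			     unsigned long addr, u32 pasid);
void iommu_unbind_pgtable_pasid(struct iommu_domain *domain, struct device *dev,
				unsigned long addr, u32 pasid);
int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
			   void *invalid_info);

/*
 * vm1 binds the GPA of process1's page-table root to PASID-a, nested on
 * domain2 (vm1's GPA domain).  Per the proposal, the bind also implicitly
 * attaches [RID1, PASID-a] to domain2.
 */
static int vm1_process1_bind(struct iommu_domain *domain2, struct device *rid1,
			     unsigned long guest_pgd_gpa, u32 pasid_a)
{
	return iommu_bind_pgtable_pasid(domain2, rid1, guest_pgd_gpa, pasid_a);
}

/* Forward a guest-initiated IOTLB/dev_iotlb flush for PASID-a. */
static int vm1_process1_invalidate(struct iommu_domain *domain2,
				   struct device *rid1, void *invalid_info)
{
	return iommu_cache_invalidate(domain2, rid1, invalid_info);
}

/* Tear down the nested page table and detach [RID1, PASID-a]. */
static void vm1_process1_unbind(struct iommu_domain *domain2,
				struct device *rid1,
				unsigned long guest_pgd_gpa, u32 pasid_a)
{
	iommu_unbind_pgtable_pasid(domain2, rid1, guest_pgd_gpa, pasid_a);
}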